Configuration

Logprep is configured via YAML or JSON files or HTTP API resources. If no configuration file is passed, Logprep searches for the file /etc/logprep/pipeline.yml.

You can pass multiple configuration files as valid file paths or URLs.

Valid Run Examples
logprep run /different/path/file.yml
logprep run http://url-to-our-yaml-file-or-api
logprep run http://api/v1/pipeline http://api/v1/addition_processor_pipeline /path/to/connector.yaml

Security Best Practice - Configuration - Combining multiple configuration files

When using multiple configuration files, note that Logprep rejects all of them if any single file cannot be retrieved or is not valid. Ensure that all files can be loaded safely and that all endpoints (if using HTTP resources) are accessible.
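The all-or-nothing behavior described above can be sketched in Python. This is not Logprep's implementation: load_all and read_source are hypothetical names, the sources are parsed as JSON so that the sketch only needs the standard library, and merging the parsed files is deliberately left out.

```python
import json
from typing import Callable, Iterable


def load_all(sources: Iterable[str], read_source: Callable[[str], str]) -> list[dict]:
    """Hypothetical sketch: parse every source or reject them all.

    read_source fetches the raw text of a file path or URL; it is injected
    so that the sketch does not depend on a particular HTTP client.
    """
    documents = []
    for source in sources:
        try:
            raw = read_source(source)
            document = json.loads(raw)  # JSONDecodeError is a ValueError
        except (OSError, ValueError) as error:
            # A single unreadable or invalid source invalidates the whole set.
            raise ValueError(f"rejecting all configuration files: {source!r} failed") from error
        if not isinstance(document, dict):
            raise ValueError(f"rejecting all configuration files: {source!r} is not a mapping")
        documents.append(document)
    return documents
```

Injecting read_source keeps the sketch independent of how files and URLs are fetched; a real loader might use open() for paths and urllib.request for URLs, both of which raise OSError subclasses on failure.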

Configuration File Structure

Example of a complete configuration file
version: config-1.0
process_count: 2
timeout: 5
logger:
    level: INFO
input:
    kafka:
        type: confluentkafka_input
        topic: consumer
        offset_reset_policy: smallest
        kafka_config:
            bootstrap.servers: localhost:9092
            group.id: test
output:
    kafka:
        type: confluentkafka_output
        topic: producer
        error_topic: producer_error
        flush_timeout: 30
        send_timeout: 2
        kafka_config:
            bootstrap.servers: localhost:9092
pipeline:
- labelername:
    type: labeler
    schema: quickstart/exampledata/rules/labeler/schema.json
    include_parent_labels: true
    specific_rules:
        - quickstart/exampledata/rules/labeler/specific
    generic_rules:
        - quickstart/exampledata/rules/labeler/generic

- dissectorname:
    type: dissector
    specific_rules:
        - quickstart/exampledata/rules/dissector/specific/
    generic_rules:
        - quickstart/exampledata/rules/dissector/generic/

- dropper:
    type: dropper
    specific_rules:
        - quickstart/exampledata/rules/dropper/specific
    generic_rules:
        - quickstart/exampledata/rules/dropper/generic
        - filter: "test_dropper"
          dropper:
            drop:
              - drop_me
          description: "..."

- pre_detector:
    type: pre_detector
    specific_rules:
        - quickstart/exampledata/rules/pre_detector/specific
    generic_rules:
        - quickstart/exampledata/rules/pre_detector/generic
    outputs:
        - opensearch: sre
    tree_config: quickstart/exampledata/rules/pre_detector/tree_config.json
    alert_ip_list_path: quickstart/exampledata/rules/pre_detector/alert_ips.yml

- amides:
    type: amides
    specific_rules:
        - quickstart/exampledata/rules/amides/specific
    generic_rules:
        - quickstart/exampledata/rules/amides/generic
    models_path: quickstart/exampledata/models/model.zip
    num_rule_attributions: 10
    max_cache_entries: 1000000
    decision_threshold: 0.32

- pseudonymizer:
    type: pseudonymizer
    pubkey_analyst: quickstart/exampledata/rules/pseudonymizer/example_analyst_pub.pem
    pubkey_depseudo: quickstart/exampledata/rules/pseudonymizer/example_depseudo_pub.pem
    regex_mapping: quickstart/exampledata/rules/pseudonymizer/regex_mapping.yml
    hash_salt: a_secret_tasty_ingredient
    outputs:
        - opensearch: pseudonyms
    specific_rules:
        - quickstart/exampledata/rules/pseudonymizer/specific/
    generic_rules:
        - quickstart/exampledata/rules/pseudonymizer/generic/
    max_cached_pseudonyms: 1000000

- calculator:
    type: calculator
    specific_rules:
        - filter: "test_label: execute"
          calculator:
            target_field: "calculation"
            calc: "1 + 1"
    generic_rules: []

The options under input, output and pipeline are passed to factories in Logprep. They contain settings for each separate processor and connector. Details for configuring connectors are described in Output and Input and for processors in Processors.
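As the example file shows, each pipeline entry is a mapping with one free-form name (such as labelername) whose type key selects the processor. A minimal sketch of such a type-based factory follows; REGISTRY, register, create and the Dropper stand-in are hypothetical and do not reflect Logprep's internal factory API.

```python
from typing import Any, Callable

# Hypothetical registry: maps the value of the "type" key to a constructor.
REGISTRY: dict[str, Callable[[str, dict[str, Any]], Any]] = {}


def register(type_name: str):
    """Class decorator that records a component under its type name."""
    def decorator(cls):
        REGISTRY[type_name] = cls
        return cls
    return decorator


@register("dropper")
class Dropper:
    """Stand-in component that simply keeps its name and settings."""
    def __init__(self, name: str, config: dict[str, Any]):
        self.name = name
        self.config = config


def create(entry: dict[str, dict[str, Any]]) -> Any:
    """Build one component from a single-key mapping like {"dropper": {...}}."""
    if len(entry) != 1:
        raise ValueError("each entry must have exactly one name")
    name, config = next(iter(entry.items()))
    type_name = config.get("type")
    if type_name not in REGISTRY:
        raise ValueError(f"unknown type {type_name!r} for {name!r}")
    return REGISTRY[type_name](name, config)
```

Separating the free-form name from the type is what allows the same processor type to appear several times in one pipeline under different names.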

Environment variables can be used anywhere in configuration and rule files. They must be uppercase and prefixed with LOGPREP_, GITHUB_, PYTEST_ or CI_; lowercase variables are ignored. The variable name LOGPREP_LIST is forbidden, as it is already used internally.
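The prefix rule can be illustrated with a small substitution function. It is a sketch under assumptions, not Logprep's code: resolve_env is a hypothetical name, both $NAME and ${NAME} syntaxes are accepted, and unset variables are left unchanged (Logprep may handle them differently).

```python
import re
from typing import Mapping

# Prefixes and forbidden names as listed in the documentation above.
VALID_PREFIXES = ("LOGPREP_", "GITHUB_", "PYTEST_", "CI_")
FORBIDDEN_NAMES = {"LOGPREP_LIST"}

# Matches ${NAME} or $NAME, where NAME consists of uppercase letters, digits and underscores.
VARIABLE = re.compile(r"\$(?:\{([A-Z][A-Z0-9_]*)\}|([A-Z][A-Z0-9_]*))")


def resolve_env(text: str, env: Mapping[str, str]) -> str:
    """Hypothetical sketch: substitute only allowed variables that are set."""
    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        if name in FORBIDDEN_NAMES or not name.startswith(VALID_PREFIXES):
            return match.group(0)  # disallowed variables stay untouched
        return env.get(name, match.group(0))  # assumption: unset variables stay as-is
    return VARIABLE.sub(substitute, text)
```

Because the value is inserted verbatim, a multi-line variable such as LOGPREP_PIPELINE in the following example must carry its own top-level key and indentation.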

Security Best Practice - Configuration Environment Variables

Since every configuration option can be replaced with an environment variable, it is recommended to use environment variables especially for sensitive information such as usernames, passwords, secrets or hash salts. Examples include the key for the HMAC calculation (see input > preprocessing) or the user/secret for the Elasticsearch/OpenSearch connectors.

The following configuration file becomes valid once the given environment variables are set:

pipeline.yml config file with environment variables
version: $LOGPREP_VERSION
process_count: $LOGPREP_PROCESS_COUNT
timeout: 0.1
logger:
    level: $LOGPREP_LOG_LEVEL
$LOGPREP_PIPELINE
$LOGPREP_INPUT
$LOGPREP_OUTPUT
setting the bash environment variables
export LOGPREP_VERSION="1"
export LOGPREP_PROCESS_COUNT="1"
export LOGPREP_LOG_LEVEL="DEBUG"
export LOGPREP_PIPELINE="
pipeline:
    - labelername:
        type: labeler
        schema: quickstart/exampledata/rules/labeler/schema.json
        include_parent_labels: true
        specific_rules:
            - quickstart/exampledata/rules/labeler/specific
        generic_rules:
            - quickstart/exampledata/rules/labeler/generic"
export LOGPREP_OUTPUT="
output:
    kafka:
        type: confluentkafka_output
        topic: producer
        error_topic: producer_error
        flush_timeout: 30
        send_timeout: 2
        kafka_config:
            bootstrap.servers: localhost:9092"
export LOGPREP_INPUT="
input:
    kafka:
        type: confluentkafka_input
        topic: consumer
        offset_reset_policy: smallest
        kafka_config:
            bootstrap.servers: localhost:9092
            group.id: test"
class logprep.util.configuration.Configuration

The configuration class.

version: str

Optionally, a version can be set for the configuration file, which can be printed via logprep run --version config/pipeline.yml. This has no effect on the execution of Logprep and is merely used for documentation purposes. Defaults to unset.

config_refresh_interval: int | None

Configures the interval in seconds at which Logprep tries to reload the configuration. If configured, the configuration is only reloaded if its version has changed. If HTTP errors occur during a reload, config_refresh_interval is reduced to a quarter of its current value, down to a minimum of 5 seconds. Defaults to None, which means that the configuration is not refreshed automatically.
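The back-off rule for failed reloads can be expressed as a small function. This is a sketch: next_refresh_interval and should_apply are hypothetical names, integer division is an assumed rounding rule, and the sketch does not model whether the configured interval is restored after a successful reload.

```python
MIN_REFRESH_INTERVAL = 5  # seconds, as documented above


def next_refresh_interval(current: int) -> int:
    """Interval after a failed (HTTP error) reload: a quarter of the current one, at least 5 s.

    Assumption: integer division is used for rounding.
    """
    return max(MIN_REFRESH_INTERVAL, current // 4)


def should_apply(current_version: str, new_version: str) -> bool:
    """A fetched configuration is only applied if its version differs."""
    return new_version != current_version
```

Starting from 300 seconds, repeated failures would shrink the interval to 75, 18 and then 5 seconds, where it stays.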

Security Best Practice - Configuration Refresh Interval

The refresh interval for the configuration shouldn't be set too high in production environments. It is suggested not to set a value higher than 300 (5 minutes), so that configuration updates are propagated quickly rather than, for example, only once a day.

Note also that a new configuration is accepted as long as it is valid. There is no further check of its authenticity.

If a new configuration cannot be retrieved and config_refresh_interval has already been reduced automatically to 5 seconds, Logprep retries the reload very frequently, which can lead to blocking behavior or a significant reduction in performance. Therefore, ensure that the configuration endpoint is always available.

process_count: int

Number of logprep processes to start. Defaults to 1.

timeout: float

Logprep tries to react to signals (such as those sent by CTRL+C) within the given time. Since the time taken by some processing steps is not always predictable, it is not guaranteed that this time is adhered to. Small values (< 1.0) make Logprep react quickly but require more processing power, which can be useful for testing and debugging. Larger values (like 5.0) slow the reaction down but require less processing power, which makes them preferable for continuous operation. Defaults to 5.0.
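The trade-off behind timeout follows from a common polling pattern: the main loop only checks for a stop request between blocking reads, so each read must return after at most timeout seconds. The sketch below assumes such a loop; run_until_stopped and poll are hypothetical names, and this is not Logprep's actual process loop.

```python
import threading
from typing import Callable


def run_until_stopped(stop: threading.Event, timeout: float,
                      poll: Callable[[float], None]) -> int:
    """Poll for work until stop is set; return the number of polls.

    poll(timeout) is expected to block for at most `timeout` seconds, so a
    request to stop is noticed after roughly one timeout plus the time
    spent processing the current batch.
    """
    polls = 0
    while not stop.is_set():
        poll(timeout)
        polls += 1
    return polls
```

A signal handler would set the event, for example signal.signal(signal.SIGINT, lambda signum, frame: stop.set()). A small timeout means the loop wakes up more often even when no events arrive, which is where the extra processing power goes.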

logger: dict

Logger configuration. Defaults to {"level": "INFO"}.

Security Best Practice - Logprep Log-Level

The log level of Logprep should be set to "INFO" in production environments, as the "DEBUG" level could write sensitive events to the log.
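The log-level recommendation above and the suggested refresh-interval limit of 300 seconds could be checked automatically before deployment. security_warnings is a hypothetical helper that operates on an already parsed configuration dictionary; this page does not state that Logprep performs such checks itself.

```python
from typing import Any


def security_warnings(config: dict[str, Any]) -> list[str]:
    """Hypothetical lint for two recommendations from this page."""
    warnings = []
    # The logger defaults to {"level": "INFO"} when not configured.
    level = str(config.get("logger", {}).get("level", "INFO")).upper()
    if level == "DEBUG":
        warnings.append("logger.level is DEBUG; use INFO in production")
    # None (the default) means no automatic refresh, which is not flagged here.
    interval = config.get("config_refresh_interval")
    if interval is not None and interval > 300:
        warnings.append("config_refresh_interval is above 300 seconds")
    return warnings
```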

input: dict

Input connector configuration. Defaults to {}. For detailed configurations see Input.

output: dict

Output connector configuration. Defaults to {}. For detailed configurations see Output.

pipeline: list[dict]

Pipeline configuration. Defaults to []. See Processors for a detailed overview on how to configure a pipeline.

metrics: MetricsConfig

Metrics configuration. Defaults to {"enabled": False, "port": 8000, "uvicorn_config": {}}.

The key uvicorn_config can be configured with any uvicorn config parameters. For further information see the uvicorn documentation.
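The documented defaults can be mirrored in a dataclass that fills in missing keys. This MetricsConfig is a stand-in written for illustration, not Logprep's class of the same name; rejecting unknown top-level keys is an assumption, while uvicorn_config is passed through unchecked because it may contain any uvicorn parameter.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetricsConfig:
    """Sketch of the documented metrics defaults; not Logprep's actual class."""
    enabled: bool = False
    port: int = 8000
    uvicorn_config: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MetricsConfig":
        """Fill in defaults for missing keys; reject unknown top-level keys."""
        unknown = set(data) - {"enabled", "port", "uvicorn_config"}
        if unknown:
            raise ValueError(f"unknown metrics options: {sorted(unknown)}")
        return cls(**data)
```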

Security Best Practice - Metrics Configuration

In addition to the settings below, it is recommended to configure SSL on the metrics server endpoint.

metrics:
  enabled: true
  port: 9000
  uvicorn_config:
    access_log: true
    server_header: false
    date_header: false
    workers: 1
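uvicorn's own settings for TLS are ssl_certfile and ssl_keyfile, so, assuming uvicorn_config is forwarded to uvicorn unchanged, SSL for the metrics endpoint would be configured by adding these two keys. The check below is a hypothetical helper operating on the parsed metrics section.

```python
from typing import Any


def metrics_tls_enabled(metrics: dict[str, Any]) -> bool:
    """True if uvicorn_config sets both uvicorn's ssl_certfile and ssl_keyfile."""
    uvicorn_config = metrics.get("uvicorn_config", {})
    return bool(uvicorn_config.get("ssl_certfile")) and bool(uvicorn_config.get("ssl_keyfile"))
```

The example configuration above would fail this check; adding ssl_certfile and ssl_keyfile entries (pointing at your own certificate and key files) to its uvicorn_config would make it pass.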
profile_pipelines: bool

Start the profiler to profile the pipeline. Defaults to False.
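For reference, the attributes and defaults listed above can be collected in one dataclass. ConfigurationSketch is a hypothetical stand-in, not logprep.util.configuration.Configuration, and reading "Defaults to unset" as the string "unset" is an assumption. Mutable defaults use default_factory so that instances do not share the same dictionaries or lists.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ConfigurationSketch:
    """Stand-in mirroring the documented attributes and defaults."""
    version: str = "unset"  # assumption: "Defaults to unset" means the string "unset"
    config_refresh_interval: Optional[int] = None  # None: no automatic refresh
    process_count: int = 1
    timeout: float = 5.0
    logger: dict[str, Any] = field(default_factory=lambda: {"level": "INFO"})
    input: dict[str, Any] = field(default_factory=dict)
    output: dict[str, Any] = field(default_factory=dict)
    pipeline: list[dict[str, Any]] = field(default_factory=list)
    metrics: dict[str, Any] = field(
        default_factory=lambda: {"enabled": False, "port": 8000, "uvicorn_config": {}}
    )
    profile_pipelines: bool = False
```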