Configuration
Configuration is done via YAML or JSON files or HTTP API resources.
Logprep searches for the file /etc/logprep/pipeline.yml if no configuration file is passed.
You can pass multiple configuration files via valid file paths or URLs.
logprep run /different/path/file.yml
logprep run http://url-to-our-yaml-file-or-api
logprep run http://api/v1/pipeline http://api/v1/addition_processor_pipline /path/to/conector.yaml
Security Best Practice - Configuration - Combining multiple configuration files
Note that when multiple configuration files are used, Logprep rejects all of them if any single one cannot be retrieved or is not valid. When using multiple files, ensure that all of them can be loaded safely and that all endpoints (if using HTTP resources) are accessible.
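The all-or-nothing rule above can be checked before Logprep is started. The following is a minimal sketch, not Logprep's own loader. The names preflight_check and all_sources_usable are hypothetical, and only local JSON files are handled so that the example stays within the Python standard library (YAML parsing and HTTP retrieval are left out).

```python
import json
from pathlib import Path


def preflight_check(sources: list[str]) -> dict[str, str]:
    """Return a mapping of source -> error message for every source that fails.

    Hypothetical helper: only local JSON files are handled in this sketch.
    An empty result means every source could be read and parsed.
    """
    errors: dict[str, str] = {}
    for source in sources:
        try:
            parsed = json.loads(Path(source).read_text(encoding="utf-8"))
        except OSError as error:  # missing file, permission problem, ...
            errors[source] = f"cannot be retrieved: {error}"
            continue
        except ValueError as error:  # invalid JSON or undecodable bytes
            errors[source] = f"is not valid JSON: {error}"
            continue
        if not isinstance(parsed, dict):
            errors[source] = "top level is not a mapping"
    return errors


def all_sources_usable(sources: list[str]) -> bool:
    """Mirror the all-or-nothing rule: one bad source invalidates the whole set."""
    return not preflight_check(sources)
```

Running such a check in a deployment script makes it visible which source is at fault before the whole set of configuration files is rejected.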
Configuration File Structure
version: config-1.0
process_count: 2
timeout: 5
logger:
    level: INFO
input:
    kafka:
        type: confluentkafka_input
        topic: consumer
        offset_reset_policy: smallest
        kafka_config:
            bootstrap.servers: localhost:9092
            group.id: test
output:
    kafka:
        type: confluentkafka_output
        topic: producer
        error_topic: producer_error
        flush_timeout: 30
        send_timeout: 2
        kafka_config:
            bootstrap.servers: localhost:9092
pipeline:
    - labelername:
        type: labeler
        schema: quickstart/exampledata/rules/labeler/schema.json
        include_parent_labels: true
        specific_rules:
            - quickstart/exampledata/rules/labeler/specific
        generic_rules:
            - quickstart/exampledata/rules/labeler/generic
    - dissectorname:
        type: dissector
        specific_rules:
            - quickstart/exampledata/rules/dissector/specific/
        generic_rules:
            - quickstart/exampledata/rules/dissector/generic/
    - dropper:
        type: dropper
        specific_rules:
            - quickstart/exampledata/rules/dropper/specific
        generic_rules:
            - quickstart/exampledata/rules/dropper/generic
            - filter: "test_dropper"
              dropper:
                  drop:
                      - drop_me
              description: "..."
    - pre_detector:
        type: pre_detector
        specific_rules:
            - quickstart/exampledata/rules/pre_detector/specific
        generic_rules:
            - quickstart/exampledata/rules/pre_detector/generic
        outputs:
            - opensearch: sre
        tree_config: quickstart/exampledata/rules/pre_detector/tree_config.json
        alert_ip_list_path: quickstart/exampledata/rules/pre_detector/alert_ips.yml
    - amides:
        type: amides
        specific_rules:
            - quickstart/exampledata/rules/amides/specific
        generic_rules:
            - quickstart/exampledata/rules/amides/generic
        models_path: quickstart/exampledata/models/model.zip
        num_rule_attributions: 10
        max_cache_entries: 1000000
        decision_threshold: 0.32
    - pseudonymizer:
        type: pseudonymizer
        pubkey_analyst: quickstart/exampledata/rules/pseudonymizer/example_analyst_pub.pem
        pubkey_depseudo: quickstart/exampledata/rules/pseudonymizer/example_depseudo_pub.pem
        regex_mapping: quickstart/exampledata/rules/pseudonymizer/regex_mapping.yml
        hash_salt: a_secret_tasty_ingredient
        outputs:
            - opensearch: pseudonyms
        specific_rules:
            - quickstart/exampledata/rules/pseudonymizer/specific/
        generic_rules:
            - quickstart/exampledata/rules/pseudonymizer/generic/
        max_cached_pseudonyms: 1000000
    - calculator:
        type: calculator
        specific_rules:
            - filter: "test_label: execute"
              calculator:
                  target_field: "calculation"
                  calc: "1 + 1"
        generic_rules: []
The options under input, output and pipeline are passed to factories in Logprep.
They contain settings for each separate processor and connector.
Details for configuring connectors are described in Output and Input and for processors in Processors.
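To illustrate what "passed to factories" can look like, here is a minimal sketch of a registry that builds one component from a single-key entry of the form {name: {type: ..., options}}. It is an illustration under assumptions and not Logprep's factory code: Component, Labeler, Dropper, REGISTRY and create are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical base class: a named component plus its remaining options."""

    name: str
    options: dict = field(default_factory=dict)


class Labeler(Component):
    pass


class Dropper(Component):
    pass


# Hypothetical registry mapping the value of `type` to a class.
REGISTRY = {"labeler": Labeler, "dropper": Dropper}


def create(entry: dict) -> Component:
    """Build one component from a single-key entry such as {"dropper": {...}}."""
    if len(entry) != 1:
        raise ValueError("an entry must have exactly one name")
    name, settings = next(iter(entry.items()))
    settings = dict(settings)  # copy so the caller's configuration is not modified
    type_name = settings.pop("type", None)
    if type_name not in REGISTRY:
        raise ValueError(f"unknown type: {type_name!r}")
    return REGISTRY[type_name](name=name, options=settings)


pipeline = [
    {"labelername": {"type": "labeler", "include_parent_labels": True}},
    {"dropper": {"type": "dropper", "generic_rules": []}},
]
processors = [create(entry) for entry in pipeline]
```

The point of the pattern is that the type key selects the class, while every other key in the entry is handed to that class as its settings.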
It is possible to use environment variables in all configuration and rule files in all places.
Environment variables have to be set in uppercase and prefixed with LOGPREP_, GITHUB_, PYTEST_ or CI_. Lowercase variables are ignored. Forbidden variable names are: ["LOGPREP_LIST"], as it is already used internally.
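The naming rules above can be sketched as a small substitution function. This is an illustration only and not Logprep's implementation: the names is_usable and substitute are hypothetical, only the $NAME syntax is handled, and leaving unset, lowercase, unprefixed or forbidden variables untouched in the text is an assumption of this sketch.

```python
import re

ALLOWED_PREFIXES = ("LOGPREP_", "GITHUB_", "PYTEST_", "CI_")
FORBIDDEN_NAMES = {"LOGPREP_LIST"}
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def is_usable(name: str) -> bool:
    """Apply the naming rules: uppercase, allowed prefix, not forbidden."""
    return (
        name == name.upper()
        and name.startswith(ALLOWED_PREFIXES)
        and name not in FORBIDDEN_NAMES
    )


def substitute(text: str, environ: dict[str, str]) -> str:
    """Replace $NAME with its value; leave everything else untouched (assumption)."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if is_usable(name) and name in environ:
            return environ[name]
        return match.group(0)  # keep the original "$NAME" text

    return VARIABLE.sub(replace, text)
```

For example, substitute("level: $LOGPREP_LOG_LEVEL", {"LOGPREP_LOG_LEVEL": "DEBUG"}) returns "level: DEBUG", while "$HOME" or "$logprep_level" would be left as written.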
Security Best Practice - Configuration Environment Variables
As it is possible to replace all configuration options with environment variables, it is recommended to use them especially for sensitive information like usernames, passwords, secrets or hash salts.
Examples where this could be useful would be the key for the HMAC calculation (see input > preprocessing) or the user/secret for the Elasticsearch/OpenSearch connectors.
The following configuration file becomes valid once the given environment variables are set:
version: $LOGPREP_VERSION
process_count: $LOGPREP_PROCESS_COUNT
timeout: 0.1
logger:
    level: $LOGPREP_LOG_LEVEL
$LOGPREP_PIPELINE
$LOGPREP_INPUT
$LOGPREP_OUTPUT
export LOGPREP_VERSION="1"
export LOGPREP_PROCESS_COUNT="1"
export LOGPREP_LOG_LEVEL="DEBUG"
export LOGPREP_PIPELINE="
pipeline:
    - labelername:
        type: labeler
        schema: quickstart/exampledata/rules/labeler/schema.json
        include_parent_labels: true
        specific_rules:
            - quickstart/exampledata/rules/labeler/specific
        generic_rules:
            - quickstart/exampledata/rules/labeler/generic"
export LOGPREP_OUTPUT="
output:
    kafka:
        type: confluentkafka_output
        topic: producer
        error_topic: producer_error
        flush_timeout: 30
        send_timeout: 2
        kafka_config:
            bootstrap.servers: localhost:9092"
export LOGPREP_INPUT="
input:
    kafka:
        type: confluentkafka_input
        topic: consumer
        offset_reset_policy: smallest
        kafka_config:
            bootstrap.servers: localhost:9092
            group.id: test"
- class logprep.util.configuration.Configuration
the configuration class
- version: str
It is optionally possible to set a version for your configuration file, which can be printed via logprep run --version config/pipeline.yml. This has no effect on the execution of Logprep and is merely used for documentation purposes. Defaults to unset.
- config_refresh_interval: int | None
Configures the interval in seconds at which Logprep should try to reload the configuration. If not configured, Logprep won’t reload the configuration automatically. If configured, the configuration will only be reloaded if the configuration version changes. If HTTP errors occur on configuration reload, config_refresh_interval is set to a quarter of the current config_refresh_interval until a minimum of 5 seconds is reached. Defaults to None, which means that the configuration will not be refreshed.
Security Best Practice - Configuration Refresh Interval
The refresh interval for the configuration shouldn’t be set too high in production environments. It is suggested to not set a value higher than 300 (5 min). That way configuration updates are propagated fairly quickly instead of once a day.
It should also be noted that a new configuration file will be read as long as it is a valid config. There is no further check to ensure credibility.
In case a new configuration could not be retrieved successfully and the config_refresh_interval has already been reduced automatically to 5 seconds, this could lead to blocking behavior or a significant reduction in performance, as Logprep frequently retries reloading the configuration. Therefore, ensure that the configuration endpoint is always available.
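The reduction of the interval on HTTP errors can be worked through as simple arithmetic. The sketch below only mirrors the rule stated above (a quarter of the current value, never below 5 seconds). The function names are hypothetical, and rounding down with integer division is an assumption, since the text does not say how fractions are handled.

```python
MIN_REFRESH_INTERVAL = 5  # seconds, the documented lower bound


def reduced_interval(current: int) -> int:
    """Quarter the interval, but never go below the minimum (rounding is assumed)."""
    return max(MIN_REFRESH_INTERVAL, current // 4)


def backoff_sequence(start: int) -> list[int]:
    """List the intervals used while every reload attempt keeps failing."""
    sequence = [start]
    while sequence[-1] > MIN_REFRESH_INTERVAL:
        sequence.append(reduced_interval(sequence[-1]))
    return sequence


print(backoff_sequence(300))  # prints [300, 75, 18, 5]
```

Starting from the suggested maximum of 300 seconds, three consecutive failures are enough to reach the 5 second minimum, which is why an unreachable endpoint quickly turns into frequent retries.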
- process_count: int
Number of Logprep processes to start. Defaults to 1.
- timeout: float
Logprep tries to react to signals (like those sent by CTRL+C) within the given time. The time taken for some processing steps is not always predictable, thus it is not possible to ensure that this time will be adhered to. However, Logprep reacts quickly for small values (< 1.0), but this requires more processing power. This can be useful for testing and debugging. Larger values (like 5.0) slow the reaction time down, but this requires less processing power, which makes it preferable for continuous operation. Defaults to 5.0.
- logger: dict
Logger configuration. Defaults to {"level": "INFO"}.
Security Best Practice - Logprep Log-Level
The log level of Logprep should be set to "INFO" in production environments, as the "DEBUG" level could expose sensitive events in the log.
- input: dict
Input connector configuration. Defaults to {}. For detailed configurations see Input.
- output: dict
Output connector configuration. Defaults to {}. For detailed configurations see Output.
- pipeline: list[dict]
Pipeline configuration. Defaults to []. See Processors for a detailed overview on how to configure a pipeline.
- metrics: MetricsConfig
Metrics configuration. Defaults to {"enabled": False, "port": 8000, "uvicorn_config": {}}.
The key uvicorn_config can be configured with any uvicorn config parameters. For further information see the uvicorn documentation.
Security Best Practice - Metrics Configuration
In addition to the settings below, it is recommended to configure SSL on the metrics server endpoint.
metrics:
    enabled: true
    port: 9000
    uvicorn_config:
        access_log: true
        server_header: false
        date_header: false
        workers: 1
- profile_pipelines: bool
Start the profiler to profile the pipeline. Defaults to False.
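As a compact summary of the options and defaults listed above, the following sketch collects them in a dataclass. It is a hypothetical stand-in named ConfigurationSketch and not the real logprep.util.configuration.Configuration class. In particular, metrics is modelled as a plain dict here, whereas the reference above types it as MetricsConfig.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfigurationSketch:
    """Hypothetical stand-in listing the documented options and their defaults."""

    version: str = "unset"
    config_refresh_interval: Optional[int] = None  # None: no automatic reload
    process_count: int = 1
    timeout: float = 5.0
    logger: dict = field(default_factory=lambda: {"level": "INFO"})
    input: dict = field(default_factory=dict)
    output: dict = field(default_factory=dict)
    pipeline: list = field(default_factory=list)
    metrics: dict = field(
        default_factory=lambda: {"enabled": False, "port": 8000, "uvicorn_config": {}}
    )
    profile_pipelines: bool = False


# Only the options that are given override the documented defaults.
config = ConfigurationSketch(process_count=2, timeout=0.1)
```

Here config.process_count is 2 and config.timeout is 0.1, while every other attribute keeps the default from the list above.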
- Input
- Output
- Processors
- Amides
- Calculator
- Clusterer
- Concatenator
- DatetimeExtractor
- Deleter
- Dissector
- DomainLabelExtractor
- DomainResolver
- Dropper
- FieldManager
- GenericAdder
- GenericResolver
- GeoipEnricher
- Grokker
- HyperscanResolver
- IpInformer
- KeyChecker
- Labeler
- ListComparison
- Normalizer
- PreDetector
- Pseudonymizer
- Requester
- SelectiveExtractor
- StringSplitter
- TemplateReplacer
- Timestamper
- TimestampDiffer
- Rules
- Getters
- Metrics