Input
Security Best Practice - Input Connectors
It is advised to only use the ConfluentKafkaInput, HttpConnector or
FileInput as input connectors in production environments.
The connectors DummyInput, JsonInput and JsonlInput are mainly designed
for debugging purposes.
Furthermore, it is suggested to enable the HMAC preprocessor so that tampering with
processed events can be detected.
hmac:
  target: <RAW_MSG>
  key: <SECRET>
  output_field: HMAC
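To illustrate what such an HMAC allows a downstream consumer to check, here is a minimal sketch. It assumes HMAC-SHA256 and zlib compression, neither of which is stated in this section, and the function names are hypothetical. Only the subfield names hmac and compressed_base64 are taken from the preprocessing description of the connectors.

```python
import base64
import hashlib
import hmac
import zlib


def attach_hmac(raw_message: bytes, key: str, output_field: str = "HMAC") -> dict:
    """Build the output field for a raw message (hypothetical helper)."""
    # Assumption: SHA-256 as the digest and zlib as the compression.
    digest = hmac.new(key.encode("utf-8"), raw_message, hashlib.sha256).hexdigest()
    compressed = base64.b64encode(zlib.compress(raw_message)).decode("ascii")
    return {output_field: {"hmac": digest, "compressed_base64": compressed}}


def verify_hmac(record: dict, key: str, output_field: str = "HMAC") -> bool:
    """Recompute the HMAC from the stored original message and compare."""
    entry = record[output_field]
    original = zlib.decompress(base64.b64decode(entry["compressed_base64"]))
    expected = hmac.new(key.encode("utf-8"), original, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, entry["hmac"])
```

A record whose stored message or HMAC was changed after processing, or that is checked with a different key, fails verification.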
ConfluentKafkaInput
Logprep uses the confluent-kafka Python client library to communicate with Kafka clusters.
Example
input:
  mykafkainput:
    type: confluentkafka_input
    topic: consumer
    kafka_config:
      bootstrap.servers: "127.0.0.1:9092,127.0.0.1:9093"
      group.id: "cgroup"
      enable.auto.commit: "true"
      session.timeout.ms: "6000"
      auto.offset.reset: "earliest"
- class logprep.connector.confluent_kafka.input.ConfluentKafkaInput.Config
Kafka input connector specific configurations
- topic: str
The topic from which new log messages will be fetched.
- kafka_config: MappingProxyType | None
Kafka configuration for the Kafka client. At minimum, the following keys must be set:
bootstrap.servers (STRING): a comma-separated list of Kafka brokers
group.id (STRING): a unique identifier for the consumer group
The following keys are injected by the connector and should not be set:
enable.auto.offset.store is set to false
enable.auto.commit is set to true
For additional configuration options, see the official librdkafka configuration documentation.
DEFAULTS:
enable.auto.offset.store: false
enable.auto.commit: true
client.id: <<hostname>>
auto.offset.reset: earliest
session.timeout.ms: 6000
statistics.interval.ms: 30000
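The interplay of required keys, defaults and injected keys can be sketched as follows. This is not the connector's code: the function name and the merge order (defaults, then user settings, then injected keys) are assumptions made for illustration.

```python
import socket
from types import MappingProxyType

# Values taken from the DEFAULTS list above; client.id is filled in at runtime.
DEFAULTS = {
    "auto.offset.reset": "earliest",
    "session.timeout.ms": "6000",
    "statistics.interval.ms": "30000",
}
# Keys the documentation says are injected by the connector.
INJECTED = {"enable.auto.offset.store": "false", "enable.auto.commit": "true"}
REQUIRED = ("bootstrap.servers", "group.id")


def build_kafka_config(user_config: dict) -> MappingProxyType:
    """Merge user settings with defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED if key not in user_config]
    if missing:
        raise ValueError(f"missing required kafka_config keys: {missing}")
    merged = {
        **DEFAULTS,
        "client.id": socket.gethostname(),
        **user_config,  # user values override defaults ...
        **INJECTED,     # ... but injected keys always win (assumption)
    }
    return MappingProxyType(merged)  # read-only view, like the documented type
```

With this ordering a user-supplied session.timeout.ms replaces the default, while a user-supplied enable.auto.commit is overwritten.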
- preprocessing: dict
All input connectors support different preprocessing methods:
log_arrival_time_target_field - It is possible to automatically add the arrival time in Logprep to every incoming log message. To enable adding arrival times to each event, the keyword log_arrival_time_target_field has to be set under the field preprocessing. It defines the name of the dotted field in which the arrival times should be stored. If the fields preprocessing and log_arrival_time_target_field are not present, no arrival timestamp is added to the event.
log_arrival_timedelta - It is possible to automatically calculate the difference between the arrival time of logs in Logprep and their generation timestamp, which is then added to every incoming log message. To enable adding delta times to each event, the keyword log_arrival_time_target_field has to be set as a precondition (see above). Furthermore, two configurations for the timedelta are needed: a target_field as well as a reference_field has to be set.
  target_field - Defines the field name to which the time difference should be written.
  reference_field - Defines a field with a timestamp that should be used for the time difference. The calculation will be the arrival time minus the time of this reference field.
version_info_target_field - If required, it is possible to automatically add the Logprep version and the used configuration version to every incoming log message. This helps to keep track of the processing of the events when the configuration changes often. To enable adding the versions to each event, the keyword version_info_target_field has to be set under the field preprocessing. It defines the name of the parent field under which the version info should be given. If the fields preprocessing and version_info_target_field are not present, no version information is added to the event.
hmac - If required, it is possible to automatically attach an HMAC to incoming log messages. To activate this preprocessor, the following options should be appended to the preprocessor options. This field is completely optional and can be omitted if no HMAC is needed.
  target - Defines a field inside the log message which should be used for the HMAC calculation. If the target field is not found or does not exist, an error message is written into the configured output field. If the HMAC should be calculated on the full incoming raw message instead of a subfield, the target option should be set to <RAW_MSG>.
  key - The secret key that will be used to calculate the HMAC.
  output_field - The parent name of the field where the HMAC result should be written to in the original incoming log message. As subfields, the result will have a field called hmac, containing the calculated HMAC, and compressed_base64, containing the original message that was used to calculate the HMAC, compressed and base64 encoded. In case the output field already exists in the original message, an error is raised.
enrich_by_env_variables - If required, it is possible to automatically enrich incoming events with environment variables. To activate this preprocessor, the field's value has to be a mapping from the target field name (key) to the environment variable name (value).
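The arrival time, time delta and environment enrichment described above can be modelled with a few lines of Python. This is only a sketch: the helper names are hypothetical, timestamps are assumed to be ISO 8601 strings, the delta is assumed to be expressed in seconds, and a missing environment variable is assumed to yield an empty string.

```python
import os
from datetime import datetime, timezone


def set_dotted(event: dict, dotted_field: str, value) -> None:
    """Write value to a dotted field such as 'event.ingested'."""
    *parents, leaf = dotted_field.split(".")
    node = event
    for parent in parents:
        node = node.setdefault(parent, {})
    node[leaf] = value


def get_dotted(event: dict, dotted_field: str):
    """Read a dotted field; raises KeyError if it is missing."""
    node = event
    for part in dotted_field.split("."):
        node = node[part]
    return node


def add_arrival_info(event, arrival_field, delta_target, reference_field, now=None):
    """Add the arrival time and (arrival time - reference time)."""
    now = now or datetime.now(timezone.utc)
    set_dotted(event, arrival_field, now.isoformat())
    reference = datetime.fromisoformat(get_dotted(event, reference_field))
    set_dotted(event, delta_target, (now - reference).total_seconds())


def enrich_by_env(event, mapping, environ=os.environ):
    """mapping: target field name (key) -> environment variable name (value)."""
    for target_field, variable in mapping.items():
        set_dotted(event, target_field, environ.get(variable, ""))
```

Passing now and environ explicitly keeps the sketch deterministic; a real pipeline would use the current time and the process environment.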
- type: str
Type of the component
- health_timeout: float
Timeout in seconds for the health check. Default is 1 second.
DummyInput
A dummy input that returns the documents it was initialized with.
If a “document” is derived from Exception, that exception will be thrown instead of returning a document. The exception will be removed and subsequent calls may return documents or throw other exceptions in the given order.
Example
input:
  mydummyinput:
    type: dummy_input
    documents: [{"document":"one"}, "Exception", {"document":"two"}]
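The documented behaviour (documents are returned in order, and an entry derived from Exception is raised instead of returned) can be pictured with the following sketch. The class name is hypothetical and this is not the connector's implementation; only the method name get_next is borrowed from the connector documentation.

```python
class DummyDocuments:
    """Returns the documents it was initialized with, one per call."""

    def __init__(self, documents):
        self._documents = list(documents)

    def get_next(self):
        if not self._documents:
            return None  # assumption: nothing left means no document
        item = self._documents.pop(0)  # the entry is removed either way
        is_exception_class = isinstance(item, type) and issubclass(item, Exception)
        if is_exception_class or isinstance(item, Exception):
            raise item
        return item
```

Because the exception entry is removed before it is raised, the next call continues with the following document.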
- class logprep.connector.dummy.input.DummyInput.Config
DummyInput specific configuration
- documents: List[dict | type | Exception]
A list of documents that should be returned.
- repeat_documents: str | None
If set to true, the given input documents are repeated after the last one is reached. Default: False
- preprocessing: dict
All input connectors support the same preprocessing methods: log_arrival_time_target_field, log_arrival_timedelta, version_info_target_field, hmac and enrich_by_env_variables. They are described in detail under the preprocessing option of the ConfluentKafkaInput configuration above.
- type: str
Type of the component
- health_timeout: float
Timeout in seconds for the health check. Default is 1 second.
HTTPInput
An HTTP input connector that spawns a uvicorn server, accepts HTTP requests, parses them,
puts them into an internal queue and pops them via the get_next method.
HTTP Connector Config Example
An example config file would look like:
input:
  myhttpinput:
    type: http_input
    message_backlog_size: 15000
    collect_meta: False
    metafield_name: "@metadata"
    uvicorn_config:
      host: 0.0.0.0
      port: 9000
    endpoints:
      /firstendpoint: json
      /second*: plaintext
      /(third|fourth)/endpoint: jsonl
- The endpoint config supports regex and wildcard patterns:
/second*: matches everything after the asterisk
/(third|fourth)/endpoint: matches either third or fourth in the first part
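One way to picture this matching is to translate the asterisk wildcard into a regular expression and test the request path against each configured pattern. The sketch below is an assumption about the semantics, not the connector's routing code, and both function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    # Assumption: "*" stands for "anything from here on"; everything else
    # in the pattern is interpreted as a regular expression.
    return re.compile(pattern.replace("*", ".*"))


def match_endpoint(path: str, endpoints: dict):
    """Return the endpoint type of the first pattern matching the whole path."""
    for pattern, endpoint_type in endpoints.items():
        if pattern_to_regex(pattern).fullmatch(path):
            return endpoint_type
    return None


ENDPOINTS = {
    "/firstendpoint": "json",
    "/second*": "plaintext",
    "/(third|fourth)/endpoint": "jsonl",
}
```

With these patterns, /second/anything resolves to plaintext and /fourth/endpoint to jsonl, while /fifth/endpoint matches nothing.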
Endpoint Credentials Config Example
By providing the path to a credentials file in the environment variable LOGPREP_CREDENTIALS_FILE, you can
add basic authentication for a specific endpoint. The format of this file looks like:
input:
  endpoints:
    /firstendpoint:
      username: user
      password_file: examples/exampledata/config/user_password.txt
    /second*:
      username: user
      password: secret_password
You can choose between a plain secret with the key password or a file-based secret
with the key password_file.
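The choice between password and password_file, and the check of a Basic Auth header against the resolved secret, might look like the sketch below. Both helper names are hypothetical, and stripping trailing whitespace from the password file is an assumption.

```python
import base64
import hmac
from pathlib import Path


def resolve_password(credentials: dict) -> str:
    """Return the secret from 'password' or from the file in 'password_file'."""
    if "password" in credentials:
        return credentials["password"]
    if "password_file" in credentials:
        # Assumption: surrounding whitespace/newlines are not part of the secret.
        return Path(credentials["password_file"]).read_text(encoding="utf-8").strip()
    raise ValueError("credentials need either 'password' or 'password_file'")


def check_basic_auth(header: str, username: str, password: str) -> bool:
    """Validate an 'Authorization: Basic ...' header value."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    user, _, secret = decoded.partition(":")
    # Constant-time comparison of both parts.
    user_ok = hmac.compare_digest(user.encode(), username.encode())
    secret_ok = hmac.compare_digest(secret.encode(), password.encode())
    return user_ok and secret_ok
```

Keeping the secret in a separate file makes it easier to keep it out of version control than a plain password entry.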
Security Best Practice - Http Input Connector - Authentication
- When using basic auth with the http input connector the following points should be taken into account:
basic auth must only be used with strong passwords
basic auth must only be used with TLS encryption
avoid revealing plaintext secrets in public repositories
Behaviour of HTTP Requests
GET:
Always responds with 200 (ignores configured Basic Auth)
When the message queue is full, it responds with 429
POST:
Responds with 200 on non-Basic Auth endpoints
Responds with 401 on Basic Auth endpoints (and 200 with appropriate credentials)
When the message queue is full, it responds with 429
ALL OTHER:
Responds with 405
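The status codes listed above can be summarised as a small decision function, which may be handy when writing client-side tests. It is a model of the documented behaviour only. The function name is hypothetical, and checking authentication before the queue limit for POST requests is an assumption, since the order is not specified here.

```python
def expected_status(method: str, *, auth_required: bool = False,
                    credentials_valid: bool = False,
                    queue_full: bool = False) -> int:
    """Return the status code the documentation describes for a request."""
    if method not in ("GET", "POST"):
        return 405  # all other methods
    if method == "POST" and auth_required and not credentials_valid:
        return 401  # Basic Auth only applies to POST; GET ignores it
    if queue_full:
        return 429  # message queue is full
    return 200
```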
- class logprep.connector.http.input.HttpInput.Config
Config for HTTPInput
- uvicorn_config: Mapping[str, str | int]
Configure uvicorn server. For possible settings see uvicorn settings page.
Security Best Practice - Uvicorn Webserver Configuration
In addition to the settings below, it is recommended to configure SSL on the metrics server endpoint.
uvicorn_config:
  access_log: true
  server_header: false
  date_header: false
  workers: 2
- endpoints: Mapping[str, str]
Configure endpoint routes with a mapping of a path to an endpoint. Possible endpoints are: json, jsonl, plaintext. It is possible to use wildcards and regexps for pattern matching.
- class PlaintextHttpEndpoint
plaintext endpoint to get the body from the request and put it in the message field
- class JSONLHttpEndpoint
jsonl endpoint to get jsonl from the request
- class JSONHttpEndpoint
json endpoint to get json from the request
- message_backlog_size: int
Configures the maximum size of the input message queue for this connector. When the limit is reached, the server answers with 429 Too Many Requests. For reasonable throughput this should not be smaller than the default value of 15000 messages.
- collect_meta: str
Defines if metadata should be collected:
True: Collect metadata
False: Won't collect metadata
Security Best Practice - Input Connector - HttpConnector
It is suggested to enable the collection of metadata (collect_meta: True) to ensure transparency of the incoming events.
- metafield_name: str
Defines the name of the key for the collected metadata fields
- preprocessing: dict
All input connectors support the same preprocessing methods: log_arrival_time_target_field, log_arrival_timedelta, version_info_target_field, hmac and enrich_by_env_variables. They are described in detail under the preprocessing option of the ConfluentKafkaInput configuration above.
- type: str
Type of the component
- health_timeout: float
Timeout in seconds for the health check. Default is 1 second.
JsonInput
A json input that returns the documents it was initialized with.
If a “document” is derived from Exception, that exception will be thrown instead of returning a document. The exception will be removed and subsequent calls may return documents or throw other exceptions in the given order.
Example
input:
  myjsoninput:
    type: json_input
    documents_path: path/to/a/document.json
    repeat_documents: true
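How a file given in documents_path could be turned into a stream of documents is sketched below. The helper names are hypothetical, and using itertools.cycle to model repeat_documents is an illustration rather than the connector's implementation.

```python
import itertools
import json
from pathlib import Path


def load_json_documents(documents_path: str) -> list:
    """Load a JSON file holding one dict or a list of dicts."""
    content = json.loads(Path(documents_path).read_text(encoding="utf-8"))
    # A single JSON object is treated like a list with one document.
    return content if isinstance(content, list) else [content]


def iter_documents(documents: list, repeat_documents: bool = False):
    """Yield documents once, or endlessly if repeat_documents is set."""
    return itertools.cycle(documents) if repeat_documents else iter(documents)
```

With repeat_documents enabled, the iterator starts over with the first document after the last one was returned.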
- class logprep.connector.json.input.JsonInput.Config
JsonInput connector specific configuration
- documents_path: str
A path to a file in JSON format, which can also include multiple JSON dicts wrapped in a list.
- repeat_documents: bool | None
If set to true, the given input documents are repeated after the last one is reached. Default: False
- preprocessing: dict
All input connectors support the same preprocessing methods: log_arrival_time_target_field, log_arrival_timedelta, version_info_target_field, hmac and enrich_by_env_variables. They are described in detail under the preprocessing option of the ConfluentKafkaInput configuration above.
- type: str
Type of the component
- health_timeout: float
Timeout in seconds for the health check. Default is 1 second.
JsonlInput
A json line input that returns the documents it was initialized with.
If a “document” is derived from Exception, that exception will be thrown instead of returning a document. The exception will be removed and subsequent calls may return documents or throw other exceptions in the given order.
Example
input:
  myjsonlinput:
    type: jsonl_input
    documents_path: path/to/a/document.jsonl
    repeat_documents: true
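For comparison with the JSON variant, a JSON Lines file holds one JSON document per line. The following sketch shows how such a file could be read; the function name is hypothetical and skipping blank lines is an assumption.

```python
import json


def load_jsonl_documents(documents_path: str) -> list:
    """Parse every non-empty line of a JSON Lines file as one document."""
    documents = []
    with open(documents_path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # assumption: blank lines are ignored
                documents.append(json.loads(line))
    return documents
```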
- class logprep.connector.jsonl.input.JsonlInput.Config
JsonlInput connector specific configuration
- documents_path: str
A path to a file in jsonl format, which contains one JSON document per line.
- repeat_documents: bool | None
If set to true, the given input documents are repeated after the last one is reached. Default: False
- preprocessing: dict
All input connectors support the same preprocessing methods: log_arrival_time_target_field, log_arrival_timedelta, version_info_target_field, hmac and enrich_by_env_variables. They are described in detail under the preprocessing option of the ConfluentKafkaInput configuration above.
- type: str
Type of the component
- health_timeout: float
Timeout in seconds for the health check. Default is 1 second.
FileInput
A generic line input that returns the documents it was initialized with. If a “document” is derived from Exception, that exception will be thrown instead of returning a document. The exception will be removed and subsequent calls may return documents or throw other exceptions in the given order.
Example
input:
  myfileinput:
    type: file_input
    logfile_path: path/to/a/document
    start: begin
    interval: 1
    watch_file: True
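The options start, interval and watch_file describe a classic "follow the file" loop. The sketch below shows the core of such a loop with two hypothetical helpers. It is not the connector's implementation and it only returns complete lines, which is an assumption.

```python
import os


def initial_offset(logfile_path: str, start: str = "begin") -> int:
    """Byte offset to start reading from, depending on the start option."""
    if start == "begin":
        return 0
    if start == "end":
        return os.path.getsize(logfile_path)
    raise ValueError("start must be 'begin' or 'end'")


def read_new_lines(logfile_path: str, offset: int):
    """Return (complete new lines, new offset) since the given offset."""
    with open(logfile_path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Keep an unfinished last line for the next call (assumption).
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

A monitor would call read_new_lines once per interval and feed the returned offset into the next call. With watch_file set to False, it would stop after the first pass.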
- class logprep.connector.file.input.FileInput.Config
FileInput connector specific configuration
- logfile_path: str
A path to a file in generic raw format, which can be in any string-based format. The content needs to be parsed with the dissector or another processor.
- start: str
Defines the behaviour of the file monitor with the following options:
begin: starts to read from the beginning of a file
end: goes initially to the end of the file and waits for new content
- watch_file: str
Defines the behaviour of the file monitor with the following options:
True: Read the file as defined in the start parameter and monitor it continuously for newly appended log lines or file changes
False: Read the file as defined in the start parameter only once and exit afterwards
- interval: int
Defines the refresh interval, how often the file is checked for changes
- preprocessing: dict
All input connectors support the same preprocessing methods: log_arrival_time_target_field, log_arrival_timedelta, version_info_target_field, hmac and enrich_by_env_variables. They are described in detail under the preprocessing option of the ConfluentKafkaInput configuration above.
- type: str
Type of the component
- health_timeout: float
Timeout in seconds for the health check. Default is 1 second.