Rules

Basic Functionality

How processors process log messages is defined via configurable rules. Each rule contains a filter that is used to select log messages. Other parameters within the rules define how certain log messages should be transformed. Those parameters depend on the processor for which they were created.

Rule Files

Rules are defined as YAML objects or JSON objects. Rules can be distributed over different files, or multiple rules can reside within one file. A file contains either one or more YAML documents, a single JSON object, or a JSON array of JSON objects. The YAML format is preferred, since it is a superset of JSON and has better readability.

Depending on the filter, a rule can trigger for different types of messages or just for specific log messages. In general, specific rules are applied first. Whether a rule is considered specific or generic depends on the directory in which it is located.

Further details can be found in the section for processors.

Example structure of a YAML file with a rule for the labeler processor

filter: 'command: execute'  # A comment
labeler:
  label:
    action:
    - execute
description: '...'

Example structure of a YAML file containing multiple rules for the labeler processor

filter: 'command: "execute something"'
labeler:
  label:
    action:
    - execute
description: '...'
---
filter: 'command: "terminate something"'
labeler:
  label:
    action:
    - execute
description: '...'

Example structure of a JSON file with a rule for the labeler processor

{
  "filter": "command: execute",
  "labeler": {
    "label": {
      "action": ["execute"]
    }
  },
  "description": "..."
}

Example structure of a JSON file containing multiple rules for the labeler processor

[
  {
    "filter": "command: execute",
    "labeler": {
      "label": {
        "action": ["execute"]
      }
    },
    "description": "..."
  },
  {
    "filter": "command: execute",
    "labeler": {
      "label": {
        "action": ["execute"]
      }
    },
    "description": "..."
  }
]
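Since a JSON rule file may hold either a single rule object or an array of rule objects, a consumer has to handle both shapes. The following is a minimal sketch of that normalization, not the tool's actual loader; the function name load_json_rules and the check for the filter key are assumptions made for illustration.

```python
import json


def load_json_rules(text):
    """Return a list of rule dicts from the text of a JSON rule file."""
    data = json.loads(text)
    # A file may hold a single rule object or an array of rule objects.
    rules = data if isinstance(data, list) else [data]
    for rule in rules:
        # Every rule is expected to carry a filter (assumed validation step).
        if not isinstance(rule, dict) or "filter" not in rule:
            raise ValueError("each rule must be an object with a 'filter' key")
    return rules


single = (
    '{"filter": "command: execute", '
    '"labeler": {"label": {"action": ["execute"]}}, '
    '"description": "..."}'
)
multiple = "[" + single + ", " + single + "]"

print(len(load_json_rules(single)))    # 1
print(len(load_json_rules(multiple)))  # 2
```

Note that json.loads rejects a missing comma between two keys, so a syntactically broken rule file fails at load time rather than producing a partial rule.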

Log message field value access

All rules reference fields or field values of log messages using dot notation. To reference a nested field inside the log event, give the whole path from the event root to the desired field. In the following example, the field information is referenced as more.nested.information. To access a specific item inside a list, extend the dotted notation with an index. In the following example, the list element lists is accessed with more.nested.sometimes.1. To select more than one element, slice the list with the pattern start:stop:step_size, e.g. more.nested.sometimes.0:2, which returns ["inside", "lists"]. Slicing follows native Python list slicing.

Example Event

{
  "some": "data",
  "more": {
    "nested": {
      "information": "is here",
      "sometimes": ["inside", "lists", "of", "elements"]
    }
  }
}
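The lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the function name get_dotted_field_value is hypothetical, and field names that themselves contain dots are not handled.

```python
def get_dotted_field_value(event, dotted_field):
    """Resolve a dotted path such as 'more.nested.sometimes.0:2' in an event."""
    current = event
    for key in dotted_field.split("."):
        if isinstance(current, dict):
            current = current[key]
        elif isinstance(current, list):
            if ":" in key:
                # Slice pattern start:stop:step_size; empty parts use the default.
                parts = [int(part) if part else None for part in key.split(":")]
                current = current[slice(*parts)]
            else:
                current = current[int(key)]
        else:
            raise KeyError(dotted_field)
    return current


event = {
    "some": "data",
    "more": {
        "nested": {
            "information": "is here",
            "sometimes": ["inside", "lists", "of", "elements"],
        }
    },
}

print(get_dotted_field_value(event, "more.nested.information"))    # is here
print(get_dotted_field_value(event, "more.nested.sometimes.1"))    # lists
print(get_dotted_field_value(event, "more.nested.sometimes.0:2"))  # ['inside', 'lists']
```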

Warning

The dotted field notation is available in all processors. However, the use of indices to access list elements is not available in the Clusterer, Labeler and Pseudonymizer.

Filter

The filters are based on the Lucene query language, but contain some additional enhancements. It is possible to filter for keys and values in log messages. Dot notation is used to access subfields in log messages. A filter for {'field': {'subfield': 'value'}} can be specified by field.subfield: value.

If a key is given without a value, the filter checks only for the existence of that key. The filter filter: field.subfield would match for every value of subfield in {'field': {'subfield': 'value'}}. The special key * can be used to always match on any input. Thus, the filter filter: * would match any input document.
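The existence check and the wildcard can be illustrated with a short Python sketch. It only mimics the described behaviour; the helper names field_exists and matches_existence_filter are hypothetical, and real filters are parsed from the Lucene-like syntax rather than compared as plain strings.

```python
def field_exists(event, dotted_field):
    """Return True if every key along the dotted path is present."""
    current = event
    for key in dotted_field.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


def matches_existence_filter(event, filter_expression):
    """Evaluate a filter that is either '*' or a bare (dotted) key."""
    # The special key '*' matches any input document.
    return filter_expression == "*" or field_exists(event, filter_expression)


event = {"field": {"subfield": "value"}}

print(matches_existence_filter(event, "field.subfield"))  # True
print(matches_existence_filter(event, "field.other"))     # False
print(matches_existence_filter({}, "*"))                  # True
```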

The filter in the following example would match log messages whose field ip_address has the value 192.168.0.1. All transformations defined by this rule would therefore be applied only to log messages that match this criterion. This example is not complete, since rules are specific to processors and require additional options.

Example

{ "filter": "ip_address: 192.168.0.1" }

It is possible to use filters with field names that contain whitespace or special symbols of the Lucene syntax. However, these characters have to be escaped. The filter filter: 'field.a subfield(test): value' must be escaped as filter: 'field.a\ subfield\(test\): value'. Escaping is only necessary in the filter; other references to this field do not require it. If the file is in the JSON format, it is necessary to escape twice: once for the filter itself and once for JSON.
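The double escaping required in JSON files can be seen by letting Python's json module serialize a filter. This is a small illustration only; the field name "a subfield" is a made-up example.

```python
import json

# The filter as the rule engine should see it: one backslash before the space.
filter_expression = r"field.a\ subfield: value"

# Serializing to JSON escapes the backslash itself, giving two backslashes.
json_text = json.dumps({"filter": filter_expression})
print(json_text)  # {"filter": "field.a\\ subfield: value"}

# Parsing the JSON text restores the single-backslash filter.
assert json.loads(json_text)["filter"] == filter_expression
```

In a YAML file with single-quoted strings, the backslash is taken literally, so the filter can be written with a single backslash.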

Operators

A subset of Lucene query operators is supported:

NOT: Condition is not true.
AND: Connects two conditions. Both conditions must be true.
OR: Connects two conditions. At least one of them must be true.

The following example matches log messages for which event_id: 1 is true and ip_address: 192.168.0.1 is false. This example is not complete, since rules are specific to processors and require additional options.

Example

{ "filter": "event_id: 1 AND NOT ip_address: 192.168.0.1" }
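The semantics of the three operators can be sketched by composing small predicate functions in Python. This is not how the filter is parsed or evaluated internally; the helper names are hypothetical, and comparing values as strings on a flat event is an assumption made to keep the sketch short.

```python
def field_equals(field, value):
    """Condition 'field: value' on a flat event, compared as strings (assumption)."""
    return lambda event: field in event and str(event[field]) == str(value)


def not_(condition):
    """NOT: the condition is not true."""
    return lambda event: not condition(event)


def and_(left, right):
    """AND: both conditions must be true."""
    return lambda event: left(event) and right(event)


def or_(left, right):
    """OR: at least one condition must be true."""
    return lambda event: left(event) or right(event)


# Equivalent of: event_id: 1 AND NOT ip_address: 192.168.0.1
matches = and_(
    field_equals("event_id", 1),
    not_(field_equals("ip_address", "192.168.0.1")),
)

print(matches({"event_id": 1, "ip_address": "10.0.0.5"}))     # True
print(matches({"event_id": 1, "ip_address": "192.168.0.1"}))  # False
print(matches({"event_id": 2, "ip_address": "10.0.0.5"}))     # False
```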

RegEx-Filter

It is possible to use regular expressions to match values. For this, the field with the regex pattern must be added to the optional field regex_fields in the rule definition.

In the following example, the field ip_address is defined as a regex field. The filter matches log messages in which the value of ip_address starts with 192.168.0. (including the trailing dot). This example is not complete, since rules are specific to processors and require additional options.

Example

filter: 'ip_address: "192\.168\.0\..*"'
regex_fields:
- ip_address
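The effect of the pattern above can be checked with Python's re module. This sketch assumes that the pattern must match the whole field value, which is an assumption about the matching mode and not confirmed here; the function name regex_field_matches is hypothetical.

```python
import re


def regex_field_matches(event, field, pattern):
    """Return True if the field exists and its whole value matches the pattern."""
    value = event.get(field)
    return value is not None and re.fullmatch(pattern, str(value)) is not None


pattern = r"192\.168\.0\..*"

print(regex_field_matches({"ip_address": "192.168.0.17"}, "ip_address", pattern))  # True
print(regex_field_matches({"ip_address": "192.168.1.17"}, "ip_address", pattern))  # False
# The escaped dots match literal dots only, not arbitrary characters.
print(regex_field_matches({"ip_address": "192x168x0x17"}, "ip_address", pattern))  # False
```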