Testing Rules

Dry Run

Rules can be tested by executing them in a dry run of Logprep. Instead of using the connectors defined in the configuration file, the dry run takes a path parameter to an input JSON (line) file that contains log messages. The output is displayed in the console, and changes made by Logprep are highlighted:

Directly with Python
logprep test dry-run $CONFIG $EVENTS
With a PEX file
logprep.pex test dry-run $CONFIG $EVENTS

Where $CONFIG is the path to a configuration file (see Configuration). The only required section in the configuration is pipeline (see tests/testdata/config/config-dry-run.yml for an example). The remaining options are set internally or are ignored.

$EVENTS is the path to a file with one or more log messages. A single log message can be provided as a plain JSON object, optionally wrapped in brackets (beginning with [ and ending with ]). Multiple events must be given as a JSON list wrapped in brackets, with the log objects separated by commas. By specifying the parameter --dry-run-input-type jsonl, a file with one JSON object per line can be used instead. Additional output, like pseudonyms, is printed if --dry-run-full-output is added.
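The two accepted input formats can be illustrated with a small sketch (a hypothetical helper for illustration, not part of Logprep) that loads events either from a JSON file or from a JSON lines file:

```python
import json


def load_events(path: str, input_type: str = "json") -> list:
    """Load dry-run input events as a list of dicts.

    input_type "json": the file holds a single JSON object or a JSON list.
    input_type "jsonl": the file holds one JSON object per line.
    """
    with open(path, encoding="utf-8") as file:
        if input_type == "jsonl":
            return [json.loads(line) for line in file if line.strip()]
        data = json.load(file)
        # A single event may be given without the surrounding brackets.
        return data if isinstance(data, list) else [data]
```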

Example for execution with a JSON lines file (--dry-run-input-type jsonl), printing all results including pseudonyms (--dry-run-full-output):
logprep.pex test dry-run tests/testdata/config/config-dry-run.yml tests/testdata/input_logdata/wineventlog_raw.jsonl --dry-run-input-type jsonl --dry-run-full-output

Rule Tests

It is possible to write tests for rules, defining inputs and the expected outputs for them. Only one test file can exist per rule file. The tests must be located in the same directory as the rule files. They are identified by naming them like the rule file, but ending with _test.json, for example rule_one.json and rule_one_test.json.

The test file must contain a JSON list of JSON objects. Each object corresponds to one test and must have the fields raw and processed. raw contains an input log message and processed the corresponding expected result.

When using multi-rules, it may be necessary to restrict tests to specific rules in the file. This can be achieved with the field target_rule_idx. The value of that field corresponds to the index of the rule in the JSON list of multi-rules (starting with 0).

Logprep gets the events in raw as input. The result will be compared with the content of processed.

Fields with variable results can be matched via regular expressions by appending |re to a field name and using a regex as the value. It is furthermore possible to use GROK patterns. Some patterns are pre-defined, but others can be added by adding a directory with GROK patterns to the configuration file.
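The |re matching semantics can be sketched as follows (a simplified, regex-only illustration of the comparison rule, not the tester's actual code): a key ending in |re treats its expected value as a regular expression that must match the whole processed value, while all other keys are compared for equality.

```python
import re


def field_matches(expected_key: str, expected_value, actual_value) -> bool:
    """Compare one expected field against the processed result.

    Keys ending in "|re" treat the expected value as a regex that must
    match the whole actual value; all other keys require equality.
    """
    if expected_key.endswith("|re"):
        return re.fullmatch(expected_value, str(actual_value)) is not None
    return expected_value == actual_value
```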

Rules are automatically validated when an auto-test is executed. The rule tests are only performed if the validation was successful.

The output is printed to the console, highlighting differences between raw and processed:

Directly with Python
logprep test unit $CONFIG
With PEX file
logprep.pex test unit $CONFIG

Where $CONFIG is the path to a configuration file (see Configuration).

Auto-testing also verifies the pipeline section of the Logprep configuration.

Rule Corpus Tests

The rule corpus tester can be used to test a full Logprep pipeline and configuration against a set of expected outputs.

To start the tester call:

Run rule corpus test
logprep test integration $CONFIG $CORPUS_TEST_DATA

Where the parameter CONFIG should point to a valid Logprep configuration and CORPUS_TEST_DATA to a directory containing the test data with the different test cases. The test cases can be organized into subdirectories. Each test case should consist of one input event (*_in.json), one expected output event (*_out.json) and, optionally, expected extra outputs like predetections or pseudonyms (*_out_extra.json). If given, the expected extra data is a single JSON file, where each output has a root key of the expected target. All files belonging to the same test case have to start with the same name, like the following example:

Test data setup
- test_one_in.json
- test_one_out.json
- test_one_out_extra.json
- test_two_in.json
- test_two_out.json
Content of test_one_in.json - Logprep input
{
    "test": "event"
}
Content of test_one_out.json - Expected Logprep Output
{
    "processed": ["test", "event"],
    "with": "<IGNORE_VALUE>"
}
Content of test_one_out_extra.json - Expected Logprep Extra Output
[
    {
        "predetection_target": {
            "id": "..."
        }
    }
]

Sometimes a test should check only that a key exists, without testing for a specific value. To achieve this, set the field in the expected output to <IGNORE_VALUE>; its value is then not considered during testing. Furthermore, it is possible to mark an entire field as optional with <OPTIONAL_KEY>. Such fields are tested for their presence when they exist and ignored when they do not. This can, for example, be the case for the geoip enricher, which sometimes finds city information for an IP and sometimes does not.
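The intended comparison semantics of <IGNORE_VALUE> and <OPTIONAL_KEY> can be sketched like this (a simplified illustration with a hypothetical helper, not the tester's actual code):

```python
def compare_expected(expected: dict, actual: dict) -> list:
    """Return a list of mismatch descriptions for one test case.

    "<IGNORE_VALUE>" only checks that the key exists; "<OPTIONAL_KEY>"
    accepts the key whether it is present in the actual output or not.
    """
    problems = []
    for key, value in expected.items():
        if value == "<OPTIONAL_KEY>":
            continue  # key may or may not exist, both are fine
        if key not in actual:
            problems.append(f"missing key: {key}")
        elif value != "<IGNORE_VALUE>" and actual[key] != value:
            problems.append(f"wrong value for {key}: {actual[key]!r}")
    return problems
```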

While the tests are executed, report print statements are collected and printed to the console after the test run has completed. During the run itself, only a short summary is given for each case.

If Logprep logs an error or warning during the test run, it is also printed to the console, inside the summary of the test case that created the log message and before that case's result.

If one or more test cases fail, the tester exits with code 1, otherwise with 0.

Custom Tests

Some processors cannot be tested with regular auto-tests. Therefore, it is possible to implement custom tests for those processors. Processors that use custom tests must set the instance variable has_custom_tests to True and must implement the method test_rules. Custom tests are performed before other auto-tests and do not require additional test files. One processor that uses custom tests is the clusterer (see Rules).
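A minimal sketch of this interface might look as follows. Only has_custom_tests and test_rules are named in the documentation; the class name, constructor, and the check inside test_rules are hypothetical placeholders:

```python
class MyCustomTestProcessor:
    """Hypothetical processor implementing the custom-test interface."""

    # Signals to the auto-tester that this processor tests itself.
    has_custom_tests = True

    def __init__(self, rules: list):
        self._rules = rules

    def test_rules(self) -> dict:
        """Run processor-specific checks and collect a result per rule."""
        results = {}
        for rule in self._rules:
            # A real processor would apply its own validation logic here;
            # this placeholder merely checks that a filter is defined.
            results[rule["description"]] = "filter" in rule
        return results
```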

Example Tests

Example Test for a single Rule

The raw value of the test triggers the rule, since the filter matches. The result of the rule is, as expected, a pseudonymization of param1. The test is successful.

Example - Rule that shall be tested
[{
  "filter": "event_id: 1 AND source_name: \"Test\"",
  "pseudonymizer": {
    "pseudonyms": {
      "event_data.param1": "RE_WHOLE_FIELD"
    }
  },
  "description": "..."
}]
Example - Test for one Rule
[{
  "raw": {
    "event_id": 1,
    "source_name": "Test",
    "event_data.param1": "ANYTHING"
  },
  "processed": {
    "event_id": 1,
    "source_name": "Test",
    "event_data.param1|re": "%{PSEUDONYM}"
  }
}]

Example Tests for a Multi-Rule

With multi-rules it has to be noted that all tests are performed for all rules in the multi-rule file, unless restricted via target_rule_idx. In this example, the second rule would trigger for both test inputs, causing the first test to fail when applied to it. Therefore, the tests were specified with target_rule_idx so that each one is checked only against the appropriate rule and thus succeeds.

Example - Multi-Rule to be tested
[{
  "filter": "event_id: 1 AND source_name: \"Test\"",
  "pseudonymizer": {
    "pseudonyms": {
      "event_data.param1": "RE_WHOLE_FIELD"
    }
  },
  "description": "..."
},
{
  "filter": "event_id: 1",
  "pseudonymizer": {
    "pseudonyms": {
      "event_data.param2": "RE_WHOLE_FIELD"
    }
  },
  "description": "..."
}]
Example - Test for a Multi-Rule with specified rule indices
[{
  "target_rule_idx": 0,
  "raw": {
    "event_id": 1,
    "source_name": "Test",
    "event_data.param1": "ANYTHING"
  },
  "processed": {
    "event_id": 1,
    "source_name": "Test",
    "event_data.param1|re": "%{PSEUDONYM}"
  }
},
{
  "target_rule_idx": 1,
  "raw": {
    "event_id": 1,
    "event_data.param1": "ANYTHING"
  },
  "processed": {
    "event_id": 1,
    "event_data.param2|re": "%{PSEUDONYM}"
  }
}]