Interpreting a Logfile with Grok

by Rocco Gagliardi

We have a BSM audit log, an iptables log, an Apache log and an smbd_audit log. How can we normalize the useful information and extract or correlate what we need? A small piece of software can make our lives easier: Grok.

The Problem

Interpreting a logfile means extracting the information we need and ignoring the rest.

For data transferred via syslog, the field data has no strictly predefined format and can contain any data type in any order.

There are around 800 log formats in existence. Each application logs specific information in its own proprietary format; sometimes even the same application uses different formats for different events.

Pure Regular Expressions

With regular expressions it is possible to parse any sequence of characters; getting a result is just a question of practice and time (to test), but it can become a nightmare very quickly.

Take for example the parsing of a common IPv4 address: 192.168.1.2

At first glance we can write a simple regexp like

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

Does this regexp match all IP addresses? Yes, but it also matches a lot of strings that are not IP addresses, in particular any combination with an octet above 255, such as 999.999.999.999.
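A quick check makes the problem visible. The snippet below is only an illustration in Python (the article does not prescribe a language), and the sample strings are chosen for this example:

```python
import re

# The naive pattern from above: four groups of one to three digits.
NAIVE_IP = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

# Sample strings chosen for this illustration; the last two are not valid IPv4.
candidates = ["192.168.1.2", "255.255.255.255", "256.1.1.1", "999.999.999.999"]

# Collect every candidate the naive pattern accepts in full.
naive_hits = [c for c in candidates if NAIVE_IP.fullmatch(c)]
print(naive_hits)
```

All four strings are accepted, including the two that can never be IP addresses.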

We can go further and create a smarter regexp that matches only valid IPs. Specifically, the regexp should accept the following valid sequences:

  1. 25 followed by 0-5
  2. 2 followed by 0-4 and then 0-9
  3. An optional 0 or 1, followed by 0-9 and an optional 0-9
  4. This must happen 4 times, separated by dots

Straightforward enough, but once coded it looks like:

\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

And that’s just for an IP address!

Debugging and maintaining such a monster regular expression can waste a lot of time.
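One way to keep such an expression somewhat manageable by hand is to name the repeated fragment and assemble the full pattern from it. The following Python sketch does that; the names `OCTET`, `STRICT_IP` and `is_valid_ipv4` are made up for this illustration:

```python
import re

# One octet: 250-255, 200-249, or 0-199 (a leading zero is tolerated).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Four octets joined by literal dots, between word boundaries.
STRICT_IP = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")


def is_valid_ipv4(text: str) -> bool:
    """Return True if the whole string is a dotted quad with octets 0-255."""
    return STRICT_IP.fullmatch(text) is not None


for sample in ["192.168.1.2", "255.255.255.255", "256.1.1.1", "999.999.999.999"]:
    print(sample, is_valid_ipv4(sample))
```

The assembled pattern is character for character the long expression shown above, but the intent is stated once instead of four times. This reuse of named fragments is the idea a pattern library builds on.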

Using Grok

Grok is a nice little tool that makes regular expressions easier to manage and helps to turn unstructured log and event data into structured data. It is a great tool for parsing log data and program output. You can match any number of complex patterns on any number of inputs (processes and files) and trigger custom reactions.

You can look at it as a template engine with a lot of predefined and usable (tested) regular expressions, simply accessible through an alias.

Looking at the Grok regexp database, we find many predefined regexps that cover most log message formats.

Using Grok, if we need to parse a valid IP address, we just specify %{IP}; Grok looks up the IP alias in its own regexp library and translates it into the following monster regexp:

(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
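The lookup-and-substitute step can be pictured in a few lines of Python. This is a toy sketch and not Grok's actual implementation: the `PATTERNS` dictionary and the `expand` function are hypothetical names, the `NUMBER` and `WORD` entries are simplified stand-ins, and only the %{NAME} and %{NAME:field} forms are handled.

```python
import re

# Hypothetical, tiny pattern library; NUMBER and WORD are simplified stand-ins.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})"
PATTERNS = {
    "IP": r"(?<![0-9])(?:" + "[.]".join([OCTET] * 4) + r")(?![0-9])",
    "NUMBER": r"\d+",
    "WORD": r"\w+",
}

# Matches an alias of the form %{NAME} or %{NAME:field}.
ALIAS = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(template: str) -> str:
    """Replace each alias with its regexp; %{NAME:field} becomes a named group."""
    def substitute(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return ALIAS.sub(substitute, template)


ip_regex = re.compile(expand("from %{IP:client}"))
found = ip_regex.search("connection from 192.168.1.2 accepted")
print(found.group("client"))
```

The template stays short and readable, while the long expression lives in exactly one place, where it can be tested once.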

For example, look at the following log line:

Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3

We are interested in the timestamp, pid, program name, uid and exit signal. We want to ignore the message text exited on signal and the other human-readable labels such as pid and uid.

In Grok, this pattern looks like:

%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}

Straightforward, human-readable, efficient.
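To see which fields come out of such a match, here is a hand-translated equivalent in plain Python. It does not use Grok itself: `SYSLOG_PREFIX` is a simplified stand-in for %{SYSLOGBASE}, written only for this log line, and the group names `timestamp`, `host` and `logsource` are assumptions made for the illustration.

```python
import re

LINE = "Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3"

# Simplified stand-in for %{SYSLOGBASE}: timestamp, host and logging program.
SYSLOG_PREFIX = (
    r"(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<logsource>\w+):"
)

# The rest mirrors the Grok pattern above, with each alias written out by hand.
PATTERN = re.compile(
    SYSLOG_PREFIX
    + r" pid (?P<pid>\d+) \((?P<program>\w+)\), uid (?P<uid>\d+): "
    + r"exited on signal (?P<signal>\d+)"
)

match = PATTERN.match(LINE)
fields = match.groupdict() if match else {}
for key, value in fields.items():
    print(f"{key}: {value}")
```

The result is a flat dictionary of named fields, which is the structured form that can then be normalized and correlated across log sources.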

Summary

I stumbled on Grok thanks to Logstash and it was love at first run. Writing filters is as simple and straightforward as writing normal text. Debugging has never been so easy. Grok can be used from several languages and tools. It is also easy to extend and to use as a general parsing instrument.

About the Author

Rocco Gagliardi

Rocco Gagliardi has been working in IT since the 1980s and specialized in IT security in the 1990s. His main focus lies in security frameworks, network routing, firewalling and log management.
