Enhancing Data Understanding
Rocco Gagliardi
If multiple systems observe the same occurrence, it should be expected that their description of that event is identical. When combined with relevant event details (time, source, destination), a computer should be able to immediately determine whether two or more logs, data logs, audit logs, alerts, alarms, or audit trails refer to the same event.
In order to make this happen, we need:
As NIST SP 800-92, Guide to Computer Security Log Management, states, “there is no consensus in the security community as to the standard terms to be used to describe the composition of log entries and files.”
Many attempts to address this problem have been started, but we still lack a recognized standard. The following list is not exhaustive; I just picked some formats I loved or hated during my career, but there are around a thousand (~1000) different (syslog) message formats.
Format | Type | Proposed by | Year | Status | Comment | Used by |
---|---|---|---|---|---|---|
XDAS | Open | OpenGroup | 1997 | Dead | XDAS provides several key features like a common audit record format and a standardized audit event taxonomy that allows audit records to be classified in a well-known manner. | Unknown |
CIDF | Open | DARPA | 1999 | Dead | The Common Intrusion Detection Framework defined the Common Intrusion Specification Language (CISL) and was the basis for US Navy log standards. Merged into IDMEF. | US Navy |
IDMEF | Open | IETF | 2002 | Dead | The Intrusion Detection Message Exchange Format (RFC4765) is an IETF effort that followed CIDF. IDMEF was designed to enable the communication of intrusion events observed by IDS devices. | Snort |
CIEL | Open | MITRE | 2002 | Dead | The “CVE for intrusion detection”, intended to provide a naming scheme for all network- or host-related events. | None |
SDEE/CIDEE | Proprietary | ICSA Labs | 2003 | Dead | The Security Device Event Exchange has an XML syntax and SOAP transport. Supported only by Cisco. | Cisco |
CBE | Proprietary | IBM, Cisco | 2003 | Dead | The Common Base Event model is part of IBM’s “Autonomic Computing” initiative. CBE is described as a “common language to detect, log and resolve system problems”. | Tivoli |
CIM | Open | DMTF | 2005 | Alive | The DMTF’s Common Information Model (CIM) provides a common definition of management information for systems, networks, applications and services, and allows for vendor extensions. | Microsoft WMI |
CEF | Proprietary | ArcSight | 2006 | Alive | A CEF message is composed of delimited plaintext strings with optional sets of key-value pairs. It is relatively simple to generate and parse, and is transport independent. | Many |
CEE | Open | MITRE | 2007 | Killed | Addresses different areas of log management: syntax (CLS), transport (CLT), vocabulary (CEET), recommendations (CELR). | None |
OLF | Proprietary | eIQNetworks | 2007 | Dead | OLF was designed for logging network events such as those often generated by firewalls, but it can also be used for events not related to the network. | Unknown |
WELF | Proprietary | WebTrends | 2008 | Alive | WELF log is composed of records, in chronological order, of 4 mandatory and 20 optional fields, with focus on firewalls and network devices. | Many |
LEEF | Proprietary | Q1 Labs | 2013 | Alive | The Log Event Extended Format (LEEF) is a customized event format for IBM Security QRadar. Like CEF, it is easy to read and parse. | QRadar |
CADF | Open | DMTF | 2015 | Alive | The Cloud Auditing Data Federation (CADF) standard defines a full event model anyone can use to fill in the essential data needed to certify, self-manage and self-audit application security in cloud environments. | OPENShift |
In the early 2000s, the MITRE Corporation started a series of projects to address the information exchange problem, focusing on different areas of log management. They decided to kill CIEL (Common Intrusion Event List) and created the CEE (Common Event Expression) framework, which covers transport (Common Log Transport), syntax (Common Log Syntax), taxonomy (Common Event Expression Taxonomy), and a set of recommendations on when and what to log (Common Event Log Recommendations), as part of the plan to build a national cyber information sharing ecosystem composed of CVE, CWE, CAPEC, ATT&CK, and CAR. But in 2014 someone decided that CEE was no longer a priority.
As a result, we have dictionaries, the words to identify the different parts of the cyber-security puzzle, but we miss the glue: we still use syslog to transmit the analysis produced by cyber-security systems like IDS/IPS/EPS/++, and we must still grok the message field ad hoc.
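To make the "ad hoc grok" point concrete, here is a minimal Python sketch. It assumes a BSD-style (RFC 3164) syslog line; the `MESSAGE_PATTERNS` table, the `parse` function, and the field names are illustrative assumptions, not part of any standard. The envelope can be parsed once, but every product still needs its own hand-written pattern for the message field.

```python
import re

# Generic BSD-syslog envelope: <priority>timestamp host tag[pid]: message
SYSLOG = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

# Illustrative per-program patterns (assumption): one regex per message
# style, because the message field itself has no agreed structure.
MESSAGE_PATTERNS = {
    "sshd": re.compile(
        r"Failed password for (?:invalid user )?(?P<user>\S+) "
        r"from (?P<src>\S+) port (?P<spt>\d+)"
    ),
    "kernel": re.compile(r"xennet: skb rides the rocket: (?P<slots>\d+) slots"),
}


def parse(line):
    """Parse the syslog envelope, then try a program-specific message pattern."""
    envelope = SYSLOG.match(line)
    if envelope is None:
        return None
    event = envelope.groupdict()
    # The priority value encodes facility * 8 + severity.
    event["facility"], event["severity"] = divmod(int(event.pop("pri")), 8)
    pattern = MESSAGE_PATTERNS.get(event["tag"])
    details = pattern.search(event["message"]) if pattern else None
    event["details"] = details.groupdict() if details else {}
    return event
```

Any program without an entry in the pattern table yields an empty `details` dictionary, which is exactly the gap a common message format would close.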
Why is it so hard to define a log format? Why have so many initiatives failed?
It is not hard to define a standard! Sure, some proposals were overkill (IDMEF), others were too complex to implement, but basically all standards proposed a solution to a problem not recognized by developers. Remember the "xennet: skb rides the rocket: 19 slots" message flooding /var/log/syslog for “some reason”! For a coder who creates such a meaningful message, all efforts to standardize content are just a waste of time.
Even in 2018, we still don’t have a standard for security log messages! We have syslog, somehow recognized as the “universal” transport, and that’s all.
The message field remains a land of conquest!
CEF, LEEF, CIM/CADF are the most used and supported formats:
Format | Transport | Encoding | Structure | Num of fields | Extensibility | Remarks |
---|---|---|---|---|---|---|
CEF | syslog | k→v | Loose, no dictionary | 91 | Some additional fields | Easy to read and parse, a lot of information |
LEEF | syslog | k→v | Loose, no dictionary | 51 | Not specified | Easy to read and parse, focus on network |
CIM | agnostic | agnostic | XML schema | 59 | New instances | Easy to parse. Structures describing security events should be enhanced. |
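To illustrate why a loose key-value format such as CEF is easy to parse, here is a minimal Python sketch. It assumes the commonly documented CEF layout: seven pipe-delimited header fields after the `CEF:` marker, followed by an optional extension of space-separated `key=value` pairs. The function name `parse_cef` and the dictionary keys are my own choices, and the escape handling is simplified, so treat it as a sketch rather than a conforming parser.

```python
import re

# Names chosen for this sketch (assumption); CEF defines seven header fields.
HEADER_FIELDS = ["version", "device_vendor", "device_product", "device_version",
                 "signature_id", "name", "severity"]

# Split on pipes that are not escaped with a backslash.
_PIPE = re.compile(r"(?<!\\)\|")
# An extension key is a word at the start or after whitespace, followed by "=".
_EXT_KEY = re.compile(r"(?:^|\s)(\w+)=")


def _unescape(text, char):
    """Undo the backslash escaping of `char` and of the backslash itself."""
    return text.replace("\\" + char, char).replace("\\\\", "\\")


def parse_cef(line):
    """Return (header, extension) dictionaries for one CEF message."""
    start = line.find("CEF:")
    if start < 0:
        raise ValueError("no CEF marker found")
    parts = _PIPE.split(line[start + len("CEF:"):], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    header = {k: _unescape(v, "|") for k, v in zip(HEADER_FIELDS, parts)}
    ext_text = parts[7] if len(parts) > 7 else ""
    extension = {}
    keys = list(_EXT_KEY.finditer(ext_text))
    for i, match in enumerate(keys):
        # A value runs until the whitespace preceding the next key.
        end = keys[i + 1].start() if i + 1 < len(keys) else len(ext_text)
        extension[match.group(1)] = _unescape(ext_text[match.end():end], "=")
    return header, extension
```

For example, `parse_cef("CEF:0|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232")` splits the message into a seven-field header and a three-key extension. Note that the parser recovers structure only: since there is no dictionary, the meaning of each key remains a matter of convention.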
At the moment, the winner is a loose format, transmitted over syslog, readable by humans, and parsable by code. But the IoT will produce most of its messages for code, not for users! So we will probably see more structured and cryptic machine-oriented formats, compressed, and composed more of IDs and less of words.
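As a rough illustration of what "more IDs, fewer words" could look like, the following Python sketch packs an event into a fixed binary record. Everything here is hypothetical: the `EVENT_IDS` dictionary, the record layout, and the function names are assumptions made for the example and do not describe any existing format.

```python
import ipaddress
import struct

# Hypothetical shared dictionary: numeric IDs replace descriptive words.
EVENT_IDS = {"connection_denied": 1, "connection_allowed": 2}
EVENT_NAMES = {v: k for k, v in EVENT_IDS.items()}

# Network byte order: timestamp (uint32), event ID (uint16),
# source IPv4 (4 bytes), destination IPv4 (4 bytes), destination port (uint16).
RECORD = struct.Struct("!IH4s4sH")


def encode(timestamp, event, src, dst, dpt):
    """Pack one event into a fixed-size binary record."""
    return RECORD.pack(timestamp, EVENT_IDS[event],
                       ipaddress.IPv4Address(src).packed,
                       ipaddress.IPv4Address(dst).packed, dpt)


def decode(blob):
    """Unpack a record; needs the same dictionary the sender used."""
    timestamp, event_id, src, dst, dpt = RECORD.unpack(blob)
    return (timestamp, EVENT_NAMES[event_id],
            str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)), dpt)


text = "time=1520244902 event=connection_denied src=10.0.0.1 dst=192.0.2.10 dpt=443"
blob = encode(1520244902, "connection_denied", "10.0.0.1", "192.0.2.10", 443)
```

The binary record is a fraction of the size of the key-value text, but it is unreadable without the shared dictionary, which is why an agreed vocabulary matters even more for machine-oriented formats.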
My fear is that, without a central authority, we will have a lot of git-wanna-be-master-clones.
Especially with the IoT knocking at the door, it is urgent to have a log standard to facilitate the correlation and identification of abnormal behaviors.
Many organizations have tried to define one, but without success. In recent years, support for CEF and LEEF transported over syslog has established a de facto standard, but these are still archaic, big, user-oriented logs in a world demanding more compact, agile, machine-centric logs.