Data Breach Databases - Lot of Incidents, just few Data

by Rocco Gagliardi
on May 02, 2019
time to read: 10 minutes


Start your own Cyber Threat Intelligence based on Data Breaches

  • We know for sure that we are constantly under attack, but by whom and how?
  • To date, the information available on IT incidents is negligible compared to their number
  • The little public data available on the web is unstructured, which makes it difficult to use
  • Generic CTI reports help identify the most obvious critical issues; now we need details

We are adding technology to every aspect of our lives; the net is becoming a medium like air and water and, like them, it can carry threats. If a country comes under heavy attack, could it disconnect completely from the Internet? Today nobody knows what would happen; with 5G, physical actors remotely controlled by an A.I. distributed across national borders will become increasingly likely, making total or partial disconnection impractical. Having discarded the easy solution, we are left with the difficult one: we must assess the risks, prevent incidents, and have recovery procedures ready.

But how can we decide on priorities? Cyber attackers often look backward more than forward; it is therefore sufficient to analyze what has happened so far and act accordingly. Sounds easy, but what has happened so far?

We all know the so-called Cyber Threat Intelligence (CTI) reports: every year we read our set of documents, full of statistics, charts, trends, and comments, yet none of us has any idea of the underlying, often proprietary, data. Still, reading these documents, we can note at least the following:

The need for quality data

Until now, generic CTI reports have helped identify the most obvious critical issues; now it is time to go further: we need details about actors, victims, and vectors. Maintaining a list of incidents and statistics is useful, but to make the data usable it must be normalized and must include technical, industrial, and social aspects. Many lists and databases exist with data describing actors, victims, impacted assets, and other aspects of an attack, but they usually just describe what happened or categorize only a minimal amount of information.
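To make "normalized" concrete: incident lists compiled from news reports often describe the same kind of actor in many different words. The following minimal Python sketch maps such free-text labels onto a small controlled vocabulary loosely modeled on the VERIS actor categories (External, Internal, Partner). The synonym table and the `normalize_actor` helper are hypothetical illustrations; a real dataset would need a much richer mapping and human review.

```python
# Hypothetical synonym table mapping free-text actor labels to a small
# controlled vocabulary loosely modeled on the VERIS actor categories.
ACTOR_SYNONYMS = {
    "external": "External",
    "hacker": "External",
    "criminal group": "External",
    "internal": "Internal",
    "employee": "Internal",
    "insider": "Internal",
    "partner": "Partner",
    "vendor": "Partner",
}


def normalize_actor(raw: str) -> str:
    """Return the normalized actor category, or 'Unknown' if no synonym matches."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return ACTOR_SYNONYMS.get(key, "Unknown")


if __name__ == "__main__":
    raw_labels = ["Employee", "  hacker ", "Vendor", "state agency"]
    print([normalize_actor(label) for label in raw_labels])
    # ['Internal', 'External', 'Partner', 'Unknown']
```

Falling back to "Unknown" rather than guessing keeps the normalized data honest: unmatched labels remain visible and can be reviewed by hand.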

For specific tasks in the risk management process, it is useful to have solid data rather than just perceptions. A good dataset should answer questions such as: What role do flash drives play in incidents? How are flash drives involved in attacks on companies like mine?
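As an illustration of how such a question becomes a query, here is a minimal Python sketch. The records are invented and use a simplified, VERIS-inspired shape (a NAICS `industry` code plus lists of asset varieties and action categories), not the real VERIS schema; the `flash_drive_profile` helper is hypothetical. "Companies like mine" is approximated by a NAICS code prefix.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Invented records in a simplified, VERIS-inspired shape (not the real schema):
# NAICS industry code, affected asset varieties, and action categories.
INCIDENTS: List[Dict] = [
    {"industry": "325412", "assets": ["M - Flash drive"], "actions": ["Physical"]},
    {"industry": "325199", "assets": ["U - Laptop"], "actions": ["Physical"]},
    {"industry": "324110", "assets": ["M - Flash drive", "S - Database"], "actions": ["Malware"]},
    {"industry": "325412", "assets": ["M - Flash drive"], "actions": ["Error"]},
]


def flash_drive_profile(incidents: List[Dict], naics_prefix: str) -> Tuple[float, Counter]:
    """For one sector, return the share of incidents involving a flash drive
    and a count of the action categories seen in those incidents."""
    in_sector = [i for i in incidents if i["industry"].startswith(naics_prefix)]
    with_flash = [i for i in in_sector if any("Flash drive" in a for a in i["assets"])]
    share = len(with_flash) / len(in_sector) if in_sector else 0.0
    actions = Counter(action for i in with_flash for action in i["actions"])
    return share, actions


if __name__ == "__main__":
    share, actions = flash_drive_profile(INCIDENTS, "325")  # 325 = Chemical Manufacturing
    print(f"{share:.0%} of incidents involve flash drives; actions: {dict(actions)}")
```

With real data, the same two numbers (how often, and through which kind of action) are exactly what a risk matrix entry for removable media needs.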

Crafting our own CTI Reports

To craft our own report, we need raw, high-quality data; we can then observe it from our own perspective and extract the facets we need. A column for Financial or Malware and a couple of others can highlight the patterns used in common attacks, even with limited data, but that is not enough: we need to know more precisely who the actors are, who the victims are, and what is being manipulated.

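The VERIS Database is the most structured and complete database of incidents we have seen: incidents are analyzed, and data, where known, are normalized and verified so that the dataset remains consistent. Each incident is described with four elements:

  • Actor: whose actions affected the asset
  • Action: what actions affected the asset
  • Asset: which assets were affected
  • Attribute: how the asset was affected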

Furthermore, the victim is assigned to an industry sector using the NAICS standard (conversion to SIC/ISIC and other schemes is possible); this is fundamental for obtaining data on companies in our own sector.
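Because NAICS codes are hierarchical, the first digits of a victim's code identify its sector (2 digits) and subsector (3 digits). The minimal Python sketch below assumes, based on our reading of the VERIS schema, that the code is stored as a string under `victim.industry`; verify this against the schema version you actually download. The name tables are small excerpts, and `classify_victim` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Small excerpt of NAICS names; Manufacturing spans the 2-digit codes 31-33.
NAICS_SECTORS = {
    "31": "Manufacturing",
    "32": "Manufacturing",
    "33": "Manufacturing",
    "52": "Finance and Insurance",
}
NAICS_SUBSECTORS = {
    "324": "Petroleum and Coal Products Manufacturing",
    "325": "Chemical Manufacturing",
}


def classify_victim(incident: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (sector, subsector) names for the NAICS code in victim.industry.

    Assumes the code is stored as a string under incident["victim"]["industry"];
    unknown or missing codes yield None.
    """
    code = str(incident.get("victim", {}).get("industry", ""))
    return NAICS_SECTORS.get(code[:2]), NAICS_SUBSECTORS.get(code[:3])


if __name__ == "__main__":
    print(classify_victim({"victim": {"industry": "325412"}}))
    # ('Manufacturing', 'Chemical Manufacturing')
```

Matching on prefixes lets us choose the granularity: the 2-digit level groups all of Manufacturing, while the 3-digit level separates petroleum and coal from chemical companies.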

For example, Manufacturing includes Petroleum and Coal Products Manufacturing and Chemical Manufacturing; although similar, they can differ considerably, especially in their IT standards. A quick look at the data reveals the patterns used in attacks and the differences between the two subsectors, so we can adjust our risk matrix accordingly:

Pivoting data from the Sector perspective:

Sector | Asset Variety | Action Variety | Actor
3240 - Petroleum and Coal Products Manufacturing | S - Database | Misconfiguration | Internal
3250 - Chemical Manufacturing | U - Laptop | Theft | External
3250 - Chemical Manufacturing | M - Payment card | Possession abuse | Internal
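A worksheet pivot like the one above can also be reproduced in a few lines of code. This Python sketch uses only the standard library to count combinations of any two columns; the rows are the three rows of the sector table (with plain hyphens in the codes), and the `pivot` helper is a hypothetical name of our own. With three rows the result is trivial, but the same function works unchanged on thousands of flattened incidents.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# The three rows of the sector table: (sector, asset variety, action variety, actor).
Row = Tuple[str, str, str, str]
ROWS: List[Row] = [
    ("3240 - Petroleum and Coal Products Manufacturing", "S - Database", "Misconfiguration", "Internal"),
    ("3250 - Chemical Manufacturing", "U - Laptop", "Theft", "External"),
    ("3250 - Chemical Manufacturing", "M - Payment card", "Possession abuse", "Internal"),
]


def pivot(rows: List[Row], row_idx: int, col_idx: int) -> Dict[str, Counter]:
    """Count (row key, column key) pairs, like a spreadsheet pivot table."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        table[row[row_idx]][row[col_idx]] += 1
    return table


if __name__ == "__main__":
    # Sector vs. actor: how often each sector is hit by internal or external actors.
    for sector, counts in pivot(ROWS, 0, 3).items():
        print(sector, dict(counts))
```

Changing the two indices switches the perspective, for example `pivot(ROWS, 1, 2)` gives asset variety against action variety.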

Pivoting data from the Asset perspective:

Asset Variety | Actor | Action Variety | C-I-A Impact
S - Database | External | Knowledge abuse | CIA
S - Database | Internal | Misconfiguration | -IA
S - Database | Internal | Abuse of functionality | -IA
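The C-I-A column is a compact encoding of which attributes were affected: a letter when confidentiality, integrity, or availability was impacted, a dash otherwise. The following Python sketch derives that notation and builds rows like those in the asset table. The nested record layout (`asset.assets[].variety`, `actor.<category>`, `action.<category>.variety`, `attribute.<name>`) is modeled on the VERIS JSON as we understand it and should be treated as an assumption; `cia_impact` and `asset_view` are hypothetical helpers.

```python
from typing import Dict, Iterator, List, Tuple

# VERIS-style attribute names and the letter used for each in the C-I-A column.
CIA = (("C", "confidentiality"), ("I", "integrity"), ("A", "availability"))


def cia_impact(incident: Dict) -> str:
    """Encode the affected attributes: the letter if present, '-' otherwise."""
    attrs = incident.get("attribute", {})
    return "".join(letter if name in attrs else "-" for letter, name in CIA)


def asset_view(incidents: List[Dict], asset_variety: str) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (asset, actor, action variety, C-I-A) rows for incidents hitting the asset.

    Simplification: every actor is paired with every action variety of the incident.
    """
    for inc in incidents:
        varieties = [a.get("variety") for a in inc.get("asset", {}).get("assets", [])]
        if asset_variety not in varieties:
            continue
        impact = cia_impact(inc)
        for actor in inc.get("actor", {}):
            for action in inc.get("action", {}).values():
                for variety in action.get("variety", []):
                    yield asset_variety, actor.capitalize(), variety, impact


if __name__ == "__main__":
    sample = [{
        "asset": {"assets": [{"variety": "S - Database"}]},
        "actor": {"internal": {}},
        "action": {"error": {"variety": ["Misconfiguration"]}},
        "attribute": {"integrity": {}, "availability": {}},
    }]
    for row in asset_view(sample, "S - Database"):
        print(" | ".join(row))
    # S - Database | Internal | Misconfiguration | -IA
```

Pairing every actor with every action is only exact for incidents with a single actor; incidents with several actors would need the finer-grained attribution that the dataset may or may not provide.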

Although these findings may seem obvious, they are facts: unlike perceptions, they are highly valuable as a starting point for a risk management framework or as an input to one.

The VERIS Database is available in JSON format, which is very easy to parse with R, Python/pandas/Jupyter, or other tools. For our risk analysis, we prefer to flatten the JSON into CSV, extract the parameters we are interested in, and pivot the data in spreadsheets.
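The flattening step can be sketched with the Python standard library alone. The version below assumes each incident has already been loaded into a Python dict (for example with `json.load`); nested keys become dotted column names such as `victim.industry`, lists of plain values are joined into one cell with `;`, and lists of objects are indexed. The `flatten` and `to_csv` helpers and the column-naming convention are our own hypothetical choices, not part of VERIS.

```python
import csv
import io
import json
from typing import Any, Dict, List, TextIO


def flatten(obj: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten nested JSON into dotted column names.

    Lists of scalars become one ';'-joined cell; lists of objects are indexed (0, 1, ...).
    """
    flat: Dict[str, str] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list) and any(isinstance(v, (dict, list)) for v in obj):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{i}."))
    elif isinstance(obj, list):
        flat[prefix.rstrip(".")] = ";".join(str(v) for v in obj)
    else:
        flat[prefix.rstrip(".")] = "" if obj is None else str(obj)
    return flat


def to_csv(incidents: List[Dict], columns: List[str], out: TextIO) -> None:
    """Write only the selected flattened columns, one row per incident."""
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for incident in incidents:
        flat = flatten(incident)
        writer.writerow({col: flat.get(col, "") for col in columns})


if __name__ == "__main__":
    raw = ('{"victim": {"industry": "325412"},'
           ' "action": {"physical": {"variety": ["Theft"]}},'
           ' "asset": {"assets": [{"variety": "U - Laptop"}]}}')
    buf = io.StringIO()
    to_csv([json.loads(raw)], ["victim.industry", "action.physical.variety", "asset.assets.0.variety"], buf)
    print(buf.getvalue())
```

Selecting the columns up front keeps the CSV narrow enough for a worksheet; columns missing from an incident are written as empty cells, which spreadsheet pivots treat as blanks.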

Other resources

As mentioned before, the VERIS Database is not the only data source on the net. Here are some other databases of incidents:

Here are some other CTI resources:


When doing intelligence work, we look for models that take multiple variables into account; intuition is a good thing, but concrete facts are needed to support decisions on how much to invest and where to allocate resources. It is not an easy job, mainly because, even though incidents happen continuously, there is not enough public data, and what exists is not analyzed and organized effectively. Several initiatives have been undertaken in recent years to at least enable data sharing, but we still have a long way to go.

About the Author

Rocco Gagliardi

Rocco Gagliardi has been working in IT since the 1980s and specialized in IT security in the 1990s. His main focus lies in network routing, firewalling and log management.
