For years it has been clear that internal employees pose a significant data-breach risk. Data loss prevention (DLP) tools offer one approach to dealing with this problem, and one of their biggest selling points is the ability to prevent data leakage automatically through blocking rules. It sounds almost too good to be true: implement a DLP tool, activate the predefined rules, observe behavior for a few weeks in monitoring mode, then switch to blocking mode, and you have an effective risk management solution. Of course, it is not that simple. In reality, few enterprise-level companies are willing to take the risk of deploying blocking mode even at a rudimentary level.
Why not? Simply put, because DLP rules are imprecise and return too many false positives (FPs). In blocking mode, these FPs would impede key business processes or render them inoperative altogether.
The reasons range from fundamentally incorrect assumptions about how DLP solutions are supposed to be used through to highly complex protected objects that cannot be described with a simple DLP rule.
The use cases DLP solutions are designed for, minimizing the risk of data leakage, cover all kinds of data. In addition to the false assumptions just described, the complexity of the protected objects themselves is often behind the high number of false positives. The problem can be illustrated with customer data.
The easiest objects to protect are identification numbers, such as account numbers. IDs, particularly those with more characters than a telephone number, are easy to implement (via indexing or regular-expression rules). Few or no FPs can be expected for these types of IDs.
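The pattern-based case can be sketched with a regular expression. The 16-digit account-number format below is purely illustrative and not tied to any specific DLP product:

```python
import re

# Hypothetical account-number format: 16 digits in groups of 4,
# optionally separated by a hyphen or space. Illustrative only.
ACCOUNT_NUMBER = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

def scan(text: str) -> list[str]:
    """Return all substrings that look like account numbers."""
    return ACCOUNT_NUMBER.findall(text)

print(scan("Please charge account 1234-5678-9012-3456 today."))
```

Because such an ID is long and rigidly structured, ordinary prose almost never matches it by accident, which is why FP rates for these rules stay negligible.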
The problem lies with data that does not follow a fixed pattern, such as free-text fields like names or addresses. These fields are inherently susceptible to FPs, because practically anything can occur in them.
If, for instance, you want to represent your customer base in a DLP rule, you would most likely index the data from the customer database. However, this raises a whole set of problems. Even in the rather unusual scenario in which the customer database has been entirely cleaned up and validated, there are still plenty of sticking points in the various free-text attributes.
It is very likely that a process will have to be defined to keep unvalidated data out of the DLP rules. A single undesired attribute in a DLP rule can unleash a storm of FPs.
Even with data quality under control, free-text fields still have idiosyncrasies that drive up FP rates. For example, "An" is a popular Asian first name, but it is also a commonly used preposition in German. Do we exclude customers named "An", or accept the high number of FPs triggered by the preposition?
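The ambiguity can be reproduced with a naive keyword rule built from customer first names. The names and the German sentence below are illustrative:

```python
import re

# Illustrative: a keyword rule generated from indexed customer first names.
customer_names = {"An", "Maria", "Keiko"}
pattern = re.compile(r"\b(" + "|".join(sorted(customer_names)) + r")\b",
                     re.IGNORECASE)

# "An alle Mitarbeiter" means "To all employees": "an" here is a preposition.
german_sentence = "An alle Mitarbeiter: bitte die Anlage beachten."
matches = pattern.findall(german_sentence)
# The preposition is indistinguishable from the customer name "An",
# so this perfectly ordinary sentence produces a false positive.
print(matches)
```

No amount of regex polishing fixes this: the token itself is ambiguous, so the choice is between excluding the name and accepting the FPs.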
Ultimately, there is no way around excluding certain data from the DLP rules, especially when thousands or hundreds of thousands of records must be covered. But every exclusion that minimizes the FP rate also reduces the coverage of the DLP rule. If 5% or 10% of the initial data has to be excluded from the customer database, this must be documented and accepted as residual risk.
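The trade-off is simple arithmetic, and making it explicit is what turns an exclusion list into a documented residual risk. The figures below are illustrative:

```python
# Sketch of the coverage trade-off: every record excluded to cut FPs
# directly lowers the share of the customer base the rule protects.
total_records = 100_000
excluded = 8_000   # e.g. ambiguous names, unvalidated entries (illustrative)

coverage = (total_records - excluded) / total_records
residual_risk = excluded / total_records
print(f"coverage: {coverage:.1%}, residual risk: {residual_risk:.1%}")
# coverage: 92.0%, residual risk: 8.0%
```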
If you have reached this point, the tools bundled with DLP solutions are unlikely to be much help. The DLP setup is, of course, highly specific to the particular company, and each company must handle this complexity on its own.
Alongside the technical DLP implementation, an administrative body must be established to deal specifically with DLP rules. This DLP policy management is designed to create transparency about the quality and status of the DLP rules, and to improve that quality continuously.
DLP policy management should sit between the business and IT departments. One possible setup is a two-person team: one member from the business side (risk) and one from IT (the DLP rule author).
It is important to understand that while DLP rules can be defined during an initial implementation project, they usually start with very high FP rates. The crucial tuning (minimizing FP rates while increasing coverage) can only succeed over time. Policy management therefore ensures that DLP rules can be improved iteratively during operation, even after the project is complete.
So how is DLP policy management supposed to work? Fundamentally, only an iterative process can be effective when protecting large volumes of data: too many factors affect the quality of DLP rules, and most of them are still unknown during the initial DLP project. The following concepts can therefore be effective.
The simplest representation of a DLP rule is its configuration in the DLP solution, but this reveals nothing about the rule's quality. To gain an end-to-end view of a DLP rule, at least the following factors should be documented:
To address the iterative aspect, version management needs to be introduced.
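One minimal way to sketch such end-to-end documentation with version history is a small data model. The field names below are assumptions for illustration, not taken from any DLP product:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class RuleVersion:
    version: int
    changed_on: date
    change_note: str
    fp_rate: float      # measured false-positive rate for this version
    coverage: float     # share of protected records the rule covers

@dataclass
class DlpRule:
    name: str
    owner_business: str     # risk owner on the business side
    owner_it: str           # rule author on the IT side
    versions: list[RuleVersion] = field(default_factory=list)

    def release(self, note: str, fp_rate: float, coverage: float) -> RuleVersion:
        """Record a new immutable version of the rule."""
        v = RuleVersion(len(self.versions) + 1, date.today(),
                        note, fp_rate, coverage)
        self.versions.append(v)
        return v

rule = DlpRule("customer-names", "Risk Office", "DLP Team")
rule.release("initial index of customer DB", fp_rate=0.34, coverage=1.00)
rule.release("excluded ambiguous first names", fp_rate=0.05, coverage=0.92)
```

Keeping versions immutable makes the tuning history auditable: each release records who changed what, and how FP rate and coverage moved as a result.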
If DLP rules are managed end to end throughout their lifecycle, reporting is the easy part: the information is self-explanatory and does not need to be interpreted, provided data collection is centralized and automated. Standard DLP solutions, however, offer few tools for this, and you are often on your own. For this reason, the reporting features should be evaluated closely during the DLP implementation project, with the requirements of DLP policy management in mind.
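Centralized, automated collection can be as simple as computing FP rates per rule from analyst dispositions. The sketch below assumes incidents are exported as (rule, disposition) pairs; names and data are illustrative:

```python
from collections import Counter

# Hypothetical incident export: (rule name, analyst disposition).
incidents = [
    ("customer-names", "false_positive"),
    ("customer-names", "confirmed"),
    ("customer-names", "false_positive"),
    ("account-numbers", "confirmed"),
]

def fp_rate(rule: str) -> float:
    """Share of a rule's incidents that analysts marked as false positives."""
    counts = Counter(d for r, d in incidents if r == rule)
    total = sum(counts.values())
    return counts["false_positive"] / total if total else 0.0

print(fp_rate("customer-names"))  # 2 of 3 incidents were FPs
```

Fed by an automated export rather than manual tallies, a metric like this needs no interpretation, which is exactly the property the reporting should have.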
DLP projects should be business-driven. IT departments are service providers that implement the requirements of the business. Because the detailed requirements for a DLP solution cannot be specified by IT in the way they can for an anti-virus solution, this input must come clearly from the business side.
DLP solutions can be very expensive and yet have only a minimal impact on reducing the risk of data leakage. It is tempting to install a tool with a blocking mode and assume that risks have been effectively minimized; this problem is particularly common at the enterprise level. But if a DLP solution is understood as a control for already-existing measures against data leakage, it serves as an excellent broken-process detector.
Yet inconsistencies in data management (where DLP at least identifies the first broken processes) and the properties of the many different kinds of data requiring protection make creating and maintaining effective DLP rules complex. In addition, the tools included with DLP solutions are usually inadequate. For this reason, clearly structured policy management should be established as an operational standard from the start, and particular attention should be paid to reporting functions when evaluating a DLP solution. If the budget does not provide for policy management, the result is usually stagnation concealed by misleading reporting (which increases risk!), or a new DLP solution is considered to straighten things out again, which is expensive and of limited benefit.
Those who can implement DLP policy management from the outset should do so conscientiously. Those still struggling with an onslaught of FPs or unsatisfactory coverage should also consider introducing a DLP policy management solution – even if this may require taking a few steps backwards.