Tomaso Vasella
How to master Data Leakage Prevention
Simply put, data leakage occurs when data leaves its intended habitat. In practice, this means that sensitive data such as confidential documents or customer information leaves system or organizational boundaries and is placed in an environment that does not offer the level of protection demanded by the data's content. In most cases, this also means that the data becomes accessible to unauthorized parties. Two important elements can be observed here:

- The content of the data determines the level of protection it requires.
- The leakage happens where data crosses system or organizational boundaries, i.e. at interfaces.
These two points play a central role in data leakage prevention: they determine what must be detected (the content) and where unwanted data leakage can occur and be prevented (system boundaries and interfaces).
The principle consists of detecting relevant data flows and reacting to them based on defined rules. Data flows are analyzed in terms of the exchanged content and the communication partners involved; depending on the rule, the flow is blocked, an alarm is generated, or the user is warned. For example, sending sensitive information via email, downloading a document to an untrusted end device, or copying a confidential document to a USB memory device might be prevented.
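To illustrate the principle, here is a minimal sketch of such a rule evaluation in Python. The rule set, the sensitivity detector, and the channel names are hypothetical and merely stand in for the far richer policies of a real DLP product:

```python
import re
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"    # let the flow pass, but warn the user
    ALERT = "alert"  # let the flow pass, but notify security
    BLOCK = "block"

@dataclass
class DataFlow:
    content: str      # the payload being transferred
    channel: str      # e.g. "email", "usb", "cloud_upload"
    destination: str  # recipient domain, device ID, URL, ...

# Hypothetical detector: does the payload look sensitive?
SENSITIVE = re.compile(r"\b(confidential|customer list|secret)\b", re.I)

# Assumed allowlist of trusted destinations.
TRUSTED_DESTINATIONS = {"partner.example.com"}

def evaluate(flow: DataFlow) -> Action:
    """Apply simple, ordered rules to a data flow."""
    if not SENSITIVE.search(flow.content):
        return Action.ALLOW
    if flow.channel == "usb":
        return Action.BLOCK  # never copy sensitive data to USB
    if flow.channel == "cloud_upload" and flow.destination not in TRUSTED_DESTINATIONS:
        return Action.BLOCK
    if flow.channel == "email":
        return Action.WARN   # a business need is plausible: warn, don't block
    return Action.ALERT

print(evaluate(DataFlow("confidential report", "usb", "device-42")))  # Action.BLOCK
```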
The idea is simple in concept but can be enormously complex in practical implementations. On the one hand, there is a huge number of data formats that must be read and correctly interpreted; on the other hand, it is often very difficult to automatically distinguish between data flows that are permissible and necessary from a business perspective and those that are not.
An additional factor adds to this complexity: the widespread use of cloud services blurs system and organizational boundaries and leads to an increasing separation between owning data and being able to exercise technical control over it. Concepts such as Zero Trust can be helpful here.
DLP rule definitions often make use of so-called regular expressions, which are machine-readable descriptions of text patterns. They make it quite easy to describe things like an ISIN (International Securities Identification Number) based on its format: two letters for the country code, followed by nine alphanumeric characters and a check digit. The corresponding regex might look like this:
```
[A-Z]{2}[A-Z0-9]{9}[0-9]{1}
```
But what happens if the ISIN appears in a document with one or more spaces in it? Or if a character string matches the pattern, but its first two letters are not a valid country code at all? Then a regular expression containing all valid country codes would have to be used – that's over 250! What to do if the relevant pattern is not text but is contained in an image? Should one use optical character recognition? How to deal with container formats? Simply renaming a Word document that is blocked by DLP from .docx to .zip is sometimes enough to bypass detection. And so on. With increasing technical effort, more such cases can be handled. However, this quickly becomes confusing and can lead to very resource-intensive data analysis, yet a hundred percent hit rate will never be achieved.
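To make this concrete, here is a sketch of a somewhat more robust ISIN detector: it tolerates whitespace, checks the country prefix against an allowlist (deliberately abbreviated here; a real rule would need all valid codes), and verifies the check digit using the Luhn algorithm specified for ISINs:

```python
import re
import string

# Tolerate optional whitespace inside the ISIN (e.g. "US 0378331005").
ISIN_RE = re.compile(r"([A-Z]{2})\s*([A-Z0-9]{9})\s*([0-9])")

# Deliberately incomplete allowlist for illustration; a real rule would
# need all ~250 ISO 3166 codes plus special prefixes such as XS.
COUNTRY_CODES = {"US", "DE", "CH", "GB", "FR", "XS"}

def luhn_ok(digits: str) -> bool:
    """Standard Luhn check over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def is_valid_isin(candidate: str) -> bool:
    m = ISIN_RE.fullmatch(candidate.strip())
    if not m or m.group(1) not in COUNTRY_CODES:
        return False
    isin = "".join(m.groups())
    # Convert letters to numbers (A=10 ... Z=35) before the Luhn check.
    expanded = "".join(
        str(string.ascii_uppercase.index(c) + 10) if c.isalpha() else c
        for c in isin
    )
    return luhn_ok(expanded)

print(is_valid_isin("US0378331005"))   # True (Apple Inc.)
print(is_valid_isin("US 0378331005"))  # True, despite the space
print(is_valid_isin("ZZ0378331005"))   # False: ZZ is not in the allowlist
```

Even this small step up from the bare regex requires a country-code list, a check-digit routine, and normalization of whitespace – and it still says nothing about images or container formats.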
The next level of challenges involves data that belongs together in terms of content but is unstructured from a machine perspective. Using rules to identify which combination of data elements is harmful in a data leak and which flows of unstructured data are permissible from a business perspective can be very difficult. The increasing collection of data (think Big Data) and the proliferation of NoSQL databases pose particular challenges to traditional DLP methods.
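A common heuristic for such combinations is a proximity rule: each data element is tolerated on its own, but two related elements appearing close together are flagged. A minimal sketch, with crude, assumed patterns purely for illustration:

```python
import re

NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")               # crude person-name pattern
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")   # crude card-number pattern

def flag_combination(text: str, window: int = 100) -> bool:
    """Flag text if a name and a card number occur within `window` characters."""
    names = [m.start() for m in NAME.finditer(text)]
    cards = [m.start() for m in CARD.finditer(text)]
    return any(abs(n - c) <= window for n in names for c in cards)

print(flag_combination("Invoice 4111 1111 1111 1111"))         # False: number alone
print(flag_combination("Jane Doe, card 4111 1111 1111 1111"))  # True: combination
```

Even this toy rule shows the dilemma: widen the window or loosen the patterns and false positives soar; tighten them and real leaks slip through.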
Similar observations can be made concerning the flow of data: sending confidential documents via unencrypted email is clearly undesirable. But what if confidential documents are uploaded to a business partner's cloud application? Should that cloud application be trusted? And if so, how does the DLP solution know that this cloud application is trustworthy but others are not? And how do you know that the data does not subsequently leak out of the cloud application, where it is completely outside your own control?
Another challenge is posed by the strong increase in location-independent working, working from home, and BYOD concepts. It is usually no longer possible to analyze the relevant data flows at a central location: they often take place directly between the endpoints and the various cloud applications, requiring measures on the end device itself to control unwanted data flows.
These circumstances typically lead to a high number of false positives. In other words, it is almost impossible to create precise rules for blocking data outflows without impacting business processes. As a result, a logging-only mode is often chosen in practice and automated blocking is frequently omitted. These considerations give an idea of how well the automatic detection and protection of sensitive data often touted by DLP products really work in practice. Machine learning and algorithms for recognizing data content, behavior patterns and anomalies can help, but they cannot solve these problems entirely.
In view of the above, one might conclude that data leakage cannot be prevented at all. To a certain extent this is true, because it will never be possible to fully protect against a sufficiently motivated adversary: someone with access to data will always be able to leak it somehow, even if only via a screenshot. However, there are several security measures that can reduce these risks to an acceptable level. None of them are new:

- An inventory and classification of the data worth protecting
- Solid baseline security measures
- Appropriate, well-established processes
- Powerful security monitoring
The risk of data leakage is real and major damage can result. DLP solutions can help, but they cannot make up for poor processes or substantial gaps in the security posture. The measures listed above are well known and commonly recommended, and they are effective in helping to prevent data leaks. With the increasing use of cloud applications, monitoring and controlling endpoints – and generally all those elements that can still be controlled – is more important than ever. Should an emergency occur, a communication strategy and predefined procedures are needed in order to react quickly and effectively. That might also be a good occasion to think again about data economy.