Data Leakage Prevention - Tilting at Windmills

by Tomaso Vasella
on April 22, 2021
time to read: 9 minutes

Keypoints

How to master Data Leakage Prevention

  • Unintentional data leaks are rightly feared as one of the biggest cyber risks
  • Data leakage prevention often promises more than is actually possible
  • Data Leakage Prevention cannot make up for inadequate processes or security gaps
  • Proven security concepts and recommendations can also help against data leakage
  • Preparations for emergencies should be part of the precautions
  • Data economy: Non-existing data cannot leak

Data leaks have long been feared as one of the greatest cyber risks. In recent years, more and more serious cases have occurred and have become publicly known. It is therefore not surprising that solutions to the problem of data leaks are being touted in the form of products and services. The associated promises sound tempting, especially since it usually seems much easier to introduce a new security tool than to change established processes and behaviors. The following sections examine why such changes are nevertheless often necessary to prevent data leaks, and what help DLP solutions can offer.

Simply put, data leakage occurs when data leaves its intended habitat. In practice, this means that sensitive data such as confidential documents or customer information leaves system or organizational boundaries and ends up in an environment that does not offer the level of protection demanded by the data content. In most cases, this also means that the data becomes accessible to unauthorized parties. Two important elements can be observed in this example: the content of the data, which determines the protection it requires, and the system or organizational boundaries that the data crosses.

These two points play a central role in data leakage prevention because they concern the detection (content) and prevention of unwanted data leakage and the place where this can occur (system boundaries and interfaces).

How does Data Leakage Prevention work?

The principle consists of detecting relevant data flows and the subsequent reaction based on defined rules. Data flows are analyzed in terms of the exchanged data content and the communication partners involved, and the data flow is then prevented, or an alarm is generated, or the user is warned. For example, sending sensitive information via email or downloading a document to an untrusted end device or copying a confidential document to a USB memory device might be prevented.
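The detect-and-react principle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular DLP product works: the names `Flow`, `Rule`, `Action` and `evaluate`, the channel labels, the `CONFIDENTIAL` marker and the sample rules are all hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class Action(IntEnum):
    """Possible reactions, ordered from least to most restrictive."""
    ALLOW = 0
    WARN = 1    # warn the user
    ALERT = 2   # raise an alarm for the security team
    BLOCK = 3   # prevent the data flow


@dataclass(frozen=True)
class Flow:
    channel: str      # hypothetical labels: "email", "usb", "download"
    destination: str  # the communication partner or target device
    content: str      # extracted text of the data being transferred


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    channels: frozenset
    action: Action


def evaluate(flow, rules, trusted_destinations=frozenset()):
    """Return the strictest action of all matching rules and their names."""
    if flow.destination in trusted_destinations:
        return Action.ALLOW, []
    hits = [
        rule for rule in rules
        if flow.channel in rule.channels and rule.pattern.search(flow.content)
    ]
    action = max((rule.action for rule in hits), default=Action.ALLOW)
    return action, [rule.name for rule in hits]


# Sample rules (assumptions): documents carrying a CONFIDENTIAL marker
# must not leave via email or USB; downloads only trigger a warning.
RULES = [
    Rule("confidential-outbound", re.compile(r"\bCONFIDENTIAL\b"),
         frozenset({"email", "usb"}), Action.BLOCK),
    Rule("confidential-download", re.compile(r"\bCONFIDENTIAL\b"),
         frozenset({"download"}), Action.WARN),
]

flow = Flow("email", "someone@example.org", "CONFIDENTIAL: draft contract")
decision, matched = evaluate(flow, RULES)
print(decision.name, matched)
```

Returning the strictest matching action is one possible design choice. The hard part in practice is not this decision logic but extracting `content` reliably from many file formats and writing rules that do not interfere with legitimate business flows.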

The idea is simple in concept but can be enormously complex in practical implementations. On the one hand, there is a huge number of data formats that must be read and correctly interpreted. On the other hand, it is often very difficult to automatically distinguish between permissible data flows that are necessary from a business perspective and those that are not.

An additional factor adds to this complexity: The numerous uses of cloud services blur system and organizational boundaries and lead to an increasing separation between data ownership and the ability to exercise technical control over one’s own data. Concepts such as Zero Trust can be helpful here.

The Challenges of DLP Rules

DLP rule definitions often make use of regular expressions, which are machine-readable descriptions of data patterns. They make it quite easy to describe things like an ISIN (International Securities Identification Number) based on its format: it starts with two letters for the country code, followed by nine alphanumeric characters and a check digit. The corresponding regex might look like this:

[A-Z]{2}[A-Z0-9]{9}[0-9]{1}

But what happens if the ISIN appears in a document with one or more spaces? Or a character string appears which corresponds to the pattern mentioned, but the first two letters are not a valid country code at all? A regular expression would have to be used that contains all valid country codes – that’s over 250! What to do if the relevant pattern is not a text but contained in an image? Should one use optical text recognition? How to deal with container formats? Simply renaming a Word document that is blocked by DLP from .docx to .zip is sometimes enough to bypass detection. And so on. With increasing technical effort, more such cases can be handled. However, this quickly becomes confusing and can lead to very resource-intensive data analysis, yet it will never be possible to achieve a hundred percent hit rate.
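Some of these cases can be narrowed down with a little extra logic. The following sketch tolerates single whitespace characters inside the ISIN and verifies the check digit, which for ISINs is computed with the Luhn algorithm after replacing letters by numbers (A=10 to Z=35). The function names and the optional `country_codes` parameter are assumptions made for this example. The country check is left to the caller, who would have to supply the full list of valid prefixes.

```python
import re

# Two letters, nine alphanumeric characters, one check digit. A single
# whitespace character is tolerated before each character after the prefix.
ISIN_CANDIDATE = re.compile(r"\b[A-Z]{2}(?:\s?[A-Z0-9]){9}\s?[0-9]\b")


def has_valid_check_digit(isin: str) -> bool:
    """Luhn check over the ISIN with letters expanded to numbers (A=10..Z=35)."""
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Walk from the right; every second digit is doubled.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_isins(text: str, country_codes=None) -> list:
    """Return normalized ISINs found in text that pass the check digit test.

    country_codes is an optional set of allowed two-letter prefixes;
    if it is None, the prefix is not checked at all.
    """
    found = []
    for match in ISIN_CANDIDATE.finditer(text):
        candidate = re.sub(r"\s", "", match.group())
        if country_codes is not None and candidate[:2] not in country_codes:
            continue
        if has_valid_check_digit(candidate):
            found.append(candidate)
    return found


print(find_isins("Please settle US03 7833 1005 today."))
print(find_isins("Reference US0378331006"))  # wrong check digit
```

Even this version remains a sketch: it does not handle other separators, text inside images or renamed container files, and a pattern that tolerates whitespace can in rare cases swallow adjacent words and miss a real ISIN.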

The next level of challenges involves data that belongs together in terms of content but is unstructured from a machine perspective. Using rules to identify which combination of data elements is harmful in a data leak and which flows of unstructured data are permissible from a business perspective can be very difficult. The increasing collection of data (think Big Data) and the proliferation of NoSQL databases pose particular challenges to traditional DLP methods.

Similar observations can be made concerning the flow of data: Sending confidential documents by unencrypted email is clearly undesirable. But what if confidential documents are uploaded to a business partner’s cloud application? Should that cloud application be trusted? And if so, how does the DLP solution know that this cloud application is trustworthy, but others are not? And how do you know that the data is not subsequently leaking out of the cloud application, which is completely outside your own control?

Another challenge is posed by the strong increase in location-independent working, home office and BYOD concepts. It is usually no longer possible to analyze the relevant data flows at a central location. The data flows often take place directly between the endpoints and the various cloud applications, requiring measures directly on the end device to control unwanted data flows.

These circumstances typically lead to a high number of false positives. In other words, it is almost impossible to create precise rules for blocking data outflows without impacting business processes. As a result, a logging-only mode is often chosen in practice and automated blocking is frequently omitted. How well the automatic detection of sensitive data and its automatic protection, which are often touted in DLP products, really work in practice can be guessed based on these considerations. Machine learning and algorithms for recognizing data content, behavior patterns and anomalies can help, but they cannot solve these problems entirely.
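A short calculation illustrates why false positives dominate so quickly. All numbers below are hypothetical and chosen only for illustration: one in 10,000 data flows is assumed to be an actual leak, and the rule is assumed to catch 99% of leaks while wrongly flagging 1% of legitimate flows.

```python
def alert_precision(leak_rate, detection_rate, false_positive_rate):
    """Share of all alerts that correspond to an actual leak."""
    true_alerts = leak_rate * detection_rate
    false_alerts = (1 - leak_rate) * false_positive_rate
    if true_alerts + false_alerts == 0:
        return 0.0  # no alerts at all
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical figures, not measurements from any real environment.
precision = alert_precision(
    leak_rate=0.0001,          # 1 in 10,000 flows is a real leak
    detection_rate=0.99,       # 99% of real leaks trigger the rule
    false_positive_rate=0.01,  # 1% of legitimate flows trigger it too
)
print(f"Share of alerts that are real leaks: {precision:.1%}")
```

Under these assumptions, only about one alert in a hundred points to a real leak. Blocking on every alert would therefore interrupt roughly a hundred legitimate flows for each leak stopped, which makes the preference for logging-only operation understandable.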

What can be done against data leakage?

In view of the above, one might conclude that data leakage cannot be prevented at all. To a certain extent, this is true, because it will never be possible to fully protect against a sufficiently motivated adversary. Someone with access to data will always be able to leak that data somehow, even if it is just via a screenshot. However, there are several security measures that can be taken to reduce these risks to an acceptable level. None of these are new:

Conclusion

The risk of data leakage is real and major damage can result. DLP solutions can help, but they cannot make up for poor processes or substantial gaps in the security posture. Data inventory and classification, solid baseline security measures, appropriate processes, and powerful security monitoring are well-known, commonly recommended measures that are also effective in helping to prevent data leaks. With the increasing use of cloud applications, monitoring and controlling endpoints, and more generally all elements that can still be controlled, is more important than ever. Should an emergency occur, a communication strategy and predefined procedures are needed to be able to react quickly and effectively. That might also be a good occasion to think again about data economy.

About the Author

Tomaso Vasella

Tomaso Vasella holds a Master's degree in Organic Chemistry from ETH Zürich. He has been working in the cybersecurity field since 1999 and has worked as a consultant, engineer, auditor and business developer. (ORCID 0000-0002-0216-1268)
