Tomaso Vasella
How to Encrypt Data in the Cloud
The EU and the USA had therefore reached an agreement: data importers in the USA could certify themselves according to the requirements of the Safe Harbor agreement and were then considered safe recipients. However, the Safe Harbor agreement was invalidated by a ruling of the European Court of Justice, following a complaint by Max Schrems, an Austrian data protection activist (Schrems I ruling). The EU and the USA replaced it with an improved agreement called the Privacy Shield. Switzerland followed suit in each case, which is why there was also a CH-US Privacy Shield that allowed transmissions from Switzerland to certified recipients in the USA.
The European Court of Justice abolished the Privacy Shield with immediate effect in its ruling from July 2020 (Schrems II ruling). It concluded that US law allows too extensive access to data with too little legal protection.
As an alternative to the Privacy Shield, the data exporter and the data importer could enter into data transfer contracts. This was almost always done on the basis of a set of contracts recognized by the EU, the so-called Standard Contractual Clauses (or Model Clauses). These clauses have not been abolished by the European Court of Justice (ECJ). However, the ECJ has raised serious concerns, since an agreement with a US importer cannot prevent access by the authorities. These concerns also apply to all other unsafe states. The Federal Data Protection and Information Commissioner (FDPIC) likewise believes that these clauses can only be sufficient in exceptional cases.
It is clear that the fundamental problem of undesired data access cannot be solved by contracts alone. The focus is therefore on technical measures that prevent such data access or at least make it more difficult. Access protection by means of data encryption is one of the most important measures and is widely used. The FDPIC also expressly referred to BYOK (Bring Your Own Key) and BYOE (Bring Your Own Encryption). This article covers the most common terms and measures related to the encryption of data located in the cloud or off-premises.
For the following sections, it is useful to remember some basic principles of data encryption. Simply put, the goal of data encryption in this context is to protect data from unauthorized access. This is achieved by transforming data into an unreadable form so that it can only be converted back into its readable form by authorized parties. In the field of cryptography, the following terms are used:
Action | Term | Description |
---|---|---|
Data | Cleartext, Plaintext | Designation for data in its original, readable form. Plain text refers to any kind of unencrypted data, not just text. |
Data transformation | Encryption | Encryption with an algorithm and using a key. There are symmetric algorithms that use the same key for encryption and decryption and asymmetric algorithms that use a pair of keys. The encrypted data as the result of the encryption process is called ciphertext. |
Conversion into the readable form | Decryption | Decryption of the ciphertext with an algorithm and using a secret key. Assuming that the algorithm used has no exploitable weaknesses, the ciphertext cannot be decrypted without knowing the key. |
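
The terms in the table can be illustrated with a minimal sketch. It uses a one-time pad (XOR with a random key of the same length as the data), chosen here only because it fits in a few lines of standard-library Python. The function names are assumptions made for this illustration, and real applications would use a vetted algorithm from an established cryptographic library instead.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Return a random secret key with as many bytes as the plaintext."""
    return secrets.token_bytes(length)


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Combine each plaintext byte with a key byte (XOR) to obtain ciphertext."""
    if len(key) != len(plaintext):
        raise ValueError("a one-time pad key must match the data length")
    return bytes(p ^ k for p, k in zip(plaintext, key))


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return encrypt(ciphertext, key)


plaintext = b"customer record 4711"        # readable original data
key = generate_key(len(plaintext))         # symmetric key, shared by both steps
ciphertext = encrypt(plaintext, key)       # unreadable without the key
recovered = decrypt(ciphertext, key)       # readable again for key holders
```

Since the same key is used for both directions, this is a symmetric scheme. Whoever holds `key` can read the data, which is why the following considerations revolve around who controls the key.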
Data can be stored (data at rest), processed (data in use) or transported (data in motion). Naturally, data must also be protected while in motion, for example by means of transport encryption. However, transport encryption will not be discussed further in the following chapters, since corresponding measures such as TLS or VPNs are well established independently of cloud applications.
The goal that data can only be read by authorized parties is achieved by making the key required for its decryption accessible to only these parties. The management and protection of cryptographic keys is therefore extremely important. This brief observation alone already suggests that some practical challenges arise:
Besides the challenges of key management, the following facts are important for the subsequent considerations:
Cryptographic processes are often performed in software, i.e. within an application or by an operating system. Complete isolation of these processes from other processes on the same system cannot be achieved this way. A system administrator can access the system's memory and influence the cryptographic process or read the data or the encryption key. Hardware security modules can be used to reduce this attack surface.
A hardware security module (HSM) is a specialized hardware device for cryptographic applications. A software application can communicate with the HSM via an interface and use its cryptographic functions. These are mainly the secure generation and management of cryptographic keys as well as all processes that use these keys. It is essential that the secret keys are generated in the HSM and never leave it. HSMs therefore implement a number of protection measures.
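The interface principle can be sketched in software. The class below is a purely hypothetical simulation, not the API of any real HSM: keys are generated inside the object, the caller only receives a handle, and all operations that need the key run inside the object. A real HSM enforces this separation in hardware, whereas Python cannot prevent access to the internal attribute, so the sketch only shows the shape of the interface.

```python
import hashlib
import hmac
import secrets


class SimulatedHsm:
    """Software stand-in for an HSM: keys are created inside and never returned."""

    def __init__(self) -> None:
        self._keys = {}  # handle -> key material, kept inside this object

    def generate_key(self, label: str) -> str:
        """Create a key inside the module and return only its handle."""
        if label in self._keys:
            raise ValueError(f"key {label!r} already exists")
        self._keys[label] = secrets.token_bytes(32)
        return label

    def compute_mac(self, label: str, message: bytes) -> bytes:
        """Compute an HMAC-SHA256 tag with the key referenced by the handle."""
        return hmac.new(self._keys[label], message, hashlib.sha256).digest()

    def verify_mac(self, label: str, message: bytes, tag: bytes) -> bool:
        """Check a tag in constant time without handing out the key."""
        return hmac.compare_digest(self.compute_mac(label, message), tag)


hsm = SimulatedHsm()
handle = hsm.generate_key("backup-signing")            # caller gets a handle only
tag = hsm.compute_mac(handle, b"monthly backup manifest")
is_valid = hsm.verify_mac(handle, b"monthly backup manifest", tag)
```

The application never holds the key bytes itself. It can only ask the module to perform operations, which is the property that has to be verified for each application that uses an HSM.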
Several cloud providers enable their customers to use HSMs. The HSM itself is managed by the cloud provider, but the customer can use its functionalities. For example, HSM-based key management in the cloud could be implemented this way. To be able to rely on the security provided by an HSM, one must also be able to rely on the fact that the protected keys can never leave the HSM. It is therefore advisable to check exactly how the keys are generated, managed and used for each application that uses the HSM.
The term BYOK refers to a model in which the customer or the data owner generates and uses its own cryptographic keys and ideally has sole access to them. However, the BYOK model has weaknesses and practical challenges that are important to know:
A pure BYOK model allows control over the keys, but not over the cryptographic algorithms. Bring Your Own Encryption (BYOE) differs from this by adding the possibility to manage the cryptographic algorithms. There are even solutions that allow customers to bring in their own cryptographic functionality. From a security point of view, these additional control options play an insignificant role, and the above points also largely apply to a BYOE model.
Several cloud service providers have developed more advanced models allowing for higher degrees of data confidentiality. One of them is called Hold Your Own Key (HYOK) and refers to a process where data is encrypted before it even enters the cloud and the key material is not transferred to the cloud.
The goal of this model is to prevent the plain text data from ever entering the cloud. Provided that a proper implementation is used, this can be a suitable approach to prevent unwanted data access. However, it usually has a serious impact on the functionality of cloud applications. Because the data content is completely inaccessible to the cloud provider, a fact sometimes called data opacity, most applications work only partially or not at all. For this reason, this method is typically only used for a few selected data areas, for example individual documents or sensitive fields in a database. On the other hand, data that only needs to be stored in the cloud and doesn’t require cloud processing can easily be encrypted before it is transferred to the cloud. An example of this would be storing encrypted backups in the cloud.
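The following sketch illustrates this flow and the resulting data opacity under simplifying assumptions. The dictionary `cloud_storage` stands in for a cloud object store, and all function names are made up for this example. The cipher is a toy keystream construction built from HMAC-SHA256 so that the example runs with the standard library only. It provides no integrity protection, and a real implementation would use a vetted authenticated cipher such as AES-GCM from an established library.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudorandom byte stream from key and nonce (toy construction)."""
    blocks = []
    for counter in range((length + 31) // 32):
        data = nonce + counter.to_bytes(8, "big")
        blocks.append(hmac.new(key, data, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def encrypt_locally(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt on the customer side; the result is nonce followed by ciphertext."""
    nonce = secrets.token_bytes(16)  # fresh random value for every message
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt_locally(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and reverse the XOR with the same keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


cloud_storage = {}                        # stand-in for a cloud object store
local_key = secrets.token_bytes(32)       # stays on-premises, never uploaded

document = b"salary list: Alice 9000, Bob 8500"
cloud_storage["hr/salaries"] = encrypt_locally(local_key, document)

# A provider-side search only sees opaque bytes and cannot match the content.
provider_hits = [name for name, blob in cloud_storage.items() if b"Alice" in blob]

# Only the key holder can restore the readable document.
restored = decrypt_locally(local_key, cloud_storage["hr/salaries"])
```

The provider-side search models why cloud functions such as indexing or full-text search stop working on data protected this way, while plain storage and retrieval remain unaffected.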
Double Key Encryption is a term used by Microsoft for a data encryption model that uses two different keys. One is under the control of Microsoft, the other one is controlled by the cloud user only. From a security perspective, this model differs only slightly from the HYOK model. Here too, the challenge lies in the fact that the cloud applications cannot read the data content and therefore function only partially or not at all.
Tokenization replaces plain text data with non-sensitive placeholders (tokens) from which the original data cannot be mathematically derived. In contrast to encryption, tokenization does not use a key in conjunction with a mathematical algorithm to transform sensitive data. With tokenization, it is possible to ensure that the replaced data has the same length and data type as the original data. This is a clear advantage over data encryption, since data tokenized in this way can be processed by cloud applications, unlike traditionally encrypted data. However, there are also disadvantages. While tokenization can be used quite well for structured data fields such as credit card numbers, it is not well suited for unstructured data such as entire documents.
Instead of a key, tokenization uses a mapping table (sometimes called token database or token vault) to associate the plain text data with the tokens. These mapping tables can grow large and can have negative effects on application performance. Tokens can only be transformed back into the original data by accessing the mapping table while encrypted data only needs the correct key to be decrypted. Therefore, tokenized data can be less suited for sharing with authorized parties than encrypted data but tokenization can be combined with encryption.
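A minimal sketch of such a mapping table could look as follows. The class `TokenVault` and its methods are hypothetical names chosen for this illustration, and the sketch assumes values that consist of ASCII digits only, such as card numbers. Tokens are drawn at random and keep the length and data type of the original value.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory mapping table between values and tokens."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random digit token with the same length as the value."""
        if not (value.isascii() and value.isdigit()):
            raise ValueError("this sketch only handles ASCII digit strings")
        if value in self._token_by_value:
            return self._token_by_value[value]  # same value, same token
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            # Retry on the rare collision with the value or an existing token.
            if token != value and token not in self._value_by_token:
                break
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; no key exists that could derive it."""
        return self._value_by_token[token]


vault = TokenVault()
card_number = "4111111111111111"          # well-known test card number
token = vault.tokenize(card_number)       # 16 random digits, safe to hand out
original = vault.detokenize(token)        # requires access to the vault
```

Both dictionaries grow with every new value, which reflects the size and performance concerns mentioned above. A party that receives only `token` cannot recover the card number without access to the vault.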
Tokenization is commonly implemented by using specialized on-premise solutions. To enable applications to correctly process tokenized data, the tokenization solution must specifically support the respective applications.
The following table summarizes important properties of the described encryption models, assuming their correct implementation.
Model | Hold Your Own Key | Double Key Encryption | Bring Your Own Encryption | Key Management Cloud Provider | Tokenization |
---|---|---|---|---|---|
Data is accessible by cloud provider | no | no | yes | yes | no |
Cloud-only solution | no | no | yes | yes | no |
On-premise HSM required | yes | yes | yes* | no | n.a. |
* The keys could also be generated without using an on-premise HSM and could then be transferred to the cloud but this would seriously compromise their confidentiality.
When used correctly, all described models can help to increase data protection in the cloud. However, each of them either fails to provide complete protection against third-party access or can severely impact the functionality of cloud applications. It is therefore necessary to carefully analyze the relevant threat scenarios and threat actors and to choose the most suitable model on this basis. For example, a Bring Your Own Key scenario can effectively protect against unauthorized access to data at rest, but it cannot prevent access by the cloud provider. Comprehensive protection of data essentially always requires a combination of different measures and must cover the entire data life cycle.
Aside from specific scenarios such as storing data in the cloud, using cloud applications and completely protecting data against any undesired access are two goals that are ultimately incompatible. Therefore, marketing promises should not be blindly trusted, and a risk-based approach with a carefully selected balance between functionality and data protection is required.