Over the past few years, a number of companies using AWS have suffered data breaches because they did not secure their data correctly. Leaving Personally Identifiable Information (PII) in publicly accessible S3 buckets seems to be a common problem. While AWS is a very secure environment, AWS’s shared responsibility model makes clear that their job ends at “Security of the Cloud”, which leaves “Security in the Cloud” up to the customer (https://aws.amazon.com/compliance/shared-responsibility-model/). The customer’s security responsibility includes securing operating systems on EC2/ECS, network configurations and firewall settings, Identity and Access Management (IAM), and data encryption, which is what the rest of this article is about.
One strategy for data encryption is called Envelope Encryption, which can be used to protect data in transit or at rest. The envelope encryption service used in AwAws was built around AWS Key Management Service (KMS), a managed service backed by Hardware Security Modules (HSMs) that secures and encrypts data at rest, and it leverages the AWS Encryption SDK (available for Python) following its best practices.
How it works
KMS allows us to build and store encryption keys in a very secure way using FIPS 140-2 validated cryptographic modules which do not allow the managed keys to leave the service. The idea here is that you can’t get access to the actual key: you can send data to be encrypted or decrypted by the service, but you can’t get the key yourself and use it for encryption or decryption. Items to be encrypted are sent to the service over a TLS connection; the service encrypts the data and then returns the encrypted data. The amount of data that can be sent to the service is limited to 4 KB (that’s 4 kilobytes, or 4096 bytes), which is not very much. This is great for passwords or API secrets but not very useful for large files. This kind of makes sense though: you don’t necessarily want to be moving large amounts of sensitive unencrypted data around. It also makes sense from the service’s standpoint, as KMS could easily become a bottleneck if people started sending huge files to it.
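To make the size limit concrete, here is a minimal Python sketch of the decision it forces on a caller. The names `KMS_DIRECT_LIMIT_BYTES` and `choose_strategy` are made up for illustration, and the 4096-byte figure is simply the 4 KB limit mentioned above.

```python
# Hypothetical helper: decide how a payload should be encrypted.
KMS_DIRECT_LIMIT_BYTES = 4 * 1024  # the 4 KB limit described above


def choose_strategy(payload: bytes) -> str:
    """Return "direct" if KMS could encrypt the payload itself, else "envelope"."""
    if len(payload) <= KMS_DIRECT_LIMIT_BYTES:
        return "direct"    # small secrets: passwords, API keys
    return "envelope"      # anything larger needs a local data key
```

A password-sized payload falls under the limit, while a file of any real size does not.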
This is where envelope encryption comes in. Envelope encryption is a two-layer encryption system: you create a custom data encryption key to encrypt a specific set of data, and then use the KMS master key to encrypt that data key. The encrypted key is then stored alongside the data that was encrypted. The benefit of this technique is that every data object has its own unique encryption key, which can only be opened with the master key.
Envelope Encryption Steps
Let’s say we want to create a Lambda service that encrypts files coming in through API Gateway and then stores the encrypted files in S3. Our encryption service would work as follows:
receive data object from API Gateway
locally create a unique key specifically for this data object
use this key to encrypt the data
So, at this point, we have a data object that has been encrypted with a random key that we generated on the fly. Now we need to do something with the key so that we can decrypt the data later on. In theory you could store the key in a database, but if someone got access to the database they would be able to decrypt all of your data! Besides, keeping track of every data object’s location in S3 along with the key for that object would be a bit of a nightmare (say someone renames the files). This is where the envelope part comes in. We take the unique key we generated and:
send the unique key to KMS to be encrypted with our master key
get back the encrypted key
attach the now encrypted unique key to the encrypted data
now the data can be safely moved or stored (encryption in transit, encryption at rest)
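The steps above can be sketched in Python using only the standard library. This is an illustration of the flow, not the AwAws implementation: `FakeKms` is a hypothetical local stand-in for KMS, and `_keystream_xor` is a toy cipher built from HMAC-SHA256 that stands in for AES-GCM. It provides no integrity protection and should not be used to protect real data.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher (stand-in for AES-GCM): XOR data with an HMAC-SHA256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class FakeKms:
    """Hypothetical stand-in for KMS: the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def encrypt(self, plaintext_key: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        return nonce + _keystream_xor(self._master_key, nonce, plaintext_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        nonce, body = encrypted_key[:16], encrypted_key[16:]
        return _keystream_xor(self._master_key, nonce, body)


def envelope_encrypt(kms: FakeKms, data: bytes) -> dict:
    data_key = secrets.token_bytes(32)                  # unique key for this object
    nonce = secrets.token_bytes(16)
    ciphertext = _keystream_xor(data_key, nonce, data)  # encrypt the data locally
    encrypted_key = kms.encrypt(data_key)               # wrap the key with the master key
    # The encrypted key travels with the encrypted data.
    return {"encrypted_key": encrypted_key, "nonce": nonce, "ciphertext": ciphertext}


def envelope_decrypt(kms: FakeKms, envelope: dict) -> bytes:
    data_key = kms.decrypt(envelope["encrypted_key"])   # only the key goes to "KMS"
    return _keystream_xor(data_key, envelope["nonce"], envelope["ciphertext"])
```

Note that only the 32-byte data key is ever handed to the KMS stand-in, so the size of the file being encrypted never runs into the 4 KB limit.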
We will also attach an encryption context to the object. This is a set of plaintext key/value pairs (tags) that identifies the encrypted data. These tags not only help identify and track the data, they are also cryptographically bound to the encrypted data, meaning the same context is needed for both encrypting and decrypting the data.
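A small sketch may help show what "cryptographically bound" means in practice. With AES-GCM the context would typically be passed in as additional authenticated data; the example below assumes HMAC-SHA256 from the Python standard library as a stand-in for that mechanism, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(context: dict) -> bytes:
    """Serialize the context deterministically so key order does not matter."""
    return json.dumps(context, sort_keys=True, separators=(",", ":")).encode("utf-8")


def bind_context(data_key: bytes, ciphertext: bytes, context: dict) -> bytes:
    """Compute a tag covering both the context and the ciphertext."""
    ctx = _canonical(context)
    # Length-prefix the context so its boundary with the ciphertext is unambiguous.
    message = len(ctx).to_bytes(8, "big") + ctx + ciphertext
    return hmac.new(data_key, message, hashlib.sha256).digest()


def verify_context(data_key: bytes, ciphertext: bytes, context: dict, tag: bytes) -> bool:
    """Return True only if the context and ciphertext are exactly what was tagged."""
    return hmac.compare_digest(bind_context(data_key, ciphertext, context), tag)
```

The tags stay readable, but changing any of them (or the ciphertext) makes verification fail.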
Ultimately, we wind up with a new data object that contains:
an encrypted version of the original data object
a set of plaintext key/value pairs that identify the original data and cannot be altered without being detected
the encrypted key which (when decrypted by KMS) can be used to decrypt the data
Kind of cool, right? Since we attach the key to the data, we don’t have to worry about losing track of which keys belong to which data, or about someone coming in and stealing all our keys and data. The only way to get at the original data object is via access to the master key.
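As a rough illustration of keeping the key with the data, the three parts can be packed into a single blob. The layout below (two 4-byte lengths, then the encrypted key, the JSON context, and the ciphertext) is invented for this example; the AWS Encryption SDK defines its own message format, which this sketch does not try to reproduce.

```python
import json
import struct

_HEADER = ">II"  # two big-endian unsigned 32-bit lengths (hypothetical layout)
_HEADER_SIZE = struct.calcsize(_HEADER)


def pack_envelope(encrypted_key: bytes, context: dict, ciphertext: bytes) -> bytes:
    """Combine the encrypted key, the plaintext context, and the ciphertext."""
    ctx = json.dumps(context, sort_keys=True).encode("utf-8")
    header = struct.pack(_HEADER, len(encrypted_key), len(ctx))
    return header + encrypted_key + ctx + ciphertext


def unpack_envelope(blob: bytes) -> tuple:
    """Split a blob produced by pack_envelope back into its three parts."""
    key_len, ctx_len = struct.unpack(_HEADER, blob[:_HEADER_SIZE])
    key_end = _HEADER_SIZE + key_len
    ctx_end = key_end + ctx_len
    encrypted_key = blob[_HEADER_SIZE:key_end]
    context = json.loads(blob[key_end:ctx_end].decode("utf-8"))
    return encrypted_key, context, blob[ctx_end:]
```

Because everything needed for decryption (apart from the master key) lives in one object, renaming or moving the file in S3 does not break anything.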
In Transit
This could be set up to move data securely across the internet. The idea would be that we encrypt data before it leaves the sender’s site, send it over a secure TLS connection, and then immediately store the data in object storage in our cloud production environments. In this case, we would encrypt the data encryption key with a shared RSA key (similar to the way SSH connections work) and send that encrypted key over TLS to the KMS service, which would then decrypt the RSA layer and re-encrypt the key with AES-GCM.
Encryption Algorithms (The Gory Details)
AwAws was set up to use 256-bit AES-GCM (the Advanced Encryption Standard algorithm in Galois/Counter Mode) with an Elliptic Curve Digital Signature Algorithm signature (ECDSA with P-384 and SHA-384) and an HMAC-based extract-and-expand key derivation function (HKDF with SHA-384), which helps prevent accidental reuse of a data encryption key.
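To show how the key derivation step helps with key reuse, here is a standard-library sketch of HKDF with SHA-384, following the extract-and-expand construction from RFC 5869. The function name and the idea of passing a per-message identifier as `info` are assumptions made for illustration; in practice you would rely on the AWS Encryption SDK rather than rolling your own.

```python
import hashlib
import hmac


def hkdf_sha384(key_material: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """Derive `length` bytes from `key_material` using HKDF with SHA-384."""
    hash_len = hashlib.sha384().digest_size  # 48 bytes
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    if not salt:
        salt = bytes(hash_len)  # RFC 5869 default: a string of zero bytes

    # Extract: condense the input key material into a pseudorandom key.
    prk = hmac.new(salt, key_material, hashlib.sha384).digest()

    # Expand: chain HMAC blocks until enough output has been produced.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha384).digest()
        okm += block
        counter += 1
    return okm[:length]
```

Even if the same data key were accidentally used twice, feeding a different `info` value into the derivation for each message would yield a different 256-bit key for each encryption.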