End-to-end encryption

Requests from clients to servers are secured using TLS (as are requests between servers), which ensures that no unauthorized person can read messages by observing network traffic unless they are capable of circumventing TLS. However, messages are stored on the servers, which means that anyone who is able to access the data on the server, such as server administrators or malicious attackers, will be able to read the messages.

End-to-end encryption encrypts messages on the sending client in such a way that only the recipients’ clients are able to decrypt it — the server, as well as any one who accesses the data on the server, only sees the encrypted version, rather than the original message.

Glossary

When talking about encryption, the plaintext is the message that gets encrypted, and the ciphertext is the result of encryption. Data that does not get encrypted is called cleartext.

A brief history of encryption

There are many ways of encrypting messages. We won’t go into too much detail about the different ways of performing encryption, but we will give a high-level overview in order to explain why Matrix uses the methods that it does.

The most basic form of encryption is known as symmetric encryption, in which a single key is used for encrypting and decrypting. This type of encryption has been used for thousands of years, though early form were quite basic. Symmetric encryption is used in virtually every encryption system today, but often as a building block, rather than as the main part. When used for encrypting messages, the fact that the same key is used for encrypting and decrypting means that both the sender and the recipient must somehow agree on a key. This poses a sort of chicken-and-egg problem: in order for the sender and recipient to agree on a key, it seems like they need to be able to communicate in secret, for which they need encryption.

This problem was solved in the 1970s with the advent of public-key encryption. In this type of system, the recipient (call her Alice) generates a public and private keypair and publishes the public key. Anyone who wishes to communicate with Alice (say, Bob) can take the public key, encrypt the message using it, and send the message to Alice. After a message has been encrypted with a public key, it can only be decrypted by the private key, which is only known by Alice. Thus Bob can send a message securely to Alice without anyone else being able to read it.

Or, at least, Bob can send a message securely to the owner of the public key. One issue that comes up in this case is that Bob needs some way to ensure that the public key that he received actually belongs to Alice. This is the problem of key verification, which in the past often consisted of trying to meet in person and comparing long strings of characters, if it was done at all.

Encryption systems can ensure that different properties are satisfied. For example, the property of secrecy ensures that only the sender and recipient(s) can read the message; obviously, secrecy is a property that we want in an encryption system. Some other properties are authenticity and integrity, which mean, respectively, that Alice can ensure that the message that she received came from Bob rather than from someone else, and that the message was not modified. These properties are not usually provided by symmetric encryption nor public-key encryption algorithms on their own, but can be provided by other algorithms, such as digital signature algorithms, which were also introduced in the 1970s.

More recent developments have allowed encryption systems to provide other security properties. Some properties focus on what kind of security can be maintained for a sequence of messages in the face of a compromise of a device or a device’s public long-term keys. For example, if Alice was only using a public-key encryption system and an attacker manages to obtain Alice’s private key somehow, then they could decrypt any message, past or present, that was encrypted for that key. If Alice wants to receive new messages that the attacker cannot read, she must generate a new keypair and ensure that Bob, and any other potential senders, updates to her new public key and stops using her old public key. Any key verification that was done previously would need to be redone. In effect, Alice would need to restart from scratch. In addition, this must be done manually by Alice; if Alice is unaware that her key has been compromised, she may not generate a new keypair, and the attacker will be able to continue reading her messages.

Perfect forward secrecy (also sometimes called simply forward secrecy) means that if a user’s long-term keys are compromised (e.g. the private key corresponding to a public key that Alice has published), an attacker cannot decrypt messages that have already been sent. This can be true even if the attacker has recorded all communication between the users. In contrast, future secrecy (also sometimes called backward secrecy) means that an attacker cannot decrypt messages that are sent after the keys have been compromised.

Another property known as deniability means that Alice cannot, by herself, cryptographically prove to a third party that the message that she received came from Bob and not from someone else. This seems to be in conflict with the property of authenticity — Alice can be sure that the message came from Bob — but some systems provide both properties at the same time. One way to do that is for Bob to send a message in a way that Alice can be sure that either she or Bob sent it. Since she knows that she didn’t send it herself (she has no reason to fool herself), she knows that it was indeed sent by Bob. But she cannot prove it to someone else, because she cannot prove that she did not create the message herself.

Most encryption systems are designed for communication between two parties. A group discussion can be encrypted by treating it as multiple two-way communications. For example, when Alice wants to send a message to a group that includes Bob and Carol, she can encrypt the message separately to both. However, this can be inefficient and be prohibitively slow when the group is large.

So far, we have been discussing “Alice” and “Bob” as if they were each a singular recipient. However, Alice may have multiple devices that she receives her messages on, say, a laptop and a tablet. If she her tablet gets stolen, she will still want to be able to receive messages on her laptop without her tablet being able to decrypt the messages, so that whoever has stolen her laptop cannot read them. Because of this, systems that allow multiple devices may treat each device as a separate recipient. Note that when we speak of “devices”, we do not necessarily mean a physical device; clients on the same physical device generally do not share information between themselves, so different clients on the same physical device are treated as separate devices. In Matrix, devices are identified by the device ID given when the client logs in.

Usability issues

Some times, security properties come in conflict with usability. For example, one of the results of a system that has all of the security properties above is that when a user logs into a new device, they will not be able to read messages that were already sent. However, many users do want to be able to read old messages on a new device. On the other hand, some users may need the extra privacy given by a system that does not allow that. Moreover, users in the same room may have differing security needs. Perhaps Alice has a high degree of control over her devices and so can afford to keep the full history of her discussions, whereas Bob is in a situation where others may have access to his devices and so would prefer to be able to “purge” his conversation history.

The approach taken by Matrix is to start with an encryption system that has a high degree of security, but then to provide workarounds to allow users to make their own decisions about the security-usability tradeoffs.

Handling secure data

Todo

This might fit better somewhere else

Care must be taken when handling data that is intended to be used with encryption. This includes any data that is going to be encrypted, text that has been decrypted, keys, passwords, etc. Clients should try to prevent attackers from being able to access these in the event of a compromise of the device. For example, clients should:

  • overwrite the data after it is done being used (certain languages or libraries can make this easier, such as zeroize for Rust),

  • avoid keeping data in memory longer than it is needed,

  • ensure that the data isn’t written to the swap file (for example, by using the mlock system call on POSIX systems) or to temporary files,

  • avoid copying the data unnecessarily,

  • ensure that sensitive data written to disk (or other permanent storage) is stored securely,

Some languages, especially higher-level languages, may not give you enough control to do these. This may affect the choice of language if security is a high priority compared to other factors. In addition, this data may need to be passed to other libraries (for example, to display the message to users) that may not treat the data as carefully. Nevertheless, clients should do their best in handling sensitive data.

Olm and Megolm

The encryption system currently specified for Matrix is a combination of two parts, known as Olm and Megolm. Olm and Megolm both use a method called a cryptographic ratchet. In essence, the state of an Olm or Megolm session changes each time that an event is sent or received, but previous states cannot be recovered. This is part of what gives perfect forward secrecy. Olm is based on Signal’s Double Ratchet protocol, and establishes secure one-to-one channels between two devices. Events within rooms are encrypted using a Megolm session created by the sender; these Megolm sessions are distributed to the recipient devices using Olm, allowing the recipients to decrypt the room events.

The combination of Olm and Megolm is similar to how encryption is commonly done with certain public-key encryption algorithms. In some cases, public-key encryption is much slower than symmetric encryption. In addition, if you are sending a message to many recipients, using public-key encryption means that each recipient gets a separate ciphertext, which will be large if the message is large. To get around these issues, the message can be encrypted using a symmetric algorithm, and the encryption key is itself encrypted using public-key encryption for each of the recipients. Since the symmetric encryption key is usually smaller than the message itself, this is faster (since the public-key encryption algorithm needs to process less data) and can save space (since each recipient receives the same message ciphertext, plus an encrypted key that is small).

A high-level overview of encryption using Olm an Megolm is as follows:

  1. Alice and Bob upload their device public keys and one-time keys to their servers.

  2. Alice and Bob are in a room that has encryption enabled, as indicated by the m.room.encryption state event.

  3. Alice writes a message in the room.

  4. Alice’s client ensures that she has a Megolm session for the room that is up to date.

  5. Alice’s client encrypts the message using the Megolm session.

  6. Alice’s client distributes the Megolm session to all the other devices in the room as m.room_key events, encrypted using Olm and wrapped in m.room.encrypted to-device events. Alice’s client may need to create new Olm sessions if it does not yet have one with some devices.

  7. Alice’s client sends the encrypted message to the room as an m.room.encrypted event.

  8. Bob’s device receives the Megolm session from Alice over the Olm session.

  9. Bob uses the Megolm session to decrypt the message.

We will first look at device key and one-time key management (step 1), then at Megolm encryption (steps 4 and 5) and decryption (step 9), and then at distributing the Megolm sessions using Olm (steps 6 and 8). After looking at the step separately, we will look at how they interact, and how to manage everything as a whole.

Todo

add a “How encryption can fail” section somewhere