As a toy weekend project I have implemented a minimal version of the private set intersection protocol (PSI) in Rust using homomorphic encryption. There are numerous ways to implement protocols to achieve this. Though homomorphic encryption is interesting as it gives as asymmetric implementation it is also resource intensive.

Private set intersection is interesting in a new realm of privacy engineering. With more regulation companies have the responsibility to not reveal data. A plethora of new techniques attempts to make that practical.

The main applications are in the ad-industry where companies want to integrate data from different ad-companies without holding to any personally identifieble material. There are also applications in the anti-money laundering industry where banks want to ensure that people are not present in various registries without revealing the person they consider to said registries.

Anyways, this is my short exposition from a airport terminal in Munich. The code can be found here.

Homomorphic Encypritoin Background Speed Run

Homomorphic Encryption is the name for the schemes that allow running computations on encrypted data, which in the game of mathematics means addition and multiplication. There are a number of schemes and it is an entire research field. To keep this simple, we will merely focus on the properties that we need:

$$ \begin{align*} E(a) - E(b) &= E(a - b) \\ E(a) \times E(b) &= E(a \times b) \\ r \times E(b) &= E(r \times b) \end{align*} $$

Where \( E(x) \) is the enrypted value of x. These are the ones we will focus on.

The HE PSI Insight

Prerquisite for this, elements are fixed-width integers. This make entirely sense for this field as it is set intersection we are solving. All classes of entities can reasonably be hashed to a fixed-width integer.

So! Alice has the number a and Bob has the number b. They want to test whether they have the same number without revealing the numbers (unless they actually have the same number).

  1. Alice generates \( E(a) \) and sends it to Bob. This is secure as \( a \) is encrypted.
  2. Bob calculates \( E(a) - b \) which we know is the same as \( E(a - b) \) If Alice decrypts that value and get 0, we know that \( a \) and \( b \) are the same. This is not secure: Alice can derive Bobs value \( b \)!
  3. Bob needs to return \( E(a - b)^r \) where r is some private non-zero value. This is secure, as Alice can not derive Bobs value \( b \) as she does not know r.

Implementation in Rust

To my best knowledge, there are no Rust implementation for tensored homomorphic libraries. It is a tad out of scope to implement that, so for the exposition, we will experiment with the library TFHE-rs.

For this to work, the client encrypts the value it wants to check intersection on and send the encrypted value to the server. In this example, the server only allow a single value. But it should be clear that we can run the algorithm multiple times to find the intersection between a client set and a server set.

// Hash to testlet client_hash = 123456u32;// Hash is encrypted using the (private) client_key and sent to the serverlet e_client_hash = FheUint32::try_encrypt(client_hash, &client_key)?;println!("CLIENT: Encrypts the value and sends it to the server");// Let the server to its worklet server_response = server(e_client_hash);// Decrypt the returned valuelet clear_res: u32 = server_response.decrypt(&client_key);if clear_res == 0 {   println!("CLIENT: value is in the database 👏");} else {   println!("CLIENT: value is NOT in the database 👎");}

By virtue of the client secret being encrypted, the server has no knowledge of it. It will work on the encrypted value.

The server calculates \( ((c - db_1) * (c - db_2) * (c - db_{...}))^r \), or ideally should. In this test, we merely multiply on r. This would be non grata for production usage.

// Represent a database by a listlet server_hashes = vec![...];// Generate a number to disguise server valueslet server_r: u32 = random();// Encrypt the values so we can make homomorphic operations on them. This// is not secure and does not require the client key.let e_server_r = FheUint32::encrypt_trivial(server_r);let e_server_hashes = server_hashes   .iter()   .map(|&server_hash| FheUint32::encrypt_trivial(server_hash));// calculate the difference between the server values and the client values.// If they are identical, the result will be 0. Multiply it all together, if// a single value was zero, the resulting value is 0let test_equal = e_server_hashes   .map(|e_server_hash| &e_client_value - &e_server_hash)   .fold(FheUint32::encrypt_trivial(1), |acc, e| &acc * &e);// Exponentiate on r to avoid releasing information about the servers database// Note: We multiply instead of taking the power as no exponentiation function// is present and iteratively adding would be too computationally expensive.return &test_equal * &e_server_r;

The resulting value can be decrypted and tested by the client.

Privacy Engineering

This piece of technology fits in the realm of privacy engineering. It is a sub problem of doing generalized privacy preserving operations.

In particular this is one tool out of several that allows to release only specific information in a privacy preserving manner, in this case the identity of an intersecting set.

Other tools exists for other types of data. Homomorphic encryption allows for a broad range of protocols to be build on it due to its general nature. Unfortunately the schemes are still slow and for some schemes values decay quickly making elaborate computation difficult.

Another tehnique to release data in a privacy preserving way is by using differential privacy. There are different schemes for differential privacy though the key idea to to add noise in a way to disquises the original values. In comparison to homomorphic encryption schemes, differential pricavy works when release data for aggregate analysis.

References

A more practical way would be to take the same path as the BitML people and implement the scheme using TenSEAL in Python or another vectorized implementation of homomorphic encryption.

To further understand this field, I can really recommend BitML's article: Private Set Intersection from Homomorphic Encryption: A Python Implementation.