CHAPTER 2
Background
In this chapter, we introduce the background knowledge related to federated learning, covering privacy-preserving machine learning techniques and data analytics.
2.1 PRIVACY-PRESERVING MACHINE LEARNING
Data leakage and privacy violation incidents have brought about heightened public awareness of the need for AI systems to be able to preserve user privacy and data confidentiality. Researchers are interested in developing techniques that build privacy-preserving properties into machine learning (ML) systems. The resulting systems are known as privacy-preserving machine learning (PPML) systems. In fact, 2018 was considered a breakout year for PPML [Mancuso et al., 2019]. PPML is a broad term that generally refers to ML equipped with defense measures for protecting user privacy and data security. The system security and cryptography community has also proposed various secure frameworks for ML.
In Westin [1968], Westin defined information privacy as follows: “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.” This essentially defines the right to control the access and handling of one’s information. The main idea of information privacy is to have control over the collection and handling of one’s personal data [Mendes and Vilela, 2017].
In this chapter, we will introduce several popular approaches used in PPML including secure multi-party computation (MPC), homomorphic encryption (HE) for privacy-preserving model training and inference, as well as differential privacy (DP) for preventing unwanted data disclosure. Privacy-preserving gradient descent methods will also be discussed.
2.2 PPML AND SECURE ML
Before going into the details of PPML, we first clarify the difference between PPML and secure ML. PPML and secure ML differ mainly in the types of security violations that they are designed to deal with [Barreno et al., 2006]. In secure ML, the adversary (i.e., attacker) is assumed to violate the integrity and availability of a data-analytic system, while in PPML, the adversary is assumed to violate the privacy and confidentiality of an ML system.
Most of the time, compromise in security is caused by the intentional attack by a third party. We are concerned with three major types of attacks in ML.
• Integrity attack. An attack on integrity may result in intrusion points being classified as normal (i.e., false negatives) by the ML system.
• Availability attack. An attack on availability may lead to classification errors (both false negatives and false positives) such that the ML system becomes unusable. Availability attacks can be viewed as a broader class of integrity attacks.
• Confidentiality attack. An attack on confidentiality may result in sensitive information (e.g., training data or model) of an ML system being leaked.
Table 2.1 gives a comparison between PPML and secure ML in terms of security violations, adversary attacks, and defense techniques.
Table 2.1: Comparison between PPML and secure ML
In this chapter, we mainly focus on PPML and defense techniques against privacy and confidentiality violations in ML. Interested readers can refer to Barreno et al. [2006] for a more detailed explanation of secure ML.
2.3 THREAT AND SECURITY MODELS
2.3.1 PRIVACY THREAT MODELS
In order to preserve privacy and confidentiality in ML, it is important to understand the possible threat models. In ML tasks, the participants usually take on three different roles: (1) the input party (e.g., the data owner); (2) the computation party (e.g., the model builder and inference service provider); and (3) the result party (e.g., the model querier and user) [Bogdanov et al., 2014].
Attacks on ML may happen in any stage, including data publishing, model training, and model inference. Attribute-inference attacks can happen in the data publishing stage, where adversaries may attempt to de-anonymize or target data-record owners for malevolent purposes. The attacks during ML model training are called reconstruction attacks, where the computation party aims to reconstruct the raw data of the data providers or to learn more information about the data providers than what the model builders intend to reveal.
For federated learning, reconstruction attacks are the major privacy concern. In the inference phase of ML models, an adversarial result party may conduct reconstruction attacks, model inversion attacks, or membership-inference attacks, using reverse-engineering techniques to gain extra information about the model or the raw training data.
Reconstruction Attacks. In reconstruction attacks, the adversary's goal is to extract the training data or feature vectors of the training data during ML model training or model inference. In centralized learning, raw data from different data parties are uploaded to the computation party, which makes the data vulnerable to adversaries, such as a malicious computation party. Large companies may collect raw data from users to train an ML model. However, the collected data may be used for other purposes or sent to a third party without informed consent from the users. In federated learning, each participating party carries out ML model training using its local data. Only the model weight updates or gradient information are shared with other parties. However, the gradient information may also be leveraged to reveal extra information about the training data [Aono et al., 2018]. Plain-text gradient updating may also violate privacy in some application scenarios. To resist reconstruction attacks, ML models that store explicit feature values such as support vector machine (SVM) and k-nearest neighbors (kNN) should be avoided. During model training, secure multi-party computation (MPC) [Yao, 1982] and homomorphic encryption (HE) [Rivest et al., 1978] can be used to defend against such attacks by keeping the intermediate values private. During model inference, the party computing the inference result should only be granted black-box access to the model. MPC and HE can be leveraged to protect the privacy of the user query during model inference. MPC, HE, and their corresponding applications in PPML will be introduced in Sections 2.4.1 and 2.4.2, respectively.
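To make the gradient-leakage risk concrete, the following minimal sketch (our own illustration; the model, data, and variable names are chosen for the example and are not taken from the works cited above) shows that, for a single-example logistic-regression update, the plain-text gradient reveals the training example exactly: the gradient with respect to the weights equals (prediction − label) · x, so dividing it by the gradient with respect to the bias recovers x.

```python
import numpy as np

# Illustration only: a single-example logistic-regression step whose
# plain-text gradient is observed by an adversary.
rng = np.random.default_rng(0)
x = rng.normal(size=5)           # private training example
y = 1.0                          # private label
w, b = rng.normal(size=5), 0.0   # current model parameters

p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # model prediction
grad_w = (p - y) * x                     # gradient w.r.t. the weights
grad_b = p - y                           # gradient w.r.t. the bias

# Reconstruction: the shared gradients alone reveal x, since grad_w / grad_b = x.
x_reconstructed = grad_w / grad_b
print(np.allclose(x, x_reconstructed))   # True
```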
Model Inversion Attacks. In model inversion attacks, the adversary is assumed to have either white-box access or black-box access to the model. In the case of white-box access, the adversary knows the clear-text model but does not hold stored feature vectors. In the case of black-box access, the adversary can only query the model with data and collect the responses. The adversary's goal is to extract the training data, or feature vectors of the training data, from the model. A black-box adversary may also reconstruct the clear-text model from the responses by conducting an equation-solving attack. Theoretically, for an N-dimensional linear model, an adversary can steal the model with N + 1 queries, which can be formalized as solving for θ from N + 1 pairs (x, hθ(x)). The adversary can also learn a similar model from the query-response pairs to simulate the original model. To resist model inversion attacks, less knowledge of the model should be exposed to the adversary. Access to the model should be limited to black-box access, and the output should be limited as well. Several strategies have been proposed to reduce the success rate of model inversion attacks. Fredrikson et al. [2015] choose to report only rounded confidence values. Al-Rubaie and Chang [2016] take the predicted class labels as the response, and the aggregated prediction results of multiple testing instances are returned to further enhance model protection. Bayesian neural networks combined with homomorphic encryption have been developed [Xie et al., 2019] to resist such attacks during secure neural network inference.
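The equation-solving attack sketched above can be illustrated in a few lines of code. The sketch below is our own illustration, assuming a simple linear model hθ(x) = θᵀx + θ0; it recovers the secret parameters from N + 1 black-box queries by solving a linear system.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 4
theta = rng.normal(size=N + 1)           # secret model: N weights plus a bias

def query(x):
    """Black-box access to the linear model h_theta(x) = theta[:N] @ x + theta[N]."""
    return theta[:N] @ x + theta[N]

# N + 1 queries with (almost surely) linearly independent augmented inputs [x, 1].
X = rng.normal(size=(N + 1, N))
A = np.hstack([X, np.ones((N + 1, 1))])  # augmented design matrix
b = np.array([query(x) for x in X])

theta_stolen = np.linalg.solve(A, b)     # equation-solving attack
print(np.allclose(theta, theta_stolen))  # True: model recovered exactly
```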
Membership-Inference Attacks. In membership-inference attacks, the adversary has black-box access to a model, as well as a certain sample, as its knowledge. The adversary's goal is to infer, based on the ML model output, whether the sample belongs to the training set of the model. The adversary conducts such attacks by finding and leveraging differences between the model's predictions on samples from the training set and its predictions on other samples. Defense techniques proposed to resist model inversion attacks, such as result generalization by reporting rounded prediction results, have also been shown to be effective in thwarting membership-inference attacks [Shokri et al., 2017]. Differential privacy (DP) [Dwork et al., 2006] is a major approach to resisting membership-inference attacks, which will be introduced in Section 2.4.3.
Attribute-Inference Attacks. In attribute-inference attacks, the adversary tries to de-anonymize or target record owners for malevolent purposes. Anonymization by removing personally identifiable information (PII) (also known as sensitive features), such as user IDs and names, before data publishing appears to be a natural approach for protecting user privacy. However, it has been shown to be ineffective. For example, Netflix, the world's largest online movie rental service provider, released a movie rating dataset containing anonymized movie ratings from 500,000 subscribers. Despite the anonymization, Narayanan and Shmatikov [2008] managed to leverage this dataset along with the Internet Movie Database (IMDB) as background knowledge to re-identify the Netflix users in the records, and further managed to deduce the users' apparent political preferences. This incident shows that anonymization fails in the face of strong adversaries with access to alternative background knowledge. To deal with attribute-inference attacks, group anonymization privacy approaches have been proposed in Mendes and Vilela [2017]. Privacy preservation in group anonymization is achieved via generalization and suppression mechanisms.
Model Poisoning Attacks. It has been shown that federated learning may be vulnerable to model poisoning attacks [Bhagoji et al., 2019], also known as backdoor attacks [Bagdasaryan et al., 2019]. For example, a malicious participant in federated learning may inject a hidden backdoor functionality into the trained federated model, e.g., to cause a trained word predictor to complete certain sentences with an attacker-chosen word [Bagdasaryan et al., 2019]. Bhagoji et al. [2019] proposed a number of strategies to carry out model poisoning attacks, such as boosting the malicious participant's model update, an alternating minimization strategy that alternately optimizes the legitimate training loss and the adversarial backdoor objective, and using parameter estimation for the benign updates to improve attack success. Bagdasaryan et al. [2019] developed a new model-poisoning methodology based on model replacement, where a constrain-and-scale technique is proposed to evade anomaly detection-based defenses by incorporating the evasion into the attacker's loss function during model training. Possible defenses against model poisoning attacks include blockchain-based approaches [Preuveneers et al., 2018] and trusted execution environment (TEE) based approaches [Mo and Haddadi, 2019].
2.3.2 ADVERSARY AND SECURITY MODELS
For cryptographic PPML techniques, including MPC and HE, two types of adversaries are considered in the literature.
• Semi-honest adversaries. In the semi-honest (a.k.a. honest-but-curious, or passive) adversary model, the adversaries abide by the protocol honestly, but also attempt to learn more than the output from the information they receive.
• Malicious adversaries. In the malicious (a.k.a. active) adversary model, the adversaries deviate from the protocol and can behave arbitrarily.
The semi-honest adversary model is widely considered in most PPML studies. The main reason is that, in federated learning, it is beneficial for each party to honestly follow the ML protocol, since malicious behavior would also undermine the benefits that the adversary itself derives from the collaboration. The other reason is that, in cryptography, it is standard practice to first build a protocol secure against semi-honest adversaries and then modify it to be secure against malicious adversaries via zero-knowledge proofs.
For both security models, the adversaries corrupt a fraction of the parties, and the corrupted parties may collude with each other. The corruption of parties can be static or adaptive. The complexity of an adversary can be either polynomial-time or computationally unbounded, corresponding to computational security and information-theoretic security, respectively. Security in cryptography is based on the notion of indistinguishability. Interested readers can refer to Lindell [2005] and Lindell and Pinkas [2009] for detailed analyses of adversary and security models.
2.4 PRIVACY PRESERVATION TECHNIQUES
In this section, we discuss privacy preservation techniques. We cover three types of such approaches, namely (1) MPC, (2) HE, and (3) DP.
2.4.1 SECURE MULTI-PARTY COMPUTATION
Secure Multi-Party Computation (MPC), a.k.a. secure function evaluation (SFE), was initially introduced by Andrew Yao as a secure two-party computation problem (the famous Millionaire's Problem) [Yao, 1982], and was generalized in 1986 [Yao, 1986]. In MPC, the objective is to jointly compute a function of the private inputs of all parties, without revealing those inputs to the other parties. MPC tells us that, for any functionality, it is possible to compute it without revealing anything other than the output.
Definition
MPC allows us to compute functions of private input values so that each party learns only the corresponding function output value, but not the input values of the other parties. For example, given a secret value x that is split into n shares so that a party Pi only knows xi, all parties can collaboratively compute

y1,…, yn = f(x1,…, xn)

so that party Pi learns nothing beyond the output value yi corresponding to its own input xi.
The standard approach to proving that an MPC protocol is secure is the simulation paradigm [Lindell, 2017]. To prove that an MPC protocol is secure against adversaries that corrupt t parties under the simulation paradigm, we build a simulator that, when given the inputs and outputs of the t colluding parties, generates t transcripts that are indistinguishable from the transcripts generated in the actual protocol.
In general, MPC can be implemented through three different frameworks, namely: (1) Oblivious Transfer (OT) [Keller et al., 2016, Goldreich et al., 1987]; (2) Secret Sharing (SS) [Shamir, 1979, Rabin and Ben-Or, 1989]; and (3) Threshold Homomorphic Encryption (THE) [Cramer et al., 2001, Damgård and Nielsen, 2003]. From a certain point of view, both oblivious transfer protocols and threshold homomorphic encryption schemes use the idea of secret sharing. This might be the reason why secret sharing is widely regarded as the core of MPC. In the rest of this section, we will introduce oblivious transfer and secret sharing.
Oblivious Transfer
OT is a two-party computation protocol proposed by Rabin in 1981 [Rabin, 2005]. In OT, the sender owns a database of message-index pairs (M1, 1),…, (MN, N). At each transfer, the receiver chooses an index i for some 1 ≤ i ≤ N, and receives Mi. The receiver does not learn any other information about the database, and the sender does not learn anything about the receiver’s selection i. Here, we give the definition of 1-out-of-n OT.
Definition 2.1 1-out-of-n OT: Suppose Party A has a list (x1,…, xn) as its input and Party B has an index i ∈ {1,…, n} as its input. 1-out-of-n OT is an MPC protocol where A learns nothing about i and B learns nothing other than xi.
When n = 2, we get 1-out-of-2 OT which has the following property: 1-out-of-2 OT is universal for two-party MPC [Ishai et al., 2008]. That is, given a 1-out-of-2 OT protocol, one can conduct any secure two-party computation.
Many constructions of OT have been proposed, such as the Bellare–Micali [Bellare and Micali, 1990], Naor–Pinkas [Naor and Pinkas, 2001], and Hazay–Lindell [Hazay and Lindell, 2010] approaches. Here, we demonstrate the Bellare–Micali construction of OT, which utilizes Diffie–Hellman key exchange and is based on the computational Diffie–Hellman (CDH) assumption [Diffie and Hellman, 1976]. The Bellare–Micali construction works as follows: the receiver sends two public keys to the sender. The receiver only holds one private key, corresponding to one of the two public keys, and the sender does not know which public key it is. Then, the sender encrypts the two messages with their corresponding public keys, and sends the ciphertexts to the receiver. Finally, the receiver decrypts the target ciphertext with the private key.
Bellare–Micali Construction. Consider a discrete logarithm setting (G, g, p), where G is a group of prime order p, g ∈ G is a generator, and H : G → {0, 1}^n is a hash function. Suppose the sender A has x0, x1 ∈ {0, 1}^n, and the receiver B has b ∈ {0, 1}. The construction proceeds as follows (a toy code sketch is given after the four steps).
1. A chooses a random element c ← G and sends it to B.
2. B chooses k ← Zp and sets PKb = g^k and PK1−b = c/PKb, then sends PK0 to A. A sets PK1 = c/PK0.
3. A encrypts x0 under PK0 with the hashed ElGamal scheme, i.e., C0 = [g^r0, H(PK0^r0) ⊕ x0], and encrypts x1 under PK1, i.e., C1 = [g^r1, H(PK1^r1) ⊕ x1], where r0, r1 ← Zp. Then, A sends (C0, C1) to B.
4. B decrypts Cb = [vb, wb] using the private key k to obtain xb = H(vb^k) ⊕ wb.
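The following toy sketch walks through the four steps above in Python. It uses deliberately tiny, insecure parameters (a safe prime q = 23 and a hash-derived one-time pad) purely to illustrate the message flow; it is an illustration under these assumptions, not a secure implementation.

```python
import hashlib
import secrets

# Toy Bellare-Micali OT over a tiny group (insecure parameters, illustration only).
# q = 23 is a safe prime; g = 2 generates its subgroup of prime order p = 11.
q, p, g = 23, 11, 2

def H(element, length):
    """Hash a group element to a one-time pad of `length` bytes."""
    return hashlib.sha256(str(element).encode()).digest()[:length]

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

x0, x1 = b"msg-zero", b"msg-one!"        # sender A's two messages
choice = 1                               # receiver B's private bit b

# Step 1: A samples a random group element c and sends it to B.
c = pow(g, secrets.randbelow(p - 1) + 1, q)

# Step 2: B picks k, sets PK_b = g^k and PK_{1-b} = c / PK_b, and sends PK0 to A.
k = secrets.randbelow(p - 1) + 1
PK_b = pow(g, k, q)
PK_other = (c * pow(PK_b, -1, q)) % q
PK0, PK1 = (PK_b, PK_other) if choice == 0 else (PK_other, PK_b)
# (A derives PK1 = c / PK0 on its own side; the value is identical.)

# Step 3: A encrypts x0 under PK0 and x1 under PK1 with hashed ElGamal.
r0, r1 = secrets.randbelow(p - 1) + 1, secrets.randbelow(p - 1) + 1
C0 = (pow(g, r0, q), xor(H(pow(PK0, r0, q), len(x0)), x0))
C1 = (pow(g, r1, q), xor(H(pow(PK1, r1, q), len(x1)), x1))

# Step 4: B can decrypt only C_b with its private key k.
v, w = C1 if choice == 1 else C0
print(xor(H(pow(v, k, q), len(w)), w))   # b'msg-one!' when choice == 1
```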
Yao’s Garbled Circuit (GC). [Yao, 1986] is a well-known OT-based secure two-party computation protocol that can evaluate any function. The key idea of Yao’s GC is to decompose the computational circuits into generation and evaluation stages. The circuits consisting of gates like AND, OR, and NOT can be used to compute any arithmetic operation. Each party is in charge of one stage and the circuit is garbled in each stage, so that any of them cannot get information from the other one, but they can still achieve the result according to the circuit. GC consists of an OT protocol and a block cipher. The complexity of the circuit grows at least linearly with the input size. Soon after GC was proposed, GMW [Goldreich et al., 1987] extended GC to the multi-party setting against malicious adversaries. For more detailed survey of GC, readers can refer to Yakoubov [2017].
OT Extension. Impagliazzo and Rudich [1989] showed that OT provably requires “public-key” type of assumptions (such as factoring, discrete log, etc.). However, Beaver [1996] pointed out that OT can be “extended” in the sense that it is enough to generate a few “seed” OTs based on public-key cryptography, which can then be extended to any number of OTs using symmetric-key cryptosystems only. OT extension is now widely applied in MPC protocols [Keller et al., 2016, Mohassel and Zhang, 2017, Demmler et al., 2015] to improve efficiency.
Secret Sharing
Secret sharing is a concept of hiding a secret value by splitting it into random parts and distributing these parts (a.k.a. shares) to different parties, so that each party holds only one share and thus only one piece of the secret [Shamir, 1979, Beimel, 2011]. Depending on the specific secret sharing scheme used, all or a known threshold of the shares are needed to reconstruct the original secret value [Shamir, 1979, Tutdere and Uzunkol, 2015]. For example, Shamir's secret sharing is constructed based on polynomial equations and provides information-theoretic security, and it can also be made efficient via matrix-computation speedups [Shamir, 1979]. There are several types of secret sharing, mainly including arithmetic secret sharing [Damgård et al., 2011], Shamir's secret sharing [Shamir, 1979], and binary secret sharing [Wang et al., 2007]. As arithmetic secret sharing is mostly adopted by existing MPC-based PPML approaches, and binary secret sharing is closely related to OT, which is discussed in Section 2.4.1, here we focus on arithmetic secret sharing.
Consider that a party Pi wants to share a secret S among n parties in a finite field Fq. To share S, the party Pi randomly samples n − 1 values {sk}k≠i from Zq and sets si = S − Σk≠i sk mod q. Then, Pi distributes sk to party Pk, for k ≠ i. We denote the shared S as 〈S〉 = {sk}k∈{1,…,n}.
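A minimal sketch of this additive sharing is given below (our own illustration; the modulus and the helper names share and reconstruct are arbitrary choices). Sharing samples the random parts and reconstruction simply sums them modulo q, exactly as described above.

```python
import secrets

q = 2**61 - 1  # an illustrative prime modulus

def share(secret, n):
    """Split `secret` into n additive shares that sum to it modulo q."""
    shares = [secrets.randbelow(q) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % q)
    return shares

def reconstruct(shares):
    return sum(shares) % q

s = 123456789
parts = share(s, 5)
assert reconstruct(parts) == s
# Any subset of n-1 shares is uniformly distributed and reveals nothing about s.
```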
The arithmetic addition operation is carried out locally at each party. Secure multiplication is performed by using Beaver triples [Beaver, 1991]. The Beaver triples can be generated in an offline phase. The offline phase (i.e., preprocessing) acts as a trusted dealer that generates Beaver triples {(〈a〉, 〈b〉, 〈c〉) | ab = c} and distributes the shares among the n parties.
To compute 〈z〉 = 〈x · y〉, each party Pi first computes 〈e〉 = 〈x〉 − 〈a〉 and 〈f〉 = 〈y〉 − 〈b〉. Then, e and f are reconstructed. Finally, each party Pi computes 〈z〉 = 〈c〉 + e〈y〉 + f〈x〉 locally, and one designated party Pj subtracts the public value ef from its share. We denote element-wise multiplication of vectors as 〈·〉 ⨀ 〈·〉.
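The following sketch (our own illustration, with the trusted dealer simulated in-process and an arbitrary prime modulus) walks through one Beaver-triple multiplication between two parties, using the reconstruction formula given above.

```python
import secrets

q = 2**61 - 1  # illustrative prime modulus

def share(v, n=2):
    s = [secrets.randbelow(q) for _ in range(n - 1)]
    s.append((v - sum(s)) % q)
    return s

def reconstruct(shares):
    return sum(shares) % q

# Offline phase (simulated trusted dealer): a Beaver triple with c = a * b.
a, b = secrets.randbelow(q), secrets.randbelow(q)
a_sh, b_sh, c_sh = share(a), share(b), share((a * b) % q)

# Online phase: the two parties hold shares of the private inputs x and y.
x, y = 1234, 5678
x_sh, y_sh = share(x), share(y)

# Each party locally computes its shares of e = x - a and f = y - b,
# and e, f are then opened (reconstructed) among the parties.
e = reconstruct([(xi - ai) % q for xi, ai in zip(x_sh, a_sh)])
f = reconstruct([(yi - bi) % q for yi, bi in zip(y_sh, b_sh)])

# Each party computes <z> = <c> + e<y> + f<x>; one party also subtracts e*f.
z_sh = [(ci + e * yi + f * xi) % q for ci, yi, xi in zip(c_sh, y_sh, x_sh)]
z_sh[0] = (z_sh[0] - e * f) % q

assert reconstruct(z_sh) == (x * y) % q  # the parties now hold shares of x*y
```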
Secure multiplication can also be performed by leveraging Gilboa's protocol [Gilboa, 1999], in which l-bit arithmetic multiplication can be conducted via l 1-out-of-2 OTs. Suppose that Party A holds x and Party B holds y. Gilboa's protocol results in A holding 〈z〉A and B holding 〈z〉B such that z = x · y. Let l be the maximum length of the binary representation of the numbers involved in the protocol, and denote the i-th bit of x as x[i], for i = 0, 1,…, l − 1. The secure two-party multiplication via OT can be conducted as follows (a minimal code sketch is given after the steps).
1. A represents x in binary format.
2. B acts as the sender in l 1-out-of-2 OTs on l-bit strings. For the i-th OT, B randomly picks ai,0, computes ai,1 = 2^i · y − ai,0, and uses (−ai,0, ai,1) as its input pair.
3. A inputs x[i] as the choice bit in the i-th OT and obtains x[i] · 2^i · y − ai,0.
4. A computes 〈z〉A = Σi (x[i] · 2^i · y − ai,0), and B computes 〈z〉B = Σi ai,0.
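Below is a minimal simulation of this OT-based multiplication (our own sketch). The 1-out-of-2 OT is replaced by a trivial placeholder function that simply returns the chosen message, so the sketch only checks the arithmetic of the resulting shares, not the security of the transfer.

```python
import secrets

MOD = 2**64   # shares live in Z_{2^64} (illustrative choice)
L = 64        # bit length of the inputs

def ot_1_of_2(m0, m1, choice_bit):
    """Placeholder for a 1-out-of-2 OT: the receiver gets only m_choice."""
    return m1 if choice_bit else m0

x = 31337     # Party A's private input
y = 271828    # Party B's private input

share_A, share_B = 0, 0
for i in range(L):
    a_i0 = secrets.randbelow(MOD)            # B's random mask for the i-th OT
    a_i1 = (2**i * y - a_i0) % MOD           # a_{i,1} = 2^i * y - a_{i,0}
    bit = (x >> i) & 1                       # A's choice bit x[i]
    # A receives x[i] * 2^i * y - a_{i,0}; B keeps a_{i,0} as its share.
    share_A = (share_A + ot_1_of_2((-a_i0) % MOD, a_i1, bit)) % MOD
    share_B = (share_B + a_i0) % MOD

assert (share_A + share_B) % MOD == (x * y) % MOD
```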
The offline phase can be carried out efficiently with the help of a semi-honest dealer who generates Beaver triples and distributes them among all the parties. To perform such a preprocessing step without a semi-honest dealer, there are several protocols available, such as SPDZ [Damgård et al., 2011], SPDZ-2 [Damgård et al., 2012], MASCOT [Keller et al., 2016], and HighGear [Keller et al., 2018].
• SPDZ is an offline protocol in the preprocessing model based on somewhat homomorphic encryption (SHE) in the form of BGV, first described in Damgård et al. [2011].
• SPDZ-2 [Damgård et al., 2012] is a protocol based on threshold SHE cryptography (with a shared decryption key).
• MASCOT is an oblivious-transfer-based protocol, proposed in Keller et al. [2016]. It is far more computationally efficient than SPDZ and SPDZ-2.
• In 2018, Keller et al. [2018] developed a BGV-based SHE protocol, called the HighGear protocol, which achieves better performance than the MASCOT protocol.
Application in PPML
Various MPC-based approaches have been designed and implemented for PPML in the past. Most MPC-based PPML approaches leverage a two-phase architecture, comprising an offline phase and an online phase. The majority of cryptographic operations are conducted in the offline phase, where multiplication triples are generated. The ML model is then trained in the online phase using the multiplication triples generated in the offline phase. DeepSecure [Rouhani et al., 2017] is a GC-based framework for secure neural network inference, where the inference function has to be represented as a Boolean circuit. The computation and communication cost in GC depend only on the total number of AND gates in the circuit.
SecureML [Mohassel and Zhang, 2017] is another two-party framework for PPML employing the two-phase architecture. Parties in federated learning distribute arithmetic shares of their data among two non-colluding servers, which run secure two-party model training protocols. Both linearly HE (LHE)-based and OT-based protocols are proposed for multiplication triple generation in the offline phase. The online phase is based on arithmetic secret sharing and division GC. Therefore, only linear operations are allowed in model training, and various approximations are applied to nonlinear functions.
The Chameleon framework is another hybrid MPC framework for neural network model inference, which builds on ABY [Demmler et al., 2015]. Arithmetic secret sharing is used to conduct linear operations, while GC and GMW [Goldreich et al., 1987] are used for nonlinear operations. Conversion protocols are also implemented to convert data representations between the different protocols.
Privacy-preserving ID3 learning based on OT is addressed in Lindell and Pinkas [2002]. Shamir's threshold secret sharing is used for secure model aggregation in PPML with security against both honest-but-curious and malicious adversaries [Bonawitz et al., 2017], where a group of clients perform MPC to evaluate the average of their private input models and disclose the average to the parameter server for model update. Recently, MPC-based approaches pursuing security against a maliciously corrupted majority have been studied. For example, linear regression and logistic regression training and evaluation with SPDZ are studied in Chen et al. [2019]. Damgård et al. [2019] embrace SPDZ2k [Cramer et al., 2018] for actively secure private ML against a dishonest majority, implementing decision tree and SVM evaluation algorithms.
2.4.2 HOMOMORPHIC ENCRYPTION
HE is generally considered as an alternative approach to MPC in PPML. HE can also be used to achieve MPC as discussed in Section 2.4.1. The concept of HE was proposed in 1978 by Rivest et al. [1978] as a solution to perform computation over ciphertext without decrypting the ciphertext. Since then, numerous attempts have been made by researchers all over the world to design such homomorphic schemes.
The encryption system proposed by Goldwasser and Micali [1982] was a provably secure encryption scheme that achieved a remarkable level of security. It allows an additive operation over ciphertext, but is able to encrypt only a single bit. In 1999, Paillier [1999] proposed another provably secure encryption system that allows additive operations over ciphertext; it has been widely used in various applications. A few years later, in 2005, Boneh et al. [2005] invented a provably secure encryption system that allows an unlimited number of additive operations and one multiplicative operation. Gentry made a breakthrough in 2009 and proposed the first HE scheme that supports both additive and multiplicative operations an unlimited number of times [Gentry, 2009].
Definition
An HE scheme H is an encryption scheme that allows certain algebraic operations to be carried out on the encrypted content, by applying an efficient operation to the corresponding ciphertext (without knowing the decryption key). An HE scheme H consists of a set of four functions:

H = {KeyGen, Enc, Dec, Eval},

where
- KeyGen: Key generation. A cryptographic generator g is taken as the input. For asymmetric HE, a pair of keys {pk, sk} = KeyGen(g) are generated, where pk is the public key for encryption of the plaintext and sk is the secret (private) key for decryption of the ciphertext. For symmetric HE, only a secret key sk = KeyGen(g) is generated.
- Enc: Encryption. For asymmetric HE, an encryption scheme takes the public key pk and the plaintext m as the input, and generates the ciphertext c = Encpk (m) as the output. For symmetric HE, an HE scheme takes the secret key sk and the plaintext m, and generates ciphertext c = Encsk(m).
- Dec: Decryption. For both symmetric and asymmetric HE, the secret key sk and the ciphertext c are taken as the input to produce the corresponding plaintext m = Decsk(c).
- Eval: Evaluation. The function Eval takes the ciphertext c and the public key pk (for asymmetric HE) as the input, and outputs a ciphertext corresponding to the plaintext on which the desired function has been evaluated.
Let Encenk(·) denote the encryption function with enk as the encryption key. Let M denote the plaintext space and C denote the ciphertext space. A secure cryptosystem is called homomorphic if it satisfies the following condition:

Encenk(m1 ⨀M m2) ← Encenk(m1) ⨀C Encenk(m2), ∀m1, m2 ∈ M,

for some operators ⨀M in M and ⨀C in C, where ← indicates that the left-hand side term is equal to or can be directly computed from the right-hand side term without any intermediate decryption. In this book, we denote the homomorphic encryption operator as ⟦·⟧, and we overload the addition and multiplication operators over ciphertexts as follows (a toy code sketch illustrating both operations is given after the two bullets below).
• Addition: Decsk(⟦u⟧ ⨀C ⟦v⟧) = Decsk(⟦u + v⟧), where “⨀C” may represent multiplication of the ciphertexts (see, e.g., Paillier [1999]).
• Scalar multiplication: Decsk(⟦u⟧ ⨀C n) = Decsk(⟦u · n⟧), where “⨀C” may represent taking the power of n of the ciphertext (see, e.g., Paillier [1999]).
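As an illustration of these two homomorphic operations, the sketch below implements a toy version of Paillier's additively homomorphic scheme, following the standard textbook description of Paillier [1999], with tiny, insecure primes chosen purely for the example. Multiplying ciphertexts adds the underlying plaintexts, and exponentiating a ciphertext scales the plaintext, matching the two bullets above.

```python
import math
import secrets

# Toy Paillier scheme with tiny primes (illustration only; real deployments
# use primes of 1024 bits or more).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1                        # standard choice of generator
lam = math.lcm(p - 1, q - 1)     # lambda(n)
mu = pow(lam, -1, n)             # lambda^{-1} mod n

def encrypt(m):
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

u, v = 20, 23
cu, cv = encrypt(u), encrypt(v)

# Addition: multiplying ciphertexts decrypts to the sum of the plaintexts.
assert decrypt((cu * cv) % n2) == (u + v) % n
# Scalar multiplication: exponentiating a ciphertext scales the plaintext.
assert decrypt(pow(cu, 5, n2)) == (5 * u) % n
```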
Categorization of HE Schemes
HE schemes can be categorized into three classes: Partially Homomorphic Encryption (PHE), Somewhat Homomorphic Encryption (SHE), and Fully Homomorphic Encryption (FHE). In general, for HE schemes, the computational complexity increases as the functionality grows. Here, we provide a brief introduction to different types of HE schemes. Interested readers can refer to Armknecht et al. [2015] and Acar et al. [2018] for more details regarding different classes of HE schemes.
Partially Homomorphic Encryption (PHE). For PHE, both (M, ⨀M) and (C, ⨀C) are groups. The operator ⨀C can be applied on ciphertexts an unlimited number of times. PHE is a group homomorphism technique. Specifically, if ⨀M is the addition operator, the scheme is additively homomorphic, and if ⨀M is the multiplication operator, the scheme is multiplicatively homomorphic. The references Rivest et al. [1978] and ElGamal [1985] describe two typical multiplicative HE schemes. Examples of additive HE schemes can be found in Goldwasser and Micali [1982] and Paillier [1999].
Somewhat Homomorphic Encryption (SHE). An HE scheme is called SHE if some operations (e.g., addition and multiplication) can be applied only a limited number of times. Some literature also refers to schemes that support arbitrary operations over only limited circuits (e.g., branching programs [Ishai and Paskin, 2007] or garbled circuits [Yao, 1982]) as SHE. Examples are BV [Brakerski and Vaikuntanathan, 2011], BGN [Boneh et al., 2005], and IP [Ishai and Paskin, 2007]. SHE schemes introduce noise for security. Each operation on the ciphertext increases the noise of the output ciphertext, with multiplication being the main contributor to noise growth. When the noise exceeds an upper bound, decryption can no longer be conducted correctly. This is why most SHE schemes allow only a limited number of operations.
Fully Homomorphic Encryption (FHE). FHE schemes allow both additive and multiplicative operations over ciphertexts an unlimited number of times. It is worth noting that additive and multiplicative operations are the only two operations needed to compute arbitrary functions. Consider A, B ∈ F2. The NAND gate can be constructed as 1 + A · B. Thanks to its functional completeness, the NAND gate can be used to construct any gate. Therefore, any functionality can be evaluated by FHE. There are four main families of FHE [Acar et al., 2018]: (1) ideal lattice-based FHE (see, e.g., Gentry [2009]); (2) approximate-GCD based FHE (see, e.g., Dijk et al. [2010]); (3) (R)LWE-based FHE (e.g., Lyubashevsky et al. [2010] and Brakerski et al. [2011]); and (4) NTRU-like FHE (see, e.g., López-Alt et al. [2012]).
The existing FHE schemes are built on SHE by assuming circular security and implementing an expensive bootstrap operation. The bootstrap operation re-encrypts the ciphertexts, by evaluating the decryption and encryption functions over the ciphertexts and the encrypted secret key, in order to reduce the noise of the ciphertext for further computation. As a result of the costly bootstrap operation, FHE schemes are very slow and not competitive against general MPC approaches in practice. Researchers are now focusing on finding more efficient SHE schemes that satisfy certain requirements, instead of trying to develop an FHE scheme. In addition, FHE schemes assume circular security (a.k.a. key-dependent message (KDM) security), which keeps the secret key secure by encrypting it with the public key. However, no FHE scheme has been proven to be semantically secure with respect to arbitrary functions, nor to be IND-CCA1 secure [Acar et al., 2018].
Application in PPML
Many research efforts based on HE have been devoted to PPML in the past. For example, Hardy et al. [2017] proposed algorithms for privacy-preserving two-party logistic regression for vertically partitioned data. Paillier’s scheme is leveraged in secure gradient descent to train the logistic regression model, where constant-multiplication and addition operations are conducted via a mask encrypted by Paillier’s scheme and the intermediate data computed by each party. The encrypted masked intermediate results are exchanged between the two parties in the secure gradient descent algorithm. Finally, the encrypted gradient is sent to a coordinator for decryption and model update.
CryptoNets [Gilad-Bachrach et al., 2016] is an HE-based methodology announced by Microsoft Research that allows secure evaluation (inference) of encrypted queries over already trained neural networks on cloud servers: queries from the clients can be classified securely by the trained neural network model on a cloud server without inferring any information about the query or the result. The CryptoDL [Hesamifard et al., 2017] framework is a leveled HE-based approach for secure neural network inference. In CryptoDL, several activation functions are approximated using low-degree polynomials and mean-pooling is used as a replacement for max-pooling. The GAZELLE [Juvekar et al., 2018] framework is proposed as a scalable and low-latency system for secure neural network inference. In GAZELLE, to conduct secure nonlinear function evaluation in neural network inference, HE and traditional secure two-party computation techniques (such as GC) are combined in an intricate way. The packed additive homomorphic encryption (PAHE) embraced in GAZELLE allows single instruction multiple data (SIMD) arithmetic homomorphic operations over encrypted data.
FedMF [Chai et al., 2019] uses Paillier's HE for secure federated matrix factorization, assuming an honest-but-curious server and honest clients. Secure federated transfer learning via Paillier's HE scheme is studied in Liu et al. [2019], where the need for a semi-honest third party is removed by mixing HE with additive secret sharing in the decryption process.
2.4.3 DIFFERENTIAL PRIVACY
DP was originally developed to facilitate secure analysis over sensitive data. With the rise of ML, DP has become an active research field again in the ML community, motivated by the fact that many exciting results from DP can be applied to PPML [Dwork et al., 2016, 2006]. The key idea of DP is to confuse the adversaries when they try to query individual information from the database, so that adversaries cannot distinguish individual-level information from the query results.
Definition
DP is a privacy definition initially proposed by Dwork et al. [2006], developed in the context of statistical disclosure control. It provides an information-theoretic security guarantee that the outcome of a function is insensitive to any particular record in the dataset. Therefore, DP can be used to resist membership-inference attacks. The (∊, δ)-differential privacy is defined as follows.
Definition 2.2 (∊, δ)-differential privacy. A randomized mechanism M preserves (∊, δ)-differential privacy if, given any two datasets D and D′ differing by only one record, and for all S ⊂ Range(M),

Pr[M(D) ∈ S] ≤ Pr[M(D′) ∈ S] × e^∊ + δ,

where ∊ is the privacy budget and δ is the failure probability.
The quantity ln (Pr[M(D) = s] / Pr[M(D′) = s]) is called the privacy loss, with ln denoting the natural logarithm operation. When δ = 0, the stronger notion of ∊-differential privacy is achieved.
DP has utility-privacy trade-offs as it introduces noise to data. Jayaraman and Evans [2019] found out that current mechanisms for differential privacy for ML rarely offer acceptable utility-privacy trade-offs: settings that provide limited accuracy loss provide little effective privacy, and settings that provide strong privacy result in large accuracy loss.
Categorization of DP Schemes
Typically, there are two main ways to achieve DP by adding noise to the data. One is the addition of noise calibrated to the sensitivity of a function [Dwork et al., 2006]. The other is choosing noise according to an exponential distribution among discrete values [McSherry and Talwar, 2007].
The sensitivity of a real-valued function expresses the maximum possible change in its value due to the addition or removal of a single sample.
Definition 2.3 Sensitivity. For any two datasets D and D′ differing by only one record, and a function M : D → R^d over an arbitrary domain, the sensitivity of M is the maximum change in its output over all such pairs of datasets:

ΔM = max_{D, D′} ‖M(D) − M(D′)‖,

where ‖·‖ is a norm of the vector. The l1-sensitivity or the l2-sensitivity is defined when the l1-norm or the l2-norm is applied, respectively.
We denote the Laplace distribution with parameter b as Lap(b). Lap(b) has the probability density function p(x | b) = (1/2b) exp(−|x|/b). Given a function M with sensitivity ΔM, the addition of noise drawn from a calibrated Laplace distribution Lap(ΔM/∊) maintains ∊-differential privacy [Dwork et al., 2006].
Theorem 2.4 Given a function M : D → R^d over an arbitrary domain D, for any input X, the function

M̂(X) = M(X) + Lap(ΔM/∊)

provides ∊-differential privacy. The ∊-differential privacy can also be achieved by adding independently generated Laplace noise from the distribution Lap(ΔM/∊) to each of the d output terms.
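A minimal sketch of the Laplace mechanism of Theorem 2.4 is shown below, applied to a counting query whose l1-sensitivity is 1; the function name, dataset, and parameter values are our own illustrative choices.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value + Lap(sensitivity / epsilon) noise (epsilon-DP)."""
    if rng is None:
        rng = np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query has l1-sensitivity 1: adding or removing one record
# changes the count by at most 1.
ages = np.array([23, 35, 41, 29, 52, 60])
true_count = int(np.sum(ages > 30))                          # non-private answer: 4
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)                                           # 0.5-DP release
```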
The addition of Gaussian or binomial noise, scaled to the l2-sensitivity of the function, sometimes yields better accuracy, while only ensuring the weaker (∊, δ)-differential privacy [Dwork et al., 2006, Dwork and Nissim, 2004].
The exponential mechanism [McSherry and Talwar, 2007] is another way to obtain DP. The exponential mechanism is given a quality function q that scores outcomes of a calculation, where higher scores are better. For a given database and ∊ parameter, the quality function induces a probability distribution over the output domain, from which the exponential mechanism samples the outcome. This probability distribution favors high-scoring outcomes, while ensuring ∊-differential privacy.
Definition 2.5 Let q : (D^n × R) → R be a quality function that, given a dataset d ∈ D^n, assigns a score to each outcome r ∈ R. For any two datasets D and D′ differing by only one record, let Δq = max_{r, D, D′} |q(D, r) − q(D′, r)| denote the sensitivity of q. Let M be a mechanism for choosing an outcome r ∈ R given a dataset instance d ∈ D^n. Then, the mechanism M, defined by

M(d, q) = {return r with probability proportional to exp(∊q(d, r)/(2Δq))},

provides ∊-differential privacy.
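The sketch below samples from the distribution of Definition 2.5 for a toy "most common item" selection task; the candidate set, scores, and function names are our own illustrative choices, not part of the original definition.

```python
import numpy as np

def exponential_mechanism(candidates, scores, sensitivity, epsilon, rng=None):
    """Sample a candidate with probability proportional to exp(eps*q/(2*Delta_q))."""
    if rng is None:
        rng = np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    # Subtracting the maximum score keeps the exponentials numerically stable
    # without changing the sampling probabilities.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Privately select the most common item; the quality function is the item
# count, whose sensitivity is 1 (one record changes a count by at most 1).
items = ["A", "B", "C"]
counts = [50, 47, 3]
print(exponential_mechanism(items, counts, sensitivity=1.0, epsilon=1.0))
```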
The DP algorithms can be categorized according to how and where the perturbation is applied.
1. Input perturbation: The noise is added to the training data.
2. Objective perturbation: The noise is added to the objective function of the learning algorithms.
3. Algorithm perturbation: The noise is added to the intermediate values such as gradients in iterative algorithms.
4. Output perturbation: The noise is added to the output parameters after training.
DP still exposes the aggregate statistics of a party, which may themselves be sensitive in some cases, such as financial data, medical data, and other commercial and health applications. Readers who are interested in DP and would like to learn more about it can refer to the tutorial by Dwork and Roth [2014].
Application in PPML
In federated learning, to enable model training on distributed datasets held by multiple parties, local differential privacy (LDP) can be used. With local differential privacy, each input party perturbs its own data and then releases the obfuscated data to the untrusted server. The main idea behind local differential privacy is randomized response (RR).
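A minimal sketch of binary randomized response, together with the standard unbiased de-biasing step performed by the server, is shown below; the ∊ value, population, and function names are our own illustrative choices.

```python
import math
import random

def randomized_response(true_bit, epsilon):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else 1 - true_bit

def estimate_proportion(reports, epsilon):
    """Unbiased server-side estimate of the true fraction of ones."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Each input party perturbs its private bit locally before releasing it.
true_bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, epsilon=1.0) for b in true_bits]
print(estimate_proportion(reports, epsilon=1.0))   # close to 0.7
```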
Papernot et al. [2016] utilized the teacher ensemble framework to first learn a teacher model ensemble from the distributed datasets among all the parties. Then, the teacher model ensemble is used to make noisy predictions on a public dataset. Finally, the labeled public dataset is used to train a student model. The privacy loss is precisely controlled by the number of public data samples inferred by the teacher ensemble. Generative adversarial network (GAN) is further applied in Papernot et al. [2018] to generate synthetic training data for the training of the student model. Although this approach is not limited to a single ML algorithm, it requires adequate data quantity at each location.
The moments accountant is proposed for differentially private stochastic gradient descent (SGD); it computes the overall privacy cost of neural network model training by taking into account the particular noise distribution under consideration [Abadi et al., 2016]. It proves a lower privacy loss for appropriately chosen settings of the noise scale and the clipping threshold.
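The sketch below illustrates the per-example clipping and Gaussian noise addition that differentially private SGD performs at each step (an algorithm-perturbation approach in the taxonomy above). It is a simplified illustration with hypothetical parameter names, not the accounting procedure of Abadi et al. [2016], which tracks the cumulative privacy cost across steps.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD-style step: clip each per-example gradient to clip_norm,
    average the clipped gradients, and add calibrated Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    grad = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return params - lr * (grad + rng.normal(0.0, sigma, size=grad.shape))

# Illustrative usage with synthetic per-example gradients.
rng = np.random.default_rng(0)
params = np.zeros(10)
per_example_grads = rng.normal(size=(32, 10))   # one gradient per example
params = dp_sgd_step(params, per_example_grads, lr=0.1,
                     clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```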
The differentially private Long Short Term Memory (LSTM) language model is built with user-level differential privacy guarantees with only a negligible cost in predictive accuracy [McMahan et al., 2017]. Phan et al. [2017] proposed a private convolutional deep belief network (pCDBN) by leveraging the functional mechanism to perturb the energy-based objective functions of traditional convolutional deep belief networks. Generating differentially private datasets using GANs is explored in Triastcyn and Faltings [2018], where a Gaussian noise layer is added to the discriminator of a GAN to make the output and the gradients differentially private with respect to the training data. Finally, the privacy-preserving artificial dataset is synthesized by the generator. In addition to the DP dataset publishing, differentially private model publishing for deep learning is also addressed in Yu et al. [2019], where concentrated DP and a dynamic privacy budget allocator are embraced to improve the model accuracy.
Geyer et al. [2018] studied differentially private federated learning and proposed an algorithm for client-level DP preserving federated optimization. It was shown that DP on a client level is feasible and high model accuracy can be reached when sufficiently many participants are involved in federated learning.