﻿Identification with Encrypted Biometric Data ∗
arXiv:0901.1062v2 [cs.CR] 7 Sep 2009
Julien Bringer1 , Hervé Chabanne1,2 1,2 †
and Bruno Kindarji
1
Sagem Sécurité, Osny, France.
2
Institut TELECOM, Télécom ParisTech, Paris, France.
Abstract
Biometrics make human identification possible with a sample of a biometric
trait and an associated database. Classical identification techniques
lead to privacy concerns. This paper introduces a new method to
identify someone using his biometrics in an encrypted way.
Our construction combines Bloom Filters with Storage and Locality-
Sensitive Hashing. We apply this error-tolerant scheme, in a Hamming
space, to achieve biometric identification in an efficient way. This is the
first non-trivial identification scheme dealing with fuzziness and encrypted
data.
Keywords. Identification, Biometrics, Privacy, Searchable Encryption.
1 Introduction
The arising of biometric recognition systems is based on the uniqueness of some
natural information every human being carries along. For instance, it is possible
to verify that a given individual is the one he claims to be (Verification). It
is also possible to find someone’s identity among a collection thanks to his
biometrics (Identification).
In this paper, we design a biometric identification system that is based on
encrypted data, so that privacy is guaranteed, and in a way that does not take
too much time and memory to process. For that purpose, we need to find a way
to:
• mitigate the effects of biometrics fuzziness,
• and efficiently identify someone over an encrypted database.
It follows the idea of searchable encryption and we here explain how to make
efficient queries to the database, that look for a pattern close to a given one in
encrypted data, i.e. a search with error-tolerance.
∗ An extended abstract – entitled “Error-Tolerant Searchable Encryption” – of this work
has been accepted to and will be presented at the Communication and Information Systems
Security Symposium, International Conference on Communications (ICC) 2009, June 14-18,
Dresden, Germany. This paper “Identification with Encrypted Biometric Data” is the full
version of our work.
† This work was partially supported by the french ANR RNRT project BACH.
11.1 Related Works and Motivation
Security of biometric systems is widely studied – cf. [31, 33, 4] – and although a
lot of vulnerabilities are now well understood and controlled, it is still difficult
to achieve an end-to-end system which satisfies all constraints. In particular,
biometric template privacy is an important issue due to the non-revocability
and non-renewability of biometric features.
1.1.1 Biometrics and Cryptography
A specific difficulty concerning biometrics is their fuzziness. It is nearly impossible
for a sensor to obtain the same image from a biometric data twice: there
will always be significant differences. The classical way to supersede variations
between different captures is to use a matching function, which basically tells if
two measures represent the same biometric data or not.
The integration of biometrics into cryptographic protocols is thus difficult
as state-of-the-art protocols are not designed for error-tolerance and fuzziness
in their inputs. The two main leads for that are achieving a good stable coding
of the data or making the matching algorithm part of the protocol.
Both sides of the problem are quite hard. The extraction of a constant-length
vector has been studied for the iris [18] and the fingerprint [32, 48]; the result
is a fixed-length bit string on which the matching is realized with the Hamming
distance. Following this, we solely focus in this paper on binary biometric data
compared with Hamming distance.
Most of protocols involving biometric data and cryptography use Secure
Sketches or Fuzzy Extractors [19, 34]. It uses error correction to reduce variations
between the different measures, and to somehow hide the biometric data
behind a random codeword – e.g. [46, 39, 27, 9, 8, 11].
On the other hand, several biometrics verification protocols, e.g. [14, 10, 12,
44, 47], have proposed to embed the matching directly. They use the property of
homomorphic encryption schemes to compute the Hamming distance between
two encrypted templates. Some other interesting solutions based on adaptation
of known cryptographic protocols are also investigated in [7, 13].
The drawback with all these techniques is that they do not fit well with
identification in large databases as the way to run an identification among N
data would be to run almost as many authentication algorithms. As far as we
know, no non-trivial protocol for biometric identification involving privacy and
confidentiality features has been proposed yet.
1.1.2 Identification
Several algorithms have been proposed for the so-called Nearest Neighbour and
Approximate Nearest Neighbour (ANN) problems. Indyk wrote a review on
these topics in [29]. Recently, Hao et al. [26] demonstrated the efficiency of the
ANN approach for iris biometrics where projected values of iris templates are
used to speed up identification requests into a large database; indeed [26] derived
a specific ANN algorithm from the iris structure and statistical properties.However, in their construction the iris biometric data are never encrypted, and
the way they boost the search for the nearest match reveals a large amount of
information about sensitive data.
Our works are also influenced by the problem of finding a match on encrypted
data. Boneh et al. defined the notion of Public-key encryption with Keyword
Search (PEKS) [5], in which specific trapdoors are created for the lookup of
keywords over public-key encrypted messages. Several other papers, e.g. [24,
2, 15, 35, 43], have also elaborated solutions in this field. However the main
difference between the search for a keyword as understood by Boneh et al. [5, 6]
and biometric matching is that an exact match for a given bit string in the
plaintext suffices for the former, but not for our motivation. For this purpose, we
introduce a new model for error-tolerant search in Sec. 3 and specific functions
to take into account fuzziness in Sec. 4.1.
The most significant difference here from the primitives introduced previously
in [5] is that messages are no longer associated to keywords. Moreover,
our primitives enable some imprecision on the message that is looked up. For
example, one can imagine a mailing application, where all the mails are encrypted,
and where it is possible to make queries on the mail subject. If there is
a typo in the query, then looking for the correct word should also give the mail
among the results – at least, we would like that to happen. Note that wildcards
are not well-adapted to this kind of application, as a wildcard permits to catch
errors providing that we know where it is located, whereas error-tolerance does
not have this constraint.
1.2 Construction Outline
We propose to use recent advances done in the fields of similarity searching
and public-key cryptography. Our technique narrows our identification to a few
candidates. In a further step, we must complete it by fine-tuning the results
in checking the remaining identities so that the identification request gets a
definite answer.
The first step is accomplished by combining Bloom filters with localitysensitive
hashing functions. Bloom filters enable to speed up the search for
a specified keyword using a time-space trade-off. We use locality-sensitive hashing
functions to speed the search for the (approximate-)nearest neighbour of an
element in a reference set. Combining these primitives enables to efficiently use
cryptographic methods on biometric templates, and to achieve error-tolerant
searchable encryption.
1.3 Organization
In Section 2 we describe the biometric identification architecture that we consider
and explain our security objectives to reach. Section 3 introduces the
security model for the cryptographic primitives that we use, based on the new
concept of Error-Tolerant Searchable Encryption. We introduce the differentfunctions used for our proposition in Section 4. We give in Section 5 a stepby-step
construction of an error-tolerant searchable scheme, together with its
security analysis. Application to biometric identification is explained in Section
6 and Section 6.2 gives a practical illustration with IrisCodes. Section 7
concludes.
An additional property of symmetric privacy is analyzed in Appendix A.
2 Architecture for Biometric Identification
2.1 Introduction to Biometric Identification
For a given biometrics technology, such as the fingerprint or the iris, let B be
the set of all possible corresponding biometric features – i.e. data which are
captured by biometric sensors. For biometric recognition, a matching algorithm
m : B × B → R is used to compute a dissimilarity score between two data. Its
goal is to differentiate similar data from different ones:
Definition 1 A biometric template b ∈ B is the result of a measurement from
someone’s biometrics thanks to a sensor. For a specific user whose biometrics
is β, we note b ← β the fact that b is a measure of β.
Two different measures of the same user b, b ′ ← β have with high probability
a small score m(b, b ′ ); measures of different users b1 ← β1, b2 ← β2 have a
large value m(b1, b2).
In practice, some thresholds λmin, λmax are chosen and the score is considered
as small (resp. large) if it is less (resp. greater) than the threshold λmin
(resp. λmax). This score is usually enough to determine with some precision if
two measures correspond to the same user or not. Errors, called False Reject
and False Acceptance, are possible but this problem is outside the scope of our
paper.
In the following, we restrict ourselves to B = {0, 1} N equipped with the
Hamming distance d. A biometric template b ∈ B is the result of a measurement
from someone’s biometrics thanks to a sensor. Two different measures b, b ′ of
the same user U are with high probability at a Hamming distance d(b, b ′ ) ≤ λmin
; measures b1, b2 of different users U1, U2 are at a Hamming distance d(b1, b2) >
λmax. In this case, the matching algorithm m simply consists in evaluating the
Hamming distance.
Remark 1 For instance, iris biometric features are binary vectors of length
2048 when coded as IrisCodes following [18]. In this case of IrisCode [18], the
matching algorithm m is related to the computation of a Hamming distance
between two IrisCodes.
A biometric identification system – also called a one-to-many biometric system
– recognizes a person among a collection of templates. A system is given
by a reference data set D ⊂ B and a identification function id : B → P(D). On
input bnew, the system outputs a subset C of D containing biometric templatesbref ∈ D such that the matching score between bnew and bref is small. This
means that bnew and bref possibly corresponds to the same person. C is the ∅
if no such template can be found; the size of C depends on the accuracy of the
system. With pseudo-identities (either real identities of persons or pseudonyms)
registered together with the reference templates in D, the set C gives a list of
candidates for the pseudo-identity of the person associated to bnew.
2.2 Architecture
Our general model for biometric identification relies on the following entities.
• Human users Ui: a set of N users are registered thanks to a sample of their
biometrics βi and pseudo-identities IDi, more human users Uj (j > N)
represent possible impostors with biometrics βj.
• Sensor client SC: a device that extracts the biometric template from βi.
• Identity Provider IP: replies to queries sent by SC by providing an identity,
• Database DB: stores the biometric data.
Remark 2 Here the sensor client is a client which captures the raw image of
a biometric data and extracts its characteristics to output a so-called biometric
template. Consequently, we assume that the sensor client is always honest and
trusted by all other components. Indeed, as biometrics are public information,
additional credentials are always required to establish security links in order to
prevent some well-known attacks (e.g. replay attacks) and to ensure that, with
a high probability, the biometric template captured by the sensor and used in
the system is from a living human user. In other words, we assume that it is
difficult to produce a fake biometric template that can be accepted by the sensor.
In an identification system, we have two main services:
1. Enrolment registers users thanks to their physiological characteristics (for
a user Ui, it requires a biometric sample bi ← βi and its identity IDi)
2. Identification answers to a request by returning a subset of the data which
was registered
The enrolment service can be run each time a new user has to be registered.
Depending on the application, the identification service can output either the
identity of the candidates or their reference templates.
As protection against outsiders, such as eavesdroppers, can be achieved with
classical cryptographic techniques, our main objective is the protection of the
data against insiders. In particular we assume that no attacker is able to interfere
with these communications.2.3 Informal Objectives
We here formulate the properties we would like to achieve in order to meet good
privacy standards.
Condition 1 When the biometric identification system is dealing with the identification
of a template b coming from the registered user Ui with identity IDi, it
should return a subset containing a reference to (IDi, bi) except for a negligible
probability.
Condition 2 When the system is dealing with the identification of a template
b coming from an unregistered user, it should return the empty set ∅ except for
a negligible probability.
We do not want a malicious database to be able to link an identity to a
biometric template, nor to be able to make relations between different identities.
Condition 3 The database DB should not be able to distinguish two enrolled
biometric data.
Another desired property is the fact that the database knows nothing of the
identity of the user who goes through the identification process, for example, to
avoid unwanted statistics.
Condition 4 The database DB should not be able to guess which identification
request is executed.
3 Security Model for Error-Tolerant Searchable
Encryption
In this section, we describe a formal model for an error-tolerant searchable
encryption protocol. A specific construction fitting in this model is described in
Section 5. This scheme enables to approximately search and retrieve a message
stored in a database, i.e. with some error-tolerance on the request. This is
in fact a problem quite close to biometric identification and the corresponding
cryptographic primitives are thus used in our system, cf. Section 6.
In the sequel, we note {m, . . .,n} the set of all integers between m and n
(inclusive).
3.1 Entities for the Protocol
Our primitive models the interactions between users that store and retrieve
information, and a remote server. We distinguish the user who stores the data
from the one who wants to get it. This leads to three entities:
• The server S: a remote storage system. As the server is untrusted, we
consider the content to be public. Communications to and from this server
are also subject to eavesdropping,• The sender X incrementally creates the database, by sending data to S,
• The receiver Y makes queries to the server S.
In a latter part (Sec. 6), we integrate our cryptographic protocols into our
biometric identification system. This doing, we merge the entities defined in
Sec. 2.2 and those just previously introduced.
We emphasize that X and Y are not necessarily the same user, as X has full
knowledge of the database he created whereas Y knows only what he receives
from S.
3.2 Definition of the Primitives
In the sequel, messages are binary strings of a fixed length N, and d(x1, x2) the
Hamming Distance between x1, x2 ∈ {0, 1} N is the canonical distance, i.e. the
number of positions in {1, . . ., N} in which x1 and x2 differ.
Here comes a formal definition of the primitives that enable to perform an
error-tolerant searchable encryption; this definition cannot be parted from the
definition of Completeness(λmin) and ǫ-Soundness(λmax), which follows.
Definition 2 A (ǫ, λmin, λmax)-Public Key Error-Tolerant Searchable Encryption
is obtained with the following probabilistic polynomial-time methods:
• KeyGen(1 k ) initializes the system, and outputs public and private keys
(pk, sk); k is the security parameter. The public key pk is used to store
data on a server, and the secret key sk is used to retrieve information
from that server.
• SendX,S(x, pk) is a protocol in which X sends to S the data x ∈ {0, 1} N
to be stored on the storage system. At the end of the protocol, S associated
an identifier to x, noted ϕ(x).
• RetrieveY,S(x ′ , sk) is a protocol in which, given a fresh data x ′ ∈ {0, 1} N ,
Y asks for the identifiers of all data that are stored on S and are close to
x ′ , with Completeness(λmin) and ǫ-Soundness(λmax). This outputs a set
of identifiers, noted Φ(x ′ ).
These definitions are comforted by the condition 5 of Section 3.3 that defines
Completeness and ǫ-Soundness for the parameters already introduced in Section
2.1, λmin, λmax. In a few words, Completeness implies that a registered message
x is indeed found if the query word x ′ is at a distance less than λmin from x,
while ǫ-Soundness means that with probability greater than 1 − ǫ, no message
at a distance greater than λmax from x ′ will be returned.
The Send protocol produces an output ϕ(x) that identifies the data x. This
output ϕ(x) is meant to be a unique identifier, which is a binary string of
undetermined length – in other words, elements of {0, 1} ⋆ – that enables to
retrieve x. It can be a timestamp, a name or nickname, etc. depending on the
application.3.3 Security Requirements
First of all, it is important that the scheme actually works, i.e. that the retrieval
of a message near a registered one gives the correct result. This can be formalized
into the following condition:
Condition 5 (Completeness(λmin), ǫ-Soundness(λmax)) Let x1, . . .,xp ∈
B = {0, 1} N be p different binary vectors, and let x ′ ∈ B be another binary
vector. Suppose that the system was initialized, that all the messages xi have
been sent by user X to the system S with identifiers ϕ(xi), and that user Y
retrieved the set of identifiers Φ(x ′ ) associated to x ′ .
1. The scheme is said to be complete if the identifiers of all the xi that are
near x ′ are almost all in the resulting set Φ(x ′ ), i.e. if
is negligible.
ηc = Pr
x ′ [∃i s.t. d(x′ , xi) ≤ λmin and ϕ(xi) /∈ Φ(x ′ )]
2. The scheme is said to be ǫ-sound if the probability of finding an unwanted
result in Φ(x ′ ), i.e.
ηs = Pr
x ′ [∃i ∈ {1, . . .,p} s.t. d(x′ , xi) > λmax and ϕ(xi) ∈ Φ(x ′ )] ,
is bounded by ǫ.
The first condition simply means that registered data is effectively retrieved
if the input is close. ηc expresses the probability of failure of this Retrieve
operation.
The second condition means that only the close messages are retrieved, thus
limiting false alarms. ηs measures the reliability of the Retrieve query, i.e. if all
the results are identifiers of messages near to x ′ .
These two properties (Completeness and ǫ-Soundness) are sufficient to have
a working set of primitives which allows to make approximate queries on a
remote storage server. The following conditions, namely Sender Privacy and
Receiver Privacy, ensure that the data stored in the server is secure, and that
communications can be done on an untrusted network.
Condition 6 (Sender Privacy) The scheme is said to respect Sender Privacy
Sender Privacy
if the advantage of any malicious server is negligible in the ExpA
experiment,
described below. Here, A is an “honest-but-curious” opponent taking
the place of S, and C is a challenger at the user side.
Sender Privacy
ExpA
1. (pk,sk) ← KeyGen(1 k ) (C)
2. {x2, . . . , xΩ} ← A (A)
3. ϕ(xi) ← SendX,S(xi, pk) (C)
4. {x0, x1} ← A (A)
5. ϕ(xe) ← SendX,S(xe, pk), (C)
e ∈R {0, 1}
6. Repeat steps (2,3)
7. e ′ ∈ {0, 1} ← A (A)The advantage of the adversary is | Pr[e ′ = e] − 1
2 |.
Informally, in a first step, the adversary receives Send requests that he chose
himself; A then looks for a couple (x0, x1) of messages on which he should have
an advantage. C chooses one of the two messages, and the adversary must guess,
by receiving the Send requests, which one of x0 or x1 it was.
This condition permits to have privacy on the content stored on the server.
The content that the sender transmits is protected, justifying the title “Sender
Privacy”.
Another important privacy aspect is the secrecy of the data that is retrieved.
We do not want the server to have information on the fresh data x ′ that is
queried; this is expressed by the following condition.
Condition 7 (Receiver Privacy) The scheme is said to respect Receiver Pri-
Receiver Privacy
vacy if the advantage of any malicious server is negligible in the ExpA
experiment described below. As in the previous condition, A denotes the “honestbut-curious”
opponent taking the place of S, and C the challenger at the user
side.
Receiver Privacy
ExpA
1. (pk,sk) ← KeyGen(1 k ) (C)
2. {x1, . . . , xΩ} ← A (A)
d(xi, xj) > λmax, ∀i, j ∈ {1, . . . , Ω}
3. ϕ(xi),(i ∈ {1, . . . , Ω}) ← SendX,S(xi, pk) (C)
4. {x ′ 2, . . . , x ′ p} ← A (A)
5. Φ(x ′ j),(j ∈ {2, . . . , p}) ← RetrieveY,S(x ′ j, sk) (C)
6. (x ′ 0, x ′ 1) ← A (A)
7. Φ(x ′ e) ← RetrieveY,S(x ′ e, sk), (C)
e ∈R {0, 1}
8. Repeat steps (4,5)
9. e ′ ∈ {0, 1} ← A (A)
The advantage of the adversary is | Pr[e ′ = e] − 1
2 |.
This condition is the mirror image of the previous one. It transposes the idea
that the receiver Y can make his queries to S without leaking information on
their content. The processing of the experiment is the same as the Sender
Privacy experiment, except that A has to distinguish between Retrieve queries
instead of Send queries.
Remark 3 Conditions 6 and 7 are the transposition of their homonym statement
in [6]. They aim for the same goal, i.e. privacy – against the server – of
the data that is registered first, then looked for.
Section 5 is dedicated to give a construction that fits these security conditions.4 Our Data Structure for Approximate Searching
After the recall of the notions of locality-sensitive hashing and Bloom filters,
we introduce a new structure which enables approximate searching by combining
both notions. We end this section with the introduction of some classical
cryptographic protocols.
In the sequel, we denote [a, b] the interval of all real values between a and b
(inclusive).
4.1 Locality-Sensitive Hashing
We first consider the following problem:
Problem 1 (Approximate Nearest Neighbour Problem) Given a set P
of points in the metric space (B, d) pre-process P to efficiently answer queries.
The answer of a query x is a point px ∈ P such that d(x, px) ≤ (1+ǫ)minp∈P d(x, p).
This problem has been widely studied over the last decades; reviews on the
subject include [29]. However, most algorithms proposed to solve the matter
consider real spaces over the lp distance, which is not relevant in our case. A
way to search the approximate nearest neighbour in a Hamming space is to use a
generic construction called locality-sensitive hashing. It looks for hash functions
(not cryptographic ones) that give the same result for near points, as defined in
[30]:
Definition 3 ([30]) Let B be a metric space, U a set with a smaller dimensionality,
r1, r2 ∈ R with r1 < r2, p1, p2 ∈ [0, 1] with p1 > p2. A family H =
{h1, . . . , hµ}, hi : B → U, is (r1, r2, p1, p2)-LSH (Locality-Sensitive Hashing), if
for all h ∈ H, x, x ′ ∈ B, Pr[h(x) = h(x ′ )] > p1 (resp. Pr[h(x) = h(x ′ )] < p2) if
dB(x, x ′ ) < r1 (resp. dB(x, x ′ ) > r2).
Such functions reduce the differences occurring between similar data with
high probability, whereas distant data should remain significantly remote.
A noticeable example of a LSH family was proposed by Kushilevitz et al. in
[37]; see also [36, 30, 1].
4.2 Bloom Filters
As introduced by Bloom in [3], a set of Bloom filters is a data structure used
for answering set membership queries.
Definition 4 Let D be a finite subset of Y . For a collection of ν (independent)
hash functions H ′ = {h′ 1, . . .,h ′ ν}, with each h ′ i : Y → {1, . . ., m} , the induced
(ν, m)-Bloom filter is H, together with an array (t1, . . . , tm) ∈ {0, 1} m , defined
as:
{
′
1 if ∃i ∈ {1, . . .,ν}, y ∈ D s.t. h i (y) = α
tα =
0 otherwiseWith this setting, testing if y is in D is the same as checking if for all
i ∈ {1, . . ., ν}, th ′
i (y) = 1. The best setting for the filter is that the involved hash
function be as randomized as possible, in order to fill all the buckets tα.
In this setting, some false positive may happen, i.e. it is possible for all
th ′
i (y) to be set to 1 and y /∈ D. This event is well known, and the probability
for a query to be a false positive is:
(
1 − ( 1 − ν
m
) |D|
) ν
.
This probability can be made as small as needed. On the other hand, no
false negative is enabled.
We work here with the Bloom filters with storage (BFS) defined in [6] as
an extension of Bloom filters. Their aim is to give not only the result of the
set membership test, but also an index associated to the element. The iterative
definition below introduces these objects and the notion of tags and buckets
which are used in the construction.
Definition 5 (Bloom Filter with Storage, [6]) Let D be a finite subset of
a set Y . For a collection of ν hash functions H ′
= {h′ 1, . . . , h ′ ν}, with each
h ′ j : Y → {1, . . ., m}, a set V of tags associated to D with a tagging function
ψ : D → P(V ). A (ν, m)-Bloom Filter with Storage is H ′, together with an
array of subsets (T1, . . . , Tm) of V , called buckets, iteratively defined as:
1. ∀i ∈ {1, . . .,m}, Ti ← ∅,
2. ∀y ∈ D, ∀j ∈ {1, . . .,ν}, update the bucket Tα with Tα ← Tα ∪ ψ(y) where
α = h ′ j (y).
In other words, the bucket structure is empty at first, and for each element
y ∈ D to be indexed, we add to the bucket Tα all the tags associated to y.
Construction of such a structure is illustrated in Fig. 1.
h ′ 2 (y3) = 2 h ′
1 (y3) = α h ′ 3 (y3) = m
ψ(y2)
· · · · · ·
ψ(y1) ψ(y1) ψ(y1)
∅
ψ(y3)
ψ(y2)
ψ(y2)
ψ(y3)
ψ(y3)
Figure 1: Construction of Bloom Filters with Storage
Example 1 In Fig. 1, assume that D = {y1, y2, y3} and ν = 3, the tags
associated to y1 (resp. y2) have already been incorporated into the buckets T2, T3
and Tα (resp. T1, T2 and T3) so that T1 = {ψ(y2)}, T2 = T3 = {ψ(y1), ψ(y2)},
Tα = {ψ(y1)} and Ti = ∅ otherwise. We are now treating the case of y3:
• h ′ 1 (y3) = α so Tα ← Tα ∪ {ψ(y3)}, i.e. Tα = {ψ(y1), ψ(y3)};• h ′ 2 (y3) = 2 so T2 ← T2 ∪ {ψ(y3)}, i.e. T2 = {ψ(y1), ψ(y2), ψ(y3)};
• h ′ 3 (y3) = m so Tm ← Tm ∪ {ψ(y3)}, i.e. Tm = {ψ(y3)}.
This construction enables to retrieve a set of tags associated to an element
y ∈ D: it is designed to obtain ψ(y), the set of tags associated to y, by computing
⋂ ν
j=1 T h ′ j (y). For instance, in the previous example, ⋂ ν
j=1 T h ′ j (y3) =
T2 ∩ Tα ∩ Tm = {ψ(y3)}. This intersection may capture inappropriate tags,
but the choice of relevant hash functions and increasing their number allow to
reduce the probability of that event. These properties are summed up in the
following lemma.
Lemma 1 ([3]) Let (H ′ , T1, . . .,Tm) be a (ν, m)-Bloom filter with storage indexing
a set D with tags from a tag set V . Then, for y ∈ D, the following
properties hold:
• ψ(y) ⊂ T(y) = ⋂ ν
j=1 T h ′ j (y), i.e. each of y’s tag is retrieved,
•
(
the probability for a false positive t ∈ V is Pr[t ∈ T(y) and t ̸∈ ψ(y)] =
1 − ( 1 − ν
) )
|D|
ν
m
.
4.3 Combining BFS and LSH
We want to apply Bloom filters to data that are very likely to vary. To this
aim, we first apply LSH-families as input to Bloom filters.
We choose µ hash functions from an adequate LSH family h1, . . .,hµ : B →
{0, 1} t , and ν hash functions dedicated to a Bloom filter with Storage h ′ 1 , . . . , h′ ν :
{0, 1} t × {1, . . ., µ} → {1, . . ., m}. The LSH family is denoted H, and H ′ is the
BFS one. To obtain a BFS with locality-sensitive functionality, we use composite
µ × ν hash functions induced by both families.
We define hc (j,i)
: B → {1, . . .,m} the corresponding composite functions (c
stands for composite) with hc (j,i)(y) = h′ j (hi(y) ‖ i). Let Hc = {hc (j,i), (j, i) ∈
{1, . . ., ν} × {1, . . ., µ}} the set of all these functions.
To sum up, we modify the update of the buckets in Def. 5 by α = h ′ j (hi(y) ‖
i). Later on, to recover tags related to an approximate query x ′
we have to consider is ⋂ν
j=1
⋂
∈ B, all
µ
i=1 Th ′ j (hi(x′)‖i). Indeed, if x and x ′ are close
enough, then the LSH functions give the same results on x and x ′ , effectively
providing a Bloom filter with storage that has the LSH property. This property
is numerically estimated in the following lemma:
Lemma 2 Let H, H ′ , H c be families constructed in this setting. Let x, x ′ ∈ B
be two binary vectors. Assume that H is (λmin, λmax, ǫ1, ǫ2)-LSH from B to
{0, 1} t ; assume that H ′ is a family of ν pseudo-random hash functions. If
the tagging function ψ associates only one tag per element, then the following
properties stand:1. If x and x ′ are far enough, then except with a small probability, ψ(x ′ ) does
not intersect all the buckets that index x, i.e.
Pr
x ′
"
ψ(x ′ ) ⊂ \
Thc (x) and d(x, x ′ # „
) ≥ λmax ≤
h c ∈H c
ǫ2 + (1 − ǫ2) 1
m
« |H
c
|
,
2. If x and x ′ are close enough, then except with a small probability, ψ(x ′ ) is
in all the buckets that index x ′, i.e.
Pr
x ′
"
ψ(x ′ ) ̸⊂ \
Thc (x) and d(x,x ′ #
) ≤ λmin ≤ 1 − (1 − ǫ1) |Hc| .
h c ∈H c
Note that this lemma used the simplified hypothesis that ∀x, |ψ(x)| = 1,
which means that there is only one tag per vector. This has a direct application
in Section 5.2. In practice, ψ(x) can be a unique handle for x.
Sketch of proof. The first part of the lemma expresses the fact that if
d(x, x ′ ) ≥ λmax, due to the composition of a LSH function with a pseudorandom
function, the collision probability is 1
m . Indeed, if h′ 1(y1) = h ′ 2(y2), either y1 = y2
and h ′ 1 = h ′ 2, or there is a collision of two independent pseudo-random hash
function. In our case, if y1 = y2, that means that y1 = hi1(x)||i1 and y2 =
hi2(x ′ )||i2. To these vectors to be the same, i1 = i2 and hi1(x) = hi2(x ′ ), which
happens with probability ǫ2.
The second part of the lemma says that for each h c ∈ H c , h c (x) and h c (x ′ )
are the same with probability 1 − ǫ1. Combining the incremental construction
of the Ti with this property gives the lemma.
✷
4.4 Cryptographic Primitives
Public Key Cryptosystem Our construction requires a semantically secure
public key cryptosystem – as defined in [25], see for instance [20, 42] – to store
some encrypted data in the database. Encryption function is noted Enc and
decryption function Dec, the use of the keys is implicit. An encryption scheme
is said to be semantically secure (against a chosen plaintext attack, also noted
IND-CPA [25]) if an adversary without access to the secret key sk, cannot
distinguish between the encryptions of a message x0 and a message x1.
Private Information Retrieval Protocols A primitive that enables privacyensuring
queries to databases is Private Information Retrieval protocol (PIR,
[17]). Its goal is to retrieve a specific information from a remote server in such a
way that he does not know which data was sent. This is done through a method
Query PIR
Y,S (a), that allows Y to recover the element stored at index a in S by
running the PIR protocol.
Suppose a database is constituted with M bits X = x1, ..., xM. To be secure,
the protocol should satisfy the following properties [23]:
• Soundness: When the user and the database follow the protocol, the
result of the request is exactly the requested bit.• User Privacy: For all X ∈ {0, 1} M , for 1 ≤ i, j ≤ M, for any algorithm
used by the database, it cannot distinguish with a non-negligible
probability the difference between the requests of index i and j.
Among the known constructions of computational secure PIR, block-based
PIR – i.e. working on block of bits – allows to efficiently reduce the cost. The
best performances are from Gentry and Ramzan [22] and Lipmaa [38] with a
communication complexity polynomial in the logarithm of M. Surveys of the
subject are available in [21, 40].
Some PIR protocols are called Symmetric Private Information Retrieval,
when they comply with the Data Privacy requirement [23]. This condition
states that the querier cannot distinguish between a database that possesses
only the information he requested, and a regular one; in other words, that the
querier do not get more information that what he asked.
Private Information Storage (PIS) Protocols PIR protocols enable to
retrieve information of a database. A Private Information Storage (PIS) protocol
[40] is a protocol that enables to write information in a database with
properties that are similar to that of PIR. The goal is to prevent the database
from knowing the content of the information that is being stored; for detailed
description of such protocols, see [6, 41].
Such a protocol provides a method update(val, index), which takes as input
an element and a database index, and puts the value val into the database
entry index. To be secure, the protocol must also satisfy the Soundness and
User Privacy properties, meaning that 1. updateBF does update the database
with the appropriate value, and 2. any algorithm run by the database cannot
distinguish between the writing requests of (vali, indi) and (valj, indj).
5 Our Construction for Error-Tolerant Searchable
Encryption
5.1 Technical Description
Our searching scheme uses all the tools we described in the previous section.
As we will see in section 5.2, this enables to meet the privacy requirements of
section 3.3. More precisely:
• We pick a family H ′ of functions: h ′ : {0, 1} t × {1, . . ., |H|} → {1, . . .,m},
adapted to a Bloom filter structure,
• We choose a family H of functions: h : {0, 1} N → {0, 1} t that have the
LSH property,
• From these two families, we deduce a family H c of functions h c : {0, 1} N →
{1, . . .,m} as specified in Sec. 4.3,• We use a semantically secure public key cryptosystem (Setup, Enc, Dec)
[25],
• We use a PIR protocol with query function Query PIR
Y,S .
• We use a PIS function updateBF(val, i) that adds val to the i-th bucket of
the Bloom filter, see Sec. 4.4.
Here come the details of the implementation. In a few words, storage and
indexing of the data are separated, so that it becomes feasible to search over
the encrypted documents. Indexing is made thanks to Bloom Filters, with an
extra precaution of encrypting the content of all the buckets. Finally, using our
locality-sensitive hashing functions permits error-tolerance.
5.1.1 System setup
The method KeyGen(1 k ) initializes m different buckets to ∅. The public and
secret keys of the cryptosystem (pk, sk) are generated by Setup(1 k ), and sk is
given to Y.
5.1.2 Sending a message
The protocol SendX,S(x, pk) goes through the following steps (cf. Fig. 2):
1. Identifier establishment S attributes to x a unique identifier ϕ(x), and
sends it to X.
2. Data storage X sends Enc(x) to S, who stores it in a memory cell that
depends on ϕ(x).
3. Data indexing
• X computes h c (x) for all h c ∈ H c ,
• and executes updateBF(Enc(ϕ(x)), h c (x)) to send Enc(ϕ(x)) to be
added to the filter’s bucket of index h c (x) on the server side.
Note that for privacy concerns, we complete the buckets with random data in
order to get the same bucket size l for the whole data structure.
The first phase (identifier establishment) is done to create an identifier that
can be used to register and then retrieve x from the database. For example,
ϕ(x) can be the time at which S received x, or the first memory address that is
free for the storage of Enc(x).
The third phase applies the combination of BFS and LSH functions (see Sec.
4.3) to x so that it is possible to retrieve x with some approximate data. This
is done with the procedure described hereafter.1.
mem alloc
X
2.
ϕ(x) = &x
Store Enc(x) at ϕ(x)
Store Enc(ϕ(x)) at all {h c
i (x)} i
S
Figure 2: Sending a message in a nutshell
5.1.3 Retrieving data
The protocol RetrieveY,S(x ′ , sk) goes through the following steps (cf. Fig. 3):
1. Y computes each αi = h c i (x′ ) for each h c i ∈ Hc , then executes Query PIR
Y,S (αi)
to receive the filter bucket Tαi,
2. Y decrypts the content of each bucket Tαi
and computes the intersection
of all the Dec(Tαi),
3. This intersection is a set of identifiers {ϕ(xi1), . . . , ϕ(xiγ)}, which is the
result of the execution of Retrieve.
PIR
αi = h c i (x′ )
Y
S
Tα i
T
i Tα i = {ϕ(xi 1 ), . . . , ϕ(xiγ)}
Figure 3: Retrieving data in a nutshell
As we can see, the retrieving process follows that of Sec. 4.3, with the noticeable
differences that 1. the identifiers are always encrypted in the database,
and 2. the query is made following a PIR protocol. This permits to benefit
from both the Bloom filter structure, the locality-sensitive hashing, and the
privacy-preserving protocols.
The secure protocols involved do not leak information on the requests made,
and the next section discusses more precisely the security properties achieved.5.2 Security Properties
We now demonstrate that this construction faithfully achieves the security requirements
we defined in Sec. 3.3.
Proposition 1 (Completeness) Provided that H is a (λmin, λmax, ǫ1, ǫ2)-LSH
family, for a negligible ǫ1, this scheme is complete.
Proposition 2 (ǫ-Soundness) Provided that H is a (λmin, λmax, ǫ1, ǫ2)-LSH
family from {0, 1} N to {0, 1} t , and provided that the Bloom filter functions H ′
behave like pseudo-random functions from {0, 1} t × {1, . . .,|H|} to {1, . . ., m},
then the scheme is ǫ-sound, with:
„
ǫ =
ǫ2 + (1 − ǫ2) 1
m
« |H
c
|
Propositions 1 and 2 are direct consequence of Lemma 2.
Remark 4 Proposition 2 assumes that the Bloom filter hash functions are pseudorandom;
this hypothesis is pretty standard for Bloom filter analysis. It can be
achieved by using cryptographic hash functions with a random oracle-like behaviour.
Proposition 3 (Sender Privacy) Assume that the underlying cryptosystem
is semantically secure and that the PIS function update BF achieves User Privacy,
then the scheme ensures Sender Privacy.
Proof. If the scheme does not ensure Sender Privacy, that means that there
exists an attacker who can distinguish between the output of Send(x0, pk) and
Send(x1, pk), after the execution of Send(xi, pk), i ∈ {2, . . .,Ω}.
Note that the content of the Bloom filter buckets does not reveal information
that can permit to distinguish between x0 and x1. Indeed, the only information
A has with the filter structure is a set of Enc(ϕ(xi)) placed at different indexes
hc (xi), i = e, 2, . . .,Ω. Thanks to the semantic security of Enc, this does not
permit to distinguish between ϕ(x0) and ϕ(x1).
This implies that, with inputs Enc(xi), updateBF(Enc(ϕ(xi)), hc (xi)) ( for i ≥
2), the attacker can distinguish between Enc(x0), updateBF(Enc(ϕ(x0)), hc (x0))
and Enc(x1), updateBF(Enc(ϕ(x1)), hc (x1)).
As updateBF does not leak information on its inputs, that means that the
attacker can distinguish between Enc(x0) and Enc(x1) by choosing some other
inputs to Enc. That contradicts the semantic security assumption. ✷
Proposition 4 (Receiver Privacy) Assume that the PIR ensures User Privacy,
then the scheme ensures Receiver Privacy.
Proof.
This property is a direct deduction of the PIR’s User Privacy, as the only
information S gets from the execution of a Retrieve is a set of Query PIR . ✷
These properties show that this protocol for Error-Tolerant Searchable Encryption
has the security properties that we looked for. LSH functions are used
in such a way that they do not degrade the security properties of the system.6 Application to Identification with Encrypted
Biometric Data
6.1 Our Biometric Identification System
We now apply our construction for error-tolerant searchable encryption to our
biometric identification purpose. Thanks to the security properties of the above
construction, this enables us to design a biometric identification system which
achieves the security objectives stated in Section 2.3.
While applying the primitives of error-tolerant searchable encryption, the
database DB takes the place of the server S; the role of the Identity Provider
IP varies with the step we are involved in. During the Enrolment step, IP
behaves as X, and as Y during the Identification step. In this step, IP is in
possession of the private key sk used for the Retrieve query.
6.1.1 Enrolment
• To enrol a user Ui, the sensor SC acquires a sample bi from his biometrics
and sends it to IP,
• The Identity Provider IP then executes SendX,S(bi, pk).
6.1.2 Identification
• SC captures a fresh biometric template b ′ from a user U and sends it to
IP,.
• The Identity Provider IP then executes RetrieveY,S(b ′ , sk).
At the end of the identification, IP has the fresh biometric template b ′ along
with the address of the candidate reference templates in DB. To reduce the list
of identities, we can use a secure matching scheme [12, 44] to run a final secure
comparison between b ′ and the candidates.
6.2 Practical Considerations
6.2.1 Choosing the LSH family: an Example
Let’s place ourself in the practical setting of human identification through iris
recognition. A well-known method to doing so is to use Daugman’s IrisCode [18].
This extracts a 2048-bit vector, along with a “mask”, that defines the relevant
information in this vector. Iris recognition is then performed by computing a
simple Hamming distance; vectors that are at a Hamming distance less than a
given threshold are believed to come from the same individual, while vectors
that come from different eyes will be at a significantly larger distance.
There are several paths to design LSH functions adapted to this kind of
data. Random projections such as those defined in [37], is a convenient way to
create LSH functions for binary vectors. However, for the sake of simplicity, wepropose to use the functions used in [26], in which they are referred as ’beacon
indexes’. These functions are based on the fact that all IrisCode bits do not
have the same distribution probability.
In a few words, these functions first reorder the bits of the IrisCode by rows,
so that in each row, the bits that are the most likely to induce an error are
the least significant ones. The column are then reordered to avoid correlations
between following bits. The most significant bits of rows are then taken as 10-bit
hashes. The efficiency of this approach is demonstrated in [26] where the authors
apply these LSH functions to identify a person thanks to his IrisCode. They
interact with the UAE database which contains N = 632500 records; trivial
identification would then require about N/2 classical matching computation,
which is way too much for a large database. Instead, they apply µ = 128 of
those hashes to the biometric data, and look for IrisCodes that get the same
LSH results for at least 3 functions. In doing this, they limit the number of
necessary matching to 41 instead of N.
To determine the LSH capacity of these hash functions is not easy to do
with real data; however, if we model b and b ′ as binary vectors such that the
each bit of b is flipped with a fixed probability (i.e. if b ′ is obtained out of
b through a binary symmetric channel), then the family induced is (r1, r2, 1 −
(1 − r1
2048 )10, (1 − r2
2048 )10)-LSH. This estimation is conservative as IrisCodes are
designed for biometric matching.
Combining these functions with a Bloom filter with storage in the way described
in Sec. 4.3 enables to have an secure identification scheme.
6.2.2 Overall complexity and efficiency
We here evaluate the computational complexity of an identification request on
the client’s side. We note κ(op) the cost of operation op, and |S| the size of the
set S. Recalling Section 5.1, the overall cost of a request is:
κ(request)
= |H c |(κ(hash) + κ(PIR) + |T |κ(Dec)) + κ(intersection)
≤ |H c | (κ(hBF) + κ(hLSH) + κ(PIR) + |T |κ(Dec)) + O(|T ||H c |)
We here used data structures in which intersection of sets is linear in the set
length, hence the term O(|T ||H c |); |T | is the maximum size of a Bloom filter
with storage bucket.
To conclude this complexity estimation, let us recall that the cost of a hash
function can be neglected in front of the cost of a decryption step. The PIR
query complexity at the sensor level depends on the scheme used (recall that
the PIR query is made only over the set of buckets and not over the whole
database); in the case of Lipmaa’s PIR [38], this cost κ(PIR) is dominated by
the cost of a Damg˚ard-Jurik encryption. The overall sensor complexity of an
identification request is O(µν(|T |κ(Dec) + κ(PIR))).7 Conclusion
This paper details the first non-trivial construction for biometric identification
over encrypted binary templates. This construction meets the privacy model
one can expect from an identification scheme and the computation costs are
sublinear in the size of the database.
We studied identification scheme using binary data, together with Hamming
distance. We plan to extend our scope to other metrics. A first lead to follow is
to use techniques from [37] which reduce the problem of ANN over Euclidean
spaces into ANN over a Hamming space.
References
[1] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate
nearest neighbor in high dimensions. Commun. ACM, 51(1):117–122, 2008.
[2] J. Bethencourt, D. X. Song, and B. Waters. New constructions and practical
applications for private stream searching (extended abstract). In IEEE
Symposium on Security and Privacy, pages 132–139. IEEE Computer Society,
2006.
[3] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors.
Commun. ACM, 13(7):422–426, 1970.
[4] Ruud M. Bolle, Jonathan H. Connell, and Nalini K. Ratha. Biometric
perils and patches. Pattern Recognition, 35(12):2727–2738, 2002.
[5] D. Boneh, G. Di Crescenzo, R. Ostrovsky, and G. Persiano. Public key
encryption with keyword search. In Cachin and Camenisch [16], pages
506–522.
[6] D. Boneh, E. Kushilevitz, R. Ostrovsky, and W. E. Skeith III. Public
key encryption that allows PIR queries. In CRYPTO, volume 4622, pages
50–67. Springer, 2007.
[7] J. Bringer and H. Chabanne. An authentication protocol with encrypted
biometric data. In Serge Vaudenay, editor, AFRICACRYPT, volume 5023
of Lecture Notes in Computer Science, pages 109–124. Springer, 2008.
[8] J. Bringer, H. Chabanne, G. Cohen, B. Kindarji, and G. Zémor. Theoretical
and practical boundaries of binary secure sketches. IEEE Transactions on
Information Forensics and Security, 3(4):673–683, 2008.
[9] J. Bringer, H. Chabanne, and Q. D. Do. A fuzzy sketch with trapdoor.
IEEE Transactions on Information Theory, 52(5):2266–2269, 2006.
[10] J. Bringer, H. Chabanne, M. Izabachène, D. Pointcheval, Q. Tang, and
S. Zimmer. An application of the Goldwasser-Micali cryptosystem to biometric
authentication. In Josef Pieprzyk, Hossein Ghodosi, and Ed Dawson,editors, ACISP, volume 4586 of Lecture Notes in Computer Science, pages
96–106. Springer, 2007.
[11] J. Bringer, H. Chabanne, and B. Kindarji. The best of both worlds: Applying
secure sketches to cancelable biometrics. Science of Computer Programming,
74(1–2):43–51, 2008. Special Issue on Security and Trust.
[12] J. Bringer, H. Chabanne, D. Pointcheval, and Q. Tang. Extended private
information retrieval and its application in biometrics authentications. In
Feng Bao, San Ling, Tatsuaki Okamoto, Huaxiong Wang, and Chaoping
Xing, editors, CANS, volume 4856 of Lecture Notes in Computer Science,
pages 175–193. Springer, 2007.
[13] J. Bringer, H. Chabanne, D. Pointcheval, and S. Zimmer. An application
of the boneh and shacham group signature scheme to biometric authentication.
In Kanta Matsuura and Eiichiro Fujisaki, editors, IWSEC, volume
5312 of Lecture Notes in Computer Science, pages 219–230. Springer, 2008.
[14] J. Bringer, H. Chabanne, and Q. Tang. An application of the Naccache-
Stern knapsack cryptosystem to biometric authentication. In AutoID, pages
180–185. IEEE, 2007.
[15] J. W. Byun, D. H. Lee, and J. Lim. Efficient conjunctive keyword search
on encrypted data storage system. In Andrea S. Atzeni and Antonio Lioy,
editors, EuroPKI, volume 4043, pages 184–196. Springer, 2006.
[16] C. Cachin and J. Camenisch, editors. Advances in Cryptology - EURO-
CRYPT 2004, International Conference on the Theory and Applications
of Cryptographic Techniques, Interlaken, Switzerland, May 2-6, 2004, Proceedings,
volume 3027 of LCNS. Springer, 2004.
[17] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan. Private information
retrieval. J. ACM, 45(6):965–981, 1998.
[18] J. Daugman. High confidence visual recognition of persons by a test
of statistical independence. IEEE Trans. Pattern Anal. Mach. Intell.,
15(11):1148–1161, 1993.
[19] Y. Dodis, L. Reyzin, and A. Smith. Fuzzy extractors: How to generate
strong keys from biometrics and other noisy data. In Cachin and Camenisch
[16], pages 523–540.
[20] T. El Gamal. A public key cryptosystem and a signature scheme based on
discrete logarithms. In CRYPTO, pages 10–18, 1984.
[21] W. I. Gasarch. A survey on private information retrieval.
http://www.cs.umd.edu/∼gasarch/pir/pir.html.
[22] C. Gentry and Z. Ramzan. Single-database private information retrieval
with constant communication rate. In ICALP, volume 3580 of LCNS, pages
803–815. Springer, 2005.[23] Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin. Protecting data privacy
in private information retrieval schemes. In STOC, pages 151–160, 1998.
[24] E.-J. Goh. Secure indexes. Cryptology ePrint Archive, Report 2003/216,
2003. http://eprint.iacr.org/2003/216/.
[25] S. Goldwasser and S. Micali. Probabilistic encryption. J. Comput. Syst.
Sci., 28(2):270–299, 1984.
[26] F. Hao, J. Daugman, and P. Zielinski. A fast search algorithm for a large
fuzzy database. IEEE Trans. on Inf. Forensics and Security, 2008.
[27] Feng Hao, R. Anderson, and J. Daugman. Combining crypto with biometrics
effectively. Computers, IEEE Transactions on, 55(9):1081–1088, Sept.
2006.
[28] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung. Proactive secret
sharing or: How to cope with perpetual leakage. In CRYPTO, volume 963,
pages 339–352. Springer, 1995.
[29] P. Indyk. Nearest neighbors in high-dimensional spaces. In Handbook of
Discrete and Computational Geometry, chapter 39. CRC Press, 2004. 2rd
edition.
[30] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing
the curse of dimensionality. In Symposium on the Theory Of Computing,
pages 604–613, 1998.
[31] A. K. Jain, K. Nandakumar, and A. Nagar. Biometric template security.
EURASIP Journal on Advances in Signal Processing,, Special Issue on Advanced
Signal Processing and Pattern Recognition Methods for Biometrics,
2008.
[32] A. K. Jain, S. Prabhakar, L. Hong, and S. Pankanti. Fingercode: A filterbank
for fingerprint representation and matching. In CVPR, pages 2187–.
IEEE Computer Society, 1999.
[33] A. K. Jain, A. Ross, and U. Uludag. Biometric template security: Challenges
and solutions. In Proc. of 13th European Signal Processing Conference
(EUSIPCO), Antalya, Turkey, September 2005.
[34] A. Juels and M. Wattenberg. A fuzzy commitment scheme. In ACM Conference
on Computer and Communications Security, pages 28–36, 1999.
[35] D. Khader. Public key encryption with keyword search based on K-resilient
IBE. In ICCSA (3), volume 3982, pages 298–308. Springer, 2006.
[36] A. Kirsch and M. Mitzenmacher. Distance-sensitive Bloom filters. In Algorithm
Engineering & Experiments, Jan 2006.[37] E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate
nearest neighbor in high dimensional spaces. In Symposium on the
Theory Of Computing, pages 614–623, 1998.
[38] H. Lipmaa. An oblivious transfer protocol with log-squared communication.
In ISC, volume 3650, pages 314–328. Springer, 2005.
[39] Karthik Nandakumar, Abhishek Nagar, and Anil K. Jain. Hardening fingerprint
fuzzy vault using password. In Seong-Whan Lee and Stan Z. Li,
editors, ICB, volume 4642 of Lecture Notes in Computer Science, pages
927–937. Springer, 2007.
[40] R. Ostrovsky and V. Shoup. Private information storage (extended abstract).
In STOC, pages 294–303, 1997.
[41] R. Ostrovsky and W. E. Skeith III. Algebraic lower bounds for computing
on encrypted data. Cryptology ePrint Archive, Report 2007/064, 2007.
http://eprint.iacr.org/.
[42] P. Paillier. Public-key cryptosystems based on composite degree residuosity
classes. In Advances in Cryptology, Proceedings of EUROCRYPT ’99,
volume 1592 of LCNS, pages 223–238. Springer, 1999.
[43] E.-K. Ryu and T. Takagi. Efficient conjunctive keyword-searchable encryption.
In AINA Workshops (1), pages 409–414. IEEE Computer Society,
2007.
[44] B. Schoenmakers and P. Tuyls. Efficient binary conversion for Paillier
encrypted values. In EUROCRYPT, volume 4004, pages 522–537. Springer,
2006.
[45] A. Shamir. How to share a secret. Commun. ACM, 22(11):612–613, 1979.
[46] Yagiz Sutcu, Qiming Li, and N. Memon. Protecting biometric templates
with sketch: Theory and practice. Information Forensics and Security,
IEEE Transactions on, 2(3):503–512, Sept. 2007.
[47] Q. Tang, J. Bringer, H. Chabanne, and D. Pointcheval. A formal study
of the privacy concerns in biometric-based remote authentication schemes.
In Liqun Chen, Yi Mu, and Willy Susilo, editors, ISPEC, volume 4991 of
Lecture Notes in Computer Science, pages 56–70. Springer, 2008.
[48] P. Tuyls, A. H. M. Akkermans, T. A. M. Kevenaar, G. Jan Schrijen, A. M.
Bazen, and R. N. J. Veldhuis. Practical biometric authentication with
template protection. In Audio-and Video-Based Biometrie Person Authentication,
volume 3546, pages 436–446. Springer, 2005.A Achieving Symmetric Receiver Privacy
In this section, we introduce a new security concern, which we call Symmetric
Receiver Privacy.
A.1 Condition statement
This property aims at limiting the amount of information that Y gets through
the protocol. Indeed, if previous constructions of Searchable Encryption such
as [5, 24] seem to consider that the sender and the receiver are the same person,
thus owning the database in the same way, there are applications where the
receiver must not dispose of the entire database. If for example different users
Yi have access to the application, we do not want user Yi to obtain information
on another user Yj’s data.
For this purpose, we define a database simulator S1. S1(x ′ ) is a simulator
which only knows the tags of the registered elements that are in Φ(x ′ ), while
the other elements are random. Here, x ′ stands for the message to be retrieved.
On the other hand, S0 is the regular server, which genuinely runs the protocol.
Condition 8 (Symmetric Receiver Privacy) The scheme is said to respect
Symmetric Receiver Privacy if there exists a simulator S1 such that the advantage
of any malicious receiver is negligible in the Exp Sym-Rec-Privacy
A
experiment
described below. Here, A is the ’honest-but-curious’ opponent taking the place
of Y, and C the challenger at the server side.
Exp Sym-Rec-Privacy
A
1. (pk, sk) ← KeyGen(1 k ) (A)
2. {x1, . . . , xΩ}, ← A (A)
d(xi, xj) > λmax, ∀i, j ∈ {1, . . . , Ω}
3. ϕ(xi) ← SendA,S(xi, pk) (A)
4. e ∈R {0, 1} ← C (C)
5. {x ′ 1, . . . , x ′ p}, ← A (A)
d(x ′ i, x ′ j) > λmax, ∀i, j ∈ {1, . . . , p}
6. Φ(x ′ i) ← RetrieveA,Se(x ′ i, sk) (A)
7. e ′ ∈ {0, 1} ← A (A)
The advantage of the adversary is | Pr[e ′ = e] − 1
2 |.
This new condition does not fit into previous models for Searchable Encryption,
and is not satisfied by constructions such as [6, 24]. It is inspired by the
Data Privacy property of SPIR protocols, which states that it is not possible to
tell whether or not S possesses more data than the received messages. Indeed,
if the receiver is able to tell the difference between a server S0 that possess
more data than what Y received, and a server S1 that just has in memory the
information that Y needs, then Y detains more information than what he ought
to; that is why this indistinguishability game fits the informal description of
Symmetric Receiver Privacy.
Section A.3 is dedicated to give a construction that also fits this security
conditions.A.2 Specific Tools
ElGamal For this purpose, we specify a second cryptosystem (Setup, Enc, Dec)
to be that of ElGamal: let G be a cyclic group of order q, a large prime, with g a
generator; let f be another generator of G. Setup renders the key pair (h = gv , v)
for v a random integer. Encryption Enc takes a random value r, and computes
= x can be computed thanks
Enc(x) = (gr, hrx). The value Dec ((y1, y2)) = y2
yv 1
to the secret key v. The homomorphic property is Dec(Enc(x)Enc(x ′)) = xx ′.
Secret splitting Let s be a small secret; we wish to split s into n rerandomizable
parts. There is a general technique for this, called Proactive
Secret Sharing [45, 28], but for clarity reasons, we propose a simple technique
for this. We construct n shares A1, . . . , An such that Ai = g ri where ri is a random
integer, for i ∈ {1, . . .,n −1} and An = g − P ri+s
, where g is the generator
of a group of large prime order q. Recovering s can be done by multiplying all
the Ai, and then proceeding to an exhaustive search to compute the discrete
logarithm of gs in basis g. Re-randomization of the parts Ai can easily be done
by choosing a random integer t, and replacing each Ai by At i . The generator for
the discrete logarithm must then be replaced by gt .
A.3 Extending our Scheme
The scheme proposed in Sec. 5.1 does not achieve Symmetric Receiver Privacy.
For example, the user Y has access to all the ϕ(xi) such that there exists h c , h c 0 ∈
H c , h c (xi) = h c 0 (x′ ). Without further caution, a malicious user could get more
information than what he ought to. We here describe an example of a protocol
variant that leads to the desired properties.
We will apply secret splitting to the tags ϕ(x) returned by Send. That
implies that we consider the range of ϕ(x) to be relatively small, for example of
32-bit long integers. Primitives are adapted this way:
• KeyGen(1 k ) is unchanged, but here both Setup and Setup are used to
generate (pk, sk),
• SendX,S(x, pk) is slightly modified, namely:
1. Identifier establishment (unchanged) S attributes to x a unique
identifier ϕ(x), and sends it to X.
2. Data storage (unchanged) X sends Enc(x) to S, who stores it in
a memory cell that depends on ϕ(x).
3. Data indexing
– X splits the tag ϕ(x) into |Hc| shares Ax,1, . . . , Ax,|Hc | thanks
to the method described above, and picks a random integer rx,
(x), and executes the queries
– X computes all h c i
updateBF((Enc(f rx ), Enc(Ax,i)), h c i(x))to send (Enc(f rx ), Enc(Ax,i)) to be added to the filter’s bucket
of index h c i (x), where hc i is the i-th function of H c , for i ∈
{1, . . .,|H c |}.
At the end of this update, the bucket Tα of the filter is filled with l couples
(Enc(f zα,j ), Enc(Bα,j)), j ∈ {1, . . ., l}. Bα,j is a share of some tag, or a
random element of the group.
• RetrieveY,S(x ′ , sk) is adapted consequently:
1. (unchanged) Y computes each αi = h c i (x′ ) for hi ∈ H c , then executes
Query PIR
Y,S (αi),
2. – S first re-randomizes the content of each bucket of the Bloom
filter database by the same random value. The filter bucket
Tαi
= {(Enc(f zα i ,j ), Enc(Bαi,j)), j ∈ {1, . . .,l}} becomes
T c1,c2
α i
= {(Enc(f z α i ,j
)
c1
, Enc(Bαi,j) c2
), j ∈ {1, . . . , l}}
– S then answers to the PIR Query, and sends along g c2 to Y,
3. Y decrypts the content of each bucket T c1,c2
αi
(f zα i ,jc1 , B c2
αi,j ),
to get a set of couples
4. If the same element fzc1 is present in the intersection of all the different
sets T c1,c2, then Y possesses all shares of a tag ϕ(x), and then
αi
computes ∏ |H c |
i=1 Ac2 x,i = (gc2) ϕ(x) ,
5. Y finally runs a discrete logarithm of (g c2 ) ϕ(x) in basis g c2 , and adds
ϕ(x) to the set of results Φ(x ′ ).
Note that this scheme can also be generalized for other Proactive Secret
Sharing schemes.
A.4 Security Properties
This new scheme is an extension of the previous one, and the same security properties
are achieved. Moreover, Condition 8 also holds. Indeed, the modification
to the Send procedure is not significant enough to alter the Sender Privacy property:
the only modification on S’s side is the content of the updateBF procedure,
which does not leak. Moreover, the Receiver Privacy property is also preserved,
as communications from Y to S in Retrieve only involves a PIR query.
Proposition 5 (Symmetric Receiver Privacy) Assume the PIR ensures Data
Privacy i.e. it is a SPIR, and that H is a (λmin, λmax, ǫ1, ǫ2)-LSH family with
a negligible ǫ2, then the scheme ensures Symmetric Receiver Privacy, over the
Decisional Diffie Hellman hypothesis.
To demonstrate this proposition, let us begin with a preliminary Lemma.Lemma 3 Let s1, . . .,st ∈ S be t different secrets, with |S| small. Let Ai,1, . . .,Ai,n
be the n parts of the secret si split thanks to the method described in Sec. A. Let
, i ∈ {1, . . .,t}, j ∈ {1, . . ., n}, c ∈ {1, . . .,q}} be collection of k such
π0 ⊂ {Ac i,j
parts, and π1 = {gr1, . . .,g rk
} a set of k random elements of the cyclic group G.
Under the DDH assumption, if an adversary A can distinguish between π0
and π1, then there exists c0 ∈ {1, . . .,q}, i ∈ {1, . . .,t} such that {A c0
i,1 , . . . , Ac0 i,n } ⊂
π0.
Sketch of proof
Let (g, g a , g b , g c ) be an instance of the DDH problem. An adversary can solve
this instance if he can tell, with non-negligible probability, whether g c = g ab or
not.
We take t = 1, because all secrets are independent, and n = 2 (it is easy to
take n > 2 by multiplying the parts and returning to the case n = 2). Suppose
the lemma is false, that means that there exists a polynomial algorithm A that
takes as inputs couples (g cu , g cur ) and (g cv , g cv(s−r) ), with cu ̸= cv, and that
returns the secret s with non-negligible probability.
We then give as input to A the couples (g, g a ) and (g b , g bs−c ) for s ∈ S. If
A returns s, that means that gc−bs = gb(a−s) = gbag−bs . We finally have an
advantage on the DDH problem; that proves the lemma.
✷
Proof of Proposition 5
We now build a simulator S1 for the server in order to prove the proposition.
Let x ′ be the request and Φ(x ′ ) = {ϕ(x1), . . . , ϕ(xk)} be the genuine
answer to Retrieve(x ′ , sk). First, the simulator generates Ω random elements
{z1, . . . , zΩ} ⊂ {1, . . .,q}; he associates the first k elements to the elements
of Φ(x ′).
The simulator splits each of the ϕ(xj) into the n = |Hc| parts
Axj,1, . . .,Axj,n. Finally, he picks random integers c1, c2.
Since the PIR is symmetrical, we can impose the response to each Query(αi)
to be a set containing the k elements that must be present in the intersection,
namely ( Enc(f zj ) c1 , Enc(Axj,i) c2
)
, and the remaining random values
(Enc(f z ) c1 , Enc(g r )),
with z a random element of {zk+1, . . . , zΩ} and r a random integer. We give
to the simulator enough memory to remember which z was returned for which
α, so that multiple queries to the same α are consistent. The simulator also
returns gc2 .
Let A be a malicious receiver in the Exp Sym-Rec-Privacy
A
experiment. Following
Cond. 8, A makes p Retrieve queries to S; each of these requests lead to |Hc|
calls to Query. As the requests x ′ i are λmax-separated, and as the hashes are
λmin, λmax, ǫ1, ǫ2 with a negligible ǫ2, we can consider these Retrieve queries to
be independent.
Note that the first parts of the Bloom filters are always indistinguishable, as
they are generated in the same way. Therefore, if A distinguishes between S0
and S1, that means that he distinguished between a given π0 and π1, constructed
by taking the set of all answers to the Query request he made. By application
of the Lemma, we deduce the proposition.✷