﻿Implementing Minimized Multivariate PKC on
Low-Resource Embedded Systems
Bo-Yin Yang 1,⋆ , Chen-Mou Cheng 2 , Bor-Rong Chen 2 , and Jiun-Ming Chen 3
1
Dept. of Math., Tamkang University, Tamsui, Taiwan
by@moscito.org
2
Harvard University
{doug, brchen}@eecs.harvard.edu
3
Chinese Data Security, Inc., and Nat’l Taiwan U., Taipei
jmchen@math.ntu.edu.tw
Abstract. Multivariate (or MQ) public-key cryptosystems (PKC) are alternatives
to traditional PKCs based on large algebraic structures (e.g., RSA and ECC);
they usually execute much faster than traditional PKCs on the same hardware.
However, one major challenge in implementing multivariates in embedded systems
is that the key size can be prohibitively large for applications with stringent
resource constraints such as low-cost smart cards, sensor networks (e.g., Berkeley
motes), and radio-frequency identification (RFID). In this paper, we investigate
strategies for shortening the key of a multivariate PKC. We apply these strategies
to the Tame Transformation Signatures (TTS) as an example and quantify
the improvement in key size and running speed, both theoretically and via implementation.
We also investigate ways to save die space and energy consumption
in hardware, reporting on our ASIC implementation of TTS on a TSMC 0.25µm
process. Even without any key shortening, the current consumption of TTS is
only 21 µA for computing a signature, using 22,000 gate equivalents and 16,000
100-kHz cycles (160 ms). With circulant-matrix key shortening, the numbers go
down to 17,000 gates and 4,400 cycles (44 ms). We therefore conclude: besides
representing a future-proofing investment against the emerging quantum computers,
multivariates can be immediately useful in niches.
Keywords: Multivariate public-key cryptosystem, efficient implementation,
digital signature schemes, embedded system, sensor networks, motes.
1 Introduction
A multivariate public-key cryptosystem (a multivariate PKC, an MQ-scheme, or simply
a multivariate) uses as the public key the mn(n +3)/2 coefficients of a quadratic
polynomial map V = (ℓ1,...,ℓm) : K n → K m with no constant term, where
K = GF(q) is a finite field, called the “base field”. Computing w = V −1 (z) =
(w1,...,wn) ∈ K n is considered a hard problem, generically NP-hard [16].
⋆
Partially sponsored by Taiwan’s Nat’l Science Council grant NSC-94-2115-M-032-010 and
Taiwan Information Security Center (TWISC) at Nat’l Taiwan U. of Sci&Tech, Taipei, Taiwan.
J.A. Clark et al. (Eds.): SPC 2006, LNCS 3934, pp. 73–88, 2006.
c○ Springer-Verlag Berlin Heidelberg 200674 B.-Y. Yang et al.
In practice, the trapdoor needed for a PKC is almost always accomplished by using
a public map composed of three maps as in V : w ∈ Kn φ1
↦→ x φ2
↦→ y φ3
↦→ z ∈ Km .
The central map φ2 is quadratic. φ1 : w ↦→ M1w + c1 and φ3 : y ↦→ M3y + c3 are
affine, usually invertible. The private key is the m(m +1)+n(n+1)coefficients in
(M1, M3, c1, c3) plus whatever necessary to compute φ −1
2 .
The security of a multivariate PKC thus depends on the infeasibility of decomposing
maps and that of solving a large system of polynomial equations. The reader
is referred to [41] for a comprehensive reference for multivariate PKCs, including the
nomenclature and the state of the art.
1.1 Advantages That Accrue to Multivariates
When quantum computers (QCs) with thousands of qubits arrive, RSA and discrete-log
schemes will be broken [40]. MQ- and lattice-type schemes stand with halved logcomplexity
([22], e.g., 2200 → 2100 ). We also have quantum key exchange but not
quantum signatures. So MQ-schemes seem like future-proofing insurance.
Another reason is that the private map of RSA is intrinsically slow. As a result,
transaction speed on commodity hardware suffers, hindering wider deployment. This
slowness is magnified on embedded systems [1, 44], where lack of megahertz hurts
multivariates less because these schemes spend most of their time in lookup tables.
1.2 Challenges for Multivariates and Our Contributions
Currently, multivariate schemes face some major challenges:
Novelty: This leads to distrust and a prolonged lack of interest. Consequently, there is
little in the way of provable security results or optimizations as for RSA or ECC.
Key sizes: Multivariates have a large public key and sometimes a large private key as
well. This necessitates costlier hardware just like the slow private map of RSA.
We will herein discuss aspects of low-resource implementations. We will illustrate
with the signature scheme TTS (Sec. 2), but the following apply to most MQ-schemes
if one wishes to cut down the private key:
Key scheduling lets a smart card carry a small “mini key” (Sec. 3), based on which the
“real key” and the private map is built on the fly.
Structure in the linear maps enables representation of M1 and M3 with fewer parameters.
We can have a private key that is around 1/5 of its original length and use
40% less running time (Sec. 4).
Despite the fact that most multivariate schemes are already fairly fast, and that the
public keys are much longer (which makes them the natural place to start if one wishes
to cut down memory usage), we feel that our effort is justified in view of the following:
– Often a scheme needs to be as fast as possible, not just “fast enough”.
– In many applications, we only need to store the private key on-card; for example,
when the embedded devices are used as a cryptographic token.Implementing Minimized Multivariate PKC on Low-Resource Embedded Systems 75
– Some applications can tolerate lower security but absolutely must fit within very
tight resource limits. RSA simply won’t fit on a low-end RFID. But a multivariate
PKC might if its private key can be shortened. Schemes operating on small chunks
are more likely to be implemented in such an environment [15].
– The central map of an MQ-scheme often determines its properties, including
weaknesses. A chain can only be as strong as its weakest link. So if a given central
map leads to a cryptanalysis in, say, 2 100 , then the 100+ parameters in the matrices
are more likely to be an overkill rather than a security enhancement.
The rest of this paper is organized as follows. In Sec. 2, we go over the Tame Transformation
Signatures (TTS), a workbench for key-shortening experimentation. Subsequently
in Sec. 3, we discuss the aspects of key scheduling, and in Sec. 4, we describe
the key-shortening strategy via restructuring the linear maps. In Sec. 5, we report the
result of our application-specific integrated circuit (ASIC) implementation, mainly targeting
at RFID applications. We conclude with the discussion in Sec. 6.
Some of the diversed results referenced are summarized in the appendix; the rest
needs to be looked up from the original sources [8, 10, 11, 13, 27, 37, 41, 43, 44].
2 Tame Transformation Signatures
Before we move on, we need to emphasize that there is no reductionist proof of security
today; the security of MQ-schemes is so far thrust-and-parry. Since this is
mainly about implementation, we leave out most details about security, although we
check that known attacks are not functional against our schemes. The interested reader
is referred to Appendix A, where we report the result of a few basic cryptographical
experiments.
2.1 Classification of Multivariates, and enTTS (20,28)
In designing and implementing key-shortening techniques for multivariates, we note
that there is a rationale to experimenting with variants of existing ones instead of inventing
new ones. Extant schemes represent accumulated experience and history. In
tweaking them appropriately, one can expect to inherit many of the good properties and
reuse code, which introduces fewer opportunities for mistakes.
There are four basic ways [41] one can set up the trapdoor of a multivariate scheme
for φ2 to be inverted: Imai-Matsumoto’s C ∗ [31] (and derivatives SFLASH [39] and
PMI+ [10]), Hidden Field Equations [37], “Unbalanced Oil-and-Vinegar” [27], and
“Stepwise Triangular System” [43]. Basic trapdoors must be modified for security —
see [41] for a summary of modifications. All have O(n 3 )-time (and space) public maps
in the practical range, where n is the size of the digest or plaintext block.
C ∗ and HFE are “dual-field” multivariates, which operate over a larger field K k ≡
GF(q k ). UOV and STS are “single-field” multivariates. The TTS [43] family of signature
schemes use a UOV-STS combination. We will illustrate most of our techniques
on a current variant in the family called the Enhanced TTS with K =GF(2 8 ), 20-byte
hashes, and 28-byte signatures, also known as enTTS (20,28). This has the central map76 B.-Y. Yang et al.
φ2 : x =(x0, x1,..., x27) ↦→ y =(y8, y9,..., y27):
yi = xi + ∑7
j=1 pijxjx 8+(i+j mod 9), i=8···16;
y17 = x17 + p17,1x1x6 + p17,2x2x5 + p17,3x3x4
+p17,4x9x16 + p17,5x10x15 + p17,6x11x14 + p17,7x12x13;
y18 = x18 + p18,1x2x7 + p18,2x3x6 + p18,3x4x5
+p18,4x10x17 + p18,5x11x16 + p18,6x12x15 + p18,7x13x14;
yi = xi + pi,0xi−11xi−9 + ∑i−1
j=19 pi,j−18 x2(i−j)−(i mod 2) xj + pi,i−18x0xi
+ ∑27
j=i+1 pi,j−18 xi−j+19 xj, i=19···27.
Given message M, we compute a 160-bit secure hash digest vector z = H(M)
from the message. We may now compute y = M −1
3 (z − c3), thenx∈φ −1
2 (y) as
follows:
1. Assign random x1,..., x7 and try to solve the first 9 equations for x8 to x16.
2. Solve serially for x17 and x18 using the next two equations (y17 and y18).
3. Assign a random x0 and try to solve the last 9 equations for x19 through x27.
We find the signature w = M −1
1 (x − c1). Release (M,w). Atmost9 values of x7 and
x0 make the two systems unsolvable, so we will find a solution.
enTTS (20,28) has 8680- and 1417-byte public and private keys. A smaller instance
enTTS (16,22) has this central map, along with 4400- and 879-byte public and private
keys [43]:
yi = xi + ∑6
j=1 pijxjx 6+(i+j+1 mod 7), i=6···12;
y13 = x13 + p13,1x1x4 + p13,2x2x3
+p13,3x7x12 + p13,4x8x11 + p13,5x9x10;
y14 = x14 + p14,1x2x5 + p14,2x3x4
+p14,3x8x13 + p14,4x9x12 + p14,5x10x11;
yi = xi + pi,0xi−7xi−9 + ∑i−1
j=15 pi,j−14 x2(i−j)−(i mod 2) xj + pi,i−14x0xi
+ ∑21
j=i+1 pi,j−14 xi−j+15 xj, i=15···21.
This was built to compare to RSA-768, while enTTS (20,28) was to RSA-1024.
2.2 History and Security of enTTS and Other TTS Schemes
The STS-UOV family of MQ-signatures contains in addition the Rainbow and TRMS
schemes [2, 11]. The combination of recursive segmented evaluation and solving consecutive
linear systems defend against the three well-known attacks based on linear algebra
against STS [21] and UOV [27]. TRMS uses intermediate-field arithmetic, which
also in effect solves a linear system. Both TRMS and enTTS are sparse variants, i.e., a
TTS-type scheme, with fast signatures and key generation on both PCs and smart cards.
The schemes in [44] slightly differ from enTTS due to the UOV-based attacks carried
out by Ding and Yin [12], while high speed on a smart card is still retained.
The enTTS (20,28) scheme was designed to account for all known attacks [43], including
linear algebra attacks mentioned above, algebraic attacks based on Faugère’sImplementing Minimized Multivariate PKC on Low-Resource Embedded Systems 77
F5 and XL of Courtois et al [8, 13], improved search methods [4], and some methods
tailored to specific schemes [19, 20, 36]. We repeated experiments checking that no
vulnerabilities had been reintroduced in our short-key versions.
2.3 Performance on State-of-the-Art Sensor Nodes
To evaluate the performance of enTTS on state-of-the-art sensor nodes, we benchmark
enTTS on one of the most popular wireless sensor network devices—the Tmote
Sky mote. Tmote Sky is equipped with an 8 MHz Texas Instrument MSP430 microcontroller
and a Chipcon CC2420 2.4 GHz radio that supports IEEE 802.15.4 wireless
low-power medium access control standard. Running on top of it is the TinyOS sensor
network operating system [23]. TinyOS is a modular event-driven operating system
designed specifically for sensor network applications. It has a component-based programming
model, supported by the nesC language [18]. Applications are written as
modules that are composed together through a set of predefined interfaces, which describe
the events to be handled and the implemented commands. The composition of
modules is achieved in compile time by the nesC compiler, whose output is a single C
source program, subsequently to be compiled by a regular C compiler into the MSP430
binary code. It is then downloaded onto the sensor devices to run. Such TinyOS-based
wireless sensor systems are under active study and development by the sensor network
community; they are also becoming a candidate for building pervasive computing infrastructures.
Security and privacy issues, such as authentication and data integrity, are important
in some of these emerging applications, e.g., patients vital signals monitoring and dissemination
in a hospital. Unfortunately, the more traditional PKCs have not been able
to apply in these applications, due largely to the limited computational resources available
on the sensor nodes. This is where MQ schemes like enTTS can find immediate
applications.
A typical sensor node like the Tmote Sky mote has a small working RAM
(10 KBytes in this case), a slightly larger read-only program memory (48 KBytes),
and a relatively large flash memory (1 MBytes) for storing collected raw data and other
auxiliary information for its operations. In our case, the private keys are small and can
be fit into the working RAM. This helps achieving high signature performance, which is
important because it is usually these computationally challenged sensors that are signing
the raw data to protect data integrity. For verification, we need to store the public
key in flash memory because they are too big to fit into RAM. We argue that this is
tolerable because verification is mostly done in the relatively more powerful base stations,
although it is needed occasionally by the motes, say, when they need to verify the
authenticity of a message coming from a base station or another mote.
We benchmark both the verification and the signature performance of enTTS
(20,28): The average signing time is 71 ms, while the average verification time is
726 ms 4 , measured with the time provided by TinyOS at the granularity of 1/32,768
seconds, averaging over 1,000 runs. We note that the throughput of flash memory
reads on our Tmote Sky mote is merely around 45 KBytes per second, so we spend a
4
These numbers do not include the time in computing message digests.78 B.-Y. Yang et al.
significant portion of the verification time reading the 8,680-byte public key off the
flash memory. We believe that the verification time can be significantly improved if the
entire public key can be stored in a larger working RAM, which is expected to be in
the future releases of the device. Even with such an impediment, the results are quite
promising compared with previous results obtained using more traditional PKCs such
as ECC [30].
3 From Previous Attempts at Key Shortening to Scheduling
Many attempts to use subfields for smaller keys in multivariates have been made. The
original version of SFLASH used a subfield but was broken [19]. In general, using
subfields to cut down public key size is not a very good idea. The reason is that, in many
of the attacks against multivariates such as those in [21, 27, 38], guessing at variables
is needed; thus the security is highly correlated to the size of the field in which the
matrices elements of M1 and M3 are taken.
However, using elements of subfields for parameters in the central map can cut
down private key size as will be in use in an implementation below. Further, subfield
constructions for hardware (especially ASIC) are common. Especially notable are the
layered designs of C. Paar [34, 35], which we borrowed into our designs here.
3.1 Key Scheduling and Its Practical Considerations
In general, key scheduling (generating sub-keys or round keys from an original key)
will not shorten the running time. In fact, it usually makes everything slower. But it is
considered worthwhile because short keys have advantages. In presenting SFLASH v2
[39], the authors mentioned that it is possible to run SFLASH with a key schedule. We
try to affirm this idea with an actual run and to provide a control.
We have a few obvious candidates. AES submissions have well-tested key schedules
with assembly-language routines available to call on the PC and the 8051, so we can
use the Rijndael key set-up routine. Or we can also use a stream cipher like RC4 (with a
variable key size), or a fast linear feed-back pseudo random number generator (PRNG)
such as TT800 (a precursor to the mersenne twister, up to 100-byte keys, cf. [32]).
It seems natural to treat each routine as a PRNG or stream cipher. Two observations:
– We store sub-keys (random numbers) in a buffer and take inputs as needed.
– If we wish to output the public key on demand, we must be able to evaluate φi and
φ −1
i in any order. This is very difficult unless the state involved is very short, since
we may have to keep six or seven sets of state for each stage.
A trick from [44] seems applicable: Factor both M1 and M3 as LiDiUi where L and U
are lower- and upper-triangular, respectively, with 1’s on the diagonal, and D is invertible
and diagonal. Order the bytes L21, L31,..., Ln1, L32,..., Ln,n−1 (column order),
then D −1 (diagonal only), then Un−1,n, Un−2,n,...,U1,n, Un−2,n−1,..., U1,2
(U, in reversed column order). We can thus compute a product by M −1
1 and M −1
3 .Store
c1 ahead of the M1, and compute and tuck c3 away somewhere.Implementing Minimized Multivariate PKC on Low-Resource Embedded Systems 79
3.2 Testing Results with Key Scheduling
We justify, after a fashion, the claim that key scheduling [1, 39] is feasible for multivariates.
We are able to fit in the keys and the program of an SFLASH scheme in 8kB
ROM for a 8051-compatible micro-controller (this test was run on simulator of a development
kit only) and a TTS scheme in 4kB. We can not claim our 8051 programs as
well optimized, but here are the test results for TTS:
The “private key” lengths listed are really the seed maintained by the key scheduling
routine. We are unable to test key generation on card with TT800 and RC4 because we
somehow ran out of resources each time. Using the key setup routine for AES, the
key generation time is long but barely doable. The conclusion is that perhaps we don’t
need on-card key generation for such low-cost concerns, even though there appears to
be no security concerns in so doing, which can be seen from some basic testing on
the public keys generated by the key-scheduled methods (Appendix A) showing “no
difference.”
Table 1. Key-scheduled signatures schemes on a stock 3.57MHz Intel 8032AH card
Scheme Signature Pr. Key Pub.Key setup ROM setup sign sign ROM
TTS (20,28) 224 bits 1.4 kB 8.6 kB 4.2 kB 62 sec 198 ms 1.4 kB
Rijndael 224 bits 120 B 8.6 kB 6.4 kB 703 sec 454 ms 3.6 kB
RC4 224 bits 32 B 8.6 kB N/A N/A 334 ms 2.2 kB
TT800 224 bits 32 B 8.6 kB N/A N/A 310 ms 2.5 kB
4 Restructure for Faster Signing and Tiny Private Keys
Decomposing a matrix is not new. There are [n]q! =(q n − 1) (q n − q) ···(q n −
q n−1 ) invertible n × n matrices over GF(q). But for generating an invertible matrix,
it is common to use LU decomposition, which produces only (q − 1) n q n(n−1)
of the non-singular matrices. To decrease storage needs, we must decompose differently
here. Trying not to repeat history, we examine two such earlier attempts in
Sec. 4.1.
Here we must mention the two recent other papers dealing with equivalency and
normal forms of multivariates, [25] and [42]. As mentioned by [41], sparse variants like
TTS are not affected much by such equivalency concerns.
4.1 Earlier Attempts
Two ideas heretofore seen in print tried replacing multiplication by a non-singular matrix
by a sequence of O(n) linear steps between the components of a vector.
D(J − P)D: Both M1 and M3 are in form L(J − Pσ)R,whereL and R are invertible
and diagonal, J amatrixofall1’s, and Pσ a permutation matrix. Store L and R
plus the σ’s in private key.80 B.-Y. Yang et al.
Elementary Row Operations: [24] proposed to use both M1 and M3 ( ) ( ) in the form of
∏n−1
∏n−1
Pσ−1ρ−1
i=i
(1 + βiEn−i+1,n−i) Pρ j=1
(1 + αjEj,j+1)
PσD, wheren
is the dimension, Pσ a permutation matrix with σ =[σ1, σ2,...] (list notation), D
is invertible diagonal, and Eij is all 0’s except a 1 in the (i, j) position.
But both have security concerns. Let x = Mw and M =(mkℓ)1≤k,ℓ≤n, then
n∑
xixj = mikmjk w 2 k +
∑
(mikmjℓ + miℓmjk) wkwℓ.
k=1
1≤k<ℓ≤n
If either most of the mij are zero, or there is a tendency for mik : miℓ = mjk : mjℓ,
then cross-term coefficients will tend to vanish. If M is constructed with elementary row
operations as above, many entries will be zero. When both mik and mjk are non-zero,
the odds are good that a multiple of row i has been just added to row j or vice versa.
The matrix L(J − P )R has far fewer zeroes, but most entries of J − Pσ are 1’s,
hence mkℓ is usually LkRℓ. When neither σ1 nor σ2 is 1 or 2 (very likely), we have
x1 = L1(R1w1 + R2w2 + ···), x2 = L2(R1w1 + R2w2 + ···),
and w1w2 coefficient in x1x2 is zero. More precisely, let M1 = L(J−Pσ)R, x = M1w.
The (i, j) entry of M1 is 0 if j = σi and LiRj otherwise. Thus only 2n − 2 of the
n(n − 1)/2 cross terms are non-zero in the cross term xixj.
(
n∑
R 2 kw2 k +(Rσiwσi
n∑
)
+ Rσj
wσj
)
.(1)
xixj = LiLj
k=1
k=1
Rkwk + RσiwσiRσj
wσj
It is verifiable that polynomials for y in w are sparse for cryptosystems with relatively
few cross terms which leads to easier Gröbner bases computations [14].
4.2 Designing for Non-sparseness
An obvious improvement on the previous section is to use other forms of zero-one
matrices that leads to fewer zeros coefficients in the quadratic polynomials.
Proposition 1. Suppose M = LSR, whereLand R are diagonal and S =(sij) is a
0-1 matrix, and if sij =1with probability p, then the probability for the coefficient of
wkwℓ in the expansion of xixj (denoted [wkwℓ](xixj)) to be non-zero is ≈ 2p2 (1−p2 ).
Proof. Let entries of L and R be (resp.) Li and Rj,thenxi = Li
(
∑n
j=1 Risijwi
)
+ci.
So when k ̸= ℓ, [wkwℓ](xixj) =LiLjRkRl(siksjℓ + siℓsjk). This will vanish unless
exactly one of siksjℓ and siℓsjk is 1, or rather if and only if either sik = sjℓ =1(at
least one of siℓ and sjk is zero); or siℓ = sjk =1(at least one of sik and sjℓ is zero).
We can make a coefficient non-zero at most half the time when p ≈ 1/ √ 2. For example,
when n =28we want nineteen entries in each row and column of S to be 1 and the
other nine to be 0. Elementary row operations and D(J − P )D decompositions wereImplementing Minimized Multivariate PKC on Low-Resource Embedded Systems 81
considered partly because we want the length of the private key and the signing time to
be O(n), yet (as above) the number of zeros as well as the number of ones need to be
∝ n 2 .Whatwewantis:S should be in a form that makes x ↦→ w = M −1 x doable
in O(n), even though M itself contains ∝ n 2 of both zeros and ones. This is where
circulants come in.
4.3 Circulant Matrices
A square matrix M =(mij)1≤i,j≤n is circulant if i − k ≡ j − ℓ (mod n) implies
mij = mkℓ. Circulant matrices have been studied for a long time (e.g., the monograph
[7]). The S-Box of AES effectively uses a multiplication by the circulants
S8 =
⎡
⎤
11111000
01111100
00111110
00011111
10001111
⎢11000111
⎥
⎣11100011⎦
11110001
S −1
8 =
⎡
⎤
01010010
00101001
10010100
01001010
00100101
.
⎢10010010
⎥
⎣01001001⎦
10100100
Borrowing a page from the same book, we let M1 = LPσSPρR, whereσ, ρ ∈Sm are
permutations. We want an S that would let us multiply a vector by S −1 quickly. One
obvious candidate is S28 =(sij)1≤i,j≤28,wheresij =1if j −i is congruent to 0 ···18
and =0otherwise. Multiplying by S28 is easy, but multiplying by S −1
28 is even easier,
because
S −1
28 =(ai,j)1≤i,j≤28, aij =1if i − j ≡ 0, 9, 18 (mod 28), else aij =0.
We would like to find an Sm for any m with similarly nice properties: S −1
m x easily
evaluated in O(m), andSm in some nice form that has the proportion of 1’s as close to
0.707 as possible. Luckily, the same thing works for any m =3k +1. In fact we have
Proposition 2 (Trilinear Circulant Matrices). In a field of characteristic 2:
1. Let S3k+1 =(sij), sij =1iff j − i ≡ 0 ···2k (mod 3k +1)and =0otherwise,
then S −1
3k+1 =(aij) is a 0-1 circulant with aij =1iff i − j ≡ 0,k,2k;
2. Let S3k+2 =(sij), sij =1iff j − i ≡ 0 ···2k (mod 3k +2)and =0otherwise,
then S −1
3k+2 =(aij) is a 0-1 circulant with aij =1iff i − j ≡−1,k,2k +1.
The proofs can be found in [45], the more complete version of this paper submitted as
a research report to TWISC.
We see that our formulas are still missing the 3k cases. Circulants in the DPCPD
form seem to fit TTS (28,20) well: neither dimension is divisible by 3, and the proportion
of zeros in each cross-term is 41/81, very close to one-half. It works for SFLASH v2
(central map of GF(2 7 ) 37 → GF(2 7 ) 26 ) too, but SFLASH v2 takes most of its time in
the central map so the time savings are scarce.82 B.-Y. Yang et al.
4.4 The Quirks of Circulants: Filling in the Blanks
We can summarize our problems encountered and observation made in trying to find a
nice regular form for S3k in the following proposition :
Proposition 3. Let the n × n circulant Sn,k have first row [1, 1,...,1, 0,..., 0], then
} {{ } } {{ }
k n−k
1. Sn,k is singular if (and only if) d =gcd(n, k) > 1;
2. For large enough n with gcd(n, q) =1,someS −1
n,k
will have q lines of 1’s.
3. If n is even, mink>1(#of1’s in a row of S −1
n,k
) is the smallest odd prime p ̸ | n.
Here d and q are defined as odd (since we are playing in GF(2)).
Proposition 3 implies that it is more difficult to find nice matrices for n = 3k (or
n =6k, assuming only even dimensions). If we do not want to avoid multiples of 3,we
need a circulant S3k ∈{0, 1} 3k×3k with about two-thirds of its entries being 1, such
that S −1
3k
has only three 1’s in a row.
Proposition 4 (Trilinear Circulants of Size 3k). We can find circulants Ĉk,j so that
Ĉ −1
k,j
exists in GF(2) and has (2k − 1) 1’s in every column. One way is to form Ĉk,j
from this row:
110 j 10 3k−3−j j
3k−3−j
{ }} { { }} {
=[1, 1, 0,..., 0, 1, 0,..., 0],
We can choose S3k satisfying the requirements in Sec. 4.3 among such Ĉ−1
k,j .
Again we need to leave the proofs to the more complete Research Report [45].
4.5 Performance Evaluation
Armed with the above, for any TTS central map, we can build a TTS variant with both
M1 and M3 in DLPLCPRDR form and call it TTS LPSQR. We use only elements of
the subfield GF(16) for the coefficients in the central map. Also, The permutations are
stored packed: for dimensions under 32, we only use 5 bits to an entry, so the private
key totals 289 bytes (170/2 in φ2 +20inc3 +28inc1 + (20 + 28) × (1 + 5
8
) × 2
Table 2. Performance of various signature schemes on a 500MHz Pentium III
Scheme Signature Pub. Key Priv. Key KeyGen Signing Verifying
RSA-PSS 1024 bits 128 B 320 B 2.7 sec 84 ms 2.0 ms
ECDSA 326 bits 48 B 24 B 1.6 ms 1.9 ms 5.1 ms
SFLASH 259 bits 15.4 kB 2.4 kB 1.2 sec 1.6 ms 0.28 ms
TTS (20,28) 224 bits 8.6 kB 1.4 kB 15 ms 51 µs 0.11 ms
w.LPSQR 224 bits 8.6 kB 289 B 16 ms 28 µs 0.11 ms
TTS (24,32) 256 bits 13.4 kB 1.8 kB 25 ms 67 µs 0.18 ms
w.LPSQR 256 bits 13.4 kB 455 B 27 ms 41 µs 0.18 msImplementing Minimized Multivariate PKC on Low-Resource Embedded Systems 83
in M1, M3). We compare the speed of TTS, TTS LPSQR and alternatives 5 in Table 2.
The tests compile with only gcc -O3 (ver. 3.3) using plain ANSI C and no inlined
assembly. For TTS, there is an expected speedup.
5 ASIC Implementation
The following represents our attempt to implement the enTTS scheme straight into
an ASIC for verification purposes. This is to avoid any new vulnerabilities peculiar
to structured matrices. To our best knowledge, this work appears to be the first ASIC
implementation of a multivariate signature scheme in the literature, although there have
been ASIC implementations of multivariate encryption schemes, e.g., NtruEncrypt [17].
In application scenarios like RFID-based security systems, typically we require a
tag to authenticate itself to a reader, e.g., using the tag as a cryptographic token. In
this case, only the signature functionality is needed on the tag. To meet the stringent
resource constraint posed by devices like RFID tags, we only implement the signature
functionality on the tags, leaving verification to the more resource-abundant (memory
in particular) readers.
We first describe a hardware design that implements the plain enTTS (20,28) signature
scheme without key shortening and report its synthesis result on the TSMC 0.25 µm
process. The architecture of our implementation is depicted in Fig. 1 (b). It is divided
into 4 portions:
1. SRAM cells: 81 bytes (8 for loop control and temporary storage, 28 for storing the
signature, and 45 for solving a 9 × 9 linear system using Lanczos’ method), taking
about 9,900 gate equivalents.
2. NVRAM block: 1579 bytes (private key and look-up tables), taking about 6,400
gate equivalents.
3. Arithmetic Unit: about 800 gate equivalents, containing a set of 12 GF(16) multipliers
and 2 GF(16) inverters, as well as an 8-bit integer unit for loop control.
4. Control Unit: about 3,900 gate equivalents, containing a microcoded state engine
(200), microcode firmware (1,100), and an address generating unit (2,600).
Analog
Interface
Circuitry
V
DD
Data
clock
Digital
control
circults
Data
Data
Signature
controller
circuit
EEPROM
data bank
control
unit
NVRAM
320 bytes
4bit multipliers x12 RNG
4bit inverter x2
SRAM
40x8 bit
(a)
(b)
Fig. 1. Functional block diagrams for the ASIC implementation: (a) the entire RFID tag (b) inside
the signature controller circuit
5
ECC, RSA, and QUARTZ data from NESSIE website [33]. For earlier data of TTS, see [43].84 B.-Y. Yang et al.
The total synthesized circuit contains about 21,000 gate equivalents. Signing a message
takes about 60,000 cycles, which translates to 0.6 seconds on a 100kHz clock.
Most of the running time is spent in Lanczos’ method. To save storage space, we do
not store the matrix entries when inverting φ2, so we have to generate the entries on the
fly. A back-of-envelope calculation reveals that it takes about 1,000 cycles to perform
a matrix-vector multiplication or to compute a bilinear form, and each iteration in the
Lanczos method involves 5 such operations, adding up to a bit less than 6,000 cycles
per iteration. Typically it takes 9 iterations to solve a 9 × 9 system, and we need to solve
two such systems when inverting φ2. As a result, we spend almost 90% of the time
in Lanczos’ method. The measured time consumed in Lanczos’ method is close to 0.5
seconds per signing, fairly close to what we have estimated.
We note that it is possible to further speed up the operation by either using more
SRAM or introducing parallelism. We report our attempt along the first direction. By
adding 9 bytes of SRAM (totaling 54 bytes, 45 for storing a symmetric 9×9 matrix and
9 for a vector), we are able to use symmetric Gaussian elimination in place of Lanczos’
method. The idea is to generate M T M,solveM T Mx = M T w, and then verify if
Mx = w. This way we are able to cut the running time to about 0.16 seconds (a direct
extrapolation would give 0.22, but there are some improvements we can do), less than
one third the time that Lanczos’ method takes. If situation permits, it is possible to
halve the running time to about 0.077 seconds by adding 36 more bytes and doing a
full-fledged Gaussian elimination.
At a first glance, such a strategy may not look so attractive in a resource-constrained
environment such as RFID tags, particularly because SRAM cells are very expensive in
terms of gate counts and power consumption. However, it will make sense from energy
consumption’s point of view, since cutting running time can reduce the total energy
consumption and thus may achieve a higher energy efficiency. Experiments indeed validate
our hypothesis. To our surprise, possibly due to the reduced switching activities
in SRAM (e.g., due to less memory accesses) when doing symmetric Gaussian elimination,
the power consumption of which is actually lower than that of doing Lanczos,
even though the former uses more SRAM cells.
We summarize our synthesis results on the TSMC 0.25 µm process and a 100kHz
clock in Table 3. Finally, with circulant (LPSQR) form of enTTS (20,28), we can go
down to as low as 0.044 seconds per signing and 17,000 gates using Gaussian elimination,
or 0.13 seconds and 14,000 gates using symmetrized Gaussian.
Table 3. Simulation and synthesis result on the TSMC 0.25µm process
Lanczos Sym. GE GE
Component Gates µA Gates µA Gates µA
SRAM 9,902 7.5 10,782 6.9 15,057 12.4
NVRAM 6,401 2.7 6,399 3.4 6,399 2.2
Arithmetic 768 4.3 768 3.5 768 4.2
Control Unit 4,207 6.2 4,009 7.5 3,689 6.6
Total 21,278 20.7 21,958 21.3 25,913 25.4Implementing Minimized Multivariate PKC on Low-Resource Embedded Systems 85
6 Concluding Remarks
NESSIE [33] did not recommend SFLASH for general use but said that it was very
efficient on low-cost smart cards, where the size of the public key is not a constraint.
We think that the techniques we proposed here help with the issue of overly large
keys. This discussion applies also to other multivariate PKC schemes. We think that
multivariate schemes will be better contenders against the established schemes if they
are a lot better customized and optimized.
The usual heir apparent if RSA/ECC is toppled would be NTRU. In fact NTRU-
Encrypt has many good qualities. We point out that diversity is a good thing, and
NTRUSign is not yet as widely implemented. So we think that there is still value to
our implementations here.
We Believe That the Most Important Issue Is Still Provable Security for
Multivariates. Aside from that, there are at least the following directions to explore:
optimization towards smaller keys and better performance in multivariate PKCs and
their security estimates. Once key-shortening techniques, e.g., in the LPSQR form or
other forms, are considered safe, they can be implemented for a lot of savings in space
and time, stimulating more interest towards multivariate schemes. There should still be
a lot to work for in this multivariate arena.
References
1. M. Akkar, N. Courtois, R. Duteuil, and L. Goubin, A Fast and Secure Implementation of
SFLASH, PKC’03, LNCS 2567, pp. 267–278.
2. C.-Y. Chou, Y.-H. Hu, F.-P. Lai, L.-C. Wang, and B.-Y. Yang, Tractable Rational Map Signature,
PKC’05, LNCS 3386, pp. 244–257.
3. N. Courtois, Generic Attacks and the Security of Quartz, PKC’03, LNCS 2567, pp. 351–364.
4. N. Courtois, L. Goubin, W. Meier, and J. Tacier, Solving Underdefined Systems of Multivariate
Quadratic Equations, PKC’02, LNCS 2274, pp. 211–227.
5. N. Courtois, A. Klimov, J. Patarin, and A. Shamir, Efficient Algorithms for Solving
Overdefined Systems of Multivariate Polynomial Equations, EUROCRYPT 2000, LNCS 1807,
pp. 392–407.
6. J. Daemen and V. Rijmen, The Design of Rijndael, AES - the Advanced Encryption Standard.
Springer 2002.
7. P. Davis, Circulant matrices, John Wiley & Sons, New York-Chichester-Brisbane, 1979.
8. C. Diem, The XL-algorithm and a conjecture from commutative algebra, ASIACRYPT’04,
LNCS 3329, pp. 338–353.
9. J. Ding, A New Variant of the Matsumoto-Imai Cryptosystem through Perturbation, PKC’04,
LNCS 2947, pp. 305–318.
10. J. Ding, J. Gower et al, Innoculating Multivariate Schemes against Differential Attacks,
http://eprint.iacr.org/2005/255/.
11. J. Ding and D. Schmidt, Rainbow, a new Digitial Multivariate Signature Scheme, ACNS’05,
LNCS 3531, pp. 164-177.
12. J. Ding and Z. Yin, Cryptanalysis of TTS and tame-like multivariable signature schemes,
presentation, IWAP’04.
13. J.-C. Faugère, A New Efficient Algorithm for Computing Gröbner Bases without Reduction
to Zero (F5), Proceedings of ISSAC’02, pp. 75-83, ACM Press 2002.86 B.-Y. Yang et al.
14. J.-C. Faugère, invited talk at AES4 conference, and private communication.
15. M. Feldhofer, S. Dominikus, and J. Wolkerstorfer, Strong Authentication for RFID Systems
Using the AES Algorithm, CHES 2004, LNCS 3156, pp. 357–370.
16. M. Garey and D. Johnson, Computers and Intractability, A Guide to the Theory of NPcompleteness,
Freeman and Co., 1979, p. 251.
17. G. Gaubatz, J.-P. Kaps, and B. Sunar, Public Key Cryptography in Sensor Networks—
Revisited, 1st European Workshop on Security in Ad-Hoc and Sensor Networks (ESAS
2004), LNCS 3313, Heidelberg, Germany, August, 2004.
18. D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler, The nesC Language:
A Holistic Approach to Networked Embedded Systems, ACM SIGPLAN 2003 Conference
on Programming Language Design and Implementation (PLDI), San Diego, CA, USA, June,
2003.
19. H. Gilbert and M. Minier, Cryptanalysis of SFLASH, EUROCRYPT 2002, LNCS 2332,
pp. 288–298.
20. W. Geiselmann, R. Steinwandt, and T. Beth, Attacking the Affine Parts of SFLASH, 8th International
IMA Conference on Cryptography and Coding, LNCS 2260, pp. 355–359.
21. L. Goubin and N. Courtois, Cryptanalysis of the TTM Cryptosystem, ASIACRYPT 2000,
LNCS 1976, pp. 44–57.
22. L. K. Grover, A fast quantum mechanical algorithm for database search, Proc. 28th Annual
ACM Symposium on the Theory of Computing, (May ’96) pp. 212–220.
23. J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. E. Culler, and K. S. J. Pister, System Architecture
Directions for Networked Sensors, Proc. 9th International Conference on Architectural
Support for Programming Languages and Operating Systems (November 2000), pp. 93–104.
24. Y. Hu, L. Wang, J. Chen, F. Lai, and C. Chou, A Performance Report and Security Analysis
of a fast TTM implementation, 2003 IEEE Int’l Symp. on Information Theory, Yokohama,
Japan, June 2003.
25. Y. Hu, L. Wang, F. Lai, and C. Chou, Similar Keys of Multivariate Quadratic Public Key
Cryptosystems, CANS’05, LNCS 3810, pp. 211-222.
26. A. Joux, S. Kunz-Jacques, F. Muller, P.-M. Ricordel, Cryptanalysis of the Tractable Rational
Map Cryptosystem, PKC’05, LNCS 3386, pp. 258–274.
27. A. Kipnis, J. Patarin, and L. Goubin, Unbalanced Oil and Vinegar Signature Schemes,
CRYPTO’99, LNCS 1592, pp. 206–222.
28. R. Lidl and H. Niederreiter, Finite Fields. Addison-Wesley, 1984.
29. S. Ljungkvist, in the 8051 code library http://www.8052.com/codelib.phtm
30. D. Malan, M. Welsh, and M. Smith, A Public-Key Infrastructure for Key Distribution in
TinyOS Based on Elliptic Curve Cryptography, First IEEE International Conference on Sensor
and Ad hoc Communications and Networks (SECON), Santa Clara, CA, USA, October,
2004.
31. T. Matsumoto and H. Imai, Public Quadratic Polynomial-Tuples for Efficient Signature-
Verification and Message-Encryption, EUROCRYPT’88, LNCS 330, pp. 419–453.
32. M. Matsumoto and T. Nishimura, Mersenne Twister: A 623-Dimensionally Equidistributed
Uniform Pseudo-Random Number Generator, ACM Trans. on Modeling and Computer Sim.,
8 (1998), pp. 3-30.
33. The NESSIE project homepage: http://www.cryptonessie.org.
34. C. Paar, Some Remarks on Efficient Inversion in Finite Fields, 1995 IEEE International Symposium
on Information Theory, Whistler, B.C. Canada, September 1995, available from the
author’s website.
35. C. Paar, A New Architechture for a Parallel Finite Field Multiplier with Low Complexity
Based on Composition Fields, Brief Contributions section of IEEE Transactions on Computers,
vol. 45(1996), No. 7, pp. 856–861.Implementing Minimized Multivariate PKC on Low-Resource Embedded Systems 87
36. J. Patarin, Cryptanalysis of the Matsumoto and Imai Public Key Scheme of Eurocrypt’88,
CRYPTO’95, LNCS 963, pp. 248–261.
37. J. Patarin, Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): Two New
Families of Asymmetric Algorithms, EUROCRYPT’96, LNCS 1070, pp. 33–48.
38. J. Patarin, L. Goubin, and N. Courtois, C ∗ −+ and HM: Variations Around Two Schemes of T.
Matsumoto and H. Imai, ASIACRYPT’98, LNCS 1514, pp. 35–49.
39. J. Patarin, N. Courtois, and L. Goubin, FLASH, a Fast Multivariate Signature
Algorithm, CT-RSA’01, LNCS 2020, pp. 298–307. Updated version available at
http://www.cryptonessie.org
40. P. W. Shor, Algorithms for quantum computation: Discrete logarithms and factoring, Proc.
35nd Annual Symposium on Foundations of Computer Science (S. Goldwasser, ed.), IEEE
Computer Society Press (1994), 124-134.
41. C. Wolf and B. Preneel, Taxonomy of Public-Key Schemes based on the Problem of Multivariate
Quadratic Equations, http://eprint.iacr.org/2005/077.
42. C. Wolf and B. Preneel, Equivalent Keys in HFE, C ∗ , and variations, In Mycrypt’05, LNCS
3715, pp. 33-49, 2005.
43. B.-Y. Yang and J.-M. Chen, Rank Attacks and Defence in Tame-Like Multivariate PKC’s,
ACISP 2005, LNCS 3574, p. 518–531. Older version at E-Print Archive 2004/061.
44. B.-Y. Yang, Y.-H. Chen, and J.-M. Chen, TTS: High-Speed Signatures on a Low-Cost Smart
Card, CHES’04, LNCS 3156, pp. 371–385.
45. B.-Y. Yang, C.-M. Cheng, B.-R. Chen, and J.-M. Chen, Technical Research Report Number
11, 2005, Taiwan Information Security Center (TWISC).
A
Cryptographical Experiments
Hazards (cf. Appendix A) facing multivariate digital signature schemes include algebraic
attacks, mainly XL-Gröbner-Bases [5, 8, 13], searching [4], rank attacks [21, 43],
and other tailored attacks. In this section, we report our effort to make sure that our
shortened keys stand up against all known attacks.
We randomly generated enTTS (with both miniature and normal instances; see [43])
and SFLASH/C ∗− public keys (for control purposes) using both generic and LPSQR
matrices, and put them through equation-solving mechanisms using Gröbner Bases,
FXL simulators and rank attacks. Also small tests were run with Magma (including F4,
the precursor to F5). We conducted repeated runs of tests until our test machines (with
3 GB of RAM) ran out of memory. Each test was run at least 100 times except for the
maximum-sized one listed, repeated only 70 times.
Table 4. List of tests run to look for differences between generic and proposed shortened keys
Test Test Size enTTS C ∗−
FXL Up to 14 remaining variables no no
MAGMA F4 Upto8eqs&vars no no
High Rank Up to 16 × 16 matrices no no
Low Rank Up to 16 × 16 matrices no no
Oil+Vinegar Up to 12 × 12 matrices no no88 B.-Y. Yang et al.
The conclusion is that the shortened keys behave identically under Gröbner and
XL to generic keys and do not have rank vulnerabilities. We hope to try again later
with more memory and better optimized programs, perhaps by implementing Faugère’s
F5 [13] for ourselves. We must note that there is no guarantee for the security of our
schemes solely based on the result of these tests, but cryptologists are still working on
getting some measure of reductionist security for multivariates.