

Message Recovery Signature Schemes from SigmaprotocolsAbstractThis paper presents a framework for constructing message recovery signature schemes from sigmaprotocols. The heart of our construction is a redundancy function that adds some redundancy to a message that only a legitimately signed and recovered message can have. We characterize redundancy functions that make the resulting message recovery signature scheme proven secure. Our framework includes known schemes when the building blocks are given concrete implementations, so it presents insightful explanation into their structure.
1. IntroductionDigital signatures have been a central research subject in cryptography and are widely used these days. Among the many theoretical issues to pursue, efficiency is of great concern in practice. It is typically measured by the size of the public key and the signatures and by the computational cost for signature generation and verification. This paper focuses on efficient digital signature schemes that yield short signatures. A digital signature scheme yields either a digital signature with an appendix or with messae recovery. In the former case, a signed message looks like (σ, msg), where σ is the signature and msg is the message. Given (σ, msg), the verifier can determine whether σ is a correct signature that guarantees the authenticity and integrity of message msg. Naturally, the length of a signed message is the sum of the length of the message and the signature. In the latter case, a signed message is like (σ, n), where n is part of the message. The remaining part of the message is contained in signature σ, and anyone can recover the entire message. Because of this structure, one can expect a signed message to be shorter than a message in the case of a signature with appendix. Typically, a message recovery signature proceeds as follows. The signer adds some redundancy to the message to sign it. This produces a signed message (σ, n). To verify the signed message, the verifier first recovers the entire message msg and then checks whether the recovered message has the correct redundancy. All that is required to recover the message is the public key, so anyone can do it. Furthermore, every random string (σ, n) will yield a string that looks like a redundant message (but is actually a random meaningless one). Accordingly, it is essential to choose redundancy that cannot be easily satisfied by the message recovered from forged (σ, n). However, there are many message recovery signature schemes in the literature that do not specify or even characterize the necessary properties of the redundancy that makes the signature scheme proven secure. For RSA (Rivest, Shamir, and Adleman) signatures, there are some redundancy functions (also known as the padding method) such as PSSR [1] that eventually achieves security against adaptive chosen message attacks [2] in the random oracle model [3]. In [4]–[6], Nyberg and Rouppel presented message recovery signature schemes whose security is based on the discretelog problem, but no specific redundancy function was given, so its security is unclear. It was followed by some similar schemes, e.g., [7], also without specific redundancy functions. In [8], Abe and Okamoto presented a message recovery version of the Schnorr signature scheme [9] with a specific redundancy using random oracles but with a very complicated security analysis that results in a high cost for reduction to the discrete logarithm problem. In [10], Naccache and Stern constructed a message recovery signature scheme from DSA [11] with characterization of sufficient redundancy. Pintsov and Vanstone presented a message recovery version of the Schnorr signature scheme whose security is proven in the ideal cipher model, which is known as the ECPV scheme [12], [13]. Many of these schemes are included in international standards such as [14]. The FiatShamir transform [15] is a wellknown methodology for obtaining a secure digital signature scheme. It converts a sigmaprotocol [16], [17] into a secure signature scheme in the random oracle model. A sigmaprotocol is a twoparty protocol where a prover interacts with a verifier to convince him that the prover knows a secret, which is often understood as a secret key corresponding to a public key, without leaking anything about the secret. More precisely, it is a publiccoin threeround honest verifier zeroknowledge proof system that has special soundness and special honest verifier zeroknowledge. Such protocols are often used for interactive authentication. The FiatShamir transform can be used to eliminate the interaction and enable the prover alone to create a transcript of the execution of the sigmaprotocol so that it can act as evidence of his knowledge about the secret, like the interactive version. When an arbitrary message is bound to the transcript, one can see that only the person who knows the secret can create the transcript with the message. Thus, it works as a digital signature with appendix. However, it is not known whether such a technique is also useful for designing digital signature schemes with message recovery. In this paper, we present a framework for constructing message recovery signature schemes from sigmaprotocols. Our results show how to securely convolute part of a document into a message used in the sigmaprotocol by using a redundancy function and eventually turn the sigmaprotocol into a message recovery signature scheme that is adaptively chosen message secure in the random oracle model. Importantly, we characterize the redundancy function that is sufficient to make the resulting message recovery signature scheme proven secure in the random oracle model. We then show two instantiations of the redundancy function that conforms to our requirements based on the random oracle model and the ideal cipher model. At 80bit security with typical parameter settings, both instantiations shorten the signed message by 80 bits. Our framework yields known schemes, ECPV in particular, when the building blocks are given concrete implementations; hence, it presents insightful explanation into their structure. 2. Preliminaries2.1 NotationWe consider uniform probabilistic algorithms (i.e., Turing machines) that take as input (the unary encoding of) a security parameter λ ∈ and possibly other inputs and run in deterministic polynomial time in λ. We thus always implicitly require the size of the input to be bounded by some polynomial in λ. Adversarial behavior is modeled by nonuniform polynomialtime probabilistic algorithms, i.e., by algorithms that together with the security parameter λ also get a polynomialsized auxiliary input aux_{λ}. To simplify the notation, we usually let the dependency on λ (and on aux_{λ}) remain implicit. By A(x), we denote the output distribution of a probabilistic algorithm A with input x and uniformly chosen randomness. By y ← A(x), we mean that algorithm A is executed on input x and the output is assigned to y. We may also denote it as y ← A(x; r) when the randomness r is to be explicitly noted. For any algorithm A, the output domain of A is denoted by Dom(A). Similarly, for any finite set S, we use the notation y ← S to denote that y is sampled uniformly from S, and y ← x means that the value x is assigned to y. By BitLen(S), we denote the number of bits used to present the largest element in S. X is used in different ways. If X is a set, X denotes the cardinality of X. If X is a distribution or an algorithm, X denotes the number of elements in X with nonzero probability. If X is a variable whose actual value is taken from some range, X denotes the number of bits needed to present the maximum value in the range. (Hence, X does not necessarily equal [log_{2} X] for the actual value assigned to X.) Finally, if X is a string, X is the number of bits needed to present the string. We use X := Y to define a new variable X with Y. For any two strings a and b, ab denotes a string that concatenates a and b in order. By ab ← c, we denote that string c is separated into two parts and assigned to a and b. The manner of separation depends on the context but may not be explicitly given when it is clear from the context. P[y = A(x)] denotes the probability (taken over a uniformly distributed random tape) that A outputs y on input x, and we write P[x ← B : A(x) = y] for the (average) probability that A outputs y on input x when x is output by B: P[x ← B: A(x) = y] = Σ_{x} P[y = A(x)]P[x = B]. We also use natural selfexplanatory extensions of this notation. An oracle algorithm A is an algorithm in the above sense connected to an oracle in that it can write on its own tape an input for the oracle and tell the oracle to execute it. Then, in a single step, the oracle processes this input in the prescribed way and writes its output on the tape. We write A^{O} when we consider A to be connected to the particular oracle O. As is common practice, a value v(λ)∈R, which depends on the security parameter λ, is treated as negliible, denoted by v(λ) ≤ negl(λ) or v ≤ negl, if ∀c > 0∃λ′∈∀λ ≥ λ′: v(λ) < 1/ λ^{c}. 2.2 SigmaprotocolLet L⊂{0, 1}^{*} be a language and R_{L} be a binary relation associated with L. Let W_{L}(x) for x∈L denote a set of witnesses for x, i.e., W_{L}(x) = {R_{L} (x, w) = 1}. Let L be an instance generator for L that takes security parameter λ and chooses an instance x∈L and a witness w∈W_{L}(x) with restriction x = λ. A sigmaprotocol for language L is an honest verifier zeroknowledge proof system that consists of (probabilistic) polynomialtime algorithms (A, C, Z, V, E, M). These algorithms work as follows. For (x, w) ← L(λ), the verifier is given x and the prover is given (x, w). The prover first sends the first message a ← A(x; t) to the verifier and the verifier challenges the prover by returning c ← C(x). The prover answers the challenge by sending z ← Z(w, t, c) to the verifier. The verifier accepts if a = V(x, c, z). (More generally, V can be an algorithm that takes (a, x, c, z) as input and returns 1 or 0 to report acceptance and rejection. Note, however, that it is essential for our purpose that a can be computed from (x, c, z) in polynomial time.) We assume perfect correctness, where the verifier accepts with probability 1 when both the prover and verifier are honest. A sigmaprotocol features special soundness. That is, there exists a deterministic polynomialtime algorithm E, called an extractor, that takes two acceptable transcripts with the same first message and different challenges, say (x, a, c, z) and (x, a, c′, z′) such that x∈L, c ≠ c′, V(x, c, z) = V(x, c′, z′) = a as input and outputs a witness w for x with probability 1 in polynomial time. A sigmaprotocol is honest verifier zeroknowlede. Namely, we assume that there exists a polynomialtime algorithm called a simulator, denoted by M, that takes x∈L and c ← C(x) as input and outputs (a, z) such that the distribution of simulated (a, c, z) is identical to the distribution of the real transcript generated by the honest prover and verifier. A note on relaxation. We can allow a small error probability in the correctness but the resulting signature scheme inherits the error probability in its correctness. Similarly, E can be a probabilistic algorithm with negligible error probability. Also, the quality of zeroknowledge simulator M can be relaxed to statistical or computational rather than perfect. These relaxations affect the unforgeability of the resulting signature scheme. 2.3 Security model of message recovery signature schemeTo define the security precisely, we first must present a syntactical definition of message recovery signature schemes. The one shown below is a very standard definition, which can be seamlessly turned into the definition of ordinary signature schemes with appendix by letting the nonrecoverable message n be the same as the entire message msg. Definition 1. (Message recovery signature scheme) A message recovery signature scheme consists of three probabilistic polynomialtime algorithms (G, S, V). – G is a key generation algorithm that takes security parameter λ and outputs a publickey pk and a privatekey sk. Associated with pk is a recoverable message space {0, 1}^{rec}. – S is a signatureissuing algorithm that takes a privatekey sk and a message msg and outputs a signature σ and the nonrecoverable part of the message n. – V is a verification protocol that takes a public key pk and a signed message (σ, n) and outputs (accept, msg) or reject. It is required that (accept, msg) = V(pk, S(sk, msg)) for any pk and sk generated by G and for any msg. A message recovery signature scheme is secure if it withstands an adversary that asks a legitimate signer to sign arbitrary messages and then attempts to forge a valid signed message never signed by the signer. Formal definition of such a chosen message adversary follows. Definition 2. (Chosen message adversary) A (τ_{sig}, ε_{sig}, q_{s}, q_{h}) chosen message adversary A_{sig} against the above message recovery signature scheme is an oracle Turing machine that launches a chosen message attack that consists of the following steps. 1. (pk, sk) ← G(λ) 2. (˜σ, ˜n ) ← A_{sig}^{S,H}(pk) S is the signing oracle such that, for every signing query for a message, say msg_{i}, it returns a legitimately generated signed message (σ_{i}, n_{i}) by using private key sk. H is the random oracle that corresponds to hash function H in the signature scheme. Let T denote the signed messages (σ_{i}, n_{i}) observed by S. The final output of A_{sig} must satisfy (˜σ, ˜n ) T. A_{sig} makes at most q_{s} queries to S and at most q_{h} queries to H and then stops within running time τ_{sig}. Then, V(pk, ˜σ,˜n ) outputs (accept, msg) for some msg with probability at least τ_{sig}. The probability is taken over all the random coins used in the chosen message attack. 3. Our construction3.1 Redundancy functionThe heart of our construction is the redundancy function, say Δ: Dom(A) × {0, 1}^{rec} → D, that takes the first message a of the underlying sigmaprotocol and a recoverable part of the message m and outputs a redundant message r in some specific domain D. More precisely, we assume a family of functions Δ_{λ} = {Δ} and its ensemble {Δ_{λ}}_{λ∈N} that satisfy the following properties. 1. (Invertibility) For every Δ, there exists an inverse function Δ^{−1} that, given (a, r), outputs m. It is required that m = Δ^{−1}(a, Δ(a, m)). 2. (Compactness) BitLen(Dom(A)) + _{rec} ≥ BitLen(D). 3. (Randomness) For every Δ and every m, the distribution of Δ(a, m) for uniformly chosen a is uniform. 4. (Collision resistance) Given Δ ← Δ_{λ}, any probabilistic algorithm running within time τ finds (a, m) and (a′, m′) such that Δ(a, m) = Δ(a′, m′) and (a, m) ≠ (a′, m′) only with probability, say ε_{Δ}^{coll}(τ), which is negligible in λ if τ is polynomial in λ. Invertibility is needed for our construction to work. Compactness is needed for the message recovery property to be meaningful. Without this property, the signed message of the resulting message recovery signature scheme would be longer than the one from the signature scheme with appendix. Randomness and collision resistance are needed for unforgeability of the resulting signature scheme. One can relax the randomness property to be computationally indistinguishable from uniform. This will affect the unforgeability of the resulting signature scheme, as we mention later. As we shall see in Section 4. ε_{Δ}^{coll}(τ) may take some other parameters such as the maximum number of oracle queries, depending on its construction. 3.2 Our schemeGiven a sigmaprotocol (A, C, Z, V, E, M) for L and a hash function H, we construct a message recovery signature scheme (G, S, V) as follows. – (Key generation: G) Given λ, run (x, w) ← L(λ) and output x as a public key and w as a private key. Security parameter λ also determines the length of the recoverable part _{rec} and the redundancy function Δ. – (Signature generation: S) Given private key w and message msg, first chop msg as msg = mn so that m_{∈} {0, 1}^{rec}. Then, compute a ← A(x; t), r ← Δ(a, m), c ← H(r, n), and z ← Z(w, t, c). Then, output (r, z, n) as a signed message. – (Signature verification: V) Given x and (r, z, n), compute c ← H(r,n), a ← V(x, c, z), and m ← Δ^{−1}(a, r). If r ← Δ(a, m), output (accept, m,n). Otherwise, output reject. Implementation note. Depending on the specific structure of function Δ, the signature verification can be more efficient than recomputing Δ. See also Subsections 4.1 and 4.2. 3.3 Proof of securityTheorem 1. (Security against chosen message attacks) If there exists (τ_{sig}, ε_{sig}, q_{s}, q_{h})adversary A_{sig} for the message recovery signature scheme based on language L and an instance generator L, then there exists a (τ_{w}, ε_{w}) witness extraction adversary A_{L}, where τ_{w} ≤ 8τ _{sim} /,(ε_{sig} − 4 ) and ε_{w} ≥ . (1 − ) (1 − ε_{}^{coll}_{Δ, sim}). Here, τ_{sim} is τ_{sig} + q_{s}T_{sig}, where T_{sig} is the sum of the running times of C, M, and Δ, and ε^{coll}_{Δ, sim} is ( − ) −1ε_{Δ}^{coll} (2τ_{sim}). Proof. This is proved by constructing A_{L} from A_{sig} using a forking lemma in a standard manner. Let x∈L be an instance generated by L(λ). Given x as input, A_{L} works as follows. 1. Invoke A_{sig} with x as a public key and a uniformly chosen random tape. If A_{sig} terminates with a valid signed message (˜r,˜z, ˜n), proceed to the next step. Otherwise, repeat this step with the same x and a fresh random tape up to t_{1} times. Abort if never successful. Oracle queries are handled as follows. – (Query msg_{i} to S.) Execute c_{i} ← C(x) and (a_{i}, z_{i}) ← M(x, c_{i}). Then, chop msg_{i} into m_{i}n_{i}, compute r_{i} ← Δ(a_{i}, m_{i}), and define H so that c_{i} = H(r_{i}, n_{i}). If H has already been defined for such an input, abort. Otherwise, return (r_{i}, z_{i}, n_{i}) as a signed message. – (Query (r_{j}, n_{j}) to H .) If the input is fresh, select c_{j} uniformly, define H as c_{j} = H(r_{i}, n_{i}), and return c_{j}. Otherwise, return the preliminarily defined value. 2. Let i^{*} denote the index that H is asked (˜r, ˜n ). That is, r_{i*} = ˜r, n_{i*} = ˜n, and c_{i*} = H(˜r, ˜n). Let ˜a denote ˜a ← V(x, c_{i*}, ˜z). Move to the next step. 3. Invoke A_{sig} with x and the random tape used in the successful run of the first step. The simulation for oracle S is unchanged. Oracle H is simulated identically up to the point that (˜r, ˜n )) is asked. From that moment, H is simulated with new randomness. If A_{sig} terminates with a valid signed message (˜r, ˜z′, ˜n ), then proceed to the next step. Otherwise, repeat this step up to t_{2} times. Abort if never successful. 4. Let c′_{i*} =H(˜r, ˜n), which was observed in the second step, and let ˜a′ be ˜a′← V(x, c_{i*}′ , ˜z′). If c_{i*} = c′_{i*} or ˜a ≠ ˜a′, abort. Otherwise, compute w ← E(x, ˜a, c_{i*}, ˜z, c′_{i*} , ˜z′) and output w. Now we estimate the running time and the probability of success of A_{L}. First, the success probability is estimated as follows. – First, we estimate the error probability of the oracle simulation. Observe that the simulation of S fails only if (r_{i}, n_{i}) has already been defined for H. Since the distribution of n_{i} is arbitrarily determined by the adversary, we consider only the sufficient condition that r_{i} has appeared so far. Suppose that r_{i} is uniformly chosen from D. Then, the probability that r_{i} has appeared among the inputs to H is at most q_{h}/D. Next consider r_{i} generated through Δ from uniformly chosen a_{i}. Since we assume that the distribution of the output of Δ is uniform, the above probability does not change at all. (This is where the relaxation of randomness on Δ would have an effect. If the randomness is not perfect but just indistinguishable with some advantage, say ε^{dist}_{Δ}(τ) for a polynomialtime distinguisher whose running time is , then the error probability of the simulation increases by ε^{dist}_{Δ}(τ_{sim}).) Then, consider a_{i} generated by M. As in the previous step, since the distribution of such a_{i} is perfectly the same as uniform, the error probability does not change at all, either. (Again, this is where the relaxation on the zeroknowledge property of the underlying sigmaprotocol matters. If the zeroknowledge simulation is only computational with some advantage, say at most ε_{zk}(τ) for a distinguisher whose running time is τ, then the error probability increases by ε_{zk}(τ_{sim}).) Therefore, the probability ε_{serr}
that the simulation fails while handling q_{s} queries is bound as The simulation of random oracle H is obviously perfect unless the simulation of S fails. – The success probability of A_{sig} for fixed x is at least ε_{sig}/2 with probability 1/2 (over the choice of x) due to the heavyrow lemma of [15], [18]. With the simulated signing oracle, the success probability reduces to ε_{sig}/2 − ε_{serr}. Accordingly, every run of A_{sig} in the first step is successful with probability at least ¦Ì_{1}: = (ε_{sig}/2 −ε_{serr}). By repeating the attempt up to t_{1}: = 1/¦Ì_{1} times, we get at least one valid (˜r, ˜z, ˜n ) at the end of the first step with probability greater than ε_{1} ≥1− (1 − ¦Ì_{1})^{1/¦Ì1} ≥ 1 − e<^{−1} ≥ 3/5. Let Θ* denote the randomness used for simulating the random oracle for the i*th and all subsequent queries in the successful run. Let ¦¨^{*} denote all other randomness also used in the successful run. – Over the choice of ¦¨^{*}, the success probability of A_{sig} for fixed ¦¨ and x is at least ε_{sig}/4 with probability 1/2 (over the choice of ¦¨) due to the heavyrow lemma again. If we apply the same argument as in the first step, each attempt in the third step is successful with probability at least ¦Ì_{2}: = ε_{sig}/4 − ε_{serr}. If we repeat this attempt up to t_{2}: = 1/¦Ì_{2} times, another valid signed message (˜r′, ˜z′, ˜n′;) is obtained with probability greater than ε_{2} ≥ 1 − (1 − ¦Ì_{2})^{1/¦Ì2} ≥ 1 − e^{−1} ≥ 3/5. Furthermore, the outcome corresponds to the i^{*}th query to H, i.e., ˜r′ = ˜r and ˜n′ = ˜n, with probability at least 1/q_{h}. – In the fourth step, c_{i*} = c′_{i*}
happens only with probability 1/C. Observe that, if ˜a
≠ ˜a′, then we have ˜r
= Δ(˜a, Δ^{−1}(˜a,
˜r)) = Δ(˜a′,
Δ^{−1}(˜a′,
˜r)) which is a collision. Let ε^{coll}_{Δ,sim}, denote the probability of such
an event occurring. Since the running time of A_{L}
is less than τ_{w}= (t_{1} + t_{2})
τ_{sim}, ε^{coll}_{Δ, sim}, collsim can be upper bounded by
ε^{coll}_{Δ, sim}(τ_{w}).
It is, however, an overestimation. More precise estimation of the probability
is at most By summing up the above assessment, we get the total success probability of
A_{L} as Assuming that the time for generating randomness and table searching used for
simulating H is free, the running time is 4. Construction of Δ4.1 In the random oracle modelWe construct Δ and Δ^{−1} as follows.
H_{1}: {0, 1}^{*} → {0, 1}^{1} and H_{2}: {0, 1}^{*} → {0, 1}^{rec} are hash functions modeled as independent random oracles. To make the message recovery property meaningful, _{1} < a should hold. Note that, as we shall show in Lemma 2, setting _{rec} too short would cause a security problem, i.e., it would result in a high probability of collision. Recommendable settings of _{1} and _{rec} are _{1} = _{rec} = a/2 to balance security and the benefit of message recovery. If one would like the length of the recoverable part to vary depending on the given message, the message must be polynomially long or some kind of padding must be applied so that the total length becomes polynomial in λ. With the above instantiation, the verification procedure specified in Subsection 3.2 can be slightly more efficient. The modified signature verification procedure is as follows. – (Signature verification: V) Given x and (r, z, n), compute c ← H(r, n), a ← V(x, c, z), and m ← Δ^{−1}(a, r). Then, let h_{1}h_{2} ← r. If h_{1} = H_{1}(a, m), output (accept, mn). Otherwise, output reject. Namely, it replaces the testing r = Δ(a, m) with h_{1} = H_{1}(a, m) and saves one computation of H_{2}. The above Δ satisfies the required properties listed in Subsection 3.1. Specifically, the following lemmas hold. The first three (invertibility, compactness, and randomness) are trivial and the last one (collision resistance) is followed by a detailed proof. Lemma 1. (Invertibility, compactness, and randomness) The above Δ is invertible. In particular, the above Δ^{−1} is the inverse function. It is compact when _{1} is set so that BitLen(Dom(A)) ≥ _{1}. The output distribution of Δ is uniform over {0, 1}^{1}+^{rec} if H_{1} and H_{2} are independent random oracles. Lemma 2. (Collision resistance) If _{1} and _{rec} are polynomial in λ, the above Δ is collision resistant against polynomialtime adversaries that make a polynomial number of queries to H_{1} and H_{2}. In particular, for any adversary asking q_{1} and q_{2} queries to H_{1} and H_{2}, respectively, the probability of collision is upper bounded by ∈_{Δ}^{coll}(q_{1}, q_{2}) ≤ + + + . + . Proof. (of Lemma 2) We refine the notation for the input and output of H_{1} and H_{2} as follows. Let h_{i} = H_{1}(R_{i}, M_{i}) and c_{j} = H_{2}(D_{j}, e_{j}) denote the ith and jth input/output of H_{1} and H_{2}, respectively. We first define the notation and some terminology. Let L_{1} denote the set of queries and answers for H_{1}, i.e., L_{1} = {(R_{i}, M_{i}, h_{i})i = 1, ..., q_{1}}. Define L_{2} = {(D_{j}, e_{j}, c_{j}) j = 1, ..., q_{2}} as well. Without loss of generality, we assume that every query is fresh, so every entry in L_{1} and L_{2} is unique. By (i, j), we denote a set that consists of the ith entry of L_{1} and the jth one of L_{2}, i.e., (i, j) = (R_{i}, M_{i}, h_{i}, D_{j}, e_{j}, c_{j}). We say that (i, j) is fully consistent if (R_{i}, h_{i}) = (D_{j}, e_{j}). Note that if (i, j) is fully consistent, it forms a correct computation of Δ. By FullCon (i, j) = true or false, we mean that (i, j) is or is not fully consistent, respectively. We also say that (i, j) is semiconsistent if R_{i} = D_{j}. A semiconsistent (i, j) may also be fully consistent but this is not necessary. By SemiCon(i, j) = true or false , we mean that (i, j) is or is not semiconsistent, respectively. Let τ(H_{b}, i) for b ∈1, 2 denote the time at which the ith query is made to H_{b}. (It can be understood as the number of steps before the ith query is made to oracle H_{1}). Since no more than one query can be made at one step, τ(H_{b}, i) is strictly larger or smaller than τ(H_{b′}, i′) if i ≠ i′. We say that (i, j) is regular if τ(H_{1}, i) < τ(H_{2}, j); otherwise, it is irregular. The intuition behind this terminology is that anyone who computes Δ correctly should ask H_{1 }first and then H_{2} in that order. Therefore, a normal computation of Δ yields regular and fully consistent (i, j). By Regular(i, j) = true or false , we mean that (i, j) is regular or irregular, respectively. We define function X(i, j) as X(i, j) = c_{j} ⊕ M_{i}. With this terminology, we can say that a collision Δ(R_{i}, M_{i}) = Δ(R_{u}, M_{u}) happens if and only if there exist j and υ such that – (i, j) ≠ (u, υ), – FullCon(i, j) = FullCon(u, υ) = true, and – h_{i}X(i, j) = h_{u}X(u, υ). Accordingly, we say that (i, j) and (u, υ) collide if and only if the above conditions are satisfied. By Collision(i, j, u, υ) = true or false , we mean (i, j) and (u, υ) do or do not collide, respectively. We classify entries in L_{2} by the value of e_{j}. Namely, for every different value of e_{j} in L_{2}, we consider a set of indexes {υe_{υ} = e_{j}} and label such sets as Class_{1}, Class_{2}, .... In this way, every entry in L_{2} is associated with one class. Obviously, there are at most q_{2} classes. Let SameClass( j, υ) be true if and only if j and υ are in the same class, i.e., e_{j} = e_{υ}. The above definitions lead simply to some facts. Fact 1. If SameClass( j, υ) = false, then Collision(i, j, u, υ) = false. If two consistent (i, j) and (u, υ) are in different classes, then h_{i} = e_{j} ≠ e_{υ} = h_{u} holds and they cannot collide. Moreover, if either of them is inconsistent, it cannot have a collision, either. Therefore, we consider the possibility of collision only within a class. Fact 2. {(i, j)FullCon(i, j) = true} ≤ q_{1}. Namely, there are at most q_{1} pairs of fully consistent queries. Suppose that (i, j) is fully consistent and (i, j′) is also fully consistent for some j′. Then, e_{j} = h_{i} = e_{j′} and D_{j} = R_{i} = D_{j′} holds, and it means that the jth and j′th queries to H_{2} are identical, whereas we assumed that L_{2} contains no duplicate entries. Fact 3. {(i, j)SemiCon(i, j) = true, ∧j∈Class_{k}} ≤ q_{1} for every Class_{k}. Namely, there are at most q_{1} semiconsistent pairs of queries with regard to each class of L_{2}. Suppose that (i, j) and (i, j′) are both semiconsistent and j and j′ are in the same class. We then have D_{j} = R_{i} = D_{j′} and e_{j} = e_{j′}. Hence, the jth and j′th entries in L_{2} are identical. Suppose that (i, j) and (u, υ) are both semiconsistent and they are in the same class. These queries can collide. (On the other hand, if either of them is not semiconsistent or if they are in different classes, they cannot collide.) Regarding the regularity of (i, j) and (u, υ), one of the following cases must be true. Case 1. Both are regular. Case 2. One is regular and the other is irregular. Case 3. Both are irregular. A pair of regular queries, say (i, j), is either fully consistent (the case where (R_{i}, h_{i}) is the input (D_{i}, e_{i}) to H_{2}) or inconsistent (something else is given as input to H_{2}). These cases are independent of the choice of the randomness of H_{1} and H_{2}. If (i, j) is regular but inconsistent, there is no chance of it causing a collision with another pair of queries. Hence, we only need to consider fully consistent queries in the regular case. Below, we consider the probability of collision in each of the above cases. Case 1. Consider regular and fully consistent queries (i, j) and (u, υ) in the same class. For a collision to occur, h_{i} = h_{u} and c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u} are necessary. First of all, h_{i} = h_{u} happens with probability at most because both h_{i} and h_{u} are uniformly chosen as a result of the true randomness of H_{1} and H_{2}. Next, consider c_{j} ⊕ M_{i} = c_{υ}⊕ M_{u}. Since (i, j) and (u, υ) are regular, τ(M_{i}) < τ(c_{j}) and τ(M_{u}) < τ(c_{υ}) hold. Therefore, at least one of c_{j} and c_{υ} is independent of M_{i} and M_{u}. Since both c_{j} and c_{υ} are chosen uniformly, c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u} holds with probability at most . Thus, (i, j) and (u, υ) collide with probability at most . Let FullCon(Class_{k}) denote the number of fully consistent queries in Class_{k}. By summing up the probability for all the classes and applying Fact 2, we get Case 2. For every entry i in L_{1}, there exists at most one entry, say j, in each class of L_{2} such that (i, j) is semiconsistent. (If there exists j′ in the same class such that (i, j′) is semiconsistent, then j = j′ because R_{i} = D_{j} = D_{j′} and e_{j} = e_{j′}.) We consider the probability that the semiconsistent (i, j) in the class is irregular and collides with a regular and fully consistent (u, υ) in the same class. First of all, (i, j) must be fully consistent to have a collision. Therefore, it must satisfy h_{i} = e_{j} when h_{i} is randomly and independently chosen, which in turn happens with probability at most . Second, c_{j} ⊕ M_{i} = c_{υ }⊕ M_{u} must hold. Since (i, j) is irregular, however, M_{i} can be set after the values of c_{j}, c_{υ}, and M_{u} have been seen, and this condition can be satisfied with probability 1 by setting M_{i} as M_{i} = c_{j} ⊕ c_{υ} ⊕ M_{u}. Note that there is only one regular and fully consistent (u, υ) in the class that causes c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u} with (i, j). Otherwise, there would exist another regular and fully consistent (u′, υ′) in the same class and it would cause c_{j} ⊕ M_{i} = c_{υ′} ⊕ M_{u′}. Then, (u, υ, u′, ¦Ô′) would have a collision. This has already been treated in Case 1. One question is whether or not such a manipulated M_{i} can also be a candidate for a collision in other classes. We claim that if (i, j) in Class_{k} has (u, υ) in Class_{k} that satisfies c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u}, then (i, j′) in Class_{k′} has (u′, υ′) in Class_{k′} that also causes c_{j′} ⊕ M_{i} = c_{υ′} ⊕ M_{u′} only with probability . (Naturally, we suppose (i, j′) is irregular and semiconsistent and (u′, υ′) is regular and fully consistent so they fall into Case 2.) This happens only if Since (u, υ) and (u′, υ′) are regular, all c_{j} c_{υ} ⊕ M_{u} c_{j′}, and c_{υ′} ⊕ M_{u′} are random and independent of each other. Hence, E_{q}.(11) is satisfied only with probability . In summary, with regard to each query i in L_{1}, we have a collision probability of at most +≤ + . Summing up the probabilities for all i in L_{1}, we get the upper bound of the probability for Case 2 as Case 3. We consider the probability that, for an irregular semiconsistent query (i, j) in a class, there exists another irregular semiconsistent query (u, υ) in the same class that causes a collision. First of all, (i, j) itself must be fully consistent. This is satisfied with probability at most . In addition, (u, υ) must be fully consistent and this happens with probability at most . Next, we consider the number of (u, υ) in the class, say Class_{k}, that collide with (i, j) with nonzero probability. We first claim that if j =υ , then (i, j) and (u, υ) cannot collide. Suppose that j =υ . Since both (i, j) and (u, υ) are semiconsistent, R_{i} = D_{j} = D_{υ} = R_{u} holds. Since (i, j) and (u, υ) are distinct, (R_{i}, M_{i}) ≠ (R_{u}, M_{u}), so M_{i} ≠ M_{u}. If a collision happens, c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u} must hold. But this is not possible because cj = cυ and M_{i} ≠ M_{u}. We next claim that if (i, j) and (u, υ) satisfy c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u}, then no other (u′,υ ) in the same class can cause c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u′}. The reason is essentially the same as for the above claim. Since both (u, υ) and (u′, υ) are semiconsistent and in the same class, M_{u} ≠ M_{u}′ must be the case (otherwise, u and u′ are the same query in L_{1}). Hence, c_{j} ⊕ M_{i} = c_{υ} ⊕ M_{u} = c_{υ} ⊕ M_{u}′ cannot happen. From the above two claims, we can see that, for (i, j), and for every υ in Class_{k}, there could exist at most one u that makes (i, j, u, υ) have a collision. Namely, there are at most Class_{k} — 1 candidates of (u, υ) that have nonzero probability of collision with (i, j). In summary, for every irregular semiconsistent (i, j), the probability that there exists an irregular semiconsistent (u, υ) that causes a collision is at most . . From Fact 3, we have at most q_{1} candidates of such (i, j) in a class. Thus, the probability of collision in each class is at most . . Summing up the probabilities for all classes and applying Σ_{k} Class_{k} ≤ q_{2}, we get Summary. From the bounds shown as in the Eqs. (10), (12), and (13), we get 4.2 In the ideal cipher modelLet (SG, SE, SD) be a symmetric encryption such that: SG is a probabilistic algorithm that takes security parameter λ and outputs a secret key sk, SE is a (probabilistic) algorithm that encrypts input message msg and outputs a ciphertext ξ by using secret key sk, and SD is a decryption algorithm that takes ciphertext ξ and recovers plaintext msg by using secret key sk. We model a symmetric encryption scheme as an ideal cipher in the following – SE and SD are oracles. – Given a query (sk, msg), oracle SE does the following. If (sk, msg) is not recorded in Π, an initially empty list, select a new random permutation RP: {0, 1}^{msg} → {0, 1}^{msg} and return RP(msg). Then, record (sk, msg, RP) to Π. If (sk, msg) is in Π, simply return RP(msg) using the corresponding RP. Oracle SD does the reverse. That is, given a query (sk, ξ), if (sk, ξ) is not in Π, select a new random permutation RP: {0, 1}^{ξ} → {0, 1}^{ξ}, record (sk, ξ, RP) in Π, and return RP^{−1}(ξ). If (sk, ξ) is in Π, simply return RP^{−1}(ξ) using the corresponding RP. We now construct Δ and Δ^{−1} as follows. Let str be a sufficiently long fixed string. Without loss of generality, we assume that SG takes randomness from the domain of a in the above construction. (Otherwise, a is preprocessed with an entropy smoothing method and then given to SG.) The output of SD is separated into two parts so that the length of the first part equals the length of the prefixed string str and the remaining part is taken as a recovered message. In Δ^{−1}, the recovered str is not used at all. It can be used to make the verification procedure slightly more efficient as follows. – (Signature verification: V) Given x and (r, z, n), compute c ← H(r, n), a ← V(x, c, z), and m ← Δ^{−1}(a, r). Then, let υar be the string obtained while computing Δ^{−1}. If υar = str, output (accept, mn). Otherwise, output reject. Namely, it verifies whether or not the predetermined string str is recovered correctly. We claim that the above construction conforms to our requirements. Again, the first three requirements are obviously satisfied. Collision resistance is stated with a proof below. Let _{str} be BitLen(str). Lemma 3. (Invertibility, compactness, and randomness) The above Δ is invertible. In particular, the above Δ^{−1} is the inverse function. It is compact when _{ str} is set to _{str} ≤ BitLen(Dom(A)). The output distribution of Δ is uniform over {0, 1}^{ str + rec}. Lemma 4. (Collision resistance) If _{str} and _{rec} are polynomial in λ, the above Δ is collision resistant against polynomialtime adversaries that make a polynomial number of queries to SE and SD. In particular, for any adversary asking q_{d} and q_{e} queries to SE and SD, respectively, ∈_{Δ}^{coll} ≤ + holds. Proof. Observe that a collision happens only if – there exists at least one pair of the same value among the returned values from SE or – there exists a message returned from SD whose leading part is str. The probability of the former case is upper bounded by as a result of the birthday paradox among at most q_{e} randomly selected return values from SE. Since every value returned from SD is random, the probability that the second case occurs when at most q_{d} queries are asked is at most . Summing up these bounds gives the upper bound as stated. From these lemmas, recommendable settings would be _{str} = _{rec} = BitLen(Dom(A))/2 to balance the efficiency of the message recovery property and unforgeability. With this instantiation of the Δ function and the Schnorr identification scheme defined over a group over an ellipticcurve as the underlying sigmaprotocol, the resulting message recovery signature scheme turns out to be the ECPV scheme presented in [12]. 5. Conclusion and open problemsWe presented a generic method that transforms a sigmaprotocol into a message recovery signature scheme in the random oracle model with specific properties of the redundancy function. The framework allows one to build a new message recovery signature scheme in a modular fashion such that the sigmaprotocol and the redundancy function are designed and analyzed in completely separate ways. Thus, one can now focus on designing the redundancy function with the shown properties and then automatically obtain a secure digital signature scheme with message recovery. Two specific redundancy functions were shown and it was proved that one meets all the sufficient properties in the random oracle model while the other meets those in the ideal cipher model. Combined with a sigmaprotocol in our framework, the first redundancy function yields a refined version of the ECAO message recovery signature scheme with a more understandable and convincing modular security proof. This can be regarded as an example that shows that our new framework can improve existing schemes by increasing their security. The second redundancy function yields an already known scheme, ECPV with another security proof. Thus, our framework gives another insightful explanation into an existing scheme and validates its design. These examples show the usefulness of our framework. One of the remaining challenges is to construct the redundancy function without using idealized assumptions. Another challenge is to find another framework that yields shorter signed messages. One essential question is whether the redundancy is unavoidable or not for any message recovery signature schemes. Another direction of research includes relaxing the requirements for the redundancy function or even finding necessary and sufficient conditions. References
