﻿Gaussian Bounds for Noise Correlation of Functions
arXiv:math/0703683v4 [math.PR] 16 Mar 2008
Elchanan Mossel ∗
U.C. Berkeley
mossel@stat.berkeley.edu
June 11, 2009
Abstract
In this paper we derive tight bounds on the expected value of products of low influence functions
defined on correlated probability spaces. The proofs are based on extending Fourier theory
to an arbitrary number of correlated probability spaces, on a generalization of an invariance principle
recently obtained with O’Donnell and Oleszkiewicz for multilinear polynomials with low
influences and bounded degree and on properties of multi-dimensional Gaussian distributions.
We present two applications of the new bounds to the theory of social choice. We show
that Majority is asymptotically the most predictable function among all low influence functions
given a random sample of the voters. Moreover, we derive an almost tight bound in the context
of Condorcet aggregation and low influence voting schemes on a large number of candidates.
In particular, we show that for every low influence aggregation function, the probability that
Condorcet voting on k candidates will result in a unique candidate that is preferable to all
others is k−1+o(1) . This matches the asymptotic behavior of the majority function for which the
probability is k−1−o(1) .
A number of applications in hardness of approximation in theoretical computer science were
obtained using the results derived here in subsequent work by Raghavendra and jointly with
Austrin.
A different type of applications involves hyper-graphs and arithmetic relations in product
spaces. For example, we show that if A ⊂ Zn m is of low influences, then the number of k
tuples (x1, . . . , xk) ∈ Ak satisfying ∑k
i=1 xi ∈ Bn mod m where B ⊂ [m] satisfies |B| ≥ 2 is
(1±o(1))P[A] k (mk−1|B|) n which is the same as if A were a random set of probability P[A]. Our
results also show that for a general set A without any restriction on the influences there exists
a set S ⊂ [n] with |S| = O(1) such that if C = {x : ∃y ∈ A, y [n]\S = x [n]\S} then the number of
k-tuples (x1, . . .,xk) ∈ Ck satisfying ∑k
i=1 xi ∈ Bn mod m is (1 ± o(1))P[C] k (mk−1|B|) n .
∗
Supported by a Sloan fellowship in Mathematics, by BSF grant 2004105, NSF Career Award (DMS 054829) and
by ONR award N00014-07-1-0506. Part of this work was carried out while the author was visiting IPAM, UCLA
11 Introduction
1.1 Harmonic analysis of boolean functions
This paper studies low influence functions f : Ω n → [0,1], where (Ω n ,µ n ) is a product probability
space and where the influence of the ith coordinate on f, denoted by Infi(f) is defined by
Infi(f) = E[Var[f(x)]]. (1)
x xi
The study of low influence functions is motivated by applications from the theory of social
choice in mathematical economics, from applications in the theory of hardness of approximation
in theoretical computer science and from problems in additive number theory. We refer the reader
to some recent papers [14, 15, 16, 17, 6, 23, 8] for motivation and general background. The main
theorems established here provide tight bounds on the expected value of the product of functions
defined on correlated probability spaces. These in turn imply some new results in the theory
of social choice and in the theory of hyper-graphs. Application to hardness of approximation in
computer science were derived in subsequent work in [1] and [21].
In our main result we consider a probability measure P defined on a space ∏k
i=1 Ωi . Letting
fi : (Ωi) n → [0,1],1 ≤ i ≤ k be a collection of low influence functions we derive tight bounds on
E[f1 ...fk] in terms of E[f1],... ,E[fk] and a measure of correlation between the spaces Ω1 ,... ,Ωk.
The bounds are expressed in terms of extremal probabilities in Gaussian space, that can be calculated
in the case k = 2. When k ≥ 2 and P is a pairwise independent distribution our bounds
show that E[f1 ...fk] is close to ∏k
i=1 E[fi]. We also apply a simple “weak Szemeredi regularity”
type argument in order to obtain results for general functions not necessarily of low influences.
The results show that the bounds for low influence functions hold for general functions after the
functions have been “modified” in a bounded number of coordinates. The rest of the introduction
is devoted to various applications to social choice and to the theory of hyper-graph followed by the
statement of the main technical results.
1.2 Prediction Of Low Influence Voting
Suppose n voters are to make a binary decision. Assume that the outcome of the vote is determined
by a social choice function f : {−1,1} n → {−1,1}, so that the outcome of the vote is f(x1,...,xn)
where xi ∈ {−1,1} is the vote of voter i. We assume that the votes are independent, each ±1 with
probability 1
2
. It is natural to assume that the function f satisfies f(−x) = −f(x), i.e., it does not
discriminate between the two candidates. Note that this implies that E[f] = 0 under the uniform
distribution. A natural way to try and predict the outcome of the vote is to sample a subset of
the voters, by sampling each voter independently with probability ρ. Conditioned on a vector x of
votes the distribution of y, the sampled votes, is i.i.d. where yi = xi with probability ρ and yi = ∗
(for unknown) otherwise.
Conditioned on y, the vector of sampled votes, the optimal prediction of the outcome of the
vote is given by sgn((Tf)(y)) where
This implies that the probability of correct prediction is given by
(Tf)(y) = E[f(x)|y]. (2)
P[f = sgn(Tf)] = 1
(1 + E[f sgn(Tf)]).
2
For example, when f(x) = x1 is the dictator function, we have E[f sgn(Tf)] = ρ corresponding to
the trivial fact that the outcome of the election is known when voter 1 is sampled and are ±1 with
2probability 1/2 otherwise. The notion of predictability is very natural in statistical contexts. It
was also studied in a more combinatorial context in [19].
In the first application presented here we show that
Theorem 1.1 (“Majority Is Most Predictable”) Let 0 ≤ ρ ≤ 1 and ǫ > 0 be given. Then
there exists τ > 0 such that if f : {−1,1} n → [−1,1] satisfies E[f] = 0 and Infi(f) ≤ τ for all i,
then
E[f sgn(Tf)] ≤ 2
π arcsin √ ρ + ǫ, (3)
where T is defined in (2).
Moreover, it follows from the central limit theorem (see Section 7.5; a version of this calculation
also appears in [19]) that if Maj n(x1,...,xn) = sgn( ∑ n
i=1 xi), then
lim
n→∞ E[Majnsgn(TMaj n)] = 2
π arcsin √ ρ.
We note that the bound obtained in Theorem 1.1 is a reminiscent of the Majority is Stablest
theorem [16, 17] as both involve the arcsin function. However, the two theorems are quite different.
The Majority is Stablest theorem asserts that under the same condition as in Theorem 1.1 it holds
that
E[f(x)f(y)] ≤ 2
π
arcsin ρ + ǫ.
where (xi,yi) ∈ {−1,1} 2 are i.i.d. with E[xi] = E[yi] = 0 and E[xiyi] = ρ. Thus “Majority is
Stablest” considers two correlated voting vectors, while “Majority is Most Predictable” considers a
sample of one voting vector. In fact, both results follow from the more general invariance principle
presented here.
It is well known that in the context of “Majority is Stablest”, among all boolean functions with
E[f] = 0 the maximum of E[f(x)f(y)] is obtained for dictator functions of the form f(x) = xi.
However, note that for ρ → 0 we have arcsin √ ρ >> ρ implying that for small values of ρ in the
context of prediction the majority function is much more predictable than the dictator function.
We also note that the “Ain’t over until it’s over” Theorem [16, 17] provides a bound under the
same conditions on
P[Tf > 1 − δ],
for small δ. However, this bound is not tight and does not imply Theorem 1.1. Similarly, Theorem
1.1 does not imply the “Ain’t over until it’s over” theorem. The bounds in “Ain’t Over Until
It’s Over” were derived using invariance of Tf while the bound (3) requires the joint invariance of
f and Tf.
1.3 Condorcet Paradoxes
Suppose n voters rank k candidates. It is assumed that each voter i has a linear order σi ∈ S(k)
on the candidates. In Condorcet voting, the rankings are aggregated by deciding for each pair of
candidates which one is superior among the n voters.
More formally, the aggregation results in a tournament Gk on the set [k]. Recall that Gk is a
tournament on [k] if it is a directed graph on the vertex set [k] such that for all a,b ∈ [k] either
(a > b) ∈ Gk or (b > a) ∈ Gk. Given individual rankings (σi) n i=1 the tournament Gk is defined as
follows.
Let x a>b (i) = 1, if σi(a) > σi(b), and x a>b (i) = −1 if σi(a) < σi(b). Note that x b>a = −x a>b .
3The binary decision between each pair of candidates is performed via a anti-symmetric function
f : {−1,1} n → {−1,1} so that f(−x) = −f(x) for all x ∈ {−1,1}. The tournament Gk = Gk(σ;f)
is then defined by letting (a > b) ∈ Gk if and only if f(xa>b ) = 1.
Note that there are 2 (k
2) tournaments while there are only k! = 2Θ(k log k)
linear rankings. For
the purposes of social choice, some tournaments make more sense than others.
Definition 1.2 We say that a tournament Gk is linear if it is acyclic. We will write Acyc(Gk)
for the logical statement that Gk is acyclic. Non-linear tournaments are often referred to as nonrational
in economics as they represent an order where there are 3 candidates a,b and c such that
a is preferred to b, b is preferred to c and c is preferred to a.
We say that the tournament Gk is a unique max tournament if there is a candidate a ∈ [k] such
that for all b ̸= a it holds that (a > b) ∈ Gk. We write UniqMax(Gk) for the logical statement that
Gk has a unique max. Note that the unique max property is weaker the linearity. It corresponds to
the fact that there is a candidate that dominates all other candidates.
Following [12, 11], we consider the probability distribution over n voters, where the voters have
independent preferences and each one chose a ranking uniformly at random among all k! orderings.
Note that the marginal distributions on vectors x a>b is the uniform distribution over {−1,1} n and
that if f : {−1,1} n → {−1,1} is anti-symmetric then E[f] = 0.
The case that is now understood is k = 3. Note that in this case G3 is unique max if and
only if it is linear. Kalai [11] studied the probability of a rational outcome given that the n voters
vote independently and at random from the 6 possible rational rankings. He showed that the
probability of a rational outcome in this case may be expressed as 3
4
(1 + E[fTf]) where T is the
Bonami-Beckner operator with parameter ρ = 1/3.
It is natural to ask which function f with small influences is most likely to produce a rational
outcome. Instead of considering small influences, Kalai considered the essentially stronger assumption
that f is monotone and “transitive-symmetric”; i.e., that for all 1 ≤ i < j ≤ n there exists a
permutation σ on [n] with σ(i) = j such that f(x1,... ,xn) = f(x σ(1),... ,x σ(n)) for all (x1,... ,xn).
Kalai conjectured that Majority is the transitive-symmetric function that maximized 3
4 (1+E[fTf]).
This was proven using the Majority is Stablest Theorem [16, 17]. Here we obtain similar results
for any value of k. Our result is not tight, but almost tight. More specifically we show that:
Theorem 1.3 (“Majority is best for Condorcet”) Consider Condorcet voting on k candidates.
Then for all ǫ > 0 there exists τ = τ(k,ǫ) > 0 such that if f : {−1,1} n → {−1,1} is anti-symmetric
and Infi(f) ≤ τ for all i, then
P[UniqMax(Gk(σ;f))] ≤ k −1+ok(1)
+ ǫ. (4)
Moreover for f = Maj n we have Infi(f) ≤ O(n −1/2 ) and it holds that
P[UniqMax(Gk(σ;f))] ≥ k −1−ok(1)
− on(1). (5)
Interestingly, we are not able to derive similar results for Acyc. We do calculate the probability
that Acyc holds for majority.
Proposition 1.4 We have
lim
n→∞ P[Acyc(Gk(σ;Maj n))] = exp(−Θ(k 5/3 )). (6)
We note that results in economics [3] have shown that for majority vote the probability that the
outcome will contain a Hamiltonian cycle when the number of voters goes to infinity is 1 − ok(1).
41.4 Hyper Graph and Additive Applications
Here we discuss some applications concerning hyper-graph problems. We let Ω be a finite set
equipped with the uniform probability measure. We let R ⊂ Ω k denote a k-wise relation. For sets
A1,...,Ak ⊂ Ωn we will be interested in the number of k-tuples x1 ∈ A1,... ,xk ∈ Ak satisfying
the relation R in all coordinates, i.e. (x1 i ,... ,xki ) ∈ R for all i. Assume below that R satisfies the
following two properties:
• For all a ∈ Ω and all 1 ≤ i ≤ k it holds that P[xi = a|R(x)] = |Ω| −1 . (This assumption is
actually not needed for the general statement – we state it for simplicity only).
• The relation R is connected. This means that x,y ∈ R there exists a path x = y 0 ,y1,... ,y r =
y in R such that y i and y i+1 differ in one coordinate only.
We will say that the relation R ⊂ Ω k that is pairwise smooth if for all i,j ∈ [k] and a,b ∈ Ω it holds
that
P[xi = a,xj = b|R] = P[xi = a|R]P[xj = b|R]
As a concrete example, consider the case where Ω = Zm
∑
and R consists of all k-tuples satisfying
k
i=1 xi ∈ B mod m where B ⊂ Zm. When k ≥ 2 we have P[xi = a|R] = m−1 for all i and a.
When k ≥ 3, we have pairwise smoothness. The connectivity condition holds whenever |B| > 1.
For a set A ⊂ Zn m and S ⊂ [n] we define
AS = {y : ∃x ∈ A,x [n]\S = y [n]\S},A S = {y : ∀x s.t. x [n]\S = y [n]\S, it holds that x ∈ A}
Our main result in the context of hyper graphs is the following.
Theorem 1.5 Let R be a connected relation on Ω k . Then there exist two continuous functions
Γ : (0,1) k → (0,1) and Γ : (0,1) k → (0,1) such that for every ǫ > 0 there exists a τ > 0 such that
if A1,... ,Ak ⊂ Ω n are sets with Ii(Aj) ≤ τ for all i and j then
(Γ(P[A1],... ,P[Ak]) − ǫ)P[R n ] ≤ P[R n ∩ (A1,... ,Ak)] ≤ (Γ(P[A1],... ,P[Ak]) + ǫ)P[R n ].
If R is pairwise smooth, then:
|P[R n ∩ (A1,...,Ak)] − P[R n ]P[A1]...P[Ak]| ≤ ǫP[R n ].
Moreover, one can take τ = ǫO(mk+1 log(1/ǫ)/ǫ)
.
For general sets A1,...,Ak, not necessarily of low influences, there exists a set S of coordinates
such that |S| ≤ O(1/τ) and the statements above hold for A S 1 ,...,AS k
1.5 Correlated Spaces
and for AS 1 ,...,AS k .
A central concept that is extensively studied and repeatedly used in the paper is that of correlated
probability spaces. We begin by defining notions of correlation between probability spaces. We will
later show how to relate this notion to noise operators.
Definition 1.6 Given a probability measure P defined on ∏ k
i=1 Ωi , we say that Ω 1 ,... ,Ω k are
correlated spaces. For A ⊂ Ω i we let
P[A] = P[(ω1,...,ωk) ∈
k∏
Ω j : ωi ∈ A],
and similarly E[f] for f : Ω i → R. We will write notation by writing P[A] for P n [A] for A ⊂
( ∏ k
i=1 Ωi ) n or A ⊂ (Ω i ) n and similarly for E.
j=1
5Definition 1.7 Given two linear subspaces A and B of L 2 (P) we define the correlation between A
and B by
ρ(A,B;P) = ρ(A,B) = sup{Cov[f,g] : f ∈ A,g ∈ B,Var[f] = Var[g] = 1}. (7)
Let Ω = (Ω 1 × Ω 2 ,P). We define the correlation ρ(Ω 1 ,Ω 2 ;P) by letting:
ρ(Ω 1 ,Ω 2 ;P) = ρ(L 2 (Ω 1 ,P),L 2 (Ω 2 ,P);P). (8)
More generally, let Ω = ( ∏k
i=1 Ωi ,P) and for a subset S ⊂ [k], write ΩS = ∏
i∈S Ωi . The correlation
vector ρ(Ω1 ,...,Ω k ;P) is a length k − 1 vector defined by
ρ(i) = ρ(Ω {1,...,i} ,Ω {i+1,...,k} ;P),
and the correlation ρ(Ω 1 ,... ,Ω k ) is defined by letting:
ρ(Ω 1 ,...,Ω k ;P) = max ρ(i). (9)
1≤i≤k−1
Remark 1.8 It is easy to see that ρ(Ω 1 ,Ω 2 ;P) is the second singular value of the conditional
expectation operator mapping f ∈ L 2 (Ω 2 ,P) to g(x) = E[f|x] ∈ L 2 (Ω 1 ,P).
Definition 1.9 Given ( ∏k
with |S| ≤ r and for all ∏
i=1 Ωi ,P), we say that Ω1 ,...,Ω k are r-wise independent if for all S ⊂ [k]
i∈S Ai ⊂ ∏
i∈S Ωi it holds that
P[ ∏
Ai] = ∏
P[Ai].
i∈S
The notion of r-wise independence is central in computer science and discrete mathematics, in
particular in the context of randomized algorithms and computational complexity.
1.6 Gaussian Stability
Our main result states bounds in terms of Gaussian stability measures which we discuss next. Let
γ be the one dimensional Gaussian measure.
Definition 1.10 Given µ ∈ [0,1], define χµ : R → {0,1} to be the indicator function of the
interval (−∞,t], where t is chosen so that Eγ[χµ] = µ. Explicitly, t = Φ −1 (µ), where Φ denotes
the distribution function of a standard Gaussian. Furthermore, define
i∈S
Γρ(µ,ν) = P[X ≤ Φ −1 (µ), Y ≤ Φ −1 (ν)],
Γρ (µ,ν) = P[X ≤ Φ −1 (µ), Y ≥ Φ −1 (1 − ν)],
( )
1 ρ
ρ 1
Given (ρ1,... ,ρk−1) ∈ [0,1] k−1 and (µ1,... ,µk) ∈ [0,1] k for k ≥ 3 we define by induction
where (X,Y ) is a two dimensional Gaussian vector with covariance matrix
and similarly Γ().
Γρ1,...,ρk−1 (µ1,... ,µk) = Γρ1 (µ1,Γρ2,...,ρk−1 (µ2,... ,µk)),
61.7 Statements of main results
We now state our main results. We state the results both for low influence functions and for general
functions. For the second family it is useful to define the following notions:
Definition 1.11 Let f : Ω n → R and S ⊂ [n]. We define
f S (x) = sup(f(y) : y [n]\S = x [n]\S),
f S (x) = inf(f(y) : y [n]\S = x [n]\S).
Theorem 1.12 Let ( ∏ k
j=1 Ωj
i ,Pi),1 ≤ i ≤ n be a sequence of finite probability spaces such that
for all 1 ≤ i ≤ n the minimum probability of any atom in ∏k
j=1 Ωj
i
is at least α ≤ 1/2. Assume
furthermore that
ρ(Ω 1 i ,...,Ωk i ;Pi) ≤ ρ < 1,
for all i,j. Then for all ǫ > 0 there exists τ > 0 such that if
satisfy
then
ρ(Ω 1,...,j
i
,Ω j+1,...,k
i
) ≤ ρ(j) (10)
fj :
n∏
i=1
Γ ρ(E[f1],... ,E[fj]) − ǫ ≤ E[
If we instead of (10) we assume that
for all i,j ̸= j ′ then
One may take
j=1
Ω j
i
→ [0,1].
max
i,j (Infi(fj)) ≤ τ (11)
k∏
fj] ≤ Γρ(E[f1],... ,E[fj]) + ǫ. (12)
j=1
ρ(Ω j
i ,Ωj′
i
) = 0, (13)
k∏
k∏
E[fj] − ǫ ≤ E[ fj] ≤
j=1
k∏
E[fj] + ǫ. (14)
j=1
τ = ǫ O(log(1/ǫ)log(1/α)
(1−ρ)ǫ )
.
A truncation argument allows to relax the conditions on the influences.
Proposition 1.13 For statement (12) to hold in the case where k = 2 it suffices to require that
max(min(Infi(f1),Infi(f2))) ≤ τ (15)
i
instead of (11).
In the case where for each i the spaces Ω 1 i ,... ,Ωk i
to hold it suffices to require that for all i
are s-wise independent, for statement (14)
|{j : Infi(fj) > τ}| ≤ s. (16)
An easy recursive argument allows to conclude the following result that does not require low
influences (11).
7Proposition 1.14 Consider the setting of Theorem 1.12 without the assumptions on low influences
(11).
Assuming (10), there exists a set S of size O(1/τ) such that the functions f S
j satisfy
E[
k∏
j=1
and the functions f S
j satisfy
E[
j=1
Assuming (13), we have
and similarly for f.
1.8 Road Map
f S
j ] ≥ Γρ(E[f S
1 ],... ,E[f S
j ]) − ǫ ≥ Γρ(E[f1],... ,E[fj]) − ǫ,
k∏
f S
j ] ≤ Γρ(E[f S
],... ,E[fS
1 j ]) − ǫ ≤ Γρ(E[f1],... ,E[fj]) − ǫ.
E[
k∏
j=1
f S
j ] ≥
k∏
j=1
E[f S
j ] − ǫ ≥
k∏
E[fj] − ǫ,
Let us review some of the main techniques we use in this paper.
• We develop a Fourier theory on correlated spaces in Section 2. Previous work considered
Fourier theory on one product space and reversible operators with respect to that space [6].
Our results here allows to study non-reversible operators which in turn allows to study products
of k correlated spaces. An important fact we prove that is used repeatedly is that general
noise operators respect “Efron-Stein” decomposition. This fact in particular allows to “truncate”
functions to their low degree parts when considering the expected value of the product
of functions on correlated spaces.
• In order to derive an invariance principle we need to extend the approach of [16, 17] to prove
the joint invariance of a number of multi-linear polynomials. The proof of the extension
appears in sections 3 and 4. The proof follows the same main steps as in [16, 17] but requires
a number of adaptations.
• In the Gaussian realm, we need to extend Borell’s isoperimetric result [5] both in the case of
two collections of Gaussians and in the case of k > 2 collections. This is done in Section 5.
• The proof of the main result, Theorem 1.12 follows in Section 6. The proof of the extensions
given in Proposition 1.13 uses a truncation argument for which s-wise independence plays a
crucial role. The proof of Proposition 1.14 is based on a simple “weak regularity” argument.
• In Section 7 we apply the noise bounds in order to derive the social choice results. Some
calculation with the majority function in the social choice setting, in particular showing the
tightness of theorems 1.1 and 1.3 are given in Section 7.5. We conclude by discussing the
applications to hyper-graphs in Section 8.
j=1
81.9 Subsequent Work And Applications in Computer Science
Subsequently to posting a draft of this paper on the Arxiv, two applications of our results to
hardness of approximation in computer science have been established. Both results are in the
context of the Unique Games conjecture in computational complexity [13]. Furthermore, both
result consider an important problem in computer science, that is - the problem of solving constraint
satisfaction problems (CSP)
.
Given a predicate P : [q] k → {0,1}, we define Max CSP(P) to be the algorithmic problem
where we are given a set of variables x1,... ,xn taking values in [q] and a set of constraints of the
form P(l1,... ,lk) where each li = xj + a where xj is one of the variables and a ∈ [q] is a constant
(addition is mod q). More generally the problem of Max k-CSPq we are given set of constraints
each involving k of the variables x1,... ,xn.
The objective is to find an assignment to the variables satisfying as many of the constraints as
possible. The problem of Max k-CSPq is NP-hard for any k ≥ 2,q ≥ 2, and as a consequence,
a lot of research has been focused on studying how well the problem can be approximated. We
say that a (randomized) algorithm has approximation ratio α if, for all instances, the algorithm is
guaranteed to find an assignment which (in expectation) satisfies at least α ·Opt of the constraints,
where Opt is the maximum number of simultaneously satisfied constraints, over any assignment.
The results of [1] show that assuming the Unique Games Conjecture, for any predicate P if
there exists a pairwise independent distribution over [q] k with uniform marignals, whose support is
contained in P −1 (1) then P is approximation resilient. In other words, there is no polynomial time
algorithm which achieves better approximation factor than assigning the variables at random. This
result imply in turn that For general k ≥ 3 and q ≥ 2, the Max k-CSPq problem is UG-hard to
approximate within O(kq 2 )/q k + ǫ. Moreover, for the special case of q = 2, i.e., boolean variables,
it gives hardness of (k+O(k 0.525 ))/2 k +ǫ, improving upon the best previous bound [23] of 2k/2 k +ǫ
by essentially a factor 2. Finally, again for q = 2, assuming that the famous Hadamard Conjecture
is true, this can be improved even further, and the O(k 0.525 ) term can be replaced by the constant
4.
These results should be compared to prior work by Samordnitsky and Trevisan [23] who using
the Gowers norm, proved that the Max k-CSP problem has a hardness factor of 2 ⌈log 2 k+1⌉ /2 k ,
which is (k + 1)/2 k for k = 2 r − 1, but can be as large as 2k/2 k for general k.
From the quantitative point of view [1] give stronger stronger hardness than [23] for Max k-CSPq,
even in the already thoroughly explored q = 2 case. These improvements may seem very small,
being an improvement only by a multiplicative factor 2. However, it is well known that it is impossible
to get non-approximability results which are better than (k + 1)/2 k , and thus, in this respect,
the hardness of (k + 4)/2 k assuming the Hadamard Conjecture is in fact optimal to within a very
small additive factor. Also, the results of [1] give approximation resistance of Max CSP(P) for a
much larger variety of predicates (any P containing a balanced pairwise independent distribution).
From a qualitative point of view, the analysis of [1] is very direct and general enough to accommodate
any domain [q] with virtually no extra effort. Also, their proof using the main result of the
current paper, i.e., bounds on expectations of products under certain types of correlation, putting
it in the same general framework as many other UGC-based hardness results, in particular those
for 2-CSPs.
In a second beautiful result by Raghavendra [21] the results of the current paper were used to
obtained very general hardness results for Max k-CSPP. In [21] it is shown that for every predicate
9P and for every approximation factor which is smaller than the UG-hardness of the problem, there
exists a polynomial time algorithm which achieves this approximation ratio. Thus for every P the
UG-hardness of Max k-CSPP is sharp. The proof of the results uses the results obtained here in
order to define and analyze the reduction from UG given the integrality gap of the corresponding
convex optimization problem. We note that for most predicates the UG hardness of Max k-CSPP
is unknown and therefore the results of [21] complement those of [1].
1.10 Acknowledgments
I would like to thank Noam Nisan for suggesting that generalization of the invariance principle
should be useful in the social choice context and Gil Kalai for pointing out some helpful references.
I would also like to thanks Terence Tao for helpful discussions and references on additive number
theory and Szemeredi regularity. Finally, Thanks to Per Austrin, Marcus Issacson and Nick
Crawford for careful reading of a draft of this paper.
2 Correlated Spaces and Noise
In this section we define and study the notion of correlated spaces and noise operators in a general
setting.
2.1 Correlated Probability Spaces and Noise Operators
We begin by defining noise operators and giving some basic examples.
Definition 2.1 Let (Ω 1 × Ω 2 ,P) be two correlated spaces. The Markov Operator associated with
(Ω 1 ,Ω 2 ) is the operator mapping f ∈ L p (Ω 2 ,P) to Tf ∈ L p (Ω 1 ,P) by:
(Tf)(x) = E[f(y)|x].
Example 2.2 In order to define Bonami-Beckner operator T = Tρ on a space (Ω,µ), consider the
space (Ω × Ω,ν) where ν(x,y) = (1 − ρ)µ(x)µ(y) + ρδ(x = y)µ(x). In this case, the operator T
satisfies:
Tf(x) = E[f(y)|x],
where the conditional distribution of y given x is ρδx + (1 − ρ)µ.
Remark 2.3 The construction above can be generalized as follows. Given any Markov chain on Ω
that is reversible with respect to µ, we may look at the measure ν on Ω × Ω defined by the Markov
chain. In this case T is the Markov operator determined by the chain. The same construction
applies under the weaker condition that T has µ as its stationary distribution.
It is straightforward to verify that:
Proposition 2.4 Suppose that for each 1 ≤ i ≤ n, (Ω1 i × Ω2i ,µi) are correlated spaces and Ti
is the Markov operator associated with Ω1 i and Ω2i . Then (∏ n
i=1 Ω1i ,∏ n
i=1 Ω2i ,∏ n
i=1 µi) defines two
correlated spaces and the Markov operator T associated with them is given by T = ⊗n i=1Ti. Example 2.5 For product spaces ( ∏n
i=1 Ωi, ∏n
i=1 µi), the Bonami-Beckner operator T = Tρ is
defined by T = ⊗n i=1T i ρ , where T i is the Bonami-Beckner operator on (Ωi × Ωi,µi). This noise
operator is the one most commonly discussed in previous work, see e.g. [10, 14, 17]. In a more
recent work [6] the case of Ωi × Ωi with Ti a reversible Markov operator with respect to a measure
µi on Ωi was studied.
10Example 2.6 In the first social choice example the space Ω = {±1} × {0, ±1} where element
(x,y) ∈ Ω corresponds to a voter with vote x and a sampled vote y where either y = x if the vote
is queried or y = 0 otherwise. The probability measure µ is given by
and
µ(x,y) = 1
(δ(x = 1) + δ(x = −1))(ρδ(y = x) + (1 − ρ)δ(y = 0)).
2
Note that the marginal distributions on ΩS = {0, ±1} and ΩV = {±1} are given by
µ = (1 − ρ)δ0 + ρ
2 (δ−1 + δ1), ν = 1
2 (δ−1 + δ1),
ν(·| ± 1) = δ±1, ν(·|0) = 1
2 (δ1 + δ−1).
Given independent copies µi of µ and νi of ν, the measure µ = ⊗ n i=1 µi corresponds to the distribution
of a sample of voters where each voter is sampled independently with probability ρ and the
distribution of the voters is given by ν = ⊗ n i=1 νi.
Example 2.7 The second non-reversible example is natural in the context of Condorcet voting.
For simplicity, we first discuss the case of 3 possible outcomes. The general case is discussed later.
Let τ denote the uniform measure on the set permutations on the set [3] denoted S [3]. Note that
each element σ ∈ S [3] defines an element f ∈ {−1,1} ([3]
2
) by letting f(i,j) = sgn(σ(i) − σ(j)). The
measure so defined, defined 3 correlated probability spaces ({±1} 3 ,P).
Note that the projection of P to each coordinate is uniform and
and
P(f(3,1) = −1|f(1,2) = f(2,3) = 1) = 0, P(f(3,1) = 1|f(1,2) = f(2,3) = −1) = 0,
P(f(3,1) = ±|f(1,2) ̸= f(2,3)) = 1/2.
2.2 Properties of Correlated Spaces and Noise Operators
Here we derive properties of correlated spaces and noise operators that will be repeatedly used
below.
Lemma 2.8 Let (Ω 1 × Ω 2 ,P) be two correlated spaces. Let f be an Ω 2 measurable function with
E[f] = 0, and E[f 2 ] = 1. Then among all g that are Ω 1 measurable satisfying E[g 2 ] = 1, a
maximizer of |E[fg]| is given by
g =
where Tf is the noise operator associated with Ω 1 ,Ω 2 . Moreover,
Tf
√
E[(Tf)
2
]
, (17)
|E[gf]| = |E[fTf]|
√
E[(Tf) 2 ]
= √ E[(Tf) 2 ]. (18)
Proof: To prove (17) let h be an Ω 1 measurable function with ‖h‖2 = 1. Write h = αg+βh ′ where
α 2 + β 2 = 1 and ‖h ′ ‖2 = 1 is orthogonal to g. From the properties of conditional expectation it
follows that E[fh ′ ] = 0. Therefore we may choose an optimizer satisfying α ∈ ±1. Equation (18)
follows since Tf is a conditional expectation. The same reasoning shows that E[fg] = 0 for every
Ω 1 measurable function g if Tf is identically 0.
✷
The following lemma is useful in bounding ρ in generic situations. Roughly speaking, it shows
that connectivity of the support of P on correlated spaces Ω 1 × Ω 2 implies that ρ < 1.
11Lemma 2.9 Let (Ω 1 ×Ω 2 ,P) be two correlated spaces such that the probability of the smallest atom
in Ω 1 × Ω 2 is at least α > 0. Define a bi-partite graph G = (Ω 1 ,Ω 2 ,E) where (a,b) ∈ Ω 1 × Ω 2
satisfies (a,b) ∈ E if P(a,b) > 0. Then if G is connected then
ρ(Ω 1 ,Ω 2 ;P) ≤ 1 − α 2 /2.
Proof: For the proof it would be useful to consider G ′ = (Ω 1 ∪ Ω 2 ,E), a weighted directed graph
where W(a → b) = P[b|a] and W(b → a) = P[a|b]. Note that the minimal non-zero weight must
be at least α and that W(a → b) > 0 iff W(b > a) > 0. This in turn implies that G ′ is strongly
connected. Note that G ′ is bi-partite.
Let A be the transition probability matrix defined by the weighted graph G ′ . Since G ′ is
connected and W(a,b) ≥ α for all a and b such that W(a → b) > 0, it follows by Cheeger’s inequality
that the spectral gap of A is at least α 2 /2. Our goal is to bound ‖Af‖2 for a function f that is
supported on Ω 1 and satisfies E[f] = 0. Note that such function is orthogonal to the eigen-vectors
of A corresponding to the eigenvalues −1 and 1. It therefore follows that ‖Af‖2 ≤ (1 − α 2 /2)‖f‖2
as needed. ✷
One nice property of noise operators that will be repeatedly used below is that they respect the
Efron-Stein decomposition. Given a vector x in an n dimensional product space and S ⊂ [n] we
write xS for the vector (xi : i ∈ S). Given probability spaces Ω1,...,Ωn, we use the convention of
writing Xi for a random variable that is distributed according to the measure of Ωi and xi for an
element of Ωi. We will also write XS for (Xi : i ∈ S).
Definition 2.10 Let (Ω1,µ1),... ,(Ωn,µn) be discrete probability spaces (Ω,µ) = ∏ n
i=1 (Ωi,µi).
The Efron-Stein decomposition of f : Ω → R is given by
where the functions fS satisfy that:
• fS depends only on xS.
• For all S ̸⊆ S ′ and all xS ′
f(x) = ∑
it holds that:
S⊆[n]
fS(xS), (19)
E[fS|XS ′ = xS ′] = 0.
It is well known that the Efron-Stein decomposition exists and that it is unique.
We now prove that the Efron-Stein decomposition “commutes” with noise operators.
Proposition 2.11 Let (Ω1 i ×Ω2i ,Pi) be correlated spaces and let Ti the Markov operator associated
with Ω1 i and Ω2i for 1 ≤ i ≤ n. Let
Ω 1 =
n∏
i=1
Ω 1 i , Ω2 =
n∏
i=1
Ω 2 i , P =
n∏
Pi,
i=1
T = ⊗ n i=1 Ti.
Suppose f ∈ L 2 (Ω 2 ) has Efron-Stein decomposition (19). Then the Efron-Stein decomposition of
Tf satisfies:
(Tf)S = T(fS).
12Proof: Clearly T(fS) is a function of xS only. Moreover, for all S ̸⊆ S ′ and x 0 S ′,y 0 S ′
it holds that
E[T(fS)(X)|XS ′ = x0S ′] = E[fS(Y )|XS′ = x0S ′] = E[E[fS(Y )|YS′]P[YS ′|XS ′ = x0S ′]] = 0,
concluding the proof. ✷
We next derive a useful bound showing that in the setting above if ρ(Ω 1 i × Ω2 i
then Tf depends on the “low degree expansion” of f.
;P) < 1 for all i
Proposition 2.12 Assume the setting of Proposition 2.11 and that further for all i it holds that
ρ(Ω 1 i ,Ω2 i ;Pi) ≤ ρi. Then for all f it holds that
‖T(fS)‖2 ≤
(
∏
i∈S
ρi
)
‖fS‖2.
Proof: Without loss of generality we may assume that S = [n]. For each 0 ≤ r ≤ n, let T (r) denote
the following operator. T (r) maps a function g of z = (z1,... ,zn) = (x1,... ,xr−1,yr,... ,yn) to a
function T (r) g of w = (w1,...,wn) = (x1,... ,xr,yr+1,... ,yn) defined as follows:
T (r) g(w) = E[g(Z)|W = w].
(Here Z = (X1,... ,Xr−1,Yr,...,Yn) and similarly W).
Let g be a function such that for any subset S � [n] and all zS,
We claim that
and that for all subsets S � [n] it holds that
E[g(Z)|ZS = zS] = 0.
‖T (r) g‖2 ≤ ρr‖g‖2 (20)
E[(T (r) g)(W)|WS = wS] = 0. (21)
Note that (20) and (21) together imply the desired bound as T = T (n) · · · T (1) .
So
For (20) note that if S = [n] \ {r} and f = T (r) g then by lemma 2.8
E[f 2 (W)|ZS = zS] = E[g(Z)f(W)|ZS = zS] ≤ ρr
which gives ‖f‖2 ≤ ρr‖g‖2 by integration.
For (21) we note that if S = [n] \ {t} then
E[f 2 (W)|ZS = zS] ≤ ρ 2 rE[g 2 (W)|ZS = zS],
√
E[f 2 (W)|ZS = zS]E[g 2 (Z)|ZS = zS].
E[f(W)|WS = wS] = E[g(Z)|WS = wS] = E[E[g(Z)|ZS]P[ZS|WS = wS]] = 0.
This concludes the proof. ✷
Proposition 2.13 Assume the setting of Proposition 2.11. Then
n∏
ρ(
i=1
Ω 1 i ,
n∏
i=1
Ω 2 i ;
n∏
i=1
Pi) = maxρ(Ω 1 i ,Ω2 i ).
13Proof: Let f ∈ L2 ( ∏n
i=1 Ω2i decomposition
) with E[f] = 0 and Var[f] = 1. Expand f according to its Efron-Stein
f = ∑
fS,
S̸=∅
(f ∅ = 0 since E[f] = 0). Then by propositions 2.11 and 2.12
E[(Tf) 2 ] = E[(T( ∑
fS)) 2 ] = E[( ∑
TfS) 2 ] = ∑
E[(Tfs) 2 ] ≤ ∑ ∏
≤
max
i
ρ2i
S
∑
S̸=∅
The other inequality is trivial. ✷
S
‖fS‖ 2 2 = max
i ρ2 i.
3 Background: Influences and Hyper-Contractivity
S
S̸=∅ i∈S
ρi‖fS‖ 2 2
In this section we recall and generalize some definitions and results from [17]. In particular, the
generalizations allows the study non-reversible noise operators and correlated ensembles. For the
reader that is familiar with [17] it suffices to look at subsections 3.3 and 3.5.
3.1 Influences and noise stability in product spaces
Let (Ω1,µ1),... ,(Ωn,µn) be probability spaces and let (Ω,µ) denote the product probability space.
Let
f = (f1,... ,fd) : Ω1 × · · · × Ωn → R d .
Definition 3.1 The influence of the ith coordinate on f is
Infi(f) = ∑
E [Var[fj]].
µ µi
3.2 Multi-linear Polynomials
1≤j≤d
In this sub-section we recall and slightly generalize the setup and notation used in [17]. Recall that
we are interested in functions on product of finite probability spaces, f : Ω1 × · · · × Ωn → R. For
each i, the space of all functions Ωi → R can be expressed as the span of a finite set of orthonormal
random variables, Xi,0 = 1,Xi,1,Xi,2,Xi,3,... ; then f can be written as a multilinear polynomial
in the Xi,j’s. In fact, it will be convenient for us to mostly disregard the Ωi’s and work directly with
sets of orthonormal random variables; in this case, we can even drop the restriction of finiteness.
We thus begin with the following definition:
Definition 3.2 We call a collection of finitely many orthonormal real random variables, one of
which is the constant 1, an orthonormal ensemble. We will write a typical sequence of n orthonormal
ensembles as X = (X 1,... ,Xn), where X i = {Xi,0 = 1,Xi,1,... ,Xi,mi }. We call a
sequence of orthonormal ensembles X independent if the ensembles are independent families of
random variables.
We will henceforth be concerned only with independent sequences of orthonormal ensembles,
and we will call these sequences of ensembles, for brevity.
14Remark 3.3 Given a sequence of independent random variables X1,... ,Xn with E[Xi] = 0 and
E[X 2 i ] = 1, we can view them as a sequence of ensembles X by renaming Xi = Xi,1 and setting
Xi,0 = 1 as required.
Definition 3.4 We denote by G the Gaussian sequence of ensembles, in which Gi = {Gi,0 =
1,Gi,1,Gi,2,...} and all Gi,j’s with j ≥ 1 are independent standard Gaussians.
As mentioned, we will be interested in multilinear polynomials over sequences of ensembles.
By this we mean sums of products of the random variables, where each product is obtained by
multiplying one random variable from each ensemble.
Definition 3.5 A multi-index σ is a sequence (σ1,... ,σn) in Nn . The degree of σ, denoted |σ|, is
|{i ∈ [n] : σi > 0}|. Given a doubly-indexed set of indeterminates {xi,j} i∈[n],j∈N, we write xσ for the
monomial ∏n
xi,σi i=1
. We now define a multilinear polynomial over such a set of indeterminates
to be any expression
Q(x) = ∑
cσxσ (22)
where the cσ’s are real constants, all but finitely many of which are zero. The degree of Q(x) is
max{|σ| : cσ ̸= 0}, at most n. We also use the notation
Q ≤d (x) = ∑
and the analogous Q =d (x) and Q >d (x).
σ
|σ|≤d
cσxσ
Naturally, we will consider applying multilinear polynomials Q to sequences of ensembles X;
the distribution of these random variables Q(X) is the subject of our invariance principle. Since
Q(X) can be thought of as a function on a product space Ω1×· · ·×Ωn as described at the beginning
of this section, there is a consistent way to define the notions of influences, Tρ, and noise stability
from Section 3.1. For example, the “influence of the ith ensemble on Q” is
Infi(Q(X)) = E[Var[Q(X) | X 1,... ,Xi−1,Xi+1,... ,Xn]].
Using independence and orthonormality, it is easy to show the following formulas, familiar from
harmonic analysis of boolean functions:
Proposition 3.6 Let X be a sequence of ensembles and Q a multilinear polynomial as in (22).
Then
E[Q(X)] = c0; E[Q(X) 2 ] = ∑
c 2 σ; Var[Q(X)] = ∑
c 2 σ;
and
σ
Infi(Q(X)) = ∑
σ:σi>0
For ρ ∈ [0,1] we define the operator Tρ as acting formally on multilinear polynomials Q(x) as
in (22) by
(TρQ)(x) = ∑
ρ |σ| cσxσ.
We finally recall the notion of “low-degree influences”, a notion that has proven crucial in the
analysis of PCPs in hardness of approximation in computer science (see, e.g., [14]).
σ
c 2 σ;
|σ|>0
15Definition 3.7 The d-low-degree influence of the ith ensemble on Q(X) is
Inf ≤d
i
(Q(X)) = Inf ≤d
i
(Q) = ∑
c
σ:|σ|≤d,σi>0
2 σ .
Note that this gives a way to define low-degree influences Inf ≤d
i
(f) for functions f : Ω1 ×· · ·Ωn → R
on finite product spaces.
There isn’t an especially natural interpretation of Inf ≤d
i
(f). However, the notion is important for
PCPs due to the fact that a function with variance 1 cannot have too many coordinates with
substantial low-degree influence; this is reflected in the following easy proposition:
Proposition 3.8 Suppose Q is multilinear polynomial as in (22). Then
∑
i
Inf ≤d
i
(Q) ≤ d · Var[Q].
3.3 Vector valued multi-linear polynomials
For the invariance principle discussed here we will need to consider vector-valued multi-linear
polynomials.
Definition 3.9 A k-dimensional multilinear polynomial over a set of indeterminates is given by
Q = (Q1,... ,Qk) (23)
where each Qj is a multi-linear polynomial as in (22). The degree of Q is the maximal degree of
the Qj’s.
Definition 3.10 We adopt the standard notation and write ‖Q‖q for E1/q [ ∑k
j=1 |Qi| q ]; we write
Var[Q] for E[‖Q − E[Q]‖2 2 ] and
Infi(Q(X)) =
Using these definitions, it is easy to see that
k∑
Infi(Qj(X))
j=1
Proposition 3.11 Let X be a sequence of ensembles and Q = (Q1,...,Qk) a k dimensional
multilinear polynomial where Qj is defined as in 22 with c j σ as its coefficients. Then:
E[Q(X)] = (c 1 0 ,...,cd 0 );
‖Q(X)‖2 ∑
2 =
j,σ
c j2
σ ;
∑
Var[Q(X)] =
j,|σ j |>0
Finally, we recall the standard multi-index notation associated with k-dimensional multi-linear
polynomials. A multi-index i of dimension k is a vector (i1,...,id), where each ij is an integer.
We write |i| for i1 + · · · + id. Given a function ψ of d variables, we will write ψ (i) for the partial
derivative of f taken i1 times with respect to the first variable, i2 with respect to the second etc.
(we will only consider functions ψ that are smooth enough that the order of derivatives does not
matter). We will also write Q i for the product Q i1
1
· · · Qid
d .
c j2
σ .
163.4 Hypercontractivity
As in [17] the invariance principle requires that the ensembles involved are (2,q,η)-hypercontractive
with some η ∈ (0,1) if and only if E[Y ] = 0 and E[|Y | q ] < ∞. Also, if Y is (2,q,η)-hypercontractive
then η ≤ (q − 1) −1/2 .
Definition 3.12 Let X be a sequence of ensembles. For 1 ≤ p ≤ q < ∞ and 0 < η < 1 we say
that X is (p,q,η)-hypercontractive if
for every multilinear polynomial Q over X.
Since Tη is a contractive semi-group, we have
‖(TηQ)(X)‖q ≤ ‖Q(X)‖p
Remark 3.13 If X is (p,q,η)-hypercontractive then it is (p,q,η ′ )-hypercontractive for any 0 <
η ′ ≤ η.
There is a related notion of hypercontractivity for sets of random variables which considers all
polynomials in the variables, not just multilinear polynomials; see, e.g., Janson [9]. We summarize
some of the basic properties below, see [17] for details.
Proposition 3.14 Suppose X is a sequence of n1 ensembles and Y is an independent sequence of
n2 ensembles. Assume both are (p,q,η)-hypercontractive. Then the sequence of ensembles X ∪Y =
(X 1,... ,Xn1 ,Y1,... ,Yn2 ) is also (p,q,η)-hypercontractive.
Proposition 3.15 Let X be a (2,q,η)-hypercontractive sequence of ensembles and Q a multilinear
polynomial over X of degree d. Then
‖Q(X)‖q ≤ η −d ‖Q(X)‖2.
We end this section by recording the optimal hypercontractivity constants for the ensembles we
consider. The result for ±1 Rademacher variables is well known and due originally to Bonami [4]
and independently Beckner [2]; the same result for Gaussian and uniform random variables is also
well known and in fact follows easily from the Rademacher case. The optimal hypercontractivity
constants for general finite spaces was recently determined by Wolff [24] (see also [20]):
Theorem 3.16 Let X denote either a uniformly random ±1 bit, a standard one-dimensional Gaussian,
or a random variable uniform on [− √ 3, √ 3]. Then X is (2,q,(q − 1) −1/2 )-hypercontractive.
Theorem 3.17 (Wolff) Let X be any mean-zero random variable on a finite probability space
in which the minimum nonzero probability of any atom is α ≤ 1/2. Then X is (2,q,ηq(α))hypercontractive,
where
(
A
ηq(α) =
1/q′
− A−1/q′ ) −1/2
Note the following special case:
with
A =
A 1/q − A −1/q
1 − α
α , 1/q + 1/q′ = 1.
17Proposition 3.18
η3(α) =
(
A 1/3 + A −1/3) −1/2
α→0
∼ α
1/6
,
and also
for all α ∈ [0,1/2].
1
2 α1/6 ≤ η3(α) ≤ 2 −1/2 ,
Proposition 3.19 Let X be a mean-zero random variable satisfying E[|X| q ] < ∞. Then X is
(2,q,ηq)-hypercontractive with ηq =
‖X‖2
2 √ q−1‖X‖q .
In particular, when E[X] = 0, E[X 2 ] = 1, and E[|X| 3 ] ≤ β, we have that X is (2,3,2 −3/2 β −1/3 )-
hypercontractive.
3.5 Vector Hyper-Contraction
For our purposes we will also need to obtain hyper-contraction results in cases where Q is a k-
dimensional multi-linear polynomial. We will need to consider vector-valued multi-linear polynomials.
Proposition 3.20 Let X be a (2,q,η)-hypercontractive sequence of ensembles and Q a multilinear
polynomial over X of degree d and dimension k. Assume q is integer and let i be a multi-index
with |i| = q. Then
k∏
Proof:
E[|Q i (X)|] ≤
E[|Q i (X)|] ≤ η −dq
k∏
j=1
j=1
E[|Qj| q ] ij/q
≤ η
−dq
E[Q 2 j ]ij/2 ,
k∏
j=1
E[Q 2 j ]ij/2 ,
where the first inequality is Hölder and the second follows by hyper-contractivity. ✷
4 Multi-dimensional Invariance principle
In this section we generalize the invariance principle from [17] to the multi-dimensional setting. We
omit some easy steps that are either identical or easy adaptation of the proofs of [17].
4.1 Hypotheses for invariance theorems
Below we will prove a generalization of the invariance principle [17]. The invariance principle proven
there concerns a multilinear polynomial Q over two hypercontractive sequences of ensembles, X
and Y; furthermore, X and Y are assumed to satisfy a “matching moments” condition, described
below.
It is possible to generalize the invariance principle to vector valued multi-linear polynomials
under each of the hyper-contractivity assumptions H1,H2,H3 and H4 of [17]. However, since
the proof of all generalizations is essentially the same and since for the applications studied here
it suffices to consider the hypothesis H3, this is the only hypothesis that will be discussed in the
paper. It is defined as follows:
18H3 Let X be a sequence of n ensembles in which the random variables in each ensemble X i form
a basis for the real-valued functions on some finite probability space Ωi. Further assume that
the least nonzero probability of any atom in any Ωi is α ≤ 1/2, and let η = 1
2α1/6 . Let Y be
any (2,3,η)-hypercontractive sequence of ensembles such that all the random variables in Y
are independent. Finally, let Q be a k dimensional multilinear polynomial as in (23).
4.2 Functional Setting
The essence of our invariance principle is that if Q is of bounded degree and has low influences then
the random variables Q(X) and Q(Y) are close in distribution. The simplest way to formulate this
conclusion is to say that if Ψ : R k → R is a sufficiently nice “test function” then Ψ(Q(X)) and
Ψ(Q(Y)) are close in expectation.
Theorem 4.1 Assume hypothesis H3. Further assume that Q is a k dimensional multi-linear
polynomial, that Var[Q] ≤ 1, deg(Q) ≤ d, and Infi(Q) ≤ τ for all i. Let Ψ : R k → R be a C 3
function with |Ψ (i) | ≤ B uniformly for every vector i with |i| = 3. Then
∣
∣E [ Ψ(Q(X)) ] − E [ Ψ(Q(Y)) ]∣ ∣ ∣ ≤ ǫ := 2dB k
3
(8α
−1/2
)
d
τ
1/2
.
Proof: Note that by Proposition 3.18, the random variables satisfy hyper-contractivity with η =
1
2 α1/6 .
We begin by defining intermediate sequences between X and Y. For i = 0,1,... ,n, let Z (i)
denote the sequence of n ensembles (Y1,... ,Yi,Xi+1,...,Xn) and let Q (i) = Q(Z (i) ). Our goal
will be to show ∣
∣∣E [
Ψ(Q
(i−1)
)
]
− E
[
Ψ(Q
(i)
)
] ∣ ∣ ∣ ≤ 2Bk
3
η
−3d
Infi(Q) 3/2
(24)
for each i ∈ [n]. Summing this over i will complete the proof since Z (0) = X, Z (n) = Y, and
n∑
n∑
Infi(Q) = τ 1/2 n∑
·
i=1
Infi(Q) 3/2 ≤ τ 1/2 ·
i=1
i=1
Inf ≤d
i
(Q) ≤ dτ 1/2 ,
where we used Proposition 3.8 and ∑
j Var[Qj] ≤ 1.
Let us fix a particular i ∈ [n] and proceed to prove (24). Given a multi-index σ, write σ \ i for
the same multi-index except with σi = 0. Now write
˜Q =
R =
S =
∑
σ:σi=0
∑
σ:σi>0
∑
σ:σi>0
cσZ (i)
σ ,
cσXi,σi
cσYi,σi
· Z(i)
σ\i ,
· Z(i)
σ\i .
Note that ˜ Q and the variables Z (i)
σ\i are independent of the variables in X i and Yi and that
Q (i−1) = ˜ Q + R and Q (i) = ˜ Q + S.
To bound the left side of (24) — i.e., |E[Ψ( ˜ Q + R) − Ψ( ˜ Q + S)]| — we use Taylor’s theorem:
for all x,y ∈ R,
∣
∣Ψ(x + y) − ∑ Ψk (x) yk ∣ ≤
k!
∑ B
k! |y|k.
|k|<3
19
|k|=3In particular,
and similarly,
∣
∣E[Ψ( ˜ Q + R)] − ∑
|k|<3
∣
∣E[Ψ( ˜ Q + S)] − ∑
|k|<3
[
Ψ
(k)( Q) ˜ Rk]∣ ∣∣ ∑
E
≤
k!
|k|=3
[
Ψ
(k)( Q) ˜ Sk]∣ ∣∣
∑
E
≤
k!
|k|=3
B
k! E[ |R| k]
B
k! E[ |S| k]
We will see below that R and S have finite 3’rd moments. Moreover, for 0 ≤ k ≤ r with |r| = 3
it holds that |Ψ (k) ( ˜ Q)R k | ≤ |k!B ˜ Q r−k R k | (and similarly for S). Thus all moments above are
finite. We now claim that for all 0 ≤ |k| < 3 it holds that
(25)
(26)
E[Ψ (k) ( ˜ Q)R k ] = E[Ψ (k) ( ˜ Q)S k ]. (27)
This follows from the fact that the expressions in the expected values when viewed as multi-linear
polynomials in the variables in X i and Yi respectively are of degree ≤ 2 and each monomial term
in X i has the same coefficient as the corresponding monomial in Yi.
From (25), (26) and (27) it follows that
|E[Ψ( ˜ Q + R) − Ψ( ˜ Q + S)]| ≤ ∑
r=3
B
r! (E[|R|r ] + E[|S| r ]). (28)
We now use hypercontractivity. By Proposition 3.14 each Z (i) is (2,r,η)-hypercontractive. Thus
by Proposition 3.20,
E[|R| r ] ≤ η −3d
k∏
j=1
E[R 2 j ]rj/2 ,
E[|S|
r
] ≤ η
−3d
k∏
j=1
E[S 2 j ]rj/2 . (29)
However,
E[S 2 j ] = E[R2j ] =
∑
Combining (28), (29) and (30) it follows that
confirming (24) and completing the proof. ✷
σ j :σ j
i >0
c 2 σ j = Infi(Qj) ≤ Infi(Q). (30)
|E[Ψ( ˜ Q + R) − Ψ( ˜ Q + S)]| ≤ 2Bk 3 η −3d · Infi(Q) 3/2
4.3 Invariance principle — other functionals, and smoothed version
The basic invariance principle shows that E[Ψ(Q(X))] and E[Ψ(Q(Y))] are close if Ψ is a C 3
functional with bounded 3rd derivative. To show that the distributions of Q(X) and Q(Y) are close
in other senses we need the invariance principle for less smooth functionals. This we can obtain
using straightforward approximation arguments, see for example [17]. For applications involving
bounded functions, it will be important to bound the following functionals. We let f [0,1] : R → R
be defined by
f [0,1](x) = max(min(x,1),0) = min(max(x,0),1),
and ζ : R k → R be defined by
ζ(x) =
k∑
i=1
(xi − f [0,1](xi)) 2 . (31)
20Similarly, we define
χ(x) =
Repeating the proofs of [17] one obtains:
k∏
f [0,1](xi). (32)
i=1
Theorem 4.2 Assume hypothesis H3. Further assume that Q = (Q1,... ,Qd) is a d-dimensional
≤ log(1/τ)/K
multi-linear polynomial with Var[Q] ≤ 1 and Inf (Q) ≤ τ for all i, where
i
K = log(1/α).
Suppose further that for all d it holds that Var[Q >d ] ≤ (1−γ) 2d where 0 < γ < 1. Write R = Q(X)
and S = Q(Y). Then ∣
∣∣E [
ζ(R)
]
− E
[
ζ(S)
] ∣ ∣ ∣ ≤ τ
Ω(γ/K)
,
where the Ω(·) hides a constant depending only on k. Similarly,
∣
∣E [ χ(R) ] − E [ χ(S) ]∣∣ Ω(γ/K) ≤ τ .
Proof: The proof for ζ uses the fact that the function ζ admits approximations ζλ such that
•
•
for all r with |r| = 3.
‖ζ − ζλ‖∞ = O(λ 2 ),
‖ζ r λ ‖∞ = O(λ −1 ),
This implies that for all k dimensional degree d polynomials we have:
∣
∣E [ ζ(R) ] − E [ ζ(S) ]∣∣ −d/3 −1/3 ≤ O(α τ ). (33)
See [17] for details. In order to obtain the result for polynomials with decaying tails we use the
fact that
k∑
|ζ(a1 + b1,...,ak + bk) − ζ(a1,... ,ak)| ≤ (|aibi| + b 2 i ).
This implies that truncating the polynomials at level d results in an error of at most exp(−dγ)
which together with the bound (33)implies the desired bound for ζ.
The proof for χ is similar as the function ζ admits approximations χλ such that
i=1
•
•
for all r with |r| = 3.
‖χ − χλ‖∞ = O(λ),
‖χ r λ ‖∞ = O(λ −2 ),
✷
Of particular interest to us is the following corollary.
21Corollary 4.3 Assume hypothesis H3. For each 1 ≤ i ≤ n and 1 ≤ j ≤ k, let X j
i denote
an ensemble all of whose elements are linear combinations of elements in X i. Similarly for each
1 ≤ i ≤ n and 1 ≤ j ≤ k, let Y j
denote an ensemble all of whose elements are linear combinations
of elements in Yi.
i
Assume further that for all i and j it holds that |Y j
i
correspondence between Y ∈ Y j
i
and X ∈ X j
i
given by:
given by:
|Yi|
∑
Y = a(i,j,k)Yk,
k=1
|Xi|
∑
X = a(i,j,k)Xk.
k=1
j
| = |X
i
| and that there is one to one
Let Q = (Q1,... ,Qk) a be multi-linear polynomial with Var[Q] ≤ 1 and Inf
for all i where
K = log(1/α).
≤ log(1/τ)/K
i
(Q) ≤ τ
Suppose further that for all d it holds that Var[Q >d
j
] ≤ (1 − γ) 2d where 0 < γ < 1. Write
R = (Q1(X 1 ),... ,Qk(X k )) and S = (Q1(Y1),... ,Qk(Y k )). Then
∣
∣E [ ζ(R) ] − E [ ζ(S) ]∣∣ Ω(γ/K) ≤ τ ,
and ∣
∣∣E [
χ(R)
]
− E
[
χ(S)
] ∣ ∣ ∣ ≤ τ
Ω(γ/K)
,
where the Ω(·) hides a constant depending only on k.
Proof: The proof follows immediately from the previous theorem noting that Infi and Var[Q >d ]
are basis independent. ✷
5 Noise in Gaussian Space
In this section we derive the Gaussian bounds correlation bounds needed for our applications. The
first bound derived in subsection 5.1 is an easy extension of [5]. The second one give a quantitative
estimate one iterations of the first bound that will be needed for some of the applications.
5.1 Noise stability in Gaussian space
We begin by recalling some definitions and results relevant for “Gaussian noise stability”. Throughout
this section we consider R n to have the standard n-dimensional Gaussian distribution, and our
probabilities and expectations are over this distribution. Recall Definition 1.10. We denote by Uρ
the Ornstein-Uhlenbeck operator acting on L 2 (R n ) by
(Uρf)(x) = E
y
[f(ρx + √ 1 − ρ 2 y)],
where y is a random standard n-dimensional Gaussian.
22It is immediate to see that if f,g ∈ L 2 (R n ) then
E[fUρg] = E[gUρf] = E[f(X1,...,Xn)g(Y1,...,Yn)]
where (Xi,Yi) are independent two dimensional Gaussian vectors with covariance matrix
The results of Borell [5] imply the following (see [17] for more details):
( )
1 ρ
ρ 1
Theorem 5.1 Let f,g : R n → [0,1] be two measurable functions on Gaussian space with E[f] = µ
and E[g] = ν. Then for all 0 ≤ ρ ≤ 1 we have
We will need is the following corollary.
E[fUρg] ≤ Γρ(µ,ν).
Corollary 5.2 Let X1,...,Xn,Y1,...,Ym be jointly Gaussian such that X1,... ,Xn are independent,
Y1,... ,Ym are independent and
|Cov[ ∑
αiXi, ∑
βiYi]| ≤ ρ.
sup
‖α‖2=‖β‖2=1
Let f1 : R n → [0,1],f2 : R m → [0,1] with E[fj] = µj. Then
i
Γ ρ(µ1,µ2) ≤ E[f1f2] ≤ Γρ(µ1,µ2).
Proof:
Note that if m > n then we may define Xn+1,...,Xm to be Gaussian and independent of all
the other variables. This implies that without loss of generality we may assume that m = n.
We may diagonalize the covariance matrix between X and Y and therefore assume that
Cov[Xi,Yj] = 0,
for i ̸= j and |Cov[Xi,Yi]| ≤ ρ for all i.
Write Cov[X,Y ] = ρJ, where J is diagonal with all entries in [−1,1]. Then clearly
E[f1(X1,... ,Xn)f2(Y1,...,Yn)] = E[f1Uρ(UJf2)],
where Uρ is the Ornstein-Uhlenbeck operator and UJ is the operator defined by
√
1 − J2 1,1 y1
√
, ...,jnxn + 1 − J2 n,n yn)],
(UJf)(x1,... ,xn) = E
y
[f(j1x1 +
where y is distributed according to the Gaussian measure. Since UJ is a Markov operator, we have
E[UJf2] = E[f2] and 0 ≤ UJf2 ≤ 1. Now applying Borell’s result we obtain the desired conclusion.
✷
We note that in general there is no closed form for Γρ(µ,ν); however, some asymptotics are
known: For balanced functions we have Sheppard’s formula Γρ(1/2,1/2) = 1
4
we record a fact to be used later.
Proposition 5.3 Let ((Xi,j,Yi,j)j) n i=1
be jointly Gaussian, each distributed N(0,1). Suppose further
that for each i:
sup
‖α‖2=‖β‖2=1
and that the n collections ((Xi,j,Yi,j)j) n i=1
sup
‖α‖2=‖β‖2=1
i
+
1
2π
arcsin ρ. Finally
|Cov[ ∑
j
αjXi,j
∑
βjYi,j]| ≤ ρ,
are independent. Then we have:
|Cov[ ∑ ∑
βi,jYi,j]| ≤ ρ,
i,j
αi,jXi,j
i
i,j
23Proof: Using the fact that a linear combination of Gaussians is a Gaussian it suffices to show that
if (Xi,Yi) are independent Gaussian vectors, each satisfying ρ(Xi,Yi) ≤ ρ,Xi,Yi ∼ N(0,1) and
‖α‖2 = ‖β‖2 = 1 then
|Cov[ ∑ ∑
βiYi]| ≤ ρ.
i
αiXi
This follows immediately from Cauchy-Schwartz:
|Cov[ ∑ ∑
βiYi]| = | ∑
αiβiCov[Xi,Yi]| ≤ ρ ∑
|αiβi| ≤ ρ.
✷
i
αiXi
5.2 Asymptotics of Γ()
i
i
In some of the applications below we will need to estimate Γρ1,...,ρk−1 (µ1,...,µk). In particular, we
will need the following estimate
Lemma 5.4 Let 0 < µ < 1 and 0 < ρ < 1. µ1 = ... = µk = µ and ρ1 = ... = ρk−1 = ρ. Define
Bk(ρ,µ) = Γρ1=ρ,...,ρk−1=ρ(µ1 = µ,...,µk = µ).
i
i
Then as k → ∞ we have
Bk(ρ,µ) ≤ k (ρ2 −1)/ρ 2 +o(1)
. (34)
Proof: Clearly, we have
Bi+1(ρ,µ) = Γρ(µ,Bi). (35)
The proof proceeds by deriving bounds on recursion (35). This is a straight forward (but not very
elegant) calculation with Gaussians. The main two steps in verifying (34) are to show that
• The sequence Bj converges to 0 as j → ∞. This follows from the fact that the functions
B → Γρ(µ,B) is easily seen to be strictly decreasing and has no fixed points other than
B = 0 and B = 1 when 0 < ρ < 1.
• Using Gaussian estimates sketched below, we see that for Bj−1 sufficiently small it holds that
This corresponds to Bj of the form
Bj−1 − Bj ≥ 1
2
ρ2
1−ρ2 +o(1)
B1+
j−1
. (36)
Bj ≤ Cj −1−ρ2
ρ 2 +o(1) .
More formally, it is easy to see that if Bj−1 is sufficiently small and satisfies
Bj−1 ≤ C(j − 1) −α
and
for α > 0 then
Bj ≤ Bj−1(1 − 1
2 B1/α
j−1 )
Bj−1 ≤ Cj −α .
24This follows using the fact that for small values of δ the maximum of the function x(1−x1/α /2)
in the interval [0,δ] is obtained at δ and therefore:
Bj ≤ Bj−1(1 − 1
2 B1/α
(
j−1
) ≤ C(j − 1)−α 1 − 1
2 (C(j − 1)−α ) 1/α
)
= C(j − 1) −α
(
1 − 1
2 C1/α (j − 1) −1
)
.
In order that
we need that
which holds for large enough value of C.
1
Bj ≤ Cj −α ,
1 − 1
2C1/α ≥
(j − 1) −1
( ) α
j
j − 1
In order to obtain (36) for small values of Bj−1, one uses the lemma stated below together with
the approximation
Φ(−t) = exp(−(1 + ot(1)) t2
2 ).
which implies that for every fixed ǫ > 0:
✷
ρ(t + ǫ)
Φ(−t)Φ(−√ 1 − ρ2 ) = Φ(−t)(1+ot(1))(1+
ρ2
1−ρ 2 ) .
Lemma 5.5 Let X,Y be a bi-variate normal with Var[X] = Var[Y ] = 1 and Cov[X,Y ] = ρ.
Then for all ǫ > 0 and t > 0 it holds that
Φ(−t) − Γρ(1/2,Φ(−t)) = P[X ≤ −t] − P[X ≤ −t,Y ≥ 0]
≥ (1 − exp(−tǫ − ǫ 2 ρ(t + ǫ)
/2))Φ(−t)Φ(−√ 1 − ρ2 )
Proof: The equality follows from the definitions. For the inequality, we first note that
and therefore
P[X ≤ −t − ǫ]
P[X ≤ −t] ≤ exp(−tǫ − ǫ2 /2),
P[X ≥ −t − ǫ|X ≤ −t] ≥ (1 − exp(−tǫ − ǫ 2 /2)).
Furthermore, writing Z for a N(0,1) variable that is independent of X we obtain
P[Y ≥ 0|X ≥ −t − ǫ] ≥ P[Y ≥ 0|X = −t − ǫ] = P[ρ(−t − ǫ) + √ 1 − ρ 2 Z ≥ 0]
= P[Z ≥
ρ(t + ǫ) ρ(t + ǫ)
√ ] = Φ(−√ 1 − ρ2 1 − ρ2 ),
as needed. ✷
256 Gaussian Bounds on Non-reversible Noise forms
In this section we prove the main results of the paper: Theorem 1.12 and its relaxations proposition
1.13 and 1.14. As in previous work [16, 17, 6], the proof idea is to use an invariance principle,
in this case Theorem 4.2, together with the Gaussian bounds of Section 5.
Since the invariance principle requires working either with low degree polynomials, or polynomials
that have exponentially decaying weight, an important step of the proof is the reduction to
this case. This reduction is proved in subsection 6.1. It is based on the fact that ρ < 1 and on the
properties of correlated spaces and Efron-Stein decompositions derived in Section 2.
The reduction, Theorem 4.2 and truncation arguments allow to prove Theorem 1.12 for k = 2
in subsection 6.2 and for k > 2 in subsection 6.3.
The relaxed conditions on the influences for k = 2 and for r-wise independent distributions are
derived in subsection 6.4 using a “two-threshold” technique. A related technique has been used
before in [7, 6]. However, the variant presented here is more elegant, gives more explicit dependency
on the influences and allows to exploit s-wise independence.
Finally, using a “weak Szemeredi type” type argument we derive in subsection 6.5 Proposition
1.13. We don’t know of any previous application of this idea in the context of the study of
influences.
6.1 Noise forms are determined by low degree expansion
In order to use the invariance principle, it is crucial to apply it to multi-linear polynomials that
are either of low degree or well approximated by their low degree part. Here we show that noise
stability quantities do not change by much if one replaces a function by a slight smoothing of it.
Lemma 6.1 Let Ω1,... ,Ωn,Λ1,... ,Λn be a collection of finite probability spaces. Let Let X,Y be
two ensembles such the collections of random variables such that (X i,Yi) are independent and X i
is a basis for the functions in Ωi, Yi is a basis for the functions in Λi. Suppose further that for all
i it holds that ρ(Ωi,Λi) ≤ ρ.
Let P and Q be two multi-linear polynomials. Let γ be chosen sufficiently close to 0 so that
Then:
In particular, it suffices to take
log ρ/ log ǫ+log ρ)
γ ≤ 1 − (1 − ǫ)
|E[P(X)Q(Y)] − E[T1−γ(X)T1−γQ(Y)]| ≤ 2ǫVar[P]Var[Q].
(
γ = Θ
ǫ
(1 − ρ)log(1/ǫ)
Proof: Without loss of generality if suffices to assume that Var[P] = Var[Q] = 1 and show that
|E[P(X)Q(Y)] − E[P(X)(T1−γQ)(Y)]| = |E[P(X)((I − T1−γ)Q) (Y)]| ≤ ǫ.
Let T be the noise operator defined by Tg(x) = E[g(Y )|X = x], where (X,Y ) are distributed
according to (X,Y). In order to prove the lemma it suffices to show that
)
.
|E[P(X)(T(I − T1−γ)Q) (X)]| ≤ ǫ. (37)
26Write P and Q in terms of their Efron-Stein decomposition, that is,
P = ∑
PS, Q = ∑
PQ.
S⊂[n]
S⊂[n]
It is easy to see that
(I − T1−γ)QS = (1 − (1 − γ) |S| )QS,
and propositions 2.11 and 2.12 imply that
‖TQS‖2 ≤ ρ |S| ‖QS‖2,
and that TQS is orthogonal to PS ′
for S′ ̸= S. Writing T ′ = T(I − T1−γ) we conclude that
‖T ′ QS‖2 ≤ min(ρ |S| ,1 − (1 − γ) |S| )‖QS‖2 ≤ ǫ‖QS‖2,
and that T ′ QS is orthogonal to P ′ S when S′ ̸= S.
By Cauchy-Schwartz we get that
|E[P(X)(T ′ Q)(Y)]| = | ∑
E[PST ′ QS]| ≤ √ √∑ Var[P]
as needed.
✷
Similarly we have:
S̸=∅
S̸=∅
‖T ′ QS‖ 2 2 ≤ ǫ√ Var[P] √ Var[Q],
Lemma 6.2 Let (Ω j
1 ,...,Ωj n) k j
j=1 be k collections of finite probability spaces. Let Let ((X
i )ni=1 : j =
are independent
1,... ,k) be k ensembles of collections of random variables such that ((X j
i )k j=1 )n i=1
and X j
i
is a basis for the functions in Ωj
i
. Suppose further that for all i it holds that ρ(Ωj
i
k) ≤ ρ.
Let P1,...,Pk be k multi-linear polynomials. Let γ be chosen sufficiently close to 0 so that
log ρ/(log ǫ+log ρ)
γ ≤ 1 − (1 − ǫ)
: 1 ≤ j ≤
Then:
k∏
|E[ Pj(X j k∏
)] − E[ T1−γPj(X j )]| ≤ ǫ
j=1
j=1
k∑
j=1
√ √
Var[Pj] Var[ ∏
j ′ <j
T1−γPj
∏
Pj].
j ′ >j
In particular, it suffices to take
(
γ = Θ
ǫ
(1 − ρ)log(1/ǫ)
)
.
Proof: The proof follows the proof of the previous lemma. ✷
276.2 Bilinear Stability Bounds
In this section we prove the bilinear stability bound. We repeat the statement of Theorem 1.12
with more explicit dependency on the influences.
Theorem 6.3 Let (Ω 1 i ×Ω2 i ,Pi) be a sequence of correlated spaces such that for each i the minimum
) ≤ ρ for all i. Let
→ [0,1]. Write K = log(1/α). For every ǫ > 0 there exists a
Pi probability of any atom in is at least α ≤ 1/2 and such that ρ(Ω1 i ,Ω2i f : ∏n
i=1 Ω1i → [0,1] and g : ∏n
i=1 Ω2i τ = τ(ǫ) < 1/2 such that if
max(Inf
≤log(1/τ) log(1/α) ≤log(1/τ) log(1/α)
i
(f),Inf
i
(g)) ≤ τ (38)
for all i (See Definition 3.7 for the definition of low-degree influence) then
Γ ρ(E[f],E[g]) − ǫ ≤ E[fg] ≤ Γρ(E[f],E[g]) + ǫ. (39)
One may take
log(1/ǫ)
O(log(1/α) )
τ = ǫ
ǫ(1−ρ)
(40)
Proof: Write µ = E[f] and ν = E[g]. As discussed in Section 3.2, let ˜ X be the sequence of
ensembles such that ˜ X i spans the functions on Ωi = Ω1 i × Ω2i , X i spans functions on Ω1 i and Yi
spans the functions on Ω2 i . We now express f and g as multilinear polynomials P and Q of X and
Y. Let γ > 0 be chosen so that
|E[P(X)Q(Y)] − E[T1−γ(X)T1−γQ(Y)]| ≤ ǫ/2.
Note that by Lemma 6.1 it follows that we may take γ = Θ(ǫ(1 − ρ)/log(1/ǫ)). Thus it suffices to
prove the bound stated in the theorem for T1−γP(X) and T1−γQ(Y).
We use the invariance principle under hypothesis H3. Let G and H be two Gaussian ensembles
such that for all i the covariance matrix of Hi and Gi is identical to the covariance matrix of X i
and Yi and such that (Gi,Hi) are independent. Clearly:
E[T1−γP(X)T1−γQ(Y)] = E[T1−γP(G)T1−γQ(H)].
Since (P(X),Q(Y)) take values in [0,1] 2 the same is true for
( ˜P, ˜Q) = ((T1−γP)(X),T1−γQ(Y)).
In other words, E[ζ( ˜ P, ˜ Q)] = 0, where ζ is the function in (31). Writing
( ¯ P, ¯ Q) = ((T1−γP)(G),T1−γQ(H)),
we conclude from Theorem 4.2 that E[ζ( ¯ P, ¯ Q))] ≤ τ Ω(γ/K) . That is, ‖( ¯ P, ¯ Q)−(P ′ ,Q ′ )‖ 2 2 ≤ τΩ(γ/K) ,
where P ′′ is the function of ¯ P defined by
P ′ ⎧
⎨
=
⎩
0 if ¯ P ≤ 0,
¯P if ¯ P ∈ [0,1],
1 if ¯ P ≥ 1.
and Q ′ is defined similarly. Now using Cauchy-Schwartz it is easy to see that
|E[ ¯ P ¯ Q] − E[P ′ Q ′ ]| ≤ τ Ω(γ/K) .
,
28Write µ ′ = E[P ′ ] and ν ′ = E[Q ′ ]. Then using the Gaussian Corollary 5.1 we obtain that
E[P ′ Q ′ ] ≤ Γρ(µ ′ ,ν ′ ).
From Cauchy-Schwartz it follows that |µ − µ ′ | ≤ τ Ω(γ/K) and similarly for ν,ν ′ . It is immediate
to check that
|Γρ(µ,ν) − Γρ ′(µ′ ,ν ′ )| ≤ |µ − µ ′ | + |ν − ν ′ | ≤ τ Ω(γ/K) .
Thus we have
Taking τ as in (40) yields
E[PQ] ≤ Γρ(µ,ν) + τ Ω(γ/K) + ǫ/2.
τ Ω(γ/K) ≤ ǫ/2,
and thus we obtain the upper bound in (39). The proof of the lower bound is identical. ✷
The following proposition completes the proof of (12).
Proposition 6.4 Let ( ∏k
j=1 Ωj
i ,Pi),1 ≤ i ≤ n be a sequence of correlated spaces such that for
each i the minimum Pi probability of any atom in Ωi is at least α ≤ 1/2 and such that such that
ρ(Ω1 i ,... ,Ωki ) ≤ ρ < 1 for all i. For 1 ≤ j ≤ k, let fj : ∏n
i=1 Ωj
i
→ [0,1].
Write K = log(1/α). For every ǫ > 0 there exists a τ = τ(ǫ) < 1/2 such that if
max(Inf ≤log(1/τ)/K
i
(fj)) ≤ τ
for all i and j (See Definition 3.7 for the definition of low-degree influence) then it holds that
One may take
Γ ρ(E[f1],... ,E[fk]) − ǫ ≤ E[
k∏
fi] ≤ Γρ(E[f1],... ,E[fk]) + ǫ
i=1
K log(1/ǫ)
O( )
τ = ǫ
ǫ(1−ρ)
(41)
The proof uses the following (known) lemma.
Lemma 6.5 Let f1,...,fk : Ω n → [0,1]. Then for all j:
Infj(
k∏
fi) ≤ k
i=1
Proof: The proof is based on the fact that
Var[
k∏
i=1
where x and y are independent. Now
E[(
k∏
fi(x) −
i=1
i=1
fi] = 1
2 E[(
k∑
Infj(fi).
i=1
k∏
fi(x) −
i=1
k∏
fi(y)) 2 k∑
] ≤ E[( |fi(x) − fi(y)|) 2 ] ≤ k
i=1
k∏
fi(y)) 2 ],
i=1
k∑
E[(fi(x) − fi(y)) 2 ] = 2k
i=1
k∑
Ii(f)
which gives the desired result. ✷
Proof:[Of Proposition 6.4] The proof follow by applying Theorem 6.3 iteratively for the functions
f1,f1f2,f1f2f3,... and using the previous lemma and the fact that Γρ() and Γ ρ () have Lipchitz
constant 1 in each coordinate. ✷
29
i=16.3 Multi-linear bounds
Next we prove (14).
Theorem 6.6 Let ( ∏k
j=1 Ωij ,Pi),1 ≤ i ≤ n be a sequence of correlated spaces such that for each
i the minimum Pi probability of any atom in Ωi is at least α ≤ 1/2 and such that such that
ρ(Ω1 i ,... ,Ωki ) ≤ ρ < 1 for all i. Suppose furthermore that for all i and j ̸= j′ , it holds that
ρ(Ω j
i ,Ωj′
i ) = 0, i.e., (∏ k
j=1 Ωj
i ,Pi) is pairwise independent. For 1 ≤ j ≤ k, let fj : ∏n
i=1 Ωj
i
→ [0,1].
Write K = log(1/α). For every ǫ > 0 there exists a τ = τ(ǫ) < 1/2 such that if
max(Inf ≤log(1/τ)/K
i
(fj)) ≤ τ (42)
then
One may take
|E[
k∏
fi] −
i=1
k∏
E[fi]| ≤ ǫ. (43)
i=1
K log(1/ǫ)
O( )
τ = ǫ
ǫ(1−ρ)
(44)
Proof: Note that for all i and all γ we have that the functions fi and T1−γfi are [0,1] valued
functions. Therefore, as in the previous proof we obtain by Lemma 6.2 that
k∏ k∏
|E[ fi] − E[ T1−γfi]| ≤ ǫ/4.
i=1
i=1
for γ = Ω(ǫ(1 − ρ)/log(1/ǫ)). Thus it suffices to prove the bound stated in the theorem for the
functions T1−γfi.
We now use the invariance principle. Recall that the fi may be written as a multi-linear
polynomial Pi of an ensemble X i. Let Gi,1 ≤ i ≤ k denote Gaussian ensembles with the same
covariances as X i,1 ≤ i ≤ k and let (g1,...,gk) be the multi-linear polynomials (P1,...,Pk) applied
to (X 1,...,Xk). Let hi = f [0,1](T1−γfi). By the invariance principle Corollary 4.3, we have:
k∏
k∏
|E[ T1−γfi] − E[ hi]| ≤ ǫ/4.
i=1
Note that hi are functions of ensembles of Gaussian random variables such that each pair of ensembles
is independent. Therefore, the hi’s are independent which implies in turn that
Corollary 4.3 also implies that
E[
k∏
hi] =
i=1
i=1
k∏
E[hi].
i=1
|E[hi] − E[fi]| = |E[hi] − E[T1−γfi]| ≤ ǫ/4k,
so
which concludes the proof. ✷
k∏ k∏
| E[hi] − E[fi]| ≤ ǫ/4,
i=1 i=1
306.4 Relaxed influence conditions
In this subsection we will relax the conditions imposed on the influence, i.e. Proposition 1.13.
In particular we will show that in Theorem 1.12 and k = 2 it suffices to assume that for each
coordinate at least one of the functions has low influence. Similarly for k > 2 and s-wise independent
distributions it suffices to have that in each coordinate at most s of the functions have large influence.
Lemma 6.7 Assume µ is an r-wise independent distribution on Ω k . Let f1,... ,fk : Ω n → [0,1].
Let S ⊂ [n] be a set of coordinate such that for each i ∈ S at most r of the functions fj have
Ii(fj) > ǫ. Define
gi(x) = E[fi(Y )|Y [n]\S = x [n]\S].
Then the functions gi do not depend on the coordinates in S and
k∏ k∏
|E[ fi] − E[ gi]| ≤ k|S| √ ǫ.
i=1
i=1
Proof: Recall that averaging over a subset of the variable preserves expected value. It also maintains
the property of taking values in [0,1] and decreases influences. Thus it suffices to prove the
claim for the case where |S| = 1. The general case then follows by induction.
So assume without loss of generality that S = {1} consists of the first coordinate only and that
fr+1,...,fk all have I1(fj) ≤ ǫ, so that E[(fj − gj) 2 ] ≤ ǫ for j > r. Then by Cauchy-Schwartz
we have E[|fj − gj|] ≤ √ ǫ for j > r and using the fact that the functions are bounded in [0,1] we
obtain
k∏ r∏
|E[ fi] − E[
i=1
k∏
fi
i=1 i=r+1
gi]| ≤ E[|
k∏
i=r+1
fi −
k∏
i=r+1
gi|] ≤
k∑
i=r+1
E[|fi − gi|] ≤ k √ ǫ. (45)
Let us write E1 for the expected value with respect to the first variable. Recalling that the gi do
not depend on the first variable and that the fi are r-wise independent we obtain that
E1[
r∏
k∏
fi
i=1 i=r+1
This implies that
gi]] =
k∏
i=r+1
giE1[
E[
r∏
r∏
fi] =
i=1
k∏
fi
i=1 i=r+1
k∏
gi
i=r+1 i=1
gi]] = E[
and the proof follows from (45) and(46). ✷
We now prove that condition (15) suffices instead of (11).
r∏
E1[fi] =
r∏
k∏
gi
i=1 i=r+1
gi =
k∏
gi.
i=1
k∏
gi], (46)
i=1
Lemma 6.8 Theorem 6.3 holds with the condition
instead of (38).
max
i
(min(Inf ≤log(1/τ)/K
i
(f1),Inf ≤log(1/τ)/K
i
(f2))) ≤ τ (47)
31Proof: As in the proof of Theorem 6.3 we may replace f and g by T1−γf and T1−γg for γ =
Ω(ǫ(1 −ρ)/log(1/ǫ)). Let τ and R be chosen so that the conclusion of Theorem 6.3 if all influences
of f and g satisfy I ≤R
i
≤ τ.
Let τ ′ and R ′ be chosen that
√
2R
τ
τ
′ + (1 − γ) 2R′ ≤ ǫ
4 .
It is easy to see that τ ′ may be taken as in (40) with bigger constants in the O and R ′ may be
taken as log(1/τ ′ )log(1/α).
Assume that f and g satisfy
max
i
(min(Inf ≤R′
i
(f),Inf ≤R′
i
(g))) ≤ τ ′ .
We will show that the statement of the theorem holds for f and g. For this let
Sf = {i : I ≤R
i
(f) > τ}, Sg = {i : I ≤R
i
(g) > τ}.
Let S = Sf ∪ Sg. Since R ′ ≥ R and τ ′ ≤ τ, the sets Sf and Sg are disjoint. Moreover, both Sf
and Sg are of size at most R
τ
. Also, if i ∈ S and I≤R
i
(f) > τ then I ≤R′
i
(g) < τ ′ and therefore
Ii(g) ≤ τ ′ + (1 − γ) 2R′
. In other words, for all i ∈ S we have min(Ii(g),Ii(f)) ≤ τ ′ + (1 − γ) 2R′
.
Letting S = Sf ∪ Sg and applying Lemma 6.7 with
we obtain that
g ′ (y ′ ) = E[g(y)|y [n]\S = y ′ [n]\S ], f ′ (x ′ ) = E[f(x)|x [n]\S = x ′ [n]\S ],
|E[f(x)g(y)] − E[f ′ (x)g ′ (y)]| ≤ 2R
√
τ
τ ′ + (1 − γ) 2R′ ≤ ǫ
4
. (48)
Note that the functions f ′ and g ′ satisfy that max(I ≤R
i
(f ′),I ≤R
i
(g′)) ≤ τ for all i. This implies that
the results of Theorem 6.3 hold for f ′ and g ′. This together with (48) implies the desired result. ✷
The proof of the relaxed condition on influences in Theorem 6.6 is similar.
Lemma 6.9 Assume the setup of Theorem 6.6 where ( ∏k
j=1 Ωj
i ,Pi) is s-wise independent for all
i. Then the conclusion of the theorem holds when the following condition:
replaces condition (42).
∀i, |{j : Inf ≤log(1/τ)/K
i
(fj) > τ}| ≤ s (49)
Proof: Again we start by looking at T1−γfi where γ = Ω(ǫ(1 − ρ)/log(1/ǫ)). We let τ ′ and R ′ be
chosen so that
√
kR
τ
τ
′ + (1 − γ) 2R′ ≤ ǫ
4 .
The set S will consist of all coordinates j where at least one of the functions fi has I ≤R
j
(fi) > τ.
The rest of the proof is similar. ✷
6.5 A Regularity Argument
Here we show how Proposition 1.14 follows from Theorem 1.12. The proof uses the following lemma.
32Lemma 6.10 Let (Ω1,µ1),... ,(Ωn,µn) be finite probability spaces such that for all i the minimum
probability of any atom in Ωi is at least α ≤ 1/2. Let f : ∏ n
i=1 Ωi → [0,1] and suppose that
Ii(f) ≥ τ. Let S = {i}. Then
E[f S ] ≥ E[f] + ατ,
Proof: Note that if g : Ωi → [0,1] satisfies Var[g] ≥ a then
Therefore:
E[f S ] ≤ E[f] − ατ.
(max
x g(x) − E[g])2 ≥ aα, (min g(x) − E[g])
x
2 ≥ aα.
E[f S − f] ≥ E[(f S − f) 2 ] ≥ αIi(f) = ατ,
which implies the first inequality. The second inequality is proved similarly. ✷
We now prove Proposition 1.14.
Proof: The set T is defined recursively as follows. We let To = ∅. a t = at = 0. Then we repeat
the following: If all of the functions
f Tt
1 ,... ,fTt k ,fTt
1 ,...,fTt
k
have influences lower than τ, then we halt and let T = Tt. Otherwise, there exists at least one i
and one j such that either Ii(f Tt
j ) ≥ τ or Ii(f Tt
j ) ≥ τ. We then let Tt+1 = Tt ∪ {i}. In the first case
we let at+1 = at + 1,a t+1 = a t. In the second case we let at+1 = at,a t+1 = a t + 1.
Note that by Lemma 6.10, this process must terminate within
steps since
and at + a t ≥ t. ✷
k ≥ E[
k∑
j=1
2k
ατ
f Tt
j ] ≥ ατat, 0 ≤ E[
k∑
j=1
f Tt
j ] ≤ k − ατa t .
7 Applications to Social Choice
In this section we apply Theorem 6.2 to the two social choice models.
7.1 ρ for samples of votes
In the first social choice example we consider example 2.6. The two probability spaces are the ones
given by
ΩV = {{x = 1}, {x = −1}},
representing the intended vote
ΩS = {{(x = 1,y = 1)}, {(x = −1,y = 1)}, {y = 0}}
representing the sampled status.
In order to calculate ρ(Ω1,Ω2) it suffices by lemma 2.8 to calculate √ E[(Tf) 2 ] where f(x,y) = x
is the (only) ΩV measurable with E[f] = 0 and E[f 2 ] = 1. We see that Tf(x,y) = 0 if y = 0 and
Tf(x,y) = x when y ̸= 0. Therefore
√E[(Tf)
2 ] = ρ
1/2
.
33Lemma 7.1
ρ(ΩV ,ΩS) = ρ 1/2 .
7.2 Predictability of Binary Vote
Here we prove Theorem 1.1.
Proof: The proof follows directly from Proposition 1.13 and Lemma 7.1 as
✷
7.3 ρ in Condorcet voting
Γ√ ρ(1/2,1/2) = 1 1
+
4 2π arcsin √ ρ.
In the context of Condorcet voting, Ω is given by Sk, the group of permutations on k elements and
µ is the uniform measures.
Definition 7.2 Let Q ⊂ ( )
[k]
2
and define RQ : Ω → {0,1} Q be letting (RQ(σ))i<j = 1 if σ(i) < σ(j)
and (RQ(σ))i<j = 0 if σ(i) > σ(j). The relation RQ summarizes the pairwise relation in the
permutation σ for pairs in
)
Q.
we define
Given a subset Q ⊂ ( [k]
2
ΩQ = {{σ : R(σ) = x} : x ∈ {0,1} Q }
Thus ΩQ is the coarsening of Ω summarizing the information about pairwise relations in Q.
We will mostly be interested in ρ(ΩQ,Ωi<j) where (i < j) /∈ Q.
Lemma 7.3 Suppose Q = {(1 > 2),(1 > 3),... ,(1 > r)} then
√
r − 1 1
ρ(ΩQ,Ω1>(1+r)) =
≤ √
3(r + 1) 3
Proof: The space Ω 1>(r+1) has a unique function with E[f] = 0 and E[f 2 ] = 1. This is the function
that satisfies f(σ) = 1 when σ(1) > σ(r + 1) and f(σ) = −1 if σ(1) < σ(r + 1).
The conditional probability that σ(1) > σ(r + 1) given that s of the inequalities σ(1) >
σ(2),... ,σ(1) > σ(r) hold is
s + 1
r + 1 .
Therefore the conditional expectation of f under this conditioning is:
1
r(r + 1) 2
r−1
s=0
(s + 1) − (r − s)
r + 1
=
2s + 1 − r
.
r + 1
Noting that the number of inequalities satisfied is uniform in the range {0,... ,r − 1} we see that
E[(Tf) 2 ∑
] = (2s + 1 − r) 2 1
=
r(r + 1) 2
(
∑r−1
4 s 2 ∑r−1
− 4(r − 1) s + r(r − 1) 2
)
=
=
1
r(r + 1) 2
r − 1
3(r + 1)
s=0
s=0
(
4 2(r − 1)3 + 3(r − 1) 2 + (r − 1)
− 4(r − 1)
6
(r − 1)2 + (r − 1)
+ r(r − 1)
2
2
34
)and
✷
√
r − 1
ρ =
3(r + 1)
7.4 Condorcet Paradox
We now prove Theorem 1.3 dealing with Condorcet paradoxes.
Proof:
We wish to bound the asymptotic probability in terms of the number of candidates k for the
probability that there is a unique maximum in Condorcet aggregation. Clearly this probability
equals k times the probability that candidate 1 is the maximum.
Recall that the votes are aggregated as follows. Let f : {−1,1} n → {−1,1} be an anti-symmetric
be a function with E[f] = 0 and low influences. Let σ1,... ,σn ∈ Sk denote the n rankings of the individual
voters. Recall that we denote x a>b (i) = 1 if σi(a) > σi(b) and x a<b (i) = −1 if σi(a) < σi(b).
Note that x b>a = −x a>b . We recall further that the binary decision between each pair of coordinates
is performed via a anti-symmetric function f : {−1,1} n → {−1,1} so that f(−x) = −f(x)
for all x ∈ β. Finally, the tournament GK = GK(σ;f) is defined by having (a > b) ∈ GK if and
only if f(x a>b ) = 1.
In order to obtain an upper bound on the probability that 1 is the unique maximum, define
f a,b (σ1,... ,σn) = (1 + f(x a>b ))/2. Then the probability that 1 is the maximum is given by:
Using (12) we obtain that
E[
k∏
a=2
E[
k∏
f 1,a ].
a=2
f 1,a ] ≤ Γ
(
1
√3 ,..., 1
√ )
3 (1
2
1
,... , ) + ǫ,
2
where ǫ → 0 as τ → 0; there are k − 1 expected values all given by 1/2 as E[f 1,a ] = 1/2 for all a;
the k − 2 values of ρ all bounded by 1/ √ 3 by Lemma 7.3.
By Lemma 5.4 we now obtain
E[
k∏
a=2
f 1,a ] ≤ k −2+o(1) .
Taking the union bound on the k possible maximal values we obtain that in Condorcet voting the
probability that there is a unique max is at most k −1+o(1) as needed. ✷
7.5 Majority and Tightness
The majority function shows the tightness of Theorem 1.1 and Theorem 1.3.
357.5.1 Tightness for Prediction
For the tightness of Theorem 1.1, write:
X =
∑ n
i=1 xi
√
n
, Y =
∑ n
i=1 xiyi
√ ,
ρn
where xi is the intended vote of voter i and yi = 1 if voter i was queried, yi = 0 otherwise. X is
the total bias of the actual vote and Y the bias of the sample.
Note that (X,Y ) is asymptotically a normal vector with covariance matrix
Therefore asymptotically we obtain
(
1
√
ρ
√
ρ 1
E[sgn(X)sgn(Y )] = 2P[X > 0,Y > 0] − 1 = 2Γρ(1/2,1/2) − 1 = 2
π arcsin √ ρ.
7.5.2 Tightness in Condorcet Voting
The tightness in Theorem 1.3 follows from [22] and [18]. We briefly sketch the main steps of the
proof.
For each a and b let
X a>b ∑n i=1
=
xa>b
√
i
,
n
be the bias preference in a majority vote towards a. All the random variables X a>b are asymptotically
N(0,1).
Consider the random variables X 1>2 ,... ,X 1>k . Note that this set of variables is exchangeable
(they are identically distributed under any permutation of their order). Moreover,
E[X 1>2 X 1>3 ] = 1
n
By the CLT the limiting value (as n to ∞) of
n∑
i=1
)
E[x 1>2
i x 1>3
i ] = 1/3.
lim
n>∞ P[X1>2 > 0,... ,X 1>k > 0] = P[N 1>2 > 0,... ,N 1>k > 0]
where (N1>a) is an exchangeable collection of normal N(0,1) random variables, the correlation
between each pair of which is 1/3. The results [22] imply that as k → ∞:
P[N 1>2 > 0,... ,N 1>k > 0] ∼ 2Γ(2) √ √
lnk
2π
k2 ,
This in turn implies that the probability of a unique max for majority voting for large k as n → ∞
is given by:
(2 + o(1))Γ(2) √ √
ln k
2π ,
k
showing the tightness of the result up to sub-polynomial terms.
367.5.3 The probability that majority will result in linear order
Here we prove Proposition 1.4 and show that the probability that majority will result in linear
order is exp(−Θ(k 5/3 )). We find this asymptotic behavior quite surprising. Indeed, given the
previous results that the probability that there is a unique max is k −1+o(1) , one may expect that
the probability that the order is linear would be
k −1+o(1) (k − 1) −1+o(1) ... = (k!) −1+o(1) .
However, it turns out that there is a strong negative correlation between the event that there is a
unique maximum among the k candidates and that among the other candidates there is a unique
max.
Proof: We use the multi-dimensional CLT. Let
Xa>b = 1
√
n
(|{σ : σ(a) > σ(b)}| − |{σ : σ(b) > σ(a)}|)
By the CLT at the limit the collection of variables (Xa>b)a̸=b converges to a joint Gaussian vector
(Na>b)a̸=b satisfying for all distinct a,b,c,d:
Na>b = −Nb>a, Cov[Na>b,Na>c] = 1
3 , Cov[Na>b,Nc>d] = 0.
and Na>b ∼ N(0,1) for all a and b.
We are interested in providing bounds on
P[∀a > b : Na>b > 0]
as the probability that the resulting tournament is an order is obtained by multiplying by a k! =
exp(Θ(k log k)) factor.
We claim that there exist independent N(0,1) random variables such that
Na>b = 1
√
3
(Xa − Xb + Za>b)
(where Za>b = −Zb>a). This follows from the fact that the joint distribution of Gaussian random
variables is determined by the covariance matrix (this is noted in the literature in [18]).
We now prove the upper bound. Let α be a constant to be chosen later. Note that for all a
and large enough k it holds that:
P[|Xa| > k α ] ≤ exp(−Ω(k 2α )).
Therefore the probability that for at least half of the a’s in the interval [k/2,k] it holds that
|Xa| > k α is at most
exp(−Θ(k 1+2α )).
Let’s assume that at least half of the a’s in the interval [k/2,k] satisfy that |Xa| < k α . We
claim that in this case the number H k/4[−k α ,k α ] of pairs a > b such that Xa,Xb ∈ −[k α ,k α ] and
Xa − Xb < 1 is Ω(k 2−α ).
For the last claim partition the interval [−k α ,k α ] into sub-intervals of length 1 and note that at
least Ω(k) of the points belong to sub-intervals which contain at least Ω(k 1−α ) points. This implies
that the number of pairs a > b satisfying |Xa − Xb| < 1 is Ω(k 2−α ).
Note that for such pair a > b in order that Na>b > 0 we need that Za>b > −1 which happens
with constant probability.
37We conclude that given that half of the X’s fall in [−k α ,k α ] the probability of a linear order is
bounded by
exp(−Ω(k 2−α )).
Thus overall we have bounded the probability by
exp(−Ω(k 1+2α )) + exp(−Ω(k 2−α )).
The optimal exponent is α = 1/3 giving the desired upper bound.
For the lower bound we condition on Xa taking value in (a,a + 1)k −2/3 . It is easily seen that
the probability that all Xa take such values is
exp(−O(k 5/3 )),
and that conditioned on Xa taking such values the probability that
Za>b > Xb − Xa,
for all a > b is also
This proves the required result. ✷
exp(−O(k 5/3 )),
8 Applications to Hyper-Graphs and Additive Properties
In this section we prove Theorem 1.5 and give a few examples. The basic idea in the applications
presented so far was that in order to bound correlation between k events of low influences, it suffices
to know how to bound the correlation between the first k − 1 and the last one. For low influence
events, using the invariance principle, one obtains bounds coming from half spaces in Gaussian
space, or majority functions in the discrete space.
The applications presented now will be of different nature. We will be interested again in
correlation between k events – however, we will restrict to correlation measures defined in such a
way that any pair of events are un-correlated. While this is a much more restrictive setting, it
allows to obtain exact results and not just bounds. In other words, we obtain that such correlation
measures for low influences events depend only on the measure of the sets but not on any additional
structure. While this may sound surprising, it in fact follows directly the invariance principle
together with the fact that for jointly Gaussian random variables, pairwise independence implies
independence. We first prove Theorem 1.5.
Proof: The proof follow immediately from Theorem 1.12, Proposition 1.14 and Lemma 2.9. ✷.
Example 8.1 Consider the group Zm for m > 2. We will be interested in linear relations over
Z k m. A linear relation over Z k m is given by L = (L0,ℓ1,... ,ℓk) such that ℓi ̸= 0 for all i ≥ 1 and ℓi
and m are co-prime for all i ≥ 1. We will restrict to the case k ≥ 3. We will write L(x) to denote
the logical statement that ∑ k
i=1 xiℓi mod m ∈ L0. Given a set A ⊂ Z n m we will denote
L(A k ) = |{x ∈ A k : L(x)}|,
and µL the uniform measure on L(A k ). We note that for every linear relation we have that µL is
pairwise smooth and that if the set L0 is of size at least 2 then R is connected.
38We now apply Theorem 1.5 to conclude that for low influence sets A ⊂ Zn m, the number of k
tuples (x1,... ,xk) ∈ Ak satisfying ∑ xi mod m ∈ Ln 0 is
(1 ± o(1))|A| k
( ) n
L0
. (50)
m
For general sets A we conclude that we have A S ⊂ A ⊂ A T where |S| = O(1),T = O(1) and (50)
holds for both A S and A T .
Example 8.2 We may consider much more general relations. For example we may take P be a
polynomial in x1,...,xr ∈ Zr and Q to be a polynomial in xr+1,... ,xk such that P and Q both
have roots. Then we can look at the relation defined by the zeros of PQ. It is easy to check that
R is connected. Therefore it follows that if A ⊂ Z n m is set all of whose influences are lower than τ
then
c1(τ,
|A|
|Zn m| )|Rn| ≤ |A k ∩ R n |A|
| ≤ c2(τ,
|Zn m| )|Rn|
where c1 and c2 are two positive functions. Again if A is not of low influences then there exist finite
sets of coordinates S and T such that A S ⊂ A ⊂ A T and the conclusion holds for S and T.
References
[1] P. Austrin and E. Mossel. Approximation resilient predicates from pairwise independence. To
appear in CCC 2008, 2008.
[2] W. Beckner. Inequalities in Fourier analysis. Ann. of Math. (2), 102(1):159–182, 1975.
[3] C. E. Bell. A random voting graph almost surely has a hamiltonian cycle when the number of
alternatives is large. Econometrica, 49(6):1597–1603, 1981.
[4] A. Bonami. Étude des coefficients de Fourier des fonctions de Lp (G). Ann. Inst. Fourier
(Grenoble), 20(fasc. 2):335–402 (1971), 1970.
[5] C. Borell. Geometric bounds on the Ornstein-Uhlenbeck velocity process. Z. Wahrsch. Verw.
Gebiete, 70(1):1–13, 1985.
[6] I. Dinur, E. Mossel, and O. Regev. Conditional hardness for approximate coloring. In Proceedings
of the thirty-eighth annual ACM symposium on Theory of computing (STOC 2006),
pages 344–353, 2006.
[7] I. Dinur and S. Safra. The importance of being biased. In Proceedings on 34th Annual ACM
Symposium on Theory of Computing, pages 33–42, 2002.
[8] B. Green. A Szemerédi-type regularity lemma in abelian groups, with applications. Geom.
Funct. Anal., 15(2):340–376, 2005.
[9] S. Janson. Gaussian Hilbert Spaces, volume 129 of Cambridge Tracts in Mathematics. Cambridge
University Press, Cambridge, 1997.
[10] J. Kahn, G. Kalai, and N. Linial. The influence of variables on boolean functions. In Proceedings
of the 29th Annual Symposium on Foundations of Computer Science, pages 68–80, 1988.
39[11] G. Kalai. A Fourier-theoretic perspective on the Concordet paradox and Arrow’s theorem.
Adv. in Appl. Math., 29(3):412–426, 2002.
[12] G. Kalai. Social Indeterminacy. Econometrica, 72:1565–1581, 2004.
[13] S. Khot. On the power of unique 2-prover 1-round games. In Proc. 34th Ann. STOC, pages
767–775, 2002.
[14] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell. Optimal inapproximability results for
max-cut and other 2-variable csps? In Proceedings of the 45th Annual IEEE Symposium on
Foundations of Computer Science, pages 146–154. IEEE, 2004.
[15] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell. Optimal inapproximability results for
max-cut and other 2-variable csps? SIAM J. Comput., 37:319–357, 2007.
[16] E. Mossel, R. O’Donnell, and K. Oleszkiewicz. Noise stability of functions with low influences:
invariance and optimality (extended abstract). In 46th Annual IEEE Symposium on
Foundations of Computer Science (FOCS 2005), 23-25 October 2005, Pittsburgh, PA, USA,
Proceedings, pages 21–30. IEEE Computer Society, 2005.
[17] E. Mossel, R. O’Donnell, and K. Oleszkiewicz. Noise stability of functions with low influences:
invariance and optimality. To appear in Ann. Math., 2008.
[18] R. G. Niemi and H. F. Weisberg. A mathematical solution for the probability of paradox of
voting. Behavioral Science, 13:317–323, 1968.
[19] R. O’Donnell. Hardness amplification within np. In Proceedings of the 34th ACM Symposium
on Theory of Computing, 2002.
[20] K. Oleszkiewicz. On a nonsymmetric version of the Khinchine-Kahane inequality. Progress In
Probability, 56:156–168, 2003.
[21] P. Raghavendra. Optimal Algorithms and Inapproximability Results For Every CSP? To
appear in STOC, 2008.
[22] Y. Rinott and V. Rotar ′ . A remark on quadrant normal probabilities in high dimensions.
Statistics and Probability Letters, 51(1):47–51, 2001.
[23] A. Samorodnitsky and L. Trevisan. Gowers uniformity, influence of variables, and PCPs. In
STOC ’06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing,
pages 11–20, New York, NY, USA, 2006. ACM Press.
[24] P. Wolff. Hypercontractivity of simple random variables. submitted, 2005.
40