To see all these ideas in action, we will work out an example. Recall the planted clique problem, involving an $n$-node graph $G$:
$H_0$: $G$ was sampled from the Erdős-Rényi model $G(n, \tfrac{1}{2})$, with independent edges each appearing with probability $\tfrac{1}{2}$.
$H_1$: $G$ was sampled by adding each vertex of $G$ to a set $S$ independently with probability $k/n$, adding every edge between vertices in $S$ to form a clique, then sampling the rest of the edges independently as in $G(n, \tfrac{1}{2})$.
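The two hypotheses can be made concrete with a short sampler. This is a minimal sketch using only the standard library; the function names `sample_null` and `sample_planted` and the representation of a graph as a set of pairs `(i, j)` with `i < j` are illustrative choices, not notation from the text.

```python
import random
from itertools import combinations

def sample_null(n, rng):
    """Sample G(n, 1/2): each of the C(n, 2) edges appears independently with probability 1/2."""
    return {e for e in combinations(range(n), 2) if rng.random() < 0.5}

def sample_planted(n, k, rng):
    """Sample from H1: put each vertex in S with probability k/n, plant a clique on S,
    and draw every remaining edge independently with probability 1/2."""
    S = {v for v in range(n) if rng.random() < k / n}
    edges = set()
    for i, j in combinations(range(n), 2):
        if (i in S and j in S) or rng.random() < 0.5:
            edges.add((i, j))
    return edges, S

rng = random.Random(0)          # fixed seed so the example is reproducible
G0 = sample_null(30, rng)
G1, S = sample_planted(30, 8, rng)
```

Note that $|S|$ is binomial with mean $k$ rather than exactly $k$, matching the description of $H_1$ above.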
Denote by $\nu$ the null distribution $G(n, 1/2)$ and by $\mu_k$ the alternative distribution $H_1$. In this section we prove the following lemma about optimal $D$-simple statistics for planted clique. A refined version of this lemma is the first step in a (much more complicated) SoS lower bound for the planted clique problem, presented later in this thesis.
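Lemma 2.4.1. For every $\varepsilon > 0$, if $k = n^{1/2 - \varepsilon}$, then
\[ \max_{f \text{ $(C \log n)$-simple}} \mathbb{E}_{G \sim \mu_k} f(G) \leq O(1) \]
for every $C > 0$. On the other hand, if $k > 1.01 \sqrt{n}$, then there is $C > 0$ such that
\[ \max_{f \text{ $(C \log n)$-simple}} \mathbb{E}_{G \sim \mu_k} f(G) \to \infty . \]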
More refined versions of this lemma allow $1.01 \sqrt{n}$ to be relaxed to $\Omega(\sqrt{n})$, and treat $k$ in the interval $[n^{1/2 - \varepsilon}, \Omega(\sqrt{n})]$ (though some questions about what precisely happens in this interval remain open for such $k$).
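Proof. From Boolean Fourier analysis [142] we recall that the functions
\[ \Big\{ \chi_\alpha(G) = \prod_{\{i,j\} \in \alpha} (2 G_{ij} - 1) \Big\}_{\alpha \subseteq \binom{n}{2},\ |\alpha| \leq D} \]
form an orthonormal basis for the degree-$D$ functions $f : \{0, 1\}^{\binom{n}{2}} \to \mathbb{R}$. (Here $G_{ij}$ is the 0/1 indicator for the presence of the edge $ij$ in $G$.) So by (2.3.1), we just need to compute $\mathbb{E}_{G \sim \mu} \chi_\alpha(G)$ for each such $\alpha$.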
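The orthonormality claim can be checked by brute force on a tiny instance. The sketch below assumes $n = 4$ and takes the inner product to be the expectation under the uniform distribution on graphs, that is, under $G(n, 1/2)$; the names `chi` and `inner` are illustrative.

```python
from itertools import combinations, product

n = 4
edges = list(combinations(range(n), 2))            # the C(4, 2) = 6 potential edges
graphs = list(product([0, 1], repeat=len(edges)))  # all 2^6 graphs as 0/1 edge vectors

def chi(alpha, G):
    """Character chi_alpha(G): product over edge positions e in alpha of (2 G_e - 1)."""
    value = 1
    for e in alpha:
        value *= 2 * G[e] - 1
    return value

# Every edge subset alpha, encoded as a tuple of positions into `edges`.
alphas = [a for r in range(len(edges) + 1) for a in combinations(range(len(edges)), r)]

def inner(alpha, beta):
    """E_{G ~ G(n, 1/2)} [chi_alpha(G) * chi_beta(G)], computed by full enumeration."""
    return sum(chi(alpha, G) * chi(beta, G) for G in graphs) / len(graphs)

orthonormal = all(inner(a, b) == (1.0 if a == b else 0.0) for a in alphas for b in alphas)
print(orthonormal)
```

The check covers all $2^6$ characters; restricting to $|\alpha| \leq D$ simply selects a subset of them.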
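Fix $\alpha \subseteq \binom{n}{2}$. Consider the process of sampling a graph $G \sim \mu$ by first sampling the clique vertices $S \subseteq [n]$. Conditioned on $S$ the edges of $G$ become independent, so $\mathbb{E}\, \chi_\alpha(G) = \mathbb{E}_S \prod_{ij \in \alpha} \mathbb{E}[(2 G_{ij} - 1) \mid S]$. If $i$ or $j$ is not in $S$, then the edge $\{i, j\}$ is included in $G$ with probability $1/2$, so $\mathbb{E}[(2 G_{ij} - 1) \mid \{i, j\} \not\subseteq S] = 0$.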
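Thus, $\chi_\alpha$ has nonzero conditional expectation only if all of $V(\alpha) \stackrel{\mathrm{def}}{=} \{ i \in [n] : \text{there exists } j \in [n] \text{ s.t. } ij \in \alpha \}$ is in $S$. This occurs with probability precisely $(k/n)^{|V(\alpha)|}$. And, if $V(\alpha) \subseteq S$, every edge with endpoints in $V(\alpha)$ appears in $G$, so the conditional expectation of $\chi_\alpha(G)$ is $1$. We find that $\mathbb{E}_\mu \chi_\alpha(G) = (k/n)^{|V(\alpha)|}$.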
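The identity $\mathbb{E}_\mu \chi_\alpha(G) = (k/n)^{|V(\alpha)|}$ can be confirmed exactly on a small instance by enumerating every pair $(S, G)$ with its probability. This is a sketch with exact rational arithmetic; the helper name `planted_expectation` and the choices $n = 4$, $k = 2$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def planted_expectation(n, k, alpha):
    """Exact E_{G ~ mu} chi_alpha(G), by enumerating every clique set S and every graph G."""
    p = Fraction(k, n)
    all_edges = list(combinations(range(n), 2))
    total = Fraction(0)
    for r in range(n + 1):
        for S in combinations(range(n), r):
            prob_S = p ** r * (1 - p) ** (n - r)
            forced = {e for e in all_edges if e[0] in S and e[1] in S}   # clique edges
            free = [e for e in all_edges if e not in forced]             # fair coin flips
            for bits in product([0, 1], repeat=len(free)):
                present = forced | {e for e, b in zip(free, bits) if b}
                chi = 1
                for e in alpha:
                    chi *= 1 if e in present else -1
                total += prob_S * Fraction(1, 2 ** len(free)) * chi
    return total

n, k = 4, 2
alpha = [(0, 1), (1, 2)]                  # a path, so V(alpha) = {0, 1, 2}
V = {v for e in alpha for v in e}
print(planted_expectation(n, k, alpha), Fraction(k, n) ** len(V))
```

Both printed values agree, since only the sets $S \supseteq V(\alpha)$ contribute to the expectation.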
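Now we need to estimate
\[ \sum_{0 < |\alpha| \leq D} \big( \mathbb{E}_\mu \chi_\alpha(G) \big)^2 = \sum_{0 < |\alpha| \leq D} (k/n)^{2 |V(\alpha)|} . \]
We start with the upper bound, when $k = n^{1/2 - \varepsilon}$ for some $\varepsilon > 0$ and $D = C \log n$ for some $C > 0$. Every $\alpha$ with $|\alpha| \leq D$ has $|V(\alpha)| \leq 2D = 2C \log n$. And, for every $t \leq 2C \log n$ there are at most $n^t \, t^{\min(2C \log n,\, 2t^2)}$ sets $\alpha$ with $|V(\alpha)| = t$ and $|\alpha| \leq C \log n$. So,
\[ \sum_{0 < |\alpha| \leq D} \big( \mathbb{E}_\mu \chi_\alpha(G) \big)^2 \leq \sum_{t \leq \sqrt{C \log n}} n^{-2 \varepsilon t} \cdot t^{2t^2} + \sum_{\sqrt{C \log n} \leq t \leq 2C \log n} n^{-\varepsilon t} \cdot t^{C \log n} . \]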
Standard manipulations bound both the above sums by $O(1)$.
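One possible way to carry out these manipulations is sketched below. It assumes that $\log$ is the natural logarithm and that $n$ is sufficiently large in terms of $\varepsilon$ and $C$; it is meant as an illustration of the kind of argument involved, not necessarily the intended one.

```latex
% First sum: 1 <= t <= sqrt(C log n).
\[
  t^{2t^2} = \exp\!\big(2 t^2 \log t\big)
  \leq \exp\!\big(t \cdot \sqrt{C \log n}\, \log (C \log n)\big)
  = n^{t \cdot o(1)} \leq n^{\varepsilon t},
\]
\[
  \text{so}\quad
  \sum_{t \leq \sqrt{C \log n}} n^{-2\varepsilon t}\, t^{2t^2}
  \leq \sum_{t \geq 1} n^{-\varepsilon t} = O(1).
\]
% Second sum: sqrt(C log n) <= t <= 2 C log n.
\[
  n^{-\varepsilon t}\, t^{C \log n} = n^{-(\varepsilon t - C \log t)}
  \leq n^{-\varepsilon t / 2}
  \quad\text{once } C \log t \leq \varepsilon t / 2,
\]
\[
  \text{so}\quad
  \sum_{\sqrt{C \log n} \leq t \leq 2C \log n} n^{-\varepsilon t}\, t^{C \log n}
  \leq \sum_{t \geq 1} n^{-\varepsilon t / 2} = O(1).
\]
```

The condition $C \log t \leq \varepsilon t / 2$ holds throughout the second range for large $n$, because every $t$ in that range is at least $\sqrt{C \log n}$, which grows with $n$.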
On the other hand, if $k > 1.01 \sqrt{n}$, then just by considering the contributions to the sum from $\alpha$'s which form a cycle it is easy to show that $\sum_{|\alpha| \leq 100 \log n} \big( \mathbb{E}_\mu \chi_\alpha(G) \big)^2 \to \infty$.
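The cycle contribution can be evaluated numerically. A cycle $\alpha$ on $\ell \geq 3$ vertices has $|V(\alpha)| = \ell$, and the complete graph on $n$ vertices contains $\frac{n!}{(n - \ell)!} \cdot \frac{1}{2\ell}$ such cycles. The sketch below assumes natural logarithms and looks only at the single length $\ell = \lfloor 100 \log n \rfloor$; the function name `log_cycle_contribution` is illustrative.

```python
import math

def log_cycle_contribution(n, k, length):
    """Natural log of (number of cycles of this length in K_n) * (k/n)^(2 * length).

    Each such cycle alpha has |V(alpha)| = length, so it contributes (k/n)^(2 * length)
    to the sum; K_n contains n! / (n - length)! / (2 * length) cycles of this length.
    """
    log_count = math.lgamma(n + 1) - math.lgamma(n - length + 1) - math.log(2 * length)
    return log_count + 2 * length * math.log(k / n)

for n in (10**6, 10**8, 10**10):
    k = 1.01 * math.sqrt(n)
    length = int(100 * math.log(n))    # longest cycle allowed by |alpha| <= 100 log n
    print(n, round(log_cycle_contribution(n, k, length), 2))
```

For these values of $n$ the logarithm of the contribution is positive and increasing, consistent with the sum diverging; for small $n$ the cycle length is not negligible compared with $\sqrt{n}$ and the count of cycles falls well below $n^\ell / (2\ell)$, so the asymptotic behaviour is not yet visible there.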
CHAPTER 3
THE SOS METHOD FOR ALGORITHM DESIGN
The next goal is to describe the SoS method. As a prerequisite, we introduce two statistical tasks generalizing hypothesis testing: refutation and estimation.
We need to extend the setting from the last chapter to include hidden variables.
Let $\Omega, \Sigma$ be (finite or infinite) sets, and $n, m \in \mathbb{N}$. Let $\mu$ be a probability distribution on $\Omega^n \times \Sigma^m$, and $\nu$ a distribution on $\Omega^n$. We think of $\mu$ as a distribution on pairs $(y, x)$ where $y \in \Omega^n$ and $x \in \Sigma^m$. If we project $\mu$ onto the marginal distribution on $y$ we recover the hypothesis testing setting of the last chapter. Instead, we will consider algorithmic tasks in which an algorithm sees a sample $y$ and accomplishes some task related to $x$.
3.1 Refutation
Definition 3.1.1. An $\alpha$-refutation algorithm for $(\mu, \nu)$ takes input $y \in \Omega^n$ and outputs a number $A(y)$ such that $A(y) \geq \max_{x \in \Sigma^m} \log \mu(y, x)$ and $\Pr_{y \sim \nu}(A(y) > \alpha) \leq o(1)$. Notice that the second probability is over $y \sim \nu$, even though $A(y)$ is an upper bound on probabilities related to $\mu$. Informally, and for the right choices of $\alpha$, the algorithm $A$ is certifying that typical $y \sim \nu$ is extremely unlikely to have come from $\mu$.
Often, $\max_{x \in \Sigma^m} \log \mu(y, x)$ corresponds to a natural combinatorial or analytic property of $y$, and the refutation problem requires certifying that $y$ does not have some combinatorial or analytic structure. Some examples may be helpful.
For planted clique, recall the alternative distribution $\mu$ on $n$-node graphs $G$. First include each vertex independently in a set $S$ with probability $k/n$, then sample a random graph with a clique on $S$ and the rest of the edges independent as in $G(n, 1/2)$. Here, the hidden variable $x \in \{0, 1\}^n$ is the indicator vector for $S$, and the observed variable $y \in \{0, 1\}^{\binom{n}{2}}$ is the (adjacency matrix of the) graph $G$. We have
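\[ \mu(G, S) = \mu(S) \cdot \mu(G \mid S) = \left( \frac{k}{n} \right)^{|S|} \left( 1 - \frac{k}{n} \right)^{n - |S|} \left( \frac{1}{2} \right)^{\binom{n}{2} - \binom{|S|}{2}} \]
if $S$ is a $G$-clique and $0$ otherwise. So when $S$ is a $G$-clique,
\[ \log \mu(G, S) = \binom{|S|}{2} \log 2 - |S| \log \frac{n}{k} - |S| \log(1 - k/n) + f(n, k) \]
for some function $f(n, k)$ not depending on $S$, and otherwise $\log \mu(G, S) = -\infty$. For $|S| \gg \log n$ and some constant $C$,
\[ |S|^2 / C + f(n, k) \leq \log \mu(G, S) \leq C |S|^2 + f(n, k) , \]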
so the refutation problem is (for such $|S|$) equivalent to certifying upper bounds on the size of the maximum clique in $G \sim G(n, \tfrac{1}{2})$ (the latter appears because $G(n, \tfrac{1}{2})$ is typically the null distribution for planted clique).
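The expansion of $\log \mu(G, S)$ can be checked against the product formula directly. In the sketch below, $f(n, k)$ is taken to be $n \log(1 - k/n) - \binom{n}{2} \log 2$, which is what collecting the $S$-independent terms of the product formula gives; the text itself does not write $f$ out, so this explicit form and the function names are assumptions of the example.

```python
import math
from itertools import combinations

def is_clique(edges, S):
    """True when every pair of vertices in S is joined by an edge (pairs stored as i < j)."""
    return all((i, j) in edges for i, j in combinations(sorted(S), 2))

def log_mu_direct(n, k, edges, S):
    """log mu(G, S) from the product formula; -inf when S is not a clique in G."""
    if not is_clique(edges, S):
        return -math.inf
    s = len(S)
    free_edges = math.comb(n, 2) - math.comb(s, 2)
    return s * math.log(k / n) + (n - s) * math.log(1 - k / n) - free_edges * math.log(2)

def log_mu_expanded(n, k, edges, S):
    """Same quantity via the expansion, with f(n, k) = n log(1 - k/n) - C(n, 2) log 2."""
    if not is_clique(edges, S):
        return -math.inf
    s = len(S)
    f = n * math.log(1 - k / n) - math.comb(n, 2) * math.log(2)
    return math.comb(s, 2) * math.log(2) - s * math.log(n / k) - s * math.log(1 - k / n) + f

n, k = 6, 3
edges = {(0, 1), (0, 2), (1, 2), (3, 4)}      # a triangle on {0, 1, 2} plus one extra edge
print(log_mu_direct(n, k, edges, {0, 1, 2}), log_mu_expanded(n, k, edges, {0, 1, 2}))
print(log_mu_direct(n, k, edges, {0, 1, 3}))  # {0, 1, 3} is not a clique, so -inf
```

The only $S$-dependent term that grows quadratically is $\binom{|S|}{2} \log 2$, which is why maximizing $\log \mu(G, S)$ over $S$ amounts to finding a large clique.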
Relation to Hypothesis Testing. A refutation algorithm for distributions $\nu, \mu$ can also be used to solve the hypothesis testing problem, assuming that there is an efficient algorithm to compute $\nu(y)$. Often $\nu$ (the distribution of the null hypothesis) is a product distribution, or is uniform over some set of known size, making this task trivial.
In the case of planted clique this connection is intuitively clear. Graphs from $\mu$ (typically) contain cliques of size $k - O(\sqrt{k})$ and graphs from $G(n, 1/2)$ typically contain no clique larger than $2.1 \cdot \log n$. Any refutation algorithm which successfully certifies that graphs from $G(n, 1/2)$ do not contain cliques of size $0.9k$ also indicates by its success or failure which distribution its input came from.
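The logic of turning a clique certificate into a hypothesis test can be sketched as follows. The exhaustive search in `clique_number` runs in exponential time and only stands in for a refutation algorithm, which would instead produce an efficiently certified upper bound; both function names and the small test graphs are hypothetical.

```python
from itertools import combinations

def clique_number(n, edges):
    """Exact maximum clique size by exhaustive search (exponential time; a stand-in certifier)."""
    best = 0
    for size in range(1, n + 1):
        found = any(
            all((i, j) in edges for i, j in combinations(S, 2))
            for S in combinations(range(n), size)
        )
        if not found:
            break          # no clique of this size means none of any larger size
        best = size
    return best

def decide_hypothesis(certified_bound, k):
    """Answer H0 when the certified clique bound is below 0.9 k, and H1 otherwise."""
    return "H0" if certified_bound < 0.9 * k else "H1"

n, k = 8, 5
with_clique = set(combinations(range(5), 2)) | {(5, 6), (6, 7)}   # clique on {0, ..., 4}
path = {(i, i + 1) for i in range(n - 1)}                         # largest clique is an edge

print(decide_hypothesis(clique_number(n, with_clique), k))
print(decide_hypothesis(clique_number(n, path), k))
```

If the certifier succeeds in bounding the clique size below $0.9k$, the input cannot have come from a graph containing a clique of that size, so the test answers $H_0$; otherwise it answers $H_1$.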
It is possible to derive a reduction from hypothesis testing to refutation in a very generic setting (at least for finite $\Omega$ and $\Sigma$), via the following familiar variational formula from the theory of exponential families [177]:
\[ \log \mu(y) = \log \sum_{x \in \Sigma^m} \mu(x, y) = \max_{\mu' \in \Delta_{\Sigma^m}} \mathbb{E}_{x \sim \mu'} \log \mu(x, y) + H(\mu') \]
where $\Delta_{\Sigma^m}$ is the set of distributions on $\Sigma^m$ and $H$ is the Shannon entropy. For any pair of distributions $\mu, \nu$, the associated refutation problem requires certifying an upper bound on $\max_x \log \mu(x, y)$ given $y \sim \nu$.¹
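The variational formula can be checked numerically on a small finite set. In this sketch the list `w` plays the role of $x \mapsto \mu(x, y)$ for one fixed $y$, with arbitrary positive values chosen at random; the function name `free_energy` is illustrative. The maximum is attained at the conditional distribution $\mu'(x) = \mu(x, y) / \mu(y)$.

```python
import math
import random

def free_energy(q, w):
    """E_{x ~ q} log w(x) + H(q) for a distribution q and positive weights w on a finite set."""
    return sum(qx * (math.log(wx) - math.log(qx)) for qx, wx in zip(q, w) if qx > 0)

rng = random.Random(1)
w = [rng.uniform(0.01, 1.0) for _ in range(8)]   # stand-in for x -> mu(x, y) at a fixed y
log_marginal = math.log(sum(w))                  # log mu(y) = log sum_x mu(x, y)

posterior = [wx / sum(w) for wx in w]            # candidate maximizer: q(x) = mu(x | y)
print(abs(free_energy(posterior, w) - log_marginal) < 1e-12)

for _ in range(1000):                            # no other distribution q should do better
    raw = [rng.random() for _ in w]
    q = [r / sum(raw) for r in raw]
    assert free_energy(q, w) <= log_marginal + 1e-12
```

The gap between $\log \mu(y)$ and the objective at a given $\mu'$ equals the Kullback-Leibler divergence from $\mu'$ to the conditional distribution, which is why the inequality in the loop always holds.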