# Theory of Computing Blog Aggregator

### Special Topics in Complexity Theory, Lectures 6-7

from Emanuele Viola

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

### 1 Lecture 6-7, Scribe: Willy Quach

In these lectures, we introduce $k$-wise indistinguishability and link this notion to the approximate degree of a function. Then, we study the approximate degree of some functions, namely, the AND function and the AND-OR function. For the latter function we begin to see a proof that is different (either in substance or language) from the proofs in the literature. We begin with some LaTeX tips.

#### 1.1 Some LaTeX tips.

• Mind the punctuation. Treat equations as part of the phrases; add commas, colons, etc. accordingly.
• In math mode, it is usually better to use \ell ($\ell$) rather than regular l. The latter can be confused with 1.
• Align equations with \begin{align} $\cdots$ \end{align} with the alignment character &.
• For set inclusion, use \subset ($\subset$) only if you mean proper inclusion (which is uncommon). Otherwise use \subseteq ($\subseteq$). (Not everybody agrees with this, but this seems the most natural convention by analogy with $<$ and $\le$.)

#### 1.2 Introducing k-wise indistinguishability.

We studied previously the following questions:

• What is the minimum $k$ such that any $k$-wise independent distribution $P$ over $\{0,1\}^n$ fools $\mathrm {AC}^0$ (i.e. $\mathbb {E}C(P) \approx \mathbb {E}C(U)$ for all $poly(n)$-size circuits $C$ with constant depth)?

We saw that $k = \log ^{\mathcal {O}(d)}(s/\epsilon )$ is enough.

• What is the minimum $k$ such that $P$ fools the AND function?

Taking $k=\mathcal {O}(1)$ for $\epsilon =\mathcal {O}(1)$ suffices (more precisely we saw that $k$-wise independence fools the AND function with $\epsilon = 2^{-\Omega (k)}$).

Consider now two distributions $P$ and $Q$ over $\{0,1\}^n$ that are $k$-wise indistinguishable, that is, the projections of $P$ and $Q$ onto any $k$ bits have the same distribution. We can ask similar questions:

• What is the minimum $k$ such that $\mathrm {AC}^0$ cannot distinguish $P$ and $Q$ (i.e. $\mathbb {E}C(P) \approx \mathbb {E}C(Q)$ for all $poly(n)$-size circuits $C$ with constant depth)?

It turns out this requires $k \geq n^{1-o(1)}$: there are some distributions that are almost always distinguishable in this regime. (Whether $k=\Omega (n)$ is necessary or not is an open question.)

Also, $k = n\left (1- \frac {1}{polylog(n)}\right )$ suffices to fool $\mathrm {AC}^0$ (in which case $\epsilon$ is essentially exponentially small).

• What is the minimum $k$ such that the AND function (on $n$ bits) cannot distinguish $P$ and $Q$?

It turns out that $k=\Theta (\sqrt {n})$ is necessary and sufficient. More precisely:

• There exist $P,Q$ over $\{0,1\}^n$ that are $c\sqrt {n}$-wise indistinguishable for some constant $c$, but such that:
\begin{aligned} \left | \Pr _P [AND(P)=1] - \Pr _Q [AND(Q)=1] \right | \geq 0.99 \,;\end{aligned}
• For all $P, Q$ that are $c'\sqrt {n}$-wise indistinguishable for some bigger constant $c'$, we have:
\begin{aligned} \left | \Pr _P [AND(P)=1] - \Pr _Q [AND(Q)=1] \right | \leq 0.01 \,.\end{aligned}

#### 1.3 Duality.

Those questions are actually equivalent to questions about approximation by real-valued polynomials:

Theorem 1. Let $f:\{0,1\}^n \rightarrow \{0,1\}$ be a function, and $k$ an integer. Then:

\begin{aligned} \max _{P,Q \, k\text {-wise indist.}} \left | \mathbb {E}f(P)-\mathbb {E}f(Q) \right | = \min \{ \, \epsilon \, | \, \exists g\in \mathbb {R}_k[X]: \forall x, \left |f(x)-g(x) \right | \leq \epsilon \}. \end{aligned}

Here $\mathbb {R}_k[X]$ denotes degree-$k$ real polynomials. We will denote the right-hand side $\epsilon _k(f)$.

Some examples:

• $f=1$: then $\mathbb {E}f(P)=1$ for every distribution $P$, so that both sides of the equality are $0$.
• $f(x) = \sum _i x_i \bmod 2$ the parity function on $n$ bits.

Then for $k = n-1$, the left-hand side is at least $1/2$: take $P$ to be uniform; and $Q$ to be uniform on the first $n-1$ bits, defining the $n$th bit $Q_n = \sum _{i < n} Q_i \bmod 2$ to be the parity of the first $n-1$ bits. Then $\mathbb {E}f(P)=1/2$ but $\mathbb {E}f(Q)=0$.
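
The parity example can be checked by brute force for a small $n$. The following is a minimal sketch, assuming distributions are stored as dictionaries from bit tuples to exact probabilities; the helper names (`marginal`, `k_wise_indistinguishable`, `expectation`) and the choice $n=4$ are ours, for illustration only.

```python
from fractions import Fraction
from itertools import combinations, product


def marginal(dist, positions):
    """Distribution of the bits at `positions` under `dist` (dict: tuple -> prob)."""
    out = {}
    for x, pr in dist.items():
        key = tuple(x[i] for i in positions)
        out[key] = out.get(key, Fraction(0)) + pr
    return out


def k_wise_indistinguishable(p, q, n, k):
    """True if every projection onto k bits has the same distribution in p and q.

    Checking the subsets of size exactly k is enough: smaller projections
    are marginals of these.
    """
    return all(marginal(p, s) == marginal(q, s) for s in combinations(range(n), k))


def expectation(f, dist):
    return sum(pr * f(x) for x, pr in dist.items())


def parity(x):
    return sum(x) % 2


n = 4
# P: uniform over {0,1}^n.
P = {x: Fraction(1, 2**n) for x in product((0, 1), repeat=n)}
# Q: uniform on the first n-1 bits; the last bit is their parity.
Q = {y + (parity(y),): Fraction(1, 2 ** (n - 1)) for y in product((0, 1), repeat=n - 1)}

print(k_wise_indistinguishable(P, Q, n, n - 1))  # all (n-1)-bit projections agree
print(expectation(parity, P), expectation(parity, Q))  # 1/2 versus 0
```

Exact rational arithmetic (`Fraction`) is used so that the equality test between marginals is not affected by rounding.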

Furthermore, we have:

Claim 2. $\epsilon _{n-1}(\mathrm {Parity}) \geq 1/2$.

Proof. Suppose by contradiction that some polynomial $g$ of degree $k \leq n-1$ approximates Parity with error $\epsilon < 1/2$.

The key ingredient is to symmetrize a polynomial $p$, by letting

\begin{aligned} p^{sym}(x) := \frac {1}{n!} \sum _{\pi \in \mathfrak {S}_n} p(\pi x),\end{aligned}

where $\pi$ ranges over permutations. Note that $p^{sym}(x)$ only depends on $\|x\| = \sum _i x_i$.

Now we claim that there is a univariate polynomial $p'$ also of degree $k$ such that

\begin{aligned} p'(\sum x_i) = p^{sym}(x_1, x_2, \ldots , x_n)\end{aligned}

for every $x$.

To illustrate, let $M$ be a monomial of $p$. For instance if $M = X_1$, then $p'(i) = i/n$, where $i$ is the Hamming weight of the input. (For this we think of the input as being $\in \{0,1\}$. Similar calculations can be done for $\in \{-1,1\}$.)

If $M = X_1 X_2$, then $p'(i) = \frac {i}{n}\cdot \frac {i-1}{n-1}$ which is quadratic in $i$.

And so on.

More generally $p^{sym}(X_1,\dots ,X_n)$ is a symmetric polynomial. As $\{(\sum _j X_j)^\ell \}_{\ell \leq k}$ form a basis of symmetric polynomials of degree $k$, $p^{sym}$ can be written as a linear combination in this basis. Now note that $\{(\sum _j X_j)^{\ell } (x)\}_{\ell \leq k}$ only depends on $\|x\|$; substituting $i = \sum _j X_j$ gives that $p'$ is of degree $\leq k$ in $i$.

(Note that the degree of $p'$ can be strictly less than the degree of $p$ (e.g. for $p(X_1,X_2) = X_1-X_2$: we have $p^{sym} = p' = 0$).)
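
The symmetrization step can be checked by brute force on a few small polynomials. This is a minimal sketch, assuming inputs in $\{0,1\}^n$ with $n=4$; the function `symmetrize` and the two test polynomials are our own illustrative choices. It averages over all $n!$ permutations, so it is only meant for tiny $n$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def symmetrize(p, n):
    """Return p_sym with p_sym(x) = (1/n!) * sum over permutations pi of p(pi x)."""
    perms = list(permutations(range(n)))

    def p_sym(x):
        total = sum(Fraction(p(tuple(x[j] for j in pi))) for pi in perms)
        return total / factorial(n)

    return p_sym


def monomial(x):
    """M = X_1 X_2."""
    return x[0] * x[1]


def difference(x):
    """p = X_1 - X_2, whose symmetrization vanishes."""
    return x[0] - x[1]


n = 4
m_sym = symmetrize(monomial, n)
d_sym = symmetrize(difference, n)

for x in product((0, 1), repeat=n):
    i = sum(x)  # Hamming weight of x
    # The symmetrized monomial depends only on i and is quadratic in i.
    assert m_sym(x) == Fraction(i * (i - 1), n * (n - 1))
    # The degree can drop: X_1 - X_2 symmetrizes to the zero polynomial.
    assert d_sym(x) == 0
```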

Then, applying symmetrization to $g$: if $g$ is a real polynomial $\epsilon$-close to Parity (in $\ell _\infty$ norm), then $g'$ is also $\epsilon$-close to Parity$'$ (as a convex combination of close values).

Finally, remark that for every integer $j$ we have Parity$'(2j)=0$ and Parity$'(2j+1)=1$ (whenever the argument lies in $\{0,\dots ,n\}$). In particular, as $\epsilon < 1/2$, $g'-1/2$ changes sign between any two consecutive integers in $\{0,\dots ,n\}$ and so has at least $n$ zeroes; since its degree is at most $n-1$, it must be identically zero, which is a contradiction. $\square$

We will now focus on proving the theorem.

Note that one direction is easy: if a function $f$ is closely approximated by a polynomial $g$ of degree $k$, it cannot distinguish two $k$-wise indistinguishable distributions $P$ and $Q$:

\begin{aligned} \mathbb {E}[f(P)]&= \mathbb {E}[g(P)] \pm \epsilon \\ &\stackrel {(*)}{=} \mathbb {E}[g(Q)] \pm \epsilon \\ &= \mathbb {E}[f(Q)] \pm 2\epsilon \, , \end{aligned}

where $(*)$ comes from the fact that $P$ and $Q$ are $k$-wise indistinguishable.

The general proof goes by Linear Programming Duality (aka finite-dimensional Hahn-Banach theorem, min-max, etc.). This states that:

If $A \in \mathbb {R}^{n\times m}$, $x\in \mathbb {R}^m$, $b\in \mathbb {R}^n$ and $c\in \mathbb {R}^m$, then:

$\left . \begin {array}{rrcl} &\min \langle c,x \rangle &=& \sum _{i \leq m} c_i x_i\\ &&\\ \text { subject to:} &Ax &=& b\\ &x &\geq & 0\\ \end {array} \right | \, = \, \left | \begin {array}{cc} &\max \langle b,y \rangle \\ &\\ \text { subject to:} &A^T y \leq c\\ &\\ \end {array} \right .$

We can now prove the theorem:

Proof.

The proof will consist in rewriting the sides of the equality in the theorem as outputs of a Linear Program. Let us focus on the left side of the equality: $\max _{P,Q \, k\text {-wise indist.}} \left | \mathbb {E}f(P)-\mathbb {E}f(Q) \right |$.

We will introduce $2^{n+1}$ variables, namely $P_x$ and $Q_x$ for every $x\in \{0,1\}^n$, which will represent $\Pr [D=x]$ for $D=P,Q$.

We will also use the following, which can be proved similarly to the Vazirani XOR Lemma:

Claim 3. Two distributions $P$ and $Q$ are $k$-wise indistinguishable if and only if: $\forall S\subseteq \{1,\dots ,n\}$ with $|S|\leq k$, $\sum _x P_x \chi _S(x) - \sum _x Q_x \chi _S(x)=0$, where $\chi _S(X) = \prod _S X_i$ is the Fourier basis of boolean functions.

The quantity $\max _{P,Q \, k\text {-wise indist.}} \left | \mathbb {E}f(P)-\mathbb {E}f(Q) \right |$ can then be rewritten:

$\begin {array}{rrl} &-\min \sum _x P_xf(x) - \sum _x Q_xf(x)\\ &&\\ \text { subject to:} &\sum _x P_x &= 1\\ &\sum _x Q_x &= 1\\ &\forall S \subseteq \{1,\dots ,n\} \text { s.t. } |S|\leq k,\sum _x (P_x - Q_x) \chi _S(x) &= 0 \\ \end {array}$

Following the syntax of LP Duality stated above, we have:

$c^T = \overbrace {\cdots f(x) \cdots }^{2^n}\overbrace {\cdots -f(x) \cdots }^{2^n} \in \mathbb {R}^{2^{n+1}}$, (where $x$ goes over $\{0,1\}^n$),

$x^T = \overbrace {\cdots P_x \cdots }^{2^n}\overbrace {\cdots Q_x \cdots }^{2^n} \in \mathbb {R}^{2^{n+1}}$,

$b^T = 1 1 \overbrace {0\cdots 0}^{\# S}$,

$A = \left ( \begin {array}{cc} \overbrace {1\cdots \cdots 1}^{2^n} & \overbrace {0\cdots \cdots 0}^{2^n} \\ 0 \cdots \cdots 0 & 1 \cdots \cdots 1 \\ \cdots \cdots & \cdots \cdots \\ \vdots \cdots \cdots \vdots & \vdots \cdots \cdots \vdots \\ \cdots \chi _S(x) \cdots & \cdots -\chi _S(x) \cdots \\ \vdots \cdots \cdots \vdots & \vdots \cdots \cdots \vdots \\ \cdots \cdots & \cdots \cdots \\ \end {array} \right )$,

where the rows of $A$ except the first two correspond to some $S \subseteq \{1,\dots ,n\}$ such that $|S|\leq k$.

We apply LP duality. We shall denote the new set of variables by

$y^T = d \, d'\, \overbrace {\cdots d_S\cdots }^{\#S}$.

We have the following program:

$\begin {array}{rrl} &-\max d+d'\\ &&\\ \text { subject to:} &\forall x, d + \sum _S d_S \chi _S(x) &\leq f(x)\\ &\forall x, d' - \sum _S d_S \chi _S(x) &\leq -f(x)\\ \end {array}$

Writing $d' = -d-\epsilon$, the objective becomes to minimize $\epsilon$, while the second set of constraints can be rewritten:

\begin{aligned} \forall x, d+\epsilon + \sum _S d_S\chi _S(x) \geq f(x) \, . \end{aligned}

The expression $d + \sum _S d_S \chi _S(X)$ is an arbitrary degree-$k$ polynomial which we denote by $g(X)$. So our constraints become

\begin{aligned} g(x) &\leq f(x)\\ g(x) + \epsilon &\geq f(x). \end{aligned}

Here $g$ ranges over all degree-$k$ polynomials, and we are trying to minimize $\epsilon$. Because $g$ is always below $f$, but adding $\epsilon$ brings it above $f$, $g$ is always within $\epsilon$ of $f$. $\square$

#### 1.4 Approximate Degree of AND.

Let us now study the AND function on $n$ bits. Let us denote by $d_{\epsilon }(f)$ the minimal degree of a polynomial approximating $f$ with error $\epsilon$.

We will show that $d_{1/3}($AND$) = \Theta (\sqrt {n})$.

Let us first show the upper bound:

Claim 4. We have:

\begin{aligned}d_{1/3}(\text {AND}) = \mathcal {O}(\sqrt {n}).\end{aligned}

To prove this claim, we will consider a special family of polynomials:

Definition 5. (Chebychev polynomials of the first kind.)

The Chebychev polynomials (of the first kind) are a family $\{T_k\}_{k\in \mathbb {N}}$ of polynomials defined inductively as:

• $T_0(X) := 1$,
• $T_1(X) := X$,
• $\forall k \geq 1, T_{k+1}(X) := 2X T_k - T_{k-1}$.

Those polynomials satisfy some useful properties:

1. $\forall x \in [-1,1], T_k(x) = \cos (k \arccos (x))\, ,$
2. $\forall x \in [-1,1], \forall k, |T_k(x)| \leq 1 \,$,
3. $\forall x$ such that $|x| \geq 1, |T'_k(x)| \geq k^2 \,$,
4. $\forall k, T_k(1)=1 \,$.

Property $2$ follows from $1$, and property $4$ follows from a direct induction. For a nice picture of these polynomials you should have come to class (or I guess you can check wikipedia). We can now prove our upper bound:
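
The properties above can be checked numerically for a fixed $k$. The following is a minimal sketch, assuming floating-point evaluation through the recurrence of Definition 5; the choice $k=7$, the evaluation grid, and the finite-difference derivative are ours, and the check of Property 3 is only at a few sample points.

```python
import math


def chebyshev(k, x):
    """Evaluate T_k(x) with the recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_cur = 1.0, x  # T_0 and T_1
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur


def chebyshev_derivative(k, x, h=1e-6):
    """Central finite-difference estimate of the derivative of T_k at x."""
    return (chebyshev(k, x + h) - chebyshev(k, x - h)) / (2 * h)


k = 7
grid = [j / 100 for j in range(-100, 101)]  # points of [-1, 1]

# Property 1: T_k(x) = cos(k arccos x) on [-1, 1].
err = max(abs(chebyshev(k, x) - math.cos(k * math.acos(x))) for x in grid)
# Property 2: |T_k(x)| <= 1 on [-1, 1] (up to rounding).
sup = max(abs(chebyshev(k, x)) for x in grid)
# Property 3, sampled at a few points x >= 1: the derivative is at least k^2.
slopes = [chebyshev_derivative(k, x) for x in (1.0, 1.1, 1.5, 2.0)]
# Property 4: T_k(1) = 1.
at_one = chebyshev(k, 1.0)

print(err, sup, min(slopes), at_one)
```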

Proof of Claim 4.

We construct a univariate polynomial $p:\{0,1,\dots ,n\} \rightarrow \mathbb {R}$ such that:

• $\deg p = \mathcal {O}(\sqrt {n})$;
• $\forall i < n, \, |p(i)| \leq 1/3$;
• $|p(n)-1| \leq 1/3$.

In other words, $p$ will be close to $0$ on $[0,n-1]$, and close to $1$ on $n$. Then, we can naturally define the polynomial for the AND function on $n$ bits to be $q(X_1,\dots ,X_n) := p(\sum _i X_i)$, which also has degree $\mathcal {O}(\sqrt {n})$. Indeed, we want $q$ to be close to $0$ if $X$ has Hamming weight less than $n$, while being close to $1$ on $X$ of Hamming weight $n$ (by definition of AND). This will conclude the proof.

Let us define $p$ as follows:

\begin{aligned} \forall i\leq n, \quad p(i):= \frac {T_k\left ( \frac {i}{n-1}\right )}{T_k\left ( \frac {n}{n-1}\right )}. \end{aligned}

Intuitively, this uses the fact that Chebychev polynomials are bounded in $[-1,1]$ (Property 2.) and then increase very fast (Property 3.).

More precisely, we have:

• $p(n)=1$ by construction;
• for $i < n$, we have:

$\left | T_k\left ( \frac {i}{n-1}\right ) \right | \leq 1$ by Property 2.;

$T_k\left ( \frac {n}{n-1}\right ) = T_k\left (1 + \frac {1}{n-1}\right ) \geq 1 + \frac {k^2}{n-1}$ by Property 3. and 4., and therefore for some $k = \mathcal {O}(\sqrt {n})$, we have: $T_k\left ( \frac {n}{n-1}\right ) \geq 3$, so that $|p(i)| \leq 1/3$.

$\square$
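
The construction in this proof can be evaluated directly. Below is a minimal sketch, assuming floating-point arithmetic and the explicit choice $k = \lceil \sqrt {2(n-1)} \rceil$, which is our reading of the bound $1 + \frac {k^2}{n-1} \geq 3$; the function names are illustrative.

```python
import math


def chebyshev(k, x):
    """Evaluate T_k(x) with the recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_cur = 1.0, x  # T_0 and T_1
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur


def and_approximator(n, k):
    """Values p(0), ..., p(n) of p(i) = T_k(i/(n-1)) / T_k(n/(n-1))."""
    denom = chebyshev(k, n / (n - 1))
    return [chebyshev(k, i / (n - 1)) / denom for i in range(n + 1)]


for n in (10, 100, 1000):
    # k^2 >= 2(n-1) gives T_k(n/(n-1)) >= 1 + k^2/(n-1) >= 3.
    k = math.ceil(math.sqrt(2 * (n - 1)))
    values = and_approximator(n, k)
    worst = max(abs(v) for v in values[:n])  # error on Hamming weights 0..n-1
    print(n, k, values[n], worst)
```

Each printed row shows the number of bits, the degree used, the value at Hamming weight $n$ (which is $1$ by construction), and the largest magnitude over the weights below $n$.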

Let us now prove the corresponding lower bound:

Claim 6. We have:

\begin{aligned}d_{1/3}(\text {AND}) = \Omega (\sqrt {n}).\end{aligned}

Proof. Let $p$ be a polynomial that approximates the AND function with error $1/3$. Consider the univariate symmetrization $p'$ of $p$.

We have the following result from approximation theory:

Theorem 7. Let $q$ be a real univariate polynomial such that:

1. $\forall i \in \{0,\dots ,n\}, |q(i)| \leq \mathcal {O}(1)$;
2. $q'(x) \geq \Omega (1)$ for some $x \in [0,n]$.

Then $\deg q = \Omega (\sqrt {n})$.

To prove our claim, it is therefore sufficient to check that $p'$ satisfies conditions 1. and 2., as we saw that $\deg p \geq \deg p'$:

1. We have: $\forall i \in \{0,\dots ,n\}, |p'(i)| \leq 1 + 1/3$ by assumption on $p$;
2. We have $p'(n-1) \leq 1/3$ and $p'(n) \geq 2/3$ (by assumption), so that the mean value theorem gives some $x \in [n-1,n]$ at which the derivative of $p'$ is at least $1/3 = \Omega (1)$.

This concludes the proof. $\square$

#### 1.5 Approximate Degree of AND-OR.

Consider the AND function on $R$ bits and the OR function on $N$ bits. Let AND-OR$:\{0,1\}^{R\times N} \rightarrow \{0,1\}$ be their composition (which outputs the AND of the $R$ outputs of the $OR$ function on $N$-bits (disjoint) blocks).

It is known that $d_{1/3}($AND-OR$) = \Theta (\sqrt {RN})$. To prove the upper bound, we will need a technique to compose approximating polynomials which we will discuss later.

Now we focus on the lower bound. This lower bound was recently proved independently by Sherstov and by Bun and Thaler. We present a proof that is different (either in substance or in language) and which we find more intuitive. Our proof replaces the “dual block method” with the following lemma.

Lemma 8. Suppose that

distributions $A^0, A^1$ over $\{0,1\}^{n_A}$ are $k_A$-wise indistinguishable distributions; and

distributions $B^0, B^1$ over $\{0,1\}^{n_B}$ are $k_B$-wise indistinguishable distributions.

Define $C^0, C^1$ over $\{0,1\}^{n_A \cdot n_B}$ as follows: $C^b$: draw a sample $x \in \{0,1\}^{n_A}$ from $A^b$, and replace each bit $x_i$ by a sample of $B^{x_i}$ (independently).

Then $C^0$ and $C^1$ are $k_A \cdot k_B$-wise indistinguishable.

Proof. Consider any set $S \subseteq \{1,\dots , n_A\cdot n_B \}$ of $k_A \cdot k_B$ bit positions; let us show that they have the same distribution in $C^0$ and $C^1$.

View the $n_A \cdot n_B$ bits as $n_A$ blocks of $n_B$ bits. Call a block $K$ of $n_B$ bits heavy if $|S\cap K| \geq k_B$; call the other blocks light.

There are at most $k_A$ heavy blocks by assumption, so that the distribution of the (entire) heavy blocks is the same in $C^0$ and $C^1$ by $k_A$-wise indistinguishability of $A^0$ and $A^1$.

Furthermore, conditioned on any outcome for the $A^b$ samples in $C^b$, the light blocks have the same distribution in both $C^0$ and $C^1$ by $k_B$-wise indistinguishability of $B^0$ and $B^1$.

Therefore $C^0$ and $C^1$ are $k_A \cdot k_B$-wise indistinguishable. $\square$
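
The composition in Lemma 8 can be tried out on one small instance. This is a minimal sketch under our own choice of example: $A^b$ and $B^b$ are both taken to be the uniform distribution over 3-bit strings of parity $b$, which are 2-wise indistinguishable, so the lemma predicts that $C^0$ and $C^1$ (over 9 bits) are 4-wise indistinguishable. The helper names are hypothetical, and checking one instance is of course not a proof.

```python
from fractions import Fraction
from itertools import combinations, product


def parity_dist(n, b):
    """Uniform distribution over n-bit strings of parity b (dict: tuple -> prob)."""
    support = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == b]
    return {x: Fraction(1, len(support)) for x in support}


def compose(a, b0, b1):
    """Sample x from a, then replace each bit x_i by an independent sample of B^{x_i}."""
    inner = (b0, b1)
    out = {}
    for x, px in a.items():
        # One (block, probability) choice per bit of x.
        for choice in product(*(inner[xi].items() for xi in x)):
            y = tuple(bit for block, _ in choice for bit in block)
            pr = px
            for _, pb in choice:
                pr *= pb
            out[y] = out.get(y, Fraction(0)) + pr
    return out


def marginal(dist, positions):
    """Distribution of the bits at `positions` under `dist`."""
    out = {}
    for x, pr in dist.items():
        key = tuple(x[i] for i in positions)
        out[key] = out.get(key, Fraction(0)) + pr
    return out


def k_wise_indistinguishable(p, q, n, k):
    """True if every projection onto k bits has the same distribution in p and q."""
    return all(marginal(p, s) == marginal(q, s) for s in combinations(range(n), k))


n_a = n_b = 3
k_a = k_b = 2
A0, A1 = parity_dist(n_a, 0), parity_dist(n_a, 1)
B0, B1 = parity_dist(n_b, 0), parity_dist(n_b, 1)
C0, C1 = compose(A0, B0, B1), compose(A1, B0, B1)

print(k_wise_indistinguishable(A0, A1, n_a, k_a))              # hypothesis on A
print(k_wise_indistinguishable(B0, B1, n_b, k_b))              # hypothesis on B
print(k_wise_indistinguishable(C0, C1, n_a * n_b, k_a * k_b))  # conclusion of the lemma
```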

### Call for Nominations: SIGecom Doctoral Dissertation Award

[This announcement is posted on behalf of the SIGecom Doctoral Dissertation Award Committee.]

The SIGecom Doctoral Dissertation Award recognizes an outstanding dissertation in the field of economics and computer science. The award is conferred annually at the ACM Conference on Economics and Computation and includes a plaque, complimentary conference registration, and an honorarium of \$1,500. A plaque may further be given to up to two runners-up. No award may be conferred if the nominations are judged not to meet the standards for the award.

To be eligible, a dissertation must be on a topic related to the field of economics and computer science and must have been defended successfully during the calendar year preceding the year of the award presentation. The next SIGecom Doctoral Dissertation Award will be given for dissertations defended in 2017.

Nominations are due by March 31, 2018, and must be submitted by email with the subject “SIGecom Doctoral Dissertation Award” to the awards committee at sigecom-awards-diss@acm.org. A dissertation may be nominated simultaneously for both the SIGecom Doctoral Dissertation Award and the ACM Doctoral Dissertation Award. Nominations may be made by any member of SIGecom, and will typically come from the dissertation supervisor. Self-nomination is not allowed.

Nominations for the award must include the following, preferably in a single PDF file:

1. A two-page summary of the dissertation, written by the nominee, including bibliographic data and links to publicly accessible versions of published papers based primarily on the dissertation.
2. An English-language version of the dissertation.
3. An endorsement letter of no more than two pages by the nominator, arguing the merit of the dissertation, potential impact, and justification of the nomination. This document should also certify the dissertation defense date.
4. The names, email addresses, and affiliations of at least two additional endorsers.

The additional endorsement letters themselves should be emailed directly to sigecom-awards-diss@acm.org, by the same deadline. These endorsements should be no longer than 500 words, and should specify the relationship of the endorser to the nominee, contributions of the dissertation, and its potential impact on the field.

It is expected that a nominated candidate, if selected for the award, will attend the next ACM Conference on Economics and Computation to accept the award and give a presentation on the dissertation work. The cost of attending the conference is not covered by the award, but complimentary registration is provided.

Award Committee

• Nicole Immorlica, Microsoft Research New England
• Ariel Procaccia, Carnegie Mellon University
• Aaron Roth, University of Pennsylvania

by robertkleinberg at October 17, 2017 07:36 AM UTC

### Beyond Worst-Case Analysis: Lecture 9

from Luca Trevisan

Scribed by Chinmay Nirkhe

In which we explore the Stochastic Block Model.

#### 1. The ${G_{n,p,q}}$ problem

The Stochastic Block Model is a generic model for graphs generated by some parameters. The simplest model, and the one we will consider today, is the ${G_{n,p,q}}$ problem.

Definition 1 (${G_{n,p,q}}$ graph distribution) The ${G_{n,p,q}}$ distribution is a distribution on graphs of ${n}$ vertices where ${V}$ is partitioned into two subsets of equal size: ${V = V_1 \sqcup V_2}$. Then for each pair ${\{i,j\}}$ of vertices in the same subset, ${\Pr( (i,j) \in E) = p}$ and otherwise ${\Pr( (i,j) \in E) = q}$.

We will only consider the regime under which ${p > q}$. If we want to find the partition ${V = V_1 \sqcup V_2}$, it is intuitive to look at the problem of finding the minimum balanced cut. The cut ${(V_1, V_2)}$ has expected size ${q n^2 / 4}$ and any other balanced cut will have greater expected size.

Our intuition should be that as ${p \rightarrow q}$, the problem only gets harder. And for fixed ratio ${p/q}$, as ${p,q \rightarrow 1}$, the problem only gets easier.
This can be stated rigorously as follows: if we can solve the problem for ${p,q}$ then we can also solve it for ${cp, cq}$ where ${c > 1}$, by keeping only a ${1/c}$ fraction of the edges and reducing to the case we can solve.

Recall that for the ${k}$-planted clique problem, we found the eigenvector ${{\bf x}}$ corresponding to the largest eigenvalue of ${A - \frac{1}{2} J}$. We then defined ${S}$ as the vertices ${i}$ with the ${k}$ largest values of ${|x_i|}$ and cleaned up ${S}$ a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the eigenvector of the largest eigenvalue of ${A - \left( \frac{p + q}{2} \right) J}$. Note this is intuitive as the average degree of the graph is ${p(n/2 - 1) + q(n/2) \approx \frac{n}{2} (p+q)}$. The idea is simple: compute the eigenvector ${{\bf x}}$ corresponding to the largest eigenvalue and define

$\displaystyle V_1 = \{ i : x_i > 0\}, \qquad V_2 = \{ i : x_i \leq 0 \} \ \ \ \ \ (1)$

As we proceed to the analysis of this procedure, we fix ${V_1, V_2}$. Prior to fixing, the average adjacency matrix was ${\left( \frac{p+q}{2} \right) J}$. (The diagonal should be zeroes, but this is close enough.) Upon fixing ${V_1, V_2}$, the average adjacency matrix ${R}$ looks different. For ease of notation, if we write a bold constant ${{\bf c}}$ for a matrix, we mean the matrix ${cJ}$. It will be clear from context.

$\displaystyle R = \left( \begin{array}{c | c} {\bf p} & {\bf q} \\ {\bf q} & {\bf p} \end{array} \right) \ \ \ \ \ (2)$

Here we have broken up ${R}$ into blocks according to the partition ${V_1, V_2}$.

Theorem 2 If ${p, q > \log n / n}$ then with high probability, ${\lVert A - R \rVert < O \left(\sqrt{n(p+q)} \right)}$.

Proof: Define the graph ${G_1}$ as the union of a ${G_{n/2, p}}$ graph on ${V_1}$ and a ${G_{n/2, p}}$ graph on ${V_2}$. Define the graph ${G_2}$ as a ${G_{n, q}}$ graph.
Note that the graph ${G}$ is distributed according to picking a ${G_1}$ and ${G_2}$ graph and adding the partition crossing edges of ${G_2}$ to ${G_1}$. Let ${A_1}$ and ${A_2}$ be the respective adjacency matrices and define the following submatrices:

$\displaystyle A_1 = \left( \begin{array}{c | c} A_1' & \\ & A_1'' \end{array} \right), \qquad A_2 = \left( \begin{array}{c | c} A_2' & A_2''' \\ {A_2'''}^\dagger & A_2'' \end{array} \right). \ \ \ \ \ (3)$

Then the adjacency matrix ${A}$ is defined by

$\displaystyle A = A_1 + A_2 - \left( \begin{array}{c | c} A_2' & \\ & A_2'' \end{array} \right) \ \ \ \ \ (4)$

Similarly, we can generate a decomposition for ${R}$:

$\displaystyle R = \left( \begin{array}{c | c} {\bf p} & \\ & {\bf p} \end{array} \right) + \bigg ( {\bf q} \bigg ) - \left( \begin{array}{c | c} {\bf q} & \\ & {\bf q} \end{array} \right). \ \ \ \ \ (5)$

Then using the triangle inequality we can bound ${\lVert A - R \rVert}$ by bounding the difference in the various terms.

$\displaystyle \begin{aligned} \lVert A - R \rVert &\leq \left \lVert A_1 - \left(\begin{array}{c | c} {\bf p} & \\ & {\bf p} \end{array} \right) \right \rVert + \left \lVert A_2 - ({\bf q}) \right \rVert + \left \lVert \left( \begin{array}{c | c} A_2' & \\ & A_2'' \end{array} \right) - \left( \begin{array}{c | c} {\bf q} & \\ & {\bf q} \end{array} \right) \right \rVert \\ &\leq O(\sqrt{np}) + O(\sqrt{nq}) + O(\sqrt{nq}) \\ &= O \left( \sqrt{n (p+q)} \right) \end{aligned} \ \ \ \ \ (6)$

The last line follows as the submatrices are adjacency matrices of ${G_{n,p}}$ graphs and we can apply the results we proved in that regime for ${p, q > \log n / n}$. $\Box$

But the difficulty is that we don’t know ${R}$ as ${R = R(V_1, V_2)}$. If we knew ${R}$, then we would know the partition. What we can compute is ${\left \lVert A - \left(\frac{p+q}{2} \right) J \right \rVert}$. (The rest of this proof actually doesn’t even rely on knowing ${p}$ or ${q}$. We can estimate ${p + q}$ by calculating the average vertex degree.) We can rewrite ${R}$ as

$\displaystyle R = \left( \frac{p+q}{2}\right) J + \frac{p-q}{2} \left( \begin{array}{c | c} {\bf 1} & - {\bf 1} \\ - {\bf 1} & {\bf 1} \end{array} \right) \ \ \ \ \ (7)$

Call the matrix on the right ${C}$. It is clearly rank-one as it has decomposition ${n \chi \chi^\dagger}$ where ${\chi = \frac{1}{\sqrt{n}}\left( \begin{array}{c} {\bf 1} \\ -{\bf 1} \end{array} \right)}$. Therefore

$\displaystyle \left \lVert \left( A - \left( \frac{p+q}{2} \right) J \right) - \left( \frac{p-q}{2} \right) C \right \rVert = \left \lVert A - R \right \rVert \leq O \left( \sqrt{n (p+q)} \right). \ \ \ \ \ (8)$

Then ${A - \left( \frac{p+q}{2} \right) J}$ is close (in operator norm) to the rank 1 matrix ${\left( \frac{p-q}{2} \right) C}$. Then their largest eigenvalues are close. But since ${\left( \frac{p-q}{2} \right) C}$ has only one non-zero eigenvalue, with eigenvector ${\chi}$, the eigenvector corresponding to the largest eigenvalue of ${A - \left( \frac{p+q}{2} \right) J}$ will be close to the ideal partition, as ${C}$ describes the ideal partition. This can be formalized with the Davis-Kahan Theorem.

Theorem 3 (Davis-Kahan) Given matrices ${M, M'}$ with ${\lVert M - M' \rVert \leq \varepsilon}$ where ${M}$ has eigenvalues ${\lambda_1 \geq \ldots \geq \lambda_n}$ and corresponding eigenvectors ${{\bf v}_1, \ldots, {\bf v}_n}$ and ${M'}$ has eigenvalues ${\lambda_1' \geq \ldots \geq \lambda_n'}$ and corresponding eigenvectors ${{\bf v}_1', \ldots, {\bf v}_n'}$, then

$\displaystyle \sin \left( \mathrm{angle} \left( \mathrm{span}({\bf v}_1), \mathrm{span}({\bf v}_1') \right) \right) \leq \frac{\varepsilon}{|\lambda_1' - \lambda_2|} \leq \frac{\varepsilon}{|\lambda_1 - \lambda _2 - \varepsilon|}. \ \ \ \ \ (9)$

Equivalently,

$\displaystyle \min \left \{ \lVert {\bf v}_1 \pm {\bf v}_1' \rVert \right \} \leq \frac{\sqrt{2} \varepsilon}{\lambda_1 - \lambda_2 - \varepsilon}. \ \ \ \ \ (10)$

The Davis-Kahan Theorem with ${M' = A - \left( \frac{p+q}{2} \right) J, M = \left( \frac{p-q}{2} \right) C}$, and ${\varepsilon = O \left( \sqrt{n (p+q)} \right)}$ states that

$\displaystyle \min \left \{ \lVert {\bf v}' \pm \chi \rVert \right \} \leq O \left( \frac{\sqrt{a + b}}{a - b - O \left( \sqrt{a + b} \right)} \right) \ \ \ \ \ (11)$

where ${{\bf v}'}$ is the eigenvector associated with the largest eigenvalue of ${A - \left( \frac{p+q}{2} \right) J}$ and ${a = pn /2, b = qn/2}$ are the expected degrees of the two parts of the graph. Choose between ${\pm {\bf v}'}$ for the one closer to ${\chi}$. Then

$\displaystyle \lVert {\bf v}' - \chi \rVert^2 \leq O \left( \left( \frac{\sqrt{a + b}}{a - b - O\left(\sqrt{a + b}\right)} \right)^2 \right). \ \ \ \ \ (12)$

Recall that ${\sum_i (v_i' - \chi_i)^2 = \lVert {\bf v}' - \chi \rVert^2}$. If ${v_i'}$ and ${\chi_i}$ disagree in sign, then this contributes at least ${1/n}$ to the value of ${\lVert {\bf v}' - \chi \rVert^2}$. Equivalently, ${n \cdot \lVert {\bf v}' - \chi \rVert^2}$ is at least the number of misclassified vertices. It is simple to see from here that if ${a - b \geq c_\varepsilon \sqrt{a + b}}$ then we can bound the number of misclassified vertices by ${\varepsilon n}$. This completes the proof that the proposed algorithm does well in calculating the partition of the Stochastic Block Model.

### Reductions between formal languages

Let EQ = {w : number of a's = number of b's }

Let EQO = { a^n b^n : n ∈ N} (so it's Equal and in Order)

Typically we do the following:

Prove EQO is not regular by the pumping lemma.

Then to show EQ is not regular you say: If EQ was regular then EQ INTERSECT a*b* = EQO is regular, hence EQ is not regular (I know you can also show EQ with the Pumping Lemma but that's not important now.)
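
The identity EQ INTERSECT a*b* = EQO used in this argument can be sanity-checked by brute force on short strings. This is a minimal sketch; the function names and the length bound 10 are our own, and a finite check is only an illustration, not a proof.

```python
import re
from itertools import product

A_STAR_B_STAR = re.compile(r"a*b*")


def in_eq(w):
    """w is in EQ: same number of a's and b's."""
    return w.count("a") == w.count("b")


def in_eqo(w):
    """w is in EQO: w = a^n b^n for some n >= 0."""
    half = len(w) // 2
    return w == "a" * half + "b" * half


def identity_holds_up_to(max_len):
    """Check EQ ∩ a*b* = EQO on every string over {a, b} of length <= max_len."""
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            w = "".join(letters)
            in_intersection = in_eq(w) and A_STAR_B_STAR.fullmatch(w) is not None
            if in_intersection != in_eqo(w):
                return False
    return True


print(identity_holds_up_to(10))
```
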
One can view this as a reduction: A ≤ B if one can take B, apply a finite sequence of closure operations (e.g., intersect with a regular lang, complement, replace all a with aba, replace all a with e (empty string)) and get A.

If A is not regular and A ≤ B then B is not regular.

Note that

EQO ≤ EQ ≤ \overline{EQ}

Since EQO is not regular (by pumping) we get EQ and \overline{EQ} are not regular.

Hence we could view the theory of showing things not-reg like the theory of NP completeness with reductions and such. However, I have never seen a chain of more than length 2.

BUT- consider the following! Instead of using Pumping Lemma we use Comm. Comp. I have been able to show (and this was well known) that EQ is not regular by using Comm. Comp:

EQH = { (x,y) : |x|=|y| and number of a's in xy = number of b's in xy }

Comm Complexity of EQH is known to be log n + Θ(1). Important- NOT O(1).

If EQ is regular then Alice and Bob have an O(1) protocol: Alice runs x through the DFA and transmits to Bob the state, then Bob runs y from that state to the end and transmits 1 if ended up in an accept state, 0 if not.

But I was not able to show EQO is not regular using Comm Complexity. SO imagine a bizarro world where I taught my students the Comm Comp approach but not the Pumping Lemma. Could they prove that EQO is not regular? For one thing, could they prove EQO ≤ EQ? Or show that this CANNOT be done. Anyone know?

One could also study the structure of the degrees induced by the equiv classes. If this has already been done, let me know in the comments.

by GASARCH (noreply@blogger.com) at October 16, 2017 03:59 PM UTC

### If Quantum Computers are not Possible Why are Classical Computers Possible?

from Gil Kalai

As most of my readers know, I regard quantum computing as unrealistic. You can read more about it in my Notices AMS paper and its extended version (see also this post) and in the discussion of Puzzle 4 from my recent puzzles paper (see also this post).
The amazing progress and huge investment in quantum computing (that I presented and update routinely in this post) will put my analysis to test in the next few years.

Guy Kindler and I identified a very primitive complexity class LDP that describes noisy quantum systems in the small scales (few bosons, qubits etc.).

Today I would like to discuss, with the help of illustrations, a central issue in the debate about quantum computing: In view of a putative impossibility (and obvious difficulty) of quantum computing, why is classical computing possible (and so common)? This was a major challenge that Aram Harrow raised in our 2012 debate (in this post), and it goes back to Dave Bacon and others. My 2014 work with Guy Kindler on noisy BosonSampling (taking them as an analogy for noisy quantum circuits) leads to a fairly detailed answer that I will explain below, mainly with illustrations.

## The picture behind the question

A common view: in principle, there is no difference between achieving quantum computing via quantum error-correcting codes and achieving classical computing via classical error-correcting codes.

## Why are classical information and computation possible?

Encoding by repetition, and decoding by “majority” are both supported by low degree polynomials

Encoding by repetition refers to a logical ‘one’ encoded by many physical ones and the same with zero. Approximate version of majority enables decoding. Both encoding and decoding are supported by low degree polynomials. (A variant which is more realistic is that one is encoded by 70% ones (say) and zero by 30% ones.)

## Why are robust quantum information and quantum computation impossible?

It is commonly expected that creating good-quality quantum error correcting codes is harder than demonstrating quantum supremacy.
Unlike the repetition/majority mechanism which is supported by very primitive computational power, creating a quantum error correcting code and the easier task of demonstrating quantum supremacy are not likely to be achieved by devices which are very low-level in terms of computational complexity.

This is the difference!

(Quantum supremacy refers to the ability of quantum computers to perform computations which are infeasible for classical computers.)

## The computational complexity picture

Bounded depth computation ($AC^0$) is an extremely primitive computational complexity class. Sampling tasks that can be performed approximately by low degree polynomials represent an even more primitive computational task. (The proof that LDP is in $AC^0$ is quite clever but seems to Guy and me quite irrelevant to quantum computing.)

LDP is so low-level that it allows efficient learning! (This leads to the following prediction: distributions coming from pure quantum states that can be approached by noisy quantum devices allow efficient approximate learning.)

## The underlying principles

Very simple, isn’t it?

The notion of “primitive computational devices” in both principles applies to the asymptotic behavior. The second principle extends to quantum devices that do not involve quantum error-correction.

## Can you be a little more formal?

Let me try. (Trying to understand how insights from the theory of computation relate to real life computation is quite important.)

I don’t know how general this insight is. (I asked about it in TCSoverflow.) Note that this insight gave the rationale for the threshold theorem to start with.

## My older stuff about correlated errors

My earlier work (Section 6 of the Notices paper, and earlier works cited there) proposes other goals that appear to be easier than creating quantum error correcting codes and I expect them to be impossible.

The remarkable IBM machine applies two-qubit CNOT gates.
One can expect that errors for the gated qubits have large positive correlation. This is not controversial but it would be nice to check it. You can implement CNOT gate indirectly by a chain of other gates. I expect that it will not be possible to reduce the positive correlation by a process that reaches the CNOT indirectly. (You can experience the IBM quantum computer here.) ## A question by Ady Stern It is a positive and not terribly small constant! I am not sure what “fundamental” means. (Sometimes, I get corrected when I pronounce my name correctly.) ## The quality of my argument ## But maybe the processes for quantum error-correction have more than meets the eye? ## Why not just improve the physical qubits? Q: But why can’t you simply create good enough qubits to allow universal quantum circuits with 50 qubits? A: This will allow very primitive devices (in terms of the asymptotic behavior of computational complexity) to perform superior computation. Q: This is an indirect explanation. Can you give an explanation on the level of a single qubit? A: Yes, if the qubit is of too high quality, the microscopic process leading to it also demonstrates a primitive device leading to a superior computation. ## Topological quantum computing? What about topological quantum computing? ## Is Nature a malicious opponent of quantum computing? My model of errors is precisely the ordinary model. ## What do you mean by “primitive computational devices”? ## The fate of Quantum Mechanics ## What about the huge efforts and resources for building quantum computers or intermediate tasks? Also the Commonwealth Bank ## I talked about the problem with Dave Bacon in 2006 (Inspiration: math with bad drawing) by Gil Kalai at October 16, 2017 02:00 PM UTC ### Call for Nominations for the SIGecom Doctoral Dissertation Award from Aaron Roth The SIGecom Doctoral Dissertation Award recognizes an outstanding dissertation in the field of economics and computer science. 
The award is conferred annually at the ACM Conference on Economics and Computation and includes a plaque, complimentary conference registration, and an honorarium of $1,500. A plaque may further be given to up to two runners-up. No award may be conferred if the nominations are judged not to meet the standards for the award.

To be eligible, a dissertation must be on a topic related to the field of economics and computer science and must have been defended successfully during the calendar year preceding the year of the award presentation.

The next SIGecom Doctoral Dissertation Award will be given for dissertations defended in 2017. Nominations are due by March 31, 2018, and must be submitted by email with the subject "SIGecom Doctoral Dissertation Award" to the awards committee at sigecom-awards-diss@acm.org. A dissertation may be nominated simultaneously for both the SIGecom Doctoral Dissertation Award and the ACM Doctoral Dissertation Award.

Nominations may be made by any member of SIGecom, and will typically come from the dissertation supervisor. Self-nomination is not allowed. Nominations for the award must include the following, preferably in a single PDF file:

1. A two-page summary of the dissertation, written by the nominee, including bibliographic data and links to publicly accessible versions of published papers based primarily on the dissertation.
2. An English-language version of the dissertation.
3. An endorsement letter of no more than two pages by the nominator, arguing the merit of the dissertation, potential impact, and justification of the nomination. This document should also certify the dissertation defense date.
4. The names, email addresses, and affiliations of at least two additional endorsers.

The additional endorsement letters themselves should be emailed directly to sigecom-awards-diss@acm.org, by the same deadline. These endorsements should be no longer than 500 words, and should specify the relationship of the endorser to the nominee, contributions of the dissertation, and its potential impact on the field.

It is expected that a nominated candidate, if selected for the award, will attend the next ACM Conference on Economics and Computation to accept the award and give a presentation on the dissertation work. The cost of attending the conference is not covered by the award, but complimentary registration is provided.

by Aaron Roth (noreply@blogger.com) at October 16, 2017 01:49 PM UTC

### STOC 2018 CFP

The call for papers for STOC is up; see http://acm-stoc.org/stoc2018/ and here. The deadline is November 3, 2017 4:59pm Eastern Daylight Time.

STOC 2018 will again be a Theory Fest and also celebrate the 50th anniversary of STOC. As part of the “retro” atmosphere, the STOC PC also asks that submissions be sent by mail in 20 printed copies of at most 10 pages.

by Boaz Barak at October 16, 2017 03:12 AM UTC

### The power of sum-of-squares for detecting hidden structures

Authors: Samuel B. Hopkins, Pravesh K. Kothari, Aaron Potechin, Prasad Raghavendra, Tselil Schramm, David Steurer
Abstract: We study planted problems---finding hidden structures in random noisy inputs---through the lens of the sum-of-squares semidefinite programming hierarchy (SoS).

This family of powerful semidefinite programs has recently yielded many new algorithms for planted problems, often achieving the best known polynomial-time guarantees in terms of accuracy of recovered solutions and robustness to noise.

One theme in recent work is the design of spectral algorithms which match the guarantees of SoS algorithms for planted problems.

Classical spectral algorithms are often unable to accomplish this: the twist in these new spectral algorithms is the use of spectral structure of matrices whose entries are low-degree polynomials of the input variables.

We prove that for a wide class of planted problems, including refuting random constraint satisfaction problems, tensor and sparse PCA, densest-k-subgraph, community detection in stochastic block models, planted clique, and others, eigenvalues of degree-d matrix polynomials are as powerful as SoS semidefinite programs of roughly degree d.

For such problems it is therefore always possible to match the guarantees of SoS without solving a large semidefinite program.

Using related ideas on SoS algorithms and low-degree matrix polynomials (and inspired by recent work on SoS and the planted clique problem by Barak et al.), we prove new nearly-tight SoS lower bounds for the tensor and sparse principal component analysis problems. Our lower bounds for sparse principal component analysis are the first to suggest that going beyond existing algorithms for this problem may require sub-exponential time.

### PCPNET: Learning Local Shape Properties from Raw Point Clouds

Authors: Paul Guerrero, Yanir Kleiman, Maks Ovsjanikov, Niloy J. Mitra
Abstract: In this paper, we propose a deep-learning based approach for estimating local 3D shape properties in point clouds. In contrast to the majority of prior techniques that concentrate on global or mid-level attributes, e.g., for shape classification or semantic labeling, we suggest a patch-based learning method, in which a series of local patches at multiple scales around each point is encoded in a structured manner. Our approach is especially well-adapted for estimating local shape properties such as normals (both unoriented and oriented) and curvature from raw point clouds in the presence of strong noise and multi-scale features. Our main contributions include both a novel multi-scale variant of a recently proposed PointNet architecture with emphasis on local shape information, and a series of novel applications in which we demonstrate how learning from training data arising from well-structured triangle meshes, and applying the trained model to noisy point clouds can produce superior results compared to specialized state-of-the-art techniques. Finally, we demonstrate the utility of our approach in the context of shape reconstruction, by showing how it can be used to extract normal orientation information from point clouds.

### High Dimensional Cluster Analysis Using Path Lengths

Authors: Kevin McIlhany, Stephen Wiggins
Abstract: A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_{_{D}}>3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering techniques are used, including spectral clustering; however, new techniques are also introduced based on the path length between partitions that are connected to one another. A Line-Of-Sight algorithm is also developed for clustering. A test bank of 12 data sets with varying properties is used to expose the strengths and weaknesses of each technique. Finally, a robust clustering technique is discussed based on reaching a consensus among the multiple approaches, overcoming the weaknesses found individually.

### Compact Formulae in Sparse Elimination

Authors: Ioannis Emiris
Abstract: It has by now become a standard approach to use the theory of sparse (or toric) elimination, based on the Newton polytope of a polynomial, in order to reveal and exploit the structure of algebraic systems. This talk surveys compact formulae, including older and recent results, in sparse elimination. We start with root bounds and juxtapose two recent formulae: a generating function of the m-Bézout bound and a closed-form expression for the mixed volume by means of a matrix permanent. For the sparse resultant, a bevy of results have established determinantal or rational formulae for a large class of systems, starting with Macaulay. The discriminant is closely related to the resultant but admits no compact formula except for very simple cases. We offer a new determinantal formula for the discriminant of a sparse multilinear system arising in computing Nash equilibria. We introduce an alternative notion of compact formula, namely the Newton polytope of the unknown polynomial. It is possible to compute it efficiently for sparse resultants, discriminants, as well as the implicit equation of a parameterized variety. This leads us to consider implicit matrix representations of geometric objects.

### Robust Submodular Maximization: Offline and Online Algorithms

Authors: Nima Anari, Nika Haghtalab, Joseph Naor, Sebastian Pokutta, Mohit Singh, Alfredo Torrico
Abstract: In this work, we consider the robust submodular maximization problem subject to a matroid constraint in the offline and online setting. In the offline version, we are given a collection of $k$ monotone submodular functions and a matroid on a ground set of size $n$. The goal is to select one independent set that maximizes the minimum of the submodular functions. Given the complexity of the problem, we design a bi-criteria approximation algorithm that returns a set that is the union of a logarithmic number of independent sets. In the online version of the problem, we receive a new collection of functions at each time step and aim to pick an independent set in every stage. We measure the performance of the algorithm in the regret setting. Again, we give a bi-criteria approximation algorithm which gives a (nearly) optimal approximation as well as regret bounds. This result relies crucially on modifying the Follow-the-Perturbed-Leader algorithm of [Kalai and Vempala, 2005].

### TR17-157 | High Degree Sum of Squares Proofs, Bienstock-Zuckerberg hierarchy and Chvatal-Gomory cuts | Monaldo Mastrolilli

from ECCC papers

Chvatal-Gomory (CG) cuts and the Bienstock-Zuckerberg hierarchy capture useful linear programs that the standard bounded degree Lasserre/Sum-of-Squares (SOS) hierarchy fails to capture. In this paper we present a novel polynomial time SOS hierarchy for 0/1 problems with a custom subspace of high degree polynomials (not the standard subspace of low-degree polynomials). We show that the new SOS hierarchy recovers the Bienstock-Zuckerberg hierarchy. Our result implies a linear program that reproduces the Bienstock-Zuckerberg hierarchy as a polynomial sized, efficiently constructible extended formulation that satisfies all constant pitch inequalities. The construction is also very simple, and it is fully defined by giving the supporting polynomials (one paragraph). Moreover, for a class of polytopes (e.g. set covering and packing problems) it optimizes, up to an arbitrarily small error, over the polytope resulting from any constant rounds of CG cuts. Arguably, this is the first example where different basis functions can be useful in asymmetric situations to obtain a hierarchy of relaxations.

from David Eppstein

by David Eppstein at October 15, 2017 03:21 PM UTC

### TR17-156 | Small-depth Multilinear Formula Lower Bounds for Iterated Matrix Multiplication, with Applications | Suryajith Chillara, Nutan Limaye, Srikanth Srinivasan

from ECCC papers

The complexity of Iterated Matrix Multiplication is a central theme in Computational Complexity theory, as the problem is closely related to the problem of separating various complexity classes within $\mathrm{P}$. In this paper, we study the algebraic formula complexity of multiplying $d$ many $2\times 2$ matrices, denoted $\mathrm{IMM}_{d}$, and show that the well-known divide-and-conquer algorithm cannot be significantly improved at any depth, as long as the formulas are multilinear. Formally, for each depth $\Delta \leq \log d$, we show that any product-depth $\Delta$ multilinear formula for $\mathrm{IMM}_d$ must have size $\exp(\Omega(\Delta d^{1/\Delta})).$ It also follows from this that any multilinear circuit of product-depth $\Delta$ for the same polynomial of the above form must have a size of $\exp(\Omega(d^{1/\Delta})).$ In particular, any polynomial-sized multilinear formula for $\mathrm{IMM}_d$ must have depth $\Omega(\log d)$, and any polynomial-sized multilinear circuit for $\mathrm{IMM}_d$ must have depth $\Omega(\log d/\log \log d).$ Both these bounds are tight up to constant factors. Our lower bound has the following consequences for multilinear formula complexity. 1. Depth-reduction: A well-known result of Brent (JACM 1974) implies that any formula of size $s$ can be converted to one of size $s^{O(1)}$ and depth $O(\log s)$; further, this reduction continues to hold for multilinear formulas. On the other hand, our lower bound implies that any depth-reduction in the multilinear setting cannot reduce the depth to $o(\log s)$ without a superpolynomial blow-up in size. 2. Separations from general formulas: Shpilka and Yehudayoff (FnTTCS 2010) asked whether general formulas can be more efficient than multilinear formulas for computing multilinear polynomials. 
Our result, along with a non-trivial upper bound for $\mathrm{IMM}_{d}$ implied by a result of Gupta, Kamath, Kayal and Saptharishi (SICOMP 2016), shows that for any size $s$ and product-depth $\Delta = o(\log s),$ general formulas of size $s$ and product-depth $\Delta$ cannot be converted to multilinear formulas of size $s^{\omega(1)}$ and product-depth $\Delta,$ when the underlying field has characteristic zero.

### Beyond Worst-Case Analysis: Lecture 8

from Luca Trevisan

Scribed by Luowen Qian

In which we use spectral techniques to find certificates of unsatisfiability for random ${k}$-SAT formulas.

1. Introduction

Given a random ${k}$-SAT formula with ${m}$ clauses and ${n}$ variables, we want to find a certificate of unsatisfiability of such a formula in polynomial time. Here we consider ${k}$ as fixed, usually equal to 3 or 4. For fixed ${n}$, the more clauses you have, the more constraints you have, so it becomes easier to show that these constraints are inconsistent. For example, for 3-SAT,

1. In the previous lecture, we showed that if ${m > c_3 \cdot n}$ for some large constant ${c_3}$, almost surely the formula is not satisfiable. But it’s conjectured that there is no polynomial time, or even subexponential time, algorithm that can find a certificate of unsatisfiability for ${m = O(n)}$.
2. If ${m > c \cdot n^2}$ for some other constant ${c}$, we showed last time that, with high probability, we can find in polynomial time a certificate that the formula is not satisfiable.

The algorithm for finding such certificate is shown below.

• Algorithm 3SAT-refute(${f}$)
• for ${b_1 \in \{0,1\}}$
• if 2SAT-satisfiable(${f}$ restricted to clauses that contain ${x_1= \overline b_1}$, with ${x_1:= b_1}$)
• return ${\bot}$
• return UNSATISFIABLE
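The procedure above can be sketched in Python. This is only an illustrative sketch under some assumptions that are not in the lecture: a literal is encoded as a pair `(v, b)` meaning "$x_v = b$", variables are numbered from 1, the names `two_sat_satisfiable` and `refute_3sat` are hypothetical, and the 2-SAT test uses the standard implication-graph and strongly-connected-components method (the lecture only says that 2-SAT is solvable in linear time). Setting $x_1 := b_1$ falsifies the literal $x_1 = \overline{b_1}$, so each clause containing that literal shrinks to a 2-clause.

```python
def two_sat_satisfiable(num_vars, clauses):
    """Decide satisfiability of 2-clauses via strongly connected components.

    Each clause is a pair of literals; a literal (v, b) means "x_v = b".
    """
    def node(v, b):
        return 2 * v + b  # one graph node per literal

    size = 2 * (num_vars + 1)
    graph = [[] for _ in range(size)]
    reverse = [[] for _ in range(size)]
    for (v1, b1), (v2, b2) in clauses:
        # (l1 or l2) gives the implications: not l1 -> l2 and not l2 -> l1
        for src, dst in ((node(v1, 1 - b1), node(v2, b2)),
                         (node(v2, 1 - b2), node(v1, b1))):
            graph[src].append(dst)
            reverse[dst].append(src)

    # Kosaraju, first pass: record finishing order with an iterative DFS
    order, seen = [], [False] * size
    for start in range(size):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                w = graph[u][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(u)

    # second pass: label components on the reversed graph
    comp = [-1] * size
    label = 0
    for start in reversed(order):
        if comp[start] != -1:
            continue
        comp[start] = label
        stack = [start]
        while stack:
            u = stack.pop()
            for w in reverse[u]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
        label += 1

    # unsatisfiable iff some literal shares a component with its negation
    return all(comp[node(v, 0)] != comp[node(v, 1)]
               for v in range(1, num_vars + 1))


def refute_3sat(num_vars, clauses):
    """Return "UNSATISFIABLE" if both settings of x_1 leave an unsatisfiable 2-SAT."""
    for b1 in (0, 1):
        # x_1 := b1 falsifies the literal "x_1 = 1 - b1"
        reduced = [tuple(lit for lit in c if lit != (1, 1 - b1))
                   for c in clauses if (1, 1 - b1) in c]
        if two_sat_satisfiable(num_vars, reduced):
            return None  # no certificate found (the "bottom" outcome)
    return "UNSATISFIABLE"
```

Returning `None` only means that this particular method found no certificate; it says nothing about whether the formula is satisfiable.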

We know that we can solve 2-SATs in linear time, and approximately

$\displaystyle \frac{\binom{n - 1} 2 \cdot m}{\binom n 3 \cdot 2} = \frac{3m}{2n + O(1)} > \frac 3 2 cn - O(1)$

clauses contain ${x_1 = \overline{b_1}}$. As before, when ${c}$ is sufficiently large, these 2-SAT instances will almost surely be unsatisfiable. When a subset of the clauses is not satisfiable, the whole 3-SAT formula is not satisfiable. Therefore we can certify unsatisfiability of random 3-SAT formulas with high probability.

In general for ${k}$-SAT,

1. If ${m > c_k \cdot n}$ for some large constant ${c_k}$, almost surely the formula is not satisfiable.
2. If ${m > c'_k \cdot n^{k - 1}}$ for some other constant ${c'_k}$, we can construct a very similar algorithm, in which we check all assignments to the first ${k-2}$ variables, and see if the 2SAT part of the restricted formula is unsatisfiable.

Since for every fixed assignment to the first ${k - 2}$ variables, approximately

$\displaystyle \frac{\binom{n - k + 2} 2}{\binom n k 2^{k - 2}} = \frac{k!}{(n^{k - 2} + O(n^{k - 3})) 2^{k - 1}}$

portion of the ${m}$ clauses remains, we expect the constant ${c'_k = \Omega\left(\frac{2^k}{k!}\right)}$ and the running time is ${O(2^k m)}$.
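As a sanity check on the counting above, the following short sketch compares the exact fraction of clauses that survive a fixed assignment to the first ${k-2}$ variables with the leading term ${k!/(n^{k-2} 2^{k-1})}$. The function names are hypothetical, and the value of ${n}$ is an arbitrary choice for illustration.

```python
from math import comb, factorial


def surviving_fraction(n, k):
    """Exact fraction of k-clauses that contain k-2 prescribed literals."""
    return comb(n - k + 2, 2) / (comb(n, k) * 2 ** (k - 2))


def leading_term(n, k):
    """The approximation k! / (n^(k-2) * 2^(k-1))."""
    return factorial(k) / (n ** (k - 2) * 2 ** (k - 1))


n = 10_000  # arbitrary illustrative size
for k in (3, 4, 5):
    exact = surviving_fraction(n, k)
    approx = leading_term(n, k)
    print(k, exact, approx, exact / approx)
```

The printed ratio should approach 1 as ${n}$ grows, reflecting the ${O(n^{k-3})}$ lower-order term in the denominator.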

So what about values of ${m}$ that are in between? It turns out that we can do better with spectral techniques. The reason that spectral techniques work better is that, unlike the previous method, they do not proceed by trying possible assignments and checking that each one fails.

2. Reduce certifying unsatisfiability for k-SAT to finding largest independent set

2.1. From 3-SAT instances to hypergraphs

Given a random 3-SAT formula ${f}$, which is an AND of ${m}$ random 3-CNF clauses over ${n}$ variables ${x_1, x_2, ..., x_n}$ (abbreviated as vector ${{\bf x}}$), i.e.

$\displaystyle f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \right),$

where ${\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}$, ${\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3}}$ and no two ${(\sigma_{i,1}, b_{i,1}, \sigma_{i,2}, b_{i,2}, \sigma_{i,3}, b_{i,3})}$ are exactly the same. Construct hypergraph ${H_f = (X, E)}$, where

$\displaystyle X = \left\{(i, b) \middle| i \in [n], b \in \{0, 1\}\right\}$

is a set of ${2n}$ vertices, where each vertex means an assignment to a variable, and

$\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline{b_{j,1}}), (\sigma_{j,2}, \overline{b_{j,2}}), (\sigma_{j,3}, \overline{b_{j,3}})\}$

is a set of ${m}$ 3-hyperedges. The reason we’re putting in the negation of ${b}$ is that a 3-CNF clause evaluates to false if and only if all three subclauses evaluate to false. This will be useful shortly.

First let’s generalize the notion of independent set for hypergraphs.

An independent set for hypergraph ${H = (X, E)}$ is a set ${S \subseteq X}$ that satisfies ${\forall e \in E, e \not \subseteq S}$.

If ${f}$ is satisfiable, ${H_f}$ has an independent set of size at least ${n}$. Equivalently, if the largest independent set of ${H_f}$ has size less than ${n}$, then ${f}$ is unsatisfiable.

Proof: Assume ${f}$ is satisfiable, and let ${{\bf x} \leftarrow {\bf y}}$ be a satisfying assignment, where ${{\bf y} \in \{0, 1\}^n}$. Then ${S = \{ (i, y_i) | i \in [n] \}}$ is an independent set of size ${n}$. If not, some hyperedge ${e_j \subseteq S}$, so ${y_{\sigma_{j,1}} = \overline{b_{j,1}} \land y_{\sigma_{j,2}} = \overline{b_{j,2}} \land y_{\sigma_{j,3}} = \overline{b_{j,3}}}$ and the ${j}$-th clause in ${f}$ evaluates to false under ${{\bf y}}$. Therefore ${f}$ evaluates to false, which contradicts the fact that ${{\bf y}}$ is a satisfying assignment. $\Box$

We know that if we pick a random graph that’s sufficiently dense, i.e. the average degree ${d > \ln n}$, by spectral techniques we will have a certifiable upper bound on the size of the largest independent set of ${O\left(\frac n{\sqrt d}\right)}$ with high probability. So if a random graph has ${\Omega(n \log n)}$ random edges, we can prove that there’s no large independent set with high probability.

But if we have a random hypergraph with ${\Omega(n \log n)}$ random hyperedges, we don’t have any analog of spectral theories for hypergraphs that allow us to do this kind of certification. And from the fact that the problem of certifying unsatisfiability of random formula of ${\Omega(n \log n)}$ clauses is considered to be hard, we conjecture that there doesn’t exist a spectral theory for hypergraphs able to replicate some of the things we are able to do on graphs.

However, what we can do is reduce the hypergraph to a graph, possibly with some loss, and then apply spectral techniques to the graph.

2.2. From 4-SAT instances to graphs

Now let’s look at random 4-SATs. Similarly we will write a random 4-SAT formula ${f}$ as:

$\displaystyle f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \lor x_{\sigma_{i,4}} = b_{i,4} \right),$

where ${\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}$, ${\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3} < \sigma_{i,4}}$ and no two ${(\sigma_{i,1}, b_{i,1}, ..., \sigma_{i,4}, b_{i,4})}$ are exactly the same. The construction is similar to the previous one, but instead of another hypergraph we construct just a graph ${G_f = (V, E)}$, where

$\displaystyle V = \left\{(i_1, b_1, i_2, b_2) \middle| i_1, i_2 \in [n], b_1, b_2 \in \{0, 1\}\right\}$

is a set of ${4n^2}$ vertices and

$\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline {b_{j,1}}, \sigma_{j,2}, \overline {b_{j,2}}), (\sigma_{j,3}, \overline {b_{j,3}}, \sigma_{j,4}, \overline {b_{j,4}})\}$

is a set of ${m}$ edges.

If ${f}$ is satisfiable, ${G_f}$ has an independent set of size at least ${n^2}$. Equivalently, if the largest independent set of ${G_f}$ has size less than ${n^2}$, then ${f}$ is unsatisfiable.

Proof: The proof is very similar to the previous one. Assume ${f}$ is satisfiable, and let ${{\bf x} \leftarrow {\bf y}}$ be a satisfying assignment, where ${{\bf y} \in \{0, 1\}^n}$. Then ${S = \{ (i, y_i, j, y_j) | i, j \in [n] \}}$ is an independent set of size ${n^2}$. If not, some edge ${e_j \subseteq S}$, so ${y_{\sigma_{j,1}} = \overline {b_{j,1}} \land y_{\sigma_{j,2}} = \overline {b_{j,2}} \land y_{\sigma_{j,3}} = \overline {b_{j,3}} \land y_{\sigma_{j,4}} = \overline {b_{j,4}}}$ and the ${j}$-th clause in ${f}$ evaluates to false under ${{\bf y}}$. Therefore ${f}$ evaluates to false, which contradicts the fact that ${{\bf y}}$ is a satisfying assignment. $\Box$
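To make the reduction concrete, here is a small sketch of the graph construction for 4-SAT, together with the set of ${n^2}$ vertices associated with an assignment. The encoding is an assumption made for illustration: a clause is a tuple of four literals `(sigma, b)` meaning "$x_\sigma = b$" with increasing `sigma`, and all function names are hypothetical.

```python
from itertools import product


def build_graph(clauses):
    """Edges of G_f: each 4-clause contributes one edge between two vertices.

    A vertex is a tuple (i1, b1, i2, b2); the bits are negated because a
    clause is falsified exactly when every one of its literals is false.
    """
    edges = set()
    for (s1, b1), (s2, b2), (s3, b3), (s4, b4) in clauses:
        u = (s1, 1 - b1, s2, 1 - b2)
        v = (s3, 1 - b3, s4, 1 - b4)
        edges.add((u, v))
    return edges


def assignment_set(y):
    """The n^2 vertices (i, y_i, j, y_j) consistent with the assignment y."""
    idx = range(1, len(y) + 1)
    return {(i, y[i - 1], j, y[j - 1]) for i, j in product(idx, idx)}


def is_independent(vertex_set, edges):
    """True if no edge has both endpoints inside vertex_set."""
    return not any(u in vertex_set and v in vertex_set for u, v in edges)


def satisfies(y, clauses):
    """True if the 0/1 assignment y satisfies every clause."""
    return all(any(y[s - 1] == b for s, b in c) for c in clauses)
```

For sets of this special form, `is_independent(assignment_set(y), edges)` agrees with `satisfies(y, clauses)`, which is the content of the proof above. The certificate goes further and bounds the size of every independent set, not only those that come from assignments.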

From here, we can observe that ${G_f}$ is not a random graph because some edges are forbidden, for example when the two vertices of the edge have some element in common. But it’s very close to a random graph. In fact, we can apply the same spectral techniques to get a certifiable upper bound on the size of the largest independent set if the average degree ${d > \ln n}$, i.e. if ${m = \Omega(n^2 \log n)}$, we can certify unsatisfiability with high probability, by upper bounding the size of the largest independent set in the constructed graph.

We can generalize this result to all even ${k}$. For random ${k}$-SAT where ${k}$ is even, if ${m > c_k n^{k/2} \log n}$, we can certify unsatisfiability with high probability, which is better than the previous method which requires ${m = \Omega(n^{k - 1})}$. The same ${n^{k/2}(\log n)^{O(1)}}$ is achievable for odd ${k}$, but the argument is significantly more complicated.

2.3. Certifiable upper bound for independent sets in modified random sparse graphs

Apart from the case of odd ${k}$, another question is whether, in this setup, we can do better and get rid of the ${\log n}$ term. This term comes from the fact that spectral norm bounds break down when the average degree ${d < \ln n}$. However, it’s still true that a random graph doesn’t have any large independent sets even when the average degree ${d}$ is constant. It’s just that the spectral norm isn’t giving us good bounds any more, since the spectral norm is at least on the order of the square root of the maximum degree, which is ${\Omega\left(\sqrt{\frac{\log n}{\log \log n}}\right)}$ when ${d}$ is constant. So is there something tighter than spectral bounds that could help us get rid of the ${\log n}$ term? Could we fix this by removing all the high degree vertices in the random graph?

This construction is due to Feige-Ofek. Let ${G \sim G_{n, p}}$ be a random graph whose average degree ${d = np}$ is some large constant. Construct ${G'}$ by taking ${G}$ and removing all edges incident on nodes with degree higher than ${2\bar d}$, where ${\bar d}$ is the average degree of ${G}$. We write ${A}$ for the adjacency matrix of ${G}$ and ${A'}$ for that of ${G'}$. It turns out that the following holds.

With high probability, ${\left\lVert A' - \frac d n J \right\rVert \le O\left(\sqrt d\right)}$.

This turns out to be rather difficult to prove. Previously we saw spectral results on random graphs that use matrix traces to bound the largest eigenvalue. In this case, it’s hard to do so because the contribution to the trace of a closed walk is complicated by the fact that edges have dependencies. The other approach is, given a random matrix ${M}$, to try to upper bound ${\left\lVert M \right\rVert = \max\limits_x \frac {x^T M x} {\lVert x \rVert^2}}$. A standard way to do this is to fix a candidate solution ${x}$, count the instances of ${M}$ for which this fixed ${x}$ is a good solution, and argue that the number of candidate solutions is small enough that, with high probability, there is no good solution at all. The problem here is that the set of solutions is infinitely large. So Feige-Ofek discretize the set of vectors, and then reduce the bound on the quadratic form of a discretized vector to a sum of several terms, each of which has to be carefully bounded.

We always have

$\displaystyle \max \textrm{IndSet}(G) \le \max \textrm{IndSet}(G') \le \frac n d \left\lVert A' - \frac d n J \right\rVert$

and so, with high probability, we get an ${O\left(\frac n {\sqrt d}\right)}$ polynomial-time certifiable upper bound on the size of the largest independent set of a ${G_{n,d/n}}$ random graph. This removes the extra ${\log n}$ term from our analysis of certificates of unsatisfiability for random ${k}$-SAT when ${k}$ is even.
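The certificate just described can be computed numerically. The following is a minimal sketch, assuming NumPy is available; the function names, the seed, and the sizes `n = 400`, `d = 20` are arbitrary choices for illustration and are not from the lecture. It prunes edges at nodes of degree above twice the average, then evaluates ${\frac n d \lVert A' - \frac d n J \rVert}$, which is a valid upper bound on the independence number for every graph (only its quality depends on randomness). A greedy independent set is included as a simple lower bound for comparison.

```python
import numpy as np


def pruned_spectral_bound(adj, d):
    """Upper bound (n/d) * ||A' - (d/n) J|| on the largest independent set.

    adj is a symmetric 0/1 array; d plays the role of the average degree np.
    """
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    keep = degrees <= 2 * degrees.mean()      # nodes of moderate degree
    # drop every edge incident to a high-degree node
    pruned = adj * np.outer(keep, keep)
    shifted = pruned - (d / n) * np.ones((n, n))
    return (n / d) * np.linalg.norm(shifted, 2)  # ord=2: spectral norm


def greedy_independent_set(adj):
    """A simple lower bound: greedily pick pairwise non-adjacent vertices."""
    chosen = []
    for v in range(adj.shape[0]):
        if all(adj[v, u] == 0 for u in chosen):
            chosen.append(v)
    return chosen


rng = np.random.default_rng(0)
n, d = 400, 20
upper = np.triu(rng.random((n, n)) < d / n, k=1).astype(float)
adj = upper + upper.T                          # a sample from G_{n, d/n}
bound = pruned_spectral_bound(adj, d)
print(len(greedy_independent_set(adj)), bound)
```

The gap between the two printed numbers gives a rough sense of how much the spectral certificate overestimates on a single sample.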

3. SDP relaxation of independent sets in random sparse graphs

In order to show a random graph has no large independent sets, a more principled way is to argue that there is some polynomial time solvable relaxation of the problem whose solution is an upper bound of the problem.

Let SDPIndSet${(G)}$ be the optimum of the following semidefinite programming relaxation of the Independent Set problem, which is due to Lovász:

$\displaystyle \begin{array}{rcl} \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \langle {\bf x}_0, {\bf x}_i \rangle = ||{\bf x}_i ||^2 \ \ \ \forall i\in V\\ && \langle {\bf x}_i, {\bf x}_j \rangle = 0 \ \ \ \forall (i,j)\in E \end{array}$

Since it’s the relaxation of the problem of finding the maximum independent set, ${\max \textrm{IndSet}(G) \le \textrm{SDPIndSet}(G)}$ for any graph ${G}$. And this relaxation has a nice property.

For every ${0 < p < 1}$, and for every graph ${G}$, we have

$\displaystyle {\rm SDPIndSet}(G) \leq \frac 1p \cdot || pJ - A ||,$

where ${J}$ is the all-one matrix and ${A}$ is the adjacency matrix of ${G}$.

Proof: First we note that SDPIndSet${(G)}$ is at most

$\displaystyle \begin{array}{rcl} \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}$

and this is equal to

$\displaystyle \begin{array}{rcl} \max && \frac { \left( \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle \right) ^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}$

which is at most

$\displaystyle \begin{array}{rcl} \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}$

because

$\displaystyle \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle = \left\langle \sum_{i\in V} {\bf x}_i , {\bf x}_0\right \rangle \leq \left \| \sum_{i\in V} {\bf x}_i \right \| \cdot || {\bf x}_0 || = \left \| \sum_{i\in V} {\bf x}_i \right \|$

Finally, the above optimization is equivalent to the following

$\displaystyle \begin{array}{rcl} \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}$

which is at most the unconstrained problem

$\displaystyle \begin{aligned} \max \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} &= \max \frac { \sum_{i,j} \left( J - \frac 1p A\right)_{i,j} \langle {\bf x}_i,{\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} \\ &= \lambda_{\max} \left (J - \frac 1p A \right) \\ &\leq \frac 1p || pJ - A||. \end{aligned}$

$\Box$

Recall from the previous section that we constructed ${G'}$ by removing edges from ${G}$, which corresponds to removing constraints in our semidefinite programming problem, so ${\textrm{SDPIndSet}(G) \le \textrm{SDPIndSet}(G') \le \left\lVert J - \frac 1 p A' \right\rVert}$, which is by theorem 3 at most ${O\left(\frac n{\sqrt d}\right)}$ with high probability.

4. SDP relaxation of random k-SAT

From the previous section, we get an idea that we can use semidefinite programming to relax the problem directly and find a certificate of unsatisfiability for the relaxed problem.

Given a random ${k}$-SAT formula ${f}$:

$\displaystyle \begin{array}{rcl} f({\bf x}) &= & \bigwedge\limits_{i = 1}^m \bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\overline{\bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j}}} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\bigwedge\limits_{j = 1}^k x_{\sigma_{i,j}} = \overline{b_{i,j}}}. \end{array}$

The satisfiability of ${f}$ is equivalent to the satisfiability of the following equations:

$\displaystyle \begin{array}{rcl} && x_i^2 = x_i \forall i \in [n] \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}x_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \end{array}$
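The equivalence can be checked by brute force on small instances. In the sketch below (names and encoding are illustrative assumptions: a literal `(sigma, b)` means "$x_\sigma = b$", variables are numbered from 1), the factor ${(-1)^{b}x_\sigma + b}$ equals 1 exactly when ${x_\sigma = \overline{b}}$, so each product is the indicator that a clause is falsified.

```python
from itertools import product


def satisfied_clause_count(x, clauses):
    """Evaluate sum_i (1 - prod_j ((-1)^b * x_sigma + b)) at a 0/1 point x."""
    total = 0
    for clause in clauses:
        falsified = 1
        for sigma, b in clause:
            # this factor is 1 exactly when x_sigma is the negation of b
            falsified *= (-1) ** b * x[sigma - 1] + b
        total += 1 - falsified
    return total


def satisfies(x, clauses):
    """Direct check that x satisfies every clause."""
    return all(any(x[s - 1] == b for s, b in c) for c in clauses)


# a small hypothetical 3-SAT instance over 4 variables
example = [((1, 1), (2, 0), (3, 1)), ((1, 0), (3, 0), (4, 1))]
for x in product((0, 1), repeat=4):
    print(x, satisfied_clause_count(x, example), satisfies(x, example))
```

On 0/1 points the left-hand side counts the satisfied clauses, so it equals ${m}$ exactly for satisfying assignments.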

Notice that if we expand the polynomial on the left side, some of the monomials have degree higher than 2, which prevents us from relaxing these equations to a semidefinite programming problem. In order to resolve this, for every set ${A}$ of at most ${k/2}$ variables we introduce ${x_A = \prod_{i \in A} x_i}$. Then we can relax all variables to be vectors, i.e.

$\displaystyle \begin{array}{rcl} && \lVert {\bf x}_\emptyset \rVert^2 = 1 \\ && \langle {\bf x}_A, {\bf x}_B \rangle = \langle {\bf x}_C, {\bf x}_D \rangle \ \ \ \forall A \cup B = C \cup D \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}{\bf x}_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \ \ \ \textrm{rewritten as quadratic forms of } {\bf x}_A \end{array}$

For example, if we have a 4-SAT clause

$\displaystyle x_3 \lor \overline{x_4} \lor x_7 \lor \overline{x_{10}},$

we can rewrite it as

$\displaystyle \begin{array}{rcl} 1 - (1 - {\bf x}_3) \cdot {\bf x}_4 \cdot (1 - {\bf x}_7) \cdot {\bf x}_{10} &= &1 - {\bf x}_4{\bf x}_{10} + {\bf x}_3{\bf x}_4{\bf x}_{10} + {\bf x}_4{\bf x}_7{\bf x}_{10} - {\bf x}_3{\bf x}_4{\bf x}_7{\bf x}_{10} \\ &= &1 - {\bf x}_{\{4\}}{\bf x}_{\{10\}} + {\bf x}_{\{3,4\}}{\bf x}_{\{10\}} + {\bf x}_{\{4,7\}}{\bf x}_{\{10\}} - {\bf x}_{\{3,4\}}{\bf x}_{\{7,10\}}. \end{array}$
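Multiplying out ${(1 - x_3)\, x_4\, (1 - x_7)\, x_{10}}$ produces the two cubic terms ${x_3x_4x_{10}}$ and ${x_4x_7x_{10}}$. The following short Python sketch (function names are illustrative only) checks, over all 16 Boolean assignments, that the expanded polynomial agrees with the truth value of the clause.

```python
from itertools import product

def clause_truth(x3, x4, x7, x10):
    """Truth value (0 or 1) of x3 OR (NOT x4) OR x7 OR (NOT x10)."""
    return int(bool(x3) or not x4 or bool(x7) or not x10)

def expanded_polynomial(x3, x4, x7, x10):
    """Expansion of 1 - (1 - x3) * x4 * (1 - x7) * x10."""
    return 1 - x4 * x10 + x3 * x4 * x10 + x4 * x7 * x10 - x3 * x4 * x7 * x10

# Assignments on which the two functions disagree; expected to be empty.
mismatches = [
    a for a in product((0, 1), repeat=4)
    if clause_truth(*a) != expanded_polynomial(*a)
]
```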

For this relaxation, we have:

1. If ${m < c(k, n) n^{k/2}}$, the SDP associated with the formula is feasible with high probability, where ${c(k, n) = 1/n^{o(1)}}$ for every fixed ${k}$.
2. If ${m > c'(k, n) n^{k/2}}$, the SDP associated with the formula is not feasible with high probability, where ${c'(k, n)}$ is a constant for every fixed even ${k}$, and ${c'(k, n) = \textrm{poly}(\log n)}$ for every fixed odd ${k}$.

### Beyond Worst-Case Analysis: Lecture 7

from Luca Trevisan

Scribed by Jeff Xu

In which we discussed the planted clique distribution and, specifically, how to find a planted clique in a random graph. We relied heavily on the material from lecture 2 and lecture 3, in which we covered the upper bound certificate for max clique in ${G_{n,\frac{1}{2}}}$. At the end of this class, we wrapped up this topic and started the topic of ${k}$-SAT.

1. Planted Clique

To start with, we describe a distribution of graphs with a planted clique. Suppose that we sample ${G}$ from ${G_{n,\frac{1}{2}}}$ and we want to modify ${G}$ s.t. it has a size ${k}$ clique, i.e., we have a clique ${S \subseteq V}$ with ${\left|S\right|=k}$. The following code describes a sampler for the distribution.

• ${G \leftarrow G_{n,\frac{1}{2}}}$
• Pick a subset of vertices ${S}$ from ${V}$ s.t. ${|S|=k}$
• Independently for each pair ${\{u,v\}}$, make ${\{u,v\}}$ an edge with probability
    • ${1}$ if ${u,v \in S}$
    • ${\frac 12}$ otherwise
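The sampler above can be sketched in a few lines of Python. This is only an illustration: the function name, the adjacency-set representation, and the explicit `rng` argument are choices made here, not part of the lecture.

```python
import random
from itertools import combinations

def sample_planted_clique(n, k, rng):
    """Sample a graph on vertices 0..n-1 with a planted clique of size k.

    Returns (adj, S), where adj[v] is the set of neighbors of v and S is the
    planted clique. Pairs inside S are always edges; every other pair is an
    edge independently with probability 1/2.
    """
    S = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        p = 1.0 if (u in S and v in S) else 0.5
        if rng.random() < p:  # rng.random() lies in [0, 1), so p = 1 always fires
            adj[u].add(v)
            adj[v].add(u)
    return adj, S

# Example usage with a fixed seed for reproducibility.
example_adj, example_S = sample_planted_clique(30, 8, random.Random(0))
```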

Note: We are only interested in the case ${k \geq 2\log n}$, in which the planted clique is, with high probability, larger than any pre-existing clique.

1.1. Finding the planted clique when ${k\gg \sqrt{n\log n}}$

When ${k\gg \sqrt{n\log n}}$, finding the planted clique is easy because the ${k}$ vertices in the planted clique are, with high probability, precisely the ${k}$ vertices of highest degree.

Lemma 1 In ${G_{n,\frac{1}{2}}}$, w.h.p., for every vertex ${v}$, ${\frac{n-1}{2}-\sqrt{n-1} \sqrt{\ln n} \leq}$ deg(${v}$) ${\leq \frac{n-1}{2}+\sqrt{n-1}\sqrt{\ln n}}$.

Proof: For each vertex ${v}$ in a graph ${G\sim G_{n,\frac{1}{2}}}$, ${\deg (v)}$ is a sum of ${n-1}$ independent random bits ${x_i}$, that is, a binomial random variable. By the Chernoff bound,

$\displaystyle \mathop{\mathbb P} \left[\left|\sum_i x_i-\frac{n-1}{2}\right|>t\right] \leq 2e^{-\frac{2t^2}{n-1}}$

Fixing ${t=\sqrt{(n-1)\ln n}}$, so that ${t^2 = (n-1)\ln n}$, the right-hand side becomes ${2e^{-2\ln n} = \frac{2}{n^2}}$. A union bound over the ${n}$ vertices then shows that, with probability at least ${1-\frac 2n}$, every vertex ${v}$ of a ${G_{n,\frac{1}{2}}}$ random graph has degree ${\frac{n-1}{2}\pm \sqrt{n-1}\sqrt{\ln n}}$. $\Box$
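To make the arithmetic in this proof concrete, the following Python sketch evaluates the tail bound at ${t=\sqrt{(n-1)\ln n}}$ and multiplies by ${n}$ for the union bound over vertices. The function names and the sample value of ${n}$ are illustrative assumptions.

```python
import math

def two_sided_tail_bound(num_bits, t):
    """The bound 2 * exp(-2 t^2 / num_bits) used in the proof of Lemma 1."""
    return 2.0 * math.exp(-2.0 * t * t / num_bits)

def degree_deviation(n):
    """The deviation t = sqrt((n - 1) ln n) chosen in the proof."""
    return math.sqrt((n - 1) * math.log(n))

n = 10_000  # illustrative graph size
per_vertex_failure = two_sided_tail_bound(n - 1, degree_deviation(n))  # 2 / n^2
all_vertices_failure = n * per_vertex_failure                          # 2 / n
```

With this choice of ${t}$ the per-vertex failure probability is ${2/n^2}$, so the union bound over all ${n}$ vertices is ${2/n}$, which still tends to zero.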

Now we consider a vertex ${v}$ in the planted clique ${S}$.

Claim 1 In a graph with a planted clique coming from a ${G_{n,\frac{1}{2}}}$ random graph to which we add all the edges necessary to make ${S}$ a clique, each node in ${S}$ will receive ${\geq 0.4k}$ added edges w.h.p. over the sampling of graph.

Proof: Let ${X}$ be the number of neighbors in ${S}$ of a vertex ${v \in S}$ in the original ${G_{n,\frac{1}{2}}}$ random graph; ${X}$ is a sum of independent Bernoulli random variables. By the Chernoff bound, we obtain an upper bound on the probability that ${v}$ has more than ${0.6k}$ neighbors in ${S}$:

$\displaystyle \begin{array}{rcl} \mathop{\mathbb P}\left[X>0.6k\right]=\mathop{\mathbb P} \left[X-0.5k > 0.1k\right]\leq e^{-\frac{0.2^2 \cdot k}{2}}=e^{-\Omega(k)} \end{array}$

Since this probability is exponentially small in ${k}$, we can conclude that, with high probability, a node in ${S}$ has fewer than ${0.6k}$ neighbors in ${S}$ in the original ${G_{n,\frac{1}{2}}}$ random graph, and thus at least ${0.4k}$ edges will be added to each vertex in ${S}$. $\Box$

Corollary 2 In a graph with a planted clique, a vertex in ${S}$ will have degree ${\geq \frac{n-1}{2}+36\sqrt{n\log n}}$. (Note: we have ${k=100\sqrt{n\log n}}$ in this example)

Therefore, in a graph with a large planted clique, we can distinguish it from the ${G_{n,\frac{1}{2}}}$ distribution by the existence of a node with large degree, i.e., degree over ${\frac{n-1}{2}+4\sqrt{n\log n}}$.
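The degree-based approach of this subsection can be sketched as follows. This is a minimal illustration under assumptions made here: graphs are lists of adjacency sets, `top_degree_vertices` is a hypothetical helper name, and the parameters in `recovery_demo` are far from the asymptotic regime and are chosen only so that the degree gap is large.

```python
import random
from itertools import combinations

def top_degree_vertices(adj, k):
    """Return the set of k vertices of highest degree (ties broken by index)."""
    order = sorted(range(len(adj)), key=lambda v: (-len(adj[v]), v))
    return set(order[:k])

def planted_graph(n, k, rng):
    """G(n, 1/2) plus all edges inside a random k-subset S; returns (adj, S)."""
    S = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if (u in S and v in S) or rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)
    return adj, S

def recovery_demo(n=400, k=200, seed=0):
    """True if the k highest-degree vertices are exactly the planted clique."""
    adj, S = planted_graph(n, k, random.Random(seed))
    return top_degree_vertices(adj, k) == S

# Tiny deterministic example: a 4-clique on {0,1,2,3}, vertex 4 attached to 0,
# and vertex 5 isolated.
tiny = [{1, 2, 3, 4}, {0, 2, 3}, {0, 1, 3}, {0, 1, 2}, {0}, set()]
```

Sorting by degree costs ${O(n \log n)}$ after the degrees are known, so the whole procedure is dominated by reading the graph.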

1.2. Distinguish Planted Clique Distribution with ${k\gg \sqrt{n}}$

Moving on to the case in which ${k}$ is of the order of ${\sqrt n}$, we first show how to distinguish graphs sampled from the planted clique distribution from ${G_{n,\frac 12 }}$ random graphs.

Say that ${k=100\sqrt{n}}$, and let ${A}$ be the adjacency matrix of a graph from the planted clique distribution. Then

$\displaystyle \begin{array}{rcl} \left|\left|A-\frac{1}{2}J\right|\right| & \geq & \lambda_{max}\left(A-\frac{1}{2} J\right)\\ & = & \max\limits_x \frac{x^T\left(A-\frac{1}{2}J\right)x}{x^Tx}\\ & \geq & \frac{{\bf 1}_s^T\left(A-\frac{1}{2} J\right){\bf 1}_s}{||{\bf 1}_s||^2} \\ & = & \frac{{\bf 1}_s^TA{\bf 1}_s - \frac{1}{2}{\bf 1}_s^TJ{\bf 1}_s}{k}\\ & = & \frac{1}{k}\left(k^2-k-\frac{1}{2}k^2\right)\\ & =& \frac{k}{2}-1 \end{array}$

Now, recall the following theorem from Lecture 2.

Theorem 3 If ${A}$ is the adjacency matrix of a ${G_{n,\frac{1}{2}}}$ graph, w.h.p., ${\left|\left|A-\frac{1}{2}J\right|\right| \leq 2\sqrt{n}}$.

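Since ${k=100\sqrt n}$ gives ${\left|\left|A-\frac{1}{2}J\right|\right| \geq \frac k2 - 1 = 50\sqrt n - 1 > 2\sqrt n}$, we can distinguish a graph with a planted clique from a ${G_{n,\frac{1}{2}}}$ random graph, with high probability, by computing this norm.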
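The lower bound ${\frac k2 - 1}$ depends only on ${S}$ being a clique, not on the other edges. The Python sketch below evaluates the Rayleigh quotient of ${A - \frac 12 J}$ at the indicator vector of a clique and compares it with the ${2\sqrt n}$ bound of Theorem 3. Function names and the sizes ${n=400}$, ${k=100}$ are illustrative assumptions (in particular ${k}$ is not ${100\sqrt n}$ here).

```python
import math

def rayleigh_quotient(adj, x):
    """x^T (A - J/2) x / x^T x, where adj[v] is the neighbor set of vertex v."""
    n = len(adj)
    x_A_x = sum(x[u] * sum(x[v] for v in adj[u]) for u in range(n))
    x_J_x = sum(x) ** 2  # J is the all-ones matrix, so x^T J x = (sum of x)^2
    return (x_A_x - 0.5 * x_J_x) / sum(c * c for c in x)

def exceeds_random_bound(adj, x):
    """True if the quotient certifies a norm above 2 * sqrt(n) (Theorem 3)."""
    return rayleigh_quotient(adj, x) > 2 * math.sqrt(len(adj))

def clique_only_graph(n, k):
    """Graph on n vertices whose only edges form a clique on vertices 0..k-1."""
    return [set(range(k)) - {v} if v < k else set() for v in range(n)]

n, k = 400, 100  # illustrative sizes only
adj = clique_only_graph(n, k)
indicator = [1 if v < k else 0 for v in range(n)]
quotient = rayleigh_quotient(adj, indicator)  # k/2 - 1
```

Note that a distinguisher does not know ${S}$; it computes the norm itself, and the indicator vector only serves as a witness in the analysis.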

1.3. Uniqueness of Maximum Clique in Planted Clique Distribution

In order to show that we can find the planted clique in a graph, we first want to prove that the maximum clique in the planted clique distribution is unique. In other words, we want to prove that the planted clique is the unique maximum clique in the planted clique distribution. We first prove the following lemmas:

Lemma 4 W.h.p., for each vertex not in the planted clique, i.e., ${v\in V-S}$, the number of ${v}$'s neighbors in ${S}$ is ${\leq .5k+\sqrt{k-1} \sqrt{\ln n}\leq .55k}$.

Proof: This is largely similar to Lemma 1. We see this as a sum of ${k-1}$ random ${0-1}$ bits and by Chernoff bound, we have:

$\displaystyle \mathop{\mathbb P} \left[\left|\sum_i x_i-\frac{k-1}{2}\right|>t\right] \leq e^{-\frac{2t^2}{k-1}}$

For this probability to be upper bounded, say by ${\frac{1}{n}}$, we can choose a ${t}$ s.t. ${t^2\geq (k-1)\ln n}$. Therefore, we pick ${t=\sqrt{(k-1)\ln n}}$ and this completes the proof that, with high probability, each vertex not in the planted clique has no more than ${.55k}$ neighbors in the planted clique. $\Box$

Lemma 5 W.h.p., the largest clique of a graph ${G}$ sampled from ${G_{n,\frac{1}{2}}}$ has size at most ${2\log n}$. (proved in lecture 2)

Claim 2 Under the above assumptions, ${S}$ is the unique clique of size ${k}$ in ${G}$.

Proof: Suppose, for the sake of contradiction, that there is a clique ${T}$ s.t. ${T\neq S, |T|\geq k}$. Since ${T}$ is a clique, ${T-S}$ is also a clique, and all of its edges come from the original ${G_{n,\frac{1}{2}}}$ random graph. ${G_{n,\frac{1}{2}}}$ has a largest clique of size ${\leq 2\log n}$ w.h.p., so ${|T-S| \leq 2\log n}$. Consider a vertex ${t \in T-S}$ (one exists because ${T \neq S}$ and ${|T| \geq |S|}$): ${t}$ is adjacent to every vertex of ${T\cap S}$, so the number of ${t}$'s neighbors in ${S}$ is at least ${|T\cap S| = |T| - |T-S| \geq k-2\log n > 0.55k}$, but this contradicts Lemma 4, which states that ${t}$ has no more than ${.55k}$ neighbors in ${S}$. $\Box$

1.4. Finding the Planted Clique

Now that we have shown the uniqueness of maximum clique, we want to proceed and show that we can find the planted clique. Let ${A}$ be the adjacency matrix of a ${G_{n,\frac{1}{2}}}$ random graph with a planted clique of size ${k=100\sqrt{n}}$, and let ${{\bf x}}$ be the maximizer of

$\displaystyle \begin{array}{rcl} \frac{{\bf x}^T\left(A-\frac{1}{2}J\right){\bf x}}{||{\bf x}||^2} \geq \frac{k}{2}-1=50\sqrt{n}-1 \end{array}$

We will show below that ${{\bf x}}$ is close to the indicator vector of ${S}$. First, we need to note that we are no longer using the sampling method described earlier to obtain the planted clique distribution. Instead, we sample our graph from the ${G_{n,\frac{1}{2}}}$ distribution, pick a subset of vertices ${S}$, and add the edges necessary to make ${S}$ a clique. From this point of view, we have ${G = G_{n,\frac{1}{2}}+G_{clique}}$, where ${G}$ is the graph with a planted clique and ${G_{clique}}$ is the (random) set of edges that we need to add. We can then represent the adjacency matrix of ${G}$ as:

$\displaystyle A = A_{random} + A_{clique}$

where ${A_{random} \sim G_{n,\frac{1}{2}}}$ and ${A_{clique} \sim G_{k,\frac{1}{2}}}$.

By the theorem shown in lecture 2 (which we just recapped above), we have the following inequalities with high probability:

$\displaystyle \left|\left|A_{random}-\frac{1}{2}J\right|\right| \leq 2\sqrt{n}$

$\displaystyle \left|\left|A_{clique}-\frac{1}{2}{\bf 1}_s {\bf 1}_s^T\right|\right| \leq 2\sqrt{k} = o(k)$

Now we combine the inequalities listed above and, without loss of generality, let ${||{\bf x}||=1}$.

$\displaystyle \begin{array}{rcl} 50\sqrt{n}-1 & = & \frac{k}{2}-1\\ & \leq & {\bf x}^T\left(A-\frac{1}{2}J\right){\bf x} \\ & = & {\bf x}^T\left(A_{clique}+A_{random}-\frac{1}{2}J\right){\bf x} \end{array}$

With ${||A_{random}-\frac{1}{2}J|| \leq 2\sqrt{n}}$ from above, we have

$\displaystyle \begin{array}{rcl} {\bf x}^TA_{clique}{\bf x} \geq 48\sqrt{n}=0.48k.\\ \end{array}$

Therefore, ${{\bf x}^T\left(A_{clique}-\frac{1}{2}{\bf 1}_s{\bf 1}_s^T+\frac{1}{2}{\bf 1}_s{\bf 1}_s^T\right){\bf x} \geq 0.48k}$. With ${\left|\left|A_{clique}-\frac{1}{2}{\bf 1}_s {\bf 1}_s^T\right|\right|= o(k)}$ shown above, we can conclude that

$\displaystyle {\bf x}^T \left(\frac{1}{2} {\bf 1}_s {\bf 1}_s^T \right){\bf x} \geq (0.48-o(1))k$

That is,

$\displaystyle \langle {\bf x}, {\bf 1}_S \rangle^2 \geq (0.96-o(1))k$

and, up to passing to ${-{\bf x}}$, which has the same set of largest ${k}$ entries in absolute value, we have ${ \langle {\bf x}, {\bf 1}_S \rangle \geq \sqrt{.96k - o(k)} \geq .979 \sqrt k }$, for sufficiently large ${k}$. This means that, up to scaling, ${{\bf x}}$ and ${{\bf 1}_S}$ are nearly identical.

$\displaystyle \begin{array}{rcl} || \sqrt k {\bf x} - {\bf 1}_S ||^2 & = & k ||{\bf x}||^2 + || {\bf 1}_S||^2 - 2 \sqrt k \langle {\bf x}, {\bf 1}_S \rangle\\ & \leq & 2k \cdot ( 1 - .979) \\ & \leq & .042 k \end{array}$

Let ${L}$ be the set of ${k}$ largest entries of ${\sqrt k{\bf x}}$, and hence of ${{\bf x}}$, breaking ties arbitrarily, and let ${t}$ be the threshold value for membership in ${L}$ (that is, ${\sqrt k x_i \geq t}$ for all ${i\in L}$ and ${\sqrt k x_i \leq t}$ for all ${i\not\in L}$). Suppose that there are ${B}$ elements of ${S}$ that are not in ${L}$, and hence ${B}$ elements not in ${S}$ that are in ${L}$. Then

$\displaystyle \begin{array}{rcl} || \sqrt k {\bf x} - {\bf 1}_S ||^2 & = & \sum_{i \in S} (\sqrt k x_i - 1)^2 + \sum_{i\not \in S} k x_i^2 \\ & \geq & B \cdot (1-t)^2 + B t^2\\ & \geq & \frac 12 B \end{array}$

And we conclude that ${B\leq .084 k}$, that is, ${L}$ contains at least ${.9k}$ of the ${k}$ elements of ${S}$.

We now present an algorithm to find the planted clique. Let ${L}$ be the set of ${k}$ vertices with the largest ${|x_i|}$, and consider each vertex ${v}$. If ${v \in S}$, by our proof above, it has ${\geq 0.9k}$ neighbors in ${L}$. If ${v \notin S}$, by Lemma 4 it has ${\leq 0.55k+0.2k=0.75k}$ neighbors in ${L}$. Therefore, we can decide whether a vertex is in ${S}$ by counting its neighbors in ${L}$, and this gives us an algorithm.

• Algorithm(${G}$):
• ${A \leftarrow}$ adjacency matrix of ${G}$
• ${{\bf x} \leftarrow}$ eigenvector of the largest eigenvalue of ${A-\frac{1}{2}J}$
• ${L \leftarrow}$ set of ${k}$ vertices with largest ${|x_i|}$
• clique ${\leftarrow}$ set of vertices with at least ${0.8k}$ neighbors in ${L}$
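The algorithm above can be sketched in pure Python. Several things here are assumptions rather than part of the lecture: the eigenvector is approximated by plain power iteration, which presumes that the largest eigenvalue of ${A-\frac 12 J}$ is also the largest in absolute value (plausible by Theorem 3 when ${\frac k2 - 1 > 2\sqrt n}$); the iteration count is arbitrary; and the demo sizes ${n=200}$, ${k=100}$ are chosen for a fast run, not to match ${k = 100\sqrt n}$.

```python
import math
import random
from itertools import combinations

def planted_graph(n, k, rng):
    """G(n, 1/2) plus all edges inside a random k-subset S; returns (adj, S)."""
    S = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if (u in S and v in S) or rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)
    return adj, S

def top_eigenvector(adj, iterations=60):
    """Approximate the top eigenvector of M = A - J/2 by power iteration.

    Assumption: the largest eigenvalue of M dominates in absolute value.
    M is never built; (M x)_u = sum of x over neighbors of u - sum(x) / 2.
    """
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n  # all-ones start has positive overlap with 1_S
    for _ in range(iterations):
        total = sum(x)
        y = [sum(x[v] for v in adj[u]) - 0.5 * total for u in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]
    return x

def find_planted_clique(adj, k):
    """Spectral candidate set L, then keep vertices with >= 0.8k neighbors in L."""
    n = len(adj)
    x = top_eigenvector(adj)
    by_magnitude = sorted(range(n), key=lambda i: abs(x[i]), reverse=True)
    L = set(by_magnitude[:k])
    return {v for v in range(n) if len(adj[v] & L) >= 0.8 * k}

# Illustrative run with a fixed seed.
demo_adj, demo_S = planted_graph(200, 100, random.Random(1))
demo_found = find_planted_clique(demo_adj, 100)
```

The final filtering step is what turns the approximate set ${L}$ into an exact answer: it tolerates a constant fraction of wrong vertices in ${L}$, as in the analysis above.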

It is still an open problem to find, in polynomial time, a planted clique of size ${o(\sqrt n)}$.

2. Random ${k}$-SAT and Proof of Unsatisfiability

We start the topic of random ${k}$-SAT formulas. In the ${k}$-SAT problem, we are trying to decide whether a formula in CNF with each clause containing up to ${k}$ literals is satisfiable. We noted in class that checking satisfiability of randomly generated formulas is hard even in the average case. Similar to the ${G_{n,p}}$ model, we generate a ${k}$-SAT formula on ${n}$ variables with parameter ${p}$ s.t. each of the ${\binom{n}{k}2^k}$ possible clauses is included independently with probability ${p}$. Besides the model above, we also briefly mentioned another model, in which we randomly pick ${m}$ of the ${\binom{n}{k}2^k}$ possible clauses. (Note: these two models are closely related when we have ${m=p\binom{n}{k} 2^k}$.) To gain more insight, we discussed the example of ${3}$-SAT. A random ${3}$-SAT formula with ${m}$ clauses has an expected number of satisfying assignments ${2^n(\frac{7}{8})^m}$, because each fixed assignment satisfies a random clause with probability ${\frac 78}$. By Markov's inequality, for a random ${3}$-SAT instance ${f}$ we have ${\mathop{\mathbb P}\left[f \text{ is satisfiable} \right]\leq \mathop{\mathbb E}\left[\# \text{of satisfying assignments}\right] = 2^n(\frac{7}{8})^m}$, which goes to ${0}$ if ${m > c\, n}$ for a constant ${c > \log_{\frac{8}{7}} 2 \approx 5.19}$.
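The first-moment calculation for ${3}$-SAT is easy to reproduce numerically. The sketch below works with base-2 logarithms to avoid overflow for large ${n}$; the function names are illustrative. The bound ${2^n(\frac 78)^m}$ drops below ${1}$ once ${m > n\log_{\frac 87} 2 \approx 5.19\, n}$.

```python
import math

def log2_expected_satisfying(n, m):
    """log2 of 2^n * (7/8)^m, the expected number of satisfying assignments."""
    return n + m * math.log2(7.0 / 8.0)

def first_moment_density():
    """Clause density m/n above which the expectation tends to 0."""
    return math.log(2.0) / math.log(8.0 / 7.0)

def markov_bound(n, m):
    """Upper bound on P[f is satisfiable], capped at 1."""
    return min(1.0, 2.0 ** log2_expected_satisfying(n, m))

density = first_moment_density()  # roughly 5.19
```

For example, with ${n = 10}$ the expectation is still above ${1}$ at ${m = 51}$ clauses and below ${1}$ at ${m = 52}$, consistent with the density of about ${5.19}$.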

### TTI-Chicago Summer Workshop Program

from CS Theory Events

June 18 – September 14, 2018
TTI-Chicago, Chicago IL
http://www.ttic.edu/summer-workshop-2018/
Submission deadline: December 15, 2017

Would you like to get a group of 4-40 people together for a week to discuss a topic of common interest? Would you like to do it in an easy-to-reach vibrant city with a substantial local research community and with …

by shacharlovett at October 13, 2017 02:42 PM UTC

### TR17-155 | Proofs of Proximity for Distribution Testing | Alessandro Chiesa, Tom Gur

from ECCC papers

Distribution testing is an area of property testing that studies algorithms that receive few samples from a probability distribution D and decide whether D has a certain property or is far (in total variation distance) from all distributions with that property. Most natural properties of distributions, however, require a large number of samples to test, which motivates the question of whether there are natural settings wherein fewer samples suffice. We initiate a study of proofs of proximity for properties of distributions. In their basic form, these proof systems consist of a tester that not only has sample access to a distribution but also explicit access to a proof string that depends on the distribution. We refer to these as NP distribution testers, or MA distribution testers if the tester is a probabilistic algorithm. We also study the more general notion of IP distribution testers, in which the tester interacts with an all-powerful untrusted prover. We investigate the power and limitations of proofs of proximity for distributions and chart a landscape that, surprisingly, is significantly different from that of proofs of proximity for functions. Our main results include showing that MA distribution testers can be quadratically stronger than standard distribution testers, but no stronger than that; in contrast, IP distribution testers can be exponentially stronger than standard distribution testers, but when restricted to public coins they can be at best quadratically stronger.

### Hard and Easy Instances of L-Tromino Tilings

Authors: Javier T. Akagi, Carlos F. Gaona, Fabricio Mendoza, Marcos Villagra
Abstract: In this work we study tilings of regions in the square lattice with L-shaped trominoes. Deciding the existence of a tiling with L-trominoes for an arbitrary region in general is NP-complete; nonetheless, we identify restrictions to the problem where either it remains NP-complete or it has a polynomial time algorithm. First we show that an aztec diamond of order $n$ always has an L-tromino tiling if and only if $n(n+1)\equiv 0\mod 3$; if an aztec diamond has at least two defects or holes, however, the problem of deciding a tiling is NP-complete. Then we study tilings of arbitrary regions where only $180^\circ$ rotations of L-trominoes are available. For this particular case we show that deciding the existence of a tiling remains NP-complete, yet, if a region contains certain so-called "forbidden polyominoes" as subregions, then there exists a polynomial time algorithm for deciding a tiling.

### Graph Drawing by Weighted Constraint Relaxation

Authors: Jonathan X. Zheng, Dan F. M. Goodman, Samraat Pawar
Abstract: A popular method of force-directed graph drawing is multidimensional scaling using graph-theoretic distances as input. We present an algorithm to minimize its energy function, known as stress, by using a relaxation method that considers a single pair of vertices at a time. Our results show that relaxation can reach lower stress levels faster and more consistently than majorization, without needing help from a good initialization. We then present various real-world applications to show how the unique properties of relaxation make it easier to produce constrained layouts than previous approaches. We also show how relaxation can be directly applied within the sparse stress approximation of Ortmann et al. [1], making the algorithm scalable up to large graphs.

### 4-connected planar graphs are in $B_3$-EPG

Authors: Therese Biedl, Claire Pennarun
Abstract: We show that every 4-connected planar graph has a $B_3$-EPG representation, i.e., every vertex is represented by a curve on the grid with at most three bends, and two vertices are adjacent if and only if the corresponding curves share an edge of the grid. Our construction is based on a modification of the representation by touching thickened $L$-shapes proposed by Gonçalves et al.

### Balancing expression dags for more efficient lazy adaptive evaluation

Authors: Martin Wilhelm
Abstract: Arithmetic expression dags are widely applied in robust geometric computing. In this paper we restructure expression dags by balancing consecutive additions or multiplications. We predict an asymptotic improvement in running time and experimentally confirm the theoretical results. Finally, we discuss some pitfalls of the approach resulting from changes in evaluation order.

### The Trees of Hanoi

Authors: Joost Engelfriet
Abstract: The game of the Towers of Hanoi is generalized to binary trees. First, a straightforward solution of the game is discussed. Second, a shorter solution is presented, which is then shown to be optimal.

### On the Containment Problem for Linear Sets

Authors: Hans U. Simon
Abstract: It is well known that the containment problem (as well as the equivalence problem) for semilinear sets is $\log$-complete in $\Pi_2^p$. In this paper, we show that already the containment problem for linear sets is $\log$-hard (and therefore also $\log$-complete) in $\Pi_2^p$. This holds even when we use a unary encoding for the numerical input parameters or when we restrict the problem to $1$-dimensional linear sets (i.e., linear sets in $\mathbb{N}_0$). However, combining both restrictions, the problem becomes solvable in polynomial time.

### Pure Operation-Based Replicated Data Types

Authors: Carlos Baquero, Paulo Sergio Almeida, Ali Shoker
Abstract: Distributed systems designed to serve clients across the world often make use of geo-replication to attain low latency and high availability. Conflict-free Replicated Data Types (CRDTs) allow the design of predictable multi-master replication and support eventual consistency of replicas that are allowed to transiently diverge. CRDTs come in two flavors: state-based, where a state is changed locally and shipped and merged into other replicas; operation-based, where operations are issued locally and reliably causal broadcast to all other replicas. However, the standard definition of op-based CRDTs is very encompassing, allowing even sending the full-state, and thus imposing storage and dissemination overheads as well as blurring the distinction from state-based CRDTs. We introduce pure op-based CRDTs, that can only send operations to other replicas, drawing a clear distinction from state-based ones. Data types with commutative operations can be trivially implemented as pure op-based CRDTs using standard reliable causal delivery; whereas data types having non-commutative operations are implemented using a PO-Log, a partially ordered log of operations, and making use of an extended API, i.e., a Tagged Causal Stable Broadcast (TCSB), that provides extra causality information upon delivery and later informs when delivered messages become causally stable, allowing further PO-Log compaction. The framework is illustrated by a catalog of pure op-based specifications for classic CRDTs, including counters, multi-value registers, add-wins and remove-wins sets.

### On the Power of Tree-Depth for Fully Polynomial FPT Algorithms

Authors: Yoichi Iwata, Tomoaki Ogasawara, Naoto Ohsaka
Abstract: There are many classical problems in P whose time complexities have not been improved over the past decades. Recent studies of "Hardness in P" have revealed that, for several of such problems, the current fastest algorithm is the best possible under some complexity assumptions. To bypass this difficulty, Fomin et al. (SODA 2017) introduced the concept of fully polynomial FPT algorithms. For a problem with the current best time complexity $O(n^c)$, the goal is to design an algorithm running in $k^{O(1)}n^{c'}$ time for a parameter $k$ and a constant $c'<c$. In this paper, we investigate the complexity of graph problems in P parameterized by tree-depth, a graph parameter related to tree-width. We show that a simple divide-and-conquer method can solve many graph problems, including Weighted Matching, Negative Cycle Detection, Minimum Weight Cycle, Replacement Paths, and 2-hop Cover, in $O(\mathrm{td}\cdot m)$ time or $O(\mathrm{td}\cdot (m+n\log n))$ time, where $\mathrm{td}$ is the tree-depth of the input graph. Because any graph of tree-width $\mathrm{tw}$ has tree-depth at most $(\mathrm{tw}+1)\log_2 n$, our algorithms also run in $O(\mathrm{tw}\cdot m\log n)$ time or $O(\mathrm{tw}\cdot (m+n\log n)\log n)$ time. These results match or improve the previous best algorithms parameterized by tree-width. Especially, we solve an open problem of fully polynomial FPT algorithm for Weighted Matching parameterized by tree-width posed by Fomin et al.

### What Would a Graph Look Like in This Layout? A Machine Learning Approach to Large Graph Visualization

Authors: Oh-Hyun Kwon, Tarik Crnovrsanin, Kwan-Liu Ma
Abstract: Using different methods for laying out a graph can lead to very different visual appearances, with which the viewer perceives different information. Selecting a "good" layout method is thus important for visualizing a graph. The selection can be highly subjective and dependent on the given task. A common approach to selecting a good layout is to use aesthetic criteria and visual inspection. However, fully calculating various layouts and their associated aesthetic metrics is computationally expensive. In this paper, we present a machine learning approach to large graph visualization based on computing the topological similarity of graphs using graph kernels. For a given graph, our approach can show what the graph would look like in different layouts and estimate their corresponding aesthetic metrics. An important contribution of our work is the development of a new framework to design graph kernels. Our experimental study shows that our estimation calculation is considerably faster than computing the actual layouts and their aesthetic metrics. Also, our graph kernels outperform the state-of-the-art ones in both time and accuracy. In addition, we conducted a user study to demonstrate that the topological similarity computed with our graph kernel matches perceptual similarity assessed by human users.

### Improved Coresets for Kernel Density Estimates

Authors: Jeff M. Phillips, Wai Ming Tai
Abstract: We study the construction of coresets for kernel density estimates. That is we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate with a much smaller point set. For characteristic kernels (including Gaussian and Laplace kernels), our approximation preserves the $L_\infty$ error between kernel density estimates within error $\epsilon$, with coreset size $2/\epsilon^2$, but no other aspects of the data, including the dimension, the diameter of the point set, or the bandwidth of the kernel common to other approximations. When the dimension is unrestricted, we show this bound is tight for these kernels as well as a much broader set.

This work provides a careful analysis of the iterative Frank-Wolfe algorithm adapted to this context, an algorithm called *kernel herding*. This analysis unites a broad line of work that spans statistics, machine learning, and geometry.

When the dimension $d$ is constant, we demonstrate much tighter bounds on the size of the coreset specifically for Gaussian kernels, showing that it is bounded by the size of the coreset for axis-aligned rectangles. Currently the best known constructive bound is $O(\frac{1}{\epsilon} \log^d \frac{1}{\epsilon})$, and non-constructively, this can be improved by $\sqrt{\log \frac{1}{\epsilon}}$. This improves the best constant dimension bounds polynomially for $d \geq 3$.

### Maximum Margin Interval Trees

Authors: Alexandre Drouin, Toby Dylan Hocking, François Laviolette
Abstract: Learning a regression function using censored or interval-valued output data is an important problem in fields such as genomics and medicine. The goal is to learn a real-valued prediction function, and the training output labels indicate an interval of possible values. Whereas most existing algorithms for this task are linear models, in this paper we investigate learning nonlinear tree models. We propose to learn a tree by minimizing a margin-based discriminative objective function, and we provide a dynamic programming algorithm for computing the optimal solution in log-linear time. We show empirically that this algorithm achieves state-of-the-art speed and prediction accuracy in a benchmark of several data sets.

### Construction of FuzzyFind Dictionary using Golay Coding Transformation for Searching Applications

Authors: Kamran Kowsari, Maryam Yammahi, Nima Bari, Roman Vichr, Faisal Alsaby, Simon Y. Berkovich
Abstract: Searching through a large volume of data is critical for companies, scientists, and search engine applications because of its time and memory complexity. In this paper, a new technique of generating a FuzzyFind Dictionary for text mining is introduced. We simply map the 23 bits of the English alphabet into a FuzzyFind Dictionary, or more than 23 bits by using more FuzzyFind Dictionaries, reflecting the presence or absence of particular letters. This representation preserves closeness of word distortions in terms of closeness of the created binary vectors within a Hamming distance of 2 deviations. This paper discusses the Golay Coding Transformation Hash Table and how it can be used on a FuzzyFind Dictionary as a new technology for searching through big data. The method has linear time complexity for generating the dictionary and constant time complexity for accessing the data; updating with new data sets takes time linear in the number of new data points. The technique is based on searching only for letters of English, with each segment having 23 bits; it can also work with more than 23 bits and with more segments as a reference table.

### Lessons from the Nobel Prizes

We've had a big week of awards with the Nobel Prizes and the MacArthur "Genius" Fellows. The MacArthur Fellows include two computer scientists, Regina Barzilay and Stefan Savage, and a statistician Emmanuel Candès but no theoretical computer scientists this year.

No computer scientists among the Nobel Laureates either, but technology played a large role in the chemistry and physics prizes. The chemistry prize went for a fancy microscope that could determine biomolecular structure. The LIGO project that measures extremely weak gravitational waves received the physics prize.

In a sign of the times, Jeffrey Hall, one of the medical prize recipients, left science due to lack of funding.

The economics prize went to Richard Thaler, who described how people act irrationally but often in predictable ways, such as the endowment effect, which states that people give more value to an object they own versus one they don't currently have. The book Thinking, Fast and Slow by 2002 Laureate Daniel Kahneman does a great job describing these behaviors.

While at Northwestern I regularly attended the micro-economics seminars many of which tried to give models that described the seemingly irrational behaviors that researchers like Thaler brought to light. My personal theory: Humans evolved to have these behaviors because while they might not be the best individual choices they make society better overall.

by Lance Fortnow (noreply@blogger.com) at October 12, 2017 12:52 PM UTC

### TR17-154 | The Relation between Polynomial Calculus, Sherali-Adams, and Sum-of-Squares Proofs | Christoph Berkholz

from ECCC papers

We relate different approaches for proving the unsatisfiability of a system of real polynomial equations over Boolean variables. On the one hand, there are the static proof systems Sherali-Adams and sum-of-squares (a.k.a. Lasserre), which are based on linear and semi-definite programming relaxations. On the other hand, we consider polynomial calculus, which is a dynamic algebraic proof system that models Gröbner basis computations. Our first result is that sum-of-squares simulates polynomial calculus: any polynomial calculus refutation of degree $d$ can be transformed into a sum-of-squares refutation of degree $2d$ and only polynomial increase in size. In contrast, our second result shows that this is not the case for Sherali-Adams: there are systems of polynomial equations that have polynomial calculus refutations of degree $3$ and polynomial size, but require Sherali-Adams refutations of degree $\Omega(\sqrt{n}/\log n)$ and exponential size.

### The Provable Virtue of Laziness in Motion Planning

Authors: Nika Haghtalab, Simon Mackenzie, Ariel D. Procaccia, Oren Salzman, Siddhartha S. Srinivasa
Abstract: The Lazy Shortest Path (LazySP) class consists of motion-planning algorithms that only evaluate edges along shortest paths between the source and target. These algorithms were designed to minimize the number of edge evaluations in settings where edge evaluation dominates the running time of the algorithm; but how close to optimal are LazySP algorithms in terms of this objective? Our main result is an analytical upper bound, in a probabilistic model, on the number of edge evaluations required by LazySP algorithms; a matching lower bound shows that these algorithms are asymptotically optimal in the worst case.

### Finding Euler Tours in One Pass in the W-Streaming Model with O(n log(n)) RAM

Authors: Christian Glazik, Jan Schiemann, Anand Srivastav
Abstract: We study the problem of finding an Euler tour in an undirected graph G in the W-Streaming model with O(n polylog(n)) RAM, where n resp. m is the number of nodes resp. edges of G. Our main result is the first one pass W-Streaming algorithm computing an Euler tour of G in the form of an edge successor function with only O(n log(n)) RAM which is optimal for this setting (e.g., Sun and Woodruff (2015)). The previously best-known result in this model is implicitly given by Demetrescu et al. (2010) with the parallel algorithm of Atallah and Vishkin (1984) using O(m/n) passes under the same RAM limitation. For graphs with $\omega(n)$ edges this is non-constant. Our overall approach is to partition the edges into edge-disjoint cycles and to merge the cycles until a single Euler tour is achieved. Note that in the W-Streaming model such a merging is far from being obvious as the limited RAM allows the processing of only a constant number of cycles at once. This forces us to merge cycles that partially are no longer present in RAM. Furthermore, the successor of an edge cannot be changed after the edge has left RAM. So, we steadily have to output edges and their designated successors, not knowing the appearance of edges and cycles yet to come. We solve this problem with a special edge swapping technique, for which two certain edges per node are sufficient to merge tours without having all of their edges in RAM. Mathematically, this is controlled by structural results on the space of certain equivalence classes corresponding to cycles and the characterization of associated successor functions. For example, we give conditions under which the swapping of edge successors leads to a merging of equivalence classes.

### Stream Graphs and Link Streams for the Modeling of Interactions over Time

Authors: Matthieu Latapy, Tiphaine Viard, Clémence Magnien
Abstract: Graph theory provides a language for studying the structure of relations, and it is often used to study interactions over time too. However, it poorly captures the combined temporal and structural nature of interactions, which calls for a dedicated formalism. In this paper, we generalize graph concepts in order to cope with both aspects in a consistent way. We start with elementary concepts like density, clusters, or paths, and derive from them more advanced concepts like cliques, degrees, clustering coefficients, or connected components. We obtain a language to directly deal with interactions over time, similar to the language provided by graphs to deal with relations. This formalism is self-consistent: usual relations between different concepts are preserved. It is also consistent with graph theory: graph concepts are special cases of the ones we introduce. This makes it easy to generalize higher-level objects such as quotient graphs, line graphs, k-cores, and centralities. This paper also considers discrete versus continuous time assumptions, instantaneous links, and extensions to more complex cases.

### Using modular decomposition technique to solve the maximum clique problem

Authors: Irina Utkina
Abstract: In this article we use the modular decomposition technique to solve the weighted maximum clique problem exactly. Our algorithm takes the modular decomposition tree from the paper of Tedder et al. and finds a solution recursively. We also propose algorithms to construct graphs with modules. We show some interesting results, comparing our solution with Östergård's algorithm on DIMACS benchmarks and on generated graphs.

### Grid peeling and the affine curve-shortening flow

Authors: David Eppstein, Sariel Har-Peled, Gabriel Nivasch
Abstract: In this paper we study an experimentally-observed connection between two seemingly unrelated processes, one from computational geometry and the other from differential geometry. The first one (which we call "grid peeling") is the convex-layer decomposition of subsets $G\subset \mathbb Z^2$ of the integer grid, previously studied for the particular case $G=\{1,\ldots,m\}^2$ by Har-Peled and Lidický (2013). The second one is the affine curve-shortening flow (ACSF), first studied by Alvarez et al. (1993) and Sapiro and Tannenbaum (1993). We present empirical evidence that, in a certain well-defined sense, grid peeling behaves at the limit like ACSF on convex curves. We offer some theoretical arguments in favor of this conjecture.

We also pay closer attention to the simple case where $G=\mathbb N^2$ is a quarter-infinite grid. This case corresponds to ACSF starting with an infinite L-shaped curve, which when transformed using the ACSF becomes a hyperbola for all times $t>0$. We prove that, in the grid peeling of $\mathbb N^2$, (1) the number of grid points removed up to iteration $n$ is $\Theta(n^{3/2}\log n)$; and (2) the boundary at iteration $n$ is sandwiched between two hyperbolas that are separated from each other by a constant factor.

### Constructing Top-k Routes with Personalized Submodular Maximization of POI Features

Authors: Hongwei Liang, Ke Wang
Abstract: We consider a practical top-k route problem: given a collection of points of interest (POIs) with rated features and traveling costs between POIs, a user wants to find k routes from a source to a destination that maximally match her needs on feature preferences and can be completed within a travel cost budget. One challenge is dealing with the personalized diversity requirement where each user has a different trade-off between quantity (the number of POIs with a specified feature) and variety (the coverage of specified features). Another challenge is the large scale of the POI network and the great many alternative routes to search. We model the personalized diversity requirement by the whole class of submodular functions, and present an optimal solution to the top-k route problem through an index structure for retrieving relevant POIs in both feature and route spaces and various strategies for pruning the search space using user preferences and constraints. We also present heuristic solutions and evaluate all the solutions on real-life POI network data.

### Peeling vs shortening

from David Eppstein

My latest preprint is about two processes, one continuous and one discrete, that appear to do the same thing as each other. We don’t really understand why. The preprint is “Grid peeling and the affine curve-shortening flow”, with Sariel Har-Peled and Gabriel Nivasch (arXiv:1710.03960) and will appear at ALENEX.

The discrete process is the one where you find the convex layers of a grid, or a subset of a grid. That is, you find the convex hull, remove its vertices, find the convex hull of the rest of the points, remove its vertices, etc. I wrote about it briefly a month ago, in a G+ post on a related integer sequence. When you do this to a square grid, the layers appear to get more circular as you go inwards (until they become so small that being on a grid forces them to get more polygonal again). When you do it to a rectangular grid, they appear to get elliptical. And when you do it to a quarter-infinite grid…

…you appear to get a hyperbola. What’s going on?
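The peeling process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the preprint: the function names `hull_vertices` and `peel` are made up here, and the hull routine is a standard monotone-chain convex hull that keeps only strict corners, so grid points lying in the middle of a hull edge survive to a later layer.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Strict convex hull vertices of a set of integer points (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Pop on right turns AND on collinear triples, so that only
            # true corners of the hull are reported as vertices.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half(pts)
    upper = half(reversed(pts))
    return lower[:-1] + upper[:-1]


def peel(points):
    """Repeatedly remove the hull vertices; return the list of layers."""
    remaining = set(points)
    layers = []
    while remaining:
        layer = hull_vertices(remaining)
        layers.append(layer)
        remaining -= set(layer)
    return layers


# Example: peel a small square grid and report the size of each layer.
grid = [(x, y) for x in range(3) for y in range(3)]
layers = peel(grid)
print([len(layer) for layer in layers])
```

On the 3 by 3 grid the first layer is the four corners, the second is the four edge midpoints, and the last is the center point. Plotting the layers of a much larger grid is what produces the pictures discussed here.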

The continuous process is a variant of the curve-shortening flow called the affine curve-shortening flow. It operates on smooth curves in the plane, by moving all the points of the curve simultaneously. Each point moves towards its local center of curvature, at a speed proportional to the cube root of the curvature (the inverse of the cube root of the radius of curvature). If you take the “cube root” part out, you get the curve-shortening flow, under which every simple closed curve instantaneously smooths itself, then more slowly becomes convex (while avoiding self-intersections), then slowly becomes more circular (while shrinking to a point). The affine curve-shortening flow does much the same thing, more slowly, but it becomes elliptical instead of circular.
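In symbols, and with notation that is chosen here for illustration rather than taken from the preprint: write $C(\cdot,t)$ for the evolving curve, $\kappa$ for its curvature, and $N$ for the unit normal pointing towards the center of curvature. Up to a constant rescaling of time, the two flows are then

$$\frac{\partial C}{\partial t} = \kappa N \quad \text{(curve-shortening)}, \qquad \frac{\partial C}{\partial t} = \kappa^{1/3} N \quad \text{(affine curve-shortening)}.$$

For a circle of radius $r$ we have $\kappa = 1/r$, so the second speed is $r^{-1/3}$, which is the “inverse of the cube root of the radius of curvature” mentioned above.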

What we noticed was that if we start with a convex curve, then affine curve-shortening on that curve and the convex layers of the points of a fine grid inside it appear to be much the same. Here’s an example where the initial curve is less symmetric than a square or rectangle. The left side is affine curve-shortening, the right side is grid peeling, and the middle is a much coarser grid peeling (with every fifth layer shown) to demonstrate what we’re doing. Can you tell the difference between left and right?

Of course proof by picture is not very convincing. We have some theory to support this observation (we can prove that grid peeling moves within a constant factor of the speed of affine curve shortening) and some vague heuristic arguments for why they might be the same (grid peeling is invariant under a big subset of affine transformations, so if it’s going to act like a flow it should be a flow that’s also affine invariant, and the simplest choice is affine curve-shortening). But we did send this to a conference about experiments, so we also did some experiments to see how similar the two processes are.

The answer: pretty similar, similar enough to convince us that (in the limit as the grid becomes finer) grid peeling converges pointwise to affine curve-shortening. See the preprint for the details, but the basic story from the experiments is that for fine grids the Hausdorff distance between the peeled grid curves and the affinely shortened curves is small, and grows smaller with finer grids. Decreasing the grid spacing by a factor of 10 decreased the Hausdorff distance by a factor of about 2.65, so that suggests that the distance is a sublinear power of the grid spacing, but that’s a wild extrapolation from two data points so it would probably be a mistake to guess the exponent with more than one digit of precision.
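To make the extrapolation concrete: if the Hausdorff distance scaled as a power (spacing)^p of the grid spacing, the two data points above would fix p. The snippet below is just that arithmetic, under the power-law assumption; the variable names are made up for illustration.

```python
import math

spacing_factor = 10.0    # grid spacing decreased by this factor
distance_factor = 2.65   # Hausdorff distance decreased by about this factor

# If distance ~ spacing**p, then distance_factor = spacing_factor**p.
exponent = math.log(distance_factor) / math.log(spacing_factor)
print(f"implied exponent: {exponent:.2f}")
```

The implied exponent is a little above 0.4, which is below 1 and hence sublinear. As noted above, with only two data points nothing beyond the first digit should be trusted.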

Anyway, maybe this is easy to prove, to someone who has the right theoretical tools in hand (which is not us).

(G+)

by David Eppstein at October 11, 2017 09:04 PM UTC