# Theory of Computing Blog Aggregator

### Special Topics in Complexity Theory, Lectures 2-3

from Emanuele Viola

Special Topics in Complexity Theory, Fall 2017. Instructor: Emanuele Viola

### 1 Lectures 2-3, Scribe: Tanay Mehta

In these lectures we conclude the proof that bounded independence fools the AND function, and look at the more recent result that bounded independence fools the circuit class $\mathrm {AC}^0$.

#### 1.1 Bounded Independence Fools AND

We state again the theorem from last time.

Theorem 1. Let $(X_1, \dots , X_n)$ be a distribution over $\{0,1\}^n$ such that any $k$ $X_i$ are independent (but not necessarily uniform). Then, we have that

\begin{aligned} \left |\Pr \left [\bigwedge _{i=1}^n X_i = 1\right ] - \prod _{i=1}^n \Pr [X_i = 1]\right | \leq 2^{- \Omega (k)}\end{aligned}

Proof. Let $D$ be the distribution of $(X_1, \dots , X_n)$. Let $B$ be the $n$-wise independent distribution $(Y_1, \dots , Y_n)$ such that $\Pr [Y_i = 1] = \Pr [X_i = 1]$ for all $i \in [n]$ and the $Y_i$ are independent. The theorem is equivalent to the following statement.

\begin{aligned}|\Pr _{X \leftarrow D}\left [\bigwedge _{i=1}^n X_i = 1 \right ] - \Pr _{X \leftarrow B}\left [\bigwedge _{i=1}^n X_i = 1 \right ]| \leq 2^{-\Omega (k)}\end{aligned}

We will prove the above statement by the following version of the Inclusion-Exclusion principle.

##### 1.1.1 Inclusion-Exclusion Principle

Let $V$ be any distribution over $\{0,1\}^n$. Note that by De Morgan’s laws, we have

\begin{aligned}\Pr \left [\bigwedge V_i = 1 \right ] = 1 - \Pr \left [\bigvee V_i = 0 \right ]\end{aligned}

Let $E_i$ be the event that $V_i = 0$. We want to bound the quantity $\Pr \left [\bigcup E_i\right ]$. By looking at the Venn diagram of the events $E_i$, we can see that

\begin{aligned} \Pr \left [\bigcup E_i\right ] & \leq \Pr [E_1] + \dots + \Pr [E_n] = \sum _i \Pr [E_i] \nonumber \\ \Pr \left [\bigcup E_i\right ] & \geq \sum _i \Pr [E_i] - \sum _{i, j} \Pr [E_i \cap E_j] \nonumber \\ \Pr \left [\bigcup E_i\right ] & \leq \sum _i \Pr [E_i] ~~ - \sum _{S \subseteq [n], |S| = 2} \Pr \left [\bigcap _{i \in S} E_i \right ]~~+ \sum _{S \subseteq [n], |S| = 3} \Pr \left [\bigcap _{i \in S} E_i\right ], \nonumber \end{aligned}

and so on. In general, we have the following. Define

\begin{aligned}T_j := \sum _{S \subseteq [n], |S| = j} \Pr \left [ \bigcap _{i \in S} E_i \right ]\end{aligned}
\begin{aligned}S_h := \sum _{i=1}^h (-1)^{i+1} T_i \end{aligned}

Then, we have the bounds $\Pr \left [\bigcup E_i\right ] \leq S_j$ for odd $j$, and $\Pr \left [\bigcup E_i\right ] \geq S_j$ for even $j$. This fact holds for any distribution.

Let us return to the proof. Note that the $S_h$ are the same for $D$ and $B$ up to $h = k$ because they only involve sums of ANDs of at most $k$ events. Hence, we have that

\begin{aligned}\left |\Pr _D \left [\bigwedge X_i = 1\right ] - \Pr _B\left [ \bigwedge X_i = 1\right ]\right | \leq |S_k - S_{k-1}| = |T_k|\end{aligned}

where the last equality comes from the definition of $S_k$. Therefore, we are done if $|T_k| \leq 2^{-\Omega (k)}$. We have that

\begin{aligned}T_k = \sum _{S \subseteq [n], |S| = k} \Pr \left [\bigcap _{i \in S} E_i \right ] ={ n \choose k} ~~ \mathbb {E}_{S \subseteq [n], |S| = k} \left [ \prod _{i \in S} P_i \right ] \end{aligned}

where $P_i := \Pr [E_i] = 1 - \Pr [X_i = 1]$. To bound the expectation we recall a useful inequality.

##### 1.1.2 A Useful Inequality

Let $Q_1, \dots , Q_n$ be non-negative real numbers. Then, by the AM-GM inequality, we have that

\begin{aligned}\frac {\sum _i Q_i}{n} \geq \left (\prod _i Q_i \right )^{1/n}.\end{aligned}

Consider the following more general statement,

\begin{aligned} \mathbb {E}_{S \subseteq [n], |S| = 1} \left [ \prod _{i \in S} Q_i \right ] \geq \mathbb {E}_{S \subseteq [n], |S| = 2} \left [ \prod _{i \in S} Q_i \right ]^{1/2} \geq \dots \hspace {5cm} \nonumber \\ \dots \geq \mathbb {E}_{S \subseteq [n], |S| = k} \left [ \prod _{i \in S}Q_i \right ]^{1/k} \geq \dots \geq \mathbb {E}_{S \subseteq [n], |S| = n} \left [ \prod _{i \in S} Q_i \right ]^{1/n} \nonumber \end{aligned}

and note that the left most term is equal to $\frac {\sum _i Q_i}{n}$, while the right most term is equal to $\left (\prod _i Q_i \right )^{1/n}$

Applying the above inequality to $T_k$ and a common approximation for the binomial coefficient, we have that

\begin{aligned}T_k = { n \choose k} ~~ \mathbb {E}_{S \subseteq [n], |S| = k} \left [ \prod _{i \in S} P_i \right ] \leq {n \choose k} \sum _{i = 1}^n \left (\frac {P_i}{n}\right )^k \leq \left (\frac {e n}{k}\right )^k \left (\frac {\sum P_i}{n}\right )^k = \left (\frac {e \sum P_i}{k}\right )^k.\end{aligned}

Therefore, we are done if $\sum P_i \leq \frac {k}{2 e}$. Recall that $P_i = \Pr [E_i] = 1 - \Pr [X_i = 1]$. So if $P_i$ is small then $\Pr [X_i = 1]$ is close to $1$.

It remains to handle the case that $\sum P_i \geq \frac {k}{2 e}$. Pick $n'$ such that

\begin{aligned}\sum _{i=1}^{n'} P_i = \frac {k}{2e} \pm 1.\end{aligned}

By the previous argument, the AND of the first $n'$ is the same up to $2^{-\Omega (k)}$ for $D$ and $B$. Also, for every distribution the probability of that the And of $n$ bits is 1 is at most the probability that the And of $n'$ bits is $1$. And also, for the $n$-wise independent distribution $B$ we have

\begin{aligned} \Pr _B\left [\bigwedge _{i=1}^{n'} X_i = 1\right ] & = \prod _{i=1}^{n'} \Pr [X_i = 1] \\ & = \prod _{i=1}^{n'} (1 - P_i)\\ & \leq \left (\frac {\sum _{i=1}^{n'}(1-P_i)}{n'}\right )^{n'} \textrm { by the AM-GM inequality} \\ & \leq \left (\frac {n' - k/2e}{n'}\right )^{n'} \leq (1 - k/2en')^{n'} \leq e^{-\Omega (k)}. \end{aligned}

The combination of these facts concludes this case. To summarize, in this case we showed that

\begin{aligned}\Pr _D[\bigwedge _{i=1}^n X_i = 1] \le \Pr _D[\bigwedge _{i=1}^{n'} X_i = 1].\end{aligned}

as well as

\begin{aligned}\Pr _B[\bigwedge _{i=1}^n X_i = 1] \le \Pr _B[\bigwedge _{i=1}^{n'} X_i = 1] \leq 2^{-\Omega (k)}.\end{aligned}

By the choice of $n'$ and the previous argument, we also know that $|\Pr _D[\bigwedge _{i=1}^{n'} X_i = 1]- \Pr _B[\bigwedge _{i=1}^{n'} X_i = 1]| \le 2^{-\Omega (k)}$ and so we are done, as all quantities above are at most $2^{-\Omega (k)}$ (and at least $0$). $\square$

Remark 2. The bound is tight up to $\Omega ( .)$

Proof. Let $D$ be the distribution over $\{0,1\}^{k+1}$ as follows: $D_{1, \dots , k} = U_k$ and $D_{k+1} = D_1 + \dots + D_k \mod 2$. Then, $D$ is $k$-wise independent. However, if $k$ is even, then

\begin{aligned}\Pr [\bigwedge _{i=1}^{k+1} D_i = 1] = 0.\end{aligned}

Yet, we have that

\begin{aligned}\Pr [\bigwedge _{i=1}^{k+1} U_i = 1] = 2^{-(k+1)}.\end{aligned}

$\square$

#### 1.2 Bounded Independence Fools $\textrm {AC}^0$

Acknowledgement. This section is based on Amnon Ta-Shma’s notes for the class 0368.4159 Expanders, Pseudorandomness and Derandomization CS dept, Tel-Aviv University, Fall 2016.

Note that a DNF on $n$ bits can be modeled as a depth two circuit where the top layer is an OR-gate whose inputs are AND-gates, which take inputs $X_1, \dots , X_n$ and their negations. The circuit class $\textrm {AC}^0$ can be viewed as a generalization of this to higher (but constant) depth circuits. That is, $\textrm {AC}^0$ consists of circuits using AND-gates, OR-gates, NOT-gates, and input registers. Each of the gates have unbounded fan-in (i.e. the number of input wires). The size of the circuit is defined to be the number of gates.

$\textrm {AC}^0$ is one of the most studied classes in complexity theory. $\textrm {AC}^0$ circuits of polynomial size can do many things, including adding and subtracting $n$-bit integers.

Conjecture 3.[Linial-Nisan[LN90]] $\log ^{O(d)} s$-wise independence fools $\textrm {AC}^0$ circuits of depth $d$ and size $s$.

The conjecture was open for a long time, even for in the special case $d=2$. In 2007 a breakthrough work by Bazzi [Baz09] proved it for $d=2$. Shortly afterwards, Razborov presented a simpler proof of Bazzi’s result [Raz09], and Braverman proved the conjecture for any $d$ with $\log ^{d^2} s$-wise independence [Bra10]. Tal improved the result to $\log ^{O(d)} s$ [Tal17].

Interestingly, the progress on the conjecture does not use ideas that were not around since the time of its formulation. Bottom line: if a problem is open for a long time, you should immediately attack it with existing tools.

The high-level intuition why such a result should be true is the following:

1. $\textrm {AC}^0$ is approximated by polylog degree polynomials.
2. $k$-wise independence fools degree-$k$ polynomials.

Proof of (2). Let $x = (x_1, \dots , x_n) \in \{0,1\}^n$. Let $p(x_1, \dots , x_n)$ be a degree $k$ polynomial over $\mathbb {R}$. Write $p$ as

\begin{aligned}p(x_1, \dots , x_n) = \sum _{M \subseteq [n], |M| \leq k} c_M \cdot x_M.\end{aligned}

If $D$ is a $k$-wise independent distribution on $\{0,1\}^n$, then by linearity of expectation

\begin{aligned} \mathbb {E}_D[P] = \sum _{M \subseteq [n], |M| \leq k} c_M \mathbb {E}_D[x_M] = \sum _{M \subseteq [n], |M| \leq k} c_M \mathbb {E}_U[x_M] = \mathbb {E}_U[P]. \end{aligned}

$\square$

There are several notions of approximating AC$^0$ by low-degree polynomials. We now review two of them, explaining why neither of them is sufficient. Braverman showed how to cleverly combine the two methods to prove a version of (1) that’s strong enough.

##### 1.2.1 Approximation 1

Theorem 4. For all $\textrm {AC}^0$ circuits $C(x_1, \dots , x_n)$ of size $s$ and depth $d$, for all distributions $D$ over $\{0,1\}^n$, for all $\epsilon$, there exists a polynomial $p(x_1, \dots , x_n)$ of degree $\log ^{O(d)} s/\epsilon$ such that

\begin{aligned}\Pr _{x \leftarrow D}[p(x) = C(x)] \geq 1 - \epsilon .\end{aligned}

The important features of this approximation are that it works under any distribution, and when the polynomial is correct it outputs a boolean value.

Similar approximations appear in many papers, going back to Razborov’s paper [Raz87] (who considers polynomials modulo 2) which uses ideas from earlier still work.

Note that the polynomial $p$ depends on the circuit $C$ chosen, and on the distribution. This theorem is not a good enough approximation because on the $\epsilon$ fraction of inputs where the polynomial and circuit are unequal, the value of the polynomial can (and does) explode to be much greater than $1/\epsilon$. This prevents us from bounding the average of the polynomial.

Nevertheless, let us prove the above theorem.

Proof. Consider one OR-gate of fan-in $s$. We construct a distribution of polynomials that compute any input with high probability. This implies that there is a fixed polynomial that computes the circuit on a large fraction of the inputs by an averaging argument.

For $i = 1, 2, \dots , \log s$. let $S_i$ be a random subset of $[s]$ where every element is included with probability $1/2^i$, independently.

Suppose $x$ has Hamming weight $2^j$. Then, $\mathbb {E}[\sum _{n \in S_j} x_n] = 1$. And the sum can be shown to equal $1$ with constant probability.

Define the approximation polynomial $p$ to be

\begin{aligned}p(x) := 1 - \prod _{i=1}^{\log s} (1 - \sum _{h \in S_i} x_h)\end{aligned}

Note that if $x$ has weight $w > 0$, then $p(x) = 0$ with constant probability. If $w =0$, then $p(x) = 1$ with probability $1$. We can adjust the error probability to $\epsilon$ by repeating each term in the product $\log (1/\epsilon )$ times.

Thus, we can approximate one gate with the above polynomial of degree $O(\log (s) \cdot \log (1/\epsilon ))$. Construct polynomials as $p$ above for each gate, with error parameter $\epsilon /s$. The probability that any of them is wrong is at most $\epsilon$ by a union bound. To obtain the approximating polynomial for the whole circuit compose all the polynomials together. Since the circuit is of depth $d$, the final degree of the approximating polynomial is $(\log (s) \cdot \log (s/\epsilon ))^{d}$, as desired.

As mentioned at the beginning, this is a distribution on polynomials that computes correctly any input with probability at least $1-\epsilon$. By averaging, there exists a fixed polynomial that computes correctly a $1-\epsilon$ fraction of inputs. $\square$

It can be verified that the value of the polynomial can be larger than $1/\epsilon$. The polynomial for the gates closest to the input can be as large as $s$. Then at the next level it can be as large as $s^{\log s/\epsilon }$, which is already much larger than $1/\epsilon$.

#### 1.3 Approximation 2

Theorem 5.[Linial, Mansour, Nisan ’93 [LMN93]] For all circuits $C$ of size $s$ and depth $d$, for all error values $\epsilon$, there exists a polynomial $p(x_1, \dots , x_n)$ of degree $\log (s/\epsilon )^{O(d)}$ such that

\begin{aligned} \mathbb {E}_{x \leftarrow U_n}[(C(x) - p(x))^2] \leq \epsilon \end{aligned}

The important feature of this approximation is that it bounds the average, but only under the uniform distribution. Because it does not provide any guarantee on other distributions, including $k$-wise independent distributions, it cannot be used directly for our aims.

Remark 6. Approximation 2 is proved via the switching lemma, an influential lemma first proved in the early 80’s by Ajtai [Ajt83] and by Furst, Saxe, and Sipser [FSS84]. The idea is to randomly set a subset of the variables to simplify the circuit. You can do this repeatedly to simplify the circuit even further, but it only works on the uniform distribution. Hastad [Hås87] gave a much tighter analysis of the switching lemma, which is reflected in the parameters of Approximation 2.

#### 1.4 Bounded Independence Fools $\textrm {AC}^0$

Theorem 7.[Braverman [Bra10]] For all circuits $C$ with unbounded fan-in of size $s$ and depth $d$, for all error values $\epsilon$, for all $k$-wise independent distributions $D$ on $\{0,1\}^n$, we have that

\begin{aligned} |\mathbb {E}[C(D)] - \mathbb {E}[C(U_n)]| \leq \epsilon \end{aligned}

for $k = \log (s/\epsilon )^{O(d^2)}$.

Tal [Tal17] later improved the exponent in $k$ to $d$ but this is more complicated to show.

Corollary 8. In particular, if $s = \textrm {poly}(n)$, $d = O(1)$, $s = 1/\textrm {poly}(n)$, then $k = \log ^{O(1)}(n)$ suffices.

The next claim is the ultimate polynomial approximation used to prove the theorem.

Claim 9. For all circuits $C$ with unbounded fan-in of size $s$ and depth $d$, for all error values $\epsilon$, for all $k$-wise independent distributions $D$ on $\{0,1\}^n$, there is a set $E$ of inputs, and a degree-$k$ polynomial $p$ such that:

1. $E$ is ’rare’ under both $D$ and $U_n$:
$\Pr _{x \leftarrow U_n}[E(x) = 1] \leq \epsilon$, and $\Pr _{x \leftarrow D}[E(x) =1] \leq \epsilon$. Here we write $E(x)$ for the indicator function of the event $x \in E$.
2. For all $x$, $p(x) \leq C(x) \vee E(x)$. Here $\vee$ is the logical Or.
3. $\mathbb {E}[p(U_n)] = \mathbb {E}[C(U_n)] \pm \epsilon$.

We only need (1) under $D$, but (1) under $U$ is used to prove (3).

Proof of Theorem 7 from Claim 9.

\begin{aligned} \mathbb {E}[C(D)] & = \mathbb {E}[C(D) \vee E(D)] \pm \epsilon \text {, by Claim.(1)} \nonumber \\ & \geq \mathbb {E}[p(D)] \pm \epsilon \text {, by Claim.(2)} \nonumber \\ & = \mathbb {E}[p(U_n)] \pm \epsilon \text {, because } p \text { has degree } k \text { and } D \text { is } k \text {-wise independent} \nonumber \\ & = \mathbb {E}[C(U_n)] \pm \epsilon \text {, by Claim.(3)} \nonumber \end{aligned}

For the other direction, repeat the argument for ‘not $C$’. $\square$

We can construct the polynomial approximation from Claim 9 by using a combination of Approximation 1 and 2. First we need a little more information about Approximation 1.

Claim 10. Two properties of approximation 1:

1. For all $x$, $p(x) \leq 2^{\log (s/\epsilon )^{O(d)}}$.
2. The ’bad’ set $E$ is computable by a circuit of size $\textrm {poly}(s)$, and depth $d + O(1)$.

Proof of Claim 10 part 2. Consider a single OR gate with input gates $g_1, \dots , g_s$. This is represented in the approximating polynomial by the term

\begin{aligned}1 - \prod _{i=1}^{\textrm {polylog}(s/\epsilon )}(1 - \sum _{j \in S_i} g_j).\end{aligned}

Note that the term is incorrect exactly when the input $g_1, \dots , g_s$ has weight $> 0$ but all the sets $S_i$ intersect $0$ or $\geq 2$ ones. This can be checked in AC$^0$, in parallel for all gates in the circuit. $\square$

Proof of Claim 9. Run approximation 1 for the distribution $\frac {D+U}{2}$, yielding the polynomial $p_c$ and the set $E$. This already proves the first part of the claim for both $D$ and $U$, because if $E$ has probability $\epsilon$ under $D$ it has probability $\ge \epsilon /2$ under $(D+U)/2$, and the same for $U$ . Use Claim 10 part 2, and run approximation 2 on $E$. Call the resulting polynomial $p_E$, which has degree $\log (s/\delta )^{O(d)}$ with error bound $\delta$.

The idea in the ultimate approximating polynomial is to “check if there is a mistake, and if so, output $0$. Otherwise, output $C$”. Formally:

\begin{aligned}p(x) := 1 - (1 - p_c(1 - p_E))^2\end{aligned}

Claim 9 part 2 can be shown as follows. $p(x) \leq 1$ by definition. So, if $C(x) \vee E(x) = 1$, then we are done. Otherwise, $C(x) \vee E(x) = 0$. So there is no mistake, and $C=0$. Hence, by the properties of Approximation 1, $p_c(x) = 0$. This implies $p(x)=0$.

It only remains to show Claim 9 part 3:

\begin{aligned} \mathbb {E}_U[p(x)] = \mathbb {E}_U[C(x)] \pm \epsilon . \end{aligned}

By part 1 of Claim 9,

\begin{aligned} \mathbb {E}_U[C(x) - p(x)] = \mathbb {E}_U[C(x) \vee E(x) - p(x)] \pm \epsilon . \end{aligned}

We can show that this equals

\begin{aligned} \mathbb {E}_U\left [\left (C(x) \vee E(x) - p_c(x)(1-p_E(x))\right )^2\right ] \pm \epsilon \end{aligned}

by the following argument: If $C(x) \vee E(x) =1$ then $1-p(x) = (1-p_c(x)(1-p_E(x)) )^2$ by definition. If $C(x) \vee E(x) =0$, then there is no mistake, and $C(x) = 0$. This implies that $p_c(x)(1 - p_E(x)) = p(x) = 0$.

Let us rewrite the above expression in terms of the expectation $\ell _2$ norm.

\begin{aligned} || C \vee E - p_c(1 - p_E)||_2^2. \end{aligned}

Recall the triangle inequality, which states: $||u - v||_2 \leq ||u - w||_2 + ||w - v||_2$. Therefore, letting $w = p_c(1 - E)$ we have that the above quantity is

\begin{aligned} \leq (|| p_c(1 - E) - p_c(1 - p_E)||_2 ~~ + ~~ ||p_c(1 - E) - C \vee E ||_2)^2 \end{aligned}
\begin{aligned} \leq O(||p_c(1-E) - p_c(1 - p_E)||_2^2 + ||p_c (1 - E) - C \vee E ||_2^2). \end{aligned}

To conclude, we will show that each of the above terms are $\leq \epsilon$. Note that

\begin{aligned} ||p_c(1-E) - p_c(1 - p_E)||_2^2 \leq \max _x |p_c(x)|^2 ||(1 - E) - (1 - p_E)||_2^2. \end{aligned}

By Claim 10 part 1 and Approximation 2, this is at most

\begin{aligned} 2^{\log (s/\epsilon )^{O(d)}} \cdot ||E - p_E||_2^2 \leq 2^{\log (s/\epsilon )^{O(d)}} \cdot \delta . \end{aligned}

For this quantity to be at most $\epsilon$ we set $\delta = \epsilon \cdot 2^{-\log (s/\epsilon )^{O(d)}}$. Here we critically set the error in Approximation 2 much lower, to cancel the large values arising from Approximation 1. By Theorem 5, the polynomial arising from approximation 2 has degree $\log (s/\delta )^{O(d)}= (\log (s/\epsilon )^{O(d)})^{O(d)} = \log (s/\epsilon )^{O(d^2)}$. This is where the “$d^2$” term arises.

Finally, let us bound the other term, $||p_c (1 - E) - C \vee E ||_2^2$. If $E(x) = 0$, then the distance is $0$. If $E(x) =1$, then the distance is $\leq 1$. Therefore, this term is at most $\Pr _U[E(x) =1]$, which we know to be at most $\epsilon$. $\square$

### References

[Ajt83]    Miklós Ajtai. $\Sigma \sp {1}\sb {1}$-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983.

[Baz09]    Louay M. J. Bazzi. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput., 38(6):2220–2272, 2009.

[Bra10]    Mark Braverman. Polylogarithmic independence fools AC$^{\mbox {0}}$ circuits. J. of the ACM, 57(5), 2010.

[FSS84]    Merrick L. Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory, 17(1):13–27, 1984.

[Hås87]    Johan Håstad. Computational limitations of small-depth circuits. MIT Press, 1987.

[LMN93]   Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability. J. of the ACM, 40(3):607–620, 1993.

[LN90]     Nathan Linial and Noam Nisan. Approximate inclusion-exclusion. Combinatorica, 10(4):349–365, 1990.

[Raz87]    Alexander Razborov. Lower bounds on the dimension of schemes of bounded depth in a complete basis containing the logical addition function. Akademiya Nauk SSSR. Matematicheskie Zametki, 41(4):598–607, 1987. English translation in Mathematical Notes of the Academy of Sci. of the USSR, 41(4):333-338, 1987.

[Raz09]    Alexander A. Razborov. A simple proof of Bazzi’s theorem. ACM Transactions on Computation Theory (TOCT), 1(1), 2009.

[Tal17]    Avishay Tal. Tight bounds on the fourier spectrum of AC0. In Conf. on Computational Complexity (CCC), pages 15:1–15:31, 2017.

### Non-Depth-First Search against Independent Distributions on an AND-OR Tree

Authors: Toshio Suzuki
Abstract: Suzuki and Niida (Ann. Pure. Appl. Logic, 2015) showed the following results on independent distributions (IDs) on an AND-OR tree, where they took only depth-first algorithms into consideration. (1) Among IDs such that probability of the root having value 0 is fixed as a given r such that 0 < r < 1, if d is a maximizer of cost of the best algorithm then d is an independent and identical distribution (IID). (2) Among all IDs, if d is a maximizer of cost of the best algorithm then d is an IID. In the case where non-depth-first algorithms are taken into consideration, the counter parts of (1) and (2) are left open in the above work. Peng et al. (Inform. Process. Lett., 2017) extended (1) and (2) to multi-branching trees, where in (2) they put an additional hypothesis on IDs that probability of the root having value 0 is neither 0 nor 1. We give positive answers for the two questions of Suzuki-Niida. A key to the proof is that if ID d achieves the equilibrium among IDs then we can chose an algorithm of the best cost against d from depth-first algorithms. In addition, we extend the result of Peng et al. to the case where non-depth-first algorithms are taken into consideration.

### Predicting Positive and Negative Links with Noisy Queries: Theory & Practice

Authors: Charalampos E. Tsourakakis, Michael Mitzenmacher, Jarosław Błasiok, Ben Lawson, Preetum Nakkiran, Vasileios Nakos
Abstract: Social networks and interactions in social media involve both positive and negative relationships. Signed graphs capture both types of relationships: positive edges correspond to pairs of "friends", and negative edges to pairs of "foes". The edge sign prediction problem, that aims to predict whether an interaction between a pair of nodes will be positive or negative, is an important graph mining task for which many heuristics have recently been proposed [Leskovec 2010].

We model the edge sign prediction problem as follows: we are allowed to query any pair of nodes whether they belong to the same cluster or not, but the answer to the query is corrupted with some probability $0<q<\frac{1}{2}$. Let $\delta=1-2q$ be the bias. We provide an algorithm that recovers all signs correctly with high probability in the presence of noise for any constant gap $\delta$ with $O(\frac{n\log n}{\delta^4})$ queries. Our algorithm uses breadth first search as its main algorithmic primitive. A byproduct of our proposed learning algorithm is the use of $s-t$ paths as an informative feature to predict the sign of the edge $(s,t)$. As a heuristic, we use edge disjoint $s-t$ paths of short length as a feature for predicting edge signs in real-world signed networks. Our findings suggest that the use of paths improves the classification accuracy, especially for pairs of nodes with no common neighbors.

### A Communication-Efficient Distributed Data Structure for Top-k and k-Select Queries

Authors: Felix Biermeier, Björn Feldkord, Manuel Malatyali, Friedhelm Meyer auf der Heide
Abstract: We consider the scenario of $n$ sensor nodes observing streams of data. The nodes are connected to a central server whose task it is to compute some function over all data items observed by the nodes. In our case, there exists a total order on the data items observed by the nodes. Our goal is to compute the $k$ currently lowest observed values or a value with rank in $[(1-\varepsilon)k,(1+\varepsilon)k]$ with probability $(1-\delta)$. We propose solutions for these problems in an extension of the distributed monitoring model where the server can send broadcast messages to all nodes for unit cost. We want to minimize communication over multiple time steps where there are $m$ updates to a node's value in between queries. The result is composed of two main parts, which each may be of independent interest: (1) Protocols which answer Top-k and k-Select queries. These protocols are memoryless in the sense that they gather all information at the time of the request. (2) A dynamic data structure which tracks for every $k$ an element close to $k$.

We describe how to combine the two parts to receive a protocol answering the stated queries over multiple time steps. Overall, for Top-$k$ queries we use $O(k + \log m + \log \log n)$ and for $k$-Select queries $O(\frac{1}{\varepsilon^2} \log \frac{1}{\delta} + \log m + \log^2 \log n)$ messages in expectation. These results are shown to be asymptotically tight if $m$ is not too small.

### Sorting with Recurrent Comparison Errors

Authors: Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, Paolo Penna
Abstract: We present a sorting algorithm for the case of recurrent random comparison errors. The algorithm essentially achieves simultaneously good properties of previous algorithms for sorting $n$ distinct elements in this model. In particular, it runs in $O(n^2)$ time, the maximum dislocation of the elements in the output is $O(\log n)$, while the total dislocation is $O(n)$. These guarantees are the best possible since we prove that even randomized algorithms cannot achieve $o(\log n)$ maximum dislocation with high probability, or $o(n)$ total dislocation in expectation, regardless of their running time.

### Accelerating PageRank using Partition-Centric Processing

Authors: Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna
Abstract: PageRank is a fundamental link analysis algorithm and a key representative of the performance of other graph algorithms and Sparse Matrix Vector (SpMV) multiplication. Calculating PageRank on sparse graphs generates large amount of random memory accesses resulting in low cache line utilization and poor use of memory bandwidth. In this paper, we present a novel Partition-Centric Processing Methodology (PCPM) that drastically reduces the amount of communication with DRAM and achieves high memory bandwidth. Similar to the state of the art Binning with Vertex-centric Gather-Apply-Scatter (BVGAS) method, PCPM performs partition wise scatter and gather of updates with both phases enjoying full cache line utilization. However, BVGAS suffers from random memory accesses and redundant read/write of update values from nodes to their neighbors. In contrast, PCPM propagates single update from source node to all destinations in a partition, thus decreasing redundancy effectively. We make use of this characteristic to develop a novel bipartite Partition-Node Graph (PNG) data layout for PCPM, that enables streaming memory accesses, with very little generation overhead. We perform detailed analysis of PCPM and provide theoretical bounds on the amount of communication and random DRAM accesses. We experimentally evaluate our approach using 6 large graph datasets and demonstrate an average 2.7x speedup in execution time and 1.7x reduction in communication, compared to the state of the art. We also show that unlike the BVGAS implementation, PCPM is able to take advantage of intelligent node labeling that enhances locality in graphs, by further reducing the amount of communication with DRAM. Although we use PageRank as the target application in this paper, our approach can be applied to generic SpMV computation.

### Near Optimal Sketching of Low-Rank Tensor Regression

Authors: Jarvis Haupt, Xingguo Li, David P. Woodruff
Abstract: We study the least squares regression problem \begin{align*} \min_{\Theta \in \mathcal{S}_{\odot D,R}} \|A\Theta-b\|_2, \end{align*} where $\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors. That is, $\Theta$ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^D p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on {\it sparse} random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi A \Theta - \Phi b\|_2$, for which if $\Theta'$ is a near-optimum to the smaller problem, then it is also a near optimum to the original problem. We obtain significantly smaller dimension and sparsity in $\Phi$ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.

### Symmetric Randomized Dependent Rounding

Authors: David G. Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh
Abstract: Dependent rounding is a popular technique in designing approximation algorithms. It allows us to randomly round a fractional solution $x$ to an integral vector $X$ such that $E[X] = x$, all $X_i$'s are ("cylindrically") negatively correlated, and the cardinality constraint $\sum_i X_i = \sum_i x_i$ holds with probability one. One can also essentially replace the cardinality constraint by a general linear constraint $\sum_i a_i X_i = \sum_i a_i x_i$ if the $a_i$'s are non-negative. However, in certain problems, the coefficients $a_i$ may have mixed signs; furthermore, some applications require correlation inequalities that are much more general than negative correlation.

In this paper, we propose a generalized dependent rounding technique, called symmetric randomized dependent rounding, which can round $x$ to an almost-integer vector so that any given linear constraint is satisfied with probability one and such that all the variables are nearly independent. We give two illustrative examples of this technique: a randomized algorithm for the knapsack center problem with new average approximation guarantees and an improved bi-criteria approximation ratio for the knapsack median problem.

### Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization

Authors: C. Cartis, N. I. M. Gould, Ph. L. Toint
Abstract: Necessary conditions for high-order optimality in smooth nonlinear constrained optimization are explored and their inherent intricacy discussed. A two-phase minimization algorithm is proposed which can achieve approximate first-, second- and third-order criticality and its evaluation complexity is analyzed as a function of the choice (among existing methods) of an inner algorithm for solving subproblems in each of the two phases. The relation between high-order criticality and penalization techniques is finally considered, showing that standard algorithmic approaches will fail if approximate constrained high-order critical points are sought.

### Evaluation complexity bounds for smooth constrained nonlinear optimisation using scaled KKT conditions, high-order models and the criticality measure $\chi$

Authors: Coralia Cartis, Nick Gould, Philippe L Toint
Abstract: Evaluation complexity for convexly constrained optimization is considered and it is shown first that the complexity bound of $O(\epsilon^{-3/2})$ proved by Cartis, Gould and Toint (IMAJNA 32(4) 2012, pp.1662-1695) for computing an $\epsilon$-approximate first-order critical point can be obtained under significantly weaker assumptions. Moreover, the result is generalized to the case where high-order derivatives are used, resulting in a bound of $O(\epsilon^{-(p+1)/p})$ evaluations whenever derivatives of order $p$ are available. It is also shown that the bound of $O(\epsilon_P^{-1/2}\epsilon_D^{-3/2})$ evaluations ($\epsilon_P$ and $\epsilon_D$ being primal and dual accuracy thresholds) suggested by Cartis, Gould and Toint (SINUM, 2015) for the general nonconvex case involving both equality and inequality constraints can be generalized to a bound of $O(\epsilon_P^{-1/p}\epsilon_D^{-(p+1)/p})$ evaluations under similarly weakened assumptions. This paper is variant of a companion report (NTR-11-2015, University of Namur, Belgium) which uses a different first-order criticality measure to obtain the same complexity bounds.

### Partially separable convexly-constrained optimization with non-Lipschitzian singularities and its complexity

Authors: Xiaojun Chen, Philippe Toint, Hong Wang
Abstract: An adaptive regularization algorithm using high-order models is proposed for partially separable convexly constrained nonlinear optimization problems whose objective function contains non-Lipschitzian $\ell_q$-norm regularization terms for $q\in (0,1)$. It is shown that the algorithm using an $p$-th order Taylor model for $p$ odd needs in general at most $O(\epsilon^{-(p+1)/p})$ evaluations of the objective function and its derivatives (at points where they are defined) to produce an $\epsilon$-approximate first-order critical point. This result is obtained either with Taylor models at the price of requiring the feasible set to be 'kernel-centered' (which includes bound constraints and many other cases of interest), or for non-Lipschitz models, at the price of passing the difficulty to the computation of the step. Since this complexity bound is identical in order to that already known for purely Lipschitzian minimization subject to convex constraints [CartGoulToin2016], the new result shows that introducing non-Lipschitzian singularities in the objective function may not affect the worst-case evaluation complexity order. The result also shows that using the problem's partially separable structure (if present) does not affect complexity order either. A final (worse) complexity bound is derived for the case where Taylor models are used with a general convex feasible set.

### Acronyms and PHP

Whenever I teach discrete math and use FML to mean Formula the students laugh since its a common acroynm for  Fuck My Life. Now they laugh, and I say I know why you are laughing, I know what it means  and they laugh even harder.

BUT it got me thinking: Pigeonhole Principle! There are more things we want short acroynms for then there are short acroynms. Below are some I thought of. I am sure there are others, in fact I am sure there are websites of such, but I wanted to see which ones I just happen to know.

AMS- American Math Society and much much more:see here

DOA-

Department of Aging. Scary!

ERA-

Earned Run Average in Baseball,

Equal Rights Amendment in politics

PCP-

Phencyclidine, a drug that you should never take.

Prob. Checkable Proofs. Obscure to the pubic but not to us.

IRA-

Irish Republic Army

Internal Retirement Account

Several companies have had rumors they fund terrorism because they were giving their employees IRA's. The headline Company X funds IRA's' could be misunderstood.

SAT-

Standard Aptitute Test

Satisfiability (of Boolena Formulas) Obscure to the pubic but not to us. Actually it may get less obscure as more proofs'' resolving P vs NP come out.

SJW

Single Jewish Female (in classified ads- more on that later). I think SJF is more common.

Social Justice Warrior (sounds like a good thing but might not be)

Classified ads are a source of many acronyms which can be used to teach combinatorics.

{S,M,W,D,G}{B,C,H,J,W}{M,F}

S-single, M-married, W-widowed, D-Divorced, G-Gay (this one I've seen alone making me wonder
about S/M/W/D? I've also seen four-letter acronyms to disambiguate).

B- black, C-Christian, H-Hispanic,  J-Jewish, W-White.

M,F- Male, Female, though I am sure there are ways to say other genders.

Great for combinatorics! especially if you add in other ones (like BD)

WTF-

Wisconsin  Tourism Federation

You know what else it means so I won't say it (this is a G-rated blog). When I first saw it I thought what the fuck?- how could they have screwed up so badly?'

TEACHING TOOL- when teaching PHP (Pigeon hole Principle, not the language PHP which stands for Hypertex PreProcessing, not quite in order, or Personal Home Page) you can use the the fact that

number of concepts GREATER THAN  number of 3-letter combos

leads to some 3-letter combos will be used more than once.

by GASARCH (noreply@blogger.com) at September 21, 2017 11:46 PM UTC

### Beyond Worst-Case Analysis: Lecture 6

from Luca Trevisan

Scribed by Theo McKenzie

In which we study the spectrum of random graphs.

1. Overview

When attempting to find in polynomial time an upper bound certificate on the max cut and maximum independent set of a graph, we have used the following property. If ${G\sim G_{n,\frac{1}{2}}}$, then with high probability ${\|A-\mathop{\mathbb E} (A)\|\leq O(\sqrt{n})}$, where ${\|\cdot\|}$ is the spectral norm. Generally, if ${G\sim G_{n,p}}$ and ${p>\frac{\log n}{n}}$ then w.h.p.

$\displaystyle \|A-\mathop{\mathbb E} (A)\|\leq O(\sqrt{np}).$

Today we will prove how to obtain the bound in Proposition 1 with an extra term of ${\log n}$, as well as show an outline of the method of finding the bound in Proposition 1. We will also show how when ${p}$ is small this bound breaks down, namely how when ${p=\Theta(\frac 1 n)}$,

$\displaystyle \|A-\mathop{\mathbb E} (A)\|\geq\Omega \left(\sqrt{\frac{\log n}{\log\log n}} \right).$

2. Introducing the Trace

Henceforth ${M_{ij}^k}$ signifies ${(M^k)_{ij}}$. Take ${M}$ symmetric and real. All eigenvalues of this matrix are real, and we can enumerate them ${\lambda_1,\lambda_2,\ldots,\lambda_n}$ such that ${|\lambda_1|\geq|\lambda_2|\geq\ldots\geq |\lambda_n|}$.

The trace ${\textnormal{Tr}(A)}$ is defined to be ${\textnormal{Tr}(A)=\sum_{i=1}^n A_{ii}}$ where ${A}$ is an ${n\times n}$ matrix.

Moreover we know that \textnormal{Tr}${(A)=\sum_{1=1}^n \lambda_i}$. If we take ${k}$ large and even, the eigenvalues of ${M^k}$ are ${|\lambda_1|^k\geq|\lambda_2|^k\geq\ldots\geq |\lambda_n|^k}$. Therefore we have

$\displaystyle \sum_i M^k_{ii}=\text{Tr}(M^k)=\sum_i |\lambda_i|^k\geq |\lambda_1|^k=\|M\|^k.$

Moreover we have

$\displaystyle \textnormal{Tr}(M^k)^{\frac{1}{k}}=\left(\sum_i |\lambda_i|^k\right)^\frac 1 k\leq n^{\frac{1}{k}}|\lambda_1|= n^{\frac{1}{k}}\|M\|.$

This gives us an estimation of the norm, ${\|M\|\leq (\sum_i M^k_{ii})^\frac{1}{k}\leq n^{\frac1 k}\|M\|}$, which for ${k>\log n}$ gives a constant factor approximation of ${\|M\|}$.

3. Using the Trace to Bound the Spectral Norm

Assume that ${G\sim G_{n,\frac{1}{2}}}$ and ${A}$ is the adjacency matrix of ${G}$. We will prove the following. ${\displaystyle{\mathop{\mathbb E}_{G\sim G_{n,\frac{1}{2}}}}(\textnormal{Tr}((A-\mathop{\mathbb E} (A))^k))}$ is bounded above by ${2^{O(k)}n^{1+k/2}k^{k/2}}$. If ${k>\log n}$, by taking the ${k}$th root we achieve a bound of ${O(\sqrt{n\log n})}$ on ${\|A-\mathop{\mathbb E} (A)\|}$.

3.1. Expected Value of Matrix Entries

First, we examine the matrix ${M=2A-2\mathop{\mathbb E} (A)}$. We have ${M_{ii}=0}$ and ${M_{ij}\in\{\pm1\}}$ with equal probability of each when ${i\neq j}$. Moreover ${M_{ij}=M_{ji}}$. If ${i\neq j,\mathop{\mathbb E}( M_{ij}^k)=0}$ if ${k}$ is odd and ${\mathop{\mathbb E} (M_{ij}^k)=1}$ for ${k}$ even.

${\mathop{\mathbb E} (\sum_i M^k_{ii})=n\mathop{\mathbb E} (M^k_{11})}$ by the linearity of expectation and symmetry between the entries. We evalute ${M^k_{11}}$.

$\displaystyle M^k_{11}=\sum_{\{i_1,\ldots,i_{k-1}\}} M_{1i_1}M_{i_1i_2}\cdots M_{i_{k-1,1}}$

where ${{i_1,\ldots i_{k-1}}}$ represents the intermediate steps on a “path” between vertices that starts at 1 and returns to 1. For example, ${M_{11}^2=\sum M_{1i}M_{i1}}$. Note that we can repeat edges in these paths. By the linearity of expectation

$\displaystyle \mathop{\mathbb E} (M_{11}^k)=\sum_{\{i_1,\ldots,i_{k-1}\}}\mathop{\mathbb E} (M_{1i_1}M_{i_1i_2}\cdots M_{i_{k-1,1}}).$

If any pair ${\{i,j\}}$ occurs ${\ell}$ times in the sequence of pairs ${\{ 1,i_1\}, \{ i_1, i_2 \}, \ldots, \{ i_{k-1}, 1 \}}$, where ${\ell}$ is odd, then as the value of this term is independent from all other terms and ${\mathop{\mathbb E} M^\ell_{ij}=0}$ for odd ${\ell}$, then ${\mathop{\mathbb E} (M_{1i_1}M_{i_1i_2}\cdots M_{i_{k-1}1})=0}$. If all pairs occur an even number of times, their product’s expectation is 1. Therefore ${\mathop{\mathbb E} (M_{11}^k)}$ is the number of sequences ${i_1,\ldots,i_{k-1}\in V^{k-1}}$ such that, in the sequence of pairs ${\{ 1,i_1\}, \{ i_1, i_2 \}, \ldots, \{ i_{k-1}, 1 \}}$, each pair occurs an even number of times.

3.2. Encoding argument

In order to give an upper bound on the number of such sequences, we will show how to encode a sequence where there are ${m}$ distinct edges. In the sequence ${i_1,\ldots i_{k-1}}$, the element ${i_j}$ is represented either as ${(0,i_j)}$, which takes ${1 + \log n}$ bits, if ${i_ji}$ appears for the first time in the sequence at location ${j}$, and as ${(0,\ell)}$ otherwise, where ${\ell < j}$ is such that ${i_\ell = i_j}$, which requires ${1 + \log k}$ bits. Notice that, if ${i_j}$ occurs for the first time at location ${j}$, then the pair ${\{ i_{j-1}, i_j \}}$ also occurs for the first time at the location ${j-1}$ and ${j}$. Thus the number of times that we encounter a vertex for the first time is at most the number of distinct edges. If we have ${t}$ distinct vertices (other than vertex 1), then we are using ${k + t \log n + (k-t) \log k}$; for ${k, this value increases with ${t}$, but we have ${t \leq m \leq k/2}$ (because every edge has to appear an even number of times and so there can be at most ${k/2}$ distinct edges. This means that we use at most ${k + \frac k2 \log n + \frac k2 \log k}$ bits in the encoding. The number of strings that can be encoded using at most ${L}$ bits is ${2^{L+1} }$. If we assume ${k, we have the bound ${\mathop{\mathbb E}(M_{11}^k)\leq k^\frac{k}{2}n^{\frac{k}{2}}2^{k+1}}$, meaning

$\displaystyle \textnormal{Tr}(M)=n\mathop{\mathbb E}( M_{11}^k)\leq n^{1+\frac{k}{2}}2^{k+1} k^\frac{k}{2}.$

Therefore using suitable ${k}$ and ${t}$ we achieve our bound on ${\|M\|}$. For example, choose ${k=\log n}$ and ${t=10\sqrt{n}\sqrt{\log n}}$. We use Markov’s inequality to obtain

$\displaystyle \mathbf{P}(\|M\|>t)=\mathbf{P}(\|M\|^k>t^k)\leq \frac{\mathop{\mathbb E}\|M\|^k}{t^k}\leq\left(\frac{2n^{\frac{1}{k}}\sqrt n \sqrt k }{t}\right)^k\leq e^{-\Omega(\log n)}\rightarrow 0.$

4. Tightening the Bound

To obtain the sharper bound of ${O(\sqrt n)}$, we need to count the number of pairs more sharply and remove the ${k^{\frac{k}{2}}}$ term, namely improve the way we talk about repetitions. Here we give an outline for how to find a tighter bound.

The worst case in the above analysis is when the number of distinct vertices (not counting vertex ${1}$) is maximal, which is ${k/2}$. In that case, the number of distinct “edges” ${\{ i_j, i_{j+1} \}}$ is ${k/2}$, and they must form a connected graph over ${1 + k/2}$ vertices, that is, they have to form a tree. Furthermore, each edges is repeated exactly twice in the closed walk, otherwise we would not have enough distinct edges to connect ${1+k/2}$ distinct vertices.

If the pairs form a tree, then the only way we can have closed walk in which every edge is repeated twice is that the closed walk is a depth-first visit of the tree. In this case, we can improve our encoding in the following way. In a depth-first visit of a tree only two events are possible at each step: either we discover a new vertex, or we backtrack on the edge between the current node and the parent node. Thus we only need to pay ${1 + \log n}$ bits to encode a new node in the sequence and ${1}$ bit to encode an already seen node, and we obtain a bound of ${2^{\frac{k}{2}+\frac k 2 \log n + \frac k 2}= 2^kn^\frac k 2}$. By taking the ${k}$th root we obtain a bound on ${\|M\|}$ of ${O(\sqrt n)}$.

5. Generalizing to any ${p}$

Now assume ${G\sim G_{n,p}}$ and ${A}$ is the adjacency matrix of ${G}$. We also assume ${p<\frac{1}{2}}$. We define

$\displaystyle M=A-\mathop{\mathbb E}(A).$

In this matrix ${M_{ii}=0}$ and if ${i\neq j, M_{i,j}=1-p}$ with probability ${p}$ and ${-p}$ with probability ${1-p}$. Therefore ${\mathop{\mathbb E} (M_{ij})=0, \mathop{\mathbb E} (M_{ij}^2)=p-p^2\leq p}$. In fact, ${\mathop{\mathbb E} (M_{ij}^k)\leq p}$ for all ${k\geq 1}$.

From this we see we need to sum over sequences such that the multiset has each pair occuring at least two times, as if any pair occurs once, the expectation is ${0}$.

Therefore the bound is

$\displaystyle \mathop{\mathbb E} (M_{11}^k)\leq \sum_{i_1,\ldots i_{k-1}} p^\ell$

where ${\ell}$ is the number of distinct pairs and the sum is taken over multisets where each pair occurs at least twice. For large ${\ell}$, the number of sequences where each pair occurs at least twice with ${\ell}$ distinct pairs is approximately ${2^{O(\ell)}n^\ell}$. This would give us

$\displaystyle \sum_{i_1,\ldots i_{k-1}}p^\ell=\sum_\ell p^\ell 2^{O(\ell)}n^\ell\leq O(p^\frac{k}{2}2^{O(k)}n^{\frac{k}{2}})$

so the bound on ${\|M\|}$ is ${O(\sqrt{np})}$. However, the bound on the number of sequences with ${\ell}$ distict pairs breaks down when ${\ell}$ is much smaller than ${k}$. In a full proof much more complicated calculations must be done.

6. Problems with sparse graphs

If ${p=\Theta(\frac{1}{n})}$ , then ${\|A-\mathop{\mathbb E}(A)\|\geq \Omega\left(\sqrt\frac{\log n}{\log\log n}\right)}$ w.h.p.

This breaks down the nice bound we obtained in section 5. This follows from the irregularity of sparse graphs. There will be isolated vertices and vertices with degree much higher than average.

Lemma 1 If ${p=\Theta(\frac{1}{n})}$ then w.h.p. the highest degree vertex of ${G}$ is of order ${\Theta\left(\frac{\log n}{\log \log n}\right)}$.

If G has a node of degree ${\geq d}$, then, for every ${p< \frac 1 {4\sqrt d}}$, ${\lambda_{\max} (A-pJ) \geq\Omega(\sqrt d)}$. This implies that ${\forall 0< p < . \frac 1 {4\sqrt d}, \ \|A-pJ\|\geq \Omega(\sqrt d)}$.

Proof: We have

$\displaystyle \lambda_{\max}(A-pJ)=\max_{{\bf x}}\frac{{\bf x}^T(A-pJ){\bf x}}{\|{\bf x}\|^2}$

where the maximum is taken over all nonzero vectors ${{\bf x}}$. Call ${v}$ a node of degree ${\geq d}$ and call ${d}$ of its neighbors ${u_1,\ldots, u_d}$.

Consider the vector ${{\bf x}}$ such that ${x_v=1, x_{u_i}=\frac{1}{\sqrt d}}$ and ${x_w=0}$ for other vertices ${w}$. We have

$\displaystyle {\bf x}^TA{\bf x}\geq 2 \sqrt d$

$\displaystyle {\bf x}^TpJ{\bf x}=p\cdot \left(\sum x_i\right)^2=p\cdot (1+\sqrt d)^2\leq 4pd$

$\displaystyle || {\bf x}||^2=2$

Therefore if ${p\leq \frac 1{4\sqrt d}}$,

$\displaystyle \frac{{\bf x}^T(A-pJ){\bf x}}{\|{\bf x}\|^2}\geq \sqrt d - \frac 12 \sqrt d=\Omega(\sqrt d)$

yielding the desired bound.

$\Box$

Theorem 4 proceeds immediately from Proposition 5 and Lemma 1.

### TR17-142 | On small-depth Frege proofs for Tseitin for grids | Johan Hastad

from ECCC papers

We prove that a small-depth Frege refutation of the Tseitin contradiction on the grid requires subexponential size. We conclude that polynomial size Frege refutations of the Tseitin contradiction must use formulas of almost logarithmic depth.

### TCS+ talk: Wednesday, September 27th — Yuval Peres, Microsoft Research

The next TCS+ talk will take place next Wednesday, September 27th at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 17:00 UTC). Yuval Peres (Microsoft Research) will will speak about “Trace reconstruction for the deletion channel” (abstract below).

Please make sure you reserve a spot for your group to join us live by signing up on the online form. As usual, for more information about the TCS+ online seminar series and the upcoming talks, or to suggest a possible topic or speaker, please see the website.

Abstract: In the trace reconstruction problem, an unknown string $x$ of $n$ bits is observed through the deletion channel, which deletes each bit with some constant probability $q$, yielding a contracted string. How many independent outputs (traces) of the deletion channel are needed to reconstruct $x$ with high probability?

The best lower bound known is linear in $n$. Until recently, the best upper bound was exponential in the square root of $n$. We improve the square root to a cube root using statistics of individual output bits and some complex analysis; this bound is sharp for reconstruction algorithms that only use this statistical information. (Similar results were obtained independently and concurrently by De, O’Donnell and Servedio). If the string $x$ is random and $q<1/2$, we can show a subpolynomial number of traces suffices by comparison to a biased random walk.

(Joint works with Fedor Nazarov, STOC 2017 and with Alex Zhai, FOCS 2017).

by plustcs at September 21, 2017 06:43 AM UTC

### Drawing Graphs on Few Circles and Few Spheres

Authors: Myroslav Kryven, Alexander Ravsky, Alexander Wolff
Abstract: Given a drawing of a graph, its visual complexity is defined as the number of geometrical entities in the drawing, for example, the number of segments in a straight-line drawing or the number of arcs in a circular-arc drawing (in 2D). Recently, Chaplick et al. introduced a different measure for the visual complexity, the affine cover number, which is the minimum number of lines (or planes) that together cover a straight-line drawing of a graph $G$ in 2D (3D). In this paper, we introduce the spherical cover number, which is the number of circles (or spheres) that together cover a circular-arc drawing in 2D (or 3D). It turns out that spherical covers are sometimes significantly smaller than affine covers. Moreover, there are highly symmetric graphs that have symmetric optimum spherical covers but apparently no symmetric optimum affine cover. For complete, complete bipartite, and platonic graphs, we analyze their spherical cover numbers and compare them to their affine cover numbers as well as their segment and arc numbers. We also link the spherical cover number to other graph parameters such as chromatic number, treewidth, and linear arboricity.

### Efficient Graph Edit Distance Computation and Verification via Anchor-aware Lower Bound Estimation

Authors: Lijun Chang, Xing Feng, Xuemin Lin, Lu Qin, Wenjie Zhang
Abstract: Graph edit distance (GED) is an important similarity measure adopted in a similarity-based analysis between two graphs, and computing GED is a primitive operator in graph database analysis. Partially due to the NP-hardness, the existing techniques for computing GED are only able to process very small graphs with less than 30 vertices. Motivated by this, in this paper we systematically study the problems of both GED computation, and GED verification (i.e., verify whether the GED between two graphs is no larger than a user-given threshold). Firstly, we develop a unified framework that can be instantiated into either a best-first search approach AStar+ or a depth-first search approach DFS+. Secondly, we design anchor-aware lower bound estimation techniques to compute tighter lower bounds for intermediate search states, which significantly reduce the search spaces of both AStar+ and DFS+. We also propose efficient techniques to compute the lower bounds. Thirdly, based on our unified framework, we contrast AStar+ with DFS+ regarding their time and space complexities, and recommend that AStar+ is better than DFS+ by having a much smaller search space. Extensive empirical studies validate that AStar+ performs better than DFS+, and show that our AStar+-BMa approach outperforms the state-of-the-art technique by more than four orders of magnitude.

### Complexity of Finding Perfect Bipartite Matchings Minimizing the Number of Intersecting Edges

Authors: Grzegorz Guśpiel
Abstract: Consider a problem where we are given a bipartite graph H with vertices arranged on two horizontal lines in the plane, such that the two sets of vertices placed on the two lines form a bipartition of H. We additionally require that H admits a perfect matching and assume that edges of H are embedded in the plane as segments. The goal is to compute the minimal number of intersecting edges in a perfect matching in H. The problem stems from so-called token swapping problems, introduced by Yamanaka et al. [3] and generalized by Bonnet, Miltzow and Rz\k{a}\.zewski [1]. We show that our problem, equivalent to one of the special cases of one of the token swapping problems, is NP-complete.

### Parameterization of configuration space obstacles in three-dimensional rotational motion planning

Authors: Przemysław Dobrowolski
Abstract: This study investigates the exact geometry of the configuration space in three-dimensional rotational motion planning. A parameterization of configuration space obstacles is derived for a given triangulated or ball-approximated scene with an object rotating around a given point. The results obtained are an important step towards a complete description of the rich structure of a configuration space in three-dimensional rotational motion planning.

### 6th International Optimisation Summer School

from CS Theory Events

January 14-19, 2018 NSW south coast, Australia http://site-1239031-1892-6439.strikingly.com A week by the beach learning about optimisation from leading expertsFiled under: school

by shacharlovett at September 20, 2017 10:05 AM UTC

### The 2nd Winter School in Engineering and Computer Science on Formal Verification

from CS Theory Events

December 17-21, 2017 Jerusalem, Israel http://ias.huji.ac.il/CSE2 Submission deadline: October 1, 2017Filed under: school

by shacharlovett at September 20, 2017 09:56 AM UTC

### Drama Therapy: A Math Viewpoint

from Richard Lipton

What is drama therapy?

Kathryn Farley obtained her PhD from Northwestern University in performance studies in 2007. After almost a decade working in that area, she has just started a Master’s program at New York University in a related field called drama therapy (DT).

I thought today I would talk about the math aspects of DT.

Okay so why should I report on DT here? It seems to have nothing in common with our usual topics. But I claim that it does, and I would like to make the case that it is an example of a phenomenon that we see throughout mathematics.

So here goes. By the way—to be fair and transparent—I must say that I am biased about Dr. Farley, since she is my wonderful wife. So take all I say with some reservations.

The whole point is that understanding what DT is hard, at least for me. But when I realized that it related to math it became much clearer to me, and I hope that it may even help those in DT to see what they do in a new light. It’s the power of math applied not to physics, not to biology, not to economics, but to a social science. Perhaps I am off and it’s just another example of “when you have a hammer, the whole world looks like a nail.” Oh, well.

## DT

I asked Kathryn for a summary of DT and here it is:

Drama therapy uses methods from theatre and performance to achieve therapeutic goals. Unlike traditional “talk” therapy, this new therapeutic method involves people enacting scenes from their own lives in order to express hidden emotions, gain valuable insights, solve problems and explore healthier behaviors. There are many types of DT, but most methods rely on members of a group acting as therapeutic agents for each other. In effect, the group functions as a self-contained theatre company, playing all the roles that a performance requires—playwright, director, actors, stagehands, and audience. The therapist functions as a producer, setting up the context for each scene and soliciting feedback from the audience.

## Math View Of DT

Kathryn’s summary of DT is clear and perhaps I should stop here and forget about linking it to math. But I think there is a nice connection that I would like to make.

Since Kathryn is a student again, and students are assigned readings—there is a lot of reading in DT—you may imagine that she has been sharing with me a lot of thoughts on her readings and classes. I have listened carefully to her, but honestly it was only the other day, in a cab going to Quad Cinema down on 13th St., that I had the “lightblub moment.” I suddenly understood what she is studying. Perhaps riding in a cab helps one listen: maybe that has been studied before by those in cognitive studies.

What I realized during that cab ride is that DT is an example of a generalization of another type of therapy. If the other therapy involves ${N=2}$ people–including the therapist—then DT is the generalization to ${N>2}$. We see this all the time in math, but it really helped me to see that the core insight—in my opinion—is that DT has simply moved ${N}$ from ${2}$ to ${3}$ or more.

We this type of generalization all the time in math. For example, in communication complexity the basic model is two players sending each other messages. The generalization to more players creates very different behavior. Another example is the rank of a matrix. This is a well understood notion: easy to compute and well behaved. Yet simply changing from a two-dimensional matrix to a three-dimensional tensor changes everything. Now the behavior is vastly more complex and the rank function is no longer known to be easy to compute.

## An Example

Here is an example of how DT could work—it is based on a case study Kathryn told me about.

Consider Bob who is seeing Alice who is Bob’s therapist. Alice is trained in some type of therapy that she uses via conversations with Bob to help him with some issue. This can be very useful if done correctly.

What DT is doing in letting ${N}$ be ${3}$ or more is a huge step. We see this happen all the time in mathematics—-finally the connection. Let’s look at Bob and Alice once more. Now Alice is talking with Bob about an issue. To be concrete let’s assume that Bob’s issue is this:

Bob has been dating two women. His dilemma is, which one should he view as a marriage prospect? He thinks both would go steady with him but they are very different in character. Sally is practical, solid, and interesting; Wanda is interesting too but a bit wild and unpredictable. Whom should he prefer?

The usual talk therapy would probably have Alice and Bob discuss the pros and cons. Hopefully Alice would ask the right question to help Bob make a good decision.

The DT approach would be quite different. Alice would have at least one other person join them to discuss Bob’s decision. This would change the mode from direct “telling” to a more indirect story-line. In that line it might emerge that Bob’s mother is a major factor in his decision—even though she passed away long ago. It might come out that his mom divorced his dad when he was young because he was too staid and level-headed. Perhaps this would make it clear to Bob that his mother was really the reason he was even considering Wanda, the wild one.

What is so interesting here is that by using more that just Bob, by setting ${N = 3}$, Alice can make the issues much more viivid for Bob.

## ${N}$ is the Root

The more I think about it, the idea of ${N \geq 3}$ people involved is the root. Naturally anything with more than two people transits from dialogue to theater. So the aspect of `drama’ is not primordial—it is emergent. Once you say ${N \geq 3}$, what goes down as Drama Therapy in the textbooks flows logically and sensibly—at least it does to me now.

This is accompanied by a phase change in complexity and richness. As such it parallels ways we have talked about mathematical transitions from the case of ${2}$ to ${3}$ on the blog before. Maybe DT even implements a strategy I heard from Albert Meyer:

Prove the theorem for ${3}$ and then let ${3}$ go to infinity.

## Open Problems

Does this connection help? Does it make any sense at all?

by rjlipton at September 20, 2017 03:11 AM UTC

### BIOS ORAM: Improved Privacy-Preserving Data Access for Parameterized Outsourced Storage

Authors: Michael T. Goodrich
Abstract: Algorithms for oblivious random access machine (ORAM) simulation allow a client, Alice, to obfuscate a pattern of data accesses with a server, Bob, who is maintaining Alice's outsourced data while trying to learn information about her data. We present a novel ORAM scheme that improves the asymptotic I/O overhead of previous schemes for a wide range of size parameters for client-side private memory and message blocks, from logarithmic to polynomial. Our method achieves statistical security for hiding Alice's access pattern and, with high probability, achieves an I/O overhead that ranges from $O(1)$ to $O(\log^2 n/(\log\log n)^2)$, depending on these size parameters, where $n$ is the size of Alice's outsourced memory. Our scheme, which we call BIOS ORAM, combines multiple uses of B-trees with a reduction of ORAM simulation to isogrammic access sequences.

### Inference in Graphical Models via Semidefinite Programming Hierarchies

Authors: Murat A. Erdogdu, Yash Deshpande, Andrea Montanari
Abstract: Maximum A posteriori Probability (MAP) inference in graphical models amounts to solving a graph-structured combinatorial optimization problem. Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with $n$ vertices require solving an SDP with $n^{\Theta(d)}$ variables where $d$ is the degree in the hierarchy. In practice, for $d\ge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency. Firstly, in analogy to BP and its variants, we only introduce decision variables corresponding to contiguous regions in the graphical model. Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style method, and develop a sequential rounding procedure. We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses. Finally, for specific graph types, we establish a sufficient condition for the tightness of the proposed partial SOS relaxation.

### Tilt Assembly: Algorithms for Micro-Factories That Build Objects with Uniform External Forces

Authors: Aaron T. Becker, Sándor P. Fekete, Phillip Keldenich, Dominik Krupke, Christian Rieck, Christian Scheffer, Arne Schmidt
Abstract: We present algorithmic results for the parallel assembly of many micro-scale objects in two and three dimensions from tiny particles, which has been proposed in the context of programmable matter and self-assembly for building high-yield micro-factories. The underlying model has particles moving under the influence of uniform external forces until they hit an obstacle; particles can bond when being forced together with another appropriate particle. Due to the physical and geometric constraints, not all shapes can be built in this manner; this gives rise to the Tilt Assembly Problem (TAP) of deciding constructibility. For simply-connected polyominoes $P$ in 2D consisting of $N$ unit-squares ("tiles"), we prove that TAP can be decided in $O(N\log N)$ time. For the optimization variant MaxTAP (in which the objective is to construct a subshape of maximum possible size), we show polyAPX-hardness: unless P=NP, MaxTAP cannot be approximated within a factor of $\Omega(N^{\frac{1}{3}})$; for tree-shaped structures, we give an $O(N^{\frac{1}{2}})$-approximation algorithm. For the efficiency of the assembly process itself, we show that any constructible shape allows pipelined assembly, which produces copies of $P$ in $O(1)$ amortized time, i.e., $N$ copies of $P$ in $O(N)$ time steps. These considerations can be extended to three-dimensional objects: For the class of polycubes $P$ we prove that it is NP-hard to decide whether it is possible to construct a path between two points of $P$; it is also NP-hard to decide constructibility of a polycube $P$. Moreover, it is expAPX-hard to maximize a path from a given start point.

### Improving spliced alignment for identification of ortholog groups and multiple CDS alignment

Authors: Jean-David Aguilar, Safa Jammali, Esaie Kuitche, Aïda Ouangraoua
Abstract: The Spliced Alignment Problem (SAP) that consists in finding an optimal semi-global alignment of a spliced RNA sequence on an unspliced genomic sequence has been largely considered for the prediction and the annotation of gene structures in genomes. Here, we re-visit it for the purpose of identifying CDS ortholog groups within a set of CDS from homologous genes and for computing multiple CDS alignments. We introduce a new constrained version of the spliced alignment problem together with an algorithm that exploits full information on the exon-intron structure of the input RNA and gene sequences in order to compute high-coverage accurate alignments. We show how pairwise spliced alignments between the CDS and the gene sequences of a gene family can be directly used in order to clusterize the set of CDS of the gene family into a set of ortholog groups. We also introduce an extension of the spliced alignment problem called Multiple Spliced Alignment Problem (MSAP) that consists in aligning simultaneously several RNA sequences on several genes from the same gene family. We develop a heuristic algorithmic solution for the problem. We show how to exploit multiple spliced alignments for the clustering of homologous CDS into ortholog and close paralog groups, and for the construction of multiple CDS alignments. An implementation of the method in Python is available on demande to SFA@USherbrooke.ca.

Keywords: Spliced alignment, CDS ortholog groups, Multiple CDS alignment, Gene structure, Gene family.

### Crossing Patterns in Nonplanar Road Networks

Authors: David Eppstein, Siddharth Gupta
Abstract: We define the crossing graph of a given embedded graph (such as a road network) to be a graph with a vertex for each edge of the embedding, with two crossing graph vertices adjacent when the corresponding two edges of the embedding cross each other. In this paper, we study the sparsity properties of crossing graphs of real-world road networks. We show that, in large road networks (the Urban Road Network Dataset), the crossing graphs have connected components that are primarily trees, and that the remaining non-tree components are typically sparse (technically, that they have bounded degeneracy). We prove theoretically that when an embedded graph has a sparse crossing graph, it has other desirable properties that lead to fast algorithms for shortest paths and other algorithms important in geographic information systems. Notably, these graphs have polynomial expansion, meaning that they and all their subgraphs have small separators.

### Graphs with sparse crossings

from David Eppstein

ACM SIGSPATIAL 2017 (to be held in Southern California in November) doesn’t yet seem to have posted a list of accepted papers, but two of them are mine (one long, one short). I already posted about the short one, “Defining Equitable Geographic Districts in Road Networks via Stable Matching”, so let me just say a little about the other one, now up on the arXiv: “Crossing Patterns in Nonplanar Road Networks” (with UCI grad student Sid Gupta, arXiv:1709.06113).

To begin with, road networks are graphs with a vertex at each intersection of roads and an edge for each segment of road between the intersections. We tend to think of them as being planar graphs, but they’re not. Road segments cross each other, often without an intersection, when one segment is part of an overpass, underpass, or tunnel. Additionally, some pairs of road segments can look like they’re crossing in our data, even when they don’t cross in real life, because the data makes a segment of road look straight when actually it bends around the end of another road.

But it would be nice if these graphs were planar (even if they aren’t), because then we could apply all the fast algorithms researchers have developed for planar graphs. Many of these algorithms don’t depend on the detailed properties of planarity, but only on the planar separator theorem. So it would be nice if these graphs obeyed a similar separator theorem.

And they do! Or at least, the graphs that fit the model of nonplanar road networks that we use in our new paper do have good separators. Our idea is that most of the time when roads cross they do so in isolation; a crossing road segment would often only cross one other road segment. If that were true everywhere, we’d get the 1-planar graphs, but there are a few segments that cross multiple others (think of the long tunnels under Boston, or complicated freeway interchanges such as the one in Dallas above). Even in those cases, though, the collection of road segments that cross each other has a restricted patterns. In a long tunnel, for instance, a single segment may cross many others, but each of those other crossed segments will typically be much shorter and have no other crossings.

We can understand these crossing patterns more clearly by drawing another graph, the crossing graph, which has a vertex for each road segment and an edge connecting it to each other road segment that it crosses. Most vertices are isolated (they don’t cross anything) and most edges are isolated (they connect two road segments that cross each other and nothing else). There are occasional larger connected components, but these still have a simple structure. We thought going into this research that they would all be trees; that’s not true, but they do all have low degeneracy. That means, in any cluster of crossing road segments, at least one of the segments is crossed only a small number of times. Remove that segment, and the remaining segments still have one that’s crossed only few times, and so on. We tested the degeneracy of the road networks of many cities, and we never saw a number higher than 6 (in Buenos Aires). For most cities the number was even smaller, 3 or 4. And the number of non-tree clusters of crossing road segments was always a small fraction of all crossings. We tried including only the crossings we new were real (by using the identification of some road segments as bridges or tunnels in the data we used) or looking at all the crossings, even the artificial ones caused by differences between the straight segments of the data and the curvature of the actual road segments; it didn’t make much difference to our overall results.

We also show (by using the crossing number inequality) that road networks with -degenerate crossing graphs are sparse (they have edges and crossings). It follows that they have small separators, inherited from the separators of their planarizations. So I think the graphs with degenerate crossing graphs make a good model to look for algorithms on: not so restrictive (like planar or 1-planar graphs) that it eliminates the real-world graphs we’re trying to model, but restrictive enough to have properties useful in algorithm design.

(G+)

by David Eppstein at September 19, 2017 11:07 PM UTC

### TR17-141 | A Family of Dictatorship Tests with Perfect Completeness for 2-to-2 Label Cover | Joshua Brakensiek, Venkatesan Guruswami

from ECCC papers

We give a family of dictatorship tests with perfect completeness and low-soundness for 2-to-2 constraints. The associated 2-to-2 conjecture has been the basis of some previous inapproximability results with perfect completeness. However, evidence towards the conjecture in the form of integrality gaps even against weak semidefinite programs has been elusive. Our result provides some indication of the expressiveness and non-triviality of 2-to-2 constraints, given the close connections between dictatorship tests and satisfiability and approximability of CSPs.

### TR17-140 | An improvement of the algorithm of Hertli for the unique 3SAT problem | Osamu Watanabe, Tong Qin

from ECCC papers

We propose a simple idea for improving the randomized algorithm of Hertli for the Unique 3SAT problem.

### Sunsetting ICORE

When friends remember me, I am happy to go to distant lands. I managed to travel to Israel to give a talk at the ICORE day.  I-CORE is Israeli Center for Research Excellence program, and the day marked the end of the funding period. Israel has great talent, so any money the govt puts in finds great use; in  this case, it seems to have filled a severe need, providing much-needed support for postdocs.

I talked about Heavy Hitters (HHs), a bit of the classical Count-Min stuff but also much less sculpted stuff like high dimensional HHs, H-influence and other topics. I find HHs interesting because it is not top k, it is a distinct concept, and a concept that represents what is (often the only thing) possible within resource constraints. Robi asked a great question, if this can be formalized.

There were many good talks. I managed to catch a bit of the talks by Rotem Oshman, Michael Shapira, Shahar Dobzinski, and others. I also managed to catch Yuval Ishai and Eylon Yagev talk secure multiparty computing and complexity theory resp in Hebrew. Finally, Bernard Haeupler gave an excellent  talk on using shortcuts to break natural bottlenecks with message passing algorithms. This talk introduced the audience to principled theory methods (was happy to see Leighton-Maggs-Rao O(Congestion+Dilation) result reenter the psyche) to attack worst case performance of message passing algorithms.

ps: Thanks to Moni for mentioning Mossel's paradox with dice/6 over dinner, he said there was an elegant solution, and my neurons stayed awake figuring it out.

by metoo (noreply@blogger.com) at September 19, 2017 05:54 AM UTC

### TR17-139 | A ZPP^NP[1] Lifting Theorem | Thomas Watson

from ECCC papers

The complexity class ZPP^NP[1] (corresponding to zero-error randomized algorithms with access to one NP oracle query) is known to have a number of curious properties. We further explore this class in the settings of time complexity, query complexity, and communication complexity. For starters, we provide a new characterization: ZPP^NP[1] equals the restriction of BPP^NP[1] where the algorithm is only allowed to err when it forgoes the opportunity to make an NP oracle query. Using the above characterization, we prove a query-to-communication lifting theorem, which translates any ZPP^NP[1] decision tree lower bound for a function f into a ZPP^NP[1] communication lower bound for a two-party version of f. As an application, we use the above lifting theorem to prove that the ZPP^NP[1] communication lower bound technique introduced by Goos, Pitassi, and Watson (ICALP 2016) is not tight. We also provide a "primal" characterization of this lower bound technique as a complexity class.

### A problem I thought was interesting- now...

On Nate Silver's page he sometimes (might be once a week) has a column edited by Oliver Roeder of problems. Pretty much math problems though sometimes not quite.

Some are math problems that I have seen before (e.g., hat problems). I don't bother submitting since that would just be goofy. I would be  ringer.

Some are math problems that I have not seen before, I try to do, I fail, but read the answer and am enlightened. I call that a win.

But some are math problems that I have not seen before, I try to do, I fail, but when I see the solution its a computer simulation or something else that isn't quite as interesting as I had hoped.

I describe one of those now; however, I ask if it can be made more interesting.

The problems is from this column: here

I paraphrase:  Let A be the numbers {1,2,3,...,100}.  A sequence is nice if  (1) it begins with any number in A, (2) every number is from A and is either a factor of multiple of the number just before it, and (3) no number can appear more than once.  Find the LONGEST nice sequence

Example of a nice sequence:

4, 12, 24, 6, 60, 30, 10, 100, 25, 5, 1, 97

I worked on it

1) By hand I came up with a nice sequence of length 42. This was FUN! You can either have fun trying to find a long nice sequence or you can look at mine here.

2) I tried to prove that it was optimal, hoping that either I would find its optimal or be guided to a longer sequence. Neither happened. More important is that this was NOT FUN.

3) I looked forward to the solution that would be in the next column and would be enlightening.

4) The next column, which did have the solution, is here! The answer was a sequence of length 77 found by a program that also verified there was no longer sequence. The sequence itself was mildly enlightening in that I found some tricks I didn't know about, but the lack of a real lower bound proof was disappointing.

They mentioned that this is a longest path problem (Graph is {1,..,100} edges are between numbers that are either multiples of factors) and that such problems are NP-complete. That gave the impression that THIS problem is hard since its a case of an NP-complete problem. Thats not quite right- its possible that this type of graph has a quick solution.

But I would like YOU the readers to help me turn lemon into lemonade.

1) Is there a short proof that 77 is optimal?  Is there a short proof that (say) there is no sequence of length 83.  I picked 83 at random.  One can easily prove there is no sequence of length 100.

2) Is the following problem in P or NPC or if-NPC-then-bad-thing-happen:

Given (n,k) is there a nice sequence of {1,...,n} of length at least k. (n is in binary, k is in unary, so that the problem is in NP.)

I suspect not NPC.

3) Is the following problem in P or NPC or ...

Given a set of numbers A and a number k, is there a nice sequence of elements of A of length at least k (k in unary).

Might be NPC if one can code any graph into such a set.

Might be in P since the input has a long length.

4) Is the following solvable: given a puzzle in the Riddler, determine ahead of time if its going to be interesting? Clyde Kruskal and I have a way to solve this- every even numbered column I read the problem and the solution and tell him if its interesting, and he does the same for odd number columns.

by GASARCH (noreply@blogger.com) at September 18, 2017 04:36 AM UTC

### The Orthogonal Vectors Conjecture for Branching Programs and Formulas

Authors: Daniel Kane, Ryan Williams
Abstract: In the Orthogonal Vectors (OV) problem, we wish to determine if there is an orthogonal pair of vectors among $n$ Boolean vectors in $d$ dimensions. The OV Conjecture (OVC) posits that OV requires $n^{2-o(1)}$ time to solve, for all $d=\omega(\log n)$. Assuming the OVC, optimal time lower bounds have been proved for many prominent problems in $P$.

We prove that OVC is true in several computational models of interest:

* For all sufficiently large $n$ and $d$, OV for $n$ vectors in $\{0,1\}^d$ has branching program complexity $\tilde{\Theta}(n\cdot \min(n,2^d))$. In particular, the lower bounds match the upper bounds up to polylog factors.

* OV has Boolean formula complexity $\tilde{\Theta}(n\cdot \min(n,2^d))$, over all complete bases of $O(1)$ fan-in.

* OV requires $\tilde{\Theta}(n\cdot \min(n,2^d))$ wires, in formulas comprised of gates computing arbitrary symmetric functions of unbounded fan-in.

Our lower bounds basically match the best known (quadratic) lower bounds for any explicit function in those models. Analogous lower bounds hold for many related problems shown to be hard under OVC, such as Batch Partial Match, Batch Subset Queries, and Batch Hamming Nearest Neighbors, all of which have very succinct reductions to OV.

The proofs use a certain kind of input restriction that is different from typical random restrictions where variables are assigned independently. We give a sense in which independent random restrictions cannot be used to show hardness, in that OVC is false in the "average case" even for $AC^0$ formulas:

* For every fixed $p \in (0,1)$ there is an $\epsilon_p > 0$ such that for every $n$ and $d$, OV instances where input bits are independently set to $1$ with probability $p$ (and $0$ otherwise) can be solved with $AC^0$ formulas of size $O(n^{2-\epsilon_p})$, on all but a $o_n(1)$ fraction of instances.

### On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity in Computational Geometry

Authors: Ryan Williams
Abstract: Point location problems for $n$ points in $d$-dimensional Euclidean space (and $\ell_p$ spaces more generally) have typically had two kinds of running-time solutions:

* (Nearly-Linear) less than $d^{poly(d)} \cdot n \log^{O(d)} n$ time, or

* (Barely-Subquadratic) $f(d) \cdot n^{2-1/\Theta(d)}$ time, for various $f$.

For small $d$ and large $n$, "nearly-linear" running times are generally feasible, while "barely-subquadratic" times are generally infeasible. For example, in the Euclidean metric, finding a Closest Pair among $n$ points in ${\mathbb R}^d$ is nearly-linear, solvable in $2^{O(d)} \cdot n \log^{O(1)} n$ time, while known algorithms for Furthest Pair (the diameter of the point set) are only barely-subquadratic, requiring $\Omega(n^{2-1/\Theta(d)})$ time. Why do these proximity problems have such different time complexities? Is there a barrier to obtaining nearly-linear algorithms for problems which are currently only barely-subquadratic?

We give a novel exact and deterministic self-reduction for the Orthogonal Vectors problem on $n$ vectors in $\{0,1\}^d$ to $n$ vectors in ${\mathbb Z}^{\omega(\log d)}$ that runs in $2^{o(d)}$ time. As a consequence, barely-subquadratic problems such as Euclidean diameter, Euclidean bichromatic closest pair, ray shooting, and incidence detection do not have $O(n^{2-\epsilon})$ time algorithms (in Turing models of computation) for dimensionality $d = \omega(\log \log n)^2$, unless the popular Orthogonal Vectors Conjecture and the Strong Exponential Time Hypothesis are false. That is, while poly-log-log-dimensional Closest Pair is in $n^{1+o(1)}$ time, the analogous case of Furthest Pair can encode larger-dimensional problems conjectured to require $n^{2-o(1)}$ time. We also show that the All-Nearest Neighbors problem in $\omega(\log n)$ dimensions requires $n^{2-o(1)}$ time to solve, assuming either of the above conjectures.

### The Dominating Set Problem in Geometric Intersection Graphs

Authors: Mark de Berg, Sándor Kisfaludi-Bak, Gerhard Woeginger
Abstract: We study the parameterized complexity of dominating sets in geometric intersection graphs. In one dimension, we investigate intersection graphs induced by translates of a fixed pattern Q that consists of a finite number of intervals and a finite number of isolated points. We prove that Dominating Set on such intersection graphs is polynomially solvable whenever Q contains at least one interval, and whenever Q contains no intervals and for any two point pairs in Q the distance ratio is rational. The remaining case where Q contains no intervals but does contain an irrational distance ratio is shown to be NP-complete and contained in FPT (when parameterized by the solution size). In two and higher dimensions, we prove that Dominating Set is contained in W[1] for intersection graphs of semi-algebraic sets with constant description complexity. This generalizes known results from the literature. Finally, we establish W[1]-hardness for a large class of intersection graphs.

### Technical Report: Accelerating Dynamic Graph Analytics on GPUs

Authors: Mo Sha, Yuchen Li, Bingsheng He, Kian-Lee Tan
Abstract: As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.

### Weighted and locally bounded list-colorings in split graphs, cographs, and partial k-trees

Authors: Cédric Bentz
Abstract: For a fixed number of colors, we show that, in node-weighted split graphs, cographs, and graphs of bounded tree-width, one can determine in polynomial time whether a proper list-coloring of the vertices of a graph such that the total weight of vertices of each color equals a given value in each part of a fixed partition of the vertices exists. We also show that this result is tight in some sense, by providing hardness results for the cases where any one of the assumptions does not hold. The edge-coloring variant is also studied, as well as special cases of cographs and split graphs.

### Geometric clustering in normed planes

Authors: Pedro Martín, Diego Yáñez
Abstract: Given two sets of points $A$ and $B$ in a normed plane, we prove that there are two linearly separable sets $A'$ and $B'$ such that $\mathrm{diam}(A')\leq \mathrm{diam}(A)$, $\mathrm{diam}(B')\leq \mathrm{diam}(B)$, and $A'\cup B'=A\cup B.$ This extends a result for the Euclidean distance to symmetric convex distance functions. As a consequence, some Euclidean $k$-clustering algorithms are adapted to normed planes, for instance, those that minimize the maximum, the sum, or the sum of squares of the $k$ cluster diameters. The 2-clustering problem when two different bounds are imposed to the diameters is also solved. The Hershberger-Suri's data structure for managing ball hulls can be useful in this context.