# Theory of Computing Blog Aggregator

### Do you KNOW how you KNOW what you KNOW? I don't KNOW.

When watching Jeopardy with Darling, if I get a question correct that is NOT in my usual store of knowledge (that is, NOT Ramsey Theory, NOT Vice Presidents, NOT Satires of Bob Dylan), Darling asks me How did you know that? I usually reply I do not know how I knew that. Recently I DID know, and I'll get to that later, but for now the question arises: Do you know how you know what you know?
1. As an undergrad I learned mostly from taking courses. Hence I could say things like I know Group Theory from a course I had in Abstract Algebra in the Fall of 1978 (Side Note- I know why I should care about groups from reading the algorithm for graph isom for graphs of bounded degree---in 1988). I learned a few things on my own- I learned that a connected graph is Eulerian iff every vertex has even degree from a Martin Gardner article. But since most of my knowledge was from courses, I knew how I knew what I knew.
2. As a grad student I still took courses but more routes to knowledge emerged. Papers! I could say things like
I know the oracle constructions about P vs NP because I read the Baker-Gill-Solovay paper on October 23, 1981. It helps that Oct 23 is Weird Al's birthday. But even here things get a bit murky- someone TOLD ME about the paper which led me to read it, but I don't recall who. So one more route to knowledge emerged- people telling you stuff in the hallways.
3. I saw Anil Nerode give a talk on Recursive Mathematics and that day went to the library (ask your grandmother what a library is) and read some articles on it. This was well timed- I knew enough recursion theory and combinatorics to read up on recursive combinatorics. In this case I know exactly how I know what I know. Might be the last time.
4. As a professor I read papers, hear talks, hear things in hallways, and learn stuff. It's getting harder to know how I know things, but to some extent I still could. Until...
5. THE WEB. The Web is the main reason I don't know how I know things. I sometimes tell Darling I read it on the web which is (a) prob true, and (b) prob not very insightful.
So- do you know how you know what you know?

On Jeopardy recently the final Jeopardy question was as follows.

TOPIC: Island Countries.
ANSWER: No longer Western, this one-word nation has moved to the west side of the International Date Line to join Asia and Australia.
BILL: What is SAMOA!?
Darling wondered how I knew that:
DARLING: How did you know that? Is there a Ramsey Theorist in Samoa?
BILL: Not that I know of, but that's a good guess as to how I knew that. Actually Lance had a blog post Those Happy Samoans about Samoa going over the International Date Line and losing the advantage of having more time to work on their conference submissions.
DARLING: Too bad there isn't a Ramsey Theorist there to take advantage of that!

Thanks Lance!- In this one case I know how I know what I know!

by GASARCH (noreply@blogger.com) at May 21, 2013 05:00 PM UTC

from Gil Kalai

## Enlightenment at a red traffic light

### Wolf Prize laureate Prof. George Daniel Mostow made his greatest scientific breakthrough while driving.

Haaretz tells the story of how Dan Mostow reached his breakthrough known as Mostow’s rigidity theorem.

Congratulations, Dan!

## French-Israeli Meeting and Günterfest

More updates: If you are in Paris on Wednesday and Thursday this week, there will be a lovely French-Israeli interacademic meeting on mathematics. The program is very interesting, and I will give a talk quite similar to my recent MIT talk on quantum computers.

Over the weekend we will celebrate Günter Ziegler’s 50th birthday in Berlin. Günter started very, very young, so we had to wait a long time for this.

by Gil Kalai at May 21, 2013 04:16 PM UTC

### On the Importance and Risks in Using Sub-Reviewers

I promised to post from time to time about the FOCS 2013 PC work (to demystify the process). So here is a quick update: We got 280 submissions (well, 281, but one was just a bad joke). This is up (by more than 10%) from FOCS 2012, but on the other hand our PC is also on the large side, so the number of papers assigned to each PC member is less than 40, which for FOCS/STOC is very reasonable (I reduced this number a bit by encouraging several authors of misplaced papers to withdraw their submissions).

The first stage of the work was assigning each paper to three PC members (more challenging than I expected); then came the individual review stage. In this stage, each PC member reviews his/her assigned papers without discussion with other PC members. This way, each paper gets three relatively independent reviews. We are now ending this stage, and it is a good opportunity to talk about using sub-reviewers.

A PC member can ask an expert in the specific sub-area of a paper for a review (and we then refer to such an expert as a sub-reviewer). The purpose of sub-reviewers is not to lighten the load of PC members (though this could be seen as a positive byproduct), but rather to elicit expertise the PC is missing. One disadvantage of sub-reviewers which I will not consider here is the additional work for the rest of the community. This is a consideration, but hopefully an expert would be interested in reviewing papers that are very close to his/her expertise (and this would not be too much work).

It is my experience that a major source of mistakes for PCs is in the way they use external experts. One fear is an over-confident PC that believes it can judge every work on its own. True enough, “in the good old days” PCs were self-sufficient (especially in the pre-Internet era). But TOC today is much deeper and much wider than “in the good old days.” It is easy to make mistakes if you do not have a good understanding of related work. On the other hand, PCs that rely on their sub-reviewers too much are in just as serious a problem. The risk is turning FOCS/STOC into a collection of mini-conferences run by separate sub-communities. This is completely opposite to the purpose of FOCS/STOC: promoting TOC as a joint community and fostering the flow of ideas. For that to happen, accepted papers must be of interest to larger fractions of the community. The ideal solution in my eyes is to rely heavily on external experts, but to view these reviews as inputs to the PC work rather than as outsourcing of the PC work. I therefore asked each PC member to familiarize himself/herself with his/her assigned papers and to reach a separate opinion. I know this is an ideal view, but setting the right ideals could be important.

by Omer Reingold at May 21, 2013 03:22 PM UTC

### §8.5: Abelian groups

from Ryan O'Donnell

The previous section covered the case of $f \in L^2(\Omega^n, \pi^{\otimes n})$ with $|\Omega| = 2$; there, we saw it could be helpful to look at explicit Fourier bases. When $|\Omega| \geq 3$ this is often not helpful, especially if the only “operation” on the domain is equality. For example, if $f : \{\mathsf{Red}, \mathsf{Green}, \mathsf{Blue}\}^n \to {\mathbb R}$ then it’s best to just work abstractly with the orthogonal decomposition. However if there is a notion of, say, “addition” in $\Omega$ then there is a natural, canonical Fourier basis for $L^2(\Omega, \pi)$ when $\pi$ is the uniform distribution.

More precisely, suppose the domain $\Omega$ is a finite abelian group $G$, with operation $+$ and identity $0$. We will consider the domain $G$ under the uniform probability distribution $\pi$; this is quite natural because $\pi$ is translation-invariant: $\pi(X) = \pi(t + X)$ for any $X \subseteq G$, $t \in G$. In this setting it is more convenient to allow functions with range the complex numbers; thus we come to the following definition:

Definition 50 Let $G$ be a finite abelian group with operation $+$ and identity $0$. For $n \in {\mathbb N}^+$ we write $L^2(G^n)$ for the complex inner product space of functions $f : G^n \to {\mathbb C}$, with inner product $\langle f, g \rangle = \mathop{\bf E}_{{\boldsymbol{x}} \sim G^n}[f({\boldsymbol{x}})\overline{g({\boldsymbol{x}})}].$ Here and throughout this section ${\boldsymbol{x}} \sim G^n$ denotes that ${\boldsymbol{x}}$ is drawn from the uniform distribution on $G^n$.

Everything we have done in this chapter for the real inner product space $L^2(\Omega^n, \pi^{\otimes n})$ generalizes easily to the case of a complex inner product; the main difference is that Plancherel’s Theorem becomes $\langle f, g \rangle = \sum_{\alpha \in {\mathbb N}^n_{< m}} \widehat{f}(\alpha) \overline{\widehat{g}(\alpha)} = \sum_{S \subseteq [n]} \langle f^{=S}, g^{=S} \rangle.$ See the exercises for more.

A natural Fourier basis for $L^2(G)$ comes from a natural family of functions $G \to {\mathbb C}$, namely the characters. These are defined to be the group homomorphisms from $G$ to ${\mathbb C}^\times$, where ${\mathbb C}^\times$ is the abelian group of nonzero complex numbers under multiplication.

Definition 51 A character of the (finite) group $G$ is a function $\chi : G \to {\mathbb C}^\times$ which is a homomorphism; i.e., satisfies $\chi(x + y) = \chi(x) \chi(y)$. Since $G$ is finite there is some $m \in {\mathbb N}^+$ such that $0 = x + x + \cdots + x$ ($m$ times) for each $x \in G$. Thus $1 = \chi(0) = \chi(x)^m$, meaning the range of $\chi$ is in fact contained in the $m$th roots of unity. In particular, $|\chi(x)| = 1$ for all $x \in G$.

We have the following easy facts:

Fact 52 If $\chi$ and $\phi$ are characters of $G$ then so are $\overline{\chi}$ and $\phi \cdot \chi$.

Proposition 53 Let $\chi$ be a character of $G$. Then either $\chi \equiv 1$ or $\mathop{\bf E}[\chi] = 0$.

Proof: If $\chi \not \equiv 1$, pick some $y \in G$ such that $\chi(y) \neq 1$. Since ${\boldsymbol{x}} + y$ is uniformly distributed on $G$ when ${\boldsymbol{x}} \sim G$, $\mathop{\bf E}_{{\boldsymbol{x}} \sim G}[\chi({\boldsymbol{x}})] = \mathop{\bf E}_{{\boldsymbol{x}} \sim G}[\chi({\boldsymbol{x}}+y)] = \mathop{\bf E}_{{\boldsymbol{x}} \sim G}[\chi({\boldsymbol{x}})\chi(y)] = \chi(y)\mathop{\bf E}_{{\boldsymbol{x}} \sim G}[\chi({\boldsymbol{x}})].$ Since $\chi(y) \neq 1$ it follows that $\mathop{\bf E}[\chi({\boldsymbol{x}})]$ must be $0$. $\Box$

Proposition 54 The set of all characters of $G$ is orthonormal. (As a consequence, $G$ has at most $\dim(L^2(G)) = |G|$ characters.)

Proof: First, if $\chi$ is a character then $\langle \chi, \chi \rangle = \mathop{\bf E}[|\chi|^2] = 1$ because $|\chi| \equiv 1$. Next, if $\phi$ is another character distinct from $\chi$ then $\langle \phi, \chi \rangle = \mathop{\bf E}[\phi \cdot \overline{\chi}]$. But $\phi \cdot \overline{\chi}$ is a character by Fact 52, and $\phi \cdot \overline{\chi} = \phi/\chi \not \equiv 1$ because $\phi$ and $\chi$ are distinct; here we used $\overline{\chi} = 1/\chi$ because $|\chi| \equiv 1$. Thus $\langle \phi, \chi \rangle = 0$ by Proposition 53. $\Box$

As we will see next, $G$ in fact has exactly $|G|$ characters. It thus follows from Proposition 54 that the set of all characters (which includes the constant $1$ function) constitutes a Fourier basis for $L^2(G)$.

To check that each finite abelian group $G$ has $|G|$ distinct characters, we begin with the case of a cyclic group, ${\mathbb Z}_m$ for some $m$. In this case we know that every character’s range will be contained in the $m$th roots of unity.

Definition 55 Fix an integer $m \geq 2$ and write $\omega$ for the $m$th root of unity $\exp(2\pi i/m)$. For $0 \leq j < m$, we define $\chi_j : {\mathbb Z}_m \to {\mathbb C}$ by $\chi_j(x) = \omega^{jx}$. It is easy to see that these are distinct characters of ${\mathbb Z}_m$.

Thus the functions $\chi_0 \equiv 1, \chi_1, \dots, \chi_{m-1}$ form a Fourier basis for $L^2({\mathbb Z}_m)$. Furthermore, Proposition 13 tells us that we can get a Fourier basis for $L^2({\mathbb Z}_m^n)$ by taking all products of these functions.

Definition 56 Continuing Definition 55, let $n \in {\mathbb N}^+$. For $\alpha \in {\mathbb N}_{< m}^n$ we define $\chi_\alpha : {\mathbb Z}_m^n \to {\mathbb C}$ by $\chi_{\alpha}(x) = \prod_{j =1}^n \chi_{\alpha_j}(x_j).$ These functions are easily seen to be (all of the) characters of the group ${\mathbb Z}_m^n$, and they constitute a Fourier basis of $L^2({\mathbb Z}_m^n)$.
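
A small numerical sketch of Definitions 55 and 56 (an illustration only; the helper names are not from the text): it builds the characters $\chi_\alpha$ of ${\mathbb Z}_m^n$ as vectors of values and verifies that they are orthonormal with respect to the inner product of Definition 50.

```python
import itertools
import numpy as np

def characters(m, n):
    """Return a dict mapping each multi-index alpha in {0,...,m-1}^n to the
    character chi_alpha of Z_m^n, evaluated as a length-m^n complex vector
    over all inputs x in {0,...,m-1}^n (in lexicographic order)."""
    omega = np.exp(2j * np.pi / m)                 # primitive m-th root of unity
    inputs = list(itertools.product(range(m), repeat=n))
    chars = {}
    for alpha in itertools.product(range(m), repeat=n):
        # chi_alpha(x) = prod_j omega^(alpha_j * x_j) = omega^(<alpha, x>)
        chars[alpha] = np.array([omega ** sum(a * x for a, x in zip(alpha, xs))
                                 for xs in inputs])
    return chars

def inner(f, g):
    """<f, g> = E_x[f(x) * conj(g(x))] under the uniform distribution."""
    return np.mean(f * np.conj(g))

if __name__ == "__main__":
    m, n = 3, 2
    chars = characters(m, n)
    for alpha, beta in itertools.product(chars, repeat=2):
        ip = inner(chars[alpha], chars[beta])
        expected = 1.0 if alpha == beta else 0.0
        assert abs(ip - expected) < 1e-9
    print("all", len(chars), "characters of Z_%d^%d are orthonormal" % (m, n))
```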

Most generally, by the Fundamental Theorem of Finitely Generated Abelian Groups we know that any finite abelian $G$ is a direct product of cyclic groups of prime-power order. In the exercises you are asked to check that you get all of the characters of $G$ — and hence a Fourier basis for $L^2(G)$ — by taking all products of the associated cyclic groups’ characters. In the remainder of the section we mostly stick to groups of the form ${\mathbb Z}_m^n$ for simplicity.

Returning to the characters $\chi_0, \dots, \chi_{m-1}$ from Definition 55, it is easy to see (using $\omega^m = 1$) that they satisfy $\chi_{j} \cdot \chi_{j'} = \chi_{j + j' \pmod m}$ and also $1/\chi_{j} = \overline{\chi_j} = \chi_{-j \pmod m}$. Thus the characters themselves form a group under multiplication, isomorphic to ${\mathbb Z}_m$. As in Chapter 3.2, we index them using the notation $\widehat{{\mathbb Z}_m}$. More generally, indexing the Fourier basis/characters of $L^2({\mathbb Z}_m^n)$ by $\widehat{{\mathbb Z}_m^n}$ instead of multi-indices, we have:

Fact 57 The characters $(\chi_{\alpha})_{\alpha \in \widehat{{\mathbb Z}_m^n}}$ of ${\mathbb Z}_m^n$ form a group under multiplication:

• $\chi_{\alpha} \cdot \chi_{\beta} = \chi_{\alpha + \beta}$,
• $1/\chi_{\alpha} = \overline{\chi_{\alpha}} = \chi_{-\alpha}$.

As mentioned, the salient feature of $L^2(G)$ distinguishing it from other spaces $L^2(\Omega, \pi)$ is that there is a notion of addition on the domain. This means that convolution plays a major role in its analysis. We generalize the definition from the setting of ${\mathbb F}_2^n$:

Definition 58 Let $f, g \in L^2(G)$. Their convolution is the function $f * g \in L^2(G)$ defined by \begin{equation*} (f * g)(x) = \mathop{\bf E}_{\boldsymbol{y} \sim G}[f(\boldsymbol{y}) g(x - \boldsymbol{y})] = \mathop{\bf E}_{\boldsymbol{y} \sim G}[f(x-\boldsymbol{y})g(\boldsymbol{y})]. \end{equation*}

In the exercises you are asked to check that convolution is associative and commutative, and that the following generalization of Theorem 1.28 holds:

Theorem 59 Let $f, g \in L^2(G)$. Then $\widehat{f * g}(\alpha) = \widehat{f}(\alpha) \widehat{g}(\alpha)$.
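
Here is a quick numerical sanity check of Theorem 59 (again just an illustration, with my own helper names): convolve two random functions on ${\mathbb Z}_m$ and compare the Fourier coefficients on both sides.

```python
import numpy as np

def convolve(f, g):
    """(f*g)(x) = E_y[f(y) g(x-y)] on Z_m, with f, g given as length-m arrays."""
    m = len(f)
    return np.array([np.mean([f[y] * g[(x - y) % m] for y in range(m)])
                     for x in range(m)])

def fourier_coeff(f, alpha):
    """hat f(alpha) = <f, chi_alpha> = E_x[f(x) * conj(omega^(alpha x))]."""
    m = len(f)
    omega = np.exp(2j * np.pi / m)
    return np.mean([f[x] * np.conj(omega ** (alpha * x)) for x in range(m)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 7
    f = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    g = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    h = convolve(f, g)
    for alpha in range(m):
        assert abs(fourier_coeff(h, alpha)
                   - fourier_coeff(f, alpha) * fourier_coeff(g, alpha)) < 1e-9
    print("verified hat(f*g) = hat(f) * hat(g) on Z_%d" % m)
```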

We conclude this section by mentioning vector space domains. When doing Fourier analysis over the group ${\mathbb Z}_m^n$, it is natural for subgroups to arise. Things are simplest when the only subgroups of ${\mathbb Z}_m$ are the trivial ones, $\{0\}$ and ${\mathbb Z}_m$; in this case, all subgroups will be isomorphic to ${\mathbb Z}_m^{n'}$ for some $n' \leq n$. Of course, this simple situation occurs if and only if $m$ is equal to some prime $p$. In that case, ${\mathbb Z}_p$ can be thought of as a field, ${\mathbb Z}_p^n$ as an $n$-dimensional vector space over this field, and its subgroups as subspaces. We use the notation ${\mathbb F}_p^n$ in this setting and write $\widehat{{\mathbb F}_p^n}$ to index the Fourier basis/characters; this generalizes the notation introduced for $p = 2$ in Chapter 3.2. Indeed, all of the notions from Chapters 3.2 and 3.3 regarding affine subspaces and restrictions thereto generalize easily to $L^2({\mathbb F}_p^n)$.

by Ryan O'Donnell at May 21, 2013 12:58 PM UTC

The grades are in for CS 124.  Hooray!

Interesting trend: freshmen, who make up a small fraction of the class, are highly over-represented in the A and A- grades.  This has been happening for some years now.
Extension of the interesting trend: women freshmen*, who make up a smaller fraction of the class, are even more highly over-represented in the A and A- grades.

I'd be very excited if I were teaching the class again next year -- finding undergraduates who have the potential to be TAs for multiple years is golden -- but I'll be sure to pass the names on to the new guy.**

Other side of the trend: 2nd semester seniors performed substantially below average this year.
But on the positive side: nobody did so badly they won't graduate.  (At least, not in my class.)

Getting grades in really makes the class feel over.  Onto summer work!  (Research and writing...)

* Freshwomen if you prefer.
** And any freshman or sophomore in the class who got an A and is interested in TAing should let me know next November or so.

by Michael Mitzenmacher (noreply@blogger.com) at May 21, 2013 03:05 AM UTC

### Cheap fountain pen comparison

from David Eppstein

For the past few years I've been carrying around a small notebook to jot down notes from talks, research ideas, expenses, etc. (I guess I could do it on my phone but I prefer pen and paper.) I like writing with fountain pens, but don't care to carry around an expensive one: I don't feel I need to impress people with how much I spend on accessories, and I don't want to worry about losing it. I had been using disposable Pilot Varsity pens, but occasionally (especially if I took them on airplanes) I would end up with ink all over my fingers, so I recently tried switching to an inexpensive refillable pen, the Lamy Safari (fine point, with refilling attachment and Noodler's forest green ink). Below is a comparison of how my notes look with the two pens:

The Lamy feels less scratchy than the Pilots, but I didn't really mind the scratchiness. The more important differences in the way it feels to me are that the Lamy's bigger size fits my hand better and lets me hold my fingers a little farther away from the tip (making inky fingers less likely), and its triangular cross-section gives me a more consistent orientation, so I find myself holding the pen the wrong way less often. The writing comes out less blotchy, with better control of the line weight, and I don't feel that I have to write as large to make it legible. There's also significantly less bleed-through on the Moleskine square-ruled paper I've been using, although it's still nonzero.

I still don't know if these pens are more reliable in the air, but on the whole I'd say this is a big step up in writing quality.

### The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative Type Metrics into $\ell_1$

Authors: Subhash A. Khot, Nisheeth K. Vishnoi
Abstract: In this paper, we disprove a conjecture of Goemans and Linial; namely, that every negative type metric embeds into $\ell_1$ with constant distortion. We show that for an arbitrarily small constant $\delta> 0$, for all large enough $n$, there is an $n$-point negative type metric which requires distortion at least $(\log\log n)^{1/6-\delta}$ to embed into $\ell_1.$

Surprisingly, our construction is inspired by the Unique Games Conjecture (UGC) of Khot, establishing a previously unsuspected connection between probabilistically checkable proof systems (PCPs) and the theory of metric embeddings. We first prove that the UGC implies a super-constant hardness result for the (non-uniform) Sparsest Cut problem. Though this hardness result relies on the UGC, we demonstrate, nevertheless, that the corresponding PCP reduction can be used to construct an "integrality gap instance" for Sparsest Cut. Towards this, we first construct an integrality gap instance for a natural SDP relaxation of Unique Games. Then we "simulate" the PCP reduction and "translate" the integrality gap instance of Unique Games to an integrality gap instance of Sparsest Cut. This enables us to prove a $(\log \log n)^{1/6-\delta}$ integrality gap for Sparsest Cut, which is known to be equivalent to the metric embedding lower bound.

### Correcting self-intersecting polygons using minimal memory A simple and efficient algorithm and thoughts on line-segment intersection algorithms

Authors: Jean Souviron
Abstract: While well-known methods to list the intersections of either a list of segments or a complex polygon aim at achieving optimal time complexity, they often do so at the cost of memory consumption and complex code. Real-life software optimisation, however, lies in optimising speed and memory usage at the same time, as well as keeping code simple. This paper first presents some thoughts on the available algorithms in terms of memory usage, leading to a very simple scan-line-based algorithm aiming at answering that challenge. Although sub-optimal in terms of speed, it is optimal if both speed and memory space are taken together and is very easy to implement. For N segments and k intersections it uses only N additional integers and lists the intersections in O(N^1.26) or corrects them in O((N+k) N^0.26) at most, on average. It is therefore well adapted for inclusion in larger software and seems like a good compromise. The worst case is O(N^2). Then the paper focuses on differences between available methods and the brute-force algorithm, and a solution is proposed. Although sub-optimal, its applications could mainly be to answer in a fast way a number of scattered unrelated intersection queries using minimal complexity and additional resources.

### Efficient c-planarity testing algebraically

Authors: Radoslav Fulek, Jan Kynčl, Igor Malinović, Dömötör Pálvölgyi
Abstract: We generalize the strong Hanani-Tutte theorem to flat clustered graphs with two disjoint clusters and to clustered graphs with connected clusters, thereby obtaining a new polynomial time algorithm for testing c-planarity in these cases. We also give a new and short proof for a result by Di Battista and Frati about efficient c-planarity testing of an embedded flat clustered graph with small faces based on the matroid intersection algorithm.

### Realistic cost for the model of coherent computing

Authors: Akira SaiToh
Abstract: For the model of so-called coherent computing recently proposed by Yamamoto et al. [Y. Yamamoto et al., New Gen. Comput. 30 (2012) 327-355], a theoretical analysis of the success probability is given. Although it was claimed as their prospect that the Ising spin configuration problem would be efficiently solvable in the model, here it is shown that the probability of finding a desired spin configuration exponentially decreases in the problem size for hard instances. The model is thus physically unfeasible for solving the problem within a polynomial cost.

### Implementation of linear maps with circulant matrices via modulo 2 rectifier circuits of bounded depth

Authors: Igor S. Sergeev
Abstract: In the present note we show that for any positive integer k an arbitrary Boolean circulant matrix can be implemented via modulo 2 rectifier circuit of depth 2k-1 and complexity O(n^{1+1/k}), and also via circuit of depth 2k and complexity O(n^{1+1/k} log^{-1/k} n).

### Improved Approximation Algorithms for Connected Domatic Partitions and Related Problems

Authors: Alina Ene, Nitish Korula, Ali Vakilian
Abstract: A set of vertices in a graph is a dominating set if every vertex outside the set has a neighbor in the set. A dominating set is connected if the subgraph induced by its vertices is connected. The connected domatic partition problem asks for a partition of the nodes into connected dominating sets. The connected domatic number of a graph is the size of a largest connected domatic partition and it is a well-studied graph parameter with applications in the design of wireless networks. In this paper, we give improved approximation algorithms for constructing large connected domatic partitions and packings. Let $n$ be the number of nodes and let $k$ be the vertex connectivity of the graph. The fractional connected domatic number is at most $k$, and our algorithms construct partitions and packings whose sizes are proportional to $k$. Some of our main contributions are the following:

\begin{itemize}

\item An algorithm for constructing a connected domatic partition of size $\Omega(k/\ln^2{n})$ provided that $k = \Omega(\ln^3{n})$. This improves a recent result of \cite{CHGK} that gives a $\Omega(k/\ln^5{n})$-sized partition.

\item An algorithm for constructing a fractional connected domatic packing of size $\Omega(k)$ for node-capacitated planar and minor-closed families of graphs.

\item An algorithm for constructing an edge-connected domatic partition of size $\Omega(k/\ln{n})$.

\item An improved sampling theorem for analyzing the vertex connectivity of a graph under node sampling.

\end{itemize}

### On Paul Erdos and Kim Kardashian

from Luca Trevisan

This year is the centennial of Paul Erdős’s birth. Erdős lived most of his adult life as a traveling mathematician, “couchsurfing,” as we would say now, from place to place and from mathematical conference to mathematical conference. He wrote more than 1,500 papers with more than 500 different coauthors, introduced the probabilistic method and was the defining figure of the “Hungarian approach” to combinatorics. He died at age 83 while attending a mathematical conference.

Last year, we celebrated the centennial of Alan Turing’s birth. Turing and Erdős have become such iconic figures both for the impact of their work and for the fascinating facts of their lives. I would like to argue that the cultural archetype through which we interpret their lives is that of the saint. It is clearly that of the martyr saint in the case of Turing, while Erdős gave up material possessions and devoted his life to others, traveling everywhere and “preaching” to everybody, much in the mold of Saint Francis.

(A comparison of the Turing centennial celebration and Erdős’s, and a look at the frescoes of Medieval Catholic churches will show which kind of saint people are more interested in.)

The first step to become a saint of the Catholic church is to establish that the person exhibited “heroic virtues,” which is a great expression. This is an archetype that is not restricted to religion: you see it occurring in communist propaganda (Stakhanov, Lei Feng) and in every civil rights movement.

Saints were the “celebrities” of the Middle Ages, those whose life people liked to talk about. But contemporary celebrities come from a totally different archetype, that of the Greek God. Greek (and Roman) gods were petty and jealous, they cheated on their spouses, they were terrible parents, but there were good stories to be told about them. We don’t want (at least, I don’t) to live the life of a saint, but thinking about them is certainly inspirational and it makes us think that if someone can be so much better than us, maybe we can be a little better ourselves in the practice of “virtues”, whatever this may mean to us. And we don’t admire gods, but, well, it’s probably fun to be one.

As usual, I have lost track of what I was trying to say, but I think it speaks well of the academic community that we are more interested in saints than in gods. I will close by saying that my favorite saint of complexity theory is Avi Wigderson; I will keep to myself who my favorite god of complexity theory is, and I will leave it to the readers to contribute their picks.

by luca at May 20, 2013 09:06 PM UTC

### TR13-074 | Helly Circular-Arc Graph Isomorphism is in Logspace | Johannes Köbler, Sebastian Kuhnert, Oleg Verbitsky

from ECCC papers

We present logspace algorithms for the canonical labeling problem and the representation problem of Helly circular-arc (HCA) graphs. The first step is a reduction to canonical labeling and representation of interval intersection matrices. In a second step, the Delta trees employed in McConnell's linear time representation algorithm for interval matrices are adapted to the logspace setting and endowed with additional information to allow canonization. As a consequence, the isomorphism and recognition problems for HCA graphs turn out to be logspace complete.

### Average Stretch Factor: How Low Does It Go?

Authors: Vida Dujmovic, Pat Morin, Michiel Smid
Abstract: In a geometric graph, G, the stretch factor between two vertices, u and w, is the ratio between the Euclidean length of the shortest path from u to w in G and the Euclidean distance between u and w. The average stretch factor of G is the average stretch factor taken over all pairs of vertices in G. We show that for any set, V, of n points in R^d, there exists a geometric graph with vertex set V that has O(n) edges and that has average stretch factor 1 + o_n(1). More precisely, the average stretch factor of this graph is 1 + O((log n/n)^{1/(2d+1)}). We complement this upper bound with a lower bound: There exist n point sets in R^2 for which any graph with O(n) edges has average stretch factor 1 + \Omega(1/\sqrt{n}). Bounds of this type are not possible for the more commonly studied worst-case stretch factor. In particular, there exist point sets, V, such that any graph with worst-case stretch factor 1 + o_n(1) has a superlinear number of edges.
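
As a sketch of the definition (mine, not the authors'), here is one brute-force way to compute the average stretch factor of a small geometric graph, assuming SciPy is available; it is purely illustrative and has nothing to do with the paper's construction.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def average_stretch_factor(points, edges):
    """Average over all pairs u != w of d_G(u, w) / ||u - w||, where the graph
    has the given edges weighted by Euclidean length.  points: (n, d) array,
    edges: list of index pairs.  O(n^3) all-pairs shortest paths, illustrative only."""
    n = len(points)
    euclid = cdist(points, points)                # straight-line distances
    w = np.zeros((n, n))
    for u, v in edges:
        w[u, v] = w[v, u] = euclid[u, v]          # edge weight = Euclidean length
    graph_dist = shortest_path(w, directed=False)  # zeros are treated as non-edges
    iu = np.triu_indices(n, k=1)
    return np.mean(graph_dist[iu] / euclid[iu])

if __name__ == "__main__":
    # a tiny example: four points on a unit square, connected in a cycle
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print("average stretch factor:", average_stretch_factor(pts, cycle))
```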

### Dagstuhl Report 13082: Communication Complexity, Linear Optimization, and lower bounds for the nonnegative rank of matrices

Authors: LeRoy Beasely, Troy Lee, Hartmut Klauck, Dirk Oliver Theis
Abstract: This report documents the program and the outcomes of Dagstuhl Seminar 13082 "Communication Complexity, Linear Optimization, and lower bounds for the nonnegative rank of matrices", held in February 2013 at Dagstuhl Castle.

### Belief Propagation for Linear Programming

Authors: Andrew Gelfand, Jinwoo Shin, Michael Chertkov
Abstract: Belief Propagation (BP) is a popular, distributed heuristic for performing MAP computations in Graphical Models. BP can be interpreted, from a variational perspective, as minimizing the Bethe Free Energy (BFE). BP can also be used to solve a special class of Linear Programming (LP) problems. For this class of problems, MAP inference can be stated as an integer LP with an LP relaxation that coincides with minimization of the BFE at zero temperature. We generalize these prior results and establish a tight characterization of the LP problems that can be formulated as an equivalent LP relaxation of MAP inference. Moreover, we suggest an efficient, iterative annealing BP algorithm for solving this broader class of LP problems. We demonstrate the algorithm's performance on a set of weighted matching problems by using it as a cutting plane method to solve a sequence of LPs tightened by adding "blossom" inequalities.

### Computability vs. Nondeterministic and P vs. NP

Authors: Zhou Jian-Ming
Abstract: This paper demonstrates the relativity of Computability and Nondeterministic; the nondeterministic is just Turing's undecidable Decision rather than the Nondeterministic Polynomial time.

Based on analysis about TM, UM, DTM, NTM, Turing Reducible, beta-reduction, P-reducible, isomorph, tautology, semi-decidable, checking relation, the oracle and NP-completeness, etc., it reinterprets The Church-Turing Thesis that is equivalent of the Polynomial time and actual time; it redefines the NTM based on its undecidable set of its internal state. It comes to the conclusions: The P-reducible is misdirected from the Turing Reducible with its oracle; The NP-completeness is a reversal to The Church-Turing Thesis; The Cook-Levin theorem is an equipollent of two uncertains. This paper brings forth new concepts: NP (nondeterministic problem) and NP-algorithm (defined as the optimal algorithm to get the best fit approximation value of NP). P versus NP is the relativity of Computability and Nondeterministic, P/=NP. The NP-algorithm is effective approximate way to NP by TM.

### Understanding Incentives: Mechanism Design becomes Algorithm Design

Authors: Yang Cai, Constantinos Daskalakis, S. Matthew Weinberg
Abstract: We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing \emph{any} objective under \emph{arbitrary} feasibility constraints with \emph{arbitrary} bidder types to (not necessarily truthfully) maximizing the same objective plus virtual welfare (under the same feasibility constraints). Our reduction is based on a fundamentally new approach: we describe a mechanism's behavior indirectly only in terms of the expected value it awards bidders for certain behavior, and never directly access the allocation rule at all.

Applying our new approach to revenue, we exhibit settings where our reduction holds \emph{both ways}. That is, we also provide an approximation-sensitive reduction from (non-truthfully) maximizing virtual welfare to (truthfully) maximizing revenue, and therefore the two problems are computationally equivalent. With this equivalence in hand, we show that both problems are NP-hard to approximate within any polynomial factor, even for a single monotone submodular bidder.

We further demonstrate the applicability of our reduction by providing a truthful mechanism maximizing fractional max-min fairness. This is the first instance of a truthful mechanism that optimizes a non-linear objective.

### Reducing Revenue to Welfare Maximization: Approximation Algorithms and other Generalizations

Authors: Yang Cai, Constantinos Daskalakis, S. Matthew Weinberg
Abstract: It was recently shown in [this http URL] that revenue optimization can be computationally efficiently reduced to welfare optimization in all multi-dimensional Bayesian auction problems with arbitrary (possibly combinatorial) feasibility constraints and independent additive bidders with arbitrary (possibly combinatorial) demand constraints. This reduction provides a poly-time solution to the optimal mechanism design problem in all auction settings where welfare optimization can be solved efficiently, but it is fragile to approximation and cannot provide solutions to settings where welfare maximization can only be tractably approximated. In this paper, we extend the reduction to accommodate approximation algorithms, providing an approximation preserving reduction from (truthful) revenue maximization to (not necessarily truthful) welfare maximization. The mechanisms output by our reduction choose allocations via black-box calls to welfare approximation on randomly selected inputs, thereby generalizing also our earlier structural results on optimal multi-dimensional mechanisms to approximately optimal mechanisms. Unlike [this http URL], our results here are obtained through novel uses of the Ellipsoid algorithm and other optimization techniques over {\em non-convex regions}.

### An exponential lower bound for Cunningham's rule

Authors: David Avis, Oliver Friedmann
Abstract: In this paper we give an exponential lower bound for Cunningham's least recently considered (round-robin) rule as applied to parity games, Markov decision processes and linear programs. This improves a recent subexponential bound of Friedmann for this rule on these problems. The round-robin rule fixes a cyclical order of the variables and chooses the next pivot variable starting from the previously chosen variable and proceeding in the given circular order. It is perhaps the simplest example from the class of history-based pivot rules. Our results are based on a new lower bound construction for parity games. Due to the nature of the construction we are also able to obtain an exponential lower bound for the round-robin rule applied to acyclic unique sink orientations of hypercubes (AUSOs). Furthermore these AUSOs are realizable as polytopes. We believe these are the first such results for history based rules for AUSOs, realizable or not. The paper is self-contained and requires no previous knowledge of parity games.

### SPARC 2013

from Andrew McGregor

The exciting news sweeping the blogosphere (see here and here) is that SPARC 2013 is on its way. Specifically, Atri asked me to post the following:

Efficient and effective transmission, storage, and retrieval of information on a large-scale are among the core technical problems in the modern digital revolution. The massive volume of data necessitates the quest for mathematical and algorithmic methods for efficiently describing, summarizing, synthesizing, and, increasingly more critical, deciding when and how to discard data before storing or transmitting it. Such methods have been developed in two areas: coding theory, and sparse approximation (SA) (and its variants called compressive sensing (CS) and streaming algorithms).

Coding theory and computational complexity are both well established fields that enjoy fruitful interactions with one another. On the other hand, while significant progress on the SA/CS problem has been made, much of that progress is concentrated on the feasibility of the problems, including a number of algorithmic innovations that leverage coding theory techniques, but a systematic computational complexity treatment of these problems is sorely lacking. The workshop organizers aim to develop a general computational theory of SA and CS (as well as related areas such as group testing) and its relationship to coding theory. This goal can be achieved only by bringing together researchers from a variety of areas.

The Coding, Complexity and Sparsity workshop (SPARC 13) will be held in Ann Arbor, MI on Aug 5-7.

These will be hour-long lectures designed to give students an introduction to coding theory, complexity theory/pseudo-randomness, and compressive sensing/streaming algorithms. We will have a poster session during the workshop and everyone is welcome to bring a poster but graduate students and postdocs are especially encouraged to give a poster presentation.

This is the third incarnation of the workshop and the previous two workshops were also held in Ann Arbor in August of 2011 and 2012.

Confirmed speakers:

• Jin Yi Cai (University of Wisconsin, Madison)
• Shafi Goldwasser (MIT)
• Piotr Indyk (MIT)
• Swastik Kopparty (Rutgers University)
• Dick Lipton (Georgia Tech)
• Andrew McGregor (University of Massachusetts, Amherst)
• Raghu Meka (IAS)
• Jelani Nelson (Harvard)
• Eric Price (MIT)
• Christopher Ré (University of Wisconsin, Madison)
• Shubhangi Saraf (Rutgers University)
• Suresh Venkatasubramanian (University of Utah)
• David Woodruff (IBM)
• Mary Wootters (Michigan)
• Shuheng Zhou (Michigan)

We have some funding for graduate students and postdocs with preference given to those who will be presenting posters. For registration and other details, please look at the workshop webpage.

by Andrew at May 19, 2013 11:59 PM UTC

### 21st century proverbs

from The Geomblog

Having nothing better to do, I decided to create a twitter meme: updating proverbs for the 21st century. Use the hashtag #21stcentproverbs and join in the fun below.

by Suresh Venkatasubramanian (noreply@blogger.com) at May 19, 2013 09:58 PM UTC

### Coding, Complexity and Sparsity 2013 (SPARC) 2013.

from The Geomblog

Atri Rudra reminds us that the 3rd incarnation of SPARC is soon to be upon us (disclaimer: I'll be speaking at the event):
Efficient and effective transmission, storage, and retrieval of information on a large-scale are among the core technical problems in the modern digital revolution. The massive volume of data necessitates the quest for mathematical and algorithmic methods for efficiently describing, summarizing, synthesizing, and, increasingly more critical, deciding when and how to discard data before storing or transmitting it. Such methods have been developed in two areas: coding theory, and sparse approximation (SA) (and its variants called compressive sensing (CS) and streaming algorithms).
Coding theory and computational complexity are both well established fields that enjoy fruitful interactions with one another. On the other hand, while significant progress on the SA/CS problem has been made, much of that progress is concentrated on the feasibility of the problems, including a number of algorithmic innovations that leverage coding theory techniques, but a systematic computational complexity treatment of these problems is sorely lacking. The workshop organizers aim to develop a general computational theory of SA and CS (as well as related areas such as group testing) and its relationship to coding theory. This goal can be achieved only by bringing together researchers from a variety of areas.
The Coding, Complexity and Sparsity workshop (SPARC 13) will be held in Ann Arbor, MI on Aug 5-7.
These will be hour-long lectures designed to give students an introduction to coding theory, complexity theory/pseudo-randomness, and compressive sensing/streaming algorithms. We will have a poster session during the workshop and everyone is welcome to bring a poster but graduate students and postdocs are especially encouraged to give a poster presentation.
This is the third incarnation of the workshop and the previous two workshops were also held in Ann Arbor in August of 2011 and 2012.
Confirmed speakers:
* Jin Yi Cai (University of Wisconsin, Madison)
* Shafi Goldwasser (MIT)
* Piotr Indyk (MIT)
* Swastik Kopparty (Rutgers University)
* Dick Lipton (Georgia Tech)
* Andrew McGregor (University of Massachusetts, Amherst)
* Raghu Meka (IAS)
* Jelani Nelson    (Harvard)
* Eric Price (MIT)
* Christopher Ré (University of Wisconsin, Madison)
* Shubhangi Saraf (Rutgers University)
* Suresh Venkatasubramanian (University of Utah)
* David Woodruff (IBM)
* Mary Wootters    (Michigan)
* Shuheng Zhou (Michigan)

We have some funding for graduate students and postdocs with preference given to those who will be presenting posters. For registration and other details, please look at the workshop webpage: http://eecs.umich.edu/eecs/SPARC2013/

by Suresh Venkatasubramanian (noreply@blogger.com) at May 19, 2013 07:10 AM UTC

### Test Your Intuition (21): Auctions

from Gil Kalai

You run a single-item sealed-bid auction where you sell an old camera. There are three bidders, and the value of the camera for each of them is described by a certain (known) random variable: with probability 0.9 the value is 100+x, where x is taken uniformly at random from the interval [-1,1]. With probability 0.1 the value is 300+x, where x is as before. The 300 value represents the case that the item has a special nostalgic value for the buyer.

The values of the camera to the three bidders are thus i.i.d. random variables. (The reason for adding this small random x is to avoid ties, and you can replace the interval [-1,1] with [-ε, ε] for a small ε, if you wish.) If you don't want to worry about ties, you can simply think of the value as being 100 with probability 0.9 and 300 with probability 0.1.

### The basic question

The basic question for you, the seller, is:

What is the highest expected revenue you, the seller, can guarantee, and what is your optimal allocation policy?

I’d like to see the answer to this question. But let me challenge your intuition with two simpler questions.

1) Can the seller guarantee an expected revenue of 120  or more?

2) What (roughly) is the optimal allocation policy?

a) Highest bidder wins.

b) Highest bidder wins if his bid passes some reserve price.

c) The item is allocated at random to the bidders with probabilities depending on their bids.

### Background: Myerson’s paper and his revenue equivalence theorem

The most relevant paper to this post is the well-known paper Optimal auction design by Roger Myerson. Myerson considered the case of a single-item sealed-bid auction where the bidders’ values for the item are independent, identically distributed random variables.

Note that I did not specify the price that the winning bidder is going to pay for the camera. The reason is that, according to a famous theorem by Myerson (referred to as the revenue equivalence theorem), when the bidders are strategic, the expected revenues for the seller are determined by the allocation rule and are independent of the pricing policy! (Well, you need to assume a reasonable pricing policy…)

For example, if the item is allocated to the highest bidder then the expected price will be the second-highest bid. If the price is determined by the second-highest bid (Vickrey’s auction) then each bidder has no incentive to give a bid which is different from its value. But even if the item is allocated to the highest bidder at the highest price, the expected revenue will still be the same! When you analyze an auction mechanism like ours you can assume that the payments are set in a way that the bidders have no incentive not to bid the true value the camera has for them. This is sometimes referred to as the revelation principle.

Myerson considered a mechanism which sometimes leads to higher revenues than allocating the item to the highest bidder: the seller sets a reserve price, and the item is allocated to the highest bidder if the bid passes this reserve price and is not allocated at all otherwise. In the paper Myerson also considered more complicated allocation rules, which are important in the analysis, where the item is allocated to bidders according to some probabilities based on their bids.
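
To get a feel for the model, here is a minimal Monte Carlo sketch (my illustration, not part of the post) that estimates the seller's expected revenue in a second-price auction with a reserve price, assuming truthful bidding as the revelation principle permits; it uses the value distribution described at the top of the post, and it deliberately does not give away the answer to the poll questions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_values(n_bidders, n_trials):
    """Sample i.i.d. bidder values: 100+x w.p. 0.9, 300+x w.p. 0.1, x ~ U[-1,1]."""
    base = np.where(rng.random((n_trials, n_bidders)) < 0.9, 100.0, 300.0)
    return base + rng.uniform(-1.0, 1.0, size=(n_trials, n_bidders))

def expected_revenue_second_price(reserve, n_bidders=3, n_trials=200_000):
    """Expected revenue of a second-price auction with a reserve price, assuming
    bidders bid their true values.  The winner pays max(second-highest bid, reserve);
    there is no sale if every bid is below the reserve."""
    vals = np.sort(sample_values(n_bidders, n_trials), axis=1)
    highest, second = vals[:, -1], vals[:, -2]
    sold = highest >= reserve
    revenue = np.where(sold, np.maximum(second, reserve), 0.0)
    return revenue.mean()

if __name__ == "__main__":
    for r in [0.0, 101.0, 299.0]:
        print(f"reserve {r:6.1f}: estimated expected revenue "
              f"{expected_revenue_second_price(r):.2f}")
```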

### Polls

This time we have two questions and two polls:

Once again this is a game-theory question. I hope it will lead to interesting discussion like the earlier one on tic-tac-toe.

### A little more Background: Auctions and Vickrey’s second-price auction.

(From Wikipedia) An auction is a process of buying and selling goods or services by offering them up for bid, taking bids, and then selling the item to the highest bidder. In economic theory, an auction may refer to any mechanism or set of trading rules for exchange.

In our case, we consider an auction of a single item (the camera) and each bidder is giving a sealed bid.

(Again from Wikipedia) A Vickrey auction is a type of sealed-bid auction, where bidders submit written bids without knowing the bids of the other people in the auction, and in which the highest bidder wins, but the price paid is the second-highest bid. The auction was first described academically by Columbia University professor William Vickrey in 1961, though it had been used by stamp collectors since 1893.[2] This type of auction is strategically similar to an English auction, and gives bidders an incentive to bid their true value.

by Gil Kalai at May 18, 2013 08:26 PM UTC

### Oz’ Balls Problem: The Solution

from Gil Kalai

A commentator named Oz proposed the following question: You have a box with n red balls and n blue balls. Each time, you take out a ball at random; if the ball was red, you put it back in the box and take out a blue ball, and if the ball was blue, you put it back in the box and take out a red ball.

You keep doing this until you are left only with balls of the same color. How many balls will be left (as a function of n)?
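
A minimal simulation sketch of the process (my illustration, not Oz's or the commenters'), which can be compared against the guesses discussed next:

```python
import random

def balls_left(n, rng=random):
    """Simulate Oz's process: draw a uniformly random ball, return it, and remove
    one ball of the opposite colour; stop when one colour is exhausted.
    Returns how many balls remain."""
    red, blue = n, n
    while red > 0 and blue > 0:
        if rng.random() < red / (red + blue):   # drew a red ball
            blue -= 1                           # ...so a blue ball is removed
        else:                                   # drew a blue ball
            red -= 1                            # ...so a red ball is removed
    return red + blue

if __name__ == "__main__":
    n, trials = 10_000, 200
    avg = sum(balls_left(n) for _ in range(trials)) / trials
    print(f"n = {n}: average number of balls left over {trials} runs = {avg:.1f}")
```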

Peter Shor wrote in a comment “I’m fairly sure that there is not enough bias to get $cn$, but it intuitively seems far too much bias to still be $c \sqrt{n}$. I want to say $n^c$. At a wild guess, it’s either $c = \frac{2}{3}$ or $c = \frac{3}{4}$, since those are the simplest exponents between $\frac{1}{2}$ and $1$.”  The comment was followed by a heuristic argument of Kevin Kostelo and computer experiments by Lior Silberman that supported the answer $n^{3/4}$.

This is correct!

Good intuition, Peter!

In our student probability day a couple of weeks ago, Yuval Peres told me the origin of this problem. The way it was originally posed by D. Williams and P. McIlroy used roughly the following story (I modified it a little): There are two groups of n gunmen that shoot at each other. Once a gunman is hit he stops shooting and leaves the place happily and peacefully. How many gunmen will be left after all the gunmen in one team have left? The problem was solved by Kingman and Volkov in this paper.

J. F. C. Kingman and S. E. Volkov, Solution to the OK Corral model via decoupling of Friedman’s urn, J. Theoret. Probab. 16 (2003), no. 1, 267–276. A nice presentation of the result, entitled Internal erosion and the exponent 3/4, was given by Lionel Levine and Yuval Peres.

by Gil Kalai at May 17, 2013 09:09 AM UTC

### Some polyhedral combinatorics

from David Eppstein

Today's new Wikipedia articles: Hanner polytopes and Kalai's 3d conjecture.

The 3^d conjecture states that a centrally symmetric d-dimensional polytope must have at least 3^d faces (of all dimensions, including the d-dimensional polytope itself as a face but not including the (–1)-dimensional empty set). For instance, the cube has 8 vertices, 12 edges, 6 squares, and 1 cube as faces; 8 + 12 + 6 + 1 = 27 = 3^3.

The Hanner polytopes include the cubes, are closed under Cartesian products and duality, and, if the conjecture is true, have the fewest possible numbers of faces among all centrally symmetric d-dimensional polytopes. They also have some other interesting properties. In three dimensions the only Hanner polytopes are the cube and the octahedron, but in higher dimensions there are a lot more of them.
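
As a small illustration (mine, not from the post) of why the cube attains the conjectured bound: every nonempty face of the d-cube is obtained by fixing a subset of the coordinates to 0 or 1 and leaving the rest free, so the faces with j free coordinates number C(d, j) * 2^(d - j), and the total is 3^d.

```python
from math import comb

def cube_face_count(d):
    """Number of nonempty faces of the d-cube (including the cube itself):
    a face fixes some subset of coordinates to 0 or 1 and leaves the rest free,
    so faces with j free coordinates number C(d, j) * 2^(d - j)."""
    return sum(comb(d, j) * 2 ** (d - j) for j in range(d + 1))

if __name__ == "__main__":
    for d in range(1, 8):
        assert cube_face_count(d) == 3 ** d
        print(f"d = {d}: {cube_face_count(d)} faces = 3^{d}")
```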

Earlier this week Georgia Tech announced the Online Master of Science in Computer Science, a MOOCs-based degree with a total tuition of about $7000. This degree came out of a collaboration between Sebastian Thrun of Udacity and my dean Zvi Galil, with some significant financial support from AT&T. We've spent several months getting faculty input and buy-in to the program and we're very excited about taking a new leading role in the MOOCs revolution. We will roll out slowly, with smaller-scale courses for corporate affiliates to work out the kinks, and the plan is to go to the general public in fall 2014. Read the FAQ to get more information about the program.

It's been fun watching the development of this degree, in particular hearing Sebastian talk about his MOOC 2.0 plans to scale courses with a small amount of expense that we pull from the tuition. No doubt we will have challenges in making this degree truly work at a large scale, but I'm truly bullish that we'll have a self-sustaining, quality Masters program that will reach tens if not hundreds of thousands of students.

Here we go.

by Lance Fortnow (noreply@blogger.com) at May 17, 2013 02:03 AM UTC

### Geometric primitive feature extraction - concepts, algorithms, and applications

Authors: Dilip K. Prasad
Abstract: This thesis presents important insights and concepts related to the extraction of geometric primitives from the edge contours of digital images. Three specific problems related to this topic have been studied, viz., polygonal approximation of digital curves, tangent estimation of digital curves, and ellipse fitting and detection from digital curves. For the problem of polygonal approximation, two fundamental problems have been addressed. First, the nature of the performance evaluation metrics in relation to the local and global fitting characteristics has been studied. Second, an explicit bound on the error introduced by digitizing a continuous line segment has been derived and used to propose a generic, non-heuristic, parameter-independent framework which can be used in several dominant point detection methods. For the problem of tangent estimation for digital curves, a simple method of tangent estimation has been proposed. It is shown that the method has a definite upper bound on the error for conic digital curves. It has been shown that the method performs better than almost all (seventy-two) existing tangent estimation methods for conic as well as several non-conic digital curves. For the problem of fitting ellipses on digital curves, a geometric distance minimization model has been considered. An unconstrained, linear, non-iterative, and numerically stable ellipse fitting method has been proposed and it has been shown that the proposed method has better selectivity for elliptic digital curves (high true positive and low false positive) as compared to several other ellipse fitting methods. For the problem of detecting ellipses in a set of digital curves, several innovative and fast pre-processing, grouping, and hypothesis evaluation concepts applicable to digital curves have been proposed and combined to form an ellipse detection method.

### Multicut Lower Bounds via Network Coding

Authors: Anna Blasiak
Abstract: We introduce a new technique to certify lower bounds on the multicut size using network coding.
In directed networks the network coding rate is not a lower bound on the multicut, but we identify a class of networks on which the rate is equal to the size of the minimum multicut and show this class is closed under the strong graph product. We then show that the famous construction of Saks et al. that gives a $\Theta(k)$ gap between the multicut and the multicommodity flow rate is contained in this class. This allows us to apply our result to strengthen their multicut lower bound, determine the exact value of the minimum multicut, and give an optimal network coding solution with rate matching the multicut.

### Exploiting non-constant safe memory in resilient algorithms and data structures

Authors: Lorenzo De Stefani, Francesco Silvestri
Abstract: We extend the Faulty RAM model by Italiano and Finocchi (2008) by adding a safe memory of arbitrary size $S$, and we then derive tradeoffs between the performance of resilient algorithmic techniques and the size of the safe memory. In particular, we describe a resilient algorithm for sorting $n$ entries in $O(n\log n+\alpha (\delta/S + \log S))$ time using $\Theta(S)$ safe memory locations, where $\alpha\leq \delta$ denotes the actual number of faults. Our algorithm outperforms previous resilient sorting algorithms which do not exploit the available safe memory and require $O(n\log n+ \alpha\delta)$ time. Finally, we exploit our merging algorithm to derive a resilient priority queue data structure requiring $O(\log n + \delta/S)$ amortized time per operation. Our priority queue improves the $O(\log n + \delta)$ amortized time required by the state of the art.

### 3SUM, 3XOR, Triangles

Authors: Zahra Jafargholi, Emanuele Viola
Abstract: We show that if one can solve 3SUM on a set of size $n$ in time $n^{1+\epsilon}$ then one can list $t$ triangles in a graph with $m$ edges in time $O(m^{1+\epsilon}t^{1/3-\epsilon/3})$. This is a reversal of Patrascu's reduction from 3SUM to listing triangles (STOC '10). Our result builds on and extends works by the Paghs (PODS '06) and by Vassilevska and Williams (FOCS '10). We make our reductions deterministic using tools from pseudorandomness. We then re-execute both Patrascu's reduction and our reversal for the variant 3XOR of 3SUM, where integer summation is replaced by bit-wise xor. As a corollary we obtain that if 3XOR is solvable in linear time but 3SUM requires quadratic randomized time, or vice versa, then the randomized time complexity of listing $m$ triangles in a graph with $m$ edges is $m^{4/3}$ up to a factor $m^\alpha$ for any $\alpha > 0$.

### A subexponential-time quantum algorithm for LWE problem

Authors: Fada Li, Wansu Bao, Xiangqun Fu, Yuchao Zhang, Tan Li
Abstract: Learning with Errors (LWE) problems are the foundations for numerous applications in lattice-based cryptography and are provably as hard as approximate lattice problems in the worst case. The fastest known classical algorithm takes exponential time, and prior to our work no faster quantum algorithm was known. It would therefore be highly desirable to make algorithmic improvements on LWE. Here we propose a subexponential-time quantum algorithm for the LWE problem. The algorithm is based on a reduction to a two-point problem and a new reduction from the two-point problem to the dihedral coset problem. Our algorithm introduces a method that may be applicable to other lattice problems.

### On Structural Parameterizations for the 2-Club Problem

Authors: Sepp Hartung, Christian Komusiewicz, André Nichterlein, Ondrej Suchý
Abstract: The NP-hard 2-Club problem is, given an undirected graph $G=(V,E)$ and $\ell \in {\mathbb N}$, to decide whether there is a vertex set $S\subseteq V$ of size at least $\ell$ such that the induced subgraph $G[S]$ has diameter at most two. We make progress towards a systematic classification of the complexity of 2-Club with respect to a hierarchy of prominent structural graph parameters. First, we present the following tight NP-hardness results: 2-Club is NP-hard on graphs that become bipartite by deleting one vertex, on graphs that can be covered by three cliques, and on graphs with domination number two and diameter three. Then, we consider the parameter h-index of the input graph. This parameter is motivated by real-world instances and the fact that 2-Club is fixed-parameter tractable with respect to the larger parameter maximum degree. We present an algorithm that solves 2-Club in $|V|^{f(k)}$ time with $k$ being the h-index. By showing W[1]-hardness for this parameter, we provide evidence that the above algorithm cannot be improved to a fixed-parameter algorithm. Furthermore, the reduction used for this hardness result can be modified to show that 2-Club is NP-hard if the input graph has constant degeneracy. Finally, we show that 2-Club is fixed-parameter tractable with respect to distance to cographs.

### The Thinnest Path Problem

Authors: Jianhang Gao, Qing Zhao, Ananthram Swami
Abstract: We formulate and study the thinnest path problem in wireless ad hoc networks. The objective is to find a path from the source to the destination that results in the minimum number of nodes overhearing the message, by carefully choosing the relaying nodes and their corresponding transmission power. We adopt a directed hypergraph model of the problem and establish the NP-completeness of the problem in 2-D networks. We then develop two polynomial-time approximation algorithms that offer approximation ratios of $\sqrt{\frac{n}{2}}$ and $\frac{n}{2\sqrt{n-1}}$ for general directed hypergraphs (which can model non-isomorphic signal propagation in space) and constant approximation ratios for ring hypergraphs (which result from isomorphic signal propagation). We also consider the thinnest path problem in 1-D networks and in 1-D networks embedded in a 2-D field of eavesdroppers with arbitrary unknown locations (the so-called 1.5-D networks). We propose a linear-complexity algorithm based on nested backward induction that obtains the optimal solution to both 1-D and 1.5-D networks. In particular, no algorithm, even with complete location information of the eavesdroppers, can obtain a thinner path than the proposed algorithm, which does not require knowledge of eavesdropper locations.

### Modeling Information Propagation with Survival Theory

Authors: Manuel Gomez Rodriguez, Jure Leskovec, Bernhard Schoelkopf
Abstract: Networks provide a skeleton for the spread of contagions, like information, ideas, behaviors and diseases. Many times the networks over which contagions diffuse are unobserved and need to be inferred. Here we apply survival theory to develop general additive and multiplicative risk models under which the network inference problems can be solved efficiently by exploiting their convexity. Our additive risk model generalizes several existing network inference models; we show all these models are particular cases of our more general model.
Our multiplicative model allows for modeling scenarios in which a node can either increase or decrease the risk of activation of another node, in contrast with previous approaches, which consider only positive risk increments. We evaluate the performance of our network inference algorithms on large synthetic and real cascade datasets, and show that our models are able to predict the length and duration of cascades in real data.

### D-Wave: Truth finally starts to emerge

from Scott Aaronson

More Updates (May 21): Happy 25th birthday to me! Among the many interesting comments below, see especially this one by Alex Selby, who says he’s written his own solver significantly outperforming that of McGeoch and Wang (as well as the D-Wave machine) on the same benchmarks—and who provides the Python code so you can try it yourself. Also, Igor Vernik asked me to announce that on July 8th, D-Wave will be giving a technical presentation at the International Superconducting Electronics Conference in Cambridge. See here for more info; I’ll be traveling then and won’t be able to make it. I don’t know whether the performance comparisons to Matthias Troyer’s and Alex Selby’s code will be among the topics discussed, or if there will be an opportunity to ask questions about such things.

Update (May 17): Daniel Lidar emailed me to clarify his views about error-correction and the viability of D-Wave’s approach. He invited me to share his clarification with others—something that I’m delighted to do, since I agree with him wholeheartedly. Without further ado, here’s what Lidar says:

I don’t believe D-Wave’s approach is scalable without error correction. I believe that the incorporation of error correction is a necessary condition in order to ever achieve a speedup with D-Wave’s machines, and I don’t believe D-Wave’s machines are any different from other types of quantum information processing in this regard. I have repeatedly made this point to D-Wave over several years, and I hope that in the future their designs will allow more flexibility in the incorporation of error correction.

Lidar also clarified that he not only doesn’t dispute what Matthias Troyer told me about the lack of speedup of the D-Wave device compared to classical simulated annealing in their experiments, but “fully agrees, endorses, and approves” of it—and indeed, that he himself was part of the team that did the comparison. In other news, this Hacker News thread, which features clear, comprehending discussions of this blog post and the backstory that led up to it, has helped to restore my faith in humanity.

Two years ago almost to the day, I announced my retirement as Chief D-Wave Skeptic. But—as many readers predicted at the time—recent events (and the contents of my inbox!) have given me no choice except to resume my post. In an all-too-familiar pattern, multiple rounds of D-Wave-related hype have made it all over the world before the truth has had time to put its pants on and drop its daughter off in daycare. And the current hype is particularly a shame, because once one slices through all the layers of ugh—the rigged comparisons, the “dramatic announcements” that mean nothing, the lazy journalists cherry-picking what they want to hear and ignoring the inconvenient bits—there really has been a huge scientific advance this past month in characterizing the D-Wave devices. I’m speaking about the experiments on the D-Wave One installed at USC, the main results of which finally appeared in April.
Two of the coauthors of this new work—Matthias Troyer and Daniel Lidar—were at MIT recently to speak about their results, Troyer last week and Lidar this Tuesday. Intriguingly, despite being coauthors on the same paper, Troyer and Lidar have very different interpretations of what their results mean, but we’ll get to that later. For now, let me summarize what I think their work has established.

Evidence for Quantum Annealing Behavior

For the first time, we have evidence that the D-Wave One is doing what should be described as “quantum annealing” rather than “classical annealing” on more than 100 qubits. (Note that D-Wave itself now speaks about “quantum annealing” rather than “quantum adiabatic optimization.” The difference between the two is that the adiabatic algorithm runs coherently, at zero temperature, while quantum annealing is a “messier” version in which the qubits are strongly coupled to their environment throughout, but still maintain some quantum coherence.) The evidence for quantum annealing behavior is still extremely indirect, but despite my “Chief Skeptic” role, I’m ready to accept what the evidence indicates with essentially no hesitation.

So what is the evidence? Basically, the USC group ran the D-Wave One on a large number of randomly generated instances of what I’ll call the “D-Wave problem”: namely, the problem of finding the lowest-energy configuration of an Ising spin glass, with nearest-neighbor interactions that correspond to the D-Wave chip’s particular topology. Of course, restricting attention to this “D-Wave problem” tilts the tables heavily in D-Wave’s favor, but no matter: scientifically, it makes a lot more sense than trying to encode Sudoku puzzles or something like that.

Anyway, the group then looked at the distribution of success probabilities when each instance was repeatedly fed to the D-Wave machine. For example, would the randomly-generated instances fall into one giant clump, with a few outlying instances that were especially easy or especially hard for the machine? Surprisingly, they found that the answer was no: the pattern was strongly bimodal, with most instances either extremely easy or extremely hard, and few instances in between.

Next, the group fed the same instances to Quantum Monte Carlo: a standard classical algorithm that uses Wick rotation to find the ground states of “stoquastic Hamiltonians,” the particular type of quantum evolution that the D-Wave machine is claimed to implement. When they did that, they found exactly the same bimodal pattern that they found with the D-Wave machine. Finally they fed the instances to a classical simulated annealing program—but there they found a “unimodal” distribution, not a bimodal one. So, their conclusion is that whatever the D-Wave machine is doing, it’s more similar to Quantum Monte Carlo than it is to classical simulated annealing.

Curiously, we don’t yet have any hint of a theoretical explanation for why Quantum Monte Carlo should give rise to a bimodal distribution, while classical simulated annealing should give rise to a unimodal one. The USC group simply observed the pattern empirically (as far as I know, they’re the first to do so), then took advantage of it to characterize the D-Wave machine. I regard explaining this pattern as an outstanding open problem raised by their work.
In any case, if we accept that the D-Wave One is doing “quantum annealing,” then despite the absence of a Bell-inequality violation or other direct evidence, it’s reasonably safe to infer that there should be large-scale entanglement in the device. I.e., the true quantum state is no doubt extremely mixed, but there’s no particular reason to believe we could decompose that state into a mixture of product states. For years, I tirelessly repeated that D-Wave hadn’t even provided evidence that its qubits were entangled—and that, while you can have entanglement with no quantum speedup, you can’t possibly have a quantum speedup without at least the capacity to generate entanglement. Now, I’d say, D-Wave finally has cleared the evidence-for-entanglement bar—and, while they’re not the first to do so with superconducting qubits, they’re certainly the first to do so with so many superconducting qubits. So I congratulate D-Wave on this accomplishment.

If this had been advertised from the start as a scientific research project—”of course we’re a long way from QC being practical; no one would ever claim otherwise; but as a first step, we’ve shown experimentally that we can entangle 100 superconducting qubits with controllable couplings”—my reaction would’ve been, “cool!” (Similar to my reaction to any number of other steps toward scalable QC being reported by research groups all over the world.)

No Speedup Compared to Classical Simulated Annealing

But of course, D-Wave’s claims—and the claims being made on its behalf by the Hype-Industrial Complex—are far more aggressive than that. And so we come to the part of this post that has not been pre-approved by the International D-Wave Hype Repeaters Association. Namely, the same USC paper that reported the quantum annealing behavior of the D-Wave One, also showed no speed advantage whatsoever for quantum annealing over classical simulated annealing.

In more detail, Matthias Troyer’s group spent a few months carefully studying the D-Wave problem—after which, they were able to write optimized simulated annealing code that solves the D-Wave problem on a normal, off-the-shelf classical computer, about 15 times faster than the D-Wave machine itself solves the D-Wave problem! Of course, if you wanted even more classical speedup than that, then you could simply add more processors to your classical computer, for only a tiny fraction of the ~$10 million that a D-Wave One would set you back.
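Since much of the argument turns on what “classical simulated annealing on the D-Wave problem” even means, here is a deliberately small and unoptimized sketch of single-spin-flip simulated annealing for an Ising ground-state search. It is plain Python with NumPy; the random grid instance is a hypothetical stand-in for instances on the D-Wave chip’s actual topology, and it is nothing like Troyer’s tuned code.

```python
import numpy as np

def random_grid_ising(side, seed=0):
    """Random nearest-neighbor Ising instance on a side x side grid,
    with couplings J_ij in {-1, +1} (a toy stand-in for the real topology)."""
    rng = np.random.default_rng(seed)
    n = side * side
    J = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:
                J[i, i + 1] = J[i + 1, i] = rng.choice([-1.0, 1.0])
            if r + 1 < side:
                J[i, i + side] = J[i + side, i] = rng.choice([-1.0, 1.0])
    return J

def energy(J, s):
    return 0.5 * s @ J @ s          # E(s) = sum_{i<j} J_ij s_i s_j

def simulated_annealing(J, sweeps=1000, beta_max=3.0, seed=1):
    """Single-spin-flip annealing with a linear inverse-temperature schedule."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    s = rng.choice([-1, 1], size=n).astype(float)
    best_s, best_E = s.copy(), energy(J, s)
    for t in range(sweeps):
        beta = beta_max * (t + 1) / sweeps
        for i in rng.permutation(n):
            dE = -2.0 * s[i] * (J[i] @ s)   # energy change from flipping spin i
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                s[i] = -s[i]
        E = energy(J, s)
        if E < best_E:
            best_s, best_E = s.copy(), E
    return best_s, best_E

J = random_grid_ising(8)
_, E = simulated_annealing(J)
print("best energy found:", E)
```

The point of including it is only to make the comparison concrete: the classical baseline in question is this kind of loop, carefully engineered and tuned to the specific instance structure.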

Some people might claim it’s “unfair” to optimize the classical simulated annealing code to take advantage of the quirks of the D-Wave problem. But think about it this way: D-Wave has spent ~$100 million, and hundreds of person-years, optimizing the hell out of a special-purpose annealing device, with the sole aim of solving this one problem that D-Wave itself defined. So if we’re serious about comparing the results to a classical computer, isn’t it reasonable to have one professor and a few postdocs spend a few months optimizing the classical code as well?

As I said, besides simulated annealing, the USC group also compared the D-Wave One’s performance against a classical implementation of Quantum Monte Carlo. And maybe not surprisingly, the D-Wave machine was faster than a “direct classical simulation of itself” (I can’t remember how many times faster, and couldn’t find that information in the paper). But even here, there’s a delicious irony. The only reason the USC group was able to compare the D-Wave One against QMC at all, is that QMC is efficiently implementable on a classical computer! (Albeit probably with a large constant overhead compared to running the D-Wave annealer itself—hence the superior performance of classical simulated annealing over QMC.) This means that, if the D-Wave machine can be understood as reaching essentially the same results as QMC (technically, “QMC with no sign problem”), then there’s no real hope for using the D-Wave machine to get an asymptotic speedup over a classical computer. The race between the D-Wave machine and classical simulations of the machine would then necessarily be a cat-and-mouse game, a battle of constant factors with no clear asymptotic victor. (Some people might conjecture that it will also be a “Tom & Jerry game,” the kind where the classical mouse always gets the better of the quantum cat.)

At this point, it’s important to give a hearing to three possible counterarguments to what I’ve written above.

The first counterargument is that, if you plot both the runtime of simulated annealing and the runtime of the D-Wave machine as functions of the instance size n, you find that, while simulated annealing is faster in absolute terms, it can look like the curve for the D-Wave machine is less steep. Over on the blog “nextbigfuture”, an apparent trend of this kind has been fearlessly extrapolated to predict that with 512 qubits, the D-Wave machine will be 10 billion times faster than a classical computer.

But there’s a tiny fly in the ointment. As Troyer carefully explained to me last week, the “slow growth rate” of the D-Wave machine’s runtime is, ironically, basically an artifact of the machine being run too slowly on small values of n. Run the D-Wave machine as fast as it can run for small n, and the difference in the slopes disappears, with only the constant-factor advantage for simulated annealing remaining. In short, there seems to be no evidence, at present, that the D-Wave machine is going to overtake simulated annealing for any instance size.

The second counterargument is that the correlation between the two “bimodal distributions”—that for the D-Wave machine and that for the Quantum Monte Carlo simulation—is not perfect. In other words, there are a few instances (not many) that QMC solves faster than the D-Wave machine, and likewise a few instances that the D-Wave machine solves faster than QMC. Not surprisingly, the latter fact has been eagerly seized on by the D-Wave boosters (“hey, sometimes the machine does better!”).
But Troyer has a simple and hilarious response to that. Namely, he found that his group’s QMC code did a better job of correlating with the D-Wave machine, than the D-Wave machine did of correlating with itself! In other words, calibration errors seem entirely sufficient to explain the variation in performance, with no need to posit any special class of instances (however small) on which the D-Wave machine dramatically outperforms QMC.

The third counterargument is just the banal one: the USC experiment was only one experiment with one set of instances (albeit, a set one might have thought would be heavily biased toward D-Wave). There’s no proof that, in the future, it won’t be discovered that the D-Wave machine does something more than QMC, and that there’s some (perhaps specially-designed) set of instances on which the D-Wave machine asymptotically outperforms both QMC and Troyer’s simulated annealing code. (Indeed, I gather that folks at D-Wave are now assiduously looking for such instances.) Well, I concede that almost anything is possible in the future—but “these experiments, while not supporting D-Wave’s claims about the usefulness of its devices, also don’t conclusively disprove those claims” is a very different message than what’s currently making it into the press.

Comparison to CPLEX is Rigged

Unfortunately, the USC paper is not the one that’s gotten the most press attention—perhaps because half of it inconveniently told the hypesters something they didn’t want to hear (“no speedup”). Instead, journalists have preferred a paper released this week by Catherine McGeoch and Cong Wang, which reports that quantum annealing running on the D-Wave machine outperformed the CPLEX optimization package running on a classical computer by a factor of ~3600, on Ising spin problems involving 439 bits.

Wow! That sounds awesome! But before rushing to press, let’s pause to ask ourselves: how can we reconcile this with the USC group’s result of no speedup? The answer turns out to be painfully simple. CPLEX is a general-purpose, off-the-shelf exact optimization package. Of course an exact solver can’t compete against quantum annealing—or for that matter, against classical annealing or other classical heuristics!

Noticing this problem, McGeoch and Wang do also compare the D-Wave machine against tabu search, a classical heuristic algorithm. When they do so, they find that an advantage for the D-Wave machine persists, but it becomes much, much smaller (they didn’t report the exact time comparison). Amusingly, they write in their “Conclusions and Future Work” section:

It would of course be interesting to see if highly tuned implementations of, say, tabu search or simulated annealing could compete with Blackbox or even QA [i.e., the D-Wave machines] on QUBO [quadratic binary optimization] problems; some preliminary work on this question is underway.

As I said above, at the time McGeoch and Wang’s paper was released to the media (though maybe not at the time it was written?), the “highly tuned implementation” of simulated annealing that they ask for had already been written and tested, and the result was that it outperformed the D-Wave machine on all instance sizes tested. In other words, their comparison to CPLEX had already been superseded by a much more informative comparison—one that gave the “opposite” result—before it ever became public. For obvious reasons, most press reports have simply ignored this fact.
Troyer, Lidar, and Stone Soup

Much of what I’ve written in this post, I learned by talking to Matthias Troyer—the man who carefully experimented with the D-Wave machine and figured out how to beat it using simulated annealing, and who I regard as probably the world’s #1 expert right now on what exactly the machine does. Troyer wasn’t shy about sharing his opinions, and while couched with qualifications, they tended toward extremely skeptical. For example, Troyer conjectured that, if D-Wave ultimately succeeds in getting a speedup over classical computers in a fair comparison, then it will probably be by improving coherence and calibration, incorporating error-correction, and doing other things that “traditional,” “academic” quantum computing researchers had said all along would need to be done.

As I said, Daniel Lidar is another coauthor on the USC paper, and also recently visited MIT to speak. Lidar and Troyer agree on the basic facts—yet Lidar noticeably differed from Troyer, in trying to give each fact the most “pro-D-Wave spin” it could possibly support. Lidar spoke at our quantum group meeting, not about the D-Wave vs. simulated annealing performance comparison (which he agrees with), but about a proposal of his for incorporating quantum error-correction into the D-Wave device, together with some experimental results. He presented his proposal, not as a reductio ad absurdum of D-Wave’s entire philosophy, but rather as a positive opportunity to get a quantum speedup using D-Wave’s approach.

So, to summarize my current assessment of the situation: yes, absolutely, D-Wave might someday succeed—ironically, by adapting the very ideas from “the gate model” that its entire business plan has been based on avoiding, and that D-Wave founder Geordie Rose has loudly denigrated for D-Wave’s entire history! If that’s what happens, then I predict that science writers, and blogs like “nextbigfuture,” will announce from megaphones that D-Wave has been vindicated at last, while its narrow-minded, theorem-obsessed, ivory-tower academic naysayers now have egg all over their faces. No one will care that the path to success—through quantum error-correction and so on—actually proved the academic critics right, and that D-Wave’s “vindication” was precisely like that of the deliciousness of stone soup in the old folktale. As for myself, I’ll probably bang my head on my desk until I sustain so much brain damage that I no longer care either. But at least I’ll still have tenure, and the world will have quantum computers.

The Messiah’s Quantum Annealer

Over the past few days, I’ve explained the above to at least six different journalists who asked. And I’ve repeatedly gotten a striking response: “What you say makes sense—but then why are all these prestigious people and companies investing in D-Wave? Why did Bo Ewald, a prominent Silicon Valley insider, recently join D-Wave as president of its US operations? Why the deal with Lockheed Martin? Why the huge deal with NASA and Google, just announced today? What’s your reaction to all this news?”

My reaction, I confess, is simple. I don’t care—I actually told them this—if the former Pope Benedict has ended his retirement to become D-Wave’s new marketing director. I don’t care if the Messiah has come to Earth on a flaming chariot, not to usher in an age of peace but simply to spend $10 million on D-Wave’s new Vesuvius chip. And if you imagine that I’ll ever care about such things, then you obviously don’t know much about me.
I’ll tell you what: if peer pressure is where it’s at, then come to me with the news that Umesh Vazirani, or Greg Kuperberg, or Matthias Troyer is now convinced, based on the latest evidence, that D-Wave’s chip asymptotically outperforms simulated annealing in a fair comparison, and does so because of quantum effects.  Any one such scientist’s considered opinion would mean more to me than 500,000 business deals.

The Argument from Consequences

Let me end this post with an argument that several of my friends in physics have explicitly made to me—not in the exact words below but in similar ones.

“Look, Scott, let the investors, government bureaucrats, and gullible laypeople believe whatever they want—and let D-Wave keep telling them whatever’s necessary to stay in business.  It’s unsportsmanlike and uncollegial of you to hold D-Wave’s scientists accountable for whatever wild claims their company’s PR department might make.  After all, we’re in this game too!  Our universities put out all sorts of overhyped press releases, but we don’t complain because we know that it’s done for our benefit.  Besides, you’d doubtless be trumpeting the same misleading claims, if you were in D-Wave’s shoes and needed the cash infusions to survive.  Anyway, who really cares whether there’s a quantum speedup yet or no quantum speedup?  At least D-Wave is out there trying to build a scalable quantum computer, and getting millions of dollars from Jeff Bezos, Lockheed, Google, the CIA, etc. etc. to do so—resources more of which would be directed our way if we showed a more cooperative attitude!  If we care about scalable QCs ever getting built, then the wise course is to celebrate what D-Wave has done—they just demonstrated quantum annealing on 100 qubits, for crying out loud!  So let’s all be grownups here, focus on the science, and ignore the marketing buzz as so much meaningless noise—just like a tennis player might ignore his opponent’s trash-talking (‘your mother is a whore,’ etc.) and focus on the game.”

I get this argument: really, I do.  I even concede that there’s something to be said for it.  But let me now offer a contrary argument for the reader’s consideration.

Suppose that, unlike in the “stone soup” scenario I outlined above, it eventually becomes clear that quantum annealing can be made to work on thousands of qubits, but that it’s a dead end as far as getting a quantum speedup is concerned.  Suppose the evidence piles up that simulated annealing on a conventional computer will continue to beat quantum annealing, if even the slightest effort is put into optimizing the classical annealing code.  If that happens, then I predict that the very same people now hyping D-Wave will turn around and—without the slightest acknowledgment of error on their part—declare that the entire field of quantum computing has now been unmasked as a mirage, a scam, and a chimera.  The same pointy-haired bosses who now flock toward quantum computing, will flock away from it just as quickly and as uncomprehendingly.  Academic QC programs will be decimated, despite the slow but genuine progress that they’d been making the entire time in a “parallel universe” from D-Wave.  People’s contempt for academia is such that, while a D-Wave success would be trumpeted as its alone, a D-Wave failure would be blamed on the entire QC community.

When it comes down to it, that’s the reason why I care about this matter enough to have served as “Chief D-Wave Skeptic” from 2007 to 2011, and enough to resume my post today.  As I’ve said many times, I really, genuinely hope that D-Wave succeeds at building a QC that achieves an unambiguous speedup!  I even hope the academic QC community will contribute to D-Wave’s success, by doing careful independent studies like the USC group did, and by coming up with proposals like Lidar’s for how D-Wave could move forward.  On the other hand, in the strange, unlikely event that D-Wave doesn’t succeed, I’d like people to know that many of us in the QC community were doing what academics are supposed to do, which is to be skeptical and not leave obvious questions unasked.  I’d like them to know that some of us simply tried to understand and describe what we saw in front of us—changing our opinions repeatedly as new evidence came in, but disregarding “meta-arguments” like my physicist friends’ above.  The reason I can joke about how easy it is to bribe me is that it’s actually kind of hard.

by Scott at May 16, 2013 05:41 PM UTC

from Richard Lipton

Can we help avoid parallel repetition of mistakes?

Irit Dinur has recently again shown a wonderful skill at re-conceptualizing an area that had seemingly been well worked out. A notable previous instance was her re-casting the proof of the PCP Theorem as a progressive amplification. Now she and David Steurer have posted a new paper whose title, “Analytical Approach to Parallel Repetition,” introduces a new framework. The subject of parallel repetition, which they call “a basic product operation for one-round two-player games,” is distinctive in having its genesis in a mistake made in a paper—a trap of automatic thinking. Lance Fortnow and Mike Sipser thought that executing multiple instances of a randomized protocol in parallel would have the same power-law reduction in the probability of overall failure as executing them independently in sequence, overlooking how the strategies in the parallel cases could be dependent and exploit this.

Today we talk about similar traps that people can fall into even at advanced graduate level. How might they be avoided?

Truth-in-blogging note: this post is really about a different case of products and independence, and most of it was written months ago. It was lacking an intro section with someone to feature according to our “blog invariant,” and we also wanted a few short examples of graduate-student traps in computational theory and mathematics before progressing to the main one. The parallel repetition example came not only first to mind, but also second and third and fourth… as Dick and I struggled to think of more good ones. Lance’s haute faute has already been mentioned a few times on this blog, and I thought it would make a tiresome and repetitious parallel to my own “trap.” It didn’t help that the neat example I saw online years ago which furnished the phrase “graduate-student trap”—but which I didn’t preserve and forgot—has evidently vanished into unsearchability.

I was trying to decide between leading with the late-medieval mathematician-theologian Nicholas de Cusa for his advice on not pretending to have completed knowledge, or alternately a colleague in Buffalo who has compiled a good graduate-student advice list. Lance has similar advice, and when looking for it I spotted the mention of Dinur in his Twitter feed—actually Lance’s re-tweet of one by my former student D. Sivakumar. Voilà—Lance’s example it was. Thanks, Irit.

## Traps

What is a trap? Pitfalls and paradoxes abound in mathematics and the sciences, and surmounting them is just part of acquiring the literature. Sometimes it is a confounding of preconceived expectations, but it is hard to find a way of defining such expectations that works for everybody or even most people. What makes a trap, in my opinion, is there being concrete indications in the concepts, in their contextual use, in their notation, or in the literature itself that run counter to the truth. Here are what strike Dick and me as a few simple but significant examples:

${\bullet}$ Square root is not a function. It is written like a function, but isn’t. Here is an example of what you can “do” by conveniently forgetting this: ${-1/1 = 1/-1}$, so take square roots of both sides. You get

$\displaystyle \frac{\sqrt{-1}}{\sqrt{1}} = \frac{\sqrt{1}}{\sqrt{-1}}\quad\text{so}\quad \frac{i}{1} = \frac{1}{i}\quad\text{so}\quad i^2 = 1.$

This contradicts the definition ${i^2 = -1}$.

${\bullet}$ Not all matrices are diagonalizable. Since even many singular matrices are diagonalizable, it is easy to forget this is not true in general. If it were true, then there would be a really quick proof that a matrix ${A}$ always satisfies its characteristic polynomial ${p}$. Namely, let ${A = P D P^{-1}}$ with ${D}$ the diagonal matrix of eigenvalues. Then on substituting the right-hand side into the formula ${p(A)}$, all the ${P}$‘s and ${P^{-1}}$‘s cancel except for one ${P}$ in front and one ${P^{-1}}$ in back. The rest is the component-wise evaluation ${p(\lambda)}$ on each eigenvalue ${\lambda}$ in ${D}$, which identically vanishes, leaving the all-zero matrix.

Well this is often not a bad error to make. Every matrix ${A}$ has arbitrarily close perturbed forms ${A'}$ that are diagonalizable, indeed have distinct eigenvalues. The above proof gives ${p'(A') = 0}$ where the characteristic polynomial ${p'}$ is coefficient-wise close to ${p}$. Continuity then implies ${p(A) = 0}$.

${\bullet}$ ${\mathbb{Z}_{p^k}}$ is not the same as the field ${GF(p^k)}$. The former is not a field, as it has zero divisors. The multiplicative subgroup formed by the elements that are not multiples of ${p}$ is not a field either. But this is again not always a bad error to make, even in crypto. A lot of properties and problems are similar between the structures.
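The last two bullets are easy to check concretely. The small sketch below (plain Python with NumPy, written for this aggregation rather than taken from the post) exhibits a non-diagonalizable matrix that nevertheless satisfies its characteristic polynomial, and a zero divisor witnessing that ${\mathbb{Z}_{8}}$ is not a field.

```python
import numpy as np

# A Jordan block: not diagonalizable (only one independent eigenvector),
# yet it satisfies its characteristic polynomial p(x) = x^2, as
# Cayley-Hamilton promises without any diagonalizability assumption.
A = np.array([[0, 1],
              [0, 0]])
print(A @ A)                                          # the zero matrix, so p(A) = 0

# Z_{p^k} is not a field: for p = 2, k = 3 the nonzero elements 2 and 4
# multiply to 0 modulo 8, so 2 cannot have a multiplicative inverse.
p, k = 2, 3
print((2 * 4) % p**k)                                 # 0: a pair of zero divisors
print([x for x in range(1, 8) if (2 * x) % 8 == 1])   # empty list: no inverse of 2
```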

These are really at high school or undergraduate level, before the research stage. What kind of traps matter at research level?

## My Trap

My own strongest feeling of falling into a “graduate student trap” came in October 2006, as I began my work on statistical claims of cheating with computers at chess that had arisen during the world championship match that month and before. I modeled a human player and a computer as distributions ${P}$ and ${Q}$ over the choices of available moves in game positions. Cheating would depend on how close the ensemble of played moves was to ${P}$ vis-à-vis ${Q}$, so I wanted a suitable distance measure ${d(P,Q)}$ between distributions. Modeling the computer as a distribution not only allows for different chess-engine program versions and parameter settings, but also for a steady amount of small variation caused by hash collisions—as I described first here and mainly here.

I decided to postulate that for two different (sets of) positions ${S_1}$ and ${S_2}$, the player’s distributions ${P_1,P_2}$ would be independent, and similarly ${Q_1,Q_2}$ for the computer. This makes the joint distributions ${P}$ and ${Q}$ on ${S_1 \cup S_2}$ satisfy ${P = P_1 \times P_2}$ and ${Q = Q_1 \times Q_2}$. So that I could group game turns as I wished, I wanted the distance measure to be additive, namely

$\displaystyle d(P,Q) = d(P_1 \times P_2, Q_1 \times Q_2) = d(P_1,Q_1) + d(P_2,Q_2).$

The first distance measure I considered, called Kullback-Leibler (K-L) divergence, is defined (on discrete domains ${X}$) by

$\displaystyle \kappa(P || Q) = \sum_{x \in X}P(x)\ln\frac{P(x)}{Q(x)}.$

Its Wikipedia page says straight out that ${\kappa}$ is additive. Great, I thought.
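For completeness, here is the standard one-line verification of that additivity claim (spelled out for this aggregation; it is not in the original post). Writing the logarithm of the product ratio as a sum of two logarithms,

$\displaystyle \kappa(P_1 \times P_2 || Q_1 \times Q_2) = \sum_{x,y} P_1(x)P_2(y)\left(\ln\frac{P_1(x)}{Q_1(x)} + \ln\frac{P_2(y)}{Q_2(y)}\right) = \kappa(P_1 || Q_1) + \kappa(P_2 || Q_2),$

since the sums over the remaining coordinate are just ${\sum_y P_2(y) = \sum_x P_1(x) = 1}$.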

Unfortunately, K-L is not symmetric, and more of concern to me, ${\kappa}$ approaches ${+\infty}$ whenever there are events ${x}$ for which ${Q(x)}$ is tiny but ${P(x)}$ is not. In chess, such events would be moves the computer recognizes as bad but that players still fall into, or are tempted by. This was a concern because chess positions can have many bad moves, so that the “tail” of the move distribution could distort the value of ${\kappa}$. I could switch around ${P}$ and ${Q}$ to avoid this, but then reasonable moves shunned by players would cause other distortion.

Applications employing distributional divergence measures were new to me, but it so happened that my department’s Distinguished Alumni Speaker that month knew something about them. After hearing my issues, he—I won’t name the “guilty party,” though I already did—suggested using Jensen-Shannon (J-S) divergence instead. J-S reduces this distortion by employing the interpolated distribution ${R = \frac{1}{2}P + \frac{1}{2}Q}$. Then it is defined from two invocations of K-L by

$\displaystyle \eta(P,Q) = \frac{1}{2}\kappa(P || R) + \frac{1}{2}\kappa(Q || R).$

This always gives finite values, and is symmetric—hence the use of comma not ${||}$. Analogous to how the sum of squared differences, which is obviously additive on product vectors, is the square of the Euclidean metric, ${\eta}$ is also the square of a metric. All this, plus the absence of contrary information, plus the frequent words “J-S is a symmetrized and smoothed version of K-L,” naturally made me assume that ${\eta}$ was additive. Grateful for the tip, I happily started on the machinery to apply it for chess.

A week later I started drafting a paper describing my concept and model, and decided it would be good to include a proof that J-S divergence is additive. Then, only then, is when I discovered with an electric shock:

It isn’t.

I’ll leave the task of actually constructing counterexamples to the reader, but here’s an intuition. It uses a generalizing trick that reminds me of one by Bob Vaughan that we covered last July. For any ${\lambda}$ put ${R' = \lambda P + (1 - \lambda)Q}$, and then define

$\displaystyle \eta'_{\lambda}(P,Q) = \lambda \kappa(P||R') + (1 - \lambda)\kappa(Q||R').$

A little reflection may convince you that this cannot be additive for all ${\lambda}$. Hence its being additive for ${\lambda = \frac{1}{2}}$, which yields ${\eta}$, would be an “accident.” Finally thinking how ${P}$ and ${Q}$ themselves can give-and-go with ${\lambda}$ gives the inkling that the accident doesn’t happen.
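If you would rather have a computer hunt for counterexamples than construct one by hand, the following sketch (plain Python with NumPy, added for this aggregation and not part of the original post) compares the two divergences on product distributions built from randomly generated factors. The K-L values agree to machine precision, while the J-S values generally do not.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence with natural log; assumes full support."""
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Basic Jensen-Shannon divergence (the lambda = 1/2 case)."""
    r = 0.5 * (p + q)
    return 0.5 * kl(p, r) + 0.5 * kl(q, r)

rng = np.random.default_rng(2006)

def rand_dist(n):
    x = rng.random(n) + 1e-3        # bounded away from zero so kl stays finite
    return x / x.sum()

p1, q1 = rand_dist(3), rand_dist(3)
p2, q2 = rand_dist(4), rand_dist(4)

# Joint distributions of independent pairs, flattened to vectors.
P = np.outer(p1, p2).ravel()
Q = np.outer(q1, q2).ravel()

print("K-L joint vs sum of parts:", kl(P, Q), kl(p1, q1) + kl(p2, q2))  # equal
print("J-S joint vs sum of parts:", js(P, Q), js(p1, q1) + js(p2, q2))  # typically differ
```

For generic random choices the two J-S numbers come out different, which is exactly the failure of additivity described above.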

## How Clear in the Literature?

I can put this in the form of a “blog-beg” (sometimes called a bleg):

Can you find an easily-accessed source that says clearly that the basic Jensen-Shannon divergence is not additive?

As of this writing, the Wikipedia page on J-S still does not have such a statement. Adding one yourself would be cheating. In 2006 I did not find one elsewhere, even in a couple of texts. My one-hour trial by Google when I first drafted this post last summer found one paper in 2008 titled “Nonextensive Generalizations of the Jensen-Shannon Divergence.” This clued me in that nonextensive is the antonym of additive. So the authors’ generalizations are not additive, but what about the original ${\eta}$?

Another paper I found had the promising title “Properties of Classical and Quantum Jensen-Shannon Divergence,” and even more, its first author and I have Harry Buhrman as a common coauthor. It defines J-S with a bang in the opening sentence, states some generalizations of J-S, and (still on page 1) says the magic word:

Shannon entropy is additive in the sense that the entropy of independent random variables, defined as the entropy of their joint distribution, is the sum of their individual entropies. (emphasis in original)

But the next sentence brings up the different topic of Rényi entropy, and after a mention of “non-extensive (i.e. nonadditive) generalizations” of J-S it goes into quantum.

Another paper picks up the thread in its title, “Nonextensive Generalizations of the Jensen-Shannon Divergence.” The point of the first two words must be that the original J-S is additive, yes? It’s the generalizations that are non-additive. Right? The paper’s abstract says it builds something called the Jensen-Tsallis ${q}$-difference, “which is a nonextensive generalization of the JSD.” So the original J-S is extensive, then? After defining additivity, it brings up the generalizations in which

… the additivity property is abandoned, yielding the so-called nonextensive entropies.

The next paragraph introduces K-L and J-S, but doesn’t tell me whether the “abandoned” property was ever there. It seems that this simple knowledge is presumed, but how might a bright young graduate student—or an old one—find it in the first place?

## Open Problems

Can you give some more examples of “graduate-student traps”? Ones that are helpful to know?

Is Jensen-Shannon divergence close to being additive, in some useful sense? This actually strikes me as a non-trappy, research-worthy question.

by KWRegan at May 16, 2013 04:42 AM UTC

### New Greedy Heuristics For Set Cover and Set Packing

Authors: David Kordalewski