# Theory of Computing Blog Aggregator

### Why do we think P NE NP? (inspired by Scott's post)

Recently Scott posted an excellent essay on reasons to think that P NE NP. This inspired me to post on the same topic. Inspired is probably the right word: some of this post is copied from his and some of it is my own thoughts. A good test: if it's intelligent and well thought out then it's probably from Scott.

Why do scientists believe any particular theory?  I state three reasons, though there are likely more:
1. By doing Popperian experiments- experiments that really can fail. Their failure to fail helps to confirm the theory. This is common in Physics, though it gets harder when the particles get smaller and string-like.
2. Great explanatory power. Evolution is the main example here--- it's hard to do real experiments, but one can look at the data that is already out there and find a theory that explains it all.
3. (Kuhn-light) It fits into the paradigm that scientists already have. This comes dangerously close to group think; however, if most of the people who have looked at problem X think Y, that should carry some weight.

Can we do Popperian experiments for P vs NP? For that matter, can we do Popperian experiments in Mathematics? Goldbach's conjecture and the Riemann Hypothesis seem to have good empirical evidence. Though that kind of reasoning makes me nervous because of the following true story: Let li(x) = int_0^x dt/ln t.
Let pi(x) be the number of primes \le x. It is known that li(x) and pi(x) are VERY CLOSE. Empirical evidence suggested that pi(x) \le li(x), and this was conjectured. Littlewood showed that the sign of li(x)-pi(x) changes infinitely often, so in particular there is an x for which pi(x) > li(x). His student Skewes gave an explicit bound on the first such x; Skewes' number was quite large, at one time the largest number to appear in a math paper (that record now belongs to Graham's Number). So the empirical evidence was not indicative.
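For concreteness, here is a small sketch (our own, assuming the mpmath library is available for the logarithmic integral) that compares pi(x) and li(x) at powers of 10. In this range li(x) stays slightly above pi(x), which is exactly the empirical pattern that turned out to be misleading.

```python
from mpmath import li   # logarithmic integral li(x)

def primepi(x):
    """pi(x): count primes <= x with a simple sieve (fine for x up to ~10^7)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)

for k in range(3, 8):
    x = 10 ** k
    print(x, primepi(x), float(li(x)))
```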

There might also be empirical tests you can do for continuous math, especially if it's related to physics, so that you can do physics experiments.

Have there been Popperian experiments to try to verify P NE NP? I am not quite sure what that means, but I do not think there have been (if I'm wrong please comment politely).

So we move on to Great Explanatory Power. Are there many different empirical facts out there that P NE NP would explain and unify? Does a bear... Well, never mind, the answer is YES! I give one example (from Scott's post) and two more. But note that there are MANY MANY MORE.

1. Set Cover problem: Given S_1,...,S_m \subseteq {1,...,n} find the size of the smallest subset of S_1,...,S_m that covers the union of S_1,...,S_m. Chvatal showed in 1979 that one can find, in poly time, a subset of S_1,...,S_m of size at most (ln n)*OPT (a sketch of the greedy algorithm appears after this list). Okay, great, an approximation. EMPIRICAL FACT: people seemed unable to improve on this at all. In 2013 Dana Moshkovitz proved that, assuming P\ne NP, this bound CANNOT be broken. Note that the algorithm of Chvatal and the lower bound of Moshkovitz have nothing to do with each other.
2. Vertex cover: given a graph, find the smallest set of vertices such that every edge has at least one endpoint in the set. There is an algorithm that gives 2*OPT. There is one that does ever so slightly better: (2 - 1/sqrt(log V))*OPT. In 2005 Dinur and Safra proved that, assuming P NE NP, there is no 1.36*OPT approximation. This does not match exactly but it still explains the lack of progress somewhat (more on this later when I discuss UGC).
3. Max 3-SAT: given a 3-CNF formula find an assignment that maximizes the number of clauses satisfied.  Karloff and Zwick proved that there is an algorithm that finds an assignment satisfying (7/8)*OPT. Hastad proved that, assuming P NE NP, (7/8)*OPT is the best you can do.
A bit more prosaic: P NE NP explains why people have had a hard time solving THOUSANDS OF PROBLEMS. I am most impressed with HAM CYCLE since mathematicians had been working on that one for quite some time--- trying to get a characterization similar to the one for EULER circuit.
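To make item 1 concrete, here is a minimal sketch of Chvatal's greedy rule (our own illustration, not code from any of the papers mentioned). On the toy instance below the optimum uses 3 sets but greedy uses 4, which is the kind of gap the ln(n) guarantee allows.

```python
def greedy_set_cover(sets, universe):
    """Chvatal's greedy rule: repeatedly pick the set that covers the most
    still-uncovered elements; stop when everything is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        i = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        if not sets[i] & uncovered:
            break  # remaining elements cannot be covered at all
        chosen.append(i)
        uncovered -= sets[i]
    return chosen

# Toy instance: the optimum cover is {1,2},{3,4},{5,6} (3 sets), but greedy
# grabs {1,3,5} first and then still needs the three small sets, using 4.
sets = [{1, 2}, {3, 4}, {5, 6}, {1, 3, 5}]
print(greedy_set_cover(sets, set().union(*sets)))   # [3, 0, 1, 2]
```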

So in summary, I find that P NE NP has GREAT explanatory power. That makes it a very compelling conjecture. Let us apply this test to other conjectures.

1. Sigma_2 \ne Pi_2. Does this assumption explain anything? Our inability to find small (poly-size) circuits for SAT: by Karp-Lipton, if SAT had poly-size circuits then Sigma_2 = Pi_2. I don't know what else it implies. Same for Sigma_i vs Pi_i. This might qualify as mini-Kuhnian: Sigma_2 \ne Pi_2 fits into how we view the world.
2. P=BPP. Almost every problem in BPP ended up falling into P over time. P=BPP would explain this. Also Nisan-Wigderson and its extensions make P=BPP fit into our world view.
3. Unique Games Conjecture. This explains many upper and lower bounds that match, though nowhere near as many as P NE NP does. One of them is the constant 2 for Vertex Cover. Even so, I find that compelling. (One of Scott's commenters said that all of the lower bounds from the UGC are actually unified somehow, so it's not quite as compelling.)
4. Factoring not in P. No real explanatory power here except that we seem to have a hard time finding an algorithm for Factoring.
5. Graph Isom not in P. Similar to Factoring not in P.
So, what are our reasons to think Sigma_2 \ne Pi_2?

Lastly, mini-Kuhnian. What do people in the field think? The polls on P vs NP that I conducted in 2002 and 2012 (see here) indicate that belief that P NE NP is growing: roughly 60% in 2002, roughly 80% in 2012. Some of the commentators on Scott's blog took the roughly 20% who did not say P NE NP to be relevant.
And indeed some of the P=NP people are serious theorists, and not just Dick Lipton (who seems to be the one everyone points to as a serious theorist who thinks P=NP is possible). But some of those people emailed me that this was a protest vote, protesting the field's certainty that P NE NP. I also note that three of them compared it to voting for Ralph Nader in 2000, only with less drastic consequences.

I personally don't take what people think that seriously, but because of my polls we actually know what people think, so I put it out there.

by GASARCH (noreply@blogger.com) at March 10, 2014 02:55 PM UTC

### How To Carry Fame

from Richard Lipton

Long proofs are not always the most important results

Michael Rabin is visiting Georgia Tech today and tomorrow to give a pair of distinguished lectures. Both of these will be on applications of cryptography. One is to help auctions avoid cheaters, while the other is to help elections avoid cheaters. I see a pattern. Ken sees another pattern— he is helping chess tournaments avoid cheaters.

Today I want to comment about Rabin’s fame and what makes a result important.

I have known Michael since I was a graduate student at CMU—I have talked about this before here. In the decades since then I have heard him give many talks, all of which have been brilliant. He is one of the best presenters of technical material I have ever seen, perhaps the best in the world. My “proof” of this statement is:

that I can still recall—in detail—most of his talks, even ones from decades ago.

Can you recall that talk you heard last year, or even one you heard last month? I have trouble recalling my own talks. But Michael’s talks are special, memorable, informative, clear, and fun.

## Great Talks

I have selected a few talks of Michael that I recall in great detail—they span about forty years. There are many others that I could have added, but these should make my point.

${\bullet }$ His talk on Theoretical impediments to artificial intelligence, was the first of his talks that I had ever heard. It was at the 1974 IFIP Congress, which occurred in Stockholm Sweden. There was a time when the IFIP Congress was a major conference that many of us went to. I met Dick Karp there for the first time.

${\bullet }$ His talk on the introduction of randomness to algorithms, which was given at Yale when I was there as a junior faculty member. It was in 1977, I recall. This talk made the case for the power of randomness—Michael showed that randomness could help in a certain geometric search problem. I talked about this in detail in the same post with the CMU story.

${\bullet }$ His talk on the Karp-Rabin pattern matching algorithm was given in the 1980s at Princeton University. We have also talked about this before here.

${\bullet }$ His talk on hyper-encryption was given at Georgia Tech about ten years ago. This was a cool idea—I believe—on using non-complexity assumptions to build encryption methods that were very powerful. The short insight was that memory is expensive, and one could defeat an adversary that had limited memory. This yielded a protocol that needed no assumptions about factoring or the existence of one-way functions.

## Great Results

Why indeed is Rabin famous? He received the Turing Award with Dana Scott for their work on finite state automata (FSA). I would argue that his most exciting results were curiously his least deep results. We all know about FSA; his introduction of randomness to all parts of computing; his primality test, independent of but related to Gary Miller’s work; his pattern matching algorithm with Karp; and much more. Yet, I would argue that his deepest result is probably his least known. It was, and is, his brilliant work on S2S.

## Second Order Monadic Theory

What is S2S?

There are many logical theories that we study, such as Peano Arithmetic (PA). PA is a first-order theory. This means that quantifiers can only range over individual elements—in PA they range over integers. Thus, in PA we can say

$\displaystyle \forall x \ \exists r \ \exists s \ \exists t (t\neq 0 \wedge tx = r^{3} + s^{3}).$

This states that all numbers have a non-zero multiple that is a sum of two cubes. This is true—but it is not trivial.
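As a small illustration (our own sketch; it searches over natural numbers, allowing r or s to be zero), one can compute the smallest witness t for each x. For example it finds 7*4 = 28 = 1^3 + 3^3.

```python
def smallest_cube_sum_multiple(x, limit=10**6):
    """Return (t, r, s) with t >= 1 minimal such that t*x = r**3 + s**3
    (r, s, t natural numbers), or None if nothing is found below limit."""
    cube_of = {n ** 3: n for n in range(int(limit ** (1 / 3)) + 2)}
    for t in range(1, limit // x + 1):
        target = t * x
        for r in range(int(target ** (1 / 3)) + 2):
            if r ** 3 <= target and target - r ** 3 in cube_of:
                return t, r, cube_of[target - r ** 3]
    return None

for x in range(1, 11):
    print(x, smallest_cube_sum_multiple(x))
```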

The reason PA is so powerful is that it allows both addition and multiplication. For statements like the one above about cubes, it is impossible in general to decide whether they are true or not.

We obviously like decidable theories since at least in principle they allow us to tell if a statement is true or false. Of course if ${\mathsf{P} \neq \mathsf{NP}}$, then even for a decidable theory it may be hard to tell whether something is true. But still decidable is a great property for a theory to have.

A difficulty is the tension between being an expressive theory and being decidable. PA is very expressive: most everyday theorems of mathematics can be proved in it, at least in principle. It is so expressive that even weak subtheories are undecidable.

Enter S2S. The theory S2S is a different kind of theory from PA. While PA is a first-order theory, S2S is a second-order theory: the name stands for the monadic Second-order theory of 2 Successors. It allows quantifiers to range over individual elements and also over finite or infinite sets of elements. The basic objects in S2S are finite binary strings, the nodes of the infinite binary tree.

In S2S we can talk about the left and right successor of any such element: if ${x}$ is an element, then ${x0}$ and ${x1}$ are the respective successors. Since it is a second order theory we are also allowed to quantify over sets of such elements.

## Decidedly More Power

The ability to quantify over sets makes S2S very powerful and expressive. For example, here are two notions expressed formally:
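For instance, here is one plausible rendering (our own illustration, using only membership, the two successors, and set quantification) of two such notions: “${x}$ is a prefix of ${y}$,” and “every element of ${S}$ has exactly one successor in ${S}$.”

$\displaystyle x \preceq y \;\equiv\; \forall S\,\Big[\big(x \in S \;\wedge\; \forall z\,(z \in S \rightarrow z0 \in S \wedge z1 \in S)\big) \rightarrow y \in S\Big],$

$\displaystyle \mathrm{Chain}(S) \;\equiv\; \forall z\,\big(z \in S \rightarrow (z0 \in S \leftrightarrow \neg\, z1 \in S)\big).$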

The magic of this is that while the theory is expressive, it is not too expressive. Indeed Rabin proved in 1969:

Theorem 1 The monadic second order theory of the infinite binary tree is decidable.

When I first looked at Rabin’s paper, as a graduate student at CMU, it was not the depth of his proof, which is wonderful, but rather the array of applications that followed that excited me. One measure of the depth of a theorem is the number of open problems it solves. Rabin’s theorem can be used to prove the following other theories are decidable:

• The first order theory of rationals with the order ${<}$ relation.
• The first order theory of boolean algebras with ideals.
• Certain modal logics, such as S4.

These results follow by encoding the decidability question into the powerful theory S2S and invoking Rabin’s Theorem. See this for a nice summary of S2S in slide format by Shane Steinert-Threlkeld.

The proof of Rabin’s Theorem was a tour-de-force. It requires clever definitions and some quite detailed inductive arguments. Since his original proof people have found “easier” proofs, but the original was quite deep and intricate.

I would argue that this theorem is one of the deepest results of Rabin’s many beautiful results over his long career. It is well known to those who work in logic and automata theory, but is perhaps less known to the whole theory community. If you already knew it, fine; if not, then I hope you begin to appreciate the depth of his work.

## Open Problems

Perhaps a lesson here for all: fame comes from results that are game-changers, which does not always mean they are deep long complex arguments. Sometimes it does: clearly the solutions to Fermat’s Last Theorem and the Poincaré Conjecture are famous and deep results. Yet I think Rabin’s situation is more often the case: a simple-to-state result that yields an “ah” moment, that opens doors for others, that changes the landscape of thinking about an area, is the most important type of result. Rabin has many many of these results. I would argue that without S2S he still would be one of the greatest theorists who has ever lived.

What do you think?

by rjlipton at March 10, 2014 02:43 PM UTC

### TR14-034 | On the complexity of trial and error for constraint satisfaction problems | Raghav Kulkarni, Miklos Santha, Youming Qiao, Gábor Ivanyos, Aarthi Sundaram

from ECCC papers

In a recent work of Bei, Chen and Zhang (STOC 2013), a trial and error model of computing was introduced, and applied to some constraint satisfaction problems. In this model the input is hidden by an oracle which, for a candidate assignment, reveals some information about a violated constraint if the assignment is not satisfying. In this paper we initiate a systematic study of constraint satisfaction problems in the trial and error model. To achieve this, we first adopt a formal framework for CSPs, and based on this framework we define several types of revealing oracles. Our main contribution is to develop a transfer theorem for each type of the revealing oracle, under a broad class of parameters. To any hidden CSP with a specific type of revealing oracle, the transfer theorem associates another, potentially harder CSP in the normal setting, such that their complexities are polynomial time equivalent. This in principle transfers the study of a large class of hidden CSPs, possibly with a promise on the instances, to the study of CSPs in the normal setting. We then apply the transfer theorems to get polynomial-time algorithms or hardness results for hidden CSPs, including satisfaction problems, monotone graph properties, isomorphism problems, and the exact version of the Unique Games problem. Most of the proofs of these results are short and straightforward, which exhibits the power of the transfer theorems.

### TR14-033 | Candidate Weak Pseudorandom Functions in $\mathrm{AC}^0 \circ \mathrm{MOD}_2$ | Siyao Guo, Adi Akavia, Andrej Bogdanov, Alon Rosen, Akshay Kamath

from ECCC papers

Pseudorandom functions (PRFs) play a fundamental role in symmetric-key cryptography. However, they are inherently complex and cannot be implemented in the class $\mathrm{AC}^0(\mathrm{MOD}_2)$. Weak pseudorandom functions (weak PRFs) do not suffer from this complexity limitation, yet they suffice for many cryptographic applications. We study the minimal complexity requirements for constructing weak PRFs. To this end: (1) We conjecture that the function family $F_A(x) = g(Ax)$, where $A$ is a random square $GF(2)$ matrix and $g$ is a carefully chosen function of constant depth, is a weak PRF. In support of our conjecture, we show that functions in this family are inapproximable by $GF(2)$ polynomials of low degree and do not correlate with any fixed Boolean function family of subexponential size. (2) We study the class $\mathrm{AC}^0 \circ \mathrm{MOD}_2$ that captures the complexity of our construction. We conjecture that all functions in this class have a Fourier coefficient of magnitude $\exp(-\mathrm{poly}\log n)$ and prove this conjecture in the case when the $\mathrm{MOD}_2$ function is typical. (3) We investigate the relation between the hardness of learning noisy parities and the existence of weak PRFs in $\mathrm{AC}^0 \circ \mathrm{MOD}_2$. We argue that such a complexity-driven approach can play a role in bridging the gap between the theory and practice of cryptography.
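For intuition, here is a minimal sketch of the shape of the candidate construction, with a toy placeholder for $g$ (an OR of ANDs over blocks of the output bits); the paper's actual $g$ is a carefully chosen constant-depth function, so this is illustrative only.

```python
import random

def random_gf2_matrix(n):
    """A uniformly random n-by-n matrix over GF(2); this is the secret key."""
    return [[random.randrange(2) for _ in range(n)] for _ in range(n)]

def mat_vec_gf2(A, x):
    """Compute Ax over GF(2)."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in A]

def g_placeholder(y):
    """Toy stand-in for g: an OR of ANDs over blocks of 4 bits.
    (Illustrative only; the paper uses a carefully chosen constant-depth g.)"""
    blocks = [y[i:i + 4] for i in range(0, len(y), 4)]
    return int(any(all(block) for block in blocks))

def candidate_weak_prf(A, x):
    """The shape of the candidate: F_A(x) = g(Ax)."""
    return g_placeholder(mat_vec_gf2(A, x))

n = 16
A = random_gf2_matrix(n)
x = [random.randrange(2) for _ in range(n)]  # weak PRFs see only random inputs
print(candidate_weak_prf(A, x))
```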

### Massively parallel read mapping on GPUs with PEANUT

Authors: Johannes Köster, Sven Rahmann
Download: PDF
Abstract: We present PEANUT (ParallEl AligNment UTility), a highly parallel GPU-based read mapper with several distinguishing features, including a novel q-gram index (called the q-group index) with small memory footprint built on-the-fly over the reads and the possibility to output either the best hits or all hits of a read. Designing the algorithm particularly for the GPU architecture, we were able to reach maximum core occupancy for several key steps. Our benchmarks show that PEANUT outperforms other state-of-the-art mappers in terms of speed and sensitivity. The software is available at this http URL

### Implementing and reasoning about hash-consed data structures in Coq

Authors: Thomas Braibant, Jacques-Henri Jourdan, David Monniaux
Download: PDF
Abstract: We report on four different approaches to implementing hash-consing in Coq programs. The use cases include execution inside Coq, or execution of the extracted OCaml code. We explore the different trade-offs between faithful use of pristine extracted code, and code that is fine-tuned to make use of OCaml programming constructs not available in Coq. We discuss the possible consequences in terms of performance and guarantees. We use the running example of binary decision diagrams and then demonstrate the generality of our solutions by applying them to other examples of hash-consed data structures.

### Simons Symposium 2014 — Discrete Analysis: Beyond the Boolean Cube

from Ryan O'Donnell

I’m pleased to announce that this week we’ll be reporting on the 2014 Simons Symposium — Discrete Analysis: Beyond the Boolean Cube. This is the second of three biennial symposia on Analysis of Boolean Functions, sponsored by the Simons Foundation. You may remember our reports on the 2012 edition which took place in Caneel Bay, US Virgin Islands. This year we’re lucky to be holding the symposium in Rio Grande, Puerto Rico.

I’m also happy to report that we will have guest blogging by symposium attendee Li-Yang Tan. This year’s talk lineup looks quite diverse, with topics ranging from the Bernoulli Conjecture Theorem to Fourier analysis on the symmetric group, to additive number theory. Stay tuned!

by Ryan O'Donnell at March 09, 2014 09:35 PM UTC

### TR14-032 | Tableau vs. Sequent Calculi for Minimal Entailment | Olaf Beyersdorff, Leroy Chew

from ECCC papers

In this paper we compare two proof systems for minimal entailment: a tableau system OTAB and a sequent calculus MLK, both developed by Olivetti (J. Autom. Reasoning, 1992). Our main result shows that OTAB-proofs can be efficiently translated into MLK-proofs, i.e., MLK p-simulates OTAB. The simulation is technically very involved and answers an open question posed by Olivetti (1992) on the relation between the two calculi. We also show that the two systems are exponentially separated, i.e., there are formulas which have polynomial-size MLK-proofs, but require exponential-size OTAB-proofs.

### Codes Meet Numbers

from Richard Lipton

Results and problems about primes that make us think of coding theory


Zhi-Wei Sun is a professor in Mathematics at Nanjing University in China. He is the Editor in Chief of the recently-founded Journal of Combinatorics and Number Theory. His homepage is unique in prominently featuring a long list of

• …not his awards,

• …not his papers,

• …not his theorems,

• …but rather his conjectures.

Indeed we count 432 total conjectures in his list, subtracting one that he seems last year to have proved. They do not seem to be easy conjectures—this one implies the Riemann Hypothesis in a nontrivial way. Some of them involve powers of 2, which lend them a coding-theory flavor.

Today Ken and I wish to share ideas of using coding theory to prove number-theoretic results.

Coding theory owes much to Richard Hamming, who defined Hamming distance between codewords and created the binary Hamming codes. The Hamming sphere of radius ${d}$ centered on a word ${x}$ is the set of all words ${y}$ of the same length whose Hamming distance is at most ${d}$, meaning ${x}$ and ${y}$ differ in at most ${d}$ places. Our question is how the distributions of prime numbers and other sets relate to the Hamming spheres of their binary encodings. For example, ${5 = 101}$ and ${7 = 111}$ are twin primes of Hamming distance ${1}$, while ${11 = 1011}$ and ${13 = 1101}$ have distance ${2}$.

Sun has his own “Super Twin Prime Conjecture” listed first on his page. Call a pair ${(k,m)}$ of integers “super” if ${\pi(k)}$ and ${\pi(\pi(m))}$ are both twin primes, indeed the lower member of a twin pair, where ${\pi(k)}$ denotes the ${k^{th}}$ prime number. The conjecture—a hybrid of Twin and Goldbach—is:

Every integer ${n \geq 3}$ is the sum of a super pair.

He has verified this for ${n}$ up to ${10^9}$. What can motivate such a conjecture? Certainly one motivation is expected density—not only “should” the twin primes be infinite, but they should be dense enough to fill this kind of requirement, much as the set of primes themselves is dense enough to make Goldbach’s conjecture—that every even number ${n \geq 4}$ is the sum of two primes—plausible. But what other structure can supply motivation? That is what we seek.

## Numbers and Codes

Coding theory is all about the properties of vectors over finite sets, often over the binary field ${\{0,1\}}$. Of course number theory is all about the additive and multiplicative properties of integers. These seem to be vastly different, yet they are related: every number can be represented by a binary vector.

Coding theory gives a way to express relationships that might be cumbersome with arithmetical tools such as congruences. For example, say a number ${r}$ is “top modulo ${2^m}$” if its ${m}$-bit encoding begins with a ${1}$. Of course we can specify this as ${2^{m-1} \leq r < 2^m}$, but “less-than” is a dodgy concept when working modulo some number. When ${m}$ is odd we might want to define instead that the middle bit of ${r}$ in binary is a ${1}$. Middle bits are sometimes important in crypto, but relations involving them are not always easy to define via congruences.

Of course a number is odd iff its last bit in binary is ${1}$, and is congruent to ${3}$ mod ${4}$ iff its second-last bit is also ${1}$. The distinction between congruence to ${1}$ versus ${3}$ mod ${4}$ is generally important for odd primes. How about congruence to ${5}$ or ${7}$ mod ${8}$, versus ${1}$ or ${3}$, that is being top mod ${8}$? Of course one important distinction is which congruences are quadratic residues, but with binary-code notions we can define others.

The number ${71 = 1000111}$ is congruent to ${7}$ mod ${8}$, and is part of a twin pair of Hamming distance ${3}$ with ${73 = 1001001}$. The first twin pair with greater Hamming distance actually gives distance ${6}$: ${191 = 10111111}$ and ${193 = 11000001}$. Next comes ${2687 = 101001111111}$ and ${2689 = 101010000001}$ for distance ${7}$. Is the Hamming distance of twin primes unbounded?
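A quick way to reproduce these record distances (our own sketch, plain Python, trial division since the numbers stay small):

```python
def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def hamming(a, b):
    return bin(a ^ b).count("1")

record = 0
for p in range(3, 100000, 2):          # scan twin prime pairs (p, p+2)
    if is_prime(p) and is_prime(p + 2):
        d = hamming(p, p + 2)
        if d > record:                  # report each new record distance
            record = d
            print(p, p + 2, d)
```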

Of course we don’t even know if there are infinitely many twin primes. This is really asking whether twin primes can flank a multiple of an arbitrarily large power of ${2}$. Quick web searches have not found this question, while our point is that our motive came from the coding-theory angle.

## Polignacs and Twins and Spheres

In 1848, Camille Marie de Polignac somewhat lazily conjectured that every odd number ${n}$ is the sum of a prime number and a power of ${2}$. We speculate that he may have intended to exclude false cases such as ${n = 127}$ where ${n}$ itself is prime, but even so he missed ${n = 905}$ amid his reported (but to us incredible) claim to have verified this up to about 3 million. Perhaps it was spoken with the bluster of a Confederate major-general during the Civil War, which is surprisingly what this French nobleman became.

His older brother, Alphonse de Polignac, made a somewhat less lazy conjecture: that every even number ${k}$ is the difference between infinitely many pairs of consecutive primes. Of course with ${k=2}$ this subsumes the Twin Primes Conjecture, and indeed the latter is sometimes called de Polignac’s Conjecture after him.

Should Camille have teamed up with his brother to make his conjecture? And would they have done better if they had been twins—maybe prove something about their conjecture? Well we have an example to go by: Zhi-Wei Sun has a twin brother, Zhi-Hong Sun at Huaiyin Normal University, and they teamed up in 1992 to prove something about the following conjecture by Donald Wall:

There are infinitely many primes ${p}$ such that either ${p \equiv \pm 1}$ modulo ${5}$ and ${p^2}$ divides the Fibonacci number ${F_{p - 1}}$, or ${p \equiv \pm 2}$ modulo ${5}$ and ${p^2}$ divides the Fibonacci number ${F_{p + 1}}$, where ${F_n}$ is indexed with ${F_0 = 0}$, ${F_1 = 1}$.

What Sun and Sun proved is that any minimal counterexample to Fermat’s Last Theorem would need to involve such a prime—of course from the proof of FLT two years later, we know there are none. They also gave a sufficient condition for a “Wall-Sun-Sun prime” to exist, though none has yet been found.

Back to the Polignacs, we can try to capture ideas of both their conjectures with a case of what is actually a pretty general kind of question—a kind one can pose about other sets of numbers besides the primes:

What is the minimum ${d}$ such that every odd number is within the distance-${d}$ Hamming sphere of a prime number? Is it finite?

To get the even numbers too we can add ${1}$ to ${d}$. Of course this is still a strong “every” kind of conjecture, and those are hard to prove. One can first try to attack “infinitely-many” versions. Obviously there are infinitely many odd numbers ${q}$ that are ${\pm 2}$ from a prime ${p}$, but if we insist that ${q}$ too be prime we have our old friend the Twin Prime Conjecture again. So here is the corresponding Hamming-like question:

What is the minimum ${d}$ such that there are infinitely many prime numbers ${q}$ that are within Hamming distance ${d}$ of some other prime number?

Using Hamming’s own ideas in coding theory, we can prove the minimum is at most ${d=2}$. Note that this is stronger than saying there are infinitely many pairs of primes ${(p,q)}$ such that ${q - p = 2^k \pm 2^l}$, because we are also restricting what ${p}$ and ${q}$ have in the bit positions ${k}$ and ${l}$ from the end.

## The Proof

The theorem is not that amazing, or unexpected, but we think how we prove it may be of interest. The proof is via Hamming’s famous theorem on the density of codes. Let ${A_{q}(n,d)}$ be the size of the largest set ${S}$ of length-${n}$ vectors over an alphabet of size ${q}$ such that any two distinct code words in ${S}$ are at least Hamming distance ${d}$ apart.

Theorem 1 For all ${n}$ and ${q}$ and ${d}$:

$\displaystyle A_{q}(n,d) \le \frac{q^{n}}{\sum_{k=0}^{t} \binom{n}{k}(q-1)^{k}},$

where

$\displaystyle t = \left\lfloor \frac{d-1}{2} \right\rfloor.$

Now to state formally what we are proving, it is:

Theorem 2 For every sufficiently large ${n}$, there are primes ${p,q}$ with ${2^n \leq p,q < 2^{n+1}}$ such that ${p}$ and ${q}$ have Hamming distance at most ${2}$.

Proof: Consider the set ${S}$ of all primes in the interval ${[2^{n}, \dots, 2^{n+1}-1]}$. These of course can be represented by ${n}$-bit vectors, eliding the leading ${1}$ in the ${2^n}$ place. Think of them as a code. We will show that its minimum distance ${d}$ is at most ${2}$, from which the theorem follows.

Suppose that the distance is larger, that is ${d \ge 3}$. Then apply Hamming’s Theorem to ${A_{2}(n,d)}$, noting that ${t \ge 1}$. This yields

$\displaystyle |S| \le \frac{2^{n}}{\sum_{k=0}^{t} \binom{n}{k}} \le \frac{2^{n}}{1+n}.$

The Prime Number Theorem states that the number of primes up to ${N}$ is asymptotic to ${N/\ln N}$ as ${N \longrightarrow \infty}$, where ${\ln}$ is natural log. By an easy manipulation of estimates it follows that for any ${\epsilon > 0}$ and all large enough ${N}$, the primes have density at least ${(1-\epsilon)\frac{1}{\ln N}}$ in ${[N/2,\dots,N-1]}$. It follows with ${N = 2^{n+1}}$ that

$\displaystyle |S| \geq (1-\epsilon)\frac{2^{n}}{(n+1)\ln 2}.$

Since ${1/\ln 2 = \log_2 e = 1.44\dots}$, this implies with the above that

$\displaystyle \frac{1.44(1-\epsilon)2^n}{n+1} \le \frac{2^n}{n+1},$

but this is clearly false for large enough ${n}$ and small enough ${\epsilon}$. This contradiction proves ${d \leq 2}$ and hence the theorem. $\Box$
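The statement is also easy to check numerically for small ${n}$; the following sketch (ours) computes the minimum pairwise Hamming distance among the primes in ${[2^{n}, 2^{n+1})}$, and, as the theorem predicts, the answer is at most ${2}$ (in fact it is typically ${1}$).

```python
def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def hamming(a, b):
    return bin(a ^ b).count("1")

for n in range(4, 15):
    primes = [p for p in range(2 ** n, 2 ** (n + 1)) if is_prime(p)]
    d = min(hamming(p, q) for i, p in enumerate(primes) for q in primes[i + 1:])
    print(n, d)   # minimum Hamming distance within the n-bit block of primes
```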

As the proof shows, there is actually a fair bit of “slack” in the counting. Hence the theorem can be extended to add conditions on the primes: we can further restrict the sizes of the primes, their residues modulo small numbers, and other properties. Indeed the counting makes this all close to working with ${d=1}$. That takes us back within the sphere of Camille de Polignac’s question as well.

## An Obstinate Question?

For ${d=1}$ the question again comes in “every” (with the qualifier “large enough”) and “infinitely many” flavors:

1. Is it true that for every large enough prime ${p}$, there is a prime ${q}$ such that ${p}$ and ${q}$ differ in one bit, or at least differ by a power of ${2}$? (no)

2. Are there infinitely many primes ${p}$ such that for some other prime ${q}$, ${p}$ and ${q}$ differ in one bit, or at least ${q - p}$ is a power of ${2}$? (open)

Note that ${q > p}$ is allowed. Without that condition we’d have already the counterexample ${p = 127}$ to Camille’s conjecture. Incidentally, counterexamples to Camille are called obstinate numbers, and there are various pages devoted to enumerating them. A case where ${q > p}$ is important is ${p = 113,\!921}$: ${q = p + 2^{141}}$ is prime. Of course whenever ${q \geq 2p}$ such a pair also has Hamming distance ${1}$, using leading ${0}$‘s to pad ${p}$ to the same length as ${q}$.

In 1964, Fred Cohen and John Selfridge noted that allowing ${q > p}$ made Camille’s idea good for every odd number up to ${262,144}$. However, they proved that the prime

$\displaystyle p = 47,\!867,\!742,\!232,\!066,\!880,\!047,\!611,\!079$

cannot be written as ${p = q \pm 2^k}$ with ${q}$ prime. Moreover, they gave an infinite arithmetic progression of numbers ${a + bn}$ with ${a,b}$ coprime that cannot be written as ${\pm 2^k \pm q^l}$ with ${q}$ prime. Since every such progression contains infinitely many primes, this finally lays question 1 to rest even with the “large enough” modifier. Zhi-wei Sun himself made good on something stated in their abstract but not proved in their paper, in a 2000 paper giving an infinite arithmetic progression of numbers that cannot be written as ${p^k \pm q^l}$ with ${p,q}$ prime and ${k,l > 0}$.

All this still, however, leaves the second question open. We would like to prove it, indeed find moderately dense sets of such pairs.

## Open Problems

Are there infinitely many pairs of primes that differ in just one bit?

We note that there are people who have thought about connections between coding theory and number theory. For example, Toyokazu Hiramatsu and Günter Köhler have a whole monograph titled Coding Theory and Number Theory on this topic. But the idea there is to apply number theory to shed light on the structure of codes. Elliptic curves, for instance, can be used to construct certain interesting codes. We are interested in the impact of coding theory on number theory, such as the distribution of important sets like the primes, which in turn may have applications in computing theory.

[$2^k + 2^l$ changed to $2^k \pm 2^l$]

### Induced Clebsch subgraphs

from David Eppstein

Today I've been playing with the induced subgraphs of the Clebsch graph. Several other interesting and well-known graphs can be obtained from it by deleting a small number of vertices and forming the induced subgraph of the remaining vertices.

To begin with, one simple construction of the Clebsch graph is to take all length-four binary strings as vertices, and to make two strings neighbors when they differ either by a single bit or by all four bits. So it has sixteen vertices and 40 edges, and can be visualized as a four-dimensional hypercube with the long diagonals added.

It can be split into two three-dimensional cubes, by partitioning the binary strings according to the value of their first bit. Since the cube is bipartite, we can color the whole Clebsch graph with four colors, by using two colors within each cube. This is optimal, because the largest independent sets in the Clebsch graph (the sets of neighbors of a single vertex) have five vertices, not enough for three of them to cover the whole graph.
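These counts are easy to verify by brute force; here is a small self-contained sketch (our own) that builds the graph from the 4-bit-string description, confirms 40 edges and degree 5, and checks that the largest independent set has exactly five vertices, so three color classes cannot cover all sixteen vertices.

```python
from itertools import combinations

vertices = range(16)                 # each vertex is a 4-bit string, as an int

def adjacent(u, v):
    d = bin(u ^ v).count("1")        # Hamming distance between the strings
    return d == 1 or d == 4          # differ in one bit, or in all four bits

edges = [(u, v) for u, v in combinations(vertices, 2) if adjacent(u, v)]
print(len(edges))                                        # 40
print({sum(adjacent(u, v) for v in vertices if v != u)
       for u in vertices})                               # {5}: 5-regular

best = 0                             # brute-force the independence number
for mask in range(1 << 16):
    s = [v for v in vertices if mask >> v & 1]
    if len(s) > best and not any(adjacent(u, v) for u, v in combinations(s, 2)):
        best = len(s)
print(best)                                              # 5
```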

The Clebsch graph can also be split in a different way into two copies of the Wagner graph:

Removing a vertex and its five neighbors from the Clebsch graph (a K1,5 subgraph) leaves a copy of the Petersen graph as the remaining graph. Similarly removing a maximum independent set leaves a Petersen graph together with a single isolated vertex.

Removing a five-vertex cycle (such as the central cycle of the Petersen graph above) leaves a copy of the Grötzsch graph:

Removing a four-vertex maximal independent set (four independent vertices that do not have a single common neighbor) leaves a subdivision of the complete bipartite graph K4,4, in which four edges forming a matching have been subdivided:

Removing a four-cycle leaves a twelve-vertex torus graph with 24-way symmetry:

Here are a couple of less-symmetric large planar induced subgraphs.

The second of these planar graphs can be formed by removing all binary strings with equal numbers of zeros and ones, a six-vertex subset that induces a three-edge matching. If we instead partition the vertices into strings with even and odd parity, we get a partition of the Clebsch graph into two eight-vertex induced matchings.

The Schläfli graph (or its complement) has many induced Clebsch graphs inside it, so presumably it also has an interesting collection of symmetric induced subgraphs. But that's getting a bit too large for the sort of by-hand analysis I did to get the list here; it calls for automation instead.

### David Woodruff receives the Presburger Award 2014

from Luca Aceto

The EATCS is proud to announce that the Presburger Award 2014 Committee has chosen David Woodruff (IBM Almaden Research Center) as the recipient of the Presburger Award 2014. Congratulations to David!

Since 2010, the Presburger Award has been given each year to a young scientist (in exceptional cases to several young scientists) for outstanding contributions in theoretical computer science, documented by a published paper or a series of published papers. The award is named after Mojzesz Presburger who accomplished his path-breaking work on decidability of the theory of addition (which today is called Presburger arithmetic) as a student in 1929. The Presburger Award 2014 is sponsored by CWI Amsterdam and will be presented at ICALP 2014 in Copenhagen, Denmark.

David Woodruff, born in 1980, has made important contributions to the theory of data streams, both creating new algorithms, and demonstrating that certain algorithms cannot exist. His work has an impact on other fields, including compressed sensing, machine learning, and numerical linear algebra. In the area of data streams, he resolved the Distinct Elements Problem, simultaneously optimizing the amount of memory used, the time needed to process each new entity, and the time needed to report an estimate of the number of distinct elements in the stream. In the area of machine learning, he used his previous results on data streams to design sub-linear algorithms for linear classification and minimum enclosing ball problems. In numerical linear algebra, he developed the first algorithms for low rank approximation and regression that run in time proportional to the number of non-zero entries of the input matrix. His work also resulted in 17 patents related to data streams and their applications.

The 2014 Presburger Award Committee consisted of
• Antonin Kucera (Brno), chair
• Claire Mathieu (ENS Paris)
• Peter Widmayer (Zurich)

by Luca Aceto (noreply@blogger.com) at March 07, 2014 01:13 PM UTC

### The Scientific Case for P≠NP

from Scott Aaronson

Out there in the wider world—OK, OK, among Luboš Motl, and a few others who comment on this blog—there appears to be a widespread opinion that P≠NP is just “a fashionable dogma of the so-called experts,” something that’s no more likely to be true than false.  The doubters can even point to at least one accomplished complexity theorist, Dick Lipton, who publicly advocates agnosticism about whether P=NP.

Of course, not all the doubters reach their doubts the same way.  For Lipton, the thinking is probably something like: as scientists, we should be rigorously open-minded, and constantly question even the most fundamental hypotheses of our field.  For the outsiders, the thinking is more like: computer scientists are just not very smart—certainly not as smart as real scientists—so the fact that they consider something a “fundamental hypothesis” provides no information of value.

Consider, for example, this comment of Ignacio Mosqueira:

If there is no proof that means that there is no reason a-priori to prefer your arguments over those [of] Lubos. Expertise is not enough.  And the fact that Lubos is difficult to deal with doesn’t change that.

In my response, I wondered how broadly Ignacio would apply the principle “if there’s no proof, then there’s no reason to prefer any argument over any other one.”  For example, would he agree with the guy interviewed on Jon Stewart who earnestly explained that, since there’s no proof that turning on the LHC will destroy the world, but also no proof that it won’t destroy the world, the only rational inference is that there’s a 50% chance it will destroy the world?  (John Oliver’s deadpan response was classic: “I’m … not sure that’s how probability works…”)

In a lengthy reply, Luboš bites this bullet with relish and mustard.  In physics, he agrees, or even in “continuous mathematics that is more physics-wise,” it’s possible to have justified beliefs even without proof.  For example, he admits to a 99.9% probability that the Riemann hypothesis is true.  But, he goes on, “partial evidence in discrete mathematics just cannot exist.”  Discrete math and computer science, you see, are so arbitrary, manmade, and haphazard that every question is independent of every other; no amount of experience can give anyone any idea which way the next question will go.

No, I’m not kidding.  That’s his argument.

I couldn’t help wondering: what about number theory?  Aren’t the positive integers a “discrete” structure?  And isn’t the Riemann Hypothesis fundamentally about the distribution of primes?  Or does the Riemann Hypothesis get counted as an “honorary physics-wise continuous problem” because it can also be stated analytically?  But then what about Goldbach’s Conjecture?  Is Luboš 50/50 on that one too?  Better yet, what about continuous, analytic problems that are closely related to P vs. NP?  For example, Valiant’s Conjecture says you can’t linearly embed the permanent of an n×n matrix as the determinant of an m×m matrix, unless m≥exp(n).  Mulmuley and others have connected this “continuous cousin” of P≠NP to issues in algebraic geometry, representation theory, and even quantum groups and Langlands duality.  So, does that make it kosher?  The more I thought about the proposed distinction, the less sense it made to me.

But enough of this.  In the rest of this post, I want to explain why the odds that you should assign to P≠NP are more like 99% than they are like 50%.  This post supersedes my 2006 post on the same topic, which I hereby retire.  While that post was mostly OK as far as it went, I now feel like I can do a much better job articulating the central point.  (And also, I made the serious mistake in 2006 of striving for literary eloquence and tongue-in-cheek humor.  That works great for readers who already know the issues inside-and-out, and just want to be amused.  Alas, it doesn’t work so well for readers who don’t know the issues, are extremely literal-minded, and just want ammunition to prove their starting assumption that I’m a doofus who doesn’t understand the basics of his own field.)

So, OK, why should you believe P≠NP?  Here’s why:

Because, like any other successful scientific hypothesis, the P≠NP hypothesis has passed severe tests that it had no good reason to pass were it false.

What kind of tests am I talking about?

By now, tens of thousands of problems have been proved to be NP-complete.  They range in character from theorem proving to graph coloring to airline scheduling to bin packing to protein folding to auction pricing to VLSI design to minimizing soap films to winning at Super Mario Bros.  Meanwhile, another cluster of tens of thousands of problems has been proved to lie in P (or BPP).  Those range from primality to matching to linear and semidefinite programming to edit distance to polynomial factoring to hundreds of approximation tasks.  Like the NP-complete problems, many of the P and BPP problems are also related to each other by a rich network of reductions.  (For example, countless other problems are in P “because” linear and semidefinite programming are.)

So, if we were to draw a map of the complexity class NP  according to current knowledge, what would it look like?  There’d be a huge, growing component of NP-complete problems, all connected to each other by an intricate network of reductions.  There’d be a second huge component of P problems, many of them again connected by reductions.  Then, much like with the map of the continental US, there’d be a sparser population in the middle: stuff like factoring, graph isomorphism, and Unique Games that for various reasons has thus far resisted assimilation onto either of the coasts.

Of course, to prove P=NP, it would suffice to find a single link—that is, a single polynomial-time equivalence—between any of the tens of thousands of problems on the P coast, and any of the tens of thousands on the NP-complete one.  In half a century, this hasn’t happened: even as they’ve both ballooned exponentially, the two giant regions have remained defiantly separate from each other.  But that’s not even the main point.  The main point is that, as people explore these two regions, again and again there are “close calls”: places where, if a single parameter had worked out differently, the two regions would have come together in a cataclysmic collision.  Yet every single time, it’s just a fake-out.  Again and again the two regions “touch,” and their border even traces out weird and jagged shapes.  But even in those border zones, not a single problem ever crosses from one region to the other.  It’s as if they’re kept on their respective sides by an invisible electric fence.

As an example, consider the Set Cover problem: i.e., the problem, given a collection of subsets S1,…,Sm⊆{1,…,n}, of finding as few subsets as possible whose union equals the whole set.  Chvatal showed in 1979 that a greedy algorithm can produce, in polynomial time, a collection of sets whose size is at most ln(n) times larger than the optimum size.  This raises an obvious question: can you do better?  What about 0.9ln(n)?  Alas, building on a long sequence of prior works in PCP theory, it was recently shown that, if you could find a covering set at most (1-ε)ln(n) times larger than the optimum one, then you’d be solving an NP-complete problem, and P would equal NP.  Notice that, conversely, if the hardness result worked for ln(n) or anything above, then we’d also get P=NP.  So, why do the algorithm and the hardness result “happen to meet” at exactly ln(n), with neither one venturing the tiniest bit beyond?  Well, we might say, ln(n) is where the invisible electric fence is for this problem.

Want another example?  OK then, consider the “Boolean Max-k-CSP” problem: that is, the problem of setting n bits so as to satisfy the maximum number of constraints, where each constraint can involve an arbitrary Boolean function on any k of the bits.  The best known approximation algorithm, based on semidefinite programming, is guaranteed to satisfy at least a 2k/2^k fraction of the constraints.  Can you guess where this is going?  Recently, Siu On Chan showed that it’s NP-hard to satisfy even slightly more than a 2k/2^k fraction of constraints: if you can, then P=NP.  In this case the invisible electric fence sends off its shocks at 2k/2^k.

I could multiply such examples endlessly—or at least, Dana (my source for such matters) could do so.  But there are also dozens of “weird coincidences” that involve running times rather than approximation ratios; and that strongly suggest, not only that P≠NP, but that problems like 3SAT should require c^n time for some constant c.  For a recent example—not even a particularly important one, but one that’s fresh in my memory—consider this paper by myself, Dana, and Russell Impagliazzo.  A first thing we do in that paper is to give an approximation algorithm for a family of two-prover games called “free games.”  Our algorithm runs in quasipolynomial time:  specifically, n^{O(log n)}.  A second thing we do is show how to reduce the NP-complete 3SAT problem to free games of size ~2^{O(√n)}.

Composing those two results, you get an algorithm for 3SAT whose overall running time is roughly

$$2^{O( \sqrt{n} \log 2^{\sqrt{n}}) } = 2^{O(n)}.$$

Of course, this doesn’t improve on the trivial “try all possible solutions” algorithm.  But notice that, if our approximation algorithm for free games had been slightly faster—say, n^{O(log log n)}—then we could’ve used it to solve 3SAT in $$2^{O(\sqrt{n} \log n)}$$ time.  Conversely, if our reduction from 3SAT had produced free games of size (say) $$2^{O(n^{1/3})}$$ rather than 2^{O(√n)}, then we could’ve used that to solve 3SAT in $$2^{O(n^{2/3})}$$ time.
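Spelling out the bookkeeping behind these counterfactuals (our own arithmetic, writing N for the size of the free game produced by the reduction):

$$N = 2^{O(\sqrt{n})}: \quad N^{O(\log N)} = 2^{O(\log^2 N)} = 2^{O(n)}, \qquad N^{O(\log\log N)} = 2^{O(\sqrt{n}\,\log n)};$$

$$N = 2^{O(n^{1/3})}: \quad N^{O(\log N)} = 2^{O((n^{1/3})^2)} = 2^{O(n^{2/3})}.$$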

I should stress that these two results have completely different proofs: the approximation algorithm for free games “doesn’t know or care” about the existence of the reduction, nor does the reduction know or care about the algorithm.  Yet somehow, their respective parameters “conspire” so that 3SAT still needs c^n time.  And you see the same sort of thing over and over, no matter which problem domain you’re interested in.  These ubiquitous “coincidences” would be immediately explained if 3SAT actually did require c^n time—i.e., if it had a “hard core” for which brute-force search was unavoidable, no matter which way you sliced things up.  If that’s not true—i.e., if 3SAT has a subexponential algorithm—then we’re left with unexplained “spooky action at a distance.”  How do the algorithms and the reductions manage to coordinate with each other, every single time, to avoid spilling the subexponential secret?

Notice that, contrary to Luboš’s loud claims, there’s no “symmetry” between P=NP and P≠NP in these arguments.  Lower bound proofs are much harder to come across than either algorithms or reductions, and there’s not really a mystery about why: it’s hard to prove a negative!  (Especially when you’re up against known mathematical barriers, including relativization, algebrization, and natural proofs.)  In other words, even under the assumption that lower bound proofs exist, we now understand a lot about why the existing mathematical tools can’t deliver them, or can only do so for much easier problems.  Nor can I think of any example of a “spooky numerical coincidence” between two unrelated-seeming results, which would’ve yielded a proof of P≠NP had some parameters worked out differently.  P=NP and P≠NP can look like “symmetric” possibilities only if your symmetry is unbroken by knowledge.

Imagine a pond with small yellow frogs on one end, and large green frogs on the other.  After observing the frogs for decades, herpetologists conjecture that the populations represent two distinct species with different evolutionary histories, and are not interfertile.  Everyone realizes that to disprove this hypothesis, all it would take would be a single example of a green/yellow hybrid.  Since (for some reason) the herpetologists really care about this question, they undertake a huge program of breeding experiments, putting thousands of yellow female frogs next to green male frogs (and vice versa) during mating season, with candlelight, soft music, etc.  Nothing.

As this green vs. yellow frog conundrum grows in fame, other communities start investigating it as well: geneticists, ecologists, amateur nature-lovers, commercial animal breeders, ambitious teenagers on the science-fair circuit, and even some extralusionary physicists hoping to show up their dimwitted friends in biology.  These other communities try out hundreds of exotic breeding strategies that the herpetologists hadn’t considered, and contribute many useful insights.  They also manage to breed a larger, greener, but still yellow frog—something that, while it’s not a “true” hybrid, does have important practical applications for the frog-leg industry.  But in the end, no one has any success getting green and yellow frogs to mate.

Then one day, someone exclaims: “aha!  I just found a huge, previously-unexplored part of the pond where green and yellow frogs live together!  And what’s more, in this part, the small yellow frogs are bigger and greener than normal, and the large green frogs are smaller and yellower!”

This is exciting: the previously-sharp boundary separating green from yellow has been blurred!  Maybe the chasm can be crossed after all!

Alas, further investigation reveals that, even in the new part of the pond, the two frog populations still stay completely separate.  The smaller, yellower frogs there will mate with other small yellow frogs (even from faraway parts of the pond that they’d never ordinarily visit), but never, ever with the larger, greener frogs even from their own part.  And vice versa.  The result?  A discovery that could have falsified the original hypothesis has instead strengthened it—and precisely because it could’ve falsified it but didn’t.

Now imagine the above story repeated a few dozen more times—with more parts of the pond, a neighboring pond, sexually-precocious tadpoles, etc.  Oh, and I forgot to say this before, but imagine that doing a DNA analysis, to prove once and for all that the green and yellow frogs had separate lineages, is extraordinarily difficult.  But the geneticists know why it’s so difficult, and the reasons have more to do with the limits of their sequencing machines and with certain peculiarities of frog DNA, than with anything about these specific frogs.  In fact, the geneticists did get the sequencing machines to work for the easier cases of turtles and snakes—and in those cases, their results usually dovetailed well with earlier guesses based on behavior.  So for example, where reddish turtles and bluish turtles had never been observed interbreeding, the reason really did turn out to be that they came from separate species.  There were some surprises, of course, but nothing even remotely as shocking as seeing the green and yellow frogs suddenly getting it on.

Now, even after all this, someone could saunter over to the pond and say: “ha, what a bunch of morons!  I’ve never even seen a frog or heard one croak, but I know that you haven’t proved anything!  For all you know, the green and yellow frogs will start going at it tomorrow.  And don’t even tell me about ‘the weight of evidence,’ blah blah blah.  Biology is a scummy mud-discipline.  It has no ideas or principles; it’s just a random assortment of unrelated facts.  If the frogs started mating tomorrow, that would just be another brute, arbitrary fact, no more surprising or unsurprising than if they didn’t start mating tomorrow.  You jokers promote the ideology that green and yellow frogs are separate species, not because the evidence warrants it, but just because it’s a convenient way to cover up your own embarrassing failure to get them to mate.  I could probably breed them myself in ten minutes, but I have better things to do.”

At this, a few onlookers might nod appreciatively and say: “y’know, that guy might be an asshole, but let’s give him credit: he’s unafraid to speak truth to competence.”

Even among the herpetologists, a few might beat their breasts and announce: “Who’s to say he isn’t right?  I mean, what do we really know?  How do we know there even is a pond, or that these so-called ‘frogs’ aren’t secretly giraffes?  I, at least, have some small measure of wisdom, in that I know that I know nothing.”

What I want you to notice is how scientifically worthless all of these comments are.  If you wanted to do actual research on the frogs, then regardless of which sympathies you started with, you’d have no choice but to ignore the naysayers, and proceed as if the yellow and green frogs were different species.  Sure, you’d have in the back of your mind that they might be the same; you’d be ready to adjust your views if new evidence came in.  But for now, the theory that there’s just one species, divided into two subgroups that happen never to mate despite living in the same habitat, fails miserably at making contact with any of the facts that have been learned.  It leaves too much unexplained; in fact it explains nothing.

For all that, you might ask, don’t the naysayers occasionally turn out to be right?  Of course they do!  But if they were right more than occasionally, then science wouldn’t be possible.  We would still be in caves, beating our breasts and asking how we can know that frogs aren’t secretly giraffes.

So, that’s what I think about P and NP.  Do I expect this post to convince everyone?  No—but to tell you the truth, I don’t want it to.  I want it to convince most people, but I also want a few to continue speculating that P=NP.

Why, despite everything I’ve said, do I want maybe-P=NP-ism not to die out entirely?  Because alongside the P=NP carpers, I also often hear from a second group of carpers.  This second group says that P and NP are so obviously, self-evidently unequal that the quest to separate them with mathematical rigor is quixotic and absurd.  Theoretical computer scientists should quit wasting their time struggling to understand truths that don’t need to be understood, but only accepted, and do something useful for the world.  (A natural generalization of this view, I guess, is that all basic science should end.)  So, what I really want is for the two opposing groups of naysayers to keep each other in check, so that those who feel impelled to do so can get on with the fascinating quest to understand the ultimate limits of computation.

Update (March 8): At least eight readers have by now emailed me, or left comments, asking why I’m wasting so much time and energy arguing with Luboš Motl.  Isn’t it obvious that, ever since he stopped doing research around 2006 (if not earlier), this guy has completely lost his marbles?  That he’ll never, ever change his mind about anything?

Yes.  In fact, I’ve noticed repeatedly that, even when Luboš is wrong about a straightforward factual matter, he never really admits error: he just switches, without skipping a beat, to some other way to attack his interlocutor.  (To give a small example: watch how he reacts to being told that graph isomorphism is neither known nor believed to be NP-complete.  Caught making a freshman-level error about the field he’s attacking, he simply rants about how graph isomorphism is just as “representative” and “important” as NP-complete problems anyway, since no discrete math question is ever more or less “important” than any other; they’re all equally contrived and arbitrary.  At the Luboš casino, you lose even when you win!  The only thing you can do is stop playing and walk away.)

Anyway, my goal here was never to convince Luboš.  I was writing, not for him, but for my other readers: especially for those genuinely unfamiliar with these interesting issues, or intimidated by Luboš’s air of certainty.  I felt like I owed it to them to set out, clearly and forcefully, certain facts that all complexity theorists have encountered in their research, but that we hardly ever bother to articulate.  If you’ve never studied physics, then yes, it sounds crazy that there would be quadrillions of invisible neutrinos coursing through your body.  And if you’ve never studied computer science, it sounds crazy that there would be an “invisible electric fence,” again and again just barely separating what the state-of-the-art approximation algorithms can handle from what the state-of-the-art PCP tools can prove is NP-complete.  But there it is, and I wanted everyone else at least to see what the experts see, so that their personal judgments about the likelihood of P=NP could be informed by seeing it.

Luboš’s response to my post disappointed me (yes, really!).  I expected it to be nasty and unhinged, and so it was.  What I didn’t expect was that it would be so intellectually lightweight.  Confronted with the total untenability of his foot-stomping distinction between “continuous math” (where you can have justified beliefs without proof) and “discrete math” (where you can’t), and with exactly the sorts of “detailed, confirmed predictions” of the P≠NP hypothesis that he’d declared impossible, Luboš’s response was simply to repeat his original misconceptions, but louder.

And that brings me, I confess, to a second reason for my engagement with Luboš.  Several times, I’ve heard people express sentiments like:

Yes, of course Luboš is a raging jerk and a social retard.  But if you can just get past that, he’s so sharp and intellectually honest!  No matter how many people he needlessly offends, he always tells it like it is.

I want the nerd world to see—in as stark a situation as possible—that the above is not correct.  Luboš is wrong much of the time, and he’s intellectually dishonest.

At one point in his post, Luboš actually compares computer scientists who find P≠NP a plausible working hypothesis to his even greater nemesis: the “climate cataclysmic crackpots.”  (Strangely, he forgot to compare us to feminists, Communists, Muslim terrorists, or loop quantum gravity theorists.)  Even though the P versus NP and global warming issues might not seem closely linked, part of me is thrilled that Luboš has connected them as he has.  If, after seeing this ex-physicist’s “thought process” laid bare on the P versus NP problem—how his arrogance and incuriosity lead him to stake out a laughably-absurd position; how his vanity then causes him to double down after his errors are exposed—if, after seeing this, a single person is led to question Lubošian epistemology more generally, then my efforts will not have been in vain.

Anyway, now that I’ve finally unmasked Luboš—certainly to my own satisfaction, and I hope to that of most scientifically-literate readers—I’m done with this.  The physicist John Baez is rumored to have said: “It’s not easy to ignore Luboš, but it’s ALWAYS worth the effort.”  It took me eight years, but I finally see the multiple layers of profundity hidden in that snark.

And thus I make the following announcement:

For the next three years, I, Scott Aaronson, will not respond to anything Luboš says, nor will I allow him to comment on this blog.

In March 2017, I’ll reassess my Luboš policy.  Whether I relent will depend on a variety of factors—including whether Luboš has gotten the professional help he needs (from a winged pig, perhaps?) and changed his behavior; but also, how much my own quality of life has improved in the meantime.

by Scott at March 07, 2014 09:16 AM UTC

### TR14-031 | On the Query Complexity of Selecting Few Minimal Sets | Joao Marques-Silva, Mikolas Janota

from ECCC papers

Propositional Satisfiability (SAT) solvers are routinely used for solving many function problems. A natural question that has seldom been addressed is: what is the best worst-case number of calls to a SAT solver for solving some target function problem? This paper develops tighter upper bounds on the query complexity of solving several function problems defined on propositional formulas. These include computing the backbone of a formula and computing the set of independent variables of a formula. For the general case, the paper develops tighter upper bounds on the query complexity of computing a minimal set when the number of minimal sets is constant. This applies for example to the computation of a minimal unsatisfiable subset (MUS) for CNF formulas, but also to the computation of prime implicants and implicates.
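
To make the query-counting concrete, here is a small Python sketch (illustrative, not from the paper) of the naive backbone computation, which spends one SAT call per candidate literal on top of a single initial call; the brute-force `sat_solve` and all names are stand-ins for a real solver, and the paper's question is how far below this trivial call count one can go.

```python
from itertools import product

def sat_solve(clauses, num_vars):
    """Brute-force SAT oracle (stand-in for a real SAT solver).

    clauses: list of lists of nonzero ints (DIMACS-style literals).
    Returns a satisfying assignment as a dict {var: bool}, or None.
    """
    for bits in product([False, True], repeat=num_vars):
        assign = {v + 1: bits[v] for v in range(num_vars)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assign
    return None

def backbone(clauses, num_vars):
    """Literals fixed to the same value in every satisfying assignment.

    Naive method: one initial call, then one call per candidate literal,
    i.e. at most n+1 oracle calls in total.
    """
    model = sat_solve(clauses, num_vars)
    if model is None:
        return None  # an unsatisfiable formula has no backbone
    bb = set()
    for v in range(1, num_vars + 1):
        lit = v if model[v] else -v
        # If the formula plus the opposite literal is unsatisfiable,
        # then `lit` holds in every satisfying assignment.
        if sat_solve(clauses + [[-lit]], num_vars) is None:
            bb.add(lit)
    return bb

# (x1) and (x1 or x2): the backbone is {x1}
print(backbone([[1], [1, 2]], num_vars=2))  # {1}
```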

### Corrections to the results derived in "A Unified Approach to Algorithms Generating Unrestricted and Restricted Integer Compositions and Integer Partitions"; and a comparison of four restricted integer composition generation algorithms

Authors: Steffen Eger
Download: PDF
Abstract: In this note, I discuss results on integer compositions/partitions given in the paper "A Unified Approach to Algorithms Generating Unrestricted and Restricted Integer Compositions and Integer Partitions". I also experiment with four different generation algorithms for restricted integer compositions and find the algorithm designed in the named paper to be pretty slow, comparatively.

Some of my comments may be subjective.

### Linear Recognition of Almost (Unit) Interval Graphs

Authors: Yixin Cao
Download: PDF
Abstract: Given a graph class $\cal G$ and a nonnegative integer $k$, we use $\cal G + k v$, $\cal G + k e$, and $\cal G - k e$ to denote the classes of graphs that can be obtained from some graph in $\cal G$ by adding $k$ vertices, adding $k$ edges, and deleting $k$ edges, respectively. They are called \emph{almost (unit) interval graphs} if $\cal G$ is the class of (unit) interval graphs. Almost (unit) interval graphs are well motivated by computational biology, where the data ought to be represented by a (unit) interval graph but at best we can only expect an almost (unit) interval graph. For any fixed $k$, we give linear-time algorithms for recognizing all these classes, and in the case of membership, our algorithms also provide a specific (unit) interval graph as evidence.

When $k$ is part of the input, all the recognition problems are NP-complete. Our results imply that all of them are fixed-parameter tractable parameterized by $k$, thereby resolving the long-standing open problem on the parameterized complexity of recognizing $\mbox{(unit)-interval}+ k e$, first asked by Bodlaender et al. [Comput. Appl. Biosci., 11(1):49--57, 1995]. Moreover, our algorithms for recognizing $\mbox{(unit-)interval}+ k v$ and $\mbox{(unit-)interval}- k e$ have single-exponential dependence on $k$ and linear dependence on the graph size, which significantly improve all previous algorithms for recognizing the same classes: [Heggernes et al., STOC 2007]; [Kaplan et al., FOCS 1994]; [Cao and Marx, SODA 2014]; and [Villanger, IPEC 2010].

### Parameterized Complexity of the $k$-Arc Chinese Postman Problem

Authors: Gregory Gutin, Mark Jones, Bin Sheng
Download: PDF
Abstract: In the Mixed Chinese Postman Problem (MCPP), given an edge-weighted mixed graph $G$ ($G$ may have both edges and arcs), our aim is to find a minimum weight closed walk traversing each edge and arc at least once. The MCPP parameterized by the number of edges was known to be fixed-parameter tractable using a simple argument. Solving an open question of van Bevern et al., we prove that the MCPP parameterized by the number of arcs is also fixed-parameter tractable. Our proof is more involved and, in particular, uses a very useful result of Marx, O'Sullivan and Razgon (2013) on the treewidth of torso graphs with respect to small cuts.

### How Unsplittable-Flow-Covering helps Scheduling with Job-Dependent Cost Functions

Authors: Wiebke Höhn, Julián Mestre, Andreas Wiese
Download: PDF
Abstract: Generalizing many well-known and natural scheduling problems, scheduling with job-specific cost functions has gained a lot of attention recently. In this setting, each job incurs a cost depending on its completion time, given by a private cost function, and one seeks to schedule the jobs to minimize the total sum of these costs. The framework captures many important scheduling objectives such as weighted flow time or weighted tardiness. Still, the general case as well as the mentioned special cases are far from well understood, even on a single machine. Aiming for a better general understanding of this problem, in this paper we focus on the case of uniform job release dates on one machine, for which the state of the art is a 4-approximation algorithm. This is true even for a special case that is equivalent to the covering version of the well-studied and prominent unsplittable flow on a path problem, which is interesting in its own right. For that covering problem, we present a quasi-polynomial time $(1+\epsilon)$-approximation algorithm that yields an $(e+\epsilon)$-approximation for the above scheduling problem. Moreover, for the latter we devise the best possible resource augmentation result regarding speed: a polynomial time algorithm which computes a solution with \emph{optimal} cost at $1+\epsilon$ speedup. Finally, we present an elegant QPTAS for the special case where the cost functions of the jobs fall into at most $\log n$ many classes. This algorithm even allows the jobs to have up to $\log n$ many distinct release dates.

### A Suffix Tree Or Not A Suffix Tree?

Authors: Tatiana Starikovskaya, Hjalte Wedel Vildhøj
Download: PDF
Abstract: In this paper we study the structure of suffix trees. Given an unlabelled tree $\tau$ on $n$ nodes and suffix links of its internal nodes, we ask the question "Is $\tau$ a suffix tree?", i.e., is there a string $S$ whose suffix tree has the same topological structure as $\tau$? We place no restrictions on $S$, in particular we do not require that $S$ ends with a unique symbol. This corresponds to considering the more general definition of implicit or extended suffix trees. Such general suffix trees have many applications and are for example needed to allow efficient updates when suffix trees are built online. We prove that $\tau$ is a suffix tree if and only if it is realized by a string $S$ of length $n-1$, and we give an $O(n^2)$ time algorithm for inferring $S$ when the first letter on each edge is known. This generalizes the work of I et al. [Discrete Appl. Math. 163, 2014].

### A Lower Bound for Fourier Transform Computation in a Linear Well Conditioned Model

Authors: Nir Ailon
Download: PDF
Abstract: Obtaining a non-trivial (super-linear) lower bound for computation of the Fourier transform in the linear circuit model has been a long standing open problem. All lower bounds so far have made strong restrictions on the computational model.

An early result by Morgenstern from 1973 provides an $\Omega(n \log n)$ lower bound for the \emph{unnormalized} FFT when the constants used in the computation are bounded. The proof uses a potential function related to a determinant. The determinant of the unnormalized Fourier transform is $n^{n/2}$, so showing that it can grow by at most a constant factor after each step yields the result. This classic result, however, does not explain why the \emph{normalized} Fourier transform, which has a unit determinant, should take $\Omega(n\log n)$ steps to compute.
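
As a quick numerical sanity check of the determinant fact quoted above (an illustration, not part of the paper), the following NumPy snippet builds the unnormalized DFT matrix for a few small $n$ and compares $|\det|$ with $n^{n/2}$.

```python
import numpy as np

# The unnormalized DFT matrix F_n[j,k] = exp(-2*pi*i*j*k/n) satisfies
# |det(F_n)| = n^{n/2}, since F_n* F_n = n I.
for n in (2, 4, 8, 16):
    jj, kk = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    F = np.exp(-2j * np.pi * jj * kk / n)
    print(n, abs(np.linalg.det(F)), n ** (n / 2))
```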

More recently, Ailon (2013) showed that in a layered linear circuit model restricted to unitary $2\times 2$ gates, one obtains an $\Omega(n\log n)$ lower bound. The well known FFT works in this model. The main conclusion from that work was that a potential function that might eventually help proving the $\Omega(n\log n)$ conjectured lower bound for computation of Fourier transform is not related to matrix determinant, but rather to a notion of matrix entropy.

In this work we take another step and show an $\Omega(n\log n)$ lower bound in a considerably less restricted computational model. In this model, we allow the gates to perform any nonsingular (not necessarily unitary) transformation. Our restriction is that, for any $t$, after gates $1,\dots, t$ are executed, the resulting transformation as well as its inverse have spectral norm at most $R$ for some constant $R$. The case $R=1$ is equivalent to the result of Ailon (2013); the novelty is in extending to arbitrary $R>1$.

### Computational search of small point sets with small rectilinear crossing number

Authors: Ruy Fabila-Monroy, Jorge López
Download: PDF
Abstract: Let $\crs(K_n)$ be the minimum number of crossings over all rectilinear drawings of the complete graph on $n$ vertices in the plane. In this paper we prove that $\crs(K_n) < 0.380473\binom{n}{4}+\Theta(n^3)$, thus improving on the previous best known upper bound. This is done by obtaining new rectilinear drawings of $K_n$ for small values of $n$, and then using known constructions to obtain arbitrarily large good drawings from smaller ones. The "small" sets were found using a simple heuristic detailed in this paper.

### Design, Implementation and Evaluation of MTBDD based Fuzzy Sets and Binary Fuzzy Relations

Authors: Hamid A. Toussi, Bahram Sadeghi Bigham
Download: PDF
Abstract: For fast and efficient analysis of large sets of fuzzy data, elimination of redundancies in the memory representation is needed. We used MTBDDs as the underlying data-structure to represent fuzzy sets and binary fuzzy relations. This leads to elimination of redundancies in the representation, less computations, and faster analyses. We have also extended a BDD package (BuDDy) to support MTBDDs in general and fuzzy sets and relations in particular. Different fuzzy operations such as max, min and max-min composition were implemented based on our representation. Effectiveness of our representation is shown by applying it on fuzzy connectedness and image segmentation problem. Compared to a base implementation, the running time of our MTBDD based implementation was faster (in our test cases) by a factor ranging from 2 to 27. Also, when the MTBDD based data-structure was employed, the memory needed to represent the final results was improved by a factor ranging from 37.9 to 265.5.
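
For readers unfamiliar with the fuzzy operations mentioned here, the sketch below (illustrative, not the paper's code) computes max-min composition directly on dense membership matrices; the paper's point is to carry out such operations on MTBDD representations instead.

```python
import numpy as np

def max_min_compose(R, S):
    """Max-min composition of fuzzy relations given as membership matrices:
    (R o S)[i][k] = max_j min(R[i][j], S[j][k]).
    """
    n, m = R.shape
    m2, p = S.shape
    assert m == m2
    T = np.zeros((n, p))
    for i in range(n):
        for k in range(p):
            T[i, k] = np.max(np.minimum(R[i, :], S[:, k]))
    return T

R = np.array([[0.2, 0.8], [1.0, 0.4]])
S = np.array([[0.5, 0.9], [0.3, 0.7]])
print(max_min_compose(R, S))
```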

### Favorite Theorems: Unique Games

Michel Goemans and David Williamson made a splash in the 90's using semidefinite programming to give a new approximation algorithm for the max-cut problem, achieving a ratio of 2θ/(π(1-cos(θ))) minimized over θ between 0 and π, approximately 0.87856. Hard to believe that this ratio is tight, but it is, assuming the unique games conjecture.
The first paper showed that the Goemans-Williamson bound was tight assuming the unique games conjecture and a "majority is stablest" conjecture; the latter says, very roughly, that the most robust election scheme is a simple majority. The second paper, which followed soon thereafter, proved an invariance principle that implies, among other things, that majority is indeed stablest.
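
The Goemans-Williamson constant is easy to reproduce numerically; the short grid search below (an illustration, not from either paper) minimizes 2θ/(π(1−cos θ)) over (0, π].

```python
import math

# Grid search for the Goemans-Williamson constant:
# minimize f(theta) = 2*theta / (pi * (1 - cos(theta))) over 0 < theta <= pi.
N = 10**5
best = min(
    2 * t / (math.pi * (1 - math.cos(t)))
    for t in (k * math.pi / N for k in range(1, N + 1))
)
print(f"{best:.6f}")  # about 0.878567, matching the constant quoted above
```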

Khot and Oded Regev show that, under the unique games conjecture, essentially the best algorithm for approximating vertex cover is to take all the vertices involved in a maximal matching.
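
For concreteness, here is the textbook maximal-matching 2-approximation that, per Khot-Regev, is essentially optimal under the unique games conjecture; a minimal sketch with illustrative names.

```python
def vertex_cover_via_matching(edges):
    """Classic 2-approximation for vertex cover: greedily build a maximal
    matching and take both endpoints of every matched edge.
    """
    cover, matched = set(), set()
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            cover.update((u, v))
    return cover

print(vertex_cover_via_matching([(0, 1), (1, 2), (2, 3), (3, 0)]))
```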

Sanjeev Arora, Boaz Barak and David Steurer describe an algorithm that, given a unique game in which a 1-δ fraction of the edges can be satisfied, finds in time 2^{n^{poly(δ)}} a coloring that satisfies a constant fraction of the edges. This may or may not give evidence against the UGC.

Luca Trevisan has a nice recent survey on the unique games conjecture, covering much of the above and more, including beautiful connections between unique games and semidefinite programming.

by Lance Fortnow (noreply@blogger.com) at March 06, 2014 12:40 PM UTC

### TR14-030 | An Approach To The Sliding Scale Conjecture Via Parallel Repetition For Low Degree Testing | Dana Moshkovitz

from ECCC papers

The Sliding Scale Conjecture in PCP states that there are PCP verifiers with a constant number of queries and soundness error that is exponentially small in the randomness of the verifier and the length of the prover's answers. The Sliding Scale Conjecture is one of the oldest open problems in PCP, and it implies hardness of approximation up to polynomial factors for problems like Max-CSP (with polynomial-sized alphabet), Directed-Sparsest-Cut and Directed-Multi-Cut. In this work we prove: 1) The Sliding Scale Conjecture follows from a construction of a low degree test whose soundness error is exponential in the randomness of the verifier. 2) A parallel repetition theorem for low degree testing: Given a low degree test with error |F|^{-\Omega(1)}, one can generate a repeated low degree test whose error is |F|^{-\Omega(k)}. 3) Applying the parallel repetition theorem on a suitable low degree test, we get a low degree test with error |F|^{-\Omega(k)} and randomness O(kmlog|F|). In particular, we get the first low degree test with error << 1/|F| and O(mlog|F|) randomness. The missing piece for proving the Sliding Scale Conjecture is a derandomization of the parallel repetition theorem. This seems plausible given the algebraic structure of the low degree testing problem, which was utilized for derandomization in the past. The limitation on derandomizing parallel repetition by Feige and Kilian does not rule out this approach. Additional contributions in this work include an analysis of the sampling properties of the incidence graph of degree-k curves and k'-tuples of points in a finite space F^m, and a combinatorial composition lemma for PCP that abstracts the composition technique of Arora and Safra.

### Natural theorems proven only "to high probability"?

There are plenty of situations where a randomized "proof" is much easier than a deterministic proof, the canonical example being polynomial identity testing.
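
For readers who haven't seen it, here is a minimal sketch of that canonical example, randomized polynomial identity testing via the Schwartz-Zippel lemma, assuming the polynomials are given as black-box evaluators over a large prime field; the function name and parameters are illustrative.

```python
import random

def probably_equal(p, q, num_vars, trials=100, prime=2**61 - 1):
    """Randomized polynomial identity test (Schwartz-Zippel).

    p, q: black-box evaluators of polynomials with integer coefficients.
    If p != q as polynomials of total degree d, a single random evaluation
    over Z_prime agrees only with probability <= d/prime, so `trials`
    independent evaluations leave an astronomically small error.
    """
    for _ in range(trials):
        point = tuple(random.randrange(prime) for _ in range(num_vars))
        if p(*point) % prime != q(*point) % prime:
            return False  # a witness: definitely not the same polynomial
    return True  # equal with high probability

# Example: (x + y)^2 vs x^2 + 2xy + y^2 -- identical, so this prints True.
lhs = lambda x, y: (x + y) ** 2
rhs = lambda x, y: x * x + 2 * x * y + y * y
print(probably_equal(lhs, rhs, num_vars=2))
```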

Question: Are there any natural mathematical "theorems" where a randomized proof is known but a deterministic proof is not?

By a "randomized proof" of a statement $P$ I mean that

1. There is a randomized algorithm that takes an input $n > 0$ and if $P$ is false produces a deterministic proof of $\neg P$ with probability at least $1-2^{-n}$.

2. Someone has run the algorithm for, say, $n = 100$, and not disproved the theorem.

It's easy to generate non-natural statements that fit: just pick a large instance of any problem where only an efficient randomized algorithm is known. However, although there are a lot of mathematical theorems with "lots of numerical evidence", such as the Riemann hypothesis, I don't know of any with rigorous randomized evidence of the above form.

by Geoffrey Irving at March 06, 2014 01:34 AM UTC

### On the complexity of deciding whether the regular number is at most two

Authors: Ali Dehghan, Mohammad-Reza Sadeghi, Arash Ahadi
Download: PDF
Abstract: The regular number of a graph G, denoted by reg(G), is the minimum number of subsets into which the edge set of G can be partitioned so that the subgraph induced by each subset is regular. In this work we answer a question posed as an open problem in A. Ganesan et al. (2012) [3] about the complexity of determining the regular number of graphs. We show that computing the regular number of connected bipartite graphs is NP-hard. Furthermore, we show that determining whether reg(G) = 2 for a given connected 3-colorable graph G is NP-complete. Also, we prove that a new variant of the Monotone Not-All-Equal 3-Sat problem is NP-complete.
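
To fix the definition, here is a brute-force check of reg(G) ≤ 2 (an illustration only, exponential in the number of edges and unrelated to the paper's hardness proofs).

```python
from itertools import product

def regular_number_at_most_two(edges):
    """Try every 2-partition of the edge set and test whether each part
    induces a regular subgraph (all covered vertices have equal degree
    within that part). Brute force over 2^|E| partitions.
    """
    def induces_regular(part):
        deg = {}
        for u, v in part:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
        return len(set(deg.values())) <= 1
    for mask in product((0, 1), repeat=len(edges)):
        part0 = [e for e, b in zip(edges, mask) if b == 0]
        part1 = [e for e, b in zip(edges, mask) if b == 1]
        if induces_regular(part0) and induces_regular(part1):
            return True
    return False

# A path on three vertices is not regular (degrees 1,2,1), but splitting
# it into its two single edges gives two 1-regular parts.
print(regular_number_at_most_two([(0, 1), (1, 2)]))  # True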

### A Polynomial Time Solution to the Clique Problem

Authors: Pawan Tamta, B. P. Pande, H. S. Dhami
Download: PDF
Abstract: The Clique Problem has a reduction to the Maximum Flow Network Interdiction Problem. We review the reduction to evolve a polynomial time algorithm for the Clique Problem. A computer program in C language has been written to validate the easiness of the algorithm.

### Compact Subsequence Matching and Packed Tree Coloring

Authors: Philip Bille, Patrick Hagge Cording, Inge Li Gørtz
Download: PDF
Abstract: We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses $O(n+\frac{n\sigma}{w})$ space and $O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ)$ or $O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ)$ time. Here $w$ is the word size and $occ$ is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for $occ=o(\frac{n}{\log N})$ occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.
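
As a point of comparison (not from the paper), the classical next-occurrence table supports, on plain uncompressed strings, the same "next occurrence of a character after a position" queries that the paper's data structure provides for grammar-compressed strings:

```python
def build_next_occurrence(s, alphabet):
    """next_occ[i][c] = smallest j >= i with s[j] == c, or len(s) if none.

    Classic O(n * sigma) table for uncompressed strings.
    """
    n = len(s)
    nxt = [dict() for _ in range(n + 1)]
    nxt[n] = {c: n for c in alphabet}
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])
        nxt[i][s[i]] = i
    return nxt

def is_subsequence(pattern, text, alphabet):
    """Match `pattern` as a subsequence of `text` via next-occurrence jumps."""
    nxt = build_next_occurrence(text, alphabet)
    pos = 0
    for c in pattern:
        j = nxt[pos].get(c, len(text))
        if j == len(text):
            return False
        pos = j + 1
    return True

print(is_subsequence("ace", "abcde", set("abcde")))  # True
```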

### Counting the Number of Minimum Roman Dominating Functions of a Graph

Authors: Zheng Shi, Khee Meng Koh
Download: PDF
Abstract: We provide two algorithms counting the number of minimum Roman dominating functions of a graph on n vertices in O(1.5673^n) time and polynomial space. We also show that the time complexity can be reduced to O(1.5014^n) if exponential space is used. Our result is obtained by transforming the Roman domination problem into other combinatorial problems on graphs for which exact algorithms already exist.
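
To fix the definitions, here is a brute-force 3^n counter of minimum Roman dominating functions (an illustration only; the paper's O(1.5673^n) and O(1.5014^n) algorithms are of course far faster than this baseline).

```python
from itertools import product

def count_min_roman_dominating(adj):
    """Enumerate all f: V -> {0,1,2}; a Roman dominating function requires
    every vertex with f(v)=0 to have a neighbor u with f(u)=2.
    Returns (minimum weight, number of functions attaining it).
    """
    n = len(adj)
    best, count = None, 0
    for f in product((0, 1, 2), repeat=n):
        ok = all(f[v] != 0 or any(f[u] == 2 for u in adj[v]) for v in range(n))
        if not ok:
            continue
        w = sum(f)
        if best is None or w < best:
            best, count = w, 1
        elif w == best:
            count += 1
    return best, count

# Path on 3 vertices 0-1-2: the unique minimum assigns 2 to the middle vertex.
print(count_min_roman_dominating({0: [1], 1: [0, 2], 2: [1]}))  # (2, 1)
```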

### EATCS Fellows class of 2014 named

from Luca Aceto

The EATCS has recognized ten of its members for their outstanding contributions to theoretical computer science by naming them as the first recipients of an EATCS fellowship.

The EATCS Fellows for 2014 are:
• Susanne Albers (Technische Universität München, Germany) for "her contributions to the design and analysis of algorithms, especially online algorithms, approximation algorithms, algorithmic game theory and algorithm engineering";
• Giorgio Ausiello (Università di Roma "La Sapienza", Italy) for "the impact of his scientific work in the field of algorithms and computational complexity and for his service to the scientific community";
• the late Wilfried Brauer (Technische Universität München, Germany) for "outstanding contributions to the foundation and organization of the European TCS community";
• Herbert Edelsbrunner (Institute of Science and Technology Austria and Duke University, USA) for "his tremendous impact on the field of computational geometry";
• Mike Fellows (Charles Darwin University, Australia) for "his role in founding the field of parameterized complexity theory, which has become a major subfield of research in theoretical computer science, and for being a leader in computer science education";
• Yuri Gurevich (Microsoft Research, USA) for "his development of abstract state machines and for outstanding contributions to algebra, logic, game theory, complexity theory and  software engineering";
• Monika Henzinger (University of Vienna, Austria) for "being one of the pioneers of web algorithms, algorithms that deal with problems of the world wide web";
• Jean-Eric Pin (LIAFA, CNRS and University Paris Diderot, France) for "outstanding contributions to the algebraic theory of automata and languages in connection with logic, topology, and combinatorics and service to the European TCS community";
• Paul Spirakis (University of Liverpool, UK, and University of Patras, Greece) for "seminal papers on Random Graphs and Population Protocols, Algorithmic Game Theory, as well as Robust Parallel Distributed Computing";
• Wolfgang Thomas (RWTH Aachen University, Germany) for "foundational  contributions to the development of automata theory as a framework for modelling, analyzing, verifying and synthesizing information processing systems."
The aforementioned members of the EATCS were selected by the EATCS Fellow Selection Committee, after examining the nominations received from our research community. The EATCS Fellow Selection Committee for 2014 consisted of
• Rocco De Nicola (IMT Lucca, Italy),
• Paul Goldberg (Oxford, UK),
• Anca Muscholl (Bordeaux, France; chair),
• Dorothea Wagner (Karlsruhe, Germany) and
• Roger Wattenhofer (ETH Zurich, CH)
The EATCS Fellows Program was established by the association last year to recognize outstanding EATCS members for their scientific achievements in the field of Theoretical Computer Science. The EATCS is very proud to have the above-mentioned members of the organization as its first fellows. Congratulations to all of them!

by Luca Aceto (noreply@blogger.com) at March 05, 2014 01:29 PM UTC

### An Extension Of Weiler-Atherton Algorithm To Cope With The Self-intersecting Polygon

Authors: Anurag Chakraborty
Download: PDF
Abstract: In this paper a new algorithm is proposed that fixes a shortcoming of the Weiler-Atherton algorithm: the original algorithm does not handle clipping of self-intersecting polygons, which is one of its main disadvantages. In the new algorithm, a self-intersecting polygon is first divided into non-self-intersecting contours, and the Weiler-Atherton clipping algorithm is then performed on those sub-polygons. For holes, we store the edges of the most recently clipped polygon that do not lie on the hole contour's own boundary; if both contours are holes, we store all the edges of the recently clipped polygon. Finally, the resultant polygon is produced by eliminating all the stored edges.

### Standard Simplices and Pluralities are Not the Most Noise Stable

Authors: Steven Heilman, Elchanan Mossel, Joe Neeman
Download: PDF
Abstract: The Standard Simplex Conjecture and the Plurality is Stablest Conjecture are two conjectures stating that certain partitions are optimal with respect to Gaussian and discrete noise stability, respectively. These two conjectures are natural generalizations of the Gaussian noise stability result by Borell (1985) and the Majority is Stablest Theorem (2004). Here we show that the standard simplex is not the most stable partition in Gaussian space and that Plurality is not the most stable low influence partition in discrete space for every number of parts $k \geq 3$, for every value $\rho \neq 0$ of the noise and for all prescribed measures for the different parts as long as they are not all equal to $1/k$. Our results show a fundamental difference between noise stability of partitions into $k \geq 3$ parts and noise stability of partitions into $2$ parts. Moreover, our results further show a difference between noise stability of partitions into $k \geq 3$ parts and Gaussian surface area of partitions into $k \geq 3$ parts. Given our results it is natural to ask for (conjectured) partitions achieving the optimum noise stability.

### QPTAS for Geometric Set-Cover Problems via Optimal Separators

Authors: Nabil H. Mustafa, Rajiv Raman, Saurabh Ray
Download: PDF
Abstract: Geometric set-cover problems arise naturally in several basic settings, and therefore the problem of computing small set covers has been studied extensively. However, after more than two decades of research, $(1+\epsilon)$-approximation algorithms for basic scenarios are still lacking. A prominent example is weighted disks in the plane, for which, after a series of papers, Varadarajan [V09] and Chan et al. [CGKS12] showed an $O(1)$-approximation algorithm for computing minimum-weight set covers. For the fundamental class of objects called pseudodisks (disks, unit-height rectangles, translates of a convex set), even a PTAS for the unweighted case is a long-standing open problem. In this paper, our main result is a QPTAS for both these problems, based on the framework of Adamaszek and Wiese [AW13, AW14], who recently obtained a QPTAS for weighted independent set of polygonal regions. Their new result is the existence of small-complexity separators for geometric objects in the plane. We observe that their construction can be made optimal (and more general), thus implying improved running times for the algorithms of Adamaszek and Wiese. This also shows that their algorithms work for a broader category of geometric objects. We note that these separator statements directly imply a QPTAS for independent set (and, in general, packing problems); however, their relevance to set cover was surprising to us.

### Induced Disjoint Paths in Circular-Arc Graphs in Linear Time

Authors: Petr A. Golovach, Daniël Paulusma, Erik Jan van Leeuwen
Download: PDF
Abstract: The Induced Disjoint Paths problem is to test whether a graph G with k distinct pairs of vertices (s_i,t_i) contains paths P_1,...,P_k such that P_i connects s_i and t_i for i=1,...,k, and P_i and P_j have neither common vertices nor adjacent vertices (except perhaps their ends) for 1<=i < j<=k. We present a linear-time algorithm for Induced Disjoint Paths on circular-arc graphs. For interval graphs, we exhibit a linear-time algorithm for the generalization of Induced Disjoint Paths where the pairs (s_i,t_i) are not necessarily distinct.

### Counting small cliques in MapReduce

Authors: Irene Finocchi, Marco Finocchi, Emanuele G. Fusco
Download: PDF
Abstract: We present exact and approximate MapReduce estimators for the number of cliques of size k in an undirected graph, for any small constant k >= 3. Besides theoretically analyzing our algorithms in the computational model for MapReduce introduced by Karloff, Suri, and Vassilvitskii, we present the results of extensive computational experiments on the Amazon EC2 platform. Our experiments show the practical effectiveness of our algorithms even on clusters of small/medium size, and suggest their scalability to larger clusters.
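
As a single-machine baseline (not from the paper), exact k-clique counting over all k-subsets takes only a few lines; the paper's contribution is doing this, exactly and approximately, in the MapReduce model.

```python
from itertools import combinations

def count_k_cliques(edges, k):
    """Count k-cliques by checking every k-subset of vertices; O(n^k) time."""
    vertices = sorted({v for e in edges for v in e})
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for S in combinations(vertices, k)
        if all(frozenset(p) in edge_set for p in combinations(S, 2))
    )

# The complete graph K4 has 4 triangles and 1 four-clique.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_k_cliques(K4, 3), count_k_cliques(K4, 4))  # 4 1
```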

### A Novel Method for Vectorization

Authors: Tolga Birdal, Emrah Bala
Download: PDF
Abstract: Vectorization of images is a key concern uniting the computer graphics and computer vision communities. In this paper we present a novel idea for efficient, customizable vectorization of raster images, based on Catmull-Rom spline fitting. The algorithm maintains a good balance between photo-realism and photo abstraction, and hence is applicable to applications with artistic concerns as well as applications where minimal information loss is crucial. The resulting algorithm is fast, parallelizable and can satisfy general soft real-time requirements. Moreover, the smoothness of the vectorized images aesthetically outperforms the outputs of many polygon-based methods.

### TR14-029 | On Learning and Testing Dynamic Environments | Oded Goldreich, Dana Ron

from ECCC papers

We initiate a study of learning and testing dynamic environments, focusing on environments that evolve according to a fixed local rule. The (proper) learning task consists of obtaining the initial configuration of the environment, whereas for non-proper learning it suffices to predict its future values. The testing task consists of checking whether the environment has indeed evolved from some initial configuration according to the known evolution rule. We focus on the temporal aspect of these computational problems, which is reflected in the requirement that only a small portion of the environment is inspected in each time slot (i.e., the time period between two consecutive applications of the evolution rule). We present some general observations, an extensive study of two special cases, two separation results, and a host of open problems. The two special cases that we study refer to linear rules of evolution and to rules of evolution that represent simple movement of objects. Specifically, we show that evolution according to any linear rule can be tested within a total number of queries that is sublinear in the size of the environment, and that evolution according to a simple one-dimensional movement can be tested within a total number of queries that is independent of the size of the environment.

### Two big data talks at Haverford College

from The Geomblog

I'm giving two talks at Haverford College just before SDM 2014 in Philadelphia. If you're in the area, stop by and say hello!

by Suresh Venkatasubramanian (noreply@blogger.com) at March 04, 2014 04:34 PM UTC

### Why are there so few intermediary problems in Complexity? In Computability?

There are thousands of natural NPC problems. Assuming P NE NP, how many natural problems are there that are
in NP-P but are NOT NPC? Some candidates are Factoring, Discrete Log, Graph Isom, some in group theory, and any natural sparse set. See
here for some more.

A student asked me WHY there are so few natural intermediary problems. I don't know but here are some
options:

1. Bill you moron, there are MANY such problems. You didn't mention THESE problems (Followed by a list of problems
that few people have heard of but seem to be intermediary.)
2. This is a question of Philosophy and hence not interesting.
3. This is a question of Philosophy and hence very interesting.
4. That's just the way it goes.
5. By Murphy's law there will be many problems that we can't solve quickly.

At least in complexity theory there are SOME candidates for intermediary sets.
In computability theory, where we know Sigma_1 \ne Sigma_0, there are no
candidates for natural problems that are c.e., not decidable, but not complete. There have been some attempts to show that there can't be any
such sets, but it's hard to define "natural" rigorously. (There ARE sets that are c.e., not decidable, not complete, but they are
constructed for the sole purpose of being there. My darling would call them "dumb ass" sets,
a terminology that my class now uses as well.)

A long time ago an AI student was working on classifying various problems in planning. There was one that was c.e. and not decidable
and he was unable to show it was complete. He asked me to help him prove it was not complete. I told him, without looking at it,
that it was COMPLETE!!!!!!!!! My confidence inspired him to prove it was complete.

So, aside from the answers above, is there a MATH reason why there are so few
intermediary problems in Complexity, and NONE in computability theory?
Is there some other kind of reason?

by GASARCH (noreply@blogger.com) at March 04, 2014 04:07 PM UTC

### News for February 2014

So what did we see in February 2014?

Local Algorithms for Sparse Spanning Graphs by Reut Levi, Dana Ron, and Ronitt Rubinfeld (arXiv). Given a graph $G = (V,E)$, the aim is to find a small sparse spanning subgraph $G'$ in the local algorithm setting. So, given any edge $(u,v) \in E$, we must determine in sublinear time whether this edge exists in $G'$. The main result is an $\Omega(|V|)$ lower bound, and a matching algorithm when $G$ is either an expander or hyperfinite.

A Characterization of Locally Testable Affine-Invariant Properties via Decomposition Theorems by Yuichi Yoshida (arXiv). This result gives a complete characterization of (two-sided) locally testable affine-invariant properties. These are based on existing decomposition theorems for functions into structured and pseudorandom parts. This result is analogous to that of Alon, Fischer, Newman, and Shapira (link) on characterizations for testability over dense graphs.

Testing probability distributions underlying aggregated data by Clément Canonne and Ronitt Rubinfeld (arXiv). There has been much recent work on distribution testing, and this work discusses stronger models where better algorithms can be obtained. Given a distribution $D$ over $[n]$, we can get samples from this distribution, and also query the probability $D(i)$ of any $i \in [n]$. With these “dual” queries, one gets distribution testers for a variety of problems surpassing previous lower bounds. Interestingly, this is stronger than the conditional model introduced recently (here and here).

Strong Locally Testable Codes with Relaxed Local Decoders by Oded Goldreich, Tom Gur, and Ilan Komargodski (ECCC). For a locally testable code (LTC), the property of being a codeword is efficiently (constant time) testable. A locally decodable code (LDC) is a code from which individual bits of the original message can be recovered in constant time. A relaxed local decoder is allowed to declare failure when recovering a small fraction of the message bits. This allows for bypassing the difficult (and prominent) open problem of constructing small LDCs. This result gives a construction of near-linear LTCs with a relaxed local decoder, and it leads to a property that is hard to test but easy to verify (in the MA proofs of proximity sense; refer to last month's open problem).

by Seshadhri at March 04, 2014 12:25 AM UTC

### Approximating Persistent Homology in Euclidean Space Through Collapses

Authors: Magnus Bakke Botnan, Gard Spreemann
Download: PDF
Abstract: The \v{C}ech complex is one of the most widely used tools in applied algebraic topology. Unfortunately, due to the inclusive nature of the \v{C}ech filtration, the number of simplices grows exponentially in the number of input points. A practical consequence is that computations may have to terminate at smaller scales than what the application calls for.

In this paper we propose two methods to approximate the \v{C}ech persistence module. Both constructions are built on the level of spaces, i.e. as sequences of simplicial complexes induced by nerves. We also show how the bottleneck distance between such persistence modules can be understood by how tightly they are sandwiched on the level of spaces. In turn, this implies the correctness of our approximation methods.

Finally, we implement our methods and apply them to some example point clouds in Euclidean space.