# Theory of Computing Blog Aggregator

### Parameterized Complexity of Asynchronous Border Minimization

Authors: Robert Ganian, Martin Kronegger, Andreas Pfandler, Alexandru Popa
Abstract: Microarrays are research tools used in gene discovery as well as disease and cancer diagnostics. Two prominent but challenging problems related to microarrays are the Border Minimization Problem (BMP) and the Border Minimization Problem with given placement (P-BMP).

In this paper we investigate the parameterized complexity of natural variants of BMP and P-BMP under several natural parameters. We show that BMP and P-BMP are in FPT under the following two combinations of parameters: 1) the size of the alphabet (c), the maximum length of a sequence (string) in the input (l) and the number of rows of the microarray (r); and, 2) the size of the alphabet and the size of the border length (o). Furthermore, P-BMP is in FPT when parameterized by c and l. We complement our tractability results with corresponding hardness results.

### Optimality of Fast Matching Algorithms for Random Networks with Applications to Structural Controllability

Abstract: Network control refers to a very large and diverse set of problems including controllability of linear time-invariant dynamical systems evolving over time that have inputs and outputs. The network control problem in this setting is to select the appropriate input to steer the network into a desired output state. Examples of the output state include the throughput of a communications network, transcription factor concentration in a gene regulatory network, customer purchases in a marketing context subject to social influences and the amount of flux flowing through a biochemical network.

We focus on control of linear dynamical systems under the notion of structural controllability which is intimately connected to finding maximum matchings. Hence, a natural objective is studying scalable and fast algorithms for this task. We first show the convergence of matching algorithms for different random networks and then analyze a popular, fast and practical heuristic due to Karp and Sipser. We establish the optimality of both the Karp-Sipser Algorithm as well as a simplification of it, and provide results concerning the asymptotic size of maximum matchings for an extensive class of random networks.

### Competitive Distribution Estimation

Authors: Alon Orlitsky, Ananda Theertha Suresh
Abstract: Estimating an unknown distribution from its samples is a fundamental problem in statistics. The common, min-max, formulation of this goal considers the performance of the best estimator over all distributions in a class. It shows that with $n$ samples, distributions over $k$ symbols can be learned to a KL divergence that decreases to zero with the sample size $n$, but grows unboundedly with the alphabet size $k$.

Min-max performance can be viewed as regret relative to an oracle that knows the underlying distribution. We consider two natural and modest limits on the oracle's power. One where it knows the underlying distribution only up to symbol permutations, and the other where it knows the exact distribution but is restricted to use natural estimators that assign the same probability to symbols that appeared equally many times in the sample.

We show that in both cases the competitive regret reduces to $\min(k/n,\tilde{\mathcal{O}}(1/\sqrt n))$, a quantity upper bounded uniformly for every alphabet size. This shows that distributions can be estimated nearly as well as when they are essentially known in advance, and nearly as well as when they are completely known in advance but need to be estimated via a natural estimator. We also provide an estimator that runs in linear time and incurs competitive regret of $\tilde{\mathcal{O}}(\min(k/n,1/\sqrt n))$, and show that for natural estimators this competitive regret is inevitable. We also demonstrate the effectiveness of competitive estimators using simulations.

### On Limits of Dense Packing of Equal Spheres in a Cube

Authors: Milos Tatarevic
Abstract: We examine packing of $n$ congruent spheres in a cube when $n$ is close but less than the number of spheres in a regular cubic close-packed (ccp) arrangement of $\lceil p^{3}/2\rceil$ spheres. For this family of packings, the previous best-known arrangements were usually derived from a ccp by omission of a certain number of spheres without changing the initial structure. In this paper, we show that better arrangements exist for all $n\leq\lceil p^{3}/2\rceil-2$. We introduce an optimization method to reveal improvements of these packings, and present many new improvements for $n\leq4629$.

### D3-Tree: A Dynamic Distributed Deterministic Load - Balancer for decentralized tree structures

Authors: Efrosini Sourla, Spyros Sioutas, Kostas Tsichlas, Christos Zaroliagis
Abstract: In this work, we propose D3-Tree, a dynamic distributed deterministic structure for data management in decentralized networks. We present in brief the theoretical algorithmic analysis, in which our proposed structure is based on, and we describe thoroughly the key aspects of the implementation. Conducting experiments, we verify that the implemented structure outperforms other well-known hierarchical tree-based structures, since it provides better complexities regarding load-balancing operations. More specifically, the structure achieves a logarithmic amortized bound, using an efficient deterministic load-balancing mechanism, which is general enough to be applied to other hierarchical tree-based structures. Moreover, we investigate the structure's fault tolerance, which hasn't been sufficiently tackled in previous work, both theoretically and through rigorous experimentation. We prove that D3-Tree is highly fault tolerant, since, even for massive node failures, it achieves a significant success rate in element queries. Afterwards we go one step further, in order to achieve sub-logarithmic complexity and propose the ART+ structure (Autonomous Range Tree), exploiting the excellent performance of D3-Tree. ART+ is a fully dynamic and fault-tolerant structure, which achieves sub-logarithmic performance for query and update operations and performs load-balancing in sub-logarithmic amortized cost.

### A Robust and Efficient Method for Solving Geometrical Constraint Problems by Homotopy

Authors: Rémi Imbach, Pascal Mathis, Pascal Schreck
Abstract: The goal of Geometric Constraint Solving is to find 2D or 3D placements of some geometric primitives fulfilling some geometric constraints. The common guideline is to solve them by a numerical iterative method (e.g. Newton-Raphson method). A sole solution is obtained whereas many exist. But the number of solutions can be exponential and methods should provide solutions close to a sketch drawn by the user. Assuming that a decomposition-recombination planner is used, we consider irreducible problems. Geometric reasoning can help to simplify the underlying system of equations by changing a few equations and triangularizing it. This triangularization is a geometric construction of solutions, called construction plan. We aim at finding several solutions close to the sketch on a one-dimensional path defined by a global parameter-homotopy using the construction plan. Some numerical instabilities may be encountered due to specific geometric configurations. We address this problem by changing on-the-fly the construction plan. Numerical results show that this hybrid method is efficient and robust.

### Links: HBR article on Women in STEM and AAUW Report

A latest article on bias issues for women in STEM by Harvard Business Review.

A slide presentation by AAUW related to their report: Solving the Equation: The Variables for Women’s Success in Engineering and Computing.  The full report can be downloaded for free here.

by Michael Mitzenmacher (noreply@blogger.com) at March 28, 2015 07:32 PM UTC

### Research Days

 Nisheeth Vishnoi, (whenever I see him, I am reminded that thoughts --- do and have to --- run ultra deep in research), is organizing this year's Research Day at EPFL, Lausanne, June 30, poster on the left.

by metoo (noreply@blogger.com) at March 28, 2015 11:40 AM UTC

### Log-concavity and lower bounds for arithmetic circuits

Authors: Ignacio García-Marco, Pascal Koiran, Sébastien Tavenas
Abstract: One question that we investigate in this paper is, how can we build log-concave polynomials using sparse polynomials as building blocks? More precisely, let $f = \sum\_{i = 0}^d a\_i X^i \in \mathbb{R}^+[X]$ be a polynomial satisfying the log-concavity condition $a\_i^2 \textgreater{} \tau a\_{i-1}a\_{i+1}$ for every $i \in \{1,\ldots,d-1\},$ where $\tau \textgreater{} 0$. Whenever $f$ can be written under the form $f = \sum\_{i = 1}^k \prod\_{j = 1}^m f\_{i,j}$ where the polynomials $f\_{i,j}$ have at most $t$ monomials, it is clear that $d \leq k t^m$. Assuming that the $f\_{i,j}$ have only non-negative coefficients, we improve this degree bound to $d = \mathcal O(k m^{2/3} t^{2m/3} {\rm log^{2/3}}(kt))$ if $\tau \textgreater{} 1$, and to $d \leq kmt$ if $\tau = d^{2d}$.

This investigation has a complexity-theoretic motivation: we show that a suitable strengthening of the above results would imply a separation of the algebraic complexity classes VP and VNP. As they currently stand, these results are strong enough to provide a new example of a family of polynomials in VNP which cannot be computed by monotone arithmetic circuits of polynomial size.

### Sensitivity versus Certificate Complexity of Boolean Functions

Authors: Andris Ambainis, Krišjānis Prūsis, Jevgēnijs Vihrovs
Abstract: Sensitivity, block sensitivity and certificate complexity are basic complexity measures of Boolean functions. The famous sensitivity conjecture claims that sensitivity is polynomially related to block sensitivity. However, it has been notoriously hard to obtain even exponential bounds. Since block sensitivity is known to be polynomially related to certificate complexity, an equivalent of proving this conjecture would be showing that certificate complexity is polynomially related to sensitivity. Previously, it has been shown that $bs(f) \leq C(f) \leq 2^{s(f)-1} s(f) - (s(f)-1)$. In this work, we give a better upper bound of $bs(f) \leq C(f) \leq \max\left(2^{s(f)-1}\left(s(f)-\frac 1 3\right), s(f)\right)$ using a recent theorem limiting the structure of function graphs. We also examine relations between these measures for functions with small 1-sensitivity $s_1(f)$ and arbitrary 0-sensitivity $s_0(f)$.

### Analyzing Adaptive Cache Replacement Strategies

Authors: Leo Shao, Mario E. Consuegra, Raju Rangaswami, Giri Narasimhan
Abstract: Adaptive Replacement Cache (ARC) and Clock with Adaptive Replacement (CAR) are state-of-the-art "adaptive" cache replacement algorithms invented to improve on the shortcomings of classical cache replacement policies such as LRU and Clock. Both ARC and CAR have been shown to outperform their classical and popular counterparts in practice. However, for over a decade, no theoretical proof of the competitiveness of ARC and CAR is known. We prove that for a cache of size N, (a) ARC is 4N-competitive, and (b) CAR is 21N-competitive, thus proving that no "pathological" worst-case request sequence exists that could make them perform much worse than LRU.

### Online Dictionary Matching with One Gap

Authors: Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, B. Riva Shalom
Abstract: The Online Dictionary Matching with Gaps Problem (DMG), is the following. Preprocess a dictionary $D$ of $d$ gapped patterns $P_1,\ldots,P_d$ such that given a query online text $T$ presented one character at a time, each time a character arrives we must report the subset of patterns that have a match ending at this character. A gapped pattern $P_i$ is of the form $P_{i,1}\{\alpha_i,\beta_i\}P_{i,2}$, where $P_{i,1},P_{i,2}\in \Sigma^*$ and $\{\alpha_i,\beta_i\}$ matches any substring with length between $\alpha_i$ and $\beta_i$. Finding efficient solutions for DMG has proven to be a difficult algorithmic challenge. Little progress has been made on this problem and to date, no truly efficient solutions are known. In this paper, we give a new DMG algorithm and provide compelling evidence that no substantially faster algorithm exists, based on the Integer3SUM conjecture.

### Prize Announcements

John Nash will receive the 2015 Abel Prize (the most prestigious prize in mathematics besides the Fields Medal). The Norwegian Academy of Sciences and Letters is awarding the prize not for Nash’s work on game theory, but for his (and Louis Nirenberg’s) “striking and seminal contributions to the theory of nonlinear partial differential equations and its applications to geometric analysis”. Nash is the first person to win the Nobel Prize and the Abel Prize.

Coincidentally, it has been announced that former Turing’s Invisible Hand blogger Ariel Procaccia will receive the IJCAI-2015 Computers and Thought Award. Ariel is “recognized for his contributions to the fields of computational social choice and computational economics, and for efforts to make advanced fair division techniques more widely accessible”. Congratulations!

by felixbrandt at March 26, 2015 01:42 PM UTC

### TCS+ Talk: Wednesday, April 1 — Shachar Lovett (UCSD)

On Wednesday, April 1st at 1 PM Eastern Time (10 AM Pacific Time, 7 PM Central European Time, 5 PM UTC), Shachar Lovett will talk about “Structure and Pseudo-Randomness in Coding Theory” (abstract below).

Please sign up on the online form if you wish to join the talk as a group.

The theory of structure and pseudo-randomness has been very influential in several areas of mathematics, such as number theory, graph theory and harmonic analysis. It is also been influential in theoretical computer science, with applications in complexity theory, cryptography and property testing. At a high level, it allows to analyze arbitrary objects by decomposing them to a “structural” component and a “pseudo-random” component. The pseudo-random component behaves in many senses like random noise, while the structural component has a concise representation which makes it amenable to analysis and algorithmic manipulation.
In this talk, I will describe applications of this paradigm to coding theory. I will describe a new general approach to list decoding, which follows by decomposing an arbitrary received word to a structural received word and pseudo-random noise. This allows for a simplified analysis of the list decoding problem. In particular, I will describe how this approach leads to a resolution of a conjecture by Gopalan, Klivans and Zuckerman [STOC 2008], that the list decoding radius of Reed-Muller codes (in certain regimes) is equal to the minimal distance of the code.
Based on joint works with Abhishek Bhowmick.

PS: For those viewers that watch us from Europe, please notice the daylight saving times change will occur the week-end prior to the talk.

by plustcs at March 26, 2015 11:41 AM UTC

### Tutorial talks, and a reminder that total search problems are not NP-complete

from Paul Goldberg

Here is a link to recordings of invited and tutorial talks at the recent STACS 2015 conference. (Slides are available here.) These recordings include my tutorial talk on algorithmic game theory, all 3 hours of it, which may possibly be of use to writers of articles with titles like The Top Ten Mistakes you make When Giving a Technical Talk.

Possibly the commonest mistake (other than dressing badly) is to assume too much knowledge on the part of the audience. This mistake wastes the opportunity for a “big win” in which you present some fact or result that most people in the room don’t already know, but which should by rights be ingrained in their DNA. I got to do that via presenting Megiddo’s 1988 result that NP total search problems cannot be NP-complete unless NP=co-NP. To be clear, an NP total search problem is one where every instance has an easy-to-check solution, for example FACTORING, where the output is a prime factorization of an input number, or NASH, where the output is a Nash equilibrium of a given bimatrix game. This is the reason why we don’t bother to look for an NP-hardness result for NASH, and have to use an obscure-looking complexity class like PPAD.

It's possible to write a convincing proof sketch as follows. Consider what would it would mean if an NP total search problem (like FACTORING) was NP-complete. There should then be a poly-time reduction from (say) SAT to FACTORING. That is, an algorithm A that can solve SAT efficiently, provided that it has access to an efficient solver for FACTORING. Now consider what happens when pass a non-satisfiable formula Φ to A. What you end up concluding is that any run of A must constitute a (short) certificate that Φ is not satisfiable, implying NP=co-NP. A will process Φ, from time to time constructing instances of FACTORING that it can solve “for free”. Since every such instance of FACTORING must have a solution, A cannot fail for reasons of presenting the FACTORING oracle with an unsolvable instance, so must run its course. If it ends without finding a satisfying assignment, there isn’t one.

by Paul Goldberg (noreply@blogger.com) at March 26, 2015 10:12 AM UTC

### Computing the partition function of a polynomial on the Boolean cube

Authors: Alexander Barvinok
Abstract: For a polynomial f: {-1, 1}^n --> C, we define the partition function as the average of exp{ lambda f(x) } over all points x in {-1, 1}^n, where lambda in C is a parameter. We present an algorithm, which, given such f, lambda and epsilon >0 approximates the partition function within a relative error of epsilon in N^{ O (ln n -ln epsilon) } time provided |lambda| < ( 2L sqrt{d} )^{-1}, where d is the degree, L is (roughly) the Lipschitz constant of f and N is the number of monomials in f. As a corollary, we obtain an algorithm of N^{ O (ln n) } complexity which produces an approximate solution y in {-1, 1}^n in the problem of finding the maximum of a real polynomial f on the Boolean cube and show that the probability that f(x) > f(y) for a random x in {-1,1}^n is exponentially small in n for a reasonably wide class of polynomials f.

### Building Efficient and Compact Data Structures for Simplicial Complexes

Authors: Jean-Daniel Boissonnat, Karthik C. S., Sebastien Tavenas
Abstract: The Simplex Tree (ST) is a recently introduced data structure that can represent abstract simplicial complexes of any dimension and allows efficient implementation of a large range of basic operations on simplicial complexes. In this paper, we show how to optimally compress the Simplex Tree while retaining its functionalities. In addition, we propose two new data structures called the Maximal Simplex Tree (MxST) and the Simplex Array List (SAL). We analyze the compressed Simplex Tree, the Maximal Simplex Tree, and the Simplex Array List under various settings.

### The Complexity of Phylogeny Constraint Satisfaction

Authors: Manuel Bodirsky, Peter Jonsson, Trung Van Pham
Abstract: We systematically study the computational complexity of a broad class of computational problems in phylogenetic reconstruction. The class contains for example the rooted triple consistency problem, forbidden subtree problems, the quartet consistency problem, and many other problems studied in the bioinformatics literature. The studied class of problems can be described as the constraint satisfaction problems for constraint languages having a quantifier-free definition over the rooted triple relation (known in model theory as the homogeneous binary branching C-relation); in the following we call such problems phylogeny problems. We show that every phylogeny problem can be solved in polynomial time or is NP-complete. Our classification establishes a dichotomy for a large class of infinite structures that we believe is of independent interest in universal algebra, model theory, and topology. On the algorithmic side, we generalize a well-known polynomial-time algorithm of Aho, Sagiv, Szymanski, and Ullman for the rooted triple consistency problem. Our algorithm repeatedly solves linear equation systems to construct a solution in polynomial time. We then show that every phylogeny problem that cannot be solved by our algorithm is NP-complete. The proof of our main result combines results and techniques from various research areas: a recent classification of the model-complete cores of the reducts of the homogeneous binary branching C-relation, Leeb's Ramsey theorem for rooted trees, and universal algebra.

### Dual Polynomials for Collision and Element Distinctness

Authors: Mark Bun, Justin Thaler
Abstract: The approximate degree of a Boolean function $f: \{-1, 1\}^n \to \{-1, 1\}$ is the minimum degree of a real polynomial that approximates $f$ to within error $1/3$ in the $\ell_\infty$ norm. In an influential result, Aaronson and Shi (J. ACM 2004) proved tight $\tilde{\Omega}(n^{1/3})$ and $\tilde{\Omega}(n^{2/3})$ lower bounds on the approximate degree of the Collision and Element Distinctness functions, respectively. Their proof was non-constructive, using a sophisticated symmetrization argument and tools from approximation theory.

More recently, several open problems in the study of approximate degree have been resolved via the construction of dual polynomials. These are explicit dual solutions to an appropriate linear program that captures the approximate degree of any function. We reprove Aaronson and Shi's results by constructing explicit dual polynomials for the Collision and Element Distinctness functions.

### Shortest-Path Queries in Planar Graphs on GPU-Accelerated Architectures

Authors: Guillaume Chapuis, Hristo Djidjev
Abstract: We develop an efficient parallel algorithm for answering shortest-path queries in planar graphs and implement it on a multi-node CPU/GPU clusters. The algorithm uses a divide-and-conquer approach for decomposing the input graph into small and roughly equal subgraphs and constructs a distributed data structure containing shortest distances within each of those subgraphs and between their boundary vertices. For a planar graph with $n$ vertices, that data structure needs $O(n)$ storage per processor and allows queries to be answered in $O(n^{1/4})$ time.

### An Integer Linear Programming Solution to the Telescope Network Scheduling Problem

Authors: Sotiria Lampoudi, Eric Saunders, Jason Eastman
Abstract: Telescope networks are gaining traction due to their promise of higher resource utilization than single telescopes and as enablers of novel astronomical observation modes. However, as telescope network sizes increase, the possibility of scheduling them completely or even semi-manually disappears. In an earlier paper, a step towards software telescope scheduling was made with the specification of the Reservation formalism, through the use of which astronomers can express their complex observation needs and preferences. In this paper we build on that work. We present a solution to the discretized version of the problem of scheduling a telescope network. We derive a solvable integer linear programming (ILP) model based on the Reservation formalism. We show computational results verifying its correctness, and confirm that our Gurobi-based implementation can address problems of realistic size. Finally, we extend the ILP model to also handle the novel observation requests that can be specified using the more advanced Compound Reservation formalism.

### News Aplenty

Both the Turing Award and the Abel Prize were announced this morning.

MIT databases researcher Michael Stonebraker wins the ACM Turing Award. He developed INGRES one of the first relational databases. Stonebraker is the first Turing award winner since the prize went up to a cool million dollars.

John Nash and Louis Nirenberg share this years Abel Prize “for striking and seminal contributions to the theory of nonlinear partial differential equations and its applications to geometric analysis.” This work on PDEs is completely independent from the equilibrium results that won Nash the 1994 Nobel Prize.

Earlier this week the CRA released their latest Best Practice Memo: Incentivizing Quality and Impact: Evaluating Scholarship in Hiring, Tenure, and Promotion. In short: Emphasize quality over quantity in research.

The NSF announced their public access plan to ensure that research is "available for download, reading and analysis free of charge no later than 12 months after initial publication".

by Lance Fortnow (noreply@blogger.com) at March 25, 2015 06:10 PM UTC

### 2015 Swedish Summer School in Computer Science – apply soon

Last summer I gave a mini-course on the Sum of Squares algorithm in the Swedish Summer School of Computer Science.

It was a great experience – the venue was  Djurönäset  – a hotel in the beautiful Stockholm archipelgo with stunning views and great food. It was organized very smoothly by Jakob Nordström, Per Austrin, and Johan Håstad, andthe students were bright and enthusiastic. I could see them collaborating on homework problems well into the night, and I’m sure that beyond learning on theoretical computer science, they made great new connections.

In fact, I believe the only downside to that summer school was my own course – this was the first time I taught  this material and so it felt very much like a beta version. (On the other hand, Ryan O’Donnell gave a fantastic lecture series on analysis of Boolean function, so the students didn’t suffer too much.) Luckily, this year the summer school features two absolute top experts on coding theory, Venkat Guruswami and Sergey Yekhanin, who will give courses on developments that are not yet as widely known in the theoretical computer science community as they should be, but have had wide reaching impact. Polar codes are considered one of the great recent breakthroughs in coding theory as they provide in some sense the first explicit codes that achieve Shannon’s non-constructive bounds for the capacity of the binary symmetric channel. Codes with local decoding are a prime example of the surprising interplay between theory and practice. Originally arising from probabilistically checkable proofs, they have recently found very significant industrial applications,  saving companies 100’s of millions of dollars.

I hear that there are still a few slots left. If you are a Ph.D student, Masters student, or postdoc who want to know more about coding theory, I can hardly think of a more pleasant and beneficial way to do so. If you are interested, please apply as soon as possible at http://s3cs.csc.kth.se/application.php . For any questions, the organizers can be contacted at  s3cs-2015@csc.kth.se.

by Boaz Barak at March 25, 2015 04:39 PM UTC

### TR15-041 | Dual Polynomials for Collision and Element Distinctness | Mark Bun, Justin Thaler

from ECCC papers

The approximate degree of a Boolean function $f: \{-1, 1\}^n \to \{-1, 1\}$ is the minimum degree of a real polynomial that approximates $f$ to within error $1/3$ in the $\ell_\infty$ norm. In an influential result, Aaronson and Shi (J. ACM 2004) proved tight $\tilde{\Omega}(n^{1/3})$ and $\tilde{\Omega}(n^{2/3})$ lower bounds on the approximate degree of the Collision and Element Distinctness functions, respectively. Their proof was non-constructive, using a sophisticated symmetrization argument and tools from approximation theory. More recently, several open problems in the study of approximate degree have been resolved via the construction of dual polynomials. These are explicit dual solutions to an appropriate linear program that captures the approximate degree of any function. We reprove Aaronson and Shi's results by constructing explicit dual polynomials for the Collision and Element Distinctness functions.

### Is It New?

from Richard Lipton

How to tell algorithms apart

Edgar Daylight was trained both as a computer scientist and as a historian. He writes a historical blog themed for his near-namesake Edsger Dijkstra, titled, “Dijkstra’s Rallying Cry for Generalization.” He is a co-author with Don Knuth of the 2014 book: Algorithmic Barriers Failing: P=NP?, which consists of a series of interviews of Knuth, extending their first book in 2013.

The book is essentially a conversation between Knuth and Daylight that ranges over Knuth’s many contributions and his many insights.

One of the most revealing discussions, in my opinion, is Knuth’s discussion of his view of asymptotic analysis. Let’s turn and look at that next.

## Asymptotic Analysis

We all know what asymptotic analysis is: Given an algorithm, determine how many operations the algorithm uses in worst case. For example, the naïve matrix product of square ${n}$ by ${n}$ matrices runs in time ${O(n^{3})}$. Knuth dislikes the use of ${O}$ notation, which he thinks is used often to hide important information.

For example, the correct the count of operations for matrix product is actually

$\displaystyle \frac{1}{3} n^{3} + O(n^{2}).$

In general Knuth suggests that we determine, if possible, the number of operations as

$\displaystyle f(n) + O(g(n)),$

where ${f(n)}$ and ${g(n)}$ are both explicit functions and ${g}$ is lower-order. The idea is that not only does this indicate more precisely that the number of operations is ${\Theta(f(n))}$, not just ${O(f(n))}$, but also is forces us to give the exact constant hiding under the ${\Theta}$. If the constant is only approached as ${n}$ increases, perhaps the difference can be hidden inside the lower-order ${g(n)}$ term.

An example from the book (page 29) is a discussion of Tony Hoare’s quicksort algorithm. Its running time is ${O(n\log n)}$, on average. This allows one, as Knuth says, to throw all the details away, including the exact machine model. He goes on to say that he prefers to know:

${\dots}$ that quicksort makes ${2n\ln n - \frac{8}{3}n + O(\log n)}$ comparisons, on average, and ${\frac{1}{3}n \ln n - \frac{4}{9}n + O(\log n)}$ exchanges, and ${\frac{2}{3}n + O(1)}$ stack adjustments, when sorting random numbers.

## The Novelty Game

Theorists create algorithms as one of their favorite activities. A classic way to get a paper accepted into a top conference is to say: In this paper we improve the running time of the best known algorithm for X from order ${n^{3}\log n}$ to ${O(n^{3})}$ by applying methods Y.

But is the algorithm of this paper really new? One possibility is that the analysis of the previous paper was too coarse and the algorithms are actually the same. Or at least equivalent. The above information is logically insufficient to rule out this possibility.

Asymptotic analysis à-la Knuth comes to the rescue. Suppose that we proved that the older algorithm X ran in time

$\displaystyle \frac{1}{4}n^{3}\log n + O(n).$

Then we would be able to conclude—without any doubt—that the new algorithm was indeed new. Knuth points this out in the interviews, and adds a comment about practice. Of course losing the logarithmic factor may not yield a better running time in practice, if the hidden constant in ${O(n^{3})}$ is huge. But whatever the constant is, the new algorithm must be new. It must contain some new idea.

This is quite a nice use of analysis of algorithms in my opinion. Knowing that an algorithm contains, for certain, some new idea, may lead to further insights. It may eventually even lead to an algorithm that is better both in theory and in practice.

## Open Problems

Daylight’s book is a delight—a pun? As always Knuth has lots to say, and lots of interesting insights. The one caveat about the book is the subtitle: “P=NP?” I wish Knuth had added more comments about this great problem. He does comment on the early history of the problem: for example, explaining how Dick Karp came down to Stanford to talk about his brilliant new paper, and other comments have been preserved in a “Twenty Questions” session from last May. Knuth also reminds us in the book that as reported in the January 1973 issue of SIGACT News, Manny Blum gave odds of 100:1 in a bet with Mike Paterson that P and NP are not equal.

[fixed picture glitch at top]

by rjlipton at March 25, 2015 12:47 AM UTC

### On the Complexity of Nondeterministically Testable Hypergraph Parameters

Authors: Marek Karpinski, Roland Markó
Abstract: The paper proves the equivalence of the notions of nondeterministic and deterministic parameter testing for uniform dense hypergraphs of arbitrary order. It generalizes the result previously known only for the case of simple graphs. By a similar method we establish also the equivalence between nondeterministic and deterministic hypergraph property testing, answering the open problem in the area. We introduce a new notion of a cut norm for hypergraphs of higher order, and employ regularity techniques combined with the ultralimit method.

### Recognition and Complexity of Point Visibility Graphs

Authors: Jean Cardinal, Udo Hoffmann
Abstract: A point visibility graph is a graph induced by a set of points in the plane, where every vertex corresponds to a point, and two vertices are adjacent whenever the two corresponding points are visible from each other, that is, the open segment between them does not contain any other point of the set. We study the recognition problem for point visibility graphs: given a simple undirected graph, decide whether it is the visibility graph of some point set in the plane. We show that the problem is complete for the existential theory of the reals. Hence the problem is as hard as deciding the existence of a real solution to a system of polynomial inequalities. The proof involves simple substructures forcing collinearities in all realizations of some visibility graphs, which are applied to the algebraic universality constructions of Mn\"ev and Richter-Gebert. This solves a longstanding open question and paves the way for the analysis of other classes of visibility graphs. Furthermore, as a corollary of one of our construction, we show that there exist point visibility graphs that do not admit any geometric realization with points having integer coordinates.

### Straight Line Triangle Representations

Authors: Nieke Aerts, Stefan Felsner
Abstract: A straight line triangle representation (SLTR) of a planar graph is a straight line drawing such that all the faces including the outer face have triangular shape. Such a drawing can be viewed as a tiling of a triangle using triangles with the input graph as skeletal structure. In this paper we present a characterization of graphs that have an SLTR. The characterization is based on flat angle assignments, i.e., selections of angles of the graph that have size~$\pi$ in the representation. We also provide a second characterization in terms of contact systems of pseudosegments. With the aid of discrete harmonic functions we show that contact systems of pseudosegments that respect certain conditions are stretchable. The stretching procedure is then used to get straight line triangle representations. Since the discrete harmonic function approach is quite flexible it allows further applications, we mention some of them. The drawback of the characterization of SLTRs is that we are not able to effectively check whether a given graph admits a flat angle assignment that fulfills the conditions. Hence it is still open to decide whether the recognition of graphs that admit straight line triangle representation is polynomially tractable.

### Graph isomorphism completeness for trapezoid graphs

Authors: Asahi Takaoka
Abstract: The complexity of the graph isomorphism problem for trapezoid graphs has been open over a decade. This paper shows that the problem is GI-complete. More precisely, we show that the graph isomorphism problem is GI-complete for comparability graphs of partially ordered sets with interval dimension 2 and height 3. In contrast, the problem is known to be solvable in polynomial time for comparability graphs of partially ordered sets with interval dimension at most 2 and height at most 2.

### Binary and Multi-Bit Coding for Stable Random Projections

Authors: Ping Li
Abstract: Motivated by the recent work on "one scan 1-bit compressed sensing", which was based on $\alpha$-stable random projections with small $\alpha$, we develop efficient coding schemes for storing the measurements and recovering the $l_\alpha$ norms of signals. Coding at the bit level is crucial especially when the measurements are heavy-tailed (due to stable random projections). Interestingly, using merely 1-bit information does not result in significant loss of accuracy if the parameter is chosen appropriately. For example, when $\alpha=0+$, 1, and 2, the coefficients of the optimal estimation variances using full (i.e., infinite-bit) information are 1, 2, and 2, respectively. With the 1-bit scheme and appropriately chosen parameters, the corresponding variance coefficients are 1.544, $\pi^2/4$, and 3.066, respectively. Using 2 or more bits reduces the estimation variances and, importantly, stabilizes the performance so that the variances are not sensitive to parameters. The computational cost of the estimation procedure is minimal if we implement the method using look-up tables and a small number (e.g., 1, 2, or 3) of bits. Note that the method presented in this paper is applicable to the general scale distribution family, not just $\alpha$-stable distributions.

### TR15-040 | Proper PAC learning is compressing | Shay Moran, Amir Yehudayoff

from ECCC papers

We prove that proper PAC learnability implies compression. Namely, if a concept $C \subseteq \Sigma^X$ is properly PAC learnable with $d$ samples, then $C$ has a sample compression scheme of size $2^{O(d)}$. In particular, every boolean concept class with constant VC dimension has a sample compression scheme of constant size. This answers a question of Littlestone and Warmuth (1986). The proof uses an approximate minimax phenomenon for boolean matrices of low VC dimension.

### NSF mandates (sort of) open access

See here.

NSF-sponsored papers should be freely available no more than 12 months after publication in a journal.

This is not perfect, but a step in the right direction. Computer scientists should insist that conference proceedings are treated the same way, and made freely available no more than 12 months after publication.

Hat tip: Lance Fortnow.

by Boaz Barak at March 24, 2015 04:34 AM UTC

### Exact sampling of graphs with prescribed degree correlations

Authors: Kevin E. Bassler, Charo I. Del Genio, Péter L. Erdős, István Miklós, Zoltán Toroczkai
Abstract: Many real-world networks exhibit correlations between the node degrees. For instance, in social networks nodes tend to connect to nodes of similar degree. Conversely, in biological and technological networks, high-degree nodes tend to be linked with low-degree nodes. Degree correlations also affect the dynamics of processes supported by a network structure, such as the spread of opinions or epidemics. The proper modelling of these systems, i.e., without uncontrolled biases, requires the sampling of networks with a specified set of constraints. We present a solution to the sampling problem when the constraints imposed are the degree correlations. In particular, we develop an efficient and exact method to construct and sample graphs with a specified joint-degree matrix, which is a matrix providing the number of edges between all the sets of nodes of a given degree, for all degrees, thus completely specifying all pairwise degree correlations, and additionally, the degree sequence itself. Our algorithm always produces independent samples without backtracking. The complexity of the graph construction algorithm is O(NM) where N is the number of nodes and M is the number of edges.

### A Fast Heuristic Algorithm for Redundancy Removal

Authors: Maxim Teslenko, Elena Dubrova
Abstract: Redundancy identification is an important step of the design flow that typically follows logic synthesis and optimization. In addition to reducing circuit area, power consumption, and delay, redundancy removal also improves testability. All commercially available synthesis tools include a redundancy removal engine which is often run multiple times on the same netlist during optimization. This paper presents a fast heuristic algorithm for redundancy removal in combinational circuits. Our idea is to provide a quick partial solution which can be used for the intermediate redundancy removal runs instead of exact ATPG or SAT-based approaches. The presented approach has a higher implication power than the traditional heuristic algorithms, such as FIRE, e.g. on average it removes 37% more redundancies than FIRE with no penalty in runtime.

### Efficient Generation of Stable Planar Cages for Chemistry

Authors: Dominique Barth, Olivier David, Franck Quessette, Vincent Reinhard, Yann Strozecki, Sandrine Vial
Abstract: In this paper we describe an algorithm which generates all colored planar maps with a good minimum sparsity from simple motifs and rules to connect them. An implementation of this algorithm is available and is used by chemists who want to quickly generate all sound molecules they can obtain by mixing some basic components.

### A Machine Learning Approach to Predicting the Smoothed Complexity of Sorting Algorithms

Authors: Bichen Shi, Michel Schellekens, Georgiana Ifrim
Abstract: Smoothed analysis is a framework for analyzing the complexity of an algorithm, acting as a bridge between average and worst-case behaviour. For example, Quicksort and the Simplex algorithm are widely used in practical applications, despite their heavy worst-case complexity. Smoothed complexity aims to better characterize such algorithms. Existing theoretical bounds for the smoothed complexity of sorting algorithms are still quite weak. Furthermore, empirically computing the smoothed complexity via its original definition is computationally infeasible, even for modest input sizes. In this paper, we focus on accurately predicting the smoothed complexity of sorting algorithms, using machine learning techniques. We propose two regression models that take into account various properties of sorting algorithms and some of the known theoretical results in smoothed analysis to improve prediction quality. We show experimental results for predicting the smoothed complexity of Quicksort, Mergesort, and optimized Bubblesort for large input sizes, therefore filling the gap between known theoretical and empirical results.

### On some provably correct cases of variational inference for topic models

Authors: Pranjal Awasthi, Andrej Risteski
Abstract: Variational inference is a very efficient and popular heuristic used in various forms in the context of latent variable models. It's closely related to Expectation Maximization (EM), and is applied when exact EM is computationally infeasible. Despite being immensely popular, current theoretical understanding of the effectiveness of variaitonal inference based algorithms is very limited. In this work we provide the first analysis of instances where variational inference algorithms converge to the global optimum, in the setting of topic models.

More specifically, we show that variational inference provably learns the optimal parameters of a topic model under natural assumptions on the topic-word matrix and the topic priors. The properties that the topic word matrix must satisfy in our setting are related to the topic expansion assumption introduced in (Anandkumar et al., 2013), as well as the anchor words assumption in (Arora et al., 2012c). The assumptions on the topic priors are related to the well known Dirichlet prior, introduced to the area of topic modeling by (Blei et al., 2003).

It is well known that initialization plays a crucial role in how well variational based algorithms perform in practice. The initializations that we use are fairly natural. One of them is similar to what is currently used in LDA-c, the most popular implementation of variational inference for topic models. The other one is an overlapping clustering algorithm, inspired by a work by (Arora et al., 2014) on dictionary learning, which is very simple and efficient.

While our primary goal is to provide insights into when variational inference might work in practice, the multiplicative, rather than the additive nature of the variational inference updates forces us to use fairly non-standard proof arguments, which we believe will be of general interest.

### Sum-of-squares lower bounds for planted clique

Authors: Raghu Meka, Aaron Potechin, Avi Wigderson
Abstract: Finding cliques in random graphs and the closely related "planted" clique variant, where a clique of size k is planted in a random G(n, 1/2) graph, have been the focus of substantial study in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve the problem for k ~ sqrt(n).

In this paper we study the complexity of the planted clique problem under algorithms from the Sum-of-squares hierarchy. We prove the first average case lower bound for this model: for almost all graphs in G(n,1/2), r rounds of the SOS hierarchy cannot find a planted k-clique unless k > n^{1/2r} (up to logarithmic factors). Thus, for any constant number of rounds planted cliques of size n^{o(1)} cannot be found by this powerful class of algorithms. This is shown via an integrability gap for the natural formulation of maximum clique problem on random graphs for SOS and Lasserre hierarchies, which in turn follow from degree lower bounds for the Positivestellensatz proof system.

We follow the usual recipe for such proofs. First, we introduce a natural "dual certificate" (also known as a "vector-solution" or "pseudo-expectation") for the given system of polynomial equations representing the problem for every fixed input graph. Then we show that the matrix associated with this dual certificate is PSD (positive semi-definite) with high probability over the choice of the input graph.This requires the use of certain tools. One is the theory of association schemes, and in particular the eigenspaces and eigenvalues of the Johnson scheme. Another is a combinatorial method we develop to compute (via traces) norm bounds for certain random matrices whose entries are highly dependent; we hope this method will be useful elsewhere.

### Maximization Problems Parameterized Using Their Minimization Versions: The Case of Vertex Cover

Authors: Meirav Zehavi