Theory of Computing Blog Aggregator

In its library renovation, Georgia Tech will move the vast majority of its books into a combined Emory/Tech Library Service Center off campus. Very few people use the stacks anymore, particularly at this institute dominated by engineering and computer science.

As a young assistant professor at the University of Chicago in the 90's, I averaged about one trip a day to the joint Math/CS library to track down research papers or old books to find various mathematical definitions and theorems. These days I can access all these papers online and get access to mathematical definitions through Wikipedia and other sites.

I miss seeing the papers near the papers I'm looking for. I remember running across a paper grappling with the union of two complexity classes, which inspired this blog post. In order to prove a theorem about context-free languages, we relied on a characterization buried in a 1966 textbook on the topic. The Chicago library had three copies of that book, so I suppose someone once taught from it, a whole course on Context-Free Languages in the 60's.

The last time I tracked down books in a library was at Northwestern, researching the history of P v NP for my book, which will itself likely soon follow the others into these "service centers". I can't remember when, before that, I last went to a library to track down information for research purposes.

There's much to miss about looking up research materials in a library. But I also miss my electric typewriter, my almanac and my World Book encyclopedia collection. Can't stop progress.

by Lance Fortnow (noreply@blogger.com) at April 24, 2014 03:01 PM UTC

Authors: Therese Biedl
Download: PDF
Abstract: We show that any $y$-monotone poly-line drawing can be straightened out while maintaining $y$-coordinates and height. The width may increase significantly, but we also show that on some graphs exponential width is required if we do not want to increase the height. Likewise, $y$-monotonicity is required: there are poly-line drawings (not $y$-monotone) that cannot be straightened out while maintaining the height. We give some applications of our result.

at April 24, 2014 12:00 AM UTC

Authors: Yaoyu Wang, Yitong Yin
Download: PDF
Abstract: We study certificates in static data structures. In the cell-probe model, certificates are the cell probes which can uniquely identify the answer to the query. As a natural notion of nondeterministic cell probes, lower bounds for certificates in data structures immediately imply deterministic cell-probe lower bounds. In spite of this extra power brought by nondeterminism, we prove that two widely used tools for cell-probe lower bounds, the richness lemma of Miltersen et al. and the direct-sum richness lemma of Pătraşcu and Thorup, both hold for certificates in data structures with even better parameters. Applying these lemmas and adopting existing reductions, we obtain certificate lower bounds for a variety of static data structure problems. These certificate lower bounds are at least as good as the highest known cell-probe lower bounds for the respective problems. In particular, for the approximate near neighbor (ANN) problem in Hamming distance, our lower bound improves the state of the art. When the space is strictly linear, our lower bound for ANN in $d$-dimensional Hamming space becomes $t=\Omega(d)$; this and the recent breakthrough for polynomial evaluation of Larsen are the only two $t=\Omega(d)$ lower bounds ever proved for any problem in the cell-probe model.

at April 24, 2014 12:40 AM UTC

Authors: Nikhil Rao, Parikshit Shah, Stephen Wright
Download: PDF
Abstract: In many signal processing applications, one aims to reconstruct a signal that has a simple representation with respect to a certain basis or frame. Fundamental elements of the basis known as "atoms" allow us to define "atomic norms" that can be used to construct convex regularizers for the reconstruction problem. Efficient algorithms are available to solve the reconstruction problems in certain special cases, but an approach that works well for general atomic norms remains to be found. This paper describes an optimization algorithm called CoGEnT, which produces solutions with succinct atomic representations for reconstruction problems, generally formulated with atomic norm constraints. CoGEnT combines a greedy selection scheme based on the conditional gradient approach with a backward (or "truncation") step that exploits the quadratic nature of the objective to reduce the basis size. We establish convergence properties and validate the algorithm via extensive numerical experiments on a suite of signal processing applications. Our algorithm and analysis are also novel in that they allow for inexact forward steps. In practice, CoGEnT significantly outperforms the basic conditional gradient method, and indeed many methods that are tailored to specific applications, when the truncation steps are defined appropriately. We also introduce several novel applications that are enabled by the atomic norm framework, including tensor completion, moment problems in signal processing, and graph deconvolution.

at April 24, 2014 12:41 AM UTC

Authors: Richard Cole, Howard Karloff
Download: PDF
Abstract: Karloff and Shirley recently proposed summary trees as a new way to visualize large rooted trees (Eurovis 2013) and gave algorithms for generating a maximum-entropy k-node summary tree of an input n-node rooted tree. However, the algorithm generating optimal summary trees was only pseudo-polynomial (and worked only for integral weights); the authors left open the existence of a polynomial-time algorithm. In addition, the authors provided an additive approximation algorithm and a greedy heuristic, both working on real weights. This paper shows how to construct maximum-entropy k-node summary trees in time O(k^2 n + n log n) for real weights (indeed, as small as the time bound for the greedy heuristic given previously); how to speed up the approximation algorithm so that it runs in time O(n + (k^4/eps) log(k/eps)); and how to speed up the greedy algorithm so as to run in time O(kn + n log n). Altogether, these results make summary trees a much more practical tool than before.

at April 24, 2014 12:40 AM UTC

We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in computer science, and show that any algorithm for agnostically learning halfspaces requires $n^{\Omega(\log{(1/\epsilon)})}$ time, ruling out a polynomial time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be a Gaussian. We also show that the problem of agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs on the uniform distribution in polynomial time. This complements the surprising result of Andoni et al. (2014), who show that sparse polynomials are learnable under {\em random} Gaussian noise in polynomial time. Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution that allows us to relate the hardness of learning problems on two different domains and distributions.

at April 23, 2014 06:12 PM UTC

A brief synopsis of my general-audience talk at Haverford College. 

I'm currently visiting Haverford College at the invitation of +Sorelle Friedler as part of Haverford's big data lecture series. Today's talk was a general audience talk about data mining, titled 'The Shape Of Information': (GDrive link)
The Shape Of Information 
What makes data mining so powerful, and so ubiquitous? How can the same set of techniques identify patients at risk for a rare genetic disorder, consumers most likely to like Beyonce's latest album, or even a new star from a sky survey?
The answer starts with an idea Descartes had nearly 400 years ago. He suggested expressing geometry in terms of numbers (coordinates). This turned out to be a powerful technique that led (among other things) to the development of the calculus. Data mining returns the favor. It starts with sets of numbers that describe a collection of objects. To find patterns in these objects, we create a geometry in which the numbers are coordinates. And just like that, objects become shapes, and the search for information becomes a quest for common structure in these shapes. 
In this search, we are not limited by the geometry of our world: we can dream up ever more intricate geometries that capture the shape of the information that we seek to find in our data. In this sense, data mining is the best kind of science fiction come to life: we craft a world out of our imagination, and let the laws of this world lead us to fascinating discoveries about the data that inhabits it.
I had a great time visiting with Sorelle's students in their data mining class. Haverford College continues to impress me with the quality of their undergrads (and their faculty!)

by Suresh Venkatasubramanian (noreply@blogger.com) at April 23, 2014 04:44 AM UTC

Authors: Arne C. Reimers, Leen Stougie
Download: PDF
Abstract: In recent years the vertex enumeration problem of polyhedra has seen a revival in the study of metabolic networks, which increased the demand for efficient vertex enumeration algorithms for high-dimensional polyhedra given by inequalities. In this paper we apply the concept of branch-decomposition to the vertex enumeration problem of polyhedra $P = \{x : Sx = b, x \geq 0\}$. To this end, we introduce the concept of a $k$-module and show how it relates to the separators of the linear matroid generated by the columns of $S$. This translates structural properties of the matroidal branch-decomposition to the context of polyhedra. We then use this to present a total polynomial time algorithm for polytopes $P$ for which the branch-width of the linear matroid generated by $S$ is bounded by a constant $k$.

at April 23, 2014 12:00 AM UTC

Authors: Martin Böhm, Jiří Sgall, Rob van Stee, Pavel Veselý
Download: PDF
Abstract: Online Bin Stretching is a semi-online variant of bin packing in which the algorithm has to use the same number of bins as the optimal packing, but is allowed to slightly overpack the bins. The goal is to minimize the amount of overpacking, i.e., the maximum size packed into any bin.

We give an algorithm for Online Bin Stretching with a stretching factor of 1.5 for any number of bins. We also show a specialized algorithm for three bins with a stretching factor of 11/8 = 1.375.

at April 23, 2014 12:41 AM UTC

Authors: Dana Ron, Gilad Tsur
Download: PDF
Abstract: We study a basic problem of approximating the size of an unknown set $S$ in a known universe $U$. We consider two versions of the problem. In both versions the algorithm can specify subsets $T\subseteq U$. In the first version, which we refer to as the group query or subset query version, the algorithm is told whether $T\cap S$ is non-empty. In the second version, which we refer to as the subset sampling version, if $T\cap S$ is non-empty, then the algorithm receives a uniformly selected element from $T\cap S$. We study the difference between these two versions under different conditions on the subsets that the algorithm may query/sample, both in the case where the algorithm is adaptive and in the case where it is non-adaptive. In particular we focus on a natural family of allowed subsets, which correspond to intervals, as well as variants of this family.

at April 23, 2014 12:41 AM UTC

Authors: Mateus de Oliveira Oliveira
Download: PDF
Abstract: It has long been known that many NP-hard optimization problems can be solved in polynomial time when restricted to structures of constant treewidth. In this work we provide the first extension of such results to the quantum setting. We show that given a quantum circuit $C$ with $n$ uninitialized inputs, $poly(n)$ gates and treewidth $t$, one can compute in time $(\frac{n}{\delta})^{\exp(O(t))}$ a classical witness $y\in \{0,1\}^n$ that maximizes the acceptance probability of $C$ up to a $\delta$ additive factor. In particular our algorithm runs in polynomial time if $t$ is constant and $1/poly(n) \leq \delta < 1$. For unrestricted values of $t$ this problem is known to be hard for the complexity class QCMA, a quantum generalization of NP. In contrast, we show that the same problem is already NP-hard if $t=O(\log n)$ even when $\delta$ is constant. Finally, we show that for $t=O(\log n)$ and constant $\delta$, it is QMA-hard to find a quantum witness $|\varphi\rangle$ that maximizes the acceptance probability of a quantum circuit of treewidth $t$ up to a $\delta$ additive factor.

at April 23, 2014 12:40 AM UTC

Authors: Martin Böhm, Jiří Sgall, Pavel Veselý
Download: PDF
Abstract: In the Colored Bin Packing problem a sequence of items of sizes up to $1$ arrives to be packed into bins of unit capacity. Each item has one of $c\geq 2$ colors and an additional constraint is that we cannot pack two items of the same color next to each other in the same bin. The objective is to minimize the number of bins.

In the important special case when all items have size zero, we characterize the optimal value to be equal to color discrepancy. As our main result, we give an (asymptotically) 1.5-competitive algorithm which is optimal. In fact, the algorithm always uses at most $\lceil1.5\cdot OPT\rceil$ bins and we show a matching lower bound of $\lceil1.5\cdot OPT\rceil$ for any value of $OPT\geq 2$. In particular, the absolute ratio of our algorithm is $5/3$ and this is optimal.

For items of unrestricted sizes we give an asymptotically $3.5$-competitive algorithm. When the items have sizes at most $1/d$ for a real $d \geq 2$ the asymptotic competitive ratio is $1.5+d/(d-1)$. We also show that classical algorithms First Fit, Best Fit and Worst Fit are not constant competitive, which holds already for three colors and small items.

In the case of two colors---the Black and White Bin Packing problem---we prove that all Any Fit algorithms have absolute competitive ratio $3$. When the items have sizes at most $1/d$ for a real $d \geq 2$ we show that the Worst Fit algorithm is absolutely $(1+d/(d-1))$-competitive.

at April 23, 2014 12:41 AM UTC

Authors: Abdolhamid Ghodselahi, Fabian Kuhn
Download: PDF
Abstract: We study an online problem in which a set of mobile resources or centers have to be moved in order to optimally serve a set of demands. There is a set of $n$ nodes and a set of $k$ resources that are placed at some of the nodes. Each node can potentially host several resources and the resources can be moved between the nodes. At the nodes, demands arrive in an online fashion, and the cost for serving the demands is a function of the number of demands and resources at the different nodes. An online algorithm has to move resources in order to keep the service cost low; however, as moving resources is expensive, the objective of an online algorithm is to minimize the total number of movements.

Specifically, we give a deterministic online algorithm, which for parameters $\alpha\geq 1$ and $\beta\geq 0$ guarantees that at all times, the service cost is within a multiplicative factor $\alpha$ and an additive term $\beta$ of the optimal service cost and where also the movement cost is within a small multiplicative factor and an additive term of the optimal overall cost. Roughly, for $\alpha=1$ and $\beta=\Omega(k)$, we show that the movement cost by time $t$ is upper bounded by the optimal service cost at time $t$ plus an additive term of order $O(k\log k)$. For $\alpha>1$ and arbitrary $\beta$, we show that the movement cost by time $t$ is only linear in $k\log k$ and logarithmic in the optimal service cost at time $t$. We also show that both bounds are almost asymptotically tight.

at April 23, 2014 12:41 AM UTC

Authors: Guru Prakash Arumugam, John Augustine, Mordecai J. Golin, Prashanth Srikanthan
Download: PDF
Abstract: A dynamic path network is an undirected path with evacuees situated at each vertex. To evacuate the path, evacuees travel towards a designated sink (doorway) to exit. Each edge has a capacity, the number of evacuees that can enter the edge in unit time. Congestion occurs if an evacuee has to wait at a vertex for other evacuees to leave first. The basic problem is to place k sinks on the line, with an associated evacuation strategy, so as to minimize the total time needed to evacuate everyone. The minmax-regret version introduces uncertainty into the input, with the number of evacuees at vertices only being specified to within a range. The problem is to find a universal solution whose regret (difference from optimal for a given input) is minimized over all legal inputs. The previously best known algorithms for the minmax-regret version ran in time exponential in k. In this paper, we derive new properties of solutions that yield the first polynomial time algorithms for solving the problem.

at April 23, 2014 12:00 AM UTC

Authors: Bojun Huang
Download: PDF
Abstract: We consider the problem of minimizing the total cost to run a sequence of $n$ tasks in the given order by $k$ agents under the positional cost model. The cost to run a task not only depends on the intrinsic cost of the task itself, but is also monotonically related to the position of the task in the working list of the agent it is assigned to. Such a positional effect can naturally arise from the classic sum-of-completion-time minimization problems, and is also well motivated by the varying efficiency with which an agent works in reality (such as due to learning effects or deteriorating effects). It can also be seen as a deterministic variant of the classic Bayesian sequential decision making problems. This paper presents a simple and practical algorithm that runs in $O(k^2 n)$ time and minimizes the total cost of any problem instance consisting of two task types. The algorithm works by making a greedy decision for each task sequentially based on some stopping thresholds in a "greedy-like" allocation simulation -- a working style coinciding with Gittins' optimal-stopping based algorithm for the classic Bayesian multi-armed bandit problem.

at April 23, 2014 12:41 AM UTC

Authors: Odile Bellenguez-Morineau, Marek Chrobak, Christoph Dürr, Damien Prot
Download: PDF
Abstract: In the paper "The complexity of mean flow time scheduling problems with release times", by Baptiste, Brucker, Chrobak, Dürr, Kravchenko and Sourd, the authors claimed to prove strong NP-hardness of the scheduling problem $P|pmtn,r_j|\sum C_j$, namely multiprocessor preemptive scheduling where the objective is to minimize the mean flow time. We point out a serious error in their proof and give a new proof of strong NP-hardness for this problem.

at April 23, 2014 12:00 AM UTC

Authors: Omer Reingold, Shai Vardi
Download: PDF
Abstract: Given an input $x$ and a search problem $F$, local computation algorithms (LCAs) implement access to specified locations of a legal output $y \in F(x)$, using polylogarithmic time and space. Mansour et al. (2012) previously showed how to convert certain online algorithms to LCAs. In this work, we expand on that line of work and develop new techniques for designing LCAs and bounding their space and time complexity. Our contributions are fourfold: (1) we significantly improve the running times and space requirements of LCAs compared to previous results, (2) we expand and better define the family of online algorithms which can be converted to LCAs using our techniques, (3) we show that our results apply to a larger family of graphs than that of previous results, and (4) our proofs are simpler and more concise than the previous proof methods.

For example, we show how to construct LCAs that require $O(\log{n}\log\log{n})$ space and $O(\log^2{n})$ time (and expected time $O(\log\log{n})$) for problems such as maximal matching on a large family of graphs, as opposed to the previously best results, which required $O(\log^3{n})$ space and $O(\log^4{n})$ time and applied to a smaller family of graphs.

at April 23, 2014 12:41 AM UTC

Authors: Dan Hassin, Adam Scrivener, Yibo Zhou
Download: PDF
Abstract: This paper is a critique of version two of Joonmo Kim's paper entitled "P is not equal to NP by Modus Tollens." After summarizing Kim's proof, we note that the logic that Kim uses is inconsistent, which provides evidence that the proof is invalid. To show this, we consider two reasonable interpretations of Kim's definitions and show that "P is not equal to NP" does not seem to follow in an obvious way under either of them.

at April 23, 2014 12:40 AM UTC

So, I’ve written an article of that title for the wonderful American Scientist magazine—or rather, Part I of such an article.  This part explains the basics of Kolmogorov complexity and algorithmic information theory: how, under reasonable assumptions, these ideas can be used in principle to “certify” that a string of numbers was really produced randomly—something that one might’ve imagined impossible a priori.  Unfortunately, the article also explains why this fact is of limited use in practice: because Kolmogorov complexity is uncomputable!  Readers who already know this material won’t find much that’s new here, but I hope those who don’t will enjoy the piece.

Part II, to appear in the next issue, will be all about quantum entanglement and Bell’s Theorem, and their very recent use in striking protocols for generating so-called “Einstein-certified random numbers”—something of much more immediate practical interest.

Thanks so much to Fenella Saunders of American Scientist for commissioning these articles, and my apologies to her and any interested readers for the 4.5 years (!) it took me to get off my rear end (or rather, onto it) to write these things.

by Scott at April 22, 2014 07:07 PM UTC

We prove a version of the "Reversed Newman Theorem" in the context of information complexity: every private-coin communication protocol with information complexity $I$ and communication complexity $C$ can be replaced by a public-coin protocol with the same behavior whose information complexity does not exceed $O\left(\sqrt{IC}\right)$. This result holds for unbounded-round communication, whereas previous results in this area dealt with one-way protocols. As an application, it gives an indirect way to prove the best-known compression theorem in information complexity.

at April 22, 2014 03:10 PM UTC

The organizers of the AdAuctions workshop have extended the submission deadline to April 27. Updated information on the workshop is as follows.

The Tenth Ad Auction Workshop  (here is the call for papers)

  • Date: June 8, 2014, 8:30-15:30
  • Submission deadline: April 27, 2014, (23:59 Hawaii time)
  • Organizing Committee: adauctionsworkshop2014@gmail.com
    Itai Ashlagi (MIT Sloan)
    Patrick Jordan (Microsoft)
    De Liu (University of Kentucky)
    Renato Paes Leme (Google)
    Chris Wilkens (Yahoo)
  • Keynote speakers: Hal Varian (Google) and Paul Milgrom (Stanford).

by robertkleinberg at April 22, 2014 12:09 PM UTC

A 120-year-old algorithm for learning mixtures of gaussians turns out to be optimal

So, how was math writing in 1894? I’d imagined it to be a lot like one of Nietzsche’s last diary entries except with nouns replaced by German fraktur symbols, impenetrable formalisms and mind-bending passive voice constructions. I was in for no small surprise when I started reading Karl Pearson’s Contributions to the Mathematical Theory of Evolution. The paper is quite enjoyable! The math is rigorous yet not overly formal. Examples and philosophical explanations motivate various design choices and definitions. A careful experimental evaluation gives credibility to the theory.

The utterly mind-boggling aspect of the paper however is not how well-written it is, but rather what Pearson actually did in it and how far ahead of his time he was. This is the subject of this post.

Pearson was interested in building a mathematical theory for evolutionary biology. In hindsight, his work is also one of the earliest contributions to the computational theory of evolution. This already strikes me as visionary as it remains a hot research area today. The Simons Institute in Berkeley just devoted a semester-long program to it.

In his paper, he considers the distribution of a single measurement among a population, more concretely, the forehead width to body length ratio among a population of crabs. The crab data had been collected by zoologist Weldon and his wife during their 1892 Easter vacation on Malta. Wikipedia describes the crab (carcinus maenas) as one of the “world’s worst alien invasive species”, but fails to acknowledge the crab’s academic contributions.

The Pearson-Weldon crab: Evil alien invader or misjudged scientific apprentice?

The Weldons had taken 1000 samples each with 23 attributes of which all but the one attribute describing forehead to body length ratio seemed to follow a single normal distribution. Regarding this peculiar deviation from normal, Pearson notes:

In the case of certain biological, sociological, and economic measurements there is, however, a well-marked deviation from this normal shape, and it becomes important to determine the direction and amount of such deviation. The asymmetry may arise from the fact that the units grouped together in the measured material are not really homogeneous. It may happen that we have a mixture of {2, 3,\dots, n} homogeneous groups, each of which deviates about its own mean symmetrically and in a manner represented with sufficient accuracy by the normal curve.

What the paragraph describes is the problem of learning a mixture of gaussians: Each sample is drawn from one of several unknown gaussian distributions. Each distribution is selected with an unknown mixture probability. The goal is to find the parameters of each gaussian distribution as well as the mixing probabilities. Pearson focuses on the case of two gaussians which he believes is of special importance in the context of evolutionary biology:

A family probably breaks up first into two species, rather than three or more, owing to the pressure at a given time of some particular form of natural selection.

With this reasoning he stipulates that the crab measurements are drawn from a mixture of two gaussians and aims to recover the parameters of this mixture. Learning a mixture of two gaussians at the time was a formidable task that lead Pearson to come up with a powerful approach. His approach is based on the method of moments which uses the empirical moments of a distribution to distinguish between competing models. Given {n} samples {x_1,...,x_n} the {k}-th empirical moment is defined as {\frac1n\sum_ix_i^k}, which for sufficiently large {n} will approximate the true moment {\mathbb{E}\,x_i^k}. A mixture of two one-dimensional gaussians has {5} parameters so one might hope that {5} moments are sufficient to identify the parameters.
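
For concreteness, here is a small Python sketch of these empirical moment estimates (my own illustration, not code from the paper); the mixture parameters below are made up, and samples are drawn component-first as the model prescribes.

import numpy as np

def empirical_moment(samples, k):
    # k-th empirical moment: (1/n) * sum_i x_i^k
    return np.mean(samples ** k)

rng = np.random.default_rng(0)
n = 200_000
# Hypothetical mixture: weight 0.3 on N(-1, 0.5^2), weight 0.7 on N(2, 1).
pick = rng.random(n) < 0.3
samples = np.where(pick, rng.normal(-1.0, 0.5, n), rng.normal(2.0, 1.0, n))

for k in range(1, 7):
    print(k, empirical_moment(samples, k))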

Pearson derived a ninth degree polynomial {p_5} in the first {5} moments and located the real roots of this polynomial over a carefully chosen interval. Each root gives a candidate mixture that matches the first {5} moments; there were two valid solutions, among which Pearson selected the one whose {6}-th moment was closest to the observed empirical {6}-th moment.
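
Pearson's selection step is easy to phrase in code. Below is a hedged sketch (mine, not the paper's): it assumes the candidate mixtures matching the first five moments have already been obtained from the roots of the ninth degree polynomial (not reproduced here), and simply keeps the candidate whose sixth moment is closest to the empirical one; scipy supplies the gaussian moments.

from scipy.stats import norm

def mixture_moment(candidate, k):
    # candidate: ((weight, mean, std), (weight, mean, std))
    (w1, mu1, s1), (w2, mu2, s2) = candidate
    return w1 * norm(mu1, s1).moment(k) + w2 * norm(mu2, s2).moment(k)

def select_candidate(candidates, empirical_sixth_moment):
    # Keep the candidate mixture whose 6-th moment best matches the data.
    return min(candidates,
               key=lambda c: abs(mixture_moment(c, 6) - empirical_sixth_moment))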

Pearson goes on to evaluate this method numerically on the crab samples—by hand. In doing so he touches on a number of issues such as numerical stability and the number of floating point operations that seem to me well ahead of their time. Pearson was clearly aware of computational constraints and optimized his algorithm to be computationally tractable.

Learning mixtures of gaussians

Pearson’s seminal paper proposes a problem and a method, but it doesn’t give an analysis of the method. Does the method really work on all instances or did he get lucky on the crab population? Is there any efficient method that works on all instances? The sort of theorem we would like to have is: Given {n} samples from a mixture of two gaussians, we can estimate each parameter up to small additive distance in polynomial time.

Eric Price and I have a recent result that resolves this problem by giving a computationally efficient estimator that achieves an optimal bound on the number of samples:

Theorem. Denoting by {\sigma^2} the overall variance of the mixture, {\Theta(\sigma^{12})} samples are necessary and sufficient to estimate each parameter to constant additive distance. The estimator is computationally efficient and extends to the {d}-dimensional mixture problem up to a necessary factor of {O(\log d)} in the sample requirement.

Strikingly, our one-dimensional estimator turns out to be very closely related to Pearson’s approach. Some extensions are needed, but I’m inclined to say that our result can be interpreted as showing that Pearson’s original estimator is in fact an optimal solution to the problem he proposed.

Until a recent breakthrough result by Kalai, Moitra and Valiant (2010), no polynomial bounds for the problem were known except under stronger assumptions on the gaussian components.

 A word about definitions

Of course, we need to be a bit more careful with the above statements. Clearly we can only hope to recover the parameters up to permutation of the two gaussian components as any permutation gives an identical mixture. Second, we cannot hope to learn the mixture probabilities in general up to small error. If, for example, the two gaussian components are indistinguishable, there is no way of learning the mixture probabilities—even though we can estimate means and variances easily by treating the distribution as one gaussian. On the other hand, if the gaussian components are distinguishable then it is not hard to estimate the mixture probabilities from the other parameters and the overall mean and variance of the mixture. For this reason it makes some sense to treat the mixture probabilities separately and focus on learning means and variances of the mixture.

Another point to keep in mind is that our notion of parameter distance should be scale-free. A reasonable goal is therefore to learn parameters {\hat \mu_1,\hat \mu_2,\hat\sigma_1,\hat\sigma_2} such that if {\mu_1,\mu_2,\sigma_1,\sigma_2} are the unknown parameters, there is a permutation {\pi\colon\{1,2\}\rightarrow\{1,2\}} such that for all {i\in\{1,2\},}

\displaystyle \max\left( \big|\mu_i-\mu_{\pi(i)}\big|^2, \big|\sigma_i^2-\sigma_{\pi(i)}^2\big|\right)\le\epsilon \cdot \sigma^2.

Here, {\sigma^2} is the overall variance of the mixture. The definition extends to the {d}-dimensional problem for suitable notion of variance (essentially the maximum variance in each coordinate) and by replacing absolute values with {\ell_\infty}-norms.

Some intuition for the proof

While the main difficulty in the proof is the upper bound, let’s start with some intuition for why we can’t hope to do better than {\sigma^{12}}. Just like Pearson’s estimator and that of Kalai, Moitra and Valiant, our estimator is based on the first six moments of the distribution. It is not hard to show that estimating the {k}-th moment up to error {\epsilon\sigma^k} requires {\Omega(1/\epsilon^2)} samples. Hence, learning each of the first six moments to constant error needs {\Omega(\sigma^{12})} samples. This doesn’t directly imply a lower bound for the mixture problem. After all, learning mixtures could be strictly easier than estimating the first six moments. Nevertheless, we know from Pearson that there are two mixtures that have matching first {5} moments. This turns out to give you a huge hint as to how to prove a lower bound for the mixture problem. The key observation is that two mixtures that have matching first {5} moments and constant parameters are extremely close to each other after adding a gaussian variable {N(0,\sigma^2)} to both mixtures for large enough {\sigma^2}. Note that we still have two mixtures of gaussians after this operation and the total variance is {O(\sigma^2).}

Making this statement formal requires choosing the right probability metric. In our case this turns out to be the squared Hellinger distance. Sloppily identifying distributions with density functions like only a true computer scientist would do, the squared Hellinger distance is defined as

\displaystyle \mathrm{H}^2(f,g) = \frac12\int_{-\infty}^{\infty} \Big(\sqrt{f(x)}-\sqrt{g(x)}\Big)^2 \mathrm{d}x\,.

Lemma. Let {f,g} be mixtures with matching first {k} moments and constant parameters. Let {p,q} be the distributions obtained by adding {N(0,\sigma^2)} to each distribution. Then, {\mathrm{H}^2(p,q)\le O(1/\sigma^{2k+2})}.

The lemma immediately gives the lower bound we need. Indeed, the squared Hellinger distance is sub-additive on product distributions. So, it’s easy to see what happens for {n} independent samples from each distribution. This can increase the squared Hellinger distance by at most a factor {n}. In particular, for any {n=o(\sigma^{2k+2})} the squared Hellinger distance between {n} samples from each distribution is at most {o(1)}. Owing to a useful relation between the Hellinger distance and total variation, this implies that also the total variation distance is at most {o(1)}. Using the definition of total variation distance, this means no estimator (efficient or not) can tell apart the two distributions from {n=o(\sigma^{2k+2})} samples with constant success probability. In particular, parameter estimation is impossible.
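
Written out in the normalization used above (this restatement is mine, not a quote from the paper), the two facts being combined are

\displaystyle \mathrm{H}^2(p^n,q^n)=1-\big(1-\mathrm{H}^2(p,q)\big)^n\le n\,\mathrm{H}^2(p,q) \qquad\text{and}\qquad \mathrm{TV}(p,q)\le\sqrt{2}\,\mathrm{H}(p,q),

where {p^n} denotes the distribution of {n} independent samples from {p}. With matching first {5} moments the lemma gives {\mathrm{H}^2(p,q)\le O(1/\sigma^{12})}, so for {n=o(\sigma^{12})} both the Hellinger and the total variation distance between the sample distributions remain {o(1)}.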

Mixtures with identical first 5 moments before and after adding noise

This lemma is an example of a broader phenomenon: Matching moments plus “noise” implies tiny Hellinger distance. A great reference is Pollard’s book and lecture notes. By the way, Pollard’s expository writing is generally wonderful. For instance, check out his (unrelated) paper on guilt-free manipulation of conditional densities if you ever wondered about how to condition on a measure zero event without regret. Not that computer scientists are generally concerned about it.

Matching upper bound

The matching upper bound is a bit trickier. The proof uses a convenient notion of excess moments that was introduced by Pearson. You might have heard of excess kurtosis (the fourth excess moment) as it appears in the info box on every Wikipedia article about distributions. Excess moments have the appealing property that they are invariant under adding and subtracting a common term from the variance of each component.

Identical excess moments

The second excess moment is always {0.} As the picture suggests, we can make one component increasingly spiky and think of it as a distribution that places all its weight on a single point. In accordance with this notion of excess moments it makes sense to reparametrize the mixture in such a way that the parameters are also independent of adding and subtracting a common variance term. Assuming the overall mean of the mixture is {0,} this leaves us with three free parameters {\alpha,\beta,\gamma.} The third through sixth excess moment give us four equations in these three parameters.

Expressing the excess moments in terms of our new parameters {\alpha,\beta,\gamma,} we can derive in a fairly natural manner a ninth degree polynomial {p_5(y)} whose coefficients depend on the first {5} excess moments so that {\alpha} has to satisfy {p_5(\alpha)=0.} The polynomial {p_5} was already used by Pearson. Unfortunately, {p_5} can have multiple roots and this is to be expected since {5} moments are not sufficient to identify a mixture of two gaussians. Pearson computed the mixtures associated with each of the roots and threw out the invalid solutions (e.g. the ones that give imaginary variances), getting two valid mixtures that matched on the first five moments. He then chose the one whose sixth moment was closest to the observed sixth moment.

We proceed somewhat differently from Pearson after computing {p_5}. We derive another polynomial {p_6} (depending on all six moments) and a bound {B} such that {\alpha} is the only solution to the system of equations

\displaystyle \big\{p_5(y)=0,\quad p_6(y)=0,\quad 0<y\leq B\big\}.

This approach isn’t yet robust to small perturbations in the moments; for example, if {p_5} has a double root at {\alpha}, it may have no nearby root after perturbation. We therefore consider the polynomial {r(y) = p_5^2(y)+p_6^2(y)} which we know is zero at {\alpha.} We argue that {r(y)} is significantly nonzero for any {y} significantly far from {\alpha}. This is the main difficulty in the proof, but if you really read this far, it’s perhaps not a bad point to switch to reading our paper.

Conclusions

I sure drooled enough over how much I like Pearson’s paper. Let me conclude with a different practical thought. There were a bunch of tools (as in actual computer programs—gasp!) that made the work a lot easier. Before we proved the lower bound we had verified it using arbitrary floating point precision numerical integration to accuracy {10^{-300}} for {\sigma^2=10,100,1000,\dots} and the decrease in Hellinger distance was exactly what it should be. We used a Python package called mpmath for that.
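
For readers curious what such a check looks like, here is a rough sketch along those lines (my own code with made-up parameters, not the authors' script): it computes the squared Hellinger distance between two one-dimensional mixtures after adding N(0, sigma^2) to both, using mpmath's arbitrary-precision quadrature.

from mpmath import mp, mpf, sqrt, quad, npdf, inf

mp.dps = 60  # decimal digits of working precision

def mixture_pdf(x, components, extra_var):
    # components: ((weight, mean, variance), (weight, mean, variance))
    return sum(w * npdf(x, mu, sqrt(var + extra_var)) for w, mu, var in components)

def squared_hellinger(f, g, extra_var):
    integrand = lambda x: (sqrt(mixture_pdf(x, f, extra_var)) -
                           sqrt(mixture_pdf(x, g, extra_var))) ** 2
    return quad(integrand, [-inf, inf]) / 2

# Two made-up mixtures; the interesting case is a pair with matching
# first five moments, where the distance should decay like 1/sigma^12.
f = ((mpf('0.5'), mpf('-1'), mpf('1')), (mpf('0.5'), mpf('1'), mpf('1')))
g = ((mpf('0.4'), mpf('-1.2'), mpf('0.9')), (mpf('0.6'), mpf('0.8'), mpf('1.1')))

for extra_var in (10, 100, 1000):
    print(extra_var, squared_hellinger(f, g, extra_var))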

Similarly, for the upper bound we used a symbolic computation package in Python called SymPy to factor various polynomials, perform substitutions and multiplications. It would’ve been tedious, time-consuming and error prone to do everything by hand (even if the final result is not too hard to check by hand).

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

by Moritz Hardt at April 22, 2014 01:02 AM UTC

Authors: Dmitry Kosolobov, Mikhail Rubinchik, Arseny M. Shur
Download: PDF
Abstract: Given a language $L$ that is online recognizable in linear time and space, we construct a linear time and space online recognition algorithm for the language $L\cdot\mathrm{Pal}$, where $\mathrm{Pal}$ is the language of all nonempty palindromes. Hence for every fixed positive $k$, $\mathrm{Pal}^k$ is online recognizable in linear time and space. Thus we solve an open problem posed by Galil and Seiferas in 1978.

at April 22, 2014 12:41 AM UTC

Authors: Boaz Barak, David Steurer
Download: PDF
Abstract: In order to obtain the best-known guarantees, algorithms are traditionally tailored to the particular problem we want to solve. Two recent developments, the Unique Games Conjecture (UGC) and the Sum-of-Squares (SOS) method, surprisingly suggest that this tailoring is not necessary and that a single efficient algorithm could achieve best possible guarantees for a wide range of different problems.

The Unique Games Conjecture (UGC) is a tantalizing conjecture in computational complexity, which, if true, will shed light on the complexity of a great many problems. In particular this conjecture predicts that a single concrete algorithm provides optimal guarantees among all efficient algorithms for a large class of computational problems.

The Sum-of-Squares (SOS) method is a general approach for solving systems of polynomial constraints. This approach is studied in several scientific disciplines, including real algebraic geometry, proof complexity, control theory, and mathematical programming, and has found applications in fields as diverse as quantum information theory, formal verification, game theory and many others.

We survey some connections that were recently uncovered between the Unique Games Conjecture and the Sum-of-Squares method. In particular, we discuss new tools to rigorously bound the running time of the SOS method for obtaining approximate solutions to hard optimization problems, and how these tools give the potential for the sum-of-squares method to provide new guarantees for many problems of interest, and possibly to even refute the UGC.

at April 22, 2014 12:00 AM UTC

Authors: Ragesh Jaiswal
Download: PDF
Abstract: There is a close connection between Direct Product and XOR lemmas in the sense that in many settings, we can prove one given the other. The known reductions that are used for the above purpose are either in the non-uniform setting or give non-matching parameters. By non-matching parameter we mean that $k$-wise Direct Product lemma implies $k'$-wise XOR lemma (and vice versa) for $k \neq k'$. In this work, we discuss reductions between $k$-wise Direct Product and $k$-wise XOR lemmas. That is, we show that if the $k$-wise direct product lemma holds, then so does the $k$-wise XOR lemma and vice versa. We show that even though there is a perfectly uniform reduction in one direction, the reduction in the other direction requires some amount of non-uniformity. We give reductions in both directions matching information-theoretic bounds up to polynomial factors. Our techniques also give a small quantitative improvement over the known results about proving $k$-wise XOR lemma using $2k$-wise Direct Product lemma.

at April 22, 2014 12:40 AM UTC

Authors: Deepak Ajwani, W. Sean Kennedy, Alessandra Sala, Iraj Saniee
Download: PDF
Abstract: Many graph processing algorithms require determination of shortest-path distances between arbitrary numbers of node pairs. Since computation of exact distances between all node-pairs of a large graph, e.g., 10M nodes and up, is prohibitively expensive both in computational time and storage space, distance approximation is often used in place of exact computation. In this paper, we present a novel and scalable distance oracle that leverages the hyperbolic core of real-world large graphs for fast and scalable distance approximation. We show empirically that the proposed oracle significantly outperforms prior oracles on a random set of test cases drawn from public domain graph libraries. There are two sets of prior work against which we benchmark our approach. The first set, which often outperforms other oracles, employs embedding of the graph into low dimensional Euclidean spaces with carefully constructed hyperbolic distances, but provides no guarantees on the distance estimation error. The second set leverages Gromov-type tree contraction of the graph with the additive error guaranteed not to exceed $2\delta\log{n}$, where $\delta$ is the hyperbolic constant of the graph. We show that our proposed oracle 1) is significantly faster than those oracles that use hyperbolic embedding (first set) with similar approximation error and, perhaps surprisingly, 2) exhibits substantially lower average estimation error compared to Gromov-like tree contractions (second set). We substantiate our claims through numerical computations on a collection of a dozen real world networks and synthetic test cases from multiple domains, ranging in size from 10s of thousand to 10s of millions of nodes.

at April 22, 2014 12:41 AM UTC

Authors: Moritz Hardt, Eric Price
Download: PDF
Abstract: We consider the problem of identifying the parameters of an unknown mixture of two arbitrary $d$-dimensional gaussians from a sequence of random samples. Our main result is a computationally efficient moment-based estimator with an optimal convergence rate thus resolving a problem introduced by Pearson (1894). Denoting by $\sigma^2$ the variance of the unknown mixture, we prove that $\Theta(\sigma^{12})$ samples are necessary and sufficient to estimate each parameter up to constant additive error when $d=1.$ Our upper bound extends to arbitrary dimension $d>1$ up to a (necessary) logarithmic loss in $d$ using a novel---yet simple---dimensionality reduction technique.

Strikingly, our estimator turns out to be very similar to the one Pearson proposed in 1894 which reduces the one-dimensional problem to solving and analyzing a tractable system of polynomial equations. Our result greatly improves on the exponent in the sample size of the best previous estimator due to Kalai, Moitra and Valiant (2010).

at April 22, 2014 12:41 AM UTC

Authors: Søren Dahlgaard, Mathias Bæk Tejs Knudsen, Noy Rotbart
Download: PDF
Abstract: We investigate labeling schemes supporting adjacency, ancestry, sibling, and connectivity queries in forests. In the course of more than 20 years, the existence of $\log n + O(\log \log n)$ labeling schemes supporting each of these functions was proven, with the most recent being ancestry [Fraigniaud and Korman, STOC '10]. Several multi-functional labeling schemes also enjoy lower or upper bounds of $\log n + \Omega(\log \log n)$ or $\log n + O(\log \log n)$ respectively; notably, an upper bound of $\log n + 5\log \log n$ for adjacency+siblings and a lower bound of $\log n + \log \log n$ for each of the functions siblings, ancestry, and connectivity [Alstrup et al., SODA '03]. We improve the constants hidden in the $O$-notation. In particular we show a $\log n + 2\log \log n$ lower bound for connectivity+ancestry and connectivity+siblings, as well as an upper bound of $\log n + 3\log \log n + O(\log \log \log n)$ for connectivity+adjacency+siblings by altering existing methods.

In the context of dynamic labeling schemes it is known that ancestry requires $\Omega(n)$ bits [Cohen, et al. PODS '02]. In contrast, we show upper and lower bounds on the label size for adjacency, siblings, and connectivity of $2\log n$ bits, and $3 \log n$ to support all three functions. There exist efficient adjacency labeling schemes for planar, bounded treewidth, bounded arboricity and interval graphs. In a dynamic setting, we show a lower bound of $\Omega(n)$ for each of those families.

at April 22, 2014 12:40 AM UTC

Authors: Yoshimasa Takabatake, Yasuo Tabei, Hiroshi Sakamoto
Download: PDF
Abstract: While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index based on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds on parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue with the binary searches for finding appearances of variables for a query text, which slows down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index for the top-down searches, it avoids the binary searches by using fast rank/select operations. We experimentally test ESP-index-I on its ability to search query texts and extract subtexts from large-scale real world repetitive texts, and we show that ESP-index-I performs better than other possible approaches.

at April 22, 2014 12:41 AM UTC

Authors: Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén
Download: PDF
Abstract: Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings. Traditional pattern-matching techniques yield brute-force document retrieval solutions, which has motivated the research on tailored indexes that offer near-optimal performance. However, an experimental study establishing which alternatives are actually better than brute force, and which perform best depending on the collection characteristics, has not been carried out. In this paper we address this shortcoming by exploring the relationship between the nature of the underlying collection and the performance of current methods. Via extensive experiments we show that established solutions are often beaten in practice by brute-force alternatives. We also design new methods that offer superior time/space trade-offs, particularly on repetitive collections.

at April 22, 2014 12:40 AM UTC

Authors: Raphael Kramer, Anand Subramanian, Thibaut Vidal, Lucídio dos Anjos Formiga Cabral
Download: PDF
Abstract: This paper deals with the Pollution-Routing Problem (PRP), a Vehicle Routing Problem (VRP) with environmental considerations, recently introduced in the literature by [Bektas and Laporte (2011), Transport. Res. B-Meth. 45 (8), 1232-1250]. The objective is to minimize operational and environmental costs while respecting capacity constraints and service time windows. Costs are based on driver wages and fuel consumption, which depends on many factors, such as travel distance and vehicle load. The vehicle speeds are considered as decision variables. They complement routing decisions, impacting the total cost, the travel time between locations, and thus the set of feasible routes. We propose a method which combines a local search-based metaheuristic with an integer programming approach over a set covering formulation and a recursive speed-optimization algorithm. This hybridization makes it possible to integrate route and speed decisions more tightly. Moreover, two other "green" VRP variants, the Fuel Consumption VRP (FCVRP) and the Energy Minimizing VRP (EMVRP), are addressed. The proposed method compares very favorably with previous algorithms from the literature and many new improved solutions are reported.

at April 22, 2014 12:41 AM UTC

Authors: Yaroslav Akhremtsev, Peter Sanders, Christian Schulz
Download: PDF
Abstract: In this paper, we develop semi-external and external memory algorithms for graph partitioning and clustering problems. Graph partitioning and clustering are key tools for processing and analyzing large complex networks. We address both problems in the (semi-)external model by adapting the size-constrained label propagation technique. Our (semi-)external size-constrained label propagation algorithm can be used to compute graph clusterings and is a prerequisite for the (semi-)external graph partitioning algorithm. The algorithm is then used for both the coarsening and the refinement phase of a multilevel algorithm to compute graph partitions. Our algorithm is able to partition and cluster huge complex networks with billions of edges on cheap commodity machines. Experiments demonstrate that the semi-external graph partitioning algorithm can compute high quality partitions in time at most a factor of 4 slower than an efficient internal memory implementation.

at April 22, 2014 12:41 AM UTC

I am teaching the undergrad jr/sr course Elementary Theory of Computation, which covers Reg Langs, Context-free Langs, Computability theory, and P and NP.

Over the last few years I've changed some of the content (I dumped CFLs-- don't worry, they cover them some in the Prog Langs course) but, more to my point today, I changed the ORDER I do things in.

Change 1: Do P and NP first and Computability later. While this is not historically accurate (which may be why the order was what it was), this is better pedagogically. Why? Consider the following reductions:

1) Given a FORMULA phi produce a graph G and a number k such that
phi is in SAT iff (G,k) is in CLIQ

2) Given a MACHINE M_e produce a MACHINE M_i such that

M_e(e) halts iff M_i halts on at least 10 numbers.

Students have always found the second more confusing since the input and the output are both machines (and the reduction itself is done by a machine), whereas the first one seems like a REAL transformation- you input a FORMULA and get out a GRAPH and a NUMBER.
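
For concreteness, here is a small Python sketch of the standard formula-to-graph construction behind the first reduction (my illustration, not from the post): literals are nonzero integers (negative means negated), clauses are tuples of literals, and the number of clauses is the clique size k.

from itertools import combinations

def sat_to_clique(clauses):
    # One vertex per occurrence of a literal in a clause.
    vertices = [(c, lit) for c, clause in enumerate(clauses) for lit in clause]
    edges = []
    for (c1, l1), (c2, l2) in combinations(vertices, 2):
        # Join occurrences from different clauses whose literals do not clash.
        if c1 != c2 and l1 != -l2:
            edges.append(((c1, l1), (c2, l2)))
    return vertices, edges, len(clauses)

# phi = (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
_, _, k = sat_to_clique([(1, 2), (-1, 3), (-2, -3)])
print(k)  # 3: phi is in SAT iff the graph has a clique of size 3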

There are two other reasons to do P NP first:
(1) Sometimes the last topic in a course gets shortchanged, and P NP is more important than Computability, so better if Computability gets shortchanged.
(2) Let's say I could do P NP in either April or May. If I do it in April and someone resolves it in May, the students can get excited about it since they know about it. If I do it in May and it's resolved in April then the students cannot share in the joy of the discovery. This reason may become more relevant in the year 2214.

Change 2: I used to do the Cook-Levin Theorem (SAT is NP-complete) and then do a reduction like SAT \le CLIQ. This semester I did SAT \le CLIQ first and Cook-Levin later. Why? Because SAT \le CLIQ is better pedagogically (note- pedagogically is today's word on my word-of-the-day calendar) since the Cook-Levin reduction is harder and deals with Turing Machines as opposed to more familiar objects like Formulas and Graphs.

More generally- the order you teach things in may matter, and changing it is relatively easy.

by GASARCH (noreply@blogger.com) at April 21, 2014 11:57 PM UTC

I just added a new article to Wikipedia on indifference graphs (also known as unit interval graphs or proper interval graphs): the graphs formed from sets of points on the real line by connecting every two points whose distance is less than one.



There are many papers on algorithms for going from the graph to a geometric representation in linear time. The following method for the reverse problem, going from the set of points (or equivalently unit intervals) to its graph, must be known, possibly in the context of its generalization to higher-dimensional unit disk graphs, but I don't know a good reference for it.

Given a set of real numbers:
  1. Round each one down to the nearest smaller integer.
  2. Use a hash table to collect numbers that round to the same integer.
  3. For each number x, use the hash table to find each other number y whose rounded value is within one of the rounded value of x.
  4. Create an edge in the graph for every pair (x,y) found in this way whose distance is at most one.
Each pair (x,y) examined can be charged against the hash table entry of x or y that has the larger number of inputs mapped to it. In this way, each hash table entry gets a charge at most proportional to the square of its number of values. But on the other hand every pair of inputs that map to the same hash table entry forms an edge, so the number of edges is at least proportional to the sum of the squares of the hash table entry sizes. Thus, the total work is at most proportional to the total output size.
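
Here is a short Python sketch of the four steps above (my own code, assuming the input is a list of distinct reals); it reports edges as index pairs.

import math
from collections import defaultdict

def indifference_graph_edges(points):
    # Steps 1-2: bucket each point by its floor value.
    buckets = defaultdict(list)
    for i, x in enumerate(points):
        buckets[math.floor(x)].append(i)
    edges = []
    for i, x in enumerate(points):
        # Step 3: only buckets whose integer is within one of floor(x)
        # can contain neighbors of x.
        for b in (math.floor(x) - 1, math.floor(x), math.floor(x) + 1):
            for j in buckets.get(b, ()):
                # Step 4: keep each qualifying pair once (i < j).
                if j > i and abs(points[j] - x) <= 1:
                    edges.append((i, j))
    return edges

print(indifference_graph_edges([0.1, 0.7, 1.5, 3.2, 3.9]))
# [(0, 1), (1, 2), (3, 4)]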

Update, April 22: It appears that the correct reference for this is Bentley, Jon L.; Stanat, Donald F.; Williams, E. Hollins, Jr.
The complexity of finding fixed-radius near neighbors. Information Processing Lett. 6 (1977), no. 6, 209–212.

at April 21, 2014 06:33 PM UTC

For a Boolean function $f,$ let $D(f)$ denote its deterministic decision tree complexity, i.e., the minimum number of (adaptive) queries required in the worst case in order to determine $f.$ In a classic paper, Rivest and Vuillemin \cite{rv} show that any non-constant monotone property $\mathcal{P} : \{0, 1\}^{n \choose 2} \to \{0, 1\}$ of $n$-vertex graphs has $D(\mathcal{P}) = \Omega(n^2).$ We extend their result to $3$-uniform hypergraphs. In particular, we show that any non-constant monotone property $\mathcal{P} : \{0, 1\}^{n \choose 3} \to \{0, 1\}$ of $n$-vertex $3$-uniform hypergraphs has $D(\mathcal{P}) = \Omega(n^3).$ Our proof combines the combinatorial approach of Rivest and Vuillemin with the topological approach of Kahn, Saks, and Sturtevant. Interestingly, our proof makes use of Vinogradov's Theorem (weak Goldbach Conjecture), inspired by its recent use by Babai et al. \cite{bbkn} in the context of the topological approach. Our work leaves the generalization to $k$-uniform hypergraphs as an intriguing open question.

at April 21, 2014 05:23 PM UTC

We show that the deterministic multiparty communication complexity of set disjointness for $k$ parties on a universe of size $n$ is $\Omega(n/4^k)$. We also simplify Sherstov's proof showing an $\Omega(\sqrt{n}/(k2^k))$ lower bound for the randomized communication complexity of set disjointness.

at April 21, 2014 06:34 AM UTC

I have just posted online a new survey “Sum-of-Squares proofs and the quest toward optimal algorithms” co-authored with David Steurer.

The survey discusses two topics I have blogged about before – Khot’s Unique Games Conjecture (UGC) and the Sum-of-Squares (SOS) method – and the connections between them. Both are related to the notion of meta algorithms. While typically we think of an algorithm as tailor-made for a particular problem, there are some generic approaches to design an “algorithmic scheme” or a meta algorithm that can be applied to a large class of problems. The UGC predicts that a particular such meta-algorithm, which we call in the survey simply the “UGC Meta-Algorithm”, would in fact give optimal approximation guarantees among all efficient algorithms for a large class of problems. This is very exciting from the point of view of complexity theory, not simply because it gives many hardness-of-approximation results in one shot, but because in some sense it gives a complete understanding of what makes problems of certain domains easy or hard.

The SOS method is another meta algorithm that can be applied to the same cases. It is parameterized by a number \ell called its degree. Using a higher degree can give potentially better approximation guarantees, at the expense of longer running time: running the method with degree \ell on input of length n takes n^{O(\ell)} time. The UGC Meta-Algorithm in fact corresponds to the base case (which is degree \ell=2) of the SOS method, and so the UGC predicts that in a great many cases, using constant (or even mildly super-constant) degree larger than 2 will not help get better approximation guarantees. We discuss in the survey a few recent results that, although falling short of refuting the UGC, cast some doubts on this prediction by showing that larger degree can sometimes help in somewhat similar contexts. I will give a talk on the same topic in the mathematical aspects of computer science section of the 2014 International Congress of Mathematicians to be held in Seoul, South Korea, August 13-21, 2014. As you can see, there will be some great speakers in this session (and ICM in general), and I hope we will see blog posts on some of those surveys as well.


by Boaz Barak at April 21, 2014 02:58 AM UTC

In order to obtain the best-known guarantees, algorithms are traditionally tailored to the particular problem we want to solve. Two recent developments, the Unique Games Conjecture (UGC) and the Sum-of-Squares (SOS) method, surprisingly suggest that this tailoring is not necessary and that a single efficient algorithm could achieve best possible guarantees for a wide range of different problems. The Unique Games Conjecture (UGC) is a tantalizing conjecture in computational complexity, which, if true, will shed light on the complexity of a great many problems. In particular this conjecture predicts that a single concrete algorithm provides optimal guarantees among all efficient algorithms for a large class of computational problems. The Sum-of-Squares (SOS) method is a general approach for solving systems of polynomial constraints. This approach is studied in several scientific disciplines, including real algebraic geometry, proof complexity, control theory, and mathematical programming, and has found applications in fields as diverse as quantum information theory, formal verification, game theory and many others. We survey some connections that were recently uncovered between the Unique Games Conjecture and the Sum-of-Squares method. In particular, we discuss new tools to rigorously bound the running time of the SOS method for obtaining approximate solutions to hard optimization problems, and how these tools give the potential for the sum-of-squares method to provide new guarantees for many problems of interest, and possibly to even refute the UGC.

at April 21, 2014 02:11 AM UTC

Authors: Paul Accisano, Alper Üngör
Download: PDF
Abstract: Let $P$ be a polygonal curve in $\mathbb{R}^d$ of length $n$, and $S$ be a point-set of size $k$. The Curve/Point Set Matching problem consists of finding a polygonal curve $Q$ on $S$ such that the Fr\'echet distance from $P$ is less than a given $\varepsilon$. We consider eight variations of the problem based on the distance metric used and the omittability or repeatability of the points. We provide closure to a recent series of complexity results for the case where $S$ consists of precise points. More importantly, we formulate a more realistic version of the problem that takes into account measurement errors. This new problem is posed as the matching of a given curve to a set of imprecise points. We prove that all three variations of the problem that are in P when $S$ consists of precise points become NP-complete when $S$ consists of imprecise points. We also discuss approximation results.

at April 21, 2014 12:00 AM UTC

Authors: Robert L. Surowka, Kenneth W. Regan
Download: PDF
Abstract: We extend the Ax-Katz theorem for a single polynomial from finite fields to the rings Z_m with m composite. This extension not only yields the analogous result, but gives significantly higher divisibility bounds. We conjecture what computer runs suggest is the optimal result for any m, and prove a special case of it. The special case is for m = 2^r and polynomials of degree 2. Our results also yield further properties of the solution spaces. Polynomials modulo composites are the focus of some computational complexity lower bound frontiers, while those modulo 2^r arise in the simulation of quantum circuits. We give some prospective applications of this research.

at April 21, 2014 12:40 AM UTC