Theory of Computing Blog Aggregator

Having nothing better to do, I decided to create a twitter meme: updating proverbs for the 21st century. Use the hashtag #21stcentproverbs and join in the fun below.

 

by Suresh Venkatasubramanian (noreply@blogger.com) at May 19, 2013 09:58 PM UTC

Atri Rudra reminds us that the 3rd incarnation of SPARC is soon to be upon us (disclaimer: I'll be speaking at the event):
Efficient and effective transmission, storage, and retrieval of information on a large-scale are among the core technical problems in the modern digital revolution. The massive volume of data necessitates the quest for mathematical and algorithmic methods for efficiently describing, summarizing, synthesizing, and, increasingly more critical, deciding when and how to discard data before storing or transmitting it. Such methods have been developed in two areas: coding theory, and sparse approximation (SA) (and its variants called compressive sensing (CS) and streaming algorithms). 
Coding theory and computational complexity are both well established fields that enjoy fruitful interactions with one another. On the other hand, while significant progress on the SA/CS problem has been made, much of that progress is concentrated on the feasibility of the problems, including a number of algorithmic innovations that leverage coding theory techniques, but a systematic computational complexity treatment of these problems is sorely lacking. The workshop organizers aim to develop a general computational theory of SA and CS (as well as related areas such as group testing) and its relationship to coding theory. This goal can be achieved only by bringing together researchers from a variety of areas. 
The Coding, Complexity and Sparsity workshop (SPARC 13) will be held in Ann Arbor, MI on Aug 5-7. 
These will be hour-long lectures designed to give students an introduction to coding theory, complexity theory/pseudo-randomness, and compressive sensing/streaming algorithms. We will have a poster session during the workshop and everyone is welcome to bring a poster but graduate students and postdocs are especially encouraged to give a poster presentation. 
This is the third incarnation of the workshop and the previous two workshops were also held in Ann Arbor in August of 2011 and 2012. 
Confirmed speakers:
* Jin Yi Cai (University of Wisconsin, Madison)
* Shafi Goldwasser (MIT)
* Piotr Indyk (MIT)
* Swastik Kopparty (Rutgers University)
* Dick Lipton (Georgia Tech)
* Andrew McGregor (University of Massachusetts, Amherst)
* Raghu Meka (IAS)
* Jelani Nelson    (Harvard)
* Eric Price (MIT)
* Christopher Ré (University of Wisconsin, Madison)
* Shubhangi Saraf (Rutgers University)
* Suresh Venkatasubramanian (University of Utah)
* David Woodruff (IBM)
* Mary Wootters    (Michigan)
* Shuheng Zhou (Michigan)

We have some funding for graduate students and postdocs with preference given to those who will be presenting posters. For registration and other details, please look at the workshop webpage: http://eecs.umich.edu/eecs/SPARC2013/

by Suresh Venkatasubramanian (noreply@blogger.com) at May 19, 2013 07:10 AM UTC

850320-austria-photography-camera-auction

You run a single-item sealed bid auction where you sell an old camera. There are three bidders and the value of the camera for each of them is described by a certain (known) random variable: With probability 0.9 the value is 100+x where x is taken uniformly at random from the interval [-1,1]. With probability 0.1 the value is 300+x where x is as before. The 300 value represents the case that the item has a special nostalgic value for the buyer.

The values of the camera to the three bidders are thus i.i.d random variables. (The reason for adding this small random x is to avoid ties, and you can replace the interval [-1,1] with [-ε, ε] for a small ε, if you wish.) If you don’t want to worry about ties you can simply think about the value being 100 with probability 0.9 and 300 wit probability 0.1.

The basic question

The basic questions for you the seller is:

What is the highest expected revenue you, the seller, can guarantee and what is your optimal allocation policy.

I’d like to see the answer for this question. But let me challenge your intuition with two simpler questions.

1) Can the seller guarantee an expected revenue of 120  or more?

2) What (roughly) is the optimal allocation policy

a) Highest bidder wins.

b) Highest bidder wins if his bid passes some reserve price.

c) The item is allocated at random to the bidders with probabilities depending on their bids.

Background: Myerson’s paper and his revenue equivalence theorem

The most relevant paper to this post is a well-known paper Optimal auction design by Roger Myerson. Myerson considered the case of a single-item sealed-bid auction where the bidders’ values for the item are independent identical random variable.

Note that I did not specify the price that the winning bidder is going to pay for the camera. The reason is that according to a famous theorem by Myerson (referred to as the revenue equivalence theorem), when the bidders are strategic, the expected revenues for the seller are determined by the allocation rule and are independent from the pricing policy! (Well, you need to assume a reasonable pricing policy…)

For example, if the item is allocated to the highest bidder then the expected price will be the second highest price. If the price is determined by the second highest bid (Vickery’s auction) then each bidder has no incentive to give a bid which is different from its value. But even if the item will be allocated to the first bidder for the highest price, the expected revenues will still be the same! When you analyze an auction mechanism like ours you can assume that the payments are set in a way that the bidders have no incentive not to bid the true value the camera has. This is sometimes referred to as the revelation principle.

Myerson considered a mechanism which sometimes lead to higher revenues compared to allocating the item to the highest bidder: The seller set a reserve price and the item is allocated to the highest bidder if the bid passes this reserve price, and is not allocated at all otherwise. In the paper Myerson also considered more complicated allocation rules which are important in the analysis where the item is allocated to bidders according to some probabilities based on their bids.

Polls

This time we have two questions and two polls:

Once again this is a game-theory question. I hope it will lead to interesting discussion like the earlier one on tic-tac-toe.

A little more Background: Auctions and Vickery’s second price auction.

(From Wikipedia) An auction is a process of buying and selling goods or services by offering them up for bid, taking bids, and then selling the item to the highest bidder. In economic theory, an auction may refer to any mechanism or set of trading rules for exchange.

In our case, we consider an auction of a single item (the camera) and each bidder is giving a sealed bid.

(Again from Wikipedea) A Vickrey auction is a type of sealed-bid auction, where bidders submit written bids without knowing the bid of the other people in the auction, and in which the highest bidder wins, but the price paid is the second-highest bid. The auction was first described academically by Columbia University professor William Vickrey in 1961 though it had been used by stamp collectors since 1893.[2] This type of auction is strategically similar to an English auction, and gives bidders an incentive to bid their true value.


by Gil Kalai at May 18, 2013 08:26 PM UTC

idledisc500

A commentator named Oz proposed the following question: You have a box with n red balls and n blue balls. You take out each time a ball at random but, if the ball was red, you put it back in the box and take out a blue ball. If the ball was blue, you put it back in the box and take out a red ball.

You keep doing it until left only with balls of the same color. How many balls will be left (as a function of n)?

ozpoll

Peter Shor wrote in a comment “I’m fairly sure that there is not enough bias to get cn, but it intuitively seems far too much bias to still be c \sqrt{n}. I want to say n^c. At a wild guess, it’s either c = \frac{2}{3}or c = \frac{3}{4}, since those are the simplest exponents between \frac{1}{2} and 1.”  The comment followed by a heuristic argument of Kevin Kostelo and computer experiments by Lior Silberman that supported the answer n^{3/4}.

This is correct!

Good intuition, Peter!

In our student probability day a couple of weeks ago, Yuval Peres told me the origin of this problem. The way it was originally posed by D. Williams and P. McIlroy used roughly the following story (I modified it a little): There are two groups of n gunmen that shoot at each other. Once a gunman is hit he stops shooting, and leaves the place happily and peacefully. How many gunmen will be left after all gunmen in one team had left. The problem was solved by Kingman and Volkov in this paper.

J. F. C. Kingman, and S. E. Volkov,  Solution to the OK Corral model via decoupling of Friedman’s urn, J. Theoret. Probab. 16 (2003), no. 1, 267–276. A nice presentation of the result entitled: Internal erosion and the exponent 3/4 was given by Lionel Levine and Yuval Peres.


by Gil Kalai at May 17, 2013 09:09 AM UTC

Today's new Wikipedia articles: Hanner polytopes and Kalai's 3d conjecture.

The 3d conjecture states that a centrally symmetric d-dimensional polytope must have at least 3d faces (of all dimensions including the d-dimensional polytope itself as a face but not including the –1-dimensional empty set). For instance, the cube has 8 vertices, 12 edges, 6 squares, and 1 cube as faces; 8 + 12 + 6 + 1 = 27 = 33.

The Hanner polytopes include the cubes, are closed under Cartesian products and duality, and if the conjecture is true have the fewest possible numbers of faces of all centrally symmetric d-dimensional polytopes. They also have some other interesting properties. In 3d there are only the cube and the octahedron, but in higher dimensions there are a lot more of them.

at May 17, 2013 07:37 AM UTC

Earlier this week Georgia Tech announced the Online Masters of Science in Computer Science, a MOOCs-based degree with a total tuition of about $7000. This degree came out of a collaboration between Sebastian Thrun of Udacity and my dean Zvi Galil with some significant financial support from AT&T. We've spent several months getting faculty input and buy-in to the program and we're very excited about taking a new leading role in the MOOCs revolution.

We will roll out slowly, with a smaller scale courses to corporate affiliates to work out the kinks and the plan to go to the general public in fall 2014. Read the FAQ to get more information about the program.

It's been fun watching the development of this degree, in particular hearing Sebastian talk about his MOOC 2.0 plans to scale courses with a small amount of expense that we pull from the tuition. No doubt we will have challenges in making this degree truly work at a large scale but I'm truly bullish that we'll a self-sustaining quality Masters program that will reach tens if not hundreds of thousands of students.

Here we go.

by Lance Fortnow (noreply@blogger.com) at May 17, 2013 02:03 AM UTC

Authors: Dilip K. Prasad
Download: PDF
Abstract: This thesis presents important insights and concepts related to the topic of the extraction of geometric primitives from the edge contours of digital images. Three specific problems related to this topic have been studied, viz., polygonal approximation of digital curves, tangent estimation of digital curves, and ellipse fitting anddetection from digital curves. For the problem of polygonal approximation, two fundamental problems have been addressed. First, the nature of the performance evaluation metrics in relation to the local and global fitting characteristics has been studied. Second, an explicit error bound of the error introduced by digitizing a continuous line segment has been derived and used to propose a generic non-heuristic parameter independent framework which can be used in several dominant point detection methods. For the problem of tangent estimation for digital curves, a simple method of tangent estimation has been proposed. It is shown that the method has a definite upper bound of the error for conic digital curves. It has been shown that the method performs better than almost all (seventy two) existing tangent estimation methods for conic as well as several non-conic digital curves. For the problem of fitting ellipses on digital curves, a geometric distance minimization model has been considered. An unconstrained, linear, non-iterative, and numerically stable ellipse fitting method has been proposed and it has been shown that the proposed method has better selectivity for elliptic digital curves (high true positive and low false positive) as compared to several other ellipse fitting methods. For the problem of detecting ellipses in a set of digital curves, several innovative and fast pre-processing, grouping, and hypotheses evaluation concepts applicable for digital curves have been proposed and combined to form an ellipse detection method.

at May 17, 2013 12:40 AM UTC

Authors: Anna Blasiak
Download: PDF
Abstract: We introduce a new technique to certify lower bounds on the multicut size using network coding. In directed networks the network coding rate is not a lower bound on the multicut, but we identify a class of networks on which the rate is equal to the size of the minimum multicut and show this class is closed under the strong graph product. We then show that the famous construction of Saks et al. that gives a $\Theta(k)$ gap between the multicut and the multicommodity flow rate is contained in this class. This allows us to apply our result to strengthen their multicut lower bound, determine the exact value of the minimum multicut, and give an optimal network coding solution with rate matching the multicut.

at May 17, 2013 12:41 AM UTC

Authors: Lorenzo De Stefani, Francesco Silvestri
Download: PDF
Abstract: We extend the Faulty RAM model by Italiano and Finocchi (2008) by adding a safe memory of arbitrary size $S$, and we then derive tradeoffs between the performance of resilient algorithmic techniques and the size of the safe memory. In particular, we describe a resilient algorithm for sorting $n$ entries in $\BO{n\log n+\alpha (\delta/S + \log S)}$ time using $\BT{S}$ safe memory locations, where $\alpha\leq \delta$ denotes the actual number of faults. Our algorithm outperforms previous resilient sorting algorithms which do not exploit the available safe memory and require $\BO{n\log n+ \alpha\delta}$ time. Finally, we exploit our merging algorithm for deriving a resilient priority queue data structure requiring $\BO{\log n + \delta/S}$ amortized time per operation. Our priority queue improves the $\BO{\log n + \delta}$ amortized time required by the state of the art.

at May 17, 2013 12:41 AM UTC

Authors: Zahra Jafargholi, Emanuele Viola
Download: PDF
Abstract: We show that if one can solve 3SUM on a set of size n in time n^{1+\e} then one can list t triangles in a graph with m edges in time O(m^{1+\e}t^{1/3-\e/3}). This is a reversal of Patrascu's reduction from 3SUM to listing triangles (STOC '10). Our result builds on and extends works by the Paghs (PODS '06) and by Vassilevska and Williams (FOCS '10). We make our reductions deterministic using tools from pseudorandomness.

We then re-execute both Patrascu's reduction and our reversal for the variant 3XOR of 3SUM where integer summation is replaced by bit-wise xor. As a corollary we obtain that if 3XOR is solvable in linear time but 3SUM requires quadratic randomized time, or vice versa, then the randomized time complexity of listing m triangles in a graph with $m$ edges is m^{4/3} up to a factor m^\alpha for any \alpha > 0.

at May 17, 2013 12:00 AM UTC

Authors: Fada Li, Wansu Bao, Xiangqun Fu, Yuchao Zhang, Tan Li
Download: PDF
Abstract: Learning with Errors (LWE) problems are the foundations for numerous applications in lattice-based cryptography and are prov-ably as hard as approximate lattice problems in the worst case. The fastest known classical algorithm takes exponential time, and prior to our work no faster quantum algorithm was known. It would therefore be highly desirable to make algorithmic improvement on LWE. Here we propose a subexponential-time quantum algorithm for LWE problem.The algorithm is based on a reduction to a two point problem and a new reduction from two point problem to dihedral coset problem. Our algorithm introduces the method that may be applicable to other lattice problems.

at May 17, 2013 12:40 AM UTC

Authors: Sepp Hartung, Christian Komusiewicz, André Nichterlein, Ondrej Suchý
Download: PDF
Abstract: The NP-hard 2-Club problem is, given an undirected graph G=(V,E) and l\in N, to decide whether there is a vertex set S\subseteq V of size at least l such that the induced subgraph G[S] has diameter at most two. We make progress towards a systematic classification of the complexity of 2-Club with respect to a hierarchy of prominent structural graph parameters. First, we present the following tight NP-hardness results: 2-Club is NP-hard on graphs that become bipartite by deleting one vertex, on graphs that can be covered by three cliques, and on graphs with domination number two and diameter three. Then, we consider the parameter h-index of the input graph. This parameter is motivated by real-world instances and the fact that 2-Club is fixed-parameter tractable with respect to the larger parameter maximum degree. We present an algorithm that solves 2-Club in |V|^{f(k)} time with k being the h-index. By showing W[1]-hardness for this parameter, we provide evidence that the above algorithm cannot be improved to a fixed-parameter algorithm. Furthermore, the reduction used for this hardness result can be modified to show that 2-Club is NP-hard if the input graph has constant degeneracy. Finally, we show that 2-Club is fixed-parameter tractable with respect to distance to cographs.

at May 17, 2013 12:00 AM UTC

Authors: Jianhang Gao, Qing Zhao, Ananthram Swami
Download: PDF
Abstract: We formulate and study the thinnest path problem in wireless ad hoc networks. The objective is to find a path from the source to the destination that results in the minimum number of nodes overhearing the message by carefully choosing the relaying nodes and their corresponding transmission power. We adopt a directed hypergraph model of the problem and establish the NP-completeness of the problem in 2-D networks. We then develop two polynomial-time approximation algorithms that offer $\sqrt{\frac{n}{2}}$ and $\frac{n}{2\sqrt{n-1}}$ for general directed hypergraphs (which can model non-isomorphic signal propagation in space) and constant approximation ratios for ring hypergraphs (which result from isomorphic signal propagation). We also consider the thinnest path problem in 1-D networks and 1-D networks embedded in 2-D field of eavesdroppers with arbitrary unknown locations (the so-called 1.5-D networks). We propose a linear-complexity algorithm based on nested backward induction that obtains the optimal solution to both 1-D and 1.5-D networks. In particular, no algorithm, even with complete location information of the eavesdroppers, can obtain a thinner path than the proposed algorithm which does not require the knowledge of eavesdropper locations.

at May 17, 2013 12:40 AM UTC

Authors: Manuel Gomez Rodriguez, Jure Leskovec, Bernhard Schoelkopf
Download: PDF
Abstract: Networks provide a skeleton for the spread of contagions, like, information, ideas, behaviors and diseases. Many times networks over which contagions diffuse are unobserved and need to be inferred. Here we apply survival theory to develop general additive and multiplicative risk models under which the network inference problems can be solved efficiently by exploiting their convexity. Our additive risk model generalizes several existing network inference models. We show all these models are particular cases of our more general model. Our multiplicative model allows for modeling scenarios in which a node can either increase or decrease the risk of activation of another node, in contrast with previous approaches, which consider only positive risk increments. We evaluate the performance of our network inference algorithms on large synthetic and real cascade datasets, and show that our models are able to predict the length and duration of cascades in real data.

at May 17, 2013 12:41 AM UTC

Update (May 17): Daniel Lidar emailed me to clarify his views about error-correction and the viability of D-Wave’s approach.  He invited me to share his clarification with others—something that I’m delighted to do, since I agree with him wholeheartedly.  Without further ado, here’s what Lidar says:

I don’t believe D-Wave’s approach is scalable without error correction.  I believe that the incorporation of error correction is a necessary condition in order to ever achieve a speedup with D-Wave’s machines, and I don’t believe D-Wave’s machines are any different from other types of quantum information processing in this regard.  I have repeatedly made this point to D-Wave over several years, and I hope that in the future their designs will allow more flexibility in the incorporation of error correction.

Lidar also clarified that he not only doesn’t dispute what Matthias Troyer told me about the lack of speedup of the D-Wave device compared to classical simulated annealing in their experiments, but “fully agrees, endorses, and approves” of it—and indeed, that he himself was part of the team that did the comparison.

In other news, this Hacker News thread, which features clear, comprehending discussions of this blog post and the backstory that led up to it, has helped to restore my faith in humanity.


Two years ago almost to the day, I announced my retirement as Chief D-Wave Skeptic.  But—as many readers predicted at the time—recent events (and the contents of my inbox!) have given me no choice except to resume my post.  In an all-too-familiar pattern, multiple rounds of D-Wave-related hype have made it all over the world before the truth has had time to put its pants on and drop its daughter off in daycare.  And the current hype is particularly a shame, because once one slices through all the layers of ugh—the rigged comparisons, the “dramatic announcements” that mean nothing, the lazy journalists cherry-picking what they want to hear and ignoring the inconvenient bits—there really has been a huge scientific advance this past month in characterizing the D-Wave devices.  I’m speaking about the experiments on the D-Wave One installed at USC, the main results of which finally appeared in April.  Two of the coauthors of this new work—Matthias Troyer and Daniel Lidar—were at MIT recently to speak about their results, Troyer last week and Lidar this Tuesday.  Intriguingly, despite being coauthors on the same paper, Troyer and Lidar have very different interpretations of what their results mean, but we’ll get to that later.  For now, let me summarize what I think their work has established.

Evidence for Quantum Annealing Behavior

For the first time, we have evidence that the D-Wave One is doing what should be described as “quantum annealing” rather than “classical annealing” on more than 100 qubits.  (Note that D-Wave itself now speaks about “quantum annealing” rather than “quantum adiabatic optimization.”  The difference between the two is that the adiabatic algorithm runs coherently, at zero temperature, while quantum annealing is a “messier” version in which the qubits are strongly coupled to their environment throughout, but still maintain some quantum coherence.)  The evidence for quantum annealing behavior is still extremely indirect, but despite my “Chief Skeptic” role, I’m ready to accept what the evidence indicates with essentially no hesitation.

So what is the evidence?  Basically, the USC group ran the D-Wave One on a large number of randomly generated instances of what I’ll call the “D-Wave problem”: namely, the problem of finding the lowest-energy configuration of an Ising spin glass, with nearest-neighbor interactions that correspond to the D-Wave chip’s particular topology.  Of course, restricting attention to this “D-Wave problem” tilts the tables heavily in D-Wave’s favor, but no matter: scientifically, it makes a lot more sense than trying to encode Sudoku puzzles or something like that.  Anyway, the group then looked at the distribution of success probabilities when each instance was repeatedly fed to the D-Wave machine.  For example, would the randomly-generated instances fall into one giant clump, with a few outlying instances that were especially easy or especially hard for the machine?  Surprisingly, they found that the answer was no: the pattern was strongly bimodal, with most instances either extremely easy or extremely hard, and few instances in between.  Next, the group fed the same instances to Quantum Monte Carlo: a standard classical algorithm that uses Wick rotation to find the ground states of “stoquastic Hamiltonians,” the particular type of quantum evolution that the D-Wave machine is claimed to implement.  When they did that, they found exactly the same bimodal pattern that they found with the D-Wave machine.  Finally they fed the instances to a classical simulated annealing program—but there they found a “unimodal” distribution, not a bimodal one.  So, their conclusion is that whatever the D-Wave machine is doing, it’s more similar to Quantum Monte Carlo than it is to classical simulated annealing.

Curiously, we don’t yet have any hint of a theoretical explanation for why Quantum Monte Carlo should give rise to a bimodal distribution, while classical simulating annealing should give rise to a unimodal one.  The USC group simply observed the pattern empirically (as far as I know, they’re the first to do so), then took advantage of it to characterize the D-Wave machine.  I regard explaining this pattern as an outstanding open problem raised by their work.

In any case, if we accept that the D-Wave One is doing “quantum annealing,” then despite the absence of a Bell-inequality violation or other direct evidence, it’s reasonably safe to infer that there should be large-scale entanglement in the device.  I.e., the true quantum state is no doubt extremely mixed, but there’s no particular reason to believe we could decompose that state into a mixture of product states.  For years, I tirelessly repeated that D-Wave hadn’t even provided evidence that its qubits were entangled—and that, while you can have entanglement with no quantum speedup, you can’t possibly have a quantum speedup without at least the capacity to generate entanglement.  Now, I’d say, D-Wave finally has cleared the evidence-for-entanglement bar—and, while they’re not the first to do so with superconducting qubits, they’re certainly the first to do so with so many superconducting qubits.  So I congratulate D-Wave on this accomplishment.  If this had been advertised from the start as a scientific research project—”of course we’re a long way from QC being practical; no one would ever claim otherwise; but as a first step, we’ve shown experimentally that we can entangle 100 superconducting qubits with controllable couplings”—my reaction would’ve been, “cool!”  (Similar to my reaction to any number of other steps toward scalable QC being reported by research groups all over the world.)

No Speedup Compared to Classical Simulated Annealing

But of course, D-Wave’s claims—and the claims being made on its behalf by the Hype-Industrial Complex—are far more aggressive than that.  And so we come to the part of this post that has not been pre-approved by the International D-Wave Hype Repeaters Association.  Namely, the same USC paper that reported the quantum annealing behavior of the D-Wave One, also showed no speed advantage whatsoever for quantum annealing over classical simulated annealing.  In more detail, Matthias Troyer’s group spent a few months carefully studying the D-Wave problem—after which, they were able to write optimized simulated annealing code that solves the D-Wave problem on a normal, off-the-shelf classical computer, about 15 times faster than the D-Wave machine itself solves the D-Wave problem!  Of course, if you wanted even more classical speedup than that, then you could simply add more processors to your classical computer, for only a tiny fraction of the ~$10 million that a D-Wave One would set you back.

Some people might claim it’s “unfair” to optimize the classical simulated annealing code to take advantage of the quirks of the D-Wave problem.  But think about it this way: D-Wave has spent ~$100 million, and hundreds of person-years, optimizing the hell out of a special-purpose annealing device, with the sole aim of solving this one problem that D-Wave itself defined.  So if we’re serious about comparing the results to a classical computer, isn’t it reasonable to have one professor and a few postdocs spend a few months optimizing the classical code as well?

As I said, besides simulated annealing, the USC group also compared the D-Wave One’s performance against a classical implementation of Quantum Monte Carlo.  And maybe not surprisingly, the D-Wave machine was faster than a “direct classical simulation of itself” (I can’t remember how many times faster, and couldn’t find that information in the paper).  But even here, there’s a delicious irony.  The only reason the USC group was able to compare the D-Wave one against QMC at all, is that QMC is efficiently implementable on a classical computer!  (Albeit probably with a large constant overhead compared to running the D-Wave annealer itself—hence the superior performance of classical simulated annealing over QMC.)  This means that, if the D-Wave machine can be understood as reaching essentially the same results as QMC (technically, “QMC with no sign problem”), then there’s no real hope for using the D-Wave machine to get an asymptotic speedup over a classical computer.  The race between the D-Wave machine and classical simulations of the machine would then necessarily be a cat-and-mouse game, a battle of constant factors with no clear asymptotic victor.  (Some people might conjecture that it will also be a “Tom & Jerry game,” the kind where the classical mouse always gets the better of the quantum cat.)

At this point, it’s important to give a hearing to three possible counterarguments to what I’ve written above.

The first counterargument is that, if you plot both the runtime of simulated annealing and the runtime of the D-Wave machine as functions of the instance size n, you find that, while simulated annealing is faster in absolute terms, it can look like the curve for the D-Wave machine is less steep.  Over on the blog “nextbigfuture”, an apparent trend of this kind has been fearlessly extrapolated to predict that with 512 qubits, the D-Wave machine will be 10 billion times faster than a classical computer.  But there’s a tiny fly in the ointment.  As Troyer carefully explained to me last week, the “slow growth rate” of the D-Wave machine’s runtime is, ironically, basically an artifact of the machine being run too slowly on small values of n.  Run the D-Wave machine as fast as it can run for small n, and the difference in the slopes disappears, with only the constant-factor advantage for simulated annealing remaining.  In short, there seems to be no evidence, at present, that the D-Wave machine is going to overtake simulated annealing for any instance size.

The second counterargument is that the correlation between the two “bimodal distributions”—that for the D-Wave machine and that for the Quantum Monte Carlo simulation—is not perfect.  In other words, there are a few instances (not many) that QMC solves faster than the D-Wave machine, and likewise a few instances that the D-Wave machine solves faster than QMC.  Not surprisingly, the latter fact has been eagerly seized on by the D-Wave boosters (“hey, sometimes the machine does better!”).  But Troyer has a simple and hilarious response to that.  Namely, he found that his group’s QMC code did a better job of correlating with the D-Wave machine, than the D-Wave machine did of correlating with itself!  In other words, calibration errors seem entirely sufficient to explain the variation in performance, with no need to posit any special class of instances (however small) on which the D-Wave machine dramatically outperforms QMC.

The third counterargument is just the banal one: the USC experiment was only one experiment with one set of instances (albeit, a set one might have thought would be heavily biased toward D-Wave).  There’s no proof that, in the future, it won’t be discovered that the D-Wave machine does something more than QMC, and that there’s some (perhaps specially-designed) set of instances on which the D-Wave machine asymptotically outperforms both QMC and Troyer’s simulated annealing code.  (Indeed, I gather that folks at D-Wave are now assiduously looking for such instances.)  Well, I concede that almost anything is possible in the future—but “these experiments, while not supporting D-Wave’s claims about the usefulness of its devices, also don’t conclusively disprove those claims” is a very different message than what’s currently making it into the press.

Comparison to CPLEX is Rigged

Unfortunately, the USC paper is not the one that’s gotten the most press attention—perhaps because half of it inconveniently told the hypesters something they didn’t want to hear (“no speedup”).  Instead, journalists have preferred a paper released this week by Catherine McGeoch and Cong Wang, which reports that quantum annealing running on the D-Wave machine outperformed the CPLEX optimization package running on a classical computer by a factor of ~3600, on Ising spin problems involving 439 bits.  Wow!  That sounds awesome!  But before rushing to press, let’s pause to ask ourselves: how can we reconcile this with the USC group’s result of no speedup?

The answer turns out to be painfully simple.  CPLEX is a general-purpose, off-the-shelf exact optimization package.  Of course an exact solver can’t compete against quantum annealing—or for that matter, against classical annealing or other classical heuristics!  Noticing this problem, McGeoch and Wang do also compare the D-Wave machine against tabu search, a classical heuristic algorithm.  When they do so, they find that an advantage for the D-Wave machine persists, but it becomes much, much smaller (they didn’t report the exact time comparison).  Amusingly, they write in their “Conclusions and Future Work” section:

It would of course be interesting to see if highly tuned implementations of, say, tabu search or simulated annealing could compete with Blackbox or even QA [i.e., the D-Wave machines] on QUBO [quadratic binary optimization] problems; some preliminary work on this question is underway.

As I said above, at the time McGeoch and Wang’s paper was released to the media (though maybe not at the time it was written?), the “highly tuned implementation” of simulated annealing that they ask for had already been written and tested, and the result was that it outperformed the D-Wave machine on all instance sizes tested.  In other words, their comparison to CPLEX had already been superseded by a much more informative comparison—one that gave the “opposite” result—before it ever became public.  For obvious reasons, most press reports have simply ignored this fact.

Troyer, Lidar, and Stone Soup

Much of what I’ve written in this post, I learned by talking to Matthias Troyer—the man who carefully experimented with the D-Wave machine and figured out how to beat it using simulated annealing, and who I regard as probably the world’s #1 expert right now on what exactly the machine does.  Troyer wasn’t shy about sharing his opinions, and while couched with qualifications, they tended toward extremely skeptical.  For example, Troyer conjectured that, if D-Wave ultimately succeeds in getting a speedup over classical computers in a fair comparison, then it will probably be by improving coherence and calibration, incorporating error-correction, and doing other things that “traditional,” “academic” quantum computing researchers had said all along would need to be done.

As I said, Danny Lidar is another coauthor on the USC paper, and also recently visited MIT to speak.  Lidar and Troyer agree on the basic facts—yet Lidar noticeably differed from Troyer, in trying to give each fact the most “pro-D-Wave spin” it could possibly support.  Lidar spoke at our quantum group meeting, not about the D-Wave vs. simulated annealing performance comparison (which he agrees with), but about a proposal of his for incorporating quantum error-correction into the D-Wave device, together with some experimental results.  He presented his proposal, not as a reductio ad absurdum of D-Wave’s entire philosophy, but rather as a positive opportunity to get a quantum speedup using D-Wave’s approach.

So, to summarize my current assessment of the situation: yes, absolutely, D-Wave might someday succeed—ironically, by adapting the very ideas from “the gate model” that its entire business plan has been based on avoiding, and that D-Wave founder Geordie Rose has loudly denigrated for D-Wave’s entire history!  If that’s what happens, then I predict that science writers, and blogs like “nextbigfuture,” will announce from megaphones that D-Wave has been vindicated at last, while its narrow-minded, theorem-obsessed, ivory-tower academic naysayers now have egg all over their faces.  No one will care that the path to success—through quantum error-correction and so on—actually proved the academic critics right, and that D-Wave’s “vindication” was precisely like that of the deliciousness of stone soup in the old folktale.  As for myself, I’ll probably bang my head on my desk until I sustain so much brain damage that I no longer care either.  But at least I’ll still have tenure, and the world will have quantum computers.

The Messiah’s Quantum Annealer

Over the past few days, I’ve explained the above to at least six different journalists who asked.  And I’ve repeatedly gotten a striking response: “What you say makes sense—but then why are all these prestigious people and companies investing in D-Wave?  Why did Bo Ewald, a prominent Silicon Valley insider, recently join D-Wave as president of its US operations?  Why the deal with Lockheed Martin?  Why the huge deal with NASA and Google, just announced today?  What’s your reaction to all this news?”

My reaction, I confess, is simple.  I don’t care—I actually told them this—if the former Pope Benedict has ended his retirement to become D-Wave’s new marketing director.  I don’t care if the Messiah has come to Earth on a flaming chariot, not to usher in an age of peace but simply to spend $10 million on D-Wave’s new Vesuvius chip.  And if you imagine that I’ll ever care about such things, then you obviously don’t know much about me.  I’ll tell you what: if peer pressure is where it’s at, then come to me with the news that Umesh Vazirani, or Greg Kuperberg, or Matthias Troyer is now convinced, based on the latest evidence, that D-Wave’s chip asymptotically outperforms simulated annealing in a fair comparison, and does so because of quantum effects.  Any one such scientist’s considered opinion would mean more to me than 500,000 business deals.

The Argument from Consequences

Let me end this post with an argument that several of my friends in physics have explicitly made to me—not in the exact words below but in similar ones.

“Look, Scott, let the investors, government bureaucrats, and gullible laypeople believe whatever they want—and let D-Wave keep telling them whatever’s necessary to stay in business.  It’s unsportsmanlike and uncollegial of you to hold D-Wave’s scientists accountable for whatever wild claims their company’s PR department might make.  After all, we’re in this game too!  Our universities put out all sorts of overhyped press releases, but we don’t complain because we know that it’s done for our benefit.  Besides, you’d doubtless be trumpeting the same misleading claims, if you were in D-Wave’s shoes and needed the cash infusions to survive.  Anyway, who really cares whether there’s a quantum speedup yet or no quantum speedup?  At least D-Wave is out there trying to build a scalable quantum computer, and getting millions of dollars from Jeff Bezos, Lockheed, Google, the CIA, etc. etc. to do so—resources more of which would be directed our way if we showed a more cooperative attitude!  If we care about scalable QCs ever getting built, then the wise course is to celebrate what D-Wave has done—they just demonstrated quantum annealing on 100 qubits, for crying out loud!  So let’s all be grownups here, focus on the science, and ignore the marketing buzz as so much meaningless noise—just like a tennis player might ignore his opponent’s trash-talking (‘your mother is a whore,’ etc.) and focus on the game.”

I get this argument: really, I do.  I even concede that there’s something to be said for it.  But let me now offer a contrary argument for the reader’s consideration.

Suppose that, unlike in the “stone soup” scenario I outlined above, it eventually becomes clear that quantum annealing can be made to work on thousands of qubits, but that it’s a dead end as far as getting a quantum speedup is concerned.  Suppose the evidence piles up that simulated annealing on a conventional computer will continue to beat quantum annealing, if even the slightest effort is put into optimizing the classical annealing code.  If that happens, then I predict that the very same people now hyping D-Wave will turn around and—without the slightest acknowledgment of error on their part—declare that the entire field of quantum computing has now been unmasked as a mirage, a scam, and a chimera.  The same pointy-haired bosses who now flock toward quantum computing, will flock away from it just as quickly and as uncomprehendingly.  Academic QC programs will be decimated, despite the slow but genuine progress that they’d been making the entire time in a “parallel universe” from D-Wave.  People’s contempt for academia is such that, while a D-Wave success would be trumpeted as its alone, a D-Wave failure would be blamed on the entire QC community.

When it comes down to it, that’s the reason why I care about this matter enough to have served as “Chief D-Wave Skeptic” from 2007 to 2011, and enough to resume my post today.  As I’ve said many times, I really, genuinely hope that D-Wave succeeds at building a QC that achieves an unambiguous speedup!  I even hope the academic QC community will contribute to D-Wave’s success, by doing careful independent studies like the USC group did, and by coming up with proposals like Lidar’s for how D-Wave could move forward.  On the other hand, in the strange, unlikely event that D-Wave doesn’t succeed, I’d like people to know that many of us in the QC community were doing what academics are supposed to do, which is to be skeptical and not leave obvious questions unasked.  I’d like them to know that some of us simply tried to understand and describe what we saw in front of us—changing our opinions repeatedly as new evidence came in, but disregarding “meta-arguments” like my physicist friends’ above.  The reason I can joke about how easy it is to bribe me is that it’s actually kind of hard.

by Scott at May 16, 2013 05:41 PM UTC


Can we help avoid parallel repetition of mistakes?

irit

Irit Dinur has recently again shown a wonderful skill at re-conceptualizing an area that had seemingly been well worked out. A notable previous instance was her re-casting the proof of the PCP Theorem as a progressive amplification. Now she and David Steurer have posted a new paper whose title, “Analytical Approach to Parallel Repetition,” introduces a new framework. The subject of parallel repetition, which they call “a basic product operation for one-round two-player games,” is distinctive in having its genesis in a mistake made in a paper—a trap of automatic thinking. Lance Fortnow and Mike Sipser thought that executing multiple instances of a randomized protocol in parallel would have the same power-law reduction in the probability of overall failure as executing them independently in sequence, overlooking how the strategies in the parallel cases could be dependent and exploit this.

Today we talk about similar traps that people can fall into even at advanced graduate level. How might they be avoided?

Truth-in-blogging note: this post is really about a different case of products and independence, and most of it was written months ago. It was lacking an intro section with someone to feature according to our “blog invariant,” and we also wanted a few short examples of graduate-student traps in computational theory and mathematics before progressing to the main one. The parallel repetition example came not only first to mind, but also second and third and fourth… as Dick and I struggled to think of more good ones. Lance’s haute faute has already been mentioned a few times on this blog, and I thought it would make a tiresome and repetitious parallel to my own “trap.” It didn’t help that the neat example I saw online years ago which furnished the phrase “graduate-student trap”—but which I didn’t preserve and forgot—has evidently vanished into unsearchability.

I was trying to decide between leading with the late-medieval mathematician-theologian Nicholas de Cusa for his advice on not pretending to have completed knowledge, or alternately a colleague in Buffalo who has compiled a good graduate-student advice list. Lance has similar advice, and when looking for it I spotted the mention of Dinur in his Twitter feed—actually Lance’s re-tweet of one by my former student D. Sivakumar. Voilà—Lance’s example it was. Thanks, Irit.

Traps

What is a trap? Pitfalls and paradoxes abound in mathematics and the sciences, and surmounting them is just part of acquiring the literature. Sometimes it is a confounding of preconceived expectations, but it is hard to find a way of defining such expectations that works for everybody or even most people. What makes a trap, in my opinion, is there being concrete indications in the concepts, in their contextual use, in their notation, or in the literature itself that run counter to the truth. Here are what strike Dick and me as a few simple but significant examples:

{\bullet} Square root is not a function. It is written like a function, but isn’t. Here is an example of what you can “do” by conveniently forgetting this: {-1/1 = 1/-1}, so take square roots of both sides. You get

\displaystyle \frac{\sqrt{-1}}{\sqrt{1}} = \frac{\sqrt{1}}{\sqrt{-1}}\quad\text{so}\quad \frac{i}{1} = \frac{1}{i}\quad\text{so}\quad i^2 = 1.

This contradicts the definition {i^2 = -1}.

{\bullet} Not all matrices are diagonalizable. Since even many singular matrices are diagonalizable, it is easy to forget this is not true in general. If it were true, then there would be a really quick proof that a matrix {A} always satisfies its characteristic polynomial {p}. Namely, let {A = P D P^{-1}} with {D} the diagonal matrix of eigenvalues. Then on substituting the right-hand side into the formula {p(A)}, all the {P}‘s and {P^{-1}}‘s cancel except for one {P} in front and one {P^{-1}} in back. The rest is the component-wise evaluation {p(\lambda)} on each eigenvalue {\lambda} in {D}, which identically vanishes, leaving the all-zero matrix.

Well this is often not a bad error to make. Every matrix {A} has arbitrarily close perturbed forms {A'} that are diagonalizable, indeed have distinct eigenvalues. The above proof gives {p'(A') = 0} where the characteristic polynomial {p'} is coefficient-wise close to {p}. Continuity then implies {p(A) = 0}.

{\bullet} {\mathbb{Z}_{p^k}} is not the same as the field {GF(p^k)}. The former is not a field, as it has zero divisors. The multiplicative subgroup formed by the elements that are not multiples of {p} is not a field either. But this is again not always a bad error to make, even in crypto. A lot of properties and problems are similar between the structures.

These are really at high school or undergraduate level, before the research stage. What kind of traps matter at research level?

My Trap

My own strongest feeling of falling into a “graduate student trap” came in October 2006, as I began my work on statistical claims of cheating with computers at chess that had arisen during the world championship match that month and before. I modeled a human player and a computer as distributions {P} and {Q} over the choices of available moves in game positions. Cheating would depend on how close the ensemble of played moves was to {P} vis-à-vis {Q}, so I wanted a suitable distance measure {d(P,Q)} between distributions. Modeling the computer as a distribution not only allows for different chess-engine program versions and parameter settings, but also for a steady amount of small variation caused by hash collisions—as I described first here and mainly here.

I decided to postulate that for two different (sets of) positions {S_1} and {S_2}, the player’s distributions {P_1,P_2} would be independent, and similarly {Q_1,Q_2} for the computer. This makes the joint distributions {P} and {Q} on {S_1 \cup S_2} satisfy {P = P_1 \times P_2} and {Q = Q_1 \times Q_2}. So that I could group game turns as I wished, I wanted the distance measure to be additive, namely

\displaystyle  d(P,Q) = d(P_1 \times P_2, Q_1 \times Q_2) = d(P_1,Q_1) + d(P_2,Q_2).

The first distance measure I considered, called Kullback-Leibler (K-L) divergence, is defined (on discrete domains {X}) by

\displaystyle  \kappa(P || Q) = \sum_{x \in X}P(x)\ln\frac{P(x)}{Q(x)}.

Its Wikipedia page says straight out that {\kappa} is additive. Great, I thought.

Unfortunately, K-L is not symmetric, and more of concern to me, {\kappa} approaches {+\infty} whenever there are events {x} for which {Q(x)} is tiny but {P(x)} is not. In chess, such events would be moves the computer recognizes as bad but that players still fall into, or are tempted by. This was a concern because chess positions can have many bad moves, so that the “tail” of the move distribution could distort the value of {\kappa}. I could switch around {P} and {Q} to avoid this, but then reasonable moves shunned by players would cause other distortion.

Is Jensen-Shannon Divergence Additive?

Applications employing distributional divergence measures were new to me, but it so happened that my department’s Distinguished Alumni Speaker that month knew something about them. After hearing my issues, he—I won’t name the “guilty party,” though I already did—suggested using Jensen-Shannon (J-S) divergence instead. J-S reduces this distortion by employing the interpolated distribution {R = \frac{1}{2}P + \frac{1}{2}Q}. Then it is defined from two invocations of K-L by

\displaystyle  \eta(P,Q) = \frac{1}{2}\kappa(P || R) + \frac{1}{2}\kappa(Q || R).

This always gives finite values, and is symmetric—hence the use of comma not {||}. Analogous to how the sum of squared differences, which is obviously additive on product vectors, is the square of the Euclidean metric, {\eta} is also the square of a metric. All this, plus the absence of contrary information, plus the frequent words “J-S is a symmetrized and smoothed version of K-L,” naturally made me assume that {\eta} was additive. Grateful for the tip, I happily started on the machinery to apply it for chess.

A week later I started drafting a paper describing my concept and model, and decided it would be good to include a proof that J-S divergence is additive. Then, only then, is when I discovered with an electric shock:

It isn’t.

I’ll leave the task of actually constructing counterexamples to the reader, but here’s an intuition. It uses a generalizing trick that reminds me of one by Bob Vaughan that we covered last July. For any {\lambda} put {R' = \lambda P + (1 - \lambda)Q}, and then define

\displaystyle  \eta'_{\lambda}(P,Q) = \lambda \kappa(P||R') + (1 - \lambda)\kappa(Q||R').

A little reflection may convince you that this cannot be additive for all {\lambda}. Hence its being additive for {\lambda = \frac{1}{2}}, which yields {\eta}, would be an “accident.” Finally thinking how {P} and {Q} themselves can give-and-go with {\lambda} gives the inkling that the accident doesn’t happen.

How Clear in the Literature?

I can put this in the form of a “blog-beg” (sometimes called a bleg):

Can you find an easily-accessed source that says clearly that the basic Jensen-Shannon divergence is not additive?

As of this writing, the Wikipedia page on J-S still does have such a statement. Adding one yourself would be cheating. In 2006 I did not find one elsewhere, even in a couple of texts. My one-hour trial by Google when I first drafted this post last summer found one paper in 2008 titled “Nonextensive Generalizations of the Jensen-Shannon Divergence.” This clued me in that nonextensive is the antonym of additive. So the authors’ generalizations are not additive, but what about the original {\eta}?

Another paper I found had the promising title “Properties of Classical and Quantum Jensen-Shannon Divergence,” and even more, its first author and I have Harry Burhman as a common coauthor. It defines J-S with a bang in the opening sentence, states some generalizations of J-S, and (still on page 1) says the magic word:

Shannon entropy is additive in the sense that the entropy of independent random variables, defined as the entropy of their joint distribution, is the sum of their individual entropies. (emphasis in original)

But the next sentence brings up the different topic of Rényi entropy, and after a mention of “non-extensive (i.e. nonadditive) generalizations” of J-S it goes into quantum.

Another paper picks up the thread in its title, “Nonextensive Generalizations of the Jensen-Shannon Divergence.” The point of the first two words must be that the original J-S is additive, yes? It’s the generalizations that are non-additive. Right? The paper’s abstract says it builds something called the Jensen-Tsallis {q}-difference, “which is a nonextensive generalization of the JSD.” So the original J-S is extensive, then? After defining additivity, it brings up the generalizations in which

… the additivity property is abandoned, yielding the so-called nonextensive entropies.

The next paragraph introduces K-L and J-S, but doesn’t tell me whether the “abandoned” property was ever there. It seems that this simple knowledge is presumed, but how might a bright young graduate student—or an old one—find it in the first place?

Open Problems

Can you give some more examples of “graduate-student traps”? Ones that are helpful to know?

Is Jensen-Shannon divergence close to being additive, in some useful sense? This actually strikes me as a non-trappy, research-worthy question.


by KWRegan at May 16, 2013 04:42 AM UTC

Authors: David Kordalewski
Download: PDF
Abstract: The Set Cover problem (SCP) and Set Packing problem (SPP) are standard NP-hard combinatorial optimization problems. Their decision problem versions are shown to be NP-Complete in Karp's 1972 paper. We specify a rough guide to constructing approximation heuristics that may have widespread applications and apply it to devise greedy approximation algorithms for SCP and SPP, where the selection heuristic is a variation of that in the standard greedy approximation algorithm. Our technique involves assigning to each input set a valuation and then selecting, in each round, the set whose valuation is highest. We prove that the technique we use for determining a valuation of the input sets yields a unique value for all Set Cover instances. For both SCP and SPP we give experimental evidence that the valuations we specify are unique and can be computed to high precision quickly by an iterative algorithm. Others have experimented with testing the observed approximation ratio of various algorithms over a variety of randomly generated instances, and we have extensive experimental evidence to show the quality of the new algorithm relative to greedy heuristics in common use. Our algorithms are somewhat more computationally intensive than the standard heuristics, though they are still practical for large instances. We discuss some ways to speed up our algorithms that do not significantly distort their effectiveness in practice on random instances.

at May 16, 2013 12:40 AM UTC

Authors: Shiri Chechik
Download: PDF
Abstract: An approximate distance oracle is a succinct data structure that provides fast answers to distance queries between any two nodes. In this paper we consider approximate distance oracles for general undirected graphs with non-negative edge weights with constant query time. We present a distance oracle of size O(k n^{1+1/k}), with 2k-1 stretch and O(1) query time. This improves the O(log{k}) query time of Wulff-Nilsen's distance oracle [SODA '13], which in turn improved the O(k) query time of Thorup and Zwick's distance oracle [J. ACM '05].

at May 16, 2013 12:40 AM UTC

Authors: Jop Briët, Daniel Dadush, Sebastian Pokutta
Download: PDF
Abstract: In Rothvo{\ss} [2011] it was shown that there exists a 0/1 polytope such that any higher-dimensional polytope projecting to it must have a subexponential number of facets, i.e., its linear extension complexity is subexponential. The question as to whether there exists a 0/1 polytope having high PSD extension complexity was left open, i.e., is there a 0/1 polytope such that any spectrahedron projecting to it must be the intersection of a subexponential sized semidefinite cone and an affine space? We answer this question in the affirmative using a new technique to rescale semidefinite factorizations.

at May 15, 2013 12:40 AM UTC

Authors: Junichiro Fukuyama
Download: PDF
Abstract: The class P is in fact a proper sub-class of NP. We explore topological properties of the Hamming space 2^[n] where [n]={1, 2,..., n}. With the developed theory, we show: (i) a theorem that is closely related to Erdos and Rado's sunflower lemma, and claims a stronger statement in most cases, (ii) a new approach to prove the exponential monotone circuit complexity of the clique problem, (iii) NC \ne NP through the impossibility of a Boolean circuit with poly-log depth to compute cliques, based on the construction of (ii), and (iv) P \ne NP through the exponential circuit complexity of the clique problem, based on the construction of (iii).

Item (i) leads to the existence of a sunflower with a small core in certain families of sets, which is not an obvious consequence of the sunflower lemma. In (iv), we show that any Boolean circuit computing the clique function CLIQUE_{n,k} (k=n^{1/4}) has a size exponential in n. Thus, we will separate P/poly from NP also.

Razborov and Rudich showed strong evidence that no natural proof can prove exponential circuit complexity of a Boolean function. We confirm that the proofs for (iii) and (iv) are not natural.

at May 15, 2013 12:40 AM UTC

Authors: Siu-On Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun
Download: PDF
Abstract: We give a highly efficient "semi-agnostic" algorithm for learning univariate probability distributions that are well approximated by piecewise polynomial density functions. Let $p$ be an arbitrary distribution over an interval $I$ which is $\tau$-close (in total variation distance) to an unknown probability distribution $q$ that is defined by an unknown partition of $I$ into $t$ intervals and $t$ unknown degree-$d$ polynomials specifying $q$ over each of the intervals. We give an algorithm that draws $\tilde{O}(t\new{(d+1)}/\eps^2)$ samples from $p$, runs in time $\poly(t,d,1/\eps)$, and with high probability outputs a piecewise polynomial hypothesis distribution $h$ that is $(O(\tau)+\eps)$-close (in total variation distance) to $p$. This sample complexity is essentially optimal; we show that even for $\tau=0$, any algorithm that learns an unknown $t$-piecewise degree-$d$ probability distribution over $I$ to accuracy $\eps$ must use $\Omega({\frac {t(d+1)} {\poly(1 + \log(d+1))}} \cdot {\frac 1 {\eps^2}})$ samples from the distribution, regardless of its running time. Our algorithm combines tools from approximation theory, uniform convergence, linear programming, and dynamic programming.

We apply this general algorithm to obtain a wide range of results for many natural problems in density estimation over both continuous and discrete domains. These include state-of-the-art results for learning mixtures of log-concave distributions; mixtures of $t$-modal distributions; mixtures of Monotone Hazard Rate distributions; mixtures of Poisson Binomial Distributions; mixtures of Gaussians; and mixtures of $k$-monotone densities. Our general technique yields computationally efficient algorithms for all these problems, in many cases with provably optimal sample complexities (up to logarithmic factors) in all parameters.

at May 15, 2013 12:40 AM UTC

Authors: Travis Gagie, Paweł Gawrychowski, Yakov Nekrich
Download: PDF
Abstract: Suppose we have two trees on the same set of leaves, in which nodes are weighted such that children are heavier than their parents. We say a node from the first tree and a node from the second tree are induced together if they have a common leaf descendant. In this paper we describe data structures that efficiently support the following heaviest-induced-ancestor query: given a node from the first tree and a node from the second tree, find an induced pair of their ancestors with maximum combined weight. Our solutions are based on a geometric interpretation that enables us to find heaviest induced ancestors using range queries. We then show how to use these results to build an LZ-compressed index with which we can quickly find with high probability the longest substring common to the indexed string and a given pattern.

at May 15, 2013 12:40 AM UTC

Authors: Michael R. Fellows, Bart M. P. Jansen
Download: PDF
Abstract: Many graph problems were first shown to be fixed-parameter tractable using the results of Robertson and Seymour on graph minors. We show that the combination of finite, computable, obstruction sets and efficient order tests is not just one way of obtaining strongly uniform FPT algorithms, but that all of FPT may be captured in this way. Our new characterization of FPT has a strong connection to the theory of kernelization, as we prove that problems with polynomial kernels can be characterized by obstruction sets whose elements have polynomial size. Consequently we investigate the interplay between the sizes of problem kernels and the sizes of the elements of such obstruction sets, obtaining several examples of how results in one area yield new insights in the other. We show how exponential-size minor-minimal obstructions for pathwidth k form the crucial ingredient in a novel OR-cross-composition for k-Pathwidth, complementing the trivial AND-composition that is known for this problem. In the other direction, we show that OR-cross-compositions into a parameterized problem can be used to rule out the existence of efficiently generated quasi-orders on its instances that characterize the NO-instances by polynomial-size obstructions.

at May 15, 2013 12:00 AM UTC

Authors: Joan Boyar, Magnus Find
Download: PDF
Abstract: We study the notion of ``cancellation-free'' circuits. This is a restriction of linear Boolean circuits (XOR circuits), but can be considered as being equivalent to previously studied models of computation. The notion was coined by Boyar and Peralta in a study of heuristics for a particular circuit minimization problem. They asked how large a gap there can be between the smallest cancellation-free circuit and the smallest linear circuit. We show that the difference can be a factor $\Omega(n/\log^{2}n)$. This improves on a recent result by Sergeev and Gashkov who have studied a similar problem. Furthermore, our proof holds for circuits of constant depth. We also study the complexity of computing the Sierpinski matrix using cancellation-free circuits and give a tight $\Omega(n\log n)$ lower bound.

at May 15, 2013 12:40 AM UTC

A couple of years ago, Blais, Brody, and Matulef put forward a methodology for proving lower bounds on the query complexity of property testing via communication complexity. They provided a restricted formulation of their methodology (via ``simple combining operators'') and also hinted towards a more general formulation, which we spell out in this paper. A special case of the general formulation proceeds as follows: In order to derive a lower bound on testing the property $\Pi$, one presents a mapping $F$ of pairs of inputs $(x,y)\in\{0,1\}^{n+n}$ for a two-party communication problem $\Psi$ to $\ell(n)$-bit long inputs for $\Pi$ such that $(x,y)\in\Psi$ implies $F(x,y)\in\Pi$ and $(x,y)\not\in\Psi$ implies that $F(x,y)$ is far from $\Pi$. Let $f_i(x,y)$ be the $i^\xth$ bit of $F(x,y)$, and suppose that $B$ is an upper bound on the (deterministic) communication complexity of each $f_i$ and that $C$ is a lower bound on the randomized communication complexity of $\Psi$. Then, testing $\Pi$ requires at least $C/B$ queries. The foregoing formulation is generalized by considering randomized protocols (with small error) for computing the $f_i$'s. In contrast, the restricted formulation (via ``simple combining operators'') requires that each $f_i(x,y)$ be a function of $x_i$ and $y_i$ only, and uses $B=2$ for the straightforward computation of $f_i$. We show that the general formulation cannot yield significantly stronger lower bounds than those that can be obtained by the restricted formulation. Nevertheless, we advocate the use of the general formulation, because we believe that it is easier to work with. Following Blais et al., we also describe a version of the methodology for nonadaptive testers and one-way communication complexity.

at May 14, 2013 12:45 PM UTC

Authors: Andreas Galanis, Daniel Stefankovic, Eric Vigoda
Download: PDF
Abstract: A remarkable connection has been established for 2-spin systems, including the Ising and hard-core models, showing that the computational complexity of approximating the partition function for graphs with maximum degree D undergoes a phase transition that coincides with the statistical physics uniqueness/non-uniqueness phase transition on the infinite D-regular tree. Despite this clear picture for 2-spin systems, there is little known for multi-spin systems. We present an analog of the above inapproximability results for multi-spin systems. We prove that, unless NP=RP, for any antiferromagnetic spin system, there is no FPRAS for the partition function of D-regular graphs when the dominant semi-translation invariant Gibbs measures on the infinite D-regular tree are not translation invariant and are permutation symmetric of each other. Our results apply to the antiferromagnetic Potts model (even q) and colorings problem (even k), which are the multi-spin systems of particular interest. Our proof relies on a simple and generic analysis of the second moment for any spin system. As a consequence we get concentration results for any spin system in which one can analyze the first moment. We also present a tool for simplifying the associated first moment calculations by relating it to the stable fixed points for the tree recursions.

at May 14, 2013 12:40 AM UTC

Authors: Giovanni Viglietta
Download: PDF
Abstract: We study the Art Gallery Problem for face guards in polyhedral environments. The problem can be informally stated as: how many (not necessarily convex) windows should we place on the external walls of a dark building, in order to completely illuminate it?

We consider both closed and open face guards (i.e., faces with or without their boundary), and we give some upper and lower bounds on the minimum number of faces required to guard a given polyhedron, in terms of the total number of its faces, f. In some notable cases, our bounds are tight: f/6 open face guards for orthogonal polyhedra, and f/4 open face guards for 4-oriented polyhedra (i.e., polyhedra whose faces have only four different orientations).

Then we show that it is NP-hard to approximate the minimum number of (closed or open) face guards within a factor of Omega(log f), even for polyhedra that are orthogonal and simply connected.

Along the way we discuss some applications, arguing that face guards are not a reasonable model for guards patroling on the surface of a polyhedron.

at May 14, 2013 12:00 AM UTC

Authors: Andreas Kosmatopoulos, Kostas Tsichlas
Download: PDF
Abstract: Let $\mathcal{S}$ be a dataset of $n$ 2-dimensional points. The top-$k$ dominating query aims to report the $k$ points that dominate the most points in $\mathcal{S}$. A point $p$ dominates a point $q$ iff all coordinates of $p$ are smaller than or equal to those of $q$ and at least one of them is strictly smaller. The top-$k$ dominating query combines the dominance concept of maxima queries with the ranking function of top-$k$ queries and can be used as an important tool in multi-criteria decision making systems. In this work, we propose novel algorithms for answering semi-dynamic (insertions only) and fully dynamic (insertions and deletions) top-$k$ dominating queries. To the best of our knowledge, this is the first work towards handling (semi-)dynamic top-$k$ dominating queries that offers algorithms with asymptotic guarantees regarding their time and space cost.

at May 14, 2013 12:00 AM UTC

Authors: Igor Pak, Jed Yang
Download: PDF
Abstract: In [BNRR], it was shown that tiling of general regions with two rectangles is NP-complete, except for a few trivial special cases. In a different direction, R\'emila showed that for simply connected regions by two rectangles, the tileability can be solved in quadratic time (in the area). We prove that there is a finite set of at most 10^6 rectangles for which the tileability problem of simply connected regions is NP-complete, closing the gap between positive and negative results in the field. We also prove that counting such rectangular tilings is #P-complete, a first result of this kind.

at May 14, 2013 12:00 AM UTC

Authors: Marc Gillé
Download: PDF
Abstract: A graph $G = (V,E)$ can be described by the characteristic function of the edge set $\chi_E$ which maps a pair of binary encoded nodes to 1 iff the nodes are adjacent. Using \emph{Ordered Binary Decision Diagrams} (OBDDs) to store $\chi_E$ can lead to a (hopefully) compact representation. Given the OBDD as an input, symbolic/implicit OBDD-based graph algorithms can solve optimization problems by mainly using functional operations, e.g. quantification or binary synthesis. While the OBDD representation size can not be small in general, it can be provable small for special graph classes and then also lead to fast algorithms. In this paper, we show that the OBDD size of unit interval graphs is $O(\ | V \ | /\log \ | V \ |)$ and the OBDD size of interval graphs is $O(\ | V \ | \log \ | V \ |)$ which both improve a known result from Nunkesser and Woelfel (2009). Furthermore, we can show that using our variable order and node labeling for interval graphs the worst-case OBDD size is $\Omega(\ | V \ | \log \ | V \ |)$. We use the structure of the adjacency matrices to prove these bounds. This method may be of independent interest and can be applied to other graph classes. We also develop a maximum matching algorithm on unit interval graphs using $O(\log \ | V \ |)$ operations and a coloring algorithm for unit and general intervals graphs using $O(\log^2 \ | V \ |)$ operations and evaluate the algorithms empirically.

at May 14, 2013 12:41 AM UTC

Authors: Sylvain Guillemot, Dániel Marx
Download: PDF
Abstract: The \textsc{Bipartite Contraction} problem is to decide, given a graph $G$ and a parameter $k$, whether we can can obtain a bipartite graph from $G$ by at most $k$ edge contractions. The fixed-parameter tractability of the problem was shown by [Heggernes et al. 2011], with an algorithm whose running time has double-exponential dependence on $k$. We present a new randomized FPT algorithm for the problem, which is both conceptually simpler and achieves an improved $2^{O(k^2)} n m$ running time, i.e., avoiding the double-exponential dependence on $k$. The algorithm can be derandomized using standard techniques.

at May 14, 2013 12:41 AM UTC

Authors: Samuel Zbarsky
Download: PDF
Abstract: Given points in Euclidean space of arbitrary dimension, we prove that there exists a spanning tree having no vertices of degree greater than 3 with weight at most 1.561 times the weight of the minimum spanning tree. We also prove that there is a set of points such that no spanning tree of maximal degree 3 exists that has this ratio be less than 1.447. Our central result is based on the proof of the following claim:

Given $n$ points in Euclidean space with one special point $V$, there exists a Hamiltonian path with an endpoint at $V$ that is at most 1.561 times longer than the sum of the distances of the points to $V$.

These proofs also lead to a way to find the tree in linear time given the minimal spanning tree.

at May 14, 2013 12:40 AM UTC

Authors: Mark Korenblit, Vadim E. Levit
Download: PDF
Abstract: The paper investigates relationship between algebraic expressions and graphs. We consider a digraph called a Fibonacci graph which gives a generic example of non-series-parallel graphs. Our intention in this paper is to simplify the expressions of Fibonacci graphs and eventually find their shortest representations. With that end in view, we describe the number of methods for generating Fibonacci graph expressions and carry out their comparative analysis.

at May 14, 2013 12:41 AM UTC

Authors: Mark Korenblit, Vadim E. Levit
Download: PDF
Abstract: The paper investigates relationship between algebraic expressions and graphs. We consider a digraph called a Fibonacci graph which gives a generic example of non-series-parallel graphs. Our intention in this paper is to simplify the expressions of Fibonacci graphs and eventually find their shortest representations. With that end in view, we describe the optimal decomposition method for generating Fibonacci graph expressions that is conjectured to provide these representations. Proof (or disproof) of this conjecture is presented as an open problem.

at May 14, 2013 12:41 AM UTC

Authors: Mark Korenblit
Download: PDF
Abstract: The paper investigates relationship between algebraic expressions and graphs. We consider a digraph called a full square rhomboid that is an example of non-series-parallel graphs. Our intention is to simplify the expressions of full square rhomboids and eventually find their shortest representations. With that end in view, we describe two decomposition methods for generating expressions of full square rhomboids and carry out their comparative analysis.

at May 14, 2013 12:41 AM UTC

Authors: Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins
Download: PDF
Abstract: Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising. In many of these application domains the learner may be constrained by one or more supply (or budget) limits, in addition to the customary limitation on the time horizon. The literature lacks a general model encompassing these sorts of problems. We introduce such a model, called "bandits with knapsacks", that combines aspects of stochastic integer programming with online learning. A distinctive feature of our problem, in comparison to the existing regret-minimization literature, is that the optimal policy for a given latent distribution may significantly outperform the policy that plays the optimal fixed arm. Consequently, achieving sublinear regret in the bandits-with-knapsacks problem is significantly more challenging than in conventional bandit problems.

We present two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates. Further, we prove that the regret achieved by both algorithms is optimal up to polylogarithmic factors. We illustrate the generality of the problem by presenting applications in a number of different domains including electronic commerce, routing, and scheduling. As one example of a concrete application, we consider the problem of dynamic posted pricing with limited supply and obtain the first algorithm whose regret, with respect to the optimal dynamic policy, is sublinear in the supply.

at May 14, 2013 12:41 AM UTC

Authors: Dmitry Kosolobov, Mikhail Rubinchik, Arseny M. Shur
Download: PDF
Abstract: We exhibit an online algorithm finding all distinct palindromes inside a given string in time $\Theta(n\log|\Sigma|)$ over an ordered alphabet and in time $\Theta(n|\Sigma|)$ over an unordered alphabet. Using a reduction from a dictionary-like data structure, we prove the optimality of this algorithm in the comparison-based computation model.

at May 14, 2013 12:41 AM UTC

Authors: Sonia Perez-Diaz, Li-Yong Shen
Download: PDF
Abstract: The developable surface is an important surface in computer aided design, geometric modeling and industrial manufactory. It is often given in the stan- dard parametric form, but it can also be in the implicit form which is commonly used in algebraic geometry. Not all algebraic developable surfaces have rational parametrizations. In this paper, we focus on the rational developable surfaces. For a given algebraic surface, we first determine whether it is developable by geometric inspection, and we give a rational proper parametrization for the af- firmative case. For a rational parametric surface, we can also determine the developability and give a proper reparametrization for the developable surface.

at May 14, 2013 12:40 AM UTC

Authors: Sonia Perez-Diaza, Li-Yong Shen
Download: PDF
Abstract: The ruled surface is a typical modeling surface in computer aided geometric design. It is usually given in the standard parametric form. However, it can also be in the forms than the standard one. For these forms, it is necessary to determine and find the standard form. In this paper, we present algorithms to determine whether a given implicit surface is a rational ruled surface. A parametrization of the surface is computed for the affirmative case. We also consider the parametric situation. More precisely, after a given rational parametric surface is determined as a ruled one, we reparameterize it to the standard form.

at May 14, 2013 12:40 AM UTC

Authors: Jierui Xie, Boleslaw K. Szymanski
Download: PDF
Abstract: An important challenge in big data analysis nowadays is detection of cohesive groups in large-scale networks, including social networks, genetic networks, communication networks and so. In this paper, we propose LabelRank, an efficient algorithm detecting communities through label propagation. A set of operators is introduced to control and stabilize the propagation dynamics. These operations resolve the randomness issue in traditional label propagation algorithms (LPA), stabilizing the discovered communities in all runs of the same network. Tests on real-world networks demonstrate that LabelRank significantly improves the quality of detected communities compared to LPA, as well as other popular algorithms.

at May 14, 2013 12:41 AM UTC