Theory of Computing Blog Aggregator

I'm now making my way back from Iceland, where I attended the 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Rather than providing a run-down of the talks and results I found interesting (you can choose your own from the conference proceedings, this year provided as open access through LIPIcs), I thought it might be more fun to mention a few open problems that people mentioned along the way and that caught my attention.

The first invited talk, by Julia Chuzhoy, concerned the relation between grid minor size and treewidth. There exist graphs of treewidth t whose biggest grid minor has side length O(sqrt(t/log t)), and the results she spoke about give polynomial lower bounds, but with much smaller exponents (currently around 1/19). As her abstract states, "an important open question is establishing tight bounds" for this problem.

From Zachary Friggstad's talk on integrality gaps for directed Steiner trees, we have the situation that (for directed Steiner tree problems with t terminals) it is possible to approximate the solution within a t^ε factor, for any ε > 0, in polynomial time, or to within a polylog factor in quasipolynomial time. Is a polynomial-time polylog-factor approximation possible?

Daniël Paulusma spoke on finding square roots of k-apex graphs; that is, one is given a graph G that could be made planar by the deletion of k vertices, and the goal is to find another graph R on the same vertex set such that two vertices are adjacent in G if and only if their distance in R is at most two. There are many other graph classes for which the problem is known to be polynomial-time solvable or NP-hard, but he listed as open finding square roots of split graphs or cographs, and finding graph square roots that are planar.
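The square-root condition itself is easy to state in code. Here is a toy sketch (the adjacency-dict representation and helper name are mine, not from the talk) that checks whether a candidate R is a square root of G:

```python
def is_square_root(R, G):
    """Check whether R is a square root of G: vertices are adjacent in G
    iff their distance in R is at most two. R and G are adjacency dicts
    on the same vertex set (a hypothetical toy representation)."""
    for v in R:
        # collect the vertices within distance <= 2 of v in R
        near = set(R[v])
        for u in R[v]:
            near.update(R[u])
        near.discard(v)
        if near != set(G[v]):
            return False
    return True

# Example: a path a-b-c; its square adds the edge a-c.
R = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
G = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}
print(is_square_root(R, G))  # True
```

The hard direction of the problem, of course, is finding R given only G.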

Jens Schmidt's talk was motivated by some older work of Mikkel Thorup on recognizing map graphs, the intersection graphs of interior-disjoint simply-connected regions in the plane. These differ from planar graphs because of the four corners phenomenon, where Arizona, Colorado, New Mexico, and Utah all meet at a point (and are all adjacent in the map graph). Thorup gave a polynomial time recognition algorithm but his exponent is huge and his algorithm does not provide a combinatorial certificate of being a map graph (a planar bipartite graph whose half-square is the given graph). Schmidt gave a more satisfactory solution to a special case but would like a better algorithm for the general problem.

My own talk concerned cuckoo filters, a more space-efficient replacement for Bloom filters. A data structure of Pagh et al. provides a theoretical but not-yet-practical solution for the same problem (for all false positive rates), but cuckoo filters only work well when the false positive rate is small enough. What if we want a practically efficient filter that gives a high false positive rate, say 20%? Can we get close to optimal space in this case?
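For context, here is a back-of-the-envelope comparison (the numbers and helper names are mine), using the standard space formulas: the information-theoretic lower bound of log2(1/ε) bits per item, the classical Bloom filter's (1/ln 2)·log2(1/ε), and the usual cuckoo-filter estimate of roughly (log2(1/ε) + 3)/α bits per item at load factor α. At a 20% false positive rate the cuckoo filter's additive fingerprint overhead dominates, which is exactly the regime the question asks about:

```python
import math

LN2 = math.log(2)

def lower_bound(eps):
    # Information-theoretic lower bound: log2(1/eps) bits per item.
    return math.log2(1 / eps)

def bloom(eps):
    # Classical Bloom filter at the optimal number of hash functions.
    return math.log2(1 / eps) / LN2

def cuckoo(eps, load=0.95):
    # Rough cuckoo-filter estimate: (log2(1/eps) + 3) / load bits per
    # item, with 4-slot buckets and an assumed load factor of 0.95.
    return (math.log2(1 / eps) + 3) / load

for eps in (0.2, 0.01, 0.0001):
    print(f"eps={eps}: bound={lower_bound(eps):.2f}  "
          f"bloom={bloom(eps):.2f}  cuckoo={cuckoo(eps):.2f}")
```

At ε = 0.2 the lower bound is about 2.3 bits per item while the cuckoo estimate is over 5, so the gap the open question asks about is real.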

Konrad Dabrowski made progress on classifying which sets of forbidden induced subgraphs lead to graph families with bounded clique-width by showing that (diamond,P1+2P2)-free graphs and four other similarly defined classes do have bounded clique-width. But as he reports, there are many remaining open cases.

Dana Richards spoke on a problem of comparison-sorting a set of items when only certain pairs of items are allowed to be compared, but his paper has more questions on this problem than answers. Any set of comparisons that you make (a subset of allowed ones) gives you a DAG with an edge for each comparison, oriented from smaller to larger, and you can infer the results of all comparisons in the transitive closure of this DAG. The goal is to either test or infer the results of all allowed comparisons. You can do it in a sublinear number of comparisons when either the set of allowed comparisons is small (by just doing them all) or its complement is small (by Richards' algorithm) but what he would like to prove is a subquadratic (deterministic) bound that holds in all cases.
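The inference step this problem is built on can be sketched directly (a toy sketch; the function name and pair encoding are mine): compute the transitive closure of the comparisons already made, and read off which allowed pairs are thereby decided.

```python
def inferable(n, compared, allowed):
    """Given comparison results `compared` as a set of directed pairs
    (a, b) meaning a < b, return which of the `allowed` pairs can be
    inferred: those lying in the transitive closure of the DAG."""
    # Floyd-Warshall-style transitive closure on n items.
    reach = [[False] * n for _ in range(n)]
    for a, b in compared:
        reach[a][b] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {(a, b) for a, b in allowed if reach[a][b] or reach[b][a]}

# 0 < 1 and 1 < 2 were compared; the allowed pair (0, 2) follows.
print(inferable(3, {(0, 1), (1, 2)}, {(0, 2)}))  # {(0, 2)}
```

The open question is how few actual comparisons suffice, in the worst case, so that every allowed pair is either performed or inferred this way.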

Timothy Chan revisited one of the first kinetic data structures, for closest pairs (and/or nearest neighbors). The static version of the problem can be solved in O(n log n) time in any constant dimension, but previous kinetic structures had a factor of log^d (for dimension d) in their time bounds. Chan eliminated this, but at the expense of making the time bound depend on the geometry of the input (the range of values that the closest pair might span) instead of just on the number of points. He asked whether it is possible to eliminate both the dependence on dimension and the dependence on geometry.

One of the disadvantages of the move from paper to electronic proceedings (at least in my own experience) is less browsing and more going directly to the papers you already knew you were looking for. And one of the continuing advantages of going to a symposium like this (especially a small and welcoming one like SWAT) is that by listening to talks on subjects you don't already know about, you can broaden your own interests and find out about topics you might not have discovered on your own. But browsing is still possible, even if we are no longer forced to leaf through paper proceedings. So, I'd encourage you to download the full proceedings (from the link at the top left corner of the proceedings page) rather than just the individual papers, and browse them to find many more problems that the papers there have listed as open.

at June 25, 2016 09:04 AM UTC

Authors: Jonah Sherman
Download: PDF
Abstract: We consider approximation algorithms for the problem of finding $x$ of minimal norm $\|x\|$ satisfying a linear system $Ax = b$, where the norm $\|\cdot\|$ is arbitrary and generally non-Euclidean. We show a simple general technique for composing solvers, converting iterative solvers with residual error $\|Ax - b\| \leq t^{-\Omega(1)}$ into solvers with residual error $\exp(-\Omega(t))$, at the cost of an increase in $\|x\|$, by recursively invoking the solver on the residual problem $\tilde{b} = b - Ax$. Convergence of the composed solvers depends strongly on a generalization of the classical condition number to general norms, reducing the task of designing algorithms for many such problems to that of designing a \emph{generalized preconditioner} for $A$. The new ideas significantly generalize those introduced by the author's earlier work on maximum flow, making them more widely applicable.

As an application of the new technique, we present a nearly-linear time approximation algorithm for uncapacitated minimum-cost flow on undirected graphs. Given an undirected graph with $m$ edges labelled with costs, and $n$ vertices labelled with demands, the algorithm takes $\epsilon^{-2}m^{1+o(1)}$-time and outputs a flow routing the demands with total cost at most $(1+\epsilon)$ times larger than minimal, along with a dual solution proving near-optimality. The generalized preconditioner is obtained by embedding the cost metric into $\ell_1$, and then considering a simple hierarchical routing scheme in $\ell_1$ where demands initially supported on a dense lattice are pulled from a sparser lattice by randomly rounding unaligned coordinates to their aligned neighbors. Analysis of the generalized condition number for the corresponding preconditioner follows that of the classical multigrid algorithm for lattice Laplacian systems.

at June 24, 2016 01:00 AM UTC

Authors: Ilias Diakonikolas, Daniel Kane, Alistair Stewart
Download: PDF
Abstract: We investigate the problem of learning Bayesian networks in an agnostic model where an $\epsilon$-fraction of the samples are adversarially corrupted. Our agnostic learning model is similar to -- in fact, stronger than -- Huber's contamination model in robust statistics. In this work, we study the fully observable Bernoulli case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient agnostic learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has polynomial sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples.

at June 24, 2016 01:03 AM UTC

Authors: Guru Prakash Arumugam, John Augustine, Mordecai J. Golin, Yuya Higashikawa, Naoki Katoh, Prashanth Srikanthan
Download: PDF
Abstract: A Dynamic Graph Network is a graph in which each edge has an associated travel time and a capacity (width) that limits the number of items that can travel in parallel along that edge. Each vertex in this dynamic graph network begins with a number of items that must be evacuated to designated sink vertices. A $k$-sink evacuation protocol finds the locations of $k$ sinks and an associated evacuation movement protocol that evacuates all the items to a sink in minimum time. The associated evacuation movement must impose a confluent flow, i.e., all items passing through a particular vertex exit that vertex using the same edge. In this paper we address the $k$-sink evacuation problem on a dynamic path network. We provide solutions that run in $O(n \log n)$ time for $k=1$ and $O(k n \log^2 n)$ time for $k > 1$, and work for arbitrary edge capacities.

at June 24, 2016 01:00 AM UTC

Authors: Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, Pierre-Yves Strub
Download: PDF
Abstract: Differential privacy is a promising formal approach to data privacy, which provides a quantitative bound on the privacy cost of an algorithm that operates on sensitive information. Several tools have been developed for the formal verification of differentially private algorithms, including program logics and type systems. However, these tools do not capture fundamental techniques that have emerged in recent years, and cannot be used for reasoning about cutting-edge differentially private algorithms. Existing techniques fail to handle three broad classes of algorithms: 1) algorithms where privacy depends on accuracy guarantees, 2) algorithms that are analyzed with the advanced composition theorem, which shows slower growth in the privacy cost, and 3) algorithms that interactively accept adaptive inputs.

We address these limitations with a new formalism extending apRHL, a relational program logic that has been used for proving differential privacy of non-interactive algorithms, and incorporating aHL, a (non-relational) program logic for accuracy properties. We illustrate our approach through a single running example, which exemplifies the three classes of algorithms and explores new variants of the Sparse Vector technique, a well-studied algorithm from the privacy literature. We implement our logic in EasyCrypt, and formally verify privacy. We also introduce a novel coupling technique called \emph{optimal subset coupling} that may be of independent interest.

at June 24, 2016 01:03 AM UTC

I attended the 2016 STOC Conference earlier this week in Cambridge, Massachusetts. No shortage of awesome results, highlighted by László Babai's new quasipolynomial-time graph isomorphism algorithm. Babai didn't have a chance to give more than a glimpse into his algorithm in the 20-minute talk, but you did see his joy in discovering the concept of "affected" last September, the key to making the whole algorithm work.

Babai also received the ACM SIGACT Distinguished Service prize for his work for the community including his open-access journal Theory of Computing.

You can see and access all the papers from the program page. Links on that page will give you free access to the papers forever. I'll mention some papers here but you'll have to click over on the program page to access them.

Troy Lee gave my favorite talk on his paper Separations in Query Complexity Based on Pointer Functions with Andris Ambainis, Kaspars Balodis, Aleksandrs Belovs, Miklos Santha and Juris Smotrovs. He gave a wonderful description of a (contrived) function that gives strong separations between deterministic, randomized and quantum decision tree complexity. Scott Aaronson, Shalev Ben-David and Robin Kothari followed up on that work in Separations in query complexity using cheat sheets. 

Let me also mention the best paper winners
  • Reed-Muller Codes Achieve Capacity on Erasure Channels by Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Sasoglu and Rudiger Urbanke
  • Explicit Two-Source Extractors and Resilient Functions by Eshan Chattopadhyay and David Zuckerman
  • Graph Isomorphism in Quasipolynomial Time by Laszlo Babai
and the Danny Lewin best student paper awardees
  • A Tight Space Bound for Consensus by Leqi Zhu
  • The 4/3 Additive Spanner Exponent is Tight by Amir Abboud and Greg Bodwin
The conference had 327 registrants, 44% students and 74% from the USA. The Symposium on Computational Geometry was held at Tufts before STOC and they held a joint workshop day in between. There were about 20 registered for both conferences.

STOC had 92 accepted papers out of 370 submissions. None of the three papers claiming to settle P v NP were accepted.

STOC 2017 will be held in Montreal June 19-23, the first of a theory festival. There will be multi-day poster sessions, with a poster for every paper. Invited talks will include best papers from other theory conferences. There will be a mild increase in the number of accepted papers. The hope is to draw a broader and larger group of theorists for this festival.

The first STOC conference was held in Marina del Rey in 1969. That makes the 2018 conference STOC's 50th which will return to Southern California. The Conference on Computational Complexity will co-locate.

FOCS 2016 will be held October 9-11 in New Brunswick, New Jersey, preceded by a celebration of Avi Wigderson's 60th birthday.

SODA 2017 will be held January 16-19 in Barcelona. Paper registration deadline is July 6.

If you weren't at the business meeting, it's worth looking at the slides for the NSF and the Committee for the Advancement of Theoretical Computer Science. Of particular note, the CATCS site Theory Matters has job announcements and sample CAREER proposals.

by Lance Fortnow ( at June 23, 2016 12:44 PM UTC

Authors: William S. Evans, Krzysztof Fleszar, Philipp Kindermann, Noushin Saeedi, Chan-Su Shin, Alexander Wolff
Download: PDF
Abstract: A rectilinear polygon is a polygon whose edges are axis-aligned. Walking counterclockwise on the boundary of such a polygon yields a sequence of left turns and right turns. The number of left turns always equals the number of right turns plus 4. It is known that any such sequence can be realized by a rectilinear polygon. In this paper, we consider the problem of finding realizations that minimize the perimeter or the area of the polygon or the area of the bounding box of the polygon. We show that all three problems are NP-hard in general. Then we consider the special cases of $x$-monotone and $xy$-monotone rectilinear polygons. For these, we can optimize the three objectives efficiently.
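The "left turns equal right turns plus 4" identity for counterclockwise rectilinear polygons is easy to check computationally. A toy sketch (the helper and the example L-shaped polygon are mine):

```python
def turn_sequence(vertices):
    """Turn sequence of a rectilinear polygon, given counterclockwise
    as a list of (x, y) vertices. 'L' = left turn, 'R' = right turn."""
    n = len(vertices)
    turns = []
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        cx, cy = vertices[(i + 2) % n]
        # sign of the cross product of edge a->b with edge b->c
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        turns.append('L' if cross > 0 else 'R')
    return turns

# An L-shaped counterclockwise rectilinear polygon: 5 lefts, 1 right.
poly = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
t = turn_sequence(poly)
print(t, t.count('L') - t.count('R'))  # difference is 4
```

The realization problem in the abstract goes the other way: given such a sequence of L's and R's, choose edge lengths to optimize perimeter, area, or bounding-box area.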

at June 23, 2016 01:05 AM UTC

Authors: Thomas Kesselheim, Andreas Tönnis
Download: PDF
Abstract: The \emph{Temp Secretary Problem} was recently introduced by Fiat et al. It is a generalization of the Secretary Problem, in which commitments are temporary for a fixed duration. We present a simple online algorithm with improved performance guarantees for cases already considered by Fiat et al. and give competitive ratios for new generalizations of the problem. In the classical setting, where candidates have identical contract durations $\gamma \ll 1$ and we are allowed to hire up to $B$ candidates simultaneously, our algorithm is $(\frac{1}{2} - O(\sqrt{\gamma}))$-competitive. For large $B$, the bound improves to $1 - O\left(\frac{1}{\sqrt{B}}\right) - O(\sqrt{\gamma})$.

Furthermore we generalize the problem from cardinality constraints towards general packing constraints. We achieve a competitive ratio of $1 - O\left(\sqrt{\frac{(1+\log d + \log B)}{B}}\right) -O(\sqrt{\gamma})$, where $d$ is the sparsity of the constraint matrix and $B$ is generalized to the capacity ratio of linear constraints. Additionally we extend the problem towards arbitrary hiring durations.

Our algorithmic approach is a relaxation that aggregates all temporal constraints into a non-temporal constraint. Then we apply a linear scaling algorithm that, on every arrival, computes a tentative solution on the input that is known up to this point. This tentative solution uses the non-temporal, relaxed constraints scaled down linearly by the amount of time that has already passed.

at June 23, 2016 01:02 AM UTC

Authors: Daniel Dadush, Oded Regev
Download: PDF
Abstract: We present a natural reverse Minkowski-type inequality for lattices, which gives upper bounds on the number of lattice points in a Euclidean ball in terms of sublattice determinants, and conjecture its optimal form. The conjecture exhibits a surprising wealth of connections to various areas in mathematics and computer science, including a conjecture motivated by integer programming by Kannan and Lov\'asz (Annals of Math. 1988), a question from additive combinatorics asked by Green, a question on Brownian motions asked by Saloff-Coste (Colloq. Math. 2010), a theorem by Milman and Pisier from convex geometry (Ann. Probab. 1987), worst-case to average-case reductions in lattice-based cryptography, and more. We present these connections, provide evidence for the conjecture, and discuss possible approaches towards a proof. Our main technical contribution is in proving that our conjecture implies the $\ell_2$ case of the Kannan and Lov\'asz conjecture. The proof relies on a novel convex relaxation for the covering radius, and a rounding procedure based on "uncrossing" lattice subspaces.

at June 23, 2016 01:01 AM UTC

Authors: Iordanis Kerenidis, Adi Rosén, Florent Urrutia
Download: PDF
Abstract: We introduce the new measure of Public Information Complexity (PIC), as a tool for the study of multi-party computation protocols, and of quantities such as their communication complexity, or the amount of randomness they require in the context of information-theoretic private computations. We are able to use this measure directly in the natural asynchronous message-passing peer-to-peer model and show a number of interesting properties and applications of our new notion: the Public Information Complexity is a lower bound on the Communication Complexity and an upper bound on the Information Complexity; the difference between the Public Information Complexity and the Information Complexity provides a lower bound on the amount of randomness used in a protocol; any communication protocol can be compressed to its Public Information Cost; and we give an explicit calculation of the zero-error Public Information Complexity of the k-party, n-bit Parity function, where a player outputs the bit-wise parity of the inputs. The latter result establishes that the amount of randomness needed for a private protocol that computes this function is Omega(n).

at June 23, 2016 01:00 AM UTC

Authors: Rad Niazadeh, Christopher Wilkens
Download: PDF
Abstract: Quasilinearity is a ubiquitous and questionable assumption in the standard study of Walrasian equilibria. Quasilinearity implies that a buyer's value for goods purchased in a Walrasian equilibrium is always additive with goods purchased with unspent money. It is a particularly suspect assumption in combinatorial auctions, where buyers' complex preferences over goods would naturally extend beyond the items obtained in the Walrasian equilibrium.

We study Walrasian equilibria in combinatorial auctions when quasilinearity is not assumed. We show that existence can be reduced to an Arrow-Debreu style market with one divisible good and many indivisible goods, and that a "fractional" Walrasian equilibrium always exists. We also show that standard integral Walrasian equilibria are related to integral solutions of an induced configuration LP associated with a fractional Walrasian equilibrium, generalizing known results for both quasilinear and non-quasilinear settings.

at June 23, 2016 01:02 AM UTC

Authors: Richard Whyman
Download: PDF
Abstract: In this paper we give a framework for describing how abstract systems can be used to compute if no randomness or error is involved. Using this we describe a class of classical "physical" computation systems whose computational capabilities in polynomial time are equivalent to P/poly. We then extend our framework to describe how measurement and transformation times may vary depending on their input. Finally we describe two classes of classical "physical" computation systems in this new framework whose computational capabilities in polynomial time are equivalent to P/poly and P/log*.

at June 23, 2016 01:00 AM UTC

Authors: Ciarán M. Lee, Matty J. Hoban
Download: PDF
Abstract: What kind of object is a quantum state? Is it an object that encodes an exponentially growing amount of information (in the size of the system) or more akin to a probability distribution? It turns out that these questions are sensitive to what we do with the information. For example, Holevo's bound tells us that n qubits only encode n bits of classical information but for certain communication complexity tasks there is an exponential separation between quantum and classical resources. Instead of just contrasting quantum and classical physics, we can place both within a broad landscape of physical theories and ask how non-quantum (and non-classical) theories are different from, or more powerful than quantum theory. For example, in communication complexity, certain (non-quantum) theories can trivialise all communication complexity tasks. In recent work [C. M. Lee and M. J. Hoban, Proc. Royal Soc. A 472 (2190), 2016], we showed that the immense power of the information content of states in general (non-quantum) physical theories is not limited to communication complexity. We showed that, in general physical theories, states can be taken as "advice" for computers in these theories and this advice allows the computers to easily solve any decision problem. Aaronson has highlighted the close connection between quantum communication complexity and quantum computations that take quantum advice, and our work gives further indications that this is a very general connection. In this work, we review the results in our previous work and discuss the intricate relationship between communication complexity and computers taking advice for general theories.

at June 23, 2016 01:00 AM UTC

Authors: Mohamed El Yafrani, Belaïd Ahiod
Download: PDF
Abstract: This extended abstract presents an overview of NP-hard optimization problems with multiple interdependent components. These problems occur in many real-world applications: industrial applications, engineering, and logistics. The fact that these problems are composed of many sub-problems that are NP-hard makes them even more challenging to solve using exact algorithms. This is mainly due to the high complexity of this class of algorithms and the hardness of the problems themselves. The main source of difficulty of these problems is the presence of internal dependencies between sub-problems. This aspect of interdependence of components is presented, and some outlines on solving approaches are briefly introduced from a (meta)heuristics and evolutionary computation perspective.

at June 23, 2016 01:05 AM UTC

Authors: Kaushik Sarkar, Charles J. Colbourn
Download: PDF
Abstract: Modern software systems often consist of many different components, each with a number of options. Although unit tests may reveal faulty options for individual components, functionally correct components may interact in unforeseen ways to cause a fault. Covering arrays are used to test for interactions among components systematically. A two-stage framework, providing a number of concrete algorithms, is developed for the efficient construction of covering arrays. In the first stage, a time and memory efficient randomized algorithm covers most of the interactions. In the second stage, a more sophisticated search covers the remainder in relatively few tests. In this way, the storage limitations of the sophisticated search algorithms are avoided; hence the range of the number of components for which the algorithm can be applied is extended, without increasing the number of tests. Many of the framework instantiations can be tuned to optimize a memory-quality trade-off, so that fewer tests can be achieved using more memory. The algorithms developed outperform the currently best known methods when the number of components ranges from 20 to 60, the number of options for each ranges from 3 to 6, and $t$-way interactions are covered for $t\in \{5,6\}$. In some cases a reduction in the number of tests by more than $50\%$ is achieved.
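The two-stage idea can be sketched in miniature (for tiny parameters, with a deliberately naive second stage standing in for the paper's more sophisticated search; all names here are mine, not from the paper):

```python
import itertools
import random

def two_stage_covering_array(k, v, t, stage1_tests, seed=0):
    """Sketch of the two-stage idea: stage 1 covers most t-way
    interactions with cheap random tests; stage 2 covers the rest.
    Each test is an assignment of one of v options to each of k
    components; each interaction is a value tuple on t components."""
    rng = random.Random(seed)
    uncovered = {(cols, vals)
                 for cols in itertools.combinations(range(k), t)
                 for vals in itertools.product(range(v), repeat=t)}

    def cover(test):
        for cols in itertools.combinations(range(k), t):
            uncovered.discard((cols, tuple(test[c] for c in cols)))

    tests = []
    # Stage 1: random tests cover most interactions quickly.
    for _ in range(stage1_tests):
        test = [rng.randrange(v) for _ in range(k)]
        tests.append(test)
        cover(test)
    # Stage 2: mop up the remainder, one target interaction per test
    # (a naive stand-in for the paper's more sophisticated search).
    while uncovered:
        cols, vals = next(iter(uncovered))
        test = [rng.randrange(v) for _ in range(k)]
        for c, val in zip(cols, vals):
            test[c] = val
        tests.append(test)
        cover(test)
    return tests

ca = two_stage_covering_array(k=6, v=3, t=2, stage1_tests=15)
print(len(ca))
```

Even this crude version shows the division of labor: the random stage needs almost no bookkeeping, while the second stage only has to handle the small residual set.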

at June 23, 2016 01:02 AM UTC

STOC 2016 just ended and it included many great results with one highlight of course being Laci Babai’s quasipolynomial time algorithm for graph isomorphism. But today I wanted to mention another paper that I found quite interesting and reminded me of the famous Tolstoy quote that

Happy families are all alike; every unhappy family is unhappy in its own way.

I am talking about the work Reed-Muller Codes Achieve Capacity on Erasure Channels by Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Sasoglu and Rudiger Urbanke. We are used to thinking of some error correcting codes as being “better” than others in the sense that they have fewer decoding errors. But it turns out that in some sense all codes of a given rate have the same average number of errors. The only difference is that “bad” codes (such as the repetition code), have a fairly “smooth” error profile in the sense that the probability of decoding success decays essentially like a low degree polynomial with the fraction of errors, while for “good” codes the decay is like a step function, where one can succeed with probability 1 when the error is smaller than some \tau but this probability quickly decays to half when the error passes \tau.

Specifically, if C\subseteq GF(2)^n is a linear code of dimension k and p \in [0,1], we let Y be the random variable over GF(2)^n that is obtained by sampling a random codeword X in C and erasing (i.e., replacing it with \star) every coordinate i independently with probability p. Then we define h_C(p) to be the average over all i of the conditional entropy of X_i given Y_{-i}=(Y_1,\ldots,Y_{i-1},Y_{i+1},\ldots,Y_n). Note that for linear codes, the coordinate X_i is either completely fixed by Y_{-i} or it is a completely uniform bit, and hence h_C(p) can be thought of as the expected fraction of coordinates that we won't be able to decode with probability better than half from the remaining coordinates, each of which survives independently with probability 1-p.
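By linearity, X_i is fixed by the surviving coordinates exactly when column i of the generator matrix lies in the GF(2) span of the surviving columns, which makes h_C(p) easy to estimate by Monte Carlo. A toy sketch (the helper names and the integer-bit-vector encoding are mine):

```python
import random

def reduce_gf2(c, basis):
    """Reduce a bit-vector c (stored as an int) against a GF(2) basis,
    stored as a dict mapping leading-bit position -> basis vector."""
    while c:
        lead = c.bit_length() - 1
        if lead not in basis:
            return c            # nonzero remainder: c is independent
        c ^= basis[lead]
    return 0

def h_estimate(cols, p, trials=4000, seed=0):
    """Monte Carlo estimate of h_C(p) for the linear code whose k x n
    generator matrix has the given columns (k-bit ints): the average
    probability that a random coordinate i is NOT determined by the
    surviving coordinates, i.e. column i is outside their span."""
    rng = random.Random(seed)
    n = len(cols)
    undetermined = 0
    for _ in range(trials):
        i = rng.randrange(n)
        basis = {}
        for j in range(n):
            if j != i and rng.random() > p:   # coordinate j survives
                c = reduce_gf2(cols[j], basis)
                if c:
                    basis[c.bit_length() - 1] = c
        if reduce_gf2(cols[i], basis):
            undetermined += 1
    return undetermined / trials

# Repetition code with n = 4, k = 2 (each symbol sent twice):
# generator-matrix columns as 2-bit integers.
rep = [0b10, 0b10, 0b01, 0b01]
# Here h_C(p) = p^{n/k - 1} = p exactly, so the estimate should be near p.
print(h_estimate(rep, 0.5))
```

For the repetition code a coordinate is undecodable exactly when its twin is erased, which is why the estimate at p = 0.5 should come out near 0.5.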

One formalization of this notion that all codes have the same average number of errors is known as the Area Law for EXIT functions, which states that for every code C of dimension k, the integral \int_0^1 h_C(p) dp is a fixed constant independent of C. In particular note that if C is the simple "repetition code" where we simply repeat every symbol n/k times, then the probability we can't decode some coordinate from the remaining ones (in which case the entropy is one) is exactly p^{n/k-1} where p is the erasure probability. Hence in this case we can easily compute the integral \int_0^1 h_C(p)dp = \int_0^1 p^{n/k-1}dp = k/n, which is simply the rate of the code. The area law then tells us that the average entropy always equals the rate, for every code. A code is said to be capacity achieving if there is some function \epsilon=\epsilon(n) that goes to zero with n such that h_C(p)<\epsilon whenever p< 1-k/n-\epsilon. The area law immediately implies that in this case it must be that h_C(p) is close to one when p>1-k/n (since otherwise the total integral would be smaller than k/n), and hence a code is capacity achieving if and only if the function h_C(p) has a threshold behavior.

The paper above uses this observation to show that the Reed-Muller code is capacity achieving for this binary erasure channel. The only property they use is the symmetry of this code, which means that for this code we might as well have defined h_C(p) with some fixed coordinate (e.g., the first one). In this case, using linearity, we can see that for every erasure pattern \alpha on the coordinates 2,\ldots, n the entropy of X_1 given Y_2,\ldots,Y_n is a Boolean monotone function of \alpha. (Booleanity follows because in a linear subspace the entropy of the remaining coordinate is either zero or one; monotonicity follows because in the erasure channel erasing more coordinates cannot help you decode.) One can then use the papers of Friedgut or Friedgut-Kalai to establish that such a symmetric monotone function has a sharp threshold. (The Reed-Muller code has an additional stronger property of double transitivity, which allows one to deduce that one can decode not just most coordinates but all coordinates with high probability when the fraction of errors is smaller than the capacity.)


How do you prove this area law? The idea is simple. Because of linearity, we can think of the following setting: suppose we have the all-zero codeword, and we permute its coordinates randomly and reveal the first t=(1-p)n of them. Then the probability that the t+1^{th} coordinate is determined to be zero as well is 1-h_C(p). Another way to say it is that if we permute the columns of the k\times n generating matrix A of C randomly, then the probability that the t+1^{th} column is independent from the first t columns is h_C(1-t/n). In other words, if we keep track of the rank of the first t columns, then at step t the probability that the rank will increase by one is h_C(1-t/n), but since we know that the rank of all n columns is k, it follows that \sum_{t=1}^n h_C(1-t/n) = n \int_0^1 h_C(p)dp = k, which is what we wanted to prove. QED
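The rank-tracking step can be checked directly: for any permutation of the columns, the rank increments sum to exactly k. A toy sketch (the 3 x 7 generator matrix whose columns are all nonzero 3-bit vectors is my own example):

```python
import random

def reduce_gf2(c, basis):
    """Reduce a bit-vector c (an int) against a GF(2) basis, stored as
    a dict mapping leading-bit position -> basis vector."""
    while c:
        lead = c.bit_length() - 1
        if lead not in basis:
            return c
        c ^= basis[lead]
    return 0

def rank_increments(cols, rng):
    """Shuffle the columns of a generator matrix (k-bit ints) and
    record, at each step, whether the next column raises the rank."""
    cols = cols[:]
    rng.shuffle(cols)
    basis, incs = {}, []
    for c in cols:
        r = reduce_gf2(c, basis)
        if r:
            basis[r.bit_length() - 1] = r
        incs.append(1 if r else 0)
    return incs

# Toy k = 3, n = 7 code: columns are all nonzero 3-bit vectors.
cols = list(range(1, 8))
rng = random.Random(1)
for _ in range(3):
    incs = rank_increments(cols, rng)
    print(incs, "sum =", sum(incs))   # the sum is always k = 3
```

Averaging the increment at step t over random permutations recovers h_C(1-t/n), which is exactly the identity the proof integrates.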


p.s. Thanks to Yuval Wigderson, whose senior thesis is a good source for these questions.

by Boaz Barak at June 22, 2016 08:45 PM UTC

In my own anti-Trump post two weeks ago, I started out by mentioning that Terry Tao and Stephen Hawking had recently denounced Trump, and jokingly wondered when we’d hear from Ed Witten.  Well, will Leonard Susskind of Stanford University—a creator of string theory, and one of the most legendarily original physicists and physics expositors of our time—do instead?

Over the last decade, it’s been a privilege for me to get to know Lenny, to learn from him, and recently, to collaborate with him on quantum circuit complexity and AdS/CFT.  Today, Lenny wrote to ask whether I’d share his open letter about the US election on this blog.  Of course I said yes.  Better yet, Lenny has agreed to my request to be available here to answer questions and comments.  Lenny’s views, even when close to mine (as they certainly are in this case), are still his, and I’d never want to speak on his behalf.  Better that you should hear it straight from the horse’s mouth—as you now will, without further ado.  –Scott A.

Letter to My Friends, by Leonard Susskind

I’m watching this thing that’s happening with disbelief, dismay, and disgust. There is a lunatic loose—I’m sure we all agree about that—but I keep hearing people say that they can’t vote for Hillary. I heard it at my daughter’s birthday party Sunday. Boy oh boy, will these people be sorry if the lunatic gets his way. Personally I do not find it an excuse that “I live in California, which will go Democrat whatever I do.”

I strongly believe in all things Bernie, but Hillary is not the Anti-Bernie. There is much less difference between Clinton and Sanders than the distortions of the nominating process might lead people to think. She’s for health care, he’s for health care; he’s for increased minimum wage, she’s for increased minimum wage; she’s for immigrant rights, he’s for immigrant rights; and on and on it goes.

The lunatic may be just that—a lunatic—but he is also a master of smear and innuendo.  He is a gigantic liar, and he knows that if you keep saying something over and over, it sticks in people’s minds. It’s called the Big Lie, and it works. Say it enough and it sows confusion and distrust, not only among the know-nothings, but even among those who know better.

The lunatic and his supporters are exceedingly dangerous. Tell your friends: don’t be fooled. The only thing between us and the lunatic is Hillary. Get off your ass and vote in Nov.

Leonard Susskind

Director, Stanford Institute for Theoretical Physics,

Stanford University


by Scott at June 22, 2016 08:41 PM UTC

Authors: Quirijn W. Bouts, Irina Kostitsyna, Marc van Kreveld, Wouter Meulemans, Willem Sonke, Kevin Verbeek
Download: PDF
Abstract: We show how to represent a simple polygon $P$ by a grid (pixel-based) polygon $Q$ that is simple and whose Hausdorff or Fr\'echet distance to $P$ is small. For any simple polygon $P$, a grid polygon exists with constant Hausdorff distance between their boundaries and their interiors. Moreover, we show that with a realistic input assumption we can also realize constant Fr\'echet distance between the boundaries. We present algorithms accompanying these constructions, heuristics to improve their output while keeping the distance bounds, and experiments to assess the output.

at June 22, 2016 01:12 AM UTC

Authors: Ben Strasser
Download: PDF
Abstract: We study the earliest arrival problem in road networks with static time-dependent functions as arc weights. We propose and evaluate the following simple algorithm: (1) average the travel time in k time windows, (2) compute a shortest time-independent path within each window and mark the edges in these paths, and (3) compute a shortest time-dependent path in the original graph restricted to the marked edges. Our experimental evaluation shows that this simple algorithm yields near-optimal results on well-established benchmark instances. We further demonstrate that the error can be reduced by additionally considering alternative routes, at the expense of more marked edges. Finally, we show that the resulting subgraphs are small enough to efficiently implement profile queries using a simple sampling-based approach. A highlight of our introduced algorithms is that they do not rely on linking and merging profile functions.
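The three steps of the abstract can be sketched in a few dozen lines. This is only an illustrative rendition on a hypothetical toy network with a rush-hour spike on one edge; the graph, the travel-time functions, and the choice of window midpoints as "averages" are my assumptions, not the paper's benchmark setup.

```python
import heapq

def dijkstra(adj, src, w):
    """Static Dijkstra; w(u, v, f) gives the edge cost. Returns (dist, pred)."""
    dist, pred, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, f in adj.get(u, []):
            nd = d + w(u, v, f)
            if nd < dist.get(v, float("inf")):
                dist[v], pred[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return dist, pred

def path_edges(pred, src, dst):
    """Recover the edges of the shortest path src -> dst from the pred map."""
    edges, v = set(), dst
    while v != src:
        edges.add((pred[v], v))
        v = pred[v]
    return edges

def td_dijkstra(adj, src, t0, allowed=None):
    """Time-dependent Dijkstra: labels are earliest arrival times."""
    arr, pq = {src: t0}, [(t0, src)]
    while pq:
        t, u = heapq.heappop(pq)
        if t > arr.get(u, float("inf")):
            continue
        for v, f in adj.get(u, []):
            if allowed is not None and (u, v) not in allowed:
                continue
            nt = t + f(t)   # departure at t, edge cost f(t)
            if nt < arr.get(v, float("inf")):
                arr[v] = nt
                heapq.heappush(pq, (nt, v))
    return arr

# Toy network: arc weights are functions of departure time
# (a congestion spike on A->B during [0, 5)).
adj = {
    "A": [("B", lambda t: 4.0 if t < 5 else 1.0), ("C", lambda t: 2.0)],
    "B": [("D", lambda t: 1.0)],
    "C": [("D", lambda t: 2.0)],
}

# Step 1: average travel time per window (here: sample the window midpoint).
windows = [(0.0, 5.0), (5.0, 10.0)]
marked = set()
for lo, hi in windows:
    mid = (lo + hi) / 2
    # Step 2: static shortest path under window-averaged weights; mark its edges.
    dist, pred = dijkstra(adj, "A", lambda u, v, f: f(mid))
    marked |= path_edges(pred, "A", "D")

# Step 3: time-dependent query restricted to the marked subgraph.
arr = td_dijkstra(adj, "A", t0=0.0, allowed=marked)
print(arr["D"])  # earliest arrival at D when departing A at t = 0
```

Note how the two windows mark different paths (A-C-D during the spike, A-B-D after it), so the restricted time-dependent query still finds the true earliest-arrival route.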

at June 22, 2016 01:11 AM UTC

Authors: Olivier Bodini, Camille Coti, Julien David
Download: PDF
Abstract: In this paper, we study a parallel version of Galton-Watson processes for the random generation of tree-shaped structures. Random trees are useful in many situations (testing, binary search, simulation of physics phenomena, ...), as attested by more than 49,000 citations on Google Scholar. Using standard analytic combinatorics, we first give a theoretical, average-case study of the random process in order to evaluate how parallelism can be extracted from this process, and we deduce a parallel generation algorithm. Then we present how it can be implemented in a task-based parallel paradigm for shared memory (here, Intel Cilk). This implementation faces several challenges, among which are efficient, thread-safe random bit generation, memory management, and algorithmic modifications for small-grain parallelism. Finally, we evaluate the performance of our implementation and the impact of different choices and parameters. We obtain a significant efficiency improvement for the generation of big trees. We also conduct empirical and theoretical studies of the average behaviour of our algorithm.

at June 22, 2016 01:11 AM UTC

Authors: Gonzalo Navarro
Download: PDF
Abstract: The Block Tree is a recently proposed data structure that reaches compression close to Lempel-Ziv while supporting efficient direct access to text substrings. In this paper we show how a self-index can be built on top of a Block Tree so that it provides efficient pattern searches while using space proportional to that of the original data structure. We also show how the structure can be extended to perform document listing and range-restricted pattern matching.

at June 22, 2016 01:00 AM UTC

Authors: Cornelius Brand, Holger Dell, Marc Roth
Download: PDF
Abstract: Jaeger, Vertigan, and Welsh [15] proved a dichotomy for the complexity of evaluating the Tutte polynomial at fixed points: The evaluation is #P-hard almost everywhere, and the remaining points admit polynomial-time algorithms. Dell, Husfeldt, and Wahl\'en [9] and Husfeldt and Taslaman [12], in combination with Curticapean [7], extended the #P-hardness results to tight lower bounds under the counting exponential time hypothesis #ETH, with the exception of the line $y=1$, which was left open. We complete the dichotomy theorem for the Tutte polynomial under #ETH by proving that the number of all acyclic subgraphs of a given $n$-vertex graph cannot be determined in time $exp(o(n))$ unless #ETH fails.

Another dichotomy theorem we strengthen is the one of Creignou and Hermann [6] for counting the number of satisfying assignments to a constraint satisfaction problem instance over the Boolean domain. We prove that all #P-hard cases are also hard under #ETH. The main ingredient is to prove that the number of independent sets in bipartite graphs with $n$ vertices cannot be computed in time $exp(o(n))$ unless #ETH fails. In order to prove our results, we use the block interpolation idea by Curticapean [7] and transfer it to systems of linear equations that might not directly correspond to interpolation.

at June 22, 2016 01:00 AM UTC

Authors: Martin Fürer
Download: PDF
Abstract: Tree-width and path-width are widely successful concepts. Many NP-hard problems have efficient solutions when restricted to graphs of bounded tree-width. Many efficient algorithms are based on a tree decomposition. Sometimes the more restricted path decomposition is required. The bottleneck for such algorithms is often the computation of the width and a corresponding tree or path decomposition. For graphs with $n$ vertices and tree-width or path-width $k$, the standard linear time algorithm to compute these decompositions dates back to 1996. Its running time is linear in $n$ and exponential in $k^3$ and not usable in practice. Here we present a more efficient algorithm to compute the path-width and provide a path decomposition. Its running time is $2^{O(k^2)} n$. In the classical algorithm of Bodlaender and Kloks, the path decomposition is computed from a tree decomposition. Here, an optimal path decomposition is computed from a path decomposition of about twice the width. The latter is computed from a constant factor smaller graph.

at June 22, 2016 01:08 AM UTC

Authors: Walter F. Mascarenhas
Download: PDF
Abstract: We present fast and accurate ways to normalize two and three dimensional vectors and quaternions and compute their length. Our approach is an adaptation of ideas used in the linear algebra library LAPACK, and we believe that the computational geometry and computer aided design communities are not aware of the possibility of speeding up these fundamental operations in the robust way proposed here.

at June 22, 2016 01:12 AM UTC

Authors: Wouter Meulemans
Download: PDF
Abstract: To produce cartographic maps, simplification is typically used to reduce complexity of the map to a legible level. With schematic maps, however, this simplification is pushed far beyond the legibility threshold and is instead constrained by functional need and resemblance. Moreover, stylistic geometry is often used to convey the schematic nature of the map. In this paper we explore discretized approaches to computing a schematic shape $S$ for a simple polygon $P$. We do so by overlaying a plane graph $G$ on $P$ as the solution space for the schematic shape. Topological constraints imply that $S$ should describe a simple polygon. We investigate two approaches, simple map matching and connected face selection, based on commonly used similarity metrics.

With the former, $S$ is a simple cycle $C$ in $G$ and we quantify resemblance via the Fr\'echet distance. We prove that it is NP-hard to compute a cycle that approximates the minimal Fr\'echet distance over all simple cycles in a plane graph $G$. This result holds even if $G$ is a partial grid graph, if area preservation is required and if we assume a given sequence of turns is specified.

With the latter, $S$ is a connected face set in $G$, quantifying resemblance via the symmetric difference. Though the symmetric difference seems a less strict measure, we prove that it is NP-hard to compute the optimal face set. This result holds even if $G$ is a full grid graph or a triangular or hexagonal tiling, and if area preservation is required. Moreover, it is independent of whether we allow the set of faces to have holes or not.

at June 22, 2016 01:12 AM UTC

Authors: Jonathan Gorard
Download: PDF
Abstract: This paper presents the novel `uniqueness tree' algorithm, as one possible method for determining whether two finite, undirected graphs are isomorphic. We prove that the algorithm has polynomial time complexity in the worst case, and that it will always detect the presence of an isomorphism whenever one exists. We also propose that the algorithm will equivalently discern the lack of an isomorphism whenever one does not exist, and some initial justifications are given for this proposition, although it cannot yet be rigorously proven. Finally, we present experimental evidence for both the effectiveness and efficiency of the uniqueness tree method, using data gathered from a practical implementation of the algorithm. Some consequences and directions for further research are discussed.

at June 22, 2016 01:08 AM UTC

Authors: Brian Brubach, Karthik Abinav Sankararaman, Aravind Srinivasan, Pan Xu
Download: PDF
Abstract: Online matching has received significant attention over the last 15 years due to its close connection to Internet advertising. Since the seminal work of Karp, Vazirani, and Vazirani achieves an optimal (1 - 1/e) competitive ratio in the standard adversarial online model, much effort has gone into developing useful online models that incorporate some stochasticity in the arrival process. One such popular model is the known I.I.D. model, where different customer types arrive online from a known distribution. We develop algorithms with improved competitive ratios for some basic variants of this model with integral arrival rates, including (a) the case of general weighted edges, where we improve the best-known ratio of 0.667 due to Haeupler, Mirrokni and Zadimoghaddam to 0.705, and (b) the vertex-weighted case, where we improve the 0.7250 ratio of Jaillet and Lu to 0.7299. We also consider two extensions: known I.I.D. with non-integral arrival rates and stochastic rewards, and known I.I.D. b-matching with non-integral arrival rates and stochastic rewards. We present a simple non-adaptive algorithm which works well simultaneously on the two extensions.

at June 22, 2016 01:08 AM UTC

Authors: John Iacono, Mark Yagnatinsky
Download: PDF
Abstract: We present the first potential function for pairing heaps with linear range. This implies that the runtime of a short sequence of operations is faster than previously known. It is also simpler than the only other potential function known to give constant amortized time for insertion.

at June 22, 2016 01:11 AM UTC

The June 2016 issue of the Bulletin of the EATCS is now available on line. If you prefer, you can download a pdf with the printed version of the Bulletin. As usual, thanks to the support of the members of the EATCS, the Bulletin is open access.

This issue of the Bulletin features its regular columns. Some of you might also be interested in advice to young researchers from Michael Fellows and other Fellows of the EATCS, and in interviews with the recipients of the Gödel Prize 2016 and of the first Alonzo Church Award.

Thanks to Kazuo Iwama, the editor in chief of the Bulletin, and the EATCS Secretary Office for their work on another excellent issue of the Bulletin.

by Luca Aceto ( at June 21, 2016 10:43 PM UTC

In my last post When does n divide a_n? I gave a sequence:




a(1) = 0, a(2) = 2, a(3) = 3, and for all n ≥ 4, a(n) = a(n-2) + a(n-3)

and I noted that for 2 ≤ n ≤ 23 it looked like

n divides a(n) iff n is prime

and challenged my readers to prove it or disprove it.

I thought it would be hard to find the first counterexample and hence I would make the point:

Just because a pattern holds for the first 271440 numbers does not mean that it always holds!

However, several readers found that 521^2 = 271441 is a counterexample. In fact, they found it rather quickly. I thought it would take longer, since this blog post (also my inspiration) by Colin Wright seemed to indicate that fancy data structures and algorithmic tricks are needed. So I emailed Colin about this, and it turns out he originally wrote the program many years ago. I quote his email:

> I posted on Perrin Primes (without calling them that) and people seemed to
> find the counterexample, 521^2, easily.

Yes.  These days the machines are faster, and languages with arbitrary
precision arithmetic are common-place.  When I first came across this
problem, if you wanted arithmetic on more than 32 bits you had to write
the arbitrary precision package yourself.  This was pre-WWW,
although not pre-internet.  Quite.

> So I wondered why your code needed those optimizations.

Even now, it's easy to find the first few counter-examples, but it rapidly
gets out-of-hand.  The sequence grows exponentially, and very soon you are
dealing with numbers that have thousands of digits. Again, not that bad now,
but impossible when I first did it, and even now, to find anything beyond the
first 20 or 30 counter-examples takes a very long time.

So ask people how to find the first 1000 counter-examples,
and what they notice about them all.

back to my post:

It is known that if n is prime then n divides a(n). (ADDED LATER: see here for a proof.) The converse is false: composites n such that n divides a(n) are called Perrin Pseudoprimes. The following questions seem open, interesting, and rather difficult:
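The check is easy to replay numerically. The sketch below runs the recurrence mod n (the initial values a(1) = 0, a(2) = 2, a(3) = 3 are the standard Perrin ones); reducing mod n keeps every number below n, which is why the exponential growth Colin mentions is no obstacle for finding the first counterexample.

```python
def perrin_mod(n, m):
    """a(n) mod m, where a(1)=0, a(2)=2, a(3)=3 and a(k) = a(k-2) + a(k-3)."""
    a1, a2, a3 = 0 % m, 2 % m, 3 % m
    if n == 1:
        return a1
    if n == 2:
        return a2
    for _ in range(n - 3):          # slide the window (a(k-2), a(k-1), a(k))
        a1, a2, a3 = a2, a3, (a1 + a2) % m
    return a3

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# the pattern "n divides a(n) iff n is prime" holds for 2 <= n <= 23 ...
assert all((perrin_mod(n, n) == 0) == is_prime(n) for n in range(2, 24))
# ... but fails at the Perrin pseudoprime 271441 = 521^2
assert perrin_mod(271441, 271441) == 0 and not is_prime(271441)
print("first counterexample: 271441")
```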

1) Why is the first Perrin Pseudoprime so large?

2) Are there other simple recurrences b(n) for which "n divides b(n) iff n is prime" looks true for a long initial stretch?

by GASARCH ( at June 21, 2016 06:54 PM UTC

Here are slides from the CATCS presentation at the STOC ’16 business meeting, and here are Jack Snoeyink’s slides from his NSF presentation.

by timroughgarden at June 21, 2016 06:38 PM UTC

Hello?  Hello?  Is this thing still on?

I've been a bit quiet here for the last few years, partly as a new parent, learning to deal with children, and more recently as an associate department head, learning to deal with bureaucracy.  (The two skills are remarkably similar!)  But now that my kids are well into their transformation from cute gobs of poo into full-fledged independent human beings, and my three-year sentence (er, term) as an administrator is coming to a close, I find myself wondering what I'm going to do when I'm no longer a grown-up.

My main job as associate head was hiring tenure-track faculty.  Illinois CS just had an excellent year—five new faculty are joining us this fall and several more are starting in Fall 2017—but I need to wait for the ink to dry on our final offers before talking about the hiring process in any detail.

My other job was helping to launch a major revision of our undergraduate theory courses, as part of a wider curriculum revision.  Our new required theory course (CS 374) has a steady-state enrollment of 400 students per semester.  This curriculum change, combined with our skyrocketing enrollments (just like everyone else's) has had some interesting side-effects, which I'm sure I'll talk more about later.

One side-effect of the change is that my algorithms lecture notes no longer mirror the structure of the courses I teach, so I'm starting a major revision. With videos. Stay tuned.

But enough about me.  The title of this post is a line from a 1940s cold-reading test for prospective radio announcers, popularized by Jerry Lewis (“The Announcer's Test”), Flo and Eddie (“The Tibetan Memory Trick”), and many others. The test has the same cumulative structure as “The Twelve Days of Christmas” or some versions of “Old MacDonald Had a Farm”; the kth “verse” consists of the first k lines from the following list (with “and” inserted between lines k–1 and k):

  • One hen
  • Two ducks
  • Three squawking geese
  • Four limerick oysters
  • Five corpulent porpoises
  • Six pairs of Don Alverzo's tweezers
  • Seven thousand Macedonian warriors in full battle array
  • Eight brass monkeys from the ancient, sacred, holy crypts of Egypt
  • Nine sympathetic, apathetic, diabetic, old men on roller skates, with a marked propensity towards procrastination and sloth
  • Ten lyrical, spherical, diabolical denizens of the deep who all stall around the corner on the quo of the quay of the quivvey, all at the same time.

Roughly, the length of each line increases linearly, which means the length of each verse increases quadratically, which means the length of the entire announcement is cubic in the number of lines.  This is the only "song" I know that requires Θ(n^3) time to "sing" the first n verses.  On the other hand, just writing down n lines requires Θ(n^2) space, so in terms of Knuth's infamous paper “The Complexity of Songs”, the Announcer's Test has complexity Θ(N^(2/3)).  This is the only example I know of a "song" with this complexity.
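The count above can be replayed in a toy model. The one assumption (mine, for the sake of clean formulas) is that line i has length exactly i, matching the roughly linear growth noted above.

```python
# Toy model of the cumulative song: line i has length i (assumed).

def verse_len(k):          # verse k = first k lines
    return k * (k + 1) // 2

def song_len(n):           # total length of singing verses 1..n
    return sum(verse_len(k) for k in range(1, n + 1))

def text_len(n):           # distinct text to memorize: just lines 1..n
    return n * (n + 1) // 2

n = 100
N = song_len(n)            # Theta(n^3): n(n+1)(n+2)/6
S = text_len(n)            # Theta(n^2): n(n+1)/2
assert N == n * (n + 1) * (n + 2) // 6
assert S == n * (n + 1) // 2
# Knuth measures space as a function of sung length N, so S = Theta(N^(2/3));
# the constant is (1/2) / (1/6)^(2/3), which the ratio below approaches.
assert abs(S / N ** (2 / 3) - 0.5 * 6 ** (2 / 3)) < 0.1
print(n, S, N)
```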

Do you know of any other songs with complexity Θ(N^(2/3))? How about Θ(N^c) for any constant c other than 0, 1/2, 2/3, or 1?  I look forward to reading your compositions in the comments.

by Jeff Erickson at June 21, 2016 03:38 PM UTC

Authors: Charalampos Tsourakakis, Jakub Pachocki, Michael Mitzenmacher
Download: PDF
Abstract: We develop new methods based on graph motifs for graph clustering, allowing more efficient detection of communities within networks. We focus on triangles within graphs, but our techniques extend to other motifs as well. Our intuition, which has been suggested but not formalized similarly in previous works, is that triangles are a better signature of community than edges. We therefore generalize the notion of conductance for a graph to {\em triangle conductance}, where the edges are weighted according to the number of triangles containing the edge. This methodology allows us to develop variations of several existing clustering techniques, including spectral clustering, that minimize triangles split by the cluster instead of edges cut by the cluster. We provide theoretical results in a planted partition model to demonstrate the potential for triangle conductance in clustering problems. We then show experimentally the effectiveness of our methods to multiple applications in machine learning and graph mining.
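The edge re-weighting at the heart of the abstract is easy to illustrate. The barbell graph and the direct cut evaluation below are my own toy assumptions, not the paper's method or benchmarks; they just show why a bridge between two communities, which lies in no triangle, gets weight zero under triangle conductance.

```python
nodes = list("abcdef")
# two triangles {a,b,c} and {d,e,f} joined by the bridge c-d
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

adj = {u: set() for u in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# weight of an edge = number of triangles containing it
# (= number of common neighbors of its endpoints)
tri_w = {frozenset((u, v)): len(adj[u] & adj[v]) for u, v in edges}

def conductance(cut, weights):
    cut = set(cut)
    rest = set(nodes) - cut
    crossing = sum(w for e, w in weights.items() if len(e & cut) == 1)
    vol = lambda side: sum(w * len(e & side) for e, w in weights.items())
    return crossing / min(vol(cut), vol(rest))

edge_w = {e: 1 for e in tri_w}   # ordinary (edge) conductance, for comparison
print(conductance("abc", edge_w), conductance("abc", tri_w))
```

The natural community cut {a, b, c} has positive edge conductance (1/7 here) but triangle conductance exactly 0, since the only crossing edge is in no triangle.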

at June 21, 2016 12:00 AM UTC

Authors: Richard Barnes
Download: PDF
Abstract: Algorithms for extracting hydrologic features and properties from digital elevation models (DEMs) are challenged by large datasets, which often cannot fit within a computer's RAM. Depression filling is an important preconditioning step to many of these algorithms. Here, I present a new, linearly-scaling algorithm which parallelizes the Priority-Flood depression-filling algorithm by subdividing a DEM into tiles. Using a single-producer, multi-consumer design, the new algorithm works equally well on one core, multiple cores, or multiple machines and can take advantage of large memories or cope with small ones. Unlike previous algorithms, the new algorithm guarantees a fixed number of memory access and communication events per subdivision of the DEM. In comparison testing, this results in the new algorithm running generally faster while using fewer resources than previous algorithms. For moderately sized tiles, the algorithm exhibits ~60% strong and weak scaling efficiencies up to 48 cores, and linear time scaling across datasets ranging over three orders of magnitude. The largest dataset on which I run the algorithm has 2 trillion (2*10^12) cells. With 48 cores, processing required 4.8 hours wall-time (9.3 compute-days). This test is three orders of magnitude larger than any previously performed in the literature. Complete, well-commented source code and correctness tests are available for download from a repository.

at June 21, 2016 12:00 AM UTC

Authors: Mahdi Ghamkhari, Ashkan Sadeghi-Mobarakeh, Hamed Mohsenian-Rad
Download: PDF
Abstract: Strategic bidding problems in electricity markets are widely studied in power systems, often by formulating complex bi-level optimization problems that are hard to solve. The state-of-the-art approach to solve such problems is to reformulate them as mixed-integer linear programs (MILPs). However, the computational time of such MILP reformulations grows dramatically once the network size increases, the scheduling horizon increases, or randomness is taken into consideration. In this paper, we take a fundamentally different approach and propose effective and customized convex programming tools to solve the strategic bidding problem for producers in nodal electricity markets. Our approach is inspired by Schmüdgen's Positivstellensatz theorem in semi-algebraic geometry; but then we go through several steps based upon both convex optimization and mixed-integer programming that result in obtaining close-to-optimal bidding solutions, as evidenced by several numerical case studies, besides having a huge advantage in reducing computation time. While the computation time of the state-of-the-art MILP approach grows exponentially when we increase the scheduling horizon or the number of random scenarios, the computation time of our approach increases rather linearly.

at June 21, 2016 12:00 AM UTC

Authors: Archontia C. Giannopoulou, Michał Pilipczuk, Jean-Florent Raymond, Dimitrios M. Thilikos, Marcin Wrochna
Download: PDF
Abstract: Cutwidth is one of the classic layout parameters for graphs. It measures how well one can order the vertices of a graph in a linear manner, so that the maximum number of edges between any prefix and its complement suffix is minimized. As graphs of cutwidth at most $k$ are closed under taking immersions, the results of Robertson and Seymour imply that there is a finite list of minimal immersion obstructions for admitting a cut layout of width at most $k$. We prove that every minimal immersion obstruction for cutwidth at most $k$ has size at most $2^{O(k^3\log k)}$. For our proof, we introduce the concept of a lean ordering that can be seen as the analogue of lean decompositions defined by Thomas in [A Menger-like property of tree-width: The finite case, J. Comb. Theory, Ser. B, 48(1):67--76, 1990] for the case of treewidth. As an interesting algorithmic byproduct, we design a new fixed-parameter algorithm for computing the cutwidth of a graph that runs in time $2^{O(k^2\log k)}\cdot n$, where $k$ is the optimum width and $n$ is the number of vertices. While being slower by a $\log k$-factor in the exponent than the fastest known algorithm, given by Thilikos, Bodlaender, and Serna in [Cutwidth I: A linear time fixed parameter algorithm, J. Algorithms, 56(1):1--24, 2005] and [Cutwidth II: Algorithms for partial $w$-trees of bounded degree, J. Algorithms, 56(1):25--49, 2005], our algorithm has the advantage of being simpler and self-contained; arguably, it explains better the combinatorics of optimum-width layouts.

at June 21, 2016 12:00 AM UTC

Authors: Siddharth Gupta, Diana Palsetia, Md. Mostofa Ali Patwary, Ankit Agrawal, Alok Choudhary
Download: PDF
Abstract: Connected Component Labeling (CCL) is an important step in pattern recognition and image processing. It assigns labels to the pixels such that adjacent pixels sharing the same features are assigned the same label. Typically, CCL requires several passes over the data. We focus on two-pass technique where each pixel is given a provisional label in the first pass whereas an actual label is assigned in the second pass.

We present a scalable parallel two-pass CCL algorithm, called PAREMSP, which employs a scan strategy and the best union-find technique called REMSP, which uses REM's algorithm for storing label equivalence information of pixels in a 2-D image. In the first pass, we divide the image among threads and each thread runs the scan phase along with REMSP simultaneously. In the second phase, we assign the final labels to the pixels. As REMSP is easily parallelizable, we use the parallel version of REMSP for merging the pixels on the boundary. Our experiments show the scalability of PAREMSP achieving speedups up to $20.1$ using $24$ cores on shared memory architecture using OpenMP for an image of size $465.20$ MB. We find that our proposed parallel algorithm achieves linear scaling for a large resolution fixed problem size as the number of processing elements are increased. Additionally, the parallel algorithm does not make use of any hardware specific routines, and thus is highly portable.
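The classic serial two-pass scheme the abstract builds on can be sketched compactly. This is only the textbook sequential version with a union-find for label equivalences (4-connectivity, path halving), not PAREMSP or REMSP themselves.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def label(image):
    """Two-pass connected component labeling of a binary image (4-connectivity)."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]                      # parent[0] is a dummy for background
    nxt = 1
    # pass 1: provisional labels, recording equivalences in the union-find
    for i in range(h):
        for j in range(w):
            if not image[i][j]:
                continue
            up = labels[i - 1][j] if i else 0
            left = labels[i][j - 1] if j else 0
            if not up and not left:
                parent.append(nxt)    # fresh provisional label
                labels[i][j] = nxt
                nxt += 1
            else:
                cand = [l for l in (up, left) if l]
                labels[i][j] = min(cand)
                if len(cand) == 2:    # up and left meet: record up ~ left
                    ru, rl = find(parent, up), find(parent, left)
                    parent[max(ru, rl)] = min(ru, rl)
    # pass 2: rewrite every pixel with its final root label
    for i in range(h):
        for j in range(w):
            if labels[i][j]:
                labels[i][j] = find(parent, labels[i][j])
    return labels

img = [[1, 0, 1],
       [1, 0, 1],
       [1, 1, 1]]
print(label(img))   # the two vertical strips merge via the bottom row
```

The parallel variants in the paper split the image among threads in pass 1 and merge labels across tile boundaries; the union-find is what makes that merge step cheap.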

at June 21, 2016 12:00 AM UTC

Authors: Oliver Knill
Download: PDF
Abstract: We formulate Goldbach type questions for Gaussian, Hurwitz, Octavian and Eisenstein primes. They are different from Goldbach type statements by Takayoshi Mitsui from 1960 for number fields or C.A. Holben and James Jordan from 1968 for Gaussian integers. Here is what we measure: 1) Every even Gaussian integer a+ib satisfying a>2, b>2 is a sum of two Gaussian primes with positive coefficients. 2) Every Eisenstein integer a+bw with a>3, b>3 and w=(1+sqrt(-3))/2 is the sum of two Eisenstein primes with positive coefficients. Note that no evenness condition is asked in the Eisenstein case. 3) Every Lipschitz integer quaternion with positive entries is the sum of two Hurwitz primes. 4) There exists a constant K such that every Octavian integer with coefficients larger than K is the sum of two Octavian primes. Except in the Octonion case, where the fewest experiments were done, the statements can be linked to difficult questions like the Landau or Bunyakovsky conjectures. We therefore also look more closely and numerically at some Hardy-Littlewood constants, following early computations by Daniel Shanks and Marvin Wunderlich from the 1950s and 1970s.

at June 21, 2016 12:00 AM UTC

Authors: Wilson S. Siringoringo, Andy M. Connor, Nick Clements, Nick Alexander
Download: PDF
Abstract: Minimum Cost Polygon Overlay (MCPO) is a unique two-dimensional optimization problem that involves the task of covering a polygon shaped area with a series of rectangular shaped panels. This has a number of applications in the construction industry. This work examines the MCPO problem in order to construct a model that captures essential parameters of the problem, to be solved automatically using numerical optimization algorithms. Three algorithms have been implemented for the actual optimization task: greedy search, the Monte Carlo (MC) method, and the Genetic Algorithm (GA). Results are presented to show the relative effectiveness of the algorithms. This is followed by critical analysis of various findings of this research.

at June 21, 2016 12:00 AM UTC

Authors: Daniel Doerr, Pedro Feijao, Metin Balaban, Cedric Chauve
Download: PDF
Abstract: The gene family-free framework for comparative genomics aims at developing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity multipartite graph. We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We show that the corresponding computational problem is MAX SNP-hard and we present a 0-1 linear program for its exact solution. The result of our FF-median program is a median genome with median genes associated to extant genes, in which median adjacencies are assumed to define positional orthologs. We demonstrate through simulations and comparison with the OMA orthology database that the method presented here is able to compute accurate medians and positional orthologs for genomes comparable in size to bacterial genomes.

at June 21, 2016 12:00 AM UTC