Theory of Computing Blog Aggregator

Authors: Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi
Download: PDF
Abstract: Indexing very large collections of strings, such as those produced by the widespread next generation sequencing technologies, heavily relies on multi-string generalizations of the Burrows-Wheeler Transform (BWT): recent developments in this field have resulted in external memory algorithms, motivated by the large requirements of in-memory approaches.

The related problem of computing the Longest Common Prefix (LCP) array of a set of strings is often instrumental in several algorithms: for example, to compute the suffix-prefix overlaps among strings, which is an essential step for many genome assembly algorithms.

In this paper we propose a new external memory method to simultaneously build the BWT and the LCP array on a collection of $m$ strings of length $k$ with $O(mkl)$ time and I/O complexity, using $O(k + m)$ main memory, where $l$ is the maximum value in the LCP array.
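
For readers unfamiliar with the two structures, here is a naive in-memory sketch of the BWT and LCP array of a string collection. It is not the external-memory method of the paper. The function name `bwt_and_lcp` and the end-marker convention (string i ends with an implicit marker $_i, with $_0 < $_1 < ... smaller than every letter, all printed as "$") are assumptions made for illustration.

```python
def bwt_and_lcp(strings):
    """Naive BWT and LCP array of a collection (illustration only)."""
    # One entry per suffix: (suffix text, index of its string, start position).
    # Start position len(s) is the empty suffix, i.e. the end marker alone.
    suffixes = []
    for i, s in enumerate(strings):
        for j in range(len(s) + 1):
            suffixes.append((s[j:], i, j))
    # Sorting by (text, string index) matches the assumed marker order:
    # a proper prefix sorts first, and ties are broken by string index.
    suffixes.sort(key=lambda t: (t[0], t[1]))

    # BWT symbol: the character preceding each suffix ("$" for a whole string).
    bwt = [strings[i][j - 1] if j > 0 else "$" for _, i, j in suffixes]

    # LCP[r]: length of the common prefix of the suffixes of rank r-1 and r.
    lcp = [0]
    for (a, _, _), (b, _, _) in zip(suffixes, suffixes[1:]):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp.append(n)
    return "".join(bwt), lcp
```

On the collection ["ab", "ab"], this sketch yields the BWT "bb$$aa" and the LCP array [0, 0, 0, 2, 0, 1], so the maximum LCP value (the $l$ of the abstract) is 2.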

at May 23, 2017 12:00 AM UTC

Authors: Zhaoming Yin, Xuan Shi
Download: PDF
Abstract: Near repeat (NR) is a well-known phenomenon in crime analysis, which assumes that crime events exhibit correlations within a given time and space frame. Traditional NR calculation generates 2 event pairs if 2 events happened within a given space and time limit. When the number of events is large, however, NR calculation is time consuming, and how these pairs are organized has not yet been explored. In this paper, we design a new approach to calculate clusters of NR events efficiently. To begin with, an R-tree is utilized to index crime events; a single event is represented by a vertex, whereas edges are constructed by range querying the vertex in the R-tree, and a graph is formed. Cohesive subgraph approaches are applied to identify the event chains. k-clique, k-truss, k-core plus DBSCAN algorithms are implemented in sequence with respect to their varied range of ability to find cohesive subgraphs. Real-world crime data from Chicago, New York and Washington DC are utilized to conduct experiments. The experiments confirmed that near repeat is a solid effect in real big crime data by conducting MapReduce-empowered Knox tests. The performance of the 4 different algorithms is validated, while the quality of the algorithms is gauged by the distribution of the number of cohesive subgraphs and their clustering coefficients. The proposed framework is the first to process real crime data at million-record scale, and is the first to detect NR events with size of more than 2.
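
The graph construction described above can be sketched with a brute-force pairwise check standing in for the R-tree range query. This is an illustration under stated assumptions, not the authors' implementation: events are assumed to be (x, y, day) tuples in planar coordinates, and the name `near_repeat_edges` and its threshold parameters are hypothetical.

```python
from itertools import combinations
from math import hypot

def near_repeat_edges(events, max_dist, max_days):
    """Return index pairs (i, j) of events close in both space and time.

    events: list of (x, y, day) tuples (assumed format).
    A pairwise O(n^2) scan replaces the R-tree range query of the abstract.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        close_in_space = hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
        close_in_time = abs(a[2] - b[2]) <= max_days
        if close_in_space and close_in_time:
            edges.append((i, j))
    return edges
```

The resulting edge list is the graph on which cohesive-subgraph methods such as k-core would then be run.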

at May 23, 2017 12:00 AM UTC

Authors: Svyatoslav Covanov
Download: PDF
Abstract: In 2012, Barbulescu, Detrey, Estibals and Zimmermann proposed a new framework to exhaustively search for optimal formulae for evaluating bilinear maps, such as Strassen or Karatsuba formulae. The main contribution of this work is a new criterion to aggressively prune useless branches in the exhaustive search, thus leading to the computation of new optimal formulae, in particular for the short product modulo $X^5$ and the circulant product modulo $(X^5-1)$. Moreover, we are able to prove that there is essentially only one optimal decomposition of the product of $3\times 2$ by $2\times 3$ matrices up to the action of some group of automorphisms.
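
As a reminder of what such a formula looks like, the classical Karatsuba formula multiplies two degree-1 polynomials with 3 multiplications instead of 4. The sketch below shows only that textbook case (the function name `karatsuba_deg1` is hypothetical); it says nothing about the new formulae found in the paper.

```python
def karatsuba_deg1(a0, a1, b0, b1):
    """Coefficients of (a0 + a1*X) * (b0 + b1*X) using 3 multiplications."""
    p0 = a0 * b0                  # constant term
    p2 = a1 * b1                  # coefficient of X^2
    p1 = (a0 + a1) * (b0 + b1)    # equals p0 + (a0*b1 + a1*b0) + p2
    return (p0, p1 - p0 - p2, p2)
```

For example, (1 + 2X)(3 + 4X) gives the coefficients (3, 10, 8).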

at May 23, 2017 12:00 AM UTC

Authors: Tatsuhiko Hatanaka, Takehiro Ito, Xiao Zhou
Download: PDF
Abstract: Let $G$ be a graph such that each vertex has its list of available colors, and assume that each list is a subset of the common set consisting of $k$ colors. For two given list colorings of $G$, we study the problem of transforming one into the other by changing only one vertex color assignment at a time, while at all times maintaining a list coloring. This problem is known to be PSPACE-complete even for bounded bandwidth graphs and a fixed constant $k$. In this paper, we study the fixed-parameter tractability of the problem when parameterized by several graph parameters. We first give a fixed-parameter algorithm for the problem when parameterized by $k$ and the modular-width of an input graph. We next give a fixed-parameter algorithm for the shortest variant when parameterized by $k$ and the size of a minimum vertex cover of an input graph. As corollaries, we show that the problem for cographs and the shortest variant for split graphs are fixed-parameter tractable even when only $k$ is taken as a parameter. On the other hand, we prove that the problem is W[1]-hard when parameterized only by the size of a minimum vertex cover of an input graph.
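
To make the reconfiguration problem concrete, the following brute-force sketch runs a breadth-first search over all list colorings. It takes time exponential in the number of vertices and is not one of the fixed-parameter algorithms from the paper. The function name and the input format (dictionaries keyed by vertex) are assumptions.

```python
from collections import deque

def shortest_recoloring(adj, lists, start, goal):
    """Length of a shortest recoloring sequence, or None if none exists.

    adj:   dict vertex -> iterable of neighbours
    lists: dict vertex -> set of allowed colours
    start, goal: dict vertex -> colour (assumed to be proper list colourings)
    """
    order = sorted(adj)
    pos = {v: i for i, v in enumerate(order)}
    src = tuple(start[v] for v in order)
    dst = tuple(goal[v] for v in order)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        # Try every single-vertex recolouring that keeps the colouring proper.
        for v in order:
            for c in lists[v]:
                if c == cur[pos[v]]:
                    continue
                if any(cur[pos[u]] == c for u in adj[v]):
                    continue
                nxt = cur[:pos[v]] + (c,) + cur[pos[v] + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None
```

On a single edge with lists {1, 2, 3}, swapping the two endpoint colors takes 3 steps; with lists {1, 2} no move is possible and the function returns None.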

at May 23, 2017 12:00 AM UTC

Authors: Keren Censor-Hillel, Bernhard Haeupler, D. Ellis Hershkowitz, Goran Zuzic
Download: PDF
Abstract: The widely-studied radio network model [Chlamtac and Kutten, 1985] is a graph-based description that captures the inherent impact of collisions in wireless communication. In this model, the strong assumption is made that node $v$ receives a message from a neighbor if and only if exactly one of its neighbors broadcasts.

We relax this assumption by introducing a new noisy radio network model in which random faults occur at senders or receivers. Specifically, for a constant noise parameter $p \in [0,1)$, either every sender has probability $p$ of transmitting noise or every receiver of a single transmission in its neighborhood has probability $p$ of receiving noise.
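
A single round of this model with receiver faults can be simulated as follows. This is a minimal sketch: the function name `radio_round`, the adjacency-dictionary format, and the convention that a broadcasting node does not listen in the same round are assumptions, and sender faults are not modelled.

```python
import random

def radio_round(adj, broadcasters, p=0.0, rng=random):
    """Return the set of nodes that receive a message in one round.

    A listening node receives iff exactly one of its neighbours broadcasts.
    With receiver faults, such a reception is independently replaced by
    noise with probability p.
    """
    received = set()
    for v, neighbours in adj.items():
        if v in broadcasters:
            continue  # assumption: a broadcasting node does not listen
        senders = [u for u in neighbours if u in broadcasters]
        if len(senders) == 1 and rng.random() >= p:
            received.add(v)
    return received
```

With p = 0 this reduces to the classical collision rule: on a star, the center hears a single broadcasting leaf but hears nothing when two leaves broadcast at once.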

We first study single-message broadcast algorithms in noisy radio networks and show that the Decay algorithm [Bar-Yehuda et al., 1992] remains robust in the noisy model while the diameter-linear algorithm of Gasieniec et al., 2007 does not. We give a modified version of the algorithm of Gasieniec et al., 2007 that is robust to sender and receiver faults, and extend both this modified algorithm and the Decay algorithm to robust multi-message broadcast algorithms.

We next investigate the extent to which (network) coding improves throughput in noisy radio networks. We address the previously perplexing result of Alon et al. 2014 that worst case coding throughput is no better than worst case routing throughput up to constants: we show that the worst case throughput performance of coding is, in fact, superior to that of routing -- by a $\Theta(\log(n))$ gap -- provided receiver faults are introduced. However, we show that any coding or routing scheme for the noiseless setting can be transformed to be robust to sender faults with only a constant throughput overhead. These transformations imply that the results of Alon et al., 2014 carry over to noisy radio networks with sender faults.

at May 23, 2017 12:00 AM UTC

Authors: Abdolhamid Ghodselahi, Fabian Kuhn
Download: PDF
Abstract: The Arrow protocol is a simple and elegant protocol to coordinate exclusive access to a shared object in a network. The protocol solves the underlying distributed queueing problem by using path reversal on a pre-computed spanning tree (or any other tree topology simulated on top of the given network).

It is known that the Arrow protocol solves the problem with a competitive ratio of O(log D) on trees of diameter D. This implies a distributed queueing algorithm with competitive ratio O(s*log D) for general networks with a spanning tree of diameter D and stretch s. In this work we show that when running the Arrow protocol on top of the well-known probabilistic tree embedding of Fakcharoenphol, Rao, and Talwar [STOC 03], we obtain a randomized distributed queueing algorithm with a competitive ratio of O(log n) even on general network topologies. The result holds even if the queueing requests occur in an arbitrarily dynamic and concurrent fashion and even if communication is asynchronous. From a technical point of view, the main result of the paper shows that the competitive ratio of the Arrow protocol is constant on a special family of tree topologies, known as hierarchically well separated trees.
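
Path reversal can be illustrated by a sequential simulation of one request at a time. The sketch assumes each node u stores an arrow `link[u]` that points to a tree neighbour in the direction of the current queue tail, or to u itself if u is the tail. The function name is hypothetical, and the concurrent, asynchronous executions analysed in the paper are not modelled.

```python
def arrow_request(link, v):
    """Process one queueing request from node v and return its predecessor.

    link: dict node -> node, modified in place. If v is already the tail,
    v itself is returned.
    """
    prev, cur = v, link[v]
    link[v] = v                  # v becomes the new tail
    while True:
        nxt = link[cur]
        link[cur] = prev         # reverse the arrow towards the requester
        if nxt == cur:           # cur was the old tail: v queues behind it
            return cur
        prev, cur = cur, nxt
```

On the path 0 - 1 - 2 with the tail at node 0, a request from node 2 returns 0 and leaves every arrow pointing towards node 2.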

at May 23, 2017 12:00 AM UTC

Authors: Yichen Chen, Mengdi Wang
Download: PDF
Abstract: We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $|\mathcal{S}|$ and a finite action space $|\mathcal{A}|$. We show that any randomized algorithm needs a running time at least $\Omega(|\mathcal{S}|^2|\mathcal{A}|)$ to compute an $\epsilon$-optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to $\Omega\left( \frac{|\mathcal{S}| |\mathcal{A}|}{\epsilon} \right)$. These results reveal a surprising observation that the computational complexity of the MDP depends on the data structure of input.
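
One operation that an array of cumulative probabilities makes cheap is drawing a next state by binary search. The sketch below illustrates only that data structure. The names `to_cumulative` and `sample_next_state` are hypothetical, and whether the paper's bounds rely on exactly this operation is an assumption.

```python
from bisect import bisect_right
from itertools import accumulate

def to_cumulative(probs):
    """Prefix sums of a transition-probability row."""
    return list(accumulate(probs))

def sample_next_state(cumulative, u):
    """Map u in [0, 1) to a state index by binary search, in O(log |S|) time.

    State i owns the interval [cumulative[i-1], cumulative[i]). The clamp
    guards against a last prefix sum slightly below 1 due to rounding.
    """
    return min(bisect_right(cumulative, u), len(cumulative) - 1)
```

In practice `u` would come from a uniform random generator such as `random.random()`. Reading the same row as a plain list of probabilities would take time linear in the number of states.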

at May 23, 2017 12:40 AM UTC

Authors: Filip Pavetić, Ivan Katanić, Gustav Matula, Goran Žužić, Mile Šikić
Download: PDF
Abstract: Longest Common Subsequence ($LCS$) deals with the problem of measuring similarity of two strings. While this problem has been analyzed for decades, the recent interest stems from a practical observation that considering single characters is often too simplistic. Therefore, recent works introduce the variants of $LCS$ based on shared substrings of length exactly or at least $k$ ($LCS_{k}$ and $LCS_{k+}$ respectively). The main drawback of the state-of-the-art algorithms for computing $LCS_{k}$ and $LCS_{k+}$ is that they work well only in a limited setting: they either solve the average case well while being suboptimal in the pathological situations or they achieve a good worst-case performance, but fail to exploit the input data properties to speed up the computation. Furthermore, these algorithms are based on non-trivial data structures, which is not ideal from a practitioner's point of view. We present a single algorithm to compute both $LCS_{k}$ and $LCS_{k+}$ which outperforms the state-of-the-art algorithms in terms of runtime complexity and requires only basic data structures. In addition, we describe an algorithm to reconstruct the solution which offers significant improvement in terms of memory consumption. Our empirical validation shows that we save around 1000x of memory on human genome data.
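
For orientation, $LCS_{k}$ can be computed by a straightforward dynamic program that counts the maximum number of non-overlapping, order-preserving matching substrings of length exactly $k$. The sketch below is that naive O(nmk) baseline, not the algorithm of the paper, and the function name `lcs_k` is hypothetical.

```python
def lcs_k(a, b, k):
    """Maximum number of matched length-k substrings, in order, without overlap."""
    n, m = len(a), len(b)
    # dp[i][j]: best value for the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            # Extend by one block if the last k characters of both prefixes agree.
            if i >= k and j >= k and a[i - k:i] == b[j - k:j]:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]
```

With k = 1 this is the ordinary LCS length. With k = 2, `lcs_k("abcd", "abcd", 2)` counts the two blocks "ab" and "cd".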

at May 23, 2017 12:00 AM UTC

Authors: Ziqi Yan, Jiqiang Liu, Gang Li, Zhen Han, Shuo Qiu
Download: PDF
Abstract: In many industrial applications of big data, the Jaccard Similarity Computation has been widely used to measure the distance between two profiles or sets respectively owned by two users. Yet, one semi-honest user with unpredictable knowledge may also deduce the private or sensitive information (e.g., the existence of a single element in the original sets) of the other user via the shared similarity. In this paper, we aim at solving the privacy issues in Jaccard similarity computation with strict differential privacy guarantees. To achieve this, we first define the Conditional $\epsilon$-DPSO, a relaxed differential privacy definition regarding set operations, and prove that the MinHash-based Jaccard Similarity Computation (MH-JSC) satisfies this definition. Then for achieving strict differential privacy in MH-JSC, we propose the PrivMin algorithm, which consists of two private operations: 1) the Private MinHash Value Generation that works by introducing the Exponential noise to the generation of MinHash signature. 2) the Randomized MinHashing Steps Selection that works by adopting Randomized Response technique to privately select several steps within the MinHashing phase that are deployed with the Exponential mechanism. Experiments on real datasets demonstrate that the proposed PrivMin algorithm can successfully retain the utility of the computed similarity while preserving privacy.
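
The non-private building block, MinHash-based Jaccard estimation, can be sketched as follows. This is plain MinHash with no differential privacy and is not the PrivMin algorithm. The salted SHA-256 hash family, the function names, and the requirement that both sets be non-empty are assumptions made for illustration.

```python
import hashlib

def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def minhash_signature(items, num_hashes=64):
    """One minimum per salted hash function (items must be non-empty)."""
    return [
        min(hashlib.sha256(f"{seed}:{x}".encode()).digest() for x in items)
        for seed in range(num_hashes)
    ]

def minhash_estimate(a, b, num_hashes=64):
    """Fraction of signature positions on which the two sets agree."""
    sa = minhash_signature(a, num_hashes)
    sb = minhash_signature(b, num_hashes)
    return sum(x == y for x, y in zip(sa, sb)) / num_hashes
```

Each signature position agrees with probability equal to the Jaccard similarity, so the estimate sharpens as `num_hashes` grows. That shared estimate is the value a privacy mechanism would have to protect.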

at May 23, 2017 01:05 AM UTC

Authors: Dmitry Gavinsky
Download: PDF
Abstract: We present a bipartite partial function, whose communication complexity is $O((\log n)^2)$ in the model of quantum simultaneous message passing and $\tilde\Omega(\sqrt n)$ in the model of randomised simultaneous message passing.

In fact, our function has a poly-logarithmic protocol even in the (restricted) model of quantum simultaneous message passing without shared randomness, thus witnessing the possibility of qualitative advantage of this model over randomised simultaneous message passing with shared randomness. This can be interpreted as the strongest known (as of today) example of "super-classical" capabilities of the weakest studied model of quantum communication.

at May 23, 2017 12:40 AM UTC

Authors: Maria-Florina Balcan, Colin White
Download: PDF
Abstract: Recently, there has been substantial interest in clustering research that takes a beyond worst-case approach to the analysis of algorithms. The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided the data satisfy a natural stability notion. For example, Bilu and Linial (2010) and Awasthi et al. (2012) presented algorithms that output near-optimal solutions, assuming the optimal solution is preserved under small perturbations to the input distances. A drawback to this approach is that the algorithms are often explicitly built according to the stability assumption and give no guarantees in the worst case; indeed, several recent algorithms output arbitrarily bad solutions even when just a small section of the data does not satisfy the given stability notion.

In this work, we address this concern in two ways. First, we provide algorithms that inherit the worst-case guarantees of clustering approximation algorithms, while simultaneously guaranteeing near-optimal solutions when the data is stable. Our algorithms are natural modifications to existing state-of-the-art approximation algorithms. Second, we initiate the study of local stability, which is a property of a single optimal cluster rather than an entire optimal solution. We show our algorithms output all optimal clusters which satisfy stability locally. Specifically, we achieve strong positive results in our local framework under recent stability notions including metric perturbation resilience (Angelidakis et al. 2017) and robust perturbation resilience (Balcan and Liang 2012) for the $k$-median, $k$-means, and symmetric/asymmetric $k$-center objectives.

at May 23, 2017 12:00 AM UTC

A set of $n$ players, each holding a private input bit, communicate over a noisy broadcast channel. Their mutual goal is for all players to learn all inputs. At each round one of the players broadcasts a bit to all the other players, and the bit received by each player is flipped with a fixed constant probability (independently for each recipient). How many rounds are needed? This problem was first suggested by El Gamal in 1984. In 1988, Gallager gave an elegant noise-resistant protocol requiring only $O(n \log\log n)$ rounds. The problem was resolved in 2005 by a seminal paper of Goyal, Kindler, and Saks, proving that Gallager's protocol is essentially optimal. We revisit the above noisy broadcast problem and show that $O(n)$ rounds suffice. This is possible due to a relaxation of the model assumed by the previous works. We no longer demand that exactly one player broadcasts in every round, but rather allow any number of players to broadcast. However, if it is not the case that exactly one player chooses to broadcast, each of the other players gets an adversarially chosen bit. We generalize the above result and initiate the study of interactive coding over the noisy broadcast channel. We show that any interactive protocol that works over the noiseless broadcast channel can be simulated over our restrictive noisy broadcast model with constant blowup of the communication.

at May 22, 2017 04:15 AM UTC

Authors: Sarvar Patel, Giuseppe Persiano, Kevin Yeo
Download: PDF
Abstract: In this work, we present CacheShuffle, an algorithm for obliviously shuffling data on an untrusted server. Our methods improve the previously best known results by reducing the number of block accesses to $(4 + \epsilon)N$, which is close to the lower bound of $2N$. Experimentation shows that for practical sizes of $N$, there is a 4x improvement over the previous best result. Also, our result only requires 2 roundtrips compared to 5 roundtrips needed by the previous result. The novelty in our algorithm involves introducing a persistent client cache. Additionally, we show a recursive version of our algorithm which allows oblivious shuffling with smaller client memory.

at May 22, 2017 12:00 AM UTC

Authors: Martin Böhm, Łukasz Jeż, Jiří Sgall, Pavel Veselý
Download: PDF
Abstract: In Packet Scheduling with Adversarial Jamming, packets of arbitrary sizes arrive over time to be transmitted over a channel in which instantaneous jamming errors occur at times chosen by the adversary and not known to the algorithm. The transmission taking place at the time of jamming is corrupt, and the algorithm learns this fact immediately. An online algorithm maximizes the total size of packets it successfully transmits, and the goal is to develop an algorithm with the lowest possible asymptotic competitive ratio, where the additive constant may depend on packet sizes.

Our main contribution is a simple algorithm that works for any speedup and packet sizes and, unlike previous algorithms for the problem, it does not need to know these properties in advance. We show that this algorithm maintains 1-competitiveness with speedup 4, making it the first known algorithm to maintain 1-competitiveness with a moderate speedup in the general setting of arbitrary packet sizes. Furthermore, we show a lower bound of $\phi+1\approx 2.618$ on the speedup of any 1-competitive deterministic algorithm.

Additionally, we formulate a general framework for analyzing our algorithm locally and use it to show upper bounds on its competitive ratio for speedups in $[1,4)$ and for several special cases. In particular, our algorithm is 3-competitive without speedup, matching the algorithm and the lower bound of Jurdzinski et al. (WAOA 2014).

We use this framework also for the case of divisible packet sizes to show that a slight modification of our algorithm is 1-competitive with speedup 2 and it achieves the optimal competitive ratio of 2 without speedup, again matching the algorithm and the lower bound of Jurdzinski et al. (WAOA 2014). Finally, we obtain a lower bound of 2 on the speedup of any 1-competitive deterministic algorithm using only two divisible packet sizes.

at May 22, 2017 12:00 AM UTC

Authors: Hamza Fawzi, Mohab Safey El Din
Download: PDF
Abstract: The positive semidefinite rank of a convex body $C$ is the size of its smallest positive semidefinite formulation. We show that the positive semidefinite rank of any convex body $C$ is at least $\sqrt{\log d}$ where $d$ is the smallest degree of a polynomial that vanishes on the boundary of the polar of $C$. This improves on the existing bound which relies on results from quantifier elimination. The proof relies on the Bézout bound applied to the Karush-Kuhn-Tucker conditions of optimality. We discuss the connection with the algebraic degree of semidefinite programming and show that the bound is tight (up to constant factor) for random spectrahedra of suitable dimension.

at May 22, 2017 12:40 AM UTC

Authors: Haotian Jiang, Jian Li, Mingda Qiao
Download: PDF
Abstract: In the Best-$K$ identification problem (Best-$K$-Arm), we are given $N$ stochastic bandit arms with unknown reward distributions. Our goal is to identify the $K$ arms with the largest means with high confidence, by drawing samples from the arms adaptively. This problem is motivated by various practical applications and has attracted considerable attention in the past decade. In this paper, we propose new practical algorithms for the Best-$K$-Arm problem, which have nearly optimal sample complexity bounds (matching the lower bound up to logarithmic factors) and outperform the state-of-the-art algorithms for the Best-$K$-Arm problem (even for $K=1$) in practice.

at May 22, 2017 01:54 AM UTC

Authors: Vladimir Ejov, Michael Haythorpe, Serguei Rossomakhine
Download: PDF
Abstract: We describe a hybrid procedure for solving the traveling salesman problem (TSP) to provable optimality. We first sparsify the instance, and then use a hybrid algorithm that combines a branch-and-cut TSP solver with a Hamiltonian cycle problem solver. We demonstrate that this procedure enables us to solve difficult instances to optimality, including one which had remained unsolved since its construction in 2002.

at May 22, 2017 12:00 AM UTC

Authors: Irene Muzi, Michael P. O'Brien, Felix Reidl, Blair D. Sullivan
Download: PDF
Abstract: We study the computational complexity of identifying dense substructures, namely $r/2$-shallow topological minors and $r$-subdivisions. Of particular interest is the case when $r=1$, when these substructures correspond to very localized relaxations of subgraphs. Since Densest Subgraph can be solved in polynomial time, we ask whether these slight relaxations also admit efficient algorithms.

In the following, we provide a negative answer: Dense $r/2$-Shallow Topological Minor and Dense $r$-Subdivision are already NP-hard for $r = 1$ in very sparse graphs. Further, they do not admit algorithms with running time $2^{o(\mathbf{tw}^2)} n^{O(1)}$ when parameterized by the treewidth of the input graph for $r \geq 2$ unless ETH fails.

at May 22, 2017 12:41 AM UTC

On Wednesday, Scott Alexander finally completed his sprawling serial novel Unsong, after a year and a half of weekly updates—incredibly, in his spare time while also working as a full-term resident in psychiatry, and also regularly updating Slate Star Codex, which I consider to be the world’s best blog.  I was honored to attend a party in Austin (mirroring parties in San Francisco, Boston, Tel Aviv, and elsewhere) to celebrate Alexander’s release of the last chapter—depending on your definition, possibly the first “fan event” I’ve ever attended.

Like many other nerds I’ve met, I’d been following Unsong almost since the beginning—with its mix of Talmudic erudition, CS humor, puns, and even a shout-out to Quantum Computing Since Democritus (which shows up as Ben Aharon’s Gematria Since Adam), how could I not be?  I now count Unsong as one of my favorite works of fiction, and Scott Alexander alongside Rebecca Newberger Goldstein among my favorite contemporary novelists.  The goal of this post is simply to prod readers of my blog who don’t yet know Unsong: if you’ve ever liked anything here on Shtetl-Optimized, then I predict you’ll like Unsong, and probably more.

[WARNING: SPOILERS FOLLOW]

Though not trivial to summarize, Unsong is about a world where the ideas of religion and mysticism—all of them, more or less, although with a special focus on kabbalistic Judaism—turn out to be true.  In 1968, the Apollo 8 mission leads not to an orbit of the Moon, as planned, but instead to cracking an invisible crystal sphere that had surrounded the Earth for millennia.  Down through the crack rush angels, devils, and other supernatural forces.  Life on Earth becomes increasingly strange: on the one hand, many technologies stop working; on the other, people can now gain magical powers by speaking various names of God.  A worldwide industry arises to discover new names of God by brute-force search through sequences of syllables.  And a powerful agency, the eponymous UNSONG (United Nations Subcommittee on Names of God), is formed to enforce kabbalistic copyright law, hunting down and punishing anyone who speaks divine names without paying licensing fees to the theonomic corporations.

As the story progresses, we learn that eons ago, there was an epic battle in Heaven between Good and Evil, and Evil had the upper hand.  But just as all seemed lost, an autistic angel named Uriel reprogrammed the universe to run on math and science rather than on God’s love, as a last-ditch strategy to prevent Satan’s forces from invading the sublunary realm.  Molecular biology, the clockwork regularity of physical laws, false evidence for a huge and mindless cosmos—all these were retconned into the world’s underpinnings.  Uriel did still need to be occasionally involved, but less as a loving god than as an overworked sysadmin: for example, he descended to Mount Sinai to warn humans never to boil goats in their mothers’ milk, because he discovered that doing so (like the other proscribed activities in the Torah, Uriel’s readme file) triggered bugs in the patchwork of code that was holding the universe together.  Now that the sky has cracked, Uriel is forced to issue increasingly desperate patches, and even those will only buy a few decades until his math-and-science-based world stops working entirely, with Satan again triumphant.

Anyway, that’s a tiny part of the setup.  Through 72 chapters and 22 interludes, there’s world-building and philosophical debates and long kabbalistic digressions.  There are battle sequences (the most striking involves the Lubavitcher Rebbe riding atop a divinely-animated Statue of Liberty like a golem).  There’s wordplay and inside jokes—holy of holies are there those—including, notoriously, a sequence of cringe-inducing puns involving whales.  But in this story, wordplay isn’t just there for the hell of it: Scott Alexander has built an entire fictional universe that runs on wordplay—one where battles between the great masters, the equivalent of the light-saber fights in Star Wars, are conducted by rearranging letters in the sky to give them new meanings.  Scott A. famously claims he’s bad at math (though if you read anything he’s written on statistics or logic puzzles, it’s clear he undersells himself).  One could read Unsong as Alexander’s book-length answer to the question: what could it mean for the world to be law-governed but not mathematical?  What if the Book of Nature were written in English, or Hebrew, or other human languages, and if the Newtons and Einsteins were those who were most adept with words?

I should confess that for me, the experience of reading Unsong was colored by the knowledge that, in his years of brilliant and prolific writing, lighting up the blogosphere like a comet, the greatest risk Scott Alexander ever took (by his own account) was to defend me.  It’s like, imagine that in Elizabethan England, you were placed in the stocks and jeered at by thousands for advocating some unpopular loser cause—like, I dunno, anti-cat-burning or something.  And imagine that, when it counted, your most eloquent supporter was a then-obscure poet from Stratford-upon-Avon.  You’d be grateful to the poet, of course; you might even become a regular reader of his work, even if it wasn’t good.  But if the same poet went on to write Hamlet or Macbeth?  It might almost be enough for you to volunteer to be scorned and pilloried all over again, just for the honor of having the Bard divert a rivulet of his creative rapids to protesting on your behalf.

Yes, a tiny part of me had a self-absorbed child’s reaction to Unsong: “could Amanda Marcotte have written this?  could Arthur Chu?  who better to have in your camp: the ideologues du jour of Twitter and Metafilter, Salon.com and RationalWiki?  Or a lone creative genius, someone who can conjure whole worlds into being, as though graced himself with the Shem haMephorash of which he writes?”  Then of course I’d catch myself, and think: no, if you want to be in Scott Alexander’s camp, then the only way to do it is to be in nobody’s camp.  If two years ago it was morally justified to defend me, then the reasons why have nothing to do with the literary gifts of any of my defenders.  And conversely, the least we can do for Unsong is to judge it by what’s on the page, rather than as a soldier in some army fielded by the Gray Tribe.

So in that spirit, let me explain some of what’s wrong with Unsong.  That it’s a first novel sometimes shows.  It’s brilliant on world-building and arguments and historical tidbits and jokes, epic on puns, and uneven on character and narrative flow.  The story jumps around spasmodically in time, so much so that I needed a timeline to keep track of what was happening.  Subplots that are still open beget additional subplots ad headacheum, like a string of unmatched left-parentheses.  Even more disorienting, the novel changes its mind partway through about its narrative core.  Initially, the reader is given a clear sense that this is going to be a story about a young Bay Area kabbalist named Aaron Smith-Teller, his not-quite-girlfriend Ana, and their struggle for supernatural fair-use rights.  Soon, though, Aaron and Ana become almost side characters, their battle against UNSONG just one subplot among many, as the focus shifts to the decades-long war between the Comet King, a messianic figure come to rescue humanity, and Thamiel, the Prince of Hell.  For the Comet King, even saving the earth from impending doom is too paltry a goal to hold his interest much.  As a strict utilitarian and fan of Peter Singer, the Comet King’s singleminded passion is destroying Hell itself, and thereby rescuing the billions of souls who are trapped there for eternity.

Anyway, unlike the Comet King, and unlike a certain other Scott A., I have merely human powers to marshal my time.  I also have two kids and a stack of unwritten papers.  So let me end this post now.  If the post causes just one person to read Unsong who otherwise wouldn’t have, it will be as if I’ve nerdified the entire world.

by Scott at May 20, 2017 09:43 AM UTC

Authors: Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff
Download: PDF
Abstract: We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem. We obtain the first provably good approximation algorithms for this version of low-rank approximation that work for every value of $p \geq 1$, including $p = \infty$. Our algorithms are simple, easy to implement, work well in practice, and illustrate interesting tradeoffs between the approximation quality, the running time, and the rank of the approximating matrix.

at May 19, 2017 12:43 AM UTC

Authors: Steven Heilman
Download: PDF
Abstract: Let $\Omega\subset\mathbb{R}^{n+1}$ have minimal Gaussian surface area among all sets satisfying $\Omega=-\Omega$ with fixed Gaussian volume. Let $A=A_{x}$ be the second fundamental form of $\partial\Omega$ at $x$, i.e. $A$ is the matrix of first order partial derivatives of the unit normal vector at $x\in\partial\Omega$. For any $x=(x_{1},\ldots,x_{n+1})\in\mathbb{R}^{n+1}$, let $\gamma_{n}(x)=(2\pi)^{-n/2}e^{-(x_{1}^{2}+\cdots+x_{n+1}^{2})/2}$. Let $\|A\|^{2}$ be the sum of the squares of the entries of $A$, and let $\|A\|_{2\to 2}$ denote the $\ell_{2}$ operator norm of $A$.

It is shown that if $\Omega$ or $\Omega^{c}$ is convex, and if either $$\int_{\partial\Omega}(\|A_{x}\|^{2}-1)\gamma_{n}(x)dx>0\qquad\mbox{or}\qquad \int_{\partial\Omega}\Big(\|A_{x}\|^{2}-1+2\sup_{y\in\partial\Omega}\|A_{y}\|_{2\to 2}^{2}\Big)\gamma_{n}(x)dx<0,$$ then $\partial\Omega$ must be a round cylinder. That is, except for the case that the average value of $\|A\|^{2}$ is slightly less than $1$, we resolve the convex case of a question of Barthe from 2001.

The main tool is the Colding-Minicozzi theory for Gaussian minimal surfaces, which studies eigenfunctions of the Ornstein-Uhlenbeck type operator $L= \Delta-\langle x,\nabla \rangle+\|A\|^{2}+1$ associated to the surface $\partial\Omega$. A key new ingredient is the use of a randomly chosen degree 2 polynomial in the second variation formula for the Gaussian surface area. Our actual results are a bit more general than the above statement. Also, some of our results hold without the assumption of convexity.

at May 19, 2017 12:40 AM UTC

Authors: Zhaoming Yin, Jijun Tang, Stephen W. Schaeffer, David A. Bader
Download: PDF
Abstract: The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels are considered. In this paper, we compare two mainstream methods to deal with duplications and associate them with Indels: one by deletion, namely DCJ-Indel-Exemplar distance; versus the other by gene matching, namely DCJ-Indel-Matching distance. We design branch-and-bound algorithms with a set of optimization methods to compute exact distances for both. Furthermore, median problems are discussed in alignment with both of these distance methods, which are to find a median genome that minimizes distances between itself and three given genomes. The Lin-Kernighan (LK) heuristic is leveraged and powered up by sub-graph decomposition and search space reduction technologies to handle median computation. A wide range of experiments are conducted on synthetic data sets and real data sets to show the pros and cons of these two distance metrics per se, as well as putting them in the median computation scenario.

at May 19, 2017 12:46 AM UTC

Authors: Bhadrachalam Chitturi
Download: PDF
Abstract: An independent set on a graph $G=(V,E)$ is a subset of $V$ such that no two vertices in the subset have an edge between them. The maximum independent set problem on $G$ seeks to identify an independent set with maximum cardinality, i.e. maximum independent set or MIS. The maximum independent set problem on a general graph is known to be NP-complete. On certain classes of graphs MIS can be computed in polynomial time. Such algorithms are known for bipartite graphs, chordal graphs, cycle graphs, comparability graphs, claw-free graphs, interval graphs and circular arc graphs. On trees, the weighted version of this problem can be solved in linear time. In this article we introduce a new type of graph called a layered graph and show that if the number of vertices in a layer is $O(\log |V|)$ then the maximum independent set can be computed in polynomial time.
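
The linear-time result on trees mentioned above can be illustrated with the textbook dynamic program. This is a minimal sketch, not the paper's layered-graph algorithm; the function name and the input convention (vertices labelled 0..n-1, a connected tree given as an edge list) are assumptions made for the example.

```python
from collections import defaultdict

def max_weight_independent_set(n, edges, weight):
    """Return the maximum total weight of an independent set in a tree.

    n: number of vertices, labelled 0..n-1 (assumed input convention)
    edges: list of (u, v) pairs forming a connected tree
    weight: list of non-negative vertex weights
    """
    if n == 0:
        return 0
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS from vertex 0 to record parents and a visiting order.
    parent = [-1] * n
    order = []
    seen = [False] * n
    seen[0] = True
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)

    # take[u]: best weight in u's subtree when u is chosen
    # skip[u]: best weight in u's subtree when u is not chosen
    take = list(weight)
    skip = [0] * n
    for u in reversed(order):  # every child is handled before its parent
        p = parent[u]
        if p != -1:
            take[p] += skip[u]
            skip[p] += max(take[u], skip[u])
    return max(take[0], skip[0])
```

Each vertex and edge is touched a constant number of times, which is where the linear running time comes from.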

at May 19, 2017 12:43 AM UTC

Authors: Kanthi K. Sarpatwar, Baruch Schieber, Hadas Shachnai
Download: PDF
Abstract: We present a combinatorial algorithm that improves the best known approximation ratio for monotone submodular maximization under a knapsack and a matroid constraint to $\frac{1 -e^{-2}}{2}$. This classic problem is known to be hard to approximate within a factor better than $1 - 1/e$. We show that the algorithm can be extended to yield a ratio of $\frac{1 - e^{-(k+1)}}{k+1}$ for the problem with a single knapsack and the intersection of $k$ matroid constraints, for any fixed $k > 1$.

Our algorithms, which combine the greedy algorithm of [Khuller, Moss and Naor, 1999] and [Sviridenko, 2004] with local search, show the power of interleaving iterations of the two techniques as a new algorithmic tool.

at May 19, 2017 12:40 AM UTC

Authors: S. Colabrese, D. De Martino, L. Leuzzi, E. Marinari
Download: PDF
Abstract: The resolution of linear systems with positive integer variables is a basic yet difficult computational problem with many applications. We consider sparse uncorrelated random systems parametrised by the density $c$ and the ratio $\alpha=N/M$ between the number of variables $N$ and the number of constraints $M$. By means of ensemble calculations we show that the space of feasible solutions exhibits a Van-Der-Waals phase diagram in the plane ($c$, $\alpha$). We give numerical evidence that the associated computational problems become more difficult across the critical point and in particular in the coexistence region.

at May 19, 2017 12:40 AM UTC

Last week the Georgia Tech School of Industrial and Systems Engineering honored the 80th birthday of George Nemhauser and the 70th of Arkadi Nemirovski at an event naturally called NemFest. The Nems are powerhouses in the optimization community and this event drew many of the greats of the field.

In theoretical CS we often take NP-complete as a sign to stop searching for an efficient algorithm. Optimization people take NP-complete as a starting point, using powerful algorithmic ideas, clever heuristics and sheer computing power to solve or nearly optimize in many real-world cases.

Bill Cook talked about his adventures with the traveling salesman problem. Check out his British pub crawl and his tour through the nearly 50,000 US historic sites.

Michael Trick talked about his side job, scheduling MLB baseball games, a surprisingly challenging problem. Like TSP, you want to minimize total travel distance but under a wide variety of constraints. "There's something satisfying about being at a bar, seeing a game on the TV and knowing those two teams are playing because you scheduled them." Can't say I've had that kind of satisfaction in my computational complexity research.

by Lance Fortnow (noreply@blogger.com) at May 18, 2017 12:42 PM UTC

Authors: Thomas D. Dickerson
Download: PDF
Abstract: This paper proposes a new concurrent heap algorithm, based on a stateless shape property, which efficiently maintains balance during insert and removeMin operations implemented with hand-over-hand locking. It also provides an O(1) linearizable snapshot operation based on lazy copy-on-write semantics. Such snapshots can be used to provide consistent views of the heap during iteration, as well as to make speculative updates (which can later be dropped).

The simplicity of the algorithm allows it to be easily proven correct, and the choice of shape property provides priority queue performance which is competitive with highly optimized skiplist implementations (and has stronger bounds on worst-case time complexity).

A Scala reference implementation is provided.

at May 18, 2017 12:41 AM UTC

Authors: Mikkel Abrahamsen, Mark de Berg, Kevin Buchin, Mehran Mehr, Ali D. Mehrabi
Download: PDF
Abstract: In a geometric $k$-clustering problem the goal is to partition a set of points in $\mathbb{R}^d$ into $k$ subsets such that a certain cost function of the clustering is minimized. We present data structures for orthogonal range-clustering queries on a point set $S$: given a query box $Q$ and an integer $k\geq 2$, compute an optimal $k$-clustering for $S\cap Q$. We obtain the following results. We present a general method to compute a $(1+\epsilon)$-approximation to a range-clustering query, where $\epsilon>0$ is a parameter that can be specified as part of the query. Our method applies to a large class of clustering problems, including $k$-center clustering in any $L_p$-metric and a variant of $k$-center clustering where the goal is to minimize the sum (instead of maximum) of the cluster sizes. We extend our method to deal with capacitated $k$-clustering problems, where each of the clusters should not contain more than a given number of points. For the special cases of rectilinear $k$-center clustering in $\mathbb{R}^1$, and in $\mathbb{R}^2$ for $k=2$ or 3, we present data structures that answer range-clustering queries exactly.

at May 18, 2017 12:46 AM UTC

Authors: Gui Citovsky, Tyler Mayer, Joseph S. B. Mitchell
Download: PDF
Abstract: In this paper we study a natural special case of the Traveling Salesman Problem (TSP) with point-locational-uncertainty which we will call the adversarial TSP problem (ATSP). Given a metric space $(X, d)$ and a set of subsets $R = \{R_1, R_2, ... , R_n\} : R_i \subseteq X$, the goal is to devise an ordering of the regions, $\sigma_R$, that the tour will visit such that when a single point is chosen from each region, the induced tour over those points in the ordering prescribed by $\sigma_R$ is as short as possible. Unlike the classical locational-uncertainty-TSP problem, which focuses on minimizing the expected length of such a tour when the point within each region is chosen according to some probability distribution, here, we focus on the adversarial model in which once the choice of $\sigma_R$ is announced, an adversary selects a point from each region in order to make the resulting tour as long as possible. In other words, we consider an offline problem in which the goal is to determine an ordering of the regions $R$ that is optimal with respect to the "worst" point possible within each region being chosen by an adversary, who knows the chosen ordering. We give a $3$-approximation when $R$ is a set of arbitrary regions/sets of points in a metric space. We show how geometry leads to improved constant factor approximations when regions are parallel line segments of the same length, and a polynomial-time approximation scheme (PTAS) for the important special case in which $R$ is a set of disjoint unit disks in the plane.

at May 18, 2017 12:41 AM UTC

Authors: Petr Ryšavý, Filip Železný
Download: PDF
Abstract: To cluster sequences given only their read-set representations, one may try to reconstruct each one from the corresponding read set, and then employ conventional (dis)similarity measures such as the edit distance on the assembled sequences. This approach is however problematic and we propose instead to estimate the similarities directly from the read sets. Our approach is based on an adaptation of the Monge-Elkan similarity known from the field of databases. It avoids the NP-hard problem of sequence assembly. For low coverage data it results in a better approximation of the true sequence similarities and consequently in better clustering, in comparison to the first-assemble-then-cluster approach.
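
For readers unfamiliar with the Monge-Elkan similarity, the classical form averages, over the elements of one set, the best match found in the other set. The sketch below applies that classical form to two read sets, using a normalised edit distance as the inner similarity. It is an illustration under assumptions, not the adaptation proposed in the paper; the function names and the choice of inner similarity are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def read_similarity(a, b):
    """Normalised similarity in [0, 1]; 1 means the reads are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def monge_elkan(reads_a, reads_b):
    """Average, over the reads in reads_a, of the best match in reads_b."""
    if not reads_a or not reads_b:
        return 0.0
    return sum(max(read_similarity(a, b) for b in reads_b)
               for a in reads_a) / len(reads_a)
```

Note that the classical measure is asymmetric: `monge_elkan(A, B)` and `monge_elkan(B, A)` can differ, so a symmetric variant would need to combine the two directions.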

at May 18, 2017 12:41 AM UTC

Authors: Andrej Dudenhefner, Jakob Rehof
Download: PDF
Abstract: We revisit the undecidability result of rank 3 intersection type inhabitation (Urzyczyn 2009) in pursuit of two goals. First, we strengthen the previous result by showing that intersection type inhabitation is undecidable for types of rank 3 and order 3, i.e. it is not necessary to introduce new functional dependencies (new instructions) during proof search. Second, we pinpoint the principles necessary to simulate Turing machine computation directly, whereas previous constructions used a highly parallel and non-deterministic computation model. Since our construction is more concise than existing approaches taking no detours, we believe that it is valuable for a better understanding of the expressiveness of intersection type inhabitation.

at May 18, 2017 12:40 AM UTC

Authors: Yi-Zhi Xu, Hai-Jun Zhou
Download: PDF
Abstract: The minimum feedback arc set problem asks to delete a minimum number of arcs (directed edges) from a digraph (directed graph) to make it free of any directed cycles. In this work we approach this fundamental cycle-constrained optimization problem by considering a generalized task of dividing the digraph into D layers of equal size. We solve the D-segmentation problem by the replica-symmetric mean field theory and belief-propagation heuristic algorithms. The minimum feedback arc density of a given random digraph ensemble is then obtained by extrapolating the theoretical results to the limit of large D. A divide-and-conquer algorithm (nested-BPR) is devised to solve the minimum feedback arc set problem with very good performance and high efficiency.
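
The link between vertex orderings and feedback arc sets can be made concrete on toy instances: for any ordering of the vertices, the arcs that point backwards form a feedback arc set, and a minimum one is attained by some ordering. The sketch below is an exhaustive search that is only viable for very small digraphs; it is not the belief-propagation or nested-BPR method of the paper, and it assumes a digraph without self-loops. The function names are hypothetical.

```python
from itertools import permutations

def backward_arcs(order, arcs):
    """Arcs (u, v) that point from a later to an earlier position in order."""
    pos = {v: i for i, v in enumerate(order)}
    return [(u, v) for u, v in arcs if pos[u] > pos[v]]

def min_feedback_arc_set(vertices, arcs):
    """Exhaustive search over all vertex orderings (assumes no self-loops)."""
    best = list(arcs)  # deleting every arc is always feasible
    for order in permutations(vertices):
        candidate = backward_arcs(order, arcs)
        if len(candidate) < len(best):
            best = candidate
    return best
```

For example, on the directed triangle with arcs (0,1), (1,2), (2,0), every ordering leaves at least one backward arc, so the minimum feedback arc set has size one.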

at May 18, 2017 12:40 AM UTC

Authors: Alper Atamturk, Birce Tezel, Simge Kucukyavuz
Download: PDF
Abstract: Capacitated fixed-charge network flows are used to model a variety of problems in telecommunication, facility location, production planning and supply chain management. In this paper, we investigate capacitated path substructures and derive strong and easy-to-compute path cover and path pack inequalities. These inequalities are based on an explicit characterization of the submodular inequalities through a fast computation of parametric minimum cuts on a path, and they generalize the well-known flow cover and flow pack inequalities for the single-node relaxations of fixed-charge flow models. We provide necessary and sufficient facet conditions. Computational results demonstrate the effectiveness of the inequalities when used as cuts in a branch-and-cut algorithm.

at May 18, 2017 12:41 AM UTC

We establish an explicit link between depth-3 formulas and one-sided approximation by depth-2 formulas, which were previously studied independently. Specifically, we show that the minimum size of depth-3 formulas is (up to a factor of n) equal to the inverse of the maximum, over all depth-2 formulas, of one-sided-error correlation bound divided by the size of the depth-2 formula, on a certain hard distribution. We apply this duality to obtain several consequences:

1. Any function f can be approximated by a CNF formula of size $O(\epsilon 2^n / n)$ with one-sided error and advantage $\epsilon$ for some $\epsilon$, which is tight up to a constant factor.
2. There exists a monotone function f such that f can be approximated by some polynomial-size CNF formula, whereas any monotone CNF formula approximating f requires exponential size.
3. Any depth-3 formula computing the parity function requires $\Omega(2^{2 \sqrt{n}})$ gates, which is tight up to a factor of $\sqrt n$. This establishes a quadratic separation between depth-3 circuit size and depth-3 formula size.
4. We give a characterization of the depth-3 monotone circuit complexity of the majority function, in terms of a natural extremal problem on hypergraphs. In particular, we show that a known extension of Turan's theorem gives a tight (up to a polynomial factor) circuit size for computing the majority function by a monotone depth-3 circuit with bottom fan-in 2.
5. AC0[p] has exponentially small one-sided correlation with the parity function for odd prime p.

at May 17, 2017 08:18 PM UTC

A small-biased function is a randomized function whose distribution of truth-tables is small-biased. We demonstrate that known explicit lower bounds on the size of (1) general Boolean formulas, (2) Boolean formulas of fan-in two, (3) de Morgan formulas, as well as (4) correlation lower bounds against small de Morgan formulas apply to small-biased functions. As a consequence, any strongly explicit small-biased generator is subject to the best known explicit formula lower bounds in all these models. On the other hand, we give a construction of a small-biased function that is tight with respect to lower bounds (1) and (2) for the relevant range of parameters. We interpret this construction as a natural-type barrier against substantially stronger lower bounds for general formulas.

at May 17, 2017 06:42 PM UTC

A reminder that this year's STOC is supersized, with (completely in the STOC package) multiple workshops, tutorials, and special speakers.   But for those of you (like me) who put these things off until the last minute, the last minute is now actually here.

The hotel reservation deadline is Friday, May 19.  Again, here is the link for information.  If you're looking to lower the hotel cost, we have a page for room sharing.

Finally, early registration ends Sunday, May 21.  Here is your final link for info.

by Michael Mitzenmacher (noreply@blogger.com) at May 17, 2017 05:46 PM UTC

____________________________________________________________
On behalf of the Steering Committee of ESA and the Program Committee of ESA 2017, I would like to invite you to attend the ESA conference and the ESA Test-of-Time 2016 Award Ceremony. The ceremony will take place on the afternoon of 5th of September during ESA 2017 in Vienna and it will include the presentation by the awardees of the recognized paper.
_____________________________________________________
Announcement of the ESA Test-of-Time Award 2016

European Symposium on Algorithms (ESA)
http://esa-symposium.org/
____________________________________________________________

The ESA Test-of-Time Award (ESA ToTA) recognizes excellent papers in algorithms research that were published in the ESA proceedings 19-21 years ago and which are still influential and stimulating for the field today. In this second year in which the award is given, papers from ESA'95 to ESA'97 were considered.

The committee nominates the following paper for the ESA ToTA 2016. The paper stands out as a classic in the algorithms field and, by its excellent citation
record, is still relevant today.

From ESA 95-97:

Boris V. Cherkassky, Andrew V. Goldberg
Negative-cycle detection algorithms
Proceedings ESA'96, also in: Mathematical Programming 85:2 (1999) 277-311

Laudation:
The paper by Cherkassky and Goldberg deals with the problem of finding a
negative-length cycle in a network or proving that there is none. Algorithms
for this are a combination of a shortest-path algorithm and a negative-cycle
detection strategy. The authors analyse known algorithms and some new ones
and determine the best combinations. Novel instance generators are used in
this study. The paper is a model experimental paper in algorithms.
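
The combination described in the laudation, a shortest-path algorithm paired with a negative-cycle detection strategy, can be sketched with the simplest such pairing: Bellman-Ford relaxation plus a predecessor walk. This is a textbook illustration only and does not reproduce any of the specific algorithms compared in the paper; the function name and the input convention (vertices labelled 0..n-1, arcs as triples) are assumptions.

```python
def find_negative_cycle(n, arcs):
    """Return a list of vertices forming a negative-length cycle, or None.

    n: number of vertices, labelled 0..n-1 (assumed input convention)
    arcs: list of (u, v, length) triples
    """
    if n == 0:
        return None
    # Starting every distance at 0 acts like a virtual source joined to all
    # vertices, so a cycle in any part of the network is found.
    dist = [0] * n
    pred = [None] * n
    last_updated = None
    for _ in range(n):
        last_updated = None
        for u, v, length in arcs:
            if dist[u] + length < dist[v]:
                dist[v] = dist[u] + length
                pred[v] = u
                last_updated = v
        if last_updated is None:
            return None  # a full pass without relaxation: no negative cycle
    # A relaxation in the n-th pass implies a negative cycle.
    # Walk back n steps to be sure we are standing on the cycle itself.
    v = last_updated
    for _ in range(n):
        v = pred[v]
    cycle = [v]
    u = pred[v]
    while u != v:
        cycle.append(u)
        u = pred[u]
    cycle.reverse()  # predecessors were collected against the arc direction
    return cycle
```

This version runs in O(nm) time for n vertices and m arcs.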

ESA ToTA 2016 Award Committee:
Kurt Mehlhorn (Saarbrucken)
Mike Paterson (Warwick)
Jan van Leeuwen (Utrecht)
____________________________________________________________

by sank at May 17, 2017 01:22 PM UTC

Authors: Sina Dehghani, Soheil Ehsani, MohammadTaghi HajiAghayi, Vahid Liaghat, SaeedReza Seddighin
Download: PDF
Abstract: In this paper, we study a stochastic variant of the celebrated k-server problem. In the k-server problem, we are required to minimize the total movement of k servers that are serving an online sequence of t requests in a metric. In the stochastic setting we are given t independent distributions <P_1, P_2,..., P_t> in advance, and at every time step i a request is drawn from P_i. Designing the optimal online algorithm in such a setting is NP-hard, therefore the emphasis of our work is on designing an approximately optimal online algorithm. We first show a structural characterization for a certain class of non-adaptive online algorithms. We prove that in general metrics, the best of such algorithms has a cost of no worse than three times that of the optimal online algorithm. Next, we present an integer program that finds the optimal algorithm of this class for any arbitrary metric. Finally, by rounding the solution of the linear relaxation of this program, we present an online algorithm for the stochastic k-server problem with the approximation factor of 3 in the line and circle metrics and O(log n) in a general metric of size n. Moreover, we define the Uber problem, in which each demand consists of two endpoints, a source and a destination. We show that given an a-approximation algorithm for the k-server problem, we can obtain an (a+2)-approximation algorithm for the Uber problem. Motivated by the fact that demands are usually highly correlated with the time, we study the stochastic Uber problem. Furthermore, we extend our results to the correlated setting where the probability of a request arriving at a certain point depends not only on the time step but also on the previously arrived requests.

at May 17, 2017 12:41 AM UTC

Authors: Jon Kleinberg, Sendhil Mullainathan, Johan Ugander
Download: PDF
Abstract: A broad range of on-line behaviors are mediated by interfaces in which people make choices among sets of options. A rich and growing line of work in the behavioral sciences indicates that human choices follow not only from the utility of alternatives, but also from the choice set in which alternatives are presented. In this work we study comparison-based choice functions, a simple but surprisingly rich class of functions capable of exhibiting so-called choice-set effects. Motivated by the challenge of predicting complex choices, we study the query complexity of these functions in a variety of settings. We consider settings that allow for active queries or passive observation of a stream of queries, and give analyses at the granularity of both individuals and populations that might exhibit heterogeneous choice behavior. Our main result is that any comparison-based choice function in one dimension can be inferred as efficiently as a basic maximum or minimum choice function across many query contexts, suggesting that choice-set effects need not entail any fundamental algorithmic barriers to inference. We also introduce a class of choice functions we call distance-comparison-based functions, and briefly discuss the analysis of such functions. The framework we outline provides intriguing connections between human choice behavior and a range of questions in the theory of sorting.

at May 17, 2017 12:41 AM UTC

Authors: Keren Censor-Hillel, Seri Khoury, Ami Paz
Download: PDF
Abstract: We present the first super-linear lower bounds for natural graph problems in the CONGEST model, answering a long-standing open question.

Specifically, we show that any exact computation of a minimum vertex cover or a maximum independent set requires $\Omega(n^2/\log^2{n})$ rounds in the worst case in the CONGEST model, as well as any algorithm for $\chi$-coloring a graph, where $\chi$ is the chromatic number of the graph. We further show that such strong lower bounds are not limited to NP-hard problems, by showing two simple graph problems in P which require a quadratic and near-quadratic number of rounds.

Finally, we address the problem of computing an exact solution to weighted all-pairs-shortest-paths (APSP), which arguably may be considered as a candidate for having a super-linear lower bound. We show a simple $\Omega(n)$ lower bound for this problem, which implies a separation between the weighted and unweighted cases, since the latter is known to have a complexity of $\Theta(n/\log{n})$. We also formally prove that the standard Alice-Bob framework is incapable of providing a super-linear lower bound for exact weighted APSP, whose complexity remains an intriguing open question.

at May 17, 2017 12:41 AM UTC