Theory of Computing Blog Aggregator

My latest arXiv preprint is "Covering many points with a small-area box", arXiv:1612.02149, with de Berg, Cabello, Cheong, and Knauer. It is about finding an axis-parallel rectangle that covers k out of n points in the plane and has as small area as possible. Here, both k and n are variables, but k is expected to be significantly smaller than n. So, in finding a fast algorithm, the goal is first to minimize the exponent of n in the time bound, and only after that to minimize the dependence on k. We achieve a bound of O(nk2log n). There are also some related algorithms for approximating the dual problem where you are given a fixed area and want to maximize the number of points that are covered.

The result has a bit of a curious history. A group of us worked on it, and came up with the best algorithms we could, thinking it was a new and unsolved problem. Then, we did a more careful literature search and found that it was actually a known problem, and worse than that, that the time bounds we had achieved were all known or improved, so that we had no result. But then we looked again and discovered that the supposedly known bounds aren't actually true.

What happened was that there was a line of research (that I was part of) in the early 1990s on finding "small" subsets of a given number of points. A sequence of papers on this subtopic identified many different definitions of what it meant to be small (like having low circumradius, diameter, etc) that could all be solved by similar algorithms. These algorithms worked by covering the input by clusters of O(k) points, one of which could be guaranteed to include the optimal solution, and then worked separately within each cluster. A simple version of this is to choose O(k) nearest neighbors to each point and to use these n sets of near neighbors as the clusters; later refinements used fewer and easier-to-find clusters. The algorithms of this type varied by what they did inside the clusters, but this general method ensured that the dependence on n would be linear. And one of the problems solved using this approach was almost the same as the one in our new paper, but with axis-parallel rectangle perimeter in place of rectangle area.

Later, another paper studied the problem of finding smallest rectangles covering k points, but with k assumed to be very close to n rather than much smaller than n, so that factors of k in the runtime are bad and factors of (n − k) are good. It solved both the minimum-perimeter and minimum-area problems for the large-k case. Of course, it quoted the past work on the minimum perimeter problem for small k, but it didn't make clear that this past work solved only the case of perimeter minimization and not area minimization. And so, other papers, quoting that one, started also saying that the minimum-area k-point rectangle problem (for the small-k case) had also already been solved.

It hadn't, but now it has.

at December 08, 2016 05:11 AM UTC

Authors: Sang Won Bae, Mark de Berg, Otfried Cheong, Joachim Gudmundsson, Christos Levcopoulos
Download: PDF
Abstract: Let $C$ be the unit circle in $\mathbb{R}^2$. We can view $C$ as a plane graph whose vertices are all the points on $C$, and the distance between any two points on $C$ is the length of the smaller arc between them. We consider a graph augmentation problem on $C$, where we want to place $k\geq 1$ \emph{shortcuts} on $C$ such that the diameter of the resulting graph is minimized.

We analyze for each $k$ with $1\leq k\leq 7$ what the optimal set of shortcuts is. Interestingly, the minimum diameter one can obtain is not a strictly decreasing function of~$k$. For example, with seven shortcuts one cannot obtain a smaller diameter than with six shortcuts. Finally, we prove that the optimal diameter is $2 + \Theta(1/k^{\frac{2}{3}})$ for any~$k$.

at December 08, 2016 12:00 AM UTC

Authors: Luis Barba, Jean Cardinal, John Iacono, Stefan Langerman, Aurélien Ooms, Noam Solomon
Download: PDF
Abstract: The 3SUM problem asks if an input $n$-set of real numbers contains a triple whose sum is zero. We consider the 3POL problem, a natural generalization of 3SUM where we replace the sum function by a constant-degree polynomial in three variables. The motivations are threefold. Raz, Sharir, and de Zeeuw gave a $O(n^{11/6})$ upper bound on the number of solutions of trivariate polynomial equations when the solutions are taken from the cartesian product of three $n$-sets of real numbers. We give algorithms for the corresponding problem of counting such solutions. Gr\o nlund and Pettie recently designed subquadratic algorithms for 3SUM. We generalize their results to 3POL. Finally, we shed light on the General Position Testing (GPT) problem: "Given $n$ points in the plane, do three of them lie on a line?", a key problem in computational geometry.

We prove that there exist bounded-degree algebraic decision trees of depth $O(n^{\frac{12}{7}+\varepsilon})$ that solve 3POL, and that 3POL can be solved in $O(n^2 {(\log \log n)}^\frac{3}{2} / {(\log n)}^\frac{1}{2})$ time in the real-RAM model. Among the possible applications of those results, we show how to solve GPT in subquadratic time when the input points lie on $o({(\log n)}^\frac{1}{6}/{(\log \log n)}^\frac{1}{2})$ constant-degree polynomial curves. This constitutes a first step towards closing the major open question of whether GPT can be solved in subquadratic time.

To obtain these results, we generalize important tools --- such as batch range searching and dominance reporting --- to a polynomial setting. We expect these new tools to be useful in other applications.

at December 08, 2016 02:05 AM UTC

Authors: MohammadHossein Bateni, Hossein Esfandiari, Vahab Mirrokni
Download: PDF
Abstract: Coverage problems are central in optimization and have a wide range of applications in data mining and machine learning. While several distributed algorithms have been developed for coverage problems, the existing methods suffer from several limitations, e.g., they all achieve either suboptimal approximation guarantees or suboptimal space and memory complexities. In addition, previous algorithms developed for submodular maximization assume oracle access, and ignore the computational complexity of communicating large subsets or computing the size of the union of the subsets in this subfamily. In this paper, we develop an improved distributed algorithm for the $k$-cover and the set cover with $\lambda$ outliers problems, with almost optimal approximation guarantees, almost optimal memory complexity, and linear communication complexity running in only four rounds of computation. Finally, we perform an extensive empirical study of our algorithms on a number of publicly available real data sets, and show that using sketches of size $30$ to $600$ times smaller than the input, one can solve the coverage maximization problem with quality very close to that of the state-of-the-art single-machine algorithm.

at December 08, 2016 02:00 AM UTC

Authors: Jason Qin, Denys Kim, Yumei Tung
Download: PDF
Abstract: The information presented in this paper defines LogLog-Beta. LogLog-Beta is a new algorithm for estimating cardinalities based on LogLog counting. The new algorithm uses only one formula and needs no additional bias corrections for the entire range of cardinalities, therefore, it is more efficient and simpler to implement. Our simulations show that the accuracy provided by the new algorithm is as good as or better than the accuracy provided by either of HyperLogLog or HyperLogLog++. In addition to LogLog-Beta we also provide another one-formula estimator for cardinalities based on order statistics, a modification of an algorithm developed by Lumbroso.

at December 08, 2016 02:00 AM UTC

Authors: Julie Digne, Sébastien Valette, Raphaëlle Chaine
Download: PDF
Abstract: Shape analysis is very often performed by segmenting the shape into smooth surface parts that can be further classified using a set of predefined primitives such as planes, cylinders or spheres. Hence the shape is generally assumed to be manifold and smooth or to be an assembly of primitive parts. In this paper we propose an approach which does not make any assumption on the shape properties but rather learns its characteristics through a statistical analysis of local shape variations. Armed solely with a local probing operator, we are able to perform a non local analysis of the shape yielding a shape dictionary which encodes its structures. Our method relies on a novel description of shape variations, called Local Probing Field (LPF), which describes how a generic pattern is transformed onto the shape. By carefully optimizing the position and orientation of these descriptors we are able to capture shape similarities and gather them into a geometrically relevant dictionary over which the shape decomposes sparsely. Furthermore, this analysis also reveals the local dimensions of the shape. Our shape representation has several potential applications; here we demonstrate its efficiency for shape resampling and point set denoising.

at December 08, 2016 12:00 AM UTC

Authors: Sébastien Bouchard, Marjorie Bournat, Yoann Dieudonné, Swan Dubois, Franck Petit
Download: PDF
Abstract: In this paper we study the task of approach of two mobile agents having the same limited range of vision and moving asynchronously in the plane. This task consists in getting them in finite time within each other's range of vision. The agents execute the same deterministic algorithm and are assumed to have a compass showing the cardinal directions as well as a unit measure. On the other hand, they do not share any global coordinates system (like GPS), cannot communicate and have distinct labels. Each agent knows its label but does not know the label of the other agent or the initial position of the other agent relative to its own. The route of an agent is a sequence of segments that are subsequently traversed in order to achieve approach. For each agent, the computation of its route depends only on its algorithm and its label. An adversary chooses the initial positions of both agents in the plane and controls the way each of them moves along every segment of the routes, in particular by arbitrarily varying the speeds of the agents. A deterministic approach algorithm is a deterministic algorithm that always allows two agents with any distinct labels to solve the task of approach regardless of the choices and the behavior of the adversary. The cost of a complete execution of an approach algorithm is the length of both parts of route travelled by the agents until approach is completed. Let $\Delta$ and $l$ be the initial distance separating the agents and the length of the shortest label, respectively. Assuming that $\Delta$ and $l$ are unknown to both agents, does there exist a deterministic approach algorithm always working at a cost that is polynomial in $\Delta$ and $l$? In this paper, we provide a positive answer to the above question by designing such an algorithm.

at December 08, 2016 02:01 AM UTC

Authors: Balázs Keszegh, Dömötör Pálvölgyi
Download: PDF
Abstract: We study whether for a given planar family F there is an m such that any finite set of points can be 3-colored such that any member of F that contains at least m points contains two points with different colors. We conjecture that if F is a family of pseudo-disks, then m=3 is sufficient. We prove that when F is the family of all homothetic copies of a given convex polygon, then such an m exists. We also study the problem in higher dimensions.

at December 08, 2016 12:00 AM UTC

Authors: Mark de Berg, Sergio Cabello, Otfried Cheong, David Eppstein, Christian Knauer
Download: PDF
Abstract: Let $P$ be a set of $n$ points in the plane. We show how to find, for a given integer $k>0$, the smallest-area axis-parallel rectangle that covers $k$ points of $P$ in $O(nk^2 \log n+ n\log^2 n)$ time. We also consider the problem of, given a value $\alpha>0$, covering as many points of $P$ as possible with an axis-parallel rectangle of area at most $\alpha$. For this problem we give a randomized $(1-\varepsilon)$-approximation that works in near-linear time: in $O((n/\varepsilon^4)\log^3 n \log (1/\varepsilon))$ time we find an axis-parallel rectangle of area at most $\alpha$ that covers at least $(1-\varepsilon)\kappa^*$ points, where $\kappa^*$ is the maximum possible number of points that could be covered.

at December 08, 2016 12:00 AM UTC

Authors: Uriel Feige, Michal Feldman, Inbal Talgam-Cohen
Download: PDF
Abstract: Set functions with convenient properties (such as submodularity) appear in application areas of current interest, such as algorithmic game theory, and allow for improved optimization algorithms. It is natural to ask (e.g., in the context of data driven optimization) how robust such properties are, and whether small deviations from them can be tolerated. We consider two such questions in the important special case of linear set functions.

One question that we address is whether any set function that approximately satisfies the modularity equation (linear functions satisfy the modularity equation exactly) is close to a linear function. The answer to this is positive (in a precise formal sense) as shown by Kalton and Roberts [1983] (and further improved by Bondarenko, Prymak, and Radchenko [2013]). We revisit their proof idea that is based on expander graphs, and provide significantly stronger upper bounds by combining it with new techniques. Furthermore, we provide improved lower bounds for this problem.

Another question that we address is that of how to learn a linear function $h$ that is close to an approximately linear function $f$, while querying the value of $f$ on only a small number of sets. We present a deterministic algorithm that makes only linearly many (in the number of items) nonadaptive queries, by this improving over a previous algorithm of Chierichetti, Das, Dasgupta and Kumar [2015] that is randomized and makes more than a quadratic number of queries. Our learning algorithm is based on a Hadamard transform.

at December 08, 2016 02:02 AM UTC

We prove several results giving new and stronger connections between learning theory, circuit complexity and pseudorandomness. Let C be any typical class of Boolean circuits, and C[s(n)] denote n-variable C-circuits of size at most s(n). We show: Learning Speedups: If C[$n^{O(1)}$] admits a randomized weak learning algorithm under the uniform distribution with membership queries that runs in time $2^n/n^{\omega(1)}$, then for each $k \geq 1$ and $\varepsilon > 0$ the class C[$n^k$] can be learned to high accuracy in time $O(2^{n^\varepsilon})$. There is $\varepsilon > 0$ such that C[$2^{n^{\varepsilon}}$] can be learned in time $2^n/n^{\omega(1)}$ if and only if C[$n^{O(1)}$] can be learned in time $2^{(log n)^{O(1)}}$ Equivalences between Learning Models: We use learning speedups to obtain equivalences between various randomized learning and compression models, including sub-exponential time learning with membership queries, sub-exponential time learning with membership and equivalence queries, probabilistic function compression and probabilistic average-case function compression. A Dichotomy between Learnability and Pseudorandomness: In the non-uniform setting, there is non-trivial learning for C[$n^{O(1)}$] if and only if there are no exponentially secure pseudorandom functions computable in C[$n^{O(1)}$]. Lower Bounds from Nontrivial Learning: If for each k >= 1, (depth-$d$)-C[$n^k$] admits a randomized weak learning algorithm with membership queries under the uniform distribution that runs in time $2^n/n^{\omega(1)}$, then for each $k \geq 1$, BPE is not contained in (depth-$d$)-C[$n^k$]. If for some $\varepsilon > 0$ there are P-natural proofs useful against C[$2^{n^{\varepsilon}}$], then ZPEXP is not contained in C[$n^{O(1)}$]. Karp-Lipton Theorems for Probabilistic Classes: If there is a $k > 0$ such that BPE is contained in i.o.Circuit[$n^k$], then BPEXP is contained in i.o.EXP/O(log(n)). If ZPEXP is contained in i.o.Circuit[$2^{n/3}$], then ZPEXP is contained in i.o.ESUBEXP. Hardness Results for MCSP: All functions in non-uniform NC^1 reduce to the Minimum Circuit Size Problem via truth-table reductions computable by TC^0 circuits. In particular, if MCSP in TC^0 then NC^1 = TC^0.

at December 07, 2016 12:47 PM UTC

Authors: David Bergman, Carlos H. Cardonha, Andre A. Cire, Arvind U. Raghunathan
Download: PDF
Abstract: A graph is chordal if every cycle of length at least four contains a chord, that is, an edge connecting two nonconsecutive vertices of the cycle. Several classical applications in sparse linear systems, database management, computer vision, and semidefinite programming can be reduced to finding the minimum number of edges to add to a graph so that it becomes chordal, known as the minimum chordal completion problem (MCCP). In this article we propose a new formulation for the MCCP which does not rely on finding perfect elimination orderings of the graph, as has been considered in previous work. We introduce several families of facet-defining inequalities for cycle subgraphs and investigate the underlying separation problems, showing that some key inequalities are NP-Hard to separate. We also show general properties of the proposed polyhedra, indicating certain conditions and methods through which facets and inequalities associated with the polytope of a certain graph can be adapted in order to become valid and eventually facet-defining for some of its subgraphs or supergraphs. Numerical studies combining heuristic separation methods based on a threshold rounding and lazy-constraint generation indicate that our approach substantially outperforms existing methods for the MCCP, solving many benchmark graphs to optimality for the first time.

at December 07, 2016 02:06 AM UTC

Authors: Ke Liu, Ye Guo, Zeyun Yu
Download: PDF
Abstract: Development of additive manufacturing in last decade greatly improves tissue engineering. During the manufacturing of porous scaffold, simplified but functionally equivalent models are getting focused for practically reasons. Scaffolds can be classified into regular porous scaffolds and irregular porous scaffolds. Several methodologies are developed to design these scaffolds. A novel method is proposed in this paper using anisotropic radial basis function (ARBF) interpolation. This is method uses geometric models such as volumetric meshes as input and proves to be flexible because geometric models are able to capture the characteristics of complex tissues easily. Moreover, this method is straightforward and easy to implement.

at December 07, 2016 02:12 AM UTC

Authors: Tobias Wicky, Edgar Solomonik, Torsten Hoefler
Download: PDF
Abstract: We present a new parallel algorithm for solving triangular systems with multiple right hand sides (TRSM). TRSM is used extensively in numerical linear algebra computations, both to solve triangular linear systems of equations as well as to compute factorizations with triangular matrices, such as Cholesky, LU, and QR. Our algorithm achieves better theoretical scalability than known alternatives, while maintaining numerical stability, via selective use of triangular matrix inversion. We leverage the fact that triangular inversion and matrix multiplication are more parallelizable than the standard TRSM algorithm. By only inverting triangular blocks along the diagonal of the initial matrix, we generalize the usual way of TRSM computation and the full matrix inversion approach. This flexibility leads to an efficient algorithm for any ratio of the number of right hand sides to the triangular matrix dimension. We provide a detailed communication cost analysis for our algorithm as well as for the recursive triangular matrix inversion. This cost analysis makes it possible to determine optimal block sizes and processor grids a priori. Relative to the best known algorithms for TRSM, our approach can require asymptotically fewer messages, while performing optimal amounts of communication and computation.

at December 07, 2016 02:06 AM UTC

Authors: Waldo Gálvez, José A. Soto, José Verschae
Download: PDF
Abstract: Online models that allow recourse are highly effective in situations where classical models are too pessimistic. One such problem is the online machine covering problem on identical machines. In this setting jobs arrive one by one and must be assigned to machines with the objective of maximizing the minimum machine load. When a job arrives, we are allowed to reassign some jobs as long as their total size is (at most) proportional to the processing time of the arriving job. The proportionality constant is called the migration factor of the algorithm.

Using a new rounding procedure specially tailored for online problems, we design a $(4/3+\varepsilon)$-competitive algorithm using migration factor $\tilde{O}(1/\varepsilon^3)$. At every arrival we run an adaptation of the Largest Processing Time first (LPT) algorithm. Since the new job can cause a complete change of the assignment of smaller jobs, a low migration factor is achieved by carefully exploiting the highly symmetric structure obtained by our rounding.

We also study local search algorithms for the machine covering problem, and show that jump and swap optimality have an approximation ratio which lies in the interval $[1.691, 1.75]$ and can be adapted to the online context with a small constant as migration factor. Our lower bound is obtained by a nice construction based on Sylvester's sequence.

at December 07, 2016 02:08 AM UTC

Authors: Igor C. Oliveira, Rahul Santhanam
Download: PDF
Abstract: We study pseudodeterministic constructions, i.e., randomized algorithms which output the same solution on most computation paths. We establish unconditionally that there is an infinite sequence $\{p_n\}_{n \in \mathbb{N}}$ of increasing primes and a randomized algorithm $A$ running in expected sub-exponential time such that for each $n$, on input $1^{|p_n|}$, $A$ outputs $p_n$ with probability $1$. In other words, our result provides a pseudodeterministic construction of primes in sub-exponential time which works infinitely often.

This result follows from a much more general theorem about pseudodeterministic constructions. A property $Q \subseteq \{0,1\}^{*}$ is $\gamma$-dense if for large enough $n$, $|Q \cap \{0,1\}^n| \geq \gamma 2^n$. We show that for each $c > 0$ at least one of the following holds: (1) There is a pseudodeterministic polynomial time construction of a family $\{H_n\}$ of sets, $H_n \subseteq \{0,1\}^n$, such that for each $(1/n^c)$-dense property $Q \in \mathsf{DTIME}(n^c)$ and every large enough $n$, $H_n \cap Q \neq \emptyset$; or (2) There is a deterministic sub-exponential time construction of a family $\{H'_n\}$ of sets, $H'_n \subseteq \{0,1\}^n$, such that for each $(1/n^c)$-dense property $Q \in \mathsf{DTIME}(n^c)$ and for infinitely many values of $n$, $H'_n \cap Q \neq \emptyset$.

We provide further algorithmic applications that might be of independent interest. Perhaps intriguingly, while our main results are unconditional, they have a non-constructive element, arising from a sequence of applications of the hardness versus randomness paradigm.

at December 07, 2016 12:00 AM UTC

Authors: Sascha Brauer
Download: PDF
Abstract: Metric facility location and $K$-means are well-known problems of combinatorial optimization. Both admit a fairly simple heuristic called single-swap, which adds, drops or swaps open facilities until it reaches a local optimum. For both problems, it is known that this algorithm produces a solution that is at most a constant factor worse than the respective global optimum. In this paper, we show that single-swap applied to the weighted metric uncapacitated facility location and weighted discrete $K$-means problem is tightly PLS-complete and hence has exponential worst-case running time.

at December 07, 2016 02:06 AM UTC

Authors: Philip Bille, Inge Li Gørtz, Frederik Rye Skjoldjensen
Download: PDF
Abstract: Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In the \emph{deterministic} variant the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the \emph{packed} variant the strings are stored with several character in a single word, giving us the opportunity to read multiple characters simultaneously. Our main result is a new string index in the deterministic \emph{and} packed setting. Given a packed string $S$ of length $n$ over an alphabet $\sigma$, we show how to preprocess $S$ in $O(n)$ (deterministic) time and space $O(n)$ such that given a packed pattern string of length $m$ we can support queries in (deterministic) time $O\left(m/\alpha + \log m + \log \log \sigma\right), $ where $\alpha = w / \log \sigma$ is the number of characters packed in a word of size $w = \Theta(\log n)$. Our query time is always at least as good as the previous best known bounds and whenever several characters are packed in a word, i.e., $\log \sigma \ll w$, the query times are faster.

at December 07, 2016 02:10 AM UTC

Authors: Sunil Arya, Guilherme D. da Fonseca, David M. Mount
Download: PDF
Abstract: In the polytope membership problem, a convex polytope $K$ in $R^d$ is given, and the objective is to preprocess $K$ into a data structure so that, given a query point $q \in R^d$, it is possible to determine efficiently whether $q \in K$. We consider this problem in an approximate setting and assume that $d$ is a constant. Given an approximation parameter $\varepsilon > 0$, the query can be answered either way if the distance from $q$ to $K$'s boundary is at most $\varepsilon$ times $K$'s diameter. Previous solutions to the problem were on the form of a space-time trade-off, where logarithmic query time demands $O(1/\varepsilon^{d-1})$ storage, whereas storage $O(1/\varepsilon^{(d-1)/2})$ admits roughly $O(1/\varepsilon^{(d-1)/8})$ query time. In this paper, we present a data structure that achieves logarithmic query time with storage of only $O(1/\varepsilon^{(d-1)/2})$, which matches the worst-case lower bound on the complexity of any $\varepsilon$-approximating polytope. Our data structure is based on a new technique, a hierarchy of ellipsoids defined as approximations to Macbeath regions.

As an application, we obtain major improvements to approximate Euclidean nearest neighbor searching. Notably, the storage needed to answer $\varepsilon$-approximate nearest neighbor queries for a set of $n$ points in $O(\log \frac{n}{\varepsilon})$ time is reduced to $O(n/\varepsilon^{d/2})$. This halves the exponent in the $\varepsilon$-dependency of the existing space bound of roughly $O(n/\varepsilon^d)$, which has stood for 15 years (Har-Peled, 2001).

at December 07, 2016 02:11 AM UTC

Authors: Avni Verma, Kamalakar Karlapalem
Download: PDF
Abstract: The subset sum problem, also referred as SSP, is a NP-Hard computational problem. SSP has its applications in broad domains like cryptography, number theory, operation research and complexity theory. The most famous algorithm for solving SSP is Backtracking Algorithm which has exponential time complexity. Therefore, our goal is to design and develop better alternate enumeration techniques for faster generation of SSP solutions. Given the set of first n natural numbers which is denoted by Xn and a target sum S, we propose various alternate enumeration techniques which find all the subsets of Xn that add up to sum S.

In this paper, we present the mathematics behind this exponential problem. We analyze the distribution of power set of Xn and present formulas which show definite patterns and relations among these subsets. We introduce three major distributions for power set of Xn: Sum Distribution, Length-Sum Distribution and Element Distribution. These distributions are prepossessing procedures for various alternate enumeration techniques for solving SSP. We propose novel algorithms: Subset Generation using Sum Distribution, Subset Generation using Length-Sum Distribution, Basic Bucket Algorithm, Maximum and Minimum Frequency Driven Bucket Algorithms and Local Search using Maximal and Minimal Subsets for enumerating SSP.

We compare the performance of these approaches against the traditional backtracking algorithm. The efficiency and effectiveness of these algorithms are presented with the help of these experimental results. Furthermore, we studied the over solution set of subsets generated by various algorithms to get the complete solution for subset sum problem. Finally, we present a conjecture about upper bound on the number of subsets that has to be enumerated to get all solutions for Subset Sum Problem.

at December 07, 2016 02:09 AM UTC

Authors: Neil Lutz
Download: PDF
Abstract: Algorithmic dimensions quantify the algorithmic information density of individual points and may be defined in terms of Kolmogorov complexity. This work uses these dimensions to bound the classical Hausdorff and packing dimensions of intersections and Cartesian products of fractals in Euclidean spaces. This approach shows that a known intersection formula for Borel sets holds for arbitrary sets, and it significantly simplifies the proof of a known product formula. Both of these formulas are prominent, fundamental results in fractal geometry that are taught in typical undergraduate courses on the subject.

at December 07, 2016 02:02 AM UTC

Authors: Hang Li, Xun Gao, Tao Xin, Man-Hong Yung, Guilu Long
Download: PDF
Abstract: Correlation functions are often employed to quantify the relationships among interdependent variables or sets of data. Recently, a new class of correlation functions, called Forrelation, has been introduced by Aaronson and Ambainis for studying the query complexity of quantum devices. It was found that there exists a quantum query algorithm solving 2-fold Forrelation problems with an exponential quantum speedup over all possible classical means, which represents essentially the largest possible separation between quantum and classical query complexities. Here we report an experimental study probing the 2-fold and 3-fold Forrelations encoded in nuclear spins. The major experimental challenge is to control the spin fluctuation to within a threshold value, which is achieved by developing a set of optimized GRAPE pulse sequences. Overall, our small-scale implementation indicates that the quantum query algorithm is capable of determine the values of Forrelations within an acceptable accuracy required for demonstrating quantum supremacy, given the current technology and in the presence of experimental noise.

at December 07, 2016 02:02 AM UTC

Authors: David G. Harris
Download: PDF
Abstract: Set Cover is a classic NP-hard problem; as shown by Slav\'{i}k (1997) the greedy algorithm gives an approximation ratio of $\ln n - \ln \ln n + \Theta(1)$. A series of works by Lund \& Yannakakis (1994), Feige (1998), Moshkovitz (2015) have shown that, under the assumption $P \neq NP$, it is impossible to obtain a polynomial-time approximation ratio with approximation ratio $(1 - \alpha) \ln n$, for any constant $\alpha > 0$.

In this note, we show that under the Exponential Time Hypothesis (a stronger complexity-theoretic assumptions than $P \neq NP$), there are no polynomial-time algorithms achieving approximation ratio $\ln n - C \ln \ln n$, where $C$ is some universal constant. Thus, the greedy algorithm achieves an essentially optimal approximation ratio (up to the coefficient of $\ln \ln n$).

at December 07, 2016 02:06 AM UTC

Authors: Joshua A. Grochow, Cristopher Moore
Download: PDF
Abstract: We show how to construct highly symmetric algorithms for matrix multiplication. In particular, we consider algorithms which decompose the matrix multiplication tensor into a sum of rank-1 tensors, where the decomposition itself consists of orbits under some finite group action. We show how to use the representation theory of the corresponding group to derive simple constraints on the decomposition, which we solve by hand for n=2,3,4,5, recovering Strassen's algorithm (in a particularly symmetric form) and new algorithms for larger n. While these new algorithms do not improve the known upper bounds on tensor rank or the matrix multiplication exponent, they are beautiful in their own right, and we point out modifications of this idea that could plausibly lead to further improvements. Our constructions also suggest further patterns that could be mined for new algorithms, including a tantalizing connection with lattices.

at December 07, 2016 12:00 AM UTC

I asked a student to show that between any two rationals is a rational.

She did the following: if x < y are rational then take δ << y-x and rational and use x+δ

This is correct though more complicated then what I had in mind: (x+y)/2

I then asked her to prove that between two irrationals is an irrational.

She did the following: if x < y are irrational then take δ << y-x and rational and use x+δ


I had a different proof in mind: the number of reals in (x,y) is uncountable while the number of rationals is countable, so there must be at least one (in fact uncountable many) irrationals in (x,y).
(NOTE- I originally had `the number of irrationals in (x,y) is ...' which, as comment by stu below
points out, is assuming what I want to prove. Typo on my part.)

These proofs raise questions about our criteria of goodness-of-proof.

1) Which proof for rationals is better:

The delta-proof or the (x+y)/2 proof?

The (x+y)/2 proof is simple, but the delta-proof also works for irrationals.

2) Which proof for irrationals is better:

The delta proof or the uncountable proof?

Which one is simpler?

The uncountable proof gives more information (the number of irrationals is unctble)
but is nonconstructive.

3) Are there other proofs for either theorem?

Which proof do you prefer? Why? What is your criteria?

by GASARCH ( at December 06, 2016 02:56 PM UTC

TTI-Chicago invites applications for faculty positions in computer science at the level of tenure-track and senior (tenured) professor, endowed three-year research assistant professor, and visiting professor.

We will start reviewing applications on December 15, 2016, and will continue until the positions are filled.



by theorycsjobs at December 06, 2016 09:02 AM UTC

November was quite eventful for property testing, with six exciting new results for you to peruse.

Alice and Bob Show Distribution Testing Lower Bounds (They don’t talk to each other anymore.), by Eric Blais, Clément L. Canonne, and Tom Gur (ECCC). The authors examine distribution testing lower bounds through the lens of communication complexity, a-la Blais, Brody, and Matulef, who previously showed such a connection for property testing lower bounds in the Boolean function setting. In this work, the authors’ main result involves testing identity to a specific distribution \(p\). While Valiant and Valiant showed tight bounds involving the \(\ell_{2/3}\)-quasinorm of \(p\), this paper gives tight bounds using a different quantity, namely Peetre’s \(K\)-functional. Their techniques also give lower bounds for several other properties (some old and some new), including monotonicity, sparse symmetric support, and \(k\)-juntas in the PAIRCOND model.

Fast property testing and metrics for permutations, by Jacob Fox and Fan Wei (arXiv). This paper proves a general testing result for permutations. In particular, it shows that any hereditary property of permutations is two-sided testable with respect to the rectangular distance with a constant number of queries. While in many such testing results on combinatorial objects (such as graphs), a “constant number of queries” may be exorbitantly large (due to complexities arising from an application of the strong regularity lemma), surprisingly, the complexity obtained in this paper is polynomial in \(1/\varepsilon\).

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation, by Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh (arXiv, ECCC). There has been a considerable deal of work recently on estimating several symmetric distribution properties, namely support size, support coverage, entropy, and distance to uniformity. One drawback of these results is that, despite the similarities between these properties, seemingly different techniques are required to obtain optimal rates for each. This paper shows that one concept, pattern maximum likelihood (PML), unifies them all. A PML distribution of a multiset of samples is any distribution which maximizes the likelihood of observing the multiplicities of the multiset, after discarding the labels on the elements. This can behave quite differently from the sequence maximum likelihood (SML), or empirical distribution. In particular, if a multiset of samples on support \(\{x, y\}\) is \(\{x, y, x\}\), then the SML is \((2/3, 1/3)\), while the PML is \((1/2, 1/2)\). The main result of this paper is, if one can approximate PML, then applying the plug-in estimator gives the optimal sample complexity for all of the aforementioned properties. The one catch is that efficient approximation of the PML is currently open. Consider the gauntlet thrown to all our readers!

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures, by Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart (arXiv, ECCC). While the main focus of this work is on lower bounds for distribution estimation in the statistical query (SQ) model, this paper also has some interesting lower bounds for multivariate testing problems. Namely, they show that it is impossible to achieve a sample complexity which is significantly sublinear in the dimension for either of the following two problems:

  • Given samples from an \(n\)-dimensional distribution \(D\), distinguish whether \(D = \mathcal{N}(0,I)\) or \(D\) is \(\varepsilon/100\)-close to any \(\mathcal{N}(\mu, I)\) where \(\|\mu\|_2 \geq \varepsilon\).
  • Given samples from an \(n\)-dimensional distribution \(D\), distinguish whether \(D = \mathcal{N}(0,I)\) or \(D\) is a mixture of \(k\) Gaussians with almost non-overlapping components.

Collision-based Testers are Optimal for Uniformity and Closeness, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price (arXiv, ECCC). In the TCS community, the seminal results in the field of distribution testing include the papers of Goldreich and Ron and Batu et al., which study uniformity testing and \(\ell_2\)-closeness testing (respectively) using collision based testers. While these testers appeared to be lossy, subsequent results have attained tight upper and lower bounds for these problems. As suggested by the title, this paper shows that collision-based testers actually achieve the optimal sample complexities for uniformity and \(\ell_2\)-closeness testing.

Testing submodularity and other properties of valuation functions, by Eric Blais, and Abhinav Bommireddi (arXiv). This paper studies the query complexity of several properties which have been studied in the context of valuation functions in algorithmic game theory. These properties are real-valued functions over the Boolean hypercube, and include submodularity, additivity, unit-demand, and much more. The authors show that, for constant \(\varepsilon\) and any \(p \geq 1\), these properties are constant-sample testable. Their results are obtained via an extension of the testing by implicit learning method of Diakonikolas et al.

by Gautam "G" Kamath at December 06, 2016 05:10 AM UTC

Authors: Siddhartha V. Jayanti, Robert E. Tarjan
Download: PDF
Abstract: The disjoint set union problem is a basic problem in data structures with a wide variety of applications. We extend a known efficient sequential algorithm for this problem to obtain a simple and efficient concurrent wait-free algorithm running on an asynchronous parallel random access machine (APRAM). Crucial to our result is the use of randomization. Under a certain independence assumption, for a problem instance in which there are n elements, m operations, and p processes, our algorithm does Theta(m (alpha(n, m/(np)) + log(np/m + 1))) expected work, where the expectation is over the random choices made by the algorithm and alpha is a functional inverse of Ackermann's function. In addition, each operation takes O(log n) steps with high probability. Our algorithm is significantly simpler and more efficient than previous algorithms proposed by Anderson and Woll. Under our independence assumption, our algorithm achieves almost-linear speed-up for applications in which all or most of the processes can be kept busy.

at December 06, 2016 12:00 AM UTC

Authors: Yin Tat Lee, Santosh Vempala
Download: PDF
Abstract: We show that the KLS constant for n-dimensional isotropic logconcave measures is $n^{o(1)}$, thus approaching the conjecture that it is $O(1)$. As corollaries we obtain the same almost constant bound on the thin-shell estimate, isotropic constant, Poincar\'e constant and exponential concentration constant; it also follows that the ball walk for an isotropic logconcave density in $R^n$, from a warm start, converges in $O^{*}(n^{2+o(1)})$ steps.

at December 06, 2016 12:00 AM UTC

Authors: Jorma Tarhio, Jan Holub, Emanuele Giaquinta
Download: PDF
Abstract: More than 120 algorithms have been developed for exact string matching within the last 40 years. We show by experiments that the \naive{} algorithm exploiting SIMD instructions of modern CPUs (with symbols compared in a special order) is the fastest one for patterns of length up to about 50 symbols and extremely good for longer patterns and small alphabets. The algorithm compares 16 or 32 characters in parallel by applying SSE2 or AVX2 instructions, respectively. Moreover, it uses loop peeling to further speed up the searching phase. We tried several orders for comparisons of pattern symbols and the increasing order of their probabilities in the text was the best.

at December 06, 2016 12:00 AM UTC

Authors: Jennifer Iglesias, Rajmohan Rajaraman, R Ravi, Ravi Sundaram
Download: PDF
Abstract: We study the design of schedules for multi-commodity multicast. In this problem, we are given an undirected graph $G$ and a collection of source-destination pairs, and the goal is to schedule a minimum-length sequence of matchings that connects every source with its respective destination. Multi-commodity multicast models a classic information dissemination problem in networks where the primary communication constraint is the number of connections that a given node can make, not link bandwidth. Multi-commodity multicast and its special cases, broadcast and multicast, are all NP-complete and the best approximation factors known are $2^{\widetilde{O}(\sqrt{\log n})}$, $O(\log n/\log\log n)$, and $O(\log k/\log\log k)$, respectively, where $n$ is the number of nodes and $k$ is the number of terminals in the multicast instance.

Multi-commodity multicast is closely related to the problem of finding a subgraph of optimal poise, where the poise is defined as the sum of the maximum degree of the subgraph and the maximum distance between any source-destination pair in the subgraph. We first show that for any instance of the multicast problem, the minimum poise subgraph can be approximated to within a factor of $O(\log k \log n)$ with respect to the value of a natural LP relaxation in an $n$-node graph with $k$ source-destination pairs. This is the first upper bound on the integrality gap of the natural LP; all previous algorithms, both combinatorial and LP-based, yielded approximations with respect to the integer optimum. Using this integrality gap upper bound, we obtain our main result: an $O(\log^2 k\log^2 n)$-approximation for multi-commodity multicast for planar graphs. In addition, the techniques used also give an $O(\log^2 n)$-approximation for radio gossip in planar graphs. This is the first bound for radio gossip given that doesn't rely on the maximum degree of the graph.

at December 06, 2016 12:00 AM UTC

Authors: Apoorva Honnegowda Roopa, Shrisha Rao
Download: PDF
Abstract: This paper defines a distance function that measures the dissimilarity between planar geometric figures formed with straight lines. This function can in turn be used in partial matching of different geometric figures. For a given pair of geometric figures that are graphically isomorphic, one function measures the angular dissimilarity and another function measures the edge length disproportionality. The distance function is then defined as the convex sum of these two functions. The novelty of the presented function is that it satisfies all properties of a distance function and the computation of the same is done by projecting appropriate features to a cartesian plane. To compute the deviation from the angular similarity property, the Euclidean distance between the given angular pairs and the corresponding points on the $y=x$ line is measured. Further while computing the deviation from the edge length proportionality property, the best fit line, for the set of edge lengths, which passes through the origin is found, and the Euclidean distance between the given edge length pairs and the corresponding point on a $y=mx$ line is calculated. Iterative Proportional Fitting Procedure (IPFP) is used to find this best fit line. We demonstrate the behavior of the defined function for some sample pairs of figures.

at December 06, 2016 12:00 AM UTC

Authors: Jean-Lou De Carufel, Carsten Grimm, Stefan Schirra, Michiel Smid
Download: PDF
Abstract: We augment a tree $T$ with a shortcut $pq$ to minimize the largest distance between any two points along the resulting augmented tree $T+pq$. We study this problem in a continuous and geometric setting where $T$ is a geometric tree in the Euclidean plane, where a shortcut is a line segment connecting any two points along the edges of $T$, and we consider all points on $T+pq$ (i.e., vertices and points along edges) when determining the largest distance along $T+pq$. We refer to the largest distance between any two points along edges as the continuous diameter to distinguish it from the discrete diameter, i.e., the largest distance between any two vertices.

We establish that a single shortcut is sufficient to reduce the continuous diameter of a geometric tree $T$ if and only if the intersection of all diametral paths of $T$ is neither a line segment nor a single point. We determine an optimal shortcut for a geometric tree with $n$ straight-line edges in $O(n \log n)$ time. Apart from the running time, our results extend to geometric trees whose edges are rectifiable curves. The algorithm for trees generalizes our algorithm for paths.

at December 06, 2016 12:00 AM UTC

Authors: Elena Khramtcova, Evanthia Papadopoulou
Download: PDF
Abstract: This paper applies the randomized incremental construction (RIC) framework to computing the Hausdorff Voronoi diagram of a family of k clusters of points in the plane. The total number of points is n. The diagram is a generalization of Voronoi diagrams based on the Hausdorff distance function. The combinatorial complexity of the Hausdorff Voronoi diagram is O(n+m), where m is the total number of crossings between pairs of clusters. For non-crossing clusters (m=0), our algorithm works in expected O(n log n + k log n log k) time and deterministic

O(n) space. For arbitrary clusters (m=O(n^2)), the algorithm runs in expected O((m+n log k) log n) time and O(m +n log k) space. When clusters cross, bisectors are disconnected curves resulting in disconnected Voronoi regions that challenge the incremental construction. This paper applies the RIC paradigm to a Voronoi diagram with disconnected regions and disconnected bisectors, for the first time.

at December 06, 2016 12:00 AM UTC

Authors: Michael Sutton, Tal Ben-Nun, Amnon Barak, Sreepathi Pai, Keshav Pingali
Download: PDF
Abstract: This report presents an adaptive work-efficient approach for implementing the Connected Components algorithm on GPUs. The results show a considerable increase in performance (up to 6.8$\times$) over current state-of-the-art solutions.

at December 06, 2016 12:00 AM UTC

Authors: Dániel Marx, Anastasios Sidiropoulos
Download: PDF
Abstract: We are studying $d$-dimensional geometric problems that have algorithms with $1-1/d$ appearing in the exponent of the running time, for example, in the form of $2^{n^{1-1/d}}$ or $n^{k^{1-1/d}}$. This means that these algorithms perform somewhat better in low dimensions, but the running time is almost the same r all large values $d$ of the dimension. Our main result is showing that for some of these problems the dependence on $1-1/d$ is best possible under a standard complexity assumption. We show that, assuming the Exponential Time Hypothesis,

--- $d$-dimensional Euclidean TSP on $n$ points cannot be solved in time $2^{O(n^{1-1/d-\epsilon})}$ for any $\epsilon>0$, and

--- the problem of finding a set of $k$ pairwise nonintersecting $d$-dimensional unit balls/axis parallel unit cubes cannot be solved in time $f(k)n^{o(k^{1-1/d})}$ for any computable function $f$.

These lower bounds essentially match the known algorithms for these problems. To obtain these results, we first prove lower bounds on the complexity of Constraint Satisfaction Problems (CSPs) whose constraint graphs are $d$-dimensional grids. We state the complexity results on CSPs in a way to make them convenient starting points for problem-specific reductions to particular $d$-dimensional geometric problems and to be reusable in the future for further results of similar flavor.

at December 06, 2016 12:00 AM UTC

Authors: Johan Thapper, Stanislav Zivny
Download: PDF
Abstract: It has been shown that for a general-valued constraint language $\Gamma$ the following statements are equivalent: (1) any instance of $\operatorname{VCSP}(\Gamma)$ can be solved to optimality using a constant level of the Sherali-Adams LP hierarchy; (2) any instance of $\operatorname{VCSP}(\Gamma)$ can be solved to optimality using the third level of the Sherali-Adams LP hierarchy; (3) the support of $\Gamma$ satisfies the "bounded width condition", i.e., it contains weak near-unanimity operations of all arities.

We show that if the support of $\Gamma$ violates the bounded with condition then not only is $\operatorname{VCSP}(\Gamma)$ not solved by a constant level of the Sherali-Adams LP hierarchy but it is also not solved by $\Omega(n)$ levels of the Lasserre SDP hierarchy (also known as the sum-of-squares SDP hierarchy). For $\Gamma$ corresponding to linear equations in an Abelian group, this result follows from existing work on inapproximability of Max-CSPs. By a breakthrough result of Lee, Raghavendra, and Steurer [STOC'15], our result implies that for any $\Gamma$ whose support violates the bounded width condition no SDP relaxation of polynomial-size solves $\operatorname{VCSP}(\Gamma)$.

We establish our result by proving that various reductions preserve exact solvability by the Lasserre SDP hierarchy (up to a constant factor in the level of the hierarchy). Our results hold for general-valued constraint languages, i.e., sets of functions on a fixed finite domain that take on rational or infinite values, and thus also hold in notable special cases of $\{0,\infty\}$-valued languages (CSPs), $\{0,1\}$-valued languages (Min-CSPs/Max-CSPs), and $\mathbb{Q}$-valued languages (finite-valued CSPs).

at December 06, 2016 02:06 AM UTC

Authors: Alexander Heußner, Aleks Kissinger, Anton Wijs
Download: PDF
Abstract: Graphs are used as models in all areas of computer science: examples are state space graphs, control flow graphs, syntax graphs, UML-type models of all kinds, network layouts, social networks, dependency graphs, and so forth. Once such graphical models are constructed, they can be analysed and transformed to verify their correctness within a domain, discover new properties, or produce new equivalent and/or optimised versions.

Graphs as Models' main focus is the exchange and collaboration of researchers from different backgrounds. The workshop serves as platform to boost inter- and transdisciplinary research and wants to serve as leeway for new ideas. Thus, besides classical research presentations, the workshop is highly geared toward numerous interactive sessions.

The second edition of the Graphs as Models workshop was held on 2-3 June 2016 in Eindhoven, The Netherlands, colocated with the 19th European Joint Conferences on Theory and Practice of Software (ETAPS 2016).

at December 06, 2016 12:00 AM UTC

Authors: Mohammad Bavarian, Badih Ghazi, Elad Haramaty, Pritish Kamath, Ronald L. Rivest, Madhu Sudan
Download: PDF
Abstract: In the "correlated sampling" problem, two players, say Alice and Bob, are given two distributions, say $P$ and $Q$ respectively, over the same universe and access to shared randomness. The two players are required to output two elements, without any interaction, sampled according to their respective distributions, while trying to minimize the probability that their outputs disagree. A well-known protocol due to Holenstein, with close variants (for similar problems) due to Broder, and to Kleinberg and Tardos, solves this task with disagreement probability at most $2 \delta/(1+\delta)$, where $\delta$ is the total variation distance between $P$ and $Q$. This protocol has been used in several different contexts including sketching algorithms, approximation algorithms based on rounding linear programming relaxations, the study of parallel repetition and cryptography.

In this note, we give a surprisingly simple proof that this protocol is in fact tight. Specifically, for every $\delta \in (0,1)$, we show that any correlated sampling scheme should have disagreement probability at least $2\delta/(1+\delta)$. This partially answers a recent question of Rivest.

Our proof is based on studying a new problem we call "constrained agreement". Here, Alice is given a subset $A \subseteq [n]$ and is required to output an element $i \in A$, Bob is given a subset $B \subseteq [n]$ and is required to output an element $j \in B$, and the goal is to minimize the probability that $i \neq j$. We prove tight bounds on this question, which turn out to imply tight bounds for correlated sampling. Though we settle basic questions about the two problems, our formulation also leads to several questions that remain open.

at December 06, 2016 02:00 AM UTC