Theory of Computing Blog Aggregator

Optimal lower bounds for universal relation, samplers, and finding duplicates

Authors: Jelani Nelson, Jakub Pachocki, Zhengyu Wang
Abstract: In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x$ and $y$ in $\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n, \log(1/\delta)\log^2(\frac{n}{\log(1/\delta)})\})$ bits for failure probability $\delta$. Our lower bound holds even if promised $\mathop{support}(y)\subset \mathop{support}(x)$. As a corollary, we obtain optimal lower bounds for $\ell_p$-sampling in strict turnstile streams for $0\le p < 2$, as well as for the problem of finding duplicates in a stream. Our lower bounds do not need to use large weights, and hold even if it is promised that $x\in\{0,1\}^n$ at all points in the stream.

Our lower bound demonstrates that any algorithm $\mathcal{A}$ solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information theoretic minimum. Our encoder makes adaptive queries to $\mathcal{A}$ throughout its execution, but done carefully so as to not violate correctness. This is accomplished by injecting random noise into the encoder's interactions with $\mathcal{A}$, which is loosely motivated by techniques in differential privacy. Our correctness analysis involves understanding the ability of $\mathcal{A}$ to correctly answer adaptive queries which have positive but bounded mutual information with $\mathcal{A}$'s internal randomness, and may be of independent interest in the newly emerging area of adaptive data analysis with a theoretical computer science lens.

Resolving the Complexity of Some Fundamental Problems in Computational Social Choice

Authors: Palash Dey
Abstract: This thesis is in the area called computational social choice which is an intersection area of algorithms and social choice theory.

Minimum Cuts and Shortest Cycles in Directed Planar Graphs via Noncrossing Shortest Paths

Authors: Hung-Chun Liang, Hsueh-I Lu
Abstract: Let $G$ be an $n$-node simple directed planar graph with nonnegative edge weights. We study the fundamental problems of computing (1) a global cut of $G$ with minimum weight and (2) a~cycle of $G$ with minimum weight. The best previously known algorithm for the former problem, running in $O(n\log^3 n)$ time, can be obtained from the algorithm of \Lacki, Nussbaum, Sankowski, and Wulff-Nilsen for single-source all-sinks maximum flows. The best previously known result for the latter problem is the $O(n\log^3 n)$-time algorithm of Wulff-Nilsen. By exploiting duality between the two problems in planar graphs, we solve both problems in $O(n\log n\log\log n)$ time via a divide-and-conquer algorithm that finds a shortest non-degenerate cycle. The kernel of our result is an $O(n\log\log n)$-time algorithm for computing noncrossing shortest paths among nodes well ordered on a common face of a directed plane graph, which is extended from the algorithm of Italiano, Nussbaum, Sankowski, and Wulff-Nilsen for an undirected plane graph.

Width Hierarchies for Quantum and Classical Ordered Binary Decision Diagrams with Repeated Test

Abstract: We consider quantum, nondterministic and probabilistic versions of known computational model Ordered Read-$k$-times Branching Programs or Ordered Binary Decision Diagrams with repeated test ($k$-QOBDD, $k$-NOBDD and $k$-POBDD). We show width hierarchy for complexity classes of Boolean function computed by these models and discuss relation between different variants of $k$-OBDD.

Distance-sensitive hashing

Authors: Martin Aumüller, Tobias Christiani, Rasmus Pagh, Francesco Silvestri
Abstract: We initiate the study of distance-sensitive hashing, a generalization of locality-sensitive hashing that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them. More precisely, given a distance space $(X, \text{dist})$ and a "collision probability function" (CPF) $f\colon \mathbb{R}\rightarrow [0,1]$ we seek a distribution over pairs of functions $(h,g)$ such that for every pair of points $x, y \in X$ the collision probability is $\Pr[h(x)=g(y)] = f(\text{dist}(x,y))$. Locality-sensitive hashing is the study of how fast a CPF can decrease as the distance grows. For many spaces $f$ can be made exponentially decreasing even if we restrict attention to the symmetric case where $g=h$. In this paper we study how asymmetry makes it possible to achieve CPFs that are, for example, increasing or unimodal. Our original motivation comes from annulus queries where we are interested in searching for points at distance approximately $r$ from a query point, but we believe that distance-sensitive hashing is of interest beyond this application.

Information complexity of the AND function in the two-Party, and multiparty settings

Authors: Yuval Filmus, Hamed Hatami, Yaqiao Li, Suzin You
Abstract: In a recent breakthrough paper [M. Braverman, A. Garg, D. Pankratov, and O. Weinstein, From information to exact communication, STOC'13] Braverman et al. developed a local characterization for the zero-error information complexity in the two party model, and used it to compute the exact internal and external information complexity of the 2-bit AND function, which was then applied to determine the exact asymptotic of randomized communication complexity of the set disjointness problem.

In this article, we extend their results on AND function to the multi-party number-in-hand model by proving that the generalization of their protocol has optimal internal and external information cost for certain distributions. Our proof has new components, and in particular it fixes some minor gaps in the proof of Braverman et al.

The Dagstuhl Family

This week I'm at the Dagstuhl workshop on Computational Complexity of Discrete Problems. As you long time readers know Dagstuhl is a German center that hosts weekly computer science workshops. I've been coming to Dagstuhl for some 25 years now but for the first time brought my family, my wife Marcy and daughter Molly, so they can see where I have spent more than half a year total of my life. Molly, currently a freshman at the University of Chicago, was the only Chicago representative, though the attendees included four Chicago PhDs, a former postdoc and a former professor.

We had a different ice breaker, where each person wrote topics they think about which ended up looking look like an interesting bipartite graph.

Molly has a few thoughts on Dagstuhl:

The coolest thing about the study of computer science is this place.

Okay, I know my dad would disagree with me (he probably thinks the coolest thing about computer science is the computer science itself). But for me, someone quite removed from the math and science and thinking, this place is by far the coolest thing about the computer science community. The point of it is isolation, as well simultaneous connection. The isolation comes in the form of a meeting center in rural Germany, separated from the world, devices which can (and do) block wifi in rooms like lecture rooms and the dining hall, resulting in a week without much interaction with the outside world. The connection stems from this very isolation -- in this highly isolated place, people are forced to connect with each other face-to-face, and to get to know each other, as well as the ideas and problems people are working on. The isolation creates a heightened sense of community, both in social and intellectual senses of the word. Forced to be so close and so interconnected, it’s no wonder so many problems get solved here.

I’m glad I got to come see why my father has been coming here for a quarter century. He is very old.

by Lance Fortnow (noreply@blogger.com) at March 23, 2017 09:16 AM UTC

On the Probe Complexity of Local Computation Algorithms

Authors: Uriel Feige, Boaz Patt-Shamir, Shai Vardi
Abstract: The Local Computation Algorithms (LCA) model is a computational model aimed at problem instances with huge inputs and output. For graph problems, the input graph is accessed using probes: strong probes (SP) specify a vertex $v$ and receive as a reply a list of $v$'s neighbors, and weak probes (WP) specify a vertex $v$ and a port number $i$ and receive as a reply $v$'s $i^{th}$ neighbor. Given a local query (e.g., "is a certain vertex in the vertex cover of the input graph?"), an LCA should compute the corresponding local output (e.g., "yes" or "no") while making only a small number of probes, with the requirement that all local outputs form a single global solution (e.g., a legal vertex cover). We study the probe complexity of LCAs that are required to work on graphs that may have arbitrarily large degrees. In particular, such LCAs are expected to probe the graph a number of times that is significantly smaller than the maximum, average, or even minimum degree.

For weak probes, we focus on the weak coloring problem. Among our results we show a separation between weak 3-coloring and weak 2-coloring for deterministic LCAs: $\log^* n + O(1)$ weak probes suffice for weak 3-coloring, but $\Omega\left(\frac{\log n}{\log\log n}\right)$ weak probes are required for weak 2-coloring.

For strong probes, we consider randomized LCAs for vertex cover and maximal/maximum matching. Our negative results include showing that there are graphs for which finding a \emph{maximal} matching requires $\Omega(\sqrt{n})$ strong probes. On the positive side, we design a randomized LCA that finds a $(1-\epsilon)$ approximation to \emph{maximum} matching in regular graphs, and uses $\frac{1}{\epsilon }^{O\left( \frac{1}{\epsilon ^2}\right)}$ probes, independently of the number of vertices and of their degrees.

Query-to-Communication Lifting for BPP

Authors: Mika Göös, Toniann Pitassi, Thomas Watson
Abstract: For any $n$-bit boolean function $f$, we show that the randomized communication complexity of the composed function $f\circ g^n$, where $g$ is an index gadget, is characterized by the randomized decision tree complexity of $f$. In particular, this means that many query complexity separations involving randomized models (e.g., classical vs. quantum) automatically imply analogous separations in communication complexity.

A Counterexample to the "Majority is Least Stable" Conjecture

Authors: Vishesh Jain
Abstract: We exhibit a linear threshold function in 5 variables with strictly smaller noise stability (for small values of the correlation parameter) than the majority function on 5 variables, thereby providing a counterexample to the "Majority is Least Stable" Conjecture of Benjamini, Kalai, and Schramm.

Clustering for Different Scales of Measurement - the Gap-Ratio Weighted K-means Algorithm

Authors: Joris Guérin, Olivier Gibaru, Stéphane Thiery, Eric Nyiri
Abstract: This paper describes a method for clustering data that are spread out over large regions and which dimensions are on different scales of measurement. Such an algorithm was developed to implement a robotics application consisting in sorting and storing objects in an unsupervised way. The toy dataset used to validate such application consists of Lego bricks of different shapes and colors. The uncontrolled lighting conditions together with the use of RGB color features, respectively involve data with a large spread and different levels of measurement between data dimensions. To overcome the combination of these two characteristics in the data, we have developed a new weighted K-means algorithm, called gap-ratio K-means, which consists in weighting each dimension of the feature space before running the K-means algorithm. The weight associated with a feature is proportional to the ratio of the biggest gap between two consecutive data points, and the average of all the other gaps. This method is compared with two other variants of K-means on the Lego bricks clustering problem as well as two other common classification datasets.

Lifting randomized query complexity to randomized communication complexity

Authors: Anurag Anshu, Naresh B. Goud, Rahul Jain, Srijita Kundu, Priyanka Mukhopadhyay
Abstract: We show that for any (partial) query function $f:\{0,1\}^n\rightarrow \{0,1\}$, the randomized communication complexity of $f$ composed with $\mathrm{Index}^n_m$ (with $m= \mathrm{poly}(n)$) is at least the randomized query complexity of $f$ times $\log n$. Here $\mathrm{Index}_m : [m] \times \{0,1\}^m \rightarrow \{0,1\}$ is defined as $\mathrm{Index}_m(x,y)= y_x$ (the $x$th bit of $y$).

Our proof follows on the lines of Raz and Mckenzie [RM99] (and its generalization due to [GPW15]), who showed a lifting theorem for deterministic query complexity to deterministic communication complexity. Our proof deviates from theirs in an important fashion that we consider partitions of rectangles into many sub-rectangles, as opposed to a particular sub-rectangle with desirable properties, as considered by Raz and McKenzie. As a consequence of our main result, some known separations between quantum and classical communication complexities follow from analogous separations in the query world.

Efficient PAC Learning from the Crowd

Authors: Pranjal Awasthi, Avrim Blum, Nika Haghtalab, Yishay Mansour
Abstract: In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example.

In this paper, we show how by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider the realizable setting where there exists a true target function in $\mathcal{F}$ and consider a pool of labelers. When a noticeable fraction of the labelers are perfect, and the rest behave arbitrarily, we show that any $\mathcal{F}$ that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses. Moreover, we show that this can be done while each labeler only labels a constant number of examples and the number of labels requested per example, on average, is a constant. When no perfect labelers exist, a related task is to find a set of the labelers which are good but not perfect. We show that we can identify all good labelers, when at least the majority of labelers are good.

Approximating k-spanners in the LOCAL model

Authors: Michael Dinitz, Yasamin Nazari
Abstract: Graph spanners have been studied extensively, and have many applications in algorithms, distributed systems, and computer networks. For many of these application, we want distributed constructions of spanners, i.e., algorithms which use only local information. Dinitz and Krauthgamer (PODC 2011) provided a distributed approximation algorithm for 2-spanners in the LOCAL model with polylogarithmic running time, but the question of whether a similar algorithm exists for k-spanners with k > 2 remained open. In this paper, we show that a similar algorithm also works for cases where k > 2.

Subset sum problem in polycyclic groups

Authors: Andrey Nikolaev, Alexander Ushakov
Abstract: We consider a group-theoretic analogue of the classic subset sum problem. It is known that every virtually nilpotent group has polynomial time decidable subset sum problem. In this paper we use subgroup distortion to show that every polycyclic non-virtually-nilpotent group has NP-complete subset sum problem.

Topological Analysis of Nerves, Reeb Spaces, Mappers, and Multiscale Mappers

Authors: Tamal K. Dey, Facundo Memoli, Yusu Wang
Abstract: Data analysis often concerns not only the space where data come from, but also various types of maps attached to data. In recent years, several related structures have been used to study maps on data, including Reeb spaces, mappers and multiscale mappers. The construction of these structures also relies on the so-called \emph{nerve} of a cover of the domain.

In this paper, we aim to analyze the topological information encoded in these structures in order to provide better understanding of these structures and facilitate their practical usage.

More specifically, we show that the one-dimensional homology of the nerve complex $N(\mathcal{U})$ of a path-connected cover $\mathcal{U}$ of a domain $X$ cannot be richer than that of the domain $X$ itself. Intuitively, this result means that no new $H_1$-homology class can be "created" under a natural map from $X$ to the nerve complex $N(\mathcal{U})$. Equipping $X$ with a pseudometric $d$, we further refine this result and characterize the classes of $H_1(X)$ that may survive in the nerve complex using the notion of \emph{size} of the covering elements in $\mathcal{U}$. These fundamental results about nerve complexes then lead to an analysis of the $H_1$-homology of Reeb spaces, mappers and multiscale mappers.

The analysis of $H_1$-homology groups unfortunately does not extend to higher dimensions. Nevertheless, by using a map-induced metric, establishing a Gromov-Hausdorff convergence result between mappers and the domain, and interleaving relevant modules, we can still analyze the persistent homology groups of (multiscale) mappers to establish a connection to Reeb spaces.

Accepted papers at LICS 2017

from Luca Aceto

The list of accepted papers for LICS 2017 is now available at http://lics.rwth-aachen.de/lics17/accepted.html. Thanks to Joel Ouaknine and his PC for their work in selecting such a mouth-watering list of papers!

If you have not done so already, go ahead and register for the conference now. We strongly encourage prospective participants to register, and to make their travel and accommodation plans ASAP. Iceland gets literally fully booked early during the summer months.

by Luca Aceto (noreply@blogger.com) at March 22, 2017 10:33 PM UTC

TR17-053 | Query-to-Communication Lifting for BPP | Thomas Watson, Mika Göös, Toniann Pitassi

from ECCC papers

For any $n$-bit boolean function $f$, we show that the randomized communication complexity of the composed function $f\circ g^n$, where $g$ is an index gadget, is characterized by the randomized decision tree complexity of $f$. In particular, this means that many query complexity separations involving randomized models (e.g., classical vs.\ quantum) automatically imply analogous separations in communication complexity.

TCS+ talk: Wednesday, March 29 — Noah Stephens-Davidowitz, NYU

TCS+ is resuming! Our next talk will take place next Wednesday, March 29th at 1:00 PM Eastern Time (10:00 AM Pacific Time, 19:00 Central European Time, 17:00 UTC). Noah Stephens-Davidowitz (NYU) will present joint work with Oded Regev on “A Reverse Minkowski Theorem”, to appear in the next STOC  (abstract below).

Please make sure you reserve a spot for your group to join us live by signing up on the online form. As usual, for more information about the TCS+ online seminar series and the upcoming talks, or to suggest a possible topic or speaker, please see the website.

A classical problem in the geometry of numbers asks us to estimate how many lattice points lie in some ball around the origin. Minkowski’s celebrated theorem gives us a tight lower bound for this number that depends only on the determinant of the lattice. (One can think of the determinant as the limit of the density of lattice points inside large balls. So, Minkowski’s theorem gives a tight lower bound on a lattice’s “local density” based on its “global density.”) We resolve a conjecture due to Dadush that gives a nearly tight converse to Minkowski’s theorem—an upper bound on the number of lattice points in a ball that depends only on the determinants of sublattices. This theorem has numerous applications, from complexity theory, to the geometry of numbers, to the behavior of Brownian motion on flat tori.

Based on joint work with Oded Regev.

by plustcs at March 22, 2017 02:34 PM UTC

from Scott Aaronson

Yesterday Ryan Mandelbaum, at Gizmodo, posted a decidedly tongue-in-cheek piece about whether or not the universe is a computer simulation.  (The piece was filed under the category “LOL.”)

The immediate impetus for Mandelbaum’s piece was a blog post by Sabine Hossenfelder, a physicist who will likely be familiar to regulars here in the nerdosphere.  In her post, Sabine vents about the simulation speculations of philosophers like Nick Bostrom.  She writes:

Proclaiming that “the programmer did it” doesn’t only not explain anything – it teleports us back to the age of mythology. The simulation hypothesis annoys me because it intrudes on the terrain of physicists. It’s a bold claim about the laws of nature that however doesn’t pay any attention to what we know about the laws of nature.

After hammering home that point, Sabine goes further, and says that the simulation hypothesis is almost ruled out, by (for example) the fact that our universe is Lorentz-invariant, and a simulation of our world by a discrete lattice of bits won’t reproduce Lorentz-invariance or other continuous symmetries.

In writing his post, Ryan Mandelbaum interviewed two people: Sabine and me.

I basically told Ryan that I agree with Sabine insofar as she argues that the simulation hypothesis is lazy—that it doesn’t pay its rent by doing real explanatory work, doesn’t even engage much with any of the deep things we’ve learned about the physical world—and disagree insofar as she argues that the simulation hypothesis faces some special difficulty because of Lorentz-invariance or other continuous phenomena in known physics.  In short: blame it for being unfalsifiable rather than for being falsified!

Indeed, to whatever extent we believe the Bekenstein bound—and even more pointedly, to whatever extent we think the AdS/CFT correspondence says something about reality—we believe that in quantum gravity, any bounded physical system (with a short-wavelength cutoff, yada yada) lives in a Hilbert space of a finite number of qubits, perhaps ~1069 qubits per square meter of surface area.  And as a corollary, if the cosmological constant is indeed constant (so that galaxies more than ~20 billion light years away are receding from us faster than light), then our entire observable universe can be described as a system of ~10122 qubits.  The qubits would in some sense be the fundamental reality, from which Lorentz-invariant spacetime and all the rest would need to be recovered as low-energy effective descriptions.  (I hasten to add: there’s of course nothing special about qubits here, any more than there is about bits in classical computation, compared to some other unit of information—nothing that says the Hilbert space dimension has to be a power of 2 or anything silly like that.)  Anyway, this would mean that our observable universe could be simulated by a quantum computer—or even for that matter by a classical computer, to high precision, using a mere ~210^122 time steps.

Sabine might respond that AdS/CFT and other quantum gravity ideas are mere theoretical speculations, not solid and established like special relativity.  But crucially, if you believe that the observable universe couldn’t be simulated by a computer even in principle—that it has no mapping to any system of bits or qubits—then at some point the speculative shoe shifts to the other foot.  The question becomes: do you reject the Church-Turing Thesis?  Or, what amounts to the same thing: do you believe, like Roger Penrose, that it’s possible to build devices in nature that solve the halting problem or other uncomputable problems?  If so, how?  But if not, then how exactly does the universe avoid being computational, in the broad sense of the term?

I’d write more, but by coincidence, right now I’m at an It from Qubit meeting at Stanford, where everyone is talking about how to map quantum theories of gravity to quantum circuits acting on finite sets of qubits, and the questions in quantum circuit complexity that are thereby raised.  It’s tremendously exciting—the mixture of attendees is among the most stimulating I’ve ever encountered, from Lenny Susskind and Don Page and Daniel Harlow to Umesh Vazirani and Dorit Aharonov and Mario Szegedy to Google’s Sergey Brin.  But it should surprise no one that, amid all the discussion of computation and fundamental physics, the question of whether the universe “really” “is” a simulation has barely come up.  Why would it, when there are so many more fruitful things to ask?  All I can say with confidence is that, if our world is a simulation, then whoever is simulating it (God, or a bored teenager in the metaverse) seems to have a clear preference for the 2-norm over the 1-norm, and for the complex numbers over the reals.

by Scott at March 22, 2017 04:10 AM UTC

Writing for a Broad Scientific Audience

from Theory Matters

Many results in theoretical computer science are of interest to a broad scientific audience.  One way to reach that audience is to publish in venues aimed at them, such as Science, Nature, and the Proceedings of the National Academy of Sciences (PNAS).  What’s the best approach for writing for such venues?  Ryan O’Donnell collected advice from a number of TCS researchers who have successfully published in these venues in the past.  See this document for the summary (also available from Theory Matters under the “Resources” tab).

by timroughgarden at March 22, 2017 12:03 AM UTC

Construction of Directed 2K Graphs

Authors: Bálint Tillman, Athina Markopoulou, Carter T. Butts, Minas Gjoka
Abstract: We study the problem of constructing synthetic graphs that resemble real-world directed graphs in terms of their degree correlations. We define the problem of directed 2K construction (D2K) that takes as input the directed degree sequence (DDS) and a joint degree and attribute matrix (JDAM) so as to capture degree correlation specifically in directed graphs. We provide necessary and sufficient conditions to decide whether a target D2K is realizable, and we design an efficient algorithm that creates realizations with that target D2K. We evaluate our algorithm in creating synthetic graphs that target real-world directed graphs (such as Twitter) and we show that it brings significant benefits compared to state-of-the-art approaches.

Just-in-Time Batch Scheduling Problem with Two-dimensional Bin Packing Constraints

Authors: S. Polyakovskiy, A. Makarowsky, R. M'Hallah
Abstract: This paper introduces and approximately solves a multi-component problem where small rectangular items are produced from large rectangular bins via guillotine cuts. An item is characterized by its width, height, due date, and earliness and tardiness penalties per unit time. Each item induces a cost that is proportional to its earliness and tardiness. Items cut from the same bin form a batch, whose processing and completion times depend on its assigned items. The items of a batch have the completion time of their bin. The objective is to find a cutting plan that minimizes the weighted sum of earliness and tardiness penalties. We address this problem via a constraint programming based heuristic (CP) and an agent based modelling heuristic (AB). CP is an impact-based search strategy, implemented in the general-purpose solver IBM CP Optimizer. AB is constructive. It builds a solution through repeated negotiations between the set of agents representing the items and the set representing the bins. The agents cooperate to minimize the weighted earliness-tardiness penalties. The computational investigation shows that CP outperforms AB on small-sized instances while the opposite prevails for larger instances.

A Note on the Tree Augmentation Problem

Authors: Zeev Nutov
Abstract: In the Tree Augmentation problem we are given a tree $T=(V,F)$ and an additional set $E \subseteq V \times V$ of edges, called "links", with positive integer costs $\{c_e:e \in E\}$. The goal is to augment $T$ by a minimum cost set of links $J \subseteq E$ such that $T \cup J$ is $2$-edge-connected. Let $M$ denote the maximum cost of a link. Recently, Adjiashvili introduced a novel LP for the problem and used it to break the natural $2$-approximation barrier for instances when $M$ is a constant. Specifically, his algorithm computes a $1.96418+\epsilon$ approximate solution in time $n^{O(M/\epsilon^2)}$. Using a slightly weaker LP we achieve ratio $\frac{12}{7}+\epsilon$ for arbitrary costs and ratio $1.6+\epsilon$ for unit costs in time $2^{O(M/\epsilon^2)}$.

A Hybrid Feasibility Constraints-Guided Search to the Two-Dimensional Bin Packing Problem with Due Dates

Authors: S. Polyakovskiy, R. M'Hallah
Abstract: The two-dimensional non-oriented bin packing problem with due dates packs a set of rectangular items, which may be rotated by 90 degrees, into identical rectangular bins. The bins have equal processing times. An item's lateness is the difference between its due date and the completion time of its bin. The problem packs all items without overlap as to minimize maximum lateness Lmax.

The paper proposes a tight lower bound that enhances an existing bound on Lmax for 24.07% of the benchmark instances and matches it in 30.87% cases. In addition, it models the problem using mixed integer programming (MIP), and solves small-sized instances exactly using CPLEX. It approximately solves larger-sized instances using a two-stage heuristic. The first stage constructs an initial solution via a first-fit heuristic that applies an iterative constraint programming (CP)-based neighborhood search. The second stage, which is iterative too, approximately solves a series of assignment low-level MIPs that are guided by feasibility constraints. It then enhances the solution via a high-level random local search. The approximate approach improves existing upper bounds by 27.45% on average, and obtains the optimum for 33.93% of the instances. Overall, the exact and approximate approaches identify the optimum for 39.07% cases.

The proposed approach is applicable to complex problems. It applies CP and MIP sequentially, while exploring their advantages, and hybridizes heuristic search with MIP. It embeds a new lookahead strategy that guards against infeasible search directions and constrains the search to improving directions only; thus, differs from traditional lookahead beam searches.

Exact Affine OBDDs

Authors: Rishat Ibrahimov, Kamil Khadiev, Abuzer Yakaryilmaz
Abstract: We introduce affine OBDD model and we show that exact affine OBDDs can be exponentially narrower than bounded-error quantum and classical OBDDs on some certain problems. Moreover, we consider Las-Vegas quantum and classical automata models and improve the previous gap between deterministic and probabilistic models by a factor of 2 and then follow the same gap for the known most restricted quantum model. Lastly, we follow an exponential gap between exact affine finite automata and Las-Vegas classical and quantum models.

On the Interplay between Strong Regularity and Graph Densification

Authors: Marco Fiorucci, Alessandro Torcinovich, Manuel Curado, Francisco Escolano, Marcello Pelillo
Abstract: In this paper we analyze the practical implications of Szemer\'edi's regularity lemma in the preservation of metric information contained in large graphs. To this end, we present a heuristic algorithm to find regular partitions. Our experiments show that this method is quite robust to the natural sparsification of proximity graphs. In addition, this robustness can be enforced by graph densification.

New TCS Events Calendar

from Theory Matters

Shachar Lovett and Sasha Kulikov created a website for TCS events, at http://cstheory-events.org.  There is also a link to the site from the the top-level menu at Theory Matters.

From the creators:

The goal of this website is to allow the TCS community to advertise and learn about relevant events (workshops, schools, etc), with a focus on algorithms and complexity. We added the top theory conferences for convenience, but the goal is to mainly focus on events that do not repeat annually and that people may not be aware of them.

If you find any mistakes, want to update existing events, suggest a change  to the website or anything else, please email Shachar Lovett or Sasha Kulikov.

by timroughgarden at March 21, 2017 03:35 PM UTC

Opinions on the ERC after ten years

from Luca Aceto

I am collecting some opinions about the European Research Council (ERC) from researchers who have received funding from it, and from some who have not.

What is your overall opinion on the ERC? Do you think that it is good for European research?

My interest in this matter started because the ERC is ten (and so it might be a good time to draw a preliminary assessment of its impact on the European research environment) and was piqued by the opinions aired by the Italian physicist Sylos Labini who claimed that the ERC has become the main problem in European research funding. He says that there are three main problems with the ERC.
1. The first is that it uses "research excellence" to mask political choices.
2. The second is that rewarding today's excellence does nothing to support the excellence of tomorrow. Moreover, one does not reward excellent research by giving money to the top 5% of those who apply.
3. The third is that the ERC gives a bad example to national funding agencies in Europe, who also reward excellence.
See also http://www.sciencemag.org/news/2017/03/10-europe-s-excellence-fund-faces-calls-change, where Sylos Labini makes a cameo appearance in this paragraph:

But some chafe at the singular focus on excellence. Countries in southern Europe have cut their research budgets during the economic crisis, and now ERC is further weakening these countries by essentially redistributing their EU contributions to the research powerhouses in the north, says Francesco Sylos Labini, a physicist at the Enrico Fermi Center in Rome. And it’s not just the money: “The few Italian researchers that get an ERC grant go to Germany or another country to do their research,” he says.

I do not a long piece of text. Just a few lines would do. I'd be grateful if you could contribute to this discussion by posting your comments.

You might also wish to read Helga Nowotny's short piece entitled ERC---the next ten years.

by Luca Aceto (noreply@blogger.com) at March 21, 2017 09:19 AM UTC

Discovering Relaxed Sound Workflow Nets using Integer Linear Programming

Authors: S. J. van Zelst, B. F. van Dongen, W. M. P. van der Aalst, H. M. W. Verbeek
Abstract: Process mining is concerned with the analysis, understanding and improvement of business processes. Process discovery, i.e. discovering a process model based on an event log, is considered the most challenging process mining task. State-of-the-art process discovery algorithms only discover local control-flow patterns and are unable to discover complex, non-local patterns. Region theory based techniques, i.e. an established class of process discovery techniques, do allow for discovering such patterns. However, applying region theory directly results in complex, over-fitting models, which is less desirable. Moreover, region theory does not cope with guarantees provided by state-of-the-art process discovery algorithms, both w.r.t. structural and behavioural properties of the discovered process models. In this paper we present an ILP-based process discovery approach, based on region theory, that guarantees to discover relaxed sound workflow nets. Moreover, we devise a filtering algorithm, based on the internal working of the ILP-formulation, that is able to cope with the presence of infrequent behaviour. We have extensively evaluated the technique using different event logs with different levels of exceptional behaviour. Our experiments show that the presented approach allow us to leverage the inherent shortcomings of existing region-based approaches. The techniques presented are implemented and readily available in the HybridILPMiner package in the open-source process mining tool-kits ProM and RapidProM.

Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors

Authors: Moreno Marzolla, Gabriele D'Angelo
Abstract: In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. These are serious limitations since multicore processors are now ubiquitous, and DDM algorithms -- being CPU-intensive -- could benefit from additional computing power. We propose a parallel version of the Sort-Based Matching algorithm for shared-memory multiprocessors. Sort-Based Matching is one of the most efficient serial algorithms for the DDM problem, but is quite difficult to parallelize due to data dependencies. We describe the algorithm and compute its asymptotic running time; we complete the analysis by assessing its performance and scalability through extensive experiments on two commodity multicore systems based on a dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.

Reoptimization of the Closest Substring Problem under Pattern Length Modification

Authors: Jhoirene B. Clemente, Henry N. Adorna
Abstract: This study investigates whether reoptimization can help in solving the closest substring problem. We are dealing with the following reoptimization scenario. Suppose, we have an optimal l-length closest substring of a given set of sequences S. How can this information be beneficial in obtaining an (l+k)-length closest substring for S? In this study, we show that the problem is still computationally hard even with k=1. We present greedy approximation algorithms that make use of the given information and prove that it has an additive error that grows as the parameter k increases. Furthermore, we present hard instances for each algorithm to show that the computed approximation ratio is tight. We also show that we can slightly improve the running-time of the existing polynomial-time approximation scheme (PTAS) for the original problem through reoptimization.

On Optimal 2- and 3-Planar Graphs

Authors: Michael A. Bekos, Michael Kaufmann, Chrysanthi N. Raftopoulou
Abstract: A graph is $k$-planar if it can be drawn in the plane such that no edge is crossed more than $k$ times. While for $k=1$, optimal $1$-planar graphs, i.e., those with $n$ vertices and exactly $4n-8$ edges, have been completely characterized, this has not been the case for $k \geq 2$. For $k=2,3$ and $4$, upper bounds on the edge density have been developed for the case of simple graphs by Pach and T\'oth, Pach et al. and Ackerman, which have been used to improve the well-known "Crossing Lemma". Recently, we proved that these bounds also apply to non-simple $2$- and $3$-planar graphs without homotopic parallel edges and self-loops.

In this paper, we completely characterize optimal $2$- and $3$-planar graphs, i.e., those that achieve the aforementioned upper bounds. We prove that they have a remarkably simple regular structure, although they might be non-simple. The new characterization allows us to develop notable insights concerning new inclusion relationships with other graph classes.

Anisotropic triangulations via discrete Riemannian Voronoi diagrams

Authors: Jean-Daniel Boissonnat, Mael Rouxel-Labbé, Mathijs Wintraecken
Abstract: The construction of anisotropic triangulations is desirable for various applications, such as the numerical solving of partial differential equations and the representation of surfaces in graphics. To solve this notoriously difficult problem in a practical way, we introduce the discrete Riemannian Voronoi diagram, a discrete structure that approximates the Riemannian Voronoi diagram. This structure has been implemented and was shown to lead to good triangulations in $\mathbb{R}^2$ and on surfaces embedded in $\mathbb{R}^3$ as detailed in our experimental companion paper.

In this paper, we study theoretical aspects of our structure. Given a finite set of points $\cal P$ in a domain $\Omega$ equipped with a Riemannian metric, we compare the discrete Riemannian Voronoi diagram of $\cal P$ to its Riemannian Voronoi diagram. Both diagrams have dual structures called the discrete Riemannian Delaunay and the Riemannian Delaunay complex. We provide conditions that guarantee that these dual structures are identical. It then follows from previous results that the discrete Riemannian Delaunay complex can be embedded in $\Omega$ under sufficient conditions, leading to an anisotropic triangulation with curved simplices. Furthermore, we show that, under similar conditions, the simplices of this triangulation can be straightened.

The Hardness of Embedding Grids and Walls

Authors: Yijia Chen, Martin Grohe, Bingkai Lin
Abstract: The dichotomy conjecture for the parameterized embedding problem states that the problem of deciding whether a given graph $G$ from some class $K$ of "pattern graphs" can be embedded into a given graph $H$ (that is, is isomorphic to a subgraph of $H$) is fixed-parameter tractable if $K$ is a class of graphs of bounded tree width and $W[1]$-complete otherwise.

Towards this conjecture, we prove that the embedding problem is $W[1]$-complete if $K$ is the class of all grids or the class of all walls.

Spectrum Estimation from a Few Entries

Authors: Ashish Khetan, Sewoong Oh
Abstract: Singular values of a data in a matrix form provide insights on the structure of the data, the effective dimensionality, and the choice of hyper-parameters on higher-level data analysis tools. However, in many practical applications such as collaborative filtering and network analysis, we only get a partial observation. Under such scenarios, we consider the fundamental problem of recovering spectral properties of the underlying matrix from a sampling of its entries. We are particularly interested in directly recovering the spectrum, which is the set of singular values, and also in sample-efficient approaches for recovering a spectral sum function, which is an aggregate sum of the same function applied to each of the singular values. We propose first estimating the Schatten $k$-norms of a matrix, and then applying Chebyshev approximation to the spectral sum function or applying moment matching in Wasserstein distance to recover the singular values. The main technical challenge is in accurately estimating the Schatten norms from a sampling of a matrix. We introduce a novel unbiased estimator based on counting small structures in a graph and provide guarantees that match its empirical performance. Our theoretical analysis shows that Schatten norms can be recovered accurately from strictly smaller number of samples compared to what is needed to recover the underlying low-rank matrix. Numerical experiments suggest that we significantly improve upon a competing approach of using matrix completion methods.

Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors

Authors: Aleksandr Cariow, Galina Cariowa, Marina Chicheva
Abstract: In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of discrete quaternion Fourier transform basic operations with reduced implementation complexities. The first solution: a scheme for calculating sq product, the second solution: a scheme for calculating qt product, and the third solution: a scheme for calculating sqt product, where s is a so-called i-quaternion, t is an j-quaternion, and q is an usual quaternion. The direct multiplication of two usual quaternions requires 16 real multiplications (or two-operand multipliers in the case of fully parallel hardware implementation) and 12 real additions (or binary adders). At the same time, our solutions allow to design the computation units, which consume only 6 multipliers plus 6 two input adders for implementation of sq or qt basic operations and 9 binary multipliers plus 6 two-input adders and 4 four-input adders for implementation of sqt basic operation.

Definition of geometric space around analytic fractal trees using derivative coordinate funtions

Authors: Henk Mulder
Abstract: The concept of derivative coordinate functions proved useful in the formulation of analytic fractal functions to represent smooth symmetric binary fractal trees [1]. In this paper we introduce a new geometry that defines the fractal space around these fractal trees. We present the canonical and degenerate form of this fractal space and extend the fractal geometrical space to R3 specifically and Rn by a recurrence relation. We also discuss the usage of such fractal geometry.

Hardness of almost embedding simplicial complexes in $\mathbb R^d$

Abstract: A map $f\colon K\to \mathbb R^d$ of a simplicial complex is an almost embedding if $f(\sigma)\cap f(\tau)=\emptyset$ whenever $\sigma,\tau$ are disjoint simplices of $K$.

Theorem. Fix integers $d,k\ge2$ such that $d=\frac{3k}2+1$.

(a) Assume that $P\ne NP$. Then there exists a finite $k$-dimensional complex $K$ that does not admit an almost embedding in $\mathbb R^d$ but for which there exists an equivariant map $\tilde K\to S^{d-1}$.

(b) The algorithmic problem of recognition almost embeddability of finite $k$-dimensional complexes in $\mathbb R^d$ is NP hard.

The proof is based on the technique from the Matou\v{s}ek-Tancer-Wagner paper (proving an analogous result for embeddings), and on singular versions of the higher-dimensional Borromean rings lemma and a generalized van Kampen--Flores theorem.

Discriminative Distance-Based Network Indices and the Tiny-World Property

Authors: Mostafa Haghir Chehreghani, Albert Bifet, Talel Abdessalem
Abstract: Distance-based indices, including closeness centrality, average path length, eccentricity and average eccentricity, are important tools for network analysis. In these indices, the distance between two vertices is measured by the size of shortest paths between them. However, this measure has shortcomings. A well-studied shortcoming is that extending it to disconnected graphs (and also directed graphs) is controversial. The second shortcoming is that when this measure is used in real-world networks, a huge number of vertices may have exactly the same closeness/eccentricity scores. The third shortcoming is that in many applications, the distance between two vertices not only depends on the size of shortest paths, but also on the number of shortest paths between them. In this paper, we develop a new distance measure between vertices of a graph that yields discriminative distance-based centrality indices. This measure is proportional to the size of shortest paths and inversely proportional to the number of shortest paths. We present algorithms for exact computation of the proposed discriminative indices. We then develop randomized algorithms that precisely estimate average discriminative path length and average discriminative eccentricity and show that they give $(\epsilon,\delta)$-approximations of these indices. Finally, we preform extensive experiments over several real-world networks from different domains and show that compared to the traditional indices, discriminative indices have usually much more discriminability. Our experiments reveal that real-world networks have usually a tiny average discriminative path length, bounded by a constant (e.g., 2). We refer to this property as the tiny-world property.