Theory of Computing Blog Aggregator

Some people seem to be unaware that there is a newer DBLP site that has more features (among them, the ability to download all your BibTeX entries in one go, and better citation formatting). Anyway, the link is http://dblp.dagstuhl.de/db/.

by Sariel at October 31, 2014 05:02 PM UTC

The October issue of the EATCS Bulletin is now available online at http://bulletin.eatcs.org/index.php/beatcs/issue/view/16 from where you can access the individual contributions separately.

You can download a pdf with the printed version of the whole issue from http://www.eatcs.org/images/bulletin/beatcs114.pdf.

The Bulletin of the EATCS is open access, so people who are not members of the EATCS can read it. Let me thank the members of the association who make this service to the community possible with their support.  (EATCS members have access to the member area, which contains news and related articles and provides access to the Springer Reading Room; young researchers can also find announcements of open positions there.)

This issue of the bulletin is brimming with interesting content, including five EATCS Columns and, among other contributions, a piece by David Woodruff surveying the work for which he received the EATCS Presburger Award 2014. You might also enjoy reading the transcript of a dialogue between Christian Calude and Kurt Mehlhorn about theory, LEDA and Algorithm Engineering. I find it inspiring to read Christian's dialogues with famous members of our community and I always learn something useful from them. (Unfortunately, the lessons I think I learn do not often make it into my work practices. That's the theory-practice divide, I guess :-))

Here are a couple of excerpts to whet your appetite.
  • Kurt's motto, even definition, for Algorithm Engineering is: "Treat programs as first class citizens in algorithms research and not as an afterthought." He also adds that "Algorithm engineering is not only a sub-discipline of algorithms research. More importantly, it is a mind set."
  • CC: How do you manage to juggle between so many jobs in different countries?
    KM: I try to follow some simple principles.
    I avoid multi-tasking. I set aside time for particular tasks and then concentrate on them. For example, when I was writing my 1984 books and the LEDA book, I would work on the book every work day from 8:00 am to 12:00 pm. I would not accept phone calls or interruptions by students during this time. Now, the 8 am to 12 pm slot is reserved for reading, thinking and writing. The no-interruption rule still holds.
    I clean my desk completely every evening when I leave my office, so that I can start with an empty desk the next morning.
    When I accept a new responsibility, I decide what I am going to give up for it. For example, when I became vice-president of the Max Planck Society in 2002 (for a 6-year term), I resigned as editor of Algorithmica, Information and Computation, SIAM Journal of Computing, Journal of Discrete and Computational Geometry, International Journal of Computational Geometry and Applications, and Computing.
    And most importantly, I am supported by many people in what I do, in particular, my co-workers, my students, and the administrative staff in the institute and the department. Cooperation and delegation are very important.
Enjoy the issue and consider contributing to future ones!

by Luca Aceto (noreply@blogger.com) at October 31, 2014 09:05 AM UTC

Authors: Alexander P. Giles, Orestis Georgiou, Carl P. Dettmann
Download: PDF
Abstract: Random geometric networks are mathematical structures consisting of a set of nodes placed randomly within a bounded set $\mathcal{V}\subseteq\mathbb{R}^{d}$ and mutually coupled with a probability dependent on their Euclidean separation; they are the classic model used within the expanding field of ad-hoc wireless networks. In order to rank the importance of network nodes, we consider the well-established `betweenness' centrality measure (quantifying how often a node is on a shortest path of links between any pair of nodes). We provide an analytic treatment of betweenness within a random graph model using a continuum approach, deriving a closed-form expression for the expected betweenness of a node placed within a dense random geometric network formed inside a disk of radius $R$. We confirm this with numerical simulations, and discuss the importance of the formula for mitigating the `boundary effect' connectivity phenomenon, for cluster head node election protocol design and for detecting the location of a network's `vulnerability backbone'.
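As a concrete, minimal illustration of the model and the centrality measure (my own sketch, using a hard connection threshold in the unit square rather than the paper's probabilistic connection function inside a disk):

import networkx as nx

# Toy instance: n nodes placed uniformly at random in the unit square, with an
# edge whenever two nodes are within Euclidean distance r of each other.
n, r = 200, 0.15
G = nx.random_geometric_graph(n, r)

# Betweenness centrality: the fraction of shortest paths between other pairs of
# nodes that pass through a given node (networkx normalizes this by default).
bc = nx.betweenness_centrality(G)

# Nodes near the centre of the domain typically score highest, which is the
# boundary effect the paper's closed-form expression quantifies for a disk.
print(sorted(bc, key=bc.get, reverse=True)[:5])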

at October 31, 2014 12:41 AM UTC

Authors: Eric Blais, Clément L. Canonne, Igor C. Oliveira, Rocco A. Servedio, Li-Yang Tan
Download: PDF
Abstract: Monotone Boolean functions, and the monotone Boolean circuits that compute them, have been intensively studied in complexity theory. In this paper we study the structure of Boolean functions in terms of the minimum number of negations in any circuit computing them, a complexity measure that interpolates between monotone functions and the class of all functions. We study this generalization of monotonicity from the vantage point of learning theory, giving near-matching upper and lower bounds on the uniform-distribution learnability of circuits in terms of the number of negations they contain. Our upper bounds are based on a new structural characterization of negation-limited circuits that extends a classical result of A. A. Markov. Our lower bounds, which employ Fourier-analytic tools from hardness amplification, give new results even for circuits with no negations (i.e. monotone functions).

at October 31, 2014 12:40 AM UTC

Authors: Marthe Bonamy, Lukasz Kowalik
Download: PDF
Abstract: We show a kernel of at most $13k$ vertices for the Feedback Vertex Set problem restricted to planar graphs (Planar Feedback Vertex Set), i.e., a polynomial-time algorithm that transforms an input instance $(G,k)$ into an equivalent instance with at most $13k$ vertices. To this end we introduce a few new reduction rules. However, our main contribution is an application of the region decomposition technique in the analysis of the kernel size. We show that our analysis is tight, up to a constant additive term.

at October 31, 2014 12:41 AM UTC

Authors: Karsten Lehmann, Alban Grastien, Pascal Van Hentenryck
Download: PDF
Abstract: Recent years have witnessed significant interest in convex relaxations of the power flows, with several papers showing that the second-order cone relaxation is tight for tree networks under various conditions on loads or voltages. This paper shows that AC-feasibility, i.e., deciding whether some generator dispatch can satisfy a given demand, is NP-hard for tree networks.

at October 31, 2014 12:40 AM UTC

Authors: Timothy M. Chan, Fabrizio Frati, Carsten Gutwenger, Anna Lubiw, Petra Mutzel, Marcus Schaefer
Download: PDF
Abstract: We investigate the problem of constructing planar drawings with few bends for two related problems, the partially embedded graph problem---to extend a straight-line planar drawing of a subgraph to a planar drawing of the whole graph---and the simultaneous planarity problem---to find planar drawings of two graphs that coincide on shared vertices and edges. In both cases we show that if the required planar drawings exist, then there are planar drawings with a linear number of bends per edge and, in the case of simultaneous planarity, a constant number of crossings between every pair of edges. Our proofs provide efficient algorithms if the combinatorial embedding of the drawing is given. Our result on partially embedded graph drawing generalizes a classic result by Pach and Wenger which shows that any planar graph can be drawn with a linear number of bends per edge if the location of each vertex is fixed.

at October 31, 2014 12:00 AM UTC

Authors: Jesko Hüttenhain, Christian Ikenmeyer
Download: PDF
Abstract: We prove that for writing the 3 by 3 permanent polynomial as a determinant of a matrix consisting only of zeros, ones, and variables as entries, a 7 by 7 matrix is required. Our proof is computer based and uses the enumeration of bipartite graphs. Furthermore, we analyze sequences of polynomials that are determinants of polynomially sized matrices consisting only of zeros, ones, and variables. We show that these are exactly the sequences in the complexity class of constant free polynomially sized (weakly) skew circuits.

at October 31, 2014 12:41 AM UTC

Congratulations to the San Francisco Giants on winning the World Series last night. In honor of their victory, let's talk metrics. Baseball has truly embraced metrics, as evidenced by the book and movie Moneyball, about focusing on statistics to choose which players to trade for. This year we saw a dramatic increase in the infield shift, the practice of moving the infielders to different locations for each batter, based on statistics about where they hit the ball.

Metrics work in baseball because we do have lots of statistics, but also an objective goal of winning games and ultimately the World Series. You can use machine learning techniques to predict the effects of certain players and positions and the metrics can drive your decisions.

In the academic world we certainly have our own statistics: publication counts and citations, grant income, teaching evaluation scores, sizes of classes and majors, number of faculty and much more. We certainly draw useful information from these values, and they feed into decisions on hiring, promotion and the evaluation of departments and disciplines. But I don't like making decisions solely based on metrics, because we don't have an objective outcome.

What does it mean to be a great computer scientist? It's not just a number, not necessarily the person with a large number of citations or a high h-index, or the one who brings in huge grants, or the one with high teaching scores, or whose students get high-paying jobs. It's a much more subjective measure: the person who has a great impact, in the many various ways one can have an impact. It's why faculty applications require recommendation letters. It's why we have faculty recruiting and P&T committees, instead of just punching in a formula. It's why we have outside review committees that review departments and degrees, and peer review of grant proposals.

As you might have guessed, this post is motivated by attempts to rank departments based on metrics, such as described in the controversial guest post last week or by Mitzenmacher. There are so many rankings based on metrics that you just need to find one that makes you look good. But metric-based rankings have many problems; most importantly, they can't capture the subjective measure of greatness, and people will disagree on which metric to use. If a ranking takes hold, you may optimize for the metric instead of for the real goals, a bad allocation of resources.

I prefer the US News & World Report approach to ranking CS departments, which is based heavily on surveys filled out by department and graduate committee chairs. For the subareas, it would be better to have, for example, theory people rank the theory groups, but I still prefer the subjective approach.

In the end, the value of a program is its reputation, for a strong reputation is what attracts faculty and students. Reputation-based rankings can best capture the relative strengths of academic departments in what really matters.

by Lance Fortnow (noreply@blogger.com) at October 30, 2014 05:35 PM UTC

I’m happy to say that I’ve finally gotten things set up so that you can download a PDF of the book. The official web address for this is http://get.analysisofbooleanfunctions.org, or you can click “DOWNLOAD THE PDF” on the blog’s main page.

A small warning: On the download page, I am using a “Google form” to ask for your email address, to which the PDF will be sent. I promise not to send you any other emails; just the one containing the attached PDF. I would also like to warn that I think Google allows at most 100 emails per day, so there is some small chance that we might hit the limit today. If that happens, please try again tomorrow! Finally, the server hosting this website has been a bit flaky lately; I hope it will be fine in the future.

Many thanks to Cambridge University Press for agreeing (several years ago!) to allow the PDF to be freely available. Of course, I’m also very thankful to the heroes who sent in comments and bug-fixes to the blog draft of the book. Please feel free to continue doing so — I’m happy to keep making corrections.

Regarding the differences between the printed book and the electronic version: The printed book is “Version 1.00” and the PDF is “Version 1.01”. The main difference is that about 40 small typos/bugs have been fixed (nothing major; in almost all cases you can easily guess the correct thing once you notice the mistake). I would also like to add that the numbering of the theorems/definitions/exercises/etc. is identical between the printed book and the PDF, so if you would like to cite something, it doesn’t matter which version you cite. The page numbering is not the same, though, so please try not to cite things by page number.

Once more, thanks again to all the readers of the blog; I hope you enjoy the book!

Added later: We hit 100 emails; however, it appears that the downloads are still working. But in any case, if you have trouble, please try again tomorrow.

by Ryan O'Donnell at October 30, 2014 12:09 PM UTC

Authors: Jesper W. Mikkelsen
Download: PDF
Abstract: We study the amount of knowledge about the future that an online algorithm needs to color the edges of a graph optimally (i.e. using as few colors as possible). Assume that along with each edge, the online algorithm receives a fixed number of advice bits. For graphs of maximum degree $\Delta$, it follows from Vizing's Theorem that $\lceil \log(\Delta+1)\rceil$ bits per edge suffice to achieve optimality. We show that even for bipartite graphs, $\Omega(\log \Delta)$ bits per edge are in fact necessary. However, we also show that there is an online algorithm which can color the edges of a $d$-degenerate graph optimally using $O(\log d)$ bits of advice per edge. It follows that for planar graphs and other graph classes of bounded degeneracy, only $O(1)$ bits per edge are needed, independently of how large $\Delta$ is.
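As a quick sanity check on the upper bound (my own numerical example, not from the abstract): an oracle that knows an optimal coloring can simply announce each edge's color index, since Vizing's Theorem guarantees a proper coloring with at most $\Delta+1$ colors. For instance,

$$\Delta = 8 \;\Longrightarrow\; \text{at most } \Delta + 1 = 9 \text{ colors} \;\Longrightarrow\; \lceil \log_2 9 \rceil = 4 \text{ advice bits per edge suffice.}$$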

at October 30, 2014 12:40 AM UTC

Authors: Alexander Mäcker, Manuel Malatyali, Friedhelm Meyer auf der Heide
Download: PDF
Abstract: Consider n nodes connected to a single coordinator. Each node receives an individual online data stream of numbers and, at any point in time, the coordinator has to know the k nodes currently observing the largest values, for a given k between 1 and n. We design and analyze an algorithm that solves this problem while bounding the number of messages exchanged between the nodes and the coordinator. Our algorithm employs the idea of using filters, which, intuitively speaking, leads to few messages being sent if the new input is "similar" to the previous ones. The algorithm uses a number of messages that is in expectation by a factor of $O((\log \Delta + k) \log n)$ larger than that of an offline algorithm that sets filters in an optimal way, where $\Delta$ is upper bounded by the largest value observed by any node.
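A minimal sketch of the filter intuition (my own toy version, not the algorithm analyzed in the paper): nodes stay silent while their value is below a threshold set by the coordinator, so streams that stay "similar" generate no traffic.

class TopKCoordinator:
    """Toy filter-based top-k monitoring; an illustration only, the paper's
    algorithm and its message bounds are more refined than this."""

    def __init__(self, node_ids, k):
        self.k = k
        self.best = {v: float("-inf") for v in node_ids}     # largest reported value per node
        self.filters = {v: float("-inf") for v in node_ids}  # current filter per node
        self.messages = 0

    def observe(self, node, value):
        # Node-side test: no message while the value stays below the filter.
        if value <= self.filters[node]:
            return
        self.messages += 1                                    # one message to the coordinator
        self.best[node] = max(self.best[node], value)
        # Coordinator resets every filter to the k-th largest known value, so
        # only observations that could change the top-k trigger new messages.
        kth = sorted(self.best.values(), reverse=True)[self.k - 1]
        self.filters = {v: kth for v in self.best}

    def top_k(self):
        return sorted(self.best, key=self.best.get, reverse=True)[:self.k]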

at October 30, 2014 12:40 AM UTC

Authors: Oliver Friedmann, Thomas Dueholm Hansen, Uri Zwick
Download: PDF
Abstract: In Friedmann, Hansen, and Zwick (2011) we claimed that the expected number of pivoting steps performed by the Random-Facet algorithm of Kalai and of Matousek, Sharir, and Welzl is equal to the expected number of pivoting steps performed by Random-Facet^*, a variant of Random-Facet that bases its random decisions on one random permutation. We then obtained a lower bound on the expected number of pivoting steps performed by Random-Facet^* and claimed that the same lower bound holds also for Random-Facet. Unfortunately, the claim that the expected numbers of steps performed by Random-Facet and Random-Facet^* are the same is false. We provide here simple examples that show that the expected numbers of steps performed by the two algorithms are not the same.

at October 30, 2014 12:40 AM UTC

We study the space complexity of the cutting planes proof system, in which the lines in a proof are integral linear inequalities. We measure the space used by a refutation as the number of inequalities that need to be kept on a blackboard while verifying it. We show that any unsatisfiable set of inequalities has a cutting planes refutation in space five. This is in contrast to the weaker resolution proof system, for which the analogous space measure has been well-studied and many optimal lower bounds are known. Motivated by this result we consider a natural restriction of cutting planes, in which all coefficients have size bounded by a constant. We show that there is a CNF which requires super-constant space to refute in this system. The system nevertheless already has an exponential speed-up over resolution with respect to size, and we additionally show that it is stronger than resolution with respect to space, by constructing constant-space cutting planes proofs of the pigeonhole principle with coefficients bounded by two. We also consider variable space for cutting planes, where we count the number of instances of variables on the blackboard, and total space, where we count the total number of symbols.

at October 29, 2014 03:06 PM UTC

We describe a family of CNF formulas in $n$ variables, with small initial width, which have polynomial length resolution refutations. By a result of Ben-Sasson and Wigderson it follows that they must also have narrow resolution refutations, of width $O(\sqrt {n \log n})$. We show that, for our formulas, this decrease in width comes at the expense of an increase in size, and any such narrow refutations must have exponential length.

at October 29, 2014 03:06 PM UTC

Authors: Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Andreas Pavlogiannis, Prateesh Goyal
Download: PDF
Abstract: Interprocedural analysis is at the heart of numerous applications in programming languages, such as alias analysis, constant propagation, etc. Recursive state machines (RSMs) are standard models for interprocedural analysis. We consider a general framework with RSMs where the transitions are labeled from a semiring, and path properties are algebraic with semiring operations. RSMs with algebraic path properties can model interprocedural dataflow analysis problems, the shortest path problem, the most probable path problem, etc. The traditional algorithms for interprocedural analysis focus on path properties where the starting point is \emph{fixed} as the entry point of a specific method. In this work, we consider possible multiple queries, as required in many applications such as alias analysis. The study of multiple queries allows us to bring in a very important algorithmic distinction between the resource usage of the \emph{one-time} preprocessing and that of \emph{each individual} query. The second aspect that we consider is that the control flow graphs of most programs have constant treewidth.

Our main contributions are simple and implementable algorithms that support multiple queries for algebraic path properties for RSMs that have constant treewidth. Our theoretical results show that our algorithms have small additional one-time preprocessing, but can answer subsequent queries significantly faster as compared to the current best-known solutions for several important problems, such as interprocedural reachability and shortest path. We provide a prototype implementation for interprocedural reachability and intraprocedural shortest path that gives a significant speed-up on several benchmarks.

at October 29, 2014 12:41 AM UTC

Authors: Shipra Agrawal, Nikhil R. Devanur
Download: PDF
Abstract: We introduce the online stochastic Convex Programming (CP) problem, a very general version of stochastic online problems which allows arbitrary concave objectives and convex feasibility constraints. Many well-studied problems like online stochastic packing and covering, online stochastic matching with concave returns, etc. form a special case of online stochastic CP. We present fast algorithms for these problems, which achieve near-optimal regret guarantees for both the i.i.d. and the random permutation models of stochastic inputs. When applied to the special case of online packing, our ideas yield a simpler and faster primal-dual algorithm for this well-studied problem, which achieves the optimal competitive ratio. Our techniques make explicit the connection of the primal-dual paradigm and online learning to online stochastic CP.

at October 29, 2014 12:41 AM UTC

Authors: Romain Hollanders, Balázs Gerencsér, Jean-Charles Delvenne, Raphaël M. Jungers
Download: PDF
Abstract: Solving Markov Decision Processes (MDPs) is a recurrent task in engineering. Even though it is known that solutions minimizing the infinite-horizon expected reward can be found in polynomial time using linear programming techniques, iterative methods like the Policy Iteration algorithm (PI) usually remain the most efficient in practice. This method is guaranteed to converge in a finite number of steps. Unfortunately, it is known that it may require a number of steps exponential in the size of the problem to converge. On the other hand, many open questions remain regarding the actual worst-case complexity. In this work, we provide the first improvement over the fifteen-year-old upper bound of Mansour & Singh (1999) by showing that PI requires at most k/(k-1)*k^n/n + o(k^n/n) iterations to converge, where n is the number of states of the MDP and k is the maximum number of actions per state. Perhaps more importantly, we also show that this bound is optimal for an important relaxation of the problem.
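To give a feel for the bound (my own arithmetic, not part of the abstract), consider the smallest non-trivial case $k = 2$:

$$\frac{k}{k-1}\cdot\frac{k^n}{n} + o\!\left(\frac{k^n}{n}\right) \;=\; \frac{2\cdot 2^n}{n} + o\!\left(\frac{2^n}{n}\right),$$

so for an MDP with $n = 20$ states and two actions per state the bound is roughly $2 \cdot 2^{20}/20 \approx 10^5$ iterations, about a factor of $10$ below the trivial bound of $2^{20} \approx 10^6$ policies.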

at October 29, 2014 12:40 AM UTC

Authors: Krzysztof Ciebiera, Piotr Godlewski, Piotr Sankowski, Piotr Wygocki
Download: PDF
Abstract: This paper summarizes the work on implementing a few solutions for the Steiner Tree problem which we undertook in the PAAL project. The main focus of the project is the development of generic implementations of approximation algorithms together with universal solution frameworks. In particular, we have implemented Zelikovsky's 11/6-approximation using the local search framework, and the 1.39-approximation by Byrka et al. using the iterative rounding framework. These two algorithms are experimentally compared with the greedy 2-approximation, with the exact but exponential-time Dreyfus-Wagner algorithm, as well as with results given by state-of-the-art local search techniques by Uchoa and Werneck. The results of this paper are twofold. On one hand, we demonstrate that high-level algorithmic concepts can be designed and efficiently used in C++. On the other hand, we show that the above algorithms with good theoretical guarantees give decent results in practice, but are inferior to state-of-the-art heuristic approaches.

at October 29, 2014 12:41 AM UTC

Authors: Oliver Friedmann, Thomas Dueholm Hansen, Uri Zwick
Download: PDF
Abstract: The Random-Facet algorithm of Kalai and of Matousek, Sharir and Welzl is an elegant randomized algorithm for solving linear programs and more general LP-type problems. Its expected subexponential time of $2^{\tilde{O}(\sqrt{m})}$, where $m$ is the number of inequalities, makes it the fastest known combinatorial algorithm for solving linear programs. We previously showed that Random-Facet performs an expected number of $2^{\tilde{\Omega}(\sqrt[3]{m})}$ pivoting steps on some LPs with $m$ inequalities that correspond to $m$-action Markov Decision Processes (MDPs). We also showed that Random-Facet-1P, a one permutation variant of Random-Facet, performs an expected number of $2^{\tilde{O}(\sqrt{m})}$ pivoting steps on these examples. Here we show that the same results can be obtained using LPs that correspond to instances of the classical shortest paths problem. This shows that the stochasticity of the MDPs, which is essential for obtaining lower bounds for Random-Edge, is not needed in order to obtain lower bounds for Random-Facet. We also show that our new $2^{\tilde{\Omega}(\sqrt{m})}$ lower bound applies to Random-Bland, a randomized variant of the classical anti-cycling rule suggested by Bland.

at October 29, 2014 12:41 AM UTC

Authors: Deeparnab Chakrabarty, Sanjeev Khanna, Shi Li
Download: PDF
Abstract: Makespan minimization on unrelated machines is a classic problem in approximation algorithms. No polynomial time $(2-\delta)$-approximation algorithm is known for the problem for constant $\delta> 0$. This is true even for certain special cases, most notably the restricted assignment problem where each job has the same load on any machine but can be assigned to one from a specified subset. Recently in a breakthrough result, Svensson [Svensson, 2011] proved that the integrality gap of a certain configuration LP relaxation is upper bounded by $1.95$ for the restricted assignment problem; however, the rounding algorithm is not known to run in polynomial time.

In this paper we consider the $(1,\varepsilon)$-restricted assignment problem where each job is either heavy ($p_j = 1$) or light ($p_j = \varepsilon$), for some parameter $\varepsilon > 0$. Our main result is a $(2-\delta)$-approximate polynomial time algorithm for the $(1,\varepsilon)$-restricted assignment problem for a fixed constant $\delta > 0$. Even for this special case, the best polynomial-time approximation factor known so far is 2. We obtain this result by rounding the configuration LP relaxation for this problem. A simple reduction from vertex cover shows that this special case remains NP-hard to approximate to within a factor better than 7/6.

at October 29, 2014 12:40 AM UTC



On Sunday we had the Symposium on Theoretical Computer Science on the Occasion of Michael Sipser's 60th birthday to celebrate what Mike has brought to research (seminal work on the complexity of randomness and circuits), service (ten years leading the MIT math department before recently becoming Dean of Science) and education (his great textbook and the corresponding popular course he still teaches).

We had an incredible turnout for the symposium and banquet that followed. I counted five Turing Award Winners (Mike's advisor Manuel Blum, Leslie Valiant, Shafi Goldwasser, Silvio Micali and Ron Rivest), five of the nine Nevanlinna Prize winners (Mike's student Daniel Spielman, Avi Wigderson, Valiant, Peter Shor and Madhu Sudan) and nine Gödel Prize winners.

Manuel Blum talked about his work with Jeremiah Blocki, Anupam Datta, and Santosh Vempala about a human computable hash function to create different passwords for every website. I've seen Manuel and Santosh produce passwords from arbitrary words and I'm impressed.

Johan Håstad recounted the early days of circuit complexity. Avi also talked about lower bounds. Shafi talked about her great paper with Mike showing public and private coin interactive proofs have the same power, and about a recent application of that result by Moni Naor et al. showing how a defendant can convince the police his DNA doesn't match without revealing his DNA.

A number of Mike's PhD students also gave talks. Dan Spielman gave a framework for designing quantum algorithms without knowing much quantum.

The banquet included several stories, thanks and toasts. Nearly all of Mike's students participated in some way, a great collection of men and women I'm proud to be part of: David Barrington, Ravi Boppana, Jonathan Buss, Andrew Chou, Aditi Dhagat, Lance Fortnow, David Gillman, Michelangelo Grigni, Christos Kapoutsis, Marcos Kiwi, Mary (Bruzzese) O'Connor, Sofya Raskhodnikova, Zack Remscrim (current), Alexander Russell, Leonard Schulman, Daniel Spielman, Ravi Sundaram, Andrew Sutherland and Yiqun Yin.

by Lance Fortnow (noreply@blogger.com) at October 28, 2014 04:21 PM UTC

Like many (theoretical) computer scientists, I spent far, far too much time this weekend thinking about rankings.  Well, really, it was more that I was being far, far too amused by the various posts and comments on the novel ranking site "Ranking of CS Departments based on the Number of Papers in Theoretical Computer Science",  which led to posts at the Computational Complexity blog and in theory, the first of which shut down the comments rather quickly, and the second of which is at this time nearing the amazing 100 comment line.  (Luca's post deserves it -- it was the funniest thing I've seen all week.  Kudos.) 

Now, to fan these flames to further untold heights, I was informed that new US News and World Report Computer Science Rankings are out.  Needless to say, I disagree with them.  How exactly can Harvard be second to MIT?  Ridiculous.

(For those who are concerned about such things, the methodology is also given, though I haven't checked to see what parts of it are reproducible.  And what is that "Web Of Science" thing?  Is it as good as the citation tools in Google Scholar?)

For the ultimate in ranking tongue-in-cheek-iness, and for those who haven't seen it, I also recommend the NRC Deranker, which will (eventually) provide you the ranking you personally agree most with. 


by Michael Mitzenmacher (noreply@blogger.com) at October 28, 2014 03:46 PM UTC

UPenn Big Data Reading Group

This semester I am running a reading group on algorithms for big data at UPenn. The goal is to cover some of the most important papers in streaming and massively parallel computation, which came out in the last 5-10 years. This area is evolving extremely fast, so I will be grateful for any suggestions about the papers missing from our list. Special thanks to Silvio Lattanzi for a few suggestions he gave me while I was visiting Google NYC on Friday!

I also want to give a shout-out to the Penn theory graduate students, who made presentations at the group meetings so much more enjoyable than I ever imagined. Thank you, Steven Wu, Justin Hsu, Ryan Rogers and Sepehr Assadi! I guess some of them are going to hit the internship job market soon. Keep an eye out if you are looking to hire in algorithms for big graphs and other data.

Penn Big Data Reading Group was originally published by Grigory Yaroslavtsev at The Big Data Theory on October 28, 2014.

by Grigory Yaroslavtsev (grigory@grigory.us) at October 28, 2014 12:00 AM UTC

Authors: Abhishek Bhowmick, Ariel Gabizon, Thái Hoàng Lê, David Zuckerman
Download: PDF
Abstract: We propose a new model of a weakly random source that admits randomness extraction. Our model of additive sources includes such natural sources as uniform distributions on arithmetic progressions (APs), generalized arithmetic progressions (GAPs), and Bohr sets, each of which generalizes affine sources. We give an explicit extractor for additive sources with linear min-entropy over both $\mathbb{Z}_p$ and $\mathbb{Z}_p^n$, for large prime $p$, although our results over $\mathbb{Z}_p^n$ require that the source further satisfy a list-decodability condition. As a corollary, we obtain explicit extractors for APs, GAPs, and Bohr sources with linear min-entropy, although again our results over $\mathbb{Z}_p^n$ require the list-decodability condition. We further explore special cases of additive sources. We improve previous constructions of line sources (affine sources of dimension 1), requiring a field of size linear in $n$, rather than $\Omega(n^2)$ by Gabizon and Raz. This beats the non-explicit bound of $\Theta(n \log n)$ obtained by the probabilistic method. We then generalize this result to APs and GAPs.

at October 28, 2014 12:40 AM UTC

Authors: Tobias Jacobs, Salvatore Longo
Download: PDF
Abstract: The Windows Scheduling Problem, also known as the Pinwheel Problem, is to schedule periodic jobs subject to their processing frequency demands. Instances are given as a set of jobs that have to be processed infinitely often such that the time interval between two consecutive executions of the same job j is no longer than the job's given period $p_j$.

The key contribution of this work is a new interpretation of the problem variant with exact periods, where the time interval between consecutive executions must be strictly $p_j$. We show that this version is equivalent to a natural combinatorial problem we call Partial Coding. Reductions in both directions can be realized in polynomial time, so that both hardness proofs and algorithms for Partial Coding transfer to Windows Scheduling.

Applying this new perspective, we obtain a number of new results regarding the computational complexity of various Windows Scheduling Problem variants. We prove that even the case of one processor and unit-length jobs does not admit a pseudo-polynomial time algorithm unless SAT can be solved by a randomized method in expected quasi-polynomial time. This result also extends to the case of inexact periods, which answers a question that has remained open for more than two decades. Furthermore, we report an error found in a hardness proof previously given for the multi-machine case without machine migration, and we show that this variant reduces to the single-machine case. Finally, we prove that even with unit-length jobs the problem is co-NP-hard when jobs are allowed to migrate between machines.

at October 28, 2014 12:00 AM UTC

Authors: Reza Eghbali, Jon Swenson, Maryam Fazel
Download: PDF
Abstract: Online optimization problems arise in many resource allocation tasks, where the future demands for each resource and the associated utility functions change over time and are not known a priori, yet resources need to be allocated at every point in time despite the future uncertainty. In this paper, we consider online optimization problems with general concave utilities. We modify and extend an online optimization algorithm proposed by Devanur et al. for linear programming to this general setting. The model we use for the arrival of the utilities and demands is known as the random permutation model, where a fixed collection of utilities and demands is presented to the algorithm in random order. We prove that under this model the algorithm achieves a competitive ratio of $1-O(\epsilon)$ under a near-optimal assumption that the bid-to-budget ratio is $O \left(\frac{\epsilon^2}{\log\left({m}/{\epsilon}\right)}\right)$, where $m$ is the number of resources, while enjoying a significantly lower computational cost than the only algorithm known to achieve the optimal competitive ratio, proposed by Kesselheim et al. We draw a connection between the proposed algorithm and subgradient methods used in convex optimization. In addition, we present numerical experiments that demonstrate the performance and speed of this algorithm in comparison to existing algorithms.

at October 28, 2014 12:41 AM UTC

Authors: Marek Chrobak, Mordecai Golin, Tak-Wah Lam, Dorian Nogneng
Download: PDF
Abstract: We consider scheduling problems for unit jobs with release times, where the number or size of the gaps in the schedule is taken into consideration, either in the objective function or as a constraint. Except for a few papers on energy minimization, there is no work in the scheduling literature that uses performance metrics depending on the gap structure of a schedule. One of our objectives is to initiate the study of such scheduling problems with gaps. We show that such problems often lead to interesting algorithmic problems, with connections to other areas of algorithmics. We focus on the model with unit jobs. First we examine scheduling problems with deadlines, where we consider variants of minimum-gap scheduling, including maximizing throughput with a budget for gaps or minimizing the number of gaps with a throughput requirement. We then turn to other objective functions. For example, in some scenarios, gaps in a schedule may be actually desirable, leading to the problem of maximizing the number of gaps. Other versions we study include minimizing maximum gap or maximizing minimum gap. The second part of the paper examines the model without deadlines, where we focus on the tradeoff between the number of gaps and the total or maximum flow time. For all these problems we provide polynomial time algorithms, with running times ranging from $O(n \log n)$ for some problems to $O(n^7)$ for others. The solutions involve a spectrum of algorithmic techniques, including different dynamic programming formulations, speed-up techniques based on searching Monge arrays, searching X + Y matrices, or implicit binary search.

at October 28, 2014 12:41 AM UTC

Authors: Aleksandr Maksimenko
Download: PDF
Abstract: Let $X$ be a finite set in $Z^d$. We consider the problem of optimizing a linear function $f(x) = c^T x$ on $X$, where $c\in Z^d$ is an input vector; we call this the problem $X$. A problem $X$ is related to the linear program $\max\limits_{x \in P} f(x)$, where the polytope $P$ is the convex hull of $X$. The key parameters for evaluating the complexity of a problem $X$ are the dimension $d$, the cardinality $|X|$, and the encoding size $S(X) = \log_2 \left(\max\limits_{x\in X} \|x\|_{\infty}\right)$. We show that if the (time and space) complexity of some algorithm $A$ for solving a problem $X$ is defined only in terms of the combinatorial structure of $P$ and the size $S(X)$, then for every $d$ and $n$ there exists a polynomially (in $d$, $\log n$, and $S$) solvable problem $Y$ with $\dim Y = d$, $|Y| = n$, such that the algorithm $A$ requires exponential time or space for solving $Y$.

at October 28, 2014 12:41 AM UTC

Authors: Jean-Daniel Boissonnat, Ramsay Dyer, Arijit Ghosh, Steve Y. Oudot
Download: PDF
Abstract: In this paper, we give the first algorithm that outputs a faithful reconstruction of a submanifold of Euclidean space without maintaining or even constructing complicated data structures such as Voronoi diagrams or Delaunay complexes. Our algorithm uses the witness complex and relies on the stability of power protection, a notion introduced in this paper. The complexity of the algorithm depends exponentially on the intrinsic dimension of the manifold, rather than the dimension of the ambient space, and linearly on the dimension of the ambient space. Another interesting feature of this work is that no explicit coordinates of the points in the point sample are needed. The algorithm only needs the distance matrix as input, i.e., only the distances between points in the point sample.

at October 28, 2014 12:42 AM UTC

Authors: Aleksandr Cariow, Galina Cariowa
Download: PDF
Abstract: In this paper we present a hardware-oriented algorithm for computing a constant matrix-vector product when all elements of the vector and matrix are complex numbers. Compared with the naive method for the analogous calculation, the proposed algorithm drastically reduces the number of multipliers required for an FPGA implementation of complex-valued constant matrix-vector multiplication. While a fully parallel hardware implementation of the naive (schoolbook) method for complex-valued matrix-vector multiplication requires 4MN multipliers, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multipliers, 3M(N+2)+1.5N+2 two-input adders and 3(M+1) N/2-input adders.
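For background on where such multiplier savings come from, here is the classic 3-multiplication complex product (my own illustration of the general idea; the paper's construction is a matrix-level algorithm, not this scalar identity):

def complex_mul_3(a, b, c, d):
    # Compute (a + bi) * (c + di) with 3 real multiplications instead of 4,
    # using the Gauss/Karatsuba identity ad + bc = (a + b)(c + d) - ac - bd.
    t1 = a * c
    t2 = b * d
    t3 = (a + b) * (c + d)
    return t1 - t2, t3 - t1 - t2   # (real part, imaginary part)

# Quick check against Python's built-in complex arithmetic.
re, im = complex_mul_3(2, 3, 4, 5)
assert complex(re, im) == (2 + 3j) * (4 + 5j)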

at October 28, 2014 12:41 AM UTC

Authors: Jesús A. De Loera, Susan Margulies, Michael Pernpeintner, Eric Riedl, David Rolnick, Gwen Spencer, Despina Stasi, Jon Swenson
Download: PDF
Abstract: We revisit a well-known family of polynomial ideals encoding the problem of graph-$k$-colorability. Our paper describes how the inherent combinatorial structure of the ideals implies several interesting algebraic properties. Specifically, we provide lower bounds on the difficulty of computing Gr\"obner bases and Nullstellensatz certificates for the coloring ideals of general graphs. For chordal graphs, however, we explicitly describe a Gr\"obner basis for the coloring ideal, and provide a polynomial-time algorithm.

at October 28, 2014 12:41 AM UTC

Authors: Michael Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu
Download: PDF
Abstract: We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of \emph{constrained k-rank approximation} problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems.

For $k$-means dimensionality reduction, we provide the first $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate PCA, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace.

Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result with dimension independent of input size and sublinear in $k$. It opens the important question of whether it is possible to obtain a $(1+\epsilon)$ approximation using an $o(k)$-dimensional sketch.
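A minimal sketch of the dimension-reduction recipe in the last paragraph (my own illustration with off-the-shelf tools; the target dimension $O(\log k/\epsilon^2)$ and the $(9+\epsilon)$ guarantee are the paper's statement, not a property of this particular code):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection

# Synthetic data: n points in 500 dimensions.
rng = np.random.default_rng(0)
A = rng.normal(size=(10000, 500))

k, eps = 20, 0.5
target_dim = int(np.ceil(np.log(k) / eps ** 2))   # O(log k / eps^2) dimensions

# Johnson-Lindenstrauss projection of the data points.
proj = GaussianRandomProjection(n_components=target_dim, random_state=0)
A_small = proj.fit_transform(A)

# Cluster entirely in the low-dimensional sketch; the resulting assignment of
# the original points is what the paper's (9 + eps) bound is about.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(A_small)
print(labels[:10])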

at October 27, 2014 12:00 AM UTC

Authors: Tali Kaufman, David Kazhdan, Alexander Lubotzky
Download: PDF
Abstract: Expander graphs have been intensively studied in the last four decades. In recent years a high dimensional theory of expanders has emerged, and several variants have been studied. Among them stand out coboundary expansion and topological expansion. It is known that for every $d$ there are unbounded degree simplicial complexes of dimension $d$ with these properties. However, a major open problem, formulated by Gromov, is whether bounded degree high dimensional expanders exist for $d \geq 2$.

We present an explicit construction of bounded degree complexes of dimension $d=2$ which are topological expanders, thus answering Gromov's question in the affirmative. Conditional on a conjecture of Serre on the congruence subgroup property, an infinite sub-family of these also gives a family of bounded degree coboundary expanders.

The main technical tools are new isoperimetric inequalities for Ramanujan Complexes. We prove linear size bounds on $F_2$ systolic invariants of these complexes, which seem to be the first linear $F_2$ systolic bounds. The expansion results are deduced from these isoperimetric inequalities.

at October 28, 2014 12:41 AM UTC


How to beat the end of Moore’s Law

Elie Track and Tom Conte were co-chairs of the recently-held Third IEEE workshop on Rebooting Computing. Tom is a computer architect at Georgia Tech. A year ago he became the 2014 President-Elect of the IEEE Computer Society, which according to both the press release and Conte’s Wikipedia page entails that he “currently serves as the 2015 President.” I guess that is one way to stay focused on the future. Track is president of the IEEE Council on Superconductivity. A year ago he founded nVizix, a startup to develop photovoltaic power, and serves as CEO. He is also a senior partner at Hypres, a superconducting electronics company.

Today I wish to relate some of what happened last week at the meeting.

The meeting was held in Santa Cruz and had about forty participants. I was honored to be one of them. The problem we were given to consider is simple to state:

Computers are growing in importance every day, in all aspects of our lives. Yet the never-ending increase in their performance and the corresponding decrease in their price seem to be near the end.

Essentially this is saying that Moore’s Law, named for Gordon Moore, is about to end.

Well “about” is a bit tricky. Some would argue it has already ended: computers continue to have more transistors, but the clock rate of a processor has stayed much the same. Some argue that it will end in a few decades. Yet others claim that it could go on for a while longer. Lawrence Krauss and Glenn Starkman see the limit still six hundred—yes 600—years away. This is based on the physics of the Universe and the famous Bekenstein bound. I wonder if this limit has any reality when we consider smartphones, tablets, and laptops?

In 2005, Gordon Moore stated: “It can’t continue forever.” Herbert Stein’s Law says:

“If something cannot go on forever, it will stop.”

I love this law.

The Main Issues

The workshop was divided into four groups—see here for the program.

  • Security
  • Parallelism
  • Human Computer Interaction
  • Randomness and Approximation

Let me say something about each of these areas. Security is one of the most difficult areas in all of computing. The reason is that it is essentially a game between the builders of systems and the attackers. Under standard adversarial modeling, the attackers win provided there is one hole in the system. Those at the meeting discussed at length how this makes security difficult—impossible?—to achieve. The relationship to the end of Moore’s Law is weaker than in the other groups. Whether or not computers continue to gain in price-performance, security will continue to be a huge problem.

Parallelism is a long-studied problem. Making sequential programs into parallel ones is a famously difficult problem. It is gaining importance precisely because of the end of Moore’s Law. The failure to make faster single processors is giving us chips that have several and soon hundreds of processors. The central question is, how can we possibly use these many processors to speed up our algorithms? What I like about this problem is it is really a theory type question: Can we build parallel algorithms for tasks that seem inherently sequential?

Human Computer Interaction (HCI) is about helping humans, us, to use computers to do our work. Gregory Abowd of Georgia Tech gave a great talk on this issue. His idea is that future computing is all about three things:

  1. Clouds
  2. Crowds
  3. Shrouds

Of course “clouds” refers to the rise of cloud computing, which is already an important area. And “crowds” refers not only to organized crowdsourcing but also to rising use of social media to gather information, try experiments, make decisions, and more.

The last, “shrouds,” is a strange use of the word, I think. Gregory means by “shrouds” the computerized smart devices that we will soon be surrounded with every day. Now it may just be a smartphone, but soon it will include watches, glasses, implants, and who knows what else. Ken’s recent TEDxBuffalo talk covered the same topic, including evidence that human-computer partnerships made better chess moves than computers acting alone, at least until very recently.

The last area is randomness and approximation. This sounds near and dear to theory, and it is. Well, the workshop folks mean it in a slightly different manner. The rough idea is that if it is impossible to run a processor faster, then perhaps we can get around that by changing the way processors work. Today’s processors are made to compute exactly. Yes, they can make errors, but those are very rare. The new idea is to run the processors much faster, to the point where they will make random errors and also be unable to compute to high precision. The belief is that perhaps this could then be used to make them more useful.

The fundamental question is: If we have fast processors that are making lots of errors, can we use them to get better overall performance? The answer is unknown. But the mere possibility is intriguing. It makes a nice contrast:

Today we operate like this:

Design approximate algorithms that run on exact processors. Can we switch this around: Design algorithms that operate on approximate processors?

I note that in the narrow area of just communication the answer is yes: It is better to have a very fast channel that has a reasonable error rate, than a slower exact channel. This is the whole point of error correction. Is there some theory like this for computation?
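A quick back-of-the-envelope instance of the communication analogy (my numbers, not from the workshop): for a binary symmetric channel with crossover probability $p$, the capacity is $1 - H(p)$, where $H$ is the binary entropy function, so

$$2\,\bigl(1 - H(0.1)\bigr) \approx 2\,(1 - 0.469) \approx 1.06 > 1,$$

i.e., a channel that runs twice as fast but flips each bit with probability $0.1$ still delivers more reliable bits per unit time than a slow exact channel, once error-correcting codes are used.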

Open Problems

There are several theory-type issues here. What really are the ultimate limits of computation? Can we exploit more parallelism? This seems related to the class $\mathsf{NC}$, for example. And finally, can we exploit approximate computing to solve traditional problems?


by rjlipton at October 27, 2014 08:27 PM UTC

The 16th ACM Conference on Economics and Computation (EC 2015) will be held on June 15-19, as part of FCRC 2015, in Portland, Oregon. The Call for Papers is now out. As in previous years, to accommodate the publishing traditions of different fields (i.e., subsequent publications in Econ/other journals), authors of accepted papers are allowed to publish only a one page abstract in the proceedings.  Looking forward to your contribution and participation!


by michalfeldman at October 27, 2014 07:52 PM UTC

You might wonder why I am welcoming attorneys to the blog.  Well, I happen to know that some attorneys read my blog.  I know this because recently, while I was testifying in court as an expert witness, the attorney on the other side specifically asked me about my blog during cross-examination.  Based on their questions, I am reasonably sure they had downloaded all the posts.  I imagine some poor associate attorney low on the totem pole had to read through it all to see if they could find anything embarrassing they could use in the context of the case.

I don't think it was successful.  But who knows.  At one point, the attorney cross-examining me asked if one of the reasons I liked to do consulting work was because it pays well.  I answered, "I would say that's one of the reasons."  That wasn't a hard answer to give;  it's an honest answer.  But afterward I recalled that, years ago, I had talked about consulting on my blog.  I looked it up, and sure enough, here is what I wrote years ago:
I enjoy consulting, for several reasons:
  1. It fulfills a need to be more directly relevant to the real world.
  2. It gives me a chance to work with different sets of people.
  3. It allows me to learn new things and exercise skills that I don't necessarily use as a professor.
  4. It pays well.
Based on my experience as a witness, I am quite certain that the attorney had this post in front of him when he asked the question.  Of course, when he asked the question, he failed to note the other reasons I had given for consulting, or to provide me the blog post for context.  But my assumption is that they were simply looking for a "gotcha" moment.  Perhaps the question and my response made me look bad to the jury.  (After subsequent clarifying questions on the issue from the client's attorney, I would like to believe it had no effect.)  I imagine that they were hoping that I would say that the pay wasn't important, in which case I am sure they would have readily put up the post from my blog to show the jury to try to discredit me.

I talk very little about my legal consulting on the blog or publicly.  The main reason, obviously, is out of respect for and professional obligation to my clients.  But I find I don't blog about it even in very general terms, where confidentiality or other similar issues would not come into play.  And one good reason not to is that an attorney could use anything I write on the subject to try to impeach my credibility as a witness, even if it is out of context. 

Of course, as this example shows, anything I post online can potentially later be used against me in a case or in court.  Or in other situations.  These issues regarding our permanent digital identity, and how modern technology allows our past to easily be brought up and used against us, are not new;  I'm not suggesting I'm offering novel insight here.  Really, I'm just relaying an instance where the issue of my digital identity, in the context of this blog, hit me head on.  Certainly when I wrote that post years ago about consulting I did not think it would come up later in this way.  Mostly it makes me wonder how my children, who are busy exploring their own identities (digital and otherwise), will manage in a world where most of their history will be available online, for people to use or misuse in ways that I could not imagine.     

by Michael Mitzenmacher (noreply@blogger.com) at October 27, 2014 07:46 PM UTC

The Computing Community Consortium (CCC) invites proposals for visioning workshops that will catalyze and enable innovative research at the frontiers of computing.  Successful activities will articulate new research visions, galvanize community interest in those visions, mobilize support for those visions from the computing research community, government leaders, and funding agencies, and encourage broader segments of society to participate in computing research and education.  Past examples can be found at www.cra.org/ccc/visioning/visioning-activities.

Workshop organizers are expected to bring together a group of scientists and practitioners in the area of interest, and to formulate a program that encourages new ideas, innovative thinking, and broad discussion. Workshops can be of varying sizes, typically ranging from 20 to 100 participants.  It is important that the participants cover a broad spectrum to ensure full coverage of the area, both in terms of content area representation and employment (academia, industry, research labs, and policy and funding organizations).

Workshops are expected to have a tangible output – for example, a whitepaper (or set thereof) or a workshop report. Workshop outcomes should be targeted to multiple audiences (the research community, science policy groups or funding agencies, the general public), and the deliverables should be tailored for easy dissemination.  CCC will help to support both workshop organization and the subsequent generation and communication of the output.

The CCC encourages creative ideas from all segments of the computing research community on topics ranging from the formulation of new basic research areas and technologies to the use of new or existing research ideas and technologies to address important scientific or societal challenges.

For CCC planning purposes, proposals with start dates prior to September 2015 should be submitted by December 1, 2014.

by salilvadhan at October 27, 2014 02:35 AM UTC

So, writing in LaTeX is sometimes a low-level affair. One of the irritating things about LaTeX is its handling of resizable parentheses. The default way to do this is quite tedious:
$ \left( whatever \right) $
The true LaTeX believers think one should never use this directly – instead one should use the amsmath \bigl(, \Bigl(, \Biggl(, \bigr),... commands, and handle the resizing of parentheses explicitly, because it looks better (especially for summations with subscripts, where automatic resizing results in monstrously large parentheses).
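For example, compare the two styles on a subscripted sum (a small illustration of mine, not from the original post):

$$\left( \sum_{i=1}^{n} x_i \right)^2 \qquad\text{vs.}\qquad \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2$$

The \left/\right pair stretches the parentheses around the whole subscripted sum, while \Bigl/\Bigr keeps them at a fixed, more modest size.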

Nevertheless, if you are lazy like me, the natural thing is to write a little macro:
\newcommand{\pth}[1]{\left( #1 \right) }
Usage: $\pth{ whatever } $

Which works great, except that it doesn’t:
$$\text{With pth: }f\left( x \right)\text{ compared to regular }f(x).$$ This demonstrates the problem with the spacing generated – LaTeX adds a tiny space before a \left command.

There is a tedious way to solve this, but it involves adding negative space to the macro, and then being careful to remove this space when using \pth in a subscript (which is what I was doing for a long time). Luckily, there is a LaTeX package that solves this problem: mleftright. Doing:

\usepackage{mleftright}
...
\newcommand{\pth}[1]{\mleft( #1 \mright) }
...

This works correctly in all situations (at least so far).

by Sariel at October 27, 2014 02:22 AM UTC

Authors: Michael Axtmann, Timo Bingmann, Peter Sanders, Christian Schulz
Download: PDF
Abstract: To obtain sorting algorithms that scale to the largest available machines, conventional parallel sorting algorithms cannot be used, since they have either prohibitive communication volume or prohibitive critical path length for computation. We outline ideas for combining a number of basic algorithmic techniques to overcome these bottlenecks.

at October 27, 2014 12:40 AM UTC