Theory of Computing Blog Aggregator

Today, I’m told, is Rational Approximation Day. It’s 22/7 (for those who write dates in little-endian format), which differs from π by about 0.04 percent. (The big-endians among us are welcome to approximate 1/π.)

Given the present state of life in America, what we really need is an Approximation to Rationality Day, but that may have to wait for 20/1/21. In the meantime, let us merrily fiddle with numbers, searching for ratios of integers that brazenly invade the personal space of famous irrationals.

When I was a teenager, somebody told me about the number 355/113, which is an exceptionally good approximation to π. The exact value is


correct through the first six digits after the decimal point. In other words, it differs from the true value by less than one-millionth. I was intrigued, and so I set out to find an even better approximation. My search was necessarily a pencil-and-paper affair, since I had no access to any electronic or even mechanical aids to computation. The spiral-bound notebook in which I made my calculations has not survived, and I remember nothing about the outcome of the effort.

A dozen years later I acquired some computing machinery: a Hewlett-Packard programmable calculator, called the HP-41C. Here is the main loop of an HP-41C program that searches for good rational approximations. Note the date at the top of the printout (written in middle-endian format). Apparently I was finishing up this program just before Approximation Day in 1981.

HP 41C program main loop

What’s that you say? You’re not fluent in the 30-year-old Hewlett-Packard dialect of reverse Polish notation? All right, here’s a program that does roughly the same thing, written in an oh-so-modern language, Julia.

function approximate(T, dmax)
    d = 1
    leastError = T
    while d <= dmax && leastError > 0
        n = Int(round(d * T))
        err = abs(T - n/d) / T
        merit = 1 / ((n + d)^2 * err)
        if err < leastError
            println("$n/$d = $(n/d)  error = $err  merit = $merit")
            leastError = err
        d += 1

The algorithm is a naive, sequential search for fractions \(n/d\) that approximate the target number \(T\). For each value of \(d\), you need to consider only one value of \(n\), namely the integer nearest to \(d \times T\). (What happens if \(d \times T\) falls halfway between two integers? That can’t happen if \(T\) is irrational.) Thus you can begin with \(d = 1\) and continue up to a specified largest denominator \(d = dmax\). The accuracy of the approximation is measured by the error term \(|T - n/d| / T\). Whenever a value of \(n/d\) yields a new minimum error, the program prints a line of results. (This version of the algorithm works correctly only for \(T \gt 1\), but it can readily be adapted to \(T \lt 1\).

The HP-41C has a numerical precision of 10 decimal digits, and so the closest possible approximation to π is 3.141592654. Back in 1981 I ran the program until it found a fraction equal to this value—a perfect approximation, from the program’s point of view. According to a note on the printout, that took 13 hours. The Julia program above, running on a laptop, completes the same computation in about three milliseconds. You’re welcome to take a scroll through the results, below. (The numbers are not digit-for-digit identical to those generated by the HP-41C because Julia calculates with higher precision, about 16 decimal digits.)

     3/1     = 3.0                 error = 0.045070341573315915    merit =  1.3867212410256813
    13/4     = 3.25                error = 0.03450712996224109     merit =  0.10027514940370374
    16/5     = 3.2                 error = 0.018591635655129744    merit =  0.12196741256165179
    19/6     = 3.1666666666666665  error = 0.007981306117055373    merit =  0.20046844169789904
    22/7     = 3.142857142857143   error = 0.0004024993041452083   merit =  2.9541930379680195
   179/57    = 3.1403508771929824  error = 0.00039526983405584675  merit =  0.04542368072920613
   201/64    = 3.140625            error = 0.0003080138345651019   merit =  0.04623150469956595
   223/71    = 3.140845070422535   error = 0.00023796324342470652  merit =  0.04861781754719378
   245/78    = 3.141025641025641   error = 0.0001804858353094197   merit =  0.053107007660473673
   267/85    = 3.1411764705882352  error = 0.00013247529441315622  merit =  0.060922789404334425
   289/92    = 3.141304347826087   error = 9.177070539240495e-5    merit =  0.07506646742266793
   311/99    = 3.1414141414141414  error = 5.6822320879624425e-5   merit =  0.10469195703580983
   333/106   = 3.141509433962264   error = 2.6489760736525772e-5   merit =  0.19588127575835135
   355/113   = 3.1415929203539825  error = 8.478310581938076e-8    merit = 53.85164473263654
 52518/16717 = 3.1415923909792425  error = 8.37221074104896e-8     merit =  0.00249177288308447
 52873/16830 = 3.141592394533571   error = 8.259072954625822e-8    merit =  0.0024921016732136797
 53228/16943 = 3.1415923980404887  error = 8.147444291923546e-8    merit =  0.0024926612882136163
 53583/17056 = 3.141592401500938   error = 8.03729477091334e-8     merit =  0.0024934520351304946
 53938/17169 = 3.1415924049158366  error = 7.928595172899531e-8    merit =  0.0024944743578840687
 54293/17282 = 3.141592408286078   error = 7.821317056655376e-8    merit =  0.0024957288257085445
 54648/17395 = 3.141592411612532   error = 7.715432730151448e-8    merit =  0.002497216134767719
 55003/17508 = 3.1415924148960475  error = 7.610915194012454e-8    merit =  0.0024989371196291283
 55358/17621 = 3.1415924181374497  error = 7.507738155653036e-8    merit =  0.0025008927426067996
 55713/17734 = 3.1415924213375437  error = 7.405876001006156e-8    merit =  0.0025030840968725283
 56068/17847 = 3.1415924244971145  error = 7.305303737979925e-8    merit =  0.002505512419906649
 56423/17960 = 3.1415924276169265  error = 7.20599703886498e-8     merit =  0.002508179074048983
 56778/18073 = 3.141592430697726   error = 7.107932141383905e-8    merit =  0.0025110855755419263
 57133/18186 = 3.14159243374024    error = 7.01108591937022e-8     merit =  0.002514233565685482
 57488/18299 = 3.1415924367451775  error = 6.915435783817789e-8    merit =  0.0025176248413626597
 57843/18412 = 3.1415924397132304  error = 6.820959725288218e-8    merit =  0.0025212613363967255
 58198/18525 = 3.141592442645074   error = 6.727636243231866e-8    merit =  0.002525145143834103
 58553/18638 = 3.141592445541367   error = 6.635444374259433e-8    merit =  0.0025292785028112976
 58908/18751 = 3.141592448402752   error = 6.544363663870371e-8    merit =  0.0025336638062423296
 59263/18864 = 3.141592451229856   error = 6.454374152317083e-8    merit =  0.002538303603848205
 59618/18977 = 3.1415924540232916  error = 6.365456332197522e-8    merit =  0.002543200616913158
 59973/19090 = 3.1415924567836564  error = 6.277591190862598e-8    merit =  0.002548357720152209
 60328/19203 = 3.1415924595115348  error = 6.190760125601375e-8    merit =  0.0025537779743748956
 60683/19316 = 3.1415924622074964  error = 6.10494500018427e-8     merit =  0.0025594646031786867
 61038/19429 = 3.1415924648720983  error = 6.020128088319864e-8    merit =  0.002565421015548036
 61393/19542 = 3.141592467505885   error = 5.936292059519092e-8    merit =  0.0025716508123781218
 61748/19655 = 3.141592470109387   error = 5.853420007366852e-8    merit =  0.0025781577749599853
 62103/19768 = 3.1415924726831244  error = 5.771495407114599e-8    merit =  0.002584945883912429
 62458/19881 = 3.141592475227604   error = 5.690502101544554e-8    merit =  0.002592019327133724
 62813/19994 = 3.141592477743323   error = 5.6104242868339024e-8   merit =  0.0025993825084809985
 63168/20107 = 3.1415924802307655  error = 5.531246526690591e-8    merit =  0.0026070400439016164
 63523/20220 = 3.1415924826904056  error = 5.4529537523533324e-8   merit =  0.0026149967637792084
 63878/20333 = 3.141592485122707   error = 5.375531191912607e-8    merit =  0.002623257749852838
 64233/20446 = 3.141592487528123   error = 5.2989644268538606e-8   merit =  0.0026318283126966317
 64588/20559 = 3.141592489907097   error = 5.22323933551431e-8     merit =  0.0026407140236596287
 64943/20672 = 3.1415924922600618  error = 5.148342135490336e-8    merit =  0.002649920699086574
 65298/20785 = 3.1415924945874427  error = 5.0742592988226976e-8   merit =  0.002659454449139831
 65653/20898 = 3.1415924968896545  error = 5.0009776226755164e-8   merit =  0.0026693216486930156
 66008/21011 = 3.141592499167103   error = 4.928484186928889e-8    merit =  0.002679528965991537
 66363/21124 = 3.1415925014201855  error = 4.8567663400430846e-8   merit =  0.0026900833784673454
 66718/21237 = 3.1415925036492913  error = 4.7858116990585446e-8   merit =  0.0027009921818650063
 67073/21350 = 3.141592505854801   error = 4.715608149595883e-8    merit =  0.0027122629998437182
 67428/21463 = 3.1415925080370872  error = 4.6461438175842924e-8   merit =  0.002723903810648984
 67783/21576 = 3.1415925101965145  error = 4.577407111668933e-8    merit =  0.002735922933992634
 68138/21689 = 3.1415925123334407  error = 4.5093866383961494e-8   merit =  0.0027483290931549346
 68493/21802 = 3.1415925144482157  error = 4.442071258756658e-8    merit =  0.002761131395876878
 68848/21915 = 3.141592516541182   error = 4.375450074049751e-8    merit =  0.002774339356802981
 69203/22028 = 3.1415925186126747  error = 4.309512411747499e-8    merit =  0.0027879629217230834
 69558/22141 = 3.1415925206630235  error = 4.244247783087354e-8    merit =  0.002802012512429091
 69913/22254 = 3.14159252269255    error = 4.179645953751142e-8    merit =  0.0028164989998024
 70268/22367 = 3.1415925247015695  error = 4.115696873186072e-8    merit =  0.0028314337694556623
 70623/22480 = 3.1415925266903915  error = 4.0523907028763286e-8   merit =  0.002846828724926181
 70978/22593 = 3.141592528659319   error = 3.989717788071482e-8    merit =  0.00286269633032941
 71333/22706 = 3.1415925306086496  error = 3.9276686719222797e-8   merit =  0.0028790496258831624
 71688/22819 = 3.1415925325386738  error = 3.86623409548065e-8     merit =  0.0028959022542887716
 72043/22932 = 3.141592534449677   error = 3.805404969428105e-8    merit =  0.0029132685103826087
 72398/23045 = 3.1415925363419395  error = 3.7451723882115376e-8   merit =  0.0029311633622333107
 72753/23158 = 3.1415925382157353  error = 3.685527615907423e-8    merit =  0.002949602495467867
 73108/23271 = 3.1415925400713336  error = 3.626462086221821e-8    merit =  0.002968602349703417
 73463/23384 = 3.1415925419089974  error = 3.567967430761971e-8    merit =  0.002988180133716996
 73818/23497 = 3.141592543728987   error = 3.510035365949903e-8    merit =  0.003008353961046636
 74173/23610 = 3.1415925455315543  error = 3.452657862652023e-8    merit =  0.003029142753805288
 74528/23723 = 3.1415925473169497  error = 3.395826962413729e-8    merit =  0.0030505664465106676
 74883/23836 = 3.141592549085417   error = 3.339534904681598e-8    merit =  0.0030726459300795604
 75238/23949 = 3.1415925508371956  error = 3.283774056124397e-8    merit =  0.003095403169820992
 75593/24062 = 3.141592552572521   error = 3.228536938904675e-8    merit =  0.0031188612412389144
 75948/24175 = 3.1415925542916234  error = 3.173816202407169e-8    merit =  0.0031430444223940115
 76303/24288 = 3.14159255599473    error = 3.1196046373746034e-8   merit =  0.0031679782521683033
 76658/24401 = 3.141592557682062   error = 3.065895190043484e-8    merit =  0.0031936895918127546
 77013/24514 = 3.1415925593538385  error = 3.01268089146511e-8     merit =  0.0032202067806171002
 77368/24627 = 3.141592561010273   error = 2.9599549423203633e-8   merit =  0.003247559639023363
 77723/24740 = 3.1415925626515766  error = 2.9077106281049175e-8   merit =  0.0032757796556622983
 78078/24853 = 3.1415925642779543  error = 2.8559414180798277e-8   merit =  0.0033048999843237645
 78433/24966 = 3.14159256588961    error = 2.804640823913544e-8    merit =  0.003334955716987436
 78788/25079 = 3.1415925674867418  error = 2.753802541039899e-8    merit =  0.0033659838476231357
 79143/25192 = 3.141592569069546   error = 2.703420321435919e-8    merit =  0.003398023556100075
 79498/25305 = 3.1415925706382137  error = 2.6534880725724155e-8   merit =  0.0034311162371627422
 79853/25418 = 3.141592572192934   error = 2.6039997867349902e-8   merit =  0.0034653057466538235
 80208/25531 = 3.141592573733892   error = 2.554949569295635e-8    merit =  0.0035006385417218717
 80563/25644 = 3.14159257526127    error = 2.5063316245769302e-8   merit =  0.0035371638899188347
 80918/25757 = 3.1415925767752455  error = 2.4581402841236452e-8   merit =  0.0035749340371894456
 81273/25870 = 3.1415925782759953  error = 2.410369936023742e-8    merit =  0.003614004535709633
 81628/25983 = 3.1415925797636914  error = 2.3630150955873712e-8   merit =  0.00365443439340209
 81983/26096 = 3.141592581238504   error = 2.3160703488036753e-8   merit =  0.00369628643041249
 82338/26209 = 3.141592582700599   error = 2.2695304230197833e-8   merit =  0.003739627468693587
 82693/26322 = 3.1415925841501404  error = 2.2233900879902193e-8   merit =  0.0037845288174018898
 83048/26435 = 3.1415925855872895  error = 2.1776442124200985e-8   merit =  0.0038310665494126084
 83403/26548 = 3.1415925870122043  error = 2.1322877639651253e-8   merit =  0.00387932189896066
 83758/26661 = 3.1415925884250404  error = 2.087315795095796e-8    merit =  0.003929381726572982
 84113/26774 = 3.1415925898259505  error = 2.0427234430973973e-8   merit =  0.003981339007706688
 84468/26887 = 3.1415925912150855  error = 1.9985059017984126e-8   merit =  0.004035293430477111
 84823/27000 = 3.1415925925925925  error = 1.9546584922495102e-8   merit =  0.004091351857390988
 85178/27113 = 3.1415925939586176  error = 1.9111765637729565e-8   merit =  0.004149629190123568
 85533/27226 = 3.1415925953133033  error = 1.868055578777407e-8    merit =  0.004210248941258058
 85888/27339 = 3.141592596656791   error = 1.825291042078912e-8    merit =  0.004273344214343279
 86243/27452 = 3.1415925979892174  error = 1.78287858571571e-8     merit =  0.004339058439193095
 86598/27565 = 3.1415925993107203  error = 1.7408138417260385e-8   merit =  0.004407546707464268
 86953/27678 = 3.1415926006214323  error = 1.6990925835061217e-8   merit =  0.004478976601684539
 87308/27791 = 3.1415926019214853  error = 1.6577106127237806e-8   merit =  0.004553529781140699
 87663/27904 = 3.1415926032110093  error = 1.6166637875900305e-8   merit =  0.004631403402447433
 88018/28017 = 3.141592604490131   error = 1.5759480794022753e-8   merit =  0.0047128116308472546
 88373/28130 = 3.141592605758976   error = 1.5355594877295166e-8   merit =  0.004797987771392931
 88728/28243 = 3.1415926070176683  error = 1.4954940686839493e-8   merit =  0.0048871863549194705
 89083/28356 = 3.141592608266328   error = 1.4557479914641577e-8   merit =  0.004980685405908598
 89438/28469 = 3.141592609505076   error = 1.4163174252687263e-8   merit =  0.005078789613658918
 89793/28582 = 3.1415926107340284  error = 1.3771986523826276e-8   merit =  0.005181833172630217
 90148/28695 = 3.141592611953302   error = 1.338387969226633e-8    merit =  0.005290183824183623
 90503/28808 = 3.1415926131630103  error = 1.2998817570363058e-8   merit =  0.005404246870669908
 90858/28921 = 3.1415926143632653  error = 1.2616764535904027e-8   merit =  0.005524470210563737
 91213/29034 = 3.141592615554178   error = 1.2237685249392783e-8   merit =  0.005651350205744754
 91568/29147 = 3.1415926167358563  error = 1.1861545360838771e-8   merit =  0.005785438063205309
 91923/29260 = 3.1415926179084073  error = 1.1488310802967408e-8   merit =  0.005927347979056494
 92278/29373 = 3.141592619071937   error = 1.111794779122008e-8    merit =  0.006077766389438445
 92633/29486 = 3.141592620226548   error = 1.0750423671902066e-8   merit =  0.006237462409303776
 92988/29599 = 3.1415926213723435  error = 1.0385705649960649e-8   merit =  0.006407301439430316
 93343/29712 = 3.1415926225094237  error = 1.0023761778491034e-8   merit =  0.00658826005755035
 93698/29825 = 3.1415926236378877  error = 9.664560534662385e-9    merit =  0.006781444748602359
 94053/29938 = 3.1415926247578327  error = 9.308070961075804e-9    merit =  0.006988114128701429
 94408/30051 = 3.1415926258693556  error = 8.954262241690382e-9    merit =  0.007209706348604964
 94763/30164 = 3.14159262697255    error = 8.603104549971112e-9    merit =  0.007447871540046976
 95118/30277 = 3.14159262806751    error = 8.254567918024995e-9    merit =  0.007704513406469473
 95473/30390 = 3.141592629154327   error = 7.90862336746494e-9     merit =  0.007981838717667477
 95828/30503 = 3.1415926302330917  error = 7.565241919903853e-9    merit =  0.008282421184374838
 96183/30616 = 3.1415926313038933  error = 7.224395162386583e-9    merit =  0.008609280341750632
 96538/30729 = 3.1415926323668195  error = 6.8860552473899216e-9   merit =  0.008965982432171553
 96893/30842 = 3.1415926334219573  error = 6.550194468748648e-9    merit =  0.009356770586561815
 97248/30955 = 3.141592634469391   error = 6.216785968445456e-9    merit =  0.009786732283709331
 97603/31068 = 3.1415926355092054  error = 5.885802747105052e-9    merit =  0.010262022067809991
 97958/31181 = 3.1415926365414837  error = 5.557218370784088e-9    merit =  0.010790155391967196
 98313/31294 = 3.1415926375663066  error = 5.231007112329143e-9    merit =  0.011380406450991833
 98668/31407 = 3.1415926385837554  error = 4.907143103228812e-9    merit =  0.012044356667029002
 99023/31520 = 3.1415926395939087  error = 4.585601323119603e-9    merit =  0.01279665696194468
 99378/31633 = 3.141592640596845   error = 4.266356751638026e-9    merit =  0.013656119502875172
 99733/31746 = 3.1415926415926414  error = 3.9493849338525334e-9   merit =  0.014647305857352692
100088/31859 = 3.141592642581374   error = 3.6346615561895634e-9   merit =  0.015802906908552822
100443/31972 = 3.1415926435631176  error = 3.322162870507497e-9    merit =  0.017167407267272748
100798/32085 = 3.1415926445379463  error = 3.0118652700227016e-9   merit =  0.018802933529623964
101153/32198 = 3.141592645505932   error = 2.703745854741474e-9    merit =  0.020798958087527405
101508/32311 = 3.1415926464671475  error = 2.397781441954139e-9    merit =  0.02328921472604781
101863/32424 = 3.141592647421663   error = 2.0939496970989362e-9   merit =  0.026482916558483883
102218/32537 = 3.1415926483695484  error = 1.7922284269720909e-9   merit =  0.03072676661447583
102573/32650 = 3.141592649310873   error = 1.492595579727815e-9    merit =  0.03664010548445531
102928/32763 = 3.141592650245704   error = 1.195029527594277e-9    merit =  0.04544847105306477
103283/32876 = 3.1415926511741086  error = 8.995092082315892e-10   merit =  0.05996553050516452
103638/32989 = 3.1415926520961532  error = 6.060132765838922e-10   merit =  0.0883984797913258
103993/33102 = 3.1415926530119025  error = 3.1452123574324146e-10  merit =  0.16916355170353897
104348/33215 = 3.141592653921421   error = 2.5012447443706518e-11  merit =  2.1127131430431656

The error values in the middle column of the table above shrink steadily as you read from the top of the list to the bottom. Each successive approximation is more accurate than all those above it. Does that also mean each successive approximation is better than those above it? I would say no. Any reasonable notion of “better” in this context has to take into account the size of the numerator and the denominator.

If you want an approximation of \(\pi\) accurate to seven digits, I can give you one off the top of my head: \(3141593/1000000\). But the numbers making up that ratio are themselves seven digits long. What makes \(355/113\) impressive is that it achieves seven-digit accuracy with only three digits in the numerator and the denominator. Accordingly, I would argue that a “better” approximation is one that minimizes both error and size. The rightmost column of the table, filled with numbers labeled “merit” is meant to quantify this intuition.

When I wrote that program in 1981, I chose a strange formula for merit, one that now baffles me:

\[\frac{1}{(n + d)^2 * err}.\]

Adding the numerator and denominator and then squaring the sum is an operation that makes no sense, although the formula as a whole does have the correct qualitative behavior, favoring both smaller errors and smaller values of \(n\) and \(d\). In trying to reconstruct what I had in mind 26 years ago, my best guess is that I was trying to capture a geometric insight, and I flubbed it when translating math into code. On this assumption, the correct figure of merit would be:

\[\frac{1}{\sqrt{n^2 + d^2} * err}.\]

To see where this formula comes from, consider a two-dimensional lattice of integers, with a ray of slope \(\pi\) drawn from the origin and going on to infinite distance.

Lattice of integers

Because the line’s slope is irrational, it will never pass through any point of the integer lattice, but it will have many near misses. The near-miss points, with coordinates interpreted as numerator and denominator, are the accurate approximations to \(\pi\). The diagram suggests a measure of the merit based on distances. An approximation gets better when we minimize the distance of the lattice point from the origin as well as the vertical distance from the point to the \(\pi\) line. That’s the meaning of the formula with \(\sqrt{n^2 + d^2}\) in the denominator.

Another approach to defining merit simply counts digits. The merit is the ratio of the number of correctly predicted digits in the irrational target \(T\) to the number of digits in the denominator. A problem with this scheme is that it’s rather coarse. For example, \(13/4\) and \(16/5\) both have single-digit denominators and they each get one digit of \(\pi\) correct, but
\(16/5\) actually has a smaller error.

To smooth out the digit-counting criterion, and distinguish between values that differ in magnitude but have the same number of digits, we can take logarithms of the numbers. Let merit equal: \(-log(err) / log(d)\). (The \(log(err)\) term is negated because the error is always less than \(1\) and so its logarithm is negative.)

Here’s a comparison of the three merit criteria for some selected approximations to \(\pi\):

     n/d           1981 merit                  distance merit              log merit

     3/1        1.3867212448620723            7.016316181613145         --
    13/4        0.10027514901117529           2.1306165422053285        2.4284808488226544
    16/5        0.12196741168912356           3.208700907602539         2.4760467349663537
    19/6        0.20046843839209055           6.288264070960828         2.6960388788612515
    22/7        2.954192079226498           107.61458138965322          4.017563128080901
   179/57       0.04542369572848121          13.467303354323912         1.9381258641568968
   201/64       0.04623152429195394          15.390920494844842         1.9441196398907357
   223/71       0.04861784421796857          17.956388291625093         1.9573120958787444
   245/78       0.05310704607396699          21.548988850935377         1.9785253787278367
   267/85       0.06092284944437125          26.93965209372642          2.0098618723780515
   289/92       0.07506657421887829          35.92841360228601          2.055872071177696
   311/99       0.10469219759604646          53.921550739835986         2.1273838230139175
   333/106      0.1958822412726219          108.02438852795403          2.259868093766371
   355/113     53.76883630752973          31610.90993685001             3.444107245852723
 52163/16604    0.002495514149618044        215.57611105028013          1.6757260012234105
      •                  •                         •                            •
      •                  •                         •                            •
      •                  •                         •                            •
103993/33102    0.2892417579456485        49813.04849576935             2.1538978293241056
104348/33215    0.5006051667655171        86508.24042805366             2.2065386096084607
208341/66317    0.3403602724772912       117433.39822796892             2.1589243556399245
312689/99532    0.6343809166515098       328504.0552596196              2.207421489352196

All three measures agree that \(22/7\) and \(355/113\) are quite special. In other respects they give quite different views of the data. My weird 1981 formula compares \((n + d)^{-2}\) with \(err^{-1}\); the asymmetry in the exponents suggests the merit will tend to zero as \(n\) and \(d\) increase, at least in the average case. The maximum of the distance-based measure, on the other hand, appears to grow without bound. And the logarithmic merit function seems to be settling on a value near 2.0. This implies that we shouldn’t expect to see many \(n/d \) approximations where the number of correct digits is greater than twice the number of digits in \(d\). The late Tom Apostol and Mamikon A. Mnatsakanian proved a closely related proposition (“Surprisingly accurate rational approximations,” Mathematics Magazine, Vol. 75, No. 4 (Oct. 2002), pp. 307-310).

The final joke on my 1981 self is that all this searching for better approximants can be neatly sidestepped by a bit of algorithmic sophistication. The magic phrase is “continued fractions.” The continued fraction for \(\pi\) begins:

\[ \pi = 3+\cfrac{1}{7+\cfrac{1}{15+\cfrac{1}{292+\cfrac{1}{1 + \cdots}}}}\]

Evaluating the successive levels of this expression yields a sequence of “convergents” that should look familiar:

\[3/1, 22/7, 333/106, 355/113, 103993/33102, 104348/33215.\]

It is a series of “best” approximations to \(\pi\), generated without bothering with all the intervening non-”best” values. I produced this list in CoCalc (a.k.a. SageMathCloud), following the excellent tutorial in William Stein’s Elementary Number Theory. Even much larger approximants gush forth from the algorithm in milliseconds. Here’s the 100th element of the series:


A question remains: In what sense are these approximations “best”? It’s guaranteed that every element of the series is more accurate than all those that came before, but it’s not clear to me that they also satisfy any sort of compactness criterion. But that’s a question to be taken up another day. Perhaps on Continued Fraction Day.

by Brian Hayes at July 22, 2017 11:55 PM UTC

Itsykson and Sokolov in 2014 introduced the class of DPLL($\oplus$) algorithms that solve Boolean satisfiability problem using the splitting by linear combinations of variables modulo 2. This class extends the class of DPLL algorithms that split by variables. DPLL($\oplus$) algorithms solve in polynomial time systems of linear equations modulo two that are hard for DPLL, PPSZ and CDCL algorithms. Itsykson and Sokolov have proved first exponential lower bounds for DPLL($\oplus$) algorithms on unsatisfiable formulas. In this paper we consider a subclass of DPLL($\oplus$) algorithms that arbitrary choose a linear form for splitting and randomly (with equal probabilities) choose a value to investigate first; we call such algorithms drunken DPLL($\oplus$). We give a construction of a family of satisfiable CNF formulas $\Psi_n$ of size poly($n$) such that any drunken DPLL($\oplus$) algorithm with probability at least $1 - 2^{-\Omega(n)}$ runs at least $2^{\Omega(n)}$ steps on $\Psi_n$; thus we solve an open question stated in the paper of Itsykson and Sokolov. This lower bound extends the result of Alekhnovich, Hirsch and Itsykson (2005) from drunken DPLL to drunken DPLL($\oplus$).

at July 22, 2017 09:08 PM UTC

It's summer.  And I'm now on sabbatical.  So perhaps I shouldn't care about strange Harvard politics goings-on, but I can't help it.

Here's the tl;dr version, which is already too long.  (What's the recursive form of tl;dr?)  Sometime back, the powers-that-be at Harvard decided that they didn't like the Harvard final clubs (Harvard's kind-of-like fraternities, "social clubs" that have been around for ages, but that are not in any official way affiliated with Harvard).  There's plenty of reason not to like them, but at least initially concerns about sexual assault seemed to be the motivating factor.  So the powers-that-be decided that if you belonged to some private single-sex organization, they would not let you be captain of a sports team, or be approved by Harvard for a Rhodes fellowship, or things like that.  A number of faculty -- perhaps most notably, Harry Lewis -- objected to this policy, on multiple grounds.  (Perhaps one large one is that there are many private single-sex organizations that are quite positive, and it seems odd to put all these organizations under the same blanket policy.)  So after it was clear that there was some significant faculty objections, for a bit it was temporarily shelved, and a new committee put in place to make recommendations.

Several weeks deep in the summer, the report comes out, suggesting policies even harsher and more draconian than the original plan.  And the reasons for this outcome, from the latest-breaking reporting, seem at least somewhat confused.

There's now two issues, seemingly only tangentially related, but that have come together here.  The first relates to the suggested policies themselves.  But the second relates to how Harvard is governed -- can these types of disciplinary regulations for the students come into being without a vote of the full faculty.  The first, rightly, gets more attention, but as a member of the faculty, the second is of significant importance to me, and relates to a historical trend of taking power (or at least trying to take power) away from the faculty that seems consistent since I've arrived at Harvard.

I could write pages and pages about this, but fortunately for me (and you), the Crimson and Harry Lewis already have.  So for those who care enough to find out more, here are links.  The most recent Crimson piece, Seven Votes, suggests there's more behind the latest report that is worth knowing about, and probably is as good a starting point as any.  All of Harry's posts are interesting, but perhaps the best are where Harry relays his remarks from faculty meetings -- they're great to read, but I admit, it's much more fun to watch him deliver them in person.

Crimson articles (that last week or so)

Key Events in Harvard's Social Groups Sanctions Policy
Faculty Committee Recommends Social Groups be 'Phased Out'
Social Group Ban Revives Faculty Concerns Over Governance
Seven Votes    **** [most recent, and controversial]

Harry's blog (chronological order, back to about November).

My remarks to the faculty in support of the nondiscrimination motion
Why History Matters
My remarks to the FAS meeting of December 6th, 2016
Further Q&A on the motion for the December 6th meeting
The State of the Debate
Withdrawing the motion
Guest post on nondiscrimination
Where the rubber meets the road
How it would probably work in practice
Professor Haig's motion against compelled oaths
More about the implementation
I know it's a dumb question but
This week's development in USGSO policy
Harvard's nondiscrimination policy
The new policy about social clubs
Further comments on the social club policy

by Michael Mitzenmacher ( at July 22, 2017 06:52 PM UTC

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 1010^125, because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.

So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.

But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 1069 bits per square meter.  (Actually it’s 1069 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 1069 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~1069 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking of the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 1050000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.

Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  This is why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, the new bits have nowhere to go except into pushing up the energy.  Now, there are often ways to push up the energy other than by raising the temperature—the concept of specific heat, in chemistry, is precisely about this—but if you need to stuff more bits into a substance, at the cost of raising its energy, certainly one of the obvious ways to do it is to describe a greater range of possible speeds for the water molecules.  So since that can happen, by equidistribution it typically does happen, which means that the molecules move faster on average, and your finger feels the water get hotter.

In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.

Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

by Scott at July 21, 2017 01:30 AM UTC

Authors: Ruben Becker, Vincenzo Bonifaci, Andreas Karrenbauer, Pavel Kolev, Kurt Mehlhorn
Download: PDF
Abstract: In this paper, we present two results on slime mold computations. The first one treats a biologically-grounded model, originally proposed by biologists analyzing the behavior of the slime mold Physarum polycephalum. This primitive organism was empirically shown by Nakagaki et al. to solve shortest path problems in wet-lab experiments (Nature'00). We show that the proposed simple mathematical model actually generalizes to a much wider class of problems, namely undirected linear programs with a non-negative cost vector.

For our second result, we consider the discretization of a biologically-inspired model. This model is a directed variant of the biologically-grounded one and was never claimed to describe the behavior of a biological system. Straszak and Vishnoi showed that it can $\epsilon$-approximately solve flow problems (SODA'16) and even general linear programs with positive cost vector (ITCS'16) within a finite number of steps. We give a refined convergence analysis that improves the dependence on $\epsilon$ from polynomial to logarithmic and simultaneously allows to choose a step size that is independent of $\epsilon$. Furthermore, we show that the dynamics can be initialized with a more general set of (infeasible) starting points.

at July 21, 2017 12:00 AM UTC

Authors: Eldar Fischer, Frédéric Magniez, Tatiana Starikovskaya
Download: PDF
Abstract: In this paper we consider the problem of deciding membership in Dyck languages, a fundamental family of context-free languages, comprised of well-balanced strings of parentheses. In this problem we are given a string of length $n$ in the alphabet of parentheses of $m$ types and must decide if it is well-balanced. We consider this problem in the property testing setting, where one would like to make the decision while querying as few characters of the input as possible.

Property testing of strings for Dyck language membership for $m=1$, with a number of queries independent of the input size $n$, was provided in [Alon, Krivelevich, Newman and Szegedy, SICOMP 2001]. Property testing of strings for Dyck language membership for $m \ge 2$ was first investigated in [Parnas, Ron and Rubinfeld, RSA 2003]. They showed an upper bound and a lower bound for distinguishing strings belonging to the language from strings that are far (in terms of the Hamming distance) from the language, which are respectively (up to polylogarithmic factors) the $2/3$ power and the $1/11$ power of the input size $n$.

Here we improve the power of $n$ in both bounds. For the upper bound, we introduce a recursion technique, that together with a refinement of the methods in the original work provides a test for any power of $n$ larger than $2/5$. For the lower bound, we introduce a new problem called Truestring Equivalence, which is easily reducible to the $2$-type Dyck language property testing problem. For this new problem, we show a lower bound of $n$ to the power of $1/5$.

at July 21, 2017 12:00 AM UTC

Authors: Bin Sheng, Ruijuan Li, Gregory Gutin
Download: PDF
Abstract: The famous Chinese Postman Problem (CPP) is polynomial time solvable on both undirected and directed graphs. Gutin et al. [Discrete Applied Math 217 (2016)] generalized these results by proving that CPP on $c$-edge-colored graphs is polynomial time solvable for every $c\geq 2$. In CPP on weighted edge-colored graphs $G$, we wish to find a minimum weight properly colored closed walk containing all edges of $G$ (a walk is properly colored if every two consecutive edges are of different color, including the last and first edges in a closed walk). In this paper, we consider CPP on arc-colored digraphs (for properly colored closed directed walks), and provide a polynomial-time algorithm for the problem on weighted 2-arc-colored digraphs. This is a somewhat surprising result since it is NP-complete to decide whether a 2-arc-colored digraph has a properly colored directed cycle [Gutin et al., Discrete Math 191 (1998)]. To obtain the polynomial-time algorithm, we characterize 2-arc-colored digraphs containing properly colored Euler trails.

at July 21, 2017 12:00 AM UTC

Authors: Rajesh Chitnis, Andreas Emil Feldmann, Pasin Manurangsi
Download: PDF
Abstract: (Shortened abstract below, see paper for full abstract)

Given an edge-weighted directed graph $G=(V,E)$ and a set of $k$ terminal pairs $\{(s_i, t_i)\}_{i=1}^{k}$, the objective of the \pname{Directed Steiner Network (DSN)} problem is to compute the cheapest subgraph $N$ of $G$ such that there is an $s_i\to t_i$ path in $N$ for each $i\in [k]$. This problem is notoriously hard: there is no polytime $O(2^{\log^{1-\eps}n})$-approximation~[Dodis \& Khanna, \emph{STOC}~'99], and it is W[1]-hard~[Guo~et~al., \emph{SIDMA}~'11] for the well-studied parameter $k$. One option to circumvent these hardness results is to obtain a \emph{parameterized approximation}. Our first result is the following:

- Under Gap-ETH there is no $k^{o(1)}$-approximation for \pname{DSN} in $f(k)\cdot n^{O(1)}$ time for any function~$f$.

Given our results above, the question we explore in this paper is: can we obtain faster algorithms or better approximation ratios for other special cases of \pname{DSN} when parameterizing by $k$?

A well-studied special case of \pname{DSN} is the \pname{Strongly Connected Steiner Subgraph (SCSS)} problem, where the goal is to find the cheapest subgraph of $G$, which pairwise connects all $k$ given terminals. We show the following

- Under Gap-ETH no $(2-\eps)$-approximation can be computed in $f(k)\cdot n^{O(1)}$ time, for any function~$f$. This shows that the $2$-approximation in $2^k\polyn$ time observed in~[Chitnis~et~al., \emph{IPEC}~2013] is optimal.

Next we consider \dsn and \scss when the input graph is \emph{bidirected}, i.e., for every edge $uv$ the reverse edge exists and has the same weight. We call the respective problems \bidsn and \biscss and show:

- \biscss is NP-hard, but there is a \smash{$2^{O(2^{k^2-k})}\polyn$} time algorithm.

- In contrast, \bidsn is W[1]-hard parameterized by $k$.

at July 21, 2017 12:00 AM UTC

Authors: Ankush Agarwalla, John Augustine, William K. Moses Jr., Madhav Sankar K., Arvind Krishna Sridhar
Download: PDF
Abstract: In this work, we study the problem of dispersion of mobile robots on dynamic rings. The problem of dispersion of $n$ robots on an $n$ node graph, introduced by Augustine and Moses Jr.~\cite{AM17}, requires robots to coordinate with each other and reach a configuration where exactly one robot is present on each node. This problem has real world applications and applies whenever we want to minimize the total cost of $n$ agents sharing $n$ resources, located at various places, subject to the constraint that the cost of an agent moving to a different resource is comparatively much smaller than the cost of multiple agents sharing a resource (e.g. smart electric cars sharing recharge stations). The study of this problem also provides indirect benefits to the study of exploration by mobile robots and the study of load balancing on graphs.

We solve the problem of dispersion in the presence of two types of dynamism in the underlying graph: (i) vertex permutation and (ii) 1-interval connectivity. We introduce the notion of vertex permutation dynamism and have it mean that for a given set of nodes, in every round, the adversary ensures a ring structure is maintained, but the connections between the nodes may change. We use the idea of 1-interval connectivity from Di Luna et al.~\cite{LDFS16}, where for a given ring, in each round, the adversary chooses at most one edge to remove.

We assume robots have full visibility and present asymptotically time optimal algorithms to achieve dispersion in the presence of both types of dynamism when robots have chirality. When robots do not have chirality, we present asymptotically time optimal algorithms to achieve dispersion subject to certain constraints. Finally, we provide impossibility results for dispersion when robots have no visibility.

at July 21, 2017 12:00 AM UTC

Authors: Gonzalo Navarro
Download: PDF
Abstract: We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies of a string of size $n$, and $s$ single-character or block edits are applied on ranges of copies. We introduce the first document listing index with size $\tilde{O}(n+s)$, precisely $O((n\log\sigma+s\log^2 N)\log D)$ bits, and with useful worst-case time guarantees: Given a pattern of length $m$, the index reports the $\ndoc$ strings where it appears in time $O(m^2 + m\log^{1+\epsilon} N \cdot \ndoc)$, for any constant $\epsilon>0$. Our technique is to augment a range data structure that is commonly used on grammar-based indexes, so that instead of retrieving all the pattern occurrences, it computes useful summaries on them. We show that the idea has independent interest: we introduce the first grammar-based index that, on a text $T[1,N]$ with a grammar of size $r$, uses $O(r\log N)$ bits and counts the number of occurrences of a pattern $P[1,m]$ in time $O(m^2 + m\log^{2+\epsilon} r)$, for any constant $\epsilon>0$.

at July 21, 2017 12:00 AM UTC

Authors: Nikhil Srivastava, Luca Trevisan
Download: PDF
Abstract: We prove the following Alon-Boppana type theorem for general (not necessarily regular) weighted graphs: if $G$ is an $n$-node weighted undirected graph of average combinatorial degree $d$ (that is, $G$ has $dn/2$ edges) and girth $g> 2d^{1/8}+1$, and if $\lambda_1 \leq \lambda_2 \leq \cdots \lambda_n$ are the eigenvalues of the (non-normalized) Laplacian of $G$, then \[ \frac {\lambda_n}{\lambda_2} \geq 1 + \frac 4{\sqrt d} - O \left( \frac 1{d^{\frac 58} }\right) \] (The Alon-Boppana theorem implies that if $G$ is unweighted and $d$-regular, then $\frac {\lambda_n}{\lambda_2} \geq 1 + \frac 4{\sqrt d} - O\left( \frac 1 d \right)$ if the diameter is at least $d^{1.5}$.)

Our result implies a lower bound for spectral sparsifiers. A graph $H$ is a spectral $\epsilon$-sparsifier of a graph $G$ if \[ L(G) \preceq L(H) \preceq (1+\epsilon) L(G) \] where $L(G)$ is the Laplacian matrix of $G$ and $L(H)$ is the Laplacian matrix of $H$. Batson, Spielman and Srivastava proved that for every $G$ there is an $\epsilon$-sparsifier $H$ of average degree $d$ where $\epsilon \approx \frac {4\sqrt 2}{\sqrt d}$ and the edges of $H$ are a (weighted) subset of the edges of $G$. Batson, Spielman and Srivastava also show that the bound on $\epsilon$ cannot be reduced below $\approx \frac 2{\sqrt d}$ when $G$ is a clique; our Alon-Boppana-type result implies that $\epsilon$ cannot be reduced below $\approx \frac 4{\sqrt d}$ when $G$ comes from a family of expanders of super-constant degree and super-constant girth.

The method of Batson, Spielman and Srivastava proves a more general result, about sparsifying sums of rank-one matrices, and their method applies to an "online" setting. We show that for the online matrix setting the $4\sqrt 2 / \sqrt d$ bound is tight, up to lower order terms.

at July 21, 2017 12:00 AM UTC

Authors: Erik D. Demaine, Quanquan C. Liu
Download: PDF
Abstract: Pebble games are single-player games on DAGs involving placing and moving pebbles on nodes of the graph according to a certain set of rules. The goal is to pebble a set of target nodes using a minimum number of pebbles. In this paper, we present a possibly simpler proof of the result in [CLNV15] and strengthen the result to show that it is PSPACE-hard to determine the minimum number of pebbles to an additive $n^{1/3-\epsilon}$ term for all $\epsilon > 0$, which improves upon the currently known additive constant hardness of approximation [CLNV15] in the standard pebble game. We also introduce a family of explicit, constant indegree graphs with $n$ nodes where there exists a graph in the family such that using constant $k$ pebbles requires $\Omega(n^k)$ moves to pebble in both the standard and black-white pebble games. This independently answers an open question summarized in [Nor15] of whether a family of DAGs exists that meets the upper bound of $O(n^k)$ moves using constant $k$ pebbles with a different construction than that presented in [AdRNV17].

at July 21, 2017 12:40 AM UTC

Authors: Jacob Holm, Eva Rotenberg, Mikkel Thorup
Download: PDF
Abstract: We present a deterministic fully-dynamic data structure for maintaining information about the bridges in a graph. We support updates in $\tilde{O}((\log n)^2)$ amortized time, and can find a bridge in the component of any given vertex, or a bridge separating any two given vertices, in $O(\log n / \log \log n)$ worst case time. Our bounds match the current best for bounds for deterministic fully-dynamic connectivity up to $\log\log n$ factors. The previous best dynamic bridge finding was an $\tilde{O}((\log n)^3)$ amortized time algorithm by Thorup [STOC2000], which was a bittrick-based improvement on the $O((\log n)^4)$ amortized time algorithm by Holm et al.[STOC98, JACM2001].

Our approach is based on a different and purely combinatorial improvement of the algorithm of Holm et al., which by itself gives a new combinatorial $\tilde{O}((\log n)^3)$ amortized time algorithm. Combining it with Thorup's bittrick, we get down to the claimed $\tilde{O}((\log n)^2)$ amortized time.

We also offer improvements in space. We describe a general trick which applies to both of our new algorithms, and to the old ones, to get down to linear space, where the previous best use $O(m + n\log n\log\log n)$. Finally, we show how to obtain $O(\log n/\log \log n)$ query time, matching the optimal trade-off between update and query time.

Our result yields an improved running time for deciding whether a unique perfect matching exists in a static graph.

at July 21, 2017 12:00 AM UTC

Authors: Andreas Schmid, Jens M. Schmidt
Download: PDF
Abstract: Tutte paths are one of the most successful tools for attacking Hamiltonicity problems in planar graphs. Unfortunately, results based on them are non-constructive, as their proofs inherently use an induction on overlapping subgraphs and these overlaps hinder to bound the running time to a polynomial. For special cases however, computational results of Tutte paths are known: For 4-connected planar graphs, Tutte paths are in fact Hamiltonian paths and Chiba and Nishizeki showed how to compute such paths in linear time. For 3-connected planar graphs, Tutte paths have a more complicated structure, and it has only recently been shown that they can be computed in polynomial time. However, Tutte paths are defined for general 2-connected planar graphs and this is what most applications need. Unfortunately, no computational results are known. We give the first efficient algorithm that computes a Tutte path (for the general case of 2-connected planar graphs). One of the strongest existence results about such Tutte paths is due to Sanders, which allows to prescribe the end vertices and an intermediate edge of the desired path. Encompassing and strengthening all previous computational results on Tutte paths, we show how to compute this special Tutte path efficiently. Our method refines both, the results of Thomassen and Sanders, and avoids overlapping subgraphs by using a novel iterative decomposition along 2-separators. Finally, we show that our algorithm runs in quadratic time.

at July 21, 2017 12:00 AM UTC

Here is a game (Darling says I only blog about non-fun games. This post will NOT prove her wrong.)

Let D be a domain,  d ≥  1 and 0 ≠ a0  ∈ D. There are two players Wanda (for Wants root) and Nora (for No root). One of the players is Player I, the other Player II.

(1) Player I and II alternate (with Player I going first) choosing the coefficients in D of a polynomial of degree d with the constant term preset to a0.

(2) When they are done, if there is a root in D  then Wanda wins, else Nora wins.

There is a paper by Gasarch-Washington-Zbarsky here where we determine who wins the game when D is Z,Q (these proofs are elementary), any finite extension of Q (this proof uses hard number theory), R, C (actually any algebraic closed field), and any finite field.

How did I think of this game?  There was a paper called Greedy Galois Games (which I blogged about here). When I saw the title I thought the game might be that players pick coefficients from Q and if the final polynomial has a solution in radicals then (say) Player I wins. That was not correct. They only use that Galois was a bad duelist. Even so, the paper INSPIRED me! Hence the paper above! The motivating problem is still open:

Open Question: Let d be at least 5. Play the above game except that (1) the coefficients are out of Q, and (2) Wanda wins if the final poly is solvable by radicals, otherwise Nora wins. (Note that if d=1,2,3,4 then Wanda wins.) Who wins?

If they had named their game Hamilton Game (since Alexander Hamilton lost a duel) I might have been inspired to come up with a game about quaternions or Hamiltonian cycles.

POINT- take ideas for problems from any source, even an incorrect guess about a paper!

by GASARCH ( at July 20, 2017 09:47 PM UTC

Authors: Martin Nägele, Benny Sudakov, Rico Zenklusen
Download: PDF
Abstract: Submodular function minimization (SFM) is a fundamental and efficiently solvable problem class in combinatorial optimization with a multitude of applications in various fields. Surprisingly, there is only very little known about constraint types under which SFM remains efficiently solvable. The arguably most relevant non-trivial constraint class for which polynomial SFM algorithms are known are parity constraints, i.e., optimizing only over sets of odd (or even) cardinality. Parity constraints capture classical combinatorial optimization problems like the odd-cut problem, and they are a key tool in a recent technique to efficiently solve integer programs with a constraint matrix whose subdeterminants are bounded by two in absolute value.

We show that efficient SFM is possible even for a significantly larger class than parity constraints, by introducing a new approach that combines techniques from Combinatorial Optimization, Combinatorics, and Number Theory. In particular, we can show that efficient SFM is possible over all sets (of any given lattice) of cardinality r mod m, as long as m is a constant prime power. This covers generalizations of the odd-cut problem with open complexity status, and with relevance in the context of integer programming with higher subdeterminants. To obtain our results, we establish a connection between the correctness of a natural algorithm, and the inexistence of set systems with specific combinatorial properties. We introduce a general technique to disprove the existence of such set systems, which allows for obtaining extensions of our results beyond the above-mentioned setting. These extensions settle two open questions raised by Geelen and Kapadia [Combinatorica, 2017] in the context of computing the girth and cogirth of certain types of binary matroids.

at July 20, 2017 12:00 AM UTC

Authors: Hiroshi Saito, Hirotada Honda
Download: PDF
Abstract: We geometrically analyze the problem of estimating parameters related to the shape and size of a two-dimensional target object on the plane by using randomly distributed distance sensors whose locations are unknown. Based on the analysis using geometric probability, we discuss the observability of these parameters: which parameters we can estimate and what conditions are required to estimate them. For a convex target object, its size and perimeter length are observable, and other parameters are not observable. For a general polygon target object, convexity in addition to its size and perimeter length is observable. Parameters related to a concave vertex can be observable when some conditions are satisfied. We also propose a method for estimating the convexity of a target object and the perimeter length of the target object.

at July 20, 2017 12:00 AM UTC

Authors: Stefan Felsner, Tamás Mészáros, Piotr Micek
Download: PDF
Abstract: The dimension of a partially ordered set $P$ is an extensively studied parameter. Small dimension allows succinct encoding. Indeed if $P$ has dimension $d$, then to know whether $x < y$ in $P$ it is enough to check whether $x<y$ in each of the $d$ linear extensions of a witnessing realizer. Focusing on the encoding aspect Ne\v{s}et\v{r}il and Pudl\'{a}k defined the boolean dimension so that $P$ has boolean dimension at most $d$ if it is possible to decide whether $x < y$ in $P$ by looking at the relative position of $x$ and $y$ in only $d$ permutations of the elements of $P$.

Our main result is that posets with cover graphs of bounded tree-width have bounded boolean dimension. This stays in contrast with the fact that there are posets with cover graphs of tree-width three and arbitrarily large dimension.

As a corollary, we obtain a labeling scheme of size $\log(n)$ for reachability queries on $n$-vertex digraphs with bounded tree-width. Our techniques seem to be very different from the usual approach for labeling schemes which is based on divide-and-conquer decompositions of the input graphs.

at July 20, 2017 12:00 AM UTC

Authors: Anton V. Eremeev, Alexander V. Kelmanov, Artem V. Pyatkin, Igor A. Ziegler
Download: PDF
Abstract: In this paper, we consider the problem of finding a maximum cardinality subset of vectors, given a constraint on the normalized squared length of vectors sum. This problem is closely related to Problem 1 from (Eremeev, Kel'manov, Pyatkin, 2016). The main difference consists in swapping the constraint with the optimization criterion.

We prove that the problem is NP-hard even in terms of finding a feasible solution. An exact algorithm for solving this problem is proposed. The algorithm has a pseudo-polynomial time complexity in the special case of the problem, where the dimension of the space is bounded from above by a constant and the input data are integer. A computational experiment is carried out, where the proposed algorithm is compared to COINBONMIN solver, applied to a mixed integer quadratic programming formulation of the problem. The results of the experiment indicate superiority of the proposed algorithm when the dimension of Euclidean space is low, while the COINBONMIN has an advantage for larger dimensions.

at July 20, 2017 12:00 AM UTC

Authors: Aaron Bernstein, Jacob Holm, Eva Rotenberg
Download: PDF
Abstract: In the online bipartite matching problem with replacements, all the vertices on one side of the bipartition are given, and the vertices on the other side arrive one by one with all their incident edges. The goal is to maintain a maximum matching while minimizing the number of changes (replacements) to the matching. We show that the greedy algorithm that always takes the shortest augmenting path from the newly inserted vertex (denoted the SAP protocol) uses at most amortized $O(\log^2 n)$ replacements per insertion, where $n$ is the total number of vertices inserted. This is the first analysis to achieve a polylogarithmic number of replacements for \emph{any} replacement strategy, almost matching the $\Omega(\log n)$ lower bound. The previous best known strategy achieved amortized $O(\sqrt{n})$ replacements [Bosek, Leniowski, Sankowski, Zych, FOCS 2014]. For the SAP protocol in particular, nothing better than then trivial $O(n)$ bound was known except in special cases.

Our analysis immediately implies the same upper bound of $O(\log^2 n)$ reassignments for the capacitated assignment problem, where each vertex on the static side of the bipartition is initialized with the capacity to serve a number of vertices.

We also analyze the problem of minimizing the maximum server load. We show that if the final graph has maximum server load $L$, then the SAP protocol makes amortized $O( \min\{L \log^2 n , \sqrt{n}\log n\})$ reassignments. We also show that this is close to tight because $\Omega(\min\{L, \sqrt{n}\})$ reassignments can be necessary.

at July 20, 2017 12:00 AM UTC

Authors: Paweł Gawrychowski, Jakub Łopuszański
Download: PDF
Abstract: A labeling scheme for nearest common ancestors assigns a distinct binary string, called the label, to every node of a tree, so that given the labels of two nodes (and no further information about the topology of the tree) we can compute the label of their nearest common ancestor. The goal is to make the labels as short as possible. Alstrup, Gavoille, Kaplan, and Rauhe [Theor. Comput. Syst. 37(3):441-456 2004] showed that $O(\log n)$-bit labels are enough. More recently, Alstrup, Halvorsen, and Larsen [SODA 2014] refined this to only $2.772\log n$, and provided a lower bound of $1.008\log n$.

We connect designing a labeling scheme for nearest common ancestors to the existence of a tree, called a minor-universal tree, that contains every tree on $n$ nodes as a topological minor. Even though it is not clear if a labeling scheme must be based on such a notion, we argue that the existing schemes can be reformulated as such, and it allows us to obtain clean and good bounds on the length of the labels. As the main upper bound, we show that $2.318\log n$-bit labels are enough. Surprisingly, the notion of a minor-universal tree for binary trees on $n$ nodes has been already used in a different context by Hrubes et al. [CCC 2010], and Young, Chu, and Wong [J. ACM 46(3):416-435, 1999] introduced a closely related notion of a universal tree. On the lower bound side, we show that the size of any minor-universal tree for trees on $n$ nodes is $\Omega(n^{2.174})$. This highlights a natural limitation for all approaches based on such trees. Our lower bound technique also implies that the size of any universal tree in the sense of Young et al. is $\Omega(n^{2.185})$, thus dramatically improves their lower bound of $\Omega(n\log n)$. We complement the existential results with a generic transformation that decreases the query time to constant in any scheme based on a minor-universal tree.

at July 20, 2017 12:00 AM UTC

Authors: Pranav Kant Gaur, S. K. Bose
Download: PDF
Abstract: In this article, recent works on 2D Constrained Delaunay triangulation(CDT) algorithms have been reported. Since the review of CDT algorithms presented by de Floriani(Issues on Machine Vision, Springer Vienna, pg. 95--104, 1989), different algorithms for construction and applications of CDT have appeared in literature each concerned with different aspects of computation and different suitabilities. Therefore, objective of this work is to report an update over that review article considering contemporary prominent algorithms and generalizations for the problem of two dimensional Constrained Delaunay triangulation.

at July 20, 2017 12:00 AM UTC

Authors: Martin Grohe, Nicole Schweikardt
Download: PDF
Abstract: We study an extension of first-order logic that allows to express cardinality conditions in a similar way as SQL's COUNT operator. The corresponding logic FOC(P) was introduced by Kuske and Schweikardt (LICS'17), who showed that query evaluation for this logic is fixed-parameter tractable on classes of structures (or databases) of bounded degree. In the present paper, we first show that the fixed-parameter tractability of FOC(P) cannot even be generalised to very simple classes of structures of unbounded degree such as unranked trees or strings with a linear order relation.

Then we identify a fragment FOC1(P) of FOC(P) which is still sufficiently strong to express standard applications of SQL's COUNT operator. Our main result shows that query evaluation for FOC1(P) is fixed-parameter tractable with almost linear running time on nowhere dense classes of structures. As a corollary, we also obtain a fixed-parameter tractable algorithm for counting the number of tuples satisfying a query over nowhere dense classes of structures.

at July 20, 2017 12:00 AM UTC

Authors: Jun Zhang, Qi Cheng
Download: PDF
Abstract: In their celebrated paper "On Siegel's Lemma", Bombieri and Vaaler found an upper bound on the height of integer solutions of systems of linear Diophantine equations. Calculating the bound directly, however, requires exponential time. In this paper, we present the bound in a different form that can be computed in polynomial time. We also give an elementary (and arguably simpler) proof for the bound.

at July 20, 2017 02:01 AM UTC

Authors: Michael Mitzenmacher, Tom Morgan
Download: PDF
Abstract: We explore a generalization of set reconciliation, where the goal is to reconcile sets of sets. Alice and Bob each have a parent set consisting of $s$ child sets, each containing at most $h$ elements from a universe of size $u$. They want to reconcile their sets of sets in a scenario where the total number of differences between all of their child sets (under the minimum difference matching between their child sets) is $d$. We give several algorithms for this problem, and discuss applications to reconciliation problems on graphs, databases, and collections of documents. We specifically focus on graph reconciliation, providing protocols based on sets of sets reconciliation for random graphs from $G(n,p)$ and for forests of rooted trees.

at July 20, 2017 12:00 AM UTC

Authors: Vassil Dimitrov, Diego Coelho
Download: PDF
Abstract: This paper proposes new factorizations for computing the Neumann series. The factorizations are based on fast algorithms for small prime sizes series and the splitting of large sizes into several smaller ones. We propose a different basis for factorizations other than the well-known binary and ternary basis. We show that is possible to reduce the overall complexity for the usual binary decomposition from 2log2(N)-2 multiplications to around 1.72log2(N)-2 using a basis of size five. Merging different basis we can demonstrate that we can build fast algorithms for particular sizes. We also show the asymptotic case where one can reduce the number of multiplications to around 1.70log2(N)-2. Simulations are performed for applications in the context of wireless communications and image rendering, where is necessary perform large sized matrices inversion.

at July 20, 2017 12:00 AM UTC

Authors: Dhiraj Holden
Download: PDF
Abstract: We show the first unconditional pseudo-determinism result for all of search-BPP. Specifically, we show that every BPP search problem can be computed pseudo-deterministically on average for infinitely many input lengths. In other words, for infinitely many input lengths and for any polynomial-time samplable distribution our algorithm succeeds in producing a unique answer (if one exists) with high probability over the distribution and the coins tossed.

at July 20, 2017 12:00 AM UTC

I recently added to the incidence theory book two chapters about incidences in (Chapters 8 and 9). At the moment incidence bounds in are known for several different cases, and it is rather unclear how a general incidence bound in should look like. In this post I would like to state what I believe this […]

by Adam Sheffer at July 19, 2017 08:19 PM UTC

A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge’s and Ben Recht’s blog posts), the critical open problem is one of efficiency — is GD able to move past saddle points quickly, or can it be slowed down significantly? How does the rate of escape scale with the ambient dimensionality? In this post, we describe our recent work with Rong Ge, Praneeth Netrapalli and Sham Kakade, that provides the first provable positive answer to the efficiency question, showing that, rather surprisingly, GD augmented with suitable perturbations escapes saddle points efficiently; indeed, in terms of rate and dimension dependence it is almost as if the saddle points aren’t there!

Perturbing Gradient Descent

We are in the realm of classical gradient descent (GD) — given a function $f:\mathbb{R}^d \to \mathbb{R}$ we aim to minimize the function by moving in the direction of the negative gradient:

where $x_t$ are the iterates and $\eta$ is the step size. GD is well understood theorietically in the case of convex optimization, but the general case of nonconvex optimization has been far less studied. We know that GD converges quickly to the neighborhood of stationary points (points where $\nabla f(x) = 0$) in the nonconvex setting, but these stationary points may be local minima or, unhelpfully, local maxima or saddle points.

Clearly GD will never move away from a stationary point if started there (even a local maximum); thus, to provide general guarantees, it is necessary to modify GD slightly to incorporate some degree of randomness. Two simple methods have been studied in the literature:

  1. Intermittent Perturbations: Ge, Huang, Jin and Yuan 2015 considered adding occasional random perturbations to GD, and were able to provide the first polynomial time guarantee for GD to escape saddle points. (See also Rong Ge’s post )

  2. Random Initialization: Lee et al. 2016 showed that with only random initialization, GD provably avoids saddle points asymptotically (i.e., as the number of steps goes to infinity). (see also Ben Recht’s post)

Asymptotic — and even polynomial time —results are important for the general theory, but they stop short of explaining the success of gradient-based algorithms in practical nonconvex problems. And they fail to provide reassurance that runs of GD can be trusted — that we won’t find ourselves in a situation in which the learning curve flattens out for an indefinite amount of time, with the user having no way of knowing that the asymptotics have not yet kicked in. Lastly, they fail to provide reassurance that GD has the kind of favorable properties in high dimensions that it is known to have for convex problems.

One reasonable approach to this issue is to consider second-order (Hessian-based) algorithms. Although these algorithms are generally (far) more expensive per iteration than GD, and can be more complicated to implement, they do provide the kind of geometric information around saddle points that allows for efficient escape. Accordingly, a reasonable understanding of Hessian-based algorithms has emerged in the literature, and positive efficiency results have been obtained.

Is GD also efficient? Or is the Hessian necessary for fast escape of saddle points?

A negative result emerges to this first question if one considers the random initialization strategy discussed. Indeed, this approach is provably inefficient in general, taking exponential time to escape saddle points in the worst case (see “On the Necessity of Adding Perturbations” section).

Somewhat surprisingly, it turns out that we obtain a rather different — and positive — result if we consider the perturbation strategy. To be able to state this result, let us be clear on the algorithm that we analyze:

Perturbed gradient descent (PGD)

  1. for $~t = 1, 2, \ldots ~$ do
  2. $\quad\quad x_{t} \leftarrow x_{t-1} - \eta \nabla f (x_{t-1})$
  3. $\quad\quad$ if $~$perturbation condition holds$~$ then
  4. $\quad\quad\quad\quad x_t \leftarrow x_t + \xi_t$

Here the perturbation $\xi_t$ is sampled uniformly from a ball centered at zero with a suitably small radius, and is added to the iterate when the gradient is suitably small. These particular choices are made for analytic convenience; we do not believe that uniform noise is necessary. nor do we believe it essential that noise be added only when the gradient is small.

Strict-Saddle and Second-order Stationary Points

We define saddle points in this post to include both classical saddle points as well as local maxima. They are stationary points which are locally maximized along at least one direction. Saddle points and local minima can be categorized according to the minimum eigenvalue of Hessian:

We further call the saddle points in the last category, where $\lambda_{\min}(\nabla^2 f(x)) < 0$, strict saddle points.

Strict and Non-strict Saddle Point

While non-strict saddle points can be flat in the valley, strict saddle points require that there is at least one direction along which the curvature is strictly negative. The presence of such a direction gives a gradient-based algorithm the possibility of escaping the saddle point. In general, distinguishing local minima and non-strict saddle points is NP-hard; therefore, we — and previous authors — focus on escaping strict saddle points.

Formally, we make the following two standard assumptions regarding smoothness.

Assumption 1: $f$ is $\ell$-gradient-Lipschitz, i.e.
$\quad\quad\quad\quad \forall x_1, x_2, |\nabla f(x_1) - \nabla f(x_2)| \le \ell |x_1 - x_2|$.
Assumption 2: $f$ is $\rho$-Hessian-Lipschitz, i.e.
$\quad\quad\quad\quad \forall x_1, x_2$, $|\nabla^2 f(x_1) - \nabla^2 f(x_2)| \le \rho |x_1 - x_2|$.

Similarly to classical theory, which studies convergence to a first-order stationary point, $\nabla f(x) = 0$, by bounding the number of iterations to find a $\epsilon$-first-order stationary point, $|\nabla f(x)| \le \epsilon$, we formulate the speed of escape of strict saddle points and the ensuing convergence to a second-order stationary point, $\nabla f(x) = 0, \lambda_{\min}(\nabla^2 f(x)) \ge 0$, with an $\epsilon$-version of the definition:

Definition: A point $x$ is an $\epsilon$-second-order stationary point if:
$\quad\quad\quad\quad |f(x)|\le \epsilon$, and $\lambda_{\min}(\nabla^2 f(x)) \ge -\sqrt{\rho \epsilon}$.

In this definition, $\rho$ is the Hessian Lipschitz constant introduced above. This scaling follows the convention of Nesterov and Polyak 2006.


In a wide range of practical nonconvex problems it has been proved that all saddle points are strict — such problems include, but not are limited to, principal components analysis, canonical correlation analysis, orthogonal tensor decomposition, phase retrieval, dictionary learning, matrix sensing, matrix completion, and other nonconvex low-rank problems.

Furthermore, in all of these nonconvex problems, it also turns out that all local minima are global minima. Thus, in these cases, any general efficient algorithm for finding $\epsilon$-second-order stationary points immediately becomes an efficient algorithm for solving those nonconvex problem with global guarantees.

Escaping Saddle Point with Negligible Overhead

In the classical case of first-order stationary points, GD is known to have very favorable theoretical properties:

Theorem (Nesterov 1998): If Assumption 1 holds, then GD, with $\eta = 1/\ell$, finds an $\epsilon$-first-order stationary point in $2\ell (f(x_0) - f^\star)/\epsilon^2$ iterations.

In this theorem, $x_0$ is the initial point and $f^\star$ is the function value of the global minimum. The theorem says for that any gradient-Lipschitz function, a stationary point can be found by GD in $O(1/\epsilon^2)$ steps, with no explicit dependence on $d$. This is called “dimension-free optimization” in the literature; of course the cost of a gradient computation is $O(d)$, and thus the overall runtime of GD scales as $O(d)$. The linear scaling in $d$ is especially important for modern high-dimensional nonconvex problems such as deep learning.

We now wish to address the corresponding problem for second-order stationary points. What is the best we can hope for? Can we also achieve

  1. A dimension-free number of iterations;
  2. An $O(1/\epsilon^2)$ convergence rate;
  3. The same dependence on $\ell$ and $(f(x_0) - f^\star)$ as in (Nesterov 1998)?

Rather surprisingly, the answer is Yes to all three questions (up to small log factors).

Main Theorem: If Assumptions 1 and 2 hold, then PGD, with $\eta = O(1/\ell)$, finds an $\epsilon$-second-order stationary point in $\tilde{O}(\ell (f(x_0) - f^\star)/\epsilon^2)$ iterations with high probability.

Here $\tilde{O}(\cdot)$ hides only logarithmic factors; indeed, the dimension dependence in our result is only $\log^4(d)$. The theorem thus asserts that a perturbed form of GD, under an additional Hessian-Lipschitz condition, converges to a second-order-stationary point in almost the same time required for GD to converge to a first-order-stationary point. In this sense, we claim that PGD can escape strict saddle points almost for free.

We turn to a discussion of some of the intuitions underlying these results.

Why do polylog(d) iterations suffice?

Our strict-saddle assumption means that there is only, in the worst case, one direction in $d$ dimensions along which we can escape. A naive search for the descent direction intuitively should take at least $\text{poly}(d)$ iterations, so why should only $\text{polylog}(d)$ suffice?

Consider a simple case in which we assume that the function is quadratic in the neighborhood of the saddle point. That is, let the objective function be $f(x) = x^\top H x$, a saddle point at zero, with constant Hessian $H = \text{diag}(-1, 1, \cdots, 1)$. In this case, only the first direction is an escape direction (with negative eigenvalue $-1$).

It is straightforward to work out the general form of the iterates in this case:

Assume that we start at the saddle point at zero, then add a perturbation so that $x_0$ is sampled uniformly from a ball $\mathcal{B}_0(1)$ centered at zero with radius one. The decrease in the function value can be expressed as:

Set the step size to be $1/2$, let $\lambda_i$ denote the $i$-th eigenvalue of the Hessian $H$ and let $\alpha_i = e_i^\top x_0$ denote the component in the $i$th direction of the initial point $x_0$. We have $\sum_{i=1}^d \alpha_i^2 = | x_0|^2 = 1$, thus:

A simple probability argument shows that sampling uniformly in $\mathcal{B}_0(1)$ will result in at least a $\Omega(1/d)$ component in the first direction with high probability. That is, $\alpha^2_1 = \Omega(1/d)$. Substituting $\alpha_1$ in the above equation, we see that it takes at most $O(\log d)$ steps for the function value to decrease by a constant amount.

Pancake-shape stuck region for general Hessian

We can conclude that for the case of a constant Hessian, only when the perturbation $x_0$ lands in the set $\{x | ~ |e_1^\top x|^2 \le O(1/d)\}$ $\cap \mathcal{B}_0 (1)$, can we take a very long time to escape the saddle point. We call this set the stuck region; in this case it is a flat disk. In general, when the Hessian is no longer constant, the stuck region becomes a non-flat pancake, depicted as a green object in the left graph. In general this region will not have an analytic expression.

Earlier attempts to analyze the dynamics around saddle points tried to the approximate stuck region by a flat set. This results in a requirement of an extremely small step size and a correspondingly very large runtime complexity. Our sharp rate depends on a key observation — although we don’t know the shape of the stuck region, we know it is very thin.


In order to characterize the “thinness” of this pancake, we studied pairs of hypothetical perturbation points $w, u$ separated by $O(1/\sqrt{d})$ along an escaping direction. We claim that if we run GD starting at $w$ and $u$, at least one of the resulting trajectories will escape the saddle point very quickly. This implies that the thickness of the stuck region can be at most $O(1/\sqrt{d})$, so a random perturbation has very little chance to land in the stuck region.

On the Necessity of Adding Perturbations

We have discussed two possible ways to modify the standard gradient descent algorithm, the first by adding intermittent perturbations, and the second by relying on random initialization. Although the latter exhibits asymptotic convergence, it does not yield efficient convergence in general; in recent joint work with Simon Du, Jason Lee, Barnabas Poczos, and Aarti Singh, we have shown that even with fairly natural random initialization schemes and non-pathological functions, GD with only random initialization can be significantly slowed by saddle points, taking exponential time to escape. The behavior of PGD is strikingingly different — it can generically escape saddle points in polynomial time.

To establish this result, we considered random initializations from a very general class including Gaussians and uniform distributions over the hypercube, and we constructed a smooth objective function that satisfies both Assumptions 1 and 2. This function is constructed such that, even with random initialization, with high probability both GD and PGD have to travel sequentially in the vicinity of $d$ strict saddle points before reaching a local minimum. All strict saddle points have only one direction of escape. (See the left graph for the case of $d=2$).


When GD travels in the vicinity of a sequence of saddle points, it can get closer and closer to the later saddle points, and thereby take longer and longer to escape. Indeed, the time to escape the $i$th saddle point scales as $e^{i}$. On the other hand, PGD is always able to escape any saddle point in a small number of steps independent of the history. This phenomenon is confirmed by our experiments; see, for example, an experiment with $d=10$ in the right graph.


In this post, we have shown that a perturbed form of gradient descent can converge to a second-order-stationary point at almost the same rate as standard gradient descent converges to a first-order-stationary point. This implies that Hessian information is not necessary for to escape saddle points efficiently, and helps to explain why basic gradient-based algorithms such as GD (and SGD) work surprisingly well in the nonconvex setting. This new line of sharp convergence results can be directly applied to nonconvex problem such as matrix sensing/completion to establish efficient global convergence rates.

There are of course still many open problems in general nonconvex optimization. To name a few: will adding momentum improve the convergence rate to a second-order stationary point? What type of local minima are tractable and are there useful structural assumptions that we can impose on local minima so as to avoid local minima efficiently? We are making slow but steady progress on nonconvex optimization, and there is the hope that at some point we will transition from “black art” to “science”.

at July 19, 2017 10:00 AM UTC

The UCSD machine learning and theory groups are looking for strong post-doctoral applicants. Interest in broadly-defined data science is preferred, but all excellent candidates will be considered.

by Moses Charikar at July 19, 2017 06:05 AM UTC

Nikhil Devanur, one of the PC Chairs for WINE 2017, presents the following announcement. Please consider sending a paper to WINE this year!

Reminder: The WINE 2017 deadline is in roughly 2 weeks, on August 2. The conference will be held during December 17-20, 2017, at the Indian Institute of Science, Bangalore, India.


The CFP is here.

Submission server is here:


Some points to raise awareness.

  1. A selection of papers accepted to the conference is invited to GEB/TEAC.
  2. WINE has had the best paper award for a couple of years now. In some years the committee has chosen not to give the award to any paper, but the option has been there.
  3. WINE is seen as being narrower than EC, and some subset of the EC community doesn’t typically submit to WINE. We would like to broaden the scope of WINE to include all those topics covered in EC. In particular, we welcome and encourage papers from the AI, OR, and Econ communities.
  4. We also welcome experimental papers, and in particular papers that have both a theoretical and an experimental part are strongly encouraged.

We are trying different things to make WINE be a top tier conference, but it all starts with you submitting good papers.


by robertkleinberg at July 19, 2017 04:18 AM UTC

Authors: Dimitris Fotakis, Vasilis Kontonis, Piotr Krysta, Paul Spirakis
Download: PDF
Abstract: We introduce the problem of simultaneously learning all powers of a Poisson Binomial Distribution (PBD). A PBD of order $n$ is the distribution of a sum of $n$ mutually independent Bernoulli random variables $X_i$, where $\mathbb{E}[X_i] = p_i$. The $k$'th power of this distribution, for $k$ in a range $[m]$, is the distribution of $P_k = \sum_{i=1}^n X_i^{(k)}$, where each Bernoulli random variable $X_i^{(k)}$ has $\mathbb{E}[X_i^{(k)}] = (p_i)^k$. The learning algorithm can query any power $P_k$ several times and succeeds in learning all powers in the range, if with probability at least $1- \delta$: given any $k \in [m]$, it returns a probability distribution $Q_k$ with total variation distance from $P_k$ at most $\epsilon$. We provide almost matching lower and upper bounds on query complexity for this problem. We first show a lower bound on the query complexity on PBD powers instances with many distinct parameters $p_i$ which are separated, and we almost match this lower bound by examining the query complexity of simultaneously learning all the powers of a special class of PBD's resembling the PBD's of our lower bound. We study the fundamental setting of a Binomial distribution, and provide an optimal algorithm which uses $O(1/\epsilon^2)$ samples. Diakonikolas, Kane and Stewart [COLT'16] showed a lower bound of $\Omega(2^{1/\epsilon})$ samples to learn the $p_i$'s within error $\epsilon$. The question whether sampling from powers of PBDs can reduce this sampling complexity, has a negative answer since we show that the exponential number of samples is inevitable. Having sampling access to the powers of a PBD we then give a nearly optimal algorithm that learns its $p_i$'s. To prove our two last lower bounds we extend the classical minimax risk definition from statistics to estimating functions of sequences of distributions.

at July 19, 2017 12:57 AM UTC

Authors: John Augustine, William K. Moses Jr
Download: PDF
Abstract: We introduce a new problem in the domain of mobile robots, which we term dispersion. In this problem, $n$ robots are placed in an $n$ node graph arbitrarily and must coordinate with each other to reach a final configuration such that exactly one robot is at each node. We study this problem through the lenses of minimizing the memory required by each robot and of minimizing the number of rounds required to achieve dispersion.

Dispersion is of interest due to its relationship to the problem of exploration using mobile robots and the problem of load balancing on a graph. Additionally, dispersion has an immediate real world application due to its relationship to the problem of recharging electric cars, as each car can be considered a robot and recharging stations and the roads connecting them nodes and edges of a graph respectively. Since recharging is a costly affair relative to traveling, we want to distribute these cars amongst the various available recharge points where communication should be limited to car-to-car interactions.

We provide lower bounds on both the memory required for robots to achieve dispersion and the minimum running time to achieve dispersion on any type of graph. We then analyze the trade-offs between time and memory for various types of graphs. We provide time optimal and memory optimal algorithms for several types of graphs and show the power of a little memory in terms of running time.

at July 19, 2017 12:52 AM UTC

Authors: Rodrigo Canovas, Bastien Cazaux, Eric Rivals
Download: PDF
Abstract: For analysing text algorithms, for computing superstrings, or for testing random number generators, one needs to compute all overlaps between any pairs of words in a given set. The positions of overlaps of a word onto itself, or of two words, are needed to compute the absence probability of a word in a random text, or the numbers of common words shared by two random texts. In all these contexts, one needs to compute or to query overlaps between pairs of words in a given set. For this sake, we designed COvI, a compressed overlap index that supports multiple queries on overlaps: like computing the correlation of two words, or listing pairs of words whose longest overlap is maximal among all possible pairs. COvI stores overlaps in a hierarchical and non-redundant manner. We propose an implementation that can handle datasets of millions of words and still answer queries efficiently. Comparison with a baseline solution - called FullAC - relying on the Aho-Corasick automaton shows that COvI provides significant advantages. For similar construction times, COvI requires half the memory FullAC, and still solves complex queries much faster.

at July 19, 2017 12:50 AM UTC

Authors: Nikhil Bansal, Martin Böhm, Marek Eliáš, Grigorios Koumoutsos, Seeun William Umboh
Download: PDF
Abstract: In the Convex Body Chasing problem, we are given an initial point $v_0$ in $R^d$ and an online sequence of $n$ convex bodies $F_1, ..., F_n$. When we receive $F_i$, we are required to move inside $F_i$. Our goal is to minimize the total distance travelled. This fundamental online problem was first studied by Friedman and Linial (DCG 1993). They proved an $\Omega(\sqrt{d})$ lower bound on the competitive ratio, and conjectured that a competitive ratio depending only on d is possible. However, despite much interest in the problem, the conjecture remains wide open.

We consider the setting in which the convex bodies are nested: $F_1 \supset ... \supset F_n$. The nested setting is closely related to extending the online LP framework of Buchbinder and Naor (ESA 2005) to arbitrary linear constraints. Moreover, this setting retains much of the difficulty of the general setting and captures an essential obstacle in resolving Friedman and Linial's conjecture. In this work, we give the first $f(d)$-competitive algorithm for chasing nested convex bodies in $R^d$.

at July 19, 2017 12:50 AM UTC

Authors: Maryam Aliakbarpour, Ilias Diakonikolas, Ronitt Rubinfeld
Download: PDF
Abstract: We investigate the problems of identity and closeness testing over a discrete population from random samples. Our goal is to develop efficient testers while guaranteeing Differential Privacy to the individuals of the population. We describe an approach that yields sample-efficient differentially private testers for these problems. Our theoretical results show that there exist private identity and closeness testers that are nearly as sample-efficient as their non-private counterparts. We perform an experimental evaluation of our algorithms on synthetic data. Our experiments illustrate that our private testers achieve small type I and type II errors with sample size sublinear in the domain size of the underlying distributions.

at July 19, 2017 12:46 AM UTC

Authors: Andrzej Grzesik, Tereza Klimošová, Marcin Pilipczuk, Michał Pilipczuk
Download: PDF
Abstract: In the classic Maximum Weight Independent Set problem we are given a graph $G$ with a nonnegative weight function on vertices, and the goal is to find an independent set in $G$ of maximum possible weight. While the problem is NP-hard in general, we give a polynomial-time algorithm working on any $P_6$-free graph, that is, a graph that has no path on $6$ vertices as an induced subgraph. This improves the polynomial-time algorithm on $P_5$-free graphs of Lokshtanov et al. (SODA 2014), and the quasipolynomial-time algorithm on $P_6$-free graphs of Lokshtanov et al (SODA 2016). The main technical contribution leading to our main result is enumeration of a polynomial-size family $\mathcal{F}$ of vertex subsets with the following property: for every maximal independent set $I$ in the graph, $\mathcal{F}$ contains all maximal cliques of some minimal chordal completion of $G$ that does not add any edge incident to a vertex of $I$.

at July 19, 2017 12:48 AM UTC

Authors: Qianggong Zhang, Tat-Jun Chin
Download: PDF
Abstract: Multiple-view triangulation by $\ell_{\infty}$ minimisation has become established in computer vision. State-of-the-art $\ell_{\infty}$ triangulation algorithms exploit the quasiconvexity of the cost function to derive iterative update rules that deliver the global minimum. Such algorithms, however, can be computationally costly for large problem instances that contain many image measurements, e.g., from web-based photo sharing sites or long-term video recordings. In this paper, we prove that $\ell_{\infty}$ triangulation admits a coreset approximation scheme, which seeks small representative subsets of the input data called coresets. A coreset possesses the special property that the error of the $\ell_{\infty}$ solution on the coreset is within known bounds from the global minimum. We establish the necessary mathematical underpinnings of the coreset algorithm, specifically, by enacting the stopping criterion of the algorithm and proving that the resulting coreset gives the desired approximation accuracy. On large-scale triangulation problems, our method provides theoretically sound approximate solutions. Iterated until convergence, our coreset algorithm is also guaranteed to reach the true optimum. On practical datasets, we show that our technique can in fact attain the global minimiser much faster than current methods

at July 19, 2017 12:00 AM UTC

Authors: Adil Erzin
Download: PDF
Abstract: In this paper, we propose several regular covers with identical sectors that observe the strip in the same direction, and we carried out their analysis, which allows choosing the best coverage model and the best parameters of the sector.

at July 19, 2017 12:00 AM UTC

Authors: Oswin Aichholzer, Thomas Hackl, Matias Korman, Marc van Kreveld, Maarten Löffler, Alexander Pilz, Bettina Speckmann, Emo Welzl
Download: PDF
Abstract: We consider the following question: How many edge-disjoint plane spanning trees are contained in a complete geometric graph $GK_n$ on any set $S$ of $n$ points in general position in the plane? We show that this number is in $\Omega(\sqrt{n})$. Further, we consider variants of this problem by bounding the diameter and the degree of the trees (in particular considering spanning paths).

at July 19, 2017 01:02 AM UTC