Time: 2016-07-19 10:00-12:00

Venue: 8-206 Rohm Building, Tsinghua

Abstract:

Estimating the support size of a distribution is a classical problem in statistics, dating back to the early work of Fisher and of Good and Turing, and to the influential work of Efron and Thisted on "how many words did Shakespeare know." The problem has also been investigated by the CS theory community under the umbrella of property testing. In the sublinear regime, where the alphabet cardinality far exceeds the sample size, the challenge lies in extrapolating the number of unseen symbols from what has been observed so far.

We first consider estimating the support size of an arbitrary discrete distribution whose minimum non-zero mass is at least $\frac{1}{k}$. We show that, to achieve an additive error of $\epsilon k$ with probability at least, say, 0.9, the minimal number of independent samples is within universal constant factors of $\frac{k}{\log k} \log^2\frac{1}{\epsilon}$. The apparatus of best polynomial approximation, as well as its duality to the moment-matching problem, plays a key role both in the minimax lower bound and in the construction of optimal linear-time estimators based on Chebyshev polynomials.

We also discuss two closely related problems: (a) distinct elements, where the goal is to estimate the number of distinct colors in a ball-urn model from repeated draws; we characterize the optimal sample complexity by means of a linear program with logarithmically many variables. (b) Species extrapolation, where, given n samples, we want to estimate the number of hitherto unseen symbols that would be observed if m additional samples were collected; we show that the best range of extrapolation is $m=\Theta(n\log n)$ and the optimal accuracy is $n^{-\Theta(n/m)}$.
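As a concrete illustration of why extrapolation is needed in the sublinear regime (a minimal sketch for intuition, not the talk's estimator): the naive plug-in estimator, which simply counts the distinct symbols observed, badly undercounts the support once the sample size n is much smaller than k.

```python
import random

# Illustrative sketch (not the estimator from the talk): for a uniform
# distribution on k symbols (minimum non-zero mass exactly 1/k), the
# plug-in estimate, i.e. the number of distinct symbols observed,
# undercounts the support badly when n is sublinear in k.
k = 10_000
n = 2_500  # n << k: most symbols are never seen

random.seed(0)
samples = [random.randrange(k) for _ in range(n)]

plugin = len(set(samples))
# Expected value: k * (1 - (1 - 1/k)**n), roughly k * (1 - e**(-n/k)),
# about 2200 here, so the plug-in misses most of the support.
print(f"true support {k}, plug-in estimate {plugin}")
```

Closing this gap without simply taking many more samples is exactly what the polynomial-approximation-based estimators in the talk achieve.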
The unifying theme is a duality view: for various estimation problems on large alphabets, we obtain an effective minimax theorem in terms of a convex or linear program, where the primal solution corresponds to a rate-minimax estimator, the dual solution gives rise to a least favorable prior, and the value of the program yields a constant-factor approximation of the minimax risk.

(Based on joint work with Pengkun Yang, http://arxiv.org/abs/1504.01227, and with Alon Orlitsky and Ananda Suresh, http://arxiv.org/abs/1511.07428.)

Bio:

Yihong Wu is an assistant professor in the Department of Statistics at Yale University. He received the B.E. degree from Tsinghua University in 2006 and the M.A. and Ph.D. degrees from Princeton University in 2008 and 2011, respectively, all in electrical engineering. He was a postdoctoral fellow with the Statistics Department of the Wharton School, University of Pennsylvania, and an assistant professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign from 2013 to 2016. His research interests are in theoretical and algorithmic aspects of high-dimensional statistics and information theory.