(Tentative Schedule)

Time & Location: All talks are on Thursdays in Dinwiddie 102 at 3:30 pm unless otherwise noted. Refreshments in Gibson 426 after the talk.

Comments indicating vacations, special lectures, or change in location or time are in

Organizer: Gustavo Didier

**Abstract**:

In modern applications of high-throughput sequencing technologies, researchers may be interested in quantifying the molecular diversity of a sample (e.g., T-cell repertoire, transcriptional diversity, or microbial species diversity). An important detail of these sampling-based technologies is often overlooked in the analysis of the data and the design of the experiments: the sampled observations often do not give a fully representative picture of the underlying population. This has long been a recognized problem in statistical ecology and in the broader statistics literature, and is commonly known as the missing species problem.

In classical settings, the size of the sample is usually small. New technologies such as high-throughput sequencing have allowed for the sampling of extremely large and heterogeneous populations at scales not previously attainable or even considered. New algorithms are required that take advantage of the scale of the data to account for heterogeneity, but are also sufficiently fast and scale well with the size of the data. I will discuss a moment-based approach for estimating the missing species based on an extension of Chao's moment-based lower bound (Chao, 1984). We apply results from the classical moment problem to show that solutions can be obtained efficiently, allowing for estimators that are simultaneously conservative and use more information. By connecting the rich theory of the classical moment problem to the missing species problem we can also clear up issues in the identifiability of the missing species.
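For readers unfamiliar with the starting point of this talk, the following is a minimal sketch of the classical lower bound of Chao (1984) as it is usually stated: the number of unseen species is at least $f_1^2 / (2 f_2)$, where $f_1$ and $f_2$ are the numbers of species observed exactly once and exactly twice. It is not the extension discussed in the abstract. The function name `chao1984_lower_bound` and the input format (a list of per-species counts) are assumptions made for illustration.

```python
from collections import Counter


def chao1984_lower_bound(abundances):
    """Classical Chao (1984) lower bound on species richness.

    `abundances` is a hypothetical input format: one observed count per
    species. Returns (lower bound on unseen species, lower bound on total).
    """
    observed = [a for a in abundances if a > 0]
    freq = Counter(observed)   # freq[k] = number of species seen exactly k times
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    if f2 == 0:
        # The basic form of the bound is undefined without doubletons.
        raise ValueError("bound requires at least one species seen exactly twice")
    unseen = f1 * f1 / (2 * f2)
    return unseen, len(observed) + unseen


# Four singletons, two doubletons, one species seen five times.
unseen, total = chao1984_lower_bound([1, 1, 1, 1, 2, 2, 5])
print(unseen, total)
```

The bound uses only the two lowest frequency counts, which is why an approach that brings in higher-order moments can use more of the information in the sample.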

**Abstract**:

The (semiparametric) Gaussian copula model consists of distributions that have dependence structure described by Gaussian copulas but that have arbitrary marginals. A Gaussian copula is in turn determined by a Euclidean parameter $R$ called the copula correlation matrix. In this talk we study the normal scores (rank correlation coefficient) estimator, also known as the van der Waerden coefficient, of $R$ in high dimensions. It is well known that in fixed dimensions, the normal scores estimator is the optimal estimator of $R$, i.e., it has the smallest asymptotic covariance. Curiously, though, the estimators of $R$ preferred in high dimensions today are usually based on Kendall's tau or Spearman's rho. We show that the normal scores estimator in fact remains the optimal estimator of $R$ in high dimensions. More specifically, we show that the approximate linearity of the normal scores estimator in the efficient influence function, which in fixed dimensions implies the optimality of the normal scores estimator, holds in high dimensions as well.
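As a rough illustration of the estimator named above, here is a minimal sketch of the van der Waerden (normal scores) coefficient for a single pair of variables: each observation is replaced by $\Phi^{-1}(\text{rank}/(n+1))$, and the scores are then correlated. The function names are hypothetical, and the sketch assumes there are no ties in the data; it says nothing about the high-dimensional analysis in the talk.

```python
from statistics import NormalDist


def _normal_scores(values):
    """Map each value to Phi^{-1}(rank / (n + 1)); assumes no ties."""
    n = len(values)
    inv_cdf = NormalDist().inv_cdf
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = inv_cdf(rank / (n + 1))
    return scores


def normal_scores_correlation(x, y):
    """Van der Waerden coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    sx, sy = _normal_scores(x), _normal_scores(y)
    # Without ties both score vectors are permutations of the same values,
    # so a single sum of squares normalizes the product.
    return sum(a * b for a, b in zip(sx, sy)) / sum(a * a for a in sx)


x = [0.1, 2.3, 1.5, 4.0]
y = [1.0, 10.0, 5.0, 50.0]  # monotone increasing in x
print(normal_scores_correlation(x, y))
```

Because the coefficient depends on the data only through ranks, it is unchanged by monotone transformations of either margin, which is what makes it suitable for a model with arbitrary marginals.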

Minh Tang, Johns Hopkins

**Abstract**:

We prove a central limit theorem for the components of the eigenvectors corresponding to the $d$ largest eigenvalues of the normalized Laplacian matrix of a finite-dimensional random dot product graph. As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacian converge to multivariate normals and, furthermore, the mean and the covariance matrix of each row are functions of the associated vertex's block membership. Together with prior results for the eigenvectors of the adjacency matrix, we then compare, via the Chernoff information between multivariate normal distributions, how the choice of embedding method impacts subsequent inference. We demonstrate that neither embedding method dominates with respect to the inference task of recovering the latent block assignments.

**Location:** Gibson Hall 126A

**Time:** 3:30

**Abstract**:

Forecasting the trajectory of social dynamic processes such as the spread of infectious diseases poses significant challenges that call for methods that account for data and model uncertainty. Here we introduce a frequentist computational bootstrap approach that weights the uncertainty derived from a set of plausible models to build an ensemble model for sequential forecasting. The power and transparency of this approach are illustrated in the context of simple dynamic differential-equation models, which we confront with the trajectory of real and simulated outbreak data. For illustration, we generate sequential short-term ensemble forecasts of epidemic outbreaks by combining the strengths of phenomenological models that incorporate flexible epidemic growth scaling, namely the Generalized-Growth Model (GGM) and the Generalized Logistic Model (GLM). With our ensemble approach, we also address lessons from the Ebola forecasting challenge, with a particular focus on improving short-term forecasts of outbreaks that may involve a temporary downturn in case incidence.
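The abstract does not write out the two models, so the sketch below assumes the forms in which they commonly appear in the literature: $C'(t) = r\,C(t)^p$ for the GGM and $C'(t) = r\,C(t)^p\,(1 - C(t)/K)$ for the GLM, with $C$ the cumulative case count and $0 \le p \le 1$ the growth-scaling parameter. The function `simulate_growth`, its parameters, and the forward-Euler integration are illustrative choices, not the speaker's code, and the bootstrap ensemble step is not shown.

```python
def simulate_growth(c0, r, p, days, dt=0.01, capacity=None):
    """Forward-Euler trajectory of cumulative cases, one value per day.

    Assumed model forms (not given in the abstract):
      GGM: dC/dt = r * C**p                    (capacity is None)
      GLM: dC/dt = r * C**p * (1 - C/capacity)
    """
    steps_per_day = round(1 / dt)
    c = c0
    trajectory = [c0]
    for _ in range(days):
        for _ in range(steps_per_day):
            rate = r * c ** p
            if capacity is not None:
                rate *= 1 - c / capacity
            c += rate * dt
        trajectory.append(c)
    return trajectory


# p = 0 gives linear growth, p = 1 exponential growth, and values in
# between give sub-exponential (polynomial) growth.
ggm = simulate_growth(c0=1.0, r=0.5, p=0.8, days=30)
glm = simulate_growth(c0=1.0, r=0.5, p=0.8, days=30, capacity=500.0)
print(round(ggm[-1], 1), round(glm[-1], 1))
```

The two trajectories nearly coincide early on and separate as the logistic term takes effect, so short-term data may support either model, which is one motivation for weighting several plausible models rather than selecting one.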

**Abstract**:

Stochastic processing networks arise as models in manufacturing, telecommunications, transportation, computer systems, the customer service industry, and biochemical reaction networks. Common characteristics of these networks are that they have entities (jobs, packets, vehicles, customers, or molecules) that move along routes, wait in buffers, receive processing from various resources, and are subject to the effects of stochastic variability through such quantities as arrival times, processing times, and routing protocols. The mathematical theory of queueing aims to understand, analyze, and control congestion in stochastic processing networks. In this talk, we will review some of the major developments in the last century with more emphasis on some common approximations used in the last couple of decades. In particular, we will discuss broad results for control of large networks as well as more detailed results for control of specific smaller networks, under heavy traffic approximations.

Mathematics Department, 424 Gibson Hall, New Orleans, LA 70118 504-865-5727 math@math.tulane.edu