Colloquia Archives: 2012-2013

Lecture Series Overview

Top ⇑ Spring 2013 Colloquia

January 4

Coresets for Uncertain Data

Jeff Phillips University of Utah

Abstract: I will discuss computational geometry problems on uncertain data. That is, instead of a point set where each point has a known position, we consider point sets where each point has a probability distribution describing its location. We often consider the case where each uncertain point has a finite number of possible locations, motivated by scenarios where each object has been observed multiple times with slightly different results.

Next I will argue that when considering uncertain data, any queries (e.g. mean) should not be answered with a single value, but rather a distribution of values, describing what the true answer might be under the probability distributions governing the uncertain location of the data points. This suggests a certain type of approximate representation of the cumulative density function.

Finally, as an efficient way to compute these queries, we discuss methods to construct small coresets for large uncertain data sets which permit efficient answers of queries, up to the approximate guarantees suggested above. The first approach is a Monte Carlo sampling approach that selects possible locations from each uncertain point, and we apply it to shape fitting (a slightly old result) and nearest neighbor queries (a new result). The second approach selects a subset of uncertain points including their full probability distributions, and we apply this to various sorts of range counting queries (also a new result).

February 22

Place-based Information Systems: Textual Location Identification and Visualization

Hanan Samet University of maryland

Abstract: The popularity of web-based mapping services such as Google Earth/Maps and Microsoft Virtual Earth (Bing), has led to an increasing awareness of the importance of location data and its incorporation into both web-based search applications and the databases that support them, In the past, attention to location data had been primarily limited to geographic information systems (GIS), where locations correspond to spatial objects and are usually specified geometrically. However, in the web-based applications, the location data often corresponds to place names and is usually specified textually.

An advantage of such a specification is that the same specification can be used regardless of whether the place name is to be interpreted as a point or a region. Thus the place name acts as a polymorphic data type in the parlance of programming languages. However, its drawback is that it is ambiguous. In particular, a given specification may have several interpretations, not all of which are names of places. For example, "Jordan" may refer to both a person as well as a place. Moreover, there is additional ambiguity when the specification has a place name interpretation. For example, "Jordan" can refer to a river or a country while there are a number of cities named "London".

In this talk we examine the extension of GIS concepts to textually specified location data and review search engines that we have developed to retrieve documents where the similarity criterion is not based solely on exact match of elements of the query string but instead also based on spatial proximity. Thus we want to take advantage of spatial synonyms so that, for example, a query seeking a rock concert in Tel Aviv would be satisfied by a result finding a rock concert in Herzliyah of Petach Tikva. This idea has been applied by us to develop the STEWARD (Spatio-Textual Extraction on the Web Aiding Retrieval of Documents) system for finding documents on website of the Department of Housing and Urban Development. This system relies on the presence of a document tagger that automatically identifies spatial references in text, pdf, word, and other unstructured documents. The thesaurus for the document tagger is a collection of publicly available data sets forming a gazetteer containing the names of places in the world. Search results are ranked according to the extent to which they satisfy the query, which is determined in part by the prevalent spatial entities that are present in the document. The same ideas have also been adapted by us to collections of news articles as well as Twitter tweets resulting in the NewsStand and TwitterStand systems, respectively, which will be demonstrated along with the STEWARD system in conjunction with a discussion of some of the underlying issues that arose and the techniques used in their implementation. Future work involves applying these ideas to spreadsheet data.

April 5

Guns, Drugs and Computation*

Ryan Liliencadre research labs, university of toronto

Abstract: In this talk I will highlight two current yet very different projects taking place in my labs. First, I'll discuss the development of a 3D imaging and analysis system for firearm forensics. The topography of micro-toolmarks imprinted on the head of a fired carridge case (shell) serve as a 'fingerprint'; cases with highly similar surface topographies are likey to have been fired through the same firearm. Based off the retrographic sensor of Adelson and Johnson (MIT), we are developing a system for imaging the three-dimensional surfaces of fired cartridge cases and comparing the measured topographies using feature-based image analysis.

In the second half of the talk I'll highlight a novel modeling approach for predicting the emergence of drug resistance. The problem of modeling drug resistance, like many computational biology problems, manifests as a combinatorial search and optimization problem. The ability to predict drug resistance allows the prioritization of drug candidates and increased clinical vigilance. Our work represents some first, promising, steps towards this goal.

*No expertise in firearms or computational biology required.

Work is supported by the NIST, the NIJ, the Natural Sciencees and Engineering Research Council of Canada, and the Gates Foundation. Collaboration with Maria Safi (UofT), Marcus Brubaker (Cadre), Todd Weller (Oakland PD), Kimo Johnson (GelSight).

April 12

Computational Protein Design, With Applications to Cystic Fibrosis and HIV

Bruce Donald duke university

Computer Science Department Distinguished Lecture

Abstract: Some of the most challenging and influential opportunities for Physical Geometric Algorithms (PGA) arise in developing and applying information technology to understand the molecular machinery of the cell. Our recent work shows that PGA techniques may be fruitfully applied to the challenges of structural molecular biology and rational drug design. Concomitantly, a wealth of interesting computational problems arise in proposed methods for discovering new pharmaceuticals.

In this talk, I will discuss recent results on computational protein design. Computational protein design is a transformative field with exciting prospects for advancing both basic science and translational medical research. My laboratory has developed protein design algorithms and used them to design new drugs for leukemia; redesign an enzyme to diversify current antibiotics; to design protein-protein interactions; design probes to isolate broadly-neutralizing HIV antibodies; and predict MRSA resistance mutations to new antibiotics. At the heart of this research lies OSPREY, a software package implementing provable design algorithms of considerable intrinsic interest. I will introduce these algorithms for protein design. Then, I will discuss two applications: (1) designing drugs for cystic fibrosis, and (2) designing antibodies against HIV-1. Computational, and experimental (in vitro, and ex vivo) results will be presented.

About the Speaker: Bruce Donald is the James B. Duke Professor of Computer Science at Duke University, and Professor of Biochemistry in the Duke University Medical Center. He was a professor in the Computer Science Department at Cornell University from 1987-1998. Dr. Donald received a B.A. from Yale University, and a Ph.D. from MIT. He has been a National Science Foundation Presidential Young Investigator, and was awarded a Guggenheim Fellowship for his work on algorithms for structural proteomics. Donald is a Fellow of the ACM and the IEEE. His most recent book, Algorithms in Structural Molecular Biology, was published by The MIT Press (Cambridge, 2011).

April 19

Practical Abstractions for Dynamic and Parallel Software

Umut A. Acarcarnegie-mellon university, school of computer science

Abstract: Developing efficient and reliable software is a difficult task.  Rapidly growing and dynamically changing data sets further increase complexity by making it more challenging to achieve efficiency and performance.  I present practical and powerful abstractions for taming software complexity in two large domains: 1) dynamic software that interacts with dynamically changing data, and 2) parallel software that utilizes multiple processors or cores.  Together with the algorithmic models and programming-languages that embody them, these abstractions enable designing and developing efficient and reliable software by using high-level reasoning principles and programming techniques.  As evidence of their effectiveness, I consider a broad range of benchmarks involving lists, arrays, matrices, and trees, as well as sophisticated applications in geometry, machine-learning, and large-scale cloud computing.  On the theoretical side, I show asymptotically significant improvements in efficiency and present solutions to several open problems.  On the practical side, I present programming languages, compilers, and related software systems that deliver massive speedups with little or no programmer effort. About the Speaker: Umut Acar is an Assistant Professor at Carnegie Mellon University.  He obtained his Ph.D. at Carnegie Mellon University (2005), and his M.A. and B.S. degrees at University of Texas at Austin and Bilkent University.  Between 2005 and 2012, he led research groups at Toyota Technological Institute and Max Planck Institute for Software Systems.  Acar's research interests span programming languages, algorithms, and software systems.  Acar is a co-inventor of self-adjusting computation and a co-creator of the programming languages CEAL and DeltaML for self-adjusting computation.  He developed algorithms and software systems for locality-guided parallel scheduling, dynamic trees, dynamic meshes, statistical learning, and incremental large-scale data processing.

May 29

Dividing the Indivisible

Toby WalshNICta and unsw

The speaker is recognized as one of the leading researchers in artificial intelligence worldwide and his interests include the interface between (distributed) optimization, social choice, game theory and machine learning.

Abstract: I consider procedures for allocating a set of indivisible goods to multiple agents.  For example, one such procedure has agents alternate picking items.  Many of us used such a procedure at school when picking teams.  A similar procedure is used by the Harvard Business School to allocate courses to students.  I study here the impact of strategic behavior on the complete-information extensive-form game of such games.  Computing the subgame-perfect Nash equilibrium is PSPACE-hard in general, but takes only linear time with two agents.  Finally I discuss the "optimal" policies for two agents in different settings, including when agents behave sincerely or strategically.

About the Speaker: Toby Walsh is the Scientific Director of NICTA, Australia's centre of excellence for ICT research.  He is adjunct Professor at the University of New South Wales, external Professor at Uppsala University and an honorary fellow of Edinburgh University.  He has been Editor-in-Chief of the Journal of Artificial Intelligence Research, and of AI Communications.  He is both an AAAI and an ECCAI fellow.  He has been Secretary of the Association for Constraint Programming (ACP) and is Editor of CP News, the newsletter of the ACP.  Like Francesca, he is one of the Editors of the Handbook for Constraint Programming.  He is also an Editor of the Handbook for Satisfiability.  He has been Program Chair of CP 2001, Conference Chair of IJCAR 2004, Program and Conference Chair of SAT 2005, Conference Chair of CP 2008, and Program Chair of IJCAI 2011.

*The seminar talk will take place at 2:00pm in Stanley Thomas 302 on Wednesday, May 29.

303 Stanley Thomas Hall, New Orleans, LA 70118 504-865-5785