Spring 2017 Colloquia
Check back soon for more information on the computer science seminar series. Unless otherwise noted, the seminars meet on Mondays at 3pm in Stanley Thomas 302. If you would like to receive notices about upcoming seminars, you can subscribe to the announcement listserv.
Improving Tor’s Security with Trust-Aware Path Selection
Aaron Johnson Naval Research Laboratory
Abstract: Tor is a popular tool for low-latency anonymous communication, with over an estimated 1.5 million daily users. Tor users are vulnerable to deanonymization by an adversary that can observe some Tor relays or some parts of the network. We demonstrate that previous network-aware path-selection algorithms that propose to solve this problem are vulnerable to attacks across multiple Tor connections. We suggest that users use trust to choose the paths through Tor that are less likely to be observed, where trust is flexibly modeled as a probability distribution on the location of the user's adversaries, and we present the Trust-Aware Path Selection algorithm for Tor that helps users avoid traffic-analysis attacks while still choosing paths that could have been selected by many other users. We evaluate this algorithm in two settings using a high-level map of Internet routing: (i) users try to avoid a single global adversary that has an independent chance to control each Autonomous System organization, Internet Exchange Point organization, and Tor relay family, and (ii) users try to avoid deanonymization by any single country. Simulation results using data from the live Tor network show reductions of as much as 85% in the chance of deanonymization by a global adversary and 60% in the number of countries to which Tor users are vulnerable.
About the Speaker: Dr. Aaron Johnson is a computer scientist at the U.S. Naval Research Laboratory. His research interests include private communication and privacy-preserving data analysis. He has performed foundational mathematical research in the area of anonymous communication by modeling and analyzing the security of onion routing. He has also applied mathematically-rigorous privacy-preserving methods to publishing sensitive genetic and network data. Much of his work has been focused on the Tor network, which is an onion-routing network used by over 2 million users daily to secure their communications. He designed several improvements to Tor, including denial-of-service defenses, faster onion services, privacy-preserving network monitoring, and improvements to Tor's path selection. Many of these results have been incorporated into the Tor network and provide enhanced security, performance, and utility to its many users. Dr. Johnson received his Ph.D. in 2009 from the computer science department at Yale University and completed postdoctoral training at the University of Texas at Austin.
Tuning the Scale of Big Data Analytics
Nick Duffield Texas A&M University
Abstract: Sampling is a powerful approach to reduce Big Data to Small Data, relieving storage and enabling faster query response when an approximate answer suffices. The first part of this talk describes a cost-based formulation for optimal data reduction that is used by a major Internet Service Provider, and some new applications to subgraph counting in graph streaming. The second part of this talk focuses on the use of machine learning methods to model the complex dependence between internet user experience and the systems that provide services, and how this knowledge can be used to improve those services. The talk also touches on some current applications of Data Science in transportation and hydrology, and some of the challenges for interdisciplinary research and education in Data Science.
About the Speaker: Nick Duffield (http://nickduffield.net/work) is a Professor in the Department of Electrical and Computer Engineering at Texas A&M University, and Director of the Texas A&M Engineering Big Data Initiative. He worked previously at AT&T Labs-Research, Florham Park, NJ, where he was a Distinguished Member of Technical Staff and AT&T Fellow. His research focuses on the foundations and applications of Data Science, including graph streaming, communications networks, transportation, and hydrology. He is Chief Editor for Big Data at Frontiers in ICT and an Editor-at-Large for the IEEE/ACM Transactions on Networking. Dr. Duffield is an IEEE Fellow, an IET Fellow, a member of the Board of Directors of ACM Sigmetrics, and was a co-recipient of the ACM Sigmetrics Test of Time Award in 2012 and 2013.
Deep Learning for Natural Language Processing: Summarization and Language Vagueness
Fei Liu University of Central Florida
A Davis Washington Mitchell Lecture
Abstract: Deep learning techniques have revolutionized the field of natural language processing in the past few years, yet there remain challenges and open problems. In this talk I will discuss two case studies where deep learning techniques are called upon to solve natural language processing problems: summarization is a classic NLP task, whereas modeling language vagueness is a new area of research where limited work has been done. I will focus the talk on leveraging hierarchical attention networks for forum thread summarization, while providing overviews of other projects. Challenges, opportunities, and future work will be discussed toward the end of the talk.
About the Speaker: Dr. Fei Liu is an assistant professor of Computer Science at the University of Central Florida. Her research areas are natural language processing and machine learning. From 2013 to 2015, Fei was a postdoctoral fellow at Carnegie Mellon University and a member of Noah's ARK. From 2011 to 2013, she worked as a senior research scientist at Bosch Research, Palo Alto, California. Fei received her Ph.D. in Computer Science from the University of Texas at Dallas in 2011. Prior to that, she obtained her bachelor's and master's degrees in Computer Science from Fudan University, Shanghai, China. Fei was a recipient of a Best Paper Nomination at the 25th International World Wide Web Conference (WWW) in 2016. She served as an Area Chair for the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
Scalable Systems for Collaborative Analytics over Spatiotemporal Data
Hakan Ferhatosmanoglu Max Planck Institute (MPI) for Computer Science
THIS EVENT HAS BEEN POSTPONED. WE WILL UPDATE THIS SITE WITH THE NEW DATE AND TIME AS SOON AS POSSIBLE.
Abstract by Speaker: In this talk, I will first present our solutions to merge and analyze streaming location data from multiple public and private sources of urban data (e.g., check-ins, GPS traces, mobile applications). The talk will include a new model for spatiotemporal record linkage and a scalable algorithm that joins records even when they are anonymized. I will then present our vision towards developing collaborative systems for urban data science applications that involve multiple stakeholders such as data owners, analysts, and domain experts. In contrast to traditional data analysis methods, our approach does not assume that the analyst owns the data, has it on the premises, with enough resources, tools, and expertise to produce actionable outcomes. I will also present the urban data science applications we developed with performance and accuracy results for several spatiotemporal data analytics tasks.
About the Speaker: Prof. Hakan Ferhatosmanoglu received his Ph.D. in Computer Science from the University of California, Santa Barbara in 2001. He is currently at the Max Planck Institute with a research award from the Alexander von Humboldt Foundation. He is a Professor at Bilkent University and previously worked as an associate professor at The Ohio State University. His research is broadly on building scalable software systems for data science applications. His work has been supported by more than 20 research grants from government agencies and companies. Dr. Ferhatosmanoglu has received several career awards, from the US Department of Energy, US National Science Foundation, Turkish Academy of Sciences, and Humboldt Foundation.
The Church-Turing Thesis in the theory of computation and in philosophy
Achim Jung University of Birmingham
Abstract: At one level, the Church-Turing Thesis (CTT) is a quite clear and simple statement: All formalisations of the intuitive notion of computability are equally expressive. However, one might want to analyse this a bit more carefully, and also consider the context in which computation takes place. For example, we can consider machines that are connected to other machines, or machines that have a built-in notion of data type. Perhaps surprisingly, in these more refined settings the CTT is no longer valid, in the sense that otherwise perfectly natural computational formalisms are weaker than what one might like to call computable.
In this talk the speaker wants to explain this phenomenon and speculate as to why this could be an interesting point when considering the computational possibilities of the brain.
About the Speaker: Achim Jung is Professor of Computer Science in the School of Computer Science at the University of Birmingham. He is a leading authority in Domain Theory, Topology, the Semantics of High-level Programming Languages, Logics for Computation and the Lambda Calculus. He is an Editor of "Theoretical Computer Science", of "Soft Computing" and of the "Electronic Notes in Theoretical Computer Science". Professor Jung has twice served as Head of Department at Birmingham, and is largely credited with leading the renaissance the School of Computer Science has experienced since his arrival there. Professor Jung is also active in the support of teaching computer science in primary and secondary schools in the UK.
Latent Variable Modeling for Data Big and Small, with Perspectives on the Big Data Revolution
Jimmy Foulds University of California, San Diego
This event will be held on Tuesday, 3/7/2017, from 11:00 a.m. - 12:15 p.m. in Stanley Thomas, Room 302. Please note the special weekday and time for this event.
Abstract: Probabilistic latent variable models provide a powerful and principled approach for uncovering meaningful structure hidden in data such as social networks and text corpora, with applications in both academia and industry. For example, topic models summarize documents with human-interpretable topics, latent variable blockmodels find communities in social networks, and word embeddings map dictionary words into latent semantic spaces in order to improve the performance of natural language processing systems. Crucially, these relatively simple latent variable models can be developed into sophisticated models for solving nuanced applied modeling tasks. In this talk, I will illustrate this with case studies from my research, including measuring the influence of scientific articles and modeling the diffusion of text-based information in social networks (text-based cascades). I will then overview recent directions I have addressed in my work, including scaling these models up for big data, and general-purpose modeling frameworks, with perspectives on the recent "big data" revolution, and its consequences, benefits, and limitations.
About the Speaker: James (a.k.a. Jimmy) Foulds is a postdoctoral scholar at the University of California, San Diego. His research interests are in both applied and foundational machine learning, focusing on probabilistic latent variable models and the inference algorithms to learn them from data. His work aims to promote the practice of latent variable modeling for multidisciplinary research in areas including computational social science and the digital humanities. He earned his Ph.D. in computer science at the University of California, Irvine, and was a postdoctoral scholar at the University of California, Santa Cruz. His master's and bachelor's degrees were earned with first class honours at the University of Waikato, New Zealand, where he also contributed to the Weka data mining system.
Sharon Fox Southeast Louisiana Veterans Health Care System
Egocentric computer vision, for fun and science
David Crandall Indiana University
This event will be held on Tuesday, 3/14/2017, from 3:30 p.m. - 4:30 p.m. in Stanley Thomas, Room 302. Please note the special weekday and time for this event.
Abstract: New sources of large-scale visual data raise both opportunities and challenges for computer vision. For example, each of the nearly one trillion photos on Facebook is an observation of what the world looked like at a particular point in time and space, and what a particular photographer was paying attention to. Meanwhile, low-cost wearable cameras (like GoPro) are entering the mainstream, allowing people to record and share their lives from a first-person, "egocentric" perspective. How can vision help people organize these (and other) vast but noisy datasets? What could mining these rich datasets reveal about ourselves and about the world in general? In this talk, I'll describe recent work investigating these questions, focusing on two lines of work on egocentric imagery as examples. The first is for consumer applications, where our goal is to develop automated classifiers to help categorize lifelogging images across several dimensions. The second is an interdisciplinary project using computer vision with wearable cameras to study parent-child interactions in order to better understand child learning. Despite the different goals, these applications share common themes of robustly recognizing image content in noisy, highly dynamic, unstructured imagery.
About the Speaker: David Crandall is an Associate Professor in the School of Informatics and Computing at Indiana University Bloomington, where he is a member of the programs in Computer Science, Informatics, Cognitive Science, and Data Science, and of the Center for Complex Networks and Systems Research. He received the Ph.D. in computer science from Cornell University in 2008 and the M.S. and B.S. degrees in computer science and engineering from the Pennsylvania State University in 2001. He was a Postdoctoral Research Associate at Cornell from 2008-2010, and a Senior Research Scientist with Eastman Kodak Company from 2001-2003. He received an NSF CAREER award in 2013 and a Google Faculty Research Award in 2014.
Val Tannen University of Pennsylvania
Kevin Liu Ohio State University
This event will be held on Tuesday, 3/21/2017, from 2:00 - 3:15 p.m. in Stanley Thomas, Room 302. Please note the special weekday and time for this event.
Similarity Learning in the Era of Big Data
Shiyu Chang IBM T. J. Watson Research Center
This event will be held on Thursday, 3/23/2017, from 12:30 - 1:45 p.m. in Stanley Thomas, Room 302. Please note the special weekday and time for this event.
Abstract: The notion of machines that can learn has caught imaginations since the days of the early computer. In recent years, as we face burgeoning amounts of data around us that no human mind can process, machines that can learn to automatically find insights from such vast amounts of data have become a growing necessity. The field of machine learning is a modern marriage between computer science and statistics driven by tremendous industrial demands. At the core of many applications is so-called “similarity learning”. Learning similarities is often used as a subroutine in important data mining and machine learning tasks. For example, recommender systems utilize the learned metric to measure the relevance of the candidate items to target users. Applications of this approach also exist in the context of clinical decision support, search, and retrieval settings. However, the three Vs of big data (volume, variety, and velocity) pose new challenges for learning similarity for pattern discovery and data analysis. How can we reveal the truth from massive unlabeled data? How should we handle multimodal data? What if the data contain network structures? Do temporal dynamics affect the process of decision-making? For example, in clinical decision making, doctors retrieve the most similar clinical pathway for auxiliary diagnosis. However, the sheer volume and complexity of the data present major barriers toward their translation into effective clinical actions. In this talk, I will illustrate some of these challenges with examples from my work on foundations of similarity learning. I will show that with judicious design together with rigorous mathematics for learning similarities, we are able to make various kinds of impact on society and uncover surprising natural and social phenomena.
About the Speaker: Shiyu Chang is a Research Staff Member at IBM Thomas J. Watson Research Center. He recently obtained his Ph.D. from the University of Illinois at Urbana-Champaign (UIUC) under the supervision of Prof. Thomas S. Huang. Shiyu has a wide range of research interests in data exploration and analytics at large scale. Specifically, his current research directions lie in developing novel machine learning algorithms to solve complex computational tasks in the real world. Shiyu received his B.S. degree at UIUC in 2011 with the highest university honor (Bronze Tablet Award). He graduated from the Department of Electrical and Computer Engineering at UIUC and obtained his M.S. degree in 2014. He is a recipient of the Thomas and Margaret Huang Award in 2016 and the Kodak Fellowship Award in 2014. Most of Shiyu’s research has been published in top data mining, computer vision, and artificial intelligence venues including SIGKDD, WWW, CVPR, WSDM, ICDM, SDM, and IJCAI. The paper “Factorized Similarity Learning in Networks” was selected as the best student paper at ICDM 2014.