Spring 2017 Colloquia
Check back soon for more information on the computer science seminar series. Unless otherwise noted, the seminars meet on Mondays at 3pm in Stanley Thomas 302. If you would like to receive notices about upcoming seminars, you can subscribe to the announcement listserv.
Improving Tor’s Security with Trust-Aware Path Selection
Aaron Johnson Naval Research Laboratory
Abstract: Tor is a popular tool for low-latency anonymous communication, with over 1.5 million estimated daily users. Tor users are vulnerable to deanonymization by an adversary that can observe some Tor relays or some parts of the network. We demonstrate that previous network-aware path-selection algorithms proposed to solve this problem are vulnerable to attacks across multiple Tor connections. We suggest that users rely on trust to choose paths through Tor that are less likely to be observed, where trust is flexibly modeled as a probability distribution on the location of the user's adversaries. We present the Trust-Aware Path Selection algorithm for Tor, which helps users avoid traffic-analysis attacks while still choosing paths that could have been selected by many other users. We evaluate this algorithm in two settings using a high-level map of Internet routing: (i) users try to avoid a single global adversary that has an independent chance to control each Autonomous System organization, Internet Exchange Point organization, and Tor relay family, and (ii) users try to avoid deanonymization by any single country. Simulation results using data from the live Tor network show reductions of as much as 85% in the chance of deanonymization by a global adversary and 60% in the number of countries to which Tor users are vulnerable.
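To make setting (i) concrete, here is a minimal sketch of how one might score a circuit's risk. It assumes that each entity (an AS organization, IXP organization, or relay family) is compromised independently with a known probability, and that deanonymization requires the adversary to observe both the entry side (client to guard) and the exit side (exit to destination). The entity names, probabilities, and function names are hypothetical. This covers only the risk computation; it is not the Trust-Aware Path Selection algorithm itself, which also keeps users choosing paths that many other users could have chosen.

```python
from math import prod

def prob_any_compromised(entities, p):
    """P(at least one entity is compromised), assuming each entity e is
    compromised independently with probability p[e] (hypothetical trust model)."""
    return 1.0 - prod(1.0 - p[e] for e in set(entities))

def deanonymization_risk(entry_side, exit_side, p):
    """P(the adversary observes both ends of the circuit).

    Uses P(A and B) = P(A) + P(B) - P(A or B), so an entity that appears
    on both sides (e.g. the same AS) is handled correctly.
    """
    a = prob_any_compromised(entry_side, p)
    b = prob_any_compromised(exit_side, p)
    a_or_b = prob_any_compromised(set(entry_side) | set(exit_side), p)
    return a + b - a_or_b

# Hypothetical compromise probabilities for a few entities.
p = {"AS1": 0.1, "AS2": 0.2, "AS3": 0.1, "guard_fam": 0.05, "exit_fam": 0.05}

# Each candidate path: (entities on the entry side, entities on the exit side).
candidates = {
    "path_1": (["AS1", "guard_fam"], ["AS1", "exit_fam"]),  # AS1 sees both ends
    "path_2": (["AS2", "guard_fam"], ["AS3", "exit_fam"]),
}

risks = {name: deanonymization_risk(en, ex, p) for name, (en, ex) in candidates.items()}
for name, risk in sorted(risks.items(), key=lambda kv: kv[1]):
    print(f"{name}: {risk:.4f}")
```

In this toy example path_2 comes out safer even though AS2 is individually riskier, because on path_1 a single AS (AS1) sits on both sides of the circuit.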
About the Speaker: Dr. Aaron Johnson is a computer scientist at the U.S. Naval Research Laboratory. His research interests include private communication and privacy-preserving data analysis. He has performed foundational mathematical research in the area of anonymous communication by modeling and analyzing the security of onion routing. He has also applied mathematically rigorous privacy-preserving methods to publishing sensitive genetic and network data. Much of his work has focused on the Tor network, an onion-routing network used by over 2 million users daily to secure their communications. He designed several improvements to Tor, including denial-of-service defenses, faster onion services, privacy-preserving network monitoring, and improvements to Tor's path selection. Many of these results have been incorporated into the Tor network and provide enhanced security, performance, and utility to its many users. Dr. Johnson received his Ph.D. in 2009 from the computer science department at Yale University and completed postdoctoral training at the University of Texas at Austin.
Tuning the Scale of Big Data Analytics
Nick Duffield Texas A&M University
Abstract: Sampling is a powerful approach to reducing Big Data to Small Data, relieving storage demands and enabling faster query responses when an approximate answer suffices. The first part of this talk describes a cost-based formulation for optimal data reduction that is used by a major Internet Service Provider, along with some new applications to subgraph counting in graph streaming. The second part of this talk focuses on using machine learning methods to model the complex dependence between Internet user experience and the systems that provide services, and on how this knowledge can be used to improve those services. The talk also touches on some current applications of Data Science in transportation and hydrology, and on some of the challenges for interdisciplinary research and education in Data Science.
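As an illustration of sampling with unbiased estimation, the sketch below implements priority sampling, a well-known weighted scheme for estimating subset sums from a fixed-size sample. It is offered only as an example of the general idea; the talk's cost-based formulation is not reproduced here and may differ. The function name and example weights are hypothetical, and weights are assumed to be positive.

```python
import random

def priority_sample(weights, k, rng=random):
    """Priority sampling of k items from positive weights.

    Returns {index: adjusted_weight}. Summing the adjusted weights of the
    sampled items that fall in any subset gives an unbiased estimate of that
    subset's true total weight.
    """
    if k >= len(weights):
        return dict(enumerate(weights))  # nothing to drop: totals are exact
    # Priority q_i = w_i / u_i, with u_i drawn uniformly from (0, 1].
    ranked = sorted(
        ((w / (1.0 - rng.random()), i) for i, w in enumerate(weights)),
        reverse=True,
    )
    tau = ranked[k][0]  # the (k+1)-th largest priority acts as a threshold
    return {i: max(weights[i], tau) for _, i in ranked[:k]}

# Usage: estimate a total from a 3-item sample of 6 weighted records.
weights = [1, 2, 3, 10, 50, 4]
sample = priority_sample(weights, k=3, rng=random.Random(42))
print(sample, "estimated total:", sum(sample.values()))
```

Any single estimate uses only three records, but averaged over many independent runs the estimated total approaches the true total of 70.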
About the Speaker: Nick Duffield (http://nickduffield.net/work) is a Professor in the Department of Electrical and Computer Engineering at Texas A&M University, and Director of the Texas A&M Engineering Big Data Initiative. He worked previously at AT&T Labs-Research, Florham Park, NJ, where he was a Distinguished Member of Technical Staff and AT&T Fellow. His research focuses on the foundations and applications of Data Science, including graph streaming, communications networks, transportation, and hydrology. He is Chief Editor for Big Data at Frontiers in ICT and an Editor-at-Large for the IEEE/ACM Transactions on Networking. Dr. Duffield is an IEEE Fellow, an IET Fellow, a member of the Board of Directors of ACM Sigmetrics, and was a co-recipient of the ACM Sigmetrics Test of Time Award in 2012 and 2013.
Deep Learning for Natural Language Processing: Summarization and Language Vagueness
Fei Liu University of Central Florida
A Davis Washington Mitchell Lecture
Abstract: Deep learning techniques have revolutionized the field of natural language processing in the past few years, yet challenges and open problems remain. In this talk I will discuss two case studies where deep learning techniques are called upon to solve natural language processing problems: summarization, a classic NLP task, and modeling language vagueness, a new area of research where limited work has been done. I will focus the talk on leveraging hierarchical attention networks for forum thread summarization, while providing overviews of other projects. Challenges, opportunities, and future work will be discussed toward the end of the talk.
About the Speaker: Dr. Fei Liu is an assistant professor of Computer Science at the University of Central Florida. Her research areas are natural language processing and machine learning. From 2013 to 2015, Fei was a postdoctoral fellow at Carnegie Mellon University and a member of Noah's ARK. From 2011 to 2013, she worked as a senior research scientist at Bosch Research, Palo Alto, California. Fei received her Ph.D. in Computer Science from the University of Texas at Dallas in 2011. Prior to that, she obtained her bachelor's and master's degrees in Computer Science from Fudan University, Shanghai, China. Fei was a recipient of a Best Paper Nomination at the 25th International World Wide Web Conference (WWW) in 2016. She served as an Area Chair for the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
Scalable Systems for Collaborative Analytics over Spatiotemporal Data
Hakan Ferhatosmanoglu Max Planck Institute (MPI) for Computer Science
THIS EVENT HAS BEEN POSTPONED. WE WILL UPDATE THIS SITE WITH THE NEW DATE AND TIME AS SOON AS POSSIBLE.
Abstract by Speaker: In this talk, I will first present our solutions to merge and analyze streaming location data from multiple public and private sources of urban data (e.g., check-ins, GPS traces, mobile applications). The talk will include a new model for spatiotemporal record linkage and a scalable algorithm that joins records even when they are anonymized. I will then present our vision for developing collaborative systems for urban data science applications that involve multiple stakeholders such as data owners, analysts, and domain experts. In contrast to traditional data analysis methods, our approach does not assume that the analyst owns the data, holds it on premises, or has enough resources, tools, and expertise to produce actionable outcomes. I will also present the urban data science applications we developed, with performance and accuracy results for several spatiotemporal data analytics tasks.
About the Speaker: Prof. Hakan Ferhatosmanoglu received his Ph.D. in Computer Science from the University of California, Santa Barbara in 2001. He is currently at the Max Planck Institute with a research award from the Alexander von Humboldt Foundation. He is a Professor at Bilkent University and previously worked as an associate professor at The Ohio State University. His research is broadly on building scalable software systems for data science applications. His work has been supported by more than 20 research grants from government agencies and companies. Dr. Ferhatosmanoglu has received several career awards from the US Department of Energy, the US National Science Foundation, the Turkish Academy of Sciences, and the Humboldt Foundation.
The Church-Turing Thesis in the theory of computation and in philosophy
Achim Jung University of Birmingham
Abstract: At one level, the Church-Turing Thesis (CTT) is a quite clear and simple statement: All formalisations of the intuitive notion of computability are equally expressive. However, one might want to analyse this a bit more carefully, and also consider the context in which computation takes place. For example, we can consider machines that are connected to other machines, or machines that have a built-in notion of data type. Perhaps surprisingly, in these more refined settings the CTT is no longer valid, in the sense that otherwise perfectly natural computational formalisms are weaker than what one might like to call computable.
In this talk, the speaker will explain this phenomenon and speculate as to why it could be an interesting point when considering the computational possibilities of the brain.
About the Speaker: Achim Jung is Professor of Computer Science in the School of Computer Science at the University of Birmingham. He is a leading authority in Domain Theory, Topology, the Semantics of High-level Programming Languages, Logics for Computation and the Lambda Calculus. He is an Editor of "Theoretical Computer Science", of "Soft Computing" and of the "Electronic Notes in Theoretical Computer Science". Professor Jung has twice served as Head of Department at Birmingham, and is largely credited with leading the renaissance the School of Computer Science has experienced since his arrival there. Professor Jung is also active in supporting the teaching of computer science in primary and secondary schools in the UK.
Latent Variable Modeling for Data Big and Small, with Perspectives on the Big Data Revolution
Jimmy Foulds University of California, San Diego
This event will be held on Tuesday, 3/7/2017, from 11:00 a.m. - 12:15 p.m. in Stanley Thomas, Room 302. Please note the special weekday and time for this event.
Abstract: Probabilistic latent variable models provide a powerful and principled approach for uncovering meaningful structure hidden in data such as social networks and text corpora, with applications in both academia and industry. For example, topic models summarize documents with human-interpretable topics, latent variable blockmodels find communities in social networks, and word embeddings map dictionary words into latent semantic spaces in order to improve the performance of natural language processing systems. Crucially, these relatively simple latent variable models can be developed into sophisticated models for solving nuanced applied modeling tasks. In this talk, I will illustrate this with case studies from my research, including measuring the influence of scientific articles and modeling the diffusion of text-based information in social networks (text-based cascades). I will then give an overview of recent directions I have pursued in my work, including scaling these models up for big data and general-purpose modeling frameworks, with perspectives on the recent "big data" revolution and its consequences, benefits, and limitations.
About the Speaker: James (a.k.a. Jimmy) Foulds is a postdoctoral scholar at the University of California, San Diego. His research interests are in both applied and foundational machine learning, focusing on probabilistic latent variable models and the inference algorithms to learn them from data. His work aims to promote the practice of latent variable modeling for multidisciplinary research in areas including computational social science and the digital humanities. He earned his Ph.D. in computer science at the University of California, Irvine, and was a postdoctoral scholar at the University of California, Santa Cruz. His master's and bachelor's degrees were earned with first class honours at the University of Waikato, New Zealand, where he also contributed to the Weka data mining system.
Eye-Tracking Data for the Improvement of Digital Pathology Diagnosis
Sharon Fox Southeast Louisiana Veterans Health Care System
Abstract: Digital anatomic pathology refers to an electronic platform for performing pathologic diagnosis that replicates glass slide-based microscopy. Digital slides also provide a means to understand the visual process by which pathologists arrive at image-based diagnoses. Increasing innovation in this field promises to reduce overall healthcare costs, while improving accessibility and the ability to obtain expert consultation across many institutions. My prior work has used eye-tracking technology to study expert pathologists, and in particular the patterns of gaze associated with different forms of diagnostic visual processing. Eye-tracking systems enable the visualization and analysis of the viewer's points of gaze, and provide a link to our understanding of the interface between digital images and the interpreting pathologist.
One advantage of digital pathology is the potential to utilize computer-assisted diagnostics (CAD). The information gained through eye-tracking data can allow us to improve the efficiency of CAD for common disease states. While my research has not been directly within the field of computer science, I hope to collaborate on topics of common interest to bioengineers, computer scientists, and clinicians. This talk will therefore focus upon some of the current methods employed in computer image analysis for pathology diagnosis, and the ways in which data from human “visual experts” may improve the accuracy and efficiency of these techniques.
About the Speaker: Sharon Fox is an Anatomic Pathologist in the Pathology and Laboratory Medicine Service at the new Southeast Louisiana Veterans Health Care System, and a current recipient of the NIH R25 Fellowship from Boston Children’s Hospital for work in the field of digital pathology. She received her M.D. from Harvard Medical School in 2008, with honors in Biomedical Imaging, and her Ph.D. in medical engineering from MIT in 2012. She completed residency training in anatomic pathology at Beth Israel Deaconess Medical Center, Boston and LSU Health Sciences in New Orleans, and is a current Diplomate of the American Board of Pathology. She has received numerous awards from the Digital Pathology Association for her work as a trainee, and is continuing her research as faculty in the VA system.
Egocentric Computer Vision, For Fun and Science
David Crandall Indiana University
This event will be held on Tuesday, 3/14/2017, from 3:30 p.m. - 4:30 p.m. in Stanley Thomas, Room 302. Please note the special weekday and time for this event.
Abstract: New sources of large-scale visual data raise both opportunities and challenges for computer vision. For example, each of the nearly one trillion photos on Facebook is an observation of what the world looked like at a particular point in time and space, and of what a particular photographer was paying attention to. Meanwhile, low-cost wearable cameras (such as GoPro) are entering the mainstream, allowing people to record and share their lives from a first-person, "egocentric" perspective. How can vision help people organize these (and other) vast but noisy datasets? What could mining these rich datasets reveal about ourselves and about the world in general? In this talk, I'll describe recent work investigating these questions, focusing on two lines of work on egocentric imagery as examples. The first is for consumer applications, where our goal is to develop automated classifiers to help categorize lifelogging images across several dimensions. The second is an interdisciplinary project using computer vision with wearable cameras to study parent-child interactions in order to better understand child learning. Despite their different goals, these applications share the common theme of robustly recognizing image content in noisy, highly dynamic, unstructured imagery.
About the Speaker: David Crandall is an Associate Professor in the School of Informatics and Computing at Indiana University Bloomington, where he is a member of the programs in Computer Science, Informatics, Cognitive Science, and Data Science, and of the Center for Complex Networks and Systems Research. He received the Ph.D. in computer science from Cornell University in 2008 and the M.S. and B.S. degrees in computer science and engineering from the Pennsylvania State University in 2001. He was a Postdoctoral Research Associate at Cornell from 2008 to 2010, and a Senior Research Scientist with Eastman Kodak Company from 2001 to 2003. He received an NSF CAREER award in 2013 and a Google Faculty Research Award in 2014.
Provenance Analysis for First-Order Model Checking
Val Tannen University of Pennsylvania
Abstract: Model checking for FOL (First-Order Logic) is the computational problem of deciding, given an FO finite model (structure), A, and an FO sentence, s, whether s holds true in A or not. Its provenance analysis determines how that answer (holds or not) depends on the information that defines the model A. Provenance questions like this one have emerged in databases, scientific workflows, networks, and other areas.
We apply the semiring provenance framework, developed in databases, to the FOL model checking problem. This provides a non-standard semantics for FOL that refines logical truth to values in commutative semirings: the semiring of provenance polynomials, the Viterbi semiring of confidence scores, access control semirings, etc. The semantics can be used to synthesize models based on criteria like maximum confidence or public access. Our uniform treatment of logical negation also provides an approach to negative provenance (a.k.a. why-not provenance, or provenance of non-answers).
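A minimal sketch of this idea in Python, assuming formulas in negation normal form over a small finite universe, and an annotation that assigns a semiring value to each literal (positive or negated) that is not simply false. Conjunction maps to semiring multiplication, disjunction to addition, and the quantifiers ∃ and ∀ to sums and products over the universe. The tuple encoding of formulas, the `evaluate` function, and the example confidences are hypothetical illustrations, not the speakers' implementation. Evaluating in the Boolean semiring recovers ordinary model checking, while the Viterbi semiring ([0, 1], max, ×, 0, 1) yields the confidence of the best-supported justification.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # interprets "or" and "exists"
    times: Callable[[Any, Any], Any]  # interprets "and" and "forall"
    zero: Any
    one: Any

BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)

def evaluate(phi, universe, annot, K, env=None):
    """Semiring value of an NNF formula phi (hypothetical tuple encoding):
    ("lit", positive, relation, vars), ("and", f, g), ("or", f, g),
    ("exists", var, f), ("forall", var, f).
    Literals missing from annot get K.zero, i.e. they are treated as false."""
    env = env or {}
    tag = phi[0]
    if tag == "lit":
        _, positive, rel, vars_ = phi
        return annot.get((positive, rel, tuple(env[v] for v in vars_)), K.zero)
    if tag in ("and", "or"):
        op = K.times if tag == "and" else K.plus
        return op(evaluate(phi[1], universe, annot, K, env),
                  evaluate(phi[2], universe, annot, K, env))
    if tag in ("exists", "forall"):
        _, var, body = phi
        op, unit = (K.plus, K.zero) if tag == "exists" else (K.times, K.one)
        values = (evaluate(body, universe, annot, K, {**env, var: a}) for a in universe)
        return reduce(op, values, unit)
    raise ValueError(f"unknown formula tag: {tag!r}")

# Example: is there a pair of nodes with edges in both directions?
# exists x exists y (E(x, y) and E(y, x))
phi = ("exists", "x", ("exists", "y",
       ("and", ("lit", True, "E", ("x", "y")), ("lit", True, "E", ("y", "x")))))
universe = ["a", "b"]
confidence = {(True, "E", ("a", "b")): 0.9, (True, "E", ("b", "a")): 0.5}
truth = {key: True for key in confidence}

print(evaluate(phi, universe, truth, BOOLEAN))       # ordinary model checking
print(evaluate(phi, universe, confidence, VITERBI))  # best-witness confidence
```

The same formula and the same evaluator answer different questions depending only on which semiring is plugged in, which is the point of the framework described above.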
Joint work with Erich Graedel, RWTH Aachen
About the Speaker: Val Tannen is a professor in the Department of Computer and Information Science of the University of Pennsylvania. He joined Penn after receiving his PhD from the Massachusetts Institute of Technology in 1987. He worked for a time in Programming Languages; his current research interests are in Databases. Moreover, he has always been interested in applications of Logic to Computer Science, and since 1994 he has also worked in Bioinformatics, leading a number of interdisciplinary projects. In Databases, he and his students and collaborators have worked on query language design and on models and systems for query optimization, parallel query processing, and data integration. More recently their work has focused on models and systems for data sharing, data provenance, the management of uncertain information, and algorithmic provisioning for what-if analysis. Tannen has received the 20 year Test-of-Time Award from ICDT and the 10 year Test-of-Time Award from PODS. He is an ACM Fellow.
Dynamic Resource Control and Optimization for Data-Intensive Network Systems
Kevin Liu Ohio State University
This event will be held on Tuesday, from 2:00 - 3:00 p.m. in Stanley Thomas, Room 302. Please note the special weekday and time for this event.
Abstract: Due to the proliferation of smart mobile devices and the Internet of Things (IoT), recent years have witnessed explosive growth in mobile data demand. As a result, today's data network infrastructures are being stretched to their capacity limits. The quest for ever-increasing network capacity has attracted tremendous research interest in developing new data-intensive networking technologies, which are envisioned to form the backbone of the future IoT. However, emerging IoT applications also impose much more stringent performance requirements on throughput, latency, and convergence speed in controlling data network infrastructure.
To this end, in this talk we introduce a new momentum-based approach to network congestion control and scheduling optimization that addresses the above challenges. Based on this momentum-based approach, we develop a cross-layer optimization framework that offers throughput optimality, fast convergence, and significant delay reduction. Further, we show that the proposed approach offers an elegant three-way trade-off among throughput, delay, and convergence, which is achieved under a simple, near index-type policy with two control degrees of freedom. Our work opens the door to an unexplored paradigm for network congestion control and scheduling optimization that leverages advanced techniques based on "memory/momentum" information for data-intensive networking.
About the Speaker: Jia (Kevin) Liu received his Ph.D. degree in the Bradley Department of Electrical and Computer Engineering at Virginia Tech, Blacksburg, VA in 2010. He joined the Ohio State University as a postdoctoral researcher afterwards. He is currently a Research Assistant Professor in the Department of Electrical and Computer Engineering at the Ohio State University. His research areas include theoretical foundations of control and optimization for network systems, distributed algorithms design, and Internet-of-things security. Dr. Liu is a senior member of IEEE. His work has received numerous awards at top venues, including IEEE INFOCOM 2016 Best Paper Award, IEEE INFOCOM 2013 Best Paper Runner-up Award, IEEE INFOCOM 2011 Best Paper Runner-up Award, and IEEE ICC 2008 Best Paper Award. He is a recipient of the Bell Labs President Gold Award in 2001 and Chinese Government Award for Outstanding Ph.D. Students Abroad in 2008. He is currently the Sole PI of two active NSF grants on Massive MIMO networking and low-delay and fast-convergence stochastic network optimization. His research is also funded by AFOSR, AFRL, and ONR.
A Denotational Semantics for Concurrent Separation Logic
Stephen Brookes Carnegie Mellon University
This event will be held on Tuesday, 3/21/2017, from 3:30 - 4:30 p.m. in Boggs 242. Please note the special weekday and time for this event.
Abstract: We discuss the general aims of denotational semantics, in giving support to language definition and program analysis, and as a basis for automated tools and for logics of program correctness. Concurrent execution, typical of modern multicore and multiprocessor hardware, makes it all the more difficult to design correct programs, because of the potential for interference between threads, such as race conditions involving simultaneous attempts to update the same piece of shared state. We give a (mostly informal) account of the development of, and the ideas behind, Concurrent Separation Logic, a logic which has had substantial impact in both theory and practice. A denotational semantic model is used to prove soundness of this logic, showing that every program provable in CSL is race-free. This illustrates the vital foundational role to be played by semantics in general, in formalizing aspects of program behavior and in validating techniques for proving correctness. The talk should be accessible even for those without formal grounding in logic or semantics.
About the Speaker: Stephen Brookes is Professor of Computer Science at Carnegie Mellon University. He works in concurrency and in the semantics of programming languages. He received his DPhil at Oxford in 1983 under the direction of C. A. R. Hoare; the next year the seminal paper, A Theory of Communicating Sequential Processes, jointly authored by Brookes, Hoare and Roscoe, appeared in the Journal of the ACM. Professor Brookes went to CMU in 1984 to join the semantics group, led by Dana Scott and John Reynolds. Professor Brookes’ research has touched on a number of important areas, including shared variable concurrency, the full abstraction problem for PCF, and recently weak memory. Professor Brookes shared the 2016 Goedel Prize with Peter O’Hearn for their invention of Concurrent Separation Logic.
Similarity Learning in the Era of Big Data
Shiyu Chang IBM T. J. Watson Research Center
THIS EVENT HAS BEEN CANCELED.
Abstract: The notion of machines that can learn has captured imaginations since the days of the early computer. In recent years, as we face burgeoning amounts of data that no human mind can process, machines that can learn to automatically find insights in such vast amounts of data have become a growing necessity. The field of machine learning is a modern marriage between computer science and statistics, driven by tremendous industrial demand. At the heart of many applications is so-called “similarity learning”. Learning similarities is often used as a subroutine in important data mining and machine learning tasks. For example, recommender systems use the learned metric to measure the relevance of candidate items to target users. Applications of this approach also exist in clinical decision support, search, and retrieval. However, the three-V (volume, variety, and velocity) nature of big data poses new challenges for learning similarity for pattern discovery and data analysis. How can we reveal the truth from massive unlabeled data? How can we handle multimodal data? What if the data contain network structures? Do temporal dynamics affect the process of decision-making? For example, in clinical decision making, doctors retrieve the most similar clinical pathway to aid diagnosis. However, the sheer volume and complexity of the data present major barriers to translating them into effective clinical actions. In this talk, I will illustrate some of these challenges with examples from my work on the foundations of similarity learning. I will show that with judicious design together with rigorous mathematics for learning similarities, we are able to make various kinds of impact on society and uncover surprising natural and social phenomena.
About the Speaker: Shiyu Chang is a Research Staff Member at IBM Thomas J. Watson Research Center. He recently obtained his Ph.D. from the University of Illinois at Urbana-Champaign (UIUC) under the supervision of Prof. Thomas S. Huang. Shiyu has a wide range of research interests in large-scale data exploration and analytics. Specifically, his current research directions lie in developing novel machine learning algorithms to solve complex computational tasks in the real world. Shiyu received his B.S. degree at UIUC in 2011 with the highest university honor (Bronze Tablet Award). He graduated from the Department of Electrical and Computer Engineering at UIUC and obtained his M.S. degree in 2014. He is a recipient of the Thomas and Margaret Huang Award in 2016 and the Kodak Fellowship Award in 2014. Most of Shiyu’s research has been published in top data mining, computer vision and artificial intelligence venues including SIGKDD, WWW, CVPR, WSDM, ICDM, SDM, and IJCAI. The paper “Factorized Similarity Learning in Networks” was selected as the best student paper at ICDM 2014.