Sergey Levine (UC Berkeley) and Chelsea Finn (UC Berkeley)
Deep learning methods, which combine high-capacity neural network models with simple and scalable training algorithms, have made a tremendous impact across a range of supervised learning domains, including computer vision, speech recognition, and natural language processing. This success has been enabled by the ability of deep networks to capture complex, high-dimensional functions and learn flexible distributed representations. Can this capability be brought to bear on real-world decision making and control problems, where the machine must not only classify complex sensory patterns, but choose actions and reason about their long-term consequences?
Decision making and control problems lack the close supervision present in more classic deep learning applications, and they present a number of challenges that necessitate new algorithmic developments. In this tutorial, we will cover the foundational theory of reinforcement learning and optimal control as it relates to deep reinforcement learning. We will also discuss a number of recent results on extending deep learning into decision making and control, including model-based algorithms, imitation learning, and inverse reinforcement learning, and explore the frontiers and limitations of current deep reinforcement learning algorithms.
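The following is a minimal sketch of value iteration, one classical dynamic-programming algorithm from the foundations of reinforcement learning and optimal control. It shows what reasoning about long-term consequences means in practice. The three-state MDP, its rewards, and the discount factor `gamma` are hypothetical and chosen only for illustration. This is not code from the tutorial.

```python
# Hypothetical 3-state chain MDP: transitions[state][action] is a list of
# (probability, next_state, reward) triples. State 2 is absorbing.
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},
}


def q_value(transitions, values, state, action, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[state][action])


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            best = max(q_value(transitions, values, s, a, gamma) for a in transitions[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(transitions[s], key=lambda a: q_value(transitions, values, s, a, gamma))
        for s in transitions
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS)
print(values, policy)
```

State 0 earns no immediate reward. Its value is still positive because moving right leads, one discounted step later, to the rewarding transition out of state 1.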
Alex Smola (AWS) and Aran Khanna (AWS)
We present MXNet Gluon, an easy-to-use tool for designing a wide range of networks, from image processing (LeNet, Inception, etc.) to advanced NLP (TreeLSTM). It combines the convenience of imperative frameworks (PyTorch, Torch, Chainer) with efficient symbolic execution (TensorFlow, CNTK).
The tutorial covers the following topics: basic distributed linear algebra with NDArray, automatic differentiation of code, and designing networks from scratch (and using Gluon). Subsequently, we cover convenience and efficiency features such as automagic shape inference, deferred initialization and lazy evaluation, and hybridization of compute graphs. We then discuss structured architectures such as TreeLSTMs, which are key for natural language processing. We conclude by showing how to perform parallel and distributed training on multiple GPUs and multiple machines. For Jupyter notebooks and details, see http://gluon.mxnet.io and https://github.com/zackchase/mxnet-the-straight-dope
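Automatic differentiation is what makes imperative code trainable. MXNet exposes it through its `autograd` module: operations are recorded inside `autograd.record()`, and gradients are computed with `.backward()`. The sketch below builds the underlying idea from scratch, reverse-mode differentiation over a graph of scalar operations recorded as ordinary code runs. It is not MXNet's implementation, and the `Var` class is hypothetical.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical, for illustration)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (input Var, d(self)/d(input)) pairs
        self.grad = 0.0

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Var) else Var(x)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self):
        """Reverse-mode pass: propagate d(self)/d(node) from the output to every input."""
        order, seen = [], set()

        def visit(node):  # post-order DFS: inputs are appended before their consumers
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # every consumer is handled before its inputs
            for parent, local_grad in node.parents:
                parent.grad += local_grad * node.grad  # chain rule, summed over paths


x = Var(3.0)
y = x * x + 2 * x  # the graph is recorded as ordinary Python code runs
y.backward()
print(y.value, x.grad)  # 15.0 8.0, since dy/dx = 2x + 2
```

Each operation stores its local derivative with respect to its inputs, and `backward` applies the chain rule in reverse topological order. Gradients accumulate, so call `backward` once per freshly built graph.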
Ankur Moitra (MIT)
In every corner of machine learning and statistics, there is a need for estimators that work not just in an idealized model but even when their assumptions are violated. It turns out that being provably robust and being efficiently computable are often at odds with each other. Even in the most basic settings, such as robustly estimating the mean and covariance, until recently the only known estimators were either hard to compute or could tolerate only a negligible fraction of errors in high-dimensional applications.
In this tutorial, we will survey the exciting recent progress in algorithmic robust statistics. We will give the first provably robust and efficiently computable estimators for several fundamental questions that were thought to be hard, and explain the main insights behind them. We will also give practical applications to exploratory data analysis. Finally, we will raise some philosophical questions about robustness. It is standard to compare algorithms (especially those with provable guarantees) in terms of their running time and sample complexity. But what frameworks can be used to explore their robustness?
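The following sketch illustrates why robust estimators need more than the empirical mean. It follows the spirit of spectral filtering, one idea from this line of work: repeatedly look for a direction in which the empirical covariance is suspiciously large, and discard the point that deviates most along it. The sketch makes several simplifying assumptions:

- the inliers have identity covariance;
- the `threshold` of 1.5 is hypothetical;
- one point is removed per round.

The provable algorithms calibrate their thresholds and removal rules carefully, so treat this as a toy rather than one of the estimators from the tutorial.

```python
import random


def mean(points):
    """Coordinate-wise empirical mean of a list of equal-length vectors."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def projection(p, mu, v):
    """Signed deviation of point p from mu along unit direction v."""
    return sum((x - m) * vj for x, m, vj in zip(p, mu, v))


def top_direction(points, mu, iters=50):
    """Power iteration for the top eigenvector and eigenvalue of the empirical covariance."""
    d, n = len(mu), len(points)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    v, lam = [1.0 / d ** 0.5] * d, 0.0
    for _ in range(iters):
        # Cov @ v = (1/n) * sum_i (c_i . v) c_i, computed without forming the matrix.
        w = [0.0] * d
        for c in centered:
            proj = sum(ci * vi for ci, vi in zip(c, v))
            for j in range(d):
                w[j] += proj * c[j] / n
        lam = sum(wj * wj for wj in w) ** 0.5  # ||Cov @ v|| for the current unit v
        if lam == 0.0:
            break
        v = [wj / lam for wj in w]
    return v, lam


def filtered_mean(points, threshold=1.5):
    """Toy filter, assuming inliers have identity covariance (threshold is hypothetical)."""
    pts = list(points)
    for _ in range(len(pts) // 2):  # never discard more than half of the data
        mu = mean(pts)
        v, lam = top_direction(pts, mu)
        if lam <= threshold:  # no direction has suspiciously large variance
            break
        # Remove the point that deviates most along the suspicious direction.
        worst = max(range(len(pts)), key=lambda i: abs(projection(pts[i], mu, v)))
        pts.pop(worst)
    return mean(pts)


def norm(u):
    return sum(x * x for x in u) ** 0.5


rng = random.Random(0)
d = 5
inliers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(285)]
outliers = [[6.0] * d for _ in range(15)]  # 5% of the data, all in one far-away spot
data = inliers + outliers

naive, robust = mean(data), filtered_mean(data)
# The inliers' true mean is the origin, so each norm is an estimation error.
print(round(norm(naive), 3), round(norm(robust), 3))
```

The outliers inflate the variance along one direction. That is what lets the filter find and remove them, while the plain mean is dragged toward them.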
Alekh Agarwal (Microsoft Research) and John Langford (Microsoft Research)
This is a tutorial about real-world use of interactive and online learning. We focus on systems for practical applications ranging from recommendation tasks and ad display to clinical trials and adaptive decision making in computer systems. Although machine learning offers a substantial body of foundational theory and algorithms, practical use is fraught with challenges. Success in interactive learning requires a complete learning system that supports the core algorithm by handling exploration, data flow, logging, and real-time updating.
Each potential application also comes with multiple design choices and often does not fit the theoretical setting as-is. We cover both the foundational principles that have proved practically essential and recipes for success drawn from practical experience. After the tutorial, participants should have both a firm understanding of the foundations and the practical ability to deploy and start using such a system within an hour.
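The sketch below shows two of the system components mentioned above, exploration and logging, and why they matter. If every logged decision records the probability with which its action was chosen, a different policy can later be evaluated offline with inverse propensity scoring (IPS). The epsilon-greedy exploration, the two-context reward function, and both policies are hypothetical choices for illustration. They do not describe any particular deployed system.

```python
import random

ACTIONS = ["a", "b", "c"]


def reward(context, action):
    """Hypothetical environment: action "b" is best in context 1, "a" in context 0."""
    best = "b" if context == 1 else "a"
    return 1.0 if action == best else 0.0


def always_a(context):
    return "a"


def context_aware(context):
    return "b" if context == 1 else "a"


def explore(context, policy, epsilon, rng):
    """Epsilon-greedy: return the chosen action and the probability of choosing it."""
    greedy = policy(context)
    action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy
    prob = epsilon / len(ACTIONS) + ((1.0 - epsilon) if action == greedy else 0.0)
    return action, prob


def collect_log(n, policy, epsilon, seed=0):
    """Run the exploring policy and log (context, action, reward, propensity) tuples."""
    rng = random.Random(seed)
    log = []
    for _ in range(n):
        context = rng.randint(0, 1)
        action, prob = explore(context, policy, epsilon, rng)
        log.append((context, action, reward(context, action), prob))
    return log


def ips_estimate(log, target_policy):
    """Inverse propensity scoring estimate of a deterministic policy's average reward."""
    total = sum(r / prob for context, action, r, prob in log if target_policy(context) == action)
    return total / len(log)


log = collect_log(20000, always_a, epsilon=0.2)
value_logging = ips_estimate(log, always_a)
value_candidate = ips_estimate(log, context_aware)
print(round(value_logging, 3), round(value_candidate, 3))
```

The candidate policy is never run, yet its IPS estimate should land close to its true value of 1.0, while the logging policy's true value is 0.5. This works because rewards for rarely chosen actions are up-weighted by `1 / prob`. Without the logged propensities, this offline comparison would not be possible.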
Zeyuan Allen-Zhu (Microsoft Research)
In this tutorial, we will provide an accessible and extensive overview of recent advances in optimization methods based on stochastic gradient descent (SGD), for both convex and non-convex tasks. In particular, this tutorial will try to answer the following questions, with theoretical support. How can we properly use momentum to speed up SGD? What is the maximum parallel speedup we can achieve for SGD? When should we use a dual or primal-dual approach instead of SGD? What is the difference between coordinate descent (e.g., SDCA) and SGD? How does variance reduction affect the performance of SGD? Why does second-order information help improve the convergence of SGD?
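One of the questions above, how variance reduction changes SGD, can be seen on a toy problem. This sketch compares plain constant-step SGD with SVRG (stochastic variance-reduced gradient), one well-known variance-reduction method, on a hypothetical one-dimensional least-squares problem. The data, step size, and epoch counts are arbitrary illustration choices, not settings from the tutorial.

```python
import random


def make_data(n=200, seed=0):
    """Hypothetical 1-D regression data with slope 3 and unit Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def grad_i(w, x, y):
    """Gradient of the per-example loss 0.5 * (x * w - y) ** 2."""
    return x * (x * w - y)


def sgd(xs, ys, lr=0.05, epochs=30, seed=1):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs * len(xs)):
        i = rng.randrange(len(xs))
        w -= lr * grad_i(w, xs[i], ys[i])
    return w


def svrg(xs, ys, lr=0.05, epochs=30, seed=1):
    rng = random.Random(seed)
    n, w = len(xs), 0.0
    for _ in range(epochs):
        snapshot = w
        full_grad = sum(grad_i(snapshot, x, y) for x, y in zip(xs, ys)) / n
        for _ in range(n):
            i = rng.randrange(n)
            # Unbiased estimate of the full gradient at w whose variance shrinks
            # as w and the snapshot both approach the optimum.
            g = grad_i(w, xs[i], ys[i]) - grad_i(snapshot, xs[i], ys[i]) + full_grad
            w -= lr * g
    return w


xs, ys = make_data()
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)  # exact minimizer
w_sgd, w_svrg = sgd(xs, ys), svrg(xs, ys)
print(abs(w_sgd - w_star), abs(w_svrg - w_star))
```

With a constant step size, plain SGD keeps fluctuating around `w_star`, because individual gradients do not vanish there. SVRG's corrected gradient `grad_i(w) - grad_i(snapshot) + full_grad` has a variance that shrinks as both points approach the optimum, so SVRG converges to `w_star` up to floating-point error.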
Machine Learning for Autonomous Vehicles
Raquel Urtasun (University of Toronto), Drew Gray (Uber), and Carl Wellington (Uber ATG)
Drew and Carl presenting
The tutorial will cover core machine learning topics for self-driving cars. The objectives are (1) to issue a call to arms to researchers and practitioners to tackle the pressing challenges of autonomous driving, and (2) to equip participants with enough background to attend the companion workshop on ML for autonomous vehicles. Machine learning holds the key to solving autonomous driving. Despite recent advances, major problems remain far from solved, in terms of both fundamental research and engineering.
Oriol Vinyals (Google DeepMind) and Navdeep Jaitly (Google Brain)
Sequence-to-Sequence (Seq2Seq) learning was introduced in 2014 and has since been extensively studied and extended to a wide variety of domains. Seq2Seq yields state-of-the-art performance on several applications, such as machine translation, image captioning, speech generation, and summarization. In this tutorial, we will survey the basics of this framework, its applications, main algorithmic techniques, and future research directions.
Been Kim (presenter, Google Brain) and Finale Doshi-Velez (Harvard)
As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanations for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is little consensus on what interpretable machine learning is and how it should be measured. In this talk, we first suggest a definition of interpretability and describe when interpretability is needed (and when it is not). Then we will review related work, ranging from classical AI systems to recent efforts on interpretability in deep learning. Next, we will present a taxonomy for rigorous evaluation and offer recommendations for researchers. We will end by discussing open questions and concrete problems for new researchers.
Yan Liu (USC) and Jimeng Sun (Georgia Tech)
It is widely believed that deep learning and artificial intelligence techniques will fundamentally change the health care industry. Recent developments in deep learning have achieved success in many applications, such as computer vision, natural language processing, and speech recognition. Even so, health care applications pose many challenges to existing deep learning models that differ significantly from those in these domains. Examples include, but are not limited to, interpretation of predictions, heterogeneity in data, missing values, multi-rate multi-resolution data, big and small data, and privacy issues.
In this tutorial, we will discuss a series of problems in health care that can benefit from deep learning models, the challenges involved, and recent advances in addressing them. We will also include data sets and demos of working systems.