Highlights and impressions from the NIPS conference on machine learning
This year’s NIPS was an epicenter of the current enthusiasm about AI and deep learning – there was a visceral sense of how quickly the field of machine learning is progressing, and two new AI startups were announced. Attendance has almost doubled compared to the 2014 conference (I hope they make it multi-track next year), and several popular workshops were standing room only. Given that there were only 400 accepted papers and almost 4000 people attending, most people were there to learn and socialize. The conference was a socially intense experience that reminded me a bit of Burning Man – the overall sense of excitement, the high density of spontaneous interesting conversations, the number of parallel events at any given time, and of course the accumulating exhaustion.
Some interesting talks and posters
Sergey Levine’s robotics demo at the crowded Deep Reinforcement Learning workshop (we showed up half an hour early to claim spots on the floor). This was one of the talks that gave me a sense of fast progress in the field. The presentation started with videos from this summer’s DARPA robotics challenge, where the robots kept falling down while trying to walk or open a door. Levine proceeded to outline his recent work on guided policy search, alternating between trajectory optimization and supervised training of the neural network, and granularizing complex tasks. He showed demos of robots successfully performing various high-dexterity tasks, like opening a door, screwing on a bottle cap, or putting a coat hanger on a rack. Impressive!
Generative image models using a pyramid of adversarial networks by Denton & Chintala. Generating realistic-looking images using one neural net as a generator and another as an evaluator – the generator tries to fool the evaluator by making the image indistinguishable from a real one, while the evaluator tries to tell real and generated images apart. Starting from a coarse image, successively finer images are generated using the adversarial networks from the coarser images at the previous level of the pyramid. The resulting images were mistaken for real images 40% of the time in the experiment, and around 80% of them looked realistic to me when staring at the poster.
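To make the coarse-to-fine idea concrete, here is a minimal sketch of the pyramid structure only. It is not the authors' code: it uses 1D lists instead of images, and the per-level "generators" are hypothetical stand-in functions (in the actual work these would be trained adversarial networks that propose the fine detail given the upsampled coarser image). The names `downsample`, `upsample`, `build_pyramid` and `generate` are my own.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

def upsample(signal):
    """Double the resolution by repeating each value."""
    return [v for v in signal for _ in range(2)]

def build_pyramid(signal, levels):
    """Split a signal into a coarsest version plus per-level detail residuals."""
    residuals = []  # stored from finest to coarsest
    current = signal
    for _ in range(levels):
        coarse = downsample(current)
        residuals.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return current, residuals

def generate(coarsest, detail_fns):
    """Go from coarse to fine: upsample, then add the detail proposed for that level.

    Each entry of detail_fns stands in for a generator network (an assumption
    for illustration); it sees the upsampled coarse signal and returns a residual.
    """
    current = coarsest
    for detail_fn in detail_fns:
        up = upsample(current)
        current = [a + b for a, b in zip(up, detail_fn(up))]
    return current

signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
coarsest, residuals = build_pyramid(signal, levels=2)

# "Oracle" generators that replay the true residuals, coarsest level first.
oracle_fns = [lambda up, r=r: r for r in reversed(residuals)]
reconstructed = generate(coarsest, oracle_fns)
print(coarsest, reconstructed)
```

With the oracle residuals the original signal is recovered exactly, which is what makes the decomposition a sensible target: each generator only has to learn the detail missing at its own scale.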
Path-SGD by Neyshabur, Salakhutdinov and Srebro, a scale-invariant version of the stochastic gradient descent algorithm. Standard SGD uses the L2 norm as the measure of distance in the parameter space, and rescaling the weights can have large effects on optimization speed. Path-SGD instead regularizes the maximum norm of incoming weights into any unit, minimizing the max-norm over all rescalings of the weights. The resulting norm (called a “path regularizer”) is shown to be invariant to weight rescaling. Overall a principled approach with good empirical results.
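The rescaling invariance is easy to check numerically. The sketch below is my own illustration rather than the paper's code: it assumes a tiny two-layer ReLU network with a scalar output, made-up weights, and the squared L2 form of the path regularizer (as I understand it, the sum over input-to-output paths of the product of squared weights). Scaling the incoming weights of one hidden unit by c and its outgoing weight by 1/c leaves the network function and the path norm unchanged, while the ordinary L2 norm changes a lot.

```python
import math

def forward(W1, w2, x):
    """Two-layer ReLU network with scalar output: w2 . relu(W1 x)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(v * h for v, h in zip(w2, hidden))

def l2_norm_sq(W1, w2):
    """Ordinary squared L2 norm of all weights."""
    return sum(w * w for row in W1 for w in row) + sum(v * v for v in w2)

def path_norm_sq(W1, w2):
    """Sum over input -> hidden -> output paths of the product of squared weights."""
    return sum(v * v * sum(w * w for w in row) for row, v in zip(W1, w2))

def rescale_unit(W1, w2, j, c):
    """Scale incoming weights of hidden unit j by c > 0 and its outgoing weight by 1/c."""
    W1s = [[w * c for w in row] if i == j else list(row) for i, row in enumerate(W1)]
    w2s = [v / c if i == j else v for i, v in enumerate(w2)]
    return W1s, w2s

# Made-up weights and input, purely for illustration.
W1 = [[0.5, -1.0], [2.0, 0.25]]
w2 = [1.5, -0.5]
x = [1.0, -2.0]

W1s, w2s = rescale_unit(W1, w2, j=0, c=10.0)

print("output:   ", forward(W1, w2, x), forward(W1s, w2s, x))
print("path norm:", path_norm_sq(W1, w2), path_norm_sq(W1s, w2s))
print("L2 norm:  ", l2_norm_sq(W1, w2), l2_norm_sq(W1s, w2s))
```

Since the two weight settings compute the same function, a measure of distance that treats them so differently (as L2 does) is a somewhat arbitrary choice, which is the motivation for using the path-based one.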
End-to-end memory networks by Sukhbaatar et al (video), an extension of memory networks – neural networks that learn to read and write to a memory component. Unlike traditional memory networks, the end-to-end version eliminates the need for supervision at each layer. This makes the method applicable to a wider variety of domains – it is competitive both with memory networks for question answering and with LSTMs for language modeling. It was fun to see the model perform basic inductive reasoning about locations, colors and sizes of objects.
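For intuition, here is a rough sketch of a single memory read (a "hop") as I understand it from the paper: the query is compared with every memory slot, a softmax turns the scores into attention weights, and the read-out is a weighted sum of a second set of memory vectors. Because the read is soft rather than a hard selection, gradients can flow through it, so the model can be trained from the final answer alone. The vectors below are made up, and the embedding matrices that would produce them from text are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def memory_hop(u, mem_in, mem_out):
    """One soft read from memory.

    u       : query vector
    mem_in  : vectors used to score each memory slot against the query
    mem_out : vectors that are averaged (by attention weight) into the read-out
    Returns the updated query (u + read-out) and the attention weights.
    """
    p = softmax([dot(u, m) for m in mem_in])
    o = [sum(p_i * c[k] for p_i, c in zip(p, mem_out)) for k in range(len(u))]
    return [u_k + o_k for u_k, o_k in zip(u, o)], p

# Made-up 2-dimensional memories with three slots.
mem_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
mem_out = [[0.0, 2.0], [3.0, 0.0], [0.0, -2.0]]
u = [2.0, 0.0]

u_next, attention = memory_hop(u, mem_in, mem_out)
print(attention, u_next)
```

Stacking several such hops, with the updated query fed into the next one, gives the multi-layer version; the query here lines up best with the first slot, so that slot gets most of the attention.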
Algorithms Among Us symposium (videos)
A highlight of the conference was the Algorithms Among Us symposium on the societal impacts of machine learning, which I helped organize along with others from FLI. The symposium consisted of 3 panels and accompanying talks – on near-term AI impacts, timelines to general AI, and research priorities for beneficial AI. The symposium organizers (Adrian Weller, Michael Osborne and Murray Shanahan) gathered an impressive array of AI luminaries with a variety of views on the subject, including Cynthia Dwork from Microsoft, Yann LeCun from Facebook, Andrew Ng from Baidu, and Shane Legg from DeepMind. All three panel topics generated lively debate among the participants.
Andrew Ng took his famous statement that “worrying about general AI is like worrying about overpopulation on Mars” to the next level, namely “overpopulation on Alpha Centauri” (is Mars too realistic these days?). But he also endorsed long-term AI safety research, saying that it’s not his cup of tea but someone should be working on it. Ng’s main argument was that even superforecasters can’t predict anything 5 years into the future, so any predictions on longer time horizons are useless. However, as Murray pointed out, having complete uncertainty past a 5-year horizon means that you can’t rule out reaching general AI in 20 years either.
With regard to roadmapping the remaining milestones to general AI, Yann LeCun gave an apt analogy of traveling through mountains in the fog – there are some peaks you can see, and an unknown number hiding in the fog. He also argued that advanced AI is unlikely to be human-like, and cautioned against anthropomorphizing it.
In the research priorities panel, Shane Legg gave some specific recommendations – goal-system stability, interruptibility, sandboxing / containment, and formalization of various thought experiments (e.g. in Superintelligence). He pointed out that AI safety is both overblown and underemphasized – while the risks from advanced AI are not imminent the way they are usually portrayed in the media, more thought and resources need to be devoted to the challenging research problems involved.
One question that came up during the symposium was the importance of interpretability for AI systems, which is actually the topic of my current research project. There was some disagreement about the tradeoff between effectiveness and interpretability. LeCun thought that the main advantage of interpretability is increased robustness, and that improvements to transfer learning should produce that anyway, without decreases in effectiveness. Percy Liang argued that transparency is needed to explain to the rest of the world what machine learning systems are doing, which is increasingly important in many applications. LeCun also pointed out that machine learning systems that are usually considered transparent, such as decision trees, aren’t necessarily so. There was also disagreement about what interpretability means in the first place – as Cynthia Dwork said, we need a clearer definition before drawing any conclusions. It seems that more work is needed both on defining interpretability and on figuring out how to achieve it without sacrificing effectiveness.
Overall, the symposium was super interesting and gave a lot of food for thought (here’s a more detailed summary by Ariel from FLI). Thanks to Adrian, Michael and Murray for their hard work in putting it together.
It was exciting to see two new AI startups announced at NIPS – OpenAI, led by Ilya Sutskever and backed by Musk, Altman and others, and Geometric Intelligence, led by Zoubin Ghahramani and Gary Marcus.
OpenAI is a non-profit with a mission to democratize AI research and keep it beneficial for humanity, and a whopping $1Bn in funding pledged. They believe that it’s safer to have AI breakthroughs happening in a non-profit, unaffected by financial interests, rather than monopolized by for-profit corporations. The intent to open-source the research seems clearly good in the short and medium term, but raises some concerns in the long run when getting closer to general AI. As an OpenAI researcher emphasized in an interview, “we are not obligated to share everything – in that sense the name of the company is a misnomer”, and decisions to open-source the research would in fact be made on a case-by-case basis.
While OpenAI plans to focus on deep learning in their first few years, Geometric Intelligence is developing an alternative approach to deep learning that can learn more effectively from less data. Gary Marcus argues that we need to learn more from how human minds acquire knowledge in order to build advanced AI (an inspiration for the venture was observing his toddler learn about the world). I’m looking forward to what comes out of the variety of approaches taken by these new companies and other research teams.
(Thanks to Janos Kramar for his help with editing this post.)