Toward fully automated speech recognition

Speech recognition systems, such as those that convert speech to text on cellphones, are generally the result of machine learning. A computer pores through thousands or even millions of audio files and their transcriptions, and learns which acoustic features correspond to which typed words.

But transcribing recordings is costly, time-consuming work, which has limited speech recognition to a small subset of languages spoken in wealthy nations.

At the Neural Information Processing Systems conference this week, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new approach to training speech-recognition systems that doesn’t depend on transcription. Instead, their system analyzes correspondences between images and spoken descriptions of those images, as captured in a large collection of audio recordings. The system then learns which acoustic features of the recordings correlate with which image characteristics.
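The paper trains neural networks on pairs of images and spoken captions. The sketch below is a loose illustration in PyTorch, not the authors' architecture: two small convolutional encoders (the layer sizes and margin are invented for the example) map images and spectrograms of spoken descriptions into a shared embedding space, and a margin ranking loss, a standard choice for this kind of cross-modal training, pushes matching pairs above mismatched ones within each batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an input (RGB image or 1-channel spectrogram) to a unit-length embedding."""
    def __init__(self, in_channels, dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # pool to a single 128-d vector
        )
        self.proj = nn.Linear(128, dim)

    def forward(self, x):
        return F.normalize(self.proj(self.conv(x).flatten(1)), dim=-1)

image_encoder = Encoder(in_channels=3)    # RGB images
audio_encoder = Encoder(in_channels=1)    # log-mel spectrograms of speech

def ranking_loss(audio_emb, image_emb, margin=0.2):
    """True (utterance, image) pairs sit on the diagonal; each should
    outscore every mismatched pair in its row and column by `margin`."""
    scores = audio_emb @ image_emb.T                  # cosine similarities
    pos = scores.diag()                               # matching-pair scores
    off = ~torch.eye(len(scores), dtype=torch.bool, device=scores.device)
    a2i = (margin + scores - pos.unsqueeze(1)).clamp(min=0)[off]
    i2a = (margin + scores - pos.unsqueeze(0)).clamp(min=0)[off]
    return a2i.mean() + i2a.mean()
```

Once trained this way, matching an utterance to images reduces to a nearest-neighbor search in the shared space, which is what the retrieval experiment described below measures.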

“The goal of this work is to try to get the machine to learn language more like the way humans do,” says Jim Glass, a senior research scientist at CSAIL and a co-author on the paper describing the new system. “The current methods that people use to train up speech recognizers are very supervised. You get an utterance, and you’re told what’s said. And you do this for a large body of data.

“Big advances have been made — Siri, Google — but it’s expensive to get those annotations, and people have thus focused on, really, the major languages of the world. There are 7,000 languages, and I think less than 2 percent have ASR [automatic speech recognition] capability, and probably nothing is going to be done to address the others. So if you’re trying to think about how technology can be beneficial for society at large, it’s interesting to think about what we need to do to change the current situation. And the approach we’ve been taking through the years is looking at what we can learn with less supervision.”

Joining Glass on the paper are first author David Harwath, a graduate student in electrical engineering and computer science (EECS) at MIT; and Antonio Torralba, an EECS professor.

Visual semantics

The version of the system reported in the new paper doesn’t correlate recorded speech with written text; instead, it correlates speech with groups of thematically related images. But that correlation could serve as the basis for others.

If, for instance, an utterance is associated with a particular class of images, and the images have text terms associated with them, it should be possible to find a likely transcription of the utterance, all without human intervention. Similarly, a class of images with associated text terms in different languages could provide a way to do automatic translation.
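As a toy illustration of that lookup (everything here is invented: the cluster embeddings, the tag lists, and the helper name), an utterance embedding from a trained model could be mapped to candidate words, or to words in another language, by reading off the text terms of its nearest image clusters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real data: unit-length embeddings for image clusters,
# and text terms attached to some clusters (e.g. from captioned web images).
cluster_embs = rng.standard_normal((50, 512))
cluster_embs /= np.linalg.norm(cluster_embs, axis=1, keepdims=True)
cluster_tags = {0: ["storm", "clouds"], 1: ["dog", "beach"]}  # etc.

def guess_words(audio_emb, k=3):
    """Text terms of the k clusters most similar to the utterance embedding;
    with terms in another language, the same lookup gives a rough translation."""
    sims = cluster_embs @ audio_emb
    best = np.argsort(sims)[::-1][:k]
    return [t for c in best if c in cluster_tags for t in cluster_tags[c]]
```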

Conversely, text terms associated with similar clusters of images, such as, say, “storm” and “clouds,” could be inferred to have related meanings. Because the system in some sense learns words’ meanings — the images associated with them — and not just their sounds, it has a wider range of potential applications than a standard speech recognition system.

To test their system, the researchers used a database of 1,000 images, each of which had a recording of a free-form verbal description associated with it. They would feed their system one of the recordings and ask it to retrieve the 10 images that best matched it. That set of 10 images would contain the correct one 31 percent of the time.
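That figure is a standard retrieval metric, recall at 10. A minimal way to compute it, assuming paired unit-length embeddings indexed so that utterance i matches image i:

```python
import numpy as np

def recall_at_k(audio_embs, image_embs, k=10):
    """Fraction of utterances whose true image (same index) lands in the
    top-k images ranked by similarity; the setup above scored 31% at k=10."""
    sims = audio_embs @ image_embs.T              # (n, n) similarity scores
    topk = np.argsort(-sims, axis=1)[:, :k]       # indices of the k best images
    truth = np.arange(len(sims))[:, None]
    return (topk == truth).any(axis=1).mean()
```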

Four from CSAIL named 2016 ACM fellows

This week the Association for Computing Machinery (ACM) announced its 2016 fellows, who include four principal investigators from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL): professors Erik Demaine, Fredo Durand, William Freeman, and Daniel Jackson. They were among the 1 percent of ACM members to receive the distinction.

“Erik, Fredo, Bill, and Daniel are wonderful colleagues and extraordinary computer scientists, and I am so happy to see their contributions recognized with the most prestigious member grade of the ACM,” says CSAIL Director Daniela Rus, who herself was named a fellow last year. “All of us at CSAIL are very proud of these researchers for receiving these esteemed honors.”

ACM’s 53 fellows for 2016 were named for their distinctive contributions spanning such computer science disciplines as computer vision, computer graphics, software design, machine learning, algorithms, and theoretical computer science.

“As nearly 100,000 computing professionals are members of our association, to be selected to join the top 1 percent is truly an honor,” says ACM President Vicki L. Hanson. “Fellows are chosen by their peers and hail from leading universities, corporations and research labs throughout the world. Their inspiration, insights and dedication bring immeasurable benefits that improve lives and help drive the global economy.”

Demaine was selected for contributions to geometric computing, data structures, and graph algorithms. His research interests range from the geometry of how proteins fold to the computational difficulty of playing games. He received a MacArthur Fellowship for his work in computational geometry. He and his father, Martin Demaine, have produced numerous curved-crease sculptures that explore the intersection of science and art, and that are currently in the Museum of Modern Art in New York.

A Department of Electrical Engineering and Computer Science (EECS) professor whose research spans video, graphics, and photo generation, Durand was selected for contributions to computational photography and computer graphics rendering. He also develops new algorithms that enable image enhancement and improved scene understanding. He received the ACM SIGGRAPH Computer Graphics Achievement Award in 2016.

Freeman is the Thomas and Gerd Perkins Professor of EECS at MIT. He was selected as a fellow for his contributions to computer vision, machine learning, and computer graphics. His research interests also include Bayesian models of visual perception and computational photography. He received “Outstanding Paper” awards at computer vision and machine learning conferences in 1997, 2006, 2009 and 2012, as well as ACM’s “Test of Time” awards for papers from 1990 and 1995.

Jackson is an EECS professor and associate director of CSAIL whose work has focused on improving the functionality and dependability of software through lightweight formal methods. He was selected by ACM for contributions to software modeling and the creation of Alloy, a modeling language that has been used to find flaws in many designs and protocols. He is a MacVicar Fellow and also received this year’s ACM SIGSOFT Impact Paper Award.