Category Archives: Internet

A nightmare on Ames Street

“People are afraid of artificial intelligence, from autonomous cars making unethical decisions in accidents, to robots taking our jobs and causing mass unemployment, to runaway superintelligent machines obliterating humanity. Engineering pioneer and inventor Elon Musk famously said that as we develop AI, we are ‘summoning the demon.’

Halloween is a time when people celebrate the things that terrify them. So it seems like a perfect occasion for an MIT project that explores society’s fear of AI. And what better way to do this than have an actual AI literally scare us in an immediate, visceral sense? Postdoc Pinar Yanardhag, visiting scientist Manuel Cebrian, and I used a recently published, open-source deep neural network algorithm to learn features of a haunted house and apply these features to a picture of the Media Lab.

We also launched the Nightmare Machine website, where people can vote on which AI-generated horror images they find scary; these were generated using the same algorithm, combined with another recent algorithm for generating faces. So far, we’ve collected over 300,000 individual votes, and the results are clear: the AI demon is here, and it can terrify us. Happy Halloween!”

—Iyad Rahwan, AT&T Career Development Professor and an associate professor of media arts and sciences in the MIT Media Lab

Technique would reveal the basis

In recent years, the best-performing systems in artificial-intelligence research have come courtesy of neural networks, which look for patterns in training data that yield useful predictions or classifications. A neural net might, for instance, be trained to recognize certain objects in digital images or to infer the topics of texts.

But neural nets are black boxes. After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it’s sometimes possible to automate experiments that determine which visual features a neural net is responding to. But text-processing systems tend to be more opaque.

At the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions.

“In real-world applications, sometimes people really want to know why the model makes the predictions it does,” says Tao Lei, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “One major reason that doctors don’t trust machine-learning methods is that there’s no evidence.”

“It’s not only the medical domain,” adds Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and Lei’s thesis advisor. “It’s in any domain where the cost of making the wrong prediction is very high. You need to justify why you did it.”

“There’s a broader aspect to this work, as well,” says Tommi Jaakkola, an MIT professor of electrical engineering and computer science and the third coauthor on the paper. “You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make. How does a layperson communicate with a complex model that’s trained with algorithms that they know nothing about? They might be able to tell you about the rationale for a particular prediction. In that sense it opens up a different way of communicating with the model.”

Virtual brains

Neural networks are so called because they mimic — approximately — the structure of the brain. They are composed of a large number of processing nodes that, like individual neurons, are capable of only very simple computations but are connected to each other in dense networks.

In a process referred to as “deep learning,” training data is fed to a network’s input nodes, which modify it and feed it to other nodes, which modify it and feed it to still other nodes, and so on. The values stored in the network’s output nodes are then correlated with the classification category that the network is trying to learn — such as the objects in an image, or the topic of an essay.

Over the course of the network’s training, the operations performed by the individual nodes are continuously modified to yield consistently good results across the whole set of training examples. By the end of the process, the computer scientists who programmed the network often have no idea what the nodes’ settings are. Even if they do, it can be very hard to translate that low-level information back into an intelligible description of the system’s decision-making process.
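
To make that concrete, here is a minimal, generic training loop of the kind described above, written with NumPy; the tiny two-layer network and the toy classification task are illustrative stand-ins, not any of the systems discussed in these articles.

```python
# A minimal sketch of neural-network training: data flows through layers of
# simple nodes, the output is compared with the target labels, and the nodes'
# weights are repeatedly nudged to reduce the error over the training set.
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: classify points by whether x0 + x1 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer of 8 nodes feeding a single output node.
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(2000):
    # Forward pass: each layer modifies the data and feeds it onward.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Backward pass: adjust every node's weights to reduce the error
    # (gradient of the cross-entropy loss with respect to each weight).
    grad_out = (p - y) / len(X)
    grad_W2 = h.T @ grad_out
    grad_h = grad_out @ W2.T * (1 - h ** 2)
    grad_W1 = X.T @ grad_h
    W2 -= lr * grad_W2; b2 -= lr * grad_out.sum(0)
    W1 -= lr * grad_W1; b1 -= lr * grad_h.sum(0)

print("training accuracy:", ((p > 0.5) == y).mean())
```

Even in this toy example, the trained weights are just arrays of numbers; nothing in them explains, in human terms, why a given point was classified one way or the other.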

In the new paper, Lei, Barzilay, and Jaakkola specifically address neural nets trained on textual data. To enable interpretation of a neural net’s decisions, the CSAIL researchers divide the net into two modules. The first module extracts segments of text from the training data, and the segments are scored according to their length and their coherence: The shorter the segment, and the more of it that is drawn from strings of consecutive words, the higher its score.

The segments selected by the first module are then passed to the second module, which performs the prediction or classification task. The modules are trained together, and the goal of training is to maximize both the score of the extracted segments and the accuracy of prediction or classification.
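
As a rough illustration of the scoring idea (the exact form and weights here are assumptions for this sketch, not the authors’ objective), a candidate rationale can be represented as a mask over the words of a document, penalized for selecting many words and for breaking its selection into many separate runs:

```python
# Hypothetical scoring rule: shorter rationales built from contiguous runs of
# words score higher, mirroring the length and coherence criteria above.
def rationale_score(mask, length_weight=1.0, gap_weight=2.0):
    """mask: list of 0/1 flags, one per word of the text."""
    length_penalty = sum(mask)                                  # fewer words is better
    breaks = sum(abs(a - b) for a, b in zip(mask, mask[1:]))    # run boundaries
    return -(length_weight * length_penalty + gap_weight * breaks)

# Two masks that each select three words; the contiguous one scores higher.
print(rationale_score([0, 1, 1, 1, 0, 0]))   # -7.0
print(rationale_score([1, 0, 1, 0, 1, 0]))   # -13.0
```

In the full system, a score of this general flavor is traded off against the second module’s prediction accuracy during joint training.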

One of the data sets on which the researchers tested their system is a group of reviews from a website where users evaluate different beers. The data set includes the raw text of the reviews and the corresponding ratings, using a five-star system, on each of three attributes: aroma, palate, and appearance.

What makes the data attractive to natural-language-processing researchers is that it’s also been annotated by hand, to indicate which sentences in the reviews correspond to which scores. For example, a review might consist of eight or nine sentences, and the annotator might have highlighted those that refer to the beer’s “tan-colored head about half an inch thick,” “signature Guinness smells,” and “lack of carbonation.” Each sentence is correlated with a different attribute rating.

Reproduces aspects of human neurology

MIT researchers and their colleagues have developed a new computational model of the human brain’s face-recognition mechanism that seems to capture aspects of human neurology that previous models have missed.

The researchers designed a machine-learning system that implemented their model, and they trained it to recognize particular faces by feeding it a battery of sample images. They found that the trained system included an intermediate processing step that represented a face’s degree of rotation — say, 45 degrees from center — but not the direction — left or right.

This property wasn’t built into the system; it emerged spontaneously from the training process. But it duplicates an experimentally observed feature of the primate face-processing mechanism. The researchers consider this an indication that their system and the brain are doing something similar.

“This is not a proof that we understand what’s going on,” says Tomaso Poggio, a professor of brain and cognitive sciences at MIT and director of the Center for Brains, Minds, and Machines (CBMM), a multi-institution research consortium funded by the National Science Foundation and headquartered at MIT. “Models are kind of cartoons of reality, especially in biology. So I would be surprised if things turn out to be this simple. But I think it’s strong evidence that we are on the right track.”

Indeed, the researchers’ new paper includes a mathematical proof that the particular type of machine-learning system they use, which was intended to offer what Poggio calls a “biologically plausible” model of the nervous system, will inevitably yield intermediary representations that are sensitive to a face’s angle of rotation but indifferent to its direction.

Poggio, who is also a principal investigator at MIT’s McGovern Institute for Brain Research, is the senior author on a paper describing the new work, which appeared today in the journal Computational Biology. He’s joined on the paper by several other members of both the CBMM and the McGovern Institute: first author Joel Leibo, a researcher at Google DeepMind, who earned his PhD in brain and cognitive sciences from MIT with Poggio as his advisor; Qianli Liao, an MIT graduate student in electrical engineering and computer science; Fabio Anselmi, a postdoc in the IIT@MIT Laboratory for Computational and Statistical Learning, a joint venture of MIT and the Italian Institute of Technology; and Winrich Freiwald, an associate professor at the Rockefeller University.

The new paper is “a nice illustration of what we want to do in [CBMM], which is this integration of machine learning and computer science on one hand, neurophysiology on the other, and aspects of human behavior,” Poggio says. “That means not only what algorithms does the brain use, but what are the circuits in the brain that implement these algorithms.”

Poggio has long believed that the brain must produce “invariant” representations of faces and other objects, meaning representations that are indifferent to objects’ orientation in space, their distance from the viewer, or their location in the visual field. Magnetic resonance scans of human and monkey brains suggested as much, but in 2010, Freiwald published a study describing the neuroanatomy of macaque monkeys’ face-recognition mechanism in much greater detail.

Freiwald showed that information from the monkey’s optic nerves passes through a series of brain locations, each of which is less sensitive to face orientation than the last. Neurons in the first region fire only in response to particular face orientations; neurons in the final region fire regardless of the face’s orientation — an invariant representation.

But neurons in an intermediate region appear to be “mirror symmetric”: That is, they’re sensitive to the angle of face rotation without respect to direction. In the first region, one cluster of neurons will fire if a face is rotated 45 degrees to the left, and a different cluster will fire if it’s rotated 45 degrees to the right. In the final region, the same cluster of neurons will fire whether the face is rotated 30 degrees, 45 degrees, 90 degrees, or anywhere in-between. But in the intermediate region, a particular cluster of neurons will fire if the face is rotated by 45 degrees in either direction, another if it’s rotated 30 degrees, and so on.

This is the behavior that the researchers’ machine-learning system reproduced. “It was not a model that was trying to explain mirror symmetry,” Poggio says. “This model was trying to explain invariance, and in the process, there is this other property that pops out.”
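
A toy calculation suggests why direction-indifference can fall out of a model built for invariance. If an intermediate stage pools responses over a set of templates that is closed under mirror reflection, then reflecting the input only permutes the terms being pooled; and since a face turned 45 degrees left is roughly the mirror image of the same face turned 45 degrees right, the two views produce nearly the same pooled response. The NumPy sketch below is an illustration of that argument, not the authors’ model:

```python
# Pooling over a reflection-closed template set gives responses that depend
# on how far a pattern is "rotated" but not on the direction: reflecting the
# input merely reorders its matches against the templates.
import numpy as np

rng = np.random.default_rng(0)

def reflect(v):
    """Mirror a 1-D pattern left-to-right."""
    return v[::-1]

# Random templates plus their mirror images (a reflection-closed set).
base = [rng.normal(size=32) for _ in range(5)]
templates = base + [reflect(t) for t in base]

def pooled_response(x):
    """Pool (sum of squared projections) over all templates."""
    return sum(np.dot(x, t) ** 2 for t in templates)

x = rng.normal(size=32)      # stand-in for a face view at +45 degrees
x_mirror = reflect(x)        # stand-in for the same face at -45 degrees

print(pooled_response(x))        # the two values agree to floating-point
print(pooled_response(x_mirror)) # precision: magnitude matters, sign does not
```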

Neural training

The researchers’ machine-learning system is a neural network, so called because it roughly approximates the architecture of the human brain. A neural network consists of very simple processing units, arranged into layers, that are densely connected to the processing units — or nodes — in the layers above and below. Data are fed into the bottom layer of the network, which processes them in some way and feeds them to the next layer, and so on. During training, the output of the top layer is correlated with some classification criterion — say, correctly determining whether a given image depicts a particular person.

Lets nonexperts optimize programs

Dynamic programming is a technique that can yield relatively efficient solutions to computational problems in economics, genomic analysis, and other fields. But adapting it to computer chips with multiple “cores,” or processing units, requires a level of programming expertise that few economists and biologists have.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stony Brook University aim to change that, with a new system that allows users to describe what they want their programs to do in very general terms. It then automatically produces versions of those programs that are optimized to run on multicore chips. It also guarantees that the new versions will yield exactly the same results that the single-core versions would, albeit much faster.

In experiments, the researchers used the system to “parallelize” several algorithms that used dynamic programming, splitting them up so that they would run on multicore chips. The resulting programs were between three and 11 times as fast as those produced by earlier techniques for automatic parallelization, and they were generally as efficient as those that were hand-parallelized by computer scientists.

The researchers presented their new system last week at the Association for Computing Machinery’s conference on Systems, Programming, Languages and Applications: Software for Humanity.

Dynamic programming offers exponential speedups on a certain class of problems because it stores and reuses the results of computations, rather than recomputing them every time they’re required.
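
A generic example of that trade-off (not drawn from the paper): the naive recursive way to compute Fibonacci numbers redoes the same subproblems exponentially many times, while a dynamic-programming version stores each intermediate result in a table and reuses it, spending memory to save work.

```python
from functools import lru_cache

def fib_naive(n):
    """Recomputes the same values over and over; exponential time."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)   # the stored table of intermediate results
def fib_memo(n):
    """Each value is computed once and then fetched from memory."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(200))       # instantaneous; fib_naive(200) is infeasible
```

As the quote below points out, the catch is that the stored table lives in memory, and fetching from memory is not free.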

“But you need more memory, because you store the results of intermediate computations,” says Shachar Itzhaky, first author on the new paper and a postdoc in the group of Armando Solar-Lezama, an associate professor of electrical engineering and computer science at MIT. “When you come to implement it, you realize that you don’t get as much speedup as you thought you would, because the memory is slow. When you store and fetch, of course, it’s still faster than redoing the computation, but it’s not as fast as it could have been.”

Outsourcing complexity

Computer scientists avoid this problem by reordering computations so that those requiring a particular stored value are executed in sequence, minimizing the number of times that the value has to be recalled from memory. That’s relatively easy to do with a single-core computer, but with multicore computers, when multiple cores are sharing data stored at multiple locations, memory management becomes much more complex. A hand-optimized, parallel version of a dynamic-programming algorithm is typically 10 times as long as the single-core version, and the individual lines of code are more complex, to boot.

The CSAIL researchers’ new system — dubbed Bellmania, after Richard Bellman, the applied mathematician who pioneered dynamic programming — adopts a parallelization strategy called recursive divide-and-conquer. Suppose that the task of a parallel algorithm is to perform a sequence of computations on a grid of numbers, known as a matrix. Its first task might be to divide the grid into four parts, each to be processed separately.

But then it might divide each of those four parts into four parts, and each of those into another four parts, and so on. Because this approach — recursion — involves breaking a problem into smaller subproblems, it naturally lends itself to parallelization.
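
The sketch below illustrates recursive divide-and-conquer parallelism in general terms; the quadrant computation (a simple sum) and the process pool are stand-ins, not Bellmania’s output.

```python
# Split a grid into quadrants, hand the top-level quadrants to separate
# cores, and let each core keep subdividing until the pieces are small
# enough to solve directly.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def quadrants(grid):
    """Split a 2-D array into four roughly equal blocks."""
    r, c = grid.shape[0] // 2, grid.shape[1] // 2
    return [grid[:r, :c], grid[:r, c:], grid[r:, :c], grid[r:, c:]]

def block_sum(grid, threshold=64):
    """Recursively divide the grid; solve small blocks directly."""
    if grid.size <= threshold:
        return float(grid.sum())
    return sum(block_sum(q, threshold) for q in quadrants(grid))

if __name__ == "__main__":
    grid = np.arange(1024 * 1024, dtype=float).reshape(1024, 1024)
    # The four top-level quadrants are independent, so they can run in parallel.
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(block_sum, quadrants(grid)))
    assert np.isclose(total, grid.sum())
```

Real dynamic-programming kernels are harder to split, since their subproblems typically depend on one another; managing those dependencies and the associated memory traffic is the kind of work the system is meant to automate.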

Computers could be much more powerful than previously realized

Quantum computers promise huge speedups on some computational problems because they harness a strange physical property called entanglement, in which the physical state of one tiny particle depends on measurements made of another. In quantum computers, entanglement is a computational resource, roughly like a chip’s clock cycles — kilohertz, megahertz, gigahertz — and memory in a conventional computer.

In a recent paper in the journal Proceedings of the National Academy of Sciences, researchers at MIT and IBM’s Thomas J. Watson Research Center show that simple systems of quantum particles exhibit exponentially more entanglement than was previously believed. That means that quantum computers — or other quantum information devices — powerful enough to be of practical use could be closer than we thought.

Where ordinary computers deal in bits of information, quantum computers deal in quantum bits, or qubits. Previously, researchers believed that in a certain class of simple quantum systems, the degree of entanglement was, at best, proportional to the logarithm of the number of qubits.

“For models that satisfy certain physical-reasonability criteria — i.e., they’re not too contrived; they’re something that you could in principle realize in the lab — people thought that a factor of the log of the system size was the best you can do,” says Ramis Movassagh, a researcher at Watson and one of the paper’s two co-authors. “What we proved is that the entanglement scales as the square root of the system size. Which is really exponentially more.”

That means that a 10,000-qubit quantum computer could exhibit about 10 times as much entanglement as previously thought. And that difference increases exponentially as more qubits are added.
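
The rough arithmetic behind that factor of 10 (the exact constants depend on the model and on the base of the logarithm):

```python
# Compare sqrt(n) with log(n) entanglement scaling for n = 10,000 qubits.
import math

n = 10_000
print(math.sqrt(n))                 # 100.0
print(math.log(n))                  # about 9.2 (natural log)
print(math.sqrt(n) / math.log(n))   # about 10.9, i.e. roughly 10 times more
```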

Logical or physical?

This matters because of the distinction, in quantum computing, between logical qubits and physical qubits. A logical qubit is an abstraction used to formulate quantum algorithms; a physical qubit is a tiny bit of matter whose quantum states are both controllable and entangled with those of other physical qubits.

A computation involving, say, 100 logical qubits would already be beyond the capacity of all the conventional computers in the world. But with most of today’s theoretical designs for general-purpose quantum computers, realizing a single logical qubit requires somewhere around 100 physical qubits. Most of the physical qubits are used for quantum error correction and to encode operations between logical qubits.

Since preserving entanglement across large groups of qubits is the biggest obstacle to developing working quantum devices, extracting more entanglement from smaller clusters of qubits could make quantum computing devices more practical.

Qubits are analogous to bits in a conventional computer, but where a conventional bit can take on the values 0 or 1, a qubit can be in “superposition,” meaning that it takes on both values at once. If qubits are entangled, they can take on all their possible states simultaneously. One qubit can take on two states, two qubits four, three qubits eight, four qubits 16, and so on. It’s the ability to, in some sense, evaluate computational alternatives simultaneously that gives quantum computers their extraordinary power.

Publicly traded corporation

Cook joined Apple in 1998 and was named its CEO in 2011. As chief executive, he has overseen the introduction of some of Apple’s innovative and popular products, including iPhone 7 and Apple Watch. An advocate for equality and champion of the environment, Cook reminds audiences that Apple’s mission is to change the world for the better, both through its products and its policies.

“Mr. Cook’s brilliance as a business leader, his genuineness as a human being, and his passion for issues that matter to our community make his voice one that I know will resonate deeply with our graduates,” MIT President L. Rafael Reif says. “I am delighted that he will join us for Commencement and eagerly await his charge to the Class of 2017.”

Before becoming CEO, Cook was Apple’s chief operating officer, responsible for the company’s worldwide sales and operations, including management of Apple’s global supply chain, sales activities, and service and support. He also headed the Macintosh division and played a key role in the development of strategic reseller and supplier relationships, ensuring the company’s flexibility in a demanding marketplace.

“Apple stands at the intersection of liberal arts and technology, and we’re proud to have many outstanding MIT graduates on our team,” Cook says. “We believe deeply that technology can be a powerful force for good, and I’m looking forward to speaking to the Class of 2017 as they look ahead to making their own mark on the world.”

Prior to joining Apple, Cook was vice president of corporate materials at Compaq, responsible for procuring and managing product inventory. Before that, he served as chief operating officer of the Reseller Division at Intelligent Electronics.

Cook also spent 12 years with IBM, ending as director of North American fulfillment, where he led manufacturing and distribution for IBM’s personal computer company in North and Latin America.

Cook earned a BS in industrial engineering from Auburn University in 1982, and an MBA from Duke University in 1988.

“Tim Cook is a trailblazer and an inspiration to innovators worldwide,” says Liana Ilutzi, president of MIT’s Class of 2017. “He represents the best of the entrepreneurial and fearless spirit of the MIT community. While faithfully maintaining his integrity and humility, Tim runs one of the most influential companies on the planet. We are beyond excited to have him with us for Commencement!”

“We are looking forward to hearing Tim Cook speak at Commencement,” says Graduate Student Council President Arolyn Conwill. “We believe that his innovative leadership at Apple, along with his commitment to advocacy on sustainability, security, and equality, will inspire graduates to make a far-reaching, positive impact on the world.”

Cook joins a list of notable recent MIT Commencement speakers, including actor and filmmaker Matt Damon (2016); U.S. Chief Technology Officer Megan Smith ’86 SM ’88 (2015); DuPont CEO Ellen Kullman (2014); Dropbox co-founder and CEO Drew Houston ’05 (2013); and Khan Academy founder Sal Khan ’98, MEng ’98 (2012).

“I am delighted with the selection of Tim Cook as the Commencement speaker,” says Chancellor for Academic Advancement Eric Grimson, the longstanding chair of MIT’s Commencement Committee. “Apple is widely viewed as a company that champions innovation, that seeks creative and inventive solutions to problems across a wide range of domains, and that looks to balance technology with issues of social good. These are all themes that are of great importance to our graduates, and I am sure his remarks will be an inspiration to them.”

Lead to fully automated speech recognition

Speech recognition systems, such as those that convert speech to text on cellphones, are generally the result of machine learning. A computer pores through thousands or even millions of audio files and their transcriptions, and learns which acoustic features correspond to which typed words.

But transcribing recordings is costly, time-consuming work, which has limited speech recognition to a small subset of languages spoken in wealthy nations.

At the Neural Information Processing Systems conference this week, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new approach to training speech-recognition systems that doesn’t depend on transcription. Instead, their system analyzes correspondences between images and spoken descriptions of those images, as captured in a large collection of audio recordings. The system then learns which acoustic features of the recordings correlate with which image characteristics.

“The goal of this work is to try to get the machine to learn language more like the way humans do,” says Jim Glass, a senior research scientist at CSAIL and a co-author on the paper describing the new system. “The current methods that people use to train up speech recognizers are very supervised. You get an utterance, and you’re told what’s said. And you do this for a large body of data.

“Big advances have been made — Siri, Google — but it’s expensive to get those annotations, and people have thus focused on, really, the major languages of the world. There are 7,000 languages, and I think less than 2 percent have ASR [automatic speech recognition] capability, and probably nothing is going to be done to address the others. So if you’re trying to think about how technology can be beneficial for society at large, it’s interesting to think about what we need to do to change the current situation. And the approach we’ve been taking through the years is looking at what we can learn with less supervision.”

Joining Glass on the paper are first author David Harwath, a graduate student in electrical engineering and computer science (EECS) at MIT; and Antonio Torralba, an EECS professor.

Visual semantics

The version of the system reported in the new paper doesn’t correlate recorded speech with written text; instead, it correlates speech with groups of thematically related images. But that correlation could serve as the basis for others.

If, for instance, an utterance is associated with a particular class of images, and the images have text terms associated with them, it should be possible to find a likely transcription of the utterance, all without human intervention. Similarly, a class of images with associated text terms in different languages could provide a way to do automatic translation.

Conversely, text terms associated with similar clusters of images, such as, say, “storm” and “clouds,” could be inferred to have related meanings. Because the system in some sense learns words’ meanings — the images associated with them — and not just their sounds, it has a wider range of potential applications than a standard speech recognition system.

To test their system, the researchers used a database of 1,000 images, each of which had a recording of a free-form verbal description associated with it. They would feed their system one of the recordings and ask it to retrieve the 10 images that best matched it. That set of 10 images would contain the correct one 31 percent of the time.
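
That retrieval test can be summarized as a “recall at 10” score. The sketch below shows the bookkeeping only; the similarity scores themselves would come from the trained model, which is assumed here.

```python
# Given similarity[i, j] = how well recording i matches image j (image i is
# the correct match), count how often the right image lands in the top 10.
import numpy as np

def recall_at_10(similarity):
    top10 = np.argsort(-similarity, axis=1)[:, :10]   # 10 best images per recording
    hits = [i in top10[i] for i in range(similarity.shape[0])]
    return float(np.mean(hits))

# With 1,000 candidate images and random scores, recall@10 is about 1 percent,
# the baseline against which the reported 31 percent should be read.
rng = np.random.default_rng(0)
print(recall_at_10(rng.normal(size=(1000, 1000))))
```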

Fabricate drones with a wide range

This fall’s new Federal Aviation Administration regulations have made drone flight easier than ever for both companies and consumers. But what if the drones out on the market aren’t exactly what you want?

A new system from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) is the first to allow users to design, simulate, and build their own custom drone. Users can change the size, shape, and structure of their drone based on the specific needs they have for payload, cost, flight time, battery usage, and other factors.

To demonstrate, researchers created a range of unusual-looking drones, including a five-rotor “pentacopter” and a rabbit-shaped “bunnycopter” with propellers of different sizes and rotors of different heights.

“This system opens up new possibilities for how drones look and function,” says MIT Professor Wojciech Matusik, who oversaw the project in CSAIL’s Computational Fabrication Group. “It’s no longer a one-size-fits-all approach for people who want to make and use drones for particular purposes.”

The interface lets users design drones with different propellers, rotors, and rods. It also provides guarantees that the drones it fabricates can take off, hover and land — which is no simple task considering the intricate technical trade-offs associated with drone weight, shape, and control.

“For example, adding more rotors generally lets you carry more weight, but you also need to think about how to balance the drone to make sure it doesn’t tip,” says PhD student Tao Du, who was first author on a related paper about the system. “Irregularly-shaped drones are very difficult to stabilize, which means that they require establishing very complex control parameters.”

Du and Matusik co-authored a paper with PhD student Adriana Schulz, postdoc Bo Zhu, and Assistant Professor Bernd Bickel of IST Austria. It will be presented next week at the annual SIGGRAPH Asia conference in Macao, China.

Today’s commercial drones only come in a small range of options, typically with an even number of rotors and upward-facing propellers. But there are many emerging use cases for other kinds of drones. For example, having an odd number of rotors might create a clearer view for a drone’s camera, or allow the drone to carry objects with unusual shapes.

Designing these less conventional drones, however, often requires expertise in multiple disciplines, including control systems, fabrication, and electronics.

“Developing multicopters like these that are actually flyable involves a lot of trial-and-error, tweaking the balance between all the propellers and rotors,” says Du. “It would be more or less impossible for an amateur user, especially one without any computer-science background.”

But the CSAIL group’s new system makes the process much easier. Users design drones by choosing from a database of parts and specifying their needs for things like payload, cost, and battery usage. The system computes the sizes of design elements like rod lengths and motor angles, and looks at metrics such as torque and thrust to determine whether the design will actually work. It also uses an “LQR controller” that takes information about a drone’s characteristics and surroundings to optimize its flight plan.
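
For the control step, an LQR (linear-quadratic regulator) controller computes a feedback gain from a linearized model of the vehicle’s dynamics. The snippet below is a textbook LQR computation on a toy system, with placeholder matrices rather than anything derived from the paper’s drone designs:

```python
# Generic continuous-time LQR: for dynamics x_dot = A x + B u, solve the
# algebraic Riccati equation and form the feedback law u = -K x.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],          # toy double-integrator dynamics
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([10.0, 1.0])           # weight on state error
R = np.array([[0.1]])              # weight on control effort

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)    # feedback gain
print(K)
```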

One of the project’s core challenges stemmed from the fact that a drone’s shape and structure (its “geometry”) are usually strongly tied to how it has been programmed to move (its “control”). To overcome this, researchers used what’s called an “alternating direction method,” which means that they reduced the number of variables by fixing some of them and optimizing the rest. This allowed the team to decouple the variables of geometry and control in a way that optimizes the drone’s performance.

“Once you decouple these variables, you turn a very complicated optimization problem into two easy sub-problems that we already have techniques for solving,” says Du. He envisions future versions of the system that could proactively give design suggestions, like recommending where a rotor should go to accommodate a desired payload.
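
In miniature, the alternating strategy looks like the following; the coupled objective here is a made-up stand-in for the real metrics (thrust, torque, stability) that tie a drone’s geometry to its control.

```python
# Alternating minimization: optimize the geometry variables with the control
# variables held fixed, then the control variables with the geometry fixed,
# and repeat until the two sub-problems stop improving.
import numpy as np
from scipy.optimize import minimize

def objective(g, c):
    """Placeholder objective coupling geometry g and control c."""
    return (g[0] - 2 * c[0]) ** 2 + (g[1] + c[1] - 1) ** 2 + 0.1 * (g ** 2).sum()

g = np.zeros(2)   # geometry variables (e.g., rod lengths, motor angles)
c = np.zeros(2)   # control variables (e.g., controller parameters)
for _ in range(20):
    g = minimize(lambda g_: objective(g_, c), g).x   # sub-problem 1
    c = minimize(lambda c_: objective(g, c_), c).x   # sub-problem 2

print(g, c, objective(g, c))
```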

“This is the first system in which users can interactively design a drone that incorporates both geometry and control,” says Nobuyuki Umetani, a research scientist at Autodesk, Inc., who was not involved in the paper. “This is very exciting work that has the potential to change the way people design.”

The project was supported, in part, by the National Science Foundation, the Air Force Research Laboratory and the European Union’s Horizon 2020 research and innovation program.

Provided key knowledge

This week the Association for Computing Machinery (ACM) announced its 2016 fellows, which include four principal investigators from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL): professors Erik Demaine, Fredo Durand, William Freeman, and Daniel Jackson. They were among the 1 percent of ACM members to receive the distinction.

“Erik, Fredo, Bill, and Daniel are wonderful colleagues and extraordinary computer scientists, and I am so happy to see their contributions recognized with the most prestigious member grade of the ACM,” says CSAIL Director Daniela Rus, who herself was named a fellow last year. “All of us at CSAIL are very proud of these researchers for receiving these esteemed honors.”

ACM’s 53 fellows for 2016 were named for their distinctive contributions spanning such computer science disciplines as computer vision, computer graphics, software design, machine learning, algorithms, and theoretical computer science.

“As nearly 100,000 computing professionals are members of our association, to be selected to join the top 1 percent is truly an honor,” says ACM President Vicki L. Hanson. “Fellows are chosen by their peers and hail from leading universities, corporations and research labs throughout the world. Their inspiration, insights and dedication bring immeasurable benefits that improve lives and help drive the global economy.”

Demaine was selected for contributions to geometric computing, data structures, and graph algorithms. His research interests include the geometry of understanding how proteins fold and the computational difficulty of playing games. He received the MacArthur Fellowship for his work in computational geometry. He and his father Martin Demaine have produced numerous curved-crease sculptures that explore the intersection of science and art — and that are currently in the Museum of Modern Art in New York.

A Department of Electrical Engineering and Computer Science (EECS) professor whose research spans video graphics and photo-generation, Durand was selected for contributions to computational photography and computer graphics rendering. He also works to develop new algorithms to enable image enhancements and improved scene understanding. He received the ACM SIGGRAPH Computer Graphics Achievement Award in 2016.

Freeman is the Thomas and Gerd Perkins Professor of EECS at MIT. He was selected as a fellow for his contributions to computer vision, machine learning, and computer graphics. His research interests also include Bayesian models of visual perception and computational photography. He received “Outstanding Paper” awards at computer vision and machine learning conferences in 1997, 2006, 2009 and 2012, as well as ACM’s “Test of Time” awards for papers from 1990 and 1995.

Jackson is an EECS professor and associate director of CSAIL whose work has focused on improving the functionality and dependability of software through lightweight formal methods. He was selected by ACM for contributions to software modeling and the creation of Alloy, a modeling language that has been used to find flaws in many designs and protocols. He is a MacVicar Fellow and also received this year’s ACM SIGSOFT Impact Paper Award.

Moving target technique

When it comes to protecting data from cyberattacks, information technology (IT) specialists who defend computer networks face attackers armed with some advantages. For one, while attackers need only find one vulnerability in a system to gain network access and disrupt, corrupt, or steal data, IT personnel must constantly guard against, and work to mitigate, myriad and varied network intrusion attempts.

The homogeneity and uniformity of software applications have traditionally created another advantage for cyber attackers. “Attackers can develop a single exploit against a software application and use it to compromise millions of instances of that application because all instances look alike internally,” says Hamed Okhravi, a senior staff member in the Cyber Security and Information Sciences Division at MIT Lincoln Laboratory. To counter this problem, cybersecurity practitioners have implemented randomization techniques in operating systems. These techniques, notably address space layout randomization (ASLR), diversify the memory locations used by each instance of the application at the point at which the application is loaded into memory.
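
The effect of ASLR is easy to observe: with it enabled, the addresses an application sees change from one run to the next, so an exploit that hard-codes an address learned from one instance will miss in another. A small, operating-system-dependent demonstration (the exact addresses and how much they vary are up to the platform):

```python
# Print the address of a freshly allocated buffer; on a system with ASLR
# enabled, running this script several times prints different addresses.
import ctypes

buf = ctypes.create_string_buffer(16)
print(hex(ctypes.addressof(buf)))
```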

In response to randomization approaches like ASLR, attackers developed information leakage attacks, also called memory disclosure attacks. Through these software assaults, attackers can make the application disclose how its internals have been randomized while the application is running. Attackers then adjust their exploits to the application’s randomization and successfully hijack control of vulnerable programs. “The power of such attacks has ensured their prevalence in many modern exploit campaigns, including those network infiltrations in which an attacker remains undetected and continues to steal data in the network for a long time,” explains Okhravi, who adds that methods for bypassing ASLR, which is currently deployed in most modern operating systems, and similar defenses can be readily found on the Internet.

Okhravi and colleagues David Bigelow, Robert Rudd, James Landry, and William Streilein, and former staff member Thomas Hobson, have developed a unique randomization technique, timely address space randomization (TASR), to counter information leakage attacks that may thwart ASLR protections. “TASR is the first technology that mitigates an attacker’s ability to leverage information leakage against ASLR, irrespective of the mechanism used to leak information,” says Rudd.

To disallow an information leakage attack, TASR immediately rerandomizes the memory’s layout every time it observes an application processing an output and input pair. “Information may leak to the attacker on any given program output without anybody being able to detect it, but TASR ensures that the memory layout is rerandomized before the attacker has an opportunity to act on that stolen information, and hence denies them the opportunity to use it to bypass operating system defenses,” says Bigelow. Because TASR’s rerandomization is based upon application activity and not upon a set timing (say every so many minutes), an attacker cannot anticipate the interval during which the leaked information might be used to send an exploit to the application before randomization recurs.
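
Conceptually, the trigger can be pictured as a small state machine, sketched below; this is an illustration of the idea as described here, not Lincoln Laboratory’s implementation.

```python
# Anything leaked on an output can only be exploited through a later input,
# so rerandomizing between an observed output and the next input makes the
# leaked layout information stale before it can be used.
class TasrLikeMonitor:
    def __init__(self, rerandomize):
        self.rerandomize = rerandomize   # callback that rewrites the layout
        self.pending_output = False

    def on_output(self):
        self.pending_output = True       # something may just have leaked

    def on_input(self):
        if self.pending_output:
            self.rerandomize()           # leaked addresses are now useless
            self.pending_output = False

monitor = TasrLikeMonitor(rerandomize=lambda: print("layout rerandomized"))
monitor.on_output()   # application sends data an attacker could inspect
monitor.on_input()    # before the next input is handled, rerandomize
```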

When TASR determines that the rerandomization must be performed, it pauses the running application, injects a randomizer component that performs the actual rewriting of code, then deletes the randomizer component from the application’s memory, and resumes the application. This process protects the randomizer from infiltration. To change the memory layout of a running application without causing a crash, TASR updates all memory addresses stored in the application during rerandomization.

TASR has several advantages over other randomization techniques. It protects against all existing types of information leaks for memory corruption attacks, regardless of the specific method of attack (e.g., viruses, email phishing, access via the Internet) or type of vulnerability (e.g., logic flaws, race conditions, buffer overflows). TASR is flexible: it is compatible with full standard C language, does not require additional hardware, and is backward-compatible with legacy systems. Finally, performance evaluations carried out by the research team showed that the fully automated TASR technique incurs a low execution overhead of only about 2.1 percent.