System News
Reinforcement Learning for Multi-agent Systems
This Calls for More Than Just Food Rewards
August 7, 2009
Volume 138, Issue 1

With the help of tuning algorithms, David Vengerov is creating multi-agent systems that can learn via positive reinforcement.

In his Contrarian Minds interview with David Vengerov, writer Al Riske characterizes the Sun Labs researcher as "part of an emerging consensus that large-scale systems need to be self-managing and self-optimizing." Systems have grown so complex that tuning their parameters well by hand has become all but impossible. "The only thing people can do now to make these systems work well is to set by hand multitudes of different parameters and policies," Vengerov told Riske.

Vengerov credits his father with setting him on the path to study complex systems. He pursued that study at MIT and Stanford, where he learned the math and control theory he needed to write the algorithms he now devises, which "...enable systems to observe their constantly changing environments and learn from what they experience," as Riske puts it.

Riske also notes the utility of these algorithms in dynamic resource allocation, dynamic data migration among storage devices, dynamic user migration in a network of servers, dynamic job scheduling in server farms, and dynamic pricing of incoming jobs. In sum, they are invaluable for cloud computing.

According to Vengerov, "reinforcement learning" as applied to complex systems operates on the same principle as rewarding an animal for the correct response when training it to perform a particular action. His tuning algorithms help machines "learn" in much the same fashion.

"Right now people will just heuristically, by hand, specify that we want to drive the system toward this state and not that state. But actually there are an infinite number of states. Reinforcement learning can help establish the value of each state. Then we can take actions," Vengerov says.

Vengerov likens reinforcement learning to "... trial and error but smart. Directed trial and error where you are trying to explore the space of possible actions while at the same time taking the actions that give you the highest reward most often." Machines learn to choose actions based on the history of rewards their own behavior has earned.
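The "directed trial and error" idea can be illustrated with a minimal epsilon-greedy sketch in Python. This is a textbook simplification, not Vengerov's tuning algorithm: it ignores states altogether, and the reward means, the exploration rate `epsilon`, and the function names are all assumptions made for illustration.

```python
import random

def epsilon_greedy(values, epsilon, rng):
    """Explore a random action with probability epsilon, else pick the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def run_trials(reward_means, steps=5000, epsilon=0.1, seed=0):
    """Learn a value estimate per action from noisy rewards (hypothetical setup)."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_means)   # running average reward per action
    counts = [0] * len(reward_means)     # how often each action was taken
    for _ in range(steps):
        action = epsilon_greedy(values, epsilon, rng)
        reward = rng.gauss(reward_means[action], 1.0)  # noisy feedback
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Three hypothetical actions whose true average rewards are unknown to the learner.
values, counts = run_trials([0.2, 0.5, 1.0])
```

Most steps exploit the action that currently looks best, while a small fraction explore the alternatives, so the estimates for all actions keep improving.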

While work along these lines had been done earlier in single-agent systems, what sets Vengerov apart is that he works with multi-agent systems, writes Riske.

Riske cites the example of dynamic user migration among the servers on a network where, according to Vengerov, " ... each server is an agent, and they learn how to trade users among themselves so as not to overload any one server, and at the same time place communicating users nearby on the same server so as to reduce the communication time."
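The trade-off in that example, avoiding overload while keeping communicating users together, can be made concrete with a small Python sketch. It involves no learning and is not Vengerov's method; the cost function, the `overload_weight` parameter, and the server and user names are hypothetical, chosen only to show what a server agent might weigh when considering a trade.

```python
def placement_cost(placement, capacity, talks, overload_weight=10.0):
    """Hypothetical cost: overload penalty plus one unit per cross-server conversation."""
    load = {}
    for user, server in placement.items():
        load[server] = load.get(server, 0) + 1
    overload = sum(max(0, n - capacity) for n in load.values())
    remote = sum(1 for a, b in talks if placement[a] != placement[b])
    return overload_weight * overload + remote

def best_single_move(placement, servers, capacity, talks):
    """Return (cost, user, server) for the one migration that lowers cost most, or None."""
    current = placement_cost(placement, capacity, talks)
    best = None
    for user, home in placement.items():
        for server in servers:
            if server == home:
                continue
            trial = dict(placement)
            trial[user] = server  # try moving this user to another server
            cost = placement_cost(trial, capacity, talks)
            if cost < current and (best is None or cost < best[0]):
                best = (cost, user, server)
    return best

servers = ["s1", "s2"]
placement = {"ann": "s1", "bob": "s1", "cy": "s1", "dee": "s2"}
talks = [("ann", "bob"), ("cy", "dee")]  # pairs of users who communicate
move = best_single_move(placement, servers, capacity=2, talks=talks)
```

Here server `s1` is over capacity, and moving `cy` to `s2` both relieves the overload and puts `cy` next to `dee`. A learning agent would presumably estimate such costs from experience rather than compute them from a fixed formula.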

What is more, Vengerov's work has led him to tackle real-world problems. He has worked with Solaris engineers on thread migration and dynamic bandwidth allocation, and has collaborated with Java programmers to embed a self-tuning algorithm in the Java virtual machine, boosting performance on the well-known SPECjbb2005 benchmark by 70 percent, Riske notes.
