I've been thinking about this concept off and on for a couple of months now, so I think I should write it down. Keep in mind that it's unpolished: I don't have a solid take on the precise execution of the concept, or on whether it is actually functionally different from anything else in the space, but it seems to me an interesting way to handle things. It also seems to be an interesting avenue for investigating cognitive science.
Anyway, on to the concept.
Let's say that you're playing chess. Now, in chess, you have the game state, and a variety of pieces, each with different abilities and functionality. The overall goal is to win by checkmating your opponent's king (in effect, taking it).
Now, as a bit of an AI guy, I would say that the best way for an AI to accomplish this would be reinforcement learning. The AI plays randomly until it wins a game, at which point it propagates a value back through the moves it made, making those moves preferable. So long as the state of the game exactly matches a position it has seen before, it will have a value associated with each move it has previously made from there. Throw in a chance for the AI to say, "eh, fuck it," and try something new, and... we're done here.
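To make that concrete, here is a minimal sketch of the bookkeeping in Python. It isn't a chess engine: states and moves are just labels, and the names `choose_move` and `back_up`, along with the `step_size` and `discount` parameters, are my own inventions for illustration. The "eh, fuck it" chance is what is usually called epsilon-greedy exploration.

```python
import random
from collections import defaultdict


def choose_move(values, state, moves, epsilon, rng):
    """Usually take the best-known move; with probability epsilon, try something new."""
    if rng.random() < epsilon:
        return rng.choice(moves)
    # Moves never seen from this state default to a value of 0.0.
    return max(moves, key=lambda move: values[(state, move)])


def back_up(values, history, reward, step_size=0.1, discount=0.9):
    """Propagate the final reward back through the (state, move) pairs that were played.

    Later moves get more of the credit; each step back is scaled by `discount`.
    Both parameters are illustrative assumptions, not tuned values.
    """
    credit = reward
    for state, move in reversed(history):
        key = (state, move)
        values[key] += step_size * (credit - values[key])
        credit *= discount


values = defaultdict(float)
rng = random.Random(0)

# Pretend a game was won after playing move "x" in state "a", then "y" in state "b".
back_up(values, [("a", "x"), ("b", "y")], reward=1.0)

# With exploration switched off, the AI now prefers "y" in state "b".
preferred = choose_move(values, "b", ["y", "z"], epsilon=0.0, rng=rng)
```

Note that the table is keyed on the exact state, which is precisely the limitation described above: a position that differs by one pawn is a brand new entry.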
This is boring and impractical. Not only that, but it doesn't seem a very good model for how humans do it. The randomness aspect, yes, but not the overarching approach: as I understand it, even the best human players don't do much look-ahead. While the AI in question isn't doing actual look-ahead, it is effectively doing it, since the values it stores summarize how past games played out from each position.
Now, I'm a big proponent of the idea that AI thinking doesn't need to mimic human thinking, so the idea that the AI does it differently than the human doesn't bother me too much. But anyway.
So what if, rather than approaching the board from an overview, RTS-style, the AI gave each of its pieces its own agency? In effect, rather than looking at the whole of the game state, each piece looks around itself, at the parts of the state that are relevant to it, and assigns a value to each potential move it could make. Apply a sort of Bayesian agency to the system as a whole at the AI's overview level, and you get something really interesting.
So let's say that this particular AI - let's call him Sam - has had much success in the past with his opening move as moving one of the four central-most pawns ahead two spaces. Sam doesn't have access to this information directly - rather, what is going on here is that the pawns each say, "hey, my value of moving ahead two spaces is 8, my value for one space ahead is 4, and my value for not moving is -2." The other four pawns say some useless things and return values that aren't as good, so they're immediately discounted by the Bayesian agency at the top level.
So the Bayesian agent here has a bit of a problem: it has four sub-agents all giving it the same value. However, as the sub-agents learn, so, too, does the B agent: it knows that listening to the fifth pawn from the left usually yields successes (thanks to RL), but rather than working with raw values, it has percentages associated with each sub-agent's responses. So it returns to Sam: move the fifth pawn from the left forward two spaces.
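Here is one way the opening example could be sketched in Python. I haven't pinned down what the "percentages" should actually be, so this assumes the simplest thing I can think of: the B agent keeps a win/loss record per sub-agent and uses a smoothed success rate (the mean of a Beta(1, 1) prior updated with that record) as its trust. `Proposal`, `BAgent`, and the piece names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    piece: str    # which sub-agent is speaking, e.g. "pawn_e"
    move: str     # what it wants to do, e.g. "forward_2"
    value: float  # the sub-agent's own learned value for that move


class BAgent:
    """Top-level arbiter: weighs each proposal by how much it trusts the proposer."""

    def __init__(self):
        self.record = {}  # piece -> (successes, failures)

    def trust(self, piece):
        # Smoothed success rate; a piece with no history gets 0.5.
        successes, failures = self.record.get(piece, (0, 0))
        return (successes + 1) / (successes + failures + 2)

    def pick(self, proposals):
        # Assumed scoring rule: raw value scaled by trust in the proposer.
        return max(proposals, key=lambda p: p.value * self.trust(p.piece))

    def feedback(self, piece, won):
        successes, failures = self.record.get(piece, (0, 0))
        self.record[piece] = (successes + 1, failures) if won else (successes, failures + 1)


b_agent = BAgent()
for _ in range(3):
    b_agent.feedback("pawn_e", won=True)  # listening to the fifth pawn has paid off before

proposals = [
    Proposal("pawn_a", "forward_1", 3.0),  # an edge pawn with a weaker value
    Proposal("pawn_c", "forward_2", 8.0),
    Proposal("pawn_d", "forward_2", 8.0),
    Proposal("pawn_e", "forward_2", 8.0),
    Proposal("pawn_f", "forward_2", 8.0),
]
choice = b_agent.pick(proposals)  # the four 8s tie on raw value; trust breaks the tie
```

The sub-agents never see the win/loss record, and the B agent never sees why a pawn likes a move. Each side only learns its own piece of the picture.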
In more complicated game states, the B agent could act as an intermediary between sub-agents. Say that two sub-agents, a queen and a knight, are both returning a 15 for a given move; to further complicate it, the B agent believes both with a 40% chance of success. All other moves are significantly weaker. The B agent might then have the authority to create a new sub-agent that takes both sub-agents' actions into account, combining them into a single agent that can analyze the effects of both actions and return that information to the B agent, thus allowing for a cogent decision to be made. The level of success of that action can then be propagated through to the sub-agents and the B agent, potentially allowing for a more intelligent strategy to emerge in a way that wouldn't be possible otherwise.
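A rough Python sketch of that deadlock-breaking step follows. What the combined agent's "analysis" actually consists of is exactly the part I haven't worked out, so here it is just a function the caller supplies. `tied_leaders`, `JointAgent`, `toy_analysis`, the moves, and all the numbers are made up for illustration.

```python
def tied_leaders(scored, margin=0.05):
    """scored: list of (candidate, weighted_score). Return candidates within margin of the best."""
    best = max(score for _, score in scored)
    return [candidate for candidate, score in scored if best - score <= margin]


class JointAgent:
    """Temporary sub-agent that looks at the deadlocked candidates together."""

    def __init__(self, candidates, analyze):
        self.candidates = candidates  # list of (piece, move) pairs that tied
        self.analyze = analyze        # hypothetical: (piece, move, others) -> float

    def recommend(self):
        def joint_value(candidate):
            piece, move = candidate
            others = [c for c in self.candidates if c != candidate]
            return self.analyze(piece, move, others)

        return max(self.candidates, key=joint_value)


def toy_analysis(piece, move, others):
    # Made-up judgment: moving the queen leaves the knight hanging, so it scores lower.
    return 5.0 if piece == "queen" else 12.0


# Value 15 at 40% belief for both queen and knight; everything else is much weaker.
scored = [
    (("queen", "Qh5"), 15 * 0.40),
    (("knight", "Nf3"), 15 * 0.40),
    (("pawn", "a3"), 2 * 0.50),
]

leaders = tied_leaders(scored)
if len(leaders) > 1:
    choice = JointAgent(leaders, toy_analysis).recommend()
else:
    choice = leaders[0]
```

Whatever the game's outcome, the result would then be fed back both to the queen and knight individually and to the B agent's trust in each of them.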
I realize that, taking a step back, this looks like an overcomplication of standard RL methods. Why have a multitude of agents all arguing and clamoring for attention, when the end result winds up being roughly the same - potentially even mildly worse, given potential information loss - and significantly more expensive computationally?
Because the agents have different goals.
Consider this. Instead of a single agent, Sam, with the goal to take the opponent's king, you have a multitude of agents, all with different goals. Pawns want to get to the end of the board and get promoted. Kings want to avoid enemy pieces, with a stronger desire for that goal than any other piece. All pieces work to serve Sam's overarching goal, but recognize that they approach it differently, and this understanding of sub-agency allows Sam to reach his goal more efficiently.
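One way to picture "different goals" in code: each kind of piece gets its own private reward, which is then blended with Sam's shared win/loss outcome. This is only a sketch in Python under my own assumptions. The feature names, the weight of 3.0 on the king's fear, and the `own_weight` blend are all invented for illustration.

```python
def pawn_goal(features):
    # Pawns want to advance toward promotion: reward ranks gained this move.
    return float(features["rank_after"] - features["rank_before"])


def king_goal(features):
    # Kings want to avoid enemy pieces, and care about it more than anyone else does.
    return -3.0 * features["attackers_after"]


GOALS = {"pawn": pawn_goal, "king": king_goal}


def piece_reward(kind, features, game_outcome, own_weight=0.3):
    """Blend a piece's private goal with Sam's overarching goal (the game outcome)."""
    return own_weight * GOALS[kind](features) + (1.0 - own_weight) * game_outcome


# A pawn that advanced two ranks in a game Sam went on to win (+1.0):
pawn_reward = piece_reward(
    "pawn", {"rank_before": 2, "rank_after": 4}, game_outcome=1.0, own_weight=0.5
)

# A king that ended its move attacked by two pieces, in that same won game:
king_reward = piece_reward(
    "king", {"attackers_after": 2}, game_outcome=1.0, own_weight=0.5
)
```

With these numbers the pawn comes out happy and the king unhappy about the same won game, which is the kind of internal disagreement the B agent would have to weigh.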
Think about how your mind works. While you might have a goal in mind at any one time, there are still a multitude of other goals, all clamoring for attention in your head: eat food, put more toner in the printer, make babies, mow the lawn, start dieting, punch your boss in the face. At any given moment, we are shuffling priorities, sometimes taking stock of our current situation and attempting to determine which goal is most pertinent at the time.
In our AI efforts to date, though, we don't seem to recognize this. Our agency is not the result of a singular agent, but a multitude of sub-agents, which ultimately serve a higher power - the self - which decides which sub-agent to indulge at the moment. Selfhood is the agent of agency, through which sub-agents attain their agency by making the agent's goal their goal. We even create new agents or destroy old ones, as we come to epiphanies about ourselves and our goals in our lives - the goal of the self changes, which is reflected in the hierarchy of sub-agents.
So, yes, the end result is somewhat messier. But it allows for a significantly different approach than standard AI approaches to date. I think it also helps clarify some things in human cognition, such as internal conflicts.
Friday, May 10, 2013