In this document, we describe the problem of planning and the solution used in Grail. If you want to jump straight into coding, see the Planners: Quickstart page in the HOWTOs section.
Planning is a branch of Artificial Intelligence research concerned with finding appropriate sequences of actions needed to achieve stated goals given an initial world state. The action sequences leading to a stated goal are called plans.
The role of planning algorithms is to find not just any plan, but the optimal one (according to hard-coded or user-provided requirements). This might mean finding the shortest plan, but it might also mean minimizing a more abstract action-execution cost metric.
In this section we provide a high level description of planning algorithms, without delving into implementation details or differences between various existing approaches. The goal of this section is to give you a basic understanding required for using planners.
All planning algorithms require the concept of state (or world state). The world state contains all the information needed to determine whether a given action may be performed. For example, a simple state representation may be a pair of string arrays: one listing the items in your inventory, the other listing the items lying around in the game world.
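Such a representation could be sketched like this (the class and field names here are illustrative, not Grail's API):

```python
from dataclasses import dataclass

# A minimal sketch of a world state: two collections of item names.
# Frozen/immutable, so states can be hashed and compared for duplicates.
@dataclass(frozen=True)
class WorldState:
    inventory: frozenset[str]    # items the agent carries
    world_items: frozenset[str]  # items lying around in the game world

state = WorldState(
    inventory=frozenset({"axe"}),
    world_items=frozenset({"tree", "rock"}),
)
```

Making the state immutable and hashable pays off later, when the planner needs to detect that two action sequences led to the same state.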
Another key concept of planning algorithms is action. An action is represented by a list of preconditions and a list of effects. Preconditions determine whether the action can be executed in a given world state. Effects define the world state transformations applied when the action is executed. For example, a "CutDown" action may be described as follows:
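In code, this idea can be sketched as follows, assuming states are simply sets of string facts (the Action class and the fact names are illustrative, not Grail's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset[str]   # facts that must hold in the state
    effects_add: frozenset[str]     # facts added by executing the action
    effects_remove: frozenset[str]  # facts removed by executing the action

    def applicable(self, state: frozenset[str]) -> bool:
        # The action may be executed only if all preconditions hold.
        return self.preconditions <= state

    def apply(self, state: frozenset[str]) -> frozenset[str]:
        # Transform the state: drop removed facts, add new ones.
        return (state - self.effects_remove) | self.effects_add

cut_down = Action(
    name="CutDown",
    preconditions=frozenset({"has_axe", "tree_nearby"}),
    effects_add=frozenset({"has_wood"}),
    effects_remove=frozenset({"tree_nearby"}),
)
```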
Applying the action may be visualized as follows:
In the remaining part of this document, actions will be simply represented as arrows with names on top of them, to avoid clutter.
Planner goals are defined as a set of conditions, in a way similar to action preconditions. Whenever the planner stumbles upon a state that satisfies all of the conditions, the best plan leading to this state is returned.
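Under the same fact-set representation of states, the goal check itself is a simple subset test (illustrative code, not Grail's API):

```python
def satisfies(state: frozenset, goal: frozenset) -> bool:
    # A state satisfies the goal when every goal condition holds in it.
    return goal <= state
```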
The set of all possible states is called a state space. In any state there can be (and usually there is) more than one action available. Let’s consider state space search on a more complex case (the initial state is marked in blue):
But wait! There are duplicate states! We can simplify our diagram by merging them:
Now the state space representation resembles something we all know and love: a graph! It has become apparent that planning is nothing more than a pathfinding problem on a very peculiar graph. This means that we can use our favourite pathfinding algorithm - A*.
There is one important detail, though. The state space graph is not known upfront, but has to be discovered on the go. This means that the algorithm has to generate the graph on the fly, check for duplicate states and merge them, just like we did in our example. This makes the problem a little bit harder, but not prohibitively so.
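The whole idea can be sketched as follows, assuming states are frozensets of facts and each action is a (name, preconditions, add-effects, remove-effects, cost) tuple; the function and variable names are illustrative, not Grail's API:

```python
import heapq
import itertools

def find_plan(initial, goal, actions, heuristic=lambda state, goal: 0):
    """A* search over the implicit state-space graph: successors are
    generated on the fly, and `best_cost` merges duplicate states."""
    tiebreak = itertools.count()  # keeps heap entries comparable
    frontier = [(heuristic(initial, goal), next(tiebreak), 0, initial, [])]
    best_cost = {initial: 0}      # best known cost per discovered state
    while frontier:
        _, _, g_cost, state, path = heapq.heappop(frontier)
        if goal <= state:         # every goal condition is satisfied
            return path
        # Discover the graph on the fly: apply every applicable action.
        for name, pre, add, rem, cost in actions:
            if pre <= state:
                nxt = (state - rem) | add
                new_cost = g_cost + cost
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost  # merge duplicate state
                    priority = new_cost + heuristic(nxt, goal)
                    heapq.heappush(
                        frontier,
                        (priority, next(tiebreak), new_cost, nxt, path + [name]),
                    )
    return None                    # goal unreachable

# A tiny example domain (all names illustrative):
actions = [
    ("GetAxe", frozenset(), frozenset({"has_axe"}), frozenset(), 1),
    ("CutDown", frozenset({"has_axe", "tree_nearby"}),
     frozenset({"has_wood"}), frozenset({"tree_nearby"}), 1),
]
```

With the default zero heuristic this degenerates to Dijkstra's algorithm; a domain-specific heuristic, discussed below, is what makes it A* proper.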
To make our problem more pathfinding-like, we might want to add lengths/weights to our edges/actions. This way, the planning algorithm will be able to find not only the shortest possible plan, but to optimize for other, more arbitrary metrics (like mana cost, fatigue etc.). You can also try to model agent preference using action costs. For example, if you want your agent to be aggressive, just assign lower costs to offensive actions - the agent will take the pacifist route only if it has no other choice.
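As a sketch of this preference modelling, consider a hypothetical cost table (the action names and values here are purely illustrative):

```python
# Hypothetical costs biasing an agent toward aggression: lower cost
# means the planner prefers that route when several plans reach the goal.
action_costs = {"Attack": 1.0, "Intimidate": 2.0, "Negotiate": 5.0, "Bribe": 6.0}

def plan_cost(plan: list[str]) -> float:
    """Total cost of a plan: the sum of its actions' costs."""
    return sum(action_costs[action] for action in plan)
```

With these weights the aggressive route is cheaper than the pacifist one, so a cost-minimizing planner will pick it whenever both lead to the goal.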
As you can probably imagine, the state space can grow REALLY big given enough actions and possible state parameters. Searching such a space may prove very difficult, or even practically impossible. Luckily, we can guide our algorithm by using a heuristic, just like we're used to in traditional pathfinding problems.
A good heuristic should give an estimate of abstract distance from the current state to a goal state. Of course, there might be many states that satisfy our goal conditions and the concept of distance itself is not so trivial in this weird space. But in many instances, given enough domain knowledge, it’s possible to devise a reasonable heuristic.
If we want the algorithm to find a strictly optimal plan, our heuristic has to satisfy the admissibility criterion: it must never overestimate the cost of the optimal plan from the current state to the goal, i.e. its estimates must be lower bounds on the true remaining cost. Sometimes you may want to sacrifice guaranteed optimality for shorter computation times and use a heuristic that's not admissible in the strict sense, but you should always make this decision consciously.
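For fact-set states, one simple candidate heuristic is the number of unmet goal conditions. Assuming unit action costs and that each action establishes at most one goal condition, it never overestimates the remaining cost and is therefore admissible (illustrative code, not Grail's API):

```python
def unmet_goals(state: frozenset, goal: frozenset) -> int:
    # Each unmet goal condition needs at least one more action, so
    # under the stated assumptions this is a lower bound on plan length.
    return len(goal - state)
```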
Sometimes goal conditions may be so difficult to attain that there’s not enough time for our agent to formulate the plan in one go. We don’t want them to just stand there thinking for 10 seconds straight, we need them to act!
In such a case we can try using partial plans. If we provide the algorithm with a good heuristic (in other words, a good estimate of how close a state is to satisfying the goal), then, when it fails to find a complete plan, it can return the partial plan expected to bring the agent as close to the stated goals as possible.
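A greedy best-first sketch of this idea (illustrative, not Grail's API), assuming states are frozensets of facts and actions are (name, preconditions, add-effects, remove-effects) tuples; when the expansion budget runs out, it falls back to the plan leading to the most promising state seen so far:

```python
import heapq
import itertools

def plan_partial(initial, goal, actions, heuristic, max_expansions=1000):
    """Greedy best-first search with a budget. Returns (plan, complete):
    a full plan if the goal was reached, otherwise the partial plan
    whose end state had the best heuristic estimate."""
    tiebreak = itertools.count()
    frontier = [(heuristic(initial, goal), next(tiebreak), initial, [])]
    seen = {initial}
    best_h, best_path = heuristic(initial, goal), []
    expansions = 0
    while frontier and expansions < max_expansions:
        h, _, state, path = heapq.heappop(frontier)
        if goal <= state:
            return path, True               # complete plan found
        if h < best_h:
            best_h, best_path = h, path     # best partial plan so far
        expansions += 1
        for name, pre, add, rem in actions:
            if pre <= state:
                nxt = (state - rem) | add
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(
                        frontier,
                        (heuristic(nxt, goal), next(tiebreak), nxt, path + [name]),
                    )
    return best_path, False                 # budget exhausted or dead end

# A tiny illustrative domain and heuristic:
example_actions = [
    ("GetAxe", frozenset(), frozenset({"has_axe"}), frozenset()),
    ("CutDown", frozenset({"has_axe", "tree_nearby"}),
     frozenset({"has_wood"}), frozenset({"tree_nearby"})),
]
unmet = lambda state, goal: len(goal - state)
```

The agent can start executing the partial plan immediately and re-plan from wherever it ends up, spreading the expensive search over several shorter sessions.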