Designing Fitness Functions for Evolutionary Algorithms in Game Development

Introduction

Designing an effective evaluation function is often the most challenging aspect when applying evolutionary algorithms for optimization. This requires thinking about the desired outcomes in numeric terms. The value of the fitness function indicates the extent to which your design goals have been achieved.

The idea of a fitness function

Types of Fitness Functions

Fitness functions can be categorized in two main ways:

Based on Determinism

  • Deterministic: A given phenotype consistently receives the same fitness score.

    • Example: Finding the shortest path in a graph, with fitness defined as the inverse of total distance.

  • Non-deterministic: Different evaluations might assign varying fitness scores to the same individual due to random elements in the calculation procedure.

    • Example: Fitness score based on the number of frags achieved in one deathmatch in an FPS game.

Based on Evaluation Method

  • Direct (absolute): Assesses an individual directly based on its attributes, without requiring comparison with other individuals.

    • Example: The inverse of total distance in the shortest path problem.

  • Relative: Fitness results from interactions between two or more individuals in the population.

    • Example: Tournament setup where each individual competes in five 1-on-1 matches against randomly selected opponents. The fitness value can be then calculated as the number of wins divided by five.

Deterministic & absolute fitness functions are best for speed and convergence of evolutionary algorithms. However, game development typically involves non-deterministic & relative fitness functions.

Examples of Fitness Functions in Game Development

Game Type & Optimization Objective Fitness Function Idea

Optimizing weapon parameters in an FPS game

Series of head-to-head battles between standard bots using different weapons. Fitness based on how closely actual win rates match expected win rates.

Generating interesting map layouts in dungeon-crawler/rogue-like/action RPG games

Bot completion time within a maximum limit. Fitness based on completion time or number of interactions.

Balancing unit parameters in an RTS game

Multiple matches between factions. Fitness based on how close win rates are to 0.5, with bonus for unit utilization.

Optimizing resource placement in multiplayer games

Similar to RTS example, but encoding different sets of parameters.

Finding challenging enemy configurations in roguelike bullet hell games

Fitness based on number of enemy units or player’s remaining health after victory.

Constructing optimal racing tracks

Fitness based on lap time, with a constraint on track length.

Optimizing gravity in platformer games

Fitness based on level completability and jump difficulty distribution.

Balancing in-game economies

Fitness based on how well simulations satisfy predefined economic rules.

Fine-tuning bot shooting accuracy in FPS games

Fitness based on how closely in-game accuracy matches set parameters.

Reliance of Human Players in Fitness Computations

In general, this evaluation procedure may involve human players but this makes both the implementation non-standard and the process to take extensively long.

It would be best if you could define an evaluation function that only requires to run bots.

What if you can’t?

What if you really need to evaluate your game against humans?

In such a case I would recommend defining the fitness value of a given individual (representing a given set of parameters) as an average value of outcomes from games played by humans against this particular individual. By doing it this way, we can compare results against a hopefully normal distribution of players and the more human players will play, the more confident the fitness value will be.

Let us estimate the effort.

The number of evaluations needed depends on the number of children produced in each iteration and the number of iterations.

The population size should depend on the problem complexity, but we recommend having a number between 60 and 300, because these are the sizes tested the most in various problems in the literature. For the number of iterations, we recommend from 40 to 200.

Let’s make a modest assumption of 80 iterations and 60 children produced per iteration. This requires 4800 evaluations, where each evaluation will be typically a game against a human player.

Let’s assume that one game episode lasts for 30 minutes making the whole workload equal to 2400 man-hours. Assuming an 8-hour work day, it totals to 300 mandays. This is expensive but doable given enough number of participants and a parallelized setup.

However, bear in mind that we have made low-end assumptions about the population size and the number of iterations. It is really advised to run a bot-exclusive evolutionary optimization.