Simulated Games: Simple Offline Learning Example

This page presents a fully working example of a very simple offline learning process using SimulatedGame.

It is recommended to read the Simulated Games and Simulated Games: offline learning manual pages first.

We will use the same game as defined in Simulated Games: Simple Game with Randomness. Make sure you are familiar with the description on that page.

To enable offline learning, an implementation of the IVectorizer interface must be provided. Our goal is to train a decision tree that will contain two kinds of actions:

  1. ChooseDieAction - when in state PlayerState::CHOOSE_DIE

  2. BuyWeaponAction - when in state PlayerState::CHOOSE_WEAPON

The IsLearnableSituation method always returns true, because we will attach the state watcher to the regular Player (not to the random DiePlayer), which has only those two actions available. Every situation in which this player is allowed to make an action is a situation we want to include in the decision tree.

The Vectorize method encodes two aspects of the game state that are stored in the Player instance. One is the state, to distinguish between PlayerState::CHOOSE_DIE and PlayerState::CHOOSE_WEAPON; the other is the player's money. The money will be used to determine the decision of BuyWeaponAction.

  • C++

  • C#

//.h
class GameRandomnessVectorizer : public IVectorizer
{
public:
  GameRandomnessVectorizer(Player* player);
  virtual bool IsLearnableSituation(const ISimulatedGameUnit& unit) const override;
  virtual std::vector<float> Vectorize() const override;
  virtual ~GameRandomnessVectorizer() = default;

private:
  const Player* player;
};

//.cpp
GameRandomnessVectorizer::GameRandomnessVectorizer(Player* player)
  : player{ player }
{
}

bool GameRandomnessVectorizer::IsLearnableSituation(const ISimulatedGameUnit&) const
{
  return true;
}

std::vector<float> GameRandomnessVectorizer::Vectorize() const
{
  return std::vector<float> { static_cast<float>(player->state), static_cast<float>(player->money) };
}
public class GameRandomnessVectorizer : IVectorizer
{
  private readonly Player player;

  public GameRandomnessVectorizer(Player player)
  {
    this.player = player;
  }

  public bool IsLearnableSituation(in ISimulatedGameUnit unit)
  {
    return true;
  }

  public IEnumerable<float> Vectorize()
  {
    return new float[2] { (float)player.State, player.Money };
  }
}

Next, implement the EqualsForLearning and HashForLearning methods in C++, or the Equals and GetHashCode methods in C#. You only need to do this for actions that will be included in the decision tree. Therefore, we omit the actions performed by the DiePlayer, i.e., rolling dice.

Please note that in our implementation of this game, we create each type of action only once and reuse the objects. In C#, actions are reference types, and since the language's default implementation of Equals and GetHashCode compares references, we do not need to write anything; we get this functionality out of the box. However, if logically equal actions could be represented by distinct instances of ISimulatedGameAction, then we would have to implement the Equals and GetHashCode methods as well (see the sketch after the C++ listing below).
size_t BuyWeaponAction::HashForLearning() const
{
  return static_cast<size_t>(100 + WeaponAttackValue);
}

bool BuyWeaponAction::EqualsForLearning(const ISimulatedGameAction& other) const
{
  //we have the perfect hash
  return HashForLearning() == other.HashForLearning();
}

size_t ChooseDieAction::HashForLearning() const
{
  return static_cast<size_t>(ChosenDie->GetSideCount());
}

bool ChooseDieAction::EqualsForLearning(const ISimulatedGameAction& other) const
{
  //we have the perfect hash
  return HashForLearning() == other.HashForLearning();
}

// the same procedure applies for C#, but for Equals() and GetHashCode() methods
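
For illustration, here is a minimal C# sketch of what such overrides could look like. This is hypothetical: the example game does not need it, because its actions are singletons and reference equality suffices.

//Hypothetical: inside the C# BuyWeaponAction class (other members omitted)
public override int GetHashCode()
{
  //perfect hash, mirroring the C++ HashForLearning
  return 100 + WeaponAttackValue;
}

public override bool Equals(object obj)
{
  //with a perfect hash, equal hashes imply equal actions
  return obj is BuyWeaponAction other && GetHashCode() == other.GetHashCode();
}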

The code for learning itself is relatively short:

  • C++

  • C#

auto watcher = std::make_unique<GameRandomnessVectorizer>(pPlayer);
auto learner = std::make_shared<OfflineLearner>(std::move(watcher));
pPlayer->offlineLearners.push_back(learner);

//Note that these numbers can be really large! It's offline learning.
game.Run(120000, 2000000);

//SampleDataset is a UniqueTreeDataset where each distinct vectorized state appears only once with some metadata from the tree it was generated from.
auto sampleDataset = pPlayer->offlineLearners[0]->GetSamplesDataset(0.1); //states visited in at least 10% of simulations will contribute to learning

//The regular dataset allows for multiples. For example, the same vectorized state may come from different runs of the game.
//Each such sample can have different decision (action) regarded as the best one.
auto dataset = sampleDataset->ConvertToDataset(DTConsiderationType::NOMINAL, DTConsiderationType::NOMINAL); //equivalent to (DTConsiderationType::NOMINAL, 2)

//The decision tree requires an IVectorizer not for learning but for actual usage.
//It will use Vectorize() to sample the state; the vectorized form is the input to the decision tree.
DecisionTree tree(std::make_unique<GameRandomnessVectorizer>(pPlayer));
tree.Construct(*dataset);
tree.Print();
var vectorizer = new GameRandomnessVectorizer(player);
player.OfflineLearners.Add(new OfflineLearner(vectorizer));

//Note that these numbers can be really large! It's offline learning.
game.Run(120000, 2000000);

//SampleDataset is a UniqueTreeDataset where each distinct vectorized state appears only once with some metadata from the tree it was generated from.
var sampleDataset = player.OfflineLearners[0].GetSamplesDataset(0.1); //states visited in at least 10% of simulations will contribute to learning

//The regular dataset allows for multiples. For example, the same vectorized state may come from different runs of the game.
//Each such sample can have different decision (action) regarded as the best one.
var dataset = sampleDataset.ConvertToDataset(DecisionConsiderationType.NOMINAL, DecisionConsiderationType.NOMINAL); //equivalent to (DecisionConsiderationType.NOMINAL, 2)

//The decision tree requires an IVectorizer not for learning but for actual usage.
//It will use Vectorize() to sample the state; the vectorized form is the input to the decision tree.
DecisionTree tree = new DecisionTree(vectorizer);
tree.Construct(dataset);
tree.Print();

In order to enable text serialization, we will provide a class that can serialize the actions defined in the game. The implementation uses hashing: since we have a perfect hash function (no collisions), we can reconstruct an action based only on its hash.

  • C++

  • C#

//.h
class GameRandomnessActionStringSerializer : public IDecisionStringSerializer<ISimulatedGameAction>
{
public:
  GameRandomnessActionStringSerializer(Player* player);
  virtual ~GameRandomnessActionStringSerializer() = default;

  virtual std::string Serialize(const ISimulatedGameAction& decision) override;
  virtual std::unique_ptr<ISimulatedGameAction> Deserialize(std::string decisionString) override;

private:
  Player* player;
};

//.cpp
GameRandomnessActionStringSerializer::GameRandomnessActionStringSerializer(Player* player)
  : player{ player }
{
}

std::string GameRandomnessActionStringSerializer::Serialize(const ISimulatedGameAction& decision)
{
  return std::to_string(decision.HashForLearning());
}

std::unique_ptr<ISimulatedGameAction> GameRandomnessActionStringSerializer::Deserialize(std::string decisionString)
{
  int hash = std::stoi(decisionString);
  if (hash <= 6)
  {
    return std::make_unique<ChooseDieAction>(player->GetDiePlayer(hash/3));
  }
  else if (hash >= 100)
  {
    return std::make_unique<BuyWeaponAction>(hash - 100);
  }
  return nullptr;
}
public class GameRandomnessActionStringSerializer : IDecisionStringSerializer<ISimulatedGameAction>
{
  private Player player;

  public GameRandomnessActionStringSerializer(Player player)
  {
    this.player = player;
  }

  public ISimulatedGameAction Deserialize(string decisionString)
  {
    int hash = int.Parse(decisionString);
    if (hash <= 6)
    {
      return new ChooseDieAction(player.GetDiePlayer(hash / 3));
    }
    else if (hash >= 100)
    {
      return new BuyWeaponAction(hash - 100);
    }
    return null;
  }

  public string Serialize(ISimulatedGameAction decision)
  {
    return decision.GetHashCode().ToString();
  }
}

Once you have an action serializer, the serialization is as easy as:

  • C++

  • C#

DecisionTreeStringListSerializer<ISimulatedGameAction> serializer(std::make_unique<GameRandomnessActionStringSerializer>(pPlayer));
tree.Serialize(serializer);
DecisionTreeStringListSerializer<ISimulatedGameAction> serializer = new DecisionTreeStringListSerializer<ISimulatedGameAction>(new GameRandomnessActionStringSerializer(player));
tree.Serialize(serializer);
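
Reading the tree back is symmetrical: while the tree is deserialized, the action serializer's Deserialize reconstructs each action from its stored hash. A minimal sketch, assuming DecisionTree exposes a matching Deserialize method (this call is an assumption, not an API shown above):

//Hypothetical round trip; DecisionTree.Deserialize is an assumed API
var restored = new DecisionTree(new GameRandomnessVectorizer(player));
restored.Deserialize(serializer);
restored.Print();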