
Part of the author’s contribution to this work was to develop a fast, memory-efficient implementation of the algorithm. In this section, we briefly describe the speed and memory optimizations performed, and explain how the algorithm can be parallelized.

Speed and Memory improvements

Since the strategy profile approaches a Nash equilibrium as the number of iterations grows, it is important to traverse the game trees as efficiently as possible. It is possible to perform cutoffs at some points during the tree traversal, skipping the regret updates whenever the probability of the opponent reaching a particular history becomes 0. In practice, this is a very significant saving: if the opponent never raises with a weak hand in the Preflop, then we can avoid traversing large parts of the tree.
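The cutoff described above can be illustrated with a minimal sketch. The `Node` class, the `traverse` and `count_visited` functions, and the toy tree are all hypothetical names introduced here for illustration, not the original implementation; the sketch only counts visited nodes for one updating player and assumes that the regret updates below a history are weighted by the opponent's probability of reaching it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical tree node; a node with no children is a terminal."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Opponent's current probability of taking each action at this node.
    opp_strategy: Dict[str, float] = field(default_factory=dict)


def traverse(node: Node, opp_reach: float, visited: List[int], prune: bool) -> None:
    """Walk the tree, optionally skipping subtrees the opponent never reaches."""
    if prune and opp_reach == 0.0:
        # Every regret update below here would be weighted by zero (assumption
        # of this sketch), so the whole subtree can be skipped.
        return
    visited[0] += 1  # a real traversal would update regrets here
    for action, child in node.children.items():
        traverse(child, opp_reach * node.opp_strategy[action], visited, prune)


def count_visited(root: Node, prune: bool) -> int:
    """Return how many nodes a traversal touches, with or without the cutoff."""
    visited = [0]
    traverse(root, 1.0, visited, prune)
    return visited[0]


# Toy tree: the opponent never raises, so the "raise" subtree is never reached.
raise_subtree = Node(
    children={"a": Node(), "b": Node()},
    opp_strategy={"a": 0.5, "b": 0.5},
)
root = Node(
    children={"raise": raise_subtree, "call": Node()},
    opp_strategy={"raise": 0.0, "call": 1.0},
)
```

In this toy tree, the traversal with the cutoff touches only the root and the "call" child, while the traversal without it touches all five nodes.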

A careful implementation can also yield a large decrease in the memory requirements of the program. By designing our data structure to use as little memory as possible per node in the information set tree, we can make more memory available to solve larger abstractions. One “trick” of particular use in poker was to eliminate the terminal nodes from the tree. In our original implementation, every terminal node in the tree stored the amount of money in the pot (the utility for the game’s winner).

Due to the branching nature of trees, there are more terminal nodes than any other type of node. However, since Limit Heads-Up Texas Hold’em can only end in very few ways (between 1.5 and 24 small bets in the pot, for a limited number of showdowns or folds), we created only 138 unique terminal nodes and structured our tree to avoid duplication.

Parallel Implementation

To compute our CFR poker agents, we used a cluster of computers where each node has 4 CPUs and 8 gigabytes of RAM, connected by a fast network. However, no single node had enough main memory to store both the details of a 10-bucket abstraction and the two information set trees required to compute a strategy profile. Our original goal in designing a parallel version of the program was to store parts of the game tree on different computers to make use of distributed memory. While meeting that requirement, we found that the algorithm is easily modified to allow parallel traversals of the game trees when updating the counterfactual regret and action probabilities.

We start with the observation that Limit Hold’em has seven Preflop betting sequences that continue to the Flop (i.e. betting sequences without folds): check-call, check-bet-call, check-bet-raise-call, check-bet-raise-raise-call, bet-call, bet-raise-call, and bet-raise-raise-call. After one of these Preflop betting sequences, to perform the recursive regret and probability updates, we only require the portion of both players’ game trees that started with that Preflop betting sequence.

We use a client/server layout for our parallel approach. The server holds the details of the card abstraction and only the Preflop portion of both players’ information set trees. On each of seven additional computers, we store the portion of both players’ game trees that start with one of the Preflop betting sequences. At the start of each iteration, the server generates N joint bucket sequences, performs the update functions on its part of the game tree, and then contacts each of the clients. Each client is given the bucket sequences and the probability that each player would play that Preflop betting sequence, given the Preflop bucket. With this information, for each bucket sequence, the clients can run their update functions in parallel and contact the server when they are finished.
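The flow of one iteration can be sketched as follows. This is an illustration under stated assumptions, not the original program: threads stand in for the seven client computers, `generate_deals`, `client_update`, and `run_iteration` are hypothetical names, the caller supplies a `preflop_reach` function in place of the server's Preflop tree, and the client update is a stub that only counts the bucket sequences it would process.

```python
from concurrent.futures import ThreadPoolExecutor
import random
from typing import Callable, Dict, List, Tuple

PREFLOP_SEQUENCES = [
    "check-call", "check-bet-call", "check-bet-raise-call",
    "check-bet-raise-raise-call", "bet-call", "bet-raise-call",
    "bet-raise-raise-call",
]

Deal = Tuple[Tuple[int, ...], Tuple[int, ...]]  # one bucket per round, per player
Reach = Tuple[float, float]  # each player's probability of the Preflop sequence


def generate_deals(n: int, num_buckets: int, rng: random.Random) -> List[Deal]:
    """Server side: sample n joint bucket sequences (4 rounds per player)."""
    def one_player() -> Tuple[int, ...]:
        return tuple(rng.randrange(num_buckets) for _ in range(4))
    return [(one_player(), one_player()) for _ in range(n)]


def client_update(deals: List[Deal], reach: List[Reach]) -> int:
    """Client side (stub): count the deals that are not pruned."""
    updated = 0
    for _deal, (p1, p2) in zip(deals, reach):
        if p1 == 0.0 and p2 == 0.0:
            continue  # neither player plays this sequence: nothing to update
        updated += 1  # a real client would run its update functions here
    return updated


def run_iteration(
    n: int,
    num_buckets: int,
    preflop_reach: Callable[[str, Deal], Reach],
    seed: int = 0,
) -> Dict[str, int]:
    """One iteration: sample deals, then dispatch to one worker per sequence."""
    rng = random.Random(seed)
    deals = generate_deals(n, num_buckets, rng)
    # The server would update its own Preflop portion of the trees here.
    with ThreadPoolExecutor(max_workers=len(PREFLOP_SEQUENCES)) as pool:
        futures = {
            seq: pool.submit(
                client_update, deals, [preflop_reach(seq, d) for d in deals]
            )
            for seq in PREFLOP_SEQUENCES
        }
        # Wait for every client to report back before the next iteration.
        return {seq: f.result() for seq, f in futures.items()}
```

Each worker receives only the bucket sequences and the two Preflop probabilities for its own betting sequence, which mirrors the short message the server sends to each client.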

In theory, this algorithm is almost embarrassingly parallel. The server does a very fast iteration over its small Preflop game tree, and then contacts the clients with a short message over a fast network. Each of the clients then does the time-consuming part of the task in parallel, with no need for communication between peers, or even with the server, until the task is complete.

In practice, we get a sublinear speedup: 3.5x when using 8 CPUs, instead of the 7x speedup that we might expect. This is because not all of the Preflop betting sequences are equally likely. For example, the check-bet-raise-raise-call sequence indicates that both players have a strong hand, and are willing to repeatedly bet before seeing the Flop cards. The probability of both players following this betting sequence with a moderate hand may be zero, in which case the computation is pruned.

This means that the computer reserved for handling this betting sequence is not used to capacity. The computers responsible for the check-call, check-bet-call, and bet-call betting sequences are the bottlenecks as these betting sequences are likely to occur with most of the possible deals.
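The effect of this imbalance on the speedup can be made concrete with a small sketch. It assumes that an iteration finishes only when the busiest client finishes, and that client work dominates communication; the `parallel_speedup` function is a hypothetical helper, and the workloads in the usage note are invented illustrative values, not measurements.

```python
from typing import Sequence


def parallel_speedup(client_work: Sequence[float], server_work: float = 0.0) -> float:
    """Ratio of serial time to parallel time for one iteration.

    Serial time is the server work plus all client work; parallel time is the
    server work plus the work of the slowest (bottleneck) client.
    """
    serial = server_work + sum(client_work)
    parallel = server_work + max(client_work)
    return serial / parallel
```

With seven equally loaded clients, `parallel_speedup([1, 1, 1, 1, 1, 1, 1])` gives the ideal factor of 7. With an uneven, made-up split such as `[4, 2, 1, 1]`, the factor drops to 2 even though four clients are in use, because the total time is set by the client with the most work.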

Even with this load balancing issue, however, the parallel version of the program is of considerable use. It satisfies the main goal of increasing the available memory by giving us access to 64 gigabytes of RAM across 8 computers. Using this parallel version, we have solved games with as many as 12 buckets on each round to within 2 millibets/game of a Nash equilibrium.
