
In document Analisis Factorial CENEVAL (página 95-104)

This year’s event was the second annual Computer Poker Competition, in which teams submit autonomous agents to play Limit and No-Limit Heads-Up Texas Hold’em. This year, 15 competitors from 7 countries submitted 43 agents to three different tournaments. This is the world’s only public annual competition for poker programs; the agents in the competition are the strongest known programs in the world. The Computer Poker Competition consisted of three Heads-Up Texas Hold’em tournaments: Limit Equilibrium, Limit Online Learning, and No-Limit.

7.2.1 Heads-Up Limit Equilibrium

Since finding an ε-Nash equilibrium strategy is an interesting challenge in its own right, one of the tournaments is designed to determine which player is closest to the Nash equilibrium of the real game. In this competition, every pair of competitors plays a series of several duplicate matches, and the winner of the series receives one point. The players are then ranked by their total number of points. This format rewards players for not losing: it does not matter how much they win by, only that they win.
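To make this scoring rule concrete, here is a minimal Python sketch of ranking by series wins. The crosstable format (`winnings[a][b]` holding a’s net result against b over the whole series), the function name `rank_by_series_wins`, and the treatment of drawn series are our assumptions for illustration, and the numbers are invented toy data, not competition results.

```python
from itertools import combinations


def rank_by_series_wins(winnings):
    """Rank players by the number of head-to-head series they won.

    winnings[a][b] is a's net result against b over their whole series of
    duplicate matches (assumed antisymmetric: winnings[b][a] == -winnings[a][b]).
    Only the sign matters: a win by any margin earns exactly one point.
    """
    players = list(winnings)
    points = {p: 0 for p in players}
    for a, b in combinations(players, 2):
        if winnings[a][b] > 0:
            points[a] += 1
        elif winnings[a][b] < 0:
            points[b] += 1
        # A drawn series awards no point (an assumption for this sketch).
    # sorted() is stable, so tied players keep their input order.
    ranking = sorted(players, key=lambda p: points[p], reverse=True)
    return ranking, points


# Hypothetical crosstable (toy numbers, not competition data).
toy = {
    "A": {"B": 5, "C": 1},
    "B": {"A": -5, "C": 300},
    "C": {"A": -1, "B": -300},
}
print(rank_by_series_wins(toy))  # (['A', 'B', 'C'], {'A': 2, 'B': 1, 'C': 0})
```

In the toy crosstable, B’s large win over C is worth exactly as much as A’s narrow wins, so A ranks first even though B has the larger total winnings.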

In this competition, the CPRG entered an agent called Hyperborean07EQ. It is an ε-Nash equilibrium strategy that plays in a 10-bucket nested abstraction (5 E[HS²] sets, each split into 2 E[HS] buckets), and it is 2.27 mb/game exploitable in its own abstraction. This strategy was created with the Counterfactual Regret Minimization technique, using four CPUs over 14 days.
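The nested bucketing can be sketched as follows. This is a minimal illustration that assumes E[HS²] and E[HS] have already been computed for every hand and that each split produces equal-sized (percentile) groups; the helper names `percentile_split` and `nested_buckets` are hypothetical, and the real abstraction may place its bucket boundaries differently.

```python
def percentile_split(indices, key, n_groups):
    """Split `indices` into n_groups nearly equal groups, ordered by key."""
    ordered = sorted(indices, key=key)
    groups = []
    for g in range(n_groups):
        lo = g * len(ordered) // n_groups
        hi = (g + 1) * len(ordered) // n_groups
        groups.append(ordered[lo:hi])
    return groups


def nested_buckets(hs2, hs, n_outer=5, n_inner=2):
    """Assign each hand a bucket in a nested E[HS^2] / E[HS] abstraction.

    hs2[i] and hs[i] are precomputed E[HS^2] and E[HS] values for hand i.
    Hands are first split into n_outer sets by E[HS^2]; each set is then
    split into n_inner buckets by E[HS] (equal-sized splits are assumed).
    """
    bucket = [None] * len(hs2)
    outer = percentile_split(range(len(hs2)), lambda i: hs2[i], n_outer)
    for o, group in enumerate(outer):
        inner = percentile_split(group, lambda i: hs[i], n_inner)
        for k, subgroup in enumerate(inner):
            for i in subgroup:
                bucket[i] = o * n_inner + k
    return bucket


# Toy values for 20 hands (not real hand-strength data).
hs2 = [i / 20 for i in range(20)]
print(nested_buckets(hs2, hs2))
```

Bucket indices run from 0 to `n_outer * n_inner - 1`, so the defaults give the 10 buckets described above.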

The results of the Limit Equilibrium competition are presented in Table 7.1. Hyperborean07EQ took first place in this competition. It did not lose a series of matches against any of its competitors, and had the highest average win rate of any competitor.

7.2.2 Heads-Up Limit Online

In the Heads-Up Limit Online tournament, players are rewarded for exploiting their opponents. Every pair of competitors plays a series of several duplicate matches, and the players are then ordered by their total winnings over all of their matches. The bottom 1/3 of competitors is eliminated to remove extremely exploitable players, and the remaining 2/3 are then ranked by their total winnings against the other remaining competitors.
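The two-stage ranking can be written as a short Python sketch over the same kind of crosstable. The function names, the `winnings[a][b]` format, and rounding the eliminated third down when the field size is not divisible by three are our assumptions; the data is a toy example.

```python
def total_winnings(winnings, player, pool):
    """Sum of player's results against every other member of pool."""
    return sum(winnings[player][q] for q in pool if q != player)


def online_ranking(winnings):
    players = list(winnings)
    # Stage 1: order the whole field by total winnings.
    by_total = sorted(players,
                      key=lambda p: total_winnings(winnings, p, players),
                      reverse=True)
    # Stage 2: drop the bottom third (rounded down, an assumption).
    survivors = by_total[: len(players) - len(players) // 3]
    # Stage 3: re-rank survivors by winnings against other survivors only.
    return sorted(survivors,
                  key=lambda p: total_winnings(winnings, p, survivors),
                  reverse=True)


# Hypothetical crosstable (toy numbers, not competition data).
toy = {
    "A": {"B": -10, "C": 100},
    "B": {"A": 10, "C": 20},
    "C": {"A": -100, "B": -20},
}
print(online_ranking(toy))  # ['B', 'A']
```

A’s large win over C counts in the first stage but disappears once C is eliminated, so B finishes ahead in the final ranking.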

In this competition, the CPRG entered two agents. The first, Hyperborean07OL, is an ε-Nash equilibrium strategy that plays in a 10-bucket non-nested abstraction (10 E[HS²] buckets) and is 4.33 mb/game exploitable in its own abstraction. Hyperborean07OL was created with the Counterfactual Regret Minimization technique, using four CPUs over 14 days. The other CPRG entry, Hyperborean07OL-2, was created by a separate branch of research pursued by Billings and Kan. They describe it as a “quasi-equilibrium” that is better able to exploit weak opponents than regular ε-Nash equilibrium strategies.

The full results of the Limit Online competition are presented in Table 7.2. As described above, the bottom 1/3 is removed, resulting in the crosstable shown in Table 7.3. The competitors are ranked according to their winnings in this second, smaller table. The two CPRG entries, Hyperborean07OL-2 and Hyperborean07OL, took first and second place respectively, with a statistically insignificant margin between them.

There is an interesting result here, however. The first-place agent, Hyperborean07OL-2, lost to the next three top-ranked agents, but won enough from the remaining opponents to still take first place. The second-place agent, Hyperborean07OL, did not lose to any opponent, but its average winnings fell just short of first place. This result emphasizes one of the features that makes poker an interesting game: exploitation is important. In this tournament, we have seen that an agent can lose to several opponents and still win overall if it is better at exploiting the weak players. Had we entered a team of several RNR counter-strategies and Hyperborean07OL, as described in Chapter 6, we might have performed better than either Hyperborean07OL or Hyperborean07OL-2.

7.2.3 No-Limit

In the No-Limit tournament, players are rewarded for exploiting the strongest opponents. Every pair of competitors plays a series of several duplicate matches, and the players are then ordered by their total winnings over all of their matches. To find the winner, we repeatedly eliminate the player with the lowest total winnings against players that have not yet been eliminated.
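The iterated-elimination rule, and how it can disagree with a plain total-winnings ranking (the kind of reversal discussed for Gomel and SlideRule later in this section), can be sketched in Python. The crosstable format, the function names, and breaking ties by keeping the first-listed player are our assumptions; the numbers are toy data.

```python
def elimination_ranking(winnings):
    """Repeatedly eliminate the player with the lowest total winnings against
    the players still remaining; the last player left wins.
    Returns players ordered from winner to first eliminated."""
    remaining = list(winnings)
    eliminated = []
    while len(remaining) > 1:
        # min() keeps the first of several tied players (tie-break assumption).
        worst = min(remaining,
                    key=lambda p: sum(winnings[p][q] for q in remaining if q != p))
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining + eliminated[::-1]


def total_ranking(winnings):
    """Plain ranking by total winnings against the whole field, for comparison."""
    return sorted(winnings, key=lambda p: sum(winnings[p].values()), reverse=True)


# Hypothetical crosstable: A is consistent, G exploits the weak player W.
toy = {
    "A": {"G": 10, "W": 20},
    "G": {"A": -10, "W": 200},
    "W": {"A": -20, "G": -200},
}
print(elimination_ranking(toy))  # ['A', 'G', 'W']
print(total_ranking(toy))        # ['G', 'A', 'W']
```

G has the highest total winnings because of its large win over W, but once W is eliminated, G’s loss to A decides the final pairing.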

Before this competition, the CPRG had never created a No-Limit agent. The agent we entered, Hyperborean07, uses another ε-Nash equilibrium strategy made by the Counterfactual Regret Minimization technique. It plays in an 8-bucket abstraction (8 E[HS²] buckets) and considers only four actions: fold, call, pot-raise, and all-in. A pot-raise is a raise equal to the current size of the pot.
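The four-action abstraction can be sketched as a mapping from abstract actions to chip amounts. We assume that `pot` is the pot before the acting player moves, that a pot-raise means matching the current bet and then raising by that pot amount, and that a pot-raise larger than the remaining stack becomes an all-in; these conventions, and the names `Action` and `chips_for`, are hypothetical.

```python
from enum import Enum


class Action(Enum):
    FOLD = "fold"
    CALL = "call"
    POT_RAISE = "pot-raise"
    ALL_IN = "all-in"


def chips_for(action, pot, to_call, stack):
    """Chips the acting player adds to the pot for an abstract action.

    pot: chips in the pot before this action (assumption)
    to_call: chips needed to match the current bet
    stack: chips the player has left
    """
    if action is Action.FOLD:
        return 0
    if action is Action.CALL:
        return min(to_call, stack)
    if action is Action.POT_RAISE:
        # Match the bet, then raise by the current pot size; a raise that
        # does not fit in the stack is treated as all-in (assumption).
        return min(to_call + pot, stack)
    if action is Action.ALL_IN:
        return stack
    raise ValueError(f"unknown action: {action}")


print({a.name: chips_for(a, pot=100, to_call=50, stack=1000) for a in Action})
# {'FOLD': 0, 'CALL': 50, 'POT_RAISE': 150, 'ALL_IN': 1000}
```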

The results of the No-Limit competition are presented in Table 7.4. The CPRG entry, Hyperborean07, took third place, losing to BluffBot20NoLimit1 and GS3NoLimit1. Of the top three agents, Hyperborean07 obtained the highest average score, but it was defeated by the other two. Once again, we found an interesting pattern in the results. The players SlideRule, Gomel, and Gomel-2 were able to exploit their opponents to a much larger degree than the other competitors. Even though they lost several of their matches, their large wins against weaker opponents such as PokeMinn and Manitoba-2 meant that their average performance was higher than that of the top three.

This competition was designed to reward consistent play, but under a different winner determination rule the ranking could have been very different, putting Gomel or SlideRule in first place.

7.2.4 Summary

In the AAAI 2007 Computer Poker Competition, the agents fielded by the CPRG competed against the world’s best artificial poker agents and made a strong showing. In the Limit events, which have been our focus since as early as 2003, we took first place twice and second place once. One of the first-place finishes and the second-place finish went to agents created with the techniques described in this thesis. This strong showing provides experimental support for our claims about the applicability and usefulness of these techniques.

The introduction of the No-Limit event and our third-place finish in it affirm that No-Limit is a significantly different game from Limit. While our approaches to finding equilibria and computing counter-strategies are still valid in this game, we believe that large performance improvements can be realized by changing the abstract game in which we compute our strategies. In Section 8.1, we present our current ideas for future directions that this research can take to accommodate the new challenges presented by No-Limit Heads-Up Texas Hold’em.
