Article 1002![]()
Equilibrium Selection via Adaptation: Using Genetic Programming to Model Learning in a Coordination GameShu-Heng Chen, John Duffy, Chia-Hsuan Yeh National Chengchi University; University of Pittsburgh; I-Shou University |
Keywords: Adaptation, Genetic Programming, Coordination Game, Equilibrium Selection, Survival of the Fittest
JEL: C63, D83.
Abstract |
|---|
The aims of the articleThis paper models adaptive learning behavior in a simple coordination game that Van Huyck, Cook and Battalio (1994) have investigated in a controlled laboratory setting with human subjects. We consider how populations of artificially intelligent players behave when playing the same game. We use the genetic programming paradigm, as developed by Koza (1992, 1994), to model how a population of players might learn over time. In genetic programming one seeks to breed and evolve highly fit computer programs that are capable of solving a given problem. In our application, each computer program in the population can be viewed as an individual agent's forecast rule. The various forecast rules (programs) then repeatedly take part in the coordination game evolving and adapting over time according to principles of natural selection and population genetics. We argue that the genetic programming paradigm that we use has certain advantages over other models of adaptive learning behavior in the context of the coordination game that we consider. We find that the pattern of behavior generated by our population of artificially intelligent players is remarkably similar to that followed by the human subjects who played the same game. In particular, we find that a steady state that is theoretically unstable under a myopic, best--response learning dynamic turns out to be stable under our genetic--programming--based learning system, in accordance with Van Huyck et al.'s (1994) finding using human subjects. We conclude that genetic programming techniques may serve as a plausible mechanism for modeling human behavior, and may also serve as a useful selection criterion in environments with multiple equilibria. The methods of the articleGenetic programming (GP) represents a new field in the artificial intelligence literature that was developed only recently by Koza (1992, 1994) and others (See also Kinnear (1994)). GP belongs to a class of evolutionary computing techniques based on principles of population genetics. These techniques combine Darwin's notion of natural selection through survival of the fittest with naturally occurring genetic operations of recombination (crossover) and mutation. Genetic programming techniques have already been widely applied to engineering type optimization problems (both theoretical and commercial), but have seen comparatively little application to economic problems, which are often similar in nature. The few economic applications of GP thus far include Allen and Karjalainen (1993), Chen and Yeh (1997ab) and Dworman, Kimbrough and Laing (1996). While GP techniques are often viewed as an offshoot of Holland's (1975) genetic algorithm (GA), GP techniques are perhaps more accurately viewed as a generalization of the genetic algorithm. The standard genetic algorithm operates on a population of structures, usually strings of bits. Each of the members of this population, the individual bitstrings, represent different candidate solutions to a well-defined optimization problem. The genetic algorithm evaluates the fitness of these various candidate solutions using the given objective function of the optimization problem and retains solutions that have, on average, higher fitness values. Operations of crossover (recombination) and mutation are then applied to some of these more fit solutions as a means of creating a new ''generation'' of candidate solutions. The whole process is repeated over many generations, in order to evolve solutions that are as close to optimal as possible. In analyzing the evolution of solutions over time, it is typical to report the solution in each generation that has the highest fitness value - this solution is designated the ''best-of-generation'' solution. The algorithm is ended when this best-of-generation solution satisfies a certain criterion (e.g. some tolerance) or after some maximum number of iterations has been reached. Theoretical analyses of genetic algorithms suggest that they are capable of quickly locating regions of large and complex search spaces that yield highly fit solutions to optimization problems. That is because the genetic operators of the GA work together to optimize on the trade-off between discovering new solutions and using solutions that have worked well in the past (Holland (1975)). For an introduction to the theory of genetic algorithms, see, e.g. Goldberg (1989) or Mitchell (1996). Economic applications of genetic algorithms can be found in the work of Arifovic (1994, 1995, 1996, 1997), Arthur et al. (1996), Bullard and Duffy (1998ab), Dawid (1996), Miller (1996) and Tesfatsion (1995) among others and are also discussed in Sargent (1993) and Birchenhall (1995).} Koza's idea in developing genetic programming techniques was to take the genetic algorithm a step further and ask whether the same genetic operators used in GAs could be applied to a population of computer programs so as to evolve highly fit computer programs. There are several advantages to using computer programs rather than bitstrings as the structures to be evolved. First, the computer programs of GP have an explicit, dynamic structure} that can be easily represented in a decision tree format. By contrast, the bitstrings of GAs typically encode passive yes/no type decisions or parameter values for prespecified, often static functional forms. The dynamic nature of the computer programs of GP makes them capable of a much more sophisticated and nonlinear decision making than in generally possible using the bitstrings of GAs. Second, the computer programs of GP are immediately implementable structures; as such, they can be readily interpreted as the forecast rules used by a heterogeneous population of agents. For example, in GP, a computer program used by player i in round t, gp_{i,t}, might take the form: gp_{i,t} = 0.31 + M_{t-1}(M_{t-1} - M_{t-2}). Here, M_{t-j} represents the value of the median j periods in the past. Given these lagged median values, this program can be immediately executed and delivers a forecast of the median in period t, equal to the value of 0.31 + M_{t-1}(M_{t-1} - M_{t-2}). This forecast then becomes the action taken by player i in round t. Note that this program is readily interpreted as the agent's forecast function. By contrast, the bitstrings used in GAs are not immediately implementable and their interpretation is less clear; these bitstrings first have to be decoded and then the decoded values must be applied to some prespecified functional form before the solution the bitstrings represent can be implemented. Finally, while the length of the bitstrings used in GAs is fixed, the length of the computer programs used in GP is free to vary (up to some limit, of course) providing for a much richer range of structures. The results of the articleWe have considered a simple coordination game where the actions of the individual players are modelled and updated using GP techniques. Our GP--based coordination game allows for a considerably more flexible experimental design than is possible in experiments with human subjects. In particular, we do not have to restrict the choice set to a finite set of discrete actions, and we can have large numbers of players, e.g. n=500. Moreover, players in our genetic programming implementation are explicitly endowed with the ability to formulate a vast number of both linear and nonlinear forecasting rules for the mean, including the myopic best response rule. This more flexible design allows for a possibly dense set of periodic and chaotic trajectories for the mean for values of omega > 3. Despite this more flexible design, the evolution of play in our GP-based coordination game remains quite similar to that observed in the experiments that Van Huyck et al. (1994) conducted with human subjects. The mean choice of action eventually settles down to a small neighborhood of the interior equilibrium, even in Case 2, where the myopic best response dynamic predicts that this interior equilibrium should be unstable. There is evidence however, that the coordination problem that our artificial agents face in Case 2 is somewhat more difficult than the coordination problem they face in Case 1, as indicated by the different standard deviations about the mean/interior equilibrium for these two cases. While these results cast some doubt on the plausibility of the myopic best response dynamic as a selection criterion (or any other learning schemes that would predict the interior equilibrium to be unstable), it is not yet clear that the myopic best response dynamic should be rejected on the basis of a ``bad'' prediction for a single game, namely Gamma(3.86957), or that the alternative, inertial learning algorithm should be accepted as a plausible selection dynamic on the same basis. While the inertial learning dynamic predicts that the interior equilibrium is always stable, the predicted trajectory for the mean/median is much too smooth when compared with the same trajectory from the experimental data. Moreover, the notion that a single, representative-agent-type learning algorithm can be used to characterize the evolution of the mean/median is at odds with the initial heterogeneity that is apparent in the experimental subjects' actions. Finally, since our GP-based learning algorithm always ``converges'' to the interior equilibrium it is, by the criterion of Van Huyck et al. (1994), just as plausible a selection dynamic as the inertial learning algorithm. The initial heterogeneity of the forecasts that arise from our population-based GP algorithm makes it all the more plausible as a characterization of the experimental data. We also note that the predictions of our GP-based learning model, especially those involving the truncated linear transformation, compare quite favorably with some new coordination game experiments that Van Huyck, Battalio and Rankin (1996) have recently conducted with human subjects. These new experiments differ from the previous experiments conducted by Van Huyck et al. (1994) in that subjects are not informed of the game's payoff function pi; the only information available to subjects is their own past action/payoff history and the discrete action set that they may choose from. The purpose of this new experimental treatment is to place the human subjects in an environment that is as close as possible to that of artificial learning algorithms such as GP. In this new treatment, the human subjects learn to coordinate on the interior equilibrium even more quickly than in the previous treatment where subjects are informed of the payoff function pi, of the game. Van Huyck, Battalio and Rankin (1996) compare the experimental behavior in the new treatment with the behavior of a representative-agent-type, linear, stochastic reinforcement algorithm. While this algorithm eventually achieves a neighborhood of the interior equilibrium, it takes much longer to achieve this equilibrium (750 iterations) than it takes the experimental subjects. By contrast, our multi-agent GP-based learning algorithm converges much more quickly to a neighborhood of the interior equilibrium (usually within 50 iterations) so that it comes closer to mimicking the behavior of the experimental subjects. Finally, we note that our findings for the coordination game are consistent with some other coordination experiments that have involved overlapping generations economies. Marimon, Spear and Sunder (1993) for example, report that experimental subjects are unable to coordinate on two-state sunspot equilibria, choosing instead to settle upon the steady state of an overlapping generations economy. Similarly, Bullard and Duffy (1998b) simulate behavior using a genetic algorithm--based learning model in an overlapping generations economy and find that their population of artificial agents is able to eventually coordinate on steady state and low--order cycles for inflation rates but not on the higher order periodic equilibria of their model. This paper extends these earlier findings by suggesting that it may not be possible for agents to coordinate on aperiodic, chaotic trajectories. |
This article has effectively(*) been downloaded 2426 times.
![]()
Back to e-JEMED page