
The authors have declared that no competing interests exist.

Conceived and designed the experiments: HJZ. Performed the experiments: ZB HJZ. Analyzed the data: ZB HJZ. Contributed reagents/materials/analysis tools: HJZ. Wrote the paper: HJZ.

In an iterated non-cooperative game, if all the players act to maximize their individual accumulated payoffs, the system as a whole usually converges to a Nash equilibrium that benefits every player poorly. Here we show that such an undesirable destiny is avoidable in an iterated Rock-Paper-Scissors (RPS) game involving two rational players, X and Y. Player X has the option of proactively adopting a cooperation-trap strategy, which enforces complete cooperation from the rational player Y and leads to a highly beneficial and maximally fair situation for both players. That a maximal degree of cooperation is achievable in such a competitive system with cyclic dominance among actions may stimulate further theoretical and empirical studies on how to resolve conflicts and enhance cooperation in human societies.

The solution concept of Nash equilibrium (NE) plays a fundamental role both in classic game theory and in evolutionary game theory

Many non-cooperative games have only a unique NE. When such a game is played by highly rational players who act to maximize their individual accumulated payoffs, it is unavoidable that the system will sooner or later converge to this unique equilibrium. Unfortunately, however, the NE of a non-cooperative game is usually an unfavorable or even miserable destiny for all the players. Let us consider the two-player Prisoner's Dilemma (PD) game as a simple example. The cooperative situation of both players choosing not to confess is much better than the defection situation of both players choosing to confess, yet the latter is the unique NE of this game while the former is not

In this paper we study the issue of cooperation in the iterated two-player Rock-Paper-Scissors (RPS) game, which is a fundamental non-cooperative game with cyclic dominance among its action choices (namely Rock beats Scissors, Scissors beats Paper, and Paper in turn beats Rock), see

(A) The payoff matrix. Each matrix element is the payoff of the row player X's action in competition with the column player Y's action. (B) The cyclic (non-transitive) dominance relationship among the three candidate actions: Rock (
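To make the cyclic dominance relationship concrete, the payoff structure can be sketched in a few lines of Python. The numerical convention used below (a win earns a payoff parameter a, a tie earns 1, a loss earns 0) is an assumption for illustration only; the actual matrix entries are those shown in panel (A) of the figure.

```python
# Sketch of a generalized RPS payoff function. ASSUMPTION: a win earns the
# payoff parameter a, a tie earns 1, and a loss earns 0; the true entries
# are those of the payoff matrix in panel (A).
BEATS = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}  # key beats value

def payoff(x_action, y_action, a=2.0):
    """Payoff of row player X's action in competition with column player Y's action."""
    if x_action == y_action:
        return 1.0                                  # tie
    return a if BEATS[x_action] == y_action else 0.0  # win or loss
```

The dictionary `BEATS` encodes the non-transitive dominance cycle of panel (B): each action beats exactly one other action and is beaten by the third.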

In a literature search for related studies, we found that an early paper of Grofman and Pool

The present effort can be regarded as an extension of the Grofman-Pool theory to the iterated RPS game, which has the additional difficulty of having more than two action choices that are related by a rotation symmetry (see

Consider two players X and Y playing the RPS game for an indefinite number of rounds. At every game round each player can choose one action among three candidate actions

When

We now develop CT strategies for player X, and begin with the simplest case of memoryless strategies: at every game round player X considers neither her own nor her opponent's prior actions, nor the outcomes of prior plays, but chooses actions
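A memoryless strategy is simply a fixed probability distribution over the three actions, sampled independently at every round. The sketch below shows the sampling step; the parameter names and values are illustrative placeholders, not the optimal values derived in the paper.

```python
import random

# ILLUSTRATIVE memoryless strategy: draw an action from a fixed mixed
# strategy (p_rock, p_paper, p_scissors), independently each round.
def memoryless_action(p_rock, p_paper, p_scissors, rng=random):
    assert abs(p_rock + p_paper + p_scissors - 1.0) < 1e-9  # must be a distribution
    u = rng.random()
    if u < p_rock:
        return "Rock"
    if u < p_rock + p_paper:
        return "Paper"
    return "Scissors"
```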

As player Y is sufficiently intelligent, he will figure out the strategy of X after a small number of game repeats. Alternatively, with the aim of promoting cooperation from player Y, player X may also explicitly inform Y about her strategy parameters, which are
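Once Y knows (or has inferred) X's memoryless mixed strategy, his rational response is the pure action with the highest expected payoff against that fixed distribution. A minimal sketch, again under the assumed win-a/tie-1/lose-0 payoff convention:

```python
BEATS = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}
ACTIONS = ("Rock", "Paper", "Scissors")

def payoff(x, y, a=2.0):
    # ASSUMED convention: win -> a, tie -> 1, loss -> 0
    if x == y:
        return 1.0
    return a if BEATS[x] == y else 0.0

def best_response(x_probs, a=2.0):
    """Y's pure best response to X's known mixed strategy
    (x_probs maps each action to its probability)."""
    def expected(y_action):
        return sum(p * payoff(y_action, x_act, a)
                   for x_act, p in x_probs.items())
    return max(ACTIONS, key=expected)
```

For example, if X were to play Rock with certainty, Y's best response would be Paper, which beats Rock.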

If the strategy of player X has the property that

If the payoff parameter

If

The optimal values of both players' expected payoff per round

These shortcomings of the memoryless CT strategy can be eliminated by increasing the memory length of the CT strategy.

Recent laboratory experiments carried out at Zhejiang University

When the payoff parameter

On the other hand, when

It turns out that the optimal CT strategy of unit memory length has the following quantitative properties:

If

If

If

If

The optimal values of both players' expected payoff per round
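A unit-memory strategy can be represented as a conditional distribution over X's next action given the previous round's joint outcome. The sketch below shows only the data structure and the sampling step; the conditional probabilities are uniform placeholders, whereas the optimal unit-memory CT strategy assigns the specific values derived above.

```python
import random

ACTIONS = ("Rock", "Paper", "Scissors")

# PLACEHOLDER table: for each (previous X action, previous Y action) pair,
# a distribution over X's next action. The optimal unit-memory CT strategy
# fills these entries with its own derived probabilities.
_uniform = {a: 1.0 / 3.0 for a in ACTIONS}
UNIT_MEMORY_TABLE = {(x, y): dict(_uniform) for x in ACTIONS for y in ACTIONS}

def unit_memory_action(prev_x, prev_y, table=UNIT_MEMORY_TABLE, rng=random):
    """Sample X's next action from the table row selected by the
    previous round's joint outcome (prev_x, prev_y)."""
    dist = table[(prev_x, prev_y)]
    u, acc = rng.random(), 0.0
    for action in ACTIONS:
        acc += dist[action]
        if u < acc:
            return action
    return ACTIONS[-1]  # guard against floating-point rounding
```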

To completely eliminate these undesirable features, player X can increase the memory length of her CT strategy and thereby become less tolerant of defection. There are many ways of implementing such an idea. When
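One simple way to realize the longer-memory idea is a counter-based trigger: once Y's observed action deviates from the cooperative pattern, X switches to a punishment mode lasting a fixed number of rounds. This mechanism is purely illustrative (the class name and parameter `k` are inventions of this sketch); the paper's actual longer-memory CT strategy is defined by its own transition rules.

```python
class PunishmentTrigger:
    """Illustrative longer-memory mechanism (NOT the paper's exact CT
    strategy): after any observed defection, punish for `k` rounds."""

    def __init__(self, k=3):
        self.k = k
        self.remaining = 0          # punishment rounds left

    def observe(self, y_cooperated):
        if not y_cooperated:
            self.remaining = self.k  # any defection restarts the window

    def punishing(self):
        """Consume one punishment round; return whether X should punish now."""
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

Increasing `k` makes the trigger less tolerant of defection, in the spirit of lengthening the CT strategy's memory.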

The optimal values of both players' expected payoff per round

If the payoff parameter

As clearly demonstrated in

We have demonstrated in this paper that fair cooperation can be achieved in the two-player iterated RPS game. Such a highly cooperative state brings the maximal accumulated payoff to the group, and it is enforced not by external authorities but by the proactive decision of one player to adopt an optimal cooperation-trap strategy. The basic design principle of such optimal CT strategies should be generally applicable to other two-player iterated non-cooperative games.

For the optimal CT strategies to work, the passive player Y is assumed to be sufficiently rational that he adopts a best-response strategy to that of his opponent X to maximize his accumulated payoff, while the proactive player X is additionally assumed to be wise enough not to exploit the cooperative state of her opponent too much, being satisfied instead with a fair share of the total accumulated group payoff. This latter assumption might be a little too strong, but it may not be strictly necessary, as player Y can punish X for defection behaviors.

For the iterated RPS game, it appears to be impossible for the proactive player X to design a CT strategy which brings higher expected payoff per game round to herself than to her opponent. However, this is not a general conclusion. For some other game systems, notably the iterated PD game

When strategic interactions occur in biological systems

Cooperation in a finite-population RPS game system with more than two players may be much more difficult to achieve than in the two-player case. A recent theoretical investigation by one of the present authors

The iterated two-player RPS game might also serve as a simple system for quantitatively measuring the degree of rationality of single human subjects. For example, an experiment could be arranged as follows. A human subject Y plays repeatedly against a fixed opponent X, who is actually a computer implementing an optimal CT strategy; Y does not know that he is playing with a computer and assumes his opponent is another human subject. By analyzing the evolution trajectory of player Y's action choices, we may quantitatively measure the learning behavior of player Y and his tendency to make rational decisions. We are discussing with colleagues the possibility of carrying out such an experimental study.
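Such an experiment could be prototyped with a small simulation loop in which the computer player records the subject's action trajectory for later analysis. Everything below (the function names, the stand-in strategies) is illustrative scaffolding, not the experimental protocol itself.

```python
import random

ACTIONS = ("Rock", "Paper", "Scissors")

def play_session(x_strategy, y_strategy, n_rounds=100, seed=0):
    """Run n_rounds of RPS between two strategy callables and return
    the trajectory of (x_action, y_action) pairs for later analysis."""
    rng = random.Random(seed)
    trajectory = []
    for _ in range(n_rounds):
        x = x_strategy(trajectory, rng)   # each strategy sees the full history
        y = y_strategy(trajectory, rng)
        trajectory.append((x, y))
    return trajectory

# Stand-in strategies: X plays uniformly at random; Y copies X's last action.
def random_x(history, rng):
    return rng.choice(ACTIONS)

def copycat_y(history, rng):
    return history[-1][0] if history else rng.choice(ACTIONS)
```

In an actual experiment, `x_strategy` would be the optimal CT strategy and `y_strategy` would be replaced by the human subject's input; the recorded trajectory is what the rationality analysis would operate on.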

HJZ thanks Zhijian Wang and Bin Xu for a recent fruitful collaboration on the finite-population Rock-Paper-Scissors game, which inspired the present work greatly, and thanks Jiping Huang for comments on the manuscript.