Abstract

We study the risk of collusive behavior when Reinforcement Learning (RL) algorithms are used to set pricing strategies in competitive markets. Prior research in this field focused exclusively on Tabular Q-learning (TQL) and led to opposing views on whether learning-based algorithms can lead to supra-competitive prices. Meanwhile, firms are increasingly using Deep Reinforcement Learning (DRL). Our work contributes to this ongoing discussion by providing a more nuanced numerical study that goes beyond TQL by additionally capturing off-policy and on-policy DRL algorithms: Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO). We study the dynamics of these algorithms in a Bertrand competition and show that algorithmic collusion indeed depends on the algorithm used. In our experiments, TQL exhibits stronger collusion and greater price dispersion, while DQN and PPO show lower collusion tendencies, with PPO, in particular, achieving a large proportion of competitive outcomes. We further show that these dynamics are sensitive to general algorithm design decisions.
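To illustrate the kind of setup studied, the following is a minimal sketch of two tabular Q-learning agents repeatedly setting prices in a simple Bertrand duopoly. The discretized price grid, marginal cost, learning rate, discount factor, and exploration schedule are assumed values chosen only for demonstration; they are not the paper's configuration.

```python
# Illustrative sketch: two tabular Q-learning pricing agents in a repeated
# Bertrand duopoly. All parameters below are assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)

prices = np.linspace(1.0, 2.0, 11)   # discretized price grid (assumption)
cost = 1.0                            # constant marginal cost (assumption)
n = len(prices)

def profits(i, j):
    """Per-period profits for price indices (i, j); the lower price wins demand."""
    p_i, p_j = prices[i], prices[j]
    if p_i < p_j:
        return p_i - cost, 0.0
    if p_i > p_j:
        return 0.0, p_j - cost
    return (p_i - cost) / 2, (p_j - cost) / 2  # split demand on ties

# State = both firms' previous price indices; Q[k] has shape (n, n, n).
Q = [np.zeros((n, n, n)), np.zeros((n, n, n))]
alpha, gamma = 0.1, 0.95              # learning rate and discount (assumptions)
state = (int(rng.integers(n)), int(rng.integers(n)))

for t in range(200_000):
    eps = np.exp(-1e-5 * t)           # decaying epsilon-greedy exploration (assumption)
    actions = []
    for k in range(2):
        if rng.random() < eps:
            actions.append(int(rng.integers(n)))
        else:
            actions.append(int(np.argmax(Q[k][state])))
    rewards = profits(actions[0], actions[1])
    next_state = (actions[0], actions[1])
    for k in range(2):
        best_next = Q[k][next_state].max()
        q = Q[k][state][actions[k]]
        Q[k][state][actions[k]] = q + alpha * (rewards[k] + gamma * best_next - q)
    state = next_state

# Prices above marginal cost in the long run would indicate supra-competitive outcomes.
print("Long-run prices:", prices[state[0]], prices[state[1]])
```

Whether such agents converge to competitive or supra-competitive prices in practice depends on the learning algorithm and its design choices, which is the question the paper examines across TQL, DQN, and PPO.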
