Optimal Bargaining on eBay Using Deep Reinforcement Learning

Etan Green (University of Pennsylvania)

Barry Plunkett (University of Pennsylvania)

Abstract: Reinforcement learning algorithms now outperform the best humans in a wide variety of Markov decision processes (MDPs), such as chess and Go. We (1) formulate bargaining in "Best Offer" listings on eBay as an MDP; (2) train neural networks to behave like human buyers and sellers using a large, publicly available dataset of Best Offer listings; (3) train a reinforcement learner to play optimally against these agents as either the seller or a buyer; and (4) characterize the learner's behavior. More generally, we provide a template for estimating optimal policies in economic settings where experimentation is infeasible but data are plentiful.
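
To make the idea of solving a bargaining MDP against a simulated counterparty concrete, the following is a minimal toy sketch, not the method of this paper. It assumes a buyer with a known valuation who may make a fixed number of offers to a seller who only accepts or rejects. The seller is a hypothetical logistic acceptance curve standing in for a learned model, and the problem is small enough to solve by backward induction rather than deep reinforcement learning. All names and numbers (LIST_PRICE, BUYER_VALUE, MAX_OFFERS, OFFER_GRID, seller_accept_prob) are illustrative assumptions.

```python
import math

# Toy illustration only: every constant below is an assumption,
# not a value taken from the paper or from eBay data.
LIST_PRICE = 100.0
BUYER_VALUE = 90.0          # buyer's private valuation of the item
MAX_OFFERS = 3              # assumed cap on the number of buyer offers
OFFER_GRID = [50.0, 60.0, 70.0, 80.0, 90.0]


def seller_accept_prob(offer: float) -> float:
    """Hypothetical stand-in for a learned seller model.

    Acceptance probability is logistic in the offer as a fraction of the
    list price, and (a simplifying assumption) ignores bargaining history.
    """
    return 1.0 / (1.0 + math.exp(-12.0 * (offer / LIST_PRICE - 0.7)))


def solve_buyer_policy():
    """Backward induction; the state is the number of offers remaining."""
    value = {0: 0.0}        # no offers left: buyer walks away with zero surplus
    policy = {}
    for offers_left in range(1, MAX_OFFERS + 1):
        best_offer, best_value = None, -math.inf
        for offer in OFFER_GRID:
            p = seller_accept_prob(offer)
            # Accept: buyer earns value minus price. Reject: continue bargaining.
            q = p * (BUYER_VALUE - offer) + (1.0 - p) * value[offers_left - 1]
            if q > best_value:
                best_offer, best_value = offer, q
        value[offers_left] = best_value
        policy[offers_left] = best_offer
    return value, policy


if __name__ == "__main__":
    value, policy = solve_buyer_policy()
    for offers_left in sorted(policy, reverse=True):
        print(offers_left, policy[offers_left], round(value[offers_left], 2))
```

In this toy setup the buyer's expected surplus rises with the number of offers remaining, and the buyer opens with a lower offer when it has more chances left. A realistic treatment would replace the fixed acceptance curve with agents estimated from data and would allow counteroffers, which is where function approximation and reinforcement learning become necessary.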