We examine the potential of large language models (LLMs) to serve as tools for simulating human decision-making. Using GPT-4 Turbo in a sequential prisoner’s dilemma, we find significant discrepancies between LLM-generated and human behavior: GPT consistently exhibits a higher propensity to cooperate than human participants and forms overly optimistic expectations of human cooperation, leading to outcomes that diverge from observed human decision-making patterns. Yet GPT’s decisions do not appear random; they align with a formal model of preferences for fairness and efficiency that explains the behavior of most human participants in similar settings. This reveals a striking contrast: while GPT may fall short as an accurate surrogate for human decisions at the surface level, the model nonetheless captures the underlying preferences that drive human behavior in strategic contexts. Our findings have implications for the potential of LLMs to serve as managerial or research tools for simulating human behavior.
SAFE Working Paper No. 401