Details
Adaptive Reward Shaping for Text-Based Adventure Games
Year: 2025
Term: Fall
Student Name: Adnan El Assadi
Supervisor: Sriram Subramanian
Abstract: Reinforcement learning in text-based adventure games faces significant challenges due to sparse reward signals and large action spaces. While potential-based reward shaping accelerates learning, existing approaches use a fixed shaping coefficient throughout training, providing either excessive guidance late in training or insufficient support early on. We introduce adaptive reward shaping schedulers that dynamically adjust the shaping coefficient based on training progress, reward sparsity patterns, and agent uncertainty. We propose six schedulers organized into three categories: time-based (exponential, linear, and cosine decay), sparsity-based (triggered and sensitive variants), and uncertainty-based (entropy-informed). Evaluation across three text-based games with varying reward structures demonstrates that adaptive methods achieve a 14–53% improvement over static shaping, with the optimal scheduler depending on environment characteristics. Our key insight is that static shaping creates a fundamental trade-off: strong early guidance accelerates initial learning but creates dependence on auxiliary rewards, limiting final performance. Adaptive schedulers resolve this by gradually reducing shaping strength as training progresses. On the sparsest game (Zork1), our best adaptive method achieves a 53% higher final score than static shaping, demonstrating a principled approach to balancing exploration and exploitation. This work provides practitioners with effective methods and guidelines for selecting schedulers based on reward sparsity, enabling improved learning in sparse-reward domains including robotics, dialogue systems, and game playing. To facilitate reproducibility and future research, we release our complete implementation at https://github.com/AdnanElAssadi56/jericho-sac-adaptive.
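To illustrate the time-based decay schedules named in the abstract, a minimal sketch follows; it is not the authors' released implementation, and the function names, hyperparameters, and the potential values passed to the shaping term are placeholders chosen for the example.

```python
import math

def exponential_decay(step, total_steps, lam0=1.0, lam_min=0.0, rate=5.0):
    """Shaping coefficient decays exponentially from lam0 toward lam_min."""
    return lam_min + (lam0 - lam_min) * math.exp(-rate * step / total_steps)

def linear_decay(step, total_steps, lam0=1.0, lam_min=0.0):
    """Shaping coefficient decreases linearly from lam0 to lam_min."""
    frac = min(step / total_steps, 1.0)
    return lam0 + (lam_min - lam0) * frac

def cosine_decay(step, total_steps, lam0=1.0, lam_min=0.0):
    """Shaping coefficient follows a half-cosine curve from lam0 down to lam_min."""
    frac = min(step / total_steps, 1.0)
    return lam_min + 0.5 * (lam0 - lam_min) * (1.0 + math.cos(math.pi * frac))

def shaped_reward(r_env, phi_s, phi_s_next, lam, gamma=0.99):
    """Potential-based shaping: add lam * (gamma * Phi(s') - Phi(s)) to the environment reward."""
    return r_env + lam * (gamma * phi_s_next - phi_s)

if __name__ == "__main__":
    # Example: anneal the coefficient over training and apply it to one transition's reward.
    total_steps = 100_000
    for step in (0, 25_000, 50_000, 100_000):
        lam = cosine_decay(step, total_steps)
        print(step, round(lam, 3),
              shaped_reward(r_env=0.0, phi_s=0.2, phi_s_next=0.5, lam=lam))
```

Under this reading, the sparsity- and uncertainty-based variants would replace the step-based fraction with training statistics such as the recent rate of nonzero environment rewards or the policy's action entropy.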