// case study · research
Can emotions emerge from prediction alone?
A 72-feature LSTM, trained only to predict future world events, independently rediscovers fear, grief, and suspicion as functional internal states — no hardcoded appraisal rules.
Most generative-agent systems script emotions: if low health, set state = fear. That works, but it doesn't answer the more interesting question: does the agent actually need the rule? Emotion Engine extends Stanford's Generative Agents framework with a predictive world model and asks whether emotion-like internal states fall out for free when the only training signal is “guess what happens next.”
The setup: Ghost Town
Twelve agents are dropped into a grid-world survival simulation spanning three simulated days. Resources are scarce, social ties form and break, and ghosts attack at night. Each agent observes a 21-dimensional state vector. I ran five conditions on the same world and compared the agents' behavior:
- Baseline — no emotion machinery at all.
- A — Hand-coded appraisal. The classic OCC-rule-style system: scripted triggers for fear, grief, suspicion.
- B — Behavioral cloning. A network that imitates an emotion-rule policy from logs.
- C — Latent emotion dynamics. A learned emotion vector with no semantic supervision.
- D — Predictive world model. A 72-feature LSTM that ingests the 21-dim observation and is trained only to predict 12 future world events. No emotion supervision. None.
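To make condition D concrete, here is a minimal, framework-free sketch of one forward step of such a model. It is not the project's code: the class and function names are hypothetical, the weights are random rather than trained, "72-feature" is read as a hidden size of 72, and the 12 future events are assumed to be binary (occurs or does not occur) and scored with binary cross-entropy. The project itself reports using PyTorch; plain Python is used here only to keep the arithmetic visible.

```python
import math
import random

# Sizes taken from the text; reading "72-feature" as the hidden size is an assumption.
OBS_DIM, HIDDEN_DIM, N_EVENTS = 21, 72, 12

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_matrix(rows, cols, rng, scale=0.1):
    """Random weights in [-scale, scale]; stands in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

class TinyWorldModel:
    """Hypothetical single-layer LSTM cell with a linear event head (biases omitted)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        in_dim = OBS_DIM + HIDDEN_DIM
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o).
        self.gates = {name: make_matrix(HIDDEN_DIM, in_dim, rng) for name in "ifgo"}
        self.head = make_matrix(N_EVENTS, HIDDEN_DIM, rng)

    def step(self, obs, h, c):
        """Consume one observation; return (event probabilities, new h, new c)."""
        x = list(obs) + list(h)  # concatenate observation and previous hidden state
        i = [sigmoid(z) for z in matvec(self.gates["i"], x)]
        f = [sigmoid(z) for z in matvec(self.gates["f"], x)]
        g = [math.tanh(z) for z in matvec(self.gates["g"], x)]
        o = [sigmoid(z) for z in matvec(self.gates["o"], x)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        probs = [sigmoid(z) for z in matvec(self.head, h_new)]
        return probs, h_new, c_new

def event_loss(probs, targets):
    """Mean binary cross-entropy over the predicted events (the only training signal)."""
    eps = 1e-12
    total = 0.0
    for p, t in zip(probs, targets):
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(probs)

# Usage: one step on a made-up observation with a made-up target.
model = TinyWorldModel()
h = [0.0] * HIDDEN_DIM
c = [0.0] * HIDDEN_DIM
obs = [0.5] * OBS_DIM
probs, h, c = model.step(obs, h, c)
targets = [1] + [0] * (N_EVENTS - 1)
loss = event_loss(probs, targets)
```

Note that nothing in the loss mentions fear, grief, or suspicion: the only supervised quantity is the event vector, so any emotion-like structure would have to form in `h` on its own.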
What the predictive model did unprompted
Condition D drove the strongest survival behavior in the simulation. More importantly, when I probed its hidden state, the model had spontaneously organized internal dimensions that functionally are emotions:
- A dimension that spikes when ghosts are nearby and drives shelter-seeking — fear, in everything but name.
- A dimension that activates after a known agent dies and depresses exploration for a window of steps — grief.
- A dimension that scales with betrayal frequency and refuses cooperation — suspicion.
Seven of eight latent dimensions had legible behavioral correlates. The model had to reinvent emotion because emotion is what fast, low-bandwidth prediction looks like in a social, adversarial world.
- 99.0% fear-driven shelter-seeking (vs 97.9% hand-coded)
- p = 3.3e-113, Fisher's exact test (N = 205,940 agent-steps)
- 11.8 / 12 mean survivors (up from 10.1 baseline)
- 7 / 8 interpretable latent dims (fear, grief, suspicion, +4)
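One simple way to look for latent dimensions with behavioral correlates, like those described in this section, is to correlate each hidden dimension with a logged signal such as "ghost nearby". The sketch below is an assumed probing method, not necessarily the one used in the study; `probe_dimensions`, the 0.5 threshold, and the toy data are all illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def probe_dimensions(hidden_states, signal, threshold=0.5):
    """Return {dim_index: r} for dimensions whose |r| with the signal meets the threshold.

    hidden_states: list of per-step hidden vectors (all the same length).
    signal: per-step behavioral or world signal, same length as hidden_states.
    """
    hits = {}
    for d in range(len(hidden_states[0])):
        r = pearson([h[d] for h in hidden_states], signal)
        if abs(r) >= threshold:
            hits[d] = r
    return hits

# Toy data (made up): dimension 0 tracks the signal, dimension 1 just alternates.
ghost_nearby = [0, 0, 1, 1, 0, 1, 0, 0]
hidden_states = [[0.9 if g else 0.1, (-1) ** t * 0.2] for t, g in enumerate(ghost_nearby)]
hits = probe_dimensions(hidden_states, ghost_nearby)
```

On the toy data only dimension 0 is flagged. A correlation like this shows that a dimension tracks a signal, not that it drives behavior; supporting a claim such as "drives shelter-seeking" would additionally need an intervention, for example clamping the dimension and observing the policy.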
Methodology
- Data. 205,940 agent-step records across 8 random seeds and 6 scenarios.
- Statistics. Fisher's exact tests with Wilson 95% confidence intervals to validate behavioral signatures.
- External baseline. Survival and behavior compared against the Gallup 2024 cooperation baseline to ground “does this look like a real social agent.”
- Stack. Custom Python simulation engine, PyTorch for training, Django for the interactive viewer.
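For readers unfamiliar with the two statistics named above, here is a stdlib-only sketch of a Wilson score interval and a two-sided Fisher's exact test on a 2x2 table. The function names are illustrative and the example counts are textbook toy values, not the study's data; in practice a library routine such as SciPy's `scipy.stats.fisher_exact` would be the usual choice.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def _log_comb(n, k):
    """log of the binomial coefficient C(n, k), via log-gamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def _log_hypergeom(k, row1, row2, col1):
    """log P(top-left cell = k) with all table margins held fixed."""
    return _log_comb(row1, k) + _log_comb(row2, col1 - k) - _log_comb(row1 + row2, col1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probability of every table with the same margins that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    log_obs = _log_hypergeom(a, row1, row2, col1)
    total = 0.0
    for k in range(k_min, k_max + 1):
        log_p = _log_hypergeom(k, row1, row2, col1)
        if log_p <= log_obs + 1e-7:  # small tolerance for floating-point ties
            total += math.exp(log_p)
    return min(total, 1.0)

# Toy example: 50 successes out of 100, and a small 2x2 table.
lo, hi = wilson_interval(50, 100)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For a behavioral signature such as shelter-seeking, the 2x2 table would be condition (for example, predictive model vs hand-coded) by outcome (sheltered vs not), with a Wilson interval reported around each condition's rate.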
Why this matters
For agent designers, this is permission to stop scripting affect. A small predictive head plus the right observation space gives you emotion-like behavior for free, and the latent space is legible enough to debug. For interpretability folks, it's a clean example of capability emerging from objective rather than architecture.
What I'd do next
- Push the observation vector to be sparser and see at what point emotion latents collapse.
- Move from grid-world to a continuous environment and check whether the same latent organization survives.
- Run an ablation on memory horizon — at what point does “suspicion” stop generalizing?