Can emotions emerge from prediction alone?

A 72-feature LSTM, trained only to predict future world events, independently rediscovers fear, grief, and suspicion as functional internal states — no hardcoded appraisal rules.

Period: Feb 2026 – Apr 2026
Role: Sole researcher

Most generative-agent systems script emotions: if low health, set state = fear. That works, but it doesn't answer the more interesting question: does the agent actually need the rule? Emotion Engine extends Stanford's Generative Agents framework with a predictive world model and asks whether emotion-like internal states fall out for free when the only training signal is “guess what happens next.”

The setup: Ghost Town

Twelve agents are dropped into a grid-world survival simulation spanning three simulated days. Resources are scarce, social ties form and break, and ghosts attack at night. Each agent observes a 21-dimensional state vector. I trained five architectures on the same world and compared their behavior:

  • Baseline — no emotion machinery at all.
  • A — Hand-coded appraisal. The classic OCC-rule-style system: scripted triggers for fear, grief, suspicion.
  • B — Behavioral cloning. A network that imitates an emotion-rule policy from logs.
  • C — Latent emotion dynamics. A learned emotion vector with no semantic supervision.
  • D — Predictive world model. A 72-feature LSTM that ingests the 21-dim observation and is trained only to predict 12 future world events. No emotion supervision. None.
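A minimal PyTorch sketch of what a condition-D-style model could look like. The 21-dim observation and 12 event outputs come from the description above; reading "72-feature" as the LSTM hidden size, treating the 12 events as independent binary targets, and the names `PredictiveWorldModel` and `training_step` are all assumptions made for illustration, not the project's actual code.

```python
import torch
import torch.nn as nn

OBS_DIM = 21     # from the text: 21-dimensional observation
HIDDEN_DIM = 72  # assumption: "72-feature" read as the LSTM hidden size
N_EVENTS = 12    # from the text: 12 future world events


class PredictiveWorldModel(nn.Module):
    """LSTM that maps observation sequences to future-event logits."""

    def __init__(self, obs_dim=OBS_DIM, hidden_dim=HIDDEN_DIM, n_events=N_EVENTS):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.event_head = nn.Linear(hidden_dim, n_events)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim)
        hidden_seq, state = self.lstm(obs_seq, state)
        logits = self.event_head(hidden_seq)  # (batch, time, n_events)
        # hidden_seq is returned so it can be probed later
        return logits, hidden_seq, state


def training_step(model, optimizer, obs_seq, future_events):
    """One gradient step; the only loss term is event prediction."""
    logits, _, _ = model(obs_seq)
    # Assumption: each event is a 0/1 label, so a multi-label BCE loss is used
    loss = nn.functional.binary_cross_entropy_with_logits(logits, future_events)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Smoke run on random tensors (not simulation data)
torch.manual_seed(0)
model = PredictiveWorldModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(4, 10, OBS_DIM)
targets = torch.randint(0, 2, (4, 10, N_EVENTS)).float()
loss_value = training_step(model, optimizer, obs, targets)
```

Nothing in the loss mentions fear, grief, or suspicion; any emotion-like structure would have to show up in `hidden_seq` as a side effect of predicting events.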

What the predictive model did unprompted

Condition D drove the strongest survival behavior in the simulation. More importantly, when I probed its hidden state, the model had spontaneously organized internal dimensions that function as emotions:

  • A dimension that spikes when ghosts are nearby and drives shelter-seeking — fear, in everything but name.
  • A dimension that activates after a known agent dies and depresses exploration for a window of steps — grief.
  • A dimension that scales with betrayal frequency and refuses cooperation — suspicion.

Seven of eight latent dimensions had legible behavioral correlates. The model had to reinvent emotion because emotion is what fast, low-bandwidth prediction looks like in a social, adversarial world.
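One simple way to look for such correlates is a per-dimension correlation probe. The sketch below runs on synthetic data and assumes hidden states are logged as plain lists of floats; `pearson`, `probe`, and `ghost_proximity` are hypothetical names, and the project's actual probing method may differ.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def probe(hidden_states, signal):
    """Correlate every hidden dimension with one behavioral signal."""
    n_dims = len(hidden_states[0])
    return [pearson([h[d] for h in hidden_states], signal) for d in range(n_dims)]


# Synthetic stand-in for logged data: dimension 0 tracks ghost proximity
# plus noise, the other seven dimensions are pure noise.
rng = random.Random(0)
ghost_proximity = [rng.random() for _ in range(500)]
hidden_states = [
    [g + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(7)]
    for g in ghost_proximity
]

correlations = probe(hidden_states, ghost_proximity)
best_dim = max(range(len(correlations)), key=lambda d: abs(correlations[d]))
print(best_dim, round(correlations[best_dim], 2))
```

A high correlation only shows that a dimension tracks the signal. Showing that it drives shelter-seeking would additionally need an intervention, such as clamping the dimension and watching behavior change.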

  • Fear-driven shelter-seeking. 99.0%, vs 97.9% hand-coded.
  • Fisher's exact p-value. 3.3e-113, N = 205,940 agent-steps.
  • Mean survivors. 11.8 / 12, up from 10.1 baseline.
  • Interpretable latent dims. 7 / 8: fear, grief, suspicion, +4.

Methodology

  • Data. 205,940 agent-step records across 8 random seeds and 6 scenarios.
  • Statistics. Fisher's exact tests with Wilson 95% confidence intervals to validate behavioral signatures.
  • External baseline. Survival and behavior compared against the Gallup 2024 cooperation baseline to ground the question “does this look like a real social agent?”
  • Stack. Custom Python simulation engine, PyTorch for training, Django for the interactive viewer.
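For readers who want the two statistics spelled out, here is a standard-library sketch of a Wilson score interval and a two-sided Fisher's exact test on a 2x2 table. The function names are hypothetical, the counts in the usage lines are made up for illustration (they are not the project's data), and the project may well have used a statistics library instead.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def _log_comb(n, k):
    """log(n choose k) via lgamma, which stays stable for large counts."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_p(x):
        # Hypergeometric log-probability of seeing x in the top-left cell
        return (_log_comb(row1, x) + _log_comb(row2, col1 - x)
                - _log_comb(total, col1))

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = log_p(a)
    p_value = 0.0
    for x in range(lo, hi + 1):
        lp = log_p(x)
        # Sum every table that is no more likely than the observed one
        if lp <= observed + 1e-7:
            p_value += math.exp(lp)
    return min(1.0, p_value)


# Made-up counts, purely for illustration
ci_low, ci_high = wilson_interval(50, 100)
p = fisher_exact_two_sided(8, 2, 1, 5)
print(round(ci_low, 4), round(ci_high, 4), round(p, 4))
```

With the made-up table [[8, 2], [1, 5]] the two-sided p-value is 400/11440, about 0.035. On a real run the four cells would be counts such as shelter-seeking vs not, split by condition.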

Why this matters

For agent designers, this is permission to stop scripting affect. A small predictive head plus the right observation space gives you emotion-like behavior for free, and the latent space is legible enough to debug. For interpretability folks, it's a clean example of capability emerging from objective rather than architecture.

What I'd do next

  • Push the observation vector to be sparser and see at what point emotion latents collapse.
  • Move from grid-world to a continuous environment and check whether the same latent organization survives.
  • Run an ablation on memory horizon — at what point does “suspicion” stop generalizing?