Coming soon: an eval for story-state continuity

Most AI writing evaluations test whether a model can answer questions, follow instructions, or produce plausible prose.

This one tests whether it can keep custody of a story.

The benchmark is designed to measure long-context narrative continuity under generative pressure: not just whether the model remembers a fact, but whether it preserves relationships, objects, obligations, procedures, emotional state, and unresolved constraints while continuing to write.

A model can sound fluent and still lose the thread. It can summarize a story correctly and still contradict the actual state of that story when it keeps writing. This eval is built to catch that gap.

The first release is in development now.
