Rigorous-looking scores for texts that have no business being on a scale. Backed by a language model and one CSV of 30 passages someone scored on a Tuesday.
Inspired by industry leaders whose own scale also doesn't work.
"Measures That Inspire"
"Measures That Barely Work"
A small CSV of anchor passages with human-assigned difficulty scores. Small enough to fit in a chat, big enough to publish a paper.
GPT-4.1 reads your text against each anchor and decides which is harder. Thirty calls in parallel. Costs roughly a coffee.
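For the curious, the fan-out step might look roughly like the sketch below. It is not the actual implementation: `stub_judge` is a hypothetical stand-in for the model call (it just compares average word length so the example runs offline), and the function names, the anchor tuples, and their scores are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def stub_judge(text: str, anchor_text: str) -> bool:
    """Hypothetical placeholder for the model call.

    Returns True if `text` looks harder than `anchor_text`. Here "harder"
    only means a longer average word length; a real judge would ask a model.
    """
    def avg_word_len(s: str) -> float:
        words = s.split()
        return sum(len(w) for w in words) / max(len(words), 1)
    return avg_word_len(text) > avg_word_len(anchor_text)

def compare_against_anchors(text, anchors, judge=stub_judge, max_workers=30):
    """Run one comparison per anchor in parallel.

    `anchors` is a list of (score, passage) pairs. Returns a list of
    (score, text_was_harder) pairs in the same order as `anchors`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so verdicts line up with anchors.
        verdicts = list(pool.map(lambda a: judge(text, a[1]), anchors))
    return [(score, harder) for (score, _), harder in zip(anchors, verdicts)]

# Made-up anchors: (human-assigned score, passage).
anchors = [
    (200.0, "The cat sat on the mat."),
    (900.0, "Photosynthesis converts sunlight into chemical energy."),
    (1400.0, "Epistemological incommensurability undermines intertheoretic reduction."),
]
results = compare_against_anchors("Volcanoes erupt when magma rises.", anchors)
print(results)
```

Threads suit this step because each comparison would spend its time waiting on a network call, not computing. Swapping `stub_judge` for a real model call is left to whoever is paying for the coffee.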
We fit a logistic curve through the pairwise results, scale the estimate to 100–1600, and present it with appropriate gravitas.
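A minimal sketch of what that fit could look like, under loudly stated assumptions: the win probability against an anchor is modelled as a one-parameter logistic in the gap between the text's difficulty and the anchor's score, the anchors sit on a hypothetical 1 to 10 human scale, and the final number is a straight linear map onto 100–1600. The `slope` value, the function names, and the sample data are illustrative, not taken from the real thing.

```python
import math

def fit_difficulty(results, lo, hi, slope=1.0, iters=60):
    """Maximum-likelihood difficulty under P(harder) = sigmoid(slope * (theta - anchor)).

    `results` is a list of (anchor_score, text_was_harder) pairs. The
    log-likelihood gradient has the sign of sum(y - p) and decreases as
    theta grows, so bisection on [lo, hi] finds its root. If the text won
    or lost every comparison, the estimate pins to the matching end.
    """
    def gradient(theta):
        return sum(
            float(harder) - 1.0 / (1.0 + math.exp(-slope * (theta - anchor)))
            for anchor, harder in results
        )
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gradient(mid) > 0:
            lo = mid  # text still wins more than the model predicts: go higher
        else:
            hi = mid
    return (lo + hi) / 2.0

def rescale(theta, lo, hi, out_lo=100.0, out_hi=1600.0):
    """Linearly map theta from [lo, hi] onto the reporting scale."""
    return out_lo + (theta - lo) / (hi - lo) * (out_hi - out_lo)

# Made-up pairwise outcomes on an assumed 1-10 anchor scale.
pairwise = [(2.0, True), (4.0, True), (6.0, False), (8.0, False)]
theta = fit_difficulty(pairwise, lo=1.0, hi=10.0)
score = rescale(theta, lo=1.0, hi=10.0)
print(round(theta, 3), round(score, 1))
```

With wins against the two easier anchors and losses against the two harder ones, the estimate lands midway between them, which is about as much insight as four comparisons deserve.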