cryptics.georgeho.org is a dataset of cryptic crossword[1] clues, indicators and charades, collected from various blogs and digital archives.
This dataset is a significant work of crossword archivism and scholarship: acquiring historical crosswords and structuring their contents require focused effort and tedious cleaning that few are willing to do for such trivial data. For example, according to this 2004 selection guide[2], the Library of Congress explicitly does not collect crossword puzzles.
This project indexes various blogs and digital archives for cryptic crosswords. Several fields - such as clues, answers, clue numbers, annotations or commentary, puzzle title and publication date - are parsed and extracted into a tabular dataset. The result is more than half a million clues from cryptic crosswords published over the past twelve years.
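As a rough illustration of what "a tabular dataset" means here, the fields listed above could be modelled as one record per clue and written out as CSV. This is only a sketch: the class name `ClueRow`, the helper `rows_to_csv` and the exact column names are assumptions made for the example, not the dataset's documented schema (the datasheet is the authority on that).

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional


@dataclass
class ClueRow:
    """One clue as a flat record. Field names are illustrative assumptions."""
    clue: str
    answer: str
    clue_number: str
    annotation: Optional[str]  # blogger's commentary, if any was parsed
    puzzle_title: str
    publication_date: str      # ISO 8601 date string, e.g. "2020-01-01"


def rows_to_csv(rows: Iterable[ClueRow]) -> str:
    """Serialize clue records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(ClueRow)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # A missing annotation (None) is written as an empty cell.
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

A missing annotation is kept as an empty cell rather than dropped, so every row has the same columns.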
Two other datasets are subsequently derived from the clues - wordplay indicators and charades (a.k.a. substitutions). All told, the derived datasets contain over twelve thousand wordplay indicators and over sixty thousand charades.
Currently the sources for clues are:
- 🇬🇧 Big Dave’s Crossword Blog (The Daily Telegraph, The Sunday Telegraph)
- 🇺🇸 The Browser[3]
- 🇺🇸 Cru Cryptic Archive (The New York Times “Cru” Forums)
- 🇬🇧 Fifteensquared (Financial Times, The Guardian, The Independent)
- 🇮🇳 The Hindu Crossword Corner (The Hindu)
- 🇺🇸 Leo Edit[4]
- 🇨🇦 National Post Cryptic Crossword Forum (National Post)
- 🇺🇸 The New York Times .puz archive[5]
- 🇺🇸 The New Yorker
- 🇬🇧 Times for the Times (The Times of London)
The data can be viewed online and downloaded for free (CSV, JSON, SQLite, advanced[6]). Detailed documentation can be found on the datasheet, and the source code for creating the dataset is available on GitHub.
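Once the SQLite file is downloaded, it can be queried with Python's standard `sqlite3` module. The sketch below assumes a table named `clues` with `clue` and `answer` columns; those names are guesses for illustration and should be checked against the datasheet. The helper `make_demo_db` builds a tiny in-memory stand-in with one made-up row so the example runs without the real file.

```python
import sqlite3
from typing import List, Tuple


def clues_for_answer(
    conn: sqlite3.Connection, answer: str, limit: int = 10
) -> List[Tuple[str, str]]:
    """Return up to `limit` (clue, answer) pairs matching an answer, ignoring case."""
    query = (
        "SELECT clue, answer FROM clues "  # table and column names are assumed
        "WHERE upper(answer) = upper(?) "
        "ORDER BY clue LIMIT ?"
    )
    return conn.execute(query, (answer, limit)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with placeholder rows (not real dataset rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clues (clue TEXT, answer TEXT)")
    conn.executemany(
        "INSERT INTO clues VALUES (?, ?)",
        [
            ("Flower of London? (6)", "THAMES"),
            ("Placeholder clue (5)", "OTHER"),
        ],
    )
    return conn
```

With the real download, `sqlite3.connect("path/to/downloaded.db")` would replace `make_demo_db()`; the path is a placeholder.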
Send all comments, suggestions and complaints to hello[æ]georgeho.org.
Please share and enjoy!
[1] If you’re new to cryptic crosswords, rejoice! A whole new world awaits you! The New Yorker has an excellent introduction to cryptic crosswords, and Matt Gritzmacher has a daily newsletter with links to crosswords.
[2] Heard through Saul Pwanson and sourced from the Internet Archive.
[3] The Browser’s clues are sourced with the gracious permission of Dan Feyer and The Browser’s editors!
[4] As of August 2022, Leo Edit has sadly discontinued their cryptic crosswords.
[5] .puz files were provided courtesy of Michael F. Gill. As of August 2021, The New York Times no longer supports .puz files.
[6] The CSV request returns only the first 1000 rows; a separate link streams all rows (this will take a while). The JSON request is paginated with 100 rows per page.