A Dataset of Cryptic Crossword Clues

cryptics.georgeho.org is a dataset of cryptic crossword[1] clues, indicators and charades, collected from various blogs and digital archives.

This dataset is a significant work of crossword archivism and scholarship: acquiring historical crosswords and structuring their contents require focused effort and tedious cleaning that few are willing to do for such trivial data. For example, according to this 2004 selection guide[2], the Library of Congress explicitly does not collect crossword puzzles.

This project indexes various blogs and digital archives for cryptic crosswords. Several fields - such as clues, answers, clue numbers, annotations or commentary, puzzle title and publication date - are parsed and extracted into a tabular dataset. The result is over half a million clues from cryptic crosswords over the past twelve years.

Two other datasets are subsequently derived from the clues - wordplay indicators and charades (a.k.a. substitutions). All told, the derived datasets contain over twelve thousand wordplay indicators and over sixty thousand charades.

Currently the sources for clues are:

The data can be viewed online and downloaded for free (CSV, JSON, SQLite, advanced[6]). Detailed documentation can be found on the datasheet, and the source code for creating the dataset is available on GitHub.
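Because the JSON export is paginated at 100 rows per page, reading a whole table means following page links until none remain. The following is a minimal Python sketch of that loop, not the site's documented API: the response shape (a `rows` list plus a `next_url` field) and the helper names `fetch_json` and `iter_rows` are assumptions made for illustration, and the starting URL is left for the reader to supply.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download one page and decode its body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_rows(
    first_url: str,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every row, following page links until none remain.

    ASSUMPTION (hypothetical response shape): each page looks like
    {"rows": [...], "next_url": "<url of next page>" or None}.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        # Emit this page's rows before requesting the next page.
        yield from page.get("rows", [])
        # A missing or null "next_url" ends the loop.
        url = page.get("next_url")
```

The `fetch` parameter is injectable so the loop can be exercised against canned pages without touching the network. With over half a million clues at 100 rows per page, a full pass would take more than 5,000 requests, so the SQLite download is likely the better choice for bulk analysis.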

Send all comments, suggestions and complaints to hello[æ]georgeho.org.

Please share and enjoy!

~ George Ho

  1. If you’re new to cryptic crosswords, rejoice! A whole new world awaits you! The New Yorker has an excellent introduction to cryptic crosswords, and Matt Gritzmacher has a daily newsletter with links to crosswords.

  2. Heard through Saul Pwanson and sourced from the Internet Archive.

  3. The Browser’s clues are sourced with the gracious permission of Dan Feyer and The Browser’s editors!

  4. As of August 2022, Leo Edit has sadly discontinued their cryptic crosswords.

  5. .puz files were provided courtesy of Michael F. Gill. As of August 2021, The New York Times no longer supports .puz files.

  6. The CSV request returns only the first 1000 rows by default; all rows can be streamed instead (this will take a while). The JSON request is paginated with 100 rows per page.