five-letter-words-2
Advanced problems with Donald Knuth's GraphBase list of five-letter words, exploring complex algorithmic concepts and mathematical theorems using the same 5,757-word dataset.
See NewProblems.md for the full problem statements.
The list of words comes from [1]
and is in the public domain. The original warm-up exercises live in original/.
Project Structure
Each problem has a single driver script prefixed with its problem number.
Shared utility functions (word loading, graph building, distance metrics)
live in utils.py so they aren't duplicated across files.
utils.py - shared utilities (get_words, hamming_distance, etc.)
p01_markov_chain.py - Problem 1: Word Ladder Markov Chains
p02_ramsey_theory.py - Problem 2: Ramsey Theory on Word Graphs
p03_wordle_solver.py - Problem 3: Information-Theoretic Wordle Solver
p04_expander_graphs.py - Problem 4: Probabilistic Method and Expander Graphs
p05_coding_theory.py - Problem 5: Coding Theory and Error-Correcting Codes
p06_colonel_blotto.py - Problem 6: Colonel Blotto Tournament
p07_random_graph_thresholds.py - Problem 7: Random Graph Thresholds
p08_word_complexes.py - Problem 8: Algebraic Topology and Word Complexes
p09_hash_collisions.py - Problem 9: Hash Collisions and Birthday Paradox
p10_card_shuffling.py - Problem 10: Card Shuffling and Mixing Times
original/ - Original warm-up exercises
Run any problem with python p01_markov_chain.py, etc.
Problem 1: Word Ladder Markov Chains
p01_markov_chain.py - Builds a Markov chain where states are five-letter words
and transitions occur between words differing by exactly one letter. Computes the
stationary distribution, classifies recurrent/transient states, analyzes connected
components, estimates diameter, and applies Perron-Frobenius theorem concepts.
Includes visualization of the word graph and degree distribution.
Problem 2: Ramsey Theory on Word Graphs
p02_ramsey_theory.py - Constructs a graph where edges connect words sharing at
least 3 letters in the same positions. Investigates cliques, independent sets,
and Ramsey numbers R(3,k). Includes visualization of the Ramsey word graph.
Problem 3: Information-Theoretic Wordle Solver
p03_wordle_solver.py - Optimal Wordle strategy using information theory. At each
guess, selects the word that maximizes expected information gain (minimizes entropy).
Models the game as a decision tree and compares entropy-based and minimax approaches.
Includes extended demo comparing strategies on random words.
Problem 4: Probabilistic Method and Expander Graphs
p04_expander_graphs.py - Creates a random graph on five-letter words where edges
between words differing by one letter exist with probability p. Analyzes expansion
properties, spectral gaps, and mixing times using the probabilistic method. Includes
analysis of how expander properties vary with edge probability.
Problem 5: Coding Theory and Error-Correcting Codes
p05_coding_theory.py - Treats five-letter words as codewords and finds a maximal
set with minimum Hamming distance d=3 (1-error-correcting). Uses both greedy and
randomized heuristics. Relates to sphere-packing bounds and Gilbert-Varshamov bounds.
Problem 6: Colonel Blotto Tournament
p06_colonel_blotto.py - Tournament where each word is a strategy and letter
positions are battlefields won by the lexicographically later letter. Analyzes
as a zero-sum game and finds Nash equilibria using linear programming.
Problem 7: Random Graph Thresholds
p07_random_graph_thresholds.py - Studies Erdos-Renyi random graphs on five-letter
words where edges connect words sharing at least k letters. Identifies threshold
functions for connectivity, giant components, and Hamiltonicity. Applies Friedgut's
sharp threshold theorem.
Problem 8: Algebraic Topology and Word Complexes
p08_word_complexes.py - Constructs simplicial complexes where words are vertices
and simplices are formed by sets of words with pairwise edit distance <= 2. Computes
homology groups and Betti numbers. Applies the nerve lemma and analyzes
linguistic-topological correlations.
Problem 9: Hash Collisions and Birthday Paradox
p09_hash_collisions.py - Interprets five-letter words as inputs to simplified
cryptographic hash functions. Studies collision probabilities using birthday paradox
analysis, empirical collision testing, and near-collision implications.
Problem 10: Card Shuffling and Mixing Times
p10_card_shuffling.py - Models card shuffling where permutations are represented
by five-letter words. Studies mixing time using coupling methods and total variation
distance, applying Diaconis' work on riffle shuffles. Analyzes the cutoff phenomenon.
Requirements
- Python 3.x
- NumPy
- SciPy
- NetworkX
Sources
- Knuth, Donald. The Stanford GraphBase: A Platform for Combinatorial Computing. New York: ACM Press, 1994. http://www-cs-faculty.stanford.edu/~knuth/sgb.html