193 Commits

Author  SHA1  Message  Date
Leonard Richardson  cfb1d23cb5  Merge branch 'master' of https://github.com/leonardr/olipy  2013-12-29 11:23:24 -05:00
Leonard Richardson  727ad1d1e2  Added list of bad words.  2013-12-29 11:23:09 -05:00
Leonard Richardson  acb0b6ad3b  Added some word lists from COHA.  2013-12-29 11:19:13 -05:00
Leonard Richardson  4d6125816c  Merge branch 'master' of https://github.com/leonardr/olipy  2013-12-25 23:46:21 -05:00
Leonard Richardson  232feabb24  Iterating over the year directories is (mostly? entirely?) redundant with the numbered directories.  2013-12-25 22:06:27 -05:00
Leonard Richardson  cb3e7a950b  Added Rosetta stone gibberish.  2013-12-18 09:13:28 -05:00
Leonard Richardson  1eb08cb1ff  Yield the first token.  2013-12-16 12:07:40 -05:00
Leonard Richardson  5858e73870  Added a SentenceAssembler to queneau.  2013-12-15 10:58:16 -05:00
Leonard Richardson  d2258334cf  Added a couple more links.  2013-12-15 10:34:11 -05:00
Leonard Richardson  34ec933758  Improved README  2013-12-15 10:30:25 -05:00
Leonard Richardson  1ed840ab70  Added a class for diagnosing Unicode strings and a few more alphabets.  2013-12-15 10:16:06 -05:00
Leonard Richardson  040346dbf8  Merge branch 'master' of https://github.com/leonardr/olipy  2013-12-04 09:19:46 -05:00
Leonard Richardson  daa063a32e  Made example filename more generic.  2013-12-04 09:19:17 -05:00
Leonard Richardson  8b34e9f8d1  Added disclaimer.  2013-12-03 17:08:46 -05:00
Leonard Richardson  277f8851f5  Added a very simple scheduler because I'm sick of dealing with huge standard deviations.  2013-12-03 17:01:25 -05:00
Leonard Richardson  36c297f8a9  Added a port of the word filter.  2013-12-01 21:22:44 -05:00
Leonard Richardson  1d74cebcad  Added as much of a modifier alphabet as I could find.  2013-12-01 06:57:23 -05:00
Leonard Richardson  75506ae8e4  Added another indicator of the start of the text.  2013-11-30 18:41:56 -05:00
Leonard Richardson  0abe661395  Correctly identify the etext ID from a numeric filename.  2013-11-30 18:14:26 -05:00
Leonard Richardson  d0cdf7f945  Automatically provide the RDF graph for each PG text (if possible), and search that graph for language information more reliable than the stuff inside the header.  2013-11-30 17:20:57 -05:00
Leonard Richardson  7df369a250  Added a lot of other ways for the etext part of a book to end.  2013-11-30 09:34:55 -05:00
Leonard Richardson  abbec27c53  Added a Markov generator that tried to keep brackets and quotes balanced.  2013-11-29 16:35:18 -05:00
Leonard Richardson  49ce43e570  Made the API for the Markov chain module consistent with the API for the Queneau assembly module.  2013-11-29 09:00:19 -05:00
Leonard Richardson  a8dc086fa6  Tweaked the ebooks algorithm and added a Markov chain algorithm.  2013-11-28 19:54:00 -05:00
Leonard Richardson  b5276928ab  Improved performance a bit and increased the preference for lines that begin with capital letters.  2013-11-27 13:22:29 -05:00
Leonard Richardson  71565c638d  Added a mapping of old-style Project Gutenberg filenames to new-style ebook IDs.  2013-11-27 10:08:05 -05:00
Leonard Richardson  bf6462653c  Made the ebook generator go through the pre-2007 ebooks.  2013-11-26 18:48:22 -05:00
Leonard Richardson  1bb5220fcc  Try to get all the way through the corpus.  2013-11-26 18:08:49 -05:00
Leonard Richardson  7fbc3d47b6  Derive encoding from filename if possible.  2013-11-26 15:23:20 -05:00
Leonard Richardson  7b4733e236  Derive encoding from filename if possible.  2013-11-26 15:21:56 -05:00
Leonard Richardson  9dbd816e8d  We can now parse every plain-text document in the Project Gutenberg DVD.  2013-11-26 15:18:45 -05:00
Leonard Richardson  b66a5240e8  Added the ability to extract the 'best' version of each text on a Project Gutenberg CD or DVD.  2013-11-26 12:36:07 -05:00
Leonard Richardson  01bb4d70f7  Fixed text in use.  2013-11-26 08:30:04 -05:00
Leonard Richardson  9479250a6c  Added english.py.  2013-11-26 08:27:55 -05:00
Leonard Richardson  e02ea08a3b  Remove obviously unbalanced quote marks.  2013-11-26 08:24:34 -05:00
Leonard Richardson  84431d7f4e  Added a number of horse_ebooks-like tweaks to improve the quality of the selected quotes.  2013-11-26 08:22:33 -05:00
Leonard Richardson  97df55de06  Added a basic Project Gutenberg tool and an exciting new text sampler that supplies @horse_ebooks-style hilarity.  2013-11-25 23:01:28 -05:00
Leonard Richardson  ab1fb1b909  Tweaked probabilities and improved the looks of gradients.  2013-11-25 08:54:30 -05:00
Leonard Richardson  be43fe753a  Added gibberish gradients.  2013-11-24 22:41:45 -05:00
Leonard Richardson  d0ed2cae39  Added more block and box drawing charsets.  2013-11-23 12:44:19 -05:00
Leonard Richardson  160aa33939  Added some alphanumeric mosaic sets.  2013-11-14 15:13:43 -05:00
Leonard Richardson  dbd5822a7b  Bring back the 'choice among Latin alphabets', using a wide variety of cool alphabets assembled by @tef for the unicodefuckery project.  2013-11-14 14:58:53 -05:00
Leonard Richardson  e8493734c7  Added composite gibberish, which is like two, two, two gibberishes in one!  2013-11-14 14:37:59 -05:00
Leonard Richardson  8677df6f8e  Bumped up limited vocabularies.  2013-10-18 17:18:15 -04:00
Leonard Richardson  f36f26cbc7  Fixed 'choose one alphabet.'  2013-10-18 17:09:57 -04:00
Leonard Richardson  615ce1fd19  Un-inverted inverted logic.  2013-10-18 16:56:57 -04:00
Leonard Richardson  c6713ccfdd  Made short strings a little longer.  2013-10-18 16:56:34 -04:00
Leonard Richardson  a8989b6d9a  Added a crossout alphabet.  2013-10-18 14:50:37 -04:00
Leonard Richardson  bb162595e0  Tweaked lengths and added a symbology alphabet.  2013-10-18 14:43:16 -04:00
Leonard Richardson  2faee6e02c  Added fill mosaic as a glitch charset.  2013-10-18 14:18:39 -04:00