| Author | Commit | Message | Date |
|---|---|---|---|
| Leonard Richardson | cfb1d23cb5 | Merge branch 'master' of https://github.com/leonardr/olipy | 2013-12-29 11:23:24 -05:00 |
| Leonard Richardson | 727ad1d1e2 | Added list of bad words. | 2013-12-29 11:23:09 -05:00 |
| Leonard Richardson | acb0b6ad3b | Added some word lists from COHA. | 2013-12-29 11:19:13 -05:00 |
| Leonard Richardson | 4d6125816c | Merge branch 'master' of https://github.com/leonardr/olipy | 2013-12-25 23:46:21 -05:00 |
| Leonard Richardson | 232feabb24 | Iterating over the year directories is (mostly? entirely?) redundant with the numbered directories. | 2013-12-25 22:06:27 -05:00 |
| Leonard Richardson | cb3e7a950b | Added Rosetta stone gibberish. | 2013-12-18 09:13:28 -05:00 |
| Leonard Richardson | 1eb08cb1ff | Yield the first token. | 2013-12-16 12:07:40 -05:00 |
| Leonard Richardson | 5858e73870 | Added a SentenceAssembler to queneau. | 2013-12-15 10:58:16 -05:00 |
| Leonard Richardson | d2258334cf | Added a couple more links. | 2013-12-15 10:34:11 -05:00 |
| Leonard Richardson | 34ec933758 | Improved README | 2013-12-15 10:30:25 -05:00 |
| Leonard Richardson | 1ed840ab70 | Added a class for diagnosing Unicode strings and a few more alphabets. | 2013-12-15 10:16:06 -05:00 |
| Leonard Richardson | 040346dbf8 | Merge branch 'master' of https://github.com/leonardr/olipy | 2013-12-04 09:19:46 -05:00 |
| Leonard Richardson | daa063a32e | Made example filename more generic. | 2013-12-04 09:19:17 -05:00 |
| Leonard Richardson | 8b34e9f8d1 | Added disclaimer. | 2013-12-03 17:08:46 -05:00 |
| Leonard Richardson | 277f8851f5 | Added a very simple scheduler because I'm sick of dealing with huge standard deviations. | 2013-12-03 17:01:25 -05:00 |
| Leonard Richardson | 36c297f8a9 | Added a port of the word filter. | 2013-12-01 21:22:44 -05:00 |
| Leonard Richardson | 1d74cebcad | Added as much of a modifier alphabet as I could find. | 2013-12-01 06:57:23 -05:00 |
| Leonard Richardson | 75506ae8e4 | Added another indicator of the start of the text. | 2013-11-30 18:41:56 -05:00 |
| Leonard Richardson | 0abe661395 | Correctly identify the etext ID from a numeric filename. | 2013-11-30 18:14:26 -05:00 |
| Leonard Richardson | d0cdf7f945 | Automatically provide the RDF graph for each PG text (if possible), and search that graph for language information more reliable than the stuff inside the header. | 2013-11-30 17:20:57 -05:00 |
| Leonard Richardson | 7df369a250 | Added a lot of other ways for the etext part of a book to end. | 2013-11-30 09:34:55 -05:00 |
| Leonard Richardson | abbec27c53 | Added a Markov generator that tried to keep brackets and quotes balanced. | 2013-11-29 16:35:18 -05:00 |
| Leonard Richardson | 49ce43e570 | Made the API for the Markov chain module consistent with the API for the Queneau assembly module. | 2013-11-29 09:00:19 -05:00 |
| Leonard Richardson | a8dc086fa6 | Tweaked the ebooks algorithm and added a Markov chain algorithm. | 2013-11-28 19:54:00 -05:00 |
| Leonard Richardson | b5276928ab | Improved performance a bit and increased the preference for lines that begin with capital letters. | 2013-11-27 13:22:29 -05:00 |
| Leonard Richardson | 71565c638d | Added a mapping of old-style Project Gutenberg filenames to new-style ebook IDs. | 2013-11-27 10:08:05 -05:00 |
| Leonard Richardson | bf6462653c | Made the ebook generator go through the pre-2007 ebooks. | 2013-11-26 18:48:22 -05:00 |
| Leonard Richardson | 1bb5220fcc | Try to get all the way through the corpus. | 2013-11-26 18:08:49 -05:00 |
| Leonard Richardson | 7fbc3d47b6 | Derive encoding from filename if possible. | 2013-11-26 15:23:20 -05:00 |
| Leonard Richardson | 7b4733e236 | Derive encoding from filename if possible. | 2013-11-26 15:21:56 -05:00 |
| Leonard Richardson | 9dbd816e8d | We can now parse every plain-text document in the Project Gutenberg DVD. | 2013-11-26 15:18:45 -05:00 |
| Leonard Richardson | b66a5240e8 | Added the ability to extract the 'best' version of each text on a Project Gutenberg CD or DVD. | 2013-11-26 12:36:07 -05:00 |
| Leonard Richardson | 01bb4d70f7 | Fixed text in use. | 2013-11-26 08:30:04 -05:00 |
| Leonard Richardson | 9479250a6c | Added english.py. | 2013-11-26 08:27:55 -05:00 |
| Leonard Richardson | e02ea08a3b | Remove obviously unbalanced quote marks. | 2013-11-26 08:24:34 -05:00 |
| Leonard Richardson | 84431d7f4e | Added a number of horse_ebooks-like tweaks to improve the quality of the selected quotes. | 2013-11-26 08:22:33 -05:00 |
| Leonard Richardson | 97df55de06 | Added a basic Project Gutenberg tool and an exciting new text sampler that supplies @horse_ebooks-style hilarity. | 2013-11-25 23:01:28 -05:00 |
| Leonard Richardson | ab1fb1b909 | Tweaked probabilities and improved the looks of gradients. | 2013-11-25 08:54:30 -05:00 |
| Leonard Richardson | be43fe753a | Added gibberish gradients. | 2013-11-24 22:41:45 -05:00 |
| Leonard Richardson | d0ed2cae39 | Added more block and box drawing charsets. | 2013-11-23 12:44:19 -05:00 |
| Leonard Richardson | 160aa33939 | Added some alphanumeric mosaic sets. | 2013-11-14 15:13:43 -05:00 |
| Leonard Richardson | dbd5822a7b | Bring back the 'choice among Latin alphabets', using a wide variety of cool alphabets assembled by @tef for the unicodefuckery project. | 2013-11-14 14:58:53 -05:00 |
| Leonard Richardson | e8493734c7 | Added composite gibberish, which is like two, two, two gibberishes in one! | 2013-11-14 14:37:59 -05:00 |
| Leonard Richardson | 8677df6f8e | Bumped up limited vocabularies. | 2013-10-18 17:18:15 -04:00 |
| Leonard Richardson | f36f26cbc7 | Fixed 'choose one alphabet.' | 2013-10-18 17:09:57 -04:00 |
| Leonard Richardson | 615ce1fd19 | Un-inverted inverted logic. | 2013-10-18 16:56:57 -04:00 |
| Leonard Richardson | c6713ccfdd | Made short strings a little longer. | 2013-10-18 16:56:34 -04:00 |
| Leonard Richardson | a8989b6d9a | Added a crossout alphabet. | 2013-10-18 14:50:37 -04:00 |
| Leonard Richardson | bb162595e0 | Tweaked lengths and added a symbology alphabet. | 2013-10-18 14:43:16 -04:00 |
| Leonard Richardson | 2faee6e02c | Added fill mosaic as a glitch charset. | 2013-10-18 14:18:39 -04:00 |