centillion: a pan-github-markdown-issues-google-docs search engine.
a centillion: a very large number consisting of a 1 with 303 zeros after it.
one centillion is 3.03 log-times better than a googol.
what is it
Centillion (https://github.com/dcppc/centillion) is a search engine that can index three kinds of collections: Google Documents, Github issues, and Markdown files in Github repos.
We define the types of documents the centillion should index, what information to extract from them, and how. The centillion then builds and updates a search index.
The centillion also provides a simple web frontend for running queries against the search index, served by a Flask server.
The centillion keeps it simple.
Centillion lives behind a Github authentication layer, implemented with flask-dance. When you first visit the site it will ask you to authenticate with Github so that it can verify you have permission to access the site.
Centillion is a Python program built using whoosh (a search engine library). For Google Documents, it indexes the full text of docx files and just the filenames of non-docx files. For Github issues, the full text of each issue and its comments is indexed, and results are grouped by issue. Centillion requires Google Drive and Github OAuth apps. Once you provide credentials to Flask, you're all set to go.
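The indexing rules above (full text for docx, filenames only for everything else, issue results grouped by issue) can be sketched in plain Python. This is a minimal illustration and not centillion's code: it does not use whoosh, and the helper names `indexable_text` and `group_by_issue` are hypothetical.

```python
from collections import defaultdict


def indexable_text(filename, body):
    """Return the text to index for a Drive file.

    docx files contribute their name and full text;
    every other file type contributes its filename only.
    """
    if filename.lower().endswith(".docx"):
        return f"{filename}\n{body}"
    return filename


def group_by_issue(hits):
    """Group (issue_url, snippet) hits so each issue appears once.

    Issues keep the order in which they were first seen.
    """
    grouped = defaultdict(list)
    for issue_url, snippet in hits:
        grouped[issue_url].append(snippet)
    return dict(grouped)
```

A hit on an issue body and a hit on one of its comments end up under the same key, which is what lets a results page show one entry per issue.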
There's also a control panel at https://search.nihdatacommons.us/control_panel that allows you to rebuild the search index from scratch (the Google Drive indexing takes a while).
quickstart (with Github auth)
Start by creating a Github OAuth application. Get the public and private application keys (client token and client secret token) from the Github application's page. You will also need a Github access token, in addition to the app tokens.
When you create the application, set the callback URL to your site's address followed by /login/github/authorized.
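To double-check the value you paste into the Github OAuth app form, the callback URL can be assembled from the site address. This is a small sketch: the host `search.example.com` is a placeholder, the helper `callback_url` is hypothetical, and it assumes centillion is served from the root of the site.

```python
from urllib.parse import urlsplit, urlunsplit

# Callback path named in the quickstart text.
CALLBACK_PATH = "/login/github/authorized"


def callback_url(base_url):
    """Build the OAuth callback URL for a site served at base_url."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL, got {base_url!r}")
    # Keep scheme and host, replace any path with the callback path.
    return urlunsplit((parts.scheme, parts.netloc, CALLBACK_PATH, "", ""))


# Placeholder host, for illustration only.
print(callback_url("https://search.example.com"))
```

The scheme matters here: the URL registered with Github has to match the scheme (http or https) that the site is actually reached on.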
Edit the Flask configuration
and set the public and private application keys.
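A Flask configuration file for this step might look roughly like the sketch below. The setting names `GITHUB_OAUTH_CLIENT_ID` and `GITHUB_OAUTH_CLIENT_SECRET` are the ones flask-dance's Github blueprint looks up in the Flask config when no keys are passed explicitly; whether centillion's own configuration file uses exactly these names is an assumption, and `missing_settings` is a hypothetical helper.

```python
import os

# Assumed setting names (see the note above). Values are read from the
# environment so the keys never need to be committed to the repo.
GITHUB_OAUTH_CLIENT_ID = os.environ.get("GITHUB_OAUTH_CLIENT_ID", "")
GITHUB_OAUTH_CLIENT_SECRET = os.environ.get("GITHUB_OAUTH_CLIENT_SECRET", "")

REQUIRED = ("GITHUB_OAUTH_CLIENT_ID", "GITHUB_OAUTH_CLIENT_SECRET")


def missing_settings(settings):
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED if not settings.get(name)]
```

Running a check like `missing_settings` before starting the server gives a clearer error than a failed Github login later on.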
Now run centillion:
python centillion.py
or if you used http instead of https:
OAUTHLIB_INSECURE_TRANSPORT="true" python centillion.py
This will start a Flask server, and you can view the minimal search engine interface in your browser.
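The OAUTHLIB_INSECURE_TRANSPORT variable in the http variant of the command is needed because the OAuth library rejects non-HTTPS URLs unless that variable is set to a non-empty value. The function below is an illustrative stand-in for that kind of check, written for this README; it is not the library's code, and the name `is_transport_allowed` is hypothetical.

```python
import os


def is_transport_allowed(url, environ=os.environ):
    """Return True if the OAuth flow may proceed for this URL.

    HTTPS is always allowed; plain HTTP is allowed only when
    OAUTHLIB_INSECURE_TRANSPORT is set to a non-empty value.
    """
    if environ.get("OAUTHLIB_INSECURE_TRANSPORT"):
        return True
    return url.lower().startswith("https://")
```

Setting the variable is reasonable for local testing over http; a deployed site should be served over https instead.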
If you are having problems with your callback URL being treated as HTTP by Github, even though there is an HTTPS address, and everything else seems fine, try deleting the Github OAuth app and creating a new one.