34 Commits
v0.2 ... v0.3

Author SHA1 Message Date
4d6386e74a add results-handling for markdown files 2018-08-03 00:19:57 -07:00
a93b7519de improve counts accounting, and construct usable urls for markdown 2018-08-03 00:19:35 -07:00
5e2c37164b fix markdown indexing 2018-08-02 23:56:56 -07:00
829e9c4263 finish subsuming repotree into centillion_search 2018-08-02 23:14:55 -07:00
283991017c add repotree script. temporary/standalone, but doing exactly what centillion needs to do. 2018-08-02 22:29:18 -07:00
653af18f24 add update_index_markdown() function, rough/unfinished 2018-08-02 22:27:30 -07:00
fae184f1f3 re-indexer now calls (nonexistent file) update_index_markdown 2018-08-02 22:26:56 -07:00
d40bb3557f Merge branch 'flask-dance' of charlesreid1/centillion into master 2018-08-03 04:09:20 +00:00
a848f3ec3e complete the conversion to oauth tokens 2018-08-02 19:06:34 -07:00
50d27a915a update readme 2018-08-02 19:04:40 -07:00
1b950b7790 update re-index task to use gh token; reorganize logic; use werkzeug proxy 2018-08-02 19:02:00 -07:00
04d4195668 Add flask-dance to centillion.
- Remove config file, which now contains secrets
- Add flask dance to requirements
- Update instructions in readme to include Github application setup
2018-08-02 11:52:56 -07:00
d0fe7aa799 ignore config files, which may have keys in them 2018-08-02 11:24:33 -07:00
acc28aab44 Merge branch 'cache-and-hash' of charlesreid1/centillion into master 2018-08-02 17:59:45 +00:00
adc2666a9b actually fix flashed messages 2018-08-02 00:58:37 -07:00
581f0a67ed fix messages so they are js and dismissable 2018-08-02 00:54:56 -07:00
0b96061bc5 update documentation, add new docs pages on components/flask/whoosh 2018-08-01 23:04:35 -07:00
c7acdea889 finally. make results comprehensible. 2018-08-01 22:39:07 -07:00
4eabd4536e remove last searches from search.html 2018-08-01 22:32:20 -07:00
78276c14d9 align badges higher 2018-08-01 22:31:59 -07:00
68f90d383f fix up how issues are added, and how all issues are iterated over (use set algebra) 2018-08-01 22:31:41 -07:00
202643b85e add control_panel route, remove last_search silliness 2018-08-01 22:29:06 -07:00
dc9ac74d68 add control panel page 2018-08-01 20:12:55 -07:00
36cc94a854 Fix bootstrap div classes, badgify counts, fix <li> styles 2018-08-01 20:12:10 -07:00
740e757bcd update todo with what we have done 2018-08-01 15:54:03 -07:00
bf6afe39c6 caching is working 2018-08-01 15:48:43 -07:00
54c09ce80b call add drive file function with add/update docIDs. fix method headers. 2018-08-01 15:17:07 -07:00
1407178f39 updating flask config and templates to parameterize repo info in footer 2018-08-01 13:43:43 -07:00
2bf9abfd6f update footer: prior searches are now badges, and link to more info now points to repo 2018-08-01 13:36:45 -07:00
8328f96f76 make "prior searches" a badge and infobox bg color 2018-08-01 13:36:05 -07:00
d5a9fe85af Merge branch 'master' into cache-and-hash
* master:
  update installation preparation step
2018-08-01 12:50:10 -07:00
f8d2156d85 update installation preparation step 2018-08-01 12:48:09 -07:00
a753ba4963 update centillion search with comment blocks laying out what to change and where 2018-08-01 11:32:37 -07:00
8cca4b2c8d add TAGLINE param 2018-08-01 00:49:56 -07:00
19 changed files with 1019 additions and 305 deletions

2
.gitignore vendored

@@ -1,8 +1,8 @@
+config_*
 vp
 credentials.json
 drive*.json
 *.pyc
-config.py
 out/
 search_index/
 venv/

View File

@@ -8,6 +8,7 @@ the centillion is 3.03 log-times better than the googol.
 ![Screen shot of centillion](img/ss.png)
+
 ## what is it
 
 The centillion is a search engine built using [whoosh](https://whoosh.readthedocs.io/en/latest/intro.html),
@@ -24,17 +25,46 @@ defined in `centillion.py`.
 The centillion keeps it simple.
 
-## quickstart
+## quickstart (with Github auth)
 
-Run the centillion app with a github access token API key set via
-environment variable:
+Start by creating a Github OAuth application.
+Get the public and private application key
+(client token and client secret token)
+from the Github application's page.
+When you create the application, set the callback
+URL to `/login/github/authorized`, as in:
 
 ```
-GITHUB_TOKEN="XXXXXXXX" python centillion.py
+https://<url>/login/github/authorized
+```
+
+Edit the Flask configuration `config_flask.py`
+and set the public and private application keys.
+
+Now run centillion:
+
+```
+python centillion.py
+```
+
+or if you used http instead of https:
+
+```
+OAUTHLIB_INSECURE_TRANSPORT="true" python centillion.py
 ```
 
 This will start a Flask server, and you can view the minimal search engine
-interface in your browser at <http://localhost:5000>.
+interface in your browser at `http://<ip>:5000`.
+
+## troubleshooting
+
+If you are having problems with your callback URL being treated
+as HTTP by Github, even though there is an HTTPS address, and
+everything else seems fine, try deleting the Github OAuth app
+and creating a new one.
 
 ## more info

48
Todo.md

@@ -1,7 +1,47 @@
 # todo
 
-current problems:
-- some github issues have no title
-- github issues are just being re-indexed over and over
-- documents not showing up in results
+Main task:
+- hashing and caching
+    - <s>first, working out the logic of how we group items into sets
+        - needs to be deleted
+        - needs to be updated
+        - needs to be added
+        - for docs, issues, and comments</s>
+    - second, when we add or update an item, need to:
+        - go through the motions, download file, extract text
+        - check for existing indexed doc with that id
+        - check if existing indexed doc has same hash
+        - if so, skip
+        - otherwise, delete and re-index
+
+Other bugs:
+- Some github issues have no title (?)
+- <s>Need to combine issues with comments</s>
+- Not able to index markdown files _in a repo_
+- (Longer term) update main index vs update diff index
+
+Needs:
+- <s>control panel</s>
+
+Thursday product:
+- Everything re-indexed nightly
+- Search engine built on all documents in Google Drive, all issues, markdown files
+- Using pandoc to extract Google Drive document contents
+- BRIEF quickstart documentation
+
+Future:
+- Future plans to improve - plugins, improving matching
+- Subdomain plans
+- Folksonomy tagging and integration plans
+
+config options for plugins
+conditional blocks with import github inside
+complicated tho - better to have components split off
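
Editor's note: the add-or-update flow described in the main task above, sketched against a whoosh writer. `compute_hash()` and the `fingerprint` field are hypothetical names; this changeset only adds `fingerprint` to the schema as a commented-out line.

```python
import hashlib

def compute_hash(content):
    # Hypothetical fingerprint of the extracted text.
    return hashlib.sha256(content.encode('utf-8')).hexdigest()

def add_or_update(writer, searcher, item_id, content):
    """Skip unchanged items; delete and re-index changed or new ones."""
    fingerprint = compute_hash(content)
    existing = searcher.document(id=item_id)
    if existing is not None and existing.get('fingerprint') == fingerprint:
        # Existing indexed doc has the same hash: skip.
        return
    # New or changed: delete any stale record, then re-index.
    writer.delete_by_term('id', item_id)
    writer.add_document(id=item_id, fingerprint=fingerprint, content=content)
```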

centillion.py

@@ -2,8 +2,11 @@ import threading
 from subprocess import call
 import codecs
-import os
+import os, json
+from werkzeug.contrib.fixers import ProxyFix
 
 from flask import Flask, request, redirect, url_for, render_template, flash
+from flask_dance.contrib.github import make_github_blueprint, github
 
 # create our application
 from centillion_search import Search
@@ -22,10 +25,12 @@ You provide:
 - Google Drive API key via file
 """
 
 class UpdateIndexTask(object):
-    def __init__(self, diff_index=False):
+    def __init__(self, gh_oauth_token, diff_index=False):
         self.diff_index = diff_index
         thread = threading.Thread(target=self.run, args=())
+        self.gh_oauth_token = gh_oauth_token
         thread.daemon = True
         thread.start()
@@ -38,91 +43,178 @@ class UpdateIndexTask(object):
         from get_centillion_config import get_centillion_config
         config = get_centillion_config('config_centillion.json')
 
-        gh_token = os.environ['GITHUB_TOKEN']
-        search.update_index_issues(gh_token, config)
+        search.update_index_markdown(self.gh_oauth_token,config)
+        search.update_index_issues(self.gh_oauth_token,config)
         search.update_index_gdocs(config)
 
 
 app = Flask(__name__)
+app.wsgi_app = ProxyFix(app.wsgi_app)
 
 # Load default config and override config from an environment variable
 app.config.from_pyfile("config_flask.py")
 
-last_searches_file = app.config["INDEX_DIR"] + "/last_searches.txt"
+github_bp = make_github_blueprint()
+#github_bp = make_github_blueprint(
+#        client_id = os.environ.get('GITHUB_OAUTH_CLIENT_ID'),
+#        client_secret = os.environ.get('GITHUB_OAUTH_CLIENT_SECRET'),
+#        scope='read:org')
+
+app.register_blueprint(github_bp, url_prefix="/login")
+
+contents404 = "<html><body><h1>Status: Error 404 Page Not Found</h1></body></html>"
+contents403 = "<html><body><h1>Status: Error 403 Access Denied</h1></body></html>"
+contents200 = "<html><body><h1>Status: OK 200</h1></body></html>"
 
 ##############################
 # Flask routes
 
 @app.route('/')
 def index():
-    return redirect(url_for("search", query="", fields=""))
+
+    if not github.authorized:
+        return redirect(url_for("github.login"))
+    else:
+        username = github.get("/user").json()['login']
+        resp = github.get("/user/orgs")
+        if resp.ok:
+            # If they are in team copper, redirect to search.
+            # Otherwise, hit em with a 403
+            all_orgs = resp.json()
+            for org in all_orgs:
+                if org['login']=='dcppc':
+                    copper_team_id = '2700235'
+                    mresp = github.get('/teams/%s/members/%s'%(copper_team_id,username))
+                    if mresp.status_code==204:
+                        # --------------------
+                        # Business as usual
+                        return redirect(url_for("search", query="", fields=""))
+            return contents403
+    return contents404
+
+### @app.route('/')
+### def index():
+###     return redirect(url_for("search", query="", fields=""))
 
 @app.route('/search')
 def search():
-    query = request.args['query']
-    fields = request.args.get('fields')
-    if fields == 'None':
-        fields = None
 
-    search = Search(app.config["INDEX_DIR"])
-    if not query:
-        parsed_query = ""
-        result = []
-    else:
-        parsed_query, result = search.search(query.split(), fields=[fields])
-        store_search(query, fields)
+    if not github.authorized:
+        return redirect(url_for("github.login"))
 
-    totals = search.get_document_total_count()
+    username = github.get("/user").json()['login']
+
+    resp = github.get("/user/orgs")
+    if resp.ok:
+        all_orgs = resp.json()
+        for org in all_orgs:
+            if org['login']=='dcppc':
+                copper_team_id = '2700235'
+                mresp = github.get('/teams/%s/members/%s'%(copper_team_id,username))
+                if mresp.status_code==204:
+                    # --------------------
+                    # Business as usual
+                    query = request.args['query']
+                    fields = request.args.get('fields')
+                    if fields == 'None':
+                        fields = None
+
+                    search = Search(app.config["INDEX_DIR"])
+                    if not query:
+                        parsed_query = ""
+                        result = []
+                    else:
+                        parsed_query, result = search.search(query.split(), fields=[fields])
+
+                    totals = search.get_document_total_count()
+
+                    return render_template('search.html',
+                                           entries=result,
+                                           query=query,
+                                           parsed_query=parsed_query,
+                                           fields=fields,
+                                           totals=totals)
+    return contents403
 
-    return render_template('search.html',
-                           entries=result,
-                           query=query,
-                           parsed_query=parsed_query,
-                           fields=fields,
-                           last_searches=get_last_searches(),
-                           totals=totals)
 
 @app.route('/update_index')
 def update_index():
-    rebuild = request.args.get('rebuild')
-    UpdateIndexTask(diff_index=False)
-    flash("Rebuilding index, check console output")
-    return render_template("search.html",
-                           query="",
-                           fields="",
-                           last_searches=get_last_searches(),
-                           totals={})
 
+    if not github.authorized:
+        return redirect(url_for("github.login"))
+
+    username = github.get("/user").json()['login']
+
+    resp = github.get("/user/orgs")
+    if resp.ok:
+        all_orgs = resp.json()
+        for org in all_orgs:
+            if org['login']=='dcppc':
+                copper_team_id = '2700235'
+                mresp = github.get('/teams/%s/members/%s'%(copper_team_id,username))
+                if mresp.status_code==204:
+                    gh_oauth_token = github.token['access_token']
+                    # --------------------
+                    # Business as usual
+                    UpdateIndexTask(gh_oauth_token, diff_index=False)
+                    flash("Rebuilding index, check console output")
+                    return render_template("controlpanel.html",
+                                           totals={})
+    return contents403
 
-##############
-# Utility methods
 
-def get_last_searches():
-    if os.path.exists(last_searches_file):
-        with codecs.open(last_searches_file, 'r', encoding='utf-8') as f:
-            contents = f.readlines()
-    else:
-        contents = []
-    return contents
+@app.route('/control_panel')
+def control_panel():
 
-def store_search(query, fields):
-    if os.path.exists(last_searches_file):
-        with codecs.open(last_searches_file, 'r', encoding='utf-8') as f:
-            contents = f.readlines()
-    else:
-        contents = []
+    if not github.authorized:
+        return redirect(url_for("github.login"))
 
-    search = "query=%s&fields=%s\n" % (query, fields)
-    if not search in contents:
-        contents.insert(0, search)
+    username = github.get("/user").json()['login']
 
-    with codecs.open(last_searches_file, 'w', encoding='utf-8') as f:
-        f.writelines(contents[:30])
+    resp = github.get("/user/orgs")
+    if resp.ok:
+        all_orgs = resp.json()
+        for org in all_orgs:
+            if org['login']=='dcppc':
+                copper_team_id = '2700235'
+                mresp = github.get('/teams/%s/members/%s'%(copper_team_id,username))
+                if mresp.status_code==204:
+                    return render_template("controlpanel.html",
+                                           totals={})
+    return contents403
+
+
+@app.errorhandler(404)
+def oops(e):
+    return contents404
 
 if __name__ == '__main__':
-    app.run()
+    app.run(host="0.0.0.0",port=5000)
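
Editor's note: the `/`, `/search`, `/update_index`, and `/control_panel` routes above all repeat the same dcppc/copper membership check. A hedged sketch of how that gate could be factored into a decorator; this is not code from the changeset, just the same flask-dance calls rearranged.

```python
from functools import wraps

from flask import redirect, url_for
from flask_dance.contrib.github import github

contents403 = "<html><body><h1>Status: Error 403 Access Denied</h1></body></html>"

def copper_team_required(route):
    """Only run the wrapped route for members of the copper team."""
    @wraps(route)
    def wrapper(*args, **kwargs):
        if not github.authorized:
            return redirect(url_for("github.login"))
        username = github.get("/user").json()['login']
        resp = github.get("/user/orgs")
        if resp.ok:
            for org in resp.json():
                if org['login'] == 'dcppc':
                    copper_team_id = '2700235'
                    mresp = github.get('/teams/%s/members/%s' % (copper_team_id, username))
                    if mresp.status_code == 204:
                        # Business as usual
                        return route(*args, **kwargs)
        return contents403
    return wrapper
```

Each route body would then shrink to its "business as usual" block under an `@copper_team_required` decorator placed after `@app.route(...)`.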

5
centillion_prepare.py Normal file

@@ -0,0 +1,5 @@
from gdrive_util import GDrive
gd = GDrive()
service = gd.get_service()

centillion_search.py

@@ -2,6 +2,7 @@ import shutil
 import html.parser
 
 from github import Github
+import base64
 
 from gdrive_util import GDrive
 from apiclient.http import MediaIoBaseDownload
@@ -42,6 +43,7 @@ Search object functions:
 Schema:
  - id
  - kind
+ - fingerprint
  - created_time
  - modified_time
  - indexed_time
@@ -95,6 +97,11 @@ class Search:
     def __init__(self, index_folder):
         self.open_index(index_folder)
 
+    # ------------------------------
+    # Create a schema and open a search index
+    # on disk.
+
     def open_index(self, index_folder, create_new=False):
         """
         Create a schema,
@@ -115,13 +122,13 @@
 
         # ------------------------------
+        # IMPORTANT:
         # This is where the search index's document schema
         # is defined.
 
         schema = Schema(
                 id = ID(stored=True, unique=True),
                 kind = ID(stored=True),
+                #fingerprint = ID(stored=True),
                 created_time = ID(stored=True),
                 modified_time = ID(stored=True),
@@ -160,16 +167,13 @@
     # Define how to add documents
 
-    def add_drive_file(self, writer, item, indexed_ids, temp_dir, config):
+    def add_drive_file(self, writer, item, temp_dir, config, update=False):
         """
         Add a Google Drive document/file to a search index.
         If it is a document, extract the contents.
         """
-        gd = GDrive()
-        service = gd.get_service()
 
-        # ------------------------
-        # Two kinds of documents:
+        # There are two kinds of documents:
         # - documents with text that can be extracted (docx)
         # - everything else
@@ -179,10 +183,33 @@
         }
 
         content = ""
-        if(mimetype not in mimemap.keys()):
-            # Not a document -
-            # Just a file
-            print("Indexing document \"%s\" of type %s"%(item['name'], mimetype))
+        if mimetype not in mimemap.keys():
+
+            # Not a document - just a file
+            print("Indexing Google Drive file \"%s\" of type %s"%(item['name'], mimetype))
+            writer.delete_by_term('id',item['id'])
+
+            # Index a plain google drive file
+            writer.add_document(
+                    id = item['id'],
+                    kind = 'gdoc',
+                    created_time = item['createdTime'],
+                    modified_time = item['modifiedTime'],
+                    indexed_time = datetime.now().replace(microsecond=0).isoformat(),
+                    title = item['name'],
+                    url = item['webViewLink'],
+                    mimetype = mimetype,
+                    owner_email = item['owners'][0]['emailAddress'],
+                    owner_name = item['owners'][0]['displayName'],
+                    repo_name='',
+                    repo_url='',
+                    github_user='',
+                    issue_title='',
+                    issue_url='',
+                    content = content
+            )
 
         else:
             # Document with text
             # Perform content extraction
@@ -194,7 +221,8 @@
 
             # This is a file type we know how to convert
             # Construct the URL and download it
-            print("Extracting content from \"%s\" of type %s"%(item['name'], mimetype))
+            print("Indexing Google Drive document \"%s\" of type %s"%(item['name'], mimetype))
+            print(" > Extracting content")
 
             # Create a URL and a destination filename
@@ -215,7 +243,7 @@
             outfile_name = name+'.'+out_ext
 
-            # assemble input/output file paths
+            # Assemble input/output file paths
             fullpath_input = os.path.join(temp_dir,infile_name)
             fullpath_output = os.path.join(temp_dir,outfile_name)
@@ -234,7 +262,7 @@
                 )
                 assert output == ""
             except RuntimeError:
-                print("XXXXXX Failed to index document \"%s\""%(item['name']))
+                print(" > XXXXXX Failed to index document \"%s\""%(item['name']))
 
             # If export was successful, read contents of markdown
@@ -247,7 +275,7 @@
 
             # No matter what happens, clean up.
-            print("Cleaning up \"%s\""%item['name'])
+            print(" > Cleaning up \"%s\""%item['name'])
 
             subprocess.call(['rm','-fr',fullpath_output])
             #print(" ".join(['rm','-fr',fullpath_output]))
@@ -255,49 +283,70 @@
             subprocess.call(['rm','-fr',fullpath_input])
             #print(" ".join(['rm','-fr',fullpath_input]))
 
+            if update:
+                print(" > Removing old record")
+                writer.delete_by_term('id',item['id'])
+            else:
+                print(" > Creating a new record")
 
-            # ------------------------------
-            # IMPORTANT:
-            # This is where the search documents are actually created.
-
-            mimetype = re.split('[/\.]', item['mimeType'])[-1]
             writer.add_document(
                     id = item['id'],
                     kind = 'gdoc',
                     created_time = item['createdTime'],
                     modified_time = item['modifiedTime'],
                     indexed_time = datetime.now().replace(microsecond=0).isoformat(),
                     title = item['name'],
                     url = item['webViewLink'],
                     mimetype = mimetype,
                     owner_email = item['owners'][0]['emailAddress'],
                     owner_name = item['owners'][0]['displayName'],
                     repo_name='',
                     repo_url='',
                     github_user='',
                     issue_title='',
                     issue_url='',
                     content = content
             )
 
-    def add_issue(self, writer, issue, repo, config):
+    # ------------------------------
+    # Add a single github issue and its comments
+    # to a search index.
+
+    def add_issue(self, writer, issue, config, update=True):
         """
         Add a Github issue/comment to a search index.
         """
+        repo = issue.repository
         repo_name = repo.owner.login+"/"+repo.name
         repo_url = repo.html_url
 
-        count = 0
-
-        # Handle the issue content
         print("Indexing issue %s"%(issue.html_url))
 
+        # Combine comments with their respective issues.
+        # Otherwise just too noisy.
+        issue_comment_content = issue.body.rstrip()
+        issue_comment_content += "\n"
+
+        # Handle the comments content
+        if(issue.comments>0):
+            comments = issue.get_comments()
+            for comment in comments:
+                issue_comment_content += comment.body.rstrip()
+                issue_comment_content += "\n"
+
+        # Now create the actual search index record
         created_time = clean_timestamp(issue.created_at)
         modified_time = clean_timestamp(issue.updated_at)
         indexed_time = clean_timestamp(datetime.now())
 
+        # Add one document per issue thread,
+        # containing entire text of thread.
         writer.add_document(
                 id = issue.html_url,
                 kind = 'issue',
@@ -314,45 +363,67 @@
                 github_user = issue.user.login,
                 issue_title = issue.title,
                 issue_url = issue.html_url,
-                content = issue.body.rstrip()
+                content = issue_comment_content
         )
 
-        count += 1
-
-        # Handle the comments content
-        if(issue.comments>0):
-
-            comments = issue.get_comments()
-            for comment in comments:
-
-                print(" > Indexing comment %s"%(comment.html_url))
-
-                created_time = clean_timestamp(comment.created_at)
-                modified_time = clean_timestamp(comment.updated_at)
-                indexed_time = clean_timestamp(datetime.now())
-
-                writer.add_document(
-                        id = comment.html_url,
-                        kind = 'comment',
-                        created_time = created_time,
-                        modified_time = modified_time,
-                        indexed_time = indexed_time,
-                        title = "Comment on "+issue.title,
-                        url = comment.html_url,
-                        mimetype='',
-                        owner_email='',
-                        owner_name='',
-                        repo_name = repo_name,
-                        repo_url = repo_url,
-                        github_user = comment.user.login,
-                        issue_title = issue.title,
-                        issue_url = issue.html_url,
-                        content = comment.body.rstrip()
-                )
-
-                count += 1
-
-        return count
+
+    def add_markdown(self, writer, d, config, update=True):
+        """
+        Use a Github markdown document API record
+        to add a markdown document's contents to
+        the search index.
+        """
+        repo = d['repo']
+        org = d['org']
+        repo_name = org + "/" + repo
+        repo_url = "https://github.com/" + repo_name
+
+        fpath = d['path']
+        furl = d['url']
+        fsha = d['sha']
+        _, fname = os.path.split(fpath)
+        _, fext = os.path.splitext(fpath)
+
+        print("Indexing markdown doc %s"%(fname))
+
+        # Unpack the requests response and decode the content
+        response = requests.get(furl)
+        jresponse = response.json()
+        content = ""
+        try:
+            binary_content = re.sub('\n','',jresponse['content'])
+            content = base64.b64decode(binary_content).decode('utf-8')
+        except KeyError:
+            print(" > XXXXXXXX Failed to extract 'content' field. You probably hit the rate limit.")
+            return
+
+        # Now create the actual search index record
+        indexed_time = clean_timestamp(datetime.now())
+
+        usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath)
+
+        # Add one document per issue thread,
+        # containing entire text of thread.
+        writer.add_document(
+                id = fsha,
+                kind = 'markdown',
+                created_time = '',
+                modified_time = '',
+                indexed_time = indexed_time,
+                title = fname,
+                url = usable_url,
+                mimetype='',
+                owner_email='',
+                owner_name='',
+                repo_name = repo_name,
+                repo_url = repo_url,
+                github_user = '',
+                issue_title = '',
+                issue_url = '',
+                content = content
+        )
@@ -365,86 +436,107 @@
         """
         Update the search index using a collection of
         Google Drive documents and files.
+
+        Uses the 'id' field to uniquely identify documents.
+
+        Also see:
+        https://developers.google.com/drive/api/v3/reference/files
         """
-        gd = GDrive()
-        service = gd.get_service()
 
-        # -----
-        # Get the set of all documents on Google Drive:
-
-        # ------------------------------
-        # IMPORTANT:
-        # This determines what information about the Google Drive files
-        # you'll get back, and that's all you're going to have to work with.
-        # If you need more information, modify the statement below.
-        # Also see:
-        # https://developers.google.com/drive/api/v3/reference/files
+        # Updated algorithm:
+        # - get set of indexed ids
+        # - get set of remote ids
+        # - drop indexed ids not in remote ids
+        # - index all remote ids
+        # - add hash check in add_
+
+        # Get the set of indexed ids:
+        # ------
+        indexed_ids = set()
+        p = QueryParser("kind", schema=self.ix.schema)
+        q = p.parse("gdoc")
+        with self.ix.searcher() as s:
+            results = s.search(q,limit=None)
+            for result in results:
+                indexed_ids.add(result['id'])
+
+        # Get the set of remote ids:
+        # ------
+        # Start with google drive api object
         gd = GDrive()
         service = gd.get_service()
         drive = service.files()
 
-        # Now index all the docs in the google drive folder
-
-        # We should do more here
-        # to check if we should update
-        # or not...
-        #
-        # loop over existing documents in index:
-        #
-        # p = QueryParser("kind", schema=self.ix.schema)
-        # q = p.parse("gdoc")
-        # with self.ix.searcher() as s:
-        #     results = s.search(q,limit=None)
-        #     counts[key] = len(results)
-
         # The trick is to set next page token to None 1st time thru (fencepost)
         nextPageToken = None
 
         # Use the pager to return all the things
-        items = []
+        remote_ids = set()
+        full_items = {}
         while True:
             ps = 12
             results = drive.list(
                     pageSize=ps,
                     pageToken=nextPageToken,
-                    fields="nextPageToken, files(id, kind, createdTime, modifiedTime, mimeType, name, owners, webViewLink)",
+                    fields = "nextPageToken, files(id, kind, createdTime, modifiedTime, mimeType, name, owners, webViewLink)",
                    spaces="drive"
             ).execute()
 
             nextPageToken = results.get("nextPageToken")
-            items += results.get("files", [])
+            files = results.get("files",[])
+            for f in files:
 
-            # Keep it short
+                # Add all remote docs to a set
+                remote_ids.add(f['id'])
+
+                # Also store the doc
+                full_items[f['id']] = f
+
+            # Shorter:
             break
+
+            ## Longer:
             #if nextPageToken is None:
             #    break
 
-        # Here is where we update.
-        # Grab indexed ids
-        # Grab remote ids
-        # Drop indexed ids not in remote ids
-        # Index all remote ids
-        # Change add_ to update_
-        # Add a hash check in update_
-
-        indexed_ids = set()
-        for item in items:
-            indexed_ids.add(item['id'])
-
         writer = self.ix.writer()
-        count = 0
+
         temp_dir = tempfile.mkdtemp(dir=os.getcwd())
         print("Temporary directory: %s"%(temp_dir))
+        if not os.path.exists(temp_dir):
+            os.mkdir(temp_dir)
 
+        count = 0
 
-        for item in items:
-            self.add_drive_file(writer, item, indexed_ids, temp_dir, config)
+        # Drop any id in indexed_ids
+        # not in remote_ids
+        drop_ids = indexed_ids - remote_ids
+        for drop_id in drop_ids:
+            writer.delete_by_term('id',drop_id)
+
+        # Update any id in indexed_ids
+        # and in remote_ids
+        update_ids = indexed_ids & remote_ids
+        for update_id in update_ids:
+            # cop out
+            writer.delete_by_term('id',update_id)
+            item = full_items[update_id]
+            self.add_drive_file(writer, item, temp_dir, config, update=True)
             count += 1
 
+        # Add any id not in indexed_ids
+        # and in remote_ids
+        add_ids = remote_ids - indexed_ids
+        for add_id in add_ids:
+            item = full_items[add_id]
+            self.add_drive_file(writer, item, temp_dir, config, update=False)
+            count += 1
+
         print("Cleaning temporary directory: %s"%(temp_dir))
         subprocess.call(['rm','-fr',temp_dir])
@@ -453,69 +545,218 @@
 
-    def update_index_issues(self,
-                            gh_access_token,
-                            config):
+    def update_index_issues(self, gh_oauth_token, config):
         """
         Update the search index using a collection of
         Github repo issues and comments.
+
+        gh_oauth_token can also be an access token.
         """
-        # Strategy:
-        # To get the proof of concept up and running,
-        # we are just deleting and re-indexing every issue/comment.
+        # Updated algorithm:
+        # - get set of indexed ids
+        # - get set of remote ids
+        # - drop indexed ids not in remote ids
+        # - index all remote ids
 
-        g = Github(gh_access_token)
+        # Get the set of indexed ids:
+        # ------
+        indexed_issues = set()
+        p = QueryParser("kind", schema=self.ix.schema)
+        q = p.parse("gdoc")
+        with self.ix.searcher() as s:
+            results = s.search(q,limit=None)
+            for result in results:
+                indexed_issues.add(result['id'])
 
-        # Set of all URLs as existing on github
-        to_index = set()
-
-        writer = self.ix.writer()
-
-        # Now index all issue threads in the user-specified repos
+        # Get the set of remote ids:
+        # ------
+        # Start with api object
+        g = Github(gh_oauth_token)
 
         # Iterate over each repo
         list_of_repos = config['repositories']
         for r in list_of_repos:
 
+            # Start by collecting all the things
+            remote_issues = set()
+            full_items = {}
+
             if '/' not in r:
                 err = "Error: specify org/reponame or user/reponame in list of repos"
                 raise Exception(err)
 
             this_org, this_repo = re.split('/',r)
             org = g.get_organization(this_org)
             repo = org.get_repo(this_repo)
 
-            count = 0
-
-            # Iterate over each thread
+            # Iterate over each issue thread
             issues = repo.get_issues()
             for issue in issues:
 
-                # This approach is more work than is needed
-                # but PoC||GTFO
-
                 # For each issue/comment URL,
-                # remove the corresponding item
-                # and re-add it to the index
+                # grab the key and store the
+                # corresponding issue object
+                key = issue.html_url
+                value = issue
 
-                to_index.add(issue.html_url)
-                writer.delete_by_term('url', issue.html_url)
-                count -= 1
+                remote_issues.add(key)
+                full_items[key] = value
 
-                comments = issue.get_comments()
-                for comment in comments:
-                    to_index.add(comment.html_url)
-                    writer.delete_by_term('url', comment.html_url)
+        writer = self.ix.writer()
+        count = 0
 
-                # Now re-add this issue to the index
-                # (this will also add the comments)
-                count += self.add_issue(writer, issue, repo, config)
+        # Drop any issues in indexed_issues
+        # not in remote_issues
+        drop_issues = indexed_issues - remote_issues
+        for drop_issue in drop_issues:
+            writer.delete_by_term('id',drop_issue)
 
+        # Update any issue in indexed_issues
+        # and in remote_issues
+        update_issues = indexed_issues & remote_issues
+        for update_issue in update_issues:
+            # cop out
+            writer.delete_by_term('id',update_issue)
+            item = full_items[update_issue]
+            self.add_issue(writer, item, config, update=True)
+            count += 1
+
+        # Add any issue not in indexed_issues
+        # and in remote_issues
+        add_issues = remote_issues - indexed_issues
+        for add_issue in add_issues:
+            item = full_items[add_issue]
+            self.add_issue(writer, item, config, update=False)
+            count += 1
 
         writer.commit()
         print("Done, updated %d documents in the index" % count)
 
+
+    def update_index_markdown(self, gh_oauth_token, config):
+        """
+        Update the search index using a collection of
+        Markdown files from a Github repo.
+
+        gh_oauth_token can also be an access token.
+        """
+        EXT = '.md'
+
+        # Updated algorithm:
+        # - get set of indexed ids
+        # - get set of remote ids
+        # - drop indexed ids not in remote ids
+        # - index all remote ids
+
+        # Get the set of indexed ids:
+        # ------
+        indexed_ids = set()
+        p = QueryParser("kind", schema=self.ix.schema)
+        q = p.parse("markdown")
+        with self.ix.searcher() as s:
+            results = s.search(q,limit=None)
+            for result in results:
+                indexed_ids.add(result['id'])
+
+        # Get the set of remote ids:
+        # ------
+        # Start with api object
+        g = Github(gh_oauth_token)
+
+        # Now index all markdown files
+        # in the user-specified repos
+
+        # Iterate over each repo
+        list_of_repos = config['repositories']
+        for r in list_of_repos:
+
+            # Start by collecting all the things
+            remote_ids = set()
+            full_items = {}
+
+            if '/' not in r:
+                err = "Error: specify org/reponame or user/reponame in list of repos"
+                raise Exception(err)
+
+            this_org, this_repo = re.split('/',r)
+            org = g.get_organization(this_org)
+            repo = org.get_repo(this_repo)
+
+            # ---------
+            # begin markdown-specific code
+
+            # Get head commit
+            commits = repo.get_commits()
+            last = commits[0]
+            sha = last.sha
+
+            # Get all the docs
+            tree = repo.get_git_tree(sha=sha, recursive=True)
+            docs = tree.raw_data['tree']
+            for d in docs:
+
+                # For each doc, get the file extension
+                # If it matches EXT, download the file
+                fpath = d['path']
+                _, fname = os.path.split(fpath)
+                _, fext = os.path.splitext(fpath)
+
+                if fext==EXT:
+                    key = d['sha']
+                    d['org'] = this_org
+                    d['repo'] = this_repo
+                    value = d
+
+                    # Stash the doc for later
+                    remote_ids.add(key)
+                    full_items[key] = value
+
+        writer = self.ix.writer()
+        count = 0
+
+        # Drop any id in indexed_ids
+        # not in remote_ids
+        drop_ids = indexed_ids - remote_ids
+        for drop_id in drop_ids:
+            writer.delete_by_term('id',drop_id)
+
+        # Update any id in indexed_ids
+        # and in remote_ids
+        update_ids = indexed_ids & remote_ids
+        for update_id in update_ids:
+            # cop out
+            writer.delete_by_term('id',update_id)
+            item = full_items[update_id]
+            self.add_markdown(writer, item, config, update=True)
+            count += 1
+
+        # Add any issue not in indexed_ids
+        # and in remote_ids
+        add_ids = remote_ids - indexed_ids
+        for add_id in add_ids:
+            item = full_items[add_id]
+            self.add_markdown(writer, item, config, update=False)
+            count += 1
+
+        writer.commit()
+        print("Done, updated %d markdown documents in the index" % count)
     # ---------------------------------
     # Search results bundler
 
@@ -580,21 +821,18 @@
             highlights = self.html_parser.unescape(highlights)
             html = self.markdown(highlights)
+            html = re.sub(r'\n','<br />',html)
             sr.content_highlight = html
             search_results.append(sr)
 
         return search_results
 
-    # ------------------
-    # github issues
-    # create search results
-
     def search(self, query_list, fields=None):
         with self.ix.searcher() as searcher:
             query_string = " ".join(query_list)
             query = None
@@ -628,13 +866,13 @@
         kind_labels = {
                 "documents" : "gdoc",
+                "markdown" : "markdown",
                 "issues" : "issue",
-                "comments" : "comment"
         }
         counts = {
                 "documents" : None,
+                "markdown" : None,
                 "issues" : None,
-                "comments" : None,
                 "total" : None
         }
         for key in kind_labels:
@@ -644,7 +882,9 @@
                 results = s.search(q,limit=None)
                 counts[key] = len(results)
 
-        counts['total'] = self.ix.searcher().doc_count_all()
+        ## These two should NOT be different, but they are...
+        #counts['total'] = self.ix.searcher().doc_count_all()
+        counts['total'] = counts['documents'] + counts['markdown'] + counts['issues']
 
         return counts
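
Editor's note: the same drop/update/add reconciliation appears three times in this file (Google Drive files, issues, markdown). A condensed sketch of the set algebra, with a hypothetical `add_item()` callback standing in for `add_drive_file()`/`add_issue()`/`add_markdown()`:

```python
def sync_index(writer, indexed_ids, remote_items, add_item):
    """
    Reconcile the search index against a remote collection.

    indexed_ids  -- set of ids already in the whoosh index
    remote_items -- dict mapping remote id -> item payload
    add_item     -- callback that writes one item into the index
    """
    remote_ids = set(remote_items)

    # Drop anything indexed that no longer exists remotely.
    for drop_id in indexed_ids - remote_ids:
        writer.delete_by_term('id', drop_id)

    # Update anything on both sides: delete the old record, then re-add.
    for update_id in indexed_ids & remote_ids:
        writer.delete_by_term('id', update_id)
        add_item(writer, remote_items[update_id], update=True)

    # Add anything remote that was never indexed.
    for add_id in remote_ids - indexed_ids:
        add_item(writer, remote_items[add_id], update=False)

    writer.commit()
```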

config_flask.py

@@ -1,9 +1,19 @@
 # Location of index file
 INDEX_DIR = "search_index"
 
+# oauth client deets
+GITHUB_OAUTH_CLIENT_ID = "63f8d49c651840cbe31e"
+GITHUB_OAUTH_CLIENT_SECRET = "36d9a4611f7427336d3c89ed041c45d086b793ee"
+
+# More information footer: Repository label
+FOOTER_REPO_ORG = "charlesreid1"
+FOOTER_REPO_NAME = "centillion"
+
 # Toggle to show Whoosh parsed query
 SHOW_PARSED_QUERY=True
 
+TAGLINE = "Search all the things"
+
 # Flask settings
 DEBUG = True
 SECRET_KEY = '42c5a8eda356ca9d9c3ab2d149541e6b91d843fa'

docs/centillion_components.md

@@ -0,0 +1,22 @@
# Centillion Components
Centillion keeps it simple.
There are two components:
* The `Search` object, which uses whoosh and various
APIs (Github, Google Drive) to build and manage
the search index. The `Search` object also runs all
queries against the search index. (See the
[Centillion Whoosh](centillion_whoosh.md) page
or the `centillion_search.py` file
for details.)
* Flask app, which uses Jinja templates to present the
user with a minimal web frontend that allows them
to interact with the search engine. (See the
[Centillion Flask](centillion_flask.md) page
or the `centillion.py` file
for details.)

30
docs/centillion_flask.md Normal file

@@ -0,0 +1,30 @@
# Centillion Flask
## What the flask server does
Flask is a web server framework
that allows developers to define
behavior for specific endpoints,
such as `/hello_world`, or
<http://localhost:5000/hello_world>
on a web server running locally.
## Flask server routes
- `/home`
- if not logged in, this redirects to a "log into github" landing page (not implemented yet)
- if logged in, this redirects to the search route
- `/search`
- search template
- `/main_index_update`
- update main index, all docs period
- `/control_panel`
- this is the control panel, where you can trigger
the search index to be re-made
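
Editor's note: a minimal sketch of the route-plus-login-guard pattern these routes follow, assuming flask-dance is configured as in `centillion.py`; the client id/secret literals and the `/home` stub bodies are placeholders, not the repo's exact code.

```python
from flask import Flask, redirect, url_for
from flask_dance.contrib.github import make_github_blueprint, github

app = Flask(__name__)
app.secret_key = "placeholder-secret"

# Placeholder keys; the real app loads these from config_flask.py
github_bp = make_github_blueprint(client_id="xxx", client_secret="yyy")
app.register_blueprint(github_bp, url_prefix="/login")

@app.route('/search')
def search():
    return "search page stub"

@app.route('/home')
def home():
    # Not logged in: send the user to the Github login landing page
    if not github.authorized:
        return redirect(url_for("github.login"))
    # Logged in: redirect to the search route
    return redirect(url_for("search"))
```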

34
docs/centillion_whoosh.md Normal file

@@ -0,0 +1,34 @@
# Centillion Whoosh
The `centillion_search.py` file defines a
`Search` class that serves as the backend
for centillion.
## What the Search class does
The `Search` class has two roles:
- create (and update) the search index
- this also requires the `Search` class
to define the schema for storing documents
- run queries against the search index,
and package results up for Flask and Jinja
## Search class functions
The `Search` class defines several functions:
- `open_index()` creates the schema
- `add_issue()`, `add_md()`, `add_document()` have three diff method sigs and add diff types
of documents to the search index
- `update_all_issues()` or `update_all_md()` or `update_all_documents()` iterates over items
and determines whether each item needs to be updated in the search index
- `update_main_index()` - update the entire search index
- calls all three update_all methods
- `create_search_results()` - package things up for jinja
- `search()` - run the query, pass results to the jinja-packager
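
Editor's note: for orientation, a minimal self-contained whoosh example of the two roles described above (build an index, then query it). This is generic whoosh usage, not centillion's actual schema.

```python
import os
from whoosh.fields import Schema, ID, TEXT
from whoosh.index import create_in
from whoosh.qparser import MultifieldParser

# Define a tiny schema and create an index on disk
schema = Schema(id=ID(stored=True, unique=True), content=TEXT(stored=True))
if not os.path.isdir("tiny_index"):
    os.mkdir("tiny_index")
ix = create_in("tiny_index", schema)

# Add a document to the index
writer = ix.writer()
writer.add_document(id=u"doc1", content=u"centillion is a search engine")
writer.commit()

# Run a query against the index and print hits
with ix.searcher() as searcher:
    query = MultifieldParser(["content"], schema=ix.schema).parse("search")
    for hit in searcher.search(query):
        print(hit["id"], hit.score)
```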

View File

@@ -1,30 +1,31 @@
-# The Centillion
+# Centillion
 
-**the centillion**: a pan-github-markdown-issues-google-docs search engine.
+**centillion**: a pan-github-markdown-issues-google-docs search engine.
 
 **a centillion**: a very large number consisting of a 1 with 303 zeros after it.
 
-the centillion is 3.03 log-times better than the googol.
+centillion is 3.03 log-times better than the googol.
 
-## what is it
+## What is centillion
 
-The centillion is a search engine built using [whoosh](https://whoosh.readthedocs.io/en/latest/intro.html),
+Centillion is a search engine built using [whoosh](https://whoosh.readthedocs.io/en/latest/intro.html),
 a Python library for building search engines.
 
-We define the types of documents the centillion should index,
-what info and how. The centillion then builds and
+We define the types of documents centillion should index,
+what info and how. Centillion then builds and
 updates a search index. That's all done in `centillion_search.py`.
 
-The centillion also provides a simple web frontend for running
+Centillion also provides a simple web frontend for running
 queries against the search index. That's done using a Flask server
 defined in `centillion.py`.
 
-The centillion keeps it simple.
+Centillion keeps it simple.
 
-## quickstart
+## Quickstart
 
-Run the centillion app with a github access token API key set via
+Run centillion with a github access token API key set via
 environment variable:
 
 ```
@@ -34,21 +35,50 @@ GITHUB_TOKEN="XXXXXXXX" python centillion.py
 This will start a Flask server, and you can view the minimal search engine
 interface in your browser at <http://localhost:5000>.
 
-## work that is done
+## Configuration
 
-See [standalone.md](standalone.md) for the summary of
-the three standalone whoosh servers that were built:
-one for a folder of markdown files, one for github issues
-and comments, and one for google drive documents.
+### Centillion configuration
 
-## work that is being done
+`config_centillion.json` defines configuration variables
+for centillion - namely, what to index, and how, and where.
 
-See [workinprogress.md](workinprogress.md) for details about
-work in progress.
+### Flask configuration
 
-## work that is planned
+`config_flask.py` defines configuration variables
+used by flask, which controls the web frontend
+for centillion.
 
-See [plans.md](plans.md)
+## Control Panel/Rebuilding Search Index
+
+To rebuild the search engine, visit the control panel route (`/control_panel`),
+for example at <http://localhost:5000/control_panel>.
+
+This allows you to rebuild the search engine index. The search index
+is stored in the `search_index/` directory, and that directory
+can be configured with centillion's configuration file.
+
+The diff search index is faster to build, as it only
+indexes documents that have been added since the last
+new document was added to the search index.
+
+The main search index is slower to build, as it will
+re-index everything.
+
+(Cron scripts? Threaded task that runs hourly?)
+
+## Details
+
+More on the details of how centillion works.
+
+Under the hood, centillion uses flask and whoosh.
+Flask builds and runs the web server.
+Whoosh handles search requests and management
+of the search index.
+
+[Centillion Components](centillion_components.md)
+
+[Centillion Flask](centillion_flask.md)
+
+[Centillion Whoosh](centillion_whoosh.md)
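
Editor's note: one possible answer to the cron question above, sketched with `requests` against the `/update_index` route shown elsewhere in this changeset. With the Github OAuth gate in place the request would need an authenticated session, so this loop (and the host/port) is illustrative only; a crontab entry hitting the same URL would do the same job.

```python
import time
import requests

UPDATE_URL = "http://localhost:5000/update_index"  # assumed host/port

while True:
    try:
        resp = requests.get(UPDATE_URL)
        print("Triggered re-index, status %d" % resp.status_code)
    except requests.ConnectionError as err:
        print("Could not reach centillion: %s" % err)
    time.sleep(24 * 60 * 60)  # once a day
```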

19
install_pandoc.sh Executable file

@@ -0,0 +1,19 @@
#!/bin/bash
#
# for ubuntu
if [ "$(id -u)" != "0" ]; then
echo ""
echo ""
echo "This script should be run as root."
echo ""
echo ""
exit 1;
fi
OFILE="/tmp/pandoc.deb"
curl -L https://github.com/jgm/pandoc/releases/download/2.2.2.1/pandoc-2.2.2.1-1-amd64.deb -o ${OFILE}
dpkg -i ${OFILE}
rm -f ${OFILE}

requirements.txt

@@ -9,3 +9,4 @@ PyGithub>=1.39
 pypandoc>=1.4
 requests>=2.19
 pandoc>=1.0
+flask-dance>=1.0.0

7
static/bootstrap.min.js vendored Normal file

File diff suppressed because one or more lines are too long

2
static/jquery.min.js vendored Normal file

File diff suppressed because one or more lines are too long

static/style.css

@@ -1,17 +1,38 @@
+span.badge {
+    vertical-align: text-bottom;
+}
+
-li.search-group-item {
-    position: relative;
-    display: block;
-    padding: 0px;
-    margin-bottom: -1px;
-    background-color: #fff;
-    border: 1px solid #ddd;
+a.badgelinks, a.badgelinks:hover {
+    color: #fff;
+    text-decoration: none;
 }
 
 div.list-group {
     border: 1px solid rgba(86,61,124,.2);
 }
 
+li.list-group-item {
+    position: relative;
+    display: block;
+    /*padding: 20px 10px;*/
+    margin-bottom: -1px;
+    background-color: #f8f8f8;
+    border: 1px solid #ddd;
+}
+
+li.search-group-item {
+    position: relative;
+    display: block;
+    padding: 0px;
+    margin-bottom: -1px;
+    background-color: #fff;
+    border: 1px solid #ddd;
+}
+
 div.url {
     background-color: rgba(86,61,124,.15);
     padding: 8px;

108
templates/controlpanel.html Executable file

@@ -0,0 +1,108 @@
{% extends "layout.html" %}
{% block body %}
{% with messages = get_flashed_messages() %}
{% if messages %}
<div class="container">
<div class="alert alert-success alert-dismissible">
<a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>
<ul class=flashes>
{% for message in messages %}
<li>{{ message }}</li>
{% endfor %}
</ul>
</div>
</div>
{% endif %}
{% endwith %}
<div class="container">
<div class="row">
<div class="col-md-12">
<center>
<a href="{{ url_for('search')}}?query=&fields=">
<img src="{{ url_for('static', filename='centillion_white.png') }}">
</a>
{% if config['TAGLINE'] %}
<h2><a href="{{ url_for('search')}}?query=&fields=">
{{config['TAGLINE']}}
</a></h2>
{% endif %}
</center>
</div>
</div>
{% if config['zzzTAGLINE'] %}
<div class="row">
<div class="col12sm">
<center>
<h2><a href="{{ url_for('search')}}?query=&fields=">
{{config['TAGLINE']}}
</a></h2>
</center>
</div>
</div>
{% endif %}
</div>
<hr />
<div class="container">
<div class="row">
{# update main search index #}
<div class="panel panel-danger">
<div class="panel-heading">
<h3 class="panel-title">
Update Main Search Index
</h3>
</div>
<div class="panel-body">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<p class="panel-text">Re-index <i>every</i> document in the
remote collection in the search index. <b>Warning: this operation may take a while.</b>
<p/> <p>
<a href="{{ url_for('update_index') }}" class="btn btn-large btn-danger">Update Main Index</a>
<p/>
</div>
</div>
</div>
</div>
</div>
{# update diff search index #}
<div class="panel panel-danger">
<div class="panel-heading">
<h3 class="panel-title">
Update Diff Search Index
</h3>
</div>
<div class="panel-body">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<p class="panel-text">Diff search index only re-indexes documents created after the last
search index update. <b>Not currently implemented.</b>
<p/> <p>
<a href="#" class="btn btn-large disabled btn-danger">Update Diff Index</a>
<p/>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
{% endblock %}

templates/layout.html

@@ -3,9 +3,10 @@
 <link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='style.css') }}">
 <link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='github-markdown.css') }}">
 <link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='bootstrap.min.css') }}">
+<script src="{{ url_for('static', filename='jquery.min.js') }}"></script>
+<script src="{{ url_for('static', filename='bootstrap.min.js') }}"></script>
 <div>
-    {% for message in get_flashed_messages() %}
-        <div class="flash">{{ message }}</div>
-    {% endfor %}
     {% block body %}{% endblock %}
 </div>

templates/search.html

@@ -4,34 +4,33 @@
 <div class="container">
 
+    {#
+    banner image
+    #}
     <div class="row">
         <div class="col12sm">
             <center>
                 <a href="{{ url_for('search')}}?query=&fields=">
                     <img src="{{ url_for('static', filename='centillion_white.png') }}">
                 </a>
-                {#
-                need a tag line
-                #}
+                {% if config['TAGLINE'] %}
+                <h2><a href="{{ url_for('search')}}?query=&fields=">
+                    {{config['TAGLINE']}}
+                </a></h2>
+                {% endif %}
             </center>
         </div>
     </div>
+</div>
 
+<div class="container">
     <div class="row">
-        <div class="col12sm">
-            <center>
-                <h2>
-                <a href="{{ url_for('search')}}?query=&fields=">
-                    Search the DCPPC
-                </a>
-                </h2>
-            </center>
-        </div>
-    </div>
-
-    <div class="row">
-        <div class="col-12">
+        <div class="col-xs-12">
             <center>
-                <a class="index" href="{{ url_for('update_index')}}">[update index]</a>
-                <a class="index" href="{{ url_for('update_index')}}?rebuild=True">[rebuild index]</a>
                 <form action="{{ url_for('search') }}" name="search">
                     <input type="text" name="query" value="{{ query }}"> <br />
                     <button type="submit" style="font-size: 20px; padding: 10px; padding-left: 50px; padding-right: 50px;"
@@ -48,8 +47,8 @@
     <div class="row">
         {% if directories %}
-        <div class="col-12 info directories-cloud">
-            File directories:&nbsp
+        <div class="col-xs-12 info directories-cloud">
+            <b>File directories:</b>
             {% for d in directories %}
             <a href="{{url_for('search')}}?query={{d|trim}}&fields=filename">{{d|trim}}</a>
             {% endfor %}
@@ -60,25 +59,38 @@
     {% if config['SHOW_PARSED_QUERY'] and parsed_query %}
     <li class="list-group-item">
-        <div class="col-12 info">
-            <b>Parsed query:</b> {{ parsed_query }}
+        <div class="container-fluid">
+            <div class="row">
+                <div class="col-xs-12 info">
+                    <b>Parsed query:</b> {{ parsed_query }}
+                </div>
+            </div>
         </div>
     </li>
     {% endif %}
 
     {% if parsed_query %}
     <li class="list-group-item">
-        <div class="col-12 info">
-            <b>Found:</b> {{entries|length}} documents with results, out of {{totals["total"]}} total documents
+        <div class="container-fluid">
+            <div class="row">
+                <div class="col-xs-12 info">
+                    <b>Found:</b> <span class="badge">{{entries|length}}</span> results
+                    out of <span class="badge">{{totals["total"]}}</span> total items indexed
+                </div>
+            </div>
         </div>
     </li>
     {% endif %}
 
     <li class="list-group-item">
-        <div class="col-12 info">
-            <b>Indexing:</b> {{totals["documents"]}} Google Documents,
-            {{totals["issues"]}} Github issues, and
-            {{totals["comments"]}} Github comments
+        <div class="container-fluid">
+            <div class="row">
+                <div class="col-xs-12 info">
+                    <b>Indexing:</b> <span class="badge">{{totals["documents"]}}</span> Google Documents,
+                    <span class="badge">{{totals["issues"]}}</span> Github issues,
+                    <span class="badge">{{totals["markdown"]}}</span> markdown files.
+                </div>
+            </div>
         </div>
     </li>
@@ -97,28 +109,26 @@
     {% if e.kind=="gdoc" %}
         <b>Google Drive File:</b>
         <a href='{{e.url}}'>{{e.title}}</a>
-        ({{e.owner_name}}, {{e.owner_email}})
+        (Owner: {{e.owner_name}}, {{e.owner_email}})
 
-    {% elif e.kind=="comment" %}
-        <b>Comment:</b>
-        <a href='{{e.url}}'>Comment (link)</a>
-        {% if e.github_user %}
-        by <a href='https://github.com/{{e.github_user}}'>@{{e.github_user}}</a>
-        {% endif %}
-        on issue <a href='{{e.issue_url}}'>{{e.issue_title}}</a>
-        <br/>
-        <b>Repository:</b> <a href='{{e.repo_url}}'>{{e.repo_name}}</a>
-        {% if e.github_user %}
-        {% endif %}
-
     {% elif e.kind=="issue" %}
         <b>Issue:</b>
-        <a href='{{e.issue_url}}'>{{e.issue_title}}</a>
+        <a href='{{e.url}}'>{{e.title}}</a>
         {% if e.github_user %}
-        by <a href='https://github.com/{{e.github_user}}'>@{{e.github_user}}</a>
+        opened by <a href='https://github.com/{{e.github_user}}'>@{{e.github_user}}</a>
         {% endif %}
         <br/>
        <b>Repository:</b> <a href='{{e.repo_url}}'>{{e.repo_name}}</a>
 
+    {% elif e.kind=="markdown" %}
+        <b>Markdown:</b>
+        <a href='{{e.url}}'>{{e.title}}</a>
+        <br/>
+        <b>Repository:</b> <a href='{{e.repo_url}}'>{{e.repo_name}}</a>
+
     {% else %}
         <b>Item:</b> (<a href='{{e.url}}'>link</a>)
     {% endif %}
     <br />
     score: {{'%d' % e.score}}
@@ -134,16 +144,28 @@
 <div class="container">
     <div class="row">
-        <div class="col-12">
-            <div class="last-searches">Last searches: <br/>
-            {% for s in last_searches %}
-                <span><a href="{{url_for('search')}}?{{s}}">{{s}}</a></span>
-            {% endfor %}
-            </div>
-            <p>
-            More info can be found in the <a href="https://github.com/BernhardWenzel/markdown-search">README.md file</a>
-            </p>
-        </div>
+        <ul class="list-group">
+
+        {% if config['FOOTER_REPO_NAME'] %}
+        {% if config['FOOTER_REPO_ORG'] %}
+
+            <li class="list-group-item">
+                <div class="container-fluid">
+                    <div class="row">
+                        <div class="col-xs-12 info">
+                            More information about {{config['FOOTER_REPO_NAME']}} can be found
+                            in the <a href="https://github.com/{{config['FOOTER_REPO_ORG']}}/{{config['FOOTER_REPO_NAME']}}">{{config['FOOTER_REPO_ORG']}}/{{config['FOOTER_REPO_NAME']}}</a>
+                            repository on Github.
+                        </div>
+                    </div>
+                </div>
+            </li>
+
+        {% endif %}
+        {% endif %}
+
+        </ul>
     </div>
 </div>