Merge pull request #95 from dcppc/fix-output-msg

change "documents" to "issues" in reindexing message
fix output messages for reindexing
2018-08-24 09:25:09 -07:00 · 2018-08-24 09:23:09 -07:00 · 2018-08-24 09:20:46 -07:00 · 2018-08-24 09:01:18 -07:00 · 2018-08-24 10:44:45 -05:00 · 2018-08-24 08:42:03 -07:00
32 changed files with 5952 additions and 395 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -0,0 +1,17 @@
 Thanks for using Centillion. Your feedback is important to us. 
 ### When reporting a bug, please be sure to include the following:
 - [ ] A descriptive title
 - [ ] The behavior you expect to see and the actual behavior observed
 - [ ] Steps to reproduce the behavior 
 - [ ] What browser you are using
 ### When you open an issue for a feature request, please add as much detail as possible:
 - [ ] A descriptive title
 - [ ] A description of the problem you're trying to solve, including *why* you think this is a problem
 - [ ] An overview of the suggested solution
 - [ ] If the feature changes current behavior, please explain why your solution is better
 See [our contributor guidelines](https://github.com/dcppc/centillion/blob/dcppc/CONTRIBUTING.md)
 for more details about contributing to this project.
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,12 @@
 Thanks for contributing to centillion!
 Please place an x between the brackets to indicate a yes answer
 to the questions below.
 - [ ] Is this pull request mergeable?
 - [ ] Has this been tested locally?
 - [ ] Does this pull request pass the tests?
 - [ ] Have new tests been added to cover any new code?
 - [ ] Was a spellchecker run on the source code and documentation after
  changes were made?
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,4 @@
-config_centillion.py
+feedback_database.json
 config_flask.py
 vp
 credentials.json
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,43 @@
 # Code of Conduct
 ## DCPPC Code of Conduct
 All members of the Commons are expected to agree with the following code
 of conduct. We will enforce this code as needed. We expect cooperation
 from all members to help ensure a safe environment for everybody.
 ## The Quick Version
 The Consortium is dedicated to providing a harassment-free experience
 for everyone, regardless of gender, gender identity and expression, age,
 sexual orientation, disability, physical appearance, body size, race, or
 religion (or lack thereof). We do not tolerate harassment of Consortium
 members in any form. Sexual language and imagery is generally not
 appropriate for any venue, including meetings, presentations, or
 discussions.
 ## The Less Quick Version
 Harassment includes offensive verbal comments related to gender, gender
 identity and expression, age, sexual orientation, disability, physical
 appearance, body size, race, religion, sexual images in public spaces,
 deliberate intimidation, stalking, following, harassing photography or
 recording, sustained disruption of talks or other events, inappropriate
 physical contact, and unwelcome sexual attention.
 Members asked to stop any harassing behavior are expected to comply
 immediately.
 If you are being harassed, notice that someone else is being harassed,
 or have any other concerns, please contact [Titus
 Brown](mailto:ctbrown@ucdavis.edu) immediately. If Titus is the cause of
 your concern, please contact [Vivien
 Bonazzi](mailto:bonazziv@mail.nih.gov).
 We expect members to follow these guidelines at any Consortium event.
 Original source and credit: <http://2012.jsconf.us/#/about> & The Ada
 Initiative. Please help by translating or improving:
 <http://github.com/leftlogic/confcodeofconduct.com>. This work is
 licensed under a Creative Commons Attribution 3.0 Unported License
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,21 @@
 # Contributing to the DCPPC Internal Repository
 Hello, and thank you for wanting to contribute to the DCPPC Internal
 Repository!
 By contributing to this repository, you agree:
 1.  To obey the [Code of Conduct](./CODE_OF_CONDUCT.md)
 2.  To release all your contributions under the same terms as the
    license itself: the [Creative Commons Zero](./LICENSE.md) (aka
    Public Domain) license
 If you are OK with these two conditions, then we welcome both you and
 your contribution!
 If you have any questions about contributing, please [open an
 issue](https://github.com/dcppc/internal/issues/new) and Team Copper
 will lend a hand ASAP.
 Thank you for being here and for being a part of the DCPPC project.
--- a/Disqus.md
+++ b/Disqus.md
--- a/Hypothesis.md
+++ b/Hypothesis.md
@@ -0,0 +1,249 @@
 # Hypothesis API
 ## Authenticating
 Example output from an authenticated call to the API, listing the available endpoints:
 ```
 {
    "links": {
        "profile": {
            "read": {
                "url": "https://hypothes.is/api/profile",
                "method": "GET",
                "desc": "Fetch the user's profile"
            },
            "update": {
                "url": "https://hypothes.is/api/profile",
                "method": "PATCH",
                "desc": "Update a user's preferences"
            }
        },
        "search": {
            "url": "https://hypothes.is/api/search",
            "method": "GET",
            "desc": "Search for annotations"
        },
        "group": {
            "member": {
                "add": {
                    "url": "https://hypothes.is/api/groups/:pubid/members/:userid",
                    "method": "POST",
                    "desc": "Add the user in the request params to a group."
                },
                "delete": {
                    "url": "https://hypothes.is/api/groups/:pubid/members/:userid",
                    "method": "DELETE",
                    "desc": "Remove the current user from a group."
                }
            }
        },
        "links": {
            "url": "https://hypothes.is/api/links",
            "method": "GET",
            "desc": "URL templates for generating URLs for HTML pages"
        },
        "groups": {
            "read": {
                "url": "https://hypothes.is/api/groups",
                "method": "GET",
                "desc": "Fetch the user's groups"
            }
        },
        "annotation": {
            "hide": {
                "url": "https://hypothes.is/api/annotations/:id/hide",
                "method": "PUT",
                "desc": "Hide an annotation as a group moderator."
            },
            "unhide": {
                "url": "https://hypothes.is/api/annotations/:id/hide",
                "method": "DELETE",
                "desc": "Unhide an annotation as a group moderator."
            },
            "read": {
                "url": "https://hypothes.is/api/annotations/:id",
                "method": "GET",
                "desc": "Fetch an annotation"
            },
            "create": {
                "url": "https://hypothes.is/api/annotations",
                "method": "POST",
                "desc": "Create an annotation"
            },
            "update": {
                "url": "https://hypothes.is/api/annotations/:id",
                "method": "PATCH",
                "desc": "Update an annotation"
            },
            "flag": {
                "url": "https://hypothes.is/api/annotations/:id/flag",
                "method": "PUT",
                "desc": "Flag an annotation for review."
            },
            "delete": {
                "url": "https://hypothes.is/api/annotations/:id",
                "method": "DELETE",
                "desc": "Delete an annotation"
            }
        }
    }
 }
 ```
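The annotation quoted in the Listing section of this file notes that requests are authorized by putting the token in the Authorization header. A minimal sketch of building such a request with the standard library follows. It assumes the `Bearer` scheme for the header; the `build_request` helper and the `YOUR_API_TOKEN` placeholder are hypothetical and not part of this PR. The request is only constructed, not sent.

```python
import urllib.request

# Base URL taken from the endpoint listing above.
API_ROOT = "https://hypothes.is/api"


def build_request(path, token=None):
    """Build (but do not send) a GET request for a Hypothesis API path.

    `token` is a personal API token; the Bearer scheme is an assumption here.
    """
    req = urllib.request.Request(API_ROOT + path, method="GET")
    req.add_header("Accept", "application/json")
    if token:
        req.add_header("Authorization", "Bearer " + token)
    return req


# Placeholder token, for illustration only.
req = build_request("/profile", token="YOUR_API_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access.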
 ## Listing
 Here is the result of the API call to list an annotation
 given its annotation ID:
 ```
 {
    "updated": "2018-07-26T10:20:47.803636+00:00",
    "group": "__world__",
    "target": [
        {
            "source": "https://h.readthedocs.io/en/latest/api/authorization/",
            "selector": [
                {
                    "conformsTo": "https://tools.ietf.org/html/rfc3236",
                    "type": "FragmentSelector",
                    "value": "access-tokens"
                },
                {
                    "endContainer": "/div[1]/section[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/p[2]",
                    "startContainer": "/div[1]/section[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/p[1]",
                    "type": "RangeSelector",
                    "startOffset": 14,
                    "endOffset": 116
                },
                {
                    "type": "TextPositionSelector",
                    "end": 2234,
                    "start": 1374
                },
                {
                    "exact": "hich read or write data as a specific user need to be authorized\nwith an access token. Access tokens can be obtained in two ways:\n\nBy generating a personal API token on the Hypothesis developer\npage (you must be logged in to\nHypothesis to get to this page). This is the simplest method, however\nthese tokens are only suitable for enabling your application to make\nrequests as a single specific user.\n\nBy registering an \u201cOAuth client\u201d and\nimplementing the OAuth authentication flow\nin your application. This method allows any user to authorize your\napplication to read and write data via the API as that user.  The Hypothesis\nclient is an example of an application that uses OAuth.\nSee Using OAuth for details of how to implement this method.\n\n\nOnce an access token has been obtained, requests can be authorized by putting\nthe token in the Authorization header.",
                    "prefix": "\n\n\nAccess tokens\u00b6\nAPI requests w",
                    "type": "TextQuoteSelector",
                    "suffix": "\nExample request:\nGET /api HTTP/"
                }
            ]
        }
    ],
    "links": {
        "json": "https://hypothes.is/api/annotations/kEaohJC9Eeiy_UOozkpkyA",
        "html": "https://hypothes.is/a/kEaohJC9Eeiy_UOozkpkyA",
        "incontext": "https://hyp.is/kEaohJC9Eeiy_UOozkpkyA/h.readthedocs.io/en/latest/api/authorization/"
    },
    "tags": [],
    "text": "sdfsdf",
    "created": "2018-07-26T10:20:47.803636+00:00",
    "uri": "https://h.readthedocs.io/en/latest/api/authorization/",
    "flagged": false,
    "user_info": {
        "display_name": null
    },
    "user": "acct:Aravindan@hypothes.is",
    "hidden": false,
    "document": {
        "title": [
            "Authorization \u2014 h 0.0.2 documentation"
        ]
    },
    "id": "kEaohJC9Eeiy_UOozkpkyA",
    "permissions": {
        "read": [
            "group:__world__"
        ],
        "admin": [
            "acct:Aravindan@hypothes.is"
        ],
        "update": [
            "acct:Aravindan@hypothes.is"
        ],
        "delete": [
            "acct:Aravindan@hypothes.is"
        ]
    }
 }
 ```
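For indexing purposes, the most useful part of an annotation is probably the highlighted passage, which sits in the `TextQuoteSelector` entry of each target. The sketch below pulls it out of a parsed response. The `quoted_text` helper is hypothetical, and the sample dictionary is a trimmed copy of the response above (the `exact` string is cut short for brevity).

```python
def quoted_text(annotation):
    """Return the 'exact' string of every TextQuoteSelector in an annotation."""
    quotes = []
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                quotes.append(selector.get("exact", ""))
    return quotes


# Trimmed copy of the annotation shown above; most fields are omitted.
sample = {
    "id": "kEaohJC9Eeiy_UOozkpkyA",
    "text": "sdfsdf",
    "target": [
        {
            "source": "https://h.readthedocs.io/en/latest/api/authorization/",
            "selector": [
                {"type": "TextPositionSelector", "start": 1374, "end": 2234},
                {
                    "type": "TextQuoteSelector",
                    "exact": "hich read or write data as a specific user",
                },
            ],
        }
    ],
}

print(quoted_text(sample))
```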
 ## Searching
 Here is the output from a call to the endpoint to search annotations
 (we pass a specific URL to the search function):
 ```
 {
    "rows": [
        {
            "updated": "2018-08-10T02:21:46.898833+00:00",
            "group": "__world__",
            "target": [
                {
                    "source": "http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/",
                    "selector": [
                        {
                            "endContainer": "/div[1]/main[1]/div[1]/div[3]/article[1]/h2[1]",
                            "startContainer": "/div[1]/main[1]/div[1]/div[3]/article[1]/h2[1]",
                            "type": "RangeSelector",
                            "startOffset": 0,
                            "endOffset": 80
                        },
                        {
                            "type": "TextPositionSelector",
                            "end": 12328,
                            "start": 12248
                        },
                        {
                            "exact": "Deliverables are due internally on the first of each month, which here is Day 1,",
                            "prefix": "               \n                ",
                            "type": "TextQuoteSelector",
                            "suffix": "\u00b6\nDay -30 through -10\nCopper PM "
                        }
                    ]
                }
            ],
            "links": {
                "json": "https://hypothes.is/api/annotations/IY2W_pxEEeiVuxfD3sehjQ",
                "html": "https://hypothes.is/a/IY2W_pxEEeiVuxfD3sehjQ",
                "incontext": "https://hyp.is/IY2W_pxEEeiVuxfD3sehjQ/pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/"
            },
            "tags": [],
            "text": "This is a sample annotation",
            "created": "2018-08-10T02:21:46.898833+00:00",
            "uri": "http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/",
            "flagged": false,
            "user_info": {
                "display_name": null
            },
            "user": "acct:charlesreid1dib@hypothes.is",
            "hidden": false,
            "document": {
                "title": [
                    "Copper Internal Delivery Workflow - Data Commons Internal Site"
                ]
            },
            "id": "IY2W_pxEEeiVuxfD3sehjQ",
            "permissions": {
                "read": [
                    "group:__world__"
                ],
                "admin": [
                    "acct:charlesreid1dib@hypothes.is"
                ],
                "update": [
                    "acct:charlesreid1dib@hypothes.is"
                ],
                "delete": [
                    "acct:charlesreid1dib@hypothes.is"
                ]
            }
        }
    ],
    "total": 1
 }
 ```
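As a rough illustration of how a search call like the one above could be prepared and its result summarized, here is a sketch using only the standard library. The `search_url` and `summarize` helpers are hypothetical; `uri` and `limit` are assumed to be the relevant query parameters of the search endpoint. No network request is made.

```python
from urllib.parse import urlencode

# Search endpoint from the link listing in the Authenticating section.
SEARCH_URL = "https://hypothes.is/api/search"


def search_url(uri, limit=20):
    """Build the search URL for annotations on a given page URI."""
    return SEARCH_URL + "?" + urlencode({"uri": uri, "limit": limit})


def summarize(response):
    """Reduce a parsed search response to (id, user, text) tuples."""
    return [(row["id"], row["user"], row["text"])
            for row in response.get("rows", [])]


# Trimmed copy of the response above.
response = {
    "rows": [
        {
            "id": "IY2W_pxEEeiVuxfD3sehjQ",
            "user": "acct:charlesreid1dib@hypothes.is",
            "text": "This is a sample annotation",
        }
    ],
    "total": 1,
}

url = search_url("http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/")
print(url)
print(summarize(response))
```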
--- a/Readme.md
+++ b/Readme.md
@@ -12,8 +12,13 @@ one centillion is 3.03 log-times better than a googol.
 ## What Is It
 Centillion (https://github.com/dcppc/centillion) is a search engine that can index 
-three kinds of collections: Google Documents (.docx files), Github issues, and Markdown files in 
+different kinds of document collections: Google Documents (.docx files), Google Drive files,
-Github repos.
+Github issues, Github files, Github Markdown files, and Groups.io email threads.
 ## What Is It
 We define the types of documents the centillion should index,
 what info and how. The centillion then builds and
--- a/Todo.md
+++ b/Todo.md
@@ -1,47 +1,29 @@
 # todo
-Main task:
+ux improvements:
- hashing and caching
+- feedback tools
-    - <s>first, working out the logic of how we group items into sets
+- integrating master list into single list
-        - needs to be deleted
+- providing advanced search interface
        - needs to be updated
        - needs to be added
        - for docs, issues, and comments</s>
    - second, when we add or update an item, need to:
        - go through the motions, download file, extract text
        - check for existing indexed doc with that id
        - check if existing indexed doc has same hash
            - if so, skip
            - otherwise, delete and re-index
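The hash check described in the list above could look roughly like the following sketch. It is not code from this repository: a plain dictionary stands in for the search index, and `content_hash` and `plan_update` are hypothetical names. SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def content_hash(text):
    """Hash the extracted text of a document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(index, doc_id, text):
    """Decide what to do with a document and record its new hash.

    `index` maps document id -> stored hash (a stand-in for the real index).
    Returns "skip", "add", or "reindex".
    """
    new_hash = content_hash(text)
    old_hash = index.get(doc_id)
    if old_hash == new_hash:
        return "skip"  # same id, same hash: nothing to do
    action = "add" if old_hash is None else "reindex"
    index[doc_id] = new_hash
    return action


index = {}
print(plan_update(index, "doc-1", "first version"))   # new document
print(plan_update(index, "doc-1", "first version"))   # unchanged
print(plan_update(index, "doc-1", "second version"))  # changed
```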
-Other bugs:
+big picture improvements:
- Some github issues have no title (?)
+- hypothesis API
- <s>Need to combine issues with comments</s>
+- folksonomy tagging with hypothesis
- Not able to index markdown files _in a repo_
+- tags, expanded schema
 - (Longer term) update main index vs update diff index
 Needs:
 - <s>control panel</s>
 Thursday product:
 - Everything re-indexed nightly
 - Search engine built on all documents in Google Drive, all issues, markdown files
 - Using pandoc to extract Google Drive document contents
 - BRIEF quickstart documentation
 Future:
 - Future plans to improve - plugins, improving matching
 - Subdomain plans
 - Folksonomy tagging and integration plans
-config options for plugins
+feedback form: where we are at
-conditional blocks with import github inside
+- feedback button
-complicated tho - better to have components split off
+- button triggers modal form
-
+- modal has emojis for feedback, text box, buttons
-
+- clicking emojis changes color, to select
 - clicking submit with filled out form submits to an endpoint
 - clicking submit also closes form, but only if submit successful
 feedback form: what we need to do
 - fix alerts - thank you for your feedback doesn't show up until a refresh
 - probably an easy ajax fix
--- a/centillion.py
+++ b/centillion.py
@@ -3,6 +3,7 @@ import subprocess
 import codecs
 import os, json
 from datetime import datetime
 from werkzeug.contrib.fixers import ProxyFix
 from flask import Flask, request, redirect, url_for, render_template, flash, jsonify
@@ -39,6 +40,7 @@ class UpdateIndexTask(object):
                'groupsio_username' :  app_config['GROUPSIO_USERNAME'],
                'groupsio_password' :  app_config['GROUPSIO_PASSWORD']
        }
        self.disqus_token = app_config['DISQUS_TOKEN']
        thread.daemon = True
        thread.start()
@@ -53,6 +55,7 @@ class UpdateIndexTask(object):
        search.update_index(self.groupsio_credentials,
                            self.gh_token,
                            self.disqus_token,
                            self.run_which,
                            config)
@@ -262,13 +265,61 @@ def list_docs(doctype):
        all_orgs = resp.json()
        for org in all_orgs:
            if org['login']=='dcppc':
-                copper_team_id = '2700235'
+                # Business as usual
-                mresp = github.get('/teams/%s/members/%s'%(copper_team_id,username))
+                search = Search(app.config["INDEX_DIR"])
-                if mresp.status_code==204:
+                results_list = search.get_list(doctype)
-                    # Business as usual
+                for result in results_list:
-                    search = Search(app.config["INDEX_DIR"])
+                    ct = result['created_time']
-                    return jsonify(search.get_list(doctype))
+                    result['created_time'] = datetime.strftime(ct,"%Y-%m-%d %I:%M %p")
                return jsonify(results_list)
    # nope
    return render_template('403.html')
@app.route('/feedback', methods=['POST'])
 def parse_request():
    if not github.authorized:
        return redirect(url_for("github.login"))
    username = github.get("/user").json()['login']
    resp = github.get("/user/orgs")
    if resp.ok:
        all_orgs = resp.json()
        for org in all_orgs:
            if org['login']=='dcppc':
                try:
                    # Business as usual
                    data = request.form.to_dict()
                    data['github_login'] = username
                    data['timestamp'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                    feedback_database = 'feedback_database.json'
                    if not os.path.isfile(feedback_database):
                        with open(feedback_database,'w') as f:
                            json_data = [data]
                            json.dump(json_data, f, indent=4)
                    else:
                        json_data = []
                        with open(feedback_database,'r') as f:
                            json_data = json.load(f)
                        json_data.append(data)
                        with open(feedback_database,'w') as f:
                            json.dump(json_data, f, indent=4)
                    ## Should be done with Javascript
                    #flash("Thank you for your feedback!")
                    return jsonify({'status':'ok','message':'Thank you for your feedback!'})
                except Exception:
                    return jsonify({'status':'error','message':'An error was encountered while submitting your feedback. Try submitting an issue in the <a href="https://github.com/dcppc/centillion/issues/new">dcppc/centillion</a> repository.'})
    # nope
    return render_template('403.html')
@app.errorhandler(404)
@@ -297,5 +348,10 @@ def store_search(query, fields):
 if __name__ == '__main__':
    # if running local instance, set to true
    os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = 'true'
-    app.run(host="0.0.0.0",port=5000)
+    port = os.environ.get('CENTILLION_PORT','')
    if port=='':
        port = 5000
    else:
        port = int(port)
    app.run(host="0.0.0.0", port=port)
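The `/feedback` handler in this file appends each submission to `feedback_database.json` by reading the whole list, appending, and rewriting the file. The same logic can be sketched as a standalone function, shown below. The `append_feedback` name is hypothetical, and writing to a temporary file before swapping it in with `os.replace` is a design choice of this sketch, not something the PR does.

```python
import json
import os
import tempfile


def append_feedback(path, entry):
    """Append one feedback record to a JSON list stored at `path`.

    Returns the number of records after the append.
    """
    if os.path.isfile(path):
        with open(path, "r") as f:
            records = json.load(f)
    else:
        records = []
    records.append(entry)
    # Write to a sibling file first so a failed write leaves the old file intact.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(records, f, indent=4)
    os.replace(tmp_path, path)
    return len(records)


# Usage in a throwaway directory; the field values are made up.
with tempfile.TemporaryDirectory() as workdir:
    db = os.path.join(workdir, "feedback_database.json")
    first = append_feedback(db, {"github_login": "alice", "timestamp": "2018-08-24 09:00:00"})
    second = append_feedback(db, {"github_login": "bob", "timestamp": "2018-08-24 09:05:00"})
    with open(db) as f:
        stored = json.load(f)

print(first, second, [r["github_login"] for r in stored])
```

Note that concurrent submissions could still race between the read and the write; a lock or a small database would be needed to rule that out.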
--- a/centillion_search.py
+++ b/centillion_search.py
@@ -6,6 +6,8 @@ import base64
 from gdrive_util import GDrive
 from groupsio_util import GroupsIOArchivesCrawler, GroupsIOException
 from disqus_util import DisqusCrawler
 from apiclient.http import MediaIoBaseDownload
 import mistune
@@ -19,8 +21,11 @@ import codecs
 from datetime import datetime
 import dateutil.parser
 from whoosh import query
 from whoosh.qparser import MultifieldParser, QueryParser
-from whoosh.analysis import StemmingAnalyzer
+from whoosh.analysis import StemmingAnalyzer, LowercaseFilter, StopFilter
 from whoosh.qparser.dateparse import DateParserPlugin
 from whoosh import fields, index
 """
@@ -103,10 +108,21 @@ class Search:
    # ------------------------------
    # Update the entire index
-    def update_index(self, groupsio_credentials, gh_token, run_which, config):
+    def update_index(self, groupsio_credentials, gh_token, disqus_token, run_which, config):
        """
        Update the entire search index
        """
        if run_which=='all' or run_which=='disqus':
            try:
                self.update_index_disqus(disqus_token, config)
            except Exception as e:
                print("ERROR: While re-indexing: failed to update Disqus comment threads")
                print("-"*40)
                print(repr(e))
                print("-"*40)
                print("Continuing...")
                pass
        if run_which=='all' or run_which=='emailthreads':
            try:
                self.update_index_emailthreads(groupsio_credentials, config)
@@ -172,7 +188,8 @@ class Search:
            os.mkdir(index_folder)
        exists = index.exists_in(index_folder)
-        stemming_analyzer = StemmingAnalyzer()
+        #stemming_analyzer = StemmingAnalyzer()
        stemming_analyzer = StemmingAnalyzer() | LowercaseFilter() | StopFilter()
        # ------------------------------
@@ -180,30 +197,38 @@ class Search:
        # is defined.
        schema = Schema(
-                id = ID(stored=True, unique=True),
+                id = fields.ID(stored=True, unique=True),
-                kind = ID(stored=True),
+                kind = fields.ID(stored=True),
-                created_time = ID(stored=True),
+                created_time = fields.DATETIME(stored=True),
-                modified_time = ID(stored=True),
+                modified_time = fields.DATETIME(stored=True),
-                indexed_time = ID(stored=True),
+                indexed_time = fields.DATETIME(stored=True),
-                title = TEXT(stored=True, field_boost=100.0),
+                title = fields.TEXT(stored=True, field_boost=100.0),
                url = ID(stored=True, unique=True),
-                mimetype=ID(stored=True),
+                url = fields.ID(stored=True),
                owner_email=ID(stored=True),
                owner_name=TEXT(stored=True),
-                repo_name=TEXT(stored=True),
+                mimetype = fields.TEXT(stored=True),
                repo_url=ID(stored=True),
-                github_user=TEXT(stored=True),
+                owner_email = fields.ID(stored=True),
                owner_name = fields.TEXT(stored=True),
                # mainly for email threads, groups.io, hypothesis
                group = fields.ID(stored=True),
                repo_name = fields.TEXT(stored=True),
                repo_url = fields.ID(stored=True),
                github_user = fields.TEXT(stored=True),
                tags = fields.KEYWORD(commas=True,
                                      stored=True,
                                      lowercase=True),
                # comments only
-                issue_title=TEXT(stored=True, field_boost=100.0),
+                issue_title = fields.TEXT(stored=True, field_boost=100.0),
-                issue_url=ID(stored=True),
+                issue_url = fields.ID(stored=True),
-                content=TEXT(stored=True, analyzer=stemming_analyzer)
+                content = fields.TEXT(stored=True, analyzer=stemming_analyzer)
        )
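With `created_time`, `modified_time`, and `indexed_time` now declared as `DATETIME` fields, the indexing code has to pass `datetime` objects instead of strings. The PR uses `dateutil.parser.parse` for this. The sketch below shows the same conversion with only the standard library, assuming timestamps in the UTC form `2018-08-24T09:25:09.000Z`; `parse_utc_timestamp` is a hypothetical helper, not part of the codebase.

```python
from datetime import datetime


def parse_utc_timestamp(ts):
    """Parse a UTC timestamp string into a naive datetime.

    Accepts values with or without fractional seconds; raises ValueError
    for anything else.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(ts, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: %r" % ts)


created = parse_utc_timestamp("2018-08-24T09:25:09.000Z")
indexed = datetime.now().replace(microsecond=0)
print(created.isoformat(), indexed >= created)
```

Raising `ValueError` on bad input fits the `try`/`except ValueError` blocks this PR wraps around `writer.add_document`, provided the parsing happens inside the `try`.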
@@ -243,24 +268,32 @@ class Search:
            writer.delete_by_term('id',item['id'])
            # Index a plain google drive file
-            writer.add_document(
+            created_time = dateutil.parser.parse(item['createdTime'])
-                    id = item['id'],
+            modified_time = dateutil.parser.parse(item['modifiedTime'])
-                    kind = 'gdoc',
+            indexed_time = datetime.now().replace(microsecond=0)
-                    created_time = item['createdTime'],
+            try:
-                    modified_time = item['modifiedTime'],
+                writer.add_document(
-                    indexed_time = datetime.now().replace(microsecond=0).isoformat(),
+                        id = item['id'],
-                    title = item['name'],
+                        kind = 'gdoc',
-                    url = item['webViewLink'],
+                        created_time = created_time,
-                    mimetype = mimetype,
+                        modified_time = modified_time,
-                    owner_email = item['owners'][0]['emailAddress'],
+                        indexed_time = indexed_time,
-                    owner_name = item['owners'][0]['displayName'],
+                        title = item['name'],
-                    repo_name='',
+                        url = item['webViewLink'],
-                    repo_url='',
+                        mimetype = mimetype,
-                    github_user='',
+                        owner_email = item['owners'][0]['emailAddress'],
-                    issue_title='',
+                        owner_name = item['owners'][0]['displayName'],
-                    issue_url='',
+                        group='',
-                    content = content
+                        repo_name='',
-            )
+                        repo_url='',
                        github_user='',
                        issue_title='',
                        issue_url='',
                        content = content
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
        else:
@@ -314,7 +347,7 @@ class Search:
                )
                assert output == ""
            except RuntimeError:
-                print(" > XXXXXX Failed to index document \"%s\""%(item['name']))
+                print(" > XXXXXX Failed to index Google Drive document \"%s\""%(item['name']))
            # If export was successful, read contents of markdown
@@ -342,24 +375,33 @@ class Search:
            else:
                print(" > Creating a new record")
-            writer.add_document(
+            try:
-                    id = item['id'],
+                created_time = dateutil.parser.parse(item['createdTime'])
-                    kind = 'gdoc',
+                modified_time = dateutil.parser.parse(item['modifiedTime'])
-                    created_time = item['createdTime'],
+                indexed_time = datetime.now()
-                    modified_time = item['modifiedTime'],
+                writer.add_document(
-                    indexed_time = datetime.now().replace(microsecond=0).isoformat(),
+                        id = item['id'],
-                    title = item['name'],
+                        kind = 'gdoc',
-                    url = item['webViewLink'],
+                        created_time = created_time,
-                    mimetype = mimetype,
+                        modified_time = modified_time,
-                    owner_email = item['owners'][0]['emailAddress'],
+                        indexed_time = indexed_time,
-                    owner_name = item['owners'][0]['displayName'],
+                        title = item['name'],
-                    repo_name='',
+                        url = item['webViewLink'],
-                    repo_url='',
+                        mimetype = mimetype,
-                    github_user='',
+                        owner_email = item['owners'][0]['emailAddress'],
-                    issue_title='',
+                        owner_name = item['owners'][0]['displayName'],
-                    issue_url='',
+                        group='',
-                    content = content
+                        repo_name='',
-            )
+                        repo_url='',
                        github_user='',
                        issue_title='',
                        issue_url='',
                        content = content
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
@@ -393,31 +435,36 @@ class Search:
                issue_comment_content += comment.body.rstrip()
                issue_comment_content += "\n"
-        # Now create the actual search index record
+        # Now create the actual search index record.
        created_time = clean_timestamp(issue.created_at)
        modified_time = clean_timestamp(issue.updated_at)
        indexed_time = clean_timestamp(datetime.now())
        # Add one document per issue thread,
        # containing entire text of thread.
-        writer.add_document(
+
-                id = issue.html_url,
+        created_time = issue.created_at
-                kind = 'issue',
+        modified_time = issue.updated_at
-                created_time = created_time,
+        indexed_time = datetime.now()
-                modified_time = modified_time,
+        try:
-                indexed_time = indexed_time,
+            writer.add_document(
-                title = issue.title,
+                    id = issue.html_url,
-                url = issue.html_url,
+                    kind = 'issue',
-                mimetype='',
+                    created_time = created_time,
-                owner_email='',
+                    modified_time = modified_time,
-                owner_name='',
+                    indexed_time = indexed_time,
-                repo_name = repo_name,
+                    title = issue.title,
-                repo_url = repo_url,
+                    url = issue.html_url,
-                github_user = issue.user.login,
+                    mimetype='',
-                issue_title = issue.title,
+                    owner_email='',
-                issue_url = issue.html_url,
+                    owner_name='',
-                content = issue_comment_content
+                    group='',
-        )
+                    repo_name = repo_name,
                    repo_url = repo_url,
                    github_user = issue.user.login,
                    issue_title = issue.title,
                    issue_url = issue.html_url,
                    content = issue_comment_content
            )
        except ValueError as e:
            print(repr(e))
            print(" > XXXXXX Failed to index Github issue \"%s\""%(issue.title))
@@ -447,7 +494,8 @@ class Search:
            print(" > XXXXXXXX Failed to find file info.")
            return
-        indexed_time = clean_timestamp(datetime.now())
+
        indexed_time = datetime.now()
        if fext in MARKDOWN_EXTS:
            print("Indexing markdown doc %s from repo %s"%(fname,repo_name))
@@ -476,24 +524,31 @@ class Search:
            usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath)
            # Now create the actual search index record
-            writer.add_document(
+            try:
-                    id = fsha,
+                writer.add_document(
-                    kind = 'markdown',
+                        id = fsha,
-                    created_time = '',
+                        kind = 'markdown',
-                    modified_time = '',
+                        created_time = None,
-                    indexed_time = indexed_time,
+                        modified_time = None,
-                    title = fname,
+                        indexed_time = indexed_time,
-                    url = usable_url,
+                        title = fname,
-                    mimetype='',
+                        url = usable_url,
-                    owner_email='',
+                        mimetype='',
-                    owner_name='',
+                        owner_email='',
-                    repo_name = repo_name,
+                        owner_name='',
-                    repo_url = repo_url,
+                        group='',
-                    github_user = '',
+                        repo_name = repo_name,
-                    issue_title = '',
+                        repo_url = repo_url,
-                    issue_url = '',
+                        github_user = '',
-                    content = content
+                        issue_title = '',
-            )
+                        issue_url = '',
                        content = content
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Github markdown file \"%s\""%(fname))
        else:
            print("Indexing github file %s from repo %s"%(fname,repo_name))
@@ -501,24 +556,29 @@ class Search:
            key = fname+"_"+fsha
            # Now create the actual search index record
-            writer.add_document(
+            try:
-                    id = key,
+                writer.add_document(
-                    kind = 'ghfile',
+                        id = key,
-                    created_time = '',
+                        kind = 'ghfile',
-                    modified_time = '',
+                        created_time = None,
-                    indexed_time = indexed_time,
+                        modified_time = None,
-                    title = fname,
+                        indexed_time = indexed_time,
-                    url = repo_url,
+                        title = fname,
-                    mimetype='',
+                        url = repo_url,
-                    owner_email='',
+                        mimetype='',
-                    owner_name='',
+                        owner_email='',
-                    repo_name = repo_name,
+                        owner_name='',
-                    repo_url = repo_url,
+                        group='',
-                    github_user = '',
+                        repo_name = repo_name,
-                    issue_title = '',
+                        repo_url = repo_url,
-                    issue_url = '',
+                        github_user = '',
-                    content = ''
+                        issue_title = '',
-            )
+                        issue_url = '',
                        content = ''
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Github file \"%s\""%(fname))
@@ -529,30 +589,84 @@ class Search:
    def add_emailthread(self, writer, d, config, update=True):
        """
-        Use a Github file API record to add a filename
+        Use a Groups.io email thread record to add 
-        to the search index.
+        an email thread to the search index.
        """
-        indexed_time = clean_timestamp(datetime.now())
+        if 'created_time' in d.keys() and d['created_time'] is not None:
            created_time = d['created_time']
        else:
            created_time = None
        if 'modified_time' in d.keys() and d['modified_time'] is not None:
            modified_time = d['modified_time']
        else:
            modified_time = None
        indexed_time = datetime.now()
        # Now create the actual search index record
-        writer.add_document(
+        try:
-                id = d['permalink'],
+            writer.add_document(
-                kind = 'emailthread',
+                    id = d['permalink'],
-                created_time = '',
+                    kind = 'emailthread',
-                modified_time = '',
+                    created_time = created_time,
-                indexed_time = indexed_time,
+                    modified_time = modified_time,
-                title = d['subject'],
+                    indexed_time = indexed_time,
-                url = d['permalink'],
+                    title = d['subject'],
-                mimetype='',
+                    url = d['permalink'],
-                owner_email='',
+                    mimetype='',
-                owner_name=d['original_sender'],
+                    owner_email='',
-                repo_name = '',
+                    owner_name=d['original_sender'],
-                repo_url = '',
+                    group=d['subgroup'],
-                github_user = '',
+                    repo_name = '',
-                issue_title = '',
+                    repo_url = '',
-                issue_url = '',
+                    github_user = '',
-                content = d['content']
+                    issue_title = '',
-        )
+                    issue_url = '',
                    content = d['content']
            )
        except ValueError as e:
            print(repr(e))
            print(" > XXXXXX Failed to index Groups.io thread \"%s\""%(d['subject']))
    # ------------------------------
    # Add a single disqus comment thread
    # to the search index.
    def add_disqusthread(self, writer, d, config, update=True):
        """
        Use a disqus comment thread record
        to add a disqus comment thread to the
        search index.
        """
        indexed_time = datetime.now()
        # created_time is already a timestamp
        # Now create the actual search index record
        try:
            writer.add_document(
                    id = d['id'],
                    kind = 'disqus',
                    created_time = d['created_time'],
                    modified_time = None,
                    indexed_time = indexed_time,
                    title = d['title'],
                    url = d['link'],
                    mimetype='',
                    owner_email='',
                    owner_name='',
                    repo_name = '',
                    repo_url = '',
                    github_user = '',
                    issue_title = '',
                    issue_url = '',
                    content = d['content']
            )
        except ValueError as e:
            print(repr(e))
            print(" > XXXXXX Failed to index Disqus comment thread \"%s\""%(d['title']))
@@ -580,9 +694,8 @@ class Search:
        # Updated algorithm:
        # - get set of indexed ids
        # - get set of remote ids
-        # - drop indexed ids not in remote ids
+        # - drop all indexed ids
        # - index all remote ids
        # - add hash check in add_
        # Get the set of indexed ids:
@@ -632,7 +745,7 @@ class Search:
            ## Shorter:
            #break
-            # Longer:
+            ## Longer:
            if nextPageToken is None:
                break
@@ -642,40 +755,47 @@ class Search:
        temp_dir = tempfile.mkdtemp(dir=os.getcwd())
        print("Temporary directory: %s"%(temp_dir))
        try:
            # Drop any id in indexed_ids
            # not in remote_ids
            drop_ids = indexed_ids - remote_ids
            for drop_id in drop_ids:
                writer.delete_by_term('id',drop_id)
-        # Drop any id in indexed_ids
+            # Update any id in indexed_ids
-        # not in remote_ids
+            # and in remote_ids
-        drop_ids = indexed_ids - remote_ids
+            update_ids = indexed_ids & remote_ids
-        for drop_id in drop_ids:
+            for update_id in update_ids:
-            writer.delete_by_term('id',drop_id)
+                # cop out
                writer.delete_by_term('id',update_id)
                item = full_items[update_id]
                self.add_drive_file(writer, item, temp_dir, config, update=True)
                count += 1
-        # Update any id in indexed_ids
+            # Add any id not in indexed_ids
-        # and in remote_ids
+            # and in remote_ids
-        update_ids = indexed_ids & remote_ids
+            add_ids = remote_ids - indexed_ids
-        for update_id in update_ids:
+            for add_id in add_ids:
-            # cop out
+                item = full_items[add_id]
-            writer.delete_by_term('id',update_id)
+                self.add_drive_file(writer, item, temp_dir, config, update=False)
-            item = full_items[update_id]
+                count += 1
            self.add_drive_file(writer, item, temp_dir, config, update=True)
            count += 1
        # Add any id not in indexed_ids
        # and in remote_ids
        add_ids = remote_ids - indexed_ids
        for add_id in add_ids:
            item = full_items[add_id]
            self.add_drive_file(writer, item, temp_dir, config, update=False)
            count += 1
        except Exception as e:
            print("ERROR: While adding Google Drive files to search index")
            print("-"*40)
            print(repr(e))
            print("-"*40)
            print("Continuing...")
            pass
        print("Cleaning temporary directory: %s"%(temp_dir))
        subprocess.call(['rm','-fr',temp_dir])
        writer.commit()
-        print("Done, updated %d documents in the index" % count)
+        print("Done, updated %d Google Drive files in the index" % count)
    # ------------------------------
@@ -686,12 +806,6 @@ class Search:
        Update the search index using a collection of 
        Github repo issues and comments.
        """
        # Updated algorithm:
        # - get set of indexed ids
        # - get set of remote ids
        # - drop indexed ids not in remote ids
        # - index all remote ids
        # Get the set of indexed ids:
        # ------
        indexed_issues = set()
@@ -759,7 +873,7 @@ class Search:
        writer.commit()
-        print("Done, updated %d documents in the index" % count)
+        print("Done, updated %d Github issues in the index" % count)
@@ -772,12 +886,6 @@ class Search:
        files (and, separately, Markdown files) from 
        a Github repo.
        """
        # Updated algorithm:
        # - get set of indexed ids
        # - get set of remote ids
        # - drop indexed ids not in remote ids
        # - index all remote ids
        # Get the set of indexed ids:
        # ------
        indexed_ids = set()
@@ -896,12 +1004,6 @@ class Search:
        RELEASE THE SPIDER!!!
        """
        # Algorithm:
        # - get set of indexed ids
        # - get set of remote ids
        # - drop indexed ids not in remote ids
        # - index all remote ids
        # Get the set of indexed ids:
        # ------
        indexed_ids = set()
@@ -919,16 +1021,17 @@ class Search:
        # ask spider to crawl the archives
        spider.crawl_group_archives()
-        # now spider.archives is a list of dictionaries
+        # now spider.archives is a dictionary
-        # that each represent a thread:
+        # with one key per thread ID,
-        #   thread = {
+        # and a value set to the payload:
-        #           'permalink' : permalink,
+        #   '<thread-id>'  : {
-        #           'subject' : subject,
+        #                       'permalink' : permalink,
-        #           'original_sender' : original_sender,
+        #                       'subject' : subject,
-        #           'content' : full_content
+        #                       'original_sender' : original_sender,
-        #   }
+        #                       'content' : full_content
        #                    }
        #
-        # It is hard to reliablly extract more information
+        # It is hard to reliably extract more information
        # than that from the email thread.
        writer = self.ix.writer()
@@ -958,6 +1061,75 @@ class Search:
        print("Done, updated %d Groups.io email threads in the index" % count)
    # ------------------------------
    # Disqus Comments
    def update_index_disqus(self, disqus_token, config):
        """
        Update the search index using a collection of 
        Disqus comment threads from the dcppc-internal 
        forum.
        """
        # Updated algorithm:
        # - get set of indexed ids
        # - get set of remote ids
        # - drop all indexed ids
        # - index all remote ids
        # Get the set of indexed ids:
        # --------------------
        indexed_ids = set()
        p = QueryParser("kind", schema=self.ix.schema)
        q = p.parse("disqus")
        with self.ix.searcher() as s:
            results = s.search(q,limit=None)
            for result in results:
                indexed_ids.add(result['id'])
        # Get the set of remote ids:
        # ------
        spider = DisqusCrawler(disqus_token,'dcppc-internal')
        # ask spider to crawl disqus comments
        spider.crawl_threads()
        # spider.threads will be a dictionary
        # with thread IDs as keys and a
        # dictionary describing each thread as values
        writer = self.ix.writer()
        count = 0
        # threads is a dictionary
        # keys are Disqus thread IDs
        # values are dictionaries
        threads = spider.get_threads()
        # Start by collecting all the things
        remote_ids = set()
        for k in threads.keys():
            remote_ids.add(k)
        # drop indexed_ids
        for drop_id in indexed_ids:
            writer.delete_by_term('id',drop_id)
        # add remote_ids
        for add_id in remote_ids:
            item = threads[add_id]
            self.add_disqusthread(writer, item, config, update=False)
            count += 1
        writer.commit()
        print("Done, updated %d Disqus comment threads in the index" % count)
    # ---------------------------------
    # Search results bundler
@@ -1044,6 +1216,7 @@ class Search:
                "ghfile" : None,
                "markdown" : None,
                "emailthread" : None,
                "disqus" : None,
                "total" : None
        }
        for key in counts.keys():
@@ -1074,7 +1247,9 @@ class Search:
        elif doctype=='issue':
            item_keys = ['title','repo_name','repo_url','url','created_time','modified_time']
        elif doctype=='emailthread':
-            item_keys = ['title','owner_name','url']
+            item_keys = ['title','owner_name','url','group','created_time','modified_time']
        elif doctype=='disqus':
            item_keys = ['title','created_time','url']
        elif doctype=='ghfile':
            item_keys = ['title','repo_name','repo_url','url']
        elif doctype=='markdown':
@@ -1091,11 +1266,7 @@ class Search:
            for r in results:
                d = {}
                for k in item_keys:
-                    if k=='created_time' or k=='modified_time':
+                    d[k] = r[k]
                        #d[k] = r[k]
                        d[k] = dateutil.parser.parse(r[k]).strftime("%Y-%m-%d")
                    else:
                        d[k] = r[k]
                json_results.append(d)
        return json_results
@@ -1108,7 +1279,16 @@ class Search:
            query_string = " ".join(query_list)
            query = None
            if ":" in query_string:
-                query = QueryParser("content", self.schema).parse(query_string)
+
                #query = QueryParser("content", 
                #                    self.schema
                #).parse(query_string)
                query = QueryParser("content", 
                                    self.schema,
                                    termclass=query.Variations
                )
                query.add_plugin(DateParserPlugin(free=True))
                query = query.parse(query_string)
            elif len(fields) == 1 and fields[0] == "filename":
                pass
            elif len(fields) == 2:
@@ -1116,9 +1296,12 @@ class Search:
            else:
                # If the user does not specify a field,
                # these are the fields that are actually searched
-                fields = ['title', 'content','owner_name','owner_email','url']
+                fields = ['title', 'content','owner_name','owner_email','url','created_time','modified_time']
            if not query:
-                query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
+                query = MultifieldParser(fields, schema=self.ix.schema)
                query.add_plugin(DateParserPlugin(free=True))
                query = query.parse(query_string)
                #query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string) 
            parsed_query = "%s" % query
            print("query: %s" % parsed_query)
            results = searcher.search(query, terms=False, scored=True, groupedby="kind")
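The Google Drive update in this diff splits ids into three groups with set arithmetic: ids to drop, ids to update, and ids to add. A minimal sketch of that partitioning follows. The `plan_sync` helper and its dictionary keys are hypothetical names used only for illustration; they are not part of centillion.

```python
def plan_sync(indexed_ids, remote_ids):
    """Partition ids into drop / update / add groups.

    Hypothetical helper: mirrors the set arithmetic used in the
    Google Drive update, without touching a real search index.
    """
    indexed_ids = set(indexed_ids)
    remote_ids = set(remote_ids)
    return {
        # in the index but gone from the remote source
        "drop": indexed_ids - remote_ids,
        # present on both sides, so re-index
        "update": indexed_ids & remote_ids,
        # new on the remote side
        "add": remote_ids - indexed_ids,
    }


plan = plan_sync({"a", "b", "c"}, {"b", "c", "d"})
print(sorted(plan["drop"]), sorted(plan["update"]), sorted(plan["add"]))
```

The Disqus update takes the simpler route of dropping every indexed id and re-adding every remote id, which corresponds to treating the "update" group as a drop followed by an add.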
--- a/config_centillion.py
+++ b/config_centillion.py
@@ -0,0 +1,28 @@
 config = {
    "repositories" : [
        "dcppc/project-management",
        "dcppc/nih-demo-meetings",
        "dcppc/internal",
        "dcppc/organize",
        "dcppc/dcppc-bot",
        "dcppc/full-stacks",
        "dcppc/design-guidelines-discuss",
        "dcppc/dcppc-deliverables",
        "dcppc/dcppc-milestones",
        "dcppc/crosscut-metadata",
        "dcppc/lucky-penny",
        "dcppc/dcppc-workshops",
        "dcppc/metadata-matrix",
        "dcppc/data-stewards",
        "dcppc/dcppc-phase1-demos",
        "dcppc/apis",
        "dcppc/2018-june-workshop",
        "dcppc/2018-july-workshop",
        "dcppc/2018-august-workshop",
        "dcppc/2018-september-workshop",
        "dcppc/design-guidelines",
        "dcppc/2018-may-workshop",
        "dcppc/centillion"
    ]
 }
--- a/config_flask.example.py
+++ b/config_flask.example.py
@@ -1,20 +1,38 @@
 ######################################
 # github oauth
 GITHUB_OAUTH_CLIENT_ID = "XXX"
 GITHUB_OAUTH_CLIENT_SECRET = "YYY"
 ######################################
 # github access token
 GITHUB_TOKEN = "XXX"
 ######################################
 # groups.io
 GROUPSIO_TOKEN = "XXXXX"
 GROUPSIO_USERNAME = "XXXXX"
 GROUPSIO_PASSWORD = "XXXXX"
 ######################################
 # Disqus API public key
 DISQUS_TOKEN = "XXXXX"
 ######################################
 # everything else
 # Location of index file
 INDEX_DIR = "search_index"
 # oauth client deets
 GITHUB_OAUTH_CLIENT_ID = "XXX"
 GITHUB_OAUTH_CLIENT_SECRET = "YYY"
 GITHUB_TOKEN = "ZZZ"
 # More information footer: Repository label
-FOOTER_REPO_ORG = "charlesreid1"
+FOOTER_REPO_ORG = "dcppc"
 FOOTER_REPO_NAME = "centillion"
 # Toggle to show Whoosh parsed query
 SHOW_PARSED_QUERY=True
-TAGLINE = "Search All The Things"
+TAGLINE = "Search the Data Commons"
 # Flask settings
 DEBUG = True
-SECRET_KEY = 'WWWWW'
+SECRET_KEY = 'XXXXX'
--- a/disqus_util.py
+++ b/disqus_util.py
@@ -0,0 +1,154 @@
 import os, re
 import requests
 import json
 import dateutil.parser
 from pprint import pprint
 """
 Convenience class wrapper for Disqus comments.
 This requires that the user provide either their
 API OAuth application credentials (in which case
 a user needs to authenticate with the application
 so it can access the comments that they can see)
 or user credentials from a previous login.
 """
 class DisqusCrawler(object):
    def __init__(self,
                 credentials,
                 group_name):
        self.credentials = credentials
        self.group_name = group_name
        self.crawled_comments = False
        self.threads = None
    def get_threads(self):
        """
        Return a dictionary mapping each thread ID
        to a dictionary describing that comment
        thread in the given Disqus forum.
        """
        return self.threads
    def crawl_threads(self):
        """
        This will use the API to list every thread,
        then iterate through each thread to
        collect all of its comments.
        """
        # The money shot
        threads = {}
        # list all threads
        list_threads_url = 'https://disqus.com/api/3.0/threads/list.json'
        # list all posts (comments)
        list_posts_url = 'https://disqus.com/api/3.0/threads/listPosts.json'
        base_params = dict(
                api_key=self.credentials,
                forum=self.group_name
        )
        # prepare url params
        params = {}
        for k in base_params.keys():
            params[k] = base_params[k]
        # make api call (first loop in fencepost)
        results = requests.request('GET', list_threads_url, params=params).json()
        cursor = results['cursor']
        responses = results['response']
        while True:
            for response in responses:
                if '127.0.0.1' not in response['link'] and 'localhost' not in response['link']:
                    # Save thread info
                    thread_id = response['id']
                    thread_count = response['posts']
                    print("Working on thread %s (%d posts)"%(thread_id,thread_count))
                    if thread_count > 0:
                        # prepare url params
                        params_comments = {}
                        for k in base_params.keys():
                            params_comments[k] = base_params[k]
                        params_comments['thread'] = thread_id
                        # make api call
                        results_comments = requests.request('GET', list_posts_url, params=params_comments).json()
                        cursor_comments = results_comments['cursor']
                        responses_comments = results_comments['response']
                        # Save comments for this thread
                        thread_comments = []
                        while True:
                            for comment in responses_comments:
                                # Save comment info
                                print("    + %s"%(comment['message']))
                                thread_comments.append(comment['message'])
                            if cursor_comments['hasNext']:
                                # Prepare for the next URL call
                                params_comments = {}
                                for k in base_params.keys():
                                    params_comments[k] = base_params[k]
                                params_comments['thread'] = thread_id
                                params_comments['cursor'] = cursor_comments['next']
                                # Make the next URL call
                                results_comments = requests.request('GET', list_posts_url, params=params_comments).json()
                                cursor_comments = results_comments['cursor']
                                responses_comments = results_comments['response']
                            else:
                                break
                        link = response['link']
                        clean_link = re.sub('data-commons.us','nihdatacommons.us',link)
                        clean_link += "#disqus_comments"
                        # Finished working on thread.
                        # We need to make this value a dictionary
                        thread_info = dict(
                                id = response['id'],
                                created_time = dateutil.parser.parse(response['createdAt']),
                                title = response['title'],
                                forum = response['forum'],
                                link = clean_link,
                                content = "\n\n-----".join(thread_comments)
                        )
                        threads[thread_id] = thread_info
            if 'hasNext' in cursor.keys() and cursor['hasNext']:
                # Prepare for next URL call
                params = {}
                for k in base_params.keys():
                    params[k] = base_params[k]
                params['cursor'] = cursor['next']
                # Make the next URL call
                results = requests.request('GET', list_threads_url, params=params).json()
                cursor = results['cursor']
                responses = results['response']
            else:
                break
        self.threads = threads
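Both loops in `crawl_threads` follow the same cursor pattern: read a page, then request the next one while `cursor['hasNext']` is true. A minimal sketch of that pattern as a generator is below. `iter_pages`, `fetch_page`, and the in-memory `fake_pages` data are assumptions made for illustration, so the example runs without calling the Disqus API.

```python
def iter_pages(fetch_page):
    """Yield every item from a cursor-paginated source.

    fetch_page(cursor) is a hypothetical callable returning a dict with
    a 'response' list and a 'cursor' dict holding 'hasNext' and 'next',
    which is the response shape the crawler above reads.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        for item in page["response"]:
            yield item
        info = page.get("cursor", {})
        if not info.get("hasNext"):
            break
        cursor = info["next"]


# Stand-in data so the sketch runs offline.
fake_pages = {
    None: {"response": [1, 2], "cursor": {"hasNext": True, "next": "p2"}},
    "p2": {"response": [3], "cursor": {"hasNext": False, "next": None}},
}

items = list(iter_pages(lambda cursor: fake_pages[cursor]))
print(items)
```

Written this way, the thread listing and the per-thread comment listing could share one loop and differ only in the callable that performs the request.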
--- a/groupsio_util.py
+++ b/groupsio_util.py
@@ -1,5 +1,7 @@
 import requests, os, re
 from bs4 import BeautifulSoup
 import dateutil.parser
 import datetime
 class GroupsIOException(Exception):
    pass
@@ -251,7 +253,7 @@ class GroupsIOArchivesCrawler(object):
            subject = soup.find('title').text
            # Extract information for the schema:
-            # - permalink for thread (done)
+            # - permalink for thread (done above)
            # - subject/title (done)
            # - original sender email/name (done)
            # - content (done)
@@ -266,11 +268,35 @@ class GroupsIOArchivesCrawler(object):
                    pass
                else:
                    # found an email!
-                    # this is a maze, thanks groups.io
+                    # this is a maze, not amazing.
                    # thanks groups.io!
                    td = tr.find('td')
-                    divrow = td.find('div',{'class':'row'}).find('div',{'class':'pull-left'})
+
                    sender_divrow = td.find('div',{'class':'row'})
                    sender_divrow = sender_divrow.find('div',{'class':'pull-left'})
                    if (i+1)==1:
-                        original_sender = divrow.text.strip()
+                        original_sender = sender_divrow.text.strip()
                    date_divrow = td.find('div',{'class':'row'})
                    date_divrow = date_divrow.find('div',{'class':'pull-right'})
                    date_divrow = date_divrow.find('font',{'class':'text-muted'})
                    date_divrow = date_divrow.find('script').text
                    try:
                        time_seconds = re.search(' [0-9]{1,} ',date_divrow).group(0)
                        time_seconds = time_seconds.strip()
                        # Thanks groups.io for the weird date formatting
                        mmicro_seconds = time_seconds[10:]
                        time_seconds = time_seconds[:10]
                        if (i+1)==1:
                            created_time  = datetime.datetime.utcfromtimestamp(int(time_seconds))
                            modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
                        else:
                            modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
                    except AttributeError:
                        created_time = None
                        modified_time = None
                    for div in td.find_all('div'):
                        if div.has_attr('id'):
@@ -299,7 +325,10 @@ class GroupsIOArchivesCrawler(object):
            thread = {
                    'permalink' : permalink,
                    'created_time' : created_time,
                    'modified_time' : modified_time,
                    'subject' : subject,
                    'subgroup' : subgroup_name,
                    'original_sender' : original_sender,
                    'content' : full_content
            }
@@ -324,11 +353,13 @@ class GroupsIOArchivesCrawler(object):
        results = []
        for row in rows:
-            # We don't care about anything except title and ugly link
+            # This is where we extract
            # a list of thread titles 
            # and corresponding links.
            subject = row.find('span',{'class':'subject'})
            title = subject.get_text()
            link = row.find('a')['href']
-            #print(title)
+
            results.append((title,link))
        return results
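The date handling above takes the first ten digits of the number embedded in the groups.io page as epoch seconds and treats the rest as a sub-second remainder. A minimal sketch of that split is below. `split_epoch` is a hypothetical helper, and it returns a timezone-aware UTC datetime, whereas the diff uses `utcfromtimestamp`, which returns a naive one.

```python
from datetime import datetime, timezone


def split_epoch(raw, seconds_digits=10):
    """Split a long epoch string into (UTC datetime, leftover digits).

    Hypothetical helper: the remainder must be sliced off the original
    string, before the seconds part is truncated.
    """
    raw = raw.strip()
    seconds, remainder = raw[:seconds_digits], raw[seconds_digits:]
    when = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return when, remainder


when, remainder = split_epoch(" 1535068800123 ")
print(when.isoformat(), remainder)
```

Slicing both parts from the same untruncated string avoids the empty remainder that results when the seconds variable is overwritten first.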
--- a/hypothesis_util.py
+++ b/hypothesis_util.py
@@ -0,0 +1,89 @@
 import requests
 import json
 import os
 def get_headers():
    if 'HYPOTHESIS_TOKEN' in os.environ:
        token = os.environ['HYPOTHESIS_TOKEN']
    else:
        raise Exception("Need to specify Hypothesis token with HYPOTHESIS_TOKEN env var")
    auth_header = 'Bearer %s'%(token)
    return {'Authorization': auth_header}
 def basic_auth():
    url = 'https://hypothes.is/api'
    # Get the authorization header
    headers = get_headers()
    # Make the request
    response = requests.get(url, headers=headers)
    if response.status_code==200:
        # Interpret results as JSON
        dat = response.json()
        print(json.dumps(dat, indent=4))
    else:
        print("Response status code was not OK: %d"%(response.status_code))
 def list_annotations():
    # kEaohJC9Eeiy_UOozkpkyA
    url = 'https://hypothes.is/api/annotations/kEaohJC9Eeiy_UOozkpkyA'
    # Get the authorization header
    headers = get_headers()
    # Make the request
    response = requests.get(url, headers=headers)
    if response.status_code==200:
        # Interpret results as JSON
        dat = response.json()
        print(json.dumps(dat, indent=4))
    else:
        print("Response status code was not OK: %d"%(response.status_code))
 def search_annotations():
    url = 'https://hypothes.is/api/search'
    # Get the authorization header
    headers = get_headers()
    # Set query params
    params = dict(
            url = '*pilot.nihdatacommons.us*',
            limit = 200
    )
    #http://pilot.nihdatacommons.us/organize/CopperInternalDeliveryWorkFlow/',
    # Make the request
    response = requests.get(url, headers=headers, params=params)
    if response.status_code==200:
        # Interpret results as JSON
        dat = response.json()
        print(json.dumps(dat, indent=4))
    else:
        print("Response status code was not OK: %d"%(response.status_code))
 if __name__=="__main__":
    search_annotations()
--- a/static/centillion_white_beta.png
+++ b/static/centillion_white_beta.png
--- a/static/centillion_white_localhost.png
+++ b/static/centillion_white_localhost.png
--- a/static/feedback.js
+++ b/static/feedback.js
@@ -0,0 +1,133 @@
 // submitting form with modal:
 // https://stackoverflow.com/a/29068742
 //
 // closing a bootstrap modal with submit button:
 // https://stackoverflow.com/a/33478107
 //
 // flask post data as json:
 // https://stackoverflow.com/a/16664376
 /* this function is called when the user submits
 * the feedback form. it submits a post request
 * to the flask server, which squirrels away the
 * feedback in a file.
 */
 function submit_feedback() {
    // this function is called when submit button clicked
    // algorithm:
    // - check if text box has content
    // - check if happy/sad filled out
    var smile_active = $('#modal-feedback-smile-div').hasClass('smile-active');
    var frown_active = $('#modal-feedback-frown-div').hasClass('frown-active');
    if( !( smile_active || frown_active ) ) {
        alert('Please pick the smile or the frown.')
    } else if( $('#modal-feedback-textarea').val()=='' ) {
        alert('Please provide us with some feedback.')
    } else {
        var user_sentiment = '';
        if(smile_active) {
            user_sentiment = 'smile';
        } else {
            user_sentiment = 'frown';
        }
        var escaped_text = $('#modal-feedback-textarea').val();
        // prepare form data 
        var data = {
            sentiment : user_sentiment,
            content : escaped_text
        };
        // post the form. the callback function resets the form
        $.post("/feedback", 
            data, 
            function(response) {
                $('#myModal').modal('hide');
                $('#myModalForm')[0].reset();
                add_alert(response);
                frown_unclick();
                smile_unclick();
        });
    }
 }
 function add_alert(response) {
    var str = "";
    str += '<div id="feedback-messages-container" class="container">';
    if (response['status']=='ok') {
        // if status is ok, use alert-success
        str += '    <div id="feedback-messages-alert" class="alert alert-success alert-dismissible fade in">';
    } else {
        // otherwise use alert-danger
        str += '    <div id="feedback-messages-alert" class="alert alert-danger alert-dismissible fade in">';
    }
    str += '        <a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>';
    str += '        <div id="feedback-messages-contianer" class="container-fluid">';
    str += '            <div id="feedback-messages-div" class="col-xs-12">';
    str += '                <p>'
    str += response['message'];
    str += '                </p>';
    str += '            </div>';
    str += '        </div>';
    str += '    </div>';
    str += '</div>';
    $('div#messages').append(str);
 }
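Review note: `add_alert()` concatenates `response['message']` directly into markup. If the server ever echoes user-supplied text in that message, escaping it first would be safer. A minimal sketch follows; `escape_html` is a hypothetical helper and is not part of this change.

```javascript
// Hypothetical helper (an assumption, not in the original diff):
// escape the five HTML-significant characters before building markup.
function escape_html(text) {
    return String(text)
        .replace(/&/g, '&amp;')   // must run first so later entities are not double-escaped
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}
```

With this in place, the message line in `add_alert()` could read `str += escape_html(response['message']);`.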
 /* for those particularly wordy users... limit feedback to 1100 chars */
 function cool_it() { 
    if($('#modal-feedback-textarea').val().length > 1100 ){
        $('#modal-too-long').show();
    } else {
        $('#modal-too-long').hide();
    }
 }
 /* smiley functions */
 function smile_click() {
    $('#modal-feedback-smile-div').addClass('smile-active');
    $('#modal-feedback-smile-icon').addClass('smile-active');
 }
 function frown_click() {
    $('#modal-feedback-frown-div').addClass('frown-active');
    $('#modal-feedback-frown-icon').addClass('frown-active');
 }
 function smile_unclick() {
    $('#modal-feedback-smile-div').removeClass('smile-active');
    $('#modal-feedback-smile-icon').removeClass('smile-active');
 }
 function frown_unclick() {
    $('#modal-feedback-frown-div').removeClass('frown-active');
    $('#modal-feedback-frown-icon').removeClass('frown-active');
 }
 function smile() {
    frown_unclick();
    smile_click();
 }
 function frown() { 
    smile_unclick();
    frown_click();
 }
 /* for those particularly wordy users... limit feedback to 1100 chars */
 // how to check n characters in a textarea
 // https://stackoverflow.com/a/19934613
 /*
 $(document).ready(function() {
    $('#modal-feedback-textarea').on('change',function(event) {
        if($('#modal-feedback-textarea').val().length > 1100 ){
            $('#modal-too-long').show();
        } else {
            $('#modal-too-long').hide();
        }
    });
 });
 */
--- a/static/master_list.js
+++ b/static/master_list.js
@@ -22,6 +22,7 @@ var initIssuesTable = false;
 var initGhfilesTable = false;
 var initMarkdownTable = false;
 var initEmailthreadsTable = false;
 var initDisqusTable = false;
 $(document).ready(function() {
    var url_string = document.location.toString();
@@ -32,10 +33,6 @@ $(document).ready(function() {
        load_gdoc_table();
        var divList = $('div#collapseDrive').addClass('in');
    } else if (d==='emailthread') {
        load_emailthreads_table();
        var divList = $('div#collapseThreads').addClass('in');
    } else if (d==='issue') {
        load_issue_table();
        var divList = $('div#collapseIssues').addClass('in');
@@ -48,10 +45,37 @@ $(document).ready(function() {
        load_markdown_table();
        var divList = $('div#collapseMarkdown').addClass('in');
    } else if (d==='emailthread') {
        load_emailthreads_table();
        var divList = $('div#collapseThreads').addClass('in');
    } else if (d==='disqus') {
        load_disqusthreads_table();
        var divList = $('div#collapseDisqus').addClass('in');
    }
 });
 //////////////////////////////////
 // utility functions
 // https://stackoverflow.com/a/25275808
 function iso8601(date) {
  var hours = date.getHours();
  var minutes = date.getMinutes();
  var ampm = hours >= 12 ? 'PM' : 'AM';
  hours = hours % 12;
  hours = hours ? hours : 12; // the hour '0' should be '12'
  minutes = minutes < 10 ? '0'+minutes : minutes;
  var strTime = hours + ':' + minutes + ' ' + ampm;
  return date.getFullYear() + "-" + (date.getMonth()+1) + "-" + date.getDate() + "  " + strTime;
 }
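Review note: despite its name, `iso8601()` produces a 12-hour clock with an AM/PM suffix and unpadded month and day. If a sortable, zero-padded local timestamp is what the tables need, something like the sketch below may be closer. `pad2` and `format_timestamp` are hypothetical names, and the `YYYY-MM-DD HH:MM` layout is an assumption about the desired output.

```javascript
// Hypothetical helpers (assumptions, not in the original diff).
// Left-pad a number to two digits.
function pad2(n) {
    return n < 10 ? '0' + n : '' + n;
}

// Format a Date in local time as "YYYY-MM-DD HH:MM" (24-hour clock).
function format_timestamp(date) {
    return date.getFullYear() + '-' + pad2(date.getMonth() + 1) + '-' + pad2(date.getDate())
        + ' ' + pad2(date.getHours()) + ':' + pad2(date.getMinutes());
}
```

For example, `format_timestamp(new Date(2018, 7, 24, 9, 5))` gives `2018-08-24 09:05`.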
 // https://stackoverflow.com/a/7390612
 var toType = function(obj) {
  return ({}).toString.call(obj).match(/\s([a-zA-Z]+)/)[1].toLowerCase()
 }
 //////////////////////////////////
 // API-to-Table Functions
@@ -77,9 +101,9 @@ function load_gdoc_table(){
    if(!initGdocTable) {
        var divList = $('div#collapseDrive').attr('class');
        if (divList.indexOf('in') !== -1) {
-            console.log('Closing Google Drive master list');
+            //console.log('Closing Google Drive master list');
        } else { 
-            console.log('Opening Google Drive master list');
+            //console.log('Opening Google Drive master list');
            $.getJSON("/list/gdoc", function(result){
@@ -125,7 +149,7 @@ function load_gdoc_table(){
                initGdocTable = true
            });
-            console.log('Finished loading Google Drive master list');
+            //console.log('Finished loading Google Drive master list');
        }
    }
 }
@@ -137,9 +161,9 @@ function load_issue_table(){
    if(!initIssuesTable) {
        var divList = $('div#collapseIssues').attr('class');
        if (divList.indexOf('in') !== -1) {
-            console.log('Closing Github issues master list');
+            //console.log('Closing Github issues master list');
        } else { 
-            console.log('Opening Github issues master list');
+            //console.log('Opening Github issues master list');
            $.getJSON("/list/issue", function(result){
                var r = new Array(), j = -1, size=result.length;
@@ -183,7 +207,7 @@ function load_issue_table(){
                initIssuesTable = true;
            });
-            console.log('Finished loading Github issues master list');
+            //console.log('Finished loading Github issues master list');
        }
    }
 }
@@ -195,13 +219,13 @@ function load_ghfile_table(){
    if(!initGhfilesTable) {
        var divList = $('div#collapseFiles').attr('class');
        if (divList.indexOf('in') !== -1) {
-            console.log('Closing Github files master list');
+            //console.log('Closing Github files master list');
        } else { 
-            console.log('Opening Github files master list');
+            //console.log('Opening Github files master list');
            $.getJSON("/list/ghfile", function(result){
-                console.log("-----------");
+                //console.log("-----------");
-                console.log(result);
+                //console.log(result);
                var r = new Array(), j = -1, size=result.length;
                r[++j] = '<thead>'
                r[++j] = '<tr class="header-row">';
@@ -237,7 +261,7 @@ function load_ghfile_table(){
                initGhfilesTable = true;
            });
-            console.log('Finished loading Github file list');
+            //console.log('Finished loading Github file list');
        }
    }
 }
@@ -249,9 +273,9 @@ function load_markdown_table(){
    if(!initMarkdownTable) { 
        var divList = $('div#collapseMarkdown').attr('class');
        if (divList.indexOf('in') !== -1) {
-            console.log('Closing Github markdown master list');
+            //console.log('Closing Github markdown master list');
        } else { 
-            console.log('Opening Github markdown master list');
+            //console.log('Opening Github markdown master list');
            $.getJSON("/list/markdown", function(result){
                var r = new Array(), j = -1, size=result.length;
@@ -289,7 +313,7 @@ function load_markdown_table(){
                initMarkdownTable = true;
            });
-            console.log('Finished loading Markdown list');
+            //console.log('Finished loading Markdown list');
        }
    }
 }
@@ -302,16 +326,18 @@ function load_emailthreads_table(){
    if(!initEmailthreadsTable) { 
        var divList = $('div#collapseThreads').attr('class');
        if (divList.indexOf('in') !== -1) {
-            console.log('Closing Groups.io email threads master list');
+            //console.log('Closing Groups.io email threads master list');
        } else { 
-            console.log('Opening Groups.io email threads master list');
+            //console.log('Opening Groups.io email threads master list');
            $.getJSON("/list/emailthread", function(result){
                var r = new Array(), j = -1, size=result.length;
                r[++j] = '<thead>'
                r[++j] = '<tr class="header-row">';
-                r[++j] = '<th width="70%">Topic</th>';
+                r[++j] = '<th width="60%">Topic</th>';
-                r[++j] = '<th width="30%">Started By</th>';
+                r[++j] = '<th width="15%">Started By</th>';
                r[++j] = '<th width="15%">Date</th>';
                r[++j] = '<th width="10%">Mailing List</th>';
                r[++j] = '</tr>';
                r[++j] = '</thead>'
                r[++j] = '<tbody>'
@@ -322,6 +348,10 @@ function load_emailthreads_table(){
                    r[++j] = '</a>'
                    r[++j] = '</td><td>';
                    r[++j] = result[i]['owner_name'];
                    r[++j] = '</td><td>';
                    r[++j] = result[i]['created_time'];
                    r[++j] = '</td><td>';
                    r[++j] = result[i]['group'];
                    r[++j] = '</td></tr>';
                }
                r[++j] = '</tbody>'
@@ -340,7 +370,57 @@ function load_emailthreads_table(){
                initEmailthreadsTable = true;
            });
-            console.log('Finished loading Groups.io email threads list');
+            //console.log('Finished loading Groups.io email threads list');
        }
    }
 }
 // ------------------------
 // Disqus Comment Threads
 function load_disqusthreads_table(){
    if(!initDisqusTable) { 
        var divList = $('div#collapseDisqus').attr('class');
        if (divList.indexOf('in') !== -1) {
            //console.log('Closing Disqus comment threads master list');
        } else { 
            //console.log('Opening Disqus comment threads master list');
            $.getJSON("/list/disqus", function(result){
                var r = new Array(), j = -1, size=result.length;
                r[++j] = '<thead>'
                r[++j] = '<tr class="header-row">';
                r[++j] = '<th width="70%">Page Title</th>';
                r[++j] = '<th width="30%">Created</th>';
                r[++j] = '</tr>';
                r[++j] = '</thead>'
                r[++j] = '<tbody>'
                for (var i=0; i<size; i++){
                    r[++j] ='<tr><td>';
                    r[++j] = '<a href="' + result[i]['url'] + '" target="_blank">'
                    r[++j] = result[i]['title'];
                    r[++j] = '</a>'
                    r[++j] = '</td><td>';
                    r[++j] = result[i]['created_time'];
                    r[++j] = '</td></tr>';
                }
                r[++j] = '</tbody>'
                // Construct names of id tags
                var doctype = 'disqus';
                var idlabel = '#' + doctype + '-master-list';
                var filtlabel = idlabel + '_filter';
                // Initialize the DataTable
                $(idlabel).html(r.join(''));
                $(idlabel).DataTable({
                    responsive: true,
                    lengthMenu: [50,100,250,500]
                });
                initDisqusTable = true;
            });
            //console.log('Finished loading Disqus comment threads list');
        }
    }
 }
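Review note: each `load_*_table()` function repeats the same row-building loop. A small shared helper could cut the duplication. The sketch below assumes every cell is plain text taken from one key of the result object; link cells, such as the title column above, would still need their own handling. `build_rows` is a hypothetical name.

```javascript
// Hypothetical helper (an assumption, not in the original diff):
// turn an array of result objects into <tr>/<td> markup,
// emitting one cell per requested column key, in order.
function build_rows(result, columns) {
    var r = [];
    for (var i = 0; i < result.length; i++) {
        r.push('<tr>');
        for (var k = 0; k < columns.length; k++) {
            r.push('<td>' + result[i][columns[k]] + '</td>');
        }
        r.push('</tr>');
    }
    return r.join('');
}
```

A caller would wrap the returned string in `<tbody>` and pass it to `$(idlabel).html(...)` as the existing functions do.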
--- a/static/search_list.js
+++ b/static/search_list.js
@@ -31,7 +31,7 @@ $(document).ready(function() {
              aTargets : [2]
            }
        ],
-        lengthMenu: [50,100,250,500]
+        lengthMenu: [10,20,50,100]
    });
    console.log('Finished loading search results list');
--- a/static/style.css
+++ b/static/style.css
@@ -1,6 +1,64 @@
 #modal-too-long {
    display: none;
 }
 /* feedback smileys */
 #modal-feedback-smile-icon,
 #modal-feedback-frown-icon {
    padding-left: 100px;
    padding-right: 100px;
    padding-top: 20px;
    padding-bottom: 20px;
 }
 div.smile-active {
    background-color: #2b2;
 }
 i.smile-active {
    color: #fff;
 }
 div.frown-active {
    background-color: #b22;
 }
 i.frown-active {
    color: #fff;
 }
 /* feedback text area */
 #modal-feedback-textarea {
    width: 100%;
 }
 /* feedback buttons */
 button.close {
    font-size: 35px;
 }
 button#submit-feedback-btn {
    width: 250px;
 }
 button#feedback:hover {
    opacity: 1.0;
    filter: alpha(opacity=100); /* For IE8 and earlier */
 }
 button#feedback {
    opacity: 0.5;
    filter: alpha(opacity=50); /* For IE8 and earlier */
    width: 180px;
    height: 50px;
    position: fixed; 
    z-index: 999;
    right: 120px;
    bottom: 10px;
 }
 /* search results table */
 td#search-results-score-col,
 td#search-results-type-col {
-    width: 100px;
+    width: 90px;
 }
 div.container {
@@ -28,6 +86,14 @@ div.container {
 }
 /* badges for number of docs indexed */
 span.results-count {
    background-color: #555;
 }
 span.indexing-count {
    background-color: #337ab7;
 }
 span.badge {
    vertical-align: text-bottom;
 }
@@ -68,7 +134,7 @@ li.search-group-item {
 }
 div.url {
-    background-color: rgba(86,61,124,.15);
+    background-color: rgba(40,40,60,.15);
    padding: 8px;
 }
@@ -134,7 +200,7 @@ table {
 .info, .last-searches {
    color: gray;
-    font-size: 12px;
+    /*font-size: 12px;*/
    font-family: Arial, serif;
 }
@@ -144,27 +210,27 @@ table {
 div.tags a, td.tag-cloud a {
    color: #b56020;
-    font-size: 12px;
+    /*font-size: 12px;*/
 }
 td.tag-cloud, td.directories-cloud {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #555555;
 }
 td.directories-cloud a {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #377BA8;
 }
 div.path {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #666666;
    margin-bottom: 3px;
 }
 div.path a {
-    font-size: 12px;
+    /*font-size: 12px;*/
    margin-right: 5px;
 }
--- a/templates/403.html
+++ b/templates/403.html
@@ -1,4 +1,5 @@
 {% extends "layout.html" %}
 {% set active_page = "403" %}
 {% block body %}
 <div class="container">
--- a/templates/404.html
+++ b/templates/404.html
@@ -1,4 +1,5 @@
 {% extends "layout.html" %}
 {% set active_page = "404" %}
 {% block body %}
 <div class="container">
--- a/templates/banner.html
+++ b/templates/banner.html
@@ -0,0 +1,32 @@
 <div class="container" id="banner-container">
    {#
    banner image
    #}
    <div class="row" id="banner-row">
        <div class="col12sm" id="banner-col">
            <center>
                <a id="banner-a" href="{{ url_for('search')}}?query=&fields=">
                    {% if 'betasearch' in request.url %}
                        <img id="banner-img" src="{{ url_for('static', filename='centillion_white_beta.png') }}">
                    {% elif 'localhost' in request.url %}
                        <img id="banner-img" src="{{ url_for('static', filename='centillion_white_localhost.png') }}">
                    {% else %}
                        <img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
                    {% endif %}
                </a>
            </center>
        </div>
    </div>
    {% if config['TAGLINE'] %}
    <div class="row" id="tagline-row">
        <div class="col12sm" id="tagline-col">
            <center>
                    <h2 id="tagline-tagline"> {{config['TAGLINE']}} </h2>
            </center>
        </div>
    </div>
    {% endif %}
 </div>
--- a/templates/controlpanel.html
+++ b/templates/controlpanel.html
@@ -1,4 +1,5 @@
 {% extends "layout.html" %}
 {% set active_page = "control_panel" %}
 {% block body %}
 <hr />
@@ -53,6 +54,8 @@
                            </p>  
                            <p><a href="{{ url_for('update_index',run_which='emailthreads') }}" class="btn btn-large btn-danger btn-reindex-type">Update Groups.io Email Threads Index</a>
                            </p> 
                            <p><a href="{{ url_for('update_index',run_which='disqus') }}"       class="btn btn-large btn-danger btn-reindex-type">Update Disqus Comment Threads Index</a>
                            </p> 
                        </div>
                    </div>
                </div>
--- a/templates/flashed_messages.html
+++ b/templates/flashed_messages.html
@@ -0,0 +1,14 @@
 <div id="messages">
 {% with messages = get_flashed_messages() %}
    {% if messages %}
        <div class="container" id="flashed-messages-container">
            <div class="alert alert-success alert-dismissible fade in">
                <a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>
                    {% for message in messages %}
                        <p>{{ message }}</p>
                    {% endfor %}
            </div>
        </div>
    {% endif %}
 {% endwith %}
 </div>
--- a/templates/landing.html
+++ b/templates/landing.html
@@ -1,4 +1,5 @@
 {% extends "layout.html" %}
 {% set active_page = "landing" %}
 {% block body %}
 <div class="container">
    <div class="row">
--- a/templates/layout.html
+++ b/templates/layout.html
@@ -10,6 +10,7 @@
 <script src="{{ url_for('static', filename='bootstrap.min.js') }}"></script>
 <script src="{{ url_for('static', filename='master_list.js') }}"></script>
 <script src="{{ url_for('static', filename='search_list.js') }}"></script>
 <script src="{{ url_for('static', filename='feedback.js') }}"></script>
 {# ########## dataTables plugin ############ #}
@@ -23,52 +24,43 @@
-{# ########## github fork corner ############ #}
-<div>
-    {% with messages = get_flashed_messages() %}
-    {% if messages %}
-    <div class="container">
-        <div class="alert alert-success alert-dismissible">
-            <a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>
-            <ul class=flashes>
-                {% for message in messages %}
-                <li>{{ message }}</li>
-                {% endfor %}
-            </ul>
-        </div>
-    </div>
-    {% endif %}
-    {% endwith %}
-    <div class="container">
-    
-        {#
-        banner image
-        #}
-        <div class="row">
-            <div class="col12sm">
-                <center>
-                    <a href="{{ url_for('search')}}?query=&fields=">
-                    <img src="{{ url_for('static', filename='centillion_white.png') }}">
-                    </a>
-                    {#
-                    need a tag line
-                    #}
-                    {% if config['TAGLINE'] %}
-                        <h2><a href="{{ url_for('search')}}?query=&fields=">
-                            {{config['TAGLINE']}}
-                        </a></h2>
-                    {% endif %}
-                </center>
-            </div>
-        </div>
-    </div>
+<div id="master-div">
+    {#
+    flashed messages
+    #}
+    {% include "flashed_messages.html" %}
+    {#
+    banner image
+    #}
+    {% include "banner.html" %}
+    {#
+    feedback modal
+    #}
+    {% include "modal.html" %}
    {% block body %}{% endblock %}
 </div>
-<a href="https://github.com/dcppc/centillion" class="github-corner" aria-label="View source on Github"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#151513; color:#fff; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style>
+{% if active_page=="search" or active_page=="master_list" %}
    {# feedback button #}
    <button id="feedback" type="button" 
                          data-toggle="modal"
                          data-target="#myModal"
        class="btn btn-lg">Send Feedback</button>
    {# vertical spacing before the bottom, b/c of button #}
    <div id="footer-whitespace" class="container">
        <p>&nbsp;</p>
        <p>&nbsp;</p>
        <p>&nbsp;</p>
    </div>
 {% endif %}
 <a id="github-corner" href="https://github.com/dcppc/centillion" class="github-corner" aria-label="View source on Github"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#151513; color:#fff; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style>
--- a/templates/masterlist.html
+++ b/templates/masterlist.html
@@ -1,4 +1,5 @@
 {% extends "layout.html" %}
 {% set active_page = "master_list" %}
 {% block body %}
 <hr />
@@ -8,8 +9,9 @@
    <div class="row">
    {#
-     # google drive files panel
+    # google drive files panel
-     #}
+    #}
    <a name="gdoc"></a>
    <div class="row">
        <div class="panel">
            <div class="panel-group" id="accordionDrive" role="tablist" aria-multiselectable="true">
@@ -45,8 +47,9 @@
    {#
-     # github issue panel
+    # github issue panel
-     #}
+    #}
    <a name="issue"></a>
    <div class="row">
        <div class="panel">
            <div class="panel-group" id="accordionIssues" role="tablist" aria-multiselectable="true">
@@ -84,8 +87,9 @@
    {#
-     # github file panel
+    # github file panel
-     #}
+    #}
    <a name="ghfile"></a>
    <div class="row">
        <div class="panel">
            <div class="panel-group" id="accordionFiles" role="tablist" aria-multiselectable="true">
@@ -121,8 +125,9 @@
    {#
-     # gh markdown file panel
+    # gh markdown file panel
-     #}
+    #}
    <a name="markdown"></a>
    <div class="row">
        <div class="panel">
            <div class="panel-group" id="accordionMarkdown" role="tablist" aria-multiselectable="true">
@@ -159,8 +164,9 @@
    {#
-     # groups.io
+    # groups.io email threads
-     #}
+    #}
    <a name="emailthread"></a>
    <div class="row">
        <div class="panel">
            <div class="panel-group" id="accordionThreads" role="tablist" aria-multiselectable="true">
@@ -194,6 +200,42 @@
        </div>
    </div>
    {#
    # disqus comment threads
    #}
    <a name="disqus"></a>
    <div class="row">
        <div class="panel">
            <div class="panel-group" id="accordionDisqus" role="tablist" aria-multiselectable="true">
                <div class="panel panel-default">
                    <div class="panel-heading" role="tab" id="disqus">
                        <h2 class="masterlist-header">
                            <a class="collapsed" 
                                role="button"
                                onClick="load_disqusthreads_table()"
                                data-toggle="collapse" 
                                data-parent="#accordionDisqus"
                                href="#collapseDisqus" 
                                aria-expanded="true"
                                aria-controls="collapseDisqus">
                                Disqus Comment Threads <small>indexed by centillion</small>
                            </a>
                        </h2>
                    </div>
                    <div id="collapseDisqus" class="panel-collapse collapse" role="tabpanel" 
                        aria-labelledby="disqus">
                        <div class="panel-body">
                            <table class="table table-striped" id="disqus-master-list">
                            </table>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
 </div>
--- a/templates/modal.html
+++ b/templates/modal.html
@@ -0,0 +1,51 @@
 <div class="modal fade" id="myModal" tabindex="-1" role="dialog" aria-labelledby="myModalLabel">
  <form id="myModalForm" method="post">
    <div class="modal-dialog" role="document">
      <div id="myModal-content" class="modal-content">
        <div id="myModal-header" class="modal-header">
          <button type="button" class="close" data-dismiss="modal" aria-label="Close">
              <span aria-hidden="true">&times;</span>
          </button>
          <h4 class="modal-title" id="myModalLabel">
          Send us feedback!
          </h4>
        </div>
        <div id="myModal-body" class="modal-body">
          <div id="modal-feedback-smile-frown-container" class="container-fluid">
              <div id="modal-feedback-smile-div" class="col-xs-6 text-center" 
                  onClick="smile()">
                  <i id="modal-feedback-smile-icon" class="fa fa-smile-o fa-4x" aria-hidden="true"></i>
              </div>
              <div id="modal-feedback-frown-div" class="col-xs-6 text-center" 
                  onClick="frown()">
                  <i id="modal-feedback-frown-icon" class="fa fa-frown-o fa-4x" aria-hidden="true"></i>
              </div>
          </div>
          <div class="container-fluid">
              <p>&nbsp;</p>
          </div>
          <div id="modal-feedback-textarea-container" class="container-fluid">
              <textarea id="modal-feedback-textarea" rows="6"></textarea>
          </div>
          <div id="modal-too-long" class="container-fluid" >
              <p id="modal-too-long-text" class="lead">Please limit the length of your feedback. Thank you in advance!</p>
          </div>
        </div>
        <div id="myModal-footer" class="modal-footer">
            <div class="text-center">
              <button id="submit-feedback-btn" type="button" 
                  onClick="submit_feedback()"
                  class="btn btn-lg btn-primary">
                  Send
              </button>
            </div>
        </div>
      </div>
    </div>
  </form>
 </div>        
--- a/templates/search.html
+++ b/templates/search.html
@@ -1,7 +1,8 @@
 {% extends "layout.html" %}
 {% set active_page = "search" %}
 {% block body %}
-<div class="container">
+<div id="search-bar-container" class="container">
    <div class="row">
        <div class="col-xs-12">
@@ -12,7 +13,11 @@
                    <p><button id="the-big-one" type="submit" style="font-size: 20px; padding: 10px; padding-left: 50px; padding-right: 50px;" 
                        value="search" class="btn btn-primary">Search</button>
                    </p>
-                    <p><a href="{{ url_for('search')}}?query=&fields=">[clear all results]</a>
+
                    {% if parsed_query %}
                        <p><a href="{{ url_for('search')}}?query=&fields=">[clear all results]</a>
                    {% endif %}
                    </p>
                </form>
            </center>
@@ -20,17 +25,10 @@
    </div>
 </div>
-<div class="container">
-    <div class="row">
-        {% if directories %}
-        <div class="col-xs-12 info directories-cloud">
-            <b>File directories:</b> 
-            {% for d in directories %}
-                <a href="{{url_for('search')}}?query={{d|trim}}&fields=filename">{{d|trim}}</a>
-            {% endfor %}
-        </div>
-        {% endif %}
+<div style="height: 20px;"><p>&nbsp;</p></div>
+<div id="info-bars-container" class="container">
+    <div class="row">
        <ul class="list-group">
@@ -46,60 +44,70 @@
                </li>
            {% endif %}
            {# use "if parsed_query" to check if this is 
               a new search or search results #}
            {% if parsed_query %}
                <li  class="list-group-item">
                    <div class="container-fluid">
                        <div class="row">
                            <div class="col-xs-12 info">
-                                <b>Found:</b> <span class="badge">{{entries|length}}</span> results 
+                                <b>Found:</b> <span class="badge results-count">{{entries|length}}</span> results 
-                                out of <span class="badge">{{totals["total"]}}</span> total items indexed
+                                out of <span class="badge results-count">{{totals["total"]}}</span> total items indexed
                            </div>
                        </div>
                    </div>
                </li>
            {% endif %}
            <li  class="list-group-item">
                    <div class="container-fluid">
                        <div class="row">
                            <div class="col-xs-12 info">
                                <b>Indexing:</b>
-                                <span class="badge">{{totals["gdoc"]}}</span>
+                                <span class="badge indexing-count">{{totals["gdoc"]}}</span>
-                                <a href="/master_list?doctype=gdoc">
+                                <a href="/master_list?doctype=gdoc#gdoc">
                                Google Drive files
                                </a>,
-                                <span class="badge">{{totals["issue"]}}</span>
+                                <span class="badge indexing-count">{{totals["issue"]}}</span>
-                                <a href="/master_list?doctype=issue">
+                                <a href="/master_list?doctype=issue#issue">
                                Github issues
                                </a>,
-                                <span class="badge">{{totals["ghfile"]}}</span>
+                                <span class="badge indexing-count">{{totals["ghfile"]}}</span>
-                                <a href="/master_list?doctype=ghfile">
+                                <a href="/master_list?doctype=ghfile#ghfile">
                                Github files
                                </a>,
-                                <span class="badge">{{totals["markdown"]}}</span>
+                                <span class="badge indexing-count">{{totals["markdown"]}}</span>
-                                <a href="/master_list?doctype=markdown">
+                                <a href="/master_list?doctype=markdown#markdown">
                                Github Markdown files
                                </a>,
-                                <span class="badge">{{totals["emailthread"]}}</span>
+                                <span class="badge indexing-count">{{totals["emailthread"]}}</span>
-                                <a href="/master_list?doctype=emailthread">
+                                <a href="/master_list?doctype=emailthread#emailthread">
                                Groups.io email threads
                                </a>,
                                <span class="badge indexing-count">{{totals["disqus"]}}</span>
                                <a href="/master_list?doctype=disqus#disqus">
                                Disqus comment threads
                                </a>
                            </div>
                        </div>
                </div>
            </li>
        </ul>
    </div>
 </div>
 {% if parsed_query %}
-<div class="container">
+<div id="search-results-container" class="container">
    <div class="row">
        <table id="search-results" class="table">
            <thead id="search-results-header">
@@ -126,44 +134,21 @@
                            {% if e.kind=="gdoc" %}
                                {% if e.mimetype=="document" %}
                                    <p><small>Drive Document</small></p>
                                    <!--
                                    <i class="fa fa-google fa-2x"></i>
                                    <i class="fa fa-file-text fa-2x"></i>
                                    -->
                                {% else %}
                                    <p><small>Drive File</small></p>
                                    <!--
                                    <i class="fa fa-google fa-2x"></i>
                                    <i class="fa fa-file-o fa-2x"></i>
                                    -->
                                {% endif %}
                            {% elif e.kind=="issue" %}
                                <p><small>Issue</small></p>
                                <!--
                                <i class="fa fa-github fa-2x"></i>
                                <i class="fa fa-question fa-2x"></i>
                                -->
                            {% elif e.kind=="ghfile" %}
                                <p><small>Github File</small></p>
                                <!--
                                <i class="fa fa-github fa-2x"></i>
                                <i class="fa fa-file-o fa-2x"></i>
                                -->
                            {% elif e.kind=="markdown" %}
                                <p><small>Github Markdown</small></p>
                                <!--
                                <i class="fa fa-github fa-2x"></i>
                                <i class="fa fa-file-text-o fa-2x"></i>
                                -->
                            {% elif e.kind=="emailthread" %}
                                <p><small>Email Thread</small></p>
                                <!--
                                <i class="fa fa-envelope-o fa-2x"></i>
                                -->
                            {% else %}
                                <p><small>Unknown</small></p>
Author	SHA1	Message	Date
Chaz Reid	1985e6606c	Merge pull request #95 from dcppc/fix-output-msg change "documents" to "issues" in reindexing message	2018-08-24 09:25:09 -07:00
Charles Reid	1b2f9a2278	fix output messages for reindexing	2018-08-24 09:23:09 -07:00
Chaz Reid	d7d929689b	Merge pull request #94 from dcppc/raynamharris-patch-1 Create ISSUE_TEMPLATE.md	2018-08-24 09:20:46 -07:00
Charles Reid	937708f5d8	do full indexing	2018-08-24 09:01:18 -07:00
Rayna M Harris	d2dff2217a	fixed typo	2018-08-24 10:44:45 -05:00
Charles Reid	4c3ee712bb	Fix display bug. Merge branch 'dcppc' of github.com:dcppc/centillion into dcppc * 'dcppc' of github.com:dcppc/centillion: fix styles	2018-08-24 08:42:03 -07:00
Charles Reid	f5af965a33	fix display bug	2018-08-24 08:41:35 -07:00
Charles Reid	bce16d336d	fix flask example configuration	2018-08-24 08:40:46 -07:00
Rayna M Harris	9b2ce7b3ca	Create ISSUE_TEMPLATE.md	2018-08-24 10:40:29 -05:00
Chaz Reid	729514ac89	Merge pull request #93 from dcppc/fix-styles fix styles	2018-08-24 08:37:51 -07:00
Charles Reid	46ce070b09	fix styles	2018-08-24 08:31:57 -07:00
Charles Reid	891fa50868	fix results boxes in results table to be gray	2018-08-24 02:30:49 -07:00
Charles Reid	fdb3963ede	tack on the disqus comments anchor to disqus URLs	2018-08-24 02:01:34 -07:00
Chaz Reid	90379a69c5	Merge pull request #92 from dcppc/add-date-subgrp-emailthreads add string formatting for dates and add date/mailing list column to email threads master list	2018-08-24 01:58:29 -07:00
Charles Reid	0faca67c35	add string formatting for dates and add date/mailing list column to email threads master list closes #58	2018-08-24 01:56:19 -07:00
Chaz Reid	77b533b642	Merge pull request #86 from dcppc/disqus Add Disqus	2018-08-24 01:18:37 -07:00
Chaz Reid	ccf013e3c9	Merge pull request #85 from dcppc/add-coc-dotgithub Add Code of Conduct, Contributing, and PR template	2018-08-24 01:18:14 -07:00
Chaz Reid	e67db4f1ef	Merge pull request #89 from dcppc/fix-flashed-messages-font fix font used in flashed messages	2018-08-24 01:17:59 -07:00
Chaz Reid	b11a26a812	Merge pull request #91 from dcppc/merge-datetime-into-disqus Merge datetime into disqus	2018-08-24 01:14:24 -07:00
Charles Reid	55a74f7d98	Merge branch 'use-datetime' into merge-datetime-into-disqus * use-datetime: extract date and time from email threads pages add groups and tags to schema; update how we determine timestamps; handle exceptions when we add the document to the writer, rather than elsewhere move where exception is caught (exception was also incorrect.) switched created_time, modified_time, indexed_time over to DATETIME. added DateParserPlugin to query QueryParser. added time fields to those being searched by default. tests do not seem to be working.	2018-08-24 01:13:42 -07:00
Chaz Reid	ab76226b0c	Merge pull request #90 from dcppc/add-dates-and-subgroups-to-emails Add dates and subgroups to emails	2018-08-24 00:07:40 -07:00
Charles Reid	a4ebef6e6f	extract date and time from email threads pages	2018-08-24 00:04:35 -07:00
Charles Reid	bad50efa9b	add groups and tags to schema; update how we determine timestamps; handle exceptions when we add the document to the writer, rather than elsewhere	2018-08-24 00:03:23 -07:00
Charles Reid	629fc063db	move where exception is caught (exception was also incorrect.)	2018-08-24 00:01:26 -07:00
Charles Reid	4f41d8597f	fix font used in flashed messages	2018-08-23 19:05:16 -07:00
Charles Reid	3b0baa21de	switched created_time, modified_time, indexed_time over to DATETIME. added DateParserPlugin to query QueryParser. added time fields to those being searched by default. tests do not seem to be working.	2018-08-23 19:01:40 -07:00
Charles Reid	33b8857bd0	implement stop filter; implement query variations in main query parser	2018-08-23 17:15:48 -07:00
Charles Reid	7c50fc9ff1	swap out data-commons with nihdatacommons in disqus urls	2018-08-23 17:15:10 -07:00
Charles Reid	eb2cdf1437	fix a bug	2018-08-23 15:57:25 -07:00
Charles Reid	c67e864581	add disqus threads to things being indexed by centillion	2018-08-23 15:55:59 -07:00
Charles Reid	25cc12cf21	turn disqus_util into a crawler object	2018-08-23 15:55:21 -07:00
Charles Reid	11c1185e62	clarify api call in disqus.md	2018-08-23 15:54:30 -07:00
Charles Reid	17b2d359bb	add contributing and code of conduct files	2018-08-23 11:03:48 -07:00
Charles Reid	62ca62274e	add github pull request template	2018-08-23 11:02:37 -07:00
Charles Reid	74cfaf8275	add more notes on hypothesis API output	2018-08-22 15:23:28 -07:00
Charles Reid	552caad135	add utilities to call disqus and hypothesis APIs - both of these files are functions and are not integrated into centillion	2018-08-22 15:22:25 -07:00
Charles Reid	19c42df978	update hypothesis/disqus notes	2018-08-22 12:45:51 -07:00
Charles Reid	6f30e3f120	add api output from listThreads endpoint	2018-08-21 19:36:36 -07:00
Charles Reid	ad6b653e27	add all the threads	2018-08-21 15:10:40 -07:00
Chaz Reid	501cae8329	Merge pull request #81 from dcppc/detect-beta-banner Add custom banners for beta/localhost centillion instances	2018-08-21 13:18:11 -07:00
Charles Reid	0543c3e89f	fix filename	2018-08-21 12:01:12 -07:00
Charles Reid	2191140232	Add custom banners for beta/localhost centillion instances	2018-08-21 11:58:19 -07:00
Chaz Reid	6bfadef829	Merge pull request #73 from dcppc/feedback-floater Add a feedback mechanism	2018-08-21 11:33:34 -07:00
Charles Reid	c38683ae9f	(resolve conflict) Merge branch 'dcppc' into feedback-floater * dcppc: add centillion config back. no sensitive info. add option to set port at runtime with CENTILLION_PORT environment variable add a bit o whitespace	2018-08-21 11:32:59 -07:00
Chaz Reid	3f5349a5a6	Merge pull request #80 from dcppc/add-centillion-config-back add centillion config back. no sensitive info.	2018-08-21 11:16:21 -07:00
Charles Reid	f88cf6ecad	add centillion config back. no sensitive info.	2018-08-21 11:15:29 -07:00
Chaz Reid	ec54292a4b	Merge pull request #79 from dcppc/add-port-env-var add option to set port at runtime	2018-08-21 11:12:17 -07:00
Charles Reid	296132d356	add option to set port at runtime with CENTILLION_PORT environment variable	2018-08-21 11:09:46 -07:00
Chaz Reid	0bc40ba323	Merge pull request #76 from dcppc/add-whitespace add a bit o whitespace	2018-08-21 10:33:20 -07:00
Charles Reid	8143e214c2	add a bit o whitespace	2018-08-21 10:06:16 -07:00
Charles Reid	b015da2e9b	add dismissable "thanks for your feedback" message to top	2018-08-20 20:42:58 -07:00
Charles Reid	9c6b57ba85	improve message formatting	2018-08-20 15:04:21 -07:00
Charles Reid	a080eebc29	add dumy function as placeholder for where we add info messages	2018-08-20 15:04:03 -07:00
Charles Reid	323d7ce8ca	return better messages	2018-08-20 15:03:21 -07:00
Charles Reid	da62a5c887	add successful post call and export to JSON db	2018-08-20 14:10:20 -07:00
Charles Reid	2714ad3e0c	update todo	2018-08-20 14:09:58 -07:00
Charles Reid	5e1388e8a8	move modal into its own .html file	2018-08-20 10:34:20 -07:00
Charles Reid	f40cccac99	update todo with tasks	2018-08-20 10:33:51 -07:00
Charles Reid	c72fc44ea7	fix button and smiley styles	2018-08-20 10:33:36 -07:00
Charles Reid	cf417917c9	add /feedback post route	2018-08-20 10:29:40 -07:00
Charles Reid	8aaad93e68	Merge remote-tracking branch 'origin/dcppc' into feedback-floater * origin/dcppc: remove copper requirement for /list endpoint	2018-08-19 21:05:22 -07:00
Chaz Reid	bf8d99c732	Merge pull request #71 from dcppc/hotfix/list-endpoint remove copper requirement for /list endpoint	2018-08-19 20:49:12 -07:00
Charles Reid	20c55891f3	remove copper requirement for /list endpoint	2018-08-19 20:44:06 -07:00
Charles Reid	685058545a	feedback button successfully triggers a modal	2018-08-19 01:37:55 -07:00
Charles Reid	23fd17132e	add page self-identifiers. add "send feedback" button. fix layouts.	2018-08-19 01:16:13 -07:00
Chaz Reid	d3ba1f11a7	Merge pull request #63 from dcppc/hotfix/search-box fix "Search Metadata" label so search does not break	2018-08-17 08:35:16 -07:00