add beginnings of a drop-down panel where we can put advanced search and stumbleupon

tack on the disqus comments anchor to disqus URLs
Merge pull request #92 from dcppc/add-date-subgrp-emailthreads
2018-08-24 02:20:28 -07:00 · 2018-08-24 02:01:34 -07:00 · 2018-08-24 01:58:29 -07:00 · 2018-08-24 01:56:19 -07:00 · 2018-08-24 01:18:37 -07:00 · 2018-08-24 01:18:14 -07:00
14 changed files with 463 additions and 218 deletions
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,12 @@
 Thanks for contributing to centillion!
 Please place an x between the brackets to indicate a yes answer
 to the questions below.
 - [ ] Is this pull request mergeable?
 - [ ] Has this been tested locally?
 - [ ] Does this pull request pass the tests?
 - [ ] Have new tests been added to cover any new code?
 - [ ] Was a spellchecker run on the source code and documentation after
  changes were made?
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,43 @@
 # Code of Conduct
 ## DCPPC Code of Conduct
 All members of the Commons are expected to agree with the following code
 of conduct. We will enforce this code as needed. We expect cooperation
 from all members to help ensure a safe environment for everybody.
 ## The Quick Version
 The Consortium is dedicated to providing a harassment-free experience
 for everyone, regardless of gender, gender identity and expression, age,
 sexual orientation, disability, physical appearance, body size, race, or
 religion (or lack thereof). We do not tolerate harassment of Consortium
 members in any form. Sexual language and imagery is generally not
 appropriate for any venue, including meetings, presentations, or
 discussions.
 ## The Less Quick Version
 Harassment includes offensive verbal comments related to gender, gender
 identity and expression, age, sexual orientation, disability, physical
 appearance, body size, race, religion, sexual images in public spaces,
 deliberate intimidation, stalking, following, harassing photography or
 recording, sustained disruption of talks or other events, inappropriate
 physical contact, and unwelcome sexual attention.
 Members asked to stop any harassing behavior are expected to comply
 immediately.
 If you are being harassed, notice that someone else is being harassed,
 or have any other concerns, please contact [Titus
 Brown](mailto:ctbrown@ucdavis.edu) immediately. If Titus is the cause of
 your concern, please contact [Vivien
 Bonazzi](mailto:bonazziv@mail.nih.gov).
 We expect members to follow these guidelines at any Consortium event.
 Original source and credit: <http://2012.jsconf.us/#/about> & The Ada
 Initiative. Please help by translating or improving:
 <http://github.com/leftlogic/confcodeofconduct.com>. This work is
 licensed under a Creative Commons Attribution 3.0 Unported License
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,21 @@
 # Contributing to the DCPPC Internal Repository
 Hello, and thank you for wanting to contribute to the DCPPC Internal
 Repository\!
 By contributing to this repository, you agree:
 1.  To obey the [Code of Conduct](./CODE_OF_CONDUCT.md)
 2.  To release all your contributions under the same terms as the
    license itself: the [Creative Commons Zero](./LICENSE.md) (aka
    Public Domain) license
 If you are OK with these two conditions, then we welcome both you and
 your contribution\!
 If you have any questions about contributing, please [open an
 issue](https://github.com/dcppc/internal/issues/new) and Team Copper
 will lend a hand ASAP.
 Thank you for being here and for being a part of the DCPPC project.
--- a/centillion.py
+++ b/centillion.py
@@ -267,7 +267,11 @@ def list_docs(doctype):
            if org['login']=='dcppc':
                # Business as usual
                search = Search(app.config["INDEX_DIR"])
-                return jsonify(search.get_list(doctype))
+                results_list = search.get_list(doctype)
                for result in results_list:
                    ct = result['created_time']
                    result['created_time'] = datetime.strftime(ct,"%Y-%m-%d %I:%M %p")
                return jsonify(results_list)
    # nope
    return render_template('403.html')
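The formatting loop added to `list_docs` calls `strftime` on every `created_time`. A minimal sketch of the same formatting with a guard for missing values, assuming some records carry no creation date (the indexing hunks later in this diff pass `created_time = None` for markdown and GitHub files). The helper name `format_created_time` and the sample records are hypothetical, not part of centillion.

```python
from datetime import datetime


def format_created_time(ct, fmt="%Y-%m-%d %I:%M %p"):
    """Format a datetime for display; return an empty string when it is missing.

    Hypothetical helper: the name and the empty-string fallback are assumptions.
    """
    if ct is None:
        return ""
    return ct.strftime(fmt)


# Hypothetical records shaped like the dictionaries returned by search.get_list()
results_list = [
    {"title": "with date", "created_time": datetime(2018, 8, 24, 1, 18, 37)},
    {"title": "without date", "created_time": None},
]

for result in results_list:
    result["created_time"] = format_created_time(result["created_time"])
```

Returning an empty string keeps the JSON payload a plain string in both cases, so the table-building JavaScript does not need a separate null check.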
--- a/centillion_search.py
+++ b/centillion_search.py
@@ -24,6 +24,8 @@ import dateutil.parser
 from whoosh import query
 from whoosh.qparser import MultifieldParser, QueryParser
 from whoosh.analysis import StemmingAnalyzer, LowercaseFilter, StopFilter
 from whoosh.qparser.dateparse import DateParserPlugin
 from whoosh import fields, index
 """
@@ -195,30 +197,38 @@ class Search:
        # is defined.
        schema = Schema(
-                id = ID(stored=True, unique=True),
+                id = fields.ID(stored=True, unique=True),
-                kind = ID(stored=True),
+                kind = fields.ID(stored=True),
-                created_time = ID(stored=True),
+                created_time = fields.DATETIME(stored=True),
-                modified_time = ID(stored=True),
+                modified_time = fields.DATETIME(stored=True),
-                indexed_time = ID(stored=True),
+                indexed_time = fields.DATETIME(stored=True),
-                title = TEXT(stored=True, field_boost=100.0),
+                title = fields.TEXT(stored=True, field_boost=100.0),
                url = ID(stored=True, unique=True),
-                mimetype=ID(stored=True),
+                url = fields.ID(stored=True),
                owner_email=ID(stored=True),
                owner_name=TEXT(stored=True),
-                repo_name=TEXT(stored=True),
+                mimetype = fields.TEXT(stored=True),
                repo_url=ID(stored=True),
-                github_user=TEXT(stored=True),
+                owner_email = fields.ID(stored=True),
                owner_name = fields.TEXT(stored=True),
                # mainly for email threads, groups.io, hypothesis
                group = fields.ID(stored=True),
                repo_name = fields.TEXT(stored=True),
                repo_url = fields.ID(stored=True),
                github_user = fields.TEXT(stored=True),
                tags = fields.KEYWORD(commas=True,
                                      stored=True,
                                      lowercase=True),
                # comments only
-                issue_title=TEXT(stored=True, field_boost=100.0),
+                issue_title = fields.TEXT(stored=True, field_boost=100.0),
-                issue_url=ID(stored=True),
+                issue_url = fields.ID(stored=True),
-                content=TEXT(stored=True, analyzer=stemming_analyzer)
+                content = fields.TEXT(stored=True, analyzer=stemming_analyzer)
        )
@@ -258,24 +268,32 @@ class Search:
            writer.delete_by_term('id',item['id'])
            # Index a plain google drive file
-            writer.add_document(
+            created_time = dateutil.parser.parse(item['createdTime'])
-                    id = item['id'],
+            modified_time = dateutil.parser.parse(item['modifiedTime'])
-                    kind = 'gdoc',
+            indexed_time = datetime.now().replace(microsecond=0)
-                    created_time = item['createdTime'],
+            try:
-                    modified_time = item['modifiedTime'],
+                writer.add_document(
-                    indexed_time = datetime.now().replace(microsecond=0).isoformat(),
+                        id = item['id'],
-                    title = item['name'],
+                        kind = 'gdoc',
-                    url = item['webViewLink'],
+                        created_time = created_time,
-                    mimetype = mimetype,
+                        modified_time = modified_time,
-                    owner_email = item['owners'][0]['emailAddress'],
+                        indexed_time = indexed_time,
-                    owner_name = item['owners'][0]['displayName'],
+                        title = item['name'],
-                    repo_name='',
+                        url = item['webViewLink'],
-                    repo_url='',
+                        mimetype = mimetype,
-                    github_user='',
+                        owner_email = item['owners'][0]['emailAddress'],
-                    issue_title='',
+                        owner_name = item['owners'][0]['displayName'],
-                    issue_url='',
+                        group='',
-                    content = content
+                        repo_name='',
-            )
+                        repo_url='',
                        github_user='',
                        issue_title='',
                        issue_url='',
                        content = content
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
        else:
@@ -329,7 +347,7 @@ class Search:
                )
                assert output == ""
            except RuntimeError:
-                print(" > XXXXXX Failed to index document \"%s\""%(item['name']))
+                print(" > XXXXXX Failed to index Google Drive document \"%s\""%(item['name']))
            # If export was successful, read contents of markdown
@@ -357,24 +375,33 @@ class Search:
            else:
                print(" > Creating a new record")
-            writer.add_document(
+            try:
-                    id = item['id'],
+                created_time = dateutil.parser.parse(item['createdTime'])
-                    kind = 'gdoc',
+                modified_time = dateutil.parser.parse(item['modifiedTime'])
-                    created_time = item['createdTime'],
+                indexed_time = datetime.now()
-                    modified_time = item['modifiedTime'],
+                writer.add_document(
-                    indexed_time = datetime.now().replace(microsecond=0).isoformat(),
+                        id = item['id'],
-                    title = item['name'],
+                        kind = 'gdoc',
-                    url = item['webViewLink'],
+                        created_time = created_time,
-                    mimetype = mimetype,
+                        modified_time = modified_time,
-                    owner_email = item['owners'][0]['emailAddress'],
+                        indexed_time = indexed_time,
-                    owner_name = item['owners'][0]['displayName'],
+                        title = item['name'],
-                    repo_name='',
+                        url = item['webViewLink'],
-                    repo_url='',
+                        mimetype = mimetype,
-                    github_user='',
+                        owner_email = item['owners'][0]['emailAddress'],
-                    issue_title='',
+                        owner_name = item['owners'][0]['displayName'],
-                    issue_url='',
+                        group='',
-                    content = content
+                        repo_name='',
-            )
+                        repo_url='',
                        github_user='',
                        issue_title='',
                        issue_url='',
                        content = content
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
@@ -408,31 +435,36 @@ class Search:
                issue_comment_content += comment.body.rstrip()
                issue_comment_content += "\n"
-        # Now create the actual search index record
+        # Now create the actual search index record.
        created_time = clean_timestamp(issue.created_at)
        modified_time = clean_timestamp(issue.updated_at)
        indexed_time = clean_timestamp(datetime.now())
        # Add one document per issue thread,
        # containing entire text of thread.
-        writer.add_document(
+
-                id = issue.html_url,
+        created_time = issue.created_at
-                kind = 'issue',
+        modified_time = issue.updated_at
-                created_time = created_time,
+        indexed_time = datetime.now()
-                modified_time = modified_time,
+        try:
-                indexed_time = indexed_time,
+            writer.add_document(
-                title = issue.title,
+                    id = issue.html_url,
-                url = issue.html_url,
+                    kind = 'issue',
-                mimetype='',
+                    created_time = created_time,
-                owner_email='',
+                    modified_time = modified_time,
-                owner_name='',
+                    indexed_time = indexed_time,
-                repo_name = repo_name,
+                    title = issue.title,
-                repo_url = repo_url,
+                    url = issue.html_url,
-                github_user = issue.user.login,
+                    mimetype='',
-                issue_title = issue.title,
+                    owner_email='',
-                issue_url = issue.html_url,
+                    owner_name='',
-                content = issue_comment_content
+                    group='',
-        )
+                    repo_name = repo_name,
                    repo_url = repo_url,
                    github_user = issue.user.login,
                    issue_title = issue.title,
                    issue_url = issue.html_url,
                    content = issue_comment_content
            )
        except ValueError as e:
            print(repr(e))
            print(" > XXXXXX Failed to index Github issue \"%s\""%(issue.title))
@@ -462,7 +494,8 @@ class Search:
            print(" > XXXXXXXX Failed to find file info.")
            return
-        indexed_time = clean_timestamp(datetime.now())
+
        indexed_time = datetime.now()
        if fext in MARKDOWN_EXTS:
            print("Indexing markdown doc %s from repo %s"%(fname,repo_name))
@@ -491,24 +524,31 @@ class Search:
            usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath)
            # Now create the actual search index record
-            writer.add_document(
+            try:
-                    id = fsha,
+                writer.add_document(
-                    kind = 'markdown',
+                        id = fsha,
-                    created_time = '',
+                        kind = 'markdown',
-                    modified_time = '',
+                        created_time = None,
-                    indexed_time = indexed_time,
+                        modified_time = None,
-                    title = fname,
+                        indexed_time = indexed_time,
-                    url = usable_url,
+                        title = fname,
-                    mimetype='',
+                        url = usable_url,
-                    owner_email='',
+                        mimetype='',
-                    owner_name='',
+                        owner_email='',
-                    repo_name = repo_name,
+                        owner_name='',
-                    repo_url = repo_url,
+                        group='',
-                    github_user = '',
+                        repo_name = repo_name,
-                    issue_title = '',
+                        repo_url = repo_url,
-                    issue_url = '',
+                        github_user = '',
-                    content = content
+                        issue_title = '',
-            )
+                        issue_url = '',
                        content = content
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Github markdown file \"%s\""%(fname))
        else:
            print("Indexing github file %s from repo %s"%(fname,repo_name))
@@ -516,24 +556,29 @@ class Search:
            key = fname+"_"+fsha
            # Now create the actual search index record
-            writer.add_document(
+            try:
-                    id = key,
+                writer.add_document(
-                    kind = 'ghfile',
+                        id = key,
-                    created_time = '',
+                        kind = 'ghfile',
-                    modified_time = '',
+                        created_time = None,
-                    indexed_time = indexed_time,
+                        modified_time = None,
-                    title = fname,
+                        indexed_time = indexed_time,
-                    url = repo_url,
+                        title = fname,
-                    mimetype='',
+                        url = repo_url,
-                    owner_email='',
+                        mimetype='',
-                    owner_name='',
+                        owner_email='',
-                    repo_name = repo_name,
+                        owner_name='',
-                    repo_url = repo_url,
+                        group='',
-                    github_user = '',
+                        repo_name = repo_name,
-                    issue_title = '',
+                        repo_url = repo_url,
-                    issue_url = '',
+                        github_user = '',
-                    content = ''
+                        issue_title = '',
-            )
+                        issue_url = '',
                        content = ''
                )
            except ValueError as e:
                print(repr(e))
                print(" > XXXXXX Failed to index Github file \"%s\""%(fname))
@@ -547,28 +592,42 @@ class Search:
        Use a Groups.io email thread record to add 
        an email thread to the search index.
        """
-        indexed_time = clean_timestamp(datetime.now())
+        if 'created_time' in d.keys() and d['created_time'] is not None:
            created_time = d['created_time']
        else:
            created_time = None
        if 'modified_time' in d.keys() and d['modified_time'] is not None:
            modified_time = d['modified_time']
        else:
            modified_time = None
        indexed_time = datetime.now()
        # Now create the actual search index record
-        writer.add_document(
+        try:
-                id = d['permalink'],
+            writer.add_document(
-                kind = 'emailthread',
+                    id = d['permalink'],
-                created_time = '',
+                    kind = 'emailthread',
-                modified_time = '',
+                    created_time = created_time,
-                indexed_time = indexed_time,
+                    modified_time = modified_time,
-                title = d['subject'],
+                    indexed_time = indexed_time,
-                url = d['permalink'],
+                    title = d['subject'],
-                mimetype='',
+                    url = d['permalink'],
-                owner_email='',
+                    mimetype='',
-                owner_name=d['original_sender'],
+                    owner_email='',
-                repo_name = '',
+                    owner_name=d['original_sender'],
-                repo_url = '',
+                    group=d['subgroup'],
-                github_user = '',
+                    repo_name = '',
-                issue_title = '',
+                    repo_url = '',
-                issue_url = '',
+                    github_user = '',
-                content = d['content']
+                    issue_title = '',
-        )
+                    issue_url = '',
-
+                    content = d['content']
            )
        except ValueError as e:
            print(repr(e))
            print(" > XXXXXX Failed to index Groups.io thread \"%s\""%(d['subject']))
    # ------------------------------
@@ -581,28 +640,33 @@ class Search:
        to add a disqus comment thread to the
        search index.
        """
-        indexed_time = clean_timestamp(datetime.now())
+        indexed_time = datetime.now()
        # created_time is already a timestamp
        # Now create the actual search index record
-        writer.add_document(
+        try:
-                id = d['id'],
+            writer.add_document(
-                kind = 'disqus',
+                    id = d['id'],
-                created_time = d['created_time'],
+                    kind = 'disqus',
-                modified_time = '',
+                    created_time = d['created_time'],
-                indexed_time = indexed_time,
+                    modified_time = None,
-                title = d['title'],
+                    indexed_time = indexed_time,
-                url = d['link'],
+                    title = d['title'],
-                mimetype='',
+                    url = d['link'],
-                owner_email='',
+                    mimetype='',
-                owner_name='',
+                    owner_email='',
-                repo_name = '',
+                    owner_name='',
-                repo_url = '',
+                    repo_name = '',
-                github_user = '',
+                    repo_url = '',
-                issue_title = '',
+                    github_user = '',
-                issue_url = '',
+                    issue_title = '',
-                content = d['content']
+                    issue_url = '',
-        )
+                    content = d['content']
-
+            )
        except ValueError as e:
            print(repr(e))
            print(" > XXXXXX Failed to index Disqus comment thread \"%s\""%(d['title']))
@@ -680,10 +744,10 @@ class Search:
                full_items[f['id']] = f
            ## Shorter:
-            #break
+            break
-            # Longer:
+            ## Longer:
-            if nextPageToken is None:
+            #if nextPageToken is None:
-                break
+            #    break
        writer = self.ix.writer()
@@ -691,34 +755,41 @@ class Search:
        temp_dir = tempfile.mkdtemp(dir=os.getcwd())
        print("Temporary directory: %s"%(temp_dir))
        try:
            # Drop any id in indexed_ids
            # not in remote_ids
            drop_ids = indexed_ids - remote_ids
            for drop_id in drop_ids:
                writer.delete_by_term('id',drop_id)
-        # Drop any id in indexed_ids
+            # Update any id in indexed_ids
-        # not in remote_ids
+            # and in remote_ids
-        drop_ids = indexed_ids - remote_ids
+            update_ids = indexed_ids & remote_ids
-        for drop_id in drop_ids:
+            for update_id in update_ids:
-            writer.delete_by_term('id',drop_id)
+                # cop out
                writer.delete_by_term('id',update_id)
                item = full_items[update_id]
                self.add_drive_file(writer, item, temp_dir, config, update=True)
                count += 1
-        # Update any id in indexed_ids
+            # Add any id not in indexed_ids
-        # and in remote_ids
+            # and in remote_ids
-        update_ids = indexed_ids & remote_ids
+            add_ids = remote_ids - indexed_ids
-        for update_id in update_ids:
+            for add_id in add_ids:
-            # cop out
+                item = full_items[add_id]
-            writer.delete_by_term('id',update_id)
+                self.add_drive_file(writer, item, temp_dir, config, update=False)
-            item = full_items[update_id]
+                count += 1
            self.add_drive_file(writer, item, temp_dir, config, update=True)
            count += 1
        # Add any id not in indexed_ids
        # and in remote_ids
        add_ids = remote_ids - indexed_ids
        for add_id in add_ids:
            item = full_items[add_id]
            self.add_drive_file(writer, item, temp_dir, config, update=False)
            count += 1
        except Exception as e:
            print("ERROR: While adding Google Drive files to search index")
            print("-"*40)
            print(repr(e))
            print("-"*40)
            print("Continuing...")
            pass
        print("Cleaning temporary directory: %s"%(temp_dir))
        subprocess.call(['rm','-fr',temp_dir])
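The drop/update/add bookkeeping in the hunk above is plain set arithmetic over the indexed and remote ids. A small sketch of that partition on its own, with a hypothetical helper name (`plan_sync`) and made-up ids; it only plans the work and does not touch a Whoosh writer.

```python
def plan_sync(indexed_ids, remote_ids):
    """Split ids into the three groups the sync loop walks over.

    drop   : in the index but no longer present remotely
    update : present in both places
    add    : present remotely but not yet indexed
    """
    indexed_ids = set(indexed_ids)
    remote_ids = set(remote_ids)
    return {
        "drop": indexed_ids - remote_ids,
        "update": indexed_ids & remote_ids,
        "add": remote_ids - indexed_ids,
    }


# Hypothetical ids, for illustration only
plan = plan_sync(indexed_ids={"a", "b", "c"}, remote_ids={"b", "c", "d"})
```

The three groups are disjoint by construction, so each document id is handled exactly once per sync pass.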
@@ -1176,7 +1247,7 @@ class Search:
        elif doctype=='issue':
            item_keys = ['title','repo_name','repo_url','url','created_time','modified_time']
        elif doctype=='emailthread':
-            item_keys = ['title','owner_name','url']
+            item_keys = ['title','owner_name','url','group','created_time','modified_time']
        elif doctype=='disqus':
            item_keys = ['title','created_time','url']
        elif doctype=='ghfile':
@@ -1195,11 +1266,7 @@ class Search:
            for r in results:
                d = {}
                for k in item_keys:
-                    if k=='created_time' or k=='modified_time':
-                        #d[k] = r[k]
-                        d[k] = dateutil.parser.parse(r[k]).strftime("%Y-%m-%d")
-                    else:
-                        d[k] = r[k]
+                    d[k] = r[k]
                json_results.append(d)
        return json_results
@@ -1212,13 +1279,16 @@ class Search:
            query_string = " ".join(query_list)
            query = None
            if ":" in query_string:
                #query = QueryParser("content", 
                #                    self.schema
                #).parse(query_string)
                query = QueryParser("content", 
                                    self.schema,
                                    termclass=query.Variations
-                ).parse(query_string)
+                )
                query.add_plugin(DateParserPlugin(free=True))
                query = query.parse(query_string)
            elif len(fields) == 1 and fields[0] == "filename":
                pass
            elif len(fields) == 2:
@@ -1226,9 +1296,12 @@ class Search:
            else:
                # If the user does not specify a field,
                # these are the fields that are actually searched
-                fields = ['title', 'content','owner_name','owner_email','url']
+                fields = ['title', 'content','owner_name','owner_email','url','created_date','modified_date']
            if not query:
-                query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
+                query = MultifieldParser(fields, schema=self.ix.schema)
                query.add_plugin(DateParserPlugin(free=True))
                query = query.parse(query_string)
                #query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string) 
            parsed_query = "%s" % query
            print("query: %s" % parsed_query)
            results = searcher.search(query, terms=False, scored=True, groupedby="kind")
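The default field list added in the last hunk names `created_date` and `modified_date`, while the schema earlier in this diff defines `created_time` and `modified_time`. A minimal sketch of a guard that would surface such a mismatch before a parser is built; the helper `unknown_fields` is hypothetical, and the schema names are copied by hand from the schema hunk rather than read from a live index.

```python
def unknown_fields(requested, schema_names):
    """Return the requested field names that the schema does not define."""
    known = set(schema_names)
    return [name for name in requested if name not in known]


# Field names as they appear in the Schema(...) hunk of this diff
schema_names = [
    "id", "kind", "created_time", "modified_time", "indexed_time",
    "title", "url", "mimetype", "owner_email", "owner_name", "group",
    "repo_name", "repo_url", "github_user", "tags",
    "issue_title", "issue_url", "content",
]

# Default search fields as added in the diff
default_fields = ["title", "content", "owner_name", "owner_email", "url",
                  "created_date", "modified_date"]

missing = unknown_fields(default_fields, schema_names)
```

With a real index, the list of names could presumably come from the schema object itself instead of a hand-written list.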
--- a/disqus_util.py
+++ b/disqus_util.py
@@ -1,6 +1,7 @@
 import os, re
 import requests
 import json
 import dateutil.parser
 from pprint import pprint
@@ -117,13 +118,14 @@ class DisqusCrawler(object):
                        link = response['link']
                        clean_link = re.sub('data-commons.us','nihdatacommons.us',link)
                        clean_link += "#disqus_comments"
                        # Finished working on thread.
                        # We need to make this value a dictionary
                        thread_info = dict(
                                id = response['id'],
-                                created_time = response['createdAt'],
+                                created_time = dateutil.parser.parse(response['createdAt']),
                                title = response['title'],
                                forum = response['forum'],
                                link = clean_link,
--- a/groupsio_util.py
+++ b/groupsio_util.py
@@ -1,5 +1,7 @@
 import requests, os, re
 from bs4 import BeautifulSoup
 import dateutil.parser
 import datetime
 class GroupsIOException(Exception):
    pass
@@ -64,7 +66,7 @@ class GroupsIOArchivesCrawler(object):
            ## Short circuit
            ## for debugging purposes
-            #break
+            break
        return subgroups
@@ -251,7 +253,7 @@ class GroupsIOArchivesCrawler(object):
            subject = soup.find('title').text
            # Extract information for the schema:
-            # - permalink for thread (done)
+            # - permalink for thread (done above)
            # - subject/title (done)
            # - original sender email/name (done)
            # - content (done)
@@ -266,11 +268,35 @@ class GroupsIOArchivesCrawler(object):
                    pass
                else:
                    # found an email!
-                    # this is a maze, thanks groups.io
+                    # this is a maze, not amazing.
                    # thanks groups.io!
                    td = tr.find('td')
-                    divrow = td.find('div',{'class':'row'}).find('div',{'class':'pull-left'})
+
                    sender_divrow = td.find('div',{'class':'row'})
                    sender_divrow = sender_divrow.find('div',{'class':'pull-left'})
                    if (i+1)==1:
-                        original_sender = divrow.text.strip()
+                        original_sender = sender_divrow.text.strip()
                    date_divrow = td.find('div',{'class':'row'})
                    date_divrow = date_divrow.find('div',{'class':'pull-right'})
                    date_divrow = date_divrow.find('font',{'class':'text-muted'})
                    date_divrow = date_divrow.find('script').text
                    try:
                        time_seconds = re.search(' [0-9]{1,} ',date_divrow).group(0)
                        time_seconds = time_seconds.strip()
                        # Thanks groups.io for the weird date formatting
                        # take the trailing digits before truncating to seconds
                        mmicro_seconds = time_seconds[10:]
                        time_seconds = time_seconds[:10]
                        if (i+1)==1:
                            created_time  = datetime.datetime.utcfromtimestamp(int(time_seconds))
                            modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
                        else:
                            modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
                    except AttributeError:
                        created_time = None
                        modified_time = None
                    for div in td.find_all('div'):
                        if div.has_attr('id'):
@@ -299,7 +325,10 @@ class GroupsIOArchivesCrawler(object):
            thread = {
                    'permalink' : permalink,
                    'created_time' : created_time,
                    'modified_time' : modified_time,
                    'subject' : subject,
                    'subgroup' : subgroup_name,
                    'original_sender' : original_sender,
                    'content' : full_content
            }
@@ -324,11 +353,13 @@ class GroupsIOArchivesCrawler(object):
        results = []
        for row in rows:
-            # We don't care about anything except title and ugly link
+            # This is where we extract
            # a list of thread titles 
            # and corresponding links.
            subject = row.find('span',{'class':'subject'})
            title = subject.get_text()
            link = row.find('a')['href']
-            #print(title)
+
            results.append((title,link))
        return results
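The date extraction added to the thread crawler pulls a long run of digits out of a `<script>` element and treats the first ten digits as epoch seconds. A self-contained sketch of that step, assuming the script text contains one such digit run; the sample string and the function name `parse_epoch_digits` are hypothetical, and the real groups.io markup may differ.

```python
import re
from datetime import datetime, timezone


def parse_epoch_digits(script_text):
    """Return a naive UTC datetime from the first run of 10+ digits, or None.

    Assumption: the first ten digits are whole seconds since the epoch and
    any remaining digits are sub-second precision, which is ignored here.
    """
    match = re.search(r"\d{10,}", script_text)
    if match is None:
        return None
    seconds = int(match.group(0)[:10])
    # Convert in UTC, then drop tzinfo to mirror what utcfromtimestamp() returns
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)


# Hypothetical script text, for illustration only
sample = "DisplayShortTime( 1535098694123456789 , false);"
created_time = parse_epoch_digits(sample)
```

Returning `None` when no digits are found matches the fallback the diff uses in its `except AttributeError` branch.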
--- a/static/centillion_white_beta.png
+++ b/static/centillion_white_beta.png
--- a/static/centillion_white_localhost.png
+++ b/static/centillion_white_localhost.png
--- a/static/master_list.js
+++ b/static/master_list.js
@@ -57,6 +57,25 @@ $(document).ready(function() {
 });
 //////////////////////////////////
 // utility functions
 // https://stackoverflow.com/a/25275808
 function iso8601(date) {
  var hours = date.getHours();
  var minutes = date.getMinutes();
  var ampm = hours >= 12 ? 'PM' : 'AM';
  hours = hours % 12;
  hours = hours ? hours : 12; // the hour '0' should be '12'
  minutes = minutes < 10 ? '0'+minutes : minutes;
  var strTime = hours + ':' + minutes + ' ' + ampm;
  return date.getFullYear() + "-" + (date.getMonth()+1) + "-" + date.getDate() + "  " + strTime;
 }
 // https://stackoverflow.com/a/7390612
 var toType = function(obj) {
  return ({}).toString.call(obj).match(/\s([a-zA-Z]+)/)[1].toLowerCase()
 }
 //////////////////////////////////
 // API-to-Table Functions
@@ -315,8 +334,10 @@ function load_emailthreads_table(){
                var r = new Array(), j = -1, size=result.length;
                r[++j] = '<thead>'
                r[++j] = '<tr class="header-row">';
-                r[++j] = '<th width="70%">Topic</th>';
+                r[++j] = '<th width="60%">Topic</th>';
-                r[++j] = '<th width="30%">Started By</th>';
+                r[++j] = '<th width="15%">Started By</th>';
                r[++j] = '<th width="15%">Date</th>';
                r[++j] = '<th width="10%">Mailing List</th>';
                r[++j] = '</tr>';
                r[++j] = '</thead>'
                r[++j] = '<tbody>'
@@ -327,6 +348,10 @@ function load_emailthreads_table(){
                    r[++j] = '</a>'
                    r[++j] = '</td><td>';
                    r[++j] = result[i]['owner_name'];
                    r[++j] = '</td><td>';
                    r[++j] = result[i]['created_time'];
                    r[++j] = '</td><td>';
                    r[++j] = result[i]['group'];
                    r[++j] = '</td></tr>';
                }
                r[++j] = '</tbody>'
--- a/static/style.css
+++ b/static/style.css
@@ -86,6 +86,14 @@ div.container {
 }
 /* badges for number of docs indexed */
 span.results-count {
    background-color: #555;
 }
 span.indexing-count {
    background-color: #337ab7;
 }
 span.badge {
    vertical-align: text-bottom;
 }
@@ -192,7 +200,7 @@ table {
 .info, .last-searches {
    color: gray;
-    font-size: 12px;
+    /*font-size: 12px;*/
    font-family: Arial, serif;
 }
@@ -202,27 +210,27 @@ table {
 div.tags a, td.tag-cloud a {
    color: #b56020;
-    font-size: 12px;
+    /*font-size: 12px;*/
 }
 td.tag-cloud, td.directories-cloud {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #555555;
 }
 td.directories-cloud a {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #377BA8;
 }
 div.path {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #666666;
    margin-bottom: 3px;
 }
 div.path a {
-    font-size: 12px;
+    /*font-size: 12px;*/
    margin-right: 5px;
 }
--- a/templates/banner.html
+++ b/templates/banner.html
@@ -7,11 +7,18 @@
        <div class="col12sm" id="banner-col">
            <center>
                <a id="banner-a" href="{{ url_for('search')}}?query=&fields=">
-                    <img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
+                    {% if 'betasearch' in request.url %}
                        <img id="banner-img" src="{{ url_for('static', filename='centillion_white_beta.png') }}">
                    {% elif 'localhost' in request.url %}
                        <img id="banner-img" src="{{ url_for('static', filename='centillion_white_localhost.png') }}">
                    {% else %}
                        <img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
                    {% endif %}
                </a>
            </center>
        </div>
    </div>
    {% if config['TAGLINE'] %}
    <div class="row" id="tagline-row">
        <div class="col12sm" id="tagline-col">
--- a/templates/flashed_messages.html
+++ b/templates/flashed_messages.html
@@ -5,7 +5,7 @@
            <div class="alert alert-success alert-dismissible fade in">
                <a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>
                    {% for message in messages %}
-                        <p class="lead">{{ message }}</p>
+                        <p>{{ message }}</p>
                    {% endfor %}
            </div>
        </div>
--- a/templates/search.html
+++ b/templates/search.html
@@ -52,8 +52,8 @@
                    <div class="container-fluid">
                        <div class="row">
                            <div class="col-xs-12 info">
-                                <b>Found:</b> <span class="badge">{{entries|length}}</span> results 
+                                <b>Found:</b> <span class="badge results-count">{{entries|length}}</span> results 
-                                out of <span class="badge">{{totals["total"]}}</span> total items indexed
+                                out of <span class="badge results-count">{{totals["total"]}}</span> total items indexed
                            </div>
                        </div>
                    </div>
@@ -67,32 +67,32 @@
                            <div class="col-xs-12 info">
                                <b>Indexing:</b>
-                                <span class="badge">{{totals["gdoc"]}}</span>
+                                <span class="badge indexing-count">{{totals["gdoc"]}}</span>
                                <a href="/master_list?doctype=gdoc#gdoc">
                                Google Drive files
                                </a>,
-                                <span class="badge">{{totals["issue"]}}</span>
+                                <span class="badge indexing-count">{{totals["issue"]}}</span>
                                <a href="/master_list?doctype=issue#issue">
                                Github issues
                                </a>,
-                                <span class="badge">{{totals["ghfile"]}}</span>
+                                <span class="badge indexing-count">{{totals["ghfile"]}}</span>
                                <a href="/master_list?doctype=ghfile#ghfile">
                                Github files
                                </a>,
-                                <span class="badge">{{totals["markdown"]}}</span>
+                                <span class="badge indexing-count">{{totals["markdown"]}}</span>
                                <a href="/master_list?doctype=markdown#markdown">
                                Github Markdown files
                                </a>,
-                                <span class="badge">{{totals["emailthread"]}}</span>
+                                <span class="badge indexing-count">{{totals["emailthread"]}}</span>
                                <a href="/master_list?doctype=emailthread#emailthread">
                                Groups.io email threads
                                </a>,
-                                <span class="badge">{{totals["disqus"]}}</span>
+                                <span class="badge indexing-count">{{totals["disqus"]}}</span>
                                <a href="/master_list?doctype=disqus#disqus">
                                Disqus comment threads
                                </a>
@@ -101,6 +101,25 @@
                </div>
            </li>
            {#
            # more options...
            #}
            <li  class="list-group-item">
                    <div class="container-fluid">
                        <div class="row">
                            <div class="col-xs-12 info">
                                <b>More Options <i class="fa fa-chevron-down"></i></b>
                            </div>
                        </div>
                </div>
            </li>
        </ul>
    </div>
 </div>
Author	SHA1	Message	Date
Charles Reid	8d0bf33f99	add beginnings of a drop-down panel where we can put advanced search and stumbleupon	2018-08-24 02:20:28 -07:00
Charles Reid	fdb3963ede	tack on the disqus comments anchor to disqus URLs	2018-08-24 02:01:34 -07:00
Chaz Reid	90379a69c5	Merge pull request #92 from dcppc/add-date-subgrp-emailthreads add string formatting for dates and add date/mailing list column to email threads master list	2018-08-24 01:58:29 -07:00
Charles Reid	0faca67c35	add string formatting for dates and add date/mailing list column to email threads master list closes #58	2018-08-24 01:56:19 -07:00
Chaz Reid	77b533b642	Merge pull request #86 from dcppc/disqus Add Disqus	2018-08-24 01:18:37 -07:00
Chaz Reid	ccf013e3c9	Merge pull request #85 from dcppc/add-coc-dotgithub Add Code of Conduct, Contributing, and PR template	2018-08-24 01:18:14 -07:00
Chaz Reid	e67db4f1ef	Merge pull request #89 from dcppc/fix-flashed-messages-font fix font used in flashed messages	2018-08-24 01:17:59 -07:00
Chaz Reid	b11a26a812	Merge pull request #91 from dcppc/merge-datetime-into-disqus Merge datetime into disqus	2018-08-24 01:14:24 -07:00
Charles Reid	55a74f7d98	Merge branch 'use-datetime' into merge-datetime-into-disqus * use-datetime: extract date and time from email threads pages add groups and tags to schema; update how we determine timestamps; handle exceptions when we add the document to the writer, rather than elsewhere move where exception is caught (exception was also incorrect.) switched created_time, modified_time, indexed_time over to DATETIME. added DateParserPlugin to query QueryParser. added time fields to those being searched by default. tests do not seem to be working.	2018-08-24 01:13:42 -07:00
Chaz Reid	ab76226b0c	Merge pull request #90 from dcppc/add-dates-and-subgroups-to-emails Add dates and subgroups to emails	2018-08-24 00:07:40 -07:00
Charles Reid	a4ebef6e6f	extract date and time from email threads pages	2018-08-24 00:04:35 -07:00
Charles Reid	bad50efa9b	add groups and tags to schema; update how we determine timestamps; handle exceptions when we add the document to the writer, rather than elsewhere	2018-08-24 00:03:23 -07:00
Charles Reid	629fc063db	move where exception is caught (exception was also incorrect.)	2018-08-24 00:01:26 -07:00
Charles Reid	4f41d8597f	fix font used in flashed messages	2018-08-23 19:05:16 -07:00
Charles Reid	3b0baa21de	switched created_time, modified_time, indexed_time over to DATETIME. added DateParserPlugin to query QueryParser. added time fields to those being searched by default. tests do not seem to be working.	2018-08-23 19:01:40 -07:00
Charles Reid	17b2d359bb	add contributing and code of conduct files	2018-08-23 11:03:48 -07:00
Charles Reid	62ca62274e	add github pull request template	2018-08-23 11:02:37 -07:00
Chaz Reid	501cae8329	Merge pull request #81 from dcppc/detect-beta-banner Add custom banners for beta/localhost centillion instances	2018-08-21 13:18:11 -07:00
Charles Reid	0543c3e89f	fix filename	2018-08-21 12:01:12 -07:00
Charles Reid	2191140232	Add custom banners for beta/localhost centillion instances	2018-08-21 11:58:19 -07:00
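
The timestamp handling in the GroupsIOArchivesCrawler hunk above (first message sets created_time, every message updates modified_time) can be sketched in isolation as follows. This is a minimal illustration, not the crawler's actual code: the helper name `thread_times` and its list-of-strings input are hypothetical, and it uses timezone-aware UTC datetimes rather than the naive `utcfromtimestamp` values seen in the diff.

```python
import datetime


def thread_times(epoch_seconds_list):
    """Return (created_time, modified_time) for one thread.

    Hypothetical helper: takes the per-message epoch-second strings in
    page order. The first parseable value becomes created_time and the
    last parseable value becomes modified_time.
    """
    created_time = None
    modified_time = None
    for time_seconds in epoch_seconds_list:
        try:
            stamp = datetime.datetime.fromtimestamp(
                int(time_seconds), tz=datetime.timezone.utc
            )
        except (TypeError, ValueError):
            # Skip values that are missing or not integer-like.
            continue
        if created_time is None:
            created_time = stamp
        modified_time = stamp
    return created_time, modified_time
```

If no value can be parsed, both results stay `None`, which matches the fallback in the `except AttributeError` branch of the diff.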