Compare commits: disqus...fix-output (27 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 1b2f9a2278 | |
| | 937708f5d8 | |
| | 4c3ee712bb | |
| | f5af965a33 | |
| | bce16d336d | |
| | 729514ac89 | |
| | 46ce070b09 | |
| | 891fa50868 | |
| | fdb3963ede | |
| | 90379a69c5 | |
| | 0faca67c35 | |
| | 77b533b642 | |
| | ccf013e3c9 | |
| | e67db4f1ef | |
| | b11a26a812 | |
| | 55a74f7d98 | |
| | ab76226b0c | |
| | a4ebef6e6f | |
| | bad50efa9b | |
| | 629fc063db | |
| | 4f41d8597f | |
| | 3b0baa21de | |
| | 17b2d359bb | |
| | 62ca62274e | |
| | 501cae8329 | |
| | 0543c3e89f | |
| | 2191140232 | |
.github/PULL_REQUEST_TEMPLATE.md (vendored, new file, 12 lines)
@@ -0,0 +1,12 @@
+Thanks for contributing to centillion!
+
+Please place an x between the brackets to indicate a yes answer
+to the questions below.
+
+- [ ] Is this pull request mergeable?
+- [ ] Has this been tested locally?
+- [ ] Does this pull request pass the tests?
+- [ ] Have new tests been added to cover any new code?
+- [ ] Was a spellchecker run on the source code and documentation after
+changes were made?
+
CODE_OF_CONDUCT.md (new file, 43 lines)
@@ -0,0 +1,43 @@
+# Code of Conduct
+
+## DCPPC Code of Conduct
+
+All members of the Commons are expected to agree with the following code
+of conduct. We will enforce this code as needed. We expect cooperation
+from all members to help ensure a safe environment for everybody.
+
+## The Quick Version
+
+The Consortium is dedicated to providing a harassment-free experience
+for everyone, regardless of gender, gender identity and expression, age,
+sexual orientation, disability, physical appearance, body size, race, or
+religion (or lack thereof). We do not tolerate harassment of Consortium
+members in any form. Sexual language and imagery is generally not
+appropriate for any venue, including meetings, presentations, or
+discussions.
+
+## The Less Quick Version
+
+Harassment includes offensive verbal comments related to gender, gender
+identity and expression, age, sexual orientation, disability, physical
+appearance, body size, race, religion, sexual images in public spaces,
+deliberate intimidation, stalking, following, harassing photography or
+recording, sustained disruption of talks or other events, inappropriate
+physical contact, and unwelcome sexual attention.
+
+Members asked to stop any harassing behavior are expected to comply
+immediately.
+
+If you are being harassed, notice that someone else is being harassed,
+or have any other concerns, please contact [Titus
+Brown](mailto:ctbrown@ucdavis.edu) immediately. If Titus is the cause of
+your concern, please contact [Vivien
+Bonazzi](mailto:bonazziv@mail.nih.gov).
+
+We expect members to follow these guidelines at any Consortium event.
+
+Original source and credit: <http://2012.jsconf.us/#/about> & The Ada
+Initiative. Please help by translating or improving:
+<http://github.com/leftlogic/confcodeofconduct.com>. This work is
+licensed under a Creative Commons Attribution 3.0 Unported License
+
CONTRIBUTING.md (new file, 21 lines)
@@ -0,0 +1,21 @@
+# Contributing to the DCPPC Internal Repository
+
+Hello, and thank you for wanting to contribute to the DCPPC Internal
+Repository!
+
+By contributing to this repository, you agree:
+
+1. To obey the [Code of Conduct](./CODE_OF_CONDUCT.md)
+2. To release all your contributions under the same terms as the
+   license itself: the [Creative Commons Zero](./LICENSE.md) (aka
+   Public Domain) license
+
+If you are OK with these two conditions, then we welcome both you and
+your contribution!
+
+If you have any questions about contributing, please [open an
+issue](https://github.com/dcppc/internal/issues/new) and Team Copper
+will lend a hand ASAP.
+
+Thank you for being here and for being a part of the DCPPC project.
+
@@ -267,7 +267,11 @@ def list_docs(doctype):
     if org['login']=='dcppc':
         # Business as usual
         search = Search(app.config["INDEX_DIR"])
-        return jsonify(search.get_list(doctype))
+        results_list = search.get_list(doctype)
+        for result in results_list:
+            ct = result['created_time']
+            result['created_time'] = datetime.strftime(ct,"%Y-%m-%d %I:%M %p")
+        return jsonify(results_list)

    # nope
    return render_template('403.html')
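For reference, the new strftime call above renders each stored datetime in a compact 12-hour format before it is serialized to JSON. A quick standalone sketch (the sample value is hypothetical):

    from datetime import datetime

    # Same format string as the list_docs() change above.
    ct = datetime(2018, 7, 24, 15, 30)
    print(datetime.strftime(ct, "%Y-%m-%d %I:%M %p"))  # -> 2018-07-24 03:30 PM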
@@ -24,6 +24,8 @@ import dateutil.parser

+from whoosh import query
 from whoosh.qparser import MultifieldParser, QueryParser
 from whoosh.analysis import StemmingAnalyzer, LowercaseFilter, StopFilter
+from whoosh.qparser.dateparse import DateParserPlugin
 from whoosh import fields, index


 """
@@ -195,30 +197,38 @@ class Search:
         # is defined.

         schema = Schema(
-                id = ID(stored=True, unique=True),
-                kind = ID(stored=True),
+                id = fields.ID(stored=True, unique=True),
+                kind = fields.ID(stored=True),

-                created_time = ID(stored=True),
-                modified_time = ID(stored=True),
-                indexed_time = ID(stored=True),
+                created_time = fields.DATETIME(stored=True),
+                modified_time = fields.DATETIME(stored=True),
+                indexed_time = fields.DATETIME(stored=True),

-                title = TEXT(stored=True, field_boost=100.0),
-                url = ID(stored=True, unique=True),
-
-                mimetype=ID(stored=True),
-                owner_email=ID(stored=True),
-                owner_name=TEXT(stored=True),
-
-                repo_name=TEXT(stored=True),
-                repo_url=ID(stored=True),
-
-                github_user=TEXT(stored=True),
+                title = fields.TEXT(stored=True, field_boost=100.0),
+
+                url = fields.ID(stored=True),
+
+                mimetype = fields.TEXT(stored=True),
+
+                owner_email = fields.ID(stored=True),
+                owner_name = fields.TEXT(stored=True),
+
+                # mainly for email threads, groups.io, hypothesis
+                group = fields.ID(stored=True),
+
+                repo_name = fields.TEXT(stored=True),
+                repo_url = fields.ID(stored=True),
+                github_user = fields.TEXT(stored=True),
+
+                tags = fields.KEYWORD(commas=True,
+                                      stored=True,
+                                      lowercase=True),

                 # comments only
-                issue_title=TEXT(stored=True, field_boost=100.0),
-                issue_url=ID(stored=True),
+                issue_title = fields.TEXT(stored=True, field_boost=100.0),
+                issue_url = fields.ID(stored=True),

-                content=TEXT(stored=True, analyzer=stemming_analyzer)
+                content = fields.TEXT(stored=True, analyzer=stemming_analyzer)
         )

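The practical effect of moving the time fields from ID to fields.DATETIME is that the index now stores real datetime objects instead of opaque strings. A minimal sketch of the new contract, assuming a hypothetical throwaway index directory tmp_index (not part of this changeset):

    import os
    from datetime import datetime

    from whoosh import fields, index
    from whoosh.fields import Schema

    # Hypothetical miniature schema mirroring the change above.
    schema = Schema(id=fields.ID(stored=True, unique=True),
                    created_time=fields.DATETIME(stored=True))

    # Throwaway index directory (hypothetical name).
    os.makedirs("tmp_index", exist_ok=True)
    ix = index.create_in("tmp_index", schema)

    writer = ix.writer()
    # DATETIME fields take real datetime objects, not ISO strings,
    # hence the switch to dateutil.parser.parse() in the indexing code below.
    writer.add_document(id=u"abc123", created_time=datetime(2018, 7, 24))
    writer.commit()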
@@ -258,24 +268,32 @@ class Search:
             writer.delete_by_term('id',item['id'])

             # Index a plain google drive file
-            writer.add_document(
-                    id = item['id'],
-                    kind = 'gdoc',
-                    created_time = item['createdTime'],
-                    modified_time = item['modifiedTime'],
-                    indexed_time = datetime.now().replace(microsecond=0).isoformat(),
-                    title = item['name'],
-                    url = item['webViewLink'],
-                    mimetype = mimetype,
-                    owner_email = item['owners'][0]['emailAddress'],
-                    owner_name = item['owners'][0]['displayName'],
-                    repo_name='',
-                    repo_url='',
-                    github_user='',
-                    issue_title='',
-                    issue_url='',
-                    content = content
-            )
+            created_time = dateutil.parser.parse(item['createdTime'])
+            modified_time = dateutil.parser.parse(item['modifiedTime'])
+            indexed_time = datetime.now().replace(microsecond=0)
+            try:
+                writer.add_document(
+                        id = item['id'],
+                        kind = 'gdoc',
+                        created_time = created_time,
+                        modified_time = modified_time,
+                        indexed_time = indexed_time,
+                        title = item['name'],
+                        url = item['webViewLink'],
+                        mimetype = mimetype,
+                        owner_email = item['owners'][0]['emailAddress'],
+                        owner_name = item['owners'][0]['displayName'],
+                        group='',
+                        repo_name='',
+                        repo_url='',
+                        github_user='',
+                        issue_title='',
+                        issue_url='',
+                        content = content
+                )
+            except ValueError as e:
+                print(repr(e))
+                print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))


         else:
@@ -329,7 +347,7 @@ class Search:
                 )
                 assert output == ""
             except RuntimeError:
-                print(" > XXXXXX Failed to index document \"%s\""%(item['name']))
+                print(" > XXXXXX Failed to index Google Drive document \"%s\""%(item['name']))


             # If export was successful, read contents of markdown
@@ -357,24 +375,33 @@ class Search:
             else:
                 print(" > Creating a new record")

-                writer.add_document(
-                        id = item['id'],
-                        kind = 'gdoc',
-                        created_time = item['createdTime'],
-                        modified_time = item['modifiedTime'],
-                        indexed_time = datetime.now().replace(microsecond=0).isoformat(),
-                        title = item['name'],
-                        url = item['webViewLink'],
-                        mimetype = mimetype,
-                        owner_email = item['owners'][0]['emailAddress'],
-                        owner_name = item['owners'][0]['displayName'],
-                        repo_name='',
-                        repo_url='',
-                        github_user='',
-                        issue_title='',
-                        issue_url='',
-                        content = content
-                )
+                try:
+                    created_time = dateutil.parser.parse(item['createdTime'])
+                    modified_time = dateutil.parser.parse(item['modifiedTime'])
+                    indexed_time = datetime.now()
+                    writer.add_document(
+                            id = item['id'],
+                            kind = 'gdoc',
+                            created_time = created_time,
+                            modified_time = modified_time,
+                            indexed_time = indexed_time,
+                            title = item['name'],
+                            url = item['webViewLink'],
+                            mimetype = mimetype,
+                            owner_email = item['owners'][0]['emailAddress'],
+                            owner_name = item['owners'][0]['displayName'],
+                            group='',
+                            repo_name='',
+                            repo_url='',
+                            github_user='',
+                            issue_title='',
+                            issue_url='',
+                            content = content
+                    )
+                except ValueError as e:
+                    print(repr(e))
+                    print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))


@@ -408,31 +435,36 @@ class Search:
                 issue_comment_content += comment.body.rstrip()
                 issue_comment_content += "\n"

-            # Now create the actual search index record
-            created_time = clean_timestamp(issue.created_at)
-            modified_time = clean_timestamp(issue.updated_at)
-            indexed_time = clean_timestamp(datetime.now())
-
-            writer.add_document(
-                    id = issue.html_url,
-                    kind = 'issue',
-                    created_time = created_time,
-                    modified_time = modified_time,
-                    indexed_time = indexed_time,
-                    title = issue.title,
-                    url = issue.html_url,
-                    mimetype='',
-                    owner_email='',
-                    owner_name='',
-                    repo_name = repo_name,
-                    repo_url = repo_url,
-                    github_user = issue.user.login,
-                    issue_title = issue.title,
-                    issue_url = issue.html_url,
-                    content = issue_comment_content
-            )
+            # Now create the actual search index record.
+            # Add one document per issue thread,
+            # containing entire text of thread.
+            created_time = issue.created_at
+            modified_time = issue.updated_at
+            indexed_time = datetime.now()
+            try:
+                writer.add_document(
+                        id = issue.html_url,
+                        kind = 'issue',
+                        created_time = created_time,
+                        modified_time = modified_time,
+                        indexed_time = indexed_time,
+                        title = issue.title,
+                        url = issue.html_url,
+                        mimetype='',
+                        owner_email='',
+                        owner_name='',
+                        group='',
+                        repo_name = repo_name,
+                        repo_url = repo_url,
+                        github_user = issue.user.login,
+                        issue_title = issue.title,
+                        issue_url = issue.html_url,
+                        content = issue_comment_content
+                )
+            except ValueError as e:
+                print(repr(e))
+                print(" > XXXXXX Failed to index Github issue \"%s\""%(issue.title))


@@ -462,7 +494,8 @@ class Search:
             print(" > XXXXXXXX Failed to find file info.")
             return

-        indexed_time = clean_timestamp(datetime.now())
+
+        indexed_time = datetime.now()

         if fext in MARKDOWN_EXTS:
             print("Indexing markdown doc %s from repo %s"%(fname,repo_name))
@@ -491,24 +524,31 @@ class Search:
             usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath)

             # Now create the actual search index record
-            writer.add_document(
-                    id = fsha,
-                    kind = 'markdown',
-                    created_time = '',
-                    modified_time = '',
-                    indexed_time = indexed_time,
-                    title = fname,
-                    url = usable_url,
-                    mimetype='',
-                    owner_email='',
-                    owner_name='',
-                    repo_name = repo_name,
-                    repo_url = repo_url,
-                    github_user = '',
-                    issue_title = '',
-                    issue_url = '',
-                    content = content
-            )
+            try:
+                writer.add_document(
+                        id = fsha,
+                        kind = 'markdown',
+                        created_time = None,
+                        modified_time = None,
+                        indexed_time = indexed_time,
+                        title = fname,
+                        url = usable_url,
+                        mimetype='',
+                        owner_email='',
+                        owner_name='',
+                        group='',
+                        repo_name = repo_name,
+                        repo_url = repo_url,
+                        github_user = '',
+                        issue_title = '',
+                        issue_url = '',
+                        content = content
+                )
+            except ValueError as e:
+                print(repr(e))
+                print(" > XXXXXX Failed to index Github markdown file \"%s\""%(fname))


         else:
             print("Indexing github file %s from repo %s"%(fname,repo_name))
@@ -516,24 +556,29 @@ class Search:
             key = fname+"_"+fsha

             # Now create the actual search index record
-            writer.add_document(
-                    id = key,
-                    kind = 'ghfile',
-                    created_time = '',
-                    modified_time = '',
-                    indexed_time = indexed_time,
-                    title = fname,
-                    url = repo_url,
-                    mimetype='',
-                    owner_email='',
-                    owner_name='',
-                    repo_name = repo_name,
-                    repo_url = repo_url,
-                    github_user = '',
-                    issue_title = '',
-                    issue_url = '',
-                    content = ''
-            )
+            try:
+                writer.add_document(
+                        id = key,
+                        kind = 'ghfile',
+                        created_time = None,
+                        modified_time = None,
+                        indexed_time = indexed_time,
+                        title = fname,
+                        url = repo_url,
+                        mimetype='',
+                        owner_email='',
+                        owner_name='',
+                        group='',
+                        repo_name = repo_name,
+                        repo_url = repo_url,
+                        github_user = '',
+                        issue_title = '',
+                        issue_url = '',
+                        content = ''
+                )
+            except ValueError as e:
+                print(repr(e))
+                print(" > XXXXXX Failed to index Github file \"%s\""%(fname))


@@ -547,28 +592,42 @@ class Search:
        Use a Groups.io email thread record to add
        an email thread to the search index.
        """
-       indexed_time = clean_timestamp(datetime.now())
+       if 'created_time' in d.keys() and d['created_time'] is not None:
+           created_time = d['created_time']
+       else:
+           created_time = None
+
+       if 'modified_time' in d.keys() and d['modified_time'] is not None:
+           modified_time = d['modified_time']
+       else:
+           modified_time = None
+
+       indexed_time = datetime.now()

        # Now create the actual search index record
-       writer.add_document(
-               id = d['permalink'],
-               kind = 'emailthread',
-               created_time = '',
-               modified_time = '',
-               indexed_time = indexed_time,
-               title = d['subject'],
-               url = d['permalink'],
-               mimetype='',
-               owner_email='',
-               owner_name=d['original_sender'],
-               repo_name = '',
-               repo_url = '',
-               github_user = '',
-               issue_title = '',
-               issue_url = '',
-               content = d['content']
-       )
+       try:
+           writer.add_document(
+                   id = d['permalink'],
+                   kind = 'emailthread',
+                   created_time = created_time,
+                   modified_time = modified_time,
+                   indexed_time = indexed_time,
+                   title = d['subject'],
+                   url = d['permalink'],
+                   mimetype='',
+                   owner_email='',
+                   owner_name=d['original_sender'],
+                   group=d['subgroup'],
+                   repo_name = '',
+                   repo_url = '',
+                   github_user = '',
+                   issue_title = '',
+                   issue_url = '',
+                   content = d['content']
+           )
+       except ValueError as e:
+           print(repr(e))
+           print(" > XXXXXX Failed to index Groups.io thread \"%s\""%(d['subject']))


        # ------------------------------
@@ -581,28 +640,33 @@ class Search:
        to add a disqus comment thread to the
        search index.
        """
-       indexed_time = clean_timestamp(datetime.now())
+       indexed_time = datetime.now()
+
+       # created_time is already a timestamp

        # Now create the actual search index record
-       writer.add_document(
-               id = d['id'],
-               kind = 'disqus',
-               created_time = d['created_time'],
-               modified_time = '',
-               indexed_time = indexed_time,
-               title = d['title'],
-               url = d['link'],
-               mimetype='',
-               owner_email='',
-               owner_name='',
-               repo_name = '',
-               repo_url = '',
-               github_user = '',
-               issue_title = '',
-               issue_url = '',
-               content = d['content']
-       )
+       try:
+           writer.add_document(
+                   id = d['id'],
+                   kind = 'disqus',
+                   created_time = d['created_time'],
+                   modified_time = None,
+                   indexed_time = indexed_time,
+                   title = d['title'],
+                   url = d['link'],
+                   mimetype='',
+                   owner_email='',
+                   owner_name='',
+                   repo_name = '',
+                   repo_url = '',
+                   github_user = '',
+                   issue_title = '',
+                   issue_url = '',
+                   content = d['content']
+           )
+       except ValueError as e:
+           print(repr(e))
+           print(" > XXXXXX Failed to index Disqus comment thread \"%s\""%(d['title']))


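Continuing the miniature index from the earlier sketch: every writer.add_document() call in this changeset is now wrapped in try/except ValueError so that one bad record is skipped rather than aborting the whole crawl. Assuming Whoosh rejects a value it cannot interpret as a datetime for a DATETIME field with a ValueError, as these except clauses anticipate:

    # A value that cannot be interpreted as a datetime makes
    # add_document() raise, so the crawler logs the record and
    # moves on instead of aborting the indexing run.
    writer = ix.writer()
    try:
        writer.add_document(id=u"bad-record", created_time="not-a-date")
        writer.commit()
    except ValueError as e:
        print(repr(e))
        print(" > XXXXXX Failed to index record \"bad-record\"")
        writer.cancel()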
@@ -681,7 +745,7 @@ class Search:

                ## Shorter:
                #break
-               # Longer:
+               ## Longer:
                if nextPageToken is None:
                    break

@@ -691,40 +755,47 @@ class Search:
        temp_dir = tempfile.mkdtemp(dir=os.getcwd())
        print("Temporary directory: %s"%(temp_dir))

-       # Drop any id in indexed_ids
-       # not in remote_ids
-       drop_ids = indexed_ids - remote_ids
-       for drop_id in drop_ids:
-           writer.delete_by_term('id',drop_id)
-
-       # Update any id in indexed_ids
-       # and in remote_ids
-       update_ids = indexed_ids & remote_ids
-       for update_id in update_ids:
-           # cop out
-           writer.delete_by_term('id',update_id)
-           item = full_items[update_id]
-           self.add_drive_file(writer, item, temp_dir, config, update=True)
-           count += 1
-
-       # Add any id not in indexed_ids
-       # and in remote_ids
-       add_ids = remote_ids - indexed_ids
-       for add_id in add_ids:
-           item = full_items[add_id]
-           self.add_drive_file(writer, item, temp_dir, config, update=False)
-           count += 1
+       try:
+
+           # Drop any id in indexed_ids
+           # not in remote_ids
+           drop_ids = indexed_ids - remote_ids
+           for drop_id in drop_ids:
+               writer.delete_by_term('id',drop_id)
+
+           # Update any id in indexed_ids
+           # and in remote_ids
+           update_ids = indexed_ids & remote_ids
+           for update_id in update_ids:
+               # cop out
+               writer.delete_by_term('id',update_id)
+               item = full_items[update_id]
+               self.add_drive_file(writer, item, temp_dir, config, update=True)
+               count += 1
+
+           # Add any id not in indexed_ids
+           # and in remote_ids
+           add_ids = remote_ids - indexed_ids
+           for add_id in add_ids:
+               item = full_items[add_id]
+               self.add_drive_file(writer, item, temp_dir, config, update=False)
+               count += 1
+
+       except Exception as e:
+           print("ERROR: While adding Google Drive files to search index")
+           print("-"*40)
+           print(repr(e))
+           print("-"*40)
+           print("Continuing...")
+           pass

        print("Cleaning temporary directory: %s"%(temp_dir))
        subprocess.call(['rm','-fr',temp_dir])

        writer.commit()
-       print("Done, updated %d documents in the index" % count)
+       print("Done, updated %d Google Drive files in the index" % count)


        # ------------------------------
@@ -802,7 +873,7 @@ class Search:


        writer.commit()
-       print("Done, updated %d documents in the index" % count)
+       print("Done, updated %d Github issues in the index" % count)


@@ -1176,7 +1247,7 @@ class Search:
        elif doctype=='issue':
            item_keys = ['title','repo_name','repo_url','url','created_time','modified_time']
        elif doctype=='emailthread':
-           item_keys = ['title','owner_name','url']
+           item_keys = ['title','owner_name','url','group','created_time','modified_time']
        elif doctype=='disqus':
            item_keys = ['title','created_time','url']
        elif doctype=='ghfile':
@@ -1195,11 +1266,7 @@ class Search:
        for r in results:
            d = {}
            for k in item_keys:
-               if k=='created_time' or k=='modified_time':
-                   #d[k] = r[k]
-                   d[k] = dateutil.parser.parse(r[k]).strftime("%Y-%m-%d")
-               else:
-                   d[k] = r[k]
+               d[k] = r[k]
            json_results.append(d)

        return json_results
@@ -1212,13 +1279,16 @@ class Search:
            query_string = " ".join(query_list)
            query = None
            if ":" in query_string:
+
                #query = QueryParser("content",
                #                    self.schema
                #).parse(query_string)
                query = QueryParser("content",
                                    self.schema,
                                    termclass=query.Variations
-               ).parse(query_string)
+               )
+               query.add_plugin(DateParserPlugin(free=True))
+               query = query.parse(query_string)
            elif len(fields) == 1 and fields[0] == "filename":
                pass
            elif len(fields) == 2:
@@ -1226,9 +1296,12 @@ class Search:
            else:
                # If the user does not specify a field,
                # these are the fields that are actually searched
-               fields = ['title', 'content','owner_name','owner_email','url']
+               fields = ['title', 'content','owner_name','owner_email','url','created_date','modified_date']
            if not query:
-               query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
+               query = MultifieldParser(fields, schema=self.ix.schema)
+               query.add_plugin(DateParserPlugin(free=True))
+               query = query.parse(query_string)
+               #query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
            parsed_query = "%s" % query
            print("query: %s" % parsed_query)
            results = searcher.search(query, terms=False, scored=True, groupedby="kind")
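Both parser changes attach a DateParserPlugin before parsing. A small sketch of what that enables, using a hypothetical two-field schema (only the field names matter for parsing); free=True lets a date term appear directly in the query text:

    from whoosh.fields import Schema, TEXT, DATETIME
    from whoosh.qparser import QueryParser
    from whoosh.qparser.dateparse import DateParserPlugin

    # Hypothetical schema standing in for the real one above.
    schema = Schema(content=TEXT, created_time=DATETIME)

    parser = QueryParser("content", schema)
    # free=True allows queries like "created_time:2018-07-24"
    # or even plain date words mixed into the query string.
    parser.add_plugin(DateParserPlugin(free=True))

    q = parser.parse(u"centillion created_time:2018")
    print(q)  # a query tree containing a date-range term for created_time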
@@ -1,20 +1,38 @@
+######################################
+# github oauth
+GITHUB_OAUTH_CLIENT_ID = "XXX"
+GITHUB_OAUTH_CLIENT_SECRET = "YYY"
+
+######################################
+# github access token
+GITHUB_TOKEN = "XXX"
+
+######################################
+# groups.io
+GROUPSIO_TOKEN = "XXXXX"
+GROUPSIO_USERNAME = "XXXXX"
+GROUPSIO_PASSWORD = "XXXXX"
+
+######################################
+# Disqus API public key
+DISQUS_TOKEN = "XXXXX"
+
+######################################
+# everything else
+
 # Location of index file
 INDEX_DIR = "search_index"

-# oauth client deets
-GITHUB_OAUTH_CLIENT_ID = "XXX"
-GITHUB_OAUTH_CLIENT_SECRET = "YYY"
-GITHUB_TOKEN = "ZZZ"
-
 # More information footer: Repository label
-FOOTER_REPO_ORG = "charlesreid1"
+FOOTER_REPO_ORG = "dcppc"
 FOOTER_REPO_NAME = "centillion"

 # Toggle to show Whoosh parsed query
 SHOW_PARSED_QUERY=True

-TAGLINE = "Search All The Things"
+TAGLINE = "Search the Data Commons"

 # Flask settings
 DEBUG = True
-SECRET_KEY = 'WWWWW'
+SECRET_KEY = 'XXXXX'

@@ -1,6 +1,7 @@
 import os, re
 import requests
 import json
+import dateutil.parser

 from pprint import pprint

@@ -117,13 +118,14 @@ class DisqusCrawler(object):

                    link = response['link']
                    clean_link = re.sub('data-commons.us','nihdatacommons.us',link)
                    clean_link += "#disqus_comments"

                    # Finished working on thread.

+                   # We need to make this value a dictionary
                    thread_info = dict(
                            id = response['id'],
-                           created_time = response['createdAt'],
+                           created_time = dateutil.parser.parse(response['createdAt']),
                            title = response['title'],
                            forum = response['forum'],
                            link = clean_link,
@@ -1,5 +1,7 @@
 import requests, os, re
 from bs4 import BeautifulSoup
+import dateutil.parser
+import datetime

 class GroupsIOException(Exception):
     pass
@@ -251,7 +253,7 @@ class GroupsIOArchivesCrawler(object):
        subject = soup.find('title').text

        # Extract information for the schema:
-       # - permalink for thread (done)
+       # - permalink for thread (done above)
        # - subject/title (done)
        # - original sender email/name (done)
        # - content (done)
@@ -266,11 +268,35 @@ class GroupsIOArchivesCrawler(object):
                pass
            else:
                # found an email!
-               # this is a maze, thanks groups.io
+               # this is a maze, not amazing.
+               # thanks groups.io!
                td = tr.find('td')
-               divrow = td.find('div',{'class':'row'}).find('div',{'class':'pull-left'})
+
+               sender_divrow = td.find('div',{'class':'row'})
+               sender_divrow = sender_divrow.find('div',{'class':'pull-left'})
                if (i+1)==1:
-                   original_sender = divrow.text.strip()
+                   original_sender = sender_divrow.text.strip()
+
+               date_divrow = td.find('div',{'class':'row'})
+               date_divrow = date_divrow.find('div',{'class':'pull-right'})
+               date_divrow = date_divrow.find('font',{'class':'text-muted'})
+               date_divrow = date_divrow.find('script').text
+               try:
+                   time_seconds = re.search(' [0-9]{1,} ',date_divrow).group(0)
+                   time_seconds = time_seconds.strip()
+                   # Thanks groups.io for the weird date formatting
+                   mmicro_seconds = time_seconds[10:]
+                   time_seconds = time_seconds[:10]
+                   if (i+1)==1:
+                       created_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
+                       modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
+                   else:
+                       modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
+
+               except AttributeError:
+                   created_time = None
+                   modified_time = None
+
                for div in td.find_all('div'):
                    if div.has_attr('id'):
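The regex above pulls an epoch timestamp, in milliseconds, out of the embedded script text; the crawler keeps the first ten digits (whole seconds) and converts them to a UTC datetime, with the leftover digits being the millisecond fraction. A standalone sketch with a hypothetical scraped value:

    import datetime

    # Hypothetical value matching the ' [0-9]{1,} ' regex above:
    # thirteen digits, i.e. epoch milliseconds.
    raw = " 1532458200000 "
    seconds = raw.strip()[:10]       # "1532458200" (whole seconds)
    millis = raw.strip()[10:]        # "000" (millisecond remainder)
    print(datetime.datetime.utcfromtimestamp(int(seconds)))
    # -> 2018-07-24 18:50:00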
@@ -299,7 +325,10 @@ class GroupsIOArchivesCrawler(object):

        thread = {
                'permalink' : permalink,
+               'created_time' : created_time,
+               'modified_time' : modified_time,
                'subject' : subject,
+               'subgroup' : subgroup_name,
                'original_sender' : original_sender,
                'content' : full_content
        }
@@ -324,11 +353,13 @@ class GroupsIOArchivesCrawler(object):

        results = []
        for row in rows:
-           # We don't care about anything except title and ugly link
+           # This is where we extract
+           # a list of thread titles
+           # and corresponding links.
            subject = row.find('span',{'class':'subject'})
            title = subject.get_text()
            link = row.find('a')['href']
-           #print(title)
+

            results.append((title,link))

        return results
BIN static/centillion_white_beta.png (new file, 29 KiB; binary file not shown)
BIN static/centillion_white_localhost.png (new file, 30 KiB; binary file not shown)
@@ -57,6 +57,25 @@ $(document).ready(function() {
});


+//////////////////////////////////
+// utility functions
+
+// https://stackoverflow.com/a/25275808
+function iso8601(date) {
+    var hours = date.getHours();
+    var minutes = date.getMinutes();
+    var ampm = hours >= 12 ? 'PM' : 'AM';
+    hours = hours % 12;
+    hours = hours ? hours : 12; // the hour '0' should be '12'
+    minutes = minutes < 10 ? '0'+minutes : minutes;
+    var strTime = hours + ':' + minutes + ' ' + ampm;
+    return date.getFullYear() + "-" + (date.getMonth()+1) + "-" + date.getDate() + " " + strTime;
+}
+
+// https://stackoverflow.com/a/7390612
+var toType = function(obj) {
+    return ({}).toString.call(obj).match(/\s([a-zA-Z]+)/)[1].toLowerCase()
+}
+
//////////////////////////////////
// API-to-Table Functions
@@ -315,8 +334,10 @@ function load_emailthreads_table(){
            var r = new Array(), j = -1, size=result.length;
            r[++j] = '<thead>'
            r[++j] = '<tr class="header-row">';
-           r[++j] = '<th width="70%">Topic</th>';
-           r[++j] = '<th width="30%">Started By</th>';
+           r[++j] = '<th width="60%">Topic</th>';
+           r[++j] = '<th width="15%">Started By</th>';
+           r[++j] = '<th width="15%">Date</th>';
+           r[++j] = '<th width="10%">Mailing List</th>';
            r[++j] = '</tr>';
            r[++j] = '</thead>'
            r[++j] = '<tbody>'
@@ -327,6 +348,10 @@ function load_emailthreads_table(){
                r[++j] = '</a>'
                r[++j] = '</td><td>';
                r[++j] = result[i]['owner_name'];
+               r[++j] = '</td><td>';
+               r[++j] = result[i]['created_time'];
+               r[++j] = '</td><td>';
+               r[++j] = result[i]['group'];
                r[++j] = '</td></tr>';
            }
            r[++j] = '</tbody>'
@@ -58,7 +58,7 @@ button#feedback {
/* search results table */
td#search-results-score-col,
td#search-results-type-col {
-    width: 100px;
+    width: 90px;
}

div.container {
@@ -86,6 +86,14 @@ div.container {
}

+/* badges for number of docs indexed */
+span.results-count {
+    background-color: #555;
+}
+
+span.indexing-count {
+    background-color: #337ab7;
+}
+
span.badge {
    vertical-align: text-bottom;
}
@@ -126,7 +134,7 @@ li.search-group-item {
}

div.url {
-    background-color: rgba(86,61,124,.15);
+    background-color: rgba(40,40,60,.15);
    padding: 8px;
}

@@ -192,7 +200,7 @@ table {

.info, .last-searches {
    color: gray;
-    font-size: 12px;
+    /*font-size: 12px;*/
    font-family: Arial, serif;
}

@@ -202,27 +210,27 @@ table {

div.tags a, td.tag-cloud a {
    color: #b56020;
-    font-size: 12px;
+    /*font-size: 12px;*/
}

td.tag-cloud, td.directories-cloud {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #555555;
}

td.directories-cloud a {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #377BA8;
}

div.path {
-    font-size: 12px;
+    /*font-size: 12px;*/
    color: #666666;
    margin-bottom: 3px;
}

div.path a {
-    font-size: 12px;
+    /*font-size: 12px;*/
    margin-right: 5px;
}

@@ -7,11 +7,18 @@
      <div class="col12sm" id="banner-col">
        <center>
          <a id="banner-a" href="{{ url_for('search')}}?query=&fields=">
-           <img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
+           {% if 'betasearch' in request.url %}
+               <img id="banner-img" src="{{ url_for('static', filename='centillion_white_beta.png') }}">
+           {% elif 'localhost' in request.url %}
+               <img id="banner-img" src="{{ url_for('static', filename='centillion_white_localhost.png') }}">
+           {% else %}
+               <img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
+           {% endif %}
          </a>
        </center>
      </div>
    </div>

    {% if config['TAGLINE'] %}
    <div class="row" id="tagline-row">
      <div class="col12sm" id="tagline-col">
@@ -5,7 +5,7 @@
    <div class="alert alert-success alert-dismissible fade in">
      <a href="#" class="close" data-dismiss="alert" aria-label="close">×</a>
      {% for message in messages %}
-         <p class="lead">{{ message }}</p>
+         <p>{{ message }}</p>
      {% endfor %}
    </div>
  </div>
@@ -52,8 +52,8 @@
<div class="container-fluid">
  <div class="row">
    <div class="col-xs-12 info">
-     <b>Found:</b> <span class="badge">{{entries|length}}</span> results
-     out of <span class="badge">{{totals["total"]}}</span> total items indexed
+     <b>Found:</b> <span class="badge results-count">{{entries|length}}</span> results
+     out of <span class="badge results-count">{{totals["total"]}}</span> total items indexed
    </div>
  </div>
</div>
@@ -67,32 +67,32 @@
    <div class="col-xs-12 info">
      <b>Indexing:</b>

-     <span class="badge">{{totals["gdoc"]}}</span>
+     <span class="badge indexing-count">{{totals["gdoc"]}}</span>
      <a href="/master_list?doctype=gdoc#gdoc">
        Google Drive files
      </a>,

-     <span class="badge">{{totals["issue"]}}</span>
+     <span class="badge indexing-count">{{totals["issue"]}}</span>
      <a href="/master_list?doctype=issue#issue">
        Github issues
      </a>,

-     <span class="badge">{{totals["ghfile"]}}</span>
+     <span class="badge indexing-count">{{totals["ghfile"]}}</span>
      <a href="/master_list?doctype=ghfile#ghfile">
        Github files
      </a>,

-     <span class="badge">{{totals["markdown"]}}</span>
+     <span class="badge indexing-count">{{totals["markdown"]}}</span>
      <a href="/master_list?doctype=markdown#markdown">
        Github Markdown files
      </a>,

-     <span class="badge">{{totals["emailthread"]}}</span>
+     <span class="badge indexing-count">{{totals["emailthread"]}}</span>
      <a href="/master_list?doctype=emailthread#emailthread">
        Groups.io email threads
      </a>,

-     <span class="badge">{{totals["disqus"]}}</span>
+     <span class="badge indexing-count">{{totals["disqus"]}}</span>
      <a href="/master_list?doctype=disqus#disqus">
        Disqus comment threads
      </a>
@@ -101,6 +101,7 @@
      </div>
    </li>

+
  </ul>
</div>
</div>