Compare commits

31 commits: testing ... add-more-p

Commit SHA1s in this comparison:

8d0bf33f99, fdb3963ede, 90379a69c5, 0faca67c35, 77b533b642, ccf013e3c9,
e67db4f1ef, b11a26a812, 55a74f7d98, ab76226b0c, a4ebef6e6f, bad50efa9b,
629fc063db, 4f41d8597f, 3b0baa21de, 33b8857bd0, 7c50fc9ff1, eb2cdf1437,
c67e864581, 25cc12cf21, 11c1185e62, 17b2d359bb, 62ca62274e, 74cfaf8275,
552caad135, 19c42df978, 6f30e3f120, ad6b653e27, 501cae8329, 0543c3e89f,
2191140232
.github/PULL_REQUEST_TEMPLATE.md (new file, 12 lines, vendored)
@@ -0,0 +1,12 @@
Thanks for contributing to centillion!

Please place an x between the brackets to indicate a yes answer
to the questions below.

- [ ] Is this pull request mergeable?
- [ ] Has this been tested locally?
- [ ] Does this pull request pass the tests?
- [ ] Have new tests been added to cover any new code?
- [ ] Was a spellchecker run on the source code and documentation after
      changes were made?
CODE_OF_CONDUCT.md (new file, 43 lines)
@@ -0,0 +1,43 @@
# Code of Conduct

## DCPPC Code of Conduct

All members of the Commons are expected to agree with the following code
of conduct. We will enforce this code as needed. We expect cooperation
from all members to help ensure a safe environment for everybody.

## The Quick Version

The Consortium is dedicated to providing a harassment-free experience
for everyone, regardless of gender, gender identity and expression, age,
sexual orientation, disability, physical appearance, body size, race, or
religion (or lack thereof). We do not tolerate harassment of Consortium
members in any form. Sexual language and imagery is generally not
appropriate for any venue, including meetings, presentations, or
discussions.

## The Less Quick Version

Harassment includes offensive verbal comments related to gender, gender
identity and expression, age, sexual orientation, disability, physical
appearance, body size, race, religion, sexual images in public spaces,
deliberate intimidation, stalking, following, harassing photography or
recording, sustained disruption of talks or other events, inappropriate
physical contact, and unwelcome sexual attention.

Members asked to stop any harassing behavior are expected to comply
immediately.

If you are being harassed, notice that someone else is being harassed,
or have any other concerns, please contact [Titus
Brown](mailto:ctbrown@ucdavis.edu) immediately. If Titus is the cause of
your concern, please contact [Vivien
Bonazzi](mailto:bonazziv@mail.nih.gov).

We expect members to follow these guidelines at any Consortium event.

Original source and credit: <http://2012.jsconf.us/#/about> & The Ada
Initiative. Please help by translating or improving:
<http://github.com/leftlogic/confcodeofconduct.com>. This work is
licensed under a Creative Commons Attribution 3.0 Unported License
CONTRIBUTING.md (new file, 21 lines)
@@ -0,0 +1,21 @@
# Contributing to the DCPPC Internal Repository

Hello, and thank you for wanting to contribute to the DCPPC Internal
Repository!

By contributing to this repository, you agree:

1. To obey the [Code of Conduct](./CODE_OF_CONDUCT.md)
2. To release all your contributions under the same terms as the
   license itself: the [Creative Commons Zero](./LICENSE.md) (aka
   Public Domain) license

If you are OK with these two conditions, then we welcome both you and
your contribution!

If you have any questions about contributing, please [open an
issue](https://github.com/dcppc/internal/issues/new) and Team Copper
will lend a hand ASAP.

Thank you for being here and for being a part of the DCPPC project.
Hypothesis.md (new file, 249 lines)
@@ -0,0 +1,249 @@
|
||||
# Hypothesis API
|
||||
|
||||
|
||||
## Authenticating
|
||||
|
||||
Example output from a call to the API root endpoint, listing the routes available once authenticated:
|
||||
|
||||
```
|
||||
{
|
||||
"links": {
|
||||
"profile": {
|
||||
"read": {
|
||||
"url": "https://hypothes.is/api/profile",
|
||||
"method": "GET",
|
||||
"desc": "Fetch the user's profile"
|
||||
},
|
||||
"update": {
|
||||
"url": "https://hypothes.is/api/profile",
|
||||
"method": "PATCH",
|
||||
"desc": "Update a user's preferences"
|
||||
}
|
||||
},
|
||||
"search": {
|
||||
"url": "https://hypothes.is/api/search",
|
||||
"method": "GET",
|
||||
"desc": "Search for annotations"
|
||||
},
|
||||
"group": {
|
||||
"member": {
|
||||
"add": {
|
||||
"url": "https://hypothes.is/api/groups/:pubid/members/:userid",
|
||||
"method": "POST",
|
||||
"desc": "Add the user in the request params to a group."
|
||||
},
|
||||
"delete": {
|
||||
"url": "https://hypothes.is/api/groups/:pubid/members/:userid",
|
||||
"method": "DELETE",
|
||||
"desc": "Remove the current user from a group."
|
||||
}
|
||||
}
|
||||
},
|
||||
"links": {
|
||||
"url": "https://hypothes.is/api/links",
|
||||
"method": "GET",
|
||||
"desc": "URL templates for generating URLs for HTML pages"
|
||||
},
|
||||
"groups": {
|
||||
"read": {
|
||||
"url": "https://hypothes.is/api/groups",
|
||||
"method": "GET",
|
||||
"desc": "Fetch the user's groups"
|
||||
}
|
||||
},
|
||||
"annotation": {
|
||||
"hide": {
|
||||
"url": "https://hypothes.is/api/annotations/:id/hide",
|
||||
"method": "PUT",
|
||||
"desc": "Hide an annotation as a group moderator."
|
||||
},
|
||||
"unhide": {
|
||||
"url": "https://hypothes.is/api/annotations/:id/hide",
|
||||
"method": "DELETE",
|
||||
"desc": "Unhide an annotation as a group moderator."
|
||||
},
|
||||
"read": {
|
||||
"url": "https://hypothes.is/api/annotations/:id",
|
||||
"method": "GET",
|
||||
"desc": "Fetch an annotation"
|
||||
},
|
||||
"create": {
|
||||
"url": "https://hypothes.is/api/annotations",
|
||||
"method": "POST",
|
||||
"desc": "Create an annotation"
|
||||
},
|
||||
"update": {
|
||||
"url": "https://hypothes.is/api/annotations/:id",
|
||||
"method": "PATCH",
|
||||
"desc": "Update an annotation"
|
||||
},
|
||||
"flag": {
|
||||
"url": "https://hypothes.is/api/annotations/:id/flag",
|
||||
"method": "PUT",
|
||||
"desc": "Flag an annotation for review."
|
||||
},
|
||||
"delete": {
|
||||
"url": "https://hypothes.is/api/annotations/:id",
|
||||
"method": "DELETE",
|
||||
"desc": "Delete an annotation"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
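Not part of the original file: a minimal Python sketch of how this index payload can be fetched, assuming a personal API token is exported in the `HYPOTHESIS_TOKEN` environment variable (the same convention used by `hypothesis_util.py` later in this changeset).

```
import os
import requests

# Assumption: a personal Hypothesis API token is stored in HYPOTHESIS_TOKEN.
token = os.environ["HYPOTHESIS_TOKEN"]
headers = {"Authorization": "Bearer %s" % token}

# A GET on the API root returns the catalog of links shown above.
response = requests.get("https://hypothes.is/api", headers=headers)
response.raise_for_status()
print(sorted(response.json()["links"].keys()))
```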
|
||||
|
||||
## Listing
|
||||
|
||||
Here is the result of the API call to fetch an annotation
|
||||
given its annotation ID:
|
||||
|
||||
```
|
||||
{
|
||||
"updated": "2018-07-26T10:20:47.803636+00:00",
|
||||
"group": "__world__",
|
||||
"target": [
|
||||
{
|
||||
"source": "https://h.readthedocs.io/en/latest/api/authorization/",
|
||||
"selector": [
|
||||
{
|
||||
"conformsTo": "https://tools.ietf.org/html/rfc3236",
|
||||
"type": "FragmentSelector",
|
||||
"value": "access-tokens"
|
||||
},
|
||||
{
|
||||
"endContainer": "/div[1]/section[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/p[2]",
|
||||
"startContainer": "/div[1]/section[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/p[1]",
|
||||
"type": "RangeSelector",
|
||||
"startOffset": 14,
|
||||
"endOffset": 116
|
||||
},
|
||||
{
|
||||
"type": "TextPositionSelector",
|
||||
"end": 2234,
|
||||
"start": 1374
|
||||
},
|
||||
{
|
||||
"exact": "hich read or write data as a specific user need to be authorized\nwith an access token. Access tokens can be obtained in two ways:\n\nBy generating a personal API token on the Hypothesis developer\npage (you must be logged in to\nHypothesis to get to this page). This is the simplest method, however\nthese tokens are only suitable for enabling your application to make\nrequests as a single specific user.\n\nBy registering an \u201cOAuth client\u201d and\nimplementing the OAuth authentication flow\nin your application. This method allows any user to authorize your\napplication to read and write data via the API as that user. The Hypothesis\nclient is an example of an application that uses OAuth.\nSee Using OAuth for details of how to implement this method.\n\n\nOnce an access token has been obtained, requests can be authorized by putting\nthe token in the Authorization header.",
|
||||
"prefix": "\n\n\nAccess tokens\u00b6\nAPI requests w",
|
||||
"type": "TextQuoteSelector",
|
||||
"suffix": "\nExample request:\nGET /api HTTP/"
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"links": {
|
||||
"json": "https://hypothes.is/api/annotations/kEaohJC9Eeiy_UOozkpkyA",
|
||||
"html": "https://hypothes.is/a/kEaohJC9Eeiy_UOozkpkyA",
|
||||
"incontext": "https://hyp.is/kEaohJC9Eeiy_UOozkpkyA/h.readthedocs.io/en/latest/api/authorization/"
|
||||
},
|
||||
"tags": [],
|
||||
"text": "sdfsdf",
|
||||
"created": "2018-07-26T10:20:47.803636+00:00",
|
||||
"uri": "https://h.readthedocs.io/en/latest/api/authorization/",
|
||||
"flagged": false,
|
||||
"user_info": {
|
||||
"display_name": null
|
||||
},
|
||||
"user": "acct:Aravindan@hypothes.is",
|
||||
"hidden": false,
|
||||
"document": {
|
||||
"title": [
|
||||
"Authorization \u2014 h 0.0.2 documentation"
|
||||
]
|
||||
},
|
||||
"id": "kEaohJC9Eeiy_UOozkpkyA",
|
||||
"permissions": {
|
||||
"read": [
|
||||
"group:__world__"
|
||||
],
|
||||
"admin": [
|
||||
"acct:Aravindan@hypothes.is"
|
||||
],
|
||||
"update": [
|
||||
"acct:Aravindan@hypothes.is"
|
||||
],
|
||||
"delete": [
|
||||
"acct:Aravindan@hypothes.is"
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Searching
|
||||
|
||||
Here is the output from a call to the endpoint to search annotations
|
||||
(we pass a specific URL to the search function):
|
||||
|
||||
```
|
||||
{
|
||||
"rows": [
|
||||
{
|
||||
"updated": "2018-08-10T02:21:46.898833+00:00",
|
||||
"group": "__world__",
|
||||
"target": [
|
||||
{
|
||||
"source": "http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/",
|
||||
"selector": [
|
||||
{
|
||||
"endContainer": "/div[1]/main[1]/div[1]/div[3]/article[1]/h2[1]",
|
||||
"startContainer": "/div[1]/main[1]/div[1]/div[3]/article[1]/h2[1]",
|
||||
"type": "RangeSelector",
|
||||
"startOffset": 0,
|
||||
"endOffset": 80
|
||||
},
|
||||
{
|
||||
"type": "TextPositionSelector",
|
||||
"end": 12328,
|
||||
"start": 12248
|
||||
},
|
||||
{
|
||||
"exact": "Deliverables are due internally on the first of each month, which here is Day 1,",
|
||||
"prefix": " \n ",
|
||||
"type": "TextQuoteSelector",
|
||||
"suffix": "\u00b6\nDay -30 through -10\nCopper PM "
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"links": {
|
||||
"json": "https://hypothes.is/api/annotations/IY2W_pxEEeiVuxfD3sehjQ",
|
||||
"html": "https://hypothes.is/a/IY2W_pxEEeiVuxfD3sehjQ",
|
||||
"incontext": "https://hyp.is/IY2W_pxEEeiVuxfD3sehjQ/pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/"
|
||||
},
|
||||
"tags": [],
|
||||
"text": "This is a sample annotation",
|
||||
"created": "2018-08-10T02:21:46.898833+00:00",
|
||||
"uri": "http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/",
|
||||
"flagged": false,
|
||||
"user_info": {
|
||||
"display_name": null
|
||||
},
|
||||
"user": "acct:charlesreid1dib@hypothes.is",
|
||||
"hidden": false,
|
||||
"document": {
|
||||
"title": [
|
||||
"Copper Internal Delivery Workflow - Data Commons Internal Site"
|
||||
]
|
||||
},
|
||||
"id": "IY2W_pxEEeiVuxfD3sehjQ",
|
||||
"permissions": {
|
||||
"read": [
|
||||
"group:__world__"
|
||||
],
|
||||
"admin": [
|
||||
"acct:charlesreid1dib@hypothes.is"
|
||||
],
|
||||
"update": [
|
||||
"acct:charlesreid1dib@hypothes.is"
|
||||
],
|
||||
"delete": [
|
||||
"acct:charlesreid1dib@hypothes.is"
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"total": 1
|
||||
}
|
||||
```
|
||||
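Not part of the original file: a short sketch of the search call that produces output like the above. The `url` and `limit` parameters mirror the ones used by `search_annotations()` in `hypothesis_util.py` below; the token handling is the same assumption as in the earlier sketch.

```
import os
import requests

headers = {"Authorization": "Bearer %s" % os.environ["HYPOTHESIS_TOKEN"]}

# The wildcard URL filter restricts results to the pilot site, as in the example.
params = {"url": "*pilot.nihdatacommons.us*", "limit": 200}
response = requests.get("https://hypothes.is/api/search",
                        headers=headers, params=params)
rows = response.json()["rows"]
print("%d annotations found" % len(rows))
```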
|
@@ -40,6 +40,7 @@ class UpdateIndexTask(object):
|
||||
'groupsio_username' : app_config['GROUPSIO_USERNAME'],
|
||||
'groupsio_password' : app_config['GROUPSIO_PASSWORD']
|
||||
}
|
||||
self.disqus_token = app_config['DISQUS_TOKEN']
|
||||
thread.daemon = True
|
||||
thread.start()
|
||||
|
||||
@@ -54,6 +55,7 @@ class UpdateIndexTask(object):
|
||||
|
||||
search.update_index(self.groupsio_credentials,
|
||||
self.gh_token,
|
||||
self.disqus_token,
|
||||
self.run_which,
|
||||
config)
|
||||
|
||||
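For orientation, the call chain after this change looks roughly like the sketch below. This is illustrative only: `search` and `config` are assumed to be the existing `Search` instance and Flask config, and the credential values are placeholders that the real code reads from `GROUPSIO_USERNAME`, `GROUPSIO_PASSWORD`, and `DISQUS_TOKEN`.

```
# Sketch of the updated update_index() call; placeholder values only.
groupsio_credentials = {
    'groupsio_username': '<username>',
    'groupsio_password': '<password>',
}
search.update_index(groupsio_credentials,   # Groups.io login details
                    '<github-token>',       # gh_token
                    '<disqus-api-key>',     # disqus_token, new in this change
                    'all',                  # run_which
                    config)
```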
@@ -265,7 +267,11 @@ def list_docs(doctype):
|
||||
if org['login']=='dcppc':
|
||||
# Business as usual
|
||||
search = Search(app.config["INDEX_DIR"])
|
||||
return jsonify(search.get_list(doctype))
|
||||
results_list = search.get_list(doctype)
|
||||
for result in results_list:
|
||||
ct = result['created_time']
|
||||
result['created_time'] = datetime.strftime(ct,"%Y-%m-%d %I:%M %p")
|
||||
return jsonify(results_list)
|
||||
|
||||
# nope
|
||||
return render_template('403.html')
|
||||
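The timestamp formatting added above is just a `strftime` call on a `datetime` value (the schema change below stores `created_time` as a real datetime). A standalone illustration, not code from the PR:

```
from datetime import datetime

# created_time comes back from the index as a datetime object.
ct = datetime(2018, 8, 10, 14, 21)
print(datetime.strftime(ct, "%Y-%m-%d %I:%M %p"))   # 2018-08-10 02:21 PM
```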
@@ -347,5 +353,5 @@ if __name__ == '__main__':
|
||||
port = 5000
|
||||
else:
|
||||
port = int(port)
|
||||
app.run(host="0.0.0.0",port=port)
|
||||
app.run(host="0.0.0.0", port=port)
|
||||
|
||||
|
@@ -6,6 +6,8 @@ import base64
|
||||
|
||||
from gdrive_util import GDrive
|
||||
from groupsio_util import GroupsIOArchivesCrawler, GroupsIOException
|
||||
from disqus_util import DisqusCrawler
|
||||
|
||||
from apiclient.http import MediaIoBaseDownload
|
||||
|
||||
import mistune
|
||||
@@ -19,8 +21,11 @@ import codecs
|
||||
from datetime import datetime
|
||||
import dateutil.parser
|
||||
|
||||
from whoosh import query
|
||||
from whoosh.qparser import MultifieldParser, QueryParser
|
||||
from whoosh.analysis import StemmingAnalyzer
|
||||
from whoosh.analysis import StemmingAnalyzer, LowercaseFilter, StopFilter
|
||||
from whoosh.qparser.dateparse import DateParserPlugin
|
||||
from whoosh import fields, index
|
||||
|
||||
|
||||
"""
|
||||
@@ -103,10 +108,21 @@ class Search:
|
||||
# ------------------------------
|
||||
# Update the entire index
|
||||
|
||||
def update_index(self, groupsio_credentials, gh_token, run_which, config):
|
||||
def update_index(self, groupsio_credentials, gh_token, disqus_token, run_which, config):
|
||||
"""
|
||||
Update the entire search index
|
||||
"""
|
||||
if run_which=='all' or run_which=='disqus':
|
||||
try:
|
||||
self.update_index_disqus(disqus_token, config)
|
||||
except Exception as e:
|
||||
print("ERROR: While re-indexing: failed to update Disqus comment threads")
|
||||
print("-"*40)
|
||||
print(repr(e))
|
||||
print("-"*40)
|
||||
print("Continuing...")
|
||||
pass
|
||||
|
||||
if run_which=='all' or run_which=='emailthreads':
|
||||
try:
|
||||
self.update_index_emailthreads(groupsio_credentials, config)
|
||||
@@ -172,7 +188,8 @@ class Search:
|
||||
os.mkdir(index_folder)
|
||||
|
||||
exists = index.exists_in(index_folder)
|
||||
stemming_analyzer = StemmingAnalyzer()
|
||||
#stemming_analyzer = StemmingAnalyzer()
|
||||
stemming_analyzer = StemmingAnalyzer() | LowercaseFilter() | StopFilter()
|
||||
|
||||
|
||||
# ------------------------------
|
||||
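As a quick illustration of the analyzer chain introduced above (illustrative only, not centillion code), the chained analyzer can be called directly to see which tokens it emits:

```
from whoosh.analysis import StemmingAnalyzer, LowercaseFilter, StopFilter

# Inspect the tokens the chained analyzer produces for a sample string.
analyzer = StemmingAnalyzer() | LowercaseFilter() | StopFilter()
print([token.text for token in analyzer("Indexing the Disqus comment threads")])
# e.g. ['index', 'disqus', 'comment', 'thread'] -- stemmed, lowercased, stopwords dropped
```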
@@ -180,30 +197,38 @@ class Search:
|
||||
# is defined.
|
||||
|
||||
schema = Schema(
|
||||
id = ID(stored=True, unique=True),
|
||||
kind = ID(stored=True),
|
||||
id = fields.ID(stored=True, unique=True),
|
||||
kind = fields.ID(stored=True),
|
||||
|
||||
created_time = ID(stored=True),
|
||||
modified_time = ID(stored=True),
|
||||
indexed_time = ID(stored=True),
|
||||
created_time = fields.DATETIME(stored=True),
|
||||
modified_time = fields.DATETIME(stored=True),
|
||||
indexed_time = fields.DATETIME(stored=True),
|
||||
|
||||
title = TEXT(stored=True, field_boost=100.0),
|
||||
url = ID(stored=True, unique=True),
|
||||
|
||||
mimetype=ID(stored=True),
|
||||
owner_email=ID(stored=True),
|
||||
owner_name=TEXT(stored=True),
|
||||
|
||||
repo_name=TEXT(stored=True),
|
||||
repo_url=ID(stored=True),
|
||||
title = fields.TEXT(stored=True, field_boost=100.0),
|
||||
|
||||
github_user=TEXT(stored=True),
|
||||
url = fields.ID(stored=True),
|
||||
|
||||
mimetype = fields.TEXT(stored=True),
|
||||
|
||||
owner_email = fields.ID(stored=True),
|
||||
owner_name = fields.TEXT(stored=True),
|
||||
|
||||
# mainly for email threads, groups.io, hypothesis
|
||||
group = fields.ID(stored=True),
|
||||
|
||||
repo_name = fields.TEXT(stored=True),
|
||||
repo_url = fields.ID(stored=True),
|
||||
github_user = fields.TEXT(stored=True),
|
||||
|
||||
tags = fields.KEYWORD(commas=True,
|
||||
stored=True,
|
||||
lowercase=True),
|
||||
|
||||
# comments only
|
||||
issue_title=TEXT(stored=True, field_boost=100.0),
|
||||
issue_url=ID(stored=True),
|
||||
issue_title = fields.TEXT(stored=True, field_boost=100.0),
|
||||
issue_url = fields.ID(stored=True),
|
||||
|
||||
content=TEXT(stored=True, analyzer=stemming_analyzer)
|
||||
content = fields.TEXT(stored=True, analyzer=stemming_analyzer)
|
||||
)
|
||||
|
||||
|
||||
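The practical consequence of switching these columns to `fields.DATETIME` is that `add_document()` must now receive `datetime` objects rather than ISO strings, which is why the hunks below parse timestamps with `dateutil` and wrap the writes in `try/except ValueError`. A reduced sketch with an illustrative two-field schema, not the full centillion schema:

```
from datetime import datetime
from whoosh import fields, index

# Reduced illustration of the new-style schema; the real one has many more fields.
schema = fields.Schema(
    id=fields.ID(stored=True, unique=True),
    created_time=fields.DATETIME(stored=True),
    content=fields.TEXT(stored=True),
)

ix = index.create_in("tmp_index", schema)   # assumes the tmp_index/ directory exists
writer = ix.writer()
writer.add_document(id=u"example",
                    created_time=datetime(2018, 8, 10, 2, 21),   # datetime, not a string
                    content=u"DATETIME fields require datetime objects")
writer.commit()
```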
@@ -243,24 +268,32 @@ class Search:
|
||||
writer.delete_by_term('id',item['id'])
|
||||
|
||||
# Index a plain google drive file
|
||||
writer.add_document(
|
||||
id = item['id'],
|
||||
kind = 'gdoc',
|
||||
created_time = item['createdTime'],
|
||||
modified_time = item['modifiedTime'],
|
||||
indexed_time = datetime.now().replace(microsecond=0).isoformat(),
|
||||
title = item['name'],
|
||||
url = item['webViewLink'],
|
||||
mimetype = mimetype,
|
||||
owner_email = item['owners'][0]['emailAddress'],
|
||||
owner_name = item['owners'][0]['displayName'],
|
||||
repo_name='',
|
||||
repo_url='',
|
||||
github_user='',
|
||||
issue_title='',
|
||||
issue_url='',
|
||||
content = content
|
||||
)
|
||||
created_time = dateutil.parser.parse(item['createdTime'])
|
||||
modified_time = dateutil.parser.parse(item['modifiedTime'])
|
||||
indexed_time = datetime.now().replace(microsecond=0)
|
||||
try:
|
||||
writer.add_document(
|
||||
id = item['id'],
|
||||
kind = 'gdoc',
|
||||
created_time = created_time,
|
||||
modified_time = modified_time,
|
||||
indexed_time = indexed_time,
|
||||
title = item['name'],
|
||||
url = item['webViewLink'],
|
||||
mimetype = mimetype,
|
||||
owner_email = item['owners'][0]['emailAddress'],
|
||||
owner_name = item['owners'][0]['displayName'],
|
||||
group='',
|
||||
repo_name='',
|
||||
repo_url='',
|
||||
github_user='',
|
||||
issue_title='',
|
||||
issue_url='',
|
||||
content = content
|
||||
)
|
||||
except ValueError as e:
|
||||
print(repr(e))
|
||||
print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
|
||||
|
||||
|
||||
else:
|
||||
@@ -314,7 +347,7 @@ class Search:
|
||||
)
|
||||
assert output == ""
|
||||
except RuntimeError:
|
||||
print(" > XXXXXX Failed to index document \"%s\""%(item['name']))
|
||||
print(" > XXXXXX Failed to index Google Drive document \"%s\""%(item['name']))
|
||||
|
||||
|
||||
# If export was successful, read contents of markdown
|
||||
@@ -342,24 +375,33 @@ class Search:
|
||||
else:
|
||||
print(" > Creating a new record")
|
||||
|
||||
writer.add_document(
|
||||
id = item['id'],
|
||||
kind = 'gdoc',
|
||||
created_time = item['createdTime'],
|
||||
modified_time = item['modifiedTime'],
|
||||
indexed_time = datetime.now().replace(microsecond=0).isoformat(),
|
||||
title = item['name'],
|
||||
url = item['webViewLink'],
|
||||
mimetype = mimetype,
|
||||
owner_email = item['owners'][0]['emailAddress'],
|
||||
owner_name = item['owners'][0]['displayName'],
|
||||
repo_name='',
|
||||
repo_url='',
|
||||
github_user='',
|
||||
issue_title='',
|
||||
issue_url='',
|
||||
content = content
|
||||
)
|
||||
try:
|
||||
created_time = dateutil.parser.parse(item['createdTime'])
|
||||
modified_time = dateutil.parser.parse(item['modifiedTime'])
|
||||
indexed_time = datetime.now()
|
||||
writer.add_document(
|
||||
id = item['id'],
|
||||
kind = 'gdoc',
|
||||
created_time = created_time,
|
||||
modified_time = modified_time,
|
||||
indexed_time = indexed_time,
|
||||
title = item['name'],
|
||||
url = item['webViewLink'],
|
||||
mimetype = mimetype,
|
||||
owner_email = item['owners'][0]['emailAddress'],
|
||||
owner_name = item['owners'][0]['displayName'],
|
||||
group='',
|
||||
repo_name='',
|
||||
repo_url='',
|
||||
github_user='',
|
||||
issue_title='',
|
||||
issue_url='',
|
||||
content = content
|
||||
)
|
||||
except ValueError as e:
|
||||
print(repr(e))
|
||||
print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -393,31 +435,36 @@ class Search:
|
||||
issue_comment_content += comment.body.rstrip()
|
||||
issue_comment_content += "\n"
|
||||
|
||||
# Now create the actual search index record
|
||||
created_time = clean_timestamp(issue.created_at)
|
||||
modified_time = clean_timestamp(issue.updated_at)
|
||||
indexed_time = clean_timestamp(datetime.now())
|
||||
|
||||
# Now create the actual search index record.
|
||||
# Add one document per issue thread,
|
||||
# containing entire text of thread.
|
||||
writer.add_document(
|
||||
id = issue.html_url,
|
||||
kind = 'issue',
|
||||
created_time = created_time,
|
||||
modified_time = modified_time,
|
||||
indexed_time = indexed_time,
|
||||
title = issue.title,
|
||||
url = issue.html_url,
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name='',
|
||||
repo_name = repo_name,
|
||||
repo_url = repo_url,
|
||||
github_user = issue.user.login,
|
||||
issue_title = issue.title,
|
||||
issue_url = issue.html_url,
|
||||
content = issue_comment_content
|
||||
)
|
||||
|
||||
created_time = issue.created_at
|
||||
modified_time = issue.updated_at
|
||||
indexed_time = datetime.now()
|
||||
try:
|
||||
writer.add_document(
|
||||
id = issue.html_url,
|
||||
kind = 'issue',
|
||||
created_time = created_time,
|
||||
modified_time = modified_time,
|
||||
indexed_time = indexed_time,
|
||||
title = issue.title,
|
||||
url = issue.html_url,
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name='',
|
||||
group='',
|
||||
repo_name = repo_name,
|
||||
repo_url = repo_url,
|
||||
github_user = issue.user.login,
|
||||
issue_title = issue.title,
|
||||
issue_url = issue.html_url,
|
||||
content = issue_comment_content
|
||||
)
|
||||
except ValueError as e:
|
||||
print(repr(e))
|
||||
print(" > XXXXXX Failed to index Github issue \"%s\""%(issue.title))
|
||||
|
||||
|
||||
|
||||
@@ -447,7 +494,8 @@ class Search:
|
||||
print(" > XXXXXXXX Failed to find file info.")
|
||||
return
|
||||
|
||||
indexed_time = clean_timestamp(datetime.now())
|
||||
|
||||
indexed_time = datetime.now()
|
||||
|
||||
if fext in MARKDOWN_EXTS:
|
||||
print("Indexing markdown doc %s from repo %s"%(fname,repo_name))
|
||||
@@ -476,24 +524,31 @@ class Search:
|
||||
usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath)
|
||||
|
||||
# Now create the actual search index record
|
||||
writer.add_document(
|
||||
id = fsha,
|
||||
kind = 'markdown',
|
||||
created_time = '',
|
||||
modified_time = '',
|
||||
indexed_time = indexed_time,
|
||||
title = fname,
|
||||
url = usable_url,
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name='',
|
||||
repo_name = repo_name,
|
||||
repo_url = repo_url,
|
||||
github_user = '',
|
||||
issue_title = '',
|
||||
issue_url = '',
|
||||
content = content
|
||||
)
|
||||
try:
|
||||
writer.add_document(
|
||||
id = fsha,
|
||||
kind = 'markdown',
|
||||
created_time = None,
|
||||
modified_time = None,
|
||||
indexed_time = indexed_time,
|
||||
title = fname,
|
||||
url = usable_url,
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name='',
|
||||
group='',
|
||||
repo_name = repo_name,
|
||||
repo_url = repo_url,
|
||||
github_user = '',
|
||||
issue_title = '',
|
||||
issue_url = '',
|
||||
content = content
|
||||
)
|
||||
except ValueError as e:
|
||||
print(repr(e))
|
||||
print(" > XXXXXX Failed to index Github markdown file \"%s\""%(fname))
|
||||
|
||||
|
||||
|
||||
else:
|
||||
print("Indexing github file %s from repo %s"%(fname,repo_name))
|
||||
@@ -501,24 +556,29 @@ class Search:
|
||||
key = fname+"_"+fsha
|
||||
|
||||
# Now create the actual search index record
|
||||
writer.add_document(
|
||||
id = key,
|
||||
kind = 'ghfile',
|
||||
created_time = '',
|
||||
modified_time = '',
|
||||
indexed_time = indexed_time,
|
||||
title = fname,
|
||||
url = repo_url,
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name='',
|
||||
repo_name = repo_name,
|
||||
repo_url = repo_url,
|
||||
github_user = '',
|
||||
issue_title = '',
|
||||
issue_url = '',
|
||||
content = ''
|
||||
)
|
||||
try:
|
||||
writer.add_document(
|
||||
id = key,
|
||||
kind = 'ghfile',
|
||||
created_time = None,
|
||||
modified_time = None,
|
||||
indexed_time = indexed_time,
|
||||
title = fname,
|
||||
url = repo_url,
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name='',
|
||||
group='',
|
||||
repo_name = repo_name,
|
||||
repo_url = repo_url,
|
||||
github_user = '',
|
||||
issue_title = '',
|
||||
issue_url = '',
|
||||
content = ''
|
||||
)
|
||||
except ValueError as e:
|
||||
print(repr(e))
|
||||
print(" > XXXXXX Failed to index Github file \"%s\""%(fname))
|
||||
|
||||
|
||||
|
||||
@@ -529,30 +589,84 @@ class Search:
|
||||
|
||||
def add_emailthread(self, writer, d, config, update=True):
|
||||
"""
|
||||
Use a Github file API record to add a filename
|
||||
to the search index.
|
||||
Use a Groups.io email thread record to add
|
||||
an email thread to the search index.
|
||||
"""
|
||||
indexed_time = clean_timestamp(datetime.now())
|
||||
if 'created_time' in d.keys() and d['created_time'] is not None:
|
||||
created_time = d['created_time']
|
||||
else:
|
||||
created_time = None
|
||||
|
||||
if 'modified_time' in d.keys() and d['modified_time'] is not None:
|
||||
modified_time = d['modified_time']
|
||||
else:
|
||||
modified_time = None
|
||||
|
||||
indexed_time = datetime.now()
|
||||
|
||||
# Now create the actual search index record
|
||||
writer.add_document(
|
||||
id = d['permalink'],
|
||||
kind = 'emailthread',
|
||||
created_time = '',
|
||||
modified_time = '',
|
||||
indexed_time = indexed_time,
|
||||
title = d['subject'],
|
||||
url = d['permalink'],
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name=d['original_sender'],
|
||||
repo_name = '',
|
||||
repo_url = '',
|
||||
github_user = '',
|
||||
issue_title = '',
|
||||
issue_url = '',
|
||||
content = d['content']
|
||||
)
|
||||
try:
|
||||
writer.add_document(
|
||||
id = d['permalink'],
|
||||
kind = 'emailthread',
|
||||
created_time = created_time,
|
||||
modified_time = modified_time,
|
||||
indexed_time = indexed_time,
|
||||
title = d['subject'],
|
||||
url = d['permalink'],
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name=d['original_sender'],
|
||||
group=d['subgroup'],
|
||||
repo_name = '',
|
||||
repo_url = '',
|
||||
github_user = '',
|
||||
issue_title = '',
|
||||
issue_url = '',
|
||||
content = d['content']
|
||||
)
|
||||
except ValueError as e:
|
||||
print(repr(e))
|
||||
print(" > XXXXXX Failed to index Groups.io thread \"%s\""%(d['subject']))
|
||||
|
||||
|
||||
# ------------------------------
|
||||
# Add a single disqus comment thread
|
||||
# to the search index.
|
||||
|
||||
def add_disqusthread(self, writer, d, config, update=True):
|
||||
"""
|
||||
Use a disqus comment thread record
|
||||
to add a disqus comment thread to the
|
||||
search index.
|
||||
"""
|
||||
indexed_time = datetime.now()
|
||||
|
||||
# created_time is already a timestamp
|
||||
|
||||
# Now create the actual search index record
|
||||
try:
|
||||
writer.add_document(
|
||||
id = d['id'],
|
||||
kind = 'disqus',
|
||||
created_time = d['created_time'],
|
||||
modified_time = None,
|
||||
indexed_time = indexed_time,
|
||||
title = d['title'],
|
||||
url = d['link'],
|
||||
mimetype='',
|
||||
owner_email='',
|
||||
owner_name='',
|
||||
repo_name = '',
|
||||
repo_url = '',
|
||||
github_user = '',
|
||||
issue_title = '',
|
||||
issue_url = '',
|
||||
content = d['content']
|
||||
)
|
||||
except ValueError as e:
|
||||
print(repr(e))
|
||||
print(" > XXXXXX Failed to index Disqus comment thread \"%s\""%(d['title']))
|
||||
|
||||
|
||||
|
||||
@@ -580,9 +694,8 @@ class Search:
|
||||
# Updated algorithm:
|
||||
# - get set of indexed ids
|
||||
# - get set of remote ids
|
||||
# - drop indexed ids not in remote ids
|
||||
# - drop all indexed ids
|
||||
# - index all remote ids
|
||||
# - add hash check in add_
|
||||
|
||||
|
||||
# Get the set of indexed ids:
|
||||
@@ -631,10 +744,10 @@ class Search:
|
||||
full_items[f['id']] = f
|
||||
|
||||
## Shorter:
|
||||
#break
|
||||
# Longer:
|
||||
if nextPageToken is None:
|
||||
break
|
||||
break
|
||||
## Longer:
|
||||
#if nextPageToken is None:
|
||||
# break
|
||||
|
||||
|
||||
writer = self.ix.writer()
|
||||
@@ -642,34 +755,41 @@ class Search:
|
||||
temp_dir = tempfile.mkdtemp(dir=os.getcwd())
|
||||
print("Temporary directory: %s"%(temp_dir))
|
||||
|
||||
try:
|
||||
|
||||
# Drop any id in indexed_ids
|
||||
# not in remote_ids
|
||||
drop_ids = indexed_ids - remote_ids
|
||||
for drop_id in drop_ids:
|
||||
writer.delete_by_term('id',drop_id)
|
||||
|
||||
|
||||
# Drop any id in indexed_ids
|
||||
# not in remote_ids
|
||||
drop_ids = indexed_ids - remote_ids
|
||||
for drop_id in drop_ids:
|
||||
writer.delete_by_term('id',drop_id)
|
||||
# Update any id in indexed_ids
|
||||
# and in remote_ids
|
||||
update_ids = indexed_ids & remote_ids
|
||||
for update_id in update_ids:
|
||||
# cop out
|
||||
writer.delete_by_term('id',update_id)
|
||||
item = full_items[update_id]
|
||||
self.add_drive_file(writer, item, temp_dir, config, update=True)
|
||||
count += 1
|
||||
|
||||
|
||||
# Update any id in indexed_ids
|
||||
# and in remote_ids
|
||||
update_ids = indexed_ids & remote_ids
|
||||
for update_id in update_ids:
|
||||
# cop out
|
||||
writer.delete_by_term('id',update_id)
|
||||
item = full_items[update_id]
|
||||
self.add_drive_file(writer, item, temp_dir, config, update=True)
|
||||
count += 1
|
||||
|
||||
|
||||
# Add any id not in indexed_ids
|
||||
# and in remote_ids
|
||||
add_ids = remote_ids - indexed_ids
|
||||
for add_id in add_ids:
|
||||
item = full_items[add_id]
|
||||
self.add_drive_file(writer, item, temp_dir, config, update=False)
|
||||
count += 1
|
||||
# Add any id not in indexed_ids
|
||||
# and in remote_ids
|
||||
add_ids = remote_ids - indexed_ids
|
||||
for add_id in add_ids:
|
||||
item = full_items[add_id]
|
||||
self.add_drive_file(writer, item, temp_dir, config, update=False)
|
||||
count += 1
|
||||
|
||||
except Exception as e:
|
||||
print("ERROR: While adding Google Drive files to search index")
|
||||
print("-"*40)
|
||||
print(repr(e))
|
||||
print("-"*40)
|
||||
print("Continuing...")
|
||||
pass
|
||||
|
||||
print("Cleaning temporary directory: %s"%(temp_dir))
|
||||
subprocess.call(['rm','-fr',temp_dir])
|
||||
@@ -686,12 +806,6 @@ class Search:
|
||||
Update the search index using a collection of
|
||||
Github repo issues and comments.
|
||||
"""
|
||||
# Updated algorithm:
|
||||
# - get set of indexed ids
|
||||
# - get set of remote ids
|
||||
# - drop indexed ids not in remote ids
|
||||
# - index all remote ids
|
||||
|
||||
# Get the set of indexed ids:
|
||||
# ------
|
||||
indexed_issues = set()
|
||||
@@ -772,12 +886,6 @@ class Search:
|
||||
files (and, separately, Markdown files) from
|
||||
a Github repo.
|
||||
"""
|
||||
# Updated algorithm:
|
||||
# - get set of indexed ids
|
||||
# - get set of remote ids
|
||||
# - drop indexed ids not in remote ids
|
||||
# - index all remote ids
|
||||
|
||||
# Get the set of indexed ids:
|
||||
# ------
|
||||
indexed_ids = set()
|
||||
@@ -896,12 +1004,6 @@ class Search:
|
||||
|
||||
RELEASE THE SPIDER!!!
|
||||
"""
|
||||
# Algorithm:
|
||||
# - get set of indexed ids
|
||||
# - get set of remote ids
|
||||
# - drop indexed ids not in remote ids
|
||||
# - index all remote ids
|
||||
|
||||
# Get the set of indexed ids:
|
||||
# ------
|
||||
indexed_ids = set()
|
||||
@@ -919,16 +1021,17 @@ class Search:
|
||||
# ask spider to crawl the archives
|
||||
spider.crawl_group_archives()
|
||||
|
||||
# now spider.archives is a list of dictionaries
|
||||
# that each represent a thread:
|
||||
# thread = {
|
||||
# 'permalink' : permalink,
|
||||
# 'subject' : subject,
|
||||
# 'original_sender' : original_sender,
|
||||
# 'content' : full_content
|
||||
# }
|
||||
# now spider.archives is a dictionary
|
||||
# with one key per thread ID,
|
||||
# and a value set to the payload:
|
||||
# '<thread-id>' : {
|
||||
# 'permalink' : permalink,
|
||||
# 'subject' : subject,
|
||||
# 'original_sender' : original_sender,
|
||||
# 'content' : full_content
|
||||
# }
|
||||
#
|
||||
# It is hard to reliablly extract more information
|
||||
# It is hard to reliably extract more information
|
||||
# than that from the email thread.
|
||||
|
||||
writer = self.ix.writer()
|
||||
@@ -958,6 +1061,75 @@ class Search:
|
||||
print("Done, updated %d Groups.io email threads in the index" % count)
|
||||
|
||||
|
||||
|
||||
# ------------------------------
|
||||
# Disqus Comments
|
||||
|
||||
|
||||
def update_index_disqus(self, disqus_token, config):
|
||||
"""
|
||||
Update the search index using a collection of
|
||||
Disqus comment threads from the dcppc-internal
|
||||
forum.
|
||||
"""
|
||||
# Updated algorithm:
|
||||
# - get set of indexed ids
|
||||
# - get set of remote ids
|
||||
# - drop all indexed ids
|
||||
# - index all remote ids
|
||||
|
||||
# Get the set of indexed ids:
|
||||
# --------------------
|
||||
indexed_ids = set()
|
||||
p = QueryParser("kind", schema=self.ix.schema)
|
||||
q = p.parse("disqus")
|
||||
with self.ix.searcher() as s:
|
||||
results = s.search(q,limit=None)
|
||||
for result in results:
|
||||
indexed_ids.add(result['id'])
|
||||
|
||||
# Get the set of remote ids:
|
||||
# ------
|
||||
spider = DisqusCrawler(disqus_token,'dcppc-internal')
|
||||
|
||||
# ask spider to crawl disqus comments
|
||||
spider.crawl_threads()
|
||||
|
||||
# spider.comments will be a dictionary
|
||||
# with keys as thread IDs and values as
|
||||
# a dictionary item
|
||||
|
||||
writer = self.ix.writer()
|
||||
count = 0
|
||||
|
||||
# archives is a dictionary
|
||||
# keys are IDs (urls)
|
||||
# values are dictionaries
|
||||
threads = spider.get_threads()
|
||||
|
||||
# Start by collecting all the things
|
||||
remote_ids = set()
|
||||
for k in threads.keys():
|
||||
remote_ids.add(k)
|
||||
|
||||
# drop indexed_ids
|
||||
for drop_id in indexed_ids:
|
||||
writer.delete_by_term('id',drop_id)
|
||||
|
||||
# add remote_ids
|
||||
for add_id in remote_ids:
|
||||
item = threads[add_id]
|
||||
self.add_disqusthread(writer, item, config, update=False)
|
||||
count += 1
|
||||
|
||||
writer.commit()
|
||||
print("Done, updated %d Disqus comment threads in the index" % count)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# ---------------------------------
|
||||
# Search results bundler
|
||||
|
||||
@@ -1044,6 +1216,7 @@ class Search:
|
||||
"ghfile" : None,
|
||||
"markdown" : None,
|
||||
"emailthread" : None,
|
||||
"disqus" : None,
|
||||
"total" : None
|
||||
}
|
||||
for key in counts.keys():
|
||||
@@ -1074,7 +1247,9 @@ class Search:
|
||||
elif doctype=='issue':
|
||||
item_keys = ['title','repo_name','repo_url','url','created_time','modified_time']
|
||||
elif doctype=='emailthread':
|
||||
item_keys = ['title','owner_name','url']
|
||||
item_keys = ['title','owner_name','url','group','created_time','modified_time']
|
||||
elif doctype=='disqus':
|
||||
item_keys = ['title','created_time','url']
|
||||
elif doctype=='ghfile':
|
||||
item_keys = ['title','repo_name','repo_url','url']
|
||||
elif doctype=='markdown':
|
||||
@@ -1091,11 +1266,7 @@ class Search:
|
||||
for r in results:
|
||||
d = {}
|
||||
for k in item_keys:
|
||||
if k=='created_time' or k=='modified_time':
|
||||
#d[k] = r[k]
|
||||
d[k] = dateutil.parser.parse(r[k]).strftime("%Y-%m-%d")
|
||||
else:
|
||||
d[k] = r[k]
|
||||
d[k] = r[k]
|
||||
json_results.append(d)
|
||||
|
||||
return json_results
|
||||
@@ -1108,7 +1279,16 @@ class Search:
|
||||
query_string = " ".join(query_list)
|
||||
query = None
|
||||
if ":" in query_string:
|
||||
query = QueryParser("content", self.schema).parse(query_string)
|
||||
|
||||
#query = QueryParser("content",
|
||||
# self.schema
|
||||
#).parse(query_string)
|
||||
query = QueryParser("content",
|
||||
self.schema,
|
||||
termclass=query.Variations
|
||||
)
|
||||
query.add_plugin(DateParserPlugin(free=True))
|
||||
query = query.parse(query_string)
|
||||
elif len(fields) == 1 and fields[0] == "filename":
|
||||
pass
|
||||
elif len(fields) == 2:
|
||||
@@ -1116,9 +1296,12 @@ class Search:
|
||||
else:
|
||||
# If the user does not specify a field,
|
||||
# these are the fields that are actually searched
|
||||
fields = ['title', 'content','owner_name','owner_email','url']
|
||||
fields = ['title', 'content','owner_name','owner_email','url','created_date','modified_date']
|
||||
if not query:
|
||||
query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
|
||||
query = MultifieldParser(fields, schema=self.ix.schema)
|
||||
query.add_plugin(DateParserPlugin(free=True))
|
||||
query = query.parse(query_string)
|
||||
#query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
|
||||
parsed_query = "%s" % query
|
||||
print("query: %s" % parsed_query)
|
||||
results = searcher.search(query, terms=False, scored=True, groupedby="kind")
|
||||
|
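A compact sketch of the parser setup this hunk introduces, using a toy schema. It shows how `DateParserPlugin(free=True)` lets date expressions take part in query parsing; the field names are illustrative and the real call uses the centillion schema.

```
from whoosh import fields
from whoosh.qparser import MultifieldParser
from whoosh.qparser.dateparse import DateParserPlugin

schema = fields.Schema(title=fields.TEXT(stored=True),
                       content=fields.TEXT(stored=True),
                       created_time=fields.DATETIME(stored=True))

parser = MultifieldParser(["title", "content"], schema=schema)
parser.add_plugin(DateParserPlugin(free=True))   # allow free-form dates in queries

query = parser.parse(u"created_time:20180810 centillion")
print(query)
```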
disqus_util.py (new file, 154 lines)
@@ -0,0 +1,154 @@
|
||||
import os, re
|
||||
import requests
|
||||
import json
|
||||
import dateutil.parser
|
||||
|
||||
from pprint import pprint
|
||||
|
||||
"""
|
||||
Convenience class wrapper for Disqus comments.
|
||||
|
||||
This requires that the user provide either their
|
||||
API OAuth application credentials (in which case
|
||||
a user needs to authenticate with the application
|
||||
so it can access the comments that they can see)
|
||||
or user credentials from a previous login.
|
||||
"""
|
||||
|
||||
class DisqusCrawler(object):
|
||||
|
||||
def __init__(self,
|
||||
credentials,
|
||||
group_name):
|
||||
|
||||
self.credentials = credentials
|
||||
self.group_name = group_name
|
||||
self.crawled_comments = False
|
||||
self.threads = None
|
||||
|
||||
|
||||
def get_threads(self):
|
||||
"""
|
||||
Return a list of dictionaries containing
|
||||
entries for each comment thread in the given
|
||||
disqus forum.
|
||||
"""
|
||||
return self.threads
|
||||
|
||||
|
||||
def crawl_threads(self):
|
||||
"""
|
||||
This will use the API to get every thread,
|
||||
and will iterate through every thread to
|
||||
get every comment thread.
|
||||
"""
|
||||
# The money shot
|
||||
threads = {}
|
||||
|
||||
# list all threads
|
||||
list_threads_url = 'https://disqus.com/api/3.0/threads/list.json'
|
||||
|
||||
# list all posts (comments)
|
||||
list_posts_url = 'https://disqus.com/api/3.0/threads/listPosts.json'
|
||||
|
||||
base_params = dict(
|
||||
api_key=self.credentials,
|
||||
forum=self.group_name
|
||||
)
|
||||
|
||||
# prepare url params
|
||||
params = {}
|
||||
for k in base_params.keys():
|
||||
params[k] = base_params[k]
|
||||
|
||||
# make api call (first loop in fencepost)
|
||||
results = requests.request('GET', list_threads_url, params=params).json()
|
||||
cursor = results['cursor']
|
||||
responses = results['response']
|
||||
|
||||
while True:
|
||||
|
||||
for response in responses:
|
||||
if '127.0.0.1' not in response['link'] and 'localhost' not in response['link']:
|
||||
|
||||
# Save thread info
|
||||
thread_id = response['id']
|
||||
thread_count = response['posts']
|
||||
|
||||
print("Working on thread %s (%d posts)"%(thread_id,thread_count))
|
||||
if thread_count > 0:
|
||||
|
||||
# prepare url params
|
||||
params_comments = {}
|
||||
for k in base_params.keys():
|
||||
params_comments[k] = base_params[k]
|
||||
|
||||
params_comments['thread'] = thread_id
|
||||
|
||||
# make api call
|
||||
results_comments = requests.request('GET', list_posts_url, params=params_comments).json()
|
||||
cursor_comments = results_comments['cursor']
|
||||
responses_comments = results_comments['response']
|
||||
|
||||
# Save comments for this thread
|
||||
thread_comments = []
|
||||
|
||||
while True:
|
||||
for comment in responses_comments:
|
||||
# Save comment info
|
||||
print(" + %s"%(comment['message']))
|
||||
thread_comments.append(comment['message'])
|
||||
|
||||
if cursor_comments['hasNext']:
|
||||
|
||||
# Prepare for the next URL call
|
||||
params_comments = {}
|
||||
for k in base_params.keys():
|
||||
params_comments[k] = base_params[k]
|
||||
params_comments['thread'] = thread_id
|
||||
params_comments['cursor'] = cursor_comments['next']
|
||||
|
||||
# Make the next URL call
|
||||
results_comments = requests.request('GET', list_posts_url, params=params_comments).json()
|
||||
cursor_comments = results_comments['cursor']
|
||||
responses_comments = results_comments['response']
|
||||
|
||||
else:
|
||||
break
|
||||
|
||||
link = response['link']
|
||||
clean_link = re.sub('data-commons.us','nihdatacommons.us',link)
|
||||
clean_link += "#disqus_comments"
|
||||
|
||||
# Finished working on thread.
|
||||
|
||||
# We need to make this value a dictionary
|
||||
thread_info = dict(
|
||||
id = response['id'],
|
||||
created_time = dateutil.parser.parse(response['createdAt']),
|
||||
title = response['title'],
|
||||
forum = response['forum'],
|
||||
link = clean_link,
|
||||
content = "\n\n-----".join(thread_comments)
|
||||
)
|
||||
threads[thread_id] = thread_info
|
||||
|
||||
|
||||
if 'hasNext' in cursor.keys() and cursor['hasNext']:
|
||||
|
||||
# Prepare for next URL call
|
||||
params = {}
|
||||
for k in base_params.keys():
|
||||
params[k] = base_params[k]
|
||||
params['cursor'] = cursor['next']
|
||||
|
||||
# Make the next URL call
|
||||
results = requests.request('GET', list_threads_url, params=params).json()
|
||||
cursor = results['cursor']
|
||||
responses = results['response']
|
||||
|
||||
else:
|
||||
break
|
||||
|
||||
self.threads = threads
|
||||
|
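The crawl loop above follows Disqus's cursor-based pagination. A trimmed sketch of the same pattern for the `threads/list.json` endpoint only, with a placeholder API key; it is not a drop-in replacement for the class.

```
import requests

# Placeholder credentials; the crawler is given the real API key and forum name.
params = {"api_key": "<disqus-api-key>", "forum": "dcppc-internal"}
url = "https://disqus.com/api/3.0/threads/list.json"

threads = []
while True:
    data = requests.request('GET', url, params=params).json()
    threads.extend(data['response'])

    cursor = data['cursor']
    if cursor.get('hasNext'):
        params['cursor'] = cursor['next']   # ask for the next page
    else:
        break

print("Fetched %d threads" % len(threads))
```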
@@ -1,5 +1,7 @@
|
||||
import requests, os, re
|
||||
from bs4 import BeautifulSoup
|
||||
import dateutil.parser
|
||||
import datetime
|
||||
|
||||
class GroupsIOException(Exception):
|
||||
pass
|
||||
@@ -64,7 +66,7 @@ class GroupsIOArchivesCrawler(object):
|
||||
|
||||
## Short circuit
|
||||
## for debugging purposes
|
||||
#break
|
||||
break
|
||||
|
||||
return subgroups
|
||||
|
||||
@@ -251,7 +253,7 @@ class GroupsIOArchivesCrawler(object):
|
||||
subject = soup.find('title').text
|
||||
|
||||
# Extract information for the schema:
|
||||
# - permalink for thread (done)
|
||||
# - permalink for thread (done above)
|
||||
# - subject/title (done)
|
||||
# - original sender email/name (done)
|
||||
# - content (done)
|
||||
@@ -266,11 +268,35 @@ class GroupsIOArchivesCrawler(object):
|
||||
pass
|
||||
else:
|
||||
# found an email!
|
||||
# this is a maze, thanks groups.io
|
||||
# this is a maze, not amazing.
|
||||
# thanks groups.io!
|
||||
td = tr.find('td')
|
||||
divrow = td.find('div',{'class':'row'}).find('div',{'class':'pull-left'})
|
||||
|
||||
sender_divrow = td.find('div',{'class':'row'})
|
||||
sender_divrow = sender_divrow.find('div',{'class':'pull-left'})
|
||||
if (i+1)==1:
|
||||
original_sender = divrow.text.strip()
|
||||
original_sender = sender_divrow.text.strip()
|
||||
|
||||
date_divrow = td.find('div',{'class':'row'})
|
||||
date_divrow = date_divrow.find('div',{'class':'pull-right'})
|
||||
date_divrow = date_divrow.find('font',{'class':'text-muted'})
|
||||
date_divrow = date_divrow.find('script').text
|
||||
try:
|
||||
time_seconds = re.search(' [0-9]{1,} ',date_divrow).group(0)
|
||||
time_seconds = time_seconds.strip()
|
||||
# Thanks groups.io for the weird date formatting
|
||||
time_seconds = time_seconds[:10]
|
||||
mmicro_seconds = time_seconds[10:]
|
||||
if (i+1)==1:
|
||||
created_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
|
||||
modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
|
||||
else:
|
||||
modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
|
||||
|
||||
except AttributeError:
|
||||
created_time = None
|
||||
modified_time = None
|
||||
|
||||
for div in td.find_all('div'):
|
||||
if div.has_attr('id'):
|
||||
|
||||
@@ -299,7 +325,10 @@ class GroupsIOArchivesCrawler(object):
|
||||
|
||||
thread = {
|
||||
'permalink' : permalink,
|
||||
'created_time' : created_time,
|
||||
'modified_time' : modified_time,
|
||||
'subject' : subject,
|
||||
'subgroup' : subgroup_name,
|
||||
'original_sender' : original_sender,
|
||||
'content' : full_content
|
||||
}
|
||||
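The `created_time`/`modified_time` values recorded here come from the epoch-seconds value that the previous hunk pulls out of a `<script>` tag with a regex. A self-contained sketch of that conversion, using a made-up script snippet:

```
import re
import datetime

# Made-up example of the <script> text the crawler sees on a groups.io page.
script_text = 'document.write(ts( 1533867706 , "short"));'

match = re.search(' [0-9]{1,} ', script_text)
if match:
    time_seconds = match.group(0).strip()[:10]   # keep the epoch-seconds digits
    created_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
    print(created_time)   # 2018-08-10 02:21:46 (UTC)
else:
    created_time = None
```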
@@ -324,11 +353,13 @@ class GroupsIOArchivesCrawler(object):
|
||||
|
||||
results = []
|
||||
for row in rows:
|
||||
# We don't care about anything except title and ugly link
|
||||
# This is where we extract
|
||||
# a list of thread titles
|
||||
# and corresponding links.
|
||||
subject = row.find('span',{'class':'subject'})
|
||||
title = subject.get_text()
|
||||
link = row.find('a')['href']
|
||||
#print(title)
|
||||
|
||||
results.append((title,link))
|
||||
|
||||
return results
|
||||
|
hypothesis_util.py (new file, 89 lines)
@@ -0,0 +1,89 @@
|
||||
import requests
|
||||
import json
|
||||
import os
|
||||
|
||||
def get_headers():
|
||||
|
||||
if 'HYPOTHESIS_TOKEN' in os.environ:
|
||||
token = os.environ['HYPOTHESIS_TOKEN']
|
||||
else:
|
||||
raise Exception("Need to specify Hypothesis token with HYPOTHESIS_TOKEN env var")
|
||||
|
||||
auth_header = 'Bearer %s'%(token)
|
||||
|
||||
return {'Authorization': auth_header}
|
||||
|
||||
|
||||
def basic_auth():
|
||||
|
||||
url = ' https://hypothes.is/api'
|
||||
|
||||
# Get the authorization header
|
||||
headers = get_headers()
|
||||
|
||||
# Make the request
|
||||
response = requests.get(url, headers=headers)
|
||||
|
||||
if response.status_code==200:
|
||||
|
||||
# Interpret results as JSON
|
||||
dat = response.json()
|
||||
print(json.dumps(dat, indent=4))
|
||||
|
||||
else:
|
||||
|
||||
print("Response status code was not OK: %d"%(response.status_code))
|
||||
|
||||
|
||||
def list_annotations():
|
||||
# kEaohJC9Eeiy_UOozkpkyA
|
||||
|
||||
url = 'https://hypothes.is/api/annotations/kEaohJC9Eeiy_UOozkpkyA'
|
||||
|
||||
# Get the authorization header
|
||||
headers = get_headers()
|
||||
|
||||
# Make the request
|
||||
response = requests.get(url, headers=headers)
|
||||
|
||||
if response.status_code==200:
|
||||
|
||||
# Interpret results as JSON
|
||||
dat = response.json()
|
||||
print(json.dumps(dat, indent=4))
|
||||
|
||||
else:
|
||||
|
||||
print("Response status code was not OK: %d"%(response.status_code))
|
||||
|
||||
|
||||
def search_annotations():
|
||||
url = ' https://hypothes.is/api/search'
|
||||
|
||||
# Get the authorization header
|
||||
headers = get_headers()
|
||||
|
||||
# Set query params
|
||||
params = dict(
|
||||
url = '*pilot.nihdatacommons.us*',
|
||||
limit = 200
|
||||
)
|
||||
#http://pilot.nihdatacommons.us/organize/CopperInternalDeliveryWorkFlow/',
|
||||
|
||||
# Make the request
|
||||
response = requests.get(url, headers=headers, params=params)
|
||||
|
||||
if response.status_code==200:
|
||||
|
||||
# Interpret results as JSON
|
||||
dat = response.json()
|
||||
print(json.dumps(dat, indent=4))
|
||||
|
||||
else:
|
||||
|
||||
print("Response status code was not OK: %d"%(response.status_code))
|
||||
|
||||
|
||||
if __name__=="__main__":
|
||||
search_annotations()
|
||||
|
@@ -1,181 +0,0 @@
|
||||
# Centillion Quality Engineering Plan
|
||||
|
||||
Table of Contents
|
||||
-------
|
||||
|
||||
* [Centillion Quality Engineering Plan](#centillion-quality-engineering-plan)
|
||||
* [Summary](#summary)
|
||||
* [Tracking Bugs and Issues](#tracking-bugs-and-issues)
|
||||
* [Branches, Versioning, and Git Workflow](#branches-versioning-and-git-workflow)
|
||||
* [Communication and Mailing Lists](#communication-and-mailing-lists)
|
||||
* [Checklists](#checklists)
|
||||
* [Documentation](#documentation)
|
||||
* [Configuration Management Tools](#configuration-management-tools)
|
||||
* [Tests](#tests)
|
||||
* [Code Reviews](#code-reviews)
|
||||
* [Formal Release Process](#formal-release-process)
|
||||
* [Continual Process Improvement](#continual-process-improvement)
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
This document contains a quality engineering plan for centillion, the
|
||||
Data Commons search engine.
|
||||
|
||||
Tracking Bugs and Issues
|
||||
------------------------
|
||||
|
||||
We utilize the [issues
|
||||
section](https://github.com/dcppc/centillion/issues) of the centillion
|
||||
repository to keep track of bugs and feature requests.
|
||||
|
||||
Branches, Versioning, and Git Workflow
|
||||
--------------------------------------
|
||||
|
||||
All code is kept under version control in the
|
||||
[dcppc/centillion](https://github.com/dcppc/centillion) Github
|
||||
repository.
|
||||
|
||||
**Primary Git Branches:**
|
||||
|
||||
We utilize a git branch pattern that has two primary branches: a
|
||||
development branch and a stable branch.
|
||||
|
||||
- The primary **development branch** is `dcppc` and is actively
|
||||
developed and deployed to <https://betasearch.nihdatacommons.us>.
|
||||
|
||||
- The primary **stable branch** is `releases/v1` and is stable and
|
||||
deployed to <https://search.nihdatacommons.us>.
|
||||
|
||||
All tagged versions of Centillion exist on the stable branch. Only
|
||||
tagged versions of centillion are run on
|
||||
<https://search.nihdatacommons.us>.
|
||||
|
||||
**Other Branches:**
|
||||
|
||||
Features are developed by creating a new branch from `dcppc`, working on
|
||||
the feature, and opening a pull request. When the pull request is
|
||||
approved, it can be merged into the `dcppc` branch.
|
||||
|
||||
When features have accumulated and a new version is ready, a new
|
||||
pre-release branch will be made to prepare for a new release. When the
|
||||
pre-release branch is ready, it is merged into the stable branch in a
|
||||
single merge commit and a new version of centillion is tagged. The new
|
||||
version is deployed on <https://search.nihdatacommons.us>.
|
||||
|
||||
Commits to fix bugs (hotfixes) may need to be applied to both the stable
|
||||
and development branches. In this case, a hotfix branch should be
|
||||
created from the head commit of the stable branch, and the appropriate
|
||||
changes should be made on the branch. A pull request should be opened to
|
||||
merge the hotfix into the release branch. A second pull request should
|
||||
be opened to merge the hotfix into the development branch. Once the
|
||||
hotfix is merged into the stable branch, a new version should be tagged.
|
||||
|
||||
Communication and Mailing Lists
|
||||
-------------------------------
|
||||
|
||||
- No mailing list currently exists for centillion.
|
||||
|
||||
- Github issues are the primary form of communication about
|
||||
development of centillion. This is the best method for communicating
|
||||
bug reports or detailed information.
|
||||
|
||||
- The Send Feedback button on the centillion page is the primary way
|
||||
of getting quick feedback from users about the search engine.
|
||||
|
||||
- The [\#centillion](https://nih-dcppc.slack.com/messages/CCD64QD6G)
|
||||
Slack channel in the DCPPC slack workspace is the best place for
|
||||
conversations about centillion (providing feedback, answering quick
|
||||
questions, etc.)
|
||||
|
||||
Checklists
|
||||
----------
|
||||
|
||||
We plan to utilize the Wiki feature of the Github repository to develop
|
||||
checklists:
|
||||
|
||||
- Checklist for releases
|
||||
- Checklist for deployment of https://search.nihdatacommons.us nginx
|
||||
etc.
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
The documentation is a pile of markdown documents, turned into a static
|
||||
site using mkdocs.
|
||||
|
||||
Configuration Management Tools
|
||||
------------------------------
|
||||
|
||||
We do not currently utilize any configuration management software,
|
||||
because centillion is not packaged as an importable Python module.
|
||||
|
||||
Packaging centillion is a future goal that is closely related to the
|
||||
need to improve and modularize the internal search schema/document type
|
||||
abstraction. These improvements would allow the types of collections
|
||||
being indexed to be separated from "core centillion", and core
|
||||
centillion would be packaged.
|
||||
|
||||
Tests
|
||||
-----
|
||||
|
||||
See (ref) for a full test plan with more detail.
|
||||
|
||||
Summary of test plan:
|
||||
|
||||
- Implement tests for the four major pages/components
|
||||
- Login/authentication
|
||||
- Search
|
||||
- Master List
|
||||
- Control Panel
|
||||
- Test authentication with two bot accounts (yammasnake and florence
|
||||
python)
|
||||
|
||||
- Separate frontend and backend tests
|
||||
|
||||
- Add a test flag in the flask config file to change the backend
|
||||
behavior of the server
|
||||
|
||||
Code Reviews
|
||||
------------
|
||||
|
||||
CI tests will be implemented for all pull requests.
|
||||
|
||||
Pull requests to the **stable branch** have the following checks in
|
||||
place:
|
||||
|
||||
- PRs to the stable branch require at least 1 PR review
|
||||
- PRs to the stable branch must pass CI tests
|
||||
|
||||
Pull requests to the **development branch** have the following checks in
|
||||
place:
|
||||
|
||||
- PRs to the development branch must pass CI tests
|
||||
|
||||
Formal Release Process
|
||||
----------------------
|
||||
|
||||
In order to ensure a stable, consistent product, we utilize the
|
||||
branching pattern described above to implement new features in the
|
||||
development branch and test them out on
|
||||
<https://betasearch.nihdatacommons.us>.
|
||||
|
||||
Once features and bug fixes have been tested and reviewed internally,
|
||||
they are ready to be deployed. A new pre-release branch is created from
|
||||
the development branch. The pre-release branch has a feature freeze in
|
||||
place. Changes are made to the pre-release branch to prepare it for the
|
||||
next major version release.
|
||||
|
||||
When the pre-release branch is finished, it is merged into the stable
|
||||
branch. The head commit of the stable branch is tagged with the latest
|
||||
release number.
|
||||
|
||||
Finally, the new version is deployed on
|
||||
<https://search.nihdatacommons.us>.
|
||||
|
||||
Continual Process Improvement
|
||||
-----------------------------
|
||||
|
||||
We will utilize the centillion wiki on Github to keep track of repeated
|
||||
processes and opportunities for improvement. Feedback and ideas for
|
||||
process improvement can also be submitted via Github issues.
|
static/centillion_white_beta.png (new binary file, 29 KiB, not shown)
static/centillion_white_localhost.png (new binary file, 30 KiB, not shown)
@@ -22,6 +22,7 @@ var initIssuesTable = false;
var initGhfilesTable = false;
var initMarkdownTable = false;
var initEmailthreadsTable = false;
var initDisqusTable = false;

$(document).ready(function() {
var url_string = document.location.toString();
@@ -32,10 +33,6 @@ $(document).ready(function() {
load_gdoc_table();
var divList = $('div#collapseDrive').addClass('in');

} else if (d==='emailthread') {
load_emailthreads_table();
var divList = $('div#collapseThreads').addClass('in');

} else if (d==='issue') {
load_issue_table();
var divList = $('div#collapseIssues').addClass('in');
@@ -48,10 +45,37 @@ $(document).ready(function() {
load_markdown_table();
var divList = $('div#collapseMarkdown').addClass('in');

} else if (d==='emailthread') {
load_emailthreads_table();
var divList = $('div#collapseThreads').addClass('in');

} else if (d==='disqus') {
load_disqusthreads_table();
var divList = $('div#collapseDisqus').addClass('in');

}
});


//////////////////////////////////
// utility functions

// https://stackoverflow.com/a/25275808
function iso8601(date) {
var hours = date.getHours();
var minutes = date.getMinutes();
var ampm = hours >= 12 ? 'PM' : 'AM';
hours = hours % 12;
hours = hours ? hours : 12; // the hour '0' should be '12'
minutes = minutes < 10 ? '0'+minutes : minutes;
var strTime = hours + ':' + minutes + ' ' + ampm;
return date.getYear() + "-" + (date.getMonth()+1) + "-" + date.getDate() + " " + strTime;
}

// https://stackoverflow.com/a/7390612
var toType = function(obj) {
return ({}).toString.call(obj).match(/\s([a-zA-Z]+)/)[1].toLowerCase()
}

//////////////////////////////////
// API-to-Table Functions
@@ -77,9 +101,9 @@ function load_gdoc_table(){
if(!initGdocTable) {
var divList = $('div#collapseDrive').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Google Drive master list');
//console.log('Closing Google Drive master list');
} else {
console.log('Opening Google Drive master list');
//console.log('Opening Google Drive master list');

$.getJSON("/list/gdoc", function(result){

@@ -125,7 +149,7 @@ function load_gdoc_table(){

initGdocTable = true
});
console.log('Finished loading Google Drive master list');
//console.log('Finished loading Google Drive master list');
}
}
}
@@ -137,9 +161,9 @@ function load_issue_table(){
if(!initIssuesTable) {
var divList = $('div#collapseIssues').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Github issues master list');
//console.log('Closing Github issues master list');
} else {
console.log('Opening Github issues master list');
//console.log('Opening Github issues master list');

$.getJSON("/list/issue", function(result){
var r = new Array(), j = -1, size=result.length;
@@ -183,7 +207,7 @@ function load_issue_table(){

initIssuesTable = true;
});
console.log('Finished loading Github issues master list');
//console.log('Finished loading Github issues master list');
}
}
}
@@ -195,13 +219,13 @@ function load_ghfile_table(){
if(!initGhfilesTable) {
var divList = $('div#collapseFiles').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Github files master list');
//console.log('Closing Github files master list');
} else {
console.log('Opening Github files master list');
//console.log('Opening Github files master list');

$.getJSON("/list/ghfile", function(result){
console.log("-----------");
console.log(result);
//console.log("-----------");
//console.log(result);
var r = new Array(), j = -1, size=result.length;
r[++j] = '<thead>'
r[++j] = '<tr class="header-row">';
@@ -237,7 +261,7 @@ function load_ghfile_table(){

initGhfilesTable = true;
});
console.log('Finished loading Github file list');
//console.log('Finished loading Github file list');
}
}
}
@@ -249,9 +273,9 @@ function load_markdown_table(){
if(!initMarkdownTable) {
var divList = $('div#collapseMarkdown').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Github markdown master list');
//console.log('Closing Github markdown master list');
} else {
console.log('Opening Github markdown master list');
//console.log('Opening Github markdown master list');

$.getJSON("/list/markdown", function(result){
var r = new Array(), j = -1, size=result.length;
@@ -289,7 +313,7 @@ function load_markdown_table(){

initMarkdownTable = true;
});
console.log('Finished loading Markdown list');
//console.log('Finished loading Markdown list');
}
}
}
@@ -302,16 +326,18 @@ function load_emailthreads_table(){
if(!initEmailthreadsTable) {
var divList = $('div#collapseThreads').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Groups.io email threads master list');
//console.log('Closing Groups.io email threads master list');
} else {
console.log('Opening Groups.io email threads master list');
//console.log('Opening Groups.io email threads master list');

$.getJSON("/list/emailthread", function(result){
var r = new Array(), j = -1, size=result.length;
r[++j] = '<thead>'
r[++j] = '<tr class="header-row">';
r[++j] = '<th width="70%">Topic</th>';
r[++j] = '<th width="30%">Started By</th>';
r[++j] = '<th width="60%">Topic</th>';
r[++j] = '<th width="15%">Started By</th>';
r[++j] = '<th width="15%">Date</th>';
r[++j] = '<th width="10%">Mailing List</th>';
r[++j] = '</tr>';
r[++j] = '</thead>'
r[++j] = '<tbody>'
@@ -322,6 +348,10 @@ function load_emailthreads_table(){
r[++j] = '</a>'
r[++j] = '</td><td>';
r[++j] = result[i]['owner_name'];
r[++j] = '</td><td>';
r[++j] = result[i]['created_time'];
r[++j] = '</td><td>';
r[++j] = result[i]['group'];
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
@@ -340,7 +370,57 @@ function load_emailthreads_table(){

initEmailthreadsTable = true;
});
console.log('Finished loading Groups.io email threads list');
//console.log('Finished loading Groups.io email threads list');
}
}
}

// ------------------------
// Disqus Comment Threads

function load_disqusthreads_table(){
if(!initEmailthreadsTable) {
var divList = $('div#collapseDisqus').attr('class');
if (divList.indexOf('in') !== -1) {
//console.log('Closing Disqus comment threads master list');
} else {
//console.log('Opening Disqus comment threads master list');

$.getJSON("/list/disqus", function(result){
var r = new Array(), j = -1, size=result.length;
r[++j] = '<thead>'
r[++j] = '<tr class="header-row">';
r[++j] = '<th width="70%">Page Title</th>';
r[++j] = '<th width="30%">Created</th>';
r[++j] = '</tr>';
r[++j] = '</thead>'
r[++j] = '<tbody>'
for (var i=0; i<size; i++){
r[++j] ='<tr><td>';
r[++j] = '<a href="' + result[i]['url'] + '" target="_blank">'
r[++j] = result[i]['title'];
r[++j] = '</a>'
r[++j] = '</td><td>';
r[++j] = result[i]['created_time'];
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'

// Construct names of id tags
var doctype = 'disqus';
var idlabel = '#' + doctype + '-master-list';
var filtlabel = idlabel + '_filter';

// Initialize the DataTable
$(idlabel).html(r.join(''));
$(idlabel).DataTable({
responsive: true,
lengthMenu: [50,100,250,500]
});

initDisqusTable = true;
});
console.log('Finished loading Disqus comment threads list');
}
}
}
@@ -31,7 +31,7 @@ $(document).ready(function() {
aTargets : [2]
}
],
lengthMenu: [50,100,250,500]
lengthMenu: [10,20,50,100]
});

console.log('Finished loading search results list');
@@ -86,6 +86,14 @@ div.container {
}

/* badges for number of docs indexed */
span.results-count {
background-color: #555;
}

span.indexing-count {
background-color: #337ab7;
}

span.badge {
vertical-align: text-bottom;
}
@@ -192,7 +200,7 @@ table {

.info, .last-searches {
color: gray;
font-size: 12px;
/*font-size: 12px;*/
font-family: Arial, serif;
}

@@ -202,27 +210,27 @@ table {

div.tags a, td.tag-cloud a {
color: #b56020;
font-size: 12px;
/*font-size: 12px;*/
}

td.tag-cloud, td.directories-cloud {
font-size: 12px;
/*font-size: 12px;*/
color: #555555;
}

td.directories-cloud a {
font-size: 12px;
/*font-size: 12px;*/
color: #377BA8;
}

div.path {
font-size: 12px;
/*font-size: 12px;*/
color: #666666;
margin-bottom: 3px;
}

div.path a {
font-size: 12px;
/*font-size: 12px;*/
margin-right: 5px;
}
@@ -7,11 +7,18 @@
<div class="col12sm" id="banner-col">
<center>
<a id="banner-a" href="{{ url_for('search')}}?query=&fields=">
<img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
{% if 'betasearch' in request.url %}
<img id="banner-img" src="{{ url_for('static', filename='centillion_white_beta.png') }}">
{% elif 'localhost' in request.url %}
<img id="banner-img" src="{{ url_for('static', filename='centillion_white_localhost.png') }}">
{% else %}
<img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
{% endif %}
</a>
</center>
</div>
</div>

{% if config['TAGLINE'] %}
<div class="row" id="tagline-row">
<div class="col12sm" id="tagline-col">
@@ -54,6 +54,8 @@
</p>
<p><a href="{{ url_for('update_index',run_which='emailthreads') }}" class="btn btn-large btn-danger btn-reindex-type">Update Groups.io Email Threads Index</a>
</p>
<p><a href="{{ url_for('update_index',run_which='disqus') }}" class="btn btn-large btn-danger btn-reindex-type">Update Disqus Comment Threads Index</a>
</p>
</div>
</div>
</div>
@@ -5,7 +5,7 @@
<div class="alert alert-success alert-dismissible fade in">
<a href="#" class="close" data-dismiss="alert" aria-label="close">×</a>
{% for message in messages %}
<p class="lead">{{ message }}</p>
<p>{{ message }}</p>
{% endfor %}
</div>
</div>
@@ -9,8 +9,9 @@
<div class="row">

{#
# google drive files panel
#}
# google drive files panel
#}
<a name="gdoc"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionDrive" role="tablist" aria-multiselectable="true">
@@ -46,8 +47,9 @@


{#
# github issue panel
#}
# github issue panel
#}
<a name="issue"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionIssues" role="tablist" aria-multiselectable="true">
@@ -85,8 +87,9 @@


{#
# github file panel
#}
# github file panel
#}
<a name="ghfile"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionFiles" role="tablist" aria-multiselectable="true">
@@ -122,8 +125,9 @@


{#
# gh markdown file panel
#}
# gh markdown file panel
#}
<a name="markdown"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionMarkdown" role="tablist" aria-multiselectable="true">
@@ -160,8 +164,9 @@


{#
# groups.io
#}
# groups.io email threads
#}
<a name="emailthread"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionThreads" role="tablist" aria-multiselectable="true">
@@ -195,6 +200,42 @@
</div>
</div>

{#
# disqus comment threads
#}
<a name="disqus"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionDisqus" role="tablist" aria-multiselectable="true">
<div class="panel panel-default">
<div class="panel-heading" role="tab" id="disqus">

<h2 class="masterlist-header">
<a class="collapsed"
role="button"
onClick="load_disqusthreads_table()"
data-toggle="collapse"
data-parent="#accordionDisqus"
href="#collapseDisqus"
aria-expanded="true"
aria-controls="collapseDisqus">
Disqus Comment Threads <small>indexed by centillion</small>
</a>
</h2>

</div>
<div id="collapseDisqus" class="panel-collapse collapse" role="tabpanel"
aria-labelledby="disqus">
<div class="panel-body">
<table class="table table-striped" id="disqus-master-list">

</table>
</div>
</div>
</div>
</div>
</div>
</div>


</div>
@@ -52,8 +52,8 @@
<div class="container-fluid">
<div class="row">
<div class="col-xs-12 info">
<b>Found:</b> <span class="badge">{{entries|length}}</span> results
out of <span class="badge">{{totals["total"]}}</span> total items indexed
<b>Found:</b> <span class="badge results-count">{{entries|length}}</span> results
out of <span class="badge results-count">{{totals["total"]}}</span> total items indexed
</div>
</div>
</div>
@@ -67,35 +67,59 @@
<div class="col-xs-12 info">
<b>Indexing:</b>

<span class="badge">{{totals["gdoc"]}}</span>
<a href="/master_list?doctype=gdoc">
<span class="badge indexing-count">{{totals["gdoc"]}}</span>
<a href="/master_list?doctype=gdoc#gdoc">
Google Drive files
</a>,

<span class="badge">{{totals["issue"]}}</span>
<a href="/master_list?doctype=issue">
<span class="badge indexing-count">{{totals["issue"]}}</span>
<a href="/master_list?doctype=issue#issue">
Github issues
</a>,

<span class="badge">{{totals["ghfile"]}}</span>
<a href="/master_list?doctype=ghfile">
<span class="badge indexing-count">{{totals["ghfile"]}}</span>
<a href="/master_list?doctype=ghfile#ghfile">
Github files
</a>,

<span class="badge">{{totals["markdown"]}}</span>
<a href="/master_list?doctype=markdown">
<span class="badge indexing-count">{{totals["markdown"]}}</span>
<a href="/master_list?doctype=markdown#markdown">
Github Markdown files
</a>,

<span class="badge">{{totals["emailthread"]}}</span>
<a href="/master_list?doctype=emailthread">
<span class="badge indexing-count">{{totals["emailthread"]}}</span>
<a href="/master_list?doctype=emailthread#emailthread">
Groups.io email threads
</a>,

<span class="badge indexing-count">{{totals["disqus"]}}</span>
<a href="/master_list?doctype=disqus#disqus">
Disqus comment threads
</a>
</div>
</div>
</div>
</li>


{#
# more options...
#}
<li class="list-group-item">
<div class="container-fluid">
<div class="row">
<div class="col-xs-12 info">
<b>More Options <i class="fa fa-chevron-down"></i></b>
</div>
</div>
</div>
</li>


</ul>
</div>
</div>
tests/Readme.md (196 lines removed)
@@ -1,196 +0,0 @@
Centillion Tests
================

Table of Contents
------------------

* [Centillion Tests](#centillion-tests)
* [Test Plan](#test-plan)
* [Local Tests](#local-tests)
* [Short Tests](#short-tests)
* [Long Tests](#long-tests)
* [Credentials](#credentials)
* [Detailed Description of Tests](#detailed-description-of-tests)
* [Authentication Layer Tests](#authentication-layer-tests)
* [Search Function Tests](#search-function-tests)
* [Master List Endpoint Tests](#master-list-endpoint-tests)
* [Control Panel Endpoint Tests](#control-panel-endpoint-tests)
* [Continuous Integration Plan](#continuous-integration-plan)
* [Procedure/Checklist](#procedurechecklist)


Test Plan
---------

Related: <https://github.com/dcppc/centillion/issues/82>

The test suite for centillion needs to check each of the major
components of centillion, as well as check the authentication mechanism
using multiple login credentials.

We implement the following checks:

1. Check authentication mechanism(s) (yammasnake and florence python)

2. Check search function

3. Check master list endpoint

4. Check control panel endpoint

5. Check update search index endpoints

The tests are written such that the back end and front end are tested
separately.

We also need different tiers of tests, so we don't max out API
calls by making lots of commits to multiple PRs.

We have three tiers of tests:

* Local tests - quick tests for CI, no API calls
* Short tests - tests using dummy API accounts
* Long tests - tests using DCPPC API accounts

### Local Tests

Local tests can be run locally without any interaction with APIs. These
will still utilize centillion's search schema, but will load the search
index with fake documents rather than fetching them from an API (a
sketch of this approach appears at the end of this subsection).

Uncle Archie, which runs CI tests, runs local tests only (unless you
request that it run the short tests or long tests).
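A minimal sketch of the local-test idea: build an in-memory whoosh index with fake documents and query it, without touching any external APIs. The field names below are placeholders, not centillion's actual schema.

```python
# Sketch only: in-memory whoosh index loaded with fake documents.
# The schema fields here are illustrative, not centillion's real schema.
from whoosh.fields import Schema, ID, TEXT
from whoosh.filedb.filestore import RamStorage
from whoosh.qparser import QueryParser

schema = Schema(id=ID(stored=True, unique=True),
                kind=TEXT(stored=True),
                content=TEXT(stored=True))

ix = RamStorage().create_index(schema)

writer = ix.writer()
writer.add_document(id=u"fake-1", kind=u"gdoc", content=u"data commons pilot phase")
writer.add_document(id=u"fake-2", kind=u"issue", content=u"fix the search index")
writer.commit()

with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("search index")
    results = searcher.search(query)
    assert len(results) == 1  # only the fake Github issue should match
```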
### Short Tests

Short tests utilize credentials for bot accounts that have intentionally
been set up to have a "known" corpus of test documents. These would
provide unit-style tests for centillion - are the mechanics of indexing
a particular type of document from a particular API working?

### Long Tests

Long tests index the real deal, using the credentials used in
the final production centillion. These tests take longer but are more
likely to catch corner cases specific to the DCPPC documents.

Credentials
-----------

Running tests on centillion requires multiple sets of credentials. Let's
lay out what is needed:

- The Flask app requires a token/secret token API key pair to allow
  users to authenticate through Github and confirm they are members of
  the DCPPC organization. This OAuth application is owned by Charles
  Reid (@charlesreid1).

- The search index needs a Github access token so that it can
  interface with the Github API to index files and issues. This access
  token is specified (along with other secrets) in the Flask
  configuration file. The access key comes from Florence Python
  (@fp9695253).

- The search index also requires a Google Drive API access token. This
  must be an access token for a user who has authenticated with the
  Centillion Google Drive OAuth application. This access token comes
  from <mailroom@nihdatacommons.com>.

- The search index requires API credentials for any other APIs
  associated with other document collections (Groups.io, Hypothesis,
  Disqus).

- The backend test requires the credentials provided to Flask.

- The frontend test (Selenium) needs two Github username/passwords:
  one for Florence Python (@fp9695253) and one for Yamma Snake
  (@yammasnake). These are required to simulate the user
  authenticating with Github through the browser.
- The frontend test credentials are a special case.
- The frontend tests expect credentials to come from environment
  variables (see the sketch at the end of this section).
- These environment variables get passed in at test time.
- Tests are all run on [Uncle
  Archie](https://github.com/dcppc/uncle-archie).
- Uncle Archie already has to protect a confidential config file
  containing Github credentials, so add additional credentials for
  frontend tests there.
- Logical separation: these credentials are not needed to
  *operate* centillion, these credentials are needed to *test*
  centillion.
- Uncle Archie already requires Github credentials, and already
  protects sensitive info.
- Google Drive requiring its own credentials file on disk is a
  pain.

In summary: tests use the `config_flask.py` and `config_centillion.py`
files to provide centillion with the API keys it needs and to instruct it
on what to index. The credentials and config files will control what the
search index will actually index. The Uncle Archie CI tester config file
contains the credentials needed to run frontend tests (to check the
login/authentication layer).
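As a rough sketch of how the frontend test credentials could be passed in at test time via environment variables. The variable names below are illustrative guesses, not the names centillion or Uncle Archie actually use.

```python
# Sketch only: collect bot-account credentials injected by the CI environment.
# The environment variable names are assumptions for illustration.
import os

def get_frontend_credentials():
    """Read the two Github bot accounts' credentials from the environment."""
    return {
        "florence": {
            "username": os.environ["CENTILLION_TEST_FP_USERNAME"],
            "password": os.environ["CENTILLION_TEST_FP_PASSWORD"],
        },
        "yammasnake": {
            "username": os.environ["CENTILLION_TEST_YS_USERNAME"],
            "password": os.environ["CENTILLION_TEST_YS_PASSWORD"],
        },
    }
```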
Detailed Description of Tests
-----------------------------

### Authentication Layer Tests

Frontend tests run as Florence Python:

- Can we log in via Github and reach centillion
- Can we reach the control panel

Frontend tests run as Yamma Snake (DCPPC member):

- Can we log in via Github and reach centillion
- Can we reach the control panel

A browser-automation sketch of these checks follows.
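A minimal Selenium sketch of the login check, under the assumptions that the credentials come from the environment variables above, that Github's login form uses the `login`/`password`/`commit` field names, and that the centillion URL and `/control_panel` route are as guessed here.

```python
# Sketch only: drive a browser through the Github login and confirm that
# centillion and its control panel load. Field names on Github's login form,
# the centillion URL, and the control panel route are assumptions.
import os
from selenium import webdriver
from selenium.webdriver.common.by import By

CENTILLION_URL = os.environ.get("CENTILLION_TEST_URL", "http://localhost:5000")

def check_login(username, password):
    driver = webdriver.Firefox()
    try:
        driver.get(CENTILLION_URL)  # should redirect to the Github OAuth login
        driver.find_element(By.NAME, "login").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        driver.find_element(By.NAME, "commit").click()
        assert "centillion" in driver.title.lower()      # back on the search page
        driver.get(CENTILLION_URL + "/control_panel")    # can we reach the control panel?
        assert "panel" in driver.page_source.lower()
    finally:
        driver.quit()
```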
### Search Function Tests

Frontend tests:

- Can we enter something into the search box and submit
- Can we sort the results
- Do the results look okay

Backend tests:

- Load the search index and run a query using the whoosh API (see the
  sketch after this list)
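A minimal sketch of the backend search check. The index directory name and the `content` field name are assumptions, not necessarily what centillion uses.

```python
# Sketch only: open an existing whoosh index from disk and run a query.
# The index directory name and the "content" field name are assumptions.
from whoosh.index import open_dir
from whoosh.qparser import QueryParser

ix = open_dir("search_index")

with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("data commons")
    results = searcher.search(query, limit=10)
    # Smoke check: the query ran and each hit exposes its stored fields.
    for hit in results:
        print(hit.fields())
```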
### Master List Endpoint Tests

Frontend tests:

- Can we get to the master list page
- Can we sort the results
- Do the results look okay

Backend tests:

- Check the output of the `/list` API endpoint (see the sketch after
  this list)
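A rough sketch of checking the `/list` endpoint output with Flask's built-in test client. The import path of the app object is an assumption.

```python
# Sketch only: hit the /list endpoint with Flask's test client and check
# that it returns JSON the master list page can render.
from centillion import app  # assumption: the Flask app is importable like this

def test_list_gdoc_endpoint():
    client = app.test_client()
    response = client.get("/list/gdoc")
    assert response.status_code == 200
    payload = response.get_json()
    # The master list page builds a table from this JSON, so it should be a list
    assert isinstance(payload, list)
```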
### Control Panel Endpoint Tests

Frontend tests:

- Can we get to the control panel page
- Can we click the button to trigger an indexing event

Backend tests:

- Trigger a re-index of the search index from the backend (see the
  sketch after this list).
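A rough sketch of triggering a re-index from the backend through the `update_index` endpoint that the control panel template links to. The app import path is an assumption; the route is built with `url_for()` so the URL shape does not have to be guessed.

```python
# Sketch only: trigger a re-index via the update_index endpoint seen in the
# control panel template. The app import path is an assumption.
from flask import url_for
from centillion import app

def test_trigger_reindex():
    client = app.test_client()
    with app.test_request_context():
        reindex_url = url_for("update_index", run_which="disqus")
    response = client.get(reindex_url, follow_redirects=True)
    assert response.status_code == 200
```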
### Continuous Integration Plan

Tests are automatically run using Uncle Archie for continuous
integration and deployment.

Procedure/Checklist
-------------------

Pre-release procedure:

- prepare to run all tests

- run short tests
- deploy to beta
- run long tests
- test out