30 Commits

Author SHA1 Message Date
46ce070b09 fix styles 2018-08-24 08:31:57 -07:00
891fa50868 fix results boxes in results table to be gray 2018-08-24 02:30:49 -07:00
fdb3963ede tack on the disqus comments anchor to disqus URLs 2018-08-24 02:01:34 -07:00
90379a69c5 Merge pull request #92 from dcppc/add-date-subgrp-emailthreads
add string formatting for dates and add date/mailing list column to email threads master list
2018-08-24 01:58:29 -07:00
0faca67c35 add string formatting for dates and add date/mailing list column to email threads master list
closes #58
2018-08-24 01:56:19 -07:00
77b533b642 Merge pull request #86 from dcppc/disqus
Add Disqus
2018-08-24 01:18:37 -07:00
ccf013e3c9 Merge pull request #85 from dcppc/add-coc-dotgithub
Add Code of Conduct, Contributing, and PR template
2018-08-24 01:18:14 -07:00
e67db4f1ef Merge pull request #89 from dcppc/fix-flashed-messages-font
fix font used in flashed messages
2018-08-24 01:17:59 -07:00
b11a26a812 Merge pull request #91 from dcppc/merge-datetime-into-disqus
Merge datetime into disqus
2018-08-24 01:14:24 -07:00
55a74f7d98 Merge branch 'use-datetime' into merge-datetime-into-disqus
* use-datetime:
  extract date and time from email threads pages
  add groups and tags to schema; update how we determine timestamps; handle exceptions when we add the document to the writer, rather than elsewhere
  move where exception is caught (exception was also incorrect.)
  switched created_time, modified_time, indexed_time over to DATETIME. added DateParserPlugin to query QueryParser. added time fields to those being searched by default. tests do not seem to be working.
2018-08-24 01:13:42 -07:00
ab76226b0c Merge pull request #90 from dcppc/add-dates-and-subgroups-to-emails
Add dates and subgroups to emails
2018-08-24 00:07:40 -07:00
a4ebef6e6f extract date and time from email threads pages 2018-08-24 00:04:35 -07:00
bad50efa9b add groups and tags to schema; update how we determine timestamps; handle exceptions when we add the document to the writer, rather than elsewhere 2018-08-24 00:03:23 -07:00
629fc063db move where exception is caught (exception was also incorrect.) 2018-08-24 00:01:26 -07:00
4f41d8597f fix font used in flashed messages 2018-08-23 19:05:16 -07:00
3b0baa21de switched created_time, modified_time, indexed_time over to DATETIME. added DateParserPlugin to query QueryParser. added time fields to those being searched by default. tests do not seem to be working. 2018-08-23 19:01:40 -07:00
33b8857bd0 implement stop filter; implement query variations in main query parser 2018-08-23 17:15:48 -07:00
7c50fc9ff1 swap out data-commons with nihdatacommons in disqus urls 2018-08-23 17:15:10 -07:00
eb2cdf1437 fix a bug 2018-08-23 15:57:25 -07:00
c67e864581 add disqus threads to things being indexed by centillion 2018-08-23 15:55:59 -07:00
25cc12cf21 turn disqus_util into a crawler object 2018-08-23 15:55:21 -07:00
11c1185e62 clarify api call in disqus.md 2018-08-23 15:54:30 -07:00
17b2d359bb add contributing and code of conduct files 2018-08-23 11:03:48 -07:00
62ca62274e add github pull request template 2018-08-23 11:02:37 -07:00
74cfaf8275 add more notes on hypothesis API output 2018-08-22 15:23:28 -07:00
552caad135 add utilities to call disqus and hypothesis APIs
- both of these files are functions and are not integrated into centillion
2018-08-22 15:22:25 -07:00
19c42df978 update hypothesis/disqus notes 2018-08-22 12:45:51 -07:00
6f30e3f120 add api output from listThreads endpoint 2018-08-21 19:36:36 -07:00
ad6b653e27 add all the threads 2018-08-21 15:10:40 -07:00
501cae8329 Merge pull request #81 from dcppc/detect-beta-banner
Add custom banners for beta/localhost centillion instances
2018-08-21 13:18:11 -07:00
17 changed files with 5460 additions and 267 deletions

12
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file

@@ -0,0 +1,12 @@
Thanks for contributing to centillion!
Please place an x between the brackets to indicate a yes answer
to the questions below.
- [ ] Is this pull request mergeable?
- [ ] Has this been tested locally?
- [ ] Does this pull request pass the tests?
- [ ] Have new tests been added to cover any new code?
- [ ] Was a spellchecker run on the source code and documentation after
changes were made?

43
CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,43 @@
# Code of Conduct
## DCPPC Code of Conduct
All members of the Commons are expected to agree with the following code
of conduct. We will enforce this code as needed. We expect cooperation
from all members to help ensure a safe environment for everybody.
## The Quick Version
The Consortium is dedicated to providing a harassment-free experience
for everyone, regardless of gender, gender identity and expression, age,
sexual orientation, disability, physical appearance, body size, race, or
religion (or lack thereof). We do not tolerate harassment of Consortium
members in any form. Sexual language and imagery is generally not
appropriate for any venue, including meetings, presentations, or
discussions.
## The Less Quick Version
Harassment includes offensive verbal comments related to gender, gender
identity and expression, age, sexual orientation, disability, physical
appearance, body size, race, religion, sexual images in public spaces,
deliberate intimidation, stalking, following, harassing photography or
recording, sustained disruption of talks or other events, inappropriate
physical contact, and unwelcome sexual attention.
Members asked to stop any harassing behavior are expected to comply
immediately.
If you are being harassed, notice that someone else is being harassed,
or have any other concerns, please contact [Titus
Brown](mailto:ctbrown@ucdavis.edu) immediately. If Titus is the cause of
your concern, please contact [Vivien
Bonazzi](mailto:bonazziv@mail.nih.gov).
We expect members to follow these guidelines at any Consortium event.
Original source and credit: <http://2012.jsconf.us/#/about> & The Ada
Initiative. Please help by translating or improving:
<http://github.com/leftlogic/confcodeofconduct.com>. This work is
licensed under a Creative Commons Attribution 3.0 Unported License

21
CONTRIBUTING.md Normal file

@@ -0,0 +1,21 @@
# Contributing to the DCPPC Internal Repository
Hello, and thank you for wanting to contribute to the DCPPC Internal
Repository\!
By contributing to this repository, you agree:
1. To obey the [Code of Conduct](./CODE_OF_CONDUCT.md)
2. To release all your contributions under the same terms as the
license itself: the [Creative Commons Zero](./LICENSE.md) (aka
Public Domain) license
If you are OK with these two conditions, then we welcome both you and
your contribution\!
If you have any questions about contributing, please [open an
issue](https://github.com/dcppc/internal/issues/new) and Team Copper
will lend a hand ASAP.
Thank you for being here and for being a part of the DCPPC project.

4268
Disqus.md Normal file

File diff suppressed because it is too large

249
Hypothesis.md Normal file

@@ -0,0 +1,249 @@
# Hypothesis API
## Authenticating
Example output from the API call made when authenticating:
```
{
"links": {
"profile": {
"read": {
"url": "https://hypothes.is/api/profile",
"method": "GET",
"desc": "Fetch the user's profile"
},
"update": {
"url": "https://hypothes.is/api/profile",
"method": "PATCH",
"desc": "Update a user's preferences"
}
},
"search": {
"url": "https://hypothes.is/api/search",
"method": "GET",
"desc": "Search for annotations"
},
"group": {
"member": {
"add": {
"url": "https://hypothes.is/api/groups/:pubid/members/:userid",
"method": "POST",
"desc": "Add the user in the request params to a group."
},
"delete": {
"url": "https://hypothes.is/api/groups/:pubid/members/:userid",
"method": "DELETE",
"desc": "Remove the current user from a group."
}
}
},
"links": {
"url": "https://hypothes.is/api/links",
"method": "GET",
"desc": "URL templates for generating URLs for HTML pages"
},
"groups": {
"read": {
"url": "https://hypothes.is/api/groups",
"method": "GET",
"desc": "Fetch the user's groups"
}
},
"annotation": {
"hide": {
"url": "https://hypothes.is/api/annotations/:id/hide",
"method": "PUT",
"desc": "Hide an annotation as a group moderator."
},
"unhide": {
"url": "https://hypothes.is/api/annotations/:id/hide",
"method": "DELETE",
"desc": "Unhide an annotation as a group moderator."
},
"read": {
"url": "https://hypothes.is/api/annotations/:id",
"method": "GET",
"desc": "Fetch an annotation"
},
"create": {
"url": "https://hypothes.is/api/annotations",
"method": "POST",
"desc": "Create an annotation"
},
"update": {
"url": "https://hypothes.is/api/annotations/:id",
"method": "PATCH",
"desc": "Update an annotation"
},
"flag": {
"url": "https://hypothes.is/api/annotations/:id/flag",
"method": "PUT",
"desc": "Flag an annotation for review."
},
"delete": {
"url": "https://hypothes.is/api/annotations/:id",
"method": "DELETE",
"desc": "Delete an annotation"
}
}
}
}
```
## Listing
Here is the result of the API call to list an annotation
given its annotation ID:
```
{
"updated": "2018-07-26T10:20:47.803636+00:00",
"group": "__world__",
"target": [
{
"source": "https://h.readthedocs.io/en/latest/api/authorization/",
"selector": [
{
"conformsTo": "https://tools.ietf.org/html/rfc3236",
"type": "FragmentSelector",
"value": "access-tokens"
},
{
"endContainer": "/div[1]/section[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/p[2]",
"startContainer": "/div[1]/section[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/p[1]",
"type": "RangeSelector",
"startOffset": 14,
"endOffset": 116
},
{
"type": "TextPositionSelector",
"end": 2234,
"start": 1374
},
{
"exact": "hich read or write data as a specific user need to be authorized\nwith an access token. Access tokens can be obtained in two ways:\n\nBy generating a personal API token on the Hypothesis developer\npage (you must be logged in to\nHypothesis to get to this page). This is the simplest method, however\nthese tokens are only suitable for enabling your application to make\nrequests as a single specific user.\n\nBy registering an \u201cOAuth client\u201d and\nimplementing the OAuth authentication flow\nin your application. This method allows any user to authorize your\napplication to read and write data via the API as that user. The Hypothesis\nclient is an example of an application that uses OAuth.\nSee Using OAuth for details of how to implement this method.\n\n\nOnce an access token has been obtained, requests can be authorized by putting\nthe token in the Authorization header.",
"prefix": "\n\n\nAccess tokens\u00b6\nAPI requests w",
"type": "TextQuoteSelector",
"suffix": "\nExample request:\nGET /api HTTP/"
}
]
}
],
"links": {
"json": "https://hypothes.is/api/annotations/kEaohJC9Eeiy_UOozkpkyA",
"html": "https://hypothes.is/a/kEaohJC9Eeiy_UOozkpkyA",
"incontext": "https://hyp.is/kEaohJC9Eeiy_UOozkpkyA/h.readthedocs.io/en/latest/api/authorization/"
},
"tags": [],
"text": "sdfsdf",
"created": "2018-07-26T10:20:47.803636+00:00",
"uri": "https://h.readthedocs.io/en/latest/api/authorization/",
"flagged": false,
"user_info": {
"display_name": null
},
"user": "acct:Aravindan@hypothes.is",
"hidden": false,
"document": {
"title": [
"Authorization \u2014 h 0.0.2 documentation"
]
},
"id": "kEaohJC9Eeiy_UOozkpkyA",
"permissions": {
"read": [
"group:__world__"
],
"admin": [
"acct:Aravindan@hypothes.is"
],
"update": [
"acct:Aravindan@hypothes.is"
],
"delete": [
"acct:Aravindan@hypothes.is"
]
}
}
```
## Searching
Here is the output from a call to the endpoint to search annotations
(we pass a specific URL to the search function):
```
{
"rows": [
{
"updated": "2018-08-10T02:21:46.898833+00:00",
"group": "__world__",
"target": [
{
"source": "http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/",
"selector": [
{
"endContainer": "/div[1]/main[1]/div[1]/div[3]/article[1]/h2[1]",
"startContainer": "/div[1]/main[1]/div[1]/div[3]/article[1]/h2[1]",
"type": "RangeSelector",
"startOffset": 0,
"endOffset": 80
},
{
"type": "TextPositionSelector",
"end": 12328,
"start": 12248
},
{
"exact": "Deliverables are due internally on the first of each month, which here is Day 1,",
"prefix": " \n ",
"type": "TextQuoteSelector",
"suffix": "\u00b6\nDay -30 through -10\nCopper PM "
}
]
}
],
"links": {
"json": "https://hypothes.is/api/annotations/IY2W_pxEEeiVuxfD3sehjQ",
"html": "https://hypothes.is/a/IY2W_pxEEeiVuxfD3sehjQ",
"incontext": "https://hyp.is/IY2W_pxEEeiVuxfD3sehjQ/pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/"
},
"tags": [],
"text": "This is a sample annotation",
"created": "2018-08-10T02:21:46.898833+00:00",
"uri": "http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/",
"flagged": false,
"user_info": {
"display_name": null
},
"user": "acct:charlesreid1dib@hypothes.is",
"hidden": false,
"document": {
"title": [
"Copper Internal Delivery Workflow - Data Commons Internal Site"
]
},
"id": "IY2W_pxEEeiVuxfD3sehjQ",
"permissions": {
"read": [
"group:__world__"
],
"admin": [
"acct:charlesreid1dib@hypothes.is"
],
"update": [
"acct:charlesreid1dib@hypothes.is"
],
"delete": [
"acct:charlesreid1dib@hypothes.is"
]
}
}
],
"total": 1
}
```
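The search output above nests each annotation under `rows`. As a rough sketch of how such a response might be flattened before indexing, the snippet below pulls out a handful of fields. The `summarize_rows` helper and the output key names are assumptions made for illustration, and the sample payload is a trimmed copy of the output above, not a live API call.

```python
import json

# Trimmed copy of the search output shown above (only the fields used here).
SAMPLE_RESPONSE = """
{
  "rows": [
    {
      "id": "IY2W_pxEEeiVuxfD3sehjQ",
      "uri": "http://pilot.data-commons.us/organize/CopperInternalDeliveryWorkFlow/",
      "text": "This is a sample annotation",
      "user": "acct:charlesreid1dib@hypothes.is",
      "created": "2018-08-10T02:21:46.898833+00:00",
      "document": {
        "title": ["Copper Internal Delivery Workflow - Data Commons Internal Site"]
      }
    }
  ],
  "total": 1
}
"""


def summarize_rows(response):
    """Flatten each annotation row into a small dictionary (hypothetical helper)."""
    summaries = []
    for row in response.get("rows", []):
        # "document.title" is a list in the output above; take the first entry if present.
        titles = row.get("document", {}).get("title", [])
        summaries.append({
            "id": row["id"],
            "url": row["uri"],
            "title": titles[0] if titles else "",
            "user": row["user"],
            "created": row["created"],
            "content": row.get("text", ""),
        })
    return summaries


summaries = summarize_rows(json.loads(SAMPLE_RESPONSE))
print(summaries[0]["title"])
```

The `created` value is left as the ISO 8601 string returned by the API; converting it to a `datetime` is a separate step.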

View File

@@ -40,6 +40,7 @@ class UpdateIndexTask(object):
'groupsio_username' : app_config['GROUPSIO_USERNAME'],
'groupsio_password' : app_config['GROUPSIO_PASSWORD']
}
self.disqus_token = app_config['DISQUS_TOKEN']
thread.daemon = True
thread.start()
@@ -54,6 +55,7 @@ class UpdateIndexTask(object):
search.update_index(self.groupsio_credentials,
self.gh_token,
self.disqus_token,
self.run_which,
config)
@@ -265,7 +267,11 @@ def list_docs(doctype):
if org['login']=='dcppc':
# Business as usual
search = Search(app.config["INDEX_DIR"])
return jsonify(search.get_list(doctype))
results_list = search.get_list(doctype)
for result in results_list:
ct = result['created_time']
result['created_time'] = datetime.strftime(ct,"%Y-%m-%d %I:%M %p")
return jsonify(results_list)
# nope
return render_template('403.html')
@@ -347,5 +353,5 @@ if __name__ == '__main__':
port = 5000
else:
port = int(port)
app.run(host="0.0.0.0",port=port)
app.run(host="0.0.0.0", port=port)
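Review note on the `list_docs` hunk: the new loop reads `result['created_time']` for every doctype, but elsewhere in this diff the `ghfile` and `markdown` listings do not include that key, and several documents are indexed with `created_time = None`. A minimal sketch of a more defensive formatter follows, assuming each result is a plain dictionary. The `format_created_time` name is hypothetical and not part of centillion.

```python
from datetime import datetime

# Same format string as the hunk above.
DISPLAY_FORMAT = "%Y-%m-%d %I:%M %p"


def format_created_time(result, key="created_time"):
    """Turn a datetime stored under `key` into a display string.

    Missing keys and None values are left untouched, so no error is raised.
    """
    value = result.get(key)
    if isinstance(value, datetime):
        result[key] = value.strftime(DISPLAY_FORMAT)
    return result


rows = [
    {"title": "email thread", "created_time": datetime(2018, 8, 24, 1, 56, 19)},
    {"title": "markdown file", "created_time": None},
    {"title": "github file"},
]
# Copy each row so the sample input is not modified in place.
formatted = [format_created_time(dict(row)) for row in rows]
```

The AM/PM marker produced by `%p` depends on the process locale.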

View File

@@ -6,6 +6,8 @@ import base64
from gdrive_util import GDrive
from groupsio_util import GroupsIOArchivesCrawler, GroupsIOException
from disqus_util import DisqusCrawler
from apiclient.http import MediaIoBaseDownload
import mistune
@@ -19,8 +21,11 @@ import codecs
from datetime import datetime
import dateutil.parser
from whoosh import query
from whoosh.qparser import MultifieldParser, QueryParser
from whoosh.analysis import StemmingAnalyzer
from whoosh.analysis import StemmingAnalyzer, LowercaseFilter, StopFilter
from whoosh.qparser.dateparse import DateParserPlugin
from whoosh import fields, index
"""
@@ -103,10 +108,21 @@ class Search:
# ------------------------------
# Update the entire index
def update_index(self, groupsio_credentials, gh_token, run_which, config):
def update_index(self, groupsio_credentials, gh_token, disqus_token, run_which, config):
"""
Update the entire search index
"""
if run_which=='all' or run_which=='disqus':
try:
self.update_index_disqus(disqus_token, config)
except Exception as e:
print("ERROR: While re-indexing: failed to update Disqus comment threads")
print("-"*40)
print(repr(e))
print("-"*40)
print("Continuing...")
pass
if run_which=='all' or run_which=='emailthreads':
try:
self.update_index_emailthreads(groupsio_credentials, config)
@@ -172,7 +188,8 @@ class Search:
os.mkdir(index_folder)
exists = index.exists_in(index_folder)
stemming_analyzer = StemmingAnalyzer()
#stemming_analyzer = StemmingAnalyzer()
stemming_analyzer = StemmingAnalyzer() | LowercaseFilter() | StopFilter()
# ------------------------------
@@ -180,30 +197,38 @@ class Search:
# is defined.
schema = Schema(
id = ID(stored=True, unique=True),
kind = ID(stored=True),
id = fields.ID(stored=True, unique=True),
kind = fields.ID(stored=True),
created_time = ID(stored=True),
modified_time = ID(stored=True),
indexed_time = ID(stored=True),
created_time = fields.DATETIME(stored=True),
modified_time = fields.DATETIME(stored=True),
indexed_time = fields.DATETIME(stored=True),
title = TEXT(stored=True, field_boost=100.0),
url = ID(stored=True, unique=True),
mimetype=ID(stored=True),
owner_email=ID(stored=True),
owner_name=TEXT(stored=True),
repo_name=TEXT(stored=True),
repo_url=ID(stored=True),
title = fields.TEXT(stored=True, field_boost=100.0),
github_user=TEXT(stored=True),
url = fields.ID(stored=True),
mimetype = fields.TEXT(stored=True),
owner_email = fields.ID(stored=True),
owner_name = fields.TEXT(stored=True),
# mainly for email threads, groups.io, hypothesis
group = fields.ID(stored=True),
repo_name = fields.TEXT(stored=True),
repo_url = fields.ID(stored=True),
github_user = fields.TEXT(stored=True),
tags = fields.KEYWORD(commas=True,
stored=True,
lowercase=True),
# comments only
issue_title=TEXT(stored=True, field_boost=100.0),
issue_url=ID(stored=True),
issue_title = fields.TEXT(stored=True, field_boost=100.0),
issue_url = fields.ID(stored=True),
content=TEXT(stored=True, analyzer=stemming_analyzer)
content = fields.TEXT(stored=True, analyzer=stemming_analyzer)
)
@@ -243,24 +268,32 @@ class Search:
writer.delete_by_term('id',item['id'])
# Index a plain google drive file
writer.add_document(
id = item['id'],
kind = 'gdoc',
created_time = item['createdTime'],
modified_time = item['modifiedTime'],
indexed_time = datetime.now().replace(microsecond=0).isoformat(),
title = item['name'],
url = item['webViewLink'],
mimetype = mimetype,
owner_email = item['owners'][0]['emailAddress'],
owner_name = item['owners'][0]['displayName'],
repo_name='',
repo_url='',
github_user='',
issue_title='',
issue_url='',
content = content
)
created_time = dateutil.parser.parse(item['createdTime'])
modified_time = dateutil.parser.parse(item['modifiedTime'])
indexed_time = datetime.now().replace(microsecond=0)
try:
writer.add_document(
id = item['id'],
kind = 'gdoc',
created_time = created_time,
modified_time = modified_time,
indexed_time = indexed_time,
title = item['name'],
url = item['webViewLink'],
mimetype = mimetype,
owner_email = item['owners'][0]['emailAddress'],
owner_name = item['owners'][0]['displayName'],
group='',
repo_name='',
repo_url='',
github_user='',
issue_title='',
issue_url='',
content = content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
else:
@@ -314,7 +347,7 @@ class Search:
)
assert output == ""
except RuntimeError:
print(" > XXXXXX Failed to index document \"%s\""%(item['name']))
print(" > XXXXXX Failed to index Google Drive document \"%s\""%(item['name']))
# If export was successful, read contents of markdown
@@ -342,24 +375,33 @@ class Search:
else:
print(" > Creating a new record")
writer.add_document(
id = item['id'],
kind = 'gdoc',
created_time = item['createdTime'],
modified_time = item['modifiedTime'],
indexed_time = datetime.now().replace(microsecond=0).isoformat(),
title = item['name'],
url = item['webViewLink'],
mimetype = mimetype,
owner_email = item['owners'][0]['emailAddress'],
owner_name = item['owners'][0]['displayName'],
repo_name='',
repo_url='',
github_user='',
issue_title='',
issue_url='',
content = content
)
try:
created_time = dateutil.parser.parse(item['createdTime'])
modified_time = dateutil.parser.parse(item['modifiedTime'])
indexed_time = datetime.now()
writer.add_document(
id = item['id'],
kind = 'gdoc',
created_time = created_time,
modified_time = modified_time,
indexed_time = indexed_time,
title = item['name'],
url = item['webViewLink'],
mimetype = mimetype,
owner_email = item['owners'][0]['emailAddress'],
owner_name = item['owners'][0]['displayName'],
group='',
repo_name='',
repo_url='',
github_user='',
issue_title='',
issue_url='',
content = content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
@@ -393,31 +435,36 @@ class Search:
issue_comment_content += comment.body.rstrip()
issue_comment_content += "\n"
# Now create the actual search index record
created_time = clean_timestamp(issue.created_at)
modified_time = clean_timestamp(issue.updated_at)
indexed_time = clean_timestamp(datetime.now())
# Now create the actual search index record.
# Add one document per issue thread,
# containing entire text of thread.
writer.add_document(
id = issue.html_url,
kind = 'issue',
created_time = created_time,
modified_time = modified_time,
indexed_time = indexed_time,
title = issue.title,
url = issue.html_url,
mimetype='',
owner_email='',
owner_name='',
repo_name = repo_name,
repo_url = repo_url,
github_user = issue.user.login,
issue_title = issue.title,
issue_url = issue.html_url,
content = issue_comment_content
)
created_time = issue.created_at
modified_time = issue.updated_at
indexed_time = datetime.now()
try:
writer.add_document(
id = issue.html_url,
kind = 'issue',
created_time = created_time,
modified_time = modified_time,
indexed_time = indexed_time,
title = issue.title,
url = issue.html_url,
mimetype='',
owner_email='',
owner_name='',
group='',
repo_name = repo_name,
repo_url = repo_url,
github_user = issue.user.login,
issue_title = issue.title,
issue_url = issue.html_url,
content = issue_comment_content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Github issue \"%s\""%(issue.title))
@@ -447,7 +494,8 @@ class Search:
print(" > XXXXXXXX Failed to find file info.")
return
indexed_time = clean_timestamp(datetime.now())
indexed_time = datetime.now()
if fext in MARKDOWN_EXTS:
print("Indexing markdown doc %s from repo %s"%(fname,repo_name))
@@ -476,24 +524,31 @@ class Search:
usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath)
# Now create the actual search index record
writer.add_document(
id = fsha,
kind = 'markdown',
created_time = '',
modified_time = '',
indexed_time = indexed_time,
title = fname,
url = usable_url,
mimetype='',
owner_email='',
owner_name='',
repo_name = repo_name,
repo_url = repo_url,
github_user = '',
issue_title = '',
issue_url = '',
content = content
)
try:
writer.add_document(
id = fsha,
kind = 'markdown',
created_time = None,
modified_time = None,
indexed_time = indexed_time,
title = fname,
url = usable_url,
mimetype='',
owner_email='',
owner_name='',
group='',
repo_name = repo_name,
repo_url = repo_url,
github_user = '',
issue_title = '',
issue_url = '',
content = content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Github markdown file \"%s\""%(fname))
else:
print("Indexing github file %s from repo %s"%(fname,repo_name))
@@ -501,24 +556,29 @@ class Search:
key = fname+"_"+fsha
# Now create the actual search index record
writer.add_document(
id = key,
kind = 'ghfile',
created_time = '',
modified_time = '',
indexed_time = indexed_time,
title = fname,
url = repo_url,
mimetype='',
owner_email='',
owner_name='',
repo_name = repo_name,
repo_url = repo_url,
github_user = '',
issue_title = '',
issue_url = '',
content = ''
)
try:
writer.add_document(
id = key,
kind = 'ghfile',
created_time = None,
modified_time = None,
indexed_time = indexed_time,
title = fname,
url = repo_url,
mimetype='',
owner_email='',
owner_name='',
group='',
repo_name = repo_name,
repo_url = repo_url,
github_user = '',
issue_title = '',
issue_url = '',
content = ''
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Github file \"%s\""%(fname))
@@ -529,30 +589,84 @@ class Search:
def add_emailthread(self, writer, d, config, update=True):
"""
Use a Github file API record to add a filename
to the search index.
Use a Groups.io email thread record to add
an email thread to the search index.
"""
indexed_time = clean_timestamp(datetime.now())
if 'created_time' in d.keys() and d['created_time'] is not None:
created_time = d['created_time']
else:
created_time = None
if 'modified_time' in d.keys() and d['modified_time'] is not None:
modified_time = d['modified_time']
else:
modified_time = None
indexed_time = datetime.now()
# Now create the actual search index record
writer.add_document(
id = d['permalink'],
kind = 'emailthread',
created_time = '',
modified_time = '',
indexed_time = indexed_time,
title = d['subject'],
url = d['permalink'],
mimetype='',
owner_email='',
owner_name=d['original_sender'],
repo_name = '',
repo_url = '',
github_user = '',
issue_title = '',
issue_url = '',
content = d['content']
)
try:
writer.add_document(
id = d['permalink'],
kind = 'emailthread',
created_time = created_time,
modified_time = modified_time,
indexed_time = indexed_time,
title = d['subject'],
url = d['permalink'],
mimetype='',
owner_email='',
owner_name=d['original_sender'],
group=d['subgroup'],
repo_name = '',
repo_url = '',
github_user = '',
issue_title = '',
issue_url = '',
content = d['content']
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Groups.io thread \"%s\""%(d['subject']))
# ------------------------------
# Add a single disqus comment thread
# to the search index.
def add_disqusthread(self, writer, d, config, update=True):
"""
Use a disqus comment thread record
to add a disqus comment thread to the
search index.
"""
indexed_time = datetime.now()
# created_time is already a timestamp
# Now create the actual search index record
try:
writer.add_document(
id = d['id'],
kind = 'disqus',
created_time = d['created_time'],
modified_time = None,
indexed_time = indexed_time,
title = d['title'],
url = d['link'],
mimetype='',
owner_email='',
owner_name='',
repo_name = '',
repo_url = '',
github_user = '',
issue_title = '',
issue_url = '',
content = d['content']
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Disqus comment thread \"%s\""%(d['title']))
@@ -580,9 +694,8 @@ class Search:
# Updated algorithm:
# - get set of indexed ids
# - get set of remote ids
# - drop indexed ids not in remote ids
# - drop all indexed ids
# - index all remote ids
# - add hash check in add_
# Get the set of indexed ids:
@@ -631,10 +744,10 @@ class Search:
full_items[f['id']] = f
## Shorter:
#break
# Longer:
if nextPageToken is None:
break
break
## Longer:
#if nextPageToken is None:
# break
writer = self.ix.writer()
@@ -642,34 +755,41 @@ class Search:
temp_dir = tempfile.mkdtemp(dir=os.getcwd())
print("Temporary directory: %s"%(temp_dir))
try:
# Drop any id in indexed_ids
# not in remote_ids
drop_ids = indexed_ids - remote_ids
for drop_id in drop_ids:
writer.delete_by_term('id',drop_id)
# Drop any id in indexed_ids
# not in remote_ids
drop_ids = indexed_ids - remote_ids
for drop_id in drop_ids:
writer.delete_by_term('id',drop_id)
# Update any id in indexed_ids
# and in remote_ids
update_ids = indexed_ids & remote_ids
for update_id in update_ids:
# cop out
writer.delete_by_term('id',update_id)
item = full_items[update_id]
self.add_drive_file(writer, item, temp_dir, config, update=True)
count += 1
# Update any id in indexed_ids
# and in remote_ids
update_ids = indexed_ids & remote_ids
for update_id in update_ids:
# cop out
writer.delete_by_term('id',update_id)
item = full_items[update_id]
self.add_drive_file(writer, item, temp_dir, config, update=True)
count += 1
# Add any id not in indexed_ids
# and in remote_ids
add_ids = remote_ids - indexed_ids
for add_id in add_ids:
item = full_items[add_id]
self.add_drive_file(writer, item, temp_dir, config, update=False)
count += 1
# Add any id not in indexed_ids
# and in remote_ids
add_ids = remote_ids - indexed_ids
for add_id in add_ids:
item = full_items[add_id]
self.add_drive_file(writer, item, temp_dir, config, update=False)
count += 1
except Exception as e:
print("ERROR: While adding Google Drive files to search index")
print("-"*40)
print(repr(e))
print("-"*40)
print("Continuing...")
pass
print("Cleaning temporary directory: %s"%(temp_dir))
subprocess.call(['rm','-fr',temp_dir])
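The drop/update/add passes in the hunk above come down to three set operations on the indexed and remote ids. The sketch below isolates that step. The `plan_index_sync` function is a hypothetical helper written for illustration, not code from this repository.

```python
def plan_index_sync(indexed_ids, remote_ids):
    """Split ids into the groups handled by the drop, update, and add passes."""
    indexed_ids = set(indexed_ids)
    remote_ids = set(remote_ids)
    return {
        # In the index but no longer present remotely: delete.
        "drop": indexed_ids - remote_ids,
        # Present in both places: delete, then re-add with fresh content.
        "update": indexed_ids & remote_ids,
        # Present remotely but not yet indexed: add.
        "add": remote_ids - indexed_ids,
    }


plan = plan_index_sync({"a", "b", "c"}, {"b", "c", "d"})
```

Computing the plan up front, before any writer calls, makes the three groups easy to log or test independently of the index.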
@@ -686,12 +806,6 @@ class Search:
Update the search index using a collection of
Github repo issues and comments.
"""
# Updated algorithm:
# - get set of indexed ids
# - get set of remote ids
# - drop indexed ids not in remote ids
# - index all remote ids
# Get the set of indexed ids:
# ------
indexed_issues = set()
@@ -772,12 +886,6 @@ class Search:
files (and, separately, Markdown files) from
a Github repo.
"""
# Updated algorithm:
# - get set of indexed ids
# - get set of remote ids
# - drop indexed ids not in remote ids
# - index all remote ids
# Get the set of indexed ids:
# ------
indexed_ids = set()
@@ -896,12 +1004,6 @@ class Search:
RELEASE THE SPIDER!!!
"""
# Algorithm:
# - get set of indexed ids
# - get set of remote ids
# - drop indexed ids not in remote ids
# - index all remote ids
# Get the set of indexed ids:
# ------
indexed_ids = set()
@@ -919,16 +1021,17 @@ class Search:
# ask spider to crawl the archives
spider.crawl_group_archives()
# now spider.archives is a list of dictionaries
# that each represent a thread:
# thread = {
# 'permalink' : permalink,
# 'subject' : subject,
# 'original_sender' : original_sender,
# 'content' : full_content
# }
# now spider.archives is a dictionary
# with one key per thread ID,
# and a value set to the payload:
# '<thread-id>' : {
# 'permalink' : permalink,
# 'subject' : subject,
# 'original_sender' : original_sender,
# 'content' : full_content
# }
#
# It is hard to reliablly extract more information
# It is hard to reliably extract more information
# than that from the email thread.
writer = self.ix.writer()
@@ -958,6 +1061,75 @@ class Search:
print("Done, updated %d Groups.io email threads in the index" % count)
# ------------------------------
# Disqus Comments
def update_index_disqus(self, disqus_token, config):
"""
Update the search index using a collection of
Disqus comment threads from the dcppc-internal
forum.
"""
# Updated algorithm:
# - get set of indexed ids
# - get set of remote ids
# - drop all indexed ids
# - index all remote ids
# Get the set of indexed ids:
# --------------------
indexed_ids = set()
p = QueryParser("kind", schema=self.ix.schema)
q = p.parse("disqus")
with self.ix.searcher() as s:
results = s.search(q,limit=None)
for result in results:
indexed_ids.add(result['id'])
# Get the set of remote ids:
# ------
spider = DisqusCrawler(disqus_token,'dcppc-internal')
# ask spider to crawl disqus comments
spider.crawl_threads()
# spider.threads will be a dictionary
# with keys as thread IDs and values as
# a dictionary item
writer = self.ix.writer()
count = 0
# threads is a dictionary
# keys are IDs (urls)
# values are dictionaries
threads = spider.get_threads()
# Start by collecting all the things
remote_ids = set()
for k in threads.keys():
remote_ids.add(k)
# drop indexed_ids
for drop_id in indexed_ids:
writer.delete_by_term('id',drop_id)
# add remote_ids
for add_id in remote_ids:
item = threads[add_id]
self.add_disqusthread(writer, item, config, update=False)
count += 1
writer.commit()
print("Done, updated %d Disqus comment threads in the index" % count)
# ---------------------------------
# Search results bundler
@@ -1044,6 +1216,7 @@ class Search:
"ghfile" : None,
"markdown" : None,
"emailthread" : None,
"disqus" : None,
"total" : None
}
for key in counts.keys():
@@ -1074,7 +1247,9 @@ class Search:
elif doctype=='issue':
item_keys = ['title','repo_name','repo_url','url','created_time','modified_time']
elif doctype=='emailthread':
item_keys = ['title','owner_name','url']
item_keys = ['title','owner_name','url','group','created_time','modified_time']
elif doctype=='disqus':
item_keys = ['title','created_time','url']
elif doctype=='ghfile':
item_keys = ['title','repo_name','repo_url','url']
elif doctype=='markdown':
@@ -1091,11 +1266,7 @@ class Search:
for r in results:
d = {}
for k in item_keys:
if k=='created_time' or k=='modified_time':
#d[k] = r[k]
d[k] = dateutil.parser.parse(r[k]).strftime("%Y-%m-%d")
else:
d[k] = r[k]
d[k] = r[k]
json_results.append(d)
return json_results
@@ -1108,7 +1279,16 @@ class Search:
query_string = " ".join(query_list)
query = None
if ":" in query_string:
query = QueryParser("content", self.schema).parse(query_string)
#query = QueryParser("content",
# self.schema
#).parse(query_string)
query = QueryParser("content",
self.schema,
termclass=query.Variations
)
query.add_plugin(DateParserPlugin(free=True))
query = query.parse(query_string)
elif len(fields) == 1 and fields[0] == "filename":
pass
elif len(fields) == 2:
@@ -1116,9 +1296,12 @@ class Search:
else:
# If the user does not specify a field,
# these are the fields that are actually searched
fields = ['title', 'content','owner_name','owner_email','url']
fields = ['title', 'content','owner_name','owner_email','url','created_time','modified_time']
if not query:
query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
query = MultifieldParser(fields, schema=self.ix.schema)
query.add_plugin(DateParserPlugin(free=True))
query = query.parse(query_string)
#query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
parsed_query = "%s" % query
print("query: %s" % parsed_query)
results = searcher.search(query, terms=False, scored=True, groupedby="kind")

154
disqus_util.py Normal file

@@ -0,0 +1,154 @@
import os, re
import requests
import json
import dateutil.parser
from pprint import pprint
"""
Convenience class wrapper for Disqus comments.
This requires that the user provide either their
API OAuth application credentials (in which case
a user needs to authenticate with the application
so it can access the comments that they can see)
or user credentials from a previous login.
"""
class DisqusCrawler(object):
def __init__(self,
credentials,
group_name):
self.credentials = credentials
self.group_name = group_name
self.crawled_comments = False
self.threads = None
def get_threads(self):
"""
Return a dictionary mapping thread IDs to
dictionaries of thread info, one entry per
comment thread in the given Disqus forum.
This is None until crawl_threads() has run.
"""
return self.threads
def crawl_threads(self):
"""
This will use the API to get every thread,
and will iterate through every thread to
get every comment.
"""
# Collected thread info, keyed by thread ID
threads = {}
# list all threads
list_threads_url = 'https://disqus.com/api/3.0/threads/list.json'
# list all posts (comments)
list_posts_url = 'https://disqus.com/api/3.0/threads/listPosts.json'
base_params = dict(
api_key=self.credentials,
forum=self.group_name
)
# prepare url params
params = {}
for k in base_params.keys():
params[k] = base_params[k]
# make api call (first loop in fencepost)
results = requests.request('GET', list_threads_url, params=params).json()
cursor = results['cursor']
responses = results['response']
while True:
for response in responses:
if '127.0.0.1' not in response['link'] and 'localhost' not in response['link']:
# Save thread info
thread_id = response['id']
thread_count = response['posts']
print("Working on thread %s (%d posts)"%(thread_id,thread_count))
if thread_count > 0:
# prepare url params
params_comments = {}
for k in base_params.keys():
params_comments[k] = base_params[k]
params_comments['thread'] = thread_id
# make api call
results_comments = requests.request('GET', list_posts_url, params=params_comments).json()
cursor_comments = results_comments['cursor']
responses_comments = results_comments['response']
# Save comments for this thread
thread_comments = []
while True:
for comment in responses_comments:
# Save comment info
print(" + %s"%(comment['message']))
thread_comments.append(comment['message'])
if cursor_comments['hasNext']:
# Prepare for the next URL call
params_comments = {}
for k in base_params.keys():
params_comments[k] = base_params[k]
params_comments['thread'] = thread_id
params_comments['cursor'] = cursor_comments['next']
# Make the next URL call
results_comments = requests.request('GET', list_posts_url, params=params_comments).json()
cursor_comments = results_comments['cursor']
responses_comments = results_comments['response']
else:
break
link = response['link']
clean_link = re.sub('data-commons.us','nihdatacommons.us',link)
clean_link += "#disqus_comments"
# Finished working on thread.
# We need to make this value a dictionary
thread_info = dict(
id = response['id'],
created_time = dateutil.parser.parse(response['createdAt']),
title = response['title'],
forum = response['forum'],
link = clean_link,
content = "\n\n-----".join(thread_comments)
)
threads[thread_id] = thread_info
if 'hasNext' in cursor.keys() and cursor['hasNext']:
# Prepare for next URL call
params = {}
for k in base_params.keys():
params[k] = base_params[k]
params['cursor'] = cursor['next']
# Make the next URL call
results = requests.request('GET', list_threads_url, params=params).json()
cursor = results['cursor']
responses = results['response']
else:
break
self.threads = threads
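The crawler above repeats the same cursor-following loop for threads and for posts. As a sketch under stated assumptions, that pattern can be factored into one generator. `iter_pages` and `fake_fetch` are hypothetical names. The response shape (`response` list, `cursor` dict with `hasNext` and `next`) is taken from how the code above reads the Disqus results.

```python
def iter_pages(fetch, params):
    """Yield each item from a cursor-paginated endpoint.

    `fetch(params)` must return a dict with a 'response' list and
    a 'cursor' dict that may contain 'hasNext' and 'next'.
    """
    params = dict(params)  # copy so the caller's dict is left untouched
    while True:
        results = fetch(params)
        for item in results['response']:
            yield item
        cursor = results.get('cursor', {})
        if not cursor.get('hasNext'):
            break
        # Ask for the next page on the following call
        params['cursor'] = cursor['next']


# Fake two-page endpoint, for demonstration only
def fake_fetch(params):
    if params.get('cursor') == 'p2':
        return {'response': [3], 'cursor': {'hasNext': False}}
    return {'response': [1, 2], 'cursor': {'hasNext': True, 'next': 'p2'}}


print(list(iter_pages(fake_fetch, {'forum': 'example'})))  # prints [1, 2, 3]
```

With the real API, `fetch` could be a small function that calls `requests.get(url, params=params).json()`. Taking it as an argument keeps the paging logic testable without network access.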


@@ -1,5 +1,7 @@
import requests, os, re
from bs4 import BeautifulSoup
import dateutil.parser
import datetime
class GroupsIOException(Exception):
pass
@@ -64,7 +66,7 @@ class GroupsIOArchivesCrawler(object):
## Short circuit
## for debugging purposes
#break
break
return subgroups
@@ -251,7 +253,7 @@ class GroupsIOArchivesCrawler(object):
subject = soup.find('title').text
# Extract information for the schema:
# - permalink for thread (done)
# - permalink for thread (done above)
# - subject/title (done)
# - original sender email/name (done)
# - content (done)
@@ -266,11 +268,35 @@ class GroupsIOArchivesCrawler(object):
pass
else:
# found an email!
# this is a maze, thanks groups.io
# this is a maze, not amazing.
# thanks groups.io!
td = tr.find('td')
divrow = td.find('div',{'class':'row'}).find('div',{'class':'pull-left'})
sender_divrow = td.find('div',{'class':'row'})
sender_divrow = sender_divrow.find('div',{'class':'pull-left'})
if (i+1)==1:
original_sender = divrow.text.strip()
original_sender = sender_divrow.text.strip()
date_divrow = td.find('div',{'class':'row'})
date_divrow = date_divrow.find('div',{'class':'pull-right'})
date_divrow = date_divrow.find('font',{'class':'text-muted'})
date_divrow = date_divrow.find('script').text
try:
time_seconds = re.search(' [0-9]{1,} ',date_divrow).group(0)
time_seconds = time_seconds.strip()
# Thanks groups.io for the weird date formatting
# Split off the sub-second digits before truncating to seconds
mmicro_seconds = time_seconds[10:]
time_seconds = time_seconds[:10]
if (i+1)==1:
created_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
else:
modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
except AttributeError:
created_time = None
modified_time = None
for div in td.find_all('div'):
if div.has_attr('id'):
@@ -299,7 +325,10 @@ class GroupsIOArchivesCrawler(object):
thread = {
'permalink' : permalink,
'created_time' : created_time,
'modified_time' : modified_time,
'subject' : subject,
'subgroup' : subgroup_name,
'original_sender' : original_sender,
'content' : full_content
}
@@ -324,11 +353,13 @@ class GroupsIOArchivesCrawler(object):
results = []
for row in rows:
# We don't care about anything except title and ugly link
# This is where we extract
# a list of thread titles
# and corresponding links.
subject = row.find('span',{'class':'subject'})
title = subject.get_text()
link = row.find('a')['href']
#print(title)
results.append((title,link))
return results
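The timestamp extraction earlier in this file takes the first 10 digits of a long number found in the page script and treats them as epoch seconds. A self-contained sketch of that step follows. `parse_epoch_digits` is a hypothetical name, the sample string is made up, and the regex differs from the original in that it does not require surrounding spaces.

```python
import datetime
import re


def parse_epoch_digits(text):
    """Return a naive UTC datetime from the first run of 10+ digits, or None."""
    match = re.search(r'[0-9]{10,}', text)
    if match is None:
        return None
    digits = match.group(0)
    seconds = int(digits[:10])  # first 10 digits are whole seconds
    aware = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    # Drop tzinfo to match the naive UTC datetimes used in the code above
    return aware.replace(tzinfo=None)


# Made-up script text carrying a nanosecond-style timestamp
print(parse_epoch_digits("DisplayShortTime( 1535094275000000000 , false)"))
```

Returning `None` when no digits are found plays the same role as the `except AttributeError` branch above, without relying on an exception for control flow.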

89
hypothesis_util.py Normal file

@@ -0,0 +1,89 @@
import requests
import json
import os
def get_headers():
if 'HYPOTHESIS_TOKEN' in os.environ:
token = os.environ['HYPOTHESIS_TOKEN']
else:
raise Exception("Need to specify Hypothesis token with HYPOTHESIS_TOKEN env var")
auth_header = 'Bearer %s'%(token)
return {'Authorization': auth_header}
def basic_auth():
url = 'https://hypothes.is/api'
# Get the authorization header
headers = get_headers()
# Make the request
response = requests.get(url, headers=headers)
if response.status_code==200:
# Interpret results as JSON
dat = response.json()
print(json.dumps(dat, indent=4))
else:
print("Response status code was not OK: %d"%(response.status_code))
def list_annotations():
# kEaohJC9Eeiy_UOozkpkyA
url = 'https://hypothes.is/api/annotations/kEaohJC9Eeiy_UOozkpkyA'
# Get the authorization header
headers = get_headers()
# Make the request
response = requests.get(url, headers=headers)
if response.status_code==200:
# Interpret results as JSON
dat = response.json()
print(json.dumps(dat, indent=4))
else:
print("Response status code was not OK: %d"%(response.status_code))
def search_annotations():
url = 'https://hypothes.is/api/search'
# Get the authorization header
headers = get_headers()
# Set query params
params = dict(
url = '*pilot.nihdatacommons.us*',
limit = 200
)
#http://pilot.nihdatacommons.us/organize/CopperInternalDeliveryWorkFlow/',
# Make the request
response = requests.get(url, headers=headers, params=params)
if response.status_code==200:
# Interpret results as JSON
dat = response.json()
print(json.dumps(dat, indent=4))
else:
print("Response status code was not OK: %d"%(response.status_code))
if __name__=="__main__":
search_annotations()


@@ -22,6 +22,7 @@ var initIssuesTable = false;
var initGhfilesTable = false;
var initMarkdownTable = false;
var initEmailthreadsTable = false;
var initDisqusTable = false;
$(document).ready(function() {
var url_string = document.location.toString();
@@ -32,10 +33,6 @@ $(document).ready(function() {
load_gdoc_table();
var divList = $('div#collapseDrive').addClass('in');
} else if (d==='emailthread') {
load_emailthreads_table();
var divList = $('div#collapseThreads').addClass('in');
} else if (d==='issue') {
load_issue_table();
var divList = $('div#collapseIssues').addClass('in');
@@ -48,10 +45,37 @@ $(document).ready(function() {
load_markdown_table();
var divList = $('div#collapseMarkdown').addClass('in');
} else if (d==='emailthread') {
load_emailthreads_table();
var divList = $('div#collapseThreads').addClass('in');
} else if (d==='disqus') {
load_disqusthreads_table();
var divList = $('div#collapseDisqus').addClass('in');
}
});
//////////////////////////////////
// utility functions
// https://stackoverflow.com/a/25275808
function iso8601(date) {
var hours = date.getHours();
var minutes = date.getMinutes();
var ampm = hours >= 12 ? 'PM' : 'AM';
hours = hours % 12;
hours = hours ? hours : 12; // the hour '0' should be '12'
minutes = minutes < 10 ? '0'+minutes : minutes;
var strTime = hours + ':' + minutes + ' ' + ampm;
return date.getFullYear() + "-" + (date.getMonth()+1) + "-" + date.getDate() + " " + strTime;
}
// https://stackoverflow.com/a/7390612
var toType = function(obj) {
return ({}).toString.call(obj).match(/\s([a-zA-Z]+)/)[1].toLowerCase()
}
//////////////////////////////////
// API-to-Table Functions
@@ -77,9 +101,9 @@ function load_gdoc_table(){
if(!initGdocTable) {
var divList = $('div#collapseDrive').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Google Drive master list');
//console.log('Closing Google Drive master list');
} else {
console.log('Opening Google Drive master list');
//console.log('Opening Google Drive master list');
$.getJSON("/list/gdoc", function(result){
@@ -125,7 +149,7 @@ function load_gdoc_table(){
initGdocTable = true
});
console.log('Finished loading Google Drive master list');
//console.log('Finished loading Google Drive master list');
}
}
}
@@ -137,9 +161,9 @@ function load_issue_table(){
if(!initIssuesTable) {
var divList = $('div#collapseIssues').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Github issues master list');
//console.log('Closing Github issues master list');
} else {
console.log('Opening Github issues master list');
//console.log('Opening Github issues master list');
$.getJSON("/list/issue", function(result){
var r = new Array(), j = -1, size=result.length;
@@ -183,7 +207,7 @@ function load_issue_table(){
initIssuesTable = true;
});
console.log('Finished loading Github issues master list');
//console.log('Finished loading Github issues master list');
}
}
}
@@ -195,13 +219,13 @@ function load_ghfile_table(){
if(!initGhfilesTable) {
var divList = $('div#collapseFiles').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Github files master list');
//console.log('Closing Github files master list');
} else {
console.log('Opening Github files master list');
//console.log('Opening Github files master list');
$.getJSON("/list/ghfile", function(result){
console.log("-----------");
console.log(result);
//console.log("-----------");
//console.log(result);
var r = new Array(), j = -1, size=result.length;
r[++j] = '<thead>'
r[++j] = '<tr class="header-row">';
@@ -237,7 +261,7 @@ function load_ghfile_table(){
initGhfilesTable = true;
});
console.log('Finished loading Github file list');
//console.log('Finished loading Github file list');
}
}
}
@@ -249,9 +273,9 @@ function load_markdown_table(){
if(!initMarkdownTable) {
var divList = $('div#collapseMarkdown').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Github markdown master list');
//console.log('Closing Github markdown master list');
} else {
console.log('Opening Github markdown master list');
//console.log('Opening Github markdown master list');
$.getJSON("/list/markdown", function(result){
var r = new Array(), j = -1, size=result.length;
@@ -289,7 +313,7 @@ function load_markdown_table(){
initMarkdownTable = true;
});
console.log('Finished loading Markdown list');
//console.log('Finished loading Markdown list');
}
}
}
@@ -302,16 +326,18 @@ function load_emailthreads_table(){
if(!initEmailthreadsTable) {
var divList = $('div#collapseThreads').attr('class');
if (divList.indexOf('in') !== -1) {
console.log('Closing Groups.io email threads master list');
//console.log('Closing Groups.io email threads master list');
} else {
console.log('Opening Groups.io email threads master list');
//console.log('Opening Groups.io email threads master list');
$.getJSON("/list/emailthread", function(result){
var r = new Array(), j = -1, size=result.length;
r[++j] = '<thead>'
r[++j] = '<tr class="header-row">';
r[++j] = '<th width="70%">Topic</th>';
r[++j] = '<th width="30%">Started By</th>';
r[++j] = '<th width="60%">Topic</th>';
r[++j] = '<th width="15%">Started By</th>';
r[++j] = '<th width="15%">Date</th>';
r[++j] = '<th width="10%">Mailing List</th>';
r[++j] = '</tr>';
r[++j] = '</thead>'
r[++j] = '<tbody>'
@@ -322,6 +348,10 @@ function load_emailthreads_table(){
r[++j] = '</a>'
r[++j] = '</td><td>';
r[++j] = result[i]['owner_name'];
r[++j] = '</td><td>';
r[++j] = result[i]['created_time'];
r[++j] = '</td><td>';
r[++j] = result[i]['group'];
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
@@ -340,7 +370,57 @@ function load_emailthreads_table(){
initEmailthreadsTable = true;
});
console.log('Finished loading Groups.io email threads list');
//console.log('Finished loading Groups.io email threads list');
}
}
}
// ------------------------
// Disqus Comment Threads
function load_disqusthreads_table(){
if(!initDisqusTable) {
var divList = $('div#collapseDisqus').attr('class');
if (divList.indexOf('in') !== -1) {
//console.log('Closing Disqus comment threads master list');
} else {
//console.log('Opening Disqus comment threads master list');
$.getJSON("/list/disqus", function(result){
var r = new Array(), j = -1, size=result.length;
r[++j] = '<thead>'
r[++j] = '<tr class="header-row">';
r[++j] = '<th width="70%">Page Title</th>';
r[++j] = '<th width="30%">Created</th>';
r[++j] = '</tr>';
r[++j] = '</thead>'
r[++j] = '<tbody>'
for (var i=0; i<size; i++){
r[++j] ='<tr><td>';
r[++j] = '<a href="' + result[i]['url'] + '" target="_blank">'
r[++j] = result[i]['title'];
r[++j] = '</a>'
r[++j] = '</td><td>';
r[++j] = result[i]['created_time'];
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
// Construct names of id tags
var doctype = 'disqus';
var idlabel = '#' + doctype + '-master-list';
var filtlabel = idlabel + '_filter';
// Initialize the DataTable
$(idlabel).html(r.join(''));
$(idlabel).DataTable({
responsive: true,
lengthMenu: [50,100,250,500]
});
initDisqusTable = true;
});
//console.log('Finished loading Disqus comment threads list');
}
}
}


@@ -31,7 +31,7 @@ $(document).ready(function() {
aTargets : [2]
}
],
lengthMenu: [50,100,250,500]
lengthMenu: [10,20,50,100]
});
console.log('Finished loading search results list');


@@ -86,6 +86,14 @@ div.container {
}
/* badges for number of docs indexed */
span.results-count {
background-color: #555;
}
span.indexing-count {
background-color: #337ab7;
}
span.badge {
vertical-align: text-bottom;
}
@@ -126,7 +134,7 @@ li.search-group-item {
}
div.url {
background-color: rgba(86,61,124,.15);
background-color: rgba(40,40,60,.15);
padding: 8px;
}
@@ -192,7 +200,7 @@ table {
.info, .last-searches {
color: gray;
font-size: 12px;
/*font-size: 12px;*/
font-family: Arial, serif;
}
@@ -202,27 +210,27 @@ table {
div.tags a, td.tag-cloud a {
color: #b56020;
font-size: 12px;
/*font-size: 12px;*/
}
td.tag-cloud, td.directories-cloud {
font-size: 12px;
/*font-size: 12px;*/
color: #555555;
}
td.directories-cloud a {
font-size: 12px;
/*font-size: 12px;*/
color: #377BA8;
}
div.path {
font-size: 12px;
/*font-size: 12px;*/
color: #666666;
margin-bottom: 3px;
}
div.path a {
font-size: 12px;
/*font-size: 12px;*/
margin-right: 5px;
}


@@ -54,6 +54,8 @@
</p>
<p><a href="{{ url_for('update_index',run_which='emailthreads') }}" class="btn btn-large btn-danger btn-reindex-type">Update Groups.io Email Threads Index</a>
</p>
<p><a href="{{ url_for('update_index',run_which='disqus') }}" class="btn btn-large btn-danger btn-reindex-type">Update Disqus Comment Threads Index</a>
</p>
</div>
</div>
</div>


@@ -5,7 +5,7 @@
<div class="alert alert-success alert-dismissible fade in">
<a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>
{% for message in messages %}
<p class="lead">{{ message }}</p>
<p>{{ message }}</p>
{% endfor %}
</div>
</div>


@@ -9,8 +9,9 @@
<div class="row">
{#
# google drive files panel
#}
# google drive files panel
#}
<a name="gdoc"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionDrive" role="tablist" aria-multiselectable="true">
@@ -46,8 +47,9 @@
{#
# github issue panel
#}
# github issue panel
#}
<a name="issue"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionIssues" role="tablist" aria-multiselectable="true">
@@ -85,8 +87,9 @@
{#
# github file panel
#}
# github file panel
#}
<a name="ghfile"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionFiles" role="tablist" aria-multiselectable="true">
@@ -122,8 +125,9 @@
{#
# gh markdown file panel
#}
# gh markdown file panel
#}
<a name="markdown"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionMarkdown" role="tablist" aria-multiselectable="true">
@@ -160,8 +164,9 @@
{#
# groups.io
#}
# groups.io email threads
#}
<a name="emailthread"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionThreads" role="tablist" aria-multiselectable="true">
@@ -195,6 +200,42 @@
</div>
</div>
{#
# disqus comment threads
#}
<a name="disqus"></a>
<div class="row">
<div class="panel">
<div class="panel-group" id="accordionDisqus" role="tablist" aria-multiselectable="true">
<div class="panel panel-default">
<div class="panel-heading" role="tab" id="disqus">
<h2 class="masterlist-header">
<a class="collapsed"
role="button"
onClick="load_disqusthreads_table()"
data-toggle="collapse"
data-parent="#accordionDisqus"
href="#collapseDisqus"
aria-expanded="true"
aria-controls="collapseDisqus">
Disqus Comment Threads <small>indexed by centillion</small>
</a>
</h2>
</div>
<div id="collapseDisqus" class="panel-collapse collapse" role="tabpanel"
aria-labelledby="disqus">
<div class="panel-body">
<table class="table table-striped" id="disqus-master-list">
</table>
</div>
</div>
</div>
</div>
</div>
</div>
</div>


@@ -52,8 +52,8 @@
<div class="container-fluid">
<div class="row">
<div class="col-xs-12 info">
<b>Found:</b> <span class="badge">{{entries|length}}</span> results
out of <span class="badge">{{totals["total"]}}</span> total items indexed
<b>Found:</b> <span class="badge results-count">{{entries|length}}</span> results
out of <span class="badge results-count">{{totals["total"]}}</span> total items indexed
</div>
</div>
</div>
@@ -67,35 +67,41 @@
<div class="col-xs-12 info">
<b>Indexing:</b>
<span class="badge">{{totals["gdoc"]}}</span>
<a href="/master_list?doctype=gdoc">
<span class="badge indexing-count">{{totals["gdoc"]}}</span>
<a href="/master_list?doctype=gdoc#gdoc">
Google Drive files
</a>,
<span class="badge">{{totals["issue"]}}</span>
<a href="/master_list?doctype=issue">
<span class="badge indexing-count">{{totals["issue"]}}</span>
<a href="/master_list?doctype=issue#issue">
Github issues
</a>,
<span class="badge">{{totals["ghfile"]}}</span>
<a href="/master_list?doctype=ghfile">
<span class="badge indexing-count">{{totals["ghfile"]}}</span>
<a href="/master_list?doctype=ghfile#ghfile">
Github files
</a>,
<span class="badge">{{totals["markdown"]}}</span>
<a href="/master_list?doctype=markdown">
<span class="badge indexing-count">{{totals["markdown"]}}</span>
<a href="/master_list?doctype=markdown#markdown">
Github Markdown files
</a>,
<span class="badge">{{totals["emailthread"]}}</span>
<a href="/master_list?doctype=emailthread">
<span class="badge indexing-count">{{totals["emailthread"]}}</span>
<a href="/master_list?doctype=emailthread#emailthread">
Groups.io email threads
</a>,
<span class="badge indexing-count">{{totals["disqus"]}}</span>
<a href="/master_list?doctype=disqus#disqus">
Disqus comment threads
</a>
</div>
</div>
</div>
</li>
</ul>
</div>
</div>