19 Commits

Author SHA1 Message Date
ab76226b0c Merge pull request #90 from dcppc/add-dates-and-subgroups-to-emails
Add dates and subgroups to emails
2018-08-24 00:07:40 -07:00
a4ebef6e6f extract date and time from email threads pages 2018-08-24 00:04:35 -07:00
bad50efa9b add groups and tags to schema; update how we determine timestamps; handle exceptions when we add the document to the writer, rather than elsewhere 2018-08-24 00:03:23 -07:00
629fc063db move where exception is caught (exception was also incorrect.) 2018-08-24 00:01:26 -07:00
3b0baa21de switched created_time, modified_time, indexed_time over to DATETIME. added DateParserPlugin to query QueryParser. added time fields to those being searched by default. tests do not seem to be working. 2018-08-23 19:01:40 -07:00
6bfadef829 Merge pull request #73 from dcppc/feedback-floater
Add a feedback mechanism
2018-08-21 11:33:34 -07:00
c38683ae9f (resolve conflict) Merge branch 'dcppc' into feedback-floater
* dcppc:
  add centillion config back. no sensitive info.
  add option to set port at runtime with CENTILLION_PORT environment variable
  add a bit o whitespace
2018-08-21 11:32:59 -07:00
3f5349a5a6 Merge pull request #80 from dcppc/add-centillion-config-back
add centillion config back. no sensitive info.
2018-08-21 11:16:21 -07:00
f88cf6ecad add centillion config back. no sensitive info. 2018-08-21 11:15:29 -07:00
ec54292a4b Merge pull request #79 from dcppc/add-port-env-var
add option to set port at runtime
2018-08-21 11:12:17 -07:00
296132d356 add option to set port at runtime with CENTILLION_PORT environment variable 2018-08-21 11:09:46 -07:00
0bc40ba323 Merge pull request #76 from dcppc/add-whitespace
add a bit o whitespace
2018-08-21 10:33:20 -07:00
8143e214c2 add a bit o whitespace 2018-08-21 10:06:16 -07:00
b015da2e9b add dismissable "thanks for your feedback" message to top 2018-08-20 20:42:58 -07:00
9c6b57ba85 improve message formatting 2018-08-20 15:04:21 -07:00
a080eebc29 add dummy function as placeholder for where we add info messages 2018-08-20 15:04:03 -07:00
323d7ce8ca return better messages 2018-08-20 15:03:21 -07:00
da62a5c887 add successful post call and export to JSON db 2018-08-20 14:10:20 -07:00
2714ad3e0c update todo 2018-08-20 14:09:58 -07:00
14 changed files with 582 additions and 269 deletions

2
.gitignore vendored

@@ -1,4 +1,4 @@
config_centillion.py feedback_database.json
config_flask.py config_flask.py
vp vp
credentials.json credentials.json


@@ -12,8 +12,13 @@ one centillion is 3.03 log-times better than a googol.
## What Is It ## What Is It
Centillion (https://github.com/dcppc/centillion) is a search engine that can index Centillion (https://github.com/dcppc/centillion) is a search engine that can index
three kinds of collections: Google Documents (.docx files), Github issues, and Markdown files in different kinds of document collections: Google Documents (.docx files), Google Drive files,
Github repos. Github issues, Github files, Github Markdown files, and Groups.io email threads.
## What Is It
We define the types of documents the centillion should index, We define the types of documents the centillion should index,
what info and how. The centillion then builds and what info and how. The centillion then builds and


@@ -17,11 +17,13 @@ feedback form: where we are at
- feedback button - feedback button
- button triggers modal form - button triggers modal form
- modal has emojis for feedback, text box, buttons - modal has emojis for feedback, text box, buttons
feedback form: what we need to do
- clicking emojis changes color, to select - clicking emojis changes color, to select
- clicking submit with filled out form submits to an endpoint - clicking submit with filled out form submits to an endpoint
- not sure how to use separate url, and then redirect back to same place - clicking submit also closes form, but only if submit successful
feedback form: what we need to do
- fix alerts - thank you for your feedback doesn't show up until a refresh
- probably an easy ajax fix


@@ -3,6 +3,7 @@ import subprocess
import codecs import codecs
import os, json import os, json
from datetime import datetime
from werkzeug.contrib.fixers import ProxyFix from werkzeug.contrib.fixers import ProxyFix
from flask import Flask, request, redirect, url_for, render_template, flash, jsonify from flask import Flask, request, redirect, url_for, render_template, flash, jsonify
@@ -266,16 +267,54 @@ def list_docs(doctype):
search = Search(app.config["INDEX_DIR"]) search = Search(app.config["INDEX_DIR"])
return jsonify(search.get_list(doctype)) return jsonify(search.get_list(doctype))
# nope
return render_template('403.html') return render_template('403.html')
@app.route('/feedback', methods=['POST']) @app.route('/feedback', methods=['POST'])
def parse_request(): def parse_request():
data = request.get_json()
flash("Thank you for your feedback!")
with open('dumdumdumdeedum.json','w') as f:
json.dumps(data,indent=4)
if not github.authorized:
return redirect(url_for("github.login"))
username = github.get("/user").json()['login']
resp = github.get("/user/orgs")
if resp.ok:
all_orgs = resp.json()
for org in all_orgs:
if org['login']=='dcppc':
try:
# Business as usual
data = request.form.to_dict();
data['github_login'] = username
data['timestamp'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
feedback_database = 'feedback_database.json'
if not os.path.isfile(feedback_database):
with open(feedback_database,'w') as f:
json_data = [data]
json.dump(json_data, f, indent=4)
else:
json_data = []
with open(feedback_database,'r') as f:
json_data = json.load(f)
json_data.append(data)
with open(feedback_database,'w') as f:
json.dump(json_data, f, indent=4)
## Should be done with Javascript
#flash("Thank you for your feedback!")
return jsonify({'status':'ok','message':'Thank you for your feedback!'})
except:
return jsonify({'status':'error','message':'An error was encountered while submitting your feedback. Try submitting an issue in the <a href="https://github.com/dcppc/centillion/issues/new">dcppc/centillion</a> repository.'})
# nope
return render_template('403.html')
@app.errorhandler(404) @app.errorhandler(404)
def oops(e): def oops(e):
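
The new `/feedback` handler in the hunk above reads `feedback_database.json`, appends the posted entry, and rewrites the file in place. A minimal sketch of that append step on its own, assuming a hypothetical helper named `append_feedback`; writing to a temporary file and swapping it in with `os.replace` is a swapped-in precaution against a half-written database, not something the commit does:

```python
import json
import os
import tempfile


def append_feedback(db_path, entry):
    """Append one feedback entry to the JSON list stored at db_path.

    Hypothetical helper for illustration; returns the new entry count.
    """
    entries = []
    if os.path.isfile(db_path):
        with open(db_path, "r") as f:
            entries = json.load(f)
    entries.append(entry)

    # Write to a temporary file in the same directory, then swap it in,
    # so a failed write leaves the previous database untouched.
    directory = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(entries, f, indent=4)
        os.replace(tmp_path, db_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(entries)
```

Re-raising the exception keeps the failure visible to the caller, which could then return the `status: error` JSON response shown in the handler.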
@@ -303,5 +342,10 @@ def store_search(query, fields):
if __name__ == '__main__': if __name__ == '__main__':
# if running local instance, set to true # if running local instance, set to true
os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = 'true' os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = 'true'
app.run(host="0.0.0.0",port=5000) port = os.environ.get('CENTILLION_PORT','')
if port=='':
port = 5000
else:
port = int(port)
app.run(host="0.0.0.0",port=port)
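
The hunk above lets `CENTILLION_PORT` override the default port of 5000. The same lookup can be sketched as a small function, assuming a hypothetical name `resolve_port`; the range check is an addition for illustration and is not in the commit:

```python
import os


def resolve_port(environ=None, default=5000):
    """Return the port from CENTILLION_PORT, or `default` when it is unset or empty.

    Hypothetical helper; `environ` can be any mapping, which makes it easy to test.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CENTILLION_PORT", "").strip()
    if raw == "":
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError("CENTILLION_PORT out of range: %d" % port)
    return port
```

With this in place the entry point would reduce to `app.run(host="0.0.0.0", port=resolve_port())`.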


@@ -21,6 +21,8 @@ import dateutil.parser
from whoosh.qparser import MultifieldParser, QueryParser from whoosh.qparser import MultifieldParser, QueryParser
from whoosh.analysis import StemmingAnalyzer from whoosh.analysis import StemmingAnalyzer
from whoosh.qparser.dateparse import DateParserPlugin
from whoosh import fields, index
""" """
@@ -180,30 +182,38 @@ class Search:
# is defined. # is defined.
schema = Schema( schema = Schema(
id = ID(stored=True, unique=True), id = fields.ID(stored=True, unique=True),
kind = ID(stored=True), kind = fields.ID(stored=True),
created_time = ID(stored=True), created_time = fields.DATETIME(stored=True),
modified_time = ID(stored=True), modified_time = fields.DATETIME(stored=True),
indexed_time = ID(stored=True), indexed_time = fields.DATETIME(stored=True),
title = TEXT(stored=True, field_boost=100.0), title = fields.TEXT(stored=True, field_boost=100.0),
url = ID(stored=True, unique=True),
mimetype=ID(stored=True), url = fields.ID(stored=True),
owner_email=ID(stored=True),
owner_name=TEXT(stored=True),
repo_name=TEXT(stored=True), mimetype = fields.TEXT(stored=True),
repo_url=ID(stored=True),
github_user=TEXT(stored=True), owner_email = fields.ID(stored=True),
owner_name = fields.TEXT(stored=True),
# mainly for email threads, groups.io, hypothesis
group = fields.ID(stored=True),
repo_name = fields.TEXT(stored=True),
repo_url = fields.ID(stored=True),
github_user = fields.TEXT(stored=True),
tags = fields.KEYWORD(commas=True,
stored=True,
lowercase=True),
# comments only # comments only
issue_title=TEXT(stored=True, field_boost=100.0), issue_title = fields.TEXT(stored=True, field_boost=100.0),
issue_url=ID(stored=True), issue_url = fields.ID(stored=True),
content=TEXT(stored=True, analyzer=stemming_analyzer) content = fields.TEXT(stored=True, analyzer=stemming_analyzer)
) )
@@ -243,24 +253,32 @@ class Search:
writer.delete_by_term('id',item['id']) writer.delete_by_term('id',item['id'])
# Index a plain google drive file # Index a plain google drive file
writer.add_document( created_time = dateutil.parser.parse(item['createdTime'])
id = item['id'], modified_time = dateutil.parser.parse(item['modifiedTime'])
kind = 'gdoc', indexed_time = datetime.now().replace(microsecond=0)
created_time = item['createdTime'], try:
modified_time = item['modifiedTime'], writer.add_document(
indexed_time = datetime.now().replace(microsecond=0).isoformat(), id = item['id'],
title = item['name'], kind = 'gdoc',
url = item['webViewLink'], created_time = created_time,
mimetype = mimetype, modified_time = modified_time,
owner_email = item['owners'][0]['emailAddress'], indexed_time = indexed_time,
owner_name = item['owners'][0]['displayName'], title = item['name'],
repo_name='', url = item['webViewLink'],
repo_url='', mimetype = mimetype,
github_user='', owner_email = item['owners'][0]['emailAddress'],
issue_title='', owner_name = item['owners'][0]['displayName'],
issue_url='', group='',
content = content repo_name='',
) repo_url='',
github_user='',
issue_title='',
issue_url='',
content = content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
else: else:
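
With the time fields switched to `DATETIME`, the hunk above parses the Drive `createdTime` and `modifiedTime` strings with `dateutil.parser.parse` before calling `add_document`. A standard-library sketch of the same conversion follows. It assumes the strings look like `2018-08-24T07:07:40.000Z` (an assumption about the API format, not shown in the diff), uses a hypothetical helper name, and returns naive UTC values, which is a design choice made here rather than a requirement taken from the source:

```python
from datetime import datetime, timezone


def parse_drive_timestamp(value):
    """Convert an ISO 8601 timestamp string to a naive UTC datetime.

    Hypothetical helper; returns None for empty or missing input.
    """
    if not value:
        return None
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so spell the UTC offset out explicitly.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # No offset given: assume the value is already UTC.
        return parsed
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)
```

Returning `None` for missing input matches how the Markdown and Github file records in this change pass `created_time = None`.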
@@ -314,7 +332,7 @@ class Search:
) )
assert output == "" assert output == ""
except RuntimeError: except RuntimeError:
print(" > XXXXXX Failed to index document \"%s\""%(item['name'])) print(" > XXXXXX Failed to index Google Drive document \"%s\""%(item['name']))
# If export was successful, read contents of markdown # If export was successful, read contents of markdown
@@ -342,24 +360,33 @@ class Search:
else: else:
print(" > Creating a new record") print(" > Creating a new record")
writer.add_document( try:
id = item['id'], created_time = dateutil.parser.parse(item['createdTime'])
kind = 'gdoc', modified_time = dateutil.parser.parse(item['modifiedTime'])
created_time = item['createdTime'], indexed_time = datetime.now()
modified_time = item['modifiedTime'], writer.add_document(
indexed_time = datetime.now().replace(microsecond=0).isoformat(), id = item['id'],
title = item['name'], kind = 'gdoc',
url = item['webViewLink'], created_time = created_time,
mimetype = mimetype, modified_time = modified_time,
owner_email = item['owners'][0]['emailAddress'], indexed_time = indexed_time,
owner_name = item['owners'][0]['displayName'], title = item['name'],
repo_name='', url = item['webViewLink'],
repo_url='', mimetype = mimetype,
github_user='', owner_email = item['owners'][0]['emailAddress'],
issue_title='', owner_name = item['owners'][0]['displayName'],
issue_url='', group='',
content = content repo_name='',
) repo_url='',
github_user='',
issue_title='',
issue_url='',
content = content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Google Drive file \"%s\""%(item['name']))
@@ -393,31 +420,36 @@ class Search:
issue_comment_content += comment.body.rstrip() issue_comment_content += comment.body.rstrip()
issue_comment_content += "\n" issue_comment_content += "\n"
# Now create the actual search index record # Now create the actual search index record.
created_time = clean_timestamp(issue.created_at)
modified_time = clean_timestamp(issue.updated_at)
indexed_time = clean_timestamp(datetime.now())
# Add one document per issue thread, # Add one document per issue thread,
# containing entire text of thread. # containing entire text of thread.
writer.add_document(
id = issue.html_url, created_time = issue.created_at
kind = 'issue', modified_time = issue.updated_at
created_time = created_time, indexed_time = datetime.now()
modified_time = modified_time, try:
indexed_time = indexed_time, writer.add_document(
title = issue.title, id = issue.html_url,
url = issue.html_url, kind = 'issue',
mimetype='', created_time = created_time,
owner_email='', modified_time = modified_time,
owner_name='', indexed_time = indexed_time,
repo_name = repo_name, title = issue.title,
repo_url = repo_url, url = issue.html_url,
github_user = issue.user.login, mimetype='',
issue_title = issue.title, owner_email='',
issue_url = issue.html_url, owner_name='',
content = issue_comment_content group='',
) repo_name = repo_name,
repo_url = repo_url,
github_user = issue.user.login,
issue_title = issue.title,
issue_url = issue.html_url,
content = issue_comment_content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Github issue \"%s\""%(issue.title))
@@ -447,7 +479,8 @@ class Search:
print(" > XXXXXXXX Failed to find file info.") print(" > XXXXXXXX Failed to find file info.")
return return
indexed_time = clean_timestamp(datetime.now())
indexed_time = datetime.now()
if fext in MARKDOWN_EXTS: if fext in MARKDOWN_EXTS:
print("Indexing markdown doc %s from repo %s"%(fname,repo_name)) print("Indexing markdown doc %s from repo %s"%(fname,repo_name))
@@ -476,24 +509,31 @@ class Search:
usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath) usable_url = "https://github.com/%s/blob/master/%s"%(repo_name, fpath)
# Now create the actual search index record # Now create the actual search index record
writer.add_document( try:
id = fsha, writer.add_document(
kind = 'markdown', id = fsha,
created_time = '', kind = 'markdown',
modified_time = '', created_time = None,
indexed_time = indexed_time, modified_time = None,
title = fname, indexed_time = indexed_time,
url = usable_url, title = fname,
mimetype='', url = usable_url,
owner_email='', mimetype='',
owner_name='', owner_email='',
repo_name = repo_name, owner_name='',
repo_url = repo_url, group='',
github_user = '', repo_name = repo_name,
issue_title = '', repo_url = repo_url,
issue_url = '', github_user = '',
content = content issue_title = '',
) issue_url = '',
content = content
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Github markdown file \"%s\""%(fname))
else: else:
print("Indexing github file %s from repo %s"%(fname,repo_name)) print("Indexing github file %s from repo %s"%(fname,repo_name))
@@ -501,24 +541,29 @@ class Search:
key = fname+"_"+fsha key = fname+"_"+fsha
# Now create the actual search index record # Now create the actual search index record
writer.add_document( try:
id = key, writer.add_document(
kind = 'ghfile', id = key,
created_time = '', kind = 'ghfile',
modified_time = '', created_time = None,
indexed_time = indexed_time, modified_time = None,
title = fname, indexed_time = indexed_time,
url = repo_url, title = fname,
mimetype='', url = repo_url,
owner_email='', mimetype='',
owner_name='', owner_email='',
repo_name = repo_name, owner_name='',
repo_url = repo_url, group='',
github_user = '', repo_name = repo_name,
issue_title = '', repo_url = repo_url,
issue_url = '', github_user = '',
content = '' issue_title = '',
) issue_url = '',
content = ''
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Github file \"%s\""%(fname))
@@ -532,28 +577,42 @@ class Search:
Use a Github file API record to add a filename Use a Github file API record to add a filename
to the search index. to the search index.
""" """
indexed_time = clean_timestamp(datetime.now()) if 'created_time' in d.keys() and d['created_time'] is not None:
created_time = d['created_time']
else:
created_time = None
if 'modified_time' in d.keys() and d['modified_time'] is not None:
modified_time = d['modified_time']
else:
modified_time = None
indexed_time = datetime.now()
# Now create the actual search index record # Now create the actual search index record
writer.add_document( try:
id = d['permalink'], writer.add_document(
kind = 'emailthread', id = d['permalink'],
created_time = '', kind = 'emailthread',
modified_time = '', created_time = created_time,
indexed_time = indexed_time, modified_time = modified_time,
title = d['subject'], indexed_time = indexed_time,
url = d['permalink'], title = d['subject'],
mimetype='', url = d['permalink'],
owner_email='', mimetype='',
owner_name=d['original_sender'], owner_email='',
repo_name = '', owner_name=d['original_sender'],
repo_url = '', group=d['subgroup'],
github_user = '', repo_name = '',
issue_title = '', repo_url = '',
issue_url = '', github_user = '',
content = d['content'] issue_title = '',
) issue_url = '',
content = d['content']
)
except ValueError as e:
print(repr(e))
print(" > XXXXXX Failed to index Groups.io thread \"%s\""%(d['subject']))
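
The email-thread hunk above guards each optional timestamp with a key check followed by a `None` check. `dict.get` gives the same result in one call, since it returns `None` both when the key is absent and when the stored value is `None`. A small sketch, with a hypothetical helper name:

```python
def thread_times(d):
    """Return (created_time, modified_time) from a thread dict, None when absent.

    Hypothetical helper; equivalent to the if/else blocks in the hunk.
    """
    created_time = d.get('created_time')
    modified_time = d.get('modified_time')
    return created_time, modified_time
```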
@@ -631,10 +690,10 @@ class Search:
full_items[f['id']] = f full_items[f['id']] = f
## Shorter: ## Shorter:
#break break
# Longer: ## Longer:
if nextPageToken is None: #if nextPageToken is None:
break # break
writer = self.ix.writer() writer = self.ix.writer()
@@ -642,34 +701,41 @@ class Search:
temp_dir = tempfile.mkdtemp(dir=os.getcwd()) temp_dir = tempfile.mkdtemp(dir=os.getcwd())
print("Temporary directory: %s"%(temp_dir)) print("Temporary directory: %s"%(temp_dir))
try:
# Drop any id in indexed_ids
# not in remote_ids
drop_ids = indexed_ids - remote_ids
for drop_id in drop_ids:
writer.delete_by_term('id',drop_id)
# Drop any id in indexed_ids # Update any id in indexed_ids
# not in remote_ids # and in remote_ids
drop_ids = indexed_ids - remote_ids update_ids = indexed_ids & remote_ids
for drop_id in drop_ids: for update_id in update_ids:
writer.delete_by_term('id',drop_id) # cop out
writer.delete_by_term('id',update_id)
item = full_items[update_id]
self.add_drive_file(writer, item, temp_dir, config, update=True)
count += 1
# Update any id in indexed_ids # Add any id not in indexed_ids
# and in remote_ids # and in remote_ids
update_ids = indexed_ids & remote_ids add_ids = remote_ids - indexed_ids
for update_id in update_ids: for add_id in add_ids:
# cop out item = full_items[add_id]
writer.delete_by_term('id',update_id) self.add_drive_file(writer, item, temp_dir, config, update=False)
item = full_items[update_id] count += 1
self.add_drive_file(writer, item, temp_dir, config, update=True)
count += 1
# Add any id not in indexed_ids
# and in remote_ids
add_ids = remote_ids - indexed_ids
for add_id in add_ids:
item = full_items[add_id]
self.add_drive_file(writer, item, temp_dir, config, update=False)
count += 1
except Exception as e:
print("ERROR: While adding Google Drive files to search index")
print("-"*40)
print(repr(e))
print("-"*40)
print("Continuing...")
pass
print("Cleaning temporary directory: %s"%(temp_dir)) print("Cleaning temporary directory: %s"%(temp_dir))
subprocess.call(['rm','-fr',temp_dir]) subprocess.call(['rm','-fr',temp_dir])
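
The hunk above splits the Drive sync into three groups using set operations on `indexed_ids` and `remote_ids`: ids to drop, ids to update, and ids to add. The partition can be sketched in isolation as follows; `plan_sync` is a hypothetical name, not a function in the repository:

```python
def plan_sync(indexed_ids, remote_ids):
    """Partition ids into drop/update/add groups for an index sync.

    Hypothetical helper mirroring the set arithmetic in the hunk.
    """
    indexed_ids = set(indexed_ids)
    remote_ids = set(remote_ids)
    return {
        # In the index but no longer remote: delete from the index
        "drop": indexed_ids - remote_ids,
        # Present on both sides: delete and re-add
        "update": indexed_ids & remote_ids,
        # Remote only: add as new records
        "add": remote_ids - indexed_ids,
    }
```

The three groups are disjoint and together cover every id seen on either side, so each document is handled exactly once per run.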
@@ -1074,7 +1140,7 @@ class Search:
elif doctype=='issue': elif doctype=='issue':
item_keys = ['title','repo_name','repo_url','url','created_time','modified_time'] item_keys = ['title','repo_name','repo_url','url','created_time','modified_time']
elif doctype=='emailthread': elif doctype=='emailthread':
item_keys = ['title','owner_name','url'] item_keys = ['title','owner_name','url','created_time','modified_time']
elif doctype=='ghfile': elif doctype=='ghfile':
item_keys = ['title','repo_name','repo_url','url'] item_keys = ['title','repo_name','repo_url','url']
elif doctype=='markdown': elif doctype=='markdown':
@@ -1091,11 +1157,7 @@ class Search:
for r in results: for r in results:
d = {} d = {}
for k in item_keys: for k in item_keys:
if k=='created_time' or k=='modified_time': d[k] = r[k]
#d[k] = r[k]
d[k] = dateutil.parser.parse(r[k]).strftime("%Y-%m-%d")
else:
d[k] = r[k]
json_results.append(d) json_results.append(d)
return json_results return json_results
@@ -1108,7 +1170,9 @@ class Search:
query_string = " ".join(query_list) query_string = " ".join(query_list)
query = None query = None
if ":" in query_string: if ":" in query_string:
query = QueryParser("content", self.schema).parse(query_string) query = QueryParser("content", self.schema)
query.add_plugin(DateParserPlugin(free=True))
query = query.parse(query_string)
elif len(fields) == 1 and fields[0] == "filename": elif len(fields) == 1 and fields[0] == "filename":
pass pass
elif len(fields) == 2: elif len(fields) == 2:
@@ -1116,9 +1180,12 @@ class Search:
else: else:
# If the user does not specify a field, # If the user does not specify a field,
# these are the fields that are actually searched # these are the fields that are actually searched
fields = ['title', 'content','owner_name','owner_email','url'] fields = ['title', 'content','owner_name','owner_email','url','created_date','modified_date']
if not query: if not query:
query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string) query = MultifieldParser(fields, schema=self.ix.schema)
query.add_plugin(DateParserPlugin(free=True))
query = query.parse(query_string)
#query = MultifieldParser(fields, schema=self.ix.schema).parse(query_string)
parsed_query = "%s" % query parsed_query = "%s" % query
print("query: %s" % parsed_query) print("query: %s" % parsed_query)
results = searcher.search(query, terms=False, scored=True, groupedby="kind") results = searcher.search(query, terms=False, scored=True, groupedby="kind")
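
The schema earlier in this file names its date fields `created_time` and `modified_time`, while the default search list in the hunk above adds `created_date` and `modified_date`, which do not appear among the schema fields shown in this diff. A quick check like the one below, a sketch with a hypothetical helper name, lists any default field that has no counterpart in the schema; the field names in the usage example are copied from the hunks above:

```python
def unknown_fields(search_fields, schema_names):
    """Return the search fields that are not defined in the schema, in order.

    Hypothetical helper; schema_names is any iterable of field names.
    """
    known = set(schema_names)
    return [name for name in search_fields if name not in known]


schema_names = [
    'id', 'kind', 'created_time', 'modified_time', 'indexed_time',
    'title', 'url', 'mimetype', 'owner_email', 'owner_name', 'group',
    'repo_name', 'repo_url', 'github_user', 'tags',
    'issue_title', 'issue_url', 'content',
]
default_fields = ['title', 'content', 'owner_name', 'owner_email', 'url',
                  'created_date', 'modified_date']

missing = unknown_fields(default_fields, schema_names)
```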

28
config_centillion.py Normal file

@@ -0,0 +1,28 @@
config = {
"repositories" : [
"dcppc/project-management",
"dcppc/nih-demo-meetings",
"dcppc/internal",
"dcppc/organize",
"dcppc/dcppc-bot",
"dcppc/full-stacks",
"dcppc/design-guidelines-discuss",
"dcppc/dcppc-deliverables",
"dcppc/dcppc-milestones",
"dcppc/crosscut-metadata",
"dcppc/lucky-penny",
"dcppc/dcppc-workshops",
"dcppc/metadata-matrix",
"dcppc/data-stewards",
"dcppc/dcppc-phase1-demos",
"dcppc/apis",
"dcppc/2018-june-workshop",
"dcppc/2018-july-workshop",
"dcppc/2018-august-workshop",
"dcppc/2018-september-workshop",
"dcppc/design-guidelines",
"dcppc/2018-may-workshop",
"dcppc/centillion"
]
}


@@ -1,5 +1,7 @@
import requests, os, re import requests, os, re
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
import dateutil.parser
import datetime
class GroupsIOException(Exception): class GroupsIOException(Exception):
pass pass
@@ -64,7 +66,7 @@ class GroupsIOArchivesCrawler(object):
## Short circuit ## Short circuit
## for debugging purposes ## for debugging purposes
#break break
return subgroups return subgroups
@@ -251,7 +253,7 @@ class GroupsIOArchivesCrawler(object):
subject = soup.find('title').text subject = soup.find('title').text
# Extract information for the schema: # Extract information for the schema:
# - permalink for thread (done) # - permalink for thread (done above)
# - subject/title (done) # - subject/title (done)
# - original sender email/name (done) # - original sender email/name (done)
# - content (done) # - content (done)
@@ -266,11 +268,35 @@ class GroupsIOArchivesCrawler(object):
pass pass
else: else:
# found an email! # found an email!
# this is a maze, thanks groups.io # this is a maze, not amazing.
# thanks groups.io!
td = tr.find('td') td = tr.find('td')
divrow = td.find('div',{'class':'row'}).find('div',{'class':'pull-left'})
sender_divrow = td.find('div',{'class':'row'})
sender_divrow = sender_divrow.find('div',{'class':'pull-left'})
if (i+1)==1: if (i+1)==1:
original_sender = divrow.text.strip() original_sender = sender_divrow.text.strip()
date_divrow = td.find('div',{'class':'row'})
date_divrow = date_divrow.find('div',{'class':'pull-right'})
date_divrow = date_divrow.find('font',{'class':'text-muted'})
date_divrow = date_divrow.find('script').text
try:
time_seconds = re.search(' [0-9]{1,} ',date_divrow).group(0)
time_seconds = time_seconds.strip()
# Thanks groups.io for the weird date formatting
time_seconds = time_seconds[:10]
mmicro_seconds = time_seconds[10:]
if (i+1)==1:
created_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
else:
modified_time = datetime.datetime.utcfromtimestamp(int(time_seconds))
except AttributeError:
created_time = None
modified_time = None
for div in td.find_all('div'): for div in td.find_all('div'):
if div.has_attr('id'): if div.has_attr('id'):
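
In the hunk above, `time_seconds` is truncated to its first ten characters before `mmicro_seconds` is sliced from it, so the second slice is always empty. A sketch that keeps both parts by slicing the original digit string once is shown below. The helper names and the sample script text in the usage note are hypothetical, and the assumption that the first ten digits are whole seconds comes from the hunk itself:

```python
import re
from datetime import datetime, timezone


def split_epoch_digits(digits, seconds_width=10):
    """Split a long epoch digit string into (seconds, remaining digits)."""
    seconds = int(digits[:seconds_width])
    remainder = digits[seconds_width:]
    return seconds, remainder


def extract_timestamp(script_text):
    """Find a space-delimited run of digits and return it as a naive UTC datetime.

    Hypothetical helper; returns None when no such run is found.
    """
    match = re.search(r' ([0-9]+) ', script_text)
    if match is None:
        return None
    seconds, _ = split_epoch_digits(match.group(1))
    # Same result as utcfromtimestamp(), without relying on that deprecated call
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)
```

For a made-up input such as `"DisplayShortTime( 1535094275123456789 , false)"`, `extract_timestamp` uses only the leading `1535094275` seconds and ignores the trailing digits.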
@@ -299,7 +325,10 @@ class GroupsIOArchivesCrawler(object):
thread = { thread = {
'permalink' : permalink, 'permalink' : permalink,
'created_time' : created_time,
'modified_time' : modified_time,
'subject' : subject, 'subject' : subject,
'subgroup' : subgroup_name,
'original_sender' : original_sender, 'original_sender' : original_sender,
'content' : full_content 'content' : full_content
} }
@@ -324,11 +353,13 @@ class GroupsIOArchivesCrawler(object):
results = [] results = []
for row in rows: for row in rows:
# We don't care about anything except title and ugly link # This is where we extract
# a list of thread titles
# and corresponding links.
subject = row.find('span',{'class':'subject'}) subject = row.find('span',{'class':'subject'})
title = subject.get_text() title = subject.get_text()
link = row.find('a')['href'] link = row.find('a')['href']
#print(title)
results.append((title,link)) results.append((title,link))
return results return results


@@ -7,19 +7,127 @@
// flask post data as json: // flask post data as json:
// https://stackoverflow.com/a/16664376 // https://stackoverflow.com/a/16664376
/* make the smile green */ /* this function is called when the user submits
* the feedback form. it submits a post request
* to the flask server, which squirrels away the
* feedback in a file.
*/
function submit_feedback() {
// this function is called when submit button clicked
// algorithm:
// - check if text box has content
// - check if happy/sad filled out
var smile_active = $('#modal-feedback-smile-div').hasClass('smile-active');
var frown_active = $('#modal-feedback-frown-div').hasClass('frown-active');
if( !( smile_active || frown_active ) ) {
alert('Please pick the smile or the frown.')
} else if( $('#modal-feedback-textarea').val()=='' ) {
alert('Please provide us with some feedback.')
} else {
var user_sentiment = '';
if(smile_active) {
user_sentiment = 'smile';
} else {
user_sentiment = 'frown';
}
var escaped_text = $('#modal-feedback-textarea').val();
// prepare form data
var data = {
sentiment : user_sentiment,
content : escaped_text
};
// post the form. the callback function resets the form
$.post("/feedback",
data,
function(response) {
$('#myModal').modal('hide');
$('#myModalForm')[0].reset();
add_alert(response);
frown_unclick();
smile_unclick();
});
}
}
function add_alert(response) {
str = ""
str += '<div id="feedback-messages-container" class="container">';
if (response['status']=='ok') {
// if status is ok, use alert-success
str += ' <div id="feedback-messages-alert" class="alert alert-success alert-dismissible fade in">';
} else {
// otherwise use alert-danger
str += ' <div id="feedback-messages-alert" class="alert alert-danger alert-dismissible fade in">';
}
str += ' <a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>';
str += ' <div id="feedback-messages-contianer" class="container-fluid">';
str += ' <div id="feedback-messages-div" class="col-xs-12">';
str += ' <p>'
str += response['message'];
str += ' </p>';
str += ' </div>';
str += ' </div>';
str += '</div>';
$('div#messages').append(str);
}
/* for those particularly wordy users... limit feedback to 1100 chars */
function cool_it() {
if($('#modal-feedback-textarea').val().length > 1100 ){
$('#modal-too-long').show();
} else {
$('#modal-too-long').hide();
}
}
/* smiley functions */
function smile_click() { function smile_click() {
$('#modal-feedback-smile-div').addClass('smile-active'); $('#modal-feedback-smile-div').addClass('smile-active');
$('#modal-feedback-smile-icon').addClass('smile-active'); $('#modal-feedback-smile-icon').addClass('smile-active');
}
function frown_click() {
$('#modal-feedback-frown-div').addClass('frown-active');
$('#modal-feedback-frown-icon').addClass('frown-active');
}
function smile_unclick() {
$('#modal-feedback-smile-div').removeClass('smile-active');
$('#modal-feedback-smile-icon').removeClass('smile-active');
}
function frown_unclick() {
$('#modal-feedback-frown-div').removeClass('frown-active'); $('#modal-feedback-frown-div').removeClass('frown-active');
$('#modal-feedback-frown-icon').removeClass('frown-active'); $('#modal-feedback-frown-icon').removeClass('frown-active');
} }
/* make the frown red */ function smile() {
function frown_click() { frown_unclick();
$('#modal-feedback-smile-div').removeClass('smile-active'); smile_click();
$('#modal-feedback-smile-icon').removeClass('smile-active'); }
$('#modal-feedback-frown-div').addClass('frown-active'); function frown() {
$('#modal-feedback-frown-icon').addClass('frown-active'); smile_unclick();
frown_click();
} }
/* for those particularly wordy users... limit feedback to 1100 chars */
// how to check n characters in a textarea
// https://stackoverflow.com/a/19934613
/*
$(document).ready(function() {
$('#modal-feedback-textarea').on('change',function(event) {
if($('#modal-feedback-textarea').val().length > 1100 ){
$('#modal-too-long').show();
} else {
$('#modal-too-long').hide();
}
});
}
*/


@@ -1,3 +1,7 @@
#modal-too-long {
display: none;
}
/* feedback smileys */ /* feedback smileys */
#modal-feedback-smile-icon, #modal-feedback-smile-icon,
#modal-feedback-frown-icon { #modal-feedback-frown-icon {

25
templates/banner.html Normal file

@@ -0,0 +1,25 @@
<div class="container" id="banner-container">
{#
banner image
#}
<div class="row" id="banner-row">
<div class="col12sm" id="banner-col">
<center>
<a id="banner-a" href="{{ url_for('search')}}?query=&fields=">
<img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
</a>
</center>
</div>
</div>
{% if config['TAGLINE'] %}
<div class="row" id="tagline-row">
<div class="col12sm" id="tagline-col">
<center>
<h2 id="tagline-tagline"> {{config['TAGLINE']}} </h2>
</center>
</div>
</div>
{% endif %}
</div>


@@ -0,0 +1,14 @@
<div id="messages">
{% with messages = get_flashed_messages() %}
{% if messages %}
<div class="container" id="flashed-messages-container">
<div class="alert alert-success alert-dismissible fade in">
<a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>
{% for message in messages %}
<p class="lead">{{ message }}</p>
{% endfor %}
</div>
</div>
{% endif %}
{% endwith %}
</div>


@@ -26,53 +26,23 @@
<div id="master-div"> <div id="master-div">
{% with messages = get_flashed_messages() %} {#
{% if messages %} flashed messages
<div class="container" id="flashed-messages-container"> #}
<div class="alert alert-success alert-dismissible"> {% include "flashed_messages.html" %}
<a href="#" class="close" data-dismiss="alert" aria-label="close">&times;</a>
<ul class=flashes>
{% for message in messages %}
<li>{{ message }}</li>
{% endfor %}
</ul>
</div>
</div>
{% endif %}
{% endwith %}
<div class="container" id="banner-container">
{#
banner image
#}
<div class="row" id="banner-row">
<div class="col12sm" id="banner-col">
<center>
<a id="banner-a" href="{{ url_for('search')}}?query=&fields=">
<img id="banner-img" src="{{ url_for('static', filename='centillion_white.png') }}">
</a>
</center>
</div>
</div>
{% if config['TAGLINE'] %}
<div class="row" id="tagline-row">
<div class="col12sm" id="tagline-col">
<center>
<h2 id="tagline-tagline"> {{config['TAGLINE']}} </h2>
</center>
</div>
</div>
{% endif %}
</div>
{# {#
feedback modal banner image
#} #}
{% include "banner.html" %}
{#
feedback modal
#}
{% include "modal.html" %} {% include "modal.html" %}
{% block body %}{% endblock %} {% block body %}{% endblock %}
</div> </div>
{% if active_page=="search" or active_page=="master_list" %} {% if active_page=="search" or active_page=="master_list" %}


@@ -1,38 +1,51 @@
<div class="modal fade" id="myModal" tabindex="-1" role="dialog" aria-labelledby="myModalLabel"> <div class="modal fade" id="myModal" tabindex="-1" role="dialog" aria-labelledby="myModalLabel">
<div class="modal-dialog" role="document">
<div class="modal-content"> <form id="myModalForm" method="post">
<div class="modal-header">
<button type="button" class="close" data-dismiss="modal" aria-label="Close"> <div class="modal-dialog" role="document">
<span aria-hidden="true">&times;</span> <div id="myModal-content" class="modal-content">
</button> <div id="myModal-header" class="modal-header">
<h4 class="modal-title" id="myModalLabel"> <button type="button" class="close" data-dismiss="modal" aria-label="Close">
Send us feedback! <span aria-hidden="true">&times;</span>
</h4> </button>
</div> <h4 class="modal-title" id="myModalLabel">
<div class="modal-body"> Send us feedback!
<div class="container-fluid"> </h4>
<div id="modal-feedback-smile-div" class="col-xs-6 text-center" onClick="smile_click()">
<i id="modal-feedback-smile-icon" class="fa fa-smile-o fa-4x" aria-hidden="true"></i>
</div>
<div id="modal-feedback-frown-div" class="col-xs-6 text-center" onClick="frown_click()">
<i id="modal-feedback-frown-icon" class="fa fa-frown-o fa-4x" aria-hidden="true"></i>
</div>
</div> </div>
<div class="container-fluid"> <div id="myModal-body" class="modal-body">
<p>&nbsp;</p> <div id="modal-feedback-smile-frown-container" class="container-fluid">
</div> <div id="modal-feedback-smile-div" class="col-xs-6 text-center"
<div class="container-fluid"> onClick="smile()">
<textarea id="modal-feedback-textarea" rows="6"></textarea> <i id="modal-feedback-smile-icon" class="fa fa-smile-o fa-4x" aria-hidden="true"></i>
</div> </div>
</div> <div id="modal-feedback-frown-div" class="col-xs-6 text-center"
<div class="modal-footer"> onClick="frown()">
<div class="text-center"> <i id="modal-feedback-frown-icon" class="fa fa-frown-o fa-4x" aria-hidden="true"></i>
<button id="submit-feedback-btn" type="button" class="btn btn-lg btn-primary" data-dismiss="modal"> </div>
Send
</button>
</div> </div>
<div class="container-fluid">
<p>&nbsp;</p>
</div>
<div id="modal-feedback-textarea-container" class="container-fluid">
<textarea id="modal-feedback-textarea" rows="6"></textarea>
</div>
<div id="modal-too-long" class="container-fluid" >
<p id="modal-too-long-text" class="lead">Please limit the length of your feedback. Thank you in advance!</p>
</div>
</div>
<div id="myModal-footer" class="modal-footer">
<div class="text-center">
<button id="submit-feedback-btn" type="button"
onClick="submit_feedback()"
class="btn btn-lg btn-primary">
Send
</button>
</div>
</div>
</div> </div>
</div> </div>
</div>
</form>
</div> </div>


@@ -25,6 +25,8 @@
</div> </div>
</div> </div>
<div style="height: 20px;"><p>&nbsp;</p></div>
<div id="info-bars-container" class="container"> <div id="info-bars-container" class="container">
<div class="row"> <div class="row">