use whoosh to search documents in a google drive folder
Charles Reid a40243c3b8 Merge branch 'master' of https://git.charlesreid1.com/charlesreid1/cheeseburger-search into pandoc 6 years ago
static init commit of everything copied over from issues-search 6 years ago
templates minor cleanup of search template 6 years ago
.gitignore ignore google drive credential jsons 6 years ago
LICENSE init commit of everything copied over from issues-search 6 years ago
Readme.md udpate readme and todo 6 years ago
Todo.md mark v0.3 as in the bag 6 years ago
cheeseburger_app.py fix mysterious bug in app 6 years ago
cheeseburger_search.py actually clean up 6 years ago
config.py.sample init commit of everything copied over from issues-search 6 years ago
gdrive_util.py constructors should not return anything 6 years ago
markdown_parser.py init commit of everything copied over from issues-search 6 years ago
requirements.txt add pandoc mistune requests to requirements.txt 6 years ago

Readme.md

cheeseburger-search

use whoosh to search documents in a google drive folder.

Implemented in Python using Flask, Whoosh and Mistune.

virtualenv

virtualenv vp
source vp/bin/activate
pip install -r requirements.txt

notes

in addition to the schema changes listed in issues-search:

  • also need to update schema for new document types, of course
  • document schema implies new objects, new arguments being passed
  • each method listed will also have new arguments
  • how to integrate a new API?
  • API is only used in search portion...
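Here is a minimal sketch of how "new document types imply new objects" might look, assuming one record per Drive file. `DriveDocument`, `SCHEMA_FIELDS` and `index_fields()` are hypothetical names that do not come from the repo. The point is that the record's attributes and the schema's field names have to stay in sync, so the record can be handed to `writer.add_document(**fields)`.

```python
from dataclasses import dataclass, asdict


@dataclass
class DriveDocument:
    """Hypothetical record for one Google Drive file to be indexed."""
    doc_id: str      # Drive file ID, unique per document
    name: str        # file name as shown in Drive
    mime_type: str   # e.g. "application/vnd.google-apps.document"
    url: str         # link back to the file
    content: str     # extracted text to be full-text indexed
    collection: str  # label of the top-level folder the file came from


# Field names of the (hypothetical) Whoosh schema. In Whoosh these would be
# declared with whoosh.fields, e.g. ID for exact-match keys and TEXT for
# analyzed full-text fields.
SCHEMA_FIELDS = ("doc_id", "name", "mime_type", "url", "content", "collection")


def index_fields(doc: DriveDocument) -> dict:
    """Return keyword arguments for writer.add_document(**fields)."""
    fields = asdict(doc)
    # Keep record and schema in sync: every attribute must name a schema field.
    missing = set(fields) - set(SCHEMA_FIELDS)
    if missing:
        raise ValueError(f"fields not in schema: {sorted(missing)}")
    return fields
```

Adding a new document type then means touching both places: the record class and the schema field list.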

integrating a new API:

  • open the *_search.py file
  • import it at the top
  • in the add_all_documents() method, create your API instance
  • in update_index_incremental() (the alternate route), create your API instance

dealing with API library's objects:

  • from there, the API is used to obtain various kinds of objects
  • the objects can "just be used" without a problem from other methods
  • example: the GitHub API is used to get a repo object in add_all_issues(), and that repo object is then passed to add_issue() with no problem
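The steps above can be sketched as follows, under stated assumptions. `DriveAPI` is a hypothetical stand-in for whatever the new API library (or the gdrive_util.py wrapper) provides, and a plain list stands in for the Whoosh writer. The API instance is created in add_all_documents() and in update_index_incremental(). The objects it returns are passed as-is to add_document(), the same way the repo object was passed to add_issue().

```python
# At the top of *_search.py you would import the new API library.
# Here a tiny stand-in class plays that role so the sketch runs on its own.
class DriveAPI:
    """Hypothetical wrapper around the Google Drive API."""

    def __init__(self, credentials_file):
        self.credentials_file = credentials_file

    def list_files(self):
        # A real wrapper would call the Drive API; this returns canned objects.
        return [{"id": "1", "name": "a.md"}, {"id": "2", "name": "b.md"}]


class Search:
    def __init__(self):
        self.indexed = []  # stands in for a Whoosh index writer

    def add_all_documents(self, credentials_file, collection):
        # Create the API instance here...
        api = DriveAPI(credentials_file)
        # ...use it to obtain the library's objects...
        for drive_file in api.list_files():
            # ...and pass those objects straight to another method.
            self.add_document(drive_file, collection)

    def update_index_incremental(self, credentials_file, collection):
        # The alternate route creates its own API instance too.
        api = DriveAPI(credentials_file)
        known = {doc["id"] for doc, _ in self.indexed}
        for drive_file in api.list_files():
            if drive_file["id"] not in known:
                self.add_document(drive_file, collection)

    def add_document(self, drive_file, collection):
        # No API instance needed here: the object arrives as an argument.
        self.indexed.append((drive_file, collection))
```

This sketch's incremental route only compares file IDs. As a result, it misses edits to files that are already in the index.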

adding support for "collections":

  • may eventually want to add multiple folders
  • add labels for top-level folders... call it a "collection"
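Here is one way collections could work, as a sketch. Each top-level folder ID maps to a label, every file inherits the label of the folder it came from, and results can be narrowed to a single label. The folder IDs, labels and function names are made up for illustration.

```python
# Hypothetical mapping from top-level Drive folder IDs to collection labels.
COLLECTIONS = {
    "folder-id-recipes": "recipes",
    "folder-id-notes": "notes",
}


def label_documents(files_by_folder):
    """Tag every file with the collection label of its top-level folder."""
    labeled = []
    for folder_id, files in files_by_folder.items():
        collection = COLLECTIONS[folder_id]
        for f in files:
            labeled.append({**f, "collection": collection})
    return labeled


def filter_by_collection(results, collection=None):
    """Restrict results to one collection; None means search everything."""
    if collection is None:
        return list(results)
    return [r for r in results if r["collection"] == collection]
```

In Whoosh itself, storing the label in an ID field would let a query be restricted by combining it with a `whoosh.query.Term` on that field.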

the api:

  • replace add_issue with add_document
  • replace add_all_issues with add_all_documents
  • update the method signatures:
    • add_document() - add document object and collection label
    • add_all_documents() - add credentials filename and collection label
    • update_index_incremental() - add credentials filename and collection label
  • drive credentials
  • drive API object wrapper
    • spiraling out of control
  • example: list all files.
    • import pdb right before we run example code to print all docs
    • this is write before (ha ha) the writer.add_document() call
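Listing every file in a folder means following the Drive v3 `files().list()` pagination, passing each `nextPageToken` back as `pageToken` until no token is returned. This sketch assumes a service-account credentials JSON and a read-only scope. The project's gdrive_util.py may authenticate differently. `build_drive_service()` and `list_all_files()` are hypothetical helper names.

```python
def build_drive_service(credentials_file):
    """Build a Drive v3 client from a service-account JSON file (assumption).

    Imports are local so list_all_files() can be used without the Google libs.
    """
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        credentials_file,
        scopes=["https://www.googleapis.com/auth/drive.readonly"],
    )
    return build("drive", "v3", credentials=creds)


def list_all_files(service, folder_id):
    """Return every non-trashed file directly inside folder_id, across pages."""
    files = []
    page_token = None
    while True:
        response = service.files().list(
            q=f"'{folder_id}' in parents and trashed = false",
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token,
        ).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:
            return files
```

Each returned dict is what would eventually reach add_document(). You can inspect each one by putting `import pdb; pdb.set_trace()` inside the loop over these files, just before writer.add_document().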

last schema thing to change:

  • search() method in *_search.py
  • the list of fields it searches needs to be updated to match the new schema
  • don't exactly understand that if block but okkkkk....
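To show why the field list matters, here is a stdlib-only stand-in for the search step. It uses naive substring matching, not Whoosh's query parser. Only the fields named in `SEARCH_FIELDS` are examined, so a field that is added to the schema but not to the list is silently never searched. `SEARCH_FIELDS` and this `search()` are hypothetical. In Whoosh, a field list like this is commonly passed to `whoosh.qparser.MultifieldParser`.

```python
# Fields the search looks in; each must name a field that exists in the schema.
SEARCH_FIELDS = ("name", "content")


def search(documents, query, fields=SEARCH_FIELDS):
    """Naive, case-insensitive substring match over the listed fields only.

    A stand-in for Whoosh query parsing, used here only to show the effect
    of the field list: a field left out of it is never searched.
    """
    needle = query.lower()
    return [
        doc for doc in documents
        if any(needle in str(doc.get(field, "")).lower() for field in fields)
    ]
```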

todo

see Todo.md

creating apps

link to google apps docs