# charlesreid1-wiki-data
This repository contains scripts for analyzing graph and edit data from the charlesreid1 wiki: https://charlesreid1.com/wiki
This data must be prepared ahead of time and loaded into a MongoDB.
For the scripts that actually assemble the data set, see the dotfiles/debian repo, specifically the `dotfiles/jupiter_scripts/` directory and the `push_wiki.py` script.
NOTE (08/23/2018): This repo has lots of cobwebs and needs to be cleaned up.
## page history
Implements a method to scrape the wiki and create commits for each revision. Implemented in `page_history.py`.
The page history database contains a document for every revision on the wiki. Each revision document contains the following fields (a sketch of one such document follows the list):
- Revision sha1 hash of page contents (id)
- Page title
- Page revision timestamp
- Page revision character count
- Namespace, tags, categories
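
The snippet below is a minimal sketch of what a revision document might look like and how the page history collection could be queried with pymongo. The database name, collection name, and exact field names here are assumptions for illustration; see `page_history.py` for the actual schema.

```python
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["wiki"]              # hypothetical database name
revisions = db["page_history"]   # hypothetical collection name

# Example revision document (fields mirror the list above).
example_revision = {
    "_id": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",  # sha1 of page contents
    "title": "Linear Algebra",
    "timestamp": "2018-08-23T12:34:56Z",
    "count": 12345,                                     # character count
    "namespace": 0,
    "categories": ["mathematics"],
}
revisions.insert_one(example_revision)

# Count revisions per page title, largest first.
pipeline = [
    {"$group": {"_id": "$title", "n_revisions": {"$sum": 1}}},
    {"$sort": {"n_revisions": -1}},
]
for doc in revisions.aggregate(pipeline):
    print(doc["_id"], doc["n_revisions"])
```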
## page graph
This implements a graph data structure in the files:

- `graph.py` - base Graph object
- `mongodb_graph.py` - base MongoDB Graph object (MongoDB I/O)
- `graph_algorithms.py` - implements several algorithms for graphs
The page graph is constructed in memory as the wiki is scraped, and exported to the database when the process finishes.
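
The following is a minimal sketch of that build-in-memory, export-at-the-end pattern. The class name, node/edge representation, and MongoDB names are illustrative assumptions, not the repo's actual API; see `graph.py` and `mongodb_graph.py` for the real implementations.

```python
from pymongo import MongoClient


class PageGraph:
    """Directed graph of wiki pages: an edge (u, v) means page u links to page v."""

    def __init__(self):
        self.adj = {}  # page title -> set of linked page titles

    def add_edge(self, src, dst):
        self.adj.setdefault(src, set()).add(dst)
        self.adj.setdefault(dst, set())

    def export_to_mongo(self, collection):
        """Write one document per node, with its outgoing edges, to MongoDB."""
        collection.delete_many({})
        docs = [{"title": node, "links_to": sorted(edges)}
                for node, edges in self.adj.items()]
        if docs:
            collection.insert_many(docs)


# Build the graph in memory while scraping, then export once at the end.
graph = PageGraph()
graph.add_edge("Linear Algebra", "Matrix")
graph.add_edge("Matrix", "Determinant")

client = MongoClient("localhost", 27017)
graph.export_to_mongo(client["wiki"]["page_graph"])  # hypothetical db/collection names
```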