commit 1e63d8b307a42230db0a7e3fe2b2db9abcf2b608
parent aa3fdeaefb1f80aa0838c2ea62b8e73f6e832d40
Author: Natalie Pendragon <>
Date:   Sun,  1 Nov 2020 11:05:07 -0500

Clean up todo list in README

Diffstat: | 29+++++++----------------------
1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/ b/
@@ -34,25 +34,10 @@ Now you'll have created `` directory, rename it to `index`.
 ## Roadmap / TODOs
-- **log output of crawl**: I see some errors fly by, and it
-  would be nice to be able to review later and investigate.
-- **get crawl to run on a schedule with systemd**
-- **add more statistics**: this could go in the index statistics
-  page, and, in addition to using the index itself, could also
-  pull information from the jetforce logs.
-    - server uptime (from indexes)
-    - num new servers per week/month (from indexes)
-    - num GUS queries per day (from server logs)
-    - most common queries (not sure about this one) (from server logs)
-    - num cross-domain redirects
-    - num domains with robots
-- **add tests**: there aren't any yet!
-- **add functionality to create a mock index**: this would
-  be useful for local hacking on, so one does
-  not need to perform a real scrape of Geminispace to do
-  said hacking.
-- **exclude raw-text links**: I think there is a "raw-text block"
-  type of construct in the Gemini spec now, so I should probably
-  add a TODO to refactor the extract_gemini_links function to
-  exclude any links found within such a block.
-- **track number of inbound links**
+- TODO: improve crawl and build_index automation
+- TODO: get crawl to run on a schedule with systemd
+- TODO: add some automated tests
+- TODO: add functionality to create a mock index
+- TODO: exclude raw-text blocks from indexed content
+- TODO: strip control characters from logged output like URLs
+- TODO: fix bug in calulation of backlinks (iirc the bug is visible on