diff --git a/ b/
@@ -34,25 +34,10 @@ Now you'll have created `` directory, rename it to `index`.
 
 ## Roadmap / TODOs
 
-- **log output of crawl**: I see some errors fly by, and it
-  would be nice to be able to review later and investigate.
-- **get crawl to run on a schedule with systemd**
-- **add more statistics**: this could go in the index statistics
-  page, and, in addition to using the index itself, could also
-  pull information from the jetforce logs.
-  - server uptime (from indexes)
-  - num new servers per week/month (from indexes)
-  - num GUS queries per day (from server logs)
-  - most common queries (not sure about this one) (from server logs)
-  - num cross-domain redirects
-  - num domains with robots
-- **add tests**: there aren't any yet!
-- **add functionality to create a mock index**: this would
-  be useful for local hacking on, so one does
-  not need to perform a real scrape of Geminispace to do
-  said hacking.
-- **exclude raw-text links**: I think there is a "raw-text block"
-  type of construct in the Gemini spec now, so I should probably
-  add a TODO to refactor the extract_gemini_links function to
-  exclude any links found within such a block.
-- **track number of inbound links**
+- TODO: improve crawl and build_index automation
+- TODO: get crawl to run on a schedule with systemd
+- TODO: add some automated tests
+- TODO: add functionality to create a mock index
+- TODO: exclude raw-text blocks from indexed content
+- TODO: strip control characters from logged output like URLs
+- TODO: fix bug in calulation of backlinks (iirc the bug is visible on