geminispace.info

git clone git://code.clttr.info/geminispace.info.git

commit f928815d49189a5fcfa7753ed578ac36532ffc82
parent 6eedbd4190b8f1feafb34fcca4007655237fa9df
Author: René Wagner <rwa@clttr.info>
Date:   Mon, 11 Oct 2021 19:45:42 +0200

use cronjob for automated start
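
This commit replaces the systemd timer with a root cron job as the automated start. The sketch below installs the entry from README step 3 without duplicating it when run more than once. The helper name `add_crawl_entry` is hypothetical. The usage line assumes a cronie/Vixie-style `crontab` that reads the new table from stdin when given `-`.

```shell
#!/bin/sh
# Hypothetical helper: reads a crontab on stdin and prints it back with the
# crawl entry from README step 3 present exactly once.
CRAWL_ENTRY='0 9 */3 * * systemctl start gus-crawl --no-block'

add_crawl_entry() {
    # Pass existing lines through, dropping any earlier copy of the entry
    # (-x: whole-line match, -F: literal string, so "*" is not a regex operator).
    # grep -v exits 1 when it selects no lines, e.g. for an empty crontab.
    grep -vxF -- "$CRAWL_ENTRY" || true
    # Append a single copy of the entry at the end.
    printf '%s\n' "$CRAWL_ENTRY"
}

# Intended use, run as root (not executed here):
#   crontab -l 2>/dev/null | add_crawl_entry | crontab -
```

On the schedule itself: `*/3` in the day-of-month field fires on days 1, 4, 7, …, 31. These are the same days as the removed timer's `OnCalendar=*-*-1/3 08:00:00`, but the cron job runs at 09:00 instead of 08:00.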

Diffstat:
M  README.md               | 11 ++---------
D  infra/gus-crawl.timer   | 13 -------------
M  infra/gus-index.service |  2 +-
M  infra/rebuild_index.sh  |  1 -
4 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
@@ -39,9 +39,8 @@ Now you'll have created `index.new` directory, rename it to `index`.
 ### Running the crawl & indexer in production with systemd
 1. update `infra/gus-crawl.service` & `infra/gus-index.service` to match your needs (directory, user)
-2. update `infra/gus-crawl.timer` to match your needs (OnCalendar definition)
-3. copy both files to `/etc/systemd/system/`
-4. run `systemctl enable gus-crawl.timer` & `systemctl start gus-crawl.timer` to start the timer
+2. copy both files to `/etc/systemd/system/`
+3. set up a cron job for root with the following params: `0 9 */3 * * systemctl start gus-crawl --no-block`
 ## Running the test suite
@@ -50,12 +49,6 @@ Run: `poetry run pytest`
 ## Roadmap / TODOs
-- TODO: improve crawl and build_index automation
 - TODO: add functionality to create a mock index
 - TODO: exclude raw-text blocks from indexed content
 - TODO: strip control characters from logged output like URLs
-- TODO: fix bug in calulation of backlinks (iirc the bug is visible on gemini.circumlunar.space)
-- TODO: refactor manual exclusion logic to be regex-based instead of prefix-based. we could get more nuanced with exclusion logic this way
-- TODO: write a "clean" script that removes domains/pages from index, db, and statistics files, in accordance with the various exclusion lists and patterns
-- TODO: speed up statistics page, it's gotten reaaaaaaally slow
-- TODO: speed up newest hosts/pages pages, they've gotten reaaaaaaally slow
diff --git a/infra/gus-crawl.timer b/infra/gus-crawl.timer
@@ -1,13 +0,0 @@
-[Unit]
-Description=
-ConditionVirtualization=!container
-
-[Timer]
-OnCalendar=*-*-1/3 08:00:00
-AccuracySec=15m
-Persistent=true
-RandomizedDelaySec=600
-
-[Install]
-WantedBy=timers.target
-
diff --git a/infra/gus-index.service b/infra/gus-index.service
@@ -9,5 +9,5 @@ Group=gus
 Type=oneshot
 WorkingDirectory=/home/gus
 Environment="PYTHONUNBUFFERED=1"
-ExecStart=/bin/bash -c /home/gus/infra/update_index.sh
+ExecStart=/bin/bash -c /home/gus/infra/rebuild_index.sh
 ExecStopPost=sudo systemctl restart gus
diff --git a/infra/rebuild_index.sh b/infra/rebuild_index.sh
@@ -1,6 +1,5 @@
 cp -r /home/gus/index /home/gus/index.new
 /home/gus/.poetry/bin/poetry run build_index -d
 rm -rf /home/gus/index.old
-#rm -rf /home/gus/index.new/MAIN.tmp/
 mv /home/gus/index /home/gus/index.old
 mv /home/gus/index.new /home/gus/index
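
`gus-index.service` now runs `rebuild_index.sh` via `/bin/bash -c`. As shown, the script has no `set -e` or per-step checks. If `build_index` exits with an error, whatever it left in `index.new` is still rotated into place as the live `index`. The sketch below is a fail-fast variant written as a POSIX shell function. Several details are assumptions for illustration, not part of this commit: the name `rebuild_index`, passing the home directory and build command as arguments, and removing a stale `index.new` before copying.

```shell
#!/bin/sh
# Hypothetical fail-fast version of infra/rebuild_index.sh.
# Usage: rebuild_index GUS_HOME BUILD_COMMAND [ARGS...]
# The body runs in a subshell ( ... ) so `cd` and `exit` stay local.
rebuild_index() (
    gus_home=$1
    shift
    cd "$gus_home" || exit 1

    # A leftover index.new from an aborted run would make `cp -r` copy
    # the live index *into* it, so clear it first.
    rm -rf index.new || exit 1
    cp -r index index.new || exit 1

    # Stop here if the build fails; the live index is left untouched.
    "$@" || exit 1

    # Rotate: keep one previous generation as index.old.
    rm -rf index.old || exit 1
    mv index index.old || exit 1
    mv index.new index
)

# Production call, mirroring the original script (not executed here):
#   rebuild_index /home/gus /home/gus/.poetry/bin/poetry run build_index -d
```

The sketch uses explicit `|| exit 1` checks instead of `set -e`. This matters because `set -e` is ignored inside a function called from an `if` condition or an `||` list, which is how a caller would typically test its result. As in the original, there is still a brief moment between the two `mv` calls when no `index` directory exists.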