Author: Natalie Pendragon <email@example.com>
Date: Fri, 29 May 2020 14:40:56 -0400
[serve] Make sure two closely-timed seed requests don't break
This will prevent seed requests' incremental crawls from stomping on
each other, but due to the way in which incremental crawls
resolve (i.e., by restarting the entire GUS serve process via
systemctl), it also means any seed requests that came in after the
first will not be handled until either A) another seed request comes
in that ends up dealing with it, or B) a manual crawl is kicked off.
The situation is no worse than before, however, so this is still a
short-term improvement.
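The guarding pattern can be sketched in isolation like this (a minimal,
hypothetical stand-in: `handle_seed_request`, the sleep, and the log list
are illustrative only, not GUS code; the real crawl ends by restarting the
serve process):

```python
import threading
import time

# Stand-in for the module-level lock added to gus/serve.py; it
# serializes the incremental crawls kicked off by seed requests.
crawl_thread_lock = threading.Lock()
crawl_log = []

def handle_seed_request(seed_url):
    # A second closely-timed request blocks here until the first
    # crawl finishes (in GUS, until the process is restarted).
    with crawl_thread_lock:
        crawl_log.append(("start", seed_url))
        time.sleep(0.05)  # simulate run_crawl(...)
        crawl_log.append(("end", seed_url))

threads = [threading.Thread(target=handle_seed_request, args=(url,))
           for url in ("gemini://a.example", "gemini://b.example")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock held across each crawl, the start/end entries of the
# two requests never interleave.
```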
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/gus/serve.py b/gus/serve.py
@@ -29,6 +29,7 @@ gemini_highlighter = highlight.Highlighter(
+crawl_thread_lock = threading.Lock()
statistics = load_last_statistics_from_file(filename)
@@ -380,8 +381,16 @@ def search(request):
- run_crawl(should_run_destructive=False, seed_urls=[seed_url])
- call(["sudo", "systemctl", "restart", "gus.service"])
+    # NB: this lock will never get released under normal conditions, as the
+    # expected conclusion of the incremental crawl thread is to issue a call
+    # to systemctl to restart the entire GUS serve process. That new process
+    # will reinitialize everything, including a fresh, unlocked Lock object.
+    # However, if the incremental crawl thread crashes for some reason, the
+    # `with` block will release the lock on the way out, so new seed requests
+    # can kick off their own incremental crawls.
+ with crawl_thread_lock:
+ run_crawl(should_run_destructive=False, seed_urls=[seed_url])
+ call(["sudo", "systemctl", "restart", "gus.service"])