geminispace.info

search provider for gemini space
git clone git://git.clttr.info/geminispace.info.git

Date | Commit message | Author | Files | + | -
2022-11-20 11:10 | new exclude, make crawl usw 3 threads | René Wagner | 3 | +7 | -3
2022-11-20 11:09 | news 2022-11-19 | René Wagner | 2 | +11 | -0
2022-10-26 17:37 | fix index rebuild on first day of month | René Wagner | 1 | +1 | -1
2022-10-26 17:37 | increase memory for indexer | René Wagner | 1 | +1 | -1
2022-10-26 17:36 | exclude webgate.geminet.org | René Wagner | 1 | +1 | -0
2022-08-25 07:30 | fix link in latest news | René Wagner | 1 | +1 | -1
2022-08-23 14:47 | properly implement deletion of capsules with outdated crawls | René Wagner | 2 | +28 | -18
2022-08-22 15:25 | add donation link | René Wagner | 2 | +10 | -0
2022-08-18 08:57 | news 2022-08-18 | René Wagner | 1 | +6 | -0
2022-08-16 17:38 | fix normalizing of URIs with default port | René Wagner | 2 | +7 | -9
2022-08-14 15:35 | fix test and add additional test for special robots.txt | René Wagner | 3 | +16 | -3
2022-08-06 13:26 | fix -d param for crawl | René Wagner | 1 | +1 | -1
2022-08-06 13:10 | update deps | René Wagner | 1 | +17 | -241
2022-08-06 13:04 | Merge pull request 'Upgrade feedparser to 6.0.10' (#51) from duncan-bayne/geminispace.info:master into master | René W | 2 | +258 | -29
2022-08-06 10:37 | Upgrade feedparser to 6.0.10 | Duncan Bayne | 2 | +258 | -29
2022-07-23 06:40 | newest hosts template | René Wagner | 1 | +1 | -1
2022-07-23 06:37 | show 50 newest hosts instead of 30 | René Wagner | 1 | +1 | -1
2022-07-23 06:36 | exclude auragem.space/twitch/ | René Wagner | 5 | +50 | -243
2022-06-10 17:54 | disable search suggestions due to bug | René Wagner | 3 | +28 | -27
2022-05-26 17:42 | move data deletion to indexing | René Wagner | 3 | +41 | -36
2022-05-16 17:01 | news 2021-05-16 | René Wagner | 5 | +8 | -67
2022-05-13 15:39 | make crawl multi-threaded | René Wagner | 2 | +42 | -26
2022-05-09 19:45 | disable mmap-ing of whoosh index | René Wagner | 1 | +1 | -1
2022-05-08 17:29 | switch SQLite to WAL mode | René Wagner | 5 | +21 | -9
2022-05-08 09:03 | some adjustments | René Wagner | 2 | +11 | -4
2022-05-08 08:20 | updated dependencies, excludes | René Wagner | 4 | +217 | -203
2022-03-25 20:08 | news 2022-03-25 | René Wagner | 1 | +3 | -0
2022-03-23 07:38 | tweak whoosh writer settings for speedup | René Wagner | 1 | +1 | -2
2022-03-20 08:18 | workaround breaking change in markupsafe 2.1.x | René Wagner | 4 | +43 | -43
2022-03-19 17:55 | news 2022-03-19 | René Wagner | 2 | +10 | -2
2022-03-19 17:54 | update deps | René Wagner | 2 | +162 | -156
2022-03-03 18:34 | fix some typos | René Wagner | 2 | +1 | -2
2022-02-18 17:37 | add info about redirect indexing | René Wagner | 3 | +10 | -9
2022-02-06 18:53 | add index for speedup | René Wagner | 3 | +17 | -5
2022-02-06 18:13 | some sql adjustments | René Wagner | 3 | +17 | -13
2022-02-05 15:17 | precompute feeds and pages | René Wagner | 2 | +27 | -33
2022-02-05 09:37 | precompute hosts statistics | René Wagner | 2 | +26 | -30
2022-02-05 09:17 | update deps | René Wagner | 1 | +134 | -130
2022-02-05 09:17 | exclude git.skyjake.fi | René Wagner | 1 | +3 | -0
2022-02-04 18:25 | exclude tlgs.one | René Wagner | 1 | +1 | -0
2022-02-04 08:41 | generic exception handling for page crawling | René Wagner | 1 | +26 | -20
2022-01-28 12:12 | new exclude: taz.de | René Wagner | 1 | +1 | -0
2022-01-25 20:11 | news 2022-01-25 | René Wagner | 1 | +4 | -0
2022-01-04 18:57 | news 2021-12-29 | René Wagner | 1 | +4 | -0
2021-12-29 09:59 | don't delete excluded pages from the pages table | René Wagner | 1 | +0 | -10
2021-12-29 09:57 | update poetry version | René Wagner | 1 | +170 | -148
2021-11-24 19:45 | show 30 latest hosts | René Wagner | 2 | +4 | -1
2021-11-20 16:13 | exclude antenna filters | René Wagner | 1 | +3 | -0
2021-11-19 15:06 | don't crash on URIs with non-number port | René Wagner | 1 | +1 | -1
2021-11-16 15:21 | update excludes | René Wagner | 1 | +5 | -2
2021-11-11 17:28 | update contact | René Wagner | 3 | +3 | -3
2021-11-09 17:41 | dependency update | René Wagner | 1 | +76 | -74
2021-11-07 16:27 | cleanup excludes | René Wagner | 1 | +1 | -2
2021-10-25 18:45 | save first_seen_at if a page is created through a link | René Wagner | 3 | +172 | -150
2021-10-14 18:22 | add link to source in geminispace | René Wagner | 1 | +2 | -1
2021-10-14 16:54 | more meta data for index cleanup | René Wagner | 6 | +24 | -40
2021-10-11 18:03 | avoid crash when normalized_url is not set | René Wagner | 1 | +16 | -11
2021-10-11 17:45 | use cronjob for automated start | René Wagner | 4 | +3 | -24
2021-09-16 17:53 | some cleanup | René Wagner | 5 | +10 | -35
2021-09-06 06:19 | fix broken link to source code | René Wagner | 1 | +4 | -3
2021-09-04 07:03 | do not add every single domain to the statistics file | René Wagner | 5 | +18 | -11
2021-08-18 15:23 | news 2021-08-18 | René Wagner | 2 | +4 | -1
2021-08-17 19:00 | some minor changes | René Wagner | 4 | +33 | -57
2021-08-10 16:43 | ensure that scheme is given when searching for backlinks | René Wagner | 1 | +2 | -0
2021-08-10 16:37 | update 2021-08-07 | René Wagner | 2 | +7 | -0
2021-08-06 14:50 | ensure that seed-requests use absolute URIs | René Wagner | 2 | +3 | -0
2021-08-06 14:41 | more excludes | René Wagner | 2 | +45 | -53
2021-07-23 11:11 | implemented deletion of outdated data | René Wagner | 1 | +17 | -1
2021-07-20 17:14 | small fixes and doc adjustments | René Wagner | 4 | +15 | -10
2021-07-17 17:40 | remove obsolete code | René Wagner | 4 | +1 | -123
2021-07-17 09:06 | support prioritized robots.txt user-agents | Hannu Hartikainen | 3 | +102 | -5
2021-07-17 10:35 | more excludes and less logging | René Wagner | 2 | +3 | -1
2021-07-14 19:01 | treat schemeless links as non-gemini links | René Wagner | 1 | +3 | -4
2021-07-14 18:56 | remove pikkulog separation | René Wagner | 2 | +0 | -20
2021-07-14 06:36 | minor code cleanup in db_model | René Wagner | 3 | +4 | -19
2021-07-14 06:32 | update to some templates | René Wagner | 5 | +5 | -16
2021-07-13 15:20 | remove Search model | René Wagner | 3 | +4 | -16
2021-07-13 11:21 | enable 'newest-hosts' and 'newest-pages' sites again | René Wagner | 7 | +50 | -13
2021-07-13 07:21 | remove raw data from excluded capsules | René Wagner | 2 | +10 | -1
2021-07-12 19:37 | index text files up to 5 MB | René Wagner | 4 | +25 | -21
2021-07-12 17:27 | commit search index only when indexing is complete | René Wagner | 4 | +26 | -124
2021-07-12 14:57 | store document id in whoosh index | René Wagner | 1 | +1 | -1
2021-07-12 12:58 | some tweaks to indexing | René Wagner | 3 | +6 | -6
2021-07-11 17:03 | restructure crawl data | René Wagner | 17 | +88 | -165
2021-07-11 07:05 | remove Crawl table, all info is stored in page table now | René Wagner | 5 | +84 | -157
2021-07-10 07:08 | don't persist robots.txt over multiple crawls | René Wagner | 2 | +3 | -24
2021-07-09 20:05 | improve indexing speed via optimized backlinks query | René Wagner | 3 | +4 | -17
2021-07-09 15:38 | again a new exclude | René Wagner | 1 | +1 | -0
2021-07-09 15:37 | move gusmobile to new home | René Wagner | 2 | +115 | -110
2021-07-04 19:49 | update 2021-07-04 & more excludes | René Wagner | 2 | +24 | -7
2021-06-28 07:31 | additional filter | René Wagner | 2 | +2 | -1
2021-06-26 11:16 | update 2021-06-26 | René Wagner | 2 | +8 | -1
2021-06-16 19:18 | exclude godocs.io | René Wagner | 2 | +6 | -0
2021-06-14 07:13 | error handling on page crawl save | René Wagner | 1 | +29 | -7
2021-06-04 09:40 | update 2021-06-04 | René Wagner | 1 | +9 | -0
2021-05-29 08:56 | more exception handling on link update | René Wagner | 1 | +4 | -2
2021-05-27 13:24 | fix wrong embedding of excludes | René Wagner | 2 | +8 | -4
2021-05-26 11:06 | unify capitalisation of charset in statistics | René Wagner | 1 | +2 | -2
2021-05-25 20:05 | move exclude definition to own file | René Wagner | 5 | +253 | -251
2021-05-25 19:13 | news 2021-05-25 | René Wagner | 2 | +5 | -0
2021-05-21 19:58 | some exception handling and updated service files | René Wagner | 4 | +11 | -8
2021-05-16 07:59 | fix last wrong exception in crawl | René Wagner | 2 | +1 | -2
2021-05-14 18:59 | fix wrong exception handling in crawl | René Wagner | 2 | +107 | -116
2021-05-12 15:46 | update 2021-05-12 | René Wagner | 2 | +7 | -2
2021-05-10 15:41 | rewrite statistics gathering to pure sql | René Wagner | 3 | +28 | -27
2021-05-08 19:51 | exception handling on page save | René Wagner | 4 | +279 | -223
2021-04-14 19:33 | news 2021-04-14 | René Wagner | 1 | +4 | -0
2021-04-05 06:07 | delete tmp files of whoosh | René Wagner | 1 | +1 | -0
2021-03-25 20:33 | use .fromisoformat for getting timestamp from db | René Wagner | 1 | +1 | -1
2021-03-25 20:10 | various corrections | René Wagner | 3 | +6 | -3
2021-03-20 19:58 | hack: index update in separate dir | René Wagner | 5 | +13 | -13
2021-03-08 18:21 | skip a capsule after 5 consecutive failed requests | René Wagner | 2 | +28 | -11
2021-03-08 17:59 | workaround for "index update blocks searches" | René Wagner | 2 | +8 | -1
2021-03-08 17:59 | news update 2021-03-08 | René Wagner | 1 | +8 | -1
2021-03-08 17:51 | Merge branch 'master' of git://natpen.net/gus | René Wagner | 1 | +2 | -0
2021-03-05 18:02 | update poetry deps | René Wagner | 1 | +113 | -120
2021-02-26 17:52 | gsi specific updates 2021-02-26 | René Wagner | 2 | +6 | -1
2021-02-22 18:06 | robots.txt sections "*" and "indexer" are honored | René Wagner | 2 | +4 | -13
2021-02-12 07:05 | correctly handle robots.txt | René Wagner | 2 | +26 | -8
2021-02-12 07:53 | add verbose search to robots.txt | René Wagner | 1 | +1 | -0
2021-02-12 07:53 | add verbose search to robots.txt | René Wagner | 1 | +1 | -0
2021-02-12 07:05 | correctly handle robots.txt | René Wagner | 2 | +26 | -8
2021-02-10 18:05 | Merge branch 'master' of git://natpen.net/gus | René Wagner | 1 | +3 | -3
2021-02-10 10:06 | limit max_crawl_depth to 100 for normal crawl | René Wagner | 1 | +1 | -1
2021-02-10 06:07 | increase frequency to avoid rescanning within a single crawl | René Wagner | 1 | +3 | -3
2021-02-08 16:43 | add some forbidden URIs & set max_crawl_depth | René Wagner | 1 | +38 | -22
2021-02-07 18:11 | remove seed-requests from repo | René Wagner | 2 | +5 | -97
2021-02-07 16:48 | Merge branch 'master' of git://natpen.net/gus | René Wagner | 2 | +10 | -1
2021-02-07 16:23 | Add a few more url parsing test cases | Natalie Pendragon | 1 | +3 | -0
2021-02-07 16:20 | Update to Python 3.9 compatibility | Natalie Pendragon | 1 | +7 | -1
2021-02-04 20:06 | update python deps | René Wagner | 1 | +293 | -278
2021-02-04 20:05 | introduce systemd-unit for indexer | René Wagner | 4 | +19 | -12
2021-02-02 17:38 | update python deps | René Wagner | 1 | +293 | -278
2021-02-02 16:39 | updates geminispace.info 2021-02-02 | René Wagner | 4 | +11 | -3
2021-01-31 20:08 | introduce systemd-unit for indexer | René Wagner | 3 | +17 | -4
2021-01-31 14:04 | gsi specific updates | René Wagner | 2 | +5 | -2
2021-01-30 15:15 | Make README heading lines more consistent | Natalie Pendragon | 1 | +5 | -5
2021-01-30 15:05 | Fix trailing whitespace and reformat long string | Natalie Pendragon | 1 | +10 | -2
2021-01-30 15:15 | Make README heading lines more consistent | Natalie Pendragon | 1 | +5 | -5
2021-01-29 09:08 | add systemd-units for automatic crawling | René Wagner | 3 | +46 | -10
2021-01-30 15:05 | Fix trailing whitespace and reformat long string | Natalie Pendragon | 1 | +10 | -2
2021-01-28 10:33 | add "/robots.txt" route to views.py | René Wagner | 1 | +4 | -0
2021-01-29 13:43 | gsi specific updates 2021-01-29 | René Wagner | 4 | +11 | -10
2021-01-28 19:59 | add systemd-units for automatic crawling | René Wagner | 3 | +46 | -10
2021-01-27 12:35 | add "/robots.txt" route to views.py | René Wagner | 1 | +4 | -0
2021-01-27 09:23 | modify views to match geminispace.info | René Wagner | 9 | +29 | -93
2021-01-21 20:08 | add seeds & update ignored urls | Gogs | 2 | +119 | -2
2020-12-26 17:30Defer search requests to threadsugla1+38-30
2020-12-22 11:46 | Health test script and systemd service | Remco | 2 | +49 | -0
2020-12-22 15:00 | [serve] Fix copy-paste error in status endpoint function name | Natalie Pendragon | 1 | +1 | -1
2020-12-21 17:04 | [serve] Add status endpoint | Natalie Pendragon | 1 | +5 | -0
2020-12-08 15:10 | [serve] Improve formatting of statistics page | Natalie Pendragon | 1 | +4 | -4
2020-12-06 16:29 | [build_index] Import should_skip | Natalie Pendragon | 1 | +1 | -1
2020-12-06 16:28 | Refactor change frequency constants | Natalie Pendragon | 2 | +39 | -24
2020-12-05 14:04 | [crawl] Abort robots.txt parsing attempt if not text/plain | Natalie Pendragon | 1 | +1 | -1
2020-11-26 19:56 | [serve] Update contributions list on about page | Natalie Pendragon | 1 | +4 | -2
2020-11-26 19:47 | Bind to both IPv4 and IPv6 | Natalie Pendragon | 1 | +1 | -1
2020-11-23 02:50 | [crawl] Ignore another radio stream | Natalie Pendragon | 1 | +2 | -1
2020-11-20 22:37 | Speed up get_newest_hosts | Remco | 1 | +9 | -10
2020-11-17 14:09 | Add some more tests of GeminiResource | Natalie Pendragon | 1 | +30 | -0
2020-11-17 13:32 | Add regex-based url exclusion support & refactor tests | Natalie Pendragon | 5 | +83 | -42
2020-11-16 13:50 | Add TODO to README | Natalie Pendragon | 1 | +1 | -0
2020-11-16 13:44 | Take exclusions into account when generating statistics | Natalie Pendragon | 1 | +10 | -5
2020-11-16 13:01 | [serve] Fix formatting of dates on statistics page | Natalie Pendragon | 1 | +3 | -3
2020-11-16 12:50 | Add two new TODOs to README | Natalie Pendragon | 1 | +2 | -0
2020-11-16 12:49 | [build_index] Only index text pages <= 1KB in size | Natalie Pendragon | 3 | +5 | -2
2020-11-16 12:49 | More exclusions | Natalie Pendragon | 1 | +6 | -0
2020-11-16 12:47 | [serve] Fix index closing when program is killed | Natalie Pendragon | 1 | +1 | -1
2020-11-15 15:56 | [crawl] Increase increment to temp error change frequency | Natalie Pendragon | 1 | +1 | -1
2020-11-15 14:19 | [serve] Update indexing documentation | Natalie Pendragon | 1 | +8 | -0
2020-11-15 13:41 | [serve] Update about page | Natalie Pendragon | 1 | +5 | -3
2020-11-15 13:30 | Bump rolling writer's batch size back up to 5000 | Natalie Pendragon | 1 | +1 | -1
2020-11-15 13:30 | More exclusions | Natalie Pendragon | 1 | +8 | -0
2020-11-14 16:06 | Add systemd config | Natalie Pendragon | 1 | +22 | -0
2020-11-13 13:24 | Move all whoosh related stuff into separate module | Remco | 5 | +165 | -169
2020-11-12 20:03 | A friend for the other duck | Remco | 1 | +4 | -0
2020-11-11 12:27 | Bump dependencies | Natalie Pendragon | 1 | +163 | -129
2020-11-11 12:18 | [build_index] Fix logging statement | Natalie Pendragon | 1 | +1 | -1
2020-11-11 12:17 | [serve] Add statistics_overall_historical template | Natalie Pendragon | 1 | +14 | -0
2020-11-06 13:56 | Add .git-blame-ignore-revs file | Natalie Pendragon | 1 | +2 | -0
2020-11-06 13:44 | [crawl] Make logging message slightly clearer | Natalie Pendragon | 1 | +1 | -1
2020-11-06 13:44 | Check for null input in new strip_control_chars function | Natalie Pendragon | 1 | +2 | -0
2020-11-06 13:43 | Update default logging config to log to both console and file | Natalie Pendragon | 1 | +11 | -5
2020-11-06 13:42 | Reformat code with Black | Natalie Pendragon | 14 | +685 | -404
2020-11-06 12:22 | [crawl] Strip control chars from URLs in crawl logging | Natalie Pendragon | 1 | +46 | -29
2020-11-03 13:38 | Add exclusion improvement TODO to README | Natalie Pendragon | 1 | +1 | -0
2020-11-01 14:39 | Ignore link like lines in preformatted text blocks | Remco van 't Veer | 3 | +46 | -2
2020-11-02 13:39 | Add contributors section to about page | Natalie Pendragon | 1 | +20 | -0
2020-11-02 13:38 | Fix the index build | Natalie Pendragon | 3 | +34 | -18
2020-11-01 16:05 | Clean up todo list in README | Natalie Pendragon | 1 | +7 | -22
2020-10-31 14:06 | [build_index] Flush index segments to disk periodically | Natalie Pendragon | 1 | +15 | -3
2020-10-31 15:53 | Logging | Remco van 't Veer | 5 | +144 | -86
2020-10-31 15:53 | Drop unused imports | Remco van 't Veer | 3 | +12 | -52
2020-10-31 11:23 | Update gusmobile clone location in pyproject.toml | Natalie Pendragon | 1 | +1 | -1
2020-10-27 19:26 | Include notes on updating the index | Remco van 't Veer | 1 | +3 | -1
2020-10-27 16:02 | Describe procedure to get gus up and running | Remco van 't Veer | 1 | +30 | -0
2020-10-27 16:02 | Fix missing database column indexed_at on Page | Remco van 't Veer | 1 | +1 | -0
2020-10-28 10:55 | [crawl] Add a few new exclusions | Natalie Pendragon | 1 | +17 | -0
2020-10-28 10:50 | [build_index] Perform prefix-based URL exclusion during index build | Natalie Pendragon | 1 | +8 | -0
2020-09-16 12:56 | [serve] Add "jump to page" functionality to search | Natalie Pendragon | 2 | +18 | -0
2020-09-16 12:43 | [serve] Upgrade to Jetforce v0.6.0 | Natalie Pendragon | 3 | +628 | -196
2020-09-16 11:02 | [serve] Add more quotes | Natalie Pendragon | 1 | +17 | -0
2020-09-06 10:21 | [serve] Update documentation and links a bit | Natalie Pendragon | 4 | +15 | -8
2020-09-04 12:21 | [serve] Add dynamic quotes to footer | Natalie Pendragon | 3 | +66 | -17
2020-09-04 11:50 | [serve] Add newest pages endpoint, revamp documentation and index | Natalie Pendragon | 11 | +181 | -68
2020-09-03 12:00 | [serve] Add newest hosts route | Natalie Pendragon | 4 | +36 | -0
2020-08-25 08:37 | [serve] Remove extra quotation mark in add seeds template | Natalie Pendragon | 1 | +1 | -1
2020-08-11 12:30 | [crawl] Print change_frequency | Natalie Pendragon | 1 | +2 | -2
2020-08-11 12:18 | Fix bug in GeminiResource url construction | Natalie Pendragon | 1 | +3 | -3
2020-08-09 13:18 | [threads] Only work with textual pages | Natalie Pendragon | 1 | +3 | -0
2020-08-05 18:33 | [serve] Add favicon.txt route | Natalie Pendragon | 1 | +5 | -0
2020-08-05 13:03 | [serve] Add IP addresses to about page | Natalie Pendragon | 1 | +6 | -3
2020-08-05 13:03 | [threads] Add different sort orders for threads | Natalie Pendragon | 3 | +44 | -4
2020-08-03 16:55 | [serve] Improve feed matching | Natalie Pendragon | 1 | +4 | -0
2020-08-02 13:51 | Update naming | Natalie Pendragon | 6 | +5 | -13
2020-08-02 13:46 | [crawl] Improve handling of change_frequency | Natalie Pendragon | 3 | +87 | -23
2020-08-02 09:45 | [serve] Add Known Feeds page | Natalie Pendragon | 6 | +37 | -2
2020-08-02 09:42 | [threads] Add collapsible log variations | Natalie Pendragon | 5 | +55 | -11
2020-07-28 12:56 | [threads] Fix thread ordering | Natalie Pendragon | 1 | +4 | -4
2020-07-28 11:04 | [crawl] Index more errors | Natalie Pendragon | 1 | +8 | -2
2020-07-28 11:04 | [crawl] Add change_frequency backoff | Natalie Pendragon | 1 | +13 | -4
2020-07-28 11:03 | Bump dependencies | Natalie Pendragon | 1 | +4 | -4
2020-07-28 11:02 | Add friendly authors and titles for threads | Natalie Pendragon | 5 | +100 | -11
2020-07-27 18:50 | Threads v1 | Natalie Pendragon | 8 | +271 | -19
2020-07-24 10:43 | [serve] Save searches to db | Natalie Pendragon | 2 | +12 | -3
2020-07-23 18:40 | [build_index] [serve] Distinguish cross-capsule backlinks | Natalie Pendragon | 8 | +69 | -18
2020-07-23 13:44 | [crawl] Add is_cross_host_like field to db | Natalie Pendragon | 4 | +47 | -2
2020-07-23 12:35 | Gitignore all the indexes | Natalie Pendragon | 1 | +1 | -2
2020-07-23 12:29 | Bump dependencies | Natalie Pendragon | 1 | +47 | -46
2020-07-23 10:54 | Create scripts directory | Natalie Pendragon | 6 | +175 | -2
2020-07-22 17:29 | Add normalized url to db | Natalie Pendragon | 5 | +44 | -38
2020-07-21 19:43 | [serve] Add cert change to news page | Natalie Pendragon | 1 | +3 | -0
2020-07-21 18:49 | [build_index] Account for per-page expiration | Natalie Pendragon | 2 | +31 | -11
2020-07-20 12:19 | [build_index] Build index with backlink_count instead of backlinks | Natalie Pendragon | 3 | +22 | -17
2020-07-20 11:56 | [crawl] Start indexing errors | Natalie Pendragon | 4 | +60 | -4
2020-07-19 13:23 | [crawl] Update db model, and delete links before recreating | Natalie Pendragon | 2 | +4 | -3
2020-07-19 12:18 | [crawl] Ensure manual exclusions stay out of the database | Natalie Pendragon | 1 | +7 | -0
2020-07-19 11:35 | [serve] minor formatting updates | Natalie Pendragon | 2 | +2 | -2
2020-07-19 11:32 | [crawl] Support per-page expiration | Natalie Pendragon | 4 | +130 | -109
2020-07-15 13:09 | [crawl] Rebuild link table completely and idempotently | Natalie Pendragon | 2 | +12 | -2
2020-07-15 12:20 | [serve] Get backlinks from db instead of index | Natalie Pendragon | 1 | +12 | -11
2020-07-13 23:55 | [crawl] Set cap on maxiumum redirect chain length | Natalie Pendragon | 2 | +12 | -2
2020-07-13 23:18 | [crawl] Abort when detecting self-redirects | Natalie Pendragon | 1 | +5 | -1
2020-07-13 23:17 | [crawl] Ignore 80h gopher proxy | Natalie Pendragon | 1 | +3 | -0
2020-07-12 13:27 | [serve] Improve pager linking back to previous page | Natalie Pendragon | 1 | +3 | -1
2020-07-11 12:33 | [serve] Update backlinks links and presentation throughout GUS | Natalie Pendragon | 5 | +10 | -5
2020-07-11 10:56 | [serve] Improve safety of backlinks code path | Natalie Pendragon | 1 | +2 | -0
2020-07-08 10:18 | [crawl] Add feature to seed incremental crawl with atom feeds | Natalie Pendragon | 4 | +152 | -17
2020-07-06 10:22 | Make incremental build_index work | Natalie Pendragon | 2 | +23 | -11
2020-07-06 10:20 | DRY up the sqlite model and init_db code | Natalie Pendragon | 5 | +54 | -91
2020-07-05 12:52 | [serve] Improve handling of backlink searches | Natalie Pendragon | 1 | +11 | -2
2020-07-05 12:02 | [serve] Add historical statistics page | Natalie Pendragon | 5 | +36 | -13
2020-07-05 11:01 | [crawl] [serve] Run statistics and domains from sqlite db | Natalie Pendragon | 4 | +53 | -54
2020-07-04 10:43 | Improve discovery of backlinks | Natalie Pendragon | 2 | +13 | -6
2020-07-03 15:45 | [serve] Fix minor bug in counting of backlinks | Natalie Pendragon | 1 | +2 | -2
2020-07-03 14:39 | [crawl] [serve] Switch crawl to 2-phase with sqlite | Natalie Pendragon | 7 | +382 | -185
2020-06-30 12:57 | [crawl] Ignore localhost | Natalie Pendragon | 1 | +1 | -0
2020-06-30 12:54 | [serve] Add backlinks news and documentation | Natalie Pendragon | 2 | +11 | -0
2020-06-30 12:28 | [serve] Improve verbose mode | Natalie Pendragon | 3 | +20 | -13
2020-06-30 12:24 | [serve] Update header levels | Natalie Pendragon | 6 | +26 | -23
2020-06-30 11:07 | [crawl] [serve] Add backlinks | Natalie Pendragon | 6 | +94 | -11
2020-06-22 20:57 | [crawl] Ignore more bad content | Natalie Pendragon | 1 | +10 | -0
2020-06-18 11:16 | Update README | Natalie Pendragon | 1 | +3 | -20
2020-06-18 10:58 | [serve] Rearchitect serve to use templates and MVC pattern | Natalie Pendragon | 20 | +543 | -496
2020-06-17 13:09 | Add GUS licence | Natalie Pendragon | 1 | +33 | -0
2020-06-17 11:36 | [serve] Make seed request handling async again for now | Natalie Pendragon | 1 | +6 | -5
2020-06-17 11:33 | [crawl] Ignore some more alexschroeder pages | Natalie Pendragon | 1 | +27 | -1
2020-06-12 13:38 | [serve] Sort domains on the known-hosts page | Natalie Pendragon | 2 | +3 | -2
2020-06-12 10:40 | [serve] Add size to result rendering | Natalie Pendragon | 2 | +91 | -11
2020-06-11 10:38 | [crawl] Start indexing response sizes | Natalie Pendragon | 2 | +12 | -2
2020-06-10 12:09 | [serve] Use preformatted blocks on the statistics page | Natalie Pendragon | 1 | +8 | -2
2020-06-09 11:01 | Bump dependencies | Natalie Pendragon | 1 | +29 | -29
2020-06-09 10:55 | [crawl] Start indexing lang parameter | Natalie Pendragon | 2 | +16 | -10
2020-06-08 11:29 | [serve] Update some copy on about page | Natalie Pendragon | 1 | +1 | -1
2020-06-08 11:28 | Revert "[crawl] Index raw content for regex searches" | Natalie Pendragon | 1 | +0 | -2
2020-06-07 12:32 | [crawl] Ignore some more things | Natalie Pendragon | 1 | +21 | -0
2020-06-07 11:05 | [crawl] Add marmaladefoo's calculator to manual exclusions | Natalie Pendragon | 1 | +3 | -0
2020-06-05 11:35 | Add easy CLI way of removing domains from index | Natalie Pendragon | 2 | +41 | -0
2020-06-05 10:46 | [crawl] Remove manual exclusions for alexschroeder.ch | Natalie Pendragon | 1 | +0 | -10
2020-06-05 10:41 | [crawl] Add custom crawl delays | Natalie Pendragon | 1 | +7 | -2
2020-06-04 15:27 | [crawl] Improve indexing performance | Natalie Pendragon | 1 | +31 | -45
2020-06-03 23:37 | Update some seeds | Natalie Pendragon | 1 | +2 | -1
2020-06-03 20:28 | [crawl] Start indexing the charset | Natalie Pendragon | 4 | +56 | -8
2020-06-03 16:50 | [crawl] Only attempt to extract contained resources from text/gemini | Natalie Pendragon | 1 | +8 | -4
2020-06-03 16:50 | [crawl] Ignore some troublesome content from alexschroeder.ch | Natalie Pendragon | 1 | +10 | -0
2020-06-03 16:50 | [crawl] Fix default crawl delay when not specified explicitly | Natalie Pendragon | 1 | +4 | -4
2020-06-03 14:58 | [crawl] Persist index & crawl statistics on non-destructive crawls | Natalie Pendragon | 2 | +14 | -12
2020-06-03 14:53 | Bump dependency versions | Natalie Pendragon | 1 | +11 | -11
2020-06-03 14:49 | [crawl] Index raw content for regex searches | Natalie Pendragon | 1 | +3 | -1
2020-06-03 14:47 | [serve] Use "OR" as the default connector for queries | Natalie Pendragon | 1 | +13 | -3
2020-05-29 18:40 | [serve] Make sure two closely-timed seed requests don't break | Natalie Pendragon | 1 | +11 | -2
2020-05-28 13:02 | [crawl] Improve hierarchical handling of robots.txt entries | Natalie Pendragon | 1 | +12 | -4
2020-05-26 13:48 | [serve] Update copy on known hosts page | Natalie Pendragon | 1 | +1 | -1
2020-05-26 10:57 | [crawl] Ignore some Geddit URL prefixes | Natalie Pendragon | 1 | +4 | -0
2020-05-26 01:44 | [crawl] [serve] Add fetchable URL to the index | Natalie Pendragon | 2 | +7 | -2
2020-05-25 17:19 | Bump version of Jetforce dependency | Natalie Pendragon | 1 | +3 | -3
2020-05-25 10:31 | [crawl] Improve handling of quoting and unquoting URLs | Natalie Pendragon | 1 | +9 | -3
2020-05-25 03:05 | Rename fully_qualified_url to fetchable_url | Natalie Pendragon | 2 | +19 | -19
2020-05-25 03:00 | Rename fully_qualified_massaged_url to indexable_url | Natalie Pendragon | 2 | +11 | -11
2020-05-25 02:54 | [crawl] Fix bug in fully_qualified_massaged_url | Natalie Pendragon | 2 | +2 | -2
2020-05-24 14:08 | [crawl] Stop storing responses in GeminiResource objects | Natalie Pendragon | 2 | +33 | -37
2020-05-24 14:10 | Bump version of gusmobile dependency | Natalie Pendragon | 1 | +1 | -1
2020-05-24 11:28 | [crawl] Handle url fragments | Natalie Pendragon | 1 | +8 | -1
2020-05-23 13:11 | [crawl] Fix handling of robots.txt | Natalie Pendragon | 2 | +63 | -50
2020-05-23 11:19 | [crawl] Exclude "rss.xml" paths | Natalie Pendragon | 1 | +1 | -0
2020-05-22 13:18 | [crawl] Optimize the index after crawls | Natalie Pendragon | 1 | +3 | -0
2020-05-22 12:42 | [serve] Update highlight scoring and rendering | Natalie Pendragon | 2 | +9 | -4
2020-05-22 11:31 | [crawl] pickle and unpickle the robot_file_map | Natalie Pendragon | 2 | +14 | -3
2020-05-22 11:20 | Improve handling of unquoting URLs | Natalie Pendragon | 1 | +1 | -4
2020-05-21 20:07 | [serve] Update documentation on filters | Natalie Pendragon | 1 | +17 | -6
2020-05-21 19:35 | Update locked version of Gusmobile | Natalie Pendragon | 1 | +4 | -4
2020-05-21 14:59 | [crawl] Add domain field to index | Natalie Pendragon | 1 | +6 | -0
2020-05-21 13:25 | Remove outdated TODO | Natalie Pendragon | 1 | +0 | -3
2020-05-21 13:18 | [serve] Update formatting of statistics page | Natalie Pendragon | 1 | +2 | -3
2020-05-21 12:39 | [serve] Fix bug with first/next/previous page link formatting | Natalie Pendragon | 1 | +4 | -3
2020-05-21 11:57 | [serve] Only highlight nice content types in search results | Natalie Pendragon | 1 | +1 | -1
2020-05-21 11:33 | [crawl] Make path exclusions more robust | Natalie Pendragon | 1 | +4 | -4
2020-05-21 10:53 | [serve] Remove broken URL count from stats page | Natalie Pendragon | 1 | +0 | -1
2020-05-21 10:45 | Add houston to seeds, but ignore its search results | Natalie Pendragon | 1 | +5 | -0
2020-05-21 10:45 | [crawl] [serve] Add search highlights | Natalie Pendragon | 3 | +106 | -7
2020-05-20 13:33 | [crawl] Index massaged URLs | Natalie Pendragon | 2 | +27 | -13
2020-05-20 13:32 | [crawl] Handle trailing slash redirects better | Natalie Pendragon | 2 | +6 | -1
2020-05-20 12:15 | [serve] Update the loading of statistics | Natalie Pendragon | 1 | +21 | -6
2020-05-19 21:08 | [crawl] Fix lots of bugs | Natalie Pendragon | 4 | +124 | -103
2020-05-19 10:47 | [crawl] Crawl the seed requests after the main crawl | Natalie Pendragon | 1 | +15 | -0
2020-05-19 10:36 | [crawl] Fix bug in relative URL parsing | Natalie Pendragon | 1 | +2 | -2
2020-05-18 19:52 | [crawl] Fix bug with computing full_qualified_urls | Natalie Pendragon | 3 | +29 | -10
2020-05-18 13:12 | [crawl] Use standardized print_index_statistics | Natalie Pendragon | 2 | +13 | -19
2020-05-18 13:01 | [no-op] Clean up comments in whoosh_extensions | Natalie Pendragon | 1 | +0 | -3
2020-05-18 12:57 | [serve] Crawl and index seed requests immediately | Natalie Pendragon | 4 | +65 | -18
2020-05-17 14:30 | Update README TODOs | Natalie Pendragon | 1 | +6 | -8
2020-05-17 14:20 | [crawl] Implement GeminiResource | Natalie Pendragon | 5 | +167 | -86
2020-05-17 11:45 | [crawl] Exclude GUS search result pages from crawl | Natalie Pendragon | 1 | +2 | -0
2020-05-17 10:21 | [crawl] Add seeds | Natalie Pendragon | 1 | +3 | -0
2020-05-16 18:51 | [crawl] Add jan.bio to seeds | Natalie Pendragon | 1 | +1 | -0
2020-05-16 15:23 | Add index.bak to gitignore | Natalie Pendragon | 1 | +1 | -0
2020-05-16 14:57 | [crawl] Create non-destructive crawl option | Natalie Pendragon | 3 | +33 | -7
2020-05-16 13:23 | [serve] Improve documentation on content type queries | Natalie Pendragon | 1 | +5 | -13
2020-05-16 13:05 | [serve] Add verbose mode | Natalie Pendragon | 2 | +51 | -13
2020-05-16 12:22 | [serve] Update how num_results is displayed | Natalie Pendragon | 1 | +4 | -4
2020-05-16 12:12 | [serve] Improve search result data type | Natalie Pendragon | 1 | +13 | -5
2020-05-16 12:00 | [crawl] [serve] Add more statistics | Natalie Pendragon | 4 | +55 | -20
2020-05-16 10:57 | [crawl] Update seeds | Natalie Pendragon | 1 | +4 | -0
2020-05-15 12:03 | [crawl] Update seeds | Natalie Pendragon | 1 | +5 | -1
2020-05-15 12:01 | Update and reorder TODOs | Natalie Pendragon | 1 | +14 | -9
2020-05-15 10:27 | [crawl] [no-op] Add a line after backup operation | Natalie Pendragon | 1 | +1 | -0
2020-05-14 19:40 | Update statistics TODOs | Natalie Pendragon | 1 | +5 | -1
2020-05-14 13:17 | [crawl] Add new seed | Natalie Pendragon | 1 | +1 | -0
2020-05-14 12:49 | [serve] Update statistics copy slightly | Natalie Pendragon | 1 | +2 | -2
2020-05-14 11:56 | [serve] Implement paging | Natalie Pendragon | 2 | +29 | -16
2020-05-14 10:59 | Update README ideas for more index/usage statistics | Natalie Pendragon | 1 | +7 | -4
2020-05-13 14:20 | [crawl] Add new spanish site to crawl seeds | Natalie Pendragon | 1 | +4 | -0
2020-05-13 13:51 | [crawl] Refactor manual exclusions and add fgaz' calculator | Natalie Pendragon | 1 | +13 | -4
2020-05-12 12:52 | Add TODO for generating and sharing GUS usage statistics | Natalie Pendragon | 1 | +5 | -0
2020-05-12 12:46 | [serve] Add news feature | Natalie Pendragon | 1 | +43 | -1
2020-05-12 12:18 | [serve] Add page to show all known hosts | Natalie Pendragon | 1 | +22 | -0
2020-05-12 11:56 | [statistics] Add ability to compute and print stats easily | Natalie Pendragon | 2 | +29 | -5
2020-05-12 11:23 | [statistics] Refactor statistics objects to pass around dicts | Natalie Pendragon | 2 | +25 | -25
2020-05-12 11:07 | [serve] Add page headers | Natalie Pendragon | 1 | +8 | -1
2020-05-11 18:51 | [serve] Update copy for current index statistics | Natalie Pendragon | 1 | +2 | -2
2020-05-11 18:45 | [serve] Stop hard-wrapping content | Natalie Pendragon | 1 | +7 | -18
2020-05-11 17:56 | [serve] Report out current index statistics | Natalie Pendragon | 4 | +56 | -5
2020-05-11 17:16 | Refactor some common/library code into separate files | Natalie Pendragon | 7 | +86 | -70
2020-05-10 16:12 | [serve] Remove TODO to add documentation for content_type | Natalie Pendragon | 1 | +0 | -1
2020-05-10 15:50 | [crawl] Alphabetize and add a few more seeds | Natalie Pendragon | 1 | +25 | -16
2020-05-10 14:39 | [crawl] Backup old index before running crawl | Natalie Pendragon | 1 | +9 | -0
2020-05-10 14:38 | [crawl] Add indexed_at field | Natalie Pendragon | 2 | +7 | -2
2020-05-09 21:34 | [crawl] Compute and generate index statistics after each crawl | Natalie Pendragon | 1 | +57 | -1
2020-05-09 21:23 | [serve] Update content_type search documentation | Natalie Pendragon | 1 | +5 | -1
2020-05-09 20:05 | Add TODO to track Geminispace statistics | Natalie Pendragon | 1 | +5 | -0
2020-05-09 18:07 | [serve] Add documentation for content_types | Natalie Pendragon | 1 | +24 | -4
2020-05-09 17:35 | [serve] Add note that paging isn't implemented yet | Natalie Pendragon | 1 | +1 | -1
2020-05-09 17:35 | [serve] Put index generation date in footer | Natalie Pendragon | 2 | +11 | -2
2020-05-09 16:38 | Add a couple TODOs | Natalie Pendragon | 1 | +2 | -0
2020-05-09 15:54 | [crawl] Add two new seeds | Natalie Pendragon | 1 | +2 | -0
2020-05-09 15:06 | [crawl] Stop printing the sleep duration | Natalie Pendragon | 1 | +0 | -1
2020-05-09 15:00 | [crawl] Improve error recovery | Natalie Pendragon | 1 | +35 | -24
2020-05-09 14:58 | [crawl] Adjust link line regex to only match at beginning of line | Natalie Pendragon | 2 | +6 | -2
2020-05-05 12:27 | [crawl] Respect robots.txt crawl_delays and add a kind default | Natalie Pendragon | 2 | +29 | -8
2020-04-17 13:24 | Add some TODOs | Natalie Pendragon | 1 | +4 | -0
2020-04-16 22:40 | [serve] Fix bug in displaying "input" results | Natalie Pendragon | 1 | +2 | -2
2020-04-16 22:39 | Update dependencies | Natalie Pendragon | 1 | +49 | -49
2020-04-16 22:19 | [crawl] fix crawl bug with robots.txt | Natalie Pendragon | 1 | +2 | -2
2020-04-16 22:18 | [serve] Update formatting | Natalie Pendragon | 1 | +3 | -7
2020-03-15 02:50 | Improve it all | Natalie Pendragon | 4 | +106 | -81
2020-03-05 13:55 | [serve] Add seed request tracking | Natalie Pendragon | 2 | +21 | -0
2020-03-05 12:50 | [serve] Update aesthetics | Natalie Pendragon | 1 | +12 | -12
2020-03-04 13:08 | Add search suggestions | Natalie Pendragon | 1 | +36 | -5
2020-03-04 13:08 | Update indexing and query parsing | Natalie Pendragon | 3 | +36 | -6
2020-03-04 13:06 | Add TODO to track freshness of content | Natalie Pendragon | 1 | +1 | -0
2020-03-02 11:43 | [crawl] Respect "indexer" robots.txt entries | Natalie Pendragon | 1 | +1 | -1
2020-03-01 17:12 | Add more feature ideas to the README | Natalie Pendragon | 1 | +9 | -0
2020-03-01 17:12 | Index and serve mime types | Natalie Pendragon | 2 | +4 | -2
2020-02-29 13:33 | Improve README readability | Natalie Pendragon | 1 | +6 | -6
2020-02-29 13:31 | Add README todo to add paging | Natalie Pendragon | 1 | +7 | -0
2020-02-29 13:27 | [serve] Remove numbers from search result rows | Natalie Pendragon | 1 | +1 | -1
2020-02-29 13:13 | Update README.md | Natalie Pendragon | 1 | +27 | -1
2020-02-27 14:06 | Update README | Natalie Pendragon | 2 | +24 | -19
2020-02-27 13:45 | Make GUS easier to run for others | Natalie Pendragon | 3 | +60 | -43
2020-02-23 14:30 | Add some new seed sites | Natalie Pendragon | 1 | +3 | -1
2020-02-21 13:44 | Respect robots.txt | Natalie Pendragon | 2 | +41 | -10
2020-01-30 13:47 | Initial commit | Natalie Pendragon | 6 | +1024 | -0