path: root/internal/postgres/insert_module_test.go
author: Julie Qiu <julie@golang.org> 2020-04-27 17:47:17 -0400
committer: Julie Qiu <julieqiu@google.com> 2020-04-30 14:41:16 +0000
commit: 4837cb2a2c968d9af1aea5c5bd63eb0a9ef3d50a (patch)
tree: fd002cf0adf7f4d348581d998715383aa1012e52 /internal/postgres/insert_module_test.go
parent: 3c5ea2343246e593f11750c8bc9c0c6d1ddcc8a0 (diff)
internal/postgres: change reprocessing logic

At the moment, we reprocess and requeue modules using the following
logic:

1. Set the status of all modules to be reprocessed to 505.
2. Requeue modules with status=0 or status >= 500. Prioritize the
   following:
   - IsLatest: sorted by release vs prerelease modules
   - IsBig: hardcoded list of modules we know are big

This poses the following problems:

1. Requeue order is not idempotent: priority is given to categories of
   modules, but within each category, the order in which modules are
   queued can change each time requeue is called. This leads to many
   modules sitting in the task queue, and makes it hard to tell from
   the logs how much progress we have made.

2. Modules missing from the isBig list: several large modules are
   missing from the isBig list, so they are not being accounted for.
   We deprioritize large modules because they take a really long time
   to process and can time out if too many are being processed at
   once, so we want to process them at a slower rate than other
   modules.

3. Alternative modules have the same priority as non-alternative
   modules: we usually don't care about alternative modules, and they
   will be deleted from search_documents once identified. These should
   be processed after the latest versions of non-alternative modules
   are processed, to prevent unnecessary deletes.

To address these issues, reprocessing / requeue now follows this
logic:

1. All modules are marked for reprocessing with a 50x status code
   based on their last fetch status in module version states.

2. Modules are requeued in the following order (with the exception of
   large modules):
   - Latest version of modules previously with 20x status
   - Latest version of bad modules and alternative modules
   - Any version of modules previously with 20x status
   - Any version of bad modules and alternative modules
   - Any module with a status=0 or status=500 (we expect these to
     already be in the queue)

3. All large modules are queued last, since these take up a lot of
   time and need to be processed at a slower rate.

Within each category, modules are sorted as follows:

1. num_packages
2. version DESC
3. module_path

This keeps the order idempotent, and prioritizes smaller and newer
modules. It also allows modules of similar sizes to be processed
together.

Change-Id: I49580ed75bf60cc2698b756882bfdc906f72d935
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725873
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Diffstat (limited to 'internal/postgres/insert_module_test.go')
0 files changed, 0 insertions, 0 deletions