| author | Julie Qiu <julie@golang.org> | 2020-04-27 17:47:17 -0400 |
|---|---|---|
| committer | Julie Qiu <julieqiu@google.com> | 2020-04-30 14:41:16 +0000 |
| commit | 4837cb2a2c968d9af1aea5c5bd63eb0a9ef3d50a (patch) | |
| tree | fd002cf0adf7f4d348581d998715383aa1012e52 /internal/postgres/insert_module_test.go | |
| parent | 3c5ea2343246e593f11750c8bc9c0c6d1ddcc8a0 (diff) | |
| download | go-x-pkgsite-4837cb2a2c968d9af1aea5c5bd63eb0a9ef3d50a.tar.xz | |
internal/postgres: change reprocessing logic
At the moment, we reprocess and requeue modules using the following
logic:
1. Set the status of all modules to be reprocessed to 505.
2. Requeue modules with status=0 or status >= 500. Prioritize the
following:
- IsLatest: sorted by release vs prerelease modules
- IsBig: hardcoded list of modules we know are big
This poses the following problems:
1. Requeue order is not idempotent: priority is given to categories of
modules, but within each category, the order of modules being queued can
change each time requeue is called. This leads to many modules sitting
in the task queue, and a lack of clarity as to how much progress we have
made when looking at the logs.
2. Modules missing from isBig list: several large modules are missing
from the isBig list, so they aren't being accounted for. We
deprioritize large modules because they take a really long time to
process and can time out if too many are being processed at once, so we
want to process them at a slower rate than other modules.
3. Alternative modules have the same priority as non-alternative
modules: we usually don't care about alternative modules, and they will
be deleted from search_documents once identified. These should be
processed after the latest versions of non-alternative modules are
processed, to prevent unnecessary deletes.
To address these issues, reprocessing / requeue now uses the
following logic:
1. All modules are marked for reprocessing with a 50x status code based
on their last fetch status in module version states.
2. Modules are requeued in the following order (with the exception of large modules):
- Latest version of modules previously with 20x status
- Latest version of bad modules and alternative modules
- Any version of modules previously with 20x status
- Any version of bad modules and alternative modules
- Any module with a status=0 or status=500 (we expect these to already be in the queue)
3. All large modules are queued last, since these take up a lot of time
and need to be processed at a slower rate.
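The requeue order above can be sketched as a function that assigns each
module version to a category. This is only an illustration, not the code in
this change: the moduleState type, its fields, and the category names are
hypothetical, and the sketch assumes that large modules and modules with
status 0 or 500 are checked before the latest/good split.

```go
package main

import "fmt"

// moduleState is a hypothetical, simplified view of one module version.
type moduleState struct {
	LastStatus int  // last fetch status before reprocessing was requested
	IsLatest   bool // whether this is the latest version of its module
	IsLarge    bool // whether the module is known to be large
}

// Requeue categories (hypothetical names); lower values are queued first.
const (
	latestGood = iota + 1
	latestBadOrAlternative
	anyGood
	anyBadOrAlternative
	alreadyQueued
	large
)

// requeueCategory returns the category for m. Any status outside 20x is
// treated as "bad or alternative", which is an assumption of this sketch.
func requeueCategory(m moduleState) int {
	good := m.LastStatus >= 200 && m.LastStatus < 300
	switch {
	case m.IsLarge:
		return large
	case m.LastStatus == 0 || m.LastStatus == 500:
		return alreadyQueued
	case m.IsLatest && good:
		return latestGood
	case m.IsLatest:
		return latestBadOrAlternative
	case good:
		return anyGood
	default:
		return anyBadOrAlternative
	}
}

func main() {
	fmt.Println(requeueCategory(moduleState{LastStatus: 200, IsLatest: true}))                // 1
	fmt.Println(requeueCategory(moduleState{LastStatus: 404}))                                // 4
	fmt.Println(requeueCategory(moduleState{LastStatus: 200, IsLatest: true, IsLarge: true})) // 6
}
```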
Within each category, modules are sorted as follows:
1. num_packages
2. version DESC
3. module_path
This keeps the order idempotent, and prioritizes smaller and newer
modules. It also allows modules of similar sizes to be processed
together.
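A minimal sketch of the within-category ordering, to show why it gives the
same result on every requeue regardless of the input order. The module type
and the sortForRequeue name are hypothetical, and versions are compared as
plain strings for brevity, which is an assumption that does not hold for all
semantic versions (for example v1.10.0 vs. v1.9.0).

```go
package main

import (
	"fmt"
	"sort"
)

// module is a hypothetical, simplified row to be requeued.
type module struct {
	ModulePath  string
	Version     string
	NumPackages int
}

// sortForRequeue orders mods by num_packages ascending, then version
// descending, then module_path ascending. Every field takes part in the
// comparison, so distinct module versions never compare as equal.
func sortForRequeue(mods []module) {
	sort.Slice(mods, func(i, j int) bool {
		a, b := mods[i], mods[j]
		if a.NumPackages != b.NumPackages {
			return a.NumPackages < b.NumPackages
		}
		if a.Version != b.Version {
			// Plain string comparison: an assumption of this sketch.
			return a.Version > b.Version
		}
		return a.ModulePath < b.ModulePath
	})
}

func main() {
	mods := []module{
		{"example.com/b", "v1.0.0", 3},
		{"example.com/a", "v1.1.0", 1},
		{"example.com/c", "v1.2.0", 1},
		{"example.com/a", "v1.2.0", 1},
	}
	sortForRequeue(mods)
	for _, m := range mods {
		fmt.Println(m.NumPackages, m.Version, m.ModulePath)
	}
}
```

Because the comparison covers all three columns, two runs over the same set
of module versions produce the same queue order, which is what makes
progress in the logs easier to follow.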
Change-Id: I49580ed75bf60cc2698b756882bfdc906f72d935
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725873
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Diffstat (limited to 'internal/postgres/insert_module_test.go')
0 files changed, 0 insertions, 0 deletions
