path: root/internal/postgres/requeue.go
Age | Commit message | Author
2023-08-25 | internal/config: separate config initialization into serverconfig | Michael Matloob
This change creates a new package that does config initialization and other GCP-specific operations that were previously done in package config, so that config can have no cloud dependencies. For golang/go#61399 Change-Id: I8d78294834e325b47d838892a1cef87003a4b90a Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/522516 Run-TryBot: Michael Matloob <matloob@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Findley <rfindley@google.com> kokoro-CI: kokoro <noreply+kokoro@google.com>
2022-11-15 | all: convert interface{} to any | Hana (Hyang-Ah) Kim
Change-Id: I1f3b7cc8899c7707abb01e3d14807c37c3451382 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/449695 TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jamal Carvalho <jamal@golang.org> Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
2022-04-04 | all: fix typos | Dan Kortschak
Change-Id: I71373b98c4bf80b176f0f01c3ce50335813678b1 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/391974 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Trust: Cherry Mui <cherryyz@google.com>
2021-07-14 | internal/config: use pkgsite internal/log package | Julie Qiu
The pkgsite logger is now used instead of the log package from the standard library. Change-Id: I5083d8cd6bf7a96a9245f848633b8803d0d80483 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/333931 Trust: Julie Qiu <julie@golang.org> Run-TryBot: Julie Qiu <julie@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jonathan Amsterdam <jba@google.com>
2021-07-13 | internal/postgres: reprocess modules in search_documents | Julie Qiu
An endpoint is added to the worker, which only reprocesses modules that are in the search_documents table. For golang/go#44142 Change-Id: I46e1e457707f3010ea2ea42c8aac60adbf3e8fd7 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/332190 Trust: Julie Qiu <julie@golang.org> Run-TryBot: Julie Qiu <julie@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jonathan Amsterdam <jba@google.com>
2021-07-13 | internal/postgres: remove latest logic from reprocess query | Julie Qiu
The latest_versions subquery in GetNextModulesToFetch is causing the query to hang and preventing modules from being enqueued. This subquery isn't necessary, so it is removed to allow reprocessing to continue. The optimization will be fixed in a future query. Change-Id: I9f81b36d5c8ecf5e88922e2c78301c9fc362a92e Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/332377 Trust: Julie Qiu <julie@golang.org> Run-TryBot: Julie Qiu <julie@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jonathan Amsterdam <jba@google.com>
2021-05-27 | Revert "internal/postgres: process new modules in arrival order" | Jonathan Amsterdam
This reverts commit 61bac80a3d712e004b0fd1355b057c8905d43dce. Reason for revert: if we queue in arrival order, then multiple versions of the same module that are published together (which seems to happen a lot) will be queued together, and will conflict with each other on the module path lock. We'll have to remove the processing lag alert, but instead we'll alert on total backlog, which is more robust anyway. Change-Id: Ia1f9f5c7ec8b0b6577527d3dd6060b692c7a7b28 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/323172 Trust: Jonathan Amsterdam <jba@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2021-05-10 | internal/postgres: process new modules in arrival order | Jonathan Amsterdam
When enqueuing modules for processing that we haven't seen before, prefer the ones that arrived earliest, as determined by their index timestamp. Besides being "fair," this will satisfy the assumption of the Worker Processing Lag metric, making that metric more useful. We don't change the ordering of modules that are being reprocessed. The arrival order no longer matters at that point, and we want to keep the pseudo-random ordering provided by the md5 hash to prevent runs of versions of the same (potentially large) module from gumming up the workers. Change-Id: I3c422eda576931110d4dbcd1db9f7199fde67e6f Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/317890 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2021-04-28 | internal/{postgres,worker}: only reprocess release version | Julie Qiu
A handler is added to the worker so that only release and non-incompatible versions are reprocessed. This is used to optimize reprocessing for the symbol_history table. For golang/go#37102 Change-Id: Ie8205dd16709620d267dd362383d4a6822de2495 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/314690 Trust: Julie Qiu <julie@golang.org> Run-TryBot: Julie Qiu <julie@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jonathan Amsterdam <jba@google.com>
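A minimal Go sketch of the filter this commit describes, for illustration only. The function name `isReleaseNonIncompatible` is hypothetical, and the check assumes canonical semver strings, treating any `-` suffix (which every pseudo-version has) as a prerelease.

```go
package main

import (
	"fmt"
	"strings"
)

// isReleaseNonIncompatible reports whether version is a plain release
// (no prerelease or pseudo-version part) without the "+incompatible" suffix.
// Assumes version is a canonical semver string such as "v1.2.3".
func isReleaseNonIncompatible(version string) bool {
	if strings.HasSuffix(version, "+incompatible") {
		return false
	}
	// In canonical semver a prerelease, including every pseudo-version,
	// is introduced by '-' after the patch number.
	return !strings.Contains(version, "-")
}

func main() {
	for _, v := range []string{
		"v1.2.3",
		"v1.2.3-rc.1",
		"v0.0.0-20200101000000-abcdef123456",
		"v2.0.0+incompatible",
	} {
		fmt.Printf("%-40s %v\n", v, isReleaseNonIncompatible(v))
	}
}
```

A production version would more likely lean on `semver.Prerelease` and `semver.Build` from `golang.org/x/mod/semver`, which handle edge cases this string check glosses over.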
2021-02-12 | internal/postgres: add a stack to all wrapped errors | Jonathan Amsterdam
Add a stack trace when we wrap an error from the DB. These traces can be sent to the error reporting service. For golang/go#44231 Change-Id: I096cdec4e97a6dcb0b7eb2ccdb4c955e1a0f4ccd Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/291492 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2021-01-26 | internal/postgres,worker: reprocess latest versions | Julie Qiu
A latest_only option is added to the reprocess handler, so that we can reprocess only the latest version of each module. Change-Id: Ic4ece02ef79af9399f9903eef29b950088669de7 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/286872 Trust: Julie Qiu <julie@golang.org> Run-TryBot: Julie Qiu <julie@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jonathan Amsterdam <jba@google.com>
2020-11-17 | internal/postgres: remove arg from next-fetch query | Jonathan Amsterdam
The new query for fetching the next modules to process doesn't use the large-module threshold, so remove it. Change-Id: I8b658522b6b80013a8da2a14d863ec64e4672852 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/270837 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2020-11-12 | internal/postgres: switch to alternative fetch ordering | Jonathan Amsterdam
Use the "alternative" ordering that processes latest versions first and prioritizes some statuses, but then effectively randomizes the modules. Change-Id: I2de1837e056a7fe558b519e8841146a5fc547dd8 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/269240 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jamal Carvalho <jamal@golang.org> Reviewed-by: Julie Qiu <julie@golang.org>
2020-09-29 | internal/postgres: enqueue modules in a random order | Jonathan Amsterdam
For requeuing, we were sorting modules first by importance of reprocessing, then by module path. That clumped all versions of the same module together (at least all those that needed reprocessing). The main problem with reprocessing multiple versions of the same module was that if the load shedder was off about the size, that error would persist while all versions of the module were being processed. For example, github.com/elastic/beats/v7 has an 86Mi zip file, but only a few Ki of actual Go files. So a node with a 100Mi threshold would shed one version if it was processing another. This frequent shedding slowed the task queue rate to a crawl. This CL effectively randomizes the modules that are enqueued, by hashing the name and version. This small change has had a dramatic effect on processing. Workers are now living for many minutes, even hours, before OOMing; before, even with a load-shed threshold of 100Mi they would still OOM in a half hour or so. Before, processing was erratic: bursts of fast progress interspersed with intervals where processing slowed, as many versions of the same module were processed together, where the module was large or had other problems, like the elastic/beats module discussed above. Now, processing is uniform and fast, with 8 workers processing about 5 packages/sec and very few sheds. Change-Id: Id4f2010f7ab3131d4b4e37721cab6d5ff1680a54 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/258018 Reviewed-by: Julie Qiu <julie@golang.org> Trust: Jonathan Amsterdam <jba@google.com>
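A sketch of the hashing idea, assuming the key is an MD5 digest of the module path and version joined by `@`. The separator and the names `moduleVersion`, `hashKey`, and `orderByHash` are illustrative; the commit says only that the name and version are hashed.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
)

// moduleVersion is a hypothetical row returned by the requeue query.
type moduleVersion struct {
	Path, Version string
}

// hashKey returns the raw MD5 digest of "path@version" as a string.
// Comparing these strings compares the digests byte by byte.
func hashKey(m moduleVersion) string {
	sum := md5.Sum([]byte(m.Path + "@" + m.Version))
	return string(sum[:])
}

// orderByHash sorts mods by hashKey. The result looks random, which
// scatters versions of the same module, but is identical on every run.
func orderByHash(mods []moduleVersion) {
	sort.Slice(mods, func(i, j int) bool {
		return hashKey(mods[i]) < hashKey(mods[j])
	})
}

func main() {
	mods := []moduleVersion{
		{"github.com/elastic/beats/v7", "v7.0.0"},
		{"github.com/elastic/beats/v7", "v7.1.0"},
		{"github.com/elastic/beats/v7", "v7.2.0"},
		{"golang.org/x/text", "v0.3.0"},
		{"golang.org/x/text", "v0.3.3"},
	}
	orderByHash(mods)
	for _, m := range mods {
		fmt.Println(m.Path, m.Version)
	}
}
```

The order is pseudo-random but deterministic, so repeated requeues see the same sequence while versions of one large module get spread through the queue. In SQL the equivalent would be something like `ORDER BY md5(module_path || version)`, though the exact expression used is not recorded in this log.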
2020-09-29 | internal/postgres: change requeue ordering for GKE | Jonathan Amsterdam
Under an experiment flag, use a simpler requeue ordering that doesn't leave large modules until the end. It's actually better if large modules are mixed in with everything else, since they won't get clumped together at the end and grind things to a standstill. Change-Id: Iffd5ca70170ddc616a1516aaf783fe2ac446399d Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/257241 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2020-09-23 | internal/postgres: make large modules limit configurable | Jonathan Amsterdam
Change-Id: Iafeb3727f1d48ed37d10d64db6e39f26db590f29 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/256761 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2020-09-09 | internal/database: change Exec to return rows affected | Jonathan Amsterdam
Change Exec so it returns the number of rows affected by the statement, rather than a sql.Result.
- We never use the other methods of sql.Result.
- It's annoying to get the number of affected rows from a sql.Result because of the error return value.
- Examination of github.com/lib/pq shows that there is no extra cost to calling sql.Result.RowsAffected after an Exec.
Change-Id: If16a15cbabf38755518732c4489109e0b01f2cd1 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/253079 Reviewed-by: Julie Qiu <julie@golang.org>
2020-09-02 | internal/postgres: do not reprocess 490s and 491s | Julie Qiu
Bad modules and alternative modules do not need to be reprocessed, and are no longer marked for reprocessing. Modules in the reprocess state for those statuses (540s and 541s) will continue to be requeued so that we don't end up having two status codes that indicate the same thing. Once all 540s and 541s have been updated, we can delete those error codes. Change-Id: I804c021421e8a8e19be49992d9113ea832d35a9b Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/252679 Run-TryBot: Julie Qiu <julie@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jonathan Amsterdam <jba@google.com>
2020-08-20 | Revert "internal/postgres: update module status in version_map when reprocessing" | Shaquille Que
This reverts commit 851d8cf80bfd77a2db108c00a74a9d11b687da5a. Reason for revert: use modules.status table instead, see golang.org/cl/249441 Change-Id: Ie5380ad2ca94d772487e4ddbaf8296b6c83b7349 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/249442 Reviewed-by: Julie Qiu <julie@golang.org>
2020-08-19 | internal/postgres: don't read module_version_states.incompatible | Jonathan Amsterdam
We don't need to read the 'incompatible' column from module_version_states. This also fixes a bug where there was a mismatch between the number of columns in the query and the number of arguments to rows.Scan in queryModuleVersionStates. Change-Id: I5a8e0d41666ce7b5e721ba78342cb18ee21ead2f Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/249297 Reviewed-by: Julie Qiu <julie@golang.org>
2020-08-18 | internal/postgres: update module status in version_map when reprocessing | Shaquille Que
Currently, the worker only updates the module_version_states table to mark that modules need to be reprocessed. Update the module status in the version_map table too. For golang/go#40807 Change-Id: I9a1ecf611a71f1d3dda3d7d7b20d17012a995ca3 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/248677 Run-TryBot: Shaquille Que <shaquille@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2020-08-17 | internal/postgres: populate module_version_states.incompatible field | Miguel Acero
This change adds support for inserting, sorting and updating module_version_states.incompatible. Updates golang/go#37714 Change-Id: I40b0831adef6e78beafee259e972cb0e4d4b90c4 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/248182 Run-TryBot: Julie Qiu <julie@golang.org> Run-TryBot: Jonathan Amsterdam <jba@google.com> Reviewed-by: Jonathan Amsterdam <jba@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
2020-07-30 | internal/postgres: treat new modules like large modules | Jonathan Amsterdam
Don't enqueue more than largeModulesLimit new modules, in addition to using that limit for large modules. Some new modules are large, and sometimes the worker does get a big batch of those new, large modules all at once. It gets stuck processing them and makes little to no progress. Treating all new modules as large will result in processing pauses when the modules are small, but at least we'll be able to see that happening and gain some insight into the requeuing behavior. Updates b/162495665. Change-Id: I02319fbcec02d97b457638da31c8bf632d43955e Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/245917 Reviewed-by: Julie Qiu <julie@golang.org>
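One reading of this change, sketched in Go: new modules are counted against the same budget as large modules, so at most `limit` of either kind are enqueued per batch. The `candidate` type and `capHeavy` function are hypothetical, and whether the real code used one shared budget or two separate ones is an assumption here.

```go
package main

import "fmt"

// candidate is a hypothetical row from the next-modules query,
// already in priority order.
type candidate struct {
	Path    string
	IsLarge bool // e.g. above the large-module package threshold
	IsNew   bool // not processed before
}

// capHeavy keeps cands in order but admits at most limit candidates that
// are large or new, treating new modules as large. Other candidates always
// pass through.
func capHeavy(cands []candidate, limit int) []candidate {
	var out []candidate
	heavy := 0
	for _, c := range cands {
		if c.IsLarge || c.IsNew {
			if heavy >= limit {
				continue // over budget: left for a later requeue
			}
			heavy++
		}
		out = append(out, c)
	}
	return out
}

func main() {
	cands := []candidate{
		{Path: "example.com/big", IsLarge: true},
		{Path: "example.com/small"},
		{Path: "example.com/fresh", IsNew: true},
		{Path: "example.com/other"},
	}
	for _, c := range capHeavy(cands, 1) {
		fmt.Println(c.Path)
	}
	// Prints example.com/big, example.com/small, example.com/other.
}
```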
2020-07-20 | internal: reprocess modules with DBModuleInsertInvalid status | Julie Qiu
When we reprocess modules, ones with a status DBModuleInsertInvalid will now be reprocessed. Additionally, it is now possible to reprocess modules based on a specific status code using the /reprocess endpoint. Change-Id: I378d635ec3939717ee2c8305db7944c6467b28d3 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/243277 Reviewed-by: Jonathan Amsterdam <jba@google.com>
2020-07-16 | internal/derrors: rename ToHTTPStatus to ToStatus | Julie Qiu
ToHTTPStatus is renamed to ToStatus, since several derrors codes are not HTTP status codes. Change-Id: I41bf1452fdbbafe1a4f752bc3092e39515a2db4b Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/242882 Reviewed-by: Jonathan Amsterdam <jba@google.com>
2020-06-26 | internal/postgres: get next modules to fetch in one query | Jonathan Amsterdam
Construct a single query that returns all the modules we want to reprocess, in the order we want. The main ideas are:
- Use a WITH statement (CTE) to construct a table of (module path, latest version).
- Add a computed "latest" column to every row by testing membership in that table.
- Use a CASE in the ORDER BY clause to express the ordering we want.
Some other changes:
- We process status 0 first (the original motivation for this CL).
- As a consequence of using one query, we'll always return a number of results up to the limit. To handle the lower limit for large modules, we add code to truncate the results.
- Before, all 5xx codes that were not one of the four special codes were processed after everything else. Now it's relatively easy to process other 5xx codes at the end of each category (latest versions, non-large modules).
Change-Id: I4dacb093cfb93299ccfc8d29c1fd0b1cb49bf56f Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/239479 Reviewed-by: Julie Qiu <julie@golang.org>
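The CASE-based ordering can be mirrored in Go to make the tiers concrete. This is an illustrative sketch, not the actual query: the `latest` map stands in for the CTE of (module path, latest version), `preferred` stands in for the special status codes (which the commit does not list), and the tier numbers and the example codes 520 and 599 are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical module_version_states row.
type row struct {
	Path, Version string
	Status        int
}

// priority plays the role of the CASE expression in ORDER BY.
// latest maps module path to latest version (the CTE); preferred holds
// the status codes handled first within a category.
func priority(r row, latest map[string]string, preferred map[int]bool) int {
	if r.Status == 0 {
		return 0 // status 0 rows first, as in this CL
	}
	p := 1 // latest versions
	if latest[r.Path] != r.Version {
		p = 3 // all other versions
	}
	if !preferred[r.Status] {
		p++ // other 5xx codes go to the end of their category
	}
	return p
}

// order sorts rows by priority, keeping input order among equal priorities.
func order(rows []row, latest map[string]string, preferred map[int]bool) {
	sort.SliceStable(rows, func(i, j int) bool {
		return priority(rows[i], latest, preferred) < priority(rows[j], latest, preferred)
	})
}

func main() {
	rows := []row{
		{"example.com/b", "v1.0.0", 520},
		{"example.com/c", "v0.3.0", 599},
		{"example.com/b", "v1.1.0", 520},
		{"example.com/a", "v0.1.0", 0},
	}
	latest := map[string]string{"example.com/b": "v1.1.0", "example.com/c": "v0.3.0"}
	order(rows, latest, map[int]bool{520: true})
	for _, r := range rows {
		fmt.Println(r.Path, r.Version, r.Status)
	}
	// example.com/a v0.1.0 0
	// example.com/b v1.1.0 520
	// example.com/c v0.3.0 599
	// example.com/b v1.0.0 520
}
```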
2020-06-22 | internal/postgres: reorganize GetNextModulesToFetch | Jonathan Amsterdam
Change DB.GetNextModulesToFetch to have fewer special cases. This will make it easier to add new queries. Change-Id: I5d10e39543a2054fc3851071a7902398cf06a566 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/239317 Reviewed-by: Julie Qiu <julie@golang.org>
2020-05-28 | internal/postgres: add requeue limit for large modules | Julie Qiu
Our requeue logic is such that large modules (ones with > 1500 packages) are requeued last, since they can take a long time to reprocess. However, with large modules, it's possible that the modules that are queued do not finish processing before the dedupe period for Cloud Tasks is up, causing the queue to be backed up with duplicate modules. A limit is now added so that at most 100 large modules can be added to the queue at once. Change-Id: I0d2a9b709b9a9972b609e9c3af125d58d37f4488 Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/755584 Reviewed-by: Jonathan Amsterdam <jba@google.com>
2020-05-26 | internal/postgres: fix requeue bug | Julie Qiu
There was a bug in the requeue logic that caused all modules to be requeued when next_processed_at < NOW. This is now fixed so that only module version states with status=0 or status >= 500 are picked up. Fixes b/157403463 Change-Id: Ieffde87362c2cd05391780a20dba9647884c99a8 Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/754074 Reviewed-by: Jonathan Amsterdam <jba@google.com>
2020-05-13 | internal/postgres: tweak log for reprocessing | Julie Qiu
The log for UpdateModuleVersionStatesForReprocessing is tweaked to improve clarity and make the prefix easier to search. Change-Id: I0aaf62210b75c84b33a42cb58040071e534f0356 Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/743626 CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com> Reviewed-by: Jonathan Amsterdam <jba@google.com>
2020-04-30 | internal/postgres: change reprocessing logic | Julie Qiu
At the moment, we reprocess and requeue modules using the following logic:
1. Set all modules to be reprocessed to status 505.
2. Requeue modules with status=0 or status >= 500. Prioritize the following:
  - IsLatest: sorted by release vs. prerelease modules
  - IsBig: hardcoded list of modules we know are big
This poses the following problems:
1. Requeue order is not idempotent: priority is given to categories of modules, but within each category, the order in which modules are queued can change each time requeue is called. This leads to many modules sitting in the task queue, and a lack of clarity about how much progress we have made when looking at the logs.
2. Modules missing from the isBig list: several big modules are missing from the isBig list, so they aren't being accounted for. We deprioritize large modules because they take a really long time to process and can time out if too many are processed at once, so we want to process them at a slower rate than other modules.
3. Alternative modules have the same priority as non-alternative modules: we usually don't care about alternative modules, and they will be deleted from search_documents once identified. These should be processed after the latest versions of non-alternative modules are processed, to prevent unnecessary deletes.
To address these issues, reprocessing / requeue now follows this logic:
1. All modules are marked for reprocessing with a 50x status code based on their last fetch status in module_version_states.
2. Modules are requeued in the following order (with the exception of large modules):
  - Latest version of modules previously with a 20x status
  - Latest version of bad modules and alternative modules
  - Any version of modules previously with a 20x status
  - Any version of bad modules and alternative modules
  - Any module with status=0 or status=500 (we expect these to already be in the queue)
3. All large modules are queued last, since they take up a lot of time and need to be processed at a slower rate.
Within each category, modules are sorted by:
1. num_packages
2. version DESC
3. module_path
This keeps the order idempotent and prioritizes smaller and newer modules. It also allows modules of similar sizes to be processed together.
Change-Id: I49580ed75bf60cc2698b756882bfdc906f72d935 Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725873 Reviewed-by: Jonathan Amsterdam <jba@google.com>
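The requeue ordering for this commit can be sketched in Go as a sort key of (category, num_packages, version DESC, module_path). The `mod` type and its boolean fields are hypothetical simplifications of module_version_states, and `compareVersions` handles only plain `vX.Y.Z` releases; a real implementation would use a full semver comparison.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// mod is a hypothetical, simplified module_version_states row.
type mod struct {
	Path, Version    string
	NumPackages      int
	IsLatest         bool
	PrevOK           bool // last fetch returned a 20x status
	BadOrAlternative bool // last fetch marked it bad or alternative
	IsLarge          bool
}

// category returns the requeue tier; lower values are queued first.
func category(m mod) int {
	switch {
	case m.IsLarge:
		return 5 // large modules always last
	case m.IsLatest && m.PrevOK:
		return 0
	case m.IsLatest && m.BadOrAlternative:
		return 1
	case m.PrevOK:
		return 2
	case m.BadOrAlternative:
		return 3
	default:
		return 4 // e.g. status 0 or 500
	}
}

// compareVersions compares "vX.Y.Z" numerically, returning -1, 0, or 1.
// Prerelease and build suffixes are not handled (simplifying assumption).
func compareVersions(a, b string) int {
	pa := strings.SplitN(strings.TrimPrefix(a, "v"), ".", 3)
	pb := strings.SplitN(strings.TrimPrefix(b, "v"), ".", 3)
	for i := 0; i < 3; i++ {
		var x, y int
		if i < len(pa) {
			x, _ = strconv.Atoi(pa[i])
		}
		if i < len(pb) {
			y, _ = strconv.Atoi(pb[i])
		}
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	return 0
}

// sortForRequeue orders by category, num_packages, version DESC, module_path.
func sortForRequeue(ms []mod) {
	sort.Slice(ms, func(i, j int) bool {
		a, b := ms[i], ms[j]
		if ca, cb := category(a), category(b); ca != cb {
			return ca < cb
		}
		if a.NumPackages != b.NumPackages {
			return a.NumPackages < b.NumPackages
		}
		if c := compareVersions(a.Version, b.Version); c != 0 {
			return c > 0 // newer versions first
		}
		return a.Path < b.Path
	})
}

func main() {
	ms := []mod{
		{Path: "big", Version: "v1.0.0", NumPackages: 2000, IsLatest: true, PrevOK: true, IsLarge: true},
		{Path: "x", Version: "v1.2.0", NumPackages: 5, PrevOK: true},
		{Path: "y", Version: "v1.9.0", NumPackages: 5, IsLatest: true, PrevOK: true},
		{Path: "new", Version: "v0.0.1", NumPackages: 1},
		{Path: "x", Version: "v1.10.0", NumPackages: 5, IsLatest: true, PrevOK: true},
		{Path: "alt", Version: "v1.0.0", NumPackages: 1, IsLatest: true, BadOrAlternative: true},
		{Path: "z", Version: "v0.1.0", NumPackages: 1, IsLatest: true, PrevOK: true},
	}
	sortForRequeue(ms)
	for _, m := range ms {
		fmt.Println(m.Path, m.Version)
	}
}
```

Comparing versions numerically matters here: a plain string sort in descending order would put `v1.9.0` ahead of `v1.10.0`.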