| Age | Commit message | Author |
|
This change creates a new package that does config initialization and
other GCP-specific operations that were previously done in package
config, so that config can have no cloud dependencies.
For golang/go#61399
Change-Id: I8d78294834e325b47d838892a1cef87003a4b90a
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/522516
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Findley <rfindley@google.com>
kokoro-CI: kokoro <noreply+kokoro@google.com>
|
|
Change-Id: I1f3b7cc8899c7707abb01e3d14807c37c3451382
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/449695
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
|
|
Change-Id: I71373b98c4bf80b176f0f01c3ce50335813678b1
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/391974
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Trust: Cherry Mui <cherryyz@google.com>
|
|
The pkgsite logger is now used instead of the log package from the
standard library.
Change-Id: I5083d8cd6bf7a96a9245f848633b8803d0d80483
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/333931
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
An endpoint is added to the worker, which only reprocesses modules that
are in the search_documents table.
For golang/go#44142
Change-Id: I46e1e457707f3010ea2ea42c8aac60adbf3e8fd7
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/332190
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
The latest_versions subquery in GetNextModulesToFetch is causing the
query to hang and preventing modules from being enqueued.
This subquery isn't necessary, so it is removed to allow reprocessing to
continue. The optimization will be fixed in a future query.
Change-Id: I9f81b36d5c8ecf5e88922e2c78301c9fc362a92e
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/332377
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
This reverts commit 61bac80a3d712e004b0fd1355b057c8905d43dce.
Reason for revert: if we queue in arrival order, then multiple versions of the same module that are published together (which seems to happen a lot) will be queued together, and will conflict with each other on the module path lock.
We'll have to remove the processing lag alert, but instead we'll alert on total backlog, which is more robust anyway.
Change-Id: Ia1f9f5c7ec8b0b6577527d3dd6060b692c7a7b28
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/323172
Trust: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
When enqueuing modules for processing that we haven't seen before,
prefer the ones that arrived earliest, as determined by their index
timestamp.
Besides being "fair," this will satisfy the assumption of the Worker
Processing Lag metric, making that metric more useful.
We don't change the ordering of modules that are being reprocessed.
The arrival order no longer matters at that point, and we want to keep
the pseudo-random ordering provided by the md5 hash to avoid runs of
versions of the same (potentially large) module from gumming up the
workers.
Change-Id: I3c422eda576931110d4dbcd1db9f7199fde67e6f
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/317890
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
A handler is added to the worker so that only release and
non-incompatible versions are reprocessed.
This is used to optimize reprocessing for the symbol_history table.
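A minimal sketch of the kind of filter this implies, assuming versions are valid semantic versions. The helper name isReleaseAndCompatible is hypothetical and is not taken from the worker code:

```go
package main

import (
	"fmt"
	"strings"
)

// isReleaseAndCompatible reports whether version is a release version
// (no prerelease part) that does not carry the +incompatible suffix.
// It assumes version is already a valid semantic version.
func isReleaseAndCompatible(version string) bool {
	if strings.HasSuffix(version, "+incompatible") {
		return false
	}
	// Drop build metadata first, since it may itself contain hyphens.
	if i := strings.Index(version, "+"); i >= 0 {
		version = version[:i]
	}
	// In semver, a hyphen after the patch number starts the prerelease part.
	// Pseudo-versions are prereleases too, so they are excluded as well.
	return !strings.Contains(version, "-")
}

func main() {
	for _, v := range []string{
		"v1.2.3",
		"v1.2.3-beta.1",
		"v2.0.0+incompatible",
		"v0.0.0-20200101000000-abcdef123456",
	} {
		fmt.Println(v, isReleaseAndCompatible(v))
	}
}
```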
For golang/go#37102
Change-Id: Ie8205dd16709620d267dd362383d4a6822de2495
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/314690
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
Add a stack trace when we wrap an error from the DB.
These traces can be sent to the error reporting service.
For golang/go#44231
Change-Id: I096cdec4e97a6dcb0b7eb2ccdb4c955e1a0f4ccd
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/291492
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
A latest_only option is added to the reprocess handler, so that we can
reprocess only the latest version of each module.
Change-Id: Ic4ece02ef79af9399f9903eef29b950088669de7
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/286872
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
The new query for fetching the next modules to process doesn't
use the large-module threshold, so remove it.
Change-Id: I8b658522b6b80013a8da2a14d863ec64e4672852
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/270837
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Use the "alternative" ordering that processes latest versions first
and prioritizes some statuses, but then effectively randomizes the
modules.
Change-Id: I2de1837e056a7fe558b519e8841146a5fc547dd8
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/269240
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
For requeuing, we were sorting modules first by importance of
reprocessing, then by module path. That clumped all versions of the
same module together (at least all those that needed reprocessing).
The main problem with reprocessing multiple versions of the same
module was that if the load shedder was off about the size, that error
would persist while all versions of the module were being processed.
For example, github.com/elastic/beats/v7 has an 86Mi zip file, but
only a few Ki of actual Go files. So a node with a 100Mi threshold
would shed one version if it was processing another. This frequent
shedding slowed the task queue rate to a crawl.
This CL effectively randomizes the modules that are enqueued,
by hashing the name and version.
This small change has had a dramatic effect on processing.
Workers are now living for many minutes, even hours, before OOMing;
before, even with a load-shed threshold of 100Mi they would still
OOM in a half hour or so.
Before, processing was erratic: bursts of fast progress interspersed
with intervals where processing slowed, as many versions of the same
module were processed together, where the module was large or had
other problems, like the elastic/beats module discussed above.
Now, processing is uniform and fast, with 8 workers processing about 5
packages/sec and very few sheds.
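A rough illustration of the idea in Go, not the actual query. The md5 choice follows a later message in this log. Concatenating path and version as the hash input, and the names moduleVersion, hashKey and sortByHash, are assumptions:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sort"
)

// moduleVersion is a hypothetical stand-in for a row to be enqueued.
type moduleVersion struct {
	Path, Version string
}

// hashKey returns the hex md5 of path+version. The exact hash input
// used by the real query is an assumption here.
func hashKey(m moduleVersion) string {
	sum := md5.Sum([]byte(m.Path + m.Version))
	return hex.EncodeToString(sum[:])
}

// sortByHash orders modules by their hash key. The order is
// deterministic, but versions of one module no longer sit together.
func sortByHash(ms []moduleVersion) {
	sort.Slice(ms, func(i, j int) bool {
		return hashKey(ms[i]) < hashKey(ms[j])
	})
}

func main() {
	ms := []moduleVersion{
		{"github.com/elastic/beats/v7", "v7.0.0"},
		{"github.com/elastic/beats/v7", "v7.1.0"},
		{"example.com/small", "v1.0.0"},
		{"example.com/small", "v1.1.0"},
	}
	sortByHash(ms)
	for _, m := range ms {
		fmt.Println(hashKey(m)[:8], m.Path, m.Version)
	}
}
```

The module paths and versions in main are sample data only.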
Change-Id: Id4f2010f7ab3131d4b4e37721cab6d5ff1680a54
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/258018
Reviewed-by: Julie Qiu <julie@golang.org>
Trust: Jonathan Amsterdam <jba@google.com>
|
|
Under an experiment flag, use a simpler requeue ordering that doesn't
leave large modules until the end. It's actually better if large
modules are mixed in with everything else, since they won't get
clumped together at the end and grind things to a standstill.
Change-Id: Iffd5ca70170ddc616a1516aaf783fe2ac446399d
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/257241
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Change-Id: Iafeb3727f1d48ed37d10d64db6e39f26db590f29
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/256761
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Change Exec so it returns the number of rows affected by
the statement, rather than a sql.Result.
- We never use the other methods of sql.Result.
- It's annoying to get the number of affected rows from a sql.Result
because of the error return value.
- Examination of github.com/lib/pq shows that there is no extra cost
to calling sql.Result.RowsAffected after an Exec.
Change-Id: If16a15cbabf38755518732c4489109e0b01f2cd1
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/253079
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Bad modules and alternative modules do not need to be reprocessed, and
are no longer marked for reprocessing.
Modules in the reprocess state for those statuses (540s and 541s) will
continue to be requeued so that we don't end up having two status codes
that indicate the same thing.
Once all 540s and 541s have been updated, we can delete those error
codes.
Change-Id: I804c021421e8a8e19be49992d9113ea832d35a9b
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/252679
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
reprocessing"
This reverts commit 851d8cf80bfd77a2db108c00a74a9d11b687da5a.
Reason for revert: use modules.status table instead, see golang.org/cl/249441
Change-Id: Ie5380ad2ca94d772487e4ddbaf8296b6c83b7349
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/249442
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
We don't need to read the 'incompatible' column from
module_version_states.
This also fixes a bug where there was a mismatch between the number of
columns in the query and the number of arguments to rows.Scan in
queryModuleVersionStates.
Change-Id: I5a8e0d41666ce7b5e721ba78342cb18ee21ead2f
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/249297
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Currently, the worker only updates the module_version_states table to
mark that modules need to be reprocessed. Update the module status in
the version_map table too.
For golang/go#40807
Change-Id: I9a1ecf611a71f1d3dda3d7d7b20d17012a995ca3
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/248677
Run-TryBot: Shaquille Que <shaquille@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
This change adds support for inserting, sorting and updating
module_version_states.incompatible.
Updates golang/go#37714
Change-Id: I40b0831adef6e78beafee259e972cb0e4d4b90c4
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/248182
Run-TryBot: Julie Qiu <julie@golang.org>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Don't enqueue more than largeModulesLimit new modules, in addition to
using that limit for large modules.
Some new modules are large, and sometimes the worker does get a big
batch of those new, large modules all at once. It gets stuck
processing them and makes little to no progress.
Treating all new modules as large will result in processing pauses
when the modules are small, but at least we'll be able to see that
happening and gain some insight into the requeuing behavior.
Updates b/162495665.
Change-Id: I02319fbcec02d97b457638da31c8bf632d43955e
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/245917
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
When we reprocess modules, ones with a status DBModuleInsertInvalid will
now be reprocessed.
Additionally, it is now possible to reprocess modules based on a
specific status code using the /reprocess endpoint.
Change-Id: I378d635ec3939717ee2c8305db7944c6467b28d3
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/243277
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
ToHTTPStatus is renamed to ToStatus, since several derrors codes are
not HTTP status codes.
Change-Id: I41bf1452fdbbafe1a4f752bc3092e39515a2db4b
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/242882
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
Construct a single query that returns all the modules we want to
reprocess, in the order we want.
The main ideas are:
- Use a WITH statement (CTE) to construct a table of (module path,
latest version).
- Add a computed "latest" column to every row by testing membership in
that table.
- Use a CASE in the ORDER BY clause to express the ordering we want.
Some other changes:
- We process status 0 first (the original motivation for this CL).
- As a consequence of using one query, we'll always return a number of
results up to the limit. To handle the lower limit for large
modules, we add code to truncate the results.
- Before, all 5xx that were not one of the four special codes were
processed after everything else. Now it's relatively easy to process
other 5xx codes at the end of each category (latest versions,
non-large modules).
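The three main ideas can be mimicked in plain Go to show their shape. This is not the SQL from the CL. The row type, the function name and the three priority levels are assumptions, and only "status 0 first" and the latest-version category come from the message:

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for one module_version_states row.
type row struct {
	Path, Version string
	Status        int
	Latest        bool // computed, like the "latest" column
}

// orderForReprocessing mirrors the query structure: a lookup table of
// latest versions (the CTE), a computed Latest flag (the membership
// test), and a priority function (the CASE in ORDER BY).
func orderForReprocessing(rows []row, latest map[string]string) []row {
	out := make([]row, len(rows))
	copy(out, rows)
	for i := range out {
		out[i].Latest = latest[out[i].Path] == out[i].Version
	}
	priority := func(r row) int {
		switch {
		case r.Status == 0: // never processed: goes first
			return 0
		case r.Latest: // latest version of its module
			return 1
		default:
			return 2
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return priority(out[i]) < priority(out[j])
	})
	return out
}

func main() {
	rows := []row{
		{Path: "a", Version: "v1.0.0", Status: 505},
		{Path: "a", Version: "v1.1.0", Status: 505},
		{Path: "b", Version: "v1.0.0", Status: 0},
	}
	latest := map[string]string{"a": "v1.1.0", "b": "v1.0.0"}
	for _, r := range orderForReprocessing(rows, latest) {
		fmt.Println(r.Path, r.Version, r.Status, r.Latest)
	}
}
```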
Change-Id: I4dacb093cfb93299ccfc8d29c1fd0b1cb49bf56f
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/239479
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Change DB.GetNextModulesToFetch to have fewer special cases.
This will make it easier to add new queries.
Change-Id: I5d10e39543a2054fc3851071a7902398cf06a566
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/239317
Reviewed-by: Julie Qiu <julie@golang.org>
|
|
Our requeue logic is such that large modules (ones with > 1500 packages)
are requeued last, since they can take a long time to reprocess.
However, with large modules, it's possible that the modules that are
queued do not finish processing before the dedupe period for Cloud Tasks
is up, causing the queue to be backed-up with duplicate modules.
A limit is now added so that at most 100 large modules can be added
to the queue at once.
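A small sketch of such a cap, assuming the batch is already in requeue order. The 1500-package threshold and the limit of 100 come from this message. The module type and function name are hypothetical:

```go
package main

import "fmt"

const (
	largeModulePackages = 1500 // modules with more packages count as large
	largeModulesLimit   = 100  // at most this many large modules per batch
)

// module is a hypothetical stand-in for a module version to enqueue.
type module struct {
	Path        string
	NumPackages int
}

// capLargeModules keeps every small module and at most limit large
// ones, preserving the input order.
func capLargeModules(ms []module, limit int) []module {
	var out []module
	large := 0
	for _, m := range ms {
		if m.NumPackages > largeModulePackages {
			if large >= limit {
				continue // over the cap: leave it for a later requeue
			}
			large++
		}
		out = append(out, m)
	}
	return out
}

func main() {
	var batch []module
	for i := 0; i < 150; i++ {
		batch = append(batch, module{fmt.Sprintf("example.com/big%d", i), 2000})
	}
	batch = append(batch, module{"example.com/small", 3})
	fmt.Println(len(capLargeModules(batch, largeModulesLimit))) // prints 101
}
```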
Change-Id: I0d2a9b709b9a9972b609e9c3af125d58d37f4488
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/755584
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
There was a bug in the requeue logic that caused all modules to be
requeued when next_processed_at < NOW. This is now fixed so that only
module version states with status=0 or status >= 500 are picked up.
Fixes b/157403463
Change-Id: Ieffde87362c2cd05391780a20dba9647884c99a8
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/754074
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
The log for UpdateModuleVersionStatesForReprocessing is tweaked
to improve clarity and make the prefix easier to search.
Change-Id: I0aaf62210b75c84b33a42cb58040071e534f0356
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/743626
CI-Result: Cloud Build <devtools-proctor-result-processor@system.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|
|
At the moment, we reprocess and requeue modules using the following
logic:
1. Set the status of all modules to be reprocessed to 505.
2. Requeue modules with status=0 or status >= 500. Prioritize the
following:
- IsLatest: sorted by release vs prerelease modules
- IsBig: hardcoded list of modules we know are big
This poses the following problems:
1. Requeue order is not idempotent: priority is given to categories of
modules, but within each category, the order of modules being queued can
change each time requeue is called. This leads to many modules sitting
in the task queue, and a lack of clarity as to how much progress we have
made when looking at the logs.
2. Modules missing from isBig list: there are several modules missing
from the isBig list, but these aren't being accounted for. We
deprioritize large modules because they take a really long time to
process and can timeout if too many are being processed at once, so we
want to process them at a slower rate than other modules.
3. Alternative modules have the same priority as non-alternative
modules: we usually don't care about alternative modules, and they will
be deleted from search_documents once identified. These should be
processed after the latest versions of non-alternative modules are
processed to prevent unnecessary deletes.
To address these issues, reprocessing / requeue now uses the
following logic:
1. All modules are marked for reprocessing with a 50x status code based on their last
fetch status in module version states.
2. Modules are requeued in the following order (with the exception of large modules):
- Latest version of modules previously with 20x status
- Latest version of bad modules and alternative modules
- Any version of modules previously with 20x status
- Any version of bad modules and alternative modules
- Any module with a status=0 or status=500 (we expect these to already be in the queue)
3. All large modules are queued last, since these take up a lot of time
and need to be processed at a slower rate.
Within each category, modules are sorted as follows:
1. num_packages
2. version DESC
3. module_path
This keeps the order idempotent, and prioritizes smaller and newer
modules. It also allows modules of similar sizes to be processed
together.
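The within-category ordering can be sketched as a Go comparator. This only illustrates the three sort keys. The state type is hypothetical, and comparing versions as plain strings is a simplification that assumes the stored version column sorts that way:

```go
package main

import (
	"fmt"
	"sort"
)

// state is a hypothetical stand-in for a module_version_states row.
type state struct {
	ModulePath  string
	Version     string
	NumPackages int
}

// sortWithinCategory orders by num_packages ascending, then version
// descending, then module_path ascending. With no ties left between
// distinct rows, repeated calls produce the same order.
func sortWithinCategory(states []state) {
	sort.Slice(states, func(i, j int) bool {
		a, b := states[i], states[j]
		if a.NumPackages != b.NumPackages {
			return a.NumPackages < b.NumPackages
		}
		if a.Version != b.Version {
			return a.Version > b.Version // DESC; string comparison only
		}
		return a.ModulePath < b.ModulePath
	})
}

func main() {
	states := []state{
		{"b", "v1.0.0", 5},
		{"a", "v1.0.0", 5},
		{"c", "v2.0.0", 5},
		{"d", "v1.0.0", 1},
	}
	sortWithinCategory(states)
	for _, s := range states {
		fmt.Println(s.NumPackages, s.Version, s.ModulePath)
	}
}
```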
Change-Id: I49580ed75bf60cc2698b756882bfdc906f72d935
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725873
Reviewed-by: Jonathan Amsterdam <jba@google.com>
|