| Age | Commit message (Collapse) | Author |
|
This change renames the "round" function to the more appropriately named
"alignUp" which rounds an integer up to the next multiple of a power of
two.
This change also adds the alignDown function, which is almost like
alignUp but rounds down to the previous multiple of a power of two.
With these two functions, we also go and replace manual rounding code
with it where we can.
Change-Id: Ie1487366280484dcb2662972b01b4f7135f72fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/190618
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This change makes it so that worldsema isn't held across the mark phase.
This means that various operations like ReadMemStats may now stop the
world during the mark phase, reducing latency on such operations.
Only three such operations are still no longer allowed to occur during
marking: GOMAXPROCS, StartTrace, and StopTrace.
For the former it's because any change to GOMAXPROCS impacts GC mark
background worker scheduling and the details there are tricky.
For the latter two it's because tracing needs to observe consistent GC
start and GC end events, and if StartTrace or StopTrace may stop the
world during marking, then it's possible for it to see a GC end event
without a start or GC start event without an end, respectively.
To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
marking is complete, the runtime now holds a new semaphore, gcsema,
across the mark phase just like it used to with worldsema.
Fixes #19812.
Change-Id: I15d43ed184f711b3d104e8f267fb86e335f86bf9
Reviewed-on: https://go-review.googlesource.com/c/go/+/182657
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Replaces legacy Go syntax for pointer struct member access
with more modern auto-deref alternative.
Found using https://go-critic.github.io/overview#underef-ref
Change-Id: I71a3c424126c4ff5d89f9e4bacb6cc01c6fa2ddf
Reviewed-on: https://go-review.googlesource.com/122895
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Replace `x[:]` where x is a slice with just `x`.
Found using https://go-critic.github.io/overview.html#unslice-ref
Change-Id: Ib0ee16e1d49b2a875b6b92a770049acc33208362
Reviewed-on: https://go-review.googlesource.com/123375
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Every time I poke at #14921, the g.waitreason string
pointer writes show up.
They're not particularly important performance-wise,
but it'd be nice to clear the noise away.
And it does open up a few extra bytes in the g struct
for some future use.
This is a re-roll of CL 99078, which was rolled
back because of failures on s390x.
Those failures were apparently due to an old version of gdb.
Change-Id: Icc2c12f449b2934063fd61e272e06237625ed589
Reviewed-on: https://go-review.googlesource.com/111256
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
|
|
"Span" is a commonly used term in many distributed tracing systems
(Dapper, OpenCensus, OpenTracing, ...). They use it to refer to a
period of time, not necessarily tied into execution of underlying
processor, thread, or goroutine, unlike the "Span" of runtime/trace
package.
Since distributed tracing and go runtime execution tracing are
already similar enough to cause confusion, this CL attempts to avoid
using the same word if possible.
"Region" is being used in a certain tracing system to refer to a code
region which is pretty close to what runtime/trace.Span currently
refers to. So, replace that.
https://software.intel.com/en-us/itc-user-and-reference-guide-defining-and-recording-functions-or-regions
This CL also tweaks APIs a bit based on jbd and heschi's comments:
NewContext -> NewTask
and it now returns a Task object that exports End method.
StartSpan -> StartRegion
and it now returns a Region object that exports End method.
Also, changed WithSpan to WithRegion and it now takes func() with no
context. Another thought is to get rid of WithRegion. It is a nice
concept but in practice, it seems problematic (a lot of code churn,
and polluting stack trace). Already, the tracing concept is very low
level, and we hope this API to be used with great care.
Recommended usage will be
defer trace.StartRegion(ctx, "someRegion").End()
Left old APIs untouched in this CL. Once the usage of them are cleaned
up, they will be removed in a separate CL.
Change-Id: I73880635e437f3aad51314331a035dd1459b9f3a
Reviewed-on: https://go-review.googlesource.com/108296
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: JBD <jbd@google.com>
|
|
This reverts commit 4eea887fd477368653f6fcf8ad766030167936e5.
Reason for revert: broke s390x build
Change-Id: Id6c2b6a7319273c4d21f613d4cdd38b00d49f847
Reviewed-on: https://go-review.googlesource.com/100375
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Every time I poke at #14921, the g.waitreason string
pointer writes show up.
They're not particularly important performance-wise,
but it'd be nice to clear the noise away.
And it does open up a few extra bytes in the g struct
for some future use.
Change-Id: I7ffbd52fbc2a286931a2218038fda52ed6473cc9
Reviewed-on: https://go-review.googlesource.com/99078
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
This implements the annotation API proposed in golang.org/cl/63274.
traceString is updated to protect the string map with trace.stringsLock
because the assumption that traceString is called by a single goroutine
(either at the beginning of tracing and at the end of tracing when
dumping all the symbols and function names) is no longer true.
traceString is used by the annotation apis (NewContext, StartSpan, Log)
to register frequently appearing strings (task and span names, and log
keys) after this change.
NewContext -> one or two records (EvString, EvUserTaskCreate)
end function -> one record (EvUserTaskEnd)
StartSpan -> one or two records (EvString, EvUserSpan)
span end function -> one or two records (EvString, EvUserSpan)
Log -> one or two records (EvString, EvUserLog)
EvUserLog record is of the typical record format written by traceEvent
except that it is followed by bytes that represents the value string.
In addition to runtime/trace change, this change includes
corresponding changes in internal/trace to parse the new record types.
Future work to improve efficiency:
More efficient unique task id generation instead of atomic. (per-P
counter).
Instead of a centralized trace.stringsLock, consider using per-P
string cache or something more efficient.
R=go1.11
Change-Id: Iec9276c6c51e5be441ccd52dec270f1e3b153970
Reviewed-on: https://go-review.googlesource.com/71690
Reviewed-by: Austin Clements <austin@google.com>
|
|
This CL presents the proposed user annotation API skeleton.
This CL bumps up the trace version to 1.11.
Design doc https://goo.gl/iqJfJ3
Implementation CLs are followed.
The API introduces three basic building blocks. Log, Span, and Task.
Log is for basic logging. When called, the message will be recorded
to the trace along with timestamp, goroutine id, and stack info.
trace.Log(ctx, messageType message)
Span can be thought as an extension of log to record interesting
time interval during a goroutine's execution. A span is local to a
goroutine by definition.
trace.WithSpan(ctx, "doVeryExpensiveOp", func(ctx context) {
/* do something very expensive */
})
Task is higher-level concept that aids tracing of complex operations
that encompass multiple goroutines or are asynchronous.
For example, an RPC request, a HTTP request, a file write, or a
batch job can be traced with a Task.
Note we chose to design the API around context.Context so it allows
easier integration with other tracing tools, often designed around
context.Context as well. Log and WithSpan APIs recognize the task
information embedded in the context and record it in the trace as
well. That allows the Go execution tracer to associate and group
the spans and log messages based on the task information.
In order to create a Task,
ctx, end := trace.NewContext(ctx, "myTask")
defer end()
The Go execution tracer measures the time between the task created
and the task ended for the task latency.
More discussion history in golang.org/cl/59572.
Update #16619
R=go1.11
Change-Id: I59a937048294dafd23a75cf1723c6db461b193cd
Reviewed-on: https://go-review.googlesource.com/63274
Reviewed-by: Austin Clements <austin@google.com>
|
|
Since Go1.8, different types of GC mark workers were annotated and the
annotation strings were recorded during StartTrace. This change fixes
two issues around the use of traceString from StartTrace here.
1) "failed to parse trace: no consistent ordering of events possible"
This issue is a result of a missing 'batch' event entry. For efficient
tracing, tracer maintains system allocated buffers and once a buffer
is full, it is Flushed out for writing. Moreover, tracing assumes all
the records in the same buffer (batch) are already ordered and implements
more optimization in encoding and defers the completing order
reconstruction till the trace parsing time. Thus, when a Flush happens
and a new buffer is used, the new buffer should contain an event to
indicate the start of a new batch. Before this CL, the batch entry was
written only by traceEvent only when the buffer position is 0 and
wasn't written when flush occurs during traceString.
This CL fixes it by moving the batch entry write to the traceFlush.
2) crash during tracing due to invalid memory access, or during parsing
due to duplicate string entries
This issue is a result of memory allocation during traceString calls.
Execution tracer traces some memory allocation activities. Before this
CL, traceString took the buffer address (*traceBuf) and mutated the buffer.
If memory tracing occurs in the meantime from the same P, the allocation
tracing (traceEvent) will take the same buffer address through the pointer
to the buffer address (**traceBuf), and mutate the buffer.
As a result, one of the followings can happen:
- the allocation record is overwritten by the following trace string
record (data loss)
- if buffer flush occurs during the allocation tracing, traceString
will attempt to write the string record to the old buffer and
eventually causes invalid memory access crash.
- or flush on the same buffer can occur twice (once from the memory
allocation, and once from the string record write), and in this case
the trace can contain the same data twice and the parse will complain
about duplicate string record entries.
This CL fixes the second issue by making the traceString take
**traceBuf (*traceBufPtr).
Change-Id: I24f629758625b38e1916fbfc7d7be6ea210586af
Reviewed-on: https://go-review.googlesource.com/50873
Run-TryBot: Austin Clements <austin@google.com>
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
allp now has length gomaxprocs, which means none of allp[i] are nil or
in state _Pdead. This lets replace several different styles of loops
over allp with normal range loops.
for i := 0; i < gomaxprocs; i++ { ... } loops can simply range over
allp. Likewise, range loops over allp[:gomaxprocs] can just range over
allp.
Loops that check for p == nil || p.state == _Pdead don't need to check
this any more.
Loops that check for p == nil don't have to check this *if* dead Ps
don't affect them. I checked that all such loops are, in fact,
unaffected by dead Ps. One loop was potentially affected, which this
fixes by zeroing p.gcAssistTime in procresize.
Updates #15131.
Change-Id: Ifa1c2a86ed59892eca0610360a75bb613bc6dcee
Reviewed-on: https://go-review.googlesource.com/45575
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This makes it possible to eliminate the hard cap on GOMAXPROCS.
Updates #15131.
Change-Id: I4c422b340791621584c118a6be1b38e8a44f8b70
Reviewed-on: https://go-review.googlesource.com/45573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Use per-P timers, so each P may work with its own timers.
This CL improves performance on multi-CPU systems
in the following cases:
- When serving high number of concurrent connections
with read/write deadlines set (for instance, highly loaded
net/http server).
- When using high number of concurrent timers. These timers
may be implicitly created via context.WithDeadline
or context.WithTimeout.
Production servers should usually set timeout on connections
and external requests in order to prevent from resource leakage.
See https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/
Below are relevant benchmark results for various GOMAXPROCS values
on linux/amd64:
context package:
name old time/op new time/op delta
WithTimeout/concurrency=40 4.92µs ± 0% 5.17µs ± 1% +5.07% (p=0.000 n=9+9)
WithTimeout/concurrency=4000 6.03µs ± 1% 6.49µs ± 0% +7.63% (p=0.000 n=8+10)
WithTimeout/concurrency=400000 8.58µs ± 7% 9.02µs ± 4% +5.02% (p=0.019 n=10+10)
name old time/op new time/op delta
WithTimeout/concurrency=40-2 3.70µs ± 1% 2.78µs ± 4% -24.90% (p=0.000 n=8+9)
WithTimeout/concurrency=4000-2 4.49µs ± 4% 3.67µs ± 5% -18.26% (p=0.000 n=10+10)
WithTimeout/concurrency=400000-2 6.16µs ±10% 5.15µs ±13% -16.30% (p=0.000 n=10+10)
name old time/op new time/op delta
WithTimeout/concurrency=40-4 3.58µs ± 1% 2.64µs ± 2% -26.13% (p=0.000 n=9+10)
WithTimeout/concurrency=4000-4 4.17µs ± 0% 3.32µs ± 1% -20.36% (p=0.000 n=10+10)
WithTimeout/concurrency=400000-4 5.57µs ± 9% 4.83µs ±10% -13.27% (p=0.001 n=10+10)
time package:
name old time/op new time/op delta
AfterFunc 6.15ms ± 3% 6.07ms ± 2% ~ (p=0.133 n=10+9)
AfterFunc-2 3.43ms ± 1% 3.56ms ± 1% +3.91% (p=0.000 n=10+9)
AfterFunc-4 5.04ms ± 2% 2.36ms ± 0% -53.20% (p=0.000 n=10+9)
After 6.54ms ± 2% 6.49ms ± 3% ~ (p=0.393 n=10+10)
After-2 3.68ms ± 1% 3.87ms ± 0% +5.14% (p=0.000 n=9+9)
After-4 6.66ms ± 1% 2.87ms ± 1% -56.89% (p=0.000 n=10+10)
Stop 698µs ± 2% 689µs ± 1% -1.26% (p=0.011 n=10+10)
Stop-2 729µs ± 2% 434µs ± 3% -40.49% (p=0.000 n=10+10)
Stop-4 837µs ± 3% 333µs ± 2% -60.20% (p=0.000 n=10+10)
SimultaneousAfterFunc 694µs ± 1% 692µs ± 7% ~ (p=0.481 n=10+10)
SimultaneousAfterFunc-2 714µs ± 3% 569µs ± 2% -20.33% (p=0.000 n=10+10)
SimultaneousAfterFunc-4 782µs ± 2% 386µs ± 2% -50.67% (p=0.000 n=10+10)
StartStop 267µs ± 3% 274µs ± 0% +2.64% (p=0.000 n=8+9)
StartStop-2 238µs ± 2% 140µs ± 3% -40.95% (p=0.000 n=10+8)
StartStop-4 320µs ± 1% 125µs ± 1% -61.02% (p=0.000 n=9+9)
Reset 75.0µs ± 1% 77.5µs ± 2% +3.38% (p=0.000 n=10+10)
Reset-2 150µs ± 2% 40µs ± 5% -73.09% (p=0.000 n=10+9)
Reset-4 226µs ± 1% 33µs ± 1% -85.42% (p=0.000 n=10+10)
Sleep 857µs ± 6% 878µs ± 9% ~ (p=0.079 n=10+9)
Sleep-2 617µs ± 4% 585µs ± 2% -5.21% (p=0.000 n=10+10)
Sleep-4 689µs ± 3% 465µs ± 4% -32.53% (p=0.000 n=10+10)
Ticker 55.9ms ± 2% 55.9ms ± 2% ~ (p=0.971 n=10+10)
Ticker-2 28.7ms ± 2% 28.1ms ± 1% -2.06% (p=0.000 n=10+10)
Ticker-4 14.6ms ± 0% 13.6ms ± 1% -6.80% (p=0.000 n=9+10)
Fixes #15133
Change-Id: I6f4b09d2db8c5bec93146db6501b44dbfe5c0ac4
Reviewed-on: https://go-review.googlesource.com/34784
Reviewed-by: Austin Clements <austin@google.com>
|
|
Right now we only kind of sort of trace GC STW events. We emit events
around mark termination, but those start well after stopping the world
and end before starting it again, and we don't emit any events for
sweep termination.
Fix this by generalizing EvGCScanStart/EvGCScanDone. These were
already re-purposed to indicate mark termination (despite the names).
This commit renames them to EvGCSTWStart/EvGCSTWDone, adds an argument
to indicate the STW reason, and shuffles the runtime to generate them
right before stopping the world and right after starting the world,
respectively.
These events will make it possible to generate precise minimum mutator
utilization (MMU) graphs and could be useful in detecting
non-preemptible goroutines (e.g., #20792).
Change-Id: If95783f370781d8ef66addd94886028103a7c26f
Reviewed-on: https://go-review.googlesource.com/55411
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
This extends the GCSweepDone event with counts of swept and reclaimed
bytes. These are useful for understanding the duration and
effectiveness of sweep events.
Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
Reviewed-on: https://go-review.googlesource.com/40811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
|
|
Currently, each individual span sweep emits a span to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants any hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.
Replace this with instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.
This does mean we don't get tracing on every sweep path, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.
Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Now the runtime/trace tests pass with -l=4.
This also gets rid of the frames cache for multiple reasons:
1) The frames cache was used to avoid repeated calls to funcname and
funcline. Now these calls happen inside the CallersFrames iterator.
2) Maintaining a frames cache is harder: map[uintptr]traceFrame
doesn't work since each PC can map to multiple traceFrames.
3) It's not clear that the cache is important.
Change-Id: I2914ac0b3ba08e39b60149d99a98f9f532b35bbb
Reviewed-on: https://go-review.googlesource.com/40591
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Found by github.com/mvdan/unparam.
Change-Id: I20145440ff1bcd27fcf15a740354c52f313e536c
Reviewed-on: https://go-review.googlesource.com/37894
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Currently almost every function that deals with a *_func has to first
look up the *moduledata for the module containing the function's entry
point. This means we almost always do at least two identical module
lookups whenever we deal with a *_func (one to get the *_func and
another to get something from its module data) and sometimes several
more.
Fix this by making findfunc return a new funcInfo type that embeds
*_func, but also includes the *moduledata, and making all of the
functions that currently take a *_func instead take a funcInfo and use
the already-found *moduledata.
This transformation is trivial for the most part, since the *_func
type is usually inferred. The annoying part is that we can no longer
use nil to indicate failure, so this introduces a funcInfo.valid()
method and replaces nil checks with calls to valid.
Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f
Reviewed-on: https://go-review.googlesource.com/37331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
|
|
Add new starvation mode for Mutex.
In starvation mode ownership is directly handed off from
unlocking goroutine to the next waiter. New arriving goroutines
don't compete for ownership.
Unfair wait time is now limited to 1ms.
Also fix a long standing bug that goroutines were requeued
at the tail of the wait queue. That lead to even more unfair
acquisition times with multiple waiters.
Performance of normal mode is not considerably affected.
Fixes #13086
On the provided in the issue lockskew program:
done in 1.207853ms
done in 1.177451ms
done in 1.184168ms
done in 1.198633ms
done in 1.185797ms
done in 1.182502ms
done in 1.316485ms
done in 1.211611ms
done in 1.182418ms
name old time/op new time/op delta
MutexUncontended-48 0.65ns ± 0% 0.65ns ± 1% ~ (p=0.087 n=10+10)
Mutex-48 112ns ± 1% 114ns ± 1% +1.69% (p=0.000 n=10+10)
MutexSlack-48 113ns ± 0% 87ns ± 1% -22.65% (p=0.000 n=8+10)
MutexWork-48 149ns ± 0% 145ns ± 0% -2.48% (p=0.000 n=9+10)
MutexWorkSlack-48 149ns ± 0% 122ns ± 3% -18.26% (p=0.000 n=6+10)
MutexNoSpin-48 103ns ± 4% 105ns ± 3% ~ (p=0.089 n=10+10)
MutexSpin-48 490ns ± 4% 515ns ± 6% +5.08% (p=0.006 n=10+10)
Cond32-48 13.4µs ± 6% 13.1µs ± 5% -2.75% (p=0.023 n=10+10)
RWMutexWrite100-48 53.2ns ± 3% 41.2ns ± 3% -22.57% (p=0.000 n=10+10)
RWMutexWrite10-48 45.9ns ± 2% 43.9ns ± 2% -4.38% (p=0.000 n=10+10)
RWMutexWorkWrite100-48 122ns ± 2% 134ns ± 1% +9.92% (p=0.000 n=10+10)
RWMutexWorkWrite10-48 206ns ± 1% 188ns ± 1% -8.52% (p=0.000 n=8+10)
Cond32-24 12.1µs ± 3% 12.4µs ± 3% +1.98% (p=0.043 n=10+9)
MutexUncontended-24 0.74ns ± 1% 0.75ns ± 1% ~ (p=0.650 n=10+10)
Mutex-24 122ns ± 2% 124ns ± 1% +1.31% (p=0.007 n=10+10)
MutexSlack-24 96.9ns ± 2% 102.8ns ± 2% +6.11% (p=0.000 n=10+10)
MutexWork-24 146ns ± 1% 135ns ± 2% -7.70% (p=0.000 n=10+9)
MutexWorkSlack-24 135ns ± 1% 128ns ± 2% -5.01% (p=0.000 n=10+9)
MutexNoSpin-24 114ns ± 3% 110ns ± 4% -3.84% (p=0.000 n=10+10)
MutexSpin-24 482ns ± 4% 475ns ± 8% ~ (p=0.286 n=10+10)
RWMutexWrite100-24 43.0ns ± 3% 43.1ns ± 2% ~ (p=0.956 n=10+10)
RWMutexWrite10-24 43.4ns ± 1% 43.2ns ± 1% ~ (p=0.085 n=10+9)
RWMutexWorkWrite100-24 130ns ± 3% 131ns ± 3% ~ (p=0.747 n=10+10)
RWMutexWorkWrite10-24 191ns ± 1% 192ns ± 1% ~ (p=0.210 n=10+10)
Cond32-12 11.5µs ± 2% 11.7µs ± 2% +1.98% (p=0.002 n=10+10)
MutexUncontended-12 1.48ns ± 0% 1.50ns ± 1% +1.08% (p=0.004 n=10+10)
Mutex-12 141ns ± 1% 143ns ± 1% +1.63% (p=0.000 n=10+10)
MutexSlack-12 121ns ± 0% 119ns ± 0% -1.65% (p=0.001 n=8+9)
MutexWork-12 141ns ± 2% 150ns ± 3% +6.36% (p=0.000 n=9+10)
MutexWorkSlack-12 131ns ± 0% 138ns ± 0% +5.73% (p=0.000 n=9+10)
MutexNoSpin-12 87.0ns ± 1% 83.7ns ± 1% -3.80% (p=0.000 n=10+10)
MutexSpin-12 364ns ± 1% 377ns ± 1% +3.77% (p=0.000 n=10+10)
RWMutexWrite100-12 42.8ns ± 1% 43.9ns ± 1% +2.41% (p=0.000 n=8+10)
RWMutexWrite10-12 39.8ns ± 4% 39.3ns ± 1% ~ (p=0.433 n=10+9)
RWMutexWorkWrite100-12 131ns ± 1% 131ns ± 0% ~ (p=0.591 n=10+9)
RWMutexWorkWrite10-12 173ns ± 1% 174ns ± 0% ~ (p=0.059 n=10+8)
Cond32-6 10.9µs ± 2% 10.9µs ± 2% ~ (p=0.739 n=10+10)
MutexUncontended-6 2.97ns ± 0% 2.97ns ± 0% ~ (all samples are equal)
Mutex-6 122ns ± 6% 122ns ± 2% ~ (p=0.668 n=10+10)
MutexSlack-6 149ns ± 3% 142ns ± 3% -4.63% (p=0.000 n=10+10)
MutexWork-6 136ns ± 3% 140ns ± 5% ~ (p=0.077 n=10+10)
MutexWorkSlack-6 152ns ± 0% 138ns ± 2% -9.21% (p=0.000 n=6+10)
MutexNoSpin-6 150ns ± 1% 152ns ± 0% +1.50% (p=0.000 n=8+10)
MutexSpin-6 726ns ± 0% 730ns ± 1% ~ (p=0.069 n=10+10)
RWMutexWrite100-6 40.6ns ± 1% 40.9ns ± 1% +0.91% (p=0.001 n=8+10)
RWMutexWrite10-6 37.1ns ± 0% 37.0ns ± 1% ~ (p=0.386 n=9+10)
RWMutexWorkWrite100-6 133ns ± 1% 134ns ± 1% +1.01% (p=0.005 n=9+10)
RWMutexWorkWrite10-6 152ns ± 0% 152ns ± 0% ~ (all samples are equal)
Cond32-2 7.86µs ± 2% 7.95µs ± 2% +1.10% (p=0.023 n=10+10)
MutexUncontended-2 8.10ns ± 0% 9.11ns ± 4% +12.44% (p=0.000 n=9+10)
Mutex-2 32.9ns ± 9% 38.4ns ± 6% +16.58% (p=0.000 n=10+10)
MutexSlack-2 93.4ns ± 1% 98.5ns ± 2% +5.39% (p=0.000 n=10+9)
MutexWork-2 40.8ns ± 3% 43.8ns ± 7% +7.38% (p=0.000 n=10+9)
MutexWorkSlack-2 98.6ns ± 5% 108.2ns ± 2% +9.80% (p=0.000 n=10+8)
MutexNoSpin-2 399ns ± 1% 398ns ± 2% ~ (p=0.463 n=8+9)
MutexSpin-2 1.99µs ± 3% 1.97µs ± 1% -0.81% (p=0.003 n=9+8)
RWMutexWrite100-2 37.6ns ± 5% 46.0ns ± 4% +22.17% (p=0.000 n=10+8)
RWMutexWrite10-2 50.1ns ± 6% 36.8ns ±12% -26.46% (p=0.000 n=9+10)
RWMutexWorkWrite100-2 136ns ± 0% 134ns ± 2% -1.80% (p=0.001 n=7+9)
RWMutexWorkWrite10-2 140ns ± 1% 138ns ± 1% -1.50% (p=0.000 n=10+10)
Cond32 5.93µs ± 1% 5.91µs ± 0% ~ (p=0.411 n=9+10)
MutexUncontended 15.9ns ± 0% 15.8ns ± 0% -0.63% (p=0.000 n=8+8)
Mutex 15.9ns ± 0% 15.8ns ± 0% -0.44% (p=0.003 n=10+10)
MutexSlack 26.9ns ± 3% 26.7ns ± 2% ~ (p=0.084 n=10+10)
MutexWork 47.8ns ± 0% 47.9ns ± 0% +0.21% (p=0.014 n=9+8)
MutexWorkSlack 54.9ns ± 3% 54.5ns ± 3% ~ (p=0.254 n=10+10)
MutexNoSpin 786ns ± 2% 765ns ± 1% -2.66% (p=0.000 n=10+10)
MutexSpin 3.87µs ± 1% 3.83µs ± 0% -0.85% (p=0.005 n=9+8)
RWMutexWrite100 21.2ns ± 2% 21.0ns ± 1% -0.88% (p=0.018 n=10+9)
RWMutexWrite10 22.6ns ± 1% 22.6ns ± 0% ~ (p=0.471 n=9+9)
RWMutexWorkWrite100 132ns ± 0% 132ns ± 0% ~ (all samples are equal)
RWMutexWorkWrite10 124ns ± 0% 123ns ± 0% ~ (p=0.656 n=10+10)
Change-Id: I66412a3a0980df1233ad7a5a0cd9723b4274528b
Reviewed-on: https://go-review.googlesource.com/34310
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.
Updates #17503.
Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
During the mark phase of garbage collection, goroutines that allocate
may be recruited to assist. This change creates trace events for mark
assists and displays them similarly to sweep assists in the trace
viewer.
Mark assists are different than sweeps in that they can be preempted, so
displaying them in the trace viewer is a little tricky -- we may need to
synthesize multiple slices for one mark assist. This could have been
done in the parser instead, but I thought it might be preferable to keep
the parser as true to the event stream as possible.
Change-Id: I381dcb1027a187a354b1858537851fa68a620ea7
Reviewed-on: https://go-review.googlesource.com/36015
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
|
|
Currently when a goroutine blocks on a GC assist, it emits a generic
EvGoBlock event. Since assist blocking events and, in particular, the
length of the blocked assist queue, are important for diagnosing GC
behavior, this commit adds a new EvGoBlockGC event for blocking on a
GC assist. The trace viewer uses this event to report a "waiting on
GC" count in the "Goroutines" row. This makes sense because, unlike
other blocked goroutines, these goroutines do have work to do, so
being blocked on a GC assist is quite similar to being in the
"runnable" state, which we also report in the trace viewer.
Change-Id: Ic21a326992606b121ea3d3d00110d8d1fdc7a5ef
Reviewed-on: https://go-review.googlesource.com/30704
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
Currently mark workers are shown in the trace as regular goroutines
labeled "runtime.gcBgMarkWorker". That's somewhat unhelpful to an end
user because of the opaque label and particularly unhelpful to runtime
developers because it doesn't distinguish the different types of mark
workers.
Fix this by introducing a variant of the GoStart event called
GoStartLabel that lets the runtime indicate a label for a goroutine
execution span and using this to label mark worker executions as "GC
(<mode>)" in the trace viewer.
Since this bumps the trace version to 1.8, we also add test data for
1.7 traces.
Change-Id: Id7b9c0536508430c661ffb9e40e436f3901ca121
Reviewed-on: https://go-review.googlesource.com/30702
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack
traces of goroutines holding contended mutexes if n > 0. From runtime/pprof,
pprot.Lookup("mutex").WriteTo writes the accumulated
stack traces to w (in essentially the same format that blocking
profiling uses).
Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
Reviewed-on: https://go-review.googlesource.com/29650
Reviewed-by: Austin Clements <austin@google.com>
|
|
trace's reader *g is going to cause write barriers in unfortunate
places, so replace it with a guintptr.
Change-Id: Ie8fb13bb89a78238f9d2a77ec77da703e96df8af
Reviewed-on: https://go-review.googlesource.com/31469
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.
Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Currently these are labeled "MARK", which was accurate in the STW
collector, but these really indicate mark termination now, since
marking happens for the full duration of the concurrent GC. Re-label
them as "MARK TERMINATION" to clarify this.
Change-Id: Ie98bd961195acde49598b4fa3f9e7d90d757c0a6
Reviewed-on: https://go-review.googlesource.com/30018
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
When GC is disabled, we set gcpercent to -1. However, we still use
gcpercent to compute several values, such as next_gc and gc_trigger.
These calculations are meaningless when gcpercent is -1 and result in
meaningless values. This is okay in a sense because we also never use
these values if gcpercent is -1, but they're confusing when exposed to
the user, for example via MemStats or the execution trace. It's
particularly unfortunate in the execution trace because it attempts to
plot the underflowed value of next_gc, which scales all useful
information in the heap row into oblivion.
Fix this by making next_gc ^0 when gcpercent < 0. This has the
advantage of being true in a way: next_gc is effectively infinite when
gcpercent < 0. We can also detect this special value when updating the
execution trace and report next_gc as 0 so it doesn't blow up the
display of the heap line.
Change-Id: I4f366e4451f8892a4908da7b2b6086bdc67ca9a9
Reviewed-on: https://go-review.googlesource.com/30016
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
We reset global buffer only if its pos != 0.
We ought to do it always, but queue it only if pos != 0.
This is a latent bug. Currently it does not fire because
whenever we create a global buffer, we increment pos.
Change-Id: I01e28ae88ce9a5412497c524391b8b7cb443ffd9
Reviewed-on: https://go-review.googlesource.com/25574
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
In StartTrace we emit EvGoCreate for all existing goroutines.
This includes stack unwind to obtain current stack.
Real Go programs can contain hundreds of thousands of blocked goroutines.
For such programs StartTrace can take up to a second (few ms per goroutine).
Obtain current stack ID once and use it for all EvGoCreate events.
This speeds up StartTrace with 10K blocked goroutines from 20ms to 4 ms
(win for StartTrace called from net/http/pprof hander will be bigger
as stack is deeper).
Change-Id: I9e5ff9468331a840f8fdcdd56c5018c2cfde61fc
Reviewed-on: https://go-review.googlesource.com/25573
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
|
|
Currently tracer uses global sequencer and it introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with per-goroutine sequencer.
If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.
On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)
Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.
Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases broken timestamps were detected and no test failures.
Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: I037101b1921fe151695d32e9874b50dd64982298
Reviewed-on: https://go-review.googlesource.com/22314
Reviewed-by: Austin Clements <austin@google.com>
|
|
1. Parse out version from trace header.
2. Restore handling of 1.5 traces.
3. Restore optional symbolization of traces.
4. Add some canned 1.5 traces for regression testing
(http benchmark trace, runtime/trace stress traces,
plus one with broken timestamps).
Change-Id: Idb18a001d03ded8e13c2730eeeb37c5836e31256
Reviewed-on: https://go-review.googlesource.com/21803
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
This makes traces self-contained and simplifies trace workflow
in modern cloud environments where it is simpler to reach
a service via HTTP than to obtain the binary.
Change-Id: I6ff3ca694dc698270f1e29da37d5efaf4e843a0d
Reviewed-on: https://go-review.googlesource.com/21732
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
|
|
traceEvent records system call events after a G has already entered
_Gsyscall, which means the garbage collector could be installing stack
barriers in the G's stack during the traceEvent. If traceEvent
attempts to capture the user stack during this, it may observe a
inconsistent stack barriers and panic. Fix this by acquiring the stack
lock around the stack walk in traceEvent.
Fixes #14101.
Change-Id: I15f0ab0c70c04c6e182221f65a6f761c5a896459
Reviewed-on: https://go-review.googlesource.com/18973
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This replaces *traceAllocBlock with traceAllocBlockPtr.
Updates #10600.
Change-Id: I94a20d90f04cca7c457b29062427748e315e4857
Reviewed-on: https://go-review.googlesource.com/17004
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
runtime/internal/sys will hold system-, architecture- and config-
specific constants.
Updates #11647
Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Currently traceBuf keeps track of where it is in the trace buffer by
also maintaining a slice that points in to this buffer with an initial
length of 0 and a cap of the length of the array. All writes to this
buffer are done by appending to the slice (as long as the bounds
checks are right, it will never overflow and the append won't allocate
a new slice).
Each of these appends generates a write barrier. As long as we never
overflow the buffer, this write barrier won't fire, but this wreaks
havoc with eliminating write barriers from the tracing code. If we
were to overflow the buffer, this would both allocate and invoke a
write barrier, both things that are dicey at best to do in many of the
contexts tracing happens. It also wastes space in the traceBuf and
leads to more complex code and more complex generated code.
Replace this slice trick with keeping track of a simple array
position.
Updates #10600.
Change-Id: I0a63eecec1992e195449f414ed47653f66318d0e
Reviewed-on: https://go-review.googlesource.com/16814
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
This replaces *traceStack with traceStackPtr, much like the preceding
commit.
Updates #10600.
Change-Id: Ifadc35eb37a405ae877f9740151fb31a0ca1d08f
Reviewed-on: https://go-review.googlesource.com/16813
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
The tracing code is currently called from contexts such as sysmon and
the scheduler where write barriers are not allowed. Unfortunately,
while the common paths through the tracing code do not have write
barriers, many of the less common paths dealing with buffer overflow
and recycling do.
This change replaces all *traceBufs with traceBufPtrs. In the style of
guintptr, etc., the GC does not trace traceBufPtrs and write barriers
do not apply when these pointers are written. Since traceBufs are
allocated from non-GC'd memory and manually managed, this is always
safe.
Updates #10600.
Change-Id: I52b992d36d1b634ebd855c8cde27947ec14f59ba
Reviewed-on: https://go-review.googlesource.com/16812
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.
all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.
Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This moves all of GC initialization, sweep termination, and the
transition to concurrent marking in to the off->mark transition
function. This means it's now handled on the goroutine that detected
the state exit condition.
As a result, malloc no longer needs to Gosched() at the beginning of
the GC cycle to prevent over-allocation while the GC is starting up
because it will now *help* the GC to start up. The Gosched hack is
still necessary during GC shutdown (this is easy to test by enabling
gctrace and hitting Ctrl-S to block the gctrace output).
At this point, the GC coordinator still handles later phases. This
requires a small tweak to how we start the GC coordinator. Currently,
starting the GC coordinator is best-effort and may fail if the
coordinator is about to park from the previous cycle but hasn't yet.
We fix this by replacing the park/ready to wake up the coordinator
with a semaphore. This is temporary since the coordinator will be
going away in a few commits.
Updates #11970.
Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe
Reviewed-on: https://go-review.googlesource.com/16357
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This begins the conversion of the centralized GC coordinator to a
decentralized state machine by introducing the internal API that
triggers the first state transition from _GCoff to _GCmark (or
_GCmarktermination).
This change introduces the transition lock, the off->mark transition
condition (which is very similar to shouldtriggergc()), and the
general structure of a state transition. Since we're doing this
conversion in stages, it then falls back to the GC coordinator to
actually execute the cycle. We'll start moving logic out of the GC
coordinator and in to transition functions next.
This fixes a minor bug in gcstoptheworld debug mode where passing the
heap trigger once could trigger multiple STW GCs.
Updates #11970.
Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f
Reviewed-on: https://go-review.googlesource.com/16355
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
|
|
The placement and invocation of traceGoSysCall when using
entersyscallblock() instead of entersyscall() differs enough that the
TestTraceSymbolize test can fail on some platforms.
This change moves the invocation of traceGoSysCall for entersyscall() so
that the same number of "frames to skip" are present in the trace as when
entersyscallblock() is used ensuring system call traces remain identical
regardless of internal implementation choices.
Fixes golang/go#12056
Change-Id: I8361e91aa3708f5053f98263dfe9feb8c5d1d969
Reviewed-on: https://go-review.googlesource.com/13861
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
The call to hash the trace stack reversed the "seed" and "size"
arguments to memhash and, hence, always called memhash with a 0 size,
which dutifully returned a hash value that depended only on the number
of PCs in the stack and not their values. As a result, all stacks were
put in to a very subset of the 8,192 buckets.
Fix this by passing these arguments in the correct order.
Change-Id: I67cd29312f5615c7ffa23e205008dd72c6b8af62
Reviewed-on: https://go-review.googlesource.com/13613
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
- different cores might not have exactly synchronized clocks
- VMs are worse than real hardware
- non-x86 chips have different timer resolution than x86 chips
- on fast systems two events can end up with the same time stamp
Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.
The trace still contains time stamps, of course. The sequence number
is just used for ordering.
Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.
Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.
Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:
name old time/op new time/op delta
ClientServerParallel4 50.4µs ± 4% 80.2µs ± 4% +59.06% (p=0.000 n=10+10)
ClientServerParallel4-2 33.1µs ± 7% 57.8µs ± 5% +74.53% (p=0.000 n=10+10)
ClientServerParallel4-4 18.5µs ± 4% 32.6µs ± 3% +75.77% (p=0.000 n=10+10)
ClientServerParallel4-6 12.9µs ± 5% 24.4µs ± 2% +89.33% (p=0.000 n=10+10)
ClientServerParallel4-8 11.4µs ± 6% 21.0µs ± 3% +83.40% (p=0.000 n=10+10)
ClientServerParallel4-12 14.4µs ± 4% 23.8µs ± 4% +65.67% (p=0.000 n=10+10)
Fixes #10512.
Change-Id: I173eecf8191e86feefd728a5aad25bf1bc094b12
Reviewed-on: https://go-review.googlesource.com/12579
Reviewed-by: Austin Clements <austin@google.com>
|
|
ae1ea2a moved trace-related functions from runtime/pprof to
runtime/trace, but missed a doc comment and a code comment. Update
these to reflect the move.
Change-Id: I6e1e8861e5ede465c08a2e3f80b976145a8b32d8
Reviewed-on: https://go-review.googlesource.com/12525
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|