path: root/src/pkg/runtime/runtime.h
Age        Commit message        Author
2014-06-30  undo CL 104200047 / 318b04f28372  (Keith Randall)
Breaks windows and race detector.
TBR=rsc

««« original CL description
runtime: stack allocator, separate from mallocgc

In order to move malloc to Go, we need to have a separate stack allocator. If we run out of stack during malloc, malloc will not be available to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the GC heap. Once they are out, we can get rid of the distinction between the allocated/blockboundary bits. (This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»

TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
2014-06-30  runtime: stack allocator, separate from mallocgc  (Keith Randall)
In order to move malloc to Go, we need to have a separate stack allocator. If we run out of stack during malloc, malloc will not be available to allocate a new stack. Stacks are the last remaining FlagNoGC objects in the GC heap. Once they are out, we can get rid of the distinction between the allocated/blockboundary bits. (This will be in a separate change.) Fixes #7468 Fixes #7424 LGTM=rsc, dvyukov R=golang-codereviews, dvyukov, khr, dave, rsc CC=golang-codereviews https://golang.org/cl/104200047
2014-06-26  all: remove 'extern register M *m' from runtime  (Russ Cox)
The runtime has historically held two dedicated values g (current goroutine) and m (current thread) in 'extern register' slots (TLS on x86, real registers backed by TLS on ARM). This CL removes the extern register m; code now uses g->m. On ARM, this frees up the register that formerly held m (R9). This is important for NaCl, because NaCl ARM code cannot use R9 at all.

The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:

BenchmarkBinaryTree17          5491374955  5471024381  -0.37%
BenchmarkFannkuch11            4357101311  4275174828  -1.88%
BenchmarkGobDecode               11029957    11364184  +3.03%
BenchmarkGobEncode                6852205     6784822  -0.98%
BenchmarkGzip                   650795967   650152275  -0.10%
BenchmarkGunzip                 140962363   141041670  +0.06%
BenchmarkHTTPClientServer           71581       73081  +2.10%
BenchmarkJSONEncode              31928079    31913356  -0.05%
BenchmarkJSONDecode             117470065   113689916  -3.22%
BenchmarkMandelbrot200            6008923     5998712  -0.17%
BenchmarkGoParse                  6310917     6327487  +0.26%
BenchmarkRegexpMatchMedium_1K      114568      114763  +0.17%
BenchmarkRegexpMatchHard_1K        168977      169244  +0.16%
BenchmarkRevcomp                935294971   914060918  -2.27%
BenchmarkTemplate               145917123   148186096  +1.55%

Minux previously reported larger variations, but these were caused by run-to-run noise, not repeatable slowdowns.

Actual code changes by Minux. I only did the docs and the benchmarking.

LGTM=dvyukov, iant, minux
R=minux, josharian, iant, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/109050043
2014-05-31  runtime: make continuation pc available to stack walk  (Russ Cox)
The 'continuation pc' is where the frame will continue execution, if anywhere. For a frame that stopped execution due to a CALL instruction, the continuation pc is immediately after the CALL. But for a frame that stopped execution due to a fault, the continuation pc is the pc after the most recent CALL to deferproc in that frame, or else 0. That is where execution will continue, if anywhere. The liveness information is only recorded for CALL instructions. This change makes sure that we never look for liveness information except for CALL instructions. Using a valid PC fixes crashes when a garbage collection or stack copying tries to process a stack frame that has faulted. Record continuation pc in heapdump (format change). Fixes #8048. LGTM=iant, khr R=khr, iant, dvyukov CC=golang-codereviews, r https://golang.org/cl/100870044
2014-05-31  runtime: fix empty heap dump bug on windows.  (Shenghou Ma)
Fixes #8119. LGTM=khr, rsc R=alex.brainman, khr, bradfitz, rsc CC=golang-codereviews https://golang.org/cl/93640044
2014-05-13  runtime: fix triggering of forced GC  (Dmitriy Vyukov)
mstats.last_gc is now Unix time, but it is compared with abstract monotonic time. On my machine GC is forced every 5 minutes regardless of last_gc.

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
2014-04-15  liblink: introduce TLS register on 386 and amd64  (Russ Cox)
When I did the original 386 ports on Linux and OS X, I chose to define GS-relative expressions like 4(GS) as relative to the actual thread-local storage base, which was usually GS but might not be (it might be FS, or it might be a different constant offset from GS or FS).

The original scope was limited but since then the rewrites have gotten out of control. Sometimes GS is rewritten, sometimes FS. Some ports do other rewrites to enable shared libraries and other linking. At no point in the code is it clear whether you are looking at the real GS/FS or some synthesized thing that will be rewritten. The code manipulating all these is duplicated in many places.

The first step to fixing issue 7719 is to make the code intelligible again.

This CL adds an explicit TLS pseudo-register to the 386 and amd64. As a register, TLS refers to the thread-local storage base, and it can only be loaded into another register:

    MOVQ TLS, AX

An offset from the thread-local storage base is written off(reg)(TLS*1). Semantically it is off(reg), but the (TLS*1) annotation marks this as indexing from the loaded TLS base. This emits a relocation so that if the linker needs to adjust the offset, it can. For example:

    MOVQ TLS, AX
    MOVQ 8(AX)(TLS*1), CX // load m into CX

On systems that support direct access to the TLS memory, this pair of instructions can be reduced to a direct TLS memory reference:

    MOVQ 8(TLS), CX // load m into CX

The 2-instruction and 1-instruction forms correspond roughly to ELF TLS initial exec mode and ELF TLS local exec mode, respectively. Liblink applies this rewrite on systems that support the 1-instruction form. The decision is made using only the operating system (and probably the -shared flag, eventually), not the link mode. If some link modes on a particular operating system require the 2-instruction form, then all builds for that operating system will use the 2-instruction form, so that the link mode decision can be delayed to link time.

Obviously it is late to be making changes like this, but I despair of correcting issue 7719 and issue 7164 without it.

To make sure I am not changing existing behavior, I built a "hello world" program for every GOOS/GOARCH combination we have and then worked to make sure that the rewrite generates exactly the same binaries, byte for byte. There are a handful of TODOs in the code marking kludges to get the byte-for-byte property, but at least now I can explain exactly how each binary is handled. The targets I tested this way are:

    darwin-386 darwin-amd64
    dragonfly-386 dragonfly-amd64
    freebsd-386 freebsd-amd64 freebsd-arm
    linux-386 linux-amd64 linux-arm
    nacl-386 nacl-amd64p32
    netbsd-386 netbsd-amd64
    openbsd-386 openbsd-amd64
    plan9-386 plan9-amd64
    solaris-amd64
    windows-386 windows-amd64

There were four exceptions to the byte-for-byte goal: windows-386 and windows-amd64 have a time stamp at bytes 137 and 138 of the header. darwin-386 and plan9-386 have five or six modified bytes in the middle of the Go symbol table, caused by editing comments in runtime/sys_{darwin,plan9}_386.s.

Fixes #7164.

LGTM=iant
R=iant, aram, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/87920043
2014-04-08  reflect, runtime: fix crash in GC due to reflect.call + precise GC  (Russ Cox)
Given

    type Outer struct {
        *Inner
        ...
    }

the compiler generates the implementation of (*Outer).M dispatching to the embedded Inner. The implementation is logically:

    func (p *Outer) M() {
        (p.Inner).M()
    }

but since the only change here is the replacement of one pointer receiver with another, the actual generated code overwrites the original receiver with the p.Inner pointer and then jumps to the M method expecting the *Inner receiver.

During reflect.Value.Call, we create an argument frame and the associated data structures to describe it to the garbage collector, populate the frame, call reflect.call to run a function call using that frame, and then copy the results back out of the frame. The reflect.call function does a memmove of the frame structure onto the stack (to set up the inputs), runs the call, and then memmoves the stack back to the frame structure (to preserve the outputs).

Originally reflect.call did not distinguish inputs from outputs: both memmoves were for the full stack frame. However, in the case where the called function was one of these wrappers, the rewritten receiver is almost certainly a different type than the original receiver. This is not a problem on the stack, where we use the program counter to determine the type information and understand that during (*Outer).M the receiver is an *Outer while during (*Inner).M the receiver in the same memory word is now an *Inner. But in the statically typed argument frame created by reflect, the receiver is always an *Outer. Copying the modified receiver pointer off the stack into the frame will store an *Inner there, and then if a garbage collection happens to scan that argument frame before it is discarded, it will scan the *Inner memory as if it were an *Outer. If the two have different memory layouts, the collection will interpret the memory incorrectly.

Fix by only copying back the results.

Fixes #7725.

LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
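For readers unfamiliar with the shape of code involved, here is a minimal, runnable Go illustration of an embedded-pointer method promotion being invoked through reflect. The type and field names are invented for the example; the historical crash depended on GC timing and differing layouts and is not reproduced here.

    package main

    import (
        "fmt"
        "reflect"
    )

    type Inner struct{ x int }

    func (p *Inner) M() int { return p.x }

    // Outer embeds *Inner, so the compiler generates a (*Outer).M wrapper that
    // rewrites the receiver from *Outer to p.Inner before jumping to (*Inner).M.
    type Outer struct {
        *Inner
        extra [4]string // give Outer a different layout than Inner
    }

    func main() {
        o := &Outer{Inner: &Inner{x: 42}}
        // reflect.Value.Call builds an argument frame typed as (*Outer) for the
        // wrapper; the fix described above copies only the results back out.
        m := reflect.ValueOf(o).MethodByName("M")
        out := m.Call(nil)
        fmt.Println(out[0].Int()) // 42
    }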
2014-04-03  runtime: fix fault during arm software floating point  (Russ Cox)
The software floating point runs with m->locks++ to avoid being preempted; recognize this case in panic and undo it so that m->locks is maintained correctly when panicking. Fixes #7553. LGTM=dvyukov R=golang-codereviews, dvyukov CC=golang-codereviews https://golang.org/cl/84030043
2014-03-27  runtime: enable 'bad pointer' check during garbage collection of Go stack frames  (Russ Cox)
This is the same check we use during stack copying. The check cannot be applied to C stack frames, even though we do emit pointer bitmaps for the arguments, because (1) the pointer bitmaps assume all arguments are always live, not true of outputs during the prologue, and (2) the pointer bitmaps encode interface values as pointer pairs, not true of interfaces holding integers. For the rest of the frames, however, we should hold ourselves to the rule that a pointer marked live really is initialized. The interface scanning already implicitly checks this because it interprets the type word as a valid type pointer. This may slow things down a little because of the extra loads. Or it may speed things up because we don't bother enqueuing nil pointers anymore. Enough of the rest of the system is slow right now that we can't measure it meaningfully. Enable for now, even if it is slow, to shake out bugs in the liveness bitmaps, and then decide whether to turn it off for the Go 1.3 release (issue 7650 reminds us to do this). The new m->traceback field lets us force printing of fp= values on all goroutine stack traces when we detect a bad pointer. This makes it easier to understand exactly where in the frame the bad pointer is, so that we can trace it back to a specific variable and determine what is wrong. Update #7650 LGTM=khr R=khr CC=golang-codereviews https://golang.org/cl/80860044
2014-03-25  runtime: WriteHeapDump dumps the heap to a file.  (Keith Randall)
See http://golang.org/s/go13heapdump for the file format. LGTM=rsc R=rsc, bradfitz, dvyukov, khr CC=golang-codereviews https://golang.org/cl/37540043
2014-03-25  runtime: redo stack map entries to avoid false retention  (Keith Randall)
Change two-bit stack map entries to encode:

    0 = dead
    1 = scalar
    2 = pointer
    3 = multiword

If multiword, the two-bit entry for the following word encodes:

    0 = string
    1 = slice
    2 = iface
    3 = eface

That way, during stack scanning we can check if a string is zero length or a slice has zero capacity. We can avoid following the contained pointer in those cases. It is safe to do so because it can never be dereferenced, and it is desirable to do so because it may cause false retention of the following block in memory.

Slice feature turned off until issue 7564 is fixed.

Update #7549

LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043
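For reference, the encoding above can be sketched as Go constants. The names here are illustrative (the code of this era declared the values in C headers), but the numeric assignments follow the list in the commit message.

    package gcbits

    // Two-bit stack map entries, one per word of a stack frame.
    const (
        BitsDead      = 0 // word is dead; the GC can ignore it
        BitsScalar    = 1 // word holds a non-pointer value
        BitsPointer   = 2 // word holds a pointer that must be followed
        BitsMultiWord = 3 // first word of a multi-word value; next entry says which kind
    )

    // When an entry is BitsMultiWord, the entry for the following word encodes the kind.
    const (
        BitsString = 0 // string header: skip the data pointer when len == 0
        BitsSlice  = 1 // slice header: skip the data pointer when cap == 0
        BitsIface  = 2 // non-empty interface value
        BitsEface  = 3 // empty interface value
    )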
2014-03-24  runtime: use VEH, not SEH, for windows/386 exception handling  (Russ Cox)
Structured Exception Handling (SEH) was the first way to handle exceptions (memory faults, divides by zero) on Windows. The S might as well stand for "stack-based": the implementation interprets stack addresses in a few different ways, and it gets subtly confused by Go's management of stacks. It's also something that requires active maintenance during cgo switches, and we've had bugs in that maintenance in the past. We have recently come to believe that SEH cannot work with Go's stack usage. See http://golang.org/issue/7325 for details. Vectored Exception Handling (VEH) is more like a Unix signal handler: you set it once for the whole process and forget about it. This CL drops all the SEH code and replaces it with VEH code. Many special cases and 7 #ifdefs disappear. VEH was introduced in Windows XP, so Go on windows/386 will now require Windows XP or later. The previous requirement was Windows 2000 or later. Windows 2000 immediately preceded Windows XP, so Windows 2000 is the only affected version. Microsoft stopped supporting Windows 2000 in 2010. See http://golang.org/s/win2000-golang-nuts for details. Fixes #7325. LGTM=alex.brainman, r R=golang-codereviews, alex.brainman, stephen.gutekanst, dave CC=golang-codereviews, iant, r https://golang.org/cl/74790043
2014-03-14  runtime: fix use after close race in Solaris network poller  (Aram Hăvărneanu)
The Solaris network poller uses event ports, which are level-triggered. As such, it has to re-arm itself after each wakeup. The arming mechanism (which runs in its own thread) raced with the closing of a file descriptor happening in a different thread. When a network file descriptor is about to be closed, the network poller is awaken to give it a chance to remove its association with the file descriptor. Because the poller always re-armed itself, it raced with code that closed the descriptor. This change makes the network poller check before re-arming if the file descriptor is about to be closed, in which case it will ignore the re-arming request. It uses the per-PollDesc lock in order to serialize access to the PollDesc. This change also adds extensive documentation describing the Solaris implementation of the network poller. Fixes #7410. LGTM=dvyukov, iant R=golang-codereviews, bradfitz, iant, dvyukov, aram.h, gobot CC=golang-codereviews https://golang.org/cl/69190044
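A minimal sketch of the check-before-re-arm pattern described above, using ordinary Go types. The struct and field names are simplified stand-ins, not the actual runtime netpoll structures.

    package netpollsketch

    import "sync"

    // pollDesc is a simplified stand-in for the runtime's per-descriptor state.
    type pollDesc struct {
        mu      sync.Mutex
        closing bool
        fd      int
    }

    // arm re-associates fd with the event port unless the descriptor is being closed.
    func (pd *pollDesc) arm(associate func(fd int)) {
        pd.mu.Lock()
        defer pd.mu.Unlock()
        if pd.closing {
            return // close in progress: ignore the re-arm request
        }
        associate(pd.fd)
    }

    // beginClose marks the descriptor as closing so concurrent re-arms are dropped.
    func (pd *pollDesc) beginClose() {
        pd.mu.Lock()
        pd.closing = true
        pd.mu.Unlock()
    }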
2014-03-13  runtime: fix signal handling on Plan 9  (Anthony Martin)
LGTM=rsc R=rsc, 0intro, aram, jeremyjackins, iant CC=golang-codereviews, lucio.dere, minux.ma, paurea, r https://golang.org/cl/9796043
2014-03-13  runtime: detect stack split after fork  (Dmitriy Vyukov)
This check would have made it easy to prevent issue 7511.

Update #7511

LGTM=rsc
R=rsc, aram
CC=golang-codereviews
https://golang.org/cl/75260043
2014-03-13  runtime: harden conditions when runtime panics on crash  (Dmitriy Vyukov)
This is especially important for SetPanicOnFault, but also useful for e.g. a nil deref in mallocgc. Panics on such crashes can't lead to anything useful, only to deadlocks, hangs and obscure crashes.

This is a copy of the broken but already LGTMed https://golang.org/cl/68540043/.

TBR=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75320043
2014-03-11  runtime: more Native Client fixes  (Dave Cheney)
Thanks to Ian for spotting these.

runtime.h: define uintreg correctly.
stack.c: address warning caused by the type of uintreg being 32 bits on amd64p32.

Commentary (mainly for my own use):

nacl/amd64p32 defines a machine with 64bit registers, but address space is limited to a 4gb window (the window is placed randomly inside the full 48 bit virtual address space of a process). To cope with this 6c defines _64BIT and _64BITREG. _64BITREG is always defined by 6c, so both GOARCH=amd64 and GOARCH=amd64p32 use 64bit wide registers. However _64BIT itself is only defined when 6c is compiling for amd64 targets. The definition is elided for amd64p32 environments, causing int, uint and other arch specific types to revert to their 32bit definitions.

LGTM=iant
R=iant, rsc, remyoudompheng
CC=golang-codereviews
https://golang.org/cl/72860046
2014-03-07  runtime: round stack size to power of 2.  (Shenghou Ma)
Fixes build on windows/386 and plan9/386. Fixes #7487. LGTM=mattn.jp, dvyukov, rsc R=golang-codereviews, mattn.jp, dvyukov, 0intro, rsc CC=golang-codereviews https://golang.org/cl/72360043
2014-03-07  runtime: refactor and fix stack management code  (Dmitriy Vyukov)
There are at least 3 bugs:

1. g->stacksize accounting is broken during copystack/shrinkstack.
2. stktop->free is not properly maintained during copystack/shrinkstack.
3. stktop->free logic is broken: we can have stktop->free==FixedStack, and we will free it into the stack cache, but it actually comes from the heap as the result of a non-copying segment shrink.

This shows up as spurious races on the race builders (and possibly other failures as well).

The idea behind the refactoring is to consolidate stacksize and segment origin logic in stackalloc/stackfree.

Fixes #7490.

LGTM=rsc, khr
R=golang-codereviews, rsc, khr
CC=golang-codereviews
https://golang.org/cl/72440043
2014-03-07  runtime: fix memory corruption and leak in recursive panic handling  (Dmitriy Vyukov)
Recursive panics leave dangling Panic structs in the g->panic stack. At best this leads to a Defer leak and incorrect output on a subsequent panic. At worst it arbitrarily corrupts the heap.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/72480043
2014-03-06  runtime: use custom thunks for race calls instead of cgo  (Dmitriy Vyukov)
Implement custom assembly thunks for hot race calls (memory accesses and function entry/exit). The thunks extract the caller pc, verify that the address is in the heap or globals, and switch to the g0 stack.

Before:
ok   regexp           3.692s
ok   compress/bzip2   9.461s
ok   encoding/json    6.380s

After:
ok   regexp           2.229s (-40%)
ok   compress/bzip2   4.703s (-50%)
ok   encoding/json    3.629s (-43%)

For comparison, normal non-race build:
ok   regexp           0.348s
ok   compress/bzip2   0.304s
ok   encoding/json    0.661s

Race build:
ok   regexp           2.229s (+540%)
ok   compress/bzip2   4.703s (+1447%)
ok   encoding/json    3.629s (+449%)

Also removes some race-related special cases from cgocall and the scheduler. In the long term it will allow removing the cyclic runtime/race dependency on cmd/cgo.

Fixes #4249.
Fixes #7460.
Update #6508
Update #6688

R=iant, rsc, bradfitz
CC=golang-codereviews
https://golang.org/cl/55100044
2014-03-05  runtime: handle Go calls C calls Go panic correctly on windows/386  (Russ Cox)
32-bit Windows uses "structured exception handling" (SEH) to handle hardware faults: there is a per-thread linked list of fault handlers maintained in user space instead of something like Unix's signal handlers. The structures in the linked list are required to live on the OS stack, and the usual discipline is that the function that pushes a record (allocated from the current stack frame) onto the list pops that record before returning. Failing to pop the entry before returning creates a dangling pointer error: the list head points to a stack frame that no longer exists.

Go pushes an SEH record in the top frame of every OS thread, and that record suffices for all Go execution on that thread, at least until cgo gets involved.

If we call into C using cgo, that called C code may push its own SEH records, but by convention it must pop them before returning back to the Go code. We assume it does, and that's fine.

If the C code calls back into Go, we want the Go SEH handler to become active again, not whatever C has set up. So runtime.callbackasm1, which handles a call from C back into Go, pushes a new SEH record before calling the Go code and pops it when the Go code returns. That's also fine.

It can happen that when Go calls C calls Go like this, the inner Go code panics. We allow a defer in the outer Go to recover the panic, effectively wiping not only the inner Go frames but also the C calls. This sequence was not popping the SEH stack up to what it was before the cgo calls, so it was creating the dangling pointer warned about above. When eventually the m stack was used enough to overwrite the dangling SEH records, the SEH chain was lost, and any future panic would not end up in Go's handler.

The bug in TestCallbackPanic and friends was thus creating a situation where TestSetPanicOnFault - which causes a hardware fault - would not find the Go fault handler and instead crash the binary. Add checks to TestCallbackPanicLocked to diagnose the mistake in that test instead of leaving a bad state for another test case to stumble over.

Fix the bug by restoring the SEH chain during the deferred "endcgo" cleanup.

This bug is likely present in Go 1.2.1, but since it depends on Go calling C calling Go, with the inner Go panicking and the outer Go recovering the panic, it seems not important enough to bother fixing before Go 1.3. Certainly no one has complained.

Fixes #7470.

LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews, iant, khr
https://golang.org/cl/71440043
2014-02-26  runtime: grow stack by copying  (Keith Randall)
On stack overflow, if all frames on the stack are copyable, we copy the frames to a new stack twice as large as the old one. During GC, if a G is using less than 1/4 of its stack, copy the stack to a stack half its size. A rough sketch of this policy follows the TODO list below.

TODO
- Do something about C frames. When a C frame is in the stack segment, it isn't copyable. We allocate a new segment in this case.
  - For idempotent C code, we can abort it, copy the stack, then retry. I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard to the lowest Go frame so the next call that Go frame makes triggers a copy, which will then succeed.
- Pick a starting stack size?

The plan is that eventually we reach a point where the stack contains only copyable frames.

LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
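The growth/shrink policy above, written as Go-style pseudocode. This is illustrative only: the real implementation is C in the runtime's stack code, and the hard part (copying frames while adjusting every stack pointer, defer, and panic record) is reduced to a stub here.

    package stacksketch

    // growStack is called on overflow when every frame on the stack is copyable:
    // allocate a stack twice as large and move the frames over.
    func growStack(old []byte, used int) []byte {
        bigger := make([]byte, 2*len(old))
        copyFrames(bigger, old, used)
        return bigger
    }

    // maybeShrink is called during GC: a goroutine using less than 1/4 of its
    // stack gets a stack half the size.
    func maybeShrink(stack []byte, used int) []byte {
        if used < len(stack)/4 {
            smaller := make([]byte, len(stack)/2)
            copyFrames(smaller, stack, used)
            return smaller
        }
        return stack
    }

    // copyFrames stands in for the frame-copying machinery (pointer adjustment,
    // defer/panic fixup, etc.). Stacks grow down, so only the top `used` bytes
    // are live.
    func copyFrames(dst, src []byte, used int) {
        copy(dst[len(dst)-used:], src[len(src)-used:])
    }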
2014-02-26  runtime: get rid of the settype buffer and lock.  (Keith Randall)
MCaches now hold a MSpan for each sizeclass which they have exclusive access to allocate from, so no lock is needed.

Modifying the heap bitmaps also no longer requires a cas.

runtime.free gets more expensive. But we don't use it much any more.

It's not much faster on 1 processor, but it's a lot faster on multiple processors.

benchmark               old ns/op  new ns/op  delta
BenchmarkSetTypeNoPtr1         24         23  -0.42%
BenchmarkSetTypeNoPtr2         33         34  +0.89%
BenchmarkSetTypePtr1           51         49  -3.72%
BenchmarkSetTypePtr2           55         54  -1.98%

benchmark               old ns/op  new ns/op   delta
BenchmarkAllocation         52739      50770   -3.73%
BenchmarkAllocation-2       33957      34141   +0.54%
BenchmarkAllocation-3       33326      29015  -12.94%
BenchmarkAllocation-4       38105      25795  -32.31%
BenchmarkAllocation-5       68055      24409  -64.13%
BenchmarkAllocation-6       71544      23488  -67.17%
BenchmarkAllocation-7       68374      23041  -66.30%
BenchmarkAllocation-8       70117      20758  -70.40%

LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043
2014-02-26  runtime, net: fixes from CL 68490043 review  (Russ Cox)
These are mistakes in the first big NaCl CL. LGTM=minux.ma, iant R=golang-codereviews, minux.ma, iant CC=golang-codereviews https://golang.org/cl/69200043
2014-02-25  all: merge NaCl branch (part 1)  (Dave Cheney)
See golang.org/s/go13nacl for design overview.

This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support; there are probably two or three more large merges to come. CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are:

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047
2014-02-24  runtime, net: add support for GOOS=solaris  (Aram Hăvărneanu)
LGTM=dave, rsc R=golang-codereviews, minux.ma, mikioh.mikioh, dave, iant, rsc CC=golang-codereviews https://golang.org/cl/36030043
2014-02-20  runtime/debug: add SetPanicOnFault  (Russ Cox)
SetPanicOnFault allows recovery from unexpected memory faults. This can be useful if you are using a memory-mapped file or probing the address space of the current program. LGTM=r R=r CC=golang-codereviews https://golang.org/cl/66590044
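A hedged usage sketch of the API described above. The faulting address used here is only illustrative (real uses would probe a memory-mapped file or similar); everything except debug.SetPanicOnFault itself is invented for the example.

    package main

    import (
        "fmt"
        "runtime/debug"
        "unsafe"
    )

    func probe(p *byte) (b byte, err error) {
        old := debug.SetPanicOnFault(true) // a fault at a non-nil bad address now panics
        defer debug.SetPanicOnFault(old)
        defer func() {
            if r := recover(); r != nil {
                err = fmt.Errorf("fault while probing: %v", r)
            }
        }()
        return *p, nil
    }

    func main() {
        // Hypothetical: an address we do not expect to be mapped.
        bad := (*byte)(unsafe.Pointer(uintptr(0x1)))
        if _, err := probe(bad); err != nil {
            fmt.Println(err)
        }
    }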
2014-02-20  runtime: use goc2c as much as possible  (Russ Cox)
Package runtime's C functions written to be called from Go started out written in C using carefully constructed argument lists and the FLUSH macro to write a result back to memory. For some functions, the appropriate parameter list ended up being architecture-dependent due to differences in alignment, so we added 'goc2c', which takes a .goc file containing Go func declarations but C bodies, rewrites the Go func declaration to equivalent C declarations for the target architecture, adds the needed FLUSH statements, and writes out an equivalent C file. That C file is compiled as part of package runtime. Native Client's x86-64 support introduces the most complex alignment rules yet, breaking many functions that could until now be portably written in C. Using goc2c for those avoids the breakage. Separately, Keith's work on emitting stack information from the C compiler would require the hand-written functions to add #pragmas specifying how many arguments are result parameters. Using goc2c for those avoids maintaining #pragmas. For both reasons, use goc2c for as many Go-called C functions as possible. This CL is a replay of the bulk of CL 15400047 and CL 15790043, both of which were reviewed as part of the NaCl port and are checked in to the NaCl branch. This CL is part of bringing the NaCl code into the main tree. No new code here, just reformatting and occasional movement into .h files. LGTM=r R=dave, alex.brainman, r CC=golang-codereviews https://golang.org/cl/65220044
2014-02-12  runtime: improve cpu profiles for GC/syscalls/cgo  (Dmitriy Vyukov)
Current "System->etext" is not very informative. Add parent "GC" frame. Replace un-unwindable syscall/cgo frames with Go stack that leads to the call. LGTM=rsc R=rsc, alex.brainman, ality CC=golang-codereviews https://golang.org/cl/61270043
2014-02-12  runtime: refactor level-triggered IO support  (Dmitriy Vyukov)
Remove the GOOS_solaris ifdef from the netpoll code; instead introduce a runtime edge/level triggered IO flag. Replace armread/armwrite with a single arm(mode) function; that's how all the other interfaces look, and these functions will need to do roughly the same thing anyway.

LGTM=rsc
R=golang-codereviews, dave, rsc
CC=golang-codereviews
https://golang.org/cl/55500044
2014-02-12  runtime: refactor chan code  (Dmitriy Vyukov)
1. Make internal chan functions static.
2. Make selgen a local variable instead of a member of the G struct.
3. Change the "bool *pres/selected" parameter of chansend/chanrecv to "bool block", which is simpler, faster and less code.

-37 lines total.

LGTM=rsc
R=golang-codereviews, dave, gobot, rsc
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/58610043
2014-02-10  runtime: do not cpu profile idle threads on windows  (Dmitriy Vyukov)
Currently this leads to a significant skew towards the 'etext' entry, since all idle threads are profiled every tick.

Before:
Total: 66608 samples
   63188  94.9%  94.9%    63188  94.9%  etext
     278   0.4%  95.3%      278   0.4%  sweepspan
     216   0.3%  95.6%      448   0.7%  runtime.mallocgc
     122   0.2%  95.8%      122   0.2%  scanblock
     113   0.2%  96.0%      113   0.2%  net/textproto.canonicalMIMEHeaderKey

After:
Total: 8008 samples
    3949  49.3%  49.3%     3949  49.3%  etext
     231   2.9%  52.2%      231   2.9%  scanblock
     211   2.6%  54.8%      211   2.6%  runtime.cas64
     182   2.3%  57.1%      408   5.1%  runtime.mallocgc
     178   2.2%  59.3%      178   2.2%  runtime.atomicload64

LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews
https://golang.org/cl/61250043
2014-01-28  runtime: fix windows build  (Dmitriy Vyukov)
Currently windows crashes because early allocs in schedinit try to allocate tiny memory blocks, but m->p is not yet set up. I've considered calling procresize(1) earlier in schedinit, but this refactoring is better and should fix the issue as well.

Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
2014-01-24  runtime: combine small NoScan allocations  (Dmitriy Vyukov)
Combine NoScan allocations < 16 bytes into a single memory block. Reduces number of allocations on json/garbage benchmarks by 10+%.

json-1
allocated           8039872      7949194   -1.13%
allocs               105774        93776  -11.34%
cputime           156200000    100700000  -35.53%
gc-pause-one        4908873      3814853  -22.29%
gc-pause-total      2748969      2899288   +5.47%
rss                52674560     43560960  -17.30%
sys-gc              3796976      3256304  -14.24%
sys-heap           43843584     35192832  -19.73%
sys-other           5589312      5310784   -4.98%
sys-stack            393216       393216   +0.00%
sys-total          53623088     44153136  -17.66%
time              156193436    100886714  -35.41%
virtual-mem       256548864    256540672   -0.00%

garbage-1
allocated           2996885      2932982   -2.13%
allocs                62904        55200  -12.25%
cputime            17470000     17400000   -0.40%
gc-pause-one      932757485    925806143   -0.75%
gc-pause-total      4663787      4629030   -0.75%
rss              1151074304   1133670400   -1.51%
sys-gc             66068352     65085312   -1.49%
sys-heap         1039728640   1024065536   -1.51%
sys-other          38038208     37485248   -1.45%
sys-stack           8650752      8781824   +1.52%
sys-total        1152485952   1135417920   -1.48%
time               17478088     17418005   -0.34%
virtual-mem      1343709184   1324204032   -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047
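A minimal sketch of the combining idea: a bump allocator that packs several tiny pointer-free objects into one 16-byte block. The 16-byte block size follows the description above; the type and function names are illustrative, not the runtime's.

    package tinysketch

    import "unsafe"

    const tinyBlockSize = 16 // small NoScan (pointer-free) allocations share 16-byte blocks

    // tinyAllocator packs several tiny objects into one block before asking the
    // heap for a new one. Callers only use it for sizes below tinyBlockSize.
    type tinyAllocator struct {
        block  []byte // current shared block
        offset int    // next free byte within block
    }

    func (t *tinyAllocator) alloc(size, align int) unsafe.Pointer {
        off := (t.offset + align - 1) &^ (align - 1) // round up to the requested alignment
        if t.block == nil || off+size > len(t.block) {
            t.block = make([]byte, tinyBlockSize) // refill: grab a fresh block from the heap
            off = 0
        }
        t.offset = off + size
        return unsafe.Pointer(&t.block[off])
    }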
2014-01-23  runtime: Print elision message if we skipped frames on traceback.  (Keith Randall)
Fixes bug 7180 R=golang-codereviews, dvyukov CC=golang-codereviews, gri https://golang.org/cl/55810044
2014-01-22  runtime: remove locks from netpoll hotpaths  (Dmitriy Vyukov)
Introduces a two-phase goroutine parking mechanism -- prepare to park, commit park. This mechanism does not require a backing mutex to protect the wait predicate. Use it in netpoll. See the comment in netpoll.goc for details.

This slightly reduces contention between reader, writer and read/write io notifications; and just eliminates a bunch of mutex operations from hotpaths, thus making them faster.

benchmark                             old ns/op  new ns/op  delta
BenchmarkTCP4ConcurrentReadWrite           2109       1945  -7.78%
BenchmarkTCP4ConcurrentReadWrite-2         1162       1113  -4.22%
BenchmarkTCP4ConcurrentReadWrite-4          798        755  -5.39%
BenchmarkTCP4ConcurrentReadWrite-8          803        748  -6.85%
BenchmarkTCP4Persistent                    9411       9240  -1.82%
BenchmarkTCP4Persistent-2                  5888       5813  -1.27%
BenchmarkTCP4Persistent-4                  4016       3968  -1.20%
BenchmarkTCP4Persistent-8                  3943       3857  -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
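A rough illustration of the prepare/commit shape using ordinary Go primitives, assuming a single waiter and a single waker. The runtime's actual mechanism parks gs directly in the scheduler rather than on a channel, and the type and state names here are invented; this only shows the idea of publishing intent first and committing to sleep only if the event has not already happened.

    package parksketch

    import "sync/atomic"

    // waiter lets one goroutine sleep on a condition without a mutex around the
    // wait predicate.
    type waiter struct {
        state int32         // 0 = idle, 1 = event already signaled, 2 = waiter committed
        sema  chan struct{} // stands in for the runtime's park/unpark
    }

    func newWaiter() *waiter { return &waiter{sema: make(chan struct{}, 1)} }

    func (w *waiter) wait() {
        if atomic.CompareAndSwapInt32(&w.state, 0, 2) { // phase 1: announce intent to park
            <-w.sema // phase 2: commit; wake() will deliver the signal
            atomic.StoreInt32(&w.state, 0)
            return
        }
        // The event fired before we could park: consume it and return immediately.
        atomic.StoreInt32(&w.state, 0)
    }

    func (w *waiter) wake() {
        if atomic.SwapInt32(&w.state, 1) == 2 { // a waiter committed: unpark it
            w.sema <- struct{}{}
        }
    }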
2014-01-22  runtime: allocate goroutine ids in batches  (Dmitriy Vyukov)
Helps reduce contention on sched.goidgen.

benchmark                              old ns/op  new ns/op  delta
BenchmarkCreateGoroutines-16                 259        237   -8.49%
BenchmarkCreateGoroutinesParallel-16         127         43  -66.06%

R=golang-codereviews, dave, bradfitz, khr
CC=golang-codereviews, rsc
https://golang.org/cl/46970043
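A small sketch of batched id allocation: each P caches a range of ids and refills it from the shared counter with one atomic add per batch. The batch size and all names here are illustrative, not taken from the source.

    package goidsketch

    import "sync/atomic"

    // goidgen is the shared counter, analogous to sched.goidgen.
    var goidgen uint64

    const goidCacheBatch = 16 // illustrative batch size

    // pCache holds a P-local range of ids so most goroutine creations avoid the
    // shared counter entirely.
    type pCache struct {
        next, end uint64 // half-open range [next, end) of reserved ids
    }

    // nextGoid hands out one id, refilling the local range with a single atomic
    // add once per batch.
    func (p *pCache) nextGoid() uint64 {
        if p.next == p.end {
            p.end = atomic.AddUint64(&goidgen, goidCacheBatch)
            p.next = p.end - goidCacheBatch
        }
        id := p.next
        p.next++
        return id
    }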
2014-01-22  runtime: fix and improve CPU profiling  (Dmitriy Vyukov)
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value

Fixes #6417.
Fixes #6986.

R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047
2014-01-21  runtime: do not collect GC roots explicitly  (Dmitriy Vyukov)
Currently we collect (add) all roots into a global array in a single-threaded GC phase. This hinders parallelism.

With this change we just kick off a parallel for over number_of_goroutines+5 iterations. The parallel-for callback then decides whether it needs to scan the stack of a goroutine, scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely. It requires storing all goroutines in an array instead of a linked list (to allow direct indexing).

This CL also removes the DebugScan functionality. It is broken because it uses an unbounded stack, so it can not run on g0. When it was working, I found it unhelpful for debugging issues because the two algorithms are too different now. This change would require updating DebugScan, so it's simpler to just delete it.

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated          2987886      2989221  +0.04%
allocs               62885        62887  +0.00%
cputime           21286000     21272000  -0.07%
gc-pause-one      26633247     24885421  -6.56%
gc-pause-total      873570       811264  -7.13%
rss              242089984    242515968  +0.18%
sys-gc            13934336     13869056  -0.47%
sys-heap         205062144    205062144  +0.00%
sys-other         12628288     12628288  +0.00%
sys-stack         11534336     11927552  +3.41%
sys-total        243159104    243487040  +0.13%
time               2809477      2740795  -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
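An illustrative sketch of that structure, using ordinary goroutines in place of the runtime's parallel-for helper. The "+5" extra iterations stand for the fixed roots (data segments, finalizers, and so on); the constant, function names, and channel-based fan-out are all invented for illustration.

    package gcroots

    import "sync"

    const fixedRoots = 5 // stand-ins for the fixed roots: data segments, finalizers, ...

    // scanRoots walks every root in parallel. Indices below fixedRoots are the
    // fixed roots; each remaining index names one goroutine whose stack gets
    // scanned, so there is no single-threaded "collect the roots" phase.
    func scanRoots(numGoroutines, workers int, scanFixed func(i int), scanStack func(g int)) {
        jobs := make(chan int)
        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := range jobs {
                    if i < fixedRoots {
                        scanFixed(i) // scan a data segment, finalizer table, etc.
                    } else {
                        scanStack(i - fixedRoots) // scan one goroutine's stack
                    }
                }
            }()
        }
        for i := 0; i < numGoroutines+fixedRoots; i++ {
            jobs <- i
        }
        close(jobs)
        wg.Wait()
    }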
2014-01-21  runtime: per-P defer pool  (Dmitriy Vyukov)
Instead of a per-goroutine stack of defers for all sizes, introduce a per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.

For a program that starts 1e6 goroutines and then joins them:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/42750044
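A sketch of the pooling idea: fixed size classes with one free list per class, kept on the P so no locking is needed. The size classes are the ones listed above; the types and helpers are invented for illustration.

    package defersketch

    // deferRecord stands in for the runtime's Defer struct plus its argument area.
    type deferRecord struct {
        link *deferRecord // next free record in the pool
        args []byte       // argument copy; capacity fixed by size class
    }

    // deferClasses mirrors the argument sizes mentioned above.
    var deferClasses = [...]int{8, 24, 40, 56, 72}

    // pDeferPool is the per-P pool: one free list per size class. No locking is
    // needed because a P is only ever used by one thread at a time.
    type pDeferPool struct {
        free [len(deferClasses)]*deferRecord
    }

    func classFor(argSize int) int {
        for i, sz := range deferClasses {
            if argSize <= sz {
                return i
            }
        }
        return -1 // too large: fall back to the heap
    }

    // get reuses a pooled record when possible instead of allocating a new one.
    func (p *pDeferPool) get(argSize int) *deferRecord {
        c := classFor(argSize)
        if c < 0 {
            return &deferRecord{args: make([]byte, argSize)}
        }
        if d := p.free[c]; d != nil {
            p.free[c] = d.link
            d.link = nil
            return d
        }
        return &deferRecord{args: make([]byte, deferClasses[c])}
    }

    // put returns a record to its size-class free list for reuse.
    func (p *pDeferPool) put(d *deferRecord) {
        c := classFor(cap(d.args))
        if c < 0 {
            return // oversized records are left to the GC
        }
        d.link = p.free[c]
        p.free[c] = d
    }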
2014-01-17  runtime: add support for GOOS=solaris  (Aram Hăvărneanu)
R=alex.brainman, dave, jsing, gobot, rsc CC=golang-codereviews https://golang.org/cl/35990043
2014-01-16  runtime: output how long goroutines are blocked  (Dmitriy Vyukov)
Example of output:

goroutine 4 [sleep for 3 min]:
time.Sleep(0x34630b8a000)
    src/pkg/runtime/time.goc:31 +0x31
main.func·002()
    block.go:16 +0x2c
created by main.main
    block.go:17 +0x33

Full program and output are here: http://play.golang.org/p/NEZdADI3Td

Fixes #6809.

R=golang-codereviews, khr, kamil.kisiel, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/50420043
2014-01-16  runtime: use lock-free ring for work queues  (Dmitriy Vyukov)
Use a lock-free fixed-size ring for work queues instead of an unbounded mutex-protected array. The ring has a single producer and multiple consumers. If the ring overflows, work is put onto the global queue.

benchmark                              old ns/op  new ns/op  delta
BenchmarkMatmult                               7          5  -18.12%
BenchmarkMatmult-4                             2          2  -18.98%
BenchmarkMatmult-16                            1          0  -12.84%
BenchmarkCreateGoroutines                    105         88  -16.10%
BenchmarkCreateGoroutines-4                  376        219  -41.76%
BenchmarkCreateGoroutines-16                 241        174  -27.80%
BenchmarkCreateGoroutinesParallel            103         87  -14.66%
BenchmarkCreateGoroutinesParallel-4          169        143  -15.38%
BenchmarkCreateGoroutinesParallel-16         158        151  -4.43%

R=golang-codereviews, rsc
CC=ddetlefs, devon.odell, golang-codereviews
https://golang.org/cl/46170044
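A simplified sketch of the single-producer / multiple-consumer ring with overflow to a global queue. The ring size, names, and overflow policy (one task at a time) are illustrative; the real scheduler code is C in proc.c and handles more details, such as moving half the queue to the global list on overflow.

    package runqsketch

    import (
        "sync"
        "sync/atomic"
    )

    type task func()

    // ring is owned by one P (single producer); other Ps may steal (consumers).
    type ring struct {
        head uint32 // next slot to steal from; advanced by consumers with CAS
        tail uint32 // next free slot; advanced only by the owner
        buf  [256]task
    }

    // global is the mutex-protected overflow queue.
    var global struct {
        sync.Mutex
        tasks []task
    }

    // put is called only by the owner. It never blocks: on overflow the task
    // goes to the global queue instead.
    func (r *ring) put(t task) {
        h := atomic.LoadUint32(&r.head)
        tl := r.tail
        if tl-h < uint32(len(r.buf)) {
            r.buf[tl%uint32(len(r.buf))] = t
            atomic.StoreUint32(&r.tail, tl+1) // publish the slot to stealers
            return
        }
        global.Lock()
        global.tasks = append(global.tasks, t)
        global.Unlock()
    }

    // steal may be called by any P; it claims the oldest task with a CAS on head.
    func (r *ring) steal() (task, bool) {
        for {
            h := atomic.LoadUint32(&r.head)
            tl := atomic.LoadUint32(&r.tail)
            if h == tl {
                return nil, false // empty
            }
            t := r.buf[h%uint32(len(r.buf))]
            if atomic.CompareAndSwapUint32(&r.head, h, h+1) {
                return t, true
            }
            // Lost the race to another consumer: retry with fresh head/tail.
        }
    }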
2014-01-07  runtime: use special records hung off the MSpan to record finalizers and heap profile info.  (Keith Randall)
Enables removing the special bit from the heap bitmap. Also provides a generic mechanism for annotating occasional heap objects.

finalizers   overhead    per obj
  old        680 B       80 B avg
  new        16 B/span   48 B

profile      overhead    per obj
  old        32KB        24 B + hash tables
  new        16 B/span   24 B

R=cshapiro, khr, dvyukov, gobot
CC=golang-codereviews
https://golang.org/cl/13314053
2013-12-19  syscall: add NewCallbackCDecl again  (Alex Brainman)
Fixes #6338 R=golang-dev, kin.wilson.za, rsc CC=golang-dev https://golang.org/cl/36180044
2013-12-06  runtime: add GODEBUG option for an electric fence like heap mode  (Carl Shapiro)
When enabled this new debugging mode will allocate objects on their own page and never recycle memory addresses. This is an essential tool to root cause a broad class of heap corruption. R=golang-dev, dave, daniel.morsing, dvyukov, rsc, iant, cshapiro CC=golang-dev https://golang.org/cl/22060046
2013-12-03  runtime: add allocation and free tracing for gc debugging  (Carl Shapiro)
Output for an allocation and free (sweep) follows:

MProf_Malloc(p=0xc2100210a0, size=0x50, type=0x0 <single object>)
    #0 0x46ee15 runtime.mallocgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:141
    #1 0x47004f runtime.settype_flush /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:612
    #2 0x45f92c gc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2071
    #3 0x45f89e mgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2050
    #4 0x45258b runtime.mcall /usr/local/google/home/cshapiro/go/src/pkg/runtime/asm_amd64.s:179

MProf_Free(p=0xc2100210a0, size=0x50)
    #0 0x46ee15 runtime.mallocgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:141
    #1 0x47004f runtime.settype_flush /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:612
    #2 0x45f92c gc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2071
    #3 0x45f89e mgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2050
    #4 0x45258b runtime.mcall /usr/local/google/home/cshapiro/go/src/pkg/runtime/asm_amd64.s:179

R=golang-dev, dvyukov, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/21990045
2013-12-02  runtime: pass key/value to map accessors by reference, not by value.  (Keith Randall)
This change is part of the plan to get rid of all vararg C calls, which are a pain for exact stack scanning.

We allocate a chunk of zero memory to return a pointer to when a map access doesn't find the key. This is simpler than returning nil and fixing things up in the caller. Linker magic allocates a single zero memory area that is shared by all (non-reflect-generated) map types.

Passing things by reference gets rid of some copies, so it speeds up code with big keys/values.

benchmark              old ns/op  new ns/op  delta
BenchmarkBigKeyMap            34         31   -8.48%
BenchmarkBigValMap            37         30  -18.62%
BenchmarkSmallKeyMap          26         23  -11.28%

R=golang-dev, dvyukov, khr, rsc
CC=golang-dev
https://golang.org/cl/14794043
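A sketch of the calling convention described above, written as ordinary Go for illustration. The accessors were C at the time; the function name, map type, and shared zero buffer here are stand-ins rather than the actual runtime declarations.

    package mapsketch

    import "unsafe"

    // A single shared block of zeroed memory, analogous to the linker-allocated
    // zero area mentioned above. Accessors return a pointer into it when the key
    // is missing, so callers can always dereference the result.
    var zeroBuf [1024]byte

    // mapaccess1 is an illustrative stand-in for the runtime accessor: the key is
    // passed by reference and the value is returned by reference.
    func mapaccess1(m map[string]int64, key *string) unsafe.Pointer {
        if v, ok := m[*key]; ok {
            // In the real runtime this would be a pointer into the map's bucket
            // storage; a heap copy is the closest ordinary Go can get.
            p := new(int64)
            *p = v
            return unsafe.Pointer(p)
        }
        return unsafe.Pointer(&zeroBuf[0]) // missing key: pointer to shared zero memory
    }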