path: root/src/pkg/runtime/malloc.h
Age  Commit message  (Author)
2014-09-08  build: move package sources from src/pkg to src  (Russ Cox)
Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.
2014-09-04  runtime: correct various Go -> C function calls  (Russ Cox)
Some things get converted. Other things (too complex or too many C deps) get onM calls. Other things (too simple) get #pragma textflag NOSPLIT.
After this CL, the offending function list is basically:
- panic.c
- netpoll.goc
- mem*.c
- race stuff
- readgstatus
- entersyscall/exitsyscall
LGTM=r, iant R=golang-codereviews, r, iant CC=dvyukov, golang-codereviews, khr https://golang.org/cl/140930043
2014-09-03  runtime: deferproc/deferreturn in Go  (Keith Randall)
LGTM=rsc R=golang-codereviews, rsc, khr CC=golang-codereviews https://golang.org/cl/139900043
2014-09-01  runtime: convert mprof.goc to mprof.go  (Russ Cox)
The exported Go definitions appearing in mprof.go are copied verbatim from debug.go. The unexported Go funcs and types are new. The C Bucket type used a union and was not a line-for-line translation. LGTM=remyoudompheng R=golang-codereviews, remyoudompheng CC=dvyukov, golang-codereviews, iant, khr, r https://golang.org/cl/137040043
2014-08-30  runtime: rename SysAlloc to sysAlloc for Go  (Russ Cox)
Renaming the C SysAlloc will let Go define a prototype without exporting it. For use in cpuprof.goc's translation to Go. LGTM=mdempsky R=golang-codereviews, mdempsky CC=golang-codereviews, iant https://golang.org/cl/140060043
2014-08-29  runtime: clean up GC code  (Dmitriy Vyukov)
Remove C version of GC. Convert freeOSMemory to Go. Restore g0 check in GC. Remove unknownGCPercent check in GC, it's initialized explicitly now. LGTM=rsc R=golang-codereviews, rsc CC=golang-codereviews, khr https://golang.org/cl/139910043
2014-08-28  runtime: move finalizer thread to Go.  (Keith Randall)
LGTM=dvyukov R=golang-codereviews, dvyukov, khr CC=golang-codereviews https://golang.org/cl/124630043
2014-08-27  runtime: rename Lock to Mutex  (Russ Cox)
Mutex is consistent with package sync, and when in the unexported Go form it avoids having a conflict between the type (now mutex) and the function (lock). LGTM=iant R=golang-codereviews, iant CC=dvyukov, golang-codereviews, r https://golang.org/cl/133140043
2014-08-27  cmd/cc, runtime: preserve C runtime type names in generated Go  (Russ Cox)
uintptr or uint64 in the runtime C were turning into uint in the Go, bool was turning into uint8, and so on. Fix that. Also delete Go wrappers for C functions. The C functions can be called directly now (but still eventually need to be converted to Go). LGTM=bradfitz, minux, iant R=golang-codereviews, bradfitz, iant, minux CC=golang-codereviews, khr, r https://golang.org/cl/138740043
2014-08-25  cmd/gc, runtime: treat slices and strings like pointers in garbage collection  (Russ Cox)
Before, a slice with cap=0 or a string with len=0 might have its base pointer pointing beyond the actual slice/string data into the next block. The collector had to ignore slices and strings with cap=0 in order to avoid misinterpreting the base pointer. Now, a slice with cap=0 or a string with len=0 still has a base pointer pointing into the actual slice/string data, no matter what. The collector can now always scan the pointer, which means strings and slices are no longer special. Fixes #8404. LGTM=khr, josharian R=josharian, khr, dvyukov CC=golang-codereviews https://golang.org/cl/112570044
2014-08-25  runtime: remove dedicated scavenger thread  (Dmitriy Vyukov)
A whole thread is too much for background scavenger that sleeps all the time anyway. We already have sysmon thread that can do this work. Also remove g->isbackground and simplify enter/exitsyscall. LGTM=rsc R=golang-codereviews, rsc CC=golang-codereviews, khr, rlh https://golang.org/cl/108640043
2014-08-24  runtime,runtime/debug: Converted some functions from goc to Go.  (Sanjay Menakuru)
LGTM=khr R=golang-codereviews, khr CC=dvyukov, golang-codereviews https://golang.org/cl/131010044
2014-08-24  runtime: fix races on mheap.allspans  (Dmitriy Vyukov)
This is based on the crash dump provided by Alan and on mental experiments:
sweep 0 74
fatal error: gc: unswept span
runtime stack:
runtime.throw(0x9df60d)
markroot(0xc208002000, 0x3)
runtime.parfordo(0xc208002000)
runtime.gchelper()
I think that when we moved all stacks into heap, we introduced a bunch of bad data races. This was later worsened by parallel stack shrinking.
Observation 1: exitsyscall can allocate a stack from heap at any time (including during STW).
Observation 2: parallel stack shrinking can (surprisingly) grow heap during marking. Consider that we steadily grow stacks of a number of goroutines from 8K to 16K. And during GC they all can be shrunk back to 8K. Shrinking will allocate lots of 8K stacks, and we do not necessarily have that many in heap at this moment. So shrinking can grow heap as well.
Consequence: any access to mheap.allspans in GC (and otherwise) must take heap lock. This is not true in several places.
Fix this by protecting accesses to mheap.allspans and introducing allspans cache for marking, similar to what we use for sweeping.
LGTM=rsc R=golang-codereviews, rsc CC=adonovan, golang-codereviews, khr, rlh https://golang.org/cl/126510043
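The locking rule in the allspans fix can be sketched as follows. This is only an illustration in Go, not the runtime's code: the names `heap`, `span`, `addSpan`, `numSpans` and `snapshot` are hypothetical. Every access to the shared span list takes the heap lock, and a marker works from a copy taken under that lock.

```go
package main

import (
	"fmt"
	"sync"
)

// span stands in for a heap span; only an ID is kept for illustration.
type span struct{ id int }

// heap guards allspans with a mutex, mirroring the rule that every
// access to mheap.allspans must take the heap lock.
type heap struct {
	mu       sync.Mutex
	allspans []*span
}

// addSpan may be called at any time (for example when a stack is
// allocated), so it always takes the lock before appending.
func (h *heap) addSpan(s *span) {
	h.mu.Lock()
	h.allspans = append(h.allspans, s)
	h.mu.Unlock()
}

// numSpans reads the current length under the lock.
func (h *heap) numSpans() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return len(h.allspans)
}

// snapshot copies the span list under the lock. A marker can iterate
// over the copy without racing with concurrent addSpan calls.
func (h *heap) snapshot() []*span {
	h.mu.Lock()
	defer h.mu.Unlock()
	cache := make([]*span, len(h.allspans))
	copy(cache, h.allspans)
	return cache
}

func main() {
	h := &heap{}
	for i := 0; i < 3; i++ {
		h.addSpan(&span{id: i})
	}
	cache := h.snapshot()
	h.addSpan(&span{id: 3}) // the heap grows while "marking" uses the cache
	fmt.Println(len(cache), h.numSpans())
}
```

The snapshot trades a copy for lock-free iteration afterwards; whether the real cache is a copy or a shared pointer is not stated in the commit message.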
2014-08-21  runtime: convert common scheduler functions to Go  (Dmitriy Vyukov)
These are required for chans, semaphores, timers, etc. LGTM=khr R=golang-codereviews, khr CC=golang-codereviews, rlh, rsc https://golang.org/cl/123640043
2014-08-19  runtime: fix memstats  (Dmitriy Vyukov)
Newly allocated memory is subtracted from inuse, while it was never added to inuse. Span leftovers are subtracted from both inuse and idle, while they were never added. Fixes #8544. Fixes #8430. LGTM=khr, cookieo9 R=golang-codereviews, khr, cookieo9 CC=golang-codereviews, rlh, rsc https://golang.org/cl/130200044
2014-08-18  runtime: implement transfer cache  (Dmitriy Vyukov)
Currently we do the following dance after sweeping a span:
1. lock mcentral
2. remove the span from a list
3. unlock mcentral
4. unmark span
5. lock mheap
6. insert the span into heap
7. unlock mheap
8. lock mcentral
9. observe empty list
10. unlock mcentral
11. lock mheap
12. grab the span
13. unlock mheap
14. mark span
15. lock mcentral
16. insert the span into empty list
17. unlock mcentral
This change short-circuits this sequence to nothing, that is, we just cache and use the span after sweeping. This gives us functionality similar to (or even better than) tcmalloc's transfer cache.
benchmark            old ns/op    new ns/op    delta
BenchmarkMalloc8     22.2         19.5         -12.16%
BenchmarkMalloc16    31.0         26.6         -14.19%
LGTM=khr R=golang-codereviews, khr CC=golang-codereviews, rlh, rsc https://golang.org/cl/119550043
2014-08-13  runtime: remove FlagNoProfile  (Dmitriy Vyukov)
Turns out to be unused as well. LGTM=bradfitz R=golang-codereviews, bradfitz CC=golang-codereviews, khr https://golang.org/cl/127170044
2014-08-08  runtime: bump MaxGcprocs to 32  (Dmitriy Vyukov)
There were a number of improvements related to GC parallelization:
1. Parallel roots/stacks scanning.
2. Parallel stack shrinking.
3. Per-thread workbuf caches.
4. Workset reduction.
Currently 32 threads work well.
go.benchmarks:garbage benchmark on 2 x Intel Xeon E5-2690 (16 HT cores)
1 thread/1 processor: time=16405255 cputime=16386223 gc-pause-one=546793975 gc-pause-total=3280763
2 threads/1 processor: time=9043497 cputime=18075822 gc-pause-one=331116489 gc-pause-total=2152257
4 threads/1 processor: time=4882030 cputime=19421337 gc-pause-one=174543105 gc-pause-total=1134530
8 threads/1 processor: time=4134757 cputime=20097075 gc-pause-one=158680588 gc-pause-total=1015555
16 threads/1 processor + HT: time=2006706 cputime=31960509 gc-pause-one=75425744 gc-pause-total=460097
16 threads/2 processors: time=1513373 cputime=23805571 gc-pause-one=56630946 gc-pause-total=345448
32 threads/2 processors + HT: time=1199312 cputime=37592764 gc-pause-one=48945064 gc-pause-total=278986
LGTM=rlh R=golang-codereviews, tracey.brendan, rlh CC=golang-codereviews, khr, rsc https://golang.org/cl/123920043
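To read the MaxGcprocs figures as scaling, one can divide the single-thread "time" value by each multi-thread value. The sketch below does only that arithmetic on the numbers quoted in the commit message; the helper name `speedup` is hypothetical, and the unit of "time" is not stated in the source, so only ratios are computed.

```go
package main

import "fmt"

// speedup returns how many times faster a run is than the baseline,
// given two wall-clock times in the same (unspecified) unit.
func speedup(baseline, t float64) float64 { return baseline / t }

func main() {
	const base = 16405255 // "time" for 1 thread/1 processor
	runs := []struct {
		name string
		time float64
	}{
		{"2 threads/1 processor", 9043497},
		{"4 threads/1 processor", 4882030},
		{"8 threads/1 processor", 4134757},
		{"16 threads/1 processor + HT", 2006706},
		{"16 threads/2 processors", 1513373},
		{"32 threads/2 processors + HT", 1199312},
	}
	for _, r := range runs {
		fmt.Printf("%-30s %.2fx\n", r.name, speedup(base, r.time))
	}
}
```

With these inputs the 32-thread run comes out at a little under 14 times the single-thread speed.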
2014-08-07  cmd/cc, runtime: eliminate use of the unnamed substructure C extension  (Peter Collingbourne)
Eliminating use of this extension makes it easier to port the Go runtime to other compilers. This CL also disables the extension in cc to prevent accidental use. LGTM=rsc, khr R=rsc, aram, khr, dvyukov CC=axwalk, golang-codereviews https://golang.org/cl/106790044
2014-08-07  runtime: convert markallocated from C to Go  (Dmitriy Vyukov)
benchmark                     old ns/op    new ns/op    delta
BenchmarkMalloc8              28.7         22.4         -21.95%
BenchmarkMalloc16             44.8         33.8         -24.55%
BenchmarkMallocTypeInfo8      49.0         32.9         -32.86%
BenchmarkMallocTypeInfo16     46.7         35.8         -23.34%
BenchmarkMallocLargeStruct    907          901          -0.66%
BenchmarkGobDecode            13235542     12036851     -9.06%
BenchmarkGobEncode            10639699     9539155      -10.34%
BenchmarkJSONEncode           25193036     21898922     -13.08%
BenchmarkJSONDecode           96104044     89464904     -6.91%
Fixes #8452.
LGTM=khr R=golang-codereviews, bradfitz, rsc, dave, khr CC=golang-codereviews https://golang.org/cl/122090043
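The delta column in these benchmark tables is the relative change from the old to the new ns/op value. A minimal sketch of that arithmetic, using two rows from the table above (the function name `delta` is hypothetical):

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs as a percentage.
// Negative values mean the new code is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	fmt.Printf("BenchmarkMalloc8  %+.2f%%\n", delta(28.7, 22.4)) // -21.95%
	fmt.Printf("BenchmarkMalloc16 %+.2f%%\n", delta(44.8, 33.8)) // -24.55%
}
```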
2014-08-07  runtime: remove mal/malloc/FlagNoGC/FlagNoInvokeGC  (Dmitriy Vyukov)
FlagNoGC is unused now. FlagNoInvokeGC is unneeded as we don't invoke GC on g0 and when holding locks anyway. mal/malloc have very few uses and you never remember the exact set of flags they use and the difference between them. Moreover, eventually we need to give exact types to all allocations, something that mal/malloc do not support. LGTM=khr R=golang-codereviews, khr CC=golang-codereviews, rsc https://golang.org/cl/117580043
2014-08-06  runtime: simplify code  (Dmitriy Vyukov)
Full spans can't be passed to UncacheSpan since we got rid of free. LGTM=rsc R=golang-codereviews CC=golang-codereviews, khr, rsc https://golang.org/cl/119490044
2014-08-06  runtime: cache one GC workbuf in thread-local storage  (Dmitriy Vyukov)
We call scanblock for lots of small root pieces e.g. for every stack frame args and locals area. Every scanblock invocation calls getempty/putempty, which accesses lock-free stack shared among all worker threads. One-element local cache allows most scanblock calls to proceed without accessing the shared stack. LGTM=rsc R=golang-codereviews, rlh CC=golang-codereviews, khr, rsc https://golang.org/cl/121250043
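The effect of a one-element local cache in front of a shared structure can be sketched like this. It is an illustration only: the runtime uses a lock-free stack, while this sketch substitutes a mutex-protected slice to stay short, and the types `sharedPool` and `worker` and the `accesses` counter are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

type workbuf struct{ items []uintptr }

// sharedPool stands in for the shared stack of empty workbufs.
// A mutex replaces the lock-free stack purely for brevity.
type sharedPool struct {
	mu       sync.Mutex
	bufs     []*workbuf
	accesses int // how many times the shared structure was touched
}

func (p *sharedPool) get() *workbuf {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.accesses++
	if n := len(p.bufs); n > 0 {
		b := p.bufs[n-1]
		p.bufs = p.bufs[:n-1]
		return b
	}
	return &workbuf{}
}

func (p *sharedPool) put(b *workbuf) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.accesses++
	p.bufs = append(p.bufs, b)
}

// worker holds at most one cached empty workbuf.
type worker struct {
	pool   *sharedPool
	cached *workbuf
}

// getempty prefers the local cache and falls back to the shared pool.
func (w *worker) getempty() *workbuf {
	if b := w.cached; b != nil {
		w.cached = nil
		return b
	}
	return w.pool.get()
}

// putempty keeps one buffer locally and returns any extra to the pool.
func (w *worker) putempty(b *workbuf) {
	b.items = b.items[:0]
	if w.cached == nil {
		w.cached = b
		return
	}
	w.pool.put(b)
}

func main() {
	w := &worker{pool: &sharedPool{}}
	for i := 0; i < 1000; i++ { // many small scanblock-like calls
		b := w.getempty()
		b.items = append(b.items, uintptr(i))
		w.putempty(b)
	}
	fmt.Println("shared accesses:", w.pool.accesses)
}
```

In this get-then-put pattern only the very first call reaches the shared pool; every later call is served from the local slot.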
2014-07-31  runtime: get rid of free  (Dmitriy Vyukov)
Several reasons:
1. Significantly simplifies runtime.
2. This code proved to be buggy.
3. Free is incompatible with bump-the-pointer allocation.
4. We want to write runtime in Go, Go does not have free.
5. Too much code to free env strings on startup.
LGTM=khr R=golang-codereviews, josharian, tracey.brendan, khr CC=bradfitz, golang-codereviews, r, rlh, rsc https://golang.org/cl/116390043
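Reason 3 is easier to see with a toy bump-the-pointer allocator. The sketch below is not the runtime's allocator; `bumpAllocator` and its methods are hypothetical. Allocation only advances an offset, so there is nowhere to record a hole left by freeing a single object; the only cheap release is resetting everything at once.

```go
package main

import "fmt"

// bumpAllocator hands out consecutive chunks of a fixed buffer.
type bumpAllocator struct {
	buf  []byte
	next int // offset of the first unused byte
}

// alloc returns n bytes, or nil if the request does not fit.
func (a *bumpAllocator) alloc(n int) []byte {
	if n < 0 || n > len(a.buf)-a.next {
		return nil
	}
	p := a.buf[a.next : a.next+n : a.next+n]
	a.next += n
	return p
}

// reset releases all allocations at once. There is deliberately no
// per-object free: the allocator keeps no record of individual chunks.
func (a *bumpAllocator) reset() { a.next = 0 }

func main() {
	a := &bumpAllocator{buf: make([]byte, 64)}
	x := a.alloc(16)
	y := a.alloc(16)
	fmt.Println(len(x), len(y), a.next) // 16 16 32
	a.reset()
	fmt.Println(a.next) // 0
}
```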
2014-07-30  runtime: rewrite malloc in Go.  (Keith Randall)
This change introduces gomallocgc, a Go clone of mallocgc. Only a few uses have been moved over, so there are still lots of uses from C. Many of these C uses will be moved over to Go (e.g. in slice.goc), but probably not all. What should remain of C's mallocgc is an open question. LGTM=rsc, dvyukov R=rsc, khr, dave, bradfitz, dvyukov CC=golang-codereviews https://golang.org/cl/108840046
2014-07-29  runtime: simpler and faster GC  (Dmitriy Vyukov)
Implement the design described in: https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub
Summary of the changes:
GC uses "2-bits per word" pointer type info embedded directly into bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).
Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction
LGTM=khr R=golang-codereviews, rsc, christoph, khr CC=golang-codereviews, rlh https://golang.org/cl/106260045
2014-07-17  undo CL 101570044 / 2c57aaea79c4  (Keith Randall)
redo stack allocation. This is mostly the same as the original CL with a few bug fixes.
1. add racemalloc() for stack allocations
2. fix poolalloc/poolfree to terminate free lists correctly.
3. adjust span ref count correctly.
4. don't use cache for sizes >= StackCacheSize.
Should fix bugs and memory leaks in original changelist.
««« original CL description
undo CL 104200047 / 318b04f28372
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a separate stack allocator. If we run out of stack during malloc, malloc will not be available to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the GC heap. Once they are out, we can get rid of the distinction between the allocated/blockboundary bits. (This will be in a separate change.)
Fixes #7468
Fixes #7424
LGTM=rsc, dvyukov R=golang-codereviews, dvyukov, khr, dave, rsc CC=golang-codereviews https://golang.org/cl/104200047
»»»
TBR=rsc CC=golang-codereviews https://golang.org/cl/101570044
»»»
LGTM=dvyukov R=dvyukov, dave, khr, alex.brainman CC=golang-codereviews https://golang.org/cl/112240044
2014-07-16  runtime: convert map implementation to Go.  (Keith Randall)
It's a bit slower, but not painfully so. There is still room for improvement (saving space so we can use nosplit, and removing the requirement for hash/eq stubs).
benchmark                             old ns/op    new ns/op    delta
BenchmarkMegMap                       23.5         24.2         +2.98%
BenchmarkMegOneMap                    14.9         15.7         +5.37%
BenchmarkMegEqMap                     71668        72234        +0.79%
BenchmarkMegEmptyMap                  4.05         4.93         +21.73%
BenchmarkSmallStrMap                  21.9         22.5         +2.74%
BenchmarkMapStringKeysEight_16        23.1         26.3         +13.85%
BenchmarkMapStringKeysEight_32        21.9         25.0         +14.16%
BenchmarkMapStringKeysEight_64        21.9         25.1         +14.61%
BenchmarkMapStringKeysEight_1M        21.9         25.0         +14.16%
BenchmarkIntMap                       21.8         12.5         -42.66%
BenchmarkRepeatedLookupStrMapKey32    39.3         30.2         -23.16%
BenchmarkRepeatedLookupStrMapKey1M    322353       322675       +0.10%
BenchmarkNewEmptyMap                  129          136          +5.43%
BenchmarkMapIter                      137          107          -21.90%
BenchmarkMapIterEmpty                 7.14         8.71         +21.99%
BenchmarkSameLengthMap                5.24         6.82         +30.15%
BenchmarkBigKeyMap                    34.5         35.3         +2.32%
BenchmarkBigValMap                    36.1         36.1         +0.00%
BenchmarkSmallKeyMap                  26.9         26.7         -0.74%
LGTM=rsc R=golang-codereviews, dave, dvyukov, rsc, gobot, khr CC=golang-codereviews https://golang.org/cl/99380043
2014-06-30  undo CL 104200047 / 318b04f28372  (Keith Randall)
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a separate stack allocator. If we run out of stack during malloc, malloc will not be available to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the GC heap. Once they are out, we can get rid of the distinction between the allocated/blockboundary bits. (This will be in a separate change.)
Fixes #7468
Fixes #7424
LGTM=rsc, dvyukov R=golang-codereviews, dvyukov, khr, dave, rsc CC=golang-codereviews https://golang.org/cl/104200047
»»»
TBR=rsc CC=golang-codereviews https://golang.org/cl/101570044
2014-06-30  runtime: stack allocator, separate from mallocgc  (Keith Randall)
In order to move malloc to Go, we need to have a separate stack allocator. If we run out of stack during malloc, malloc will not be available to allocate a new stack. Stacks are the last remaining FlagNoGC objects in the GC heap. Once they are out, we can get rid of the distinction between the allocated/blockboundary bits. (This will be in a separate change.) Fixes #7468 Fixes #7424 LGTM=rsc, dvyukov R=golang-codereviews, dvyukov, khr, dave, rsc CC=golang-codereviews https://golang.org/cl/104200047
2014-05-08  runtime: write memory profile statistics to the heap dump.  (Keith Randall)
LGTM=rsc R=rsc, khr CC=golang-codereviews https://golang.org/cl/97010043
2014-04-28  runtime: clean up scanning of Gs  (Keith Randall)
Use a real type for Gs instead of scanning them conservatively. Zero the schedlink pointer when it is dead. Update #7820 LGTM=rsc R=rsc, dvyukov CC=golang-codereviews https://golang.org/cl/89360043
2014-04-03  cmd/gc, runtime: make GODEBUG=gcdead=1 mode work with liveness  (Russ Cox)
Trying to make GODEBUG=gcdead=1 work with liveness and in particular ambiguously live variables.
1. In the liveness computation, mark all ambiguously live variables as live for the entire function, except the entry. They are zeroed directly after entry, and we need them not to be poisoned thereafter.
2. In the liveness computation, compute liveness (and deadness) for all parameters, not just pointer-containing parameters. Otherwise gcdead poisons untracked scalar parameters and results.
3. Fix liveness debugging print for -live=2 to use correct bitmaps. (Was not updated for compaction during compaction CL.)
4. Correct varkill during map literal initialization. Was killing the map itself instead of the inserted value temp.
5. Disable aggressive varkill cleanup for call arguments if the call appears in a defer or go statement.
6. In the garbage collector, avoid bug scanning empty strings. An empty string is two zeros. The multiword code only looked at the first zero and then interpreted the next two bits in the bitmap as an ordinary word bitmap. For a string the bits are 11 00, so if a live string was zero length with a 0 base pointer, the poisoning code treated the length as an ordinary word with code 00, meaning it needed poisoning, turning the string into a poison-length string with base pointer 0. By the same logic I believe that a live nil slice (bits 11 01 00) will have its cap poisoned. Always scan full multiword struct.
7. In the runtime, treat both poison words (PoisonGC and PoisonStack) as invalid pointers that warrant crashes.
Manual testing as follows:
- Create a script called gcdead on your PATH containing:
    #!/bin/bash
    GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std' to run all the tests.
Fixes #7676.
While here, enable the precise scanning of slices, since that was disabled due to bugs like these. That now works, both with and without gcdead.
Fixes #7549.
LGTM=khr R=khr CC=golang-codereviews https://golang.org/cl/83410044
2014-04-02  runtime: revert change to PoisonPtr value  (Russ Cox)
Submitted accidentally in CL 83630044. Fixes various builds. TBR=khr CC=golang-codereviews https://golang.org/cl/83100047
2014-04-02  cmd/gc, cmd/ld, runtime: compact liveness bitmaps  (Russ Cox)
Reduce footprint of liveness bitmaps by about 5x.
1. Mark all liveness bitmap symbols as 4-byte aligned (they were aligned to a larger size by default).
2. The bitmap data is a bitmap count n followed by n bitmaps. Each bitmap begins with its own count m giving the number of bits. All the m's are the same for the n bitmaps. Emit this bitmap length once instead of n times.
3. Many bitmaps within a function have the same bit values, but each call site was given a distinct bitmap. Merge duplicate bitmaps so that no bitmap is written more than once.
4. Many functions end up with the same aggregate bitmap data. We used to name the bitmap data funcname.gcargs and funcname.gclocals. Instead, name it gclocals.<md5 of data> and mark it dupok so that the linker coalesces duplicate sets. This cut the bitmap data remaining after step 3 by 40%; I was not expecting it to be quite so dramatic.
Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":
                    bitmaps            pclntab            binary on disk
before this CL      1326600            1985854            12738268
4-byte align        1154288 (0.87x)    1985854 (1.00x)    12566236 (0.99x)
one bitmap len      782528 (0.54x)     1985854 (1.00x)    12193500 (0.96x)
dedup bitmap        414748 (0.31x)     1948478 (0.98x)    11787996 (0.93x)
dedup bitmap set    245580 (0.19x)     1948478 (0.98x)    11620060 (0.91x)
While here, remove various dead blocks of code from plive.c.
Fixes #6929.
Fixes #7568.
LGTM=khr R=khr CC=golang-codereviews https://golang.org/cl/83630044
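Step 4 of the bitmap compaction is content addressing: identical data gets an identical symbol name, so duplicates collapse. A rough sketch of the idea, assuming a lowercase hex encoding of the MD5 digest (the commit message only says "gclocals.<md5 of data>") and using a map in place of the linker's dupok coalescing; `symbolName` and `dedup` are hypothetical names:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// symbolName derives a content-addressed name in the style
// "gclocals.<md5 of data>". The hex formatting is an assumption.
func symbolName(data []byte) string {
	return fmt.Sprintf("gclocals.%x", md5.Sum(data))
}

// dedup keeps one entry per distinct bitmap set, the way a linker
// keeps one copy of symbols that share a name and are marked dupok.
func dedup(sets [][]byte) map[string][]byte {
	out := make(map[string][]byte)
	for _, s := range sets {
		out[symbolName(s)] = s
	}
	return out
}

func main() {
	sets := [][]byte{{0x01, 0x02}, {0x01, 0x02}, {0x03}}
	fmt.Println(len(sets), "->", len(dedup(sets))) // 3 -> 2
}
```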
2014-04-01  runtime: adjust GODEBUG=allocfreetrace=1 and GODEBUG=gcdead=1  (Russ Cox)
GODEBUG=allocfreetrace=1:
The allocfreetrace=1 mode prints a stack trace for each block allocated and freed, and also a stack trace for each garbage collection.
It was implemented by reusing the heap profiling support: if allocfreetrace=1 then the heap profile was effectively running at 1 sample per 1 byte allocated (always sample). The stack being shown at allocation was the stack gathered for profiling, meaning it was derived only from the program counters and did not include information about function arguments or frame pointers. The stack being shown at free was the allocation stack, not the free stack. If you are generating this log, you can find the allocation stack yourself, but it can be useful to see exactly the sequence that led to freeing the block: was it the garbage collector or an explicit free? Now that the garbage collector runs on an m0 stack, the stack trace for the garbage collector was never interesting.
Fix all these problems:
1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation, and print the stack trace at time of free (not the allocation trace again) for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks means that you can see the exact point at which each goroutine was preempted, which is often useful for identifying liveness-related errors.
GODEBUG=gcdead=1:
This mode overwrites dead pointers with a poison value. Detect the poison value as an invalid pointer during collection, the same way that small integers are invalid pointers.
LGTM=khr R=khr CC=golang-codereviews https://golang.org/cl/81670043
2014-03-26  runtime: fix yet another race in bgsweep  (Dmitriy Vyukov)
Currently it's possible that bgsweep finishes before all spans have been swept (we only know that sweeping of all spans has *started*). In such case bgsweep may fail to wake up runfinq goroutine when it needs to. finq may still be nil at this point, but some finalizers may be queued later. Make bgsweep wait for sweeping to *complete*, then it can decide whether it needs to wake up runfinq for sure. Update #7533 LGTM=rsc R=rsc CC=golang-codereviews https://golang.org/cl/75960043
2014-03-25  runtime: WriteHeapDump dumps the heap to a file.  (Keith Randall)
See http://golang.org/s/go13heapdump for the file format. LGTM=rsc R=rsc, bradfitz, dvyukov, khr CC=golang-codereviews https://golang.org/cl/37540043
2014-03-25  runtime: redo stack map entries to avoid false retention  (Keith Randall)
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword
If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface
That way, during stack scanning we can check if a string is zero length or a slice has zero capacity. We can avoid following the contained pointer in those cases. It is safe to do so because it can never be dereferenced, and it is desirable to do so because it may cause false retention of the following block in memory.
Slice feature turned off until issue 7564 is fixed.
Update #7549
LGTM=rsc R=golang-codereviews, bradfitz, rsc CC=golang-codereviews https://golang.org/cl/76380043
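The two-bit encoding for stack map entries can be illustrated with a small pack/unpack pair. The entry values come from the commit message; the constant names, the `pack` and `entry` helpers, and the choice to store four entries per byte with the lowest bits first are assumptions of this sketch, not the runtime's layout.

```go
package main

import "fmt"

// Two-bit entry values as listed in the commit message.
const (
	bitsDead      = 0
	bitsScalar    = 1
	bitsPointer   = 2
	bitsMultiWord = 3
)

// Meaning of the following entry when the current one is bitsMultiWord.
const (
	bitsString = 0
	bitsSlice  = 1
	bitsIface  = 2
	bitsEface  = 3
)

// pack stores four two-bit entries per byte, lowest bits first.
func pack(entries []uint8) []byte {
	out := make([]byte, (len(entries)+3)/4)
	for i, e := range entries {
		out[i/4] |= (e & 3) << (uint(i%4) * 2)
	}
	return out
}

// entry reads back the i-th two-bit entry.
func entry(bitmap []byte, i int) uint8 {
	return (bitmap[i/4] >> (uint(i%4) * 2)) & 3
}

func main() {
	// A frame holding a pointer, a scalar, then a string
	// (a multiword marker followed by its kind).
	frame := []uint8{bitsPointer, bitsScalar, bitsMultiWord, bitsString}
	bm := pack(frame)
	for i := range frame {
		fmt.Print(entry(bm, i), " ")
	}
	fmt.Println()
}
```

A scanner reading entry 2 would see the multiword marker, consult entry 3 to learn it is a string, and could then skip the base pointer when the length word is zero.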
2014-03-25  runtime: accurately record whether heap memory is reserved  (Ian Lance Taylor)
The existing code did not have a clear notion of whether memory has been actually reserved. It checked based on whether in 32-bit mode or 64-bit mode and (on GNU/Linux) the requested address, but it confused the requested address and the returned address. LGTM=rsc R=rsc, dvyukov CC=golang-codereviews, michael.hudson https://golang.org/cl/79610043
2014-03-06  runtime: shrink bigger stacks without any copying.  (Keith Randall)
Instead, split the underlying storage in half and free just half of it. Shrinking without copying lets us reclaim storage used by a previously profligate goroutine that has now blocked inside some C code. To shrink in place, we need all stacks to be a power of 2 in size. LGTM=rsc R=golang-codereviews, rsc CC=golang-codereviews https://golang.org/cl/69580044
2014-03-06  runtime: fix malloc page alignment + efence  (Russ Cox)
Two memory allocator bug fixes.
- efence is not maintaining the proper heap metadata to make eventual memory reuse safe, so use SysFault.
- now that our heap PageSize is 8k but most hardware uses 4k pages, SysAlloc and SysReserve results must be explicitly aligned. Do that in a few more call sites and document this fact in malloc.h.
Fixes #7448.
LGTM=iant R=golang-codereviews, josharian, iant CC=dvyukov, golang-codereviews https://golang.org/cl/71750048
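The alignment requirement in the second fix comes down to rounding an address up to a multiple of the heap page size. A minimal sketch, assuming a power-of-two alignment; `alignUp` and `pageSize` are hypothetical names, not the runtime's identifiers:

```go
package main

import "fmt"

const pageSize = 8 << 10 // heap page size of 8K, per the commit message

// alignUp rounds p up to the next multiple of align.
// align must be a power of two.
func alignUp(p, align uintptr) uintptr {
	return (p + align - 1) &^ (align - 1)
}

func main() {
	// An address aligned to a 4K hardware page but not to the 8K heap page.
	p := uintptr(0x10001000)
	fmt.Printf("%#x -> %#x\n", p, alignUp(p, pageSize)) // 0x10001000 -> 0x10002000
}
```

An operating system that hands out 4K-aligned memory can return addresses like the one above, which is why the result has to be aligned explicitly before it is treated as a run of 8K heap pages.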
2014-02-26  runtime: grow stack by copying  (Keith Randall)
On stack overflow, if all frames on the stack are copyable, we copy the frames to a new stack twice as large as the old one. During GC, if a G is using less than 1/4 of its stack, copy the stack to a stack half its size.
TODO
- Do something about C frames. When a C frame is in the stack segment, it isn't copyable. We allocate a new segment in this case.
  - For idempotent C code, we can abort it, copy the stack, then retry. I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard to the lowest Go frame so the next call that Go frame makes triggers a copy, which will then succeed.
- Pick a starting stack size?
The plan is that eventually we reach a point where the stack contains only copyable frames.
LGTM=rsc R=dvyukov, rsc CC=golang-codereviews https://golang.org/cl/54650044
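The sizing policy for copied stacks (double on overflow, halve during GC when less than a quarter is in use) can be written down as two small functions. This is a sketch of the policy only, not of the copying itself; the function names and the `minSize` lower bound are hypothetical additions that the commit message does not mention.

```go
package main

import "fmt"

// growSize returns the size of the new stack after an overflow:
// twice the old one.
func growSize(size int) int { return size * 2 }

// shrinkSize returns the stack size to use after a GC cycle: half the
// current size if less than a quarter is in use, otherwise unchanged.
// minSize is an assumed lower bound so that stacks never shrink away.
func shrinkSize(used, size, minSize int) int {
	if used < size/4 && size/2 >= minSize {
		return size / 2
	}
	return size
}

func main() {
	size := 8192
	size = growSize(size) // overflow: 8192 -> 16384
	fmt.Println(size)
	size = shrinkSize(3000, size, 8192) // 3000 < 16384/4, so halve
	fmt.Println(size)
}
```

Doubling and halving keep every stack size a power of two when the starting size is one.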
2014-02-26  runtime: get rid of the settype buffer and lock.  (Keith Randall)
MCaches now hold a MSpan for each sizeclass which they have exclusive access to allocate from, so no lock is needed. Modifying the heap bitmaps also no longer requires a cas. runtime.free gets more expensive. But we don't use it much any more. It's not much faster on 1 processor, but it's a lot faster on multiple processors.
benchmark                 old ns/op    new ns/op    delta
BenchmarkSetTypeNoPtr1    24           23           -0.42%
BenchmarkSetTypeNoPtr2    33           34           +0.89%
BenchmarkSetTypePtr1      51           49           -3.72%
BenchmarkSetTypePtr2      55           54           -1.98%
benchmark                 old ns/op    new ns/op    delta
BenchmarkAllocation       52739        50770        -3.73%
BenchmarkAllocation-2     33957        34141        +0.54%
BenchmarkAllocation-3     33326        29015        -12.94%
BenchmarkAllocation-4     38105        25795        -32.31%
BenchmarkAllocation-5     68055        24409        -64.13%
BenchmarkAllocation-6     71544        23488        -67.17%
BenchmarkAllocation-7     68374        23041        -66.30%
BenchmarkAllocation-8     70117        20758        -70.40%
LGTM=rsc, dvyukov R=dvyukov, bradfitz, khr, rsc CC=golang-codereviews https://golang.org/cl/46810043
2014-02-20  runtime: use goc2c as much as possible  (Russ Cox)
Package runtime's C functions written to be called from Go started out written in C using carefully constructed argument lists and the FLUSH macro to write a result back to memory. For some functions, the appropriate parameter list ended up being architecture-dependent due to differences in alignment, so we added 'goc2c', which takes a .goc file containing Go func declarations but C bodies, rewrites the Go func declaration to equivalent C declarations for the target architecture, adds the needed FLUSH statements, and writes out an equivalent C file. That C file is compiled as part of package runtime. Native Client's x86-64 support introduces the most complex alignment rules yet, breaking many functions that could until now be portably written in C. Using goc2c for those avoids the breakage. Separately, Keith's work on emitting stack information from the C compiler would require the hand-written functions to add #pragmas specifying how many arguments are result parameters. Using goc2c for those avoids maintaining #pragmas. For both reasons, use goc2c for as many Go-called C functions as possible. This CL is a replay of the bulk of CL 15400047 and CL 15790043, both of which were reviewed as part of the NaCl port and are checked in to the NaCl branch. This CL is part of bringing the NaCl code into the main tree. No new code here, just reformatting and occasional movement into .h files. LGTM=r R=dave, alex.brainman, r CC=golang-codereviews https://golang.org/cl/65220044
2014-02-13  runtime: update malloc comment for MSpan.needzero  (Russ Cox)
Missed this suggestion in CL 57680046. LGTM=iant R=iant CC=golang-codereviews https://golang.org/cl/63390043
2014-02-13  runtime: introduce MSpan.needzero instead of writing to span data  (Russ Cox)
This cleans up the code significantly, and it avoids any possible problems with madvise zeroing out some but not all of the data. Fixes #6400. LGTM=dave R=dvyukov, dave CC=golang-codereviews https://golang.org/cl/57680046
2014-02-12  runtime: concurrent GC sweep  (Dmitriy Vyukov)
Moves sweep phase out of stoptheworld by adding background sweeper goroutine and lazy on-demand sweeping.
It turned out to be somewhat trickier than I expected, because there is no point in time when we know size of live heap nor consistent number of mallocs and frees. So everything related to next_gc, mprof, memstats, etc becomes trickier.
At the end of GC next_gc is conservatively set to heap_alloc*GOGC, which is much larger than real value. But after every sweep next_gc is decremented by freed*GOGC. So when everything is swept next_gc becomes what it should be.
For mprof I had to introduce 3-generation scheme (allocs, recent_allocs, prev_allocs), because by the end of GC we know number of frees for the *previous* GC.
Significant caution is required to not cross yet-unknown real value of next_gc. This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are returned to heap.
This provides quite strong guarantees that heap does not grow when it should not.
http-1
allocated         7036          7033          -0.04%
allocs            60            60            +0.00%
cputime           51050         46700         -8.52%
gc-pause-one      34060569      1777993       -94.78%
gc-pause-total    2554          133           -94.79%
latency-50        178448        170926        -4.22%
latency-95        284350        198294        -30.26%
latency-99        345191        220652        -36.08%
rss               101564416     101007360     -0.55%
sys-gc            6606832       6541296       -0.99%
sys-heap          88801280      87752704      -1.18%
sys-other         7334208       7405928       +0.98%
sys-stack         524288        524288        +0.00%
sys-total         103266608     102224216     -1.01%
time              50339         46533         -7.56%
virtual-mem       292990976     293728256     +0.25%
garbage-1
allocated         2983818       2990889       +0.24%
allocs            62880         62902         +0.03%
cputime           16480000      16190000      -1.76%
gc-pause-one      828462467     487875135     -41.11%
gc-pause-total    4142312       2439375       -41.11%
rss               1151709184    1153712128    +0.17%
sys-gc            66068352      66068352      +0.00%
sys-heap          1039728640    1039728640    +0.00%
sys-other         37776064      40770176      +7.93%
sys-stack         8781824       8781824       +0.00%
sys-total         1152354880    1155348992    +0.26%
time              16496998      16199876      -1.80%
virtual-mem       1409564672    1402281984    -0.52%
LGTM=rsc R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot CC=golang-codereviews, khr https://golang.org/cl/46430043
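The next_gc bookkeeping described in the concurrent sweep commit can be sketched as below. The commit message writes the scaling loosely as "heap_alloc*GOGC"; this sketch assumes it means a growth factor of (100+GOGC)/100, and the type `gcState` and its methods are hypothetical. The point is that a conservative trigger computed from the unswept heap converges to the right value once all freed bytes have been subtracted.

```go
package main

import "fmt"

// gcState tracks the heap trigger while sweeping is still in progress.
type gcState struct {
	gogc   uint64 // GOGC percentage
	nextGC uint64 // heap size at which the next collection starts
}

// scale applies the assumed growth factor (100+GOGC)/100.
func (s *gcState) scale(bytes uint64) uint64 {
	return bytes * (100 + s.gogc) / 100
}

// endOfMark sets the conservative trigger from the unswept heap size,
// which still includes garbage that has not been freed yet.
func (s *gcState) endOfMark(heapAlloc uint64) {
	s.nextGC = s.scale(heapAlloc)
}

// swept lowers the trigger once the bytes freed by a sweep are known.
func (s *gcState) swept(freed uint64) {
	s.nextGC -= s.scale(freed)
}

func main() {
	s := &gcState{gogc: 100}
	s.endOfMark(1000) // 1000 bytes allocated, live size unknown
	fmt.Println(s.nextGC)
	s.swept(300)
	s.swept(200)
	fmt.Println(s.nextGC) // equals scale(500), the trigger for the live heap
}
```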
2014-01-30  runtime: increase page size to 8K  (Dmitriy Vyukov)
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages. Only Chromium uses 4K pages today (in "slow but small" configuration). The general tendency is to increase page size, because it reduces metadata size and DTLB pressure. This change reduces GC pause by ~10% and slightly improves other metrics.
json-1
allocated         8037492       8038689       +0.01%
allocs            105762        105573        -0.18%
cputime           158400000     155800000     -1.64%
gc-pause-one      4412234       4135702       -6.27%
gc-pause-total    2647340       2398707       -9.39%
rss               54923264      54525952      -0.72%
sys-gc            3952624       3928048       -0.62%
sys-heap          46399488      46006272      -0.85%
sys-other         5597504       5290304       -5.49%
sys-stack         393216        393216        +0.00%
sys-total         56342832      55617840      -1.29%
time              158478890     156046916     -1.53%
virtual-mem       256548864     256593920     +0.02%
garbage-1
allocated         2991113       2986259       -0.16%
allocs            62844         62652         -0.31%
cputime           16330000      15860000      -2.88%
gc-pause-one      789108229     725555211     -8.05%
gc-pause-total    3945541       3627776       -8.05%
rss               1143660544    1132253184    -1.00%
sys-gc            65609600      65806208      +0.30%
sys-heap          1032388608    1035599872    +0.31%
sys-other         37501632      22777664      -39.26%
sys-stack         8650752       8781824       +1.52%
sys-total         1144150592    1132965568    -0.98%
time              16364602      15891994      -2.89%
virtual-mem       1327296512    1313746944    -1.02%
This is the exact reincarnation of already LGTMed: https://golang.org/cl/45770044 which must not break darwin/freebsd after: https://golang.org/cl/56630043
TBR=iant LGTM=khr, iant R=iant, khr CC=golang-codereviews https://golang.org/cl/58230043
2014-01-28  runtime: fix windows build  (Dmitriy Vyukov)
Currently windows crashes because early allocs in schedinit try to allocate tiny memory blocks, but m->p is not yet set up. I've considered calling procresize(1) earlier in schedinit, but this refactoring is better and must fix the issue as well. Fixes #7218. R=golang-codereviews, r CC=golang-codereviews https://golang.org/cl/54570045