aboutsummaryrefslogtreecommitdiff
path: root/src/internal/runtime/maps/map.go
diff options
context:
space:
mode:
authorMichael Pratt <mpratt@google.com>2024-04-22 15:48:57 -0400
committerGopher Robot <gobot@golang.org>2024-10-08 16:43:52 +0000
commit0733682e5ff4cd294f5eccb31cbe87a543147bc6 (patch)
tree85e0bb00c19fd019559ab6ed1009b197aa76db7a /src/internal/runtime/maps/map.go
parent13e9a55afd1e269504ac60143a67ffc8d0731bba (diff)
downloadgo-0733682e5ff4cd294f5eccb31cbe87a543147bc6.tar.xz
internal/runtime/maps: initial swiss table map implementation
Add a new package that will contain a new "Swiss Table" (https://abseil.io/about/design/swisstables) map implementation, which is intended to eventually replace the existing runtime map implementation. This implementation is based on the fabulous github.com/cockroachdb/swiss package contributed by Peter Mattis. This CL adds an hash map implementation. It supports all the core operations, but does not have incremental growth. For #54766. Change-Id: I52cf371448c3817d471ddb1f5a78f3513565db41 Reviewed-on: https://go-review.googlesource.com/c/go/+/582415 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
Diffstat (limited to 'src/internal/runtime/maps/map.go')
-rw-r--r--src/internal/runtime/maps/map.go128
1 files changed, 128 insertions, 0 deletions
diff --git a/src/internal/runtime/maps/map.go b/src/internal/runtime/maps/map.go
new file mode 100644
index 0000000000..0309e124fe
--- /dev/null
+++ b/src/internal/runtime/maps/map.go
@@ -0,0 +1,128 @@
+// Copyright 2024 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package maps implements Go's builtin map type.
+package maps
+
+// This package contains the implementation of Go's builtin map type.
+//
+// The map design is based on Abseil's "Swiss Table" map design
+// (https://abseil.io/about/design/swisstables), with additional modifications
+// to cover Go's additional requirements, discussed below.
+//
+// Terminology:
+// - Slot: A storage location of a single key/element pair.
+// - Group: A group of abi.SwissMapGroupSlots (8) slots, plus a control word.
+// - Control word: An 8-byte word which denotes whether each slot is empty,
+// deleted, or used. If a slot is used, its control byte also contains the
+// lower 7 bits of the hash (H2).
+// - H1: Upper 57 bits of a hash.
+// - H2: Lower 7 bits of a hash.
+// - Table: A complete "Swiss Table" hash table. A table consists of one or
+// more groups for storage plus metadata to handle operation and determining
+// when to grow.
+//
+// At its core, the table design is similar to a traditional open-addressed
+// hash table. Storage consists of an array of groups, which effectively means
+// an array of key/elem slots with some control words interspersed. Lookup uses
+// the hash to determine an initial group to check. If, due to collisions, this
+// group contains no match, the probe sequence selects the next group to check
+// (see below for more detail about the probe sequence).
+//
+// The key difference occurs within a group. In a standard open-addressed
+// linear probed hash table, we would check each slot one at a time to find a
+// match. A swiss table utilizes the extra control word to check all 8 slots in
+// parallel.
+//
+// Each byte in the control word corresponds to one of the slots in the group.
+// In each byte, 1 bit is used to indicate whether the slot is in use, or if it
+// is empty/deleted. The other 7 bits contain the lower 7 bits of the hash for
+// the key in that slot. See [ctrl] for the exact encoding.
+//
+// During lookup, we can use some clever bitwise manipulation to compare all 8
+// 7-bit hashes against the input hash in parallel (see [ctrlGroup.matchH2]).
+// That is, we effectively perform 8 steps of probing in a single operation.
+// With SIMD instructions, this could be extended to 16 slots with a 16-byte
+// control word.
+//
+// Since we only use 7 bits of the 64 bit hash, there is a 1 in 128 (~0.7%)
+// probability of false positive on each slot, but that's fine: we always need
+// double check each match with a standard key comparison regardless.
+//
+// Probing
+//
+// Probing is done using the upper 57 bits (H1) of the hash as an index into
+// the groups array. Probing walks through the groups using quadratic probing
+// until it finds a group with a match or a group with an empty slot. See
+// [probeSeq] for specifics about the probe sequence. Note the probe
+// invariants: the number of groups must be a power of two, and the end of a
+// probe sequence must be a group with an empty slot (the table can never be
+// 100% full).
+//
+// Deletion
+//
+// Probing stops when it finds a group with an empty slot. This affects
+// deletion: when deleting from a completely full group, we must not mark the
+// slot as empty, as there could be more slots used later in a probe sequence
+// and this deletion would cause probing to stop too early. Instead, we mark
+// such slots as "deleted" with a tombstone. If the group still has an empty
+// slot, we don't need a tombstone and directly mark the slot empty. Currently,
+// tombstone are only cleared during grow, as an in-place cleanup complicates
+// iteration.
+//
+// Growth
+//
+// When the table reaches the maximum load factor, it grows by allocating a new
+// groups array twice as big as before and reinserting all keys (the probe
+// sequence will differ with a larger array).
+// NOTE: Spoiler alert: A later CL supporting incremental growth will make each
+// table instance have an immutable group count. Growth will allocate a
+// completely new (bigger) table instance.
+//
+// Iteration
+//
+// Iteration is the most complex part of the map due to Go's generous iteration
+// semantics. A summary of semantics from the spec:
+// 1. Adding and/or deleting entries during iteration MUST NOT cause iteration
+// to return the same entry more than once.
+// 2. Entries added during iteration MAY be returned by iteration.
+// 3. Entries modified during iteration MUST return their latest value.
+// 4. Entries deleted during iteration MUST NOT be returned by iteration.
+// 5. Iteration order is unspecified. In the implementation, it is explicitly
+// randomized.
+//
+// If the map never grows, these semantics are straightforward: just iterate
+// over every group and every slot and these semantics all land as expected.
+//
+// If the map grows during iteration, things complicate significantly. First
+// and foremost, we need to track which entries we already returned to satisfy
+// (1), but the larger table has a completely different probe sequence and thus
+// different entry layout.
+//
+// We handle that by having the iterator keep a reference to the original table
+// groups array even after the table grows. We keep iterating over the original
+// groups to maintain the iteration order and avoid violating (1). Any new
+// entries added only to the new groups will be skipped (allowed by (2)). To
+// avoid violating (3) or (4), while we use the original groups to select the
+// keys, we must look them up again in the new groups to determine if they have
+// been modified or deleted. There is yet another layer of complexity if the
+// key does not compare equal itself. See [Iter.Next] for the gory details.
+//
+// NOTE: Spoiler alert: A later CL supporting incremental growth will make this
+// even more complicated. Yay!
+
+// Extracts the H1 portion of a hash: the 57 upper bits.
+// TODO(prattmic): what about 32-bit systems?
+func h1(h uintptr) uintptr {
+ return h >> 7
+}
+
+// Extracts the H2 portion of a hash: the 7 bits not used for h1.
+//
+// These are used as an occupied control byte.
+func h2(h uintptr) uintptr {
+ return h & 0x7f
+}
+
+type Map = table