| Age | Commit message (Collapse) | Author |
|
|
|
We did not set a limit on the maximum size of sparse maps in
the old GNU sparse format. Set a limit based on the cumulative
size of the extension blocks used to encode the map (consistent
with how we limit the sparse map size for other formats).
Add an additional limit to the total number of sparse file entries,
regardless of encoding, to all sparse formats.
Thanks to Colin Walters (walters@verbum.org),
Uuganbayar Lkhamsuren (https://github.com/uug4na),
and Jakub Ciolek for reporting this issue.
Fixes #78301
Fixes CVE-2026-32288
Change-Id: I84877345d7b41cc60c58771860ba70e16a6a6964
Reviewed-on: https://go-internal-review.googlesource.com/c/go/+/3901
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Roland Shoemaker <bracewell@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/763766
Auto-Submit: David Chase <drchase@google.com>
TryBot-Bypass: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
|
|
Change-Id: I0459f05e7f6abd9738813c65d993114e931720d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/731000
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
This avoids complaints from scanners that look for and open
tar and bz2 files, and complain if they look weird.
In this case, they do look weird, because they are intentionally strange.
This kind of thing shouldn't be necessary, but we already have the machinery
to do it so it's easy enough.
Fixes #76799
Change-Id: Ib302b3aef30108a1325f91fcb2d166f8e1863792
Reviewed-on: https://go-review.googlesource.com/c/go/+/729780
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
|
|
Sparse files in tar archives contain only the non-zero components
of the file. There are several different encodings for sparse
files. When reading GNU tar pax 1.0 sparse files, archive/tar did
not set a limit on the size of the sparse region data. A malicious
archive containing a large number of sparse blocks could cause
archive/tar to read an unbounded amount of data from the archive
into memory.
Since a malicious input can be highly compressable, a small
compressed input could cause very large allocations.
Cap the size of the sparse block data to the same limit used
for PAX headers (1 MiB).
Thanks to Harshit Gupta (Mr HAX) (https://www.linkedin.com/in/iam-harshit-gupta/)
for reporting this issue.
Fixes CVE-2025-58183
Fixes #75677
Change-Id: I70b907b584a7b8676df8a149a1db728ae681a770
Reviewed-on: https://go-internal-review.googlesource.com/c/go/+/2800
Reviewed-by: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Nicholas Husin <husin@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/709861
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Using MD5 for checksums in tests is an overkill, as MD5 is designed for
cryptographic purposes. Use hash/crc32 instead, which is designed for
detecting random data corruptions, aka checksums.
Change-Id: I03b30ed7f38fba2a2e59d06bd4133b495f64a013
Reviewed-on: https://go-review.googlesource.com/c/go/+/617675
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Replace reflect.DeepEqual with slices.Equal/maps.Equal, which is
much faster.
Clean up some unnecessary helper functions.
Change-Id: I9b94bd43886302b9b327539ab065a435ce0d75d9
GitHub-Last-Rev: b9ca21f165bcc5e45733e6a511a2344b1aa4a281
GitHub-Pull-Request: golang/go#67607
Reviewed-on: https://go-review.googlesource.com/c/go/+/587936
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
|
|
Add GODEBUG=tarinsecurepath=1 and GODEBUG=zipinsecurepath=1 settings
to disable file name validation.
For #55356.
Change-Id: Iaacdc629189493e7ea3537a81660215a59dd40a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/452495
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
Return a distinguishable error when reading an archive file
with a path that is:
- absolute
- escapes the current directory (../a)
- on Windows, a reserved name such as NUL
Users may ignore this error and proceed if they do not need name
sanitization or intend to perform it themselves.
Fixes #25849
Fixes #55356
Change-Id: Ieefa163f00384bc285ab329ea21a6561d39d8096
Reviewed-on: https://go-review.googlesource.com/c/go/+/449937
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Damien Neil <dneil@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
|
|
Set a 1MiB limit on special file blocks (PAX headers, GNU long names,
GNU link names), to avoid reading arbitrarily large amounts of data
into memory.
Thanks to Adam Korczynski (ADA Logics) and OSS-Fuzz for reporting
this issue.
Fixes CVE-2022-2879
For #54853
Change-Id: I85136d6ff1e0af101a112190e027987ab4335680
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1565555
Reviewed-by: Tatiana Bradley <tatianabradley@google.com>
Run-TryBot: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Roland Shoemaker <bracewell@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/439355
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
|
|
Change-Id: Id492ee4e614a38880a6a5830371dcd9a8b37129a
Reviewed-on: https://go-review.googlesource.com/c/go/+/422214
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: hopehook <hopehook@qq.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: hopehook <hopehook@qq.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
And then revert the bootstrap cmd directories and certain testdata.
And adjust tests as needed.
Not reverting the changes in std that are bootstrapped,
because some of those changes would appear in API docs,
and we want to use any consistently.
Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories
when preparing the bootstrap copy.
A few files changed as a result of running gofmt -w
not because of interface{} -> any but because they
hadn't been updated for the new //go:build lines.
Fixes #49884.
Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Many of the methods inside the archive/tar package don't need to be
exported. Doing so sets a bad precedent that it's OK to export methods
to indicate an internal public API. That's not a good idea in general,
because exported methods increase cognitive load when reading code:
the reader needs to consider whether the exported method might be used
via some external interface or reflection.
This CL should have no externally visible behaviour changes at all.
Change-Id: I94a63de5e6a28e9ac8a283325217349ebce4f308
Reviewed-on: https://go-review.googlesource.com/c/go/+/341410
Reviewed-by: Joe Tsai <joetsai@digital-static.net>
Trust: Joe Tsai <joetsai@digital-static.net>
Trust: Michael Knyszek <mknyszek@google.com>
|
|
As part of #42026, these helpers from io/ioutil were moved to os.
(ioutil.TempFile and TempDir became os.CreateTemp and MkdirTemp.)
Update the Go tree to use the preferred names.
As usual, code compiled with the Go 1.4 bootstrap toolchain
and code vendored from other sources is excluded.
ReadDir changes are in a separate CL, because they are not a
simple search and replace.
For #42026.
Change-Id: If318df0216d57e95ea0c4093b89f65e5b0ababb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/266365
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The old ioutil references are still valid, but update our code
to reflect best practices and get used to the new locations.
Code compiled with the bootstrap toolchain
(cmd/asm, cmd/dist, cmd/compile, debug/elf)
must remain Go 1.4-compatible and is excluded.
Also excluded vendored code.
For #41190.
Change-Id: I6d86f2bf7bc37a9d904b6cee3fe0c7af6d94d5b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/263142
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
|
|
The go/printer (and thus gofmt) uses a heuristic to determine
whether to break alignment between elements of an expression
list which is spread across multiple lines. The heuristic only
kicked in if the entry sizes (character length) was above a
certain threshold (20) and the ratio between the previous and
current entry size was above a certain value (4).
This heuristic worked reasonably most of the time, but also
led to unfortunate breaks in many cases where a single entry
was suddenly much smaller (or larger) then the previous one.
The behavior of gofmt was sufficiently mysterious in some of
these situations that many issues were filed against it.
The simplest solution to address this problem is to remove
the heuristic altogether and have a programmer introduce
empty lines to force different alignments if it improves
readability. The problem with that approach is that the
places where it really matters, very long tables with many
(hundreds, or more) entries, may be machine-generated and
not "post-processed" by a human (e.g., unicode/utf8/tables.go).
If a single one of those entries is overlong, the result
would be that the alignment would force all comments or
values in key:value pairs to be adjusted to that overlong
value, making the table hard to read (e.g., that entry may
not even be visible on screen and all other entries seem
spaced out too wide).
Instead, we opted for a slightly improved heuristic that
behaves much better for "normal", human-written code.
1) The threshold is increased from 20 to 40. This disables
the heuristic for many common cases yet even if the alignment
is not "ideal", 40 is not that many characters per line with
todays screens, making it very likely that the entire line
remains "visible" in an editor.
2) Changed the heuristic to not simply look at the size ratio
between current and previous line, but instead considering the
geometric mean of the sizes of the previous (aligned) lines.
This emphasizes the "overall picture" of the previous lines,
rather than a single one (which might be an outlier).
3) Changed the ratio from 4 to 2.5. Now that we ignore sizes
below 40, a ratio of 4 would mean that a new entry would have
to be 4 times bigger (160) or smaller (10) before alignment
would be broken. A ratio of 2.5 seems more sensible.
Applied updated gofmt to all of src and misc. Also tested
against several former issues that complained about this
and verified that the output for the given examples is
satisfactory (added respective test cases).
Some of the files changed because they were not gofmt-ed
in the first place.
For #644.
For #7335.
For #10392.
(and probably more related issues)
Fixes #22852.
Change-Id: I5e48b3d3b157a5cf2d649833b7297b33f43a6f6e
|
|
Change Reader to promote TypeRegA to TypeReg in headers, unless their
name have a trailing slash which is already promoted to TypeDir. This
will allow client code to handle just TypeReg instead both TypeReg and
TypeRegA.
Change Writer to promote TypeRegA to TypeReg or TypeDir in the headers
depending on whether the name has a trailing slash. This normalization
is motivated by the specification (in pax(1)):
0 represents a regular file. For backwards-compatibility, a
typeflag value of binary zero ( '\0' ) should be recognized as
meaning a regular file when extracting files from the
archive. Archives written with this version of the archive file
format create regular files with a typeflag value of the
ISO/IEC 646:1991 standard IRV '0'.
Fixes #22768.
Change-Id: I149ec55824580d446cdde5a0d7a0457ad7b03466
Reviewed-on: https://go-review.googlesource.com/85656
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Several usages of tar (reasonably) just use the Header.FileInfo
to determine the type of the header. However, the os.FileMode type
is not expressive enough to represent "files" that are not files
at all, but some form of metadata.
Thus, Header{Typeflag: TypeXGlobalHeader}.FileInfo().Mode().IsRegular()
reports true, even though the expected result may have been false.
To reduce (not eliminate) the possibility of failure for such usages,
use the placeholder filename from the global PAX headers.
Thus, in the event the user did not handle special "meta" headers
specifically, they will just be written to disk as a regular file.
As an example use case, the "git archive --format=tgz" command produces
an archive where the first "file" is a global PAX header with the
name "global_pax_header". For users that do not explicitly check
the Header.Typeflag field to ignore such headers, they may end up
extracting a file named "global_pax_header". While it is a bogus file,
it at least does not stop the extraction process.
Updates #22748
Change-Id: I28448b528dcfacb4e92311824c33c71b482f49c9
Reviewed-on: https://go-review.googlesource.com/78355
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This CL removes the following APIs:
type SparseEntry struct{ ... }
type Header struct{ SparseHoles []SparseEntry; ... }
func (*Header) DetectSparseHoles(f *os.File) error
func (*Header) PunchSparseHoles(f *os.File) error
func (*Reader) WriteTo(io.Writer) (int, error)
func (*Writer) ReadFrom(io.Reader) (int, error)
This API was added during the Go1.10 dev cycle, and are safe to remove.
The rationale for reverting is because Header.DetectSparseHoles and
Header.PunchSparseHoles are functionality that probably better belongs in
the os package itself.
The other API like Header.SparseHoles, Reader.WriteTo, and Writer.ReadFrom
perform no OS specific logic and only perform the actual business logic of
reading and writing sparse archives. Since we do know know what the API added to
package os may look like, we preemptively revert these non-OS specific changes
as well by simply commenting them out.
Updates #13548
Updates #22735
Change-Id: I77842acd39a43de63e5c754bfa1c26cc24687b70
Reviewed-on: https://go-review.googlesource.com/78030
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
The USTAR format says:
<<<
Implementors should be aware that the previous file format did not include
a mechanism to archive directory type files.
For this reason, the convention of using a filename ending with
<slash> was adopted to specify a directory on the archive.
>>>
In light of this suggestion, make the following changes:
* Writer.WriteHeader refuses to encode a header where a file that
is obviously a file-type has a trailing slash in the name.
* formatter.formatString avoids encoding a trailing slash in the event
that the string is truncated (the full string will be encoded elsewhere,
so stripping the slash is safe).
* Reader.Next treats a TypeRegA (which is the zero value of Typeflag)
as a TypeDir if the name has a trailing slash.
Change-Id: Ibf27aa8234cce2032d92e5e5b28546c2f2ae5ef6
Reviewed-on: https://go-review.googlesource.com/69293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The interfaces for io.Reader and io.Writer permit calling Read/Write
with an empty buffer. However, this condition is often not well tested
and can lead to bugs in various implementations of io.Reader and io.Writer.
For example, see #22028 for buggy io.Reader in the bzip2 package.
We reduce the likelihood of hitting these bugs by adjusting
regFileReader.Read and regFileWriter.Write to avoid performing
Read and Write calls when the buffer is known to be empty.
Fixes #22029
Change-Id: Ie4a26be53cf87bc4d2abd951fa005db5871cc75c
Reviewed-on: https://go-review.googlesource.com/66111
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
To support the efficient packing and extracting of sparse files,
add two new methods:
func Reader.WriteTo(io.Writer) (int64, error)
func Writer.ReadFrom(io.Reader) (int64, error)
If the current archive entry is sparse and the provided io.{Reader,Writer}
is also an io.Seeker, then use Seek to skip past the holes.
If the last region in a file entry is a hole, then we seek to 1 byte
before the EOF:
* for Reader.WriteTo to write a single byte
to ensure that the resulting filesize is correct.
* for Writer.ReadFrom to read a single byte
to verify that the input filesize is correct.
The downside of this approach is when the last region in the sparse file
is a hole. In the case of Reader.WriteTo, the 1-byte write will cause
the last fragment to have a single chunk allocated.
However, the goal of ReadFrom/WriteTo is *not* the ability to
exactly reproduce sparse files (in terms of the location of sparse holes),
but rather to provide an efficient way to create them.
File systems already impose their own restrictions on how the sparse file
will be created. Some filesystems (e.g., HFS+) don't support sparseness and
seeking forward simply causes the FS to write zeros. Other filesystems
have different chunk sizes, which will cause chunk allocations at boundaries
different from what was in the original sparse file. In either case,
it should not be a normal expectation of users that the location of holes
in sparse files exactly matches the source.
For users that really desire to have exact reproduction of sparse holes,
they can wrap os.File with their own io.WriteSeeker that discards the
final 1-byte write and uses File.Truncate to resize the file to the
correct size.
Other reasons we choose this approach over special-casing *os.File because:
* The Reader already has special-case logic for io.Seeker
* As much as possible, we want to decouple OS-specific logic from
Reader and Writer.
* This allows other abstractions over *os.File to also benefit from
the "skip past holes" logic.
* It is easier to test, since it is harder to mock an *os.File.
Updates #13548
Change-Id: I0a4f293bd53d13d154a946bc4a2ade28a6646f6a
Reviewed-on: https://go-review.googlesource.com/60872
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The PAX specification says the following:
<<<
'g' represents global extended header records for the following files in the archive.
The format of these extended header records shall be as described in pax Extended Header.
Each value shall affect all subsequent files that do not override that value
in their own extended header record and until another global extended header record
is reached that provides another value for the same field.
>>>
This CL adds support for parsing and composing global PAX records,
but intentionally does not provide support for automatically
persisting the global state across files.
Changes made:
* When Reader encounters a TypeXGlobalRecord header, it parses the
PAX records and returns them to the user ad-verbatim. Reader does not
store them in its state, ensuring it has no effect on future Next calls.
* When Writer receives a TypeXGlobalRecord header, it writes the
PAX records to the archive ad-verbatim. It does not store them in
its state, ensuring it has no effect on future WriteHeader calls.
* The restriction regarding empty record values is lifted since this
value is used to represent deletion in global headers.
Why provide raw support only:
* Some archives in the wild have a global header section (often empty)
and it is the user's responsibility to manually read and discard it's body.
The logic added here allows users to more easily skip over these sections.
* For users that do care about global headers, having access to the raw
records allows them to implement the functionality of global headers themselves
and manually persist the global state across files.
* We can still upgrade to a full implementation in the future.
Why we don't provide full support:
* Even though the PAX specification describes their operation in detail,
both the GNU and BSD tar tools (which are the most common implementations)
do not have a consistent interpretation of many details.
* Global headers were a controversial feature in PAX, by admission of the
specification itself:
<<<
The concept of a global extended header (typeflag g) was controversial.
The typeflag g global headers should not be used with interchange media that
could suffer partial data loss in transporting the archive.
>>>
* Having state persist from entry-to-entry complicates the implementation
for a feature that is not widely used and not well supported.
Change-Id: I1d904cacc2623ddcaa91525a5470b7dbe226c7e8
Reviewed-on: https://go-review.googlesource.com/59190
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
|
|
This CL adds the following new publicly visible API:
type Header struct { ...; PAXRecords map[string]string }
The new Header.PAXRecords field is a map of all PAX extended header records.
We suggest (but do not enforce) that users use VENDOR-prefixed keys
according to the following in the PAX specification:
<<<
The standard developers have reserved keyword name space for vendor extensions.
It is suggested that the format to be used is:
VENDOR.keyword
where VENDOR is the name of the vendor or organization in all uppercase letters.
>>>
When reading, the Header.PAXRecords is populated with all PAX records
encountered so far, including basic ones (e.g., "path", "mtime", etc).
When writing, the fields of Header will be merged into PAXRecords,
overwriting any records that may conflict.
Since PAXRecords is a more expressive feature than Xattrs and
is entirely a superset of Xattrs, we mark Xattrs as deprecated,
and steer users towards the new PAXRecords API.
The issue has a discussion about adding a Header.SetPAXRecord method
to help validate records and keep the Header fields in sync.
However, we do not include that in this CL since that helper
method can always be added in the future.
There is no support for global records.
Fixes #14472
Change-Id: If285a52749acc733476cf75a2c7ad15bc1542071
Reviewed-on: https://go-review.googlesource.com/58390
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The Reader and Writer are now at feature parity,
meaning that everything that can be parsed by the Reader,
can also be composed by the Writer.
This position enables us to support selection of the format
in a backwards compatible way, since it ensures that everything
that can be read can also be round-trip written.
As such, we add the following new API:
type Format int
const FormatUnknown Format = 0 ...
type Header struct { ...; Format Format }
The new Header.Format field is populated by the Reader on the
best guess on what the format is. Note that the Reader is very liberal
in what it permits, so a hybrid TAR file using aspects of multiple
formats can still be decoded, but will be reported as FormatUnknown.
Even though Reader has full support for V7 and basic support for STAR,
it will still report those formats as unknown (and the constants for
those formats are not even exported). The reasons for this is because
the Writer has no support for V7 or STAR. Leaving it as unknown allows
the Writer to choose a format usually USTAR or GNU that can encode
the equivalent Header.
When writing, the Header.allowedFormats will take the Format field
into consideration if it is a known format.
Fixes #18710
Change-Id: I00980c475d067c6969d3414e1ff0224fdd89cd49
Reviewed-on: https://go-review.googlesource.com/58230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This CL is the second step (of two; part1 is CL/56771) for adding
sparse file support to the Writer.
There are no new identifiers exported in this CL, but this does make
use of Header.SparseHoles added in part1. If the Typeflag is set to
TypeGNUSparse or len(SparseHoles) > 0, then the Writer will emit an
sparse file, where the holes must be written by the user as zeros.
If TypeGNUSparse is set, then the output file must use the GNU format.
Otherwise, it must use the PAX format (with GNU-defined PAX keys).
A future CL may export Reader.Discard and Writer.FillZeros,
but those methods are currently unexported, and only used by the
tests for efficiency reasons.
Calling Discard or FillZeros on a hole 10GiB in size does take
time, even if it is essentially a memcopy.
Updates #13548
Change-Id: Id586d9178c227c0577f796f731ae2cbb72355601
Reviewed-on: https://go-review.googlesource.com/57212
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This CL is the first step (of two) for adding sparse file support
to the Writer. This CL only refactors the logic of sparse-file handling
in the Reader so that common logic can be easily shared by the Writer.
As a result of this CL, there are some new publicly visible API changes:
type SparseEntry struct { Offset, Length int64 }
type Header struct { ...; SparseHoles []SparseEntry }
A new type is defined to represent a sparse fragment and a new field
Header.SparseHoles is added to represent the sparse holes in a file.
The API intentionally represent sparse files using hole fragments,
rather than data fragments so that the zero value of SparseHoles
naturally represents a normal file (i.e., a file without any holes).
The Reader now populates SparseHoles for sparse files.
It is necessary to export the sparse hole information, otherwise it would
be impossible for the Writer to specify that it is trying to encode
a sparse file, and what it looks like.
Some unexported helper functions were added to common.go:
func validateSparseEntries(sp []SparseEntry, size int64) bool
func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry
func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry
The validation logic that used to be in newSparseFileReader is now moved
to validateSparseEntries so that the Writer can use it in the future.
alignSparseEntries is currently unused by the Reader, but will be used
by the Writer in the future. Since TAR represents sparse files by
only recording the data fragments, we add the invertSparseEntries
function to convert a list of data fragments to a normalized list
of hole fragments (and vice-versa).
Some other high-level changes:
* skipUnread is deleted, where most of it's logic is moved to the
Discard methods on regFileReader and sparseFileReader.
* readGNUSparsePAXHeaders was rewritten to be simpler.
* regFileReader and sparseFileReader were completely rewritten
in simpler and easier to understand logic.
* A bug was fixed in sparseFileReader.Read where it failed to
report an error if the logical size of the file ends before
consuming all of the underlying data.
* The tests for sparse-file support was completely rewritten.
Updates #13548
Change-Id: Ic1233ae5daf3b3f4278fe1115d34a90c4aeaf0c2
Reviewed-on: https://go-review.googlesource.com/56771
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The GNU tar format defines the following type flags:
TypeGNULongName = 'L' // Next file has a long name
TypeGNULongLink = 'K' // Next file symlinks to a file w/ a long name
Anytime a string exceeds the field dedicated to store it, the GNU format
permits a fake "file" to be prepended where that file entry has a Typeflag
of 'L' or 'K' and the contents of the file is a NUL-terminated string.
Contrary to previous TODO comments,
the GNU format supports arbitrary strings (without NUL) rather UTF-8 strings.
The manual says the following:
<<<
The name, linkname, magic, uname, and gname are
null-terminated character strings
>>>
<<<
All characters in header blocks are represented
by using 8-bit characters in the local variant of ASCII.
>>>
From this description, we gather the following:
* We must forbid NULs in any GNU strings
* Any 8-bit value (other than NUL) is permitted
Since the modern world has moved to UTF-8, it is really difficult to
determine what a "local variant of ASCII" means. For this reason,
we treat strings as just an arbitrary binary string (without NUL)
and leave it to the user to determine the encoding of this string.
(Practically, it seems that UTF-8 is the typical encoding used
in GNU archives seen in the wild).
The implementation of GNU tar seems to confirm this interpretation
of the manual where it permits any arbitrary binary string to exist
within these fields so long as they do not contain the NUL character.
$ touch `echo -e "not\x80\x81\x82\x83utf8"`
$ gnutar -H gnu --tar -cvf gnu-not-utf8.tar $(echo -e "not\x80\x81\x82\x83utf8")
The fact that we permit arbitrary binary in GNU strings goes
hand-in-hand with the fact that GNU also permits a "base-256" encoding
of numeric fields, which is effectively two-complement binary.
Change-Id: Ic037ec6bed306d07d1312f0058594bd9b64d9880
Reviewed-on: https://go-review.googlesource.com/55573
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Both GNU and BSD tar do not care if the devmajor and devminor values are
set on entries (like regular files) that aren't character or block devices.
While this is non-sensible, it is more consistent with the Writer to actually
read these fields always. In a vast majority of the cases these will still
be zero. In the rare situation where someone actually cares about these,
at least information was not silently lost.
Change-Id: I6e4ba01cd897a1b13c28b1837e102a4fdeb420ba
Reviewed-on: https://go-review.googlesource.com/55572
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
USTAR and GNU strings are NUL-terminated. Thus, we should never
allow the NUL terminator, otherwise we will lose data round-trip.
Relevant specification text:
<<<
The fields magic, uname, and gname are character strings each terminated by a NUL character.
>>>
Technically, PAX keys and values should be UTF-8, but the observance
of invalid files in the wild causes us to be more liberal.
<<<
The <length> field, <blank>, <equals-sign>, and <newline> shown shall
be limited to the portable character set, as encoded in UTF-8.
>>>
Thus, we only reject NULs in PAX keys, and NULs for PAX values
representing the USTAR string fields (i.e., path, linkpath, uname, gname).
These are treated more strictly because they represent strings that
are typically represented as C-strings on POSIX systems.
Change-Id: I305b794d9d966faad852ff660bd0b3b0964e52bf
Reviewed-on: https://go-review.googlesource.com/14724
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Given that sparse file logic is not trivial, there should be a test
in TestPartialRead to ensure that partial reads work.
Change-Id: I913da3e331da06dca6758a8be3f5099abba233a6
Reviewed-on: https://go-review.googlesource.com/54430
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Prior to Go1.8, the Writer had a bug where it would output
an invalid tar file in certain rare situations because the logic
incorrectly believed that the old GNU format had a prefix field.
This is wrong and leads to an output file that mangles the
atime and ctime fields, which are often left unused.
In order to continue reading tar files created by former, buggy
versions of Go, we skeptically parse the atime and ctime fields.
If we are unable to parse them and the prefix field looks like
an ASCII string, then we fallback on the pre-Go1.8 behavior
of treating these fields as the USTAR prefix field.
Note that this will not use the fallback logic for all possible
files generated by a pre-Go1.8 toolchain. If the generated file
happened to have a prefix field that parses as valid
atime and ctime fields (e.g., when they are valid octal strings),
then it is impossible to distinguish between an valid GNU file
and an invalid pre-Go1.8 file.
Fixes #21005
Change-Id: Iebf5c67c08e0e46da6ee41a2e8b339f84030dd90
Reviewed-on: https://go-review.googlesource.com/53635
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
According to the GNU manual, the format is:
<<<
GNU.sparse.size=size
GNU.sparse.numblocks=numblocks
repeat numblocks times
GNU.sparse.offset=offset
GNU.sparse.numbytes=numbytes
end repeat
>>>
The logic in parsePAX converts the repeating sequence of
(offset, numbytes) pairs (which is not PAX compliant) into a single
comma-delimited list of numbers (which is now PAX compliant).
Thus, we validate the following:
* The (offset, numbytes) headers must come in the correct order.
* The ',' delimiter cannot appear in the value.
We do not validate that the value is a parsible decimal since that
will be determined later.
Change-Id: I8d6681021734eb997898227ae8603efb1e17c0c8
Reviewed-on: https://go-review.googlesource.com/31439
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Relevant PAX specification:
<<<
If the <value> field is zero length, it shall delete any header
block field, previously entered extended header value, or
global extended header value of the same name.
>>>
We don't delete global extender headers since the Reader doesn't
even support global headers (which the specification admits was
a controversial feature).
Change-Id: I2125a5c907b23a3dc439507ca90fa5dc47d474a9
Reviewed-on: https://go-review.googlesource.com/31440
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Took this opportunity to also embed tables in the functions
that they are actually used in and other stylistic cleanups.
There was no logical changes to the tests.
Change-Id: Ifa724060532175f6f4407d6cedc841891efd8f7b
Reviewed-on: https://go-review.googlesource.com/31436
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The GNU format does not have a prefix field, so we should make
no attempt to read it. It does however have atime and ctime fields.
Since Go previously placed incorrect values here, we liberally
read the atime and ctime fields and ignore errors so that old tar
files written by Go can at least be partially read.
This fixes half of #12594. The Writer is much harder to fix.
Updates #12594
Change-Id: Ia32845e2f262ee53366cf41dfa935f4d770c7a30
Reviewed-on: https://go-review.googlesource.com/31444
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
* Assert that the format is GNU.
Both GNU and STAR have some form of sparse file support with
incompatible header structures. Worse yet, both formats use the
'S' type flag to indicate the presence of a sparse file.
As such, we should check the format (based on magic numbers)
and fail early.
* Move realsize parsing logic into readOldGNUSparseMap.
This is related to the sparse parsing logic and belongs here.
* Fix the termination condition for parsing sparse fields.
The termination condition for reading the sparse fields
is to simply check if the first byte of the offset field is NULL.
This does not seem to be documented in the GNU manual, but this is
the check done by the both the GNU and BSD implementations:
http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=9a33077a7b7ad7d32815a21dee54eba63b38a81c#n731
https://github.com/libarchive/libarchive/blob/1fa9c7bf90f0862036a99896b0501c381584451a/libarchive/archive_read_support_format_tar.c#L2207
* Fix the parsing of sparse fields to use parseNumeric.
This is what GNU and BSD do. The previous two links show that
GNU and BSD both handle base-256 and base-8.
* Detect truncated streams.
The call to io.ReadFull does not check if the error is io.EOF.
Getting io.EOF in this method is never okay and should always be
converted to io.ErrUnexpectedEOF.
* Simplify the function.
The logic is essentially a do-while loop so we can remove
some redundant code.
Change-Id: Ib2f601b1a283eaec1e41b1d3396d649c80749c4e
Reviewed-on: https://go-review.googlesource.com/28471
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
|
|
Move all parse/format related functionality into strconv.go
and thoroughly test them. This also reduces the amount of noise
inside reader.go and writer.go.
There was zero functionality change other than moving code around.
Change-Id: I3bc288d10c20ebb3814b30b75d8acd7be62b85d7
Reviewed-on: https://go-review.googlesource.com/28470
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The use of PAX headers can modify the overall file size, thus the
formerly created regFileReader may be stale.
The relevant PAX specification for this behavior is:
<<<
Any fields in the preceding optional extended header shall override
the associated fields in this header block for this file.
>>>
Where "optional extended header" refers to the preceding PAX header.
Where "this header block" refers to the subsequent USTAR header.
Fixes #15573
Fixes #15564
Change-Id: I83b1c3f05a9ca2d3be38647425ad21a9fe450ee2
Reviewed-on: https://go-review.googlesource.com/28418
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The tar.Reader guarantees stickiness of errors. Ensuring this property means
that the methods of Reader need to be consistent about whose responsibility it
is to actually ensure that errors are sticky.
In this CL, we make it only the responsibility of the exported methods
(Next and Read) to store tr.err. All other methods just return the error as is.
As part of this change, we also check the error value of mergePAX (and test
that it properly detects invalid PAX files). Since the value of mergePAX was
never used before, we change it such that it always returns ErrHeader instead
of strconv.SyntaxError. This keeps it consistent with other usages of strconv
in the same tar package.
Change-Id: Ia1c31da71f1de4c175da89a385dec665d3edd167
Reviewed-on: https://go-review.googlesource.com/28215
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Motivation:
* Previous implementation did not detect integer overflow when
parsing a base-256 encoded field.
* Previous implementation did not treat the integer as a two's
complement value as specified by GNU.
The relevant GNU specification says:
<<<
GNU format uses two's-complement base-256 notation to store values
that do not fit into standard ustar range.
>>>
Fixes #12435
Change-Id: I4639bcffac8d12e1cb040b76bd05c9d7bc6c23a8
Reviewed-on: https://go-review.googlesource.com/17424
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Motivation for change:
* Recursive logic is hard to follow, since it tends to apply
things in reverse. On the other hand, the tar formats tend to
describe meta headers as affecting the next entry.
* Recursion also applies changes in the wrong order. Two test
files are attached that use multiple headers. The previous Go
behavior differs from what GNU and BSD tar do.
Change-Id: Ic1557256fc1363c5cb26570e5d0b9f65a9e57341
Reviewed-on: https://go-review.googlesource.com/14624
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Motivations for this change:
* It allows these functions to be used outside of Reader/Writer.
* It allows these functions to be more easily unit tested.
Change-Id: Iebe2b70bdb8744371c9ffa87c24316cbbf025b59
Reviewed-on: https://go-review.googlesource.com/15113
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Motivations:
* Use of strconv.ParseInt does not properly treat integers as 64bit,
preventing this function from working properly on 32bit machines.
* Use of io.ReadFull does not properly detect truncated streams
when the file suddenly ends on a block boundary.
* The function blindly trusts user input for numEntries and allocates
memory accordingly.
* The function does not validate that numEntries is not negative,
allowing a malicious sparse file to cause a panic during make.
In general, this function was overly complicated for what it was
accomplishing and it was hard to reason that it was free from
bounds errors. Instead, it has been rewritten and relies on
bytes.Buffer.ReadString to do the main work. So long as invariants
about the number of '\n' in the buffer are maintained, it is much
easier to see why this approach is correct.
Change-Id: Ibb12c4126c26e0ea460ea063cd17af68e3cf609e
Reviewed-on: https://go-review.googlesource.com/15174
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Certain special type-flags, specifically 1, 2, 3, 4, 5, 6,
do not have a data section. Thus, regardless of what the size field
says, we should not attempt to read any data for these special types.
The relevant PAX and USTAR specification says:
<<<
If the typeflag field is set to specify a file to be of type 1 (a link)
or 2 (a symbolic link), the size field shall be specified as zero.
If the typeflag field is set to specify a file of type 5 (directory),
the size field shall be interpreted as described under the definition
of that record type. No data logical records are stored for types 1, 2, or 5.
If the typeflag field is set to 3 (character special file),
4 (block special file), or 6 (FIFO), the meaning of the size field is
unspecified by this volume of POSIX.1-2008, and no data logical records shall
be stored on the medium.
Additionally, for type 6, the size field shall be ignored when reading.
If the typeflag field is set to any other value, the number of logical
records written following the header shall be (size+511)/512, ignoring
any fraction in the result of the division.
>>>
Contrary to the specification, we do not assert that the size field
is zero for type 1 and 2 since we liberally accept non-conforming formats.
Change-Id: I666b601597cb9d7a50caa081813d90ca9cfc52ed
Reviewed-on: https://go-review.googlesource.com/16614
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Motivation:
* Reader.skipUnread never reports io.ErrUnexpectedEOF. This is strange
given that io.ErrUnexpectedEOF is given through Reader.Read if the
user manually reads the file.
* Reader.skipUnread fails to detect truncated files since io.Seeker
is lazy about reporting errors. Thus, the behavior of Reader differs
whether the input io.Reader also satisfies io.Seeker or not.
To solve this, we seek to one before the end of the data section and
always rely on at least one call to io.CopyN. If the tr.r satisfies
io.Seeker, this is guarunteed to never read more than blockSize.
Fixes #12557
Change-Id: I0ddddfc6bed0d74465cb7e7a02b26f1de7a7a279
Reviewed-on: https://go-review.googlesource.com/15175
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Motivation:
* The logic to verify the numEntries can overflow and incorrectly
pass, allowing a malicious file to allocate arbitrary memory.
* The use of strconv.ParseInt does not set the integer precision
to 64bit, causing this code to work incorrectly on 32bit machines.
Change-Id: I1b1571a750a84f2dde97cc329ed04fe2342aaa60
Reviewed-on: https://go-review.googlesource.com/15173
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
A recursive call to Reader.Next did not check the error before
trying to use the result, leading to a nil pointer panic.
This specific CL addresses the immediate issue, which is the panic,
but does not solve the root issue, which is due to an integer
overflow in the base-256 parser.
Updates #12435
Change-Id: Ia908671f0f411a409a35e24f2ebf740d46734072
Reviewed-on: https://go-review.googlesource.com/15437
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Motivation:
* There are an increasing number of "one-off" corrupt files added
to make sure that package does not succeed or crash on them.
Instead, allow for the test to specify the error that is expected
to occur (if any).
* Also, fold in the logic to check the MD5 checksum into this
function.
The following tests are being removed:
* TestIncrementalRead: Done by TestReader by using io.CopyBuffer
with a buffer of 8. This achieves the same behavior as this test.
* TestSparseEndToEnd: Since TestReader checks the MD5 checksums
if the input corpus provides them, then this is redundant.
* TestSparseIncrementalRead: Redundant for the same reasons that
TestIncrementalRead is now redundant
* TestNegativeHdrSize: Added to TestReader corpus
* TestIssue10968: Added to TestReader corpus
* TestIssue11169: Added to TestReader corpus
With this change, code coverage did not change: 85.3%
Change-Id: I8550d48657d4dbb8f47dfc3dc280758ef73b47ec
Reviewed-on: https://go-review.googlesource.com/15176
Reviewed-by: Andrew Gerrand <adg@golang.org>
|
|
The sparseFileReader is prone to two different forms of
denial-of-service attacks:
* A malicious tar file can cause an infinite loop
* A malicious tar file can cause arbitrary panics
This results because of poor error checking/handling, which this
CL fixes. While we are at it, add a plethora of unit tests to
test for possible malicious inputs.
Change-Id: I2f9446539d189f3c1738a1608b0ad4859c1be929
Reviewed-on: https://go-review.googlesource.com/15115
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|