path: root/src/archive/tar/reader.go
Age | Commit message | Author
7 days | archive/tar: limit the number of old GNU sparse format entries | Damien Neil
We did not set a limit on the maximum size of sparse maps in the old GNU sparse format. Set a limit based on the cumulative size of the extension blocks used to encode the map (consistent with how we limit the sparse map size for other formats). Add an additional limit to the total number of sparse file entries, regardless of encoding, to all sparse formats. Thanks to Colin Walters (walters@verbum.org), Uuganbayar Lkhamsuren (https://github.com/uug4na), and Jakub Ciolek for reporting this issue. Fixes #78301 Fixes CVE-2026-32288 Change-Id: I84877345d7b41cc60c58771860ba70e16a6a6964 Reviewed-on: https://go-internal-review.googlesource.com/c/go/+/3901 Reviewed-by: Damien Neil <dneil@google.com> Reviewed-by: Roland Shoemaker <bracewell@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/763766 Auto-Submit: David Chase <drchase@google.com> TryBot-Bypass: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
2025-10-07 | archive/tar: set a limit on the size of GNU sparse file 1.0 regions | Damien Neil
Sparse files in tar archives contain only the non-zero components of the file. There are several different encodings for sparse files. When reading GNU tar pax 1.0 sparse files, archive/tar did not set a limit on the size of the sparse region data. A malicious archive containing a large number of sparse blocks could cause archive/tar to read an unbounded amount of data from the archive into memory. Since a malicious input can be highly compressible, a small compressed input could cause very large allocations. Cap the size of the sparse block data to the same limit used for PAX headers (1 MiB). Thanks to Harshit Gupta (Mr HAX) (https://www.linkedin.com/in/iam-harshit-gupta/) for reporting this issue. Fixes CVE-2025-58183 Fixes #75677 Change-Id: I70b907b584a7b8676df8a149a1db728ae681a770 Reviewed-on: https://go-internal-review.googlesource.com/c/go/+/2800 Reviewed-by: Roland Shoemaker <bracewell@google.com> Reviewed-by: Nicholas Husin <husin@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/709861 Auto-Submit: Michael Pratt <mpratt@google.com> TryBot-Bypass: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-03-08 | archive/tar: use built-in clear to simplify code | apocelipes
Change-Id: I0e55dd68d92c39aba511b55368bf50d929d75f86 GitHub-Last-Rev: 17430140783db8bf3354304c8f28d6826186c6cb GitHub-Pull-Request: golang/go#66158 Reviewed-on: https://go-review.googlesource.com/c/go/+/569696 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-10-13 | archive: add available godoc link | cui fliter
Change-Id: I813aa09f8a65936796469fa637d0f23004d26098 Reviewed-on: https://go-review.googlesource.com/c/go/+/534757 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Joseph Tsai <joetsai@digital-static.net> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: shuang cui <imcusg@gmail.com>
2023-01-19 | internal/godebug: export non-default-behavior counters in runtime/metrics | Russ Cox
Allow GODEBUG users to report how many times a setting resulted in non-default behavior. Record non-default-behaviors for all existing GODEBUGs. Also rework tests to ensure that runtime is in sync with runtime/metrics.All, and generate docs mechanically from metrics.All. For #56986. Change-Id: Iefa1213e2a5c3f19ea16cd53298c487952ef05a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/453618 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-01-17 | archive/tar, archive/zip: document ErrInsecurePath and GODEBUG setting | Russ Cox
These are mentioned in the release notes but not the actual doc comments. Nothing should exist only in release notes. Change-Id: I8d10f25a2c9b2677231929ba3f393af9034b777b Reviewed-on: https://go-review.googlesource.com/c/go/+/462195 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Damien Neil <dneil@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-12-21 | archive/tar, archive/zip: revert documentation of ErrInsecurePath | Damien Neil
CL 452616 disables path security checks by default, enabling them only when GODEBUG=tarinsecurepath=0 or GODEBUG=zipinsecurepath=0 is set. Remove now-obsolete documentation of the path checks. For #55356 Change-Id: I4ae57534efe9e27368d5e67773a502dd0e56eff4 Reviewed-on: https://go-review.googlesource.com/c/go/+/458875 Reviewed-by: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Damien Neil <dneil@google.com>
2022-12-03 | all: fix some comments for method | cui fliter
Change-Id: I4cff6b2a1fed6acdf754539c3c53a61eaa3b3f84 Reviewed-on: https://go-review.googlesource.com/c/go/+/450176 Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-11-22 | archive/tar, archive/zip: disable ErrInsecurePath by default | Damien Neil
This change is being made late in the release cycle. Disable it by default. Insecure path checks may be enabled by setting GODEBUG=tarinsecurepath=0 or GODEBUG=zipinsecurepath=0. We can enable this by default in Go 1.21 after publicizing the change more broadly and giving users a chance to adapt to the change. For #55356. Change-Id: I549298b3c85d6c8c7fd607c41de1073083f79b1d Reviewed-on: https://go-review.googlesource.com/c/go/+/452616 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Damien Neil <dneil@google.com> Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Damien Neil <dneil@google.com>
2022-11-21 | archive/tar, archive/zip: disable insecure file name checks with GODEBUG | Damien Neil
Add GODEBUG=tarinsecurepath=1 and GODEBUG=zipinsecurepath=1 settings to disable file name validation. For #55356. Change-Id: Iaacdc629189493e7ea3537a81660215a59dd40a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/452495 Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Damien Neil <dneil@google.com> Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>
2022-11-18 | all: add missing periods in comments | cui fliter
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29 Reviewed-on: https://go-review.googlesource.com/c/go/+/449757 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joedian Reid <joedian@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-11-16 | archive/tar, archive/zip: return ErrInsecurePath for unsafe paths | Damien Neil
Return a distinguishable error when reading an archive file with a path that is: - absolute - escapes the current directory (../a) - on Windows, a reserved name such as NUL Users may ignore this error and proceed if they do not need name sanitization or intend to perform it themselves. Fixes #25849 Fixes #55356 Change-Id: Ieefa163f00384bc285ab329ea21a6561d39d8096 Reviewed-on: https://go-review.googlesource.com/c/go/+/449937 Reviewed-by: Joseph Tsai <joetsai@digital-static.net> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Damien Neil <dneil@google.com> Auto-Submit: Damien Neil <dneil@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org>
2022-10-05 | archive/tar: limit size of headers | Damien Neil
Set a 1MiB limit on special file blocks (PAX headers, GNU long names, GNU link names), to avoid reading arbitrarily large amounts of data into memory. Thanks to Adam Korczynski (ADA Logics) and OSS-Fuzz for reporting this issue. Fixes CVE-2022-2879 For #54853 Change-Id: I85136d6ff1e0af101a112190e027987ab4335680 Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1565555 Reviewed-by: Tatiana Bradley <tatianabradley@google.com> Run-TryBot: Roland Shoemaker <bracewell@google.com> Reviewed-by: Roland Shoemaker <bracewell@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/439355 Reviewed-by: Damien Neil <dneil@google.com> Run-TryBot: Roland Shoemaker <roland@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Roland Shoemaker <roland@golang.org>
2022-04-11 | all: gofmt main repo | Russ Cox
[This CL is part of a sequence implementing the proposal #51082. The design doc is at https://go.dev/s/godocfmt-design.] Run the updated gofmt, which reformats doc comments, on the main repository. Vendored files are excluded. For #51082. Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407 Reviewed-on: https://go-review.googlesource.com/c/go/+/384268 Reviewed-by: Jonathan Amsterdam <jba@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-08-19 | archive/tar: unexport internal methods | Roger Peppe
Many of the methods inside the archive/tar package don't need to be exported. Doing so sets a bad precedent that it's OK to export methods to indicate an internal public API. That's not a good idea in general, because exported methods increase cognitive load when reading code: the reader needs to consider whether the exported method might be used via some external interface or reflection. This CL should have no externally visible behaviour changes at all. Change-Id: I94a63de5e6a28e9ac8a283325217349ebce4f308 Reviewed-on: https://go-review.googlesource.com/c/go/+/341410 Reviewed-by: Joe Tsai <joetsai@digital-static.net> Trust: Joe Tsai <joetsai@digital-static.net> Trust: Michael Knyszek <mknyszek@google.com>
2020-10-20 | all: update references to symbols moved from io/ioutil to io | Russ Cox
The old ioutil references are still valid, but update our code to reflect best practices and get used to the new locations. Code compiled with the bootstrap toolchain (cmd/asm, cmd/dist, cmd/compile, debug/elf) must remain Go 1.4-compatible and is excluded. Also excluded vendored code. For #41190. Change-Id: I6d86f2bf7bc37a9d904b6cee3fe0c7af6d94d5b1 Reviewed-on: https://go-review.googlesource.com/c/go/+/263142 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2019-12-10 | all: fix a number of misuses of the word "an" | Daniel Martí
After golang.org/cl/210124, I wondered if the same error had gone unnoticed elsewhere. I quickly spotted another dozen mistakes after reading through the output of: git grep '\<[Aa]n [bcdfgjklmnpqrtvwyz][a-z]' Many results are false positives for acronyms like "an mtime", since it's pronounced "an em-time". However, the total amount of output isn't that large given how simple the grep pattern is. Change-Id: Iaa2ca69e42f4587a9e3137d6c5ed758887906ca6 Reviewed-on: https://go-review.googlesource.com/c/go/+/210678 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Zach Jones <zachj1@gmail.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-25 | archive/tar: remove loop label from reader | Agniva De Sarker
CL 14624 introduced this label. At that time, the switch-case had a break to label statement which made this necessary. But now, the code no longer has a break statement and it directly returns. Hence, it is no longer necessary to have a label. Change-Id: Idde0fcc4d2db2d76424679f5acfe33ab8573bce4 Reviewed-on: https://go-review.googlesource.com/96935 Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-02-18 | all: remove "the" duplications | Kunpei Sakai
Change-Id: I1f25b11fb9b7cd3c09968ed99913dc85db2025ef Reviewed-on: https://go-review.googlesource.com/94976 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-13 | archive/tar: automatically promote TypeRegA | Caio Marcelo de Oliveira Filho
Change Reader to promote TypeRegA to TypeReg in headers, unless their name has a trailing slash, in which case they are already promoted to TypeDir. This will allow client code to handle just TypeReg instead of both TypeReg and TypeRegA. Change Writer to promote TypeRegA to TypeReg or TypeDir in the headers depending on whether the name has a trailing slash. This normalization is motivated by the specification (in pax(1)): 0 represents a regular file. For backwards-compatibility, a typeflag value of binary zero ( '\0' ) should be recognized as meaning a regular file when extracting files from the archive. Archives written with this version of the archive file format create regular files with a typeflag value of the ISO/IEC 646:1991 standard IRV '0'. Fixes #22768. Change-Id: I149ec55824580d446cdde5a0d7a0457ad7b03466 Reviewed-on: https://go-review.googlesource.com/85656 Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-29 | archive/tar: use placeholder name for global PAX records | Joe Tsai
Several usages of tar (reasonably) just use the Header.FileInfo to determine the type of the header. However, the os.FileMode type is not expressive enough to represent "files" that are not files at all, but some form of metadata. Thus, Header{Typeflag: TypeXGlobalHeader}.FileInfo().Mode().IsRegular() reports true, even though the expected result may have been false. To reduce (not eliminate) the possibility of failure for such usages, use the placeholder filename from the global PAX headers. Thus, in the event the user did not handle special "meta" headers specifically, they will just be written to disk as a regular file. As an example use case, the "git archive --format=tgz" command produces an archive where the first "file" is a global PAX header with the name "global_pax_header". For users that do not explicitly check the Header.Typeflag field to ignore such headers, they may end up extracting a file named "global_pax_header". While it is a bogus file, it at least does not stop the extraction process. Updates #22748 Change-Id: I28448b528dcfacb4e92311824c33c71b482f49c9 Reviewed-on: https://go-review.googlesource.com/78355 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-16 | archive/tar: partially revert sparse file support | Joe Tsai
This CL removes the following APIs: type SparseEntry struct{ ... } type Header struct{ SparseHoles []SparseEntry; ... } func (*Header) DetectSparseHoles(f *os.File) error func (*Header) PunchSparseHoles(f *os.File) error func (*Reader) WriteTo(io.Writer) (int64, error) func (*Writer) ReadFrom(io.Reader) (int64, error) This API was added during the Go1.10 dev cycle and is safe to remove. The rationale for reverting is that Header.DetectSparseHoles and Header.PunchSparseHoles are functionality that probably belongs in the os package itself. The other APIs (Header.SparseHoles, Reader.WriteTo, and Writer.ReadFrom) perform no OS-specific logic and only perform the actual business logic of reading and writing sparse archives. Since we do not know what the API added to package os may look like, we preemptively revert these non-OS-specific changes as well by simply commenting them out. Updates #13548 Updates #22735 Change-Id: I77842acd39a43de63e5c754bfa1c26cc24687b70 Reviewed-on: https://go-review.googlesource.com/78030 Reviewed-by: Russ Cox <rsc@golang.org>
2017-11-07 | archive/tar: a cosmetic fix after checking by golint | Stanislav Afanasev
Existing methods regFileReader.LogicalRemaining and regFileReader.PhysicalRemaining have receiver names inconsistent with the previous name. Change-Id: Ief2024716737eaf482c4311f3fdf77d92801c36e Reviewed-on: https://go-review.googlesource.com/76430 Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-10-10 | archive/tar: improve handling of directory paths | Joe Tsai
The USTAR format says: <<< Implementors should be aware that the previous file format did not include a mechanism to archive directory type files. For this reason, the convention of using a filename ending with <slash> was adopted to specify a directory on the archive. >>> In light of this suggestion, make the following changes: * Writer.WriteHeader refuses to encode a header where a file that is obviously a file-type has a trailing slash in the name. * formatter.formatString avoids encoding a trailing slash in the event that the string is truncated (the full string will be encoded elsewhere, so stripping the slash is safe). * Reader.Next treats a TypeRegA (which is the zero value of Typeflag) as a TypeDir if the name has a trailing slash. Change-Id: Ibf27aa8234cce2032d92e5e5b28546c2f2ae5ef6 Reviewed-on: https://go-review.googlesource.com/69293 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-03 | archive/tar: fix typo in documentation | Joe Tsai
s/TypeSymLink/TypeSymlink/g Change-Id: I2550843248eb27d90684d0036fe2add0b247ae5a Reviewed-on: https://go-review.googlesource.com/67810 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-25 | archive/tar: avoid empty IO operations | Joe Tsai
The interfaces for io.Reader and io.Writer permit calling Read/Write with an empty buffer. However, this condition is often not well tested and can lead to bugs in various implementations of io.Reader and io.Writer. For example, see #22028 for buggy io.Reader in the bzip2 package. We reduce the likelihood of hitting these bugs by adjusting regFileReader.Read and regFileWriter.Write to avoid performing Read and Write calls when the buffer is known to be empty. Fixes #22029 Change-Id: Ie4a26be53cf87bc4d2abd951fa005db5871cc75c Reviewed-on: https://go-review.googlesource.com/66111 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-18 | archive/tar: add Reader.WriteTo and Writer.ReadFrom | Joe Tsai
To support the efficient packing and extracting of sparse files, add two new methods: func Reader.WriteTo(io.Writer) (int64, error) func Writer.ReadFrom(io.Reader) (int64, error) If the current archive entry is sparse and the provided io.{Reader,Writer} is also an io.Seeker, then use Seek to skip past the holes. If the last region in a file entry is a hole, then we seek to 1 byte before the EOF: * for Reader.WriteTo to write a single byte to ensure that the resulting filesize is correct. * for Writer.ReadFrom to read a single byte to verify that the input filesize is correct. The downside of this approach appears when the last region in the sparse file is a hole. In the case of Reader.WriteTo, the 1-byte write will cause the last fragment to have a single chunk allocated. However, the goal of ReadFrom/WriteTo is *not* the ability to exactly reproduce sparse files (in terms of the location of sparse holes), but rather to provide an efficient way to create them. File systems already impose their own restrictions on how the sparse file will be created. Some filesystems (e.g., HFS+) don't support sparseness and seeking forward simply causes the FS to write zeros. Other filesystems have different chunk sizes, which will cause chunk allocations at boundaries different from what was in the original sparse file. In either case, it should not be a normal expectation of users that the location of holes in sparse files exactly matches the source. Users who need exact reproduction of sparse holes can wrap os.File with their own io.WriteSeeker that discards the final 1-byte write and uses File.Truncate to resize the file to the correct size. Other reasons we chose this approach over special-casing *os.File: * The Reader already has special-case logic for io.Seeker. * As much as possible, we want to decouple OS-specific logic from Reader and Writer. * This allows other abstractions over *os.File to also benefit from the "skip past holes" logic.
* It is easier to test, since it is harder to mock an *os.File. Updates #13548 Change-Id: I0a4f293bd53d13d154a946bc4a2ade28a6646f6a Reviewed-on: https://go-review.googlesource.com/60872 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 | archive/tar: minor doc fixes | Joe Tsai
Use "file" consistently instead of "entry". Change-Id: Ia81c9665d0d956adb78f7fa49de40cdb87fba000 Reviewed-on: https://go-review.googlesource.com/60150 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 | archive/tar: improve package documentation | Joe Tsai
Many aspects of the package are woefully undocumented. With the recent flurry of improvements, the package is now at feature parity with the GNU and TAR tools. Thoroughly document all of the public API and perform some minor stylistic cleanup in some code segments. Change-Id: Ic892fd72c587f30dfe91d1b25b88c9c8048cc389 Reviewed-on: https://go-review.googlesource.com/59210 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 | archive/tar: add raw support for global PAX records | Joe Tsai
The PAX specification says the following: <<< 'g' represents global extended header records for the following files in the archive. The format of these extended header records shall be as described in pax Extended Header. Each value shall affect all subsequent files that do not override that value in their own extended header record and until another global extended header record is reached that provides another value for the same field. >>> This CL adds support for parsing and composing global PAX records, but intentionally does not provide support for automatically persisting the global state across files. Changes made: * When Reader encounters a TypeXGlobalRecord header, it parses the PAX records and returns them to the user verbatim. Reader does not store them in its state, ensuring it has no effect on future Next calls. * When Writer receives a TypeXGlobalRecord header, it writes the PAX records to the archive verbatim. It does not store them in its state, ensuring it has no effect on future WriteHeader calls. * The restriction regarding empty record values is lifted since this value is used to represent deletion in global headers. Why provide raw support only: * Some archives in the wild have a global header section (often empty) and it is the user's responsibility to manually read and discard its body. The logic added here allows users to more easily skip over these sections. * For users that do care about global headers, having access to the raw records allows them to implement the functionality of global headers themselves and manually persist the global state across files. * We can still upgrade to a full implementation in the future. Why we don't provide full support: * Even though the PAX specification describes their operation in detail, both the GNU and BSD tar tools (which are the most common implementations) do not have a consistent interpretation of many details.
* Global headers were a controversial feature in PAX, by admission of the specification itself: <<< The concept of a global extended header (typeflag g) was controversial. The typeflag g global headers should not be used with interchange media that could suffer partial data loss in transporting the archive. >>> * Having state persist from entry-to-entry complicates the implementation for a feature that is not widely used and not well supported. Change-Id: I1d904cacc2623ddcaa91525a5470b7dbe226c7e8 Reviewed-on: https://go-review.googlesource.com/59190 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
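Since the Reader hands back global records raw and keeps no state, a caller that wants the PAX semantics must persist them itself. A minimal sketch of that bookkeeping, assuming the records come from Header.PAXRecords of each TypeXGlobalHeader entry; applyGlobal and effectiveRecords are hypothetical helpers, and the "EXAMPLE." key is a placeholder. An empty value is treated as a deletion, as the message above describes.

```go
package main

import "fmt"

// applyGlobal folds the raw records of one global PAX header into persistent
// state. An empty value deletes the key; any other value overrides it for all
// subsequent files. Hypothetical helper, not part of archive/tar.
func applyGlobal(state, records map[string]string) {
	for k, v := range records {
		if v == "" {
			delete(state, k)
		} else {
			state[k] = v
		}
	}
}

// effectiveRecords combines the global state with one file's own records
// without mutating state; local records win, and an empty local value
// removes the key.
func effectiveRecords(state, local map[string]string) map[string]string {
	out := make(map[string]string, len(state)+len(local))
	for k, v := range state {
		out[k] = v
	}
	for k, v := range local {
		if v == "" {
			delete(out, k)
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	state := map[string]string{}
	applyGlobal(state, map[string]string{"comment": "release", "EXAMPLE.owner": "ops"})
	applyGlobal(state, map[string]string{"comment": ""}) // a later global header deletes "comment"
	fmt.Println(effectiveRecords(state, map[string]string{"EXAMPLE.owner": "dev"}))
}
```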
2017-08-25 | archive/tar: support arbitrary PAX records | Joe Tsai
This CL adds the following new publicly visible API: type Header struct { ...; PAXRecords map[string]string } The new Header.PAXRecords field is a map of all PAX extended header records. We suggest (but do not enforce) that users use VENDOR-prefixed keys according to the following in the PAX specification: <<< The standard developers have reserved keyword name space for vendor extensions. It is suggested that the format to be used is: VENDOR.keyword where VENDOR is the name of the vendor or organization in all uppercase letters. >>> When reading, the Header.PAXRecords is populated with all PAX records encountered so far, including basic ones (e.g., "path", "mtime", etc). When writing, the fields of Header will be merged into PAXRecords, overwriting any records that may conflict. Since PAXRecords is a more expressive feature than Xattrs and is entirely a superset of Xattrs, we mark Xattrs as deprecated, and steer users towards the new PAXRecords API. The issue has a discussion about adding a Header.SetPAXRecord method to help validate records and keep the Header fields in sync. However, we do not include that in this CL since that helper method can always be added in the future. There is no support for global records. Fixes #14472 Change-Id: If285a52749acc733476cf75a2c7ad15bc1542071 Reviewed-on: https://go-review.googlesource.com/58390 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-24 | archive/tar: support reporting and selecting the format | Joe Tsai
The Reader and Writer are now at feature parity, meaning that everything that can be parsed by the Reader can also be composed by the Writer. This position enables us to support selection of the format in a backwards compatible way, since it ensures that everything that can be read can also be round-trip written. As such, we add the following new API: type Format int const FormatUnknown Format = 0 ... type Header struct { ...; Format Format } The new Header.Format field is populated with the Reader's best guess of what the format is. Note that the Reader is very liberal in what it permits, so a hybrid TAR file using aspects of multiple formats can still be decoded, but will be reported as FormatUnknown. Even though Reader has full support for V7 and basic support for STAR, it will still report those formats as unknown (and the constants for those formats are not even exported). The reason for this is that the Writer has no support for V7 or STAR. Leaving it as unknown allows the Writer to choose a format (usually USTAR or GNU) that can encode the equivalent Header. When writing, the Header.allowedFormats will take the Format field into consideration if it is a known format. Fixes #18710 Change-Id: I00980c475d067c6969d3414e1ff0224fdd89cd49 Reviewed-on: https://go-review.googlesource.com/58230 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-19 | archive/tar: refactor Reader support for sparse files | Joe Tsai
This CL is the first step (of two) for adding sparse file support to the Writer. This CL only refactors the logic of sparse-file handling in the Reader so that common logic can be easily shared by the Writer. As a result of this CL, there are some new publicly visible API changes: type SparseEntry struct { Offset, Length int64 } type Header struct { ...; SparseHoles []SparseEntry } A new type is defined to represent a sparse fragment and a new field Header.SparseHoles is added to represent the sparse holes in a file. The API intentionally represents sparse files using hole fragments rather than data fragments, so that the zero value of SparseHoles naturally represents a normal file (i.e., a file without any holes). The Reader now populates SparseHoles for sparse files. It is necessary to export the sparse hole information; otherwise it would be impossible for the Writer to specify that it is trying to encode a sparse file, and what it looks like. Some unexported helper functions were added to common.go: func validateSparseEntries(sp []SparseEntry, size int64) bool func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry The validation logic that used to be in newSparseFileReader is now moved to validateSparseEntries so that the Writer can use it in the future. alignSparseEntries is currently unused by the Reader, but will be used by the Writer in the future. Since TAR represents sparse files by only recording the data fragments, we add the invertSparseEntries function to convert a list of data fragments to a normalized list of hole fragments (and vice-versa). Some other high-level changes: * skipUnread is deleted, and most of its logic is moved to the Discard methods on regFileReader and sparseFileReader. * readGNUSparsePAXHeaders was rewritten to be simpler. * regFileReader and sparseFileReader were completely rewritten with simpler, easier-to-understand logic.
* A bug was fixed in sparseFileReader.Read where it failed to report an error if the logical size of the file ends before consuming all of the underlying data. * The tests for sparse-file support were completely rewritten. Updates #13548 Change-Id: Ic1233ae5daf3b3f4278fe1115d34a90c4aeaf0c2 Reviewed-on: https://go-review.googlesource.com/56771 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 | archive/tar: centralize errors in common.go | Joe Tsai
Move all sentinel errors to common.go since some of them are returned by both the reader and writer, and remove errInvalidHeader since it is not used. Also, consistently use the "tar: " prefix for errors. Change-Id: I0afb185bbf3db80dfd9595321603924454a4c2f9 Reviewed-on: https://go-review.googlesource.com/55650 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-15 | archive/tar: roundtrip reading device numbers | Joe Tsai
Neither GNU nor BSD tar cares whether the devmajor and devminor values are set on entries (like regular files) that aren't character or block devices. While this is nonsensical, it is more consistent with the Writer to always read these fields. In the vast majority of cases these will still be zero. In the rare situation where someone actually cares about these, at least no information is silently lost. Change-Id: I6e4ba01cd897a1b13c28b1837e102a4fdeb420ba Reviewed-on: https://go-review.googlesource.com/55572 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-11 | archive/tar: fallback to pre-Go1.8 behavior on certain GNU files | Joe Tsai
Prior to Go1.8, the Writer had a bug where it would output an invalid tar file in certain rare situations because the logic incorrectly believed that the old GNU format had a prefix field. This is wrong and leads to an output file that mangles the atime and ctime fields, which are often left unused. In order to continue reading tar files created by former, buggy versions of Go, we skeptically parse the atime and ctime fields. If we are unable to parse them and the prefix field looks like an ASCII string, then we fall back on the pre-Go1.8 behavior of treating these fields as the USTAR prefix field. Note that this will not use the fallback logic for all possible files generated by a pre-Go1.8 toolchain. If the generated file happened to have a prefix field that parses as valid atime and ctime fields (e.g., when they are valid octal strings), then it is impossible to distinguish between a valid GNU file and an invalid pre-Go1.8 file. Fixes #21005 Change-Id: Iebf5c67c08e0e46da6ee41a2e8b339f84030dd90 Reviewed-on: https://go-review.googlesource.com/53635 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-22 | archive/tar: validate sparse headers in parsePAX | Joe Tsai
According to the GNU manual, the format is:
<<<
GNU.sparse.size=size
GNU.sparse.numblocks=numblocks
repeat numblocks times
GNU.sparse.offset=offset
GNU.sparse.numbytes=numbytes
end repeat
>>>
The logic in parsePAX converts the repeating sequence of (offset, numbytes) pairs (which is not PAX compliant) into a single comma-delimited list of numbers (which is now PAX compliant). Thus, we validate the following:
* The (offset, numbytes) headers must come in the correct order.
* The ',' delimiter cannot appear in the value.
We do not validate that the value is a parsible decimal since that will be determined later.
Change-Id: I8d6681021734eb997898227ae8603efb1e17c0c8 Reviewed-on: https://go-review.googlesource.com/31439 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
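The following sketches the validation described above. It assumes the PAX records have already been split into ordered key/value pairs. The record type, the errHeader value and collectPAX are hypothetical stand-ins. Only the GNU.sparse.* key names come from the format itself. As the message says, checking that the values are decimal is left to a later stage.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// record is one key=value entry from a PAX extended header, in file order.
type record struct{ Key, Value string }

const (
	keyOffset   = "GNU.sparse.offset"
	keyNumBytes = "GNU.sparse.numbytes"
	keyMap      = "GNU.sparse.map"
)

var errHeader = errors.New("tar: invalid tar header") // stand-in

// collectPAX folds repeated GNU.sparse.offset / GNU.sparse.numbytes records
// into one comma-separated GNU.sparse.map value. Offsets must come first and
// the two keys must alternate. No value may contain ',', because a comma
// would corrupt the joined list.
func collectPAX(recs []record) (map[string]string, error) {
	hdrs := make(map[string]string)
	var sparse []string
	for _, r := range recs {
		switch r.Key {
		case keyOffset, keyNumBytes:
			wantOffset := len(sparse)%2 == 0
			if (wantOffset && r.Key != keyOffset) ||
				(!wantOffset && r.Key != keyNumBytes) ||
				strings.Contains(r.Value, ",") {
				return nil, errHeader
			}
			sparse = append(sparse, r.Value)
		default:
			hdrs[r.Key] = r.Value
		}
	}
	if len(sparse) > 0 {
		hdrs[keyMap] = strings.Join(sparse, ",")
	}
	return hdrs, nil
}

func main() {
	hdrs, err := collectPAX([]record{
		{"GNU.sparse.size", "100"},
		{"GNU.sparse.numblocks", "2"},
		{keyOffset, "0"}, {keyNumBytes, "10"},
		{keyOffset, "50"}, {keyNumBytes, "5"},
	})
	fmt.Println(hdrs[keyMap], err)
}
```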
2016-10-19 | archive/tar: fix parsePAX to be POSIX.1-2001 compliant | Joe Tsai
Relevant PAX specification:
<<<
If the <value> field is zero length, it shall delete any header block field, previously entered extended header value, or global extended header value of the same name.
>>>
We don't delete global extended headers, since the Reader doesn't even support global headers (which the specification admits was a controversial feature).
Change-Id: I2125a5c907b23a3dc439507ca90fa5dc47d474a9 Reviewed-on: https://go-review.googlesource.com/31440 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
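The quoted rule can be modeled with a map from field names to values. This is an illustrative sketch of the POSIX semantics, not the Reader's code. applyPAX is a hypothetical helper. Global headers are not modeled, which matches the note that the Reader does not support them.

```go
package main

import "fmt"

// applyPAX applies extended-header records, in order, on top of the fields
// taken from the header block. A record with a zero-length value deletes
// the field instead of setting it to the empty string.
func applyPAX(fields map[string]string, recs [][2]string) {
	for _, kv := range recs {
		key, value := kv[0], kv[1]
		if value == "" {
			delete(fields, key)
			continue
		}
		fields[key] = value
	}
}

func main() {
	fields := map[string]string{"path": "short/name", "uname": "root"}
	applyPAX(fields, [][2]string{
		{"path", "a/very/long/path/name"},
		{"uname", ""}, // zero-length value: delete uname
	})
	fmt.Println(fields)
}
```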
2016-10-19 | archive/tar: make Reader handle GNU format properly | Joe Tsai
The GNU format does not have a prefix field, so we should make no attempt to read it. It does, however, have atime and ctime fields. Since Go previously placed incorrect values here, we liberally read the atime and ctime fields and ignore errors so that old tar files written by Go can at least be partially read. This fixes half of #12594. The Writer is much harder to fix. Updates #12594 Change-Id: Ia32845e2f262ee53366cf41dfa935f4d770c7a30 Reviewed-on: https://go-review.googlesource.com/31444 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-12 | archive/tar: fix and cleanup readOldGNUSparseMap | Joe Tsai
* Assert that the format is GNU. Both GNU and STAR have some form of sparse file support with incompatible header structures. Worse yet, both formats use the 'S' type flag to indicate the presence of a sparse file. As such, we should check the format (based on magic numbers) and fail early.
* Move realsize parsing logic into readOldGNUSparseMap. This is related to the sparse parsing logic and belongs here.
* Fix the termination condition for parsing sparse fields. The termination condition for reading the sparse fields is to simply check if the first byte of the offset field is NUL. This does not seem to be documented in the GNU manual, but this is the check done by both the GNU and BSD implementations:
http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=9a33077a7b7ad7d32815a21dee54eba63b38a81c#n731
https://github.com/libarchive/libarchive/blob/1fa9c7bf90f0862036a99896b0501c381584451a/libarchive/archive_read_support_format_tar.c#L2207
* Fix the parsing of sparse fields to use parseNumeric. This is what GNU and BSD do. The previous two links show that GNU and BSD both handle base-256 and base-8.
* Detect truncated streams. The call to io.ReadFull does not check if the error is io.EOF. Getting io.EOF in this method is never okay and should always be converted to io.ErrUnexpectedEOF.
* Simplify the function. The logic is essentially a do-while loop, so we can remove some redundant code.
Change-Id: Ib2f601b1a283eaec1e41b1d3396d649c80749c4e Reviewed-on: https://go-review.googlesource.com/28471 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org>
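The sketch below covers two of the fixes above: numeric parsing that accepts both base-256 and octal, and the NUL-terminated scan over (offset, numbytes) slots. The names are hypothetical and the code is simplified. It rejects negative base-256 values. Because it reads from an in-memory slice, it does not show the io.ErrUnexpectedEOF conversion. Each slot is assumed to be 24 bytes, a 12-byte offset followed by a 12-byte numbytes, as in the old GNU sparse header.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errHeader = errors.New("tar: invalid tar header") // stand-in

// parseNumeric decodes a numeric header field. If the high bit of the first
// byte is set, the field is base-256 (big-endian binary). Otherwise it is
// NUL/space-padded octal. Negative base-256 values are not handled here.
func parseNumeric(b []byte) (int64, error) {
	if len(b) > 0 && b[0]&0x80 != 0 {
		if b[0]&0x40 != 0 {
			return 0, errHeader // negative value: out of scope for this sketch
		}
		var x uint64
		for i, c := range b {
			if i == 0 {
				c &= 0x7f // drop the base-256 marker bit
			}
			if x>>55 != 0 {
				return 0, errHeader // the next shift would overflow int64
			}
			x = x<<8 | uint64(c)
		}
		return int64(x), nil
	}
	s := strings.Trim(string(b), " \x00")
	if s == "" {
		return 0, nil
	}
	n, err := strconv.ParseInt(s, 8, 64)
	if err != nil {
		return 0, errHeader
	}
	return n, nil
}

type sparseEntry struct{ Offset, Length int64 }

// parseSparseEntries reads consecutive 24-byte (offset, numbytes) slots and
// stops at the first slot whose offset field begins with NUL.
func parseSparseEntries(b []byte) ([]sparseEntry, error) {
	var out []sparseEntry
	for len(b) >= 24 && b[0] != 0 {
		off, err := parseNumeric(b[:12])
		if err != nil {
			return nil, err
		}
		n, err := parseNumeric(b[12:24])
		if err != nil {
			return nil, err
		}
		out = append(out, sparseEntry{off, n})
		b = b[24:]
	}
	return out, nil
}

func main() {
	v, _ := parseNumeric([]byte{0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0x00})
	fmt.Println(v)
}
```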
2016-10-12 | archive/tar: handle integer overflow on 32bit machines | Joe Tsai
Most calls to strconv.ParseInt(x, 10, 0) should really be calls to strconv.ParseInt(x, 10, 64) in order to ensure that they do not overflow on 32b architectures. Furthermore, we should document a bug where Uid and Gid may overflow on 32b machines since the type is declared as int. Change-Id: I99c0670b3c2922e4a9806822d9ad37e1a364b2b8 Reviewed-on: https://go-review.googlesource.com/28472 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2016-09-29 | archive/tar: move parse/format functionality into strconv.go | Joe Tsai
Move all parse/format related functionality into strconv.go and thoroughly test them. This also reduces the amount of noise inside reader.go and writer.go. There was zero functionality change other than moving code around. Change-Id: I3bc288d10c20ebb3814b30b75d8acd7be62b85d7 Reviewed-on: https://go-review.googlesource.com/28470 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-02 | archive/tar: reapply Header.Size to regFileReader after merging | Joe Tsai
The use of PAX headers can modify the overall file size, so the previously created regFileReader may be stale. The relevant PAX specification for this behavior is:
<<<
Any fields in the preceding optional extended header shall override the associated fields in this header block for this file.
>>>
Here, "optional extended header" refers to the preceding PAX header, and "this header block" refers to the subsequent USTAR header.
Fixes #15573 Fixes #15564 Change-Id: I83b1c3f05a9ca2d3be38647425ad21a9fe450ee2 Reviewed-on: https://go-review.googlesource.com/28418 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-31 | archive/tar: make Reader error handling consistent | Joe Tsai
The tar.Reader guarantees that errors are sticky. Ensuring this property means that the methods of Reader need to be consistent about whose responsibility it is to actually make errors sticky. In this CL, we make it solely the responsibility of the exported methods (Next and Read) to store tr.err. All other methods just return the error as is. As part of this change, we also check the error value of mergePAX (and test that it properly detects invalid PAX files). Since the error returned by mergePAX was never checked before, we change it so that it always returns ErrHeader instead of a strconv error (such as strconv.ErrSyntax). This keeps it consistent with other usages of strconv in the same tar package. Change-Id: Ia1c31da71f1de4c175da89a385dec665d3edd167 Reviewed-on: https://go-review.googlesource.com/28215 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
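The following sketches this division of labor. The reader type, its entries field and parseSize are hypothetical. ErrHeader is a local stand-in for tar.ErrHeader. Only the exported Next reads or writes r.err. The unexported next simply returns errors, and any strconv failure is mapped to ErrHeader.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strconv"
)

var ErrHeader = errors.New("tar: invalid tar header") // stand-in

// reader shows the error discipline. Once any call fails, every later
// call returns that same error.
type reader struct {
	err     error
	entries []string // hypothetical: raw size fields of upcoming entries
}

// Next is exported, so it is the only place that stores r.err.
func (r *reader) Next() (int64, error) {
	if r.err != nil {
		return 0, r.err // sticky: never resume after a failure
	}
	size, err := r.next()
	r.err = err
	return size, err
}

// next is unexported. It returns errors as is and never touches r.err.
func (r *reader) next() (int64, error) {
	if len(r.entries) == 0 {
		return 0, io.EOF
	}
	raw := r.entries[0]
	r.entries = r.entries[1:]
	return parseSize(raw)
}

// parseSize maps any strconv failure onto ErrHeader, so callers see one
// consistent error value rather than a *strconv.NumError.
func parseSize(s string) (int64, error) {
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, ErrHeader
	}
	return n, nil
}

func main() {
	r := &reader{entries: []string{"10", "bogus", "20"}}
	for i := 0; i < 3; i++ {
		n, err := r.Next()
		fmt.Println(n, err)
	}
}
```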
2016-08-25 | archive/tar: isolate regular and sparse file handling as methods | Joe Tsai
Factor out the regular file handling logic into handleRegularFile from nextHeader. We will need to reuse this logic when fixing #15573 in a future CL. Factor out the sparse file handling logic into handleSparseFile. Currently this logic is split between nextHeader (for GNU sparse files) and Next (for PAX sparse files). Instead, we move this related code into a single method. There is no overall logic change. Thus, no unit tests. Updates #15573 #15564 Change-Id: I3b8270d8b4e080e77d6c0df6a123d677c82cc466 Reviewed-on: https://go-review.googlesource.com/27454 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-06 | archive/tar: centralize all information about tar header format | Joe Tsai
The Reader and Writer have hard-coded constants for the offsets and lengths of certain fields in the tar format sprinkled all over. This makes it harder to verify that the offsets are correct, since a reviewer would need to search for them throughout the code. Instead, all information about the layout of header fields should be centralized in a single file. This has the advantage of keeping the layout in one place while also acting as a form of documentation of the header struct format. This method was chosen over using "encoding/binary", since that method would allocate a header struct every time binary.Read was called. This method causes zero allocations, and its logic is no longer than if structs were declared. Updates #12594 Change-Id: Ic7a0565d2a2cd95d955547ace3b6dea2b57fab34 Reviewed-on: https://go-review.googlesource.com/14669 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
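Here is a minimal sketch of the approach, assuming a single block type with one accessor per field. Every accessor returns a subslice of the 512-byte array, so reading or writing a field allocates nothing, and each offset is written down exactly once. The type and method names are illustrative. The offsets are the standard V7/USTAR ones: name 0-99, mode 100-107, size 124-135, chksum 148-155, typeflag 156, magic 257-262 and prefix 345-499.

```go
package main

import "fmt"

const blockSize = 512

// block is one raw 512-byte tar header. Each accessor returns a subslice
// of the array, so reading or writing a field does not allocate.
type block [blockSize]byte

func (b *block) name() []byte     { return b[0:100] }
func (b *block) mode() []byte     { return b[100:108] }
func (b *block) size() []byte     { return b[124:136] }
func (b *block) chksum() []byte   { return b[148:156] }
func (b *block) typeFlag() []byte { return b[156:157] }
func (b *block) magic() []byte    { return b[257:263] }
func (b *block) prefix() []byte   { return b[345:500] }

func main() {
	var b block
	copy(b.name(), "hello.txt")
	copy(b.magic(), "ustar\x00")
	b.typeFlag()[0] = '0'
	fmt.Println(len(b.name()), len(b.size()), len(b.prefix()), b[156])
}
```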
2016-04-15 | archive/tar: style nit: s/nano_buf/nanoBuf/ | Matthew Dempsky
Pointed out during review of golang.org/cl/22104. Change-Id: If8842e7f8146441e918ec6a2b6e893b7cf88615c Reviewed-on: https://go-review.googlesource.com/22120 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-15 | all: remove unnecessary type conversions | Matthew Dempsky
cmd and runtime were handled separately, and I intentionally skipped syscall. This is the rest of the standard library. CL generated mechanically with github.com/mdempsky/unconvert. Change-Id: I9e0eff886974dedc37adb93f602064b83e469122 Reviewed-on: https://go-review.googlesource.com/22104 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 | all: use new io.SeekFoo constants instead of os.SEEK_FOO | Brad Fitzpatrick
Automated change. Fixes #15269 Change-Id: I8deb2ac0101d3f7c390467ceb0a1561b72edbb2f Reviewed-on: https://go-review.googlesource.com/21962 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Andrew Gerrand <adg@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-12-17 | archive/tar: document how Reader.Read handles header-only files | Joe Tsai
Commit dd5e14a7511465d20c6e95bf54c9b8f999abbbf6 ensured that no data could be read for header-only files regardless of what the Header.Size said. We should document this fact in Reader.Read. Updates #13647 Change-Id: I4df9a2892bc66b49e0279693d08454bf696cfa31 Reviewed-on: https://go-review.googlesource.com/17913 Reviewed-by: Russ Cox <rsc@golang.org>