The notion of a sparse file, and the ways of handling it from the point of view of a GNU tar user, have been described in detail in Archiving Sparse Files. This chapter describes the internal format GNU tar uses to store such files.
Support for sparse files in GNU tar has a long history. The earliest version featuring this support that I was able to find was 1.09, released in November 1990. The format introduced back then is called the old GNU sparse format, and although its design contained many flaws, it was the only format GNU tar supported until version 1.14 (May 2004), which introduced initial support for sparse files in PAX archives (see section GNU tar and POSIX tar). This format was not free from design flaws either, and it was subsequently improved in versions 1.15.2 (November 2005) and 1.15.92 (June 2006).
In addition to the GNU sparse formats, GNU tar is able to read and extract sparse files archived by star.
The following subsections describe each format in detail.
E.0.1 Old GNU Format
E.0.2 PAX Format, Versions 0.0 and 0.1
E.0.3 PAX Format, Version 1.0
E.0.1 Old GNU Format
The format introduced in November 1990 (v. 1.09) was designed on top of standard ustar headers in such an unfortunate way that some of its fields overwrote fields required by POSIX.
An old GNU sparse header is designated by type ‘S’ (GNUTYPE_SPARSE) and has the following layout:
Offset | Size | Name | Data type | Contents |
---|---|---|---|---|
0 | 345 | | N/A | Not used. |
345 | 12 | atime | Number | atime of the file. |
357 | 12 | ctime | Number | ctime of the file. |
369 | 12 | offset | Number | For multivolume archives: the offset of the start of this volume. |
381 | 4 | | N/A | Not used. |
385 | 1 | | N/A | Not used. |
386 | 96 | sp | sparse_header | (4 entries) File map. |
482 | 1 | isextended | Bool | 1 if an extension sparse header follows, 0 otherwise. |
483 | 12 | realsize | Number | Real size of the file. |
Each sparse_header object at offset 386 describes a single data chunk. It has the following structure:
Offset | Size | Data type | Contents |
---|---|---|---|
0 | 12 | Number | Offset of the beginning of the chunk. |
12 | 12 | Number | Size of the chunk. |
If the member contains more than four chunks, the isextended field of the header has the value 1 and the main header is followed by one or more extension headers. Each such header has the following structure:
Offset | Size | Name | Data type | Contents |
---|---|---|---|---|
0 | 504 | sp | sparse_header | (21 entries) File map. |
504 | 1 | isextended | Bool | 1 if an extension sparse header follows, or 0 otherwise. |
A header with isextended=0 ends the map.
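The layout above can be walked with a short routine. The following Python sketch is illustrative only and is not taken from the GNU tar sources. It assumes that "Number" fields are NUL- or space-padded octal ASCII strings (as in ustar), that unused sp slots are NUL-filled, and that a zero byte (or ASCII ‘0’) in isextended means false. The function names are hypothetical.

```python
BLOCK = 512          # tar block size
ENTRY = 24           # one sparse_header: 12-byte offset + 12-byte size


def parse_number(field):
    """Decode an octal ASCII field (assumption); an empty field counts as 0."""
    text = field.strip(b" \0")
    return int(text.decode("ascii"), 8) if text else 0


def parse_entries(buf, count):
    """Read up to `count` (offset, size) pairs, stopping at the first unused slot."""
    entries = []
    for i in range(count):
        rec = buf[i * ENTRY:(i + 1) * ENTRY]
        if not rec[12:24].strip(b" \0"):   # empty size field: slot unused (assumption)
            break
        entries.append((parse_number(rec[:12]), parse_number(rec[12:24])))
    return entries


def flag(byte):
    """Interpret the isextended byte; NUL and ASCII '0' are treated as false."""
    return byte not in (0, ord("0"))


def read_old_gnu_map(data):
    """Parse a main header block followed directly by its extension blocks.

    Returns (entries, realsize, bytes_consumed).
    """
    header = data[:BLOCK]
    entries = parse_entries(header[386:482], 4)
    realsize = parse_number(header[483:495])
    more = flag(header[482])
    pos = BLOCK
    while more:
        ext = data[pos:pos + BLOCK]
        if len(ext) < BLOCK:
            raise ValueError("truncated extension header")
        entries += parse_entries(ext[:504], 21)
        more = flag(ext[504])
        pos += BLOCK
    return entries, realsize, pos
```

The third return value tells the caller where the member data starts relative to the beginning of the main header.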
E.0.2 PAX Format, Versions 0.0 and 0.1
There are two formats available in this branch. Version 0.0 is the initial version of the sparse format, used by tar versions 1.14 through 1.15.1. The sparse file map is kept in extended (x) PAX header variables:
GNU.sparse.size
Real size of the stored file;
GNU.sparse.numblocks
Number of blocks in the sparse map;
GNU.sparse.offset
Offset of the data block;
GNU.sparse.numbytes
Size of the data block.
The latter two variables repeat for each data block, so the overall structure is like this:
    GNU.sparse.size=size
    GNU.sparse.numblocks=numblocks
    repeat numblocks times
      GNU.sparse.offset=offset
      GNU.sparse.numbytes=numbytes
    end repeat
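As an illustration of the structure above, here is a minimal Python sketch that rebuilds the map from a version 0.0 extended header. It assumes the header has already been split into (keyword, value) string pairs in their original order and that the values are decimal numbers. The function name is hypothetical. Because the offset and numbytes keywords repeat, the records have to be processed in sequence rather than loaded into a dictionary.

```python
def sparse_map_v00(records):
    """Rebuild (realsize, [(offset, size), ...]) from ordered x-header records."""
    realsize = None
    numblocks = None
    entries = []
    pending_offset = None
    for key, value in records:
        if key == "GNU.sparse.size":
            realsize = int(value)
        elif key == "GNU.sparse.numblocks":
            numblocks = int(value)
        elif key == "GNU.sparse.offset":
            pending_offset = int(value)
        elif key == "GNU.sparse.numbytes":
            if pending_offset is None:
                raise ValueError("numbytes without a preceding offset")
            entries.append((pending_offset, int(value)))
            pending_offset = None
        # any other keyword is unrelated to the sparse map and is ignored
    if numblocks is not None and numblocks != len(entries):
        raise ValueError("map length does not match GNU.sparse.numblocks")
    return realsize, entries
```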
This format presented the following two problems:
1. Whereas the POSIX specification allows a variable to appear multiple times in a header, it requires that only the last occurrence be meaningful. Thus, multiple occurrences of GNU.sparse.offset and GNU.sparse.numbytes conflict with the POSIX specification.
2. Attempting to extract such archives with a third-party tar results in extraction of sparse files in condensed form. If the tar implementation in question does not support the POSIX format, it will also extract a file containing the extension header attributes. This file can be used to expand the file to its original state. However, POSIX-aware tars will usually ignore the unknown variables, which makes restoring the file more difficult. See Extraction of sparse members in v.0.0 format, for a detailed description of how to restore such members using non-GNU tars.
GNU tar 1.15.2 introduced sparse format version 0.1, which attempted to solve these problems. Like its predecessor, this format stores the sparse map in the extended POSIX header. It retains the GNU.sparse.size and GNU.sparse.numblocks variables, but instead of GNU.sparse.offset/GNU.sparse.numbytes pairs it uses a single variable:
GNU.sparse.map
Map of non-null data chunks. It is a string consisting of comma-separated values "offset,size[,offset-1,size-1...]"
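For illustration, a GNU.sparse.map value can be decoded as follows. This is a minimal Python sketch that assumes the values are decimal integers. The function name is hypothetical.

```python
def parse_sparse_map(value):
    """Turn "offset,size[,offset,size...]" into a list of (offset, size) pairs."""
    fields = value.split(",") if value else []
    if len(fields) % 2 != 0:
        raise ValueError("map must contain an even number of values")
    numbers = [int(field) for field in fields]
    # pair each even-indexed value (offset) with the value after it (size)
    return list(zip(numbers[0::2], numbers[1::2]))
```

The number of pairs returned should equal the value of GNU.sparse.numblocks.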
To address the second problem, the name field in the ustar header is replaced with a special name, constructed using the following pattern:
%d/GNUSparseFile.%p/%f
The real name of the sparse file is stored in the variable GNU.sparse.name. Thus, tar implementations that are not aware of GNU extensions will at least extract the files into separate directories, giving the user a chance to expand them afterwards. See Extraction of sparse members in v.0.1 format, for a detailed description of how to restore such members using non-GNU tars.
The resulting GNU.sparse.map string can be very long. Although POSIX does not impose any limit on the length of an x header variable, such a long value may confuse some tars.
E.0.3 PAX Format, Version 1.0
Version 1.0 of the sparse format was introduced with GNU tar 1.15.92. Its main objective was to make the resulting file extractable with little effort even by non-POSIX-aware tar implementations. Starting from this version, the extended header preceding a sparse member always contains the following variables that identify the format being used:
GNU.sparse.major
Major version
GNU.sparse.minor
Minor version
The name field in the ustar header contains a special name, constructed using the following pattern:
%d/GNUSparseFile.%p/%f
The real name of the sparse file is stored in the variable GNU.sparse.name. The real size of the file is stored in the variable GNU.sparse.realsize.
The sparse map itself is stored in the file data block, preceding the actual file data. It consists of a series of decimal numbers delimited by newlines. The map is padded with nulls to the nearest block boundary.
The first number gives the number of entries in the map. Following are map entries, each one consisting of two numbers giving the offset and size of the data block it describes.
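The map layout just described can be read with a few lines of code. The Python sketch below is illustrative only. It assumes a binary stream positioned at the start of the member's data and a 512-byte block size. The function name is hypothetical.

```python
import io

BLOCK = 512  # tar block size (assumed)


def read_map_v10(stream):
    """Read the version 1.0 map; leave `stream` at the first byte of file data."""
    consumed = 0

    def read_number():
        nonlocal consumed
        line = stream.readline()
        if not line.endswith(b"\n"):
            raise ValueError("unterminated number in sparse map")
        consumed += len(line)
        return int(line.decode("ascii").strip())

    count = read_number()
    entries = []
    for _ in range(count):
        offset = read_number()
        size = read_number()
        entries.append((offset, size))
    # skip the NUL padding up to the next block boundary
    stream.read(-consumed % BLOCK)
    return entries
```

For example, a map with the two entries (0, 3) and (10, 2) is stored as the text `2\n0\n3\n10\n2\n` followed by NUL bytes up to offset 512, where the condensed data begins.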
The format is designed in such a way that non-POSIX-aware tars and tars not supporting GNU.sparse.* keywords will extract each sparse file in its condensed form with the file map prepended and will place it into a separate directory. Then, using a simple program, it is possible to expand the file to its original form even without GNU tar. See section Extracting Sparse Members, for detailed information on how to extract sparse members without GNU tar.
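As a rough idea of what such a simple program has to do, the following Python sketch expands condensed data into its original form. It is not the script shipped with GNU tar. It assumes the map has already been decoded into (offset, size) pairs, that the real size is known (for example from GNU.sparse.realsize), and that the input stream is positioned just past the map. The function name is hypothetical.

```python
import io


def expand_condensed(condensed, entries, realsize, out):
    """Copy each data chunk from `condensed` to its real offset in `out`."""
    for offset, size in entries:
        chunk = condensed.read(size)
        if len(chunk) != size:
            raise ValueError("condensed data is shorter than the map claims")
        out.seek(offset)
        out.write(chunk)
    # if the file ends in a hole, extend the output to its real size
    out.seek(0, io.SEEK_END)
    if out.tell() < realsize:
        out.seek(realsize - 1)
        out.write(b"\0")
```

When `out` is a regular file on a file system that supports holes, seeking past the end before writing usually leaves the skipped ranges unallocated, so the result is sparse again.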
This document was generated on August 23, 2023 using texi2html 5.0.