8.1.1 Creating and Reading Compressed Archives
8.1.2 Archiving Sparse Files
GNU tar is able to create and read compressed archives. It supports
a wide variety of compression programs, namely: gzip, bzip2, lzip,
lzma, lzop, zstd, xz and traditional compress. The latter is supported
mostly for backward compatibility, and we recommend against using it,
because it is far less effective than the other compression
programs(21).
Creating a compressed archive is simple: you just specify a compression option along with the usual archive creation commands. Available compression options are summarized in the table below:
Long | Short | Archive format |
---|---|---|
‘--gzip’ | ‘-z’ | gzip |
‘--bzip2’ | ‘-j’ | bzip2 |
‘--xz’ | ‘-J’ | xz |
‘--lzip’ | | lzip |
‘--lzma’ | | lzma |
‘--lzop’ | | lzop |
‘--zstd’ | | zstd |
‘--compress’ | ‘-Z’ | compress |
For example:
$ tar czf archive.tar.gz .
You can also let GNU tar select the compression program based on
the suffix of the archive file name. This is done using the
‘--auto-compress’ (‘-a’) command line option. For example, the
following invocation will use bzip2 for compression:
$ tar caf archive.tar.bz2 .
whereas the following one will use lzma:
$ tar caf archive.tar.lzma .
For a complete list of file name suffixes recognized by GNU tar,
see auto-compress.
Reading a compressed archive is even simpler: you don’t need to specify
any additional options, as GNU tar recognizes its format automatically.
Thus, the following commands will list and extract the archive created
in the previous example:
# List the compressed archive
$ tar tf archive.tar.gz
# Extract the compressed archive
$ tar xf archive.tar.gz
The format recognition algorithm is based on signatures: special byte
sequences at the beginning of a file that are specific to certain
compression formats. If this approach fails, tar falls back to using
the archive name suffix to determine its format (see auto-compress,
for a list of recognized suffixes).
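To make the signature idea concrete, the sketch below creates a small gzip-compressed archive in a temporary directory, prints its first two bytes, and then lists a copy whose name carries no recognizable suffix. The file names are illustrative only, and the two-byte gzip signature (1f 8b) comes from the gzip format itself rather than from this manual.

```shell
# Work in a throwaway directory (all names here are illustrative).
workdir=$(mktemp -d)
mkdir "$workdir/src"
echo "hello" > "$workdir/src/hello.txt"

# Create a gzip-compressed archive.
tar -czf "$workdir/archive.tar.gz" -C "$workdir/src" hello.txt

# Read the first two bytes as hex; gzip streams start with 1f 8b.
magic=$(od -An -tx1 -N2 "$workdir/archive.tar.gz" | tr -d ' \n')
echo "signature: $magic"

# Copy the archive to a name without a known suffix and list it.
# No compression option is given, so tar has to rely on the signature.
cp "$workdir/archive.tar.gz" "$workdir/backup.dat"
listing=$(tar -tf "$workdir/backup.dat")
echo "$listing"

rm -rf "$workdir"
```

Because the copy is named backup.dat, the suffix fallback cannot help here; a successful listing indicates that the format was recognized from the file contents.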
Some compression programs are able to handle different compression
formats. GNU tar takes advantage of this if the principal decompressor
for the given format is not available. For example, if compress is
not installed, tar will try to use gzip. As of version 1.35, the
following alternatives are tried(22):
Format | Main decompressor | Alternatives |
---|---|---|
compress | compress | gzip |
lzma | lzma | xz |
bzip2 | bzip2 | lbzip2 |
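The fallback in the table can be pictured as "use the first candidate that is installed". The following sketch is not how tar implements it; first_available is a hypothetical helper that takes a list of program names and prints the first one found in PATH.

```shell
# Hypothetical helper: print the first program from the argument list
# that can be found in PATH; return non-zero if none is installed.
first_available() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            return 0
        fi
    done
    return 1
}

# Candidates per format, following the table above:
# the main decompressor first, then its alternative.
first_available compress gzip || echo "no decompressor for compress"
first_available lzma xz       || echo "no decompressor for lzma"
first_available bzip2 lbzip2  || echo "no decompressor for bzip2"
```

The output depends on which programs are installed on the machine where the sketch runs.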
The only case when you have to specify a decompression option while
reading the archive is when reading from a pipe or from a tape drive
that does not support random access. However, in this case GNU tar
will indicate which option you should use. For example:
$ cat archive.tar.gz | tar tf -
tar: Archive is compressed. Use -z option
tar: Error is not recoverable: exiting now
If you see such diagnostics, just add the suggested option to the
invocation of GNU tar:
$ cat archive.tar.gz | tar tzf -
Note also that there are several restrictions on operations on
compressed archives. First of all, compressed archives cannot be
modified, i.e., you cannot update (‘--update’, alias ‘-u’)
them, delete (‘--delete’) members from them, or
add (‘--append’, alias ‘-r’) members to them. Likewise, you
cannot append another tar archive to a compressed archive using
‘--concatenate’ (‘-A’). Secondly, multi-volume
archives cannot be compressed.
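Because a compressed archive cannot be modified in place, one common workaround is to decompress it, modify the plain tar archive, and compress it again. The sketch below does this for a gzip archive. The file names are illustrative, and the approach assumes there is enough disk space to hold the uncompressed archive temporarily.

```shell
workdir=$(mktemp -d)
echo "first"  > "$workdir/one.txt"
echo "second" > "$workdir/two.txt"

# Start with a compressed archive holding a single member.
tar -czf "$workdir/archive.tar.gz" -C "$workdir" one.txt

# Decompress (archive.tar.gz becomes archive.tar), append, recompress.
gzip -d "$workdir/archive.tar.gz"
tar -rf "$workdir/archive.tar" -C "$workdir" two.txt
gzip "$workdir/archive.tar"

# The recompressed archive now lists both members.
members=$(tar -tzf "$workdir/archive.tar.gz")
echo "$members"

rm -rf "$workdir"
```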
The following options allow you to select a particular compressor program:
‘-z’, ‘--gzip’
Filter the archive through gzip.
‘-J’, ‘--xz’
Filter the archive through xz.
‘-j’, ‘--bzip2’
Filter the archive through bzip2.
‘--lzip’
Filter the archive through lzip.
‘--lzma’
Filter the archive through lzma.
‘--lzop’
Filter the archive through lzop.
‘--zstd’
Filter the archive through zstd.
‘-Z’, ‘--compress’
Filter the archive through compress.
When any of these options is given, GNU tar searches for the compressor
binary in the current PATH and invokes it. The name of the compressor
program is specified at compilation time using a corresponding
‘--with-compname’ option to configure, e.g.
‘--with-bzip2’ to select a specific bzip2 binary.
See section Using lbzip2 with GNU tar, for a detailed discussion.
The output produced by tar --help shows the actual
compressor names along with each of these options.
You can use any of these options on physical devices (tape drives,
etc.) and remote files as well as on normal files; data to or from
such devices or remote files is reblocked by another copy of the
tar program to enforce the specified (or default) record
size. The default compression parameters are used.
You can override them by using the ‘-I’ option (see
below), e.g.:
$ tar -cf archive.tar.gz -I 'gzip -9 -n' subdir
A more traditional way to do this is to use a pipe:
$ tar cf - subdir | gzip -9 -n > archive.tar.gz
Compressed archives are easily corrupted, because compressed files have little redundancy. The adaptive nature of the compression scheme means that the compression tables are implicitly spread all over the archive. If you lose a few blocks, the dynamic construction of the compression tables becomes unsynchronized, and there is little chance that you could recover later in the archive.
Other compression options provide better control over creating compressed archives. These are:
‘-a’, ‘--auto-compress’
Select a compression program to use by the archive file name suffix. The following suffixes are recognized:
Suffix | Compression program |
---|---|
‘.gz’ | gzip |
‘.tgz’ | gzip |
‘.taz’ | gzip |
‘.Z’ | compress |
‘.taZ’ | compress |
‘.bz2’ | bzip2 |
‘.tz2’ | bzip2 |
‘.tbz2’ | bzip2 |
‘.tbz’ | bzip2 |
‘.lz’ | lzip |
‘.lzma’ | lzma |
‘.tlz’ | lzma |
‘.lzo’ | lzop |
‘.xz’ | xz |
‘.zst’ | zstd |
‘.tzst’ | zstd |
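The suffix table can be read as a simple lookup. The sketch below expresses it as a shell case statement; compressor_for is a hypothetical helper written for illustration and is not part of tar.

```shell
# Hypothetical helper mirroring the suffix table above: print the
# compression program that corresponds to an archive file name,
# or "none" when the suffix is not in the table.
compressor_for() {
    case $1 in
        *.gz|*.tgz|*.taz)          echo gzip ;;
        *.Z|*.taZ)                 echo compress ;;
        *.bz2|*.tz2|*.tbz2|*.tbz)  echo bzip2 ;;
        *.lz)                      echo lzip ;;
        *.lzma|*.tlz)              echo lzma ;;
        *.lzo)                     echo lzop ;;
        *.xz)                      echo xz ;;
        *.zst|*.tzst)              echo zstd ;;
        *)                         echo none ;;
    esac
}

compressor_for archive.tar.bz2
compressor_for backup.tzst
compressor_for notes.txt
```

The three calls print bzip2, zstd and none, respectively. Matching is case-sensitive, which is what keeps ‘.Z’ (compress) apart from the gzip suffixes.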
‘-I command’, ‘--use-compress-program=command’
Use the external compression program command. Use this option if you
want to specify options for the compression program, or if you
are not happy with the compression program associated with the suffix
at compile time, or if you have a compression program that GNU tar
does not support. The command argument is a valid command
invocation, as you would type it at the command line prompt, with any
additional options as needed. Enclose it in quotes if it contains
white space (see section Running External Commands).
The command should follow two conventions:
First, when invoked without additional options, it should read data from standard input, compress it and output it on standard output.
Secondly, if invoked with the additional ‘-d’ option, it should do exactly the opposite, i.e., read the compressed data from the standard input and produce uncompressed data on the standard output.
The latter requirement means that you must not use the ‘-d’ option as a part of the command itself.
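The two conventions can be checked without involving tar at all. The sketch below writes a hypothetical filter named gzfast (a thin wrapper around gzip, invented for this illustration) and verifies that data piped through it and then through it again with ‘-d’ comes back unchanged.

```shell
workdir=$(mktemp -d)

# Hypothetical filter: compress by default, decompress when called
# with -d, and reject anything else.
cat > "$workdir/gzfast" <<'EOF'
#! /bin/sh
case $1 in
  -d) exec gzip -d -c ;;
  '') exec gzip -1 -c ;;
  *)  echo "Unknown option $1" >&2; exit 1 ;;
esac
EOF
chmod +x "$workdir/gzfast"

# Round trip through the filter using plain pipes:
# stdin -> compress -> decompress -> stdout.
roundtrip=$(printf 'sample data' | "$workdir/gzfast" | "$workdir/gzfast" -d)
echo "$roundtrip"

rm -rf "$workdir"
```

A filter that passes this round trip and is available in PATH should be usable as the argument of ‘-I’, for instance as tar -cf archive.tar.gz -I gzfast subdir.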
The ‘--use-compress-program’ option, in particular, lets you
implement your own filters, not necessarily dealing with
compression/decompression. For example, suppose you wish to implement
PGP encryption on top of compression, using gpg
(see gpg, the encryption and signing tool, in the GNU Privacy Guard
Manual). The following script does that:
#! /bin/sh
case $1 in
-d) gpg --decrypt - | gzip -d -c;;
'') gzip -c | gpg -s;;
*)  echo "Unknown option $1">&2; exit 1;;
esac
Suppose you name it ‘gpgz’ and save it somewhere in your PATH.
Then the following command will create a compressed
archive signed with your private key:
$ tar -cf foo.tar.gpgz -Igpgz .
Likewise, the command below will list its contents:
$ tar -tf foo.tar.gpgz -Igpgz .
8.1.1.1 Using lbzip2 with GNU tar
Lbzip2 is a multithreaded utility for handling
‘bzip2’ compression, written by Laszlo Ersek. It makes use of
multiple processors to speed up its operation and in general works
considerably faster than bzip2. For a detailed description
of lbzip2 see http://freshmeat.net/projects/lbzip2 and
lbzip2: parallel bzip2 utility.
Recent versions of lbzip2 are mostly command line compatible
with bzip2, which makes it possible to automatically invoke
it via the ‘--bzip2’ GNU tar command line option. To do so,
GNU tar must be configured with the ‘--with-bzip2’ command
line option, like this:
$ ./configure --with-bzip2=lbzip2 [other-options]
Once configured and compiled this way, tar --help will show the
following:
$ tar --help | grep -- --bzip2
  -j, --bzip2                filter the archive through lbzip2
which means that running tar --bzip2 will invoke lbzip2.
8.1.2 Archiving Sparse Files
Files in the file system occasionally have holes. A hole
in a file is a section of the file’s contents which was never written.
The contents of a hole read as all zeros. On many operating systems,
actual disk storage is not allocated for holes, but they are counted
in the length of the file. If you archive such a file, tar
could create an archive longer than the original. To have tar
attempt to recognize the holes in a file, use ‘--sparse’
(‘-S’). When you use this option, then, for any file using
less disk space than would be expected from its length, tar
searches the file for holes. It then records in the archive where
the holes (consecutive stretches of zeros) are, and archives only the
“real contents” of the file. On extraction (where ‘--sparse’ is not
needed), holes are re-created in any such file wherever they
were found. Thus, if you use ‘--sparse’, tar archives won’t
take more space than the original.
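The effect is easy to observe on a small scale. The sketch below builds a file consisting of a 10 MiB hole followed by a few bytes of data, archives it with and without ‘--sparse’, and compares the archive sizes. It assumes GNU tar and a file system that actually stores holes without allocating blocks; the file names are illustrative.

```shell
workdir=$(mktemp -d)

# Build a file with a 10 MiB hole followed by four bytes of data.
# count=0 copies nothing; seek only extends the file length.
dd if=/dev/zero of="$workdir/sparse.img" bs=1024 count=0 seek=10240 2>/dev/null
printf 'tail' >> "$workdir/sparse.img"

# Archive it without and with hole detection.
tar -cf "$workdir/plain.tar" -C "$workdir" sparse.img
tar --sparse -cf "$workdir/sparse.tar" -C "$workdir" sparse.img

plain_size=$(wc -c < "$workdir/plain.tar")
sparse_size=$(wc -c < "$workdir/sparse.tar")
echo "plain: $plain_size bytes, sparse: $sparse_size bytes"

# Extract the sparse archive; no --sparse is needed on extraction,
# and the logical file length is preserved.
mkdir "$workdir/out"
tar -xf "$workdir/sparse.tar" -C "$workdir/out"
orig_len=$(wc -c < "$workdir/sparse.img")
new_len=$(wc -c < "$workdir/out/sparse.img")
echo "original: $orig_len bytes, extracted: $new_len bytes"

rm -rf "$workdir"
```

On a file system without hole support the two archives may come out the same size, since tar only looks for holes in files that use less disk space than their length suggests.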
GNU tar uses two methods for detecting holes in sparse files. These
methods are described later in this subsection.
‘-S’, ‘--sparse’
This option instructs tar to test each file for sparseness
before attempting to archive it. If the file is found to be sparse, it
is treated specially, which decreases the amount of space
used by its image in the archive.
This option is meaningful only when creating or updating archives. It has no effect on extraction.
Consider using ‘--sparse’ when performing file system backups, to avoid archiving the expanded forms of files stored sparsely in the system.
Even if your system has no sparse files currently, some may be
created in the future. If you use ‘--sparse’ while making file
system backups as a matter of course, you can be assured the archive
will never take more space on the media than the files take on disk
(otherwise, archiving a disk filled with sparse files might take
hundreds of tapes). See section Using tar to Perform Incremental Dumps.
However, be aware that the ‘--sparse’ option may present a serious
drawback. Namely, in order to determine the positions of holes in a file,
tar may have to read it before trying to archive it, so in total
the file may be read twice. This may happen when your OS or your FS
does not support the SEEK_HOLE/SEEK_DATA feature in lseek (see
‘--hole-detection’, below).
When using the ‘POSIX’ archive format, GNU tar is able to store
sparse files in three distinct ways, called sparse
formats. A sparse format is identified by its number,
consisting, as usual, of two decimal numbers delimited by a dot. By
default, format ‘1.0’ is used. If, for some reason, you wish to
use an earlier format, you can select it using the
‘--sparse-version’ option.
‘--sparse-version=version’
Select the format to store sparse files in. Valid version values are: ‘0.0’, ‘0.1’ and ‘1.0’. See section Storing Sparse Files, for a detailed description of each format.
Using the ‘--sparse-version’ option implies ‘--sparse’.
‘--hole-detection=method’
Enforce a concrete hole detection method. Before the real contents of a
sparse file are stored, tar needs to gather knowledge about the file’s
sparseness, because the file’s map of holes must be stored
in the tar header before the file contents are archived.
Currently, two methods of hole detection are implemented:
‘--hole-detection=seek’
Seek the file for data and holes. This method uses an extension of the
lseek system call (SEEK_HOLE and SEEK_DATA) which is able to
reuse file system knowledge about sparse file contents, so the
detection is usually very fast. To use this feature, your file system
and operating system must support it. At the time of this writing
(2015) this feature, in spite of not being accepted by POSIX, is
fairly widely supported by different operating systems.
‘--hole-detection=raw’
Read the whole sparse file before archiving it and detect holes as
consecutive stretches of zeros. Compared to the previous method, this
is usually much slower, although more portable.
When no ‘--hole-detection’ option is given, tar uses
the ‘seek’ method if it is supported by the operating system.
Using the ‘--hole-detection’ option implies ‘--sparse’.
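As an illustration, the sketch below archives the same sparse file twice, forcing a different detection method each time. It assumes a GNU tar recent enough to provide ‘--hole-detection’, which accepts ‘seek’ and ‘raw’ as method names, and a file system that stores holes; the file names are illustrative.

```shell
workdir=$(mktemp -d)

# A 10 MiB hole followed by four bytes of data.
dd if=/dev/zero of="$workdir/sparse.img" bs=1024 count=0 seek=10240 2>/dev/null
printf 'tail' >> "$workdir/sparse.img"

# Force each detection method in turn. No explicit --sparse is given,
# because --hole-detection implies it.
tar --hole-detection=seek -cf "$workdir/seek.tar" -C "$workdir" sparse.img
tar --hole-detection=raw  -cf "$workdir/raw.tar"  -C "$workdir" sparse.img

seek_size=$(wc -c < "$workdir/seek.tar")
raw_size=$(wc -c < "$workdir/raw.tar")
echo "seek: $seek_size bytes, raw: $raw_size bytes"

rm -rf "$workdir"
```

Both archives should be far smaller than the 10 MiB file. On large files the difference between the two methods shows up in running time rather than in archive size.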
This document was generated on August 23, 2023 using texi2html 5.0.