Next: Persistence versus Durability, Previous: Virtual Memory and Big Data, Up: Performance
To be frugal with storage resources, pm-gawk’s heap file should be created as a sparse file: a file whose logical size is larger than its storage resource footprint. Most modern file systems support sparse files, which are easy to create using the truncate tool shown in our examples.
Let’s first create a conventional non-sparse file using echo:

    $ echo hi > dense
    $ ls -l dense
    -rw-rw-r--. 1 me me 3 Aug 5 23:08 dense
    $ du -h dense
    4.0K    dense
The ls utility reports that file dense is three bytes long (two for the letters in “hi” plus one for the newline). The du utility reports that this file consumes 4 KiB of storage: one block of disk, which on typical file systems is as small as a non-sparse file’s storage footprint can be. Now let’s use truncate to create a logically enormous sparse file and check its physical size:
    $ truncate -s 1T sparse
    $ ls -l sparse
    -rw-rw-r--. 1 me me 1099511627776 Aug 5 22:33 sparse
    $ du -h sparse
    0       sparse
Whereas ls reports the logical file size that we expect (one TiB, or 2 raised to the power 40 = 1,099,511,627,776 bytes), du reveals that the file occupies no storage whatsoever. The file system will allocate physical storage resources beneath this file only as data is written to it; reading unwritten regions of the file yields zeros.
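Both properties can be observed directly. The following is a minimal sketch, assuming GNU coreutils (truncate, stat, head, tr, dd) on a file system that supports sparse files; the file name demo.sparse is hypothetical, and 1 GiB is used instead of 1 TiB only to keep the example modest. It checks that an unwritten region reads back as zeros and that writing two bytes deep inside the file allocates only a small amount of storage.

```shell
#!/bin/sh
# Minimal sketch: observe sparse-file behavior with GNU coreutils.
# "demo.sparse" is a hypothetical file name.
set -e

truncate -s 1G demo.sparse                      # logical size 1 GiB, nothing written yet
logical=$(stat -c %s demo.sparse)               # logical (apparent) size in bytes
unit=$(stat -c %B demo.sparse)                  # bytes per block as counted by %b
before=$(( $(stat -c %b demo.sparse) * unit ))  # physical bytes before any write

# Unwritten regions read back as zeros: delete NUL bytes from the first MiB
# and count what is left (expected to be nothing).
nonzero=$(( $(head -c 1048576 demo.sparse | tr -d '\000' | wc -c) ))

# Write two bytes at offset 512 MiB without truncating the file; the file
# system allocates storage only around the region actually written.
printf 'hi' | dd of=demo.sparse bs=1 seek=536870912 conv=notrunc status=none
after=$(( $(stat -c %b demo.sparse) * unit ))   # physical bytes after the write

echo "logical=$logical physical_before=$before physical_after=$after nonzero=$nonzero"
rm -f demo.sparse
```

The exact physical sizes depend on the file system's block size and allocation policy, but both should be tiny compared with the logical size.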
The “pay as you go” storage cost of sparse files offers both convenience and control for pm-gawk users. If your file system supports sparse files, go ahead and create lavishly capacious heap files for pm-gawk. Their logical size costs nothing and persistent memory allocation within pm-gawk won’t fail until physical storage resources beneath the file system are exhausted. But if instead you want to prevent a heap file from consuming too much storage, simply set its initial size to whatever bound you wish to enforce; it won’t eat more disk than that. Copying sparse files with GNU cp creates sparse copies by default.
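A minimal sketch of the two practices above, assuming GNU coreutils: size a heap file to the bound you want to enforce, then copy it with cp and confirm that the copy is sparse too. The names heap.pma and heap-copy.pma and the 16 MiB bound are illustrative assumptions. GNU cp’s default is --sparse=auto; --sparse=always additionally punches holes in the copy wherever the source contains long runs of zero bytes.

```shell
#!/bin/sh
# Minimal sketch: a size-bounded heap file and a sparse-preserving copy.
# File names and the 16 MiB bound are hypothetical.
set -e

truncate -s 16M heap.pma        # the logical size doubles as the storage bound
cp heap.pma heap-copy.pma       # GNU cp defaults to --sparse=auto

unit=$(stat -c %B heap-copy.pma)                      # bytes per block counted by %b
copy_logical=$(stat -c %s heap-copy.pma)              # logical size of the copy
copy_physical=$(( $(stat -c %b heap-copy.pma) * unit ))  # storage actually allocated

echo "copy: logical=$copy_logical physical=$copy_physical"
rm -f heap.pma heap-copy.pma
```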
File-system encryption can preclude sparse files: If the cleartext of a byte offset range within a file is all zero bytes, the corresponding ciphertext probably shouldn’t be all zeros! Encrypting at the storage layer instead of the file system layer may offer acceptable security while still permitting file systems to implement sparse files.
Sometimes you might prefer a dense heap file backed by pre-allocated storage resources, for example to increase the likelihood that pm-gawk’s internal memory allocation will succeed until the persistent heap occupies the entire heap file. The fallocate utility from util-linux will do the trick:
    $ fallocate -l 1M mibi
    $ ls -l mibi
    -rw-rw-r--. 1 me me 1048576 Aug 5 23:18 mibi
    $ du -h mibi
    1.0M    mibi
We get the MiB we asked for, both logically and physically.
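To compare the two approaches side by side, the sketch below creates a dense file with fallocate and a sparse file of the same logical size with truncate, then reports both sizes using GNU stat (%s for the logical size, %b multiplied by %B for the allocated storage). The file names are hypothetical, and fallocate assumes a file system that supports preallocation.

```shell
#!/bin/sh
# Minimal sketch: dense (fallocate) versus sparse (truncate) files of the
# same logical size. File names are hypothetical.
set -e

fallocate -l 1M dense.heap     # util-linux: preallocate 1 MiB of storage
truncate -s 1M sparse.heap     # coreutils: 1 MiB logical size, no storage yet

unit=$(stat -c %B dense.heap)  # bytes per block as counted by %b
dense_log=$(stat -c %s dense.heap)
sparse_log=$(stat -c %s sparse.heap)
dense_phys=$(( $(stat -c %b dense.heap) * unit ))
sparse_phys=$(( $(stat -c %b sparse.heap) * unit ))

echo "dense:  logical=$dense_log physical=$dense_phys"
echo "sparse: logical=$sparse_log physical=$sparse_phys"
rm -f dense.heap sparse.heap
```

GNU du offers the same comparison: du --apparent-size -h reports the logical size, while plain du -h reports the allocated storage, as in the sessions above.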