@c This is part of the paxutils manual.
@c Copyright (C) 2006 Free Software Foundation, Inc.
@c This file is distributed under GFDL 1.1 or any later version
@c published by the Free Software Foundation.

The notion of a sparse file, and the ways of handling it from the point
of view of a @GNUTAR{} user, have been described in detail in
@ref{sparse}. This chapter describes the internal format @GNUTAR{}
uses to store such files.

The support for sparse files in @GNUTAR{} has a long history. The
earliest version featuring this support that I was able to find was 1.09,
released in November, 1990. The format introduced back then is called
the @dfn{old GNU} sparse format and, in spite of the fact that its design
contained many flaws, it was the only format @GNUTAR{} supported
until version 1.14 (May, 2004), which introduced initial support for
sparse files in @acronym{PAX} archives (@pxref{posix}). This
format was not free from design flaws either, and it was subsequently
improved in versions 1.15.2 (November, 2005) and 1.15.92 (June,
2006).

In addition to the GNU sparse formats, @GNUTAR{} is able to read and
extract sparse files archived by @command{star}.

The following subsections describe each format in detail.

* PAX 0:: PAX Format, Versions 0.0 and 0.1
* PAX 1:: PAX Format, Version 1.0

@appendixsubsec Old GNU Format

This format was introduced around 1990 (version 1.09). It was
designed on top of standard @code{ustar} headers in such an
unfortunate way that some of its fields overwrote fields required by
POSIX.

An old GNU sparse header is designated by type @samp{S}
(@code{GNUTYPE_SPARSE}) and has the following layout:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 345 @tab @tab N/A @tab Not used.
@item 345 @tab 12 @tab atime @tab Number @tab @code{atime} of the file.
@item 357 @tab 12 @tab ctime @tab Number @tab @code{ctime} of the file.
@item 369 @tab 12 @tab offset @tab Number @tab For
multivolume archives: the offset of the start of this volume.
@item 381 @tab 4 @tab @tab N/A @tab Not used.
@item 385 @tab 1 @tab @tab N/A @tab Not used.
@item 386 @tab 96 @tab sp @tab @code{sparse_header} @tab (4 entries) File map.
@item 482 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, @code{0} otherwise.
@item 483 @tab 12 @tab realsize @tab Number @tab Real size of the file.
@end multitable

Each @code{sparse_header} object at offset 386 describes a single
data chunk. It has the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.60
@headitem Offset @tab Size @tab Data type @tab Contents
@item 0 @tab 12 @tab Number @tab Offset of the
beginning of the chunk.
@item 12 @tab 12 @tab Number @tab Size of the chunk.
@end multitable

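As an illustration only, the header layout and the @code{sparse_header}
structure described above can be read with a few lines of Python. This is
a sketch, not code taken from @GNUTAR{}: the function names are invented
for the example, and it assumes that every @samp{Number} field holds
NUL- or space-terminated octal ASCII digits, as ordinary @code{ustar}
numeric fields do. It also assumes that a map entry whose size field
starts with a NUL byte is unused. Other number encodings are not handled.

```python
BLOCK = 512  # size of one tar header block


def parse_number(field):
    """Decode a NUL- or space-padded octal ASCII field (assumed encoding)."""
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def parse_old_gnu_header(block):
    """Return (chunks, isextended, realsize) from a type 'S' header block."""
    if len(block) != BLOCK:
        raise ValueError("expected a 512-byte header block")
    chunks = []
    for i in range(4):                    # four sparse_header entries at 386
        base = 386 + 24 * i
        offset_field = block[base:base + 12]
        size_field = block[base + 12:base + 24]
        if size_field[0] == 0:            # assumption: unused slot ends the list
            break
        chunks.append((parse_number(offset_field), parse_number(size_field)))
    # Accept both a binary 0 and an ASCII "0" as "no extension header".
    isextended = block[482] not in (0, ord("0"))
    realsize = parse_number(block[483:495])
    return chunks, isextended, realsize
```

Each element of the returned list is an (offset, size) pair; when the
second result is true, the map continues in the following header blocks.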
If the member contains more than four chunks, the @code{isextended}
field of the header has the value @code{1} and the main header is
followed by one or more @dfn{extension headers}. Each such header has
the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 504 @tab sp @tab @code{sparse_header} @tab
(21 entries) File map.
@item 504 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, or @code{0} otherwise.
@end multitable

A header with @code{isextended=0} ends the map.

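The chaining of extension headers can be sketched as follows. The code
is illustrative and makes the same assumptions as any simple reader would:
the names are hypothetical, @samp{Number} fields are taken to be octal
ASCII, an entry with an empty size field is taken to be unused, and
@code{blocks} is assumed to yield the 512-byte blocks that follow the
main header.

```python
ENTRY_SIZE = 24               # one sparse_header: 12-byte offset, 12-byte size
ENTRIES_PER_EXTENSION = 21    # 21 * 24 = 504 bytes of map per extension header
ISEXTENDED_OFFSET = 504


def octal_field(field):
    """Decode a NUL- or space-padded octal ASCII field (assumed encoding)."""
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def read_extension_chunks(blocks):
    """Collect (offset, size) pairs until a header with isextended=0."""
    chunks = []
    for block in blocks:
        for i in range(ENTRIES_PER_EXTENSION):
            entry = block[ENTRY_SIZE * i:ENTRY_SIZE * (i + 1)]
            if entry[12] == 0:            # assumption: unused slot
                break
            chunks.append((octal_field(entry[:12]), octal_field(entry[12:])))
        if block[ISEXTENDED_OFFSET] in (0, ord("0")):
            break                         # this header ends the map
    return chunks
```

The result would be appended to the (up to four) chunks taken from the
main header to obtain the complete file map.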
@appendixsubsec PAX Format, Versions 0.0 and 0.1

There are two formats available in this branch. Version @code{0.0}
is the initial version of the sparse format, used by @command{tar}
versions 1.14--1.15.1. The sparse file map is kept in extended
(@code{x}) PAX header variables:

@table @code
@item GNU.sparse.size
Real size of the stored file

@item GNU.sparse.numblocks
Number of blocks in the sparse map

@item GNU.sparse.offset
Offset of the data block

@item GNU.sparse.numbytes
Size of the data block
@end table

The latter two variables repeat for each data block, so the overall
structure is like this:

@smallexample
GNU.sparse.size=@var{size}
GNU.sparse.numblocks=@var{numblocks}
repeat @var{numblocks} times
  GNU.sparse.offset=@var{offset}
  GNU.sparse.numbytes=@var{numbytes}
@end smallexample

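A reader for this layout has to process the header records in order,
because the same keyword occurs several times. The following Python
sketch shows one way to do that. It is not taken from any @command{tar}
implementation: the function name is made up, @code{records} is assumed
to be the already decoded list of (keyword, value) pairs in header order,
and the values are assumed to be decimal digit strings.

```python
def parse_sparse_0_0(records):
    """Return (size, chunks) from ordered (keyword, value) pairs."""
    size = None
    numblocks = None
    chunks = []
    pending_offset = None
    for keyword, value in records:
        if keyword == "GNU.sparse.size":
            size = int(value)
        elif keyword == "GNU.sparse.numblocks":
            numblocks = int(value)
        elif keyword == "GNU.sparse.offset":
            pending_offset = int(value)       # wait for the matching numbytes
        elif keyword == "GNU.sparse.numbytes":
            if pending_offset is None:
                raise ValueError("numbytes without a preceding offset")
            chunks.append((pending_offset, int(value)))
            pending_offset = None
    if numblocks is not None and numblocks != len(chunks):
        raise ValueError("numblocks does not match the number of pairs")
    return size, chunks
```

Storing the records in a keyword-indexed mapping instead of an ordered
list would keep only the last offset and size, so the map could not be
recovered.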
This format presented the following two problems:

@enumerate 1
@item
Whereas the POSIX specification allows a variable to appear multiple
times in a header, it requires that only the last occurrence be
meaningful. Thus, multiple occurrences of @code{GNU.sparse.offset} and
@code{GNU.sparse.numbytes} conflict with the POSIX specification.

@item
Attempting to extract such archives using third-party @command{tar}s
results in extraction of sparse files in @emph{condensed form}. If
the @command{tar} implementation in question does not support the POSIX
format, it will also extract a file containing the extension header
attributes. This file can be used to expand the file to its original
state. However, POSIX-aware @command{tar}s will usually ignore the
unknown variables, which makes restoring the file much more
difficult@FIXME-xref{how to extract sparse file using third-party @command{tar}s}.
@end enumerate

@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
attempted to solve these problems. Like its predecessor, this format
stores the sparse map in the extended POSIX header. It retains the
@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
it uses a single variable:

@table @code
@item GNU.sparse.map
Map of non-null data chunks. It is a string consisting of
comma-separated values "@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]"
@end table

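Decoding such a map string is straightforward. The following Python
sketch is only an illustration: the function name and the optional
@code{numblocks} cross-check are assumptions of the example, and the
values are assumed to be decimal digit strings.

```python
def parse_sparse_map_0_1(map_value, numblocks=None):
    """Turn "offset,size,offset,size,..." into a list of (offset, size)."""
    if not map_value:
        return []
    fields = map_value.split(",")
    if len(fields) % 2 != 0:
        raise ValueError("map must contain offset,size pairs")
    numbers = [int(field) for field in fields]
    chunks = list(zip(numbers[0::2], numbers[1::2]))
    if numblocks is not None and numblocks != len(chunks):
        raise ValueError("map length disagrees with GNU.sparse.numblocks")
    return chunks
```

For instance, the string @samp{0,512,2048,256} describes two chunks:
512 bytes at offset 0 and 256 bytes at offset 2048.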
To address the second problem, the @code{name} field in the @code{ustar}
header is replaced with a special name, constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. Thus, those @command{tar} implementations
that are not aware of GNU extensions will at least extract the files
into separate directories, giving the user a chance to expand them
afterwards @FIXME-ref{how to extract sparse file using third-party
@command{tar}s}.

The resulting @code{GNU.sparse.map} string can be @emph{very} long.
Although POSIX does not impose any limit on the length of an @code{x}
header variable, such a long value may confuse some @command{tar}
implementations.

@appendixsubsec PAX Format, Version 1.0

Version @code{1.0} of the sparse format was introduced with @GNUTAR{}
1.15.92. Its main objective was to make the resulting file
extractable with little effort even by non-POSIX-aware @command{tar}
implementations. Starting from this version, the extended header
preceding a sparse member always contains the following variables that
identify the format being used:

@table @code
@item GNU.sparse.major

@item GNU.sparse.minor
@end table

The @code{name} field in the @code{ustar} header contains a special name,
constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. The real size of the file is stored in the
variable @code{GNU.sparse.realsize}.

The sparse map itself is stored in the file data block, preceding the actual
file data. It consists of a series of decimal numbers of arbitrary length, delimited
by newlines. The map is padded with nulls to the nearest block boundary.

The first number gives the number of entries in the map. It is followed by
the map entries, each one consisting of two numbers giving the offset and
size of the data block it describes.

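The map described above can be decoded as sketched below. This is an
illustration under stated assumptions, not the @GNUTAR{} implementation:
the function name is invented, the block size is assumed to be the usual
512 bytes, and the digits are read as decimal by default, which is an
assumption about what the writing @command{tar} produced. The hypothetical
@code{base} argument lets the caller choose another radix.

```python
BLOCK = 512  # assumed tar block size; the map is padded to a multiple of it


def parse_sparse_map_1_0(data, base=10):
    """Return (chunks, data_start) for the member data held in `data`."""
    pos = 0

    def next_number():
        nonlocal pos
        end = data.index(b"\n", pos)      # numbers are newline-delimited
        value = int(data[pos:end], base)
        pos = end + 1
        return value

    count = next_number()                 # first number: entries in the map
    chunks = []
    for _ in range(count):
        offset = next_number()
        size = next_number()
        chunks.append((offset, size))
    data_start = -(-pos // BLOCK) * BLOCK  # round up past the NUL padding
    return chunks, data_start
```

The second result is the position within the member data at which the
stored file contents begin.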
The format is designed in such a way that non-POSIX-aware @command{tar}s
and @command{tar}s not supporting @code{GNU.sparse.*} keywords will extract
each sparse file in its condensed form with the file map prepended and will
place it into a separate directory. Then, using a simple program, it is
possible to expand the file to its original form even without @GNUTAR{}.
@FIXME-xref{how to extract sparse file using third-party
@command{tar}s}. @FIXME{Write the program and give its URL here}.

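A minimal sketch of such an expansion program, written in Python, might
look as follows. It is not the program referred to above, and every name
in it is hypothetical. It assumes that the condensed file starts with a
version 1.0 map padded to a 512-byte boundary, that the map numbers are
decimal unless another @code{base} is given, that the data chunks follow
the map back to back, and that the caller supplies the real size taken
from @code{GNU.sparse.realsize}. For brevity it reads the whole condensed
file into memory.

```python
import io

BLOCK = 512  # assumed tar block size; the map is padded to a multiple of it


def expand_sparse_1_0(condensed, out, realsize, base=10):
    """Expand a condensed sparse file into the seekable binary file `out`."""
    data = condensed.read()
    pos = 0

    def next_number():
        nonlocal pos
        end = data.index(b"\n", pos)
        value = int(data[pos:end], base)
        pos = end + 1
        return value

    count = next_number()
    chunks = []
    for _ in range(count):
        offset = next_number()
        size = next_number()
        chunks.append((offset, size))

    pos = -(-pos // BLOCK) * BLOCK        # skip the NUL padding after the map
    for offset, size in chunks:
        out.seek(offset)                  # seeking past the end leaves a hole
        out.write(data[pos:pos + size])
        pos += size

    out.seek(0, io.SEEK_END)
    if out.tell() < realsize:             # trailing hole: extend to real size
        out.seek(realsize - 1)
        out.write(b"\0")


# Demonstration with in-memory files: two chunks, real size 4096 bytes.
header = b"2\n0\n512\n2048\n256\n"
condensed = header.ljust(BLOCK, b"\0") + b"A" * 512 + b"B" * 256
expanded = io.BytesIO()
expand_sparse_1_0(io.BytesIO(condensed), expanded, 4096)
```

With real files, @code{condensed} and @code{out} would be opened in
binary mode, and the holes would be created by the seeks on file systems
that support them.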