@c This is part of the paxutils manual.
@c Copyright (C) 2006 Free Software Foundation, Inc.
@c This file is distributed under GFDL 1.1 or any later version
@c published by the Free Software Foundation.

@cindex sparse versions
The notion of a sparse file, and the ways of handling it from the point
of view of a @GNUTAR{} user, have been described in detail in
@ref{sparse}. This chapter describes the internal format @GNUTAR{}
uses to store such files.

Support for sparse files in @GNUTAR{} has a long history. The
earliest version featuring this support that I was able to find was 1.09,
released in November, 1990. The format introduced back then is called
the @dfn{old GNU} sparse format. Although its design
contained many flaws, it was the only format @GNUTAR{} supported
until version 1.14 (May, 2004), which introduced initial support for
sparse files in @acronym{PAX} archives (@pxref{posix}). This
format was not free from design flaws either, and it was subsequently
improved in versions 1.15.2 (November, 2005) and 1.15.92 (June,
2006).

In addition to the GNU sparse formats, @GNUTAR{} is able to read and
extract sparse files archived by @command{star}.

The following subsections describe each format in detail.
@menu
* Old GNU Format::
* PAX 0:: PAX Format, Versions 0.0 and 0.1
* PAX 1:: PAX Format, Version 1.0
@end menu
@node Old GNU Format
@appendixsubsec Old GNU Format

@cindex sparse formats, Old GNU
@cindex Old GNU sparse format
This format was introduced some time around 1990 (v. 1.09). It was
designed on top of standard @code{ustar} headers in such an
unfortunate way that some of its fields overwrote fields required by
POSIX.
An old GNU sparse header is designated by type @samp{S}
(@code{GNUTYPE_SPARSE}) and has the following layout:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 345 @tab @tab N/A @tab Not used.
@item 345 @tab 12 @tab atime @tab Number @tab @code{atime} of the file.
@item 357 @tab 12 @tab ctime @tab Number @tab @code{ctime} of the file.
@item 369 @tab 12 @tab offset @tab Number @tab For
multivolume archives: the offset of the start of this volume.
@item 381 @tab 4 @tab @tab N/A @tab Not used.
@item 385 @tab 1 @tab @tab N/A @tab Not used.
@item 386 @tab 96 @tab sp @tab @code{sparse_header} @tab (4 entries) File map.
@item 482 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, @code{0} otherwise.
@item 483 @tab 12 @tab realsize @tab Number @tab Real size of the file.
@end multitable
Each @code{sparse_header} object at offset 386 describes a single
data chunk. It has the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.60
@headitem Offset @tab Size @tab Data type @tab Contents
@item 0 @tab 12 @tab Number @tab Offset of the
beginning of the chunk.
@item 12 @tab 12 @tab Number @tab Size of the chunk.
@end multitable
If the member contains more than four chunks, the @code{isextended}
field of the header has the value @code{1} and the main header is
followed by one or more @dfn{extension headers}. Each such header has
the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 504 @tab sp @tab @code{sparse_header} @tab
(21 entries) File map.
@item 504 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, or @code{0} otherwise.
@end multitable

A header with @code{isextended=0} ends the map.
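
The two layouts above can be read with a short program. The following
Python sketch is an illustration only, not code taken from @GNUTAR{};
all function names in it are hypothetical. It assumes that numbers are
octal ASCII strings padded with nulls or spaces (large values stored in
any other encoding are not handled), that a map slot whose size field is
empty ends the used part of a map, and that @code{isextended} is set
when the byte is either binary 1 or the character @samp{1}.

```python
ENTRY = 24                      # one sparse_header: 12-byte offset + 12-byte size
TRUE_FLAGS = (b"\x01", b"1")    # assumption: accepted encodings of isextended=1


def parse_number(field: bytes) -> int:
    """Decode an octal ASCII field padded with NULs or spaces."""
    text = field.strip(b"\0 ")
    return int(text.decode("ascii"), 8) if text else 0


def read_entries(block: bytes, start: int, count: int):
    """Return the used (offset, size) pairs among `count` map slots."""
    entries = []
    for i in range(count):
        pos = start + i * ENTRY
        size_field = block[pos + 12:pos + 24]
        if size_field[:1] in (b"", b"\0"):
            break               # assumption: an empty size field marks an unused slot
        entries.append((parse_number(block[pos:pos + 12]),
                        parse_number(size_field)))
    return entries


def read_old_gnu_map(blocks):
    """Read the sparse map from an iterator of 512-byte blocks.

    The first block must be the main header of type 'S'; extension
    headers are consumed from the iterator as long as isextended is set.
    Returns (realsize, [(offset, size), ...]).
    """
    header = next(blocks)
    sparse_map = read_entries(header, 386, 4)       # 4 entries at offset 386
    realsize = parse_number(header[483:495])        # realsize at offset 483
    extended = header[482:483] in TRUE_FLAGS        # isextended at offset 482
    while extended:
        ext = next(blocks)
        sparse_map += read_entries(ext, 0, 21)      # 21 entries at offset 0
        extended = ext[504:505] in TRUE_FLAGS       # isextended at offset 504
    return realsize, sparse_map
```

After @code{read_old_gnu_map} returns, the iterator is positioned at the
first block of the member's data.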
@node PAX 0
@appendixsubsec PAX Format, Versions 0.0 and 0.1

@cindex sparse formats, v.0.0
There are two formats available in this branch. Version @code{0.0}
is the initial version of the sparse format, used by @command{tar}
versions 1.14--1.15.1. The sparse file map is kept in extended
(@code{x}) PAX header variables:
@table @asis
@item GNU.sparse.size
@vrindex GNU.sparse.size, extended header variable
Real size of the stored file

@item GNU.sparse.numblocks
@vrindex GNU.sparse.numblocks, extended header variable
Number of blocks in the sparse map

@item GNU.sparse.offset
@vrindex GNU.sparse.offset, extended header variable
Offset of the data block

@item GNU.sparse.numbytes
@vrindex GNU.sparse.numbytes, extended header variable
Size of the data block
@end table
The latter two variables repeat for each data block, so the overall
structure is like this:

@smallexample
@group
GNU.sparse.size=@var{size}
GNU.sparse.numblocks=@var{numblocks}
repeat @var{numblocks} times
  GNU.sparse.offset=@var{offset}
  GNU.sparse.numbytes=@var{numbytes}
end repeat
@end group
@end smallexample
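
As an illustration of this structure, the following Python sketch
collects a version @code{0.0} map from the body of an extended header.
It is not code from @GNUTAR{} and its function names are hypothetical.
It relies on the POSIX record syntax @samp{@var{len} @var{keyword}=@var{value}},
terminated by a newline, where @var{len} is the decimal length of the
whole record, and it assumes that the variable values are decimal numbers.

```python
def parse_pax_records(data: bytes):
    """Yield (keyword, value) pairs from a PAX extended header body.

    LEN counts the whole record: its own digits, the space, the
    "keyword=value" text and the trailing newline.
    """
    pos = 0
    while pos < len(data) and data[pos:pos + 1] != b"\0":
        space = data.index(b" ", pos)
        length = int(data[pos:space].decode("ascii"))
        record = data[space + 1:pos + length - 1]   # drop the trailing newline
        keyword, _, value = record.partition(b"=")
        yield keyword.decode("utf-8"), value.decode("utf-8")
        pos += length


def sparse_map_v00(records):
    """Build (realsize, [(offset, size), ...]) from v.0.0 variables.

    Every occurrence of GNU.sparse.offset and GNU.sparse.numbytes is
    kept, in order; a reader keeping only the last one would lose the map.
    """
    realsize = numblocks = None
    offsets, sizes = [], []
    for keyword, value in records:
        if keyword == "GNU.sparse.size":
            realsize = int(value)
        elif keyword == "GNU.sparse.numblocks":
            numblocks = int(value)
        elif keyword == "GNU.sparse.offset":
            offsets.append(int(value))
        elif keyword == "GNU.sparse.numbytes":
            sizes.append(int(value))
    if numblocks is None or len(offsets) != numblocks or len(sizes) != numblocks:
        raise ValueError("sparse map does not match GNU.sparse.numblocks")
    return realsize, list(zip(offsets, sizes))
```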
This format presented the following two problems:

@enumerate 1
@item
Whereas the POSIX specification allows a variable to appear multiple
times in a header, it requires that only the last occurrence be
meaningful. Thus, multiple occurrences of @code{GNU.sparse.offset} and
@code{GNU.sparse.numbytes} conflict with the POSIX specification.

@item
Attempting to extract such archives using third-party @command{tar}s
results in extraction of sparse files in @emph{condensed form}. If
the @command{tar} implementation in question does not support the POSIX
format, it will also extract a file containing extension header
attributes. This file can be used to expand the file to its original
state. However, POSIX-aware @command{tar}s will usually ignore the
unknown variables, which makes restoring the file more
difficult. @xref{extracting sparse v.0.x, Extraction of sparse
members in v.0.0 format}, for a detailed description of how to
restore such members using non-GNU @command{tar}s.
@end enumerate
@cindex sparse formats, v.0.1
@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
attempted to solve these problems. Like its predecessor, this format
stores the sparse map in the extended POSIX header. It retains the
@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
it uses a single variable:
@table @asis
@item GNU.sparse.map
@vrindex GNU.sparse.map, extended header variable
Map of non-null data chunks. It is a string consisting of
comma-separated values "@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]".
@end table
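
Decoding such a string amounts to splitting it on commas and pairing up
the numbers. A minimal Python sketch, with a hypothetical function name
and assuming the values are decimal:

```python
def parse_sparse_map_v01(value: str, numblocks: int):
    """Turn "offset,size,offset,size,..." into [(offset, size), ...].

    `numblocks` is the value of GNU.sparse.numblocks and is used only
    to check that the map has the expected number of entries.
    """
    numbers = [int(item) for item in value.split(",")] if value else []
    if len(numbers) != 2 * numblocks:
        raise ValueError("GNU.sparse.map does not match GNU.sparse.numblocks")
    # Even positions hold offsets, odd positions hold sizes.
    return list(zip(numbers[0::2], numbers[1::2]))
```

For instance, the value @samp{0,512,4096,1024} with two blocks describes
512 bytes of data at offset 0 and 1024 bytes at offset 4096.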
To address the second problem, the @code{name} field in the @code{ustar}
header is replaced with a special name, constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

@vrindex GNU.sparse.name, extended header variable
The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. Thus, those @command{tar} implementations
that are not aware of GNU extensions will at least extract the files
into separate directories, so that the user can expand them
afterwards. @xref{extracting sparse v.0.x, Extraction of sparse
members in v.0.1 format}, for a detailed description of how to
restore such members using non-GNU @command{tar}s.
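
The following Python sketch shows how such a name could be built. It is
only an illustration with a hypothetical function name, and it assumes
the usual meaning of the placeholders in @GNUTAR{} name patterns:
@samp{%d} is the directory part of the real name (as printed by
@command{dirname}), @samp{%f} is its last component, and @samp{%p} is
the process ID of the @command{tar} process that created the archive.

```python
import posixpath


def sparse_member_name(real_name: str, pid: int) -> str:
    """Expand the pattern %d/GNUSparseFile.%p/%f for `real_name`."""
    directory = posixpath.dirname(real_name) or "."   # dirname prints "." here
    base = posixpath.basename(real_name)
    return f"{directory}/GNUSparseFile.{pid}/{base}"
```

With these assumptions, a member @file{dir/file} archived by process
1234 would be stored under the name @file{dir/GNUSparseFile.1234/file}.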
The resulting @code{GNU.sparse.map} string can be @emph{very} long.
Although POSIX does not impose any limit on the length of an @code{x}
header variable, such a long value may confuse some @command{tar}
implementations.
@node PAX 1
@appendixsubsec PAX Format, Version 1.0

@cindex sparse formats, v.1.0
Version @code{1.0} of the sparse format was introduced with @GNUTAR{}
1.15.92. Its main objective was to make the resulting file
extractable with little effort even by non-POSIX-aware @command{tar}
implementations. Starting from this version, the extended header
preceding a sparse member always contains the following variables that
identify the format being used:
@table @asis
@item GNU.sparse.major
@vrindex GNU.sparse.major, extended header variable
Major version

@item GNU.sparse.minor
@vrindex GNU.sparse.minor, extended header variable
Minor version
@end table
The @code{name} field in the @code{ustar} header contains a special name,
constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

@vrindex GNU.sparse.name, extended header variable, in v.1.0
@vrindex GNU.sparse.realsize, extended header variable
The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. The real size of the file is stored in the
variable @code{GNU.sparse.realsize}.
The sparse map itself is stored in the file data block, preceding the actual
file data. It consists of a series of decimal numbers of arbitrary length, delimited
by newlines. The map is padded with nulls to the nearest block boundary.

The first number gives the number of entries in the map. The map entries follow,
each one consisting of two numbers giving the offset and size of the
data block it describes.
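
The map described above can be decoded as in the following Python
sketch. It is an illustration rather than code from @GNUTAR{}; the
function name is hypothetical, and the sketch assumes 512-byte blocks
and decimal numbers, each terminated by a newline.

```python
BLOCK = 512     # assumption: the usual tar block size


def parse_sparse_map_v10(data: bytes):
    """Decode the map stored at the start of a v.1.0 member's data.

    Returns (sparse_map, data_start): a list of (offset, size) pairs and
    the position in `data` where the condensed file contents begin.
    """
    pos = 0

    def next_number() -> int:
        nonlocal pos
        end = data.index(b"\n", pos)            # each number ends with a newline
        number = int(data[pos:end].decode("ascii"))
        pos = end + 1
        return number

    count = next_number()                       # first number: entries in the map
    sparse_map = [(next_number(), next_number()) for _ in range(count)]
    # The map is padded with nulls up to the next block boundary.
    data_start = -(-pos // BLOCK) * BLOCK
    return sparse_map, data_start
```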
The format is designed in such a way that non-POSIX-aware
@command{tar}s and @command{tar}s not
supporting @code{GNU.sparse.*} keywords will extract each sparse file
in its condensed form with the file map prepended and will place it
into a separate directory. Then, using a simple program, it would be
possible to expand the file to its original form even without @GNUTAR{}.
@xref{Sparse Recovery}, for detailed information on how to extract
sparse members without @GNUTAR{}.
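
The expansion step itself is small. The Python sketch below is one
hypothetical way to write it, not the program referred to above: given
the condensed contents, a list of (@var{offset}, @var{size}) pairs and
the real size, it copies each data block to its offset and leaves the
holes filled with nulls. It works in memory for brevity, so a practical
tool would rather seek in the output file.

```python
def expand_sparse(condensed: bytes, sparse_map, realsize: int) -> bytes:
    """Rebuild the original contents of a sparse file.

    `condensed` holds the data blocks back to back, in map order;
    `sparse_map` is a list of (offset, size) pairs; `realsize` is the
    size of the original file.
    """
    out = bytearray(realsize)                   # zero-filled, so holes read as nulls
    pos = 0
    for offset, size in sparse_map:
        chunk = condensed[pos:pos + size]
        if len(chunk) != size or offset + size > realsize:
            raise ValueError("sparse map does not match the data")
        out[offset:offset + size] = chunk
        pos += size
    return bytes(out)
```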