4 IO::Compress::Zlib::FAQ -- Frequently Asked Questions about IO::Compress::Zlib
8 Common questions answered.
10 =head2 Compatibility with Unix compress/uncompress.
12 This module is not compatible with Unix C<compress>.
If you have the C<uncompress> program available, you can use this to read
compressed files
17 open F, "uncompress -c $filename |";
22 Alternatively, if you have the C<gunzip> program available, you can use
23 this to read compressed files
25 open F, "gunzip -c $filename |";
and this to write compressed files, if you have the C<compress> program
available
open F, "| compress -c > $filename";
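
The two-argument piped opens above interpolate C<$filename> into a shell command, so a filename containing spaces or shell metacharacters will break them. The following is a minimal sketch of the same idea using the list form of C<open>, which does not involve a shell. The helper name C<open_filter_reader> is hypothetical and not part of any module, and the sketch assumes the list form of a piped C<open> is supported on your platform.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external filter program and return a
# filehandle that reads the filter's standard output.
# The list form of open hands @command directly to the program,
# so the filename is never interpreted by a shell.
sub open_filter_reader {
    my @command = @_;
    open(my $fh, '-|', @command)
        or die "Cannot run '$command[0]': $!\n";
    binmode $fh;
    return $fh;
}

# Typical use, assuming the gunzip program is installed:
#   my $fh = open_filter_reader('gunzip', '-c', $filename);
#   while (defined(my $line = <$fh>)) { ... }
#   close $fh or warn "gunzip reported a problem\n";
```

Checking the return value of C<close> on a piped handle is worthwhile because that is where a non-zero exit status from the external program shows up.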
38 =head2 Accessing .tar.Z files
40 See previous FAQ item.
If the C<Archive::Tar> module is installed and either the C<uncompress> or
C<gunzip> programs are available, you can use one of these workarounds to
read C<.tar.Z> files.
46 Firstly with C<uncompress>
52 open F, "uncompress -c $filename |";
53 my $tar = Archive::Tar->new(*F);
56 and this with C<gunzip>
62 open F, "gunzip -c $filename |";
63 my $tar = Archive::Tar->new(*F);
66 Similarly, if the C<compress> program is available, you can use this to
67 write a C<.tar.Z> file
74 my $fh = new IO::File "| compress -c >$filename";
75 my $tar = Archive::Tar->new();
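
As a rough sketch of the writing side, the archive can be built in memory and then printed to whatever filehandle is available, including a pipe to C<compress>. The helper name C<write_tar_to_handle> and the member names are hypothetical. The sketch relies on C<< Archive::Tar->write >> returning the archive as a string when called with no arguments.

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper: build a tar archive from a hash of
# member name => content and print it to an already-open filehandle.
sub write_tar_to_handle {
    my ($fh, $members) = @_;

    my $tar = Archive::Tar->new();
    for my $name (sort keys %$members) {
        $tar->add_data($name, $members->{$name})
            or die "Cannot add '$name': " . $tar->error . "\n";
    }

    binmode $fh;
    # With no arguments, write() returns the whole archive as a string.
    print {$fh} $tar->write()
        or die "Cannot write archive: $!\n";
    return 1;
}

# Typical use, assuming the compress program is installed:
#   open(my $fh, "| compress -c >$filename")
#       or die "Cannot run compress: $!\n";
#   write_tar_to_handle($fh, { 'hello.txt' => "hello\n" });
#   close $fh;
```

Because the whole archive is held in memory before it is printed, this approach suits small archives best.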
80 =head2 Accessing Zip Files
82 This module provides support for reading/writing zip files using the
83 C<IO::Compress::Zip> and C<IO::Uncompress::Unzip> modules.
85 The primary focus of the C<IO::Compress::Zip> and C<IO::Uncompress::Unzip>
86 modules is to provide an C<IO::File> compatible streaming read/write
interface to zip files/buffers. They are not fully fledged archivers. If
you are looking for an archiver, check out the C<Archive::Zip> module. You
can find it on CPAN at
91 http://www.cpan.org/modules/by-module/Archive/Archive-Zip-*.tar.gz
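
To give a feel for the streaming interface, here is a small sketch that creates a one-member zip archive in a buffer and reads it back line by line. The member name C<notes.txt> and the sample text are made up for illustration.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError);
use IO::Uncompress::Unzip qw($UnzipError);

my $text = "first line\nsecond line\n";

# Create a single-member zip archive in memory.
my $zipped;
zip(\$text => \$zipped, Name => 'notes.txt')
    or die "Cannot zip: $ZipError\n";

# Read it back through the filehandle-like streaming interface.
my $unzip = IO::Uncompress::Unzip->new(\$zipped)
    or die "Cannot unzip: $UnzipError\n";

my $member_name = $unzip->getHeaderInfo()->{Name};

my @lines;
while (defined(my $line = <$unzip>)) {
    push @lines, $line;
}
$unzip->close();

print "$member_name has ", scalar(@lines), " lines\n";
```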
93 =head2 Compressed files and Net::FTP
95 The C<Net::FTP> module provides two low-level methods called C<stor> and
C<retr> that both return filehandles. These filehandles can be used with the
C<IO::Compress/Uncompress> modules to compress or uncompress files read
from or written to an FTP Server on the fly, without having to create a
temporary file.
Firstly, here is code that uses C<retr> to uncompress a file as it is
102 read from the FTP Server.
105 use IO::Uncompress::Gunzip qw(:all);
107 my $ftp = new Net::FTP ...
109 my $retr_fh = $ftp->retr($compressed_filename);
gunzip $retr_fh => $outFilename, AutoClose => 1
or die "Cannot uncompress '$compressed_filename': $GunzipError\n";
113 and this to compress a file as it is written to the FTP Server
116 use IO::Compress::Gzip qw(:all);
118 my $stor_fh = $ftp->stor($filename);
gzip $filename => $stor_fh, AutoClose => 1
120 or die "Cannot compress '$filename': $GzipError\n";
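
The one-shot functions accept any readable filehandle, not just those returned by C<Net::FTP>. The sketch below shows the same pattern with a local temporary file standing in for the FTP data connection, so it can be run without a server. The helper name C<gunzip_from_handle> is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: uncompress everything readable from $in_fh
# (for example the handle returned by retr) into a string.
# AutoClose asks gunzip to close $in_fh once it has finished.
sub gunzip_from_handle {
    my ($in_fh) = @_;
    my $output;
    gunzip($in_fh => \$output, AutoClose => 1)
        or die "Cannot uncompress: $GunzipError\n";
    return $output;
}

# Stand-in for an FTP data connection: a temporary gzip file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

my $original = "hello from the server\n";
gzip(\$original => $tmp_name)
    or die "Cannot compress: $GzipError\n";

open(my $in_fh, '<', $tmp_name)
    or die "Cannot open '$tmp_name': $!\n";
binmode $in_fh;

my $received = gunzip_from_handle($in_fh);
```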
122 =head2 How do I recompress using a different compression?
This is easier than you might expect if you realise that all the
125 C<IO::Compress::*> objects are derived from C<IO::File> and that all the
126 C<IO::Uncompress::*> modules can read from an C<IO::File> filehandle.
128 So, for example, say you have a file compressed with gzip that you want to
recompress with bzip2. Here is all that is needed to carry out the
recompression.
132 use IO::Uncompress::Gunzip ':all';
133 use IO::Compress::Bzip2 ':all';
135 my $gzipFile = "somefile.gz";
136 my $bzipFile = "somefile.bz2";
138 my $gunzip = new IO::Uncompress::Gunzip $gzipFile
139 or die "Cannot gunzip $gzipFile: $GunzipError\n" ;
141 bzip2 $gunzip => $bzipFile
142 or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;
144 Note, there is a limitation of this technique. Some compression file
145 formats store extra information along with the compressed data payload. For
146 example, gzip can optionally store the original filename and Zip stores a
147 lot of information about the original file. If the original compressed file
148 contains any of this extra information, it will not be transferred to the
new compressed file using the technique above.
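
The limitation can be seen in a small in-memory sketch. It stores a filename in a gzip header, recompresses the stream with bzip2, and confirms that only the payload makes the trip. The filename C<report.txt> and the sample data are invented for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Uncompress::Gunzip  qw($GunzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

my $text = "some data\n" x 10;

# A gzip stream that records an original filename in its header.
my $gz;
gzip(\$text => \$gz, Name => 'report.txt')
    or die "Cannot gzip: $GzipError\n";

my $gunzip = IO::Uncompress::Gunzip->new(\$gz)
    or die "Cannot gunzip: $GunzipError\n";

# The name is available from the gzip header at this point...
my $stored_name = $gunzip->getHeaderInfo()->{Name};

# ...but the recompression only copies the uncompressed payload.
my $bz;
bzip2($gunzip => \$bz)
    or die "Cannot bzip2: $Bzip2Error\n";

my $roundtrip;
bunzip2(\$bz => \$roundtrip)
    or die "Cannot bunzip2: $Bunzip2Error\n";
```

If the extra information matters, it has to be read from the source header, as C<$stored_name> is here, and carried across by the application itself.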
151 =head2 Apache::GZip Revisited
Below is a mod_perl Apache compression module, called C<Apache::GZip>,
taken from
155 F<http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>
157 package Apache::GZip;
158 #File: Apache::GZip.pm
161 use Apache::Constants ':common';
164 use constant GZIP_MAGIC => 0x1f8b;
165 use constant OS_MAGIC => 0x03;
170 my $file = $r->filename;
171 return DECLINED unless $fh=IO::File->new($file);
172 $r->header_out('Content-Encoding'=>'gzip');
173 $r->send_http_header;
174 return OK if $r->header_only;
176 tie *STDOUT,'Apache::GZip',$r;
177 print($_) while <$fh>;
184 # initialize a deflation stream
185 my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;
187 # gzip header -- don't ask how I found out
188 $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));
190 return bless { r => $r,
201 my $data = $self->{d}->deflate($_);
202 $self->{r}->print($data);
203 # keep track of its length and crc
204 $self->{l} += length($_);
205 $self->{crc} = crc32($_,$self->{crc});
212 # flush the output buffers
213 my $data = $self->{d}->flush;
214 $self->{r}->print($data);
216 # print the CRC and the total length (uncompressed)
217 $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
Here's the Apache configuration entry you'll need to make use of it. Once
set, everything in the /compressed directory will be compressed
automagically.
<Location /compressed>
SetHandler perl-script
PerlHandler Apache::GZip
</Location>
231 Although at first sight there seems to be quite a lot going on in
232 C<Apache::GZip>, you could sum up what the code was doing as follows --
233 read the contents of the file in C<< $r->filename >>, compress it and write
234 the compressed data to standard output. That's all.
236 This code has to jump through a few hoops to achieve this because
242 The gzip support in C<Compress::Zlib> version 1.x can only work with a real
243 filesystem filehandle. The filehandles used by Apache modules are not
244 associated with the filesystem.
That means all the gzip support has to be done by hand - in this case by
creating a tied filehandle to deal with creating the gzip header and
trailer.
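
To make "by hand" concrete, the sketch below assembles a gzip stream from its three parts (header, raw deflate data, trailer) outside of Apache, and then checks the result with C<IO::Uncompress::Gunzip>. It is an illustration of the technique, not the original module's code. It packs the trailer with the little-endian C<V> template, which is an assumption made here for portability.

```perl
use strict;
use warnings;
use Compress::Zlib;    # deflateInit, crc32, MAX_WBITS, Z_OK
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $input = "hello, world\n" x 5;

# Raw deflate stream: a negative WindowBits suppresses the zlib header.
my $d = deflateInit(-WindowBits => -MAX_WBITS())
    or die "Cannot create deflation stream\n";

# 10-byte gzip header: magic number, method (8 = deflate), flags,
# modification time, extra flags, OS code (3 = Unix).
my $header = pack("nCCVCC", 0x1f8b, 8, 0, time(), 0, 3);

my ($body, $status) = $d->deflate($input);
$status == Z_OK or die "deflate failed\n";

my ($tail, $flush_status) = $d->flush();
$flush_status == Z_OK or die "flush failed\n";

# 8-byte trailer: CRC-32 and length of the uncompressed data,
# both stored little-endian.
my $trailer = pack("VV", crc32($input), length($input));

my $gzip_stream = $header . $body . $tail . $trailer;

# Check the result with an independent gzip reader.
my $check;
gunzip(\$gzip_stream => \$check)
    or die "Not a valid gzip stream: $GunzipError\n";
```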
254 C<IO::Compress::Gzip> doesn't have that filehandle limitation (this was one
255 of the reasons for writing it in the first place). So if
256 C<IO::Compress::Gzip> is used instead of C<Compress::Zlib> the whole tied
257 filehandle code can be removed. Here is the rewritten code.
259 package Apache::GZip;
262 use Apache::Constants ':common';
263 use IO::Compress::Gzip;
269 my $file = $r->filename;
270 return DECLINED unless $fh=IO::File->new($file);
271 $r->header_out('Content-Encoding'=>'gzip');
272 $r->send_http_header;
273 return OK if $r->header_only;
275 my $gz = new IO::Compress::Gzip '-', Minimal => 1
278 print $gz $_ while <$fh>;
283 or even more succinctly, like this, using a one-shot gzip
285 package Apache::GZip;
288 use Apache::Constants ':common';
289 use IO::Compress::Gzip qw(gzip);
293 $r->header_out('Content-Encoding'=>'gzip');
294 $r->send_http_header;
295 return OK if $r->header_only;
297 gzip $r->filename => '-', Minimal => 1
305 The use of one-shot C<gzip> above just reads from C<< $r->filename >> and
306 writes the compressed data to standard output.
308 Note the use of the C<Minimal> option in the code above. When using gzip
309 for Content-Encoding you should I<always> use this option. In the example
310 above it will prevent the filename being included in the gzip header and
make the gzip data stream slightly smaller.
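
The size difference is easy to measure. In this sketch the input is a buffer rather than a file, so the filename is supplied explicitly with the C<Name> option to mimic what happens when a file is compressed. The name C<index.html> and the sample body are invented.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $body = "<html><body>hello</body></html>\n";

# Header that carries the original filename.
my $with_name;
gzip(\$body => \$with_name, Name => 'index.html')
    or die "Cannot gzip: $GzipError\n";

# Minimal header: the optional fields are left out.
my $minimal;
gzip(\$body => \$minimal, Minimal => 1)
    or die "Cannot gzip: $GzipError\n";

my $saved = length($with_name) - length($minimal);
print "Minimal saves $saved bytes\n";

# Both streams still uncompress to the same data.
my $check;
gunzip(\$minimal => \$check)
    or die "Cannot gunzip: $GunzipError\n";
```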
313 =head2 Using C<InputLength> to uncompress data embedded in a larger file/buffer.
315 A fairly common use-case is where compressed data is embedded in a larger
316 file/buffer and you want to read both.
As an example consider the structure of a zip file. This is a well-defined
file format that mixes both compressed and uncompressed sections of data in
a single file.
For the purposes of this discussion you can think of a zip file as a sequence
323 of compressed data streams, each of which is prefixed by an uncompressed
324 local header. The local header contains information about the compressed
325 data stream, including the name of the compressed file and, in particular,
326 the length of the compressed data stream.
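
As an illustration of that layout, the fixed 30-byte part of a local header can be decoded with a single C<unpack> call. The sketch below builds a small archive in memory with C<IO::Compress::Zip> to have something to parse. The variable names and the member name C<sample.txt> are chosen for this example only.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw(zip $ZipError);

use constant ZIP_LOCAL_HDR_SIG    => 0x04034b50;
use constant ZIP_LOCAL_HDR_LENGTH => 30;

# Build a small zip archive in memory to have something to parse.
my $text = "sample data\n";
my $zipped;
zip(\$text => \$zipped, Name => 'sample.txt')
    or die "Cannot zip: $ZipError\n";

# Decode the fixed-size local header in one call.
# "V" is a 32-bit and "v" a 16-bit little-endian unsigned integer.
my ($signature, $version, $gp_flag, $method, $mod_time, $mod_date,
    $crc, $compressed_length, $uncompressed_length,
    $filename_length, $extra_length)
    = unpack("V v v v v v V V V v v",
             substr($zipped, 0, ZIP_LOCAL_HDR_LENGTH));

$signature == ZIP_LOCAL_HDR_SIG
    or die "Not a zip local header\n";

# The variable-length filename follows the fixed header.
# Note: if bit 3 of $gp_flag is set, the sizes are stored after the
# compressed data and the two length fields here may be zero.
my $filename = substr($zipped, ZIP_LOCAL_HDR_LENGTH, $filename_length);

print "First member: $filename\n";
```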
To illustrate how to use C<InputLength>, here is a script that walks a zip
file and prints out how many lines are in each compressed file (if you
intend to write code to walk through a zip file for real, see
L<IO::Uncompress::Unzip/"Walking through a zip file">)
337 use IO::Uncompress::RawInflate qw(:all);
339 use constant ZIP_LOCAL_HDR_SIG => 0x04034b50;
340 use constant ZIP_LOCAL_HDR_LENGTH => 30;
342 my $file = $ARGV[0] ;
344 my $fh = new IO::File "<$file"
345 or die "Cannot open '$file': $!\n";
353 ($x = $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH
354 or die "Truncated file: $!\n";
356 my $signature = unpack ("V", substr($buffer, 0, 4));
358 last unless $signature == ZIP_LOCAL_HDR_SIG;
361 my $gpFlag = unpack ("v", substr($buffer, 6, 2));
362 my $compressedMethod = unpack ("v", substr($buffer, 8, 2));
363 my $compressedLength = unpack ("V", substr($buffer, 18, 4));
364 my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
365 my $filename_length = unpack ("v", substr($buffer, 26, 2));
366 my $extra_length = unpack ("v", substr($buffer, 28, 2));
369 $fh->read($filename, $filename_length) == $filename_length
370 or die "Truncated file\n";
372 $fh->read($buffer, $extra_length) == $extra_length
373 or die "Truncated file\n";
375 if ($compressedMethod != 8 && $compressedMethod != 0)
377 warn "Skipping file '$filename' - not deflated $compressedMethod\n";
378 $fh->read($buffer, $compressedLength) == $compressedLength
379 or die "Truncated file\n";
if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
385 die "Streamed Stored not supported for '$filename'\n";
388 next if $compressedLength == 0;
390 # Done reading the Local Header
392 my $inf = new IO::Uncompress::RawInflate $fh,
394 InputLength => $compressedLength
395 or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
404 print "$filename: $line_count\n";
407 The majority of the code above is concerned with reading the zip local
408 header data. The code that I want to focus on is at the bottom.
412 # read local zip header data
414 # get $compressedLength
416 my $inf = new IO::Uncompress::RawInflate $fh,
418 InputLength => $compressedLength
419 or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
428 print "$filename: $line_count\n";
431 The call to C<IO::Uncompress::RawInflate> creates a new filehandle C<$inf>
432 that can be used to read from the parent filehandle C<$fh>, uncompressing
433 it as it goes. The use of the C<InputLength> option will guarantee that
434 I<at most> C<$compressedLength> bytes of compressed data will be read from
the C<$fh> filehandle (the only exception is an error case such as a
truncated file or a corrupt data stream).
438 This means that once RawInflate is finished C<$fh> will be left at the
439 byte directly after the compressed data stream.
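
Here is a self-contained sketch of that behaviour using a made-up container format: a 4-byte length, a raw deflate stream, and a fixed trailer string. The format, the C<TRAILER> marker and the variable names are all invented for the example. Only the use of C<InputLength> mirrors the zip script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

my $payload = "line one\nline two\nline three\n";
my $compressed;
rawdeflate(\$payload => \$compressed)
    or die "Cannot compress: $RawDeflateError\n";

# Invented container: 4-byte length, compressed data, then a trailer.
my ($fh, $name) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} pack("V", length $compressed), $compressed, "TRAILER";
seek($fh, 0, 0) or die "Cannot rewind '$name': $!\n";

# Read the length field that precedes the compressed data.
read($fh, my $length_field, 4) == 4
    or die "Truncated file\n";
my $compressed_length = unpack("V", $length_field);

# Uncompress exactly $compressed_length bytes from the parent handle.
my $inf = IO::Uncompress::RawInflate->new($fh,
              InputLength => $compressed_length)
    or die "Cannot uncompress: $RawInflateError\n";

my $line_count = 0;
while (defined(my $line = <$inf>)) {
    $line_count++;
}
$inf->close();

# The parent handle is now positioned just past the compressed data.
read($fh, my $trailer, 7);
print "$line_count lines, then '$trailer'\n";
```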
441 Now consider what the code looks like without C<InputLength>
445 # read local zip header data
447 # get $compressedLength
449 # read all the compressed data into $data
450 read($fh, $data, $compressedLength);
452 my $inf = new IO::Uncompress::RawInflate \$data,
454 or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
463 print "$filename: $line_count\n";
The difference here is the addition of the temporary variable C<$data>.
This is used to store a copy of the compressed data while it is being
uncompressed.
470 If you know that C<$compressedLength> isn't that big then using temporary
471 storage won't be a problem. But if C<$compressedLength> is very large or
472 you are writing an application that other people will use, and so have no
473 idea how big C<$compressedLength> will be, it could be an issue.
475 Using C<InputLength> avoids the use of temporary storage and means the
476 application can cope with large compressed data streams.
One final point -- C<InputLength> can only be used when you
know the length of the compressed data beforehand, like here with a zip
file.
=head1 SEE ALSO
L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Compress::Lzf>, L<IO::Uncompress::UnLzf>, L<IO::Uncompress::AnyInflate>, L<IO::Uncompress::AnyUncompress>
486 L<Compress::Zlib::FAQ|Compress::Zlib::FAQ>
488 L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
489 L<Archive::Tar|Archive::Tar>,
=head1 AUTHOR
This module was written by Paul Marquess, F<pmqs@cpan.org>.
496 =head1 MODIFICATION HISTORY
498 See the Changes file.
500 =head1 COPYRIGHT AND LICENSE
502 Copyright (c) 2005-2008 Paul Marquess. All rights reserved.
504 This program is free software; you can redistribute it and/or
505 modify it under the same terms as Perl itself.