Zip vs Tar.gz Files Explained and Compared (Archiving and the DEFLATE algorithm)

In this video, I explain the similarities and differences between two popular compressed archive formats: zip on Windows and tar.gz in the *nix world. Both formats typically use the same compression, and both serve as a way to collect files together in an archive. However, there are some fundamental differences in how they work and why they're used for different purposes.

I discuss the two main steps of compression and linearizing the files into an archive, and how each stage differs between the two formats, as well as the advantages and disadvantages of each implementation. I hope you enjoy the video and learn something new!
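To make the two steps concrete, here is a minimal Python sketch (not something shown in the video) that builds both formats in memory using the standard library's zipfile and tarfile modules. The helper names, the file count, and the file contents are all made up for illustration: every file holds the same incompressible bytes, which is the case where the order of the two steps matters most. Zip compresses each file on its own and then stores the results side by side, while tar.gz lays the files out in one stream and compresses that stream as a whole.

```python
import io
import random
import tarfile
import zipfile


def make_files(count=50, size=2000, seed=0):
    """Return {name: data}; every file holds the same pseudo-random bytes.

    Hypothetical test data: incompressible on its own, but highly
    redundant across files.
    """
    rng = random.Random(seed)
    payload = bytes(rng.getrandbits(8) for _ in range(size))
    return {f"file{i:02d}.bin": payload for i in range(count)}


def build_zip(files):
    """zip: DEFLATE each file separately, then collect the compressed files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_tar_gz(files):
    """tar.gz: linearize all files into one tar stream, then gzip the stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = make_files()
zip_bytes = build_zip(files)
tgz_bytes = build_tar_gz(files)

print("zip bytes:   ", len(zip_bytes))
print("tar.gz bytes:", len(tgz_bytes))
```

With this contrived input the tar.gz should come out much smaller, because the compressor sees the repeated data across file boundaries, while zip has to compress each incompressible file in isolation. Real-world differences are usually far less dramatic.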

This is yet another video that is a bit rambly at times (having cut it down from over 30 minutes). I have added timestamps so you can easily skip between the sections you're interested in. I apologize in advance for the clipped (and heavily compressed) audio; this was accidentally recorded with a 10 dB boost on the mic, on a 128K MP3 recorder, making for a questionable combination.

Timestamps

Introduction
00:00 - Introduction
00:19 - What is a zip and tar file?
01:45 - Gzip (and other compressors)
02:45 - Why use tar instead of zip on Linux?

What's the difference?
03:48 - How does tar work?
04:39 - How does zip work?

Comparison
05:09 - The advantages of zip
06:46 - The advantage of tar.gz (and streaming compression)
08:30 - The disadvantage of tar

Further learning
09:30 - Notes on 7zip and .gz.tar
10:40 - Indexed tar files with pixz (and comparison to 7z)
13:30 - What should you use?

15:10 - Conclusion

Links (Get Smarter Section)

DEFLATE algorithm:

Wiki: https://en.wikipedia.org/wiki/Deflate
How DEFLATE works (good summary): https://zlib.net/feldspar.html
Full specification: https://datatracker.ietf.org/doc/html...

TAR format:

Wiki: https://en.wikipedia.org/wiki/Tar_(co...)
Man page: https://linux.die.net/man/1/tar
Tar format specs: https://www.gnu.org/software/tar/manu...

Gzip: (based on DEFLATE)

Wiki: https://en.wikipedia.org/wiki/Gzip
Homepage: https://www.gnu.org/software/gzip/

Bzip2:

Wiki: https://en.wikipedia.org/wiki/Bzip2
Homepage: https://www.sourceware.org/bzip2/

XZ utils: (LZMA2 compression)

Wiki: https://en.wikipedia.org/wiki/XZ_Utils
Homepage: https://tukaani.org/xz/

pixz: (parallel indexed xz)
https://github.com/vasi/pixz

pigz: (parallel implementation of gz)
https://github.com/madler/pigz

Lzip: (based on LZMA)

Homepage: https://www.nongnu.org/lzip/

LZMA2 Compression:

Wiki: https://en.wikipedia.org/wiki/Lempel%...

Z-Standard Compression: (aka zstd)

Wiki: https://en.wikipedia.org/wiki/Zstd
Homepage: https://facebook.github.io/zstd/
Source: https://github.com/facebook/zstd

7zip: (also generally LZMA)

Wiki: https://en.wikipedia.org/wiki/7-Zip
Homepage: https://www.7-zip.org/
Source code: https://sourceforge.net/projects/seve...

p7zip (POSIX port): http://p7zip.sourceforge.net/

Zip: (generally DEFLATE)

Wiki: https://en.wikipedia.org/wiki/ZIP_(fi...)
Specs: https://pkware.cachefly.net/webdocs/c...

Dar: (newer format competing with tar)
http://dar.linux.free.fr/


Content used:
Zip and Tar icons in thumbnail from FlatIcon.
Ending music is We'll Meet Again by TheFatRat

Clarifications and Corrections

Just to clarify a few things before I get some comments: the 'only decompress the file' benefit I mentioned for zip exists because zip (and 7z) keep an index of the archive's contents; in zip's case this is the central directory, which is stored at the end of the archive. If you did .gz.tar, you wouldn't get that benefit, as tar isn't indexed. Next, when I say 'the index of the tar file is at the end', what I mean is that if you want the file list (like an index would produce), you need to read the archive all the way to the end, as though there were an index there. Tar files don't have an index, just a 512-byte header at the front of each file. So, to get a file list, you need to read the header at the start of each file, meaning you have to read the full archive (and, for a tar.gz, decompress it along the way). I hope this clarifies it.
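As a rough illustration of the indexing point, here is a small Python sketch using only the standard library. It builds a tiny zip and a tiny uncompressed tar in memory (the file names and contents are made up), then shows that zip's index can be found from a record at the very end of the archive, while a tar listing has to hop from one per-file header to the next. The list_tar helper is hypothetical and deliberately simplistic: it assumes plain ustar entries with short ASCII names and ignores long names, pax headers, and other entry types.

```python
import io
import tarfile
import zipfile

files = {"a.txt": b"alpha\n", "b.txt": b"bravo\n"}

# Build a zip in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)
zip_bytes = zip_buf.getvalue()

# Build an uncompressed tar in memory (ustar format keeps headers simple).
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
tar_bytes = tar_buf.getvalue()

# zip: with no archive comment, the last 22 bytes are the
# end-of-central-directory record, which points at the index.
eocd_at_end = zip_bytes[-22:-18] == b"PK\x05\x06"

# Thanks to that index, one member can be read without touching the others.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    zip_names = zf.namelist()
    only_b = zf.read("b.txt")


def list_tar(raw):
    """List names by walking the 512-byte headers, one file at a time."""
    names = []
    offset = 0
    while offset + 512 <= len(raw):
        header = raw[offset:offset + 512]
        if header == bytes(512):  # an all-zero block marks the end of the archive
            break
        name = header[:100].rstrip(b"\0").decode("ascii")
        size = int(header[124:136].rstrip(b"\0 ").decode("ascii"), 8)  # octal field
        names.append(name)
        # Skip this header plus the file data, padded up to a 512-byte boundary.
        offset += 512 + (size + 511) // 512 * 512
    return names


tar_names = list_tar(tar_bytes)
print(eocd_at_end, zip_names, only_b, tar_names)
```

On an uncompressed tar stored on disk, those skips can be cheap seeks. Once the tar is wrapped in gzip, there is no way to jump ahead in the compressed stream, so everything before the header you want has to be decompressed first.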

Clarification: pigz is not indexed, pixz is. Pixz is backwards compatible with xz, although both support multithreading these days.

(more to be added)
