
HN crowd might have some interesting takes on this: what is your preferred archive (and/or compression) format, and why?

I’ve been using .tar.xz for archiving, but haven’t looked into what the “best” option really is.



If I need to access individual files, or the archive needs to be usable by others on arbitrary operating systems, zip is best. Random access is greatly helped by the directory at the end of a zip file. .tar.*z is particularly hostile to random access.
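A minimal sketch of that random-access property, using Python's standard `zipfile` module. The member names are made up for illustration; the point is only that one member can be read by looking it up in the directory, without scanning the members stored before it.

```python
import io
import zipfile

# Build a small zip in memory. The file names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(100):
        zf.writestr(f"docs/file{i:03d}.txt", f"contents of file {i}\n" * 50)

# Opening the archive reads the directory stored at the end of the file.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("docs/file042.txt")  # looked up in the directory
    data = zf.read(info)                   # seeks to that member and inflates only it

first_line = data.decode().splitlines()[0]
print(first_line, "at offset", info.header_offset)
```

With a .tar.gz, by contrast, reaching the same member generally means decompressing everything stored before it.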

If I just need an archive for easy schlepping around, I use tar if the content is not likely compressible or tar.gz if it is compressible. The slowness of better compression or the likelihood that I will struggle with missing utilities tends to make me shy away from other compressors.
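One rough way to make the "is it compressible" call, sketched in Python. The helper name `pick_tar_mode` and the 0.9 threshold are assumptions for illustration, not anything from a real tool: it runs a fast zlib pass over a sample and picks a `tarfile` mode accordingly.

```python
import os
import zlib

def pick_tar_mode(sample: bytes, threshold: float = 0.9) -> str:
    """Return 'w:gz' if a fast zlib pass shrinks the sample below
    `threshold` of its original size, otherwise plain 'w'.
    The threshold is an arbitrary illustrative value."""
    if not sample:
        return "w"
    ratio = len(zlib.compress(sample, 1)) / len(sample)
    return "w:gz" if ratio < threshold else "w"

# Repetitive text compresses well; random bytes do not.
text_mode = pick_tar_mode(b"the quick brown fox\n" * 1000)
random_mode = pick_tar_mode(os.urandom(20000))
print(text_mode, random_mode)
```

The returned string can be passed as the `mode` argument of `tarfile.open`.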

I would only highly optimize for size and suffer the slow compression if it will be downloaded a lot of times and that makes a difference for user experience or bandwidth costs.


For what it's worth, the guy who maintains GNU ed and wrote lzip says you should not do that[0].

0. https://www.nongnu.org/lzip/xz_inadequate.html


I remember some discussion about that a while back on HN. The TL;DR is a) he does say that and b) he would say that, since lzip is effectively a competitor to xz.

That said, I've been using pixz (which is compatible with xz but can do parallel decompression and a few other things) for many years on many dozens of terabytes of compressed data and have never had any problems.


.tar.zst - I have found Zstd to be enormously faster than gzip, bzip2, or xz. Its compression ratio is good and it's available in most Linux distros.


All the traditional UNIX/POSIX archive formats are inadequate for archiving modern file systems, because they lose a part of the file metadata, e.g. they may lose the extended attributes or access-control lists, or they may truncate the time stamps.

Most archivers have various extensions to the tar or pax file formats to deal with modern metadata, but the format extensions may differ between "tar" or "pax" implementations, and not all such extensions actually manage to preserve all the metadata.

One archiver available on Linux/*BSD systems that I have verified does the job right is the "bsdtar" program from the "libarchive" package, when used as a "pax" program, i.e. when invoked with the options "bsdtar --create --format=pax". It can archive files from a filesystem like XFS without losing metadata, unlike most other tar/pax/cpio programs.
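The time stamp truncation is easy to see in a small sketch. This uses Python's standard `tarfile` module rather than bsdtar, so it says nothing about bsdtar's own behavior, and it does not touch extended attributes or ACLs. It only compares the classic ustar header with the pax format for one sub-second mtime; the member name and the time stamp are made up.

```python
import io
import tarfile

def roundtrip_mtime(fmt, mtime):
    """Write one empty member with the given mtime in the given
    tar format, read it back, and return the stored mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tf:
        info = tarfile.TarInfo("example.txt")  # hypothetical member name
        info.mtime = mtime
        tf.addfile(info)  # size is 0, so no payload is needed
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        return tf.getmember("example.txt").mtime

precise = 1700000000.123456  # arbitrary time stamp with a fractional part

ustar_mtime = roundtrip_mtime(tarfile.USTAR_FORMAT, precise)
pax_mtime = roundtrip_mtime(tarfile.PAX_FORMAT, precise)

# The ustar header only has room for whole seconds; the pax format
# keeps the fractional part in an extended header record.
print("ustar:", ustar_mtime)
print("pax:  ", pax_mtime)
```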

For many years I have used this archiver exclusively, to avoid losing information when making archive files. It is also very versatile, allowing archiving to be combined with many finely configurable compression or encryption algorithms.

Some years ago the most widely available tar program, GNU tar, was not able to store files from XFS or UFS without losing metadata.

It is possible that GNU tar has been improved in the meantime and now does its job right, but I have had no reason to go back to it, so I have never checked it again for changes.

Of course, Windows file formats, e.g. zip or 7z, are even less appropriate for archiving UNIX/POSIX file systems than the traditional tar/cpio/pax.


Maybe a bit unorthodox, but I've been using SquashFS with xz for compression for long-term archival (I generally prefer zstd, but for long-term archival I don't mind waiting longer for better compression with xz).

SquashFS files have file-based deduplication, fast random access, and mountability, all of which are lacking from .tar.* archives without resorting to other indexing tools. And they're mountable on any Linux without installing anything.

Only downside is they're readonly (or more accurately append-only), but for my uses, that's totally fine.


I use windows, so .zip. Maybe on rare occasion .7z if I really want better compression.

.tar.xz files are a gross pain on windows and I’m always annoyed when people use them.


Why are they a pain?


How are tarballs any more of a pain than .7z?


I'll have a guess: because the 7-zip GUI doesn't decompress tar.gz files in a single step (i.e. first you extract the tar, then you extract it again)


IIRC 7zip actually does it in one step. But many tools such as WinZip do not. I forget the exact behavior of built-in, WinZip, and WinRar.
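The two-step behavior comes from a .tar.gz being two nested layers: a gzip stream wrapped around a tar archive. A short Python sketch with the standard `gzip` and `tarfile` modules shows both routes. It is an illustration of the layering only, not of how 7-zip or WinZip work internally, and the file name is made up.

```python
import gzip
import io
import tarfile

# Build a tiny .tar.gz in memory with one hypothetical member.
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w:gz") as tf:
    payload = b"hello\n"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
targz_bytes = raw.getvalue()

# Two steps: strip the gzip layer first, then open the plain tar.
tar_bytes = gzip.decompress(targz_bytes)
with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
    two_step = tf.extractfile("hello.txt").read()

# One step: let tarfile handle both layers at once.
with tarfile.open(fileobj=io.BytesIO(targz_bytes), mode="r:gz") as tf:
    one_step = tf.extractfile("hello.txt").read()

print(two_step == one_step)
```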


For personal stuff I haven't compressed an archive in a while. I just copy the containing folder to an external drive and a cloud storage if it's really important. Everything I really really want archived is less than 1TB already. That fits on an external drive and cheap cloud storage subscription.

If I need to send something I'll use zip for personal stuff or tar.gz at work because I know everyone is using some kind of Linux and it's the only terminal zip command I have memorized.


Just to think out of the box: How about not creating archives, just leave the files as they are in a directory. For transferring, use a client that preserves directory structure.

The space of such clients, and their features and popularity, is limited though. Rsync has a million flags that add complexity. Then you have BitTorrent, which as far as I know doesn't compress. Others: git, nfs, ftp.


Usually you want some kind of compression though.

That's actually the reason why I used tar instead of rsync; I only had around 750 GiB of space. The 519GiB tar.gz would fit, the 1.1TiB directory structure wouldn't.


I have always been partial to .uha, the files produced by Uwe Herklotz' UHARC archive utility. I mostly ran into them on pirated game rips downloaded from DALnet IRC in the 90s and early aughts. It's extremely slow, however.

If I was going to be packing lots of data I'd probably use mopaq (.mpq) which has support for LZMA and Huffman coding.


I use tar.lz (Lzip) for stuff that only I will care about.

For sending stuff to other people I just use zip, because it is the lowest common denominator, but I make sure to always use info-zip so that I don't use any weird proprietary extensions, and get proper zip64 support.



