How to Compress and Uncompress Files with Gzip on Linux


fatmawati achmad zaenuri / Shutterstock.com

There are many file compression utilities, but the one you are guaranteed to find on every Linux distribution is gzip. If you only learn to use one compression tool, it should be gzip.

Algorithms and trees

The gzip data compression tool was written in the early 1990s and is still found in every Linux distribution. Other compression tools are available, but whichever Linux computer you need to work on, you’ll find gzip on it. So if you know how to use gzip, you are ready to go without installing anything.

gzip is an implementation of the DEFLATE algorithm, which was invented and patented by Phil Katz of PKZIP fame. The DEFLATE algorithm improved on earlier compression algorithms, which all worked on variations of one theme. The data to be compressed is scanned, and unique strings are identified and added to a binary tree.

Each unique string is assigned a unique ID token by virtue of its position in the tree. The tokens replace the strings in the data, and because the tokens are smaller than the strings they replace, the file is compressed. Replacing the tokens with the original strings inflates the data back to its uncompressed state.

The DEFLATE algorithm added the twist that the most frequently encountered strings were assigned the smallest tokens and the least frequently encountered strings were assigned the largest. The DEFLATE algorithm also incorporated ideas from two earlier compression methods, Huffman encoding and LZ77 compression.
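
To see the effect of repeated strings for yourself, here is a minimal sketch. It assumes the yes, head, and wc utilities are installed (they are on most Linux systems), and the sample phrase is made up for illustration. It pipes 100,000 bytes of a repeated phrase and 100,000 bytes of random data through gzip and counts the bytes that come out.

```shell
#!/bin/sh
# Sketch: data full of repeated strings compresses far better than data
# with almost no repetition. The phrase below is an arbitrary example.

# 100,000 bytes made of one short phrase repeated over and over.
# With no file name, gzip -c compresses standard input to standard output.
repetitive_size=$(yes "the quick brown fox" | head -c 100000 | gzip -c | wc -c)

# 100,000 bytes of random data, which contains very few repeated strings.
random_size=$(head -c 100000 /dev/urandom | gzip -c | wc -c)

echo "repetitive data: 100000 -> $repetitive_size bytes"
echo "random data:     100000 -> $random_size bytes"
```

The repetitive data should shrink to a tiny fraction of its size, while the random data barely changes because there are no repeated strings to replace.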

At the time of this writing, the DEFLATE algorithm is almost three decades old. Three decades ago, data storage costs were high and transmission speeds were slow. Data compression was of vital importance.

Data storage is much cheaper today, and transmission speeds are much faster. But we have a lot more data to store, and all over the world, people are accessing cloud storage and streaming services. Data compression is still vitally important, even if all you’re doing is shrinking something you need to upload or stream, or if you’re trying to reclaim some space on a local hard drive.

The gzip command

The larger a file is, the better the compression can be. This is for two reasons. One is that there will be many identical and repeated sequences of bytes throughout a large file. The second reason is that the list of strings and tokens must be stored in the compressed file so that decompression can take place. With a very small file, that overhead can negate the benefits of compression. But even with a fairly small file, there is bound to be some size reduction.
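
The overhead on very small files is easy to demonstrate. The sketch below is only an illustration: the scratch directory and the file names tiny.txt and numbers.txt are made up, and it assumes the seq utility is available. It compresses a one-byte file and a much larger file, using the -c option to write the compressed data to standard output so the originals are left alone.

```shell
#!/bin/sh
# Sketch: a very small file can grow when compressed, a larger one shrinks.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"

printf 'a' > tiny.txt            # a one-byte file
seq 1 10000 > numbers.txt        # a larger, repetitive text file

# -c sends the compressed data to standard output; the originals stay put.
gzip -c tiny.txt > tiny.txt.gz
gzip -c numbers.txt > numbers.txt.gz

tiny_before=$(wc -c < tiny.txt)
tiny_after=$(wc -c < tiny.txt.gz)
big_before=$(wc -c < numbers.txt)
big_after=$(wc -c < numbers.txt.gz)

echo "tiny.txt:    $tiny_before -> $tiny_after bytes"
echo "numbers.txt: $big_before -> $big_after bytes"
```

The one-byte file comes out larger than it went in, because the gzip file format adds a fixed header and trailer to every archive.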

Compressing files

To compress a file, all you need to do is pass the name of the file to the gzip command. We will check the original size of the file, compress it, and then check the size of the compressed file.

ls -lh calc-sheet.ods
gzip calc-sheet.ods
ls -lh calc-*

The original file, a spreadsheet called “calc-sheet.ods”, is 11KB, and the compressed file, also known as the archive file, is 9.3KB. Note that the archive file name is the original file name with “.gz” appended.

The first use of the ls command targets a specific file, the spreadsheet. The second use of ls looks for all files beginning with “calc-”, but only finds the compressed file. That’s because, by default, gzip creates the archive file and deletes the original file.

That’s not a problem. If you need the original file, you can retrieve it from the archive file. But if you prefer to keep the original file, you can use the -k (keep) option.

gzip -k calc-sheet.ods
ls -lh calc-sheet.*

This time the original ODS file is preserved.
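
If you are on an older system, the -k option may be missing (as far as I know it arrived in gzip 1.6). A commonly used alternative is the -c option, which writes the compressed data to standard output so you can redirect it into a file yourself. The sketch below shows both approaches; the scratch directory and the file name report.txt are made up for the example.

```shell
#!/bin/sh
# Sketch: two ways to compress a file while keeping the original.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"
seq 1 5000 > report.txt  # hypothetical sample file

# Option 1: -k keeps report.txt and creates report.txt.gz next to it.
gzip -k report.txt

# Option 2: -c writes to standard output, so the original is never touched
# and you choose the name of the archive file yourself.
gzip -c report.txt > report-copy.txt.gz

ls -l report.txt report.txt.gz report-copy.txt.gz
```

A side effect of the second form is that the archive can be written to a different directory or given a different name.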

Decompressing files

To decompress a GZ archive file, use the -d (decompress) option. This extracts the compressed file from the archive and decompresses it so that it is indistinguishable from the original file.

ls calc-sheet.*
gzip -d calc-sheet.ods.gz
ls calc-sheet.*

This time, we can see that gzip has deleted the archive file after extracting the original file. To retain the archive file, we need to use the -k (keep) option again, together with the -d (decompress) option.

ls calc-sheet.*
gzip -dk calc-sheet.ods.gz
ls calc-sheet.*

This time, gzip does not delete the compressed file.
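
If you want to convince yourself that the decompressed file really is identical to the original, a round trip checked with cmp is one way to do it. This is only a sketch: the scratch directory and the file names data.txt and data.orig are invented for the example.

```shell
#!/bin/sh
# Sketch: compress, decompress, and confirm nothing changed.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"
seq 1 5000 > data.txt    # hypothetical sample file
cp data.txt data.orig    # untouched copy to compare against

gzip data.txt            # data.txt is replaced by data.txt.gz
gzip -dk data.txt.gz     # restore data.txt and keep data.txt.gz

# cmp -s is silent and reports the result through its exit status.
if cmp -s data.txt data.orig; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "round trip: $roundtrip"
```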

Decompression and overwriting

If you try to extract a file into a directory where the original file, or a different file with the same name, already exists, gzip prompts you to choose between abandoning the extraction and overwriting the existing file.

gzip -d text-file.txt.gz

If you know ahead of time that you’re happy to have the file in the directory overwritten by the archive file, use the -f (force) option.

gzip -df text-file.txt.gz

The file is overwritten, and you are silently returned to the command line.
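
In a script there is nobody to answer the prompt. The sketch below assumes GNU gzip, which, when standard input is not a terminal, refuses to overwrite and returns a non-zero exit status instead of asking. The scratch directory and file contents are made up for the example.

```shell
#!/bin/sh
# Sketch: overwrite behaviour when nobody can answer the prompt.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"

echo "archived version" > text-file.txt
gzip text-file.txt                          # leaves only text-file.txt.gz
echo "newer local version" > text-file.txt  # a clashing file appears

# Without -f, and with standard input redirected away from the terminal,
# gzip (GNU gzip assumed) declines to overwrite the existing file.
refused_status=0
gzip -d text-file.txt.gz < /dev/null || refused_status=$?
kept_content=$(cat text-file.txt)
echo "exit status without -f: $refused_status, file says: $kept_content"

# With -f the existing file is replaced without any questions.
gzip -df text-file.txt.gz
final_content=$(cat text-file.txt)
echo "after -f the file says: $final_content"
```

Checking the exit status like this lets a script decide for itself whether using -f is appropriate.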

Compress directory trees

The -r (recursive) option causes gzip to compress the files in an entire directory tree. But the result might not be what you expect.

Here is the directory tree that we are going to use in this example. Each directory contains a text file.

tree level1

Let’s use gzip on the directory tree and see what happens.

gzip -r level1/
tree level1

The result is that gzip has created an archive file for each text file in the directory structure. It did not create a single archive of the entire directory tree. In fact, gzip can only put a single file in an archive.
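
If the tree command is not installed, find shows the same thing. The sketch below rebuilds a small test tree in a scratch directory (the directory and file names are invented for the example), runs gzip -r over it, and counts the resulting archive files.

```shell
#!/bin/sh
# Sketch: gzip -r compresses each file separately, in place.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"

# A three-level tree with one text file per directory.
mkdir -p level1/level2/level3
echo "one"   > level1/file1.txt
echo "two"   > level1/level2/file2.txt
echo "three" > level1/level2/level3/file3.txt

gzip -r level1/

# List what is left: one .gz file per original text file.
find level1 -type f | sort
gz_count=$(find level1 -type f -name '*.gz' | wc -l)
txt_count=$(find level1 -type f -name '*.txt' | wc -l)
echo "archives: $gz_count, uncompressed text files left: $txt_count"
```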

We can create an archive containing a directory tree and all its files, but we need to bring another command into play. The tar program is used to create archives of many files, but it does not have its own compression routines. By using the appropriate options with tar, however, we can make tar push the archive file through gzip. That way we get a single compressed archive that contains several files or several directories.

tar -czvf level1.tar.gz level1

The tar options are:

  • c: Create an archive.
  • z: Push the archive through gzip.
  • v: Verbose mode. Print in the terminal window what tar is up to.
  • f level1.tar.gz: File name to use for the archive file.

This archives the directory tree structure and all files within the directory tree.
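
Getting the files back out follows the same pattern. As a rough sketch, the tar options -t (list) and -x (extract) can be combined with -z and -f, and -C chooses the directory to extract into. The scratch directory, the small tree, and the "restored" directory below are made up for the example.

```shell
#!/bin/sh
# Sketch: create, list, and extract a gzip-compressed tar archive.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"

mkdir -p level1/level2
echo "one" > level1/file1.txt
echo "two" > level1/level2/file2.txt

tar -czf level1.tar.gz level1       # create the compressed archive
tar -tzf level1.tar.gz              # list its contents without extracting

mkdir restored
tar -xzf level1.tar.gz -C restored  # extract into the restored directory

# diff -r compares the two trees file by file.
if diff -r level1 restored/level1 > /dev/null; then
    trees_match="yes"
else
    trees_match="no"
fi
echo "extracted tree matches the original: $trees_match"
```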

Get information about files

The -l (list) option provides information about an archive file. It shows you the compressed and uncompressed sizes of the file in the archive, the compression ratio, and the file name.

gzip -l level1.tar.gz
gzip -l text-file.txt.gz
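
One way to sanity-check the figures from -l is to measure the sizes independently with wc. The following is just a sketch with an invented scratch directory and file name; it records the size of a file before and after compression and then prints the gzip -l listing for comparison.

```shell
#!/bin/sh
# Sketch: compare the sizes reported by gzip -l with those measured by wc.
workdir=$(mktemp -d)       # hypothetical scratch directory
cd "$workdir"
seq 1 20000 > numbers.txt  # hypothetical sample file

original_size=$(wc -c < numbers.txt)
gzip numbers.txt
compressed_size=$(wc -c < numbers.txt.gz)

# The listing should show the same two sizes, plus the ratio and the name.
listing=$(gzip -l numbers.txt.gz)
echo "$listing"
echo "measured with wc: $compressed_size compressed, $original_size uncompressed"
```

For a tar.gz file, keep in mind that the uncompressed size is the size of the tar archive inside it, not the total of the individual files.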

You can verify the integrity of a compressed file with the -t (test) option.

gzip -t level1.tar.gz

If all goes well, you will quietly return to the command line. The absence of bad news is good news.

If the file is corrupted or is not a gzip archive, you are told about it.

gzip -t not-an-archive.gz
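
Because gzip -t reports its verdict through its exit status, it is easy to use in a script. The sketch below is an illustration only: the scratch directory and file names are made up, and the "truncated" archive is simulated by copying just the first 100 bytes of a good one.

```shell
#!/bin/sh
# Sketch: use the exit status of gzip -t to check several archives.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"

seq 1 1000 > good.txt
gzip good.txt                                  # a valid archive
echo "this is plain text" > not-an-archive.gz  # wrong format, right name
head -c 100 good.txt.gz > truncated.gz         # a deliberately cut-off copy

for archive in good.txt.gz not-an-archive.gz truncated.gz; do
    # Error messages are discarded; only the exit status is used here.
    if gzip -t "$archive" 2>/dev/null; then
        echo "$archive: OK"
    else
        echo "$archive: failed the integrity test"
    fi
done
```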

Speed versus compression

You can choose to prioritize either the speed of archive creation or the degree of compression. To do this, provide a number as an option, from -1 through to -9. The -1 option gives the fastest speed at the sacrifice of compression, and -9 gives the highest compression at the sacrifice of speed.

Unless you provide one of these options, gzip uses -6.

gzip -1 -kf calc-sheet.ods
ls -lh calc-sheet.ods.gz
gzip -9 -kf calc-sheet.ods
ls -lh calc-sheet.ods.gz
gzip -6 -kf calc-sheet.ods
ls -lh calc-sheet.ods.gz

With a file as small as this, we didn’t see any significant difference in execution speed, but there was a small difference in compression.

Interestingly, there is no difference between using compression level 9 and compression level 6. You can only squeeze so much compression out of a given file, and in this case that limit was reached with compression level 6. Raising it to 9 brought no more reduction in file size. With larger files, the difference between level 6 and level 9 would be more pronounced.
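
To try the levels on your own data, a short loop saves some typing. This is a sketch rather than a benchmark: the scratch directory and the generated numbers.txt file are stand-ins for whatever file you care about, and -c is used so each level writes its own archive without touching the original.

```shell
#!/bin/sh
# Sketch: compress the same file at three levels and compare the sizes.
workdir=$(mktemp -d)       # hypothetical scratch directory
cd "$workdir"
seq 1 50000 > numbers.txt  # stand-in for your own file

echo "original: $(wc -c < numbers.txt) bytes"
for level in 1 6 9; do
    # -c writes to standard output, so numbers.txt is never replaced.
    gzip -"$level" -c numbers.txt > "numbers-$level.gz"
    echo "level $level: $(wc -c < "numbers-$level.gz") bytes"
done
```

Prefixing the gzip line with the shell's time keyword is a simple way to compare speed as well, although the difference only becomes noticeable with large files.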

Compressed, not protected

Don’t confuse compression with encryption or any form of protection. Compressing a file does not give it enhanced security or privacy. Anyone with access to your file can use gzip to decompress it.
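
A quick illustration makes the point. In this sketch the scratch directory, the file name secret.txt, and its contents are all invented; the -dc options decompress to standard output, and no password or key is involved at any stage.

```shell
#!/bin/sh
# Sketch: a compressed file can be read back by anyone who has it.
workdir=$(mktemp -d)     # hypothetical scratch directory
cd "$workdir"

echo "account password: hunter2" > secret.txt  # made-up sensitive content
gzip secret.txt                                # leaves secret.txt.gz

# -d decompresses and -c writes the result to standard output.
recovered=$(gzip -dc secret.txt.gz)
echo "recovered without any key: $recovered"
```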
