Compression Paradox

Nowadays we compress files everyday and we are never  surprising watching the ability of compressors to obtain a smaller file, but this is an incredible result!

What we do compressing a file with N bits into a file with N/2 bits (a common compression ratio ) is to make a correspondence between a set S1 with 2^N possible files and a set S2 with 2^(N/2) elements .

What is the probability to take a compressible file ? To take A file in the set S1 with a correspondence in the set S2? The probability is 2^-(N/2). The probability is exponential low!. This mean that if you take an arbitrary file it is about impossible to compress it with every compressing algorithm!

So why also simple compressor work so well?

The answer is  because the existing files are only a subset of possible files and these existing files are absolutely compressible! The existing bits of information are absolutely compressible.

Not only the probability to have a compressible file is exponential low  but this is the most optimistic scenario where the compressor is the most powerful compressor possible! A compressor like this is able to make a correspondence for each string in S2 a string in S1 ( and there are also a lot of theoretical problem to do this ) but what is the state of the art of compressors? Are we using very powerful compressors?

To answer to this question we can try to make an inverse experiment . I choose an absolutely compressible binary string and then using the compressors I can see how much they are able to compress .

The result of this experiment show all the limits of the today compressors .

I compute different length of bit strings of evolution of the Cellular Automata Rule 30 . This data is theoretically compressible to 1 byte ( the size of the rule) + log(N) where N is the size of the bitstring in the worst case.

What I obtain using all known compressors is always an increasing in the size of the compressed file!

So given an arbitrary file is possible to compress it but given an absolutely compressible file , at the limit of compressibility compressors are not able to compress anything.

So what this experiment show is that we have solutions to the compression problem with a capability distant from the maximum performance available by the theory but also with this gap we are able to obtain incredible practical result in compression.

2 thoughts on “Compression Paradox

  1. Pingback: Shannon discrepance 2 « Breakingkeyboards’s Blog

  2. Pingback: Shannon discrepance 1 « Breakingkeyboards’s Blog

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s