Posted by Five, 2020-02-07, 12:34 AM
The Validity of MD5 Checksums

this is an old article written by site founder RainDawg back in 2004, originally published in sharingthegroove.org technobabble and later on RainDawg's own website (now defunct). we still use md5 for video, but for audio we've used st5/ffp for the last 15yrs or so.

In this thread find some lost pages from the past recovered via archive.org. enjoy.
-Five

Quote:
Originally Posted by RainDawg
The Validity of MD5 Checksums

Written By The RainDawg

I've heard several people voice a degree of skepticism as to how reliable the practice of using md5 checksums to validate files is. I mean, just how can a simple 32-digit number tell one file apart from the billions of billions of others on the internet? Well, for the more mathematically minded folks on here, check out this link for a quick description of the actual md5 algorithm:

RFC 1321 - The MD5 Message-Digest Algorithm
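For anyone who would rather see the practice than the math, validating a file against a published checksum might look roughly like the sketch below. It relies on Python's standard hashlib module; the function names md5_of_file and verify and the chunk size are illustrative choices, not anything from the article or the RFC.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the md5 hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large video or audio files never
        # have to fit in memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's md5 matches the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling verify with a file path and the 32-character string from a checksum file returns True only when every byte of the file produced that exact digest.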

Incidentally, the output is actually a 128-bit number, though the output we generally see is encoded in hexadecimal. It behaves like a random number, but it is computed from the particular combination of bits that comprise the file you are making a signature of, so the same file always gives the same value. This signature is a single "word" of 32 places, each filled with one of 16 characters (0-9 and a-f). Using that simple math you learned in high school, this means there are 16^32 combinations, or about 3.403*10^38 possible md5 hash values.
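The counting argument is easy to check for yourself. A minimal sketch using Python's hashlib (the input bytes are arbitrary and the variable names are just for illustration):

```python
import hashlib

# Hash any input at all; the digest length never changes.
digest = hashlib.md5(b"any bytes at all").hexdigest()

hex_places = len(digest)            # 32 hexadecimal places
bits = hex_places * 4               # each hex place carries 4 bits, so 128 bits
possible_values = 16 ** hex_places  # same quantity as 2 ** 128

print(hex_places, bits)
print(f"{possible_values:.3e}")     # 3.403e+38
```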

Now, to give you an idea of the magnitude of this number, I ran some simple calculations for comparison. Scientists estimate that the universe is 12-14 billion years old. Using the high end of that estimate as a baseline, it's easily calculated that the big bang occurred on the order of 4.4*10^17 seconds ago. If you had a computer with the ability to randomly generate files and compute their checksums at 7.7*10^20 per second, it would have needed from the inception of the universe until now just to try as many files as there are possible checksums in the hunt for one that matches a desired checksum.

Now, you've all verified checksums before, and just checking one takes several seconds, never mind randomly generating a file of arbitrary size first. Even if you had a trillion computers generating files since the beginning of time, each one would still have to generate and check files at a rate of 770 million per second (7.7*10^20 divided by 10^12); impossible by even the most optimistic opinions of the limits of computing, now or in the future.
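The figures in the last two paragraphs can be reproduced in a few lines. This sketch assumes a 365.25-day year and the 14 billion year upper estimate; the variable names are illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600            # 31,557,600 seconds

# High-end age of the universe, in seconds.
universe_age_seconds = 14e9 * SECONDS_PER_YEAR   # roughly 4.4e17

# Number of possible md5 values (16^32).
hash_space = 2 ** 128

# Files per second one machine would need to try, since the big bang,
# to attempt as many files as there are possible checksums.
rate_single = hash_space / universe_age_seconds  # roughly 7.7e20

# The same workload split across a trillion machines.
rate_per_computer = rate_single / 1e12           # roughly 7.7e8, i.e. 770 million

print(f"{universe_age_seconds:.1e} seconds since the big bang")
print(f"{rate_single:.1e} files per second on one computer")
print(f"{rate_per_computer:.1e} files per second on each of a trillion computers")
```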

In fact, if I assume there are 1 trillion computers in the world and that there are 1 trillion files on each computer (both high estimates), that makes 10^24 files, and there is still only about a 1 in 340 trillion chance that any one of them shares an md5 hash with the particular file you are checking. And, of course, the chance of hitting on these odds is about the same as winning the pick 6 lottery....2 TIMES IN A ROW!
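The 340 trillion figure is just the number of possible hashes divided by the number of files, read as the odds that some file out there matches one particular hash. The sketch below reproduces it and puts it next to a lottery. Note the assumption: the article does not say which pick 6 game it has in mind, so a 6-of-49 draw is used here purely for illustration.

```python
from math import comb

hash_space = 2 ** 128                 # possible md5 values
files_in_world = 10**12 * 10**12      # a trillion computers, a trillion files each

# "1 in N" odds that some file out there matches one particular hash.
one_in = hash_space // files_in_world

# Assumed lottery: choose 6 numbers out of 49 (hypothetical choice of game).
lottery_one_in = comb(49, 6)          # 13,983,816
twice_in_a_row = lottery_one_in ** 2  # odds of two consecutive jackpots

print(f"md5 match: 1 in {one_in:.2e}")
print(f"two jackpots in a row: 1 in {twice_in_a_row:.2e}")
```

Under that assumption the two odds land within a factor of two of each other, which is in line with the "about the same" in the text.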

Now, these calculations do not take into account the fact that the file types would have to be compatible. The chance that two files return an identical md5 hash is so small as to be considered impossible. But even if they did, what are the chances that the other file would be a FLAC file, or even a valid file on a Windows system at all (or Mac, or Linux, or Atari 2600 for that matter)? Even if the two files did match, they would be so markedly different that you'd know immediately upon attempting to open the file that it wasn't a FLAC or SHN at all.

In conclusion, if you've read this far, this should prove to you more skeptical computer users that it is simply impossible a) to create a file from its md5 hash and b) to have a file pass the md5 check and NOT be what you're expecting it to be.

Anyone who still doesn't think that md5 is a valid method of fingerprinting files must have a rather profound misunderstanding of the science of random numbers. I would have far more faith in the absurd tenets of tarot cards, the Bible Code, or even Hasidic numerology than to believe that the file I have is different than the one that created the md5 hash I'm verifying against....each of these "arts" has an infinitely higher chance of scoring a "hit".
__________________
Checksums Demystified | ask for help in Technobabble

thetradersden.org | ttd recommended free software/freeware webring
shntool tlh eac foobar2000 spek audacity cdwave vlc

Quote:
Originally posted by oxymoron
Here you are in a place of permanent madness, be careful!