πŸ‹
Menu
Comparison Beginner 2 min read 342 words

Hash Function Mathematics: How Hashing Works Under the Hood

Hash functions map arbitrary data to fixed-size outputs. Understanding the math behind hashing explains why collisions happen, why hashes are one-way, and how to choose the right algorithm.

Key Takeaways

  • A hash function takes an input of any length and produces a fixed-length output (the digest).
  • The probability of a hash collision grows much faster than intuition suggests.
  • Standard hash functions are too fast for password storage.
  • Data integrity**: Compare file hashes before and after transfer
  • The same input always produces the same output, but even a single bit change in the input completely changes the digest (avalanche effect).

What a Hash Function Does

A hash function takes an input of any length and produces a fixed-length output (the digest). The same input always produces the same output, but even a single bit change in the input completely changes the digest (avalanche effect).

Key Properties

Property Meaning
Deterministic Same input β†’ same output, always
Uniform distribution Outputs evenly spread across the range
Avalanche effect Small input change β†’ ~50% of output bits flip
Pre-image resistance Cannot reverse from hash to input
Collision resistance Hard to find two inputs with the same hash

Common Algorithms

Algorithm Output Size Speed Security
MD5 128 bits Very fast Broken (collisions found)
SHA-1 160 bits Fast Deprecated (collisions found)
SHA-256 256 bits Moderate Strong
SHA-3 256/512 bits Moderate Strong (different construction)
BLAKE3 256 bits Very fast Strong

The Birthday Paradox

The probability of a hash collision grows much faster than intuition suggests. For a hash with n possible outputs, a collision becomes 50% likely after approximately √n hashes. For 128-bit MD5, that is 2⁢⁴ β€” large but feasible for modern computers. For 256-bit SHA-256, it is 2¹²⁸ β€” computationally infeasible.

Password Hashing

Standard hash functions are too fast for password storage. An attacker can try billions of guesses per second. Password-specific functions (bcrypt, Argon2, scrypt) intentionally slow down hashing with configurable work factors:

  • bcrypt: CPU-bound, adjustable rounds (default 12)
  • Argon2: Memory-hard (requires configurable RAM, making GPU attacks expensive)
  • scrypt: Both memory-hard and CPU-hard

Practical Applications

  • Data integrity: Compare file hashes before and after transfer
  • Hash tables: O(1) average lookup using hash as array index
  • Digital signatures: Hash the message, then sign the hash
  • Content addressing: Git commits are SHA-1 hashes of their contents