Hash Function Mathematics: How Hashing Works Under the Hood
Hash functions map arbitrary data to fixed-size outputs. Understanding the math behind hashing explains why collisions happen, why hashes are one-way, and how to choose the right algorithm.
Key Takeaways
- A hash function takes an input of any length and produces a fixed-length output (the digest).
- The probability of a hash collision grows much faster than intuition suggests.
- Standard hash functions are too fast for password storage.
- Data integrity**: Compare file hashes before and after transfer
- The same input always produces the same output, but even a single bit change in the input completely changes the digest (avalanche effect).
Hash Generator
Generate SHA-1, SHA-256, SHA-384, SHA-512 hashes from text
What a Hash Function Does
A hash function takes an input of any length and produces a fixed-length output (the digest). The same input always produces the same output, but even a single bit change in the input completely changes the digest (avalanche effect).
Key Properties
| Property | Meaning |
|---|---|
| Deterministic | Same input → same output, always |
| Uniform distribution | Outputs evenly spread across the range |
| Avalanche effect | Small input change → ~50% of output bits flip |
| Pre-image resistance | Cannot reverse from hash to input |
| Collision resistance | Hard to find two inputs with the same hash |
Common Algorithms
| Algorithm | Output Size | Speed | Security |
|---|---|---|---|
| MD5 | 128 bits | Very fast | Broken (collisions found) |
| SHA-1 | 160 bits | Fast | Deprecated (collisions found) |
| SHA-256 | 256 bits | Moderate | Strong |
| SHA-3 | 256/512 bits | Moderate | Strong (different construction) |
| BLAKE3 | 256 bits | Very fast | Strong |
The Birthday Paradox
The probability of a hash collision grows much faster than intuition suggests. For a hash with n possible outputs, a collision becomes 50% likely after approximately √n hashes. For 128-bit MD5, that is 2⁶⁴ — large but feasible for modern computers. For 256-bit SHA-256, it is 2¹²⁸ — computationally infeasible.
Password Hashing
Standard hash functions are too fast for password storage. An attacker can try billions of guesses per second. Password-specific functions (bcrypt, Argon2, scrypt) intentionally slow down hashing with configurable work factors:
- bcrypt: CPU-bound, adjustable rounds (default 12)
- Argon2: Memory-hard (requires configurable RAM, making GPU attacks expensive)
- scrypt: Both memory-hard and CPU-hard
Practical Applications
- Data integrity: Compare file hashes before and after transfer
- Hash tables: O(1) average lookup using hash as array index
- Digital signatures: Hash the message, then sign the hash
- Content addressing: Git commits are SHA-1 hashes of their contents