UUID v4 Generator · 5 min read
What Is a UUID and When Should You Use One?
UUIDs are 128-bit identifiers designed to be unique without central coordination. Learn what the parts of a v4 UUID mean, when to use UUIDs vs auto-increment IDs, and the odds of a collision.
What Is a UUID?
A UUID (Universally Unique Identifier) is a 128-bit number used to identify information without requiring a central authority to assign it. The standard text format, defined in RFC 4122 (since superseded by RFC 9562), looks like this:
550e8400-e29b-41d4-a716-446655440000
The 32 hexadecimal digits are split into five groups of 8-4-4-4-12 characters, separated by hyphens. This gives 32 hex digits × 4 bits = 128 bits of information, though a few of those bits are reserved for version and variant information.
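As a quick check of that arithmetic, here is a minimal sketch using Python's standard `uuid` module. The variable names are illustrative only.

```python
import uuid

text = "550e8400-e29b-41d4-a716-446655440000"

# Split the canonical text form into its five hyphen-separated groups.
groups = text.split("-")
group_lengths = [len(g) for g in groups]   # expected layout: 8-4-4-4-12
hex_digits = sum(group_lengths)            # hyphens are not counted
total_bits = hex_digits * 4                # each hex digit encodes 4 bits

# The same value parsed as a UUID object: 128 bits is 16 raw bytes.
u = uuid.UUID(text)

print(group_lengths, hex_digits, total_bits, len(u.bytes))
```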
UUID Versions Explained
Version 1: Time-based
Version 1 UUIDs combine a 60-bit timestamp (counted in 100-nanosecond intervals since 15 October 1582) with a clock sequence and a MAC address or random node identifier. They are unique across machines and time as long as node identifiers are distinct and the clock behaves, but they embed information about when and where they were generated, which is a privacy concern in some contexts.
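To illustrate how much a v1 UUID reveals, the sketch below recovers the creation time and node identifier from one. It relies on the `time` and `node` attributes of Python's `uuid.UUID`; the helper name `v1_timestamp` is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by v1 timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def v1_timestamp(u: uuid.UUID) -> datetime:
    """Return the creation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is the 60-bit count of 100 ns intervals since the epoch;
    # dividing by 10 converts it to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

u1 = uuid.uuid1()
created = v1_timestamp(u1)
node = u1.node  # 48-bit node identifier (MAC address or a random value)

print(created.isoformat(), f"{node:012x}")
```

Anyone who sees the identifier can run the same extraction, which is the privacy concern described above.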
Version 4: Random
Version 4 is the most widely used. It consists of 122 random bits, ideally from a cryptographically secure generator, with 4 bits reserved for the version number (4) and 2 bits for the variant. The version appears as the first digit of the third group (41d4 → version 4). The variant occupies the top two bits of the first digit of the fourth group, so that digit is always 8, 9, a, or b (a716 starts with a).
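The bit layout can be made concrete with a small sketch that builds a v4 UUID by hand. In practice `uuid.uuid4()` does this for you; `make_v4` is a hypothetical helper written only to show where the version and variant bits sit.

```python
import secrets
import uuid

def make_v4() -> uuid.UUID:
    """Build a version 4 UUID from 128 random bits (illustrative only)."""
    value = secrets.randbits(128)
    value &= ~(0xF << 76)   # clear the 4 version bits (top of the third group)
    value |= 0x4 << 76      # set version = 4
    value &= ~(0x3 << 62)   # clear the 2 variant bits (top of the fourth group)
    value |= 0x2 << 62      # set variant = binary 10
    return uuid.UUID(int=value)

u = make_v4()
text = str(u)

# Index 14 is the first digit of the third group, index 19 of the fourth.
print(text, text[14], text[19])
```

The remaining 128 - 4 - 2 = 122 bits are left untouched, which is where the "122 random bits" figure comes from.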
Version 7: Time-ordered random (new)
Version 7 is a newer format, standardized in RFC 9562 (2024), that is gaining adoption. It places a 48-bit millisecond-precision Unix timestamp in the high bits, followed by random data. This makes v7 UUIDs sort in creation order, which addresses the database indexing problem that plagues v4 UUIDs, while retaining randomness and not exposing a hardware address (the creation time is still visible).
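A rough sketch of that layout follows, assuming the field sizes from RFC 9562 (48-bit timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). `uuid7_sketch` is a hypothetical helper, not a library function; newer Python releases ship their own `uuid.uuid7`, and production code should prefer a maintained implementation.

```python
import secrets
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a v7-style UUID: millisecond timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80   # 48-bit Unix time in ms, high bits
    value |= 0x7 << 76                          # version = 7
    value |= secrets.randbits(12) << 64         # 12 random bits
    value |= 0x2 << 62                          # variant = binary 10
    value |= secrets.randbits(62)               # 62 random bits
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)

# A later timestamp always yields a larger value, so the IDs sort by creation time.
print(earlier < later, str(earlier) < str(later))
```

Two identifiers created within the same millisecond are ordered only by their random bits in this sketch, so ordering is by millisecond rather than strictly by creation sequence.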
The Collision Probability
A version 4 UUID has 122 random bits, so there are 2^122 (about 5.3 × 10^36) possible values. The probability of generating two identical UUIDs seems vanishingly small, and it is. To have a 50% chance of at least one collision, you would need to generate approximately 2.7 × 10^18 UUIDs. At a rate of one billion UUIDs per second, that would take about 86 years.
In practice, UUID collisions are not a realistic concern for a production system that uses a sound random number generator. The birthday problem math confirms that even at internet scale (billions of UUIDs generated per day across all systems worldwide) the cumulative collision probability over years of operation remains negligible.
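The figures above can be reproduced with the standard birthday-problem approximation, p ≈ 1 - exp(-n² / 2N), where N = 2^122. The sketch below is a back-of-the-envelope calculation; the rate of one billion UUIDs per day for ten years is an assumed workload chosen for illustration, not a measurement.

```python
import math

SPACE = 2 ** 122  # number of distinct v4 UUIDs (122 random bits)

def collision_probability(n: float) -> float:
    """Approximate chance of at least one collision among n random v4 UUIDs."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-(n * n) / (2 * SPACE))

# Number of UUIDs needed for a 50% collision chance: sqrt(2 * N * ln 2).
n_half = math.sqrt(2 * SPACE * math.log(2))

# Time to generate that many at one billion UUIDs per second.
years = n_half / 1e9 / (365.25 * 24 * 3600)

# Assumed workload: one billion UUIDs per day for ten years.
n_decade = 1e9 * 365.25 * 10
p_decade = collision_probability(n_decade)

print(f"{n_half:.2e} UUIDs, {years:.0f} years, p over a decade = {p_decade:.1e}")
```

Under that assumed workload the collision probability comes out on the order of 10^-12.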
UUID vs Auto-Increment IDs
Auto-incrementing integer IDs (1, 2, 3...) are simpler and more efficient in many database scenarios, but they have real drawbacks:
- Predictability: Knowing that record 1001 exists makes it trivial to guess that records 1000 and 1002 exist, which is an information leak.
- Merge conflicts: Merging data from two databases with separate auto-increment sequences produces collisions. UUIDs eliminate this problem entirely.
- Distributed systems: In microservice architectures, records created independently on different nodes cannot safely use auto-increment without a coordination layer.
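The merge problem in particular is easy to see in a toy example. The sketch below uses plain dictionaries as stand-ins for two independent databases; the data and names are made up for illustration.

```python
import uuid

# Two databases that each started their auto-increment sequence at 1.
db_a = {1: "alice", 2: "bob"}
db_b = {1: "carol", 2: "dave"}

# Every ID exists on both sides, so a naive merge would overwrite rows.
clashing_ids = db_a.keys() & db_b.keys()

# The same records keyed by v4 UUIDs generated independently on each side.
uuid_a = {uuid.uuid4(): name for name in ("alice", "bob")}
uuid_b = {uuid.uuid4(): name for name in ("carol", "dave")}
merged = {**uuid_a, **uuid_b}

print(sorted(clashing_ids), len(merged))
```

With integer keys, both IDs clash and need remapping before a merge. With UUID keys, all four rows survive without coordination.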
The tradeoff: random v4 UUIDs are terrible for database index performance. Because they are random, each insert goes to a random position in a B-tree index, causing frequent page splits and cache misses. This is why v7 UUIDs and ULIDs (Universally Unique Lexicographically Sortable Identifiers) exist: they are time-ordered, so inserts cluster at the end of the index like auto-increment IDs would.
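The effect on insert position can be approximated with a sorted list standing in for a B-tree index. This is a simplified simulation, not a database benchmark: `tail_insert_fraction` is a hypothetical helper, and the time-ordered keys use an assumed counter of one key per millisecond as their prefix.

```python
import bisect
import itertools
import secrets
import uuid

def tail_insert_fraction(keys):
    """Fraction of inserts that land at the end of a sorted index."""
    index = []
    tail = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            tail += 1          # appended at the end, like an auto-increment ID
        index.insert(pos, key)
    return tail / len(index)

# Random v4 keys: insert positions are scattered across the whole index.
random_keys = [uuid.uuid4().bytes for _ in range(2000)]

# Time-ordered keys: 6-byte increasing millisecond prefix plus 10 random bytes.
counter = itertools.count(1_700_000_000_000)
ordered_keys = [
    next(counter).to_bytes(6, "big") + secrets.token_bytes(10)
    for _ in range(2000)
]

random_fraction = tail_insert_fraction(random_keys)
ordered_fraction = tail_insert_fraction(ordered_keys)
print(random_fraction, ordered_fraction)
```

In this model every time-ordered key is appended at the end, while only a small fraction of random keys are. That difference is what translates into page splits and cache misses in a real index.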
Common Uses for UUIDs
- Database primary keys: particularly in distributed or multi-tenant systems
- File names: uploaded user files get UUID names to prevent collisions and path traversal attacks
- API keys and session tokens: a v4 UUID provides 122 bits of randomness, sufficient for a temporary token provided it comes from a cryptographically secure generator
- Request correlation IDs: trace a single request across distributed microservices
- Idempotency keys: ensure a payment or operation is not processed twice
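As one example from this list, the idempotency-key pattern can be sketched in a few lines. The `charge` function and the in-memory `processed` dictionary are hypothetical; a real service would use durable storage and handle concurrent requests.

```python
import uuid

processed = {}  # hypothetical in-memory store: idempotency key -> result

def charge(idempotency_key, amount_cents):
    """Process a charge once per key; repeated calls return the stored result."""
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
    processed[idempotency_key] = result
    return result

# The client generates one key per logical operation and reuses it on retry.
key = str(uuid.uuid4())
first = charge(key, 1999)
retry = charge(key, 1999)

print(first["charge_id"] == retry["charge_id"], len(processed))
```

Because the client picks the key before sending the request, a retry after a timeout cannot create a second charge.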
When NOT to Use UUIDs
UUIDs are 16 bytes versus 4 bytes for a 32-bit integer or 8 bytes for a 64-bit integer. At scale, this storage overhead matters, both in the table itself and in every index, foreign key, and join that references that column. For a table with billions of rows and many foreign key references, switching from bigint to UUID can measurably increase storage and slow queries.
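To put rough numbers on that overhead, the sketch below compares raw key sizes and estimates the extra bytes for one key column. The one-billion-row table is an assumed example, and the estimate counts key bytes only, ignoring per-row and per-page overhead in a real database.

```python
import struct
import uuid

u = uuid.uuid4()
uuid_binary = len(u.bytes)        # raw binary form of a UUID
uuid_text = len(str(u))           # canonical text form, if stored as a string
int32 = struct.calcsize("<i")     # 32-bit integer
int64 = struct.calcsize("<q")     # 64-bit integer (bigint)

# Assumed table size, for illustration only.
rows = 1_000_000_000
extra_bytes_per_column = (uuid_binary - int64) * rows
extra_gb = extra_bytes_per_column / 1e9

print(uuid_binary, uuid_text, int32, int64, extra_gb)
```

The same difference is paid again in every index and foreign key column that stores the identifier, and storing UUIDs as 36-character text more than doubles the binary cost.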
If your system is single-node, never needs to merge data from separate sources, and has high-volume write patterns where index performance is critical, a bigint auto-increment ID is often the better engineering choice.
References
- Leach, P., Mealling, M., & Salz, R. (2005). RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace. Internet Engineering Task Force.
- Schwartz, B. (2007). UUID vs auto increment in MySQL: performance and storage analysis. Percona Database Performance Blog.
- The PostgreSQL Global Development Group. (2024). uuid-ossp: generate universally unique identifiers. PostgreSQL Documentation.
- Wikipedia contributors. (2024). Universally unique identifier: collision probability section. Wikipedia.