Eighteen Quintillion YouTube Videos

If you haven’t seen Will YouTube Ever Run Out Of Video IDs?, it’s worth a watch, not only because Tom Scott recorded the 5-minute video in a single take, but also because in the middle of it he managed to recite seventy-three quintillion, seven hundred eighty-six quadrillion, nine hundred seventy-six trillion, two hundred ninety-four billion, eight hundred thirty-eight million, two hundred six thousand, four hundred sixty-four from memory.

There’s only one problem: he recited the wrong number.

Actually, there are a few problems*, but first, let me provide some background.

The day after I saw the video, I needed to write some code at work to encode a random number in a non-decimal base for similar purposes. The function I was using to generate random numbers (openssl_random_pseudo_bytes) required a single parameter: the length of the random number to generate in bytes. Initially I was planning on encoding the random number in hexadecimal (base 16) because every hexadecimal digit (0-9, A-F) corresponds to a sequence of 4 binary digits (or bits), as shown in the following table.

base 10 base 2 base 16
decimal binary hexadecimal
0 0000 0
1 0001 1
2 0010 2
3 0011 3
4 0100 4
5 0101 5
6 0110 6
7 0111 7
8 1000 8
9 1001 9
10 1010 A
11 1011 B
12 1100 C
13 1101 D
14 1110 E
15 1111 F

Thus every byte (8 bits) returned by the random number generator can be encoded perfectly using 2 hexadecimal digits. For example, the byte 11010011 in binary equals D3 in hexadecimal (or 211 in decimal). The largest byte, 11111111, equals the largest 2-digit hexadecimal number: FF (or 255 in decimal). This is what I mean by a perfect byte encoding—there are no greater values possible with 2 hexadecimal digits.

But what if I encoded the random number in base 64? Turns out base 64 doesn’t encode bytes as neatly as hexadecimal. Every base 64 digit corresponds to a sequence of 6 bits, so the only time when base 64 perfectly encodes whole byte values is when the number of bits is divisible by 6. This is the case for 3 bytes (24 bits), 6 bytes (48 bits), 9 bytes (72 bits), and so on. You can certainly encode any byte value that is not a multiple of 3 in base 64, but you would leave some possible base 64 values unused. Here’s a table that shows the relationship between bytes, bits, and base 16 and 64 digits.

bytes bits base 10 base 16 base 64
values digits digits
1 8 256 2 1⅓
2 16 65,536 4 2⅔
3 24 16,777,216 6 4
4 32 4,294,967,296 8 5⅓
5 40 1,099,511,627,776 10 6⅔
6 48 281,474,976,710,656 12 8
7 56 72,057,594,037,927,936 14 9⅓
8 64 18,446,744,073,709,551,616 16 10⅔
9 72 4,722,366,482,869,645,213,696 18 12
10 80 1,208,925,819,614,629,174,706,176 20 13⅓
11 88 309,485,009,821,345,068,724,781,056 22 14⅔
12 96 79,228,162,514,264,337,593,543,950,336 24 16
13 104 20,282,409,603,651,670,423,947,251,286,016 26 17⅓
14 112 5,192,296,858,534,827,628,530,496,329,220,096 28 18⅔
15 120 1,329,227,995,784,915,872,903,807,060,280,344,576 30 20
16 128 340,282,366,920,938,463,463,374,607,431,768,211,456 32 21⅓

Which brings me back to YouTube. Why did they choose an 11-digit base 64 number for their video IDs, and not 8 or 12 or 16 digits? The answer has to do with that function for generating random numbers I referenced earlier. How long of a random number in bytes would one need to generate so that number encoded in base 64 would require 11 digits? Based on the table above, 8 bytes (64 bits) require 10⅔ digits in base 64, and since you can’t have ⅔ of a digit, 11 digits would be necessary to encode all possible 8-byte values. This discovery was unsurprising because 8 bytes is a common size for storing and calculating very large numbers. For instance, in a relational database, where one might store a video ID as an integer, the BIGINT data type requires 8 bytes.

This was my ah-ha moment. Though it’s possible that YouTube stores video IDs as 11-character strings, it’s far more probable that they generate and store video IDs as 8-byte integers, which they then trivially encode as 11-digit base 64 numbers used in their URLs. And if that’s the case, the largest unsigned 8 byte value they could generate is “only” 18 quintillion and change (18,446,744,073,709,551,615 to be precise) which is ¼ of the 73 quintillion value Tom Scott masterfully recites in his video (73,786,976,294,838,206,464 to be precise) which he arrived at by calculating every possible integer that an 11-digit base 64 number could encode (266).

I’m not claiming that YouTube stores video IDs in a relational database (though they may have at some point pre-Google), but I think it’s far more likely that they generate random 8-byte (64-bit) integers for their video IDs, as opposed to 66-bit integers. Keep in mind that 18 quintillion is the mathematical maximum. As Tom Scott rightly suggests, they probably reject any numbers whose base 64 encoding is a word in the dictionary (which must be pretty amusing code, having to filter for words across a variety of languages as well as any leetspeak variants).

If you need further evidence, I took the video IDs from the top 100 YouTube videos of all time, and converted them to integers. The greatest base 64 video ID on the list, _Z5-P9v3F8w, was, no surprise, 18,275,183,150,654,494,668 in decimal. As hypothesized, no video IDs breached the 8-byte (18 quintillion) threshold.

I should add that if YouTube ever ran out of video IDs (the horror!), they probably wouldn’t add a 12th character to the base 64 number as Tom Scott suggests, they’d start generating random 10-byte numbers, which would require 14 digits in base 64—giving them a cool septillion video IDs to use—for the rest of eternity.

There’s one last subtle error in the video that I wanted to point out. The base 64 digits in the video are displayed starting with 0-9. However, according to the base 64 standard (RFC 4648) the digits should be ordered from A-Z, then a-z, then 0-9, followed by + and / (or – and _ for use in URLs and filenames). Zero is 0 in base 2, 0 in base 10, 0 in base 16, and A in base 64.

*Tom Scott says that his in-video footnote,

Base64 encoding for messages is more complicated, but that’s outside the scope of this video.

was intended to handwave over these “bugs” in his video. Fair enough, though I still think that the 8-byte random number underlying YouTube’s 11-digit base 64 video IDs was a pretty neat discovery.

Update: Just stumbled upon this related and very interesting scholarly article, Gone in Six Characters: Short URLs Considered Harmful for Cloud Services.

There is an obvious tension between maintaining the benefits of short URLs and preventing scanning attacks. It might be instructive to compare the size of token space to the “bits of security” metric sometimes used in cryptography. Most token alphabets consist of 62 characters, which is close to 26. Each character can thus be thought of as providing roughly 6 bits of search space. We estimate that tokens of 10 characters or more would make it difficult to scan the entire token space.

5 Comments

Gawd, I just love it when your blog is all techie stuff!

Dad, glad you enjoyed. This absorbed my Saturday.

Z

Cool! Nice post. Just a couple of notes to add:

– I think it makes sense that base64 encodes 6 bit groups: 2^6=64
– it’s worth mentioning that base64 encoding is done in 24-bit blocks (so that it “plays nicely” with bytes); this is how you may end up with = or == padding at the end matching the number of bytes that had to be padded

A

This article got me thinking of how much data must be stored by Youtube alone not including Google and how much it must cost to store that data. If you were to record every possible Youtube 11 digit base64 in a text file seperated by a comma and a 1 representing a boolean of true that the address is in use or a 0 representing false then a comma and then the next 11 digit base64 address and so on. The text file would be over 200,000 petabytes. The price for a TeraByte from my local computer shop is about aud$40/TB that is not in bulk so for about aud$8,000,000,000 you could have a text file that will tell you if a base64 address is used or not but still have no clue to what the video using that address is. By the time you created the file you would always be obsolete because the time it takes to scan the 0 addresses again a lot of them would be 1 addresses between the time you start and finish the scan.

Care to Comment?

Or if you'd prefer to get in touch privately, please send me an email.

Name

Email (optional)

Blog (optional)