How YouTube Stores Its Data
YouTube has a massive amount of data to store, with millions of users around the globe. Uploads and streaming on the site rarely run into problems, which raises the question: how does YouTube store all that data?
Video files are large, and every day new users create content and interact on the website. Many people purchase YouTube likes, further increasing the flow of traffic through YouTube. YouTube's data is stored along with the rest of Google’s data, using Google’s innovative storage solution, Google BigTable. They also employ a lot of
How Much Data Does YouTube Need?
YouTube has come a long way since its launch in 2005. At the moment, no one outside the company really knows how much data is stored for YouTube alone. This is not surprising, given that over 50,000 hours of video are uploaded to its servers daily. Even if YouTube ever revealed the size of its database, the figure would be out of date almost immediately, because it is steadily increasing.
What Is Google BigTable?
BigTable is a compressed wide-column storage system that Google uses to store data on an enormous scale. Development of BigTable began in 2004, and it was launched within the next year. It was the most innovative storage solution available at the time and is now used to store data for many Google applications, such as Google Earth, YouTube, and Google Maps (up to sixty Google products).
By using its own database, Google was able to increase its scalability and improve performance. The results suggest it achieved both goals: Google and its products are market leaders in many industries, and Google and YouTube hold the top two spots among the largest search engines in the world.
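To make "wide-column" more concrete, here is a minimal in-memory sketch of the data model: each value is addressed by a row key, a column key, and a timestamp, and rows are read back in sorted key order. The class name `TinyWideColumnTable`, the row keys, and the column names are hypothetical and chosen only for illustration. This is not BigTable's API, and it ignores compression, distribution, and persistence.

```python
from collections import defaultdict


class TinyWideColumnTable:
    """Toy model of a wide-column table (illustrative only, not BigTable's API)."""

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell; older versions are kept."""
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is absent."""
        if row not in self._rows or column not in self._rows[row]:
            return None
        return self._rows[row][column][0][1]

    def scan(self, start_row, end_row):
        """Yield (row, newest column values) for keys in [start_row, end_row)."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                newest = {col: cells[0][1] for col, cells in self._rows[row].items()}
                yield row, newest


# Hypothetical rows: sparse columns, so each row only stores what it has.
table = TinyWideColumnTable()
table.put("video#001", "meta:title", "First upload", timestamp=1)
table.put("video#001", "meta:title", "First upload (edited)", timestamp=2)
table.put("video#002", "stats:views", 42, timestamp=1)

latest_title = table.get("video#001", "meta:title")
missing_cell = table.get("video#002", "meta:title")
rows_in_range = [row for row, _ in table.scan("video#001", "video#002")]
```

Rows do not need to share the same columns, which is what lets a layout like this hold very different kinds of records in one table.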
Solutions To YouTube’s Data Problems
The problem with YouTube is that a lot of data is produced very quickly, which makes it difficult to extract data and measure performance at the speed YouTube needs. The solution is a Hadoop instance with two parts:
- Hadoop Distributed File System: A portable file system written in Java that can easily be scaled to fit the needs of vast data storage. It stores data across multiple machines and is highly reliable.
- MapReduce: A programming model for analyzing large data sets through filtering, sorting, and summarizing operations.
Together, the two parts of the Hadoop instance make data mining and analysis easier for YouTube. Additionally, the physical storage used for YouTube data is all on spinning hard disk drives (HDDs). For personal (and sometimes commercial) use, solid-state drives (SSDs) give better performance, but that performance is hard to scale, so SSDs do not meet the high demands of YouTube’s massive data storage.
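The MapReduce model described above can be sketched in a few lines of plain Python. This is a single-process illustration under stated assumptions, not Hadoop's API: the event format, the function names, and the sample data are all hypothetical. The map step filters records and emits key/value pairs, the shuffle step groups and sorts them by key, and the reduce step summarizes each group.

```python
from collections import defaultdict


def map_phase(record):
    """Map: keep only 'view' events and emit (video_id, 1) for each one."""
    video_id, event = record
    if event == "view":
        yield video_id, 1


def shuffle(pairs):
    """Shuffle: group values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: summarize all values for one key as a total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    return [reduce_phase(key, values) for key, values in shuffle(mapped)]


# Hypothetical event log: (video_id, event_type)
events = [
    ("cat_video", "view"),
    ("dog_video", "view"),
    ("cat_video", "like"),
    ("cat_video", "view"),
]
view_counts = run_job(events)
# view_counts == [("cat_video", 2), ("dog_video", 1)]
```

In a real cluster the map and reduce functions run in parallel on many machines, with the input read from the distributed file system. The structure of the computation stays the same.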
Finally, YouTube uses Google’s global data centers for distribution. It also has its own content distribution network (CDN), which serves the more popular videos and helps keep data available to users. Data is placed on servers arbitrarily, not necessarily at the closest geographical location.
Studying the data storage solutions used by Google and YouTube is worthwhile for anyone who needs to manage data, especially in large quantities.
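As a rough illustration of that two-tier idea, the sketch below sends requests for popular videos to a CDN node and everything else to a data center, picking a server at random rather than by distance. The popularity threshold, the server names, and the function `choose_source` are hypothetical. YouTube's actual routing logic is not public and is certainly more involved.

```python
import random

# Hypothetical cutoff: videos at or above this view count are treated as popular.
POPULARITY_THRESHOLD = 1_000_000


def choose_source(view_count, cdn_nodes, data_centers, rng=random):
    """Return (tier, server) for a video request.

    Popular videos go to the CDN tier, the rest to a data center. The server
    within a tier is picked at random, not by geographic closeness.
    """
    if view_count >= POPULARITY_THRESHOLD:
        return "cdn", rng.choice(cdn_nodes)
    return "data_center", rng.choice(data_centers)


# Hypothetical server names
cdn_nodes = ["cdn-a", "cdn-b", "cdn-c"]
data_centers = ["dc-1", "dc-2"]

popular_tier, popular_server = choose_source(5_000_000, cdn_nodes, data_centers)
niche_tier, niche_server = choose_source(120, cdn_nodes, data_centers)
```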