YouTube is one of the most popular video-streaming platforms, allowing users to upload, watch, search, and share video content. It also provides features to like, dislike, comment on videos, etc. So, YouTube is a huge system! But have you ever thought about how the YouTube system works and the underlying principles behind its design? Let’s move forward to understand this in detail.
In this blog, we will focus on creating a small version of YouTube with the following components and features.
Client: A computer, mobile phone, etc.
Video Storage: A BLOB storage system for storing transcoded videos. Binary Large Object (BLOB) is a collection of binary data stored as a single entity in a database management system.
Transcoding Server: We use this for video transcoding or encoding purposes, which converts a video into multiple formats and resolutions (144p, 240p, 360p, 480p, 720p, 1080p & 4K) to provide the best video streaming for different devices and bandwidth requirements.
API Server: We use this to handle all requests except video streaming. This includes feed recommendations, generating video upload URLs, updating the metadata database and cache, user signup, etc. This server talks to various databases to fetch the required data.
Web Server: Used for handling incoming requests from the client. Based on the type of request, it routes the request to the API server or transcoding server.
CDN: Encoded videos are also stored in a CDN for fast streaming. When we play a popular video, it is streamed directly from the CDN.
Load Balancer: We use this to evenly distribute requests among API servers.
Metadata Storage: We use this to store all the video metadata like title, video URL, thumbnails, user information, view count, likes, comments, size, resolution, format, etc. It is sharded and replicated to meet performance and high-availability requirements.
Metadata Cache: We use this to cache metadata, user info, and thumbnails for better read performance.
As YouTube is a heavily loaded service, it has various APIs to perform operations smoothly. We can design APIs for features like video upload, video streaming, video search, adding comments, video recommendations, etc.
We use the uploadVideo API for uploading video content. It returns an HTTP response that indicates whether the video was uploaded successfully or not.
string uploadVideo(string apiKey, stream videoData, string videoTitle, string videoDescription, string videoCategory, string videoTags[], string videoLanguage, string videoLocation)
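In practice, the API server usually doesn’t receive the raw video bytes itself; instead, it generates a video upload URL (as mentioned in the API server description above) that the client uploads to directly. Here is a minimal sketch of that step, assuming an S3-compatible BLOB store accessed via boto3; the bucket name and the skipped auth check are hypothetical.

```python
import uuid
import boto3  # assuming an S3-compatible BLOB store for raw uploads

s3 = boto3.client("s3")
VIDEO_BUCKET = "raw-video-uploads"  # hypothetical bucket name

def generate_upload_url(api_key: str, video_title: str) -> dict:
    """Return a short-lived presigned URL the client can PUT the video to."""
    # validate_api_key(api_key)  # auth check omitted in this sketch
    video_id = uuid.uuid4().hex
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": VIDEO_BUCKET, "Key": f"{video_id}.mp4"},
        ExpiresIn=3600,  # URL valid for one hour
    )
    return {"videoId": video_id, "uploadUrl": upload_url}
```

With this design, large video payloads bypass the API servers entirely, which keeps them free to serve lightweight metadata requests.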
The video upload flow is divided into two processes running in parallel: 1) Uploading the video content and 2) Updating the video metadata.
First, videos are uploaded by the user, and then transcoding servers start the encoding process, which converts a video into the multiple formats and resolutions supported by the platform. To increase throughput, we can parallelize this by spreading the task across several machines, as sketched below. If a video is expected to be popular, we can apply another level of compression to preserve the same visual quality at a much smaller size. Overall, the video is processed by a batch job that runs several automated steps like generating thumbnails, metadata, and video transcripts, encoding, etc.
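To make the parallelism concrete, here is a minimal sketch that fans one source video out to several renditions, assuming ffmpeg is installed on the transcoding server; the resolution ladder and output naming are illustrative, and a real pipeline would distribute these jobs across machines through a job queue rather than local processes.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor

# Target renditions; real platforms use a larger ladder (144p ... 4K).
RESOLUTIONS = {"240p": 240, "480p": 480, "720p": 720, "1080p": 1080}

def transcode(source: str, name: str, height: int) -> str:
    """Encode one rendition with ffmpeg; scale keeps the aspect ratio."""
    output = f"{source.rsplit('.', 1)[0]}_{name}.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", source,
         "-vf", f"scale=-2:{height}",  # -2 = auto width, divisible by 2
         "-c:v", "libx264", "-c:a", "aac", output],
        check=True,
    )
    return output

def transcode_all(source: str):
    # Renditions are independent, so each can run on its own core
    # (or its own machine when dispatched through a queue).
    with ProcessPoolExecutor() as pool:
        jobs = [pool.submit(transcode, source, n, h) for n, h in RESOLUTIONS.items()]
        return [j.result() for j in jobs]
```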
Video encoding is possible in two ways: lossless and lossy. In lossless encoding, no data is lost when converting the original format to a new format. In lossy encoding, some data is dropped from the original video to reduce the size of the new format. We might have experienced this when uploading a high-resolution image to a social network: after the upload, the image doesn’t look as good as the original.
After the encoding process completes, two things happen in parallel: 1) the encoded videos are stored in video storage and the CDN, and 2) the metadata database and cache are updated. Finally, the API servers inform the client that the video has been successfully uploaded and is ready for streaming.
While content is being uploaded to the video storage, the client, in parallel, sends a request to update the video metadata. This data includes title, video URL, thumbnails, user information, resolution, etc.
We use the viewVideo API for streaming video content. It continuously sends small pieces of the video media stream starting from the given offset.
stream viewVideo(string apiKey, string videoId, int videoOffset, string codec, string videoResolution)
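As a rough illustration of this offset-based streaming, the sketch below yields a stored video in small chunks starting at a given byte offset. It is a simplification: a production server would honor HTTP Range requests and reply with 206 Partial Content so players can seek and resume; the chunk size here is an assumption.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk; real players request byte ranges

def view_video(video_path: str, video_offset: int):
    """Yield the video in small chunks starting at the given byte offset."""
    with open(video_path, "rb") as f:
        f.seek(video_offset)  # resume from where the player left off
        while chunk := f.read(CHUNK_SIZE):
            yield chunk

# A web framework would wrap this generator in an HTTP 206 (Partial
# Content) response so the player can seek and resume freely.
```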
Whenever a user sends a request to watch a video, the platform checks the viewer’s device type, screen size, processing capability, and network bandwidth. Based on that, it delivers the best video version from the nearest edge server in real time. The device then loads a little bit of data at a time and continuously receives the video stream from the CDN or video storage. Two observations matter here: serving everything from a CDN is expensive at scale, and a small fraction of popular videos accounts for most of the views.
Based on these observations, we can make cost-effective decisions and implement a few optimizations. For example, we can stream only the most popular videos from the CDN and serve the rest directly from the high-capacity video storage. If a video becomes popular later, it can be moved from video storage to the CDN.
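A minimal sketch of this routing decision might look like the following; the popularity threshold and hostnames are purely illustrative.

```python
POPULARITY_THRESHOLD = 10_000  # hypothetical daily-view cutoff

def resolve_stream_url(video_id: str, daily_views: int) -> str:
    """Serve hot videos from the CDN, the long tail from origin storage."""
    if daily_views >= POPULARITY_THRESHOLD:
        return f"https://cdn.example.com/videos/{video_id}"   # edge cache
    return f"https://storage.example.com/videos/{video_id}"   # origin store
```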
We need a standard streaming protocol to control data transfer during video streaming. Some popular streaming protocols are MPEG-DASH, Apple HLS, Microsoft Smooth Streaming, Adobe HTTP Dynamic Streaming (HDS), etc. Different streaming protocols support different video encodings and playback players, so we need to choose the proper one. Suppose we use the Dynamic Adaptive Streaming over HTTP (MPEG-DASH) protocol for video streaming. It can help us in two ways: 1) the client adapts the video quality to its available bandwidth in real time, and 2) it runs over plain HTTP, so it works with standard web servers and CDNs.
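With MPEG-DASH, the adaptation logic lives in the client: it measures throughput and picks the best representation listed in the manifest. The sketch below illustrates only that selection idea; the bitrate ladder and the 20% headroom factor are assumptions, not values from the DASH spec.

```python
# (height, required bandwidth in kbps) -- an illustrative bitrate ladder
BITRATE_LADDER = [(1080, 8000), (720, 5000), (480, 2500), (360, 1000), (240, 500)]

def pick_representation(measured_kbps: float) -> int:
    """Choose the highest rendition the measured bandwidth can sustain."""
    for height, required_kbps in BITRATE_LADDER:
        if measured_kbps >= required_kbps * 1.2:  # 20% headroom to avoid stalls
            return height
    return BITRATE_LADDER[-1][0]  # fall back to the lowest rendition

print(pick_representation(3200))  # -> 480
```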
We can use MySQL to store metadata like user and video information. For this, we can maintain two separate tables: one table to store user metadata and another table to store video metadata.
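A simplified version of these two tables might look like the sketch below. It uses Python’s built-in sqlite3 so it runs as-is, standing in for MySQL; the exact columns are illustrative.

```python
import sqlite3  # stand-in for MySQL so the sketch is runnable as-is

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    email      TEXT UNIQUE,
    created_at TEXT
);
CREATE TABLE videos (
    video_id    TEXT PRIMARY KEY,      -- id assigned at upload time
    user_id     INTEGER REFERENCES users(user_id),
    title       TEXT NOT NULL,
    video_url   TEXT,                  -- location in BLOB storage / CDN
    thumbnail   TEXT,
    view_count  INTEGER DEFAULT 0,
    like_count  INTEGER DEFAULT 0,
    created_at  TEXT
);
""")
```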
To scale this out, we can use master-slave replication with two kinds of data sources: one to handle write queries and the other to handle read queries. Write requests go to the master first and are then applied to all the slaves. Read requests are routed to the slave replicas in parallel to reduce the load on the master. This helps us increase read throughput.
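A thin routing layer can hide this split from the application. The sketch below assumes connection-like objects exposing an execute method; it is illustrative, not a production-ready connection pool.

```python
import random

class ReplicatedDB:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def execute_write(self, query, params=()):
        # All writes go to the master and replicate asynchronously.
        return self.master.execute(query, params)

    def execute_read(self, query, params=()):
        # Reads are balanced over the replicas to offload the master.
        replica = random.choice(self.slaves)
        return replica.execute(query, params)
```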
Such a design may cause staleness in data read from a replica. How? Suppose we perform a write by adding a new video; its metadata is first updated in the master. If a read request arrives before this new data propagates to the slaves, the slaves will not see it and will return stale data. This inconsistency may create a difference in a video’s view count between the master and a replica. But a slight inconsistency (for a short duration) in the view count can be acceptable.
Since a huge number of new videos are uploaded every day and the read volume is extremely high, the master-slave architecture will suffer from replication lag. On the other side, update operations cause cache misses, which go to disk. Now the critical question is: how can we further improve the performance of read and write operations?
Sharding is one of the ways of scaling a relational database besides replication. In this process, we distribute our data across multiple machines so that we can perform read/write operations efficiently. The idea: instead of a single master handling all write requests, we distribute writes across several sharded machines to increase write performance. We can also create separate replicas of each shard to improve redundancy.
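A common (if simplistic) way to route requests is hashing the key to a shard, as in this sketch; the shard count is arbitrary. Note that static hash sharding makes adding shards painful, since most keys would remap, which is part of why a managed sharding layer is attractive.

```python
import hashlib

NUM_SHARDS = 4  # illustrative; real deployments reshard as data grows

def shard_for(video_id: str) -> int:
    """Map a video id to a shard with a stable hash."""
    digest = hashlib.md5(video_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Writes for different videos land on different machines,
# so write throughput scales with the number of shards.
print(shard_for("a1b2c3"))
```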
Sharding increases system complexity, so we need an abstraction layer to handle the scalability challenges for us. This leads us to Vitess!
Vitess is a database clustering system that runs on top of MySQL. It has several built-in features that allow us to scale horizontally, similar to a NoSQL database.
Vitess Architecture. Source: https://vitess.io/
Here are some important features of Vitess:
Scalability: Its built-in sharding features let us grow the database without adding sharding logic to the application.
Performance: It automatically rewrites queries that hurt database performance. It also uses caching mechanisms to prevent duplicate queries from simultaneously reaching the database.
Manageability: It improves manageability by automatically handling failover and backup functionality.
Sharding management: MySQL doesn’t natively support sharding, but we will need it as our database grows. Vitess enables live resharding with minimal read-only downtime.
For building such a scalable system, we need to use different caching strategies. We can use a distributed cache like Redis or Memcached to store the metadata. To keep the caching service effective, we can use the LRU (Least Recently Used) eviction policy, sketched below. We can also use the CDN as a video content cache; a CDN can fetch media content directly from AWS S3. If the service runs on AWS, it is convenient to use CloudFront as the content cache and ElastiCache for the metadata cache.
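For illustration, here is a minimal LRU cache built on Python’s OrderedDict; a real deployment would rely on the built-in eviction policies of Redis or Memcached rather than hand-rolling this.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU eviction: recently used keys survive, stale ones go."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```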
We need to avoid data loss or service unavailability due to power outages, machine failures, natural disasters like earthquakes, etc. For this, we can back up data in data centers located in different geographical locations. The nearest data center serves a user’s request, which helps fetch data faster and reduces system latency.
YouTube is a very complex and highly scalable service to design. In this blog, we covered only the fundamental concepts necessary for building such video-sharing services. We limited our discussion to the system’s generic and critical requirements, but it could be expanded further by including various other functionalities like personalized recommendations, etc.
Enjoy learning, Enjoy system design!