Unlike the cache-aside pattern, which requires application logic to fetch data and populate the cache, a read-through cache retrieves data from the database on behalf of the application. The application interacts only with the cache, which acts as an intermediary between the application and the database.
One of the critical benefits of the read-through pattern is simpler application code. It offloads the responsibility for handling cache misses and fetching data to the cache layer, so the application interacts with the cache as if it were the primary data source.
How does the read-through pattern work?
When an application requests data, the cache first checks whether the data is present. If it is (cache hit), the cached data is returned directly to the application. If it is not (cache miss), the cache retrieves the data from the database, stores it, and returns it to the application as the result of the original read request. Subsequent read requests for the same data can then be served from the cache with low latency.
For writes, a read-through cache can be combined with one of these strategies, depending on the requirements: write-around, write-through, or write-back (also called write-behind). The best choice depends on the tradeoff between data consistency and write performance. Explore and think!
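The hit and miss flow described above can be sketched in a few lines of Python. This is a minimal, single-process illustration, not a production cache: `ReadThroughCache`, `load_from_db`, and `FAKE_DB` are hypothetical names, and a plain dictionary stands in for both the cache storage and the database.

```python
from typing import Callable, Dict, Generic, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Cache that loads missing keys from a backing store by itself."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader          # hypothetical function that reads the database
        self._store: Dict[K, V] = {}   # in-memory stand-in for real cache storage
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> V:
        if key in self._store:         # cache hit: return the cached value
            self.hits += 1
            return self._store[key]
        self.misses += 1               # cache miss: the cache itself loads the data
        value = self._loader(key)
        self._store[key] = value       # keep it for subsequent reads
        return value


# Hypothetical "database": a dict standing in for a real data source.
FAKE_DB = {"user:1": "Alice", "user:2": "Bob"}
db_reads: List[str] = []               # records every read that reaches the database


def load_from_db(key: str) -> str:
    db_reads.append(key)
    return FAKE_DB[key]


cache = ReadThroughCache(load_from_db)
first = cache.get("user:1")    # miss: loaded from FAKE_DB, then cached
second = cache.get("user:1")   # hit: served from the cache, no database read
```

Note that the calling code only ever calls `cache.get`; it never touches the database or decides when to populate the cache.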
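To make the three write strategies concrete, here is a rough sketch of how each could sit on top of a read-through cache. Everything in it is an assumption made for illustration: the class and method names are hypothetical, dictionaries stand in for the cache and the database, and the write-around variant also invalidates the cached entry, which is a common addition rather than part of the strategy's definition.

```python
from typing import Any, Dict


class ReadThroughCache:
    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db                       # stand-in for the database
        self.store: Dict[str, Any] = {}    # stand-in for cache storage
        self.dirty: Dict[str, Any] = {}    # writes not yet persisted (write-back only)

    def get(self, key: str) -> Any:
        if key not in self.store:          # miss: load from the database
            self.store[key] = self.db[key]
        return self.store[key]

    def write_through(self, key: str, value: Any) -> None:
        # Update database and cache together: consistent reads, slower writes.
        self.db[key] = value
        self.store[key] = value
        self.dirty.pop(key, None)

    def write_around(self, key: str, value: Any) -> None:
        # Write only to the database; the next read reloads the value lazily.
        self.db[key] = value
        self.store.pop(key, None)          # assumption: drop the stale cached entry
        self.dirty.pop(key, None)

    def write_back(self, key: str, value: Any) -> None:
        # Write only to the cache now; persist later: fast writes, risk of loss.
        self.store[key] = value
        self.dirty[key] = value

    def flush(self) -> None:
        # Persist all pending write-back entries to the database.
        self.db.update(self.dirty)
        self.dirty.clear()
```

With write-back, the database lags behind the cache until `flush` runs, which is exactly the consistency versus write performance tradeoff mentioned above.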
Advantages or benefits of Read-through cache
- A read-through cache significantly reduces read latency for frequently accessed data, especially when the original data source is remote (for example, in a geographically distant data centre). That is why it is used in systems where the data access pattern is read-heavy.
- Combined with the write-through pattern, a read-through cache provides an automatic way to keep the cache consistent with the database. Combined with the write-around pattern, it does not need to know in advance which data might be requested in the future. Instead, it caches data as read requests come in (lazy loading). This lets the system load data into the cache on demand and avoids filling the cache with data that is never requested.
- Since read-through caching serves data from the cache whenever possible, it reduces the number of direct requests to the underlying data source.
- We can configure the read-through cache to automatically reload data from the database when an entry expires or when the corresponding data changes in the database. Under high traffic, this lowers the chance of cache misses for these items.
- It simplifies application code and abstracts away the complexity of cache management and data retrieval. So, developers don’t need to implement complex logic to handle caching manually.
- If a cache node fails, we can replace it with a new, empty node and the application continues to function. Latency will be higher at first because the initial read queries are likely to be cache misses. One solution to this delay is to warm up the cache by preloading data that is expected to be requested frequently in the near future.
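The warm-up idea from the last point can be sketched as follows. This is only an illustration under stated assumptions: `ReadThroughCache`, `warm_up`, `load_product`, and the list of "expected hot" keys are all hypothetical, and in a real system the key list would come from access statistics or domain knowledge.

```python
from typing import Any, Callable, Dict, Iterable, List


class ReadThroughCache:
    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self._store:              # miss: load and remember
            self._store[key] = self._loader(key)
        return self._store[key]

    def warm_up(self, keys: Iterable[str]) -> None:
        # Preload keys expected to be hot so early reads are cache hits.
        for key in keys:
            self.get(key)


# Hypothetical data source and read log.
PRODUCTS = {"p1": "Keyboard", "p2": "Mouse", "p3": "Monitor"}
db_reads: List[str] = []


def load_product(key: str) -> str:
    db_reads.append(key)
    return PRODUCTS[key]


# A fresh, empty cache node that replaces a failed one.
replacement = ReadThroughCache(load_product)
replacement.warm_up(["p1", "p2"])               # assumed list of hot keys
reads_after_warm_up = len(db_reads)

name = replacement.get("p1")                    # served from the warmed cache
```

After warm-up, reads for the preloaded keys no longer reach the database; only keys outside the hot list (here `p3`) still pay the miss penalty.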
Limitations or drawbacks of Read-through cache
- A read always results in a cache miss when the data is requested for the first time or after its TTL has expired. In these cases the request goes to the database, which adds latency because three steps are needed: 1) check the cache, 2) retrieve the data from the database, and 3) store the data in the cache.
- Combined with write-through caching, a read-through cache can end up holding data that is written but rarely or never read again. This consumes valuable cache space that could have been used for more relevant, frequently accessed data.
- Just like with cache-aside, data can become inconsistent between the cache and the database. For example, if data changes in the database and the cached entry has not expired yet, the cache can return stale data to the application. One solution is to use a suitable write strategy, such as write-through, together with an appropriate expiration and eviction policy.
- As discussed above, we can automatically reload data into the cache from the database when it expires. The problem is that many cache entries expiring at the same time can trigger a burst of database requests (cache stampede), which causes a spike in load on the database. What is the solution? Think and explore!
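As a starting point for the stampede question, here is a sketch of two commonly used mitigations; they are not the only options. The first adds random jitter to TTLs so entries created together do not all expire together. The second lets only one caller per key load from the database while concurrent callers wait and reuse the result. All names (`jittered_ttl`, `SingleFlightCache`, `slow_loader`) are hypothetical, and the cache has no expiry or eviction, to keep the example short.

```python
import random
import threading
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    # Spread expirations over +/- `spread` of the base TTL.
    return base_seconds * random.uniform(1 - spread, 1 + spread)


class SingleFlightCache:
    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._store: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()    # protects the lock table itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[key]

    def get(self, key: str) -> Any:
        if key in self._store:                  # fast path: cache hit
            return self._store[key]
        with self._lock_for(key):               # only one loader per key at a time
            if key in self._store:              # loaded while we were waiting
                return self._store[key]
            value = self._loader(key)
            self._store[key] = value
            return value


loads: List[str] = []                           # records every database load


def slow_loader(key: str) -> str:
    loads.append(key)
    time.sleep(0.05)                            # simulate a slow database query
    return key.upper()


cache = SingleFlightCache(slow_loader)
results: List[str] = []
threads = [threading.Thread(target=lambda: results.append(cache.get("hot")))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent requests for the same missing key lead to a single database load here; without the per-key lock, each of them could have queried the database.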
Read-Through versus Cache-Aside Pattern
While read-through and cache-aside caches look very similar, there are some major differences:
- Read-through caching places the caching logic within the cache system itself, which reduces the burden on the application. Cache-aside places the caching logic within the application: the application decides when to cache data, when to evict data, and how to handle cache misses. This gives the application more control.
- With a read-through cache, the application interacts with the cache as if it were the primary data source. The application is unaware of whether data is already cached, because the cache manages this internally. With cache-aside, the application must explicitly check the cache before accessing the data source.
- In the cache-aside pattern, if the application does not handle cache updates correctly, data inconsistencies are possible. It is the application developer's responsibility to write correct logic for invalidating or updating cached data. A read-through cache, on the other hand, can often be configured to update or invalidate the corresponding cached entry automatically when data changes in the data source.
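The difference in where the caching logic lives is easiest to see side by side. The sketch below is illustrative only: `DB`, `get_price_cache_aside`, `get_price_read_through`, and `ReadThroughCache` are hypothetical names, and dictionaries stand in for the cache and the data source.

```python
from typing import Any, Callable, Dict

DB: Dict[str, float] = {"sku:1": 9.99}          # stand-in for the data source

# Cache-aside: the application checks, loads, and populates the cache itself.
aside_cache: Dict[str, float] = {}


def get_price_cache_aside(key: str) -> float:
    if key in aside_cache:                      # application checks the cache
        return aside_cache[key]
    value = DB[key]                             # application queries the database
    aside_cache[key] = value                    # application populates the cache
    return value


# Read-through: the same logic lives inside the cache component.
class ReadThroughCache:
    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self._store:
            self._store[key] = self._loader(key)
        return self._store[key]


prices = ReadThroughCache(lambda key: DB[key])


def get_price_read_through(key: str) -> float:
    return prices.get(key)                      # application only talks to the cache
```

Both functions return the same result; what changes is who owns the miss handling, and therefore who has to get invalidation and updates right.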
Some critical ideas to explore!
- Which write strategy works best with read-through caching: write-around, write-through, or write-back? What are the key tradeoffs, pros, and cons of each combination?
- How can we use the read-through pattern in a distributed caching environment? What are the key challenges and advantages?
- What are the techniques to handle the issue of cache stampede in read-through caching?
- How to determine the appropriate TTL to balance performance and data consistency?
- Best practices for configuring cache size and eviction policies in read-through caching.
- How does read-through caching handle situations where the data source experiences performance issues?
Please write in the message below if you want to share some feedback or if you want to share more insight. Enjoy learning, Enjoy system design!