What is caching and why is it important for website speed?
What is a cache?
In computer science, caching refers to storing data in a highly available, easily accessible resource in order to reduce the time needed to retrieve that data. The importance of caching goes back to the mid-1960s, when computer processors could execute instructions faster than those instructions could be retrieved from memory, which meant the processor would sit idle while it waited.
This problem was addressed by Sir Maurice Vincent Wilkes, who described a system of memory that is readily available to the processor. Although this solution was originally designed for low-level systems (hardware), similar principles were later proposed to improve the performance of applications on the internet, such as websites.
Before we dive into how caching mechanisms help websites perform better, let’s briefly discuss how a website gets served to a user.
How does a website work?
When a user types a URL or website address into the browser, the browser takes (at minimum) the following actions.
1. It looks up the destination Internet Protocol (IP) address using the Domain Name System (DNS). A DNS server basically takes a website address as input and returns the IP address of the server where the website lives.
2. The browser then sends an HTTP or HTTP/2 request to the server.
This request travels over the lower network layers, but let’s abstract away the details of that part.
3. The server sends back an HTTP or HTTP/2 response.
4. The browser starts to render the response top to bottom, left to right, and fires further HTTP or HTTP/2 requests as needed, e.g. for the assets of the webpage.
If one thinks about it, each of these steps is trying to retrieve some sort of data from one resource or another. The resource that is responsible for delivering the data, for instance a DNS server, needs a certain amount of time to find the answer to the request. On top of this, there is always the added time for delivering that answer.
Therefore, the efficiency problem these steps highlight is directly analogous to that of a processor waiting to retrieve instructions from memory. This is the underlying reason the same caching mechanism works well for improving the performance of web applications.
Different types of cache
DNS Caching
This is used at the time of the DNS lookup.
The browser (or operating system) caches the IP address of the destination site, so it does not have to ask a DNS server again for every request.
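To make the idea concrete, here is a minimal sketch of a DNS-style cache. It is not how any particular browser or operating system implements it: the class name `DnsCache`, the fixed `ttl_seconds` lifetime and the injectable `resolver` and `clock` functions are all assumptions made for illustration. Real DNS answers carry their own time-to-live values.

```python
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Tiny in-memory cache mapping host names to IP addresses (illustrative only)."""

    def __init__(
        self,
        resolver: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolver = resolver  # function that performs the real (slow) lookup
        self._ttl = ttl_seconds    # assumed fixed lifetime for every answer
        self._clock = clock        # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[str, float]] = {}  # host -> (ip, expiry time)

    def lookup(self, host: str) -> str:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[1] > now:
            return entry[0]        # still fresh: no DNS query needed
        ip = self._resolver(host)  # missing or expired: ask the resolver
        self._entries[host] = (ip, now + self._ttl)
        return ip
```

In practice the resolver could be something like Python's `socket.gethostbyname`; the cache then only calls it when an entry is missing or has expired.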
Server-side caching
Each time a server receives a request for a web page, it has to perform some sort of computation. This computation can be retrieving some data from the database, e.g. the product someone ordered, or it can be translating high-level code (e.g. PHP) into a lower-level form that the machine can execute.
To reduce the time these tasks take, programmers often use a cache to avoid redoing frequent computations. For instance, if a piece of code (e.g. a PHP script) computes the sum of two integers, there is no point in translating it again for each request. The compiled form of the script is therefore stored in a cache (called OPcache in the case of PHP) so that it does not need to be interpreted again.
Similarly, database engines try to cache the indexes of frequently retrieved content so they can respond more quickly.
Browser based caching
Ever been asked by a developer to “Clear the cache to see the latest changes”? If so, chances are that the developer is asking you to clear your browser cache so you can see the updated code. Since browsers have to do the heavy lifting of fetching and rendering time- and compute-intensive resources, they tend to cache these resources so that the response time is quicker. These resources include, but are not limited to, stylesheets, fonts, images, and JavaScript libraries.
So what’s the cache (catch)?
Although caching is great, unfortunately we cannot cache everything for the following reasons:
1. Sometimes a web application needs to show dynamic data, e.g. a weather report or the value of a stock.
2. Third-party libraries: if an application uses someone else’s code and a security update to that code is released, a cached copy would not be updated, which could lead to serious issues.
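One common way to soften the second problem is to put a fingerprint of the file's content into its name, so that an updated library gets a new URL and stale cached copies are simply never requested again. The sketch below is only an illustration of that idea: the function name `fingerprinted_name` and the 8-character hash length are assumptions, not a convention of any specific build tool.

```python
import hashlib


def fingerprinted_name(filename: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]  # assumed length: 8 hex characters
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: append the hash at the end
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"
```

With this approach `lib.js` is served under a name of the form `lib.<hash>.js`. The name stays stable while the content is unchanged, so it can be cached for a long time, and it changes as soon as the content does.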
Not caching everything, however, adds further problems to the mix.
Cache hits and misses!
Cache hit and miss ratios
A cache hit means that a user or a system attempted to access some content or resource, that content was available in the cache, and it was immediately returned to the requester. In contrast, a “cache miss” occurs when some content is requested but the cache does not contain the relevant data, so the request has to be directed to the original source of the data.
The cache hit ratio is simply the number of times content was found in the cache divided by the total number of times data was requested.
An efficient system attempts to increase the cache hit ratio and decrease the cache miss ratio.
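As a rough illustration of how these numbers come about, the sketch below wraps a slow `load` function in a cache that counts its own hits and misses. The class `CountingCache` and its method names are made up for this example.

```python
from typing import Any, Callable, Dict, Hashable


class CountingCache:
    """A dictionary-backed cache that records hits and misses (illustrative only)."""

    def __init__(self, load: Callable[[Hashable], Any]) -> None:
        self._load = load  # function that fetches the value from the original source
        self._store: Dict[Hashable, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._store:
            self.hits += 1          # cache hit: answer straight from the cache
            return self._store[key]
        self.misses += 1            # cache miss: go to the original source
        value = self._load(key)
        self._store[key] = value
        return value

    def hit_ratio(self) -> float:
        """Hits divided by total requests (0.0 if nothing was requested yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

For example, requesting the keys a, b, a, a in that order produces two misses (the first a and b) and two hits, giving a hit ratio of 2 / 4 = 0.5.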
Cache control policies and search engine optimisation (SEO)
Since browsers and content delivery networks such as Cloudflare would otherwise cache resources according to their own defaults, there is a need to tell these systems what should be cached and for how long. Although different frameworks deal with cache policies differently (e.g. .NET has its own set of tools for this), the common denominator between all of these frameworks is HTTP. HTTP uses request and response directives in the Cache-Control header to control cache policies.
For example, if a component on a website needs to be up to date every time someone adds or updates a post, the response for it can carry a directive such as “Cache-Control: must-revalidate”, which tells caches that once their stored copy is stale they must check with the server before reusing it.
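To show how a cache might act on such directives, here is a simplified sketch that parses a Cache-Control header and decides whether a stored copy can be reused without contacting the server. The function names are made up for this example, it only looks at `no-store`, `no-cache` and `max-age`, and it is deliberately more conservative than a real HTTP cache, which follows the full rules of the HTTP caching specification.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}.

    Simplification: quoted arguments that contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def can_reuse_without_asking(header: str, age_seconds: int) -> bool:
    """Return True if a stored response of the given age is still fresh."""
    directives = parse_cache_control(header)
    if "no-store" in directives or "no-cache" in directives:
        return False  # never reuse without going back to the server
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # no explicit lifetime: this sketch stays conservative
    return age_seconds < int(max_age)
```

With “max-age=60, must-revalidate”, a copy that is 10 seconds old can be reused, while one that is 120 seconds old is stale and has to be checked with the server first.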
Since search engines crawl and retrieve information constantly, one of the things they recommend is using good HTTP cache policies for static content. Good cache policies not only provide a better user experience but can also be a key element in aiding the SEO of your website.
What actions should I take if I own a website?
If you are using a WordPress site with a managed hosting service such as WP Engine, chances are that you are already using some sort of caching policy. For instance, WP Engine uses multiple caches and employs a policy which prevents core functionality from being cached. If you are not using a managed hosting service but are using WordPress as your platform, there are plenty of plugins which can help raise the bar to some extent, e.g. WP Super Cache.
As we discussed earlier, most frameworks come with native caching mechanisms. It is best to use those mechanisms to optimize both the performance of your website and its visibility to search engines.
In conclusion, caching can introduce some paradoxical situations; however, it also presents many opportunities when it comes to user experience and SEO.