
100% Uptime Democracy

Simon Thompson · March 25th, 2019

Something which caught my eye recently was a petition on the UK Government Petitions Site, which went viral and saw 4M+ signatures in a few days. However, during this time the site went down repeatedly - quite understandable given the sudden demand (the highest they've ever dealt with, peaking at between 80K and 100K simultaneous users).

This got me thinking: with the rise of Edge Computing, how feasible would it be to run a service like this entirely "on the Edge", and thus mitigate the risk of downtime from traffic spikes and resource limitations? As a bit of weekend fun, and to serve as an example when I'm explaining "the Edge" to friends and colleagues, I've put together a proof of concept.

The Objectives

The actual service has quite a few features - creating and searching petitions, for instance. We won't focus on those for now; instead, we'll look at the parts of the application which would have seen the highest throughput, namely viewing and signing petitions.

Our proof of concept app should achieve the following:

- Let users view the list of open petitions
- Let users sign a petition, with each email address only able to sign a given petition once
- Show a reasonably up-to-date signature count for each petition

We'll also be building it with the following considerations in mind:

- It should stay up under sudden, viral-scale traffic spikes, without manual intervention
- There should be no servers for us to provision, manage or scale
- Running costs should stay low, particularly during quiet periods

Now, before we get into the technical side of things...

What is "The Edge"?

The "Edge" is an iteration on "serverless computing" - a model whereby applications use a cloud provider to manage their infrastructure. Despite the name "serverless", servers are still involved - it's instead referring to the idea that the developer does not self-provision servers / virtual machines for the code to run on.

The "Edge" element comes into play with things like Cloudflare Workers, which allow you to deploy your code and run it on the "Edge" - the "Edge" here being their network of 165+ data centers around the world.

In essence, this means we can deploy code without the need to spin up, manage or scale any servers, which provides an exciting prospect in the face of issues such as the traffic spikes to the Petitions site. If they could run on the Edge, they'd have no need to scale servers during peak load.

Building a proof of concept

As I'm most familiar with their stack through my work on Spark, I chose to use Cloudflare Workers for the computation side of things, paired with Workers KV for persistent storage.

I chose Workers KV as I wanted to build the entire thing on one provider, but in reality it may be more suitable to use an alternate data store (e.g. Google's Cloud Firestore) to support querying / searching - something a key-value store won't excel at.

The application itself is pretty simple; it's written in TypeScript, compiled and bundled using Webpack (courtesy of the process described in this Cloudflare post), and uses request routing logic heavily inspired by workerrouter. Local testing was made possible with cloudflare-worker-local, and I combined this with periodic pushes up to the Workers Playground for testing via the workers-preview module.
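To make the structure concrete, here's a minimal sketch of what the Worker's entry point and routing might look like. The route shapes and the getPetitionList / addSignature helpers are hypothetical illustrations (sketched in the following sections), not the app's actual code, and the FetchEvent typing assumes the Workers runtime types (e.g. @cloudflare/workers-types):

```typescript
// Hypothetical helpers, sketched in the sections below.
declare function getPetitionList(): Promise<unknown[]>
declare function addSignature(petitionId: string, email: string): Promise<boolean>

// Entry point: every request hitting the Edge is handled here.
addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request: Request): Promise<Response> {
  const url = new URL(request.url)

  // GET /petitions -> return the petition list as JSON
  if (request.method === 'GET' && url.pathname === '/petitions') {
    return new Response(JSON.stringify(await getPetitionList()), {
      headers: { 'Content-Type': 'application/json' },
    })
  }

  // POST /petitions/:id/signatures -> record a signature
  const match = url.pathname.match(/^\/petitions\/([\w-]+)\/signatures$/)
  if (request.method === 'POST' && match) {
    const { email } = (await request.json()) as { email: string }
    const added = await addSignature(match[1], email)
    return new Response(added ? 'Signed' : 'Already signed', {
      status: added ? 201 : 409,
    })
  }

  return new Response('Not found', { status: 404 })
}
```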

The list of available petitions is stored as JSON in the KV store and is read frequently by the app. Whilst Workers KV allows "Unlimited" reads per-second per-key, the app optimises this further by caching the result in memory, where it can persist between requests during high-traffic periods (as described in the documentation here), improving performance.
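As a rough illustration, the in-memory cache can be as simple as a module-scoped variable that outlives individual requests on a given Worker instance. The PETITIONS binding name, the Petition shape and the 60-second TTL below are assumptions for the sketch, not values from the real app:

```typescript
// Assumed KV namespace binding (configured alongside the Worker).
declare const PETITIONS: { get(key: string): Promise<string | null> }

interface Petition {
  id: string
  title: string
}

// Module scope persists between requests handled by the same instance,
// so busy instances serve most reads without touching KV at all.
let cachedList: Petition[] | null = null
let cachedAt = 0
const CACHE_TTL_MS = 60 * 1000 // hypothetical freshness window

async function getPetitionList(): Promise<Petition[]> {
  const now = Date.now()
  if (cachedList && now - cachedAt < CACHE_TTL_MS) {
    return cachedList // cache hit: no KV read needed
  }
  const raw = await PETITIONS.get('petitions')
  cachedList = raw ? (JSON.parse(raw) as Petition[]) : []
  cachedAt = now
  return cachedList
}
```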

Petition signatures are stored in the KV store too, with a key of the format signature-{PETITIONID}-{HASHOFEMAIL}, which allows quick checks to see whether an email has already been registered against a petition.
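A sketch of that duplicate check, assuming a hypothetical SIGNATURES KV binding and using the Web Crypto API (available in Workers) to hash the email:

```typescript
// Assumed KV namespace binding for signatures.
declare const SIGNATURES: {
  get(key: string): Promise<string | null>
  put(key: string, value: string): Promise<void>
}

// SHA-256 the (normalised) email so raw addresses are never stored as keys.
async function hashEmail(email: string): Promise<string> {
  const data = new TextEncoder().encode(email.trim().toLowerCase())
  const digest = await crypto.subtle.digest('SHA-256', data)
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('')
}

// Returns false if this email has already signed the petition.
async function addSignature(petitionId: string, email: string): Promise<boolean> {
  const key = `signature-${petitionId}-${await hashEmail(email)}`
  if ((await SIGNATURES.get(key)) !== null) {
    return false // already signed
  }
  await SIGNATURES.put(key, JSON.stringify({ signedAt: Date.now() }))
  return true
}
```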

One challenge was showing a count of signatures for each petition. With an estimated throughput of 33 signatures per second, we'd far exceed the one-write-per-second-per-key limit imposed by Workers KV if we tried to maintain some kind of counter in the main petitions list. Instead, when a signature is added, the app checks whether the petition has had its signatures counted within the past 30s and - if not - counts the stored values with a key beginning with signature-{PETITIONID} which have been added since the last count.
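Here's a simplified sketch of that throttle. It recounts every key under the prefix rather than just those added since the last count, and it assumes a hypothetical COUNTS binding for the counter state plus a listKeysWithPrefix() helper (see the "Additional Notes" section for how key listing currently has to work):

```typescript
// Assumed KV binding holding each petition's last count and timestamp.
declare const COUNTS: {
  get(key: string): Promise<string | null>
  put(key: string, value: string): Promise<void>
}
// Assumed helper wrapping the REST "list keys" endpoint (sketched later).
declare function listKeysWithPrefix(prefix: string): Promise<string[]>

const RECOUNT_INTERVAL_MS = 30 * 1000

interface CountState {
  total: number
  countedAt: number
}

// Called whenever a signature is added; recounts at most every 30 seconds,
// keeping us comfortably under KV's one-write-per-second-per-key limit.
async function maybeRecount(petitionId: string): Promise<number> {
  const raw = await COUNTS.get(`count-${petitionId}`)
  const state: CountState = raw ? JSON.parse(raw) : { total: 0, countedAt: 0 }

  if (Date.now() - state.countedAt < RECOUNT_INTERVAL_MS) {
    return state.total // counted recently enough; reuse the stored total
  }

  const keys = await listKeysWithPrefix(`signature-${petitionId}`)
  const next: CountState = { total: keys.length, countedAt: Date.now() }
  await COUNTS.put(`count-${petitionId}`, JSON.stringify(next))
  return next.total
}
```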

Overall, once deployed to Cloudflare's Edge, we're left with a fast (~150ms load time) app which can handle enormous amounts of traffic without us needing to intervene or manage any servers. Whether we're serving 10 requests or 10 million, our code will continue to run exactly the same - without any resource-related downtime.

You can view the source code for this project on GitHub.

Running the service

Another great thing about serverless (and Cloudflare Workers in particular) is that you typically only pay for what you use. In our case we'd be paying per request: Workers includes the first 10M requests in a $5 baseline cost, and charges $0.50 per million requests thereafter.

I haven't seen any stats around the exact traffic to the official Petitions site, but to illustrate how low the costs are here, I've estimated some rough figures:

- A sudden spike of 10 million signatures
- Roughly three HTTP requests per signature (viewing the petition, submitting the signature, and a confirmation), for a total of ~30 million requests
- $5 covering the first 10M requests, plus 20M × $0.50 per million = $10, giving a total of $15

Although these are very rough numbers, they show that an Edge-based app like this could absorb a sudden spike of 10M signatures, serving 30 million HTTP requests, for only $15. During non-peak periods we'd likely drop back to the baseline cost of $5/month, so we're not wasting any money on servers which are doing nothing.

This could potentially be optimised even further via the use of Cloudflare's additional features; for instance page-level caching for GET requests, or custom firewall rules to block bad actors.


Summary

Overall, this was a fun weekend project to try out some new tech, and hopefully demonstrate some of the benefits of serverless.

Whilst I understand that there are regulations and such which might prohibit an implementation like this in practice, it hopefully shows that applications can be designed to successfully run "on the Edge" and weather the storm of going viral.


Additional Notes

On the off-chance that somebody from Cloudflare does wind up reading this post, there are a few things which I feel would make building with Workers even better.

Expose KV NAMESPACE.list() in the Worker API

Part of this proof of concept uses the /:namespace_id/keys API endpoint; however, this doesn't seem to be exposed through the "in Worker" API, which meant I needed to call the REST API directly. I'm guessing there might be performance reasons for this, but if not, it'd be super handy.
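For illustration, here's roughly what that direct call looks like, implementing the listKeysWithPrefix() helper assumed in the counting sketch earlier. ACCOUNT_ID, NAMESPACE_ID, AUTH_EMAIL and AUTH_KEY are placeholders for the credentials the worker currently has to embed:

```typescript
declare const ACCOUNT_ID: string
declare const NAMESPACE_ID: string
declare const AUTH_EMAIL: string
declare const AUTH_KEY: string

// Lists KV keys matching a prefix via the REST API, since list() isn't
// available inside the Worker. Pagination via the returned cursor is
// elided for brevity.
async function listKeysWithPrefix(prefix: string): Promise<string[]> {
  const url =
    `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}` +
    `/storage/kv/namespaces/${NAMESPACE_ID}/keys` +
    `?prefix=${encodeURIComponent(prefix)}&limit=1000`

  const response = await fetch(url, {
    headers: { 'X-Auth-Email': AUTH_EMAIL, 'X-Auth-Key': AUTH_KEY },
  })
  const body = (await response.json()) as { result: { name: string }[] }
  return body.result.map(key => key.name)
}
```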

Support environment variables for configuration

I don't believe Workers currently supports any kind of environment variables (besides storing that config within a KV store itself, I guess). This meant I needed to hard-code the API credentials directly within the worker, which doesn't feel ideal (especially when combined with wanting to push up to the Workers Preview site, which is essentially public).

Allow scoped API tokens

This is something which people have already raised over on the forums, so it might already be in development, but scoped API credentials would be a great improvement over the current account-wide API keys.