
100% Uptime Democracy – Replicating the HoC Petition site on the Edge

A proof of concept petitions app living entirely within the Cloudflare Workers ecosystem.

Something which caught my eye recently was a petition on the UK Government Petitions Site, which went viral and saw 4M+ signatures in a few days. However, during this time the site went down repeatedly – quite understandable given the sudden high demand (the highest they’ve ever dealt with, peaking at 80K–100K simultaneous users).

This got me thinking: with the rise of “Edge Computing”, how feasible would it be to run a service like this entirely “on the Edge”, and thus mitigate the risk of downtime due to traffic spikes and resource limitations? As a bit of weekend fun, and to serve as an example when I’m explaining “the Edge” to friends and colleagues, I’ve put together a proof of concept.

The objectives

The actual service itself has quite a few features; creating and searching petitions, for instance. We won’t focus on these for now, and instead we’ll look at the parts of the application which would have been seeing the highest throughput; namely viewing and signing petitions.

Our proof of concept app should achieve the following;

  • List available petitions
  • Allow viewing of a petition page, with an up-to-date signature count
  • Allow signing of a petition, with one submission per unique email

We’ll also be building it with the following considerations in mind;

  • The petition in question has 4M+ signatures. We’ll need to support this amount of data (ideally more, of course!).
  • The petition saw a rate of ~2K signatures per minute, which is roughly 33 every second. We’ll need to support this many writes-per-second.

Now, before we get into the technical side of things…

What is “The Edge”?

The “Edge” is an iteration on “serverless computing” – a model whereby applications use a cloud provider to manage their infrastructure. Despite the name “serverless”, servers are still involved – it’s instead referring to the idea that the developer does not self-provision servers / virtual machines for the code to run on.

The “Edge” element comes into play with things like Cloudflare Workers, which allow you to deploy your code and run it on the “Edge” – the “Edge” here being their network of 165+ data centers around the world.

In essence, this means we can deploy code without the need to spin up, manage or scale any servers, which provides an exciting prospect in the face of issues such as the traffic spikes to the Petitions site. If they could run on the Edge, they’d have no need to scale servers during peak load.

Building a proof of concept

As I’m most familiar with their stack through my work on Spark, I chose to use Cloudflare Workers for the computation side of things, paired with Workers KV for persistent storage.

I chose Workers KV as I wanted to try and build the entire thing on one provider, but in reality it might be more suitable to use an alternate data store (e.g. Google’s Cloud Firestore) to support querying / searching – something a key-value store won’t excel at.

The application itself is pretty simple; it’s written in TypeScript, compiled and bundled using Webpack (courtesy of the process described in this Cloudflare post), and uses request routing logic heavily inspired by workerrouter. Local testing was made possible with cloudflare-worker-local, and I combined this with periodic pushes up to the Workers Playground for testing via the workers-preview module.
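To give a flavour of the structure, here’s a minimal, hypothetical sketch of a Worker entry point with naive path-based routing – the real app’s routes, handlers and router differ, and the placeholder responses below are mine rather than anything taken from workerrouter:

```ts
// Hypothetical sketch only – route paths and placeholder responses are illustrative.
// (Request / Response / FetchEvent types come from the Workers runtime type definitions.)
addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request: Request): Promise<Response> {
  const { pathname } = new URL(request.url);

  // GET  /petitions      -> list available petitions
  // GET  /petitions/:id  -> view a petition (with its signature count)
  // POST /petitions/:id  -> sign a petition
  if (request.method === 'GET' && pathname === '/petitions') {
    return new Response(JSON.stringify([]), {
      headers: { 'Content-Type': 'application/json' },
    });
  }

  return new Response('Not found', { status: 404 });
}
```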

The list of available petitions is stored as JSON in the KV store, and is read frequently by the app. Whilst Workers KV does allow “unlimited” reads per-second per-key, performance is improved further by caching the result in memory so that it can persist between requests during high-traffic periods, as described in the Workers KV documentation.
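As a rough sketch of that caching (the PETITIONS binding name, the petitions key and the cache lifetime are all assumptions of mine, not details from the real app):

```ts
// Module-scope variables survive between requests while a Worker instance stays warm,
// so the KV read only happens when the cache is cold or stale.
declare const PETITIONS: { get(key: string): Promise<string | null> }; // assumed KV binding

let cachedList: string | null = null;
let cachedAt = 0;
const CACHE_TTL_MS = 60 * 1000; // assumed lifetime – re-read from KV at most once a minute

async function getPetitionList(): Promise<string> {
  const now = Date.now();
  if (cachedList === null || now - cachedAt > CACHE_TTL_MS) {
    cachedList = (await PETITIONS.get('petitions')) || '[]';
    cachedAt = now;
  }
  return cachedList;
}
```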

Petition signatures are stored in the KV store too, with a key of the format signature-{PETITIONID}-{HASHOFEMAIL}. This allows quick checks to see whether an email has already been registered against a petition.
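A sketch of that duplicate check might look like the following – the SIGNATURES binding name and the use of SHA-256 for the email hash are my assumptions, while the key format is the one described above:

```ts
// Assumed KV binding holding one entry per (petition, email hash) pair.
declare const SIGNATURES: {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
};

// Hash the (normalised) email using the Web Crypto API available in Workers.
async function hashEmail(email: string): Promise<string> {
  const data = new TextEncoder().encode(email.trim().toLowerCase());
  const digest = await crypto.subtle.digest('SHA-256', data);
  return [...new Uint8Array(digest)].map(b => b.toString(16).padStart(2, '0')).join('');
}

async function addSignature(petitionId: string, email: string): Promise<boolean> {
  const key = `signature-${petitionId}-${await hashEmail(email)}`;
  if ((await SIGNATURES.get(key)) !== null) {
    return false; // this email has already signed this petition
  }
  await SIGNATURES.put(key, new Date().toISOString());
  return true;
}
```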

One challenge was around showing a count of signatures for petitions. With an estimated throughput of 33 signatures per second, we’d far exceed the one-write-per-second-per-key limit imposed by Workers KV if we were to try and maintain some kind of counter in the main petitions list. As such, when a signature is added, the app checks whether the petition has had its signatures counted within the past 30s and – if not – counts the number of stored values with a key beginning with signature-{PETITIONID} which have been added since the last count.
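Sketching that throttle (the COUNTS binding, the account / namespace IDs and the credentials below are all placeholders of mine – the key listing goes via the REST API endpoint mentioned in the notes at the end of this post, and for simplicity this sketch recounts from scratch rather than only counting new keys):

```ts
// Assumed KV binding storing { count, countedAt } per petition.
declare const COUNTS: {
  get(key: string, type: 'json'): Promise<{ count: number; countedAt: number } | null>;
  put(key: string, value: string): Promise<void>;
};

const RECOUNT_INTERVAL_MS = 30 * 1000;

async function getSignatureCount(petitionId: string): Promise<number> {
  const existing = await COUNTS.get(`count-${petitionId}`, 'json');
  if (existing && Date.now() - existing.countedAt < RECOUNT_INTERVAL_MS) {
    return existing.count; // counted within the last 30s – reuse it
  }

  // Page through every key with the signature-{PETITIONID} prefix via the REST API.
  let count = 0;
  let cursor = '';
  do {
    const url =
      'https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/storage/kv/namespaces/NAMESPACE_ID/keys' +
      `?prefix=signature-${petitionId}&cursor=${cursor}`;
    const res = await fetch(url, {
      headers: { 'X-Auth-Email': 'EMAIL', 'X-Auth-Key': 'API_KEY' }, // placeholder credentials
    });
    const body = await res.json();
    count += body.result.length;
    cursor = (body.result_info && body.result_info.cursor) || '';
  } while (cursor);

  await COUNTS.put(`count-${petitionId}`, JSON.stringify({ count, countedAt: Date.now() }));
  return count;
}
```

Because the stored count is rewritten at most once every 30 seconds per petition, it stays comfortably within the one-write-per-second-per-key limit.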

Overall, once deployed to Cloudflare’s Edge, we’re left with a fast (~150ms load time) app, which can handle insane amounts of traffic without us needing to intervene or manage any servers. Whether we’re serving 10 requests or 10 million, our code will continue to run exactly the same – without any resource-related downtime.

Running the service

Another great thing about “serverless” (and Cloudflare Workers in particular) is that you typically only pay for what you use. In our case, we’d be paying per request: Workers includes the first 10M requests in a $5 baseline cost, with each additional million requests costing $0.50 thereafter.

I haven’t seen any stats around the exact traffic to the official Petitions site, but to illustrate how low the costs are here I’ve estimated some figures below.

Although these are very rough numbers, what we’re able to see is that an Edge-based app like this could manage a sudden spike of 10M signatures, serving 30 million HTTP requests, for only $15 – the $5 baseline covers the first 10M requests, and the remaining 20M cost $0.50 per million ($10). During non-peak periods we’d likely drop back down to the baseline cost of $5/month, so we’re not wasting any money on servers which are doing nothing.

This could potentially be optimised even further via the use of Cloudflare’s additional features; for instance page-level caching for GET requests, or custom firewall rules to block bad actors.


Summary

Overall, this was a fun weekend project to try out some new tech, and hopefully demonstrate some of the benefits of serverless.

Whilst I understand that there are regulations and such which might prohibit this example from being implemented in practice, this hopefully shows that applications can be designed to successfully run “on the Edge” to help them weather the storm of going viral.


Additional Notes

On the off-chance that somebody from Cloudflare does wind up reading this post, there are a few things which I feel would make building with Workers / Workers KV even better;

Expose KV NAMESPACE.list() in the Worker API
Part of this proof-of-concept uses the /:namespace_id/keys API endpoint; however, this doesn’t seem to be exposed through the “in Worker” API, which meant I needed to call the REST API directly. I’m guessing there might be performance reasons for this, but if not then it’d be super handy.

Support environment variables for configuration
I don’t believe Workers currently supports any kind of environment variables (besides storing that config within a KV store itself, I guess). This meant that I needed to hard-code the API credentials directly within the worker code, which doesn’t feel ideal (especially when combined with wanting to push up to the Workers Preview site, which is essentially public).

Allow scoped API tokens
This is something which people have already raised over on the forums, so might already be in development, but scoped API credentials would be a great improvement over the current account-wide API tokens.

Moving to a “Known Web” – Using Certificate Transparency to crawl the internet

Imagine that, instead of crawling the whole internet to discover new domains, you were just told about them in near-realtime as soon as they launched. Even better: imagine that this was available to anybody, not just the huge tech companies. What could the possibilities be, and what could be built?

A while back now, I posted a tweet with a musing I’d had after reading up on the Certificate Transparency project. I was wondering if the underlying technologies – which are designed to allow for the auditing and monitoring of SSL certificate issuance – could serve a wider purpose in the area of web crawling, and whether it (or something similar) could trigger a shift in the way that crawlers operate in the distant future. In this post, I wanted to share my line of thinking.

Before we start to look at Certificate Transparency itself, let’s recap on the various methods by which an existing crawler (be it for a search engine, or any other purpose) might discover previously-unseen domains to process;

  • Standard Crawling – Relies on sites being linked to in order to discover them, and can be resource intensive (you may need to crawl an entire site just to discover one new domain).
  • Manual Submission – Relies on people manually submitting their sites to you.
  • Parsing TLD Zone Files – Only discovers the apex domain – you’ll be missing any subdomains.

Each of these has its own merits and drawbacks, but one thing they all have in common is this; you need to somehow go looking for new sites to crawl. This is where Certificate Transparency comes in…

What is Certificate Transparency?

Certificate Transparency is a framework designed to allow monitoring and auditing of SSL Certificate issuance. When a trusted Certificate Authority (such as Let’s Encrypt or Cloudflare) issues a new certificate, they push an entry with its details to a number of cryptographically-verified public logs, which can then be read by any number of consumers. An example consumer is the crt.sh tool, which allows us to view all of the certificates generated for this domain (simon-thompson.me).

Needless to say, this additional layer of transparency is a good thing for security on the web. It allows site owners to detect mis-issued certificates which could be impersonating them, and it allows rogue CAs to be identified easily. If you’d like to read up a bit more on CT itself or want more technical details, I recommend reading Scott Helme’s introductory post, plus the official site itself.

Over the past few years, CT has increasingly become a requirement. For instance, Chrome now requires that an SSL certificate is logged via CT, otherwise it simply won’t trust it – a move which has encouraged CAs to adopt the technology. When you combine that with the shift towards HTTPS as the default (e.g. Chrome’s UI changes and Google’s incorporation of HTTPS as a ranking signal), we’re heading towards a point where the majority of the public-facing web will be using an SSL certificate and, by extension, getting logged into a CT log (all good things, might I add).

Use Cases & Caveats

So we’ve got a near-realtime stream of domains, but what can we actually use it for? Some examples I can think of are;

  • Search Engines – Although more limited by the caveats which I’ll detail momentarily, it’s possible that search engines could choose to use Certificate Transparency logs as a source of domains for their crawlers.
  • Phishing Detection – Facebook (and others) have done some work in this area already, using the stream of newly issued certificates to rapidly detect potential phishing attacks.
  • Vulnerability Analysis – Given that most people won’t be aware that the certificates they generate are being logged, a large amount of staging, development and otherwise hidden environments can be exposed.

As an aside, if you’re hoping to protect against this, you should look to implement security measures – such as HTTP basic auth for staging sites – before generating an SSL certificate, or as soon afterwards as possible. Assume that the hostname will be public knowledge, and being poked by vulnerability scanners, within a matter of hours at most. There are also some options around redaction in CT which you may wish to review.

Of course, this list is non-exhaustive. The thing which interests me about all of this though, is that certificate transparency essentially democratises the list of sites on the web so people can build whatever they want on top of it.

As always, there are some caveats to the data available;

  1. By definition, you’re only going to discover sites which have had a certificate generated and are on HTTPS. This will be an increasingly large portion of the web as time goes by, but you’ll still be missing non-secure sites and might need to detect them through other means.
  2. A lot of these domains are probably not designed to be public. For example, a huge amount of certs are for things like webmail / staging / control panel subdomains which, depending on your use case, may not be worth pursuing. You could filter these out, of course.

How to access the logs

Now, you may be wondering how easy it is (or isn’t) to tap into the logs to try this out for yourself. It is possible to read them directly; however, for the scope of this blog post and experimenting, there’s an easier (and currently free) option – namely certstream – which abstracts away the work of parsing the huge logs and turns them into one simple stream.

Using their code samples, you can very quickly get something up and running which looks like the example below.
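For reference, here’s a minimal sketch of a consumer in TypeScript using the ws package – the endpoint and field names (message_type, data.leaf_cert.all_domains) follow the CertStream message format, but treat this as my own sketch rather than one of their official samples:

```ts
import WebSocket from 'ws';

// Connect to the public CertStream aggregation endpoint.
const socket = new WebSocket('wss://certstream.calidog.io/');

socket.on('message', (raw) => {
  const message = JSON.parse(raw.toString());
  if (message.message_type !== 'certificate_update') {
    return; // ignore heartbeat messages
  }
  // Every domain on the newly logged certificate, available moments after issuance.
  const domains: string[] = message.data.leaf_cert.all_domains;
  console.log(domains.join(', '));
});
```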

Summary

I’m not so naive as to believe that things will change overnight, especially given that a large portion of the web (including some major sites) is still not on HTTPS, but I can’t help but feel that we’ll see a move towards a “known web”, where a large amount of domains can be discovered quite easily with less overhead than is required today.

This could mean sites being picked up and crawled more quickly, vulnerabilities being discovered before a site’s even made public, and information being disclosed that was assumed to be private (although, if you’re relying on security by obscurity, that’s probably not ideal anyway!).

Overall though, I’m excited to see what unexpected use cases come out as a side effect of Certificate Transparency as a technology.

Simple DOM Manipulation via jQuery in Cloudflare Workers

I’ve recently been trying out Cloudflare Workers as part of another write-up which I’ll be sharing soon, and I’m really excited about their potential. For those who aren’t familiar with them, Cloudflare Workers allow you to write custom JavaScript code to run “on the edge” (i.e. in Cloudflare’s data centres) which can modify a user’s request on its way to your origin server, or the response on the way back. This is pretty exciting, as it opens up a range of options across the board in the realms of security, performance and customisation (among others).

The Problem

The particular use-case I’m looking at involves lots of DOM manipulation – i.e. changing page content – something which is quite tricky with Cloudflare Workers currently, as your only options are string manipulation and regex, which can get unwieldy very quickly. Ideally we’d be able to use the same tools that we use with JavaScript in a browser – like document.querySelectorAll to find elements matching a CSS selector – but Workers aren’t browsers, so unfortunately these methods aren’t available to us.

After a number of tests, I emailed the Cloudflare Developer Help team for their opinion. They let me know that they’re planning improvements in this area in the near future, but that in the meantime they were aware of another user who had managed to incorporate some DOM functionality into their Worker by browserifying the Node.js dom-parser module and including it in their script – so that might be an option to investigate. I tinkered with this and got it working pretty quickly, but it got me thinking: this is a good option for getting data out of the page, but what if we could include something like jQuery in a Cloudflare Worker? This would reduce the complexity of DOM manipulation massively, potentially even for non-developers, and allow us to easily modify the response before sending it to the client.

The Solution

It turns out, the cheerio module for Node.js provides exactly what we need – a server-side implementation of jQuery.

After a few hours of testing and tweaking, I’ve managed to get a proof of concept working which embeds Cheerio (jQuery) into a Cloudflare Worker. If you’d like to give it a go yourself, you can play around with the code in the Playground I’ve put together (alternatively the source is available in this gist). Feel free to make use of either in your own projects!

In the example below, I’m using jQuery to modify the response from the server and change the content of all h1 tags to be “¯\_(ツ)_/¯”.
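Here’s a sketch of what that Worker looks like, assuming cheerio has been bundled into the script (see the bundling notes further down) – the exact import shape depends on how you bundle it:

```ts
import cheerio from 'cheerio'; // pulled in via the bundling steps described below

addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request: Request): Promise<Response> {
  // Fetch the original response from the origin server.
  const response = await fetch(request);
  const contentType = response.headers.get('Content-Type') || '';
  if (!contentType.includes('text/html')) {
    return response; // only rewrite HTML responses
  }

  // Load the HTML into cheerio and use familiar jQuery-style selectors.
  const $ = cheerio.load(await response.text());
  $('h1').text('¯\\_(ツ)_/¯');

  // Serialise the modified document and send it on to the client.
  return new Response($.html(), {
    status: response.status,
    headers: response.headers,
  });
}
```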

You can also apply CSS styles, as seen in the example below (note that all of the changes happen “on the edge” before the response is sent to the client – no JavaScript is running in the client / browser);
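As a tiny standalone illustration (not the exact code from the Playground), cheerio’s .css() helper writes the style straight into the markup:

```ts
import cheerio from 'cheerio';

const $ = cheerio.load('<h1>Hello</h1>');
$('h1').css('color', 'red'); // the h1 now carries an inline style attribute
console.log($.html());       // serialised markup, already styled – nothing runs client-side
```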

The possibilities here are huge!

Bundling npm modules

If you’re a developer interested in how to bundle NPM modules into a Worker script, the steps are roughly as follows;

  1. Install Browserify globally
  2. Create a new node project with npm init, and npm install the module(s) you need
  3. Create a file main.js, and add a require('module') for each module
  4. Run browserify main.js -o bundle.js
  5. Look through bundle.js and find function(require,module,exports){…} – your code will go inside of this function. Drop the sample code from the Playground site in that function, paste it back into the Playground, and check the console to see if you get any errors

As an additional step, you can minify the output of Step 4 by using an ES6-compatible minifier (like https://skalman.github.io/UglifyJS-online/) – this will give you some tidier code to work with. I did find that this required a bit of tweaking of the minifier settings to get it working correctly, so if you’re struggling, give me a shout via Twitter and I’m happy to chat!

Year In Review: 2017

2017 has been a great year for me in many ways, so I wanted to pull together some of my highlights in a quick post.

Writing & Research

Thanks to friendly nudges and support from colleagues, 2017 was the first year where I’ve started publicly sharing some of the R&D work I get to do alongside my day job. The initial catalyst was the reception to a tweet Chris put out about a piece of research we worked on together involving hourly rank tracking.

Given the evident community interest around this, I wrote up a bit more about our findings on the then newly released StrategiQ Medium blog, and went along to the talk Chris gave at Search London about it.

Luckily, as a follow-up to this I was also able to write about a (perhaps obvious) side-effect of the hourly rank tracking which we’d noticed in Google Search Console.

And finally, I polished up and released something I’d been experimenting with for quite a while – a way to view referring Twitter users in Google Analytics.

Work

2017 saw another full year at StrategiQ with a number of site launches which I’m particularly proud of. We’ve also made great strides with our development standards and hosting infrastructure, which is something I’m keen to share more about in 2018 to try and help other agencies put better processes in place too.

Open Source & Code

Mostly tied in with the blog posts above, I was able to release a few small open source projects over on GitHub. These were;

  • ghks – A Node key/value store, which uses GitHub gists for persistent storage.
  • twitlytics-server – A Node app which resolves a t.co referring URL to the original tweet / tweeter.
  • WPVersion – A JS Module and PHP Class for detecting the version of WordPress being used on a given site.

I’ve also finished up Louise’s personal website, something we’ve wanted to do for ages now, so if you happen to be looking for Music Tuition in Witham, check out her site!

Plans for 2018

Whilst I’m not one for setting specific “New Year’s Resolutions”, I’m definitely hoping to share more blog posts both here and through work in 2018, plus start actually shipping a few of the side-projects I’ve had bubbling away – so watch this space!

Back to WordPress

Until today this site ran on a script called “statik” which I wrote to turn markdown files into a website. Whilst it’s served a purpose, it didn’t do particularly well when it came to blogging – something I’m hoping to do more of.

Having had the chance to see “WordPress Done Right” over the past months, I’ve opted to switch this site over to WordPress.

Over at StrategiQ we’ve got a pretty decent setup for our sites, so I’m taking the same approach here; DNS routed through CloudFlare (with caching enabled), pointing via CNAME to hosting with WPEngine. Whilst slightly more pricey than just running a VPS, the reliability is worth it in my mind.

Hopefully this will be a motivator to write more, but also serves as a good test bed for the plugins and research I’ll be doing over the coming months.