The Cloudflare Story - and why it is considered harmful.
Websites should avoid using Cloudflare.
Cloudflare's HTTP fronting service incorporates some seriously questionable practices. As a result, the number of websites that use Cloudflare poses a hazard to the state of the web. Cloudflare's advertising to site operators fails to disclose these issues, and operators may not even be aware of them.
The CAPTCHA absurdity
If a website uses Cloudflare, most likely and by default, this will result in the website being rendered stochastically defective. A website is an HTTP request processing service. Unfortunately, the adoption of Cloudflare results in such services becoming unreliable and causes denial-of-service conditions to occur for users in an essentially random and unaccountable fashion.
Where such denial-of-service conditions occur, Cloudflare serves a bizarre “one more step” page inviting visitors to complete a reCAPTCHA to access the site. Cloudflare claims that this is based on IP reputation, which constitutes a fallacious conflation of IPs and users and is highly detrimental to Tor users' ability to browse the web. The challenge doesn't even work if the user has cookies disabled or uses a browser that doesn't support iframes, such as lynx.
The HTTP status code for this page is 403 Forbidden. Essentially, Cloudflare, by design, randomly perpetrates denial-of-service attacks on users, yet at the same time, Cloudflare paradoxically advertises itself as a service to mitigate DoS attacks.
Cloudflare claims that these measures are necessary to counter abuse. However, this claim is dubious because it is a model of operation for a CDN with few imitators. In other words, other CDNs manage to implement HTTP properly and provide anti-DDoS measures without resorting to such practices as random demands that users complete CAPTCHAs.
Cloudflare's randomly occurring demand that users complete CAPTCHAs discriminates, by design, against users who are not humans. It constitutes a hazard to the crawlability of the web. It is unclear what recourse is available to organizations spidering the web which find themselves impaired by Cloudflare's actions. Even if Cloudflare were to whitelist these organizations, this would essentially make Cloudflare an authority on the legitimacy of a search engine, which, given the magnitude of Cloudflare's user base, is deeply concerning.
There is no basis in the HTTP standard to demand that users or their user agents complete CAPTCHAs to load a page, and the CAPTCHA demands issued by Cloudflare are not communicated in any standard way. The '403 Forbidden' status code used for these pages could express arbitrary policy prohibitions, such as non-negotiable denial of access. Cloudflare not only demands a human user (and not consistently; only if it randomly decides to do so) but also fails to communicate unambiguously, in a machine-readable manner, when this is the case. This is discriminatory to robots, many of which are legitimate.
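To illustrate the ambiguity from a client's point of view, here is a minimal sketch in Python of what a well-behaved crawler is reduced to. The function name, the Verdict labels and the marker string (borrowed from the “one more step” wording described above) are assumptions made for illustration, not anything Cloudflare documents. The point is only that the status code cannot separate a CAPTCHA demand from a final refusal, so the client has to guess by scraping the HTML.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    OTHER_ERROR = "other-error"
    DENIED = "denied"                    # a 403 that appears to be a final refusal
    MAYBE_CHALLENGE = "maybe-challenge"  # a 403 that merely *looks* like a CAPTCHA page


# Hypothetical marker: the interstitial wording as described in this article.
# Nothing in HTTP guarantees that it is present, stable or machine-readable.
CHALLENGE_MARKERS = ("one more step",)


def classify(status: int, body: str) -> Verdict:
    """Guess what a response means from the status code and the body text."""
    if 200 <= status < 300:
        return Verdict.OK
    if status != 403:
        return Verdict.OTHER_ERROR
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return Verdict.MAYBE_CHALLENGE
    return Verdict.DENIED


# Two responses with the same status code and entirely different meanings.
print(classify(403, "<title>One more step</title>"))
print(classify(403, "<h1>Forbidden</h1>"))
```

Any change to the page wording silently breaks this guesswork, which is exactly the fragility that a standard, machine-readable signal would avoid.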
Rather absurdly, if you want to provide an API over Cloudflare, you have to exempt your API endpoints so that this doesn't happen, which raises the question of what the point is in the first place if you have to make holes in it, and proves the point.
Cloudflare's inexplicable inability to implement HTTP in a sane, transparent manner, an incapability seemingly shared by no other CDN service, became even more ridiculous when Cloudflare reached out to the Tor project to request that they make changes to Tor to accommodate its problematic practices.
In its liaison with Tor, Cloudflare stated that the reasoning for its CAPTCHAs is not DDoS mitigation but the following:
- They want to mitigate comment spam.
- It's better UI to verify the user in a GET request than in a POST request when they're commenting.
- Moreover, intercepting the POST request will not work correctly if the comment system uses AJAX.
This last point should have made Cloudflare realize that trying to stop comment spam at the CDN level is futile and can only result in breaking HTTP. What Cloudflare is trying to do here is a fundamentally broken practice (in fact, the whole premise of “web application firewalls” is fundamentally broken, see below). Cloudflare is not in a position to understand the semantic meaning of HTTP traffic, and it is not in a position to rearchitect a site operator's web application so that the application understands why its own AJAX requests are randomly being denied.
In other words, since CAPTCHAs are discriminatory to robots, as discussed above, Cloudflare's service is unwittingly discriminatory to the JavaScript of the very websites it serves, breaking them. Cloudflare's response to this appears to be to CAPTCHA the entire website up-front on the off chance someone might want to post a comment, though even this doesn't always work; I have encountered websites that didn't do so and which had broken AJAX functionality due to the subsequent AJAX-triggered requests being denied by Cloudflare, a condition the JS code was not designed to handle (nor would there be any sane way for it to handle it anyway).
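The failure mode is easy to reproduce in the abstract. The sketch below is written in Python rather than browser JavaScript, and the handler name and both response bodies are invented for illustration. It shows client code written on the reasonable assumption that its own backend answers with JSON, and what happens when an intermediary substitutes an HTML challenge page instead.

```python
import json


def render_new_comment(response_body: str) -> str:
    """Naive handler: assumes the site's own endpoint always returns JSON
    shaped like {"id": 7, "text": "hi"} (a hypothetical payload)."""
    comment = json.loads(response_body)
    return f"#{comment['id']}: {comment['text']}"


# What the site's own backend sends.
print(render_new_comment('{"id": 7, "text": "hi"}'))

# What arrives when an intermediary swaps in an HTML interstitial.
try:
    render_new_comment("<html><title>One more step</title></html>")
except json.JSONDecodeError as exc:
    print("handler broke:", exc.msg)
```

The handler could catch the error, but it still has no way to complete a CAPTCHA on the user's behalf from inside a background request, so the comment is lost either way.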
“Abuse” and the Web Application Firewall Fallacy
Being a comment spam filtering service is one of many things Cloudflare is trying to do besides being a CDN. They also claim to use their CAPTCHAs to mitigate other “abusive” traffic, like “harvesting e-mail addresses.”
- Lunar: [...] Could you tell us [sic] what qualifies as abuse?
- jgrahamc: Abuse: comment spamming, harvesting email addresses, attacking web applications (e.g., SQL injection), HTTP DoS (exploiting slow web servers/applications to knock them offline). I'm not interested in L3/L4 DoS and Tor as that's non-existent (unless then [sic] exit node is separately part of a botnet).
As discussed above, filtering comment spam at the CDN level is a fundamentally broken practice. Attempting to filter for SQL injection at the CDN level is likewise an exercise in futility and security theatre. The “Web Application Firewall” idea is the absurd notion that grepping requests and responses for known-to-be-naughty patterns is an adequate cure for vulnerable web applications.
Suppose I log in to a website with the username ' OR 0=0 --. In that case, Cloudflare has no way of knowing whether this is a SQL injection attack or just a peculiar username that the website has decided to issue legitimately. Furthermore, Cloudflare doesn't even know whether the website uses SQL for data storage.
Suppose I post ' OR 0=0 -- in a comment. In that case, Cloudflare has no way of knowing whether this is an SQL injection attack, whether it will work, or whether I'm posting a comment discussing SQL injection and including examples (at which point this becomes a form of censorship).
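Both scenarios can be sketched in a few lines of Python. The signature below is a hypothetical example of the kind of pattern a request-grepping filter might use, not a rule taken from any real product, and the table layout and function names are likewise invented. The sketch shows the filter flagging a harmless comment about SQL injection, while a backend that uses bound parameters stores the odd username as plain data without any trouble.

```python
import re
import sqlite3

# Hypothetical signature for illustration only; real rule sets are not public here.
SQLI_SIGNATURE = re.compile(r"'\s*OR\s+\d+\s*=\s*\d+\s*--", re.IGNORECASE)


def naive_waf_blocks(field_value: str) -> bool:
    """Return True if the value matches the signature, regardless of context."""
    return SQLI_SIGNATURE.search(field_value) is not None


def register_and_lookup(username: str) -> list[str]:
    """A backend using parameterized queries treats the value as inert data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT)")
    db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    db.execute("INSERT INTO users (name) VALUES (?)", (username,))
    rows = db.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()
    db.close()
    return [name for (name,) in rows]


odd_username = "' OR 0=0 --"
comment = "Classic SQL injection example: ' OR 0=0 -- (use bound parameters!)"

print(naive_waf_blocks(odd_username))      # the legitimate username is flagged
print(naive_waf_blocks(comment))           # so is the comment discussing the attack
print(register_and_lookup(odd_username))   # the backend returns only the matching row
```

The proxy sees the same bytes in every case; only the application knows whether they are dangerous, and here they are not.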
Using Cloudflare means that Cloudflare will randomly cause DoS to users if it thinks they're trying to use a pattern of text to which Cloudflare is, by design, allergic. The circumstances in which these denials of service occur are, of course, ill-defined and in no way exhaustively enumerated, so using Cloudflare presents an intense and unaccounted liability in terms of availability and content neutrality for any website. Moreover, it can make your website unreliable and fail randomly.
The “web application firewall” concept is fundamentally flawed in all instances because it falsely presupposes that a blind intermediate proxy can reliably assess the semantic meaning of data transmitted, which is impossible. Since this kind of “service” is part of the Cloudflare value proposition and an attempt to add a profit-making value-add, Cloudflare has essentially built its entire business on doing something that is a bad idea and cannot be reliably implemented.
Arbitrary and poorly defined content mangulation
Continuing with the flawed “web application firewall” theme of an unknowing proxy trying to guess the semantic meaning of content transmitted through it, Cloudflare insists on being a CDN that does un-CDN-like things in yet other ways. Rather than being a neutral proxy of traffic, even when Cloudflare isn't stochastically DoSing its customers' websites, Cloudflare insists on doing interesting things with response bodies.
For example, it mangles e-mail addresses and replaces them with some JavaScript convolution intended to complicate harvesting. Except that it doesn't just mangle e-mail addresses; it mangles anything which looks vaguely like an e-mail address, even if it isn't one.
- This
# Welcome to example.com. To access the foobar API, use curl:
curl 'https://foo@example.com/foobar'
- becomes
# Welcome to example.com. To access the foobar API, use curl:
curl 'https://[email protected]/foobar'
XMPP address? Filtered. SIP address? Filtered. OpenSSH algorithm identifier? Filtered. Kerberos principal? Filtered. This filtering is necessarily done without regard to context, so it suffers from the same issues as trying to prevent SQL injection, and it is a potent demonstration of how “Web Application Firewalls” are a fundamentally stupid idea. Cloudflare can't know what is actually an email address, but it filters anyway.
Since I browse the web with JavaScript disabled by default, it's a running facepalm for me to find things on websites that aren't even email addresses replaced with [email protected], even parts of source code listings. Of course, this practice also discriminates against users with JavaScript disabled and against browsers that don't support JavaScript, preventing them from viewing email addresses (or anything that looks like one).
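The over-matching is easy to reproduce with a context-free pattern. The regular expression below is an assumption made for illustration (a generic “something@host.tld” matcher), not Cloudflare's actual rule, and the sample strings are invented apart from the curl line quoted above. The sketch, in Python, shows that everything in the list gets the same treatment as a real address.

```python
import re

# Hypothetical "looks like an e-mail address" pattern; an assumption, not
# the rule Cloudflare actually applies.
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

PLACEHOLDER = "[email protected]"


def obfuscate(text: str) -> str:
    """Replace every e-mail-shaped token, with no knowledge of its context."""
    return EMAIL_LIKE.sub(PLACEHOLDER, text)


samples = {
    "e-mail": "Contact: alice@example.com",
    "URL userinfo": "curl 'https://foo@example.com/foobar'",
    "XMPP": "xmpp:alice@example.com",
    "SIP": "sip:alice@example.com",
    "OpenSSH algorithm": "MACs hmac-sha2-256-etm@openssh.com",
    "Kerberos principal": "kinit alice@EXAMPLE.COM",
}

for kind, text in samples.items():
    print(f"{kind:20} {obfuscate(text)}")
```

Only the first sample is an e-mail address, yet all six come out with the placeholder in them.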
Cloudflare also takes other liberties. For example, it rejiggers a web page's JavaScript to optimize it. But, again, this should be seen as a liability from a website operator's perspective.
Cloudflare is an Intelligence Agency
- No other CDN service offers a free service comparable to that of Cloudflare. So why does Cloudflare offer service for free?
It's because Cloudflare isn't a CDN; it's an intelligence project. Its entire purpose is to collect data. This isn't our inference; the founders of Cloudflare have happily gone on record and said it:
- In 2003, Lee Holloway and I started Project Honey Pot as an open-source project to track online fraud and abuse. The Project allowed anyone with a website to install a piece of code and track hackers and spammers. We ran it as a hobby and only thought a little about it until, in 2008, the Department of Homeland Security called and said, 'Do you have any idea how valuable the data you have is?' That started us thinking about how to effectively deploy the data from Project Honey Pot and other sources to protect online websites. That turned into the initial impetus for Cloudflare.
- Yes, Cloudflare was founded by the Project Honey Pot people.
Cloudflare also has a highly generous free tier, and most websites that use it probably don't pay. But as we've come to understand in this era of surveillance capitalism, if you aren't paying, you aren't the customer; you're the product.
Threat to the anonymity of Tor users. Cloudflare doesn't just pointlessly inconvenience Tor users by making them solve CAPTCHAs to view websites; it also provides a vehicle for the deanonymisation of Tor users. Cloudflare is an ideal platform for attacking Tor because it is the closest anyone has ever come to building a Global Active Adversary (GAA): an entity that can observe and modify traffic anywhere in the world. Compare this with the lesser category of the Global Passive Adversary (GPA), which can monitor but not modify traffic anywhere in the world. Unfortunately, Tor is not designed to offer adequate security against either.
To put this in perspective, in 2013, the NSA had given up on ever achieving GPA status (and therefore on being able to deanonymize Tor traffic reliably), let alone GAA. So Cloudflare is effectively inviting people to help it become a GAA.
Cloudflare delivers tracking cookies for any website. So you still get a tracking cookie even if the website is static and stateless. (Since Cloudflare has assets in the EU, it's a CDN.)
It is probably a US Government-attached intelligence agency
Cloudflare is known for providing its services to various websites, including notorious piracy websites like The Pirate Bay. It is also a US company.
Since the US is known for taking down even companies that appear to be legal on paper, such as Megaupload, when they are associated with copyright infringement, this situation is peculiar.
Potential liability. 17 USC 512 provides exemptions from liability for copyright infringement for various entities. For example, 17 USC 512(c) provides for the takedown of “information residing on systems or networks at the direction of users”; this is the well-known “DMCA notice” provision. However, the section also contains a provision, 17 USC 512(b), relating to caching proxies (i.e., Cloudflare).
This clause provides several conditions for this exemption from liability to be valid:
- That the caching proxy transmits the material without modification; and
- That the proxy handles takedown notices for material on a site if a court has ordered that the material be removed from the original site, amongst other things.
In other words, if a US court were to order that The Pirate Bay take down certain pages of their site, Cloudflare would be obliged to comply with notices asking them to give effect to that takedown in the absence of compliance by The Pirate Bay itself — and in any case, it seems likely that a US court could also order Cloudflare directly to disable access to it.
But even this is moot because Cloudflare modifies the material it passes. Therefore, it cannot claim a 17 USC 512(b) exemption. Moreover, since 47 USC 230 (the Communications Decency Act) explicitly excludes copyright from the immunity from liability it grants intermediaries, Cloudflare is likely to be liable without an applicable 17 USC 512 exemption. Despite this, there has been an absence of even an attempted attack by the US on Cloudflare's activities providing services to notorious piracy websites.
The author's conclusions. The US government could adversely affect Cloudflare's business via legal action if they wished. The fact that they have not is, therefore, unusual. However, it is well known that US federal law enforcement is happy to avoid shutting down illegal activities if they believe that they can obtain more intelligence by not doing so.
From previous programs like PRISM, it is well demonstrated that most prominent tech companies are happy to comply with requests for total access by intelligence and law enforcement agencies. Moreover, given that an absurdly large number of websites now use Cloudflare, this means that Cloudflare is the world's premier global MitM agency. This is a level of access and visibility that signals intelligence agencies could ordinarily only dream of, especially since it breaks TLS. To put it frankly, the intercept data available to Cloudflare is so tantalizing to intelligence agencies, it seems almost beyond plausibility that they haven't gone after it. Especially when considering the possibility of effectively blackmailing Cloudflare over the legality of their activities and, for that matter, the known historical contact of their founders with DHS. (Though even this assumes a default reluctance to assist extrajudicial surveillance on the part of tech companies, which is known to frequently not be the case in favor of indifference or outright enthusiasm.)
In other words, on the balance of probabilities, we believe that the continued lack of aggression from the US government towards Cloudflare, together with simple consideration of the standard MO of both intelligence agencies and tech companies, makes it overwhelmingly likely that Cloudflare is an effective element of US signals intelligence.
This is not, of course, evidence or proof. However, the probabilities are adequate that assuming it is not the case would be imprudent. Although using Cloudflare as a wiretap alone would be alarming in itself, the prospect of the fusion of Cloudflare's partial GAA and the NSA's partial GPA capabilities would be extremely formidable. An abundance of caution — and a distrust of companies essentially trying to put themselves in a position to MitM all web traffic, even if they claim benevolence — is wholly advisable.
After all, the mere possibility of this threatens to undermine the “crypto renaissance”, to undermine everything the public cryptographic community has worked for since 2013 — and cryptography is all about mitigating mere possibilities in the first place(1).
Conclusions
- Cloudflare's product is based on fundamentally flawed ideas such as “web application firewalling,” which simply cannot work properly.
- Using Cloudflare is a way of stochastically DoSing and subtly breaking your website.
- Using Cloudflare discriminates against Tor users and, for that matter, some non-Tor users.
- Cloudflare is the world's leading global MitM agency, rivaling the power of any signals intelligence agency. It is in a position to monitor, surveil, deanonymize and modify an alarmingly large and growing percentage of web traffic because of its widespread usage and the fact that it terminates TLS sessions(2). Even if you were to trust Cloudflare, putting this level of trust in one entity is highly unwise.
- It is tough to imagine this intercept data not ending up in the hands of intelligence agencies.
(1). We can assume nowadays that anything an intelligence agency *could* be doing, it very probably *is* doing. The Snowden revelations show that this rule serves one well, so we sometimes refer to this as “Snowden's law.”
(2). A preemptive aside — people familiar with Cloudflare's offering may raise their “keyless SSL” service, but this doesn't change anything because Cloudflare still terminates the TLS session and sees the decrypted traffic.
© 2019 - 2022 iBlockchain Bank And Trust Technologies Co., All Rights Reserved