
Details of the Cloudflare outage on July 2, 2019

Almost nine years ago, Cloudflare was a tiny company and I was a customer, not an employee. Cloudflare had launched a month earlier and one day alerting told me that my little site, jgc.org, didn't seem to have working DNS any more. Cloudflare had pushed out a change to its use of Protocol Buffers and it had broken DNS.

I wrote to Matthew Prince directly with an email titled "Where's my dns?" and he replied with a long, detailed, technical response (you can read the full email exchange here), to which I replied:

From: John Graham-Cumming
Date: Thu, Oct 7, 2010 at 9:14 AM
Subject: Re: Where's my dns?
To: Matthew Prince

Awesome report, thanks. I'll make sure to call you if there's a problem. At some point it would probably be good to write this up as a blog post when you have all the technical details because I think people really appreciate openness and honesty about these things. Especially if you couple it with charts showing your post launch traffic increase.

I have pretty robust monitoring of my sites so I get an SMS when anything fails. Monitoring shows I was down from 13:03:07 to 14:04:12. Tests are made every five minutes.

It was a blip that I'm sure you'll get past. But are you sure you don't need someone in Europe? :-)

To which he replied:

From: Matthew Prince
Date: Thu, Oct 7, 2010 at 9:57 AM
Subject: Re: Where's my dns?
To: John Graham-Cumming

Thanks. We've written back to everyone who wrote in. I'm headed in to the office now and we'll put something on the blog or pin an official post to the top of our bulletin board system. I agree 100% transparency is best.

And so, today, as an employee of a much, much larger Cloudflare, I get to be the one who writes transparently about a mistake we made, its impact and what we are doing about it.

The events of July 2

On July 2, we deployed a new rule in our WAF Managed Rules that caused CPUs to become exhausted on every CPU core that handles HTTP/HTTPS traffic on the Cloudflare network worldwide. We are constantly improving WAF Managed Rules to respond to new vulnerabilities and threats. In May, for example, we used the speed with which we can update the WAF to push a rule to protect against a serious SharePoint vulnerability. Being able to deploy rules quickly and globally is a critical feature of our WAF.

Unfortunately, last Tuesday's update contained a regular expression that backtracked enormously and exhausted the CPU used for HTTP/HTTPS serving. This brought down Cloudflare's core proxying, CDN and WAF functionality. The following graph shows CPUs dedicated to serving HTTP/HTTPS traffic spiking to nearly 100% usage across the servers in our network.

CPU utilization in one of our PoPs during the incident

This resulted in our customers (and their customers) seeing a 502 error page when visiting any Cloudflare domain. The 502 errors were generated by the front line Cloudflare web servers that still had CPU cores available but were unable to reach the processes that serve HTTP/HTTPS traffic.

We know how much this hurt our customers. We're ashamed it happened. It also had a negative impact on our own operations while we were dealing with the incident.

It must have been incredibly stressful, frustrating and frightening if you were one of our customers. It was even more upsetting because we haven't had a global outage for six years.

The CPU exhaustion was caused by a single WAF rule that contained a poorly written regular expression that ended up creating excessive backtracking.
The regular expression that was at the heart of the outage is:

(?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*)))

Although the regular expression itself is of interest to many people (and is discussed more below), the real story of how the Cloudflare service went down for 27 minutes is much more complex than "a regular expression went bad". We've taken the time to write out the series of events that led to the outage and kept us from responding quickly. And, if you want to know more about regular expression backtracking and what to do about it, then you'll find it in an appendix at the end of this post.

What happened

Let's begin with the sequence of events. All times in this blog are UTC.

At 13:42 an engineer working on the firewall team deployed a minor change to the rules for XSS detection via an automatic process. This generated a Change Request ticket. We use Jira to manage these tickets and a screenshot is below.

Three minutes later the first PagerDuty page went out indicating a fault with the WAF. This was a synthetic test that checks the functionality of the WAF (we have hundreds of such tests) from outside Cloudflare to ensure that it is working correctly. This was rapidly followed by pages indicating many other end-to-end tests of Cloudflare services failing, a global traffic drop alert, widespread 502 errors and then many reports from our points-of-presence (PoPs) in cities worldwide indicating there was CPU exhaustion.

Some of these alerts hit my watch and I jumped out of the meeting I was in and was on my way back to my desk when a leader in our Solutions Engineering group told me we had lost 80% of our traffic. I ran over to SRE where the team was debugging the situation. In the initial moments of the outage there was speculation it was an attack of some type we'd never seen before.

Cloudflare's SRE team is distributed around the world, with continuous, around-the-clock coverage. Alerts like these, the vast majority of which note very specific issues of limited scope in localized areas, are monitored in internal dashboards and addressed many times every day. This pattern of pages and alerts, however, indicated that something gravely serious had happened, and SRE immediately declared a P0 incident and escalated to engineering leadership and systems engineering.

The London engineering team was at that moment in our main event space listening to an internal tech talk. The talk was interrupted and everyone assembled in a large conference room and others dialed in. This wasn't a normal problem that SRE could handle alone; it needed every relevant team online at once.

At 14:00 the WAF was identified as the component causing the problem and an attack dismissed as a possibility. The Performance Team pulled live CPU data from a machine that clearly showed the WAF was responsible. Another team member used strace to confirm. Another team saw error logs indicating the WAF was in trouble. At 14:02 the entire team looked at me when it was proposed that we use a 'global kill', a mechanism built into Cloudflare to disable a single component worldwide.

But getting to the global WAF kill was another story. Things stood in our way.
We use our own products and with our Access service down we couldn't authenticate to our internal control panel (and once we were back we'd discover that some members of the team had lost access because of a security feature that disables their credentials if they don't use the internal control panel frequently).

And we couldn't get to other internal services like Jira or the build system. To get to them we had to use a bypass mechanism that wasn't frequently used (another thing to drill on after the event). Eventually, a team member executed the global WAF kill at 14:07 and by 14:09 traffic levels and CPU were back to expected levels worldwide. The rest of Cloudflare's protection mechanisms continued to operate.

Then we moved on to restoring the WAF functionality. Because of the sensitivity of the situation we performed both negative tests (asking ourselves "was it really that particular change that caused the problem?") and positive tests (verifying the rollback worked) in a single city using a subset of traffic after removing our paying customers' traffic from that location.

At 14:52 we were 100% satisfied that we understood the cause, had a fix in place, and the WAF was re-enabled globally.

How Cloudflare operates

Cloudflare has a team of engineers who work on our WAF Managed Rules product; they are constantly working to improve detection rates, lower false positives, and respond rapidly to new threats as they emerge. In the last 60 days, 476 change requests have been handled for the WAF Managed Rules (averaging one every 3 hours).

This particular change was to be deployed in "simulate" mode where real customer traffic passes through the rule but nothing is blocked. We use that mode to test the effectiveness of a rule and measure its false positive and false negative rates. But even in simulate mode the rules actually need to execute, and in this case the rule contained a regular expression that consumed excessive CPU.

As can be seen from the Change Request above there's a deployment plan, a rollback plan and a link to the internal Standard Operating Procedure (SOP) for this type of deployment. The SOP for a rule change specifically allows it to be pushed globally. This is very different from all the software we release at Cloudflare, where the SOP first pushes software to an internal dogfooding network point of presence (PoP) (which our employees pass through), then to a small number of customers in an isolated location, followed by a push to a large number of customers and finally to the world.

The process for a software release looks like this: We use git internally via BitBucket. Engineers working on changes push code which is built by TeamCity and when the build passes, reviewers are assigned. Once a pull request is approved the code is built and the test suite runs (again). If the build and tests pass then a Change Request Jira is generated and the change has to be approved by the relevant manager or technical lead. Once approved, deployment to what we call the "animal PoPs" occurs: DOG, PIG, and the Canaries.

The DOG PoP is a Cloudflare PoP (just like any of our cities worldwide) but it is used only by Cloudflare employees. This dogfooding PoP enables us to catch problems early before any customer traffic has touched the code. And it frequently does.

If the DOG test passes successfully, code goes to PIG (as in "Guinea Pig"). This is a Cloudflare PoP where a small subset of customer traffic from non-paying customers passes through the new code.
If that is successful the code moves to the Canaries. We have three Canary PoPs spread across the world, and paying and non-paying customer traffic runs through them on the new code as a final check for errors.

Cloudflare software release process

Once successful in Canary the code is allowed to go live. The entire DOG, PIG, Canary, Global process can take hours or days to complete, depending on the type of code change. The diversity of Cloudflare's network and customers allows us to test code thoroughly before a release is pushed to all our customers globally. But, by design, the WAF doesn't use this process because of the need to respond rapidly to threats.

WAF Threats

In the last few years we have seen a dramatic increase in vulnerabilities in common applications. This has happened due to the increased availability of software testing tools, like fuzzing for example (we just posted a new blog on fuzzing here).

Source: https://cvedetails.com/

What is commonly seen is that a Proof of Concept (PoC) is created and often published on GitHub quickly, so that teams running and maintaining applications can test to make sure they have adequate protections. Because of this, it's imperative that Cloudflare is able to react as quickly as possible to new attacks to give our customers a chance to patch their software.

A great example of how Cloudflare proactively provided this protection was through the deployment of our protections against the SharePoint vulnerability in May (blog here). Within a short space of time from publicised announcements, we saw a huge spike in attempts to exploit our customers' SharePoint installations. Our team continuously monitors for new threats and writes rules to mitigate them on behalf of our customers.

The specific rule that caused last Tuesday's outage was targeting Cross-site scripting (XSS) attacks. These too have increased dramatically in recent years.

Source: https://cvedetails.com/

The standard procedure for a WAF Managed Rules change indicates that Continuous Integration (CI) tests must pass prior to a global deploy. That happened normally last Tuesday and the rules were deployed. At 13:31 an engineer on the team had merged a Pull Request containing the change after it was approved. At 13:37 TeamCity built the rules and ran the tests, giving it the green light. The WAF test suite tests that the core functionality of the WAF works and consists of a large collection of unit tests for individual matching functions. After the unit tests run, the individual WAF rules are tested by executing a huge collection of HTTP requests against the WAF. These HTTP requests are designed to test requests that should be blocked by the WAF (to make sure it catches attacks) and those that should be let through (to make sure it isn't over-blocking and creating false positives). What it didn't do was test for runaway CPU utilization by the WAF, and examining the log files from previous WAF builds shows that no increase in test suite run time was observed with the rule that would ultimately cause CPU exhaustion on our edge.

With the tests passing, TeamCity automatically began deploying the change at 13:42.
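To make that gap concrete, here is a minimal sketch of the kind of check the suite lacked: time each rule's pattern against a handful of pathological inputs and fail the build if any match exceeds a wall-clock budget. Everything here is hypothetical (the rule name, the inputs, the budget, the check_rule_cpu helper), and it uses Python's re module purely for illustration; the real suite drives a Lua/PCRE WAF.

    import re
    import time

    # Hypothetical rule table; the pattern is the simplified core of the bad rule.
    RULES = {
        "rule_xss_example": r".*.*=.*",
    }

    # Inputs chosen to provoke backtracking: with and without the literal "=".
    PATHOLOGICAL_INPUTS = ["x=" + "x" * n for n in (10, 50, 200)] + ["x" * 200]

    BUDGET_SECONDS = 0.01  # per rule, per input; tune to CI hardware

    def check_rule_cpu(name, pattern):
        compiled = re.compile(pattern)
        for payload in PATHOLOGICAL_INPUTS:
            start = time.perf_counter()
            compiled.search(payload)  # we only care how long this takes
            elapsed = time.perf_counter() - start
            if elapsed > BUDGET_SECONDS:
                raise AssertionError(
                    f"{name} spent {elapsed:.4f}s on a {len(payload)}-byte input"
                )

    for name, pattern in RULES.items():
        check_rule_cpu(name, pattern)
    print("all rules within CPU budget")

Run against the simplified bad pattern, this fails on the final input (200 x's with no = sign), which is exactly the signal the real test suite never produced.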
Quicksilver

Because WAF rules are required to address emergent threats, they are deployed using our Quicksilver distributed key-value (KV) store that can push changes globally in seconds. This technology is used by all our customers when making configuration changes in our dashboard or via the API and is the backbone of our service's ability to respond to changes very, very rapidly.

We haven't really talked about Quicksilver much. We previously used Kyoto Tycoon as a globally distributed key-value store, but we ran into operational issues with it and wrote our own KV store that is replicated across our more than 180 cities. Quicksilver is how we push changes to customer configuration, update WAF rules, and distribute JavaScript code written by customers using Cloudflare Workers.

From clicking a button in the dashboard or making an API call to change configuration to that change coming into effect takes seconds, globally. Customers have come to love this high speed configurability. And with Workers they expect near instant, global software deployment. On average Quicksilver distributes about 350 changes per second.

And Quicksilver is very fast. On average we hit a p99 of 2.29s for a change to be distributed to every machine worldwide. Usually, this speed is a great thing. It means that when you enable a feature or purge your cache you know that it'll be live globally nearly instantly. When you push code with Cloudflare Workers it's pushed out at the same speed. This is part of the promise of Cloudflare: fast updates when you need them.

However, in this case, that speed meant that a change to the rules went global in seconds. You may notice that the WAF code uses Lua. Cloudflare makes use of Lua extensively in production and details of the Lua in the WAF have been discussed before. The Lua WAF uses PCRE internally, and PCRE uses backtracking for matching and has no mechanism to protect against a runaway expression. More on that and what we're doing about it below.
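As a quick illustration of what a runaway-expression protection can look like (PCRE itself offers limits such as match_limit, and the long-term fix discussed below is an engine with run-time guarantees), even a blunt wall-clock bound stops a bad pattern from pinning a core. This is a sketch only, in Python rather than the WAF's Lua, with made-up inputs and a hypothetical bounded_search helper; it is not how our protection was implemented.

    import multiprocessing as mp
    import re

    def _search(pattern, text, result):
        result.put(bool(re.search(pattern, text)))

    def bounded_search(pattern, text, seconds=0.1):
        """Run a backtracking match, but never let it exceed a wall-clock budget.

        Returns True/False for a completed match, or None if the match was
        killed for running too long (the runaway-regex case).
        """
        result = mp.Queue()
        worker = mp.Process(target=_search, args=(pattern, text, result))
        worker.start()
        worker.join(seconds)
        if worker.is_alive():
            worker.terminate()  # the match ran away; stop burning CPU
            worker.join()
            return None
        return result.get()

    if __name__ == "__main__":
        print(bounded_search(r".*.*=.*", "x=x"))       # True: completes quickly
        print(bounded_search(r".*.*=.*", "x" * 2000))  # None: killed as runaway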
Everything that occurred up to the point the rules were deployed was done "correctly": a pull request was raised, it was approved, CI/CD built the code and tested it, a change request was submitted with an SOP detailing rollout and rollback, and the rollout was executed.

Cloudflare WAF deployment process

What went wrong

As noted, we deploy dozens of new rules to the WAF every week, and we have numerous systems in place to prevent any negative impact of that deployment. So when things do go wrong, it's generally the unlikely convergence of multiple causes. Getting to a single root cause, while satisfying, may obscure the reality. Here are the multiple vulnerabilities that converged to get to the point where Cloudflare's service for HTTP/HTTPS went offline.

1. An engineer wrote a regular expression that could easily backtrack enormously.
2. A protection that would have helped prevent excessive CPU use by a regular expression was removed by mistake during a refactoring of the WAF weeks prior, a refactoring that was part of making the WAF use less CPU.
3. The regular expression engine being used didn't have complexity guarantees.
4. The test suite didn't have a way of identifying excessive CPU consumption.
5. The SOP allowed a non-emergency rule change to go globally into production without a staged rollout.
6. The rollback plan required running the complete WAF build twice, taking too long.
7. The first alert for the global traffic drop took too long to fire.
8. We didn't update our status page quickly enough.
9. We had difficulty accessing our own systems because of the outage, and the bypass procedure wasn't well trained on.
10. SREs had lost access to some systems because their credentials had been timed out for security reasons.
11. Our customers were unable to access the Cloudflare Dashboard or API because they pass through the Cloudflare edge.

What's happened since last Tuesday

Firstly, we stopped all release work on the WAF completely and are doing the following:

1. Re-introduce the excessive CPU usage protection that got removed. (Done)
2. Manually inspect all 3,868 rules in the WAF Managed Rules to find and correct any other instances of possible excessive backtracking. (Inspection complete)
3. Introduce performance profiling for all rules to the test suite. (ETA: July 19)
4. Switch to either the re2 or Rust regex engine, which both have run-time guarantees. (ETA: July 31)
5. Change the SOP to do staged rollouts of rules in the same manner used for other software at Cloudflare, while retaining the ability to do emergency global deployment for active attacks.
6. Put in place an emergency ability to take the Cloudflare Dashboard and API off Cloudflare's edge.
7. Automate update of the Cloudflare Status page.

In the longer term we are moving away from the Lua WAF that I wrote years ago. We are porting the WAF to use the new firewall engine. This will both make the WAF faster and add yet another layer of protection.

Conclusion

This was an upsetting outage for our customers and for the team. We responded quickly to correct the situation, and we are correcting the process deficiencies that allowed the outage to occur and going deeper to protect against any further possible problems with the way we use regular expressions by replacing the underlying technology used.

We are ashamed of the outage and sorry for the impact on our customers. We believe the changes we've made mean such an outage will never recur.

Appendix: About Regular Expression Backtracking

To fully understand how (?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*))) caused CPU exhaustion you need to understand a little about how a standard regular expression engine works.

The critical part is .*(?:.*=.*). The (?: and matching ) are a non-capturing group (i.e. the expression inside the parentheses is grouped together as a single expression). For the purposes of the discussion of why this pattern causes CPU exhaustion we can safely ignore it and treat the pattern as .*.*=.*.
When reduced to this, the pattern obviously looks unnecessarily complex; but what's important is that any "real-world" expression (like the complex ones in our WAF rules) that asks the engine to "match anything followed by anything" can lead to catastrophic backtracking. Here's why.

In a regular expression, . means match a single character, and .* means match zero or more characters greedily (i.e. match as much as possible), so .*.*=.* means match zero or more characters, then match zero or more characters, then find a literal = sign, then match zero or more characters.

Consider the test string x=x. This will match the expression .*.*=.*. The .*.* before the equal can match the first x (one of the .* matches the x, the other matches zero characters). The .* after the = matches the final x.

It takes 23 steps for this match to happen. The first .* in .*.*=.* acts greedily and matches the entire x=x string. The engine moves on to consider the next .*. There are no more characters left to match so the second .* matches zero characters (that's allowed). Then the engine moves on to the =. As there are no characters left to match (the first .* having consumed all of x=x) the match fails.

At this point the regular expression engine backtracks. It returns to the first .* and matches it against x= (instead of x=x) and then moves on to the second .*. That .* matches the second x and now there are no more characters left to match. So when the engine tries to match the = in .*.*=.* the match fails. The engine backtracks again.

This time it backtracks so that the first .* is still matching x= but the second .* no longer matches x; it matches zero characters. The engine then moves on to try to find the literal = in the .*.*=.* pattern but it fails (because it was already matched against the first .*). The engine backtracks again.

This time the first .* matches just the first x. But the second .* acts greedily and matches =x. You can see what's coming. When it tries to match the literal = it fails and backtracks again.

The first .* still matches just the first x. Now the second .* matches just =. But, you guessed it, the engine can't match the literal = because the second .* matched it. So the engine backtracks again. Remember, this is all to match a three character string.

Finally, with the first .* matching just the first x and the second .* matching zero characters, the engine is able to match the literal = in the expression with the = in the string. It moves on and the final .* matches the final x.

23 steps to match x=x. Here's a short video of that using the Perl Regexp::Debugger showing the steps and backtracking as they occur.

That's a lot of work, but what happens if the string is changed from x=x to x=xx? This time it takes 33 steps to match. And if the input is x=xxx it takes 45. That's not linear. Here's a chart showing matching from x=x to x=xxxxxxxxxxxxxxxxxxxx (20 x's after the =). With 20 x's after the = the engine takes 555 steps to match! (Worse, if the x= was missing, so the string was just 20 x's, the engine would take 4,067 steps to find the pattern doesn't match).
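You can reproduce this growth yourself by instrumenting a toy matcher. The sketch below implements just enough of a backtracking engine (literals and greedy .*, anchored at both ends) to count matching steps. The count_steps helper is hypothetical; the absolute counts won't equal the Regexp::Debugger numbers above, since every engine counts "steps" differently, but the super-linear growth is the same.

    def count_steps(pattern, text):
        """Match `pattern` (literals and greedy '<char>*' only) against all
        of `text`, counting attempts the way a backtracking engine does."""
        steps = 0

        def attempt(p, t):
            nonlocal steps
            steps += 1
            if p == len(pattern):
                return t == len(text)  # pattern used up: match iff text is too
            if p + 1 < len(pattern) and pattern[p + 1] == '*':
                # Greedy star: consume as much as possible, then give back
                # one character at a time; that giving-back is backtracking.
                i = t
                while i < len(text) and pattern[p] in ('.', text[i]):
                    i += 1
                while i >= t:
                    if attempt(p + 2, i):
                        return True
                    i -= 1
                return False
            if t < len(text) and pattern[p] in ('.', text[t]):
                return attempt(p + 1, t + 1)
            return False

        matched = attempt(0, 0)
        return matched, steps

    for n in (1, 2, 3, 10, 20):
        matched, steps = count_steps('.*.*=.*', 'x=' + 'x' * n)
        print(f'n={n:2d} matched={matched} steps={steps}')

Plotting steps against n gives the same super-linear curve as the chart above.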
This video shows all the backtracking necessary to match x=xxxxxxxxxxxxxxxxxxxx:

That's bad, because as the input size goes up the match time goes up super-linearly. But things could have been even worse with a slightly different regular expression. Suppose it had been .*.*=.*; (i.e. there's a literal semicolon at the end of the pattern). This could easily have been written to try to match an expression like foo=bar;.

This time the backtracking would have been catastrophic. To match x=x takes 90 steps instead of 23. And the number of steps grows very quickly. Matching x= followed by 20 x's takes 5,353 steps. Here's the corresponding chart. Look carefully at the Y-axis values compared to the previous chart.

To complete the picture, here are all 5,353 steps of failing to match x=xxxxxxxxxxxxxxxxxxxx against .*.*=.*;

Using lazy rather than greedy matches helps control the amount of backtracking that occurs in this case. If the original expression is changed to .*?.*?=.*? then matching x=x takes 11 steps (instead of 23) and so does matching x=xxxxxxxxxxxxxxxxxxxx. That's because the ? after the .* instructs the engine to match the smallest number of characters first before moving on.

But laziness isn't the total solution to this backtracking behaviour. Changing the catastrophic example .*.*=.*; to .*?.*?=.*?; doesn't change its run time at all. x=x still takes 555 steps and x= followed by 20 x's still takes 5,353 steps.

The only real solution, short of fully re-writing the pattern to be more specific, is to move away from a regular expression engine with this backtracking mechanism. Which we are doing within the next few weeks.

The solution to this problem has been known since 1968, when Ken Thompson wrote a paper titled "Programming Techniques: Regular expression search algorithm". The paper describes a mechanism for converting a regular expression into an NFA (non-deterministic finite automaton) and then following the state transitions in the NFA using an algorithm that executes in time linear in the size of the string being matched against.

Thompson's paper doesn't actually talk about NFAs, but the linear time algorithm is clearly explained, and an ALGOL-60 program that generates assembly language code for the IBM 7094 is presented. The implementation may be arcane but the idea it presents is not.

Here's what the .*.*=.* regular expression would look like when diagrammed in a similar manner to the pictures in Thompson's paper.

Figure 0 has five states starting at 0. There are three loops which begin with the states 1, 2 and 3. These three loops correspond to the three .* in the regular expression. The three lozenges with dots in them match a single character. The lozenge with an = sign in it matches the literal = sign. State 4 is the ending state; if it is reached, then the regular expression has matched.

To see how such a state diagram can be used to match the regular expression .*.*=.* we'll examine matching the string x=x. The program starts in state 0 as shown in Figure 1.

The key to making this algorithm work is that the state machine is in multiple states at the same time. The NFA will take every transition it can, simultaneously.

Even before it reads any input, it immediately transitions to both states 1 and 2 as shown in Figure 2.

Looking at Figure 2 we can see what happens when it considers the first x in x=x. The x can match the top dot by transitioning from state 1 and back to state 1. Or the x can match the dot below it by transitioning from state 2 and back to state 2.

So after matching the first x in x=x the states are still 1 and 2. It's not possible to reach state 3 or 4 because a literal = sign is needed.

Next the algorithm considers the = in x=x.
Much like the x before it, it can be matched by either of the top two loops transitioning from state 1 to state 1 or state 2 to state 2, but additionally the literal = can be matched and the algorithm can transition state 2 to state 3 (and immediately state 4). That's illustrated in Figure 3.

Next the algorithm reaches the final x in x=x. From states 1 and 2 the same transitions are possible back to states 1 and 2. From state 3 the x can match the dot on the right and transition back to state 3.

At that point every character of x=x has been considered, and because state 4 has been reached the regular expression matches that string. Each character was processed once, so the algorithm was linear in the length of the input string. And no backtracking was needed.

It might also be obvious that once state 4 was reached (after x= was matched) the regular expression had matched and the algorithm could terminate without considering the final x at all.

This algorithm is linear in the size of its input.
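Here is a minimal sketch of that idea in code, using the same restricted pattern language as the earlier counting example (literals and .*). Instead of trying one alternative at a time and backtracking, it carries the whole set of reachable states forward one input character at a time, so the work per character is bounded by the (fixed) number of states rather than by the input length. The nfa_match helper is hypothetical, a simulation in the spirit of Thompson's construction, not production code or the exact five-state diagram above.

    def nfa_match(pattern, text):
        """Linear-time full match of a pattern built from literals and
        '<char>*', by simulating all NFA states at once."""
        # Compile to a token list; token index j is a state, and
        # len(tokens) is the accepting state.
        tokens = []
        i = 0
        while i < len(pattern):
            if i + 1 < len(pattern) and pattern[i + 1] == '*':
                tokens.append(('star', pattern[i]))
                i += 2
            else:
                tokens.append(('lit', pattern[i]))
                i += 1

        def closure(states):
            # A starred token may match zero characters, so from state j
            # we can also be in state j+1 without consuming input.
            out = set(states)
            frontier = list(states)
            while frontier:
                j = frontier.pop()
                if j < len(tokens) and tokens[j][0] == 'star' and j + 1 not in out:
                    out.add(j + 1)
                    frontier.append(j + 1)
            return out

        current = closure({0})
        for ch in text:
            nxt = set()
            for j in current:
                if j == len(tokens):
                    continue  # already at the accepting state
                kind, c = tokens[j]
                if c == '.' or c == ch:
                    nxt.add(j if kind == 'star' else j + 1)  # stars loop in place
            current = closure(nxt)
            if not current:
                return False  # no live states: cannot possibly match
        return len(tokens) in current

    # Each character is examined exactly once, whatever the pattern:
    print(nfa_match('.*.*=.*', 'x=x'))                  # True
    print(nfa_match('.*.*=.*;', 'x=' + 'x' * 100_000))  # False, still instant

Compare this with the count_steps sketch earlier, whose work grows super-linearly on the same adversarial inputs.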

The Network is the Computer: A Conversation with John Gage

To learn more about the origins of The Network is the Computer®, I spoke with John Gage, the creator of the phrase and the 21st employee of Sun Microsystems. John had a key role in shaping the vision of Sun and had a lot to share about his vision for the future. Listen to our conversation here and read the full transcript below.

[00:00:13]
John Graham-Cumming: I'm talking to John Gage, who was, what, the 21st employee of Sun Microsystems, which is what Wikipedia claims, and it also claims that you created this phrase "The Network is the Computer," and that's actually one of the things I want to talk about with you a little bit, because I remember when I was in Silicon Valley seeing that slogan plastered about the place and not quite understanding what it meant. So do you want to tell me what you meant by it, or what Sun meant by it at the time?

[00:00:40]
John Gage: Well, in 2019, recalling what it meant in 1982 or '83 will be colored by all our experience since then, but at the time it seemed so obvious that when we introduced the first scientific workstations, they were not very powerful computers. The first Suns had a giant screen and they were on the Internet, but they were designed as a complementary component to supercomputers. Bill Joy and I had a series of diagrams for talks we'd give, and Bill had the bi-modal, the two node picture. The serious computing occurred on the giant machines, where you could fly into the heart of a black hole, and the human interface was the workstation across the network. So each had to complement the other, each built on the strengths of the other, and each enhanced the other, because to deal in those days with a supercomputer was very ugly. And you could run all your very large computations on a Sun, because we had virtual memory and a series of such advanced things, but not fast. So the speed of scientific understanding is deeply affected by the tools the scientist has: is it a microscope, is it an optical telescope, is it a view into the heart of a star by running a simulation on a supercomputer? You need to have the loop with the human and the science constantly interacting and constantly modifying each other, and that's what the network is for, to tie those different nodes together in as seamless a way as possible. Then, the instant anyone that's ever created a programming language says, "so if I have to create a syntax of this where I'm trying to let you express, do this, how about the delay on the network, the latency?" Does your phrase "The Network is the Computer" really capture this hundreds, thousands, tens of thousands, millions perhaps at that time, now billions and billions and billions today, all these devices interacting and exchanging state, with latency, with delay? It's sort of an oversimplification, and that we would point out, but it's just "network is the computer": four words. You know, what we tried to do is give a metaphor that allows you to explore it in your mind and think of new things to do and be inspired.

[00:03:35]
Graham-Cumming: And then by a sort of strange sequence of events, that was a trademark of Sun. It got abandoned. And now Cloudflare has swooped in and trademarked it again. So now it's our trademark, which sort of brings us full circle, I suppose.

[00:03:51]
Gage: Well, trademarks are dealing with the real world, but the inspiration of Cloudflare is to do exactly what Bill Joy and I were talking about in 1982.
It's to build an environment in which every participant globally can share with security, and we were not as strong. Bill wrote most of the code of TCP/IP implemented by every other computer vendor, and still these questions of latency, these questions of distributed denial of service, which was, how do you block that? I was so happy to see that Cloudflare invests real money and real people in addressing those kinds of critical problems, which are, at the core, what will destroy the Internet.

[00:04:48]
Graham-Cumming: Yes, I agree. I mean, it is a significant investment to actually deal with it, and what I think people don't appreciate about the DDoS attack situation is that they are going on all the time and it's just continuous, you know, it just depends who the target is. It's funny you mentioned TCP/IP, because about 10 years after, so in about '92, in my first real job I had to write a TCP/IP stack for an obscure network card. And this was prior to the Internet really being available everywhere. And so I didn't realize I could go and get the BSD implementation and recompile it. So I did it from scratch from the RFCs.

[00:05:23]
Gage: You did!

[00:05:25]
Graham-Cumming: And the thing I recommend here is that nobody ever does that, because, you know, the real world, real code that really interacts, is really hard when you're trying to work it with other things, so.

[00:05:36]
Gage: Do you still, John, do you have that code?

[00:05:42]
Graham-Cumming: I wonder. I have the binary for it.

[00:05:46]
Gage: Do hunt for it, because our story was, at the time, DARPA, the Defense Advanced Research Projects Agency, had funded networking initiatives around the world. I just had a discussion yesterday with Norway, and they were one of the first entities to implement, using essentially Bill Joy's code, but to be placed on the ARPANET. And a challenge went out, and at that time the slightly older generation, the Bolt Beranek and Newman group, Vint Cerf, Bob Kahn, those names. Vint Cerf was a grad student at UCLA where he had built one of the four first Internet sites, and the DARPA offices were in Arlington, Virginia. They had massive investments in detection of nuclear underground tests, so seismological data, and the moment we made the very first Suns, I shipped them to DARPA, we got the network up and began serving seismic data globally. Really lovely visualization of events. If you're trying to detect something, those things go off and then there's a distinctive signature, a collapse of the underground cavern after. So DARPA had tried to implement, as you did, from the spec, from the RFC, the components, and Vint had designed a lot of this, all the acknowledgement codes and so forth that you had to implement in TCP/IP. So Bill was a graduate student at Berkeley, and we had a meeting in Arlington at DARPA headquarters where BBN and AT&T Bell Labs and a number of other people were in the room. Their code didn't work; this graduate student from Berkeley named Bill Joy, his code did work, and when Bob Kahn and Vint Cerf asked Bill, "Well, so how did you do it?" what he said was exactly what you just said: he said, "I just read the spec and wrote the code."

[00:08:12]
Graham-Cumming: I do remember very distinctly, because the company I was working at didn't have a TCP/IP stack and we didn't have any IP machines, right, we were doing actually stuff that was all IBM networking, SNA stuff. Somehow we bought what was at that point an HP machine, it was an Apollo workstation, and a Sun workstation.
I had them on Ethernet and talking to each other. And I do distinctly remember the first time a ping packet came back from that Sun box, saying, yes, I managed to send you an IP packet, you managed to send me an ICMP response, and that was pretty magical. And then I got to TCP and that was hard.

[00:08:55]
Gage: That was hard. Yeah. When you get down to the details, the spec can be wrong. I mean, it will want you to do something that's a stupid thing to do. So Bill has such good taste in these things. It would be interesting to do a kind of a diff across the various implementations of the stack. Years and years later we had maybe 50 companies all assemble in a room, only engineers, throw out all the marketing people and all the Ps and VPs, and every company in this room, IBM, Hewlett-Packard (oh my God, Hewlett-Packard, fix your TCP), and we just kept going until everybody could work with everybody else in sort of a pact. We're not going to reveal, Honeywell, that you guys were great with earlier absolute assembly code, determinate time control stuff, but you have no clue about how packets work; we'll help you, so that all of us can make every machine interoperate. Which yielded the network show, Interop. Every year we would go put a bunch of fiber inside whatever venue, you know, Geneva, or pick some, Las Vegas, some big venue.

[00:10:30]
Graham-Cumming: I used to go to Vegas all the time, and that was my great introduction to Vegas, going there for Interop, year after year.

[00:10:35]
Gage: Oh, you did! Oh, great.

[00:10:36]
Graham-Cumming: Yes, yes, yes.

[00:10:39]
Gage: You know, in a way, what you're doing with, for example, just last week with the Verizon problem, everybody implementing what you're doing now that is not open about their mistakes and what they've learned and is not sharing this, it's a problem. And your global presence to me is another absolutely critical thing. We had about, I forget, 600 engineers in Beijing at the East Gate of Tsinghua, a lot of networking expertise, and lots of those people are at Tencent and Huawei and those network providers throughout the rest of the world. Politics comes and goes, but the engineering has to be done in a way that protects us. And so these conversations globally are critical.

[00:11:33]
Graham-Cumming: Yes, that's one of the things that's fascinating actually about doing real things on the real Internet: there is a global community of people making computers talk to each other, and you know, it's a tremendously complicated thing to actually make that work, and you do it across countries, across languages. But you end up actually making them work, and that's the Internet we're sitting on, that you and I are talking on right now, that is based on those conversations around the world.

[00:12:01]
Gage: And only by doing it do you understand more deeply how to do it. It's very difficult in the abstract to say what should happen as we begin to spread. As Sun grew, every major city in Africa had installations, and for network access you were totally dependent on an often very corrupt national telco, or the complications of dealing with these people just to make your packets move. And as it turned out, many of the intelligence and military entities in all of these countries had very little understanding of any of this. That's changed to some degree. But the dangerous sides of the Internet: total surveillance, IPv6, complete control of exact identity of origins of packets. We implemented, let's see, you had an early Sun.
We probably completed our IPv6 implementation, was it still fluid, in the 90s. But I remember 10 years after we finished a complete implementation of IPv6, the U.S. was still IPv4, and it's still IPv4.

[00:13:25]
Graham-Cumming: It still is, it still is. Pretty much. Except for the mobile carriers right now. I think in general the mobile phone operators are the ones who've gone more into IPv6 than anybody else.

[00:13:37]
Gage: It was remarkable in China. We used to have a conference. We'd bring a thousand Chinese universities into a room. Professor Wu from Tsinghua, who built the Chinese Education and Research Network, CERNET, and now a thousand universities have a building on campus doing Internet research. We would get up and show this map of China, and he kept his head down politically, but he managed at the point when there was a big fight between the Minister of Telecom and the Minister of Railways. The Minister of Railways said, look, I have continuity throughout China because I have rail lines. I've just made a partnership with the People's Liberation Army, and they are essentially slave labor, and they're going to dig the ditches, and I'm going to run fiber alongside the railways, and I don't care what you, the Minister of Telecommunications, have to say about it, because I own the territory. And that created a separate pathway for the backbone IPv6 network in China. Cheap, cheap, cheap, get everybody doing things.

[00:14:45]
Graham-Cumming: Yes, now of course in China that's resulted in an interesting situation where you have China Telecom and China Unicom, who sort of cooperate with each other but they're almost rivals, which makes IP packets quite difficult to route inside China.

[00:14:58]
Gage: Yes, exactly. At one point I think we had four hunks of China. Everyone was geographically divided. You know, there were meetings going on; I remember the moment they merged the telecom ministry with the electronics ministry, and since we were working with both of them, I walk in a room and there's a third group, people I didn't know. It turns out that's the People's Liberation Army.

[00:15:32]
Graham-Cumming: Yes, they're part of the team. So okay, going back to this "Network is the Computer" notion. You were talking about the initial things that you were doing around that. Why is it that it's okay that Cloudflare has gone out and trademarked that phrase now? Because you seem to think that we've got a leg to stand on, I guess.

[00:15:56]
Gage: Frankly, I'd only vaguely heard of Cloudflare. I've been working in other areas. I've got a project in the middle of Nairobi, in the slum, where I've spent the last 15 years or so learning a lot about clean water and sewage treatment, because we have almost 400,000 people in a very small area, the biggest slum in East Africa. How can you introduce sanitary water and clean sewage treatment into an often corrupt, very difficult environment? That's been a fascination of mine and I've been spending a lot of time on it. What's a computer person know about fluid dynamics and pathogens? There's a lot to learn. So as you guys grew so rapidly, I vaguely knew of you, but until I started reading your blog, about post-quantum crypto and how do we devise a network that's resilient to these denial of service attacks, and all these areas where, for a growing company, it's very hard to take time to do serious advanced research-level work on distributed computing and distributed security, and yet you guys are doing it.
When Bill created Java, the subsequent step from Java, for billions and billions of devices to share resources and share computations, was something we called Jini, which is a framework for validation of who you are, movement of code from device to device in a secure way, total memory control so that someone is not capable of taking over memory in your device, as we've seen with Spectre and the failures of these billions of Intel chips out there that all have a flaw in their take-all-branches parallel compute implementations. So the very hardware you're using can be insecure, your operating systems are insecure, and yet you're trying to build infallible systems on top of fallible pieces. And you're in the middle of this, John, which I'm so impressed by.

[00:18:13]
Graham-Cumming: And Jini sort of lives on as what's called Apache River now. It moved away from Sun and into an Apache project.

[00:18:21]
Gage: Yes, very few people seem to realize that the name Apache is a poetic phrasing of "a patchy system." We patch everything because everything is broken. We moved a lot of it, Brian Behlendorf and the Apache group. Well, many of the innovations at Sun, Java is one, file systems that are far more secure and far more resilient than older file systems, the SPARC implementation. I think the SPARC processor, even though you're using the new ARM processors, but Fujitsu, I still think, keeps the SPARC architecture as the world's fastest microprocessor.

[00:19:16]
Graham-Cumming: Right. Yes. Being British, of course, ARM is a great British success. So I'm honor-bound to use that particular architecture. Clearly.

[00:19:25]
Gage: Oh, absolutely. And the power. That was the one always on the list of what our engineering goals were. We wanted to make, we were building supercomputers, we were building very large file servers for the telcos and the banks and the intelligence agencies and all these different people, but we always wanted to make a low power machine, and it just fell off the list of what you could accomplish. And the ARM chips, their ratios of wattage to packets treated (you have a great metric on your website someplace about measuring these things at a very low level), that's key.
[00:21:38]
Graham-Cumming: And we need that to be inexpensive and fast, because we're promising people that we will make their Internet properties faster and secure at the same time, and that's one of the interesting challenges, which is not trading those two things off. Which means your crypto had better be fast, for example, and that requires a lot of fiddling around at the hardware level and understanding it. In our case, because we're using Intel, really what Intel chips are doing at the low level.

[00:22:10]
Gage: Intel did implement a couple of things in one or another of the more recent chips that were very useful for crypto. We had a group of the SPARC engineers, probably 30, at a dinner five or six months ago, discussing, yes, we set the world standard for parallel execution branching optimizations for pipelines and chips, and when the overall design is not matched by an implementation that pays attention to protecting the memory, it's a fundamental, exploitable flaw. So a lot of discussion about this. Selecting precisely which instructions are the most important, the risk analysis, with the ability to make a chip specifically to implement a particular algorithm, there's a lot more to go. We have multiples of performance ahead of us for specific algorithms, based on a more fluid way to add instructions that are necessary into a specific piece of hardware. And then we jump to quantum. Oh my.

[00:23:32]
Graham-Cumming: Yes. To talk about that a little bit, the ever-increasing speed of processors and the things we can do: do you think we actually need that, given that we're now living in this incredibly distributed world where we are actually running very distributed algorithms? Do we really need beefier machines?

[00:23:49]
Gage: At this moment, in a way, it's you making fun of Bill Joy for only wanting a megabit in Aspen. When Steve Jobs started NeXT, sadly his hardware was just terrible, so we sent a group over to boost NeXT. In fact we sort of secretly slipped him $30 million to keep him afloat. And I'd say, "Jobs, if you really understood something about hardware, it would really be useful here." So one of the main team members that we sent over to NeXT came to live in Aspen and ended up networking the entire valley. At that point a megabit, for what you needed to do, seemed reasonable. So at this moment, as things become alive by the introduction of a little bit of intelligence in them, some little flickering chip that's able to execute an algorithm, many tasks don't require much. If you really want to factor things fast, quantum, quantum. Which will destroy our existing crypto systems. But if you are just bringing the billions of places where a little bit of knowledge can alter locally a little bit of performance, we could do very well with the compute power that we have right now. But making it live on the network, securely, that's the key part. The attacks that are going on, simple errors as you had yesterday, are simple errors. In a way, across Cloudflare's network, you're watching the challenges of the 21st century take place: attacks, obscure, unknown exploits of devices in the power and water control systems. And so, you are in exactly the right spot to not get much sleep and feel a heavy responsibility.

[00:26:20]
Graham-Cumming: Well, it certainly felt like it yesterday when we were offline for 27 minutes, and that's when we suddenly discovered, we sort of know how many customers we have, and then we really discover when they start phoning us.
Our support line had its own DDoS, basically, where it didn't work anymore because so many people signed in. But yes, I think it's interesting, your point that a little bit extra on a device somewhere can do something quite magical, and then you link it up to the network and you can do a lot. What we think is going on, partly, is some things around AI, where large amounts of machine learning are happening on big beefy machines, perhaps in the cloud, perhaps groups of machines, and then devices are doing their own little bits of inference or recognizing faces and stuff like that. And that seems to be an interesting future, where we have these devices that are actually intelligent in our pockets.

[00:27:17]
Gage: Oh, I think that's exactly right. There's so much power in your pocket. I'm spending a lot of time trying to catch up on that little bit of mathematics that you thought you understood so many years ago, and it turns out, oh my, I need a little bit of work here. And I've been reading Michael Jordan's papers and watching his talks, and he's the most cited computer scientist in machine learning, and he will always say, "Be very careful about the use of the phrase 'Artificial Intelligence'." Maybe it's a metaphor like "The Network is the Computer." But we're doing gradient descent optimization. Is the slope going up, or is the slope going down? That's not smart. It's useful, and real time language translation and a lot of incredible work can occur when you're doing phrases. There's a lot of great pattern work you can do, but he's out in space, essentially combining differentiation and integration in a form of integral. And off we go. Are your Hessians rippling in the wind? And what's the shape of this slope? And is this actually the fastest path from here to there, to constantly go downhill? Maybe it's sometimes going uphill and going over and then downhill that's faster. So there's just a huge amount of new mathematics coming in this territory, and each time, as we move from 2G to 3G to 4G to 5G, many people don't appreciate that the compression algorithms changed between 2G, 3G, 4G and 5G, and as a result so much more can move into your mobile device for the same amount of power. 10 or 20 times more for the same amount of power. And mathematics leads to insights and applications of it. And you have a working group in that area, I think. I tried to probe around to see if you're hiring.

[00:30:00]
Graham-Cumming: Well, you could always just come around and ask us, because we'll probably tell you, because we tend to be fairly transparent. But yes, I mean, compression is definitely an area where we are interested in doing things. One of the things I first worked on at Cloudflare was a thing that did differential compression, based on the insight that web pages don't actually change that much when you hit 'refresh'. And so it turns out that if you compress based on the delta from the last thing you served to someone, you can actually send many orders of magnitude less data, and so there's lots of interesting things you can do with that kind of insight to save a tremendous amount of bandwidth. And so yeah, definitely compression is interesting, crypto is interesting to us. We've actually open sourced some of our compression improvements in zlib, which was a very popular compression library, and now it's been picked up.
It turns out that in neuroscience, because there's a tremendous amount of data which needs compression, there are pipelines used in neuroscience where actually having better compression algorithms makes you work a lot faster. So it's fascinating to see the sort of overspill of things we're doing into other areas, where I know nothing about what goes on inside the brain.

[00:31:15]
Gage: Well, isn't that fascinating, John. I mean, here you are, the CTO of Cloudflare, working on a problem that deeply affects the Internet, enabling a lot more to move across the Internet in less time with less power, and suddenly it turns into a tool for brain modeling and neuroscientists. This is the benefit. There's a terrific initiative. I'm at Berkeley. The Jupyter notebooks created by Fernando Perez, this environment in which you can write text and code and share things. That environment, taken up by machine learning, I think is a major change. And the implementation of diagrams that are causal, these forms of analysis of what caused what, these are useful across every discipline, and for you to model traffic and see patterns emerge, and find webpages and see the delta has changed and then intelligently change the pattern of traffic in response to it, it's all pretty much the same thing here.

[00:32:53]
Graham-Cumming: Yes, and then as a mathematician, when I see things that are the same thing, I can't help wondering what the real deep structure is underneath. There must be another layer down or something. So as you know, it's this thing. There's some other deeper layer below all this stuff.

[00:33:12]
Gage: I think this is just endlessly fascinating. So my only recommendation to Cloudflare: first, double what you're doing. That's so hard, because as you go from 10 people to 100 people to 1,000 people to 10,000 people, it's a different world. You are a prime example; you are global. Suddenly you're able to deal with local authorities in 60-70 countries, and deal with some of the world's most interesting terrain, and with network connectivity and moving data, surveillance, and the security of the foundational infrastructure of all countries. You couldn't be engaged in more exciting things.

[00:34:10]
Graham-Cumming: It's true. I mean, one of the most interesting things to me is that I have grown up with the Internet, when, you know, I got an email address using actually the crazy JANET scheme in the UK where the DNS names were backwards. I was in Oxford and they gave me an email address and I think it was JGC at uk dot ac dot ox dot prg, and then at some point it flipped around, and DNS looked like it had won. For a long time my address was the wrong way around. I think that's a typically British decision, to be slightly different to everybody else.

[00:35:08]
Gage: Well, Oxford's always had that style, that we're going to do things differently. There's an Oxford center for the 21st century that was created by the money from a wonderful guy who had donated maybe $100 million. And they just branched out into every possible research area. But when you went to meetings, you would enter a building that was built at the time of the Raj. It was the India temple of colonialism.

[00:35:57]
Graham-Cumming: There's quite a few of those in the UK. Are you thinking of the Martin School? James Martin. And he gave a lot of money to Oxford. Well, the funny thing about that was the Programming Research Group.
The one thing they didn't really teach us as undergraduates was how to program, because that was seen as getting your hands dirty; you needed all the theory. So we learnt all the theory, did a little bit of functional programming, and that was the extent of it, which set me up really badly for a career in the industry. In my first job I had to pretend I knew how to program in C and learn very quickly.

[00:36:42]
Gage: Oh my. Well, now you've been writing code in Go.

[00:36:47]
Graham-Cumming: Yes. Well, the thing about Go, the other Oxford thing of course is Tony Hoare, who is a professor of computer science there. He had come up with this thing called CSP (Communicating Sequential Processes), so that was a whole theory around how you do parallel execution. And so of course everybody used his formalism, and I did in my doctoral thesis, and so when Go came along and they said, oh, this is how Go works, I said, well, clearly that's CSP and I know how to do this. So I can do it again.

[00:37:23]
Gage: Tony Hoare occasionally would issue a statement about something, and it was always a moment. So few people seem to realize the birth of so much of what we took in the 60s, 70s, 80s, in Silicon Valley and Berkeley, derived from the Manchester group, the virtual memory work, these innovations. Today, Whit Diffie. He used to love these Bletchley stories; they were so far advanced. That generation has died off.

[00:38:37]
Graham-Cumming: There's a very peculiar thing in computer science and the real application of computing, which is that we both somehow sit on this great knowledge of the past of computing, and at the same time we seem to willfully forget it and reinvent everything every few years. We go through these cycles where it's like, let's do centralized computing, now distributed computing. No, let's have desktop PCs, now let's have the cloud. We seem to have this collective amnesia, and then on occasion people go, "Oh, Leslie Lamport wrote this thing in 1976 about this problem." In what other subject do we willfully forget the past and then have to go and do archaeology to discover it again?

[00:39:17]
Gage: As a sociological phenomenon it means that the older crowd in a company are depressing, because they'll say, "Oh, we tried that and it didn't work." Over the years Sun grew from 15 people or so and ended up being like 45,000 people before we were sold off to Oracle, and then everybody dumped out because Oracle didn't know too much about computing. So Ivan Sutherland, Whit Diffie. Ivan actually stayed on; he may actually still have an Oracle email. Almost all of the research groups, certainly the chip group, went off to Intel, Fujitsu, Microsoft. It's funny to think now that Microsoft's run by a Sun person.

[00:40:19]
Graham-Cumming: Well, that's the same thing. Everyone's forgotten that Microsoft was the evil empire not that long ago. And so now it's not. Right now it's cool again.

[00:40:28]
Gage: Well, all of the embedded stuff from Microsoft is still that legacy of Bill Gates, who's now doing wonderful things with the Gates Foundation. But the embedded insecurity of the global networks is due, in large part, to the insecurities, that horrible engineering, of Microsoft embedded everywhere. You go anywhere in China to some old industrial facility and there is some old, not updated, junky PC running totally insecure software. And it's controlling the grid. It's discouraging. It's like a lot of the SCADA systems.
[00:41:14] Graham-Cumming: I'm completely terrified of SCADA systems.

[00:41:20] Gage: The simplest exploits. I mean, it's nothing even complicated. There are a series of emerging journalists today who are paying attention to cybersecurity, and people have come out with books even very recently. Well, now, because we're in this China, US, Iran nightmare, a United States presidential directive has taken the cybersecurity crowd and said, oops, now you're an offensive force. Which means we've got some 20-year-old lieutenant somewhere who suddenly might, just for fun, turn off Tehran's water supply or something. This is scary, because the SCADA systems are embedded everywhere, and they're, I don't know, would you say totally insecure? Just the simple things, just simple exploits. One of the journalists described, I guess it was the Russians, who took a bunch of small USB sticks and, at a shopping center near a military base, just gave them away. And people put them into their PCs inside SIPRNet, inside the secure U.S. Department of Defense network. Instantly the network was taken over, just by inserting a USB device into something on the net. And there you are, John, protecting against this.

[00:43:00] Graham-Cumming: Trying hard to protect against these things, yes, absolutely. It's very interesting, because you mentioned before how rapidly Cloudflare has grown over the last few years. And of course Sun also really got going pretty rapidly, didn't it?

[00:43:00] Gage: Well, yes. The first year we were just some students from Berkeley, hardware from Stanford (Andy Bechtolsheim), software from Berkeley (Berkeley Unix BSD, Bill Joy). Combine the two, and 10 of us or so, and I think the first year was 12 million booked, the second year was 50 or 60 million booked, and the third year was 150 or so million booked, and then we hit 500 million, and then we hit a billion. Now, it's selling boxes; we were a manufacturing company, so that's different from software or services. But we also needed lots of people, and so we instantly raided the immense variety of people in the San Francisco Bay Area, with Berkeley and Stanford. We had students in computer science, and mechanical engineering, and physics, and mathematics from every country in the world, and we recruited from every country in the world. So a great part of Sun's growth came, as yours is coming, from expanding internationally, and at one point I think we ran most of the telcos of the world. We ran China Mobile: 900 million subscribers on China Mobile, all Sun stuff in the back. Throughout Africa, every telco was running Sun and Cisco, until Huawei knocked Cisco out. It was an amazing time.

[00:44:55] Graham-Cumming: You ran the machine that ran LaTeX, that let me get my doctoral thesis done.

[00:45:01] Gage: You know, that's how I got into it, actually. I was in econometrics and mathematics at Berkeley, and I walk down a hallway, and outside a room was that funny smell from photographic paper, and there was perfectly typeset mathematics. Troff and nroff, all those old UNIX utilities for the Bell Technical Journal. And I open the door and I've got to get in there. There are two hundred people sitting in front of these beehive-like little terminals, all typing away on a UNIX system. And I want to get an account, and I walk down the hall, and there's this skinny guy who types about 200 words a minute named Bill Joy. And I said, I need an account, I've got to typeset integral signs, and he said, what's your name.
I tell him my name, John Gage, and he goes voop, and I've never seen anybody type as fast as him in my life. This is a new world, here.

[00:45:58] Graham-Cumming: So he was rude then?

[00:46:01] Gage: Yeah he was, he was. Well, it's interesting, because the arrival of that device at Berkeley complemented the arrival of an MIT professor who had implemented, in LISP, not the typesetting of mathematics but actual mathematics: Macsyma. To get Professor Fateman, the Macsyma god from MIT, to come to Berkeley and live in a UNIX environment, we had to put a LISP up on the PDP. So Bill took that machine, which had virtual memory, and implemented the environment for significant computational mathematics. And Steve Wolfram took that to Caltech, and the Princeton Institute for Advanced Study, and now we have Mathematica. So in a way, all of Sun and the UNIX world derived from attempting to do executable mathematics.

[00:47:17] Graham-Cumming: Which in some ways is what computers are doing. I think one of the things that people don't really appreciate is the extent to which it's all numbers underneath.

[00:47:28] Gage: Well, that's just this discrete versus continuous problem that Michael Jordan is attempting to address. To my current total puzzlement and complete ignorance: what in the world is symplectic integration? And how do Lyapunov functions work? Oh, no clue.

[00:47:50] Graham-Cumming: Are we going to do a second podcast on that? Are you going to come back and teach us?

[00:47:55] Gage: Try it. We're on, you're on, you're on. Absolutely. But you've got to run a company.

[00:48:00] Graham-Cumming: Well, I've got some things to do, yeah. But you can go do that and come tell us about it.

[00:48:05] Gage: All right. Great, John. Well, it was terrific to talk to you.

[00:48:08] Graham-Cumming: It was wonderful speaking to you as well. Thank you for helping me dig up memories of when I was first fooling around with Sun systems and, you know, some of the early days, and of course "The Network is the Computer." I'm not sure I fully yet understand quite the metaphor, or even if maybe I do somehow deeply in my soul get it, but we're going to try and make it a reality, whatever it is.

[00:48:30] Gage: Well, I count it as a complete success, because you count as one of our successes: because you're doing what you're doing, the phrase "The Network is the Computer" resides in your brain, and when you get up in the morning and decide what to do, a little bit of it nudges you toward making the network work.

[00:48:51] Graham-Cumming: I think that's probably true. And there's the dog; the dog is saying we've been yakking for an hour and now we'd better stop. So listen, thank you so much for taking the time. It was wonderful talking to you. You have a good day. Thank you very much.

Interested in hearing more?
Listen to my conversations with Ray Rothrock and Greg Papadopoulos of Sun Microsystems:

Ray Rothrock
Greg Papadopoulos

To learn more about Cloudflare Workers, check out the use cases below:

Optimizely - Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
Cordial - Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
AO.com - AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
Pwned Passwords - Troy Hunt’s popular "Have I Been Pwned" project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
Timely - Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
Quintype - Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer: A Conversation with Ray Rothrock

Last week I spoke with Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to discuss his time at Sun and how the Internet has evolved. In this conversation, Ray discusses the importance of trust as a principle, the growth of Sun in sales and marketing, and that time he gave Vice President Bush a Sun demo. Listen to our conversation here and read the full transcript below.

[00:00:07] John Graham-Cumming: Here I am, very lucky to get to talk with Ray Rothrock, who was I think one of the first investors in Cloudflare, a Series A investor, and got the company a little bit of money to get going. But if we dial back a few years earlier than that, he was also at Sun as the Director of CAD/CAM Marketing. There is a link between Sun and Cloudflare. At least one, but probably more than one, which is that Cloudflare has recently trademarked "The Network is the Computer". And that was a Sun trademark, wasn't it?

[00:00:43] Ray Rothrock: It was, yes.

[00:00:46] Graham-Cumming: I talked to John Gage and I asked him about this as well, and I asked him to explain to me what it meant. And I'm going to ask you the same thing, because I remember walking around the Valley thinking, that sounds cool; I'm not sure I totally understand it. So perhaps you can tell me: was I right that it was cool, and what does it mean?

[00:01:06] Rothrock: Well, it certainly was cool, and it was extraordinarily unique at the time. Just some quick background. In those early days when I was there, the whole concept of networking computers was brand new. Our competitor Apollo had a proprietary network, but Sun chose to go with TCP/IP, which was a standard at the time, but a brand new standard that very few people knew about. So when we started connecting computers and doing some intensive computing, which is what I was responsible for (CAD/CAM in those days was extremely intensive, whether it was electrical CAD/CAM, or mechanical CAD/CAM, or even simulation and solid design modeling and things), having a little extra power from other computers was a big deal. And so this concept of "The Network is the Computer" essentially said that you had one window into the network through your desktop computer in those days (there was no mobile computing at that time; this was like '84, '85, '86, I think), and so if you had the appropriate software you could use other people's computers for CPU power, and you could do very hard problems that a single computer could not do, because you could offload some of that CPU to the other computers. Now, that was very nerdy, very engineering intensive, and not many people did it. We'd go to SIGGRAPH, which was a huge graphics show in those days, and we would demonstrate ten Sun computers, for example, doing some graphic rendering of a 3D wireframe that had been created in CAD/CAM software of some sort. And it was hard, and that was on the mechanical side. On the electrical side, Berkeley had some software called Magic; it's still around, and its concepts have been incorporated into a lot of very popular EDA software. But to imagine calculating the paths in a very complicated PCB or a very complicated chip: one computer couldn't do it, but Sun had the fundamental technology. So from my seat at Sun at the time, I had access to what could be infinite computing power, even though I had a single application running, and that was a big selling point when I was trying to convince EDA and MDA companies to put their software on the Sun. That was my job.
[00:03:38] Graham-Cumming: And hearing it now, it doesn't sound very revolutionary, because of course we're all doing that now. I mean, I get my phone out of my pocket and connect to goodness knows what computing power, which does image recognition and spots faces, and I can do all sorts of things. But walk me through what it felt like at the time.

[00:03:56] Rothrock: Just doing a Google search, I mean, how many data stores are being spun up for that? At the time it was incredible, because you could actually do side by side comparisons. We created some demonstrations where one computer might take ten hours to do a calculation, two computers might take three hours, five computers might take 30 minutes. (A toy sketch of this fan-out idea appears after the transcript.) So with this demo, you could turn on computers and we would go out on the TCP/IP network to look for an available CPU that could give me some time. Let's go back even further. Probably 15 years before that, we had time sharing. So you had a terminal into a big mainframe and did all this swapping in and out of stuff to give you a time slice of computing. We were doing the exact same thing, except we were CPU slicing, not just time slicing. That's pretty nerdy, but that's what we did. And I had to work with the engineering department, with all these great engineers in those days, to make this work for a demo. It was so unique, you know, their eyes would get big. You remember Novell...

[00:05:37] Graham-Cumming: I was literally just thinking about Novell, because I actually worked on IPX and SPX networking stuff at the time. I was going to ask you, actually: to what extent do you think TCP/IP was a very important part of this revolution?

[00:05:55] Rothrock: It was huge. It was fundamentally huge because it was a standard, so it was available, and if you implemented it, you didn't have to pay for it. When Bob Metcalfe did Ethernet, it was on top of the TCP stack. Sun, in my memory, and I could be wrong, was the first company to put a TCP/IP stack on the computer. And so you just plugged an RJ45 in the back, into this TCP/IP network with a switch or a router on it, and you were golden. They made it so simple and so cheap that you just did it. And of course if you give an engineer that kind of freedom, it opens things up. By the way, as the marketing guy at Sun, this was my first non-engineering job. I came from a very technical world of nuclear physics into Sun. And so it was stunning, just stunning.

[00:06:59] Graham-Cumming: It's interesting that you mentioned Novell, and you mentioned Apollo before that, and obviously IBM had SNA networking, and there were attempts to do all those networking things. It's interesting that these open standards have really enabled the explosion of everything else we've seen, and of everything that's going on in the Internet.

[00:07:23] Rothrock: Sun was open, so to speak, but this concept of open source now just dominates the conversation. As a venture capitalist, every deal I ever invested in had open source of some sort in it. There was a while when it was very problematic in an M&A event, but the world's gotten used to it. So open is very powerful. It's like freedom. It's like liberty. Like today, July 4th, it's a big deal.

[00:07:52] Graham-Cumming: Yes, absolutely. It's just interesting to see it explode today, because I spent a lot of my career looking at so many different networking protocols.
The thing that really surprises me, or perhaps shouldn't surprise me when you've got these open things, is that you harness so many people's intelligence that you just end up with something that's just better. It seems simple.

[00:08:15] Rothrock: It seems simple. I think part of the magic of Sun is that they made it easy. Easy is the most powerful thing you can do in computing. Computing can be so nerdy and so difficult. But if you just make it easy... and Cloudflare has done a great job with that; they did it with their DNS service, they did it with all the stuff we worked on back when I was on the board and actively involved in the company. You've got to make it easy. I mean, I remember when Matthew and Lee worked like 20 hours a day on how to switch your DNS from whoever your provider was to Cloudflare. That was supposed to be one click, done. A to B. And that DNA was part of the magic, and to me at least, Sun did it that way as well. So it's huge, a huge lift.

[00:09:08] Graham-Cumming: It's funny you talk about that, because at the time, how that actually worked is that we just asked people to give us their username and password, and we logged in and did it for them. Early on, Matthew asked me if I'd be interested in joining Cloudflare when it was brand new, and because of other reasons I'd moved back to the UK and I wasn't ready to change jobs; I'd just taken another job. And I remember thinking, this thing is crazy, this Cloudflare thing. Who's going to hand over their DNS and their traffic to these four or five people above a nail salon in Palo Alto? And Matthew's response was, "They're giving us their passwords, let alone their traffic." Because they were so desperate for it.

[00:09:54] Rothrock: It tells you a lot about Matthew. As an attorney, he was very sensitive to that, and believes that one of the founding principles is trust. His view was that if I ever lose the customer's trust, Cloudflare is toast. And so everything focused around that key value. And he was right.

[00:10:18] Graham-Cumming: And you must have, at Sun, been involved with some high performance computing things that involved sensitive customers doing cryptography and things like that. So again, trust is another theme that runs through there as well.

[00:10:33] Rothrock: Yeah, very true. As the marketing guy for CAD/CAM, I was in the field two-thirds of the time, showing customers what was possible. My job was to get third party software onto the Sun box and then to turn that into a presentation to a customer. So I visited many government customers, many aerospace, power, all these very highfalutin, behind-the-firewall kinds of guys in those days. So yes, trust was huge. It would come up: "Okay, so I'm using your CPU; how is it that you can't use mine? And how do you convince me that you've not violated something?" In those days it was a whole different conversation than it is today, but it was nonetheless just as important. In fact, I remember I spent quite a bit of time at NCSA at the University of Illinois Urbana-Champaign. Larry Smarr was the head of NCSA, and we spent a lot of time with Larry. I think John was there with me, John Gage, and Vinod, and some others, but it was a big deal talking about high performance computing, because that's what they were doing, and doing it with Sun.

[00:11:50] Graham-Cumming: So just to dial forward, you're at Venrock and you decide to invest in Cloudflare.
What was it that made you think that this was worth investing in? Presumably you saw some things that were in some of Sun's vision, because Sun had very wide-ranging visions about what was going to be possible with computing.

[00:12:11] Rothrock: Yeah, let me touch on a few points. Sun was certainly the first computer company I worked for after I got out of the nuclear business, and the philosophy of the company was very powerful. Not only did we have this cool 19-inch black and white machine, essentially a giant Macintosh although the Mac wasn't even born yet, but it had this ease of use that was powerful, and it was open; we preached that all the time, and we made it possible. And Cloudflare, the related philosophy of Matthew and Michelle's genius, was that they wanted to make security and distribution of data as free and easy as possible for the long tail. That was the first thinking, because you didn't have access if you were in the long tail; you were a small company, and you were just going to get whipped around by the big boys. And so there was a bit of, "We're here to help you, we're going to do it." It's a good thing that the long tail gets mobilized, if you will, or emboldened to use the Internet like the big boys do. And that was part of the attractiveness. I didn't say, "Boy, Matthew, this sounds like Sun," but the concept of open and liberating, which is what they were trying to do with this long tail DNS and CDN stuff, was very compelling and seemed easy. But nothing ever is. But they made it look easy.

[00:13:52] Graham-Cumming: Yeah, it never is. One of the parallels that I've noticed is that early on at Sun, a lot of Sun equipment went to companies that later became big companies. So some of these small firms that were using crazy workstations ended up becoming some of the big names in the Valley. To your point about the long tail, they were being ignored and couldn't buy from IBM even if they wanted to.

[00:14:25] Rothrock: They couldn't afford SNA and they couldn't do lots of things. So Sun was an enabler for these companies with cool ideas for products and software, using Sun as the underpinning. Workstations were all the rage, because PCs were very limited in those days. Very, very limited; they were all Intel based. Sun was 68000-based originally, a cheap microprocessor from Motorola, and then later it was their own stuff, SPARC.

[00:15:04] Graham-Cumming: What was the growth like at Sun? Because it was very fast, right?

[00:15:09] Rothrock: Oh yes, it was extraordinarily fast. I think I was employee 130 or something like that. I left Sun in 1986 to go to business school and they gave me a leave of absence. Carol Bartz was my boss at that moment. The company was at like 2,000 people just two and a half years later. So it was growing like a weed. I measured my success by how thick the Catalyst (that was the name of our catalog and our program) was, and how quickly I could add bona fide software developers to it. We published on one sheet of paper, front to back. When I first got there, our Catalyst catalog was a sheet of paper, and when I left, it was a book about three-quarters of an inch thick. My group grew from me to 30 people in about a year and a half. It was extraordinary growth. We went public during that time, had a lot of capital and a lot of buzz. And there was that openness: our competition was all proprietary, just like you were citing there, John. IBM and Apollo were all proprietary networks.
You could buy a NIC card and stick it into your PC and talk to a Sun, and vice versa. And you couldn't do that with IBM or Apollo. Do you remember those?

[00:16:48] Graham-Cumming: I do, because, as I was telling John Gage, in my first job out of college I wrote a TCP/IP stack from scratch for a manufacturer of network cards. The test of this stack was that I had an HP Apollo box and a Sun workstation, and there was a sort of magic: can I talk to these devices? Can I ping them? The first ping going across the network was already magical. And then, can I telnet to one of these? So, you know, getting the networking actually running was sort of the key thing. How important was networking for Sun in the early days? Was it always there?

[00:17:35] Rothrock: Yeah, it was there from the beginning, the idea of having a network capability. When I got there it was networked; the machine wasn't standalone at all. We sort of mimicked the mainframe world, where we had green screens hooked into a Sun in a department, for example, and there was time sharing. But as soon as you got a Sun on your desk, which was rare because we were shipping as many as we could build, it was fantastic. I was sharing information with engineering and we were working back and forth on stuff. But I think it was fundamental: you have a microprocessor, you've got a big screen, you've got a graphical UI, and you have a network that hooks into the greater universe. In those days, to send an all-Sun email around the world, modems spun up everywhere. The network wasn't what it is now.

[00:18:35] Graham-Cumming: I remember, in about '89, I was at a conference and Whit Diffie was there. I asked him what he was doing. He was in a little computer room; I was trying to typeset something. And he said, "I'm telnetting into a machine which is in San Diego." It was the first time I'd seen this, and I stepped over and he was like, "look at this." And he's hitting the keyboard and the keys are getting echoed back. And I thought, oh my goodness, this is incredible. It's right across the Atlantic and across the country as well.

[00:19:10] Rothrock: I think, and this is just me talking, having lived the last years with all the investing and stuff I did, that the TCP/IP standard enabled the Internet to come about. You may recall that Microsoft tried to modify the TCP/IP stack slightly, and the world rejected it, because it was just too powerful, too pervasive. And then along came HTTP and all the other protocols that followed. Telnetting, FTPing, all that file transfer stuff: we were doing that left, right, and center back in the 80s. I mean, Cloudflare just took all this stuff and made it better, easier, and literally lower friction. That was the core investment thesis at the time, and it just exploded, much like when Sun adopted TCP/IP. You were there when it happened. My little company that I'm the CEO of now, we use Cloudflare services. The first thing I did when I got there was switch to Cloudflare.

[00:20:18] Graham-Cumming: And that was one of the things when I joined: we really wanted to get people to a point where, if you're putting something on the web, you just say, well, I'm going to put Cloudflare or a thing like Cloudflare on it. Because it protects it, it makes it faster, etc. And of course now what we've done is we've given people compute facility. Right now you can write code and run it in our machines worldwide, which is another whole thing.
[00:20:43] Rothrock: And that is "The Network is the Computer". The other thing that Sun was pitching then was the paperless office. I remember we had posters of paper flying out of a computer window on a Sun workstation, and I don't think we've gotten there yet. But certainly, the network is the computer.

[00:21:04] Graham-Cumming: It was probably the case that the paperless office was one of those things that was about to happen for quite a long time.

[00:21:14] Rothrock: It's still about to happen, if you ask me. I think e-commerce and the digital transformation have driven it harder than just networking. You know, the fact that we can now sign legal documents over the Internet without paper, and things like that. People had to adopt. People have to trust. People have to adopt these standards and accept them. And lo and behold, we are, because we made it easy, we made it cheap, and we made it trustworthy.

[00:21:42] Graham-Cumming: If you dial back through Sun, what was the hardest thing? I'm asking because I'm at a 1,000-person company and it feels hard some days, so I'm curious. What do I need to start worrying about?

[00:22:03] Rothrock: Well, yeah, at 1,000 people, I think that's when John came into the company and sort of organized marketing. I would say: holding engineering to schedules. That was hard, because we were pushing the envelope. Our graphics were going from black and white to color, and with the networking stuff, the performance of all the chips on the boards, just the performance, was a big deal. And I remember, for me personally, I would go to a trade show. I'd go to Boston, to the Association of Mechanical Engineers, with the team there, and we would show up with these workstations, and of course the engineers want to show off the latest. So I would be bringing with me tapes that we had of the latest operating system. But getting the engineers to be ready for a trade show was very hard, because they were always experimenting. I don't believe the words "code freeze" meant much to them, frankly, but we would be downloading the software and building a trade show thing that had to run for three days on the latest and greatest, and we knew our competitor would be there, right across the aisle from us, showing their hot stuff. And working with Eric Schmidt in those days, you know: Eric, you've just got to be done on this date. But trade shows were wonderful. They focused the company's endpoints, if you will. And marketing and sales drove Sun; Scott McNealy's culture there was big on that. But we had to show. It's different today than it was then. I don't know about the Cloudflare competition, but back then there were a dozen workstation companies, and we were fighting for mindshare and market share every day. So you didn't dare leave your best jewels at home. You brought them with you. I will give John Gage high, high marks: he showed me how to dance through a reboot in case the code crashed. He's marvelous, and I learned how to work that stuff and survive.

[00:24:25] Rothrock: Can I tell you one sort of sales story?

[00:24:28] Graham-Cumming: Yes, I'm very interested in hearing the non-technical stories. As an engineer, I can hear engineering stories all the time, but I'm curious what it was like being in sales and marketing in such an engineering-heavy company as Sun.

[00:24:48] Rothrock: Yeah. Well, it was challenging, of course. One of the strategies that Sun had in those days was to get anyone who was building their own computer.
This was Computervision and Data General and all those guys: to get them to adopt the Sun as their hardware platform, and then they could put whatever they wanted on top. Because I was one of the demo gods, my job was to go along with the sales guys when they wanted to try to convince somebody. So one of the companies we went after was Data General (DG) in Massachusetts. I worked for weeks on getting this whole demo suite running: MDA, EDA, word processing, I had everything. And this was a big, big, big deal, I mean like hundreds of millions of dollars of revenue. And so I went out a couple of days early, and we were going to put up a bunch of Suns, and I had a demo room at DG. So all the gear showed up, and I got there at like 5:30 in the morning and started downloading everything, downloading software, making it dance. And at about 8:00 a.m. the CEO of Data General walks in. I didn't know who he was, but it turned out to be Ed de Castro. He introduces himself, and I didn't know who he was, and he said, "What are you doing?" And I explained, "I'm from Sun, I'm getting ready for a big demo. We've got a big executive presentation. Mr. McNealy will be here shortly, etc." And he said, "Well, show me what you've got." So, still in the middle of downloading this software, I start making this thing dance. I've got these machines talking to each other and showing all kinds of cool stuff. And he left. The meeting was about 10 or 11 in the morning, and so when the executive team from Sun showed up they said, "Well, how's it going?" I said, "Well, I gave a demo to a guy," and they asked, "Who's the guy?" and I said, "It was Ed de Castro." And they went, "Oh my God, that was the CEO." Well, we got the deal. I thought Ed had a little tactic there: come in early, see what he could see, maybe get the true skinny on this thing and see what was real. I carried the day. But anyway, I got a nice little bonus for that. But Vinod and I would drop into Lockheed down in Southern California. They wanted to put Suns on P-3 airplanes, and we'd go down there with an engineer and we'd figure out how to make it work. Those were just incredible times. You may remember, back in the 80s everyone dressed up, except on Fridays; it was dress-down Fridays. And one day I dressed down, and Carol Bartz, my boss, saw me wearing blue jeans and just an open-collared shirt, and she said, "Rothrock, you go home and put on a suit! You never know when a customer is going to walk in the front door." She was quite right. Kodak shows up. Kodak made a big investment in Sun when it was still private, and I gave that demo. And then AT&T, and then, interestingly, Vice President Bush, back in the Reagan administration, came to Sun to see the manufacturing, and I gave the demo to the Vice President with Scott and Andy and Bill and Vinod standing there.

[00:28:15] Graham-Cumming: Do you remember what he saw?

[00:28:18] Rothrock: It was my standard two-minute Sun demo that I can give in my sleep. We were on the manufacturing floor. We picked up a machine and I created a demo for it, and my executive team was there. We have a picture of it somewhere, but it was fun. As John Gage would say, "Ray, your job is to make the computer dance." So I did.

[00:28:44] Graham-Cumming: One of the other things I wanted to ask you about: at some point Sun was almost Amazon Web Services, wasn't it? There was a rent-a-computer service, right?

[00:28:53] Rothrock: I don't know. I don't remember the rent-a-computer service.
I remember we went after the PC business aggressively, and went after the data centers, which were brand new in those days, pretty aggressively, but I don't remember the rent-a-computer business that much. It wasn't in my domain.

[00:29:14] Graham-Cumming: So what are you up to these days?

[00:29:18] Rothrock: I'm still investing. I do a lot of security investing. I did 15 deals while I was at Venrock; Cloudflare was the last one I did, which turned out really well, of course. More to come, I hope. And I'm CEO of one of Venrock's portfolio companies that had a little trouble a few years back, but I fixed that and it's moving up nicely now. But I've started thinking about more of a science base. I'm on the board of the Carnegie Institution for Science. I'm on the board of MIT, and I just joined the board of the Nuclear Threat Initiative in Washington, which is run by Secretary Ernie Moniz, former Secretary of Energy. So I'm doing stuff like that. John would be pleased with how well that played through. But I'll tell you, it is these fundamental principles, just tying it all back to Sun and Cloudflare: this open, cheap, easy enabling of humans to do things without too much friction. That is exciting. I mean, look at your phone. Steve Jobs was the master of design, to make this thing as sweet as it is.

[00:30:37] Graham-Cumming: Yes, and as addictive.

[00:30:39] Rothrock: Absolutely, right. I haven't been to a presentation from Cloudflare in two years, but every time I see an announcement like the DNS service, I immediately switch all my DNS here at the house to 1.1.1.1, stuff like that. Because I know it's good and I know it's trustworthy, and it's got that philosophy built into the DNA.

[00:31:09] Graham-Cumming: Yes, definitely. Taking it back to what we talked about at the beginning, trustworthiness is definitely something that Cloudflare has cared about from the beginning and continues to care about. We're sort of the guardians of the traffic that passes through us.

[00:31:25] Rothrock: Back when the Internet started happening, and when Sun was doing Java, I mean, all those things in the 90s, I was of course at Venrock, but I was still pretty connected to [Edward] Zander and [Scott] McNealy. We were hoping that it would be liberating, that it would create a world which was much more free and open to conversation, and we've seen the dark side of some of that. But I continue to believe that transparency and openness is a good thing, and we should never shut it down. I don't mean to get all waxing philosophical here, but way more good comes from being open and transparent than bad.

[00:32:07] Graham-Cumming: Listen, it's July 4th. It's evening here in London. We can be waxing philosophical as much as we like. Well, listen, thank you for taking the time to chat with me. Are there any other reminiscences of Sun that you think the public needs to know in this oral history of "The Network is the Computer"?

[00:32:28] Rothrock: Well, you know, the only thing I'd say is, having landed in Silicon Valley in 1981 and getting on with Sun, I can say this, given my age and longevity here: everything is built on somebody else's great ideas. Starting with TCP/IP, and then we went to HTML and browsers, it's just layer on layer on layer, and so Cloudflare is just one of the latest to climb onto the shoulders of those giants who put it all together. I mean, we don't even think about the physical network anymore.
But it is there, and thank goodness companies like Cloudflare keep providing that fundamental service on which we can build interesting, cool, exciting, and mind-changing things. And without a Cloudflare, without Sun, without Apollo, without all those guys back in the day, it would be different. The world would just be so, so different. I do the New York Times crossword puzzle, and I could not do it without Google, because I have access to information I would not otherwise have unless I went to the library. It's exponential, and it just gets better. Thanks to Michelle and Matthew and Lee for starting Cloudflare and allowing Venrock to invest in it.

[00:34:01] Graham-Cumming: Well, thank you for being an investor. I mean, it helped us get off the ground and get things moving. I very much agree with you about standing on the shoulders of giants, because people don't appreciate the extent to which so much of this fundamental work was done in the 70s and 80s.

[00:34:19] Rothrock: Yeah, it's just like the automobile and the airplane. We reminisce about the history, but boy, there were a lot of giants in those industries as well. And computing is just the latest.

[00:34:32] Graham-Cumming: Yep, absolutely. Well, Ray, thank you. Have a good afternoon.

Interested in hearing more? Listen to my conversations with John Gage and Greg Papadopoulos of Sun Microsystems:

John Gage
Greg Papadopoulos

To learn more about Cloudflare Workers, check out the use cases below:

Optimizely - Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
Cordial - Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
AO.com - AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
Pwned Passwords - Troy Hunt’s popular "Have I Been Pwned" project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
Timely - Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
Quintype - Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.
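The CPU-slicing demo Rothrock describes above (one computer taking ten hours, five taking 30 minutes) can be sketched in modern terms as a simple fan-out over HTTP. This is a toy TypeScript sketch under stated assumptions: the host names and their /compute endpoint are hypothetical, invented purely for illustration, and nothing here reflects how Sun's demos actually worked.

```typescript
// A toy sketch of the fan-out idea described above: split one big job
// across whatever machines on the network will give you cycles.
// Hypothetical names: the hosts and their /compute endpoint are
// illustrative only, not a real Sun (or Cloudflare) API.

const WORKER_HOSTS = [
  'http://host-a.example.com/compute',
  'http://host-b.example.com/compute',
  'http://host-c.example.com/compute',
];

async function renderDistributed(frames: string[]): Promise<string[]> {
  // Divide the frames into one contiguous chunk per host...
  const chunkSize = Math.ceil(frames.length / WORKER_HOSTS.length);
  const chunks = WORKER_HOSTS.map((_, i) =>
    frames.slice(i * chunkSize, (i + 1) * chunkSize)
  );

  // ...dispatch every chunk in parallel, one HTTP request per host...
  const partials = await Promise.all(
    chunks.map((chunk, i) =>
      fetch(WORKER_HOSTS[i], {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify(chunk),
      }).then((r) => r.json() as Promise<string[]>)
    )
  );

  // ...and reassemble the partial results, in order, into one answer.
  return partials.flat();
}
```

The design point is the same one Rothrock makes: the caller doesn't care which machine does the work, only that the network can find spare cycles and that the partial results can be reassembled.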

The Network is the Computer: A Conversation with Greg Papadopoulos

I spoke with Greg Papadopoulos, former CTO of Sun Microsystems, to discuss the origins and meaning of The Network is the Computer®, as well as Cloudflare's role in the evolution of the phrase. During our conversation, we considered the inevitability of latency, the slowness of the speed of light, and the future of Cloudflare's newly acquired trademark. Listen to our conversation here and read the full transcript below.

[00:00:08] John Graham-Cumming: Thank you so much for taking the time to chat with me. I've got Greg Papadopoulos, who was CTO of Sun and is currently a venture capitalist. Tell us about "The Network is the Computer."

[00:00:22] Greg Papadopoulos: Well, certainly from a Sun perspective, the very first Sun-1 was connected via Internet protocols, and at that time there was a big war about what should win from a networking point of view. And there was a dedication there that everything that we made was going to interoperate on the network over open standards, and from day one in the company, it was always that thought. It's really about the collection of these machines and how they interact with one another, and of course that puts the network in the middle of it. And then it becomes hard to say, you know, where's the line? But it is one of those things where, I think, even if you asked most people at Sun, "Okay, explain to me 'The Network is the Computer,'" it would get rather meta. People would see that phrase and react to it in their own way. But it would always come back to something similar to what I said in the earlier days.

[00:01:37] Graham-Cumming: I remember it very well, because it was obviously plastered everywhere in Silicon Valley for a while. And it sounded incredibly cool, but I was never quite sure what it meant. It sounded like it was one of those things that was super deep, but I couldn't dig deep enough. But it sort of seems like this whole vision has come true, because if you dial back to, I think it's 2006, you wrote a blog post about how the world was only going to need five or seven or some small number of computers. And that was also linked to this as well, wasn't it?

[00:02:05] Papadopoulos: Yeah, as things began to evolve into what we would call cloud computing today, you could put substantial resources on the other side of the network, and from the end user's perspective those could be as effective or more effective than something you'd have in front of you. And so there was this idea that you really could provide these larger scale computing services; in the early days, you know, grid was the term used before cloud. But if you follow that logic, and you watch what was happening to the improvements of the network: Dave Patterson at Cal was very fond of saying, in that era and in the 90s, that networks are getting to the place where the disk connected to another machine is transparent to you. I mean, it could be your own; in fact, somebody else's memory may be closer to you than your own disk. And that's a pretty interesting thought. And so where we ended up going was really a complete realization that these things we would call servers were actually just components of this network computer. And so it was very mysterious, "The Network is the Computer," and it actually grew into itself in this way. And I'll say, looking at Cloudflare, you see this next level of scale happening.
It's not just, what are those things that you build inside a data center and how do you connect to them, but in fact, it's the network that is the computer that is the network.

[00:04:26] Graham-Cumming: It's interesting, though, that there have been these waves of centralization, and then pushing the computing power out to the edge and the PCs at some point, and then Larry Ellison came along and he was going to have this network computer thing, and it sort of seems to swing back and forth. So where do you think we are in this swinging?

[00:04:44] Papadopoulos: You know, I don't think of it so much as swinging. I think it's a spiral upwards, and we come to a place and we look down and it looks familiar. You know, where you'll say, oh I see, here's a 3270 connected to a mainframe. Well, that looks like a browser connected to a web server. And here's the device connected to the web service. They look similar, but there are some very important differences as we traverse this helix of sorts. If you look back, for example, the 3270 was inextricably bound to the single server that hosted it, whereas now our devices really have the ability to connect to any other computer on the network. So while I think we're seeing something that looks like a pendulum there, it's really a refactoring question: what software belongs where, and how hard is it to maintain where it is? And naturally, I think the Internet protocol clearly is a peer-to-peer protocol, so it doesn't take sides on this. Whether we end up in one state or another, with more on the client or less on the client, really has to do with how well we've figured out distributed computing and how well we can deliver code in a management-free way. And that's a longer conversation.

[00:06:35] Graham-Cumming: Well, it's an interesting conversation. One thing you talked about with Sun Grid, which we then end up with in Amazon Web Services and things like that, is that there was the device, be it your handheld or your laptop, talking to some cloud computing. And then what Cloudflare has done with this Workers product is to say, well, actually I think there are three places where code could exist. There's something you can put inside the network.

[00:07:02] Papadopoulos: Yes. And by extension that could grow to another layer too. And it goes back to, I think it's Dave Clark who I first remember saying: you can get all the bandwidth you want, that's money, but you can't reduce latency, that's God. Right? And so I think there are certainly things there, and as I see the Workers architecture, there are two things going on. There's clearly something to be said about latency, having distributed points of presence and getting closer to the clients. And there's IBM with interaction there too. But it is also something that is around management of software, and how we should be thinking about the delivery of applications, which ultimately, I believe, in the limit become more distributed-looking than they are now. It's just that it's really hard to write distributed applications in the general way we think about it.

[00:08:18] Graham-Cumming: Yes, that's one of these things, isn't it? It is exceedingly hard to actually write these things, which is why I think we're going through a bit of a transition right now, where people are trying to figure out where that code should actually execute and what should execute where.

[00:08:31] Papadopoulos: Yeah.
You had graciously pointed out this blog from a dozen years ago on, hey, this is inevitable, that we're going to have this concentration of computing, for a lot of economic reasons as much as anything else. But it's both a hammer and a nail. You know, cloud stuff is in some ways unnatural, in that why should we expect computing to get concentrated like it is? If you really look into it more deeply, I think it has to do with management and control and capital cycles, really things that are on the economic and the administrative side of things, and not about what's truth and beauty and the destination for where applications should be.

[00:09:27] Graham-Cumming: And I think you also see some companies now starting to wrestle with the economics of the cloud, where they realize that they are kind of locked into their cloud provider and are paying rent, kind of thing. It becomes entirely economic at that point.

[00:09:41] Papadopoulos: Well, it does. And you know, this was also something I was pretty vocal about, although I got misinterpreted for a while there as being, you know, anti-cloud or something, which I'm not; I think I'm pragmatic about it. One of the dangers, certainly as people yield particularly to SaaS products, is that your data, unless you have explicit contracts and the ability to disgorge that data from that service, becomes more and more captive. And that's the part that I think is actually the real question here, which is: what's the switching cost from one service to another, from one cloud to another?

[00:10:35] Graham-Cumming: Yes, absolutely. That's one of the things that we faced, and one of the reasons why we worked on this thing called the Bandwidth Alliance: one of the ways in which stuff gets locked into clouds is that the egress fee is so large that you don't want to get your data out.

[00:10:50] Papadopoulos: Exactly. And then there is always the, you know, well, we have these particular features in our particular cloud that are very seductive to developers, and you write to them and it's kind of hard to undo; just the physics of moving things around. So what you all have been doing there is, I think, necessary and quite progressive. But we can do more.

[00:11:17] Graham-Cumming: Yes, definitely. Just to go back to the thought about latency and bandwidth, I have a jokey pair of slides where I show the average broadband network you can buy over time, and it goes up and up, and then the change in the speed of light over the same period, which of course is entirely flat: zero progress in the speed of light. Looking back through your biography, you worked on Thinking Machines, and I assume that fighting latency over a much shorter distance of cabling must have been interesting in those machines, because of the speeds at which they were operating.

[00:11:54] Papadopoulos: Yes. It surprises most people when you say it, but computer architects complain that the speed of light is really slow. And you know, Grace Hopper, who is really one of the founders, one of the pioneers, of modern programming languages and COBOL, and I think she was a vice admiral, would walk around with a wire that was a foot long and say, "this is a nanosecond". And that seemed pretty short for a while but, you know, a nanosecond is an eternity these days.

[00:12:40] Graham-Cumming: Yes, it's an eternity. People don't quite appreciate how long it is if they're not thinking about it.
I had someone who was new to the computing world, learning about it, come to me with a book which was talking about fiber optics, and in the book it said there is a laser that flashes on and off a billion times a second to send data down the fiber optic. And he came to me and said, "This can't possibly be true; it's just too fast."

[00:13:09] Papadopoulos: No, it's too slow!

[00:13:12] Graham-Cumming: Right? And I thought, well, that's slow. And then I stepped back and thought, you know, to the average person that is a ridiculous statement, that somehow we humans have managed to control time at this ridiculously small level. And then we keep pushing and pushing and pushing it, and people don't appreciate how fast, and actually how slow, the light is, really.

[00:13:33] Papadopoulos: Yeah. And I think if you get into a very pure reckoning of this, latency is the only thing that matters. One can look at bandwidth as a component of latency; you can see bandwidth as a serialization delay. And that kind of goes back to the Clark thing: yeah, I can buy that, but I can't bribe God on the other side, so I'm fundamentally left with this problem that we have. Thank you, Albert Einstein, right? It's kind of hopeless to think about sending information faster than that.

[00:14:09] Graham-Cumming: Yeah, exactly. There are information limits, which is driving why we have such powerful phones: the latency to the human is very low if you have it in your hand.

[00:14:23] Papadopoulos: Yes, absolutely. This is where the edge architecture and the Worker structure that you guys are working on become really interesting too, because they give me, you talked about it earlier, well, we're now introducing this new tier, but they give me a place that is really close from a latency point of view, where I can have some intimate relationship with a device, and at the same time be well-connected to the network.

[00:14:55] Graham-Cumming: Right. And I think the other thing that is interesting about that is that your device fundamentally is an insecure thing. So if you put code on that thing, you can't put secrets in it, like cryptographic secrets, because the end user has access to them. Normally you would keep those in a server somewhere, but then the funny thing is, if you have this intermediary tier which is both secure and low latency to the end user, you suddenly have a different world in which you can put secrets, you can put code that is privileged, but it can interact with the user very, very rapidly because of the low latency. (A sketch of this pattern appears after the transcript.)

[00:15:30] Papadopoulos: Yeah. And that essence of, where's my trust domain? Now, I've seen all kinds of, oh my gosh, I cannot believe somebody is doing it, like putting their S3 credentials down on a device, and having it hold, you know, the login for a database or something. You must be kidding. I mean, that trust proxy point at low latency is a really key thing.

[00:16:02] Graham-Cumming: Yes, I think people just need to start thinking about that architecture. Is there a sort of parallel with things that were going on in very high-performance computing, with the massively parallel stuff, and what's happening today? What lessons can we take from work done in the 70s and 80s and apply to the Internet of today?

[00:16:24] Papadopoulos: Well, there are a couple of fundamental issues here. And one we've been speaking about is latency.
The other one is synchronization, and this comes up in a bunch of different ways, whether it's when one looks at the CAP theorem kinds of things that Eric Brewer has been famous for: can I get consistency and availability and survive partitionability, all at the same time? And so you end up in this kind of place, and it goes back to Einstein a bit, where knowing when things have happened and when state has actually been changed or committed is a pretty profound problem.

[00:17:15] Graham-Cumming: It is, and in what order things have happened.

[00:17:18] Papadopoulos: Yes. And that order is going to be relative to an observer here as well. And so if you're insisting on some total ordering, then you're insisting on slowing things down as well. And that really is fundamental. We were pushing into that in the massively parallel stuff, and you'll see it at Internet scale. You know, there's another thing, if I could. This is one of my greatest "aha"s about networks, and it's due to a fellow at Sun, Rob Gingell, who actually ended up being chief engineer at Sun and was one of the real pioneers of the software development framework that brought Solaris forward. Rob would talk about this thing that I label as network entropy: basically, what happens when you connect systems to networks? What do networks do to those systems? And this is a little bit of a philosophical question, not a physical one. Rob observed that over time, networks have this property of wanting to decompose things into constituent parts, have those parts get specialized, and then get reintegrated. So let me make that less abstract. In the early days of connecting systems to networks, one of the natural observations was: well, why don't we take the storage out of those desktop systems or server systems and put it on the other side of at least a local network, into a file server or storage server? And so you could see the computer sort of get pulled apart between its computing piece and its storage piece. And then that storage piece, in Rob's next step, would go on and get specialized, so we had whole companies start, like Network Appliance, Pure Storage, EMC. And the original routers were RADb, you know, running on workstations, and Cisco went and took that and made it into something. And so you now see this effect happen at the next scale. One of the things that really got me excited when I first saw Cloudflare a decade ago was: wow, okay, we can take a component like a network firewall, and that can get pulled away and created as its own network entity and specialized. And I think one of the most profound things, at least from my history of Cloudflare, was that as you guys went in and separated off these functions early on, the fear people had was that this was going to introduce latency, and in fact things got faster. Figure that.

[00:20:51] Graham-Cumming: Part of that, of course, is caching, and then there's dealing with the speed of light by being close to people. But also, if you say your company makes things faster, and you do all these different things including security, you are forced to optimize the whole thing to live up to the claim. Whereas if you try and chain things together, nobody's really responsible for that overall latency budget. It becomes natural that you have to do it.

[00:21:18] Papadopoulos: Yes.
And you all have done it brilliantly, very much to Gingell's view: okay, so this piece got decomposed and now specialized, meaning optimized like heck, because that's what you do. And so you can see that over and over again; you see it in terms of even Twilio or something. You know, here's a messaging service: I'm just pulling my applications apart, letting people specialize. But the final piece, and this is really the punchline, the final piece is, as Rob would talk about it, that the value is in the reintegration of it. And so what are those unifying forces that are creating, if you will, the operating system for "The Network is the Computer"? You were asking about the massively parallel scale. Well, we had an operating system we wrote for this. As you get up to the higher scale, you get into these more distributed circumstances, where the complexity goes up by some important number of orders of magnitude, and now what's that reintegration? And so I come back and I look at what Cloudflare is doing here. You're entering into that phase now of actually being that reintegrator, almost that operating system for the computer that is the network.

[00:23:06] Graham-Cumming: I think that's right. We often talk about actually being an operating system on the Internet, so very similar kinds of thoughts.

[00:23:14] Papadopoulos: Yes. And as we were talking earlier about how developers make sense of this pendulum or cycle or whatever it is, having this idea of an operating system, of a place where I can have ground truths and trust and sort of fixed points in this, is terribly important.

[00:23:44] Graham-Cumming: Absolutely. So do you have any final thoughts on, what, it must be 30 years on from when "The Network is the Computer" was a Sun trademark; now it's a Cloudflare trademark. What is the future of that slogan going to look like, and who's going to trademark it in 30 years' time?

[00:24:03] Papadopoulos: Well, it could be interplanetary at that point.

[00:24:13] Graham-Cumming: Well, if you talk about the latency problems of going interplanetary, we definitely have to solve the latency.

[00:24:18] Papadopoulos: Yeah. People do understand that. They go, wow, it's like seven minutes between here and Mars at close approach.

[00:24:28] Graham-Cumming: The earthly equivalent of that is New Zealand. If you speak to people from New Zealand, when they come on holiday to Europe or move to the US, they suddenly say that the Internet works so much better here. And it's just that it's closer. Now, the Australians have figured this out, because Australia is actually drifting northwards, so they're actually going to get closer. That's going to fix it for them, but New Zealand is stuck.

[00:24:56] Papadopoulos: I do ask my physicist friends for one of two things. You know, either give me a faster speed of light — so far they have not delivered — or another dimension I can cut through. Maybe we'll keep working on the latter.

[00:25:16] Graham-Cumming: All right. Well, listen, Greg, thank you for the conversation. Thank you for thinking about this stuff many, many years ago. I think we're getting there slowly on some of this work. And yeah, good talking to you.

[00:25:27] Papadopoulos: Well, you too. And thank you for carrying the torch forward.
I think everyone from Sun who listens to this, and John, and everybody should feel really proud about what part they played in the evolution of this great invention.

[00:25:48] Graham-Cumming: It's certainly the case that a tremendous amount of work was done at Sun that was really fundamental and, you know, perhaps some of that was ahead of its time, but here we are.

[00:25:57] Papadopoulos: Thank you.

[00:25:58] Graham-Cumming: Thank you very much.

[00:25:59] Papadopoulos: Cheers.

Interested in hearing more? Listen to my conversations with John Gage and Ray Rothrock of Sun Microsystems.

To learn more about Cloudflare Workers, check out the use cases below:

Optimizely - Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
Cordial - Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
AO.com - AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
Pwned Passwords - Troy Hunt’s popular "Have I Been Pwned" project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
Timely - Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
Quintype - Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer

We recently registered the trademark for The Network is the Computer®, to encompass how Cloudflare is utilizing its network to pave the way for the future of the Internet.

The phrase was first coined in 1984 by John Gage, the 21st employee of Sun Microsystems, where he was credited with building Sun’s vision around “The Network is the Computer.” When Sun was acquired in 2010, the trademark was not renewed, but the vision remained. Take it from him:

“When we built Sun Microsystems, every computer we made had the network at its core. But we could only imagine, over thirty years ago, today’s billions of networked devices, from the smallest camera or light bulb to the largest supercomputer, sharing their packets across Cloudflare’s distributed global network.

We based our vision of an interconnected world on open and shared standards. Cloudflare extends this dedication to new levels by openly sharing designs for security and resilience in the post-quantum computer world.

Most importantly, Cloudflare is committed to immediate, open, transparent accountability for network performance. I’m a dedicated reader of their technical blog, as the network becomes central to our security infrastructure and the global economy, demanding even more powerful technical innovation.”

Cloudflare's massive network, which spans more than 180 cities in 80 countries, enables the company to deliver its suite of security, performance, and reliability products, including its serverless edge computing offerings. In March of 2018, we launched our serverless solution Cloudflare Workers to allow anyone to deploy code at the edge of our network. We also announced advancements to Cloudflare Workers in June of 2019 to give application developers the ability to do away with cloud regions, VMs, servers, containers, and load balancers - all they need to do is write the code, and we do the rest. With each of Cloudflare’s data centers acting as a highly scalable application origin to which users are automatically routed via our Anycast network, code is run within milliseconds of users worldwide.

In honor of registering Sun’s former trademark, I spoke with John Gage, Greg Papadopoulos, former CTO of Sun Microsystems, and Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to learn more about the history of the phrase and what it means for the future.

To learn more about Cloudflare Workers, check out the use cases below:

Optimizely - Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
Cordial - Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
AO.com - AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
Pwned Passwords - Troy Hunt’s popular "Have I Been Pwned" project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
Timely - Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
Quintype - Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

A gentle introduction to Linux Kernel fuzzing

For some time I’ve wanted to play with coverage-guided fuzzing. Fuzzing is a powerful testing technique where an automated program feeds semi-random inputs to a tested program. The intention is to find inputs that trigger bugs. Fuzzing is especially useful in finding memory corruption bugs in C or C++ programs.

Image by Patrick Shannon CC BY 2.0

Normally it's recommended to pick a well known, but little explored, library that is heavy on parsing. Historically things like libjpeg, libpng and libyaml were perfect targets. Nowadays it's harder to find a good target - everything seems to have been fuzzed to death already. That's a good thing! I guess the software is getting better! Instead of choosing a userspace target I decided to have a go at the Linux Kernel netlink machinery.

Netlink is an internal Linux facility used by tools like "ss", "ip" and "netstat". It's used for low level networking tasks - configuring network interfaces, IP addresses, routing tables and such. It's a good target: it's an obscure part of the kernel, and it's relatively easy to automatically craft valid messages. Most importantly, we can learn a lot about Linux internals in the process. Bugs in netlink aren't going to have security impact though - netlink sockets usually require privileged access anyway.

In this post we'll run the AFL fuzzer, driving our netlink shim program against a custom Linux kernel, all of it running inside KVM virtualization. This blog post is a tutorial. With the easy-to-follow instructions, you should be able to quickly replicate the results. All you need is a machine running Linux and 20 minutes.

Prior work

The technique we are going to use is formally called "coverage-guided fuzzing". There's a lot of prior literature:

- The Smart Fuzzer Revolution by Dan Guido, and the LWN article about it
- Effective file format fuzzing by Mateusz “j00ru” Jurczyk
- honggfuzz by Robert Swiecki, a modern, feature-rich coverage-guided fuzzer
- ClusterFuzz
- Fuzzer Test Suite

Many people have fuzzed the Linux Kernel in the past. Most importantly:

- syzkaller (aka syzbot) by Dmitry Vyukov, a very powerful CI-style continuously running kernel fuzzer, which has found hundreds of issues already. It's an awesome machine - it will even report the bugs automatically!
- the Trinity fuzzer

We'll use AFL, everyone's favorite fuzzer. AFL was written by Michał Zalewski. It's well known for its ease of use, speed and very good mutation logic. It's a perfect choice for people starting their journey into fuzzing! If you want to read more about AFL, the documentation is in a couple of files: the Historical notes, the Technical whitepaper and the README.

Coverage-guided fuzzing

Coverage-guided fuzzing works on the principle of a feedback loop:

- the fuzzer picks the most promising test case
- the fuzzer mutates the test into a large number of new test cases
- the target code runs the mutated test cases, and reports back code coverage
- the fuzzer computes a score from the reported coverage, and uses it to prioritize the interesting mutated tests and remove the redundant ones

For example, let's say the input test is "hello". The fuzzer may mutate it into a number of tests, for example: "hEllo" (bit flip), "hXello" (byte insertion) or "hllo" (byte deletion). If any of these tests yields interesting code coverage, it will be prioritized and used as the base for the next generation of tests. Specifics of how mutations are done, and how to efficiently compare code coverage reports of thousands of program runs, are the fuzzer's secret sauce.
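To make those three mutation strategies concrete, here is a toy sketch in C (our illustration, not AFL's actual mutation engine - AFL implements many more strategies, far more carefully):

```c
/* A toy illustration of the three mutations named above: bit flip,
 * byte insertion and byte deletion. Real AFL does much more. */
#include <stdio.h>
#include <string.h>

static void bit_flip(char *buf, size_t pos, int bit) {
    buf[pos] ^= (char)(1 << bit);   /* "hello" -> "hEllo" (pos=1, bit=5) */
}

static void byte_insert(char *buf, size_t len, size_t pos, char c) {
    memmove(buf + pos + 1, buf + pos, len - pos + 1); /* +1 keeps the NUL */
    buf[pos] = c;                   /* "hello" -> "hXello" */
}

static void byte_delete(char *buf, size_t len, size_t pos) {
    memmove(buf + pos, buf + pos + 1, len - pos);     /* "hello" -> "hllo" */
}

int main(void) {
    char a[16] = "hello", b[16] = "hello", c[16] = "hello";
    bit_flip(a, 1, 5);              /* 'e' ^ 0x20 == 'E' */
    byte_insert(b, strlen(b), 1, 'X');
    byte_delete(c, strlen(c), 1);
    printf("%s %s %s\n", a, b, c);  /* prints: hEllo hXello hllo */
    return 0;
}
```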
Read AFL's technical whitepaper for the nitty-gritty details.

The code coverage reported back from the binary is very important. It allows the fuzzer to order the test cases and identify the most promising ones. Without the code coverage the fuzzer is blind.

Normally, when using AFL, we are required to instrument the target code so that coverage is reported in an AFL-compatible way. But we want to fuzz the kernel! We can't just recompile it with "afl-gcc"! Instead we'll use a trick. We'll prepare a binary that will trick AFL into thinking it was compiled with its tooling. This binary will report back the code coverage extracted from the kernel.

Kernel code coverage

The kernel has at least two built-in coverage mechanisms - GCOV and KCOV:

- Using gcov with the Linux kernel
- KCOV: code coverage for fuzzing

KCOV was designed with fuzzing in mind, so we'll use it. Using KCOV is pretty easy. We must compile the Linux kernel with the right settings. First, enable the KCOV kernel config option:

cd linux
./scripts/config \
    -e KCOV \
    -d KCOV_INSTRUMENT_ALL

KCOV is capable of recording code coverage from the whole kernel. It can be set with the KCOV_INSTRUMENT_ALL option. This has disadvantages though - it would slow down the parts of the kernel we don't want to profile, and would introduce noise in our measurements (reduce "stability"). For starters, let's disable KCOV_INSTRUMENT_ALL and enable KCOV selectively on the code we actually want to profile. Today we focus on the netlink machinery, so let's enable KCOV on the whole "net" directory tree:

find net -name Makefile | xargs -L1 -I {} bash -c 'echo "KCOV_INSTRUMENT := y" >> {}'

In a perfect world we would enable KCOV only for the couple of files we really are interested in. But netlink handling is peppered all over the network stack code, and we don't have time for fine-tuning it today.

With KCOV in place, it's worth adding "kernel hacking" toggles that will increase the likelihood of reporting memory corruption bugs. See the README for the list of Syzkaller-suggested options - most importantly KASAN.

With that set we can compile our KCOV and KASAN enabled kernel. Oh, one more thing. We are going to run the kernel in KVM. We're going to use "virtme", so we need a couple of toggles:

./scripts/config \
    -e VIRTIO -e VIRTIO_PCI -e NET_9P -e NET_9P_VIRTIO -e 9P_FS \
    -e VIRTIO_NET -e VIRTIO_CONSOLE -e DEVTMPFS

... (see the README for the full list)

How to use KCOV

KCOV is super easy to use. First, note that the code coverage is recorded in a per-process data structure. This means you have to enable and disable KCOV within a userspace process, and it's impossible to record coverage for non-task things, like interrupt handling. This is totally fine for our needs. KCOV reports data into a ring buffer. Setting it up is pretty simple, see our code. Then you can enable and disable it with a trivial ioctl:

ioctl(kcov_fd, KCOV_ENABLE, KCOV_TRACE_PC);
/* profiled code */
ioctl(kcov_fd, KCOV_DISABLE, 0);

After this sequence the ring buffer contains the list of %rip values of all the basic blocks of the KCOV-enabled kernel code. To read the buffer just run:

n = __atomic_load_n(&kcov_ring[0], __ATOMIC_RELAXED);
for (i = 0; i < n; i++) {
    printf("0x%lx\n", kcov_ring[i + 1]);
}

With tools like addr2line it's possible to resolve the %rip to a specific line of code. We won't need it though - the raw %rip values are sufficient for us.
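As an aside, the ring buffer setup that the snippet above takes for granted (the "see our code" step) boils down to something like the following minimal sketch, based on the kernel's KCOV documentation (Documentation/dev-tools/kcov.rst); error handling is omitted:

```c
/* Minimal KCOV session, following the kernel's KCOV documentation.
 * Error handling omitted for brevity. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_ENABLE     _IO('c', 100)
#define KCOV_DISABLE    _IO('c', 101)
#define COVER_SIZE      (64 << 10)   /* buffer size in unsigned longs */
#define KCOV_TRACE_PC   0

int main(void) {
    int kcov_fd = open("/sys/kernel/debug/kcov", O_RDWR);
    /* Tell the kernel how big our coverage buffer is. */
    ioctl(kcov_fd, KCOV_INIT_TRACE, COVER_SIZE);
    /* The ring buffer is shared with the kernel via mmap. */
    unsigned long *kcov_ring =
        mmap(NULL, COVER_SIZE * sizeof(unsigned long),
             PROT_READ | PROT_WRITE, MAP_SHARED, kcov_fd, 0);

    ioctl(kcov_fd, KCOV_ENABLE, KCOV_TRACE_PC);
    read(-1, NULL, 0);  /* any syscall we want kernel coverage for */
    ioctl(kcov_fd, KCOV_DISABLE, 0);

    /* kcov_ring[0] holds the number of recorded %rip values. */
    unsigned long n = __atomic_load_n(&kcov_ring[0], __ATOMIC_RELAXED);
    for (unsigned long i = 0; i < n; i++)
        printf("0x%lx\n", kcov_ring[i + 1]);

    munmap(kcov_ring, COVER_SIZE * sizeof(unsigned long));
    close(kcov_fd);
    return 0;
}
```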
Feeding KCOV into AFL

The next step in our journey is to learn how to trick AFL. Remember, AFL needs a specially-crafted executable, but we want to feed in the kernel code coverage.

First we need to understand how AFL works. AFL sets up an array of 64K 8-bit numbers. This memory region is called "shared_mem" or "trace_bits" and is shared with the traced program. Every byte in the array can be thought of as a hit counter for a particular (branch_src, branch_dst) pair in the instrumented code. It's important to notice that AFL prefers random branch labels, rather than reusing the %rip value, to identify the basic blocks. This is to increase entropy - we want our hit counters in the array to be uniformly distributed. The algorithm AFL uses is:

cur_location = <COMPILE_TIME_RANDOM>;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;

In our case with KCOV we don't have compile-time-random values for each branch. Instead we'll use a hash function to generate a uniform 16-bit number from the %rip recorded by KCOV (a sketch of one possible hash function is shown at the end of this section). This is how to feed a KCOV report into the AFL "shared_mem" array:

n = __atomic_load_n(&kcov_ring[0], __ATOMIC_RELAXED);
uint16_t prev_location = 0;
for (i = 0; i < n; i++) {
    uint16_t cur_location = hash_function(kcov_ring[i + 1]);
    shared_mem[cur_location ^ prev_location]++;
    prev_location = cur_location >> 1;
}

Reading test data from AFL

Finally, we need to actually write the test code hammering the kernel netlink interface! First we need to read input data from AFL. By default AFL sends a test case to stdin:

/* read AFL test data */
char buf[512*1024];
int buf_len = read(0, buf, sizeof(buf));

Fuzzing netlink

Then we need to send this buffer into a netlink socket. But we know nothing about how netlink works! Okay, let's use the first 5 bytes of input as the netlink protocol and group id fields. This will allow AFL to figure out and guess the correct values of these fields. The code testing netlink (simplified):

netlink_fd = socket(AF_NETLINK, SOCK_RAW | SOCK_NONBLOCK, buf[0]);

struct sockaddr_nl sa = {
    .nl_family = AF_NETLINK,
    .nl_groups = (buf[1] << 24) | (buf[2] << 16) | (buf[3] << 8) | buf[4],
};
bind(netlink_fd, (struct sockaddr *) &sa, sizeof(sa));

struct iovec iov = { &buf[5], buf_len - 5 };
struct sockaddr_nl sax = {
    .nl_family = AF_NETLINK,
};
struct msghdr msg = { &sax, sizeof(sax), &iov, 1, NULL, 0, 0 };
r = sendmsg(netlink_fd, &msg, 0);
if (r != -1) {
    /* sendmsg succeeded! great I guess... */
}

That's basically it! For speed, we will wrap this in a short loop that mimics the AFL "fork server" logic. I'll skip the explanation here, see our code for details. The resulting code of our AFL-to-KCOV shim looks like:

forksrv_welcome();
while(1) {
    forksrv_cycle();
    test_data = afl_read_input();
    kcov_enable();
    /* netlink magic */
    kcov_disable();
    /* fill in shared_map with tuples recorded by kcov */
    if (new_crash_in_dmesg) {
        forksrv_status(1);
    } else {
        forksrv_status(0);
    }
}

See the full source code.
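The hash_function itself was left abstract above. Any cheap mixer that spreads kernel addresses uniformly over 16 bits will do; a multiplicative (Fibonacci) hash is one reasonable choice. This is our sketch, not necessarily what the original shim uses:

```c
#include <stdint.h>

/* Map a 64-bit %rip onto a uniform 16-bit AFL branch label using a
 * multiplicative (Fibonacci) hash: multiply by a 64-bit golden-ratio
 * constant and keep the top 16 bits of the product. */
static uint16_t hash_function(uint64_t rip) {
    return (uint16_t)((rip * 0x9E3779B97F4A7C15ULL) >> 48);
}
```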
How to run the custom kernel

We're missing one important piece - how to actually run the custom kernel we've built. There are three options:

"native": You can totally boot the built kernel on your server and fuzz it natively. This is the fastest technique, but pretty problematic. If the fuzzing succeeds in finding a bug you will crash the machine, potentially losing the test data. Cutting the branches we sit on should be avoided.

"uml": We could configure the kernel to run as User Mode Linux. Running a UML kernel requires no privileges. The kernel just runs as a user space process. UML is pretty cool, but sadly it doesn't support KASAN, therefore the chances of finding a memory corruption bug are reduced. Finally, UML is a pretty magical special environment - bugs found in UML may not be relevant in real environments. Interestingly, UML is used by Android's network_tests framework.

"kvm": We can use KVM to run our custom kernel in a virtualized environment. This is what we'll do.

One of the simplest ways to run a custom kernel in a KVM environment is to use "virtme" scripts. With them we can avoid having to create a dedicated disk image or partition, and just share the host file system. This is how we can run our code:

virtme-run \
    --kimg bzImage \
    --rw --pwd --memory 512M \
    --script-sh "<what to run inside kvm>"

But hold on. We forgot about preparing input corpus data for our fuzzer!

Building the input corpus

Every fuzzer takes carefully crafted test cases as input, to bootstrap the first mutations. The test cases should be short, and cover as large a part of the code as possible. Sadly - I know nothing about netlink. How about we don't prepare the input corpus... Instead we can ask AFL to "figure out" what inputs make sense. This is what Michał did back in 2014 with JPEGs and it worked for him. With this in mind, here is our input corpus:

mkdir inp
echo "hello world" > inp/01.txt

Instructions for how to compile and run the whole thing are in the README.md on our GitHub. It boils down to:

virtme-run \
    --kimg bzImage \
    --rw --pwd --memory 512M \
    --script-sh "./afl-fuzz -i inp -o out -- fuzznetlink"

With this running you will see the familiar AFL status screen:

Further notes

That's it. Now you have a custom hardened kernel, running a basic coverage-guided fuzzer, all inside KVM. Was it worth the effort? Even with this basic fuzzer, and no input corpus, after a day or two the fuzzer found an interesting code path: NEIGH: BUG, double timer add, state is 8. With a more specialized fuzzer, some work on improving the "stability" metric, and a decent input corpus, we could expect even better results.

If you want to learn more about what netlink sockets actually do, see the blog post by my colleague Jakub Sitnicki, Multipath Routing in Linux - part 1. Then there is a good chapter about it in the Linux Kernel Networking book by Rami Rosen.

In this blog post we haven't mentioned:

- details of the AFL shared_memory setup
- implementation of the AFL persistent mode
- how to create a network namespace to isolate the effects of weird netlink commands, and improve the "stability" AFL score
- a technique for reading dmesg (/dev/kmsg) to find kernel crashes
- the idea of running AFL outside of KVM, for speed and stability - currently the tests aren't stable after a crash is found

But we achieved our goal - we set up a basic, yet still useful, fuzzer against a kernel. Most importantly, the same machinery can be reused to fuzz other parts of Linux subsystems - from file systems to the bpf verifier.

I also learned a hard lesson: tuning fuzzers is a full-time job. Proper fuzzing is definitely not as simple as starting it up and idly waiting for crashes. There is always something to improve, tune, and re-implement. A quote at the beginning of the presentation by Mateusz Jurczyk mentioned above resonated with me: "Fuzzing is easy to learn but hard to master." Happy bug hunting!

Cloudflare outage caused by bad software deploy (updated)

This is a short placeholder blog and will be replaced with a full post-mortem and disclosure of what happened today.

For about 30 minutes today, visitors to Cloudflare sites received 502 errors caused by a massive spike in CPU utilization on our network. This CPU spike was caused by a bad software deploy that was rolled back. Once rolled back the service returned to normal operation and all domains using Cloudflare returned to normal traffic levels.

This was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred. Internal teams are meeting as I write, performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again.

Update at 2009 UTC:

Starting at 1342 UTC today we experienced a global outage across our network that resulted in visitors to Cloudflare-proxied domains being shown 502 errors (“Bad Gateway”). The cause of this outage was deployment of a single misconfigured rule within the Cloudflare Web Application Firewall (WAF) during a routine deployment of new Cloudflare WAF Managed rules.

The intent of these new rules was to improve the blocking of inline JavaScript that is used in attacks. These rules were being deployed in a simulated mode where issues are identified and logged by the new rule but no customer traffic is actually blocked, so that we can measure false positive rates and ensure that the new rules do not cause problems when they are deployed into full production.

Unfortunately, one of these rules contained a regular expression that caused CPU to spike to 100% on our machines worldwide. This 100% CPU spike caused the 502 errors that our customers saw. At its worst, traffic dropped by 82%. This chart shows CPU spiking in one of our PoPs:

We were seeing an unprecedented CPU exhaustion event; we had never experienced global CPU exhaustion before. We make software deployments constantly across the network and have automated systems to run test suites, and a procedure for deploying progressively to prevent incidents. Unfortunately, these WAF rules were deployed globally in one go and caused today’s outage.

At 1402 UTC we understood what was happening and decided to issue a ‘global kill’ on the WAF Managed Rulesets, which instantly dropped CPU back to normal and restored traffic. That occurred at 1409 UTC.

We then went on to review the offending pull request, roll back the specific rules, test the change to ensure that we were 100% certain that we had the correct fix, and re-enabled the WAF Managed Rulesets at 1452 UTC.

We recognize that an incident like this is very painful for our customers. Our testing processes were insufficient in this case and we are reviewing and making changes to our testing and deployment process to avoid incidents like this in the future.

Third Time’s a Charm (A brief history of a gay marriage)

Happy Pride from Proudflare, Cloudflare’s LGBTQIA+ employee resource group. We wanted to share some stories from our members this month which highlight both the struggles behind the LGBTQIA+ rights movement and its successes. This first story is from Lesley.

The moment that crystalised the memory of that day… crystal blue afternoon, bright-coloured autumn leaves, borrowed tables, crockery and cutlery, flowers arranged by a cousin, cake baked by a neighbour, music mixed by a friend... our priest/rabbi a close gay friend with neither yarmulke nor collar. The venue, a backyard kitty-corner at the home my wife grew up in. Love and good wishes in abundance from a community that supports us and our union. And in the middle of all that, my wife… turning to me and smiling, grass stains on the bottom of her long cream wedding dress after abandoning her heels and dancing barefoot in the grass. As usual, a microphone in hand, bringing life and laughter to all with her charismatic quips.

Our first marriage was as legal as marrying two donkeys, with all the attached rights

This was the fall of 2002 and same-sex marriage was legal in 0 of the 50 United States.

Our first marriage in Oct 2002 in Walnut Creek, CA

It was a tough time economically. We had a front row seat to the historic internet boom and bust. My company filed chapter 11 bankruptcy and my customer’s customers were going out of business. I did not anticipate getting a job anytime soon. So after a lot of silver-tongued persuading, I convinced my wife to quit her job, rent out our home, and buy an RV. We grabbed our two Australian shepherds and toured the US for a nine-month honeymoon. Forty-five states and 36,000 miles later, we still had some funds left over and I wanted to show Robin my other home, so we travelled to South Africa for 6 weeks. Just before we got on our flight back to the US, we heard news from our family that Gavin Newsom, the then-mayor of San Francisco, was going to declare same-sex marriage legal and begin issuing marriage licenses in San Francisco. The trip back to the Bay Area took 42 hours door-to-door and had a nine-hour time change. We got home, dropped off our bags and the very next morning, completely jet-lagged, went back into the city to stand in line to get our marriage license.

Our second marriage was a much smaller and more intimate affair

Held in an alcove in the beautiful San Francisco City Hall rotunda with its exquisite architectural design and a view of the grand staircase, it was attended by Robin’s parents and a couple of our close friends who could take the time at such short notice. Knowing that the time window was closing, we grabbed the opportunity to have our marriage recognised legally along with dozens of other jubilant gay couples. Alas, a short time later, we received an annulment in the mail. We received that, along with an apology and a request that we donate the licensing fees to the city. We were disappointed, but felt our love was strong enough to carry us through and who needed a piece of paper anyway, right?!?

Our attitude changed significantly when we had our son. We had been trying for a while and what finally worked for us was to take my egg along with a sperm bank donation, and impregnate Robin. To this day, my mom says I’m the best delegator she knows. I delegated childbirth. I also delegated all my rights as Joey’s mother. Absent a marriage, in California, the birth mother has all the rights and responsibilities.
Robin had been in a relationship before me where she had planned and had a child with another woman. When they split up, Robin had no rights to see or have access to the child. She also had no obligation to support the child in any way, financial or otherwise. We wanted to make sure Joey never faced that predicament and, without the option of marriage, took the next available avenue. I adopted Joey.

Even though he is genetically my child, we had to go through a lengthy and costly procedure to adopt him. We had child protective services inspect our home and come for numerous visits to ensure I would be a “suitable” parent for my child. Eventually, I was granted adoption approval and we went to family court in Martinez where I came before a judge and officially adopted Joey as my son.

In the early summer of 2008, the California Supreme Court declared same-sex marriage legal in California. We were the second state to make it legal after Massachusetts. The court found that barring same-sex couples from marriage violated California State’s Constitution. Cue marriage number three…

In the period between the declaration and Prop 8 passing, Robin and I joined the 18,000 gay couples that tied the knot. At this point, we were pros at getting married and went with casual jeans and white cotton shirts, which was easier for everyone including our son, who attended our wedding. This was the first marriage where our legal rights and responsibilities actually stuck. These rights were only valid in California, however, so on any travel outside of our beautiful state our union would be considered illegitimate. From a tax perspective, it was a real adventure with every advisor having a different take on the way we should file our taxes. Federally our marriage was not recognized, but in California it was. This led to a lot of confusion and added expenses every April.

Little did we know the backlash that our happy/gay marriages would cause. The religious, conservative right came back at us with Prop 8 for daring to expect equality. From our perspective, there was no other way to view this than vengeful and born out of malice for gays. Why would these people care that we wanted to live together and have the protections of marriage? I saw this as a group of people wanting to impose their religion and view of what a marriage should be on us. We were on vacation in Hawaii when the election results were announced, sweet with Barack Obama being elected and so, so very bitter with Prop 8 passing.

Prop 8 provoked a lot of soul-searching for me. I was very angry and had a general distrust of people that I had never felt before. I would be in the supermarket line and wonder who there may or may not have voted against my marriage. It was deeply personal and hurtful. We had Mormon friends, who are for the most part wonderful and whose company we enjoyed. Knowing the extreme measures their community went to to ensure Prop 8 passed cut me deeply. Catholics and conservatives, both family and friends, went out of their way to harm me and my family and to make our lives more difficult because they believed we were sinners and not worthy of equality in the eyes of the law. Fortunately, our marriage was grandfathered in, so our rights in California were preserved.

The one ray of light was watching our allies stand up and come to our defense. In my life, I’ve pretty much always been part of the privileged class. I’m a white woman with a degree who grew up in an affluent home.
I had never personally experienced discrimination or felt part of a marginalized minority. To have allies that stepped up and argued on our behalf brought tears to my eyes. We would not have the rights we have today without those allies. This was a significant lesson for me to learn. I will always stand up for the disenfranchised and make my voice heard to defend those who cannot defend themselves, as others have done for me and my family. I know how much it means.

Life went on, and the fight went on, gathering momentum as more states legalized same-sex marriage, initially through court action and then through popular vote.

June 26, 2015 was a triumphant day

We celebrated a landmark victory for gay rights as the U.S. Supreme Court ruled that same-sex marriage was a constitutional right and DOMA (the Defense of Marriage Act) was repealed. Finally, our marriage was recognised in every state in the Union.

We still consider our first wedding as the day we got married. We wrote our own vows and they have traveled with us from home to home, framed with pride on the wall in our bedroom. In October this year, Robin and I will celebrate our 17th wedding anniversary. We’ve been together 19 years in total and it’s been quite a ride. I promised Robin I’d marry her seven times; we still have a way to go!

On vacation, May 2019 in Bora Bora

The deep-dive into how Verizon and a BGP Optimizer Knocked Large Parts of the Internet Offline Monday

A recap on what happened Monday

On Monday we wrote about a painful Internet-wide route leak. We wrote that this should never have happened because Verizon should never have forwarded those routes to the rest of the Internet. That blog entry came out around 19:58 UTC, just over seven hours after the route leak finished (which, as we will see below, was around 12:39 UTC). Today we will dive into the archived routing data and analyze it. The format of the code below is meant to use simple shell commands so that any reader can follow along and, more importantly, do their own investigations on the routing tables.

This was a very public BGP route leak event. It was reported online via many news outlets, and the event’s BGP data was shared via social media as it was happening. Andree Toonk tweeted a quick list of 2,400 ASNs that were affected.

Quick dumps through the data, showing about 2400 ASns (networks) affected. Cloudflare being hit the hardest. Top 20 of affected ASns below pic.twitter.com/9J7uvyasw2— Andree Toonk (@atoonk) June 24, 2019

This blog contains a large number of acronyms; they are explained at the end of the blog.

Using RIPE NCC archived data

The RIPE NCC operates a very useful archive of BGP routing. It runs collectors globally and provides an API for querying the data. More can be seen at https://stat.ripe.net/. In the world of BGP all routing is public (within the ability of anyone collecting data to have enough collection points). The archived data is very valuable for research and that’s what we will do in this blog. The site can create some very useful data visualizations.

Dumping the RIPEstat data for this event

Presently, the RIPEstat data gets ingested around eight to twelve hours after real-time. It’s not meant to be a real-time service. The data can be queried in many ways, including a full web interface and an API. We are using the API to extract the data in a JSON format.

We are going to focus only on the Cloudflare routes that were leaked. Many other ASNs were leaked (see the tweet above); however, we want to deal with a finite data set and focus on what happened to Cloudflare’s routing. All the commands below can be run with ease on many systems. Both the scripts and the raw data file are now available on GitHub. The following was done on a MacBook Pro running macOS Mojave.

First we collect 24 hours of route announcements and AS-PATH data that RIPEstat sees coming from AS13335 (Cloudflare).

$ # Collect 24 hours of data - more than enough
$ ASN="AS13335"
$ START="2019-06-24T00:00:00"
$ END="2019-06-25T00:00:00"
$ ARGS="resource=${ASN}&starttime=${START}&endtime=${END}"
$ URL="https://stat.ripe.net/data/bgp-updates/data.json?${ARGS}"
$ # Fetch the data from RIPEstat
$ curl -sS "${URL}" | jq . > 13335-routes.json
$ ls -l 13335-routes.json
-rw-r--r--  1 martin  staff  339363899 Jun 25 08:47 13335-routes.json
$

That’s 340MB of data - which seems like a lot, but it contains plenty of white space and plenty of data we just don’t need. Our second task is to reduce this raw data down to just the required data - that’s timestamps, actual routes, and AS-PATH. The third item will be very useful.
Note we are using jq, which can be installed on macOS with the brew package manager.

$ # Extract just the times, routes, and AS-PATH
$ jq -rc '.data.updates[]|.timestamp,.attrs.target_prefix,.attrs.path' < 13335-routes.json | paste - - - > 13335-listing-a.txt
$ wc -l 13335-listing-a.txt
691318 13335-listing-a.txt
$

We are down to just below seven hundred thousand routing events; however, that’s not a leak, that’s everything that includes Cloudflare’s ASN (the number 13335 above). For that we need to go back to Monday’s blog and realize it was AS396531 (Allegheny Technologies) that showed up with 701 (Verizon) in the leak. Now we reduce the data further:

$ # Extract the route leak 701,396531
$ # AS701 is Verizon and AS396531 is Allegheny Technologies
$ egrep '701,396531' < 13335-listing-a.txt > 13335-listing-b.txt
$ wc -l 13335-listing-b.txt
204568 13335-listing-b.txt
$

At 204 thousand data points, we are looking better. It’s still a lot of data because BGP can be very chatty if topology is changing. A route leak will cause exactly that. Now let’s see how many routes were affected:

$ # Extract the actual routes affected by the route leak
$ cut -f2 < 13335-listing-b.txt | sort -V -u > 13335-listing-c.txt
$ wc -l 13335-listing-c.txt
101 13335-listing-c.txt
$

It’s a much smaller number. We now have a listing of at least 101 routes that were leaked via Verizon. This may not be the full list because route collectors like RIPEstat don’t have direct feeds from Verizon, so this data is a blended view with Verizon’s path and other paths. We can see that if we look at the AS-PATH in the above files. Please note that I had a typo in this script when this blog was first published and only 20 routes showed up because the -n vs -V option was used on sort. Now the list is correct with 101 affected routes. Please see this short article from Stack Overflow about the issue.

Here’s a partial listing of affected routes.

$ cat 13335-listing-c.txt
8.39.214.0/24
8.42.245.0/24
8.44.58.0/24
...
104.16.80.0/21
104.17.168.0/21
104.18.32.0/21
104.19.168.0/21
104.20.64.0/21
104.22.8.0/21
104.23.128.0/21
104.24.112.0/21
104.25.144.0/21
104.26.0.0/21
104.27.160.0/21
104.28.16.0/21
104.31.0.0/21
141.101.120.0/23
162.159.224.0/21
172.68.60.0/22
172.69.116.0/22
...
$

This is an interesting list, as some of these routes do not originate from Cloudflare’s network; however, they show up with AS13335 (our ASN) as the originator. For example, the 104.26.0.0/21 route is not announced from our network, but we do announce 104.26.0.0/20 (which covers that route). More importantly, we have an IRR (Internet Routing Registries) route object plus an RPKI ROA for that block. Here’s the IRR object:

route:          104.26.0.0/20
origin:         AS13335
source:         ARIN

And here’s the RPKI ROA. This ROA has Max Length set to 20, so nothing more specific than a /20 should be accepted.

Prefix:         104.26.0.0/20
Max Length:     /20
ASN:            13335
Trust Anchor:   ARIN
Validity:       Thu, 02 Aug 2018 04:00:00 GMT - Sat, 31 Jul 2027 04:00:00 GMT
Emitted:        Thu, 02 Aug 2018 21:45:37 GMT
Name:           535ad55d-dd30-40f9-8434-c17fc413aa99
Key:            4a75b5de16143adbeaa987d6d91e0519106d086e
Parent Key:     a6e7a6b44019cf4e388766d940677599d0c492dc
Path:           rsync://rpki.arin.net/repository/arin-rpki-ta/5e4a23ea-...

The Max Length field in a ROA caps how specific an acceptable announcement can be - that is, it sets the longest prefix length that may be announced. The fact that this is a /20 route with a /20 Max Length says that a /21 (or /22 or /23 or /24) within this IP space isn’t allowed.
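To make that acceptance rule concrete, here is a small sketch (our illustration, not code from any particular RPKI validator) of the origin-validation test a BGP speaker applies against a ROA:

```c
/* A sketch of the RPKI route-origin-validation test implied above:
 * an announcement matches a ROA only if it is covered by the ROA
 * prefix, is no more specific than Max Length, and has the right
 * origin ASN. So 104.26.0.0/21 fails against a /20 ROA with Max
 * Length /20. */
#include <stdbool.h>
#include <stdint.h>

struct roa { uint32_t prefix; uint8_t len, maxlen; uint32_t asn; };

static bool covered(uint32_t route, uint32_t roa_prefix, uint8_t roa_len) {
    uint32_t mask = roa_len ? ~0u << (32 - roa_len) : 0;
    return (route & mask) == (roa_prefix & mask);
}

/* Returns true if the ROA marks this announcement valid. */
static bool roa_valid(const struct roa *r, uint32_t route,
                      uint8_t route_len, uint32_t origin_asn) {
    return origin_asn == r->asn &&
           route_len >= r->len && route_len <= r->maxlen &&
           covered(route, r->prefix, r->len);
}
```

For the leaked 104.26.0.0/21, route_len is 21 while the ROA's Max Length is 20, so the check fails and a validating router should treat the announcement as invalid.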
Looking further at the route list above we get the following listing:

Route Seen          Cloudflare IRR & ROA    ROA Max Length
104.16.80.0/21   -> 104.16.80.0/20          /20
104.17.168.0/21  -> 104.17.160.0/20         /20
104.18.32.0/21   -> 104.18.32.0/20          /20
104.19.168.0/21  -> 104.19.160.0/20         /20
104.20.64.0/21   -> 104.20.64.0/20          /20
104.22.8.0/21    -> 104.22.0.0/20           /20
104.23.128.0/21  -> 104.23.128.0/20         /20
104.24.112.0/21  -> 104.24.112.0/20         /20
104.25.144.0/21  -> 104.25.144.0/20         /20
104.26.0.0/21    -> 104.26.0.0/20           /20
104.27.160.0/21  -> 104.27.160.0/20         /20
104.28.16.0/21   -> 104.28.16.0/20          /20
104.31.0.0/21    -> 104.31.0.0/20           /20

So how did all these /21’s show up? That’s where we dive into the world of BGP route optimization systems and their propensity to synthesize routes that should not exist. If those routes leak (and it’s very clear after this week that they can), all hell breaks loose. That can be compounded when not one, but two ISPs allow invalid routes to be propagated outside their autonomous network. We will explore the AS-PATH further down this blog.

More than 20 years ago, RFC1997 added the concept of communities to BGP. Communities are a way of tagging or grouping route advertisements. Communities are often used to label routes so that specific handling policies can be applied. RFC1997 includes a small number of universal well-known communities. One of these is the NO_EXPORT community, which has the following specification:

All routes received carrying a communities attribute containing this value MUST NOT be advertised outside a BGP confederation boundary (a stand-alone autonomous system that is not part of a confederation should be considered a confederation itself).

The use of the NO_EXPORT community is very common within BGP-enabled networks and is a community tag that would have helped alleviate this route leak immensely. How BGP route optimization systems work (or don’t work in this case) can be a subject for a whole other blog entry.
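To see why NO_EXPORT would have contained the damage, here is a sketch (ours) of the export-time check a BGP speaker performs before advertising a route to an external peer; the community value itself comes straight from RFC1997. Had the route optimizer tagged its synthesized /21s this way, they could never have left the customer's network:

```c
/* NO_EXPORT made concrete: before advertising a route to an eBGP
 * peer, a BGP speaker checks the route's communities. The well-known
 * value below is defined in RFC 1997. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COMMUNITY_NO_EXPORT 0xFFFFFF01u  /* RFC 1997 well-known value */

static bool may_export_to_ebgp_peer(const uint32_t *communities, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (communities[i] == COMMUNITY_NO_EXPORT)
            return false;  /* keep the route inside our AS/confederation */
    return true;
}
```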
Timing of the route leak

As we saved away the timestamps in the JSON file and in the text files, we can confirm the time for every route in the route leak by looking at the first and the last timestamp of a route in the data. We saved data from 00:00:00 UTC until 00:00:00 the next day, so we know we have covered the period of the route leak. We write a script that checks the first and last entry for every route and reports the information sorted by start time:

$ # Extract the timing of the route leak
$ while read cidr
do
  echo $cidr
  fgrep $cidr < 13335-listing-b.txt | head -1 | cut -f1
  fgrep $cidr < 13335-listing-b.txt | tail -1 | cut -f1
done < 13335-listing-c.txt |\
  paste - - - | sort -k2,3 | column -t | sed -e 's/2019-06-24T//g'
...
104.25.144.0/21   10:34:25  12:38:54
104.22.8.0/21     10:34:27  12:29:39
104.20.64.0/21    10:34:27  12:30:00
104.23.128.0/21   10:34:27  12:30:34
141.101.120.0/23  10:34:27  12:30:39
162.159.224.0/21  10:34:27  12:30:39
104.18.32.0/21    10:34:29  12:30:34
104.24.112.0/21   10:34:29  12:30:34
104.27.160.0/21   10:34:29  12:30:34
104.28.16.0/21    10:34:29  12:30:34
104.31.0.0/21     10:34:29  12:30:34
8.39.214.0/24     10:34:31  12:19:24
104.26.0.0/21     10:34:36  12:29:53
172.68.60.0/22    10:34:38  12:19:24
172.69.116.0/22   10:34:38  12:19:24
8.44.58.0/24      10:34:38  12:19:24
8.42.245.0/24     11:52:49  11:53:19
104.17.168.0/21   12:00:13  12:29:34
104.16.80.0/21    12:00:13  12:30:00
104.19.168.0/21   12:09:39  12:29:34
...
$

Now we know the times. The route leak started at 10:34:25 UTC (just before lunchtime London time) on 2019-06-24 and ended at 12:38:54 UTC. That’s a hair over two hours. Here’s that same time data in a graphical form showing the near-instant start of the event and the duration of each route leaked:

We can also go back to RIPEstat and look at the activity graph for Cloudflare’s AS13335 network:

Clearly between 10:30 UTC and 12:40 UTC there’s a lot of route activity - far more than normal. Note that, as we mentioned above, RIPEstat doesn’t get a full view of Verizon’s network routing and hence some of the propagated routes won’t show up.

Drilling down on the AS-PATH part of the data

Having the routes is useful, but now we want to look at the paths of these leaked routes to see which ASNs are involved. We knew the offending ASNs during the route leak on Monday. Now we want to dig deeper using the archived data. This allows us to see the extent and reach of this route leak.

$ # Extract the AS-PATH of the route leak
$ # Use the list of routes to extract the full AS-PATH
$ # Merge the results together to show an amalgamation of paths.
$ # We know (luckily) the last few ASNs in the AS-PATH are consistent
$ cut -f3 < 13335-listing-b.txt | tr -d '[\[\]]' |\
  awk '{ n=split($0, a, ","); printf "%50s\n", a[n-5] "_" a[n-4] "_" a[n-3] "_" a[n-2] "_" a[n-1] "_" a[n]; }' | sort -u
  174_701_396531_33154_3356_13335
 2497_701_396531_33154_174_13335
  577_701_396531_33154_3356_13335
 6939_701_396531_33154_174_13335
 1239_701_396531_33154_3356_13335
 1273_701_396531_33154_3356_13335
 1280_701_396531_33154_3356_13335
 2497_701_396531_33154_3356_13335
 2516_701_396531_33154_3356_13335
 3320_701_396531_33154_3356_13335
 3491_701_396531_33154_3356_13335
 4134_701_396531_33154_3356_13335
 4637_701_396531_33154_3356_13335
 6453_701_396531_33154_3356_13335
 6461_701_396531_33154_3356_13335
 6762_701_396531_33154_3356_13335
 6830_701_396531_33154_3356_13335
 6939_701_396531_33154_3356_13335
 7738_701_396531_33154_3356_13335
12956_701_396531_33154_3356_13335
17639_701_396531_33154_3356_13335
23148_701_396531_33154_3356_13335
$

This script clearly shows the AS-PATH of the leaked routes. It’s very consistent. Reading from the back of the line to the front, we have 13335 (Cloudflare), then 3356 or 174 (Level3/CenturyLink or Cogent - both tier 1 transit providers for Cloudflare). So far, so good. Then we have 33154 (DQE Communications) and 396531 (Allegheny Technologies Inc), which is still technically not a leak, but trending that way. The reason we can state this is still technically not a leak is that we don’t know the relationship between those two ASs. It’s possible they have a mutual transit agreement between them. That’s up to them.

Back to the AS-PATHs. Still reading leftwards, we see 701 (Verizon), which is very, very bad and clear evidence of a leak. It’s a leak for two reasons. First, this matches the path of a transit leaking a non-customer route from a customer; this Verizon customer does not have 13335 (Cloudflare) listed as a customer. Second, the route contains within its path a tier 1 ASN. This is the point where a route leak should have been absolutely squashed by filtering on the customer BGP session. Beyond this point there be dragons.

And dragons there be! Everything above is about how Verizon filtered (or didn’t filter) its customer. What follows the 701 (i.e., the number to the left of it) is the peers or customers of Verizon that have accepted these leaked routes.
They are mainly other tier 1 networks of Verizon in this list: 174 (Cogent), 1239 (Sprint), 1273 (Vodafone), 3320 (DTAG), 3491 (PCCW), 6461 (Zayo), 6762 (Telecom Italia), etc.

What’s missing from that list are three networks worthy of mentioning - 1299 (Telia), 2914 (NTT), and 7018 (AT&T). All three implement a very simple AS-PATH filter which saved the day for their networks: they do not allow one tier 1 ISP to send them a route which has another tier 1 further down the path. That’s because when that happens, it’s officially a leak, as each tier 1 is fully connected to all other tier 1’s (which is part of the definition of a tier 1 network). The topology of the Internet’s global BGP routing tables simply states that if you see another tier 1 in the path, then it’s a bad route and it should be filtered away.

Additionally, we know that 7018 (AT&T) operates a network which drops RPKI invalids. Because Cloudflare routes are RPKI signed, this also means that AT&T would have dropped these routes when it received them from Verizon. This shows a clear win for RPKI (and for AT&T when you see their bandwidth graph below)!

BREAKING - AT&T / AS 7018 is now rejecting RPKI Invalid BGP announcements they receive from their peering partners. This is big news for routing security! If AT&T can do it - you can do it! :-) https://t.co/FhjmWByagO— Job Snijders (@JobSnijders) February 11, 2019

That all said, keep in mind we are still talking about routes that Cloudflare didn't announce. They all came from the route optimizer.

What should the 701 Verizon network accept from their customer 396531?

This is a great question to ask. Normally we would look at the IRR (Internet Routing Registries) to see what policy an ASN wants for its routes.

$ whois -h whois.radb.net AS396531 ; the Verizon customer
%  No entries found for the selected source(s).
$ whois -h whois.radb.net AS33154  ; the downstream of that customer
%  No entries found for the selected source(s).
$

That’s enough to say that we should not be seeing this ASN anywhere on the Internet; however, we should go further into checking this. As we know the ASN of the network, we can search for any routes that are listed for that ASN. We find one:

$ whois -h whois.radb.net ' -i origin AS396531' | egrep '^route|^origin|^mnt-by|^source'
route:          192.92.159.0/24
origin:         AS396531
mnt-by:         MNT-DCNSL
source:         ARIN
$

More importantly, now we have a maintainer (the owner of the routing registry entries). We can see what else is there for that network and we are specifically looking for this:

$ whois -h whois.radb.net ' -i mnt-by -T as-set MNT-DCNSL' | egrep '^as-set|^members|^mnt-by|^source'
as-set:         AS-DQECUST
members:        AS4130, AS5050, AS11199, AS11360, AS12017, AS14088, AS14162,
                AS14740, AS15327, AS16821, AS18891, AS19749, AS20326, AS21764,
                AS26059, AS26257, AS26461, AS27223, AS30168, AS32634, AS33039,
                AS33154, AS33345, AS33358, AS33504, AS33726, AS40549, AS40794,
                AS54552, AS54559, AS54822, AS393456, AS395440, AS396531,
                AS15204, AS54119, AS62984, AS13659, AS54934, AS18572, AS397284
mnt-by:         MNT-DCNSL
source:         ARIN
$

This object is important. It lists all the downstream ASNs that this network is expected to announce to the world. It does not contain Cloudflare’s ASN (or any of the leaked ASNs). Clearly this as-set was not used for any BGP filtering.
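It's worth making the tier 1 path filter described above concrete. A minimal sketch of that check (ours - real implementations live in router policy configurations, and the tier 1 list here is abbreviated) looks like this:

```c
/* A sketch of the simple AS-PATH sanity filter described above, of
 * the kind Telia, NTT, and AT&T apply: when a route arrives from a
 * tier 1 peer, reject it if another tier 1 appears anywhere further
 * down the path. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static const uint32_t tier1[] = { 174, 701, 1239, 1273, 1299, 2914,
                                  3320, 3356, 3491, 6453, 6461, 6762, 7018 };

static bool is_tier1(uint32_t asn) {
    for (size_t i = 0; i < sizeof(tier1) / sizeof(tier1[0]); i++)
        if (tier1[i] == asn)
            return true;
    return false;
}

/* path[0] is the tier 1 neighbor we learned the route from. */
static bool accept_from_tier1_peer(const uint32_t *path, size_t len) {
    for (size_t i = 1; i < len; i++)
        if (is_tier1(path[i]))
            return false;  /* a tier 1 behind a tier 1: must be a leak */
    return true;
}
```

For the leaked path 701_396531_33154_3356_13335, the filter sees 3356 (Level3/CenturyLink) behind 701 (Verizon) and rejects the route - exactly what Telia, NTT, and AT&T did.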
Just for completeness, the same exercise can be done for the other ASN (the downstream of the customer of Verizon). In this case, we just searched for the maintainer object (as there are plenty of route and route6 objects listed).

$ whois -h whois.radb.net ' -i origin AS33154' | egrep '^mnt-by' | sort -u
mnt-by:         MNT-DCNSL
mnt-by:         MAINT-AS3257
mnt-by:         MAINT-AS5050
$

None of these maintainers are directly related to 33154 (DQE Communications). They have been created by other parties and hence they become a dead-end in that search. It’s worth doing a secondary search to see if any as-set object exists with 33154 or 396531 included. We turned to the most excellent IRR Explorer website run by NLNOG. It provides deep insight into the routing registry data. We did a simple search for 33154 using http://irrexplorer.nlnog.net/search/33154 and we found these as-set objects. It’s interesting to see this ASN listed in other as-sets, but none are important or related to Monday’s route leak. Next we looked at 396531. This shows that there’s nowhere else we need to check. AS-DQECUST is the as-set macro that controls (or should control) filtering for any transit provider of their network.

The summary of all the investigation is a solid statement that no Cloudflare routes or ASNs are listed anywhere within the routing registry data for the customer of Verizon. As there were 2,300 ASNs listed in the tweet above, we can conclusively state no filtering was in place and hence this route leak went on its way unabated.

IPv6? Where is the IPv6 route leak?

In what could be considered the only plus from Monday’s route leak, we can confirm that there was no route leak within IPv6 space. Why? It turns out that 396531 (Allegheny Technologies Inc) is a network without IPv6 enabled. Normally you would hear Cloudflare chastise anyone that’s yet to enable IPv6; however, in this case we are quite happy that one of the two protocol families survived. IPv6 was stable during this route leak, which can now be called an IPv4-only route leak.

Yet that’s not really the whole story. Let’s look at the percentage of traffic Cloudflare sends Verizon that’s IPv6 (vs IPv4). Normally the IPv4/IPv6 percentage holds steady.

This uptick in IPv6 traffic could be the direct result of Happy Eyeballs on mobile handsets picking a working IPv6 path into Cloudflare vs a non-working IPv4 path. Happy Eyeballs is meant to protect against IPv6 failure; however, in this case it’s doing a wonderful job in protecting from an IPv4 failure. Yet we have to be careful with this graph, because after further thought and investigation the percentage only increased because IPv4 traffic reduced. Sometimes graphs can be misinterpreted, yet Happy Eyeballs still did a good job even as end users were being affected.

Happy Eyeballs, described in RFC8305, is a mechanism where a client (let’s say a mobile device) tries to connect to a website both on IPv4 and IPv6 simultaneously. IPv6 is sometimes given a head start. The theory is that, should a failure exist on one of the paths (sadly IPv6 is the norm), then IPv4 will save the day. Monday was a day of opposites for Verizon. In fact, enabling IPv6 for mobile users is the one area where we can praise the Verizon network this week (or at least the Verizon mobile network), unlike the residential Verizon networks where IPv6 is almost non-existent.

Using bandwidth graphs to confirm routing leaks and stability

As we have already stated, Verizon had impacted their own users/customers. Let’s start with their bandwidth graph. The red line is 24 June 2019 (00:00 UTC to 00:00 UTC the next day). The gray lines are previous days for comparison.
This graph includes both Verizon fixed-line services like FiOS along with mobile.The AT&T graph is quite different.There’s no perturbation. This, along with some direct confirmation, shows that 7018 (AT&T) was not affected. This is an important point.Going back and looking at a third tier 1 network, we can see 6762 (Telecom Italia) affected by this route leak and yet Cloudflare has a direct interconnect with them.We will be asking Telecom Italia to improve their route filtering as we now have this data.Future work that could have helped on MondayThe IETF is doing work in the area of BGP path protection within the Secure Inter-Domain Routing Operations Working Group (sidrops) area. The charter of this IETF group is:The SIDR Operations Working Group (sidrops) develops guidelines for the operation of SIDR-aware networks, and provides operational guidance on how to deploy and operate SIDR technologies in existing and new networks.One new effort from this group should be called out to show how important the issue of route leaks like today’s event is. The draft document from Alexander Azimov et al named draft-ietf-sidrops-aspa-profile (ASPA stands for Autonomous System Provider Authorization) extends the RPKI data structures to handle BGP path information. This in ongoing work and Cloudflare and other companies are clearly interested in seeing it progress further.However, as we said in Monday’s blog and something we should reiterate again and again: Cloudflare encourages all network operators to deploy RPKI now!Acronyms used in the blog API - Application Programming Interface AS-PATH - The list of ASNs that a routes has traversed so far ASN - Autonomous System Number - A unique number assigned for each network on the Internet BGP - Border Gateway Protocol (version 4) - the core routing protocol for the Internet IETF - Internet Engineering Task Force - an open standards organization IPv4 - Internet Protocol version 4 IPv6 - Internet Protocol version 6 IRR - Internet Routing Registries - a database of Internet route objects ISP - Internet Service Provider JSON - JavaScript Object Notation - a lightweight data-interchange format RFC - Request For Comment - published by the IETF RIPE NCC - Réseaux IP Européens Network Coordination Centre - a regional Internet registry ROA - Route Origin Authorization - a cryptographically signed attestation of a BGP route announcement RPKI - Resource Public Key Infrastructure - a public key infrastructure framework for routing information Tier 1 - A network that has no default route and peers with all other tier 1's UTC - Coordinated Universal Time - a time standard for clocks and time "there be dragons" - a mistype, as it was meant to be "here be dragons" which means dangerous or unexplored territories

The deep-dive into how Verizon and a BGP Optimizer Knocked Large Parts of the Internet Offline Monday

A recap on what happened Monday

On Monday we wrote about a painful Internet-wide route leak. We wrote that this should never have happened because Verizon should never have forwarded those routes to the rest of the Internet. That blog entry came out around 19:58 UTC, just over seven hours after the route leak finished (which, as we will see below, was around 12:39 UTC). Today we will dive into the archived routing data and analyze it. The code below uses simple shell commands so that any reader can follow along and, more importantly, do their own investigations on the routing tables.

This was a very public BGP route leak event. It was reported online by many news outlets, and the event's BGP data was shared on social media as it was happening. Andree Toonk tweeted a quick list of 2,400 ASNs that were affected.

Quick dumps through the data, showing about 2400 ASns (networks) affected. Cloudflare being hit the hardest. Top 20 of affected ASns below pic.twitter.com/9J7uvyasw2— Andree Toonk (@atoonk) June 24, 2019

Using RIPE NCC archived data

The RIPE NCC operates a very useful archive of BGP routing. It runs collectors globally and provides an API for querying the data. More can be seen at https://stat.ripe.net/. In the world of BGP, all routing is public (within the ability of anyone collecting data to have enough collection points). The archived data is very valuable for research, and that's what we will do in this blog. The site can also create some very useful data visualizations.

Dumping the RIPEstat data for this event

Presently, the RIPEstat data gets ingested around eight to twelve hours after real-time. It's not meant to be a real-time service. The data can be queried in many ways, including a full web interface and an API. We are using the API to extract the data in JSON format. We are going to focus only on the Cloudflare routes that were leaked. Many other ASNs were leaked (see the tweet above); however, we want to deal with a finite data set and focus on what happened to Cloudflare's routing. All the commands below can be run with ease on many systems. The following was done on a MacBook Pro running macOS Mojave.

First we collect 24 hours of route announcements and AS-PATH data that RIPEstat sees coming from AS13335 (Cloudflare).

$ # Collect 24 hours of data - more than enough
$ ASN="AS13335"
$ START="2019-06-24T00:00:00"
$ END="2019-06-25T00:00:00"
$ ARGS="resource=${ASN}&starttime=${START}&endtime=${END}"
$ URL="https://stat.ripe.net/data/bgp-updates/data.json?${ARGS}"
$ # Fetch the data from RIPEstat
$ curl -sS "${URL}" | jq . > 13335-routes.json
$ ls -l 13335-routes.json
-rw-r--r--  1 martin  staff  339363899 Jun 25 08:47 13335-routes.json
$

That's 340MB of data - which seems like a lot, but it contains plenty of white space and plenty of data we just don't need. Our second task is to reduce this raw data down to just the required fields: timestamps, actual routes, and AS-PATH. The third item will be very useful. Note we are using jq, which can be installed on macOS with the brew package manager.

$ # Extract just the times, routes, and AS-PATH
$ jq -rc '.data.updates[]|.timestamp,.attrs.target_prefix,.attrs.path' < 13335-routes.json | paste - - - > 13335-listing-a.txt
$ wc -l 13335-listing-a.txt
691318 13335-listing-a.txt
$

We are down to just below seven hundred thousand routing events. However, that's not a leak; that's everything that includes Cloudflare's ASN (the number 13335 above).
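Before filtering down to the leak itself, it's worth sanity-checking that the leak window stands out in the raw data. Here's a minimal sketch, using only the file built above, that buckets the update timestamps into ten-minute windows (the first 15 characters of each ISO timestamp); the leak period should dominate the counts (output omitted here):

$ # Count updates per ten-minute bucket
$ cut -f1 13335-listing-a.txt | cut -c1-15 | sort | uniq -c | sort -k2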
For that we need to go back to Monday's blog and recall it was AS396531 (Allegheny Technologies) that showed up with 701 (Verizon) in the leak. Now we reduce the data further:

$ # Extract the route leak 701,396531
$ # AS701 is Verizon and AS396531 is Allegheny Technologies
$ egrep '701,396531' < 13335-listing-a.txt > 13335-listing-b.txt
$ wc -l 13335-listing-b.txt
204568 13335-listing-b.txt
$

At 204 thousand data points, we are looking better. It's still a lot of data because BGP can be very chatty if topology is changing, and a route leak will cause exactly that. Now let's see how many routes were affected:

$ # Extract the actual routes affected
$ cut -f2 < 13335-listing-b.txt | sort -n -u > 13335-listing-c.txt
$ wc -l 13335-listing-c.txt
20 13335-listing-c.txt
$

That's a much smaller number. We now have a listing of at least twenty routes that were leaked via Verizon. This may not be the full list, because route collectors like RIPEstat don't have direct feeds from Verizon, so this data is a blended view of Verizon's path and other paths. Here's the listing of affected routes:

$ cat 13335-listing-c.txt
8.39.214.0/24
8.42.245.0/24
8.44.58.0/24
104.16.80.0/21
104.17.168.0/21
104.18.32.0/21
104.19.168.0/21
104.20.64.0/21
104.22.8.0/21
104.23.128.0/21
104.24.112.0/21
104.25.144.0/21
104.26.0.0/21
104.27.160.0/21
104.28.16.0/21
104.31.0.0/21
141.101.120.0/23
162.159.224.0/21
172.68.60.0/22
172.69.116.0/22
$

This is an interesting list, as some of these routes do not originate from Cloudflare's network; however, they show up with AS13335 (our ASN) as the originator. For example, the 104.26.0.0/21 route is not announced from our network, but we do announce 104.26.0.0/20 (which covers that route). More importantly, we have an IRR route object plus an RPKI ROA for that block. Here's the IRR object:

route: 104.26.0.0/20
origin: AS13335
source: ARIN

And here's the RPKI ROA. This ROA has maxLength set to 20, so no smaller route should be accepted.

Prefix: 104.26.0.0/20
Max Length: /20
ASN: 13335
Trust Anchor: ARIN
Validity: Thu, 02 Aug 2018 04:00:00 GMT - Sat, 31 Jul 2027 04:00:00 GMT
Emitted: Thu, 02 Aug 2018 21:45:37 GMT
Name: 535ad55d-dd30-40f9-8434-c17fc413aa99
Key: 4a75b5de16143adbeaa987d6d91e0519106d086e
Parent Key: a6e7a6b44019cf4e388766d940677599d0c492dc
Path: rsync://rpki.arin.net/repository/arin-rpki-ta/5e4a23ea-...

The Max Length field in a ROA sets the longest prefix (that is, the smallest block) the origin is allowed to announce. The fact that this is a /20 route with a /20 Max Length means that a /21 (or /22 or /23 or /24) within this IP space isn't allowed. Looking further at the route list above, we get the following listing:

Route Seen         Cloudflare IRR & ROA   ROA Max Length
104.16.80.0/21  -> 104.16.80.0/20         /20
104.17.168.0/21 -> 104.17.160.0/20        /20
104.18.32.0/21  -> 104.18.32.0/20         /20
104.19.168.0/21 -> 104.19.160.0/20        /20
104.20.64.0/21  -> 104.20.64.0/20         /20
104.22.8.0/21   -> 104.22.0.0/20          /20
104.23.128.0/21 -> 104.23.128.0/20        /20
104.24.112.0/21 -> 104.24.112.0/20        /20
104.25.144.0/21 -> 104.25.144.0/20        /20
104.26.0.0/21   -> 104.26.0.0/20          /20
104.27.160.0/21 -> 104.27.160.0/20        /20
104.28.16.0/21  -> 104.28.16.0/20         /20
104.31.0.0/21   -> 104.31.0.0/20          /20
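As a cross-check, RIPEstat also offers an RPKI validation endpoint that compares an origin/prefix pair against the published ROAs. The endpoint and field names below are as documented on stat.ripe.net; treat this as a sketch rather than a stable API reference. Given the /20 Max Length above, asking about the leaked /21 should report the announcement as invalid:

$ # Validate the leaked /21 against AS13335's published ROAs
$ curl -sS "https://stat.ripe.net/data/rpki-validation/data.json?resource=AS13335&prefix=104.26.0.0/21" | jq -r '.data.status'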
So how did all these /21's show up? That's where we dive into the world of BGP route optimization systems and their propensity to synthesize routes that should not exist. If those routes leak (and it's very clear after this week that they can), all hell breaks loose. That can be compounded when not one, but two ISPs allow invalid routes to be propagated outside their autonomous network. We will explore the AS-PATH further down this blog.

More than 20 years ago, RFC1997 added the concept of "communities" to BGP. Communities are a way of tagging or grouping route advertisements. Communities are often used to label routes so that specific handling policies can be applied. RFC1997 includes a small number of universal "well-known" communities. One of these is the NO_EXPORT community, which has the following specification: All routes received carrying a communities attribute containing this value MUST NOT be advertised outside a BGP confederation boundary (a stand-alone autonomous system that is not part of a confederation should be considered a confederation itself). The use of the NO_EXPORT community is very common within BGP-enabled networks, and it is a community tag that would have helped alleviate this route leak immensely. How BGP route optimization systems work (or don't work, in this case) could be the subject of a whole other blog entry.

Timing of the route leak

As we saved the timestamps in the JSON file and in the text files, we can confirm the time for every route in the route leak by looking at the first and the last timestamp of a route in the data. We saved data from 00:00:00 UTC until 00:00:00 the next day, so we know we have covered the period of the route leak. We write a script that checks the first and last entry for every route and reports the information sorted by start time:

$ while read cidr
do
  echo $cidr
  fgrep $cidr < 13335-listing-b.txt | head -1 | cut -f1
  fgrep $cidr < 13335-listing-b.txt | tail -1 | cut -f1
done < 13335-listing-c.txt |\
paste - - - | sort -k2,3 | column -t | sed -e 's/2019-06-24T//g'
104.25.144.0/21   10:34:25  12:38:54
104.22.8.0/21     10:34:27  12:29:39
104.20.64.0/21    10:34:27  12:30:00
104.23.128.0/21   10:34:27  12:30:34
141.101.120.0/23  10:34:27  12:30:39
162.159.224.0/21  10:34:27  12:30:39
104.18.32.0/21    10:34:29  12:30:34
104.24.112.0/21   10:34:29  12:30:34
104.27.160.0/21   10:34:29  12:30:34
104.28.16.0/21    10:34:29  12:30:34
104.31.0.0/21     10:34:29  12:30:34
8.39.214.0/24     10:34:31  12:19:24
104.26.0.0/21     10:34:36  12:29:53
172.68.60.0/22    10:34:38  12:19:24
172.69.116.0/22   10:34:38  12:19:24
8.44.58.0/24      10:34:38  12:19:24
8.42.245.0/24     11:52:49  11:53:19
104.17.168.0/21   12:00:13  12:29:34
104.16.80.0/21    12:00:13  12:30:00
104.19.168.0/21   12:09:39  12:29:34
$

Now we know the times. The route leak started at 10:34:25 UTC (just before lunchtime London time) on 2019-06-24 and ended at 12:38:54 UTC. That's a hair over two hours. Here's that same time data in graphical form, showing the near-instant start of the event and the duration of each route leaked:

We can also go back to RIPEstat and look at the activity graph for Cloudflare's AS13335 network:

Clearly between 10:30 UTC and 12:40 UTC there's a lot of route activity - far more than normal. Note that, as we mentioned above, RIPEstat doesn't get a full view of Verizon's network routing, and hence some of the propagated routes won't show up.
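From that start/end listing it's a one-liner to get per-route leak durations. A small sketch, assuming the column output above has been saved to a file (leak-times.txt is a hypothetical name):

$ # Print how long each route was leaked, in seconds
$ # (leak-times.txt: the start/end listing above, saved to a file)
$ awk '{
    split($2, s, ":"); split($3, e, ":")
    printf "%-18s %5d seconds\n", $1, (e[1]*3600 + e[2]*60 + e[3]) - (s[1]*3600 + s[2]*60 + s[3])
  }' leak-times.txt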
Drilling down on the AS-PATH part of the data

Having the routes is useful, but now we want to look at the paths of these leaked routes to see which ASNs are involved. We knew the offending ASNs during the route leak on Monday. Now we want to dig deeper using the archived data. This allows us to see the extent and reach of this route leak.

$ # Use the list of routes to extract the full AS-PATH
$ # Merge the results together to show an amalgamation of paths.
$ # We know (luckily) the last few ASNs in the AS-PATH are consistent
$ cut -f3 < 13335-listing-b.txt | tr -d '[\[\]]' |\
awk '{ n=split($0, a, ","); printf "%50s\n", a[n-5] "_" a[n-4] "_" a[n-3] "_" a[n-2] "_" a[n-1] "_" a[n]; }' | sort -u
174_701_396531_33154_3356_13335
2497_701_396531_33154_174_13335
577_701_396531_33154_3356_13335
6939_701_396531_33154_174_13335
1239_701_396531_33154_3356_13335
1273_701_396531_33154_3356_13335
1280_701_396531_33154_3356_13335
2497_701_396531_33154_3356_13335
2516_701_396531_33154_3356_13335
3320_701_396531_33154_3356_13335
3491_701_396531_33154_3356_13335
4134_701_396531_33154_3356_13335
4637_701_396531_33154_3356_13335
6453_701_396531_33154_3356_13335
6461_701_396531_33154_3356_13335
6762_701_396531_33154_3356_13335
6830_701_396531_33154_3356_13335
6939_701_396531_33154_3356_13335
7738_701_396531_33154_3356_13335
12956_701_396531_33154_3356_13335
17639_701_396531_33154_3356_13335
23148_701_396531_33154_3356_13335
$

This output clearly shows the AS-PATH of the leaked routes. It's very consistent. Reading from the back of the line to the front, we have 13335 (Cloudflare), then 3356 or 174 (Level3/CenturyLink or Cogent - both tier 1 transit providers for Cloudflare). So far, so good. Then we have 33154 (DQE Communications) and 396531 (Allegheny Technologies Inc), which is still technically not a leak. The reason we can say it's still technically not a leak is that we don't know the relationship between those two ASes. It's possible they have a mutual transit agreement between them. That's up to them.

Back to the AS-PATHs. Still reading leftwards, we see 701 (Verizon), which is very, very bad and clear evidence of a leak. It's a leak for two reasons. First, this matches the path seen when a transit provider leaks a non-customer route learned from a customer; this Verizon customer does not have 13335 (Cloudflare) listed as a customer. Second, the route contains within its path a tier 1 ASN. This is the point where a route leak should have been absolutely squashed by filtering on the customer BGP session. Beyond this point there be dragons.

And dragons there be! Everything above is about how Verizon filtered (or didn't filter) its customer. What follows the 701 (i.e. the number to the left of it) is the peers or customers of Verizon that accepted these leaked routes. They are mainly other tier 1 networks of Verizon in this list: 174 (Cogent), 1239 (Sprint), 1273 (Vodafone), 3320 (DTAG), 3491 (PCCW), 6461 (Zayo), 6762 (Telecom Italia), etc.

What's missing from that list are three networks worth mentioning - 1299 (Telia), 2914 (NTT), and 7018 (AT&T). All three implement a very simple AS-PATH filter which saved the day for their networks: they do not allow one tier 1 ISP to send them a route which has another tier 1 further down the path. When that happens, it's officially a leak, as each tier 1 is fully connected to all other tier 1's (which is part of the definition of a tier 1 network). The topology of the Internet's global BGP routing tables simply dictates that if you see another tier 1 in the path, then it's a bad route and it should be filtered away.
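Here is a simplified sketch of that AS-PATH sanity check, run against the amalgamated paths above (saved to a hypothetical paths.txt, with the underscores converted back to spaces). The tier 1 list is illustrative, not exhaustive, and real routers implement this in routing policy rather than shell, but the logic is the same: reject any path that contains more than one tier 1 network.

$ # paths.txt: the sort -u output above, one underscore-joined path per line
$ TIER1="174 701 1239 1273 1299 2914 3320 3356 3491 6453 6461 6762 7018"
$ tr '_' ' ' < paths.txt | while read -r path
do
  count=0
  for asn in $path
  do
    # Count how many tier 1 ASNs appear anywhere in the path
    case " $TIER1 " in
      *" $asn "*) count=$((count+1)) ;;
    esac
  done
  [ "$count" -gt 1 ] && echo "REJECT (leak): $path"
done

Every one of the leaked paths above trips this check, because each contains Verizon (701) plus at least one more tier 1 (174 or 3356) deeper in the path.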
Additionally, we know that 7018 (AT&T) operates a network which drops RPKI invalids. Because Cloudflare routes are RPKI signed, this also means that AT&T would have dropped these routes when it received them from Verizon. This shows a clear win for RPKI (and for AT&T, as you can see in their bandwidth graph below)!

BREAKING - AT&T / AS 7018 is now rejecting RPKI Invalid BGP announcements they receive from their peering partners. This is big news for routing security! If AT&T can do it - you can do it! :-) https://t.co/FhjmWByagO— Job Snijders (@JobSnijders) February 11, 2019

That all said, keep in mind we are still talking about routes that Cloudflare didn't announce. They all came from the route optimizer.

What should 701 Verizon network accept from their customer 396531?

This is a great question to ask. Normally we would look at the IRR (Internet Routing Registries) to see what policy an ASN wants for its routes.

$ whois -h whois.radb.net AS396531   # the Verizon customer
% No entries found for the selected source(s).
$ whois -h whois.radb.net AS33154    # the downstream of that customer
% No entries found for the selected source(s).
$

That's enough to say that we should not be seeing this ASN anywhere on the Internet; however, we should go further in checking this. As we know the ASN of the network, we can search for any routes that are listed for that ASN. We find one:

$ whois -h whois.radb.net ' -i origin AS396531' | egrep '^route|^origin|^mnt-by|^source'
route: 192.92.159.0/24
origin: AS396531
mnt-by: MNT-DCNSL
source: ARIN
$

More importantly, now we have a maintainer (the owner of the routing registry entries). We can see what else is there for that network, and we are specifically looking for this:

$ whois -h whois.radb.net ' -i mnt-by -T as-set MNT-DCNSL' | egrep '^as-set|^members|^mnt-by|^source'
as-set: AS-DQECUST
members: AS4130, AS5050, AS11199, AS11360, AS12017, AS14088, AS14162, AS14740, AS15327, AS16821, AS18891, AS19749, AS20326, AS21764, AS26059, AS26257, AS26461, AS27223, AS30168, AS32634, AS33039, AS33154, AS33345, AS33358, AS33504, AS33726, AS40549, AS40794, AS54552, AS54559, AS54822, AS393456, AS395440, AS396531, AS15204, AS54119, AS62984, AS13659, AS54934, AS18572, AS397284
mnt-by: MNT-DCNSL
source: ARIN
$

This object is important. It lists all the downstream ASNs that this network is expected to announce to the world. It does not contain Cloudflare's ASN (or any of the leaked ASNs). Clearly this as-set was not used for any BGP filtering.

Just for completeness, the same exercise can be done for the other ASN (the downstream of the customer of Verizon). In this case, we just searched for the maintainer object (as there are plenty of route and route6 objects listed).

$ whois -h whois.radb.net ' -i origin AS33154' | egrep '^mnt-by' | sort -u
mnt-by: MNT-DCNSL
mnt-by: MAINT-AS3257
mnt-by: MAINT-AS5050
$

None of these maintainers are directly related to 33154 (DQE Communications). They have been created by other parties and hence they become a dead-end in that search.

It's worth doing a secondary search to see if any as-set object exists with 33154 or 396531 included. We turned to the most excellent IRR Explorer website run by NLNOG. It provides deep insight into the routing registry data. We did a simple search for 33154 using http://irrexplorer.nlnog.net/search/33154 and we found these as-set objects. It's interesting to see this ASN listed in other as-sets, but none are important or related to Monday's route leak. Next we looked at 396531. This shows that there's nowhere else we need to check: AS-DQECUST is the as-set macro that controls (or should control) filtering for any transit provider of their network.
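That membership check is easy to script. A rough sketch, using only the whois queries shown above: it reads the members: lines of the as-set and looks for an exact ASN match. (RPSL objects can wrap members across continuation lines, which this one-level sketch ignores; recursive as-set expansion is what dedicated tools automate.)

$ # Is AS13335 authorized by AS-DQECUST? (It shouldn't be - and isn't.)
$ whois -h whois.radb.net AS-DQECUST |
  awk '/^members:/ { for (i = 2; i <= NF; i++) print $i }' |
  tr -d ',' | grep -x 'AS13335' \
  && echo "AS13335 found in AS-DQECUST" \
  || echo "AS13335 NOT in AS-DQECUST"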
The summary of all this investigation is a solid statement: no Cloudflare routes or ASNs are listed anywhere within the routing registry data for the customer of Verizon. As there were around 2,400 ASNs listed in the tweet above, we can conclusively state that no filtering was in place, and hence this route leak went on its way unabated.

IPv6? Where is the IPv6 route leak?

In what could be considered the only plus from Monday's route leak, we can confirm that there was no route leak within IPv6 space. Why? It turns out that 396531 (Allegheny Technologies Inc) is a network without IPv6 enabled. Normally you would hear Cloudflare chastise anyone that's yet to enable IPv6; however, in this case we are quite happy that one of the two protocol families survived. IPv6 was stable during this route leak, which can now be called an IPv4-only route leak.

Yet that's not really the whole story. Let's look at the percentage of traffic Cloudflare sends Verizon that's IPv6 (vs IPv4). Normally the IPv4/IPv6 percentage holds steady; during the leak, IPv6's share rose noticeably. This uptick in IPv6 traffic could be the direct result of Happy Eyeballs on mobile handsets picking a working IPv6 path into Cloudflare vs a non-working IPv4 path. Happy Eyeballs is meant to protect against IPv6 failure; in this case it did a wonderful job of protecting against an IPv4 failure. Yet we have to be careful with this graph, because on further thought and investigation the percentage only increased because the IPv4 traffic fell away. Sometimes graphs can be misinterpreted, yet Happy Eyeballs still did a good job even as end users were being affected.

Happy Eyeballs, described in RFC8305, is a mechanism where a client (let's say a mobile device) tries to connect to a website over both IPv4 and IPv6 simultaneously. IPv6 is sometimes given a head start. The theory is that, should a failure exist on one of the paths (sadly, a failing IPv6 path is the norm), then IPv4 will save the day. Monday was a day of opposites for Verizon. In fact, enabling IPv6 for mobile users is the one area where we can praise the Verizon network this week (or at least the Verizon mobile network), unlike the residential Verizon networks, where IPv6 is almost non-existent.
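You can observe the same race that Happy Eyeballs runs with two curl invocations, forcing each address family in turn. This is a minimal client-side sketch, not what a handset actually does internally:

$ # Compare IPv4 vs IPv6 TCP connect times to the same site
$ for flag in -4 -6
do
  printf '%s connect time: ' "$flag"
  curl "$flag" -s -o /dev/null -w '%{time_connect}s\n' https://www.cloudflare.com/ || echo 'failed'
done

During Monday's leak, the -4 attempt would have stalled or failed from affected networks while the -6 attempt connected normally.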
Using bandwidth graphs to confirm routing leaks and stability

As we have already stated, Verizon impacted their own users and customers. Let's start with their bandwidth graph. The red line is 24 June 2019 (00:00 UTC to 00:00 UTC the next day). The gray lines are previous days for comparison. This graph includes both Verizon fixed-line services like FiOS along with mobile.

The AT&T graph is quite different. There's no perturbation. This, along with some direct confirmation, shows that 7018 (AT&T) was not affected. This is an important point.

Going back and looking at a third tier 1 network, we can see 6762 (Telecom Italia) was affected by this route leak, even though Cloudflare has a direct interconnect with them. We will be asking Telecom Italia to improve their route filtering now that we have this data.

Future work that could have helped on Monday

The IETF is doing work in the area of BGP path protection within the Secure Inter-Domain Routing Operations Working Group (sidrops). The charter of this IETF group is: The SIDR Operations Working Group (sidrops) develops guidelines for the operation of SIDR-aware networks, and provides operational guidance on how to deploy and operate SIDR technologies in existing and new networks.

One new effort from this group deserves a mention because it speaks directly to the kind of route leak seen on Monday. The draft document from Alexander Azimov et al named draft-ietf-sidrops-aspa-profile (ASPA stands for Autonomous System Provider Authorization) extends the RPKI data structures to handle BGP path information. This is ongoing work, and Cloudflare and other companies are clearly interested in seeing it progress further.

However, as we said in Monday's blog and should reiterate again and again: Cloudflare encourages all network operators to deploy RPKI now!

Acronyms used in the blog

API - Application Programming Interface
AS-PATH - the list of ASNs that a route has traversed so far
ASN - Autonomous System Number - a unique number assigned to each network on the Internet
BGP - Border Gateway Protocol (version 4) - the core routing protocol for the Internet
IETF - Internet Engineering Task Force - an open standards organization
IPv4 - Internet Protocol version 4
IPv6 - Internet Protocol version 6
IRR - Internet Routing Registries - a database of Internet route objects
ISP - Internet Service Provider
JSON - JavaScript Object Notation - a lightweight data-interchange format
RFC - Request For Comments - published by the IETF
RIPE NCC - Réseaux IP Européens Network Coordination Centre - a regional Internet registry
ROA - Route Origin Authorization - a cryptographically signed attestation of a BGP route announcement
RPKI - Resource Public Key Infrastructure - a public key infrastructure framework for routing information
Tier 1 - a network that has no default route and peers with all other tier 1's
UTC - Coordinated Universal Time - a time standard for clocks and time
"there be dragons" - a mistype, as it was meant to be "here be dragons", which means dangerous or unexplored territories

Deeper Connection with the Local Tech Community in India

On June 6th, 2019, Cloudflare hosted its first ever customer event in a beautiful and green district of Bangalore, India. More than 60 people, including executives, developers, engineers, and even university students, attended the half-day forum.

The forum kicked off with a series of presentations on the current DDoS landscape, cyber security trends, serverless computing, and Cloudflare Workers. Trey Quinn, Cloudflare Global Head of Solution Engineering, gave a brief introduction to the evolution of edge computing. We also invited business and thought leaders across various industries to share their insights and best practices on cyber security and performance strategy. Some of the keynote and panel sessions included live demos from our customers.

At this event, guests gained first-hand knowledge of the latest technology. They also learned insider tactics to help protect their business, accelerate performance, and identify quick wins in a complex Internet environment. To conclude the event, we arranged a dinner for guests to network and enjoy a cool summer night.

Through this event, Cloudflare has strengthened its connection with the local tech community. The success of the event owes much to Cloudflare's constant improvement and the continuous support of our customers in India. As the old saying goes, भारत महान है (India is great). India is an important market in the region, and Cloudflare will increase its investment and engagement to provide better services and user experience for customers in India.

Get Cloudflare insights in your preferred analytics provider

Today, we're excited to announce our partnerships with Chronicle Security, Datadog, Elastic, Looker, Splunk, and Sumo Logic to make it easy for our customers to analyze Cloudflare logs and metrics using their analytics provider of choice. In a joint effort, we have developed pre-built dashboards that are available as a Cloudflare App in each partner's platform. These dashboards help customers better understand events and trends from their websites and applications on our network.

Cloudflare insights in the tools you're already using

Data analytics is a frequent theme in conversations with Cloudflare customers. Our customers want to understand how Cloudflare speeds up their websites and saves them bandwidth, to see their fastest and slowest pages ranked, and to be alerted if they are under attack. While providing insights is a core tenet of Cloudflare's offering, the data analytics market has matured, and many of our customers have started using third-party providers to analyze data, including Cloudflare logs and metrics. By aggregating data from multiple applications, infrastructure, and cloud platforms in one dedicated analytics platform, customers can create a single pane of glass and benefit from better end-to-end visibility over their entire stack.

While these analytics platforms provide great benefits in terms of functionality and flexibility, they can take significant time to configure: from ingesting logs, to specifying data models that make data searchable, all the way to building dashboards to get the right insights out of the raw data. We see this as an opportunity to partner with the companies our customers are already using to offer a better and more integrated solution.

Providing flexibility through easy-to-use integrations

To address these complexities of aggregating, managing, and displaying data, we have developed a number of product features and partnerships to make it easier to get insights out of Cloudflare logs and metrics. In February we announced Logpush, which allows customers to automatically push Cloudflare logs to Google Cloud Storage and Amazon S3. Both of these cloud storage solutions are supported by the major analytics providers as a source for collecting logs, making it possible to get Cloudflare logs into an analytics platform with just a few clicks. With today's announcement of Cloudflare's Analytics Partnerships, we're releasing a Cloudflare App - a set of pre-built and fully customizable dashboards - in each partner's app store or integrations catalogue to make the experience even more seamless.

By using these dashboards, customers can immediately analyze events and trends of their websites and applications without first needing to wade through individual log files and build custom searches. The dashboards feature all 55+ fields available in Cloudflare logs and include 90+ panels with information about the performance, security, and reliability of customers' websites and applications.

Ultimately, we want to provide flexibility to our customers and make it easier to use Cloudflare with the analytics tools they already use. Improving our customers' ability to get better data and insights continues to be a focus for us, so we'd love to hear about what tools you're using - tell us via this brief survey. To learn more about each of our partnerships and how to get access to the dashboards, please visit our developer documentation or contact your Customer Success Manager.
Similarly, if you’re an analytics provider who is interested in partnering with us, use the contact form on our analytics partnerships page to get in touch.
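As an aside, the Logpush setup mentioned above can also be scripted. Below is a rough sketch of creating a Logpush job to an S3 bucket via the Cloudflare API; the endpoint shape follows the developer documentation, but treat the exact values (bucket URL, field list) as placeholders and check the docs for the current schema:

$ # Create a Logpush job pushing HTTP request logs to S3 (values are placeholders)
$ curl -s -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs" \
  -H "X-Auth-Email: $CF_EMAIL" \
  -H "X-Auth-Key: $CF_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "destination_conf": "s3://my-log-bucket/cloudflare?region=us-east-1",
    "logpull_options": "fields=ClientIP,ClientRequestHost,EdgeResponseStatus,EdgeStartTimestamp&timestamps=rfc3339"
  }'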

The Serverlist: Serverless makes a splash at JSConf EU and JSConf Asia

Check out our sixth edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend. Sign up below to have The Serverlist sent directly to your mailbox.

[Email newsletter signup form and embedded issue omitted]

How Verizon and a BGP Optimizer Knocked Large Parts of the Internet Offline Today

Massive route leak impacts major parts of the Internet, including Cloudflare

What happened?

Today at 10:30 UTC, the Internet had a small heart attack. A small company in Northern Pennsylvania became a preferred path for many Internet routes through Verizon (AS701), a major Internet transit provider. This was the equivalent of Waze routing an entire freeway down a neighborhood street, resulting in many websites on Cloudflare, and many other providers, being unavailable from large parts of the Internet. This should never have happened, because Verizon should never have forwarded those routes to the rest of the Internet. To understand why, read on.

We have blogged about these unfortunate events in the past, as they are not uncommon. This time, the damage was seen worldwide. What exacerbated the problem today was the involvement of a "BGP Optimizer" product from Noction. This product has a feature that splits up received IP prefixes into smaller, contributing parts (called more-specifics). For example, our own IPv4 route 104.20.0.0/20 was turned into 104.20.0.0/21 and 104.20.8.0/21. It's as if the road sign directing traffic to "Pennsylvania" was replaced by two road signs, one for "Pittsburgh, PA" and one for "Philadelphia, PA". By splitting these major IP blocks into smaller parts, a network has a mechanism to steer traffic within its own network, but that split should never have been announced to the world at large. When it was, it caused today's outage.

To explain what happened next, here's a quick summary of how the underlying "map" of the Internet works. "Internet" literally means a network of networks. It is made up of networks called Autonomous Systems (AS), and each of these networks has a unique identifier, its AS number. All of these networks are interconnected using a protocol called Border Gateway Protocol (BGP). BGP joins these networks together and builds the Internet "map" that enables traffic to travel from, say, your ISP to a popular website on the other side of the globe.

Using BGP, networks exchange route information: how to get to them from wherever you are. These routes can either be specific, similar to finding a specific city on your GPS, or very general, like pointing your GPS to a state. This is where things went wrong today. An Internet Service Provider in Pennsylvania (AS33154 - DQE Communications) was using a BGP optimizer in their network, which meant there were a lot of more-specific routes in their network. Specific routes override more general routes (in the Waze analogy, a route to, say, Buckingham Palace is more specific than a route to London).

DQE announced these specific routes to their customer (AS396531 - Allegheny Technologies Inc). All of this routing information was then sent to their other transit provider (AS701 - Verizon), who proceeded to tell the entire Internet about these "better" routes. These routes were supposedly "better" because they were more granular, more specific. The leak should have stopped at Verizon. However, against numerous best practices outlined below, Verizon's lack of filtering turned this into a major incident that affected many Internet services such as Amazon, Linode, and Cloudflare.
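To make the prefix splitting described above concrete, here is a small bash sketch (just arithmetic, not anything from the optimizer product) that splits a prefix into its two more-specific halves, reproducing the 104.20.0.0/20 example:

$ # Convert dotted-quad to a 32-bit integer and back
$ ip2int() { local IFS=.; read -r a b c d <<<"$1"; echo $(( (a<<24) | (b<<16) | (c<<8) | d )); }
$ int2ip() { echo "$(( ($1>>24) & 255 )).$(( ($1>>16) & 255 )).$(( ($1>>8) & 255 )).$(( $1 & 255 ))"; }
$ # Split a prefix into its two halves, each one bit longer
$ split_prefix() {
    local net=${1%/*} len=${1#*/}
    local base=$(ip2int "$net") step=$(( 1 << (32 - (len + 1)) ))
    echo "$(int2ip $base)/$((len + 1))"
    echo "$(int2ip $((base + step)))/$((len + 1))"
  }
$ split_prefix 104.20.0.0/20
104.20.0.0/21
104.20.8.0/21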
What this meant was that suddenly Verizon, Allegheny, and DQE had to deal with a stampede of Internet users trying to access those services through their networks. None of these networks were suitably equipped to deal with this drastic increase in traffic, causing disruption in service. Even if they had sufficient capacity, DQE, Allegheny, and Verizon were not allowed to say they had the best route to Cloudflare, Amazon, Linode, etc...

BGP leak process with a BGP optimizer

At the worst of the incident, we observed a loss of about 15% of our global traffic.

Traffic levels at Cloudflare during the incident.

How could this leak have been prevented?

There are multiple ways this leak could have been avoided.

First, a BGP session can be configured with a hard limit on the number of prefixes to be received. This means a router can decide to shut down a session if the number of prefixes goes above the threshold. Had Verizon had such a prefix limit in place, this would not have occurred. Maintaining prefix limits is a best practice, and it costs a provider like Verizon nothing. There's no good reason, other than sloppiness or laziness, not to have such limits in place.

Second, network operators can prevent leaks like this one by implementing IRR-based filtering. IRR is the Internet Routing Registry, and networks can add entries to these distributed databases. Other network operators can then use these IRR records to generate specific prefix lists for the BGP sessions with their peers. If IRR filtering had been used, none of the networks involved would have accepted the faulty more-specifics. What's quite shocking is that it appears Verizon didn't implement any such filtering in their BGP session with Allegheny Technologies, even though IRR filtering has been around (and well documented) for over 24 years. IRR filtering would not have increased Verizon's costs or limited their service in any way. Again, the only explanation we can conceive of for why it wasn't in place is sloppiness or laziness. (A sketch of what IRR-based filtering looks like in practice follows below.)

Third, the RPKI framework that we implemented and deployed globally last year is designed to prevent exactly this type of leak. It enables filtering on origin network and prefix size. The prefixes Cloudflare announces are signed for a maximum size of 20. RPKI then indicates that any more-specific prefix should not be accepted, no matter what the path is. For this mechanism to take action, a network needs to enable BGP Origin Validation. Many providers, like AT&T, have already enabled it successfully in their networks. If Verizon had used RPKI, they would have seen that the advertised routes were not valid, and the routes could have been automatically dropped by the router.

Cloudflare encourages all network operators to deploy RPKI now!

Route leak prevention using IRR, RPKI, and prefix limits

All of the above suggestions are nicely condensed into MANRS (Mutually Agreed Norms for Routing Security).
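For the IRR-based filtering point above, this is roughly what generating such a filter looks like with bgpq3, a widely used open-source tool that expands an as-set from the IRR into router prefix-list configuration. The invocation and list name here are illustrative (check your version's flags), and AS-DQECUST is the as-set examined in the deep-dive earlier in this post series. Had Verizon built and applied a filter like this on its customer session, none of the leaked more-specifics would have made it through:

$ # Expand the customer's as-set into an IPv4 prefix-list named FROM-AS396531
$ bgpq3 -4 -l FROM-AS396531 AS-DQECUST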
How it was resolved

The network team at Cloudflare reached out to the networks involved, AS33154 (DQE Communications) and AS701 (Verizon). We had difficulties reaching either network; this may have been due to the time of the incident, as it was still early on the East Coast of the US when the route leak started.

Screenshot of the email sent to Verizon

One of our network engineers made contact with DQE Communications quickly, and after a little delay they were able to put us in contact with someone who could fix the problem. DQE worked with us on the phone to stop advertising these "optimized" routes to Allegheny Technologies Inc. We're grateful for their help. Once this was done, the Internet stabilized, and things went back to normal.

Screenshot of attempts to communicate with the support for DQE and Verizon

It is unfortunate that, although we tried both e-mail and phone calls to reach Verizon, at the time of writing this article (over 8 hours after the incident) we have not heard back from them, nor are we aware of them taking action to resolve the issue.

At Cloudflare, we wish that events like this never took place, but unfortunately the current state of the Internet does very little to prevent incidents such as this one from occurring. It's time for the industry to adopt better routing security through systems like RPKI. We hope that major providers will follow the lead of Cloudflare, Amazon, and AT&T and start validating routes. And, in particular, we're looking at you, Verizon, and we're still waiting on your reply.

Despite this being caused by events outside our control, we're sorry for the disruption. Our team cares deeply about our service, and we had engineers in the US, UK, Australia, and Singapore online minutes after this problem was identified.
