Service Provider Blogs

Going Keyless Everywhere

Cloudflare Blog

Time flies. The Heartbleed vulnerability was discovered just over five and a half years ago. Heartbleed became a household name not only because it was one of the first bugs with its own web page and logo, but because of what it revealed about the fragility of the Internet as a whole. With Heartbleed, one tiny bug in a cryptography library exposed the personal data of the users of almost every website online.

Heartbleed is an example of an underappreciated class of bugs: remote memory disclosure vulnerabilities. High-profile examples other than Heartbleed include Cloudbleed and, most recently, NetSpectre. These vulnerabilities allow attackers to extract secrets from servers by simply sending them specially crafted packets. Cloudflare recently completed a multi-year project to make our platform more resilient against this category of bug.

For the last five years, the industry has been dealing with the consequences of the design that made Heartbleed so impactful. In this blog post we'll dig into memory safety and how we re-designed Cloudflare's main product to protect private keys from the next Heartbleed.

Memory Disclosure

Perfect security is not possible for businesses with an online component. History has shown us that no matter how robust their security program, an unexpected exploit can leave a company exposed. One of the more famous recent incidents of this sort is Heartbleed, a vulnerability in a commonly used cryptography library called OpenSSL that exposed the inner details of millions of web servers to anyone with a connection to the Internet. Heartbleed made international news, caused millions of dollars of damage, and still hasn't been fully resolved.

Typical web services only return data via well-defined public-facing interfaces called APIs. Clients don't normally get to see what's going on under the hood inside the server; that would be a huge privacy and security risk. Heartbleed broke that paradigm: it enabled anyone on the Internet to peek at the operating memory used by web servers, revealing privileged data usually not exposed via the API. Heartbleed could be used to extract data previously sent to the server, including passwords and credit card numbers. It could also reveal the inner workings and cryptographic secrets used inside the server, including TLS certificate private keys.

Heartbleed let attackers peek behind the curtain, but not too far. Sensitive data could be extracted, but not everything on the server was at risk. For example, Heartbleed did not enable attackers to steal the contents of databases held on the server. You may ask: why was some data at risk but not others? The reason has to do with how modern operating systems are built.

A simplified view of process isolation

Most modern operating systems are split into multiple layers. These layers are analogous to security clearance levels. So-called user-space applications (like your browser) live in a low-security layer called user space. They only have access to computing resources (memory, CPU, networking) if the lower, more privileged layers let them.

User-space applications need resources to function. For example, they need memory to store their code and working memory to do computations. However, it would be risky to give an application direct access to the physical RAM of the computer it is running on. Instead, the raw computing elements are restricted to a lower layer called the operating system kernel.
The kernel only runs specially designed software built to safely manage these resources and mediate access to them for user-space applications.

When a new user-space application process is launched, the kernel gives it a virtual memory space. This virtual memory space acts like real memory to the application but is actually a safely guarded translation layer the kernel uses to protect the real memory. Each application's virtual memory space is like a parallel universe dedicated to that application. This makes it impossible for one process to view or modify another's memory; the other applications are simply not addressable.

Heartbleed, Cloudbleed and the process boundary

Heartbleed was a vulnerability in the OpenSSL library, which was part of many web server applications. These web servers run in user space, like any common application. The vulnerability caused the web server to return up to 2 kilobytes of its memory in response to a specially crafted inbound request.

Cloudbleed was also a memory disclosure bug, albeit one specific to Cloudflare, that got its name because it was so similar to Heartbleed. With Cloudbleed, the vulnerability was not in OpenSSL, but in a secondary web server application used for HTML parsing. When this code parsed a certain sequence of HTML, it ended up inserting some process memory into the web page it was serving.

It's important to note that both of these bugs occurred in applications running in user space, not kernel space. This means that the memory exposed by the bug was necessarily part of the virtual memory of the application. Even if the bug were to expose megabytes of data, it would only expose data belonging to that application, not to other applications on the system.

In order for a web server to serve traffic over the encrypted HTTPS protocol, it needs access to the certificate's private key, which is typically kept in the application's memory. These keys were exposed to the Internet by Heartbleed. The Cloudbleed vulnerability affected a different process, the HTML parser, which doesn't do HTTPS and therefore doesn't keep the private key in memory. This meant that HTTPS keys were safe, even if other data in the HTML parser's memory space wasn't.

The fact that the HTML parser and the web server were different applications saved us from having to revoke and re-issue our customers' TLS certificates. However, if another memory disclosure vulnerability were discovered in the web server, these keys would again be at risk.

Moving keys out of Internet-facing processes

Not all web servers keep private keys in memory. In some deployments, private keys are held in a separate machine called a Hardware Security Module (HSM). HSMs are built to withstand physical intrusion and tampering and are often certified against stringent compliance requirements. They can also be bulky and expensive. Web servers designed to take advantage of keys in an HSM connect to it over a physical cable and communicate using a specialized protocol called PKCS#11. This allows the web server to serve encrypted content while being physically separated from the private key.

At Cloudflare, we built our own way to separate a web server from a private key: Keyless SSL.
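Conceptually, the TLS-facing process only needs something that can produce signatures on demand; it never needs the key bytes themselves. The following is a minimal, hypothetical Go sketch of that idea, not the actual gokeyless code: Go's crypto/tls accepts any crypto.Signer as a certificate's "private key", so a remote signer can be dropped in where a local key would normally sit.

package keyless

import (
	"crypto"
	"crypto/tls"
	"errors"
	"io"
)

// remoteSigner satisfies crypto.Signer but forwards every private-key
// operation to a separate key-server process, so this process never holds
// the private key in its memory.
type remoteSigner struct {
	pub crypto.PublicKey
	// sign is a placeholder for the RPC to the key server; the real
	// Keyless SSL protocol is more involved than a single callback.
	sign func(digest []byte, opts crypto.SignerOpts) ([]byte, error)
}

func (s *remoteSigner) Public() crypto.PublicKey { return s.pub }

func (s *remoteSigner) Sign(_ io.Reader, digest []byte, opts crypto.SignerOpts) ([]byte, error) {
	if s.sign == nil {
		return nil, errors.New("no key server configured")
	}
	return s.sign(digest, opts)
}

// Certificate builds a tls.Certificate whose private-key operations are
// delegated to the remote signer; crypto/tls accepts any crypto.Signer here.
func Certificate(certDER []byte, pub crypto.PublicKey,
	sign func([]byte, crypto.SignerOpts) ([]byte, error)) tls.Certificate {
	return tls.Certificate{
		Certificate: [][]byte{certDER},
		PrivateKey:  &remoteSigner{pub: pub, sign: sign},
	}
}

With this shape, a memory disclosure bug in the TLS-facing process can leak signatures and public data at worst, but never the key itself.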
Rather than keeping the keys in a separate physical machine connected to the server with a cable, the keys are kept in a key server operated by the customer in their own infrastructure (this can also be backed by an HSM).

More recently, we launched Geo Key Manager, a service that allows users to store private keys in only select Cloudflare locations. Connections to locations that do not have access to the private key use Keyless SSL with a key server hosted in a datacenter that does have access. In both Keyless SSL and Geo Key Manager, private keys are not only not part of the web server's memory space, they're often not even in the same country! This extreme degree of separation is not necessary to protect against the next Heartbleed. All that is needed is for the web server and the key server to not be part of the same application. So that's what we did. We call this Keyless Everywhere.

Keyless SSL is coming from inside the house

Repurposing Keyless SSL for Cloudflare-held private keys was easy to conceptualize, but the path from idea to production wasn't so straightforward. The core functionality of Keyless SSL comes from the open-source gokeyless that customers run on their infrastructure, but internally we use it as a library and have replaced the main package with an implementation suited to our requirements (we've creatively dubbed it gokeyless-internal).

As with all major architecture changes, it's prudent to start by testing the model with something new and low-risk. In our case, the test bed was our experimental TLS 1.3 implementation. In order to quickly iterate through draft versions of the TLS specification and push releases without affecting the majority of Cloudflare customers, we re-wrote our custom nginx web server in Go and deployed it in parallel to our existing infrastructure. This server was designed from the start to never hold private keys and to rely only on gokeyless-internal. At the time there was only a small amount of TLS 1.3 traffic, all coming from beta versions of browsers, which allowed us to work through the initial kinks of gokeyless-internal without exposing the majority of visitors to security risks or outages.

The first step towards making TLS 1.3 fully keyless was identifying and implementing the new functionality we needed to add to gokeyless-internal. Keyless SSL was designed to run on customer infrastructure, with the expectation of supporting only a handful of private keys. But our edge must simultaneously support millions of private keys, so we implemented the same lazy-loading logic we use in our web server, nginx. Furthermore, a typical customer deployment puts key servers behind a network load balancer so they can be taken out of service for upgrades or other maintenance. Contrast this with our edge, where it's important to maximize our resources by continuing to serve traffic during software upgrades. This problem is solved by the excellent tableflip package we use elsewhere at Cloudflare.
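For readers unfamiliar with it, tableflip lets a Go process hand its listening sockets to a freshly started copy of itself, so a new binary can take over without dropping connections. Below is a simplified sketch of the usual pattern for the github.com/cloudflare/tableflip package; it is not taken from gokeyless-internal.

package main

import (
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	"github.com/cloudflare/tableflip"
)

func main() {
	upg, err := tableflip.New(tableflip.Options{})
	if err != nil {
		log.Fatal(err)
	}
	defer upg.Stop()

	// On SIGHUP, start the new binary and pass it the listening sockets.
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP)
		for range sig {
			if err := upg.Upgrade(); err != nil {
				log.Println("upgrade failed:", err)
			}
		}
	}()

	// Listeners created through the upgrader are inherited by the new
	// process, so clients keep being served across the upgrade.
	ln, err := upg.Listen("tcp", "localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	go http.Serve(ln, nil)

	// Signal readiness; once the new process is ready, the old one drains
	// and exits.
	if err := upg.Ready(); err != nil {
		log.Fatal(err)
	}
	<-upg.Exit()
}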
The next project to go Keyless was Spectrum, which launched with default support for gokeyless-internal. With these small victories in hand, we had the confidence necessary to attempt the big challenge: porting our existing nginx infrastructure to a fully keyless model. After implementing the new functionality and being satisfied with our integration tests, all that was left was to turn it on in production and call it a day, right?

Anyone with experience with large distributed systems knows how far "working in dev" is from "done," and this story is no different. Thankfully, we were anticipating problems and built a fallback into nginx to complete the handshake itself if any problems were encountered on the gokeyless-internal path. This allowed us to expose gokeyless-internal to production traffic without risking downtime in the event that our reimplementation of the nginx logic was not 100% bug-free.

When rolling back the code doesn't roll back the problem

Our deployment plan was to enable Keyless Everywhere, find the most common causes of fallbacks, and then fix them. We could then repeat this process until all sources of fallbacks had been eliminated, after which we could remove access to private keys (and therefore the fallback) from nginx. One of the early causes of fallbacks was gokeyless-internal returning ErrKeyNotFound, indicating that it couldn't find the requested private key in storage. This should not have been possible, since nginx only makes a request to gokeyless-internal after first finding the certificate and key pair in storage, and we always write the private key and certificate together. It turned out that, in addition to returning the error for the intended case of the key truly not being found, we were also returning it when transient errors like timeouts were encountered. To resolve this, we updated those transient error conditions to return ErrInternal and deployed to our canary datacenters.

Strangely, we found that a handful of instances in a single datacenter started encountering high rates of fallbacks, and the logs from nginx indicated it was due to a timeout between nginx and gokeyless-internal. The timeouts didn't occur right away, but once a system started logging some timeouts it never stopped. Even after we rolled back the release, the fallbacks continued with the old version of the software! Furthermore, while nginx was complaining about timeouts, gokeyless-internal seemed perfectly healthy and was reporting reasonable performance metrics (sub-millisecond median request latency).

To debug the issue, we added detailed logging to both nginx and gokeyless, and followed the chain of events backwards once timeouts were encountered.

➜ ~ grep 'timed out' nginx.log | grep Keyless | head -5
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015157 Keyless SSL request/response timed out while reading Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015231 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015271 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015280 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:50.000 29m41 2018/07/25 05:30:50 [error] 4525#0: *1015289 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1

You can see that the first request to log a timeout had id 1015157. It is also interesting that the first log line was "timed out while reading," while all the others are "timed out while waiting," and this latter message is the one that continues forever.
Here is the matching request in the gokeyless log:

➜ ~ grep 'id=1015157 ' gokeyless.log | head -1
2018-07-25T05:30:39.000 29m41 2018/07/25 05:30:39 [DEBUG] connection 127.0.0.1:30520: worker=ecdsa-29 opcode=OpECDSASignSHA256 id=1015157 sni=announce.php?info_hash=%a8%9e%9dc%cc%3b1%c8%23%e4%93%21r%0f%92mc%0c%15%89&peer_id=-ut353s-%ce%ad%5e%b1%99%06%24e%d5d%9a%08&port=42596&uploaded=65536&downloaded=0&left=0&corrupt=0&key=04a184b7&event=started&numwant=200&compact=1&no_peer_id=1 ip=104.20.33.147

Aha! That SNI value is clearly invalid (SNIs are like Host headers, i.e. they are domains, not URL paths), and it's also quite long. Our storage system indexes certificates using two keys: the SNI they correspond to, and the IP addresses they correspond to (for older clients that don't support SNI). Our storage interface uses the memcached protocol, and the client library that gokeyless-internal uses rejects requests for keys longer than 250 characters (memcached's maximum key length), whereas the nginx logic is to simply ignore the invalid SNI and treat the request as if it only had an IP. The change in our new release had shifted this condition from ErrKeyNotFound to ErrInternal, which triggered cascading problems in nginx. The "timeouts" it encountered were actually the result of throwing away all in-flight requests multiplexed on a connection whenever a single request on that connection returned ErrInternal. These requests were retried, but once this condition triggered, nginx became overloaded by the number of retried requests plus the continuous stream of new requests coming in with bad SNI, and was unable to recover. This explains why rolling back gokeyless-internal didn't fix the problem.

This discovery finally brought our attention to nginx, which thus far had escaped blame since it had been working reliably with customer key servers for years. However, communicating over localhost with a multitenant key server is fundamentally different from reaching out over the public Internet to a customer's key server, and we had to make the following changes:

Instead of a long connection timeout and a relatively short response timeout for customer key servers, extremely short connection timeouts and longer request timeouts are appropriate for a localhost key server.

Similarly, it's reasonable to retry (with backoff) if we time out waiting on a customer key server response, since we can't trust the network. But over localhost, a timeout would only occur if gokeyless-internal were overloaded and the request were still queued for processing. In this case a retry would only lead to more total work being requested of gokeyless-internal, making the situation worse.

Most significantly, nginx must not throw away all requests multiplexed on a connection if any single one of them encounters an error, since a single connection no longer represents a single customer.
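To illustrate the first two changes, here is a small Go sketch of the two timeout and retry profiles. The concrete numbers are hypothetical and only meant to show the shape of the difference, not Cloudflare's actual settings.

package keyless

import "time"

// keyServerProfile captures how the TLS-terminating process should treat a
// given key server.
type keyServerProfile struct {
	ConnectTimeout time.Duration
	RequestTimeout time.Duration
	RetryOnTimeout bool
}

var (
	// A customer key server is reached over the public Internet: be patient
	// when connecting, keep response timeouts short, and retry with backoff
	// because packets can simply be lost along the way.
	customerKeyServer = keyServerProfile{
		ConnectTimeout: 10 * time.Second,
		RequestTimeout: 2 * time.Second,
		RetryOnTimeout: true,
	}

	// A localhost key server connects almost instantly; a slow response means
	// it is overloaded, so retrying would only pile more work onto its queue.
	localKeyServer = keyServerProfile{
		ConnectTimeout: 100 * time.Millisecond,
		RequestTimeout: 10 * time.Second,
		RetryOnTimeout: false,
	}
)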
Elliptic curve operations are very fast in Go, but it’s known that RSA operations are much slower than their BoringSSL counterparts.Although Go 1.11 includes optimizations for RSA math operations, we needed more speed. Well-tuned assembly code is required to match the performance of BoringSSL, so Armando Faz from our Crypto team helped claw back some of the lost CPU by reimplementing parts of the math/big package with platform-dependent assembly in an internal fork of Go. The recent assembly policy of Go prefers the use of Go portable code instead of assembly, so these optimizations were not upstreamed. There is still room for more optimizations, and for that reason we’re still evaluating moving to cgo + BoringSSL for sign operations, despite cgo’s many downsides.Changing our toolingProcess isolation is a powerful tool for protecting secrets in memory. Our move to Keyless Everywhere demonstrates that this is not a simple tool to leverage. Re-architecting an existing system such as nginx to use process isolation to protect secrets was time-consuming and difficult. Another approach to memory safety is to use a memory-safe language such as Rust.Rust was originally developed by Mozilla but is starting to be used much more widely. The main advantage that Rust has over C/C++ is that it has memory safety features without a garbage collector.Re-writing an existing application in a new language such as Rust is a daunting task. That said, many new Cloudflare features, from the powerful Firewall Rules feature to our 1.1.1.1 with WARP app, have been written in Rust to take advantage of its powerful memory-safety properties. We’re really happy with Rust so far and plan on using it even more in the future.ConclusionThe harrowing aftermath of Heartbleed taught the industry a lesson that should have been obvious in retrospect: keeping important secrets in applications that can be accessed remotely via the Internet is a risky security practice. In the following years, with a lot of work, we leveraged process separation and Keyless SSL to ensure that the next Heartbleed wouldn’t put customer keys at risk.However, this is not the end of the road. Recently memory disclosure vulnerabilities such as NetSpectre have been discovered which are able to bypass application process boundaries, so we continue to actively explore new ways to keep keys secure.

Delegated Credentials for TLS

Cloudflare Blog

Today we're happy to announce support for a new cryptographic protocol that helps make it possible to deploy encrypted services in a global network while still maintaining fast performance and tight control of private keys: Delegated Credentials for TLS. We have been working with partners from Facebook, Mozilla, and the broader IETF community to define this emerging standard. We're excited to share the gory details today in this blog post.

Also, be sure to check out the blog posts on the topic by our friends at Facebook and Mozilla!

Deploying TLS globally

Many of the technical problems we face at Cloudflare are widely shared across the Internet industry. As gratifying as it can be to solve a problem for ourselves and our customers, it can be even more gratifying to solve a problem for the entire Internet. For the past three years, we have been working with peers in the industry to solve a specific shared problem in the TLS infrastructure space: how do you terminate TLS connections while storing keys remotely and maintaining performance and availability? Today we're announcing that Cloudflare now supports Delegated Credentials, the result of this work.

Cloudflare's TLS/SSL features are among the top reasons customers use our service. Configuring TLS is hard to do without internal expertise. By automating TLS, website and web service operators gain the latest TLS features and the most secure configurations by default. It also reduces the risk of outages or bad press due to misconfigured or insecure encryption settings. Customers also gain early access to unique features like TLS 1.3, post-quantum cryptography, and OCSP stapling as they become available.

Unfortunately, for a web service to authorize another service to terminate TLS on its behalf, it has to share its private keys with that service, which demands a high level of trust. For services with a global footprint, there is an additional level of nuance: they may operate multiple data centers located in places with varying levels of physical security, and each of these needs to be trusted to terminate TLS.

To tackle these problems of trust, Cloudflare has invested in two technologies: Keyless SSL, which allows customers to use Cloudflare without sharing their private key with Cloudflare; and Geo Key Manager, which allows customers to choose the geographical locations in which Cloudflare should keep their keys. Both of these technologies can be deployed without any changes to browsers or other clients. They also come with some downsides in the form of availability and performance degradation.

Keyless SSL introduces extra latency at the start of a connection. In order for a server without access to a private key to establish a connection with a client, that server needs to reach out to a key server, or a remote point of presence, and ask it to perform a private key operation. This not only adds latency to the connection, causing content to load more slowly, but it also introduces some troublesome operational constraints on the customer. Specifically, the server with access to the key needs to be highly available or the connection can fail. Sites often use Cloudflare to improve their availability, so having to run a high-availability key server is an unwelcome requirement.

Turning a pull into a push

The reason services like Keyless SSL that rely on remote keys are so brittle is their architecture: they are pull-based rather than push-based.
Every time a client attempts a handshake with a server that doesn't have the key, the server needs to pull the authorization from the key server. An alternative way to build this sort of system is to periodically push a short-lived authorization key to the server and use that for handshakes. Switching from a pull-based model to a push-based model eliminates the additional latency, but it comes with additional requirements, including the need to change the client.

Enter the new TLS feature of Delegated Credentials (DCs). A delegated credential is a short-lived key that the certificate's owner has delegated for use in TLS. Delegated credentials work like a power of attorney: your server authorizes our server to terminate TLS for a limited time. When a browser that supports this protocol connects to our edge servers, we can show it this "power of attorney" instead of needing to reach back to a customer's server to get it to authorize the TLS connection. This reduces latency and improves performance and reliability.

The pull model

The push model

A fresh delegated credential can be created and pushed out to TLS servers long before the previous credential expires. Momentary blips in availability will not lead to broken handshakes for clients that support delegated credentials. Furthermore, a Delegated Credentials-enabled TLS connection is just as fast as a standard TLS connection: there's no need to connect to the key server for every handshake. This removes the main drawback of Keyless SSL for DC-enabled clients.

Delegated credentials are intended to be an Internet Standard RFC that anyone can implement and use, not a replacement for Keyless SSL. Since browsers will need to be updated to support the standard, proprietary mechanisms like Keyless SSL and Geo Key Manager will continue to be useful. Delegated credentials aren't just useful in our context, which is why we've developed them openly and with contributions from across industry and academia. Facebook has integrated them into their own TLS implementation, and you can read more about how they view the security benefits here. When it comes to improving the security of the Internet, we're all on the same team.

"We believe delegated credentials provide an effective way to boost security by reducing certificate lifetimes without sacrificing reliability. This will soon become an Internet standard and we hope others in the industry adopt delegated credentials to help make the Internet ecosystem more secure." — Subodh Iyengar, software engineer at Facebook

Extensibility beyond the PKI

At Cloudflare, we're interested in pushing the state of the art forward by experimenting with new algorithms. In TLS, there are three main areas of experimentation: ciphers, key exchange algorithms, and authentication algorithms. Ciphers and key exchange algorithms depend only on two parties: the client and the server. This freedom allows us to deploy exciting new choices like ChaCha20-Poly1305 or post-quantum key agreement in lockstep with browsers. On the other hand, the authentication algorithms used in TLS depend on certificates, which introduces certificate authorities and the entire public key infrastructure into the mix.

Unfortunately, the public key infrastructure is very conservative in its choice of algorithms, making it harder to adopt newer cryptography for authentication in TLS. For instance, EdDSA, a highly regarded signature scheme, is not supported by certificate authorities, and root programs limit the certificates that will be signed.
With the emergence of quantum computing, experimenting with new algorithms is essential to determine which solutions are deployable and functional on the Internet.

Since delegated credentials introduce the ability to use new authentication key types without requiring changes to certificates themselves, they open up a new area of experimentation. Delegated credentials can provide a level of flexibility in the transition to post-quantum cryptography by enabling new algorithms and modes of operation to coexist with the existing PKI infrastructure. They also enable tiny victories, like the ability to use smaller, faster Ed25519 signatures in TLS.

Inside DCs

A delegated credential contains a public key and an expiry time. This bundle, together with the certificate itself, is then signed using the certificate's private key, binding the delegated credential to the certificate for which it is acting as "power of attorney". A supporting client indicates its support for delegated credentials by including an extension in its Client Hello.

A server that supports delegated credentials composes the TLS CertificateVerify and Certificate messages as usual, but instead of signing with the certificate's private key, it includes the certificate along with the DC and signs with the DC's private key. Therefore, the certificate's private key only needs to be used to sign the DC.

Certificates used for signing delegated credentials require a special X.509 certificate extension (currently only available from DigiCert). This requirement exists to avoid breaking assumptions people may have about the impact of temporary access to their keys on security, particularly in cases involving HSMs and the still-unfixed Bleichenbacher oracles in older TLS versions. Temporary access to a key can enable signing lots of delegated credentials that start far in the future, so support was made opt-in. Early versions of QUIC had similar issues and ended up adopting TLS to fix them. Protocol evolution on the Internet requires working well with already existing protocols and their flaws.
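To make the "power of attorney" analogy concrete, here is a deliberately simplified Go sketch of the delegation idea. It is not the wire format defined by the draft; it only shows the chain of signatures involved, using Ed25519 for both keys.

package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

func main() {
	// The long-lived certificate key stays with the certificate's owner.
	certPub, certPriv, _ := ed25519.GenerateKey(rand.Reader)

	// The edge generates a short-lived key pair to act as the delegated credential.
	dcPub, dcPriv, _ := ed25519.GenerateKey(rand.Reader)

	// The owner signs (delegated public key || expiry) with the certificate
	// key: this is the "power of attorney".
	var expiry [8]byte
	binary.BigEndian.PutUint64(expiry[:], uint64(time.Now().Add(24*time.Hour).Unix()))
	credential := append(append([]byte{}, dcPub...), expiry[:]...)
	delegation := ed25519.Sign(certPriv, credential)

	// During each handshake, the edge signs the transcript with the delegated
	// key; the certificate key is never needed per connection.
	transcript := []byte("handshake transcript hash")
	handshakeSig := ed25519.Sign(dcPriv, transcript)

	// The client verifies both links in the chain.
	fmt.Println("delegation valid:", ed25519.Verify(certPub, credential, delegation))
	fmt.Println("handshake valid:", ed25519.Verify(dcPub, transcript, handshakeSig))
}

The real protocol additionally binds the credential to the end-entity certificate and to a specific signature algorithm, which is what prevents it from being reused with a different certificate; the sketch omits those details.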
Delegated Credentials at Cloudflare and Beyond

Currently we use delegated credentials as a performance optimization for Geo Key Manager and Keyless SSL. Customers can update their certificates to include the special extension for delegated credentials, and we will automatically create delegated credentials and distribute them to the edge through Keyless SSL or Geo Key Manager. For more information, see the documentation. It also enables us to be more conservative about where we keep keys for customers, improving our security posture.

Delegated credentials would be useless if they weren't also supported by browsers and other HTTP clients. Christopher Patton, a former intern at Cloudflare, implemented support in Firefox and its underlying NSS security library. This feature is now in the Nightly versions of Firefox. You can turn it on by activating the configuration option security.tls.enable_delegated_credentials at about:config. Studies are ongoing on how effective this will be in a wider deployment. There is also support for Delegated Credentials in BoringSSL.

"At Mozilla we welcome ideas that help to make the Web PKI more robust. The Delegated Credentials feature can help to provide secure and performant TLS connections for our users, and we're happy to work with Cloudflare to help validate this feature." — Thyla van der Merwe, Cryptography Engineering Manager at Mozilla

One open issue is the question of client clock accuracy. Until we have a wide-scale study, we won't know how many connections using delegated credentials will break because of the 24-hour time limit that is imposed. Some clients, in particular mobile clients, may have inaccurately set clocks, which is the root cause of one third of all certificate errors in Chrome. Part of the way we're aiming to solve this problem is through standardizing and improving Roughtime, so web browsers and other services that need to validate certificates can do so independently of the client clock.

Cloudflare's global scale means that we see connections from every corner of the world, and from many different kinds of connections and devices. That reach enables us to find rare problems with the deployability of protocols. For example, our early deployment helped inform the development of the TLS 1.3 standard. As we enable developing protocols like delegated credentials, we learn about obstacles that inform and affect their future development.

Conclusion

As new protocols emerge, we'll continue to play a role in their development and bring their benefits to our customers. Today's announcement of a technology that overcomes some limitations of Keyless SSL is just one example of how Cloudflare takes part in improving the Internet not just for our customers, but for everyone. As the standardization process turns the draft into an RFC, we'll continue to maintain our implementation and come up with new ways to apply delegated credentials.

Announcing cfnts: Cloudflare's implementation of NTS in Rust

Cloudflare Blog

Several months ago we announced that we were providing a new public time service. Part of what we were providing was the first major deployment of the new Network Time Security (NTS) protocol, with a newly written implementation of NTS in Rust. In the process, we received helpful advice from the NTP community, especially from the NTPsec and Chrony projects. We've also participated in several interoperability events. Now we are returning something to the community: our implementation, cfnts, is now open source and we welcome your pull requests and issues.

The journey from a blank source file to a working, deployed service was a lengthy one, and it involved many people across multiple teams.

"Correct time is a necessity for most security protocols in use on the Internet. Despite this, secure time transfer over the Internet has previously required complicated configuration on a case by case basis. With the introduction of NTS, secure time synchronization will finally be available for everyone. It is a small, but important, step towards increasing security in all systems that depend on accurate time. I am happy that Cloudflare are sharing their NTS implementation. A diversity of software with NTS support is important for quick adoption of the new protocol." — Marcus Dansarie, coauthor of the NTS specification

How NTS works

NTS is structured as a suite of two sub-protocols, as shown in the figure below. The first is the Network Time Security Key Exchange (NTS-KE), which is always conducted over Transport Layer Security (TLS) and handles the creation of key material and parameter negotiation for the second protocol. The second is NTPv4, the current version of the NTP protocol, which allows the client to synchronize its time from the remote server.

In order to maintain the scalability of NTPv4, it was important that the server not maintain per-client state. A very small server can serve millions of NTP clients. Maintaining this property while providing security is achieved with cookies that the server provides to the client and that contain the server state.

In the first stage, the client sends a request to the NTS-KE server and gets a response via TLS. This exchange carries out a number of functions:

Negotiates the AEAD algorithm to be used in the second stage.

Negotiates the second protocol. Currently, the standard only defines how NTS works with NTPv4.

Negotiates the NTP server IP address and port.

Creates cookies for use in the second stage.

Creates two symmetric keys (C2S and S2C) from the TLS session via exporters.

In the second stage, the client securely synchronizes its clock with the negotiated NTP server. To synchronize securely, the client sends NTPv4 packets with four special extensions:

Unique Identifier Extension contains a random nonce used to prevent replay attacks.

NTS Cookie Extension contains one of the cookies that the client stores. Since currently only the client remembers the two AEAD keys (C2S and S2C), the server needs to use the cookie from this extension to extract the keys. Each cookie contains the keys encrypted under a secret key the server holds.

NTS Cookie Placeholder Extension is a signal from the client to request additional cookies from the server.
This extension is needed to make sure that the response is not much longer than the request, in order to prevent amplification attacks.

NTS Authenticator and Encrypted Extension Fields Extension contains a ciphertext produced by the AEAD algorithm with C2S as the key and with the NTP header, timestamps, and all the previously mentioned extensions as associated data. Other extensions can be included as encrypted data within this field. Without this extension, the timestamps could be spoofed.

After getting a request, the server sends a response back to the client echoing the Unique Identifier Extension to prevent replay attacks, the NTS Cookie Extension to provide the client with more cookies, and the NTS Authenticator and Encrypted Extension Fields Extension with an AEAD ciphertext using S2C as the key. In the server response, instead of sending the NTS Cookie Extension in plaintext, it is encrypted with the AEAD to provide unlinkability of the NTP requests.

The second stage can be repeated many times without going back to the first, since each request and response gives the client a new cookie. The expensive public key operations in TLS are thus amortized over a large number of requests. Furthermore, specialized timekeeping devices like FPGA implementations only need to implement a few symmetric cryptographic functions and can delegate the complex TLS stack to a different device.
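The authenticated-encryption step above is the heart of the second stage. The following Go snippet only illustrates the encrypt-with-associated-data idea; cfnts itself is written in Rust, and NTS negotiates its own AEAD algorithm (commonly AES-SIV-CMAC-256) rather than the AES-GCM used here.

package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"log"
)

func main() {
	// c2s stands in for the client-to-server key exported from the NTS-KE
	// TLS session; here it is just random bytes for illustration.
	c2s := make([]byte, 32)
	if _, err := rand.Read(c2s); err != nil {
		log.Fatal(err)
	}

	block, err := aes.NewCipher(c2s)
	if err != nil {
		log.Fatal(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		log.Fatal(err)
	}

	nonce := make([]byte, aead.NonceSize())
	rand.Read(nonce)

	header := []byte("NTP header, timestamps, unique identifier, cookie")
	secretExts := []byte("extensions that must stay confidential")

	// Encrypt the confidential extensions and authenticate the rest of the
	// packet as associated data, so none of it can be altered in transit.
	ct := aead.Seal(nil, nonce, secretExts, header)
	fmt.Printf("ciphertext+tag: %d bytes\n", len(ct))

	// The receiver re-derives the same key (the server extracts it from the
	// cookie) and rejects the packet if authentication fails.
	pt, err := aead.Open(nil, nonce, ct, header)
	fmt.Println(string(pt), err)
}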
Why Rust?

While many of our services are written in Go, and we have considerable experience with Go on the Crypto team, a garbage-collection pause in the middle of responding to an NTP packet would negatively impact accuracy. We picked Rust because of its zero overhead and useful features.

Memory safety: After Heartbleed, Cloudbleed, and the steady drip of vulnerabilities caused by C's lack of memory safety, it's clear that C is not a good choice for new software dealing with untrusted inputs. The obvious route to memory safety is garbage collection, but garbage collection has a substantial runtime overhead, while Rust has far less.

Non-nullability: Null pointers are an edge case that is frequently not handled properly. Rust explicitly marks optionality, so all references in Rust can be safely dereferenced. The type system ensures that option types are properly handled.

Thread safety: Data-race prevention is another key feature of Rust. Rust's ownership model ensures that all cross-thread accesses are synchronized by default. While not a panacea, this eliminates a major class of bugs.

Immutability: Separating types into mutable and immutable is very important for reducing bugs. For example, in Java, when you pass an object into a function as a parameter, you never know after the function returns whether the object has been mutated. Rust allows you to pass an object reference into a function and still be assured that the object is not mutated.

Error handling: Rust's result types help ensure that operations which can produce errors are identified and that a decision is made about each error, even if that decision is to pass it on.

While Rust provides safety with low overhead, coding in Rust involves understanding linear types and, for us, a new language. In this case the importance of security and performance meant we chose Rust over a potentially easier path in Go.

Dependencies we use

Because of our scale, and for DDoS protection, we needed a highly scalable server. For UDP protocols without the concept of a connection, the server can easily respond to one packet at a time, but for TCP this is more complex. Originally we thought about using Tokio. However, at the time Tokio suffered from scheduler problems that had caused other teams some issues. As a result we decided to use Mio directly, basing our work on the examples in Rustls.

We decided to use Rustls over OpenSSL or BoringSSL because of the crate's consistent error codes and its default support for authentication that is difficult to disable accidentally. While there are some features it does not yet support, it got the job done for our service.

Other engineering choices

More important than our choice of programming language was our implementation strategy. A working, fully featured NTP implementation is a complicated program involving a phase-locked loop. These have a difficult reputation due to their nonlinear nature, beyond the usual complexities of closed-loop control. The response of a phase-locked loop to a disturbance can be estimated if the loop is locked and the disturbance is small. However, lock acquisition, large disturbances, and the filtering required in NTP are all hard to analyze mathematically, since they are not captured by the linear models used for small-scale analysis. While NTP works with the total phase, unlike the phase-locked loops of electrical engineering, there are still nonlinear elements. For NTP, testing changes to this loop requires weeks of operation to determine performance, as the loop responds very slowly.

Computer clocks are generally accurate over short periods, while networks are plagued with inconsistent delays. This demands a slow response. Changes we make to our service have taken hours to have an effect, as clients slowly adapt to the new conditions. While RFC 5905 provides lots of detail on an algorithm to adjust the clock, later implementations such as chrony have improved upon it with much more sophisticated nonlinear filters.

Rather than implement these more sophisticated algorithms, we let chrony adjust the clocks of our servers, copy the state variables in the header from chrony, and adjust the dispersion and root delay according to the formulas given in the RFC. This strategy let us focus on the new protocols.

Prague

Part of what the Internet Engineering Task Force (IETF) does is organize events like hackathons where implementers of a new standard can get together and try to make their implementations work with one another. This exposes bugs and infelicities of language in the standard and the implementations. We attended the IETF 104 hackathon to develop our server and make it work with other implementations. The NTP working group members were extremely generous with their time, and during the process we uncovered a few issues relating to the exact way one has to handle ALPN with older OpenSSL versions.

At IETF 104 in Prague we had a working client and server for NTS-KE by the end of the hackathon. This was a good amount of progress considering we started with nothing. However, without implementing NTP we didn't actually know that our server and client were computing the right thing. That would have to wait for later rounds of testing.

Wireshark during some NTS debugging

Crypto Week

As Crypto Week 2019 approached we were busily writing code. All of the NTP protocol had to be implemented, together with the connection between the NTP and NTS-KE parts of the server.
We also had to deploy processes to synchronize the ticket-encrypting keys around the world and work on reconfiguring our own timing infrastructure to support this new service.

With a few weeks to go we had a working implementation, but we needed servers and clients out there to test with. Because we only support TLS 1.3 on the server, which had only just been added to OpenSSL, there were some compatibility problems.

We ended up compiling a chrony branch with NTS support and NTPsec ourselves and testing against time.cloudflare.com. We also tested our client against test servers set up by the chrony and NTPsec projects, in the hope that this would expose bugs and help our implementations work nicely together. After a few lengthy days of debugging, we found out that our nonce length wasn't exactly in accordance with the spec, which was quickly fixed. The NTPsec project was extremely helpful in this effort. Of course, this was the day that our office had a blackout, so the testing happened outside in Yerba Buena Gardens.

Yerba Buena commons. Taken by Wikipedia user Beyond My Ken. CC-BY-SA

During the deployment of time.cloudflare.com, we had to open up our firewall to incoming NTP packets. Because of NTP reflection attacks, we had kept UDP port 123 closed on our routers since the start of Cloudflare's network. Since clients sometimes also use source port 123 to send NTP packets, it's impossible for NTP servers to filter reflection attacks without parsing the contents of the NTP packet, which routers have difficulty doing. In order to protect Cloudflare infrastructure we got an entire subnet just for the time service, so it could be aggressively throttled and rerouted in case of massive DDoS attacks. This is an exceptional case: most edge services at Cloudflare run on every available IP.

Bug fixes

Shortly after the public launch, we discovered that older Windows versions shipped with NTP version 3, and our server only spoke version 4. This was easy to fix, since the timestamps have not moved between NTP versions: we echo the version back, and most surviving NTP version 3 clients will understand what we meant. Also tricky was the failure of Network Time Foundation ntpd clients to expand their polling interval. It turns out that one has to echo back the client's polling interval to have the polling interval expand. Chrony does not use the polling interval from the server, and so was not affected by this incompatibility.

Both of these issues were fixed in ways suggested by other NTP implementers who had run into these problems themselves. We thank Miroslav Lichter tremendously for telling us exactly what the problem was, and the members of the Cloudflare community who posted packet captures demonstrating these issues.

Continued improvement

The original production version of cfnts was not particularly object-oriented, and several contributors were just learning Rust. As a result there was quite a bit of unwrap and unnecessary mutability flying around. Much of the code lived in free functions even when it could profitably be attached to structures. All of this had to be restructured. Keep in mind that some of the best code running in the real world has been written, rewritten, and sometimes rewritten again! This is actually a good thing.

As an internal project, we relied on Cloudflare's internal tooling for building, testing, and deploying code. This tooling was replaced with tools available to everyone, like Docker, to ensure anyone can contribute.
Our repository is integrated with CircleCI, ensuring that all contributions are automatically tested. In addition to unit tests, we test the entire end-to-end functionality of getting a time measurement from a server.

The Future

NTPsec has already released support for NTS, but we see very little usage. Please try turning on NTS if you use NTPsec and see how it works with time.cloudflare.com. As the draft advances through the standards process, the protocol will undergo an incompatible change when the identifiers are updated and assigned out of the IANA registry instead of being experimental ones, so this is very much an experiment. Note that your daemon will need TLS 1.3 support and so could require manually compiling OpenSSL and linking against it.

We've also added our time service to the public NTP pool. The NTP pool is a widely used, volunteer-maintained service that provides NTP servers geographically spread across the world. Unfortunately, NTS doesn't currently work well with the pool model, so for the best security we recommend enabling NTS and using time.cloudflare.com and other NTS-supporting servers.

In the future, we hope that more clients will support NTS, and we have licensed our code liberally to enable this. We would love to hear if you incorporate it into a product, and we welcome contributions to make it more useful.

We're also encouraged to see that Netnod has a production NTS service at nts.ntp.se. The more time services and clients adopt NTS, the more secure the Internet will be.

Acknowledgements

Tanya Verma and Gabbi Fisher were major contributors to the code, especially the configuration system and the client code. We'd also like to thank Gary Miller, Miroslav Lichter, and all the people at Cloudflare who set up their laptops and home machines to point to time.cloudflare.com for early feedback.

The TLS Post-Quantum Experiment

Cloudflare Blog

In June, we announced a wide-scale post-quantum experiment with Google. We implemented two post-quantum (i.e., not yet known to be broken by quantum computers) key exchanges, integrated them into our TLS stack, and deployed the implementation on our edge servers and in Chrome Canary clients. The goal of the experiment was to evaluate the performance and feasibility of deploying two post-quantum key agreement ciphers in TLS.

In our previous blog post on post-quantum cryptography, we described the differences between those two ciphers in detail. In case you didn't have a chance to read it, we include a quick recap here. One characteristic of post-quantum key exchange algorithms is that the public keys are much larger than those used by "classical" algorithms. This has an impact on the duration of the TLS handshake. For our experiment, we chose two algorithms: isogeny-based SIKE and lattice-based HRSS. The former has short key sizes (~330 bytes) but a high computational cost; the latter has larger key sizes (~1100 bytes), but is a few orders of magnitude faster.

During NIST's Second PQC Standardization Conference, Nick Sullivan presented our approach to this experiment and some initial results. Quite accurately, he compared NTRU-HRSS to an ostrich and SIKE to a turkey—one is big and fast and the other is small and slow.

Setup & Execution

We based our experiment on TLS 1.3. Cloudflare operated the server-side TLS connections, and Google Chrome (Canary and Dev builds) represented the client side of the experiment. We enabled both CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE/p434 + X25519) key-agreement algorithms on all TLS-terminating edge servers. Since the post-quantum algorithms are considered experimental, the X25519 key exchange serves as a fallback to ensure the classical security of the connection.

Clients participating in the experiment were split into three groups: those who initiated the TLS handshake with post-quantum CECPQ2, CECPQ2b, or non-post-quantum X25519 public keys. Each group represented approximately one third of the Chrome Canary population participating in the experiment.

In order to distinguish between clients participating in or excluded from the experiment, we added a custom extension to the TLS handshake. It worked as a simple flag sent by clients and echoed back by Cloudflare edge servers. This allowed us to measure the duration of TLS handshakes only for clients participating in the experiment.

For each connection, we collected telemetry metrics. The most important metric was the TLS server-side handshake duration, defined as the time between receiving the Client Hello and Client Finished messages. The diagram below shows details of what was measured and how post-quantum key exchange was integrated with TLS 1.3.

The experiment ran for 53 days in total, between August and October. During this time we collected millions of data samples, representing 5% of (anonymized) TLS connections that contained the extension signaling that the client was part of the experiment. We carried out the experiment in two phases.

In the first phase of the experiment, each client was assigned to use one of the three key exchange groups, and each client offered the same key exchange group for every connection. We collected over 10 million records over 40 days.

In the second phase of the experiment, client behavior was modified so that each client randomly chose which key exchange group to offer for each new connection, allowing us to directly compare the performance of each algorithm on a per-client basis.
Data collection for this phase lasted 13 days, and we collected 270 thousand records.

Results

We now describe our server-side measurement results. Client-side results are described at https://www.imperialviolet.org/2019/10/30/pqsivssl.html.

What did we find?

The primary metric we collected for each connection was the server-side handshake duration. The histograms below show handshake duration timings for all client measurements gathered in the first phase of the experiment, as well as breakdowns into the top five operating systems. The operating system breakdowns shown are restricted to desktop/laptop devices, except for Android, which consists of only mobile devices.

It's clear from the above plots that for most clients, CECPQ2b performs worse than CECPQ2 and CONTROL. Thus, the small key size of CECPQ2b does not make up for its large computational cost—the ostrich outpaces the turkey.

Digging a little deeper

This means we're done, right? Not quite. We are interested in determining if there are any populations of TLS clients for which CECPQ2b consistently outperforms CECPQ2. This requires taking a closer look at the long tail of handshake durations. The plots below show cumulative distribution functions (CDFs) of handshake timings zoomed in on the 80th percentile (i.e., showing the slowest 20% of handshakes).

Here, we start to see something interesting. For Android, Linux, and Windows devices, there is a crossover point where CECPQ2b actually starts to outperform CECPQ2 (Android: ~94th percentile, Linux: ~92nd percentile, Windows: ~95th percentile). macOS and ChromeOS do not appear to have these crossover points.

These effects are small but statistically significant in some cases. The table below shows approximate 95% confidence intervals for the 50th (median), 95th, and 99th percentiles of handshake durations for each key exchange group and device type, calculated using Maritz-Jarrett estimators. The numbers within square brackets give the lower and upper bounds on our estimates for each percentile of the "true" distribution of handshake durations based on the samples collected in the experiment. For example, with a 95% confidence level we can say that the 99th percentile of handshake durations for CECPQ2 on Android devices lies between 4057ms and 4478ms, while the 99th percentile for CECPQ2b lies between 3276ms and 3646ms. Since the intervals do not overlap, we say that, with statistical significance, the experiment indicates that CECPQ2b performs better than CECPQ2 for the slowest 1% of Android connections. Configurations where CECPQ2 or CECPQ2b outperforms the other with statistical significance are marked in green in the table.

Per-client comparison

The second phase of the experiment directly examined the performance of each key exchange algorithm for individual clients, where a client is defined to be a unique (anonymized) IP address and user agent pair. Instead of choosing a single key exchange algorithm for the duration of the experiment, clients randomly selected one of the experiment configurations for each new connection. Although the duration and sample size were limited for this phase of the experiment, we collected at least three handshake measurements for each group configuration from 3900 unique clients.

The plot below shows, for each of these clients, the difference in latency between CECPQ2 and CECPQ2b, taking the minimum latency sample for each key exchange group as the representative value.
The CDF plot shows that for 80% of clients, CECPQ2 outperformed or matched CECPQ2b, and for 99% of clients, the latency gap remained within 70ms. At a high level, this indicates that very few clients performed significantly worse with CECPQ2 than with CECPQ2b.

Do other factors impact the latency gap?

We looked at a number of other factors—including session resumption, IP version, and network location—to see if they impacted the latency gap between CECPQ2 and CECPQ2b. These factors impacted the overall handshake latency, but we did not find that any of them made a significant impact on the latency gap between the post-quantum ciphers. We share some interesting observations from this analysis below.

Session resumption

Approximately 53% of all connections in the experiment were completed with TLS handshake resumption. However, the percentage of resumed connections varied significantly based on the device configuration. Connections from mobile devices were only resumed ~25% of the time, while between 40% and 70% of connections from laptop/desktop devices were resumed. Additionally, resumption provided between a 30% and 50% speedup for all device types.

IP version

We also examined the impact of IP version on handshake latency. Only 12.5% of the connections in the experiment used IPv6. These connections were 20-40% faster than IPv4 connections for desktop/laptop devices, but ~15% slower for mobile devices. This could be an artifact of IPv6 being generally deployed on newer devices with faster processors. For Android, the experiment was only run on devices with more modern processors, which perhaps eliminated the bias.

Network location

The slow connections making up the long tail of handshake durations were not isolated to a few countries, Autonomous Systems (ASes), or subnets, but originated from a globally diverse set of clients. We did not find any correlation between these factors and the relative performance of the two post-quantum key exchange algorithms.

Discussion

We found that CECPQ2 (the ostrich) outperformed CECPQ2b (the turkey) for the majority of connections in the experiment, indicating that fast algorithms with large keys may be more suitable for TLS than slow algorithms with small keys. However, we observed the opposite—that CECPQ2b outperformed CECPQ2—for the slowest connections on some devices, including Windows computers and Android mobile devices. One possible explanation for this is packet fragmentation and packet loss. The maximum size of TCP packets that can be sent across a network is limited by the maximum transmission unit (MTU) of the network path, which is often ~1400 bytes. During the TLS handshake the server responds to the client with its public key and ciphertext, the combined size of which exceeds the MTU, so it is likely that handshake messages must be split across multiple TCP packets. This increases the risk of lost packets and delays due to retransmission. A repeat of this experiment that includes collection of fine-grained TCP telemetry could confirm this hypothesis.
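As a rough back-of-the-envelope check of the fragmentation explanation, the Go sketch below counts how many ~1400-byte packets the server's first flight would need. The ~1100-byte share is the HRSS figure quoted earlier; the certificate-chain and other message sizes are hypothetical placeholders, since they vary widely per site.

package main

import "fmt"

func main() {
	const mtu = 1400 // typical path MTU in bytes

	serverShare := 1100 + 32 // approximate HRSS ciphertext plus X25519 share
	certChain := 3000        // placeholder for a typical certificate chain
	otherMessages := 500     // EncryptedExtensions, Finished, framing, ...

	total := serverShare + certChain + otherMessages
	packets := (total + mtu - 1) / mtu // ceiling division

	fmt.Printf("~%d bytes -> at least %d TCP packets\n", total, packets)
	// Every extra packet is another opportunity for loss and retransmission,
	// which is one plausible cause of the long tail on lossy links.
}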
Looking at those two operations, we can see that key generation is the bottleneck here:

Key generation:    3553.5 [ops/sec]
KEM decapsulation: 17186.7 [ops/sec]

In algorithms with quotient-style keys (like NTRU), the key generation algorithm performs an inversion in the quotient ring—an operation that is quite computationally expensive. To work around this, a TLS implementation could generate ephemeral keys ahead of time in order to speed up key exchange. There are several other lattice-based key exchange candidates that may be worth experimenting with in the context of TLS key exchange, which are based on different underlying principles than the HRSS construction. These candidates have similar key sizes and faster key generation algorithms, but have their own drawbacks. For now, HRSS looks like the more promising algorithm for use in TLS.

In the case of SIKE, we implemented the most recent version of the algorithm, and instantiated it with the most performance-efficient parameter set for our experiment. The algorithm is computationally expensive, so we were required to use assembly to optimize it. In order to ensure best performance on Intel, most performance-critical operations have two different implementations; the library detects CPU capabilities and uses faster instructions if available, but otherwise falls back to a slightly slower generic implementation. We developed our own optimizations for 64-bit ARM CPUs. Nevertheless, our results show that SIKE incurred a significant overhead for every connection, especially on devices with weaker processors. It must be noted that high-performance isogeny-based public key cryptography is arguably much less developed than its lattice-based counterparts. Some ideas to develop this are floating around, and we expect to see performance improvements in the future.
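As noted above, one way to hide the cost of key generation is to pre-generate ephemeral key pairs before they are needed and hand them out as handshakes arrive. The sketch below shows what that could look like in Go; the KeyPair type and GenerateKeyPair function are hypothetical stand-ins for whichever KEM library is in use (an HRSS implementation, for example), so treat this as an illustration of the idea rather than a drop-in component.

```go
// Package keypool sketches pre-generation of ephemeral KEM key pairs so
// that a TLS handshake never has to wait for an expensive key generation.
package keypool

import "log"

// KeyPair stands in for a KEM key pair from whatever post-quantum
// library is in use (hypothetical type, not a real API).
type KeyPair struct {
	Public  []byte
	Private []byte
}

// GenerateKeyPair is a placeholder for the expensive KEM key generation
// operation (e.g. HRSS key generation).
func GenerateKeyPair() (*KeyPair, error) {
	// A real implementation would call into the KEM library here.
	return &KeyPair{}, nil
}

// Pool keeps a buffer of ready-to-use ephemeral key pairs topped up by
// background workers.
type Pool struct {
	keys chan *KeyPair
}

// New starts `workers` goroutines that keep up to `size` key pairs buffered.
func New(size, workers int) *Pool {
	p := &Pool{keys: make(chan *KeyPair, size)}
	for i := 0; i < workers; i++ {
		go func() {
			for {
				kp, err := GenerateKeyPair()
				if err != nil {
					log.Printf("key generation failed: %v", err)
					continue
				}
				p.keys <- kp // blocks once the buffer is full
			}
		}()
	}
	return p
}

// Get returns a fresh ephemeral key pair, falling back to on-demand
// generation if the pool happens to be empty.
func (p *Pool) Get() (*KeyPair, error) {
	select {
	case kp := <-p.keys:
		return kp, nil
	default:
		return GenerateKeyPair()
	}
}
```

Whether such a pool pays off depends on how bursty connection arrivals are and on how long a pre-generated ephemeral key may safely sit unused before it is consumed by a handshake.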

DNS Encryption Explained

CloudFlare Blog -

The Domain Name System (DNS) is the address book of the Internet. When you visit cloudflare.com or any other site, your browser will ask a DNS resolver for the IP address where the website can be found. Unfortunately, these DNS queries and answers are typically unprotected. Encrypting DNS would improve user privacy and security. In this post, we will look at two mechanisms for encrypting DNS, known as DNS over TLS (DoT) and DNS over HTTPS (DoH), and explain how they work.

Applications that want to resolve a domain name to an IP address typically use DNS. This is usually not done explicitly by the programmer who wrote the application. Instead, the programmer writes something such as fetch("https://example.com/news") and expects a software library to handle the translation of “example.com” to an IP address.

Behind the scenes, the software library is responsible for discovering and connecting to the external recursive DNS resolver and speaking the DNS protocol (see the figure below) in order to resolve the name requested by the application. The choice of the external DNS resolver, and whether any privacy and security is provided at all, is outside the control of the application. It depends on the software library in use, and the policies provided by the operating system of the device that runs the software.

Overview of DNS query and response

The external DNS resolver

The operating system usually learns the resolver address from the local network using Dynamic Host Configuration Protocol (DHCP). In home and mobile networks, it typically ends up using the resolver from the Internet Service Provider (ISP). In corporate networks, the selected resolver is typically controlled by the network administrator. If desired, users with control over their devices can override the resolver with a specific address, such as the address of a public resolver like Google’s 8.8.8.8 or Cloudflare’s 1.1.1.1, but most users will likely not bother changing it when connecting to a public Wi-Fi hotspot at a coffee shop or airport.

The choice of external resolver has a direct impact on the end-user experience. Most users do not change their resolver settings and will likely end up using the DNS resolver from their network provider. The most obvious observable property is the speed and accuracy of name resolution. Features that improve privacy or security might not be immediately visible, but will help to prevent others from profiling or interfering with your browsing activity. This is especially important on public Wi-Fi networks where anyone in physical proximity can capture and decrypt wireless network traffic.

Unencrypted DNS

Ever since DNS was created in 1987, it has been largely unencrypted. Everyone between your device and the resolver is able to snoop on or even modify your DNS queries and responses. This includes anyone in your local Wi-Fi network, your Internet Service Provider (ISP), and transit providers. This may affect your privacy by revealing the domain names that you are visiting.

What can they see? Well, consider this network packet capture taken from a laptop connected to a home network. The following observations can be made:

The UDP source port is 53, which is the standard port number for unencrypted DNS.
The UDP payload is therefore likely to be a DNS answer.

That suggests that the source IP address 192.168.2.254 is a DNS resolver, while the destination IP 192.168.2.14 is the DNS client.

The UDP payload could indeed be parsed as a DNS answer, and reveals that the user was trying to visit twitter.com.

If there are any future connections to 104.244.42.129 or 104.244.42.1, then it is most likely traffic that is directed at “twitter.com”.

If there is some further encrypted HTTPS traffic to this IP, followed by more DNS queries, it could indicate that a web browser loaded additional resources from that page. That could potentially reveal the pages that a user was looking at while visiting twitter.com.

Since the DNS messages are unprotected, other attacks are possible:

Queries could be directed to a resolver that performs DNS hijacking. For example, in the UK, Virgin Media and BT return a fake response for domains that do not exist, redirecting users to a search page. This redirection is possible because the computer/phone blindly trusts the DNS resolver that was advertised using DHCP by the ISP-provided gateway router.

Firewalls can easily intercept, block or modify any unencrypted DNS traffic based on the port number alone. It is worth noting that plaintext inspection is not a silver bullet for achieving visibility goals, because the DNS resolver can be bypassed.

Encrypting DNS

Encrypting DNS makes it much harder for snoopers to look into your DNS messages, or to corrupt them in transit. Just as the web moved from unencrypted HTTP to encrypted HTTPS, there are now upgrades to the DNS protocol that encrypt DNS itself. Encrypting the web has made it possible for private and secure communications and commerce to flourish. Encrypting DNS will further enhance user privacy.

Two standardized mechanisms exist to secure the DNS transport between you and the resolver: DNS over TLS (2016) and DNS Queries over HTTPS (2018). Both are based on Transport Layer Security (TLS), which is also used to secure communication between you and a website using HTTPS. In TLS, the server (be it a web server or DNS resolver) authenticates itself to the client (your device) using a certificate. This ensures that no other party can impersonate the server (the resolver).

With DNS over TLS (DoT), the original DNS message is directly embedded into the secure TLS channel. From the outside, one can neither learn the name that was being queried nor modify it. Only the intended client application will be able to decrypt it. On the wire, it looks like this:

In the packet trace for unencrypted DNS, it was clear that a DNS request can be sent directly by the client, followed by a DNS answer from the resolver. In the encrypted DoT case however, some TLS handshake messages are exchanged prior to sending encrypted DNS messages:

The client sends a Client Hello, advertising its supported TLS capabilities.

The server responds with a Server Hello, agreeing on TLS parameters that will be used to secure the connection. The Certificate message contains the identity of the server, while the Certificate Verify message will contain a digital signature which can be verified by the client using the server Certificate.
The client typically checks this certificate against its local list of trusted Certificate Authorities, but the DoT specification mentions alternative trust mechanisms such as public key pinning.

Once the TLS handshake is Finished by both the client and server, they can finally start exchanging encrypted messages.

While the above picture contains one DNS query and answer, in practice the secure TLS connection will remain open and will be reused for future DNS queries.

Securing unencrypted protocols by slapping TLS on top of a new port has been done before:

Web traffic: HTTP (tcp/80) -> HTTPS (tcp/443)
Sending email: SMTP (tcp/25) -> SMTPS (tcp/465)
Receiving email: IMAP (tcp/143) -> IMAPS (tcp/993)
Now: DNS (tcp/53 or udp/53) -> DoT (tcp/853)

A problem with introducing a new port is that existing firewalls may block it, either because they employ a whitelist approach where new services have to be explicitly enabled, or a blocklist approach where a network administrator explicitly blocks a service. If the secure option (DoT) is less likely to be available than the insecure option, then users and applications might be tempted to fall back to unencrypted DNS. This could subsequently allow attackers to force users to an insecure version.

Such fallback attacks are not theoretical. SSL stripping has previously been used to downgrade HTTPS websites to HTTP, allowing attackers to steal passwords or hijack accounts.

Another approach, DNS Queries over HTTPS (DoH), was designed to support two primary use cases:

Prevent the above problem where on-path devices interfere with DNS. This includes the port blocking problem above.
Enable web applications to access DNS through existing browser APIs.

DoH is essentially HTTPS, the same encrypted standard the web uses, and reuses the same port number (tcp/443). Web browsers have already deprecated non-secure HTTP in favor of HTTPS. That makes HTTPS a great choice for securely transporting DNS messages. An example of such a DoH request can be found here.

DoH: DNS query and response transported over a secure HTTPS stream

Some users have been concerned that the use of HTTPS could weaken privacy due to the potential use of cookies for tracking purposes. The DoH protocol designers considered various privacy aspects and explicitly discouraged use of HTTP cookies to prevent tracking, a recommendation that is widely respected. TLS session resumption improves TLS 1.2 handshake performance, but can potentially be used to correlate TLS connections. Luckily, use of TLS 1.3 obviates the need for TLS session resumption by reducing the number of round trips by default, effectively addressing its associated privacy concern.

Using HTTPS means that HTTP protocol improvements can also benefit DoH. For example, the in-development HTTP/3 protocol, built on top of QUIC, could offer additional performance improvements in the presence of packet loss due to lack of head-of-line blocking. This means that multiple DNS queries could be sent simultaneously over the secure channel without blocking each other when one packet is lost.

A draft for DNS over QUIC (DNS/QUIC) also exists and is similar to DoT, but without the head-of-line blocking problem, due to the use of QUIC. Both HTTP/3 and DNS/QUIC, however, require a UDP port to be accessible. In theory, both could fall back to DoH over HTTP/2 and DoT respectively.

Deployment of DoT and DoH

As both DoT and DoH are relatively new, they are not universally deployed yet.
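To make the DoT exchange described above a little more concrete, here is a minimal sketch in Go that sends a single query to a DoT resolver on tcp/853. It assumes the resolver at 1.1.1.1 presents a certificate valid for cloudflare-dns.com; as with DNS over TCP, every DNS message is framed with a two-byte length prefix. This is illustrative only (no timeouts, no error handling beyond logging, no connection reuse), not a production client.

```go
package main

import (
	"crypto/tls"
	"encoding/binary"
	"fmt"
	"io"
	"log"

	"golang.org/x/net/dns/dnsmessage"
)

func main() {
	// Build a simple DNS query for the A record of example.com.
	query := dnsmessage.Message{
		Header: dnsmessage.Header{ID: 1, RecursionDesired: true},
		Questions: []dnsmessage.Question{{
			Name:  dnsmessage.MustNewName("example.com."),
			Type:  dnsmessage.TypeA,
			Class: dnsmessage.ClassINET,
		}},
	}
	packed, err := query.Pack()
	if err != nil {
		log.Fatal(err)
	}

	// Connect to the resolver on tcp/853; the certificate is validated
	// against the expected resolver name during the TLS handshake.
	conn, err := tls.Dial("tcp", "1.1.1.1:853", &tls.Config{
		ServerName: "cloudflare-dns.com",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// As with DNS over TCP, prefix the message with its length as a
	// two-byte big-endian integer, then send it over the TLS channel.
	framed := make([]byte, 2+len(packed))
	binary.BigEndian.PutUint16(framed, uint16(len(packed)))
	copy(framed[2:], packed)
	if _, err := conn.Write(framed); err != nil {
		log.Fatal(err)
	}

	// Read the length prefix of the answer, then the answer itself.
	lenBuf := make([]byte, 2)
	if _, err := io.ReadFull(conn, lenBuf); err != nil {
		log.Fatal(err)
	}
	answer := make([]byte, binary.BigEndian.Uint16(lenBuf))
	if _, err := io.ReadFull(conn, answer); err != nil {
		log.Fatal(err)
	}

	var resp dnsmessage.Message
	if err := resp.Unpack(answer); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", resp.Answers)
}
```

A real client would keep the TLS connection open and reuse it for subsequent queries, exactly as described above.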
On the server side, major public resolvers including Cloudflare’s 1.1.1.1 and Google DNS support it. Many ISP resolvers however still lack support for it. A small list of public resolvers supporting DoH can be found at DNS server sources, another list of public resolvers supporting DoT and DoH can be found on DNS Privacy Public Resolvers.There are two methods to enable DoT or DoH on end-user devices:Add support to applications, bypassing the resolver service from the operating system.Add support to the operating system, transparently providing support to applications.There are generally three configuration modes for DoT or DoH on the client side:Off: DNS will not be encrypted.Opportunistic mode: try to use a secure transport for DNS, but fallback to unencrypted DNS if the former is unavailable. This mode is vulnerable to downgrade attacks where an attacker can force a device to use unencrypted DNS. It aims to offer privacy when there are no on-path active attackers.Strict mode: try to use DNS over a secure transport. If unavailable, fail hard and show an error to the user.The current state for system-wide configuration of DNS over a secure transport: Android 9: supports DoT through its “Private DNS” feature. Modes: Opportunistic mode (“Automatic”) is used by default. The resolver from network settings (typically DHCP) will be used. Strict mode can be configured by setting an explicit hostname. No IP address is allowed, the hostname is resolved using the default resolver and is also used for validating the certificate. (Relevant source code) iOS and Android users can also install the 1.1.1.1 app to enable either DoH or DoT support in strict mode. Internally it uses the VPN programming interfaces to enable interception of unencrypted DNS traffic before it is forwarded over a secure channel. Linux with systemd-resolved from systemd 239: DoT through the DNSOverTLS option. Off is the default. Opportunistic mode can be configured, but no certificate validation is performed. Strict mode is available since systemd 243. Any certificate signed by a trusted certificate authority is accepted. However, there is no hostname validation with the GnuTLS backend while the OpenSSL backend expects an IP address. In any case, no Server Name Indication (SNI) is sent. The certificate name is not validated, making a man-in-the-middle rather trivial. Linux, macOS, and Windows can use a DoH client in strict mode. The cloudflared proxy-dns command uses the Cloudflare DNS resolver by default, but users can override it through the proxy-dns-upstream option. Web browsers support DoH instead of DoT: Firefox 62 supports DoH and provides several Trusted Recursive Resolver (TRR) settings. By default DoH is disabled, but Mozilla is running an experiment to enable DoH for some users in the USA. This experiment currently uses Cloudflare's 1.1.1.1 resolver, since we are the only provider that currently satisfies the strict resolver policy required by Mozilla. Since many DNS resolvers still do not support an encrypted DNS transport, Mozilla's approach will ensure that more users are protected using DoH. When enabled through the experiment, or through the “Enable DNS over HTTPS” option at Network Settings, Firefox will use opportunistic mode (network.trr.mode=2 at about:config). Strict mode can be enabled with network.trr.mode=3, but requires an explicit resolver IP to be specified (for example, network.trr.bootstrapAddress=1.1.1.1). 
While Firefox ignores the default resolver from the system, it can be configured with alternative resolvers. Additionally, enterprise deployments who use a resolver that does not support DoH have the option to disable DoH. Chrome 78 enables opportunistic DoH if the system resolver address matches one of the hard-coded DoH providers (source code change). This experiment is enabled for all platforms except Linux and iOS, and excludes enterprise deployments by default. Opera 65 adds an option to enable DoH through Cloudflare's 1.1.1.1 resolver. This feature is off by default. Once enabled, it appears to use opportunistic mode: if 1.1.1.1:443 (without SNI) is reachable, it will be used. Otherwise it falls back to the default resolver, unencrypted. The DNS over HTTPS page from the curl project has a comprehensive list of DoH providers and additional implementations.As an alternative to encrypting the full network path between the device and the external DNS resolver, one can take a middle ground: use unencrypted DNS between devices and the gateway of the local network, but encrypt all DNS traffic between the gateway router and the external DNS resolver. Assuming a secure wired or wireless network, this would protect all devices in the local network against a snooping ISP, or other adversaries on the Internet. As public Wi-Fi hotspots are not considered secure, this approach would not be safe on open Wi-Fi networks. Even if it is password-protected with WPA2-PSK, others will still be able to snoop and modify unencrypted DNS.Other security considerationsThe previous sections described secure DNS transports, DoH and DoT. These will only ensure that your client receives the untampered answer from the DNS resolver. It does not, however, protect the client against the resolver returning the wrong answer (through DNS hijacking or DNS cache poisoning attacks). The “true” answer is determined by the owner of a domain or zone as reported by the authoritative name server. DNSSEC allows clients to verify the integrity of the returned DNS answer and catch any unauthorized tampering along the path between the client and authoritative name server.However deployment of DNSSEC is hindered by middleboxes that incorrectly forward DNS messages, and even if the information is available, stub resolvers used by applications might not even validate the results. A report from 2016 found that only 26% of users use DNSSEC-validating resolvers.DoH and DoT protect the transport between the client and the public resolver. The public resolver may have to reach out to additional authoritative name servers in order to resolve a name. Traditionally, the path between any resolver and the authoritative name server uses unencrypted DNS. To protect these DNS messages as well, we did an experiment with Facebook, using DoT between 1.1.1.1 and Facebook’s authoritative name servers. While setting up a secure channel using TLS increases latency, it can be amortized over many queries.Transport encryption ensures that resolver results and metadata are protected. For example, the EDNS Client Subnet (ECS) information included with DNS queries could reveal the original client address that started the DNS query. Hiding that information along the path improves privacy. It will also prevent broken middle-boxes from breaking DNSSEC due to issues in forwarding DNS.Operational issues with DNS encryptionDNS encryption may bring challenges to individuals or organizations that rely on monitoring or modifying DNS traffic. 
Security appliances that rely on passive monitoring watch all incoming and outgoing network traffic on a machine or on the edge of a network. Based on unencrypted DNS queries, they could potentially identify machines which are infected with malware for example. If the DNS query is encrypted, then passive monitoring solutions will not be able to monitor domain names.Some parties expect DNS resolvers to apply content filtering for purposes such as:Blocking domains used for malware distribution.Blocking advertisements.Perform parental control filtering, blocking domains associated with adult content.Block access to domains serving illegal content according to local regulations.Offer a split-horizon DNS to provide different answers depending on the source network.An advantage of blocking access to domains via the DNS resolver is that it can be centrally done, without reimplementing it in every single application. Unfortunately, it is also quite coarse. Suppose that a website hosts content for multiple users at example.com/videos/for-kids/ and example.com/videos/for-adults/. The DNS resolver will only be able to see “example.com” and can either choose to block it or not. In this case, application-specific controls such as browser extensions would be more effective since they can actually look into the URLs and selectively prevent content from being accessible.DNS monitoring is not comprehensive. Malware could skip DNS and hardcode IP addresses, or use alternative methods to query an IP address. However, not all malware is that complicated, so DNS monitoring can still serve as a defence-in-depth tool.All of these non-passive monitoring or DNS blocking use cases require support from the DNS resolver. Deployments that rely on opportunistic DoH/DoT upgrades of the current resolver will maintain the same feature set as usually provided over unencrypted DNS. Unfortunately this is vulnerable to downgrades, as mentioned before. To solve this, system administrators can point endpoints to a DoH/DoT resolver in strict mode. Ideally this is done through secure device management solutions (MDM, group policy on Windows, etc.).ConclusionOne of the cornerstones of the Internet is mapping names to an address using DNS. DNS has traditionally used insecure, unencrypted transports. This has been abused by ISPs in the past for injecting advertisements, but also causes a privacy leak. Nosey visitors in the coffee shop can use unencrypted DNS to follow your activity. All of these issues can be solved by using DNS over TLS (DoT) or DNS over HTTPS (DoH). These techniques to protect the user are relatively new and are seeing increasing adoption.From a technical perspective, DoH is very similar to HTTPS and follows the general industry trend to deprecate non-secure options. DoT is a simpler transport mode than DoH as the HTTP layer is removed, but that also makes it easier to be blocked, either deliberately or by accident.Secondary to enabling a secure transport is the choice of a DNS resolver. Some vendors will use the locally configured DNS resolver, but try to opportunistically upgrade the unencrypted transport to a more secure transport (either DoT or DoH). Unfortunately, the DNS resolver usually defaults to one provided by the ISP which may not support secure transports.Mozilla has adopted a different approach. Rather than relying on local resolvers that may not even support DoH, they allow the user to explicitly select a resolver. 
Resolvers recommended by Mozilla have to satisfy high standards to protect user privacy. To ensure that parental control features based on DNS remain functional, and to support the split-horizon use case, Mozilla has added a mechanism that allows private resolvers to disable DoH.The DoT and DoH transport protocols are ready for us to move to a more secure Internet. As can be seen in previous packet traces, these protocols are similar to existing mechanisms to secure application traffic. Once this security and privacy hole is closed, there will be many more to tackle.
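For comparison with the DoT sketch earlier, the DoH equivalent of the same lookup is just an HTTPS POST that carries the packed DNS message with the application/dns-message content type defined in RFC 8484. The sketch below assumes Cloudflare's resolver endpoint at https://cloudflare-dns.com/dns-query; other DoH providers expose equivalent URLs.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"

	"golang.org/x/net/dns/dnsmessage"
)

func main() {
	// The same DNS query as before, now carried in an HTTPS request body.
	query := dnsmessage.Message{
		Header: dnsmessage.Header{RecursionDesired: true},
		Questions: []dnsmessage.Question{{
			Name:  dnsmessage.MustNewName("example.com."),
			Type:  dnsmessage.TypeA,
			Class: dnsmessage.ClassINET,
		}},
	}
	packed, err := query.Pack()
	if err != nil {
		log.Fatal(err)
	}

	// POST the packed message to the resolver's DoH endpoint over
	// ordinary HTTPS on tcp/443.
	resp, err := http.Post(
		"https://cloudflare-dns.com/dns-query",
		"application/dns-message",
		bytes.NewReader(packed),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	var answer dnsmessage.Message
	if err := answer.Unpack(body); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", answer.Answers)
}
```

Because this runs over ordinary HTTPS on tcp/443, it traverses networks that would block tcp/853, which is exactly the deployment trade-off discussed above.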

50 Years of The Internet. Work in Progress to a Better Internet

CloudFlare Blog -

It was fifty years ago when the very first network packet took flight from the Los Angeles campus at UCLA to the Stanford Research Institute (SRI) building in Menlo Park. Those two California sites had kicked off the world of packet networking, of the Arpanet, and of the modern Internet as we use and know it today. Yet by the time the third packet had been transmitted that evening, the receiving computer at SRI had crashed. The “L” and “O” from the word “LOGIN” had been transmitted successfully in their packets; but that “G”, wrapped in its own packet, caused the death of that nascent packet network setup. Even today, software still crashes; that’s a solid fact. But this historic crash is exactly that — historic.

Courtesy of MIT Advanced Network Architecture Group

So much has happened since that day (October 29th, to be exact) in 1969; in fact, it’s an understatement to say “so much has happened”! It’s unclear that one blog article would ever be able to capture the full history of packets from then to now. Here at Cloudflare we say we are helping build a “better Internet”, so it would make perfect sense for us to honor the history of the Arpanet and its successor, the Internet, by focusing on some of the other folks that have helped build a better Internet.

Leonard Kleinrock, Steve Crocker, and crew - those first packets

Nothing takes away from what happened that October day. The move from a circuit-based networking mindset to a packet-based network was momentous. The phrase net-heads vs bell-heads was born that day - and it’s still alive today! The foundation for the Internet as a permissionless innovation was laid the moment that first packet traversed that network fifty years ago.

Courtesy of UCLA

Professor Leonard (Len) Kleinrock continued to work on the very basics of packet networking. The network used on that day expanded from two nodes to four nodes (in 1969, one IMP was delivered each month from BBN to various university sites) and created a network that spanned the USA from coast to coast and then beyond.

ARPANET logical map 1973 via Wikipedia

In the 1973 map there’s a series of boxes marked TIP. These are a version of the IMP that was used to connect computer terminals along with computers (hosts) to the ARPANET. Every IMP and TIP was managed by Bolt, Beranek and Newman (BBN), based in Cambridge, Mass. This is vastly different from today’s Internet, where every network is operated autonomously.

By 1977 the ARPANET had grown further, with links from the United States mainland to Hawaii plus links to Norway and the United Kingdom.

ARPANET logical map 1977 via Wikipedia

Focusing back on that day in 1969, Steve Crocker (who was a graduate student at UCLA at that time) headed up the development of the NCP software. The Network Control Program (later remembered as Network Control Protocol) provided the host to host transmission control software stack. Early versions of telnet and FTP ran atop NCP.

During this journey, Len Kleinrock, Steve Crocker, and the other early packet pioneers have always been solid members of the Internet community and continue to contribute daily to a better Internet.

Steve Crocker and Bill Duvall have written a guest blog about that day fifty years ago.
Please read it after you've finished reading this blog.BTW: Today, on this 50th anniversary, UCLA is celebrating history via this symposium (see also https://samueli.ucla.edu/internet50/).Their collective accomplishments are extensive and still relevant today.Vint Cerf and Bob Kahn - the creation of TCP/IPIn 1973 Vint Cerf was asked to work on a protocol to replace the original NCP protocol. The new protocol is now known as TCP/IP. Of course, everyone had to move from NCP to TCP and that was outlined in RFC801. At the time (1982 and 1983) there were around 200 to 250 hosts on the ARPANET, yet that transition was still a major undertaking. Finally, on January 1st, 1983, fourteen years after that first packet flowed, the NCP protocol was retired and TCP/IP was enabled. The ARPANET got what would become the Internet’s first large scale addressing scheme (IPv4). This was better in so many ways; but in reality, this transition was just one more stepping stone towards our modern and better Internet.Jon Postel - The RFCs, The numbers, The legacySome people write code, some people write documents, some people organize documents, some people organize numbers. Jon Postel did all of these things. Jon was the first person to be in charge of allocating numbers (you know - IP addresses) back in the early 80’s. In a way it was a thankless job that no-one else wanted to do. Jon was also the keeper of the early documents (Request For Comment or RFCs) that provide us with how the packet network should operate. Everything was available so that anyone could write code and join the network. Everyone was also able to write a fresh document (or update an existing document) so that the ecosystem of the Arpanet could grow. Some of those documents are still in existence and referenced today. RFC791 defines the IP protocol and is dated 1981 - it’s still an active document in-use today! Those early days and Jon’s massive contributions have been well documented and acknowledged. A better Internet is impossible without these conceptual building blocks.Jon passed away in 1998; however, his legacy and his thoughts are still in active use today. He once said within the TCP world: “Be conservative in what you send, be liberal in what you accept”. This is called the robustness principle and it’s still key to writing good network protocol software.Bill Joy & crew - Berkeley BSD Unix 4.2 and its TCP/IP softwareWhat’s the use of a protocol if you don’t have software to speak it. In the early 80’s there were many efforts to build both affordable and fast hardware, along with the software to speak to that hardware. At the University of California, Berkeley (UCB) there was a group of software developers tasked in 1980 by the Defense Advanced Research Projects Agency (DARPA) to implement the brand-new TCP/IP protocol stack on the VAX under Unix. They not-only solved that task; but they went a long way further than just that goal.The folks at UCB (Bill Joy, Marshall Kirk McKusick, Keith Bostic, Michael Karels, and others) created an operating system called 4.2BSD (Berkeley Software Distribution) that came with TCP/IP ingrained in its core. It was based on the AT&T’s Unix v6 and Unix/32V; however it had significantly deviated in many ways. The networking code, or sockets as its interface is called, became the underlying building blocks of each and every piece of networking software in the modern world of the Internet. 
We at Cloudflare have written numerous times about networking kernel code, and it all boils down to the code that was written back at UCB. Bill Joy went on to be a founder of Sun Microsystems (which commercialized 4.2BSD and much more). Others from UCB went on to help build other companies that are still relevant to the Internet today.

Fun fact: Berkeley’s Unix (or FreeBSD, OpenBSD, NetBSD as its variants are known) is now the basis of the software on every iPhone, iPad, and Mac laptop in existence. Androids and Chromebooks come from a different lineage, but still hold those BSD methodologies as the fundamental basis of all their networking software.

Al Gore - The Information Superhighway - or retold as “funding the Internet”

Do you believe that Al Gore invented the Internet? It actually doesn’t matter which side of this statement you want to argue; the simple fact is that the US Government funded the National Science Foundation (NSF) with the task of building an “information superhighway”. Al Gore himself said: “how do we create a nationwide network of information superhighways? Obviously, the private sector is going to do it, but the Federal government can catalyze and accelerate the process.” He made that statement on September 19, 1994, and this blog post author knows that fact because I was there in the room when he said it!

The United States Federal Government helped fund the growth of the Arpanet into the early version of the Internet. Without the government’s efforts, we might not be where we are today. Luckily, just a handful of years later, the NSF decided that in fact the commercial world could and should provide the main building blocks for the Internet, and instantly the Internet as we know it today was born. Packets that fly across commercial backbones are paid for via commercial contracts. The parts that are still funded by the government (any government) are normally only the parts used by universities, or military users.

But this author is still going to thank Al Gore for helping create a better Internet back in the early 90’s.

Sir Tim Berners-Lee - The World Wide Web

What can I say? In 1989 Tim Berners-Lee (who was later knighted and is now Sir Tim) invented the World Wide Web, and we would not have billions of people using the Internet today without him. Period!

Yeah, let’s clear up that subtle point. Sir Tim invented the World Wide Web (WWW) and Vint Cerf invented the Internet. When folks talk about using one or the other, it’s worth reminding them that there is a difference. But I digress!

Sir Tim’s creation is what provides everyday folks with a window into information on the Internet. Before the WWW we had textual interfaces to information, but only if you knew where to look and what to type. We really need to remember, every time we click on a link or press submit to buy something, that the only reason it is usable in such a mass and uniform form is Sir Tim’s creation.

Sally Floyd - The subtle art of dropping packets

Random Early Detection (RED) is an algorithm that saved the Internet back in the early 90’s. Built on earlier work by Van Jacobson, it defined a method to drop packets when a router was overloaded, or more importantly, about to be overloaded. Packet networks, before Van Jacobson’s or Sally Floyd’s work, would congest heavily and slow down. It seemed natural to never throw away data; but between the two inventors of RED, that all changed.
Her follow-up work is described in an August 1993 paper.

Networks have become much more complex since August 1993, yet the RED code still exists and is used in nearly every Unix or Linux kernel today. See the tc-red(8) command and/or the Linux kernel code itself.

It is with great sorrow that we note that Sally Floyd passed away in late August. But, rest assured, her algorithm may well help keep a better Internet flowing smoothly forever.

Jay Adelson and Al Avery - The datacenters that interconnect networks

Remember that comment by Al Gore above, saying that the private sector would build the Internet? Back in the late 90’s that’s exactly what happened. Telecom companies were selling capacity to fledgling ISPs. Nationwide IP backbones were being built by the likes of PSI, Netcom, UUnet, Digex, CAIS, ANS, etc. The telcos themselves (MCI and Sprint, but interestingly not AT&T at the time) were getting into providing Internet access in a big way.

In the US everything was moving very fast. By the mid-90’s there was no way to get a connection anymore from a regional research network for your shiny new ISP. Everything had gone commercial, and the NSF-funded parts of the Internet were not available for commercial packets.

The NSF, in its goal to allow commercial networks to build the Internet, had also specified that those networks should interconnect at four locations around the country: New Jersey, Chicago, the Bay Area in California, and the Washington DC area.

Network Access Point via Wikipedia

The NAPs, as they were called, were to provide interconnection between networks, and to provide the research networks a way to interconnect with commercial networks as well as with each other. The NAPs suddenly exploded in usage, near-instantly needing to be bigger. The buildings they were housed in ran out of space, or power, or both! Yet those networks needed homes, interconnections needed a better structure, and the old buildings that were housing the Internet’s routers just didn’t cut it anymore.

Jay and Al had a vision: new massive datacenters that could securely house the growing needs of the power-hungry Internet. But that’s only a small portion of the vision. They realized that if many networks all lived under the same roof, then interconnecting them could indeed build a better Internet. They installed Internet Exchanges and a standardized way of cross-connecting from one network to another. They were carrier neutral, so that everyone was treated equally. It was what became known as the “network effect”, and it was a success. The more networks you had under one roof, the more that other networks would want to be housed within those same roofs. The company they created was (and still is) called Equinix. It wasn’t the first company to realize this; but it sure has become one of the biggest and most successful in this arena.

Today, a vast amount of the Internet uses Equinix datacenters and its IXs, along with similar offerings from similar companies. Jay and Al’s vision absolutely paved the way to a better Internet.

Everyone who’s a member of The Internet Society 1992-Today

It turns out that people realized that the modern Internet is not all-commercial all-the-time. There is a need for other influences to be heard. Civil society, governments, and academics, along with those commercial entities, should also have a say in how the Internet evolves. This brings into the conversation a myriad of people that have either been members of The Internet Society (ISOC) and/or have worked directly for ISOC over its 27+ years.
This is the organization that manages and helps fund the IETF (where protocols are discussed and standardized). ISOC plays a decisive role at The Internet Governance Forum (IGF), and fosters a clear understanding, among both the general public and regulators worldwide, of how the Internet should be used and protected. ISOC’s involvement with Internet Exchange development (vital as the Internet grows and connects users and content) has been a game changer for many, many countries, especially in Africa.

ISOC has an interesting funding mechanism centered around the dotORG domain. You may not have realized that you were helping the Internet grow when you registered and paid for your .org domain; however, you are!

Over the life of ISOC, the Internet has moved from being the domain of engineers and scientists into something used by nearly everyone, independent of technical skill or, in fact, a full understanding of its inner workings. ISOC’s mission is "to promote the open development, evolution and use of the Internet for the benefit of all people throughout the world". It has been a solid part of that growth.

Giving voice to everyone on how the Internet could grow and how it should (or should not) be regulated is front and center for every person involved with ISOC globally. Defining both an inclusive Internet and a better Internet is the everyday job for those people.

Kanchana Kanchanasut - Thailand and .TH

In 1988, amongst other things, Professor Kanchana Kanchanasut registered and operated the country top-level domain .TH (which is the two-letter ISO 3166 code for Thailand). This made Thailand one of the first countries to have its own TLD; something all countries take for granted today.

Also in 1988, five Thai universities got dial-up connections to the Internet because of her work. However, the real breakthrough came when Prof. Kanchanasut’s efforts led to the first leased line interconnecting Thailand to the nascent Internet of the early 90’s. That was 1991, and since then Thailand’s connectivity has exploded. It’s an amazingly well connected country. Today it boasts a plethora of mobile operators, and international undersea and cross-border cables, along with Prof. Kanchanasut’s present-day work spearheading an independent and growing Internet Exchange within Thailand.

In 2013, the "Mother of the Internet in Thailand", as she is affectionately called, was inducted into the Internet Hall of Fame by the Internet Society. If you’re in Thailand, or South East Asia, then she’s the reason why you have a better Internet.

The list continues

In the fifty years since that first packet there have been heroes, both silent and profoundly vocal, who have moved the Internet forward. There’s no way all could be named or called out; however, you will find many listed if you go look. Wander through the thousands of RFCs, or check out the Internet Hall of Fame. The Internet today is a better Internet because anyone can be a contributor.

Cloudflare and the better Internet

Cloudflare, or in fact any part of the Internet, would not be where it is today without the groundbreaking work of these people plus many others unnamed here. This fifty-year effort has moved the needle in such a way that, without all of them, the runaway success of the Internet would not have been possible!

Cloudflare is just over nine years old (that’s only 18% of this fifty-year period). Gazillions and gazillions of packets have flowed since Cloudflare started providing its services, and we sincerely believe we have done our part with those services to build a better Internet.
Oh, and we haven’t finished our work, far from it! We still have a long way to go in helping build a better Internet. And we’re just getting started!A letter from Matthew Prince (@eastdakota) and Michelle Zatlyn (@zatlyn) #BetterInternet $NET https://t.co/BHLI8MuuTS pic.twitter.com/Jirb0bPUzJ— Cloudflare (@Cloudflare) September 13, 2019 If you’re interested in helping build a better Internet and want to join Cloudflare in our offices in San Francisco, Singapore, London, Austin, Sydney, Champaign, Munich, San Jose, New York or our new Lisbon Portugal offices, then buzz over to our jobs page and come join us! #betterInternet

Fifty Years Ago

CloudFlare Blog -

This is a guest post by Steve Crocker of Shinkuro, Inc. and Bill Duvall of Consulair. Fifty years ago they were both present when the first packets flowed on the Arpanet. On 29 October 2019, Professor Leonard (“Len”) Kleinrock is chairing a celebration at the University of California, Los Angeles (UCLA).  The date is the fiftieth anniversary of the first full system test and remote host-to-host login over the Arpanet.  Following a brief crash caused by a configuration problem, a user at UCLA was able to log in to the SRI SDS 940 time-sharing system.  But let us paint the rest of the picture.The Arpanet was a bold project to connect sites within the ARPA-funded computer science research community and to use packet-switching as the technology for doing so.  Although there were parallel packet-switching research efforts around the globe, none were at the scale of the Arpanet project. Cooperation among researchers in different laboratories, applying multiple machines to a single problem and sharing of resources were all part of the vision.  And over the fifty years since then, the vision has been fulfilled, albeit with some undesired outcomes mixed in with the enormous benefits.  However, in this blog, we focus on just those early days.In September 1969, Bolt, Beranek and Newman (BBN) in Cambridge, MA delivered the first Arpanet IMP (packet switch) to Len Kleinrock’s laboratory at UCLA. The Arpanet incorporated his theoretical work on packet switching and UCLA was chosen as the network measurement site for validation of his theories.  The second IMP was installed a month later at Doug Engelbart’s laboratory at the Stanford Research Institute – now called SRI International – in Menlo Park, California.  Engelbart had invented the mouse and his lab had developed a graphical interface for structured and hyperlinked text.  Engelbart’s vision saw computer users sharing information over a wide-scale network, so the Arpanet was a natural candidate for his work. Today, we have seen that vision travel from SRI to Xerox to Apple to Microsoft, and it is now a part of everyone’s environment.“IMP” stood for Interface Message Processor; we would now simply say “router.” Each IMP was connected to up to four host computers.  At UCLA the first host was a Scientific Data Systems (SDS) Sigma 7.  At SRI, the host was an SDS 940.  Jon Postel, Vint Cerf and Steve Crocker were among the graduate students at UCLA involved in the design of the protocols between the hosts on the Arpanet, as were Bill Duvall, Jeff Rulifson, and others at SRI (see RFC 1 and RFC 2.)SRI and UCLA quickly connected their hosts to the IMPs.  Duvall at SRI modified the SDS 940 time-sharing system to allow host to host terminal connections over the net. Charley Kline wrote the complementary client program at UCLA.  These efforts required building custom hardware for connecting the IMPs to the hosts, and programming for both the IMPs and the respective hosts.  At the time, systems programming was done either in assembly language or special purpose hybrid languages blending simple higher-level language features with assembler.  Notable examples were ESPOL for the Burroughs 5500 and PL/I for Multics.  Much of Engelbart’s NLS system was written in such a language, but the time-sharing system was written in assembler for efficiency and size considerations.Along with the delivery of the IMPs, a deadline of October 31 was set for connecting the first hosts.  
Testing was scheduled to begin on October 29 in order to allow a few days for necessary debugging and handling of unanticipated problems.   In addition to the high-speed line that connected the SRI and UCLA IMPs, there was a parallel open, dedicated voice line. On the evening of October 29 Duvall at SRI donned his headset as did Charley Kline at UCLA, and both host-IMP pairs were started. Charley typed an L, the first letter of a LOGIN command.  Duvall, tracking the activity at SRI, saw that the L was received, and that it launched a user login process within the 940. The 940 system was full duplex, so it echoed an “L” across the net to UCLA.  At UCLA, the L appeared on the terminal.  Success! Charley next typed O and received back O.  Charley typed G, and there was silence.  At SRI, Duvall quickly determined that an echo buffer had been sized too small[1], re-sized it, and restarted the system. Charley  typed “LO” again, and received back the normal “LOGIN”.  He typed a confirming RETURN, and the first host-to-host login on the Arpanet was completed.Len Kleinrock noted that the first characters sent over the net were “LO.”  Sensing the importance of the event, he expanded “LO" to “Lo and Behold”, and used that in the title of the movie called “Lo and Behold: Reveries of the Connected World.”  See imdb.com/title/tt5275828.Engelbart's five finger keyboard and mouse with three buttons. The mouse evolved and became ubiquitous. The five finger keyboard faded.IMPs continued to be installed on the Arpanet at the rate of roughly one per month over the next two years.  Soon we had a spectacularly large network with more than twenty hosts, and the connections between the IMPs were permanent telephone lines operating at the lightning speed of 50,000 bits per second[2].Len Kleinrock and IMP #1 at UCLAToday, all computers come with hardware and software to communicate with other computers.  Not so back then.  Each computer was the center of its own world, and expected to be connected only to subordinate “peripheral” devices – printers, tape drives, etc.  Many even used different character sets.  There was no standard method for connecting two computers together, not even ones from the same manufacturer. Part of what made the Arpanet project bold was the diversity of the hardware and software at the research centers.  Almost all of the hosts at these sites were time-shared computers.  Typically, several people shared the same computer, and the computer processed each user’s computation a little bit at a time.  These computers were large and expensive.  Personal computers were fifteen years in the future, and smart phones were science fiction.  Even Dick Tracy’s fantasy two-way wrist radio envisioned only voice interaction, not instant access to databases and sharing of pictures and videos.Dick Tracy and his two-way radio.Each site had to create a hardware connection from the host(s) to the IMP. Further, each site had to add drivers or more to the operating system in its host(s) so that programs on the host could communicate with the IMP.  The protocols for host to host communication were in their infancy and unproven.During those first two years when IMPs were being installed monthly, we met with students and researchers at the other sites to develop the first suite of protocols.  The bottom layer was forgettably named the Host-Host protocol[3].  Telnet, for emulating terminal dial-up, and the File Transfer Protocol (FTP) were on the next layer above the Host-Host protocol.  
Email started as a special case of FTP and later evolved into its own protocol.  Other networks sprang up and the Arpanet became the seedling for the Internet, with TCP providing a reliable, two-way host to host connection, and IP below it stitching together the multiple networks of the Internet.  But the Telnet and FTP protocols continued for many years and are only recently being phased out in favor of more robust and more secure alternatives.The hardware interfaces, the protocols and the software that implemented the protocols were the tangible engineering products of that early work.  Equally important was the social fabric and culture that we created.  We knew the system would evolve, so we envisioned an open and evolving architecture.  Many more protocols would be created, and the process is now embodied in the Internet Engineering Task Force (IETF).  There was also a strong spirit of cooperation and openness.  The Request for Comments (RFCs) series of notes were open for anyone to write and everyone to read.  Anyone was welcome to participate in the design of the protocol, and hence we now have important protocols that have originated from all corners of the world.In October 1971, two years after the first IMP was installed, we held a meeting at MIT to test the software on all of the hosts.  Researchers at each host attempted to login, via Telnet, to each of the other hosts.  In the spirit of Samuel Johnson’s famous quote[4], the deadline and visibility within the research community stimulated frenetic activity all across the network to get everything working.  Almost all of the hosts were able to login to all of the other hosts.  The Arpanet was finally up and running.  And the bakeoff at MIT that October set the tone for the future: test your software by connecting to others.  No need for formal standards certification or special compliance organizations; the pressure to demonstrate your stuff actually works with others gets the job done. [1] The SDS 940 had a maximum memory size of 65K 24-bit words. The time-sharing system along with all of its associated drivers and active data had to share this limited memory, so space was precious and all data structures and buffers were kept to the minimum possible size. The original host-to-host protocol called for terminal emulation and single character messages, and buffers were sized accordingly. What had not been anticipated was that in a full duplex system such as the 940, multiple characters might be echoed for a single received character. Such was the case when the G of LOG was echoed back as “GIN” due to the command completion feature of the SDS 940 operating system. [2] “50,000” is not a misprint. The telephone lines in those days were analog, not digital. To achieve a data rate of 50,000 bits per second, AT&T used twelve voice grade lines bonded together and a Western Electric series 303A modem that spread the data across the twelve lines. Several years later, an ordinary “voice grade” line was implemented with digital technology and could transmit data at 56,000 bits per second, but in the early days of the Arpanet 50Kbs was considered very fast. These lines were also quite expensive. [3] In the papers that described the Host-Host protocol, the term Network Control Program (NCP) designated the software addition to the operating system that implemented the Host-Host protocol. Over time, the term Host-Host protocol fell into disuse in favor of Network Control Protocol, and the initials “NCP” were repurposed. 
[4] Samuel Johnson - ‘Depend upon it, sir, when a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully.’

Supporting the latest version of the Privacy Pass Protocol

CloudFlare Blog -

At Cloudflare, we are committed to supporting and developing new privacy-preserving technologies that benefit all Internet users. In November 2017, we announced server-side support for the Privacy Pass protocol, a piece of work developed in collaboration with the academic community. Privacy Pass, in a nutshell, allows clients to provide proof of trust without revealing where and when the trust was provided. The aim of the protocol is then to allow anyone to prove they are trusted by a server, without that server being able to track the user via the trust that was assigned.

On a technical level, Privacy Pass clients receive attestation tokens from a server, which can then be redeemed in the future. These tokens are provided when a server deems the client to be trusted; for example, after they have logged into a service or if they prove certain characteristics. The redeemed tokens are cryptographically unlinkable to the attestation originally provided by the server, and so they do not reveal anything about the client.

To use Privacy Pass, clients can install an open-source browser extension available in Chrome & Firefox. There have been over 150,000 individual downloads of Privacy Pass worldwide; approximately 130,000 in Chrome and more than 20,000 in Firefox. The extension is supported by Cloudflare to make websites more accessible for users. This complements previous work, including the launch of Cloudflare onion services to help improve accessibility for users of the Tor Browser.

The initial release was almost two years ago, and it was followed up with a research publication that was presented at the Privacy Enhancing Technologies Symposium 2018 (winning a Best Student Paper award). Since then, Cloudflare has been working with the wider community to build on the initial design and improve Privacy Pass. We’ll be talking about the work that we have done to develop the existing implementations, alongside the protocol itself.

What’s new?

Support for the Privacy Pass v2.0 browser extension:
Easier configuration of workflow.
Integration with a new service provider (hCaptcha).
Compliance with the hash-to-curve draft.
Possible to rotate keys outside of an extension release.
Available in Chrome and Firefox (works best with up-to-date browser versions).

Rolling out a new server backend using the Cloudflare Workers platform:
Cryptographic operations performed using the internal V8 engine.
Provides a public redemption API for Cloudflare Privacy Pass v2.0 tokens.
Available by making POST requests to https://privacypass.cloudflare.com/api/redeem. See the documentation for example usage.
Only compatible with extension v2.0 (check that you have updated!).

Standardization:
Continued development of the oblivious pseudorandom functions (OPRFs) in prime-order groups draft with the CFRG@IRTF.
A new draft specifying the Privacy Pass protocol.

Extension v2.0

In the time since the release, we’ve been working on a number of new features. Today we’re excited to announce support for version 2.0 of the extension, the first update since the original release. The extension continues to be available for Chrome and Firefox. You may need to download v2.0 manually from the store if you have auto-updates disabled in your browser.

The extension remains under active development, and we still regard our support as being in the beta phase.
This will continue to be the case as the draft specification of the protocol continues to be written in collaboration with the wider community.New IntegrationsThe client implementation uses the WebRequest API to look for certain types of HTTP requests. When these requests are spotted, they are rewritten to include some cryptographic data required for the Privacy Pass protocol. This allows Privacy Pass providers receiving this data to authorize access for the user.For example, a user may receive Privacy Pass tokens for completing some server security checks. These tokens are stored by the browser extension, and any future request that needs similar security clearance can be modified to add a stored token as an extra HTTP header. The server can then check the client token and verify that the client has the correct authorization to proceed.While Cloudflare supports a particular type of request flow, it would be impossible to expect different service providers to all abide by the same exact interaction characteristics. One of the major changes in the v2.0 extension has been a technical rewrite to instead use a central configuration file. The config is specified in the source code of the extension and allows easier modification of the browsing characteristics that initiate Privacy Pass actions. This makes adding new, completely different request flows possible by simply cloning and adapting the configuration for new providers.To demonstrate that such integrations are now possible with other services beyond Cloudflare, a new version of the extension will soon be rolling out that is supported by the CAPTCHA provider hCaptcha. Users that solve ephemeral challenges provided by hCaptcha will receive privacy-preserving tokens that will be redeemable at other hCaptcha customer sites.“hCaptcha is focused on user privacy, and supporting Privacy Pass is a natural extension of our work in this area. We look forward to working with Cloudflare and others to make this a common and widely adopted standard, and are currently exploring other applications. Implementing Privacy Pass into our globally distributed service was relatively straightforward, and we have enjoyed working with the Cloudflare team to improve the open source Chrome browser extension in order to deliver the best experience for our users.” - Eli-Shaoul Khedouri, founder of hCaptchaThis hCaptcha integration with the Privacy Pass browser extension acts as a proof-of-concept in establishing support for new services. Any new providers that would like to integrate with the Privacy Pass browser extension can do so simply by making a PR to the open-source repository.Improved cryptographic functionalityAfter the release of v1.0 of the extension, there were features that were still unimplemented. These included proper zero-knowledge proof validation for checking that the server was always using the same committed key. In v2.0 this functionality has been completed, verifiably preventing a malicious server from attempting to deanonymize users by using a different key for each request.The cryptographic operations required for Privacy Pass are performed using elliptic curve cryptography (ECC). The extension currently uses the NIST P-256 curve, for which we have included some optimisations. Firstly, this makes it possible to store elliptic curve points in compressed and uncompressed data formats. 
This means that browser storage can be reduced by 50%, and that server responses can be made smaller too.Secondly, support has been added for hashing to the P-256 curve using the “Simplified Shallue-van de Woestijne-Ulas” (SSWU) method specified in an ongoing draft (https://tools.ietf.org/html/draft-irtf-cfrg-hash-to-curve-03) for standardizing encodings for hashing to elliptic curves. The implementation is compliant with the specification of the “P256-SHA256-SSWU-” ciphersuite in this draft.These changes have a dual advantage, firstly ensuring that the P-256 hash-to-curve specification is compliant with the draft specification. Secondly this ciphersuite removes the necessity for using probabilistic methods, such as hash-and-increment. The hash-and-increment method has a non-negligible chance of failure, and the running time is highly dependent on the hidden client input. While it is not clear how to abuse timing attack vectors currently, using the SSWU method should reduce the potential for attacking the implementation, and learning the client input.Key rotationAs we mentioned above, verifying that the server is always using the same key is an important part of ensuring the client’s privacy. This ensures that the server cannot segregate the user base and reduce client privacy by using different secret keys for each client that it interacts with. The server guarantees that it’s always using the same key by publishing a commitment to its public key somewhere that the client can access.Every time the server issues Privacy Pass tokens to the client, it also produces a zero-knowledge proof that it has produced these tokens using the correct key.Before the extension stores any tokens, it first verifies the proof against the commitments it knows. Previously, these commitments were stored directly in the source code of the extension. This meant that if the server wanted to rotate its key, then it required releasing a new version of the extension, which was unnecessarily difficult. The extension has been modified so that the commitments are stored in a trusted location that the client can access when it needs to verify the server response. Currently this location is a separate Privacy Pass GitHub repository. For those that are interested, we have provided a more detailed description of the new commitment format in Appendix A at the end of this post.Implementing server-side support in WorkersSo far we have focused on client-side updates. As part of supporting v2.0 of the extension, we are rolling out some major changes to the server-side support that Cloudflare uses. For version 1.0, we used a Go implementation of the server. In v2.0 we are introducing a new server implementation that runs in the Cloudflare Workers platform. This server implementation is only compatible with v2.0 releases of Privacy Pass, so you may need to update your extension if you have auto-updates turned off in your browser.Our server will run at https://privacypass.cloudflare.com, and all Privacy Pass requests sent to the Cloudflare edge are handled by Worker scripts that run on this domain. Our implementation has been rewritten using Javascript, with cryptographic operations running in the V8 engine that powers Cloudflare Workers. This means that we are able to run highly efficient and constant-time cryptographic operations. 
On top of this, we benefit from the enhanced performance provided by running our code in the Workers Platform, as close to the user as possible.WebCrypto supportFirstly, you may be asking, how do we manage to implement cryptographic operations in Cloudflare Workers? Currently, support for performing cryptographic operations is provided in the Workers platform via the WebCrypto API. This API allows users to compute functionality such as cryptographic hashing, alongside more complicated operations like ECDSA signatures.In the Privacy Pass protocol, as we’ll discuss a bit later, the main cryptographic operations are performed by a protocol known as a verifiable oblivious pseudorandom function (VOPRF). Such a protocol allows a client to learn function outputs computed by a server, without revealing to the server what their actual input was. The verifiable aspect means that the server must also prove (in a publicly verifiable way) that the evaluation they pass to the user is correct. Such a function is pseudorandom because the server output is indistinguishable from a random sequence of bytes.The VOPRF functionality requires a server to perform low-level ECC operations that are not currently exposed in the WebCrypto API. We balanced out the possible ways of getting around this requirement. First we trialled trying to use the WebCrypto API in a non-standard manner, using EC Diffie-Hellman key exchange as a method for performing the scalar multiplication that we needed. We also tried to implement all operations using pure JavaScript. Unfortunately both methods were unsatisfactory in the sense that they would either mean integrating with large external cryptographic libraries, or they would be far too slow to be used in a performant Internet setting.In the end, we settled on a solution that adds functions necessary for Privacy Pass to the internal WebCrypto interface in the Cloudflare V8 Javascript engine. This algorithm mimics the sign/verify interface provided by signature algorithms like ECDSA. In short, we use the sign() function to issue Privacy Pass tokens to the client. While verify() can be used by the server to verify data that is redeemed by the client. These functions are implemented directly in the V8 layer and so they are much more performant and secure (running in constant-time, for example) than pure JS alternatives. The Privacy Pass WebCrypto interface is not currently available for public usage. If it turns out there is enough interest in using this additional algorithm in the Workers platform, then we will consider making it public.ApplicationsIn recent times, VOPRFs have been shown to be a highly useful primitive in establishing many cryptographic tools. Aside from Privacy Pass, they are also essential for constructing password-authenticated key exchange protocols such as OPAQUE. They have also been used in designs of private set intersection, password-protected secret-sharing protocols, and privacy-preserving access-control for private data storage.Public redemption APIWriting the server in Cloudflare Workers means that we will be providing server-side support for Privacy Pass on a public domain! While we only issue tokens to clients after we are sure that we can trust them, anyone will be able to redeem the tokens using our public redemption API at https://privacypass.cloudflare.com/api/redeem. 
As we roll-out the server-side component worldwide, you will be able to interact with this API and verify Cloudflare Privacy Pass tokens independently of the browser extension.This means that any service can accept Privacy Pass tokens from a client that were issued by Cloudflare, and then verify them with the Cloudflare redemption API. Using the result provided by the API, external services can check whether Cloudflare has authorized the user in the past.We think that this will benefit other service providers because they can use the attestation of authorization from Cloudflare in their own decision-making processes, without sacrificing the privacy of the client at any stage. We hope that this ecosystem can grow further, with potentially more services providing public redemption APIs of their own. With a more diverse set of issuers, these attestations will become more useful.By running our server on a public domain, we are effectively a customer of the Cloudflare Workers product. This means that we are also able to make use of Workers KV for protecting against malicious clients. In particular, servers must check that clients are not re-using tokens during the redemption phase. The performance of Workers KV in analyzing reads makes this an obvious choice for providing double-spend protection globally.If you would like to use the public redemption API, we provide documentation for using it at https://privacypass.github.io/api-redeem. We also provide some example requests and responses in Appendix B at the end of the post.Standardization & new applicationsIn tandem with the recent engineering work that we have been doing on supporting Privacy Pass, we have been collaborating with the wider community in an attempt to standardize both the underlying VOPRF functionality, and the protocol itself. While the process of standardization for oblivious pseudorandom functions (OPRFs) has been running for over a year, the recent efforts to standardize the Privacy Pass protocol have been driven by very recent applications that have come about in the last few months.Standardizing protocols and functionality is an important way of providing interoperable, secure, and performant interfaces for running protocols on the Internet. This makes it easier for developers to write their own implementations of this complex functionality. The process also provides helpful peer reviews from experts in the community, which can lead to better surfacing of potential security risks that should be mitigated in any implementation. Other benefits include coming to a consensus on the most reliable, scalable and performant protocol designs for all possible applications.Oblivious pseudorandom functionsOblivious pseudorandom functions (OPRFs) are a generalization of VOPRFs that do not require the server to prove that they have evaluated the functionality properly. Since July 2019, we have been collaborating on a draft with the Crypto Forum Research Group (CFRG) at the Internet Research Task Force (IRTF) to standardize an OPRF protocol that operates in prime-order groups. This is a generalisation of the setting that is provided by elliptic curves. This is the same VOPRF construction that was originally specified by the Privacy Pass protocol and is based heavily on the original protocol design from the paper of Jarecki, Kiayias and Krawczyk.One of the recent changes that we've made in the draft is to increase the size of the key that we consider for performing OPRF operations on the server-side. 
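Before getting into why the key size matters, it helps to have a concrete picture of a single OPRF query. The toy sketch below shows the blind, evaluate, unblind round trip at the heart of the construction. It works in a small prime-order subgroup of the integers modulo p rather than an elliptic curve, all numbers and names are ours rather than the draft's, and nothing here is constant-time or safe for real use:

const p = 23n;  // toy modulus: a safe prime, p = 2q + 1
const q = 11n;  // prime order of the subgroup we work in
const g = 4n;   // generator of the order-q subgroup of Z_p^*

// Modular exponentiation by square-and-multiply.
function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  let b = base % mod;
  let e = exp;
  while (e > 0n) {
    if (e & 1n) result = (result * b) % mod;
    b = (b * b) % mod;
    e >>= 1n;
  }
  return result;
}

// Inverse modulo the prime q, via Fermat's little theorem.
const modInv = (a: bigint): bigint => modPow(a, q - 2n, q);

// Toy "hash to the group": map the input to an exponent and raise g to it.
// (The real protocol hashes directly to an elliptic curve point, e.g. via SSWU.)
function hashToGroup(input: string): bigint {
  let acc = 0n;
  for (const ch of input) acc = (acc * 31n + BigInt(ch.charCodeAt(0))) % q;
  return modPow(g, acc === 0n ? 1n : acc, p);
}

const k = 7n;  // the server's secret OPRF key; its public commitment would be g^k

// 1. Client blinds its input so the server never learns x or H(x).
const x = "my token seed";
const Hx = hashToGroup(x);
const r = 5n;                      // client's random blinding factor, 1 <= r < q
const blinded = modPow(Hx, r, p);  // H(x)^r, the "query" sent to the server

// 2. Server evaluates the function on the blinded element (one OPRF query).
const evaluated = modPow(blinded, k, p);  // (H(x)^r)^k

// 3. Client unblinds the result to recover H(x)^k without revealing x.
const output = modPow(evaluated, modInv(r), p);

console.log(output === modPow(Hx, k, p));  // true: the client learned H(x)^k

Each round trip like this hands the client one evaluation under the server's key k, which is exactly the kind of query the leakage results discussed next rely on.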
Existing research suggests that it is possible to create specific queries that can lead to small amounts of the key being leaked. For keys that provide only 128 bits of security this can be a problem, as leaking too many bits would reduce security beyond currently accepted levels. To counter this, we have effectively increased the minimum key size to 192 bits. This prevents this leakage from becoming an attack vector using any practical methods. We discuss these attacks in more detail later on when discussing our future plans for VOPRF development.

Recent applications and standardizing the protocol

The application that we demonstrated when originally supporting Privacy Pass was always intended as a proof-of-concept for the protocol. Over the past few months, a number of new possibilities have arisen in areas that go far beyond what was previously envisaged.

For example, the trust token API, developed by the Web Incubator Community Group, has been proposed as an interface for using Privacy Pass. This application allows third-party vendors to check that a user has received a trust attestation from a set of central issuers. This allows the vendor to make decisions about the honesty of a client without having to associate a behaviour profile with the identity of the user. The objective is to prevent fraudulent activity from users who are not trusted by the central issuer set. Checking trust attestations with central issuers would be possible using similar redemption APIs to the one that we have introduced.

A separate piece of work from Facebook details a similar application for preventing fraudulent behavior that may also be compatible with the Privacy Pass protocol. Finally, other applications have arisen in the areas of providing access to private storage and establishing security and privacy models in advertisement confirmations.

A new draft

With the applications above in mind, we have recently started collaborative work on a new IETF draft that specifically lays out the required functionality provided by the Privacy Pass protocol as a whole. Our aim is to develop, alongside wider industrial partners and the academic community, a functioning specification of the Privacy Pass protocol. We hope that by doing this we will be able to design a base-layer protocol that can then be used as a cryptographic primitive in wider applications that require some form of lightweight authorization. Our plan is to present the first version of this draft at the upcoming IETF 106 meeting in Singapore next month.

The draft is still in the early stages of development and we are actively looking for people who are interested in helping to shape the protocol specification. We would be grateful for any help that contributes to this process. See the GitHub repository for the current version of the document.

Future avenues

Finally, while we are actively working on a number of different pathways in the present, the future directions for the project are still open. We believe that there are many applications out there that we have not considered yet and we are excited to see where the protocol is used in the future. Here are some other ideas we have for novel applications and security properties that we think might be worth pursuing in future.

Publicly verifiable tokens

One of the disadvantages of using a VOPRF is that redemption tokens are only verifiable by the original issuing server. If we used an underlying primitive that allowed public verification of redemption tokens, then anyone could verify that the issuing server had issued the particular token. Such a protocol could be constructed on top of so-called blind signature schemes, such as Blind RSA. Unfortunately, there are performance and security concerns arising from the usage of blind signature schemes in a browser environment. Existing schemes (especially RSA-based variants) require cryptographic computations that are much heavier than the construction used in our VOPRF protocol.

Post-quantum VOPRF alternatives

The only constructions of VOPRFs exist in pre-quantum settings, usually based on the hardness of well-known problems in group settings such as the discrete-log assumption. No constructions of VOPRFs are known to provide security against adversaries that can run quantum computational algorithms. This means that the Privacy Pass protocol is only believed to be secure against adversaries running on classical hardware.

Recent developments suggest that quantum computing may arrive sooner than previously thought. As such, we believe that investigating the possibility of constructing practical post-quantum alternatives for our current cryptographic toolkit is a task of great importance for ourselves and the wider community. In this case, devising performant post-quantum alternatives for VOPRF constructions would be an important theoretical advancement. Eventually this would lead to a Privacy Pass protocol that still provides privacy-preserving authorization in a post-quantum world.

VOPRF security and larger ciphersuites

We mentioned previously that VOPRFs (or simply OPRFs) are susceptible to small amounts of possible leakage in the key. Here we will give a brief description of the actual attacks themselves, along with further details on our plans for implementing higher security ciphersuites to mitigate the leakage.

Specifically, malicious clients can interact with a VOPRF to create something known as a q-Strong-Diffie-Hellman (q-sDH) sample. Such samples are created in mathematical groups (usually in the elliptic curve setting). For any group there is a public element g that is central to all Diffie-Hellman type operations, along with the server key K, which is usually just interpreted as a randomly generated number from this group. A q-sDH sample takes the form ( g, g^K, g^(K^2), … , g^(K^q) ) and asks the malicious adversary to create a pair of elements satisfying (g^(1/(s+K)), s). It is possible for a client in the VOPRF protocol to create a q-sDH sample by just submitting the result of the previous VOPRF evaluation back to the server.

While this problem is believed to be hard to break, there are a number of past works that show that the problem is somewhat easier than the size of the group suggests (for example, see here and here). Concretely speaking, the bit security implied by the group can be reduced by up to log2(q) bits. While this is not immediately fatal, even to groups that should provide 128 bits of security, it can lead to a loss of security that means that the setting is no longer future-proof. As a result, any group providing VOPRF functionality that is instantiated using an elliptic curve such as P-256 or Curve25519 provides weaker than advised security guarantees.

With this in mind, we have taken the recent decision to upgrade the ciphersuites that we recommend for OPRF usage to only those that provide > 128 bits of security, as standard. For example, Curve448 provides 192 bits of security.
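To make the arithmetic behind that choice explicit, here is a rough, back-of-the-envelope reading of the "up to log2(q) bits" bound above, using the post's own figures (a sketch, not a formal security statement):

\[
\text{effective security after } q \text{ queries} \;\gtrsim\; s - \log_2(q) \text{ bits}
\]
\[
s - \log_2(q) < 128 \quad \text{with } s = 192 \;\Longrightarrow\; q > 2^{64}
\]

In other words, an attacker needs on the order of at least 2^64 adaptive OPRF queries before a group offering 192 bits of security could even in principle dip below the 128-bit floor; the concrete figure quoted next sits a little above that threshold, consistent with the log2(q) loss being only a worst-case bound.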
To launch an attack that reduced security to an amount lower than 128 bits would require making 2^(68) client OPRF queries. This is a significant barrier to entry for any attacker, and so we regard these ciphersuites as safe for instantiating the OPRF functionality.

In the near future, it will be necessary to upgrade the ciphersuites that are used in our support of the Privacy Pass browser extension to the recommendations made in the current VOPRF draft. In general, with a more iterative release process, we hope that the Privacy Pass implementation will be able to follow the current draft standard more closely as it evolves during the standardization process.

Get in touch!

You can now install v2.0 of the Privacy Pass extension in Chrome or Firefox.

If you would like to help contribute to the development of this extension then you can do so on GitHub. Are you a service provider that would like to integrate server-side support for the extension? Then we would be very interested in hearing from you!

We will continue to work with the wider community in developing the standardization of the protocol, taking our motivation from the available applications that have been developed. We are always looking for new applications that can help to expand the Privacy Pass ecosystem beyond its current boundaries.

Appendix

Here are some extra details related to the topics that we covered above.

A. Commitment format for key rotations

Key commitments are necessary for the server to prove that it is acting honestly during the Privacy Pass protocol. The commitments that Privacy Pass uses for the v2.0 release have a slightly different format from the previous release.

"2.00": {
  "H": "BPivZ+bqrAZzBHZtROY72/E4UGVKAanNoHL1Oteg25oTPRUkrYeVcYGfkOr425NzWOTLRfmB8cgnlUfAeN2Ikmg=",
  "expiry": "2020-01-11T10:29:10.658286752Z",
  "sig": "MEUCIQDu9xeF1q89bQuIMtGm0g8KS2srOPv+4hHjMWNVzJ92kAIgYrDKNkg3GRs9Jq5bkE/4mM7/QZInAVvwmIyg6lQZGE0="
}

First, the version of the server key is 2.00; the server must inform the client which version it intends to use in the response containing issued tokens. This is so that the client can always use the correct commitments when verifying the zero-knowledge proof that the server sends. The value of the member H is the public key commitment to the secret key used by the server. This is a base64-encoded elliptic curve point of the form H=kG, where G is the fixed generator of the curve and k is the secret key of the server. Since the discrete-log problem is believed to be hard to solve, deriving k from H is believed to be difficult. The value of the member expiry is an expiry date for the commitment that is used. The value of the member sig is an ECDSA signature evaluated using a long-term signing key associated with the server, and over the values of H and expiry.

When a client retrieves the commitment, it checks that it hasn't expired and that the signature verifies using the corresponding verification key that is embedded into the configuration of the extension. If these checks pass, it retrieves H and verifies the issuance response sent by the server. Previous versions of these commitments did not include signatures, but these signatures will be validated from v2.0 onwards. When a server wants to rotate the key, it simply generates a new key k2 and appends a new commitment to k2 with a new identifier such as 2.01. It can then use k2 as the secret for the VOPRF operations that it needs to compute.
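As a rough sketch of those client-side checks (not the extension's actual code), the snippet below refuses expired commitments and verifies an ECDSA signature over H and expiry. The byte layout that is signed, the choice of a P-256 signing key, and the way the verification key is distributed are assumptions made here for illustration; to keep the snippet self-contained it generates its own signing key and signs a fake commitment:

import { generateKeyPairSync, sign, verify } from "node:crypto";

interface Commitment {
  H: string;       // base64 point H = kG, the public commitment to the VOPRF key
  expiry: string;  // RFC 3339 timestamp
  sig: string;     // base64 ECDSA signature covering H and expiry
}

// Stand-in for the long-term signing key whose verification half ships in the
// extension configuration. Generated locally here purely to make the sketch run.
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "P-256" });

// Build a fake commitment the way we assume the server would: sign H concatenated
// with expiry. The real byte layout may differ; see the extension source.
function makeCommitment(H: string, expiry: string): Commitment {
  const sig = sign("sha256", Buffer.from(H + expiry, "utf8"), privateKey);
  return { H, expiry, sig: sig.toString("base64") };
}

// Client-side check: refuse expired commitments, then verify the signature before
// trusting H when validating the server's zero-knowledge issuance proof.
function commitmentLooksValid(c: Commitment): boolean {
  if (new Date(c.expiry).getTime() <= Date.now()) return false;
  return verify(
    "sha256",
    Buffer.from(c.H + c.expiry, "utf8"),
    publicKey,
    Buffer.from(c.sig, "base64"),
  );
}

const commitment = makeCommitment(
  "BPivZ+bqrAZzBHZtROY72/E4UGVKAanNoHL1Oteg25oTPRUkrYeVcYGfkOr425NzWOTLRfmB8cgnlUfAeN2Ikmg=",
  "2030-01-11T10:29:10Z",
);
console.log(commitmentLooksValid(commitment));  // true until the expiry passes

The important property is simply that a single, signed, time-limited value of H is what every client checks, so a server cannot quietly hand out different keys to different users without being caught.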
B. Example Redemption API request

The redemption API is available over HTTPS by sending POST requests to https://privacypass.cloudflare.com/api/redeem. Requests to this endpoint must specify Privacy Pass data using JSON-RPC 2.0 syntax in the body of the request. Let's look at an example request:

{
  "jsonrpc": "2.0",
  "method": "redeem",
  "params": {
    "data": [
      "lB2ZEtHOK/2auhOySKoxqiHWXYaFlAIbuoHQnlFz57A=",
      "EoSetsN0eVt6ztbLcqp4Gt634aV73SDPzezpku6ky5w=",
      "eyJjdXJ2ZSI6InAyNTYiLCJoYXNoIjoic2hhMjU2IiwibWV0aG9kIjoic3d1In0="
    ],
    "bindings": [
      "string1",
      "string2"
    ],
    "compressed": "false"
  },
  "id": 1
}

In the above: params.data[0] is the client input data used to generate a token in the issuance phase; params.data[1] is the HMAC tag that the server uses to verify a redemption; and params.data[2] is a stringified, base64-encoded JSON object that specifies the hash-to-curve parameters used by the client. For example, the last element in the array corresponds to the object:

{
  "curve": "p256",
  "hash": "sha256",
  "method": "swu"
}

which specifies that the client has used the curve P-256, with hash function SHA-256, and the SSWU method for hashing to the curve. This allows the server to verify the transaction with the correct ciphersuite. The client must bind the redemption request to some fixed information, which it stores as multiple strings in the array params.bindings. For example, it could send the Host header of the HTTP request, and the HTTP path that was used (this is what is used in the Privacy Pass browser extension). Finally, params.compressed is an optional boolean value (defaulting to false) that indicates whether the HMAC tag was computed over compressed or uncompressed point encodings.

Currently the only supported ciphersuites are the example above, or the same except with method equal to increment for the hash-and-increment method of hashing to a curve. This is the original method used in v1.0 of Privacy Pass, and is supported for backwards-compatibility only. See the provided documentation for more details.

Example response

If a request is sent to the redemption API and it is successfully verified, then the following response will be returned.

{
  "jsonrpc": "2.0",
  "result": "success",
  "id": 1
}

When an error occurs, something similar to the following will be returned.

{
  "jsonrpc": "2.0",
  "error": {
    "message": <error-message>,
    "code": <error-code>
  },
  "id": 1
}

The error codes that we provide are specified as JSON-RPC 2.0 codes; we document the types of errors that we provide in the API documentation.
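For completeness, here is a minimal sketch of sending that request programmatically. The body simply mirrors the documented JSON-RPC example above; the use of fetch and the Content-Type header are our choices, and the token material shown is the placeholder data from the example, not a real redeemable token:

async function redeem(data: [string, string, string], bindings: string[]) {
  const response = await fetch("https://privacypass.cloudflare.com/api/redeem", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      method: "redeem",
      params: { data, bindings, compressed: "false" },
      id: 1,
    }),
  });
  return response.json(); // either { result: "success", ... } or { error: {...}, ... }
}

// Example usage with the kinds of values shown in the request above:
redeem(
  [
    "lB2ZEtHOK/2auhOySKoxqiHWXYaFlAIbuoHQnlFz57A=",   // client input data from issuance
    "EoSetsN0eVt6ztbLcqp4Gt634aV73SDPzezpku6ky5w=",   // HMAC tag for this redemption
    "eyJjdXJ2ZSI6InAyNTYiLCJoYXNoIjoic2hhMjU2IiwibWV0aG9kIjoic3d1In0=", // ciphersuite info
  ],
  ["example.com", "/protected/path"],                  // bindings, e.g. Host and path
).then(console.log);

A successful call returns the "success" result shown above; any other outcome surfaces as a JSON-RPC error object.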

Tales from the Crypt(o team)

CloudFlare Blog -

Halloween season is upon us. This week we’re sharing a series of blog posts about work being done at Cloudflare involving cryptography, one of the spookiest technologies around. So subscribe to this blog and come back every day for tricks, treats, and deep technical content.

A long-term mission

Cryptography is one of the most powerful technological tools we have, and Cloudflare has been at the forefront of using cryptography to help build a better Internet. Of course, we haven’t been alone on this journey. Making meaningful changes to the way the Internet works requires time, effort, experimentation, momentum, and willing partners. Cloudflare has been involved with several multi-year efforts to leverage cryptography to help make the Internet better.

Here are some highlights to expect this week:

- We’re renewing Cloudflare’s commitment to privacy-enhancing technologies by sharing some of the recent work being done on Privacy Pass: Supporting the latest version of the Privacy Pass Protocol
- We’re helping forge a path to a quantum-safe Internet by sharing some of the results of the Post-quantum Cryptography experiment: The TLS Post-Quantum Experiment
- We’re sharing the Rust-based software we use to power time.cloudflare.com: Announcing cfnts: Cloudflare's implementation of NTS in Rust
- We’re doing a deep dive into the technical details of Encrypted DNS: DNS Encryption Explained
- We’re announcing support for a new technique we developed with industry partners to help keep TLS private keys more secure: Delegated Credentials for TLS, and how we're keeping keys safe from memory disclosure attacks: Going Keyless Everywhere

The milestones we’re sharing this week would not be possible without partnerships with companies, universities, and individuals working in good faith to help build a better Internet together. Hopefully, this week provides a fun peek into the future of the Internet.

Public keys are not enough for SSH security

CloudFlare Blog -

If your organization uses SSH public keys, it’s entirely possible you have already mislaid one. There is a file sitting in a backup or on a former employee’s computer which grants the holder access to your infrastructure. If you share SSH keys between employees it’s likely only a few keys are enough to give an attacker access to your entire system. If you don’t share them, it’s likely your team has generated so many keys that you have long since lost track of at least one.

If an attacker can breach a single one of your client devices it’s likely there is a known_hosts file which lists every target which can be trivially reached with the keys the machine already contains. If someone is able to compromise a team member’s laptop, they could use keys on the device that lack password protection to reach sensitive destinations.

Should that happen, how would you respond and revoke the lost SSH key? Do you have an accounting of the keys which have been generated? Do you rotate SSH keys? How do you manage that across an entire organization so consumed with serving customers that security has to be effortless to be adopted?

Cloudflare Access launched support for SSH connections last year to bring zero-trust security to how teams connect to infrastructure. Access integrates with your IdP to bring SSO security to SSH connections by enforcing identity-based rules each time a user attempts to connect to a target resource. However, once Access connected users to the server they still had to rely on legacy SSH keys to authorize their account. Starting today, we’re excited to help teams remove that requirement and replace static SSH keys with short-lived certificates.

Replacing a private network with Cloudflare Access

In traditional network perimeter models, teams secure their infrastructure with two gates: a private network and SSH keys.

The private network requires that any user attempting to connect to a server must be on the same network, or a peered equivalent (such as a VPN). However, that introduces some risk. Private networks default to trust that a user on the network can reach a machine. Administrators must proactively segment the network or secure each piece of the infrastructure with control lists to work backwards from that default.

Cloudflare Access secures infrastructure by starting from the other direction: no user should be trusted by default. Instead, users must prove they should be able to access any unique machine or destination.

We released support for SSH connections in Cloudflare Access last year to help teams leave that network perimeter model and replace it with one that evaluates every request to a server for user identity. Through integration with popular identity providers, that solution also gives teams the ability to bring their SSO pipeline into their SSH flow.

Replacing static SSH keys with short-lived certificates

Once a user is connected to a server over SSH, they typically need to authorize their session. The machine they are attempting to reach will have a set of profiles which consists of user or role identities. Those profiles define what actions the user is able to take. SSH processes make a few options available for the user to log in to a profile. In some cases, users can log in with a username and password combination. However, most teams rely on public-private key certificates to handle that login.
To use that flow, administrators and users need to take prerequisite steps.Prior to the connection, the user will generate a certificate and provide the public key to an administrator, who will then configure the server to trust the certificate and associate it with a certain user and set of permissions. The user stores that certificate on their device and presents it during that last mile. However, this leaves open all of the problems that SSO attempts to solve:Most teams never force users to rotate certificates. If they do, it might be required once a year at most. This leaves static credentials to core infrastructure lingering on hundreds or thousands of devices.Users are responsible for securing their certificates on their devices. Users are also responsible for passwords, but organizations can enforce requirements and revocation centrally.Revocation is difficult. Teams must administer a CRL or OCSP platform to ensure that lost or stolen certificates are not used.With Cloudflare Access, you can bring your SSO accounts to user authentication within your infrastructure. No static keys required.How does it work?To build this we turned to three tools we already had: Cloudflare Access, Argo Tunnel and Workers. Access is a policy engine which combines the employee data in your identity provider (like Okta or AzureAD) with policies you craft. Based on those policies Access is able to limit access to your internal applications to the users you choose. It’s not a far leap to see how the same policy concept could be used to control access to a server over SSH. You write a policy and we use it to decide which of your employees should be able to access which resources. Then we generate a short-lived certificate allowing them to access that resource for only the briefest period of time. If you remove a user from your IdP, their access to your infrastructure is similarly removed, seamlessly.To actually funnel the traffic through our network we use another existing Cloudflare tool: Argo Tunnel. Argo Tunnel flips the traditional model of connecting a server to the Internet. When you spin up our daemon on a machine it makes outbound connections to Cloudflare, and all of your traffic then flows over those connections. This allows the machine to be a part of Cloudflare’s network without you having to expose the machine to the Internet directly.For HTTP use cases, Argo Tunnel only needs to run on the server. In the case of the Access SSH flow, we proxy SSH traffic through Cloudflare by running the Argo Tunnel client, cloudflared, on both the server and the end user’s laptop.When users connect over SSH to a resource secured by Access for Infrastructure, they use the command-line tool cloudflared. cloudflared takes the SSH traffic bound for that hostname and forwards it through Cloudflare based on SSH config settings. No piping or command wrapping required. cloudflared launches a browser window and prompts the user to authenticate with their SSO credentials.Once authenticated, Access checks the user's identity against the policy you have configured for that application. If the user is permitted to reach the resource, Access generates a JSON Web Token (JWT), signed by Cloudflare and scoped to the user and application. Access distributes that token to the user’s device through cloudflared and the tool stores it locally.Like the core Access authentication flow, the token validation is built with a Cloudflare Worker running in every one of our data centers, making it both fast and highly available. 
Workers made it possible for us to deploy this SSH proxying to all 194 of Cloudflare’s data centers, meaning Access for Infrastructure often speeds up SSH sessions rather than slowing them down.

With short-lived certificates enabled, the instance of cloudflared running on the client takes one additional step. cloudflared sends that token to a Cloudflare certificate signing endpoint that creates an ephemeral certificate. The user's SSH flow then sends both the token, which is used to authenticate through Access, and the short-lived certificate, which is used to authenticate to the server.

When the server receives the request, it validates the short-lived certificate against that public key and, if authentic, authorizes the user identity to a matching Unix user. The certificate, once issued, is valid for 2 minutes, but the SSH connection can last longer once the session has started.

What is the end user experience?

Cloudflare Access’ SSH feature is entirely transparent to the end user and does not require any unique SSH commands, wrappers, or flags. Instead, Access requires that your team members take a couple of one-time steps to get started:

1. Install the cloudflared daemon

The same lightweight software that runs on the target server is used to proxy SSH connections from your team members’ devices through Cloudflare. Users can install it with popular package managers like brew or at the link available here. Alternatively, the software is open-source and can be built and distributed by your administrators.

2. Print SSH configuration update and save

Once an end user has installed cloudflared, they need to run one command to generate new lines to add to their SSH config file:

cloudflared access ssh-config --hostname vm.example.com --short-lived-cert

The --hostname field will contain the hostname or wildcard subdomain of the resource protected behind Access. Once run, cloudflared will print the following configuration details:

Host vm.example.com
  ProxyCommand bash -c '/usr/local/bin/cloudflared access ssh-gen --hostname %h; ssh -tt %r@cfpipe-vm.example.com >&2 <&1'

Host cfpipe-vm.example.com
  HostName vm.example.com
  ProxyCommand /usr/local/bin/cloudflared access ssh --hostname %h
  IdentityFile ~/.cloudflared/vm.example.com-cf_key
  CertificateFile ~/.cloudflared/vm.example.com-cf_key-cert.pub

Users need to append that output to their SSH config file. Once saved, they can connect over SSH to the protected resource. Access will prompt them to authenticate with their SSO credentials in the browser, in the same way they log in to any other browser-based tool. If they already have an active browser session with their credentials, they’ll just see a success page. In their terminal, cloudflared will establish the session and issue the client certificate that corresponds to their identity.

What’s next?

With short-lived certificates, Access can become a single SSO-integrated gateway for your team and infrastructure in any environment. Users can SSH directly to a given machine and administrators can replace their jumphosts altogether, removing that overhead. The feature is available today for all Access customers. You can get started by following the documentation available here.
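As a small aside on the JWT mentioned in the flow above, here is a sketch of decoding a token's payload to see the kind of scoping claims a token like this can carry. The claim names and values are purely illustrative (they are not Cloudflare's exact claim set), and decoding is not verification: real consumers must check the signature against the issuer's published signing keys before trusting anything in the payload.

// A JWT is three base64url-encoded segments: header.payload.signature. This sketch
// only decodes the payload to show the kind of scoping a token can carry; it does
// NOT verify the signature, which real code must always do first.

function decodeJwtPayload(token: string): Record<string, unknown> {
  const payloadSegment = token.split(".")[1];
  return JSON.parse(Buffer.from(payloadSegment, "base64url").toString("utf8"));
}

// A fake, unsigned token built inline so the example runs; the claim names and
// values are illustrative only.
const payload = { aud: "vm.example.com", email: "user@example.com", exp: 4102444800 };
const exampleToken = [
  Buffer.from(JSON.stringify({ alg: "none", typ: "JWT" })).toString("base64url"),
  Buffer.from(JSON.stringify(payload)).toString("base64url"),
  "signature-would-go-here",
].join(".");

const claims = decodeJwtPayload(exampleToken) as { aud: string; email: string; exp: number };
if (claims.exp * 1000 < Date.now()) throw new Error("token expired");
console.log(`token for ${claims.email}, scoped to application ${claims.aud}`);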

Cloudflare response to CPDoS exploits

CloudFlare Blog -

Three vulnerabilities were disclosed as Cache Poisoning Denial of Service attacks in a paper written by Hoai Viet Nguyen, Luigi Lo Iacono, and Hannes Federrath of TH Köln - University of Applied Sciences. These attacks are similar to the cache poisoning attacks presented last year at DEFCON. Our blog post in response to those attacks includes a detailed description of what a cache poisoning attack is.

Most customers do not have to take any action to protect themselves from the newly disclosed vulnerabilities. Some configuration changes are recommended if you are a Cloudflare customer running unpatched versions of Microsoft IIS and have request filtering enabled on your origin or have forced caching of HTTP response code 400 through the use of Cloudflare Workers. We have not seen any attempted exploitation of the vulnerabilities described in this paper.

Maintaining the integrity of our content caching infrastructure and ensuring our customers are able to quickly and reliably serve the content they expect to their visitors is of paramount importance to us. In practice, Cloudflare ensures caches serve the content they should in two ways:

- We build our caching infrastructure to behave in ways compliant with industry standards.
- We actively add defenses to our caching logic to protect customers from common caching pitfalls.

We see our job as solving customer problems whenever possible, even if they’re not directly related to using Cloudflare. Examples of this philosophy can be found in how we addressed previously discovered cache attack techniques.

A summary of the three attacks disclosed in the paper and how Cloudflare handles them:

HTTP Header Method Override (HMO)

Impact: Some web frameworks support headers for overriding the HTTP method sent in the HTTP request. Ex: A GET request sent with X-HTTP-Method: POST will be treated by the origin as a POST request (this is not a standard but something many frameworks support). An attacker can use this behavior to potentially trick a CDN into caching poisoned content.

Mitigation: We include the following method override headers as part of customer cache keys for requests which include the headers. This ensures that requests made with the headers present do not poison cache contents for requests without them. Note that Cloudflare does not interpret these headers as an actual method override (ie. the GET request in the above example stays a GET request in our eyes). Headers we consider as part of this cache key modification logic are:

1) X-HTTP-Method-Override
2) X-HTTP-Method
3) X-Method-Override

Oversized HTTP Headers (HHO)

Impact: The attacker sends large headers that a CDN passes through to origin, but are too large for the origin server to handle. If in this case the origin returns an error page that a shared cache deems cacheable it can result in denial of service for subsequent visitors.

Mitigation: Cloudflare does not cache HTTP status code 400 responses by default, which is the common denial of service vector called out by the exploit authors. Some CDN vendors did cache 400 responses, which created the poisoning vector called out by the exploit authors. Cloudflare customers were never vulnerable if their origins emitted 400 errors in response to oversized headers.

The one exception to this is Microsoft IIS in specific circumstances. Versions of Microsoft IIS that have not applied the security update for CVE-2019-0941 will return an HTTP 404 response if limits are configured and exceeded for individual request header sizes using the “headerLimits” configuration directive. Shared caches are permitted to cache these 404 responses. We recommend either upgrading IIS or removing headerLimits configuration directives on your origin.

HTTP Meta Characters

Impact: Essentially the same attack as oversized HTTP headers, except the attack uses meta characters like \r and \n to cause origins to return errors to shared caches.

Mitigation: Same as oversized HTTP headers; Cloudflare does not cache 400 errors by default.

In addition to the behavior laid out above, Cloudflare’s caching logic respects origin Cache-Control headers, which allows customers extremely granular control over how our caches behave. We actively work with customers to ensure that they are following best practices for avoiding cache poisoning attacks and add defense in depth through smarter software whenever possible.

We look forward to continuing to work with the security community on issues like those discovered to make the Internet safer and more secure for everyone.
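To make the HMO and error-caching mitigations above concrete, here is a generic sketch of the two ideas: fold any method-override headers into the cache key, and refuse to store client-error responses. It is an illustration of the technique only, not Cloudflare's actual caching code, and the function names are ours:

// The headers the HMO mitigation above folds into the cache key.
const OVERRIDE_HEADERS = ["x-http-method-override", "x-http-method", "x-method-override"];

// Requests carrying an override header get their own cache entries, so a response
// poisoned via "GET pretending to be POST" cannot be served to plain GET requests.
function cacheKeyFor(url: string, headers: Headers): string {
  const overrides = OVERRIDE_HEADERS
    .map((name) => `${name}=${headers.get(name) ?? ""}`)
    .join("&");
  return `${url}|${overrides}`;
}

// Error responses that an attacker may have provoked (e.g. a 400 triggered by
// oversized or malformed headers) are never stored, so they cannot be replayed
// to legitimate visitors.
function isCacheable(response: Response): boolean {
  return response.status < 400;
}

// Example: a request trying to smuggle a method override gets a distinct key.
const h = new Headers({ "X-HTTP-Method-Override": "POST" });
console.log(cacheKeyFor("https://example.com/home", h));
// -> https://example.com/home|x-http-method-override=POST&x-http-method=&x-method-override=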

Who DDoS'd Austin?

CloudFlare Blog -

It was a scorching Monday on July 22 as temperatures soared above 37°C (99°F) in Austin, TX, the live music capital of the world. Only hours earlier, the last crowds dispersed from the historic East 6th Street entertainment district. A few blocks away, Cloudflarians were starting to make their way to the office. Little did those early arrivers know that they would soon be unknowingly participating in a Cloudflare time-honored tradition of dogfooding new services before releasing them to the wild.

East 6th Street, Austin, Texas (a photo I took on a night out with the team while visiting the Cloudflare Austin office)

Dogfooding is when an organization uses its own products. In this case, we dogfed our newest cloud service, Magic Transit, which both protects and accelerates our customers’ entire network infrastructure—not just their web properties or TCP/UDP applications. With Magic Transit, Cloudflare announces your IP prefixes via BGP, attracts (routes) your traffic to our global network edge, blocks bad packets, and delivers good packets to your data centers via Anycast GRE.

We decided to use Austin’s network because we wanted to test the new service on a live network with real traffic from real people and apps. With the target identified, we began onboarding the Austin office in an always-on routing topology. In an always-on routing mode, Cloudflare data centers constantly advertise Austin’s prefix, resulting in faster, almost immediate mitigation. As opposed to traditional on-demand scrubbing center solutions with limited networks, Cloudflare operates within 100 milliseconds of 99% of the Internet-connected population in the developed world. For our customers, this means that always-on DDoS mitigation doesn’t sacrifice performance due to suboptimal routing. On the contrary, Magic Transit can actually improve your performance due to our network’s reach.

Cloudflare’s Global Network

DDoS’ing Austin

Now that we’ve completed onboarding Austin to Magic Transit, all we needed was a motivated attacker to launch a DDoS attack. Luckily, we found more than a few willing volunteers on our Site Reliability Engineering (SRE) team to execute the attack. While the teams were still assembling in multiple locations around the world, our SRE volunteer started firing packets at our target from an undisclosed location.

Without Magic Transit, the Austin office would’ve been hit directly with the packet flood. Two things could have happened in this case (not mutually exclusive):

- Austin’s on-premise equipment (routers, firewalls, servers, etc.) would have been overwhelmed and failed
- Austin’s service providers would have dropped packets that exceeded its bandwidth allowance

Both cases would result in a very bad day for everyone.

Cloudflare DDoS Mitigation

Instead, when our SRE attacker launched the flood, the packets were automatically routed via BGP to Cloudflare’s network. The packets reached the closest data center via Anycast and encountered multiple defenses in the form of XDP, eBPF and iptables. Those defenses are populated with pre-configured static firewall rules as well as dynamic rules generated by our DDoS mitigation systems. Static rules can vary from straightforward IP blocking and rate-limiting to more sophisticated expressions that match against specific packet attributes. Dynamic rules, on the other hand, are generated automatically in real-time. To play fair with our attacker, we didn’t pre-configure any special rules against the attack. We wanted to give our attacker a fair opportunity to take Austin down.
Although due to our multi-layered protection approach, the odds were never actually in their favor.Source: https://imgflip.comGenerating Dynamic RulesAs part of our multi-layered protection approach, Dynamic Rules are generated on-the-fly by analyzing the packets that route through our network. While the packets are being routed, flow data is asynchronously sampled, collected, and analyzed by two main detection systems. The first is called Gatebot and runs across the entire Cloudflare network; the second is our newly deployed DoSD (denial of service daemon) which operates locally within each data center. DoSD is an exciting improvement that we’ve just recently rolled out and we look forward to writing more about its technical details here soon. DoSD samples at a much faster rate (1/100 packets) versus Gatebot which samples at a lower rate (~1/8000 packets), allowing it to detect even more attacks and block them faster.The asynchronous attack detection lifecycle is represented as the dotted lines in the diagram below. Attacks are detected out of path to assure that we don’t add any latency, and mitigation rules are pushed in line and removed as needed.Multiple packet attributes and correlations are taken into consideration during analysis and detection. Gatebot and DoSD search for both new network anomalies and already known attacks. Once an attack is detected, rules are automatically generated, propagated, and applied in the optimal location within 10 seconds or less. Just to give you an idea of the scale, we’re talking about hundreds of thousands of dynamic rules that are applied and removed every second across the entire Cloudflare network. One of the beauties of Gatebot and DoSD is that they don’t require a traffic learning period. Once a customer is onboarded, they’re protected immediately. They don’t need to sample traffic for weeks before kicking in. While we can always apply specific firewall rules if requested by the customer, no manual configuration is required by the customer or our teams. It just works.What this mitigation process looks like in practiceLet’s look at what happened in Austin when one of our SREs tried to DDoS Austin and failed. During one of the first attempts, before DoSD had rolled out globally, a degradation in audio and video quality was noticed for Austin employees on video calls for a few seconds before Gatebot kicked in. However, as soon as Gatebot kicked in, the quality was immediately restored. If we hadn’t had Magic Transit in-line, the degradation of service would’ve worsened until the point of full denial of service. Austin would have been offline and our Austin colleagues wouldn’t have had a very productive day.On a subsequent attack attempt which took place after DoSD was deployed, our SRE launched a SYN flood on Austin. The attack targeted multiple IP addresses in Austin’s prefix and peaked just above 250,000 packets per second. DoSD detected the attack and blocked it in approximately 3 seconds. DoSD’s quick response resulted in no degradation of service for the Austin team. Attack SnapshotGreen line = Attack traffic to Cloudflare edge, Yellow line = clean traffic from Cloudflare to origin over GREWhat We LearnedDogfooding Magic Transit served as a valuable experiment for us with lots of lessons learned both from the engineering and procedural aspects. From the engineering aspect, we fine-tuned our mitigations and optimized routings. 
From the procedural aspects, we drilled members of multiple teams including the Security Operations Center and Solution Engineering teams to help refine our run-books. By doing so, we reduced the onboarding duration to hours instead of days in order to assure a quick and smooth onboarding experience for our customers.Want To Learn More?Request a demo and learn how you can protect and accelerate your network with Cloudflare.
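For readers curious about the mechanics sketched in this post, here is a toy illustration of threshold detection over sampled flows, in the spirit of the DoSD example above: estimate the true packet rate from the sampling rate, then emit a rule when a destination crosses a threshold. The numbers, names, and rule format are invented for illustration and bear no relation to the real Gatebot or DoSD internals:

interface FlowSample {
  destIp: string;
  protocol: "tcp" | "udp";
  tcpFlags?: string;  // e.g. "SYN"
}

const SAMPLE_RATE = 100;        // DoSD-style sampling: roughly 1 out of every 100 packets
const THRESHOLD_PPS = 100_000;  // arbitrary per-signature packets-per-second threshold

// Estimate the true packet rate from the samples seen in a one-second window and
// return a (purely illustrative) drop rule for every signature over the threshold.
function rulesForWindow(samples: FlowSample[]): string[] {
  const counts = new Map<string, number>();
  for (const s of samples) {
    const key = `${s.destIp}|${s.protocol}|${s.tcpFlags ?? ""}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  const rules: string[] = [];
  for (const [key, sampled] of counts) {
    const estimatedPps = sampled * SAMPLE_RATE;
    if (estimatedPps > THRESHOLD_PPS) {
      rules.push(`drop ${key} (estimated ${estimatedPps} pps)`);
    }
  }
  return rules;
}

// 2,600 sampled SYN packets in one second ~ 260,000 pps, about the size of the
// SYN flood described above.
const samples = Array.from({ length: 2600 }, (): FlowSample => ({
  destIp: "203.0.113.7",
  protocol: "tcp",
  tcpFlags: "SYN",
}));
console.log(rulesForWindow(samples));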

Experiment with HTTP/3 using NGINX and quiche

CloudFlare Blog -

Just a few weeks ago we announced the availability on our edge network of HTTP/3, the new revision of HTTP intended to improve security and performance on the Internet. Everyone can now enable HTTP/3 on their Cloudflare zone and experiment with it using Chrome Canary as well as curl, among other clients.We have previously made available an example HTTP/3 server as part of the quiche project to allow people to experiment with the protocol, but it’s quite limited in the functionality that it offers, and was never intended to replace other general-purpose web servers.We are now happy to announce that our implementation of HTTP/3 and QUIC can be integrated into your own installation of NGINX as well. This is made available as a patch to NGINX, that can be applied and built directly with the upstream NGINX codebase.It’s important to note that this is not officially supported or endorsed by the NGINX project, it is just something that we, Cloudflare, want to make available to the wider community to help push adoption of QUIC and HTTP/3.BuildingThe first step is to download and unpack the NGINX source code. Note that the HTTP/3 and QUIC patch only works with the 1.16.x release branch (the latest stable release being 1.16.1). % curl -O https://nginx.org/download/nginx-1.16.1.tar.gz % tar xvzf nginx-1.16.1.tar.gz As well as quiche, the underlying implementation of HTTP/3 and QUIC: % git clone --recursive https://github.com/cloudflare/quicheNext you’ll need to apply the patch to NGINX: % cd nginx-1.16.1 % patch -p01 < ../quiche/extras/nginx/nginx-1.16.patch And finally build NGINX with HTTP/3 support enabled: % ./configure \ --prefix=$PWD \ --with-http_ssl_module \ --with-http_v2_module \ --with-http_v3_module \ --with-openssl=../quiche/deps/boringssl \ --with-quiche=../quiche % makeThe above command instructs the NGINX build system to enable the HTTP/3 support ( --with-http_v3_module) by using the quiche library found in the path it was previously downloaded into ( --with-quiche=../quiche), as well as TLS and HTTP/2. Additional build options can be added as needed.You can check out the full instructions here.RunningOnce built, NGINX can be configured to accept incoming HTTP/3 connections by adding the quic and reuseport options to the listen configuration directive.Here is a minimal configuration example that you can start from:events { worker_connections 1024; } http { server { # Enable QUIC and HTTP/3. listen 443 quic reuseport; # Enable HTTP/2 (optional). listen 443 ssl http2; ssl_certificate cert.crt; ssl_certificate_key cert.key; # Enable all TLS versions (TLSv1.3 is required for QUIC). ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # Add Alt-Svc header to negotiate HTTP/3. add_header alt-svc 'h3-23=":443"; ma=86400'; } } This will enable both HTTP/2 and HTTP/3 on the TCP/443 and UDP/443 ports respectively.You can then use one of the available HTTP/3 clients (such as Chrome Canary, curl or even the example HTTP/3 client provided as part of quiche) to connect to your NGINX instance using HTTP/3.We are excited to make this available for everyone to experiment and play with HTTP/3, but it’s important to note that the implementation is still experimental and it’s likely to have bugs as well as limitations in functionality. Feel free to submit a ticket to the quiche project if you run into problems or find any bug.
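One quick way to sanity-check the setup above is to confirm that the server is actually advertising HTTP/3 in its Alt-Svc response header. The small sketch below does that over a regular HTTPS request (it does not itself speak QUIC), and the URL is a placeholder for your own zone:

// Check whether a server advertises HTTP/3 via the Alt-Svc header, as configured above.
async function advertisesHttp3(url: string): Promise<boolean> {
  const response = await fetch(url, { method: "HEAD" });
  const altSvc = response.headers.get("alt-svc") ?? "";
  console.log(`alt-svc: ${altSvc || "(none)"}`);
  // Matches the draft token used above ("h3-23") as well as other h3 tokens.
  return /\bh3(-\d+)?=/.test(altSvc);
}

advertisesHttp3("https://example.com/").then((ok) =>
  console.log(ok ? "HTTP/3 is being advertised" : "no HTTP/3 advertisement found"),
);

For an end-to-end test you still need an HTTP/3-capable client such as Chrome Canary, curl, or the example client provided as part of quiche, as described above.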

What's it like to come out as LGBTQIA+ at work?

CloudFlare Blog -

Today is the 31st Anniversary of National Coming Out Day. I wanted to highlight the importance of this day, share coming out resources, and publish some stories of what it's like to come out in the workplace.About National Coming Out DayThirty-one years ago, on the anniversary of the National March on Washington for Lesbian and Gay Rights, we first observed National Coming Out Day as a reminder that one of our most basic tools is the power of coming out. One out of every two Americans has someone close to them who is gay or lesbian. For transgender people, that number is only one in 10.Coming out - whether it is as lesbian, gay, bisexual, transgender or queer - STILL MATTERS. When people know someone who is LGBTQ, they are far more likely to support equality under the law. Beyond that, our stories can be powerful to each other.Each year on October 11th, National Coming Out Day continues to promote a safe world for LGBTQ individuals to live truthfully and openly. Every person who speaks up changes more hearts and minds, and creates new advocates for equality.For more on coming out, visit HRC's Coming Out Center.Source: https://www.hrc.org/resources/national-coming-out-dayComing out stories from Proudflare Last National Coming Out Day, I shared some stories from Proudflare members in this blog post. This year, I wanted to shift our focus to the experience and challenges of coming out in the workplace. I wanted to share what it was like for some of us to come out at Cloudflare, at our first companies, and point out some of the stresses, challenges, and risks involved. Check out these five examples below and share your own in the comments section and/or to the people around you if you'd like! “Coming out twice” from Lily - Cloudflare Austin While my first experience of coming out professionally was at my previous company, I thought I’d share some of the differences between my experiences at Cloudflare and this other company. Reflecting retrospectively, coming out was so immensely liberating. I've never been happier, but at the time I was a mess. LGBTQIA+ people still have little to no legal protection, and having been initially largely rejected by my parents and several of my friends after coming out to them, I felt like I was at sea, floating without a raft. This feeling of unease was compounded by my particular coming out being a two part series: I wasn’t only coming out as transgender, but now also as a lesbian. Eventually, after the physical changes became too noticeable to ignore (around 7 months ago), I worked up the courage to come out at work. The company I was working for was awful in many ways; bad culture, horrible project manager, and rampant nepotism. Despite this, I was pleasantly surprised that what I told them was almost immediately accepted. Surely this was finally a win for me? However, that initial optimism didn’t last. As time went on, it became clear that saying you accept it and actually internalizing it are completely different. I started being questioned about needed medical appointments, and I wasn’t really being treated any different than before. I still have no idea if it played into the reason they fired me for “performance” despite never bringing it up before. As I started applying for new jobs, one thing was always on my mind: will this job be different? Thankfully the answer was yes; my experience at Cloudflare has been completely different. Through the entire hiring process, I never once had to out myself. 
Finally when I had to come out to use my legal name on the offer letter, Cloudflare handled it with such grace. One such example was that they went so far as to put my preferred name in quotes next to my legal one on the document. These little nuggets of kindness are visible all over the company - you can tell people are accepting and genuinely care. However, the biggest difference was that Cloudflare supports and celebrates the LGBTQIA+ community but doesn’t emphasize it. If you don’t want it to be part of your identity it doesn’t have to be. Looking to the future I hope I can just be a woman that loves women, not a trans-woman that loves women, and I think Cloudflare will be supportive of that.A story from Mark - Cloudflare LondonMy coming out story? It involves an awful lot of tears in a hotel room in Peru, about three and a half thousand miles away from anyone I knew.That probably sounds more dramatic than the reality. I’d been visiting some friends in Minnesota and I was due to head to Peru to hike the Machu Picchu trail, but a missed flight connection saw me stranded in Atlanta overnight.A couple of months earlier, I’d kind of came out to myself. This was less a case of admitting my sexuality, but more finally learning exactly what it is. I’d only just turned 40 and, months later, I was still trying to come to terms with what it all meant; reappraising your sexuality in your 40s is not a journey for the faint of heart! I hadn’t shared it with anyone yet, but while sitting in a thuddingly dull hotel room in Atlanta, it just felt like time. So I penned my coming out letter. The next day I boarded a plane, posted my letter to Facebook, turned off my phone, and then experienced what was, without question, The. Longest. Flight. Of. My. Life. This was followed, perhaps unsurprisingly, by the longest taxi ride of my life.Eventually, after an eternity or two had passed, I reached my hotel room, connected to the hotel wifi and read through the messages that had accumulated over the past 8 hours or so. Messages from my friends, and family, and even my Mum. The love and support I got from all of them just about broke me. I practically dissolved in a puddle of tears as I read through everything. Decades of pent up confusion and pain washed away in those tears.I’ll never forget the sense of acceptance I felt after all that.As for coming out at work, well, let’s see how it goes: Hi, I’m Mark, and I’m asexual.A story from Jacob - Cloudflare San FranciscoI started my career working in consulting in a conservative environment where I was afraid that coming out would cause me to be taken less seriously by my male coworkers. I remember casually mentioning my partner at the time to a couple of close coworkers to gauge their response. They surprised me and turned out to be very accepting and insisted that I bring him to our Holiday Party later that year. That event was the first time I came out to my entire office and I remember feeling very nervous before stepping into the room. My anxiety was soon quelled with a warm welcome from my office leadership and from then on I didn’t feel like I was dancing around the elephant in the room. After this experience being out at work is not something I think greatly about, I have been very fortunate to work in accepting environments including at Cloudflare!A story from Malavika - Cloudflare LondonNearly a decade has passed since I first came out in a professional setting, when I first started working at a global investment bank in Manhattan. 
The financial services industry was, and continues to be, known for its machismo, and at the time, gay marriage was still illegal in the United States. Despite being out in my personal life, the thought of being out at work terrified me. I already felt so profoundly different from my coworkers as a woman and a person of colour, and thus I feared that my LGBTQIA+ identity would further reduce my chances of career advancement. I had no professional role models to signal that is was okay to be LGBTQIA+ in my career.Soon after starting this job, a close friend and university classmate invited me to a dinner for LGBTQIA+ young professionals in financial services and management consulting. I had never attended an event targeted at LGBTQIA+ professionals, let alone met an out LGBTQIA+ individual working outside of the arts, academia or nonprofit sectors. Looking around the dining room, I felt as though I had spotted a unicorn: a handful of out senior leaders at top investment banks and consulting firms sat among nearly 40 ambitious young professionals, sharing their coming out stories and providing invaluable career advice. Before this event, I would have never believed that there were so many people “like me” within the industry, and most certainly not in executive positions. For the first time, I felt a strong sense of belonging, as I finally had LGBTQIA+ role models to look up to professionally, and I no longer felt afraid of being open about my sexuality professionally.After this event, I felt inspired and energised. Over the subsequent weeks, my authentic self began to show. My confidence and enthusiasm at work dramatically increased. I was able to build trust with my colleagues more easily, and my managers lauded me for my ability to incorporate constructive feedback quickly. As I reflect on my career trajectory, I have not succeeded in spite of my sexuality, but rather, because of being out as a bisexual woman. Over the course of my career, I have developed strong professional relationships with senior LGBTQIA+ mentors, held leadership positions in a variety of diversity networks and organisations, and attended a number of inspiring conferences and events. Without the anxiety of having to hide an important part of my identity, I am able to be the confident, intelligent woman I truly am. And that is precisely why I am actively involved in Proudflare, Cloudflare’s employee resource group for LGBTQIA+ individuals. I strongly believe that by creating an inclusive workplace - for anyone who feels different or out of place - all employees will have the support and confidence to shine in their professional and personal lives.A story from Chase - Cloudflare San FranciscoI really discovered my sexuality in college. Growing up, there weren’t many queer people in my life. I always had a loving family that would presumably accept me for who I was, but the lack of any queer role models in my life made me think that I was straight for quite some time. I just didn’t know what being gay was. I always had a best friend - someone that I would end up spending all my time with. This friend wouldn’t always be the same person, but inevitably I would latch on one person and focus most of my emotional energy on our friendship. In college this friend was Daniel. We met while pledging a business fraternity our freshman year and quickly became close friends. Daniel made me feel different. 
I thought about him when I wasn't with him, I wanted to be with him all the time, and most of all I would get jealous when he would date women. He saw right through me and eventually got me to open up about being gay. Our long emotional text conversation ended with me asking if he had anything he wanted to share with me (fingers crossed). His answer - “I don’t know why everyone assumes I’m gay, I’m not.” Heart = Broken. Fast forward 6 months and we decide to live together our Junior year. I slowly started becoming more comfortable with my sexuality and began coming out. I started with my close friends, then my brother, then slightly less close friends, but kept getting hung up on my parents. Luckily, Daniel made that easier. That text from Daniel about not being gay ended up being not as set in stone as I thought. We started secretly dating for almost a year and I was the happiest I have ever been. The thrills of a secret relationship can only last so long and eventually we knew we needed to tell the world. We came out to our parents together, as a couple. We were there for each other for the good conversations, the tough conversations, the “Facebook Official” post, and coming out at our first corporate jobs (A never ending cycle). We were so fortunate to both work at warm, welcoming companies when we came out and continue to work at such companies today. Coming out wasn’t easy but knowing I didn’t have to do it alone made it a whole heck of a lot easier. Happy four-year anniversary, Dan. Resources for living openlyTo find resources about living openly, visit the Human Rights Campaign’s Coming Out Center. I hope you'll be true to yourselves and always be loud and proud.About ProudflareTo read more about Proudflare and why Cloudflare cares about inclusion in the workplace, read Proudflare’s first pride blog post.

Good Morning, Jakarta!

CloudFlare Blog -

Beneath the veneer of glass and concrete, this is a city of surprises and many faces. On 3rd October 2019, we brought together a group of leaders from across a number of industries to connect in Central Jakarta, Indonesia. The habit of sharing stories at the lunch table, exchanging ideas, and listening to ideas from the different viewpoints of people from all tiers, paying first-hand attention to all input from customers, and listening to the dreams of some of life’s warriors may sound simple, but it is a source of inspiration and encouragement in helping the cyberspace community in this region. And our new data center in Jakarta extends our Asia Pacific network to 64 cities, and our global network to 194 cities.

Selamat Pagi

Right on time, Kate Fleming extended a warm welcome to all our Indonesian guests. "We were especially appreciative of the investment of your time that you made coming to join us."

Kate is the Head of Customer Success for APAC. Australian-born, Kate spent the past 5 years living in Malaysia and Singapore. She leads a team of Customer Success Managers in Singapore. The Customer Success team is dispersed across multiple offices and time zones. We are the advocates for Cloudflare Enterprise customers. We help with your on-boarding journey and various post-sales activities, from project and resource management planning to training, configuration recommendations, sharing best practices, point of escalation, and more.

"Today, the Indonesian Cloudflare team would like to share with you some insights and best practices around how Cloudflare is not only a critical part of any organization’s cyber security planning, but is working towards building a better internet in the process.” - Kate

Learning Modern Trends of Cyber Attacks

Ayush Verma, our Solutions Engineer for ASEAN and India, was there to unveil the latest cyber security trends. He shared insights on how to stay ahead of the game in a fast-changing online environment. Get answers to questions like: How can I secure my site without sacrificing performance? What are the latest trends in malicious attacks — and how should I prepare?

Superheroes Behind The Scenes

We were very honored to have two industry leaders speak to us: Jullian Gafar, the CTO of PT Viva Media Baru, an online media company based out of Jakarta, Indonesia; and Firman Gautama, the VP of Infrastructure & Security of PT. Global Tiket Network, which offers hotel, flight, car rental, train, world-class event/concert and attraction tickets. It was a golden opportunity to hear from the leaders themselves about what’s keeping them busy lately, their own approaches to cyber security, best practices, and easy-to-implement and cost-efficient strategies.

Fireside Chat Highlights

Shoutout from Pak Firman, who was very pleased with the support he received from Kartika. He said, "Most sales people are hard to reach after completing a sale. Kartika always goes the extra mile; she stays engaged with me. The Customer Experience is just exceptional.”

Our Mission Continues

Thank you for giving us your time to connect. It brings us back to our roots and core mission of helping to build a better internet. Based on the principle “The Result Never Betrays the Effort”, we believe that what we are striving for today, by creating various innovations in our services and strategies to improve your business, will in time produce the best results.
For this reason, we offer our endless thanks for your support and loyalty in continuing to push forward with us. Always at your service!

Cloudflare Event Crew in Indonesia #CloudflareJKT
Chris Chua (Organiser) | Kate Fleming | Bentara Frans | Ayush Verma | Welly Tandiono | Kartika Mulyo | Riyan Baharudin

Terraforming Cloudflare: in quest of the optimal setup

CloudFlare Blog -

This is a guest post by Dimitris Koutsourelis and Alexis Dimitriadis, working for the Security Team at Workable, a company that makes software to help companies find and hire great people.

This post is about our introductory journey to the infrastructure-as-code practice; managing Cloudflare configuration in a declarative and version-controlled way. We’d like to share the experience we’ve gained during this process: our pain points, the limitations we faced, the different approaches we took, and parts of our solution and experimentations.

Terraform world

Terraform is a great tool that fulfills our requirements, and fortunately, Cloudflare maintains its own provider that allows us to manage its service configuration hassle-free. On top of that, Terragrunt is a thin wrapper that provides extra commands and functionality for keeping Terraform configurations DRY and managing remote state. The combination of both leads to a more modular and re-usable structure for Cloudflare resources (configuration), by utilizing terraform and terragrunt modules. We’ve chosen to use the latest version of both tools (Terraform-v0.12 & Terragrunt-v0.19 respectively) and constantly upgrade to take advantage of the valuable new features and functionality, which at this point in time remove important limitations.

Workable context

Our set up includes multiple domains that are grouped in two distinct Cloudflare organisations: production & staging. Our environments have their own purposes and technical requirements (i.e.: QA, development, sandbox and production), which translates to slightly different sets of Cloudflare zone configuration.

Our approach

Our main goal was to have a modular set up with the ability to manage any configuration for any zone, while keeping code repetition to a minimum. This is more complex than it sounds; we have repeatedly changed our Terraform folder structure - and other technical aspects - during the development period. The following sections illustrate a set of alternatives through our path, along with pros & cons.

Structure

Terraform configuration is based on the project’s directory structure, so this is the place to start. Instead of retaining the Cloudflare organisation structure (production & staging as root level directories containing the zones that belong in each organization), our decision was to group zones that share common configuration under the same directory. This helps keep the code DRY and the set up consistent and readable. On the down side, this structure adds an extra layer of complexity, as two different sets of credentials need to be handled conditionally and two state files (at the environments/ root level) must be managed and isolated using workspaces. On top of that, we used Terraform modules to keep sets of common configuration across zone groups in a single place.

Terraform modules repository

modules/
│ ├── firewall/
│ ├── main.tf
│ ├── variables.tf
│ ├── zone_settings/
│ ├── main.tf
│ ├── variables.tf
│ └── [...]
└──

Terragrunt modules repository

environments/
│ ├── [...]
│ ├── dev/
│ ├── qa/
│ ├── demo/
│ ├── zone-8/ (production)
│ └── terragrunt.hcl
│ ├── zone-9/ (staging)
│ └── terragrunt.hcl
│ ├── config.tfvars
│ ├── main.tf
│ └── variables.tf
│ ├── config.tfvars
│ ├── secrets.tfvars
│ ├── main.tf
│ ├── variables.tf
│ └── terragrunt.hcl
└──

The Terragrunt modules tree gives flexibility, since we are able to apply configuration on a zone, zone group, or organisation level (which is in line with Cloudflare configuration capabilities - i.e.: custom error pages can also be configured on the organisation level).

Resource types

We decided to implement Terraform resources in different ways, to cover our requirements more efficiently.

1. Static resource

The first thought that came to mind was having one, or multiple, .tf files implementing all the resources with hardcoded values assigned to each attribute. It’s simple and straightforward, but can have a high maintenance cost if it leads to code copy/paste between environments. So, common settings seem to be a good use case; we chose to implement the access_rules Terraform resources accordingly:

modules/access_rules/main.tf

resource "cloudflare_access_rule" "no_17" {
  notes = "this is a description"
  mode = "blacklist"
  configuration = {
    target = "ip"
    value = "x.x.x.x"
  }
}
[...]

2. Parametrized resources

Our next step was to add variables to gain flexibility. This is useful when a few attributes of a shared resource configuration differ between multiple zones. Most of the configuration remains the same (as described above) and the variable instantiation is added in the Terraform module, while their values are fed through the Terragrunt module, as input variables or entries inside .tfvars files. The zone_settings_override resource was implemented accordingly:

modules/zone_settings/main.tf

resource "cloudflare_zone_settings_override" "zone_settings" {
  zone_id = var.zone_id
  settings {
    always_online = "on"
    always_use_https = "on"
    [...]
    browser_check = var.browser_check
    mobile_redirect {
      mobile_subdomain = var.mobile_redirect_subdomain
      status = var.mobile_redirect_status
      strip_uri = var.mobile_redirect_uri
    }
    [...]
    waf = "on"
    webp = "off"
    websockets = "on"
  }
}

environments/qa/main.tf

module "zone_settings" {
  source = "git@github.com:foo/modules/zone_settings"
  zone_name = var.zone_name
  browser_check = var.zone_settings_browser_check
  [...]
}

environments/qa/config.tfvars

#zone settings
zone_settings_browser_check = "off"
[...]

3. Dynamic resource

At that point, we thought that a more interesting approach would be to create generic resource templates to manage all instances of a given resource in one place. A template is implemented as a Terraform module and creates each resource dynamically, based on its input: data fed through the Terragrunt modules (/environments in our case), or entries in the .tfvars files. We chose to implement the account_member resource this way.

modules/account_members/variables.tf

variable "users" {
  description = "map of users - roles"
  type = map(list(string))
}

variable "member_roles" {
  description = "account role ids"
  type = map(string)
}

modules/account_members/main.tf

resource "cloudflare_account_member" "account_member" {
  for_each = var.users
  email_address = each.key
  role_ids = [for role in each.value : lookup(var.member_roles, role)]
  lifecycle {
    prevent_destroy = true
  }
}

We feed the template with a list of users (list of maps). Each member is assigned a number of roles.
To make the code more readable, we mapped users to role names instead of role ids:

environments/config.tfvars

member_roles = {
  admin = "000013091sds0193jdskd01d1dsdjhsd1"
  admin_ro = "0000ds81hd131bdsjd813hh173hds8adh"
  analytics = "0000hdsa8137djahd81y37318hshdsjhd"
  [...]
  super_admin = "00001534sd1a2123781j5gj18gj511321"
}

users = {
  "user1@workable.com" = ["super_admin"]
  "user2@workable.com" = ["analytics", "audit_logs", "cache_purge", "cf_workers"]
  "user3@workable.com" = ["cf_stream"]
  [...]
  "robot1@workable.com" = ["cf_stream"]
}

Another interesting case we dealt with was the rate_limit resource; the variable declaration (a list of objects) and implementation go as follows:

modules/rate_limit/variables.tf

variable "rate_limits" {
  description = "list of rate limits"
  default = []
  type = list(object({
    disabled = bool,
    threshold = number,
    description = string,
    period = number,
    match = object({
      request = object({
        url_pattern = map(string),
        schemes = list(string),
        methods = list(string)
      }),
      response = object({
        statuses = list(number),
        origin_traffic = bool
      })
    }),
    action = object({
      mode = string,
      timeout = number
    })
  }))
}

modules/rate_limit/main.tf

locals {
  […]
}

data "cloudflare_zones" "zone" {
  filter {
    name = var.zone_name
    status = "active"
    paused = false
  }
}

resource "cloudflare_rate_limit" "rate_limit" {
  count = length(var.rate_limits)
  zone_id = lookup(data.cloudflare_zones.zone.zones[0], "id")
  disabled = var.rate_limits[count.index].disabled
  threshold = var.rate_limits[count.index].threshold
  description = var.rate_limits[count.index].description
  period = var.rate_limits[count.index].period
  match {
    request {
      url_pattern = local.url_patterns[count.index]
      schemes = var.rate_limits[count.index].match.request.schemes
      methods = var.rate_limits[count.index].match.request.methods
    }
    response {
      statuses = var.rate_limits[count.index].match.response.statuses
      origin_traffic = var.rate_limits[count.index].match.response.origin_traffic
    }
  }
  action {
    mode = var.rate_limits[count.index].action.mode
    timeout = var.rate_limits[count.index].action.timeout
  }
}

environments/qa/rate_limit.tfvars

{ #1
  disabled = false
  threshold = 50
  description = "sample description"
  period = 60
  match = {
    request = {
      url_pattern = {
        "subdomain" = "foo"
        "path" = "/api/v1/bar"
      }
      schemes = [
        "_ALL_",
      ]
      methods = [
        "GET",
        "POST",
      ]
    }
    response = {
      statuses = []
      origin_traffic = true
    }
  }
  action = {
    mode = "simulate"
    timeout = 3600
  }
},
[...]
}
]

The biggest advantage of this approach is that all common rate_limit rules are in one place and each environment can include its own rules in its .tfvars. The combination of the two, using Terraform's built-in concat() function, achieves a two-layer join of the two lists (common | unique rules). So we wanted to give it a try:

locals {
  rate_limits = concat(var.common_rate_limits, var.unique_rate_limits)
}

There is, however, a drawback: .tfvars files can only contain static values. Since all url attributes - which include the zone name itself - have to be set explicitly in the data of each environment, every time a url needs to change the value has to be copied across all environments, with the zone name changed to match each environment. The solution we came up with, in order to make the zone name dynamic, was to split the url attribute into 3 parts: subdomain, domain and path. This is effective for the .tfvars, but the added complexity of handling the new variables is non-negligible.
The corresponding code illustrates the issue:

modules/rate_limit/main.tf

locals {
  rate_limits = concat(var.common_rate_limits, var.unique_rate_limits)
  url_patterns = [for rate_limit in local.rate_limits: "${lookup(rate_limit.match.request.url_pattern, "subdomain", null) != null ? "${lookup(rate_limit.match.request.url_pattern, "subdomain")}." : ""}"${lookup(rate_limit.match.request.url_pattern, "domain", null) != null ? "${lookup(rate_limit.match.request.url_pattern, "domain")}" : ${var.zone_name}}${lookup(rate_limit.match.request.url_pattern, "path", null) != null ? lookup(rate_limit.match.request.url_pattern, "path") : ""}"]
}

Readability vs functionality: although flexibility is increased and code duplication is reduced, the url transformations have an impact on the code’s readability and ease of debugging (it took us several minutes to spot a typo). You can imagine this is even worse if you attempt to implement a more complex resource (such as page_rule, which is a list of maps with four url attributes). The underlying issue here is that at the point we were implementing our resources, we had to choose maps over objects due to their capability to omit attributes, using the lookup() function (by setting default values). This is a requirement for certain resources such as page_rules: only certain attributes need to be defined (and others ignored). In the end, the context will determine if more complex resources can be implemented with dynamic resources.

4. Sequential resources

The Cloudflare page rule resource has a specific peculiarity that differentiates it from other types of resources: the priority attribute. When a page rule is applied, it gets a unique id and a priority number corresponding to the order in which it was submitted. Although the Cloudflare API and Terraform provider give the ability to explicitly specify the priority, there is a catch. Terraform doesn’t respect the order of resources inside a .tf file (even in a for_each loop!); each resource is randomly picked up and then applied to the provider. So, if page_rule priority is important - as in our case - the submission order counts. The solution is to lock the sequence in which the resources are created through the depends_on meta-attribute:

resource "cloudflare_page_rule" "no_3" {
  depends_on = [cloudflare_page_rule.no_2]
  zone_id = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target = "www.${var.zone_name}/foo"
  status = "active"
  priority = 3
  actions {
    forwarding_url {
      status_code = 301
      url = "https://www.${var.zone_name}"
    }
  }
}

resource "cloudflare_page_rule" "no_2" {
  depends_on = [cloudflare_page_rule.no_1]
  zone_id = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target = "www.${var.zone_name}/lala*"
  status = "active"
  priority = 24
  actions {
    ssl = "flexible"
    cache_level = "simplified"
    resolve_override = "bar.${var.zone_name}"
    host_header_override = "new.domain.com"
  }
}

resource "cloudflare_page_rule" "page_rule_1" {
  zone_id = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target = "*.${var.zone_name}/foo/*"
  status = "active"
  priority = 1
  actions {
    forwarding_url {
      status_code = 301
      url = "https://foo.${var.zone_name}/$1/$2"
    }
  }
}

So we had to go with a more static resource configuration, because the depends_on attribute only takes static values (not values calculated dynamically at runtime).

Conclusion

After changing our minds several times along the way on Terraform structure and other technical details, we believe that there isn’t a single best solution.
It all comes down to the requirements and keeping a balance between complexity and simplicity. In our case, a mixed approach is a good middle ground. Terraform is evolving quickly, but at this point it lacks some common coding capabilities, so over-engineering can be a trap (one we fell into too many times). Keep it simple and as DRY as possible. :)

Talk Transcript: How Cloudflare Thinks About Security

CloudFlare Blog -

Image courtesy of UnbabelThis is the text I used for a talk at artificial intelligence powered translation platform, Unbabel, in Lisbon on September 25, 2019.Bom dia. Eu sou John Graham-Cumming o CTO do Cloudflare. E agora eu vou falar em inglês.Thanks for inviting me to talk about Cloudflare and how we think about security. I’m about to move to Portugal permanently so I hope I’ll be able to do this talk in Portuguese in a few months.I know that most of you don’t have English as a first language so I’m going to speak a little more deliberately than usual. And I’ll make the text of this talk available for you to read.But there are no slides today.I’m going to talk about how Cloudflare thinks about internal security, how we protect ourselves and how we secure our day to day work. This isn’t a talk about Cloudflare’s products.CultureLet’s begin with culture.Many companies have culture statements. I think almost 100% of these are pure nonsense. Culture is how you act every day, not words written in the wall.One significant piece of company culture is the internal Security Incident mailing list which anyone in the company can send a message to. And they do! So far this month there have been 55 separate emails to that list reporting a security problem.These mails come from all over the company, from every department. Two to three per day. And each mail is investigated by the internal security team. Each mail is assigned a Security Incident issue in our internal Atlassian Jira instance.People send: reports that their laptop or phone has been stolen (their credentials get immediately invalidated), suspicions about a weird email that they’ve received (it might be phishing or malware in an attachment), a concern about physical security (for example, someone wanders into the office and starts asking odd questions), that they clicked on a bad link, that they lost their access card, and, occasionally, a security concern about our product.Things like stolen or lost laptops and phones happen way more often than you’d imagine. We seem to lose about two per month. For that reason and many others we use full disk encryption on devices, complex passwords and two factor auth on every service employees need to access. And we discourage anyone storing anything on my laptop and ask them to primarily use cloud apps for work. Plus we centrally manage machines and can remote wipe.We have a 100% blame free culture. You clicked on a weird link? We’ll help you. Lost your phone? We’ll help you. Think you might have been phished? We’ll help you.This has led to a culture of reporting problems, however minor, when they occur. It’s our first line of internal defense.Just this month I clicked on a link that sent my web browser crazy hopping through redirects until I ended up at a bad place. I reported that to the mailing list.I’ve never worked anywhere with such a strong culture of reporting security problems big and small.HackersWe also use HackerOne to let people report security problems from the outside. This month we’ve received 14 reports of security problems. To be honest, most of what we receive through HackerOne is very low priority. People run automated scanning tools and report the smallest of configuration problems, or, quite often, things that they don’t understand but that look like security problems to them. 
But we triage and handle them all.And people do on occasion report things that we need to fix.We also have a private paid bug bounty program where we work with a group of individual hackers (around 150 right now) who get paid for the vulnerabilities that they’ve found.We’ve found that this combination of a public responsible disclosure program and then a private paid program is working well. We invite the best hackers who come in through the public program to work with us closely in the private program.IdentitySo, that’s all about people, internal and external, reporting problems, vulnerabilities, or attacks. A very short step from that is knowing who the people are.And that’s where identity and authentication become critical. In fact, as an industry trend identity management and authentication are one of the biggest areas of spending by CSOs and CISOs. And Cloudflare is no different.OK, well it is different, instead of spending a lot of identity and authentication we’ve built our own solutions.We did not always have good identity practices. In fact, for many years our systems had different logins and passwords and it was a complete mess. When a new employee started accounts had to be made on Google for email and calendar, on Atlassian for Jira and Wiki, on the VPN, on the WiFi network and then on a myriad of other systems for the blog, HR, SSH, build systems, etc. etc.And when someone left all that had to be undone. And frequently this was done incorrectly. People would leave and accounts would still be left running for a period of time. This was a huge headache for us and is a huge headache for literally every company.If I could tell companies one thing they can do to improve their security it would be: sort out identity and authentication. We did and it made things so much better.This makes the process of bringing someone on board much smoother and the same when they leave. We can control who accesses what systems from a single control panel.I have one login via a product we built called Cloudflare Access and I can get access to pretty much everything. I looked in my LastPass Vault while writing this talk and there are a total of just five username and password combination and two of those needed deleting because we’ve migrated those systems to Access.So, yes, we use password managers. And we lock down everything with high quality passwords and two factor authentication. Everyone at Cloudflare has a Yubikey and access to TOTP (such as Google Authenticator). There are three golden rules: all passwords should be created by the password manager, all authentication has to have a second factor and the second factor cannot be SMS.We had great fun rolling out Yubikeys to the company because we did it during our annual retreat in a single company wide sitting. Each year Cloudflare gets the entire company together (now over 1,000 people) in a hotel for two to three days of working together, learning from outside experts and physical and cultural activities.Last year the security team gave everyone a pair of physical security tokens (a Yubikey and a Titan Key from Google for Bluetooth) and in an epic session configured everyone’s accounts to use them.Note: do not attempt to get 500 people to sync Bluetooth devices in the same room at the same time. Bluetooth cannot cope.Another important thing we implemented is automatic timeout of access to a system. If you don’t use access to a system you lose it. 
That way we don’t have accounts that might have access to sensitive systems that could potentially be exploited.

Openness

To return to the subject of Culture for a moment, an important Cloudflare trait is openness. Some of you may know that back in 2017 Cloudflare had a horrible bug in our software that came to be called Cloudbleed. This bug leaked memory from inside our servers into people’s web browsing. Some of that web browsing was being done by search engine crawlers and ended up in the caches of search engines like Google.

We had to do two things: stop the actual bug (this was relatively easy and was done in under an hour) and then clean up the equivalent of an oil spill of data. That took longer (about a week to ten days) and was very complicated. But from the very first night when we were informed of the problem we began documenting what had happened and what we were doing. I opened an EMACS buffer in the dead of night and started keeping a record. That record turned into a giant disclosure blog post that contained the gory details of the error we made, its consequences and how we reacted once the error was known. We followed up a few days later with a further long blog post assessing the impact and risk associated with the problem.

This approach of being totally open ended up being a huge success for us. It increased trust in our product and made people want to work with us more. I was on my way to Berlin to give a talk to a large retailer about Cloudbleed when I suddenly realized that the company I was giving the talk at was NOT a customer. And I asked the salesperson I was with what I was doing. I walked into their 1,000-person engineering team all assembled to hear my talk. Afterwards the VP of Engineering thanked me, saying that our transparency had made them want to work with us rather than their current vendor. My talk was really a sales pitch. Similarly, at RSA last year I gave a talk about Cloudbleed and a very large company’s CSO came up and asked to use my talk internally to try to encourage their company to be so open.

When on July 2 this year we had an outage, which wasn’t security related, we once again blogged in incredible detail about what happened. And once again we heard from people about how our transparency mattered to them. The lesson is that being open about mistakes increases trust. And if people trust you then they’ll tend to tell you when there are problems. I get a ton of reports of potential security problems via Twitter or email.

Change

After Cloudbleed we started changing how we write software. Cloudbleed was caused, in part, by the use of memory-unsafe languages. In that case it was C code that could run past the end of a buffer. We didn’t want that to happen again and so we’ve prioritized languages where that simply cannot happen, such as Go and Rust. We were very well known for using Go. If you’ve ever visited a Cloudflare website, or used an app (and you have, because of our scale) that uses us for its API, then you’ve first done a DNS query to one of our servers. That DNS query will have been responded to by a Go program called RRDNS.

There’s also a lot of Rust being written at Cloudflare and some of our newer products are being created using it. For example, Firewall Rules, which do arbitrary filtering of requests to our customers, are handled by a Rust program that needs to be low latency, stable and secure.

Security is a company-wide commitment

The other post-Cloudbleed change was that any crashes on our machines came under the spotlight from the very top.
If a process crashes I personally get emailed about it. And if the team doesn’t take those crashes seriously they get me poking at them until they do. We missed the fact that Cloudbleed was crashing our machines and we won’t let that happen again. We use Sentry to correlate information about crashes and the Sentry output is one of the first things I look at in the morning.

Which, I think, brings up an important point. I spoke earlier about our culture of “If you see something weird, say something” but it’s equally important that security comes from the top down. Our CSO, Joe Sullivan, doesn’t report to me; he reports to the CEO. That sends a clear message about where security sits in the company. But, also, the security team itself isn’t sitting quietly in the corner securing everything. They are setting standards, acting as trusted advisors, and helping deal with incidents. But their biggest role is to be a source of knowledge for the rest of the company. Everyone at Cloudflare plays a role in keeping us secure.

You might expect me to have access to all our systems, a passcard that gets me into any room, a login for any service. But the opposite is true: I don’t have access to most things. I don’t need it to get my job done and so I don’t have it. This makes me a less attractive target for hackers, and we apply the same rule to everyone. If you don’t need access for your job you don’t get it. That’s made a lot easier by the identity and authentication systems and by our rule about timing out access if you don’t use a service. You probably didn’t need it in the first place.

The flip side of all of us owning security is that deliberately doing the wrong thing has severe consequences. Making a mistake is just fine. The person who wrote the bad line of code that caused Cloudbleed didn’t get fired, and the person who wrote the bad regex that brought our service to a halt on July 2 is still with us.

Detection and Response

Naturally, things do go wrong internally. Things that didn’t get reported. To deal with them we need to detect problems quickly. This is an area where the security team does have real expertise and data. We do this by collecting data about how our endpoints (my laptop, a company phone, servers on the edge of our network) are behaving. And this is fed into a homebuilt data platform that allows the security team to alert on anomalies. It also allows them to look at historical data in case of a problem that occurred in the past, or to understand when a problem started.

Initially the team was going to use a commercial data platform or SIEM but they quickly realized that these platforms are incredibly expensive and they could build their own at a considerably lower price. Also, Cloudflare handles a huge amount of data. When you’re looking at operating system level events on machines in 194 cities plus every employee you’re dealing with a huge stream. And the commercial data platforms love to charge by the size of that stream.

We are integrating internal DNS data, activity on individual machines, network netflow information, badge reader logs and operating system level events to get a complete picture of what’s happening on any machine we own.

When someone joins Cloudflare they travel to our head office in San Francisco for a week of training. Part of that training involves getting their laptop and setting it up and getting familiar with our internal systems and security. During one of these orientation weeks a new employee managed to download malware while setting up their laptop.
Our internal detection systems spotted this happening and the security team popped over to the orientation room and helped the employee get a fresh laptop. The time between the malware being downloaded and detected was about 40 minutes. If you don’t want to build something like this yourself, take a look at Google’s Chronicle product. It’s very cool.

One really rich source of data about your organization is DNS. For example, you can often spot malware just by the DNS queries it makes from a machine. If you do one thing, then make sure all your machines use a single DNS resolver and get its logs.

Edge Security

In some ways the most interesting part of Cloudflare is the least interesting from a security perspective. Not because there aren’t great technical challenges to securing machines in 194 cities, but because some of the more apparently mundane things I’ve talked about have such huge impact: Identity, Authentication, Culture, Detection and Response.

But, of course, the edge needs securing. And it’s a combination of physical data center security and software. To give you one example, let’s talk about SSL private keys. Those keys need to be distributed to our machines so that when an SSL connection is made to one of our servers we can respond. But SSL private keys are… private!

And we have a lot of them. So we have to distribute private key material securely. This is a hard problem. We encrypt the private keys while at rest and in transport with a separate key that is distributed to our edge machines securely. Access to that key is tightly controlled so that no one can start decrypting keys in our database. And if our database leaked then the keys couldn’t be decrypted since the key needed is stored separately. And that key is itself GPG encrypted.

But wait… there’s more!

We don’t actually want to have decrypted keys stored in any process that is accessible from the Internet. So we use a technology called Keyless SSL where the keys are kept by a separate process and accessed only when needed to perform operations. And Keyless SSL can run anywhere. For example, it doesn’t have to be on the same machine as the machine handling an SSL connection. It doesn’t even have to be in the same country. Some of our customers make use of that to specify where their keys are distributed to.

Use Cloudflare to secure Cloudflare

One key strategy of Cloudflare is to eat our own dogfood. If you’ve not heard that term before, it’s quite common in the US. The idea is that if you’re making food for dogs you should be so confident in its quality that you’d eat it yourself. Cloudflare does the same for security. We use our own products to secure ourselves. But more than that, if we see that there’s a product we don’t currently have in our security toolkit then we’ll go and build it.

Since Cloudflare is a cybersecurity company we face the same challenges as our customers, but we can also build our way out of those challenges. In this way, our internal security team is also a product team. They help to build or influence the direction of our own products. The team is also a Cloudflare customer using our products to secure us, and we get feedback internally on how well our products work. That makes us more secure and our products better.

Our customers’ data is more precious than ours

The data that passes through Cloudflare’s network is private and often very personal. Just think of your web browsing or app use. So we take great care of it. We’re handling that data on behalf of our customers.
They are trusting us to handle it with care and so we think of it as more precious than our own internal data. Of course, we secure both because the security of one is related to the security of the other. But it’s worth thinking about the data you have that, in a way, belongs to your customer and is only in your care.

Finally

I hope this talk has been useful. I’ve tried to give you a sense of how Cloudflare thinks about security and operates. We don’t claim to be the ultimate geniuses of security and would love to hear your thoughts, ideas and experiences so we can improve. Security is not static and requires constant attention and part of that attention is listening to what’s worked for others.

Thank you.

Serverlist Sept. Wrap-up: Static sites, serverless costs, and more

CloudFlare Blog -

Check out our eighth edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.

Sign up below to have The Serverlist sent directly to your mailbox.

Learn more about Workers Sites at Austin & San Francisco Meetups

CloudFlare Blog -

Last Friday, at the end of Cloudflare’s 9th birthday week, we announced Workers Sites. Now, using the Wrangler CLI, you can deploy entire websites directly to the Cloudflare Network using Cloudflare Workers and Workers KV. If you can statically generate the assets for your site (think create-react-app, Jekyll, or even the WP2Static plugin), you can deploy it to our global network, which spans 194 cities in more than 90 countries.

If you’d like to learn more about how it was built, you can read more about this in the technical blog post. Additionally, I wanted to give you an opportunity to meet with some of the developers who contributed to this product and hear directly from them about their process, potential use cases, and what it took to build. Check out these events. If you’re based in Austin or San Francisco (more cities coming soon!), join us on-site. If you’re based somewhere else, you can watch the recording of the events afterwards.

Growing Dev Platforms at Scale & Deploying Static Websites

Talk 1: Inspiring with Content: How to Grow Developer Platforms at Scale

Serverless platforms like Cloudflare Workers provide benefits like scalability, high performance, and lower costs. However, when talking to developers, one of the most common reactions is, "this sounds interesting, but what do I build with it?" In this talk, we’ll cover how at Cloudflare we’ve been able to answer this question at scale with Workers Sites. We’ll go over why this product exists and how the implementation leads to some unintended discoveries.

Speaker Bio: Victoria Bernard is a full-stack, product-minded engineer focused on Cloudflare Workers Developer Experience. An engineer who started a career working at large firms in hardware sales and moved throughout Cloudflare from support to product and to development. Passionate about building products that make developer lives easier and more productive.

Talk 2: Extending a Serverless Platform: How to Fake a File System…and Get Away With It

When building a platform for developers, you can’t anticipate every use case. So, how do you build new functionality into a platform in a sustainable way, and inspire others to do the same? Let’s talk about how we took a globally distributed serverless platform (Cloudflare Workers) and key-value store (Workers KV) intended to store short-lived data and turned them into a way to easily deploy static websites. It wasn’t a straightforward journey, but join us as we overcome roadblocks and learn a few lessons along the way.

Speaker Bio: Ashley Lewis headed the development of the features that became Workers Sites. She's process- and collaboration-oriented and focused on user experience first at every level of the stack. Ashley proudly tops the leaderboard for most LOC deleted.

Agenda:
6:00pm - Doors open
6:30pm - Talk 1: Inspiring with Content: How to Grow Developer Platforms at Scale
7:00pm - Talk 2: Extending a Serverless Platform: How to Fake a File System…and Get Away With It
7:30pm - Networking over food and drinks
8:00pm - Event conclusion

Austin, Texas Meetup
DATE/TIME - October 3, 6:00pm-8:00pm
LOCATION - Cloudflare Austin
Register Here »

San Francisco, California Meetup
DATE/TIME - October 14, 6:00pm-8:00pm
LOCATION - Cloudflare San Francisco
Register Here »

While you’re at it, check out our monthly developer newsletter: The Serverlist. Have you built something interesting with Workers? Let us know @CloudflareDev!

Not so static... Introducing the HTMLRewriter API Beta to Cloudflare Workers

CloudFlare Blog -

Today, we’re excited to announce HTMLRewriter beta — a streaming HTML parser with an easy-to-use, selector-based JavaScript API for DOM manipulation, available in the Cloudflare Workers runtime. For those of you who are unfamiliar, Cloudflare Workers is a lightweight serverless platform that allows developers to leverage Cloudflare’s network to augment existing applications or create entirely new ones without configuring or maintaining infrastructure.

Static Sites to Dynamic Applications

On Friday we announced Workers Sites: a static site deployment workflow built into the Wrangler CLI tool. Now, paired with the HTMLRewriter API, you can perform DOM transformations on top of your static HTML, right on the Cloudflare edge. You could previously do this by ingesting the entire body of the response into the Worker; however, that method was prone to a few issues. First, parsing a large file was bound to run into memory or CPU limits. Additionally, it would impact your TTFB, as the body could no longer be streamed and the browser would be prevented from doing any speculative parsing to load subsequent assets.

HTMLRewriter was the missing piece to having your application fully live on the edge – soup to nuts. You can build your API on Cloudflare Workers as a serverless function, have the static elements of your frontend hosted on Workers Sites, and dynamically tie them together using the HTMLRewriter API.

Enter JAMStack

You may be thinking “wait!”: JavaScript, serverless APIs… this is starting to sound a little familiar. It sounded familiar to us too. [Embedded tweet from @steveklabnik, September 27, 2019] Is this JAMStack?

First, let’s answer the question — what is JAMStack? JAMStack is a term coined by Mathias Biilmann that stands for JavaScript, APIs, and Markup. JAMStack applications are intended to be very easy to scale, since they rely on simplified static site deployment. They are also intended to simplify the web development workflow, especially for frontend developers, by bringing data manipulation and rendering that traditionally happened on the backend to the front-end, and interacting with the backend only via API calls. So to that extent, yes, this is JAMStack. However, HTMLRewriter takes this idea one step further.

The Edge: Not Quite Client, Not Quite Server

Most JAMStack applications rely on client-side calls to third-party APIs, where the rendering can be handled client-side using JavaScript, allowing front end developers to work with toolchains and languages they are already familiar with. However, this means that with every page load the client has to go to the origin, wait for HTML and JS, and then, once those are parsed and loaded, make multiple calls to APIs. Additionally, all of this happens on client-side devices, which are inevitably less powerful machines than servers and have potentially flaky last-mile connections.

With HTMLRewriter in Workers, you can make those API calls from the edge, where failures are significantly less likely than on client device connections, and results can often be cached. Better yet, you can write the APIs themselves in Workers and can incorporate the results directly into the HTML — all on the same powerful edge machine.
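To make that concrete, here is a minimal sketch (not taken from the original post) of a Worker that fetches data from an API and streams it into a page with HTMLRewriter. The /api/greeting endpoint, the example.com hostname, the JSON shape of the response, and the div#greeting element are all hypothetical placeholders:

// Respond to every request by combining the static page with API data at the edge.
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // Fetch the static HTML (e.g. your origin or a Workers Sites asset) and the
  // API response in parallel. Both URLs here are illustrative assumptions.
  const [page, api] = await Promise.all([
    fetch(request),
    fetch('https://example.com/api/greeting')
  ])
  const data = await api.json() // assumes the API returns JSON like { "message": "..." }

  // Stream the page through HTMLRewriter, filling in the placeholder element
  // as the HTML flows past, with no full-body buffering required.
  return new HTMLRewriter()
    .on('div#greeting', {
      element(element) {
        element.setInnerContent(data.message)
      }
    })
    .transform(page)
}

Because the transformation is applied to the response stream, the HTML starts flowing to the browser before the whole document has been processed.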
Performing this kind of “edge-side rendering” with HTMLRewriter always happens as close as possible to your end users, without happening on the device itself, and it eliminates the latency of traveling all the way to the origin.

What does the HTMLRewriter API look like?

The HTMLRewriter class offers a jQuery-like experience directly inside of your Workers application, allowing developers to build deeply functional applications, leaning on a powerful JavaScript API to parse and transform HTML. Below is an example of how you can use the HTMLRewriter to rewrite links on a webpage from HTTP to HTTPS.

const REWRITER = new HTMLRewriter()
  .on('a.avatar', { element: e => rewriteUrl(e, 'href') })
  .on('img', { element: e => rewriteUrl(e, 'src') });

async function handleRequest(req) {
  const res = await fetch(req);
  return REWRITER.transform(res);
}

In the example above, we create a new instance of HTMLRewriter, use the selectors to find all matching a and img elements, and call the rewriteUrl function on the href and src properties respectively.

Internationalization and localization tutorial: If you’d like to take things further, we have a full tutorial on how to make your application i18n friendly using HTMLRewriter.

Getting started

If you’re already using Cloudflare Workers, you can simply get started with the HTMLRewriter by consulting our documentation (no sign up or anything else required!). If you’re new to Cloudflare Workers, we recommend starting out by signing up here. If you’re interested in the nitty-gritty details of how the HTMLRewriter works, and learning more than you’ve ever wanted to know about parsing the DOM, stay tuned. We’re excited to share the details with you in a future post.

One last thing: you are not limited to Workers Sites. Since Cloudflare Workers can be deployed as a proxy in front of any application, you can use the HTMLRewriter as an elegant way to augment your existing site and easily add dynamic elements, regardless of backend.

We love to hear from you! We’re always iterating and working to improve our product based on customer feedback! Please help us out by filling out our survey about your experience. Have you built something interesting with Workers? Let us know @CloudflareDev!
