CloudFlare Blog

Making DNS record changes more reliable

DNS is the very first step in accessing any website, API, or pretty much anything on the Internet, which makes it mission-critical to keeping your site up and running. This week, we are launching two significant changes that allow our customers to better maintain and update their DNS records. For customers who use Cloudflare as their authoritative DNS provider, we’ve added a much asked for feature: confirmation to DNS record edits. For our secondary DNS customers, we’re excited to provide a brand new onboarding experience.Confirm and CommitOne of the benefits of using Cloudflare DNS is that changes quickly propagate to our 200+ data centers. And I mean very quickly: DNS propagation typically takes <5 seconds worldwide. Our UI was set up to allow customers to edit records, click out of the input box, and boom! The record has propagated!There are a lot of advantages to fast DNS, but there's also one clear downside – it leaves room for fat fingering. What if you accidentally toggle the proxy icon, or mistype the content of your DNS record? This could result in users not being able to access your website or API and could cause a significant outage. To protect customers from these kinds of mistakes, we've added a Save button for DNS record changes.Now editing records in the DNS table allows you to take an extra look before committing the change. The new confirmation layout applies to all record types and affects any content, TTL, or proxy status changes.Let us know what you think by filling out the feedback survey linked at the top of the DNS tab in the dashboard.

Secondary DNS — A faster, more resilient way to serve your DNS records

What is secondary DNS, and why is it important?In DNS, nameservers are responsible for serving DNS records for a zone. How the DNS records populate into the nameservers differs based on the type of nameserver.A primary master is a nameserver that manages a zone’s DNS records. This is where the zone file is maintained and where DNS records are added, removed, and modified. However, relying on one DNS server can be risky. What if that server goes down, or your DNS provider has an outage? If you run a storefront, then your customers would have to wait until your DNS server is back up to access your site. If your website were a brick and mortar store, this would be effectively like boarding up the door while customers are trying to get in.This type of outage can be very costly.Now imagine you have another DNS server that has a replica of your DNS records. Wouldn’t it be great to have it as a back-up if your primary nameserver went down? Or better yet, what if both served your DNS records at all times— this could help decrease the latency of DNS requests, distribute the load between DNS servers, and add resiliency to your infrastructure! And that’s precisely what Secondary DNS nameservers were built for.As businesses grow, they often scale their DNS infrastructure. We’re seeing more customers move away from two or three on-premise DNS servers to using a managed DNS provider to having multiple DNS vendors—all to increase redundancy against the possibility of a DDoS attack taking down one of their providers. Cloudflare has data centers in over 200 cities, all of which run our DNS software allowing our authoritative DNS customers to benefit from DNS lookups averaging around 11ms globally. So we decided to expand this functionality to customers who want to use more than one DNS provider, or for those that find it too complicated to move away from their on-premise DNS server.Customer ChallengesWhen we first built our secondary DNS product, our MVP was focused on functionality and not ease of use. We did this because we thought that this feature would be used by a small portion of our Enterprise customers and that they would be comfortable using the API. But the demand for secondary DNS was far greater than we initially imagined. Many customers are interested in the service, including those who aren't comfortable managing DNS  through the API.Previously, setting up secondary DNS on a zone required a series of API calls: one for creating the zone, one for defining the IP address and settings of the master server, one for linking the master(s) to the zone, and one for initiating a zone transfer.We heard from customers that this experience was frustrating. There were also a lot of places where the setup could go wrong: some customers would forget to link a master to their zone, others would forget a step when adding subsequent zones, and still others would have to spend hours debugging a typo in their API call. We believe secondary DNS customers should have as seamless an experience as our authoritative DNS customers, and shouldn’t be treated as secondary (pun intended) class citizens. When creating the onboarding UI, we asked ourselves, how can we simplify the experience to just a few input fields? How do we prevent customers from making easy, potentially messy mistakes, like forgetting to attach a master?Enter: The new Secondary DNS Onboarding ExperienceStarting today, enterprise customers who are entitled to secondary DNS will be able to configure their zone in the Cloudflare Dashboard. The time from when they type in their domain name to when they see their records in the dashboard is less than two minutes. We’ve added error prevention to stop customers from adding their zone until they’ve configured at least one master. Customers will also be able to review their transferred records before finishing the onboarding process, allowing them to see what was transferred, without juggling API calls and and switching back and forth between the dashboard and a support article.How It LooksThe “Add Site” flow in the Cloudflare Dashboard gives customers two options: Authoritative or Secondary DNS. Next, they will need to fill out the IP address of their master server, attach a TSIG (Transactional Signature) to authenticate zone transfers, and voila! In just a few clicks, records populate to your DNS table. The Intricacies of Secondary DNSAs mentioned above, primary nameservers are where DNS records are managed, and secondary nameservers are responsible for holding the read-only replica of those records. But how do they get there? The communication between a primary master and a secondary nameserver is known as a zone transfer.Master servers use SOA (Start of Authority) records to keep track of zone updates. Every time a zone file changes (say you add or remove a DNS record), the serial number of the SOA record is incremented as a way to signal secondary nameservers that the zone updated, and it’s time to fetch a fresh copy.Primary masters can send a NOTIFY message to a secondary master to signal a zone file change. Once the secondary receives the NOTIFY, it will do an SOA sanity check against the master and perform a zone transfer if it sees that the SOA value has increased. An AXFR or IXFR query can initiate the zone transfer. An AXFR query initiates a full zone transfer and is usually requested the very first time a zone is transferred. But AXFR transfers are not always necessary as most zone file changes are minute. This is why IXFR (incremental zone transfer) requests were created— they tell a master server which version of the zone a secondary currently holds and the master sends the difference between the new version and the one the secondary has— this way only the new changes are transferred. Some masters, unfortunately, do not support NOTIFY queries. This means that instead of the master notifying the secondary of zone updates, the secondary needs to periodically check the SOA of the primary server to see if the value has changed.Securing Zone TransfersZone transfers between a primary and secondary server are unauthenticated on their own. TSIGs (Transactional Signatures) were developed as a means to add authentication to the DNS protocol, and have mostly been used for zone transfers. They provide mutual authentication between a client and a server by using a shared secret between the two parties and a one-way keyed hash function, which is attached as a TSIG record to a DNS message. The TSIG record guarantees that only secondary nameservers with the TSIG can pull zone transfers from a master. And vice versa, secondary servers will only accept zone transfers from masters that have the proper TSIG attached. Additionally, TSIGs provide data integrity and ensure that the DNS message was not modified en route.We support TSIGs and highly recommend that you add it when configuring your master.Extending DNS Analytics to Secondary DNSSetting up a secondary zone on Cloudflare is a simple process with the new onboarding UI. In just a few clicks, Cloudflare’s nameservers in all 200+ cities will begin responding to DNS queries. In addition to serving DNS records, secondary DNS customers will also be able to see the same DNS analytics that we provide to our authoritative DNS customers. The analytics show a breakdown of DNS traffic by record type, response code, and even geographical regions.One of our customers, Big Cartel, runs an E-commerce platform that has helped people all over the world sell $2.5 billion of their work since 2005. As they grow, Cloudflare’s secondary DNS product helps keep their site fast and reliable:“At Big Cartel, we provide an online storefront for our customers. We need to be always available and avoid any chances of downtime — eliminating all single points of failure is critical for us. With Cloudflare's Secondary DNS, we can do just that! It keeps our DNS infrastructure more resilient while allowing our customers to benefit from fast query times. Additionally, using Cloudflare's Secondary DNS analytics provides granular insights into how our traffic is balanced between our DNS providers” - Lee Jensen, Technical DirectorGetting StartedSecondary DNS is currently available on the Enterprise plan, if you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS, please refer to our support article.

Releasing Cloudflare Access’ most requested feature

Cloudflare Access, part of Cloudflare for Teams, replaces legacy corporate VPNs with Cloudflare’s global network. Instead of starting a VPN client to backhaul traffic through an office, users visit the hostname of an internal application and login with your team’s SSO provider. While the applications feel like SaaS apps for end users, your security and IT departments can configure granular controls and audit logging in a single place.Since Access launched two years ago, customers have been able to integrate multiple SSO providers at the same time. This MultiSSO option makes it seamless for teams to have employees login with Okta or Azure AD while partners and contractors use LinkedIN or GitHub.The integrations always applied globally. Users would see all SSO options when connecting to any application protected by Cloudflare Access. As more organizations use Cloudflare Access to connect distributed and mixed workforces to resources, listing every provider on every app no longer scales.For example, your team might have an internal GitLab instance that only employees need to access using your corporate G Suite login. Meanwhile, the marketing department needs to share QA versions of new sites with an external agency who authenticates with LinkedIn. Asking both sets of users to pick an SSO provider on both applications adds a redundant step and can lead to additional questions or IT tickets.The ability to only show users the relevant identity provider became the most requested feature in Cloudflare Access in the last few months. Starting today, you can use the new Cloudflare for Teams UI to configure identity options on individual applications.Cloudflare AccessCloudflare Access secures applications by applying zero-trust enforcement to every request. Rather than trusting anyone on a private network, Access checks for identity any time someone attempts to reach the application. With Cloudflare’s global network, that check takes place in a data center in over 200 cities around the world to avoid compromising performance.Behind the scenes, administrators build rules to decide who should be able to reach the tools protected by Access. In turn, when users need to connect to those tools, they are prompted to authenticate with one of the identity provider options. Cloudflare Access checks their login against the list of allowed users and, if permitted, allows the request to proceed.The challenge of agreeing on identityMost zero-trust options, like the VPN appliances they replace, rely on one source of identity. If your team has an application that you need to share with partners or contractors, you need to collectively agree on a single standard.Some teams opt to solve that challenge by onboarding external users to their own identity provider. When contractors join a project, the IT department receives help desk tickets to create new user accounts in the organization directory. Contractors receive instructions on how to sign-up, they spend time creating passwords and learning the new tool, and then use those credentials to login.This option gives an organization control of identity, but adds overhead in terms of time and cost. The project owner also needs to pay for new SSO seat licenses, even if those seats are temporary. The IT department must spend time onboarding, helping, and then offboarding those user accounts. And the users themselves need to learn a new system and manage yet another password - this one with permission to your internal resources.Alternatively, other groups decide to “federate” identity. In this flow, an organization will connect their own directory service to their partner’s equivalent service. External users login with their own credentials, but administrators do the work to merge the two services to trust one another.While this method avoids introducing new passwords, both organizations need to agree to dedicate time to integrate their identity providers - assuming that those providers can integrate. Businesses then need to configure this setup with each contractor or partner group. This model also requires that external users be part of a larger organization, making it unavailable to single users or freelancers.Cloudflare Access avoids forcing the decision on a single source of identity by supporting multiple. When users connect, they are presented with those options. Users choose their specific provider and Access checks that individual’s login against the list of allowed users.Configuring per-app optionsNot all of those options apply to every application that an organization secures. To segment those applications, and reduce user confusion, you can now scope specific apps to different providers.To get started, select the application that you want to segment with a particular provider in the Cloudflare for Teams UI. Click the tab titled “Authentication”.The tab will list all providers integrated with your account. By default, Access will continue to enable all options for end users. You can toggle any provider on or off in this view and save. The next time your users visit this application, they will only see the options enabled.If you disable all but one option, Access will skip the login page entirely and redirect the user directly to the provider - saving them an unnecessary click.What’s next?You can start configuring individual identity providers with specific applications in the new Cloudflare for Teams dashboard. Additional documentation is also available.The new Teams UI makes this feature possible, but the login page that your end users see still has the legacy design from the older Access dashboard that launched two years ago. Cloudflare for Teams will be releasing a style update to that page in the next month to bring it in line with this new UI.

Resolve internal hostnames with Cloudflare for Teams

Phishing attacks begin like any other visit to a site on the Internet. A user opens a suspicious link from an email, and their DNS resolver looks up the hostname, then connects the user to the origin.Cloudflare Gateway’s secure DNS blocks threats like this by checking every hostname query against a constantly-evolving list of known threats on the Internet. Instead of sending the user to the malicious host, Gateway stops the site from resolving.. The user sees a “blocked domain” page instead of the malicious site itself.As teams migrate to SaaS applications and zero-trust solutions, they rely more on the public Internet to do their jobs. Gateway's security works like a bouncer, keeping users safe as they navigate the Internet. However, some organizations still need to send traffic to internal destinations for testing or as a way to make the migration more seamless.Starting today, you can use Cloudflare Gateway to direct end user traffic to a different IP than the one they originally requested. Administrators can build rules to override the address that would be returned by a resolver and send traffic to a specified alternative.Like the security features of Cloudflare Gateway, the redirect function is available in every one of Cloudflare’s data centers in 200 cities around the world, so you can block bad traffic and steer internal traffic without compromising performance.What is Cloudflare Gateway?Cloudflare Gateway is one-half of Cloudflare for Teams, Cloudflare’s platform for securing users, devices, and data. With Cloudflare for Teams, our global network becomes your team’s network, replacing on-premise appliances and security subscriptions with a single solution delivered closer to your users - wherever they work.As part of that platform, Cloudflare Gateway blocks threats on the public Internet from becoming incidents inside of your organization. Gateway’s first release added  DNS security filtering and content blocking to the world’s fastest DNS resolver, Cloudflare’s takes less than 5 minutes. Teams can secure entire office networks and segment traffic reports by location. For distributed organizations, Gateway can be deployed via MDM on networks that support IPv6 or using a dedicated IPv4 as part of a Cloudflare enterprise account.With secure DNS filtering, administrators can click a single button to block known threats, like sources of malware or phishing sites. Policies can be extended to block specific categories, like gambling sites or social media. When users request a filtered site, Gateway stops the DNS query from resolving and prevents the device from connecting to a malicious destination or hostname with blocked material.Traffic bound for internal destinationsAs users connect to SaaS applications, Cloudflare Gateway can keep those teams secure from threats on the public Internet.In parallel, teams can move applications that previously lived on a private network to a zero-trust model with Cloudflare Access. Rather than trusting anyone on a private network, Access checks for identity any time someone attempts to reach the application.Together, Cloudflare for Teams keeps users safe and makes internal applications just as easy to use as SaaS tools. Making it easier to migrate to that model also reduces user friction. Domain overrides can smooth that transition from internal networks to a fully cloud-delivered model.With Gateway's domain override feature, administrators can choose certain hostnames that still run on the private network and send traffic to the local IPs with the same resolver that secures Internet-bound traffic. End users can continue to connect to those resources without disruption. Once ready, those tools can be secured with Cloudflare Access to remove the reliance on a private network altogether.Cloudflare Gateway can help reduce user confusion and IT overhead with split-horizon setups where some traffic routes to the Internet and other requests need to stay on the same network. Administrators can build policies to route traffic bound for hostnames, even ones that exist publicly, to internal IP addresses that a user can reach if they are on the same local network.How does it work?When administrators configure an override policy, Cloudflare Gateway pushes that information to the edge of our network. The rule becomes part of the Gateway enforcement flow for that organization’s account. Explicit override policies are enforced first, before allowed or blocked rules.When a user makes a request to the original destination, that request arrives at a Gateway IP address where Cloudflare's network checks the source IP to determine which policies to enforce. Gateway determines that the request has an override rule and returns the preconfigured IP address.Gateway's DNS override feature is supported in deployments that use Cloudflare's IPv4 or IPv6 addresses, as well as DNS over HTTPS.What’s next?The domain override feature is available to all Cloudflare for Teams customers today at no additional cost. You can begin building override rules by navigating to the Policies section of the Gateway product and selecting the “Custom” tab. Administrators can configure up to 1,000 custom rules.To help organizations in their transition to remote work, Cloudflare has made our Teams platform free for any organization through September 1. You can set up an account at now. Need help getting started? You can request a dedicated onboarding session at no charge.

DeepLinks and ScrollAnchor

What are DeepLinks?To directly quote Wikipedia:“Deep linking is the use of a hyperlink that links to a specific, generally searchable or indexed, piece of web content on a website (e.g., rather than the website's home page (e.g., The URL contains all the information needed to point to a particular item.”Why DeepLinks in Dashboard?There are many user experiences in Cloudflare’s Dashboard that are enhanced by the use of deep linking, such as:We’re able to direct users from marketing pages directly into the Dashboard so they can interact with new/changed features.Troubleshooting docs can have clearer, more intently directions. e.g. “Enable SSL encryption here” vs “Log into the Dashboard, choose your account and zone, navigate to the security tab, change SSL encryption level, blah blah blah”.One of the interesting challenges with deep linking in the Dashboard is that most interesting resources are “locked” behind the context of an account and a zone/domain/website. To illustrate this, look at a tree of possible URL paths into Cloudflare’s -> root-level resources: login, sign-up, forgot-password, two-factor<accountId>/ -> account-level resources: analytics, workers, domains, stream, billing, audit-log<accountId>/<zoneId> -> zone-level resources: dns, ssl-tls, firewall, speed, caching, page-rules, traffic, etc. You might notice that in order to deep link to anything more interesting than logging in, a deep linker will need to know a user’s account or zone beforehand. A troubleshooting doc might want to send a user to the Page Rules tab in Dashboard to help a user fix their zone, but the linker doesn’t know what that zone is.Another highly desired feature was the ability for a deep link to scroll to a particular piece of content on a Dashboard page, making it even easier for users to navigate. Instead of a troubleshooting doc asking a user to fumble around to find a setting, we could helpfully scroll that setting right into view. Now that would be slick!What do DeepLinks look like in Dashboard?The solution we came up with involves 3 main parts:Deep links URLs expose an intuitive schema for dynamic value resolution.A React component, DeepLink, consolidates routing/resolving deep links.A React component, ScrollAnchor, encapsulates a simple algorithm which scrolls its content into view when the DOM has “finished loading”.Just to prove that it works, here’s a GIF of us deep linking to the “TLS 1.3” setting on the security settings page: It works! I was asked to select one of my several accounts, then our DeepLink routing component was smart enough to know that I have only one zone within that account and auto-filled the rest of the URL path. After the page was fully loaded, we were automatically scrolled to the TLS 1.3 setting. If you’re curious how all of this works and want to jump into the nitty gritty details, read on!How are DeepLinks exposed?If you were paying attention to the URL bar in the GIF above, you already know what’s coming. In order to deal with dynamic account/zone resolution, a deep link can use a to query parameter to specify a path into Dashboard. I think it reads quite This example is saying that we’d like to link to the “Edge Certificates” section of the “SSL-TLS” product for some account and some zone that a user needs to manually resolve, as you saw above. It’s easy to imagine removing “?to=/” to transform the link URL into the resolved<resolvedAccount>/<resolvedZone>/ssl-tls/edge-certificates The URL-like schema of the to parameter makes it very natural to support different variations such as account-level Or allowing the linker to supply known This link takes the user to the “Traffic” product tab for some zone inside of account 1234567890abcdef. Indeed, the :account and :zone symbols are placeholders for user-supplied values, but they can be replaced with any permutation of real, known values to speed up resolution time to provide a better UX.DeepLink routingThese links are parsed and resolved in our top-level routing component, DeepLink. At a high level, this component contains a series of “resolvers” for unknown symbols that need automatic or user-interactive resolution (i.e. :account and :zone). But before we dive in, let’s take a step back and gain appreciation for how cool this component is.Cloudflare’s Dashboard is a single page React app, which means we use React Router to create routing components that handle what’s rendered on different URLs:<Switch> <Route path="/login"><Login /></Route> <Route path="/sign-up"><Signup /></Route> ... <AccountRoutes /> </Switch> When a page is loaded, a lot of things need to happen: API calls need to be made to fetch all the data needed to render a page, like account/user/zone info not cached in the browser. Many components need to be rendered. It turns out that we can improve the UX of many users by blocking React Router to make specific queries to our API instead of rendering an entire page that anecdotally fetches the information we need. For example, there’s no need to render a zone selection page if a user only has one zone, like in our GIF above ☝️.ResolversWhen a deep link gets parsed and split into parts, the framework iterates over those parts and tries to build a URL string that is later used to redirect users to a specific location in the dashboard.// to=/:account/:zone/traffic // parts = [‘:account’, ‘:zone’, ‘traffic’] for (const part of parts) { // do something with each part } We can build up the dynamic URL by looking at prefixes. If a part starts with “:”, it’s considered a symbol that needs to be resolved. Everything else is a static string that just gets appended.const resolvedParts: string[] = []; // parts = [‘:account’, ‘:zone’, ‘traffic’] for (let part of parts) { if (part.startsWith(‘:’)) { //resolve } resolvedParts.push(part); } const finalUrl = resolvedParts.join(‘/’); Symbols are handled by functions we call “resolvers”. A resolver is a function that:Is async.Has a context parameter.Always returns a string - the value it resolves to.In JavaScript, async functions always return a promise. Return values that are not type of Promise are wrapped in a resolved promise implicitly. They also allow “await” to be used in them. The async/await syntax is used for resolvers so they can perform any kind of asynchronous work - such as calling the API, while being able to “pause” JavaScript with “await” until that asynchronous work is done.Each dynamic symbol has its own resolver. We currently have two resolvers - for account and for zone.const RESOLVERS: Resolvers = { account: accountResolver, zone: zoneResolver }; const resolvedParts: string[] = []; // parts = [‘:account’, ‘:zone’, ‘traffic’] for (let part of parts) { if (part.startsWith(‘:’)) { // for :account, accountResolver is awaited and returns “abc123” // for :zone, zoneResolver is awaited and returns “” part = await RESOLVERS[part.slice(1)]; } resolvedParts.push(part); } const finalUrl = resolvedParts.join(‘/’); The internal implementation is a little bit more complicated, but this is a rough overview of how our DeepLink works.Resolver contextWe mentioned that each resolver has a context parameter. Context is an object that is passed to resolvers from the DeepLink component and it contains a bunch of handy utilities that give resolvers control over any part of the app. For example, it has access to the Redux store (we use Redux.js in the Dashboard to help us manage the application’s state). It has access to previously resolved values, and to all other parts of the deep link. It also has functions to help with user interactions.User interactionsIn many cases, a resolver is not able to resolve without the user's help. For example, if a user has multiple zones, the resolver working on :zone symbol needs to wait for the user to select a zone.const zoneResolver: Resolver = async ctx => { const zones = await fetchZone(); // Just one zone, :zone symbols can be resolved to without user’s help if (zones.length === 1) return zones[0].name; if (zones.length > 1) { // need user’s help to pick a zone } }; We already have a page in the dashboard with a zone list that looks like this.What we need to do is give the resolver the ability to somehow show this page, and wait for the result of the user's interaction.You might be asking: “But how do we show this page? You just told me that DeepLink blocks the entire page!”That’s true!We decided to block the React Router to prevent unnecessary API calls and DOM updates while a deep link is resolving. But there is no harm in showing some part of the UI, if needed. To be able to do that, we added two functions to context - unblockRouter and blockRouter. These functions just toggle the state that is gating our Router component.const zoneResolver: Resolver = async ctx => { // ... if (zones.length > 1) { // delegate to React Router to render the page with zone picker ctx.unblockRouter(); // need users help to pick a zone // block the router again ctx.blockRouter(); } }; Now, the last piece is to somehow observe user interactions from within the resolver. To be able to do that, we have written a powerful utility.waitForPageActionResolvers are isolated functions that live outside of the application’s components. To be able to observe anything that happens in distant branches of React DOM, we created a function called waitForPageAction. This function takes two parameters:1. pageToAwaitActionOn - URL string pointing to a page we want to await the user's action on. For example, “”2. actionType - Unique string describing the action. For example, ZONE_SELECTED.As you may have guessed, waitForPageAction is an async function. It returns a promise that resolves with action metadata whenever that action happens on the page specified by pageToAwaitActionOn. The promise rejects when the user navigates away from pageToAwaitActionOn. Otherwise, it keeps waiting… forever.This helps us to write a code that is very easy to understand.const zoneResolver: Resolver = async ctx => { // ... if (zones.length > 1) { // delegate to React Router to render the page with zone picker ctx.unblockRouter(); // need users help to pick a zone. Wait for ‘ZONE_SELECTED’ action at ‘’ // action is an object with metadata about zone. It contains zoneName, which can be used in this resolver to resolve :zone symbol const action = ctx.waitForPageAction( ‘’, ‘ZONE_SELECTED’ ); // block the router again ctx.blockRouter(); return action.zoneName } }; How does waitForPageAction work?As mentioned above, we use Redux to manage our state. The actionType parameter is nothing else than a type of Redux action. Whenever a zone is selected, React dispatches a Redux action in an onClick handler.<ZoneCard onClick={zoneName => { dispatch({type: ‘ZONE_SELECTED’, zoneName}) }} /> Now, how does waitForPageAction know that ZONE_SELECTED’ has been dispatched? Aren’t we supposed to write a reducer?!Not really. waitForPageAction is not changing any state, it’s just an observer that resolves whenever some action, that is dispatched, satisfies a predicate. And Redux has an API to subscribe to any store changes - store.subscribe(listener).The listener will be called any time an action is dispatched, and some part of the state tree may have changed. Unfortunately, the listener does not have access to the currently dispatched action. We can only read the current state. Solution? Store the action in the Redux store!Redux actions are just plain objects (mostly), and thus easy to serialize. We added a simple reducer that stores all actions in the Redux state.export function deepLinkReducer( state: State = DEFAULT_STATE, action: AnyAction ){ const nextState = { ... state, lastAction: action }; return nextState; } Anytime an action is dispatched, we can read that action’s metadata in store.getState().lastAction. Now, we have everything we need to finally implement waitForPageAction.export function waitForPageAction = (store: Store<DashState>) =>( pageToAwaitActionOn: string, actionType: string ) => new Promise<AnyAction>((resolve, reject) => { // Subscribe to redux store const unsubscribe = store.subscribe(() => { const state = store.getState(); const currentPage = state.router.location.pathname; const lastAction = state.lastAction; if (currentPage !== pageToAwaitActionOn) { // user navigated away -unsubscribe and reject unsubscribe(); reject(‘User navigated away’); } else if (lastAction.type === actionType) { // Action types match! Unsubscribe and resolve with action object unsubscribe(); resolve(lastAction); } }); }); The listener reads the current state and grabs the currentPage and lastAction data. If currentPage doesn’t match pageToAwaitActionOn, it means the user navigated away, and there’s no need to continue resolving the deep link - we unsubscribe, and reject the promise. Deep link resolvers are stopped, and React Router unblocked.Else, if lastAction.type matches the actionType parameter, it means the action we are waiting on just happened! Unsubscribe, and resolve the promise with action metadata. The deep link keeps resolving.That’s it! We also added a similar function - waitForAction - which does exactly the same thing, but is not restricted to a specific page.ScrollAnchor componentWe implemented a wrapper component ScrollAnchor that will scroll to its wrapped content, making our deep links even more targeted. A client would wrap some content like this:<ScrollAnchor id=”super-important-setting-card”> <SuperImportantSettingCard /> </ScrollAnchor> And then reference it via a typical URL Now I can hear you saying, “what’s the point? Can’t we get the same behavior with any old ID?”<div id=”super-important-setting-card”> <SuperImportantSettingCard /> </div> We thought so too! But it turns out that there are a few problems that prevent this super simple approach:The Dashboard’s fixed headerDOM updates after page loadSince the Dashboard contains a fixed header at the top of the page, we can’t simply anchor to any ID, since the content will be scrolled to the top of the browser window behind the header. Fortunately, there’s a simple CSS solution using negative margins:<div id=”super-important-setting-card” padding-top={headerOffset} margin-top={headerOffset}> <SuperImportantSettingCard /> </div> This CSS trick alone would work for a static site with a fixed header, but the Dashboard is very dynamic. We found early on in testing that using a normal HTML ID anchor in a URL would cause the browser to jump to the tag on page load but the DOM would change in response to newly fetched information or re-rendering, and the anchored content would be pushed out of view.A solution: scroll to the anchored content after the page content is fully loaded, i.e. after all API calls are resolved, spinners removed, content is rendered. Fortunately, there’s a good way to programmatically scroll a browser window: Element.scrollIntoView(). However, there isn’t a good way to tell when the DOM is finished changing, since it can be modified at any time after page load. Let’s consider two possible strategies for determining when to scroll anchored content into view.Strategy #1: scroll after a fixed duration. If our goal is to make sure we only scroll to content after a page is “fully loaded”, we can simplify the problem by making some assumptions. Namely, we can assume a maximum amount of time it will take a given page to fetch resources from the backend and re-render the DOM. Let’s call this assumed max duration M milliseconds. We can then easily scroll to some content by running a timeout on page load:setTimeout(() => scrollTo(htmlId), M) The problem with this approach is that the DOM might finish updating before or after we scroll. We end up with vertical alignment problems (as the DOM is still settling) or a jarring, unexpected scroll (if we scroll long after the DOM is settled). Both options are bad UX, and in practice it’s difficult to choose a duration constant M that is “just right” for every single page.Strategy #2: scroll after the DOM has “settled”. If we know that choosing a good duration M for every page isn’t practical, we should try to come up with an algorithm that can choose a better M:Define an arbitrary threshold of DOM “busyness”, B milliseconds.On page load, start a timer that will scroll to anchored content after B milliseconds.If we observe any changes to the DOM, reset the timer.Once the timer expires, we know that the DOM hasn’t changed in B milliseconds.By varying our choice of B, we’re able to have some control over how long we’re willing to wait for a page to “finish loading”. If B is 0 milliseconds, we’ll scroll to the anchored content immediately. If it’s 1000 milliseconds, we’ll wait a full second after any DOM change before scrolling. This algorithm is more resilient than fixed threshold scrolling since it explicitly listens to the DOM, but the chosen threshold is somewhat arbitrary. After some trial and error loading a sample of Dashboard pages, we determined that a 500 millisecond busyness threshold was sufficient to allow all content to load onto a page. Here’s what the implementation looks like:const SETTLE_THRESHOLD = 500; const scrollThunk = (observer: MutationObserver) => { scrollToAnchor(id); observer.disconnect(); }; let domTimer: number; const observer = new MutationObserver((_mutationsList, observer) => { domTimer = resetTimeout(domTimer, scrollTunk, SETTLE_THRESHOLD, observer); }); observer.observe(document.body, {childList: true, subtree: true}); domTimer = window.setTimeout(scrollThunk, SETTLE_THRESHOLD, observer); A key assumption is that API calls take roughly the same amount of time to resolve. If most fetches take 250ms to resolve but others take 1500ms, we might see that the DOM hasn’t been changed for a while and think that it’s settled. Who knew there would be so much work involved in scrolling!ConclusionThere you have it. A fully-featured deep linking solution with an intuitive schema, React Router blocking, autofilling, and scrolling. Thanks for reading.

Network-Layer DDoS Attack Trends for Q1 2020

As we wrapped up the first quarter of 2020, we set out to understand if and how DDoS attack trends have shifted during this unprecedented time of global shelter in place. Since then, traffic levels have increased by over 50% in many countries, but have DDoS attacks increased as well? Traffic increases are often observed during holiday seasons. During holidays, people may spend more time online; whether shopping, ordering food, playing online games or a myriad of other online activities. This higher usage translates into higher revenue per minute for the companies that provide those various online services. Downtime or service degradation during these peak times could result in user churn and loss of significant revenue in a very short time. ITIC estimates that the average cost of an outage is $5,600 per minute, which extrapolates to well over $300K per hour. It is therefore no surprise that attackers capitalize on the opportunity by launching a higher number of DDoS attacks during the holiday seasons.The current pandemic has a similar cause and effect. People are forced to stay home. They have become more reliant on online services to accomplish their daily tasks which has generated a surge in the Internet traffic and DDoS attacks.The Rise of Smaller, Shorter AttacksMost of the attacks that we observed in Q1 2020 were relatively small, as measured by their bit rates. As shown in the figure below, in Q1 2020, 92% of the attacks were under 10 Gbps, compared to 84% in Q4 2019. Diving deeper, an interesting shift can be observed in the distribution of attacks below 10 Gbps in Q1, as compared to the previous quarter. In Q4, 47% of network-layer DDoS attacks peaked below 500 Mbps, whereas in Q1 they increased to 64%. From a packet rate perspective, the majority of the attacks peaked below 1 million packets per second (pps). This rate, along with their bit rate, indicates that attackers are no longer focusing their efforts and resources to generate high-rate floods -- bits or packets per second. However, it's not only the packet and bit rates that are decreasing, but also the attack durations. The figure below illustrates that 79% of DDoS attacks in Q1 lasted between 30 to 60 minutes, compared to 60% in Q4, which represents a 19% increase.These three trends could be explained by the following:Launching DDoS attacks is cheap and you don’t need much technical background. DDoS-as-a-service tools have provided a possible avenue for bad actors with little to no technical expertise to launch DDoS attacks quickly, easily, in a cost-effective manner and with limited bandwidth. According to Kaspersky, DDoS attack services can cost as little as $5 for a 300-second attack (5 minutes). Additionally, amateur attackers can also easily leverage free tools to generate floods of packets. As we’ll see in the next section, 13.5% of all DDoS attacks in Q1 were generated using variations of the publicly available Mirai code.While an attack under 10 Gbps might seem small, it can still be enough to affect underprotected Internet properties. Smaller and quicker attacks might prove to deliver a high ROI for attackers to extort a ransom from companies in lieu of not disrupting the availability of the Internet property. Larger Attacks Still Persist, Albeit in Smaller NumbersWhile the majority of the attacks were under 10 Gbps, larger attacks are still prevalent. The below graph shows a trend in the largest bit-rate of network-layer DDoS attacks that Cloudflare has observed and mitigated in Q4 2019 and Q1 2020. The largest attack for the quarter was observed during March and peaked just above 550 Gbps. If At First You Don’t Succeed, Try, Try AgainA persistent attacker is one that does not give up when their attacks fail; they try and try again. They launch multiple attacks on their target, often utilizing multiple attack vectors. In the Q4 2019 holiday season, attackers persisted and launched as many as 523 DDoS attacks in one day against a single Cloudflare IP. Each Cloudflare IP under attack was targeted by as many as 4.6 DDoS attacks every day on average. During Q1, as the world entered COVID-19 lockdown, we observed a significant increase in the number of attacks compared to the monthly average. The last time we saw such an increase was in the Q4 2019 holiday season. However, an interesting difference is that attackers seem less persistent now than during the holidays. In Q1 2020, the average persistence rate dropped as low as 2.2 attacks per Cloudflare IP address per day, with a maximum of 311 attacks on a single IP; 40% less than the previous holiday quarter.Throughout the past two quarters, the average number of attack vectors employed in DDoS attacks per IP per day has been mostly steady at approximately 1.4, with a maximum of 10.Over the past quarter, we've seen over 34 different types of attack vectors on L3/4. ACK attacks formed the majority (50.1%) in Q1, followed by SYN attacks with 16.6%, and in third place, Mirai, which still represents a significant portion of the attacks (15.4%). Together, SYN & ACK DDoS attacks (TCP) form 66% of all L3/4 attack vectors in Q1.Top Attack VectorsAll Attack Vectors .tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-kenc{background-color:rgb(201, 218, 248);border-color:inherit;color:rgb(0, 0, 0); font-family:Verdana, Geneva, sans-serif !important;;font-weight:bold;text-align:center;vertical-align:middle} .tg .tg-42jq{background-color:#ffffff;border-color:inherit;color:rgb(0, 0, 0);font-family:Verdana, Geneva, sans-serif !important;; text-align:center;vertical-align:middle} Attack Vector Percent in Q1 ACK 50.121% SYN 16.636% Mirai 15.404% UDP 5.714% LDAP 2.898% SSDP 2.833% DNS 2.677% Other 0.876% QUIC 0.527% NTP 0.373% RST 0.353% Memcached 0.296% ChargeGen 0.236% WS Discovery 0.221% ACK-PSH 0.208% SNMP 0.159% VSE 0.081% MSSQL 0.079% ICMP 0.072% Bittorrent 0.056% OpenVPN 0.046% Dahua 0.032% GRE 0.022% TFTP 0.014% LOIC 0.014% STUN 0.011% Lantronix 0.009% CoAP 0.008% Jenkins 0.006% VXWorks 0.005% Ubiquity 0.005% TeamSpeak 0.004% XMAS 0.003% SPSS 0.001% A Crisis is Unfortunately Sometimes a Malevolent OpportunityThe number of DDoS attacks in March 2020 increased as compared to January and February. Attackers found the crisis period to be an opportune time to launch an increased number of DDoS attacks, as illustrated below.Furthermore, as various government authorities started mandating lockdowns and shelter-in-place orders, attackers resorted to increasing the number of large-sized attacks in the latter half of March. There were 55% more attacks observed in the second half of month (March 16-31) as compared to the first half (March 1-15). Additionally, 94% of attacks peaking at 300-400 Gbps were launched in the month of March.Stop DDoS attacks, Large or Small, Closer To The SourceWith the ever shifting DDoS landscape, it is important to have a DDoS protection solution which is comprehensive and adaptive. In context with the attack insights illustrated above, here’s how Cloudflare stays ahead of these shifts to protect our customers.As attacks shrink in rate and duration, Time To Mitigate SLAs as long as 15 minutes provided by legacy vendors are just not practical anymore. Cloudflare mitigates network layer DDoS attacks under 10 seconds in most cases, which is especially critical for the increasingly shorter attacks. Read more about the recent enhancements to our DDoS detection and mitigation systems that allow us to automatically detect and mitigate DDoS attacks so quickly at scale. An increasing number of DDoS attacks are localized, which implies that legacy DDoS solutions which adopt a scrubbing center approach are not a feasible solution, as they are limited in their global coverage as well as act as a choke point, as DDoS traffic needs to be hauled back and forth from them. Cloudflare’s unique distributed architecture empowers every one of its data centers, spanning across 200 cities globally, to provide full DDoS mitigation capabilities. Large distributed volumetric attacks still exist and are employed by resourceful attackers when the opportunity is rife. An attack exceeding 1 Tbps can be expected in the future, so the ability to mitigate large DDoS attacks is a key aspect of today’s DDoS solution. Cloudflare has one of the most interconnected networks in the world with a capacity of over 35 Tbps which allows it to mitigate even the largest DDoS attacks. This massive network capacity concomitant with the globally distributed architecture allows Cloudflare to mitigate attacks, both small and large, closer to the source. To learn more about Cloudflare’s DDoS solution contact us or get started.

Backblaze B2 and the S3 Compatible API on Cloudflare

In May 2020, Backblaze, a founding Bandwidth Alliance partner announced S3 compatible APIs for their B2 Cloud Storage service. As a refresher, the Bandwidth Alliance is a group of forward-thinking cloud and networking companies that are committed to discounting or waiving data transfer fees for shared customers. Backblaze has been a proud partner since 2018. We are excited to see Backblaze introduce a new level of compatibility in their Cloud Storage service.History of the S3 APIFirst let’s dive into the history of the S3 API and why it’s important for Cloudflare users.Prior to 2006, before the mass migration to the Cloud, if you wanted to store content for your company you needed to build your own expensive and highly available storage platform that was large enough to store all your existing content with enough growth headroom for your business. AWS launched to help eliminate this model by renting their physical computing and storage infrastructure.Amazon Simple Storage Service (S3) led the market by offering a scalable and resilient tool for storing unlimited amounts of data without building it yourself. It could be integrated into any application but there was one catch: you couldn’t use any existing standard such as WebDAV, FTP or SMB: your application needed to interface with Amazon’s bespoke S3 API.Fast forward to 2020 and the storage provider landscape has become highly competitive with many providers capable of providing petabyte (and exabyte) scale content storage at extremely low cost-per-gigabyte. However, Amazon S3 has remained a dominant player despite heavy competition and not being the most cost-effective player.The broad adoption of the S3 API by developers in their codebases and internal systems has transformed the S3 API into what WebDAV promised us to be: de facto standard HTTP File Storage API.Engineering costs of changing storage providersWith many code bases and legacy applications being entrenched in the S3 API, the process to switch to a more cost-effective storage provider is not so easy. Companies need to consider the cost of engineer time programming a new storage API while also physically moving their data.This engineering overhead has led many storage providers to natively support the S3 API, leveling the playing field and allowing companies to focus on picking the most cost-effective provider.First-mile bandwidth costs and the Bandwidth AllianceCloudflare caches content in Points of Presence located in more than 200 cities around the world. This cached content is then handed to your Internet service provider (ISP) over low cost and often free Internet exchange connections in the same facility using mutual fibre optic cables. This cost saving is fairly well understood as the benefit of content delivery networks and has become highly commoditized.What is less well understood is the first-mile cost of moving data from a storage provider to the content delivery network. Typically storage providers expect traffic to route via the Internet and will charge the consumer per-gigabyte of data transmitted. This is not the case for Cloudflare as we also share facilities and mutual fibre optic cables with many storage providers.These shared interconnects created an opportunity to waive the cost of first-mile bandwidth between Cloudflare and many providers and is what prompted us to create the Bandwidth Alliance.Media and entertainment companies serving user-generated content have a continuous supply of new content being moved over the first-mile from the storage provider to the content delivery network. The first-mile bandwidth cost adds up and using a Bandwidth Alliance partner such Backblaze can entirely eliminate it.Using the S3 API in Cloudflare WorkersThe Solutions Engineering team at Cloudflare is tasked with providing strategic technical guidance for our enterprise customers.It’s not uncommon for developers to connect Cloudflare’s global network directly to their storage provider and directly serve content such as live and on-demand video without an intermediate web server.For security purposes engineers typically use Cloudflare Workers to sign each uncached request using the S3 API. Cloudflare Workers allows anyone to deploy code to our global network of over 200+ Points of Presence in seconds and is built on top of Service Workers.We’ve tested Backblaze B2’s S3 Compatible API in Cloudflare Workers using the same code tested for Amazon S3 buckets and it works perfectly by changing the target endpoint.Creating a S3 Compatible Worker scriptHere’s how it is done using Cloudflare Worker’s CLI tool Wrangler:Generate a new project in Wrangler using a template intended for use with Amazon S3:wrangler generate <projectname> This template uses aws4fetch. A fast, lightweight implementation of an S3 Compatible signing library that is commonly used in Service Worker environments like Cloudflare Workers.The template creates an index.js file with a standard request signing implementation:import { AwsClient } from 'aws4fetch' const aws = new AwsClient({ "accessKeyId": AWS_ACCESS_KEY_ID, "secretAccessKey": AWS_SECRET_ACCESS_KEY, "region": AWS_DEFAULT_REGION }); addEventListener('fetch', function(event) { event.respondWith(handleRequest(event.request)) }); async function handleRequest(request) { var url = new URL(request.url); url.hostname = AWS_S3_BUCKET; var signedRequest = await aws.sign(url); return await fetch(signedRequest, { "cf": { "cacheEverything": true } }); } Environment VariablesModify your wrangler.toml file to use your Backblaze B2 API Key ID and Secret:[] vars = { AWS_ACCESS_KEY_ID = "<BACKBLAZE B2 keyId>", AWS_SECRET_ACCESS_KEY = "<BACKBLAZE B2 secret>", AWS_DEFAULT_REGION = "", AWS_S3_BUCKET = "<BACKBLAZE B2 bucketName>.<BACKBLAZE B2 S3 Endpoint>"} AWS_S3_BUCKET environment variable will be the combination of your bucket name, period and S3 Endpoint. For a Backblaze B2 Bucket named example-bucket and S3 Endpoint use environment variable is interpreted from your S3 Endpoint. I use us-west-002.We recommend using Secret Environment variables to store your AWS_SECRET_ACCESS_KEY content when using this script in production.Preview your Cloudflare WorkerNext run wrangler preview --env dev to enter a preview window of your Worker script. My bucket contained a static website containing adaptive streaming video content stored in a Backblaze B2 bucket.Note: We permit caching of third party video content only for enterprise domains. Free/Pro/Biz users wanting to serve video content via Cloudflare may use Stream which delivers an end-to-end video delivery service.Backblaze B2’s compatibility for the S3 API is an exciting update that has made their storage platform highly compatible with existing code bases and legacy systems. And, as a special offer to Cloudflare blog readers, Backblaze will pay the migration costs for transferring your data from S3 to Backblaze B2 (click here for more detail). With the cost of migration covered and compatibility for your existing workflows, it is now easier than ever to switch to a Bandwidth Alliance partner and save on first-mile costs. By doing so, you can slash your cloud bills, gain flexibility, and make no compromises to your performance.To learn more, join us on May 14th for a webinar focused on getting you ultra fast worldwide content delivery.

Making Video Intuitive: An Explainer

On the Stream team at Cloudflare, we work to provide a great viewing experience while keeping our service affordable. That involves a lot of small tweaks to our video pipeline that can be difficult to discern by most people. And that makes the results of those tweaks less intuitive.In this post, let's have some fun. Instead of fine-grained optimization work, we’ll do the opposite. Today we’ll make it easy to see changes between different versions of a video: we’ll start with a high-quality video and ruin it. Instead of aiming for perfection, let’s see the impact of various video coding settings. We’ll go on a deep dive on how to make some victim video look gloriously bad and learn on the way.Everyone agrees that video on the Internet should look good, start playing fast, and never rebuffer regardless of the device they’re on. People can prefer one version of a video over another and say it looks better. Most people, though, would have difficulty elaborating on what ‘better’ means. That’s not an issue when you’re just consuming video. However, when you’re storing, encoding, and distributing it, how that video looks determines how happy your viewers are.To determine what looks better, video engineers can use a variety of techniques. The most accessible is the most obvious: compare two versions of a video by having people look at them—a subjective comparison. We’ll apply eyeballs here.So, who’s our sacrificial video? We’re going to use a classic video for the demonstration here—perhaps too classic for people that work with video—Big Buck Bunny. This is an open-source film by Sacha Goedegebure available under the permissive Creative Commons Attribution 3.0 license. We’re only going to work with 17 seconds of it to save some time. This is what the video looks like when downloaded from Take a moment to savor the quality since we’re only getting worse from here. For brevity, we'll evaluate our results by two properties: smooth motion and looking ‘crisp’. The video shouldn’t stutter and its important features should be distinguishable.It’s worth mentioning that video is a hack of your brain. Every video is just an optimized series of pictures— a very sophisticated flipbook. Display those pictures quickly enough and you can fool the brain into interpreting motion. If you show enough points of light close together, they meld into a continuous image. Then, change the color of those lights frequently enough and you end up with smooth motion.Frame rateNot stuttering is covered by framerate, measured in frames-per-second (fps). fps is the number of individual pictures displayed in a single second; many videos are encoded at somewhere between 24 and 30fps. One way to describe fps is in terms of how long a frame is shown for—commonly called the frame time. At 24fps, each frame is shown for about 41 milliseconds. At 2fps, that jumps to 500ms. Lowering fps causes frames to trend rapidly towards persisting for the full second. Smooth motion mostly comes down to the single knob of fps. Mucking about with framerate isn’t a sporting way to achieve our goal. It’s extremely easy to tank the framerate and ruin the experience. Humans have a low tolerance for janky motion. To get the idea, here’s what our original clip reduced to 2fps looks like; 500ms per-frame is a long time.ffmpeg -v info -y -hide_banner -i source.mp4 -r 2 -c:v h264 -c:a copy 2fps.mp4 ResolutionMaking tiny features distinguishable has many more knobs. Choices you can make include what codec, level, profile, bitrate, resolution, color space, or keyframe frequency, to name a few. Each of these also influences factors apart from perceived quality, such as how large the resulting file is plus what devices it is compatible with. There’s no universal right answer for what parameters to encode a video with. For the best experience while not wasting resources, the same video intended for a modern 4k display should be tailored differently for a 2007 iPod Nano. We’ll spend our time here focusing on what impacts a video’s crispness since that’s what largely determines the experience.We’re going to use FFmpeg to make this happen. This is the sonic screwdriver of the video world; a near-universal command-line tool for converting and manipulating media. FFmpeg is almost two decades old, has hundreds of contributors, and can do essentially any digital video-related task. Its flexibility also makes it rather complex to work with. For each version of the video, we’ll show the command used to generate it as we go.Let’s figure out exactly what we want to change about the video to make it a bad experience.You may have heard about resolution and bitrate. To explain them, let’s use an analogy. Resolution provides pixels. Pixels are buckets for information. Bitrate is the information that fills those buckets. How full a given bucket is determines how well a pixel can represent content. With too few bits of information for a bucket, the pixel will get less and less accurate to the original source. In practice, their numerical relationship is complicated. These are what we’ll be varying.The decision of which bucket should get how many bits of information is determined by software called a video encoder. The job of the encoder is to use the bits budgeted for it as efficiently as possible to display the best quality video. We’ll be changing the bitrate budget to influence the resulting bitrate. Like people with money, budgeting is a good idea for our encoder. Uncompressed video can use a byte, or more, per-pixel for each of the red, green, and blue(RGB) channels. For a 1080p video, that means 1920x1080 pixels multiplied by 3 bytes to get 6.2MB per frame. We’ll talk about frames later but 6.2 MB is a lot— at this rate, a DVD disc would only fit about 50 seconds of video.With our variables chosen, we’re good to go. For every variation we encode, we’ll show a comparison to this table. Our source video is encoded in H.264 at 24fps with a variety of other settings, those features will not change. Expect these numbers to get significantly smaller as we poke around to see what changes. Resolution Bitrate File Size Source 1280x720 7.5Mbps 16MB To start, let’s change just resolution and see what impact that has. The lowest resolution most people are exposed to is usually 140p, so let’s reencode our source video targeting that. Since many video platforms have this as an option, we’re not expecting an unwatchable experience quite yet.ffmpeg -v info -y -hide_banner -i source.mp4 -vf scale=-2:140 -c:v h264 -b:v 6000k -c:a copy scaled-140.mp4 Resolution Bitrate File Size Source 1280x720 7.5Mbps 16MB Scaled to 140p 248x140 2.9Mbps 6.1MB By the numbers, we find some curious results. We didn’t ask for a different bitrate from the source but our encoder gave us one that is roughly a third. Given that the number of pixels was dramatically reduced, the encoder had fewer buckets to put the information in our bitrate. Despite its best attempt at using the entire bitrate budget provided to it, our encoder filled all the buckets we provided. What did it do with the leftover information? Since it isn’t in the video, it tossed it.This would probably be an acceptable experience on a 4in phone screen. You wouldn’t notice the sort-of grainy result on a small display. On a 40in TV, it’d be blocky and unpleasant. At 40in, 140 rows of pixels become individually distinguishable which doesn’t fool the brain and ruins the magic.BitrateBitrate is the density of information for a given period of time, typically a second. This interacts with framerate to give us a per frame bitrate budget. Our source having a bitrate of 7.5Mbps (millions of bits-per-second) and framerate of 24fps means we have an average of 7500Kbps / 24fps = 312.5Kb of information per frame.Different kinds of frames There are different ways a frame can be encoded. It doesn’t make sense to use the same technique for a sequence of frames of a single color and most of the sequences in Big Buck Bunny. There’s differing information density and distribution between those sequences. Different ways of representing frames take advantage of those differing patterns. As a result, the 312Kb average for each frame is both lower than the size of the larger frames and greater than the size of the smallest frames. Some frames contain just changes relative to other frames – these are P or B frames – those could be far smaller than 312Kb. However, some frames contain full images – these are I frames – and tend to be far larger than 312Kb. Since we’re viewing the video holistically as multiple seconds, we don’t need to worry about them since we’re concerned with the overall experience. Knowing about frames is useful for their impact on bitrate for different types of content, which we’ll discuss later.Our starting bitrate is extremely large and has more information than we actually need. Let’s be aggressive and cut it down to 1/75th while maintaining the source’s resolution.ffmpeg -v info -y -hide_banner -i source.mp4 -c:v h264 -b:v 100k -c:a copy bitrate-100k.mp4 Resolution Bitrate File Size Source 1280x720 7.5Mbps 16MB Scaled to 140p 248x140 2.9Mbps 6.1MB Targeted to 100Kbps 1280x720 102Kbps 217KB When you take a look at the video, fur and grass become blobs. There’s just not enough information to accurately represent the fine details.Source Video100 Kbps budgetWe provided a bitrate budget of 100Kbps but the encoder doesn’t seem to have quite hit it. When we changed the resolution, we had a lower bitrate than we asked for, here we have a higher bitrate. Why would that be the case?We have so many buckets that there’s some minimum amount the encoder wants in each. Since it can play with the bitrate, it ends up favoring slightly more full buckets since that’s easier. This is somewhat the reverse of why our previous experiment had a lower bitrate than expected.We can influence how the encoder budgets bitrate using rate control modes. We’re going to stick with the default ‘Average-Bitrate’ mode to keep things easy. This mode is sub-optimal since it lets the encoder spend a bunch of budget up front to its detriment later. However, it's easy to reason about.Resolution + BitrateTargeting a bitrate of 100Kbps got us an unpleasant video but not something completely unwatchable. We haven’t quite ruined our video yet. We might as well take bitrate down to an even further extreme of 20Kbps while keeping the resolution constant.ffmpeg -v info -y -hide_banner -i source.mp4 -c:v h264 -b:v 20k -c:a copy bitrate-20k.mp4 Resolution Bitrate File Size Source 1280x720 7.5Mbps 16MB Scaled to 140p 248x140 2.9Mbps 6.1MB Targeted to 100Kbps 1280x720 102Kbps 217KB Targeted to 20Kbps 1280x720 35Kbps 81KB Now, this is truly unwatchable! There’s sometimes color but the video mostly devolves into grayscale rectangles roughly approximating the silhouettes of what we’re expecting. At slightly less than a third the bitrate of the previous trial, this definitely looks like it has less than a third of the information.As before, we didn’t hit our bitrate target and for the same reason that our pixel buckets were insufficiently filled with information. The encoder needed to start making hard decisions at some point between 102 and 35Kbps. Most of the color and the comprehensibility of the scene were sacrificed.We’ll discuss why there’s moving grayscale rectangles and patches of color in a bit. They’re giving us a hint about how the encoder works under the hood.What if we go just one step further and combine our tiny resolution with the absurdly low bitrate? That should be an even worse experience, right?ffmpeg -v info -y -hide_banner -i source.mp4 -vf scale=-2:140 -c:v h264 -b:v 20k -c:a copy scaled-140_bitrate-20k.mp4 Resolution Bitrate File Size Source 1280x720 7.5Mbps 16MB Scaled to 140p 248x140 2.9Mbps 6.1MB Targeted to 100Kbps 1280x720 102Kbps 217KB Targeted to 20Kbps 1280x720 35Kbps 81KB Scaled to 140p and Targeted to 20Kbps 248x140 19Kbps 48KB Wait a minute, that’s actually not too bad at all. It’s almost like a tinier version of 1280 by 720 at 100Kbps. Why doesn’t this look terrible? Having a lower bitrate means there’s less information, which implies that the video should look worse. A lower resolution means the image should be less detailed. The numbers got smaller, so the video shouldn’t look better!Thinking back to buckets and information, we now have less information but fewer discrete places for that information to live. This specific combination of low bitrate and low resolution means the buckets are nicely filled. The encoder exactly hit our target bitrate which is a reasonable indicator that it was at least somewhat satisfied with the final result.This isn’t going to be a fun experience on a 4k display but it is fine enough for an iPod Nano from 2007. A 3rd generation iPod Nano has a 320x240 display spread across a 2in screen. Our 140p video will be nearly indistinguishable from a much higher quality video. Even more, 48KB for 17 seconds of video makes fantastic use of the limited storage – 4GB on some models. In a resource-constrained environment, this low video quality can be a large quality of experience improvement.CC BY 2.0 - image by nezWe should have a decent intuition for the relationship between bitrate and resolution plus what the tradeoffs are. There’s a lingering question, though, do we need to make tradeoffs? There has to be some ratio of bitrate to pixel-count in order to get the best quality for a given resolution at a minimal file size.In fact, there are such perfect ratios. In ruining the video, we ended up testing a few candidates of this ratio for our source video. Resolution Bitrate File Size Bits/Pixel Source 1280x720 7.5Mbps 16MB 8.10 Scaled to 140p 248x140 2.9Mbps 6.1MB 83.5 Targeted to 100Kbps 1280x720 102Kbps 217KB 0.11 Targeted to 20Kbps 1280x720 35Kbps 81KB 0.03 Scaled to 140p and Targeted to 20Kbps 248x140 19Kbps 48KB 0.55 However, there are some complications.The biggest caveat is that the optimal ratio depends on your source video. Each video has a different amount of information required to be displayed. There are a couple of reasons for that. If a frame has many details then it takes more information to represent. Frames in chronological order that visually differ significantly (think of an action movie) take more information than a set of visually similar frames (like a security camera outside a quiet warehouse). The former can’t use as many B or P frames which occupy less space. Animated content with flat colors require encoders to make fewer trade offs that cause visual degradation than live-action.Thinking back to the settings that resulted in grayscale rectangles and patches of color, we can learn a bit more. We saw that the rectangles and color seem to move, as though the encoder was playing a shell game with tiny boxes of pictures.What is happening is that the encoder is recognizing repeated patterns within and between frames. Then, it can reference those patterns to move them around without needing to actually duplicate them. The P and B frames mentioned earlier are mainly composed of these shifted patterns. This is similar, at least in spirit, to other compression algorithms that use dictionaries to refer to previous content. In most video codecs, the bits of picture that can be shifted are called ‘macroblocks’, which subdivide each frame with NxN squares of pixels. The less stingy the bitrate, the less obvious the macroblock shell game.To see this effect more clearly, we can ask FFmpeg to show us decisions it makes. Specifically, it can show us what it decides is ‘motion’ moving the macroblocks. The video here is 140p for the motion vector arrows to be easier to see.ffmpeg -v info -y -hide_banner -flags2 +export_mvs -i source.mp4 -vf scale=-2:140,codecview=mv=pf+bf+bb -c:v h264 -b:v 6000k -c:a copy motion-vector.mp4 Even worse is that flat color and noise might only be seen in two different scenes in the same video. That forces you to either waste your bitrate budget in one scene or look terrible in the other. We give the encoder a bitrate budget it can use. How it uses it is the result of a feedback loop during encoding.Yet another caveat is that your resulting bitrate is influenced by all those knobs that were listed earlier, the most impactful being codec choice followed by bitrate budget. We explored the relationship between bitrate and resolution but every knob has an impact on the quality and a single knob frequently interacts with other knobs.So far we’ve taken a look at some of the knobs and settings that affect visual quality in a video. Every day, video engineers and encoders make tough decisions to optimize for the human eye, while keeping file sizes at a minimum. Modern encoding schemes use techniques such as per title encoding to narrow down the best resolution-bitrate combinations. Those schemes look somewhat similar to what we’ve done here: test various settings and see what gives the desired result.With every example, we’ve included an FFmpeg command you can use to replicate the output above and experiment with your own videos. We encourage you to try improving the video quality while reducing file sizes on your own and to find other levers that will help you on this journey!

CUBIC and HyStart++ Support in quiche

quiche, Cloudflare's IETF QUIC implementation has been running CUBIC congestion control for a while in our production environment as mentioned in Comparing HTTP/3 vs. HTTP/2 Performance. Recently we also added HyStart++  to the congestion control module for further improvements.In this post, we will talk about QUIC congestion control and loss recovery briefly and CUBIC and HyStart++ in the quiche congestion control module. We will also discuss lab test results and how to visualize those using qlog which was recently added to the quiche library as well.QUIC Congestion Control and Loss RecoveryIn the network transport area, congestion control is how to decide how much data the connection can send into the network. It has an important role in networking so as not to overrun the link but also at the same time it needs to play nice with other connections in the same network to ensure that the overall network, the Internet, doesn’t collapse. Basically congestion control is trying to detect the current capacity of the link and tune itself in real time and it’s one of the core algorithms for running the Internet.QUIC congestion control has been written based on many years of TCP experience, so it is little surprise that the two have mechanisms that bear resemblance. It’s based on the CWND (congestion window, the limit of how many bytes you can send to the network) and the SSTHRESH (slow start threshold, sets a limit when slow start will stop). Congestion control mechanisms can have complicated edge cases and can be hard to tune. Since QUIC is a new transport protocol that people are implementing from scratch, the current draft recommends Reno as a relatively simple mechanism to get people started. However, it has known limitations and so QUIC is designed to have pluggable congestion control; it’s up to implementers to adopt any more advanced ones of their choosing.Since Reno became the standard for TCP congestion control, many congestion control algorithms have been proposed by academia and industry. Largely there are two categories: loss-based congestion control such as Reno and CUBIC, where the congestion control responds to a packet loss event, and delay-based congestion control, such as Vegas and BBR , which the algorithm tries to find a balance between the bandwidth and RTT increase and tune the packet send rate.You can port TCP based congestion control algorithms to QUIC without much change by implementing a few hooks. quiche provides a modular API to add a new congestion control module easily.Loss detection is how to detect packet loss at the sender side. It’s usually separated from the congestion control algorithm but helps the congestion control to quickly respond to the congestion. Packet loss can be a result of the congestion on the link, but the link layer may also drop a packet without congestion due to the characteristics of the physical layer, such as on a WiFi or mobile network.Traditionally TCP uses 3 DUP ACKs for ACK based detection, but delay-based loss detection such as RACK  has also been used over the years. QUIC combines the lesson from TCP into two categories . One is based on the packet threshold (similar to 3 DUP ACK detection) and the other is based on a time threshold (similar to RACK). QUIC also has ACK Ranges similar to TCP SACK to provide a status of the received packets but ACK Ranges can keep a longer list of received packets in the ACK frame than TCP SACK. This simplifies the implementation overall and helps provide quick recovery when there is multiple loss.RenoReno (often referred as NewReno) is a standard congestion control for TCP and QUIC .Reno is easy to understand and doesn't need additional memory to store the state so can be implemented in low spec hardware too. However, its slow start can be very aggressive because it keeps increasing the CWND quickly until it sees congestion. In other words, it doesn’t stop until it sees the packet loss.Note that there are multiple states for Reno; Reno starts from "slow start" mode which increases the CWND very aggressively, roughly 2x for every RTT until the congestion is detected or CWND > SSTHRESH. When packet loss is detected, it enters into the “recovery” mode until packet loss is recovered.When it exits from recovery (no lost ranges) and CWND > SSTHRESH, it enters into the "congestion avoidance" mode where the CWND grows slowly (roughly a full packet per RTT) and tries to converge on a stable CWND. As a result you will see a “sawtooth” pattern when you make a graph of the CWND over time.Here is an example of Reno congestion control CWND graph. See the “Congestion Window” line.CUBICCUBIC was announced in 2008 and became the default congestion control in the Linux kernel. Currently it's defined in RFC8312  and implemented in many OS including Linux, BSD and Windows. quiche's CUBIC implementation follows RFC8312 with a fix made by Google in the Linux kernel .What makes the difference from Reno is during congestion avoidance  its CWND growth is based on a cubic function as follows:(from the CUBIC paper: is the value of CWND when the congestion is detected. Then it will reduce the CWND by 30% and then the CWND starts to grow again using a cubic function as in the graph, approaching Wmax aggressively in the beginning in the first half but slowly converging to Wmax later. This makes sure that CWND growth approaches the previous point carefully and once we pass Wmax, it starts to grow aggressively again after some time to find a new CWND (this is called "Max Probing").Also it has a "TCP-friendly" (actually a Reno-friendly) mode to make sure CWND growth is always bigger than Reno. When the congestion event happens, CUBIC reduces its CWND by 30%, where Reno cuts down CWND by 50%. This makes CUBIC a little more aggressive on packet loss.Note that the original CUBIC only defines how to update the CWND during congestion avoidance. Slow start mode is exactly the same as Reno.HyStart++The authors of CUBIC made a separate effort to improve slow start because CUBIC only changed the way the CWND grows during congestion avoidance. They came up with the idea of HyStart .HyStart is based on two ideas and basically changes how the CWND is updated during slow start:RTT delay samples: when the RTT is increased during slow start and over the threshold, it exits slow start early and enters into congestion avoidance.ACK train: When ACK inter-arrival time gets higher and over the threshold, it exits slow start early and enters into congestion avoidance.However in the real world, ACK train may not be very useful because of ACK compression (merging multiple ACKs into one). Also RTT delay may not work well when the network is unstable.To improve such situations there is a new IETF draft proposed by Microsoft engineers named HyStart++ . HyStart++ is included in the Windows 10 TCP stack with CUBIC.It's a little different from original HyStart:No ACK Train, only RTT sampling.Add a LSS (Limited Slow Start) phase after exiting slow start. LSS grows the CWND faster than congestion avoidance but slower than Reno slow start. Instead of going into congestion avoidance directly, slow start exits to LSS and LSS exits to congestion avoidance when packet loss happens.Simpler implementation.In quiche, HyStart++ is turned on by default for both Reno and CUBIC congestion control and can be configured via API.Lab TestHere is a test result using the test lab . The test condition is as follows:5Mbps bandwidth, 60ms RTT with a different packet loss from 0% to 8%Measure download time of 8MB fileNGINX 1.16.1 server with the HTTP3 patchTCP: CUBIC in Linux kernel 4.14QUIC: Cloudflare quicheDownload 20 times and take a median download timeI run the test with the following combination:TCP CUBIC (TCP-CUBIC)QUIC Reno (QUIC-RENO)QUIC Reno with Hystart++ (QUIC-RENO-HS)QUIC CUBIC (QUIC-CUBIC)QUIC CUBIC with Hystart++ (QUIC-CUBIC-HS)Overall Test ResultHere is a chart of overall test results:In these tests, TCP-CUBIC (blue bars) is the baseline to which we compare the performance of QUIC congestion control variants. We include QUIC-RENO (red and yellow bars) because that is the default QUIC baseline. Reno is simpler so we expect it to perform worse than TCP-CUBIC. QUIC-CUBIC (green and orange bars) should perform the same or better than TCP-CUBIC.You can see with 0% packet loss TCP and QUIC are almost doing the same (but QUIC is slightly slower). As  packet loss increases QUIC CUBIC performs better than TCP CUBIC. QUIC loss recovery looks to work well, which is great news for real-world networks that do encounter loss.With HyStart++, overall performance doesn’t change but that is to be expected, because the main goal of HyStart++ is to prevent overshooting the network. We will see that in the next section.The impact of HyStart++HyStart++ may not improve the download time but it will reduce packet loss while maintaining the same performance without it. Since slow start will exit to congestion avoidance when packet loss is detected, we focus on 0% packet loss where only network congestion creates packet loss.Packet LossFor each test, the number of detected packets lost (not the retransmit count) is shown in the following chart. The lost packets number is the average of 20 runs for each test.As shown above, you can see that HyStart++ reduces a lot of packet loss.Note that compared with Reno, CUBIC can create more packet loss in general. This is because the CUBIC CWND can grow faster than Reno during congestion avoidance and also reduces the CWND less (30%) than Reno (50%) at the congestion event.Visualization using qlog and qvisqvis  is a visualization tool based on qlog . Since quiche has implemented qlog support , we can take qlogs from a QUIC connection and use the qvis tool to visualize connection stats. This is a very useful tool for protocol development. We already used qvis for the Reno graph but let’s see a few more examples to understand how HyStart++ works.CUBIC without HyStart++Here is a qvis congestion chart for a 16MB transfer in the same lab test conditions, with 0% packet loss. You can see a high peak of CWND in the beginning due to slow start. After some time, it starts to show the CUBIC window growth pattern (concave function).When we zoom into the slow start section (the first 0.7 seconds), we can see there is a linear increase of CWND during slow start. This continues until we see a packet lost around 500ms and enters into congestion avoidance after recovery, as you can see in the following chart:CUBIC with HyStart++Let’s see the same graph when HyStart++ is enabled. You can see the slow start peak is smaller than without HyStart++, which will lead to less overshooting and packet loss:When we zoom in the slow start part again, now we can see that the slow start exits to Limited Slow Start (LSS) around 390ms and exit to congestion avoidance at the congestion event around 500ms.As a result you can see the slope is less steep until congestion is detected. It will lead to less packet loss due to less overshooting the network and faster convergence to a stable CWND.Conclusions and Future TasksThe QUIC draft spec already has integrated a lot of experience from TCP congestion control and loss recovery. It recommends the simple Reno mechanism as a means to get people started implementing the protocol but is under no illusion that there are better performing ones out there. So QUIC is designed to be pluggable in order for it to adopt mechanisms that are being deployed in state-of-the-art TCP implementations.CUBIC and HyStart++ are known implementations in the TCP world and give better performance (faster download and less packet loss) than Reno. We've made quiche pluggable and have added CUBIC and HyStart++ support. Our lab testing shows that QUIC is a clear performance winner in lossy network conditions, which is the very thing it is designed for.In the future, we also plan to work on advanced features in quiche, such as packet pacing, advanced recovery and BBR congestion control for better QUIC performance. Using quiche you can switch among multiple congestion control algorithms using the config API at the connection level, so you can play with it and choose the best one depending on your need. qlog endpoint logging can be visualized to provide high accuracy insight into how QUIC is behaving, greatly helping understanding and development.CUBIC and HyStart++ code is available in the quiche master branch today. Please try it!

Cloudflare Bot Management: machine learning and more

IntroductionBuilding Cloudflare Bot Management platform is an exhilarating experience. It blends Distributed Systems, Web Development, Machine Learning, Security and Research (and every discipline in between) while fighting ever-adaptive and motivated adversaries at the same time.This is the ongoing story of Bot Management at Cloudflare and also an introduction to a series of blog posts about the detection mechanisms powering it. I’ll start with several definitions from the Bot Management world, then introduce the product and technical requirements, leading to an overview of the platform we’ve built. Finally, I’ll share details about the detection mechanisms powering our platform.Let’s start with Bot Management’s nomenclature.Some DefinitionsBot - an autonomous program on a network that can interact with computer systems or users, imitating or replacing a human user's behavior, performing repetitive tasks much faster than human users could.Good bots - bots which are useful to businesses they interact with, e.g. search engine bots like Googlebot, Bingbot or bots that operate on social media platforms like Facebook Bot.Bad bots - bots which are designed to perform malicious actions, ultimately hurting businesses, e.g. credential stuffing bots, third-party scraping bots, spam bots and sneakerbots.Bot Management - blocking undesired or malicious Internet bot traffic while still allowing useful bots to access web properties by detecting bot activity, discerning between desirable and undesirable bot behavior, and identifying the sources of the undesirable activity.WAF - a security system that monitors and controls network traffic based on a set of security rules.Gathering requirementsCloudflare has been stopping malicious bots from accessing websites or misusing APIs from the very beginning, at the same time helping the climate by offsetting the carbon costs from the bots. Over time it became clear that we needed a dedicated platform which would unite different bot fighting techniques and streamline the customer experience. In designing this new platform, we tried to fulfill the following key requirements.Complete, not complex - customers can turn on/off Bot Management with a single click of a button, to protect their websites, mobile applications, or APIs.Trustworthy - customers want to know whether they can trust the website visitor is who they say they are and provide a certainty indicator for that trust level.Flexible - customers should be able to define what subset of the traffic Bot Management mitigations should be applied to, e.g. only login URLs, pricing pages or sitewide.Accurate - Bot Management detections should have a very small error, e.g. none or very few human visitors ever should be mistakenly identified as bots.Recoverable - in case a wrong prediction was made, human visitors still should be able to access websites as well as good bots being let through.Moreover, the goal for new Bot Management product was to make it work well on the following use cases:Technical requirementsAdditionally to the product requirements above, we engineers had a list of must-haves for the new Bot Management platform. The most critical were:Scalability - the platform should be able to calculate a score on every request, even at over 10 million requests per second.Low latency - detections must be performed extremely quickly, not slowing down request processing by more than 100 microseconds, and not requiring additional hardware.Configurability - it should be possible to configure what detections are applied on what traffic, including on per domain/data center/server level.Modifiability - the platform should be easily extensible with more detection mechanisms, different mitigation actions, richer analytics and logs.Security - no sensitive information from one customer should be used to build models that protect another customer.Explainability & debuggability - we should be able to explain and tune predictions in an intuitive way.Equipped with these requirements, back in 2018, our small team of engineers got to work to design and build the next generation of Cloudflare Bot Management.Meet the Score“Simplicity is the ultimate sophistication.” - Leonardo Da VinciCloudflare operates on a vast scale. At the time of this writing, this means covering 26M+ Internet properties, processing on average 11M requests per second (with peaks over 14M), and examining more than 250 request attributes from different protocol levels. The key question is how to harness the power of such “gargantuan” data to protect all of our customers from modern day cyberthreats in a simple, reliable and explainable way?Bot management is hard. Some bots are much harder to detect and require looking at multiple dimensions of request attributes over a long time, and sometimes a single request attribute could give them away. More signals may help, but are they generalizable?When we classify traffic, should customers decide what to do with it or are there decisions we can make on behalf of the customer? What concept could possibly address all these uncertainty problems and also help us to deliver on the requirements from above?As you might’ve guessed from the section title, we came up with the concept of Trusted Score or simply The Score - one thing to rule them all - indicating the likelihood between 0 and 100 whether a request originated from a human (high score) vs. an automated program (low score)."One Ring to rule them all" by idreamlikecrazy, used under CC BY / Desaturated from originalOkay, let’s imagine that we are able to assign such a score on every incoming HTTP/HTTPS request, what are we or the customer supposed to do with it? Maybe it’s enough to provide such a score in the logs. Customers could then analyze them on their end, find the most frequent IPs with the lowest scores, and then use the Cloudflare Firewall to block those IPs. Although useful, such a process would be manual, prone to error and most importantly cannot be done in real time to protect the customer's Internet property.Fortunately, around the same time we started worked on this system , our colleagues from the Firewall team had just announced Firewall Rules. This new capability provided customers the ability to control requests in a flexible and intuitive way, inspired by the widely known Wireshark®  language. Firewall rules supported a variety of request fields, and we thought - why not have the score be one of these fields? Customers could then write granular rules to block very specific attack types. That’s how the cf.bot_management.score field was born.Having a score in the heart of Cloudflare Bot Management addressed multiple product and technical requirements with one strike - it’s simple, flexible, configurable, and it provides customers with telemetry about bots on a per request basis. Customers can adjust the score threshold in firewall rules, depending on their sensitivity to false positives/negatives. Additionally, this intuitive score allows us to extend our detection capabilities under the hood without customers needing to adjust any configuration.So how can we produce this score and how hard is it? Let’s explore it in the following section.Architecture overviewWhat is powering the Bot Management score? The short answer is a set of microservices. Building this platform we tried to re-use as many pipelines, databases and components as we could, however many services had to be built from scratch. Let’s have a look at overall architecture (this overly simplified version contains Bot Management related services):Core Bot Management servicesIn a nutshell our systems process data received from the edge data centers, produce and store data required for bot detection mechanisms using the following technologies:Databases & data stores - Kafka, ClickHouse, Postgres, Redis, Ceph.Programming languages - Go, Rust, Python, Java, Bash.Configuration & schema management - Salt, Quicksilver, Cap’n Proto.Containerization - Docker, Kubernetes, Helm, Mesos/Marathon.Each of these services is built with resilience, performance, observability and security in mind.Edge Bot Management moduleAll bot detection mechanisms are applied on every request in real-time during the request processing stage in the Bot Management module running on every machine at Cloudflare’s edge locations. When a request comes in we extract and transform the required request attributes and feed them to our detection mechanisms. The Bot Management module produces the following output:Firewall fields - Bot Management fields - cf.bot_management.score - an integer indicating the likelihood between 0 and 100 whether a request originated from an automated program (low score) to a human (high score). - cf.bot_management.verified_bot - a boolean indicating whether such request comes from a Cloudflare whitelisted bot. - cf.bot_management.static_resource - a boolean indicating whether request matches file extensions for many types of static resources. Cookies - most notably it produces cf_bm, which helps manage incoming traffic that matches criteria associated with bots. JS challenges - for some of our detections and customers we inject into invisible JavaScript challenges, providing us with more signals for bot detection. Detection logs - we log through our data pipelines to ClickHouse details about each applied detection, used features and flags, some of which are used for analytics and customer logs, while others are used to debug and improve our models. Once the Bot Management module has produced the required fields, the Firewall takes over the actual bot mitigation.Firewall integrationThe Cloudflare Firewall's intuitive dashboard enables users to build powerful rules through easy clicks and also provides Terraform integration. Every request to the firewall is inspected against the rule engine. Suspicious requests can be blocked, challenged or logged as per the needs of the user while legitimate requests are routed to the destination, based on the score produced by the Bot Management module and the configured threshold.Firewall rules provide the following bot mitigation actions:Log - records matching requests in the Cloudflare Logs provided to customers.Bypass - allows customers to dynamically disable Cloudflare security features for a request.Allow - matching requests are exempt from challenge and block actions triggered by other Firewall Rules content.Challenge (Captcha) - useful for ensuring that the visitor accessing the site is human, and not automated.JS Challenge - useful for ensuring that bots and spam cannot access the requested resource; browsers, however, are free to satisfy the challenge automatically.Block - matching requests are denied access to the site.Our Firewall Analytics tool, powered by ClickHouse and GraphQL API, enables customers to quickly identify and investigate security threats using an intuitive interface. In addition to analytics, we provide detailed logs on all bots-related activity using either the Logpull API and/or LogPush, which provides the easy way to get your logs to your cloud storage.Cloudflare Workers integrationIn case a customer wants more flexibility on what to do with the requests based on the score, e.g. they might want to inject new, or change existing, HTML page content, or serve incorrect data to the bots, or stall certain requests, Cloudflare Workers provide an option to do that. For example, using this small code-snippet, we can pass the score back to the origin server for more advanced real-time analysis or mitigation:addEventListener('fetch', event => { event.respondWith(handleRequest(event.request)) }) async function handleRequest(request) { request = new Request(request); request.headers.set("Cf-Bot-Score", return fetch(request); } Now let’s have a look into how a single score is produced using multiple detection mechanisms.Detection mechanismsThe Cloudflare Bot Management platform currently uses five complementary detection mechanisms, producing their own scores, which we combine to form the single score going to the Firewall. Most of the detection mechanisms are applied on every request, while some are enabled on a per customer basis to better fit their needs.Having a score on every request for every customer has the following benefits:Ease of onboarding - even before we enable Bot Management in active mode, we’re able to tell how well it’s going to work for the specific customer, including providing historical trends about bot activity.Feedback loop - availability of the score on every request along with all features has tremendous value for continuous improvement of our detection mechanisms.Ensures scaling - if we can compute for score every request and customer, it means that every Internet property behind Cloudflare is a potential Bot Management customer.Global bot insights - Cloudflare is sitting in front of more than 26M+ Internet properties, which allows us to understand and react to the tectonic shifts happening in security and threat intelligence over time.Overall globally, more than third of the Internet traffic visible to Cloudflare is coming from bad bots, while Bot Management customers have the ratio of bad bots even higher at ~43%!Let’s dive into specific detection mechanisms in chronological order of their integration with Cloudflare Bot Management.Machine learningThe majority of decisions about the score are made using our machine learning models. These were also the first detection mechanisms to produce a score and to on-board customers back in 2018. The successful application of machine learning requires data high in Quantity, Diversity, and Quality, and thanks to both free and paid customers, Cloudflare has all three, enabling continuous learning and improvement of our models for all of our customers.At the core of the machine learning detection mechanism is CatBoost  - a high-performance open source library for gradient boosting on decision trees. The choice of CatBoost was driven by the library’s outstanding capabilities:Categorical features support - allowing us to train on even very high cardinality features.Superior accuracy - allowing us to reduce overfitting by using a novel gradient-boosting scheme.Inference speed - in our case it takes less than 50 microseconds to apply any of our models, making sure request processing stays extremely fast.C and Rust API - most of our business logic on the edge is written using Lua, more specifically LuaJIT, so having a compatible FFI interface to be able to apply models is fantastic.There are multiple CatBoost models run on Cloudflare’s Edge in the shadow mode on every request on every machine. One of the models is run in active mode, which influences the final score going to Firewall. All ML detection results and features are logged and recorded in ClickHouse for further analysis, model improvement, analytics and customer facing logs. We feed both categorical and numerical features into our models, extracted from request attributes and inter-request features built using those attributes, calculated and delivered by the Gagarin inter-requests features platform.We’re able to deploy new ML models in a matter of seconds using an extremely reliable and performant Quicksilver configuration database. The same mechanism can be used to configure which version of an ML model should be run in active mode for a specific customer.A deep dive into our machine learning detection mechanism deserves a blog post of its own and it will cover how do we train and validate our models on trillions of requests using GPUs, how model feature delivery and extraction works, and how we explain and debug model predictions both internally and externally.Heuristics engineNot all problems in the world are the best solved with machine learning. We can tweak the ML models in various ways, but in certain cases they will likely underperform basic heuristics. Often the problems machine learning is trying to solve are not entirely new. When building the Bot Management solution it became apparent that sometimes a single attribute of the request could give a bot away. This means that we can create a bunch of simple rules capturing bots in a straightforward way, while also ensuring lowest false positives.The heuristics engine was the second detection mechanism integrated into the Cloudflare Bot Management platform in 2019 and it’s also applied on every request. We have multiple heuristic types and hundreds of specific rules based on certain attributes of the request, some of which are very hard to spoof. When any of the requests matches any of the heuristics - we assign the lowest possible score of 1.The engine has the following properties:Speed - if ML model inference takes less than 50 microseconds per model, hundreds of heuristics can be applied just under 20 microseconds!Deployability - the heuristics engine allows us to add new heuristic in a matter of seconds using Quicksilver, and it will be applied on every request.Vast coverage - using a set of simple heuristics allows us to classify ~15% of global traffic and ~30% of Bot Management customers’ traffic as bots. Not too bad for a few if conditions, right?Lowest false positives - because we’re very sure and conservative on the heuristics we add, this detection mechanism has the lowest FP rate among all detection mechanisms.Labels for ML - because of the high certainty we use requests classified with heuristics to train our ML models, which then can generalize behavior learnt from from heuristics and improve detections accuracy.So heuristics gave us a lift when tweaked with machine learning and they contained a lot of the intuition about the bots, which helped to advance the Cloudflare Bot Management platform and allowed us to onboard more customers.Behavioral analysisMachine learning and heuristics detections provide tremendous value, but both of them require human input on the labels, or basically a teacher to distinguish between right and wrong. While our supervised ML models can generalize well enough even on novel threats similar to what we taught them on, we decided to go further. What if there was an approach which doesn’t require a teacher, but rather can learn to distinguish bad behavior from the normal behavior?Enter the behavioral analysis detection mechanism, initially developed in 2018 and integrated with the Bot Management platform in 2019. This is an unsupervised machine learning approach, which has the following properties:Fitting specific customer needs - it’s automatically enabled for all Bot Management customers, calculating and analyzing normal visitor behavior over an extended period of time.Detects bots never seen before - as it doesn’t use known bot labels, it can detect bots and anomalies from the normal behavior on specific customer’s website.Harder to evade - anomalous behavior is often a direct result of the bot’s specific goal.Please stay tuned for a more detailed blog about behavioral analysis models and the platform powering this incredible detection mechanism, protecting many of our customers from unseen attacks.Verified botsSo far we’ve discussed how to detect bad bots and humans. What about good bots, some of which are extremely useful for the customer website? Is there a need for a dedicated detection mechanism or is there something we could use from previously described detection mechanisms? While the majority of good bot requests (e.g. Googlebot, Bingbot, LinkedInbot) already have low score produced by other detection mechanisms, we also need a way to avoid accidental blocks of useful bots. That’s how the Firewall field cf.bot_management.verified_bot came into existence in 2019, allowing customers to decide for themselves whether they want to let all of the good bots through or restrict access to certain parts of the website.The actual platform calculating Verified Bot flag deserves a detailed blog on its own, but in the nutshell it has the following properties:Validator based approach - we support multiple validation mechanisms, each of them allowing us to reliably confirm good bot identity by clustering a set of IPs.Reverse DNS validator - performs a reverse DNS check to determine whether or not a bots IP address matches its alleged hostname.ASN Block validator - similar to rDNS check, but performed on ASN block.Downloader validator - collects good bot IPs from either text files or HTML pages hosted on bot owner sites.Machine learning validator - uses an unsupervised learning algorithm, clustering good bot IPs which are not possible to validate through other means.Bots Directory - a database with UI that stores and manages bots that pass through the Cloudflare network.Bots directory UI sample‌‌Using multiple validation methods listed above, the Verified Bots detection mechanism identifies hundreds of unique good bot identities, belonging to different companies and categories.JS fingerprintingWhen it comes to Bot Management detection quality it’s all about the signal quality and quantity. All previously described detections use request attributes sent over the network and analyzed on the server side using different techniques. Are there more signals available, which can be extracted from the client to improve our detections?As a matter of fact there are plenty, as every browser has unique implementation quirks. Every web browser graphics output such as canvas depends on multiple layers such as hardware (GPU) and software (drivers, operating system rendering). This highly unique output allows precise differentiation between different browser/device types. Moreover, this is achievable without sacrificing website visitor privacy as it’s not a supercookie, and it cannot be used to track and identify individual users, but only to confirm that request’s user agent matches other telemetry gathered through browser canvas API.This detection mechanism is implemented as a challenge-response system with challenge injected into the webpage on Cloudflare’s edge. The challenge is then rendered in the background using provided graphic instructions and the result sent back to Cloudflare for validation and further action such as  producing the score. There is a lot going on behind the scenes to make sure we get reliable results without sacrificing users’ privacy while being tamper resistant to replay attacks. The system is currently in private beta and being evaluated for its effectiveness and we already see very promising results. Stay tuned for this new detection mechanism becoming widely available and the blog on how we’ve built it.This concludes an overview of the five detection mechanisms we’ve built so far. It’s time to sum it all up!SummaryCloudflare has the unique ability to collect data from trillions of requests flowing through its network every week. With this data, Cloudflare is able to identify likely bot activity with Machine Learning, Heuristics, Behavioral Analysis, and other detection mechanisms. Cloudflare Bot Management integrates seamlessly with other Cloudflare products, such as WAF  and Workers.All this could not be possible without hard work across multiple teams! First of all thanks to everybody on the Bots Team for their tremendous efforts to make this platform come to life. Other Cloudflare teams, most notably: Firewall, Data, Solutions Engineering, Performance, SRE, helped us a lot to design, build and support this incredible platform.Bots team during Austin team summit 2019 hunting bots with axes :)Lastly, there are more blogs from the Bots series coming soon, diving into internals of our detection mechanisms, so stay tuned for more exciting stories about Cloudflare Bot Management!

Cinco de Mayo - What are we celebrating anyway?

Greetings from Latinflare, Cloudflare’s LatinX Employee Resource Group, with members all over the US, the UK, and Portugal. Today is Cinco de Mayo! Americans everywhere will be drinking margaritas and eating chips and salsa. But what is this Mexican holiday really about and what exactly are we celebrating?About Cinco de MayoCinco de Mayo, Spanish for "Fifth of May", is an annual celebration held in Mexico on May 5th. The date is observed to commemorate the Mexican Army's victory over the French Empire at the Battle of Puebla, on May 5, 1862, under the leadership of General Ignacio Zaragoza. The victory of the smaller Mexican force against a larger French force was a boost to morale for the Mexicans. Zaragoza died months after the battle due to illness. A year after the battle, a larger French force defeated the Mexican army at the Second Battle of Puebla, and Mexico City soon fell to the invaders.Source: ( the United States, Cinco de Mayo has taken on a significance beyond that in Mexico. More popularly celebrated in the United States than Mexico, the date has become associated with the celebration of Mexican-American culture. These celebrations began in California, where they have been observed annually since 1863. The day gained nationwide popularity in the 1980s thanks especially to advertising campaigns by beer and wine companies. Today, Cinco de Mayo generates beer sales on par with the Super Bowl. WOW!In Mexico, the commemoration of the battle continues to be mostly ceremonial, such as through military parades or battle reenactments. Cinco de Mayo is sometimes mistaken for Mexico's Independence Day—the most important national holiday in Mexico—which is celebrated on September 16th.Source: credit: Gail Williams via (license information)What Cinco de Mayo means to me? Stories and perspectives from Latinflare members.Before COVID-19, Latinflare members across the US were planning to host “dip contests” and “make-your-own-margarita happy hours” to recognize Cinco de Mayo. In our new “work from home” world, we decided to still celebrate the holiday, but in a new way. I asked members of Latinflare to share what the holiday means to them and their families. Here’s what they shared. Please feel free to share your own personal stories in the comments section if you'd like!What Cinco de Mayo means to me by Alonso - Cloudflare LondonHaving grown up in Mexico, my experience of Cinco de Mayo was quite different from many of my US-based friends and colleagues.Originally, Cinco de Mayo commemorated the Battle of Puebla, which took place on 5 May 1862. In that battle, the Mexican Army defeated the French Army, which later overran Mexican forces and conquered Mexico City. My experience of Cinco de Mayo was mostly as a bank holiday where you get to stay home from school or work. Other holidays like Día de la Independencia (Mexico's equivalent to 4th of July) get more headlines, fireworks, and celebrations. For the longest time, I didn't quite get when US-based friends would text me to wish me a "Happy Cinco."One of the fascinating things about Latinflare, and other Employee Resource Groups at Cloudflare, is that you get to learn from colleagues and their collective experiences. Hearing stories -like the ones shared in this blog- about the significance of Cinco de Mayo to employees across the U.S. is fascinating. The Hispanic community in the US has augmented this day, which now celebrates the rich heritage of immigrant families from across Latin America. So from all our friends at Latinflare, I wish you a very happy Cinco!A perspective from Salvador - Cloudflare AustinAbout 7 years ago when I was still living in Guadalajara, Mexico, Cinco de Mayo was a regular workday (full of meetings) and I remember American co-workers asking me how I was going to celebrate!  I was like: “Why do you ask?”, “That’s not a Mexican holiday!", “We just had a holiday (May Day)”.  I had to Google it so that I could explain to Americans what this holiday was about: Cinco de Mayo celebrates the Mexican victory over France on that day back in 1862. It is also known as "Battle of Puebla”, referring to the state in central Mexico where the battle took place. That’s the only Mexican region where Cinco de Mayo is a major holiday.I am still surprised how this minor holiday is more celebrated in the US than in Mexico, but celebrations are never a bad thing so, keep celebrating this date!! Viva Mexico!! Now that I live in the US, this is a great date to hang out with friends and share Mexican food (tacos, guacamole, nachos, etc.) so they can taste authentic Mexican food.Weighing in from Texas is Ricardo - Cloudflare AustinUnfortunately, in my experience, there are some misconceptions about this day: mainly that Cinco de Mayo is Mexico's Independence day (which it is not). Growing up in Mexico, Cinco de Mayo meant that I didn't have to go to school and got to stay home. In the US, however, it is a day to celebrate Hispanic heritage!Mostly a holiday in Puebla says Alex - Cloudflare AustinI don't really believe that Mexican families outside of Puebla are very aware of Cinco de Mayo. Even though I didn’t grow up in Puebla, I learned a bit more about the holiday due to the fact that my middle school in Ojocaliente, Zacatecas was named "Gral. Ignacio Zaragoza" after the general that defeated the French army in that battle in Puebla in 1862. This only made me try to be extra friendly to any French person that I've met. So even though we are not celebrating Mexican Independence Day,  I don’t have the heart to ruin the party for everyone.Resources for Celebrating Cinco de Mayo during QuarantineWhatever your thoughts or experiences on the holiday, if you choose to celebrate it, we found some cool resources for celebrating the holiday at home. Here are just a few: Forbes “How to Celebrate Cinco de Mayo in Quarantine” Travel & Leisure “How to Celebrate Cinco de Mayo at Home” Do Awesome Stuff in Austin “How to do Cinco de Mayo at Home in Austin” Wherever you are, we are wishing you a happy and healthy Cinco de Mayo!Photo Credit: S Pakhrin via Wikipedia Commons (license information)About LatinflareTo learn more about Latinflare and how we got started, read our first blog post “Bienvenidos a Latinflare”.We are Hiring!Does Cloudflare sound like the type of place you’d like to work? We are hiring! Check out our careers page for more information on full time positions and internship roles at our locations across the globe.

Setting up Cloudflare for Teams as a Start-Up Business

Earlier this year, Cloudflare acquired S2 Systems. We were a start-up in Kirkland, Washington and now we are home to Cloudflare’s Seattle-area office.Our team developed a new approach to remote browser isolation (RBI), a technology that runs your web browser in a cloud data center, stopping threats on the Internet from executing any code on your machine. The closer we can bring that data center to the user, the faster we can make that experience. Since the acquisition, we have been focused on running our RBI platform in every one of Cloudflare’s data centers in 200 cities around the world.The RBI solution will join a product suite that we call Cloudflare for Teams, which consists of two products: Access and Gateway.Those two products solve a number of problems that companies have with securing users, devices, and data. As a start-up, we struggled with a few of these challenges in really painful ways:How do we let prospects securely trial our RBI platform?How do we keep our small office secure without an IT staff?How can we connect to the powerful, but physically clunky and heavy development machines, when we are not in that office?Dogfooding our own products has long been part of Cloudflare’s identity, and our team has had a chance to do the same from a new perspective.Managing access to our RBI service for early adopter customers and partnersAs we built the first version of our product, we worked closely with early adopters to test the product and gather feedback. However, we were not ready to share the product with the entire world yet, so we needed a way to lock down who could reach the prototype and beta versions.It took us the best part of six months to build, test and modify (multiple times) the system for managing access to the product.We chose a complicated solution that took almost as much time to build as did features within the product. We deployed a load balancer that also served as a reverse proxy in front of the RBI host and acted as a bouncer for unauthenticated requests. That sat behind an ASP.NET core server. Furthest to the right sat the most difficult component: identity.We had to manually add identity providers every time a new customer wanted to test out the service. Our CTO frequently burned hours each day adding customers manually, configuring groups, and trying to balance policies that kept different tenants secure.From six months to 30 minutesAs we learned more about Cloudflare during the due diligence period, we started to hear more about Cloudflare Access. Like the RBI solution, Access applied Cloudflare’s network to a new type of problem: how do teams keep their users and resources secure without also slowing them down?When members of the Cloudflare team visited our office in Kirkland, none of them needed a VPN to connect. Their self-managed applications just worked, like any other SaaS app.We then had a chance to try Access ourselves. After the deal closed, we collaborated with the Cloudflare team on an announcement. This started just hours after the acquisition completed, so we did not have a chance to onboard to Cloudflare’s corporate SSO yet. Instead, the team secured new marketing pages and forms behind Cloudflare Access which prompted us to login with our S2 emails. Again, it just worked.We immediately began rethinking every hour we had spent building our own authentication platform. The next day, we set up a Cloudflare Access account. We secured our trial platform by building a couple of rules in the Access UI to decide who should be able to reach it.We sent a note out to the team to try it out. They logged in with our SSO credentials and Cloudflare connected them to the application. No client needed on their side, no multi-level authentication platform on ours.We shut down all of our demo authentication servers. Now, when we have customers who want to trial the RBI technology, we can add their account to the rules in a couple of minutes. They visit a single hostname, login, and can start connecting to a faster, safer browser.Protecting our people and devices from Internet threatsWhen we signed a sublease for our first office location, we found the business card of the building’s Comcast representative taped to the door. We called them and after a week the Comcast Business technicians had a simple network running for us.We wanted to implement a real network security model for our small office. We tried deploying multiple firewalls, with access controls, and added some tools to secure outbound traffic.We spent way too much time on it. Every configuration change involved the staff trying to troubleshoot problems. The system wound up blocking things that should not be blocked, and missing things that should be blocked. It reached the point where we just turned off most of it.Another product in the Cloudflare for Teams platform, Cloudflare Gateway, solved this challenge for us. Rather than 30 minutes, this upgrade took about 10.Cloudflare Gateway secures users from threats on the Internet by stopping traffic from devices or office networks from reaching malicious destinations. The first feature in the product, DNS-based security, adds threat-blocking into the world’s fastest DNS resolver, Cloudflare’s product.We created a policy to block security threats, changed our router’s DNS settings, and never had to worry about it again. As needed, we could log back into the UI and review reports that told us about the malicious traffic that Gateway caught.As I’m writing this post, none of us are working in that office. We’re staying home, but we still can use Gateway’s security model. Gateway now integrates with the app for mobile devices; in a couple of clicks, we can protect iOS and Android phones and tablets with the same level of security. Soon, we’ll be releasing desktop versions to make that easy on every device.Connecting to dev machines while working from homeBack at the office, we still have a small fleet of high-powered Linux machines. These desktops run 16 cores, 32 threads, and 32GB of DDR memory. We use these to build and test Chromium, but dragging these boxes to each developer’s house would have been a huge hassle.We still had a physical VPN appliance that we had purchased during our start-up days. We had hired vendors to install it onsite and configure some elaborate syncing with our identity providers. The only thing more difficult than setting it up was using it. With everyone suddenly working from home, I don’t think we would have been able to make it work.So we returned to Cloudflare Access instead. Working with guidance from Cloudflare’s IT and Security teams, we added a new hostname in the Cloudflare account for the Seattle area office. We then installed the Cloudflare daemon, cloudflared, on the machines in the offices. Those daemons created outbound-only tunnels from the machines to the Cloudflare network, available at a dedicated subdomain for each developer.On the other side of that connection, each engineer on our team installed cloudflared on their machines at home. They need to make one change to their SSH config file, adding two lines that include a ProxyCommand. The setup requires no other modifications, no special SSH clients or commands. Even the developers who rely on tools like Visual Studio Code’s Remote SSH extension could keep their workflow exactly the same.The only difference is that, instead of a VPN, when developers start a new SSH session, Access prompts them to login with Cloudflare’s SSO. They do so and are connected to their machine through Cloudflare’s network and smart routing technology.What’s next?As a start-up, every hour we spent trying to cobble together tools was an hour we lost building our product but we needed to provide secure access to our product so we made the time investment. The only other option would have been to purchase products that were way outside of the price range for a small start-up where the only office perk was bulk Costco trail mix.Cloudflare for Teams immediately solved the challenges we had, in a fairly comprehensive way. We now can seamlessly grant prospects permissions to try the product, our office network is safer, and our developers can stay productive at home.It could be easy to think “I wish we had done this sooner,” and to some extent, I do. However, seeing the before-and-after of our systems has made us more excited about what we’re doing as we bring the remote browser technology into Cloudflare’s network.The RBI platform is going to benefit from the same advantages of that network that make features in Access and Gateway feel like magic. We’re going to apply everything that Cloudflare has learned securing and improving connections and use it to solve a new customer problem.Interested in skipping the hard parts about our story and getting started with Cloudflare for Teams? You can use all of the features covered in this blog post today, at no cost through September.

A single dashboard for Cloudflare for Teams

Starting today, Cloudflare Access can now be used in the Cloudflare for Teams dashboard. You can manage security policies for your people and devices in the same place that you build zero-trust rules to protect your applications and resources. Everything is now in one place in a single dashboard.We are excited to launch a new UI that can be used across the entire Teams platform, but we didn’t build this dashboard just for the sake of a new look-and-feel. While migrating the Access dashboard, we focused on solving one of the largest sources of user confusion in the product.This post breaks down why the original  UI caused some headaches, how we think about objects in Cloudflare for Teams, and how we set out to fix the way we display that to our users.Cloudflare AccessCloudflare Access is one-half of Cloudflare for Teams, a security platform that runs on Cloudflare’s network. Teams protects users, devices and data  without compromising experience or performance. We built Cloudflare Access to solve our own headaches with private networks as we grew from a team concentrated in a single office to a globally distributed organization.Cloudflare Access replaces corporate VPNs with Cloudflare’s network in a zero-trust model. Instead of placing internal tools on a private network, teams deploy them in any environment, including hybrid or multi-cloud models, and secure them consistently with Cloudflare’s network.When users connect to those tools, they are prompted to login with their team’s identity provider. Cloudflare Access checks their login against the list of allowed users and, if permitted, allows the request to proceed.Deploying Access does not require exposing new holes in corporate firewalls. Teams connect their resources through a secure outbound connection, Argo Tunnel, which runs in your infrastructure to connect the applications and machines to Cloudflare. That tunnel makes outbound-only calls to the Cloudflare network and organizations can replace complex firewall rules with just one: disable all inbound connections.Sites vs. AccountsWhen you use Cloudflare, you use the platform at two levels: account and site. You have one Cloudflare account, though you can be a member of multiple accounts. That one account captures details like your billing profile and notification settings.Your account contains sites, the hostnames or zones that you add to Cloudflare. You configure features that apply to a site, like web application firewall (WAF) and caching rules.When we launched Access nearly two years ago, you could use the product to add an identity check to a site you added to Cloudflare, either at the hostname, subdomain, or path. To do that, users select the site in their Cloudflare dashboard, toggle to the Access tab, and build a rule specific to that site.To add rules to a different site, a user steps back up a level. They need to select the new site from the dropdown and load the Access tab for that site. However, two components in the UI remained the same and shared configuration:SSO integrationLogsThe SSO integration is where Access pulls information about identity. Users integrate their Okta, AzureAD, GSuite accounts, or other identity providers, in this card. We made a decision that the integration should apply across your entire account; you should not need to reconfigure your SSO connection on every site where you want to add an Access rule.However, we displayed that information in the site-specific page. Cloudflare has account-level concepts, like billing or account users, but we wanted to keep everything related to Access in a single page so we made this compromise. Logs followed a similar pattern.This decision caused confusion. For example, we add a log table to the bottom of the tab when users view “site{.}com”. However, that table actually presented logs from both “site{.}com” and any other hostname in the account.As more features were added, this exception grew out of control. At this point, the majority of features you see when you open the Access tab for one of your sites are account-level features stuffed into the site view. The page below is the Access tab for a site in my account, widgetcorp{.}tech. Highlighted in green are the boxes that apply to the site I have selected. Highlighted in red are the boxes that apply to my Access account.This user experience is unnecessarily complex . Even worse, though, is that confusion in security products can lead to real incidents. Any time that a user asks “am I building something for my account or this site?” We needed to fix both.Starting with a new designA few months ago, Cloudflare launched Cloudflare for Teams, which consists of two complementary products: Access and a new solution, Cloudflare Gateway. If Access is a bouncer standing in front of the door, checking identity, Gateway is a bodyguard, keeping your team safe as you navigate the Internet.Gateway has no concept of sites, at least not sites that you host yourself. Rather than securing your Internet properties, like Cloudflare’s infrastructure products that rely on the reverse proxy, Gateway secures your team from the Internet, and the threats on it. For the first time, you could use a Cloudflare product without a site on Cloudflare.Gateway introduced other new concepts which have no relation to a domain name in the traditional Cloudflare sense. You can add your office network and your home WiFi to your Gateway account. You can build rules to block any sites on the Internet. You can now use Gateway on mobile devices and soon desktops as well.To capture that model, we started on a new UI from scratch, and earlier this year we launched a new dashboard for Gateway, settings now have a home of their ownThe products in Cloudflare for Teams should live in one place; you shouldn’t need to hop back and forth between different dashboards to manage them. Bringing Access into the Teams dashboard puts everything under one roof.That also gave us an opportunity to solve the confusion in the current Access UI. Since the Teams dashboard is not constrained by the site-specific model, we could break out the dashboard into components that made sense for how people use the Access product.The new dashboard untangles the tools in Access that apply to your entire account (the methods that you use to secure your resources) from the features that apply to a single site (the rules you build to protect a resource).One dashboard for your teamMerging Access into the Cloudflare for Teams dashboard, and solving the problems of the original UI, is just the beginning. We’ll be using that foundation to release new features in both Access and Gateway, including more that apply across both products.You will also be able to continue to extend some of the configuration made in Access to Gateway. For example, an integration with a provider like Okta to build zero-trust policies in Access can eventually be reused for adding group-based policies into Gateway. You’ll see the beginning of that in the new UI, as well, with categories like “My Teams” and “Logs” that apply or will apply to both products. As we continue, we’re going to try to avoid making the same mistake of conflating account, site, and now product objects.What’s next?The new Access UI is available to all customers today in the Cloudflare for Teams dashboard. You can get started by visiting this link and signing in with your Cloudflare account.To use the Access UI, you will first need to enable Cloudflare Access and add a site to Cloudflare in the existing dashboard. Instructions are available here. You can also watch a guided tour of the new site.No new features have been added, though we’re busy working on them. This release focused entirely on improving how users approach the product based on the feedback we have received over 22 months. We’re still listening to new feedback. Run into an issue or notice an area of improvement? Please tell us.

When people pause the Internet goes quiet

Recent news about the Internet has mostly been about the great increase in usage as those workers who can have been told to work from home. I've written about this twice recently, first in early March and then last week look at how Internet use has risen to a new normal.As human behaviour has changed in response to the pandemic, it's left a mark on the charts that network operators look at day in, day out to ensure that their networks are running correctly.Most Internet traffic has a fairly simple rhythm to it. Here, for example, is daily traffic seen on the Amsterdam Internet Exchange. It's a pattern that's familiar to most network operators. People sleep at night, and there's a peak of usage in the early evening when people get home and perhaps stream a movie, or listen to music or use the web for things they couldn't do during the workday.But sometimes that rhythm get broken. Recently we've seen the evening peak by joined by morning peaks as well. Here's a graph from the Milan Internet Exchange. There are three peaks: morning, afternoon and evening.  These peaks seem to be caused by people working from home and children being schooled and playing at home.But there are ways human behaviour shows up on graphs like these.  When humans pause the Internet goes quiet. Here are two examples that I've seen recently.The UK and #ClapForNHSHere's a chart of Internet traffic last week in the UK. The triple peak is clearly visible (see circle A). But circle B shows a significant drop in traffic on Thursday, April 23. That's when people in the UK clapped for NHS workers to show their appreciation for those on the front line dealing with people sick with COVID-19.RamadanRamadan started last Friday, April 24 and it shows up in Internet traffic in countries with large Muslim populations. Here, for example, is a graph of traffic in Tunisia over the weekend. A similar pattern is seen across the Muslim world.Two important parts of the day during Ramadan show up on the chart. These are the iftar and sahoor. Circle A shows the iftar, the evening meal at which Muslims break the fast. Circle B shows the sahoor, the early morning meal before the day's fasting.Looking at the previous weekend (in green) you can see that the Ramadan-related changes are not present and that Internet use is generally higher (by 10% to 15%).ConclusionWe built the Internet for ourselves and despite all the machine to machine traffic that takes place (think IoT devices chatting to their APIs, or computers updating software in the night), human directed traffic dominates.I'd love to hear from readers about other ways human activity might show up in these Internet trends.

DDoS attacks have evolved, and so should your DDoS protection

The proliferation of DDoS attacks of varying size, duration, and persistence has made DDoS protection a foundational part of every business and organization’s online presence. However, there are key considerations including network capacity, management capabilities, global distribution, alerting, reporting and support that security and risk management technical professionals need to evaluate when selecting a DDoS protection solution.Gartner’s view of the DDoS solutions; How did Cloudflare fare?Gartner recently published the report Solution Comparison for DDoS Cloud Scrubbing Centers (ID G00467346), authored by Thomas Lintemuth, Patrick Hevesi and Sushil Aryal. This report enables customers to view a side-by-side solution comparison of different DDoS cloud scrubbing centers measured against common assessment criteria.  If you have a Gartner subscription, you can view the report here. Cloudflare has received the greatest number of ‘High’ ratings as compared to the 6 other DDoS vendors across 23 assessment criteria in the report.The vast landscape of DDoS attacksFrom our perspective, the nature of DDoS attacks has transformed, as the economics and ease of launching a DDoS attack has changed dramatically. With a rise in cost-effective capabilities of launching a DDoS attack, we have observed a rise in the number of under 10 Gbps DDoS network-level attacks, as shown in the figure below. Even though 10 Gbps from an attack size perspective does not seem that large, it is large enough to significantly affect a majority of the websites existing today.At the same time, larger-sized DDoS attacks are still prevalent and have the capability of crippling the availability of an organization’s infrastructure. In March 2020, Cloudflare observed numerous 300+ Gbps attacks with the largest attack being 550 Gbps in size. In the report Gartner also observes a similar trend, “In speaking with the vendors for this research, Gartner discovered a consistent theme: Clients are experiencing more frequent smaller attacks versus larger volumetric attacks.” In addition, they also observe that “For enterprises with Internet connections up to and exceeding 10 Gbps, frequent but short attacks up to 10 Gbps are still quite disruptive without DDoS protection. Not to say that large attacks have gone away. We haven’t seen a 1-plus Tbps attack since spring 2018, but attacks over 500 Gbps are still common.”Gartner recommends in the report to “Choose a provider that offers scrubbing capacity of three times the largest documented volumetric attack on your continent.”From an application-level DDoS attack perspective an interesting DDoS attack observed and mitigated by Cloudflare last year, is shown below. This HTTP DDoS attack had a peak of 1.4M requests per second, which isn’t highly rate-intensive. However, the fact that the 1.1M IPs from which the attack originated were unique and not spoofed made the attack quite interesting. The unique IP addresses were actual clients who were able to complete a TCP and HTTPS handshake. Harness the full power of Cloudflare’s DDoS protectionCloudflare’s cloud-delivered DDoS solution provides key features that enable security professionals to protect their organizations and customers against even the most sophisticated DDoS attacks. Some of the key features and benefits include:Massive network capacity: With over 35 Tbps of network capacity, Cloudflare ensures that you are protected against even the most sophisticated and largest DDoS attacks. Cloudflare’s network capacity is almost equal to the total scrubbing capacity of the other 6 leading DDoS vendors combined.Globally distributed architecture: Having a few scrubbing centers globally to mitigate DDoS attacks is an outdated approach. As DDoS attacks scale and individual attacks originate from millions of unique IPs worldwide, it’s important to have a DDoS solution that mitigates the attack at the source rather than hauling traffic to a dedicated scrubbing center. With every one of our data centers across 200 cities enabled with full DDoS mitigation capabilities, Cloudflare has more points of presence than the 6 leading DDoS vendors combined.Fast time to mitigation: Automated edge-analyzed and edge-enforced DDoS mitigation capabilities allows us to mitigate attacks at unprecedented speeds. Typical time to mitigate a DDoS attack is less than 10s.Integrated security: A key design tenet while building products at Cloudflare is integration. Our DDoS solution integrates seamlessly with other product offerings including WAF, Bot Management, CDN and many more. A comprehensive and integrated security solution to bolster the security posture while aiding performance. No tradeoffs between security and performance!Unmetered and unlimited mitigation: Cloudflare offers unlimited and unmetered DDoS mitigation. This eliminates the legacy concept of ‘Surge Pricing,’ which is especially painful when a business is under duress and experiencing a DDoS attack. This enables you to avoid unpredictable costs from traffic.Whether you’re part of a large global enterprise, or use Cloudflare for your personal site, we want to make sure that you’re protected and also have the visibility that you need. DDoS Protection is included as part of every Cloudflare service. Enterprise-level plans include advanced mitigation, detailed reporting, enriched logs, productivity enhancements and fine-grained controls. Enterprise Plan customers also receive access to dedicated customer success and solution engineering.To learn more about Cloudflare’s DDoS solution contact us or get started. *Gartner “Solution Comparison for DDoS Cloud Scrubbing Centers,” Thomas Lintemuth,  Patrick Hevesi, Sushil Aryal, 16 April 2020

Empowering our Customers and Service Partners

Last year, Cloudflare announced the planned expansion of our partner program to help managed and professional service partners efficiently engage with Cloudflare and join us in our mission to help build a better Internet. Today, we want to highlight some of those amazing partners and our growing support and training for MSPs around the globe. We want to make sure service partners have the enablement and resources they need to bring a more secure and performant Internet experience to their customers. This partner program tier is specifically designed for professional service firms and Managed Service Providers (MSPs and MSSPs) that want to build value-added services and support Cloudflare customers. While Cloudflare is hyper-focused on building highly scalable and easy to use products, we recognize that some customers may want to engage with a professional services firm to assist them in maximizing the value of our offerings. From building Cloudflare Workers, implementing multi-cloud load balancing, or managing WAF and DDoS events, our partner training and support enables sales and technical teams to position and support the Cloudflare platform as well as enhance their services businesses. TrainingOur training and certification is meant to help partners through each stage of Cloudflare adoption, from discovery and sale to implementation, operation and continuous optimization. The program includes hands-on education, partner support and success resources, and access to account managers and partner enablement engineers.  Accredited Sales Professional - Learn about key product features and how to identify opportunities and find the best solution for customers.Accredited Sales Engineer - Learn about Cloudflare’s technical differentiation that drives a smarter, faster and safer Internet.Accredited Configuration Engineer - Learn about implementation, best practices, and supporting Cloudflare.Accredited Services Architect - Launching in May, our Architect accreditation dives deeper into cybersecurity management, performance optimization, and migration services for Cloudflare.Accredited Workers Developer (In Development) - Learn how to develop and deploy serverless applications with Cloudflare Workers.Cloudflare Partner AccreditationService OpportunitiesOver the past year, the partners we’ve engaged with have found success throughout Cloudflare’s lifecycle by helping customers understand how to transform their network in their move to hybrid and multi-cloud solutions, develop serverless applications, or manage the Cloudflare platform.Network Digital Transformations“Cloudflare is streamlining our migration from on-prem to the cloud. As we tap into various public cloud services, Cloudflare serves as our independent, unified point of control — giving us the strategic flexibility to choose the right cloud solution for the job, and the ability to easily make changes down the line.” — Dr. Isabel Wolters, Chief Technology Officer, Handelsblatt Media GroupServerless Architecture Development"At Queue-it we pride ourselves on being the leading developer of virtual waiting room technology, providing a first-in, first-out online waiting system. By partnering with Cloudflare, we've made it easier for our joint customers to bring our solution to their applications through Cloudflare Apps and our Cloudflare Workers Connector that leverages the power of edge computing."  - Henrik Bjergegaard, VP Sales, Queue-ItManaged Security & Insights“Opticca Security supports our clients with proven and reliable solutions to ensure business continuity and protection of your online assets. Opticca Security has grown our partnership with Cloudflare over the years to support the quick deployment, seamless integration, and trusted expertise of Cloudflare Security solutions, Cloudflare Workers, and more." -- Joey Campione, President, Opticca SecurityPartner Showcase - Zilker TechnologyWe wanted to highlight the success of one of our managed service partners who, together with Cloudflare, is delivering a more secure, more high performing and more reliable Internet experience for customers.Zilker Technology engaged Cloudflare when one of their eCommerce clients, the retail store of a major NFL team, was facing carding attacks and other malicious activity on their sites. "Our client activated their Cloudflare subscription on a Thursday, and we were live with Cloudflare in production the following Tuesday, ahead of Black Friday later that same week," says Drew Harris, Director of Managed Services for Zilker. "It was crazy fast and easy!"Carding - also known as credit card stuffing, fraud or verification, happens when cyber criminals attempt to make small purchases with large volumes of stolen credit card numbers on one eCommerce platform.In addition to gaining the enhanced security and protection from Cloudflare WAF, advanced DDOS protection, and rate-limiting, Zilker replaced the client's legacy CDN with Cloudflare CDN, improving site performance and user experience. Zilker provides full-stack managed services and 24/7 support for the client, including Cloudflare monitoring and management.  “Partnering with Cloudflare gives us peace of mind that we can deliver on customer expectations of security and performance all the time, every day. Even as new threats emerge, Cloudflare is one step ahead of the game,” says Matthew Fox, VP of Business Development. Just getting startedCloudflare is committed to making our service partners successful to ensure our customers have the best technology and expertise available to them as they accelerate and protect their critical applications, infrastructure, and teams. As Cloudflare grows our product set, we’ve seen increased demand for the services provided by our partners. Cloudflare is excited and grateful to work with amazing agencies, professional services firms and managed security providers across the globe. The diverse Cloudflare Partner Network is essential to our mission of helping to build a better Internet, and we are dedicated to the success of our partners. We’ll continue our commitment to our customers and partners that Cloudflare will be the easiest and most rewarding solution to implement with partners.More Information:Become a Partner: Partner Program WebsiteReach out to

Doubling the intern class - and making it all virtual

Earlier this month, we announced our plans to relaunch our intern hiring and double our intern class this summer to support more students who may have lost their internships due to COVID-19. You can find that story here. We’ve had interns joining us over the last few summers - students were able to find their way to us by applying to full-time roles and sometimes through Twitter. But, it wasn’t until last summer, in 2019, when we officially had our first official Summer Internship Program. And this year, we are doubling down.Why do we invest in interns?We have found interns to be invaluable. Not only do they bring an electrifying new energy over the summer, but they also come with their curiosity to help solve problems, contribute to major projects, and bring refreshing perspectives to the company.Ship projects: Our interns are matched with a team and work on real and meaningful projects. They are expected to ramp up, contribute like other members of the team and ship by the end of their internship.Hire strong talent: The internship is the “ultimate interview” that allows us to better assess new grad talent. The 12 weeks they spend with us tell us how they work with the team, their curiosity, passion and interest in the company and mission, and overall ability to execute and ship. Increase brand awareness: Some of the best interns and new grads we’ve hired come from referrals from past interns. Students go back to school and will share their summer experience with their peers and classmates, and it can catch like wildfire. This will make long term hiring much easier.Help grow future talent: Companies of all sizes should hire interns to help grow a more diverse talent pool, otherwise the future talent would be shaped by companies like Google, Facebook, Microsoft and the like. The experience gained from working at a small or mid-sized startup versus a behemoth company is very different.Our founding principles. What makes a great internship?How do we make sure we’re prepared for interns? And what should companies and teams consider to ensure a great internship experience? It’s important for companies to be prepared to onboard interns so interns have a great and fruitful experience. These are general items to consider:Committed manager and/or mentor: Interns need a lot of support especially in the beginning, and it’s essential to have a manager or mentor who is willing to commit 30+% of their time to train, teach, and guide the intern for the entire duration of the summer. I would even advise managers/mentors to plan their summer vacations accordingly and if they’re not there for a week or more, they should have a backup support plan.Defined projects and goals: We ask managers to work with their interns to clearly  identify projects and goals they would be interested in working on either before the internship starts, or within the first 2 weeks. By the end of the internship, we want each intern to have learned a lot, be proud of the work they’ve accomplished and present their work to executives and the whole company.Open environment and networking: Throughout the internship, we intentionally create opportunities to meet more people and allow a safe environment for them to ask questions and be curious. Interns connect with each other, employees across other teams, and executives through our Buddy Program, Executive Round Tables, and other social events and outings. Visibility and exposure: Near the end of the internship, all interns are encouraged and given the opportunity to present their work to the whole company and share their project or experience on the company blog. Because they are an integral part of the team, many times they’ll join meetings with our leaders and executives.The pivot to virtual: what we changedThe above are general goals and best practices for an internship during normal times. These are far from normal times. Like many companies, we were faced with the daunting question of what to do with our internship program when it was apparent that all or most of it would be virtual. We leaned into that challenge and developed a plan to build a virtual internship program that still embodies the principles we mentioned and ensures a robust internship experience.The general mantra will be to over-communicate and make sure interns are included in all the team’s activities, communications, meetings, etc. Not only will it be important to include interns in this, it's even more important because these members of our team will crave it the most. They'll lack the historical context existing employees share, and also won't have the breadth of general work experience that their team has. This is where mentors and managers will have to find ways to go above and beyond. Here are some tips below.OnboardingInterns will need to onboard in a completely remote environment, which may be new to both the manager and the company. If possible, check in with the interns before their first day to start building that relationship - understand what their remote work environment is like, how’s their mental health during COVID-19, are they excited and prepared to start? Also, keep in mind that the first two weeks are critical to set expectations for goals and deliverables, to connect them with the right folks involved in their project, and allow them to ask all the questions and get comfortable with the team.Logistically, this may involve a laptop being mailed to them, or other accommodations for remote work. Verify that the intern has been onboarded correctly with access to necessary tools. Make a checklist. Some ideas to start with:Can they send/receive email on your company’s email address?Do you have their phone number if all else fails? And vice-versa?Do they have access to your team's wiki space? Jira? Chat rooms?Can they join a Google Meet/Zoom meeting with you and the team? Including working camera and microphone?Can they access Google Calendar and have they been invited to team meetings? Do they know the etiquette for meetings (to accept and decline) and how to set up meetings with others?Have they completed the expected onboarding training provided by the company?Do they have access to the role-specific tools they'll need to do their job? Source control, CI, Salesforce, Zendesk, etc. (make a checklist of these for your team!)Cadence of WorkIt's critical to establish a normal work cadence, and that can be particularly challenging if someone starts off fully remote. For some interns, this may be their first time working in a professional environment and may need more guidance. Some suggestions for getting that established:Hold an explicit kickoff meeting between the intern and mentor in which they review the project/goals, and discuss how the team will work and interact (meeting frequency, chat room communication, etc).If an intern is located in a different timezone, establish what would be normal working hours and how the team will update them if they miss certain meetings.Ensure there's a proper introduction to the team. This could be a dedicated 1:1 for each member, or a block of the team's regular meeting to introduce the candidate to the team and vice-versa. Set up a social lunch or hour during the first week to have more casual conversations.Schedule weekly 1:1s and checkpoint meetings for the duration of the internship.Set up a very short-term goal that can be achieved quickly so the intern can get a sense for the end-to-end. Similar to how you might learn a new card game by "playing a few hands for fun" - the best way to learn is to dive right in.Consider having the mentor do an end-of-day check-in with the intern every day for at least the first week or two.Schedule at least one dedicated midpoint meeting to provide feedback. This is a time to evaluate how they’re progressing against their goals and deliverables and if they’re meeting their internship expectations. If they are, great. If not, it is essential at this point to inform them so they can improve.Social ActivitiesA major part of a great internship also involves social activities and networking opportunities for interns to connect with different people. This becomes more difficult and requires ever more creativity to try to create those experiences. Here are some ideas:Hold weekly virtual intern lunches and if there’s budget, offer a food delivery gift card. Have themed lunches.Think about virtual social games, Netflix parties, and possibly other apps that can augment virtual networking experiences.Set up social hours for smaller groups of interns to connect and rotate. Have interns meet with interns from their original office locations, from the same departments,Set up an intern group chat and have a topic, joke, picture, meme of the day to the conversations alive.Create a constant “water cooler” Google Meet/Zoom room so folks can sign on anytime and see who is on.Host virtual conversations or round tables with executives and senior leaders.Involve them in other company activities, especially Employee Resource Groups (ERGs).Pair them with a buddy who is an employee from a different team or function. Also, pair them up with a peer intern buddy so they can share their experience.Send all the swag love you can so they can deck out their house and wardrobe. Maybe not all at once, so they can get some surprises.Find a way to highlight interns during regular all-hands meetings or other company events, so people are reminded they’re here.Survey the students and get their ideas! Very likely - they have better ideas on how to socialize in this virtual world.Interns in the past have proven to be invaluable and have made huge contributions to Cloudflare. So, we are excited that we are able to double the program to give more students meaningful work this summer. Despite these very odd and not-so-normal times, we are committed to providing them the best experience possible and making it memorable.We hope that by sharing our approach we can help other companies make the pivot to remote internships more easily. If you’re interested in collaborating and sharing ideas, please contact

Creating a True One-Stop Solution for Companies to Go Global: Announcing a Partnership Between Cloudflare and JD Cloud & AI

It’s well known that global companies can face challenges doing business in and out of China due to the country’s unique rules, regulations, and norms, not to mention recent political and trade complications. Less well known is that China’s logistical and technical network infrastructure is also quite different from the rest of the world’s. With global Internet traffic up 30% over the past month due to the pandemic, these logistical and technical hurdles are increasing the burden for global businesses at exactly the wrong time. It’s now not unusual for someone based in China to have to wait extended periods and often be unable to access applications hosted elsewhere, or vice-versa, due to the lower performance of international Internet traffic to and from China. This affects global companies with customers, suppliers or employees in China, and Chinese companies who are trying to reach global users.Our mission is to help build a better Internet, for everyone, everywhere. So, today we’re excited to announce a significant strategic partnership with JD Cloud & AI, the cloud and intelligent technology business unit of Chinese Internet giant Through this partnership, we’ll be adding 150 data centers in mainland China, an increase in the region of over 700%. The partnership will also enable JD to provide a Cloudflare-powered service to China-based customers. As a result, it will create a one-stop solution for companies both inside and outside of China to go truly global.Cloudflare’s Long Experience in ChinaCloudflare has helped our global customers deliver a secure, fast, and reliable Internet experience for China-based visitors since 2015 and we’ve served Chinese customers since our inception. Cloudflare customers currently are able to extend their configurations with the click of a button across data centers in 17 cities in mainland China. As a result, they’re able to deliver their content faster, more securely, and reliably in-country. The demand for the service has been overwhelming, and we’ve been exploring ways to provide our customers with a network that would have an order of magnitude greater coverage.China’s Balkanized Network ArchitectureWhat we’ve learned from our experience is that having a widely distributed network and world class partners in China matters more there than almost anywhere else in the world. To understand why, it’s important to understand the specific technical and logistical hurdles that exist there.China has a non-uniform technical and network infrastructure, directly impacting Internet performance. Mainland China has three major telecom carriers—China Telecom, China Unicom, and China Mobile—serving 22 provinces, 4 municipalities, and 5 autonomous regions. In many of these places, each carrier operates a distinct network and in some provinces more than one network, that in many cases, operate independently of one another. The result is many different sub-networks that need to be coordinated.Regulatory hurdles in the network space can also present challenges. Unlike the rest of the world, where Anycast routing is generally available, in China the three main ISPs control IP address allocation and routing for customers’ networks both inside the country and globally. Small or large companies rarely own their own IP address allocations, and even fewer use BGP to control Internet routing. Because of the lack of BGP and the static allocation of IPs, the carriers’ customers operate on IP addresses that are homed onto a single network’s backbone.The combination of this single-homed IP connectivity and the fragmented network topography leads to frequent bottlenecks between the various domestic ISPs. This makes network coverage all the more important. Add in a rapidly expanding economy with growing Internet activity, and extraordinary times such as these which puts even more strain on the Internet, and it's easy to see why situations regularly occur where too much traffic is paired with too little capacity.The Challenge of Putting Boots on the GroundCompounding these hurdles further is that, from a business and logistics perspective, China is similarly a collection of sub-markets. There are huge variations between provinces in terms of population levels, average income, consumer spending, and the like. Regional business regulations also vary dramatically. Although it is slowly opening up to outside competition, the Chinese transportation and logistics market is one of the most highly regulated in the world. Regulation exists at a number of different tiers, imposed by national, regional, and local authorities. Finally, there are shortages of high-quality logistics facilities and warehousing spaces, making it hard to find domestic providers for managing import, export, and local transportation as well as trade compliance. You often have to hire consultants who specialize in the China market to assess quality, trustworthiness, and other factors.This makes it challenging both for foreign companies seeking a fast, secure, and reliable Internet experience but also, as we often hear from our customers, to navigate China more generally.The Importance of a World Class Local PartnerGiven these technical, logistical, and regulatory complexities, it’s very difficult for foreign companies to navigate the China landscape without local expertise. Partnering with JD Cloud & AI provides not only local expertise, but also a relationship with one of the world’s largest logistics, e-commerce, and Internet companies, is a juggernaut, operating at a scale that’s rare among global companies. It’s China’s largest retailer by revenue, online or offline, with one billion retail customers, a quarter billion registered users, seven million enterprise customers, and $83 billion in 2019 revenue. Its highly automated logistics system uses robots, AI, and fleets of drones to cover 99% of China’s population.JD decided several years ago to open its technology platform to its enterprise customers and began offering cloud services through a new business unit called JD Cloud & AI. JD Cloud & AI has quickly become the fastest growing cloud company among the top five Chinese providers. It offers a full range of services across eight availability zones in China and has made security and compliance a key part of its offering. In line with its parent company, JD Cloud & AI has made serving a global audience a key part of its strategy and has partnered with the likes of Microsoft and Citrix to build on this strategy. Importantly, like Cloudflare, the company has continued to invest in its infrastructure through the current pandemic, and has been critical to keeping China’s supply chains flowing and its businesses functioning.Taking International Companies Into China & Chinese Companies GlobalOur partnership with JD Cloud & AI will allow international businesses to grow their online presence in China without having to worry about managing separate tools with separate vendors for security and performance in China. Customers will benefit from greater performance and security inside China using the same configurations that they use with Cloudflare everywhere else in the world.Using Cloudflare's international network outside of China, and JD Cloud & AI’s network inside of China, any enterprise can rapidly and securely deploy cloud-based firewall, WAN optimization, distributed denial of service (DDoS) mitigation, content delivery, DNS services, and Cloudflare Workers, our serverless computing solution, worldwide. All with the click of a button within Cloudflare’s dashboard and without deploying a single piece of hardware.For those customers who need it, we also expect to be able to help with in-country logistics. JD operates over 700 warehouses that cover almost all the counties and districts in China. It has over 360 million active individual consumers and seven million enterprise customers that purchase products on its platform. For Cloudflare customers interested in reaching these Chinese end-customers, no matter where they are located in China, will be able to help.The partnership with JD Cloud & AI will also allow us to help Chinese companies reach global audiences. JD Cloud & AI will use Cloudflare's international network outside of China, and the JD Cloud & AI network inside of China, to allow any China-based enterprise to use Cloudflare’s integrated performance and security services worldwide, all seamlessly controlled from within the JD Cloud & AI dashboard.Data ManagementAs always, we’re taking care to be thoughtful about the treatment of customer data with this partnership. Cloudflare operates all services outside of China, and JD Cloud & AI all services inside of China. No Cloudflare customer traffic passes through the China network unless a customer explicitly opts-in to the service. And, for Cloudflare customers that opt-in to proxying content inside China, traffic and log data from outside of China is not stored in the China network or shared with our partner.A One-Stop, Truly Global SolutionWe are excited about this new partnership which will help us continue to offer customers the best performance and security service available anywhere in the world — and as a one-stop solution. While we can’t control the trade and political climate, which will inevitably ebb and flow over time, we can help our customers with technical and logistical challenges they may face doing business around the world, especially in these challenging times.New and existing Cloudflare customers can request to be served in China by filling out an information request at

Releasing kubectl support in Access

Starting today, you can use Cloudflare Access and Argo Tunnel to securely manage your Kubernetes cluster with the kubectl command-line tool.We built this to address one of the edge cases that stopped all of Cloudflare, as well as some of our customers, from disabling the VPN. With this workflow, you can add SSO requirements and a zero-trust model to your Kubernetes management in under 30 minutes.Once deployed, you can migrate to Cloudflare Access for controlling Kubernetes clusters without disrupting your current kubectl workflow, a lesson we learned the hard way from dogfooding here at Cloudflare.What is kubectl?A Kubernetes deployment consists of a cluster that contains nodes, which run the containers, as well as a control plane that can be used to manage those nodes. Central to that control plane is the Kubernetes API server, which interacts with components like the scheduler and manager.kubectl is the Kubernetes command-line tool that developers can use to interact with that API server. Users run kubectl commands to perform actions like starting and stopping the nodes, or modifying other elements of the control plane.In most deployments, users connect to a VPN that allows them to run commands against that API server by addressing it over the same local network. In that architecture, user traffic to run these commands must be backhauled through a physical or virtual VPN appliance. More concerning, in most cases the user connecting to the API server will also be able to connect to other addresses and ports in the private network where the cluster runs.How does Cloudflare Access apply?Cloudflare Access can secure web applications as well as non-HTTP connections like SSH, RDP, and the commands sent over kubectl. Access deploys Cloudflare’s network in front of all of these resources. Every time a request is made to one of these destinations, Cloudflare’s network checks for identity like a bouncer in front of each door.If the request lacks identity, we send the user to your team’s SSO provider, like Okta, AzureAD, and G Suite, where the user can login. Once they login, they are redirected to Cloudflare where we check their identity against a list of users who are allowed to connect. If the user is permitted, we let their request reach the destination.In most cases, those granular checks on every request would slow down the experience. However, Cloudflare Access completes the entire check in just a few milliseconds. The authentication flow relies on Cloudflare’s serverless product, Workers, and runs in every one of our data centers in 200 cities around the world. With that distribution, we can improve performance for your applications while also authenticating every request.How does it work with kubectl?To replace your VPN with Cloudflare Access for kubectl, you need to complete two steps:Connect your cluster to Cloudflare with Argo TunnelConnect from a client machine to that cluster with Argo TunnelConnecting the cluster to CloudflareOn the cluster side, Cloudflare Argo Tunnel connects those resources to our network by creating a secure tunnel with the Cloudflare daemon, cloudflared. As an administrator, you can run cloudflared in any space that can connect to the kubectl API server over TCP.Once installed, an administrator authenticates the instance of cloudflared by logging in to a browser with their Cloudflare account and choosing a hostname to use. Once selected, Cloudflare will issue a certificate to cloudflared that can be used to create a subdomain for the cluster.Next, an administrator starts the tunnel. In the example below, the hostname value can be any subdomain of the hostname selected in Cloudflare; the url value should be the API server for the cluster.cloudflared tunnel --hostname --url tcp://kubernetes.docker.internal:6443 --socks5=true This should be run as a systemd process to ensure the tunnel reconnects if the resource restarts.Connecting as an end userEnd users do not need an agent or client application to connect to web applications secured by Cloudflare Access. They can authenticate to on-premise applications through a browser, without a VPN, like they would for SaaS tools. When we apply that same security model to non-HTTP protocols, we need to establish that secure connection from the client with an alternative to the web browser.Unlike our SSH flow, end users cannot modify kubeconfig to proxy requests through cloudflared. Pull requests have been submitted to add this functionality to kubeconfig, but in the meantime users can set an alias to serve a similar function.First, users need to download the same cloudflared tool that administrators deploy on the cluster. Once downloaded, they will need to run a corresponding command to create a local SOCKS proxy. When the user runs the command, cloudflared will launch a browser window to prompt them to login with their SSO and check that they are allowed to reach this hostname.$ cloudflared access tcp --hostname url -- The proxy allows your local kubectl tool to connect to cloudflared via a SOCKS5 proxy, which helps avoid issues with TLS handshakes to the cluster itself. In this model, TLS verification can still be exchanged with the kubectl API server without disabling or modifying that flow for end users.Users can then create an alias to save time when connecting. The example below aliases all of the steps required to connect in a single command. This can be added to the user’s bash profile so that it persists between restarts.$ alias kubeone="env HTTPS_PROXY=socks5:// kubectl" A (hard) lesson when dogfoodingWhen we build products at Cloudflare, we release them to our own organization first. The entire company becomes a feature’s first customer, and we ask them to submit feedback in a candid way.Cloudflare Access began as a product we built to solve our own challenges with security and connectivity. The product impacts every user in our team, so as we’ve grown, we’ve been able to gather more expansive feedback and catch more edge cases.The kubectl release was no different. At Cloudflare, we have a team that manages our own Kubernetes deployments and we went to them to discuss the prototype. However, they had more than just some casual feedback and notes for us.They told us to stop.We had started down an implementation path that was technically sound and solved the use case, but did so in a way that engineers who spend all day working with pods and containers would find to be a real irritant. The flow required a small change in presenting certificates, which did not feel cumbersome when we tested it, but we do not use it all day. That grain of sand would cause real blisters as a new requirement in the workflow.With their input, we stopped the release, and changed that step significantly. We worked through ideas, iterated with them, and made sure the Kubernetes team at Cloudflare felt this was not just good enough, but better.What’s next?Support for kubectl is available in the latest release of the cloudflared tool. You can begin using it today, on any plan. More detailed instructions are available to get started.If you try it out, please send us your feedback! We’re focused on improving the ease of use for this feature, and other non-HTTP workflows in Access, and need your input.New to Cloudflare for Teams? You can use all of the Teams products for free through September, including Cloudflare Access and Argo Tunnel. You can learn more about the program, and request a dedicated onboarding session, here.

Internet performance during the COVID-19 emergency

A month ago I wrote about changes in Internet traffic caused by the COVID-19 emergency. At the time I wrote:Cloudflare is watching carefully as Internet traffic patterns around the world alter as people alter their daily lives through home-working, cordon sanitaire, and social distancing. None of these traffic changes raise any concern for us. Cloudflare's network is well provisioned to handle significant spikes in traffic. We have not seen, and do not anticipate, any impact on our network's performance, reliability, or security globally.That holds true today; our network is performing as expected under increased load. Overall the Internet has shown that it was built for this: designed to handle huge changes in traffic, outages, and a changing mix of use. As we are well into April I thought it was time for an update.GrowthHere's a chart showing the relative change in Internet use as seen by Cloudflare since the beginning of the year. I've calculated moving average of the trailing seven days for each country and use December 29, 2019 as the reference point.On this chart the highest growth in Internet use has been in Portugal: it's currently running at about a 50% increase with Spain close behind followed by the UK. Italy flattened out at about a 40% increase in usage towards the end of March and France seems to be plateauing at a little over 30% up on the end of last year.It's interesting to see how steeply Internet use grew in the UK, Spain and Portugal (the red, yellow and blue lines rise very steeply), with Spain and Portugal almost in unison and the UK lagging by about two weeks.Looking at some other major economies we see other, yet similar patterns.Similar increases in utilization are seen here. The US, Canada, Australia and Brazil are all running at between 40% and 50% the level of use at the beginning of the year.StabilityWe measure the TCP RTT (round trip time) between our servers and visitors to Internet properties that are Cloudflare customers. This gives us a measure of the speed of the networks between us and end users, and if the RTT increases it is also a measure of congestion along the path.Looking at TCP RTT over the last 90 days can help identify changes in congestion or the network. Cloudflare connects widely to the Internet via peering (and through the use of transit) and we connect to the largest number of Internet exchanges worldwide to ensure fast access for all users.Cloudflare is also present in 200 cities worldwide; thus the TCP RTT seen by Cloudflare gives a measure of the performance of end-user networks within a country. Here's a chart showing the median and 95th percentile TCP RTT in the UK in the last 90 days.What's striking in this chart is that despite the massive increase in Internet use (the grey line), the TCP RTT hasn't changed significantly. From our vantage point UK networks are coping well.Here's the situation in Italy:The picture here is slightly different. Both median and 95th percentile TCP RTT increased as traffic increased. This indicates that networks aren't operating as smoothly in Italy. It's noticeable, though, that as traffic has plateaued the TCP RTT has improved somewhat (take a look at the 95th percentile) indicating that ISPs and other network providers in Italy have likely taken action to improve the situation.This doesn't mean that Italian Internet is in trouble, just that it's strained more than, say, the Internet in the UK. ConclusionThe Internet has seen incredible, sudden growth in traffic but continues to operate well. What Cloudflare sees reflects what we've heard anecdotally: some end-user networks are feeling the strain of the sudden change of load but are working and helping us all cope with the societal effects of COVID-19.It's hard to imagine another utility (say electricity, water or gas) coping with a sudden and continuous increase in demand of 50%.


Recommended Content