Amazon Web Services Blog

Amazon EKS Windows Container Support now Generally Available

In March of this year, we announced a preview of Windows Container support on Amazon Elastic Kubernetes Service and invited customers to experiment and provide us with feedback. Today, after months of refining the product based on that feedback, I am delighted to announce that Windows Container support is now generally available.

Many development teams build and support applications designed to run on Windows Servers, and with this announcement they can now deploy them on Kubernetes alongside Linux applications. This ability will provide more consistency in system logging, performance monitoring, and code deployment pipelines. Amazon Elastic Kubernetes Service simplifies the process of building, securing, operating, and maintaining Kubernetes clusters, and allows organizations to focus on building applications instead of operating Kubernetes. We are proud to be the first cloud provider to offer general availability of Windows Containers on Kubernetes, and we look forward to customers unlocking the business benefits of Kubernetes for both their Windows and Linux workloads.

To show you how this feature works, I will need an Amazon Elastic Kubernetes Service cluster. I am going to create a new one, but this will work with any cluster that is using Kubernetes version 1.14 and above. Once the cluster has been configured, I will add some new Windows nodes and deploy a Windows application. Finally, I will test the application to ensure it is running as expected.

The simplest way to get a cluster set up is to use eksctl, the official CLI tool for EKS. The command below creates a cluster called demo-windows-cluster and adds two Linux nodes to the cluster. Currently, at least one Linux node is required to support Windows node and pod networking; however, I have selected two for high availability and we recommend that you do the same.

eksctl create cluster \
--name demo-windows-cluster \
--version 1.14 \
--nodegroup-name standard-workers \
--node-type t3.medium \
--nodes 2 \
--nodes-min 1 \
--nodes-max 3 \
--node-ami auto

Starting with eksctl version 0.7, a new utility has been added called install-vpc-controllers. This utility installs the required VPC Resource Controller and VPC Admission Webhook into the cluster. These components run on Linux nodes and are responsible for enabling networking for incoming pods on Windows nodes. To use the tool, we run the following command.

eksctl utils install-vpc-controllers --name demo-windows-cluster --approve

If you don’t want to use eksctl, we also provide guides in the documentation on how you can run PowerShell or Bash scripts to achieve the same outcome.

Next, I will need to add some Windows nodes to our cluster. If you used eksctl to create the cluster, the command below will work. If you are working with an existing cluster, check out the documentation for instructions on how to create a Windows node group and connect it to your cluster.

eksctl create nodegroup \
--region us-west-2 \
--cluster demo-windows-cluster \
--version 1.14 \
--name windows-ng \
--node-type t3.medium \
--nodes 3 \
--nodes-min 1 \
--nodes-max 4 \
--node-ami-family WindowsServer2019FullContainer \
--node-ami ami-0f85de0441a8dcf46

The most up-to-date Windows AMI ID for your region can be found by querying the AWS SSM Parameter Store. Instructions to do this can be found in the Amazon EKS documentation.

Now that I have the nodes up and running, I can deploy a sample application. I am using a YAML file from the AWS containers roadmap GitHub repository.
This file configures an app that consists of a single container that runs IIS, which in turn hosts a basic HTML page.

kubectl apply -f https://raw.githubusercontent.com/aws/containers-roadmap/master/preview-programs/eks-windows-preview/windows-server-IIS.yaml

These are Windows containers, which are often a little larger than Linux containers and therefore take a little longer to download and start up. I monitored the progress of the deployment by running the following command.

kubectl get pods -o wide --watch

I waited for around 5 minutes for the pod to transition to the Running state. I then executed the following command, which connects to the pod and initializes a PowerShell session inside the container. Here, windows-server-iis-66bf9745b-xsbsx is the name of the pod; if you are following along, your pod name will be different.

kubectl exec -it windows-server-iis-66bf9745b-xsbsx powershell

Once you are connected to the PowerShell session, you can execute PowerShell as if you were using the terminal inside the container. Therefore, if we run the command below, we should get some information back about the news blog.

Invoke-WebRequest -Uri https://aws.amazon.com/blogs/aws/ -UseBasicParsing

To exit the PowerShell session, I type exit and it returns me to my terminal. From there I can inspect the service that was deployed by the sample application. I type the following command:

kubectl get svc windows-server-iis-service

This gives me the following output that describes the service:

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
windows-server-iis-service LoadBalancer xx.xx.xxx.xxx unique.us-west-2.elb.amazonaws.com 80:32750/TCP 54s

The External IP should be the address of a load balancer. If I type this URL into a browser and append /default.html, it will load an HTML page that was created by the sample application deployment. This is being served by IIS from one of the Windows containers I deployed.

So there we have it, Windows Containers running on Amazon Elastic Kubernetes Service. For more details, please check out the documentation. Amazon EKS Windows Container Support is available in all regions where Amazon EKS is available, and pricing details can be found here. We have a long roadmap for Amazon Elastic Kubernetes Service, but we are eager to get your feedback and will use it to drive our prioritization process. Please take a look at this new feature and let us know what you think!
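Before adding Windows nodes to an existing cluster, it is worth confirming that the control plane is running Kubernetes 1.14 or above. Here is a minimal Python sketch (not part of the original walkthrough) that checks this with boto3; the cluster name and region match the example above and are otherwise assumptions.

import boto3

# Assumes default AWS credentials and the demo cluster created above.
eks = boto3.client("eks", region_name="us-west-2")
cluster = eks.describe_cluster(name="demo-windows-cluster")["cluster"]

major, minor = (int(x) for x in cluster["version"].split(".")[:2])
print(f"Cluster {cluster['name']} is running Kubernetes {cluster['version']} ({cluster['status']})")

if (major, minor) < (1, 14):
    raise SystemExit("Windows container support requires Kubernetes 1.14 or above")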

Learn about AWS Services & Solutions – October AWS Online Tech Talks

Learn about AWS Services & Solutions – October AWS Online Tech Talks Join us this October to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now! Note – All sessions are free and in Pacific Time. Tech talks this month: AR/VR:  October 30, 2019 | 9:00 AM – 10:00 AM PT – Using Physics in Your 3D Applications with Amazon Sumerian – Learn how to simulate real-world environments in 3D using Amazon Sumerian’s new robust physics system. Compute:  October 24, 2019 | 9:00 AM – 10:00 AM PT – Computational Fluid Dynamics on AWS – Learn best practices to run Computational Fluid Dynamics (CFD) workloads on AWS. October 28, 2019 | 1:00 PM – 2:00 PM PT – Monitoring Your .NET and SQL Server Applications on Amazon EC2 – Learn how to manage your application logs through AWS services to improve performance and resolve issues for your .Net and SQL Server applications. October 31, 2019 | 9:00 AM – 10:00 AM PT – Optimize Your Costs with AWS Compute Pricing Options – Learn which pricing models work best for your workloads and how to combine different purchase options to optimize cost, scale, and performance. Data Lakes & Analytics:  October 23, 2019 | 9:00 AM – 10:00 AM PT – Practical Tips for Migrating Your IBM Netezza Data Warehouse to the Cloud – Learn how to migrate your IBM Netezza Data Warehouse to the cloud to save costs and improve performance. October 31, 2019 | 11:00 AM – 12:00 PM PT – Alert on Your Log Data with Amazon Elasticsearch Service – Learn how to receive alerts on your data to monitor your application and infrastructure using Amazon Elasticsearch Service. Databases: October 22, 2019 | 1:00 PM – 2:00 PM PT – How to Build Highly Scalable Serverless Applications with Amazon Aurora Serverless – Get an overview of Amazon Aurora Serverless, an on-demand, auto-scaling configuration for Amazon Aurora, and learn how you can use it to build serverless applications. DevOps: October 21, 2019 | 11:00 AM – 12:00 PM PT – Migrate Your Ruby on Rails App to AWS Fargate in One Step Using AWS Rails Provisioner – Learn how to define and deploy containerized Ruby on Rails Applications on AWS with a few commands. End-User Computing:  October 24, 2019 | 11:00 AM – 12:00 PM PT – Why Software Vendors Are Choosing Application Streaming Instead of Rewriting Their Desktop Apps – Walk through common customer use cases of how Amazon AppStream 2.0 lets software vendors deliver instant demos, trials, and training of desktop applications. October 29, 2019 | 11:00 AM – 12:00 PM PT – Move Your Desktops and Apps to AWS End-User Computing – Get an overview of AWS End-User Computing services and then dive deep into best practices for implementation. Enterprise & Hybrid:  October 29, 2019 | 1:00 PM – 2:00 PM PT – Leverage Compute Pricing Models and Rightsizing to Maximize Savings on AWS – Get tips on building a cost-management strategy, incorporating pricing models and resource rightsizing. IoT: October 30, 2019 | 1:00 PM – 2:00 PM PT – Connected Devices at Scale: A Deep Dive into the AWS Smart Product Solution – Learn how to jump-start the development of innovative connected products with the new AWS Smart Product Solution. 
Machine Learning:
October 23, 2019 | 1:00 PM – 2:00 PM PT – Analyzing Text with Amazon Elasticsearch Service and Amazon Comprehend – Learn how to deploy a cost-effective, end-to-end solution for extracting meaningful insights from unstructured text data like customer calls, support tickets, or online customer feedback.
October 28, 2019 | 11:00 AM – 12:00 PM PT – AI-Powered Health Data Masking – Learn how to use the AI-Powered Health Data Masking solution for use cases like clinical decision support, revenue cycle management, and clinical trial management.

Migration:
October 22, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive: How to Rapidly Migrate Your Data Online with AWS DataSync – Learn how AWS DataSync makes it easy to rapidly move large datasets into Amazon S3 and Amazon EFS for your applications.

Mobile:
October 21, 2019 | 1:00 PM – 2:00 PM PT – Mocking and Testing Serverless APIs with AWS Amplify – Learn how to mock and test GraphQL APIs in your local environment with AWS Amplify.

Robotics:
October 22, 2019 | 9:00 AM – 10:00 AM PT – The Future of Smart Robots Has Arrived – Learn how and why you should build smarter robots with AWS.

Security, Identity and Compliance:
October 29, 2019 | 9:00 AM – 10:00 AM PT – Using AWS Firewall Manager to Simplify Firewall Management Across Your Organization – Learn how AWS Firewall Manager simplifies rule management across your organization.

Serverless:
October 21, 2019 | 9:00 AM – 10:00 AM PT – Advanced Serverless Orchestration with AWS Step Functions – Go beyond the basics and explore the best practices of Step Functions, including development and deployment of workflows and how you can track the work being done.
October 30, 2019 | 11:00 AM – 12:00 PM PT – Managing Serverless Applications with SAM Templates – Learn how to reduce code and increase efficiency by managing your serverless apps with AWS Serverless Application Model (SAM) templates.

Storage:
October 23, 2019 | 11:00 AM – 12:00 PM PT – Reduce File Storage TCO with Amazon EFS and Amazon FSx for Windows File Server – Learn how to optimize file storage costs with AWS storage solutions.

EC2 High Memory Update – New 18 TB and 24 TB Instances

Last year we launched EC2 High Memory Instances with 6, 9, and 12 TiB of memory. Our customers use these instances to run large-scale SAP HANA installations, while also taking advantage of AWS services such as Amazon Elastic Block Store (EBS), Amazon Simple Storage Service (S3), AWS Identity and Access Management (IAM), Amazon CloudWatch, and AWS Config. Customers appreciate that these instances use the same AMIs and management tools as their other EC2 instances, and use them to build production systems that provide enterprise-grade data protection and business continuity. These are bare metal instances that can be run in a Virtual Private Cloud (VPC), and are EBS-Optimized by default.

Today we are launching instances with 18 TiB and 24 TiB of memory. These are 8-socket instances powered by 2nd generation Intel® Xeon® Scalable (Cascade Lake) processors running at 2.7 GHz, and are available today in the US East (N. Virginia) Region, with more to come. Just like the existing 6, 9, and 12 TiB bare metal instances, the 18 and 24 TiB instances are available in Dedicated Host form with a Three Year Reservation. You also have the option to upgrade a reservation for a smaller size to one of the new sizes. Here are the specs:

Instance Name | Memory | Logical Processors | Dedicated EBS Bandwidth | Network Bandwidth | SAP Workload Certifications
u-6tb1.metal | 6 TiB | 448 | 14 Gbps | 25 Gbps | OLAP, OLTP
u-9tb1.metal | 9 TiB | 448 | 14 Gbps | 25 Gbps | OLAP, OLTP
u-12tb1.metal | 12 TiB | 448 | 14 Gbps | 25 Gbps | OLAP, OLTP
u-18tb1.metal | 18 TiB | 448 | 28 Gbps | 100 Gbps | OLAP, OLTP
u-24tb1.metal | 24 TiB | 448 | 28 Gbps | 100 Gbps | OLTP

SAP OLAP workloads include SAP BW/4HANA, BW on HANA (BWoH), and Datamart. SAP OLTP workloads include S/4HANA and Suite on HANA (SoH). Consult the SAP Hardware Directory for more information on the workload certifications.

With 28 Gbps of dedicated EBS bandwidth, the u-18tb1.metal and u-24tb1.metal instances can load data into memory at very high speed. For example, my colleagues loaded 9 TB of data in just 45 minutes, an effective rate of 3.4 gigabytes per second (GBps).

Here’s an overview of the scale-up and scale-out options that are possible when using these new instances to run SAP HANA:

New Instances in Action
My colleagues were kind enough to supply me with some screen shots from 18 TiB and 24 TiB High Memory instances. Here’s the output from the lscpu and free commands on an 18 TiB instance: Here’s top on the same instance: And here is HANA Studio on a 24 TiB instance:

Available Now
As I mentioned earlier, the new instance sizes are available today.

— Jeff;

PS – Be sure to check out the AWS Quick Start for SAP HANA and the AWS Quick Start for S/4HANA.
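As a quick sanity check on that load rate, here is the arithmetic behind the 3.4 GB/s figure as a small illustrative Python snippet (not from the original post; it assumes the 9 TB figure is binary tebibytes):

# Effective load rate for 9 TiB of data loaded in 45 minutes.
data_gib = 9 * 1024          # 9 TiB expressed in GiB
seconds = 45 * 60            # 45 minutes
rate = data_gib / seconds    # roughly 3.4 GiB per second
print(f"Effective load rate: {rate:.1f} GiB/s")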

Now available in Amazon SageMaker: EC2 P3dn GPU Instances

In recent years, the meteoric rise of deep learning has made incredible applications possible, such as detecting skin cancer (SkinVision) and building autonomous vehicles (TuSimple). Thanks to neural networks, deep learning indeed has the uncanny ability to extract and model intricate patterns from vast amounts of unstructured data (e.g. images, video, and free-form text). However, training these neural networks requires equally vast amounts of computing power. Graphics Processing Units (GPUs) have long proven that they were up to that task, and AWS customers have quickly understood how they could use Amazon Elastic Compute Cloud (EC2) P2 and P3 instances to train their models, in particular on Amazon SageMaker, our fully-managed, modular, machine learning service.

Today, I’m very happy to announce that the largest P3 instance, named p3dn.24xlarge, is now available for model training on Amazon SageMaker. Launched last year, this instance is designed to accelerate large, complex, distributed training jobs: it has twice as much GPU memory as other P3 instances, 50% more vCPUs, blazing-fast local NVMe storage, and 100 Gbit networking. How about we give it a try on Amazon SageMaker?

Introducing EC2 P3dn instances on Amazon SageMaker
Let’s start from this notebook, which uses the built-in image classification algorithm to train a model on the Caltech-256 dataset. All I have to do to use a p3dn.24xlarge instance on Amazon SageMaker is to set train_instance_type to 'ml.p3dn.24xlarge', and train!

ic = sagemaker.estimator.Estimator(training_image,
                                   role,
                                   train_instance_count=1,
                                   train_instance_type='ml.p3dn.24xlarge',
                                   input_mode='File',
                                   output_path=s3_output_location,
                                   sagemaker_session=sess)
...
ic.fit(...)

I ran some quick tests on this notebook, and I got a sweet 20% training speedup out of the box (your mileage may vary!). I’m using 'File' mode here, meaning that the full dataset is copied to the training instance: the faster network (100 Gbit, up from 25 Gbit) and storage (local NVMe instead of Amazon EBS) are certainly helping!

When working with large data sets, you could put 100 Gbit networking to good use either by streaming data from Amazon Simple Storage Service (S3) with Pipe Mode, or by storing it in Amazon Elastic File System or Amazon FSx for Lustre. It would also help with distributed training (using Horovod, maybe), as instances would be able to exchange parameter updates faster. In short, the Amazon SageMaker and P3dn tag team packs quite a punch, and it should deliver a significant performance improvement for large-scale deep learning workloads.

Now available!
P3dn instances are available on Amazon SageMaker in the US East (N. Virginia) and US West (Oregon) regions. If you are ready to get started, please contact your AWS account team or use the Contact Us page to make a request. As always, we’d love to hear your feedback, either on the AWS Forum for Amazon SageMaker, or through your usual AWS contacts.
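If you want to try the streaming approach mentioned above, here is a minimal, hedged sketch of switching the same estimator to Pipe mode (SageMaker Python SDK v1 style, as used in the notebook). The bucket name, channel layout, and output location are assumptions, and the built-in algorithm has its own expectations about input format that the notebook covers.

import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

# A sketch of the Pipe-mode variant; bucket and prefixes are placeholders.
sess = sagemaker.Session()
role = sagemaker.get_execution_role()
training_image = get_image_uri(boto3.Session().region_name, 'image-classification')
s3_output_location = 's3://my-bucket/output'

ic = sagemaker.estimator.Estimator(training_image,
                                   role,
                                   train_instance_count=1,
                                   train_instance_type='ml.p3dn.24xlarge',
                                   input_mode='Pipe',    # stream from S3 instead of copying the dataset
                                   output_path=s3_output_location,
                                   sagemaker_session=sess)

ic.fit({'train': 's3://my-bucket/caltech-256/train/',
        'validation': 's3://my-bucket/caltech-256/validation/'})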

New languages for Amazon Translate: Greek, Hungarian, Romanian, Thai, Ukrainian, Urdu and Vietnamese

Technical Evangelists travel quite a lot, and the number one question that we get from customers when presenting Amazon Translate is: “Is my native language supported?“. Well, I’m happy to announce that starting today, we’ll be able to answer “yes” if your language is Greek, Hungarian, Romanian, Thai, Ukrainian, Urdu or Vietnamese. In fact, using Amazon Translate, we could even say “ναί”, “igen”, “da”, “ใช่”, “так”, “جی ہاں” and “có”… hopefully with a decent accent!

With these additions, Amazon Translate now supports 32 languages: Arabic, Chinese (Simplified), Chinese (Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu and Vietnamese. Between these languages, the service supports 987 translation combinations: you can see the full list of supported language pairs on this documentation page.

Using Amazon Translate
Amazon Translate is extremely simple to use. Let’s quickly test it in the AWS console on one of my favourite poems: Developers will certainly prefer to invoke the TranslateText API. Here’s an example with the AWS CLI.

$ aws translate translate-text --source-language-code auto --target-language-code hu --text "Les sanglots longs des violons de l’automne blessent mon coeur d’une langueur monotone"
{
    "TranslatedText": "Az őszi hegedű hosszú zokogása monoton bágyadtsággal fáj a szívem",
    "SourceLanguageCode": "fr",
    "TargetLanguageCode": "hu"
}

Of course, this API is also available in any of the AWS SDKs. In the continued spirit of language diversity, how about an example in C++? Here’s a short program translating a text file stored on disk.

#include <aws/core/Aws.h>
#include <aws/core/utils/Outcome.h>
#include <aws/translate/TranslateClient.h>
#include <aws/translate/model/TranslateTextRequest.h>
#include <aws/translate/model/TranslateTextResult.h>
#include <fstream>
#include <iostream>
#include <string>

#define MAX_LINE_LENGTH 5000

int main(int argc, char **argv)
{
    if (argc != 4) {
        std::cout << "Usage: translate_text_file 'target language code' 'input file' 'output file'" << std::endl;
        return -1;
    }

    const Aws::String target_language = argv[1];
    const std::string input_file = argv[2];
    const std::string output_file = argv[3];

    std::ifstream fin(input_file.c_str(), std::ios::in);
    if (!fin.good()) {
        std::cerr << "Input file is invalid." << std::endl;
        return -1;
    }

    std::ofstream fout(output_file.c_str(), std::ios::out);
    if (!fout.good()) {
        std::cerr << "Output file is invalid." << std::endl;
        return -1;
    }

    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Translate::TranslateClient translate_client;
        Aws::Translate::Model::TranslateTextRequest request;
        request = request.WithSourceLanguageCode("auto").WithTargetLanguageCode(target_language);

        Aws::String line;
        while (getline(fin, line)) {
            if (line.empty()) {
                continue;
            }
            if (line.length() > MAX_LINE_LENGTH) {
                std::cerr << "Line is too long." << std::endl;
                break;
            }
            request.SetText(line);
            auto outcome = translate_client.TranslateText(request);
            if (outcome.IsSuccess()) {
                auto translation = outcome.GetResult().GetTranslatedText();
                fout << translation << std::endl;
            } else {
                std::cout << "TranslateText error: " << outcome.GetError().GetExceptionName()
                          << " - " << outcome.GetError().GetMessage() << std::endl;
                break;
            }
        }
    }
    Aws::ShutdownAPI(options);
}

Once the code has been built, let’s translate the full poem to Thai:

$ translate_text_file th verlaine.txt verlaine-th.txt
$ cat verlaine-th.txt
“เสียงสะอื้นยาวของไวโอลินฤดูใบไม้ร่วงทำร้ายหัวใจของฉันด้วยความอ่อนเพลียที่น่าเบื่อ ทั้งหมดหายใจไม่ออกและซีดเมื่อชั่วโมงดังผมจำได้ว่าวันเก่าและร้องไห้ และฉันไปที่ลมเลวร้ายที่พาฉันออกไปจากที่นี่ไกลกว่าเช่นใบไม้ที่ตายแล้ว”
- Paul Verlaine บทกวีของดาวเสาร์

As you can see, it’s extremely simple to integrate Amazon Translate in your own applications. A single API call is really all that it takes!

Available Now!
These new languages are available today in all regions where Amazon Translate is available. The free tier offers 2 million characters per month for the first 12 months, starting from your first translation request. We’re looking forward to your feedback! Please post it to the AWS Forum for Amazon Translate, or send it to your usual AWS support contacts.

— Julien;
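For Python users, the same call is available through boto3. Here is a minimal sketch (not from the original post) that translates the same line of Verlaine to Hungarian; it assumes default AWS credentials and a region where Amazon Translate is available.

import boto3

translate = boto3.client("translate", region_name="us-west-2")

# Auto-detect the source language and translate to Hungarian.
result = translate.translate_text(
    Text="Les sanglots longs des violons de l'automne blessent mon coeur d'une langueur monotone",
    SourceLanguageCode="auto",
    TargetLanguageCode="hu",
)

print(result["SourceLanguageCode"], "->", result["TargetLanguageCode"])
print(result["TranslatedText"])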

AWS IQ – Get Help from AWS Certified Third Party Experts on Demand

We want to make sure that you are able to capture the value of cloud computing by thinking big and building fast! As you embark on your journey to the cloud, we also want to make sure that you have access to the resources that you will need in order to succeed. For example:

AWS Training and Certification – This program helps you and your team to build and validate your cloud skills.
AWS Support – This program gives you access to tools, technology, and people, all designed to help you to optimize performance, lower costs, and innovate faster.
AWS Professional Services – Our global team of experts are ready to work with you (and your chosen APN partner) to help you to achieve your enterprise cloud computing goals.
APN Consulting Partners – This global team of professional service providers are able to help you design, architect, build, migrate, and manage your applications and workloads.
AWS Managed Services (AMS) – This service operates AWS on behalf of our enterprise-scale customers.

Today I would like to tell you about AWS IQ, a new service that will help you to engage with AWS Certified third party experts for project work. While organizations of any size can use and benefit from AWS IQ, I believe that small and medium-sized businesses will find it particularly useful. Regardless of the size of your organization, AWS IQ will let you quickly & securely find, engage, and pay AWS Certified experts for hands-on help. All of the experts have active AWS Associate, Professional, or Specialty Certifications, and are ready & willing to help you.

AWS IQ is integrated with your AWS account and your AWS bill. If you are making use of the services of an expert, AWS IQ lets you grant, monitor, and control access to your AWS Account. You can also pay the expert at the conclusion of each project milestone.

AWS IQ for Customers
I can create a new request in minutes. I visit the AWS IQ Console and click New request to get started: One important note: the IAMFullAccess and AWSIQFullAccess managed policies must be in force if I am logged in as an IAM user. Then I describe my request and click Submit Request:

My request is shared with the experts and they are encouraged to reply with proposals. I can monitor their responses from within the console, and I can also indicate that I am no longer accepting new responses: After one or more responses arrive, I can evaluate the proposals, chat with the experts via text or video, and ultimately decide to Accept the proposal that best meets my needs:

A contract is created between me and the expert, and we are ready to move forward! The expert then requests permission to access my AWS account, making use of one of nine IAM policies. I review and approve their request, and the expert is supplied with a URL that will allow them to log in to the AWS Management Console using this role:

When the agreed-upon milestones are complete, the expert creates payment requests. I approve them, and work continues until the project is complete. After the project is complete, I enter public and private feedback for the expert. The public feedback becomes part of the expert’s profile; the private feedback is reviewed in confidence by the AWS IQ team.

AWS IQ for Experts
I can register as an expert by visiting AWS IQ for Experts. I must have one or more active AWS Certifications, I must reside in the United States, and I must have US banking and tax information.
After I complete the registration process and have been approved as an expert, I can start to look for relevant requests and reply with questions or an initial expression of interest: I can click Create to create a proposal: When a customer accepts a proposal, the status switches to ACCEPTED. Then I click Request Permission to gain IAM-controlled access to their AWS account: Then I ask for permission to access their AWS account: After the customer reviews and accepts the request, I click Console access instructions to log in to the customer’s AWS account, with my access governed by the IAM policy that I asked for: I do the work, and then request payment for a job well done: I can request full or partial payment. Requesting full payment also concludes the proposal, and immediately disallows further console access to the customer’s AWS account and resources:

Things to Know
Here are a few things that you should know about AWS IQ:

Customers – Customers can reside anywhere in the world except China.
Experts – Applications from several hundred would-be experts have already been reviewed and accepted; we’ll continue to add more as quickly as possible. As I noted earlier, experts must reside in the United States.
Project Value – The project value must be $1 or more.
Payment – The customer’s payment is charged to their AWS account at their request, and disbursed monthly to the expert’s account. Customers will be able to see their payments on their AWS bill.
In the Works – We have a long roadmap for this cool new service, but we are eager to get your feedback and will use it to drive our prioritization process.

Please take a look at AWS IQ and let us know what you think!

— Jeff;
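As noted above, an IAM user who creates requests needs the IAMFullAccess and AWSIQFullAccess managed policies. Here is a hedged Python sketch (not from the post) of attaching them with boto3; the user name is a placeholder, and the AWSIQFullAccess ARN follows the usual AWS-managed-policy naming, which you should verify in your account.

import boto3

iam = boto3.client("iam")
user_name = "my-iq-user"  # hypothetical IAM user

# Attach the managed policies mentioned in the post to the IAM user.
for policy_arn in (
    "arn:aws:iam::aws:policy/IAMFullAccess",
    "arn:aws:iam::aws:policy/AWSIQFullAccess",
):
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    print(f"Attached {policy_arn} to {user_name}")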

Introducing Batch Mode Processing for Amazon Comprehend Medical

Launched at AWS re:Invent 2018, Amazon Comprehend Medical is a HIPAA-eligible natural language processing service that makes it easy to use machine learning to extract relevant medical information from unstructured text. For example, customers like Roche Diagnostics and The Fred Hutchinson Cancer Research Center can quickly and accurately extract information, such as medical condition, medication, dosage, strength, and frequency from a variety of sources like doctors’ notes, clinical trial reports, and patient health records. They can also identify protected health information (PHI) present in these documents in order to anonymize it before data exchange.

In a previous blog post, I showed you how to use the Amazon Comprehend Medical API to extract entities and detect PHI in a single document. Today we’re happy to announce that this API can now process batches of documents stored in an Amazon Simple Storage Service (S3) bucket. Let’s do a demo!

Introducing the Batch Mode API
First, we need to grab some data to test batch mode: MT Samples is a great collection of real-life anonymized medical transcripts that are free to use and distribute. I picked a few transcripts, and converted them to the simple JSON format that Amazon Comprehend Medical expects: in a production workflow, converting documents to this format could easily be done by your application code, or by one of our analytics services such as AWS Glue.

{"Text": " VITAL SIGNS: The patient was afebrile. He is slightly tachycardic, 105, but stable blood pressure and respiratory rate. GENERAL: The patient is in no distress. Sitting quietly on the gurney. HEENT: Unremarkable. His oral mucosa is moist and well hydrated. Lips and tongue look normal. Posterior pharynx is clear. NECK: Supple. His trachea is midline. There is no stridor. LUNGS: Very clear with good breath sounds in all fields. There is no wheezing. Good air movement in all lung fields. CARDIAC: Without murmur. Slight tachycardia. ABDOMEN: Soft, nontender. SKIN: Notable for a confluence erythematous, blanching rash on the torso as well as more of a blotchy papular, macular rash on the upper arms. He noted some on his buttocks as well. Remaining of the exam is unremarkable."}

Then, I simply upload the samples to an Amazon S3 bucket located in the same region as the service… and yes, ‘esophagogastroduodenoscopy’ is a word.

Now let’s head to the AWS console and create an entity detection job. The rest of the process would be identical for PHI. Samples are stored under the ‘input/’ prefix, and I’m expecting results under the ‘output/’ prefix. Of course, you could use different buckets if you were so inclined. Optionally, you could also use AWS Key Management Service (KMS) to encrypt output results. For the sake of brevity, I won’t set up KMS here, but you’d certainly want to consider it for production workflows.

I also need to provide a data access role in AWS Identity and Access Management (IAM), allowing Amazon Comprehend Medical to access the relevant S3 bucket(s). You can use a role that you previously set up in IAM, or you can use the wizard in the Amazon Comprehend Medical console. For detailed information on permissions, please refer to the documentation.

Then, I create the batch job, and wait for it to complete. After a few minutes, the job is done. Results are available at the output location: one output for each input, containing a JSON-formatted description of entities and their relationships.
A manifest also includes global information: number of processed documents, total amount of data, etc. Paths are edited out for clarity.

{
  "Summary" : {
    "Status" : "COMPLETED",
    "JobType" : "EntitiesDetection",
    "InputDataConfiguration" : {
      "Bucket" : "jsimon-comprehend-medical-uswest2",
      "Path" : "input/"
    },
    "OutputDataConfiguration" : {
      "Bucket" : "jsimon-comprehend-medical-uswest2",
      "Path" : ...
    },
    "InputFileCount" : 4,
    "TotalMeteredCharacters" : 3636,
    "UnprocessedFilesCount" : 0,
    "SuccessfulFilesCount" : 4,
    "TotalDurationSeconds" : 366,
    "SuccessfulFilesListLocation" : ... ,
    "UnprocessedFilesListLocation" : ...
  }
}

After retrieving the ‘rash.json.out‘ object from S3, I can use a JSON editor to view its contents. Here are some of the entities that have been detected. Of course, this data is not meant to be read by humans. In a production workflow, it would be processed automatically by the Amazon Comprehend Medical APIs. Results would then be stored in an AWS backend, and made available to healthcare professionals through a business application.

Now Available!
As you can see, it’s extremely easy to use Amazon Comprehend Medical in batch mode, even at very large scale. Zero machine learning work and zero infrastructure work required! The service is available today in the following AWS regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Canada (Central), EU (Ireland), EU (London), and Asia Pacific (Sydney). The free tier covers 25,000 units of text (2.5 million characters) for the first three months when you start using the service, either with entity extraction or with PHI detection.

As always, we’d love to hear your feedback: please post it to the AWS forum for Amazon Comprehend, or send it through your usual AWS contacts.

— Julien
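The same batch job can be started programmatically. Below is a hedged boto3 sketch (not from the post) using the v2 entities detection job API; the bucket and prefixes mirror the walkthrough above, and the role ARN is a placeholder.

import time
import boto3

cm = boto3.client("comprehendmedical", region_name="us-west-2")

# Start a batch entities detection job over the JSON documents under input/.
job = cm.start_entities_detection_v2_job(
    InputDataConfig={"S3Bucket": "jsimon-comprehend-medical-uswest2", "S3Key": "input/"},
    OutputDataConfig={"S3Bucket": "jsimon-comprehend-medical-uswest2", "S3Key": "output/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendMedicalDataAccess",  # placeholder role
    LanguageCode="en",
)

# Poll until the job completes, then read the results from the output prefix.
while True:
    props = cm.describe_entities_detection_v2_job(JobId=job["JobId"])["ComprehendMedicalAsyncJobProperties"]
    if props["JobStatus"] in ("COMPLETED", "PARTIAL_SUCCESS", "FAILED"):
        print("Job finished with status:", props["JobStatus"])
        break
    time.sleep(30)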

AWS DataSync News – S3 Storage Class Support and Much More

AWS DataSync helps you to move large amounts of data into or out of the AWS Cloud (read my post, New – AWS DataSync – Automated and Accelerated Data Transfer, to learn more). As I explained in that post, DataSync is a great fit for your Migration, Upload & Process, and Backup / DR use cases. DataSync is a managed service, and can be used to do one-time or periodic transfers of any size.

Newest Features
We launched DataSync at AWS re:Invent 2018 and have been adding features to it ever since. Today I would like to give you a brief recap of some of the newest features, and also introduce a few new ones:

S3 Storage Class Support
SMB Support
Additional Regions
VPC Endpoint Support
FIPS for US Endpoints
File and Folder Filtering
Embedded CloudWatch Metrics

Let’s take a look at each one…

S3 Storage Class Support
If you are transferring data to an Amazon S3 bucket, you now have control over the storage class that is used for the objects. You simply choose the class when you create a new location for use with DataSync: You can choose from any of the S3 storage classes. Objects stored in certain storage classes can incur additional charges for overwriting, deleting, or retrieving. To learn more, read Considerations When Working with S3 Storage Classes in DataSync. If you prefer to set this programmatically, see the short example at the end of this post.

SMB Support
Late last month we announced that AWS DataSync Can Now Transfer Data to and from SMB File Shares. The SMB (Server Message Block) protocol is common in Windows-centric environments, and is also the preferred protocol for many file servers and network attached storage (NAS) devices. You can use filter patterns to control the files that are included in or excluded from the transfer, and you can use SMB file shares as the data transfer source or destination (Amazon S3 and Amazon EFS can also be used). You simply create a DataSync location that references your SMB server and share: To learn more, read Creating a Location for SMB.

Additional Regions
AWS DataSync is now available in more locations. Earlier this year it became available in the AWS GovCloud (US-West) and Middle East (Bahrain) Regions.

VPC Endpoint Support
You can deploy AWS DataSync in a Virtual Private Cloud (VPC). If you do this, data transferred between the DataSync agent and the DataSync service does not traverse the public internet: The VPC endpoints for DataSync are powered by AWS PrivateLink; to learn more, read AWS DataSync Now Supports Amazon VPC Endpoints and Using AWS DataSync in a Virtual Private Cloud.

FIPS for US Endpoints
In addition to support for VPC endpoints, we announced that AWS DataSync supports FIPS 140-2 Validated Endpoints in US Regions. The endpoints in these regions use a FIPS 140-2 validated cryptographic security module, making it easier for you to use DataSync for regulated workloads. You can use these endpoints by selecting them when you create your DataSync agent:

File and Folder Filtering
Earlier this year we added the ability to use file path and object key filters to exercise additional control over the data copied in a data transfer. To learn more, read about Excluding and including specific data in transfer tasks using AWS DataSync filters.

Embedded CloudWatch Metrics
Data transfer metrics are available in the Task Execution Details page so that you can track the progress of your transfer:

Other AWS DataSync Resources
Here are some resources to help you to learn more about AWS DataSync:

Migrating Data to AWS: Understanding Your Options (AWS Online Tech Talk).
Cloud Data Migration.
AWS DataSync User Guide.
AWS DataSync API Reference.
AWS DataSync Quick Start In-cloud Transfer and Scheduler. Migrating storage with AWS DataSync (blog post). Migrating hundreds of TB of data to Amazon S3 with AWS DataSync (blog post). — Jeff;
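Here is the promised sketch of choosing an S3 storage class through the API rather than the console. It is a minimal boto3 example (not from the post); the bucket, prefix, and role ARN are placeholders.

import boto3

datasync = boto3.client("datasync", region_name="us-west-2")

# Create an S3 location that writes transferred objects as STANDARD_IA.
location = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::my-datasync-destination-bucket",   # placeholder bucket
    Subdirectory="/incoming",
    S3StorageClass="STANDARD_IA",
    S3Config={"BucketAccessRoleArn": "arn:aws:iam::123456789012:role/DataSyncS3Access"},  # placeholder role
)

print("Created location:", location["LocationArn"])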

Cloud-Powered, Next-Generation Banking

Traditional banks make extensive use of labor-intensive, human-centric control structures such as Production Support groups, Security Response teams, and Contingency Planning organizations. These control structures were deemed necessary in order to segment responsibilities and to maintain a security posture that is risk averse. Unfortunately, this traditional model tends to keep the subject matter experts in these organizations at a distance from the development teams, reducing efficiency and getting in the way of innovation. Banks and other financial technology (fintech) companies have realized that they need to move faster in order to meet the needs of the newest generation of customers. These customers, some in markets that have not been well-served by the traditional banks, expect a rich, mobile-first experience, top-notch customer service, and access to a broad array of services and products. They prefer devices to retail outlets, and want to patronize a bank that is responsive to their needs. AWS-Powered Banking Today I would like to tell you about a couple of AWS-powered banks that are addressing these needs. Both of these banks are born-in-the-cloud endeavors, and take advantage of the scale, power, and flexibility of AWS in new and interesting ways. For example, they make extensive use of microservices, deploy fresh code dozens or hundreds of times per day, and use analytics & big data to better understand their customers. They also apply automation to their compliance and control tasks, scanning code for vulnerabilities as it is committed, and also creating systems that systemically grant and enforce use of least-privilege IAM roles. NuBank – Headquartered in Brazil and serving over 10 million customers, NuBank has been recognized by Fast Company as one of the most innovative companies in the world. They were founded in 2013 and reached unicorn status (a valuation of one billion dollars), just four years later. After their most recent round of funding, their valuation has jumped to ten billion dollars. Here are some resources to help you learn more about how they use AWS: NuBank Case Study. How the Cloud Helps NuBank Support Millions of Daily Customers (video). Starling – Headquartered in London and founded in 2014, Starling is backed by over $300M in funding. Their mobile apps provide instant notification of transactions, support freezing and unfreezing of cards, and provide in-app chat with customer service representatives. Here are some resources to help you learn more about how they use AWS: They Said it Couldn’t Be Done – Starling Bank. Starling Bank Case Study. Automated Privilege Management via Slack with Starling Bank on AWS (video). Both banks are strong supporters of open banking, with support for APIs that allow third-party developers to build applications and services (read more about the NuBank API and the Starling API). I found two of the videos (How the Cloud… and Automated Privilege Management…) particularly interesting. The two videos detail how NuBank and Starling have implemented Compliance as Code, with an eye toward simplifying permissions management and increasing the overall security profile of their respective banks. I hope that you have enjoyed this quick look at how two next-generation banks are making use of AWS. The videos that I linked above contain tons of great technical information that you should also find of interest! — Jeff;            

Now Available – EC2 Instances (G4) with NVIDIA T4 Tensor Core GPUs

The NVIDIA-powered G4 instances that I promised you earlier this year are available now and you can start using them today in eight AWS regions, in six sizes! You can use them for machine learning training & inferencing, video transcoding, game streaming, and remote graphics workstation applications.

The instances are equipped with up to four NVIDIA T4 Tensor Core GPUs, each with 320 Turing Tensor cores, 2,560 CUDA cores, and 16 GB of memory. The T4 GPUs are ideal for machine learning inferencing, computer vision, video processing, and real-time speech & natural language processing. The T4 GPUs also offer RT cores for efficient, hardware-powered ray tracing. The NVIDIA Quadro Virtual Workstation (Quadro vWS) is available in AWS Marketplace. It supports real-time ray-traced rendering and can speed creative workflows often found in media & entertainment, architecture, and oil & gas applications.

G4 instances are powered by AWS-custom Second Generation Intel® Xeon® Scalable (Cascade Lake) processors with up to 64 vCPUs, and are built on the AWS Nitro system. Nitro’s local NVMe storage building block provides direct access to up to 1.8 TB of fast, local NVMe storage. Nitro’s network building block delivers high-speed ENA networking. The Intel AVX512-Deep Learning Boost feature extends AVX-512 with a new set of Vector Neural Network Instructions (VNNI for short). These instructions accelerate the low-precision multiply & add operations that reside in the inner loop of many inferencing algorithms.

Here are the instance sizes:

Instance Name | NVIDIA T4 Tensor Core GPUs | vCPUs | RAM | Local Storage | EBS Bandwidth | Network Bandwidth
g4dn.xlarge | 1 | 4 | 16 GiB | 1 x 125 GB | Up to 3.5 Gbps | Up to 25 Gbps
g4dn.2xlarge | 1 | 8 | 32 GiB | 1 x 225 GB | Up to 3.5 Gbps | Up to 25 Gbps
g4dn.4xlarge | 1 | 16 | 64 GiB | 1 x 225 GB | Up to 3.5 Gbps | Up to 25 Gbps
g4dn.8xlarge | 1 | 32 | 128 GiB | 1 x 900 GB | 7 Gbps | 50 Gbps
g4dn.12xlarge | 4 | 48 | 192 GiB | 1 x 900 GB | 7 Gbps | 50 Gbps
g4dn.16xlarge | 1 | 64 | 256 GiB | 1 x 900 GB | 7 Gbps | 50 Gbps

We are also working on a bare metal instance that will be available in the coming months:

Instance Name | NVIDIA T4 Tensor Core GPUs | vCPUs | RAM | Local Storage | EBS Bandwidth | Network Bandwidth
g4dn.metal | 8 | 96 | 384 GiB | 2 x 900 GB | 14 Gbps | 100 Gbps

If you want to run graphics workloads on G4 instances, be sure to use the latest version of the NVIDIA AMIs (available in AWS Marketplace) so that you have access to the requisite GRID and Graphics drivers, along with an NVIDIA Quadro Workstation image that contains the latest optimizations and patches. Here’s where you can find them:

NVIDIA Gaming – Windows Server 2016
NVIDIA Gaming – Windows Server 2019
NVIDIA Gaming – Ubuntu 18.04

The newest AWS Deep Learning AMIs include support for G4 instances. The team that produces the AMIs benchmarked a g3.16xlarge instance against a g4dn.12xlarge instance and shared the results with me. Here are some highlights:

MxNet Inference (resnet50v2, forward pass without MMS) – 2.03 times faster.
MxNet Inference (with MMS) – 1.45 times faster.
MxNet Training (resnet50_v1b, 1 GPU) – 2.19 times faster.
Tensorflow Inference (resnet50v1.5, forward pass) – 2.00 times faster.
Tensorflow Inference with Tensorflow Service (resnet50v2) – 1.72 times faster.
Tensorflow Training (resnet50_v1.5) – 2.00 times faster.

The benchmarks used FP32 numeric precision; you can expect an even larger boost if you use mixed precision (FP16) or low precision (INT8).

You can launch G4 instances today in the US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), Europe (Frankfurt), Europe (Ireland), Europe (London), Asia Pacific (Seoul), and Asia Pacific (Tokyo) Regions. We are also working to make them accessible in Amazon SageMaker and in Amazon EKS clusters.

— Jeff;
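If you want to try a G4 instance from code rather than the console, here is a minimal boto3 sketch (not from the post); the AMI ID, key pair, and security group are hypothetical placeholders you would replace with your own (for graphics workloads, pick one of the NVIDIA AMIs listed above).

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single g4dn.xlarge instance, the smallest size in the table above.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",             # hypothetical AMI ID
    InstanceType="g4dn.xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                       # hypothetical key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],   # hypothetical security group
)

print("Launched:", response["Instances"][0]["InstanceId"])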

New – Step Functions Support for Dynamic Parallelism

Microservices make applications easier to scale and faster to develop, but coordinating the components of a distributed application can be a daunting task. AWS Step Functions is a fully managed service that makes coordinating tasks easier by letting you design and run workflows that are made of steps, each step receiving as input the output of the previous step. For example, Novartis Institutes for Biomedical Research is using Step Functions to empower scientists to run image analysis without depending on cluster experts.

Step Functions added some very interesting capabilities recently, such as callback patterns, to simplify the integration of human activities and third-party services, and nested workflows, to assemble together modular, reusable workflows. Today, we are adding support for dynamic parallelism within a workflow!

How Dynamic Parallelism Works
State machines are defined using the Amazon States Language, a JSON-based structured language. The Parallel state can be used to execute in parallel a fixed number of branches defined in the state machine. Now, Step Functions supports a new Map state type for dynamic parallelism.

To configure a Map state, you define an Iterator, which is a complete sub-workflow. When a Step Functions execution enters a Map state, it will iterate over a JSON array in the state input. For each item, the Map state will execute one sub-workflow, potentially in parallel. When all sub-workflow executions complete, the Map state will return an array containing the output for each item processed by the Iterator.

You can configure an upper bound on how many concurrent sub-workflows Map executes by adding the MaxConcurrency field. The default value is 0, which places no limit on parallelism and iterations are invoked as concurrently as possible. A MaxConcurrency value of 1 invokes the Iterator one element at a time, in the order of their appearance in the input state, and will not start an iteration until the previous iteration has completed execution.

One way to use the new Map state is to leverage fan-out or scatter-gather messaging patterns in your workflows:

Fan-out is applied when delivering a message to multiple destinations, and can be useful in workflows such as order processing or batch data processing. For example, you can retrieve arrays of messages from Amazon SQS and Map will send each message to a separate AWS Lambda function.
Scatter-gather broadcasts a single message to multiple destinations (scatter) and then aggregates the responses back for the next steps (gather). This can be useful in file processing and test automation. For example, you can transcode ten 500 MB media files in parallel and then join to create a 5 GB file.

Like Parallel and Task states, Map supports Retry and Catch fields to handle service and custom exceptions. You can also apply Retry and Catch to states inside your Iterator to handle exceptions. If any Iterator execution fails because of an unhandled error or by transitioning to a Fail state, the entire Map state is considered to have failed and all its iterations are stopped. If the error is not handled by the Map state itself, Step Functions stops the workflow execution with an error.

Using the Map State
Let’s build a workflow to process an order and, by using the Map state, work on the items in the order in parallel.
The tasks executed as part of this workflow are all Lambda functions, but with Step Functions you can use other AWS service integrations and have code running on EC2 instances, containers, or on-premises infrastructure.

Here’s our sample order, expressed as a JSON document, for a few books, plus some coffee to drink while reading them. The order has a detail section where there is a list of items that are part of the order.

{
  "orderId": "12345678",
  "orderDate": "20190820101213",
  "detail": {
    "customerId": "1234",
    "deliveryAddress": "123, Seattle, WA",
    "deliverySpeed": "1-day",
    "paymentMethod": "aCreditCard",
    "items": [
      { "productName": "Agile Software Development", "category": "book", "price": 60.0, "quantity": 1 },
      { "productName": "Domain-Driven Design", "category": "book", "price": 32.0, "quantity": 1 },
      { "productName": "The Mythical Man Month", "category": "book", "price": 18.0, "quantity": 1 },
      { "productName": "The Art of Computer Programming", "category": "book", "price": 180.0, "quantity": 1 },
      { "productName": "Ground Coffee, Dark Roast", "category": "grocery", "price": 8.0, "quantity": 6 }
    ]
  }
}

To process this order, I am using a state machine defining how the different tasks should be executed. The Step Functions console creates a visual representation of the workflow I am building: First, I validate and check the payment. Then, I process the items in the order, potentially in parallel, to check their availability, prepare for delivery, and start the delivery process. At the end, a summary of the order is sent to the customer. In case the payment check fails, I intercept that, for example to send a notification to the customer.

Here is the same state machine definition expressed as a JSON document. The ProcessAllItems state is using Map to process items in the order in parallel. In this case, I limit concurrency to 3 using the MaxConcurrency field. Inside the Iterator, I can put a sub-workflow of arbitrary complexity. In this case, I have three steps, to CheckAvailability, PrepareForDelivery, and StartDelivery of the item. Each of these steps can Retry and Catch errors to make the sub-workflow execution more reliable, for example in case of integrations with external services.
{ "StartAt": "ValidatePayment", "States": { "ValidatePayment": { "Type": "Task", "Resource": "arn:aws:lambda:us-west-2:123456789012:function:validatePayment", "Next": "CheckPayment" }, "CheckPayment": { "Type": "Choice", "Choices": [ { "Not": { "Variable": "$.payment", "StringEquals": "Ok" }, "Next": "PaymentFailed" } ], "Default": "ProcessAllItems" }, "PaymentFailed": { "Type": "Task", "Resource": "arn:aws:lambda:us-west-2:123456789012:function:paymentFailed", "End": true }, "ProcessAllItems": { "Type": "Map", "InputPath": "$.detail", "ItemsPath": "$.items", "MaxConcurrency": 3, "Iterator": { "StartAt": "CheckAvailability", "States": { "CheckAvailability": { "Type": "Task", "Resource": "arn:aws:lambda:us-west-2:123456789012:function:checkAvailability", "Retry": [ { "ErrorEquals": [ "TimeOut" ], "IntervalSeconds": 1, "BackoffRate": 2, "MaxAttempts": 3 } ], "Next": "PrepareForDelivery" }, "PrepareForDelivery": { "Type": "Task", "Resource": "arn:aws:lambda:us-west-2:123456789012:function:prepareForDelivery", "Next": "StartDelivery" }, "StartDelivery": { "Type": "Task", "Resource": "arn:aws:lambda:us-west-2:123456789012:function:startDelivery", "End": true } } }, "ResultPath": "$.detail.processedItems", "Next": "SendOrderSummary" }, "SendOrderSummary": { "Type": "Task", "InputPath": "$.detail.processedItems", "Resource": "arn:aws:lambda:us-west-2:123456789012:function:sendOrderSummary", "ResultPath": "$.detail.summary", "End": true } } } The Lambda functions used by this workflow are not aware of the overall structure of the order JSON document. They just need to know the part of the input state they are going to process. This is a best practice to make those functions easily reusable in multiple workflows. The state machine definition is manipulating the path used for the input and the output of the functions using JsonPath syntax via the InputPath, ItemsPath, ResultPath, and OutputPath fields: InputPath is used to filter the data in the input state, for example to only pass the detail of the order to the Iterator. ItemsPath is specific to the Map state and is used to identify where, in the input, the array field to process is found, for example to process the items inside the detail of the order. ResultPath makes it possible to add the output of a task to the input state, and not overwrite it completely, for example to add a summary to the detail of the order. I am not using OutputPath this time, but it could be useful to filter out unwanted information and pass only the portion of JSON that you care about to the next state. For example, to send as output only the detail of the order. Optionally, the Parameters field may be used to customize the raw input used for each iteration. For example, the deliveryAddress is in the detail of the order, but not in each item. To have the Iterator have an index of the items, and access the deliveryAddress, I can add this to a Map state: "Parameters": { "index.$": "$$.Map.Item.Index", "item.$": "$$.Map.Item.Value", "deliveryAddress.$": "$.deliveryAddress" } Available Now This new feature is available today in all regions where Step Functions is offered. Dynamic parallelism was probably the most requested feature for Step Functions. It unblocks the implementation of new use cases and can help optimize existing ones. Let us know what are you going to use it for!

NoSQL Workbench for Amazon DynamoDB – Available in Preview

I am always impressed by the flexibility of Amazon DynamoDB, providing our customers a fully-managed key-value and document database that can easily scale from a few requests per month to millions of requests per second. The DynamoDB team released so many great features recently, from on-demand capacity, to support for native ACID transactions. Here’s a great recap of other recent DynamoDB announcements such as global tables, point-in-time recovery, and instant adaptive capacity. DynamoDB now encrypts all customer data at rest by default. However, switching mindset from a relational database to NoSQL is not that easy. Last year we had two amazing talks at re:Invent that can help you understand how DynamoDB works, and how you can use it for your use cases: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database by Jaso Sorenson Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB by Rick Houlihan To help you even further, we are introducing today in preview NoSQL Workbench for Amazon DynamoDB, a free, client-side application available for Windows and macOS to help you design and visualize your data model, run queries on your data, and generate the code for your application! The three main capabilities provided by the NoSQL Workbench are: Data modeler — to build new data models, adding tables and indexes, or to import, modify, and export existing data models. Visualizer — to visualize data models based on their applications access patterns, with sample data that you can add manually or import via a SQL query. Operation builder — to define and execute data-plane operations or generate ready-to-use sample code for them. To see how this new tool can simplify working with DynamoDB, let’s build an application to retrieve information on customers and their orders. Using the NoSQL Workbench In the Data modeler, I start by creating a CustomerOrders data model, and I add a table, CustomerAndOrders, to hold my customer data and the information on their orders. You can use this tool to create a simple data model where customers and orders are in two distinct tables, each one with their own primary keys. There would be nothing wrong with that. Here I’d like to show how this tool can also help you use more advanced design patterns. By having the customer and order data in a single table, I can construct queries that return all the data I need with a single interaction with DynamoDB, speeding up the performance of my application. As partition key, I use the customerId. This choice provides an even distribution of data across multiple partitions. The sort key in my data model will be an overloaded attribute, in the sense that it can hold different data depending on the item: A fixed string, for example customer, for the items containing the customer data. The order date, written using ISO 8601 strings such as 20190823, for the items containing orders. By overloading the sort key with these two possible values, I am able to run a single query that returns the customer data and the most recent orders. For this reason, I use a generic name for the sort key. In this case, I use sk. Apart from the partition key and the optional sort key, DynamoDB has a flexible schema, and the other attributes can be different for each item in a table. However, with this tool I have the option to describe in the data model all the possible attributes I am going to use for a table. In this way, I can check later that all the access patterns I need for my application work well with this data model. 
For this table, I add the following attributes:

customerName and customerAddress, for the items in the table containing customer data.
orderId and deliveryAddress, for the items in the table containing order data.

I am not adding an orderDate attribute, because for this data model the value will be stored in the sk sort key. For a real production use case, you would probably have many more attributes to describe your customers and orders, but I am trying to keep things simple enough here to show what you can do, without getting lost in details.

Another access pattern for my application is to be able to get a specific order by ID. For that, I add a global secondary index to my table, with orderId as partition key and no sort key. I add the table definition to the data model, and move on to the Visualizer. There, I update the table by adding some sample data. I add data manually, but I could import a few rows from a table in a MySQL database, for example to simplify a NoSQL migration from a relational database.

Now, I visualize my data model with the sample data to have a better understanding of what to expect from this table. For example, if I select a customerId, and I query for all the orders greater than a specific date, I also get the customer data at the end, because the string customer, stored in the sk sort key, is always greater than any date written in ISO 8601 syntax.

In the Visualizer, I can also see how the global secondary index on the orderId works. Interestingly, items without an orderId are not part of this index, so I get only 4 of the 6 items that are part of my sample data. This happens because DynamoDB writes a corresponding index entry only if the index sort key value is present in the item. If the sort key doesn’t appear in every table item, the index is said to be sparse. Sparse indexes are useful for queries over a subsection of a table.

I now commit my data model to DynamoDB. This step creates server-side resources such as tables and global secondary indexes for the selected data model, and loads the sample data. To do so, I need AWS credentials for an AWS account. I have the AWS Command Line Interface (CLI) installed and configured in the environment where I am using this tool, so I can just select one of my named profiles.

I move to the Operation builder, where I see all the tables in the selected AWS Region. I select the newly created CustomerAndOrders table to browse the data and build the code for the operations I need in my application. In this case, I want to run a query that, for a specific customer, selects all orders more recent than a date I provide. As we saw previously, the overloaded sort key would also return the customer data as the last item. The Operation builder can help you use the full syntax of DynamoDB operations, for example adding conditions and child expressions. In this case, I add the condition to only return orders where the deliveryAddress contains Seattle.

I have the option to execute the operation on the DynamoDB table, but this time I want to use the query in my application. To generate the code, I select between Python, JavaScript (Node.js), or Java. You can use the Operation builder to generate the code for all the access patterns that you plan to use with your application, using all the advanced features that DynamoDB provides, including ACID transactions.

Available Now
You can find how to set up NoSQL Workbench for Amazon DynamoDB (Preview) for Windows and macOS here. We welcome your suggestions in the DynamoDB discussion forum.
Let us know what you build with this new tool and how we can help you more!
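To round out the example, here is a similar hand-written sketch (again, not the Operation builder's generated output) of the two remaining access patterns discussed above: fetching an order by ID through the global secondary index, and filtering a customer's recent orders on deliveryAddress. The index name orderId-index is an assumption, since the post does not name it:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("CustomerAndOrders")

def get_order_by_id(order_id):
    """Look up a single order through the (hypothetically named) GSI."""
    response = table.query(
        IndexName="orderId-index",
        KeyConditionExpression=Key("orderId").eq(order_id),
    )
    return response["Items"]

def get_recent_seattle_orders(customer_id, since_date):
    """Recent orders for a customer, keeping only deliveries in Seattle."""
    response = table.query(
        KeyConditionExpression=Key("customerId").eq(customer_id)
        & Key("sk").gt(since_date),
        FilterExpression=Attr("deliveryAddress").contains("Seattle"),
    )
    return response["Items"]
```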

Learn about AWS Services & Solutions – September AWS Online Tech Talks

Learn about AWS Services & Solutions – September AWS Online Tech Talks Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now! Note – All sessions are free and in Pacific Time. Tech talks this month:   Compute: September 23, 2019 | 11:00 AM – 12:00 PM PT – Build Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case. September 26, 2019 | 1:00 PM – 2:00 PM PT – Self-Hosted WordPress: It’s Easier Than You Think – Learn how you can easily build a fault-tolerant WordPress site using Amazon Lightsail. October 3, 2019 | 11:00 AM – 12:00 PM PT – Lower Costs by Right Sizing Your Instance with Amazon EC2 T3 General Purpose Burstable Instances – Get an overview of T3 instances, understand what workloads are ideal for them, and understand how the T3 credit system works so that you can lower your EC2 instance costs today.   Containers: September 26, 2019 | 11:00 AM – 12:00 PM PT – Develop a Web App Using Amazon ECS and AWS Cloud Development Kit (CDK) – Learn how to build your first app using CDK and AWS container services.   Data Lakes & Analytics: September 26, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Provisioning Amazon MSK Clusters and Using Popular Apache Kafka-Compatible Tooling – Learn best practices on running Apache Kafka production workloads at a lower cost on Amazon MSK.   Databases: September 25, 2019 | 1:00 PM – 2:00 PM PT – What’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available. October 3, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Enterprise-Class Security, High-Availability, and Scalability with Amazon ElastiCache – Learn about new enterprise-friendly Amazon ElastiCache enhancements like customer managed key and online scaling up or down to make your critical workloads more secure, scalable and available.   DevOps: October 1, 2019 | 9:00 AM – 10:00 AM PT – CI/CD for Containers: A Way Forward for Your DevOps Pipeline – Learn how to build CI/CD pipelines using AWS services to get the most out of the agility afforded by containers.   Enterprise & Hybrid: September 24, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: How to Monitor and Manage Your AWS Costs – Learn how to visualize and manage your AWS cost and usage in this virtual hands-on workshop. October 2, 2019 | 1:00 PM – 2:00 PM PT – Accelerate Cloud Adoption and Reduce Operational Risk with AWS Managed Services – Learn how AMS accelerates your migration to AWS, reduces your operating costs, improves security and compliance, and enables you to focus on your differentiating business priorities.   IoT: September 25, 2019 | 9:00 AM – 10:00 AM PT – Complex Monitoring for Industrial with AWS IoT Data Services – Learn how to solve your complex event monitoring challenges with AWS IoT Data Services.   Machine Learning: September 23, 2019 | 9:00 AM – 10:00 AM PT – Training Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker. 
September 30, 2019 | 11:00 AM – 12:00 PM PT – Using Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments. October 3, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: Getting Hands-On with Machine Learning and Ready to Race in the AWS DeepRacer League – Join DeClercq Wentzel, Senior Product Manager for AWS DeepRacer, for a presentation on the basics of machine learning and how to build a reinforcement learning model that you can use to join the AWS DeepRacer League.   AWS Marketplace: September 30, 2019 | 9:00 AM – 10:00 AM PT – Advancing Software Procurement in a Containerized World – Learn how to deploy applications faster with third-party container products.   Migration: September 24, 2019 | 11:00 AM – 12:00 PM PT – Application Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.   Networking & Content Delivery: September 25, 2019 | 11:00 AM – 12:00 PM PT – Building Highly Available and Performant Applications using AWS Global Accelerator – Learn how to build highly available and performant architectures for your applications with AWS Global Accelerator, now with source IP preservation. September 30, 2019 | 1:00 PM – 2:00 PM PT – AWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and Lambda@Edge? Get answers directly from our experts during AWS Office Hours.   Robotics: October 1, 2019 | 11:00 AM – 12:00 PM PT – Robots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.   Security, Identity & Compliance: October 1, 2019 | 1:00 PM – 2:00 PM PT – Deep Dive on Running Active Directory on AWS – Learn how to deploy Active Directory on AWS and start migrating your windows workloads.   Serverless: October 2, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.   Storage: September 24, 2019 | 9:00 AM – 10:00 AM PT – Optimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools – Learn how to use the Amazon S3 Storage Classes and management tools to better manage your data lake at scale and to optimize storage costs and resources. October 2, 2019 | 11:00 AM – 12:00 PM PT – The Great Migration to Cloud Storage: Choosing the Right Storage Solution for Your Workload – Learn more about AWS storage services and identify which service is the right fit for your business.    

Learn From Your VPC Flow Logs With Additional Meta-Data

Flow Logs for Amazon Virtual Private Cloud enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow Logs data can be published to Amazon CloudWatch Logs or Amazon Simple Storage Service (S3). Since we launched VPC Flow Logs in 2015, you have been using it for a variety of use cases, like troubleshooting connectivity issues across your VPCs, intrusion detection, anomaly detection, or archival for compliance purposes. Until today, VPC Flow Logs provided information that included source IP, source port, destination IP, destination port, action (accept, reject) and status. Once enabled, VPC Flow Log entries are space-separated records containing these fields. While this information was sufficient to understand most flows, it required additional computation and lookup to match IP addresses to instance IDs or to guess the directionality of the flow to come to meaningful conclusions. Today we are announcing the availability of additional meta-data to include in your Flow Logs records to better understand network flows. The enriched Flow Logs will allow you to simplify your scripts or remove the need for post-processing altogether, by reducing the number of computations or lookups required to extract meaningful information from the log data. When you create a new VPC Flow Log, in addition to existing fields, you can now choose to add the following meta-data: vpc-id : the ID of the VPC containing the source Elastic Network Interface (ENI). subnet-id : the ID of the subnet containing the source ENI. instance-id : the Amazon Elastic Compute Cloud (EC2) instance ID of the instance associated with the source interface. When the ENI is placed by AWS services (for example, AWS PrivateLink, NAT Gateway, Network Load Balancer, etc.), this field will be "-". tcp-flags : the bitmask for TCP flags observed within the aggregation period. For example, FIN is 0x01 (1), SYN is 0x02 (2), ACK is 0x10 (16), SYN + ACK is 0x12 (18), etc. (the bits are specified in the "Control Bits" section of RFC 793, "Transmission Control Protocol Specification"). This allows you to understand who initiated or terminated the connection. TCP uses a three-way handshake to establish a connection. The connecting machine sends a SYN packet to the destination, the destination replies with a SYN + ACK and, finally, the connecting machine sends an ACK. In the Flow Logs, the handshake is shown as two lines, with tcp-flags values of 2 (SYN) and 18 (SYN + ACK). ACK is reported only when it is accompanied by SYN (otherwise it would be too much noise for you to filter out). type : the type of traffic: IPv4, IPv6, or Elastic Fabric Adapter. pkt-srcaddr : the packet-level IP address of the source. You typically use this field in conjunction with srcaddr to distinguish between the IP address of an intermediate layer through which traffic flows, such as a NAT gateway. pkt-dstaddr : the packet-level destination IP address, similar to the previous one, but for destination IP addresses.
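For example, the tcp-flags bitmask can be decoded with a few lines of code; this sketch uses the control bit values from RFC 793 mentioned above:

```python
# TCP flag bits as defined in the "Control Bits" section of RFC 793.
TCP_FLAGS = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
}

def decode_tcp_flags(bitmask):
    """Return the names of the flags OR'ed into a tcp-flags value."""
    return [name for bit, name in TCP_FLAGS.items() if bitmask & bit]

print(decode_tcp_flags(2))   # ['SYN']          connection initiated
print(decode_tcp_flags(18))  # ['SYN', 'ACK']   reply from the destination
print(decode_tcp_flags(19))  # ['FIN', 'SYN', 'ACK']  short connection
```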
To create a VPC Flow Log, you can use the AWS Management Console, the AWS Command Line Interface (CLI) or the CreateFlowLogs API, and select which additional fields to include and the order in which you want them. For example, using the AWS Command Line Interface (CLI): $ aws ec2 create-flow-logs --resource-type VPC \ --region eu-west-1 \ --resource-ids vpc-12345678 \ --traffic-type ALL \ --log-destination-type s3 \ --log-destination arn:aws:s3:::sst-vpc-demo \ --log-format '${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} ${account-id} ${type} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${pkt-srcaddr} ${pkt-dstaddr} ${protocol} ${bytes} ${packets} ${start} ${end} ${action} ${tcp-flags} ${log-status}' # be sure to replace the bucket name and VPC ID ! { "ClientToken": "1A....HoP=", "FlowLogIds": [ "fl-12345678123456789" ], "Unsuccessful": [] } Enriched VPC Flow Logs are delivered to S3. We will automatically add the required S3 Bucket Policy to authorize VPC Flow Logs to write to your S3 bucket. VPC Flow Logs does not capture real-time log streams for your network interface; it might take several minutes to begin collecting and publishing data to the chosen destinations. Your logs will eventually be available on S3 at s3://<bucket name>/AWSLogs/<account id>/vpcflowlogs/<region>/<year>/<month>/<day>/ An SSH connection from my laptop with IP address 90.90.0.200 to an EC2 instance would appear like this: 3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 1566328660 1566328672 ACCEPT 18 OK 3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 90.90.0.200 172.31.22.145 62897 22 90.90.0.200 172.31.22.145 6 4877 29 1566328660 1566328672 ACCEPT 2 OK 172.31.22.145 is the private IP address of the EC2 instance, the one you see when you type ifconfig on the instance. All flags are "OR"ed during the aggregation period. When the connection is short, both SYN and FIN (3), as well as SYN + ACK and FIN (19), will probably be set for the same lines. Once a Flow Log is created, you cannot add additional fields or modify the structure of the log; this ensures you will not accidentally break scripts consuming this data. Any modification requires you to delete and recreate the VPC Flow Log. There is no additional cost to capture the extra information in VPC Flow Logs; normal VPC Flow Log pricing applies. Remember that enriched VPC Flow Log records might consume more storage when you select all fields. We recommend selecting only the fields relevant to your use cases. Enriched VPC Flow Logs are available in all regions where VPC Flow Logs is available; you can start using it today. -- seb PS: I heard from the team they are working on adding additional meta-data to the logs, stay tuned for updates.
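To work with the enriched records programmatically, a minimal parsing sketch might look like this, assuming the same field order passed to --log-format in the example above:

```python
# Field order must match the --log-format string used when the flow log
# was created (the same order as the create-flow-logs example above).
FIELDS = [
    "version", "vpc-id", "subnet-id", "instance-id", "interface-id",
    "account-id", "type", "srcaddr", "dstaddr", "srcport", "dstport",
    "pkt-srcaddr", "pkt-dstaddr", "protocol", "bytes", "packets",
    "start", "end", "action", "tcp-flags", "log-status",
]

def parse_record(line):
    """Turn one space-separated flow log record into a dict."""
    return dict(zip(FIELDS, line.split()))

record = parse_record(
    "3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 "
    "IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 "
    "6 5225 24 1566328660 1566328672 ACCEPT 18 OK"
)
print(record["instance-id"], record["tcp-flags"])  # i-0bfxxxxxxaf 18
```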

Now Available – Amazon Quantum Ledger Database (QLDB)

Given the wide range of data types, query models, indexing options, scaling expectations, and performance requirements, databases are definitely not one-size-fits-all products. That's why there are many different AWS database offerings, each one purpose-built to meet the needs of a different type of application. Introducing QLDB Today I would like to tell you about Amazon QLDB, the newest member of the AWS database family. First announced at AWS re:Invent 2018 and made available in preview form, it is now available in production form in five AWS regions. As a ledger database, QLDB is designed to provide an authoritative data source (often known as a system of record) for stored data. It maintains a complete, immutable history of all committed changes to the data that cannot be updated, altered, or deleted. QLDB supports PartiQL SQL queries against the historical data, and also provides an API that allows you to cryptographically verify that the history is accurate and legitimate. These features make QLDB a great fit for banking & finance, ecommerce, transportation & logistics, HR & payroll, manufacturing, and government applications, and many other use cases that need to maintain the integrity and history of stored data. Important QLDB Concepts Let's review the most important QLDB concepts before diving in: Ledger – A QLDB ledger consists of a set of QLDB tables and a journal that maintains the complete, immutable history of changes to the tables. Ledgers are named and can be tagged. Journal – A journal consists of a sequence of blocks, each cryptographically chained to the previous block so that changes can be verified. Blocks, in turn, contain the actual changes that were made to the tables, indexed for efficient retrieval. This append-only model ensures that previous data cannot be edited or deleted, and makes the ledgers immutable. QLDB allows you to export all or part of a journal to S3. Table – Tables exist within a ledger, and contain a collection of document revisions. Tables support optional indexes on document fields; the indexes can improve performance for queries that make use of the equality (=) predicate. Documents – Documents exist within tables, and must be in Amazon Ion form. Ion is a superset of JSON that adds additional data types, type annotations, and comments. QLDB supports documents that contain nested JSON elements, and gives you the ability to write queries that reference and include these elements. Documents need not conform to any particular schema, giving you the flexibility to build applications that can easily adapt to changes. PartiQL – PartiQL is a new open standard query language that supports SQL-compatible access to relational, semi-structured, and nested data while remaining independent of any particular data source. To learn more, read Announcing PartiQL: One Query Language for All Your Data. Serverless – You don't have to worry about provisioning capacity or configuring read & write throughput. You create a ledger, define your tables, and QLDB will automatically scale to meet the needs of your application. Using QLDB You can create QLDB ledgers and tables from the AWS Management Console, AWS Command Line Interface (CLI), a CloudFormation template, or by making calls to the QLDB API. I'll use the QLDB Console and follow the steps in Getting Started with Amazon QLDB.
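If you prefer to script the setup instead of using the console, here is a minimal sketch using the AWS SDK for Python; the ledger name matches the walk-through below, while the tag is just an illustrative assumption:

```python
import time
import boto3

qldb = boto3.client("qldb")

# Create the ledger used in the walk-through. ALLOW_ALL manages table
# access with IAM permissions on the ledger itself.
qldb.create_ledger(
    Name="vehicle-registration",
    PermissionsMode="ALLOW_ALL",
    DeletionProtection=True,
    Tags={"purpose": "tutorial"},  # hypothetical tag
)

# The ledger starts in CREATING state and becomes ACTIVE within a minute or two.
while qldb.describe_ledger(Name="vehicle-registration")["State"] != "ACTIVE":
    time.sleep(10)
print("Ledger is active")
```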
I open the console and click Start tutorial to get started: The Getting Started page outlines the first three steps; I click Create ledger to proceed (this opens in a fresh browser tab): I enter a name for my ledger (vehicle-registration), tag it, and (again) click Create ledger to proceed: My ledger starts out in Creating status, and transitions to Active within a minute or two: I return to the Getting Started page, refresh the list of ledgers, choose my new ledger, and click Load sample data: This takes a second or so, and creates four tables & six indexes: I could also use PartiQL statements such as CREATE TABLE, CREATE INDEX, and INSERT INTO to accomplish the same task. With my tables, indexes, and sample data loaded, I click on Editor and run my first query (a single-table SELECT): This returns a single row, and also benefits from the index on the VIN field. I can also run a more complex query that joins two tables: I can obtain the ID of a document (using a query from here), and then update the document: I can query the modification history of a table or a specific document in a table, with the ability to find modifications within a certain range and on a particular document (read Querying Revision History to learn more). Here's a simple query that returns the history of modifications to all of the documents in the VehicleRegistration table that were made on the day that I wrote this post: As you can see, each row is a structured JSON object. I can select any desired rows and click View JSON for further inspection: Earlier, I mentioned that PartiQL can deal with nested data. The VehicleRegistration table contains ownership information that looks like this: { "Owners":{ "PrimaryOwner":{ "PersonId":"6bs0SQs1QFx7qN1gL2SE5G" }, "SecondaryOwners":[ ] } } PartiQL lets me reference the nested data using "." notation: I can also verify the integrity of a document that is stored within my ledger's journal. This is fully described in Verify a Document in a Ledger, and is a great example of the power (and value) of cryptographic verification. Each QLDB ledger has an associated digest. The digest is a 256-bit hash value that uniquely represents the ledger's entire history of document revisions as of a point in time. To access the digest, I select a ledger and click Get digest: When I click Save, the console provides me with a short file that contains all of the information needed to verify the ledger. I save this file in a safe place, for use when I want to verify a document in the ledger. When that time comes, I get the file, click Verification in the left navigation, and enter the values needed to perform the verification. This includes the block address of a document revision, and the ID of the document. I also choose the digest that I saved earlier, and click Verify: QLDB recomputes the hashes to ensure that the document has not been surreptitiously changed, and displays the verification: In a production environment, you would use the QLDB APIs to periodically download digests and to verify the integrity of your documents. Building Applications with QLDB You can use the Amazon QLDB Driver for Java to write code that accesses and manipulates your ledger database. This is a Java driver that allows you to create sessions, execute PartiQL commands within the scope of a transaction, and retrieve results. Drivers for other languages are in the works; stay tuned for more information. Available Now Amazon QLDB is available now in the US East (N.
Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) Regions. Pricing is based on the following factors, and is detailed on the Amazon QLDB Pricing page, including some real-world examples: Write operations Read operations Journal storage Indexed storage Data transfer — Jeff;
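For the programmatic verification workflow mentioned above, here is a minimal sketch of downloading a ledger digest with the AWS SDK for Python, assuming the vehicle-registration ledger from the walk-through:

```python
import base64
import boto3

qldb = boto3.client("qldb")

# Request the current digest for the ledger; the response contains the
# 256-bit digest plus the address of the journal block it covers.
response = qldb.get_digest(Name="vehicle-registration")

digest = base64.b64encode(response["Digest"]).decode("utf-8")
tip_address = response["DigestTipAddress"]["IonText"]

# Store these two values somewhere safe; they are the inputs you need
# later when verifying a document revision.
print("Digest:", digest)
print("Digest tip address:", tip_address)
```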

Introducing the Newest AWS Heroes – September 2019

Leaders within the AWS technical community educate others about the latest AWS services in a variety of ways: some share knowledge in person by speaking at events or running workshops and Meetups; while others prefer to share their insights online via social media, blogs, or open source contributions. The most prominent AWS community leaders worldwide are recognized as AWS Heroes, and today we are excited to introduce to you the latest members of the AWS Hero program: Alex Schultz – Fort Wayne, USA Machine Learning Hero Alex Schultz works in the Innovation Labs at Advanced Solutions where he develops machine learning enabled products and solutions for the biomedical and product distribution industries. After receiving a DeepLens at re:Invent 2017, he dove headfirst into machine learning where he used the device to win the AWS DeepLens challenge by building a project which can read books to children. As an active advocate for AWS, he runs the Fort Wayne AWS User Group and loves to share his knowledge and experience with other developers. He also regularly contributes to the online DeepRacer community where he has helped many people who are new to machine learning get started.               Chase Douglas – Portland, USA Serverless Hero Chase Douglas is the CTO and co-founder at Stackery, where he steers engineering and technical architecture of development tools that enable individuals and teams of developers to successfully build and manage serverless applications. He is a deeply experienced software architect and a long-time engineering leader focused on building products that increase development efficiency while delighting users. Chase is also a frequent conference speaker on the topics of serverless, instrumentation, and software patterns. Most recently, he discussed the serverless development-to-production pipeline at the Chicago and New York AWS Summits, and provided insight into how the future of serverless may be functionless in his blog series.             Chris Williams – Portsmouth, USA Community Hero Chris Williams is an Enterprise Cloud Consultant for GreenPages Technology Solutions—a digital transformation and cloud enablement company. There he helps customers design and deploy the next generation of public, private, and hybrid cloud solutions, specializing in AWS and VMware. Chris blogs about virtualization, technology, and design at Mistwire. He is an active community leader, co-organizing the AWS Portsmouth User Group, and both hosts and presents on vBrownBag.                 Dave Stauffacher – Milwaukee, USA Community Hero Dave Stauffacher is a Principal Platform Engineer focused on cloud engineering at Direct Supply where he has helped navigate a 30,000% data growth over the last 15 years. In his current role, Dave is focused on helping drive Direct Supply’s cloud migration, combining his storage background with cloud automation and standardization practices. Dave has published his automation work for deploying AWS Storage Gateway for use with SQL Server data protection. He is a participant in the Milwaukee AWS User Group and the Milwaukee Docker User Group and has showcased his cloud experience in presentations at the AWS Midwest Community Day, AWS re:Invent, HashiConf, the Milwaukee Big Data User Group and other industry events.             Gojko Adzic – London, United Kingdom Serverless Hero Gojko Adzic is a partner at Neuri Consulting LLP and a co-founder of MindMup, a collaborative mind mapping application that has been running on AWS Lambda since 2016. 
He is the author of the book Running Serverless and co-author of Serverless Computing: Economic and Architectural Impact, one of the first academic papers about AWS Lambda. He is also a key contributor to Claudia.js, an open-source tool that simplifies Lambda application deployment, and is a minor contributor to the AWS SAM CLI. Gojko frequently blogs about serverless application development on Serverless.Pub and his personal blog, and he has authored numerous other books.               Liz Rice – Enfield, United Kingdom Container Hero Liz Rice is VP Open Source Engineering with cloud native security specialists Aqua Security, where she and her team look after several container-related open source projects. She is chair of the CNCF’s Technical Oversight Committee, and was Co-Chair of the KubeCon + CloudNativeCon 2018 events in Copenhagen, Shanghai and Seattle. She is a regular speaker at conferences including re:Invent, Velocity, DockerCon and many more. Her talks usually feature live coding or demos, and she is known for making complex technical concepts accessible.                 Lyndon Leggate – London, United Kingdom Machine Learning Hero Lyndon Leggate is a senior technology leader with extensive experience of defining and delivering complex technical solutions on large, business critical projects for consumer facing brands. Lyndon is a keen participant in the AWS DeepRacer league. Racing as Etaggel, he has regularly positioned in the top 10, features in DeepRacer TV and in May 2019 established the AWS DeepRacer Community. This vibrant and rapidly growing community provides a space for new and experienced racers to seek advice and share tips. The Community has gone on to expand the DeepRacer toolsets, making the platform more accessible and pushing the bounds of the technology. He also organises the AWS DeepRacer London Meetup series.             Maciej Lelusz – Crakow, Poland Community Hero Maciej Lelusz is Co-Founder of Chaos Gears, a company concentrated on serverless, automation, IaC and chaos engineering as a way for the improvement of system resiliency. He is focused on community development, blogging, company management, and Public/Hybrid/Private cloud design. He cares about enterprise technology, IT transformation, its culture and people involved in it. Maciej is Co-Leader of the AWS User Group Poland – Cracow Chapter, and the Founder and Leader of the InfraXstructure conference and Polish VMware User Group.                 Nathan Glover – Perth, Australia Community Hero Nathan Glover is a DevOps Consultant at Mechanical Rock in Perth, Western Australia. Prior to that he worked as a Hardware Systems Designer and Embedded Developer in the IoT space. He is passionate about Cloud Native architecture and loves sharing his successes and failures on his blog. A key focus for him is breaking down the learning barrier by building practical examples using cloud services. On top of these he has a number of online courses teaching people how to get started building with Amazon Alexa Skills and AWS IoT. In his spare time, he loves to dabble in all areas of technology; building cloud connected toasters, embedded systems vehicle tracking, and competing in online capture the flag security events.             Prashanth HN – Bengaluru, India Serverless Hero Prashanth HN is the Chief Technology Officer at WheelsBox and one of the community leaders of the AWS Users Group, Bengaluru. 
He mentors and consults other startups to embrace a serverless approach, frequently blogs about serverless topics for all skill levels including topics for beginners and advanced users on his personal blog and Amplify-related topics on the AWS Amplify Community Blog, and delivers talks about building using microservices and serverless. In a recent talk, he demonstrated how microservices patterns can be implemented using serverless. Prashanth maintains the open-source project Lanyard, a serverless agenda app for event organizers and attendees, which was well received at AWS Community Day India.               Ran Ribenzaft – Tel Aviv, Israel Serverless Hero Ran Ribenzaft is the Chief Technology Officer at Epsagon, an AWS Advanced Technology Partner that specializes in monitoring and tracing for serverless applications. Ran is a passionate developer that loves sharing open-source tools to make everyone’s lives easier and writing technical blog posts on the topics of serverless, microservices, cloud, and AWS on Medium and the Epsagon blog. Ran is also dedicated to educating and growing the community around serverless, organizing Serverless meetups in SF and TLV, delivering online webinars and workshops, and frequently giving talks at conferences.                 Rolf Koski – Tampere, Finland Community Hero Rolf Koski works at Cybercom, which is an AWS Premier Partner from the Nordics headquartered in Sweden. He works as the CTO at Cybercom AWS Business Group. In his role he is both technical as well as being the thought leader in the Cloud. Rolf has been one of the leading figures at the Nordic AWS Communities as one of the community leads in Helsinki and Stockholm user groups and he initially founded and organized the first ever AWS Community Days Nordics. Rolf is professionally certified and additionally works as Well-Architected Lead doing Well-Architected Reviews for customer workloads.               Learn more about AWS Heroes and connect with a Hero near you by checking out the Hero website.

Optimize Storage Cost with Reduced Pricing for Amazon EFS Infrequent Access

Today we are announcing a new price reduction – one of the largest in AWS Cloud history to date – when using Infrequent Access (IA) with Lifecycle Management with Amazon Elastic File System. This price reduction makes it possible to optimize cost even further and automatically save up to 92% on file storage costs as your access patterns change. With this new reduced pricing you can now store and access your files natively in a file system for effectively $0.08/GB-month, as we'll see in an example later in this post. Amazon Elastic File System (EFS) is a low-cost, simple-to-use, fully managed, and cloud-native NFS file system for Linux-based workloads that can be used with AWS services and on-premises resources. EFS provides elastic storage, growing and shrinking automatically as files are created or deleted – even to petabyte scale – without disruption. Your applications always have the storage they need immediately available. EFS also includes, for free, multi-AZ availability and durability right out of the box, with strong file system consistency. Easy Cost Optimization using Lifecycle Management As storage grows, the likelihood that a given application needs access to all of the files all of the time lessens, and access patterns can also change over time. Industry analysts such as IDC, and our own analysis of usage patterns, confirm that around 80% of data is not accessed very often. The remaining 20% is in active use. Two common drivers for moving applications to the cloud are to maximize operational efficiency and to reduce the total cost of ownership, and this applies equally to storage costs. Instead of keeping all of the data on hand on the fastest performing storage, it may make sense to move infrequently accessed data into a different storage class/tier, with an associated cost reduction. Identifying this data manually can be a burden, so it's also ideal to have the system monitor access over time and perform the movement of data between storage tiers automatically, again without disruption to your running applications. EFS Infrequent Access (IA) with Lifecycle Management provides an easy to use, cost-optimized price and performance tier suitable for files that are not accessed regularly. With the new price reduction announced today, builders can now save up to 92% on their file storage costs compared to EFS Standard. EFS Lifecycle Management is easy to enable and runs automatically behind the scenes. When enabled on a file system, files not accessed according to the lifecycle policy you choose will be moved automatically to the cost-optimized EFS IA storage class. This movement is transparent to your application. Although the infrequently accessed data is held in a different storage class/tier, it's still immediately accessible. This is one of the advantages of EFS IA – you don't have to sacrifice any of EFS's benefits to get the cost savings. Your files are still immediately accessible, all within the same file system namespace. The only tradeoff is slightly higher per-operation latency (double digit ms vs single digit ms — think magnetic vs SSD) for the files in the IA storage class/tier. As an example of the cost optimization EFS IA provides, let's look at storage costs for 100 terabytes (100TB) of data. The EFS Standard storage class is currently charged at $0.30/GB-month. When it was launched in July, the EFS IA storage class was priced at $0.045/GB-month. It's now been reduced to $0.025/GB-month.
As I noted earlier, this is one of the largest price drops in the history of AWS to date! Using the 20/80 access statistic mentioned earlier for EFS IA: 20% of 100TB = 20TB at $0.30/GB-month = $0.30 x 20 x 1,000 = $6,000 80% of 100TB = 80TB at $0.025/GB-month = $0.025 x 80 x 1,000 = $2,000 Total for 100TB = $8,000/month or $0.08/GB-month. Remember, this price also includes (for free) multi-AZ, full elasticity, and strong file system consistency. Compare this to using only EFS Standard, where we store 100% of the data in that storage class: we get a cost of $0.30 x 100 x 1,000 = $30,000/month. $22,000/month is a significant saving and it's so easy to enable. Remember too that you have control over the lifecycle policy, specifying how frequently data is moved to the IA storage tier. Getting Started with Infrequent Access (IA) Lifecycle Management From the EFS Console I can quickly get started in creating a file system by choosing an Amazon Virtual Private Cloud and the subnets in the Virtual Private Cloud where I want to expose mount targets for my instances to connect to. In the next step I can configure options for the new file system. This is where I select the Lifecycle policy I want to apply to enable use of the EFS IA storage class. Here I am going to enable files that have not been accessed for 14 days to be moved to the IA tier automatically. In the final step I simply review my settings and then click Create File System to create the file system. Easy! A Lifecycle Management policy can also be enabled, or changed, for existing file systems. Navigating to the file system in the EFS Console, I can view the applied policy, if any. Here I've selected an existing file system that has no policy attached and therefore is not benefiting from EFS IA. Clicking the pencil icon to the right of the field takes me to a dialog box where I can select the appropriate Lifecycle policy, just as I did when creating a new file system. Amazon Elastic File System IA with Lifecycle Management is available now in all regions where Elastic File System is present. — Steve
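For teams that manage file systems from code rather than the console, here is a minimal sketch of applying the same 14-day Lifecycle policy with the AWS SDK for Python; the file system ID is a placeholder:

```python
import boto3

efs = boto3.client("efs")

FILE_SYSTEM_ID = "fs-12345678"  # placeholder, use your own file system ID

# Move files that have not been accessed for 14 days to the IA storage
# class, mirroring the Lifecycle policy selected in the console above.
efs.put_lifecycle_configuration(
    FileSystemId=FILE_SYSTEM_ID,
    LifecyclePolicies=[{"TransitionToIA": "AFTER_14_DAYS"}],
)

# Check which policy, if any, is currently applied.
current = efs.describe_lifecycle_configuration(FileSystemId=FILE_SYSTEM_ID)
print(current["LifecyclePolicies"])
```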

Operational Insights for Containers and Containerized Applications

The increasing adoption of containerized applications and microservices also brings an increased burden for monitoring and management. Builders have an expectation of, and requirement for, the same level of monitoring as would be used with longer-lived infrastructure such as Amazon Elastic Compute Cloud (EC2) instances. By contrast, containers are relatively short-lived, and usually subject to continuous deployment. This can make it difficult to reliably collect monitoring data and to analyze performance or other issues, which in turn affects remediation time. In addition, builders have to resort to a disparate collection of tools to perform this analysis and inspection, manually correlating context across a set of infrastructure and application metrics, logs, and other traces. Announcing general availability of Amazon CloudWatch Container Insights At the AWS Summit in New York this past July, Amazon CloudWatch Container Insights support for Amazon ECS and AWS Fargate was announced as an open preview for new clusters. Starting today, Container Insights is generally available, with the added ability to now also monitor existing clusters. Immediate insights into compute utilization and failures for both new and existing cluster infrastructure and containerized applications can be easily obtained from container management services including Kubernetes, Amazon Elastic Container Service for Kubernetes, Amazon ECS, and AWS Fargate. Once enabled, Amazon CloudWatch discovers all of the running containers in a cluster and collects performance and operational data at every layer in the container stack. It also continuously monitors and updates as changes occur in the environment, reducing the number of tools required to collect, monitor, act on, and analyze container metrics and logs, and giving complete end-to-end visibility. Being able to easily access this data means customers can shift focus to increased developer productivity and away from building mechanisms to curate and build dashboards. Getting started with Amazon CloudWatch Container Insights I can enable Container Insights by following the instructions in the documentation. Once it is enabled and new clusters are launched, when I visit the CloudWatch console for my region I see a new option for Container Insights in the list of dashboards available to me. Clicking this takes me to the relevant dashboard where I can select the container management service that hosts the clusters that I want to observe. In the image below I have selected to view metrics for my ECS Clusters that are hosting a sample application I have deployed in AWS Fargate. I can examine the metrics for standard time periods such as 1 hour, 3 hours, and so on, but can also specify custom time periods. Here I am looking at the metrics for a custom time period of the past 15 minutes. You can see that I can quickly gain operational oversight of the overall performance of the cluster. Clicking the cluster name takes me deeper to view the metrics for the tasks inside the cluster. Selecting a container allows me to then dive into either AWS X-Ray traces or performance logs. Selecting performance logs takes me to the Amazon CloudWatch Logs Insights page where I can run queries against the performance events collected for my container ecosystem (e.g., Container, Task/Pod, Cluster, etc.) that I can then use to troubleshoot and dive deeper.
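As an illustration of the kind of query you can run there, here is a minimal sketch using the AWS SDK for Python; the log group name follows the ECS Container Insights naming scheme and, along with the field names, should be treated as an assumption to adapt to your own cluster:

```python
import time
import boto3

logs = boto3.client("logs")

# Assumed log group name for an ECS cluster's Container Insights
# performance events; replace "demo-cluster" with your cluster name.
LOG_GROUP = "/aws/ecs/containerinsights/demo-cluster/performance"

query = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=int(time.time()) - 900,  # last 15 minutes
    endTime=int(time.time()),
    queryString=(
        "fields @timestamp, TaskId, CpuUtilized, MemoryUtilized "
        "| filter Type = 'Task' "
        "| sort @timestamp desc "
        "| limit 20"
    ),
)

# Poll until the query completes, then print the results.
while True:
    results = logs.get_query_results(queryId=query["queryId"])
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)
print(results["results"])
```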
Container Insights makes it easy for me to get started monitoring my containers and enables me to quickly drill down into performance metrics and log analytics without the need to build custom dashboards to curate data from multiple tools. Beyond monitoring and troubleshooting I can also use the data and dashboards Container Insights provides me to support other use cases such as capacity requirements and planning, by helping me understand compute utilization by Pod/Task, Container, and Service for example. Availability Amazon CloudWatch Container Insights is generally available today to customers in all public AWS regions where Amazon Elastic Container Service for Kubernetes, Kubernetes, Amazon ECS, and AWS Fargate are present. — Steve

New – Port Forwarding Using AWS System Manager Sessions Manager

I increasingly see customers adopting the immutable infrastructure architecture pattern: they rebuild and redeploy an entire infrastructure for each update. They very rarely connect to servers over SSH or RDP to update configuration or to deploy software updates. However, when migrating existing applications to the cloud, it is common to connect to your Amazon Elastic Compute Cloud (EC2) instances to perform a variety of management or operational tasks. To reduce the attack surface, AWS recommends using a bastion host, also known as a jump host. This special-purpose EC2 instance is designed to be the primary access point from the Internet and acts as a proxy to your other EC2 instances. To connect to your EC2 instance, you first SSH / RDP into the bastion host and, from there, to the destination EC2 instance. To further reduce the attack surface, as well as the operational burden and additional costs of managing bastion hosts, AWS Systems Manager Session Manager allows you to securely connect to your EC2 instances, without the need to run and to operate your own bastion hosts and without the need to run SSH on your EC2 instances. When the Systems Manager Agent is installed on your instances and you have IAM permissions to call the Systems Manager API, you can use the AWS Management Console or the AWS Command Line Interface (CLI) to securely connect to your Linux or Windows EC2 instances. An interactive shell on EC2 instances is not the only use case for SSH. Many customers are also using SSH tunnels to remotely access services not exposed to the public internet. SSH tunneling is a powerful but lesser-known feature of SSH that allows you to create a secure tunnel between a local host and a remote service. Let's imagine I am running a web server for easy private file transfer between an EC2 instance and my laptop. These files are private; I do not want anybody else to access that web server, therefore I configure my web server to bind only on 127.0.0.1 and I do not add port 80 to the instance's security group. Only local processes can access the web server. To access the web server from my laptop, I create an SSH tunnel between my laptop and the web server, as shown below: ssh -L 9999:localhost:80 ec2-user@instance This command tells SSH to connect to instance as user ec2-user, open port 9999 on my local laptop, and forward everything from there to localhost:80 on instance. When the tunnel is established, I can point my browser at http://localhost:9999 to connect to my private web server on port 80. Today, we are announcing Port Forwarding for AWS Systems Manager Session Manager. Port Forwarding allows you to securely create tunnels between your instances deployed in private subnets, without the need to start the SSH service on the server, open the SSH port in the security group, or use a bastion host. Similar to SSH tunnels, Port Forwarding allows you to forward traffic between your laptop and open ports on your instance. Once port forwarding is configured, you can connect to the local port and access the server application running inside the instance. Systems Manager Session Manager's Port Forwarding use is controlled through IAM policies on API access and the Port Forwarding SSM Document. These are two different places where you can control who in your organisation is authorised to create tunnels. To experiment with Port Forwarding today, you can use this CDK script to deploy a VPC with private and public subnets, and a single instance running a web server in the private subnet.
The drawing below illustrates the infrastructure that I am using for this blog post. The instance is private; it does not have a public IP address or a DNS name. The VPC Default Security Group does not authorise connections over SSH. The Systems Manager Agent, running on your EC2 instance, must be able to communicate with the Systems Manager service endpoint. The private subnet must therefore have a route to a NAT Gateway, or you must configure AWS PrivateLink to do so. Let's use Systems Manager Session Manager Port Forwarding to access the web server running on this private instance. Before doing so, you must ensure the following prerequisites are met on the EC2 instance: the Systems Manager Agent must be installed and running (version 2.3.672.0 or more recent, see instructions for Linux or Windows). The agent is installed and started by default on Amazon Linux 1 & 2, Windows and Ubuntu AMIs provided by Amazon (see this page for the exact versions); no action is required when you are using these. The EC2 instance must have an IAM role with permission to invoke the Systems Manager API. For this example, I am using AmazonSSMManagedInstanceCore. On your laptop, you must install the Session Manager plugin for the AWS CLI (version 1.1.26.0 or more recent) and use the latest version of the AWS Command Line Interface (CLI) (1.16.220 or more recent). Once the prerequisites are met, you use the AWS Command Line Interface (CLI) to create the tunnel (assuming you started the instance using the CDK script mentioned above): # find the instance ID based on Tag Name INSTANCE_ID=$(aws ec2 describe-instances \ --filter "Name=tag:Name,Values=CodeStack/NewsBlogInstance" \ --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" \ --output text) # create the port forwarding tunnel aws ssm start-session --target $INSTANCE_ID \ --document-name AWS-StartPortForwardingSession \ --parameters '{"portNumber":["80"],"localPortNumber":["9999"]}' Starting session with SessionId: sst-00xxx63 Port 9999 opened for sessionId sst-00xxx63 Connection accepted for session sst-00xxx63. You can now point your browser to port 9999 and access your private web server. Type ctrl-c to terminate the port forwarding session. The Session Manager Port Forwarding creates a tunnel similar to SSH tunneling, as illustrated below. Port Forwarding works for Windows and Linux instances. It is available in every public AWS region today, at no additional cost when connecting to EC2 instances; you will be charged for the outgoing bandwidth from the NAT Gateway or AWS PrivateLink. -- seb
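If you would rather locate the instance and start the session from code, here is a minimal sketch with the AWS SDK for Python; note that the returned stream is only useful to the Session Manager plugin (which the AWS CLI invokes for you), so the CLI remains the simplest way to actually open the local port:

```python
import boto3

ec2 = boto3.client("ec2")
ssm = boto3.client("ssm")

# Find the running instance created by the CDK script, using its Name tag.
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Name", "Values": ["CodeStack/NewsBlogInstance"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]
instance_id = reservations[0]["Instances"][0]["InstanceId"]

# Ask Session Manager to start a port forwarding session. The response
# contains a session ID, token, and stream URL; the Session Manager
# plugin is what actually opens local port 9999 and moves the traffic.
session = ssm.start_session(
    Target=instance_id,
    DocumentName="AWS-StartPortForwardingSession",
    Parameters={"portNumber": ["80"], "localPortNumber": ["9999"]},
)
print(session["SessionId"], session["StreamUrl"])
```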

New – Client IP Address Preservation for AWS Global Accelerator

AWS Global Accelerator is a network service that routes incoming network traffic to multiple AWS regions in order to improve performance and availability for your global applications. It makes use of our collection of edge locations and our congestion-free global network to direct traffic based on application health, network health, and the geographic locations of your users, and provides a set of static Anycast IP addresses that are announced from multiple AWS locations (read New – AWS Global Accelerator for Availability and Performance to learn a lot more). The incoming TCP or UDP traffic can be routed to an Application Load Balancer, Network Load Balancer, or to an Elastic IP Address. Client IP Address Preservation Today we are announcing an important new feature for AWS Global Accelerator. If you are routing traffic to an Application Load Balancer, the IP address of the user’s client is now available to code running on the endpoint. This allows you to apply logic that is specific to a particular IP address. For example, you can use security groups that filter based on IP address, and you can serve custom content to users based on their IP address or geographic location. You can also use the IP addresses to collect more accurate statistics on the geographical distribution of your user base. Using Client IP Address Preservation If you are already using AWS Global Accelerator, we recommend that you phase in your use of Client IP Address Preservation by using weights on the endpoints. This will allow you to verify that any rules or systems that make use of IP addresses continue to function as expected. In order to test this new feature, I launched some EC2 instances, set up an Application Load Balancer, put the instances into a target group, and created an accelerator in front of my ALB: I checked the IP address of my browser: I installed a simple Python program (courtesy of the Global Accelerator team), sent an HTTP request to one of the Global Accelerator’s IP addresses, and captured the output: The Source (99.82.172.36) is an internal address used by my accelerator. With my baseline established and everything working as expected, I am now ready to enable Client IP Address Preservation! I open the AWS Global Accelerator Console, locate my accelerator, and review the current configuration, as shown above. I click the listener for port 80, and click the existing endpoint group: From there I click Add endpoint, add a new endpoint to the group, use a Weight of 255, and select Preserve client IP address: My endpoint group now has two endpoints (one with client IP preserved, and one without), both of which point to the same ALB: In a production environment I would start with a low weight and test to make sure that any security groups or other logic that was dependent on IP addresses continue to work as expected (I can also use the weights to manage traffic during blue/green deployments and software updates). Since I’m simply testing, I can throw caution to the wind and delete the old (non-IP-preserving) endpoint. Either way, the endpoint change becomes effective within a couple of minutes, and I can refresh my test window: Now I can see that my code has access to the IP address of the browser (via the X-Forwarded-For header) and I can use it as desired. I can also use this IP address in security group rules. To learn more about best practices for switching over, read Transitioning Your ALB Endpoints to Use Client IP Address Preservation. 
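As a stand-in for the simple Python program mentioned above (this is my own minimal sketch, not the team's script), here is how backend code behind the ALB might read the preserved client IP from the X-Forwarded-For header:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class WhoAmIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Behind an ALB, the preserved client IP appears as the first entry
        # of X-Forwarded-For; the direct peer is the load balancer itself.
        forwarded_for = self.headers.get("X-Forwarded-For", "")
        client_ip = forwarded_for.split(",")[0].strip() or self.client_address[0]
        body = f"Client IP: {client_ip}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Listen on the port your ALB target group forwards to.
    HTTPServer(("0.0.0.0", 8080), WhoAmIHandler).serve_forever()
```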
Things to Know Here are a couple of important things to know about client IP preservation: Elastic Network Interface (ENI) Usage – The Global Accelerator creates one ENI for each subnet that contains IP-preserving endpoints, and will delete them when they are no longer required. Don’t edit or delete them. Security Groups – The Global Accelerator creates and manages a security group named GlobalAccelerator. Again, you should not edit or delete it. Available Now You can enable this new feature for Application Load Balancers in the US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), Europe (Ireland), Europe (Frankfurt), Europe (London), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Mumbai), and Asia Pacific (Sydney) Regions. — Jeff;
