AWS Cloud Operations Blog

Category: Management & Governance

Simplifying Prometheus metrics collection across your AWS infrastructure

If you’re running services such as Amazon EC2 instances, Amazon Elastic Container Service (Amazon ECS) containers, and Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters in AWS, maintaining separate Prometheus servers for each environment creates significant operational burden. Managing scraper configurations, high availability, scaling, and security distracts you from building great applications. AWS managed […]

AWS Unified Operations: Building Resilient Operations for Mission-Critical Workloads

Achieve Mission-Critical Resiliency at Scale with AWS Unified Operations: The Top Tier of AWS Support to Achieve High Availability, Faster Migrations, and Accelerated Incident Resolution. The Shift-Left Paradigm: From Reactive Firefighting to Proactive Prevention. Organizations running mission-critical workloads face three critical operational gaps that undermine resilience and slow cloud adoption. Skills gaps make cloud-native […]

Essential security controls to prevent unauthorized account removal in AWS Organizations

When AWS member accounts are compromised, attackers can remove them from your organization, disabling all governance controls. In this post, you’ll learn how to prevent compromised accounts from leaving your organization in AWS Organizations by using layered security controls, including service control policies, secure account migration, and centralized root access management. AWS secures the infrastructure […]

Adaptive sampling with AWS X-Ray to capture critical spans

Enterprise applications using AWS X-Ray generate large volumes of distributed tracing data across multiple services. Static sampling strategies keep costs down by capturing a fixed percentage of traffic. However, they frequently miss critical data during intermittent failures or sudden latency spikes. Tracing every request for maximum visibility at scale may increase sampling costs for […]

Automate AWS Systems Manager activation for hybrid-managed node registration

AWS Systems Manager (formerly known as SSM) is an AWS service that you can use to view and control your servers in the AWS Cloud and on on-premises infrastructure. Systems Manager makes it easy to manage a hybrid environment. To set up servers and virtual machines (VMs) in your hybrid environment as Systems Manager managed instances, you […]

Scaling AWS Governance: How Moeve reduced response times with automated notifications

Moeve, formerly known as Cepsa, is a global integrated energy company with over 90 years of experience and more than 11,000 employees. Moeve is committed to driving Europe’s energy transition and accelerating decarbonization efforts. The company has embraced digital transformation to enhance energy efficiency, safety, and sustainability, focusing on investments in green hydrogen, second-generation biofuels, […]

Simplify AWS Control Tower governance with enhanced AWS CloudFormation Hooks

Organizations using AWS Control Tower to govern their multi-account environments face a persistent challenge: when AWS CloudFormation deployments fail due to proactive control violations, teams receive minimal information about why the failure occurred or how to fix it. This lack of visibility leads to: Delayed deployments as developers struggle to understand cryptic error messages […]

Deploying custom Terraform to LZA-Managed Accounts with AFT

As organizations scale their AWS environments, managing infrastructure consistently while enabling team autonomy becomes increasingly challenging. Landing Zone Accelerator on AWS (LZA) and AWS Account Factory for Terraform (AFT) both extend AWS Control Tower to help customers manage AWS environments at scale, offering complementary strengths. Many AWS customers struggle to balance centralized security governance with […]

Investigating Service Issues with Amazon CloudWatch Application Signals Custom Metrics

When a critical service fails, you need to know how much revenue you’re losing, not just that latency has increased. This post shows you how to integrate business metrics with CloudWatch Application Signals to see both technical performance and business impact in one unified view. With CloudWatch Application Signals, you can view metrics, traces, and […]

Cross-Region AWS PrivateLink monitoring with Amazon CloudWatch Network Synthetic Monitor

Global, distributed AWS architectures are the backbone for customers seeking high availability, resilience, and regulatory compliance. Workloads are commonly deployed across multiple AWS Regions and Availability Zones (AZs), often using AWS PrivateLink to connect services securely and privately across Amazon Virtual Private Cloud (Amazon VPC) networks. This approach enhances security and separation while requiring […]