AWS Architecture Blog
Category: Learning Levels
Automate safety monitoring with computer vision and generative AI
This post describes a solution that uses fixed camera networks to monitor operational environments in near real-time, detecting potential safety hazards while capturing object floor projections and their relationships to floor markings. While we illustrate the approach through distribution center deployment examples, the underlying architecture applies broadly across industries. We explore the architectural decisions, strategies for scaling to hundreds of sites, reducing site onboarding time, synthetic data generation using generative AI tools like GLIGEN, and other critical technical hurdles we overcame.
Streamlining access to powerful disaster recovery capabilities of AWS
In this blog post, we take a building blocks approach. Starting with the tools like AWS Backup to protect your data, we then add protection for Amazon Elastic Compute Cloud (Amazon EC2) compute using AWS Elastic Disaster Recovery (AWS DRS). Finally, we show how to use the full capabilities of AWS to restore your entire workload—data, infrastructure, networking, and configuration, using Arpio disaster recovery automation.
Architecting for agentic AI development on AWS
In this post, we demonstrate how to architect AWS systems that enable AI agents to iterate rapidly through design patterns for both system architecture and code base structure. We first examine the architectural problems that limit agentic development today. We then walk through system architecture patterns that support rapid experimentation, followed by codebase patterns that help AI agents understand, modify, and validate your applications with confidence.
AI-powered event response for Amazon EKS
In this post, you’ll learn how AWS DevOps Agent integrates with your existing observability stack to provide intelligent, automated responses to system events.
How Convera built fine-grained API authorization with Amazon Verified Permissions
In this post, we share how Convera used Amazon Verified Permissions to build a fine-grained authorization model for their API platform.
Mastering millisecond latency and millions of events: The event-driven architecture behind the Amazon Key Suite
In this post, we explore how the Amazon Key team used Amazon EventBridge to modernize their architecture, transforming a tightly coupled monolithic system into a resilient, event-driven solution. We explore the technical challenges we faced, our implementation approach, and the architectural patterns that helped us achieve improved reliability and scalability. The post covers our solutions for managing event schemas at scale, handling multiple service integrations efficiently, and building an extensible architecture that accommodates future growth.
Sovereign failover – Design for digital sovereignty using the AWS European Sovereign Cloud
This post explores the architectural patterns, challenges, and best practices for building cross-partition failover, covering network connectivity, authentication, and governance. By understanding these constraints, you can design resilient cloud-native applications that balance regulatory compliance with operational continuity.
Secure Amazon Elastic VMware Service (Amazon EVS) with AWS Network Firewall
In this post, we demonstrate how to utilize AWS Network Firewall to secure an Amazon EVS environment, using a centralized inspection architecture across an EVS cluster, VPCs, on-premises data centers and the internet. We walk through the implementation steps to deploy this architecture using AWS Network Firewall and AWS Transit Gateway.
Building an AI gateway to Amazon Bedrock with Amazon API Gateway
In this post, we’ll explore a reference architecture that helps enterprises govern their Amazon Bedrock implementations using Amazon API Gateway. This pattern enables key capabilities like authorization controls, usage quotas, and real-time response streaming. We’ll examine the architecture, provide deployment steps, and discuss potential enhancements to help you implement AI governance at scale.
Build resilient generative AI agents
Generative AI agents in production environments demand resilience strategies that go beyond traditional software patterns. AI agents make autonomous decisions, consume substantial computational resources, and interact with external systems in unpredictable ways. These characteristics create failure modes that conventional resilience approaches might not address. This post presents a framework for AI agent resilience risk analysis […]








