Overview
S4 Firewall is a forwarding proxy you run inside your own Amazon VPC to put a budget and a circuit breaker in front of your LLM token spend. Your application changes only its base_url: S4 Firewall exposes an OpenAI-compatible, an Anthropic Messages-compatible, and a Bedrock-compatible intake, relays each request to the upstream provider your application already uses, and returns the upstream response (including streaming responses, passed through chunk by chunk without buffering so time-to-first-token is preserved).
Headline capability - the runaway-loop circuit breaker. Every request runs through a synchronous in-memory pipeline before it is relayed: attribute the request to a feature, tenant, and customer; reserve the worst-case cost (input tokens counted now, output priced at max_tokens times the output rate); check the reservation against the hierarchy of budgets; then either forward or block. There are two layers, kept honestly distinct. Layer 1, the hard cap, is deterministic and pre-emptive: cumulative spend is known, so any request whose reservation would push the running total past a configured hard cap is blocked before it is relayed - a 100 percent pre-emptive block of over-cap requests, with the same state in producing the same decision out, fixed by chaos tests. Layer 2, the loop block, is best-effort and behavioral: a runaway is only knowable after a few calls (those few are already billed), so this layer detects agent loops, near-duplicate call chains, and in-session amplification and bounds the blast radius - containing the runaway within a small number of requests or a small dollar amount. Layer 2 is explicitly best-effort, not a 100 percent guarantee, and ships with a conservative default plus a dry-run shadow mode so you can measure the false-block rate before you enforce.
Honest budgeting under output uncertainty. The number of output tokens is unknowable until the response returns, so stopping before the bill is incurred is reserve-then-reconcile, not a flat estimate: the reservation uses the worst case to make the hard-cap decision, and when the response returns the provider's reported usage is taken as the source of truth and reconciled against the reservation. Token counts are normalized across providers and split into input, output, cached-read, and cache-write so the per-feature accounting reflects each provider's real rate card.
Attribution and metering. Tag requests by header or API key to attribute token spend to a feature, tenant, or customer - finer than IAM-principal granularity. Spend rolls up by dimension and is emitted to Amazon CloudWatch (namespace S4/Firewall) and, optionally, to a counts-only append-only S3 audit ledger that records token counts, quantities, and attribution metadata - never prompt or response bodies.
Data handling - the honest version. S4 Firewall itself does not persist or transmit your prompts or responses. Its only outbound call is the provider request your application would have made anyway; the firewall does not add an egress. The ledger and metrics carry token counts, not content (counts-not-content), fixed by property tests. Where prompts egress depends on the upstream you choose: when you route to Amazon Bedrock through a VPC interface endpoint (PrivateLink, which this AMI can provision), the Bedrock calls stay inside your VPC/AWS boundary; when you route to a third-party provider on the public internet, that traffic egresses to the internet and does not stay in your VPC. The value S4 Firewall provides is that the firewall does not hoard your prompts, does not send them to any third party of its own, and writes no bodies to the ledger - not that your prompts never leave the VPC.
Operations. No separate control plane and no external database: budget state is held in-memory per instance and re-derived from zero on restart. The data plane is a single static binary running under a hardened systemd unit with zero elevated capabilities and a least-privilege IAM role (upstream model invocation, CloudWatch PutMetricData scoped to the S4/Firewall namespace, and write-only PutObject to the ledger bucket). No telemetry home-call and no license-key check - billing is AMI hourly + annual. Deploy in minutes with the included CloudFormation template, which optionally creates the Bedrock VPC interface endpoint.
S4 Metrics bundle. S4 Firewall is sold standalone and is also offered as a bundle SKU with S4 Metrics (observability) as a separate Marketplace offer - observe plus enforce in one bill. The bundle is a distinct offer entity; this listing's pricing covers the standalone S4 Firewall product.
There is no lock-in - it is a normal Amazon Linux 2023 AMI you run in your own VPC, billed per instance per hour through your AWS bill, with an annual contract option.
Highlights
- Runaway-loop circuit breaker, pre-emptive: a deterministic hard cap blocks any request before it is relayed once a feature/tenant/customer budget would be exceeded (100 percent pre-emptive block of over-cap requests), plus best-effort detection of agent loops and near-duplicate call chains that bounds the blast radius of a runaway agent before it burns the month's budget. (Layer 2 is best-effort, not a 100 percent guarantee; ships with a dry-run shadow mode.)
- Per-feature/tenant/customer token attribution and budgets, in your VPC: a forwarding proxy you point your base_url at (OpenAI-compatible, Anthropic Messages-compatible, Bedrock-compatible). Reserve-then-reconcile accounting uses the provider's reported usage as the source of truth and splits input/output/cached/cache-write at each provider's real rate, emitted to Amazon CloudWatch and an optional counts-only audit ledger.
- No prompts hoarded, no separate control plane: S4 Firewall does not persist or transmit your prompts - the only egress is the provider call your app already makes, and the ledger carries token counts, not content. Optional Amazon Bedrock VPC interface endpoint keeps the Bedrock path inside your AWS boundary. Runs as a single static binary on a standard Amazon Linux 2023 AMI with a least-privilege IAM role and one-click CloudFormation - no external database, no telemetry home-call.
Details
Introducing multi-product solutions
You can now purchase comprehensive solutions tailored to use cases and industries.
Features and programs
Financing for AWS Marketplace purchases
Pricing
Dimension | Cost/hour |
|---|---|
c7g.large Recommended | $0.12 |
c6g.medium | $0.12 |
c7g.8xlarge | $0.12 |
c6g.8xlarge | $0.12 |
c6g.large | $0.12 |
c7g.4xlarge | $0.12 |
c6g.4xlarge | $0.12 |
c6g.xlarge | $0.12 |
c7g.xlarge | $0.12 |
c7g.2xlarge | $0.12 |
Vendor refund policy
Email aws-support@abyo.net within 30 days of charge for refund requests; refunds are evaluated case by case.
How can we make this page better?
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
64-bit (Arm) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Initial release: in-VPC LLM token budget and runaway-loop control.
Additional details
Usage instructions
Deploy via the included CloudFormation (cfn-single.yaml for a single instance, or cfn-ha.yaml for a redundant fleet behind an internal load balancer); point your application base_url at the firewall. See the runbook and docs on the AMI.
Support
Vendor support
Email support at aws-support@abyo.net .
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Similar products

