AWS Big Data Blog

Run log analytics for a fraction of the cost with the new engine for Amazon OpenSearch Service

Amazon OpenSearch Service is a real-time retrieval engine for AI, search, and analytics at any scale. As log volumes grow 30–40 percent year over year, organizations face rising infrastructure costs and slower analytical queries across their observability data. Teams are forced to choose between retaining the data they need and staying within budget.

We’re introducing a purpose-built log analytics engine for Amazon OpenSearch Service. This new engine delivers up to 4x better price performance, up to 2x faster data ingestion, up to 2x faster analytical queries, and up to 70 percent lower storage costs. You get all of this without sacrificing search capabilities on the same data.

In this post, you learn how to take advantage of these benefits, see how to get started, and review benchmark results at billion-document scale.

How the optimized engine works

The optimized engine is a new engine mode within the same Amazon OpenSearch Service domain. You use the same console, APIs, security model, and networking configuration that you already use with the general-purpose engine.

OpenSearch Service stores all data in Apache Parquet format. For fields configured as searchable, OpenSearch Service also writes the data to the inverted index. Apache Calcite parses and optimizes each query, then routes operations to the engine best suited to execute them: Apache DataFusion for analytical operations on columnar data, or Lucene for search predicates. The two hand off mid-query, so a single query can search log content and aggregate the results without additional roundtrips.

You ingest data through the same REST APIs and client libraries you use today, so you don’t need to change your agents or pipelines. The optimized engine supports two query languages: Piped Processing Language (PPL) and SQL. Both execute natively through the vectorized engine. The Domain Specific Language (DSL) query API is not supported on the optimized engine at launch.
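As a rough illustration of what a PPL query might look like from a client, the following Python sketch builds a request body. It assumes the optimized engine accepts PPL at the `_plugins/_ppl` path used by the open source OpenSearch SQL plugin. The index name `otel_logs` and the fields `severityText` and `serviceName` are hypothetical, so check the service documentation and your own mappings before relying on any of them.

```python
import json

# Assumption: PPL endpoint path as exposed by the open source OpenSearch SQL plugin.
PPL_ENDPOINT = "/_plugins/_ppl"


def build_ppl_request(index: str, level_field: str, group_field: str) -> str:
    """Return a JSON body for a PPL query that counts error logs per group."""
    query = (
        f"source={index} "
        f"| where {level_field} = 'ERROR' "
        f"| stats count() as errors by {group_field} "
        f"| sort - errors"
    )
    return json.dumps({"query": query})


# Hypothetical index and field names, used for illustration only.
body = build_ppl_request("otel_logs", "severityText", "serviceName")
print("POST", PPL_ENDPOINT)
print(body)
```

The sketch only prepares the payload. Sending it requires a signed or authenticated HTTP request that matches your domain's security configuration.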

Getting started

At launch, the optimized engine is a domain-level setting selected at creation time. You can’t add the optimized engine to an existing domain or enable it on individual indices or fields within a general-purpose domain. To adopt the optimized engine, create a new domain and migrate your ingestion pipelines to it.

Create a new domain in the Amazon OpenSearch Service console and select Observability as your use case. The optimized engine is enabled by default. The console provides a side-by-side comparison of capabilities to help you choose.

Figure: Amazon OpenSearch Service console showing the Observability use case selected, with a side-by-side comparison of engine capabilities.

After your domain is ready, ingest JSON documents through the same Bulk API and client libraries you use today. No changes to your ingestion pipelines or application code are required.
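For readers who haven't built a Bulk API payload by hand, here is a minimal Python sketch of the newline-delimited format. The index name `web-logs` and the document fields are hypothetical, and the sketch assumes the standard `index` action applies on the optimized engine as it does on general-purpose domains.

```python
import json
from typing import Iterable


def to_bulk_ndjson(index: str, docs: Iterable[dict]) -> str:
    """Serialize documents into the Bulk API's newline-delimited format."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document line
    # The Bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents.
payload = to_bulk_ndjson(
    "web-logs",
    [
        {"@timestamp": "2025-01-01T00:00:00Z", "status": 200, "path": "/"},
        {"@timestamp": "2025-01-01T00:00:01Z", "status": 503, "path": "/cart"},
    ],
)
print(payload, end="")
```

In practice, most client libraries provide a bulk helper that produces this format for you.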

Benefits of the optimized engine for log analytics

The optimized engine for log analytics introduces the following performance and cost improvements:

  • Up to 4x better price-performance compared to the existing general-purpose engine on internal benchmarks, while retaining full-text search for incident investigation.
  • Up to 2x faster analytical queries. The engine uses a vectorized query execution path that processes data in columnar batches for fast results across large datasets.
  • Up to 2x higher ingestion throughput. The append-only columnar write path increases sustained ingestion rates.
  • Up to 70 percent lower storage with columnar storage for aggregation workloads. You can retain up to 3x more data at the same cost.
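The relationship between storage reduction and retention in the last bullet can be checked with simple arithmetic. The sketch below assumes that cost scales linearly with stored bytes, which is a simplification. The function name is illustrative.

```python
def retention_multiplier(storage_reduction: float) -> float:
    """Data volume that fits in the same storage after a fractional reduction."""
    if not 0 <= storage_reduction < 1:
        raise ValueError("storage_reduction must be in [0, 1)")
    return 1 / (1 - storage_reduction)


# A 70 percent reduction leaves 30 percent of the original footprint.
print(round(retention_multiplier(0.70), 2))  # 3.33
```

Under that assumption, a 70 percent reduction corresponds to a little over 3x the data in the same space, consistent with the "up to 3x" figure.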

To demonstrate these improvements, we benchmarked observability workloads at billion-document scale. In the following sections, we explore the benchmark methodology, test environment, and results. We recommend testing the optimized engine with your own workload to validate the gains for your use case.

Benchmark methodology

We used the Telemetry Generator for OpenTelemetry to generate synthetic traces and logs at scale, producing three observability datasets: OTEL traces, OTEL logs, and web server access logs. We stored the generated data as bulk-format NDJSON in Amazon Simple Storage Service (Amazon S3). We then ingested it through a pipeline on Amazon Elastic Container Service (Amazon ECS) with AWS Fargate. The pipeline reads chunks from Amazon S3, transforms timestamps, and writes to the OpenSearch Bulk API, simulating a production observability flow.
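The post doesn't describe how the pipeline transforms timestamps. One plausible approach, sketched here in Python as an assumption, is to shift each document's timestamp by a fixed offset so that pre-generated data appears recent. The function name, the ISO 8601 format, and the sample chunk are hypothetical. The `@timestamp` field name matches the column mentioned later in this post.

```python
import json
from datetime import datetime, timedelta


def shift_timestamps(ndjson_chunk: str, offset: timedelta,
                     field: str = "@timestamp") -> str:
    """Shift the timestamp of every document line in a bulk-format chunk."""
    out = []
    for line in ndjson_chunk.splitlines():
        record = json.loads(line)
        if field in record:  # action lines carry no timestamp and pass through
            ts = datetime.fromisoformat(record[field])
            record[field] = (ts + offset).isoformat()
        out.append(json.dumps(record))
    return "\n".join(out) + "\n"


# Hypothetical chunk: one action line followed by one document line.
# An explicit +00:00 offset is used because older Python versions
# do not parse a trailing "Z" in fromisoformat.
chunk = (
    '{"index": {"_index": "otel-logs"}}\n'
    '{"@timestamp": "2025-01-01T00:00:00+00:00", "body": "GET /"}\n'
)
shifted = shift_timestamps(chunk, timedelta(days=1))
print(shifted, end="")
```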

We benchmarked on two OpenSearch Service domains running OpenSearch 3.5, each with 9 data nodes in a 3-Availability Zone configuration:

| Configuration | Optimized Engine | Standard Lucene |
| --- | --- | --- |
| Instance type | 9x or2.4xlarge.search | 9x r8g.4xlarge.search |
| Leader nodes | 3x m7g.large.search | 3x m7g.large.search |
| EBS | 2,500 GB gp3, 7,500 IOPS, 500 MB/s per node | 2,500 GB gp3, 7,500 IOPS, 500 MB/s per node |
| Engine mode | OPTIMIZED | General Purpose (best_compression) |

We ingested three datasets totaling 24.4 billion documents and 9.5 TB of raw JSON. All indices used 9 primary shards, 1 replica, and Index State Management (ISM)-managed rollover at 50 GB per primary shard. The Lucene baseline used best_compression (zstd) codec with _source enabled, representing the default customer configuration.
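The shard and rollover settings described above might be expressed roughly as follows. This is a sketch, not the benchmark's actual configuration: the index pattern, state name, and priority are hypothetical, and the policy layout follows the open source ISM plugin's policy format.

```python
import json

# Hypothetical name: the post does not give index patterns or policy IDs.
INDEX_PATTERN = "otel-logs-*"

index_settings = {
    "settings": {
        "index.number_of_shards": 9,
        "index.number_of_replicas": 1,
    }
}

ism_policy = {
    "policy": {
        "description": "Roll over when a primary shard reaches 50 GB",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [{"rollover": {"min_primary_shard_size": "50gb"}}],
                "transitions": [],
            }
        ],
        "ism_template": {"index_patterns": [INDEX_PATTERN], "priority": 100},
    }
}

# Approximate primary data per index at rollover: 9 shards x 50 GB each.
primary_gb_at_rollover = 9 * 50
print(json.dumps(ism_policy, indent=2))
print(primary_gb_at_rollover)  # 450
```

A rollover action also needs a rollover alias or a data stream on the target index, which this sketch omits.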

The ingestion pipeline ran on 90 Fargate tasks (16 vCPU, 120 GB RAM each, 48 writer threads per task, bulk size of 3,000 documents) in the same virtual private cloud (VPC) as the OpenSearch Service domains.
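To put the pipeline settings in perspective, the following Python sketch derives the total writer concurrency from the figures above. The class and property names are illustrative, and the in-flight figure is an upper bound that assumes every writer has exactly one full bulk request outstanding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadProfile:
    tasks: int
    threads_per_task: int
    bulk_size: int

    @property
    def concurrent_writers(self) -> int:
        return self.tasks * self.threads_per_task

    @property
    def max_docs_in_flight(self) -> int:
        # Upper bound: every writer has one full bulk request outstanding.
        return self.concurrent_writers * self.bulk_size


profile = LoadProfile(tasks=90, threads_per_task=48, bulk_size=3_000)
print(profile.concurrent_writers)   # 4320
print(profile.max_docs_in_flight)   # 12960000
```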

Results

Ingestion throughput

The optimized engine’s append-only columnar storage writes segments in bulk-optimized batches without per-document stored field overhead.

| Metric | Optimized Engine | Lucene Baseline |
| --- | --- | --- |
| Peak throughput | 1.78M docs/sec | ~647K docs/sec |
| Cluster CPU at peak | 62% | 72% |
| Write rejections | 0 | 0 |
| Total documents ingested | 24.4 billion | 15.7 billion |

The optimized engine reached a peak of 1.78 million documents per second at matched concurrency, more than 2x the Lucene baseline's peak of roughly 647,000 documents per second, while consuming less CPU (62 percent versus 72 percent). Both domains ran with zero write rejections. For teams ingesting terabytes per day, the throughput advantage translates to fewer nodes for the same volume, or longer retention on the same infrastructure.

Storage compression

The columnar Parquet format compresses observability data through dictionary encoding of repeated fields, tight packing of numeric columns, and elimination of per-document JSON overhead.

Measured across 24.4 billion documents:

| Dataset | Documents | Source | Optimized Engine | Lucene (default) | Compression vs. source | Savings vs. Lucene |
| --- | --- | --- | --- | --- | --- | --- |
| Web logs | 8.76B | 2,360 GB | 254 GB | 614 GB | 89% | 59% |
| OTEL logs | 8.20B | 3,720 GB | 815 GB | 1,549 GB | 78% | 47% |
| OTEL traces | 7.43B | 4,131 GB | 841 GB | 1,790 GB | 80% | 53% |
| Total | 24.4B | 9,539 GB | 1,910 GB | 3,953 GB | 80% | 52% |

The optimized engine stores the same data at 5x compression versus raw JSON (80 percent savings). Against the default Lucene configuration (_source enabled, what most domains run), the optimized engine uses roughly half the storage. The optimized engine derives _source from Parquet columns on read, eliminating the need to store the raw JSON blob while still allowing document retrieval.
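The percentage columns in the storage table follow directly from the sizes. As a quick check, this Python sketch recomputes them from the reported figures, taking each row as published. The dictionary layout and function name are illustrative.

```python
# Sizes in GB as reported: (source JSON, optimized engine, Lucene default).
rows = {
    "Web logs": (2360, 254, 614),
    "OTEL logs": (3720, 815, 1549),
    "OTEL traces": (4131, 841, 1790),
    "Total": (9539, 1910, 3953),
}


def savings_pct(before_gb: float, after_gb: float) -> int:
    """Percentage of storage saved when going from before_gb to after_gb."""
    return round(100 * (1 - after_gb / before_gb))


for name, (source, optimized, lucene) in rows.items():
    vs_source = savings_pct(source, optimized)
    vs_lucene = savings_pct(lucene, optimized)
    print(f"{name}: {vs_source}% vs. source, {vs_lucene}% vs. Lucene")
```

For the Total row this gives 80 percent versus source and 52 percent versus Lucene, which corresponds to the roughly 5x compression ratio quoted above (9,539 GB divided by 1,910 GB).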

Analytical query performance

We measured query latency on a typical observability dashboard pattern: analytical aggregations scoped to a 15-minute time window over billions of log events. The optimized engine uses row-group pruning on the @timestamp column to skip data outside the query window, reading only the relevant subset.

| Query pattern | Dataset | Optimized Engine | Lucene baseline | Speedup |
| --- | --- | --- | --- | --- |
| Error count by service | OTEL logs | 717 ms | 2.8 s | 3.9x |
| Log volume by host | OTEL logs | 252 ms | 17.6 s | 70x |
| 5xx errors by service and method | OTEL logs | 171 ms | 885 ms | 5.2x |
| Top services by error | OTEL traces | 635 ms | 569 ms | ~1x |
| Point lookup (single traceId) | OTEL traces | 394 ms | 783 ms | 2x |

All queries scoped to a 15-minute window. Index sizes: 8.2 billion OTEL log events, 7.4 billion OTEL trace spans.
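The row-group pruning on @timestamp mentioned at the start of this section can be pictured with a small Python sketch. This is a conceptual illustration of min/max statistics pruning as commonly used with Parquet files, not the engine's actual implementation. The class, function, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowGroup:
    """Min/max statistics for the timestamp column of one row group (epoch seconds)."""
    group_id: int
    min_ts: int
    max_ts: int


def prune(groups: List[RowGroup], window_start: int, window_end: int) -> List[RowGroup]:
    """Keep only row groups whose [min_ts, max_ts] range overlaps the query window."""
    return [g for g in groups if g.max_ts >= window_start and g.min_ts <= window_end]


# Four row groups covering 15 minutes (900 seconds) each.
groups = [
    RowGroup(0, 0, 899),
    RowGroup(1, 900, 1799),
    RowGroup(2, 1800, 2699),
    RowGroup(3, 2700, 3599),
]

# A 15-minute query window that straddles row groups 1 and 2.
selected = prune(groups, window_start=1000, window_end=1900)
print([g.group_id for g in selected])  # [1, 2]
```

Only the overlapping row groups need to be read and decoded. The others are skipped using metadata alone, which is why a narrow time window over billions of events touches a small fraction of the data.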

The optimized engine completes time-filtered analytical queries in 171 ms to 717 ms over billions of documents. The advantage is most pronounced on aggregations with no predicate beyond the time window (log volume by host: 70x), where the columnar engine reads only the columns needed. On queries where the Lucene inverted index provides strong predicate selectivity (top services by error on traces), performance is comparable between the two engines.

Search and point lookups

The optimized engine retains the Lucene inverted index alongside columnar storage. When the query planner recognizes a selective lookup (such as retrieving a single trace by ID), the planner routes the query to the inverted index rather than scanning columnar data. In our benchmark, a single traceId lookup across 7.4 billion spans returned in 165 ms.

This means a real investigation can use both engines in sequence: broad aggregations to localize the problem, then a point lookup to pull the offending trace, all from the same domain.

Now available

The optimized engine for Amazon OpenSearch Service is generally available today in all commercial AWS Regions (Regions other than the AWS GovCloud (US) Regions and the China Regions) where OpenSearch Optimized Instances are available.

Pricing follows standard Amazon OpenSearch Service rates for instances and storage, with no additional premium for the optimized engine. For more information, see Amazon OpenSearch Service Pricing.

To learn more about configuring and using the optimized engine, see Optimized for Log Analytics in the Amazon OpenSearch Service documentation. For an overview of the service, visit Amazon OpenSearch Service Log Analytics.

Give it a try and send feedback to AWS re:Post for Amazon OpenSearch Service or through your usual AWS Support contacts.


About the authors

Jagadish Kumar

Jagadish is a Senior Solutions Architect at Amazon Web Services, focused on OpenSearch and analytics workloads.

Rohin Bhargava

Rohin is a Senior Product Manager for Amazon OpenSearch Service.

Michael Supangkat

Michael is a Solutions Architect at Amazon Web Services specializing in search and observability.