Analytics | AWS Big Data Blog

Beyond JSON blobs: Implementing the VARIANT data type in Apache Iceberg V3

This post is Part 1 of a two-part series. We walk through the basics: creating an Iceberg V3 table with a VARIANT column, inserting semi-structured data, and querying it with variant_get(). In Part 2, we scale to millions of rows and benchmark VARIANT against traditional string storage, measuring the difference in query performance and storage footprint.

Upgrade PySpark from Spark 3.5 to Spark 4.0 with AWS Spark Upgrade Agent

In this post, we walk through a hands-on PySpark migration from Spark 3.5 to Spark 4.0 on Amazon EMR Serverless, using the AWS Spark Upgrade Agent. You’ll see how the agent iteratively validates your application on a live Amazon EMR Serverless application, automatically diagnosing and resolving failures from Amazon CloudWatch logs until the job succeeds.

Announcing Spark Connect on Amazon EMR Serverless: Interactive PySpark development, anywhere

Today, AWS is announcing support for Spark Connect on Amazon EMR Serverless with EMR release 7.13 (Apache Spark 3.5.6) and later versions. You can now build and debug Spark applications from your preferred local environment while running full-scale Spark operations on EMR Serverless.

Build stateful streaming applications with Apache Spark 4.0 on Amazon EMR Serverless

In this post, we demonstrate how to build a production-ready IoT device monitoring system using Spark 4.0’s transformWithState API on Amazon EMR Serverless. This example showcases the key capabilities of stateful streaming and provides a template you can adapt for your own use cases.

Announcing general availability of Apache Spark 4.0 on Amazon EMR

With this general availability announcement, Spark 4.0 is now supported across the Amazon EMR Serverless, Amazon EMR on EC2, and Amazon EMR on EKS deployment options. In this post, you’ll learn about key Spark 4.0 capabilities now available on Amazon EMR, including Spark Connect, the Variant data type, SQL scripting, Python API improvements, and streaming enhancements, along with infrastructure changes in the new emr-spark-8.0 release.

Unlock cost savings with incremental snapshot billing for Amazon Redshift Serverless and Amazon Redshift RG

Starting June 8, 2026, Amazon Redshift is introducing an incremental snapshot billing model for Amazon Redshift Serverless and Amazon Redshift RG (provisioned instances powered by AWS Graviton). With this enhancement, you pay only for the unique data blocks across your active manual snapshots within your account. This delivers significant cost savings for customers who have multiple snapshots that contain largely identical data blocks. In this post, you will learn how the new incremental snapshot billing model works, the customer use cases it addresses, and how it helps you optimize costs while improving your Recovery Point Objective (RPO).

Query Amazon Redshift using natural language with Kiro

In this post, you learn how to set up Kiro with the Amazon Redshift MCP server to query your data warehouse using natural language. You explore cluster discovery, schema browsing, analytical queries, cross-cluster comparisons, and data quality checks, all without writing SQL from scratch or switching between tools.


Build governance dashboards for Amazon SageMaker Catalog with Amazon Quick

Accelerate SQL development with SageMaker Data Agent in Query Editor

Schedule notebook runs in Amazon SageMaker Unified Studio
