Overview
pg-cdc
Bring governed operational data to AI agents.
pg-cdc: PostgreSQL Change Data Capture for Amazon S3
pg-cdc is a container-deployed CDC platform that continuously streams PostgreSQL changes into analytics-ready Apache Parquet and Apache Iceberg datasets on Amazon S3. It produces query-ready tables directly from the WAL - no intermediate Kafka cluster, no external transformation layer, and no data leaving your AWS account.
How It Works
- Connect - Point pg-cdc at your PostgreSQL instance with logical replication enabled (wal_level = logical).
- Snapshot - pg-cdc performs an initial consistent snapshot of selected tables.
- Stream - Ongoing WAL changes are captured, transformed into Parquet files, and committed as Iceberg table updates in Amazon S3.
- Recover - Built-in checkpointing ensures exactly-once delivery and fast recovery from interruptions.
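The Recover step can be pictured with a small sketch. This is an illustration of checkpoint-based resume in general, not pg-cdc's actual implementation: the `CheckpointedSink` class and its fields are hypothetical names. It assumes each change carries a PostgreSQL log sequence number (LSN) and that the sink records the last committed LSN together with the data, so that changes replayed after a restart are skipped rather than written twice.

```python
from dataclasses import dataclass, field


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a comparable integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class CheckpointedSink:
    """Toy sink (hypothetical) that advances its checkpoint with every commit."""
    committed_lsn: int = 0
    rows: list = field(default_factory=list)

    def apply(self, lsn_text: str, row: dict) -> bool:
        lsn = parse_lsn(lsn_text)
        if lsn <= self.committed_lsn:
            return False  # committed before the restart, so the replay is skipped
        self.rows.append(row)
        self.committed_lsn = lsn
        return True


sink = CheckpointedSink()
changes = [("0/16B3748", {"id": 1}), ("0/16B3800", {"id": 2})]
for lsn, row in changes:
    sink.apply(lsn, row)

# Simulate a restart in which the source replays the same changes.
replayed = [sink.apply(lsn, row) for lsn, row in changes]
```

In a real pipeline the data and the checkpoint would have to be committed atomically (for example, in the same table commit) for the skip logic to hold across crashes.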
Key Capabilities
- Continuous WAL Capture - Streams inserts, updates, and deletes in near real-time from PostgreSQL logical replication slots.
- Apache Iceberg & Parquet Output - Produces open-format, schema-evolving datasets that integrate with Amazon Athena, Redshift Spectrum, EMR, and Spark.
- Schema Evolution - Handles column additions and type changes without manual intervention.
- Checkpoint Recovery - Resumes from the last committed position after restarts or failures, preventing data loss or duplication.
- Snapshot Initialization - Bootstraps historical data before switching to incremental streaming.
- Amazon S3 Native - Writes directly to your S3 bucket with support for SSE-S3 or SSE-KMS encryption at rest.
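As a rough sketch of what schema evolution involves on the Iceberg side: Iceberg identifies columns by unique field IDs, so a newly added source column receives a fresh ID and existing data files do not need to be rewritten. The function below is a simplified illustration under that assumption; `evolve_schema` and the column names are hypothetical and do not describe pg-cdc internals.

```python
def evolve_schema(fields: dict[str, int], last_column_id: int,
                  source_columns: list[str]) -> tuple[dict[str, int], int]:
    """Return (new name -> field-ID mapping, new last column ID).

    Columns already present keep their IDs; unseen columns get the next
    unused ID, so IDs are never reused.
    """
    evolved = dict(fields)
    for name in source_columns:
        if name not in evolved:
            last_column_id += 1
            evolved[name] = last_column_id
    return evolved, last_column_id


# Hypothetical table: the source gained a 'signup_source' column.
fields = {"id": 1, "email": 2}
fields, last_id = evolve_schema(fields, 2, ["id", "email", "signup_source"])
```

Type changes are more constrained than additions, since only certain widenings (such as int to long) are safe in Iceberg; this sketch covers additions only.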
Security & Data Isolation
pg-cdc runs entirely inside your AWS account as a container workload. Operational data never transits external infrastructure. Key security properties include:
- All replication traffic encrypted in transit via TLS.
- S3 objects encrypted at rest using your chosen KMS key or SSE-S3.
- Least-privilege IAM policies scoped to the specific S3 prefix and replication slot.
- Container image scanned for vulnerabilities before each release.
- No outbound data paths - Burnside Project provides licensing and software updates only; customer data remains fully isolated.
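To make the least-privilege point concrete, the sketch below builds an IAM policy document restricted to a single S3 prefix. The bucket name, prefix, and the exact set of S3 actions are assumptions chosen for illustration; consult the vendor documentation for the permissions pg-cdc actually requires. If SSE-KMS is used, the task role would also need permissions on the KMS key, which are omitted here.

```python
import json


def s3_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document limited to one prefix in one bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadWriteObjectsUnderPrefix",
                "Effect": "Allow",
                # Assumed action set; the real requirement may differ.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix names.
policy = s3_prefix_policy("example-lake-bucket", "pg-cdc/")
print(json.dumps(policy, indent=2))
```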
Deployment
pg-cdc is delivered as a container image through AWS Marketplace. Deploy it on Amazon ECS, EKS, or Fargate within your VPC. Prerequisites include a PostgreSQL instance (version 12 or later) with logical replication enabled and an S3 bucket for output. Configuration is handled through environment variables or a mounted config file.
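The database prerequisites can be verified before deployment. The sketch below assumes you have already read two values from the server, for example with `SHOW wal_level` and `SHOW server_version_num`; the `check_prerequisites` helper is a hypothetical name and not part of pg-cdc.

```python
def check_prerequisites(wal_level: str, server_version_num: int) -> list[str]:
    """Return a list of problems; an empty list means the checks passed.

    server_version_num is PostgreSQL's integer version, e.g. 150004 for 15.4.
    """
    problems = []
    if wal_level != "logical":
        problems.append(f"wal_level is '{wal_level}'; expected 'logical'")
    if server_version_num < 120000:
        major = server_version_num // 10000
        problems.append(f"PostgreSQL major version {major} is older than 12")
    return problems


ok = check_prerequisites("logical", 150004)
bad = check_prerequisites("replica", 110020)
```

Note that changing `wal_level` requires a server restart, so it is worth checking before scheduling a deployment window.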
Use Cases
- Lakehouse Analytics - Replicate transactional tables into an Iceberg lakehouse queryable by Athena or Spark, eliminating nightly batch ETL.
- AI Feature Stores - Feed fresh operational data into ML feature pipelines without impacting production database performance.
- Business Intelligence - Provide governed, near real-time datasets to BI tools without granting direct database access.
- Compliance & Audit - Maintain an immutable, versioned history of database changes in S3 for regulatory retention requirements.
Getting Started
Subscribe through AWS Marketplace, deploy the container in your VPC, and configure your PostgreSQL connection. pg-cdc begins streaming data within minutes of deployment. Contact Burnside Project to schedule a guided deployment session or request a pilot.
Highlights
- Produces query-ready Iceberg and Parquet tables directly from PostgreSQL WAL changes - no Kafka cluster, no external transformation layer, and no batch ETL jobs to maintain. Data lands in Amazon S3 in open formats immediately consumable by Athena, Redshift Spectrum, EMR, and Spark without additional processing steps.
- Deploys as a container entirely within your AWS account (ECS, EKS, or Fargate). Customer data never leaves your VPC or transits third-party infrastructure, reducing compliance scope, eliminating data-processing agreements with external vendors, and removing cross-account egress costs. Burnside Project provides only licensing and software updates.
- Built-in schema evolution, checkpoint recovery, and snapshot initialization handle operational complexity automatically. Column additions propagate without manual intervention, restarts resume from the last committed WAL position with no data loss or duplication, and initial table bootstrapping runs in parallel before switching to continuous streaming.
Details
Pricing
| Dimension | Description | Cost/month |
|---|---|---|
| pg-cdc Pro | One licensed pg-cdc pipeline (PostgreSQL CDC -> S3 Parquet/Iceberg) at the Pro tier, for the contract term. WAL streaming, typed Parquet/Iceberg output, license-gated operation. | $399.00 |
| pg-cdc Enterprise | pg-cdc at the Enterprise tier for the contract term - Pro features plus [enterprise add-ons: e.g. managed-source/RDS IAM, priority support] | $699.00 |
Vendor refund policy
We offer a 30-day money-back guarantee for initial pg-cdc purchases. If the product does not meet your needs, contact support@burnsideproject.ai within 30 days of purchase with your AWS account ID and order details. After 30 days, purchases are non-refundable except for AWS Marketplace billing errors or verified material product defects. Approved refunds are processed through AWS Marketplace. This policy does not limit any rights provided under applicable law.
Delivery details
pg-cdc Container
- Amazon ECS
- Amazon ECS Anywhere
- Amazon EKS Anywhere
Container image
Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.
Version release notes
Initial public release of pg-cdc.
Features:
- PostgreSQL Change Data Capture (CDC)
- Initial snapshot and incremental replication
- Amazon S3 output
- Apache Parquet generation
- Apache Iceberg table support
- Schema evolution
- Checkpoint recovery
- Customer-managed deployment in AWS
- Burnside license activation
This release establishes the core platform for creating governed operational data products for analytics and AI applications.
Additional details
Usage instructions
- Subscribe to pg-cdc through AWS Marketplace.
- Pull the pg-cdc container image from AWS Marketplace.
- Deploy the container using Amazon ECS, Amazon EKS, Amazon ECS Anywhere, or a supported Kubernetes environment.
- Configure PostgreSQL connection information, Amazon S3 destination, AWS IAM permissions, and Burnside license credentials using environment variables or Kubernetes secrets.
- Activate the product through the Burnside Project license server.
- Start continuous PostgreSQL Change Data Capture.
The container runs entirely within the customer's AWS account. Customer operational data never leaves customer-controlled infrastructure.
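The configuration step above can be sketched as a small loader that reads settings from environment variables and fails fast when one is missing. Every variable name here (`PGCDC_EXAMPLE_*`) is hypothetical and chosen only for illustration; the real names are defined in the vendor documentation.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    pg_dsn: str
    s3_bucket: str
    s3_prefix: str
    license_key: str


# Hypothetical variable names; not taken from pg-cdc.
REQUIRED = {
    "pg_dsn": "PGCDC_EXAMPLE_PG_DSN",
    "s3_bucket": "PGCDC_EXAMPLE_S3_BUCKET",
    "s3_prefix": "PGCDC_EXAMPLE_S3_PREFIX",
    "license_key": "PGCDC_EXAMPLE_LICENSE_KEY",
}


def load_config(env=os.environ) -> PipelineConfig:
    """Build a config from env, raising if any required variable is unset."""
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(sorted(missing)))
    return PipelineConfig(**{name: env[var] for name, var in REQUIRED.items()})


example_env = {
    "PGCDC_EXAMPLE_PG_DSN": "postgresql://replicator@db.internal:5432/app",
    "PGCDC_EXAMPLE_S3_BUCKET": "example-lake-bucket",
    "PGCDC_EXAMPLE_S3_PREFIX": "pg-cdc/",
    "PGCDC_EXAMPLE_LICENSE_KEY": "placeholder",
}
config = load_config(example_env)
```

In ECS or Kubernetes, secrets such as the license key and database password would normally be injected from a secrets store rather than written into the task definition in plain text.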
Documentation: https://github.com/burnside-project/pg-cdc
Support: support@burnsideproject.ai
Support
Vendor support
Contacting Support
Burnside Project provides support for pg-cdc via email at support@burnsideproject.ai. Reach out for assistance with deployment, configuration, troubleshooting, upgrades, or refund requests.
Scope of Support
Support covers installation and deployment guidance, PostgreSQL connection configuration, schema evolution questions, checkpoint recovery issues, S3 and Iceberg output troubleshooting, and software update assistance.
For additional details on support availability or to discuss your specific requirements, contact the team at support@burnsideproject.ai.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.