Overview
MatchQ cluster monitoring dashboard
Cluster-level dashboard showing Slurm job states, resource utilization, and job-level cost visibility, with clickable instance identifiers for drill-down into detailed metrics.
MatchQ instance-level monitoring
MatchQ is a self-hosted Slurm platform deployed into the customer's AWS account, providing a familiar operational model for HPC environments while adding cloud-specific automation, monitoring, and cost visibility.
The platform supports a wide range of HPC workloads, including semiconductor chip design, bioinformatics, media rendering, machine learning, and AI. MatchQ integrates natively with AWS IAM roles, networking, and tagging, and can operate in cloud-only, hybrid, or on-premises-controlled configurations.
MatchQ includes precompiled multi-OS and multi-architecture binaries, assisted deployment using CloudFormation, integrated dashboards, and helper scripts to manage partitions, node groups, and scaling policies. This lets teams focus on running jobs efficiently instead of managing infrastructure.
Pricing is a predictable flat subscription, not tied to instance count, vCPU-hours, or other dynamic usage metrics that can lead to large and unexpected charges as workloads scale.
Subscription tiers
MatchQ is offered in three tiers based on cluster size. All tiers include the full MatchQ platform, software updates, onboarding assistance, migration assistance from other job schedulers, and email/ticket support during business hours.
MatchQ Small: For clusters with peak capacity up to 50 concurrent compute nodes. The right fit for development and evaluation environments, smaller production clusters, and teams getting started with HPC on AWS.
MatchQ Medium: For clusters with peak capacity of 50 to 250 concurrent compute nodes. Includes everything in Small, plus 24x7 on-call support for production-critical incidents, defined response SLAs, and quarterly architecture and optimization reviews.
MatchQ Large: For clusters with peak capacity above 250 concurrent compute nodes, or for organizations with complex production requirements. Includes everything in Medium, plus faster response SLAs, dedicated engineer hours per quarter for hands-on optimization and custom integrations, and priority access to new features and beta capabilities.
Customers should select the tier that matches their expected peak concurrent compute node count. Modality reviews usage with each customer at annual renewal to confirm the right tier for the next term.
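As a rough illustration of the sizing rule above, the following sketch maps an expected peak node count to a tier. The function name `select_tier` is hypothetical. The listing places exactly 50 nodes in both the Small and Medium ranges, so treating 50 as Small is an assumption here; borderline cases and "complex production requirements" are best confirmed with Modality.

```python
def select_tier(peak_nodes: int) -> str:
    """Return the MatchQ tier name for an expected peak concurrent node count."""
    if peak_nodes < 0:
        raise ValueError("peak_nodes must be non-negative")
    if peak_nodes <= 50:
        # Assumption: exactly 50 nodes counts as Small (the listing overlaps here).
        return "MatchQ Small"
    if peak_nodes <= 250:
        # The listing reserves Large for clusters *above* 250 nodes.
        return "MatchQ Medium"
    return "MatchQ Large"


if __name__ == "__main__":
    for nodes in (20, 50, 120, 250, 400):
        print(nodes, "->", select_tier(nodes))
```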
Highlights
- Self-hosted Slurm platform deployed directly into the customer's AWS VPC with full control, native AWS integration, and no vendor lock-in.
- Supports multi-architecture and multi-OS HPC clusters, including ARM64 and x86_64, with automated provisioning, monitoring, and cost visibility.
- Precompiled binaries, assisted installation, and helper scripts enable safe, production-ready Slurm operation from day one, with monitoring, accounting, and cost visibility included.
Details
Pricing
| Dimension | Description | Cost/month |
|---|---|---|
| MatchQ Platform (Legacy) | Legacy dimension for existing subscribers. New customers should subscribe to MatchQ Small, Medium, or Large. | $600.00 |
| MatchQ Small | MatchQ subscription for clusters with peak capacity up to 50 concurrent compute nodes. Includes the full MatchQ platform with no per-instance fees and no caps. | $600.00 |
| MatchQ Medium | MatchQ subscription for clusters with peak capacity of 50 to 250 concurrent compute nodes. Includes the full MatchQ platform with no per-instance fees and no caps, plus 24x7 support for production-critical incidents. | $4,800.00 |
| MatchQ Large | MatchQ subscription for clusters with peak capacity above 250 concurrent compute nodes. Includes the full MatchQ platform with no per-instance fees and no caps, plus 24x7 support and dedicated engineer hours per quarter. | $12,000.00 |
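Because the subscription is flat, the effective cost per node falls as the cluster grows within a tier. The sketch below works through that arithmetic using the monthly prices from the table. The function names are hypothetical, the annual figure simply assumes twelve months at the listed price, and AWS infrastructure charges are not included.

```python
# Monthly subscription prices in USD, copied from the pricing table.
MONTHLY_COST_USD = {
    "MatchQ Small": 600.00,
    "MatchQ Medium": 4800.00,
    "MatchQ Large": 12000.00,
}


def annual_cost(tier: str) -> float:
    """Twelve months at the listed monthly price (assumes no private offer)."""
    return 12 * MONTHLY_COST_USD[tier]


def monthly_cost_per_node(tier: str, peak_nodes: int) -> float:
    """Subscription cost spread over the peak concurrent node count."""
    if peak_nodes <= 0:
        raise ValueError("peak_nodes must be positive")
    return MONTHLY_COST_USD[tier] / peak_nodes


if __name__ == "__main__":
    print(annual_cost("MatchQ Medium"))                 # 57600.0
    print(monthly_cost_per_node("MatchQ Medium", 200))  # 24.0
```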
Vendor refund policy
Refunds are evaluated on a case-by-case basis. Customers may request a refund within 7 days of purchase if the product is unable to function as described. Refund requests must be submitted by email to matchqsupport@modality.cloud.
Delivery details
MatchQ Slurm Cluster with Monitoring
This CloudFormation template deploys a complete Slurm HPC cluster optimized for AI/ML, EDA, and scientific computing workloads on AWS.
WHAT'S INCLUDED:
Head Node (Graviton ARM64):
- Slurm controller (slurmctld) and database daemon (slurmdbd)
- Pre-configured Prometheus and Grafana monitoring stack
- CloudConnector for automatic compute node scaling
- Helper scripts for partition and nodegroup management
- Cost tracking with automatic job tagging
Compute Nodes (Auto-Scaled):
- Multi-architecture support: ARM64 (Graviton) and x86_64
- Multi-OS support: Amazon Linux 2/2023, Ubuntu 22.04/24.04, Rocky Linux 8, CentOS 7
- Spot and On-Demand instances via EC2 Fleet API
- Constraint-based scheduling with Slurm features and weights
Infrastructure:
- RDS MySQL for Slurm accounting database
- Security groups with least-privilege access
- IAM roles for EC2 Fleet management
- S3-based binary and script distribution
DEPLOYMENT TIME: Approximately 15 minutes for full cluster deployment.
REQUIREMENTS:
- Existing VPC with public and private subnets
- Key pair for SSH access
- Sufficient EC2 and vCPU quota for desired compute capacity
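To make the quota requirement concrete, the sketch below estimates the vCPUs a planned compute fleet would need and compares the total with an account quota. The function names and the example fleet are hypothetical; the vCPU counts per instance type must be supplied by the caller. Note that AWS tracks On-Demand and Spot vCPU quotas separately, so the check would need to be run once per purchasing model.

```python
def required_vcpus(node_counts: dict[str, int], vcpus_per_node: dict[str, int]) -> int:
    """Sum vCPUs across the planned fleet: node count times vCPUs per node."""
    return sum(count * vcpus_per_node[itype] for itype, count in node_counts.items())


def quota_is_sufficient(required: int, quota: int) -> bool:
    """True when the account quota covers the planned peak."""
    return required <= quota


if __name__ == "__main__":
    # Hypothetical fleet: 40 Graviton nodes and 10 x86_64 nodes at peak.
    fleet = {"c7g.4xlarge": 40, "c6i.8xlarge": 10}
    vcpus = {"c7g.4xlarge": 16, "c6i.8xlarge": 32}
    need = required_vcpus(fleet, vcpus)
    print(need, quota_is_sufficient(need, quota=1024))
```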
POST-DEPLOYMENT:
- Access custom Grafana dashboards on port 3000
- SSH to head node and run Slurm commands (sbatch, srun, squeue)
- Use helper scripts to create partitions and nodegroups
- Configure hybrid connectivity for on-premises workers (optional)
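As an illustration of job submission from the head node, the sketch below assembles an `sbatch` command line with an optional feature constraint. The helper name `build_sbatch_command`, the partition name `compute`, and the feature name `arm64` are assumptions for the example; actual partition and feature names depend on how the cluster was configured with the helper scripts.

```python
import shlex
from typing import Optional


def build_sbatch_command(script: str, partition: str,
                         constraint: Optional[str] = None, nodes: int = 1) -> list[str]:
    """Build an sbatch argument list; --parsable makes sbatch print only the job ID."""
    cmd = ["sbatch", "--parsable", f"--partition={partition}", f"--nodes={nodes}"]
    if constraint:
        # Restrict the job to nodes advertising this Slurm feature.
        cmd.append(f"--constraint={constraint}")
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    cmd = build_sbatch_command("job.sh", partition="compute", constraint="arm64")
    print(shlex.join(cmd))
    # On the head node, the command could then be run with:
    #   subprocess.run(cmd, check=True, capture_output=True, text=True)
```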
CloudFormation Template (CFT)
AWS CloudFormation templates are JSON or YAML-formatted text files that simplify provisioning and management on AWS. The templates describe the service or application architecture you want to deploy, and AWS CloudFormation uses those templates to provision and configure the required services (such as Amazon EC2 instances or Amazon RDS DB instances). The deployed application and associated resources are called a "stack."
Version release notes
Overview
Initial release of MatchQ, a self-hosted Slurm platform deployed into the customer AWS environment to support production HPC workloads. MatchQ is built on upstream Slurm and adds cloud automation, operational tooling, and integrated monitoring to simplify running Slurm at scale on AWS.
Key capabilities
- Upstream Slurm compatible platform with no workflow or job script changes required
- Automated provisioning and lifecycle management of EC2 compute capacity based on Slurm demand
- Support for mixed ARM64 and x86_64 compute environments under a single partition
- Integration with AWS networking, IAM roles, and customer-managed identity and security controls
Deployment and operations
- Pre-baked AMIs and assisted installation to accelerate initial environment setup
- Head node based on ARM64 (Graviton) with dynamic compute fleets provisioned on demand
- Helper scripts and operational tooling for common cluster management tasks, including node groups and partitions
- Support for cloud-only and hybrid execution models, including on-prem compute workers
Monitoring and visibility
- Integrated monitoring dashboards providing visibility into cluster health, resource utilization, and job activity
- Correlation of Slurm job data with underlying AWS infrastructure metrics to improve operational insight
- Support for summary views by user, partition, and workload characteristics
Included components
- Slurm controller and accounting services
- Managed database backend for Slurm accounting
- Monitoring stack with cluster-level and node-level dashboards
- Cloud connector for elastic compute provisioning
Supported regions
- Europe (Frankfurt, Ireland, Spain)
- United States (N. Virginia, Oregon)
- Middle East (Israel)
Additional details
Usage instructions
Launch the product using the provided AWS CloudFormation template and supply the required parameters, including VPC, subnets, and authentication settings.
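For teams that prefer to script the launch, a minimal sketch using boto3 might look like the following. The parameter keys (`VpcId`, `SubnetIds`, `KeyName`) and the choice of `CAPABILITY_NAMED_IAM` are assumptions made for illustration; the real keys and required capabilities are defined by the MatchQ template and should be taken from it.

```python
def build_stack_parameters(vpc_id: str, subnet_ids: list[str], key_name: str) -> list[dict]:
    """Build the Parameters list in the shape CloudFormation's CreateStack expects."""
    return [
        # Hypothetical parameter keys; check the template for the actual names.
        {"ParameterKey": "VpcId", "ParameterValue": vpc_id},
        {"ParameterKey": "SubnetIds", "ParameterValue": ",".join(subnet_ids)},
        {"ParameterKey": "KeyName", "ParameterValue": key_name},
    ]


def launch_stack(stack_name: str, template_url: str, parameters: list[dict]) -> str:
    """Create the stack, wait for completion, and return the stack ID."""
    import boto3  # third-party dependency, imported only when launching

    cfn = boto3.client("cloudformation")
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=parameters,
        # Assumption: the template creates named IAM roles.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

Keeping parameter construction separate from the API call makes it possible to review or log the parameters before any resources are created.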
After the stack completes, access the MatchQ head node using the configured credentials and begin submitting Slurm jobs using standard Slurm commands. Compute capacity is provisioned automatically based on queue demand.
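Once jobs are queued, a quick way to see how demand is distributed is to count jobs by state. The sketch below summarizes the output of `squeue --noheader --format=%T`, which prints one job state per line; the function name `count_job_states` is hypothetical, and the sample text stands in for real command output.

```python
from collections import Counter


def count_job_states(squeue_output: str) -> Counter:
    """Count jobs per state from `squeue --noheader --format=%T` output."""
    return Counter(
        line.strip() for line in squeue_output.splitlines() if line.strip()
    )


if __name__ == "__main__":
    # Sample text; on the head node this would come from something like:
    #   subprocess.run(["squeue", "--noheader", "--format=%T"],
    #                  check=True, capture_output=True, text=True).stdout
    sample = "RUNNING\nPENDING\nRUNNING\nPENDING\nPENDING\n"
    for state, count in sorted(count_job_states(sample).items()):
        print(state, count)
```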
Customers may optionally configure separate submit or login machines that connect to the MatchQ cluster for job submission, allowing users to submit and manage jobs without direct access to the head node.
Monitoring dashboards are available once the environment is running. Customers can further customize node groups, instance policies, and hybrid execution settings using the provided configuration options and helper scripts.
Support
Vendor support
Email: matchqsupport@modality.cloud
Support description:
MatchQ is offered and supported by Modality, an AWS Advanced Consulting Partner specializing in HPC and cloud optimization. Support includes onboarding assistance, platform configuration guidance, troubleshooting, and best-practice recommendations for performance, reliability, and cost efficiency. Support tiers include 24x7 on-call coverage and ongoing engagement with senior HPC engineers for production environments. Tier-specific details are described in the product pricing.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.