Overview
Gravity-16B-A3B-Preview is a Mixture-of-Experts (MoE) large language model from Trillion Labs, built to deliver strong reasoning and generation quality at a fraction of the inference cost of comparable dense models. It has 16 billion total parameters but activates only about 3 billion per token, routing each token through a small subset of expert networks. This keeps latency and GPU cost low while preserving the breadth of a much larger model.
The model generates explicit step-by-step reasoning before its final answer, which improves reliability on analytical and multi-step tasks, and it supports a 32,768-token context window for long documents and conversations. It performs strongly in both English and Korean, making it well suited to multilingual and Korea-focused applications.
Gravity exposes a standard OpenAI-compatible chat-completions API, so existing OpenAI client libraries and tooling work without code changes. Deploy it as a real-time Amazon SageMaker endpoint for interactive applications, or run batch transform jobs to process large datasets offline. Target users: developers and enterprises building reasoning, chat, agentic, or content-generation applications, including bilingual English and Korean use cases.
Highlights
- Cost-efficient MoE: only about 3B of 16B parameters activate per token, delivering high-quality output at lower GPU cost and latency than dense models of comparable capability.
- Built-in reasoning: produces explicit step-by-step reasoning before its final answer, improving reliability on multi-step and analytical tasks.
Details
Introducing multi-product solutions
You can now purchase comprehensive solutions tailored to use cases and industries.
Features and programs
Financing for AWS Marketplace purchases
Pricing
Vendor refund policy
Software charges for Gravity-16B-A3B-Preview are metered by AWS Marketplace based on your usage (per instance-hour, or per the terms of your offer) and are generally non-refundable once incurred.
How can we make this page better?
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Public Release
Additional details
Inputs
- Summary
POST to /invocations with Content-Type application/json. Body is an OpenAI chat-completions object: model (must be Gravity-16B-A3B-Preview), messages array (roles system, user, assistant), optional max_tokens, temperature, top_p, stop.
- Input MIME type
- application/json
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.