Surge
Accelerate your AI inference workloads with instant access to thousands of H100/H200 GPUs through our on-demand infrastructure. Surge provides enterprise-grade autoscaling for containerized AI models, with usage-based billing and zero upfront commitments.
Why Choose Surge for AI Inference?
- Instant GPU Scaling - Go from 10 to hundreds of H100/H200 GPUs in minutes
- Cost-Efficient Inference - Pay only for active GPU time ($X/hr per H100) with automatic scale-to-zero
- Seamless Integration - Works with existing ML pipelines through AWS SQS, RabbitMQ, and Redis triggers
- Production-Ready - Built-in monitoring, logging, and automatic recovery for mission-critical workloads
Getting Started
Surge is configured through a YAML configuration file in a shared GitHub repository that has been set up for your organisation. A sketch of the repository structure is shown below.
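A minimal sketch of what that repository might contain (the layout here is an assumption; your organisation's repository may differ):

```
.
├── values.yaml   # container image, resource, and deployment settings
└── README.md
```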
Configuration
The configuration file is located at `values.yaml`. To update your container deployment, modify the `image` field in your `values.yaml` file to point at your pre-pushed image from our Fluidstack Harbor registry:
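A minimal sketch of what that field might look like, assuming Helm-style `image.repository`/`image.tag` keys (the registry host and image path below are placeholders; check your organisation's repository for the actual values):

```yaml
image:
  repository: registry.fluidstack.io/acme/llm-inference  # placeholder registry host and path
  tag: v1.2.3                                            # the image tag you pushed
```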
You can also specify resource requirements in your `values.yaml` file. This ensures optimal use of cluster resources for your containers.
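For example, assuming a standard Kubernetes-style `resources` block (your chart's exact keys may differ):

```yaml
resources:
  requests:
    cpu: "8"          # 8 vCPUs per container
    memory: 16Gi      # 16 GiB of RAM per container
  limits:
    cpu: "8"
    memory: 16Gi
    nvidia.com/gpu: 1 # GPUs must be set under limits in Kubernetes
```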
This configuration requests 1 GPU, 16 GiB of RAM, and 8 vCPUs for each container.
Adjust these values (number of GPUs, memory, and CPU) to match your workload's demands.
Deployment
- Push your container image to the Atlas Container Registry.
- Update the image tag in your configuration (see the sketch after this list).
- Commit the change and push it to the main branch of the shared repository.
- Monitor the deployment in ArgoCD.
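In practice, a deployment then amounts to a one-line tag bump in `values.yaml` (the image path is a placeholder, as above); committing and pushing the change triggers the rollout, which you can watch in ArgoCD:

```yaml
image:
  repository: registry.fluidstack.io/acme/llm-inference  # placeholder
  tag: v1.2.4                                            # bumped from v1.2.3 to roll out the new build
```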
Autoscaling
Your deployment automatically scales based on real-time demand using our pre-configured optimization rules. We monitor:
- Queue depths in message brokers (Kafka, SQS, RabbitMQ, Redis, etc.)
- API request rates
- Resource utilization patterns
For special scaling requirements, our support team can discuss and quickly implement custom rules; the sketch below shows the general shape of such a rule.
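Surge manages these rules for you, so the following is purely illustrative: queue-based autoscaling of this kind is commonly expressed as a KEDA `ScaledObject` (whether Surge uses KEDA is an assumption here, and all names and the queue URL are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-worker-scaler
spec:
  scaleTargetRef:
    name: inference-worker        # hypothetical Deployment running your model
  minReplicaCount: 0              # scale to zero when the queue drains
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/000000000000/jobs  # placeholder
        queueLength: "10"         # target number of messages per replica
        awsRegion: us-east-1
```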
Interested in Surge? Contact [email protected].