Helios uses concurrency limiting to ensure fair usage and system stability. This guide explains how limits work and how to handle them.

How Concurrency Limits Work

Instead of traditional rate limits (requests per minute), Helios limits the number of concurrent requests your account can have processing at once.
Why concurrency limits? Health data analysis takes several minutes per request. Traditional rate limits don’t work well for long-running operations, so we limit concurrent requests instead.

Default Limits

| Agent Type | Default Limit |
|---|---|
| Light Agent | 7 concurrent requests |
| Deep Agent | 7 concurrent requests |
| Lab Results Agent | 7 concurrent requests |
Limits are tracked separately for each agent type. You can have 7 Light Agent requests, 7 Deep Agent requests, and 7 Lab Results Agent requests running simultaneously.
Need higher limits? Contact support@heliosinc.xyz to discuss custom limits for your account.
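Because limits are tracked separately per agent type, a client can keep an independent in-flight counter for each type. A minimal sketch (the type keys below are illustrative labels, not official API identifiers):

```javascript
// Per-agent-type in-flight tracking, mirroring how Helios counts
// concurrency. The keys here are illustrative, not API values.
const LIMITS = { light: 7, deep: 7, labResults: 7 };
const inFlight = { light: 0, deep: 0, labResults: 0 };

function canSubmit(agentType) {
  // A slot is free only within that agent type's own limit.
  return inFlight[agentType] < LIMITS[agentType];
}

function trackStart(agentType) { inFlight[agentType]++; }
function trackEnd(agentType) { inFlight[agentType]--; }
```

Saturating one agent type this way leaves the others untouched, matching the separate-tracking behavior described above.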

Response Headers

Every API response includes headers showing your current concurrency status:
| Header | Description | Example |
|---|---|---|
| X-Concurrent-Limit | Maximum concurrent requests allowed | 7 |
| X-Concurrent-Active | Currently active requests (including this one) | 3 |
| X-Concurrent-Remaining | Available slots | 4 |
These headers are included on every API response (200 OK, 429, etc.), not just rate limit errors. This allows you to monitor your concurrency usage proactively.
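Since the headers arrive on every response, a small helper can turn them into numbers for monitoring. A sketch (the helper name is ours, not part of the API):

```javascript
// Parse the three concurrency headers from any fetch Response's headers.
// Returns NaN for any header that is missing.
function readConcurrency(headers) {
  return {
    limit: parseInt(headers.get('X-Concurrent-Limit'), 10),
    active: parseInt(headers.get('X-Concurrent-Active'), 10),
    remaining: parseInt(headers.get('X-Concurrent-Remaining'), 10),
  };
}
```

This works on both success and 429 responses, since the headers are present on each.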

Example Response Headers (200 OK)

HTTP/1.1 200 OK
X-Concurrent-Limit: 7
X-Concurrent-Active: 3
X-Concurrent-Remaining: 4
Content-Type: application/json

Handling Rate Limit Errors

When you exceed your concurrency limit, the API returns a 429 Too Many Requests response:
{
  "error": "Too many concurrent requests",
  "message": "You have 7 active light agent requests. Maximum is 7.",
  "activeCount": 7,
  "limit": 7
}
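The body fields can be surfaced in logs before deciding whether to retry. A small sketch assuming the JSON shape shown above (`describeRateLimit` is a name we made up):

```javascript
// Summarize a 429 body for logging. Assumes the error JSON shape
// documented above: { error, message, activeCount, limit }.
async function describeRateLimit(response) {
  const err = await response.json();
  return `${err.error}: ${err.activeCount}/${err.limit} slots in use`;
}
```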

Response Headers on 429

HTTP/1.1 429 Too Many Requests
X-Concurrent-Limit: 7
X-Concurrent-Active: 7
X-Concurrent-Remaining: 0
Retry-After: 60
Content-Type: application/json
The Retry-After header indicates how many seconds to wait before retrying.

Best Practices

Check the X-Concurrent-Remaining header on every response to know when you’re approaching your limit.
const response = await fetch('https://api.heliosai.health/api/v1/agent', {
  method: 'POST',
  headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
  body: JSON.stringify(requestData)
});

const remaining = parseInt(response.headers.get('X-Concurrent-Remaining'), 10);
if (remaining < 2) {
  console.warn('Approaching concurrency limit');
}
When you receive a 429, wait before retrying:
async function submitWithRetry(requestData, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch('https://api.heliosai.health/api/v1/agent', {
      method: 'POST',
      headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(requestData)
    });
    
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After') || '60';
      const waitTime = parseInt(retryAfter) * 1000;
      console.log(`Rate limited. Waiting ${retryAfter}s before retry...`);
      await new Promise(r => setTimeout(r, waitTime));
      continue;
    }
    
    return response;
  }
  throw new Error('Max retries exceeded');
}
For high-volume use cases, implement a queue to control how many requests you submit:
class RequestQueue {
  constructor(maxConcurrent = 6) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.queue = [];
  }
  
  async submit(requestData) {
    if (this.active >= this.maxConcurrent) {
      await new Promise(resolve => this.queue.push(resolve));
    }
    
    this.active++;
    try {
      return await fetch('https://api.heliosai.health/api/v1/agent', {
        method: 'POST',
        headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
        body: JSON.stringify(requestData)
      });
    } finally {
      this.active--;
      if (this.queue.length > 0) {
        this.queue.shift()();
      }
    }
  }
}
Store the runId of submitted requests so you can track what’s in progress:
const activeRequests = new Map();

// When submitting
const { runId } = await response.json();
activeRequests.set(runId, { submittedAt: new Date(), data: requestData });

// When receiving webhook
app.post('/webhook', (req, res) => {
  const { runId, status } = req.body;
  activeRequests.delete(runId);
  res.status(200).send('OK');
});

Slot Release

Concurrency slots are automatically released when:
  1. Processing completes - Successfully or with an error
  2. Webhook is delivered - After the final webhook is sent
  3. Timeout occurs - After ~14 minutes if processing hangs (safety mechanism)
If your webhook endpoint is slow to respond, slots may take slightly longer to release. Ensure your webhook returns 200 OK quickly.

Next Steps