Models & Regional Endpoints

ToothFairyAI provides comprehensive model management with real-time availability monitoring, regional sovereignty controls, and intelligent routing. This guide covers model discovery, regional endpoint configuration, and degradation data for optimal AI workload management.

📊 Model Discovery via API

Models List Endpoint

All available models are dynamically discoverable via the /models_list endpoint, which provides real-time model availability, capabilities, pricing, and health metrics across all regions.

Endpoint by Region:

| Region | Endpoint | Coverage |
|--------|----------|----------|
| AU | `https://ai.toothfairyai.com/models_list` | Asia-Pacific (Sydney) |
| EU | `https://ai.eu.toothfairyai.com/models_list` | Europe (Frankfurt) |
| US | `https://ai.us.toothfairyai.com/models_list` | United States (N. Virginia) |

No Authentication Required:

The models_list endpoint is public and requires no API key or authentication. This enables:

  • Real-time model availability checks without credentials
  • Integration into external monitoring dashboards
  • Automated model selection based on current health metrics
  • Development and testing workflows without workspace access
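
Because no credentials are needed, an availability check can be as simple as a single fetch. A minimal sketch (the URLs come from the table above; the `templates` field follows the response structure documented in the next section):

```javascript
// Public models_list URL per region -- no API key required.
const MODELS_LIST_URLS = {
  au: "https://ai.toothfairyai.com/models_list",
  eu: "https://ai.eu.toothfairyai.com/models_list",
  us: "https://ai.us.toothfairyai.com/models_list",
};

// Fetch the model list for one region; throws on a non-2xx response.
async function fetchModels(region) {
  const response = await fetch(MODELS_LIST_URLS[region]);
  if (!response.ok) {
    throw new Error(`models_list request failed: ${response.status}`);
  }
  return response.json();
}

// Example: list the model identifiers currently available in the AU region.
// fetchModels("au").then((data) => console.log(Object.keys(data.templates)));
```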

Response Structure

Each model in the response includes comprehensive metadata:

```json
{
  "templates": {
    "sorcerer": {
      "name": "TF Sorcerer",
      "provider": "toothfairyai",
      "modelType": "medium",
      "maxContextLength": 262144,
      "supportsVision": true,
      "toolCalling": true,
      "dynamicRouting": true,
      "pricing": {
        "inputPer1mTokens": 0.42,
        "outputPer1mTokens": 1.19
      },
      "health": {
        "global": { ... },
        "au": { ... },
        "eu": { ... },
        "us": { ... }
      }
    }
  }
}
```

Key Fields:

  • name - Human-readable model name
  • provider - Model provider (toothfairyai, openai, anthropic, etc.)
  • modelType - Size category (small, medium, large)
  • maxContextLength - Maximum context window in tokens
  • supportsVision - Whether model can process images
  • toolCalling - Whether model supports function/tool calling
  • dynamicRouting - Whether model uses intelligent routing (Sorcerer/Mystica)
  • deprecated - Whether model is scheduled for removal
  • deploymentType - serverless or provisioned
  • pricing - Input and output costs per million tokens

🌍 Regional Endpoints & Routing Modes

Total Control Over Inference Location

ToothFairyAI offers regional endpoints that give you granular control over exactly where your AI inference happens. This is critical for:

  • Data Residency Compliance - Ensuring data never leaves your jurisdiction
  • Latency Optimization - Running workloads closest to your users
  • Contractual Obligations - Meeting strict geographic processing requirements
  • Cost Optimization - Selecting regions with optimal pricing

Two Routing Modes

| Mode | Behaviour | Best For |
|------|-----------|----------|
| Global (default) | Prioritises your preferred region but routes to Europe and/or the US if model availability in your region drops below 50% | Most use cases — prioritises your region while ensuring availability |
| Regional | Requests stay in your selected region no matter the degradation level. If capacity is unavailable, the request is queued until capacity frees up | Maximum data residency, strict compliance, contractual obligations |

How to Configure

Individual Agent Configuration:

  1. Navigate to Settings > Agents > [Select Agent]
  2. Find Regional Settings section
  3. Select Preferred Region (Australia East, Europe West, US East)
  4. Choose Routing Mode (Global or Regional)
  5. Save configuration

Workspace-Wide Enforcement (Business/Enterprise):

Admins can enforce regional inference across all agents in the workspace:

  1. Navigate to Settings > Workspace > Regional
  2. Enable Enforce Regional Inference
  3. Select the required region for all agents
  4. Save — all agents are forced to use the configured region

Why Enforce at Workspace Level:

  • Prevent Misconfiguration Leaks - A single agent misconfigured to "Global" won't accidentally route sensitive data outside your region
  • Compliance Assurance - Enforce regional sovereignty without trusting each team member to configure correctly
  • Audit Simplicity - One workspace-wide setting guarantees compliance rather than auditing individual agent configurations

📈 Degradation Data & Health Metrics

Real-Time Health Monitoring

The models_list endpoint provides real-time health metrics for each model across all regions. This data enables:

  • Intelligent Model Selection - Choose models with best current availability
  • Proactive Monitoring - Detect issues before they impact your workflows
  • Performance Optimization - Select regions with lowest latency for your use case

Health Data Structure

Each model includes health data for four contexts:

```json
{
  "health": {
    "global": {
      "status": "healthy",
      "healthScore": 95,
      "errorRate": 0,
      "avgLatencyMs": 2809,
      "reliability": 0.999,
      "avgLatencyEma": 2545,
      "totalWindows": 5683,
      "lastFailureTs": 1777621643
    },
    "au": { ... },
    "eu": { ... },
    "us": { ... }
  }
}
```

Metric Definitions:

| Metric | Description | Range |
|--------|-------------|-------|
| status | Current health status | `healthy`, `degraded`, `unhealthy` |
| healthScore | Composite score based on error rate and latency | 0-100 |
| errorRate | Fraction of failed requests in the last hour | 0.0-1.0 |
| avgLatencyMs | Average response time | milliseconds |
| reliability | Long-term reliability based on a historical EMA (Exponential Moving Average) | 0.0-1.0 |
| avgLatencyEma | Smoothed latency average over time | milliseconds |
| totalWindows | Number of monitoring intervals collected | integer |
| lastFailureTs | Unix timestamp of the last failure (null if none) | timestamp or null |

Status Thresholds:

  • ✅ Healthy - healthScore ≥ 80
  • ⚠️ Degraded - healthScore 50-79
  • ❌ Unhealthy - healthScore below 50
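
The thresholds above map directly to a small helper. A sketch (the function name is illustrative):

```javascript
// Map a healthScore (0-100) to the documented status bands.
function statusFor(healthScore) {
  if (healthScore >= 80) return "healthy";
  if (healthScore >= 50) return "degraded";
  return "unhealthy";
}
```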

Using Health Data for Decisions

Example Decision Flow:

  1. Check Preferred Region - Is healthScore ≥ 80?

    • Yes → Use Regional mode for strict residency
    • No → Consider Global mode for availability
  2. Compare Regions - Which region has best metrics?

    • Compare healthScore across au, eu, us
    • Consider latency (avgLatencyMs) for your user base
    • Check reliability for long-term stability
  3. Monitor Degradation - Is errorRate increasing?

    • High errorRate suggests temporary issues
    • Low totalWindows indicates new deployment
    • lastFailureTs shows recent problems
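
The decision flow above can be sketched as a small routine. This is an illustrative helper, not a ToothFairyAI API: it takes a model's per-region `health` object from `models_list` and a preferred region, and recommends a routing mode plus a reference region.

```javascript
// Recommend a routing mode from per-region health data.
// `health` is the model's health object (keys: au, eu, us);
// `preferred` is one of "au", "eu", "us".
function recommendRouting(health, preferred) {
  const score = (region) => health[region]?.healthScore ?? 0;

  // Step 1: preferred region healthy? Regional mode is safe for strict residency.
  if (score(preferred) >= 80) {
    return { mode: "regional", region: preferred };
  }

  // Step 2: preferred region degraded -- fall back to Global mode and note
  // the region with the best current healthScore for reference.
  const regions = ["au", "eu", "us"];
  const best = regions.reduce((a, b) => (score(a) >= score(b) ? a : b));
  return { mode: "global", region: best };
}
```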

🗄️ Caching Strategy

Intelligent Model Data Caching

Similar to MCP data, ToothFairyAI caches model availability data by region to optimize performance and reduce API calls.

Cache Strategy:

  1. Regional Cache - Each region's models_list is cached independently
  2. TTL (Time To Live) - Cache expires after 60 seconds
  3. Cache Busting - Force refresh with ?t={timestamp} parameter
  4. Parallel Fetching - All regions fetched simultaneously for status pages

Implementation Pattern:

```javascript
const REGIONAL_ENDPOINTS = {
  au: "https://ai.toothfairyai.com/models_list",
  eu: "https://ai.eu.toothfairyai.com/models_list",
  us: "https://ai.us.toothfairyai.com/models_list",
};

// Fetch all regions in parallel, bypassing caches for fresh data
const results = await Promise.allSettled(
  Object.entries(REGIONAL_ENDPOINTS).map(async ([region, url]) => {
    const cacheBuster = `?t=${Date.now()}`;
    const response = await fetch(url + cacheBuster, {
      cache: "no-store",
      headers: {
        "Cache-Control": "no-cache, no-store, must-revalidate",
      },
    });
    return { region, data: await response.json() };
  })
);
```
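
On the client side, the 60-second TTL from the cache strategy can be honoured with a small wrapper. A minimal sketch (the names are illustrative; `fetcher` stands in for any async call like the fetch in the implementation pattern):

```javascript
// Minimal per-region cache with a 60-second TTL, matching the strategy above.
const TTL_MS = 60_000;
const modelCache = new Map(); // region -> { data, expiresAt }

async function cachedModels(region, fetcher, now = Date.now()) {
  const entry = modelCache.get(region);
  if (entry && entry.expiresAt > now) {
    return entry.data; // fresh cache hit
  }
  const data = await fetcher(region); // miss or expired: refetch
  modelCache.set(region, { data, expiresAt: now + TTL_MS });
  return data;
}
```

Passing `now` explicitly keeps the TTL logic easy to test; in production the default `Date.now()` is used.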

Benefits:

  • Reduced Latency - Cached data serves instantly
  • Lower Costs - Fewer API calls to endpoints
  • Better UX - Faster loading of model selection UI
  • Real-Time Option - Cache bypass for fresh data

🎯 Model Selection Best Practices

Choose by Use Case

Reference the Model Selection Guide for detailed recommendations by:

  • Code Generation - Deepseek R1, Qwen Coder family
  • Reasoning & Planning - Sorcerer/Mystica Thinking variants
  • Tool Calling - Mystica, Qwen 3 family
  • Vision Tasks - Llama 4, Qwen-VL family
  • Low Latency - Llama 3.1/3.2 small models

Monitor and Adapt

  1. Check Health Regularly - Use models_list endpoint or Status page
  2. Adapt to Degradation - Switch regions or models when healthScore drops
  3. Test Performance - Measure latency for your specific use cases
  4. Review Costs - Monitor token usage across regions

Enterprise Considerations

For Business and Enterprise plans:

  • Enforce Regional Compliance - Workspace-wide regional enforcement
  • Custom Model Integration - Add 3rd party providers to models list
  • Private Hosting - Deploy models on-premises with full control
  • SLA Guarantees - Production-grade availability commitments

📋 Summary

Key Capabilities:

  • Dynamic Discovery - All models available via public /models_list endpoint
  • Regional Sovereignty - Global or Regional routing modes for full control
  • Real-Time Health - Degradation data for every model in every region
  • Intelligent Caching - Regional caching strategy for optimal performance
  • Workspace Enforcement - Admin-level regional controls for compliance

Quick Reference:

  • Endpoint: https://ai.toothfairyai.com/models_list (public, no auth)
  • Regions: AU, EU, US with independent health metrics
  • Routing Modes: Global (prioritises region, routes if availability drops below 50%) or Regional (strict residency)
  • Health Metrics: status, healthScore, errorRate, latency, reliability
  • Configuration: Agent-level or workspace-wide enforcement

Next Steps:

  • View real-time model status: Models Status Page
  • Choose models by use case: Model Selection Guide
  • Configure regional settings: Settings > Workspace > Regional
  • Monitor degradation: models_list endpoint or Status dashboard