Models & Regional Endpoints
ToothFairyAI provides comprehensive model management with real-time availability monitoring, regional sovereignty controls, and intelligent routing. This guide covers model discovery, regional endpoint configuration, and degradation data for optimal AI workload management.
📊 Model Discovery via API
Models List Endpoint
All available models are dynamically discoverable via the /models_list endpoint, which provides real-time model availability, capabilities, pricing, and health metrics across all regions.
Endpoint by Region:
| Region | Endpoint | Coverage |
|---|---|---|
| AU | https://ai.toothfairyai.com/models_list | Asia-Pacific (Sydney) |
| EU | https://ai.eu.toothfairyai.com/models_list | Europe (Frankfurt) |
| US | https://ai.us.toothfairyai.com/models_list | United States (N. Virginia) |
No Authentication Required:
The models_list endpoint is public and requires no API key or authentication. This enables:
- Real-time model availability checks without credentials
- Integration into external monitoring dashboards
- Automated model selection based on current health metrics
- Development and testing workflows without workspace access
Response Structure
Each model in the response includes comprehensive metadata:
{
  "templates": {
    "sorcerer": {
      "name": "TF Sorcerer",
      "provider": "toothfairyai",
      "modelType": "medium",
      "maxContextLength": 262144,
      "supportsVision": true,
      "toolCalling": true,
      "dynamicRouting": true,
      "pricing": {
        "inputPer1mTokens": 0.42,
        "outputPer1mTokens": 1.19
      },
      "health": {
        "global": { ... },
        "au": { ... },
        "eu": { ... },
        "us": { ... }
      }
    }
  }
}
Key Fields:
- name - Human-readable model name
- provider - Model provider (toothfairyai, openai, anthropic, etc.)
- modelType - Size category (small, medium, large)
- maxContextLength - Maximum context window in tokens
- supportsVision - Whether model can process images
- toolCalling - Whether model supports function/tool calling
- dynamicRouting - Whether model uses intelligent routing (Sorcerer/Mystica)
- deprecated - Whether model is scheduled for removal
- deploymentType - serverless or provisioned
- pricing - Input and output costs per million tokens
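Since the endpoint is public, a plain fetch is enough to consume it; the fields above can then drive model selection. The sketch below is illustrative (the helper names are not part of any SDK) and filters for non-deprecated models that support both vision and tool calling:

```javascript
// Illustrative sketch: consume the public models_list endpoint (no API key
// required) and filter models on the capability flags described above.
const MODELS_LIST_URL = "https://ai.toothfairyai.com/models_list";

// Fetch helper (requires Node 18+ or a browser for global fetch).
async function fetchModelsList() {
  const response = await fetch(MODELS_LIST_URL);
  if (!response.ok) throw new Error(`models_list failed: ${response.status}`);
  return response.json();
}

// Pick non-deprecated models that support both vision and tool calling.
function visionToolModels(payload) {
  return Object.entries(payload.templates)
    .filter(([, m]) => m.supportsVision && m.toolCalling && !m.deprecated)
    .map(([id, m]) => ({ id, name: m.name, maxContextLength: m.maxContextLength }));
}

// Demo against a payload shaped like the documented response:
const sample = {
  templates: {
    sorcerer: {
      name: "TF Sorcerer",
      modelType: "medium",
      maxContextLength: 262144,
      supportsVision: true,
      toolCalling: true,
      deprecated: false,
    },
  },
};

console.log(visionToolModels(sample)); // one entry: TF Sorcerer
```

The same filter works on a live payload via `fetchModelsList()`, since the response shape matches the sample above.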
🌍 Regional Endpoints & Routing Modes
Total Control Over Inference Location
ToothFairyAI offers regional endpoints that give you granular control over exactly where your AI inference happens. This is critical for:
- Data Residency Compliance - Ensuring data never leaves your jurisdiction
- Latency Optimization - Running workloads closest to your users
- Contractual Obligations - Meeting strict geographic processing requirements
- Cost Optimization - Selecting regions with optimal pricing
Two Routing Modes
| Mode | Behaviour | Best For |
|---|---|---|
| Global (default) | Prioritises your preferred region, but routes to Europe and/or the US if model availability in your region drops below 50% | Most use cases — regional preference without sacrificing availability |
| Regional | Requests stay in your selected region no matter the degradation level. If capacity is unavailable, the request is queued until capacity frees up | Maximum data residency, strict compliance, contractual obligations |
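The two modes reduce to a small decision rule. This sketch mirrors the documented behaviour (it is not the actual router; the function name and the 0-100 availability inputs are illustrative, with the 50% failover threshold taken from this guide):

```javascript
// Illustrative routing-mode logic, not ToothFairyAI's internal router.
// availabilityByRegion maps region codes to 0-100 availability percentages.
const FAILOVER_THRESHOLD = 50; // Global mode fails over below 50% availability

function resolveRegion(mode, preferred, availabilityByRegion) {
  const local = availabilityByRegion[preferred] ?? 0;
  if (mode === "regional") {
    // Regional: never leave the selected region — if capacity is
    // unavailable, the request queues here rather than failing over.
    return preferred;
  }
  // Global: prefer the selected region; fail over only when availability
  // drops below the threshold, picking the most available region.
  if (local >= FAILOVER_THRESHOLD) return preferred;
  const [best] = Object.entries(availabilityByRegion)
    .sort(([, a], [, b]) => b - a);
  return best[0];
}

console.log(resolveRegion("global", "au", { au: 30, eu: 90, us: 80 }));   // "eu"
console.log(resolveRegion("regional", "au", { au: 30, eu: 90, us: 80 })); // "au"
```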
How to Configure
Individual Agent Configuration:
- Navigate to Settings > Agents > [Select Agent]
- Find Regional Settings section
- Select Preferred Region (Australia East, Europe West, US East)
- Choose Routing Mode (Global or Regional)
- Save configuration
Workspace-Wide Enforcement (Business/Enterprise):
Admins can enforce regional inference across all agents in the workspace:
- Navigate to Settings > Workspace > Regional
- Enable Enforce Regional Inference
- Select the required region for all agents
- Save — all agents are forced to use the configured region
Why Enforce at Workspace Level:
- Prevent Misconfiguration Leaks - A single agent misconfigured to "Global" won't accidentally route sensitive data outside your region
- Compliance Assurance - Enforce regional sovereignty without trusting each team member to configure correctly
- Audit Simplicity - One workspace-wide setting guarantees compliance rather than auditing individual agent configurations
📈 Degradation Data & Health Metrics
Real-Time Health Monitoring
The models_list endpoint provides real-time health metrics for each model across all regions. This data enables:
- Intelligent Model Selection - Choose models with best current availability
- Proactive Monitoring - Detect issues before they impact your workflows
- Performance Optimization - Select regions with lowest latency for your use case
Health Data Structure
Each model includes health data for four contexts:
{
  "health": {
    "global": {
      "status": "healthy",
      "healthScore": 95,
      "errorRate": 0,
      "avgLatencyMs": 2809,
      "reliability": 0.999,
      "avgLatencyEma": 2545,
      "totalWindows": 5683,
      "lastFailureTs": 1777621643
    },
    "au": { ... },
    "eu": { ... },
    "us": { ... }
  }
}
Metric Definitions:
| Metric | Description | Range |
|---|---|---|
| status | Current health status | healthy, degraded, unhealthy |
| healthScore | Composite score based on error rate and latency | 0-100 |
| errorRate | Fraction of failed requests in the last hour | 0.0-1.0 |
| avgLatencyMs | Average response time in milliseconds | milliseconds |
| reliability | Long-term reliability based on historical EMA (Exponential Moving Average) | 0.0-1.0 |
| avgLatencyEma | Smoothed latency average over time | milliseconds |
| totalWindows | Number of monitoring intervals collected | integer |
| lastFailureTs | Unix timestamp of last failure (null if none) | timestamp or null |
Status Thresholds:
- ✅ Healthy - healthScore ≥ 80
- ⚠️ Degraded - healthScore 50-79
- ❌ Unhealthy - healthScore below 50
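These thresholds translate directly into a tiny classifier (an illustrative helper, not part of any SDK):

```javascript
// Map a healthScore (0-100) to the documented status bands.
function statusFromScore(healthScore) {
  if (healthScore >= 80) return "healthy";
  if (healthScore >= 50) return "degraded";
  return "unhealthy";
}

console.log(statusFromScore(95)); // "healthy"
console.log(statusFromScore(62)); // "degraded"
console.log(statusFromScore(31)); // "unhealthy"
```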
Using Health Data for Decisions
Example Decision Flow:
1. Check Preferred Region - Is healthScore ≥ 80?
   - Yes → use Regional mode for strict residency
   - No → consider Global mode for availability
2. Compare Regions - Which region has the best metrics?
   - Compare healthScore across au, eu, us
   - Consider latency (avgLatencyMs) for your user base
   - Check reliability for long-term stability
3. Monitor Degradation - Is errorRate increasing?
   - A high errorRate suggests temporary issues
   - A low totalWindows indicates a new deployment
   - lastFailureTs shows recent problems
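The decision flow above can be sketched as a small helper. This is illustrative (the function name is hypothetical) and assumes health data shaped like the models_list response:

```javascript
// Sketch of the decision flow: given per-region health data, recommend a
// routing mode and region. Illustrative only, not a ToothFairyAI API.
function recommendRouting(health, preferred) {
  const local = health[preferred];
  // Step 1: if the preferred region is healthy, strict residency is safe.
  if (local && local.healthScore >= 80) {
    return { mode: "regional", region: preferred };
  }
  // Step 2: otherwise compare regions — highest healthScore wins, with
  // ties broken by lower average latency.
  const regions = Object.entries(health).filter(([r]) => r !== "global");
  regions.sort(([, a], [, b]) =>
    b.healthScore - a.healthScore || a.avgLatencyMs - b.avgLatencyMs);
  return { mode: "global", region: regions[0][0] };
}

const healthSample = {
  global: { healthScore: 90, avgLatencyMs: 2809 },
  au: { healthScore: 45, avgLatencyMs: 2100 },
  eu: { healthScore: 92, avgLatencyMs: 2400 },
  us: { healthScore: 92, avgLatencyMs: 2600 },
};

console.log(recommendRouting(healthSample, "au"));
// → { mode: "global", region: "eu" }
```

Step 3 (monitoring errorRate, totalWindows, and lastFailureTs) would feed the same data on a polling loop rather than a one-off call.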
🗄️ Caching Strategy
Intelligent Model Data Caching
Similar to MCP data, ToothFairyAI caches model availability data by region to optimize performance and reduce API calls.
Cache Strategy:
- Regional Cache - Each region's models_list is cached independently
- TTL (Time To Live) - Cache expires after 60 seconds
- Cache Busting - Force refresh with ?t={timestamp} parameter
- Parallel Fetching - All regions fetched simultaneously for status pages
Implementation Pattern:
const REGIONAL_ENDPOINTS = {
  au: "https://ai.toothfairyai.com/models_list",
  eu: "https://ai.eu.toothfairyai.com/models_list",
  us: "https://ai.us.toothfairyai.com/models_list"
};

// Fetch all regions in parallel
const results = await Promise.allSettled(
  Object.entries(REGIONAL_ENDPOINTS).map(async ([region, url]) => {
    const cacheBuster = `?t=${Date.now()}`;
    const response = await fetch(url + cacheBuster, {
      cache: "no-store",
      headers: {
        "Cache-Control": "no-cache, no-store, must-revalidate",
      },
    });
    return { region, data: await response.json() };
  })
);
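The 60-second TTL can be layered on top of that fetch with a small cache wrapper. This sketch mirrors the strategy described above but is not ToothFairyAI's internal implementation; the helper names are illustrative:

```javascript
// Illustrative 60-second TTL cache for per-region models_list payloads.
const TTL_MS = 60 * 1000;
const cache = new Map(); // region -> { data, fetchedAt }

// fetcher is injected so callers can pass the real network fetch,
// e.g. (region) => fetch(REGIONAL_ENDPOINTS[region]).then(r => r.json()).
async function getModels(region, fetcher, now = Date.now()) {
  const entry = cache.get(region);
  if (entry && now - entry.fetchedAt < TTL_MS) return entry.data; // fresh hit
  const data = await fetcher(region); // expired or missing: refetch
  cache.set(region, { data, fetchedAt: now });
  return data;
}

// Demo with a stub fetcher (no network): only two real fetches happen.
(async () => {
  let calls = 0;
  const stub = async () => ({ call: ++calls });
  await getModels("au", stub, 0);      // miss: fetches
  await getModels("au", stub, 30_000); // within TTL: cached
  await getModels("au", stub, 61_000); // expired: refetches
  console.log(calls); // 2
})();
```

Cache busting simply means skipping this layer: appending ?t={timestamp} and fetching with no-store, as in the pattern above.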
Benefits:
- Reduced Latency - Cached data serves instantly
- Lower Costs - Fewer API calls to endpoints
- Better UX - Faster loading of model selection UI
- Real-Time Option - Cache bypass for fresh data
🎯 Model Selection Best Practices
Choose by Use Case
Reference the Model Selection Guide for detailed recommendations by:
- Code Generation - Deepseek R1, Qwen Coder family
- Reasoning & Planning - Sorcerer/Mystica Thinking variants
- Tool Calling - Mystica, Qwen 3 family
- Vision Tasks - Llama 4, Qwen-VL family
- Low Latency - Llama 3.1/3.2 small models
Monitor and Adapt
- Check Health Regularly - Use models_list endpoint or Status page
- Adapt to Degradation - Switch regions or models when healthScore drops
- Test Performance - Measure latency for your specific use cases
- Review Costs - Monitor token usage across regions
Enterprise Considerations
For Business and Enterprise plans:
- Enforce Regional Compliance - Workspace-wide regional enforcement
- Custom Model Integration - Add 3rd party providers to models list
- Private Hosting - Deploy models on-premises with full control
- SLA Guarantees - Production-grade availability commitments
📋 Summary
Key Capabilities:
- Dynamic Discovery - All models available via public /models_list endpoint
- Regional Sovereignty - Global or Regional routing modes for full control
- Real-Time Health - Degradation data for every model in every region
- Intelligent Caching - Regional caching strategy for optimal performance
- Workspace Enforcement - Admin-level regional controls for compliance
Quick Reference:
- Endpoint: https://ai.toothfairyai.com/models_list (public, no auth)
- Regions: AU, EU, US with independent health metrics
- Routing Modes: Global (prioritises region, routes if availability drops below 50%) or Regional (strict residency)
- Health Metrics: status, healthScore, errorRate, latency, reliability
- Configuration: Agent-level or workspace-wide enforcement
Next Steps:
- View real-time model status: Models Status Page
- Choose models by use case: Model Selection Guide
- Configure regional settings: Settings > Workspace > Regional
- Monitor degradation: models_list endpoint or Status dashboard