Models & Regional Endpoints

ToothFairyAI provides comprehensive model management with real-time availability monitoring, regional sovereignty controls, and intelligent routing. This guide covers model discovery, regional endpoint configuration, and degradation data for optimal AI workload management.

📊 Model Discovery via API

Models List Endpoint

All available models are dynamically discoverable via the /models_list endpoint, which provides real-time model availability, capabilities, pricing, and health metrics across all regions.

Endpoint by Region:

| Region | Endpoint | Coverage |
|--------|----------|----------|
| AU | `https://ai.toothfairyai.com/models_list` | Asia-Pacific (Sydney) |
| EU | `https://ai.eu.toothfairyai.com/models_list` | Europe (Frankfurt) |
| US | `https://ai.us.toothfairyai.com/models_list` | United States (N. Virginia) |

No Authentication Required:

The models_list endpoint is public and requires no API key or authentication. This enables:

  • Real-time model availability checks without credentials
  • Integration into external monitoring dashboards
  • Automated model selection based on current health metrics
  • Development and testing workflows without workspace access
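
Because no credentials are needed, an availability check can be as simple as a single fetch. A minimal sketch (the URLs come from the table above; the `templates` field follows the response structure documented in the next section):

```javascript
// Public models_list URL per region -- no API key required.
const MODELS_LIST_URLS = {
  au: "https://ai.toothfairyai.com/models_list",
  eu: "https://ai.eu.toothfairyai.com/models_list",
  us: "https://ai.us.toothfairyai.com/models_list",
};

// Fetch the model list for one region; throws on a non-2xx response.
async function fetchModels(region) {
  const response = await fetch(MODELS_LIST_URLS[region]);
  if (!response.ok) {
    throw new Error(`models_list request failed: ${response.status}`);
  }
  return response.json();
}

// Example: list the model identifiers currently available in the AU region.
// fetchModels("au").then((data) => console.log(Object.keys(data.templates)));
```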

Response Structure

Each model in the response includes comprehensive metadata:

```json
{
  "templates": {
    "sorcerer": {
      "name": "TF Sorcerer",
      "provider": "toothfairyai",
      "modelType": "medium",
      "maxContextLength": 262144,
      "supportsVision": true,
      "toolCalling": true,
      "dynamicRouting": true,
      "pricing": {
        "inputPer1mTokens": 0.42,
        "outputPer1mTokens": 1.19
      },
      "health": {
        "global": { ... },
        "au": { ... },
        "eu": { ... },
        "us": { ... }
      }
    }
  }
}
```

Key Fields:

  • name - Human-readable model name
  • provider - Model provider (toothfairyai, openai, anthropic, etc.)
  • modelType - Size category (small, medium, large)
  • maxContextLength - Maximum context window in tokens
  • supportsVision - Whether model can process images
  • toolCalling - Whether model supports function/tool calling
  • dynamicRouting - Whether model uses intelligent routing (Sorcerer/Mystica)
  • deprecated - Whether model is scheduled for removal
  • deploymentType - serverless or provisioned
  • pricing - Input and output costs per million tokens

🌍 Regional Endpoints & Routing Modes

Total Control Over Inference Location

ToothFairyAI offers regional endpoints that give you granular control over exactly where your AI inference happens. This is critical for:

  • Data Residency Compliance - Ensuring data never leaves your jurisdiction
  • Latency Optimization - Running workloads closest to your users
  • Contractual Obligations - Meeting strict geographic processing requirements
  • Cost Optimization - Selecting regions with optimal pricing

Two Routing Modes

| Mode | Behaviour | Best For |
|------|-----------|----------|
| Global (default) | Prioritises your preferred region but routes to Europe and/or the US if model availability in your region drops below 50% | Most use cases — prioritises your region while ensuring availability |
| Regional | Requests stay in your selected region no matter the degradation level. If capacity is unavailable, the request is queued until capacity frees up | Maximum data residency, strict compliance, contractual obligations |

How to Configure

Individual Agent Configuration:

  1. Navigate to Settings > Agents > [Select Agent]
  2. Find Regional Settings section
  3. Select Preferred Region (Australia East, Europe West, US East)
  4. Choose Routing Mode (Global or Regional)
  5. Save configuration

Workspace-Wide Enforcement (Business/Enterprise):

Admins can enforce regional inference across all agents in the workspace:

  1. Navigate to Settings > Workspace > Regional
  2. Enable Enforce Regional Inference
  3. Select the required region for all agents
  4. Save — all agents are forced to use the configured region

Why Enforce at Workspace Level:

  • Prevent Misconfiguration Leaks - A single agent misconfigured to "Global" won't accidentally route sensitive data outside your region
  • Compliance Assurance - Enforce regional sovereignty without trusting each team member to configure correctly
  • Audit Simplicity - One workspace-wide setting guarantees compliance rather than auditing individual agent configurations

📈 Degradation Data & Health Metrics

Real-Time Health Monitoring

The models_list endpoint provides real-time health metrics for each model across all regions. This data enables:

  • Intelligent Model Selection - Choose models with best current availability
  • Proactive Monitoring - Detect issues before they impact your workflows
  • Performance Optimization - Select regions with lowest latency for your use case

Health Data Structure

Each model includes health data for four contexts:

```json
{
  "health": {
    "global": {
      "status": "healthy",
      "healthScore": 95,
      "errorRate": 0,
      "avgLatencyMs": 2809,
      "reliability": 0.999,
      "avgLatencyEma": 2545,
      "totalWindows": 5683,
      "lastFailureTs": 1777621643
    },
    "au": { ... },
    "eu": { ... },
    "us": { ... }
  }
}
```

Metric Definitions:

| Metric | Description | Range |
|--------|-------------|-------|
| status | Current health status | `healthy`, `degraded`, `unhealthy` |
| healthScore | Composite score based on error rate and latency | 0-100 |
| errorRate | Fraction of failed requests in the last hour | 0.0-1.0 |
| avgLatencyMs | Average response time | milliseconds |
| reliability | Long-term reliability based on a historical EMA (Exponential Moving Average) | 0.0-1.0 |
| avgLatencyEma | Smoothed latency average over time | milliseconds |
| totalWindows | Number of monitoring intervals collected | integer |
| lastFailureTs | Unix timestamp of the last failure (null if none) | timestamp or null |

Status Thresholds:

  • ✅ Healthy - healthScore ≥ 80
  • ⚠️ Degraded - healthScore 50-79
  • ❌ Unhealthy - healthScore below 50
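
The thresholds above map directly to a small helper. A sketch (the function name is illustrative):

```javascript
// Map a healthScore (0-100) to the documented status bands.
function statusFor(healthScore) {
  if (healthScore >= 80) return "healthy";
  if (healthScore >= 50) return "degraded";
  return "unhealthy";
}
```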

Using Health Data for Decisions

Example Decision Flow:

  1. Check Preferred Region - Is healthScore ≥ 80?

    • Yes → Use Regional mode for strict residency
    • No → Consider Global mode for availability
  2. Compare Regions - Which region has best metrics?

    • Compare healthScore across au, eu, us
    • Consider latency (avgLatencyMs) for your user base
    • Check reliability for long-term stability
  3. Monitor Degradation - Is errorRate increasing?

    • High errorRate suggests temporary issues
    • Low totalWindows indicates new deployment
    • lastFailureTs shows recent problems
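
The decision flow above can be sketched as a small routine. This is an illustrative helper, not a ToothFairyAI API: it takes a model's per-region `health` object from `models_list` and a preferred region, and recommends a routing mode plus a reference region.

```javascript
// Recommend a routing mode from per-region health data.
// `health` is the model's health object (keys: au, eu, us);
// `preferred` is one of "au", "eu", "us".
function recommendRouting(health, preferred) {
  const score = (region) => health[region]?.healthScore ?? 0;

  // Step 1: preferred region healthy? Regional mode is safe for strict residency.
  if (score(preferred) >= 80) {
    return { mode: "regional", region: preferred };
  }

  // Step 2: preferred region degraded -- fall back to Global mode and note
  // the region with the best current healthScore for reference.
  const regions = ["au", "eu", "us"];
  const best = regions.reduce((a, b) => (score(a) >= score(b) ? a : b));
  return { mode: "global", region: best };
}
```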

🗄️ Caching Strategy

Intelligent Model Data Caching

Similar to MCP data, ToothFairyAI caches model availability data by region to optimize performance and reduce API calls.

Cache Strategy:

  1. Regional Cache - Each region's models_list is cached independently
  2. TTL (Time To Live) - Cache expires after 60 seconds
  3. Cache Busting - Force refresh with ?t={timestamp} parameter
  4. Parallel Fetching - All regions fetched simultaneously for status pages

Implementation Pattern:

```javascript
const REGIONAL_ENDPOINTS = {
  au: "https://ai.toothfairyai.com/models_list",
  eu: "https://ai.eu.toothfairyai.com/models_list",
  us: "https://ai.us.toothfairyai.com/models_list",
};

// Fetch all regions in parallel, bypassing caches for fresh data
const results = await Promise.allSettled(
  Object.entries(REGIONAL_ENDPOINTS).map(async ([region, url]) => {
    const cacheBuster = `?t=${Date.now()}`;
    const response = await fetch(url + cacheBuster, {
      cache: "no-store",
      headers: {
        "Cache-Control": "no-cache, no-store, must-revalidate",
      },
    });
    return { region, data: await response.json() };
  })
);
```
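
On the client side, the 60-second TTL from the cache strategy can be honoured with a small wrapper. A minimal sketch (the names are illustrative; `fetcher` stands in for any async call like the fetch in the implementation pattern):

```javascript
// Minimal per-region cache with a 60-second TTL, matching the strategy above.
const TTL_MS = 60_000;
const modelCache = new Map(); // region -> { data, expiresAt }

async function cachedModels(region, fetcher, now = Date.now()) {
  const entry = modelCache.get(region);
  if (entry && entry.expiresAt > now) {
    return entry.data; // fresh cache hit
  }
  const data = await fetcher(region); // miss or expired: refetch
  modelCache.set(region, { data, expiresAt: now + TTL_MS });
  return data;
}
```

Passing `now` explicitly keeps the TTL logic easy to test; in production the default `Date.now()` is used.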

Benefits:

  • Reduced Latency - Cached data serves instantly
  • Lower Costs - Fewer API calls to endpoints
  • Better UX - Faster loading of model selection UI
  • Real-Time Option - Cache bypass for fresh data

🎯 Model Selection Best Practices

Choose by Use Case

Reference the Model Selection Guide for detailed recommendations by:

  • Code Generation - Deepseek R1, Qwen Coder family
  • Reasoning & Planning - Sorcerer/Mystica Thinking variants
  • Tool Calling - Mystica, Qwen 3 family
  • Vision Tasks - Llama 4, Qwen-VL family
  • Low Latency - Llama 3.1/3.2 small models

Monitor and Adapt

  1. Check Health Regularly - Use models_list endpoint or Status page
  2. Adapt to Degradation - Switch regions or models when healthScore drops
  3. Test Performance - Measure latency for your specific use cases
  4. Review Costs - Monitor token usage across regions

Enterprise Considerations

For Business and Enterprise plans:

  • Enforce Regional Compliance - Workspace-wide regional enforcement
  • Custom Model Integration - Add 3rd party providers to models list
  • Private Hosting - Deploy models on-premises with full control
  • SLA Guarantees - Production-grade availability commitments

📋 Summary

Key Capabilities:

  • Dynamic Discovery - All models available via public /models_list endpoint
  • Regional Sovereignty - Global or Regional routing modes for full control
  • Real-Time Health - Degradation data for every model in every region
  • Intelligent Caching - Regional caching strategy for optimal performance
  • Workspace Enforcement - Admin-level regional controls for compliance

Quick Reference:

  • Endpoint: https://ai.toothfairyai.com/models_list (public, no auth)
  • Regions: AU, EU, US with independent health metrics
  • Routing Modes: Global (prioritises region, routes if availability drops below 50%) or Regional (strict residency)
  • Health Metrics: status, healthScore, errorRate, latency, reliability
  • Configuration: Agent-level or workspace-wide enforcement

Next Steps:

  • View real-time model status: Models Status Page
  • Choose models by use case: Model Selection Guide
  • Configure regional settings: Settings > Workspace > Regional
  • Monitor degradation: models_list endpoint or Status dashboard