Supported Models
Blindference currently supports three inference models across cloud and local backends.
Model Reference
| Model ID | Provider | Backend | VRAM / Requirements | Tier | Speed | Quality |
|---|---|---|---|---|---|---|
| groq:llama-3.3-70b-versatile | Groq | Cloud API | GROQ_API_KEY env var | 0 | Fast | High |
| gemini:gemini-2.5-flash | Google | Cloud API | GOOGLE_API_KEY env var | 0 | Fast | Medium-High |
| facebook/opt-125m | Local | vLLM | 0.5GB+ VRAM, vllm package | 0 | Variable | Dev/Testing |
Resolution Order
When a job arrives, the node queries backends in registration order:
- vLLM (if a GPU is available and the model is supported)
- Groq (if GROQ_API_KEY is set)
- Gemini (if GOOGLE_API_KEY is set)
- Mock (always available; universal fallback)

The first backend that can serve the requested model_id wins.
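The first-match rule can be sketched in a few lines of Python. This is a hypothetical illustration, not Blindference's implementation: the `Backend`, `build_backends`, and `resolve_backend` names are invented here, and matching cloud models by their `groq:` / `gemini:` prefix is an assumption based on the model IDs in the table.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Backend:
    name: str
    # Predicate: can this backend serve the given model_id right now?
    can_serve: Callable[[str], bool]


def build_backends(
    env: Mapping[str, str], gpu_available: bool, local_models: frozenset[str]
) -> list[Backend]:
    """Hypothetical registry in the documented order: vLLM, Groq, Gemini, Mock."""
    return [
        Backend("vllm", lambda m: gpu_available and m in local_models),
        # Assumption: cloud models are recognized by their "groq:" / "gemini:" prefix.
        Backend("groq", lambda m: bool(env.get("GROQ_API_KEY")) and m.startswith("groq:")),
        Backend("gemini", lambda m: bool(env.get("GOOGLE_API_KEY")) and m.startswith("gemini:")),
        Backend("mock", lambda m: True),  # always available, universal fallback
    ]


def resolve_backend(model_id: str, backends: Sequence[Backend]) -> str:
    """Return the name of the first backend, in registration order, that can serve model_id."""
    for backend in backends:
        if backend.can_serve(model_id):
            return backend.name
    raise LookupError(f"no backend can serve {model_id!r}")


if __name__ == "__main__":
    registry = build_backends(
        env={"GROQ_API_KEY": "example-key"},
        gpu_available=False,
        local_models=frozenset({"facebook/opt-125m"}),
    )
    print(resolve_backend("groq:llama-3.3-70b-versatile", registry))  # groq
    print(resolve_backend("gemini:gemini-2.5-flash", registry))  # mock (no GOOGLE_API_KEY)
    print(resolve_backend("facebook/opt-125m", registry))  # mock (no GPU)
```

Because Mock accepts every model_id, the loop always finds a backend in practice; the `LookupError` only guards against a registry built without the fallback.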
Cloud API Setup
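Per the model table, each cloud backend is enabled by a single environment variable: GROQ_API_KEY for Groq and GOOGLE_API_KEY for Gemini. A minimal preflight check in Python might look like the following; `cloud_backends_enabled` and `CLOUD_KEYS` are hypothetical names, and treating a whitespace-only value as unset is a choice made for this sketch.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

# Backend name -> environment variable from the model reference table.
CLOUD_KEYS = {"groq": "GROQ_API_KEY", "gemini": "GOOGLE_API_KEY"}


def cloud_backends_enabled(env: Optional[Mapping[str, str]] = None) -> dict[str, bool]:
    """Report which cloud backends have a non-empty API key in the environment.

    This only checks presence; it does not validate the key with the provider.
    """
    env = os.environ if env is None else env
    return {name: bool(env.get(var, "").strip()) for name, var in CLOUD_KEYS.items()}


if __name__ == "__main__":
    for name, enabled in cloud_backends_enabled().items():
        status = "key set" if enabled else f"{CLOUD_KEYS[name]} not set"
        print(f"{name}: {status}")
```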
Groq
Google Gemini
Local GPU Setup
Minimum Requirements
- NVIDIA GPU with 0.5GB+ VRAM
- CUDA 11.8+ or ROCm (AMD)
- Python 3.10+
Install vLLM
Verify
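Some of the minimum requirements above can be checked from Python without importing vLLM. The sketch below is a hypothetical helper (`local_gpu_preflight` is not part of Blindference); it does not measure VRAM or the CUDA/ROCm version, and it treats a vendor CLI (`nvidia-smi` or `rocm-smi`) on PATH only as a rough hint that a driver stack is installed.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def local_gpu_preflight() -> dict[str, bool]:
    """Cheap checks for the documented minimum requirements.

    VRAM size and CUDA/ROCm versions are not checked here.
    """
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        # find_spec locates the package without importing it (importing vllm can be slow).
        "vllm_installed": importlib.util.find_spec("vllm") is not None,
        # A vendor CLI on PATH is only a rough hint that an NVIDIA or AMD driver stack is present.
        "gpu_tool_on_path": any(shutil.which(tool) for tool in ("nvidia-smi", "rocm-smi")),
    }


if __name__ == "__main__":
    for check, ok in local_gpu_preflight().items():
        print(f"{'OK  ' if ok else 'FAIL'} {check}")
```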
Determinism
| Backend | Determinism Method |
|---|---|
| vLLM | temperature=0, seed=42, enforce_eager=True |
| Groq | temperature=0, seed=42 (API-native) |
| Gemini | temperature=0 (best-effort) |
| Mock | SHA-256 of (model_id, prompt) |
Prompts also receive a [seed_anchor:{hash}] prefix to reduce variance without restricting response creativity.
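The Mock row means the mock output is a pure function of the model ID and the prompt. Here is a minimal Python sketch of that idea. The exact byte encoding and output format Blindference uses are not documented, so the length-prefixing and the `[mock:...]` string below are assumptions, and `mock_response` is a hypothetical name.

```python
import hashlib


def mock_response(model_id: str, prompt: str) -> str:
    """Deterministic placeholder output derived from SHA-256 of (model_id, prompt).

    Assumption: each field is length-prefixed before hashing so that ("ab", "c")
    and ("a", "bc") hash differently; the real encoding is not documented.
    """
    digest = hashlib.sha256()
    for field in (model_id, prompt):
        data = field.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    # Hypothetical output format: identical inputs always yield identical text.
    return f"[mock:{model_id}] {digest.hexdigest()}"
```

Because the result depends only on its inputs, repeated runs, or runs on different nodes, return byte-identical output. This keeps tests reproducible without a GPU or API key.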