Supported Models

Blindference currently supports three inference models across cloud and local backends.

Model Reference

| Model ID | Provider | Backend | VRAM / Requirements | Tier | Speed | Quality |
|---|---|---|---|---|---|---|
| groq:llama-3.3-70b-versatile | Groq | Cloud API | GROQ_API_KEY env var | 0 | Fast | High |
| gemini:gemini-2.5-flash | Google | Cloud API | GOOGLE_API_KEY env var | 0 | Fast | Medium-High |
| facebook/opt-125m | Local | vLLM | 0.5GB+ VRAM, vllm package | 0 | Variable | Dev/Testing |

Resolution Order

When a job arrives, the node queries backends in registration order:
  1. vLLM (if GPU available and model supported)
  2. Groq (if GROQ_API_KEY set)
  3. Gemini (if GOOGLE_API_KEY set)
  4. Mock (always available, universal fallback)
The first backend that (a) is available and (b) advertises the requested model_id wins.
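The selection rule can be sketched in a few lines. This is a minimal illustration, not the node's actual code: the Backend class, its fields, and the resolve() helper are hypothetical, and it assumes Mock advertises every model_id (which is what makes it a universal fallback).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Backend:
    # Hypothetical stand-in for a registered backend; not the node's real class.
    name: str
    available: bool
    models: set[str] = field(default_factory=set)
    universal: bool = False  # assumed flag for a catch-all backend such as Mock

    def advertises(self, model_id: str) -> bool:
        return self.universal or model_id in self.models


def resolve(backends: list[Backend], model_id: str) -> Backend:
    """Return the first backend, in registration order, that is
    available and advertises model_id."""
    for backend in backends:
        if backend.available and backend.advertises(model_id):
            return backend
    raise LookupError(f"no backend can serve {model_id!r}")


# Registration order mirrors the list above; here no GPU is present.
registry = [
    Backend("vllm", available=False, models={"facebook/opt-125m"}),
    Backend("groq", available=True, models={"groq:llama-3.3-70b-versatile"}),
    Backend("gemini", available=True, models={"gemini:gemini-2.5-flash"}),
    Backend("mock", available=True, universal=True),
]

print(resolve(registry, "gemini:gemini-2.5-flash").name)  # gemini
print(resolve(registry, "facebook/opt-125m").name)        # mock
```

Because the scan stops at the first match, a request for facebook/opt-125m on a machine without a usable GPU falls through to Mock rather than failing.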

Cloud API Setup

Groq

export GROQ_API_KEY="gsk_..."
Get your key at console.groq.com.

Google Gemini

export GOOGLE_API_KEY="AI..."
Get your key at ai.google.dev.
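To see which cloud backends a node would treat as available, you can check the two environment variables above. The sketch below is illustrative only: the helper name is hypothetical, and treating an empty string as "not set" is an assumption about how the node reads these variables.

```python
import os

# Env var that enables each cloud backend, listed in resolution order.
CLOUD_BACKEND_KEYS = {
    "groq": "GROQ_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def enabled_cloud_backends(env=None) -> list:
    """List cloud backends whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in CLOUD_BACKEND_KEYS.items() if env.get(var)]


print(enabled_cloud_backends({"GOOGLE_API_KEY": "AI..."}))  # ['gemini']
```

Call it with no argument to inspect the real process environment.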

Local GPU Setup

Minimum Requirements

  • NVIDIA GPU with CUDA 11.8+, or AMD GPU with ROCm
  • 0.5GB+ VRAM
  • Python 3.10+

Install vLLM

pip install vllm
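Before running the verify command, a rough preflight check can catch the most common gaps. This is a hypothetical helper, not part of blindference-node. It confirms the Python version and that the vllm package is importable, and it only guesses at GPU drivers by looking for the vendor tools nvidia-smi or rocm-smi on PATH. It does not check the CUDA version or free VRAM.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def python_ok(version=None) -> bool:
    """True if the interpreter meets the Python 3.10+ requirement."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= (3, 10)


def preflight() -> dict[str, bool]:
    return {
        "python_3_10_plus": python_ok(),
        # Locates the package without importing it (importing vllm is slow).
        "vllm_installed": importlib.util.find_spec("vllm") is not None,
        # Heuristics only: a vendor CLI on PATH suggests a driver is present,
        # not that the GPU has enough VRAM.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "rocm_smi_on_path": shutil.which("rocm-smi") is not None,
    }


for check, passed in preflight().items():
    print(f"{check}: {'ok' if passed else 'missing'}")
```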

Verify

blindference-node models test --backend vllm --model facebook/opt-125m --prompt "Hello"

Determinism

| Backend | Determinism Method |
|---|---|
| vLLM | temperature=0, seed=42, enforce_eager=True |
| Groq | temperature=0, seed=42 (API-native) |
| Gemini | temperature=0 (best-effort) |
| Mock | SHA-256 of (model_id, prompt) |
All cloud backends prepend a lightweight [seed_anchor:{hash}] prefix to the prompt, which reduces run-to-run variance without restricting response creativity.
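The Mock row and the seed_anchor prefix both rely on hashing. The sketch below is a hedged illustration rather than the node's implementation. The byte encoding of (model_id, prompt) is an assumption (UTF-8 fields joined by a NUL separator so that ("ab", "c") and ("a", "bc") differ), and both the response format and deriving a 12-character anchor from the same digest are hypothetical, since the docs do not say what {hash} covers.

```python
import hashlib


def mock_digest(model_id: str, prompt: str) -> str:
    """SHA-256 over (model_id, prompt), using an assumed NUL-separated encoding."""
    payload = model_id.encode("utf-8") + b"\x00" + prompt.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mock_infer(model_id: str, prompt: str) -> str:
    # Hypothetical mock output: identical inputs always give identical text.
    return f"mock-response:{mock_digest(model_id, prompt)[:16]}"


def with_seed_anchor(model_id: str, prompt: str, length: int = 12) -> str:
    # Hypothetical: the anchor is a truncated digest of the same inputs.
    anchor = mock_digest(model_id, prompt)[:length]
    return f"[seed_anchor:{anchor}] {prompt}"


first = mock_infer("facebook/opt-125m", "Hello")
assert first == mock_infer("facebook/opt-125m", "Hello")  # deterministic
print(with_seed_anchor("groq:llama-3.3-70b-versatile", "Hello"))
```

Hashing the inputs gives Mock exact reproducibility, whereas the cloud backends can only reduce variance, which is why Gemini is listed as best-effort.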

Adding Custom Models

See Model Backends for the full pluggable backend system.