
Vapi.ai Integration

Vapi.ai is a voice AI orchestration platform for building real-time phone and web call agents. This guide shows how to use VersionR to version and manage the system prompts powering your Vapi assistants.

This integration is particularly relevant for the VersionR validation app, an AI interview platform built on Vapi, ElevenLabs, and Deepgram.


Why manage Vapi prompts in VersionR?

Vapi assistants are configured with a system prompt that defines the AI’s behaviour during a call. Without VersionR:

  • The prompt lives in Vapi’s dashboard or is hardcoded in your createAssistant payload
  • You can’t diff changes or roll back a broken prompt
  • You can’t A/B test different interview styles
  • You have no visibility into what changed when call quality drops

VersionR gives you versioning, environments, and logs for your voice AI prompts.


Installation

npm install versionr

Vapi’s SDK or REST API is used separately to create and manage calls.


Pattern

The integration is straightforward: fetch the prompt from VersionR before creating the Vapi assistant, then inject it as the system prompt.

Fetch your prompt

import { VersionR } from 'versionr'
 
const pd = new VersionR({ apiKey: process.env.VERSIONR_API_KEY })
 
const rendered = await pd.render('interviewer-system', {
  env: 'production',
  variables: {
    candidate_name: call.candidateName,
    job_role: call.jobRole,
    difficulty: call.difficulty,
  },
})

Create the Vapi assistant

const assistant = {
  model: {
    provider: 'openai',
    model: 'gpt-4o',
    systemPrompt: rendered.content,   // ← inject here
    temperature: rendered.temperature,
  },
  voice: {
    provider: 'elevenlabs',
    voiceId: process.env.ELEVENLABS_VOICE_ID,
  },
  firstMessage: `Hi ${call.candidateName}, I'm ready to begin. Can you start by telling me about yourself?`,
}

Start the call

const response = await fetch('https://api.vapi.ai/call', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    assistant,
    customer: { number: call.phoneNumber },
  }),
})
 
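// Surface HTTP errors instead of treating an error body as a call object
if (!response.ok) {
  throw new Error(`Vapi call creation failed with status ${response.status}`)
}

const vapiCall = await response.json()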

Log the result

After the call ends (via Vapi webhook):

await pd.log({
  promptId: rendered.promptId,
  promptSlug: rendered.promptSlug,
  versionId: rendered.versionId,
  versionNumber: rendered.versionNumber,
  environment: 'production',
  renderedPrompt: rendered.content,
  output: callTranscript,
  latencyMs: call.durationMs,
  metadata: {
    vapi_call_id: vapiCall.id,
    candidate_id: call.candidateId,
  },
})
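
The log call above needs the transcript and call ID from the webhook. The following is a minimal sketch of pulling them out of the request body, assuming Vapi sends an end-of-call message shaped like `{ message: { type: 'end-of-call-report', call: { id }, transcript, durationSeconds } }`. The field names and the `extractCallResult` helper are assumptions for illustration, so check them against Vapi's webhook reference before relying on them.

```typescript
// Assumed shape of the webhook body; verify field names against Vapi's docs.
interface VapiWebhookBody {
  message?: {
    type?: string
    call?: { id?: string }
    transcript?: string
    durationSeconds?: number
  }
}

interface CallResult {
  vapiCallId: string
  transcript: string
  durationMs: number | undefined
}

// Returns the fields needed for logging, or null when the payload
// is not a complete end-of-call event.
function extractCallResult(body: VapiWebhookBody): CallResult | null {
  const message = body.message
  if (!message || message.type !== 'end-of-call-report') return null

  const vapiCallId = message.call?.id
  if (!vapiCallId || typeof message.transcript !== 'string') return null

  return {
    vapiCallId,
    transcript: message.transcript,
    durationMs:
      typeof message.durationSeconds === 'number'
        ? Math.round(message.durationSeconds * 1000)
        : undefined,
  }
}
```

Because the webhook arrives in a separate request from the one that created the call, the version metadata returned by the render step would need to be stored, keyed by the Vapi call ID, so it can be looked up when the log entry is written.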

Caching for low-latency call starts

Voice calls are latency-sensitive. The first SDK call fetches from the VersionR API; subsequent calls for the same prompt are served from the in-memory cache, typically in under a millisecond.

In serverless environments, where the in-memory cache starts empty on every cold start, consider warming it when the instance starts:

// Called once at server startup or in a warm-up route
export async function warmPromptCache() {
  await pd.get('interviewer-system', { env: 'production' })
}
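
To make the cold and warm paths concrete, here is a minimal sketch of an in-memory cache with a time-to-live. It is not VersionR's implementation; `createPromptCache`, `ttlMs`, and the injected `fetcher` are hypothetical names used only to illustrate the behaviour described above.

```typescript
type Fetcher<T> = (slug: string) => Promise<T>

// Wraps a fetcher so repeated lookups of the same slug within `ttlMs`
// are answered from memory. `now` is injectable to make expiry testable.
function createPromptCache<T>(
  fetcher: Fetcher<T>,
  ttlMs: number,
  now: () => number = Date.now,
) {
  const entries = new Map<string, { value: T; expiresAt: number }>()

  return async function get(slug: string): Promise<T> {
    const hit = entries.get(slug)
    if (hit && hit.expiresAt > now()) {
      return hit.value // warm path: no network round trip
    }
    const value = await fetcher(slug) // cold path: one fetch, then cache
    entries.set(slug, { value, expiresAt: now() + ttlMs })
    return value
  }
}
```

A fresh serverless instance begins with an empty map, which is why the first call on that instance always takes the cold path unless the cache is warmed beforehand.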

Fallback prompt

VersionR may be unavailable, and a call start should never fail because of it. Wrap the render call and fall back to a hardcoded prompt:

const FALLBACK_SYSTEM_PROMPT = `You are an experienced technical interviewer conducting a professional job interview. Ask clear, relevant questions and evaluate responses objectively.`
 
let systemPrompt: string
 
try {
  const rendered = await pd.render('interviewer-system', {
    env: 'production',
    variables: { candidate_name: call.candidateName, job_role: call.jobRole, difficulty: call.difficulty },
  })
  systemPrompt = rendered.content
} catch {
  systemPrompt = FALLBACK_SYSTEM_PROMPT
}
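
The try/catch above covers errors, but a slow response can delay a call start just as much as a failed one. The following sketch adds a time limit using `Promise.race`. The `renderWithFallback` helper and its `timeoutMs` parameter are hypothetical, not part of the VersionR SDK, and the render step is passed in as a function so the helper makes no assumptions about the SDK's signature.

```typescript
// Resolves with the rendered prompt, or with `fallback` if rendering
// throws or takes longer than `timeoutMs`.
async function renderWithFallback(
  render: () => Promise<string>,
  fallback: string,
  timeoutMs: number,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<string>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs)
  })

  try {
    // Whichever settles first wins: the live prompt or the timeout.
    return await Promise.race([render(), timeout])
  } catch {
    return fallback
  } finally {
    if (timer !== undefined) clearTimeout(timer)
  }
}

// Example usage (assumes `pd`, `call`, and FALLBACK_SYSTEM_PROMPT from this guide):
// const systemPrompt = await renderWithFallback(
//   () => pd.render('interviewer-system', { env: 'production', variables: { /* ... */ } })
//     .then((r) => r.content),
//   FALLBACK_SYSTEM_PROMPT,
//   500,
// )
```

The time limit is a trade-off: a shorter value protects call start latency, while a longer one reduces how often calls run on the generic fallback prompt.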
