Complete Guide to ElevenLabs Integration for Voice AI
Step-by-step tutorial on integrating ElevenLabs' advanced voice synthesis technology with your AI calling platform.
Alex Kim
Senior Developer

Prerequisites
Before we begin, ensure you have:
- An active ElevenLabs account with an API key.
- An AI calling platform or custom Python/Node.js backend.
- Basic knowledge of REST APIs.
Step 1: Get Your API Key
Navigate to your ElevenLabs profile settings and generate a new API key. Keep this secure; it allows access to your voice cloning and synthesis quota.
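Rather than hard-coding the key, read it from an environment variable. A minimal sketch (the variable name `ELEVENLABS_API_KEY` is a convention used throughout this guide, not a requirement of the API):

```javascript
// Read the ElevenLabs API key from the environment and fail fast if missing.
// Passing `env` as a parameter keeps the helper easy to test.
function getApiKey(env = process.env) {
  const key = env.ELEVENLABS_API_KEY;
  if (!key) {
    throw new Error('ELEVENLABS_API_KEY is not set');
  }
  return key;
}
```

This way the key never ends up in source control, and a missing configuration surfaces immediately at startup instead of as a 401 mid-call.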
Step 2: Choose a Voice
List available voices using the API:

```bash
curl -X GET 'https://api.elevenlabs.io/v1/voices' \
  -H 'xi-api-key: YOUR_API_KEY'
```
Copy the voice_id of the voice you want to use.
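If you are scripting this from a Node.js backend instead of curl, the same call can be sketched as follows (the response shape, `{ voices: [{ voice_id, name, ... }] }`, matches the `/v1/voices` endpoint at the time of writing):

```javascript
// Reduce the raw voice-list payload to name → voice_id pairs.
function pickVoices(payload) {
  return payload.voices.map(({ name, voice_id }) => ({ name, voice_id }));
}

// Fetch all voices available to your account (requires Node 18+ for global fetch).
async function listVoices(apiKey) {
  const res = await fetch('https://api.elevenlabs.io/v1/voices', {
    headers: { 'xi-api-key': apiKey }
  });
  if (!res.ok) throw new Error(`Voice list request failed: ${res.status}`);
  return pickVoices(await res.json());
}
```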
Step 3: Text-to-Speech Request
To generate audio, send a POST request with your text and voice ID.
```javascript
// voiceId is the voice_id you copied in Step 2.
const response = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
  method: 'POST',
  headers: {
    'xi-api-key': process.env.ELEVENLABS_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Hello, this is an AI voice call.',
    model_id: 'eleven_monolingual_v1',
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.5
    }
  })
});

// The response body is audio (MP3 by default); buffer it for playback or storage.
const audio = Buffer.from(await response.arrayBuffer());
```
Optimizing for Latency
For real-time AI calling, latency is king. Use the streaming variant of the endpoint (append `/stream` to the text-to-speech path) to receive audio chunks as they are generated, rather than waiting for the entire file.
> [!TIP]
> Ensure you use a WebSocket connection for streaming audio to your telephony provider (like Twilio or Vapi) to minimize delay.
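For the plain HTTP streaming variant, a sketch under the following assumptions: Node 18+, where `fetch` response bodies are async-iterable streams, and the `optimize_streaming_latency` query parameter as documented by ElevenLabs at the time of writing (it may change):

```javascript
// Build the streaming TTS URL for a given voice.
function ttsStreamUrl(voiceId, latency = 3) {
  return `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream` +
         `?optimize_streaming_latency=${latency}`;
}

// Request speech and hand each audio chunk to a callback as it arrives,
// e.g. to forward it to your telephony provider.
async function streamSpeech(apiKey, voiceId, text, onChunk) {
  const res = await fetch(ttsStreamUrl(voiceId), {
    method: 'POST',
    headers: { 'xi-api-key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, model_id: 'eleven_monolingual_v1' })
  });
  if (!res.ok) throw new Error(`TTS stream failed: ${res.status}`);
  for await (const chunk of res.body) {
    onChunk(chunk); // chunk is a Uint8Array of audio data
  }
}
```

Because chunks are forwarded as soon as they arrive, the caller hears the first audio within a few hundred milliseconds instead of waiting for the whole clip to render.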
Conclusion
Integrating ElevenLabs is straightforward and provides immediate quality improvements. Experiment with stability and style settings to find the perfect tone for your use case.