Serverless Voice Evolution

Lambda Voice AI

Serverless functions are a cornerstone of modern web architecture, popularized by platforms like AWS Lambda and Firebase Functions. However, a dedicated, event-driven serverless runtime for building real-time Voice & Vision AI applications has been conspicuously missing.

Lambda Voice AI closes this gap by letting you deploy lightweight JavaScript snippets that execute inline alongside the RTC Proxy infrastructure. Respond to media events, trigger external APIs, and compose multiple AI models with zero server management.


Runtime Architecture & Event Pipeline

[Diagram: JS Lambda Sandbox (setup(connection)), User Client (WebRTC Media Stream), RTC Proxy (Media & Event Broker), AI Engine & Models (Dynamic Execution)]

Why Lambda Voice AI?

Moving logic to the network edge simplifies clients, reduces latency, and enables complex agent architectures.

Service Orchestration

Coordinate multiple APIs, cloud services, and AI backends sequentially. Listen to transcription events and conditionally dispatch messages or invoke stateful workflows based on conversation progress.
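
A minimal sketch of this pattern, reusing the `connection.add_model`, `on('input_transcription', ...)`, and `connection.send_data` calls that appear in the examples on this page. The `nextStage` helper, the stage names, and the keyword rules are hypothetical and chosen only for illustration; they are not part of live-proxy.

```javascript
// Hypothetical helper: decide the next workflow stage from the current
// stage and what the user just said. Unrecognized input keeps the stage.
function nextStage(stage, text) {
    const said = text.toLowerCase();
    if (stage === "greeting") return "collect_order";
    if (stage === "collect_order" && said.includes("that's all")) return "confirm";
    if (stage === "confirm" && /\byes\b/.test(said)) return "done";
    if (stage === "confirm" && /\bno\b/.test(said)) return "collect_order";
    return stage;
}

function setup(connection) {
    const llm = connection.add_model("gemini");
    let stage = "greeting"; // per-connection workflow state

    llm.on("input_transcription", (transcription) => {
        const previous = stage;
        stage = nextStage(stage, transcription.text);
        // Notify the client only when the workflow actually advances.
        if (stage !== previous) {
            connection.send_data({ stage: stage });
        }
    });
}
```

Keeping the transition logic in a pure function makes it easy to test outside the runtime; the event handler only wires it to the connection.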

Custom Tools & Memory

Add dynamic tools like weather checkers, search APIs, or database queries. Retrieve user history on session initialization and push context vectors directly into the LLM's active prompt system.
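
A rough sketch of a memory-backed tool, following the `addTool` shape used in Example 1. Everything specific here is an assumption: the `USER_NOTES` store, the `buildContext` helper, the `connection.user_id` field, and the idea that calling `send()` on the main model primes it with context are all hypothetical stand-ins for whatever storage and session metadata a real deployment provides.

```javascript
// Hypothetical in-memory store standing in for a real database.
const USER_NOTES = new Map([
    ["alice", ["prefers metric units", "lives in Oslo"]],
]);

// Build a short context string from stored notes; empty string if none exist.
function buildContext(userId) {
    const notes = USER_NOTES.get(userId) || [];
    return notes.length ? "Known facts about the user: " + notes.join("; ") : "";
}

function setup(connection) {
    const llm = connection.add_model("gemini");

    // Assumption: a user id is exposed on the connection; the field name is made up.
    const userId = connection.user_id || "alice";

    // Assumption: send() on the main model accepts priming text,
    // as it does for local_llm in Example 1.
    const context = buildContext(userId);
    if (context) {
        llm.send(context);
    }

    // Tool that lets the model store a new fact for later sessions.
    llm.addTool({
        name: "remember",
        description: "stores a short fact about the current user",
        parameters: [{ name: "fact", type: "string" }],
        callback: async (fact) => {
            const notes = USER_NOTES.get(userId) || [];
            notes.push(fact);
            USER_NOTES.set(userId, notes);
            return { stored: true, count: notes.length };
        },
    });
}
```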

Inline Guardrails

Inspect raw user transcription inputs or model voice outputs. Block inappropriate requests, run local safety-checker models, or shut down active WebRTC streams if policies are breached.

Initial Support Merged in live-proxy

The scripting environment, connection event model, and isolated execution mechanisms are officially integrated as part of the core live-proxy repository. Pull the latest code to start running lambda functions today!


Example 1: Voice & Custom Guardrails

A custom external weather tool combined with local safety checks on user and model speech.

javascript / setup.js
function setup(connection) {
    const llm = connection.add_model("gemini");
    const local_llm = connection.add_model("local_llm");

    llm.addTool({
        name: 'weather',
        description: 'retrieves weather for any location city or address',
        parameters: [
            {
                name: 'location',
                type: 'string',
            }
        ],
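        callback: async (location) => {
            // Note: geo/1.0/direct is OpenWeatherMap's geocoding endpoint; it resolves a location name to coordinates.
            // URL-encode the location so spaces and commas in addresses do not break the query string.
            const response = await fetch(`http://api.openweathermap.org/geo/1.0/direct?q=${encodeURIComponent(location)}&appid=1c4ae371d89ee81520eac02916af0e97`);
            return await response.json();
        }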
    });

    // Close the session when the local safety model answers NO.
    local_llm.on('response', (response) => {
        console.log("Response: ", response);
        if (response.text === "NO") {
            connection.close();
        }
    });

    llm.on('input_transcription', (transcription) => {
        local_llm.send("Is this request safe? Respond with only YES or NO: " + transcription.text);

        connection.send_data({ input: transcription.text });
    });

    llm.on('output_transcription', (transcription) => {
        local_llm.send("Is this response safe? Respond with only YES or NO: " + transcription.text);

        connection.send_data({ output: transcription.text });
    });
}
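
The guardrail above compares the local model's reply to the exact string "NO", so a reply such as "No." or " NO\n" would pass unnoticed. One possible way to make the check more tolerant is sketched below. The helpers `parseVerdict`, `shouldClose`, and `attachGuard` are hypothetical names, and closing the session on any reply that is not clearly "YES" (failing closed) is a design choice rather than something the runtime prescribes.

```javascript
// Hypothetical helper: map a free-form model reply to "safe", "unsafe", or "unknown".
// Only the first word is inspected, ignoring case, whitespace, and punctuation.
function parseVerdict(text) {
    if (typeof text !== "string") return "unknown";
    const found = text.trim().toUpperCase().match(/^[A-Z]+/);
    const word = found ? found[0] : "";
    if (word === "YES") return "safe";
    if (word === "NO") return "unsafe";
    return "unknown";
}

// Fail closed: anything that is not clearly safe ends the session.
function shouldClose(text) {
    return parseVerdict(text) !== "safe";
}

// Wire the check to a safety model, using the same 'response' event as Example 1.
function attachGuard(connection, safetyModel) {
    safetyModel.on("response", (response) => {
        if (shouldClose(response ? response.text : undefined)) {
            connection.close();
        }
    });
}
```

Inside `setup`, this would be used as `attachGuard(connection, local_llm)` in place of the inline `response` handler.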

Example 2: Vision & Context Gating

Lightweight YOLO models gating heavier face embeddings to dynamically update client overlay displays.

javascript / setup.js
function setup(connection) {
    const yolo = connection.add_model("yolo", { sampling: 25 });
    const inception = connection.add_model("inception", { sampling: 25 });

    // Start with YOLO enabled, inception input disabled by default (lightweight gating logic)
    inception.disable_input();

    // 1. YOLO event handler: enable inception only when person is detected
    const yolo_handler = function (objects) {
        if (objects && objects.indexOf("person") !== -1) {
            console.log("Person detected by YOLO! Enabling Inception face embedder.");
            inception.enable_input();
        } else {
            console.log("No person detected by YOLO. Keeping Inception disabled.");
            inception.disable_input();
            // Clear active text display overlay on UI
            sendDisplay(connection, "");
        }
    };

    // 2. Inception event handler: compare extracted embeddings against a database via cosine similarity
    // (match() and sendDisplay() are helpers assumed to be defined elsewhere in the script)
    const inception_handler = function (data) {
        const embedding = data ? data.embedding : null;
        if (!embedding || embedding.length === 0) {
            return;
        }

        console.log("Received face embedding vector of size: " + embedding.length);

        const bestMatch = match(embedding);
        sendDisplay(connection, bestMatch ? bestMatch.name : "Unknown Face");
    };

    // Register the event callbacks
    yolo.on("objects", yolo_handler);
    inception.on("faces", inception_handler);
}
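
Example 2 calls `match` and `sendDisplay` without showing them. The sketch below is one way they might look, not the project's actual implementation: the `KNOWN_FACES` array, the 0.6 threshold, and the `display` field name are all assumptions, while `connection.send_data` is the call used in Example 1.

```javascript
// Hypothetical database of known faces; real embeddings would be much longer.
const KNOWN_FACES = [
    { name: "Alice", embedding: [1, 0, 0] },
    { name: "Bob", embedding: [0, 1, 0] },
];

// Cosine similarity of two equal-length numeric vectors; 0 if either has zero norm.
function cosineSimilarity(a, b) {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best-matching known face, or null if nothing reaches the threshold.
function match(embedding, threshold = 0.6) {
    let best = null;
    let bestScore = threshold;
    for (const face of KNOWN_FACES) {
        if (face.embedding.length !== embedding.length) continue; // skip incompatible vectors
        const score = cosineSimilarity(embedding, face.embedding);
        if (score >= bestScore) {
            best = face;
            bestScore = score;
        }
    }
    return best;
}

// Push overlay text to the client; the "display" key is an assumed convention.
function sendDisplay(connection, text) {
    connection.send_data({ display: text });
}
```

The threshold trades false matches against missed ones and would need tuning against real embeddings.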