🤖 AI-Powered Hospital Booking Assistant
A comprehensive plan to build an intelligent voice agent for seamless hospital appointment management.
Project Brief
What is the Project?
This project aims to develop a sophisticated AI assistant capable of handling hospital appointment bookings via phone calls. The assistant will interact with patients using natural language, understand their requests to book or check appointments, find doctor availability, and record important notes. It will function as an autonomous receptionist, available 24/7 to improve efficiency and patient experience.
The Problem (What We Are Solving)
Hospital reception desks are often overwhelmed with high call volumes, leading to long wait times for patients and significant administrative burden on staff. This manual process is prone to human error, such as incorrect booking details or missed patient notes. Furthermore, reception hours are limited, preventing patients from managing their appointments at their convenience.
- High Operational Costs: Staffing a reception desk around the clock is expensive.
- Poor Patient Experience: Long hold times and limited availability lead to patient frustration.
- Inefficiency & Errors: Manual data entry is time-consuming and can result in booking mistakes.
- Lack of Accessibility: Patients cannot book or inquire about appointments outside of business hours.
The Solution (What We Are Building)
The solution is an AI-powered voice agent that integrates with the hospital's telephony and scheduling systems. The agent will leverage advanced AI to provide a seamless, conversational experience for patients, automating the entire booking process.
The core idea is to create a system that can understand a caller's intent, extract key information (patient name, doctor, date, notes), interact with a scheduling database, and respond intelligently, just like a human receptionist.
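As a sketch, the structured result the AI core hands back to the backend can be modeled as a small schema. The field names below are illustrative (they mirror the extraction prompt used later in this plan), not a fixed API:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative schema for the AI core's output; the field names are
# assumptions chosen to match the extraction prompt, not a fixed API.
@dataclass
class ParsedRequest:
    intent: str                           # e.g. "BOOK_APPOINTMENT" or "UNKNOWN"
    patient_name: Optional[str] = None
    doctor_name: Optional[str] = None
    requested_date: Optional[str] = None  # YYYY-MM-DD
    requested_time: Optional[str] = None  # HH:MM AM/PM
    notes: Optional[str] = None           # verbatim side notes from the caller

request = ParsedRequest(
    intent="BOOK_APPOINTMENT",
    patient_name="Jane Doe",
    doctor_name="Dr. Smith",
    notes="Requires wheelchair access.",
)
print(request.intent)  # BOOK_APPOINTMENT
```

Keeping the schema explicit makes it easy to validate the LLM's output before acting on it.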
Core Features
- Natural Language Conversation: Engage callers in fluid, human-like dialogue.
- Appointment Booking: Schedule new appointments based on doctor availability.
- Appointment Look-up & Cancellation: Allow patients to check or cancel existing appointments.
- Doctor Availability Queries: Inform patients about a specific doctor's open slots.
- Intelligent Note-Taking: Capture and save any relevant side notes or special requests from the patient (e.g., "Patient is allergic to penicillin," "Requires wheelchair access").
- 24/7 Availability: Function continuously without human supervision.
- Multi-language Support (Advanced): Cater to a diverse patient population.
Technologies & Tool Stack
| Component | Technology | Reasoning |
|---|---|---|
| Telephony | Twilio | Robust, developer-friendly API for managing programmable voice calls. |
| Speech-to-Text (STT) | Google Cloud Speech-to-Text / Deepgram | High accuracy for real-time audio streaming, essential for conversations. |
| AI Core (The Brain) | Google Gemini / OpenAI GPT-4o | Excellent at natural language understanding, intent recognition, and structured data extraction (names, dates, notes). |
| Text-to-Speech (TTS) | ElevenLabs / Google Cloud TTS | Provides natural, human-like voices for a better user experience. |
| Backend Framework | Python (FastAPI) | High-performance, modern, and ideal for building the API that connects all services; Python also has the most mature AI/ML ecosystem. |
| Database | PostgreSQL / MySQL | Stores appointment data, patient records, and logs. |
| Deployment | Docker, AWS/GCP | Containerization for consistency and a cloud platform for scalable, reliable hosting. |
Diagram: System Architecture
Building It: A Step-by-Step Guide
Step 1: Setup Backend Server & Twilio Webhook
First, create a basic FastAPI server. This server will have an endpoint that Twilio will call whenever someone dials your Twilio phone number.
```python
# main.py
from fastapi import FastAPI, Response
from twilio.twiml.voice_response import VoiceResponse, Gather

app = FastAPI()

@app.post("/api/v1/call/incoming", response_class=Response)
def handle_incoming_call():
    """Greets the caller and waits for them to speak."""
    response = VoiceResponse()
    # Use <Gather> to collect speech. Twilio transcribes it and sends it to the 'action' URL.
    gather = Gather(
        input='speech',
        action='/api/v1/call/process-speech',
        speech_timeout='auto',
        speech_model='experimental_conversations'  # A model tuned for conversational speech
    )
    gather.say(
        "Hello! You've reached the automated booking service. How can I help you today?",
        voice='Polly.Joanna'  # A natural-sounding voice
    )
    response.append(gather)
    # If the user doesn't say anything, redirect to the start.
    response.redirect('/api/v1/call/incoming')
    return Response(content=str(response), media_type="application/xml")
```
In your Twilio account, configure your phone number's "A CALL COMES IN" webhook to point to `https://your-server-url.com/api/v1/call/incoming`.
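Alternatively, the webhook can be set programmatically with Twilio's Python REST client. This is a sketch: the phone-number SID is a placeholder, and credentials are assumed to live in environment variables.

```python
import os
from twilio.rest import Client

# Credentials come from environment variables; never hard-code them.
client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

# "PNXXXX..." is a placeholder for your phone number's resource SID,
# visible in the Twilio Console.
client.incoming_phone_numbers("PNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX").update(
    voice_url="https://your-server-url.com/api/v1/call/incoming",
    voice_method="POST",
)
```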
Step 2: Process Speech with an LLM
Create the endpoint that Twilio calls after transcribing the user's speech. This is where you'll call your LLM (e.g., Gemini) to understand the intent and extract information.
```python
# main.py (continued)
from fastapi import Form  # Twilio posts form-encoded data (requires python-multipart)

# A hypothetical client for our LLM.
# In a real app, this would be the official google.generativeai or openai library.
class GeminiClient:
    def analyze_text(self, text: str):
        # This is a placeholder for the actual API call.
        # The prompt is key to getting structured output.
        prompt = f"""
        You are a hospital booking assistant. Analyze the user's request and extract the following in a JSON format:
        - intent: Must be one of ["BOOK_APPOINTMENT", "CHECK_AVAILABILITY", "CANCEL_APPOINTMENT", "UNKNOWN"].
        - patient_name: The full name of the patient.
        - doctor_name: The name of the doctor, if mentioned.
        - requested_date: The date requested, in YYYY-MM-DD format.
        - requested_time: The time requested, in HH:MM AM/PM format.
        - notes: Any extra information, verbatim.

        User's request: "{text}"

        If any information is missing, set its value to null.
        """
        print(f"Sending prompt to LLM: {prompt}")
        # --- MOCK RESPONSE ---
        if "book" in text.lower() and "dr. smith" in text.lower():
            return {
                "intent": "BOOK_APPOINTMENT",
                "patient_name": "Jane Doe",
                "doctor_name": "Dr. Smith",
                "requested_date": "2025-07-20",
                "requested_time": "10:00 AM",
                "notes": "Patient mentioned they are feeling dizzy."
            }
        return {"intent": "UNKNOWN", "notes": "Could not determine intent."}

gemini_client = GeminiClient()

@app.post("/api/v1/call/process-speech", response_class=Response)
def process_speech(SpeechResult: str = Form(...)):
    """Processes the transcribed speech to determine the user's intent."""
    response = VoiceResponse()
    # Get structured data from the LLM.
    parsed_data = gemini_client.analyze_text(SpeechResult)
    intent = parsed_data.get("intent")
    if intent == "BOOK_APPOINTMENT":
        # In a real app, you would now query your hospital DB:
        # db.is_slot_available(doctor, date, time)
        note_confirmation = ""
        if parsed_data.get("notes"):
            note_confirmation = f"I've also made a note that: {parsed_data['notes']}"
        confirmation_message = (
            f"Got it. I'm booking an appointment for {parsed_data['patient_name']} "
            f"with {parsed_data['doctor_name']} on {parsed_data['requested_date']} at {parsed_data['requested_time']}. "
            f"{note_confirmation} I will call you back shortly to confirm. Goodbye."
        )
        response.say(confirmation_message, voice='Polly.Joanna')
        response.hangup()
    elif intent == "UNKNOWN":
        response.say("I'm sorry, I didn't quite understand that. Could you please state your request again?", voice='Polly.Joanna')
        response.redirect('/api/v1/call/incoming')
    else:
        # Handle other intents (CHECK_AVAILABILITY, etc.).
        response.say("Handling for this request is not yet implemented. Goodbye.", voice='Polly.Joanna')
        response.hangup()
    return Response(content=str(response), media_type="application/xml")
```
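The mock client above returns a Python dict directly, but a real LLM returns text, often wrapped in a markdown code fence, so the response should be parsed defensively. A minimal sketch (the helper name and fallback behavior are assumptions, not part of any library):

```python
import json

# The intent set must match the prompt; never trust the model to stay inside it.
VALID_INTENTS = {"BOOK_APPOINTMENT", "CHECK_AVAILABILITY", "CANCEL_APPOINTMENT", "UNKNOWN"}

def parse_llm_response(raw: str) -> dict:
    """Extracts and validates the JSON object from an LLM's text reply."""
    text = raw.strip()
    # Strip a ```json ... ``` fence if the model added one.
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to a safe default instead of crashing mid-call.
        return {"intent": "UNKNOWN", "notes": "Unparseable LLM response."}
    if data.get("intent") not in VALID_INTENTS:
        data["intent"] = "UNKNOWN"
    return data

print(parse_llm_response('```json\n{"intent": "BOOK_APPOINTMENT"}\n```')["intent"])
# BOOK_APPOINTMENT
```

Validating the intent against a fixed set keeps a hallucinated intent from reaching your booking logic.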
Step 3: Integrate with the Hospital Database
The final piece is connecting to the actual scheduling system. You'll need to create functions that can query your hospital's database for doctor schedules and available slots.
```python
# database.py
# This is a mock database module. In reality, you'd use SQLAlchemy or a similar ORM.

# Mock data
DOCTOR_SCHEDULE = {
    "Dr. Smith": {
        "2025-07-20": ["09:00 AM", "10:00 AM", "11:00 AM"],
        "2025-07-21": ["02:00 PM", "03:00 PM"],
    }
}

def is_slot_available(doctor_name, date, time):
    """Checks if a specific time slot is available."""
    if doctor_name in DOCTOR_SCHEDULE:
        if date in DOCTOR_SCHEDULE[doctor_name]:
            return time in DOCTOR_SCHEDULE[doctor_name][date]
    return False

def book_slot(doctor_name, date, time, patient_name, notes):
    """Books the time slot and removes it from availability."""
    if is_slot_available(doctor_name, date, time):
        DOCTOR_SCHEDULE[doctor_name][date].remove(time)
        print(f"APPOINTMENT BOOKED: {patient_name} with {doctor_name} on {date} at {time}.")
        print(f"NOTES: {notes}")
        return True
    return False

# You would then call these functions from your `process_speech` endpoint.
```
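To see the booking flow end to end, here is a quick sanity check of the mock module (the schedule data and helpers are copied inline so the snippet is self-contained):

```python
# Inline copy of the mock schedule and helpers from database.py above.
DOCTOR_SCHEDULE = {
    "Dr. Smith": {
        "2025-07-20": ["09:00 AM", "10:00 AM", "11:00 AM"],
    }
}

def is_slot_available(doctor_name, date, time):
    return time in DOCTOR_SCHEDULE.get(doctor_name, {}).get(date, [])

def book_slot(doctor_name, date, time, patient_name, notes):
    if is_slot_available(doctor_name, date, time):
        DOCTOR_SCHEDULE[doctor_name][date].remove(time)
        return True
    return False

# A slot can be booked exactly once; the second attempt fails.
print(book_slot("Dr. Smith", "2025-07-20", "10:00 AM", "Jane Doe", "Dizzy"))  # True
print(book_slot("Dr. Smith", "2025-07-20", "10:00 AM", "John Roe", ""))       # False
```

Note that removing the slot inside `book_slot` is what prevents double-booking in this mock; a real database would enforce this with a transaction or unique constraint.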
This step-by-step guide provides the foundational code and logic. Building a production-ready system would involve adding robust error handling, security measures (like validating patient identity), and more sophisticated conversation management to handle multi-turn dialogues.
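For multi-turn dialogues, one common approach is to keep per-call state keyed by the `CallSid` parameter Twilio includes with every webhook, merging newly extracted fields into what earlier turns provided. A minimal in-memory sketch (all names are illustrative; a real deployment would use Redis or a database, since an in-process dict is lost on restart and does not work across workers):

```python
# In-memory conversation state, keyed by Twilio's CallSid.
SESSIONS: dict[str, dict] = {}

REQUIRED_FIELDS = ("patient_name", "doctor_name", "requested_date", "requested_time")

def update_session(call_sid: str, parsed: dict) -> dict:
    """Merges newly extracted, non-null fields into this call's state."""
    state = SESSIONS.setdefault(call_sid, {})
    for key, value in parsed.items():
        if value is not None:
            state[key] = value
    return state

def missing_fields(state: dict) -> list:
    """Returns the fields the assistant still needs to ask for."""
    return [f for f in REQUIRED_FIELDS if not state.get(f)]

# Turn 1: caller names a doctor but gives no date.
state = update_session("CA123", {"doctor_name": "Dr. Smith", "requested_date": None})
# Turn 2: caller supplies the remaining details.
state = update_session("CA123", {"patient_name": "Jane Doe",
                                 "requested_date": "2025-07-20",
                                 "requested_time": "10:00 AM"})
print(missing_fields(state))  # []
```

The assistant can then loop, prompting for `missing_fields` one at a time, until the booking request is complete.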