PantryPal is a voice-activated AI cooking assistant that runs locally on your laptop. It uses Web Speech API for hands-free voice commands and OpenAI GPT for recipe generation. It guides you through recipes step-by-step with spoken instructions, visual timers, and animated robot faces.

RyanMastropaolo/PantryPalApp

PantryPal - AI Recipe Assistant Web App

A voice-activated AI recipe recommender web application that runs locally on your laptop. It uses the Web Speech API for voice input and output and OpenAI for recipe generation.

Features

  • 🎤 Voice Interface: Uses your laptop's microphone and speakers via Web Speech API
  • 🤖 Animated Faces: Dark UI with expressive robot faces (^^, _, @@, o-o)
  • ⏱️ Visual Timer: LCD-style countdown timer for cooking steps
  • 🔊 Sound Effects: Plays startup, timer completion, and recipe completion sounds
  • 🍳 AI Recipe Generation: Creates recipes based on your ingredients or dish name
  • 🗣️ Text-to-Speech: PantryPal speaks instructions and guides you through each step
  • 📝 Step-by-Step Guidance: Voice-controlled navigation through recipe steps

How It Works

  1. Greeting: PantryPal introduces itself and asks for your ingredients
  2. Input Detection: Detects if you provided ingredients or a dish name
  3. Recipe Generation: Uses OpenAI GPT to create a personalized recipe
  4. Voice Navigation: Say "okay pal, next step" to move between steps
  5. Timer Support: Say "start timer" when a step requires timing
  6. Completion: PantryPal announces when the recipe is complete

Prerequisites

  • Python 3.12 or higher
  • OpenAI API key (created from your OpenAI platform account)
  • Modern web browser (Chrome, Safari, or Edge recommended)
  • Microphone and speakers (built into most laptops)
  • Platform Support:
    • Windows: Uses browser TTS (Windows voices like Microsoft Zira/David)
    • macOS: Uses browser TTS (macOS voices) with optional server-side TTS caching
    • Linux: Uses browser TTS (system voices)

Installation

macOS Installation

  1. Navigate to project directory:

    cd PantryPalApp
  2. Create virtual environment:

    python3.12 -m venv .venv
    source .venv/bin/activate

    If python3.12 is not found, try python3 or install Python 3.12+ from python.org

  3. Install dependencies:

    pip install -r requirements.txt
  4. Create .env file:

    echo "OPENAI_API_KEY=your_api_key_here" > .env

    Or manually create a .env file in the project root and add:

    OPENAI_API_KEY=your_actual_openai_api_key
    
  5. Run the application:

    python app.py
  6. Open in browser:

    • Navigate to http://localhost:5001 (the terminal may show the same address as http://127.0.0.1:5001)
    • Grant microphone permissions when prompted
    • Click anywhere on the page to enable audio (required for browser autoplay policy)

Windows Installation

  1. Navigate to project directory:

    cd PantryPalApp
  2. Create virtual environment:

    python -m venv .venv
    .venv\Scripts\activate

    If python is not found, try py or install Python 3.12+ from python.org

    • Make sure to check "Add Python to PATH" during installation
  3. Install dependencies:

    pip install -r requirements.txt
  4. Create .env file:

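    Option A - Using PowerShell:

    "OPENAI_API_KEY=your_api_key_here" | Out-File -FilePath .env -Encoding ascii

    ASCII encoding avoids the byte-order mark that Windows PowerShell 5.1 writes with -Encoding utf8, which some .env parsers may read as part of the first key.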

    Option B - Manual creation:

    • Create a new file named .env in the project root (you may need to enable "Show hidden files" in File Explorer)
    • Add the following line:
      OPENAI_API_KEY=your_actual_openai_api_key
      
  5. Run the application:

    python app.py
  6. Open in browser:

    • Navigate to http://localhost:5001 (the terminal may show the same address as http://127.0.0.1:5001)
    • Grant microphone permissions when prompted
    • Click anywhere on the page to enable audio (required for browser autoplay policy)

Configuration

Environment Variables

Create a .env file in the project root with:

OPENAI_API_KEY=your_openai_api_key_here
PORT=5001  # Optional: defaults to 5001

Port Configuration

The app runs on port 5001 by default. To change it:

  1. Set PORT environment variable:

    export PORT=8080  # macOS/Linux; Windows cmd: set PORT=8080; PowerShell: $env:PORT = "8080"
    python app.py
  2. Or modify app.py directly (line 646):

    port = int(os.environ.get('PORT', 5001))  # Change 5001 to your desired port
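
The environment lookup above can be combined with basic validation. The following is a minimal sketch, assuming configuration is read from the process environment; `load_config` and its error messages are hypothetical and are not taken from app.py:

```python
import os


def load_config(env=None):
    """Return (api_key, port) from an environment mapping.

    Hypothetical helper: app.py may structure this differently.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY not found in environment variables")
    raw_port = env.get("PORT", "5001")  # same default the README documents
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw_port!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return api_key, port
```

Failing early on a missing key or a malformed port surfaces configuration mistakes at startup rather than on the first recipe request.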

Running the Application

Development Mode

python app.py

This starts the Flask development server on http://localhost:5001.

Production Mode (Optional)

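For production use, you can run the app under a WSGI server such as gunicorn (gunicorn runs on Unix-like systems such as macOS and Linux, not natively on Windows):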

# Install gunicorn (already in requirements.txt)
pip install gunicorn

# Run with gunicorn
gunicorn -w 4 -b 0.0.0.0:5001 app:app

Note: The app is designed for local use. For production deployment, you would need to:

  • Set up HTTPS (browsers require a secure context for microphone access on any origin other than localhost)
  • Configure a proper web server
  • Handle CORS appropriately
  • Set up proper security measures

Usage

Voice Commands

  • "I have [ingredients]" - Provide ingredients you have (e.g., "I have eggs, cheese, bread")
  • "[dish name]" - Request a specific dish (e.g., "scrambled eggs")
  • "okay pal, next step" - Move to the next recipe step
  • "start timer" - Begin countdown for current step (if timer is required)
  • "next" - Shortcut for next step (when in recipe flow)
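
The commands above amount to a small phrase matcher applied to each speech transcript. In the app this matching presumably happens in the browser's JavaScript; the Python sketch below only illustrates the logic, and `classify_command` and its return labels are hypothetical:

```python
import re


def classify_command(transcript, in_recipe=False):
    """Map a speech transcript to a command label (hypothetical labels)."""
    # Normalize: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", "", transcript.lower())
    text = re.sub(r"\s+", " ", text).strip()
    if "start timer" in text:
        return "start_timer"
    if "okay pal next step" in text:
        return "next_step"
    # The bare "next" shortcut only applies while a recipe is in progress.
    if in_recipe and text == "next":
        return "next_step"
    if text.startswith("i have "):
        return "ingredients"
    return "dish_or_other"
```

Speech recognizers may transcribe the wake phrase differently (for example "OK pal"), so a real matcher would likely accept several spellings.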

Flow

  1. Start: PantryPal greets you and asks for ingredients
  2. Provide Input: Tell PantryPal what ingredients you have or what dish you want
  3. Time Available: Specify how much time you have (e.g., "30 minutes")
  4. Recipe Generation: PantryPal creates a recipe and lists ingredients
  5. Follow Steps: PantryPal guides you through each step with voice
  6. Timer: When a step requires timing, say "start timer" to begin countdown
  7. Next Step: Say "okay pal, next step" to continue
  8. Completion: PantryPal announces when the recipe is complete
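
The flow above can be read as a small state machine. This is a sketch under assumptions: the state names, event names, and the `advance` helper are illustrative and do not come from the app's source:

```python
from enum import Enum, auto


class State(Enum):
    GREETING = auto()
    AWAIT_INPUT = auto()
    AWAIT_TIME = auto()
    GENERATING = auto()
    IN_STEP = auto()
    TIMER_RUNNING = auto()
    DONE = auto()


# (state, event) -> next state; event names are illustrative assumptions.
TRANSITIONS = {
    (State.GREETING, "greeted"): State.AWAIT_INPUT,
    (State.AWAIT_INPUT, "input_received"): State.AWAIT_TIME,
    (State.AWAIT_TIME, "time_received"): State.GENERATING,
    (State.GENERATING, "recipe_ready"): State.IN_STEP,
    (State.IN_STEP, "start_timer"): State.TIMER_RUNNING,
    (State.TIMER_RUNNING, "timer_done"): State.IN_STEP,
    (State.IN_STEP, "next_step"): State.IN_STEP,
    (State.IN_STEP, "last_step_done"): State.DONE,
}


def advance(state, event):
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring events that do not apply to the current state means a stray "start timer" during the greeting has no effect.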

Project Structure

PantryPalApp/
├── app.py                # Flask backend server
├── templates/
│   └── index.html        # Frontend UI (HTML, CSS, JavaScript)
├── sounds/               # Sound files
│   ├── startup.wav       # Sound played on app start
│   ├── timer_done.wav    # Sound when timer completes
│   └── recipe_done.wav   # Sound when recipe completes
├── static/
│   └── tts_cache/        # Cached text-to-speech audio files (auto-generated)
├── requirements.txt      # Python dependencies
├── .env                  # Environment variables (create this file, not in git)
└── README.md             # This file

Technical Details

Backend (app.py)

  • Flask Server: Serves the web app and handles API requests
  • OpenAI Integration: Generates recipes using GPT-4o-mini
  • TTS Caching: Caches text-to-speech audio files (macOS only, optional)
  • Recipe Parsing: Extracts steps, timers, and ingredients from AI responses
  • Platform Support: Server-side TTS works on macOS; Windows/Linux use browser TTS
  • API Endpoints:
    • / - Main application page
    • /api/generate-recipe - Generate recipe from ingredients
    • /api/get-ingredients-list - Get ingredients for a dish name
    • /api/check-input-type - Detect if input is dish name or ingredients
    • /api/text-to-speech - Generate TTS audio (macOS only)
    • /sounds/<filename> - Serve sound files
    • /api/health - Health check endpoint
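
Recipe parsing is described above only at a high level. As one illustration, a step's timer could be pulled out of free text with a regular expression. This is a sketch under assumptions: the pattern, the function name `extract_timer_seconds`, and the supported units are hypothetical, and app.py may rely on a structured model response instead:

```python
import re

# Seconds per unit, keyed by the singular form of each supported unit.
_UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60, "hour": 3600, "hr": 3600}
_DURATION = re.compile(
    r"(\d+(?:\.\d+)?)\s*(seconds?|secs?|minutes?|mins?|hours?|hrs?)\b",
    re.IGNORECASE,
)


def extract_timer_seconds(step_text):
    """Return the total duration mentioned in a step, in seconds, or None."""
    total = 0.0
    found = False
    for amount, unit in _DURATION.findall(step_text):
        key = unit.lower().rstrip("s")  # "minutes" -> "minute", "hrs" -> "hr"
        total += float(amount) * _UNIT_SECONDS[key]
        found = True
    return int(round(total)) if found else None
```

Returning `None` when no duration is found lets the caller decide whether a step needs the "start timer" prompt at all.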

Frontend (templates/index.html)

  • Web Speech API: Voice recognition and text-to-speech
  • Continuous Listening: Microphone stays active throughout the session
  • LCD Display: Visual display showing current step, timer, or status
  • Robot Faces: Animated facial expressions based on app state
  • Timer Visualization: Real-time countdown display
  • Sound Playback: Plays audio files at appropriate moments
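
The countdown display itself lives in the frontend JavaScript. The formatting logic is simple enough to sketch in Python; `format_countdown` is a hypothetical name, and the MM:SS layout is an assumption about how the LCD shows time:

```python
def format_countdown(remaining_seconds):
    """Format a remaining duration as MM:SS, clamping negatives to zero."""
    remaining = max(0, int(remaining_seconds))
    minutes, seconds = divmod(remaining, 60)
    return f"{minutes:02d}:{seconds:02d}"
```

Clamping at zero keeps the display from showing a negative time if a tick arrives slightly late.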

Browser Compatibility

Recommended Browsers

  • Chrome/Edge (desktop & mobile) - Full support
  • Safari (iOS 14.5+, macOS) - Full support
  • ⚠️ Firefox - Limited Web Speech API support

Required Features

  • Web Speech API (Speech Recognition)
  • Web Speech API (Speech Synthesis)
  • Microphone access
  • Audio playback

Troubleshooting

App won't start

Problem: ModuleNotFoundError: No module named 'flask_cors'

Solution:

pip install -r requirements.txt

Problem: OPENAI_API_KEY not found in environment variables

Solution:

  • Create a .env file in the project root
  • Add: OPENAI_API_KEY=your_actual_api_key

Problem: Port already in use

Solution:

  • Change the port in .env: PORT=8080
  • Or kill the process using port 5001
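
To confirm that a port is actually occupied before changing it, a short probe can help. This is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper and is not part of the app:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(5001)` returning `True` before the app starts suggests another process already holds the default port.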

Voice recognition not working

Problem: Microphone not detected

Solutions:

  • Grant microphone permissions when prompted by the browser
  • Check browser settings: Chrome → Settings → Privacy and security → Site settings → Microphone
  • Ensure no other application is using the microphone
  • Try a different browser (Chrome/Safari recommended)

Problem: Microphone disconnects after inactivity

Solution: The app includes a keep-alive mechanism. If issues persist:

  • Refresh the page
  • Check browser console for errors
  • Ensure microphone permissions are still granted

Recipe generation fails

Problem: API errors

Solutions:

  • Verify your OpenAI API key is valid
  • Check your OpenAI account has credits/billing set up
  • Review browser console (F12) for error messages
  • Check Flask server logs in terminal

Problem: Network errors

Solutions:

  • Ensure you have internet connection
  • Check if OpenAI API is accessible
  • Verify firewall isn't blocking requests

TTS (Text-to-Speech) not working

Problem: No audio playback

Solutions:

  • Click anywhere on the page to enable audio (browser autoplay policy)
  • Check browser audio settings
  • Ensure speakers/headphones are connected and working
  • Check browser console for errors

Problem: Server-side TTS not working (macOS)

Solutions:

  • Ensure you're on macOS (server-side TTS is macOS-only)
  • Browser TTS will be used as fallback automatically
  • Check that the say command works by running say "test" in a terminal

Note for Windows/Linux users: The app uses browser TTS automatically on these platforms. Windows will use built-in voices (Microsoft Zira, Microsoft David, etc.) and Linux will use system voices. No additional configuration needed.
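
The platform rule described here (server-side TTS on macOS, browser TTS elsewhere) can be expressed as a small check. This is a sketch, assuming server-side TTS depends on the macOS `say` command; `server_tts_available` is a hypothetical name and app.py may detect the platform differently:

```python
import platform
import shutil


def server_tts_available(system=None, say_path=None):
    """True when running on macOS with the `say` command on PATH.

    Both arguments are overridable for testing; by default the current
    platform and PATH are inspected.
    """
    system = platform.system() if system is None else system
    if system != "Darwin":  # platform.system() reports macOS as "Darwin"
        return False
    say_path = shutil.which("say") if say_path is None else say_path
    return bool(say_path)
```

When this returns `False`, the browser's speech synthesis is the fallback, which matches the behavior described above.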

Sound files not playing

Problem: Sounds don't play

Solutions:

  • Verify sound files exist in sounds/ directory:
    • startup.wav
    • timer_done.wav
    • recipe_done.wav
  • Check browser console for 404 errors
  • Ensure audio is enabled in browser
  • Click anywhere on page to enable audio
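
The file check in the first bullet can be scripted. This is a minimal sketch, assuming it is run from (or pointed at) the project root; `missing_sounds` is a hypothetical helper:

```python
from pathlib import Path

# File names listed in this README's sounds/ directory.
EXPECTED_SOUNDS = ("startup.wav", "timer_done.wav", "recipe_done.wav")


def missing_sounds(project_root="."):
    """Return the expected sound files that are absent from sounds/."""
    sounds_dir = Path(project_root) / "sounds"
    return [name for name in EXPECTED_SOUNDS if not (sounds_dir / name).is_file()]
```

An empty list means all three files are present, so any remaining 404 errors point to the server route rather than to missing files.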

LCD display issues

Problem: Timer or text not showing

Solutions:

  • Check browser console for JavaScript errors
  • Ensure CSS is loading properly
  • Try hard refresh (Ctrl+Shift+R or Cmd+Shift+R)

Security Notes

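  • The app runs locally on your machine rather than on a hosted server
  • OpenAI API calls are made server-side, so the API key stays on your machine; the ingredient and dish text you provide is sent to OpenAI to generate recipes
  • Speech recognition goes through the browser's Web Speech API; some browsers, including Chrome, use a server-based recognition service, so audio may leave your machine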
  • No user data is stored or logged
  • .env file should never be committed to git (already in .gitignore)

Performance Tips

  • TTS Caching: Audio files are cached in static/tts_cache/ to avoid regeneration
  • Microphone Keep-Alive: The app maintains microphone connection to prevent timeouts
  • Browser Caching: Browser caches static files for faster loading
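
TTS caching works by mapping the same text to the same audio file. One common way to do that is to hash the text into a file name. This is a sketch under assumptions: the hashing scheme, the `.aiff` extension, and the `tts_cache_path` helper are hypothetical, and the app's actual cache naming is not documented here:

```python
import hashlib
from pathlib import Path


def tts_cache_path(text, cache_dir="static/tts_cache", voice="default", ext=".aiff"):
    """Return a deterministic cache file path for a piece of spoken text."""
    # Collapse whitespace so trivially different strings share one entry.
    normalized = " ".join(text.split())
    digest = hashlib.sha256(f"{voice}\n{normalized}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest[:16]}{ext}"
```

Including the voice name in the hash keeps audio generated with different voices from overwriting each other.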

Development

Making Changes

  1. Backend changes (app.py): Restart the Flask server
  2. Frontend changes (templates/index.html): Refresh browser (no restart needed)

Debugging

  • Browser Console: Press F12 to see JavaScript errors and logs
  • Flask Logs: Check terminal where python app.py is running
  • Network Tab: Use browser DevTools to inspect API requests
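
Alongside the tools above, the `/api/health` endpoint gives a quick way to confirm that the Flask server is reachable. This is a minimal sketch using only the standard library; the helper names are hypothetical, and because the response body's shape is not documented here, only the status code is returned:

```python
import urllib.error
import urllib.request


def health_url(port=5001, host="localhost"):
    """Build the health-check URL for a locally running server."""
    return f"http://{host}:{port}/api/health"


def check_health(port=5001, host="localhost", timeout=2.0):
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(health_url(port, host), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server responded, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, etc.
```

A `None` result points to a server that is not running or is on a different port, while an error status code points to a problem inside the app.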

Testing

  • Test voice commands in a quiet environment
  • Verify microphone permissions are granted
  • Test with different recipes and ingredient combinations
  • Check timer functionality with various durations

Notes

  • The app is designed for local use only on your laptop
  • Processing happens locally, through the OpenAI API, or through the browser's speech services
  • No external hosting or deployment is required
  • The app uses your laptop's default microphone and speakers
  • HTTPS is not required for local development (localhost is exempt)
  • The app runs on port 5001 by default
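  • Voice input and output are handled in the browser through the Web Speech API; some browsers, such as Chrome, use a server-based recognition service and need an internet connection for it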
  • Recipe generation uses OpenAI's API (requires internet connection)
  • Sound files are served from the sounds/ directory
  • TTS audio files are cached in static/tts_cache/ to avoid regeneration
