A voice-activated AI recipe recommender web application that runs locally on your laptop. Uses Web Speech API for voice input/output and OpenAI for recipe generation.
- 🎤 Voice Interface: Uses your laptop's microphone and speakers via Web Speech API
- 🤖 Animated Faces: Dark UI with expressive robot faces (^^, _, @@, o-o)
- ⏱️ Visual Timer: LCD-style countdown timer for cooking steps
- 🔊 Sound Effects: Plays startup, timer completion, and recipe completion sounds
- 🍳 AI Recipe Generation: Creates recipes based on your ingredients or dish name
- 🗣️ Text-to-Speech: PantryPal speaks instructions and guides you through each step
- 📝 Step-by-Step Guidance: Voice-controlled navigation through recipe steps
- Greeting: PantryPal introduces itself and asks for your ingredients
- Input Detection: Detects if you provided ingredients or a dish name
- Recipe Generation: Uses OpenAI GPT to create a personalized recipe
- Voice Navigation: Say "okay pal, next step" to move between steps
- Timer Support: Say "start timer" when a step requires timing
- Completion: PantryPal announces when the recipe is complete
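To make the "Input Detection" step concrete, here is a minimal Python illustration of the kind of classification the app performs when deciding whether you listed ingredients or named a dish. The keyword heuristic below is an assumption for demonstration only; the app actually delegates this decision to its `/api/check-input-type` endpoint.

```python
def classify_input(transcript: str) -> str:
    """Illustrative stand-in for the app's /api/check-input-type decision.

    The real check happens server-side; this heuristic is only a sketch.
    """
    text = transcript.lower().strip()
    # Phrases like "I have eggs, cheese, bread" strongly suggest an ingredient list.
    if text.startswith("i have") or "," in text:
        return "ingredients"
    # Otherwise treat short free-form input ("scrambled eggs") as a dish name.
    return "dish_name"


print(classify_input("I have eggs, cheese, bread"))  # -> ingredients
print(classify_input("scrambled eggs"))              # -> dish_name
```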
- Python 3.12 or higher
- OpenAI API key (get one at https://platform.openai.com)
- Modern web browser (Chrome, Safari, or Edge recommended)
- Microphone and speakers (built into most laptops)
- Platform Support:
- Windows: Uses browser TTS (Windows voices like Microsoft Zira/David)
- macOS: Uses browser TTS (macOS voices) with optional server-side TTS caching
- Linux: Uses browser TTS (system voices)
- Navigate to project directory:
  ```bash
  cd PantryPalApp
  ```
- Create virtual environment:
  ```bash
  python3.12 -m venv .venv
  source .venv/bin/activate
  ```
  If `python3.12` is not found, try `python3` or install Python 3.12+ from python.org
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Create `.env` file:
  ```bash
  echo "OPENAI_API_KEY=your_api_key_here" > .env
  ```
  Or manually create a `.env` file in the project root and add:
  ```
  OPENAI_API_KEY=your_actual_openai_api_key
  ```
- Run the application:
  ```bash
  python app.py
  ```
- Open in browser:
  - Navigate to `http://localhost:5001` (will look something like: http://127.0.0.1:5001)
  - Grant microphone permissions when prompted
  - Click anywhere on the page to enable audio (required for browser autoplay policy)
- Navigate to project directory:
  ```bash
  cd PantryPalApp
  ```
- Create virtual environment:
  ```bash
  python -m venv .venv
  .venv\Scripts\activate
  ```
  If `python` is not found, try `py` or install Python 3.12+ from python.org
  - Make sure to check "Add Python to PATH" during installation
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Create `.env` file:

  Option A - Using PowerShell:
  ```powershell
  "OPENAI_API_KEY=your_api_key_here" | Out-File -FilePath .env -Encoding utf8
  ```
  Option B - Manual creation:
  - Create a new file named `.env` in the project root (you may need to enable "Show hidden files" in File Explorer)
  - Add the following line:
    ```
    OPENAI_API_KEY=your_actual_openai_api_key
    ```
- Run the application:
  ```bash
  python app.py
  ```
- Open in browser:
  - Navigate to `http://localhost:5001` (will look something like: http://127.0.0.1:5001)
  - Grant microphone permissions when prompted
  - Click anywhere on the page to enable audio (required for browser autoplay policy)
Create a `.env` file in the project root with:
```
OPENAI_API_KEY=your_openai_api_key_here
PORT=5001  # Optional: defaults to 5001
```
The app runs on port 5001 by default. To change it:
- Set the `PORT` environment variable:
  ```bash
  export PORT=8080  # On Windows: set PORT=8080
  python app.py
  ```
- Or modify `app.py` directly (line 646):
  ```python
  port = int(os.environ.get('PORT', 5001))  # Change 5001 to your desired port
  ```
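The `.env` file is read into the environment at startup. A minimal sketch of how that loading typically looks with `python-dotenv` (assumed here because the app is configured through a `.env` file; the actual code in `app.py` may differ):

```python
import os

from dotenv import load_dotenv  # python-dotenv; assumed to be available

load_dotenv()  # reads .env from the project root into os.environ

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
PORT = int(os.environ.get("PORT", 5001))  # falls back to 5001 when PORT is unset

if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY not found in environment variables")
```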
```bash
python app.py
```
This starts the Flask development server on http://localhost:5001.
For production use, you can use a WSGI server like gunicorn:
```bash
# Install gunicorn (already in requirements.txt)
pip install gunicorn

# Run with gunicorn
gunicorn -w 4 -b 0.0.0.0:5001 app:app
```
Note: The app is designed for local use. For production deployment, you would need to:
- Set up HTTPS (required for microphone access)
- Configure a proper web server
- Handle CORS appropriately
- Set up proper security measures
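For a quick local HTTPS test (for example, to exercise microphone access from a device other than localhost), Flask can serve a self-signed certificate. This is only a sketch, not part of the app as shipped; it assumes `pyopenssl` is installed, and a real deployment should still sit behind a proper web server.

```python
# Sketch only: run the Flask app over HTTPS with an ad-hoc self-signed certificate.
# Requires: pip install pyopenssl
from app import app  # app.py exposes the Flask instance as `app` (see the gunicorn target app:app)

if __name__ == "__main__":
    # Browsers will warn about the self-signed certificate; accept it for testing only.
    app.run(host="0.0.0.0", port=5001, ssl_context="adhoc")
```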
- "I have [ingredients]" - Provide ingredients you have (e.g., "I have eggs, cheese, bread")
- "[dish name]" - Request a specific dish (e.g., "scrambled eggs")
- "okay pal, next step" - Move to the next recipe step
- "start timer" - Begin countdown for current step (if timer is required)
- "next" - Shortcut for next step (when in recipe flow)
- Start: PantryPal greets you and asks for ingredients
- Provide Input: Tell PantryPal what ingredients you have or what dish you want
- Time Available: Specify how much time you have (e.g., "30 minutes")
- Recipe Generation: PantryPal creates a recipe and lists ingredients
- Follow Steps: PantryPal guides you through each step with voice
- Timer: When a step requires timing, say "start timer" to begin countdown
- Next Step: Say "okay pal, next step" to continue
- Completion: PantryPal announces when the recipe is complete
PantryPalApp/
├── app.py # Flask backend server
├── templates/
│ └── index.html # Frontend UI (HTML, CSS, JavaScript)
├── sounds/ # Sound files
│ ├── startup.wav # Sound played on app start
│ ├── timer_done.wav # Sound when timer completes
│ └── recipe_done.wav # Sound when recipe completes
├── static/
│ └── tts_cache/ # Cached text-to-speech audio files (auto-generated)
├── requirements.txt # Python dependencies
├── .env # Environment variables (create this file, not in git)
└── README.md # This file
- Flask Server: Serves the web app and handles API requests
- OpenAI Integration: Generates recipes using GPT-4o-mini
- TTS Caching: Caches text-to-speech audio files (macOS only, optional)
- Recipe Parsing: Extracts steps, timers, and ingredients from AI responses
- Platform Support: Server-side TTS works on macOS; Windows/Linux use browser TTS
- API Endpoints:
  - `/` - Main application page
  - `/api/generate-recipe` - Generate recipe from ingredients
  - `/api/get-ingredients-list` - Get ingredients for a dish name
  - `/api/check-input-type` - Detect if input is dish name or ingredients
  - `/api/text-to-speech` - Generate TTS audio (macOS only)
  - `/sounds/<filename>` - Serve sound files
  - `/api/health` - Health check endpoint
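As a rough sketch of how the backend ties these pieces together, the route below shows a Flask endpoint that asks GPT-4o-mini for a recipe. The request/response fields, prompt wording, and parsing hints are assumptions for illustration; the real `/api/generate-recipe` handler in `app.py` differs in its details.

```python
# Sketch: a Flask endpoint that generates a recipe with GPT-4o-mini.
# Field names and prompt wording are illustrative, not the app's actual contract.
from flask import Flask, jsonify, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment


@app.route("/api/generate-recipe", methods=["POST"])
def generate_recipe():
    data = request.get_json(force=True)
    ingredients = data.get("ingredients", "")
    time_available = data.get("time_available", "30 minutes")

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are PantryPal, a cooking assistant. "
                        "Return a recipe as short numbered steps; mark timed steps "
                        "with a duration like [timer: 5 minutes]."},
            {"role": "user",
             "content": f"I have {ingredients} and {time_available}. Suggest a recipe."},
        ],
    )
    recipe_text = completion.choices[0].message.content
    # The real app also parses out steps, timers, and an ingredient list here.
    return jsonify({"recipe": recipe_text})
```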
- Web Speech API: Voice recognition and text-to-speech
- Continuous Listening: Microphone stays active throughout the session
- LCD Display: Visual display showing current step, timer, or status
- Robot Faces: Animated facial expressions based on app state
- Timer Visualization: Real-time countdown display
- Sound Playback: Plays audio files at appropriate moments
- ✅ Chrome/Edge (desktop & mobile) - Full support
- ✅ Safari (iOS 14.5+, macOS) - Full support
- ⚠️ Firefox - Limited Web Speech API support
- Web Speech API (Speech Recognition)
- Web Speech API (Speech Synthesis)
- Microphone access
- Audio playback
Problem: ModuleNotFoundError: No module named 'flask_cors'
Solution:
```bash
pip install -r requirements.txt
```
Problem: OPENAI_API_KEY not found in environment variables
Solution:
- Create a `.env` file in the project root
- Add: `OPENAI_API_KEY=your_actual_api_key`
Problem: Port already in use
Solution:
- Change the port in `.env`: `PORT=8080`
- Or kill the process using port 5001
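If you are unsure whether something is already listening on the port, a quick standard-library check like the one below can tell you (5001 is just the app's default):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex((host, port)) == 0


print(port_in_use(5001))  # True means the port is taken; pick another or stop that process
```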
Problem: Microphone not detected
Solutions:
- Grant microphone permissions when prompted by the browser
- Check browser settings: Chrome → Settings → Privacy → Microphone
- Ensure no other application is using the microphone
- Try a different browser (Chrome/Safari recommended)
Problem: Microphone disconnects after inactivity
Solution: The app includes a keep-alive mechanism. If issues persist:
- Refresh the page
- Check browser console for errors
- Ensure microphone permissions are still granted
Problem: API errors
Solutions:
- Verify your OpenAI API key is valid
- Check your OpenAI account has credits/billing set up
- Review browser console (F12) for error messages
- Check Flask server logs in terminal
Problem: Network errors
Solutions:
- Ensure you have internet connection
- Check if OpenAI API is accessible
- Verify firewall isn't blocking requests
Problem: No audio playback
Solutions:
- Click anywhere on the page to enable audio (browser autoplay policy)
- Check browser audio settings
- Ensure speakers/headphones are connected and working
- Check browser console for errors
Problem: Server-side TTS not working (macOS)
Solutions:
- Ensure you're on macOS (server-side TTS is macOS-only)
- Browser TTS will be used as fallback automatically
- Check that the `say` command works: `say "test"` in terminal
Note for Windows/Linux users: The app uses browser TTS automatically on these platforms. Windows will use built-in voices (Microsoft Zira, Microsoft David, etc.) and Linux will use system voices. No additional configuration needed.
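For reference, server-side TTS on macOS can be implemented by shelling out to the built-in `say` command and caching the result, roughly as sketched below. The cache layout and function name are assumptions; the actual implementation in `app.py` may differ.

```python
# Sketch: macOS-only text-to-speech with file caching via the built-in `say` command.
# Cache path and naming scheme are illustrative assumptions.
import hashlib
import platform
import subprocess
from pathlib import Path

CACHE_DIR = Path("static/tts_cache")


def synthesize(text: str) -> Path | None:
    """Return a cached audio file for `text`, generating it with `say` on macOS."""
    if platform.system() != "Darwin":
        return None  # Windows/Linux fall back to browser TTS

    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    out_file = CACHE_DIR / f"{hashlib.md5(text.encode()).hexdigest()}.aiff"
    if not out_file.exists():
        # `say -o FILE TEXT` writes the spoken audio to FILE (AIFF by default).
        subprocess.run(["say", "-o", str(out_file), text], check=True)
    return out_file
```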
Problem: Sounds don't play
Solutions:
- Verify sound files exist in the `sounds/` directory: `startup.wav`, `timer_done.wav`, `recipe_done.wav`
- Check browser console for 404 errors
- Ensure audio is enabled in browser
- Click anywhere on page to enable audio
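If the files exist but the browser still reports 404s, it helps to know how they are served: a `/sounds/<filename>` route hands the files to the browser, along the lines of the sketch below (the exact handler in `app.py` may differ).

```python
# Sketch: how a /sounds/<filename> route typically serves the wav files.
from flask import Flask, send_from_directory

app = Flask(__name__)


@app.route("/sounds/<path:filename>")
def serve_sound(filename):
    # Looks for the file in the sounds/ directory next to app.py;
    # a missing file produces the 404 you would see in the browser console.
    return send_from_directory("sounds", filename)
```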
Problem: Timer or text not showing
Solutions:
- Check browser console for JavaScript errors
- Ensure CSS is loading properly
- Try hard refresh (Ctrl+Shift+R or Cmd+Shift+R)
- The app runs locally and doesn't expose your data externally
- OpenAI API calls are made server-side (API key stays on your machine)
- Voice data is processed in the browser (not sent to external servers except OpenAI)
- No user data is stored or logged
- The `.env` file should never be committed to git (already in `.gitignore`)
- TTS Caching: Audio files are cached in `static/tts_cache/` to avoid regeneration
- Microphone Keep-Alive: The app maintains the microphone connection to prevent timeouts
- Browser Caching: Browser caches static files for faster loading
- Backend changes (`app.py`): Restart the Flask server
- Frontend changes (`templates/index.html`): Refresh browser (no restart needed)
- Browser Console: Press F12 to see JavaScript errors and logs
- Flask Logs: Check the terminal where `python app.py` is running
- Network Tab: Use browser DevTools to inspect API requests
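A quick way to confirm the backend is up while debugging is to hit the health-check endpoint from Python's standard library (the response body shape is not documented here, so the sketch simply prints whatever comes back):

```python
import urllib.request

# Assumes the default port; change 5001 if you reconfigured PORT.
with urllib.request.urlopen("http://localhost:5001/api/health", timeout=5) as resp:
    print(resp.status)           # 200 means the Flask server is reachable
    print(resp.read().decode())  # whatever payload the health endpoint returns
```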
- Test voice commands in a quiet environment
- Verify microphone permissions are granted
- Test with different recipes and ingredient combinations
- Check timer functionality with various durations
- The app is designed for local use only on your laptop
- All processing happens locally or via OpenAI API
- No external hosting or deployment is required
- The app uses your laptop's default microphone and speakers
- HTTPS is not required for local development (localhost is exempt)
- The app runs on port 5001 by default
- All voice processing happens in the browser (Web Speech API)
- Recipe generation uses OpenAI's API (requires internet connection)
- Sound files are served from the `sounds/` directory
- TTS audio files are cached in `static/tts_cache/` to avoid regeneration