Speech to Code

About this project
A web application that leverages Large Language Models to convert spoken language into executable code, streamlining the development process.
The Story Behind It
I originally developed this project in the summer of 2024 when Claude Sonnet 3.5 (Anthropic's leading coding model at the time) was released with strict rate limits. As an active user of LLMs for code generation, I needed a way to continue working after hitting rate limits and efficiently provide context about my codebase to the language model.
Through experimentation, I discovered that specific system prompts were crucial for generating complete, working code rather than partial solutions. I also found that speech-to-text was essential for rapidly providing context to the language models. This led me to create a tool that combined all these elements - prompt management, speech input, and codebase context injection.
While the landscape has evolved significantly since then - with improved rate limits and advances in AI-assisted IDEs - I still find this tool valuable for specific use cases. Today, I primarily use it for quick speech-to-text conversion, sophisticated prompt construction, and analyzing repository structure (including token counts per file). The ability to dump entire codebases into prompts and understand their structure remains particularly useful.
Note: This application requires local installation as it needs direct filesystem access to analyze and interact with your codebase.
Key Features
- Advanced Prompt Composer: Combine speech, repository files, and manual text input
- Multi-Model Support: Integration with multiple LLM providers (OpenAI, Anthropic, Google)
- Repository Integration: Interactive file viewer for repository navigation
- Transcription Management: Real-time speech-to-text conversion
- System Prompt Management: Create and edit system prompts
Screenshots



Tech Stack
- Frontend: React, Tailwind CSS
- Backend: Python, FastAPI
- AI Integration: OpenAI API, Anthropic API, Google API
- Speech Recognition: Web Speech API