Question
What changes when an AI system is spoken to instead of typed to?
Hypothesis
Voice systems must treat latency budgets, interruption handling, and repair strategies as core architecture rather than as add-ons.
Method
Map the flow across capture, transcription, policy, tool execution, generation, and speech output.
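One way to make the mapped flow measurable is to attach a latency budget to each stage and flag the stages that exceed it. The sketch below is illustrative only: the stage names follow the list above, but the millisecond budgets and the `TurnTrace` class are assumptions made for this example, not measurements or an existing API.

```python
from dataclasses import dataclass, field

# Hypothetical per-stage budgets in milliseconds (illustrative values, not measured).
STAGE_BUDGET_MS = {
    "capture": 50,
    "transcription": 200,
    "policy": 100,
    "tool_execution": 300,
    "generation": 250,
    "speech_output": 100,
}


@dataclass
class TurnTrace:
    """Records how long each stage took for a single conversational turn."""

    timings_ms: dict = field(default_factory=dict)

    def record(self, stage: str, elapsed_ms: float) -> None:
        # Reject stages that are not part of the mapped flow.
        if stage not in STAGE_BUDGET_MS:
            raise KeyError(f"unknown stage: {stage}")
        self.timings_ms[stage] = elapsed_ms

    def over_budget(self) -> list:
        # Stages whose recorded time exceeds their budget, in pipeline order.
        return [
            stage
            for stage, budget in STAGE_BUDGET_MS.items()
            if self.timings_ms.get(stage, 0) > budget
        ]

    def total_ms(self) -> float:
        return sum(self.timings_ms.values())


trace = TurnTrace()
for stage, elapsed in [
    ("capture", 40),
    ("transcription", 180),
    ("policy", 60),
    ("tool_execution", 450),
    ("generation", 200),
    ("speech_output", 90),
]:
    trace.record(stage, elapsed)

slow_stages = trace.over_budget()
```

A per-stage breakdown like this shows which stage consumed the turn's time, which is the information needed to decide where the system should acknowledge a delay.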
Prototype
Prototype a turn manager that handles barge-in, confirmation, and partial state repair.
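A minimal sketch of such a turn manager follows, assuming a three-state model and a slot dictionary for task state. The names `TurnManager`, `State`, `propose`, `repair`, and `barge_in` are hypothetical and chosen for this example; a real prototype would also need audio I/O and timing, which are omitted here.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()
    CONFIRMING = auto()


@dataclass
class TurnManager:
    state: State = State.LISTENING
    slots: dict = field(default_factory=dict)    # confirmed task state
    pending: dict = field(default_factory=dict)  # values awaiting confirmation
    queue: list = field(default_factory=list)    # speech chunks not yet played
    spoken: list = field(default_factory=list)   # chunks the user actually heard

    def _idle_state(self) -> State:
        # When not speaking, wait for confirmation if anything is pending.
        return State.CONFIRMING if self.pending else State.LISTENING

    def propose(self, **updates) -> None:
        """Stage new slot values that the user must confirm."""
        self.pending.update(updates)
        self.state = State.CONFIRMING

    def say(self, chunks) -> None:
        """Queue speech output in small chunks so it can be interrupted."""
        self.queue.extend(chunks)
        self.state = State.SPEAKING

    def play_next(self):
        """Play one chunk; return None when nothing is queued."""
        if not self.queue:
            return None
        chunk = self.queue.pop(0)
        self.spoken.append(chunk)
        if not self.queue:
            self.state = self._idle_state()
        return chunk

    def barge_in(self) -> list:
        """User interrupted: drop unplayed speech and return what was dropped."""
        dropped, self.queue = self.queue, []
        self.state = self._idle_state()
        return dropped

    def repair(self, slot: str, value) -> None:
        """Correct one slot and re-confirm it, leaving all other slots untouched."""
        self.pending[slot] = value
        self.state = State.CONFIRMING

    def confirm(self) -> None:
        """Commit pending values to the confirmed task state."""
        self.slots.update(self.pending)
        self.pending.clear()
        self.state = State.LISTENING


tm = TurnManager()
tm.propose(city="Boston", date="Friday")
tm.say(["Booking Boston", "for Friday,", "is that right?"])
tm.play_next()            # the user hears only the first chunk
dropped = tm.barge_in()   # user interrupts: "No, Austin"
tm.repair("city", "Austin")
tm.confirm()
```

Tracking `spoken` separately from `queue` is a deliberate choice in this sketch: after a barge-in, the manager knows what the user actually heard, so a repair can target one slot instead of restarting the whole exchange.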
Notes
Voice exposes uncertainty: pauses and misrecognitions are heard immediately. The system needs to acknowledge delays and recover gracefully.
Results / Open Questions
Open question: when should a voice agent ask for confirmation, and when should it proceed with a reversible action?
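One candidate policy, offered as a starting point for experiments rather than an answer, is to always confirm irreversible actions and to confirm reversible ones only when understanding is uncertain. The `Action` type, the `confidence` score, and the 0.8 threshold below are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool
    confidence: float  # hypothetical 0..1 score for how well the request was understood


def needs_confirmation(action: Action, min_confidence: float = 0.8) -> bool:
    """Return True if the agent should ask before acting."""
    # Irreversible actions are always confirmed, however confident the agent is.
    if not action.reversible:
        return True
    # Reversible actions proceed unless the request was poorly understood.
    return action.confidence < min_confidence


pause_music = Action("pause_music", reversible=True, confidence=0.95)
set_alarm = Action("set_alarm", reversible=True, confidence=0.55)
send_payment = Action("send_payment", reversible=False, confidence=0.99)

decisions = {a.name: needs_confirmation(a) for a in (pause_music, set_alarm, send_payment)}
```

The threshold is the tunable part of the question: lowering it trades fewer interruptions for more actions that later need to be undone.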
References
Placeholder for realtime speech systems, turn-taking, and multimodal interaction research.