Sound 2 – Voice & AI | INTERCULT

Training Manual for Trainers: Teaching AI-Powered Voice Interaction Systems

Objective: Equip trainers with a step-by-step guide to teach real-time speech recognition and AI vocal synthesis for performance control, so that participants can transform spoken words into musical and visual experiences.

Structure of the Training Session

1. Preparation (approx. 10 minutes)

• Objective: Set up the AI-powered voice interaction environment.

• Steps:
1. Research speech-to-text APIs (Google Speech-to-Text, Azure Speech, or local solutions such as Whisper).

2. Prepare code editors (Cursor, VS Code, or Sublime) with necessary libraries.

3. Set up audio production software (Ableton Live, Logic Pro, or Reaper) and visual tools.

Checklist:

• Microphone and audio interface tested.

• Speech recognition software configured.

• Music production and visual software connected.

• Text-to-speech engines ready for experimentation.
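Working through the checklist above can be partly automated. A minimal sketch, assuming participants use Python: a quick import check reports whether each required library is installed (the module names passed in are placeholders — substitute whichever speech and audio libraries your session actually uses):

```python
import importlib.util

def check_environment(modules):
    """Report which of the required modules can be imported."""
    return {name: importlib.util.find_spec(name) is not None
            for name in modules}

# Example with stdlib modules plus a hypothetical speech library name.
status = check_environment(["wave", "json", "my_speech_lib"])
missing = [name for name, ok in status.items() if not ok]
```

Running this at the start of the session lets trainers spot missing installations before the hands-on work begins.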

2. Introduction to Voice-AI Integration (approx. 15 minutes)

• Objective: Introduce participants to voice as a performance control interface.

• Steps:

1. Explain speech recognition technology and its applications in live performance.

2. Demonstrate AI language models converting speech to musical and visual triggers.

3. Show examples of voice-controlled performances and installations.

Trainer Tip: Use clear, simple voice commands initially to demonstrate immediate response.
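The "immediate response" demo can be made concrete with a small command dispatcher. A sketch under assumed names (the command words and actions are illustrative, and `transcript` stands in for whatever string the speech-to-text engine returns):

```python
def make_dispatcher():
    """Map recognized command words to demo actions."""
    actions = {
        "start": lambda: "playback started",
        "stop": lambda: "playback stopped",
        "louder": lambda: "volume up",
    }
    def handle(transcript):
        # Trigger the first known command word found in the transcript.
        for word in transcript.lower().split():
            if word in actions:
                return actions[word]()
        return None  # no recognized command in this utterance
    return handle

handle = make_dispatcher()
handle("please start the music")  # triggers the "start" action
```

Keeping the vocabulary tiny at first, as the trainer tip suggests, makes misrecognitions rare and the cause-and-effect obvious to participants.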

3. Hands-on Practice (approx. 30 minutes)

• Objective: Build a basic voice-to-music-to-visual pipeline.

• Steps:

1. Set up speech recognition with simple word detection.

2. Map recognized words to musical parameters (tempo, key, instruments).

3. Connect musical changes to visual responses in real time.

Trainer Tip: Encourage experimentation with different prompt styles and voice-processing techniques.
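Steps 2 and 3 of the pipeline can be sketched as a word-to-parameter mapping whose musical state also drives a visual value. All mappings and ranges here are illustrative assumptions, not a fixed scheme:

```python
# Hypothetical mappings from recognized words to musical parameters.
WORD_MAP = {
    "fast": {"tempo": 160},
    "slow": {"tempo": 70},
    "minor": {"key": "A minor"},
    "major": {"key": "C major"},
}

def tempo_to_hue(tempo):
    """Map tempo in 60-180 BPM onto a hue from 240 (blue) to 0 (red)."""
    t = max(60, min(180, tempo))
    return round(240 * (180 - t) / 120)

def update_state(state, transcript):
    """Fold any recognized words into the current performance state."""
    for word in transcript.lower().split():
        state.update(WORD_MAP.get(word, {}))
    state["hue"] = tempo_to_hue(state.get("tempo", 120))
    return state
```

In a live patch, `state["tempo"]` would be sent to the music software and `state["hue"]` to the visual tool; here both stay in one dictionary so participants can inspect the whole chain.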

4. Advanced Features and Creative Use Cases (approx. 15 minutes)

• Objective: Explore complex voice interactions and AI-generated responses.

• Steps:

1. Demonstrate AI text generation from voice input for narrative performances.

2. Show real-time voice transformation and synthesis techniques.

3. Explore multi-modal interactions combining voice, visuals, and haptics.
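Real-time voice transformation is normally done with dedicated DSP tools, but the core idea behind pitch shifting can be shown with a naive resampler. This is a teaching sketch only: it changes duration along with pitch and skips interpolation, which real transformation engines handle:

```python
def pitch_shift(samples, ratio):
    """Naive resample: ratio > 1 raises pitch (and shortens the clip)."""
    length = int(len(samples) / ratio)
    return [samples[int(i * ratio)] for i in range(length)]

# Doubling the playback rate keeps every other sample: one octave up.
pitch_shift([0, 3, 6, 9], 2.0)
```

Walking through why the resampled clip sounds higher gives participants a mental model for the more sophisticated techniques demonstrated in this segment.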

5. Wrap-Up and Feedback (approx. 10 minutes)

• Objective: Reinforce voice interaction possibilities.

• Steps:

1. Summarize speech recognition, AI processing, and output mapping concepts.

2. Share resources for voice AI APIs and creative coding libraries.

3. Encourage exploration of different languages and vocal expressions.
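The three concepts in the summary — recognition, processing, output mapping — can be tied together in one sketch pipeline for the recap. The transcription step is stubbed with fake results; in a real session it would call the speech-to-text engine chosen during Preparation, and the word-to-event rules are illustrative:

```python
def transcribe_stub(audio_label):
    """Stand-in for a real speech-to-text call."""
    fake_results = {"clip1": "play fast and loud"}
    return fake_results.get(audio_label, "")

def map_to_outputs(transcript):
    """Turn recognized words into (channel, message) output events."""
    events = []
    if "fast" in transcript:
        events.append(("music", "tempo=160"))
        events.append(("visual", "strobe"))
    if "loud" in transcript:
        events.append(("music", "gain=+6dB"))
    return events

def pipeline(audio_label):
    return map_to_outputs(transcribe_stub(audio_label))
```

Tracing one utterance through all three stages is a compact way to close the session before pointing participants at further resources.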

Post-Training Follow-Up

• Provide access to recordings, cheat sheets, or tool manuals.

• Schedule optional Q&A sessions or office hours.

Trainer Tip: Encourage a collaborative group where participants can share projects and solutions.