Voice Kit 2

Neil Haddley · October 16, 2025

Google Assistant API

Setup Context

Note on Browser Compatibility

During setup, I ran into problems because the Chromium version bundled with the Raspberry Pi image is no longer supported by Google's OAuth sign-in page. Installing Firefox resolved this and allowed the OAuth authentication flow required by the Google Assistant API to complete.

BASH
pi@raspberrypi:~ $ sudo apt update
...
pi@raspberrypi:~ $ sudo apt install firefox-esr

Google Cloud Console

Credentials

APIs & Services

assistant.json

JSON
{
    "installed": {
        "client_id": "<SPECIFIC TO YOU>.apps.googleusercontent.com",
        "project_id": "<SPECIFIC TO YOU>",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_secret": "<SPECIFIC TO YOU>",
        "redirect_uris": [
            "http://localhost"
        ]
    }
}
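
Before running any of the demos, it can help to confirm that the downloaded OAuth client file actually contains the fields shown above. The following is a minimal, hypothetical sanity check (not part of the AIY library); it assumes the file has been saved as assistant.json in the current directory.

PYTHON
import json

# Hypothetical sanity check: confirm the downloaded OAuth client file
# contains the fields shown above before wiring it into the demos.
REQUIRED_FIELDS = {"client_id", "client_secret", "auth_uri", "token_uri", "redirect_uris"}

with open("assistant.json") as f:  # assumes assistant.json is in the current directory
    config = json.load(f)

installed = config.get("installed", {})
missing = REQUIRED_FIELDS - installed.keys()
if missing:
    print("assistant.json is missing fields:", ", ".join(sorted(missing)))
else:
    print("assistant.json looks complete for project", installed.get("project_id", "<unknown>"))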

Assistant Library Demo

Project Overview

A Python implementation of a voice-activated Google Assistant that uses hotword detection ("OK Google") with the AIY Voice Kit hardware.

Setup Notes

Authentication Challenge: The initial setup hit browser compatibility issues because the Chromium version in the Raspberry Pi image is unsupported by Google's sign-in page. This was resolved by installing Firefox to handle the OAuth authentication flow for Google Assistant API integration.

Code Implementation

Core Dependencies
PYTHON
import logging
import platform
import sys
from google.assistant.library.event import EventType
from aiy.assistant import auth_helpers
from aiy.assistant.library import Assistant
from aiy.board import Board, Led

Event Processing Logic

The process_event() function manages the Assistant's state machine with visual LED feedback (a compact, table-driven sketch of this mapping follows the list):

- Ready State (ON_START_FINISHED)
  - LED: BEACON_DARK
  - Status: Assistant initialized and awaiting the hotword

- Active Listening (ON_CONVERSATION_TURN_STARTED)
  - LED: ON (solid)
  - Status: Processing user voice input

- Processing Query (ON_END_OF_UTTERANCE)
  - LED: PULSE_QUICK
  - Status: Assistant analyzing and responding to the query

- Completion States (ON_CONVERSATION_TURN_FINISHED, ON_CONVERSATION_TURN_TIMEOUT, ON_NO_RESPONSE)
  - LED: Returns to BEACON_DARK

- Error Handling (ON_ASSISTANT_ERROR)
  - Condition: Fatal errors trigger application exit

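For reference, the same mapping can be expressed as a lookup table instead of an if/elif chain. This is a minimal sketch rather than the original demo code (the full assistant_library_demo.py listing appears further down); it assumes the same aiy.board and google.assistant.library APIs used throughout this post.

PYTHON
import sys

from google.assistant.library.event import EventType
from aiy.board import Led

# Table-driven variant: map each event type to the LED state it should set.
LED_FOR_EVENT = {
    EventType.ON_START_FINISHED: Led.BEACON_DARK,        # ready, waiting for the hotword
    EventType.ON_CONVERSATION_TURN_STARTED: Led.ON,      # listening
    EventType.ON_END_OF_UTTERANCE: Led.PULSE_QUICK,      # thinking
    EventType.ON_CONVERSATION_TURN_FINISHED: Led.BEACON_DARK,
    EventType.ON_CONVERSATION_TURN_TIMEOUT: Led.BEACON_DARK,
    EventType.ON_NO_RESPONSE: Led.BEACON_DARK,
}

def process_event(led, event):
    if event.type in LED_FOR_EVENT:
        led.state = LED_FOR_EVENT[event.type]
    elif event.type == EventType.ON_ASSISTANT_ERROR and event.args and event.args.get('is_fatal'):
        sys.exit(1)  # fatal Assistant errors stop the application

The demo's main() function, shown next, feeds each event from assistant.start() into process_event():
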
PYTHON
def main():
    # Initialize logging system
    logging.basicConfig(level=logging.INFO)

    # Authenticate with Google Assistant API
    credentials = auth_helpers.get_assistant_credentials()

    # Initialize hardware and Assistant
    with Board() as board, Assistant(credentials) as assistant:
        # Event processing loop
        for event in assistant.start():
            process_event(board.led, event)

Hardware Requirements

Supported Devices: Raspberry Pi 2/3

Excluded Devices: Raspberry Pi Zero (requires alternative trigger methods)

Required Components: AIY Voice Kit with its microphone array and speaker

Operational Characteristics

Activation Method: Hotword detection ("OK Google")

Audio Handling: Direct access via Google Assistant Library

Visual Feedback: Multi-state LED indicators

Network Dependency: Requires persistent internet connection

Authentication Notes

A working setup depends on a correctly configured OAuth client for the Google Assistant API; on this image, completing the OAuth flow required installing Firefox to work around the Chromium compatibility limitations noted above.

assistant_library_demo.py

PYTHON
#!/usr/bin/env python3
# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Activates the Google Assistant with hotword detection, using the Google Assistant Library.

The Google Assistant Library has direct access to the audio API, so this Python
code doesn't need to record audio.

.. note:

    This example depends on hotword detection (such as "Okay Google") to activate the Google
    Assistant, which is supported only with Raspberry Pi 2/3. If you're using a Pi Zero, this
    code won't work. Instead, you must use the button or another type of trigger, as shown
    in assistant_library_with_button_demo.py.
"""

import logging
import platform
import sys

from google.assistant.library.event import EventType

from aiy.assistant import auth_helpers
from aiy.assistant.library import Assistant
from aiy.board import Board, Led

def process_event(led, event):
    logging.info(event)

    if event.type == EventType.ON_START_FINISHED:
        led.state = Led.BEACON_DARK  # Ready.
        logging.info('Say "OK, Google" then speak, or press Ctrl+C to quit...')

    elif event.type == EventType.ON_CONVERSATION_TURN_STARTED:
        led.state = Led.ON  # Listening.

    elif event.type == EventType.ON_END_OF_UTTERANCE:
        led.state = Led.PULSE_QUICK  # Thinking.

    elif (event.type == EventType.ON_CONVERSATION_TURN_FINISHED
          or event.type == EventType.ON_CONVERSATION_TURN_TIMEOUT
          or event.type == EventType.ON_NO_RESPONSE):
        led.state = Led.BEACON_DARK

    elif event.type == EventType.ON_ASSISTANT_ERROR and event.args and event.args['is_fatal']:
        sys.exit(1)


def main():
    logging.basicConfig(level=logging.INFO)

    credentials = auth_helpers.get_assistant_credentials()
    with Board() as board, Assistant(credentials) as assistant:
        for event in assistant.start():
            process_event(board.led, event)


if __name__ == '__main__':
    main()

Wake Word Detection is LOCAL

The "OK Google" hotword detection in the Google Assistant Library happens locally on the device, not through cloud APIs. Here's the evidence:

Key Technical Details:

1. Hardware/Architecture Dependency

From the documentation in aiy.assistant.rst:

Hotword detection "depends on the ARMv7 architecture"

Pi Zero (ARMv6) cannot do hotword detection - only Pi 2/3 (ARMv7) support it

This clearly indicates local processing, as cloud APIs wouldn't have CPU architecture dependencies
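
A quick, illustrative way to see which class of CPU a given Pi reports (not part of the demo): on the standard 32-bit Raspberry Pi OS image, a Pi Zero reports an ARMv6 machine string while a Pi 2/3 reports ARMv7.

PYTHON
import platform

# Illustrative check only: Pi Zero typically reports 'armv6l',
# Pi 2/3 report 'armv7l' on the standard 32-bit image.
machine = platform.machine()
if machine.startswith('armv6'):
    print(f"{machine}: hotword detection is not supported here; use a button or other trigger.")
else:
    print(f"{machine}: hotword detection should work on Pi 2/3 class hardware.")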

2. Direct Audio Access

Multiple files mention that the Google Assistant Library has "direct access to the audio API":

PYTHON
# From assistant_library_demo.py comments:
"The Google Assistant Library has direct access to the audio API, so this Python code doesn't need to record audio."

3. No Network Requirements for Wake Word

The wake word detection works without sending audio to the cloud - only after the wake word is detected does the conversation with Google Assistant begin (which does use cloud APIs).

Architecture Overview:

TEXT
Local Device (Pi 2/3):
┌─────────────────────────────┐
│ 1. Microphone Input         │
│ 2. LOCAL Wake Word Detection│  ← Happens locally
│    ("OK Google" processing) │
│ 3. Wake word detected?      │
└─────────────────────────────┘
              ▼ (Only if wake word detected)
┌─────────────────────────────┐
│ 4. Start recording user     │
│ 5. Send audio to Google     │  ← Cloud processing
│ 6. Get Assistant response   │
│ 7. Process/play response    │
└─────────────────────────────┘

Why This Matters:

Privacy: Wake word detection happens entirely on-device - no audio is sent to Google until after "OK Google" is detected.

Performance: Local detection means an instant response without network latency.

Offline Capability: The wake word works even without internet (though the conversation itself requires internet).

Hardware Requirements: This is why a specific ARM architecture is needed - the local processing requires sufficient computational power.

Comparison with Cloud Speech:

The project also includes aiy.cloudspeech, which does use cloud APIs for general speech recognition, but this is different from the wake word detection - it's for converting full speech to text after you've already triggered the system.
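
For contrast, here is a rough sketch of that cloud path, modelled on the AIY cloudspeech demo; the exact CloudSpeechClient API may vary between AIY releases, so treat it as illustrative. Everything the user says is streamed to Google's Cloud Speech service for transcription, whereas the hotword itself never leaves the device.

PYTHON
from aiy.cloudspeech import CloudSpeechClient

def transcribe_once():
    # Unlike the local hotword path, this records audio and sends it to
    # Google's Cloud Speech API for recognition.
    client = CloudSpeechClient()
    text = client.recognize(language_code='en-US')
    if text is None:
        print('No speech detected.')
    else:
        print('You said:', text)

if __name__ == '__main__':
    transcribe_once()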