Naming and versioning

API names are used in the .proto files as the package specifier.

Notation

Each API name follows a predefined pattern.

Snippet 1. The valid API name notation defined in ABNF form.

name             = namespace "." release

namespace        = "techmo.asr.api"
release          = version [patch] [pre-release]

version          = "v" number
patch            = "p" number
pre-release      = ("alpha" / "beta") number

number           = %x31-39 *DIGIT

Namespace

Each API release belongs to the techmo.asr.api namespace.

Version

To distinguish API releases, a version suffix is added to the namespace. Such a suffix consists of up to three elements:

  1. a version number prefixed with v,

  2. an optional patch number prefixed with p,

  3. an optional pre-release number prefixed with alpha or beta*.

* This number refers to the preceding element and indicates, accordingly, a pre-release version or pre-release patch.

The first element, version, represents a major release of an API definition. Such a release either takes a completely different approach to the same goal or introduces a breaking change, so revisions at this level are not interchangeable.

The second element, patch, identifies an extension of an API definition, i.e., a new service, message, or field. A service implementing a patch must support all functionality added so far and transparently respond to calls made against the preceding revisions, so each subsequent patch must be backward compatible with every earlier revision at this level, including the base revision without a patch number.

The third element, pre-release, is used for an experimental API definition that should not be considered for production use. Such a revision is subject to dynamic change, may be withdrawn without prior notice, and does not have to be compatible with other pre-release revisions at the same level.
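The compatibility rules above can be sketched as a small check. This is only an illustration, not part of the API: the `Release` type and the `can_serve` function are hypothetical names, the value 0 is assumed to stand for the base revision without a patch number, and pre-releases are treated conservatively as compatible only with themselves.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Release:
    """A parsed release suffix such as v1, v1p2, or v1p2beta3 (hypothetical type)."""
    version: int
    patch: int = 0  # assumption: 0 stands for the base revision without a patch number
    pre_release: Optional[Tuple[str, int]] = None  # e.g. ("beta", 3)


def can_serve(service: Release, client: Release) -> bool:
    """Return True if a service implementing `service` is expected to handle
    calls made against `client`."""
    # Major versions are not interchangeable.
    if service.version != client.version:
        return False
    # Pre-releases carry no compatibility guarantee, so this sketch
    # accepts only an exact match (a conservative assumption).
    if service.pre_release is not None or client.pre_release is not None:
        return service == client
    # A later patch must stay backward compatible with every earlier revision.
    return client.patch <= service.patch
```

For example, `can_serve(Release(1, 1), Release(1))` holds because v1p1 must answer calls made against v1, while the reverse does not hold.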

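According to Snippet 1, the following releases are valid: v123, v1p23, v1alpha23, v1p2beta3. The following are invalid: v0 (a number cannot start with zero), p1 (the version element is missing), v1alpha2beta3 (only one pre-release element is allowed).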
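The notation from Snippet 1 can be checked mechanically. The following is a minimal sketch that translates the grammar into a regular expression; `ApiName` and `parse_api_name` are hypothetical helper names, and the literals are matched in lowercase only, which is an assumption of this sketch rather than something the notation states.

```python
import re
from typing import NamedTuple, Optional

# number = %x31-39 *DIGIT: a non-zero digit followed by any digits.
_NUMBER = r"[1-9][0-9]*"

# name = namespace "." version [patch] [pre-release]
_API_NAME = re.compile(
    r"techmo\.asr\.api\."
    rf"v(?P<version>{_NUMBER})"
    rf"(?:p(?P<patch>{_NUMBER}))?"
    rf"(?:(?P<stage>alpha|beta)(?P<stage_number>{_NUMBER}))?"
)


class ApiName(NamedTuple):
    version: int
    patch: Optional[int]         # None for the base revision
    stage: Optional[str]         # "alpha", "beta", or None
    stage_number: Optional[int]  # None when there is no pre-release element


def parse_api_name(name: str) -> Optional[ApiName]:
    """Return the parsed elements of an API name, or None if it is invalid."""
    match = _API_NAME.fullmatch(name)
    if match is None:
        return None
    patch = match.group("patch")
    stage_number = match.group("stage_number")
    return ApiName(
        version=int(match.group("version")),
        patch=int(patch) if patch is not None else None,
        stage=match.group("stage"),
        stage_number=int(stage_number) if stage_number is not None else None,
    )
```

With this sketch, `parse_api_name("techmo.asr.api.v1p2beta3")` yields version 1, patch 2, and the pre-release element beta 3, whereas `parse_api_name("techmo.asr.api.v0")` yields `None`.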

Releases

The latest API release is techmo.asr.api.v1p1.

Release notes

  • techmo.asr.api.v1 - 2024-01-12

The initial revision, which includes age, gender, language and, above all, speech recognition. Recognition can be performed in either continuous or single utterance mode.

  • techmo.asr.api.v1p1 - 2024-04-25 (hold-response update)

The first API extension adding the ability to temporarily stop an ASR system from returning responses.

  • techmo.asr.api.v1p2 - upcoming release, WiP (timeout update)

The second API extension providing explicit support for customization of the recognition timeouts and interim results refresh rate.

Feature comparison

Table 1. A table showing whether an API release is compatible with a feature.

Columns: google.cloud.speech.v1, techmo.asr.api.v1 series

recognition engine

  • age recognition

  • gender recognition

  • language recognition

  • speech recognition

recognition modes

  • single utterance and continuous recognition modes

recognition request messages

  • config

  • control message

  • data

recognition result types

  • interim and final results

Features

Recognition engine

An entity responsible for extracting a specific type of information from audio.

  • age recognition

Estimates the likely age of the speaker based on voice characteristics.

  • gender recognition

Matches the speaker's voice characteristics to one of the gender classes.

  • language recognition

Detects the language of the spoken input.

  • speech recognition

Converts spoken language into text (speech-to-text for short). The original core functionality.


Recognition request messages

A client can send the following types of messages during a recognition request.

  • config

Transfers an initial configuration of the request.

  • control message

Transfers flags that control the flow of the ongoing request.

  • data

Transfers the data of the request.

Recognition result content

  • confidence

Provides information about the system’s confidence in the quality of recognition.

  • time alignment

Supplements speech recognition results with the time of occurrence and duration of individual words.

Recognition result types

A recognition result can be one of two types.

  • final result

A result that has been finalized because a timeout occurred or the incoming data ended.

  • interim result

An unstable result containing the current state of the system.

Recognition termination modes

Recognition can proceed in one of two modes.

  • continuous recognition mode

Terminates recognition only when the incoming data runs out. Returns at least one final result. This is the default mode of operation.

  • single utterance mode

Processes incoming data until the end of an utterance is detected. Returns only one final result and after that terminates recognition.

Recognition timeouts

Timeouts determine when a part of the recognition is finalized.

  • no input timeout

Specifies the maximum length of silence before any speech, after which an empty utterance is returned.

  • recognition timeout

Specifies the maximum length of speech to be included in an utterance.

  • speech complete timeout

Specifies the length of silence after the end of speech required to determine the end of an utterance.

  • speech incomplete timeout
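How the three timeouts described above interact can be illustrated with a minimal sketch. It is not the service's implementation: `first_timeout` is a hypothetical function, the input is assumed to be a stream of voice-activity flags (one per audio frame), and the recognition timeout is assumed to be measured from the first speech frame. The speech incomplete timeout is left out because its behavior is not described here.

```python
from typing import Iterable, Optional


def first_timeout(
    frames: Iterable[bool],
    frame_ms: int,
    no_input_ms: int,
    recognition_ms: int,
    speech_complete_ms: int,
) -> Optional[str]:
    """Return the name of the first timeout that fires, or None if none does.

    `frames` is a hypothetical stream of voice-activity flags, one per
    `frame_ms` of audio: True for speech, False for silence.
    """
    leading_silence = 0   # silence before any speech
    trailing_silence = 0  # silence since the last speech frame
    utterance = 0         # time since the first speech frame (assumption)
    speech_started = False

    for is_speech in frames:
        if is_speech:
            speech_started = True
            trailing_silence = 0
        elif speech_started:
            trailing_silence += frame_ms
            if trailing_silence >= speech_complete_ms:
                return "speech complete timeout"
        else:
            leading_silence += frame_ms
            if leading_silence >= no_input_ms:
                return "no input timeout"

        if speech_started:
            utterance += frame_ms
            if utterance >= recognition_ms:
                return "recognition timeout"
    return None
```

In this sketch, silence at the start counts only toward the no input timeout, and silence after speech counts only toward the speech complete timeout.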

Recognition timeouts control

manual input timer mode

A state in which input timers do not start

auto hold response mode

held responses merging

input timer

give response

The default state of a request, in which a final result is returned immediately. This is the opposite of hold response.

After a request enters this state from hold response, the first response returned once a new final result is available combines that result with all previously unreturned final results.

hold response

A state of a request, in which a final result is not returned. Recognition results are kept in a session state. This is the opposite of give response.

This state does not affect the returning of interim results. The client should decide how to interpret them.
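The interplay of the give response and hold response states can be sketched as a tiny state machine. This is an illustration under assumptions: `ResponseGate` and its methods are hypothetical names, and combining results is modeled simply as returning them together in a list, since the actual merging is not specified here.

```python
from typing import List, Optional


class ResponseGate:
    """Models the give response / hold response states of a single request."""

    def __init__(self) -> None:
        self.holding = False        # give response is the default state
        self._held: List[str] = []  # final results not yet returned

    def hold(self) -> None:
        """Enter the hold response state."""
        self.holding = True

    def give(self) -> None:
        """Enter the give response state."""
        self.holding = False

    def on_final(self, result: str) -> Optional[List[str]]:
        """Handle a new final result; return a response or None if held."""
        if self.holding:
            self._held.append(result)
            return None
        # The first response after a hold carries every unreturned result.
        response = self._held + [result]
        self._held = []
        return response

    def on_interim(self, result: str) -> str:
        """Interim results pass through regardless of the state."""
        return result
```

For example, if two final results arrive while the gate is holding, the next final result that arrives after `give()` is returned together with both of them, and later results are returned one by one again.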

session state

A storage for request data to be reused in future requests.

This is an abstract concept that depends on service implementation. The whole service is a multi-session