Naming and versioning¶
API names are used in the .proto files as the package specifier.
Notation¶
Each API name follows a predefined pattern.
Snippet 1. The valid API name notation defined in ABNF form.
name = namespace "." release
namespace = "techmo.asr.api"
release = version [patch] [pre-release]
version = "v" number
patch = "p" number
pre-release = ("alpha" / "beta") number
number = %x31-39 *DIGIT
Namespace¶
Each API release belongs to the techmo.asr.api namespace.
Version¶
To distinguish API releases, a version suffix is added to the namespace. Such a suffix consists of up to three elements:
a version number prefixed with v,
an optional patch number prefixed with p,
an optional pre-release number prefixed with alpha or beta*.
* This number refers to the preceding element and indicates, accordingly, a pre-release version or a pre-release patch.
The first element, version, represents a major release of an API definition. Such a release is a completely different approach to the same goal or introduces a breaking change, so revisions at this level are not interchangeable.
The second element, patch, identifies an extension of an API definition, i.e., a new service, message, or field. A service implementing it must support all functionalities added so far and transparently respond to calls to the preceding revisions, so each subsequent patch must be backwards compatible with others at this level, including the base revision without a patch number.
The third element, pre-release, is used for an experimental API definition that should not be considered for production use. Such revisions are subject to dynamic change, may be withdrawn without prior notice, and do not have to be compatible with each other at the same level.
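The compatibility rules described for the three elements can be pictured with a short Python sketch. It is only one reading of the text above, not part of the API: the `Release` class and the `can_serve` function are hypothetical names, and representing the base revision as patch 0 is an assumption made for convenience.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical parsed form of a release suffix such as v1p2beta3."""
    version: int
    patch: int = 0                     # assumption: 0 stands for the base revision without a patch number
    pre_release: Optional[str] = None  # e.g. "alpha2" or "beta3"


def can_serve(service: Release, client: Release) -> bool:
    """Return True if a service implementing `service` may be expected
    to answer a client built against `client`."""
    if service == client:
        return True
    # Pre-releases carry no compatibility guarantee, even at the same level.
    if service.pre_release or client.pre_release:
        return False
    # Different versions are not interchangeable; later patches must stay
    # backwards compatible with earlier ones, including the base revision.
    return service.version == client.version and service.patch >= client.patch
```

Under these assumptions, `can_serve(Release(1, 1), Release(1))` is true, because a v1p1 service must still answer v1 calls, while `can_serve(Release(2), Release(1))` is false.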
According to Snippet 1, examples of releases that should be considered valid: v123, v1p23, v1alpha23, v1p2beta3; invalid: v0, p1, v1alpha2beta3.
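The notation from Snippet 1 can also be checked mechanically. The following Python sketch translates the ABNF rules into a regular expression. The function name `is_valid_api_name` is hypothetical, and the pattern assumes lowercase-only literals, as they appear in the package names on this page.

```python
import re

# number = %x31-39 *DIGIT  ->  a non-zero digit followed by any ASCII digits
_NUMBER = r"[1-9][0-9]*"

# name = namespace "." version [patch] [pre-release]
_API_NAME = re.compile(
    r"techmo\.asr\.api\."
    rf"v{_NUMBER}"                    # version
    rf"(?:p{_NUMBER})?"               # optional patch
    rf"(?:(?:alpha|beta){_NUMBER})?"  # optional pre-release
)


def is_valid_api_name(name: str) -> bool:
    """Return True if `name` matches the API name notation of Snippet 1."""
    return _API_NAME.fullmatch(name) is not None
```

With this sketch, `is_valid_api_name("techmo.asr.api.v1p2beta3")` returns `True`, while `is_valid_api_name("techmo.asr.api.v0")` returns `False`.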
Releases¶
The latest API release is techmo.asr.api.v1p1.
Release notes¶
techmo.asr.api.v1 - 2024-01-12
The initial revision, which includes age, gender, language and, above all, speech recognition. Recognition can be performed in either continuous or single utterance mode.
techmo.asr.api.v1p1 - 2024-04-25 (hold-response update)
The first API extension adding the ability to temporarily stop an ASR system from returning responses.
techmo.asr.api.v1p2 - upcoming release, WiP (timeout update)
The second API extension providing explicit support for customization of the recognition timeouts and interim results refresh rate.
Feature comparison¶
Table 1. A table showing whether an API release is compatible with a feature.
google.cloud.speech.v1 |
techmo.asr.api.v1 series |
|
---|---|---|
✓ |
||
✓ |
||
✓ |
||
✓ |
✓ |
|
single utterance and continuous recognition modes |
✓ |
✓ |
✓ |
✓ |
|
✓ |
||
✓ |
✓ |
|
✓ |
✓ |
Features¶
Recognition engine¶
An entity responsible for extracting a specific type of information from audio.
age recognition
Estimates the likely age of the speaker based on voice characteristics.
gender recognition
Matches the speaker's voice characteristics to one of the gender classes.
language recognition
Detects the language of the spoken input.
speech recognition
Converts spoken language into text (speech-to-text for short). The original core functionality.
Recognition request messages¶
During a recognition request, the following types of messages sent from a client are distinguished.
config
Transfers an initial configuration of the request.
control message
Transfers flags that control the flow of the ongoing request.
data
Transfers data of the request.
Recognition result content¶
confidence
Provides information about the system’s confidence in the quality of recognition.
time alignment
Supplements speech recognition results with the time of occurrence and duration of individual words.
Recognition result types¶
A recognition result can be one of two types.
final result
A result that has been finalized due to timeout occurrence or the end of incoming data.
interim result
An unstable result containing the current state of the system.
Recognition termination modes¶
Recognition can proceed in one of two modes.
continuous recognition mode
Expects incoming data to run out to terminate recognition. Returns at least one final result. The default mode of operation.
single utterance mode
Processes incoming data until the end of an utterance is detected. Returns only one final result and after that terminates recognition.
Recognition timeouts¶
Timeouts are responsible for finalizing part of recognition.
no input timeout
Specifies the maximum length of silence allowed before any speech is detected. When it is exceeded, an empty utterance is returned.
recognition timeout
Specifies the maximum length of speech to be included in an utterance.
speech complete timeout
Specifies the length of silence after the end of speech required to determine the end of an utterance.
speech incomplete timeout
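The interplay of the no input, recognition, and speech complete timeouts defined above can be pictured with a small Python simulation. It is a simplified sketch and does not describe how the service is implemented. The `Timeouts` class, the `first_utterance_end` function, and the per-frame speech/silence flags are hypothetical, and measuring the recognition timeout from the first speech frame is an assumption. The speech incomplete timeout is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Timeouts:
    """Hypothetical container for three of the timeouts, in milliseconds."""
    no_input_ms: int
    recognition_ms: int
    speech_complete_ms: int


def first_utterance_end(frames: Iterable[bool], timeouts: Timeouts,
                        frame_ms: int = 10) -> Tuple[str, int]:
    """Scan per-frame speech flags (True = speech, False = silence) and
    return the reason the first utterance ends and the elapsed time in ms."""
    elapsed_ms = 0
    silence_ms = 0     # length of the current run of silence
    utterance_ms = 0   # assumption: time since the first speech frame
    heard_speech = False
    for is_speech in frames:
        elapsed_ms += frame_ms
        if is_speech:
            heard_speech = True
            silence_ms = 0
        else:
            silence_ms += frame_ms
        if heard_speech:
            utterance_ms += frame_ms
        if not heard_speech and silence_ms >= timeouts.no_input_ms:
            return "no input timeout", elapsed_ms
        if heard_speech and not is_speech and silence_ms >= timeouts.speech_complete_ms:
            return "speech complete timeout", elapsed_ms
        if heard_speech and utterance_ms >= timeouts.recognition_ms:
            return "recognition timeout", elapsed_ms
    return "end of data", elapsed_ms
```

For example, with `Timeouts(no_input_ms=50, recognition_ms=1000, speech_complete_ms=30)` and 10 ms frames, two silent frames followed by three speech frames and four silent frames end the utterance with a speech complete timeout after 80 ms.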
Recognition timeouts control¶
manual input timer mode¶
A state in which input timers do not start
auto hold response mode¶
New in v1p1.
held responses merging¶
New in v1p1.
input timer¶
give response¶
New in v1p1.
The default state of a request, in which a final result is returned immediately. This is the opposite of hold response.
After this state is entered from hold response, the first response returned once a new final result becomes available combines that result with all previously unreturned final results.
hold response¶
New in v1p1.
A state of a request, in which a final result is not returned. Recognition results are kept in a session state. This is the opposite of give response.
This state does not affect the return of interim results. The client should decide how to interpret them.
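The relation between give response and hold response can be sketched as a small state machine in Python. This is an illustration of the behaviour described above under stated assumptions, not the service implementation: the `ResponseGate` class and its method names are hypothetical, and representing a combined response as a list of final results is an assumption, since the merging format is not specified here.

```python
from typing import List


class ResponseGate:
    """Hypothetical model of the give response / hold response states."""

    def __init__(self) -> None:
        self.holding = False        # give response is the default state
        self._held: List[str] = []  # final results not yet returned

    def hold_response(self) -> None:
        self.holding = True

    def give_response(self) -> None:
        self.holding = False

    def on_interim_result(self, text: str) -> str:
        # Interim results are returned regardless of the current state.
        return text

    def on_final_result(self, text: str) -> List[str]:
        """Return the final results to send now (empty while holding)."""
        if self.holding:
            self._held.append(text)
            return []
        # The first response after leaving hold response combines all
        # previously unreturned final results with the new one.
        combined = self._held + [text]
        self._held = []
        return combined
```

A short usage example: after `hold_response()`, final results "b" and "c" produce no response; after `give_response()`, the next final result "d" yields `["b", "c", "d"]`, and a following "e" yields `["e"]`.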
session state¶
A storage for request data to be reused in future requests.
This is an abstract concept that depends on service implementation. The whole service is a multi-session