Protocol Documentation¶

techmo/asr/api/v1p1/asr.proto¶

AgeRecognitionAlternative¶

An alternative hypothesis of age recognition.

Field	Type	Label	Description
age	uint32		The assumed age of the person speaking in the audio, in years. For a reliable value, ensure that there is only one person speaking in the audio.
confidence	float	optional	The confidence estimate, ranging from 0.0 to 1.0. Support for this feature is optional.

AgeRecognitionConfig¶

Configuration of age recognition.

Field	Type	Label	Description
enable_age_recognition	bool		The switch that enables age recognition for the request. If disabled or unspecified, the related results are excluded. The service responds with the `FAILED_PRECONDITION` gRPC status code if requested but not enabled.

AgeRecognitionResult¶

A result of age recognition.

Field	Type	Label	Description
error	techmo.api.Status		The recognition process status. It may communicate warnings. In case of an error hindering recognition, all other message fields should be left unset.
recognition_alternatives	AgeRecognitionAlternative	repeated	The confidence-ordered list of alternative recognition hypotheses.

Audio¶

Audio contents.

Field	Type	Label	Description
bytes	bytes		The audio data bytes.

AudioConfig¶

Audio configuration of a StreamingRecognize request.

Field	Type	Label	Description
encoding	AudioConfig.AudioEncoding		The encoding of the audio data sent in the request. Single channel (mono) audio is assumed. The service should respond with the `INVALID_ARGUMENT` gRPC status code if the encoding is `UNSPECIFIED`. The service should respond with the `FAILED_PRECONDITION` gRPC status code if the encoding is not supported.
sampling_rate_hz	float		The sampling rate of the audio data sent in the request. The service should silently ignore the field for encodings that are sent along with headers, and detect the value from them instead. The service should respond with the `INVALID_ARGUMENT` gRPC status code if the value is not greater than 0.
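
The rules for `encoding` and `sampling_rate_hz` can be sketched as a client-side pre-check. This is only an illustration, not the service's implementation: the names `check_audio_config`, `HEADER_ENCODINGS`, and `KNOWN_ENCODINGS` are hypothetical, the set of supported encodings is passed in because it depends on the deployment, and the sketch assumes the positive-rate check is skipped for encodings whose rate is read from headers.

```python
# Hypothetical pre-check mirroring the documented AudioConfig rules.
# Encodings that carry their own headers, per the AudioEncoding table.
HEADER_ENCODINGS = {"FLAC", "OGG_OPUS", "MP3"}
KNOWN_ENCODINGS = {"LINEAR16"} | HEADER_ENCODINGS


def check_audio_config(encoding, sampling_rate_hz, supported=KNOWN_ENCODINGS):
    """Return the gRPC status code name the service is expected to answer with."""
    if encoding == "UNSPECIFIED":
        return "INVALID_ARGUMENT"
    if encoding not in supported:
        return "FAILED_PRECONDITION"
    # Assumption: the rate is validated only when it is not read from headers.
    if encoding not in HEADER_ENCODINGS and not sampling_rate_hz > 0:
        return "INVALID_ARGUMENT"
    return "OK"
```

Running such a check before opening the stream may save a round trip, but the service's own response remains authoritative.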

GenderRecognitionAlternative¶

An alternative hypothesis of gender recognition.

Field	Type	Label	Description
gender	string		The assumed gender of the person speaking in the audio. For a reliable value, ensure that there is only one person speaking in the audio.
confidence	float	optional	The confidence estimate, ranging from 0.0 to 1.0. Support for this feature is optional.

GenderRecognitionConfig¶

Configuration of gender recognition.

Field	Type	Label	Description
enable_gender_recognition	bool		The switch that enables gender recognition for the request. If disabled or unspecified, the related results are excluded. The service responds with the `FAILED_PRECONDITION` gRPC status code if requested but not enabled.

GenderRecognitionResult¶

A result of gender recognition.

Field	Type	Label	Description
error	techmo.api.Status		The recognition process status. It may communicate warnings. In case of an error hindering recognition, all other message fields should be left unset.
recognition_alternatives	GenderRecognitionAlternative	repeated	The confidence-ordered list of alternative recognition hypotheses.

LanguageRecognitionAlternative¶

An alternative hypothesis of language recognition.

Field	Type	Label	Description
language	string		The language spoken in the audio, a BCP-47 tag.
confidence	float	optional	The confidence estimate, ranging from 0.0 to 1.0. Support for this feature is optional.

LanguageRecognitionConfig¶

Configuration of language recognition.

Field	Type	Label	Description
enable_language_recognition	bool		The switch that enables language recognition for the request. If disabled or unspecified, the related results are excluded. The service responds with the `FAILED_PRECONDITION` gRPC status code if requested but not enabled.

LanguageRecognitionResult¶

A result of language recognition.

Field	Type	Label	Description
error	techmo.api.Status		The recognition process status. It may communicate warnings. In case of an error hindering recognition, all other message fields should be left unset.
recognition_alternatives	LanguageRecognitionAlternative	repeated	The confidence-ordered list of alternative recognition hypotheses.

ResultConfig¶

Result configuration of a StreamingRecognize request.

Field	Type	Description
enable_single_utterance	bool	The switch that toggles continuous recognition into single utterance mode. The service returns a final result for each end of utterance it detects in the audio, which may occur multiple times during a request. If enabled, the request terminates right after its first final result.
enable_interim_results	bool	The switch that allows interim results. If enabled, results containing tentative hypotheses may be returned in addition to final ones. The service should silently ignore this field if it is unsupported.
enable_held_responses_merging	bool	The switch that allows the service to merge responses in the “hold response” state. If enabled and more than one response is held, the service does not return them in a batch. Instead, it tries to merge their results into a single response. The service should respond with the `INVALID_ARGUMENT` gRPC status code if the `recognition_alternatives_limit` field of the `SpeechRecognitionConfig` message is greater than 1.
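
The constraint between `enable_held_responses_merging` and `recognition_alternatives_limit` can be expressed as a short check. The function name is hypothetical, and the sketch assumes the limit is rejected only when merging is actually enabled, which the description implies but does not state outright.

```python
def check_held_responses_merging(enable_held_responses_merging,
                                 recognition_alternatives_limit):
    """Return the expected gRPC status code name for this combination."""
    # Assumption: the limit matters only when merging is switched on.
    if enable_held_responses_merging and recognition_alternatives_limit > 1:
        return "INVALID_ARGUMENT"
    return "OK"
```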

SpeechRecognitionAlternative¶

An alternative hypothesis of speech recognition.

Field	Type	Label	Description
transcript	string		The transcript of the audio.
confidence	float	optional	The confidence estimate, ranging from 0.0 to 1.0. Support for this feature is optional.
words	SpeechRecognitionWord	repeated	The details of the transcript’s words. Empty unless `enable_time_alignment` is `true` in the request’s `SpeechRecognitionConfig`.

SpeechRecognitionConfig¶

Configuration for speech recognition.

Field	Type	Label	Description
enable_speech_recognition	bool		The switch that enables speech recognition for the request. If disabled or unspecified, the related results are excluded. The service responds with the `FAILED_PRECONDITION` gRPC status code if requested but not enabled.
recognition_alternatives_limit	uint32		The maximum number of alternative transcriptions that may be included per response. The actual count received can be lower than the specified value and may also be 0. If unspecified or 0, the limit is one alternative.
enable_time_alignment	bool		The switch that enables additional time alignment of recognitions in word details. If enabled, the `words` field of a `SpeechRecognitionAlternative` message includes a list of `SpeechRecognitionWord` messages. Otherwise, it remains empty. The service responds with the `FAILED_PRECONDITION` gRPC status code if requested but not enabled.
language_group_name	string		The name of a language group of models to be used. If left unspecified, it falls back to the service’s default group. The service responds with the `NOT_FOUND` gRPC status code if the name is not registered.
model_name	string		The name of a model to be used. If left unspecified, it falls back to the selected language group’s default. The service responds with the `NOT_FOUND` gRPC status code if the name is not registered.
config_fields	SpeechRecognitionConfig.ConfigFieldsEntry	repeated	Deprecated. The additional advanced service-dependent configuration for its speech recognizer. It may be silently ignored.

SpeechRecognitionConfig.ConfigFieldsEntry¶

Field	Type	Label	Description
key	string
value	string

SpeechRecognitionResult¶

A result of speech recognition.

Field	Type	Label	Description
error	techmo.api.Status		The recognition process status. It may communicate warnings. In case of an error hindering recognition, all other message fields should be left unset.
recognition_alternatives	SpeechRecognitionAlternative	repeated	The confidence-ordered list of alternative recognition hypotheses.
language_group_name	string		The actual name of the language group of the model, unrelated to the actual language spoken in the audio.
model_name	string		The actual name of the model used to obtain the result.

SpeechRecognitionWord¶

Details of a single word in speech recognition.

Field	Type	Label	Description
transcript	string		The transcript of the word itself.
confidence	float	optional	The confidence estimate, ranging from 0.0 to 1.0. Support for this feature is optional.
start_time	google.protobuf.Duration		The start time of the word relative to the beginning of the entire audio.
end_time	google.protobuf.Duration		The end time of the word relative to the beginning of the entire audio.
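
Word timings arrive as `google.protobuf.Duration` values, which hold whole `seconds` and fractional `nanos`. The sketch below converts them to floating-point seconds. The `Duration` and `Word` dataclasses are stand-ins for the generated message classes, and `to_seconds` and `word_spans` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class Duration:
    """Stand-in for google.protobuf.Duration (seconds plus nanoseconds)."""
    seconds: int = 0
    nanos: int = 0


@dataclass
class Word:
    """Stand-in for SpeechRecognitionWord, limited to the fields used here."""
    transcript: str
    start_time: Duration
    end_time: Duration


def to_seconds(duration):
    """Convert a Duration-like object to seconds as a float."""
    return duration.seconds + duration.nanos / 1e9


def word_spans(words):
    """Return (transcript, start_s, end_s) tuples, e.g. for subtitle output."""
    return [(w.transcript, to_seconds(w.start_time), to_seconds(w.end_time))
            for w in words]
```

Both times are relative to the beginning of the entire audio, so spans from successive final results can be placed on one timeline without adding offsets.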

StreamingConfig¶

Streaming configuration of a StreamingRecognize request.

Field	Type	Label	Description
enable_manual_input_timer	bool		The switch that enables manual control of the input timer. The timer imposes two constraints: one that finalizes recognition after a specified period unless speech is detected, and the other that limits the total time for an utterance. Manual control allows recognition to begin but delays enforcement of these constraints. The timer restarts after each detected end of utterance (each final result). If enabled, the timer does not start automatically. Instead, it can be initiated by sending a `StreamingRecognizeRequestControlMessage` with the `start_input_timer` field set to `true` as needed. This should occur after the beginning of the request and be repeated after each final result.
enable_auto_hold_response	bool		The switch that automatically puts the service in the “hold response” state at the beginning of the request and after each final result. The “hold response” state means that the internal recognition process continues, but results are kept, not returned. When needed, the state can be toggled into the “give response” state by sending the `StreamingRecognizeRequestControlMessage` message with the `give_response` field set to `true`. In the “give response” state, the service responds as soon as it is ready. Any held responses are returned in a batch.

StreamingRecognizeRequest¶

A message streamed from the client through the StreamingRecognize method.

Field	Type	Description
config	StreamingRecognizeRequestConfig	The immutable initial configuration of the request. Must be sent once in the request’s first message.
control_message	StreamingRecognizeRequestControlMessage	The message controlling the processing flow of the request. May be sent multiple times except in the request’s first message.
data	StreamingRecognizeRequestData	The data contents of the request itself. May be sent multiple times except in the request’s first message.
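
The ordering rule (configuration once in the first message, data afterwards) can be sketched as a generator. Plain dictionaries stand in for the generated message classes here; `build_requests` and the default `chunk_size` are hypothetical choices, not values prescribed by the service.

```python
def build_requests(config, audio, chunk_size=3200):
    """Yield request messages: the config first, then audio chunks as data."""
    # The first message carries only the immutable configuration.
    yield {"config": config}
    # Every later message carries a slice of the audio bytes.
    for offset in range(0, len(audio), chunk_size):
        yield {"data": {"audio": {"bytes": audio[offset:offset + chunk_size]}}}
```

With real generated stubs, a generator of this shape would typically be passed as the request iterator of the bidirectional streaming call.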

StreamingRecognizeRequestConfig¶

A message holding configuration of a StreamingRecognize request.

Field	Type	Description
audio_config	AudioConfig	Part of the configuration for the request’s audio content.
result_config	ResultConfig	Part of the configuration for the request’s result form.
streaming_config	StreamingConfig	Part of the configuration for the request’s processing flow.
speech_recognition_config	SpeechRecognitionConfig	Part of the configuration for speech recognition.
age_recognition_config	AgeRecognitionConfig	Part of the configuration for age recognition.
gender_recognition_config	GenderRecognitionConfig	Part of the configuration for gender recognition.
language_recognition_config	LanguageRecognitionConfig	Part of the configuration for language recognition.

StreamingRecognizeRequestControlMessage¶

A message controlling the processing flow of a StreamingRecognize request.

Field	Type	Label	Description
start_input_timer	bool	optional	The flag that starts the input timer on demand and resets after each final result. It is silently ignored if the manual input timer setting is disabled for the request.
give_response	bool	optional	The flag to allow the service to return a response. After receiving this message, the service remains in the “give response” state. Ignored when the service is already in the “give response” state. Mutually exclusive with the `hold_response` field.
hold_response	bool	optional	The flag to forbid the service from returning a response. After receiving this message, the service remains in the “hold response” state. Ignored when the service is already in the “hold response” state. Mutually exclusive with the `give_response` field.
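
The interplay of `give_response`, `hold_response`, and `enable_auto_hold_response` can be pictured with a toy state model. This is one interpretation of the descriptions, not the service's actual logic: the class `ResponseGate` is hypothetical, responses are plain values, held-response merging is left out, and raising on both flags set is an assumption about how mutual exclusivity might be enforced.

```python
class ResponseGate:
    """Toy model of the "hold response" / "give response" states."""

    def __init__(self, enable_auto_hold_response=False):
        self.auto_hold = enable_auto_hold_response
        # With auto hold, the request starts in the "hold response" state.
        self.holding = enable_auto_hold_response
        self.held = []

    def on_control(self, give_response=False, hold_response=False):
        """Apply a control message; return any responses released by it."""
        if give_response and hold_response:
            raise ValueError("give_response and hold_response are mutually exclusive")
        if hold_response:
            self.holding = True
        if give_response and self.holding:
            self.holding = False
            released, self.held = self.held, []
            return released  # held responses are returned in a batch
        return []

    def on_result(self, response, is_final):
        """Return the responses to send now for a newly produced result."""
        if self.holding:
            self.held.append(response)
            return []
        if is_final and self.auto_hold:
            self.holding = True  # auto hold re-arms after each final result
        return [response]
```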

StreamingRecognizeRequestData¶

A message that carries the data contents of a StreamingRecognize request.

Field	Type	Label	Description
audio	Audio		Part of the audio to perform recognition on.

StreamingRecognizeResponse¶

A message streamed from the service through the StreamingRecognize method.

Field	Type	Label	Description
result	StreamingRecognizeResult		The combined recognition results for another part of the audio.
processed_audio_duration	google.protobuf.Duration		The cumulative duration of the audio processed during the request, which does not necessarily match the actual length of the sent audio. It must be updated with each final result.

StreamingRecognizeResult¶

Combined recognition result.

Field	Type	Description
error	techmo.api.Status	The recognition process status. It may communicate warnings. In case of an error hindering recognition, all other message fields should be left unset.
is_final	bool	The flag indicating whether the result is interim or final.
result_finalization_cause	StreamingRecognizeResult.ResultFinalizationCause	The field indicating the cause of result finalization. For interim results, the service should leave the field as `UNSPECIFIED`. For final results, the service must set the field to a value other than `UNSPECIFIED`.
speech_recognition_result	SpeechRecognitionResult	The speech recognition result for another part of the processed audio. Each final result carries a new result, while each interim result updates the current one. To obtain a complete result for all processed audio, a client should pick one of the recognition alternatives from each final result it receives and buffer it on its own. It must be omitted if speech recognition is disabled.
age_recognition_result	AgeRecognitionResult	The current age recognition result for all processed audio, updated with each final result. It may be omitted in an interim result and must be omitted if age recognition is disabled.
gender_recognition_result	GenderRecognitionResult	The current gender recognition result for all processed audio, updated with each final result. It may be omitted in an interim result and must be omitted if gender recognition is disabled.
language_recognition_result	LanguageRecognitionResult	The current language recognition result for all processed audio, updated with each final result. It may be omitted in an interim result and must be omitted if language recognition is disabled.
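
The client-side buffering described for `speech_recognition_result` can be sketched as follows. Dictionaries stand in for the generated messages, `collect_transcript` is a hypothetical helper, and the sketch assumes the confidence-ordered list puts the most confident alternative first.

```python
def collect_transcript(results):
    """Join the top alternative of every final result into one transcript."""
    parts = []
    for result in results:
        if not result.get("is_final"):
            continue  # interim results are tentative and get superseded
        speech = result.get("speech_recognition_result") or {}
        alternatives = speech.get("recognition_alternatives") or []
        if alternatives:
            # Assumption: the first entry is the most confident hypothesis.
            parts.append(alternatives[0]["transcript"])
    return " ".join(parts)
```

Final results with no alternatives (for example after a no-input timeout) contribute nothing to the joined text.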

AudioConfig.AudioEncoding¶

The possible audio encodings.

Name	Number	Description
UNSPECIFIED	0	Unspecified audio encoding.
LINEAR16	1	Linear pulse-code modulation of uncompressed 16-bit signed little-endian samples.
FLAC	2	Free Lossless Audio Codec (FLAC). The encoding requires only about half the bandwidth of `LINEAR16`. Samples may be 16-bit or 24-bit. Not all fields in `STREAMINFO` are supported. When set, the service ignores the `sampling_rate_hz` field and detects the actual value from the audio header instead.
OGG_OPUS	6	Ogg Encapsulated Opus Audio Codec (OggOpus). When set, the service ignores the `sampling_rate_hz` field and detects the actual value from the audio header instead.
MP3	8	MP3 (ISO/IEC 11172-3 and ISO/IEC 13818-3). Only constant bitrate is supported. When set, the service ignores the `sampling_rate_hz` field and detects the actual value from the audio header instead.
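
For `LINEAR16`, the byte size of an audio chunk follows directly from the format: mono audio with 16-bit samples takes 2 bytes per sample. The helper below is a hypothetical illustration. The 100 ms chunk length in the usage note is an arbitrary example, not a recommendation from the service.

```python
def linear16_chunk_bytes(sampling_rate_hz, chunk_ms):
    """Bytes needed for chunk_ms of mono 16-bit PCM (2 bytes per sample)."""
    samples = int(sampling_rate_hz * chunk_ms / 1000)
    return samples * 2
```

For instance, `linear16_chunk_bytes(16000, 100)` gives 3200 bytes for 100 ms of 16 kHz audio.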

StreamingRecognizeResult.ResultFinalizationCause¶

The anticipated causes for the service to finalize a result.

Name	Number	Description
UNSPECIFIED	0	The cause is not specified.
SUCCESS	1	The speech recognition result is not empty and the end of utterance is detected.
NO_INPUT_TIMEOUT	2	The speech recognition result is empty after the duration to expect a result is reached.
SUCCESS_MAXTIME	3	The speech recognition result is not empty after the utterance duration limit is reached. The returned speech recognition result is incomplete and should be completed in the following result.
PARTIAL_MATCH	4	Unused.
NO_MATCH_MAXTIME	5	The speech recognition result is empty after the utterance duration limit is reached.
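
A client may want to branch on the finalization cause. The sketch below mirrors the enum numbers from the table in a local `IntEnum` instead of the generated class. The helper `describe` and its return shape are hypothetical.

```python
from enum import IntEnum


class ResultFinalizationCause(IntEnum):
    """Local mirror of the documented enum values."""
    UNSPECIFIED = 0
    SUCCESS = 1
    NO_INPUT_TIMEOUT = 2
    SUCCESS_MAXTIME = 3
    PARTIAL_MATCH = 4
    NO_MATCH_MAXTIME = 5


def describe(cause):
    """Return (has_speech, continues) for a final result's cause."""
    cause = ResultFinalizationCause(cause)
    has_speech = cause in (ResultFinalizationCause.SUCCESS,
                           ResultFinalizationCause.SUCCESS_MAXTIME)
    # SUCCESS_MAXTIME means the utterance carries on in the following result.
    continues = cause is ResultFinalizationCause.SUCCESS_MAXTIME
    return has_speech, continues
```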

Asr¶

An automatic speech recognition (ASR) service providing a solution for speech-to-text conversion extended by the assessment of additional speech and speaker features.

Method Name	Request Type	Response Type	Description
StreamingRecognize	StreamingRecognizeRequest stream	StreamingRecognizeResponse stream	Perform bidirectional streaming recognition.

Scalar Value Types¶

.proto Type	Notes	C++	Java	Python	Go	C#	PHP	Ruby
double		double	double	float	float64	double	float	Float
float		float	float	float	float32	float	float	Float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
int64	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	int64	long	int/long	int64	long	integer/string	Bignum
uint32	Uses variable-length encoding.	uint32	int	int/long	uint32	uint	integer	Bignum or Fixnum (as required)
uint64	Uses variable-length encoding.	uint64	long	int/long	uint64	ulong	integer/string	Bignum or Fixnum (as required)
sint32	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sint64	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	int64	long	int/long	int64	long	integer/string	Bignum
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int	uint32	uint	integer	Bignum or Fixnum (as required)
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long	uint64	ulong	integer/string	Bignum
sfixed32	Always four bytes.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sfixed64	Always eight bytes.	int64	long	int/long	int64	long	integer/string	Bignum
bool		bool	boolean	bool	bool	bool	boolean	TrueClass/FalseClass
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	String	str/unicode	string	string	string	String (UTF-8)
bytes	May contain any arbitrary sequence of bytes.	string	ByteString	str	[]byte	ByteString	string	String (ASCII-8BIT)