gRPC API¶

The base API of the service is defined in the proto file. It includes the functions listed below:

service TTS
{
	rpc GetServiceVersion(GetServiceVersionRequest) returns (GetServiceVersionResponse);
	rpc GetResourcesId(GetResourcesIdRequest) returns (GetResourcesIdResponse);

	rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse);
	rpc ListSoundIcons(ListSoundIconsRequest) returns (ListSoundIconsResponse);
	rpc ListRecordings(ListRecordingsRequest) returns (ListRecordingsResponse);
	rpc ListLexicons(ListLexiconsRequest) returns (ListLexiconsResponse);

	rpc SynthesizeStreaming(SynthesizeRequest) returns (stream SynthesizeResponse);
	rpc Synthesize(SynthesizeRequest) returns (SynthesizeResponse);

	rpc GetChannelsUsage(GetChannelsUsageRequest) returns (GetChannelsUsageResponse);

	rpc PutRecording(PutRecordingRequest) returns (PutRecordingResponse);
	rpc DeleteRecording(DeleteRecordingRequest) returns (DeleteRecordingResponse);
	rpc GetRecording(GetRecordingRequest) returns (GetRecordingResponse);

	rpc PutLexicon(PutLexiconRequest) returns (PutLexiconResponse);
	rpc DeleteLexicon(DeleteLexiconRequest) returns (DeleteLexiconResponse);
	rpc GetLexicon(GetLexiconRequest) returns (GetLexiconResponse);
}

Functions Definitions¶

GetServiceVersion¶

rpc GetServiceVersion(GetServiceVersionRequest) returns (GetServiceVersionResponse)

Returns the version of the service, in SemVer format.

GetResourcesId¶

rpc GetResourcesId(GetResourcesIdRequest) returns (GetResourcesIdResponse)

Returns an identifier of the resources used by the service.

ListVoices¶

rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse)

Lists all available voices which can be used to synthesize speech.

ListSoundIcons¶

rpc ListSoundIcons(ListSoundIconsRequest) returns (ListSoundIconsResponse)

Lists all sound icons (their keys) for the requested (voice, variant, language) tuple.

ListRecordings¶

rpc ListRecordings(ListRecordingsRequest) returns (ListRecordingsResponse)

Lists all recordings (their keys) for the requested (voice, variant, language) tuple.

ListLexicons¶

rpc ListLexicons(ListLexiconsRequest) returns (ListLexiconsResponse)

Lists all currently loaded lexicons that can be referenced by the <lexicon> tag in synthesize requests.

SynthesizeStreaming¶

rpc SynthesizeStreaming(SynthesizeRequest) returns (stream SynthesizeResponse)

Synthesizes the speech (audio signal) based on the requested phrase and the optional configuration.
Returns audio signal with synthesized speech (streaming version, one or more response packets).

Synthesize¶

rpc Synthesize(SynthesizeRequest) returns (SynthesizeResponse)

Synthesizes the speech (audio signal) based on the requested phrase and the optional configuration.
Returns audio signal with synthesized speech (non-streaming version, always one response packet).

GetChannelsUsage¶

rpc GetChannelsUsage(GetChannelsUsageRequest) returns (GetChannelsUsageResponse)

Returns the total number of available channels and the number of channels currently in use.

PutRecording¶

rpc PutRecording(PutRecordingRequest) returns (PutRecordingResponse)

Adds a new recording with the requested key for the requested voice, or overwrites the existing one if there is already such a recording defined.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

DeleteRecording¶

rpc DeleteRecording(DeleteRecordingRequest) returns (DeleteRecordingResponse)

Removes the recording with the requested key from the list of recordings of the requested voice.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

GetRecording¶

rpc GetRecording(GetRecordingRequest) returns (GetRecordingResponse)

Sends back the content of the recording with the requested key for the requested voice. The data is returned in linear PCM16 format.

PutLexicon¶

rpc PutLexicon(PutLexiconRequest) returns (PutLexiconResponse)

Adds a new lexicon with the requested name or overwrites the existing one if there is already a lexicon with such name.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

DeleteLexicon¶

rpc DeleteLexicon(DeleteLexiconRequest) returns (DeleteLexiconResponse)

Removes the lexicon with the requested name.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

GetLexicon¶

rpc GetLexicon(GetLexiconRequest) returns (GetLexiconResponse)

Sends back the content of the lexicon with the requested name.

Requests and Responses Definitions¶

GetServiceVersionRequest¶

The request message for GetServiceVersion function. The message is empty.

GetServiceVersionResponse¶

The version info returned by GetServiceVersion function.

Field	Type	Description
version	string	Version of the service, in SemVer format.
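
Because the version is a SemVer string, a client can split it into numeric parts before comparing versions. The following is a minimal sketch; the helper names `parse_version` and `is_compatible` are hypothetical, and treating an equal major version as "compatible" is an assumption based on SemVer conventions, not something this API states.

```python
import re

# Core SemVer shape: MAJOR.MINOR.PATCH with optional pre-release and build metadata.
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


def parse_version(version: str) -> tuple[int, int, int]:
    """Return (major, minor, patch), ignoring pre-release and build suffixes."""
    match = _SEMVER.match(version.strip())
    if match is None:
        raise ValueError(f"not a SemVer string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups()[:3])
    return major, minor, patch


def is_compatible(version: str, required_major: int) -> bool:
    """Hypothetical client-side rule: an equal major version means a compatible API."""
    return parse_version(version)[0] == required_major
```

The `version` field of GetServiceVersionResponse would be passed in as the `version` argument.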

GetResourcesIdRequest¶

The request message for GetResourcesId function. The message is empty.

GetResourcesIdResponse¶

The identifier returned by GetResourcesId function.

Field	Type	Description
id	string	Identifier of the resource pack the service is started with.

The identifier is a free-form string that uniquely identifies a resource pack provided with the service.

ListVoicesRequest¶

The request message for ListVoices function.

Field	Type	Description
language_code	string	ISO 639-1 language code with an optional dialect. Optional. When non-empty, limits the listed voices to the voices supporting the requested language.

ListVoicesResponse¶

The listing of available voices returned by ListVoices function.

Field	Type	Description
sampling_rate_hz	int32	The sampling rate in Hz of all voices (it is identical for all available voices).
voices	VoiceInfo (repeated)	The list of all available voices or voices supporting the requested language.

ListSoundIconsRequest¶

The request message for ListSoundIcons function.

Field	Type	Description
voice_profile	VoiceProfile	Profile of the voice to list the sound icons for.

ListSoundIconsResponse¶

The result of the ListSoundIcons function.

Field	Type	Description
keys	string (repeated)	The list of keys of all available sound icons for the requested voice profile.

ListRecordingsRequest¶

The request message for ListRecordings function.

Field	Type	Description
voice_profile	VoiceProfile	Profile of the voice to list the recordings for.

ListRecordingsResponse¶

The result of the ListRecordings function.

Field	Type	Description
keys	string (repeated)	The list of keys of all available recordings for the requested voice profile.

ListLexiconsRequest¶

The request message for ListLexicons function.

Field	Type	Description
language_code	string	ISO 639-1 language code with an optional dialect. Optional. When non-empty, limits the listed lexicons to the lexicons supporting the requested language.

ListLexiconsResponse¶

The result of the ListLexicons function.

Field	Type	Description
lexicons	LexiconInfo (repeated)	The list of all available lexicons.

SynthesizeRequest¶

The request message for SynthesizeStreaming and Synthesize functions.

Field	Type	Description
text	string	A phrase to be synthesized.
synthesis_config	SynthesisConfig	Optional. Tweaks the default service synthesis configuration.
output_config	OutputConfig	Optional. Overrides the default output audio properties.

The message contains a phrase to be synthesized and optional configurations.
The phrase to synthesize is either plain text in orthographic form or a subset of SSML. Consult the service documentation for the full list of supported SSML tags.
The fields of synthesis_config specify the synthesis parameters (language, voice, prosodic properties, etc.), and output_config alters the format of the output (sampling rate and encoding, either PCM16 or a compressed format such as Ogg/Vorbis).

SynthesizeResponse¶

The result of the SynthesizeStreaming and Synthesize functions.

Field	Type	Description
sampling_rate_hz	int32	Sampling rate of the returned audio in hertz.
audio	bytes	Audio data bytes either as Linear PCM (uncompressed 16-bit signed little-endian samples), or encoded if requested by output_config.
warnings	string (repeated)	All the warnings generated by the service during processing of the request.

During SynthesizeStreaming, a series of one or more such messages is streamed back to the caller.
Synthesize, by contrast, returns exactly one response message.
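
A client usually joins the streamed packets into a single audio buffer. The sketch below works on any iterable of objects exposing the three fields listed above; the `SimpleNamespace` packets are stand-ins for real responses, and the assumption that every packet of one stream carries the same sampling rate is mine, not a statement of this API.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_audio(responses: Iterable) -> tuple[int, bytes, list[str]]:
    """Join SynthesizeResponse-like messages into (rate, audio, warnings)."""
    sampling_rate_hz = 0
    chunks = []
    warnings = []
    for response in responses:
        # Assumption: all packets of one stream report the same sampling rate.
        sampling_rate_hz = response.sampling_rate_hz or sampling_rate_hz
        chunks.append(response.audio)
        warnings.extend(response.warnings)
    return sampling_rate_hz, b"".join(chunks), warnings


# Stand-in packets; a real client would pass the iterator returned by the stub.
fake_stream = [
    SimpleNamespace(sampling_rate_hz=22050, audio=b"\x01\x00", warnings=[]),
    SimpleNamespace(sampling_rate_hz=22050, audio=b"\x02\x00", warnings=["example warning"]),
]
rate, audio, warnings = collect_audio(fake_stream)
print(rate, len(audio), warnings)  # prints: 22050 4 ['example warning']
```

The same helper covers the non-streaming case when the single Synthesize response is wrapped in a one-element list.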

GetChannelsUsageRequest¶

The request message for GetChannelsUsage function. The message is empty.

GetChannelsUsageResponse¶

The result of the GetChannelsUsage function.

Field	Type	Description
total_channels_count	int32	The number of all available channels for the service, set by the licence. INT_MAX means unrestricted access.
used_channels_count	int32	The number of channels currently in use.
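
The two counters can be turned into a "free channels" figure on the client side. This is a small sketch; the helper name `free_channels` is hypothetical, and it assumes that INT_MAX is the maximum of the int32 field type (2147483647).

```python
from typing import Optional

INT32_MAX = 2**31 - 1  # assumed value of INT_MAX for the int32 field


def free_channels(total_channels_count: int, used_channels_count: int) -> Optional[int]:
    """Return the number of free channels, or None when access is unrestricted."""
    if total_channels_count == INT32_MAX:
        return None
    # Clamp at zero in case the counters are read while usage changes.
    return max(total_channels_count - used_channels_count, 0)
```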

PutRecordingRequest¶

The request message for PutRecording function.

Field	Type	Description
voice_profile	VoiceProfile	Profile of the voice to add the recording to.
recording_key	string	The key of the new recording.
sampling_rate_hz	int32	Sampling rate of the recording audio data in Hertz.
content	bytes	The recording audio data, in linear PCM16 format.

If there already exists a recording with such key for the requested voice profile, the existing recording content is overwritten.
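
The content field expects raw PCM16 bytes. The sketch below prepares such a buffer from floating-point samples. It assumes the same layout that SynthesizeResponse describes for Linear PCM (16-bit signed little-endian samples, no header) and a single audio channel; neither is confirmed here for PutRecording. The helper name `floats_to_pcm16` is hypothetical.

```python
import math
import struct


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to headerless 16-bit signed little-endian PCM."""
    # Scale to the int16 range and clamp anything that falls outside it.
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


sampling_rate_hz = 8000
# 0.1 s of a 440 Hz tone as placeholder recording content.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sampling_rate_hz) for n in range(800)]
content = floats_to_pcm16(tone)
```

Here `sampling_rate_hz` and `content` correspond to the request fields of the same names.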

PutRecordingResponse¶

The result of the PutRecording function. The message is empty; the response is used only to verify the returned gRPC status.

DeleteRecordingRequest¶

The request message for DeleteRecording function.

Field	Type	Description
voice_profile	VoiceProfile	Profile of the voice in which to look for the recording.
recording_key	string	The requested key of the recording (unique for any given voice profile).

DeleteRecordingResponse¶

The result of the DeleteRecording function. The message is empty; the response is used only to verify the returned gRPC status.

GetRecordingRequest¶

The request message for GetRecording function.

Field	Type	Description
voice_profile	VoiceProfile	Profile of the voice in which to look for the recording.
recording_key	string	The requested key of the recording (unique for any given voice profile).

GetRecordingResponse¶

The result of the GetRecording function.

Field	Type	Description
sampling_rate_hz	int32	Sampling rate of the recording audio data in Hertz.
content	bytes	The recording audio data, in linear PCM16 format.
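
Since the returned data has no header, it has to be wrapped in a container before most players can open it. The following sketch uses Python's standard `wave` module; the helper name `pcm16_to_wav` is hypothetical, and the single-channel default is an assumption, as the number of channels is not stated here.

```python
import io
import wave


def pcm16_to_wav(content: bytes, sampling_rate_hz: int, channels: int = 1) -> bytes:
    """Wrap headerless PCM16 data in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)          # assumption: mono audio
        wav.setsampwidth(2)                 # 16-bit samples are 2 bytes each
        wav.setframerate(sampling_rate_hz)
        wav.writeframes(content)
    return buffer.getvalue()
```

The result can be written to disk with `open(path, "wb").write(...)` using any file name of the caller's choice.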

PutLexiconRequest¶

The request message for PutLexicon function.

Field	Type	Description
uri	string	URI of the lexicon, used as the uri attribute of <lexicon> tags in synthesize requests.
outside_lookup_behaviour	OutsideLookupBehaviour	Whether the lexicon can be selected for phrases outside of <lookup> SSML tags.
content	string	The content of the lexicon; it must comply with PLS.

The service supports only a subset of PLS. Consult the service documentation for the full list of supported PLS tags.
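
As an illustration of what the content field may carry, the sketch below builds a small PLS 1.0 document with Python's standard XML library. The element names follow the W3C PLS specification, but whether this service accepts `<alias>` entries or the `ipa` alphabet is an assumption; the helper name `build_pls` and the sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(entries: dict[str, str], lang: str = "en-US") -> str:
    """Build a PLS document that maps each grapheme to an alias replacement."""
    # Serialize the PLS namespace as the default namespace (no prefix).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, alias in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        # Assumption: <alias> is within the subset of PLS the service supports.
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = alias
    return ET.tostring(lexicon, encoding="unicode")


content = build_pls({"ASAP": "as soon as possible"})
```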

PutLexiconResponse¶

The result of the PutLexicon function. The message is empty; the response is used only to verify the returned gRPC status.

DeleteLexiconRequest¶

The request message for DeleteLexicon function.

Field	Type	Description
uri	string	URI of the lexicon to delete.

DeleteLexiconResponse¶

The result of the DeleteLexicon function. The message is empty; the response is used only to verify the returned gRPC status.

GetLexiconRequest¶

The request message for GetLexicon function.

Field	Type	Description
uri	string	URI of the lexicon whose content is to be returned.

GetLexiconResponse¶

The result of the GetLexicon function.

Field	Type	Description
outside_lookup_behaviour	OutsideLookupBehaviour	Whether the lexicon can be selected for phrases outside of <lookup> SSML tags.
content	string	If successful, contains the content of the lexicon, in PLS format.

VoiceProfile¶

Provides information about a voice, its variant, and a language code, used as a selector for a set of sound icons and predefined recordings.

Field	Type	Description
voice_name	string	The name of the voice whose sound icons or recordings are looked up.
voice_variant	int32	The variant of the voice whose sound icons or recordings are looked up.
language_code	string	ISO 639-1 language code, with an optional dialect, for which sound icons or recordings are looked up.

SynthesisConfig¶

Provides information to the synthesizer that specifies how to process the request.

Field	Type	Description
language_code	string	ISO 639-1 language code, with an optional dialect, of the text to be synthesized. May be overridden by SSML tags in request text.
voice	Voice	Requested voice to be used to synthesize the text. May be overridden by SSML tags in request text.
prosodic_properties	ProsodicProperties	Optional. Defines the parameters of synthesized speech.
silence_duration_between_segments_ms	int32	Optional. Overrides the configured value for duration of silence between segments, in milliseconds.

If there is no voice satisfying all the required criteria defined by the voice field, the voice is selected according to name (if defined) first, gender (if defined) second, and age (if defined) third.
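
One possible reading of this fallback rule is a priority ranking: a name match outweighs a gender match, which in turn outweighs an age match, and criteria that are not defined are ignored. The sketch below implements that reading only; the service's actual selection logic is not specified here, and `VoiceEntry` and `select_voice` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceEntry:
    name: str
    gender: str
    age: str


def select_voice(voices, name: str = "", gender: Optional[str] = None,
                 age: Optional[str] = None) -> VoiceEntry:
    """Pick the best match, ranking name over gender over age."""
    def rank(voice: VoiceEntry):
        # Tuples compare element by element, so earlier criteria dominate.
        return (
            bool(name) and voice.name == name,
            gender is not None and voice.gender == gender,
            age is not None and voice.age == age,
        )
    # max() returns the first voice among those with the highest rank.
    return max(voices, key=rank)


voices = [
    VoiceEntry("anna", "FEMALE", "ADULT"),
    VoiceEntry("tom", "MALE", "ADULT"),
    VoiceEntry("kid", "MALE", "CHILD"),
]
```

With these sample voices, a request for name "tom" and gender FEMALE yields "tom", because the name criterion takes precedence.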

OutputConfig¶

Defines the parameters of the output audio.

Field	Type	Description
audio_encoding	AudioEncoding	Requested format of the output audio stream.
sampling_rate_hz	int32	Desired sampling frequency of the synthesized audio, in hertz. A value of 0 selects the synthesizer's default sampling rate.
max_frame_size	int32	Maximum frame size sent at once to the client to enable RTF Throttling (default=0, throttling disabled).

When RTF Throttling is enabled, the RTF (Real Time Factor) is throttled to 1.0, with one frame (of max_frame_size samples) sent in advance. The frame size is expressed in samples regardless of the audio_encoding used (a frame size expressed in bytes would likely be far smaller if the output is not PCM16). Enabling RTF Throttling guarantees that, when the connection is interrupted, the respective channel is freed within a time no longer than the playback time of one frame.

RTF Throttling is effective only for TTS::SynthesizeStreaming calls. It is silently ignored for TTS::Synthesize calls.
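
Because max_frame_size is counted in samples, the playback time of one frame follows directly from the sampling rate. The sketch below shows the arithmetic in both directions; the helper names are hypothetical, and the byte-size helper assumes single-channel PCM16 output.

```python
def frame_playback_seconds(max_frame_size: int, sampling_rate_hz: int) -> float:
    """Playback time of one frame; max_frame_size is counted in samples."""
    return max_frame_size / sampling_rate_hz


def frame_size_for_latency(seconds: float, sampling_rate_hz: int) -> int:
    """Frame size in samples whose playback time is about `seconds`."""
    return int(seconds * sampling_rate_hz)


def pcm16_frame_bytes(max_frame_size: int) -> int:
    """Size of one frame in bytes for mono PCM16 output (2 bytes per sample)."""
    return max_frame_size * 2
```

For example, a frame of 8000 samples at 16000 Hz plays for 0.5 s, so an interrupted connection would hold its channel for at most about that long.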

Voice¶

Voice definition used to describe requested voice in SynthesisConfig.

Field	Type	Description
name	string	The name of the voice. If empty, it is not taken into account in voice selection.
gender	Gender (optional)	Gender of the voice. If not set, it is not taken into account in voice selection.
age	Age (optional)	Age of the voice. If not set, it is not taken into account in voice selection.
variant	int32	Variant of the voice.

ProsodicProperties¶

Prosodic properties of the speech to be synthesized.

Field	Type	Description
pitch	float	The average speech pitch scaling factor. The value 1.0 is a neutral value.
range	float	The pitch range scaling factor. The value 1.0 is a neutral value.
rate	float	The speech rate (speed) scaling factor. The value 1.0 is a neutral value.
stress	float	The speech stress scaling factor. The value 1.0 is a neutral value.
volume	float	The speech volume scaling factor. The value 1.0 is a neutral value.

VoiceInfo¶

Information about a voice, returned by ListVoices function.

Field	Type	Description
supported_languages	string (repeated)	The list of ISO 639-1 codes of languages supported by the voice.
name	string	The name of the voice.
gender	Gender	Gender of the voice.
age	Age	Age of the voice.
variants_count	int32	The number of variants of the voice (at least one).

LexiconInfo¶

The URI of a lexicon and its behaviour outside of <lookup> tags, as returned by the ListLexicons function.

Field	Type	Description
uri	string	URI of the lexicon, used as the uri attribute of <lexicon> tags in synthesize requests.
outside_lookup_behaviour	OutsideLookupBehaviour	Whether the lexicon can be selected for phrases outside of <lookup> SSML tags.

Enumerations¶

Gender¶

Enum type, indicates the gender of the voice.

Name	Number
FEMALE	0
MALE	1

Age¶

Enum type, indicates the age of the voice.

Name	Number	Description
ADULT	0	Selected for SSML age attribute in the range (16, 60]. Default.
CHILD	1	Selected for SSML age attribute in the range [0, 16].
SENILE	2	Selected for SSML age attribute in the range (60, inf).
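
The ranges in the table can be expressed as a simple mapping from an SSML age value to the enum. This is a sketch of the boundaries as documented above; the `Age` class mirrors the enum numbers, and `age_from_ssml` is a hypothetical helper, not part of the API.

```python
from enum import IntEnum


class Age(IntEnum):
    ADULT = 0
    CHILD = 1
    SENILE = 2


def age_from_ssml(age_years: float) -> Age:
    """Map an SSML age attribute value to the Age enum using the documented ranges."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 16:   # [0, 16]
        return Age.CHILD
    if age_years <= 60:   # (16, 60]
        return Age.ADULT
    return Age.SENILE     # (60, inf)
```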

AudioEncoding¶

Enum type, indicates the requested format of the response audio data.

Name	Number	Description
PCM16	0	Uncompressed 16-bit signed integer samples, without any header.
OGG_VORBIS	1	Ogg/Vorbis encoded data stream.
OGG_OPUS	2	Ogg/Opus encoded data stream.
A_LAW	3	ITU-T G.711 A-law encoded stream.
MU_LAW	4	ITU-T G.711 mu-law encoded stream.

Note:
When using Ogg/Opus encoding, only 8kHz, 12kHz, 16kHz, 24kHz, and 48kHz sampling rates are allowed.
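
A client can check this restriction before sending a request. The sketch below is a hypothetical pre-flight check; it identifies the encoding by its enum name and leaves a sampling rate of 0 (the service default) for the service to validate, which is an assumption.

```python
OPUS_SAMPLING_RATES_HZ = frozenset({8000, 12000, 16000, 24000, 48000})


def check_output_config(audio_encoding: str, sampling_rate_hz: int) -> None:
    """Raise ValueError for an Ogg/Opus request with a disallowed sampling rate."""
    if audio_encoding != "OGG_OPUS":
        return
    # Assumption: 0 means "service default" and is not validated on the client.
    if sampling_rate_hz != 0 and sampling_rate_hz not in OPUS_SAMPLING_RATES_HZ:
        allowed = ", ".join(str(rate) for rate in sorted(OPUS_SAMPLING_RATES_HZ))
        raise ValueError(
            f"Ogg/Opus does not allow {sampling_rate_hz} Hz; allowed rates: {allowed}"
        )
```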

OutsideLookupBehaviour¶

Enum type, indicates whether the lexicon may be matched even for phrases outside of <lookup> SSML tags.

Name	Number
ALLOWED	0
DISALLOWED	1