gRPC API¶
The base API of the service is defined in the proto file. It includes the functions listed below:
service TTS
{
rpc GetServiceVersion(GetServiceVersionRequest) returns (GetServiceVersionResponse);
rpc GetResourcesId(GetResourcesIdRequest) returns (GetResourcesIdResponse);
rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse);
rpc ListSoundIcons(ListSoundIconsRequest) returns (ListSoundIconsResponse);
rpc ListRecordings(ListRecordingsRequest) returns (ListRecordingsResponse);
rpc ListLexicons(ListLexiconsRequest) returns (ListLexiconsResponse);
rpc SynthesizeStreaming(SynthesizeRequest) returns (stream SynthesizeResponse);
rpc Synthesize(SynthesizeRequest) returns (SynthesizeResponse);
rpc GetChannelsUsage(GetChannelsUsageRequest) returns (GetChannelsUsageResponse);
rpc PutRecording(PutRecordingRequest) returns (PutRecordingResponse);
rpc DeleteRecording(DeleteRecordingRequest) returns (DeleteRecordingResponse);
rpc GetRecording(GetRecordingRequest) returns (GetRecordingResponse);
rpc PutLexicon(PutLexiconRequest) returns (PutLexiconResponse);
rpc DeleteLexicon(DeleteLexiconRequest) returns (DeleteLexiconResponse);
rpc GetLexicon(GetLexiconRequest) returns (GetLexiconResponse);
}
Functions Definitions¶
GetServiceVersion¶
rpc GetServiceVersion(GetServiceVersionRequest) returns (GetServiceVersionResponse)
Returns the version of the service, in SemVer format.
GetResourcesId¶
rpc GetResourcesId(GetResourcesIdRequest) returns (GetResourcesIdResponse)
Returns an identifier of the resources used by the service.
ListVoices¶
rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse)
Lists all available voices which can be used to synthesize speech.
ListSoundIcons¶
rpc ListSoundIcons(ListSoundIconsRequest) returns (ListSoundIconsResponse)
Lists all sound icons (their keys) for the requested (voice, variant, language) tuple.
ListRecordings¶
rpc ListRecordings(ListRecordingsRequest) returns (ListRecordingsResponse)
Lists all recordings (their keys) for the requested (voice, variant, language) tuple.
ListLexicons¶
rpc ListLexicons(ListLexiconsRequest) returns (ListLexiconsResponse)
Lists all currently loaded lexicons that can be referred to by the <lexicon> tag in synthesize requests.
SynthesizeStreaming¶
rpc SynthesizeStreaming(SynthesizeRequest) returns (stream SynthesizeResponse)
Synthesizes the speech (audio signal) based on the requested phrase and the optional configuration.
Returns audio signal with synthesized speech (streaming version, one or more response packets).
Synthesize¶
rpc Synthesize(SynthesizeRequest) returns (SynthesizeResponse)
Synthesizes the speech (audio signal) based on the requested phrase and the optional configuration.
Returns audio signal with synthesized speech (non-streaming version, always one response packet).
GetChannelsUsage¶
rpc GetChannelsUsage(GetChannelsUsageRequest) returns (GetChannelsUsageResponse)
Returns the total number of available channels and the number of channels currently in use.
PutRecording¶
rpc PutRecording(PutRecordingRequest) returns (PutRecordingResponse)
Adds a new recording with the requested key for the requested voice, or overwrites the existing one if there is already such a recording defined.
Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.
DeleteRecording¶
rpc DeleteRecording(DeleteRecordingRequest) returns (DeleteRecordingResponse)
Removes the recording with the requested key from the list of recordings of the requested voice.
Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.
GetRecording¶
rpc GetRecording(GetRecordingRequest) returns (GetRecordingResponse)
Sends back the content of the recording with the requested key for the requested voice. The data is returned in the linear PCM16 format.
PutLexicon¶
rpc PutLexicon(PutLexiconRequest) returns (PutLexiconResponse)
Adds a new lexicon with the requested name or overwrites the existing one if there is already a lexicon with such name.
Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.
DeleteLexicon¶
rpc DeleteLexicon(DeleteLexiconRequest) returns (DeleteLexiconResponse)
Removes the lexicon with the requested name.
Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.
GetLexicon¶
rpc GetLexicon(GetLexiconRequest) returns (GetLexiconResponse)
Sends back the content of the lexicon with the requested name.
Requests and Responses Definitions¶
GetServiceVersionRequest¶
The request message for GetServiceVersion function. The message is empty.
GetServiceVersionResponse¶
The version info returned by GetServiceVersion function.
Field | Type | Description |
---|---|---|
version | string | Version of the service, in SemVer format. |
GetResourcesIdRequest¶
The request message for GetResourcesId function. The message is empty.
GetResourcesIdResponse¶
The identifier returned by GetResourcesId function.
Field | Type | Description |
---|---|---|
id | string | Identifier of the resource pack the service is started with. |
The identifier is a free-form string that uniquely identifies a resource pack provided with the service.
ListVoicesRequest¶
The request message for ListVoices function.
Field | Type | Description |
---|---|---|
language_code | string | ISO 639-1 language code with an optional dialect. |
ListVoicesResponse¶
The listing of available voices returned by ListVoices function.
Field | Type | Description |
---|---|---|
sampling_rate_hz | int32 | The sampling rate in Hz of all voices (it is identical for all available voices). |
voices | VoiceInfo (repeated) | The list of all available voices or voices supporting the requested language. |
ListSoundIconsRequest¶
The request message for ListSoundIcons function.
Field | Type | Description |
---|---|---|
voice_profile | VoiceProfile | Profile of the voice to list the sound icons for. |
ListSoundIconsResponse¶
The result of the ListSoundIcons function.
Field | Type | Description |
---|---|---|
keys | string (repeated) | The list of keys of all available sound icons for the requested voice profile. |
ListRecordingsRequest¶
The request message for ListRecordings function.
Field | Type | Description |
---|---|---|
voice_profile | VoiceProfile | Profile of the voice to list the recordings for. |
ListRecordingsResponse¶
The result of the ListRecordings function.
Field | Type | Description |
---|---|---|
keys | string (repeated) | The list of keys of all available recordings for the requested voice profile. |
ListLexiconsRequest¶
The request message for ListLexicons function.
Field | Type | Description |
---|---|---|
language_code | string | ISO 639-1 language code with an optional dialect. |
ListLexiconsResponse¶
The result of the ListLexicons function.
Field | Type | Description |
---|---|---|
lexicons | LexiconInfo (repeated) | The list of all available lexicons. |
SynthesizeRequest¶
The request message for SynthesizeStreaming and Synthesize functions.
Field | Type | Description |
---|---|---|
text | string | A phrase to be synthesized. |
synthesis_config | SynthesisConfig | Optional. Tweaks the default service synthesis configuration. |
output_config | OutputConfig | Optional. Overrides the default output audio properties. |
The message contains a phrase to be synthesized and optional configurations.
The phrase to synthesize is either plain text in orthographic form or a subset of SSML.
Consult the service documentation for the full list of supported SSML tags.
The fields of synthesis_config specify the parameters of synthesis (language, voice, prosodic properties, etc.),
and output_config alters the format of the output (sampling rate and encoding, such as PCM16 or Ogg/Vorbis).
SynthesizeResponse¶
The result of the SynthesizeStreaming and Synthesize functions.
Field | Type | Description |
---|---|---|
sampling_rate_hz | int32 | Sampling rate of the returned audio in hertz. |
audio | bytes | Audio data bytes, either as Linear PCM (uncompressed 16-bit signed little-endian samples) or in the encoding requested in output_config. |
warnings | string (repeated) | All the warnings generated by the service during processing of the request. |
During SynthesizeStreaming, a series of one or more such messages is streamed back to the caller.
Synthesize, by contrast, returns exactly one response message.
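A client of the streaming call typically has to reassemble the audio from the individual packets. The following is a minimal sketch of that step, not the service's own client code. The `SynthesizeResponse` dataclass is a hypothetical stand-in that only mirrors the three fields from the table above, and `collect_stream` is an illustrative helper name. It assumes that consecutive packets carry consecutive parts of the audio and that all packets report the same sampling rate.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class SynthesizeResponse:
    """Hypothetical stand-in mirroring the fields of the real message."""
    sampling_rate_hz: int = 0
    audio: bytes = b""
    warnings: List[str] = field(default_factory=list)


def collect_stream(
    responses: Iterable[SynthesizeResponse],
) -> Tuple[int, bytes, List[str]]:
    """Concatenate audio chunks and gather warnings from a response stream."""
    sampling_rate_hz = 0
    chunks: List[bytes] = []
    warnings: List[str] = []
    for response in responses:
        # Assumption: the rate is identical in every packet,
        # so the first non-zero value is kept.
        if sampling_rate_hz == 0:
            sampling_rate_hz = response.sampling_rate_hz
        chunks.append(response.audio)
        warnings.extend(response.warnings)
    return sampling_rate_hz, b"".join(chunks), warnings
```

With a real gRPC stub, the iterable returned by the streaming call could be passed to such a helper directly. A Synthesize result can be handled the same way by wrapping the single response in a one-element list.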
GetChannelsUsageRequest¶
The request message for GetChannelsUsage function. The message is empty.
GetChannelsUsageResponse¶
The result of the GetChannelsUsage function.
Field | Type | Description |
---|---|---|
total_channels_count | int32 | The number of all available channels for the service, set by the licence. |
used_channels_count | int32 | The number of channels currently in use. |
PutRecordingRequest¶
The request message for PutRecording function.
Field | Type | Description |
---|---|---|
voice_profile | VoiceProfile | Profile of the voice to put the recording for. |
recording_key | string | The key of the new recording. |
sampling_rate_hz | int32 | Sampling rate of the recording audio data in Hertz. |
content | bytes | The recording audio data, in linear PCM16 format. |
If a recording with this key already exists for the requested voice profile, its content is overwritten.
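The content field expects raw samples rather than an audio file. As an illustration, the sketch below renders a short sine tone and packs it into headerless 16-bit signed samples. It assumes that "linear PCM16" here means little-endian, mono samples, as described for synthesized audio. The function name `sine_pcm16` and the tone parameters are made up for this example.

```python
import math
import struct


def sine_pcm16(frequency_hz: float, duration_s: float, sampling_rate_hz: int) -> bytes:
    """Render a sine tone as headerless 16-bit signed little-endian samples."""
    sample_count = round(duration_s * sampling_rate_hz)
    amplitude = 0.5 * 32767  # half of full scale, to leave some headroom
    samples = [
        int(amplitude * math.sin(2 * math.pi * frequency_hz * n / sampling_rate_hz))
        for n in range(sample_count)
    ]
    # "<h" is a little-endian signed 16-bit integer; one is packed per sample.
    return struct.pack(f"<{sample_count}h", *samples)


# Half a second of a 440 Hz tone at 8 kHz: 4000 samples, 2 bytes each.
content = sine_pcm16(440.0, 0.5, 8000)
```

The resulting bytes, together with the matching sampling_rate_hz value, are what a PutRecordingRequest would carry.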
PutRecordingResponse¶
The result of the PutRecording function. The message is empty; the response is used to verify the returned gRPC status.
DeleteRecordingRequest¶
The request message for DeleteRecording function.
Field | Type | Description |
---|---|---|
voice_profile | VoiceProfile | Profile of the voice to look for the recording. |
recording_key | string | The requested key of the recording (unique for any given voice profile). |
DeleteRecordingResponse¶
The result of the DeleteRecording function. The message is empty; the response is used to verify the returned gRPC status.
GetRecordingRequest¶
The request message for GetRecording function.
Field | Type | Description |
---|---|---|
voice_profile | VoiceProfile | Profile of the voice to look for the recording. |
recording_key | string | The requested key of the recording (unique for any given voice profile). |
GetRecordingResponse¶
The result of the GetRecording function.
Field | Type | Description |
---|---|---|
sampling_rate_hz | int32 | Sampling rate of the recording audio data in Hertz. |
content | bytes | The recording audio data, in linear PCM16 format. |
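Because the returned content carries no header, most players cannot open it directly. One way to make it playable is to wrap it in a WAV container, sketched below with Python's standard `wave` module. The helper name `pcm16_to_wav` is illustrative, and the single-channel (mono) layout is an assumption, since the channel count is not stated in this reference.

```python
import io
import wave


def pcm16_to_wav(content: bytes, sampling_rate_hz: int) -> bytes:
    """Wrap headerless PCM16 data in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)  # assumption: mono audio
        wav_file.setsampwidth(2)  # 16-bit samples, 2 bytes each
        wav_file.setframerate(sampling_rate_hz)
        wav_file.writeframes(content)
    return buffer.getvalue()
```

The content and sampling_rate_hz fields of a GetRecordingResponse map directly onto the two arguments.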
PutLexiconRequest¶
The request message for PutLexicon function.
Field | Type | Description |
---|---|---|
uri | string | URI of the lexicon, used as the uri attribute of <lexicon> tags in synthesize requests. |
outside_lookup_behaviour | OutsideLookupBehaviour | Whether the lexicon can be selected for phrases outside of <lookup> SSML tags. |
content | string | The content of the lexicon, which must comply with PLS. |
The service supports only a subset of PLS. Consult the service documentation for the full list of supported PLS tags.
PutLexiconResponse¶
The result of the PutLexicon function. The message is empty; the response is used to verify the returned gRPC status.
DeleteLexiconRequest¶
The request message for DeleteLexicon function.
Field | Type | Description |
---|---|---|
uri | string | URI of the lexicon to delete. |
DeleteLexiconResponse¶
The result of the DeleteLexicon function. The message is empty; the response is used to verify the returned gRPC status.
GetLexiconRequest¶
The request message for GetLexicon function.
Field | Type | Description |
---|---|---|
uri | string | URI of the lexicon whose content is requested. |
GetLexiconResponse¶
The result of the GetLexicon function.
Field | Type | Description |
---|---|---|
outside_lookup_behaviour | OutsideLookupBehaviour | Whether the lexicon can be selected for phrases outside of <lookup> SSML tags. |
content | string | If successful, contains the content of the lexicon, in PLS format. |
VoiceProfile¶
Provides information about a voice, its variant, and a language code, which together act as a selector for a set of sound icons and predefined recordings.
Field | Type | Description |
---|---|---|
voice_name | string | The voice name to look for the recording. |
voice_variant | int32 | The variant of the voice to look for the recording. |
language_code | string | ISO 639-1 language code with an optional dialect to look for the recording. |
SynthesisConfig¶
Provides information to the synthesizer that specifies how to process the request.
Field | Type | Description |
---|---|---|
language_code | string | ISO 639-1 language code with an optional dialect of text to be synthesized. |
voice | Voice | Requested voice to be used to synthesize the text. |
prosodic_properties | ProsodicProperties | Optional. Defines the parameters of synthesized speech. |
silence_duration_between_segments_ms | int32 | Optional. Overrides the configured value for duration of silence between segments, in milliseconds. |
If there is no voice satisfying all the required criteria defined by the voice field, the voice is selected according to name (if defined) first, gender (if defined) second, and age (if defined) third.
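The fallback rule above can be read as a priority ordering over the three criteria. The sketch below shows one possible interpretation, not the service's actual algorithm: each candidate is ranked by whether it matches the requested name, then gender, then age, and the best-ranked voice wins. The `VoiceInfo` dataclass is a simplified stand-in, `select_voice` is a hypothetical helper, and the voice names are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceInfo:
    """Simplified stand-in; gender and age are plain strings here."""
    name: str
    gender: str
    age: str


def select_voice(
    voices: List[VoiceInfo],
    name: str = "",
    gender: Optional[str] = None,
    age: Optional[str] = None,
) -> Optional[VoiceInfo]:
    """Pick the voice that best matches, with name > gender > age priority."""
    if not voices:
        return None

    def priority(voice: VoiceInfo):
        # Tuples compare element by element, so a name match always
        # outranks a gender match, which outranks an age match.
        # Criteria that are not defined never count as a match.
        return (
            bool(name) and voice.name == name,
            gender is not None and voice.gender == gender,
            age is not None and voice.age == age,
        )

    # max() returns the first best candidate when several rank equally.
    return max(voices, key=priority)
```

For example, a request for name "tom" and gender FEMALE would, under this reading, return the voice named "tom" even if it is male, because the name criterion is considered first.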
OutputConfig¶
Defines the parameters of the output audio.
Field | Type | Description |
---|---|---|
audio_encoding | AudioEncoding | Requested format of the output audio stream. |
sampling_rate_hz | int32 | Desired sampling frequency in Hertz of synthesized audio. The value 0 means use the default Synthesizer sampling rate. |
max_frame_size | int32 | Maximum frame size sent at once to the client to enable RTF Throttling (default=0, throttling disabled). |
When RTF Throttling is enabled, the RTF (Real Time Factor) is throttled to 1.0, with one frame (of max_frame_size samples) sent in advance. The frame size is expressed in samples, regardless of the audio_encoding used (a frame size expressed in bytes would likely be far smaller if the output is not PCM16). Enabling RTF Throttling guarantees that, when the connection is interrupted, the respective channel is freed after no longer than the playback time of one frame.
RTF Throttling is effective only for TTS::SynthesizeStreaming calls. It is silently ignored for TTS::Synthesize calls.
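Since max_frame_size is given in samples, its effect in time depends on the sampling rate. The arithmetic can be sketched as follows. The helper names are illustrative, and the byte-size helper applies to PCM16 output only, where each sample takes two bytes.

```python
def frame_playback_seconds(max_frame_size: int, sampling_rate_hz: int) -> float:
    """Playback time of one frame, which bounds how long a channel stays busy."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling_rate_hz must be positive")
    return max_frame_size / sampling_rate_hz


def samples_for_duration(duration_ms: int, sampling_rate_hz: int) -> int:
    """max_frame_size value that corresponds to a desired frame duration."""
    return sampling_rate_hz * duration_ms // 1000


def pcm16_frame_bytes(max_frame_size: int) -> int:
    """Size of one PCM16 frame in bytes (2 bytes per sample)."""
    return max_frame_size * 2


# A frame of 8000 samples at 16 kHz plays for half a second.
seconds = frame_playback_seconds(8000, 16000)
```

For instance, to have an interrupted connection release its channel within roughly 200 ms at 16 kHz, samples_for_duration(200, 16000) gives a max_frame_size of 3200 samples.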
Voice¶
Voice definition used to describe requested voice in SynthesisConfig.
Field | Type | Description |
---|---|---|
name | string | The name of the voice. If empty, it is not taken into account in voice selection. |
gender | Gender (optional) | Gender of the voice. If not set, it is not taken into account in voice selection. |
age | Age (optional) | Age of the voice. If not set, it is not taken into account in voice selection. |
variant | int32 | Variant of the voice. |
ProsodicProperties¶
Prosodic properties of the speech to be synthesized.
Field | Type | Description |
---|---|---|
pitch | float | The average speech pitch scaling factor. The value 1.0 is a neutral value. |
range | float | The pitch range scaling factor. The value 1.0 is a neutral value. |
rate | float | The speech rate (speed) scaling factor. The value 1.0 is a neutral value. |
stress | float | The speech stress scaling factor. The value 1.0 is a neutral value. |
volume | float | The speech volume scaling factor. The value 1.0 is a neutral value. |
VoiceInfo¶
Information about a voice, returned by ListVoices function.
Field | Type | Description |
---|---|---|
supported_languages | string (repeated) | The list of ISO 639-1 codes of languages supported by the voice. |
name | string | The name of the voice. |
gender | Gender | Gender of the voice. |
age | Age | Age of the voice. |
variants_count | int32 | The number of variants of the voice (at least one). |
LexiconInfo¶
Lexicon URI and behaviour outside lookup tags returned by ListLexicons function.
Field | Type | Description |
---|---|---|
uri | string | URI of the lexicon, used as the uri attribute of <lexicon> tags in synthesize requests. |
outside_lookup_behaviour | OutsideLookupBehaviour | Whether the lexicon can be selected for phrases outside of <lookup> SSML tags. |
Enumerations¶
Gender¶
Enum type, indicates the gender of the voice.
Name | Number |
---|---|
FEMALE | 0 |
MALE | 1 |
Age¶
Enum type, indicates the age of the voice.
Name | Number | Description |
---|---|---|
ADULT | 0 | Selected for SSML age attribute in range (16 - 60]. Default. |
CHILD | 1 | Selected for SSML age attribute in range [0 - 16]. |
SENILE | 2 | Selected for SSML age attribute in range (60 - inf). |
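The ranges in the table translate into a simple threshold check. The sketch below mirrors the enum numbers from the table in a local `Age` enum and maps an SSML age value onto it. The function name `age_from_ssml` is illustrative, and rejecting negative ages is an assumption of this example.

```python
from enum import IntEnum


class Age(IntEnum):
    """Local mirror of the Age enum values listed above."""
    ADULT = 0
    CHILD = 1
    SENILE = 2


def age_from_ssml(age: float) -> Age:
    """Map the SSML age attribute to an Age value using the documented ranges."""
    if age < 0:
        raise ValueError("age must not be negative")
    if age <= 16:   # [0 - 16]
        return Age.CHILD
    if age <= 60:   # (16 - 60]
        return Age.ADULT
    return Age.SENILE  # (60 - inf)
```

Note that the boundaries are inclusive on the upper side: an age of 16 still selects CHILD, and an age of 60 still selects ADULT.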
AudioEncoding¶
Enum type, indicates the requested format of the response audio data.
Name | Number | Description |
---|---|---|
PCM16 | 0 | Uncompressed 16-bit signed integer samples, without any header. |
OGG_VORBIS | 1 | |
OGG_OPUS | 2 | |
A_LAW | 3 | ITU-T G.711 A-law encoded stream. |
MU_LAW | 4 | ITU-T G.711 mu-law encoded stream. |
Note:
When using Ogg/Opus encoding, only 8kHz, 12kHz, 16kHz, 24kHz, and 48kHz sampling rates are allowed.
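A client can check this restriction before sending a request. The following sketch is a client-side convenience, not part of the API. The constant and function names are illustrative, and the handling of sampling_rate_hz = 0 (the service default) is deliberately left out because the default rate is not known to the client in advance.

```python
# Sampling rates (in Hz) allowed with Ogg/Opus, as listed in the note above.
OPUS_SAMPLING_RATES_HZ = frozenset({8000, 12000, 16000, 24000, 48000})


def is_valid_opus_rate(sampling_rate_hz: int) -> bool:
    """Return True if the explicit sampling rate is allowed with Ogg/Opus."""
    return sampling_rate_hz in OPUS_SAMPLING_RATES_HZ
```

For example, 16000 passes the check, while common rates such as 22050 or 44100 do not.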
OutsideLookupBehaviour¶
Enum type, indicates whether the lexicon is allowed to be matched even for phrases outside of <lookup> SSML tags.
Name | Number |
---|---|
ALLOWED | 0 |
DISALLOWED | 1 |