gRPC API

The base API of the service is defined by the proto file. It includes the functions listed below:

service TTS
{
	rpc GetServiceVersion(GetServiceVersionRequest) returns (GetServiceVersionResponse);
	rpc GetResourcesId(GetResourcesIdRequest) returns (GetResourcesIdResponse);

	rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse);
	rpc ListSoundIcons(ListSoundIconsRequest) returns (ListSoundIconsResponse);
	rpc ListRecordings(ListRecordingsRequest) returns (ListRecordingsResponse);
	rpc ListLexicons(ListLexiconsRequest) returns (ListLexiconsResponse);

	rpc SynthesizeStreaming(SynthesizeRequest) returns (stream SynthesizeResponse);
	rpc Synthesize(SynthesizeRequest) returns (SynthesizeResponse);

	rpc GetChannelsUsage(GetChannelsUsageRequest) returns (GetChannelsUsageResponse);

	rpc PutRecording(PutRecordingRequest) returns (PutRecordingResponse);
	rpc DeleteRecording(DeleteRecordingRequest) returns (DeleteRecordingResponse);
	rpc GetRecording(GetRecordingRequest) returns (GetRecordingResponse);

	rpc PutLexicon(PutLexiconRequest) returns (PutLexiconResponse);
	rpc DeleteLexicon(DeleteLexiconRequest) returns (DeleteLexiconResponse);
	rpc GetLexicon(GetLexiconRequest) returns (GetLexiconResponse);
}

Function Definitions

GetServiceVersion

rpc GetServiceVersion(GetServiceVersionRequest) returns (GetServiceVersionResponse)

Returns the version of the service, in SemVer format.
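
Because the version is a SemVer string, a client can split it into comparable parts, for example to check for a minimum service version. The following is a minimal sketch that assumes a simplified SemVer 2.0.0 pattern; the names ServiceVersion and parse_service_version are hypothetical and not part of the API.

```python
import re
from typing import NamedTuple, Optional

# Simplified SemVer 2.0.0 pattern: MAJOR.MINOR.PATCH with optional
# "-prerelease" and "+build" suffixes (assumption: no stricter checks needed).
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


class ServiceVersion(NamedTuple):
    """Hypothetical container for a parsed version string."""
    major: int
    minor: int
    patch: int
    prerelease: Optional[str] = None
    build: Optional[str] = None


def parse_service_version(version: str) -> ServiceVersion:
    """Parse the `version` field of GetServiceVersionResponse."""
    match = _SEMVER.match(version.strip())
    if match is None:
        raise ValueError(f"not a SemVer string: {version!r}")
    major, minor, patch, prerelease, build = match.groups()
    return ServiceVersion(int(major), int(minor), int(patch), prerelease, build)
```

Since the result is a tuple, the first three elements can be compared directly, e.g. `parse_service_version(v)[:3] >= (2, 0, 0)`.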

GetResourcesId

rpc GetResourcesId(GetResourcesIdRequest) returns (GetResourcesIdResponse)

Returns an identifier of the resources used by the service.

ListVoices

rpc ListVoices(ListVoicesRequest) returns (ListVoicesResponse)

Lists all available voices which can be used to synthesize speech.

ListSoundIcons

rpc ListSoundIcons(ListSoundIconsRequest) returns (ListSoundIconsResponse)

Lists all sound icons (their keys) for the requested (voice, variant, language) tuple.

ListRecordings

rpc ListRecordings(ListRecordingsRequest) returns (ListRecordingsResponse)

Lists all recordings (their keys) for the requested (voice, variant, language) tuple.

ListLexicons

rpc ListLexicons(ListLexiconsRequest) returns (ListLexiconsResponse)

Lists all currently loaded lexicons that can be referenced by the <lexicon> tag in synthesis requests.

SynthesizeStreaming

rpc SynthesizeStreaming(SynthesizeRequest) returns (stream SynthesizeResponse)

Synthesizes the speech (audio signal) based on the requested phrase and the optional configuration.
Returns audio signal with synthesized speech (streaming version, one or more response packets).

Synthesize

rpc Synthesize(SynthesizeRequest) returns (SynthesizeResponse)

Synthesizes the speech (audio signal) based on the requested phrase and the optional configuration.
Returns audio signal with synthesized speech (non-streaming version, always one response packet).

GetChannelsUsage

rpc GetChannelsUsage(GetChannelsUsageRequest) returns (GetChannelsUsageResponse)

Returns the total number of available channels and the number of channels currently in use.

PutRecording

rpc PutRecording(PutRecordingRequest) returns (PutRecordingResponse)

Adds a new recording with the requested key for the requested voice, or overwrites the existing recording if one with that key is already defined.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

DeleteRecording

rpc DeleteRecording(DeleteRecordingRequest) returns (DeleteRecordingResponse)

Removes the recording with the requested key from the list of recordings of the requested voice.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

GetRecording

rpc GetRecording(GetRecordingRequest) returns (GetRecordingResponse)

Sends back the content of the recording with the requested key for the requested voice. The data is returned in linear PCM16 format.

PutLexicon

rpc PutLexicon(PutLexiconRequest) returns (PutLexiconResponse)

Adds a new lexicon with the requested name, or overwrites the existing one if a lexicon with that name already exists.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

DeleteLexicon

rpc DeleteLexicon(DeleteLexiconRequest) returns (DeleteLexiconResponse)

Removes the lexicon with the requested name.

Note:
The licence must allow reconfiguration; otherwise, a PERMISSION_DENIED error is returned.

GetLexicon

rpc GetLexicon(GetLexiconRequest) returns (GetLexiconResponse)

Sends back the content of the lexicon with the requested name.

Request and Response Definitions

GetServiceVersionRequest

The request message for GetServiceVersion function. The message is empty.

GetServiceVersionResponse

The version info returned by GetServiceVersion function.

| Field | Type | Description |
| --- | --- | --- |
| version | string | Version of the service, in SemVer format. |

GetResourcesIdRequest

The request message for GetResourcesId function. The message is empty.

GetResourcesIdResponse

The identifier returned by GetResourcesId function.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Identifier of the resource pack the service is started with. |

The identifier is a free-form string that uniquely identifies a resource pack provided with the service.

ListVoicesRequest

The request message for ListVoices function.

| Field | Type | Description |
| --- | --- | --- |
| language_code | string | ISO 639-1 language code with an optional dialect. Optional. When non-empty, limits the listed voices to the voices supporting the requested language. |

ListVoicesResponse

The listing of available voices returned by ListVoices function.

| Field | Type | Description |
| --- | --- | --- |
| sampling_rate_hz | int32 | The sampling rate in Hz of all voices (it is identical for all available voices). |
| voices | VoiceInfo (repeated) | The list of all available voices or voices supporting the requested language. |

ListSoundIconsRequest

The request message for ListSoundIcons function.

| Field | Type | Description |
| --- | --- | --- |
| voice_profile | VoiceProfile | Profile of the voice to list the sound icons for. |

ListSoundIconsResponse

The result of the ListSoundIcons function.

| Field | Type | Description |
| --- | --- | --- |
| keys | string (repeated) | The list of keys of all available sound icons for the requested voice profile. |

ListRecordingsRequest

The request message for ListRecordings function.

| Field | Type | Description |
| --- | --- | --- |
| voice_profile | VoiceProfile | Profile of the voice to list the recordings for. |

ListRecordingsResponse

The result of the ListRecordings function.

| Field | Type | Description |
| --- | --- | --- |
| keys | string (repeated) | The list of keys of all available recordings for the requested voice profile. |

ListLexiconsRequest

The request message for ListLexicons function.

| Field | Type | Description |
| --- | --- | --- |
| language_code | string | ISO 639-1 language code with an optional dialect. Optional. When non-empty, limits the listed lexicons to the lexicons supporting the requested language. |

ListLexiconsResponse

The result of the ListLexicons function.

| Field | Type | Description |
| --- | --- | --- |
| lexicons | LexiconInfo (repeated) | The list of all available lexicons. |

SynthesizeRequest

The request message for SynthesizeStreaming and Synthesize functions.

| Field | Type | Description |
| --- | --- | --- |
| text | string | A phrase to be synthesized. |
| synthesis_config | SynthesisConfig | Optional. Tweaks the default service synthesis configuration. |
| output_config | OutputConfig | Optional. Overrides the default output audio properties. |

The message contains a phrase to be synthesized and optional configurations.
The phrase to synthesize is either plain text in orthographic form or text marked up with a subset of SSML. Consult the service documentation for the full list of supported SSML tags.
The fields of synthesis_config specify the parameters of synthesis (language, voice, prosodic properties, etc.), and output_config alters the format of the output (the sampling rate, and either PCM16 or an encoding such as Ogg/Vorbis).
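
To illustrate how the nested messages fit together, the sketch below lays out a SynthesizeRequest as a plain Python dictionary using the field names from this reference. It is not the generated stub API: a real client would build the message with the classes generated from the proto file. The voice name "example-voice" and the chosen values are placeholders.

```python
import json

# Shape of a SynthesizeRequest, written as a plain dict for illustration only.
request = {
    "text": "Hello, world.",
    "synthesis_config": {
        "language_code": "en",
        # Hypothetical voice name; use a name returned by ListVoices.
        "voice": {"name": "example-voice"},
        # All scaling factors are set explicitly; 1.0 is the neutral value.
        "prosodic_properties": {
            "pitch": 1.0,
            "range": 1.0,
            "rate": 1.2,  # slightly faster than neutral
            "stress": 1.0,
            "volume": 1.0,
        },
    },
    "output_config": {
        "audio_encoding": "OGG_VORBIS",
        "sampling_rate_hz": 0,  # 0 selects the default synthesizer sampling rate
        "max_frame_size": 0,    # 0 leaves RTF Throttling disabled
    },
}

print(json.dumps(request, indent=2))
```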

SynthesizeResponse

The result of the SynthesizeStreaming and Synthesize functions.

| Field | Type | Description |
| --- | --- | --- |
| sampling_rate_hz | int32 | Sampling rate of the returned audio in hertz. |
| audio | bytes | Audio data bytes, either as linear PCM (uncompressed 16-bit signed little-endian samples) or encoded if requested by output_config. |
| warnings | string (repeated) | All the warnings generated by the service during processing of the request. |

During SynthesizeStreaming, a series of one or more such messages is streamed back to the caller.
On the other hand, Synthesize simply returns exactly one response message.
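
A client that needs the whole utterance in memory can fold the streamed packets into a single buffer. The sketch below assumes that concatenating the audio fields in arrival order yields the complete audio; the dataclass is a stand-in for the generated message class, and collect_audio is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class SynthesizeResponse:
    """Stand-in for the generated message class (same field names)."""
    sampling_rate_hz: int
    audio: bytes
    warnings: List[str] = field(default_factory=list)


def collect_audio(
    responses: Iterable[SynthesizeResponse],
) -> Tuple[int, bytes, List[str]]:
    """Merge streamed packets into (sampling rate, audio bytes, warnings)."""
    sampling_rate_hz = 0
    chunks: List[bytes] = []
    warnings: List[str] = []
    for response in responses:
        # Keep the first non-zero sampling rate reported by the service.
        if sampling_rate_hz == 0:
            sampling_rate_hz = response.sampling_rate_hz
        chunks.append(response.audio)
        warnings.extend(response.warnings)
    return sampling_rate_hz, b"".join(chunks), warnings
```

The same helper also works for Synthesize if its single response is wrapped in a one-element list.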

GetChannelsUsageRequest

The request message for GetChannelsUsage function. The message is empty.

GetChannelsUsageResponse

The result of the GetChannelsUsage function.

| Field | Type | Description |
| --- | --- | --- |
| total_channels_count | int32 | The number of all available channels for the service, set by the licence. INT_MAX means unrestricted access. |
| used_channels_count | int32 | The number of channels currently in use. |
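
The two counters can be combined into a number of free channels, with the unrestricted case handled separately. This is a minimal sketch that assumes INT_MAX is the largest 32-bit signed integer (2147483647); free_channels is a hypothetical helper name.

```python
from typing import Optional

# Assumption: INT_MAX for the int32 field is the largest 32-bit signed value.
INT32_MAX = 2**31 - 1


def free_channels(
    total_channels_count: int, used_channels_count: int
) -> Optional[int]:
    """Return the number of free channels, or None if access is unrestricted."""
    if total_channels_count == INT32_MAX:
        return None
    # Clamp at zero in case the counters are momentarily inconsistent.
    return max(total_channels_count - used_channels_count, 0)
```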

PutRecordingRequest

The request message for PutRecording function.

| Field | Type | Description |
| --- | --- | --- |
| voice_profile | VoiceProfile | Profile of the voice to put the recording for. |
| recording_key | string | The key of the new recording. |
| sampling_rate_hz | int32 | Sampling rate of the recording audio data in hertz. |
| content | bytes | The recording audio data, in linear PCM16 format. |

If a recording with this key already exists for the requested voice profile, its content is overwritten.
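
The content field expects raw PCM16 samples without any header. The sketch below produces such a buffer for a short test tone; it assumes mono audio and little-endian byte order (the order stated for SynthesizeResponse audio, assumed here to apply to recordings as well). The function name sine_pcm16 is hypothetical.

```python
import math
import struct


def sine_pcm16(frequency_hz: float, duration_s: float, sampling_rate_hz: int) -> bytes:
    """Return a mono sine tone as headerless 16-bit signed little-endian PCM."""
    n = round(duration_s * sampling_rate_hz)
    # Half of full scale (32767) leaves headroom and avoids clipping.
    samples = [
        int(0.5 * 32767 * math.sin(2 * math.pi * frequency_hz * i / sampling_rate_hz))
        for i in range(n)
    ]
    # "<" selects little-endian, "h" is a 16-bit signed integer.
    return struct.pack(f"<{n}h", *samples)


content = sine_pcm16(440.0, 0.1, 8000)
print(len(content))  # two bytes per sample
```

The resulting bytes would go into content, with the matching value (8000 here) in sampling_rate_hz.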

PutRecordingResponse

The result of the PutRecording function. The message is empty; the response is used only to verify the returned gRPC status.

DeleteRecordingRequest

The request message for DeleteRecording function.

| Field | Type | Description |
| --- | --- | --- |
| voice_profile | VoiceProfile | Profile of the voice to look for the recording. |
| recording_key | string | The requested key of the recording (unique for any given voice profile). |

DeleteRecordingResponse

The result of the DeleteRecording function. The message is empty; the response is used only to verify the returned gRPC status.

GetRecordingRequest

The request message for GetRecording function.

| Field | Type | Description |
| --- | --- | --- |
| voice_profile | VoiceProfile | Profile of the voice to look for the recording. |
| recording_key | string | The requested key of the recording (unique for any given voice profile). |

GetRecordingResponse

The result of the GetRecording function.

| Field | Type | Description |
| --- | --- | --- |
| sampling_rate_hz | int32 | Sampling rate of the recording audio data in hertz. |
| content | bytes | The recording audio data, in linear PCM16 format. |
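
Since the returned content carries no header, it has to be wrapped in a container before most audio players can open it. The following sketch uses Python's standard wave module and assumes mono, little-endian samples; pcm16_to_wav is a hypothetical helper name.

```python
import io
import wave


def pcm16_to_wav(content: bytes, sampling_rate_hz: int, channels: int = 1) -> bytes:
    """Wrap headerless PCM16 data in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)          # mono is an assumption
        wav.setsampwidth(2)                 # 16-bit samples are 2 bytes wide
        wav.setframerate(sampling_rate_hz)  # taken from sampling_rate_hz
        wav.writeframes(content)
    return buffer.getvalue()
```

The same wrapping applies to PCM16 audio returned by Synthesize or SynthesizeStreaming.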

PutLexiconRequest

The request message for PutLexicon function.

| Field | Type | Description |
| --- | --- | --- |
| uri | string | URI of the lexicon, used as the uri attribute of <lexicon> tags in synthesis requests. |
| outside_lookup_behaviour | OutsideLookupBehaviour | Whether the lexicon can be selected for phrases outside of <lookup> SSML tags. |
| content | string | The content of the lexicon; it shall comply with PLS. |

The service supports only a subset of PLS. Consult the service documentation for the full list of supported PLS tags.

PutLexiconResponse

The result of the PutLexicon function. The message is empty; the response is used only to verify the returned gRPC status.

DeleteLexiconRequest

The request message for DeleteLexicon function.

| Field | Type | Description |
| --- | --- | --- |
| uri | string | URI of the lexicon to delete. |

DeleteLexiconResponse

The result of the DeleteLexicon function. The message is empty; the response is used only to verify the returned gRPC status.

GetLexiconRequest

The request message for GetLexicon function.

| Field | Type | Description |
| --- | --- | --- |
| uri | string | URI of the lexicon whose content is to be returned. |

GetLexiconResponse

The result of the GetLexicon function.

| Field | Type | Description |
| --- | --- | --- |
| outside_lookup_behaviour | OutsideLookupBehaviour | Whether the lexicon can be selected for phrases outside of <lookup> SSML tags. |
| content | string | If successful, contains the content of the lexicon, in PLS format. |

VoiceProfile

Provides information about a voice, its variant, and its language code, used as a selector for a set of sound icons and predefined recordings.

| Field | Type | Description |
| --- | --- | --- |
| voice_name | string | The voice name to look for the recording. |
| voice_variant | int32 | The variant of the voice to look for the recording. |
| language_code | string | ISO 639-1 language code with an optional dialect to look for the recording. |

SynthesisConfig

Provides information to the synthesizer that specifies how to process the request.

| Field | Type | Description |
| --- | --- | --- |
| language_code | string | ISO 639-1 language code with an optional dialect of text to be synthesized. May be overridden by SSML tags in request text. |
| voice | Voice | Requested voice to be used to synthesize the text. May be overridden by SSML tags in request text. |
| prosodic_properties | ProsodicProperties | Optional. Defines the parameters of synthesized speech. |
| silence_duration_between_segments_ms | int32 | Optional. Overrides the configured value for duration of silence between segments, in milliseconds. |

If there is no voice satisfying all the required criteria defined by the voice field, the voice is selected according to name (if defined) first, gender (if defined) second, and age (if defined) third.
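
The fallback order can be modelled as a lexicographic ranking in which a name match outweighs a gender match, which in turn outweighs an age match. The sketch below is one possible reading of the rule, not the service's actual algorithm; CandidateVoice and select_voice are hypothetical names, and genders and ages are represented as plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateVoice:
    """Hypothetical stand-in carrying the VoiceInfo fields used for selection."""
    name: str
    gender: str
    age: str


def select_voice(
    voices: List[CandidateVoice],
    name: str = "",
    gender: Optional[str] = None,
    age: Optional[str] = None,
) -> CandidateVoice:
    """Pick the best candidate; undefined criteria match every voice."""
    if not voices:
        raise ValueError("no voices available")

    def score(voice: CandidateVoice):
        # Tuples compare element by element, so name is decisive first,
        # then gender, then age. A full match scores (True, True, True).
        return (
            not name or voice.name == name,
            gender is None or voice.gender == gender,
            age is None or voice.age == age,
        )

    # max() returns the first candidate among equally ranked ones.
    return max(voices, key=score)
```

For example, a request for name "tom" and gender FEMALE resolves to the voice named "tom" even if it is male, because the name criterion takes precedence.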

OutputConfig

Defines the parameters of the output audio.

| Field | Type | Description |
| --- | --- | --- |
| audio_encoding | AudioEncoding | Requested format of the output audio stream. |
| sampling_rate_hz | int32 | Desired sampling frequency in hertz of synthesized audio. The value 0 means use the default Synthesizer sampling rate. |
| max_frame_size | int32 | Maximum frame size sent at once to the client to enable RTF Throttling (default=0, throttling disabled). |

When RTF Throttling is enabled, the RTF (Real Time Factor) is throttled to 1.0, with one frame (of max_frame_size samples) sent in advance. The frame size is expressed in samples, regardless of the audio_encoding used (a frame size expressed in bytes would likely be far smaller if the output is not PCM16). Enabling RTF Throttling guarantees that, when the connection is interrupted, the respective channel is freed after a time no longer than the playback time of one frame.

RTF Throttling is effective only for TTS::SynthesizeStreaming calls. It is silently ignored for TTS::Synthesize calls.
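
Because max_frame_size is given in samples, the playback time of one frame, and therefore the upper bound on how long a channel stays occupied after an interrupted connection, follows directly from the sampling rate. A minimal sketch of that arithmetic, with hypothetical helper names and mono PCM16 assumed for the byte size:

```python
def frame_playback_seconds(max_frame_size: int, sampling_rate_hz: int) -> float:
    """Playback time of one frame: samples divided by samples per second."""
    if max_frame_size <= 0 or sampling_rate_hz <= 0:
        raise ValueError("frame size and sampling rate must be positive")
    return max_frame_size / sampling_rate_hz


def frame_size_bytes_pcm16(max_frame_size: int) -> int:
    """Size of one frame in bytes for mono PCM16 (2 bytes per sample)."""
    return max_frame_size * 2


# A frame of 8000 samples at 16000 Hz plays for half a second.
print(frame_playback_seconds(8000, 16000))
```

Choosing a smaller max_frame_size shortens this bound at the cost of more response packets per utterance.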

Voice

Voice definition used to describe requested voice in SynthesisConfig.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the voice. If empty, it is not taken into account in voice selection. |
| gender | Gender (optional) | Gender of the voice. If not set, it is not taken into account in voice selection. |
| age | Age (optional) | Age of the voice. If not set, it is not taken into account in voice selection. |
| variant | int32 | Variant of the voice. |

ProsodicProperties

Prosodic properties of the speech to be synthesized.

| Field | Type | Description |
| --- | --- | --- |
| pitch | float | The average speech pitch scaling factor. The value 1.0 is a neutral value. |
| range | float | The pitch range scaling factor. The value 1.0 is a neutral value. |
| rate | float | The speech rate (speed) scaling factor. The value 1.0 is a neutral value. |
| stress | float | The speech stress scaling factor. The value 1.0 is a neutral value. |
| volume | float | The speech volume scaling factor. The value 1.0 is a neutral value. |

VoiceInfo

Information about a voice, returned by ListVoices function.

| Field | Type | Description |
| --- | --- | --- |
| supported_languages | string (repeated) | The list of ISO 639-1 codes of languages supported by the voice. |
| name | string | The name of the voice. |
| gender | Gender | Gender of the voice. |
| age | Age | Age of the voice. |
| variants_count | int32 | The number of variants of the voice (at least one). |

LexiconInfo

Lexicon URI and behaviour outside lookup tags returned by ListLexicons function.

| Field | Type | Description |
| --- | --- | --- |
| uri | string | URI of the lexicon, used as the uri attribute of <lexicon> tags in synthesis requests. |
| outside_lookup_behaviour | OutsideLookupBehaviour | Whether the lexicon can be selected for phrases outside of <lookup> SSML tags. |

Enumerations

Gender

Enum type, indicates the gender of the voice.

| Name | Number |
| --- | --- |
| FEMALE | 0 |
| MALE | 1 |

Age

Enum type, indicates the age of the voice.

| Name | Number | Description |
| --- | --- | --- |
| ADULT | 0 | Selected for SSML age attribute in range (16 - 60]. Default. |
| CHILD | 1 | Selected for SSML age attribute in range [0 - 16]. |
| SENILE | 2 | Selected for SSML age attribute in range (60 - inf). |
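
The interval boundaries above translate into a simple mapping from an SSML age value to the enum. The sketch below mirrors the enum numbers from the table; the Python class and the helper age_from_ssml are illustrative and not generated from the proto file.

```python
from enum import IntEnum


class Age(IntEnum):
    """Mirror of the Age enum values listed above."""
    ADULT = 0
    CHILD = 1
    SENILE = 2


def age_from_ssml(age_years: float) -> Age:
    """Map an SSML age attribute to Age using the documented ranges."""
    if age_years < 0:
        raise ValueError("age must not be negative")
    if age_years <= 16:   # [0 - 16]
        return Age.CHILD
    if age_years <= 60:   # (16 - 60]
        return Age.ADULT
    return Age.SENILE     # (60 - inf)
```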

AudioEncoding

Enum type, indicates the requested format of the response audio data.

| Name | Number | Description |
| --- | --- | --- |
| PCM16 | 0 | Uncompressed 16-bit signed integer samples, without any header. |
| OGG_VORBIS | 1 | Ogg/Vorbis encoded data stream. |
| OGG_OPUS | 2 | Ogg/Opus encoded data stream. |
| A_LAW | 3 | ITU-T G.711 A-law encoded stream. |
| MU_LAW | 4 | ITU-T G.711 mu-law encoded stream. |

Note:
When using Ogg/Opus encoding, only 8kHz, 12kHz, 16kHz, 24kHz, and 48kHz sampling rates are allowed.
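
A client can check the sampling rate before sending a request with Ogg/Opus output. This is a minimal sketch with a hypothetical helper name; it takes the effective sampling rate, so when sampling_rate_hz is left at 0 the caller is assumed to pass the default rate reported by ListVoices instead.

```python
# Sampling rates allowed with Ogg/Opus encoding, in hertz.
OPUS_SAMPLING_RATES_HZ = frozenset({8000, 12000, 16000, 24000, 48000})


def is_allowed_for_opus(sampling_rate_hz: int) -> bool:
    """Return True if the effective sampling rate can be used with OGG_OPUS."""
    return sampling_rate_hz in OPUS_SAMPLING_RATES_HZ
```

For instance, 16000 passes the check, while 22050 and 44100 do not.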

OutsideLookupBehaviour

Enum type, indicates whether the lexicon may be matched even for phrases outside of <lookup> SSML tags.

| Name | Number |
| --- | --- |
| ALLOWED | 0 |
| DISALLOWED | 1 |