Python CLI client

The implementation of the client application is available at the following GitHub repository

To use the TTS Client in a Docker container, go to the tts-client/python/docker directory and run the run_tts_client_python.sh script.

To send a simple request to the TTS service, use:

./run_tts_client_python.sh --service-address=IP_ADDRESS:PORT --text="Sample text to be synthesized"

To print the list of available options, use:

./run_tts_client_python.sh --help

Output audio files are created in the tts-client/python/docker/wav directory. Source text files, if used, should be placed in the tts-client/python/docker/txt directory.

NOTE: Unlike a local TTS Client instance, the run_tts_client_python.sh script does not allow setting custom paths to the input and output files. Instead, it uses predefined directories (wav and txt). When using the --input-text-file (-i) and --output-file (-o) options, provide only filenames.
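The filename-only rule from the note above can be sketched in Python. The helper below is hypothetical and not part of the TTS Client. It only assembles an argument list for run_tts_client_python.sh, using the option names mentioned in this section, and rejects values that contain a directory component.

```python
import os


def build_docker_client_command(service_address, text=None,
                                input_filename=None, output_filename=None):
    """Build an argument list for run_tts_client_python.sh (illustrative helper)."""
    for name in (input_filename, output_filename):
        # The script looks for files in its own txt/ and wav/ directories,
        # so a value with a directory component is rejected here.
        if name is not None and os.path.basename(name) != name:
            raise ValueError("expected a bare filename, got: {!r}".format(name))
    command = ["./run_tts_client_python.sh",
               "--service-address={}".format(service_address)]
    if text is not None:
        command.append("--text={}".format(text))
    if input_filename is not None:
        command.append("--input-text-file={}".format(input_filename))
    if output_filename is not None:
        command.append("--output-file={}".format(output_filename))
    return command
```

The resulting list could be passed to subprocess.run from within the tts-client/python/docker directory.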

Local instance usage

Basic usage

The TTS Client includes scripts for automatic environment configuration and launching on systems from the Debian Linux family. To launch the TTS Client on another Linux-based OS or on Windows, see the “Manual usage” section.

Before run

To install the required dependencies and prepare the virtual environment, run:

./setup.sh

Run

To run the TTS Client, use the run.sh script, e.g.:

./run.sh --service-address IP_ADDRESS:PORT --text 'Some text to be synthesized'

To print the usage description, use:

./run.sh --help

Manual usage

Before run

Dependencies

Install the required dependencies inside a virtual environment. This step is needed only the first time; afterwards it is enough to reuse the existing virtual environment.

  • On Linux:

Create a Python 3 virtual environment and install the required packages (supported Python versions: 3.5, 3.6, 3.7, 3.8, 3.9):

python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt

For Python 3.5, use the requirements_python_3.5.txt file instead of requirements.txt.

  • On Windows 10:

Temporarily change PowerShell’s execution policy to allow scripting. Start PowerShell with Run as Administrator and run:

Set-ExecutionPolicy RemoteSigned

Then confirm your choice.

Create a Python 3 virtual environment and install the required packages (supported Python versions: 3.5, 3.6, 3.7, 3.8, 3.9):

python3 -m venv .env
.\.env\Scripts\activate
pip install -r requirements.txt

For Python 3.5, use the requirements_python_3.5.txt file instead of requirements.txt.

To switch PowerShell’s execution policy back to the default, run:

Set-ExecutionPolicy Restricted
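The choice of requirements file described in the steps above can be sketched as a small Python helper. The function name requirements_file is hypothetical. The version list and file names are taken from the text above.

```python
import sys

# Supported interpreter versions, as listed in the steps above.
SUPPORTED_VERSIONS = {(3, 5), (3, 6), (3, 7), (3, 8), (3, 9)}


def requirements_file(version_info=None):
    """Return the requirements file name matching the given Python version."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    if major_minor not in SUPPORTED_VERSIONS:
        raise RuntimeError(
            "unsupported Python version: {}.{}".format(*major_minor))
    # Python 3.5 has its own requirements file.
    if major_minor == (3, 5):
        return "requirements_python_3.5.txt"
    return "requirements.txt"
```

The returned name is the file to pass to pip install -r inside the activated virtual environment.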

Proto sources

To build the sources from .proto, run:

./make_proto.sh

Run

To run the TTS Client, activate the virtual environment first:

  • On Linux:

source .env/bin/activate

  • On Windows:

.\.env\Scripts\activate

Then run the TTS Client, for example:

python tts_client.py -s "192.168.1.1:4321" -f 44100 -t "Some text to be synthesized"

For each request, you have to provide the service address and the input text, either directly as an argument value or from a text file.
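As a minimal sketch of the rule above, the hypothetical helper below assembles a tts_client.py command line from Python. It uses only the -s, -t, -i and -f options documented in this section. Treating -t and -i as mutually exclusive is an assumption made for this example.

```python
def build_client_args(service_address, text=None, input_file=None,
                      sample_rate=0):
    """Build an argument list for tts_client.py (illustrative helper)."""
    # Assumption: exactly one source of input text is given per request.
    if (text is None) == (input_file is None):
        raise ValueError("provide exactly one of text or input_file")
    args = ["python", "tts_client.py", "-s", service_address]
    if sample_rate:
        # 0 means "use the voice's original sample rate", so the option is omitted.
        args += ["-f", str(sample_rate)]
    if text is not None:
        args += ["-t", text]
    else:
        args += ["-i", input_file]
    return args
```

With the virtual environment activated, the list can be executed with subprocess.run(args, check=True).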

Docker usage

Build Docker image

To prepare a Docker image with the Python implementation of the TTS Client, open the project’s main directory and run the following command:

docker build -f Dockerfile-python -t tts-client-python:2.2.0 .

The build takes several minutes. When it completes, the following message is displayed:

Successfully tagged tts-client-python:2.2.0

Usage

Basic usage: tts_client.py --service-address ADDRESS --text INPUT_TEXT


Available options:

-h, --help
    Show help message and exit.

-s IP:PORT, --service-address IP:PORT
    An IP address and port (address:port) of the service the client connects to.

--session-id SESSION_ID
    A session ID to be passed to the service. If not specified, the service generates a default session ID.

--grpc-timeout GRPC_TIMEOUT
    A timeout in milliseconds used to set the gRPC deadline, i.e. how long the client is willing to wait for a reply from the server (optional).

--list-voices
    Lists all available voices.

-r RESPONSE_TYPE, --response RESPONSE_TYPE
    Either “streaming” or “single”; calls the streaming (default) or non-streaming version of Synthesize.

-t TEXT, --text TEXT
    Text to be synthesized.

-i INPUT_FILE, --input-text-file INPUT_FILE
    A file with text to be synthesized.

-o OUT_PATH, --out-path OUT_PATH
    A path to the output wave file with synthesized audio content (default output file is TechmoTTS.wav).

-f SAMPLE_RATE, --sample_rate SAMPLE_RATE
    The sample rate in Hz of the synthesized audio. Set to 0 (default) to use the voice’s original sample rate.

--ae ENCODING, --audio-encoding ENCODING
    The encoding of the output audio: pcm16 (default) or ogg-vorbis.

--sp SPEECH_PITCH, --speech-pitch SPEECH_PITCH
    Adjusts the default pitch of the synthesized speech (optional, can be overridden by SSML).

--sr SPEECH_RANGE, --speech-range SPEECH_RANGE
    Adjusts the default range of the synthesized speech (optional, can be overridden by SSML).

--ss SPEECH_RATE, --speech-rate SPEECH_RATE
    Adjusts the default rate (speed) of the synthesized speech (optional, can be overridden by SSML).

--sv SPEECH_VOLUME, --speech-volume SPEECH_VOLUME
    Adjusts the default volume of the synthesized speech (optional, can be overridden by SSML).

--vn VOICE_NAME, --voice-name VOICE_NAME
    The name of the voice used to synthesize the phrase (optional, can be overridden by SSML).

--vg VOICE_GENDER, --voice-gender VOICE_GENDER
    The gender of the voice: “female” or “male” (optional, can be overridden by SSML).

--va VOICE_AGE, --voice-age VOICE_AGE
    The age of the voice: adult, child, or senile (optional, can be overridden by SSML).

-l LANGUAGE, --language LANGUAGE
    ISO 639-1 language code of the phrase to synthesize (optional, can be overridden by SSML).

--tls-dir TLS_DIR
    If set to a path with SSL/TLS credential files (client.crt, client.key, ca.crt), use SSL/TLS authentication. Otherwise, use an insecure channel (default).

--list-lexicons
    Lists all available lexicons.

--get-lexicon LEXICON_NAME
    Sends back the content of the lexicon with the requested name.

--delete-lexicon LEXICON_NAME
    Removes the lexicon with the requested name.

--put-lexicon LEXICON_NAME LEXICON_CONTENT
    Adds a new lexicon with the requested name, or overwrites the existing one if a lexicon with that name already exists. The lexicon content must comply with https://www.w3.org/TR/pronunciation-lexicon/.

--list-recordings VOICE_NAME
    Lists all recording keys for the requested voice.

--get-recording VOICE_NAME RECORDING_KEY OUTPUT_PATH
    Sends back the recording with the requested key for the requested voice in the linear PCM16 format.

--delete-recording VOICE_NAME RECORDING_KEY
    Removes the recording with the requested key from the list of recordings of the requested voice.

--put-recording VOICE_NAME RECORDING_KEY AUDIO_PATH
    Adds a new recording with the requested key for the requested voice, or overwrites the existing one if such a key is already defined. The recording has to be PCM16 WAV audio.
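Since a recording passed to the put-recording option has to be PCM16 WAV audio, it may help to check the file before sending it. The sketch below uses only Python's standard wave module. The function name is_pcm16_wav is hypothetical and not part of the TTS Client, and the service may apply further checks of its own.

```python
import wave


def is_pcm16_wav(path):
    """Return True if the file at `path` is a WAV file with 16-bit PCM samples."""
    try:
        with wave.open(path, "rb") as wav_file:
            # The wave module reads uncompressed PCM data, so a successful
            # open plus a 2-byte sample width indicates PCM16.
            return wav_file.getsampwidth() == 2
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, an unsupported format, or a truncated file.
        return False
```

A caller could run this check on AUDIO_PATH and skip the upload when it returns False.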

The input text can be either plain text or SSML (https://w3.org/TR/speech-synthesis11/). Currently, the following SSML tags are supported:

  • <speak> - root xml node, with optional xml:lang attribute,

  • <prosody> - supported attributes: pitch, range, rate, and volume,

  • <break> - supported attributes: strength and time,

  • <emphasis> - supported attribute: level,

  • <say-as> - supported attribute: interpret-as (consult Techmo TTS documentation for the complete list of all available implementations),

  • <lang> - supported attribute: xml:lang,

  • <voice> - supported attributes: name, gender, and age.
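To illustrate how some of the tags listed above combine, here is a minimal sketch that assembles an SSML string in Python and checks that it is well-formed XML. The helper build_ssml and its parameters are hypothetical, and the attribute values used ("en", "slow", "500ms") are only examples. Consult the Techmo TTS documentation for the values the service actually accepts.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, lang="en", rate=None, pause_after=None):
    """Wrap plain text in <speak>, optionally with <prosody> and <break>."""
    # Escape characters such as & and < so the text is valid XML content.
    body = escape(text)
    if rate is not None:
        body = "<prosody rate={}>{}</prosody>".format(quoteattr(rate), body)
    if pause_after is not None:
        body += "<break time={}/>".format(quoteattr(pause_after))
    ssml = "<speak xml:lang={}>{}</speak>".format(quoteattr(lang), body)
    # Raises xml.etree.ElementTree.ParseError if the result is not well-formed.
    ET.fromstring(ssml)
    return ssml
```

The returned string can be passed as the value of the -t option or saved to a file for use with -i.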

Module

You can also use the TTS Client as a Python 3 module. Install the package in your environment:

pip install -e ./python/.

The package provides the modules call_synthesize and call_listvoices, each containing a function of the same name that runs the client. They can be used as follows:

from call_synthesize import call_synthesize
call_synthesize(args, text)

and

from call_listvoices import call_listvoices
call_listvoices(args)

Here, args holds the parsed command-line arguments and text is the request text to synthesize (either plain text or SSML). The available arguments are described in the Usage section above.
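As an illustration of what such an args object might look like, the sketch below builds one with argparse for a small subset of the documented options. This is not the client's actual parser. The attribute names are assumed to follow argparse's default derivation from the long option names, and the real functions may expect more attributes than this example provides.

```python
import argparse


def make_example_parser():
    """Declare a small subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Subset of TTS Client options")
    parser.add_argument("-s", "--service-address", metavar="IP:PORT",
                        required=True)
    parser.add_argument("-t", "--text", default="")
    parser.add_argument("-r", "--response", choices=["streaming", "single"],
                        default="streaming")
    parser.add_argument("-f", "--sample_rate", type=int, default=0)
    return parser


# Parse an explicit argument list instead of sys.argv.
args = make_example_parser().parse_args(
    ["-s", "192.168.1.1:4321", "-t", "Some text", "-f", "44100"])

# With the package installed, the call would then look like:
#     from call_synthesize import call_synthesize
#     call_synthesize(args, args.text)
```

In practice, it is safer to obtain args from the parser defined in tts_client.py, so that every option the client reads is present.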