Python CLI client

The implementation of the client application is available in its GitHub repository.

To use the TTS Client in a Docker container, go to the tts-client/python/docker directory and run the run_tts_client_python.sh script.

To send a simple request to the TTS service, use:

./run_tts_client_python.sh --service-address=IP_ADDRESS:PORT --text="Sample text to be synthesized"

To print the list of available options, use:

./run_tts_client_python.sh --help

Output audio files will be created inside the tts-client/python/docker/wav directory. Source text files, if used, should be placed inside the tts-client/python/docker/txt directory.

NOTE: Unlike a local TTS Client instance, the run_tts_client_python.sh script does not allow setting custom paths for the input and output files. Instead, it uses predefined directories (wav and txt). When using the --input-text-file (-i) and --output-file (-o) options, provide only filenames.
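
The filename-only restriction can be handled by a small helper when the wrapper script is called from other tooling. The following is a minimal sketch, not part of the TTS Client: the function name docker_client_args is hypothetical, and it only uses the options named in the note above. It assumes the input file has already been copied to the txt directory.

```python
from pathlib import PurePosixPath


def docker_client_args(service_address, input_path, output_path):
    """Build an argument list for run_tts_client_python.sh.

    Hypothetical helper: directory parts are dropped because the wrapper
    script works only with the predefined txt and wav directories.
    """
    return [
        "./run_tts_client_python.sh",
        f"--service-address={service_address}",
        # Only the bare filename is passed, never a full path.
        f"--input-text-file={PurePosixPath(input_path).name}",
        f"--output-file={PurePosixPath(output_path).name}",
    ]
```

The resulting list can be passed to subprocess.run from within the tts-client/python/docker directory.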

Local instance usage

Basic usage

The TTS Client includes scripts for automatic environment configuration and launching on systems from the Debian Linux family. To launch the TTS Client on another Linux-based OS or on Windows, see the “Manual usage” section.

Before run

To install required dependencies and to prepare the virtual environment, run:

./setup.sh

Run

To run the TTS Client, use the run.sh script, e.g.:

./run.sh --service-address IP_ADDRESS:PORT --text 'Some text to be synthesized'

To print the usage description, use:

./run.sh --help

Manual usage

Before run

Dependencies

Install the required dependencies inside a virtual environment. This step is only needed the first time; afterwards, it is enough to activate the existing virtual environment.

On Linux:

Use Python 3 with a virtual environment and install the required packages (supported Python versions are: 3.5, 3.6, 3.7, 3.8, 3.9):

python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt

For Python 3.5, use the requirements_python_3.5.txt file instead of requirements.txt.

On Windows 10:

Temporarily change PowerShell’s execution policy to allow scripting. Start PowerShell with Run as Administrator and use the command:

Set-ExecutionPolicy RemoteSigned

then confirm your choice.

Use Python 3 with a virtual environment and install the required packages (supported Python versions are: 3.5, 3.6, 3.7, 3.8, 3.9):

python3 -m venv .env
.\.env\Scripts\activate
pip install -r requirements.txt

For Python 3.5, use the requirements_python_3.5.txt file instead of requirements.txt.

To switch PowerShell’s execution policy back to the default, use the command:

Set-ExecutionPolicy Restricted

Proto sources

To build the sources from .proto, run:

./make_proto.sh

Run

To run the TTS Client, activate the virtual environment first:

On Linux:

source .env/bin/activate

On Windows:

.\.env\Scripts\activate

Then run the TTS Client. Sample use:

python tts_client.py -s "192.168.1.1:4321" -f 44100 -t "Some text to be synthesized"

For each request you have to provide the service address and the input text (either directly as the argument’s value or from a text file).
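
When the client is launched from another Python program, the two required inputs can be checked before the process starts. The sketch below is an illustration only: the function build_tts_command is hypothetical, and it uses just the -s, -t, -i and -f options documented here. Whether the client accepts text and an input file at the same time is not stated here, so the sketch conservatively requires exactly one of them.

```python
import sys


def build_tts_command(service_address, text=None, input_file=None, sample_rate=None):
    """Return an argv list for tts_client.py (hypothetical helper)."""
    if not service_address:
        raise ValueError("the service address is required")
    # Assumption: exactly one source of input text is given.
    if (text is None) == (input_file is None):
        raise ValueError("provide exactly one of text or input_file")

    cmd = [sys.executable, "tts_client.py", "-s", service_address]
    if text is not None:
        cmd += ["-t", text]
    else:
        cmd += ["-i", input_file]
    if sample_rate is not None:
        cmd += ["-f", str(sample_rate)]
    return cmd
```

With the virtual environment activated, the returned list can be run with subprocess.run(cmd, check=True) from the directory that contains tts_client.py.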

Docker usage

Build docker image

To prepare a Docker image with the Python implementation of the TTS Client, open the project’s main directory and run the following command:

docker build -f Dockerfile-python -t tts-client-python:2.2.0 . 

The build process will take several minutes. When the build process is complete, you will receive a message:

Successfully tagged tts-client-python:2.2.0

Usage

Basic usage: tts_client.py --service-address ADDRESS --text INPUT_TEXT

Available options:

Option	Description
-h, --help	Show help message and exit.
-s IP:PORT, --service-address IP:PORT	An IP address and port (address:port) of a service the client connects to.
--session-id SESSION_ID	A session ID to be passed to the service. If not specified, the service generates a default session ID.
--grpc-timeout GRPC_TIMEOUT	A timeout in milliseconds used to set the gRPC deadline, i.e. how long the client is willing to wait for a reply from the server (optional).
--list-voices	Lists all available voices.
-r RESPONSE_TYPE, --response RESPONSE_TYPE	“streaming” or “single”, calls the streaming (default) or non-streaming version of Synthesize.
-t TEXT, --text TEXT	A text to be synthesized.
-i INPUT_FILE, --input-text-file INPUT_FILE	A file with text to be synthesized.
-o OUT_PATH, --out-path OUT_PATH	A path to the output wave file with synthesized audio content (default output file is `TechmoTTS.wav`).
-f SAMPLE_RATE, --sample_rate SAMPLE_RATE	A sample rate in Hz of synthesized audio. Set to 0 (default) to use the voice’s original sample rate.
--ae ENCODING, --audio-encoding ENCODING	An encoding of the output audio, pcm16 (default) or ogg-vorbis.
--sp SPEECH_PITCH, --speech-pitch SPEECH_PITCH	Allows adjusting the default pitch of the synthesized speech (optional, can be overridden by SSML).
--sr SPEECH_RANGE, --speech-range SPEECH_RANGE	Allows adjusting the default range of the synthesized speech (optional, can be overridden by SSML).
--ss SPEECH_RATE, --speech-rate SPEECH_RATE	Allows adjusting the default rate (speed) of the synthesized speech (optional, can be overridden by SSML).
--sv SPEECH_VOLUME, --speech-volume SPEECH_VOLUME	Allows adjusting the default volume of the synthesized speech (optional, can be overridden by SSML).
--vn VOICE_NAME, --voice-name VOICE_NAME	A name of the voice used to synthesize the phrase (optional, can be overridden by SSML).
--vg VOICE_GENDER, --voice-gender VOICE_GENDER	A gender of the voice - “female” or “male” (optional, can be overridden by SSML).
--va VOICE_AGE, --voice-age VOICE_AGE	An age of the voice - adult, child, or senile (optional, can be overridden by SSML).
-l LANGUAGE, --language LANGUAGE	ISO 639-1 language code of the phrase to synthesize (optional, can be overridden by SSML).
--tls-dir TLS_DIR	If set to a path with SSL/TLS credential files (client.crt, client.key, ca.crt), use SSL/TLS authentication. Otherwise use an insecure channel (default).
--list-lexicons	Lists all available lexicons.
--get-lexicon LEXICON_NAME	Sends back the content of the lexicon with the requested name.
--delete-lexicon LEXICON_NAME	Removes the lexicon with the requested name.
--put-lexicon LEXICON_NAME LEXICON_CONTENT	Adds a new lexicon with the requested name or overwrites the existing one if a lexicon with that name already exists. The content of the lexicon shall comply with https://www.w3.org/TR/pronunciation-lexicon/.
--list-recordings VOICE_NAME	Lists all recording keys for the requested voice.
--get-recording VOICE_NAME RECORDING_KEY OUTPUT_PATH	Sends back the recording with the requested key for the requested voice in the linear PCM16 format.
--delete-recording VOICE_NAME RECORDING_KEY	Removes the recording with the requested key from the list of recordings of the requested voice.
--put-recording VOICE_NAME RECORDING_KEY AUDIO_PATH	Adds a new recording with the requested key for the requested voice, or overwrites the existing one if such a key is already defined. The recording has to be PCM16 WAV audio.
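
The TLS directory option expects three credential files, so it can be useful to verify the directory before starting the client. The following is a minimal sketch under that assumption only; the helper missing_tls_files is hypothetical and is not part of the TTS Client. It checks for the three file names listed above and does not validate their contents.

```python
from pathlib import Path

# File names taken from the description of the TLS directory option.
REQUIRED_TLS_FILES = ("client.crt", "client.key", "ca.crt")


def missing_tls_files(tls_dir):
    """Return the required credential files that are absent from tls_dir."""
    directory = Path(tls_dir)
    return [name for name in REQUIRED_TLS_FILES if not (directory / name).is_file()]
```

An empty result means all three files are present and the directory can be passed to the client.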

The input text can be either plain text or SSML (https://w3.org/TR/speech-synthesis11/). Currently the following SSML tags are supported:

<speak> - root XML node, with optional xml:lang attribute,
<prosody> - supported attributes: pitch, range, rate, and volume,
<break> - supported attributes: strength and time,
<emphasis> - supported attribute: level,
<say-as> - supported attribute: interpret-as (consult Techmo TTS documentation for the complete list of all available implementations),
<lang> - supported attribute: xml:lang,
<voice> - supported attributes: name, gender, and age.
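
An SSML input can be assembled programmatically so that special characters in the text are escaped correctly. The sketch below uses only the Python standard library and only tags from the list above (speak, break, prosody). The function build_ssml and its default attribute values are illustrative assumptions, not part of the TTS Client.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute in ElementTree notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(first_sentence, second_sentence, lang="en", rate="slow", pause="500ms"):
    """Build a small SSML document: text, a pause, then text with adjusted rate."""
    speak = ET.Element("speak", {XML_LANG: lang})
    speak.text = first_sentence
    # <break> with the supported "time" attribute.
    ET.SubElement(speak, "break", {"time": pause})
    # <prosody> with the supported "rate" attribute.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = second_sentence
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello.", "How are you?")
```

The resulting string can be passed as the value of the text option or saved to a file used as the input text file.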

Module:

You can use the TTS Client as a Python 3 module. Install the package in your environment:

pip install -e ./python/.

This package provides the modules call_synthesize and call_listvoices, each containing a function of the same name that runs the client. Here are examples of how to use them as a module:

from call_synthesize import call_synthesize
call_synthesize(args, text)

and

from call_listvoices import call_listvoices
call_listvoices(args)

Here args holds the parsed command-line arguments, and text is the request text to synthesize (either plain text or SSML). The function parameters are described in the Usage section above.
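
One way to obtain such an args object outside the command line is to parse a list of strings with argparse. The sketch below is an assumption-laden illustration: the parser make_args is hypothetical and mirrors only four options from the Usage section, with attribute names derived by argparse from the long option names. The attributes the real functions read are not listed here, so the namespace may be incomplete for an actual call.

```python
import argparse


def make_args(argv):
    """Parse a hypothetical subset of the documented options into a namespace."""
    parser = argparse.ArgumentParser(description="Subset of the TTS Client options (sketch)")
    parser.add_argument("-s", "--service-address", required=True)
    parser.add_argument("-r", "--response", choices=["streaming", "single"], default="streaming")
    parser.add_argument("-t", "--text")
    parser.add_argument("-f", "--sample_rate", type=int, default=0)
    return parser.parse_args(argv)


args = make_args(["-s", "192.168.1.1:4321", "-f", "44100"])
text = "Some text to be synthesized"

# With the package installed, the documented call would then be:
# from call_synthesize import call_synthesize
# call_synthesize(args, text)
```

argparse converts the option name --service-address into the attribute args.service_address, which is why the dashes become underscores in the namespace.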