Architecture

Techmo TTS is a speech synthesis system that uses machine learning and pattern matching algorithms to convert text into high-quality speech. Speech generation can be further controlled with SSML (Speech Synthesis Markup Language) tags. Clients communicate with the system over gRPC or, optionally, MRCPv2. For high scalability and easy integration, the Techmo TTS service is distributed as a Docker image.
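To illustrate the kind of control SSML offers, here is a minimal sketch that builds an SSML document in Python. The speak, break, and prosody elements come from the W3C SSML specification; which tags and attribute values Techmo TTS actually accepts is an assumption here and should be checked against the service documentation. The helper name build_ssml is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ssml(intro: str, slow_part: str, pause_ms: int = 500) -> str:
    """Return an SSML string: intro text, a pause, then a slowly spoken part.

    Assumption: the target engine supports the standard <break> and
    <prosody> elements; this helper is illustrative, not a Techmo API.
    """
    speak = ET.Element("speak")
    speak.text = intro
    # Insert a pause of pause_ms milliseconds after the intro.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    # Speak the second part at a slower rate.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow"})
    prosody.text = slow_part
    # ElementTree escapes special characters such as & and < in the text.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your order number is", "4 7 1 1"))
```

Building the markup with an XML library rather than string concatenation keeps the document well-formed when the input text contains characters such as & or <.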

A single instance of the service can support hundreds of simultaneous sessions (the exact number depends on the machine’s performance). When demand is higher, multiple instances can be run in a Kubernetes cluster.

The produced audio can be streamed to the client, so the system remains highly responsive even when synthesizing long messages. Processing time can be reduced further by using the built-in cache system.
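The interplay of streaming and caching can be sketched as follows. This is a conceptual illustration only, not the Techmo TTS implementation: fake_synthesize is a hypothetical stand-in for a streaming synthesis call, CachedSynthesizer is a hypothetical wrapper, and the cache key (voice plus text) is an assumption about what identifies a reusable result.

```python
import hashlib
from typing import Callable, Dict, Iterator, List


def fake_synthesize(text: str, chunk_size: int = 4) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming synthesis call."""
    data = text.encode("utf-8")  # placeholder bytes, not real audio
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


class CachedSynthesizer:
    """Forwards audio chunks as they arrive and caches complete results."""

    def __init__(self, synthesize: Callable[[str], Iterator[bytes]]) -> None:
        self._synthesize = synthesize
        self._cache: Dict[str, List[bytes]] = {}
        self.backend_calls = 0  # how often the backend was actually used

    def stream(self, text: str, voice: str = "default") -> Iterator[bytes]:
        # Assumed cache key: a hash of the voice name and the input text.
        key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
        if key in self._cache:
            # Cache hit: replay stored chunks without calling the backend.
            yield from self._cache[key]
            return
        self.backend_calls += 1
        chunks: List[bytes] = []
        for chunk in self._synthesize(text):
            chunks.append(chunk)
            yield chunk  # forward immediately; the client can start playback
        # Stored only after the stream was consumed to the end.
        self._cache[key] = chunks


if __name__ == "__main__":
    tts = CachedSynthesizer(fake_synthesize)
    first = b"".join(tts.stream("Hello, world"))
    second = b"".join(tts.stream("Hello, world"))  # served from the cache
    print(first == second, tts.backend_calls)
```

Because each chunk is yielded as soon as it is produced, a client can begin playback before synthesis of a long message has finished, while a repeated request skips synthesis entirely.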

The Techmo TTS system uses a client-server architecture in which clients communicate with the service over the gRPC protocol. Support for the MRCPv2 protocol can be added by deploying an additional proxy service.

For the details of the service architecture, refer to the TTS System Overview section.