The following GPU configurations are suggested for hosting Coqui TTS inference:
- Entry-level hosting: A single GPU with about 8 GB of VRAM (such as the NVIDIA RTX 3060 Ti 8 GB) is suitable for small-scale hosting with light concurrency.
- Mid-range hosting: A single GPU with 16–24 GB of VRAM (such as the NVIDIA RTX A4000 16 GB or an RTX 4090-class 24 GB card) is well suited to higher throughput, many voices, and moderate concurrency.
- High-throughput/multi-tenant hosting: Several GPUs or a single very large GPU (such as the NVIDIA RTX 5090 with 32 GB of VRAM), backed by ample system memory and fast I/O, supports low latency, many voices, and numerous concurrent requests.
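The tiers above can be turned into a rough sizing check. The sketch below rests on several assumptions. It maps a host's GPU count and per-GPU VRAM to one of the tiers. The names `GpuInventory`, `suggest_tier`, and `detect_inventory` are hypothetical and are not part of Coqui TTS. The numeric cut-offs are an illustrative reading of the list, not official requirements. They sit slightly below the nominal 8 GB and 16 GB because drivers typically report a little less than the marketed capacity. Detection uses PyTorch's `torch.cuda` API if it is installed. If PyTorch is missing, the sketch reports no GPU.

```python
"""Rough mapping from GPU resources to the hosting tiers listed above.

The thresholds are illustrative assumptions, not Coqui TTS requirements.
"""
from __future__ import annotations

from dataclasses import dataclass

# Assumed cut-offs in GiB, set slightly below nominal sizes because the
# reported total memory is usually a little under the marketed capacity.
ENTRY_MIN_GB = 7.5   # "about 8 GB" (RTX 3060 Ti class)
MID_MIN_GB = 15.0    # 16 GB (RTX A4000 class)
MID_MAX_GB = 24.0    # 24 GB (RTX 4090 class)


@dataclass(frozen=True)
class GpuInventory:
    """Hypothetical summary of a host's GPUs."""
    count: int
    vram_gb_per_gpu: float  # smallest card's VRAM, in GiB


def suggest_tier(inv: GpuInventory) -> str:
    """Return the hosting tier that a GPU inventory roughly matches."""
    if inv.count <= 0:
        return "no GPU"
    # Assumption: several GPUs of at least entry-level size count as
    # the high-throughput/multi-tenant tier.
    if inv.count > 1 and inv.vram_gb_per_gpu >= ENTRY_MIN_GB:
        return "high-throughput"
    # A single card larger than the 24 GB class (e.g. 32 GB).
    if inv.vram_gb_per_gpu > MID_MAX_GB:
        return "high-throughput"
    if inv.vram_gb_per_gpu >= MID_MIN_GB:
        return "mid-range"
    if inv.vram_gb_per_gpu >= ENTRY_MIN_GB:
        return "entry-level"
    return "below entry-level"


def detect_inventory() -> GpuInventory:
    """Query CUDA devices through PyTorch; report no GPU if unavailable."""
    try:
        import torch
    except ImportError:
        return GpuInventory(count=0, vram_gb_per_gpu=0.0)
    if not torch.cuda.is_available():
        return GpuInventory(count=0, vram_gb_per_gpu=0.0)
    n = torch.cuda.device_count()
    # Use the smallest card so mixed hosts are not over-rated.
    smallest = min(torch.cuda.get_device_properties(i).total_memory
                   for i in range(n))
    return GpuInventory(count=n, vram_gb_per_gpu=smallest / 1024**3)


if __name__ == "__main__":
    inv = detect_inventory()
    print(inv, "->", suggest_tier(inv))
```

Basing the check on the smallest card is a deliberately conservative choice. A model or voice pool must fit on whichever GPU serves the request, so the smallest card sets the limit.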