What is an Audio Node?

The Audio node lets you generate audio from text using high-quality voice synthesis models. It’s ideal for turning responses from LLMs or static text into spoken audio.

This node is commonly used in voice interfaces, accessibility workflows, or any experience where you want to deliver output via sound.

It supports popular text-to-speech engines and customizable voices to match your tone and use case.

Key capabilities include:

  • Multilingual audio synthesis.
  • A choice of multiple voice models and accents.
  • Direct playback in the interface via Test Output.
  • Optional use of your own API key to connect to external TTS providers.

How to use it?

To use the Audio node:

  • Input: Accepts a text string (e.g., from an LLM or Input node).
  • Output: Returns a playable audio file that can be previewed.

After receiving text input, the Audio node displays a Test Output section with a play button, allowing you to listen to the generated audio.
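
In code terms, the node behaves like a function that accepts a text string and returns a playable audio file. The sketch below illustrates that contract only; the synthesize() helper is a hypothetical stand-in for the TTS engine the node calls on your behalf.

```python
from pathlib import Path

def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for the node's TTS engine (not a real API)."""
    raise NotImplementedError("handled internally by the Audio node")

def audio_node(text: str, out_path: str = "output.mp3") -> Path:
    """Text string in, playable audio file out."""
    audio = synthesize(text)   # e.g. the response from an LLM or Input node
    path = Path(out_path)
    path.write_bytes(audio)    # the file you hear under Test Output
    return path
```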

Settings

Configuration Options

  • Model: Choose the TTS engine, such as eleven_multilingual_v2.
  • Voice: Select from available voice profiles (e.g., Sarah, Chris).
  • API Key: Optional field for providing your own TTS provider credentials.
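
To make these options concrete, the sketch below shows how Model, Voice, and API Key typically map onto a request to an external TTS provider. It assumes an ElevenLabs-style REST endpoint; the URL, header name, payload fields, and voice ID are illustrative and may differ from what the node actually sends.

```python
import requests

API_KEY = "your-tts-provider-key"   # the optional API Key setting
MODEL = "eleven_multilingual_v2"    # the Model setting
VOICE_ID = "your-voice-id"          # a profile such as "Sarah" resolves to an ID like this

def text_to_speech(text: str) -> bytes:
    """Send text to the TTS provider and return raw audio bytes."""
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={"text": text, "model_id": MODEL},
        timeout=30,
    )
    response.raise_for_status()
    return response.content  # audio bytes (e.g. MP3), ready to save or play back

with open("speech.mp3", "wb") as f:
    f.write(text_to_speech("Hello from the Audio node!"))
```

Inside the node, these values come from the configuration fields above, so no code is required; the sketch only shows what each setting controls.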

How to expose Audio externally?

To make audio results available in your external interface:

  1. Go to the Export tab.
  2. Enable the Audio node in the Outputs section under Fields.
  3. Click Save Interface.
  4. When the workflow is triggered, users can hear the generated audio directly in the interface.