
STT

📸 Screenshots

Here is a visual example of this section:

STT - Worker Configuration Interface

1. Overview and Purpose

The STT (Speech-to-Text) worker converts audio files containing speech into text transcriptions. It uses OpenAI's Whisper model to provide accurate speech recognition for various audio formats including MP3, WAV, and OGG.

2. Configuration Parameters

The worker accepts the following parameters:

  • engine: Specifies the speech recognition engine to use (defaults to "whisper-1")

3. Input/Output Handles

  • input: Input handle - accepts audio data in two forms: a base64-encoded audio object ({audio: string, ext: string}) or a URL string pointing to an audio file
  • output: Output handle - returns the transcribed text as a string
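The two accepted input shapes can be distinguished at runtime by type. As a sketch, a small helper (the function and its return fields are illustrative, not part of the worker API) might normalize either form:

```javascript
// Normalize the two accepted input shapes into one descriptor.
// A base64 payload arrives as { audio, ext }; a URL arrives as a plain string.
function describeAudioInput(input) {
  if (typeof input === "string") {
    return { kind: "url", source: input };
  }
  if (input && typeof input.audio === "string" && typeof input.ext === "string") {
    return { kind: "base64", extension: input.ext };
  }
  throw new TypeError("Unsupported STT input: expected a URL string or {audio, ext}");
}

console.log(describeAudioInput("https://example.com/clip.mp3")); // kind: "url"
console.log(describeAudioInput({ audio: "UklGRg==", ext: "wav" })); // kind: "base64"
```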

4. Usage Examples with Code

// Configure STT worker with Whisper engine
const sttWorker = {
  parameters: {
    engine: "whisper-1"
  },
  fields: {
    input: { value: { audio: "base64AudioData", ext: "mp3" } },
    output: { value: null }
  }
}
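Before populating the input field with base64 data, the raw audio bytes must be encoded. A minimal sketch in Node.js (assuming a Node environment; the inline byte array stands in for a real file read):

```javascript
// Encode raw audio bytes to base64 for the worker's input field (Node.js).
// In practice the bytes would come from fs.readFileSync("clip.wav").
const audioBytes = Buffer.from([0x52, 0x49, 0x46, 0x46]); // "RIFF" header bytes
const inputValue = {
  audio: audioBytes.toString("base64"),
  ext: "wav",
};
console.log(inputValue.audio); // "UklGRg=="
```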

5. Integration Examples

This worker integrates well with Content workers that fetch audio files from URLs, and can feed transcribed text to LLM workers for further processing or analysis.
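A chained setup like the one described above might be wired as follows; the array shape and worker names here are purely illustrative and not the platform's actual graph schema:

```javascript
// Hypothetical pipeline: fetch audio, transcribe it, then summarize the text.
const pipeline = [
  { worker: "content", fields: { input: "https://example.com/interview.mp3" } },
  { worker: "stt", parameters: { engine: "whisper-1" } },
  { worker: "llm", parameters: { prompt: "Summarize the transcript." } },
];
console.log(pipeline.map((step) => step.worker).join(" -> ")); // content -> stt -> llm
```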

6. Best Practices

  • Ensure audio files are in supported formats (MP3, WAV, OGG) for best results
  • Use clear, high-quality audio recordings to improve transcription accuracy
  • Consider audio file size limits when processing longer recordings
  • Verify your OpenAI API key has access to the Whisper API
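The format and size checks above can be run as a pre-flight step before calling the worker. A sketch (OpenAI's Whisper API currently limits uploads to 25 MB, but treat the exact limit as subject to change):

```javascript
// Pre-flight validation of an audio payload before sending it to the worker.
const SUPPORTED_EXTENSIONS = ["mp3", "wav", "ogg"];
const MAX_BYTES = 25 * 1024 * 1024; // Whisper API upload limit at time of writing

function validateAudio(ext, sizeInBytes) {
  const errors = [];
  if (!SUPPORTED_EXTENSIONS.includes(ext.toLowerCase())) {
    errors.push(`Unsupported format: ${ext}`);
  }
  if (sizeInBytes > MAX_BYTES) {
    errors.push(`File too large: ${sizeInBytes} bytes (limit ${MAX_BYTES})`);
  }
  return errors;
}

console.log(validateAudio("mp3", 1024)); // [] (valid)
console.log(validateAudio("flac", 1024)); // one error: unsupported format
```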

7. Troubleshooting Tips

  • If transcription fails, check that the input audio format is supported and properly encoded
  • Verify the audio file is not corrupted by testing with a different audio player first
  • For URL inputs, ensure the audio file is publicly accessible and not behind authentication
  • Check OpenAI API quotas if requests are being rejected
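When debugging a failing transcription, a quick structural check on the payload can rule out a mangled base64 string before looking at the API side. This sketch only verifies base64 syntax, not that the decoded bytes are valid audio:

```javascript
// Cheap sanity check: is the string plausibly valid base64?
function looksLikeBase64(s) {
  return typeof s === "string" &&
    s.length > 0 &&
    s.length % 4 === 0 &&
    /^[A-Za-z0-9+/]+={0,2}$/.test(s);
}

console.log(looksLikeBase64("UklGRg==")); // true
console.log(looksLikeBase64("not base64!")); // false
```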