Vision

[Screenshot: Vision worker configuration interface]

The Vision worker analyzes images from URLs and generates comprehensive text descriptions using AI vision models. It's specifically designed for educational contexts, providing detailed descriptions of visual content including any written text found in images. The worker can process multiple images simultaneously and identify errors in written work when present.

2. Configuration Parameters

model: The AI model to use for image analysis (defaults to "openai/gpt-4o")
prompt: Custom prompt for image analysis (defaults to educational-focused description prompt)
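
As a rough illustration of how the two parameters and their defaults fit together, here is a minimal sketch. The `VisionConfig` type, the `resolveVisionConfig` helper, and the placeholder default prompt text are hypothetical names made up for this example, not the worker's actual API. Only the parameter names and the default model string come from the list above.

```typescript
// Hypothetical shape for the Vision worker's parameters (illustration only).
interface VisionConfig {
  model: string;
  prompt: string;
}

// Default model as documented above.
const DEFAULT_MODEL = "openai/gpt-4o";

// Placeholder: the worker's real default prompt text is not reproduced here.
const DEFAULT_PROMPT = "<default educational description prompt>";

// Fill in any parameter the caller did not set with its default.
function resolveVisionConfig(overrides: Partial<VisionConfig> = {}): VisionConfig {
  return {
    model: overrides.model ?? DEFAULT_MODEL,
    prompt: overrides.prompt ?? DEFAULT_PROMPT,
  };
}

// Usage: keep the default model, but customize the prompt.
const config = resolveVisionConfig({
  prompt: "Describe the diagram and transcribe any labels.",
});
console.log(config);
```

Leaving both fields unset would yield the documented default model and the (placeholder) default prompt.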

3. Input/Output Handles

input: Input handle - accepts text containing image URLs (supports PNG, JPEG, JPG, WEBP formats)
output: Output handle - returns detailed text descriptions of the analyzed images
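
To make the input handling more concrete, the sketch below shows one way image URLs could be pulled out of free text and filtered by format. It assumes the filtering is based on the file extension in the URL path, which this page does not confirm. The `extractImageUrls` helper and the sample text are hypothetical and are not the worker's own code.

```typescript
// Formats the input handle is described as supporting.
const SUPPORTED_EXTENSIONS = [".png", ".jpeg", ".jpg", ".webp"];

// Hypothetical helper: find http(s) URLs in free text and keep only those
// whose path ends in a supported image extension (case-insensitive).
function extractImageUrls(text: string): string[] {
  const candidates = text.match(/https?:\/\/[^\s<>"')]+/g) ?? [];
  return candidates.filter((candidate) => {
    try {
      // Use the path only, so query strings do not hide the extension.
      const path = new URL(candidate).pathname.toLowerCase();
      return SUPPORTED_EXTENSIONS.some((ext) => path.endsWith(ext));
    } catch {
      return false; // not a parseable URL
    }
  });
}

const sampleInput = `
Check out these student worksheets:
https://example.com/worksheet1.png
https://example.com/notes.pdf
https://example.com/diagram.JPG?size=large
`;

// The PDF link is dropped; the two image links are kept.
console.log(extractImageUrls(sampleInput));
```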

4. Usage Examples with Code

// Example text for the Vision worker's `input` handle.
// The text can mix ordinary prose with image URLs.
const inputText = `
Check out these student worksheets:
https://example.com/worksheet1.png
https://example.com/diagram.jpg
`;

// The worker extracts the supported image URLs from this text and analyzes them.
// The `output` handle returns comprehensive descriptions focused on educational content.

5. Integration Examples

The Vision worker integrates well with content analysis workflows, allowing you to convert visual materials into text for further processing by other workers or for accessibility purposes.

6. Best Practices

Ensure image URLs are publicly accessible and use supported formats (PNG, JPEG, JPG, WEBP)
Customize the prompt parameter for specific analysis needs beyond the default educational focus
Consider the 1000 token limit when analyzing multiple or complex images
Use clear, high-resolution images for better analysis results

7. Troubleshooting Tips

Verify that your OpenAI API key is properly configured if you encounter authentication errors
Check that image URLs are direct links to image files, not web pages containing images
Ensure images are in supported formats - unsupported formats will be filtered out automatically
If no output is generated, confirm that the input contains valid image URLs
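
When no output is produced, it can help to check each URL before sending it to the worker. The following is a minimal pre-flight sketch, again assuming the worker decides by file extension. `checkImageUrl` and the `UrlCheck` type are hypothetical names for this example. The check does not verify that a URL is publicly accessible or that the API key is valid.

```typescript
// Result of checking one URL (hypothetical type for this example).
type UrlCheck = { url: string; ok: boolean; reason: string };

const IMAGE_EXTENSIONS = [".png", ".jpeg", ".jpg", ".webp"];

// Explain why a URL would likely be accepted or skipped.
function checkImageUrl(url: string): UrlCheck {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return { url, ok: false, reason: "not a valid URL" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { url, ok: false, reason: "not an http(s) link" };
  }
  const path = parsed.pathname.toLowerCase();
  if (!IMAGE_EXTENSIONS.some((ext) => path.endsWith(ext))) {
    return { url, ok: false, reason: "path does not end in a supported image extension" };
  }
  return { url, ok: true, reason: "looks like a direct image link" };
}

const results = [
  "https://example.com/worksheet1.png",
  "https://example.com/gallery/page.html", // a web page, not an image file
  "worksheet2.jpg", // relative path, not a full URL
].map(checkImageUrl);

for (const r of results) {
  console.log(`${r.ok ? "OK  " : "SKIP"} ${r.url}: ${r.reason}`);
}
```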