Schema

Screenshots

[Screenshot: Schema Worker interface, configuring the data extraction schema]

The Schema worker generates structured data by extracting information from input text based on a predefined schema. It uses TypeChat and Zod validation to ensure the output conforms to the specified structure.

Key Features

Dynamic Schema Definition: Define custom data structures using connected handlers
Type Safety: Built-in validation using Zod schemas
Flexible Field Types: Support for strings, numbers, booleans, arrays, and enums
AI-Powered Extraction: Uses OpenAI models for intelligent data parsing

Configuration

Parameters

Model: OpenAI model to use (default: gpt-4o)

Input/Output

Input: Raw text content to extract data from
JSON Output: Structured JSON object matching the defined schema
Dynamic Fields: Additional output fields based on connected handlers

Use Cases

1. Contact Information Extraction

Extract structured contact details from unformatted text:

Name, email, phone number
Address components
Company information

2. Product Data Parsing

Structure product information from descriptions:

Price, features, specifications
Categories and tags
Availability status

3. Event Information Processing

Extract event details from text:

Date, time, location
Attendee information
Requirements and restrictions

4. Document Analysis

Parse structured data from documents:

Form field extraction
Data normalization
Content categorization

How It Works

1. Schema Definition: Connected handlers define the output structure
2. Type Mapping: Field types are converted to Zod validation schemas
3. AI Processing: Input text is processed using the specified model
4. Validation: Output is validated against the defined schema
5. Field Population: Extracted data populates the connected output fields
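
The Type Mapping and Validation steps can be sketched without any dependencies. This is a minimal illustration, not the worker's actual code: the worker itself builds Zod schemas, whereas the `FieldDef` descriptor, `toCheck`, and `invalidFields` below are hypothetical names introduced here, and plain runtime checks stand in for Zod.

```typescript
// Hypothetical field descriptor: stands in for what a connected handler declares.
type FieldType = "string" | "number" | "boolean" | "string[]" | "enum";

interface FieldDef {
  name: string;
  type: FieldType;
  enumValues?: string[]; // only used when type is "enum"
}

type Check = (value: unknown) => boolean;

// Type mapping: turn a declared field type into a runtime check.
function toCheck(field: FieldDef): Check {
  switch (field.type) {
    case "string":
      return (v) => typeof v === "string";
    case "number":
      return (v) => typeof v === "number" && isFinite(v);
    case "boolean":
      return (v) => typeof v === "boolean";
    case "string[]":
      return (v) => Array.isArray(v) && v.every((x) => typeof x === "string");
    case "enum": {
      const allowed = field.enumValues ?? [];
      return (v) => typeof v === "string" && allowed.indexOf(v) !== -1;
    }
  }
}

// Validation: list the fields whose extracted value fails its check.
function invalidFields(fields: FieldDef[], candidate: Record<string, unknown>): string[] {
  return fields.filter((f) => !toCheck(f)(candidate[f.name])).map((f) => f.name);
}

const eventFields: FieldDef[] = [
  { name: "title", type: "string" },
  { name: "capacity", type: "number" },
  { name: "format", type: "enum", enumValues: ["online", "in-person"] },
];

// Pretend this object came back from the model.
const modelOutput = { title: "Launch party", capacity: "fifty", format: "online" };

// "capacity" fails because the value is a string, not a number.
console.log(invalidFields(eventFields, modelOutput));
```

Keeping the mapping in one function means a new field type only needs one extra `case`, which mirrors how a single type-to-schema conversion step keeps the rest of the pipeline unchanged.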

Best Practices

Schema Design

Use descriptive field names that match expected content
Provide clear prompts for each field to guide extraction
Choose appropriate data types for validation

Input Preparation

Ensure input text contains the information you want to extract
Use consistent formatting when possible
Provide sufficient context for accurate extraction

Model Selection

Use gpt-4o for complex schema extraction
Consider gpt-3.5-turbo for simpler, faster processing
Test several models on representative inputs and compare extraction accuracy

Validation Handling

Design schemas to handle optional fields gracefully
Use enum types for fields with limited valid values
Consider array types for lists or multiple values
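
As a rough sketch of these three points, the snippet below treats one field as optional, restricts another to an enum, and collects multiple values in an array. The `Product` shape, the `STATUSES` values, and `parseProduct` are assumptions made for illustration only; in the worker, equivalent rules would be expressed through the schema rather than hand-written checks.

```typescript
// Hypothetical enum values for an availability field.
const STATUSES = ["available", "backorder", "discontinued"] as const;
type Status = (typeof STATUSES)[number];

interface Product {
  name: string;
  status: Status;   // enum: only a limited set of values is valid
  tags: string[];   // array: zero or more values
  price?: number;   // optional: may be missing from the source text
}

function isStatus(value: unknown): value is Status {
  return typeof value === "string" && (STATUSES as readonly string[]).indexOf(value) !== -1;
}

// Returns null when a required field is missing or invalid;
// a missing optional field is simply left out of the result.
function parseProduct(raw: Record<string, unknown>): Product | null {
  const { name, status, tags, price } = raw;
  if (typeof name !== "string" || !isStatus(status)) return null;
  const product: Product = {
    name,
    status,
    tags: Array.isArray(tags) ? tags.filter((t): t is string => typeof t === "string") : [],
  };
  if (typeof price === "number") product.price = price;
  return product;
}

// No price in the input: the result is still valid, just without that field.
console.log(parseProduct({ name: "Desk lamp", status: "available", tags: ["home"] }));
// Status outside the enum: the whole object is rejected.
console.log(parseProduct({ name: "Desk lamp", status: "sold out" }));
```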

Example Schema

// Example: Contact extraction schema
interface ContactSchema {
  name: string;           // Person's full name
  email: string;          // Email address
  phone: string;          // Phone number
  company: string;        // Company name
  role: string;           // Job title
  tags: string[];         // Category tags
}
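
To make the schema concrete, here is a hypothetical input and the kind of object the JSON Output could contain for it. The sample text and every value are invented for illustration, and the interface is repeated so the snippet stands alone; actual model output may differ, especially for free-form fields such as `tags`.

```typescript
interface ContactSchema {
  name: string;
  email: string;
  phone: string;
  company: string;
  role: string;
  tags: string[];
}

// Hypothetical input text, invented for this example.
const inputText =
  "Hi, I'm Dana Reyes, Head of Procurement at Northwind Traders. " +
  "Reach me at dana.reyes@example.com or 555-0142.";

// A plausible extraction result for that input (illustrative only).
const expected: ContactSchema = {
  name: "Dana Reyes",
  email: "dana.reyes@example.com",
  phone: "555-0142",
  company: "Northwind Traders",
  role: "Head of Procurement",
  tags: ["procurement", "vendor"],
};

console.log(JSON.stringify(expected, null, 2));
```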

Integration Tips

Chain with Text Workers: Process raw content before schema extraction
Combine with Display: Show extracted data in formatted views
Use with State: Store extracted data for later use
Connect to API: Send structured data to external systems

Error Handling

The worker includes built-in validation and will:

Skip extraction if input is empty
Handle partial matches gracefully
Provide structured error information
Maintain type safety throughout the process
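
One way to picture this behavior is a result type with three outcomes: skipped, ok, and error. The sketch below is an assumption about shape, not the worker's real return format. `ExtractionResult`, `runExtraction`, and the injected `extract` and `isValid` functions are hypothetical, and the model call is synchronous here only so the example runs offline.

```typescript
// Hypothetical result shape: one variant per outcome.
type ExtractionResult<T> =
  | { status: "skipped"; reason: string }
  | { status: "ok"; data: T }
  | { status: "error"; issues: string[] };

function runExtraction<T>(
  input: string,
  extract: (text: string) => unknown,          // stands in for the model call
  isValid: (value: unknown) => value is T,     // stands in for schema validation
): ExtractionResult<T> {
  // Empty input: skip the model call entirely.
  if (input.trim() === "") {
    return { status: "skipped", reason: "input is empty" };
  }
  let raw: unknown;
  try {
    raw = extract(input);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", issues: [message] };
  }
  // Only validated data reaches the typed "ok" branch.
  if (!isValid(raw)) {
    return { status: "error", issues: ["output does not match the schema"] };
  }
  return { status: "ok", data: raw };
}

const isNameRecord = (v: unknown): v is { name: string } =>
  typeof v === "object" && v !== null && typeof (v as { name?: unknown }).name === "string";

console.log(runExtraction("", () => ({ name: "unused" }), isNameRecord));
console.log(runExtraction("Ada Lovelace", (text) => ({ name: text }), isNameRecord));
```

Because callers must check `status` before reading `data`, the compiler enforces that unvalidated output is never used as if it matched the schema.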

Performance Considerations

Larger schemas may require more processing time
Complex nested structures need more capable models
Consider breaking large schemas into smaller, focused ones
Cache results when processing similar content repeatedly
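
The caching suggestion can be sketched as a small wrapper that remembers results per model and input text. `withCache` and the stub extractor are hypothetical names; this in-memory version never evicts entries, so a long-running process would likely need a size limit or expiry.

```typescript
// Wraps an extraction function so identical (model, text) pairs are computed once.
function withCache<T>(
  extract: (model: string, text: string) => T,
): (model: string, text: string) => T {
  const cache = new Map<string, T>();
  return (model, text) => {
    // JSON.stringify gives an unambiguous key for the pair.
    const key = JSON.stringify([model, text]);
    if (cache.has(key)) {
      return cache.get(key) as T;
    }
    const result = extract(model, text);
    cache.set(key, result);
    return result;
  };
}

// Stub extractor that counts how often it is really invoked.
let callCount = 0;
const stubExtract = (model: string, text: string): string => {
  callCount += 1;
  return `${model} processed ${text.length} characters`;
};

const cachedExtract = withCache(stubExtract);
cachedExtract("gpt-4o", "same text");
cachedExtract("gpt-4o", "same text");        // served from the cache
cachedExtract("gpt-3.5-turbo", "same text"); // different model, new entry
console.log(callCount);
```

If the wrapped function returns a Promise, the wrapper caches the Promise itself, so concurrent requests for the same input share one in-flight call.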