Overview
Connections in Kodexa define how Assistants respond to events in your system. They create data flows that connect event sources (like document uploads or data changes) to Assistants that process those events. Understanding connections is essential for building automated document processing workflows.
What are Connections?
A connection links an event source to an Assistant:
Source - Where events originate (Document Store, Workspace, Data Object)
Event Type - What kind of event triggers the connection
Assistant - What processes the event
Subscription - Filter that controls which events reach the Assistant
Example: A connection from a Document Store to a Model Assistant that processes only PDF invoices.
Event-Driven Processing Model
Kodexa uses an event-driven architecture:
How It Works
Event occurs - A user uploads a document, data is saved, or a schedule fires
Event emitted - The source resource publishes an event
Connections evaluate - Each connection's subscription is checked
Matching Assistants activated - Assistants with matching subscriptions receive the event
Assistant processes - The Assistant returns a list of actions to perform
Actions execute - Platform performs the requested actions (extract data, run model, etc.)
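The steps above can be sketched in plain Python. This is only an illustration of the flow, not the Kodexa API: the `Connection` class, the `dispatch` function, and the action strings are all hypothetical names, and subscriptions are modelled as Python callables rather than Expression Language strings.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only; they are not part of any Kodexa SDK.
Event = dict
Action = str


@dataclass
class Connection:
    event_type: str                              # which kind of event this connection listens for
    subscription: Callable[[Event], bool]        # filter: does this event reach the Assistant?
    assistant: Callable[[Event], list[Action]]   # returns the actions the platform should perform


def dispatch(event_type: str, event: Event, connections: list[Connection]) -> list[Action]:
    """Check every connection and collect actions from each matching Assistant."""
    actions: list[Action] = []
    for conn in connections:
        if conn.event_type != event_type:
            continue
        if conn.subscription(event):
            actions.extend(conn.assistant(event))
    return actions


connections = [
    # Only PDFs reach the (made-up) extraction Assistant.
    Connection("content",
               lambda e: e["contentType"] == "application/pdf",
               lambda e: [f"extract:{e['filename']}"]),
    # A subscription that is always true lets every content event through.
    Connection("content",
               lambda e: True,
               lambda e: [f"index:{e['filename']}"]),
]

pdf_actions = dispatch("content", {"contentType": "application/pdf", "filename": "a.pdf"}, connections)
png_actions = dispatch("content", {"contentType": "image/png", "filename": "b.png"}, connections)
```

Both connections match the PDF, so two Assistants contribute actions for the same event. Only the catch-all connection matches the PNG.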
Benefits
Automatic processing - Documents process without manual intervention
Scalable workflows - Add Assistants without changing existing flow
Flexible routing - Route different events to different Assistants
Parallel processing - Multiple Assistants can process the same event
Event Sources
Events can originate from different sources:
Content Events
Triggered when new content is added:
Document uploaded - New document added to Document Store
Content updated - Document content modified
Family created - New document family established
Use case: Automatically extract data from newly uploaded invoices.
Document Family Events
Triggered when document metadata changes:
Metadata updated - Document properties changed
Status changed - Processing state updated
Tags modified - Classification or labels changed
Use case: Re-process documents when status changes to "Review Complete".
Workspace Events
Triggered when users interact with data:
Data saved - User saves extracted data
Review completed - User finishes reviewing document
Corrections made - User corrects extraction errors
Use case: Train AI model after user corrections to improve accuracy.
Data Object Events
Triggered when extracted data changes:
Data created - New data object created
Data updated - Existing data modified
Validation completed - Data passed validation rules
Use case: Export data to external system when validation succeeds.
Scheduled Events
Triggered on a schedule:
Time-based - Run at specific times (daily, hourly, etc.)
Recurring - Repeat on a schedule
One-time - Run once at specified time
Use case: Generate daily reports at midnight or clean up old documents weekly.
Connection Configuration
Components of a Connection
1. Source
The resource that emits events
Can be a Document Store, Data Store, or other resource
One source can have multiple connections
2. Event Type
Specifies which events trigger this connection
Examples: "content", "family", "workspace", "data", "scheduled"
Determines what information is available to the subscription
3. Target Assistant
The Assistant that processes matching events
Most commonly a Model Assistant
Can be custom Assistants for specialized processing
4. Subscription Expression
Filter that determines if event reaches the Assistant
Written in Expression Language
Evaluates to true or false
Empty subscription = all events pass through
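As a rough mental model, the four components can be written down as a small record. The class below is a hypothetical sketch, not Kodexa's actual configuration schema: the field names and the resource names in the usage example are assumptions. The event type values are the examples listed above.

```python
from dataclasses import dataclass

# Event type names taken from the examples in this section.
EVENT_TYPES = {"content", "family", "workspace", "data", "scheduled"}


@dataclass(frozen=True)
class ConnectionConfig:
    """Hypothetical record of the four parts of a connection."""
    source: str             # resource that emits events
    event_type: str         # which events trigger the connection
    assistant: str          # Assistant that processes matching events
    subscription: str = ""  # filter expression; empty means all events pass

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type!r}")

    @property
    def passes_all_events(self) -> bool:
        # An empty subscription, or the literal expression "true", filters nothing.
        return self.subscription.strip() in ("", "true")


# Resource names here are invented for the example.
unfiltered = ConnectionConfig("invoice-store", "content", "invoice-extractor")
filtered = ConnectionConfig("invoice-store", "content", "invoice-extractor",
                            subscription="contentType == 'application/pdf'")
```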
Subscriptions
Subscriptions control which events reach an Assistant using Expression Language.
Purpose
Filter events by document type
Route based on metadata properties
Process only documents meeting criteria
Prevent unnecessary processing
Expression Language
Subscriptions use expressions to evaluate event data:
Common patterns:
true - Process all events (default)
contentType == 'application/pdf' - Only PDF files
metadata.documentType == 'invoice' - Only invoices
status == 'ready' - Only documents in "ready" status
pageCount > 1 - Multi-page documents only
Combining conditions:
contentType == 'application/pdf' && pageCount < 10 - PDFs under 10 pages
metadata.urgent == true || metadata.priority == 'high' - Urgent or high priority
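To make the logic of the two combined expressions concrete, here is a Python rendering of each. This is not how Kodexa evaluates subscriptions; it is a sketch that mirrors the boolean logic, using `SimpleNamespace` to imitate dotted access such as `metadata.priority`. The function names and sample values are made up.

```python
from types import SimpleNamespace


def small_pdf(event) -> bool:
    # Mirrors: contentType == 'application/pdf' && pageCount < 10
    return event.contentType == "application/pdf" and event.pageCount < 10


def urgent_or_high_priority(event) -> bool:
    # Mirrors: metadata.urgent == true || metadata.priority == 'high'
    return event.metadata.urgent is True or event.metadata.priority == "high"


# Sample event data, invented for the example.
short_high = SimpleNamespace(
    contentType="application/pdf",
    pageCount=3,
    metadata=SimpleNamespace(urgent=False, priority="high"),
)
long_normal = SimpleNamespace(
    contentType="application/pdf",
    pageCount=12,
    metadata=SimpleNamespace(urgent=False, priority="normal"),
)
```

With `&&`, every condition must hold, so the 12-page PDF fails the first expression even though its content type matches. With `||`, one true condition is enough, so a document that is not urgent still passes when its priority is high.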
Available Variables
Variables available in subscription expressions depend on event type:
Content Events:
contentType - MIME type of document
pageCount - Number of pages
filename - Original filename
metadata - Custom metadata properties
Family Events:
status - Current processing status
tags - Associated tags
metadata - Family metadata
Workspace Events:
user - User who triggered the event
action - Type of action performed
dataChanged - Whether data was modified
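Because the variables differ by event type, an expression that is valid on one connection may reference names that do not exist on another. The helper below is a hypothetical lint check, not a Kodexa feature. It assumes the variable lists above are complete and that string literals use single quotes, and it reports any top-level names an expression uses that the event type does not provide.

```python
import re

# Variable names per event type, copied from the lists above.
AVAILABLE = {
    "content": {"contentType", "pageCount", "filename", "metadata"},
    "family": {"status", "tags", "metadata"},
    "workspace": {"user", "action", "dataChanged"},
}
LITERALS = {"true", "false"}


def unknown_variables(event_type: str, expression: str) -> list[str]:
    """Return top-level names in the expression that the event type does not define."""
    # Drop single-quoted string literals so their contents are not read as names.
    without_strings = re.sub(r"'[^']*'", "", expression)
    # Match identifiers not preceded by a word character or a dot, so that in
    # metadata.priority only the root name "metadata" is collected.
    names = re.findall(r"(?<![\w.])[A-Za-z_]\w*", without_strings)
    return sorted(set(names) - AVAILABLE[event_type] - LITERALS)
```

For example, `unknown_variables("family", "pageCount > 1")` flags `pageCount`, since that variable is listed only for Content events.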
Data Flows
Data Flows visualize the connections between sources and Assistants in your Project.
What is a Data Flow?
Visual representation of event routing
Shows how events flow from sources to Assistants
Displays connection subscriptions
Helps understand and manage processing pipelines
Accessing Data Flows
Open your Project
Navigate to Manage Project
Click Data Flows
View visual representation of connections
Data Flow Features
Visual editor - Drag and drop to create connections
Connection details - View and edit subscription expressions
Assistant configuration - Configure Assistants inline
Testing - Test connections with sample events
Model Assistants and Connections
Model Assistants are the most common type of Assistant:
How They Work
Receive events through connections
Execute specified models in sequence
Each model processes the document
Return actions for the platform to perform
Configuration
Model list - Specify which models to run
Execution order - Models run in the order listed
Parameters - Configure model-specific settings
Error handling - Define behavior on model failure
Common Workflow
Document uploaded to Document Store (Content event)
Connection subscription evaluates (e.g., "PDFs only")
Model Assistant receives event
Executes extraction model
Returns extracted data to platform
Platform stores data and triggers Data Object events
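The "models run in the order listed" behaviour, together with a configurable reaction to model failure, can be sketched as follows. This is a simplified illustration under assumed names: `run_models`, the `stop_on_error` flag, and the toy models are hypothetical and do not reflect how a Model Assistant is actually implemented or configured.

```python
from typing import Callable

Document = dict
Model = Callable[[Document], Document]


def classify(doc: Document) -> Document:
    # Toy model: labels every document as an invoice.
    return {**doc, "documentType": "invoice"}


def extract_total(doc: Document) -> Document:
    # Toy model: adds a fixed extracted value.
    return {**doc, "total": "42.00"}


def broken(doc: Document) -> Document:
    # Toy model that always fails, used to show error handling.
    raise ValueError("bad page")


def run_models(document: Document, models: list[Model],
               stop_on_error: bool = True) -> tuple[Document, list[str]]:
    """Run models in list order; each model receives the previous model's output."""
    errors: list[str] = []
    for model in models:
        try:
            document = model(document)
        except Exception as exc:
            errors.append(f"{model.__name__}: {exc}")
            if stop_on_error:
                break  # later models are skipped
    return document, errors


result, errors = run_models({"filename": "a.pdf"}, [classify, extract_total])
```

With `stop_on_error=False`, a failing model is recorded and the remaining models still run on the last good document. Whether that is desirable depends on whether later models rely on the failed model's output.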
Creating Effective Connections
Design Principles
Specific subscriptions - Filter events to reduce unnecessary processing
Single responsibility - Each connection handles one type of event
Clear naming - Name Assistants and connections descriptively
Test incrementally - Add connections one at a time and verify
Common Patterns
Document Type Routing
Invoice connection: metadata.type == 'invoice'
Receipt connection: metadata.type == 'receipt'
Contract connection: metadata.type == 'contract'
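The routing pattern above relies on subscriptions that are mutually exclusive, so each document reaches exactly one connection. A small sketch, using Python callables in place of the real subscription expressions and a hypothetical `matching_connections` helper, shows how to check that property and spot documents that match nothing:

```python
# Each entry mirrors one subscription from the pattern above.
SUBSCRIPTIONS = {
    "Invoice connection": lambda metadata: metadata.get("type") == "invoice",
    "Receipt connection": lambda metadata: metadata.get("type") == "receipt",
    "Contract connection": lambda metadata: metadata.get("type") == "contract",
}


def matching_connections(metadata: dict) -> list[str]:
    """Return the names of all connections whose subscription accepts this metadata."""
    return [name for name, matches in SUBSCRIPTIONS.items() if matches(metadata)]


invoice_routes = matching_connections({"type": "invoice"})
memo_routes = matching_connections({"type": "memo"})
```

An empty result, as for the made-up "memo" type here, means no Assistant would see the document, which is worth monitoring for.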
Priority-Based Processing
High priority: metadata.priority == 'high' → Fast model
Normal priority: metadata.priority == 'normal' → Standard model
Batch processing: metadata.batch == true → Scheduled Assistant
Quality Control Workflow
Initial extraction: Content event → Extraction Assistant
Human review: Workspace event → Review completion Assistant
Final processing: Data Object event → Export Assistant
Best Practices
Start simple - Begin with basic connections, add complexity as needed
Use descriptive subscriptions - Document what each subscription filters
Test with sample data - Verify connections work before production use
Monitor processing - Watch for events that don't match any connection
Avoid overlapping subscriptions - Ensure an event does not trigger several Assistants that do the same work
Document your flows - Maintain notes on why connections exist
Review regularly - Remove unused connections to keep flows clean
Troubleshooting Connections
Event Not Processing
Check subscription expression syntax
Verify event type matches connection
Ensure Assistant is enabled and configured
Review Assistant logs for errors
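The checklist above can be read as a sequence of questions to ask about one connection and one event. The sketch below encodes that sequence. Everything in it is hypothetical (the `Connection` fields, the `diagnose` function, the reason strings), and a real investigation would use the platform's own logs and Data Flow view.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Connection:
    # Hypothetical stand-in for a configured connection.
    event_type: str
    subscription: Callable[[dict], bool]
    assistant_enabled: bool = True


def diagnose(conn: Connection, event_type: str, event: dict) -> Optional[str]:
    """Return the first reason the event would be skipped, or None if it would be processed."""
    if conn.event_type != event_type:
        return "event type mismatch"
    if not conn.assistant_enabled:
        return "assistant disabled"
    try:
        matched = conn.subscription(event)
    except (KeyError, AttributeError, TypeError) as exc:
        # Stands in for a subscription that references a missing or mistyped variable.
        return f"subscription error: {exc!r}"
    return None if matched else "subscription evaluated to false"


pdf_only = Connection("content", lambda e: e["contentType"] == "application/pdf")
```

Calling `diagnose(pdf_only, "content", {"contentType": "image/png"})` returns `"subscription evaluated to false"`, which points at the filter rather than at the Assistant.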
Too Many Events Processing
Refine subscription to be more specific
Add additional filter conditions
Split into multiple targeted connections
Processing Delays
Check for resource bottlenecks
Verify Assistant performance
Consider parallel processing with multiple Assistants
Tips
Data Flows are owned by Projects - manage them from project settings
Empty subscriptions process all events from the source
Multiple connections can process the same event
Subscriptions use Expression Language - similar to formula expressions
Model Assistants can execute multiple models in sequence
Test subscriptions with sample events before deploying
Use the Data Flow visualization to understand your processing pipeline
Workspace events are great for training AI with human corrections
Scheduled events enable batch processing and reporting workflows
