Understanding Connections

Learn more about connections and subscriptions for Assistants

Overview

Connections in Kodexa define how Assistants respond to events in your system. They create data flows that connect event sources (like document uploads or data changes) to Assistants that process those events. Understanding connections is essential for building automated document processing workflows.

What are Connections?

A connection links an event source to an Assistant:

  • Source - Where events originate (Document Store, Workspace, Data Object)

  • Event Type - What kind of event triggers the connection

  • Assistant - What processes the event

  • Subscription - Filter that controls which events reach the Assistant

Example: A connection from a Document Store to a Model Assistant that processes only PDF invoices.
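
As a rough mental model, a connection can be pictured as a small record with the four parts listed above. The Python sketch below is illustrative only: the Connection class, its field names, and the store and Assistant names are hypothetical, not Kodexa's API or storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical record mirroring the four parts of a connection."""
    source: str              # where events originate, e.g. a Document Store
    event_type: str          # kind of event that triggers the connection
    assistant: str           # the Assistant that processes the event
    subscription: str = ""   # filter expression; empty means all events pass


# The example above: a Document Store feeding a Model Assistant,
# restricted to PDF invoices. All names here are made up for illustration.
invoice_connection = Connection(
    source="Invoices Document Store",
    event_type="content",
    assistant="Invoice Model Assistant",
    subscription="contentType == 'application/pdf' && metadata.documentType == 'invoice'",
)
print(invoice_connection)
```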

Event-Driven Processing Model

Kodexa uses an event-driven architecture:

How It Works

  1. Event occurs - A user uploads a document, data is saved, or a schedule fires

  2. Event emitted - The source resource publishes an event

  3. Connections evaluate - Each connection's subscription is checked

  4. Matching Assistants activated - Assistants with matching subscriptions receive the event

  5. Assistant processes - The Assistant returns a list of actions to perform

  6. Actions execute - Platform performs the requested actions (extract data, run model, etc.)
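
The steps above can be sketched as a small dispatch loop. This is a simplified illustration, not the platform's implementation: the dispatch function, the dictionary keys, and the use of Python callables in place of subscription expressions and Assistants are all assumptions made for the example.

```python
def dispatch(event: dict, connections: list[dict]) -> list[str]:
    """Steps 3 to 5: check each subscription, activate matching
    Assistants, and collect the actions they return."""
    actions: list[str] = []
    for conn in connections:
        if conn["event_type"] != event["type"]:
            continue  # connection listens for a different kind of event
        if conn["subscription"](event):
            actions.extend(conn["assistant"](event))
    return actions


# Two hypothetical connections on the same source. Both can match the
# same event, so more than one Assistant may process it.
connections = [
    {
        "event_type": "content",
        "subscription": lambda e: e["contentType"] == "application/pdf",
        "assistant": lambda e: [f"extract data from {e['filename']}"],
    },
    {
        "event_type": "content",
        "subscription": lambda e: True,  # stands in for an empty subscription
        "assistant": lambda e: [f"index {e['filename']}"],
    },
]

# Steps 1 and 2: an upload produces a content event.
event = {"type": "content", "contentType": "application/pdf", "filename": "invoice.pdf"}

# Step 6: the platform would carry out each returned action.
for action in dispatch(event, connections):
    print("execute:", action)
```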

Benefits

  • Automatic processing - Documents process without manual intervention

  • Scalable workflows - Add Assistants without changing existing flow

  • Flexible routing - Route different events to different Assistants

  • Parallel processing - Multiple Assistants can process the same event

Event Sources

Events can originate from different sources:

Content Events

Triggered when new content is added:

  • Document uploaded - New document added to Document Store

  • Content updated - Document content modified

  • Family created - New document family established

Use case: Automatically extract data from newly uploaded invoices.

Document Family Events

Triggered when document metadata changes:

  • Metadata updated - Document properties changed

  • Status changed - Processing state updated

  • Tags modified - Classification or labels changed

Use case: Re-process documents when status changes to "Review Complete".

Workspace Events

Triggered when users interact with data:

  • Data saved - User saves extracted data

  • Review completed - User finishes reviewing document

  • Corrections made - User corrects extraction errors

Use case: Train AI model after user corrections to improve accuracy.

Data Object Events

Triggered when extracted data changes:

  • Data created - New data object created

  • Data updated - Existing data modified

  • Validation completed - Data passed validation rules

Use case: Export data to external system when validation succeeds.

Scheduled Events

Triggered on a schedule:

  • Time-based - Run at specific times (daily, hourly, etc.)

  • Recurring - Repeat on a schedule

  • One-time - Run once at specified time

Use case: Generate daily reports at midnight or clean up old documents weekly.

Connection Configuration

Components of a Connection

1. Source

  • The resource that emits events

  • Can be a Document Store, Data Store, or other resource

  • One source can have multiple connections

2. Event Type

  • Specifies which events trigger this connection

  • Examples: "content", "family", "workspace", "data", "scheduled"

  • Determines what information is available to the subscription

3. Target Assistant

  • The Assistant that processes matching events

  • Most commonly a Model Assistant

  • Can be custom Assistants for specialized processing

4. Subscription Expression

  • Filter that determines whether an event reaches the Assistant

  • Written in Expression Language

  • Evaluates to true or false

  • Empty subscription = all events pass through

Subscriptions

Subscriptions control which events reach an Assistant using Expression Language.

Purpose

  • Filter events by document type

  • Route based on metadata properties

  • Process only documents meeting criteria

  • Prevent unnecessary processing

Expression Language

Subscriptions use expressions to evaluate event data:

Common patterns:

  • true - Process all events (default)

  • contentType == 'application/pdf' - Only PDF files

  • metadata.documentType == 'invoice' - Only invoices

  • status == 'ready' - Only documents in "ready" status

  • pageCount > 1 - Multi-page documents only

Combining conditions:

  • contentType == 'application/pdf' && pageCount < 10 - PDFs under 10 pages

  • metadata.urgent == true || metadata.priority == 'high' - Urgent or high priority
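
To see how these patterns behave, it can help to check them against a sample event by hand. The sketch below uses plain Python stand-ins for a few of the expressions above (&& becomes and, || becomes or). It is not the Expression Language evaluator, and the sample event values are invented for illustration.

```python
# A hypothetical content event, using the variable names from the patterns above.
sample_event = {
    "contentType": "application/pdf",
    "pageCount": 3,
    "status": "ready",
    "metadata": {"documentType": "invoice", "urgent": False, "priority": "high"},
}

# Each key is a subscription expression; each value is a Python equivalent.
checks = {
    "contentType == 'application/pdf'":
        lambda e: e["contentType"] == "application/pdf",
    "pageCount > 1":
        lambda e: e["pageCount"] > 1,
    "contentType == 'application/pdf' && pageCount < 10":
        lambda e: e["contentType"] == "application/pdf" and e["pageCount"] < 10,
    "metadata.urgent == true || metadata.priority == 'high'":
        lambda e: e["metadata"].get("urgent") is True
        or e["metadata"].get("priority") == "high",
}

results = {expr: check(sample_event) for expr, check in checks.items()}
for expr, matched in results.items():
    print(f"{expr!r} -> {matched}")
```

With this sample event every expression matches. Changing pageCount to 12 makes only the combined PDF condition fail.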

Available Variables

Variables available in subscription expressions depend on event type:

Content Events:

  • contentType - MIME type of document

  • pageCount - Number of pages

  • filename - Original filename

  • metadata - Custom metadata properties

Family Events:

  • status - Current processing status

  • tags - Associated tags

  • metadata - Family metadata

Workspace Events:

  • user - User who triggered the event

  • action - Type of action performed

  • dataChanged - Whether data was modified
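
Because the available variables differ by event type, a common mistake is referencing a variable the event does not carry. The helper below is a hypothetical pre-flight check, not a platform feature: it scans an expression for top-level names and reports any that are not in the lists above. The parsing is a deliberately naive regular expression and assumes single-quoted string literals.

```python
import re

# Variables per event type, taken from the lists above.
AVAILABLE = {
    "content": {"contentType", "pageCount", "filename", "metadata"},
    "family": {"status", "tags", "metadata"},
    "workspace": {"user", "action", "dataChanged"},
}


def unknown_variables(event_type: str, expression: str) -> set[str]:
    """Return top-level names used in the expression that the event type lacks."""
    # Drop single-quoted string literals so their contents are not read as names.
    without_strings = re.sub(r"'[^']*'", "", expression)
    # Keep identifiers that do not follow a dot, so 'metadata.documentType'
    # yields only 'metadata'.
    names = set(re.findall(r"(?<![\w.])[A-Za-z_]\w*", without_strings))
    names -= {"true", "false"}
    return names - AVAILABLE[event_type]


print(unknown_variables("content", "metadata.documentType == 'invoice' && pageCount > 1"))
print(unknown_variables("family", "contentType == 'application/pdf' && status == 'ready'"))
```

The second call reports contentType, which is listed for Content events but not for Family events.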

Data Flows

Data Flows visualize the connections between sources and Assistants in your Project.

What is a Data Flow?

  • Visual representation of event routing

  • Shows how events flow from sources to Assistants

  • Displays connection subscriptions

  • Helps understand and manage processing pipelines

Accessing Data Flows

  1. Open your Project

  2. Navigate to Manage Project

  3. Click Data Flows

  4. View the visual representation of your connections

Data Flow Features

  • Visual editor - Drag and drop to create connections

  • Connection details - View and edit subscription expressions

  • Assistant configuration - Configure Assistants inline

  • Testing - Test connections with sample events

Model Assistants and Connections

Model Assistants are the most common type of Assistant:

How They Work

  • Receive events through connections

  • Execute specified models in sequence

  • Each model processes the document

  • Return actions for the platform to perform

Configuration

  • Model list - Specify which models to run

  • Execution order - Models run in the order listed

  • Parameters - Configure model-specific settings

  • Error handling - Define behavior on model failure
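
The effect of execution order and error handling can be sketched as follows. This is an assumed, simplified model of a Model Assistant: run_model_assistant, the stop_on_error flag, and the two toy models are hypothetical and do not correspond to real Kodexa settings.

```python
def classify(doc: dict) -> dict:
    """Toy model: labels every document as an invoice."""
    return {**doc, "documentType": "invoice"}


def extract(doc: dict) -> dict:
    """Toy model: needs the label added by classify."""
    if doc.get("documentType") != "invoice":
        raise ValueError("document has not been classified")
    return {**doc, "fields": ["total", "date"]}


def run_model_assistant(doc: dict, models: list, stop_on_error: bool = True) -> list[str]:
    """Run models in the order listed and return actions for the platform."""
    actions = []
    for name, model in models:
        try:
            doc = model(doc)  # each model sees the previous model's output
            actions.append(f"store output of {name}")
        except Exception as exc:
            actions.append(f"report failure of {name}: {exc}")
            if stop_on_error:
                break
    return actions


doc = {"filename": "invoice.pdf"}
print(run_model_assistant(doc, [("classify", classify), ("extract", extract)]))
print(run_model_assistant(doc, [("extract", extract), ("classify", classify)]))
```

In the second call the order is reversed, so the extraction step fails before classification has run. This is why the order of the model list matters.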

Common Workflow

  1. Document uploaded to Document Store (Content event)

  2. Connection subscription evaluates (e.g., "PDFs only")

  3. Model Assistant receives event

  4. Executes extraction model

  5. Returns extracted data to platform

  6. Platform stores data and triggers Data Object events

Creating Effective Connections

Design Principles

  • Specific subscriptions - Filter events to reduce unnecessary processing

  • Single responsibility - Each connection handles one type of event

  • Clear naming - Name Assistants and connections descriptively

  • Test incrementally - Add connections one at a time and verify

Common Patterns

Document Type Routing

  • Invoice connection: metadata.type == 'invoice'

  • Receipt connection: metadata.type == 'receipt'

  • Contract connection: metadata.type == 'contract'
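
Taken together, the three subscriptions above behave like a lookup from document type to Assistant. A minimal sketch, assuming hypothetical Assistant names and a metadata.type property set at upload time:

```python
# Hypothetical Assistant names, one per document type.
ROUTES = {
    "invoice": "Invoice Assistant",
    "receipt": "Receipt Assistant",
    "contract": "Contract Assistant",
}


def route(event: dict):
    """Return the Assistant whose subscription would match, or None."""
    doc_type = event.get("metadata", {}).get("type")
    return ROUTES.get(doc_type)


print(route({"metadata": {"type": "receipt"}}))
print(route({"metadata": {"type": "purchase order"}}))
```

The second event matches none of the three subscriptions, so no Assistant would receive it. Unmatched types like this are worth monitoring.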

Priority-Based Processing

  • High priority: metadata.priority == 'high' → Fast model

  • Normal priority: metadata.priority == 'normal' → Standard model

  • Batch processing: metadata.batch == true → Scheduled Assistant

Quality Control Workflow

  • Initial extraction: Content event → Extraction Assistant

  • Human review: Workspace event → Review completion Assistant

  • Final processing: Data Object event → Export Assistant

Best Practices

  • Start simple - Begin with basic connections, add complexity as needed

  • Use descriptive subscriptions - Document what each subscription filters

  • Test with sample data - Verify connections work before production use

  • Monitor processing - Watch for events that don't match any connection

  • Avoid overlapping subscriptions - Ensure events don't trigger multiple similar Assistants

  • Document your flows - Maintain notes on why connections exist

  • Review regularly - Remove unused connections to keep flows clean
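
Two of these practices, testing with sample data and avoiding overlapping subscriptions, can be combined in a quick offline check. The sketch below is an assumption-laden illustration: the connection names and sample events are invented, and Python predicates stand in for the real subscription expressions.

```python
def matching_connections(event: dict, subscriptions: dict) -> list[str]:
    """Names of all connections whose subscription matches the event."""
    return [name for name, matches in subscriptions.items() if matches(event)]


# Hypothetical connections with Python stand-ins for their subscriptions.
subscriptions = {
    "pdf-extraction": lambda e: e["contentType"] == "application/pdf",
    "invoice-extraction": lambda e: e["metadata"].get("documentType") == "invoice",
}

samples = [
    {"filename": "invoice.pdf", "contentType": "application/pdf",
     "metadata": {"documentType": "invoice"}},
    {"filename": "contract.pdf", "contentType": "application/pdf",
     "metadata": {"documentType": "contract"}},
    {"filename": "photo.png", "contentType": "image/png", "metadata": {}},
]

report = {e["filename"]: matching_connections(e, subscriptions) for e in samples}
for filename, matched in report.items():
    if len(matched) > 1:
        print(f"{filename}: overlap between {matched}")
    elif not matched:
        print(f"{filename}: no connection matches")
    else:
        print(f"{filename}: handled by {matched[0]}")
```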

Troubleshooting Connections

Event Not Processing

  • Check subscription expression syntax

  • Verify event type matches connection

  • Ensure Assistant is enabled and configured

  • Review Assistant logs for errors
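
The checklist above can be read as an ordered series of tests. The sketch below walks through them for a single event and connection. The diagnose function and the dictionary keys it reads are hypothetical, and a Python predicate again stands in for the subscription expression.

```python
def diagnose(event: dict, connection: dict) -> str:
    """Return the first likely reason an event did not reach the Assistant."""
    if connection["event_type"] != event["type"]:
        return "event type does not match the connection"
    if not connection["assistant_enabled"]:
        return "Assistant is disabled"
    try:
        matched = connection["subscription"](event)
    except Exception as exc:
        # For example, the expression references a variable the event lacks.
        return f"subscription failed to evaluate: {exc!r}"
    if not matched:
        return "subscription evaluated to false"
    return "event should reach the Assistant; review the Assistant logs"


connection = {
    "event_type": "content",
    "assistant_enabled": True,
    "subscription": lambda e: e["pageCount"] > 1,
}

print(diagnose({"type": "family", "status": "ready"}, connection))
print(diagnose({"type": "content", "pageCount": 1}, connection))
print(diagnose({"type": "content", "filename": "scan.pdf"}, connection))
```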

Too Many Events Processing

  • Refine subscription to be more specific

  • Add additional filter conditions

  • Split into multiple targeted connections

Processing Delays

  • Check for resource bottlenecks

  • Verify Assistant performance

  • Consider parallel processing with multiple Assistants

Tips

  • Data Flows are owned by Projects - manage them from project settings

  • Empty subscriptions process all events from the source

  • Multiple connections can process the same event

  • Subscriptions use Expression Language - similar to formula expressions

  • Model Assistants can execute multiple models in sequence

  • Test subscriptions with sample events before deploying

  • Use the Data Flow visualization to understand your processing pipeline

  • Workspace events are great for training AI with human corrections

  • Scheduled events enable batch processing and reporting workflows