Overview
Data Definitions in Kodexa provide a structured blueprint for organizing and extracting data from documents. They use a hierarchical structure to categorize information at multiple levels, making it easier to manage and retrieve data. Data Definitions are fundamental to Kodexa, allowing you to create custom frameworks that meet your specific document processing needs.
What is a Data Definition?
A Data Definition is a hierarchical structure that defines:
What data to extract from documents
How to organize that data
What types of values to expect (text, numbers, dates, etc.)
Relationships between different data elements
For example, a Data Definition for invoices might include groups for "Header Information" containing elements like "Invoice Number" and "Date", and "Line Items" containing elements like "Description", "Quantity", and "Price".
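The invoice example above can be pictured as a small tree. The following is a minimal sketch in Python, not the Kodexa API: the `Element` class, the `leaf_paths` helper, the "Invoice" root, and the value types assigned to each element are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """Hypothetical tree node; a value_type of None marks a group."""
    name: str
    value_type: Optional[str] = None
    children: List["Element"] = field(default_factory=list)


def leaf_paths(element: Element, prefix: str = "") -> Iterator[str]:
    """Yield a 'Group / Element' path for every non-group element."""
    path = f"{prefix} / {element.name}" if prefix else element.name
    if element.value_type is not None:
        yield path
    for child in element.children:
        yield from leaf_paths(child, path)


# Groups and elements from the invoice example; the types are assumed.
invoice = Element("Invoice", children=[
    Element("Header Information", children=[
        Element("Invoice Number", "string"),
        Element("Date", "date"),
    ]),
    Element("Line Items", children=[
        Element("Description", "string"),
        Element("Quantity", "number"),
        Element("Price", "currency"),
    ]),
])

invoice_paths = list(leaf_paths(invoice))
for p in invoice_paths:
    print(p)
```

Each printed path shows where a value sits in the hierarchy, which is the information a group adds on top of the element's own name.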
Viewing Data Definitions
To view your Data Definitions:
Open a Project
Navigate to the Data Definitions section
You'll see a grid of cards, each representing one Data Definition
Data Definition Cards
Each card displays:
Name - The Data Definition's display name
Description - A brief explanation of its purpose
Element Count - How many data elements it contains
Version - The current version number
Public Indicator - An earth icon that appears when the definition is publicly accessible
Creating a New Data Definition
To create a new Data Definition:
Click the New Data Definition button in the header
Enter a name and description for your Data Definition
Configure initial settings
Start adding data elements
Working with Data Elements
Hierarchical Structure
Data Definitions use a tree structure built from:
Root Elements - Top-level data elements or groups
Groups - Containers that organize related data elements
Data Elements - Individual fields that capture specific values
Nested Structure - Elements can be nested within groups multiple levels deep
Creating Your First Element
When a Data Definition is empty, the editor shows an empty state. To add your first element:
Click Create a Data Element
Name your root element (e.g., "Contract" for contract documents)
Add child elements under it (e.g., "Contract Date", "Parties", "Terms")
Example: For contract documents, you might create:
Contract (group)
  Contract Number (text)
  Contract Date (date)
  Parties (group)
    Party Name (text)
    Party Role (text)
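To make the nesting in the contract example explicit, here is a rough sketch that renders the tree as an indented outline and counts its non-group elements. The `(name, kind, children)` tuple layout and both helper functions are hypothetical, and placing "Parties" under "Contract" is an assumption based on the steps above.

```python
# Hypothetical representation: (name, kind, children) tuples.
# This is not a Kodexa file format.
contract = ("Contract", "group", [
    ("Contract Number", "text", []),
    ("Contract Date", "date", []),
    ("Parties", "group", [
        ("Party Name", "text", []),
        ("Party Role", "text", []),
    ]),
])


def outline(node, depth=0):
    """Return outline lines, indented two spaces per nesting level."""
    name, kind, children = node
    lines = [f"{'  ' * depth}{name} ({kind})"]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


def count_fields(node):
    """Count the non-group elements at or below a node."""
    _, kind, children = node
    own = 0 if kind == "group" else 1
    return own + sum(count_fields(child) for child in children)


contract_outline = outline(contract)
print("\n".join(contract_outline))
print("fields:", count_fields(contract))
```

Whether the Element Count shown on a card includes groups is not stated here, so `count_fields` counts only value-bearing elements.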
Editing Data Definitions
To edit a Data Definition:
Click on the Data Definition card in the grid
The tree editor opens showing the hierarchical structure
Use the tree view to:
Add new elements
Edit existing elements
Reorder elements via drag and drop
Delete elements
Tree View Features
Filter Box - Search for specific elements by name
Expand/Collapse Icons - Navigate through nested structures
Drag and Drop - Reorder elements by dragging them
Element Details - Click an element to view and edit its properties
Data Element Types
Groups
Groups are containers for organizing related data elements. They:
Have no inherent data value themselves
Can contain other groups or data elements
Help organize complex data structures
Can be repeating (for lists like line items)
Data Elements
Data elements capture specific values. Common types include:
String - Text values
Number - Numeric values
Date - Date values
Date Time - Date and time values
Currency - Monetary amounts
Boolean - True/false values
URL - Web addresses
Email Address - Email values
Phone Number - Phone values
Percentage - Percentage values
Selection - Predefined list of options
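Choosing a type matters because it determines which raw values count as valid. As an illustration only, the sketch below maps a few of the type names above to Python parsers. The `PARSERS` table, the accepted formats (ISO dates, "true/yes" booleans, an optional "%" suffix), and the function names are assumptions, not Kodexa's actual parsing rules.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation


def parse_boolean(raw: str) -> bool:
    """Accept true/yes and false/no, ignoring case and outer spaces."""
    text = raw.strip().lower()
    if text in ("true", "yes"):
        return True
    if text in ("false", "no"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def parse_percentage(raw: str) -> Decimal:
    """Accept '12.5%' or '12.5' and return the number of percent."""
    return Decimal(raw.strip().rstrip("%").strip())


# Hypothetical mapping from type name to parser; formats are assumed.
PARSERS = {
    "String": str,
    "Number": Decimal,
    "Date": date.fromisoformat,           # expects YYYY-MM-DD
    "Date Time": datetime.fromisoformat,  # expects ISO 8601
    "Boolean": parse_boolean,
    "Percentage": parse_percentage,
}


def parse_value(type_name: str, raw: str):
    """Parse raw text for the given type, or raise ValueError."""
    try:
        return PARSERS[type_name](raw)
    except InvalidOperation as exc:
        # Decimal signals bad input with InvalidOperation; normalize it.
        raise ValueError(f"invalid {type_name}: {raw!r}") from exc


print(parse_value("Date", "2024-03-15"))
print(parse_value("Percentage", "12.5%"))
```

A value that parses as a String but fails as a Number or Date is a quick signal that an element may have been given the wrong type.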
Data Sources
Data elements can get their values from different sources:
Document - Extract directly from document content
Metadata - Use document metadata (filename, date, etc.)
Derived - Calculate from other elements
Formula - Use formulas for calculations
Expression - Use Groovy expressions for complex logic
Review - Manual review and input
External - Fetch from external APIs or databases
Searching Data Definitions
Use the search box in the header to filter Data Definitions by:
Name
Description
The grid updates in real time as you type. Clear the search box to see all definitions again.
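The filtering behavior can be sketched as a simple match against both fields. This is a hypothetical model, not the product's implementation: the `DefinitionCard` class, the `filter_cards` function, the sample cards, and the choice of case-insensitive substring matching are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DefinitionCard:
    """Hypothetical stand-in for the searchable fields on a card."""
    name: str
    description: str


def filter_cards(cards: List[DefinitionCard], query: str) -> List[DefinitionCard]:
    """Keep cards whose name or description contains the query.

    Case-insensitive substring matching is an assumption. An empty
    query returns every card, like clearing the search box.
    """
    needle = query.strip().lower()
    if not needle:
        return list(cards)
    return [
        card for card in cards
        if needle in card.name.lower() or needle in card.description.lower()
    ]


# Made-up sample cards for illustration.
cards = [
    DefinitionCard("Invoices", "Vendor invoice extraction"),
    DefinitionCard("Contracts", "Legal agreements with vendors"),
    DefinitionCard("Application Forms", "Applicant intake forms"),
]

for card in filter_cards(cards, "vendor"):
    print(card.name)
```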
Version Control
Data Definitions support versioning to track changes over time:
Each definition shows its current version number
You can view version history
Changes to structure are tracked
Versioning helps maintain consistency across documents
Public Access
Data Definitions can be marked as publicly accessible:
Public definitions show an earth icon on their card
Public definitions can be shared across Projects or Organizations
They are useful for standard document types used by multiple teams
Best Practices
Start Simple - Begin with essential fields, add complexity as needed
Use Groups - Organize related fields into logical groups
Descriptive Names - Use clear, descriptive names for elements
Consistent Structure - Keep similar document types organized consistently
Plan Hierarchy - Think about how data relates before building
Use Appropriate Types - Choose the correct data type for each element
Test with Documents - Process test documents to validate your definition
Common Use Cases
Invoice Processing
Header (group): Invoice Number, Date, Due Date, Vendor
Billing (group): Bill To Name, Bill To Address
Line Items (repeating group): Description, Quantity, Unit Price, Total
Totals (group): Subtotal, Tax, Total Amount
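To show how a repeating group differs from a plain group, the sketch below models one extracted invoice as nested Python data, with "Line Items" as a list holding one entry per occurrence. The dict layout, the sample values, and the `fill_totals` helper are hypothetical. It also assumes that each line Total is Quantity times Unit Price and that Total Amount is Subtotal plus Tax, which this page does not state.

```python
from decimal import Decimal

# Made-up sample record; the dict layout is an assumption,
# not a Kodexa export format.
invoice = {
    "Header": {"Invoice Number": "INV-001", "Vendor": "Acme Ltd"},
    "Line Items": [  # repeating group: one dict per occurrence
        {"Description": "Widget", "Quantity": Decimal("3"),
         "Unit Price": Decimal("4.50")},
        {"Description": "Gadget", "Quantity": Decimal("1"),
         "Unit Price": Decimal("10.00")},
    ],
    "Totals": {"Tax": Decimal("2.35")},
}


def fill_totals(record):
    """Compute line totals, Subtotal and Total Amount in place."""
    for item in record["Line Items"]:
        item["Total"] = item["Quantity"] * item["Unit Price"]
    totals = record["Totals"]
    totals["Subtotal"] = sum(
        (item["Total"] for item in record["Line Items"]), Decimal("0")
    )
    totals["Total Amount"] = totals["Subtotal"] + totals["Tax"]
    return record


fill_totals(invoice)
print(invoice["Totals"])
```

Using `Decimal` rather than floats keeps monetary arithmetic exact, which matters when computed totals are compared against the values printed on the document.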
Contract Management
Contract (group): Contract Number, Effective Date, Expiration Date
Parties (repeating group): Party Name, Party Role, Party Address
Terms (group): Payment Terms, Renewal Terms, Termination Clause
Form Processing
Applicant (group): First Name, Last Name, Date of Birth, Email
Address (group): Street, City, State, Zip Code
Employment (repeating group): Employer, Position, Start Date, End Date
Integration with Assistants
Data Definitions work together with Assistants (AI models) to:
Automatically extract data from documents
Validate extracted data
Suggest corrections
Learn from corrections to improve over time
Tips
Click any Data Definition card to open the tree editor
Use the filter box to quickly find specific elements in large definitions
Drag elements to reorder them - the structure is flexible
Groups can contain both data elements and other groups
Element count helps you understand definition complexity at a glance
Version numbers help track changes to your definitions over time
