
Introducing Data Definitions

Learn about Data Definitions, their purpose, and some key concepts


Overview

Data Definitions in Kodexa provide a structured blueprint for organizing and extracting data from documents. They use a hierarchical structure to categorize information at multiple levels, making it easier to manage and retrieve data. Data Definitions are fundamental to Kodexa, allowing you to create custom frameworks that meet your specific document processing needs.

What is a Data Definition?

A Data Definition is a hierarchical structure that defines:

  • What data to extract from documents

  • How to organize that data

  • What types of values to expect (text, numbers, dates, etc.)

  • Relationships between different data elements

For example, a Data Definition for invoices might include groups for "Header Information" containing elements like "Invoice Number" and "Date", and "Line Items" containing elements like "Description", "Quantity", and "Price".

Viewing Data Definitions

To view your Data Definitions:

  1. Open a Project

  2. Navigate to the Data Definitions section

  3. You'll see a grid of cards, each representing one Data Definition

Data Definition Cards

Each card displays:

  • Name - The Data Definition's display name

  • Description - A brief explanation of its purpose

  • Element Count - How many data elements it contains

  • Version - The current version number

  • Public Indicator - Shows if the definition is publicly accessible (earth icon)

Creating a New Data Definition

To create a new Data Definition:

  1. Click the New Data Definition button in the header

  2. Enter a name and description for your Data Definition

  3. Configure initial settings

  4. Start adding data elements

Working with Data Elements

Hierarchical Structure

Data Definitions use a tree structure made up of:

  • Root Elements - Top-level data elements or groups

  • Groups - Containers that organize related data elements

  • Data Elements - Individual fields that capture specific values

  • Nested Structure - Elements can be nested within groups multiple levels deep

Creating Your First Element

When a Data Definition is empty, an empty state prompts you to create your first element. To get started:

  1. Click Create a Data Element

  2. Name your root element (e.g., "Contract" for contract documents)

  3. Add child elements under it (e.g., "Contract Date", "Parties", "Terms")

Example: For contract documents, you might create:

  • Contract (group)

    • Contract Number (text)

    • Contract Date (date)

    • Parties (group)

      • Party Name (text)

      • Party Role (text)
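
The contract example above can be sketched as a small tree in plain Python. This is an illustrative model only: the `Element` class, its fields, and the helper functions are hypothetical names chosen for this sketch and are not part of any Kodexa API. It shows how groups nest and how the leaf fields in a definition could be counted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """One node in a definition tree (hypothetical model, not Kodexa's API)."""
    name: str
    value_type: Optional[str] = None  # None marks a group; otherwise "text", "date", ...
    children: List["Element"] = field(default_factory=list)

    @property
    def is_group(self) -> bool:
        return self.value_type is None


def count_data_elements(node: Element) -> int:
    """Count the leaf fields (non-group elements) at or below a node."""
    if not node.is_group:
        return 1
    return sum(count_data_elements(child) for child in node.children)


def render(node: Element, depth: int = 0) -> List[str]:
    """Return one indented line per element, similar to a tree view."""
    label = "group" if node.is_group else node.value_type
    lines = [f"{'  ' * depth}{node.name} ({label})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The contract structure from the example above.
contract = Element("Contract", children=[
    Element("Contract Number", "text"),
    Element("Contract Date", "date"),
    Element("Parties", children=[
        Element("Party Name", "text"),
        Element("Party Role", "text"),
    ]),
])

print("\n".join(render(contract)))
print("data elements:", count_data_elements(contract))
```

This sketch counts only leaf fields. Whether the Element Count shown on a card also includes groups is not stated here, so treat the counting rule as an assumption.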

Editing Data Definitions

To edit a Data Definition:

  1. Click on the Data Definition card in the grid

  2. The tree editor opens showing the hierarchical structure

  3. Use the tree view to:

    • Add new elements

    • Edit existing elements

    • Reorder elements via drag and drop

    • Delete elements

Tree View Features

  • Filter Box - Search for specific elements by name

  • Expand/Collapse Icons - Navigate through nested structures

  • Drag and Drop - Reorder elements by dragging them

  • Element Details - Click an element to view and edit its properties

Data Element Types

Groups

Groups are containers for organizing related data elements. They:

  • Have no inherent data value themselves

  • Can contain other groups or data elements

  • Help organize complex data structures

  • Can be repeating (for lists like line items)

Data Elements

Data elements capture specific values. Common types include:

  • String - Text values

  • Number - Numeric values

  • Date - Date values

  • Date Time - Date and time values

  • Currency - Monetary amounts

  • Boolean - True/false values

  • URL - Web addresses

  • Email Address - Email addresses

  • Phone Number - Phone numbers

  • Percentage - Percentage values

  • Selection - Predefined list of options
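
To make the idea of typed values concrete, here is a minimal Python sketch that converts raw extracted text into a typed value for a few of the types listed above. The type names come from the list; the `PARSERS` table, the `coerce` function, and the accepted input formats (ISO dates, "yes"/"no" booleans, an optional percent sign) are assumptions made for illustration, not a description of how Kodexa parses values.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation


def parse_boolean(raw: str) -> bool:
    """Accept a small, assumed set of true/false spellings."""
    text = raw.strip().lower()
    if text in {"true", "yes"}:
        return True
    if text in {"false", "no"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def parse_percentage(raw: str) -> Decimal:
    """Strip an optional trailing percent sign and parse the number."""
    return Decimal(raw.strip().rstrip("%").strip())


# Hypothetical mapping from type name to a parser for raw text.
PARSERS = {
    "String": str,
    "Number": lambda raw: Decimal(raw.strip()),
    "Date": lambda raw: date.fromisoformat(raw.strip()),
    "Date Time": lambda raw: datetime.fromisoformat(raw.strip()),
    "Boolean": parse_boolean,
    "Percentage": parse_percentage,
}


def coerce(type_name: str, raw: str):
    """Convert raw text to a typed value, raising ValueError on bad input."""
    try:
        return PARSERS[type_name](raw)
    except InvalidOperation as exc:
        # Decimal signals bad input with InvalidOperation; normalize it.
        raise ValueError(f"not a valid {type_name}: {raw!r}") from exc


print(coerce("Number", "42.50"))
print(coerce("Date", "2024-03-15"))
print(coerce("Boolean", "Yes"))
```

Choosing the narrowest type that fits a field means malformed values fail early, at conversion time, rather than surfacing later as unexpected text.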

Data Sources

Data elements can get their values from different sources:

  • Document - Extract directly from document content

  • Metadata - Use document metadata (filename, date, etc.)

  • Derived - Calculate from other elements

  • Formula - Use formulas for calculations

  • Expression - Use Groovy expressions for complex logic

  • Review - Manual review and input

  • External - Fetch from external APIs or databases

Searching Data Definitions

Use the search box in the header to filter Data Definitions by:

  • Name

  • Description

The search updates the grid in real-time as you type. Clear the search to see all definitions again.
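
The filtering behavior described above can be approximated in a few lines of Python. This is a sketch under assumptions: the `DefinitionCard` class and `filter_cards` function are hypothetical, and case-insensitive substring matching is a guess at the matching rule, which is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DefinitionCard:
    """Minimal stand-in for a card in the grid (hypothetical)."""
    name: str
    description: str


def filter_cards(cards: List[DefinitionCard], query: str) -> List[DefinitionCard]:
    """Keep cards whose name or description contains the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        # An empty search shows every definition again.
        return list(cards)
    return [
        card for card in cards
        if needle in card.name.lower() or needle in card.description.lower()
    ]


# Made-up sample cards for illustration.
cards = [
    DefinitionCard("Invoices", "Vendor invoices with line items"),
    DefinitionCard("Contracts", "Supplier and customer agreements"),
    DefinitionCard("Applications", "Job application forms"),
]

print([card.name for card in filter_cards(cards, "vendor")])
```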

Version Control

Data Definitions support versioning to track changes over time:

  • Each definition shows its current version number

  • You can view version history

  • Changes to structure are tracked

  • Version history helps you maintain consistency across documents

Public Access

Data Definitions can be marked as publicly accessible:

  • Public definitions show an earth icon on their card

  • Public definitions can be shared across Projects or Organizations

  • They are useful for standard document types used by multiple teams

Best Practices

  • Start Simple - Begin with essential fields, add complexity as needed

  • Use Groups - Organize related fields into logical groups

  • Descriptive Names - Use clear, descriptive names for elements

  • Consistent Structure - Keep similar document types organized consistently

  • Plan Hierarchy - Think about how data relates before building

  • Use Appropriate Types - Choose the correct data type for each element

  • Test with Documents - Process test documents to validate your definition

Common Use Cases

Invoice Processing

  • Header (group): Invoice Number, Date, Due Date, Vendor

  • Billing (group): Bill To Name, Bill To Address

  • Line Items (repeating group): Description, Quantity, Unit Price, Total

  • Totals (group): Subtotal, Tax, Total Amount
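
As a rough illustration of what data shaped by this invoice structure might look like, the Python sketch below stores each group as a dictionary and the repeating Line Items group as a list with one entry per row. The sample values are made up, and the `check_totals` function is a hypothetical consistency check written for this example, not a Kodexa feature.

```python
from decimal import Decimal
from typing import List

# Made-up sample data; group and element names follow the structure above.
invoice = {
    "Header": {
        "Invoice Number": "INV-001",
        "Date": "2024-03-15",
        "Due Date": "2024-04-14",
        "Vendor": "Acme Supplies",
    },
    "Billing": {"Bill To Name": "Example Co", "Bill To Address": "1 Main St"},
    # A repeating group holds one record per occurrence.
    "Line Items": [
        {"Description": "Widget", "Quantity": Decimal("3"),
         "Unit Price": Decimal("4.50"), "Total": Decimal("13.50")},
        {"Description": "Gasket", "Quantity": Decimal("10"),
         "Unit Price": Decimal("0.25"), "Total": Decimal("2.50")},
    ],
    "Totals": {"Subtotal": Decimal("16.00"), "Tax": Decimal("1.60"),
               "Total Amount": Decimal("17.60")},
}


def check_totals(inv: dict) -> List[str]:
    """Return a list of arithmetic inconsistencies; empty means all checks pass."""
    problems = []
    for index, item in enumerate(inv["Line Items"], start=1):
        if item["Quantity"] * item["Unit Price"] != item["Total"]:
            problems.append(f"line {index}: total does not equal quantity x unit price")
    subtotal = sum((item["Total"] for item in inv["Line Items"]), Decimal("0"))
    totals = inv["Totals"]
    if subtotal != totals["Subtotal"]:
        problems.append("subtotal does not equal the sum of line totals")
    if totals["Subtotal"] + totals["Tax"] != totals["Total Amount"]:
        problems.append("total amount does not equal subtotal plus tax")
    return problems


print(check_totals(invoice))
```

Using `Decimal` rather than floating-point numbers keeps monetary comparisons exact, which matters when a check depends on equality.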

Contract Management

  • Contract (group): Contract Number, Effective Date, Expiration Date

  • Parties (repeating group): Party Name, Party Role, Party Address

  • Terms (group): Payment Terms, Renewal Terms, Termination Clause

Form Processing

  • Applicant (group): First Name, Last Name, Date of Birth, Email

  • Address (group): Street, City, State, Zip Code

  • Employment (repeating group): Employer, Position, Start Date, End Date

Integration with Assistants

Data Definitions work together with Assistants (AI models) to:

  • Automatically extract data from documents

  • Validate extracted data

  • Suggest corrections

  • Learn from corrections to improve over time

Tips

  • Click any Data Definition card to open the tree editor

  • Use the filter box to quickly find specific elements in large definitions

  • Drag elements to reorder them - the structure is flexible

  • Groups can contain both data elements and other groups

  • Element count helps you understand definition complexity at a glance

  • Version numbers help track changes to your definitions over time
