In Kodexa, a Data Definition is a structured blueprint for defining and organizing the data extracted from documents. It arranges information hierarchically across multiple levels, which simplifies data management and retrieval. Data Definitions are fundamental to Kodexa: they let users build custom data structures tailored to their document processing needs.

Key Components of a Data Definition

Groups and Data Elements:

A Data Definition consists of groups, which may contain sub-groups or Data Elements.

This hierarchy lets you nest related categories, giving the extracted data a more detailed and better-organized structure.
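
To make the nesting concrete, here is a minimal Python sketch of a Data Definition as a tree, with a helper that lists the path to every Data Element. The `Group` and `DataElement` classes, the slash-separated path format, and the invoice fields are hypothetical illustrations, not Kodexa's own model or SDK.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class DataElement:
    """Hypothetical leaf node: one value to extract."""
    name: str


@dataclass
class Group:
    """Hypothetical container: holds sub-groups and/or Data Elements."""
    name: str
    children: List[Union["Group", DataElement]] = field(default_factory=list)


def element_paths(group: Group, prefix: str = "") -> Iterator[str]:
    """Yield a slash-separated path for every Data Element under a group."""
    path = f"{prefix}/{group.name}" if prefix else group.name
    for child in group.children:
        if isinstance(child, Group):
            # Recurse into sub-groups, extending the path.
            yield from element_paths(child, path)
        else:
            yield f"{path}/{child.name}"


invoice = Group("invoice", [
    DataElement("invoice_number"),
    Group("vendor", [DataElement("vendor_name"), DataElement("vendor_address")]),
    Group("line_items", [DataElement("description"), DataElement("amount")]),
])

for p in element_paths(invoice):
    print(p)
```

Because groups can contain other groups, a recursive walk is the natural way to reach every element regardless of depth.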

Data Element Types:

Every Data Element within a Data Definition is assigned a specific type.

These types determine the kind and format of the data to be extracted, such as text, numbers, or dates.
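
The type decides how the raw text found in a document is interpreted. The following minimal Python sketch assumes three hypothetical type names (`text`, `number`, `date`) and an ISO `YYYY-MM-DD` date format; Kodexa's actual type list and parsing rules may differ.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Callable, Dict, Union

Value = Union[str, Decimal, date]

# Hypothetical type names mapped to parsers for the raw extracted text.
PARSERS: Dict[str, Callable[[str], Value]] = {
    "text": lambda raw: raw.strip(),
    # Drop thousands separators before parsing; Decimal avoids float rounding.
    "number": lambda raw: Decimal(raw.replace(",", "").strip()),
    # Assumes ISO dates; real documents would need more formats.
    "date": lambda raw: datetime.strptime(raw.strip(), "%Y-%m-%d").date(),
}


def coerce(element_type: str, raw: str) -> Value:
    """Convert raw text into the value implied by the element's type."""
    try:
        parser = PARSERS[element_type]
    except KeyError:
        raise ValueError(f"unknown element type: {element_type!r}") from None
    return parser(raw)


print(coerce("number", "1,250.00"))  # 1250.00
print(coerce("date", "2024-03-31"))  # 2024-03-31
```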

Labels and Names:

Each Data Element and group within the Data Definition is assigned a 'label' for display purposes.

In contrast, the 'name' serves as a unique internal identifier for each Data Element or group, facilitating precise reference within Kodexa.
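
Kodexa generates names automatically, and its exact naming scheme is not described here. The following Python sketch only illustrates the label/name split under an assumed rule: lowercase the label, replace non-alphanumeric runs with `_`, and append a numeric suffix when the name is already taken. The `make_name` function is hypothetical.

```python
import re
from typing import Set


def make_name(label: str, taken: Set[str]) -> str:
    """Derive a unique, identifier-style name from a display label.

    Hypothetical scheme: lowercase, collapse non-alphanumeric runs into "_",
    and add a numeric suffix if the name is already in use.
    """
    base = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_") or "element"
    name, n = base, 2
    while name in taken:
        name = f"{base}_{n}"
        n += 1
    taken.add(name)
    return name


taken: Set[str] = set()
print(make_name("Invoice Number", taken))    # invoice_number
print(make_name("Invoice #", taken))         # invoice
print(make_name("Invoice  Number!", taken))  # invoice_number_2
```

Two different labels can differ only in spacing or punctuation, so the name must be checked for uniqueness even though the labels themselves differ.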

Descriptions and Contextual Definitions:

Descriptions offer a comprehensive explanation of what each Data Definition component represents.

Contextual definitions further clarify by detailing the application and relevance of each component within the document processing environment.

A button below the text box uses AI to review and improve what you have entered, helping you write better prompts.

Data Source:

Users can choose a different source for each Data Element. The default source is "Document", meaning the model extracts the value from the document based on the element's prompt.

Enabling Data Elements:

You can create Data Elements without making them visible to users. Toggling an element on or off gives you the flexibility to choose which elements are visible and which stay hidden.
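
A minimal Python sketch of these two settings on a single element. It assumes the visibility toggle maps to an `enabled` flag (the same flag the `@enabled` filter reads) and models the source as a plain string defaulting to "Document"; the class and function names are hypothetical, and other source values are not listed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataElement:
    name: str
    label: str
    # "Document" is the documented default source.
    source: str = "Document"
    # Assumed mapping: disabled elements stay in the definition but are hidden.
    enabled: bool = True


def visible_elements(elements: List[DataElement]) -> List[DataElement]:
    """Return only the elements users should see."""
    return [e for e in elements if e.enabled]


elements = [
    DataElement("invoice_number", "Invoice Number"),
    DataElement("internal_score", "Internal Score", enabled=False),
]
print([e.label for e in visible_elements(elements)])  # ['Invoice Number']
```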

Filtering Data Definitions

A filter field sits above the Data Definition tree.

It lets you search data elements by name and filter them by special attributes, which is especially useful for quickly locating specific elements in large Data Definitions.

Basic Filtering

The basic filter allows you to search data elements by their names. The filter will match any data element whose name contains the search term.
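
A minimal sketch of the basic name match, assuming each element is a dict with a `name` key (a hypothetical shape). It uses a plain substring test, which is case-sensitive as stated in the implementation notes; an empty search term matches every element.

```python
from typing import Any, Dict, List

Element = Dict[str, Any]


def filter_by_name(elements: List[Element], term: str) -> List[Element]:
    """Keep elements whose name contains the search term (case-sensitive)."""
    return [e for e in elements if term in e["name"]]


elements = [
    {"name": "invoice_number"},
    {"name": "invoice_date"},
    {"name": "vendor_name"},
]
print([e["name"] for e in filter_by_name(elements, "invoice")])
# ['invoice_number', 'invoice_date']
print(filter_by_name(elements, "Invoice"))  # [] (case-sensitive)
```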

Special Commands

Special commands provide advanced filtering based on predefined flags. Each command starts with the @ symbol and can be negated with the ! prefix.

Available Commands

@hasvalidation (or @hasvalidations)

Purpose: Filters data elements that have validation rules defined
Behavior: Returns elements whose validationRules array is not empty
Negation: !@hasvalidation shows elements without validation rules
Example: @hasvalidation will show only elements with data validation in place

@hasprompt

Purpose: Filters data elements that were created from prompts
Behavior: Returns elements that have:
- A semantic definition (non-empty)
- Originated from a prompt
Negation: !@hasprompt shows elements not created from prompts

@enabled

Purpose: Filters elements based on their enabled status
Behavior: Returns elements where the enabled flag is true
Negation: !@enabled shows disabled elements

@formula

Purpose: Filters formula-based elements
Behavior: Returns elements where valuePath equals "FORMULA"
Negation: !@formula shows non-formula elements

@metadata

Purpose: Filters metadata elements
Behavior: Returns elements where valuePath equals "METADATA"
Negation: !@metadata shows non-metadata elements
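
The behaviors above can be sketched as a table of predicates over an element represented as a dict. `validationRules`, `valuePath`, and `enabled` come from the descriptions above; `semanticDefinition` and `fromPrompt` are hypothetical key names standing in for the "semantic definition" and "originated from a prompt" checks. This is an illustration, not Kodexa's implementation.

```python
from typing import Any, Callable, Dict

Element = Dict[str, Any]


def has_validation(e: Element) -> bool:
    # A missing or empty validationRules list counts as "no validation".
    return len(e.get("validationRules") or []) > 0


def has_prompt(e: Element) -> bool:
    # Hypothetical keys: a non-blank semantic definition AND a prompt origin.
    defined = bool((e.get("semanticDefinition") or "").strip())
    return defined and e.get("fromPrompt") is True


COMMANDS: Dict[str, Callable[[Element], bool]] = {
    "hasvalidation": has_validation,
    "hasvalidations": has_validation,  # alias of @hasvalidation
    "hasprompt": has_prompt,
    "enabled": lambda e: e.get("enabled") is True,
    "formula": lambda e: e.get("valuePath") == "FORMULA",
    "metadata": lambda e: e.get("valuePath") == "METADATA",
}

total = {"name": "total", "enabled": True, "valuePath": "FORMULA",
         "validationRules": [{"rule": "total > 0"}]}
print(sorted(cmd for cmd, pred in COMMANDS.items() if pred(total)))
# ['enabled', 'formula', 'hasvalidation', 'hasvalidations']
```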

Usage Examples

Find enabled elements with validation:

@enabled @hasvalidation

Find all non-formula elements with prompts:

!@formula @hasprompt

Find metadata elements without validation:

@metadata !@hasvalidation

Implementation Notes

Multiple commands are combined using AND logic (all conditions must be met)
Each command can be negated independently
Invalid or unknown commands are ignored (they always evaluate to true)
The filter is case-sensitive
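
These rules can be sketched in Python as a single query evaluator. The predicate table is passed in as a parameter, so any mapping from command name to a check on the element works; the `matches_query` function and the dict-based element shape are hypothetical. Plain-text name matching is left out, so tokens that are not commands are simply skipped here.

```python
from typing import Any, Callable, Dict, Mapping

Element = Dict[str, Any]
Predicate = Callable[[Element], bool]


def matches_query(element: Element, query: str,
                  predicates: Mapping[str, Predicate]) -> bool:
    """Apply every @command in the query with AND logic.

    - "!@cmd" negates that single command.
    - Unknown commands are ignored, i.e. treated as true.
    - Lookup is case-sensitive, so "@Enabled" counts as unknown.
    """
    for token in query.split():
        negated = token.startswith("!@")
        if not (negated or token.startswith("@")):
            continue  # not a command; name matching is out of scope here
        command = token[2:] if negated else token[1:]
        predicate = predicates.get(command)
        if predicate is None:
            continue  # unknown command: never excludes anything
        result = bool(predicate(element))
        if negated:
            result = not result
        if not result:
            return False  # AND logic: one failing condition is enough
    return True


predicates = {
    "enabled": lambda e: e.get("enabled") is True,
    "hasvalidation": lambda e: bool(e.get("validationRules")),
    "metadata": lambda e: e.get("valuePath") == "METADATA",
}
doc_date = {"name": "documentDate", "enabled": True,
            "valuePath": "METADATA", "validationRules": []}
print(matches_query(doc_date, "@metadata !@hasvalidation", predicates))  # True
print(matches_query(doc_date, "@enabled @hasvalidation", predicates))    # False
print(matches_query(doc_date, "@Enabled @unknown", predicates))          # True
```

Treating unknown commands as true means a typo widens the results instead of silently hiding everything, which is usually the less surprising failure mode in a search box.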

Benefits of Using Data Definitions

Enhanced Structure:

Data Definitions enable the logical, hierarchical organization of data, streamlining the management of extensive information sets.

Customized Data Extraction:

Users can adapt the Data Definition to suit their specific needs, ensuring that the extracted data is pertinent and structured accordingly.

Efficient Data Retrieval:

The organized nature of Data Definitions aids in the swift and effective retrieval of specific data points from large datasets.

Consistency and Clarity:

Employing labels, names, descriptions, and contextual definitions ensures a uniform and clear understanding of the Data Definition's components among users.

How to Create a Data Definition in Kodexa

Outline Your Framework:

Begin by mapping out the hierarchical structure of your Data Definition, determining which elements will be groups and which will be Data Elements.

Designate Data Element Types:

Assign an appropriate type to each Data Element, reflecting the nature of the data you intend to extract.

Label and Name Your Components:

Assign descriptive labels to each component, while Kodexa generates unique names for internal use.

Incorporate Descriptions and Contextual Definitions:

Enhance each component with detailed descriptions and contextual definitions for clearer data extraction and processing.

Evaluate and Adjust:

Test your Data Definition against sample documents to verify accurate data extraction. Adjust the framework and components as necessary for enhanced performance.

Introducing Multi-Pass:

You can now create multiple Data Definitions within the same project, which lets you adjust the extracted data without creating a new project. Configure this under Workspace > Data Definition.

Conclusion

Data Definitions in Kodexa are invaluable for structuring and extracting data from documents effectively. By leveraging the components of a Data Definition, users can improve their document processing operations, leading to more streamlined and accurate data management.