In Kodexa, a Data Definition is a structured blueprint for defining and organizing data extracted from documents. It uses a hierarchical arrangement to categorize information across multiple levels, which simplifies data management and retrieval. Data Definitions are fundamental to Kodexa: they let users build customized data frameworks that meet their specific document processing needs.
Key Components of a Data Definition
Groups and Data Elements:
A Data Definition consists of groups, which may contain sub-groups or Data Elements.
This hierarchical structure allows for the organization of related categories, enhancing the detail and structure of data extraction.
Data Element Types:
Every Data Element within a Data Definition is assigned a specific type.
These types dictate the nature and format of the data to be extracted, such as text, numbers, or dates.
Labels and Names:
Each Data Element and group within the Data Definition is assigned a 'label' for display purposes.
In contrast, the 'name' serves as a unique internal identifier for each Data Element or group, facilitating precise reference within Kodexa.
Descriptions and Contextual Definitions:
Descriptions offer a comprehensive explanation of what each Data Definition component represents.
Contextual definitions further clarify by detailing the application and relevance of each component within the document processing environment.
A button below the description textbox uses AI to review and improve what the user has written, helping them produce a better prompt.
Data Source:
Users can select a different source for each Data Element. By default, the source is "Document", meaning the model extracts the information from the document based on the prompt provided.
Enabling Data Elements:
Users can create Data Elements without making them visible to other users. A toggle on each element controls whether it is visible, giving users flexibility over which elements appear.
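To make the components above concrete, here is a minimal Python sketch of how such a hierarchy might be modeled. It is not Kodexa's API or storage format: the classes `Group`, `DataElement`, and `ElementType`, the `to_name` naming scheme, and the `visible_paths` helper are all hypothetical, chosen only to illustrate groups and sub-groups, element types, labels versus names, descriptions, the default "Document" source, and the visibility toggle.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class ElementType(Enum):
    # Illustrative subset of types; Kodexa's actual type list may differ.
    TEXT = "text"
    NUMBER = "number"
    DATE = "date"


def to_name(label: str) -> str:
    """Derive an internal identifier from a display label.

    Hypothetical scheme: lowercase, non-alphanumerics collapsed to "_".
    It does not guarantee uniqueness across the whole definition.
    """
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


@dataclass
class DataElement:
    label: str                 # display text shown to users
    data_type: ElementType     # nature/format of the value to extract
    description: str = ""      # what the element represents (used as the prompt)
    source: str = "Document"   # default source described above
    enabled: bool = True       # when False, treated here as hidden from users

    @property
    def name(self) -> str:
        return to_name(self.label)


@dataclass
class Group:
    label: str
    description: str = ""
    # A group may hold sub-groups and/or Data Elements.
    children: List[Union["Group", DataElement]] = field(default_factory=list)

    @property
    def name(self) -> str:
        return to_name(self.label)


def visible_paths(group: Group, prefix: str = "") -> List[str]:
    """Walk the hierarchy and return dotted names of enabled Data Elements."""
    path = f"{prefix}{group.name}"
    result: List[str] = []
    for child in group.children:
        if isinstance(child, Group):
            result.extend(visible_paths(child, path + "."))
        elif child.enabled:
            result.append(f"{path}.{child.name}")
    return result


# Example hierarchy: an "Invoice" group with a nested "Line Items" group.
invoice = Group("Invoice", children=[
    DataElement("Invoice Number", ElementType.TEXT),
    DataElement("Invoice Date", ElementType.DATE),
    Group("Line Items", children=[
        DataElement("Amount", ElementType.NUMBER),
        DataElement("Internal Note", ElementType.TEXT, enabled=False),
    ]),
])

print(visible_paths(invoice))
```

The disabled "Internal Note" element stays in the definition but is left out of the visible list, which mirrors the idea of creating an element without exposing it to users.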
Benefits of Using Data Definitions
Enhanced Structure:
Data Definitions enable the logical, hierarchical organization of data, streamlining the management of extensive information sets.
Customized Data Extraction:
Users can adapt the Data Definition to suit their specific needs, ensuring that the extracted data is pertinent and structured accordingly.
Efficient Data Retrieval:
The organized nature of Data Definitions aids in the swift and effective retrieval of specific data points from large datasets.
Consistency and Clarity:
Employing labels, names, descriptions, and contextual definitions ensures a uniform and clear understanding of the Data Definition's components among users.
How to Create a Data Definition in Kodexa
Outline Your Framework:
Begin by mapping out the hierarchical structure of your Data Definition, determining which elements will be groups and which will be Data Elements.
Designate Data Element Types:
Assign an appropriate type to each Data Element, reflecting the nature of the data you intend to extract.
Label and Name Your Components:
Assign descriptive labels to each component, while Kodexa generates unique names for internal use.
Incorporate Descriptions and Contextual Definitions:
Enhance each component with detailed descriptions and contextual definitions for clearer data extraction and processing.
Evaluate and Adjust:
Test your Data Definition against sample documents to verify accurate data extraction. Adjust the framework and components as necessary for enhanced performance.
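The "Evaluate and Adjust" step boils down to comparing what was extracted with what you expected. The sketch below assumes you have hand-labeled expected values for a few sample documents and the extraction output as plain dictionaries keyed by element name; both formats, the `field_accuracy` helper, the sample values, and the 0.9 threshold are illustrative assumptions, not a Kodexa export format or recommended setting.

```python
from typing import Dict, List


def field_accuracy(
    expected: List[Dict[str, str]],
    extracted: List[Dict[str, str]],
) -> Dict[str, float]:
    """Fraction of sample documents where each element's extracted value matches."""
    if len(expected) != len(extracted):
        raise ValueError("expected and extracted must cover the same documents")
    names = sorted({name for doc in expected for name in doc})
    scores: Dict[str, float] = {}
    for name in names:
        # Only documents that have a labeled value for this element are counted.
        total = sum(1 for exp in expected if name in exp)
        hits = sum(
            1
            for exp, got in zip(expected, extracted)
            if name in exp and got.get(name, "").strip() == exp[name].strip()
        )
        scores[name] = hits / total
    return scores


# Made-up sample data for illustration only.
expected = [
    {"invoice_number": "INV-001", "invoice_date": "2024-01-05"},
    {"invoice_number": "INV-002", "invoice_date": "2024-02-10"},
]
extracted = [
    {"invoice_number": "INV-001", "invoice_date": "2024-01-05"},
    {"invoice_number": "INV-002", "invoice_date": "10/02/2024"},
]

scores = field_accuracy(expected, extracted)
# Elements below the (arbitrary) threshold are candidates for a better
# description, a different type, or a restructured group.
needs_work = [name for name, score in scores.items() if score < 0.9]
print(scores, needs_work)
```

Here the date element would be flagged, suggesting its description or type should specify the expected date format.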
Introducing Multi-Pass:
You can create multiple sets of Data Definitions within the same project, so you can adjust the extracted data without creating a new project. This is configured under Workspace > Data Definition.
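One way to picture multi-pass is a single project holding several named definition sets, each run over the same document to produce its own result set. This is a hypothetical model, not how Kodexa implements it: the `Project` class, `run_passes`, and the toy `key: value` extractor (standing in for the real model) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical extractor signature: document text + element names -> values.
Extractor = Callable[[str, List[str]], Dict[str, str]]


@dataclass
class Project:
    # Each key is one Data Definition set; the value lists its element names.
    definitions: Dict[str, List[str]] = field(default_factory=dict)

    def add_definition(self, key: str, element_names: List[str]) -> None:
        self.definitions[key] = element_names

    def run_passes(self, document: str, extract: Extractor) -> Dict[str, Dict[str, str]]:
        """Run every definition set over the same document, one pass each."""
        return {key: extract(document, names) for key, names in self.definitions.items()}


def toy_extract(document: str, names: List[str]) -> Dict[str, str]:
    """Stand-in for a real model: reads 'key: value' lines from plain text."""
    pairs: Dict[str, str] = {}
    for line in document.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            pairs[key.strip()] = value.strip()
    return {name: pairs[name] for name in names if name in pairs}


doc = "invoice_number: INV-001\ntotal: 120.00\nvendor: Acme"

project = Project()
project.add_definition("header", ["invoice_number", "vendor"])
project.add_definition("totals", ["total"])  # added without a new project

results = project.run_passes(doc, toy_extract)
print(results)
```

Adding or changing a definition set only touches that entry in the project, which is the flexibility the multi-pass option is meant to provide.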
Conclusion
Data Definitions in Kodexa are invaluable for structuring and extracting data from documents effectively. By leveraging the components of a Data Definition, users can improve their document processing operations, leading to more streamlined and accurate data management.