Extract Personal Data with a Self-Hosted LLM
- Self-Hosted LLM
- September 17, 2025
Manually extracting personal information from chat messages is slow and error-prone. This use-case blog details an automated workflow that uses a self-hosted Large Language Model (LLM) to extract personal data efficiently, keeping the processing secure and accurate.

What is this Workflow?
This workflow is designed to listen for a chat message, extract specific personal data, and format it into a structured JSON object. It uses a self-hosted Mistral NeMo model for the analysis, which is accessed through an Ollama Chat Model node. The model is instructed to extract structured information like names, contact types, and contact details.
Step-by-Step Breakdown to Extract Personal Data
Here is a step-by-step look at how the workflow accomplishes personal data extraction:
Step 1: When chat message received
The workflow begins when a new chat message is received. This event acts as the trigger that starts the data extraction process.
Step 2: Basic LLM Chain
This node is the core of the workflow: it is where the user's message is analyzed by the self-hosted LLM. The model is instructed to extract information according to a predefined JSON schema. The prompt provided to the LLM is: "Please analyse the incoming user request. Extract personal data according to the JSON schema. Today is: '{{ $now.toISO() }}'". The same Ollama Chat Model connects to both the Basic LLM Chain and the Auto-fixing Output Parser.
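For readers who want to experiment outside the workflow tool, the same call can be reproduced against a local Ollama instance. Below is a minimal sketch of this step using Ollama's Python client, assuming the `ollama` package is installed (`pip install ollama`) and the `mistral-nemo` model has been pulled; the function name and message framing are illustrative, not part of the workflow itself.

```python
# Minimal sketch of the Basic LLM Chain step using Ollama's Python client.
# Assumes a local Ollama server with the mistral-nemo model already pulled.
from datetime import datetime, timezone

import ollama

def analyze_message(user_message: str) -> str:
    # datetime.now(...).isoformat() stands in for n8n-style {{ $now.toISO() }}
    now_iso = datetime.now(timezone.utc).isoformat()
    system_prompt = (
        "Please analyse the incoming user request. "
        "Extract personal data according to the JSON schema. "
        f"Today is: '{now_iso}'"
    )
    response = ollama.chat(
        model="mistral-nemo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    return response["message"]["content"]
```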
Step 3: Structured Output Parser
After the LLM generates a response, this node checks if the output adheres to a specific JSON schema. The schema defines fields such as name, surname, commtype, contacts, timestamp, and subject. The name and commtype fields are required. If the output doesn’t match, the process moves to the auto-fixing stage.
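For illustration, here is one plausible version of such a schema, written as a Python dict and checked with the `jsonschema` library (`pip install jsonschema`). The field types and the validation helper are assumptions; the schema actually used by the workflow may differ.

```python
# One plausible schema for the fields described above; exact types are assumptions.
from jsonschema import ValidationError, validate

PERSONAL_DATA_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "surname": {"type": "string"},
        "commtype": {"type": "string"},   # e.g. "email" or "phone"
        "contacts": {"type": "string"},   # the contact detail itself
        "timestamp": {"type": "string"},  # ISO 8601 date-time
        "subject": {"type": "string"},
    },
    "required": ["name", "commtype"],  # as described in Step 3
}

def is_valid(candidate: dict) -> bool:
    """Return True if the LLM output matches the schema."""
    try:
        validate(instance=candidate, schema=PERSONAL_DATA_SCHEMA)
        return True
    except ValidationError:
        return False
```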
Step 4: Auto-fixing Output Parser
If the model’s initial response fails the checks in the Structured Output Parser, the Auto-fixing Output Parser takes over. It re-prompts the model with a new set of instructions to correct the original response and satisfy the schema constraints. The prompt for this step is: “Please only respond with an answer that satisfies the constraints laid out in the Instructions:”.
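A rough Python equivalent of this validate-and-retry behavior is sketched below, reusing `analyze_message` and `is_valid` from the earlier snippets. The retry count and the way the schema and failing output are fed back to the model are assumptions about how an auto-fixing parser can be built, not the node's exact internals.

```python
import json

AUTOFIX_PROMPT = (
    "Please only respond with an answer that satisfies the constraints "
    "laid out in the Instructions:"
)

def extract_with_autofix(user_message: str, max_retries: int = 2) -> dict:
    raw = analyze_message(user_message)  # from the Step 2 sketch
    for _ in range(max_retries + 1):
        try:
            candidate = json.loads(raw)
            if is_valid(candidate):  # from the Step 3 sketch
                return candidate
        except json.JSONDecodeError:
            pass
        # Re-prompt with the auto-fix instruction, the schema, and the failing output.
        response = ollama.chat(
            model="mistral-nemo",
            messages=[
                {"role": "system", "content": AUTOFIX_PROMPT},
                {
                    "role": "user",
                    "content": (
                        "Instructions:\n"
                        + json.dumps(PERSONAL_DATA_SCHEMA)
                        + "\n\nPrevious response:\n"
                        + raw
                    ),
                },
            ],
        )
        raw = response["message"]["content"]
    raise ValueError("Model output never satisfied the schema")
```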
Step 5: Extract JSON Output
The final, correctly formatted JSON data is extracted and made available for other actions. This ensures that the result of the personal data extraction is consistent and ready for further use.
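Putting the pieces together, a hypothetical chat message would flow through the sketches above like this:

```python
if __name__ == "__main__":
    # Hypothetical input message; the extracted fields shown are illustrative.
    message = "Hi, I'm Jane Doe. Email me at jane.doe@example.com about the invoice."
    data = extract_with_autofix(message)
    print(data)  # e.g. {"name": "Jane", "surname": "Doe", "commtype": "email", ...}
```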
*Note: For the JSON template, please contact us and provide the blog URL.
Key Features of this Workflow
- Self-Hosted LLM: The use of a self-hosted LLM like Mistral NeMo provides more control over data privacy and security.
- Structured Output: Data is consistently formatted into a JSON object, making it easy to integrate with other systems.
- Automatic Correction: The auto-fixing mechanism ensures high accuracy by correcting inconsistent outputs automatically.
- Customizable Schema: The JSON schema can be defined to match your specific personal data extraction needs.
Why Choose a Self-Hosted LLM to Extract Personal Data?
Using a self-hosted LLM for personal data extraction offers several advantages:
- Data Sovereignty: You maintain complete control over your data.
- Security: Sensitive personal data doesn’t leave your infrastructure.
- Cost-Effective: Can be more economical for high-volume tasks compared to API-based services.
- Customization: Models can be fine-tuned to your specific requirements.
Relevant Reads:
- AI Workflow Automation in 2025: Tools, Trends & Use Cases
- Extract Image License Plate Number Using AI Workflow Automation
Conclusion
To conclude, this automated workflow provides a robust solution for extracting personal data from chat messages. By leveraging a self-hosted LLM like Mistral NeMo and implementing an auto-fixing output parser, it ensures that the extracted data is both accurate and consistently formatted into a structured JSON object.
This approach offers a secure and efficient alternative to manual data processing, making it ideal for tasks that require handling sensitive information. The final structured data is then ready for further use or integration with other systems.
FAQs
What is Ollama and how does it relate to the workflow?
Ollama is a tool that allows you to run large language models locally. The Ollama Chat Model node in this workflow provides the interface to use the self-hosted Mistral NeMo model for personal data extraction.
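To check that a local Ollama instance is ready for this workflow, you can query it directly; the host below is Ollama's default local endpoint, and the model must have been pulled beforehand (e.g. with Ollama's pull command).

```python
import ollama

# Ollama serves a local HTTP API on port 11434 by default.
client = ollama.Client(host="http://localhost:11434")
print(client.list())  # lists locally pulled models; mistral-nemo should appear here
```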
What happens if the extracted data doesn’t match the JSON schema?
If the data doesn't match the schema defined in the Structured Output Parser, the Auto-fixing Output Parser re-prompts the LLM. This re-prompting corrects the output so that it meets the required format and constraints.
Can I customize the data I want to extract?
Yes, the workflow is highly customizable. You can modify the JSON schema in the Structured Output Parser to define which fields you need to extract. You would also update the prompt in the Basic LLM Chain to instruct the LLM on the new data to be extracted for your personal data extraction needs.