How to extract a value from a local PDF file

In this article I demonstrate how you can use a custom Form Processing AI Model to extract data from a PDF file. The model is published and then used within a Power Automate flow.

Extracting data from a PDF

The idea for this example comes from the Power Automate community. In the thread "Building flow sending email based on managers name in a delivered PDF in folder", skalltje asked for help.

The goal of this how-to is to collect a PDF file from a local file share, process it, and extract a manager value from it. This manager value is then used to send an e-mail to the manager, with the file added as an attachment.

Prerequisites

Before you can create this flow you will need to install and/or configure two things:
– Data Gateway
– AI Model (a paid add-on for Power Platform, so this means additional cost)

Data Gateway

The Data Gateway is a component that lets you securely access on-premises data; it acts as a bridge between your local environment and the cloud. You can find the installation steps in this Service Gateway Install article. After installation you should be able to register the gateway, sign in with your Power Automate account, and use it within Power Automate actions.

The end result should look similar to the screenshot below.
[Screenshot: gateway is installed]

Build a Form Processing AI Model

For this example we want to use a custom AI Model. Because these PDF files have a structured format, I chose a Form Processing model. You can build one in the AI Builder section of Power Automate.

[Screenshot: build AI model]

In my case I created one called ExtractManagerModel and trained it with five sample PDF documents to recognize the manager field value. The steps to configure such a model can be found in this Create Form Processing Model article.

When you are done building and publishing, you should have something similar to the screenshot below: a custom Form Processing Model called ExtractManagerModel.

[Screenshot: published Form Processing AI model]

The flow setup

1. Add a When a file is created (properties only) trigger. Make sure it is connected to your previously installed data gateway. Point it to a drop-off folder on the system where the data gateway is running.

[Screenshot: File System trigger]

2. Add a File System Get file content action and use the Id from the trigger.

[Screenshot: File System Get file content action]

3. Add a Predict action. Select your custom AI Builder Form Processing model, in this case ExtractManagerModel. Use the File Content dynamic value of the Get file content action for the prediction. Also specify the content type of the files, in our case application/pdf.

[Screenshot: Predict action with the Manager field]

4. Add a Send an email (V2) action and use the Manager value from the Predict action for the To field. Use the File Content from the Get file content action for the Attachments Content field and the DisplayName from the trigger for the Attachments Name field.

[Screenshot: Send an email action using the Manager value]
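
For readers who prefer to think in code, the four steps above can be approximated outside Power Automate with a small Python sketch. This is not how the flow runs internally. The names find_new_files and build_manager_email, the sender address, and the extract_manager callable are all hypothetical; the callable merely stands in for the AI Builder Predict action.

```python
from email.message import EmailMessage
from pathlib import Path
from typing import Callable, List, Set


def find_new_files(folder: Path, seen: Set[str]) -> List[Path]:
    """Step 1 analogue: return PDF files in `folder` that were not seen before."""
    new_files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and p.name not in seen
    )
    # Remember the names so the next call only reports newly created files.
    seen.update(p.name for p in new_files)
    return new_files


def build_manager_email(
    pdf_path: Path,
    extract_manager: Callable[[bytes], str],  # hypothetical stand-in for Predict
    sender: str = "flow@example.com",         # hypothetical sender address
) -> EmailMessage:
    """Steps 2-4 analogue: read the file, extract the manager, build the mail."""
    content = pdf_path.read_bytes()           # step 2: get file content
    manager = extract_manager(content)        # step 3: "predict" the Manager value

    msg = EmailMessage()                      # step 4: compose the e-mail
    msg["From"] = sender
    msg["To"] = manager
    msg["Subject"] = f"New document: {pdf_path.name}"
    msg.set_content("Please find the delivered PDF attached.")
    # Attachment content and name mirror the two Attachments fields in the flow.
    msg.add_attachment(
        content,
        maintype="application",
        subtype="pdf",
        filename=pdf_path.name,
    )
    return msg
```

The sketch only builds the message; actually sending it (for example over SMTP) is left out, just as the flow delegates that to the Send an email (V2) action.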

The fun part, testing it

In my test I am using the following test PDF.

[Screenshot: test manager document details]

This file will be dropped in the drop-off folder.

[Screenshot: test manager document]

If all goes well, you should see the e-mail and/or the following output in your flow run: a processed PDF file with a field called Manager value that contains the correct e-mail address from the PDF file.

[Screenshot: Manager value extraction]
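
Before mailing to an extracted address, it can help to sanity-check it. The Python sketch below illustrates that idea under stated assumptions: the result shape (a dict per field with value and confidence keys), the function name pick_manager_address, and the 0.8 threshold are all hypothetical. The real Predict action output is structured differently, so inspect it in your own flow run.

```python
import re
from typing import Any, Mapping, Optional

# Deliberately loose check: something@something.tld without spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def pick_manager_address(
    fields: Mapping[str, Mapping[str, Any]],  # hypothetical, simplified result shape
    min_confidence: float = 0.8,              # hypothetical threshold
) -> Optional[str]:
    """Return the Manager value if it looks usable as a recipient, else None."""
    field = fields.get("Manager")
    if not field:
        return None

    value = str(field.get("value", "")).strip()
    confidence = float(field.get("confidence", 0.0))

    if confidence < min_confidence:
        return None  # the model was not sure enough about this field
    if not EMAIL_PATTERN.match(value):
        return None  # the extracted text is not shaped like an e-mail address
    return value
```

In a flow, a comparable guard could be a Condition action placed before Send an email (V2), so that documents with a doubtful Manager value are routed elsewhere instead of being mailed.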

Happy testing!
