WO2023144218A1 - An electronic device and a method for tabular data extraction - Google Patents

An electronic device and a method for tabular data extraction

Info

Publication number
WO2023144218A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
format
electronic device
pattern
extraction
Prior art date
Application number
PCT/EP2023/051825
Other languages
French (fr)
Inventor
Sunil Kumar CHINNAMGARI
Vipin Prabhudas SOLANKI
Sudharsan Bhaskera BABU
Lawrence MENDONCA
Original Assignee
A.P. Møller - Mærsk A/S
Priority date
Filing date
Publication date
Application filed by A.P. Møller - Mærsk A/S
Publication of WO2023144218A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management

Definitions

  • the present disclosure pertains to the field of electronic document control and management.
  • the present disclosure relates to an electronic device and a related method for tabular data extraction.
  • An electronic device comprising memory circuitry, processor circuitry, and an interface.
  • the electronic device is configured to obtain first data indicative of a document.
  • the first data has a first format.
  • the first data comprises tabular data.
  • the electronic device is configured to convert the first data into second data having a second format different from the first format.
  • the electronic device is configured to obtain third data indicative of a pattern.
  • the electronic device is configured to generate, based on the second data and the third data, an extraction result set comprising first extraction data.
  • the first extraction data has a third format different from the first format and from the second format.
  • the method comprises obtaining first data indicative of a document.
  • the first data has a first format.
  • the first data comprises tabular data.
  • the method comprises converting the first data into second data having a second format different from the first format.
  • the method comprises obtaining third data indicative of a pattern.
  • the method comprises generating, based on the second data and the third data, an extraction result set comprising first extraction data.
  • the first extraction data has a third format different from the first format and from the second format.
  • the disclosed electronic device and method allow for improved accuracy in tabular data extraction from documents and enable automation of the tabular data extraction.
  • the present disclosure may alleviate error propagation during the data handling process, e.g., using an appropriate pattern to generate the extraction result set.
  • extracted structural data can be fed into systems, which provides a more robust control of the data.
  • the extraction result set is generated so as to enable reuse and/or storage of the extraction result set by other systems and/or applications. Further, the extracted data can be used for auto reconciliation with other datasets, which would otherwise have ended in a laborious manual activity.
  • the present disclosure may alleviate the need for marking and saving (e.g. manually marking and saving) tabular coordinates for data extraction from a document.
  • the disclosed technique provides real-time results and does not require any storage of the tabular coordinates that need to be extracted from the document.
  • Fig. 1 is a diagram illustrating schematically a process where the disclosed technique is carried out by an example electronic device according to this disclosure.
  • Figs. 2A-2B are diagrams illustrating an exemplary data extraction, performed by an electronic device, according to this disclosure.
  • Figs. 3A-3B are diagrams illustrating an exemplary data extraction, performed by an electronic device, according to this disclosure.
  • Fig. 4 is a flow-chart illustrating an exemplary method, performed by an electronic device, for providing an extraction result set according to this disclosure.
  • Fig. 5 is a block diagram illustrating an exemplary electronic device according to this disclosure.
  • Fig. 1 is a diagram illustrating schematically an example process 1 where the disclosed technique is carried out by an example electronic device according to the disclosure.
  • the example process, performed by the electronic device may provide, such as by extracting the content, an extraction result set, e.g., tabular data set, from a document.
  • the document may be in one or more formats, e.g., Portable Document Format (PDF), Excel format, an image format (e.g., JPEG, PNG, TIFF, or GIF), and/or bitmapped image file format (BMP).
  • Fig. 1 may be seen as an illustration of the logic to provide an extraction result set from a document, such as a document comprising data provided in a table format.
  • the electronic device obtains first data 4 indicative of a document, such as a document comprising a commercial invoice, a packing list, a list of types of goods and associated information, permits and licenses for operations, credentials associated with import and export ports, shipping container information, and/or freight information.
  • the first data 4 may have a first format, such as Portable Document Format (PDF), Excel format, an image format (e.g., JPEG, PNG, TIFF, or GIF), and/or bitmapped image file format (BMP).
  • the first data 4 may comprise tabular data, such as information arranged in rows and columns.
  • An example tabular data is illustrated in Figs. 2A and 3A.
  • the electronic device can convert 5 the first data 4 into second data 6.
  • the second data 6 may have a second format.
  • the second format can be one or more of: Hyper Text Markup Language, HTML, format, Text, TXT, format, Document, DOC, format, etc.
  • the second format may be different from the first format.
  • the electronic device may execute a Robotic Process Automation, RPA, to convert the first data 4 into the second data 6.
  • the electronic device for example obtains third data 8.
  • the third data may be indicative of a pattern.
  • a pattern may be seen as an arrangement and/or sequence of data elements showing a relation between the data elements.
  • data having a pattern may be data arranged in a particular sequence, that may repeat.
  • the pattern may indicate words and/or numbers arranged in a particular sequence and/or having a particular relation.
  • An example pattern is illustrated in Figs. 2B and 3B.
  • the electronic device generates 10, based on the second data 6 and the third data 8, an extraction result set 12.
  • the extraction result set 12 may comprise first extraction data 14.
  • the first extraction data 14 may have a third format.
  • the third format may be a text string, such as in Text, TXT, format, Document, DOC, format, JavaScript Object Notation, JSON, format, etc.
  • the third format may be different from the first format.
  • the third format may be different from the second format.
  • the electronic device obtains a PDF file and a pattern to be matched.
  • the electronic device may comprise a data extractor that applies the pattern to be matched on the tabular data.
  • the electronic device may comprise an RPA that e.g. converts the PDF to HTML format, which enables the data extractor to identify the tabular data and look for matching patterns provided as an input. This enables the electronic device to deliver just-in-time outcomes (e.g. outputs, such as an extraction result set comprising first extraction data) rather than relying on the cumbersome process of marking the PDF for the tabular coordinates.
  • the extracted result set is for example in a third format, such as standard JavaScript Object Notation, and can be easily integrated with downstream systems.
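The flow of Fig. 1 after the conversion step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the second data is already an HTML string containing a simple table, and that the pattern is supplied as one regular expression per column. The names `TableRows`, `extract`, `SECOND_DATA`, and `THIRD_DATA`, as well as the sample values, are hypothetical.

```python
import json
import re
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html_text, pattern_parts):
    """Return, as a JSON string, the rows whose cells all match the pattern parts."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    compiled = [re.compile(part) for part in pattern_parts]
    matches = [
        row for row in parser.rows
        if len(row) == len(compiled)
        and all(rx.fullmatch(cell) for rx, cell in zip(compiled, row))
    ]
    return json.dumps(matches)


# Hypothetical second data (HTML) and third data (pattern), for illustration only.
SECOND_DATA = """
<table>
  <tr><th>Package ref</th><th>Container</th></tr>
  <tr><td>PKG-0001</td><td>ABCU1234567</td></tr>
  <tr><td>PKG-0002</td><td>ABCU7654321</td></tr>
</table>
"""
THIRD_DATA = [r"PKG-\d{4}", r"[A-Z]{4}\d{7}"]

result = extract(SECOND_DATA, THIRD_DATA)
print(result)
```

The header row is dropped because its cells do not match the pattern, so no table coordinates need to be marked or stored.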
  • Fig. 2A is a diagram illustrating an exemplary table 50 comprising 20 rows and 2 columns.
  • a document may include a table like table 50.
  • the electronic device disclosed herein may obtain first data indicative of the document, such as a PDF file.
  • the first data comprises tabular data 52 of table 50.
  • the tabular data 52 illustrated is provided to the example electronic device to generate an extraction result set 56 according to this disclosure.
  • the first data comprising the tabular data 52 may be converted into second data by the electronic device.
  • the table 50 comprises 2 columns, such as a first column and a second column.
  • the first column may represent a first parameter.
  • the first column comprises data elements associated with the first parameter.
  • the second column may represent a second parameter.
  • the second column comprises data elements associated with the second parameter.
  • tabular data 52 may be indicative of freight data, such as freight allocation information.
  • the first data may comprise information indicative of a shipping package reference number and information indicative of a Bureau of International Containers, BIC, number of a container in which the shipping package is loaded.
  • the first parameter provided in the first column may be a shipping package reference number.
  • the first column may comprise data elements indicative of shipping package reference numbers.
  • the second parameter provided in the second column may be a BIC number of a container.
  • the second column may comprise data elements indicative of BIC numbers of containers.
  • the shipping package reference number may be seen as a first parameter, and container BIC number may be seen as a second parameter.
  • the table may comprise N rows and M columns (where M and N are positive integers).
  • the freight allocation information may comprise M shipping package reference numbers and N container BIC numbers.
  • the M shipping package reference numbers may be placed in a column, such as the first column of the table 50.
  • the N container BIC numbers may be placed in a column, such as the second column of the table 50.
  • the first column of the table 50 may be associated with the first parameter.
  • the second column of the table may be associated with the second parameter of the tabular data, such as freight data.
  • Fig. 2B illustrates exemplary data, such as third data indicative of a pattern, e.g. the pattern 54 associated with the tabular data 52 of Fig. 2A.
  • the pattern comprises 2 parts, such as a first part 54A and a second part 54B.
  • the pattern 54 may be obtained (such as retrieved and/or received) by the electronic device.
  • the pattern 54 may be provided by a user as an input to the electronic device and/or as input to another electronic device.
  • the pattern provides the arrangement of the first part and the second part, e.g. the arrangement of the first part in relation to the second part.
  • the first part 54A may be seen as information representing a first parameter.
  • the second part 54B may be seen as information representing a second parameter.
  • the first part 54A may comprise one or more letters and/or one or more numbers, e.g., reference numbers, identification codes, e.g., BIC codes.
  • the second part 54B may comprise one or more letters and/or one or more numbers, e.g., reference numbers, identification codes, e.g., BIC codes.
  • the first part 54A indicates the data of a first cell of the first column of table 50.
  • the second part 54B indicates the data of a first cell of the second column of table 50.
  • the pattern 54 may not comprise data indicative of the first cell(s) of the first column or data indicative of the second column of the table 50.
  • the pattern 54 may not comprise data indicative of the first cell(s) of the first column nor data indicative of the second column of the table 50 but may comprise a generic format similar to a data format of cell(s) of the first column and/or of cell(s) of the second column to provide an indication.
  • the third data can indicate the pattern for extraction.
  • the third data can indicate how to extract the data.
  • the third data can be provided in a third format.
  • the third format may be a text, TXT, format.
  • the third data may comprise a string.
  • the parts of the pattern may be seen as strings.
  • the electronic device may obtain the pattern 54.
  • the electronic device may optionally obtain the parts of the pattern 54.
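Where a pattern part carries a generic format similar to the data format of a column's cells, one way to make that usable for matching is to generalize an example string into a regular expression. The sketch below is an assumption about how this could be done, not a description of the disclosed device; `char_class` and `shape_regex` are hypothetical helpers, ASCII input is assumed, and the sample values are invented.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to a regex fragment (ASCII input assumed)."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Z]" if ch.isupper() else "[a-z]"
    return re.escape(ch)  # punctuation and spaces are kept literally


def shape_regex(example):
    """Generalize an example cell value into a regex with the same 'shape'."""
    pieces = []
    for cls, run in groupby(example, key=char_class):
        n = len(list(run))
        pieces.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(pieces)


# A pattern part given as an example string matches other cells of the same format.
container_part = shape_regex("ABCU1234567")
print(container_part)
print(bool(re.fullmatch(container_part, "MSKU7654321")))
```

With this approach the pattern does not need to contain the actual content of any cell, only a value with the same arrangement of letters, digits, and separators.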
  • the electronic device may be a client device and/or a server device.
  • the electronic device may comprise an application programming interface, API, configured to obtain from a user a pattern and a document, via the first data and the third data respectively.
  • the electronic device may be an API configured to provide, based on the document and the pattern (e.g. via the first data and the third data respectively), the extraction result set to e.g. another device or machine.
  • the API can be hosted in a server device or on a distributed cloud.
  • the electronic device may be a tabular extraction data device.
  • the electronic device may use the second data (e.g. an HTML file) and the third data (such as the pattern 54) to generate an extraction result set 56 comprising extraction data, such as first extraction data 56A and optionally second extraction data.
  • Generating an extraction result set 56 may comprise extracting the data elements of tabular data 52 that follow the pattern 54 indicated by the third data.
  • the extracted result set 56 may be in a third format, such as the standard JavaScript Object Notation format.
  • the electronic device may provide the extraction result set 56 to a control system, e.g. a shipping control system.
  • the electronic device may use the extracted result set 56 to control a process, such as a cost estimation, and/or generating invoices.
  • the electronic device may use the extracted result set 56 to control a machine, such as controlling the operation of cranes at the port.
  • the electronic device may provide (e.g. transmit) the extracted result set 56 and/or the third data (e.g. pattern) to another electronic device. Additionally and/or alternatively, the electronic device may provide (e.g. transmit) the extracted result set 56 and/or the third data (e.g. pattern) to a machine for controlling the machine, such as for controlling the operation of cranes at the port.
  • the extracted result set 56 may be in a third format that comprises a text string.
  • the text string may be in the standard JavaScript Object Notation, JSON, format.
  • the recipient machine of the extracted result set 56 and/or the third data can use a JSON formatter and the third data (e.g. pattern) to read out the first extraction data from the extracted result set 56.
  • the extracted result set may be in the form of e.g.: {A1, A2, B1, B2, C1, C2, D1, D2}.
  • a machine receiving the extracted result and an associated pattern can use the pattern to read the extracted result set.
  • the machine can identify the first series as A1 from the pattern and the second series as B1.
  • the machine can classify extraction data in the extracted result set as A1 (first column) until the first occurrence of B1. This can be implemented for example in a loop (for example by using a JSON formatter).
  • the machine can read the extracted result set e.g.:
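One possible reading of the loop described above, as a minimal sketch. The assumptions here are not from the disclosure: the result set is taken to be a flat JSON array in column order, each pattern part is taken to equal the first element of its series, and the sample values and the name `split_series` are hypothetical.

```python
import json


def split_series(flat, markers):
    """Split a flat sequence into series, starting a new series at each marker."""
    series = []
    remaining = list(markers)
    for item in flat:
        if remaining and item == remaining[0]:
            series.append([])      # first occurrence of the next marker
            remaining.pop(0)
        if not series:
            raise ValueError("result set does not start with the first marker")
        series[-1].append(item)
    return series


# Hypothetical flat result set and the pattern markers received with it.
flat_result = json.loads('["A1", "A2", "A3", "B1", "B2", "B3"]')
pattern = ["A1", "B1"]

first_col, second_col = split_series(flat_result, pattern)
rows = list(zip(first_col, second_col))
print(rows)
```

Everything up to the first occurrence of the second marker is classified as the first column, and zipping the series restores the row association.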
  • Fig. 3A illustrates an exemplary table 70 comprising 20 rows and 3 columns.
  • the table 70 provides tabular data 72 on which the disclosed technique is carried out by an example electronic device to generate an extraction result set 76 according to this disclosure.
  • the table 70 comprises 3 columns, such as a first column, a second column, and a third column.
  • the first column may represent a first parameter.
  • the first column comprises data elements associated with the first parameter.
  • the second column may represent a second parameter.
  • the second column comprises data elements associated with the second parameter.
  • the third column comprises data elements associated with the third parameter.
  • tabular data 72 may be indicative of freight data, such as freight cost information, which may comprise information indicative of a shipping package reference number, information indicative of a BIC number of a container in which the shipping package is loaded, and a freight cost for shipping.
  • the first data may comprise information indicative of a shipping package reference number and information indicative of a BIC number of a container in which the shipping package is loaded.
  • the first parameter provided in the first column may be a shipping package reference number.
  • the first column may comprise data elements indicative of shipping package reference numbers.
  • the second parameter provided in the second column may be a BIC number of a container.
  • the second column may comprise data elements indicative of BIC numbers of containers.
  • the third parameter provided in the third column may be a freight cost.
  • the third column may comprise data elements indicative of freight costs.
  • the freight cost information may comprise a plurality of shipping package reference numbers, a plurality of container BIC numbers, and a plurality of freight costs for shipping associated with corresponding shipping packages in containers.
  • Shipping package reference numbers may be placed in a column, such as the first column of the table 70.
  • container BIC numbers may be placed in a column, such as the second column of the table 70.
  • freight costs for shipping may be placed in a column, such as the third column of table 70.
  • Fig. 3B illustrates exemplary data, such as third data indicative of a pattern, e.g. the pattern 74.
  • the pattern comprises 3 parts, such as a first part 74A, a second part 74B, and a third part 74C.
  • the pattern 74 may be obtained by the electronic device.
  • the arrangement of the first part 74A, the second part 74B, and the third part 74C in the pattern may be obtained by the electronic device.
  • the third data may have a format, such as a third format.
  • the third format may be a text, TXT, format.
  • the third data may comprise a string.
  • the parts of the pattern may be seen as strings.
  • the third data may comprise data indicative of a first part of the pattern, data indicative of a second part of the pattern and optionally data indicative of a third part of the pattern.
  • the third data can include data indicative of the first part 54A and data indicative of the second part 54B of the pattern 54, as illustrated in Figs. 2A-2B.
  • the third data can include data indicative of the first part 74A, data indicative of the second part 74B, and data indicative of the third part 74C of the pattern 74, as illustrated in Figs. 3A-3B.
  • the electronic device may obtain, via the third data, the pattern 74 for extracting data from the tabular data while maintaining the association between data elements of a first column with corresponding data elements of the second and third columns.
  • the electronic device may generate, based on the second data and the pattern (such as the pattern 74), an extraction result set 76 comprising extraction data, such as first extraction data 76A.
  • Generating an extraction result set 76 may comprise extracting the data elements of tabular data 72 which follow the pattern 74 indicated by the third data.
  • the extracted result set 76 may be in a third format, such as the standard JavaScript Object Notation format.
  • the electronic device may provide the extraction result set 76 to a control system.
  • the electronic device may use the extracted result set 76 to control a process, such as freight scheduling.
  • the electronic device may use the extracted result set 76 to control a machine, such as controlling the operation of cranes at the port to prioritize the container handling for express shipping.
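The three-part case of Figs. 3A-3B can be sketched as a row-wise match that keeps each package reference tied to its container and cost. This is an illustration under assumptions: the rows are taken as already recovered from the second data, and the regular expressions, the `FreightRow` type, the `EXAMPLE_PATTERN` name, and the sample values are hypothetical stand-ins for the parts 74A, 74B, and 74C.

```python
import re
from typing import NamedTuple


class FreightRow(NamedTuple):
    package_ref: str
    container: str
    cost: float


# Hypothetical stand-ins for the three pattern parts (74A, 74B, 74C).
EXAMPLE_PATTERN = (
    re.compile(r"PKG-\d{4}"),        # shipping package reference number
    re.compile(r"[A-Z]{4}\d{7}"),    # container number
    re.compile(r"\d+\.\d{2}"),       # freight cost
)


def extract_rows(rows, pattern=EXAMPLE_PATTERN):
    """Keep only rows whose cells match all pattern parts, preserving row association."""
    out = []
    for row in rows:
        if len(row) != len(pattern):
            continue
        cells = [cell.strip() for cell in row]
        if all(rx.fullmatch(cell) for rx, cell in zip(pattern, cells)):
            ref, container, cost = cells
            out.append(FreightRow(ref, container, float(cost)))
    return out


table_rows = [
    ["Package ref", "Container", "Freight cost"],   # header, does not match
    ["PKG-0001", "ABCU1234567", "120.00"],
    ["PKG-0002", "ABCU7654321", "230.50"],
    ["", "Total", "350.50"],                        # summary row, does not match
]
extracted = extract_rows(table_rows)
print(extracted)
```

Because a row is accepted or rejected as a whole, a cost can never be paired with the wrong package or container.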
  • Fig. 4 shows a flow diagram of an exemplary method 100, performed by an electronic device according to the disclosure, for providing an extraction result set.
  • the electronic device is the electronic device disclosed herein, such as the electronic device 300 of Fig.
  • the method 100 comprises obtaining S102 first data indicative of a document.
  • the first data has a first format.
  • the first data comprises tabular data.
  • Example documents include one or more of: shipping order, invoice data, and freight document.
  • the first data may be indicative of a document, such as a first document.
  • the first data has a format, such as the first format.
  • the first format may be a portable document format, PDF.
  • the first format may be one or more of: an excel format, an image format, e.g., JPEG, PNG, TIFF, GIF, and/or bitmapped image file format, BMP.
  • the first data may comprise tabular data.
  • tabular data may be seen as data provided in a table.
  • tabular data includes information arranged in rows and columns, each row and/or column representing a data element group, such as one or more data elements related to the same parameter and/or of the same type.
  • the first document may be a document having a PDF format retrieved and/or received from a sender by the electronic device 300.
  • the document (such as PDF file) may comprise information indicative of freight data and/or of invoicing data.
  • Freight data may be indicative of freight allocation information which may comprise information indicative of a shipping package reference number and information indicative of a BIC number of a container in which the shipping package is loaded.
  • the format of the document may represent the first format of the first data.
  • the first data may comprise data indicative of shipping data and/or invoicing data and/or legal data and/or technical data.
  • the first data may be arranged in tabular format.
  • the first data comprises data elements.
  • the data elements may be indicative of freight information, e.g., commodity information, shipping quantity, shipping rates, shipping cost, discounts, and/or total cost of the fulfilled service, etc.
  • the data elements may be indicative of legal information.
  • the data elements may be indicative of billing information.
  • the data elements may be indicative of technical information.
  • the method 100 comprises converting S104 the first data into second data having a second format different from the first format.
  • the second format may be one of Hyper Text Markup Language, HTML, format, Text, TXT, format, and Document, DOC, format.
  • the first format may be PDF format.
  • the second format may be the HTML format.
  • the first format and the second format may be similar.
  • the second format may be a default format, such as HTML format.
  • the first data and the second data may represent similar information.
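The conversion step S104 can be pictured as a lookup of a converter keyed by the first and second formats, with a guard that the two formats differ. The sketch below is hypothetical throughout: converting a PDF would need an external tool, so tab-separated text stands in for the first format purely for illustration, and `CONVERTERS`, `converter`, `tsv_to_html`, and `convert` are invented names.

```python
from html import escape

# Registry of converters keyed by (first_format, second_format).
CONVERTERS = {}


def converter(first_format, second_format):
    """Decorator registering a function as the converter between two formats."""
    def register(func):
        CONVERTERS[(first_format, second_format)] = func
        return func
    return register


@converter("tsv", "html")
def tsv_to_html(text):
    """Stand-in converter: tab-separated text to an HTML table."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def convert(first_data, first_format, second_format):
    """Convert first data into second data; the two formats must differ."""
    if first_format == second_format:
        raise ValueError("second format must differ from the first format")
    try:
        func = CONVERTERS[(first_format, second_format)]
    except KeyError:
        raise ValueError(
            f"no converter from {first_format} to {second_format}"
        ) from None
    return func(first_data)


second_data = convert("PKG-0001\tABCU1234567", "tsv", "html")
print(second_data)
```

The first data and the second data carry the same information; only the representation changes, which is what lets the later matching step work on markup instead of page coordinates.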
  • the method 100 comprises obtaining S106 third data indicative of a pattern.
  • the third data may be obtained via user input and/or an application programming interface.
  • the third data may be indicative of a pattern, such as a first pattern, and optionally a second pattern, and optionally a third pattern.
  • the pattern may comprise one or more parts. The one or more parts may be seen as one or more attributes representing a relation between the one or more data elements of the first data.
  • the method 100 comprises generating S108, based on the second data and the third data, an extraction result set comprising first extraction data.
  • the first extraction data has a third format different from the first format and from the second format.
  • the third format comprises a text string, such as one of Text, TXT, format, and Document, DOC, format.
  • the third format may include a JavaScript Object Notation format.
  • the extraction result set and the first data may represent similar information but where the extraction result set is adapted to provide the information to a control system.
  • converting S104 the first data into second data comprises executing S104A a Robotic Process Automation, RPA.
  • the RPA is configured to convert the first data into the second data.
  • RPA may be seen as a program that performs the automated steps, e.g.
  • the RPA may be configured to convert the first data with the first format into the second data with the second format.
  • RPA may use a format converter, e.g., PDF to HTML, to convert the first data with first format into second data with the second format.
  • the RPA is configured to obtain the third data indicative of the pattern.
  • the RPA may be configured to obtain the third data from a user input and/or an application programming interface (API) and/or the memory of the electronic device in which the RPA is executed.
  • RPA may be configured to generate the third data dynamically based on the historical data.
  • the method 100 comprises determining S105, based on one or more sample documents, the third data indicative of the pattern.
  • the pattern may comprise one or more attributes, as illustrated in Figs. 2B and 3B.
  • the one or more attributes may be seen as providing a relation between one or more data elements of the first data so as to extract robustly e.g. the first extraction data.
  • RPA may be configured to obtain the pattern.
  • the third data may be generated, by the electronic device, based on the one or more sample documents by identifying one or more patterns in the sample documents.
  • the one or more sample documents may be seen as templates, e.g., templates related to freight invoices, and documents comprising freight details, freight acknowledgements, etc.
  • the one or more sample documents may comprise documents that are already processed.
  • the one or more sample documents may be provided as input to the electronic device.
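Determining the third data from sample documents (S105) could, for example, mean finding the row format that recurs most often across the samples. The following is a sketch under assumptions, not the disclosed method: sample documents are represented as lists of already-parsed rows, ASCII input is assumed, and `kind`, `cell_regex`, `infer_pattern`, and the sample values are hypothetical.

```python
import re
from collections import Counter
from itertools import groupby


def kind(ch):
    """Map one character to a regex fragment (ASCII input assumed)."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def cell_regex(cell):
    """Describe a cell as runs of character kinds, e.g. letters{3}, '-', digits{4}."""
    return "".join(f"{k}{{{len(list(run))}}}" for k, run in groupby(cell, key=kind))


def infer_pattern(sample_documents):
    """Return the most frequent row signature across all samples as pattern parts."""
    counts = Counter(
        tuple(cell_regex(cell.strip()) for cell in row)
        for rows in sample_documents
        for row in rows
    )
    if not counts:
        raise ValueError("sample documents contain no rows")
    signature, _count = counts.most_common(1)[0]
    return list(signature)


# Two hypothetical, already-processed sample documents.
doc1 = [
    ["Package ref", "Container"],
    ["PKG-0001", "ABCU1234567"],
    ["PKG-0002", "ABCU7654321"],
]
doc2 = [
    ["Ref", "Container no."],
    ["PKG-0103", "MSKU0000001"],
]
inferred = infer_pattern([doc1, doc2])
print(inferred)
```

Header rows differ between the samples and therefore lose the vote, while the data rows share one signature that becomes the pattern.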
  • the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
  • the first data may comprise the tabular data.
  • the tabular data may be arranged in one or more rows and one or more columns.
  • the pattern may be a target pattern having the arrangement associating data elements of a first column with corresponding data elements in a second column in a same row.
  • generating S108, based on the second data and the third data, the extraction result set comprising the first extraction data comprises generating S108A, based on the second data and the pattern indicated in the third data, the first extraction data.
  • the extraction result set comprises second extraction data.
  • the second extraction data is generated based on the second data and the pattern indicated in the third data.
  • the extraction result set comprises third extraction data.
  • the third extraction data is generated based on the second data and the pattern indicated in the third data.
  • the extraction result set and the first data may represent similar information, however the extraction result set is provided in a format that can be used by the control systems downstream.
  • generating S108, based on the second data and the third data, the extraction result set comprises extracting S108B data elements of the second data that match the pattern indicated by the third data. In one or more example methods, generating S108, based on the second data and the third data, the extraction result set comprises extracting S108B data elements of the second data that follow the pattern indicated by the third data.
  • the RPA may execute the extraction of data elements of the tabular data having the second format.
  • the electronic device may be configured to look for a similar matching pattern, such as the target pattern, in the second data to extract the data elements.
  • the first format is a Portable Document Format, PDF.
  • the second format comprises a Hyper Text Markup Language, HTML, format.
  • the third format comprises a text string.
  • the text string may be in the standard JavaScript Object Notation, JSON, format.
  • the method 100 comprises providing S110 the extraction result set to a control system.
  • the control system may be an invoicing control system, and/or a shipping control system.
  • providing the extraction result set to a control system may comprise controlling the control system.
  • the control system may be a logistics control system.
  • the method 100 comprises controlling S112, based on the extraction result set, a process and/or a machine.
  • the process can be a downstream system, such as a logistics system, and/or a shipping system and/or a billing system.
  • the extraction result set may be fed to the control system by the electronic device to control the process of the control system, such as controlling logistics processes, e.g., updating the priorities of shipment of containers.
  • the electronic device may be seen as a computing device for extraction of tabular data, such as a standalone computing system.
  • the electronic device may be seen as a computing device for extraction of tabular data, such as a client device and/or a server device.
  • the electronic device may be an API configured to obtain from a user a pattern and a document.
  • the electronic device may be an API configured to provide, based on the document and the pattern, the extraction result set to e.g. another device or machine.
  • the API can be hosted in a server device or on a distributed cloud.
  • the electronic device may be a tabular extraction data device.
  • the extraction result set may be fed to the control system by the electronic device to control a machine, such as controlling a machine logistics operation, e.g., turning off a machine when there is less freight to handle by the control system.
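How a control system might act on the extraction result set (S110, S112) is sketched below for the example of turning off a machine when there is little freight to handle. Everything here is an assumption for illustration: the record layout with a `container` key, the threshold, the command strings, and the name `machine_command` are hypothetical and not taken from the disclosure.

```python
import json


def machine_command(extraction_result_json, min_containers=2):
    """Decide a machine command from the number of distinct containers to handle."""
    records = json.loads(extraction_result_json)
    containers = {record["container"] for record in records}
    return "run" if len(containers) >= min_containers else "off"


# Hypothetical extraction result sets in JSON format.
busy = json.dumps([
    {"package_ref": "PKG-0001", "container": "ABCU1234567"},
    {"package_ref": "PKG-0002", "container": "ABCU7654321"},
])
quiet = json.dumps([
    {"package_ref": "PKG-0003", "container": "ABCU1234567"},
])

print(machine_command(busy))
print(machine_command(quiet))
```

Because the result set arrives as plain JSON, the control logic needs no knowledge of the original document or its layout.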
  • Fig. 5 shows a block diagram of an exemplary electronic device 300 according to the disclosure.
  • the electronic device 300 comprises a memory circuitry 301, a processor circuitry 302, and an interface 303.
  • the electronic device 300 is configured to perform any of the methods disclosed in Fig. 4. In other words, the electronic device 300 is configured for providing an extraction result set.
  • the electronic device may be seen as a computing device for extraction of tabular data, such as a standalone computing system.
  • the electronic device may be seen as a computing device for extraction of tabular data, such as a client device and/or a server device.
  • the electronic device may be an API configured to obtain from a user a pattern and a document.
  • the electronic device may be an API configured to provide, based on the document and the pattern, the extraction result set to e.g. another device or machine.
  • the API can be hosted in a server device or on a distributed cloud.
  • the electronic device may be a tabular extraction data device.
  • the electronic device 300 is configured to obtain (such as using the processor circuitry 302, and/or via the interface 303) first data indicative of a document.
  • the first data has a first format.
  • the first data comprises tabular data.
  • the electronic device 300 is configured to convert (such as using the processor circuitry 302) the first data into second data having a second format different from the first format.
  • the electronic device 300 is configured to obtain (such as using the processor circuitry 302, and/or via the interface 303) third data indicative of a pattern.
  • the electronic device 300 is configured to generate (such as using the processor circuitry 302), based on the second data and the third data, an extraction result set comprising first extraction data.
  • the first extraction data has a third format different from the first format and from the second format.
  • the electronic device 300 is configured to execute (such as using the processor circuitry 302) a Robotic Process Automation, RPA.
  • the RPA is configured to convert (such as using the processor circuitry 302) the first data into the second data.
  • the RPA is configured to obtain (such as using the processor circuitry 302, and/or via the interface 303) the third data indicative of the pattern.
  • the electronic device 300 is configured to determine (such as using the processor circuitry 302), based on one or more sample documents, the third data indicative of the pattern.
  • the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
  • the electronic device 300 is configured to generate (such as using the processor circuitry 302), based on the second data and the pattern indicated in the third data, the first extraction data.
  • the electronic device 300 is configured to generate (such as using the processor circuitry 302), based on the second data and the third data, the extraction result set by extracting data elements of the second data matching the pattern indicated by the third data.
  • the first format is a Portable Document Format, PDF.
  • the second format comprises a Hyper Text Markup Language, HTML, format.
  • the third format comprises a text string.
  • the electronic device obtains a pdf file and a Pattern to be matched.
  • the electronic device may comprise a data Extractor that applies the pattern to be matched on the tabular data.
  • the electronic device may comprise an RPA that, e.g., converts the PDF to HTML format, which enables the data extractor to identify the tabular data and look for matches of the pattern provided as an input. This enables the electronic device to deliver just-in-time outcomes (e.g. the extraction result set) rather than relying on the cumbersome process of marking the PDF for the tabular coordinates.
  • the extracted result set is for example in a third format, such as standard JavaScript Object Notation, and can be easily integrated with downstream systems.
  • the electronic device 300 is configured to provide the extraction result set to a control system.
  • the electronic device 300 is configured to control, based on the extraction result set, a process and/or a machine.
  • the processor circuitry 302 is optionally configured to perform any of the operations disclosed in Fig. 4 (such as any one or more of: S102, S104, S104A, S105, S106, S108, S108A, S108B, S110, S112).
  • the operations of the electronic device 300 may be embodied in the form of executable logic routines (e.g., lines of code, software programs, etc.) that are stored on a non-transitory computer readable medium (e.g., the memory circuitry 301) and are executed by the processor circuitry 302.
  • the operations of the electronic device 300 may be considered a method that the electronic device 300 is configured to carry out. Also, while the described functions and operations may be implemented in software, such functionality may as well be carried out via dedicated hardware or firmware, or some combination of hardware, firmware and/or software.
  • the memory circuitry 301 may be one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, a random access memory (RAM), or other suitable device.
  • the memory circuitry 301 may include a non-volatile memory for long term data storage and a volatile memory that functions as system memory for the processor circuitry 302.
  • the memory circuitry 301 may exchange data with the processor circuitry 302 over a data bus. Control lines and an address bus between the memory circuitry 301 and the processor circuitry 302 also may be present (not shown in Fig. 5).
  • the memory circuitry 301 is considered a non-transitory computer readable medium.
  • the memory circuitry 301 may be configured to store the first data, the second data, the third data, the first extraction data, and the extraction result set.
  • the memory circuitry 301 may be configured to store one or more programs in a part of the memory.
  • the one or more programs may comprise instructions, which when executed by an electronic device cause the electronic device to perform any of the methods disclosed in Fig. 4.
  • Embodiments of methods and products (electronic device) according to the disclosure are set out in the following items:
  • Item 1 An electronic device comprising memory circuitry, processor circuitry, and an interface, wherein the electronic device is configured to obtain first data indicative of a document, wherein the first data has a first format, wherein the first data comprises tabular data; convert the first data into second data having a second format different from the first format; obtain third data indicative of a pattern; and generate, based on the second data and the third data, an extraction result set comprising first extraction data, wherein the first extraction data has a third format different from the first format and from the second format.
  • Item 2 The electronic device of item 1, wherein the electronic device is configured to execute a Robotic Process Automation, RPA, wherein the RPA is configured to convert the first data into the second data.
  • Item 3 The electronic device of item 2, wherein the RPA is configured to obtain the third data indicative of the pattern.
  • Item 4 The electronic device of any of the previous items, wherein the electronic device is configured to determine, based on one or more sample documents, the third data indicative of the pattern.
  • Item 5 The electronic device of any of the previous items, wherein the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
  • Item 6 The electronic device of any of the previous items, wherein the electronic device is configured to generate, based on the second data and the pattern indicated in the third data, the first extraction data.
  • Item 7 The electronic device of any of the previous items, wherein the electronic device is configured to generate, based on the second data and the third data, the extraction result set by extracting data elements of the second data matching the pattern indicated by the third data.
  • Item 8 The electronic device of any of the previous items, wherein the first format is a Portable Document Format, PDF.
  • Item 9 The electronic device of any of the previous items, wherein the second format comprises a Hyper Text Markup Language, HTML, format.
  • Item 10 The electronic device of any of the previous items, wherein the third format comprises a text string.
  • Item 11 The electronic device of any of the previous items, wherein the electronic device is configured to provide the extraction result set to a control system.
  • Item 12 The electronic device of any of the previous items, wherein the electronic device is configured to control, based on the extraction result set, a process and/or a machine.
  • Item 13 A method, performed by an electronic device, for providing an extraction result set comprising: obtaining (S102) first data indicative of a document, wherein the first data has a first format, wherein the first data comprises tabular data; converting (S104) the first data into second data having a second format different from the first format; obtaining (S106) third data indicative of a pattern; and generating (S108), based on the second data and the third data, an extraction result set comprising first extraction data, wherein the first extraction data has a third format different from the first format and from the second format.
  • Item 14 The method of item 13, wherein converting (S104) the first data into second data comprises executing (S104A) a Robotic Process Automation, RPA, wherein the RPA is configured to convert the first data into the second data.
  • Item 15 The method of item 14, wherein the RPA is configured to obtain the third data indicative of the pattern.
  • Item 16 The method according to any of items 13-15, the method comprising determining (S105), based on one or more sample documents, the third data indicative of the pattern.
  • Item 17 The method according to any of items 13-16, wherein the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
  • Item 18 The method according to any of items 13-17, wherein generating (S108), based on the second data and the third data, the extraction result set comprising the first extraction data comprises generating (S108A), based on the second data and the pattern indicated in the third data, the first extraction data.
  • Item 19 The method according to any of items 13-18, wherein generating (S108), based on the second data and the third data, the extraction result set comprises extracting (S108B) data elements of the second data that are matching the pattern indicated by the third data.
  • Item 20 The method according to any of items 13-19, wherein the first format is a Portable Document Format, PDF.
  • Item 21 The method according to any of items 13-20, wherein the second format comprises a Hyper Text Markup Language, HTML, format.
  • Item 22 The method according to any of items 13-21, wherein the third format comprises a text string.
  • Item 23 The method according to any of items 13-22, the method comprising providing (S110) the extraction result set to a control system.
  • Item 24 The method according to any of items 13-23, the method comprising controlling (S112), based on the extraction result set, a process and/or a machine.
  • Item 25 A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device cause the electronic device to perform any of the methods of items 13-24.
  • the use of the terms “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. does not imply any particular order; the terms are included to identify individual elements.
  • the use of the terms “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. does not denote any order or importance, but rather the terms “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. are used to distinguish one element from another.
  • the words “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. are used here and elsewhere for labelling purposes only and are not intended to denote any specific spatial or temporal ordering.
  • the labelling of a first element does not imply the presence of a second element and vice versa.
  • Figs. 1-5 comprise some circuitries or operations which are illustrated with a solid line and some circuitries or operations which are illustrated with a dashed line.
  • the circuitries or operations which are comprised in a solid line are circuitries or operations which are comprised in the broadest example embodiment.
  • the circuitries or operations which are comprised in a dashed line are example embodiments which may be comprised in, or a part of, or are further circuitries or operations which may be taken in addition to the circuitries or operations of the solid line example embodiments. It should be appreciated that these operations need not be performed in the order presented.
  • a computer-readable medium may include removable and non-removable storage devices including, but not limited to, Read Only Memory (ROM), Random Access Memory (RAM), compact discs (CDs), digital versatile discs (DVD), etc.
  • program circuitries may include routines, programs, objects, components, data structures, etc. that perform specified tasks or implement specific abstract data types.
  • Computer-executable instructions, associated data structures, and program circuitries represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps or processes.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

Disclosed is an electronic device. The electronic device comprises memory circuitry, processor circuitry, and an interface. The electronic device is configured to obtain first data indicative of a document, wherein the first data has a first format, wherein the first data comprises tabular data. The electronic device is configured to convert the first data into second data having a second format different from the first format. The electronic device is configured to obtain third data indicative of a pattern. The electronic device is configured to generate, based on the second data and the third data, an extraction result set comprising first extraction data, wherein the first extraction data has a third format different from the first format and from the second format.

Description

AN ELECTRONIC DEVICE AND A METHOD FOR TABULAR DATA EXTRACTION
The present disclosure pertains to the field of electronic document control and management. The present disclosure relates to an electronic device and a related method for tabular data extraction.
BACKGROUND
Numerous types of documents are exchanged across various processes. Each document varies in the manner the content is presented. One of the common repetitive formats of content exchange to represent data in a homogenous form is the tabular structure. Purchase orders and invoices typically have tabular data to represent the commodity, quantity, rates, cost, discounts, and total cost of the fulfilled service. Extracting the data from a table is performed manually, which is time-consuming and prone to errors.
SUMMARY
Accordingly, there is a need for an electronic device and a method which mitigate, alleviate, or address the existing shortcomings and provide a solution for tabular data extraction.
An electronic device is disclosed, the electronic device comprising memory circuitry, processor circuitry, and an interface. The electronic device is configured to obtain first data indicative of a document. The first data has a first format. The first data comprises tabular data. The electronic device is configured to convert the first data into second data having a second format different from the first format. The electronic device is configured to obtain third data indicative of a pattern. The electronic device is configured to generate, based on the second data and the third data, an extraction result set comprising first extraction data. The first extraction data has a third format different from the first format and from the second format.
Disclosed is a method, performed by an electronic device, for providing an extraction result set. The method comprises obtaining first data indicative of a document. The first data has a first format. The first data comprises tabular data. The method comprises converting the first data into second data having a second format different from the first format. The method comprises obtaining third data indicative of a pattern. The method comprises generating, based on the second data and the third data, an extraction result set comprising first extraction data. The first extraction data has a third format different from the first format and from the second format.
Disclosed is a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device with a display and a touch-sensitive surface cause the electronic device to perform any of the methods disclosed herein.
It is an advantage of the present disclosure that the disclosed electronic device and method allow for improved accuracy in tabular data extraction from documents and enable automation of the tabular data extraction.
The present disclosure may alleviate error propagation during the data handling process, e.g., using an appropriate pattern to generate the extraction result set.
It is an advantage of the present disclosure that extracted structural data can be fed into systems, which provides a more robust control of the data. The extraction result set is generated so as to enable reuse and/or storage of the extraction result set by other systems and/or applications. Further, the extracted data can be used for auto reconciliation with other datasets, which would otherwise have ended in a laborious manual activity.
Further, the present disclosure may alleviate the need for marking and saving (e.g. manually marking and saving) tabular coordinates for data extraction from a document.
The disclosed technique provides real-time results and does not require any storage of the tabular coordinates that need to be extracted from the document.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features and advantages of the present disclosure will become readily apparent to those skilled in the art by the following detailed description of exemplary embodiments thereof with reference to the attached drawings, in which:
Fig. 1 is a diagram illustrating schematically a process where the disclosed technique is carried out by an example electronic device according to this disclosure,
Figs. 2A-2B are diagrams illustrating an exemplary data extraction, performed by an electronic device, according to this disclosure,
Figs. 3A-3B are diagrams illustrating an exemplary data extraction, performed by an electronic device, according to this disclosure,
Fig. 4 is a flow-chart illustrating an exemplary method, performed by an electronic device, for providing an extraction result set according to this disclosure, and
Fig. 5 is a block diagram illustrating an exemplary electronic device according to this disclosure.
DETAILED DESCRIPTION
Various exemplary embodiments and details are described hereinafter, with reference to the figures when relevant. It should be noted that the figures may or may not be drawn to scale and that elements of similar structures or functions are represented by like reference numerals throughout the figures. It should also be noted that the figures are only intended to facilitate the description of the embodiments. They are not intended as an exhaustive description of the disclosure or as a limitation on the scope of the disclosure. In addition, an illustrated embodiment need not have all the aspects or advantages shown. An aspect or an advantage described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced in any other embodiments even if not so illustrated, or if not so explicitly described.
The figures are schematic and simplified for clarity, and they merely show details which aid understanding the disclosure, while other details have been left out. Throughout, the same reference numerals are used for identical or corresponding parts.
Even though each document varies, a common repetitive format of content exchange to represent data in a homogenous form is a tabular structure including tabular data. Purchase orders and invoices typically have tabular data to represent the commodity, quantity, rates, cost, discounts, and total cost of the fulfilled service. The present disclosure provides Robotic Process Automation enabled tabular data extraction for automatically fetching tabular data content from documents, such as Portable Document Format documents.
Fig. 1 is a diagram illustrating schematically an example process 1 where the disclosed technique is carried out by an example electronic device according to the disclosure. The example process, performed by the electronic device, may provide, such as by extracting the content, an extraction result set, e.g., tabular data set, from a document. The document may be a document in one or more formats, e.g. a portable document format, PDF, excel format, image format, e.g., JPEG, PNG, TIFF, and GIF, and bitmapped image file format, BMP, etc.
Fig. 1 may be seen as an illustration of the logic to provide an extraction result set from a document, such as a document comprising data provided in a table format.
For example, the electronic device obtains first data 4 indicative of a document, such as a document comprising a commercial invoice, a packing list, a list of type of goods and associated information, permits and licenses for operations, credentials associated with import and export ports, shipping containers information, and/or freight information.
The first data 4 may have a first format, such as portable document format, PDF, excel format, image format, e.g., JPEG, PNG, TIFF, GIF, and/or bitmapped image file format, BMP, etc.
The first data 4 may comprise tabular data, such as information arranged in rows and columns. An example tabular data is illustrated in Figs. 2A and 3A.
The electronic device can convert 5 the first data 4 into second data 6. The second data 6 may have a second format.
For example, the second format can be one or more of: Hyper Text Markup Language, HTML, format, Text, TXT, format, Document, DOC, format, etc. The second format may be different from the first format.
The electronic device may execute a Robotic Process Automation, RPA, to convert the first data 4 into the second data 6. The electronic device for example obtains third data 8. The third data may be indicative of a pattern. A pattern may be seen as an arrangement and/or sequence of data elements showing a relation between the data elements. For example, data having a pattern may be data arranged in a particular sequence that may repeat. The pattern may indicate words and/or numbers arranged in a particular sequence and/or having a particular relation. An example pattern is illustrated in Figs. 2B and 3B.
The electronic device generates 10, based on the second data 6 and the third data 8, an extraction result set 12. The extraction result set 12 may comprise first extraction data 14. The first extraction data 14 may have a third format. The third format may be a text string, such as in Text, TXT, format, Document, DOC, format, JavaScript Object Notation, JSON, format, etc. The third format may be different from the first format. The third format may be different from the second format.
For example, the electronic device obtains a PDF file and a pattern to be matched. The electronic device may comprise a data extractor that applies the pattern to be matched on the tabular data. The electronic device may comprise an RPA that, e.g., converts the PDF to HTML format, which enables the data extractor to identify the tabular data and look for matches of the pattern provided as an input. This enables the electronic device to deliver just-in-time outcomes (e.g. outputs, e.g., extraction result set comprising first extraction data) rather than relying on the cumbersome process of marking the PDF for the tabular coordinates. The extracted result set is for example in a third format, such as standard JavaScript Object Notation, and can be easily integrated with downstream systems.
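The flow described above (second data in HTML format, third data indicative of a pattern, and a JSON result) can be sketched in Python as follows. This is an illustrative sketch only and not the implementation of the disclosure: the class `TableRowParser`, the function `extract_result_set`, the representation of the pattern as one regular expression per column, and all sample values are assumptions made for the example. The PDF-to-HTML conversion performed by the RPA is assumed to have already taken place.

```python
import json
import re
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_result_set(second_data_html, column_patterns):
    """Return a JSON string holding the table rows whose cells match the pattern.

    column_patterns is a hypothetical encoding of the third data:
    one regular expression per table column.
    """
    parser = TableRowParser()
    parser.feed(second_data_html)
    parser.close()
    compiled = [re.compile(p) for p in column_patterns]
    matching = [
        row for row in parser.rows
        if len(row) == len(compiled)
        and all(rx.fullmatch(cell) for rx, cell in zip(compiled, row))
    ]
    return json.dumps({"rows": matching})


# Made-up second data: a header row followed by two data rows.
html_doc = (
    "<table>"
    "<tr><th>Reference</th><th>Container</th></tr>"
    "<tr><td>PKG-0001</td><td>ABCU1234567</td></tr>"
    "<tr><td>PKG-0002</td><td>ABCU7654321</td></tr>"
    "</table>"
)
print(extract_result_set(html_doc, [r"PKG-\d{4}", r"[A-Z]{4}\d{7}"]))
```

In this sketch the header row is dropped simply because its cells do not match the per-column expressions, so no table coordinates need to be marked or stored.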
Fig. 2A is a diagram illustrating an exemplary table 50 comprising 20 rows and 2 columns. For example, a document may include a table like table 50. For example, the electronic device disclosed herein may obtain first data indicative of the document, such as a PDF file. The first data comprises tabular data 52 of table 50.
The tabular data 52 illustrated is provided to the example electronic device to generate an extraction result set 56 according to this disclosure.
The first data comprising the tabular data 52 may be converted into second data by the electronic device.
The table 50 comprises 2 columns, such as a first column and a second column. The first column may represent a first parameter. The first column comprises data elements associated with the first parameter. The second column may represent a second parameter. The second column comprises data elements associated with the second parameter. For example, tabular data 52 may be indicative of freight data, such as freight allocation information. The first data may comprise information indicative of a shipping package reference number and information indicative of a Bureau of International Containers, BIC, number of a container in which the shipping package is loaded. For example, the first parameter provided in the first column may be a shipping package reference number. The first column may comprise data elements indicative of shipping package reference numbers. For example, the second parameter provided in the second column may be a BIC number of a container. The second column may comprise data elements indicative of BIC numbers of containers.
The shipping package reference number may be seen as a first parameter, and container BIC number may be seen as a second parameter.
The table may comprise N rows and M columns (where M and N are positive integers).
The freight allocation information may comprise M shipping package reference numbers and N container BIC numbers. The M shipping package reference numbers may be placed in a column, such as the first column of the table 50, and the N container BIC numbers may be placed in a column, such as the second column of the table 50. The first column of the table 50 may be associated with the first parameter, and the second column of the table may be associated with the second parameter of the tabular data, such as freight data.
Fig. 2B illustrates exemplary data such as third data indicative of a pattern, such as the pattern 54 associated with e.g. tabular data 52 of Fig. 2A. The pattern comprises 2 parts, such as a first part 54A and a second part 54B. The pattern 54 may be obtained (such as retrieved and/or received) by the electronic device. For example, the pattern 54 may be provided by a user as an input to the electronic device and/or as input to another electronic device. The pattern provides the arrangement of the first part and the second part, e.g. the arrangement of the first part in relation to the second part. The first part 54A may be seen as information representing a first parameter. The second part 54B may be seen as information representing a second parameter. The first part 54A may comprise one or more letters and/or one or more numbers, e.g., reference numbers, identification codes, e.g., BIC codes. The second part 54B may comprise one or more letters and/or one or more numbers, e.g., reference numbers, identification codes, e.g., BIC codes. In one or more examples, as illustrated in Fig. 2B, it can be that the first part 54A indicates the data of a first cell of the first column of table 50 and the second part 54B indicates the data of a first cell of the second column of table 50.
In one or more examples, the pattern 54 may not comprise data indicative of the first cell(s) of the first column nor data indicative of the second column of the table 50 but may comprise a generic format similar to a data format of cell(s) of the first column and/or of cell(s) of the second column to provide an indication.
The third data can indicate the pattern for extraction. In other words, the third data can indicate how to extract the data.
The third data can be provided in a third format. The third format may be a text, TXT, format. The third data may comprise a string. The parts of the pattern may be seen as strings.
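One way to picture a pattern part that carries a generic format rather than the literal content of a cell is to generalize an example string into a regular expression. The sketch below is an assumption made for illustration and is not taken from the disclosure: the function name `generalize_part` and the rule "ASCII letter becomes a letter class, digit becomes a digit class, anything else is kept literally" are hypothetical.

```python
import re
import string


def generalize_part(example):
    """Turn an example cell value into a regular expression of the same shape.

    Hypothetical rule: each ASCII letter matches any ASCII letter, each ASCII
    digit matches any digit, and every other character must match literally.
    """
    pieces = []
    for ch in example:
        if ch in string.ascii_letters:
            pieces.append("[A-Za-z]")
        elif ch in string.digits:
            pieces.append("[0-9]")
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces)


# Made-up pattern parts: a package reference and a container number.
first_part = generalize_part("PKG-0001")
second_part = generalize_part("ABCU1234567")

print(bool(re.fullmatch(first_part, "XYZ-4711")))
print(bool(re.fullmatch(second_part, "not a container number")))
```

A stricter or looser generalization (for example, fixed prefixes or variable lengths) would be an equally valid choice; the point is only that a string-valued pattern part can describe the format of a column without quoting any particular cell.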
The electronic device may obtain the pattern 54. The electronic device may optionally obtain the parts of the pattern 54. The electronic device may be a client device and/or a server device. For example, the electronic device may comprise an application programming interface, API, configured to obtain from a user a document and a pattern, via the first data and the third data respectively. For example, the electronic device may be an API configured to provide, based on the document and the pattern (e.g. via the first data and the third data respectively), the extraction result set to e.g. another device or machine. The API can be hosted in a server device or on a distributed cloud. The electronic device may be a tabular extraction data device.
The electronic device may use the second data (e.g. an HTML file) and the third data (such as the pattern 54) to generate an extraction result set 56 comprising extraction data, such as first extraction data 56A and optionally second extraction data.
Generating an extraction result set 56 may comprise extracting the data elements of tabular data 52 that follow the pattern 54 indicated by the third data. The extracted result set 56 may be in a third format, such as the standard JavaScript Object Notation format. The electronic device may provide the extraction result set 56 to a control system, e.g. a shipping control system.
The electronic device may use the extracted result set 56 to control a process, such as a cost estimation, and/or generating invoices. The electronic device may use the extracted result set 56 to control a machine, such as controlling the operation of cranes at the port.
The electronic device may provide (e.g. transmit) the extracted result set 56 and/or the third data (e.g. pattern) to another electronic device. Additionally and/or alternatively, the electronic device may provide (e.g. transmit) the extracted result set 56 and/or the third data (e.g. pattern) to a machine for controlling the machine, such as for controlling the operation of cranes at the port.
In one or more example methods, the extracted result set 56 may be in a third format comprising a text string. In one or more example methods, the text string may be in the standard JavaScript Object Notation, JSON, format.
For example, the recipient machine of the extracted result set 56 and/or the third data (e.g. pattern) can use a JSON formatter and the third data (e.g. pattern) to read out the first extraction data from the extracted result set 56.
For example, the extracted result set may be in the form of e.g.: {A1, A2, B1, B2, C1, C2, D1, D2}. For example, a machine receiving the extracted result set and an associated pattern can use the pattern to read the extracted result set. For example, the machine can identify the first series as A1 from the pattern and the second series as B1. For example, while reading the extracted result set, the machine can classify extraction data in the extracted result set as A1 (first column) until the first occurrence of B1. This can be implemented for example in a loop (for example by using a JSON formatter). The machine can read the extracted result set e.g.:
Column1 - Result Set = {A1, A2}
Column2 - Result Set = {B1, B2}
Column3 - Result Set = {C1, C2}
Column4 - Result Set = {D1, D2}
Fig. 3A illustrates an exemplary table 70 comprising 20 rows and 3 columns. The table 70 provides tabular data 72 on which the disclosed technique is carried out by an example electronic device to generate an extraction result set 76 according to this disclosure.
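The loop-based reading of a flat result set such as {A1, A2, B1, B2, C1, C2, D1, D2}, as described above for the extracted result set 56, can be sketched as follows. This is a minimal sketch, assuming that the result set arrives as a JSON array of strings and that the pattern supplies the first element of each series (A1, B1, C1, D1). The function name `split_into_columns` and this representation of the pattern are hypothetical and are not taken from the disclosure.

```python
import json


def split_into_columns(result_set, series_starts):
    """Group a flat result set into columns.

    Assumption: series_starts lists the first element of each column,
    in column order, as read from the pattern.
    """
    columns = []
    next_start = 0  # index of the series start we are waiting for
    for element in result_set:
        # A new column begins at the first occurrence of the next series start.
        if next_start < len(series_starts) and element == series_starts[next_start]:
            columns.append([])
            next_start += 1
        if not columns:
            raise ValueError("result set does not begin with the first series start")
        columns[-1].append(element)
    return columns


# Hypothetical JSON text standing in for the extracted result set.
received = '["A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2"]'
columns = split_into_columns(json.loads(received), ["A1", "B1", "C1", "D1"])
for number, column in enumerate(columns, start=1):
    print(f"Column{number} - Result Set = {column}")
```

Elements are assigned to the current column until the next series start is seen, which mirrors the "until the first occurrence of B1" rule in the text.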
The table 70 comprises 3 columns, such as a first column, a second column, and a third column. The first column may represent a first parameter. The first column comprises data elements associated with the first parameter. The second column may represent a second parameter. The second column comprises data elements associated with the second parameter. The third column comprises data elements associated with the third parameter.
For example, tabular data 72 may be indicative of freight data, such as freight cost information which may comprise information indicative of a shipping package reference number, information indicative of a BIC number of a container in which the shipping package is loaded, and a freight cost for shipping.
The first data may comprise information indicative of a shipping package reference number and information indicative of a BIC number of a container in which the shipping package is loaded. For example, the first parameter provided in the first column may be a shipping package reference number. The first column may comprise data elements indicative of shipping package reference numbers. For example, the second parameter provided in the second column may be a BIC number of a container. The second column may comprise data elements indicative of BIC numbers of containers. For example, the third parameter provided in the third column may be a freight cost. The third column may comprise data elements indicative of freight costs.
The freight cost information may comprise a plurality of shipping package reference numbers, a plurality of container BIC numbers, and a plurality of freight costs for shipping associated with corresponding shipping packages in containers. Shipping package reference numbers may be placed in a column, such as the first column of the table 70, container BIC numbers may be placed in a column, such as the second column of the table 70, and freight costs for shipping may be placed in a column, such as the third column of table 70.
Fig. 3B illustrates exemplary data, such as third data indicative of a pattern, such as the pattern 74. The pattern comprises 3 parts, such as a first part 74A, a second part 74B, and a third part 74C. The pattern 74 may be obtained by the electronic device. The arrangement of the first part 74A, the second part 74B, and the third part 74C in the pattern may be obtained by the electronic device.
The third data may have a format, such as a third format. The third format may be a text, TXT, format. The third data may comprise a string. The parts of the pattern may be seen as strings. The third data may comprise data indicative of a first part of the pattern, data indicative of a second part of the pattern and optionally data indicative of a third part of the pattern. For example, the third data can include data indicative of the first part 54A and data indicative of the second part 54B of the pattern 54 as illustrated in Figs. 2A-B. For example, the third data can include data indicative of the first part 74A, data indicative of the second part 74B, and data indicative of the third part 74C of the pattern 74 as illustrated in Figs. 3A-B.
The electronic device may obtain, via the third data, the pattern 74 for extracting data from the tabular data while maintaining the association between data elements of a first column with corresponding data elements of the second and third columns.
The electronic device may generate, based on the second data and the pattern (such as the pattern 74), an extraction result set 76 comprising extraction data, such as first extraction data 76A. Generating an extraction result set 76 may comprise extracting the data elements of tabular data 72 which follow the pattern 74 indicated by the third data.
The extracted result set 76 may be in a third format, such as the standard JavaScript Object Notation format. The electronic device may provide the extraction result set 76 to a control system.
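To illustrate how the association between the three columns of the table 70 can be kept in a JSON result set, the following is a minimal sketch. The rows, the regular expressions standing in for the parts 74A, 74B and 74C, and the key names are illustrative assumptions; the disclosure does not specify the syntax of the pattern. The container expression assumes a layout of four letters followed by seven digits for the BIC number.

```python
import json
import re

# Hypothetical stand-ins for the three parts of the pattern (74A, 74B, 74C).
PATTERN_PARTS = [
    ("package_reference", re.compile(r"PKG-\d+")),
    ("container_bic", re.compile(r"[A-Z]{4}\d{7}")),
    ("freight_cost", re.compile(r"\d+(\.\d{2})?")),
]


def extract_rows(rows):
    """Keep rows whose cells match every pattern part, as one record per row."""
    records = []
    for row in rows:
        if len(row) != len(PATTERN_PARTS):
            continue
        if all(regex.fullmatch(cell) for (_, regex), cell in zip(PATTERN_PARTS, row)):
            # One record per row keeps the cells of a row associated.
            records.append({name: cell for (name, _), cell in zip(PATTERN_PARTS, row)})
    return records


# Made-up rows; the first is a header and is skipped because it does not match.
rows = [
    ["Reference", "Container", "Cost"],
    ["PKG-1001", "ABCU1234567", "250.00"],
    ["PKG-1002", "ABCU7654321", "310.50"],
]
result_set = json.dumps(extract_rows(rows))
print(result_set)
```

Emitting one JSON object per table row is one possible design choice; a column-wise layout, as in the example of the extracted result set 56, would serve equally well as long as the recipient knows the pattern.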
The electronic device may use the extracted result set 76 to control a process, such as freight scheduling. The electronic device may use the extracted result set 76 to control a machine, such as controlling the operation of cranes at the port to prioritize the container handling for express shipping.
Fig. 4 shows a flow diagram of an exemplary method 100, performed by an electronic device according to the disclosure, for providing an extraction result set. The electronic device is the electronic device disclosed herein, such as the electronic device 300 of Fig. 5.
The method 100 comprises obtaining S102 first data indicative of a document. In one or more example methods, the first data has a first format. In one or more example methods, the first data comprises tabular data. Example documents include one or more of: shipping order, invoice data, and freight document.
In one or more example methods, the first data may be indicative of a document, such as a first document. In one or more example methods, the first data has a format, such as the first format. The first format may be a portable document format, PDF. The first format may be one or more of: an excel format, an image format, e.g., JPEG, PNG, TIFF, GIF, and/or bitmapped image file format, BMP.
In one or more example methods, the first data may comprise tabular data. For example, tabular data may be seen as data provided in a table. For example, tabular data includes information arranged in rows and columns, each row and/or column representing a data element group, such as one or more data elements related to the same parameter or the same type.
For example, the first document may be a document having a PDF format retrieved and/or received from a sender by the electronic device 300. The document (such as a PDF file) may comprise information indicative of freight data and/or of invoicing data. Freight data may be indicative of freight allocation information which may comprise information indicative of a shipping package reference number and information indicative of a BIC number of a container in which the shipping package is loaded.
The format of the document may represent the first format of the first data. The first data may comprise data indicative of shipping data and/or invoicing data and/or legal data and/or technical data. The first data may be arranged in tabular format.
In one or more example methods, the first data comprises data elements. The data elements may be indicative of freight information, e.g., commodity information, shipping quantity, shipping rates, shipping cost, discounts, and/or total cost of the fulfilled service, etc. The data elements may be indicative of legal information. The data elements may be indicative of billing information. The data elements may be indicative of technical information.
The method 100 comprises converting S104 the first data into second data having a second format different from the first format.
In one or more example methods, the second format may be one of Hyper Text Markup Language, HTML, format, Text, TXT, format, and Document, DOC, format.
In one or more example methods, the first format may be PDF format. In one or more example methods, the second format may be the HTML format. In one or more example methods, the first format and the second format may be similar. In one or more example methods, the second format may be a default format, such as HTML format. In one or more example methods, the first data and the second data may represent similar information.
The method 100 comprises obtaining S106 third data indicative of a pattern. For example, the third data may be obtained via user input and/or an application programming interface.
In one or more example methods, the third data may be indicative of a pattern, such as a first pattern, and optionally a second pattern, and optionally a third pattern. In one or more example methods, the pattern may comprise one or more parts. The one or more parts may be seen as one or more attributes representing a relation between the one or more data elements of the first data.
The method 100 comprises generating S108, based on the second data and the third data, an extraction result set comprising first extraction data. In one or more example methods, the first extraction data has a third format different from the first format and from the second format.
In one or more example methods, the third format comprises a text string, such as one of Text, TXT, format, and Document, DOC, format. In one or more example methods, the third format may include a JavaScript Object Notation format. In one or more example methods, the extraction result set and the first data may represent similar information but where the extraction result set is adapted to provide the information to a control system.
In one or more example methods, converting S104 the first data into second data comprises executing S104A a Robotic Process Automation, RPA. In one or more example methods, the RPA is configured to convert the first data into the second data. In one or more example methods, RPA may be seen as a program that performs the automated steps, e.g. obtaining the first data and converting the first data into the second data. In one or more example methods, the RPA may be configured to convert the first data with the first format into the second data with the second format. In one or more example methods, RPA may use a format converter, e.g., PDF to HTML, to convert the first data with the first format into second data with the second format.
In one or more example methods, the RPA is configured to obtain the third data indicative of the pattern. In one or more example methods, the RPA may be configured to obtain the third data from a user input and/or an application programming interface (API) and/or the memory of the electronic device in which the RPA is executed. In one or more example methods, RPA may be configured to generate the third data dynamically based on the historical data.
In one or more example methods, the method 100 comprises determining S105, based on one or more sample documents, the third data indicative of the pattern. In one or more example methods, the pattern may comprise one or more attributes, as illustrated in Figs. 2B and 3B. The one or more attributes may be seen as providing a relation between one or more data elements of the first data so as to extract robustly e.g. the first extraction data. In one or more example methods, RPA may be configured to obtain the pattern. In one or more example methods, the third data may be generated, by the electronic device, based on the one or more sample documents by identifying one or more patterns in the sample documents.
The one or more sample documents may be seen as templates, e.g., templates related to freight invoices, and documents comprising freight details, freight acknowledgements, etc. In one or more example methods, the one or more sample documents may comprise documents that are already processed. In one or more example methods, the one or more sample documents may be provided as input to the electronic device.
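One simple way to picture determining a pattern from sample documents is to derive a character-class "shape" for each column and to keep that shape only when all sample rows agree. The following is an illustrative sketch under that assumption and is not the method of the disclosure; the helper names `shape` and `infer_pattern` and the sample values are hypothetical.

```python
def shape(value):
    """Map letters to 'A' and digits to '9'; keep other characters as they are."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def infer_pattern(sample_rows):
    """Return one shape per column, or None where the samples disagree."""
    pattern = []
    for column in zip(*sample_rows):
        shapes = {shape(cell) for cell in column}
        pattern.append(shapes.pop() if len(shapes) == 1 else None)
    return pattern


# Made-up rows standing in for tabular data taken from sample documents.
samples = [
    ["PKG-1001", "ABCU1234567", "250.00"],
    ["PKG-1002", "ABCU7654321", "310.50"],
]
print(infer_pattern(samples))
```

A column for which the samples disagree yields `None`, which a caller could treat as "no constraint" for that part of the pattern.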
In one or more example methods, the pattern is a target pattern indicative of a relation between data elements of the first data for extraction. In one or more example methods, the first data may comprise the tabular data. In one or more example methods, the tabular data may be arranged in one or more rows and one or more columns. In one or more example methods, the pattern may be a target pattern having the arrangement associating data elements of a first column with corresponding data elements in a second column in a same row.
In one or more example methods, generating S108, based on the second data and the third data, the extraction result set comprising the first extraction data comprises generating S108A, based on the second data and the pattern indicated in the third data, the first extraction data. For example, when the extraction result set comprises second extraction data, the second extraction data is generated based on the second data and the pattern indicated in the third data. For example, when the extraction result set comprises third extraction data, the third extraction data is generated based on the second data and the pattern indicated in the third data. In one or more example methods, the extraction result set and the first data may represent similar information, however the extraction result set is provided in a format that can be used by the control systems downstream.
In one or more example methods, generating S108, based on the second data and the third data, the extraction result set comprises extracting S108B data elements of the second data that are matching the pattern indicated by the third data. In one or more example methods, generating S108, based on the second data and the third data, the extraction result set comprises extracting S108B data elements of the second data that follow the pattern indicated by the third data.
In one or more example methods, when the arrangement of the one or more strings in the pattern and the arrangement order of rows and/or columns match, then the RPA may execute the extraction of data elements of the tabular data having the second format.
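A sketch of this matching step on second data in HTML format is given below. It assumes, purely for illustration, that the pattern is an ordered list of column header strings and that extraction proceeds only when the header row of the table has the same arrangement. The class `TableRows`, the function `extract_if_arranged`, and the sample HTML are hypothetical; only the `html.parser` module of the Python standard library is used.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_if_arranged(html, pattern):
    """Return the data rows when the header arrangement equals the pattern."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows or parser.rows[0] != pattern:
        return []
    return parser.rows[1:]


# Made-up HTML standing in for the second data.
html = (
    "<table><tr><th>Reference</th><th>Container</th></tr>"
    "<tr><td>PKG-1001</td><td>ABCU1234567</td></tr></table>"
)
print(extract_if_arranged(html, ["Reference", "Container"]))
```

When the pattern lists the headers in a different order than the table, the sketch returns an empty list rather than extracting misaligned data.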
In one or more example methods, the electronic device may be configured to look for a similar matching pattern, such as the target pattern, in the second data to extract the data elements. In one or more example methods, the first format is a Portable Document Format, PDF. In one or more example methods, the second format comprises a Hyper Text Markup Language, HTML, format.
In one or more example methods, the third format comprises a text string. In one or more example methods, the text string may be in the standard JavaScript Object Notation, JSON, format.
In one or more example methods, the method 100 comprises providing S110 the extraction result set to a control system. The control system may be an invoicing control system, and/or a shipping control system. In one or more example methods, providing the extraction result set to a control system may comprise controlling the control system. In one or more example methods, the control system may be a logistics control system.
In one or more example methods, the method 100 comprises controlling S112, based on the extraction result set, a process and/or a machine. The process can be a downstream system, such as a logistics system, and/or a shipping system and/or a billing system.
In one or more example methods, the extraction result set may be fed to the control system by the electronic device to control the process of the control system, such as controlling logistics processes, e.g., updating the priorities of shipment of containers. The electronic device may be seen as a computing device for extraction of tabular data, such as a standalone computing system. The electronic device may be seen as a computing device for extraction of tabular data, such as a client device and/or a server device. For example, the electronic device may be an API configured to obtain from a user a pattern and a document. For example, the electronic device may be an API configured to provide, based on the document and the pattern, the extraction result set to e.g. another device or machine. The API can be hosted in a server device or on a distributed cloud. The electronic device may be a tabular extraction data device.
In one or more example methods, the extraction result set may be fed to the control system by the electronic device to control a machine, such as controlling a machine logistics operation, e.g., turning off a machine when there is less freight to handle by the control system.
Fig. 5 shows a block diagram of an exemplary electronic device 300 according to the disclosure. The electronic device 300 comprises a memory circuitry 301, a processor circuitry 302, and an interface 303. The electronic device 300 is configured to perform any of the methods disclosed in Fig. 4. In other words, the electronic device 300 is configured for providing an extraction result set.
The electronic device may be seen as a computing device for extraction of tabular data, such as a standalone computing system. The electronic device may be seen as a computing device for extraction of tabular data, such as a client device and/or a server device. For example, the electronic device may be an API configured to obtain from a user a pattern and a document. For example, the electronic device may be an API configured to provide, based on the document and the pattern, the extraction result set to e.g. another device or machine. The API can be hosted in a server device or on a distributed cloud. The electronic device may be a tabular extraction data device.
The electronic device 300 is configured to obtain (such as using the processor circuitry 302, and/or via the interface 303) first data indicative of a document. The first data has a first format. In one or more example electronic devices, the first data comprises tabular data.
The electronic device 300 is configured to convert (such as using the processor circuitry 302) the first data into second data having a second format different from the first format.
The electronic device 300 is configured to obtain (such as using the processor circuitry 302, and/or via the interface 303) third data indicative of a pattern.
The electronic device 300 is configured to generate (such as using the processor circuitry 302), based on the second data and the third data, an extraction result set comprising first extraction data. In one or more example electronic devices, the first extraction data has a third format different from the first format and from the second format.
In one or more example electronic devices, the electronic device 300 is configured to execute (such as using the processor circuitry 302) a Robotic Process Automation, RPA. In one or more example electronic devices, the RPA is configured to convert (such as using the processor circuitry 302) the first data into the second data. In one or more example electronic devices, the RPA is configured to obtain (such as using the processor circuitry 302, and/or via the interface 303) the third data indicative of the pattern.
In one or more example electronic devices, the electronic device 300 is configured to determine (such as using the processor circuitry 302), based on one or more sample documents, the third data indicative of the pattern.
In one or more example electronic devices, the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
In one or more example electronic devices, the electronic device 300 is configured to generate (such as using the processor circuitry 302), based on the second data and the pattern indicated in the third data, the first extraction data.
In one or more example electronic devices, the electronic device 300 is configured to generate (such as using the processor circuitry 302), based on the second data and the third data, the extraction result set by extracting data elements of the second data matching the pattern indicated by the third data.
In one or more example electronic devices, the first format is a Portable Document Format, PDF.
In one or more example electronic devices, the second format comprises a Hyper Text Markup Language, HTML, format.
In one or more example electronic devices, the third format comprises a text string.
For example, the electronic device obtains a PDF file and a pattern to be matched. The electronic device may comprise a data extractor that applies the pattern to be matched on the tabular data. The electronic device may comprise an RPA that e.g. converts the PDF to HTML format, which enables the data extractor to identify the tabular data and look for matching patterns provided as an input. This enables the electronic device to deliver just-in-time outcomes (e.g. the extraction result set) rather than relying on the cumbersome process of marking the PDF for the tabular coordinates. The extracted result set is for example in a third format, such as standard JavaScript Object Notation, and can be easily integrated with downstream systems.
In one or more example electronic devices, the electronic device 300 is configured to provide the extraction result set to a control system.
In one or more example electronic devices, the electronic device 300 is configured to control, based on the extraction result set, a process and/or a machine.
The processor circuitry 302 is optionally configured to perform any of the operations disclosed in Fig. 4 (such as any one or more of: S102, S104, S104A, S105, S106, S108, S108A, S108B, S110, S112). The operations of the electronic device 300 may be embodied in the form of executable logic routines (e.g., lines of code, software programs, etc.) that are stored on a non-transitory computer readable medium (e.g., the memory circuitry 301) and are executed by the processor circuitry 302.
Furthermore, the operations of the electronic device 300 may be considered a method that the electronic device 300 is configured to carry out. Also, while the described functions and operations may be implemented in software, such functionality may as well be carried out via dedicated hardware or firmware, or some combination of hardware, firmware and/or software.
The memory circuitry 301 may be one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, a random access memory (RAM), or other suitable device. In a typical arrangement, the memory circuitry 301 may include a non-volatile memory for long term data storage and a volatile memory that functions as system memory for the processor circuitry 302. The memory circuitry 301 may exchange data with the processor circuitry 302 over a data bus. Control lines and an address bus between the memory circuitry 301 and the processor circuitry 302 also may be present (not shown in Fig. 5). The memory circuitry 301 is considered a non-transitory computer readable medium.
The memory circuitry 301 may be configured to store first data, second data, third data, first extraction data, and the extraction result set.
The memory circuitry 301 may be configured to store one or more programs in a part of the memory. The one or more programs may comprise instructions, which when executed by an electronic device cause the electronic device to perform any of the methods disclosed in Fig. 4.
It is noted that descriptions and features of electronic device functionality, such as electronic device configured to, also apply to methods and vice versa. For example, a description of an electronic device configured to determine also applies to a method, e.g., performed by an electronic device, wherein the method comprises determining and vice versa.
Embodiments of methods and products (electronic device) according to the disclosure are set out in the following items:
Item 1. An electronic device comprising memory circuitry, processor circuitry, and an interface, wherein the electronic device is configured to obtain first data indicative of a document, wherein the first data has a first format, wherein the first data comprises tabular data; convert the first data into second data having a second format different from the first format; obtain third data indicative of a pattern; and generate, based on the second data and the third data, an extraction result set comprising first extraction data, wherein the first extraction data has a third format different from the first format and from the second format.
Item 2. The electronic device of item 1, wherein the electronic device is configured to execute a Robotic Process Automation, RPA, wherein the RPA is configured to convert the first data into the second data.
Item 3. The electronic device of item 2, wherein the RPA is configured to obtain the third data indicative of the pattern.
Item 4. The electronic device of any of the previous items, wherein the electronic device is configured to determine, based on one or more sample documents, the third data indicative of the pattern.
Item 5. The electronic device of any of the previous items, wherein the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
Item 6. The electronic device of any of the previous items, wherein the electronic device is configured to generate, based on the second data and the pattern indicated in the third data, the first extraction data.
Item 7. The electronic device of any of the previous items, wherein the electronic device is configured to generate, based on the second data and the third data, the extraction result set by extracting data elements of the second data matching the pattern indicated by the third data.
Item 8. The electronic device of any of the previous items, wherein the first format is a Portable Document Format, PDF.
Item 9. The electronic device of any of the previous items, wherein the second format comprises a Hyper Text Markup Language, HTML, format.
Item 10. The electronic device of any of the previous items, wherein the third format comprises a text string.
Item 11. The electronic device of any of the previous items, wherein the electronic device is configured to provide the extraction result set to a control system.
Item 12. The electronic device of any of the previous items, wherein the electronic device is configured to control, based on the extraction result set, a process and/or a machine.
Item 13. A method, performed by an electronic device, for providing an extraction result set, the method comprising: obtaining (S102) first data indicative of a document, wherein the first data has a first format, wherein the first data comprises tabular data; converting (S104) the first data into second data having a second format different from the first format; obtaining (S106) third data indicative of a pattern; and generating (S108), based on the second data and the third data, an extraction result set comprising first extraction data, wherein the first extraction data has a third format different from the first format and from the second format.
Item 14. The method of item 13, wherein converting (S104) the first data into second data comprises executing (S104A) a Robotic Process Automation, RPA, wherein the RPA is configured to convert the first data into the second data.
Item 15. The method of item 14, wherein the RPA is configured to obtain the third data indicative of the pattern.
Item 16. The method according to any of items 13-15, the method comprising determining (S105), based on one or more sample documents, the third data indicative of the pattern.
Item 17. The method according to any of items 13-16, wherein the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
Item 18. The method according to any of items 13-17, wherein generating (S108), based on the second data and the third data, the extraction result set comprising the first extraction data comprises generating (S108A), based on the second data and the pattern indicated in the third data, the first extraction data.
Item 19. The method according to any of items 13-18, wherein generating (S108), based on the second data and the third data, the extraction result set comprises extracting (S108B) data elements of the second data that are matching the pattern indicated by the third data.
Item 20. The method according to any of items 13-19, wherein the first format is a Portable Document Format, PDF.
Item 21. The method according to any of items 13-20, wherein the second format comprises a Hyper Text Markup Language, HTML, format.
Item 22. The method according to any of items 13-21, wherein the third format comprises a text string.
Item 23. The method according to any of items 13-22, the method comprising providing (S110) the extraction result set to a control system.
Item 24. The method according to any of items 13-23, the method comprising controlling (S112), based on the extraction result set, a process and/or a machine.
Item 25. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device cause the electronic device to perform any of the methods of items 13-24.
The use of the terms “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. does not imply any particular order; the terms are included to identify individual elements. Moreover, the use of the terms “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. does not denote any order or importance, but rather the terms “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. are used to distinguish one element from another. Note that the words “first”, “second”, “third” and “fourth”, “primary”, “secondary”, “tertiary” etc. are used here and elsewhere for labelling purposes only and are not intended to denote any specific spatial or temporal ordering. Furthermore, the labelling of a first element does not imply the presence of a second element and vice versa.
It may be appreciated that Figs. 1-5 comprise some circuitries or operations which are illustrated with a solid line and some circuitries or operations which are illustrated with a dashed line. The circuitries or operations which are comprised in a solid line are circuitries or operations which are comprised in the broadest example embodiment. The circuitries or operations which are comprised in a dashed line are example embodiments which may be comprised in, or a part of, or are further circuitries or operations which may be taken in addition to the circuitries or operations of the solid line example embodiments. It should be appreciated that these operations need not be performed in the order presented.
Furthermore, it should be appreciated that not all of the operations need to be performed. The exemplary operations may be performed in any order and in any combination.
It is to be noted that the word "comprising" does not necessarily exclude the presence of other elements or steps than those listed.
It is to be noted that the words "a" or "an" preceding an element do not exclude the presence of a plurality of such elements.
It should further be noted that any reference signs do not limit the scope of the claims, that the exemplary embodiments may be implemented at least in part by means of both hardware and software, and that several "means", "units" or "devices" may be represented by the same item of hardware.
The various exemplary methods, devices, nodes, and systems described herein are described in the general context of method steps or processes, which may be implemented in one aspect by a computer program product, embodied in a computer-readable medium, including computer-executable instructions, such as program code, executed by computers in networked environments. A computer-readable medium may include removable and non-removable storage devices including, but not limited to, Read Only Memory (ROM), Random Access Memory (RAM), compact discs (CDs), digital versatile discs (DVD), etc. Generally, program circuitries may include routines, programs, objects, components, data structures, etc. that perform specified tasks or implement specific abstract data types. Computer-executable instructions, associated data structures, and program circuitries represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps or processes.
Although features have been shown and described, it will be understood that they are not intended to limit the claimed disclosure, and it will be made obvious to those skilled in the art that various changes and modifications may be made without departing from the scope of the claimed disclosure. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense. The claimed disclosure is intended to cover all alternatives, modifications, and equivalents.

Claims

1. An electronic device comprising memory circuitry, processor circuitry, and an interface, wherein the electronic device is configured to obtain first data indicative of a document, wherein the first data has a first format, wherein the first data comprises tabular data; convert the first data into second data having a second format different from the first format; obtain third data indicative of a pattern; and generate, based on the second data and the third data, an extraction result set comprising first extraction data, wherein the first extraction data has a third format different from the first format and from the second format.
2. The electronic device of claim 1, wherein the electronic device is configured to execute a Robotic Process Automation, RPA, wherein the RPA is configured to convert the first data into the second data.
3. The electronic device of claim 2, wherein the RPA is configured to obtain the third data indicative of the pattern.
4. The electronic device of any of the previous claims, wherein the electronic device is configured to determine, based on one or more sample documents, the third data indicative of the pattern.
5. The electronic device of any of the previous claims, wherein the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
6. The electronic device of any of the previous claims, wherein the electronic device is configured to generate, based on the second data and the pattern indicated in the third data, the first extraction data.
7. The electronic device of any of the previous claims, wherein the electronic device is configured to generate, based on the second data and the third data, the extraction result set by extracting data elements of the second data matching the pattern indicated by the third data.
8. The electronic device of any of the previous claims, wherein the first format is a Portable Document Format, PDF, and/or wherein the second format comprises a Hyper Text Markup Language, HTML, format.
9. The electronic device of any of the previous claims, wherein the third format comprises a text string.
10. The electronic device of any of the previous claims, wherein the electronic device is configured to provide the extraction result set to a control system and/or wherein the electronic device is configured to control, based on the extraction result set, a process and/or a machine.
11. A method, performed by an electronic device, for providing an extraction result set, the method comprising: obtaining (S102) first data indicative of a document, wherein the first data has a first format, wherein the first data comprises tabular data; converting (S104) the first data into second data having a second format different from the first format; obtaining (S106) third data indicative of a pattern; and generating (S108), based on the second data and the third data, the extraction result set comprising first extraction data, wherein the first extraction data has a third format different from the first format and from the second format.
12. The method of claim 11, wherein converting (S104) the first data into second data comprises executing (S104A) a Robotic Process Automation, RPA, wherein the RPA is configured to convert the first data into the second data.
13. The method of claim 12, wherein the RPA is configured to obtain the third data indicative of the pattern.
14. The method according to any of claims 11-13, the method comprising determining (S105), based on one or more sample documents, the third data indicative of the pattern.
15. The method according to any of claims 11-14, wherein the pattern is a target pattern indicative of a relation between data elements of the first data for extraction.
16. The method according to any of claims 11-15, wherein generating (S108), based on the second data and the third data, the extraction result set comprising the first extraction data comprises generating (S108A), based on the second data and the pattern indicated in the third data, the first extraction data.
17. The method according to any of claims 11-16, wherein generating (S108), based on the second data and the third data, the extraction result set comprises extracting (S108B) data elements of the second data that are matching the pattern indicated by the third data.
18. The method according to any of claims 11-17, wherein the first format is a Portable Document Format, PDF, and/or wherein the second format comprises a Hyper Text Markup Language, HTML, format.
19. The method according to any of claims 11-18, wherein the third format comprises a text string.
20. The method according to any of claims 11-19, the method comprising: providing (S110) the extraction result set to a control system and/or controlling (S112), based on the extraction result set, a process and/or a machine.
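The generating step (S108) of claims 11 and 16-19 can be illustrated with a minimal sketch. It is not the applicant's implementation. It assumes that the conversion step (S104) has already produced the second data as an HTML string, that the third data is an ordinary regular expression, and that the third format is a plain text string per matching table row. The names `TableCellParser`, `extract`, `SAMPLE_HTML` and `CONTAINER_PATTERN`, and the sample table contents, are hypothetical.

```python
import re
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between rows) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(second_data: str, pattern: str) -> list[str]:
    """Return one text string per table row that has a cell matching the pattern."""
    parser = TableCellParser()
    parser.feed(second_data)
    parser.close()
    compiled = re.compile(pattern)
    return [
        " | ".join(row)
        for row in parser.rows
        if any(compiled.fullmatch(cell) for cell in row)
    ]


# Hypothetical second data (HTML) and third data (pattern), for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Container</th><th>Port</th><th>Weight (kg)</th></tr>
  <tr><td>MSKU1234567</td><td>Rotterdam</td><td>12000</td></tr>
  <tr><td>n/a</td><td>Aarhus</td><td>800</td></tr>
</table>
"""
CONTAINER_PATTERN = r"[A-Z]{4}\d{7}"

if __name__ == "__main__":
    for line in extract(SAMPLE_HTML, CONTAINER_PATTERN):
        print(line)
```

In this sketch a row is extracted when any of its cells fully matches the pattern. A pattern expressing a relation between several data elements, as in claims 5 and 15, would need a richer representation than a single regular expression.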
PCT/EP2023/051825 2022-01-27 2023-01-25 An electronic device and a method for tabular data extraction WO2023144218A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DKPA202270035 2022-01-27

Publications (1)

Publication Number Publication Date
WO2023144218A1 (en) 2023-08-03

Family

ID=85108860

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/051825 WO2023144218A1 (en) 2022-01-27 2023-01-25 An electronic device and a method for tabular data extraction

Country Status (1)

Country Link
WO (1) WO2023144218A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160055376A1 (en) * 2014-06-21 2016-02-25 iQG DBA iQGATEWAY LLC Method and system for identification and extraction of data from structured documents
US10740603B2 (en) * 2017-03-22 2020-08-11 Drilling Info, Inc. Extracting data from electronic documents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAJROLKAR ASMITA ET AL: "Customer Order Processing using Robotic Process Automation", 2021 INTERNATIONAL CONFERENCE ON COMMUNICATION INFORMATION AND COMPUTING TECHNOLOGY (ICCICT), IEEE, 25 June 2021 (2021-06-25), pages 1 - 4, XP033959071, DOI: 10.1109/ICCICT50803.2021.9510109 *

Similar Documents

Publication Publication Date Title
US11301484B2 (en) Systems and methods for type coercion
US10366123B1 (en) Template-free extraction of data from documents
JP5385349B2 (en) Receipt definition data creation device and program thereof
CN113269504B (en) Warehouse goods storage method and computer equipment
CN104462179B (en) Method for processing big data, apparatus for executing the same and storage medium storing the same
US20140169665A1 (en) Automated Processing of Documents
US20130063769A1 (en) Information management apparatus and method, information management system, and non-transitory computer readable medium
WO2023144218A1 (en) An electronic device and a method for tabular data extraction
JP6644369B1 (en) Information processing system, information processing method and information processing program
CN113869014A (en) Extraction method and device of table data, storage medium and electronic equipment
CN117371401A (en) Data standardization processing method based on large language model
CN111047261A (en) Warehouse logistics order identification method and system
JP6480376B2 (en) Industry application standard data processing program
JP6445645B1 (en) Form information recognition apparatus and form information recognition method
US11282025B1 (en) Concatenated shipping documentation processing spawning intelligent generation subprocesses
US20170169518A1 (en) System and method for automatically tagging electronic documents
US10147132B2 (en) System and method for selection of two parameters via UI element
WO2023099313A1 (en) An electronic device and a related method for controlling a legal document
WO2023099317A1 (en) An electronic device and a method for classifying legal documents
CN111881795A (en) Freight note number identification method and device
WO2022230180A1 (en) Information processing device, information processing method, and program
CN109118159A (en) A kind of self-verifying method and device of clearance data
US20240143919A1 (en) Systems and methods for extracting data from documents
JP2011145914A (en) Freight tracking information correction method and freight tracking system
JP6123435B2 (en) Tally file creation system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23702108

Country of ref document: EP

Kind code of ref document: A1