US20230005075A1 - Ai-augmented auditing platform including techniques for automated assessment of vouching evidence - Google Patents
- Publication number
- US20230005075A1 (application US 17/854,329)
- Authority
- US
- United States
- Prior art keywords
- data
- erp
- vouching
- information
- item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/045—Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/048—Fuzzy inferencing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
Definitions
- This relates generally to automated data processing and validation of data, and more specifically to AI-augmented auditing platforms including techniques for assessment of vouching evidence.
- Known document-understanding techniques are sensitive to the structure of the documents that are ingested and analyzed, and may therefore fail to correctly recognize and identify certain entities referenced in documents, due for example to a misinterpretation of the structure or layout of one or more ingested documents. Accordingly, there is a need for improved document-understanding (e.g., document ingestion and analysis) techniques that are more robust to various document structures and layouts and that provide higher accuracy for entity recognition in documents.
- the document-understanding techniques disclosed herein may leverage a priori knowledge (e.g., information available from a data source separate from the document(s) being assessed for sufficiency for vouching purposes) of one or more entities in extracting and/or analyzing information from one or more documents.
- the document-understanding techniques may analyze the spatial configuration of words, paragraphs, or other content in a document in extracting and/or analyzing information from one or more documents.
- a system is configured to vouch payment data against evidence data. More specifically, a system may be configured to provide a framework that vouches ERP payment activities against physical bank statements.
- the system may include a pipeline that performs information extraction and characteristic extraction from bank statements, and the system may leverage one or more advanced data structures and matching algorithms to perform one-to-many matching between ERP data and bank statement data.
- the payment vouching systems provided herein may thus automate the process of finding material evidence such as remittance advice or bank statements to corroborate ERP payment entries.
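As a rough illustration of the one-to-many matching described above, the following Python sketch corroborates each ERP payment entry with one or more bank statement lines whose amounts accumulate to the entry amount. The record fields (`vendor`, `counterparty`, `amount`) and the greedy accumulation strategy are illustrative assumptions, not the patented algorithm; a production pipeline would also use dates, references, and fuzzier keys.

```python
from collections import defaultdict

def match_erp_to_statement(erp_entries, statement_lines, tolerance=0.01):
    """One-to-many matching: an ERP payment entry may be vouched by
    several bank statement lines summing to the entry amount."""
    # Index statement lines by a (simplistically normalized) counterparty.
    by_party = defaultdict(list)
    for line in statement_lines:
        by_party[line["counterparty"].lower()].append(line)

    results = {}
    for entry in erp_entries:
        candidates = by_party.get(entry["vendor"].lower(), [])
        total, used = 0.0, []
        for line in candidates:
            if total >= entry["amount"] - tolerance:
                break
            total += line["amount"]
            used.append(line["id"])
        matched = abs(total - entry["amount"]) <= tolerance
        results[entry["id"]] = {"matched": matched, "evidence": used}
    return results
```

A single ERP entry of 150.00 would here be vouched by two statement lines of 100.00 and 50.00 from the same counterparty.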
- a first system is provided, the first system being for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the first system comprising one or more processors configured to cause the first system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; and apply one or more models to the hypothesis data and to the extracted ERP information in order to generate output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- extracting the instance of ERP information comprises generating first data representing information content of the instance of ERP information and second data representing a document location for the instance of ERP information
- the ERP information comprises one or more of: a purchase order number, a customer name, a date, a delivery term, a shipping term, a unit price, and a quantity.
- applying the one or more models to generate output data is based on preexisting information regarding spatial relationships amongst instances of ERP information in documents.
- the preexisting information comprises a graph representing spatial relationships amongst instances of ERP information in documents.
- the one or more processors are configured to cause the system to augment the hypothesis data based on one or more models representing contextual data.
- the contextual data comprises information regarding one or more synonyms for the information content of the instance of ERP information.
- the instance of ERP information comprises a single word in the document.
- the instance of ERP information comprises a plurality of words in the document.
- the one or more processors are configured to determine whether the ERP information vouches for the ERP item.
- determining whether the ERP information vouches for the ERP item comprises generating and evaluating a similarity score representing a comparison of the ERP information and the ERP item.
- the similarity score is generated by comparing an entity graph associated with the ERP information to an entity graph associated with the ERP item.
- extracting the ERP information from the document comprises applying a fingerprinting operation to determine, based on the received data representing an ERP item, a characteristic of a data extraction operation to be applied to the electronic document.
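The fingerprinting operation is described only at a high level; one plausible reading is hashing coarse layout characteristics of a document to select a pre-configured extraction strategy. The feature names and template registry in this sketch are hypothetical.

```python
import hashlib

def layout_fingerprint(document_meta):
    """Hash coarse layout characteristics (sorted for stability) into a
    short, stable fingerprint for extraction-template lookup."""
    features = "|".join(f"{k}={document_meta[k]}" for k in sorted(document_meta))
    return hashlib.sha256(features.encode()).hexdigest()[:12]

def select_extractor(document_meta, registry, default="generic"):
    """Return the extraction strategy registered for this fingerprint,
    falling back to a generic extractor for unseen layouts."""
    return registry.get(layout_fingerprint(document_meta), default)
```

Documents whose layouts have been seen before would thus be routed to a tuned extractor, while unfamiliar layouts fall through to a generic one.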
- a first non-transitory computer-readable storage medium storing instructions for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; and apply one or more models to the hypothesis data and to the extracted ERP information in order to generate output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- a first method is provided, the first method being for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, wherein the first method is performed by a system comprising one or more processors, the first method comprising: receiving data representing an ERP item; generating hypothesis data based on the received data representing an ERP item; receiving an electronic document; extracting ERP information from the document; and applying one or more models to the hypothesis data and to the extracted ERP information in order to generate output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- a second system is provided, the second system being for verifying an assertion against a source document, the second system comprising one or more processors configured to cause the second system to: receive first data indicating an unverified assertion; receive second data comprising a plurality of source documents; apply one or more extraction models to extract a set of key data from the plurality of source documents; and apply one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
- the one or more extraction models comprise one or more machine learning models.
- the one or more matching models comprise one or more approximation models.
- the one or more matching models are configured to perform one-to-many matching between the first data and the set of key data.
- the one or more processors are configured to cause the system to modify one or more of the extraction models without modification of one or more of the matching models.
- the one or more processors are configured to cause the system to modify one or more of the matching models without modification of one or more of the extraction models.
- the unverified assertion comprises an ERP payment entry.
- the plurality of source documents comprises a bank statement.
- applying one or more matching models comprises generating a match score and generating a confidence score.
- applying one or more matching models comprises: applying a first matching model; if a match is indicated by the first matching model, generating a match score and a confidence score based on the first matching model; if a match is not indicated by the first matching model: applying a second matching model; if a match is indicated by the second matching model, generating a match score and a confidence score based on the second matching model; and if a match is not indicated by the second matching model, generating a match score of 0.
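The two-stage matching logic of this claim can be sketched directly. The model signature assumed here (a callable returning a matched flag, a match score, and a confidence score) is an illustration, not the claimed interface.

```python
def cascade_match(first_model, second_model, erp_item, evidence):
    """Try the first matching model; if it finds no match, fall back to
    the second; if neither matches, the match score is 0."""
    matched, score, conf = first_model(erp_item, evidence)
    if matched:
        return score, conf
    matched, score, conf = second_model(erp_item, evidence)
    if matched:
        return score, conf
    return 0.0, 0.0
```

For example, an exact-string matcher could serve as the first model and a case-insensitive (or otherwise fuzzier) matcher as the fallback.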
- a second non-transitory computer-readable storage medium storing instructions for verifying an assertion against a source document, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive first data indicating an unverified assertion; receive second data comprising a plurality of source documents; apply one or more extraction models to extract a set of key data from the plurality of source documents; and apply one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
- a second method is provided, the second method being for verifying an assertion against a source document, wherein the second method is executed by a system comprising one or more processors, the second method comprising: receiving first data indicating an unverified assertion; receiving second data comprising a plurality of source documents; applying one or more extraction models to extract a set of key data from the plurality of source documents; and applying one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
- a third system for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the third system comprising one or more processors configured to cause the third system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; apply a first set of one or more models to the hypothesis data and to the extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; apply a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; and generate combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- a third non-transitory computer-readable storage medium storing instructions for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; apply a first set of one or more models to the hypothesis data and to the extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; apply a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; and generate combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- a third method for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, wherein the third method is performed by a system comprising one or more processors, the third method comprising: receiving data representing an ERP item; generating hypothesis data based on the received data representing an ERP item; receiving an electronic document; extracting ERP information from the document; applying a first set of one or more models to the hypothesis data and to the extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; applying a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; and generating combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- any one or more of the features, characteristics, or aspects of any one or more of the above systems, methods, or non-transitory computer-readable storage media may be combined, in whole or in part, with one another and/or with any one or more of the features, characteristics, or aspects (in whole or in part) of any other embodiment or disclosure herein.
- FIG. 1 shows two examples of extracting entities from documents, in accordance with some embodiments.
- FIG. 2 shows a system for data processing for an AI-augmented auditing platform, in accordance with some embodiments.
- FIGS. 3A-3B depict a diagram of how a fingerprinting algorithm may be used as part of a process to render a decision about whether a purchase order is vouched, in accordance with some embodiments.
- FIG. 4 shows a diagram of a fingerprinting algorithm, document-understanding, and vouching algorithm, in accordance with some embodiments.
- FIGS. 5A-5B show a diagram of a payment vouching method, in accordance with some embodiments.
- FIG. 6 illustrates an example of a computer, according to some embodiments.
- a document-understanding system is configured to perform automated hypothesis generation based on one or more data sets.
- the data sets on which hypothesis generation is based may include one or more sets of ingested documents, for example documents ingested in accordance with one or more document-understanding techniques described herein.
- the data sets on which hypothesis generation is based may include enterprise resource planning (ERP) data.
- ERP enterprise resource planning
- the data (e.g., ERP data) may indicate one or more entities, for example a PO #, a customer name, a date, a delivery term, a shipping term, a unit price, and/or a quantity.
- the system may be configured to apply a priori knowledge (e.g., information available from a data source separate from the document(s) being assessed for sufficiency for vouching purposes) regarding one or more of the entities indicated in the data.
- the hypothesis generation techniques disclosed herein may enable more accurate vouching of ERP data with evidence from unstructured documents and other evidence sources.
- the system may be configured to analyze spatial relationships and constellations among entities indicated in the data. For example, the position at which entities are indicated in a document (e.g., a unit price and a quantity indicated on the same line of a document versus on different lines of a document) may be analyzed.
- the system may be configured to generate, store, and/or analyze a data structure, such as a graph data structure, that represents spatial relationships amongst a plurality of entities in one or more documents.
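A minimal sketch of such a graph data structure, assuming entities carry page coordinates from OCR: nodes are entity labels, and edges record pairwise distance and same-line adjacency (e.g., a unit price next to a quantity). The coordinate scheme and the distance and line thresholds are illustrative assumptions.

```python
import math

def build_spatial_graph(entities, max_dist=200.0, line_tol=5.0):
    """Build edges between nearby entities; each entity is (label, x, y).
    Edges carry the Euclidean distance and a same-line flag."""
    edges = {}
    for i, (la, xa, ya) in enumerate(entities):
        for lb, xb, yb in entities[i + 1:]:
            d = math.hypot(xa - xb, ya - yb)
            if d <= max_dist:  # only keep spatially proximate pairs
                edges[(la, lb)] = {"dist": round(d, 1),
                                   "same_line": abs(ya - yb) < line_tol}
    return edges
```

A "unit price and quantity on the same line" constellation would then appear as an edge with `same_line` set, while distant entities produce no edge at all.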
- the system may be configured to apply one or more AI models to comprehend documents to identify and assess evidence to vouch for the validity of financial information reported in ERPs.
- the system may use the ERP data to weakly label and provide hypotheses to documents that are candidates for possible evidence.
- the system may further apply one or more name entity extraction models to provide additional bias-free information to overlay on top of these documents.
- the combination of these features may enable the system to validate whether candidate evidence is indeed vouching evidence (e.g., whether it meets vouching criteria) for a given ERP entry, including by providing a quantification/score of the system's confidence in the conclusion that the candidate evidence does or does not constitute vouching evidence.
- the system may be configured to receive ERP data and to apply one or more data processing operations (e.g., AI models) to the received data in order to generate hypothesis data.
- Any data processing operation referenced herein may include application of one or more models trained by machine-learning.
- the hypothesis data may consist of one or more content entities that the system hypothesizes to be indicated in the received data, for example: PO #, customer name, date, delivery term, shipping term, unit price, and/or quantity.
- the system may assess one or more of the following in generating hypothesis data and/or in assessing hypothesis data once it is generated: a priori knowledge (e.g., knowledge from one or more data sources aside from the ERP data source); spatial relationships amongst words, paragraphs, or other indications of entities within the ERP data (e.g., spatial relationships of words within a document), and/or constellations amongst entities (e.g., unit price & quantity appearing on the same line).
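The hypothesis-generation step described above might be sketched as mapping ERP record fields to the content entities expected to appear in supporting documents. The ERP field names below are hypothetical, since real ERP schemas vary.

```python
def generate_hypotheses(erp_record):
    """Turn an ERP record into entity hypotheses (PO #, customer name,
    date, unit price, quantity, ...) expected in candidate evidence."""
    field_to_entity = {
        "po_number": "PO #",
        "customer": "customer name",
        "doc_date": "date",
        "unit_price": "unit price",
        "quantity": "quantity",
    }
    return [{"entity": entity, "expected_value": erp_record[field]}
            for field, entity in field_to_entity.items()
            if field in erp_record]  # only hypothesize fields present
```

Each hypothesis pairs an entity type with the value the ERP data leads the system to expect, ready for testing against extracted document content.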
- the system may apply one or more data processing operations (e.g., AI models) in order to augment one or more of the generated hypotheses.
- the system may augment (or otherwise modify) a generated hypothesis on the basis of context data available to the system.
- context data may include synonym data, such that the system may augment a hypothesis in accordance with synonym data. For example, hypothesis data that includes the word “IBM” may be augmented to additionally include the term “International Business Machines”.
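The synonym-based augmentation can be illustrated with a small sketch using the "IBM" example from the text; the synonym table here stands in for the contextual data the system would supply.

```python
def augment_hypothesis(values, synonyms):
    """Expand a hypothesis value set with known synonyms so that evidence
    using either form can match."""
    augmented = set(values)
    for v in values:
        augmented.update(synonyms.get(v, []))
    return augmented
```

After augmentation, a document mentioning "International Business Machines" can still satisfy a hypothesis originally stated as "IBM".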
- the system may be configured to perform spatial entity extraction.
- spatial entity extraction includes extracting entities (at the word-level and at the multi-word level) from a document to generate information regarding (a) the entity content/identity and (b) information regarding a spatial location of the entity (e.g., an absolute spatial location within a document and/or a spatial location/proximity/alignment/orientation with respect to one or more other entities within the document).
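A minimal sketch of the spatial entity extraction described above, assuming word-level tokens with page coordinates (typical OCR output): each output record pairs (a) the entity content with (b) its spatial location, plus a simple reading-order index derived from page, then vertical, then horizontal position.

```python
def extract_spatial_entities(tokens):
    """tokens: iterable of (word, x, y, page). Returns records holding
    content, absolute location, and a reading-order index."""
    ents = []
    # Sort into reading order: page, then top-to-bottom, then left-to-right.
    for word, x, y, page in sorted(tokens, key=lambda t: (t[3], t[2], t[1])):
        ents.append({"content": word,
                     "location": {"page": page, "x": x, "y": y},
                     "order": len(ents)})
    return ents
```

Multi-word entities could then be formed by merging adjacent records, and relative features (proximity, alignment) computed from the stored coordinates.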
- the system may be configured to perform one or more hypothesis testing operations in order to evaluate the likelihood of a match, for example based on calculating a similarity score.
- the likelihood of a match may be evaluated between ERP data on one hand and a plurality of documents on the other hand.
- the likelihood of a match may be based on calculating a similarity score between the entity (or entities) representing the hypothesis and the entity (or entity graph) representing components within the documents.
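As a deliberately simplified stand-in for the graph comparison described here, a similarity score can be computed as the fraction of hypothesized entity/value pairs found among the entities extracted from a document; the record shape is an assumption for illustration.

```python
def similarity_score(hypothesis, extracted):
    """Fraction of hypothesized (entity, value) pairs present in the
    extracted document entities; 0.0 for an empty hypothesis."""
    if not hypothesis:
        return 0.0
    found = {(e["entity"], e["value"]) for e in extracted}
    hits = sum(1 for h in hypothesis if (h["entity"], h["value"]) in found)
    return hits / len(hypothesis)
```

A threshold on this score (possibly adjusted by a separate confidence score) would then decide whether the candidate document is treated as vouching evidence.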
- the systems and methods provided herein may provide improvements over existing approaches, including by providing the ability to use contextual information guided by an audit process to aid in comprehension, to use contextual information to form hypotheses on the expected information to be extracted from documents, to allow the testing of these hypotheses to guide document comprehension, and/or to apply methods to mitigate and account for the possibility of biases introduced by contextual information (e.g., by adjusting a confidence score accordingly).
- FIG. 1 depicts two examples of extracting entities from documents, in accordance with some embodiments.
- FIG. 2 depicts a system 200 for data processing for an AI-augmented auditing platform, in accordance with some embodiments.
- the components labeled “hypothesis generation” and “active vouching” may, in some embodiments, include any one or more of the systems (and/or may apply any one or more of the methods) described herein.
- each of the schematic blocks shown in FIG. 2 may represent a distinct module (e.g., each distinct module comprising one or more distinct computer systems including storage devices and/or one or more physical and/or virtual processors) configured to perform associated functionality.
- any one or more of the schematic blocks shown in FIG. 2 may represent functionalities performed by a same module (e.g., by a same computer system).
- system 200 may be configured to perform any one or more processes for active vouching; passive vouching and tracing; and/or data integrity integration, for example as described herein.
- system 200 may include documents source 202 , which may include any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like.
- Documents source 202 may be communicatively coupled to one or more other components of system 200 and configured to provide a plurality of documents to system 200 , such that the documents can be assessed to determine whether one or more data integrity criteria are met, e.g., whether the documents sufficiently vouch for one or more representations made by a set of ERP data.
- system 200 may receive documents from documents source 202 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the documents being manually sent.
- Documents received from documents source 202 may be provided in any suitable electronic data format, for example as structured, unstructured, and/or semi-structured data.
- the documents may include, for example, spreadsheets, word processing documents, and/or PDFs.
- System 200 may include OCR module 204 , which may include any one or more processors configured to perform OCR analysis and/or any other text or character recognition/extraction based on documents received from documents source 202 .
- OCR module 204 may generate data representing characters recognized in the received documents.
- System 200 may include document classification module 206 , which may include one or more processors configured to perform document classification of documents received from documents source 202 and/or from OCR module 204 .
- Document classification module 206 may receive document data from documents source 202 and/or may receive data representing characters in documents from OCR module 204 , and may apply one or more classification algorithms to the received data to apply one or more classifications to the documents received from documents source 202 .
- Data representing the determined classifications may be stored as metadata in association with the documents themselves and/or may be used to store the documents in a manner according to their determined respective classification(s).
- System 200 may include ERP data source 208 , which may include any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like.
- ERP data source 208 may be communicatively coupled to one or more other components of system 200 and configured to provide ERP data to system 200 , such that the ERP data can be assessed to determine whether one or more data integrity criteria are met, e.g., whether the ERP data is sufficiently vouched by one or more documents (e.g., the documents provided by documents source 202 ).
- one or more components of system 200 may receive ERP data from ERP data source 208 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent.
- ERP data received from ERP data source 208 may be provided in any suitable electronic data format.
- ERP data may be provided in a tabular data format, including a data model that defines the structure of the data.
- System 200 may include knowledge substrate 210 , which may include any one or more data sources such as master data source 210 a , ontology data source 210 b , and exogenous knowledge data source 210 c .
- the data sources included in knowledge substrate 210 may be provided as part of a single computer system, multiple computer systems, a single network, or multiple networks.
- the data sources included in knowledge substrate 210 may be configured to provide data to one or more components of system 200 (e.g., hypothesis generation module 212 , normalization and contextualization module 222 , and/or passive vouching and tracing module 224 ).
- one or more components of system 200 may receive data from knowledge substrate 210 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent.
- Data received from knowledge substrate 210 may be provided in any suitable data format.
- interaction with knowledge substrate 210 may be query based.
- Interaction with knowledge substrate 210 may be in one or more of the following forms: question answering, information retrieval, query into knowledge graph engine, and/or inferencing engine (e.g., against inferencing rules).
- Knowledge substrate 210 may include data such as ontology/taxonomy data, knowledge graph data, and/or inferencing rules data.
- Master data received from master data source 210 a may include, for example, master customer data, master vendor data, and/or master product data.
- Ontology data received from ontology data source 210 b may include, for example, IncoTerms data for international commercial terms that define the cost, liability, and/or insurance among the sell side, buy side, and shipper for shipping a product.
- Exogenous knowledge data received from exogenous knowledge data source 210 c may include, for example, knowledge external to a specific audit client. This knowledge could be related to the industry of the client, the geographic area of the client, and/or the entire economy.
- System 200 may include hypothesis generation module 212 , which may include one or more processors configured to generate hypothesis data.
- Hypothesis generation module 212 may receive input data from any one or more of: (a) document classification module 206 , (b) ERP data source 208 , and (c) knowledge substrate 210 .
- Hypothesis generation module 212 may apply one or more hypothesis generation algorithms to some or all of the received data and may thereby generate hypothesis data.
- Hypothesis generation may be based on any one of, and/or a combination of: (1) ERP data, (2) document type data, (3) data regarding prior understanding of one or more documents.
- a generated hypothesis may represent where and what is expected to be found in documents data, based on previous exposure to similar documents.
- Document classification data (e.g., from document classification module 206 ), for one document and/or for a group of documents, may be used to determine, augment, and/or weight hypothesis data generated by hypothesis generation module 212 .
- document content itself (e.g., document data received from documents source 202 ) may be used, in addition to document classification data (e.g., as generated by document classification module 206 ), for hypothesis generation.
- the hypothesis data generated by hypothesis generation module 212 may be provided in any suitable data format.
- hypothesis data in the context of document understanding may be represented as sets of tuples (e.g., representing entity, location, and value), each of which represents what is expected to be found in the documents data.
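As an illustrative sketch (field names and expected locations are assumptions), such hypothesis tuples for a purchase order might be generated from an ERP record as follows:

```python
def generate_hypotheses(erp_record: dict) -> set:
    """Map an ERP record to (entity, expected_location, expected_value)
    tuples describing what should be found in the supporting document."""
    return {
        ("po_number", "header", erp_record["po_number"]),
        ("customer_name", "header", erp_record["customer"]),
        ("total_amount", "footer", f"{erp_record['total']:.2f}"),
    }

hypotheses = generate_hypotheses(
    {"po_number": "PBC2145XC01", "customer": "Acme Corp", "total": 1234.5})
assert ("total_amount", "footer", "1234.50") in hypotheses
```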
- system 200 may provide for an “active vouching” pipeline and for a “passive vouching” pipeline that may each be applied, using some or all of the same underlying data, in parallel to one another.
- the two pipelines may be applied at the same time or one after the other.
- the active vouching pipeline is described with respect to element 214
- the passive vouching pipeline is described with respect to elements 216 - 224 .
- System 200 may include active vouching module 214 , which may include one or more processors configured to apply any one or more active vouching analysis operations.
- Active vouching module 214 may receive input data from one or more of: OCR module 204 , document classification module 206 , and hypothesis generation module 212 .
- Active vouching module 214 may apply one or more active vouching analysis operations to some or all of the received data and may thereby generate active vouching output data.
- an active vouching analysis operation may include a “fingerprinting” analysis operation.
- active vouching or fingerprinting may include data processing operations configured to determine whether one or more tuples (e.g., representing entity, location, and value) extracted from documents data can match hypothesis data. Some embodiments of a fingerprinting analysis operation are described below with respect to FIGS. 3 and 4 .
- the active vouching output data generated by active vouching module 214 may be provided in any suitable data format.
- the active vouching output may include data indicating one or more of the following: a confidence score indicating a confidence level as to whether there is a match (e.g., whether vouching criteria are met, whether there is a match for a hypothesis); a binary indication as to whether there is any match for a hypothesis, which may feedback iteratively into the fingerprinting process; and/or a location within a document corresponding to a hypothesis for which a confidence and/or a binary indication are generated.
- the active vouching output may include four values: an entity name, an entity value, a location (indicating an exact or relative location of the entity), and a confidence value indicating a confidence value of the determined match.
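The four-value output could be represented, for example, as a simple data class (names and the coordinate format are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class ActiveVouchingOutput:
    entity_name: str   # e.g. "po_number"
    entity_value: str  # value matched in the document
    location: tuple    # exact or relative location of the entity
    confidence: float  # confidence in the determined match, in [0, 1]

match = ActiveVouchingOutput("po_number", "PBC2145XC01", (0, 120, 40, 132), 0.97)
assert 0.0 <= match.confidence <= 1.0
```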
- the active vouching operations performed by module 214 may leverage contextual knowledge to inform what information is sought in an underlying document. In some embodiments, the active vouching operations performed by module 214 may be considered “context aware” because they are able to draw on contextual information that is injected via hypothesis generation module 212 drawing on data received from knowledge substrate 210 .
- the active vouching operations may include one or more deductive reasoning operations, which may include application of one or more rules-based approaches to evaluate document information (e.g., information received from OCR module 204 ). For example, a rules based approach may be used to determine that, if a document is a certain document type, then the document will be known to include certain associated data fields.
- the deductive reasoning operation(s) may be used to calculate and/or adjust an overall weighting.
- weighting may be used in integrating results from multiple approaches (e.g., an inductive approach and a deductive approach). A weighting may be trained using various machine learning methods.
- the active vouching operations may include one or more inductive reasoning operations that may be based on a previous calculation or determination, historical information, or one or more additional insights.
- inductive reasoning operations may be based on learning from previous instances of similar data (e.g., sample documents) to determine what may be expected from future data.
- active vouching module 214 may apply context awareness, deductive reasoning, and inductive reasoning together for hypothesis testing.
- system 200 may include three parallel pipelines within the passive vouching pipeline, as represented by template-based pipeline 216 , templateless pipeline 218 , and specialized pipeline 220 .
- Each of pipelines 216 - 220 may comprise one or more processors configured to receive input data from OCR module 204 and/or from document classification module 206 and to process the received input data.
- Each of the pipelines 216 - 220 may apply respective data analysis operations to the received input data and may generate respective output data.
- Template-based pipeline 216 may be configured to apply any one or more template-based analysis operations to the received document data and/or document classification data and to generate output data representing document contents, such as one or more tuples representing entity, location, and value for content extracted from the document.
- Template-based pipeline 216 may be configured to apply one or more document understanding models that are trained for a specific known format. Abbyy Flexicapture is an example of such a template-based tool.
- Templateless pipeline 218 may be configured to apply any one or more analysis operations to the received document data and/or document classification data and to generate output data representing document contents, such as one or more tuples representing entity, location, and value for content extracted from the document.
- Templateless pipeline 218 may be configured to operate without any assumption that documents being analyzed have a presumed "template" for document understanding.
- a templateless approach may be less accurate than a template-based tool, and may require more training against a larger training set as compared to a template-based tool.
- Specialized pipeline 220 may be configured to apply any one or more analysis operations to the received document data and/or document classification data and to generate output data representing document contents.
- specialized pipeline 220 may be configured to apply a signature analysis.
- signature analysis may include signature detection, for example using a machine-learning algorithm configured to determine whether or not a signature is present.
- signature analysis may include signature matching, for example using one or more data processing operations to determine a person whose signature matches a detected signature (for example by leveraging comparison to a library of known signatures).
- specialized pipeline 220 may be used when system 200 has access to outside information, such as information in addition to information from documents source 202 and from ERP data source 208 .
- specialized pipeline may be configured to use information from knowledge substrate 210 in analyzing the received data and generating output data.
- pipeline 220 may be configured to extract data from documents that includes additional data (or data in a different format) as compared to data that is extracted by pipelines 216 and 218 .
- pipeline 220 may extract data other than (or in addition to) a tuple representing entity, location, and value.
- the extracted data may include logo data, signature data (e.g., an image or other representation of the signature, an indication as to whether there is a signature, etc.), figures, drawings, or the like.
- output data may include the logo itself (e.g., an image or other representation of the logo), a location within the document, and/or a customer name matched to the logo.
- output data may include the signature itself (e.g., an image or other representation of the signature), a location within the document, and/or a customer name matched to the signature.
- output data may include the handwriting itself (e.g., an image or other representation of the handwriting), a location within the document, a customer name matched to the handwriting, and/or text extracted from the handwriting.
- output data may include the figure itself (e.g., an image or other representation of the figure), a location within the document, and/or a bounding box for the figure.
- System 200 may include normalization and contextualization module 222 , which may include one or more processors configured to perform one or more data normalization and/or contextualization operations.
- Normalization and contextualization module 222 may receive input data from any one or more of: (a) template-based pipeline 216 , (b) templateless pipeline 218 , (c) specialized pipeline 220 , and (d) knowledge substrate 210 .
- Normalization and contextualization module 222 may apply one or more normalization and contextualization operations to some or all of the received data and may thereby generate normalized and/or contextualized output data.
- a normalization and contextualization data processing operation may determine context of an entity and/or may normalize an entity value so that it can be used for subsequent comparison or classification. Examples include (but are not limited to) the following: normalization of customer name data (such as alias, abbreviations, and potentially including parent/sibling/subsidiary when the name is used in the context of payment) based on master customer/vendor data; normalization of address data (e.g., based on geocoding, based on standardized addresses from a postal office, and/or based on customer/vendor data); normalization of product name and SKU based on master product data; normalization of shipping and payment terms based on terms (e.g., based on International Commerce Terms); and/or normalization of currency exchange code (e.g., based on ISO 4217).
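Two of the normalization rules above can be sketched as simple lookups; the alias and currency tables below are hypothetical stand-ins for master customer/vendor data and the ISO 4217 code list:

```python
# Hypothetical lookup tables standing in for master data / ISO 4217.
CUSTOMER_ALIASES = {"ibm": "International Business Machines",
                    "intl business machines": "International Business Machines"}
CURRENCY_CODES = {"us dollar": "USD", "$": "USD", "euro": "EUR", "€": "EUR"}

def normalize_customer(name: str) -> str:
    """Resolve aliases/abbreviations to a master customer name."""
    return CUSTOMER_ALIASES.get(name.strip().lower(), name.strip())

def normalize_currency(symbol: str) -> str:
    """Map a currency symbol or name to its ISO 4217-style code."""
    return CURRENCY_CODES.get(symbol.strip().lower(), symbol.strip().upper())

assert normalize_customer("IBM") == "International Business Machines"
assert normalize_currency("€") == "EUR"
```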
- the normalized and/or contextualized output data generated by normalization and contextualization module 222 may be provided in any suitable data format, for example as a set of tuples representing entity, entity location, normalized entity value, and confidence score.
- System 200 may include passive vouching and tracing module 224 , which may include one or more processors configured to perform one or more passive vouching and tracing operations.
- Passive vouching and tracing module 224 may receive input data from any one or more of: (a) normalization and contextualization module 222 , (b) knowledge substrate 210 , and (c) ERP data source 208 .
- Passive vouching and tracing module 224 may apply one or more passive vouching and/or tracing operations to some or all of the received data and may thereby generate passive vouching and tracing output data.
- Passive vouching may comprise comparing values from a given transaction record (e.g., as represented in ERP data) with entity values extracted from documents data (which may be assumed to be the evidence that is associated with the transaction record).
- Passive tracing may comprise comparing values from a given document with a corresponding transaction record, e.g., from the ERP. Comparison of entity values may be precise, such that the generated result indicates either a match or a mismatch, or the comparison may be fuzzy, such that the generated result comprises a similarity score.
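The precise-versus-fuzzy comparison can be sketched as follows (illustrative only): a match/mismatch boolean in precise mode, a similarity score in fuzzy mode.

```python
from difflib import SequenceMatcher

def compare(erp_value: str, doc_value: str, fuzzy: bool = False):
    """Precise mode returns a boolean match/mismatch;
    fuzzy mode returns a similarity score in [0, 1]."""
    if not fuzzy:
        return erp_value.strip().lower() == doc_value.strip().lower()
    return SequenceMatcher(None, erp_value.lower(), doc_value.lower()).ratio()

assert compare("ACME Corp", "acme corp") is True          # precise: match
assert compare("ACME Corp", "ACME Corporation") is False  # precise: mismatch
assert compare("ACME Corp", "ACME Corporation", fuzzy=True) > 0.7
```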
- the passive vouching and tracing output data generated by passive vouching and tracing module 224 may be provided in any suitable data format.
- the passive vouching and tracing operations performed by module 224 may be considered “context aware” because they are able to draw on contextual information received from knowledge substrate 210 .
- the passive vouching output may include four values: an entity name, an entity value, a location (indicating an exact or relative location of the entity), and a confidence value indicating a confidence value of the determined match.
- Downstream of both the active vouching pipeline and the passive vouching pipeline, system 200 may be configured to combine the results of the two pipelines in order to generate a combined result.
- System 200 may include data integrity integration module 226 , which may include one or more processors configured to perform one or more data integrity integration operations.
- Data integrity integration module 226 may receive input data from any one or more of: (a) active vouching module 214 and (b) passive vouching and tracing module 224 .
- Data integrity integration module 226 may apply one or more data integrity integration operations to some or all of the received data and may thereby generate data integrity integration output data.
- the data integrity integration output data generated by data integrity integration module 226 may be provided in any suitable data format, and may for example include a combined confidence score indicating a confidence level (e.g., a percentage confidence) by which system 200 has determined that the underlying documents vouch for the ERP information.
- the data integrity integration output data may comprise a set of tuples—e.g., representing entity, match score, and confidence—for each of the entities that have been analyzed.
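One possible sketch of such an integration step, combining per-entity (match score, confidence) pairs from the two pipelines into the tuple format described above. The averaging scheme here is an assumption for illustration, not the disclosed scenario logic:

```python
def combine_results(active: dict, passive: dict) -> list:
    """Combine per-entity (match_score, confidence) pairs from the active and
    passive pipelines into (entity, match_score, confidence) tuples,
    averaging where both pipelines report on the same entity."""
    out = []
    for entity in sorted(set(active) | set(passive)):
        pairs = [p for p in (active.get(entity), passive.get(entity))
                 if p is not None]
        match_score = sum(m for m, _ in pairs) / len(pairs)
        confidence = sum(c for _, c in pairs) / len(pairs)
        out.append((entity, match_score, confidence))
    return out

active = {"po_number": (1.0, 1.0)}
passive = {"po_number": (1.0, 0.5), "total": (0.9, 0.7)}
assert combine_results(active, passive) == [("po_number", 1.0, 0.75),
                                            ("total", 0.9, 0.7)]
```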
- the one or more data integrity integration operations applied by module 226 may process the input data from active vouching module 214 and passive vouching module 224 in accordance with one of the following four scenarios:
- FIGS. 3A-3B depict a diagram of how a fingerprinting algorithm may be used, in some embodiments, by the systems disclosed herein as part of a process to render a decision (e.g., a confidence value) about whether a purchase order is vouched.
- FIGS. 3 A- 3 B depict how two evidence sets may be used to generate an overall result indicating a vouching confidence level.
- “evidence set 1” may comprise output data generated by an active vouching algorithm, and may share any one or more characteristics in common with the output data generated by active vouching module 214 in system 200 .
- “evidence set 2” may comprise output data generated by one or more document processing pipelines, and may share any one or more characteristics in common with the output data generated by pipelines 216 , 218 , and/or 220 in system 200 .
- the combination of evidence set 1 and evidence set 2, as shown in FIGS. 3 A- 3 B , to generate a vouching decision and/or a confidence value may correspond to any one or more of modules 222 , 224 , and 226 in system 200 .
- Fingerprinting is a technique that may leverage ERP data to aid document understanding and vouching. Fingerprinting uses the context from the ERP as a fingerprint for how the system searches an unstructured document for evidence of a match. By knowing what PO characteristics to look for from the ERP entry (e.g., a specific PO #, the set of item numbers associated with the PO, the total amount of the PO, etc.), the system may look for that evidence in the attached PO (an unstructured document).
- fingerprinting may provide important context that allows an AI algorithm to make a better judgement of what it is seeing on a document, such that the system can achieve higher extraction accuracy and match rates.
- One drawback of fingerprinting is that, if not used carefully, it may introduce bias, e.g., causing the system to see "only what you want to see." For example, there may be additional attachments (POs, transactions, statements) that bear no relationship to the ERP but should nonetheless be carefully reviewed.
- fingerprinting should not be used alone, but rather should be combined with other vouching logic and algorithms to ensure accuracy and effectiveness.
- fingerprinting can include a simple search for an expected value, such as a particular PO number.
- Because a PO number is typically highly distinctive, this may work well in most cases, giving the system confidence that if it found PBC2145XC01, it did indeed match the expected PO number.
- Other fields might not be as simple, for example the Quantity field. Searching for a value of '1' could return a number of matches on a single document, and even more across an entire set of documents, giving the system little confidence that it has indeed matched on Quantity.
- Confidence in fingerprinting may be refined by combining what is learned from (1) template-based extraction, (2) template-less extraction, and (3) additional ML models and algorithms on top of search findings, to remove spurious matches and increase confidence in matches.
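A toy illustration of why match uniqueness matters for fingerprinting confidence: a value found exactly once (like a PO number) is highly discriminative, while a value found many times (like a quantity of '1') tells the system little. The 1/hits heuristic below is purely illustrative:

```python
def fingerprint_confidence(expected: str, document_text: str) -> float:
    """Confidence that a search hit really matched the intended field,
    discounted by how many times the expected value occurs in the text."""
    hits = document_text.count(expected)
    if hits == 0:
        return 0.0
    return 1.0 / hits  # one hit -> 1.0; many hits -> low confidence

doc = "PO PBC2145XC01  line 1: qty 1  line 2: qty 1  line 3: qty 1"
assert fingerprint_confidence("PBC2145XC01", doc) == 1.0
assert fingerprint_confidence("1", doc) < 0.2  # '1' appears all over the page
```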
- FIGS. 3 A- 3 B show how various document-understanding components function together with fingerprinting, in accordance with some embodiments.
- the combination of functions shown in FIGS. 3A-3B may advance overall goals, including an increased percentage of vouched entries and increased confidence in vouched entries.
- FIG. 4 shows a diagram of a fingerprinting algorithm, in accordance with some embodiments.
- a fingerprinting algorithm may generate output for PO Headers and/or PO Lines.
- the algorithm may use Elasticsearch to index OCR text extraction of unstructured documents for search and/or lookup.
- the algorithm may use entity extraction to identify and normalize dates.
- the algorithm may use one or more spatial models to identify PO Lines to reduce spurious matches.
- the algorithm may support derived total amount search.
- the algorithm may support delivery terms synonyms.
- the fingerprinting algorithm may include one or more of the following steps, sub-steps, and/or features:
- a system is configured to vouch payment data against evidence data. More specifically, a system may be configured to provide a framework that vouches ERP payment activities against physical bank statements.
- the system may include a pipeline that performs information extraction and characteristics extraction from bank statements, and the system may leverage one or more advanced data structures and matching algorithms to perform one-to-many matching between ERP data and bank statement data.
- the payment vouching systems provided herein may thus automate the process of finding material evidence such as remittance advice or bank statements to corroborate ERP payment entries.
- the system may be configured to receive a data set comprising bank statement data, wherein the bank statement data may be provided, for example, in the form of PDF files or JPG files of bank statements.
- the system may apply one or more data processing operations (e.g., AI models) to the received bank statement data in order to extract information (e.g., key content and characteristics) from said data.
- the extracted information may be stored in any suitable output format, and/or may be used to generate one or more feature vectors representing one or more bank statements in the bank statement data.
- the system may be configured to receive a data set comprising ERP data, wherein the ERP data may comprise one or more ERP entries.
- the system may apply one or more data processing operations (e.g., AI models) to the received ERP data in order to extract information (e.g., key content and characteristics) from said data.
- the extracted information may be stored in any suitable output format, and/or may be used to generate one or more feature vectors representing one or more ERP entries in the ERP data.
- the system may be configured to apply one or more algorithms (e.g., matching algorithms) to compare the information extracted from the bank statements against the information extracted from the ERP entries, and to thereby determine whether the bank statements sufficiently vouch the ERP entries.
- performing the comparison may comprise applying an approximation algorithm configured to achieve better matching rates between ERP records and bank statements with minor numeric discrepancies, which may be caused, for example, by currency conversion rather than being indicative of substantive discrepancies.
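Such an approximation step could, for example, compare amounts under a small relative tolerance. The 1% tolerance below is an arbitrary illustrative choice, not a disclosed parameter:

```python
def amounts_match(erp_amount: float, statement_amount: float,
                  rel_tol: float = 0.01) -> bool:
    """Treat small relative differences (e.g., from currency conversion
    rounding) as a match rather than a substantive discrepancy."""
    if erp_amount == statement_amount:
        return True
    limit = rel_tol * max(abs(erp_amount), abs(statement_amount))
    return abs(erp_amount - statement_amount) <= limit

assert amounts_match(1000.00, 999.85)      # minor conversion rounding
assert not amounts_match(1000.00, 900.00)  # substantive discrepancy
```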
- the system may determine, based on the similarity or dissimilarity of the information indicated by the two information sets, whether one or more vouching criteria are satisfied.
- the system may generate an output that indicates a level of matching between the bank statements and ERP entries (e.g., a similarity score), an indication of whether one or more vouching criteria (e.g., a threshold similarity score and/or threshold confidence level) are met, an indication of any discrepancies identified, and/or a level of confidence (e.g., a confidence score) in one or more conclusions reached by the system.
- output data may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
- the system may be configured in a modular manner, such that one or more data processing operations may be modified without modification of one or more feature engineering and/or data comparison operations, and vice versa. This may allow the system to be configured and fine-tuned in accordance with changes in business priorities, requested new features, or evolution of legal or regulatory requirements.
- FIGS. 5 A- 5 B show a diagram of a payment vouching method 500 , in accordance with some embodiments.
- all or part of the method depicted in FIGS. 5 A- 5 B may be applied by the systems described herein (e.g., system 200 ).
- a payment vouching method may seek to match data representing one or more of the following: date, amount, customer name, and invoice number.
- the system may accept ERP payment journal data and bank statement data as inputs (optionally following data pre-processing and formatting).
- the bank statement data may be subject to one or more AI information extraction models to extract information regarding transaction category, customer name, and invoices.
- the system may then apply a first matching algorithm, for example a fuzzy matching algorithm, to compare the ERP data to the data extracted from the bank statements. If a match is detected, then the system may, among one or more other operations, apply one or more comparison and/or scoring operations in order to generate overall match score data and overall confidence data. If no match is detected, then the system may apply a second matching algorithm, for example an optimization algorithm that has been proposed to solve the Knapsack problem. If no match is detected by the second algorithm, then an overall match score of 0 may be generated. If a match is detected by the second algorithm, then the system may select an optimal subset candidate and may, among one or more other operations, apply one or more comparison and/or scoring operations in order to generate an overall match score and an overall confidence score.
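The second, Knapsack-style matching step can be sketched under simplifying assumptions as a subset-sum search over bank-statement line amounts: find a subset of lines that together pay off one ERP entry. The brute-force search and names below are illustrative only (a production system would likely use dynamic programming or a dedicated solver); amounts are held in integer cents to avoid float issues:

```python
from itertools import combinations

def find_matching_subset(erp_amount_cents: int, statement_amounts_cents: list):
    """Return indices of a subset of statement lines summing to the ERP
    payment amount, or None if no such subset exists (brute force)."""
    for r in range(1, len(statement_amounts_cents) + 1):
        for combo in combinations(range(len(statement_amounts_cents)), r):
            if sum(statement_amounts_cents[i] for i in combo) == erp_amount_cents:
                return list(combo)
    return None

lines = [2500, 7500, 1234, 9999]  # statement line amounts, in cents
assert find_matching_subset(10000, lines) == [0, 1]  # 25.00 + 75.00 = 100.00
assert find_matching_subset(11111, lines) is None
```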
- the system may receive data representing ERP information, for example by receiving data from an ERP payment journal data source.
- the data representing ERP information may be received automatically, according to a predefined schedule, in response to one or more trigger conditions being met, as part of a scraping method, and/or in response to a user input.
- the system may receive the ERP data in any acceptable format.
- ERP data may be provided in a tabular data format, including a data model that defines the structure of the data. ERP data may be received from “accounts receivable” data or from “cash received” data.
- ERP data may be in tabular format including customer name, invoice data, and invoice amount.
- the system may receive data representing one or more bank statements.
- the data representing the bank statements may be received automatically, according to a predefined schedule, in response to one or more trigger conditions being met, as part of a scraping method, and/or in response to a user input.
- the system may receive the bank statement data in any acceptable format, for example as a structured and/or unstructured document, including for example a PDF document.
- the system may receive bank statement data in PDF format and/or CSV format.
- the system may download electronic bank statement data (such as BAI/BAI2, Multicash, MT940).
- the system may receive bank statement data via EDI and/or ISO 20022.
- the system may receive bank statement data through one or more API aggregators such as Plaid and Yodlee.
- the system may apply one or more information extraction models to the data representing the one or more bank statements.
- the one or more information extraction models may generate transaction category data 508 , customer name data 510 , and/or invoice data 512 .
- the extracted information may be stored, displayed to a user, transmitted, and/or used for further processing for example as disclosed herein.
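As a stand-in for such extraction models, a toy regex-based extractor can illustrate the kind of output (transaction category, customer name, invoice number) produced from a bank-statement description line. The line shape and patterns here are assumptions for illustration only, not the AI models actually used.

```python
import re

# Assumed line shape: "<CATEGORY> <customer name> [INV-<number>]"
LINE = re.compile(
    r"(?P<category>ACH|WIRE|CHECK)\s+"
    r"(?P<customer>[A-Za-z][A-Za-z ]*?)"
    r"(?:\s+INV[- ]?(?P<invoice>\d+))?\s*$"
)

def extract(description):
    """Return whichever of category/customer/invoice the line contains."""
    m = LINE.search(description.strip())
    if not m:
        return {}
    return {k: v for k, v in m.groupdict().items() if v}
```

Note that the invoice field is optional in the pattern, reflecting that invoice references are not always present in bank statement descriptions.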
- the system may apply one or more fuzzy matching algorithms.
- the one or more fuzzy matching algorithms may accept input data including (but not limited to) data representing ERP information from block 502 , transaction category data 508 , customer name data 510 , and/or invoice data 512 .
- the one or more fuzzy matching algorithms may compare data in a many-to-many manner.
- the one or more fuzzy matching algorithms may process the received input data in order to determine whether there is a match or a near match (e.g., a “fuzzy match”) between the data representing ERP information and the transaction category data 508 , customer name data 510 , and/or invoice data 512 .
- the one or more fuzzy matching algorithms may generate data representing an indication as to whether or not a match has been determined.
- the indication may comprise a binary indication as to whether or not a match has been determined and/or may comprise a confidence score representing a confidence level that a match has been determined.
- the system may determine whether a match was determined at block 514 .
- the system may reference output data generated by the one or more fuzzy matching algorithms to determine whether a match was determined, for example by referencing whether a match is indicated by the output data on a binary basis.
- the system may determine whether a match score generated at block 514 exceeds one or more predetermined or dynamically-determined threshold values in order to determine whether match criteria are met and thus whether a match is determined.
- method 500 may proceed to blocks 518 - 538 .
- method 500 may proceed to block 540 and onward.
- the system may determine whether the match that was determined is a one-to-one match.
- the system may reference output data generated by the one or more fuzzy matching algorithms to determine whether the match that was determined is a one-to-one match.
- the method may proceed to one or both of blocks 520 and 524 .
- the system may apply a fuzzy comparison algorithm to data representing customer name information.
- the system may compare customer name data in the data representing ERP information (received at block 502 ) to customer name data in the data representing one or more bank statements (received at block 504 ).
- the comparison of customer name data may generate output data comprising customer name match score 522 , which may indicate an extent to which and/or a confidence with which the compared customer name data matches.
- the system may apply a fuzzy comparison algorithm to data representing invoice information.
- the system may compare invoice data in the data representing ERP information (received at block 502 ) to invoice data in the data representing one or more bank statements (received at block 504 ).
- the comparison of invoice data may generate output data comprising invoice match score 526 , which may indicate an extent to which and/or a confidence with which the compared invoice data matches.
- the processes represented by blocks 518 , 520 , and 524 may be performed as follows.
- the system may test whether there is a match between the data extracted from the bank statements and the ERP data for the following three attributes: fuzzy date comparison, where small deviations of date data between bank statements and ERP data may be considered acceptable; fuzzy customer name comparison, which may allow comparing normalized customer name data from bank statements (if present) with customer name data from ERP data; and fuzzy invoice number comparison, which may allow comparing invoice numbers from bank statements (if present) with invoice numbers from ERP data. It should be noted that customer name and invoice number might not always be available in the bank statement data.
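A minimal sketch of the three component comparisons follows; the tolerance, normalization, and similarity function are illustrative assumptions, not the disclosed algorithms.

```python
from datetime import date
from difflib import SequenceMatcher

def date_score(erp_date, bank_date, tolerance_days=5):
    """Fuzzy date comparison: small deviations are acceptable; the score
    decays linearly to 0 past the tolerance."""
    diff = abs((erp_date - bank_date).days)
    return max(0.0, 1.0 - diff / tolerance_days)

def name_score(erp_name, bank_name):
    """Fuzzy customer name comparison on normalized names; returns None
    when the bank statement carries no customer name."""
    if not bank_name:
        return None
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(erp_name), norm(bank_name)).ratio()

def invoice_score(erp_invoice, bank_invoice):
    """Fuzzy invoice number comparison; None when unavailable."""
    if not bank_invoice:
        return None
    return SequenceMatcher(None, erp_invoice, bank_invoice).ratio()
```

Returning None (rather than 0) when an attribute is absent lets a downstream combiner distinguish "no data" from "data that did not match."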
- one or more other component scores aside from or in addition to a customer name match score and an invoice match score, may be computed.
- the system may generate data comprising temporal match score 528 , for example by performing a fuzzy comparison of date data as shown at block 527 .
- Temporal match score 528 may be computed based on a temporal difference (e.g., a number of days difference) in compared data. For example, the system may compare a date indicated in the data representing ERP information (received at block 502 ) to a date indicated in the data representing one or more bank statements (received at block 504 ), and may generate temporal match score 528 based on the difference between the two compared dates.
- the system may generate an overall match score and/or an overall confidence score based on the component scores.
- the system may compute overall match score 534 .
- Computation of overall match score 534 may comprise applying an averaging algorithm (e.g., averaging non-zero component scores), for example by computing a weighted or unweighted average of one or more underlying component scores.
- overall match score 534 may be computed as the sum of three terms: a weighted fuzzy date comparison score (e.g., weighted 528 ), a weighted fuzzy customer name comparison score (e.g., weighted 522 ), and a weighted fuzzy invoice number comparison score (e.g., weighted 526 ).
- Computing an additive overall match score 534 may mean that overall match score 534 is higher when it is based on a comparison of more (e.g., all three) underlying terms than when it is based on fewer.
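An illustrative additive combination is sketched below; the weights are assumed values, not those used by the disclosed system.

```python
# Assumed component weights for the weighted sum described above.
WEIGHTS = {"date": 0.2, "customer": 0.4, "invoice": 0.4}

def overall_match_score(components):
    """components maps component name -> score in [0, 1], or None when that
    attribute was unavailable. Because the combination is additive, a match
    supported by more attributes (e.g., all three) scores higher than one
    supported by fewer."""
    return sum(WEIGHTS[k] * s for k, s in components.items() if s is not None)
```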
- the system may compute overall confidence score 538 .
- Computation of overall confidence score 538 may comprise applying an algorithm based on one or more underlying confidence scores, such as confidence scores associated with one or more of the underlying component scores.
- a highest underlying confidence score may be selected as overall confidence score 538 .
- a lowest underlying confidence score may be selected as overall confidence score 538 .
- a weighted or unweighted average of underlying confidence scores may be computed as overall confidence score 538 .
- a product based on underlying confidence scores may be computed as overall confidence score 538 .
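The four aggregation options above can be sketched as follows; the strategy names are illustrative.

```python
import math

def overall_confidence(confidences, strategy="min"):
    """Combine the underlying component confidence scores.

    "max"/"min" select an extreme, "mean" averages, and "product" multiplies
    (so overall confidence drops as more uncertain components combine).
    """
    if strategy == "max":
        return max(confidences)
    if strategy == "min":
        return min(confidences)
    if strategy == "mean":
        return sum(confidences) / len(confidences)
    if strategy == "product":
        return math.prod(confidences)
    raise ValueError(f"unknown strategy: {strategy}")
```

The "min" and "product" strategies are conservative (a single weak component drags the overall confidence down), while "max" is optimistic; which is appropriate depends on how the score triggers downstream actions.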
- Overall match score 534 and/or overall confidence score 538 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
- the system may apply one or more amount matching algorithms, for example including one or more optimization algorithms that have been proposed to solve the Knapsack problem.
- the one or more amount matching algorithms may accept input data including (but not limited to) data representing ERP information from block 502 , transaction category data 508 , customer name data 510 , and/or invoice data 512 .
- the one or more amount matching algorithms may compare data in a one-to-many manner.
- the one or more amount matching algorithms may compare data from one bank transaction (e.g., data received at block 504 ) to data for many vouchers (e.g., data received at block 502 ).
- the one or more amount matching algorithms may process the received input data in order to determine whether there is a match between the data representing ERP information and the transaction category data 508 , customer name data 510 , and/or invoice data 512 .
- the one or more amount matching algorithms may generate data representing an indication as to whether or not a match has been determined.
- the indication may comprise a binary indication as to whether or not a match has been determined and/or may comprise a confidence score representing a confidence level that a match has been determined.
- the system may determine whether a match was determined at block 540 .
- the system may reference output data generated by the one or more amount matching algorithms to determine whether a match was determined, for example by referencing whether a match is indicated by the output data on a binary basis.
- the system may determine whether a match score generated at block 540 exceeds one or more predetermined or dynamically-determined threshold values in order to determine whether match criteria are met and thus whether a match is determined.
- method 500 may proceed to blocks 544 - 564 .
- method 500 may proceed to block 566 and onward.
- the system may select a candidate subset of data from the data received at block 502 and/or the data received at block 504 .
- the analysis performed at blocks 546 - 564 may be performed with respect to the selected candidate subset of data.
- the system may identify a set of bank transactions that may be a match, and may then assess each item in the subset to determine which is the best match.
- candidate subsets may include different numbers of items. For example, one candidate subset may be “three transactions that may match to a voucher,” while another candidate subset may be “two transactions that may match to a voucher.”
- candidate subset selection may proceed as follows: candidates may be sorted from largest to smallest; then those items in the sorted list that are already larger than the target may be eliminated, and only those which are smaller than or equal to the target amount are retained; then, a total amount from all of the remaining items may be computed, and those that match the target may be identified.
- an overall objective may include determining whether the amount C from payment is a match to two or more elements among {A1, A2, A3}. If A1, A2, A3 have been sorted from largest to smallest, then it may be necessary to test whether C = A1 + A2, C = A1 + A3, C = A2 + A3, or C = A1 + A2 + A3.
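The sort-and-prune candidate selection above can be sketched as a brute-force subset search; the tolerance and return format are assumptions for illustration.

```python
from itertools import combinations

def candidate_subsets(amounts, target, tol=0.01):
    """Return index subsets of `amounts` whose totals match `target`.

    Following the procedure above: sort candidates from largest to
    smallest, discard items already larger than the target, then test the
    remaining combinations of two or more items.
    """
    items = sorted(enumerate(amounts), key=lambda p: -p[1])  # largest first
    items = [(i, a) for i, a in items if a <= target + tol]  # prune > target
    matches = []
    for r in range(2, len(items) + 1):
        for combo in combinations(items, r):
            if abs(sum(a for _, a in combo) - target) <= tol:
                matches.append(sorted(i for i, _ in combo))
    return matches

# With C = 600 and {A1, A2, A3} = {500, 400, 200}, only A2 + A3 matches:
# candidate_subsets([500, 400, 200], 600) → [[1, 2]]
```

The pruning step matters because discarding any single item larger than the target shrinks the combination space before the exponential search begins.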
- the system may generate one or more component scores, such as component scores 548 , 552 , and/or 556 described below.
- the system may apply one or more subset match score algorithms to the selected candidate subset of data, thereby generating subset match score 548 , which may indicate an extent to which and/or a confidence with which two or more components (e.g., data points) of the selected subset match with one another.
- Block 546 may compare a voucher amount to a bank amount.
- Block 546 may compare an amount appearing in the data received at block 502 to an amount appearing in the data received at block 504 .
- the system may apply one or more fuzzy name comparison algorithms to the selected candidate subset of data, thereby generating customer name match score 552 , which may indicate an extent to which and/or a confidence with which two or more customer names in the selected subset match with one another.
- Block 550 may compare a customer name in voucher data with a customer name in statement data.
- Block 550 may compare a customer name appearing in the data received at block 502 to a customer name appearing in the data received at block 504 .
- the system may apply one or more fuzzy invoice comparison algorithms to the selected candidate subset of data, thereby generating invoice match score 556 , which may indicate an extent to which and/or a confidence with which two or more invoices in the selected subset match with one another.
- Block 554 may compare two instances of invoice data to one another.
- Block 554 may compare invoice data appearing in the data received at block 502 to invoice data appearing in the data received at block 504 .
- the system may generate an overall match score and/or an overall confidence score based on the component scores.
- the system may compute overall match score 560 .
- Computation of overall match score 560 may comprise applying an averaging algorithm (e.g., averaging non-zero component scores), for example by computing a weighted or unweighted average of one or more underlying component scores.
- the system may compute overall confidence score 564 .
- Computation of overall confidence score 564 may comprise applying an algorithm based on one or more underlying confidence scores, such as confidence scores associated with one or more of the underlying component scores.
- a highest underlying confidence score may be selected as overall confidence score 564 .
- a lowest underlying confidence score may be selected as overall confidence score 564 .
- a weighted or unweighted average of underlying confidence scores may be computed as overall confidence score 564 .
- a product based on underlying confidence scores may be computed as overall confidence score 564 .
- Overall match score 560 and/or overall confidence score 564 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
- the system may determine that an overall match score is 0.
- the overall match score of 0 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
- the system may be configured to apply a plurality of different algorithms (e.g., two different algorithms, three different algorithms, etc.) as part of a payment vouching process.
- the algorithms may be applied in parallel.
- the algorithms may be applied in series.
- the algorithms may be applied selectively dependent on the outcome of one another; for example, the system may first apply one algorithm and then may apply another algorithm selectively dependent on the outcome of the first algorithm (e.g., whether or not a match was indicated by the first algorithm).
- the system may be configured to apply a waterfall algorithm, a fuzzy date-amount algorithm, and an optimization algorithm that has been proposed to solve the Knapsack problem.
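A minimal sketch of such selective, in-series ("waterfall") application follows; the stub algorithms and score values are illustrative assumptions, not the named algorithms themselves.

```python
def waterfall(algorithms, erp_entry, bank_amounts):
    """Apply algorithms in series; each runs only if the previous one
    returned no match (None)."""
    for algo in algorithms:
        result = algo(erp_entry, bank_amounts)
        if result is not None:
            return result
    return {"score": 0.0}  # no algorithm produced a match

def exact_amount(erp, amounts):
    return {"score": 1.0, "algorithm": "exact"} if erp["amount"] in amounts else None

def fuzzy_amount(erp, amounts, tol=1.0):
    if any(abs(a - erp["amount"]) <= tol for a in amounts):
        return {"score": 0.8, "algorithm": "fuzzy"}
    return None
```

For example, `waterfall([exact_amount, fuzzy_amount], {"amount": 100.0}, [100.5])` falls through the exact stage and matches at the fuzzy stage; running the same chain in parallel instead would simply evaluate every algorithm and combine their outputs.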
- FIG. 6 illustrates an example of a computer, according to some embodiments.
- Computer 600 can be a component of a system for providing an AI-augmented auditing platform including techniques for providing AI-explainability for processing data through multiple layers.
- computer 600 may execute any one or more of the methods described herein.
- Computer 600 can be a host computer connected to a network.
- Computer 600 can be a client computer or a server.
- computer 600 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device, such as a phone or tablet.
- the computer can include, for example, one or more of processor 610 , input device 620 , output device 630 , storage 640 , and communication device 660 .
- Input device 620 and output device 630 can correspond to those described above and can either be connectable or integrated with the computer.
- Input device 620 can be any suitable device that provides input, such as a touch screen or monitor, keyboard, mouse, or voice-recognition device.
- Output device 630 can be any suitable device that provides an output, such as a touch screen, monitor, printer, disk drive, or speaker.
- Storage 640 can be any suitable device that provides storage, such as an electrical, magnetic, or optical memory, including a random access memory (RAM), cache, hard drive, CD-ROM drive, tape drive, or removable storage disk.
- Communication device 660 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or card.
- the components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly.
- Storage 640 can be a non-transitory computer-readable storage medium comprising one or more programs, which, when executed by one or more processors, such as processor 610 , cause the one or more processors to execute methods described herein.
- Software 650 which can be stored in storage 640 and executed by processor 610 , can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the systems, computers, servers, and/or devices as described above). In some embodiments, software 650 can include a combination of servers such as application servers and database servers.
- Software 650 can also be stored and/or transported within any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device.
- a computer-readable storage medium can be any medium, such as storage 640 , that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
- Software 650 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device.
- a transport medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device.
- the transport-readable medium can include but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.
- Computer 600 may be connected to a network, which can be any suitable type of interconnected communication system.
- the network can implement any suitable communications protocol and can be secured by any suitable security protocol.
- the network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
- Computer 600 can implement any operating system suitable for operating on the network.
- Software 650 can be written in any suitable programming language, such as C, C++, Java, or Python.
- application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.
Description
- This application claims the benefit of U.S. Provisional Application No. 63/217,119 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,123 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,127 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,131 filed Jun. 30, 2021; and U.S. Provisional Application No. 63/217,134, filed Jun. 30, 2021, the entire contents of each of which are incorporated herein by reference.
- This relates generally to automated data processing and validation of data, and more specifically to AI-augmented auditing platforms including techniques for assessment of vouching evidence.
- When performing audits, or when otherwise ingesting, reviewing, and analyzing documents or other data, there is often a need to establish that one or more statements, assertions, or other representations of fact are sufficiently substantiated by documentary evidence. In the context of performing audits, establishing that one or more statements (e.g., a financial statement line item (FSLI)) is sufficiently supported by documentary evidence is referred to as vouching.
- In automated auditing systems that seek to ingest and understand documentary evidence in order to vouch for one or more statements (e.g., FSLI's), known document-understanding techniques are sensitive to the structure of the documents that are ingested and analyzed. Accordingly, known document-understanding techniques may fail to correctly recognize and identify certain entities referenced in documents, due for example to a misinterpretation of the structure or layout of one or more ingested documents. Accordingly, there is a need for improved document-understanding (e.g., document ingestion and analysis) techniques that are more robust to various document structures and layouts and that provide higher accuracy for entity recognition in documents. There is a need for such improved document-understanding techniques configured to be able to be applied in automated auditing systems in order to determine whether one or more documents constitutes sufficient vouching evidence to substantiate one or more assertions (e.g., FSLI's).
- Disclosed herein are improved document-understanding techniques that may address one or more of the above-identified needs. In some embodiments, as explained herein, the document-understanding techniques disclosed herein may leverage a priori knowledge (e.g., information available from a data source separate from the document(s) being assessed for sufficiency for vouching purposes) of one or more entities in extracting and/or analyzing information from one or more documents. In some embodiments, the document-understanding techniques may analyze the spatial configuration of words, paragraphs, or other content in a document in extracting and/or analyzing information from one or more documents.
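As one toy illustration of using spatial configuration (the token data model and threshold are assumptions, not the disclosed models): given OCR-style tokens with coordinates, a field label can be paired with the nearest token to its right on roughly the same line.

```python
def find_value(tokens, label, line_tol=5):
    """tokens: list of (text, x, y) tuples. Return the text of the nearest
    token to the right of `label` within `line_tol` vertical units."""
    lab = next((t for t in tokens if t[0] == label), None)
    if lab is None:
        return None
    same_line = [t for t in tokens
                 if t is not lab and abs(t[2] - lab[2]) <= line_tol and t[1] > lab[1]]
    return min(same_line, key=lambda t: t[1])[0] if same_line else None
```

Because the pairing depends on geometry rather than reading order, it is less sensitive to how a particular statement or invoice template arranges its fields.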
- Furthermore, pursuant to the need to perform automated vouching, there is a need for improved systems and methods for vouching ERP entries against bank statement data in order to verify payment.
- In some embodiments, a system is configured to vouch payment data against evidence data. More specifically, a system may be configured to provide a framework that performs vouching of ERP payment activities against physical bank statements. The system may include a pipeline that performs information extraction and characteristics extraction from bank statements, and the system may leverage one or more advanced data structures and matching algorithms to perform one-to-many matching between ERP data and bank statement data. The payment vouching systems provided herein may thus automate the process of finding material evidence, such as remittance advice or bank statements, to corroborate ERP payment entries.
- In some embodiments, a first system is provided, the first system being for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the first system comprising one or more processors configured to cause the first system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; and apply one or more models to the hypothesis data and to the extracted ERP information in order to generate output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- In some embodiments of the first system, extracting the instance of ERP information comprises generating first data representing information content of the instance of ERP information and second data representing a document location for the instance of ERP information.
- In some embodiments of the first system, the ERP information comprises one or more of: a purchase order number, a customer name, a date, a delivery term, a shipping term, a unit price, and a quantity.
- In some embodiments of the first system, applying the one or more models to generate output data is based on preexisting information regarding spatial relationships amongst instances of ERP information in documents.
- In some embodiments of the first system, the preexisting information comprises a graph representing spatial relationships amongst instances of ERP information in documents.
- In some embodiments of the first system, the one or more processors are configured to cause the system to augment the hypothesis data based on one or more models representing contextual data.
- In some embodiments of the first system, the contextual data comprises information regarding one or more synonyms for the information content of the instance of ERP information.
- In some embodiments of the first system, the instance of ERP information comprises a single word in the document.
- In some embodiments of the first system, the instance of ERP information comprises a plurality of words in the document.
- In some embodiments of the first system, the one or more processors are configured to determine whether the ERP information vouches for the ERP item.
- In some embodiments of the first system, determining whether the ERP information vouches for the ERP item comprises generating and evaluating a similarity score representing a comparison of the ERP information and the ERP item.
- In some embodiments of the first system, the similarity score is generated by comparing an entity graph associated with the ERP information to an entity graph associated with the ERP item.
- In some embodiments of the first system, extracting the ERP information from the document comprises applying a fingerprinting operation to determine, based on the received data representing an ERP item, a characteristic of a data extraction operation to be applied to the electronic document.
- In some embodiments, a first non-transitory computer-readable storage medium is provided, the first non-transitory computer-readable storage medium storing instructions for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; and apply one or more models to the hypothesis data and to the extracted ERP information in order to generate output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- In some embodiments, a first method is provided, the first method being for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, wherein the first method is performed by a system comprising one or more processors, the first method comprising: receiving data representing an ERP item; generating hypothesis data based on the received data representing an ERP item; receiving an electronic document; extracting ERP information from the document; and applying one or more models to the hypothesis data and to the extracted ERP information in order to generate output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- In some embodiments, a second system is provided, the second system being for verifying an assertion against a source document, the second system comprising one or more processors configured to cause the second system to: receive first data indicating an unverified assertion; receive second data comprising a plurality of source documents; apply one or more extraction models to extract a set of key data from the plurality of source documents; and apply one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
- In some embodiments of the second system, the one or more extraction models comprise one or more machine learning models.
- In some embodiments of the second system, the one or more matching models comprises one or more approximation models.
- In some embodiments of the second system, the one or more matching models are configured to perform one-to-many matching between the first data and the set of key data.
- In some embodiments of the second system, the one or more processors are configured to cause the system to modify one or more of the extraction models without modification of one or more of the matching models.
- In some embodiments of the second system, the one or more processors are configured to cause the system to modify one or more of the matching models without modification of one or more of the extraction models.
- In some embodiments of the second system, the unverified assertion comprises an ERP payment entry.
- In some embodiments of the second system, the plurality of source documents comprises a bank statement.
- In some embodiments of the second system, applying one or more matching models comprises generating a match score and generating a confidence score.
- In some embodiments of the second system, applying one or more matching models comprises: applying a first matching model; if a match is indicated by the first matching model, generating a match score and a confidence score based on the first matching model; and if a match is not indicated by the first matching model: applying a second matching model; if a match is indicated by the second matching model, generating a match score and a confidence score based on the second matching model; and if a match is not indicated by the second matching model, generating a match score of 0.
- In some embodiments, a second non-transitory computer-readable storage medium is provided, the second non-transitory computer-readable storage medium storing instructions for verifying an assertion against a source document, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive first data indicating an unverified assertion; receive second data comprising a plurality of source documents; apply one or more extraction models to extract a set of key data from the plurality of source documents; and apply one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
- In some embodiments, a second method is provided, the second method being for verifying an assertion against a source document, wherein the second method is executed by a system comprising one or more processors, the second method comprising: receiving first data indicating an unverified assertion; receiving second data comprising a plurality of source documents; applying one or more extraction models to extract a set of key data from the plurality of source documents; and applying one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
- In some embodiments, a third system, for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, is provided, the third system comprising one or more processors configured to cause the third system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; apply a first set of one or more models to the hypothesis data and to extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; apply a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; generate combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- In some embodiments, a third non-transitory computer-readable storage medium is provided, the third non-transitory computer-readable storage medium storing instructions for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive data representing an ERP item; generate hypothesis data based on the received data representing an ERP item; receive an electronic document; extract ERP information from the document; apply a first set of one or more models to the hypothesis data and to extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; apply a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; generate combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- In some embodiments, a third method, for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, is provided, wherein the third method is performed by a system comprising one or more processors, the third method comprising: receiving data representing an ERP item; generating hypothesis data based on the received data representing an ERP item; receiving an electronic document; extracting ERP information from the document; applying a first set of one or more models to the hypothesis data and to extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; applying a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; generating combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- In some embodiments, any one or more of the features, characteristics, or aspects of any one or more of the above systems, methods, or non-transitory computer-readable storage media may be combined, in whole or in part, with one another and/or with any one or more of the features, characteristics, or aspects (in whole or in part) of any other embodiment or disclosure herein.
- Various embodiments are described with reference to the accompanying figures, in which:
-
FIG. 1 shows two examples of extracting entities from documents, in accordance with some embodiments. -
FIG. 2 shows a system for data processing for an AI-augmented auditing platform, in accordance with some embodiments. -
FIGS. 3A-3B depict a diagram of how a fingerprinting algorithm may be used as part of a process to render a decision about whether a purchase order is vouched, in accordance with some embodiments. -
FIG. 4 shows a diagram of a fingerprinting algorithm, document-understanding, and vouching algorithm, in accordance with some embodiments. -
FIGS. 5A-5B show a diagram of a payment vouching method, in accordance with some embodiments. -
FIG. 6 illustrates an example of a computer, according to some embodiments.
- Active Document Comprehension for Assurance
- When performing audits, or when otherwise ingesting, reviewing, and analyzing documents or other data, there is often a need to establish that one or more statements, assertions, or other representations of fact are sufficiently substantiated by documentary evidence. In the context of performing audits, establishing that one or more statements (e.g., a financial statement line item (FSLI)) is sufficiently supported by documentary evidence is referred to as vouching.
- In automated auditing systems that seek to ingest and understand documentary evidence in order to vouch for one or more statements (e.g., FSLI's), known document-understanding techniques are sensitive to the structure of the documents that are ingested and analyzed. Accordingly, known document-understanding techniques may fail to correctly recognize and identify certain entities referenced in documents, due for example to a misinterpretation of the structure or layout of one or more ingested documents. Accordingly, there is a need for improved document-understanding (e.g., document ingestion and analysis) techniques that are more robust to various document structures and layouts and that provide higher accuracy for entity recognition in documents. There is a need for such improved document-understanding techniques configured to be able to be applied in automated auditing systems in order to determine whether one or more documents constitutes sufficient vouching evidence to substantiate one or more assertions (e.g., FSLI's).
- Disclosed herein are improved document-understanding techniques that may address one or more of the above-identified needs. In some embodiments, as explained herein, the document-understanding techniques disclosed herein may leverage a priori knowledge (e.g., information available from a data source separate from the document(s) being assessed for sufficiency for vouching purposes) of one or more entities in extracting and/or analyzing information from one or more documents. In some embodiments, the document-understanding techniques may analyze the spatial configuration of words, paragraphs, or other content in a document in extracting and/or analyzing information from one or more documents.
- In some embodiments, a document-understanding system is configured to perform automated hypothesis generation based on one or more data sets. The data sets on which hypothesis generation is based may include one or more sets of ingested documents, for example documents ingested in accordance with one or more document-understanding techniques described herein. In some embodiments, the data sets on which hypothesis generation is based may include enterprise resource planning (ERP) data. In some embodiments, the data (e.g., ERP data) may indicate one or more entities, for example a PO #, a customer name, a date, a delivery term, a shipping term, a unit price, and/or a quantity. The system may be configured to apply a priori knowledge (e.g., information available from a data source separate from the document(s) being assessed for sufficiency for vouching purposes) regarding one or more of the entities indicated in the data. The hypothesis generation techniques disclosed herein may enable more accurate vouching of ERP data with evidence from unstructured documents and other evidence sources.
- The system may be configured to analyze spatial relationships and constellation among entities indicated in the data. For example, the position at which entities are indicated in a document (e.g., a unit price and a quantity indicated on a same line of a document versus on a different line of a document) may be analyzed. In some embodiments, the system may be configured to generate, store, and/or analyze a data structure, such as a graph data structure, that represents spatial relationships amongst a plurality of entities in one or more documents.
- The system may be configured to apply one or more AI models to comprehend documents to identify and assess evidence to vouch for the validity of financial information reported in ERPs. The system may use the ERP data to weakly label and provide hypotheses to documents that are candidates for possible evidence. The system may further apply one or more name entity extraction models to provide additional bias-free information to overlay on top of these documents. The combination of these features may enable the system to validate whether candidate evidence is indeed vouching evidence (e.g., whether it meets vouching criteria) for a given ERP entry, including by providing a quantification/score of the system's confidence in the conclusion that the candidate evidence does or does not constitute vouching evidence.
- In some embodiments, the system may be configured to receive ERP data and to apply one or more data processing operations (e.g., AI models) to the received data in order to generate hypothesis data. (Any data processing operation referenced herein may include application of one or more models trained by machine-learning.) The hypothesis data may consist of one or more content entities that the system hypothesizes to be indicated in the received data, for example: PO #, customer name, date, delivery term, shipping term, unit price, and/or quantity. The system may assess one or more of the following in generating hypothesis data and/or in assessing hypothesis data once it is generated: a priori knowledge (e.g., knowledge from one or more data sources aside from the ERP data source); spatial relationships amongst words, paragraphs, or other indications of entities within the ERP data (e.g., spatial relationships of words within a document), and/or constellations amongst entities (e.g., unit price & quantity appearing on the same line).
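The generation of hypothesis data from an ERP record can be sketched as follows. This is an illustrative sketch under stated assumptions: the ERP field names (`po_number`, `customer`, and so on) are hypothetical examples standing in for whatever schema a real ERP export uses, and a real system would apply trained models rather than a simple field mapping.

```python
# Illustrative sketch: deriving (entity, expected_value) hypothesis tuples
# from a single ERP record. Field names are hypothetical examples.

def generate_hypotheses(erp_record):
    """Map known ERP fields to hypothesis tuples of (entity, expected value)."""
    fields = ["po_number", "customer", "date", "unit_price", "quantity"]
    return [(f, erp_record[f]) for f in fields if f in erp_record]

erp = {"po_number": "PO-1001", "customer": "ACME Corp", "quantity": 12}
hypotheses = generate_hypotheses(erp)  # tuples for the three fields present
```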
- Following hypothesis generation, the system may apply one or more data processing operations (e.g., AI models) in order to augment one or more of the generated hypotheses. In some embodiments, the system may augment (or otherwise modify) a generated hypothesis on the basis of context data available to the system. In some embodiments, context data may include synonym data, such that the system may augment a hypothesis in accordance with synonym data. For example, hypothesis data that includes the word “IBM” may be augmented to additionally include the term “International Business Machines”.
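The synonym-based augmentation described above can be sketched as follows; the synonym table here is a hypothetical stand-in for whatever context data a real system draws on.

```python
# Illustrative sketch of augmenting hypothesis values with synonym data,
# using the "IBM" example from the text. The table is hypothetical.

SYNONYMS = {"IBM": ["International Business Machines"]}

def augment(values):
    """Expand each hypothesized value with any known synonyms."""
    out = []
    for v in values:
        out.append(v)
        out.extend(SYNONYMS.get(v, []))
    return out
```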
- The system may be configured to perform spatial entity extraction. In some embodiments, spatial entity extraction includes extracting entities (at the word-level and at the multi-word level) from a document to generate information regarding (a) the entity content/identity and (b) information regarding a spatial location of the entity (e.g., an absolute spatial location within a document and/or a spatial location/proximity/alignment/orientation with respect to one or more other entities within the document).
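One way to represent the output of spatial entity extraction is a record pairing entity content with location information. The following is an illustrative sketch only: the specific fields (page, line, horizontal offset) and the same-line helper are assumptions, chosen as a simple proxy for the constellation relationships discussed above (e.g., unit price and quantity appearing on the same line).

```python
# Illustrative sketch of a spatial-entity record: entity content plus a
# document location, with a helper capturing one simple spatial relation.

from dataclasses import dataclass

@dataclass
class SpatialEntity:
    text: str    # entity content (word or multi-word)
    page: int    # page index within the document
    line: int    # line index (absolute location on the page)
    x: float     # horizontal offset along the line

def same_line(a: SpatialEntity, b: SpatialEntity) -> bool:
    """True when two entities appear on the same page and line."""
    return a.page == b.page and a.line == b.line
```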
- The system may be configured to perform one or more hypothesis testing operations in order to evaluate the likelihood of a match, for example based on calculating a similarity score. The likelihood of a match may be evaluated between ERP data on one hand and a plurality of documents on the other hand. In some embodiments, the likelihood of a match may be based on calculating a similarity score between the entity (or entities) representing the hypothesis and the entity (or entity graph) representing components within the documents.
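A minimal form of this hypothesis testing can be sketched as follows. This is an illustrative sketch, not the disclosed matching method: `difflib.SequenceMatcher` and the 0.8 threshold are stand-ins for whatever similarity model and cutoff a real system would use.

```python
# Illustrative sketch of hypothesis testing via a similarity score between
# a hypothesized value and candidate values extracted from documents.

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(hypothesis_value, extracted_values, threshold=0.8):
    """Return (best value, score) if any candidate clears the threshold."""
    scored = [(v, similarity(hypothesis_value, v)) for v in extracted_values]
    value, score = max(scored, key=lambda t: t[1])
    return (value, score) if score >= threshold else (None, score)
```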
- The systems and methods provided herein may provide improvements over existing approaches, including by providing the ability to use contextual information guided by an audit process to aid in comprehension, to use contextual information to form hypotheses on the expected information to be extracted from documents, to allow the testing of these hypotheses to guide document comprehension, and/or to apply methods to mitigate and account for the possibility of biases introduced by contextual information (e.g., by adjusting a confidence score accordingly).
-
FIG. 1 depicts two examples of extracting entities from documents, in accordance with some embodiments. -
FIG. 2 depicts a system 200 for data processing for an AI-augmented auditing platform, in accordance with some embodiments. The components labeled “hypothesis generation” and “active vouching” may, in some embodiments, include any one or more of the systems (and/or may apply any one or more of the methods) described herein. - In some embodiments, each of the schematic blocks shown in
FIG. 2 may represent a distinct module (e.g., each distinct module comprising one or more distinct computer systems including storage devices and/or one or more physical and/or virtual processors) configured to perform associated functionality. In some embodiments, any one or more of the schematic blocks shown in FIG. 2 may represent functionalities performed by a same module (e.g., by a same computer system). - As described below,
system 200 may be configured to perform any one or more processes for active vouching; passive vouching and tracing; and/or data integrity integration, for example as described herein. - As shown in
FIG. 2, system 200 may include documents source 202, which may include any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Documents source 202 may be communicatively coupled to one or more other components of system 200 and configured to provide a plurality of documents to system 200, such that the documents can be assessed to determine whether one or more data integrity criteria are met, e.g., whether the documents sufficiently vouch for one or more representations made by a set of ERP data. In some embodiments, system 200 may receive documents from documents source 202 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the documents being manually sent. Documents received from documents source 202 may be provided in any suitable electronic data format, for example as structured, unstructured, and/or semi-structured data. The documents may include, for example, spreadsheets, word processing documents, and/or PDFs. -
System 200 may include OCR module 204, which may include any one or more processors configured to perform OCR analysis and/or any other text or character recognition/extraction based on documents received from documents source 202. OCR module 204 may generate data representing characters recognized in the received documents. -
System 200 may include document classification module 206, which may include one or more processors configured to perform document classification of documents received from documents source 202 and/or from OCR module 204. Document classification module 206 may receive document data from documents source 202 and/or may receive data representing characters in documents from OCR module 204, and may apply one or more classification algorithms to the received data to apply one or more classifications to the documents received from documents source 202. Data representing the determined classifications may be stored as metadata in association with the documents themselves and/or may be used to store the documents in a manner according to their determined respective classification(s). -
System 200 may include ERP data source 208, which may include any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. ERP data source 208 may be communicatively coupled to one or more other components of system 200 and configured to provide ERP data to system 200, such that the ERP data can be assessed to determine whether one or more data integrity criteria are met, e.g., whether the ERP data is sufficiently vouched by one or more documents (e.g., the documents provided by documents source 202). In some embodiments, one or more components of system 200 may receive ERP data from ERP data source 208 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. ERP data received from ERP data source 208 may be provided in any suitable electronic data format. In some embodiments, ERP data may be provided in a tabular data format, including a data model that defines the structure of the data. -
System 200 may include knowledge substrate 210, which may include any one or more data sources such as master data source 210 a, ontology data source 210 b, and exogenous knowledge data source 210 c. The data sources included in knowledge substrate 210 may be provided as part of a single computer system, multiple computer systems, a single network, or multiple networks. The data sources included in knowledge substrate 210 may be configured to provide data to one or more components of system 200 (e.g., hypothesis generation module 212, normalization and contextualization module 222, and/or passive vouching and tracing module 224). In some embodiments, one or more components of system 200 may receive data from knowledge substrate 210 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Data received from knowledge substrate 210 may be provided in any suitable data format. - In some embodiments, interaction with
knowledge substrate 210 may be query-based. Interaction with knowledge substrate 210 may be in one or more of the following forms: question answering, information retrieval, query into a knowledge graph engine, and/or an inferencing engine (e.g., against inferencing rules). -
Knowledge substrate 210 may include data such as ontology/taxonomy data, knowledge graph data, and/or inferencing rules data. Master data received from master data source 210 a may include, for example, master customer data, master vendor data, and/or master product data. Ontology data received from ontology data source 210 b may include, for example, Incoterms data for international commercial terms that define the cost, liability, and/or insurance among the sell side, buy side, and shipper for shipping a product. Exogenous knowledge data received from exogenous knowledge data source 210 c may include, for example, knowledge external to a specific audit client. This knowledge could be related to the industry of the client, the geographic area of a client, and/or the entire economy. -
System 200 may include hypothesis generation module 212, which may include one or more processors configured to generate hypothesis data. Hypothesis generation module 212 may receive input data from any one or more of: (a) document classification module 206, (b) ERP data source 208, and (c) knowledge substrate 210. Hypothesis generation module 212 may apply one or more hypothesis generation algorithms to some or all of the received data and may thereby generate hypothesis data. Hypothesis generation may be based on any one of, and/or a combination of: (1) ERP data, (2) document type data, and/or (3) data regarding prior understanding of one or more documents. A generated hypothesis may represent where and what is expected to be found in documents data, based on previous exposure to similar documents. Document classification data (e.g., from document classification module 206), for one document and/or for a group of documents, may be used to determine, augment, and/or weight hypothesis data generated by hypothesis generation module 212. In some embodiments, document content itself (e.g., document data received from documents source 202), as distinct from document classification data (e.g., as generated by document classification module 206), may not be used for hypothesis generation. In some embodiments, document content itself may be used, in addition to document classification data, for hypothesis generation. The hypothesis data generated by hypothesis generation module 212 may be provided in any suitable data format. In some embodiments, hypothesis data in the context of document understanding may be represented as sets of tuples (e.g., representing entity, location, and value), each of which represents what is expected to be found from the documents data. - As shown in
FIG. 2, system 200 may provide for an “active vouching” pipeline and for a “passive vouching” pipeline that may each be applied, using some or all of the same underlying data, in parallel to one another. The two pipelines may be applied at the same time or one after the other. Below, the active vouching pipeline is described with respect to element 214, while the passive vouching pipeline is described with respect to elements 216-224. -
System 200 may include active vouching module 214, which may include one or more processors configured to apply any one or more active vouching analysis operations. Active vouching module 214 may receive input data from one or more of: OCR module 204, document classification module 206, and hypothesis generation module 212. Active vouching module 214 may apply one or more active vouching analysis operations to some or all of the received data and may thereby generate active vouching output data. In some embodiments, an active vouching analysis operation may include a “fingerprinting” analysis operation. In some embodiments, active vouching or fingerprinting may include data processing operations configured to determine whether there exist one (or more) tuples (e.g., representing entity, location, and value) extracted from documents data that can match hypothesis data. Some embodiments of a fingerprinting analysis operation are described below with respect to FIGS. 3 and 4. In some embodiments, the active vouching output data generated by active vouching module 214 may be provided in any suitable data format. In some embodiments, the active vouching output may include data indicating one or more of the following: a confidence score indicating a confidence level as to whether there is a match (e.g., whether vouching criteria are met, whether there is a match for a hypothesis); a binary indication as to whether there is any match for a hypothesis, which may feed back iteratively into the fingerprinting process; and/or a location within a document corresponding to a hypothesis for which a confidence and/or a binary indication are generated. In some embodiments, the active vouching output may include four values: an entity name, an entity value, a location (indicating an exact or relative location of the entity), and a confidence value indicating the system's confidence in the determined match. - In some embodiments, the active vouching operations performed by
module 214 may leverage contextual knowledge to inform what information is sought in an underlying document. In some embodiments, the active vouching operations performed by module 214 may be considered “context aware” because they are able to draw on contextual information that is injected via hypothesis generation module 212 drawing on data received from knowledge substrate 210.
- In some embodiments, the active vouching operations may include one or more inductive reasoning operations that may be based on a previous calculation or determination, historical information, or one or more additional insights. In some embodiments, inductive reasoning operations may based on learning from previous instances of similar data (e.g., sample documents) to determine what may be expected from future data.
- In some embodiments,
active vouching module 214 may apply context awareness, deductive reasoning, and inductive reasoning together for hypothesis testing. - Turning now to the passive vouching pipeline (elements 216-224),
system 200 may include three parallel pipelines within the passive vouching pipeline, as represented by template-based pipeline 216, templateless pipeline 218, and specialized pipeline 220. Each of pipelines 216-220 may comprise one or more processors configured to receive input data from OCR module 204 and/or from document classification module 206 and to process the received input data. Each of the pipelines 216-220 may apply respective data analysis operations to the received input data and may generate respective output data. - Template-based
pipeline 216 may be configured to apply any one or more template-based analysis operations to the received document data and/or document classification data and to generate output data representing document contents, such as one or more tuples representing entity, location, and value for content extracted from the document. Template-based pipeline 216 may be configured to apply one or more document understanding models that are trained for a specific known format. Abbyy Flexicapture is an example of such a template-based tool. -
Templateless pipeline 218 may be configured to apply any one or more analysis operations to the received document data and/or document classification data and to generate output data representing document contents, such as one or more tuples representing entity, location, and value for content extracted from the document. Templateless pipeline 218 may be configured to operate without any assumption that documents being analyzed have a presumed “template” for document understanding. In some embodiments, a templateless approach may be less accurate than a template-based tool, and may require more training against a larger training set as compared to a template-based tool. -
Specialized pipeline 220 may be configured to apply any one or more analysis operations to the received document data and/or document classification data and to generate output data representing document contents. In some embodiments, specialized pipeline 220 may be configured to apply a signature analysis. In some embodiments, signature analysis may include signature detection, for example using a machine-learning algorithm configured to determine whether or not a signature is present. In some embodiments, additionally or alternatively to signature detection, signature analysis may include signature matching, for example using one or more data processing operations to determine a person whose signature matches a detected signature (for example by leveraging comparison to a library of known signatures). - In some embodiments,
specialized pipeline 220 may be used when system 200 has access to outside information, such as information in addition to information from documents source 202 and from ERP data source 208. For example, specialized pipeline 220 may be configured to use information from knowledge substrate 210 in analyzing the received data and generating output data. - In some embodiments,
pipeline 220 may be configured to extract data from documents that includes additional data (or data in a different format) as compared to data that is extracted by pipelines 216 and 218 (e.g., pipeline 220 may extract data other than (or in addition to) a tuple representing entity, location, and value). The extracted data may include logo data, signature data (e.g., an image or other representation of the signature, an indication as to whether there is a signature, etc.), figures, drawings, or the like. For an extracted logo, output data may include the logo itself (e.g., an image or other representation of the logo), a location within the document, and/or a customer name matched to the logo. For an extracted signature, output data may include the signature itself (e.g., an image or other representation of the signature), a location within the document, and/or a customer name matched to the signature. For extracted handwriting, output data may include the handwriting itself (e.g., an image or other representation of the handwriting), a location within the document, a customer name matched to the handwriting, and/or text extracted from the handwriting. For an extracted figure, output data may include the figure itself (e.g., an image or other representation of the figure), a location within the document, and/or a bounding box for the figure. -
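The richer output records described for the specialized pipeline can be sketched as a small data structure. This is an illustrative sketch only; the field names and the `(page, line)` location convention are hypothetical examples, not the disclosed format.

```python
# Illustrative sketch of a specialized-pipeline output record covering the
# logo/signature/handwriting/figure cases described above. Field names are
# hypothetical examples.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SpecializedExtraction:
    kind: str                     # "logo", "signature", "handwriting", "figure"
    image_ref: str                # reference to the extracted image
    location: Tuple[int, int]     # e.g., (page, line) within the document
    matched_customer: Optional[str] = None  # e.g., customer matched to a logo
    extracted_text: Optional[str] = None    # e.g., text read from handwriting

logo = SpecializedExtraction("logo", "img_0042.png", (1, 3),
                             matched_customer="ACME Corp")
```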
System 200 may include normalization and contextualization module 222, which may include one or more processors configured to perform one or more data normalization and/or contextualization operations. Normalization and contextualization module 222 may receive input data from any one or more of: (a) template-based pipeline 216, (b) templateless pipeline 218, (c) specialized pipeline 220; and (d) knowledge substrate 210. Normalization and contextualization module 222 may apply one or more normalization and contextualization operations to some or all of the received data and may thereby generate normalized and/or contextualized output data. -
- The normalized and/or contextualized output data generated by normalization and
contextualization module 222 may be provided in any suitable data format, for example as a set of tuples representing entity, entity location, normalized entity value, and confidence score. -
System 200 may include passive vouching and tracing module 224, which may include one or more processors configured to perform one or more passive vouching and tracing operations. Passive vouching and tracing module 224 may receive input data from any one or more of: (a) normalization and contextualization module 222, (b) knowledge substrate 210, and (c) ERP data source 208. Passive vouching and tracing module 224 may apply one or more passive vouching and/or tracing operations to some or all of the received data and may thereby generate passive vouching and tracing output data. Passive vouching may comprise comparing values from a given transaction record (e.g., as represented in ERP data) with entity values extracted from document data (which may be assumed to be the evidence that is associated with the transaction record). Passive tracing may comprise comparing values from a given document with a corresponding transaction record, e.g., from the ERP. Comparison of entity values may be precise, such that the generated result indicates either a match or a mismatch, or the comparison may be fuzzy, such that the generated result comprises a similarity score. - The passive vouching and tracing output data generated by passive vouching and
tracing module 224 may be provided in any suitable data format. The passive vouching and tracing operations performed by module 224 may be considered “context aware” because they are able to draw on contextual information received from knowledge substrate 210. In some embodiments, the passive vouching output may include four values: an entity name, an entity value, a location (indicating an exact or relative location of the entity), and a confidence value indicating the confidence of the determined match. - Downstream of both the active vouching pipeline and the passive vouching pipeline,
system 200 may be configured to combine the results of the active vouching and the passive vouching pipelines in order to generate a combined result. -
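- As a minimal sketch of the passive vouching comparison described above, assuming a hypothetical dictionary format for the extracted entities, each ERP field can be checked precisely first and fuzzily second, yielding the four-value output form (entity name, entity value, location, confidence):

```python
from difflib import SequenceMatcher

def passive_vouch(erp_record, extracted_entities, fuzzy_threshold=0.85):
    """Compare ERP field values against entities extracted from a document.

    `extracted_entities` maps an entity name to a (value, location) pair,
    as might be produced by an upstream extraction pipeline (the format
    and threshold are assumptions). Returns one (entity, value, location,
    confidence) tuple per ERP field; confidence is 0.0 when no evidence
    is found.
    """
    results = []
    for entity, erp_value in erp_record.items():
        value, location = extracted_entities.get(entity, (None, None))
        if value is None:
            results.append((entity, None, None, 0.0))
            continue
        if str(value) == str(erp_value):  # precise comparison
            confidence = 1.0
        else:                             # fuzzy comparison
            score = SequenceMatcher(None, str(value), str(erp_value)).ratio()
            confidence = score if score >= fuzzy_threshold else 0.0
        results.append((entity, value, location, confidence))
    return results
```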
System 200 may include data integrity integration module 226, which may include one or more processors configured to perform one or more data integrity integration operations. Data integrity integration module 226 may receive input data from any one or more of: (a) active vouching module 214 and (b) passive vouching and tracing module 224. Data integrity integration module 226 may apply one or more data integrity integration operations to some or all of the received data and may thereby generate data integrity integration output data. The data integrity integration output data generated by data integrity integration module 226 may be provided in any suitable data format, and may for example include a combined confidence score indicating a confidence level (e.g., a percentage confidence) by which system 200 has determined that the underlying documents vouch for the ERP information. In some embodiments, the data integrity integration output data may comprise a set of tuples (e.g., representing entity, match score, and confidence) for each of the entities that have been analyzed. A decision (e.g., a preliminary decision) on whether the evidence is considered to support the existence and accuracy of a record (e.g., an ERP record) may be rendered as part of the data integrity integration output data. - In some embodiments, the one or more data integrity integration operations applied by
module 226 may process the input data from active vouching module 214 and passive vouching module 224 in accordance with one of the following four scenarios: -
-
Scenario 1—in embodiments in which active vouching module 214 and passive vouching module 224 each confirm an entity, the two confidence values associated with the two vouching methods may be combined with one another (e.g., through averaging and/or through a multiplication operation), including optionally by being used to boost one another, to generate an overall confidence level, or the higher of the two confidence levels may be chosen as the overall confidence level; -
Scenario 2—in embodiments in which active vouching module 214 confirms an entity but passive vouching module 224 does not confirm an entity, the confidence level from active vouching module 214 may be used as an overall confidence level (with or without downward adjustment to reflect the lack of confirmation by passive vouching module 224); -
Scenario 3—in embodiments in which passive vouching module 224 confirms an entity but active vouching module 214 does not confirm an entity, the confidence level from passive vouching module 224 may be used as an overall confidence level (with or without downward adjustment to reflect the lack of confirmation by active vouching module 214); -
Scenario 4—in embodiments in which active vouching module 214 and passive vouching module 224 generate conflicting results, the system may apply one or more operations to reconcile the conflicting results. In some embodiments, integrating results from passive and active vouching may comprise resolving an entity value, e.g., based on confidence level(s) obtained from the passive and active approaches. This resolution may be performed for each individual entity.
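- The four scenarios may be sketched as follows; the strategy names and the multiplicative "boost" formula are illustrative assumptions rather than specified behavior. A value of None denotes a pipeline that did not confirm the entity.

```python
def integrate_confidence(active_conf, passive_conf, strategy="average"):
    """Combine active- and passive-vouching confidences per Scenarios 1-3."""
    if active_conf is not None and passive_conf is not None:
        # Scenario 1: both confirm -- average, take the max, or boost.
        if strategy == "max":
            return max(active_conf, passive_conf)
        if strategy == "boost":
            # Noisy-OR style boost: each source reduces residual doubt.
            return 1.0 - (1.0 - active_conf) * (1.0 - passive_conf)
        return (active_conf + passive_conf) / 2.0
    if active_conf is not None:
        return active_conf   # Scenario 2: active vouching only
    if passive_conf is not None:
        return passive_conf  # Scenario 3: passive vouching only
    return 0.0               # neither pipeline confirmed the entity

def resolve_conflict(active_value, active_conf, passive_value, passive_conf):
    """Scenario 4: conflicting values -- resolve per entity by confidence."""
    if active_conf >= passive_conf:
        return active_value, active_conf
    return passive_value, passive_conf
```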
-
-
FIGS. 3A-3B depict a diagram of how a fingerprinting algorithm may be used as part of a process to render a decision (e.g., a confidence value) about whether a purchase order is vouched, in some embodiments, by the systems disclosed herein. FIGS. 3A-3B depict how two evidence sets may be used to generate an overall result indicating a vouching confidence level. In the example of FIGS. 3A-3B, “evidence set 1” may comprise output data generated by an active vouching algorithm, and may share any one or more characteristics in common with the output data generated by active vouching module 214 in system 200. In the example of FIGS. 3A-3B, “evidence set 2” may comprise output data generated by one or more document processing pipelines, and may share any one or more characteristics in common with the output data generated by pipelines 216, 218, and/or 220 of system 200. In some embodiments, the combination of evidence set 1 and evidence set 2, as shown in FIGS. 3A-3B, to generate a vouching decision and/or a confidence value (as shown, for example, in FIG. 3B), may correspond to operations of any one or more of the modules of system 200. - Fingerprinting is a technique that may leverage ERP data to aid document understanding and vouching. Fingerprinting uses the context from the ERP as a fingerprint guiding how the system searches an unstructured document for evidence of a match. By knowing what PO characteristics to look for from the ERP entry (e.g., the specific PO #, the set of item numbers associated with this PO, the total amount of this PO, etc.), the system may look for that evidence in the attached PO (an unstructured document).
- One advantage of fingerprinting is that it may provide important context that allows an AI algorithm to make better judgments about what it is seeing on a document, such that the system can achieve higher extraction accuracy and match rates. One drawback of fingerprinting is that, if not used carefully, it may introduce bias—e.g., causing the system to see “only what you want to see.” For example, there may be additional attachments (POs, transactions, statements) that bear no relationship to the ERP entry but should nonetheless be carefully reviewed. Thus, in some embodiments fingerprinting should not be used alone, but rather should be combined with other vouching logic and algorithms to ensure accuracy and effectiveness.
- In some embodiments, fingerprinting can include a simple search for an expected value, such as a particular PO number. Because a PO number is highly distinctive, this may work well in most cases, giving the system confidence that if it found PBC2145XC01, it did indeed match on the expected PO number. However, other fields might not be as simple; consider, for example, the field Quantity. Searching for a value of ‘1’ could return a number of matches on a single document and even more across an entire set of documents, giving the system little confidence that it has indeed matched on Quantity. Thus, it is important to include the ability to measure the system's confidence, as well as to design additional algorithms and ML models to help improve confidence and home in on the right match. For example, if the system determines that the Item # and Unit Price for the PO line with that Quantity are located nearby or reside on the same PO line, this gives the match higher confidence and can remove other spurious matches of the value “1”. Confidence in fingerprinting may be refined by combining what is learned from 1) template-based extraction, 2) template-less extraction, and 3) additional ML models and algorithms on top of search findings, to remove spurious matches and increase confidence in matches.
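- The "confidence = 1/number of matches" heuristic described above can be sketched with a plain substring search; an actual embodiment would instead query an Elasticsearch index built over the OCR text. The field names, document text, and return format below are hypothetical.

```python
import re

def fingerprint_search(document_text, field, expected_value):
    """Search a document's OCR text for an expected ERP value and score
    the result as 1 / (number of occurrences): distinctive values such
    as a PO number yield high confidence, while common values such as
    a quantity of '1' yield low confidence."""
    # Escape the expected value so characters like '#' or '.' match literally.
    pattern = re.compile(re.escape(str(expected_value)))
    locations = [m.start() for m in pattern.finditer(document_text)]
    if not locations:
        return {"field": field, "matched": False, "confidence": 0.0, "locations": []}
    return {
        "field": field,
        "matched": True,
        "confidence": 1.0 / len(locations),
        "locations": locations,
    }
```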
-
FIGS. 3A-3B show how various document-understanding components function together with fingerprinting, in accordance with some embodiments. The combination of functions shown in FIGS. 3A-3B may advance overall goals including an increased percentage of vouched entries and increased confidence in vouched entries. -
FIG. 4 shows a diagram of a fingerprinting algorithm, in accordance with some embodiments. - In some embodiments, a fingerprinting algorithm may generate output for PO Headers and/or PO Lines. The algorithm may support exact match (fuzzy=1.0) and fuzzy matches. The algorithm may use Elasticsearch to index OCR text extraction of unstructured documents for search and/or lookup. The algorithm may use entity extraction to identify and normalize dates. The algorithm may use one or more spatial models to identify PO Lines to reduce spurious matches. The algorithm may support derived total amount search. The algorithm may support delivery terms synonyms.
- In some embodiments, the fingerprinting algorithm may include one or more of the following steps, sub-steps, and/or features:
- 1) Prepare the ERP data for search (prepare_master.ipynb).
-
- a) This puts the ERP data in a standard format for searching field content against unstructured documents. If the same format is followed, this approach can be applied to other ERP entries (invoices, shipment tracking numbers, etc.)
- b) This step also computes the total amount from the PO lines; the algorithm will look for this derived total amount while going through the “PO Headers” in Step 6.
2) Perform text extraction of PDFs using Abbyy Finereader FRE. - a) This produces a _basic.XML file that contains all the text blocks.
3) Create concatenated text document from these text blocks
4) Perform entity extraction on text document
5) Index text document into Elasticsearch (text plus entities and some metadata) - a) Incorporate document classification model results so the system knows which ones are POs
- i) Optional whether the system excludes indexing non-POs or marks it in elasticsearch
6) Run fingerprinting search on PO headers
- a) For each field, analyze expected ERP data and generate text value candidates
- i) For example, delivery terms will have a set of synonyms to the one in ERP as search candidates
- ii) For example, date will be normalized to search against the date entities of documents
- b) Issue appropriate query against elasticsearch
- i) Target documents with the same SO
- ii) If non-POs were included, optionally limit to docclass=PO
- c) Evaluate elasticsearch results
- i) Interpret and find fuzzy matches from elasticsearch highlighted text
- ii) Compute fuzzy scores with search candidates
- iii) Match if the fuzzy score is equal to or above the configured threshold
- iv) Compute confidence (1/number of matches)
7) Run fingerprinting search on PO lines
- a) The PO lines search is run separately from the PO headers
- b) Run algorithm to identify PO lines
- i) For each SO,
- (1) From the ERP, find all the item numbers; these are used as anchors
- (2) Find all POs (document classification results) for this SO, and for each document
- (a) Identify locations in the text of all anchor values (i.e., the item numbers)
- (b) Calculate spacing between anchor values (number of word tokens apart)
- (c) Calculate the average of these spacings as the line window width
- (3) With the line window width and the locations of the anchors, the system knows the vicinity of values for a given PO line
- c) Run search for each ERP PO line, limited to the PO line window of text identified in previous step
- i) For each PO line in ERP, look for the line values (e.g., Item #, Unit Price, Quantity, etc.) in the corresponding PO line window
- (1) The window may be defined as: (location of anchor−window size, location of anchor+window size)
- (2) This may be refined with more experiments
- (3) Match if the fuzzy score is equal to or above the configured threshold
- (4) Compute confidence (1/number of matches)
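- Steps 7(b) and 7(c) above, which use the ERP item numbers as anchors and size the search window from the average spacing between consecutive anchors, might be sketched as follows (the fallback width for a single anchor is an assumed default):

```python
def po_line_windows(tokens, anchor_values):
    """Locate PO line windows using ERP item numbers as anchors.

    `tokens` is the word-token list of one PO document and
    `anchor_values` is the set of item numbers for the SO taken from
    the ERP. Returns one (start, end) token window per anchor, sized
    from the average spacing between consecutive anchors."""
    positions = sorted(i for i, tok in enumerate(tokens) if tok in anchor_values)
    if not positions:
        return []
    if len(positions) > 1:
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        window = sum(gaps) // len(gaps)  # average spacing -> line window width
    else:
        window = 10                      # assumed default for a single anchor
    # Window per anchor: (anchor - window, anchor + window), clamped to bounds.
    return [(max(0, p - window), min(len(tokens), p + window)) for p in positions]
```

Line-value searches (Item #, Unit Price, Quantity) would then be restricted to each returned window, removing spurious matches elsewhere in the document.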
- Given the need to perform automated vouching, there is a corresponding need for improved systems and methods for vouching ERP entries against bank statement data in order to verify payment.
- In some embodiments, a system is configured to vouch payment data against evidence data. More specifically, a system may be configured to provide a framework that performs vouching of ERP payment activities against physical bank statements. The system may include a pipeline that performs information extraction and characteristics extraction from bank statements, and the system may leverage one or more advanced data structures and matching algorithms to perform one-to-many matching between ERP data and bank statement data. The payment vouching systems provided herein may thus automate the process of finding material evidence, such as remittance advice or bank statements, to corroborate ERP payment entries.
- The system may be configured to receive a data set comprising bank statement data, wherein the bank statement data may be provided, for example, in the form of PDF files or JPG files of bank statements. The system may apply one or more data processing operations (e.g., AI models) to the received bank statement data in order to extract information (e.g., key content and characteristics) from said data. The extracted information may be stored in any suitable output format, and/or may be used to generate one or more feature vectors representing one or more bank statements in the bank statement data.
- The system may be configured to receive a data set comprising ERP data, wherein the ERP data may comprise one or more ERP entries. The system may apply one or more data processing operations (e.g., AI models) to the received ERP data in order to extract information (e.g., key content and characteristics) from said data. The extracted information may be stored in any suitable output format, and/or may be used to generate one or more feature vectors representing one or more ERP entries in the ERP data.
- The system may be configured to apply one or more algorithms (e.g., matching algorithms) to compare the information extracted from the bank statements against the information extracted from the ERP entries, and to thereby determine whether the bank statements sufficiently vouch the ERP entries. In some embodiments, performing the comparison may comprise applying an approximation algorithm configured to achieve better matching rates between ERP records and bank statements with minor numeric discrepancies, which may be caused, for example, due to currency conversion, rather than being indicative of substantive discrepancies. The system may determine, based on the similarity or dissimilarity of the information indicated by the two information sets, whether one or more vouching criteria are satisfied. The system may generate an output that indicates a level of matching between the bank statements and ERP entries (e.g., a similarity score), an indication of whether one or more vouching criteria (e.g., a threshold similarity score and/or threshold confidence level) are met, an indication of any discrepancies identified, and/or a level of confidence (e.g., a confidence score) in one or more conclusions reached by the system. In some embodiments, output data may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
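- The approximation described above, which tolerates minor numeric discrepancies (e.g., from currency conversion or rounding) instead of treating them as substantive mismatches, might be sketched as follows; the relative and absolute tolerances are illustrative assumptions.

```python
def amounts_match(erp_amount, bank_amount, rel_tol=0.005, abs_tol=0.01):
    """Return (matched, confidence) for an ERP amount vs. a bank amount.

    Exact agreement (within abs_tol) yields full confidence; small
    relative discrepancies still match, with confidence scaled down as
    the discrepancy approaches the relative limit."""
    diff = abs(erp_amount - bank_amount)
    if diff <= abs_tol:
        return True, 1.0
    limit = rel_tol * max(abs(erp_amount), abs(bank_amount))
    if diff <= limit:
        return True, 1.0 - (diff / limit) * 0.5
    return False, 0.0
```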
- In some embodiments, the system may be configured in a modular manner, such that one or more data processing operations may be modified without modification of one or more feature engineering and/or data comparison operations, and vice versa. This may allow the system to be configured and fine-tuned in accordance with changes in business priorities, requested new features, or evolution of legal or regulatory requirements.
-
FIGS. 5A-5B show a diagram of a payment vouching method 500, in accordance with some embodiments. In some embodiments, all or part of the method depicted in FIGS. 5A-5B may be applied by the systems described herein (e.g., system 200). In some embodiments, a payment vouching method may seek to match data representing one or more of the following: date, amount, customer name, and invoice number. As shown in FIG. 5A, the system may accept ERP payment journal data and bank statement data as inputs (optionally following data pre-processing and formatting). The bank statement data may be subject to one or more AI information extraction models to extract information regarding transaction category, customer name, and invoices. The system may then apply a first matching algorithm, for example a fuzzy matching algorithm, to compare the ERP data to the data extracted from the bank statements. If a match is detected, then the system may, among one or more other operations, apply one or more comparison and/or scoring operations in order to generate overall match score data and overall confidence data. If no match is detected, then the system may apply a second matching algorithm, for example an optimization algorithm that has been proposed to solve the Knapsack problem. If no match is detected by the second algorithm, then an overall match score of 0 may be generated. If a match is detected by the second algorithm, then the system may select an optimal subset candidate and may, among one or more other operations, apply one or more comparison and/or scoring operations in order to generate an overall match score and an overall confidence score. A more detailed description follows. - At
block 502, in some embodiments, the system may receive data representing ERP information, for example by receiving data from an ERP payment journal data source. The data representing ERP information may be received automatically, according to a predefined schedule, in response to one or more trigger conditions being met, as part of a scraping method, and/or in response to a user input. The system may receive the ERP data in any acceptable format. In some embodiments, ERP data may be provided in a tabular data format, including a data model that defines the structure of the data. ERP data may be received from “accounts receivable” data or from “cash received” data. ERP data may be in a tabular format including customer name, invoice data, and invoice amount. - At
block 504, in some embodiments, the system may receive data representing one or more bank statements. The data representing the bank statements may be received automatically, according to a predefined schedule, in response to one or more trigger conditions being met, as part of a scraping method, and/or in response to a user input. The system may receive the bank statement data in any acceptable format, for example as a structured and/or unstructured document, including for example a PDF document. In some embodiments, the system may receive bank statement data in PDF format and/or CSV format. In some embodiments, the system may download electronic bank statement data (such as BAI/BAI2, Multicash, MT940). In some embodiments, the system may receive bank statement data via EDI and/or ISO 20022. In some embodiments, the system may receive bank statement data through one or more API aggregators such as Plaid and Yodlee. - At
block 506, in some embodiments, the system may apply one or more information extraction models to the data representing the one or more bank statements. The one or more information extraction models may generate transaction category data 508, customer name data 510, and/or invoice data 512. The extracted information may be stored, displayed to a user, transmitted, and/or used for further processing, for example as disclosed herein. - At
block 514, in some embodiments, the system may apply one or more fuzzy matching algorithms. The one or more fuzzy matching algorithms may accept input data including (but not limited to) data representing ERP information from block 502, transaction category data 508, customer name data 510, and/or invoice data 512. The one or more fuzzy matching algorithms may compare data in a many-to-many manner. The one or more fuzzy matching algorithms may process the received input data in order to determine whether there is a match or a near match (e.g., a “fuzzy match”) between the data representing ERP information and the transaction category data 508, customer name data 510, and/or invoice data 512. The one or more fuzzy matching algorithms may generate data representing an indication as to whether or not a match has been determined. The indication may comprise a binary indication as to whether or not a match has been determined and/or may comprise a confidence score representing a confidence level that a match has been determined. - At
block 516, in some embodiments, the system may determine whether a match was determined at block 514. In some embodiments, the system may reference output data generated by the one or more fuzzy matching algorithms to determine whether a match was determined, for example by referencing whether a match is indicated by the output data on a binary basis. In some embodiments, the system may determine whether a match score generated at block 514 exceeds one or more predetermined or dynamically-determined threshold values in order to determine whether match criteria are met and thus whether a match is determined. In accordance with a determination that a match was determined, method 500 may proceed to blocks 518-538. In accordance with a determination that a match was not determined, method 500 may proceed to block 540 and onward. - Turning first to cases in which it is determined at
block 516 that a match was determined, attention is drawn to block 518. At block 518, the system may determine whether the match that was determined is a one-to-one match. In some embodiments, the system may reference output data generated by the one or more fuzzy matching algorithms to determine whether the match that was determined is a one-to-one match. In accordance with a determination that the match that was determined is a one-to-one match, the method may proceed to one or both of blocks 520 and 524. - At
block 520, in some embodiments, the system may apply a fuzzy comparison algorithm to data representing customer name information. In some embodiments, the system may compare customer name data in the data representing ERP information (received at block 502) to customer name data in the data representing one or more bank statements (received at block 504). The comparison of customer name data may generate output data comprising customer name match score 522, which may indicate an extent to which and/or a confidence with which the compared customer name data matches. - At
block 524, in some embodiments, the system may apply a fuzzy comparison algorithm to data representing invoice information. In some embodiments, the system may compare invoice data in the data representing ERP information (received at block 502) to invoice data in the data representing one or more bank statements (received at block 504). The comparison of invoice data may generate output data comprising invoice match score 526, which may indicate an extent to which and/or a confidence with which the compared invoice data matches. - In some embodiments, the processes represented by
blocks 520 and 524 may be performed in series or in parallel.
- In some embodiments, one or more other component scores, aside from or in addition to a customer name match score and an invoice match score, may be computed.
- In addition to or alternatively to customer
name match score 522 and invoice match score 526, the system may generate data comprising temporal match score 528, for example by performing a fuzzy comparison of date data as shown at block 527. Temporal match score 528 may be computed based on a temporal difference (e.g., a number of days difference) in compared data. For example, the system may compare a date indicated in the data representing ERP information (received at block 502) to a date indicated in the data representing one or more bank statements (received at block 504), and may generate temporal match score 528 based on the difference between the two compared dates. - Following generation of component scores including for example customer
name match score 522, invoice match score 526, and/or temporal match score 528, the system may generate an overall match score and/or an overall confidence score based on the component scores. - At block 532, in some embodiments, the system may compute
overall match score 534. Computation of overall match score 534 may comprise applying an averaging algorithm (e.g., averaging non-zero component scores), for example by computing a weighted or unweighted average of one or more underlying component scores. In some embodiments, overall match score 534 may be computed as the sum of three terms: a weighted fuzzy date comparison score (e.g., weighted temporal match score 528), a weighted fuzzy customer name comparison score (e.g., weighted customer name match score 522), and a weighted fuzzy invoice number comparison score (e.g., weighted invoice match score 526). Computing an additive overall match score 534 may mean that overall match score 534 is higher when it is based on a comparison of more (e.g., all three) underlying terms than when it is not. - At
block 536, in some embodiments, the system may compute overall confidence score 538. Computation of overall confidence score 538 may comprise applying an algorithm based on one or more underlying confidence scores, such as confidence scores associated with one or more of the underlying component scores. In some embodiments, a highest underlying confidence score may be selected as overall confidence score 538. In some embodiments, a lowest underlying confidence score may be selected as overall confidence score 538. In some embodiments, a weighted or unweighted average of underlying confidence scores may be computed as overall confidence score 538. In some embodiments, a product based on underlying confidence scores may be computed as overall confidence score 538. -
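- The score computations of blocks 527, 532, and 536 might be sketched together as follows; the weights, the 30-day date cutoff, and the strategy names are illustrative assumptions rather than values specified by the embodiments.

```python
import math
from datetime import date

def temporal_match_score(erp_date, stmt_date, max_days=30):
    """Fuzzy date comparison (block 527): decay linearly with the number
    of days difference, reaching 0.0 at max_days (assumed cutoff)."""
    delta = abs((erp_date - stmt_date).days)
    return max(0.0, 1.0 - delta / max_days)

def overall_match_score(date_score, name_score, invoice_score,
                        weights=(0.2, 0.4, 0.4)):
    """Additive overall match score (block 532): a weighted sum of the
    component scores, so a score built from all three terms is higher
    than one built from fewer. None marks a component that could not
    be computed, and the weights here are illustrative."""
    components = (date_score, name_score, invoice_score)
    return sum(w * s for w, s in zip(weights, components) if s is not None)

def overall_confidence(component_confidences, strategy="average"):
    """Aggregate component confidences (block 536) by one of the
    strategies described: highest, lowest, average, or product."""
    scores = list(component_confidences)
    if strategy == "highest":
        return max(scores)
    if strategy == "lowest":
        return min(scores)
    if strategy == "product":
        return math.prod(scores)
    return sum(scores) / len(scores)
```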
Overall match score 534 and/or overall confidence score 538 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions. - Turning now to cases in which it is determined at
block 516 that a match was not determined, attention is drawn to block 540. At block 540, in some embodiments, the system may apply one or more amount matching algorithms, for example including one or more optimization algorithms that have been proposed to solve the Knapsack problem. The one or more amount matching algorithms may accept input data including (but not limited to) data representing ERP information from block 502, transaction category data 508, customer name data 510, and/or invoice data 512. The one or more amount matching algorithms may compare data in a one-to-many manner. The one or more amount matching algorithms may compare data from one bank transaction (e.g., data received at block 504) to data for many vouchers (e.g., data received at block 502). The one or more amount matching algorithms may process the received input data in order to determine whether there is a match between the data representing ERP information and the transaction category data 508, customer name data 510, and/or invoice data 512. The one or more amount matching algorithms may generate data representing an indication as to whether or not a match has been determined. The indication may comprise a binary indication as to whether or not a match has been determined and/or may comprise a confidence score representing a confidence level that a match has been determined. - At
block 542, in some embodiments, the system may determine whether a match was determined at block 540. In some embodiments, the system may reference output data generated by the one or more amount matching algorithms to determine whether a match was determined, for example by referencing whether a match is indicated by the output data on a binary basis. In some embodiments, the system may determine whether a match score generated at block 540 exceeds one or more predetermined or dynamically-determined threshold values in order to determine whether match criteria are met and thus whether a match is determined. In accordance with a determination that a match was determined, method 500 may proceed to blocks 544-564. In accordance with a determination that a match was not determined, method 500 may proceed to block 566 and onward. - At
block 544, in some embodiments, the system may select a candidate subset of data from the data received at block 502 and/or the data received at block 504. The analysis performed at blocks 546-564 may be performed with respect to the selected candidate subset of data. In some embodiments, to perform candidate subset selection, the system may identify a set of bank transactions that may be a match, and may then assess each item in the subset to determine which is the best match. In some embodiments, candidate subsets may include different numbers of items. For example, one candidate subset may be “three transactions that may match to a voucher,” while another candidate subset may be “two transactions that may match to a voucher.” - In some embodiments, candidate subset selection may proceed as follows: candidates may be sorted from largest to smallest; then those items in the sorted list that are already larger than the target may be eliminated, and only those which are smaller than or equal to the target amount are retained; then, a total amount from all of the remaining items may be computed, and those that match the target may be identified. In some embodiments, an overall objective may include determining whether the amount C from payment is a match to two or more elements among {A1, A2, A3}. If A1, A2, and A3 have been sorted from largest to smallest, then it may be necessary to test whether
-
C=A1+A2; or -
C=A2+A3; or -
C=A1+A2+A3. - Thus, if A1 is known to be larger than C, then other additive combinations that include A1 may be known to be larger than C, and thus may not need to be tested, and the only remaining possibility that may need to be tested is whether C=A2+A3.
- Based on the selected candidate subset, the system may generate one or more component scores, such as component scores 548, 552, and/or 556 described below.
- At
block 546, in some embodiments, the system may apply one or more subset match score algorithms to the selected candidate subset of data, thereby generating subset match score 548, which may indicate an extent to which and/or a confidence by which two or more components (e.g., data points) of the selected subset match with one another. Block 546 may compare a voucher amount to a bank amount. Block 546 may compare an amount appearing in the data received at block 502 to an amount appearing in the data received at block 504. - At
block 550, in some embodiments, the system may apply one or more fuzzy name comparison algorithms to the selected candidate subset of data, thereby generating customer name match score 552, which may indicate an extent to which and/or a confidence by which two or more customer names in the selected subset match with one another. Block 550 may compare a customer name in voucher data with a customer name in statement data. Block 550 may compare a customer name appearing in the data received at block 502 to a customer name appearing in the data received at block 504. - At
block 554, in some embodiments, the system may apply one or more fuzzy invoice comparison algorithms to the selected candidate subset of data, thereby generating invoice match score 556, which may indicate an extent to which and/or a confidence by which two or more invoices in the selected subset match with one another. Block 554 may compare two instances of invoice data to one another. Block 554 may compare invoice data appearing in the data received at block 502 to invoice data appearing in the data received at block 504. - Following generation of component scores including, for example, subset match score 548, customer
name match score 552, and/or invoice match score 556, the system may generate an overall match score and/or an overall confidence score based on the component scores. - At
block 558, in some embodiments, the system may compute overall match score 560. Computation of overall match score 560 may comprise applying an averaging algorithm (e.g., averaging non-zero component scores), for example by computing a weighted or unweighted average of one or more underlying component scores. - At
block 562, in some embodiments, the system may compute overall confidence score 564. Computation of overall confidence score 564 may comprise applying an algorithm based on one or more underlying confidence scores, such as confidence scores associated with one or more of the underlying component scores. In some embodiments, a highest underlying confidence score may be selected as overall confidence score 564. In some embodiments, a lowest underlying confidence score may be selected as overall confidence score 564. In some embodiments, a weighted or unweighted average of underlying confidence scores may be computed as overall confidence score 564. In some embodiments, a product based on underlying confidence scores may be computed as overall confidence score 564. -
Overall match score 560 and/or overall confidence score 564 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions. - Turning now to cases in which it is determined at
block 542 that a match was not determined, attention is drawn to block 564. At block 564, in some embodiments, the system may determine that an overall match score is 0. The overall match score of 0 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions. - In some embodiments, the system may be configured to apply a plurality of different algorithms (e.g., two different algorithms, three different algorithms, etc.) as part of a payment vouching process. In some embodiments, the algorithms may be applied in parallel. In some embodiments, the algorithms may be applied in series. In some embodiments, the algorithms may be applied selectively dependent on the outcome of one another; for example, the system may first apply one algorithm and then may apply another algorithm selectively dependent on the outcome of the first algorithm (e.g., whether or not a match was indicated by the first algorithm). In some embodiments, the system may be configured to apply a waterfall algorithm, a fuzzy date-amount algorithm, and an optimization algorithm that has been proposed to solve the Knapsack problem.
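The selective, outcome-dependent application of algorithms described above (a waterfall over matching algorithms) might be sketched as follows. The callable interface returning a `(match_score, confidence)` pair or `None` is an assumption made for illustration only.

```python
def waterfall_match(item, algorithms):
    """Apply matching algorithms in series, each one running only if
    every earlier algorithm failed to indicate a match.

    Each algorithm is a callable taking the item and returning a
    (match_score, confidence) pair, or None when no match is indicated.
    """
    for algorithm in algorithms:
        result = algorithm(item)
        if result is not None:
            return result
    # No algorithm indicated a match: overall match score of 0.
    return (0.0, 0.0)

# A first algorithm that never matches falls through to a second one.
score = waterfall_match("payment-123", [lambda _: None, lambda _: (0.8, 0.9)])
```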
-
FIG. 6 illustrates an example of a computer, according to some embodiments. Computer 600 can be a component of a system for providing an AI-augmented auditing platform including techniques for providing AI-explainability for processing data through multiple layers. In some embodiments, computer 600 may execute any one or more of the methods described herein. -
Computer 600 can be a host computer connected to a network. Computer 600 can be a client computer or a server. As shown in FIG. 6, computer 600 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device, such as a phone or tablet. The computer can include, for example, one or more of processor 610, input device 620, output device 630, storage 640, and communication device 660. Input device 620 and output device 630 can correspond to those described above and can either be connectable or integrated with the computer. - Input device 620 can be any suitable device that provides input, such as a touch screen or monitor, keyboard, mouse, or voice-recognition device. Output device 630 can be any suitable device that provides an output, such as a touch screen, monitor, printer, disk drive, or speaker.
- Storage 640 can be any suitable device that provides storage, such as an electrical, magnetic, or optical memory, including a random access memory (RAM), cache, hard drive, CD-ROM drive, tape drive, or removable storage disk. Communication device 660 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or card. The components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly. Storage 640 can be a non-transitory computer-readable storage medium comprising one or more programs, which, when executed by one or more processors, such as
processor 610, cause the one or more processors to execute methods described herein. - Software 650, which can be stored in storage 640 and executed by
processor 610, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the systems, computers, servers, and/or devices as described above). In some embodiments, software 650 can include a combination of servers such as application servers and database servers. - Software 650 can also be stored and/or transported within any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a computer-readable storage medium can be any medium, such as storage 640, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
- Software 650 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a transport medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device. The transport-readable medium can include but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.
-
Computer 600 may be connected to a network, which can be any suitable type of interconnected communication system. The network can implement any suitable communications protocol and can be secured by any suitable security protocol. The network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines. -
Computer 600 can implement any operating system suitable for operating on the network. Software 650 can be written in any suitable programming language, such as C, C++, Java, or Python. In various embodiments, application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example. - Following is a list of enumerated embodiments:
-
-
Embodiment 1. A system for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the system comprising one or more processors configured to cause the system to: - receive data representing an ERP item;
- generate hypothesis data based on the received data representing an ERP item;
- receive an electronic document;
- extract ERP information from the document;
- apply a first set of one or more models to the hypothesis data and to extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item;
- apply a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; and
- generate combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
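A hedged sketch of the flow of Embodiment 1 — two sets of models each producing an indication, combined into a single determination — follows. The `(is_evidence, confidence)` representation and the higher-confidence combination rule are illustrative assumptions; the embodiment does not prescribe a specific combination rule.

```python
def combined_determination(first_output, second_output):
    """Combine the outputs of the first and second model sets.

    Each output is modeled here as an (is_evidence, confidence) pair.
    When the two verdicts agree, keep the shared verdict with the
    higher confidence; when they disagree, keep whichever verdict
    carries the higher confidence.
    """
    (v1, c1), (v2, c2) = first_output, second_output
    if v1 == v2:
        return (v1, max(c1, c2))
    return (v1, c1) if c1 >= c2 else (v2, c2)
```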
-
Embodiment 2. The system of embodiment 1, wherein extracting the ERP information comprises generating first data representing information content of the ERP information and second data representing a document location for the ERP information.
Embodiment 3. The system of any one of embodiments 1-2, wherein the ERP information comprises one or more of: a purchase order number, a customer name, a date, a delivery term, a shipping term, a unit price, and a quantity. -
Embodiment 4. The system of any one of embodiments 1-3, wherein applying the first set of one or more models to generate output data is based on preexisting information regarding spatial relationships amongst instances of ERP information in documents. -
Embodiment 5. The system of embodiment 4, wherein the preexisting information comprises a graph representing spatial relationships amongst instances of ERP information in documents. - Embodiment 6. The system of any one of embodiments 1-5, wherein the one or more processors are configured to cause the system to augment the hypothesis data based on one or more models representing contextual data.
- Embodiment 7. The system of embodiment 6, wherein the contextual data comprises information regarding one or more synonyms for the information content of the ERP information.
- Embodiment 8. The system of any one of embodiments 1-7, wherein the ERP information comprises a single word in the document.
- Embodiment 9. The system of any one of embodiments 1-8, wherein the ERP information comprises a plurality of words in the document.
-
Embodiment 10. The system of any one of embodiments 1-9, wherein the second output data comprises one or more of: - a confidence score indicating a confidence level as to whether the extracted ERP information constitutes vouching evidence for the ERP item;
- a binary indication as to whether the extracted ERP information constitutes vouching evidence for the ERP item; and
- a location within the electronic document corresponding to the determination as to whether the extracted ERP information constitutes vouching evidence for the ERP item.
- Embodiment 11. The system of
embodiment 1, wherein generating the second output data comprises generating a similarity score representing a comparison of the ERP information and the ERP item. -
Embodiment 12. The system of embodiment 11, wherein the similarity score is generated based on an entity graph representing contextual data. - Embodiment 13. The system of any one of embodiments 1-12, wherein extracting the ERP information from the document comprises applying a fingerprinting operation to determine, based on the received data representing an ERP item, a characteristic of a data extraction operation to be applied to the electronic document.
- Embodiment 14. The system of any one of embodiments 1-13, wherein applying the second set of one or more models is based at least in part on contextual data.
- Embodiment 15. The system of any one of embodiments 1-14, wherein applying the second set of one or more models comprises:
- applying a set of document processing pipelines in parallel to generate a plurality of processing pipeline output data;
- applying one or more data normalization operations to the plurality of processing pipeline output data to generate normalized data; and
- generating the second output data based on the normalized data.
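The parallel-pipeline-and-normalization structure of Embodiment 15 might be sketched as follows. The thread-pool execution model and the lowercase/whitespace normalization are illustrative assumptions only; the embodiment does not specify either.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipelines_in_parallel(document, pipelines):
    # Apply each document processing pipeline to the same document
    # concurrently, collecting one output per pipeline.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda pipeline: pipeline(document), pipelines))

def normalize(pipeline_outputs):
    # A stand-in normalization operation: lowercase and collapse
    # whitespace so outputs of different pipelines are comparable.
    return [" ".join(str(out).lower().split()) for out in pipeline_outputs]

# Two toy "pipelines" produce differently cased outputs, which the
# normalization operation makes directly comparable.
outputs = run_pipelines_in_parallel("Invoice 42", [str.upper, str.lower])
normalized = normalize(outputs)
```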
- Embodiment 16. A non-transitory computer-readable storage medium storing instructions for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
- receive data representing an ERP item;
- generate hypothesis data based on the received data representing an ERP item;
- receive an electronic document;
- extract ERP information from the document;
- apply a first set of one or more models to the hypothesis data and to extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item;
- apply a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; and
- generate combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- Embodiment 17. A method for determining whether data within an electronic document constitutes vouching evidence for an enterprise resource planning (ERP) item, wherein the method is performed by a system comprising one or more processors, the method comprising:
- receiving data representing an ERP item;
- generating hypothesis data based on the received data representing an ERP item;
- receiving an electronic document;
- extracting ERP information from the document;
- applying a first set of one or more models to the hypothesis data and to extracted ERP information in order to generate first output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item;
- applying a second set of one or more models to the extracted ERP information in order to generate second output data indicating whether the extracted ERP information constitutes vouching evidence for the ERP item; and
- generating combined determination data, based on the first output data and the second output data, indicating whether the extracted ERP information constitutes vouching evidence for the ERP item.
- Embodiment 18. A system for verifying an assertion against a source document, the system comprising one or more processors configured to cause the system to:
- receive first data indicating an unverified assertion;
- receive second data comprising a plurality of source documents;
- apply one or more extraction models to extract a set of key data from the plurality of source documents; and
- apply one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
-
Embodiment 19. The system of embodiment 18, wherein the one or more extraction models comprise one or more machine learning models. - Embodiment 20. The system of any one of embodiments 18-19, wherein the one or more matching models comprises one or more approximation models.
- Embodiment 21. The system of any one of embodiments 18-20, wherein the one or more matching models are configured to perform one-to-many matching between the first data and the set of key data.
- Embodiment 22. The system of any one of embodiments 18-21, wherein the one or more processors are configured to cause the system to modify one or more of the extraction models without modification of one or more of the matching models.
- Embodiment 23. The system of any one of embodiments 18-22, wherein the one or more processors are configured to cause the system to modify one or more of the matching models without modification of one or more of the extraction models.
- Embodiment 24. The system of any one of embodiments 18-23, wherein the unverified assertion comprises an ERP payment entry.
- Embodiment 25. The system of any one of embodiments 18-24, wherein the plurality of source documents comprises a bank statement.
- Embodiment 26. The system of any one of embodiments 18-25, wherein applying one or more matching models comprises generating a match score and generating a confidence score.
- Embodiment 27. The system of any one of embodiments 18-26, wherein applying one or more matching models comprises: applying a first matching model;
- if a match is indicated by the first matching model, generating a match score and a confidence score based on the first matching model;
- if a match is not indicated by the first matching model:
- applying a second matching model; and
- if a match is indicated by the second matching model, generating a match score and a confidence score based on the second matching model; and
- if a match is not indicated by the second matching model, generating a match score of 0.
-
Embodiment 28. A non-transitory computer-readable storage medium storing instructions for verifying an assertion against a source document, the instructions configured to be executed by a system comprising one or more processors to cause the system to: - receive first data indicating an unverified assertion;
- receive second data comprising a plurality of source documents;
- apply one or more extraction models to extract a set of key data from the plurality of source documents; and
- apply one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
- Embodiment 29. A method for verifying an assertion against a source document, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving first data indicating an unverified assertion;
- receiving second data comprising a plurality of source documents;
- applying one or more extraction models to extract a set of key data from the plurality of source documents; and
- applying one or more matching models to compare the first data to the set of key data to generate an output indicating whether one or more of the plurality of source documents satisfies one or more verification criteria for verifying the unverified assertion.
-
- This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ADJUDICATION OF COMMERCIAL SUBSTANCE, RELATED PARTIES, AND COLLECTABILITY”, filed Jun. 30, 2022, Attorney Docket no. 13574-20069.00.
- This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR APPLYING A COMPOSABLE ASSURANCE INTEGRITY FRAMEWORK”, filed Jun. 30, 2022, Attorney Docket no. 13574-20070.00.
- This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED DOCUMENT PROCESSING”, filed Jun. 30, 2022, Attorney Docket no. 13574-20071.00.
- This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR PROVIDING AI-EXPLAINABILITY FOR PROCESSING DATA THROUGH MULTIPLE LAYERS”, filed Jun. 30, 2022, Attorney Docket no. 13574-20072.00.
Claims (17)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/854,329 US20230005075A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for automated assessment of vouching evidence |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163217134P | 2021-06-30 | 2021-06-30 | |
US202163217123P | 2021-06-30 | 2021-06-30 | |
US202163217119P | 2021-06-30 | 2021-06-30 | |
US202163217127P | 2021-06-30 | 2021-06-30 | |
US202163217131P | 2021-06-30 | 2021-06-30 | |
US17/854,329 US20230005075A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for automated assessment of vouching evidence |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230005075A1 true US20230005075A1 (en) | 2023-01-05 |
Family
ID=84692999
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/854,352 Pending US20230004845A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for providing ai-explainability for processing data through multiple layers |
US17/854,329 Pending US20230005075A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for automated assessment of vouching evidence |
US17/854,348 Pending US20230004604A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for automated document processing |
US17/854,337 Pending US20230004590A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for automated adjudication of commercial substance, related parties, and collectability |
US17/854,338 Pending US20230004888A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for applying a composable assurance integrity framework |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/854,352 Pending US20230004845A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for providing ai-explainability for processing data through multiple layers |
Family Applications After (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/854,348 Pending US20230004604A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for automated document processing |
US17/854,337 Pending US20230004590A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for automated adjudication of commercial substance, related parties, and collectability |
US17/854,338 Pending US20230004888A1 (en) | 2021-06-30 | 2022-06-30 | Ai-augmented auditing platform including techniques for applying a composable assurance integrity framework |
Country Status (5)
Country | Link |
---|---|
US (5) | US20230004845A1 (en) |
EP (5) | EP4364031A1 (en) |
AU (5) | AU2022301115A1 (en) |
CA (5) | CA3225591A1 (en) |
WO (5) | WO2023279037A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116562785A (en) * | 2023-03-17 | 2023-08-08 | 广东铭太信息科技有限公司 | Auditing and welcome system |
US20240012848A1 (en) * | 2022-07-11 | 2024-01-11 | Bank Of America Corporation | Agnostic image digitizer |
CN117422428A (en) * | 2023-12-19 | 2024-01-19 | 尚恰实业有限公司 | Automatic examination and approval method and system for robot based on artificial intelligence |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7242903B2 (en) * | 2019-05-14 | 2023-03-20 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Method and Apparatus for Utterance Source Separation Based on Convolutional Neural Networks |
US11847415B2 (en) * | 2020-09-30 | 2023-12-19 | Astrazeneca Ab | Automated detection of safety signals for pharmacovigilance |
US11798380B2 (en) * | 2021-07-02 | 2023-10-24 | Target Brands, Inc. | Identifying barcode-to-product mismatches using point of sale devices |
JP2023044938A (en) * | 2021-09-21 | 2023-04-03 | 株式会社日立製作所 | Requirement definition support device and requirement definition support method for data analysis |
US20230319098A1 (en) * | 2022-03-31 | 2023-10-05 | Sophos Limited | Methods and apparatus for visualization of machine learning malware detection models |
US11657222B1 (en) * | 2022-11-23 | 2023-05-23 | Intuit Inc. | Confidence calibration using pseudo-accuracy |
US11727702B1 (en) * | 2023-01-17 | 2023-08-15 | VelocityEHS Holdings, Inc. | Automated indexing and extraction of information in digital documents |
KR102526211B1 (en) * | 2023-01-17 | 2023-04-27 | 주식회사 코딧 | The Method And The Computer-Readable Recording Medium To Extract Similar Legal Documents Or Parliamentary Documents For Inputted Legal Documents Or Parliamentary Documents, And The Computing System for Performing That Same |
US20240289720A1 (en) * | 2023-02-27 | 2024-08-29 | Kpmg Llp | System and method for implementing a data analysis substantive procedure for revenue transactions |
WO2024186351A1 (en) * | 2023-03-09 | 2024-09-12 | PwC Product Sales LLC | Method and apparatus for decentralized privacy preserving audit based on zero knowledge proof protocol |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190171944A1 (en) * | 2017-12-06 | 2019-06-06 | Accenture Global Solutions Limited | Integrity evaluation of unstructured processes using artificial intelligence (ai) techniques |
US20200034842A1 (en) * | 2018-07-24 | 2020-01-30 | Accenture Global Solutions Limited | Digital content and transaction management using an artificial intelligence (ai) based communication system |
US20220097228A1 (en) * | 2020-09-28 | 2022-03-31 | Sap Se | Converting Handwritten Diagrams to Robotic Process Automation Bots |
Family Cites Families (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110219035A1 (en) * | 2000-09-25 | 2011-09-08 | Yevgeny Korsunsky | Database security via data flow processing |
US11651039B1 (en) * | 2007-02-06 | 2023-05-16 | Dmitri Soubbotin | System, method, and user interface for a search engine based on multi-document summarization |
US8020763B1 (en) * | 2009-06-30 | 2011-09-20 | Intuit Inc. | Method and system for assessing merchant risk during payment transaction |
US9047283B1 (en) * | 2010-01-29 | 2015-06-02 | Guangsheng Zhang | Automated topic discovery in documents and content categorization |
US9690770B2 (en) * | 2011-05-31 | 2017-06-27 | Oracle International Corporation | Analysis of documents using rules |
US8930295B2 (en) * | 2011-09-12 | 2015-01-06 | Stanley Victor CAMPBELL | Systems and methods for monitoring and analyzing transactions |
US8930267B1 (en) * | 2012-08-27 | 2015-01-06 | Jpmorgan Chase Bank, N.A. | Automated transactions clearing system and method |
US9582475B2 (en) * | 2012-12-17 | 2017-02-28 | Business Objects Software Ltd. | Packaging business intelligence documents with embedded data |
US9053516B2 (en) * | 2013-07-15 | 2015-06-09 | Jeffrey Stempora | Risk assessment using portable devices |
US10289742B2 (en) * | 2013-08-22 | 2019-05-14 | Sensoriant, Inc. | Method and system for addressing the problem of discovering relevant services and applications that are available over the internet or other communications network |
US9465800B2 (en) * | 2013-10-01 | 2016-10-11 | Trunomi Ltd. | Systems and methods for sharing verified identity documents |
US9619481B2 (en) * | 2014-06-17 | 2017-04-11 | Adobe Systems Incorporated | Method and apparatus for generating ordered user expert lists for a shared digital document |
US10332215B2 (en) * | 2015-07-15 | 2019-06-25 | Caseware International Inc. | Method, software, and device for displaying a graph visualizing audit risk data |
US10210518B2 (en) * | 2016-04-13 | 2019-02-19 | Abdullah Abdulaziz I. Alnajem | Risk-link authentication for optimizing decisions of multi-factor authentications |
RU2640297C2 (en) * | 2016-05-17 | 2017-12-27 | Общество с ограниченной ответственностью "Аби Продакшн" | Definition of confidence degrees related to attribute values of information objects |
US9710544B1 (en) * | 2016-05-19 | 2017-07-18 | Quid, Inc. | Pivoting from a graph of semantic similarity of documents to a derivative graph of relationships between entities mentioned in the documents |
EP3523771A4 (en) * | 2016-10-09 | 2020-04-29 | Vatbox, Ltd. | System and method for verifying unstructured enterprise resource planning data |
US20180315141A1 (en) * | 2017-04-26 | 2018-11-01 | Clause, Inc. | System and method for business intelligence through data-driven contract analysis |
US10242257B2 (en) * | 2017-05-18 | 2019-03-26 | Wipro Limited | Methods and devices for extracting text from documents |
WO2019060468A1 (en) * | 2017-09-20 | 2019-03-28 | Conversica, Inc. | Systems and methods for natural language processing and classification |
US20190156428A1 (en) * | 2017-11-20 | 2019-05-23 | Accenture Global Solutions Limited | Transaction reconciliation system |
US10943196B2 (en) * | 2018-07-09 | 2021-03-09 | Accenture Global Solutions Limited | Data reconciliation |
US11315008B2 (en) * | 2018-12-31 | 2022-04-26 | Wipro Limited | Method and system for providing explanation of prediction generated by an artificial neural network model |
US11862305B1 (en) * | 2019-06-05 | 2024-01-02 | Ciitizen, Llc | Systems and methods for analyzing patient health records |
US11620843B2 (en) * | 2019-09-10 | 2023-04-04 | Intuit Inc. | Metamodeling for confidence prediction in machine learning based document extraction |
US11929066B2 (en) * | 2019-10-08 | 2024-03-12 | PwC Product Sales LLC | Intent-based conversational knowledge graph for spoken language understanding system |
US20210142169A1 (en) * | 2019-11-08 | 2021-05-13 | Accenture Global Solutions Limited | Prediction interpretation |
US11568143B2 (en) * | 2019-11-15 | 2023-01-31 | Intuit Inc. | Pre-trained contextual embedding models for named entity recognition and confidence prediction |
US11526725B2 (en) * | 2019-12-06 | 2022-12-13 | Bank Of America Corporation | Attention-based layered neural network architecture for explainable and high-performance AI processing |
US11361532B1 (en) * | 2020-04-30 | 2022-06-14 | Idemia Identity & Security USA LLC | System and method for OCR based object registration |
US11860950B2 (en) * | 2021-03-30 | 2024-01-02 | Sureprep, Llc | Document matching and data extraction |
US11829332B1 (en) * | 2021-06-28 | 2023-11-28 | Amazon Technologies, Inc. | Content importing with discovery in a collaborative environment |
-
2022
- 2022-06-30 EP EP22834412.3A patent/EP4364031A1/en active Pending
- 2022-06-30 WO PCT/US2022/073277 patent/WO2023279037A1/en active Application Filing
- 2022-06-30 US US17/854,352 patent/US20230004845A1/en active Pending
- 2022-06-30 CA CA3225591A patent/CA3225591A1/en active Pending
- 2022-06-30 WO PCT/US2022/073280 patent/WO2023279039A1/en active Application Filing
- 2022-06-30 CA CA3225621A patent/CA3225621A1/en active Pending
- 2022-06-30 AU AU2022301115A patent/AU2022301115A1/en active Pending
- 2022-06-30 US US17/854,329 patent/US20230005075A1/en active Pending
- 2022-06-30 WO PCT/US2022/073292 patent/WO2023279047A1/en active Application Filing
- 2022-06-30 AU AU2022303525A patent/AU2022303525A1/en active Pending
- 2022-06-30 US US17/854,348 patent/US20230004604A1/en active Pending
- 2022-06-30 EP EP22834413.1A patent/EP4364006A1/en active Pending
- 2022-06-30 CA CA3225768A patent/CA3225768A1/en active Pending
- 2022-06-30 EP EP22834414.9A patent/EP4364074A1/en active Pending
- 2022-06-30 AU AU2022301331A patent/AU2022301331A1/en active Pending
- 2022-06-30 CA CA3225999A patent/CA3225999A1/en active Pending
- 2022-06-30 US US17/854,337 patent/US20230004590A1/en active Pending
- 2022-06-30 CA CA3225771A patent/CA3225771A1/en active Pending
- 2022-06-30 AU AU2022305353A patent/AU2022305353A1/en active Pending
- 2022-06-30 EP EP22834419.8A patent/EP4363953A1/en active Pending
- 2022-06-30 EP EP22834417.2A patent/EP4363993A1/en active Pending
- 2022-06-30 WO PCT/US2022/073279 patent/WO2023279038A1/en active Application Filing
- 2022-06-30 WO PCT/US2022/073290 patent/WO2023279045A1/en active Application Filing
- 2022-06-30 AU AU2022305355A patent/AU2022305355A1/en active Pending
- 2022-06-30 US US17/854,338 patent/US20230004888A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190171944A1 (en) * | 2017-12-06 | 2019-06-06 | Accenture Global Solutions Limited | Integrity evaluation of unstructured processes using artificial intelligence (ai) techniques |
US20200034842A1 (en) * | 2018-07-24 | 2020-01-30 | Accenture Global Solutions Limited | Digital content and transaction management using an artificial intelligence (ai) based communication system |
US20220097228A1 (en) * | 2020-09-28 | 2022-03-31 | Sap Se | Converting Handwritten Diagrams to Robotic Process Automation Bots |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20240012848A1 (en) * | 2022-07-11 | 2024-01-11 | Bank Of America Corporation | Agnostic image digitizer |
US11934447B2 (en) * | 2022-07-11 | 2024-03-19 | Bank Of America Corporation | Agnostic image digitizer |
CN116562785A (en) * | 2023-03-17 | 2023-08-08 | 广东铭太信息科技有限公司 | Auditing and welcome system |
CN117422428A (en) * | 2023-12-19 | 2024-01-19 | 尚恰实业有限公司 | Automatic examination and approval method and system for robot based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
US20230004604A1 (en) | 2023-01-05 |
AU2022305355A1 (en) | 2024-01-18 |
WO2023279039A1 (en) | 2023-01-05 |
AU2022301331A1 (en) | 2024-01-18 |
CA3225591A1 (en) | 2023-01-05 |
CA3225771A1 (en) | 2023-01-05 |
US20230004590A1 (en) | 2023-01-05 |
EP4364006A1 (en) | 2024-05-08 |
WO2023279047A1 (en) | 2023-01-05 |
US20230004888A1 (en) | 2023-01-05 |
EP4363993A1 (en) | 2024-05-08 |
WO2023279038A1 (en) | 2023-01-05 |
CA3225621A1 (en) | 2023-01-05 |
CA3225768A1 (en) | 2023-01-05 |
EP4364074A1 (en) | 2024-05-08 |
AU2022305353A1 (en) | 2024-01-18 |
EP4364031A1 (en) | 2024-05-08 |
CA3225999A1 (en) | 2023-01-05 |
WO2023279045A1 (en) | 2023-01-05 |
WO2023279037A1 (en) | 2023-01-05 |
EP4363953A1 (en) | 2024-05-08 |
AU2022303525A1 (en) | 2024-01-18 |
US20230004845A1 (en) | 2023-01-05 |
AU2022301115A1 (en) | 2024-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230005075A1 (en) | Ai-augmented auditing platform including techniques for automated assessment of vouching evidence | |
US11954739B2 (en) | Methods and systems for automatically detecting fraud and compliance issues in expense reports and invoices | |
CN108985912B (en) | Data reconciliation | |
US20230132208A1 (en) | Systems and methods for classifying imbalanced data | |
Zhaokai et al. | Contract analytics in auditing | |
US20140258169A1 (en) | Method and system for automated verification of customer reviews | |
US20220139098A1 (en) | Identification of blocks of associated words in documents with complex structures | |
US20230236890A1 (en) | Apparatus for generating a resource probability model | |
US11694208B2 (en) | Self learning machine learning transaction scores adjustment via normalization thereof accounting for underlying transaction score bases relating to an occurrence of fraud in a transaction | |
Khadivizand et al. | Towards intelligent feature engineering for risk-based customer segmentation in banking | |
US11341354B1 (en) | Using serial machine learning models to extract data from electronic documents | |
CN117882081A (en) | AI enhanced audit platform including techniques for automatically evaluating evidence of a checklist | |
Tornés et al. | Detecting forged receipts with domain-specific ontology-based entities & relations | |
CN116402056A (en) | Document information processing method and device and electronic equipment | |
CN114662457A (en) | Information generation method, device, equipment and computer storage medium | |
US20040193433A1 (en) | Mathematical decomposition of table-structured electronic documents | |
Wei et al. | Using machine learning to detect PII from attributes and supporting activities of information assets | |
US20240330323A1 (en) | Ai-augmented composable and configurable microservices for record linkage and reconciliation | |
US20240331056A1 (en) | Ai-augmented composable and configurable microservices for determining a roll forward amount | |
US20240346418A1 (en) | Method and apparatus to extract client data with context using enterprise knowledge graph framework | |
EP4310755A1 (en) | Self learning machine learning transaction scores adjustment via normalization thereof | |
Ciovică | Graph Convolutional Neural Network for extracting tabular data of purchase order documents | |
Niu et al. | An Audit Risk Model Based on Improved BP Neural Network Data Mining Algorithm | |
WO2024205582A1 (en) | Ai-augmented composable and configurable microservices for record linkage and reconciliation | |
WO2024205581A1 (en) | Ai-augmented composable and configurable microservices for determining a roll forward amount |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
AS | Assignment |
Owner name: PRICEWATERHOUSECOOPERS LLP, NEW YORK; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: LI, CHUNG-SHENG; REEL/FRAME: 064128/0919; Effective date: 20221111 |
AS | Assignment |
Owner name: PWC PRODUCT SALES LLC, NEW YORK; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: PRICEWATERHOUSECOOPERS LLP; REEL/FRAME: 065532/0034; Effective date: 20230630 |
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |