CN117751362A - AI-enhanced audit platform including techniques for applying combinable assurance integrity frameworks - Google Patents


Info

Publication number
CN117751362A
Authority
CN
China
Prior art keywords
data
integrity
evidence
risk
reports
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280053275.4A
Other languages
Chinese (zh)
Inventor
李中生
W·程
M·J·弗拉维尔
L·M·霍尔马克
N·A·利佐特
K·M·梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pwc Product Sales Co ltd
Original Assignee
Pwc Product Sales Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pwc Product Sales Co ltd filed Critical Pwc Product Sales Co ltd
Priority claimed from PCT/US2022/073280 external-priority patent/WO2023279039A1/en
Publication of CN117751362A publication Critical patent/CN117751362A/en

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence is provided. The system receives first data representing a plurality of reports and second data representing corroborative evidence. The system applies one or more integrity analysis models to the first data and the second data to generate an assessment of a risk that one or more of the plurality of reports represents a material misstatement. A system for generating an assessment of the fidelity of data is also provided. The system compares data representing a report with data representing corroborative evidence and generates a similarity metric representing their similarity. Based on the similarity metric, the system generates an output representing an assessment of the fidelity of the first dataset.

Description

AI-enhanced audit platform including techniques for applying combinable assurance integrity frameworks
Cross Reference to Related Applications
This application claims the benefit of U.S. provisional application No. 63/217,119, filed June 30, 2021; U.S. provisional application No. 63/217,123, filed June 30, 2021; U.S. provisional application No. 63/217,127, filed June 30, 2021; U.S. provisional application No. 63/217,131, filed June 30, 2021; and U.S. provisional application No. 63/217,134, filed June 30, 2021, the entire contents of each of which are incorporated herein by reference.
Technical Field
The present invention relates generally to AI-enhanced data processing, and more particularly to data processing systems and methods that employ combinable assurance integrity frameworks and context-aware data integrity frameworks.
Background
Manual auditing is time-consuming and expensive, and is prone to human error and human bias. Furthermore, due to the inherent limitations of manual auditing, sampling methods are used instead of full-population testing. A sampling method attempts to select a representative sample, but cannot guarantee that important information is not missed in the data not selected for review.
Furthermore, according to known techniques for vouching and tracing (which may be performed as part of an audit process), vouching and tracing are performed as two separate processes, each independently using audit samples.
Disclosure of Invention
As explained above, performing audits manually is time-consuming and expensive, and is prone to human error and human bias. Furthermore, due to the inherent limitations of manual auditing, sampling methods are used instead of full-population testing. A sampling method attempts to select a representative sample, but cannot guarantee that important information is not missed in the data not selected for review. Thus, attempts have been made to automate portions of the audit process. However, the introduction of technology into auditing methods has focused mainly on substantive testing, control testing, or risk assessment. Furthermore, in existing audit systems incorporating one or more technologies, the uncertainty introduced by the technology itself is largely ignored or, at best, imprecisely and inaccurately accounted for by attempted manual inspection. In addition, existing audit systems incorporating one or more technologies have focused on narrow approaches to a single specific financial statement line item (FSLI). Thus, there is a lack of a consistent framework for addressing the more than 20 FSLIs. Narrow, per-FSLI audit solutions are difficult to generalize and are almost impossible to scale effectively and economically. Furthermore, the insights provided by existing financial data systems cannot distinguish between fully contextualized transactions and incompletely contextualized transactions.
Furthermore, according to known techniques for vouching and tracing (which may be performed as part of an audit process), vouching and tracing are performed as two separate processes, each independently using audit samples. However, known systems and methods for information integrity do not handle fuzzy comparisons, do not leverage the context of evidence (e.g., host data, industry ontologies, and industry and customer knowledge), do not leverage multiple pieces of evidence to establish data integrity, do not address the challenge that evidence may have been modified or updated, and do not address one-to-many, many-to-one, or many-to-many relationships.
Thus, there is a need for improved systems and methods that address one or more of the above-described drawbacks of known systems for automated auditing. In particular, there is a need for an end-to-end transformation of technology-based auditing methods. There is a need for systems and methods for an AI-enhanced audit platform that provide a combinable assurance integrity framework, the ability to accurately and automatically account for the uncertainty introduced by the technology, applicability to multiple FSLIs in a generic manner, and the ability to distinguish fully contextualized transactions from non-contextualized transactions. The systems and methods described herein may meet one or more of these needs. Disclosed herein are systems and methods configured to review a plurality of FSLIs and determine, based on the reviewed evidence data, whether any FSLI includes a material misstatement. The system may meet one or more of the above needs by providing a combinable framework that can accommodate each of a plurality of FSLIs, giving the system flexibility and adaptability in addressing wide variations in industry and business practices and rapidly changing business environments. The combinable framework provided by the systems described herein may provide a consistent method for tracing activities within the financial operations of an enterprise and for determining the potential materiality of a misstatement in a financial statement.
Further, there is a need for improved systems and methods that address one or more of the above-described drawbacks of known methods for vouching and tracing. Disclosed herein are methods and systems for performing automated (or semi-automated) data processing operations for audit processes, wherein vouching and tracing (e.g., an FSLI audit over multiple documents and ERP records) are performed semi-automatically or fully automatically at the same time, and wherein identification and actual matching of corresponding entries in the ledger to supporting source documents is performed automatically.
In some embodiments, a first system is provided for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence (corroborating evidence), the first system comprising one or more processors configured to cause the first system to: receive a first dataset representing a plurality of reports; receive a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and apply one or more integrity analysis models to the first data set and the second data set to generate an assessment of a risk that one or more of the plurality of reports represents a material misstatement.
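The pipeline described above can be sketched in a few lines. This is an illustrative assumption only: the function names (`process_integrity`, `data_integrity`, `assess_risk`) and the toy balance and citation checks are invented for the sketch; the patent does not fix any particular model.

```python
# Sketch of the first system: apply integrity analysis models to report
# data and corroborative evidence, then combine their outputs into a
# per-report risk-of-misstatement score. All logic here is a toy stand-in.

def process_integrity(reports):
    # Toy check: flag reports whose debit/credit totals do not balance.
    return [abs(r["debits"] - r["credits"]) > 0.01 for r in reports]

def data_integrity(reports, evidence):
    # Toy check: a report is unsupported if no evidence item cites it.
    cited = {e["report_id"] for e in evidence}
    return [r["id"] not in cited for r in reports]

def assess_risk(reports, evidence):
    flags_p = process_integrity(reports)
    flags_d = data_integrity(reports, evidence)
    # Risk per report: fraction of integrity models that raised a flag.
    return [(p + d) / 2.0 for p, d in zip(flags_p, flags_d)]

reports = [
    {"id": 1, "debits": 100.0, "credits": 100.0},
    {"id": 2, "debits": 250.0, "credits": 200.0},
]
evidence = [{"report_id": 1, "doc": "invoice-001"}]
print(assess_risk(reports, evidence))  # → [0.0, 1.0]
```

The second report is flagged by both models (unbalanced and uncorroborated), so it receives the maximum risk score.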
In some embodiments of the first system, applying one or more integrity analysis models includes applying one or more process integrity analysis models to track one or more changes represented by the plurality of reports.
In some embodiments of the first system, applying the one or more integrity analysis models includes applying one or more data integrity analysis models to generate an assessment of the fidelity of information in one or more of the first data set and the second data set to the ground truth that the information represents.
In some embodiments of the first system, applying the one or more integrity analysis models includes applying one or more policy integrity models to generate output data comprising an adjudication according to an assurance knowledge base, wherein the adjudication is based on all or part of one or both of: the plurality of reports and the corroborative evidence.
In some embodiments of the first system, the assurance knowledge base includes data representing one or more of: industry practices of an industry related to one or more of the plurality of reports, historical behavior of one or more parties related to one or more of the plurality of reports, one or more accounting policies, and one or more audit criteria.
In some embodiments of the first system, the assessment of the risk that one or more of the plurality of reports represents a material misstatement is associated with a level selected from: the transaction level, the account level, and the line-item level.
In some embodiments of the first system, the generated assessment of risk is based at least in part on an assessed risk level attributable to one or more automated processes used in generating or processing one or both of the first data set and the second data set.
In some embodiments of the first system, generating the assessment of risk includes performing full-population testing on the first data set and the second data set.
In some embodiments of the first system, performing the full-population testing includes: applying one or more process integrity models based on ERP data included in one or both of the first data set and the second data set; and applying one or more data integrity models based on the corroborative evidence in the second data set.
In some embodiments of the first system, the one or more processors are configured to apply the assessment of risk to configure characteristics of a targeted sampling process.
In some embodiments of the first system, the one or more processors are configured to apply one or more common modules across two or more models selected from: a data integrity model, a process integrity model, and a policy integrity model.
In some embodiments of the first system, the one or more processors are configured to apply an assurance insight model to generate assurance insight data based at least in part on the generated assessment of the risk of material misstatement.
In some embodiments of the first system, the one or more processors are configured to apply an assurance recommendation model to generate recommendation data based at least in part on the assurance insight data.
In some embodiments, a first non-transitory computer-readable storage medium is provided that stores instructions for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first dataset representing a plurality of reports; receive a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and apply one or more integrity analysis models to the first data set and the second data set to generate an assessment of a risk that one or more of the plurality of reports represents a material misstatement.
In some embodiments, a first method is provided for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, wherein the first method is performed by a system comprising one or more processors, the first method comprising: receiving a first dataset representing a plurality of reports; receiving a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and applying one or more integrity analysis models to the first data set and the second data set to generate an assessment of a risk that one or more of the plurality of reports represents a material misstatement.
In some embodiments, a second system is provided for generating an assessment of the fidelity of data, the second system comprising one or more processors configured to cause the second system to: receive a first dataset representing a plurality of reports; receive a second data set comprising a plurality of items of corroborative evidence related to one or more of the plurality of reports; generate a respective report feature vector for each report in the plurality of reports; generate a respective evidence feature vector for each of the plurality of items of corroborative evidence; calculate a similarity metric based on one or more of the report feature vectors and one or more of the evidence feature vectors, the similarity metric representing a level of similarity between a set of one or more of the plurality of reports and a set of one or more of the plurality of items of corroborative evidence; and generate, based on the similarity metric, output data representing an assessment of the fidelity of the first dataset.
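The comparison step of the second system can be illustrated with a minimal sketch. The bag-of-words featurization and the cosine similarity metric are assumptions chosen for illustration; the patent does not specify an encoding or a particular similarity function.

```python
# Minimal sketch: encode a report and an evidence item as feature
# vectors (here, token-count bags of words) and compute a cosine
# similarity metric between them.
import math
from collections import Counter

def featurize(text):
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Dot product over shared tokens divided by the vector magnitudes.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

report = featurize("Invoice 4711 payment 500 USD to Acme Corp")
evidence = featurize("Bank statement payment 500 USD to Acme Corp")
score = cosine_similarity(report, evidence)
print(round(score, 2))  # → 0.75
```

A score near 1.0 indicates that the evidence item closely corroborates the report; in a full system, scores like this would feed the fidelity assessment.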
In some embodiments of the second system, generating the output data representing the assessment of fidelity includes performing a clustering operation on a set of similarity metrics including the similarity metric.
In some embodiments of the second system, generating the respective report feature vectors includes encoding one or more of: content information included in the first data set; context information included in the first data set; and information received from a data source different from the first data set.
In some embodiments of the second system, generating the respective evidence feature vectors includes encoding one or more of: content information included in the second data set; context information included in the second data set; and information received from a data source different from the second data set.
In some embodiments of the second system, the first data set is selected based on one or more data selection criteria for selecting a subset of the data available within the system, wherein the data selection criteria include one or more of: data content criteria and temporal criteria.
In some embodiments of the second system, the second data set includes data representing provenance of one or more of the items of corroborative evidence.
In some embodiments of the second system, the second data set comprises one or more of the following: structured data, semi-structured data, and unstructured data.
In some embodiments of the second system, the second data set includes data representing multiple versions of a single document.
In some embodiments of the second system, generating the similarity metric includes comparing a single report feature vector to a plurality of evidence feature vectors.
In some embodiments of the second system, generating the similarity metric includes applying dynamic programming.
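One way dynamic programming could be applied here is a sequence-alignment scoring of ledger entries against supporting documents. This is an assumed Needleman-Wunsch-style sketch; the patent only states that dynamic programming may be applied, and the scoring scheme below is invented for illustration.

```python
# Align a sequence of ledger entries with a sequence of supporting
# documents using a classic DP alignment table: matches earn +1, and
# gaps (missing or extra documents) are penalized -1.
def align_score(ledger, docs, match=1, gap=-1):
    n, m = len(ledger), len(docs)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if ledger[i - 1] == docs[j - 1] else gap
            dp[i][j] = max(dp[i - 1][j - 1] + pair,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m]

# Fully corroborated sequences score highest; a missing document
# lowers the score.
print(align_score(["inv1", "inv2", "inv3"], ["inv1", "inv2", "inv3"]))  # → 3
print(align_score(["inv1", "inv2", "inv3"], ["inv1", "inv3"]))  # → 1
```

The one-to-many and many-to-one relationships mentioned earlier map naturally onto gaps and repeated matches in such an alignment.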
In some embodiments of the second system, generating the similarity metric includes applying one or more weights, wherein the weights are determined according to one or more machine learning models.
In some embodiments of the second system, generating the output data representing the assessment of fidelity includes generating a confidence score.
In some embodiments of the second system, generating the output data representing the assessment of fidelity includes assessing the sufficiency of fidelity at a plurality of levels.
In some embodiments, a second non-transitory computer-readable storage medium is provided that stores instructions for generating an assessment of the fidelity of data, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first dataset representing a plurality of reports; receive a second data set comprising a plurality of items of corroborative evidence related to one or more of the plurality of reports; generate a respective report feature vector for each report in the plurality of reports; generate a respective evidence feature vector for each of the plurality of items of corroborative evidence; calculate a similarity metric based on one or more of the report feature vectors and one or more of the evidence feature vectors, the similarity metric representing a level of similarity between a set of one or more of the plurality of reports and a set of one or more of the plurality of items of corroborative evidence; and generate, based on the similarity metric, output data representing an assessment of the fidelity of the first dataset.
In some embodiments, a second method is provided for generating an assessment of the fidelity of data, wherein the second method is performed by a system comprising one or more processors, the second method comprising: receiving a first dataset representing a plurality of reports; receiving a second data set comprising a plurality of items of corroborative evidence related to one or more of the plurality of reports; generating a respective report feature vector for each report in the plurality of reports; generating a respective evidence feature vector for each of the plurality of items of corroborative evidence; calculating a similarity metric based on one or more of the report feature vectors and one or more of the evidence feature vectors, the similarity metric representing a level of similarity between a set of one or more of the plurality of reports and a set of one or more of the plurality of items of corroborative evidence; and generating, based on the similarity metric, output data representing an assessment of the fidelity of the first dataset.
In some embodiments, a third system is provided for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, the third system comprising one or more processors configured to cause the system to: receive a first dataset representing a plurality of reports; receive a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and apply one or more integrity analysis models to the first data set and the second data set to generate output data comprising an assessment of risk.
In some embodiments, a third non-transitory computer-readable storage medium is provided that stores instructions for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first dataset representing a plurality of reports; receive a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and apply one or more integrity analysis models to the first data set and the second data set to generate output data comprising an assessment of risk.
In some embodiments, a third method is provided for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, wherein the third method is performed by a system comprising one or more processors, the third method comprising: receiving a first dataset representing a plurality of reports; receiving a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and applying one or more integrity analysis models to the first data set and the second data set to generate output data comprising an assessment of risk.
In some embodiments, any one or more of the features, characteristics, or aspects of any one or more of the above-described systems, methods, or non-transitory computer-readable storage media may be combined with each other, in whole or in part, and/or with any one or more (in whole or in part) of any other embodiment or feature, characteristic, or aspect of the disclosure herein.
Drawings
Various embodiments are described with reference to the accompanying drawings, in which:
Figs. 1A-1B illustrate a system architecture diagram of a system for providing a combinable integrity framework, in accordance with some embodiments.
Fig. 2A-2B depict conceptual architectures of systems for providing a combinable integrity framework, according to some embodiments.
Fig. 3 depicts a graph illustrating the use of a Bayesian belief network to track the probability that an overall assertion is true under uncertainty in inference, in accordance with some embodiments.
Fig. 4 depicts evidential reasoning for revenue and receivables using a Bayesian belief network, in accordance with some embodiments.
Fig. 5 illustrates an example of a computer according to some embodiments.
Detailed Description
Described herein are systems and methods for providing an AI-enhanced audit platform, including a combinable assurance integrity framework that can be adapted for each of a plurality of FSLIs. The output data generated by the system may include an indication of whether one or more FSLIs or other assertions analyzed by the system include a material misstatement. Further described herein are systems and methods for semi- or fully-automatic simultaneous vouching and tracing for data integrity. In some embodiments, any one or more of the data integrity techniques discussed herein may be used as part of a combinable assurance integrity system. The systems and methods described herein may establish the representational faithfulness of financial data, which is useful for determining whether there are any material misstatements, for example, in an FSLI.
Combinable assurance integrity
Manual auditing is time-consuming and expensive, and is prone to human error and human bias. Furthermore, due to the inherent limitations of manual auditing, sampling methods are used instead of full-population testing. A sampling method attempts to select a representative sample, but cannot guarantee that important information is not missed in the data not selected for review.
Thus, attempts have been made to automate portions of the audit process. However, the introduction of technology into auditing methods has focused mainly on substantive testing, control testing, or risk assessment.
Furthermore, in existing audit systems incorporating one or more technologies, the uncertainty introduced by the technology itself is largely ignored or, at best, imprecisely and inaccurately accounted for by attempted manual inspection.
In addition, existing audit systems incorporating one or more technologies have focused on narrow approaches to a single specific Financial Statement Line Item (FSLI). Thus, there is a lack of a consistent framework for addressing the more than 20 FSLIs. Narrow, per-FSLI audit solutions are difficult to generalize and are almost impossible to scale effectively and economically.
Furthermore, the insights provided by existing financial data systems cannot distinguish between fully contextualized transactions and incompletely contextualized transactions.
Accordingly, there is a need for improved systems and methods that address one or more of the above-described disadvantages. In particular, there is a need for an end-to-end transformation of technology-based auditing methods. There is a need for systems and methods for an AI-enhanced audit platform that provide a combinable assurance integrity framework, the ability to accurately and automatically account for the uncertainty introduced by the technology, applicability to multiple FSLIs in a generic manner, and the ability to distinguish fully contextualized transactions from non-contextualized transactions. The systems and methods described herein may meet one or more of these needs.
In some embodiments, a system for providing an AI-enhanced audit platform is provided. The system includes one or more processors configured to receive input data (e.g., documents, financial statements, or other evidence) for auditing from one or more data sources. The system is configured to automatically apply one or more data processing operations to the received data to render one or more evaluations, scores, and/or decisions based on the received data. (Any of the data processing operations referenced herein may include application of one or more models trained by machine learning.) The system may generate output data indicative of one or more results of the data processing operations, and the results may be stored, visualized, or otherwise provided to one or more users, and/or used to trigger one or more automated operations of the system. In some embodiments, the system may be configured to review a plurality of FSLIs and determine, based on the reviewed evidence data, whether any FSLI includes a material misstatement.
The system may meet one or more of the above needs by providing a combinable framework that can accommodate each of a plurality of FSLIs, giving the system flexibility and adaptability in addressing wide variations in industry and business practices and rapidly changing business environments. The combinable framework provided by the systems described herein may provide a consistent method for tracing activities within the financial operations of an enterprise and for determining the potential materiality of a misstatement in a financial statement. As described herein, the system may be configured to begin its analysis from a chart of accounts and may trace recorded activities to their source; this allows the system to determine whether there are any anomalies in the data. Furthermore, at least due to the flexibility and efficiency provided by the combinable framework, the system may be adapted for both sample-based testing and full-population testing.
In some embodiments, the output data generated by the system may include an indication of whether one or more FSLIs or other assertions analyzed by the system include a material misstatement, as determined based at least in part on one or both of: (a) the evidence data processed by the system and (b) the uncertainty introduced by one or more technologies (e.g., OCR) applied by the system during the evaluation process. In some embodiments, the output data may include a classification of the FSLI, e.g., "includes a misstatement," "does not include a misstatement," "includes a material misstatement," or "does not include a material misstatement." In some embodiments, the output data may include a metric that quantifies or scores the system's assessment as to whether the FSLI includes a material misstatement. In some embodiments, the metric may score the degree of materiality of the misstatement. In some embodiments, the metric may score the system's confidence in its conclusion. In some embodiments, the metric may be based on both the determined materiality level of the misstatement and the system's confidence in its conclusion. In some embodiments, a separate output may be provided for each separate FSLI. In some embodiments, a combined or collective output may be provided for a transaction, an account (e.g., comprising multiple transactions), or another audit population as a whole.
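A metric that combines a materiality level with the system's confidence could be as simple as the sketch below. The product rule and threshold are assumptions for illustration; the description only says the metric may be based on both quantities.

```python
# Combine a materiality score and a confidence score (both assumed to
# be normalized to [0, 1]) into one output metric, then classify it.
def misstatement_metric(materiality, confidence):
    # High output only when a large misstatement is indicated with
    # high confidence; either factor near zero suppresses the metric.
    return materiality * confidence

def classify(metric, threshold=0.5):
    if metric >= threshold:
        return "material misstatement"
    return "no material misstatement"

m = misstatement_metric(0.9, 0.8)
print(round(m, 2), classify(m))  # → 0.72 material misstatement
```

Per-FSLI metrics like this could also be aggregated (e.g., by maximum) to produce the collective account-level output the paragraph mentions.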
In some embodiments, the output data may be stored, visualized, or otherwise provided to one or more users, and/or used to trigger one or more automated actions of the system. In some embodiments, the output data may be used to evaluate a risk profile of a transaction population. In some embodiments, the output data may be used as a basis for targeted sampling (e.g., to automatically determine the degree of sampling and/or the manner in which sampling is performed).
In some embodiments, the system may use a combinable integrity framework to trace multiple transactions (or interactions or statements) end-to-end against corresponding evidence data received by the system, in order to establish the risk of material misstatement for each transaction (or interaction or statement). In some embodiments, the system may apply one or more criteria, thresholds, or requirements in making one or more evaluations, such as an evaluation as to whether a transaction was successfully verified. In some embodiments, the system may be configurable, in accordance with one or more user inputs (or other trigger events), to set or adjust the criteria the system requires (e.g., the number of pieces of evidence, the strength of the evidence, the level of matching, and/or the confidence level) in order to generate a particular output (e.g., an indication of successful verification).
In some embodiments, the system may apply one or more data processing operations and/or AI models to evaluate process integrity. Assessing process integrity may include tracking changes within an account, for example by using a chart of accounts, in order to trace each change to its source and identify the activities associated with it.
In some embodiments, the system may apply one or more data processing operations and/or AI models to evaluate data integrity. Assessing data integrity may include assessing the fidelity of information in a digital system with respect to the real-world ground truth that the data is intended to represent.
In some embodiments, the system may apply one or more data processing operations and/or AI models to evaluate policy integrity. Evaluating policy integrity may include adjudicating the evidence data collected according to the process integrity and data integrity processes explained herein, wherein the adjudication is made according to an assurance knowledge base. In some embodiments, the assurance knowledge base includes the following components: (a) information about the enterprise's context, including industry practices, historical behavior, etc., determined using endogenous and/or exogenous information, and (b) one or more accounting policies (e.g., GAAP or IFRS) and/or audit criteria.
In some embodiments, the systems disclosed herein may leverage orchestration to enable reuse and sharing of certain common modules across the process integrity, data integrity, and policy integrity processes, for use with a variety of different kinds of FSLIs.
In some embodiments, one or more of the three integrity assessments (process, data, and policy) may be applied to the full population of available data (as opposed to a limited (e.g., random) sample of available data selected for representative testing). In some embodiments, only the process integrity assessment may be applied to data obtained from an Enterprise Resource Planning (ERP) system or database (ERP data), while the data integrity and policy integrity assessments may not be applied to the ERP data.
In some embodiments, the system may be configured to apply a model that includes an assurance insight layer that develops insights about space, time, space-time, customers, products, and other attributes. These insights may be further developed in the community layer, where the integrity of each transaction is analyzed.
In some embodiments, the system may be configured to apply a model that includes an assurance recommendation layer that generates recommendations based on audit insights and based on data about one or more previous engagements, to be provided to one or more users of the system, such as audit engagement teams or audit clients. In some embodiments, the system may be configured such that one or more automated actions are automatically triggered in accordance with a recommendation generated by the recommendation layer (in some embodiments, after a user inputs approval of the recommendation).
Features and characteristics of some embodiments of a system for providing an AI-augmented audit platform including combinable assurance integrity frameworks are described below with reference to the figures and appendices herein.
Improved systems and methods, such as those disclosed herein, may include performing data-driven and AI-enhanced auditing using ensemble testing.
Fig. 1A-1B illustrate a system architecture diagram of a system 100 for providing a combinable integrity framework, in accordance with some embodiments. As shown in fig. 1A-1B, orchestration engine 102 may be communicatively coupled with process integrity engine 110, data integrity engine 120, and policy integrity engine 140. Each of engines 102, 110, 120, and 140 may include one or more processors (including one or more of the same processors as each other) configured to perform any one or more of the techniques disclosed herein. In some embodiments, engines 110, 120, and/or 140 may be communicatively coupled to each other and/or to orchestration engine 102. In some embodiments, any one or more of the engines of system 100 may be configured to receive user input to control the functionality described herein. In some embodiments, orchestration engine 102 may be configured to coordinate collaborative functionality across engines 110, 120, and/or 140, e.g., coordinating data exchanges between the engines and/or controlling the manner in which output generated by one of the engines may trigger and/or control the functionality of another engine.
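The coordination role described above can be sketched in code. The following is a hypothetical illustration only (the engine names, the `EngineResult` shape, and the trigger wiring are assumptions, not the patent's implementation) of how an orchestration engine might route one engine's output into follow-up actions on another engine:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EngineResult:
    engine: str   # e.g., "process_integrity"
    passed: bool  # whether the engine's criteria were met
    score: float  # degree to which the criteria were met

@dataclass
class Orchestrator:
    # maps an engine name to follow-up actions fired on that engine's output
    triggers: dict = field(default_factory=dict)

    def on_output(self, engine: str, action: Callable[[EngineResult], None]):
        self.triggers.setdefault(engine, []).append(action)

    def publish(self, result: EngineResult):
        for action in self.triggers.get(result.engine, []):
            action(result)

# Example wiring: a failed process-integrity check queues a data-integrity
# vouching run on the same transactions (names are illustrative).
orch = Orchestrator()
queued = []
orch.on_output("process_integrity",
               lambda r: queued.append("data_integrity") if not r.passed else None)
orch.publish(EngineResult("process_integrity", passed=False, score=0.4))
print(queued)  # ['data_integrity']
```

The event-driven wiring keeps the engines decoupled: each engine only publishes its results, and the orchestration layer decides which downstream functionality to trigger.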
The process integrity engine 110 may be configured to perform one or more AI-enhanced reconciliation data processing operations to generate output data related to ERP data for process verification. The data integrity engine 120 may be configured to perform one or more AI-enhanced vouching and tracking data processing operations to verify ERP transaction data against source documents. The policy integrity engine 140 may be configured to perform one or more AI-enhanced arbitration data processing operations to generate (based on one or more accounting criteria) recalculated financial statement data and/or discrepancy and anomaly data.
In some embodiments, the process integrity engine 110 may include an ERP data source 112, a reconciliation engine 114, and an output data store 116. The process integrity engine may be configured to analyze the ERP data to determine whether the data meets one or more criteria defined by the process rule set and/or the process model.
ERP data source 112 may include any one or more computer storage devices, such as databases, data stores, data repositories, live data feeds, and the like. ERP data source 112 may be communicatively coupled to one or more other components of system 100 and/or engine 110 and may be configured to provide ERP data to reconciliation engine 114 such that ERP data may be processed by engine 114 to generate output data representative of one or more process integrity determinations. In some embodiments, one or more components of system 100 and/or engine 110 may receive ERP data from ERP data source 112 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. ERP data received from ERP data source 112 may be provided in any suitable electronic data format.
In some embodiments, ERP data received from ERP data source 112 may include structured, unstructured, and/or partially structured (e.g., semi-structured) data. In some embodiments, ERP data received from ERP data source 112 may include data representing one or more of ledger information, invoice information, receivables information, cash receipt information, and/or inventory information.
In some embodiments, reconciliation engine 114 may include any one or more processors configured to accept ERP data from ERP data source 112 as input data and process the ERP data via one or more data processing operations to generate output data indicating whether the ERP data meets one or more criteria. The one or more criteria may be defined by a user, by a system setting, by a third-party input, by a dynamic determination of the system, and/or by one or more predefined criteria. In some embodiments, the one or more criteria may include criteria related to timing (e.g., time requirements), sequence of events/steps, presence or absence of one or more events/steps, agreement of quantity, agreement of price, and/or agreement of amount. The one or more criteria may require that multiple representations in the entire available ERP data be consistent with one another (e.g., that the ERP data is internally consistent). The one or more criteria may require that events represented in the ERP data (e.g., events in a business process) occur in the correct (e.g., predefined) order relative to one another and/or that there are no missing events in the predefined sequence of expected events. The engine 110 may receive the one or more criteria from any suitable source, such as from user input, customer input, and/or process mining logic.
In some embodiments, reconciliation engine 114 may evaluate one or more criteria by tracking ERP data back through a predefined sequence of events (e.g., moving back through a predefined business process from revenue and tracking back toward payment information).
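As a minimal sketch of the criteria just described (the event names, field names, and criteria chosen are illustrative assumptions, not the patent's actual rule set), a process-integrity check over a transaction's ERP events might verify ordering, completeness, and amount agreement:

```python
# Hypothetical process-integrity check: verify that the events recorded for a
# transaction follow a predefined business-process order with no missing
# steps, and that amounts agree across related ERP records.
EXPECTED_SEQUENCE = ["sales_order", "shipment", "invoice", "cash_receipt"]

def check_process_integrity(erp_events):
    # erp_events: list of dicts sorted by timestamp, each with "step" and "amount"
    steps = [e["step"] for e in erp_events]
    sequence_ok = steps == EXPECTED_SEQUENCE[:len(steps)]  # correct order, no gaps
    complete = steps == EXPECTED_SEQUENCE                  # no missing events
    amounts = {e["amount"] for e in erp_events
               if e["step"] in ("invoice", "cash_receipt")}
    amounts_agree = len(amounts) <= 1                      # invoice matches receipt
    passed = sequence_ok and complete and amounts_agree
    return {"passed": passed, "sequence_ok": sequence_ok,
            "complete": complete, "amounts_agree": amounts_agree}

events = [
    {"step": "sales_order",  "amount": 100.0},
    {"step": "shipment",     "amount": 100.0},
    {"step": "invoice",      "amount": 100.0},
    {"step": "cash_receipt", "amount": 90.0},  # short payment: amounts disagree
]
print(check_process_integrity(events))
```

Note that, consistent with the description of reconciliation engine 114, this check operates entirely on the ERP representations themselves, without reference to any underlying documentary evidence.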
In some embodiments, reconciliation engine 114 may not evaluate whether the ERP data is verified (e.g., vouched for) by underlying documentary evidence; instead, reconciliation engine 114 may evaluate process integrity based entirely on representations made in the ERP data itself. In some embodiments, vouching the assessed ERP data against one or more underlying documents may be performed by other components of the system 100, such as the data integrity engine 120.
The output data generated by reconciliation engine 114 may include electronic data in any suitable format that indicates whether the ERP data provided by ERP data source 112 meets one or more process criteria being evaluated. The output data may indicate whether a criterion is met (e.g., a binary indication), a degree to which the criterion is met (e.g., a score), a confidence level associated with one or more determinations (e.g., a confidence score), and/or metadata indicating the data, criteria, and/or data sources upon which one or more evaluations are based.
The output data generated by reconciliation engine 114 may be stored in output data store 116 or in any other suitable computer storage component of system 100 and/or associated systems. Output data generated by reconciliation engine 114 may be transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions. In some embodiments, the functionality of data integrity engine 120 and/or policy integrity engine 140 may be triggered by output data generated by reconciliation engine 114; this collaborative functionality may be controlled and coordinated by orchestration engine 102. In some embodiments, if the process integrity criteria are not met, the system 100 may responsively determine (e.g., via the data integrity engine 120) whether the ERP information that does not meet one or more of the process integrity criteria can be verified by the underlying documents (or whether, e.g., the ERP data may in fact be inaccurate). In some embodiments, one or more anomalies indicated by the output data generated by reconciliation engine 114 may be transmitted to and/or displayed to a human user, for example, as an alert soliciting a manual review.
In some embodiments, the analysis performed by the process integrity engine 110 may be performed on ERP data for a single transaction and/or ERP data for multiple transactions (e.g., a cluster of transactions).
In some embodiments, the data integrity engine 120 may include an ERP data source 122, a document data source 124, an exogenous data source 126, a document understanding engine 128, a vouching and tracking engine 130, and an output data store 132. The data integrity engine 120 may be configured to analyze the ERP data, the source document data, and/or the exogenous data to perform one or more vouching/tracking operations to determine whether the ERP data meets one or more vouching data integrity criteria.
In some embodiments, ERP data source 122 may include any one or more computer storage devices, such as databases, data stores, data repositories, live data feeds, and the like. The ERP data source 122 may be communicatively coupled to one or more other components of the system 100 and/or engine 120 and may be configured to provide ERP data thereto. In some embodiments, one or more components of system 100 and/or engine 120 may receive ERP data from ERP data source 122 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. ERP data received from ERP data source 122 may be provided in any suitable electronic data format. In some embodiments, ERP data source 122 may share any one or more features in common with ERP data source 112; in some embodiments, ERP data source 122 may include data sources that overlap ERP data source 112; in some embodiments, system 100 may rely on a single ERP data source (or a single set of ERP data sources) in place of separate data sources 122 and 112. In some embodiments, ERP data received from ERP data source 122 may include structured, unstructured, and/or partially structured (e.g., semi-structured) data. In some embodiments, ERP data received from ERP data source 122 may include data representing one or more of sales order information, invoice information, and/or accounts receivable information.
In some embodiments, the document data source 124 may include any one or more computer storage devices, such as databases, data stores, data repositories, live data feeds, and the like. Document data sources 124 may include sources of enterprise content management data. The document data source 124 may be communicatively coupled to one or more other components of the system 100 and/or engine 120 and may be configured to provide document data thereto. In some embodiments, one or more components of system 100 and/or engine 120 may receive document data from document data source 124 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. The document data received from the document data source 124 may be provided in any suitable electronic data format including, for example, a word processing document format, a spreadsheet document format, a CSV document format, a PDF document format, and/or an image document format. In some embodiments, the documents received from the document data source 124 may include one or more of a purchase order document, a bill of lading document, and/or a bank statement document.
In some embodiments, the exogenous/primary data source 126 may include any one or more computer storage devices, such as databases, data stores, data repositories, live data feeds, and the like. The exogenous/primary data source 126 may be communicatively coupled to one or more other components of the system 100 and/or engine 120 and may be configured to provide exogenous data and/or primary data thereto. In some embodiments, one or more components of system 100 and/or engine 120 may receive exogenous data and/or primary data from the exogenous/primary data source 126 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. The exogenous/primary data received from the exogenous/primary data source 126 may be provided in any suitable electronic data format. In some embodiments, the data received from the exogenous/primary data source 126 may include data representing customer information and/or product information.
In some embodiments, the exogenous data from the exogenous/primary data source 126 may include data from a third-party data source and/or a third-party organization external to the particular client. The exogenous data may include public SEC record data (e.g., the EDGAR database), data from public internet resources, and the like. In some embodiments, the primary data from the exogenous/primary data source 126 may include endogenous data from a data source associated with a party that is relevant to the analysis being performed by the system 100 (e.g., from a customer data source). The primary data may include primary customer data, primary vendor data, and/or primary product data.
In some embodiments, the document understanding engine 128 may include any one or more processors configured to accept document data from the document data source 124 and/or exogenous data from the exogenous/primary data source 126 as input data and process the received data via one or more data processing operations to extract output data. The one or more data processing operations may include one or more document preprocessing operations, character recognition operations, information extraction operations, and/or natural language understanding models. In some embodiments, one or more data processing operations applied by the document understanding engine 128 may be defined by a user, defined by a system setting, defined by a third-party input, and/or dynamically determined by the system. The document understanding engine 128 may generate output data representing information extracted from the input documents, and the output data may be transmitted to the vouching and tracking engine 130 for further processing as described below. In some embodiments, the output data generated by the document understanding engine 128 may be in the form of tuples (e.g., indicating entity names, locations, entity values, and confidence levels associated with one or more of the values).
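The tuple-shaped output described above can be illustrated with a toy extractor. This is a hedged sketch only: the regular expressions and the fixed confidence value are stand-ins for the OCR/information-extraction/NLU models the document understanding engine would actually apply, and the field names are assumptions.

```python
import re

def extract_invoice_fields(text):
    # Emit (entity name, location, value, confidence) tuples, where location
    # is the character span of the value within the source document.
    patterns = {
        "invoice_number": r"Invoice\s*#\s*(\w+)",
        "total_amount":   r"Total:\s*\$?([\d,]+\.\d{2})",
    }
    tuples = []
    for entity, pattern in patterns.items():
        m = re.search(pattern, text)
        if m:
            tuples.append((entity, m.span(1), m.group(1), 0.95))
    return tuples

doc = "Invoice # INV1001 ... Total: $1,250.00"
print(extract_invoice_fields(doc))
```

In a real pipeline the confidence value would come from the underlying model rather than being a constant, and the location might be a bounding box on a scanned page rather than a character span.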
The vouching and tracking engine 130 may include any one or more processors configured to accept input data including ERP data and document data and process the input data to determine whether one or more vouching and tracking criteria are met. In some embodiments, evaluating one or more vouching or tracking criteria may include determining whether the ERP data is validated (e.g., vouched for) by the document data. In some embodiments, the vouching and tracking engine 130 may accept input data from the ERP data source 122 and from the document understanding engine 128. The vouching and tracking engine 130 may process the input data via one or more vouching and/or tracking data processing operations to generate output data indicative of whether (or to what extent) one or more vouching and/or tracking criteria are met. In some embodiments, one or more data processing operations applied by the vouching and tracking engine 130 may be defined by a user, defined by a system setting, defined by a third-party input, and/or dynamically determined by the system. The vouching and tracking engine 130 may generate output data including an indication of whether the evaluated criteria are met (e.g., a binary indication), a degree to which the evaluated criteria are met (e.g., a vouching score), an associated confidence score, and/or associated metadata indicating the underlying data on which the output data is based. In some embodiments, the output data generated by the vouching and tracking engine 130 may be in the form of tuples (e.g., indicating entity names, locations, entity values, and confidence levels associated with one or more of the values).
In some embodiments, the vouching and tracking engine 130 may evaluate existence criteria, completeness criteria, and/or accuracy criteria for any one or more assertions (and/or any collection (e.g., cluster) of assertions). The existence criteria may evaluate whether evidence for an assertion is present; the completeness criteria may evaluate whether all necessary evidence and all necessary components related to an assertion are present; and the accuracy criteria may evaluate whether the evidence indicates substantive information content consistent with the assertion. In some embodiments, the vouching and tracking engine 130 may employ one or more vouching and/or tracking operations as described in the U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ASSESSMENT OF VOUCHING EVIDENCE," filed on June 30, 2022 (attorney docket No. 13574-20068.00), the entire contents of which are incorporated herein by reference.
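The three checks above can be sketched as a small function that compares an ERP record against fields extracted from a supporting document. The required-field set, field names, and scoring scheme are illustrative assumptions, not the patent's actual criteria:

```python
# Hypothetical vouching sketch: existence asks whether any documentary
# evidence is present, completeness asks whether all required evidence is
# present, and accuracy asks whether the evidence agrees with the ERP record.
REQUIRED_FIELDS = {"invoice_number", "amount", "customer"}

def vouch(erp_record, doc_fields):
    existence = len(doc_fields) > 0                      # any evidence at all?
    completeness = REQUIRED_FIELDS <= doc_fields.keys()  # all required evidence?
    matched = [k for k in REQUIRED_FIELDS & doc_fields.keys()
               if erp_record.get(k) == doc_fields[k]]
    accuracy = len(matched) / len(REQUIRED_FIELDS)       # agreement with ERP
    return {"existence": existence, "completeness": completeness,
            "vouching_score": accuracy}

erp = {"invoice_number": "INV1001", "amount": 1250.00, "customer": "Acme"}
doc = {"invoice_number": "INV1001", "amount": 1250.00}  # customer field missing
print(vouch(erp, doc))
```

Here the missing customer field fails the completeness check while the two matching fields still contribute a partial vouching score, mirroring the engine's graded (rather than purely binary) output described above.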
The output data generated by the vouching and tracking engine 130 may be stored in an output data store 132 or any other suitable computer storage component of the system 100 and/or associated system. The output data generated by the vouching and tracking engine 130 may be transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automation system actions. In some embodiments, the functionality of the process integrity engine 110 and/or the policy integrity engine 140 may be triggered by output data generated by the vouching and tracking engine 130; this collaboration functionality may be controlled and coordinated by orchestration engine 102. In some embodiments, one or more anomalies indicated by the output data generated by the vouching and tracking engine 130 may be transmitted to and/or displayed to a human user, for example, as an alert soliciting a manual review.
In some embodiments, the analysis performed by the data integrity engine 120 may be performed on data for a single transaction and/or on data for multiple transactions (e.g., clusters of transactions).
In some embodiments, policy integrity engine 140 may include an arbitration engine 142, a criteria data source 144, a revised output data store 146, and an output discrepancy and anomaly data store 148. The policy integrity engine 140 may be configured to analyze the ERP data and/or the source document data to perform one or more policy integrity data processing operations to determine whether the input data meets one or more policy integrity criteria.
The criteria data source 144 may include any one or more computer storage devices, such as a database, data repository, live data feed, and the like. The criterion data source 144 may be communicatively coupled to one or more other components of the system 100 and/or engine 140 and may be configured to provide criterion data thereto. In some embodiments, one or more components of the system 100 and/or engine 140 may receive criterion data from the criterion data source 144 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. The criterion data received from the criterion data source 144 may be provided in any suitable electronic data format, including, for example, one or more structured, unstructured, and/or partially structured documents. In some embodiments, engine 140 may generate a rule set for policy integrity criteria by extracting rules from documents received from criteria data source 144.
The arbitration engine 142 may include any one or more processors configured to accept input data including ERP data, document data, and/or data generated by the process integrity engine 110 and/or the data integrity engine 120, and process the input data to determine whether one or more policy integrity criteria are met. In some embodiments, evaluating the one or more policy integrity criteria may include determining whether the input data indicates that one or more processes represented by the input data meet a timing criterion, an order-of-operations criterion, a disclosure criterion, a related-party criterion, a collectability criterion, an internal consistency criterion, an ownership transfer criterion, a commercial substance criterion, and/or a price (consideration)/payment/collectability criterion. In some embodiments, assessing consideration may include assessing fixed consideration and/or variable consideration.
In some embodiments, the arbitration engine 142 may accept as input data output data generated by the process integrity engine 110 and/or the data integrity engine 120, and may process the received data to perform one or more data processing operations, including "bundling" operations and/or "rolling forward" operations in terms of tracking transactions through business processes. The data generated by the process integrity engine 110 and/or the data integrity engine 120 indicating the differences and/or inconsistencies may be input data to the arbitration engine 142.
In some embodiments, the arbitration engine 142 may accept standard data from the criteria data source 144 as input data.
In some embodiments, the arbitration engine 142 may accept additional input data regarding related transactions, such as in the case of transactions involving multiple shipments, returns/refunds, and/or single payments for multiple transactions. In some embodiments, related transaction data may be required as input according to the accounting and/or auditing principles applied in accordance with criteria received from the criteria data source 144.
In some embodiments, the arbitration engine 142 may be implemented by an inference engine, where rules may be triggered by received inputs (e.g., inputs indicating differences and inconsistencies found by the process integrity engine 110 and/or the data integrity engine 120 and/or outputs indicating additional transaction data).
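The rule-triggered inference style described above can be sketched as a minimal forward-chaining pass. The rule conditions and the findings they emit are purely illustrative assumptions:

```python
# Hypothetical adjudication rules: each rule fires when the discrepancy
# inputs received from the upstream engines satisfy its condition.
RULES = [
    # (condition on the input facts, finding emitted when the rule fires)
    (lambda f: f.get("invoice_payment_gap", 0) > 0, "possible_settlement_discount"),
    (lambda f: f.get("missing_shipment", False),    "revenue_cutoff_exception"),
]

def adjudicate(facts):
    return [finding for condition, finding in RULES if condition(facts)]

print(adjudicate({"invoice_payment_gap": 10.0}))  # ['possible_settlement_discount']
```

A production inference engine would support chaining (findings feeding further rules) and rules extracted from the criteria data source 144, but the trigger-on-input structure is the same.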
In some embodiments, the arbitration engine 142 may consider implicit/explicit variable consideration, which may include various forms of discounts (e.g., discounts captured in the original purchase order and/or invoice, discount rules in pricing, discount rules for customers, implicit discounts not captured elsewhere, and/or discounts for settling transactions when there is a difference between the invoice and the payment). The actual accrued revenue may be the invoice amount adjusted for these various forms of discounts.
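The adjustment just described reduces to simple arithmetic. In this sketch the discount labels and amounts are illustrative assumptions; the point is only that accrued revenue is the invoice amount net of all recognized discount sources:

```python
def adjusted_revenue(invoice_amount, discounts):
    # discounts: list of (label, amount) pairs, e.g. explicit purchase-order
    # discounts, customer-level pricing rules, and implicit settlement gaps
    return invoice_amount - sum(amount for _, amount in discounts)

revenue = adjusted_revenue(1000.00, [
    ("purchase_order_discount", 50.00),
    ("customer_pricing_rule",   25.00),
    ("settlement_difference",   10.00),  # gap between invoice and payment
])
print(revenue)  # 915.0
```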
In some embodiments, the arbitration engine 142 may consider non-cash consideration, which may include in-kind exchange.
In some embodiments, the arbitration engine 142 may evaluate the input data according to a multi-step process. In some embodiments, in step one, the arbitration engine 142 may evaluate whether a contract exists, for example, by evaluating one or more ownership transfer, commercial substance, and/or price-checking criteria. In some embodiments, in step two, the arbitration engine 142 may identify the performance obligations of the contract, including, for example, (distinct) goods, (distinct) services, (distinct) bundles of goods or services, and/or a series of distinct goods or services that are substantially the same and have the same pattern of transfer to the customer. In some embodiments, in step three, the arbitration engine 142 may identify the transaction price of the contract. In some embodiments, in step four, the arbitration engine 142 may allocate the transaction price to the performance obligations. In some embodiments, in step five, revenue may be recognized for each performance obligation that has been satisfied, which may be the last step of the process.
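Steps three through five above can be sketched as a proportional allocation followed by recognition only for satisfied obligations. The use of standalone selling prices as the allocation basis, and the field names, are assumptions for illustration:

```python
def allocate_and_recognize(transaction_price, obligations):
    # obligations: list of dicts with "standalone_price" and "satisfied";
    # allocate the contract's transaction price in proportion to standalone
    # prices, then recognize revenue only for satisfied obligations.
    total = sum(o["standalone_price"] for o in obligations)
    recognized = 0.0
    for o in obligations:
        allocated = transaction_price * o["standalone_price"] / total
        if o["satisfied"]:
            recognized += allocated
    return recognized

obligations = [
    {"name": "hardware", "standalone_price": 800.0, "satisfied": True},
    {"name": "support",  "standalone_price": 200.0, "satisfied": False},
]
# A $900 contract price discounts the $1,000 of standalone prices pro rata.
print(allocate_and_recognize(900.0, obligations))  # 720.0
```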
In some embodiments, the arbitration engine 142 may evaluate whether one or more contracts should be combined together into a single contract.
In some embodiments, the arbitration engine 142 may employ one or more arbitration operations as described in the U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ADJUDICATION OF COMMERCIAL SUBSTANCE, RELATED PARTIES, AND COLLECTABILITY," filed on June 30, 2022 (attorney docket No. 13574-20069.00), the entire contents of which are incorporated herein by reference.
The arbitration engine 142 may process the input data (e.g., ERP data, document data, and/or output data generated by one or both of the engines 110 and 120) via one or more policy integrity data processing operations to generate output data indicating whether (or to what extent) one or more policy integrity criteria are met. In some embodiments, one or more data processing operations applied by the arbitration engine 142 may be defined by a user, defined by a system setting, defined by a third-party input, and/or dynamically determined by the system. In some embodiments, the user may select policy criteria that may include one or more accounting criteria and/or one or more audit criteria. The arbitration engine 142 may generate output data that includes an indication of whether the evaluated criteria are met (e.g., a binary indication), a degree to which the evaluated criteria are met (e.g., a score), an associated confidence score, and/or associated metadata that indicates the underlying data on which the output data is based. In some embodiments, the output data generated by the arbitration engine 142 may be in the form of tuples (e.g., indicating entity names, locations, entity values, and confidence levels associated with one or more of the values).
In some embodiments, the output data generated by the arbitration engine 142 may include a revised version of the document and/or ERP data input into the arbitration engine 142, where the document and/or ERP data is revised to meet one or more policy integrity criteria. In some embodiments, the output data generated by the arbitration engine 142 may include recalculated financial statements. In some embodiments, such output data may be transmitted to the revised output data store 146.
In some embodiments, the output data generated by the arbitration engine 142 may include an indication of one or more discrepancies and/or anomalies, such as an indication of one or more pieces of input data that do not meet one or more policy integrity criteria. The discrepancies and/or anomalies may be transmitted to the discrepancy and anomaly data store 148 for storage and/or may be transmitted to or displayed to a user (e.g., via an alert suggesting a manual review).
The output data generated by the arbitration engine 142 may be stored in the output data store 146 and/or 148, and/or in any other suitable computer storage component of the system 100 and/or associated system. The output data generated by the arbitration engine 142 may be transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automation system actions. In some embodiments, the functionality of the process integrity engine 110 and/or the data integrity engine 120 may be triggered by output data generated by the arbitration engine 142; this collaboration functionality may be controlled and coordinated by orchestration engine 102. In some embodiments, one or more anomalies indicated by the output data generated by the arbitration engine 142 may be transmitted to and/or displayed to a human user, for example, as an alert soliciting a manual review.
In some embodiments, the analysis performed by the policy integrity engine 140 may be performed on data for a single transaction and/or on data for multiple transactions (e.g., clusters of transactions).
In some embodiments, policy integrity criteria data source 144 may include any one or more computer storage devices, such as a database, data store, data repository, live data feed, and the like. Policy integrity criteria data source 144 may be communicatively coupled to one or more other components of system 100 and/or engine 140 and may be configured to provide policy criteria data thereto. In some embodiments, one or more components of the system 100 and/or engine 140 may receive criterion data from the policy integrity criterion data source 144 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Policy criteria data received from the policy integrity criteria data source 144 may be provided in any suitable electronic data format. In some embodiments, the criterion data received from the data source 144 may include structured, unstructured, and/or partially structured (e.g., semi-structured) data.
In some embodiments, the system 100 may provide one or more user-oriented options such that a user of the system may configure the system to customize it for a particular use case. For example, a user may select from available data sources, may select from available criteria, and may configure the manner in which one or more criteria are evaluated. In some embodiments, the user may be able to select whether (and/or to what extent) one or more criteria need to be met. In some embodiments, the user may be able to select which data needs to be bundled and which data does not need to be bundled. The user may be able to configure the system 100 to control what data is evaluated in the data integrity assessment, e.g., control whether all data is evaluated and whether one or more confidence levels below 100% are considered acceptable for the data integrity assessment. A user may be able to configure the system 100 to control what policies (e.g., what criteria) are applied for the purpose of policy integrity assessment.
In some embodiments, the system 100 may allow a user to selectively perform one or more of the following assessments: process integrity, data integrity, and policy integrity. In some embodiments, a portion of system 100 may be applied while other portions are not applied. For example, where ERP data is available but raw document data is not, the system 100 may apply process integrity evaluations and/or policy integrity evaluations without applying any data integrity evaluations.
In some embodiments, the output data generated by engine 110, engine 120, and/or engine 140 may be used to generate an overall risk assessment score. In some embodiments, the output data generated by one or both of the engines 110, 120, or 140 may be sufficient to indicate a sufficiently high risk level such that no evaluation of the remaining engine(s) is applied. In some embodiments, the output data generated by one or both of the engines 110, 120, or 140 may be sufficient to indicate a sufficiently low risk level such that no evaluation of the remaining engine(s) is applied.
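The short-circuiting behavior just described can be sketched as a lazy evaluation over the three engines' risk outputs. The weights, thresholds, averaging scheme, and engine names are illustrative assumptions:

```python
# Hypothetical overall-risk combination: evaluate engines lazily and stop
# early once one engine's result already signals conclusively high or low
# risk, so the remaining assessments are never run.
def overall_risk(assessments, high=0.9, low=0.1):
    # assessments: list of (engine_name, callable returning a risk in [0, 1])
    scores = {}
    for name, run in assessments:
        scores[name] = run()
        if scores[name] >= high or scores[name] <= low:
            break  # risk level already conclusive; skip remaining engines
    combined = sum(scores.values()) / len(scores)
    return combined, scores

risk, detail = overall_risk([
    ("process_integrity", lambda: 0.95),  # conclusively high risk
    ("data_integrity",    lambda: 0.20),  # never evaluated
])
print(risk, detail)  # 0.95 {'process_integrity': 0.95}
```

Passing callables rather than precomputed scores is what makes the skip meaningful: an engine that is never reached incurs none of its (potentially expensive) analysis.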
Fig. 2A-2B depict a conceptual architecture of a system 200 for providing a combinable integrity framework, according to some embodiments. As shown in fig. 2A-2B, system 200 may include a data lake layer 202; a knowledge base (knowledge substrate) layer 208; an integrity micro service layer 210; a normalization, contextualization and integrity verification layer 212; insight into micro-service layer 220; and a recommendation layer 222. Fig. 2A shows the layers at the bottom of the architecture, while fig. 2B shows the layers at the top of the architecture.
In some embodiments, data lake layer 202 may include an endogenous data source 204 and an exogenous data source 206, each of which may include any one or more computer storage devices, such as databases, data stores, data repositories, live data feeds, and the like. The data sources 204 and/or 206 may be communicatively coupled to one or more other components of the system 200 and may be configured to provide data thereto. In some embodiments, one or more components of system 200 may receive data from data sources 204 and/or 206 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. The data received from data sources 204 and/or 206 may be provided in any suitable electronic data format. In some embodiments, the data received from data sources 204 and/or 206 may include structured, unstructured, and/or partially structured (e.g., semi-structured) data. In some embodiments, the endogenous data source 204 may provide internal data directly originating from the party to which it pertains, such as ERP data from that party. In some embodiments, the exogenous data source 206 may provide external data originating from a third-party source other than the party to which the data pertains.
Knowledge base layer 208 may include one or more processors and one or more data stores. Knowledge base layer 208 may include one or more processors configured to receive data from data sources 204 and/or 206 and process the data to generate processed endogenous/exogenous knowledge data, including, for example, master data, ontology/dictionary data, use case library data, planned document data, process knowledge data, and/or meeting/audit criteria data.
The integrity micro service layer 210 may include one or more processors and one or more data stores. The integrity micro service layer 210 may include one or more processors configured to receive data from the data sources 204, 206, and/or the knowledge base layer 208. One or more processors of the micro service layer 210 may apply one or more data processing operations to the received data to generate output data. In some embodiments, the micro service layer 210 may apply one or more micro services including, for example: open source microservices (e.g., OpenCV, Tesseract, NLTK); vendor tools (e.g., ABBYY, Tableau); and/or custom tools (e.g., InfoExct).
Normalization, contextualization, and integrity verification layer 212 may include one or more processors and one or more data stores. The normalization, contextualization, and integrity verification layer 212 may include one or more processors configured to receive input data (e.g., from one or more of the underlying layers 202, 208, and/or 210 in the system 200 and/or from one or more external data sources) and apply one or more integrity assessment data processing models configured to generate output data indicating whether (and/or to what extent) the input data meets one or more integrity criteria. In some embodiments, layer 212 may generate an overall risk score that indicates the risk associated with a transaction (or with a collection of transactions).
In some embodiments, layer 212 may share any one or more features in common with system 100 described above with respect to fig. 1A-1B. In some embodiments, the layer 212 may include a process integrity engine 214 (which may share any one or more features common to the process integrity engine 110 described above with respect to fig. 1A-1B), a data integrity engine 216 (which may share any one or more features common to the data integrity engine 120 described above with respect to fig. 1A-1B), and a policy integrity engine 218 (which may share any one or more features common to the policy integrity engine 140 described above with respect to fig. 1A-1B).
The insight micro service layer 220 can include one or more processors and one or more data stores. The insight micro service layer 220 may include one or more processors configured to receive input data (e.g., from one or more of the underlying layers 202, 208, 210, and/or 212 in the system 200 and/or from one or more external data sources) and apply one or more data processing models to generate the insight data. In some embodiments, insight micro-service layer 220 may apply one or more clustering operations configured to cluster transactions based upon customer, product, time, location, or other suitable clustering criteria. In some embodiments, insight micro-service layer 220 may extract the behavior of groups and/or sub-groups of transactions from layer 212. Transactions may be aggregated based on time, location, amount, product, client, vendor, and/or any other attribute or combination of attributes.
Recommendation layer 222 may include one or more processors and one or more data stores. Recommendation layer 222 may include one or more processors configured to receive input data (e.g., from one or more of the base layers 202, 208, 210, 212, and/or 220 in system 200 and/or from one or more external data sources) and apply one or more data processing models to generate recommendation data. The output generated by the recommendation layer may include one or more remedial actions based on the output from the underlying layers (e.g., 220 and 212). In some embodiments, the recommendation data may include data contained in an alert transmitted to and/or displayed to a human user or an analyst in order to prompt further review.
FIG. 3 depicts a graph showing the use of a Bayesian belief network to track the probability that an overall assertion inferred under uncertainty is true, in accordance with some embodiments. Data analysis according to this network may be applied by one or more data processing engines of the system. Fig. 3 depicts how an overall probability may be determined based on a plurality of underlying probabilities, including, for example, the probability that an existence (valuation) assertion is true, the probability that a cutoff assertion is true, and/or the probability that an accuracy assertion is true.
Fig. 4 depicts evidential reasoning for revenue and receivables using a Bayesian belief network, in accordance with some embodiments. Data analysis according to this network may be applied by one or more data processing engines of the system.
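As a simplified illustration of the kind of probability tracking shown in figs. 3-4, the sketch below combines underlying assertion probabilities under an assumed conditional-independence simplification (a full Bayesian belief network would model dependencies between assertions and condition on evidence). The assertion names and probabilities are illustrative, not values from the disclosure.

```python
# Minimal sketch: overall assertion probability from underlying
# assertion probabilities, assuming conditional independence. This is
# a simplification of a full Bayesian belief network; the values are
# illustrative.

underlying = {
    "existence": 0.98,   # P(existence/valuation assertion is true)
    "cutoff":    0.95,   # P(cutoff assertion is true)
    "accuracy":  0.97,   # P(accuracy assertion is true)
}

overall = 1.0
for assertion, p in underlying.items():
    overall *= p  # joint probability under the independence assumption

print(round(overall, 4))  # probability the overall assertion is true
```

A real network would replace the product with inference over the graph structure, so that, e.g., shared evidence influencing two assertions is not double-counted.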
The financial statements may include the following components:
Balance sheet (or statement of financial position): reports the company's assets, liabilities, and owners' equity at a given point in time.
Income statement (or profit and loss (P&L) statement, statement of comprehensive income, or statement of revenue & expense): reports the company's income, expenses, and profits over a specified period of time. The income statement provides information about the operation of the enterprise, including sales and the various expenses incurred during the period.
Statement of changes in equity (or equity statement, or statement of retained earnings): reports the changes in the company's equity over a specified period of time.
Cash flow statement: reports the company's cash flow activities, in particular its operating, investing, and financing activities, over a specified period of time.
The statement of comprehensive income covers items of other comprehensive income that are not accounted for in determining net profit.
These financial statements contain "line items," which may include the following:
1. Revenue (income)
2. Cost of sales
3. Gross profit
4. Administrative expenses
5. Selling expenses
6. Operating profit
7. Finance costs
8. Profit before tax
These line items may be mapped to different parts of the financial statements. For example, the cash line item maps to the balance sheet and the cash flow statement. Financial statement line items may likewise be mapped to various portions of the chart of accounts. Within the chart of accounts there are balance sheet accounts, which may be required to create the balance sheet:
1. Asset accounts record whatever resources the company owns that provide value to the company. They may be tangible assets, such as land, equipment, and cash, or intangible assets, such as patents, trademarks, and software.
2. Liability accounts record all debts the company owes. Liability account names usually include the term "payable" (accounts payable, bills payable). "Unearned revenue" is another liability account; it typically records cash the company receives before providing a service.
3. Equity accounts are somewhat abstract. They represent what remains of the business after all of the company's liabilities are subtracted from its assets. They essentially measure the value of the company to its owners or shareholders.
In addition, the income statement accounts include the following:
Revenue accounts track any income the business earns through sales of goods, services, or rentals.
Expense accounts record all the money and resources the business spends in generating revenue, e.g., utilities, payroll, and rent.
Use case study: revenue & accounts receivable audit
An audit of a financial statement line item (such as revenue) may require establishing the following assertions:
Occurrence: whether transactions occurred and relate to the entity
Completeness: whether all transactions were recorded
Accuracy: whether transactions were recorded accurately
Cutoff: whether transactions were recorded in the correct accounting period
Classification: whether transactions were recorded in the correct accounts
To audit financial statement line items such as revenue and accounts receivable, substantive tests are performed on revenue and accounts receivable to establish the assertions described above:
Substantive testing of revenue to establish occurrence, accuracy, and valuation:
Vouch recorded sales transactions back to customer orders and shipping documents
Compare billed and shipped amounts to customer orders
Pay particular attention to the cutoff of sales recorded at year end
Scan the sales journal for duplicate entries
Substantive testing of revenue: cutoff tests:
May be performed for sales, sales returns, and cash receipts
Provide evidence as to whether transactions were recorded in the appropriate period
The cutoff period is typically several days before and after the balance sheet date
The extent of cutoff testing depends on the effectiveness of the client's controls
Sales cutoff
The auditor selects a sample of sales recorded during the cutoff period and vouches them to sales invoices and shipping documents to determine whether the sales were recorded in the appropriate period
Cutoff tests address the existence/occurrence and completeness assertions
The auditor may also examine the terms of sales contracts
Sales returns cutoff
The client should document returns of goods using receiving reports
Receiving reports should show the date, description, condition, and quantity of the goods
The auditor selects a sample of receiving reports issued during the cutoff period and determines whether the credits were recorded in the correct period
Substantive testing of revenue completeness:
Analytical procedures and the use of prenumbered documents are important
Cutoff tests
The auditor selects a sample of shipping documents and traces them into the sales journal to test the completeness of the recording of sales
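The completeness test described above (tracing shipping documents into the sales journal) can be sketched as follows; the record shapes and identifiers are hypothetical.

```python
# Hypothetical sketch of a completeness test: trace a sample of
# shipping documents into the sales journal and flag any shipment
# with no recorded sale. Field names are assumptions.

shipping_docs = [
    {"shipment_id": "SH-001", "invoice_no": "INV-100"},
    {"shipment_id": "SH-002", "invoice_no": "INV-101"},
    {"shipment_id": "SH-003", "invoice_no": None},  # shipped, never billed
]
sales_journal = {"INV-100", "INV-101"}  # invoice numbers recorded as sales

unrecorded = [d["shipment_id"] for d in shipping_docs
              if d["invoice_no"] not in sales_journal]
print(unrecorded)  # shipments with no matching sales entry → ['SH-003']
```

Each exception would then be investigated to determine whether a sale was omitted from the records.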
Substantive tests of accounts receivable existence & occurrence:
Valuation
Were sales and accounts receivable initially recorded in the correct amounts?
Will the client collect the recorded receivables in full (i.e., collectibility)?
Rights and obligations
Liabilities associated with the receivables or sales arrangements
Cash deposits against accounts receivable
Presentation and disclosure
Pledged, assigned, or transferred receivables, and related-party receivables
Substantive testing of accounts receivable:
Obtain and evaluate an aging of the accounts receivable accounts
Confirm accounts receivable with customers
Perform cutoff tests
Audit subsequent collections of accounts receivable
Regarding the aging of receivables: because receivables are reported at net realizable value, the auditor must evaluate management's estimate of the allowance for uncollectible accounts:
The auditor will obtain or prepare an aged schedule of receivables
If the schedule is prepared by the client, it is tested for mathematical and aging accuracy
The aging schedule may be used to
Agree the detail to the control account balance
Select customer balances for confirmation
Identify amounts due from related parties for disclosure
Identify overdue balances
The auditor evaluates the estimated uncollectible percentages
The auditor then recalculates the balance in the allowance account
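The recalculation of the allowance account from an aging schedule can be sketched as follows; the bucket boundaries, amounts, and uncollectible percentages are illustrative assumptions, not figures from the disclosure.

```python
# Illustrative recalculation of the allowance for uncollectible
# accounts from an aging schedule. All figures are assumptions.

aging_schedule = {          # bucket -> total receivables in the bucket
    "current":   100_000,
    "1-30":       40_000,
    "31-60":      20_000,
    "61-90":      10_000,
    "over_90":     5_000,
}
loss_rates = {              # auditor-evaluated uncollectible percentages
    "current": 0.01, "1-30": 0.03, "31-60": 0.10,
    "61-90": 0.25, "over_90": 0.50,
}

allowance = sum(amount * loss_rates[bucket]
                for bucket, amount in aging_schedule.items())
print(allowance)  # recalculated balance of the allowance account
```

The recalculated figure would then be compared with management's recorded allowance balance.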
With respect to aged accounts receivable, additional substantive testing may involve confirming accounts receivable with customers:
Confirmations provide reliable external evidence about
The existence of recorded accounts receivable
The completeness of cash collections, sales discounts, and sales returns and allowances
GAAS requires confirmation of receivables unless one of the following is present:
Accounts receivable are immaterial
The use of confirmations would be ineffective
Environmental risk is assessed as low and sufficient evidence can be obtained using other substantive tests
The types of confirmation may include positive confirmations:
Customers are asked to reconcile the amount on the confirmation with their accounting records and respond directly to the auditor, whether or not they agree with the amount
Positive confirmations require a response
If the customer does not respond, the auditor must use alternative procedures
The types of confirmation may include negative confirmations:
A response is required only when the customer disagrees with the balance (non-response is treated as agreement)
Less costly, because if the customer does not respond, no additional procedures need be performed
May be used when all of the following conditions exist
The balances being confirmed consist of a large number of small customer accounts
The environmental risk of receivables is assessed as low
The auditor believes customers will give the confirmations appropriate attention
The types of confirmation may include follow-up procedures for non-responses:
If the customer does not respond to a positive confirmation, the auditor may send a second or even a third request
If the customer still does not respond, the auditor uses alternative procedures
Examine the cash receipts journal for cash collected after year end
Take care to ensure that the receipt is a collection of a year-end receivable, not a subsequent sale
Examine underlying documents (purchase orders, sales invoices, shipping documents) to determine whether the sale occurred before year end
Evidence gathered from internal documents is considered less reliable
Sampling for substantive testing
PCAOB AS 2315 discusses audit sampling as the application of "audit procedures to less than 100 percent of the items within an account balance or class of transactions for the purpose of evaluating some characteristic of the balance or class." Sampling is also one of the reasons that audit results can provide only reasonable assurance, not absolute assurance.
Reasonable assurance is a high, but not absolute, level of assurance regarding material misstatement. Reasonable assurance includes the understanding that there remains some possibility that a material misstatement will not be prevented or detected on a timely basis. To obtain reasonable assurance, the auditor needs to obtain sufficient, appropriate audit evidence to reduce audit risk to an acceptably low level. This means that the use of samples creates some uncertainty, as a material misstatement may be missed. Absolute assurance, by contrast, would guarantee that the financial statements are free of material misstatement.
Absolute assurance is not possible due to factors such as the need for professional judgment, the use of testing, the inherent limitations of internal control, accounting's dependence on estimates, and the fact that audit evidence is generally persuasive rather than conclusive.
A deeper understanding of what reasonable assurance means to auditors comes from recognizing that reasonable assurance is the complement of audit risk: audit risk + assurance level = 100%.
Audit risk is defined in AU Section 312, "Audit Risk and Materiality in Conducting an Audit," as "the risk that the auditor may unknowingly fail to appropriately modify his or her opinion" on financial statements that are materially misstated. Since auditors must limit overall audit risk to a low level, reasonable assurance must be at a high level. In mathematical terms, if audit risk is 5%, then the assurance level is 95%.
In general, audit risk is the product of inherent risk, control risk, and detection risk:
(audit risk) = (risk of material misstatement) × (detection risk)
where:
(risk of material misstatement) = (inherent risk of material misstatement) × (control risk)
The risk of material misstatement, or RMM, is composed of the inherent risk of material misstatement and control risk, where control risk is the risk that the audit client's controls fail to prevent or detect a material misstatement.
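A numeric sketch of the audit risk model above, with illustrative risk values:

```python
# Numeric sketch of the audit risk model:
#   audit risk = inherent risk × control risk × detection risk
#   assurance level = 1 − audit risk
# The values below are illustrative assumptions.

inherent_risk = 0.5   # susceptibility to material misstatement
control_risk = 0.5    # risk that client controls fail to catch it
detection_risk = 0.2  # risk the auditor's procedures miss it

rmm = inherent_risk * control_risk           # risk of material misstatement
audit_risk = rmm * detection_risk
assurance = 1 - audit_risk

print(audit_risk)  # → 0.05
print(assurance)   # → 0.95
```

This reproduces the 5% audit risk / 95% assurance relationship stated above.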
Example embodiment
The exemplary embodiments discussed below are demonstrated using an audit of revenue and receivables. Revenue and receivables capture the income generated through the order-to-cash process. The order-to-cash process includes creation of a sales order, preparation of a shipment (if the order involves a shipment), invoicing the customer, and receipt of payment when the customer pays. This process is repeated for all transactions recorded in the revenue account in the general ledger.
During the order-to-cash process, various information systems may participate in the business process. The sales order is captured in an order management system (which may be part of an ERP system), which triggers warehouse management to prepare the shipment based on the delivery date. Inventory management records the inventory reduction when the product is shipped, and order management invoices the customer (based on the delivery terms). The transaction is posted to a revenue account (credit) and a receivables account (debit) when the customer is invoiced. When payment is received, it is recorded in the receivables account (credit) and the cash account (debit) in the general ledger.
An audit of the revenue account may require tracking transactions through the system in combination with corroborating evidence, to verify the values in the account and ensure that each transaction is properly posted according to accounting policy ASC 606 (IFRS 15). Sales orders are vouched against purchase orders, shipments against picking orders, and payments against various payment details such as bank statements, credit card processor settlement reports, daily ACH reports, etc.
Data integrity
Data integrity is intended to establish the existence (or occurrence), completeness, and accuracy assertions of the audit process. Data integrity includes vouching and tracing:
Vouching refers to examining the documentary evidence that supports and substantiates a transaction. It is the practice of establishing the authenticity of the transactions recorded in the books of prime entry. It includes verifying a transaction recorded in the books against the relevant documentary evidence and the authority on whose basis the entry was made; it also confirms that the amount mentioned in the voucher has been posted to an appropriate account that will disclose the nature of the transaction in the final statements of account. In some embodiments, vouching does not include valuation.
Tracing is the process of following a transaction in the accounting records back to the source document. This may involve locating the item in the ledger, tracing it back through the journal (if necessary) to find a unique identifying document number, and then going to the accounting files to find the source document. Tracing is used to track down transaction errors and to verify that transactions have been recorded properly.
Tracing provides evidence of completeness. Vouching provides evidence of occurrence. Tracing from documents to the financial statements may indicate completeness but not occurrence, because some portion of the total financial statement amount has not yet been examined. Vouching may indicate occurrence but not completeness, because a source document may be missing (e.g., if it was never included in the financial statements).
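The directional difference between tracing and vouching can be sketched with two hypothetical record sets:

```python
# Minimal sketch, with hypothetical record sets: tracing (source
# documents -> ledger) tests completeness; vouching (ledger -> source
# documents) tests occurrence.

source_docs = {"PO-1", "PO-2", "PO-3"}  # documentary evidence on hand
ledger = {"PO-1", "PO-2", "PO-4"}       # transactions recorded in the books

unrecorded = source_docs - ledger   # tracing: evidence with no ledger entry
unsupported = ledger - source_docs  # vouching: ledger entry with no evidence

print(sorted(unrecorded))   # completeness exceptions → ['PO-3']
print(sorted(unsupported))  # occurrence exceptions → ['PO-4']
```

Each direction surfaces a different kind of exception, which is why the two procedures support different assertions.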
In some embodiments, documentary evidence takes the form of a document, whether a PDF document, a Word document, an Excel spreadsheet, or an email. Evidence provided by a third party (such as a bank, carrier, or customer of the entity) may be better than evidence generated directly by the audited entity. In some embodiments, evidence provided in digital form by a third party (such as data that may be obtained directly through an API or portal site) provides the strongest evidence. Evidence provided in structured or semi-structured form that requires no further interpretation (such as EDI) may also provide accurate evidence when available. Documents in Excel, Word, or email form may require natural language processing to understand, while scanned documents may additionally require OCR to extract characters, words, entities, paragraphs, and tables from the document.
In some embodiments, the following data integrity verification may be performed for each of the following categories of FSLI:
Revenue and accounts receivable: evidence may include one or more of the following: purchase orders; various forms of shipment confirmation (e.g., bill of lading, proof of delivery, packing slip, packing list, shipment confirmation from a third party such as shippo.com); various forms of payment details (e.g., cash receipts, bank statements, eChecks, remittance advices, ACH reports, information from a third party such as plaid.com, transaction and settlement reports for credit cards); and/or various forms of contracts. Note that some of the documentary evidence may take the form of EDI messages.
Expense and accounts payable: evidence may include one or more of the following: invoice, proof of delivery or receipt of goods, payment details, and/or various forms of contracts. Note that some of the document evidence may take the form of EDI messages.
JE: evidence may include one or more of the following: various supporting documents for JE entries, such as invoices, cash receipts, Excel, Word, PDF, and email files, and/or various electronic proofs. Cash and bank reconciliations may also involve bank statements to confirm assertions for the cash accounts within the G/L's chart of accounts.
Cash and cash equivalents: evidence may include one or more of the following: bank statements and/or lockbox cash management daily reports.
Property, plant, and equipment (including lease accounting): evidence related to capital assets includes lease agreements, evidence (including images and video) supporting physical custody of the asset, maintenance receipts, and various documents supporting depreciation calculations.
Inventory: evidence may include proof of physical custody, such as images and/or video, as well as shipping details that corroborate the movement of inventory.
It should be noted that in some embodiments, the same set of documents may be used for data integrity verification of multiple FSLIs. For example, information in shipping documents may be used for both the revenue and receivables FSLI and the inventory FSLI.
Process integrity
Process integrity may evaluate the consistency of each step of a process, in both its business process aspects and its accounting process aspects. In some embodiments, process integrity verification may be performed for each of the following categories of FSLI as follows:
Revenue and accounts receivable: includes verification of sales order to invoice, invoice to inventory relief, invoice to revenue G/L, invoice to receivables, invoice to customer transactions, payment journal to receivables, credit voucher to inventory returns, credit voucher to receivables, and/or credit voucher to revenue.
Expense and accounts payable: includes verification of purchase requisition to accounts payable, purchase requisition to expense, payment journal to accounts payable, financing to cash, purchase requisition to inventory additions, and/or various return processes.
JE: includes validating business processes involving the creation and adjustment of journal entries, including flows from revenue (invoice to receivables, invoice to cash accounts in the G/L), expense, equity, and/or liability.
Cash and cash equivalents: includes validating business processes involving cash and cash equivalents on the balance sheet, including payment journal to cash and financing to cash.
Property, plant, and equipment (including lease accounting): includes business processes involving the setup, operation, maintenance, and/or disposal of PP&E.
Inventory: related business processes involving the inventory ledger include inventory relief and/or inventory returns.
It should be noted that in some embodiments, many business processes are involved in more than one FSLI audit. For example, the payment journal to cash process exists in revenue and receivables, JE, and cash and cash equivalents.
Policy integrity
In some embodiments, policy integrity verification may be performed for each of the following categories of FSLI as follows:
Revenue and accounts receivable: the relevant accounting standard includes ASC 606 (IFRS 15), "Revenue from Contracts with Customers."
Expense and accounts payable: relevant accounting standards include ASC 705, Cost of Sales and Services. There are separate accounting standards for compensation (ASC 710, ASC 712, ASC 715, and ASC 718), research and development (ASC 730), and income taxes (ASC 740).
JE: relevant accounting standards include ASC 210 (balance sheet), ASC 220 (comprehensive income), ASC 225 (income statement), and ASC 230 (cash flow statement).
Cash and cash equivalents: the relevant accounting standard includes ASC 210, balance sheet (originally ASC 305).
Property, plant, and equipment (including lease accounting): relevant accounting standards include ASC 842 (IFRS 16), which replaced ASC 840 in early 2019.
Inventory: the relevant accounting criteria include ASC 330.
Orchestration
As shown in fig. 3, an orchestration engine may be used to orchestrate the underlying modules within the data integrity system (e.g., vouching and tracing of purchase orders and bank statements), the process integrity system (e.g., validating the order-to-cash process), and the policy integrity system (e.g., modules related to arbitrating revenue recognition based on ASC 606). The orchestration engine may be configured to take account of dependencies between these integrity verification systems, as policy integrity may depend on the results of data integrity and process integrity. In some embodiments, data and process integrity may run largely concurrently, as in some embodiments they may not have dependencies on each other. Within each integrity module, the orchestration engine may exploit maximum concurrency between modules.
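A minimal sketch of the dependency-aware orchestration described above: data integrity and process integrity run concurrently (no mutual dependency), and policy integrity runs only once both have completed. The module functions are illustrative stand-ins, not the disclosed implementation.

```python
# Hypothetical orchestration sketch: run independent integrity
# evaluations concurrently, then run the dependent one. The module
# functions below are illustrative stand-ins.

from concurrent.futures import ThreadPoolExecutor

def data_integrity():    return {"data_ok": True}
def process_integrity(): return {"process_ok": True}

def policy_integrity(data_result, process_result):
    # policy integrity consumes the upstream results
    return {"policy_ok": data_result["data_ok"] and process_result["process_ok"]}

with ThreadPoolExecutor() as pool:
    data_future = pool.submit(data_integrity)       # runs concurrently
    process_future = pool.submit(process_integrity)  # runs concurrently
    result = policy_integrity(data_future.result(), process_future.result())

print(result)  # → {'policy_ok': True}
```

A production orchestrator would additionally fan out concurrently across the modules within each integrity system.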
The following includes a description of the features and characteristics of the various modules.
Data integrity modules
Invoice vouching: This module performs symmetric vouching and tracing between invoice data in the ERP system and invoice data extracted (e.g., using ABBYY FlexiCapture) from the physical invoices after post-processing is performed. Post-processing may involve normalizing the customer name, customer address, line item number, line item description, and customer item number using the master data. The identity of an ERP data entry and an extracted document entry may be determined by the invoice number, and a fuzzy comparison may be performed on a configurable list of columns given for comparison.
Purchase order vouching: This module performs symmetric vouching and tracing between sales orders in the ERP system and purchase order data extracted from the physical POs after post-processing is performed (e.g., using a template-based method such as ABBYY FlexiCapture or a template-less method). Post-processing may involve normalizing the customer name, customer address, line item number, line item description, and customer item number using the master data. The identity of an ERP data entry and an extracted document entry may be determined by the PO number, and fuzzy comparisons may be performed on a configurable list of columns given for comparison.
Bill of lading vouching: This module performs symmetric vouching and tracing between invoice data in the customer's ERP system and bill of lading form data extracted from the physical bills of lading (e.g., using ABBYY FlexiCapture), including packing lists and/or BoL forms, after post-processing is performed. The identity of an ERP data entry and an extracted document entry is determined by the sales order or invoice number, and fuzzy comparisons are performed on a configurable list of columns given for comparison.
Third-party shipping record vouching: Using a multi-carrier shipment tracking API (e.g., Shippo), this module can verify the acceptance date, delivery date, ship-from address, and/or ship-to address of a given shipment.
Payment vouching: The cash receipt vouching module compares ERP payment journal entries with proof of payment in various supporting documents, such as bank statements, eChecks, remittance advices, daily ACH reports, and/or credit card settlement reports. One or more of two different algorithms may be used to attempt to match the journal voucher data with the bank statement data. The first algorithm is the "fuzzy date + amount" algorithm. Under this first algorithm, journal vouchers and bank statement entries are matched by considering their dates and amounts; for dates, a particular window of days (+/- delta_days_window) may be allowed to match, as there may be a slight difference between the date recorded on the bank statement and the date recorded on the journal entry. The second algorithm is the "knapsack amount matching" algorithm. Under this second algorithm, a single bank transaction may be mapped to multiple journal vouchers, as in the case of deposits such as counter deposits or one-time deposits. Knapsack matching considers sets of journal vouchers that match a single bank statement transaction, and several possible sets of journal vouchers that sum to the amount of the bank statement transaction may be returned. To choose the best among the several possible sets, the system may select the set of journal vouchers with the highest matching score, where the matching score may be based on a fuzzy comparison of the customer name and the referenced invoice amount.
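The two matching algorithms described above can be sketched as follows; the record shapes, day-window value, and example amounts are assumptions, and the brute-force subset search stands in for a production knapsack implementation.

```python
# Sketch of the two payment-matching algorithms: "fuzzy date + amount"
# allows a +/- day window; "knapsack" searches for sets of journal
# vouchers summing to one bank statement amount. All data is illustrative.

from datetime import date
from itertools import combinations

DELTA_DAYS_WINDOW = 3  # assumed +/- day tolerance

def fuzzy_date_amount(voucher, bank_txn):
    return (voucher["amount"] == bank_txn["amount"]
            and abs((voucher["date"] - bank_txn["date"]).days) <= DELTA_DAYS_WINDOW)

def knapsack_match(vouchers, bank_txn):
    """Return every subset of vouchers whose amounts sum to the bank
    transaction amount (candidate groups for e.g. a counter deposit)."""
    matches = []
    for r in range(1, len(vouchers) + 1):
        for combo in combinations(vouchers, r):
            if sum(v["amount"] for v in combo) == bank_txn["amount"]:
                matches.append(combo)
    return matches

vouchers = [{"id": "JV-1", "amount": 100, "date": date(2022, 6, 1)},
            {"id": "JV-2", "amount": 250, "date": date(2022, 6, 2)},
            {"id": "JV-3", "amount": 150, "date": date(2022, 6, 2)}]
deposit = {"amount": 250, "date": date(2022, 6, 3)}

print(fuzzy_date_amount(vouchers[1], deposit))  # → True
print([[v["id"] for v in m] for m in knapsack_match(vouchers, deposit)])
# → [['JV-2'], ['JV-1', 'JV-3']]
```

When several candidate sets are returned, as here, the described matching score (fuzzy comparison of customer name and referenced invoice amount) would break the tie.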
Process integrity modules
Sales order to invoice: This module examines the correspondence between sales orders and invoices to ensure that customer information (e.g., name, billing & shipping address, item number, description, and/or unit price of line items) is consistent. This module also helps to verify partially invoiced sales orders.
Invoice to customer transactions: This module examines the correspondence between sales in the customer transaction table (filtered for sales) and the sales invoice header table. The system may check whether each invoice number (e.g., primary key) from the customer transactions is present in the sales invoice headers, and vice versa.
Payment journal to customer transactions: This module checks transactions in the PaymentJournal table against payments in the customer transactions. The system may perform the check at the TransactionID level. The system may fuzzy-check the amount, customer account number, date, and/or currency. Moreover, the system may assign a reason code to any transaction missing from one of the tables, to differences between columns, to information loss during aggregation from the InvoiceNumber level to the TransactionID level (invoices paid across multiple days), and/or to discrepancies in the relationship between the PaymentJournal and the customer transactions.
Accounts receivable roll forward (Accounts Receivable Roll Forward): this module is designed to take the beginning balance and reconcile it, by examining the accuracy of the accounts receivable activity during the current period, to derive the ending balance. The module starts with the LedgerTransactionList table and filters by COA number and financial period to identify journal entries of interest. Subsequently, a left join is performed against the general-ledger AR-posted and AR-removed tables using the VoucherNumbers in the identified entries to obtain voucher header information and invoice-level information. Note that AR is posted when a customer is invoiced and is removed when payment is received. For each entry, the type of receivables activity, the original invoice amount, the recalculated invoice amount, and/or a match metric are identified and calculated.
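The roll-forward logic can be sketched as below, under assumed conventions: a hypothetical AR account number, and positive amounts for postings (invoices) versus negative amounts for removals (payments). The real module joins voucher header and invoice-level tables as well.

```python
def ar_roll_forward(beginning_balance, ledger_rows, ar_coa="1200", period="2022"):
    """Filter ledger entries to the AR account and period, classify each as a
    posting (invoice) or a removal (payment), and derive the ending balance."""
    posted = removed = 0.0
    activity = []
    for row in ledger_rows:
        if row["coa"] != ar_coa or row["period"] != period:
            continue  # keep only AR entries in the period of interest
        if row["amount"] >= 0:
            posted += row["amount"]
            activity.append((row["voucher"], "posted", row["amount"]))
        else:
            removed += -row["amount"]
            activity.append((row["voucher"], "removed", row["amount"]))
    return {"ending_balance": beginning_balance + posted - removed,
            "activity": activity}
```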
AR removal extension: this module may verify that AR-removal entries correspond to received payment vouchers and may link them to the corresponding invoices.
Inventory returns: this module ensures that items related to credit vouchers are properly added back to inventory on the financial statements if they should be (e.g., by verifying that they are not scrapped or sent back to the customer). The accounting events that occur upon a return include a debit to inventory and a credit to cost of goods sold (COGS), while the second entry includes a debit to revenue and a credit to AR. Thus, in some embodiments, for each event in which an item is credited and not scrapped, there should be an event that adds the item back to the inventory ledger. The module attempts to determine whether the item numbers, units of measure, and/or quantities of each credit voucher agree with the inventory ledger, while also ensuring that the dates of the two events fall within the same accounting year. Each credit voucher has a unique identifier that is also present in the inventory ledger (e.g., a voucher number), and this is used to identify the presence of records in both tables. Each credit voucher is assigned a binary score of 1 or 0 based on whether its voucher number is found in the inventory ledger. The module then compares the above-mentioned fields based on fuzzy logic or exact matches (numbers only). This step allows the system to determine whether inventory has been added back correctly.
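The binary-scoring step can be sketched as follows, with hypothetical field names and an exact match on the numeric fields (the real module may also apply fuzzy logic to text fields).

```python
def score_credit_vouchers(credit_vouchers, inventory_ledger):
    """Assign each credit voucher a score of 1 if its voucher number appears
    in the inventory ledger with matching item, unit of measure, and quantity,
    else 0."""
    ledger_by_voucher = {row["voucher"]: row for row in inventory_ledger}
    scores = {}
    for cv in credit_vouchers:
        row = ledger_by_voucher.get(cv["voucher"])
        ok = (row is not None
              and row["item"] == cv["item"]
              and row["uom"] == cv["uom"]
              and row["qty"] == cv["qty"])
        scores[cv["voucher"]] = 1 if ok else 0
    return scores
```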
Credit voucher to customer transactions: this module is designed to ensure that credit vouchers are also included in the customer transaction table. Each credit voucher is identified by an invoice number ending in "CCN", which is also present in the customer transaction table filtered by transaction type (sales), and this is used to identify the presence of records in both tables. Each record found in the invoice table (filtered for credit vouchers) and the transaction table is assigned a binary score of 1 or 0 based on whether it is found in both data sets. Similarly, checks are performed on the amount, customer, date, and/or currency to verify the accuracy and validity of the data.
Payment journal to customer transactions: this module is configured to check the transactions in PaymentJournal against the payments in the customer transaction table. The system performs the check at the TransactionID level. The system performs a fuzzy check on the amount, customer account number, date, and/or currency. Moreover, the system assigns reason codes to any missing transactions in one of the tables, differences between columns, information loss during aggregation from the InvoiceNumber to the TransactionID level (for payment journals), and/or the relationship between PaymentJournal.InvoiceNumber and PaymentJournal.TransactionID (one-to-one/many-to-one).
Inventory relief: this module is designed to verify that invoiced items have been properly relieved from inventory. The module attempts to match the item numbers, units of measure, quantities, and/or dates in the invoice lines with the inventory ledger. In addition, it flags items that have been shipped but not invoiced, items that were invoiced prior to relief, and/or invoice/shipment dates that cross accounting periods. It should be noted that invoice lines that are not items (such as services) may not be relieved, because in some embodiments they may not relate to inventory. In some embodiments, invoiced items with zero quantity may not be relieved. Each invoice line has a common identifier that is also present in the inventory ledger, and this can be used to identify the presence of records in both tables. Each invoice found in the inventory ledger is assigned a binary score of 1 or 0 based on whether its identifier is found in the inventory ledger. The module then compares the above-mentioned fields based on fuzzy logic, although an exact match is expected. This step allows the system to determine whether inventory has been properly relieved. These checks allow the system to ensure that items recognized as revenue have been removed from inventory on the financial statements. The accounting events that occur upon shipment include a debit to COGS and a credit to inventory, and the accounting events that occur upon revenue recognition include a debit to AR and a credit to revenue. Thus, for each event in which an item is invoiced, there should be an event that removes the item from the inventory ledger.
Policy integrity module
Transfer of control: this module uses shipping terms to determine whether transfer of control occurs at the shipping point, at the delivery point, or somewhere in between. This enables testing whether the performance obligation is completed before or after the accounting-period cutoff.
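A shipping-term lookup of the kind this module performs can be sketched as below. The mapping is illustrative only, covering a few common Incoterms; an actual engagement would use the full Incoterms definitions and the contract language.

```python
# Illustrative-only mapping of common shipping terms to the point at which
# control transfers; not a complete or authoritative Incoterms analysis.
CONTROL_TRANSFER = {
    "EXW": "shipping point",  # buyer takes control at the seller's premises
    "FOB": "shipping point",  # control passes when goods are loaded
    "DAP": "delivery point",  # control passes on arrival
    "DDP": "delivery point",  # seller bears risk until delivered
}

def control_transfer_point(shipping_term):
    """Return the control-transfer point for a shipping term, or flag it for
    manual contract review if the term is unknown."""
    return CONTROL_TRANSFER.get(shipping_term.upper(), "indeterminate: review contract")
```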
Contract approval and commitment: this module/use case is designed to identify contracts, identify contractual obligations, and identify business practices. Possible sources of potential misstatement covered by this section are unauthorized changes, incorrect sales orders/contracts, incorrectly entered orders, improper transaction price allocation, individual performance obligations that are improperly identified or not in compliance with regulations, and/or individual performance obligations that are not properly interpreted.
Fixed price: this module/case study is designed to identify the unit price of each PO and any differences from the unit price reported in the ERP. Possible sources of potential misstatement covered in this section are invoice pricing entered into the system without authorization or inappropriately, inaccurate or incomplete identification of the total price of the contract (including cash, non-cash, fixed, and variable consideration), and/or a transaction price that is not properly determined in accordance with IFRS 15/ASC 606.
Calculated expected revenue: this module recalculates expected revenue after considering the existence of an agreement (e.g., a contract), identifying the performance obligations, determining the transaction price, allocating the transaction price to the obligations, and determining the final revenue that can be recognized.
ASC 606 (IFRS 15) may be mapped to data, process, and policy integrity.
In some embodiments, any one or more of the data processing operations, cross-validation processes, vouching procedures, and/or other methods/techniques depicted herein may be performed, in whole or in part, by one or more of the systems (and/or components/modules thereof) disclosed herein.
Context-aware data integrity
Information integrity (also referred to as data integrity) may be defined as the representational fidelity of information to its underlying subject matter and the suitability of the information for its intended use. Information integrity, including vouching and tracing, is critical to FSLI audits for meeting two basic assertions (completeness and existence). Vouching verifies the value of an entry in the ledger against its supporting documents (or the underlying representation of the real world), while tracing starts from each document (or representation of the real world) and traces it to the corresponding entry in the ledger. Vouching is used to establish the "existence" assertion, while tracing is used to establish the "completeness" assertion.
According to known techniques, where auditing is sample-based, vouching and tracing are done independently as two separate processes. For example, during a typical audit period, the sampling rate may be 1-5% of all available transactions. However, known systems and methods for information integrity do not handle fuzzy comparisons, do not leverage the context of evidence (e.g., master data, industry ontologies, and industry and customer knowledge), do not leverage multiple pieces of evidence to establish data integrity, do not address the challenge that evidence may have been revised or updated, and do not address one-to-many/many-to-one/many-to-many relationships. Accordingly, there is a need for improved systems and methods that address one or more of the above-described disadvantages.
Disclosed herein are methods and systems for performing automated (or semi-automated) data processing operations of an audit process, wherein vouching and tracing (e.g., FSLI auditing across multiple documents and ERP records) are performed semi-automatically or fully automatically at the same time, and wherein the specification and actual matching of corresponding columns in the ledger to supporting source documents is performed automatically.
The systems and methods disclosed herein may provide improvements over known methods in a variety of ways. For example, the systems and methods disclosed herein may perform vouching and tracing simultaneously, rather than performing them as two separate processes and/or at two separate times. The system and method may categorize a collection of documents and identify the evidence available for conducting tests of representational fidelity. The system and method may simultaneously leverage multiple pieces of evidence (e.g., processing more than one piece of evidence in a single application of a data processing operation) that bear on a single assertion (and which may be contradictory).
Furthermore, the systems and methods disclosed herein may leverage a progressive framework to organize the available evidence, ensuring quick direct matches while allowing the greatest opportunity to match evidence with higher ambiguity. The system and method may progressively organize the set of ERP/ledger data and unstructured documents based on a primary identifier. A given document may be placed in multiple groups, given the potential ambiguity in extracting identifiers from documents.
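The grouping step can be sketched as below: every candidate primary identifier extracted from a document indexes that document, so an ambiguous document lands in multiple groups. The identifier formats (`PO-…`, `INV-…`) are hypothetical.

```python
import re

def group_documents(documents):
    """Index each document under every candidate primary identifier found in
    its text; ambiguous documents appear in multiple groups, preserving the
    chance of a later fuzzy match."""
    groups = {}
    id_pattern = re.compile(r"\b(?:PO|INV)-\d+\b")  # hypothetical identifier formats
    for doc in documents:
        candidates = set(id_pattern.findall(doc["text"])) or {"<unidentified>"}
        for candidate in candidates:
            groups.setdefault(candidate, []).append(doc["name"])
    return groups
```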
Further, the systems and methods disclosed herein may leverage a fuzzy comparison framework to allow for potentially minor deviations. The system and method may compare and match entries from ledgers and unstructured documents simultaneously. The system and method may use fuzzy comparisons of numbers and strings from the ledgers and unstructured documents.
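Fuzzy comparison of strings and numbers can be sketched with the standard library; the threshold and tolerance values are illustrative assumptions, not the platform's tuned parameters.

```python
from difflib import SequenceMatcher

def fuzzy_string_match(a, b, threshold=0.85):
    """Similarity ratio between two strings (e.g., customer names) after
    simple normalization; returns (score, matched?)."""
    score = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return score, score >= threshold

def fuzzy_number_match(a, b, rel_tol=0.01):
    """Amounts match if they agree within a small relative tolerance,
    absorbing rounding or conversion noise."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1.0)
```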
In addition, the systems and methods disclosed herein may leverage contextual information, both endogenous and exogenous information and knowledge, including master data, to ensure that the data is fully understood in context. The system and method may automatically match the supporting line item(s) within a document through machine learning, including deep learning, reinforcement learning, and/or continuous learning.
Further, the systems disclosed herein may have the ability to continuously/iteratively improve their performance over time, e.g., based on machine learning and feedback processing. The system and method may automatically augment a supporting document with additional contextual knowledge.
Described below are additional features, characteristics, and embodiments of systems and methods for semi- or fully-automatic simultaneous vouching and tracing for data integrity. In some embodiments, any one or more of the data integrity techniques discussed herein may be used as part of a combinable assurance integrity system, such as those described herein. In some embodiments, any one or more of the data integrity techniques discussed herein may share any one or more features/characteristics with the data integrity techniques discussed above with respect to the combinable assurance integrity framework.
In some embodiments, the system may be configured to perform one or more data processing operations to establish the representational fidelity of financial data, which may be used to determine whether there are any significant misstatements, for example, in an FSLI.
The system may establish a subset of data within a financial system (such as an ERP system) over a specified period of time (e.g., an accounting period), performing verification of representational fidelity by vouching and tracing between the data in the financial system and various pieces of evidence. Note that some of the data (such as inventory, shipping, and/or payment) may be applicable to multiple FSLIs. Subset selection may be based on a combination of best practices, industry-specific prior knowledge, and/or client-specific considerations. Based on cut-off criteria, best practices, and/or industry- and client-specific knowledge, the window for representational fidelity verification may begin earlier than the accounting period or may end later than the accounting period. Information about the subset selection may be indicated by user input and/or automatically determined by the system.
The system may build a collection of (possibly multi-modal) evidence (including its provenance/lineage) that may be required to verify the representational fidelity of the selected subset of data. In some embodiments, the evidence may be in structured or semi-structured form, such as EDI messages for POs, bank statements, and/or shipping information. The available evidence and its provenance may be recommended based on best practices, for example as indicated by one or more user inputs received by the system.
In some embodiments, the financial system may capture only the final state of an agreement or transaction. It may be necessary to trace the entire history, from the original agreement through subsequent (e.g., multiple) revisions, to fully verify the current state of the financial system.
Multiple pieces of multimodal evidence may be required to validate a single entry in the financial system. As an example, a sales order in the financial system may require validating the unit price against the sales contract and the quantity against an EDI message. As another example, email communications may be used to revise the original purchase order or contract.
The system may collect evidence (e.g., each evidence may include one or more fields) associated with an entry (e.g., having one or more fields) in the transaction-level financial system, where the association may be defined by a similarity measure between the evidence and data in the financial system. In some embodiments, the collection and/or selection of evidence may be based on an automated process of the system, which may be performed based on the identification of the system as to which evidence is required to verify financial data that has been selected for verification. In some embodiments, the user specifies which evidence should be collected.
One or more pieces of evidence may be represented by one or more feature vectors. One or more entries from the financial system may be represented by one or more feature vectors. Feature vectors may be generated and stored by the system based on applying one or more data processing and/or information extraction operations to the collected evidence and the collected financial system entries.
In some embodiments, the system may represent one or more pieces of evidence as feature vectors. The system may generate and store feature vectors based on documents or other data representing evidence received by the system. The system may be configured to generate one or more feature vectors to represent a subset of data within a financial system (e.g., an ERP system) over a specified period of time (e.g., an accounting period), wherein verification of representational fidelity is performed by vouching and tracing between the data of the financial system and various pieces of evidence. The system may be configured to generate one or more feature vectors to represent one or more of a set of (possibly multi-modal) pieces of evidence (including their provenance/lineage), which may be used to verify representational fidelity.
In some embodiments, the system may be configured to encode contextual information into one or more feature vectors, thereby capturing context awareness. For example, when a document is obtained from a document repository or other data source, feature vectors may be generated based at least in part on metadata, document names, and/or other contextual information. Feature vectors may also be generated based at least in part on content extracted from the evidence, such as a purchase order number, invoice number, payment journal ID, amount, and/or customer name. Feature vectors may also be generated based at least in part on calculations over additional contextual information (whether endogenous and/or exogenous). The feature vector for field-level evidence may be (or include, or be based on) the value of the field itself.
In some embodiments, the system may calculate a similarity metric that quantifies/scores the similarity between evidence (e.g., ingested documents) and data in the financial system (e.g., an FSLI) to determine an association between records in the financial system and the evidence. This may establish potential one-to-one, one-to-many, many-to-one, and/or many-to-many relationships between the evidence and the data from the financial system. The computation of the similarity metric may be based on feature vectors representing one or more pieces of evidence and/or on feature vectors representing one or more pieces of data in the financial system. In some embodiments, the system may use one or more weights in the similarity metric calculation. The weights may be specified by the user and/or may be trained using a machine learning model, e.g., with continuous learning based on the observed performance of the similarity metric. Calculating a similarity metric between the feature vector(s) representing the evidence and the data from the financial system may be based on dynamic programming.
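A weighted similarity metric of the kind described above can be sketched as follows. The per-field scoring rules (relative closeness for numbers, normalized exact match for text) and the field weights are illustrative assumptions; the patent contemplates learned weights and dynamic-programming comparisons as well.

```python
def field_score(a, b):
    """Per-field similarity in [0, 1]: relative closeness for numbers,
    case-insensitive exact match for text."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        denom = max(abs(a), abs(b), 1.0)
        return max(0.0, 1.0 - abs(a - b) / denom)
    return 1.0 if str(a).strip().lower() == str(b).strip().lower() else 0.0

def weighted_similarity(evidence, record, weights):
    """Weighted average of per-field scores between an evidence feature
    vector and a financial-system record."""
    total = sum(weights.values())
    return sum(w * field_score(evidence.get(f), record.get(f))
               for f, w in weights.items()) / total
```

Records whose score exceeds a chosen threshold would then be treated as associated, supporting the 1:1 / 1:many / many:1 / many:many relationships discussed above.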
In some embodiments, the system may generate output data indicative of a level, quantification, classification, and/or degree of representational fidelity, where the output may be generated based on the similarity metric. In some embodiments, the output may be based on selecting a subset of similarity metrics that indicate the highest level of similarity. In some embodiments, the output may be based on performing classification and/or clustering based on the calculated similarity metrics.
In some embodiments, the output may be generated as follows. The system may establish representational fidelity based on an ontology representation of the items in the financial system for the various fields within the items. The measure of representational fidelity may be based at least in part on a confidence that there is similarity between the evidence and the data in the financial system (e.g., based on a calculated similarity metric). The sufficiency of an entry's representational fidelity at each level may be established by an explicit specification or an implicit model. The association between evidence and transactions/items in the financial system may be one-to-one, one-to-many, many-to-one, or many-to-many. Representational fidelity may be determined based on direct evidence, indirect evidence, or both.
In some embodiments, the systems and methods disclosed herein may be configured according to the following dimensions for consideration in voucher matching:
Data modality
○ Excel
○ Web form
○ ERP
Evidence modality
○ OCR + document understanding
· Scanned PDF
· Signatures, handwriting
○ Document understanding
· E-mail
· Word, Excel
· EDI
· XBRL
Evidence type
○ Invoice
○ PO
○ BoL, POD
○ Contract/lease
○ 8-K/10-K/10-Q
○ Tax return
Entity extraction
○ Header and line items
· PO #, invoice #
· Amount (line, total)
· Date
· Customer name/address
· Product description/SKU
· Delivery/payment terms
· Quantity, unit price
· Currency
Normalization
○ Master data (customer master data, product master data)
○ Ontology (e.g., Incoterms 2020)
○ ERP variations
○ Client-side variations
Contextualization
○ Order to cash
○ Procure to pay
○ Record to report
○ Sources of context
· Endogenous
· Exogenous
Matching method
○ Direct and indirect
○ Exact
○ Fuzzy (with similarity score and confidence level)
○ Knapsack
○ Fuzzy knapsack
○ Passive and active
· Passive methods compare data from document understanding and entity extraction
· Active methods generate multiple surrogate hypotheses to determine whether matching evidence exists
Type of match
○ 1:1
○ 1:many (e.g., one payment is distributed across multiple invoices)
○ many:1 (e.g., multiple payments are applied to the same invoice)
○ many:many (e.g., multiple lines on an SO reconcile with multiple lines on an invoice or PO)
Multiple evidence
○ Rationalizing multiple simultaneous matches and their relative priority in contributing to a full vouch
Version control/revisions
○ Change orders
○ Revisions
FIG. 5 depicts one example of leveraging multiple pieces of evidence for data integrity.
Example embodiment
Information integrity is one of the five pillars of Information Assurance (IA) (availability, integrity, authentication, confidentiality, and non-repudiation) and also one of the three major pillars of information security (availability, integrity, and confidentiality, commonly referred to as the CIA triad). It is also the basis for financial statement assurance.
Information integrity may be defined as the representational fidelity of information to its underlying subject matter and the suitability of the information for its intended use. The information may be structured (e.g., form data, accounting transactions), semi-structured (e.g., XML), or unstructured (e.g., text, images, and video). Information is composed of representations of one or more events and/or instances created for a particular use. Such events or instances may have a number of attributes and characteristics that may or may not be included in the collection of information, depending on the intended use of the information.
There are various risks associated with the design, creation, and use of information, and with attesting to its integrity (an attestation engagement). The AICPA Information Integrity white paper (2013) presents four types of risk for information integrity:
1. Subject matter risk: refers to the risk that no suitable criteria can be formulated for an event or instance, and that the information about the event or instance is not suitable for its intended use (its suitability). It may include (1) the attribute of interest or environmental attributes and other meta-information related to the event or instance being unobservable or unmeasurable; (2) the information that can be supplied being misleading or liable to be misinterpreted by the intended recipient.
2. Risk of use: refers to the risk that the information will be used outside its intended purpose, used improperly, or not used when it should be used. It includes (1) the intended user using the information for purposes beyond its intended use, or not using the information for its intended use, resulting in an erroneous decision or misinterpretation by the user; (2) the use of the information by someone other than the intended user, resulting in a misinterpretation or erroneous decision by that user.
3. Information design risk: includes the risk of subject matter and use risks arising from the information design failing to address the subject matter, and the risks inherent in activities occurring throughout the life of the information.
4. Information processing lifecycle risk: includes risks introduced in the lifecycle of the specific information: (1) creation or identification of data, (2) measurement, (3) archiving or recording, (4) input, (5) processing, changing, or aggregating to transform data into information, (6) storage or archiving, (7) output or retrieval, (8) use, (9) destruction.
All the risks discussed above indicate that the integrity of the information depends on the integrity of the meta-information. Thus, in some embodiments, these risks and their nature may be considered when reporting information.
Under professional standards, an opinion regarding the integrity of information is derived by measuring or evaluating the reported information against suitable criteria. Since criteria are closely related to meta-information, identification of the criteria requires analysis of the meta-information necessary to understand the subject matter. Information containing complete meta-information will provide a further set of possible criteria for evaluating or reporting on information integrity. For example, if the meta-information indicates that the information is prepared according to generally accepted accounting principles, that may be the criterion for evaluating the information. In some embodiments, criteria must be suitable, meaning that they must be objective, measurable, complete, and relevant. Thus, in some embodiments, the criteria must be identifiable and capable of consistent evaluation, not only between different time periods but also between entities and the like. Furthermore, it is important that the criteria allow for a procedure that gathers sufficient evidence to support the opinion or conclusion provided in the practitioner's report. Moreover, in some embodiments, metrics may be selected that address the identified risks.
Information integrity in the context of financial statement audits is focused on the representational fidelity of the information used in the financial statement audit. Financial statement audits cover the following categories, often referred to as Financial Statement Line Items (FSLIs). A subset of these FSLIs is listed below:
1. Revenue & accounts receivable
2. Expenses & accounts payable
3. Journal entries
4. Cash & cash equivalents
5. Inventory
6. Cost of goods sold
7. Prepaid & other current assets
8. PPE, leases & depreciation
9. Investments
10. Goodwill & intangible assets
Each of these line items may require that the line item information in the ledger be connected to the real world. As an example, accounts receivable would need to be connected to invoices and purchase orders; accounts payable would need to be connected to purchase orders, invoices, and goods received; journal entries would need supporting documents; and inventory would require direct observation of the warehouse.
The present disclosure addresses one of the major risk areas for information integrity, the information processing lifecycle risk, as it recurs in the processing of every financial report. Other risk areas for information integrity are outside the scope of this disclosure.
For most FSLIs, the common technique for establishing representational fidelity is a vouching & tracing technique based on documents (paper or electronic) that capture events occurring in the real world.
The sampling-based vouching method for POs may involve the following steps:
1. Establish a set of transaction samples from the transactions in the ERP that need to be vouched. The sampling may be a combination of:
dollar amount based most important transactions
Transactions that may have the highest risk or uncertainty due to the nature of the transaction
Layering (banding) transactions to apply different sampling rates to different segments (bands), with higher sampling rates applied to higher dollar amounts or higher-risk transactions
Statistical sampling of the entire population
2. Locating a document in a document repository, assuming that the document can be accessed based on a PO number (or equivalent unique identifier)
3. Verification of the identity of the transaction-which may be a purchase order number, possibly in combination with a date and revision to uniquely identify the appropriate version of the PO when it is possible to make corrections and revisions
4. Verifying customer name, address (for shipping and billing), shipping terms, and payment terms
5. Verifying the number and unit price of each row
6. Verifying the total amount
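The layered sampling described in step 1 above can be sketched as follows; the cutoff and rates are hypothetical, and a real engagement would also add risk-based strata.

```python
import random

def stratified_sample(transactions, high_value_cutoff=10_000.0,
                      high_rate=1.0, low_rate=0.05, seed=0):
    """Vouch every transaction at or above the cutoff; sample the rest at a
    lower rate, mirroring the banded sampling described above."""
    rng = random.Random(seed)  # seeded for a reproducible sample
    sample = []
    for t in transactions:
        rate = high_rate if t["amount"] >= high_value_cutoff else low_rate
        if rng.random() < rate:
            sample.append(t["id"])
    return sample
```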
On the other hand, tracing may follow these steps:
1. Establish a sample set of documents in the document repository that need to be traced. The sampling may be:
a. Statistical sampling of documents
2. Locating the corresponding transaction in the financial system, assuming that the transaction can be accessed based on the PO number (or equivalent unique identifier)
3. Verification of the identity of the transaction-which may be a purchase order number, possibly in combination with a date and revision to uniquely identify the appropriate version of the PO when it is possible to make corrections and revisions
4. Verifying customer name, address (for shipping and billing), shipping terms, and payment terms
5. Verifying the number and unit price of each row
6. Verifying the total amount
Evidence that may be used to establish the representational fidelity of financial statement line items includes any one or more of the following:
11. Revenue & accounts receivable: may include contracts, purchase orders (pdf, Word, Excel, email), EDI messages (for POs, shipping, bank remittance notices), bills of lading, packing lists, shipping slips, delivery confirmations, consignment agreements, and payment details (including cash receipts, bank statements, check images, remittance notices).
12. Expenses & accounts payable: may include contracts, invoices, purchase orders, EDI messages, bills of lading, packing lists, shipping slips, and delivery confirmations.
13. Journal entries: various forms of supporting documents, including emails, spreadsheets, Word and pdf documents, receipts, contracts, and the like.
14. Cash & cash equivalents: bank statements.
15. Inventory: may include images and videos of warehouse and store shelves, packing lists/shipping slips, and returns.
16. PPE, leases and depreciation: may include lease agreements and various documentation of revenues and expenses associated with a particular PPE asset.
This example embodiment relates to the representational fidelity of data related to revenue and accounts receivable in a financial system, i.e., initial agreements (such as contracts & POs), evidence of performance of obligations (such as shipping), and evidence of payment details (such as bank statements). Note that some of this evidence may come from one or more third parties, such as evidence directly from a bank or shipping company. Some of the evidence may be in semi-structured form (e.g., EDI or XML/EDI messages).
The process of verifying representational fidelity may include one or more of the following steps:
First step - establishing a subset of data within a financial system
The system may establish a subset of data within a financial system (such as an ERP) during a specified accounting period, where representational fidelity is verified by vouching and tracing between the data in the financial system and various pieces of evidence.
Accounting periods may be annual, single quarter, one month, one week, or one day (e.g., with continuous control and monitoring).
Subset selection may be based on a combination of best practices, industry-specific prior knowledge, and/or client-specific considerations. The data in the financial system that is relevant to a revenue and receivables FSLI audit and may require verification of faithful representation includes the sales order tables, sales invoice tables, and payment journal tables.
Based on cut-off criteria, best practices, and industry- and client-specific knowledge, the window for representational fidelity verification may begin slightly earlier than the accounting period or may end later than the accounting period.
Second step - establishing the set of evidence needed for verification
The system may build a collection of (possibly multi-modal) evidence (including its provenance/lineage) that may be required to verify that the representation is faithful.
Representational faithfulness for sales orders is often established against purchase orders or contracts. A purchase order may be received as a PDF, an email, an Excel spreadsheet, a Word document, or an EDI message. Different portions of a sales order may come from different sources with different modalities. As an example, a sales order in the automotive parts manufacturing industry may have its price determined by a sales contract while the quantities for just-in-time delivery are received via EDI messages. Alternatively, the pricing of commodity orders may be based on daily pricing tables rather than on an advance agreement.
Representational faithfulness for shipment verification may be based on a bill of lading, proof of delivery, packing list, or shipping slip. The evidence may be in the form of an EDI message or may be obtained from a third-party service provider, such as Shippo.
Representational faithfulness for the payment journal may be based on various forms of payment details, including check images, remittance advices, bank statements, daily ACH reports, daily credit card settlement reports, and EDI messages, or may be obtained from a third-party service provider.
Verification of faithful representation may use both the content and the metadata associated with the evidence. In particular, the provenance and lineage of evidence can greatly facilitate the validation process (see US20100114628A1; US20100114629A1). The provenance or lineage of evidence captures everything relevant from the moment the evidence was created: it should track where the evidence has been, who accessed it and when, and what operations and transformations may have been applied to it. If the provenance captures everything from the moment the evidence was created until it was loaded into the financial system, and if the provenance/lineage can be proven unalterable so that non-repudiation is fully established, then faithfulness verification can be performed entirely on the provenance alone.
Note that financial systems often capture only the final state of an agreement or transaction. The system described here, by contrast, may track the entire evolution history of the evidence, starting from the original agreement and through the (typically multiple) subsequent amendments, in order to fully verify the current state of the financial system.
Third step: collecting evidence and creating feature vectors
The system may collect, at the transaction level, the evidence (where each item of evidence may include one or more fields) associated with an entry (having one or more fields) in the financial system, where the association is defined by a similarity measure between the evidence and the data in the financial system:
Each item of evidence and each entry from the financial system may be represented by one or more feature vectors, for example:
v = (v_1, v_2, ..., v_N)
Feature vectors may be extracted, derived, or computed from each item of evidence and from the transaction data in the financial system.
When documents are obtained from a document repository, feature vectors may be calculated from metadata, document names, or other contextual information. A feature vector may include computations over "content" extracted from the evidence, such as purchase order number, invoice number, payment journal ID, amount, and customer name. Feature vectors may also include computations based on additional (endogenous and exogenous) context information that may be relevant. The feature vector of field-level evidence may simply be the value of the field itself.
As an example, the feature vector of the PDF document SO0001238-PO.pdf may be (SO0001238, PO), indicating that this document should be the purchase order associated with sales order #1238. Confirmation of the association, however, depends on additional verification of the content.
A feature vector may also be defined based on the content of the document, such as (PO#, customer name, date, total amount).
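As an illustrative sketch (not part of the claimed system), the two kinds of feature vectors described above, one derived from a document name and one from extracted content, might be computed as follows; the regular expression and field names are hypothetical:

```python
import re

def filename_feature_vector(filename: str):
    """Sketch: derive a feature vector from a document name such as
    'SO0001238-PO.pdf' -> ('SO0001238', 'PO'), i.e. the sales order it
    references and the document type (hypothetical naming convention)."""
    match = re.match(r"(SO\d+)-([A-Z]+)\.pdf$", filename)
    if match is None:
        return None
    return match.group(1), match.group(2)

def content_feature_vector(fields: dict):
    """Sketch: derive a content-based feature vector
    (PO#, customer name, date, total amount) from extracted fields."""
    return (fields.get("po_number"),
            fields.get("customer"),
            fields.get("date"),
            fields.get("total"))

vec = filename_feature_vector("SO0001238-PO.pdf")
# vec == ("SO0001238", "PO")
```

As the text notes, a name-derived vector like this only suggests the association; confirming it still requires verifying the document's content.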
Fourth step: calculating similarity measures
The association between evidence and entries within the financial system may be based on the calculation of a similarity measure between them. Some potential similarity metrics are as follows:
Cosine similarity: cosine similarity may be advantageous because two similar vectors that are far apart in Euclidean distance (due to document size) can still be oriented close to each other. The smaller the angle between the vectors, the higher the cosine similarity.
Manhattan distance: a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates; in two dimensions it is simply the sum of the differences of the x and y coordinates:
|x_1 - x_2| + |y_1 - y_2|
Euclidean distance: the Euclidean distance between two points in a plane or in three-dimensional space measures the length of the line segment connecting the two points.
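The three candidate similarity metrics can be sketched in a few lines of Python (a minimal illustration on low-dimensional vectors; a production system would operate on the high-dimensional feature vectors described above):

```python
import math

def cosine_similarity(u, v):
    """Angle-based similarity: insensitive to vector magnitude (document size)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def manhattan_distance(u, v):
    """Sum of absolute coordinate differences: |x_1 - x_2| + |y_1 - y_2| + ..."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Length of the line segment connecting the two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u, v = (1.0, 2.0), (4.0, 6.0)
# manhattan_distance(u, v) -> 7.0; euclidean_distance(u, v) -> 5.0
# cosine_similarity(u, (2.0, 4.0)) -> 1.0 (same orientation, different magnitude)
```

The last line illustrates the advantage noted above: (1, 2) and (2, 4) are far apart in Euclidean terms but have cosine similarity 1, because they point in the same direction.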
The weights used in the similarity measure may be specified explicitly or trained using a machine learning model, possibly with continuous learning based on the observed performance of the similarity measure.
Calculating the similarity between the feature vector(s) representing the evidence and the data from the financial system may require making "soft" decisions along the way.
Using the example of vouching & tracing between sales orders in the ERP and purchase order documents, an initial decision (based on a top-k query) can be made about the k most appropriate purchase order documents to use for matching. Matching of purchase order numbers, customers, delivery & payment terms, and individual line items may then be performed within each purchase order document. Some of these items, such as customer and line items, may require further evaluation. The overall confidence score for each item may affect the overall ranking of the evidence. As an example, suppose the top 2 candidate documents for an entry in the financial system are doc_1 and doc_2, with similarity (confidence) scores c_11 and c_12. The combined confidence scores after the next level of evaluation are c_21 and c_22. Using the definition of fuzzy AND, the overall confidence for doc_1 becomes c_11 × c_21 and the overall confidence for doc_2 becomes c_12 × c_22. Thus, the relative ranking of these two pieces of evidence as potential matches to the entry in the financial system may change. This approach allows multiple potential items of evidence to be evaluated simultaneously without pruning them prematurely.
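The re-ranking effect described above can be illustrated with a small Python sketch; the scores are hypothetical:

```python
def fuzzy_and(x, y):
    """Product form of fuzzy AND; min(x, y) is an alternative definition."""
    return x * y

# Stage-1 similarity scores for the top-2 candidate purchase order documents
# (illustrative values).
c_11, c_12 = 0.90, 0.80   # doc_1, doc_2 after the top-k query
# Stage-2 confidence from matching PO number, customer, terms, line items.
c_21, c_22 = 0.60, 0.95

overall_1 = fuzzy_and(c_11, c_21)  # ~0.54
overall_2 = fuzzy_and(c_12, c_22)  # ~0.76
# doc_2 overtakes doc_1 once the second-level evidence is combined,
# which is why candidates should not be pruned prematurely.
best = max([("doc_1", overall_1), ("doc_2", overall_2)], key=lambda t: t[1])
# best[0] == "doc_2"
```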
Additional methods may be based on dynamic programming with backtracking (see Li, C.S., Chang, Y.C., Smith, J.R., Bergman, L.D., and Castelli, V., Dec. 1999, "Framework for efficient processing of content-based fuzzy Cartesian queries," in Storage and Retrieval for Media Databases 2000 (Vol. 3972, pp. 64-75), International Society for Optics and Photonics; Natsev, A., Chang, Y.C., Smith, J.R., Li, C.S., and Vitter, J.S., Aug. 2001, "Supporting incremental join queries on ranked inputs," in VLDB (Vol. 1, pp. 281-290); and US Patent 6,778,946 (algorithm for identifying combinations)).
Fifth step: establishing representational faithfulness
Representational faithfulness may be established by the system as follows. It may be built on an ontology representation of the entries in the financial system, covering the various fields within each entry. The measure of faithfulness may be indicated by the confidence that there is consistency/similarity between the evidence and the data in the financial system (e.g., based on the calculated similarity measure). Note that the confidence score obtained when combining two feature vectors may be computed from the confidence level of each feature vector using fuzzy AND logic:
·fuzzyAND(x,y)=min(x,y)
·fuzzyAND(x,y)=x*y
The sufficiency of representational faithfulness at each level of an entry may be established by explicit specification and/or an implicit model. The association between evidence and transactions/entries in the financial system may be one-to-one, one-to-many, many-to-one, or many-to-many. Note that representational faithfulness may be based on direct evidence and/or indirect evidence.
The level of match, and the associated confidence, between two entities (whether numeric values, strings, or dates) can be calculated as explained below.
Fuzzy matching of numeric values may be calculated as follows, for two inputs A and B, each of which also carries a confidence level:
A = (value_a, confidence_a), where confidence_a is in [0, 1]
B = (value_b, confidence_b), where confidence_b is in [0, 1]
Match score between A and B = max{1 - |value_a - value_b| / max_diff, 0} × 100%
Note that max_diff is a parameter to be set, indicating the difference at which the match score becomes 0.
Confidence = min{confidence_a, confidence_b}
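A minimal Python sketch of this numeric matching rule (the input values are hypothetical):

```python
def fuzzy_match_values(a, b, max_diff):
    """Sketch of the numeric matching rule above.
    a, b are (value, confidence) pairs with confidence in [0, 1];
    max_diff is the difference at which the match score drops to 0."""
    value_a, conf_a = a
    value_b, conf_b = b
    score = max(1.0 - abs(value_a - value_b) / max_diff, 0.0)
    confidence = min(conf_a, conf_b)
    return score, confidence

score, conf = fuzzy_match_values((100.0, 0.9), (95.0, 0.8), max_diff=50.0)
# a 5-unit difference against max_diff = 50 gives score ~0.9; conf == 0.8
```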
Fuzzy matching of strings may be based on the Levenshtein distance. The three types of string mismatch (insertion, deletion, and substitution of characters) are as follows:
Insertion: cot → coat
Deletion: coat → cot
Substitution: coat → cost
The Levenshtein distance is also referred to as the edit distance; it counts the minimum number of operations (edits) required to transform one string into another. As an example, the Levenshtein distance between "kitten" and "sitting" is 3, and a minimal edit script transforming the former into the latter is:
kitten → sitten (substitute "s" for "k")
sitten → sittin (substitute "i" for "e")
sittin → sitting (insert "g" at the end)
The FuzzyWuzzy open-source Python library may be leveraged for this computation:
Match score = the FuzzyWuzzy similarity ratio between the two strings
Confidence = min{confidence_a, confidence_b}
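A self-contained sketch of Levenshtein-based string matching (a hand-rolled dynamic program is shown so the example has no third-party dependency; FuzzyWuzzy's `fuzz.ratio` provides a comparable normalized score):

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn s into t (dynamic programming over prefixes)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def string_match_score(a, b):
    """Normalized similarity in [0, 1]; one simple normalization of the
    edit distance, not identical to FuzzyWuzzy's ratio."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# levenshtein("kitten", "sitting") == 3, matching the worked example above.
```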
Fuzzy matching of dates may be calculated as follows. Assuming a tolerance window of M days beyond which two dates are considered a complete mismatch, the system may compute:
A = (date_a, confidence_a), where confidence_a is in [0, 1]
B = (date_b, confidence_b), where confidence_b is in [0, 1]
Match score between A and B = max{1 - |date_a - date_b| / M, 0} × 100%
Confidence = min{confidence_a, confidence_b}
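A minimal Python sketch of the date-matching rule, using hypothetical dates:

```python
from datetime import date

def fuzzy_match_dates(a, b, tolerance_days):
    """Sketch of the date-matching rule above: a, b are (date, confidence)
    pairs; dates more than tolerance_days apart score 0."""
    date_a, conf_a = a
    date_b, conf_b = b
    diff = abs((date_a - date_b).days)
    score = max(1.0 - diff / tolerance_days, 0.0)
    return score, min(conf_a, conf_b)

score, conf = fuzzy_match_dates((date(2022, 6, 30), 0.95),
                                (date(2022, 7, 2), 0.90),
                                tolerance_days=10)
# 2 days apart with a 10-day window: score ~0.8, conf == 0.9
```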
In the systems described herein, vouching and tracing may be performed simultaneously. The system may examine each journal entry in the ERP (in the G/L, AP, AR, or another area) and vouch it against its supporting documents while simultaneously tracing each source document to the corresponding entry in the ERP. Performing vouching and tracing together in this way, rather than as separate passes, minimizes the number of I/O operations performed against the ERP system and the content management system; performing them separately would require twice the number of underlying accesses.
In some embodiments, any one or more of the data processing operations, cross-validation procedures, vouching procedures, and/or other methods/techniques described herein may be performed in whole or in part by one or more of the systems (and/or their components/modules) disclosed herein.
Computer
Fig. 5 illustrates an example of a computer according to some embodiments. Computer 500 may be a component of a system for providing an AI-enhanced audit platform including techniques for providing AI interpretability for processing data through multiple layers. In some embodiments, computer 500 may perform any one or more of the methods described herein.
The computer 500 may be a host computer connected to a network. The computer 500 may be a client computer or a server. As shown in fig. 5, computer 500 may be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device (such as a telephone or tablet computer). The computer may include, for example, one or more of a processor 510, an input device 520, an output device 530, a storage 540, and a communication device 560. The input device 520 and the output device 530 may correspond to those described above and may be connected to or integrated with a computer.
The input device 520 may be any suitable device that provides input, such as a touch screen or monitor, keyboard, mouse, or voice recognition device. The output device 530 may be any suitable device that provides output, such as a touch screen, monitor, printer, disk drive, or speaker.
Storage 540 may be any suitable device that provides storage, such as electronic, magnetic, or optical memory, including Random Access Memory (RAM), cache, hard disk drive, CD-ROM drive, tape drive, or removable storage disk. The communication device 560 may include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or card. The components of the computer may be connected in any suitable manner, such as via a physical bus or wirelessly. Storage 540 may be a non-transitory computer-readable storage medium including one or more programs that, when executed by one or more processors (such as processor 510), cause the one or more processors to perform the methods described herein.
Software 550, which may be stored in storage 540 and executed by processor 510, may include, for example, programming (e.g., as described above, as embodied in a system, computer, server, and/or device) that implements the functionality of the present disclosure. In some embodiments, software 550 may include a combination of servers, such as an application server and a database server.
The software 550 may also be stored and/or transported within any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute the instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a computer-readable storage medium may be any medium, such as storage 540, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
The software 550 may also be propagated within any transmission medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute the instructions associated with the software. In the context of this disclosure, a transmission medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device. The transmission medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.
The computer 500 may be connected to a network, which may be any suitable type of interconnected communication system. The network may implement any suitable communication protocol and may be secured by any suitable security protocol. The network may include any suitably arranged network link, such as a wireless network connection, T1 or T3 line, wired network, DSL, or telephone line, that enables transmission and reception of network signals.
Computer 500 may implement any operating system suitable for operating on the network. The software 550 may be written in any suitable programming language, such as C, C++, Java, or Python. In various embodiments, application software implementing the functionality of the present disclosure may be deployed in different configurations, such as in a client/server arrangement, or as a web-based application or web service accessed through a web browser.
Appendix a shows additional information about AI-enhanced audit platforms, including techniques for applying combinable assurance integrity frameworks, in accordance with some embodiments.
The following is a list of enumerated embodiments.
Embodiment 1, a system for generating risk assessment based on data representing a plurality of reports and data representing corroborative evidence, the system comprising one or more processors configured to cause the system to:
receiving a first dataset representing a plurality of reports;
receiving a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and
applying one or more integrity analysis models to the first data set and the second data set to generate output data including an assessment of risk.
Embodiment 2, the system of embodiment 1, wherein the output data comprises an assessment of a risk that one or more of the plurality of reports includes a significant misstatement.
Embodiment 3, the system of any of embodiments 1-2, wherein applying the one or more integrity analysis models comprises applying one or more process integrity analysis models to generate output data indicative of whether one or more process integrity criteria are met.
Embodiment 4, the system of embodiment 3, wherein applying the one or more process integrity analysis models includes determining whether the first data set indicates that one or more process integrity criteria with respect to a predefined procedure are met.
Embodiment 5, the system of any of embodiments 3-4, wherein applying the one or more process integrity analysis models includes determining whether the first data set indicates that one or more temporal process integrity criteria are met.
Embodiment 6, the system of any of embodiments 3-5, wherein applying the one or more process integrity analysis models includes determining whether the first data set indicates that one or more internal consistency process integrity criteria are met.
Embodiment 7, the system of any of embodiments 1-6, wherein applying the one or more integrity analysis models comprises applying one or more data integrity analysis models to generate an assessment of fidelity of information represented by the first dataset to information represented by the second dataset.
Embodiment 8, the system of embodiment 7, wherein applying the one or more data integrity analysis models is based on exogenous data in addition to the first data set and the second data set.
Embodiment 9, the system of any of embodiments 1-8, wherein applying the one or more integrity analysis models comprises applying one or more policy integrity models to generate output data comprising an arbitration according to one or more policy integrity criteria, wherein the arbitration is based on all or part of one or both of: the plurality of reports and corroborative evidence.
Embodiment 10, the system of embodiment 9, wherein the arbitration made by the one or more policy integrity models is based on an assurance knowledge base including data representing one or more of: industry practices of an industry related to one or more of the plurality of reports, historical behavior related to one or more parties related to one or more of the plurality of reports, one or more accounting policies, and one or more audit criteria.
Embodiment 11, the system of any of embodiments 1-10, wherein the assessment of risk is associated with a level selected from the group consisting of: transaction level, account level, and line item level.
Embodiment 12, the system of any of embodiments 1-11, wherein generating the assessment of risk is based at least in part on an assessed risk level attributable to one or more automated processes used to generate or process one or both of the first data set and the second data set.
Embodiment 13, the system of any of embodiments 1-12, wherein generating the assessment of risk comprises performing population testing on the first data set and the second data set.
Embodiment 14, the system of any of embodiments 1-13, wherein generating the assessment of risk comprises:
applying one or more process integrity models based on ERP data included in one or both of the first data set and the second data set; and
applying one or more data integrity models based on the corroborative evidence in the second data set.
Embodiment 15, the system of any of embodiments 1-14, wherein the one or more processors are configured to apply the assessment of risk to configure characteristics of a targeted sampling process.
Embodiment 16, the system of any of embodiments 1-15, wherein the one or more processors are configured to apply one or more common modules across two or more models selected from: a data integrity model, a process integrity model, and a policy integrity model.
Embodiment 17, the system of any of embodiments 1-16, wherein the one or more processors are configured to apply an assurance insight model to generate assurance insight data based at least in part on the generated assessment of the risk of significant misstatement.
Embodiment 18, the system of embodiment 17, wherein the one or more processors are configured to apply an assurance recommendation model to generate recommendation data based at least in part on the assurance insight data.
Embodiment 19, the system of any of embodiments 1-18, wherein the one or more processors are configured to:
receiving user input comprising instructions on a set of criteria to be applied; and
applying the one or more integrity analysis models in accordance with the received instructions regarding the set of criteria to be applied.
Embodiment 20, the system of any of embodiments 1-19, wherein applying the one or more integrity analysis models comprises:
applying a first subset of the one or more integrity analysis models to generate first result data; and
determining, based on the first result data, whether to apply a second subset of the one or more integrity analysis models.
Embodiment 21, a non-transitory computer-readable storage medium storing instructions for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
receiving a first dataset representing a plurality of reports;
receiving a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and
applying one or more integrity analysis models to the first data set and the second data set to generate output data including an assessment of risk.
Embodiment 22, a method for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, wherein the method is performed by a system comprising one or more processors, the method comprising:
receiving a first dataset representing a plurality of reports;
receiving a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and
applying one or more integrity analysis models to the first data set and the second data set to generate output data including an assessment of risk.
Embodiment 23, a system for generating an assessment of the faithfulness of data, the system comprising one or more processors configured to cause the system to:
receiving a first dataset representing a plurality of reports;
receiving a second data set comprising a plurality of items of corroborative evidence related to one or more of the plurality of reports;
generating a respective report feature vector for each report in the plurality of reports;
generating a respective evidence feature vector for each of the plurality of items of corroborating evidence;
calculating a similarity metric based on one or more of the report feature vectors and based on one or more of the evidence feature vectors, the similarity metric representing a level of similarity between a set of one or more of the plurality of reports and a set of one or more of the plurality of items of corroborating evidence; and
generating, based on the similarity metric, output data representing an assessment of the faithfulness of the first dataset.
Embodiment 24, the system of embodiment 23, wherein generating the output data representing the assessment of faithfulness includes performing a clustering operation on a set of similarity metrics including the similarity metric.
Embodiment 25, the system of any of embodiments 23-24, wherein generating the respective report feature vectors includes encoding one or more of: content information included in the first data set, context information included in the first data set, and information received from a data source different from the first data set.
Embodiment 26, the system of any of embodiments 23-25, wherein generating the respective evidence feature vectors includes encoding one or more of: content information included in the second data set, context information included in the second data set, and information received from a data source different from the second data set.
Embodiment 27, the system of any of embodiments 23-26, wherein the first data set is selected based on one or more data selection criteria for selecting a subset of available data within the system, wherein the subset selection criteria comprise one or more of: data content criteria and time criteria.
Embodiment 28, the system of any of embodiments 23-27, wherein the second data set includes data representing the provenance of one or more of the items of corroborating evidence.
Embodiment 29, the system of any of embodiments 23-28, wherein the second data set comprises one or more of: structured data, semi-structured data, and unstructured data.
Embodiment 30, the system of any of embodiments 23-29, wherein the second data set includes data representing multiple versions of a single document.
Embodiment 31, the system of any of embodiments 23-30, wherein generating the similarity metric includes comparing a single one of the report feature vectors to a plurality of evidence feature vectors.
Embodiment 32, the system of any of embodiments 23-31, wherein generating the similarity metric includes applying dynamic programming.
Embodiment 33, the system of any of embodiments 23-32, wherein generating the similarity metric includes applying one or more weights, wherein the weights are determined according to one or more machine learning models.
Embodiment 34, the system of any of embodiments 23-33, wherein generating the output data representing the assessment of faithfulness comprises generating a confidence score.
Embodiment 35, the system of any of embodiments 23-34, wherein generating the output data representing the assessment of faithfulness comprises assessing the sufficiency of faithfulness at a plurality of levels.
Embodiment 36, a non-transitory computer-readable storage medium storing instructions for generating an assessment of the faithfulness of data, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
receiving a first dataset representing a plurality of reports;
receiving a second data set comprising a plurality of items of corroborative evidence related to one or more of the plurality of reports;
generating a respective report feature vector for each report in the plurality of reports;
generating a respective evidence feature vector for each of the plurality of items of corroborating evidence;
calculating a similarity metric based on one or more of the report feature vectors and based on one or more of the evidence feature vectors, the similarity metric representing a level of similarity between a set of one or more of the plurality of reports and a set of one or more of the plurality of items of corroborating evidence; and
generating, based on the similarity metric, output data representing an assessment of the faithfulness of the first dataset.
Embodiment 37, a method for generating an assessment of the faithfulness of data, wherein the method is performed by a system comprising one or more processors, the method comprising:
receiving a first dataset representing a plurality of reports;
receiving a second data set comprising a plurality of items of corroborative evidence related to one or more of the plurality of reports;
generating a respective report feature vector for each report in the plurality of reports;
generating a respective evidence feature vector for each of the plurality of items of corroborating evidence;
calculating a similarity metric based on one or more of the report feature vectors and based on one or more of the evidence feature vectors, the similarity metric representing a level of similarity between a set of one or more of the plurality of reports and a set of one or more of the plurality of items of corroborating evidence; and
generating, based on the similarity metric, output data representing an assessment of the faithfulness of the first dataset.
The present application incorporates by reference the entire content of the U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ASSESSMENT OF VOUCHING EVIDENCE", filed on June 30, 2022, attorney docket number 13574-20068.00.
The present application incorporates by reference the entire content of the U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ADJUDICATION OF COMMERCIAL SUBSTANCE, RELATED PARTIES, AND COLLECTABILITY", filed on June 30, 2022, attorney docket number 13574-20069.00.
The present application incorporates by reference the entire content of the U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED DOCUMENT PROCESSING", filed on June 30, 2022, attorney docket number 13574-20071.00.
The present application incorporates by reference the entire content of the U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR PROVIDING AI-EXPLAINABILITY FOR PROCESSING DATA THROUGH MULTIPLE LAYERS", filed on June 30, 2022, attorney docket number 13574-20072.00.

Claims (22)

1. A system for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, the system comprising one or more processors configured to cause the system to:
Receiving a first dataset representing a plurality of reports;
receiving a second dataset comprising corroborative evidence related to one or more of the plurality of reports; and
applying one or more integrity analysis models to the first data set and the second data set to generate output data including an assessment of risk.
2. The system of claim 1, wherein the output data comprises an assessment of a risk that one or more of the plurality of reports includes a significant misstatement.
3. The system of any of claims 1-2, wherein applying the one or more integrity analysis models includes applying one or more process integrity analysis models to generate output data indicative of whether one or more process integrity criteria are met.
4. The system of claim 3, wherein applying the one or more process integrity analysis models comprises determining whether the first data set indicates that one or more process integrity criteria with respect to a predefined procedure are met.
5. The system of any of claims 3-4, wherein applying the one or more process integrity analysis models includes determining whether the first data set indicates that one or more temporal process integrity criteria are met.
6. The system of any of claims 3-5, wherein applying the one or more process integrity analysis models includes determining whether the first data set indicates that one or more internal consistency process integrity criteria are met.
7. The system of any of claims 1-6, wherein applying the one or more integrity analysis models includes applying one or more data integrity analysis models to generate an assessment of fidelity of information represented by the first data set to information represented by the second data set.
8. The system of claim 7, wherein applying the one or more data integrity analysis models is based on exogenous data in addition to the first data set and the second data set.
9. The system of any of claims 1-8, wherein applying the one or more integrity analysis models comprises applying one or more policy integrity models to generate output data comprising an arbitration according to one or more policy integrity criteria, wherein the arbitration is based on all or part of one or both of: the plurality of reports and the corroborative evidence.
10. The system of claim 9, wherein the arbitration made by the one or more policy integrity models is based on a guarantee of a knowledge base including data representing one or more of: industry practices of an industry related to one or more of the plurality of reports, historical behavior related to one or more parties related to one or more of the plurality of reports, one or more accounting policies, and one or more audit criteria.
11. The system of any one of claims 1-10, wherein the assessment of risk is associated with a level selected from the group consisting of: transaction level, account level, and line item level.
12. The system of any of claims 1-11, wherein generating an assessment of risk is based at least in part on an assessed risk level attributable to one or more automated processes used to generate or process one or both of the first data set and the second data set.
13. The system of any of claims 1-12, wherein generating the assessment of risk comprises performing population testing on the first data set and the second data set.
14. The system of any one of claims 1-13, wherein generating the assessment of risk comprises:
applying one or more process integrity models based on ERP data included in one or both of the first data set and the second data set; and
applying one or more data integrity models based on the corroborative evidence in the second data set.
15. The system of any of claims 1-14, wherein the one or more processors are configured to apply the assessment of risk to configure characteristics of a targeted sampling process.
16. The system of any of claims 1-15, wherein the one or more processors are configured to apply one or more common modules across two or more models selected from: a data integrity model, a process integrity model, and a policy integrity model.
17. The system of any of claims 1-16, wherein the one or more processors are configured to apply an assurance insight model to generate assurance insight data based at least in part on the generated assessment of the risk of material misstatement.
18. The system of claim 17, wherein the one or more processors are configured to apply an assurance recommendation model to generate recommendation data based at least in part on the assurance insight data.
19. The system of any of claims 1-18, wherein the one or more processors are configured to:
receiving user input comprising instructions regarding a set of criteria to be applied; and
applying the one or more integrity analysis models in accordance with the received instructions regarding the set of criteria to be applied.
20. The system of any of claims 1-19, wherein applying the one or more integrity analysis models comprises:
applying a first subset of the one or more integrity analysis models to generate first result data; and
determining, based on the first result data, whether to apply a second subset of the one or more integrity analysis models.
21. A non-transitory computer-readable storage medium storing instructions for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
receive a first data set representing a plurality of reports;
receive a second data set comprising corroborative evidence related to one or more of the plurality of reports; and
apply one or more integrity analysis models to the first data set and the second data set to generate output data including an assessment of risk.
22. A method for generating a risk assessment based on data representing a plurality of reports and data representing corroborative evidence, wherein the method is performed by a system comprising one or more processors, the method comprising:
receiving a first data set representing a plurality of reports;
receiving a second data set comprising corroborative evidence related to one or more of the plurality of reports; and
applying one or more integrity analysis models to the first data set and the second data set to generate output data including an assessment of risk.
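The claims above describe the composable framework functionally (receive a report data set and a corroborative-evidence data set, apply integrity analysis models, emit a risk assessment) without prescribing an implementation. The following Python sketch is purely illustrative of that flow: every name (`Report`, `process_integrity`, `data_integrity`, `assess_risk`), the similarity measure, and the thresholds are hypothetical choices, not part of the claimed invention.

```python
# Illustrative sketch only; the claims do not specify an implementation.
from dataclasses import dataclass


@dataclass
class Report:
    amount: float      # amount stated in the report (e.g., an invoice line)
    timestamp: str     # ISO date on which the report was recorded
    approved: bool     # whether the predefined approval procedure was followed


def process_integrity(report: Report) -> bool:
    """Process-integrity criterion (cf. claims 3-6): the predefined
    procedure was followed, reduced here to a single approval flag."""
    return report.approved


def data_integrity(report: Report, evidence_amount: float) -> float:
    """Data-integrity model (cf. claim 7): a similarity measure in [0, 1]
    between the reported amount and the corroborative evidence."""
    denom = max(abs(report.amount), abs(evidence_amount), 1e-9)
    return max(0.0, 1.0 - abs(report.amount - evidence_amount) / denom)


def assess_risk(report: Report, evidence_amount: float) -> str:
    """Compose the integrity models into one risk assessment
    (cf. claims 1, 14, and 22). Thresholds are arbitrary examples."""
    if not process_integrity(report):
        return "high"                      # procedure not followed
    if data_integrity(report, evidence_amount) < 0.95:
        return "high"                      # report diverges from evidence
    return "low"


# Example: an approved report whose amount matches its supporting evidence.
r = Report(amount=1000.0, timestamp="2022-06-30", approved=True)
print(assess_risk(r, evidence_amount=1000.0))   # low
print(assess_risk(r, evidence_amount=700.0))    # high
```

The composability claimed in, e.g., claims 16 and 20 would correspond to swapping in further models (policy integrity, temporal criteria) behind the same function signatures and gating later models on earlier results.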
CN202280053275.4A 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for applying combinable assurance integrity frameworks Pending CN117751362A (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US202163217134P 2021-06-30 2021-06-30
US63/217,119 2021-06-30
US63/217,127 2021-06-30
US63/217,131 2021-06-30
US63/217,134 2021-06-30
US63/217,123 2021-06-30
PCT/US2022/073280 WO2023279039A1 (en) 2021-06-30 2022-06-30 Ai-augmented auditing platform including techniques for applying a composable assurance integrity framework

Publications (1)

Publication Number Publication Date
CN117751362A true CN117751362A (en) 2024-03-22

Family

ID=90254927

Family Applications (5)

Application Number Title Priority Date Filing Date
CN202280057539.3A Pending CN117897705A (en) 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for automated adjudication of commercial substance, related parties, and collectability
CN202280057802.9A Pending CN117882041A (en) 2021-06-30 2022-06-30 AI enhanced audit platform including techniques for providing AI interpretability through multiple layers of processed data
CN202280057790.XA Pending CN117882081A (en) 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for automated assessment of vouching evidence
CN202280053275.4A Pending CN117751362A (en) 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for applying combinable assurance integrity frameworks
CN202280057933.7A Pending CN117859122A (en) 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for automated document processing

Family Applications Before (3)

Application Number Title Priority Date Filing Date
CN202280057539.3A Pending CN117897705A (en) 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for automated adjudication of commercial substance, related parties, and collectability
CN202280057802.9A Pending CN117882041A (en) 2021-06-30 2022-06-30 AI enhanced audit platform including techniques for providing AI interpretability through multiple layers of processed data
CN202280057790.XA Pending CN117882081A (en) 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for automated assessment of vouching evidence

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202280057933.7A Pending CN117859122A (en) 2021-06-30 2022-06-30 AI-enhanced audit platform including techniques for automated document processing

Country Status (1)

Country Link
CN (5) CN117897705A (en)

Also Published As

Publication number Publication date
CN117882041A (en) 2024-04-12
CN117897705A (en) 2024-04-16
CN117882081A (en) 2024-04-12
CN117859122A (en) 2024-04-09

Similar Documents

Publication Publication Date Title
US20230004888A1 (en) Ai-augmented auditing platform including techniques for applying a composable assurance integrity framework
Bose et al. Big data, data analytics and artificial intelligence in accounting: An overview
CN109887153B (en) Finance and tax processing method and system
Baesens et al. Credit risk analytics: Measurement techniques, applications, and examples in SAS
US11574204B2 (en) Integrity evaluation of unstructured processes using artificial intelligence (AI) techniques
Loshin The practitioner's guide to data quality improvement
US20220327538A1 (en) System and method for collecting and storing environmental data in a digital trust model and for determining emissions data therefrom
US9508100B2 (en) Methods and apparatus for on-line analysis of financial accounting data
Jayesh et al. A Comprehensive Analysis of Technologies for Accounting and Finance in Manufacturing Firms
US11393045B2 (en) Methods and systems for efficient delivery of accounting and corporate planning services
Amin et al. Application of optimistic and pessimistic OWA and DEA methods in stock selection
CN112330439A (en) Financial risk identification device and method based on five-stream-in-one business data
CN109118094A (en) A kind of enterprises service application system based on credit system
US8554645B1 (en) Method and system for identifying business expenditures with vendors and automatically generating and submitting required forms
Liu et al. Blockchain's impact on accounting and auditing: a use case on supply chain traceability
US20220327635A1 (en) Methods and systems for efficient delivery of accounting and corporate planning services
CN115456745A (en) Small and micro enterprise portrait construction method and device
Ozlanski et al. Kabbage: A fresh approach to understanding fundamental auditing concepts and the effects of disruptive technology
Fahmi et al. Implementation of Internal Control Procedures That Enable Cost Savings In Dealing With Threats Cycles: Reveneu Cycle: Tradional Vs Digital Accounting Information System Era In Pharmaceutical Sector
Wei [Retracted] A Machine Learning Algorithm for Supplier Credit Risk Assessment Based on Supply Chain Management
Wang et al. Data quality assurance in international supply chains: an application of the value cycle approach to customs reporting
Chan et al. Artificial intelligence in accounting and auditing
Khadivizand et al. Towards intelligent feature engineering for risk-based customer segmentation in banking
CN116862699A (en) Intelligent financial settlement robot system and equipment
CN114493552B (en) RPA (remote procedure Access) automatic approval method and system for public payment based on double time axes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination