CN117897705A

CN117897705A - AI-enhanced audit platform including techniques for automated adjudication of commercial substance, related parties, and collectability

Info

Publication number: CN117897705A
Application number: CN202280057539.3A
Authority: CN
Inventors: 李中生; W·程; M·J·弗拉维尔; L·M·霍尔马克; N·A·利佐特; K·M·梁
Original assignee: Pwc Product Sales Co ltd
Current assignee: Pwc Product Sales Co ltd
Priority date: 2021-06-30
Filing date: 2022-06-30
Publication date: 2024-04-16
Also published as: CN117882041A; CN117882081A; CN117751362A; CN117859122A

Abstract

Systems and methods are provided for AI-enhanced automated analysis of documents to quickly and efficiently make various adjudications based on the documents, including an adjudication as to whether a document represents underlying data meeting one or more predefined or dynamically determined criteria. Criteria for adjudication may include commercial substance criteria, related-party transaction criteria, and/or collectability criteria. The system may receive a plurality of documents and generate a plurality of feature vectors by applying natural language processing techniques. The system may apply one or more classification models to the plurality of feature vectors to generate output data classifying each of the feature vectors. The system may identify, for each feature vector, a subset of prior feature vectors that it most closely matches. Based on the classifications and the identified subsets, the system may adjudicate each feature vector with respect to commercial substance, each adjudication including an adjudication classification and an adjudication confidence score.

Description

AI-enhanced audit platform including techniques for automated adjudication of commercial substance, related parties, and collectability

Cross Reference to Related Applications

This application claims priority to U.S. Provisional Application No. 63/217,119, filed June 30, 2021; U.S. Provisional Application No. 63/217,123, filed June 30, 2021; U.S. Provisional Application No. 63/217,127, filed June 30, 2021; U.S. Provisional Application No. 63/217,131, filed June 30, 2021; and U.S. Provisional Application No. 63/217,134, filed June 30, 2021, the entire contents of each of which are incorporated herein by reference.

Technical Field

This document relates generally to AI-enhanced automated analysis of documents, and more particularly to AI-enhanced automated analysis of documents in an audit platform to automatically adjudicate commercial substance, related parties, and collectability.

Background

Traditional methods for processing documents to evaluate commercial substance criteria, related-party transaction criteria, and/or collectability criteria rely on manual evaluation by a human reviewer.

Disclosure of Invention

There is a need for improved methods for AI-enhanced automatic analysis of documents to quickly and efficiently make various decisions based on the document, including decisions as to whether the document represents underlying data that meets one or more predefined or dynamically determined criteria.

Disclosed herein are systems and methods for AI-enhanced automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents, including adjudications as to whether a document represents underlying data meeting one or more predefined or dynamically determined criteria. Criteria for adjudication may include commercial substance criteria, related-party transaction criteria, and/or collectability criteria.

In some embodiments, a first system is provided for classifying documents, the first system comprising one or more processors configured to cause the first system to: receive a plurality of documents; generate a plurality of feature vectors by applying one or more natural language processing techniques to each document of the plurality of documents to generate a respective feature vector representing the document; apply one or more classification models to the plurality of feature vectors to generate output data classifying each feature vector into a respective one or more of a plurality of classes; identify, for each feature vector of the plurality of feature vectors, a respective subset of a second plurality of feature vectors that most closely matches that feature vector; and determine a plurality of adjudications for each of the plurality of feature vectors representing the documents, based on the output data classifying each feature vector and on the identified subsets, wherein each of the plurality of adjudications includes an adjudication classification and an adjudication confidence score.

In some embodiments of the first system, the one or more processors are configured to apply one or more models to each of the plurality of feature vectors to calculate a respective change in one or more characteristics, and the determination of the plurality of adjudications is further based on the calculated respective changes in the one or more characteristics.

In some embodiments of the first system, the one or more characteristics include one or more of the following: a risk characteristic, a timing characteristic, and an amount characteristic.

In some embodiments of the first system, determining the adjudication classification includes determining whether the document meets commercial substance criteria.

In some embodiments, a first non-transitory computer-readable storage medium is provided that stores instructions for classifying documents, the instructions configured to be executed by one or more processors to cause a system to: receive a plurality of documents; generate a plurality of feature vectors by applying one or more natural language processing techniques to each document of the plurality of documents to generate a respective feature vector representing the document; apply one or more classification models to the plurality of feature vectors to generate output data classifying each feature vector into a respective one or more of a plurality of classes; identify, for each feature vector of the plurality of feature vectors, a respective subset of a second plurality of feature vectors that most closely matches that feature vector; and determine a plurality of adjudications for each of the plurality of feature vectors representing the documents, based on the output data classifying each feature vector and on the identified subsets, wherein each of the plurality of adjudications includes an adjudication classification and an adjudication confidence score.

In some embodiments, a first method for classifying documents is provided, wherein the first method is performed by a system comprising one or more processors, the first method comprising: receiving a plurality of documents; generating a plurality of feature vectors by applying one or more natural language processing techniques to each document of the plurality of documents to generate a respective feature vector representing the document; applying one or more classification models to the plurality of feature vectors to generate output data classifying each feature vector into a respective one or more of a plurality of classes; identifying, for each feature vector of the plurality of feature vectors, a respective subset of a second plurality of feature vectors that most closely matches that feature vector; and determining a plurality of adjudications for each of the plurality of feature vectors representing the documents, based on the output data classifying each feature vector and on the identified subsets, wherein each of the plurality of adjudications includes an adjudication classification and an adjudication confidence score.
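The steps recited for the first system, medium, and method can be illustrated with a minimal sketch. Nothing below is taken from the disclosure: the bag-of-words vectorizer, the keyword rule standing in for a classification model, the cosine-similarity search, the equal-weight scoring, and every name (`VOCAB`, `vectorize`, `classify`, `nearest`, `adjudicate`) are assumptions chosen only to make the data flow concrete.

```python
import math
from collections import Counter

# Hypothetical vocabulary; a real system would use far richer NLP features.
VOCAB = ["payment", "delay", "guarantee", "lease", "exchange"]

def vectorize(text):
    """Toy stand-in for NLP feature extraction: term counts over VOCAB."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(vec):
    """Toy classification model: assigns one or more classes per vector."""
    labels = {"delay": "timing", "guarantee": "risk", "payment": "amount"}
    classes = [label for word, label in labels.items()
               if vec[VOCAB.index(word)] > 0]
    return classes or ["other"]

def nearest(vec, prior_cases, k=2):
    """Return the k prior (vector, has_substance) cases most similar to vec."""
    ranked = sorted(prior_cases, key=lambda case: cosine(vec, case[0]),
                    reverse=True)
    return ranked[:k]

def adjudicate(vec, prior_cases, k=2):
    """Combine classifier output and nearest prior cases (assumed weighting)."""
    classes = classify(vec)
    neighbors = nearest(vec, prior_cases, k)
    if neighbors:
        votes = sum(1 for _, has_substance in neighbors if has_substance)
        neighbor_score = votes / len(neighbors)
    else:
        neighbor_score = 0.5
    class_score = 0.0 if classes == ["other"] else 1.0
    score = 0.5 * neighbor_score + 0.5 * class_score
    return {
        "classes": classes,
        "classification": score >= 0.5,        # adjudication classification
        "confidence": abs(score - 0.5) * 2.0,  # adjudication confidence score
    }
```

Here the second plurality of feature vectors is modeled as a list of previously labeled cases, and the confidence score is simply the distance of the combined score from the decision boundary. A production system would presumably replace each toy component with a trained model.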

In some embodiments, a second system is provided for identifying related parties within a plurality of databases, the second system comprising one or more processors configured to cause the second system to: receive a data set indicating a first set of parties associated with an entity; generate, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties; submit one or more of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and update the graph data structure based on the second set of parties to represent a second plurality of relationships between the entity and the second set of parties.

In some embodiments of the second system, the one or more processors are configured to apply one or more disambiguation models to the second set of parties prior to updating the graph data structure based on the second set of parties.

In some embodiments, a second non-transitory computer-readable storage medium is provided that stores instructions for identifying related parties within a plurality of databases, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a data set indicating a first set of parties associated with an entity; generate, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties; submit one or more of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and update the graph data structure based on the second set of parties to represent a second plurality of relationships between the entity and the second set of parties.

In some embodiments, a second method is provided for identifying related parties within a plurality of databases, wherein the second method is performed by a system comprising one or more processors, the second method comprising: receiving a data set indicating a first set of parties associated with an entity; generating, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties; submitting one or more of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and updating the graph data structure based on the second set of parties to represent a second plurality of relationships between the entity and the second set of parties.
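The graph-building and graph-updating steps can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the databases are modeled as plain in-memory dictionaries, the graph as adjacency sets, and the party names and function names (`DATABASES`, `build_graph`, `query_databases`, `expand_graph`) are hypothetical. Disambiguation of the returned parties is omitted.

```python
from collections import defaultdict

# Hypothetical in-memory "databases": each maps a party to linked parties.
DATABASES = [
    {"Acme Holdings": ["Acme Leasing"], "J. Doe": ["Doe Family Trust"]},
    {"Acme Holdings": ["Acme Leasing", "Northwind Ltd"]},
]

def build_graph(entity, first_set):
    """Graph as adjacency sets; edges link the entity to each known party."""
    graph = defaultdict(set)
    for party in first_set:
        graph[entity].add(party)
        graph[party].add(entity)
    return graph

def query_databases(party, databases):
    """Submit one party as a query and merge results from every database."""
    found = set()
    for database in databases:
        found.update(database.get(party, []))
    return found

def expand_graph(graph, first_set, databases):
    """Query each first-set party and add the second set to the graph."""
    second_set = set()
    for party in first_set:
        for related in query_databases(party, databases):
            graph[party].add(related)
            graph[related].add(party)
            second_set.add(related)
    return second_set
```

In this sketch the second set of parties is connected to the entity indirectly, through the first-set party whose query returned it. Whether the disclosed graph also records direct edges to the entity is not stated in this section.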

In some embodiments, a third system is provided for anomaly identification and analysis, the third system comprising one or more processors configured to cause the third system to: receive input data representing a plurality of interactions between a first entity and a plurality of respective entities; apply one or more anomaly identification models to generate anomaly data identifying a first subset of the interactions as anomalous; and identify a second subset of the interactions, wherein the second subset is a subset of the first subset, and wherein the identification of the second subset is based on the anomaly data and on a data structure representing a plurality of relationships between the first entity and a set of entities associated with the first entity.

In some embodiments of the third system, the input data comprises transaction data.

In some embodiments of the third system, the second subset of interactions is identified as transactions presenting an elevated risk of a related-party anomaly.

In some embodiments, a third non-transitory computer-readable storage medium is provided that stores instructions for anomaly identification and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive input data representing a plurality of interactions between a first entity and a plurality of respective entities; apply one or more anomaly identification models to generate anomaly data identifying a first subset of the interactions as anomalous; and identify a second subset of the interactions, wherein the second subset is a subset of the first subset, and wherein the identification of the second subset is based on the anomaly data and on a data structure representing a plurality of relationships between the first entity and a set of entities associated with the first entity.

In some embodiments, a third method is provided for anomaly identification and analysis, wherein the third method is performed by a system comprising one or more processors, the third method comprising: receiving input data representing a plurality of interactions between a first entity and a plurality of respective entities; applying one or more anomaly identification models to generate anomaly data identifying a first subset of the interactions as anomalous; and identifying a second subset of the interactions, wherein the second subset is a subset of the first subset, and wherein the identification of the second subset is based on the anomaly data and on a data structure representing a plurality of relationships between the first entity and a set of entities associated with the first entity.
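The two-stage narrowing described for the third system (all interactions, then anomalous interactions, then anomalous interactions involving related entities) can be sketched as below. The z-score rule is only a stand-in for the unspecified anomaly identification models, and the threshold, the transaction record layout, and the function names are assumptions.

```python
from statistics import mean, pstdev

def flag_anomalies(transactions, z_threshold=1.5):
    """Toy anomaly model: flag amounts far from the mean (z-score rule)."""
    amounts = [t["amount"] for t in transactions]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []  # no spread, so nothing stands out
    return [t for t in transactions
            if abs(t["amount"] - mu) / sigma > z_threshold]

def related_party_anomalies(anomalies, graph, entity):
    """Keep only anomalies whose counterparty is linked to the entity.

    `graph` is assumed to map each entity to the set of its related parties.
    """
    related = graph.get(entity, set())
    return [t for t in anomalies if t["counterparty"] in related]
```

The first function yields the first subset and the second function yields the second subset, which is by construction a subset of the first.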

In some embodiments, a fourth system is provided for behavioral modeling and analysis, the fourth system comprising one or more processors configured to cause the fourth system to: receive first input data comprising a data structure representing relationships between a plurality of entities; receive second input data representing behavior of one or more of the entities represented in the data structure; and apply one or more behavioral models to determine, based on the first input data and the second input data, a risk of a related-party anomaly represented by the second input data.

In some embodiments, a fourth non-transitory computer-readable storage medium is provided that stores instructions for behavioral modeling and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive first input data comprising a data structure representing relationships between a plurality of entities; receive second input data representing behavior of one or more of the entities represented in the data structure; and apply one or more behavioral models to determine, based on the first input data and the second input data, a risk of a related-party anomaly represented by the second input data.

In some embodiments, a fourth method for behavioral modeling and analysis is provided, wherein the fourth method is performed by a system comprising one or more processors, the fourth method comprising: receiving first input data comprising a data structure representing relationships between a plurality of entities; receiving second input data representing behavior of one or more of the entities represented in the data structure; and applying one or more behavioral models to determine, based on the first input data and the second input data, a risk of a related-party anomaly represented by the second input data.

In some embodiments, a fifth system is provided for predicting a likelihood of collection, the fifth system comprising one or more processors configured to cause the fifth system to: receive a first data set comprising endogenous information related to a transaction; receive a second data set comprising exogenous information related to one or more parties to the transaction; configure a collectability uncertainty model based on the first data set and the second data set; receive a third data set comprising information about the transaction; and provide the information about the transaction to the collectability uncertainty model to generate an output indicative of a likelihood of collection for the transaction.

In some embodiments of the fifth system, the endogenous information comprises one or more selected from the group consisting of: payment history information of a party to the transaction; information from a credit assessment performed prior to initiation of the transaction; and payment history information for one or more parties associated with the party to the transaction.

In some embodiments of the fifth system, the exogenous information comprises one or more selected from the group consisting of: economic behavior information of an industry related to a party to the transaction; economic behavior information of a value chain of a party to the transaction; news information related to a party to the transaction, a related industry, or a related value chain; product review information; employee sentiment information; and consumer sentiment information.

In some embodiments of the fifth system, the third data set includes information regarding prior disputes for transactions between the plurality of entities.

In some embodiments of the fifth system, applying the collectability uncertainty model comprises: generating an initial prediction of uncertainty based on the first data set comprising endogenous information; and applying one or more predictive models based on the second data set comprising exogenous information.

In some embodiments of the fifth system, the collectability uncertainty model is validated after the occurrence of a rare event, based on its predictions in response to the rare event.

In some embodiments, a fifth non-transitory computer-readable storage medium is provided that stores instructions for predicting a likelihood of collection, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first data set comprising endogenous information related to a transaction; receive a second data set comprising exogenous information related to one or more parties to the transaction; configure a collectability uncertainty model based on the first data set and the second data set; receive a third data set comprising information about the transaction; and provide the information about the transaction to the collectability uncertainty model to generate an output indicative of a likelihood of collection for the transaction.

In some embodiments, a fifth method is provided for predicting a likelihood of collection, wherein the fifth method is performed by a system comprising one or more processors, the fifth method comprising: receiving a first data set comprising endogenous information related to a transaction; receiving a second data set comprising exogenous information related to one or more parties to the transaction; configuring a collectability uncertainty model based on the first data set and the second data set; receiving a third data set comprising information about the transaction; and providing the information about the transaction to the collectability uncertainty model to generate an output indicative of a likelihood of collection for the transaction.
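One way to picture the two-stage use of endogenous and exogenous information is an initial estimate that is then adjusted. The sketch below assumes a logistic form with invented weights; the field names (`on_time_ratio`, `credit_score`, `industry_outlook`, `news_sentiment`) and all coefficients are hypothetical and are not taken from the disclosure.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def initial_prediction(endogenous):
    """Initial collection likelihood from endogenous data.

    Assumes both inputs are scaled to [0, 1]; the weights are illustrative.
    """
    z = (3.0 * endogenous["on_time_ratio"]
         + 2.0 * endogenous["credit_score"]
         - 2.5)
    return sigmoid(z)

def adjust_with_exogenous(probability, exogenous):
    """Shift the log-odds using exogenous signals scaled to [-1, 1]."""
    logit = math.log(probability / (1.0 - probability))
    logit += 1.0 * exogenous.get("industry_outlook", 0.0)
    logit += 0.5 * exogenous.get("news_sentiment", 0.0)
    return sigmoid(logit)

def collection_likelihood(endogenous, exogenous):
    """Endogenous estimate first, then the exogenous adjustment."""
    return adjust_with_exogenous(initial_prediction(endogenous), exogenous)
```

Working in log-odds keeps the adjusted value strictly between 0 and 1, and missing exogenous signals default to no adjustment.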

In some embodiments, a sixth system is provided for classifying documents, the sixth system comprising one or more processors configured to cause the sixth system to: receive data representing a document; apply one or more natural language processing techniques to the received data to generate a feature vector representing the document; identify, based on the feature vector, a second feature vector from a use case library based on its similarity to the feature vector; apply a plurality of models to the feature vector to calculate respective changes in a plurality of features represented by the document; and determine an adjudication for the document based on the identified second feature vector and on the calculated respective changes in the plurality of features, wherein the adjudication includes an adjudication classification and an adjudication confidence score.

In some embodiments, a sixth non-transitory computer-readable storage medium is provided that stores instructions for classifying documents, the instructions configured to be executed by one or more processors to cause a system to: receive data representing a document; apply one or more natural language processing techniques to the received data to generate a feature vector representing the document; identify, based on the feature vector, a second feature vector from a use case library based on its similarity to the feature vector; apply a plurality of models to the feature vector to calculate respective changes in a plurality of features represented by the document; and determine an adjudication for the document based on the identified second feature vector and on the calculated respective changes in the plurality of features, wherein the adjudication includes an adjudication classification and an adjudication confidence score.

In some embodiments, a sixth method is provided for classifying documents, wherein the sixth method is performed by a system comprising one or more processors, the sixth method comprising: receiving data representing a document; applying one or more natural language processing techniques to the received data to generate a feature vector representing the document; identifying, based on the feature vector, a second feature vector from a use case library based on its similarity to the feature vector; applying a plurality of models to the feature vector to calculate respective changes in a plurality of features represented by the document; and determining an adjudication for the document based on the identified second feature vector and on the calculated respective changes in the plurality of features, wherein the adjudication includes an adjudication classification and an adjudication confidence score.

In some embodiments, a seventh system is provided for identifying relationships between entities represented in one or more data sets, the seventh system comprising one or more processors configured to cause the seventh system to: receive one or more data sets representing a plurality of entities; generate a graph data structure based at least in part on the one or more data sets, the graph data structure representing entities among the plurality of entities as nodes and relationships between pairs of entities as edges between corresponding pairs of nodes; receive input data indicating a query entity pair; and determine, based at least in part on the graph data structure, whether the query entity pair meets one or more related-entity criteria.

In some embodiments, a seventh non-transitory computer-readable storage medium is provided that stores instructions for identifying relationships between entities represented in one or more data sets, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive one or more data sets representing a plurality of entities; generate a graph data structure based at least in part on the one or more data sets, the graph data structure representing entities among the plurality of entities as nodes and relationships between pairs of entities as edges between corresponding pairs of nodes; receive input data indicating a query entity pair; and determine, based at least in part on the graph data structure, whether the query entity pair meets one or more related-entity criteria.

In some embodiments, a seventh method is provided for identifying relationships between entities represented in one or more data sets, wherein the seventh method is performed by a system comprising one or more processors, the seventh method comprising: receiving one or more data sets representing a plurality of entities; generating a graph data structure based at least in part on the one or more data sets, the graph data structure representing entities among the plurality of entities as nodes and relationships between pairs of entities as edges between corresponding pairs of nodes; receiving input data indicating a query entity pair; and determining, based at least in part on the graph data structure, whether the query entity pair meets one or more related-entity criteria.
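A query against such a node-and-edge graph can be sketched with a breadth-first search. The related-entity criterion used here (the pair is connected by a path of at most `max_hops` edges) is purely an assumption for illustration, since this section does not define the criteria. The function names are hypothetical.

```python
from collections import deque

def build_entity_graph(relationships):
    """Entities become nodes; each (a, b) relationship is an undirected edge."""
    graph = {}
    for a, b in relationships:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hops_between(graph, source, target):
    """Breadth-first search: edge count of the shortest path, or None."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

def is_related_pair(graph, a, b, max_hops=2):
    """Assumed criterion: the pair is connected within max_hops edges."""
    hops = hops_between(graph, a, b)
    return hops is not None and hops <= max_hops
```

Real criteria would likely also weigh the type of each relationship (ownership, family, directorship, and so on), which could be carried as edge attributes.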

In some embodiments, any one or more of the features, characteristics, or aspects of any one or more of the systems, methods, or non-transitory computer-readable storage media described above may be combined with each other, in whole or in part, and/or with any one or more (in whole or in part) of any other embodiment or feature, characteristic, or aspect of the disclosure herein.

Drawings

Various embodiments are described with reference to the accompanying drawings, in which:

FIG. 1A illustrates an exemplary architecture of a system for extracting information from documents and making an overall adjudication of commercial substance, according to some embodiments.

FIG. 1B illustrates an exemplary architecture of an ownership transfer classification and adjudication module and associated system components, according to some embodiments.

FIG. 1C illustrates an exemplary architecture of an obligation classification and adjudication module and associated system components, according to some embodiments.

FIG. 1D illustrates an exemplary architecture of a transaction price classification and adjudication module and associated system components, according to some embodiments.

FIG. 1E illustrates an exemplary feature vector data structure for use in representing information extracted from a document and for use in adjudicating commercial substance, according to some embodiments.

FIG. 1F illustrates an exemplary architecture of a risk/timing/amount adjudication module and associated system components, according to some embodiments.

FIG. 1G illustrates an exemplary architecture of an overall adjudication module and associated system components, according to some embodiments.

FIG. 2 illustrates an exemplary method for generating a plurality of graph data structures representing relationships between entities.

FIG. 3A illustrates an exemplary logical architecture for adjudicating collectability, according to some embodiments.

FIG. 3B illustrates an exemplary method for applying multiple models to adjudicate collectability based on customer data and multiple data sources.

FIG. 4 illustrates a computer, according to some embodiments.

Detailed Description

There is a need for improved methods for AI-enhanced automatic analysis of documents to quickly and efficiently make various decisions based on the document, including decisions as to whether the document represents underlying data that meets one or more predefined or dynamically determined criteria. In some embodiments, a collection of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent an arrangement or contract that meets commercial substance criteria. In some embodiments, a collection of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent related parties, transactions between related parties, and/or transactions that meet criteria/requirements regarding transactions between related parties. In some embodiments, a collection of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent a transaction and/or party meeting collectability criteria, including by adjudicating a likelihood of collection.

Commercial Substance

As explained above, there is a need for improved methods for AI-enhanced automatic analysis of documents in order to quickly and efficiently make various decisions based on the document, including decisions as to whether the document represents a transaction, agreement, contract, arrangement, or other interaction that has commercial substance.

An improved system that meets these needs may have application in a variety of use cases, including rapid and accurate assessment of compliance with revenue recognition standards (e.g., IFRS 15/ASC 606), where one or more criteria require that an arrangement (e.g., a contract, transaction, etc.) have commercial substance. A transaction, agreement, contract, arrangement, or other interaction may be considered to have commercial substance when the future cash flows of an entity (e.g., a business) are expected to change as a result of the interaction. A change in cash flows may be considered to exist when any one or more of the following (excluding tax considerations) changes (e.g., changes sufficiently to satisfy one or more criteria):

Risk: such as an increased risk that cash inflows will not materialize as a result of the transaction; for example, the business accepts the primary guarantee status of a debt in exchange for a larger repayment amount.

Timing: such as a change in the timing of receipt of cash inflows as a result of the transaction; for example, the business agrees to a delayed payment in exchange for a larger amount.

Amount: such as a change in the amount paid as a result of the transaction; for example, the business receives cash sooner in exchange for a smaller amount.

If the monetary benefit changes as a result of an exchange transaction, the transaction may be said to have commercial substance; if the monetary benefit is unchanged, the transaction may be said to lack commercial substance. In some embodiments, if a transaction has commercial substance, the transaction is recorded at the fair value of the asset; if the transaction lacks commercial substance, the transaction is recorded at the asset's book value.
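The test described above (a sufficient change in risk, timing, or amount implies commercial substance, which in turn selects the measurement basis) can be written down as a small sketch. The 10% relative-change threshold, the `CashFlowProfile` fields, and the function names are assumptions made for illustration; the text only says that a change must be sufficient to satisfy one or more criteria.

```python
from dataclasses import dataclass

@dataclass
class CashFlowProfile:
    risk: float         # e.g., estimated probability of non-collection
    timing_days: float  # expected days until cash is received
    amount: float       # expected cash amount

def relative_change(before, after):
    """Absolute change relative to the starting value."""
    if before == 0:
        return 0.0 if after == 0 else float("inf")
    return abs(after - before) / abs(before)

def has_commercial_substance(before, after, threshold=0.10):
    """Return (flag, changed_characteristics) under an assumed threshold."""
    changes = {
        "risk": relative_change(before.risk, after.risk),
        "timing": relative_change(before.timing_days, after.timing_days),
        "amount": relative_change(before.amount, after.amount),
    }
    significant = [name for name, change in changes.items()
                   if change >= threshold]
    return bool(significant), significant

def measurement_basis(substance):
    """Fair value when substance exists, otherwise book value."""
    return "fair value" if substance else "book value"
```

For example, agreeing to be paid in 90 days instead of 30 in exchange for 1,200 instead of 1,000 changes both timing and amount, so this sketch would report commercial substance and a fair-value basis.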

One example of a transaction that lacks commercial substance is the sale of an asset to the owner of a sole proprietorship, who immediately leases the asset back to the business. There is little distinction between a sole proprietorship and its owner, so there is a high probability that no real change in ownership has occurred. Another example of a transaction that lacks commercial substance is a swap of bandwidth capacity between different internet network and telephone service providers. In such a swap, both entities recognize revenue without generating any actual revenue that would change their profit.

Traditional methods for assessment of business essence rely on human assessment, which can lead to inaccuracy due to human error, inefficiency, and the possibility of human introduced bias. Furthermore, manual arbitration according to known methods provides insufficient granularity (e.g., at the transaction level) and is difficult or impossible to expand (e.g., for comprehensive population testing). Accordingly, there is a need for a system and method for performing automatic arbitration of business essence based on processed documents or other data to improve efficiency and accuracy and reduce human introduced bias.

Disclosed herein are systems configured for AI-enhanced arbitration of business essence (e.g., arrangement, agreement, contract, transaction, or other underlying data) of interactions represented by one or more ingested documents (or other data). As explained herein, the systems disclosed herein may apply a variety of AI techniques, including, for example, developing feature vectors, clustering, classifying, and arbitration, to enable automatic determination of whether one or more business parenchymal criteria are met. The system disclosed herein may perform one or more automatic evaluations of risk, timing, and/or cash flow evidenced by the documents being analyzed to determine whether the documents represent business-essential interactions.

As explained herein, the arbitration of a business substance by the system disclosed herein may include the use of feature vectors to represent interactions (e.g., contracts or transactions), where the feature vectors may be used for clustering, classification, similarity searching, and/or arbitration. As further explained herein, the result data from the various methods may be integrated to generate a business essence overall decision.

In some embodiments, a system for automatic arbitration of a business substance is provided. The system may be configured to receive one or more documents (e.g., PDF documents, word processing documents, JPG documents, etc.) or other data and automatically process the received documents to extract information from the documents. The extracted information may be evaluated to determine whether the information represents one or more interactions, such as a contract or transaction. The system may then evaluate the extracted information regarding those one or more interactions to determine whether the one or more interactions meet predefined (or dynamically determined) business parenchymal criteria.

In some embodiments, determining whether the business substance criteria are met may be performed at least in part by generating and evaluating one or more feature vectors. The system may be configured to automatically generate feature vectors (which may be referred to as "use case vectors") representing identified interactions (e.g., identified contracts, transactions, etc.) in the received documents.

Generating feature vectors may include performing structural, semantic, and/or linguistic analysis on ingested documents (e.g., contracts, purchase orders, etc.) using Natural Language Processing (NLP) techniques. Analysis using NLP techniques may generate an output indicative of a respective scope of each of one or more portions of the document being evaluated; it may also generate an output indicative of a respective topic for each of one or more portions of the document being evaluated.

In some embodiments, the feature vector may be configured to represent content, scope, participant identity, timing, amount, location, terms, etc., extracted from one or more documents. In some embodiments, feature vectors may be configured to represent the "what", "when", and "how" of a contract, transaction, etc. In some embodiments, one or more fields in the feature vector may represent (or may be associated with) a confidence value that indicates a confidence level for that field. In some embodiments, the feature vector may represent information indicating the obligation of one party to another party in the contract. In some embodiments, the feature vector may represent information indicating the consideration in the contract. In some embodiments, the feature vector may represent information indicating whether the consideration comprises a physical exchange. In some embodiments, the feature vector may represent information (e.g., points in time, time windows, and/or a schedule of individual points in time/time windows) indicating when the obligations and consideration are to be fulfilled. In some embodiments, the feature vector may represent information indicating how the consideration will be provided from one party to the other party in the contract. In some embodiments, the feature vector may represent information indicating the entity names or identities of the parties to the interaction. In some embodiments, the feature vector may represent information indicative of a duration (e.g., a duration of an agreement or contract).

In some embodiments, the system may generate or enhance feature vectors based in part on context data and/or metadata that may be available to the system via one or more sources separate from the one or more documents being analyzed. For example, the system may leverage metadata from the financial system and/or data from the contract management system to generate and/or enhance feature vectors.

In some embodiments, the system may use feature engineering including one or more of the following to generate feature vectors:

Entity names of one or more parties to the interaction, possibly normalized with respect to a dataset of entity names

Duration of agreement

Features of ownership transfer, obligations, transaction pricing, consideration, and/or exchange

Payment condition

Shipping clause

Additional data (e.g., metadata) from one or more other data sources (e.g., from a financial system and/or contract management system), such as valuation information (e.g., initial value, depreciation, fair value)
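As a minimal sketch of how the engineered features listed above might be held together, the following Python example defines a small container with per-field confidence values and a simple entity-name normalization step. The class names, field names, and the normalization rule are hypothetical and chosen only for illustration; the source does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureField:
    """One extracted feature value with an optional confidence in [0, 1]."""
    value: object
    confidence: Optional[float] = None


@dataclass
class UseCaseVector:
    """Hypothetical container for the engineered features of one interaction."""
    party_names: list
    duration_days: FeatureField
    payment_terms: FeatureField
    shipping_terms: FeatureField
    metadata: dict = field(default_factory=dict)  # e.g., valuation data from an ERP system


def normalize_entity_name(name: str, known_names: dict) -> str:
    """Map a raw party name to a canonical name when one is known (assumed rule)."""
    key = " ".join(name.lower().replace(",", " ").replace(".", " ").split())
    return known_names.get(key, name.strip())


known = {"acme corp": "Acme Corporation"}  # illustrative entity-name dataset
vec = UseCaseVector(
    party_names=[
        normalize_entity_name("ACME Corp.", known),
        normalize_entity_name("Widget LLC", known),
    ],
    duration_days=FeatureField(365, confidence=0.9),
    payment_terms=FeatureField("net 30", confidence=0.8),
    shipping_terms=FeatureField("EXW"),
    metadata={"fair_value": 10000.0},
)
print(vec.party_names)
```

Keeping the confidence next to each value lets downstream steps discount weakly supported fields rather than treating every extracted value as equally reliable.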

In some embodiments, the system may use document embedding (see Dai, A.M., Olah, C., and Le, Q.V., 2015. Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998) and/or an autoencoder (see, e.g., Li, J., Luong, M.T., and Jurafsky, D., 2015. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057) to generate feature vectors.

After creating the feature vector representing the interaction (e.g., representing the transaction or contract), the system may use the feature vector to cluster and/or classify the interaction it represents. In some embodiments, clustering may be performed such that feature vectors within the same cluster are more similar to each other than to feature vectors belonging to other clusters. In some embodiments, the system may apply one or more classification models (e.g., machine learning and/or AI classification models) to the feature vectors. In some embodiments, the classification model may be configured to classify feature vectors as representing either (a) interactions with business substance or (b) interactions without business substance. In some embodiments, the classification model may be configured to classify the feature vector into one or more of any suitable number of classifications. In some embodiments, the classification model may be configured to assign respective confidence values to classifications of feature vectors. In some embodiments, the classification model may be a machine learning model trained using training data based on previous interactions (e.g., previous contracts) that have been determined to have or not have business substance.

In some embodiments, cluster analysis may be applied so that interactions with similar feature vectors may be arbitrated simultaneously to optimize the computational speed of reasoning and arbitration. In some embodiments, cluster analysis may include applying unsupervised clustering such as K-means, which may enable similar documents to be clustered together. In some embodiments, cluster analysis may include applying hierarchical clustering, which may reduce dimensionality by methods such as singular value decomposition. (See, for example, Castelli, V., Thomasian, A., and Li, C.S., 2003. CSVD: Clustering and singular value decomposition for approximate similarity search in high-dimensional spaces. IEEE Transactions on Knowledge and Data Engineering, 15(3), pages 671-685.)
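The K-means step mentioned above can be illustrated with a short, dependency-free Python sketch. This is plain K-means with a deliberately naive initialization (the first k points), not the CSVD method of the cited paper; the function name, the sample vectors, and the iteration count are assumptions made for the example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means. Returns (centroids, labels).

    Assumption for reproducibility: the first k points seed the centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy two-dimensional feature vectors forming two obvious groups.
vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
centroids, labels = kmeans(vectors, k=2)
print(labels)
```

Interactions that share a label could then be arbitrated together, which is the speed benefit the paragraph above describes.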

In some embodiments, the classification analysis may include classifying interactions based on feature vectors (whether the feature vectors are based on feature engineering or built using document embedding and/or autoencoder techniques). In some embodiments, the classification analysis may include the use of one or more supervised machine learning models (e.g., SVM) and/or deep learning models (see, e.g., DistilBERT (Sanh, V., Debut, L., Chaumond, J., and Wolf, T., 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108) and MT-DNN (Liu, X., He, P., Chen, W., and Gao, J., 2019. Multi-task deep neural networks for natural language understanding. arXiv preprint arXiv:1901.11504)). In some embodiments, a classification analysis may be applied to classify feature vectors into categories indicative of business substance or categories indicative of a lack of business substance. In some embodiments, the classification analysis may include generating one or more confidence levels associated with the classification of the feature vector.

In some embodiments, the system may be configured to identify a subset of feature vectors from the set of feature vectors representing a plurality of other interactions/contracts/transactions that are most similar to the target feature vector for arbitration. For example, a respective similarity score between the target feature vector and each respective feature vector in the set of other feature vectors may be generated, and a subset of top feature vectors having the highest similarity score may be selected. In some embodiments, a subset of all feature vectors having a similarity score exceeding a predetermined or dynamically determined threshold similarity score value may be selected. In some embodiments, a subset of feature vectors with top k similarity scores may be selected, where the value k may be determined according to system settings, user input, and/or dynamic determination based on processed documents or other information available to the system.
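The two selection rules described above (a similarity threshold and a top-k cutoff) can be sketched as follows. The function name, the identifiers, and the scores are hypothetical; the sketch assumes that similarity scores have already been computed and that higher means more similar.

```python
def select_similar(scores, k=None, threshold=None):
    """Return ids of the most similar prior vectors, highest score first.

    scores: mapping of vector id -> similarity score (higher = more similar).
    threshold: keep only scores strictly above this value, if given.
    k: keep at most the k best remaining entries, if given.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(vid, s) for vid, s in ranked if s > threshold]
    if k is not None:
        ranked = ranked[:k]
    return [vid for vid, _ in ranked]


scores = {"contract_a": 0.91, "contract_b": 0.40,
          "contract_c": 0.77, "contract_d": 0.85}
top2 = select_similar(scores, k=2)
above = select_similar(scores, threshold=0.75)
print(top2, above)
```

Both rules can be combined in one call, in which case the threshold is applied first and k caps the size of the surviving subset.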

In some embodiments, evaluating similarity for the purpose of selecting a subset of similar feature vectors may include determining the similarity based on any one or more of the following similarity metrics: $L_0$, $L_1$, $L_2$, or $L_\infty$. $L_2$ may also be referred to as Euclidean distance. In some embodiments, weighting of one or more portions of the feature vectors may be introduced in determining similarity between feature vectors. In some embodiments, the weights may be learned by iterative refinement.
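For reference, the four distance measures named above and a weighted variant can be written in a few lines of Python. The sketch assumes numeric feature vectors of equal length and uses the common convention that the L0 measure counts the coordinates in which two vectors differ; the function names and weights are illustrative only.

```python
import math


def l0(a, b):
    """Number of coordinates in which the vectors differ (assumed convention)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def l1(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linf(a, b):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(a, b))


def weighted_l2(a, b, weights):
    """Euclidean distance with a non-negative weight per coordinate."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


a, b = (1, 2, 3), (1, 4, 7)
print(l0(a, b), l1(a, b), l2(a, b), linf(a, b), weighted_l2(a, b, (1, 1, 0)))
```

A weight of zero removes a coordinate from the comparison entirely, which is one simple way to express that some portions of the feature vector matter more than others.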

In some embodiments, the function $F = \sum_{i=1}^{k} c_i X_i$ may be used, where $X_i = 1$ if the $i$-th interaction is deemed to have business substance and $X_i = -1$ if the $i$-th interaction is deemed not to have business substance ($i = 1, \ldots, k$), and where $c_i$ is a similarity measure for the feature vector of the $i$-th interaction. If the measure $F$ for an interaction is above a threshold, then the interaction may be considered to have business substance, with the threshold being between $-1$ and $1$.
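A small Python sketch of this similarity-weighted vote follows. Because the threshold is stated to lie between -1 and 1, the sketch assumes the similarity measures are non-negative and rescales them to sum to 1 so that F also falls in that range; that normalization, the function name, and the sample values are assumptions, not something the source specifies.

```python
def substance_score(neighbors):
    """Compute F = sum_i c_i * X_i over the k most similar prior interactions.

    neighbors: list of (similarity, has_substance) pairs.
    Assumption: similarities are non-negative and are rescaled to sum to 1,
    so that F lies in [-1, 1].
    """
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return 0.0
    return sum((sim / total) * (1 if has else -1) for sim, has in neighbors)


# Three prior interactions: two with business substance, one without.
neighbors = [(0.9, True), (0.8, True), (0.3, False)]
F = substance_score(neighbors)
threshold = 0.5  # illustrative value in (-1, 1)
has_substance = F > threshold
print(round(F, 3), has_substance)
```

Here F = (0.9 + 0.8 - 0.3) / 2.0 = 0.7, which exceeds the illustrative threshold, so the target interaction would be treated as having business substance.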

In some embodiments, the system may be configured to apply AI-enhanced reasoning to determine corresponding changes in one or more features, where the determination may be based on feature vectors and/or on additional information available to the system and determined to be relevant to the interaction (e.g., Enterprise Resource Planning (ERP) data from one or more financial systems). In some embodiments, evaluating the features changed by the interaction (e.g., by a contract) may include one or more of:

a. Risk change - assessment of the risk of cash inflows before and after the interaction;

b. Timing change - assessment of the timing of cash inflows before and after the interaction; and

c. Amount change - assessment of the amount of cash inflows before and after the interaction.

In some embodiments, the assessed change in each of the one or more features may be quantified in any suitable unit or as any suitable score. In some embodiments, the assessed change in each of the one or more features may be classified as significant or not significant depending on whether the assessed change exceeds a threshold.
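The significance test described above can be sketched as a comparison of before and after cash-flow profiles against per-feature thresholds. The profile fields, units, and threshold values below are hypothetical and serve only to make the comparison concrete.

```python
from dataclasses import dataclass


@dataclass
class CashFlowProfile:
    risk: float         # hypothetical risk score in any consistent unit
    timing_days: float  # hypothetical days until the cash inflow is expected
    amount: float       # expected cash inflow amount


def significant_changes(before, after, thresholds):
    """Flag each feature whose absolute change exceeds its threshold."""
    changes = {
        "risk": abs(after.risk - before.risk),
        "timing": abs(after.timing_days - before.timing_days),
        "amount": abs(after.amount - before.amount),
    }
    return {name: delta > thresholds[name] for name, delta in changes.items()}


before = CashFlowProfile(risk=0.20, timing_days=30, amount=1000.0)
after = CashFlowProfile(risk=0.25, timing_days=90, amount=1000.0)
flags = significant_changes(
    before, after, thresholds={"risk": 0.1, "timing": 15, "amount": 50.0}
)
print(flags)
```

In this toy case only the timing change (60 days against a 15-day threshold) is flagged as significant.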

In some embodiments, the evaluation of the change in each of the one or more features may be performed entirely automatically by the inference engine or may be performed with manual augmentation based on user input.

The inputs considered by the system in evaluating the change in each of the one or more features may include a feature vector and an assessment of the impact on cash flow resulting from the interaction represented by the feature vector. In some embodiments, the evaluation may be performed based in part on:

Risk - the risk of cash flows may be arbitrated based on the obligations of the interaction, such as those represented in the contract document being analyzed. As an example, accepting junior debt instead of senior debt carries a higher risk because junior debt has a lower priority for repayment in a liquidation during bankruptcy proceedings.

Timing - whether the timing of performance will result in a change in cash flow may be evaluated according to the obligations. Deferring payment in exchange for a larger payment may be considered to constitute business substance.

Amount - the cash inflow amounts before and after the interaction (e.g., before and after executing a contract or other agreement) may be considered. As an example, a zero-amount sales order (such as sending a sample to a potential customer) has no impact on cash flow and therefore may not be considered to constitute business substance.

After performing one or more of the foregoing three analyses, namely (a) clustering/classification of feature vectors, (b) identification of a subset of feature vectors most similar to the target feature vector for arbitration, and (c) assessment of changes to one or more characteristics, the system may then make an overall arbitration as to whether the interaction represented by the feature vector meets one or more business substance criteria. The arbitration may be based on any one or more of the foregoing analyses. In some embodiments, the overall determination as to whether the business substance criteria are met may be made by calculating a business substance score from the foregoing analyses and determining whether the business substance score meets a business substance score threshold.

Making an overall arbitration as to whether the interaction represented by the feature vector meets one or more business substance criteria may include generating an indication as to whether the one or more business substance criteria are met. Making an arbitration may also include generating a confidence score that indicates a confidence level for all or part of the arbitration (e.g., relative to the overall arbitration, relative to one or more of the underlying analyses, and/or relative to a particular business substance criterion of a plurality of business substance criteria).

In some embodiments, the overall arbitration may be based in part on the clustering/classification of feature vectors described above. Classifying a feature vector into a cluster/classification that represents business substance, or clustering/classifying it with other feature vectors that represent business substance, may support an overall arbitration in favor of business substance.

In some embodiments, the overall arbitration may be based in part on the identification, described above, of a subset of feature vectors that are most similar to the target feature vector. The presence in the identified subset of one or more similar feature vectors that are themselves associated with interactions having business substance may support an overall arbitration in favor of business substance.

In some embodiments, the overall arbitration may be based in part on the assessed changes to one or more characteristics. A more significant assessed change to one or more of the characteristics may support an overall arbitration in favor of business substance.

In some embodiments, the foregoing analyses ((a) clustering/classification of feature vectors, (b) identification of a subset of feature vectors most similar to the target feature vector for arbitration, and (c) assessment of changes to one or more characteristics) may be evaluated independently of each other or in combination with each other to determine an overall arbitration. In some embodiments, a score for one or more of the foregoing analyses may be calculated. In some embodiments, if any of the foregoing analyses meets an overall arbitration criterion (e.g., a threshold), an overall arbitration may be made in favor of a finding of business substance. In some embodiments, if any of the foregoing analyses fails to meet an overall arbitration criterion (e.g., a threshold), an overall arbitration may be made against a finding of business substance. In some embodiments, the score or evaluation from one of the foregoing analyses may be combined with the score or evaluation from another analysis and jointly evaluated, for example by using them (weighted or unweighted) to calculate an overall arbitration score as a sum or product, and the overall arbitration score may be compared to a threshold to determine an overall arbitration.
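The combination rules described above can be sketched in Python as three modes: pass if any analysis passes, pass only if all analyses pass, or compare a weighted sum to a threshold. The function name, mode names, score scale, and weights are assumptions made for illustration; the product-based combination mentioned in the text is not shown.

```python
def overall_arbitration(scores, threshold, mode="weighted_sum", weights=None):
    """Combine per-analysis scores into a True/False overall arbitration.

    scores: mapping of analysis name -> score (assumed scale: 0 to 1).
    mode "any": favorable if any single analysis meets the threshold.
    mode "all": unfavorable if any single analysis misses the threshold.
    mode "weighted_sum": favorable if the weighted sum meets the threshold.
    """
    if mode == "any":
        return any(v >= threshold for v in scores.values())
    if mode == "all":
        return all(v >= threshold for v in scores.values())
    if mode == "weighted_sum":
        weights = weights or {name: 1.0 for name in scores}
        total = sum(weights[name] * value for name, value in scores.items())
        return total >= threshold
    raise ValueError(f"unknown mode: {mode}")


scores = {"classification": 0.8, "similarity": 0.7, "change": 0.2}
any_pass = overall_arbitration(scores, threshold=0.75, mode="any")
all_pass = overall_arbitration(scores, threshold=0.75, mode="all")
weighted = overall_arbitration(
    scores, threshold=0.6,
    weights={"classification": 0.5, "similarity": 0.3, "change": 0.2},
)
print(any_pass, all_pass, weighted)
```

With these sample values the weighted sum is 0.40 + 0.21 + 0.04 = 0.65, so the weighted rule and the "any" rule are favorable while the "all" rule is not.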

FIG. 1A illustrates an exemplary architecture of a system 100 for extracting information from documents and making an overall arbitration of business substance.

As shown in fig. 1A, the system 100 may receive one or more input documents including a contract document and/or a revision document.

The input document may be processed via one or more document understanding operations to extract information from the input document.

In some embodiments, processing the document via one or more document understanding operations may include performing structural, semantic, and/or linguistic analysis of the document. Structural analysis of one or more documents may enable identification of one or more sections of one or more of the documents. For example, a contract may contain one or more of the following sections, any one or more of which may be identified by the system:

Preamble, recitals, and words of agreement

Definitions

Action sections (consideration)

Representations and warranties

Covenants and rights

Conditions to obligations

Endgame provisions and remedies

General provisions

Signatures

In some embodiments, the system 100 may be configured according to the following assumption: obligations and consideration are often set forth in the action section(s) of the contract. The action section may include the exchange of promises that is the subject of the agreement. It may specifically identify the value to be exchanged between the parties. For example, it may identify goods or services to be provided to another party. It may indicate the total amount of money, or the unit rate, exchanged in the transaction. This section may lay the foundation for the other contractual terms that support the exchange.

The obligations of the parties may include:

Rights of the parties

Relevant dates

Relevant prices or other dollar amounts

Relevant quantities

Payment terms

Lump-sum payment, COD, installment payments

Payment due dates

Taxes

Interest rates

Late-payment penalties

In some embodiments, performing semantic analysis includes leveraging topic modeling in Natural Language Processing (NLP) so that the intent of one or more sections, subsections, and/or paragraphs of a document is correctly identified. Linguistic analysis may classify sentences as either epistemic or deontic based on modality. Obligations in contracts are typically expressed in the deontic modality.

Natural language processing techniques may be used to identify one or more of the following in the contract, revision, or other document(s) being analyzed:

Ownership transfer - can be categorized based on the action section that describes how ownership of the goods is transferred (in conjunction with the delivery terms).

Obligations of one party to the other in the contract - can be categorized based on the action section.

Transaction pricing - including point-in-time pricing and pricing over a period of time.

Consideration - can be categorized based on the action section and can include fixed and variable consideration (e.g., consideration tied to some form of discount).

One aspect that may be determined is whether the transaction involves a physical exchange as part of the consideration. The physical exchange may be an equipment exchange in a manufacturing environment or a payment to reserve a portion of raw materials, as in the oil and gas industry.

Timing - when the obligations and consideration are to be fulfilled. This facilitates classifying the contract as point-in-time or over a period of time.

Contract fulfillment - for determining payment terms (e.g., "net 30" may indicate payment 30 days after receipt of the invoice) and shipping terms (e.g., "EXW" may indicate that ownership transfer occurs at the origin).
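To make the "net 30" and "EXW" examples concrete, the following Python sketch pulls a payment period and a shipping term out of a sentence using regular expressions. This is a deliberately simple pattern-matching stand-in for the NLP techniques described above, not the disclosed method; the function name and the short list of shipping terms are assumptions.

```python
import re

# Illustrative subset of three-letter trade terms; not an exhaustive list.
SHIPPING_TERMS = {"EXW", "FCA", "FOB", "CIF", "DAP", "DDP"}


def extract_terms(text):
    """Find a 'net N' payment period and a known shipping term, if present."""
    net = re.search(r"\bnet\s+(\d+)\b", text, flags=re.IGNORECASE)
    shipping = next(
        (word for word in re.findall(r"\b[A-Z]{3}\b", text)
         if word in SHIPPING_TERMS),
        None,
    )
    return {
        "payment_days": int(net.group(1)) if net else None,
        "shipping_term": shipping,
    }


terms = extract_terms("Payment terms: Net 30. Delivery: EXW Rotterdam.")
print(terms)
```

Values extracted this way could populate the payment condition and shipping clause fields of a feature vector, ideally with a lower confidence than values confirmed by a trained model.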

As shown in fig. 1A, feature vectors may then be generated based on information extracted from one or more received documents. As shown, the feature vector may be generated via classification and arbitration of ownership transfer, classification and arbitration of obligations, and/or classification and arbitration of transaction prices. The feature vectors may then be used to evaluate three underlying analyses, such as described above.

First, feature vectors may be processed via vector-based clustering and classification operations, which may cluster and/or classify target feature vectors, including referencing one or more other feature vectors available via a use case library.

Second, the feature vectors may undergo a similarity search operation (e.g., selecting a subset of other feature vectors that are most similar to the target feature vector). This evaluation may be made with reference to one or more other feature vectors available via the use case library.

Third, feature vectors can be processed via an arbitration engine to evaluate changes in risk, timing, and monetary amount. This processing may be based on the target feature vector itself and on other information such as ERP data.

All three (or any one or more of the three) of these underlying analyses may then be used to generate an overall arbitration as to whether the contract meets business substance criteria, such as described above.

As shown in FIG. 1A, system 100 may include a contract and revision data source 101, which may include any one or more computer storage devices, such as a database, data store, data repository, live data feed, and the like. The data source 101 may be communicatively coupled to one or more other components of the system 100 and configured to provide contract data and/or revision data to the system 100 such that the contract data and/or revision data may be evaluated to determine whether one or more business substance criteria and/or related criteria are met. In some embodiments, the system 100 may receive data from the data source 101 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to a document being sent manually. The data received from the data source 101 may be provided in any suitable electronic data format, for example as structured, unstructured, and/or semi-structured data. The data may include, for example, a spreadsheet, a word processing document, an image file, and/or a PDF.

The system 100 may include a document understanding module 102, which may include any one or more processors configured to perform one or more document processing operations on contract data and/or revision data provided by the data source 101. The document processing operations performed by module 102 may include information extraction and/or structural classification that identifies and classifies different portions of a document. The document understanding module 102 may generate data representing information extracted from the received contract data and/or revision data. The document understanding module 102 may also generate data representing the identified portions/structures of the document or revision, together with associated metadata that classifies or characterizes each identified portion/structure.

Downstream of the document understanding module 102, the system 100 may include a plurality of arbitration modules configured to receive output data generated by the document understanding module 102 (optionally along with contract/revision data received from the data source 101) and process the received data to generate classification data and/or arbitration data. In the illustrated example, the system 100 includes an ownership transfer classification and arbitration module 103, an obligation classification and arbitration module 104, and a transaction price classification and arbitration module 105.

In some embodiments, one or more arbitration modules that generate data for inclusion in feature vectors may leverage active logic and/or passive logic. For example, the ownership transfer arbitration module may apply active logic by generating hypotheses and evaluating evidence to determine whether the hypotheses may be verified, while the ownership transfer arbitration module may apply passive logic by analyzing document data to identify portions (e.g., paragraphs) that include data indicating how ownership is transferred or not transferred.

Ownership transfer classification and arbitration module 103 may include any one or more processors configured to perform one or more data processing operations for classification and/or arbitration of ownership transfers. (Any of the data processing operations referenced herein may include application of one or more models trained by machine learning.) Module 103 may receive output data generated by module 102 and may process the received data to generate output data representing an ownership transfer classification and/or an ownership transfer arbitration.

The obligation classification and arbitration module 104 may include any one or more processors configured to perform one or more data processing operations for classification and/or arbitration of obligations. The module 104 may receive the output data generated by the module 102 and may process the received data to generate output data representing an obligation classification and/or an obligation arbitration.

The transaction price classification and arbitration module 105 may include any one or more processors configured to perform one or more data processing operations for classification and/or arbitration of transaction prices. The module 105 may receive the output data generated by the module 102 and may process the received data to generate output data representing transaction price classifications and/or transaction price decisions.

The output data generated by one or more of ownership transfer classification and arbitration module 103, obligation classification and arbitration module 104, and transaction price classification and arbitration module 105 may be used to create feature vector 106. Feature vector 106 may include an indication of the classification and/or arbitration for each upstream module and may optionally include a confidence level associated with one or more of the included classifications and/or arbitration.

The system 100 may include a risk/timing/amount arbitration module 107, which may include any one or more processors configured to perform one or more data processing operations for arbitration of risk, timing, and/or amount. The module 107 may receive the feature vector 106 and may process the feature vector 106 to generate output data representing an arbitration of risk, timing, and/or amount for the contract data and/or revision data originally received from the data source 101. The output data generated by module 107 may be provided to overall arbitration module 112, as described in further detail below.

In some embodiments, the data processing operations performed by module 107 may also be based on ERP data received from ERP data source 108.

The ERP data source 108 may include any one or more computer storage devices, such as databases, data stores, data repositories, live data feeds, and the like. The ERP data source 108 may be communicatively coupled to one or more other components of the system 100 and configured to provide ERP data to the system 100 such that ERP data for generating data representing one or more decisions regarding contract data and/or revision data received from the data source 101 may be evaluated. In some embodiments, one or more components of the system 100 may receive data from the ERP data source 108 on a scheduled basis, in response to user input, in response to meeting one or more trigger conditions, and/or in response to manually transmitted data. The ERP data received from ERP data source 108 may be provided in any suitable electronic data format.

The system 100 may include a similarity search module 109, which may include any one or more processors configured to perform one or more similarity search operations. The similarity search operation may use the input feature vector to calculate a distance between the feature vector and one or more (e.g., a set of) other feature vectors (e.g., feature vectors in a use case library and characterizing the phase application). The distance calculation may be, for example, a euclidean distance. The similarity search operation may include ranking distances from a minimum distance (most similar) to a maximum distance (least similar).

The module 109 may receive the feature vector 106 and may receive one or more additional feature vectors (e.g., feature vectors representing other source documents and generated in the same or similar manner as the feature vector 106) from the use case repository 110 (described in further detail below). The similarity search module 109 may compare the feature vector 106 to one or more feature vectors received from the use case library 110 to generate output data representing similarity between the feature vector 106 and the one or more feature vectors received from the use case library 110. Comparing feature vectors may include calculating a distance (e.g., a weighted distance) between the two feature vectors being compared. The output data may include a similarity score (e.g., including a distance metric such as euclidean distance) and/or an indication of one or more parameters that are similar or different between the compared vectors. In some embodiments, the system may search the use case library 110 for the most similar feature vector (or vectors) and may then find stored arbitration results for the identified most similar use case(s). The arbitration result for the identified most similar use case(s) may be included in the output data generated by the similarity search module 109, e.g., so that the system may arbitrate the current use case in a similar manner. The output data generated by module 109 may be provided to overall arbitration module 112, as described in further detail below.
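The lookup described above, in which the most similar stored use cases are found and their stored arbitration results are retrieved, can be sketched as follows. The library layout (a list of dictionaries with `id`, `vector`, and `result` keys), the function name, and the sample data are hypothetical; plain Euclidean distance is used, with smaller distance meaning more similar.

```python
import math


def nearest_use_cases(target, library, top_n=1):
    """Rank stored use cases by Euclidean distance to the target vector.

    library: list of dicts with keys 'id', 'vector', and 'result', where
    'result' is the stored arbitration outcome for that use case.
    Returns (id, distance, result) tuples, most similar first.
    """
    ranked = sorted(library, key=lambda case: math.dist(target, case["vector"]))
    return [
        (case["id"], math.dist(target, case["vector"]), case["result"])
        for case in ranked[:top_n]
    ]


library = [
    {"id": "A", "vector": (1.0, 0.0, 0.0), "result": True},
    {"id": "B", "vector": (0.0, 1.0, 1.0), "result": False},
    {"id": "C", "vector": (0.9, 0.1, 0.0), "result": True},
]
matches = nearest_use_cases((1.0, 0.0, 0.1), library, top_n=2)
print([(case_id, result) for case_id, _, result in matches])
```

The retrieved results are only evidence for the current use case; the overall arbitration step decides how much weight they carry.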

Use case library 110 may include any one or more computer storage devices, such as a database, data repository, live data feed, etc. The use case library 110 may be communicatively coupled to one or more other components of the system 100 and configured to provide data regarding prior evaluations and/or prior arbitrations to the system 100 so that the system may use that data in making a current evaluation/arbitration. In some embodiments, use case library 110 may store feature vectors representing previously ingested contract data and/or revision data, which may be generated in the same or similar manner as feature vector 106. In some embodiments, one or more components of system 100 may receive data from use case library 110 on a scheduled basis, in response to user input, in response to one or more trigger conditions being met, and/or in response to data being sent manually. The data received from the use case library 110 may be provided in any suitable electronic data format.

The system 100 may include a use case-vector clustering and categorization module 111, which may include any one or more processors configured to perform one or more use case vector clustering and/or categorization data processing operations. The module 111 may receive the feature vector 106 as input data and may receive one or more clusters of feature vectors from the use case library 110 as input data. Module 111 may process feature vector 106 and the received one or more feature vector clusters to generate a similarity measure and/or an indication of the most similar identified feature-vector clusters, in a manner similar to that described above with respect to module 109. The module 111 may find the arbitration result of the one or more feature vectors in the identified most similar cluster(s) and the result may be included in the output data generated by the module 111, e.g., so that the current use case may be arbitrated by the system in a similar manner. In some embodiments, the output data generated by module 111 may be provided to overall arbitration module 112, as described in further detail below.

In some embodiments, module 111 can additionally perform clustering based on feature vectors 106, and can store feature vectors 106 as part of one or more clusters in use case library 110.

The system 100 may include an overall arbitration module 112, which may include any one or more processors configured to perform one or more overall arbitration data processing operations. The module 112 may receive output data generated by one or more of the following modules: risk/timing/amount arbitration module 107, similarity search module 109, and/or use case-vector clustering and classification module 111. The module 112 may process the received data to generate output data that includes an overall arbitration of the contract data and/or revision data originally received from the data source 101. The overall arbitration output data generated by module 112 may include a binary indication of the arbitration (e.g., whether one or more criteria are met, such as whether the received data meets one or more business substance criteria, one or more related party criteria, and/or one or more retrievability criteria). In some embodiments, the overall arbitration output data may include a tuple (L, C), where L represents data indicating a likelihood of meeting one or more criteria (e.g., business substance), and where C represents data indicating a confidence level of the arbitration. The output data generated by the module 112 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.

Fig. 1B illustrates an exemplary architecture for ownership transfer classification and arbitration module 103 and associated system components, in accordance with some embodiments. As shown in fig. 1B, module 103 may receive input data (e.g., contract data 113 and/or portions of data derived therefrom) and may process the input data via a data processing pipeline to generate output data including ownership transfer classification data 118 (a).

As shown, contract data 113, which may be data received from data source 101 in fig. 1A, may be processed by structure classification module 114. The structure classification module 114 may be part of the document understanding module 102 in FIG. 1A. The structure classification module 114 may include any one or more processors configured to perform one or more structure classification data processing operations. The module 114 may receive the contract data 113 as input data and may process the received contract data 113 to generate portion data 115 representing structural classification information, including, for example, data indicating which portions of the contract data 113 include ownership transfer language. The portion data 115 may, for example, include an indication of a document page, document portion, and/or contract portion that is determined to include (or is determined not to include) ownership transfer language. The portion data 115 may include an indication of the type of ownership transfer language included in the identified portion(s). The portion data 115 may include an indication of a confidence level that indicates a confidence of one or more of the determinations indicated in the portion data 115.

In some embodiments, module 114 may perform one or more document structure and layout analysis data processing operations including, for example, partitioning a document into a plurality of different regions based on the layout of the document, and including, for example, classifying one or more of the regions into portion classes such as titles, portion titles, paragraphs, bulleted lists, numbered lists, graphics, tables, and the like. Machine learning and deep learning techniques may be utilized for this purpose.

The structure classification module 114 may generate portion data 115, which may form all or part of the input of the module 103. The input data received by module 103 may be provided in any suitable structured, partially structured, and/or unstructured format.

Module 103 may include a sentence classification module 116, which may include any one or more processors configured to perform one or more sentence classification data processing operations. In some embodiments, the sentence classification module 116 may classify each sentence in the document that may include a discussion of certain topics (such as ownership transfer), for example, using machine learning. The module 116 may receive the portion data 115 as input data and/or may receive the contract data 113 as input data and may process the received data to generate sentence data 117 representing sentence classification information, including, for example, information indicating which sentences of the contract data 113 include ownership transfer language. Sentence data 117 may, for example, include an indication of sentences determined to include (or determined not to include) ownership transfer language. Sentence data 117 can include an indication of the type of ownership transfer language included in the identified sentence(s). The sentence data 117 may include an indication of a confidence level that indicates the confidence of one or more of the determinations indicated in the sentence data 117.

Module 103 may include an ownership transfer classification module 118, which may include any one or more processors configured to perform one or more ownership transfer classification operations. The module 118 may receive sentence data 117, portion data 115, and/or contract data 113 as input data, and may process the received data to generate ownership transfer classification data 118 (a). Ownership transfer classification data 118 (a) may indicate a classification of the received data (and/or underlying contract data and/or revision data received from data source 101) that indicates an ownership transfer classification, such as by indicating whether the data represents a complete transfer of ownership, a partial transfer of ownership, or no ownership transfer. The ownership transfer classification data may also include an indication of a confidence level that indicates the confidence of one or more classifications. Ownership transfer classification data 118 (a) may constitute or may be included in output data representing ownership transfer classifications and/or ownership transfer decisions generated by module 103 described above with reference to fig. 1A. Ownership transfer classification data 118 (a) may be included in feature vector 106.

The module 103 may be communicatively coupled to a sample term database 119, and the sample term database 119 may include any one or more computer storage devices, such as databases, data stores, data repositories, live data feeds, and the like. The sample term database 119 may be communicatively coupled to the ownership transfer classification module 118 and configured to receive classification information related to documents, revisions, document portions, and/or sentences from the ownership transfer classification module 118. The terms themselves may be stored in the database 119 in association with the classification results associated with the terms. Ownership transfer classification module 118 may be configured to receive data stored in sample term database 119 and to use the received data to generate output data, such as by comparing sample terms received from database 119 with terms being analyzed.
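As one possible illustration of comparing sample terms from database 119 with a term being analyzed, the sketch below labels a clause with the classification of its most similar stored sample term. Token-level Jaccard similarity, the label strings, and the function names are assumptions chosen for brevity; the description does not state which comparison technique module 118 applies.

```python
import re
from typing import List, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens of a clause."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two clauses."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def classify_by_sample_terms(
    clause: str, samples: List[Tuple[str, str]]
) -> Tuple[Optional[str], float]:
    """Return (label, similarity) of the stored sample term closest to the clause.

    `samples` is a list of (sample_text, label) pairs, standing in for
    records read from a sample term database.
    """
    best_label: Optional[str] = None
    best_score = 0.0
    for sample_text, label in samples:
        score = jaccard(clause, sample_text)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

The returned similarity could serve as a rough confidence level for the borrowed classification; a production system would more plausibly use learned embeddings than token overlap.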

The module 103 may be communicatively coupled to a continuous learning module 120, which may include any one or more processors configured to perform one or more machine learning operations based on the term data and/or term classification data stored in the sample term database 119. The continuous learning module 120 may be used to train one or more data processing operations applied by the module 118 in order to improve the performance of the module 118.

FIG. 1C illustrates an exemplary architecture of the obligation classification and arbitration module 104 and associated system components, according to some embodiments. As shown in fig. 1C, module 104 may receive input data (e.g., contract data 113) and may process the input data through a data processing pipeline to generate output data including obligation classification data 127, 128, 129, 130, 131, and/or 132.

The module 104 may receive input data including contract data 113, which contract data 113 may be data received from the data source 101 in FIG. 1A.

Module 104 may include a language modality classification module 121, which may include any one or more processors configured to perform one or more language modality data processing operations. Module 121 may receive contract data 113 and may process the received data to generate output data including an indication of one or more language modalities, such as cognitive output data 123 and/or sense output data 124. The output data 123 and/or 124 may include an indication of language modalities and/or associated confidence levels.

The module 104 may include a structure classification module 122 that may share any one or more features in common with the structure classification module 114 described above with reference to fig. 1B. The structure classification module 122 may receive the contract data 113 and may process the received data to generate output data including an indication of one or more portions within the contract and/or the revision represented by the contract data. In some embodiments, structure classification module 122 and language modality classification module 121 may work together to generate language modality data (e.g., output data 123 and/or 124) corresponding to one or more particular portions identified in the document represented by contract data 113.

The module 104 may include an obligor/obligee/beneficiary classifier module 125 that may include any one or more processors configured to perform one or more obligor/obligee/beneficiary classifier data processing operations. Module 125 may receive language modality classification data (e.g., data 123 and/or 124), structure classification data (e.g., as generated by module 122), and/or contract data 113, and may process the received data to generate output data classifying the received data according to whether the input data relates to an obligor, an obligee, and/or a beneficiary. In some embodiments, module 125 may receive as input sense data 124 and may not receive as input cognitive data 123.

The module 104 may include an obligation classifier module 126, which may include any one or more processors configured to perform one or more obligation classifier data processing operations. Module 126 can receive language modality classification data (e.g., data 123 and/or 124), structure classification data (e.g., as generated by module 122), and/or contract data 113, and can process the received data to generate output data classifying the received data according to whether the input data relates to obligations, permissions, and/or prohibitions. For example, the module 126 may generate the obligation output data 127 (identifying portions of the contract represented by the contract data 113 that relate to obligations and/or associated confidence levels), the permission output data 128 (identifying portions of the contract represented by the contract data 113 that relate to permissions and/or associated confidence levels), and/or the prohibition output data 129 (identifying portions of the contract represented by the contract data 113 that relate to prohibitions and/or associated confidence levels). In some embodiments, module 126 may receive as input sense data 124 and may not receive as input cognitive data 123.

In some embodiments, modules 125 and 126 may work (individually and/or together) to generate output data that associates the identified obligations (e.g., 127), permissions (e.g., 128), and/or prohibitions (e.g., 129) with one or more identified obligors, obligees, and/or beneficiaries. Obligor output data 130 (identifying obligations, permissions, and/or prohibitions associated with the obligor), obligee output data 131 (identifying obligations, permissions, and/or prohibitions associated with the obligee), and/or beneficiary output data 132 (identifying obligations, permissions, and/or prohibitions associated with the beneficiary) may thereby be generated.

The obligation classification data 127, 128, 129, 130, 131 and/or 132 as generated by the module 104 may constitute output data representing or may be included in the obligation classification and/or obligation arbitration as generated by the module 104 described above with reference to fig. 1A. The obligation classification data may be included in the feature vector 106.
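To make the three-way classification into obligations, permissions, and prohibitions concrete, the following sketch uses a simple keyword heuristic in place of the trained classifier contemplated for module 126. The regular expressions, label strings, and function name are illustrative assumptions only and are not drawn from the description.

```python
import re
from typing import Optional

# Hypothetical cue phrases; a trained classifier would replace these rules.
PROHIBITION = re.compile(r"\b(shall|must|may|will)\s+not\b|\bprohibited\b", re.I)
OBLIGATION = re.compile(r"\b(shall|must|agrees to|is required to)\b", re.I)
PERMISSION = re.compile(r"\b(may|is entitled to|is permitted to)\b", re.I)


def classify_sentence(sentence: str) -> Optional[str]:
    """Label a contract sentence as prohibition, obligation, permission, or None.

    Prohibition cues are checked first so that "shall not" is not
    mistaken for an obligation and "may not" is not mistaken for a permission.
    """
    if PROHIBITION.search(sentence):
        return "prohibition"
    if OBLIGATION.search(sentence):
        return "obligation"
    if PERMISSION.search(sentence):
        return "permission"
    return None
```

A rule set of this kind cannot supply the confidence levels described above; it is shown only to clarify the categories that the output data 127, 128, and 129 distinguish.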

FIG. 1D illustrates an exemplary architecture of the transaction price classification and arbitration module 105 and associated system components, according to some embodiments. As shown in fig. 1D, module 105 may receive input data (e.g., contract data 113 and/or portions of data derived therefrom) and may process the input data via a data processing pipeline to generate output data including transaction price classification data 142-144.

As shown, contract data 113, which may be data received from data source 101 in fig. 1A, may be processed by structure classification module 133. The structure classification module 133 may be part of the document understanding module 102 in FIG. 1A. The structure classification module 133 may include any one or more processors configured to perform one or more structure classification data processing operations. The module 133 may receive the contract data 113 as input data and may process the received contract data 113 to generate portion data 134 representing structural classification information, including, for example, data indicating which portions of the contract data 113 include price or consideration language. The portion data 134 may, for example, include an indication of a document page, document portion, and/or contract portion that is determined to include (or is determined not to include) price or consideration language. The portion data 134 may include an indication of the type of price or consideration language included in the identified portion(s). The portion data 134 may include an indication of a confidence level that indicates a confidence of one or more of the determinations indicated in the portion data 134.

In some embodiments, module 133 may perform one or more document structure and layout analysis data processing operations including, for example, partitioning a document into a plurality of different regions based on the layout of the document, and including, for example, classifying one or more of the regions into portion classes such as titles, portion titles, paragraphs, bulleted lists, numbered lists, graphics, tables, and the like. Machine learning and deep learning techniques may be utilized for this purpose.

The structure classification module 133 may generate portion data 134, which may form all or part of the input of the module 105. The input data received by module 105 may be provided in any suitable structured, partially structured, and/or unstructured format.

Module 105 may include a sentence classification module 135, which may include any one or more processors configured to perform one or more sentence classification data processing operations. In some embodiments, the sentence classification data processing operations may include segmenting the document into a plurality of different sentences, and may include classifying one or more of the sentences into sentence subclasses. Machine learning and deep learning techniques may be utilized for this purpose. Sentence classification may be leveraged to locate particular content, because certain types of sentences may be known to include content relevant to a particular focus, such as a base price (fixed consideration) and potential discount rules (variable consideration).

The module 135 may receive the portion data 134 as input data and/or may receive the contract data 113 as input data and may process the received data to generate sentence data 136 representative of sentence classification information, including, for example, data indicating which sentences of the contract data 113 include price or consideration language. Sentence data 136 may, for example, include an indication of sentences determined to include (or determined not to include) price or consideration language. Sentence data 136 may include an indication of the type of price or consideration language included in the identified sentence(s). The sentence data 136 may include an indication of a confidence level that indicates a confidence level of one or more of the determinations indicated in the sentence data 136.

Module 105 may include a price classification module 137, which may include any one or more processors configured to perform one or more price classification operations. The module 137 may receive sentence data 136, portion data 134, and/or contract data 113 as input data, and may process the received data to generate price classification data. The price classification data may indicate a classification of the received data (and/or the underlying contract data and/or revision data received from the data source 101) that indicates the price classification. The price classification data may also include an indication of a confidence level that indicates the confidence of one or more of the classifications.

The module 105 may be communicatively coupled to a sample term database 138, and the sample term database 138 may include any one or more computer storage devices, such as a database, a data store, a data repository, a live data feed, and the like. The sample term database 138 may be communicatively coupled to the price classification module 137 and configured to receive classification information related to documents, revisions, document parts, and/or sentences from the price classification module 137. The terms themselves may be stored in the database 138 in association with the classification result associated with the terms. The price classification module 137 may be configured to receive data stored in the sample term database 138 and use the received data to generate output data, such as by comparing sample terms received from the database 138 with terms being analyzed. In some embodiments, a single database may be used in place of the sample term database 138 and the sample term database 119.

The module 105 may be communicatively coupled to a continuous learning module 139, and the continuous learning module 139 may include any one or more processors configured to perform one or more machine learning operations based on the term data and/or term classification data stored in the sample term database 138. The continuous learning module 139 may be used to train one or more data processing operations applied by the module 137 in order to improve the performance of the module 137.

Module 105 may include a dependency resolution module 140, which may include any one or more processors configured to perform one or more dependency resolution data processing operations. The dependency resolution data processing operations may include determining dependencies between phrases of a sentence in order to determine a grammatical structure of the sentence. The sentence may be divided into sub-portions based on the determination. Dependency resolution may be based on the assumption that there is a direct relationship between each linguistic unit in a sentence. Module 140 may receive the price classification data from module 137 and may process the price classification data to generate dependency resolution output data. The dependency resolution output data may include dependency graphs describing syntactic relationships between different parts of sentences.

Module 105 may include a mapping module 141 that may include any one or more processors configured to perform one or more mapping data processing operations to map the price data to the obligation and/or duration data. Module 141 may accept input data from module 140, module 137, and/or module 126 in fig. 1C. The obligations identified by module 126 may include an indication of responsibility (e.g., permission and/or prohibition) from one party to the other. For example, the price (e.g., payment milestones, potential discount rules) as identified by module 137 may be mapped to a corresponding obligation (e.g., as identified by module 126) based on output from dependency resolution module 140 for one or more sentences describing the price.

The output data generated by the mapping module 141 may include data indicating correspondence between (a) one or more of the considerations included in the contract and/or revision represented by the contract data 113 and (b) one or more determined obligations and/or durations. The output data may also include one or more confidence levels (e.g., scores) associated with any determined mapping. The output data generated by the mapping module 141 may include transaction price classification data 142-144, which may indicate associations between identified obligations, corresponding fixed considerations, and corresponding variable considerations.
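The mapping of price data to obligations on the basis of dependency output can be sketched as follows. Here the dependency graph is supplied by hand as (head, dependent) pairs, and each price token is mapped to the obligation anchor that is closest to it in the graph; the shortest-path rule, the token names, and the example parse are assumptions for illustration and are not stated in the description.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (head token, dependent token)


def graph_distances(edges: Iterable[Edge], start: str) -> Dict[str, int]:
    """Breadth-first distances from `start`, treating edges as undirected."""
    adjacency: Dict[str, Set[str]] = {}
    for head, dependent in edges:
        adjacency.setdefault(head, set()).add(dependent)
        adjacency.setdefault(dependent, set()).add(head)
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def map_prices_to_obligations(edges: List[Edge],
                              price_tokens: List[str],
                              obligation_anchors: List[str]) -> Dict[str, str]:
    """Map each price token to the nearest obligation anchor in the graph."""
    mapping: Dict[str, str] = {}
    for price in price_tokens:
        distances = graph_distances(edges, price)
        reachable = [a for a in obligation_anchors if a in distances]
        if reachable:
            mapping[price] = min(reachable, key=lambda a: distances[a])
    return mapping


# Hand-written parse of "Seller shall deliver SKU-1 and Buyer shall pay $100".
EXAMPLE_EDGES: List[Edge] = [
    ("deliver", "Seller"), ("deliver", "shall_1"), ("deliver", "SKU-1"),
    ("deliver", "pay"),
    ("pay", "Buyer"), ("pay", "shall_2"), ("pay", "$100"),
]
```

In this example the price token "$100" is one edge away from the verb "pay" and two edges away from "deliver", so it is associated with the payment obligation rather than the delivery obligation.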

The transaction price classification data 142-144 may include a list of obligations with fixed and variable considerations (e.g., fixed price, discount). For example, obligations may include:

1. sku_1, base price_1, volume_discount_1;

2. sku_2, base price_2, fixed_discount_2;

3. sku_3, base price_3, complex discount rules.

The transaction price classification data 142-144 may constitute output data representing a transaction price classification and/or transaction price arbitration generated by the module 105 as described above with reference to fig. 1A, or may be included therein. Transaction price classification data may be included in feature vector 106.

FIG. 1E illustrates an exemplary feature vector data structure for use in representing information extracted from a document and arbitrating business substance, in accordance with some embodiments. The data structure shown may be a data structure for feature vector 106. As shown, the feature vector may include a component indicating ownership transfer information, such as by indicating whether full transfer, partial transfer, or no transfer is determined. The feature vector may include a plurality of components, each component indicating an obligation, a corresponding fixed price, and a corresponding variable price. Each component in the feature vector may be associated with one or more confidence levels that may be used to weight the associated feature vector component.

In some embodiments, feature vector 106 may include one or more confidence values associated with values in feature vector 106 and/or one or more values indicating a quantity, amount, or degree of evidence associated with values in feature vector 106. The confidence value and/or evidence value may be provided as a component weight (or part of a component weight) in the feature vector 106 and may be used in calculating the distance between different feature vectors.

In some embodiments, feature vector 106 may be six-dimensional, having three components, each component having two dimensions. The first component may include a first dimension representing the presence or absence of evidence for ownership transfer, and a second dimension representing a confidence level and/or evidence level associated with the first component. The second component may include a first dimension representing the presence or absence of evidence for the obligation, and a second dimension representing a confidence level and/or evidence level associated with the second component. And the third component may include a first dimension representing the presence or absence of evidence for a price (e.g., a transaction price), and a second dimension representing a confidence level and/or evidence level associated with the third component.
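A minimal sketch of the six-dimensional arrangement, and of one way the confidence dimensions might weight a distance calculation, is given below. The class names, the encoding of presence as 1.0 or 0.0, and the choice of the product of the two confidences as the component weight are assumptions made for this sketch and are not specified in the description.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Component:
    present: float      # 1.0 if evidence was found, 0.0 otherwise
    confidence: float   # confidence and/or evidence level in [0, 1]


@dataclass(frozen=True)
class FeatureVector:
    ownership_transfer: Component
    obligation: Component
    price: Component

    def components(self) -> Tuple[Component, Component, Component]:
        return (self.ownership_transfer, self.obligation, self.price)

    def as_list(self) -> List[float]:
        """Flatten to six values: (present, confidence) per component."""
        return [v for c in self.components() for v in (c.present, c.confidence)]


def weighted_distance(a: FeatureVector, b: FeatureVector) -> float:
    """Euclidean distance over the presence dimensions, weighted per component.

    Assumed weighting: the product of the two confidences, so that a
    disagreement counts for less when either side is uncertain.
    """
    total = 0.0
    for ca, cb in zip(a.components(), b.components()):
        weight = ca.confidence * cb.confidence
        total += weight * (ca.present - cb.present) ** 2
    return math.sqrt(total)
```

Under this weighting, two vectors that disagree only on a low-confidence component are treated as nearly identical, which is one way the weights could influence a similarity search.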

Fig. 1F illustrates an exemplary architecture of risk/timing/amount arbitration module 107 and associated system components, in accordance with some embodiments. As shown, the module 107 may provide a data processing pipeline that accepts input data including feature vectors 145 and ERP data from the ERP data source 108 and processes the received data to generate output data (and associated confidence levels) representing corresponding changes in risk, timing, and monetary amounts. Feature vector 145 may share any one or more characteristics in common with feature vector 106 and may indicate, for example, (obligation_i, fixed consideration_i, variable consideration_i) for the i-th obligation, together with the assigned transaction price (fixed consideration) and discount (variable consideration). As shown, the ERP data source 108 itself may receive ERP data from any suitable data source for revision and/or reference of other (e.g., previous) contracts in addition to the contract represented by the analyzed input data.

Module 107 may include three parallel classification modules 146, 147, and 148, each of which may include any one or more processors configured to perform a respective data analysis operation based on received input data.

The risk classification module 146 may accept the feature vector 145 (or 106) as input and may generate output data indicative of the risk value and associated confidence level. The risk value may indicate whether and/or to what extent the obligation indicated in the document data substantially changed the profile of the risk (e.g., future cash flows will no longer occur).

The timing classification module 147 may accept the feature vector 145 (or 106) as input and may generate output data indicative of the timing value and associated confidence level. The output data generated by module 147 may indicate whether and/or to what extent the timing (e.g., timing of cash flow) was substantially changed by the obligation indicated in the classified document data.

The price/amount classification module 148 may accept the feature vector 145 (or 106) as input and may generate output data indicative of a price/amount value (e.g., a value representing an amount of money) and an associated confidence level. The price/amount value may indicate whether and/or to what extent the obligation indicated in the document data substantially changes the amount paid as a result of the transaction.

Module 107 may include three parallel comparison modules 151, 152, and 153, each of which may include any one or more processors configured to perform a respective comparison of (a) the values determined by classification modules 146, 147, and 148, respectively, with (b) the corresponding ERP data.

The risk comparison module 151 may compare the risk value determined by the risk classification module 146 with ERP data representing the risk value, and may generate risk change data 154, which may include the risk change value and a risk change confidence level associated with the determination of the risk change value.

The timing comparison module 152 may compare the timing value determined by the timing classification module 147 to ERP data representing the timing value and may generate timing change data 155, which may include the timing change value and a timing change confidence level associated with the determination of the timing change value.

The price/amount comparison module 153 may compare the price/amount value determined by the price/amount classification module 148 with ERP data representing the price/amount value and may generate price/amount change data 156, which may include price/amount change values and price/amount change confidence levels associated with the determination of price/amount change values.

The change values (and associated confidence levels) 154, 155, and/or 156 may constitute or may be included in the output data generated by the module 107 described above with reference to fig. 1A. The change values may be accepted as an input by the overall arbitration module 112.
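The comparison performed by modules 151, 152, and 153 can be illustrated with a single helper applied once each to risk, timing, and price/amount. Expressing the change as a signed difference and taking the smaller of the two confidences are assumptions for this sketch; the description states only that a change value and an associated confidence level are generated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    value: float        # classified value, or change value
    confidence: float   # associated confidence level in [0, 1]


def compare_with_erp(classified: Scored, erp_value: float,
                     erp_confidence: float = 1.0) -> Scored:
    """Return the change of a classified value relative to its ERP counterpart.

    Assumed rules: change = classified value minus ERP value, and the
    change is no more trustworthy than the less certain of its two inputs.
    """
    change = classified.value - erp_value
    return Scored(change, min(classified.confidence, erp_confidence))
```

For instance, a classified risk value of 0.7 with confidence 0.9 compared against an ERP risk value of 0.4 would yield a risk change of about 0.3 with confidence 0.9.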

FIG. 1G illustrates an exemplary architecture of the overall arbitration module 112 and associated system components, according to some embodiments. As shown, the overall arbitration module 112 provides a data processing pipeline that may accept the similarity search output data from the similarity search module 109 and may accept the arbitration output data from the arbitration module 107. The overall arbitration module 112 may then process the received input data to generate output data that includes an overall arbitration of the contract data and/or revision data originally received from the data source 101.

As shown in fig. 1G, the similarity search module 109 may use the use case feature vector and data from the use case library 110 to generate similarity search output data, and the similarity search output data may be transmitted to the module 112. The similarity search output data may indicate one or more use cases from the use case library 110 that are most similar (or sufficiently similar, e.g., exceed a similarity threshold) to the subject use case. The output from the similarity search module 109 may be received by the arbitration module 158, and the arbitration module 158 may apply one or more data processing operations to find and/or process the result data from the indicated similar use case. The arbitration module 158 may generate output data indicative of risk changes and associated confidence levels, timing changes and associated confidence levels, and monetary changes and associated confidence levels based on its analysis of identified similar use cases. The output data generated by module 158 may be forwarded to arbitration coordination engine 160.

Meanwhile, the arbitration coordination engine 160 may also receive output data from the arbitration module 107. As discussed above, the arbitration module 107 may generate output data indicative of risk changes and associated confidence levels, timing changes and associated confidence levels, and monetary changes and associated confidence levels. In some embodiments, the data received by engine 160 from module 107 and from module 158 may be in the same format; in some embodiments, the data received by engine 160 from module 107 and from module 158 may be in different formats.

The arbitration coordination engine 160 may apply one or more data processing operations to coordinate, combine, and/or otherwise process the received input data to generate output data as described above. In some embodiments, the arbitration coordination engine 160 may average the corresponding input values weighted according to the confidence values and/or weighted according to one or more other weighting factors. In some embodiments, the arbitration coordination engine 160 may select preferred values and/or discard non-preferred values. The overall arbitration output data generated by module 112 may include a binary indication of the arbitration (e.g., whether one or more criteria are met, such as whether the received data meets one or more business substance criteria, one or more related party transaction criteria, and/or one or more retrievability criteria). In some embodiments, the overall arbitration output data may include tuples (L, C), where L represents data indicating a likelihood of meeting one or more criteria (e.g., business substance), and where C represents data indicating a confidence level of the arbitration. The output data generated by the module 112 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
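The two coordination strategies mentioned for engine 160, confidence-weighted averaging and selection of a preferred value, can be sketched as follows, together with a derivation of the tuple (L, C) and a binary indication. The function names, the use of the mean confidence as C, and the 0.5 threshold are assumptions for illustration only.

```python
from typing import List, Tuple

Estimate = Tuple[float, float]  # (value, confidence)


def weighted_average(estimates: List[Estimate]) -> Estimate:
    """Confidence-weighted mean of the values; C is the mean confidence (assumed)."""
    total_weight = sum(conf for _, conf in estimates)
    if not estimates or total_weight == 0.0:
        return (0.0, 0.0)
    value = sum(val * conf for val, conf in estimates) / total_weight
    return (value, total_weight / len(estimates))


def select_preferred(estimates: List[Estimate]) -> Estimate:
    """Keep the estimate with the highest confidence and discard the rest."""
    return max(estimates, key=lambda e: e[1])


def overall_arbitration(estimates: List[Estimate],
                        threshold: float = 0.5) -> Tuple[bool, Estimate]:
    """Return (criteria_met, (L, C)) from per-source likelihood estimates."""
    likelihood, confidence = weighted_average(estimates)
    return (likelihood >= threshold, (likelihood, confidence))
```

Given one estimate from module 107 of (0.8, 0.9) and one from module 158 of (0.4, 0.3), the weighted average is approximately (0.7, 0.6), so the criteria would be treated as met under the assumed threshold.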

Related Parties

As explained above, there is a need for improved methods for AI-enhanced automatic analysis of documents in order to quickly and efficiently make various decisions based on the document, including decisions as to whether the document indicates a relationship between two or more parties and whether the document represents a transaction, agreement, contract, arrangement, or other interaction between related parties.

Improved systems that meet these needs may find application in a variety of use cases, including rapid and accurate assessment of compliance with regulations and/or best practices regarding related party transactions. For example, compliance with ASC 850 may require identifying a related party and arbitrating whether one or more interactions constitute a related party transaction.

In some embodiments, compliance with regulations and/or best practices may require that financial statements disclose material related party transactions other than compensation arrangements, expense allowances, and other similar items in the ordinary course of business. For these purposes, a related party may be defined as any party that controls, or can significantly influence, the management or operating policies of another entity to the extent that the other entity may be prevented from fully pursuing its own separate interests. Related parties may include affiliates, investees accounted for by the equity method, trusts established for the benefit of employees, principal owners, management, and/or immediate family members of owners and/or management. In some embodiments, compliance and/or best practices may require disclosure of transactions with related parties even if such transactions are not given accounting recognition (e.g., even if a service is performed without payment). In some embodiments, compliance and/or best practices may require that disclosures not assert that the terms of a related party transaction were essentially equivalent to those of an arm's-length transaction unless those assertions can be substantiated. In some embodiments, compliance and/or best practices may require that, if the reported financial position or operating results of an entity could change significantly due to common control or common management, the nature of the ownership or management control be disclosed even if there are no transactions between the entities.

Examples of related party transactions (or other interactions) may include transactions (or other interactions) between: a parent entity and its subsidiaries; two or more subsidiaries of the same parent entity; an entity and trusts established for the benefit of the entity's employees (such as pension trusts or profit-sharing trusts managed by or under the trusteeship of the entity); an entity and its principal owners, management, and/or members of their immediate families; and/or affiliates.

Transactions between related parties may occur in the ordinary course of business. Examples of common transactions between related parties may include: sales, purchases, and/or transfers of real and/or personal property; services such as accounting, management, engineering, and/or legal services; use of property and/or equipment by lease or otherwise; borrowing, lending, and/or guarantees; maintaining a compensating bank balance for the benefit of a related party; intra-entity billing based on allocations of common costs; and filing consolidated tax returns.

Thus, compliance with regulations and/or best practices may require accurate identification of which parties should be considered related parties to each other, and which interactions between parties should be considered related party transactions (e.g., which interactions satisfy one or more related party transaction criteria).

Traditional methods for determining relationships between parties and for identifying and evaluating related party transactions rely on voluntary manual disclosure and manual evaluation, which introduces inaccuracy due to human error, inefficiency, incompleteness, and the potential for human-introduced bias or dishonesty. Using conventional methods, it is difficult to identify undisclosed related parties from sampled transactions and to distinguish between pure accounting errors and accounting fraud.

Accordingly, there is a need for systems and methods for automated identification of related parties and automated identification and arbitration of related party transactions based on processed documents or other data, in order to improve efficiency and accuracy and reduce human-induced bias.

Disclosed herein are systems configured for AI-enhanced identification of related parties and for arbitrating whether interactions (e.g., arrangements, agreements, contracts, transactions, or other underlying data) represented by one or more ingested documents (or other data) meet related party criteria. As explained herein, the systems disclosed herein may automatically generate and iteratively/recursively augment data structures, such as relational graphs, representing relationships between parties based on received documents and/or queries of multiple data sources. The systems disclosed herein may also automatically arbitrate whether an interaction satisfies the related party criteria (e.g., whether a transaction constitutes a related party transaction) based on the generated data structure representing the related parties and/or based on anomaly detection and behavioral modeling of one or more parties to the interaction.

In some embodiments, a system for automatically determining relationships between parties and for automatically arbitrating related party transactions is provided. The system may be configured to receive one or more documents (e.g., PDF documents, word processing documents, JPG documents, etc.) or other data and automatically process the received documents to extract information from the documents. The extracted information may be evaluated to identify and characterize relationships between one or more parties, identify additional relationships between parties based on querying additional data sources, and determine whether the information represents one or more interactions (such as contracts or transactions). The system may then evaluate the extracted information regarding those one or more interactions to determine whether the one or more interactions satisfy predefined (or dynamically determined) relevant party transaction criteria.

In some embodiments, determining whether the relevant party transaction criteria are met may be performed based at least in part on evaluating a data structure (which may be system generated) (e.g., a graph data structure) representing the relationship between the parties. Determining whether the relevant party transaction criteria are met may be performed based at least in part on behavioral modeling. In some embodiments, determining whether the relevant party transaction criteria are met may be performed at least in part by generating and evaluating one or more feature vectors. The system may be configured to automatically generate feature vectors (which may be referred to as "use case vectors") representing identified interactions (e.g., identified contracts, transactions, etc.) in the received document.

In some embodiments, the system may initially identify one or more relationships between parties based on information extracted from one or more documents and/or based on information submitted by one or more users. For example, one or more document understanding techniques may be used to automatically extract initial information about the relevant parties from the public disclosure documents and/or from documents provided during the auditing process. The initial information may identify, for example, a board of directors, stakeholders, bond holders, investors, and/or other stakeholders associated with the corporate entity. This initial information may be used to generate a data structure, such as a graph data structure, that represents an initial understanding of known relationships between multiple parties. In some embodiments, entities may be represented as nodes in a graph data structure, and relationships between entities may be represented as edges linking together a collection of nodes in a graph data structure.

After generating the initial version of the data structure representing the relationships, the system may augment or otherwise update the data structure by building on it to represent additional relationships not disclosed in the initially processed documents (and/or to represent additional information about relationships already depicted). In some embodiments, the system may augment or otherwise update the data structure by sending one or more queries to one or more entity databases, where the query input is based on the name/identity of the target entity and/or the name/identity of one or more entities that have been initially determined to be related to the target entity. New entities returned as a result of these queries may be disambiguated and then added to the data structure representing the related parties. The data structure representing the related parties may be updated in response to a user request, in response to a triggering event, in response to receiving new data/documents, and/or in response to receiving a request to make an arbitration regarding a potential related party transaction involving an entity represented by the data structure.

In some embodiments, the system may additionally be configured to perform behavioral modeling on one or more entities that are the subject of, or are otherwise included in, a data structure (e.g., a graph data structure) representing relationships of the entities. For example, the system may analyze and evaluate a history of one or more entities with respect to behavior, such as any one or more of the following:

order-to-cash behavior (e.g., on a per entity basis, during accounting, assessing and identifying differences between retrievability and original credit assessment);

explicit and implicit discounts for a given action (e.g., as recorded in ERP data and/or as determined from actual payment data);

reclaiming activity (e.g., activity regarding overdue invoices, treatment of benefits/readies (e.g., reimbursement of outstanding invoices a certain number of days after the invoice expires)); and/or

management override behavior.

In some embodiments, behavioral modeling may be based on data extracted from one or more documents received by the system and subjected to one or more document understanding processes, as described herein.

Once the data structure representing the related parties has been created and behavioral modeling has been performed on one or more of the parties represented in the data structure, the system can use the data structure and the results of the behavioral modeling to determine any correlations (according to the behavioral models) between the behaviors of entities that are represented in the same data structure and indicated as being related to each other. In some embodiments, the system may determine that highly correlated behavioral models (e.g., a correlation score exceeding a particular threshold) indicate a high risk of a related party anomaly. In some embodiments, the system may be configured to assign a score quantifying an estimated risk of a related party anomaly, as determined based on the behavioral models and the data structure representing the related parties. For example, if the behavior observed in a transaction is highly correlated with a behavioral model that the system has determined to be indicative of parties being related to each other, and if the data structure indicates that the parties to the transaction are related to each other, the system may determine that there is a risk that the transaction is a "related party transaction" that meets one or more related party transaction criteria, such as requiring reporting or disclosure to comply with regulations or best practices.
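The combination of behavioral correlation and graph relatedness described above can be illustrated with a minimal sketch. The description does not specify a correlation measure or a scoring formula, so the Pearson correlation, the function names `pearson` and `related_party_risk`, the 0.8 threshold, and the three-level score used here are all assumptions chosen for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(vx * vy)


def related_party_risk(behavior_a, behavior_b, related_in_graph,
                       corr_threshold=0.8):
    """Hypothetical three-level risk score in [0, 1].

    behavior_a / behavior_b: per-period behavioral measurements for two
    entities (for example, days-to-pay per month); an assumed encoding.
    related_in_graph: whether the relationship data structure marks the
    two entities as related.
    """
    highly_correlated = pearson(behavior_a, behavior_b) >= corr_threshold
    if highly_correlated and related_in_graph:
        return 1.0  # both signals agree: highest estimated risk
    if highly_correlated or related_in_graph:
        return 0.5  # only one signal present
    return 0.0


# Illustrative data only: days-to-pay per month for two entities.
risk = related_party_risk([30, 45, 60, 40], [32, 47, 61, 41],
                          related_in_graph=True)
```

A production system would presumably use richer behavioral models than a single series per entity; the sketch only shows how a correlation threshold and a graph lookup might jointly feed a risk score.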

In some embodiments, the system may arbitrate interactions, such as transactions, represented by documents received by the system, where the arbitration determines whether the interactions meet one or more relevant party transaction criteria (e.g., according to ASC 850), e.g., require reporting or disclosure to comply with regulations or best practices.

FIG. 2A illustrates an exemplary method 200 for generating a plurality of graph data structures representing relationships between entities. In some embodiments, graph data structures generated in accordance with method 200 may be used to make one or more decisions about a related party, for example, by applying one or more data processing operations to the generated graph data structure(s) to automatically determine whether data representing two or more parties meets related party criteria, and/or by applying one or more data processing operations to the generated graph data structure(s) to automatically determine a quantification of a relationship (e.g., a relationship type and/or a relationship score) between two or more parties. In some embodiments, a system performing method 200 (e.g., a system including one or more processors configured to perform the method based on received data representing multiple parties) may store, transmit, display, and/or visualize output data including graph data structures as generated by method 200 and/or including determinations made based on automatic analysis of one or more of the graph data structures. In some embodiments, the system may perform one or more automated system actions that are triggered based on the output data.

As indicated by block 201, in some embodiments, the method steps shown downstream of block 201 may be performed for a given entity. Entities may include people, companies, partnerships, organizations, government, universities, towns, cities, countries, etc. A system performing method 200 may receive data representing a given entity, including structured, unstructured, and/or partially structured data, for example. In some embodiments, the system may extract data about a given entity from one or more documents, for example, by applying one or more document understanding techniques. In some embodiments, the system may identify a given entity from among a plurality of entities represented in the received data. In some embodiments, a given entity may be specified by user input received by the system.

Turning first to blocks 202-208, the system may generate a first graph data structure representing board relationships, parent relationships, and/or child relationships for a given entity.

At block 202, in some embodiments, the system may apply one or more data analysis operations to automatically identify one or more directors (board members) associated with a given entity based on received documents and/or other data representing the given entity. In some embodiments, a person or entity having a different designated role (other than director) may alternatively (or additionally) be identified at block 202. In some embodiments, the system may store data and/or metadata representing one or more directors identified as being associated with a given entity locally and/or remotely. Data indicating the identity of each director may be stored, and metadata representing information about the director's role (e.g., time information, location information, etc.) may be stored in association therewith.

At block 204, in some embodiments, the system may apply one or more data analysis operations to automatically identify one or more entities associated with the identified board(s).

The system may determine which entities are associated with the identified board(s) by analyzing the received documents and/or other data representing the given entity, the received documents and/or other data representing other entities, and/or any other data available to the system. In some embodiments, the system may identify the interested party based only on information that the system has available. In some embodiments, the system may actively seek and retrieve information associated with one or more of the identified boards, for example, by crawling the information from publicly available data sources, in order to identify entities related to the one or more boards.

In some embodiments, the system may apply one or more relationship scoring algorithms to quantify the degree of relationship between two entities in order to determine whether an entity should be designated as being "related" (e.g., "associated") with a board of directors. In some embodiments, an entity pair may be designated as related only if the relationship score of the entity pair exceeds a predefined (or dynamically determined) threshold.

At block 206, in some embodiments, the system may apply one or more data analysis operations to automatically identify one or more parent entities and/or one or more child company entities associated with a given entity. In some embodiments, entities having different relationships (other than parent or child) may alternatively (or additionally) be identified at block 206. In some embodiments, the system may store data and/or metadata representing one or more entities identified as being associated with a given entity locally and/or remotely. Data indicative of the identity of the entity may be stored, and metadata representing information about the entity (e.g., time information, location information, etc.) may be stored in association therewith.

The system may identify the associated entity by analyzing the received document and/or other data representing the given entity, the received document and/or other data representing other entities, the received document and/or other data representing one or more identified directors, and/or any other data available to the system. In some embodiments, the system may identify the interested party based only on information that the system has available. In some embodiments, the system may actively seek and retrieve information associated with one or more identified boards, for example, by crawling the information from publicly available data sources, in order to identify entities related to the one or more boards.

In some embodiments, the system may apply one or more relationship scoring algorithms to quantify the degree of relationship between two entities in order to determine whether an entity should be designated as "related" (e.g., "associated") to a given entity. In some embodiments, an entity pair may be designated as related only if the relationship score of the entity pair exceeds a predefined (or dynamically determined) threshold.

At block 208, in some embodiments, the system may generate and store a graph data structure representing the given entity and one or more relationships between the given entity and other entities. Other entities in the graph data structure may include people, companies, partnerships, organizations, governments, universities, towns, cities, countries, etc. In some embodiments, other entities in the graph data structure may include one or more directors identified at block 202 and/or one or more subsidiaries or parent entities identified at block 206. In some embodiments, an identified entity may be included in the graph data structure only if it meets one or more criteria, such as having a relationship score that exceeds a threshold.

In some embodiments, the graph data structure may represent only certain kinds of relationships. For example, the graph data structure generated at block 208 may represent entities related to a given entity by: (a) being a director of the given entity, (b) being a subsidiary of the given entity, or (c) being a parent of the given entity; while other kinds of relationships (e.g., being an employee of the given entity, being in a partnership with the given entity, etc.) may not be included.

In some embodiments, the graph data structure may represent entities as nodes and may represent relationship information as edges linking pairs of nodes. The graph data structure can store identification data and/or metadata associated with nodes representing entities. The graph data structure can store information that identifies, quantifies, and/or otherwise characterizes a relationship between two entities as an edge of a linked node pair. In some embodiments, edges may be weighted or otherwise configured according to data indicating the type, strength, or other characteristics of a relationship between two nodes. For example, edges may be weighted according to a relationship score such that edges have a higher weight when two entities are more closely related. In some embodiments, all information available to the system about various relationship types (and various corresponding relationship strengths) between two entities may be combined and normalized into a single relationship-score quantification.
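As a minimal sketch of such a structure, the following stores entities as nodes with metadata and relationships as undirected weighted edges. The class name `RelationshipGraph`, its method names, and the rule for combining several relationship types into one score (a noisy-OR over per-type strengths in [0, 1]) are assumptions for illustration; the description does not prescribe a particular normalization.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipGraph:
    """Entities as nodes, relationships as undirected weighted edges."""
    nodes: dict = field(default_factory=dict)  # entity id -> metadata dict
    edges: dict = field(default_factory=dict)  # frozenset({a, b}) -> edge info

    def add_entity(self, entity_id, **metadata):
        """Create the node if needed and merge in any metadata."""
        self.nodes.setdefault(entity_id, {}).update(metadata)

    def add_relationship(self, a, b, rel_type, strength):
        """Record one relationship type with a strength in [0, 1]."""
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strength must be in [0, 1]")
        self.add_entity(a)
        self.add_entity(b)
        edge = self.edges.setdefault(frozenset((a, b)), {"types": {}})
        types = edge["types"]
        types[rel_type] = max(strength, types.get(rel_type, 0.0))
        # Assumed combination rule (noisy-OR): the single relationship
        # score grows with every additional relationship type and stays
        # within [0, 1].
        remaining = 1.0
        for s in types.values():
            remaining *= 1.0 - s
        edge["weight"] = 1.0 - remaining

    def weight(self, a, b):
        """Combined relationship score, or 0.0 if no edge exists."""
        edge = self.edges.get(frozenset((a, b)))
        return edge["weight"] if edge else 0.0


# Illustrative entities and strengths only.
g = RelationshipGraph()
g.add_entity("AcmeCo", kind="company")
g.add_relationship("AcmeCo", "J. Doe", "director", 0.5)
g.add_relationship("AcmeCo", "J. Doe", "shareholder", 0.5)
```

Keying edges by `frozenset` makes the relationship symmetric, so `g.weight("J. Doe", "AcmeCo")` and `g.weight("AcmeCo", "J. Doe")` return the same score.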

Turning now to blocks 210-212, the system may generate a second graph data structure representing one or more relationship types of a given entity other than board relationships, parent relationships, and/or child relationships.

At block 210, in some embodiments, the system may apply one or more data analysis operations to automatically identify one or more entities associated with a given entity (in a manner other than as its director, its parent, or its subsidiary) based on the received documents and/or other data representing the given entity. In some embodiments, the system may store data and/or metadata representing one or more entities identified as being associated with a given entity locally and/or remotely. Data indicating the identity of the identified entity may be stored, and metadata representing information (e.g., time information, location information, etc.) about the relationship of the entity with the target entity may be stored in association therewith.

At block 212, in some embodiments, the system may generate and store a second graph data structure, different from the graph data structure generated at block 208, which represents one or more relationships between the given entity and the other entities identified at block 210. The process for generating the second graph data structure at block 212 may share one or more characteristics with the process for generating the first graph data structure at block 208.

All or a portion of method 200 may be performed iteratively, as indicated by the arrows returning from blocks 208 and 212 to block 201. For example, after identifying one or more new entities associated with the original given entity and generating an initial version of the graph data structure, the new entity may be selected (from among the nodes of the graph data structure) as the new given entity, and then the process of identifying the associated entity may be repeated. Thus, new nodes and/or new edges may be added to the previously generated graph data structure based on the newly identified parties and/or newly identified relationships.

In some embodiments, iterations of method 200 may continue until one or more stop conditions are met. For example, the stop conditions may include one or more of the following: a predetermined amount of time has elapsed, a predetermined number of iterations has been performed, a predetermined number of nodes has been added to the graph, a predetermined number of edges has been added to the graph, a predetermined number of iterations has been performed with fewer than a threshold number of edges and/or nodes being added to the graph, and/or a predetermined number of iterations has been performed within a sliding window in the graph with fewer than a threshold number of edges and/or nodes being added to the sliding window.

In some embodiments, the system may automatically perform one or more new iterations of method 200 according to a predetermined schedule, according to user input, and/or in response to one or more trigger conditions (e.g., the system detects new data is available for analysis).

In some embodiments, the system may select a new given entity as the focus of the iteration of method 200 based on random or quasi-random selections. In some embodiments, a new given entity for the iteration may be selected based on user input. In some embodiments, a new given entity may be selected for iteration based on its proximity (or distance) to a previously analyzed entity. In some embodiments, a new given entity for an iteration may be selected based on the new given entity being most recently added to the graph, e.g., based on it being added in a previous iteration and/or based on it not yet being analyzed as a target entity.
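The iterative expansion and a few of the stop conditions described above can be sketched as follows. The function `expand_graph`, its parameters, and the caller-supplied `find_related` lookup are hypothetical stand-ins; in particular, `find_related` abstracts whatever document analysis or database queries a real system would use, and the first-in, first-out choice of the next entity is just one of the selection strategies mentioned above.

```python
from collections import deque


def expand_graph(seed, find_related, max_iterations=100, max_nodes=1000,
                 patience=3, min_new_nodes=1):
    """Iteratively grow a relationship graph outward from a seed entity.

    Stops when no unanalyzed entity remains, when max_iterations or
    max_nodes is reached, or when `patience` consecutive iterations each
    add fewer than `min_new_nodes` new nodes.
    Returns (nodes, edges, iterations_performed).
    """
    nodes = {seed}
    edges = set()
    queue = deque([seed])  # entities not yet analyzed as the given entity
    iterations = 0
    low_yield_streak = 0
    while queue and iterations < max_iterations and len(nodes) < max_nodes:
        entity = queue.popleft()
        iterations += 1
        added = 0
        for other in find_related(entity):
            edges.add(frozenset((entity, other)))
            if other not in nodes:
                nodes.add(other)
                queue.append(other)
                added += 1
        if added < min_new_nodes:
            low_yield_streak += 1
        else:
            low_yield_streak = 0
        if low_yield_streak >= patience:
            break  # recent iterations are no longer adding to the graph
    return nodes, edges, iterations


# Illustrative lookup table standing in for real data sources.
known = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": []}
nodes, edges, iterations = expand_graph("A", lambda e: known.get(e, []))
```

A time-based stop condition could be added in the same loop by comparing `time.monotonic()` against a deadline.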

In some embodiments, after generating one or more graph data structures, the system may analyze the one or more graph data structures to determine whether entity pairs meet related party criteria. In some embodiments, the related party criteria may include that a pair of entities is indicated as being related by each entity being included in the same graph data structure. In some embodiments, the related party criteria may include that a pair of entities is indicated as being related by each entity being included in a minimum threshold number of the same graph data structures. In some embodiments, the related party criteria may include that an entity pair is indicated as being related by being present in the same graph data structure (or a threshold minimum number of the same graph data structures) within a certain distance of each other. For example, the system may calculate the distance by counting the number of "hops" separating the two nodes that represent the two entities in the graph data structure, and if the number of hops falls below a threshold distance, it may be determined that the two entities meet the related party criteria. In some embodiments, calculating the distance between two parties in the graph data structure may include calculating a weighted distance, where the distance between nodes is calculated from the number of hops, with each hop weighted according to the weight assigned to the edge connecting the nodes of that hop. Thus, when a path between two entities includes edges that are weighted more heavily, the distance between the entities may be calculated to be smaller (indicating that the entities are more closely related) than when the edges are assigned low weight values.
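Both distance measures can be sketched briefly. Hop distance is computed here with a breadth-first search; for the weighted distance, the description says only that heavier edges should yield smaller distances, so the sketch assumes each hop costs the reciprocal of its edge weight and finds the cheapest path with Dijkstra's algorithm. The adjacency layout and the function names are likewise assumptions.

```python
import heapq
from collections import deque


def hop_distance(adj, src, dst):
    """Fewest edges between src and dst, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj.get(node, {}):
            if nbr == dst:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def weighted_distance(adj, src, dst):
    """Cheapest path cost where each hop costs 1 / edge weight.

    Edge weights must be positive; a heavier edge (closer relationship)
    contributes a smaller cost. Returns None if dst is unreachable.
    """
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, weight in adj.get(node, {}).items():
            cand = dist + 1.0 / weight
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                heapq.heappush(heap, (cand, nbr))
    return None


# Illustrative graph: adj[node][neighbor] = relationship weight.
adj = {
    "A": {"B": 1.0, "C": 0.25},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 0.25, "B": 1.0},
}
hops = hop_distance(adj, "A", "C")          # direct edge: one hop
weighted = weighted_distance(adj, "A", "C")  # cheaper via B than direct
```

In this example A and C are one hop apart, yet the weakly weighted direct edge costs 4.0 while the two strong edges through B cost 2.0 in total, so the weighted measure treats the indirect path as the closer connection.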

Retrievability

As explained above, there is a need for improved methods for AI-enhanced automatic analysis of documents in order to quickly and efficiently make various decisions based on the document, including decisions as to whether the document indicates a likelihood of collection (e.g., a retrievability decision).

Improved systems that meet these needs may find application in a variety of use cases, including rapid and accurate assessment of compliance with regulations and/or best practices with respect to certifying/verifying potential recovery. For example, compliance with ASC 606 requires that collection of the transaction price for goods or services provided to a customer be probable, where "probable" means that the future event is likely to occur.

Traditional methods of determining retrievability provide inadequate granularity for interpreting intent and analyzing behavior. For example, an invoice may become past due and eventually enter a collection process for various reasons, such as the invoice not being received, the invoice contents or amount being disputed, or the customer withholding payment as a bargaining chip for additional concessions. Existing methods measure retrievability at the customer level rather than at the transaction/contract level. These methods do not take into account that a customer may delay payment for a particular contract and/or transaction because of a dispute.

Conventional methods of determining retrievability are also overly backward-looking. Retrievability is intended to measure the ability and willingness of a customer to pay on time. Existing methods are based on checking payment history (aided by some ERP systems) and do not fully take into account the current and future circumstances of the parties, their value chains (upstream or downstream), and the broader economic environment. For example, a port strike may prevent the export of goods and the import of parts, and a knock-on effect on an entity's cash flow may be foreseeable. Existing methods do not account for such possibilities.

Traditional methods of determining retrievability do not consider "black swan" events (e.g., rare events). Existing methods do not take the broader economy into account. For example, catastrophic events such as major terrorist attacks, major national/global financial crises, and/or pandemics can all result in sudden, substantial, and widespread damage that cannot be predicted from past events alone. When such a black swan event occurs, many entities may attempt to conserve cash, including by drawing down lines of credit, deferring payments, and/or halting efforts that do not contribute to immediate cash flow. Thus, when assessing retrievability, it is beneficial to consider the effects of such black swan events on retrievability, even though it may not be necessary to predict the actual occurrence of such events.

Accordingly, there is a need for systems and methods for performing automatic arbitration of retrievability that are more efficient and accurate than existing methods. There is a need for a system that applies improved retrievability arbitration techniques in a manner that provides greater granularity than existing systems, is not entirely backward-looking, and fully accounts for the consequences of potential black swan events.

Disclosed herein are systems configured for AI-enhanced arbitration of retrievability based on one or more ingested documents (or other data). As explained herein, the systems disclosed herein may receive one or more documents (or other data) representing one or more parties and interactions (e.g., contracts, transactions, etc.) between the parties, and the systems may automatically make decisions regarding whether the one or more parties and/or interactions meet one or more retrievability criteria. The retrievability criteria may include a criterion that collection is more likely than not, and/or may include a criterion that the likelihood of collection meets a particular likelihood threshold (e.g., 75%, 90%, etc.).
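The two forms of criterion mentioned above can be expressed as a small check. This is a sketch under stated assumptions: the function name `meets_retrievability_criterion` is hypothetical, the estimated likelihood is assumed to be supplied as a probability in [0, 1] by some upstream model, and "more likely than not" is read as strictly greater than 50%.

```python
def meets_retrievability_criterion(p_collect, threshold=None):
    """Check an estimated collection likelihood against a criterion.

    p_collect: estimated probability of collection, in [0, 1].
    threshold: optional minimum likelihood (e.g., 0.75 or 0.90). When
        omitted, the "more likely than not" criterion is applied.
    """
    if not 0.0 <= p_collect <= 1.0:
        raise ValueError("p_collect must be in [0, 1]")
    if threshold is None:
        return p_collect > 0.5  # more likely than not
    return p_collect >= threshold


# Illustrative likelihoods only.
default_ok = meets_retrievability_criterion(0.6)
strict_ok = meets_retrievability_criterion(0.8, threshold=0.9)
```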

To determine whether collection is probable, a transaction price may first be determined before retrievability is assessed, wherein the determined transaction price takes into account any price concessions. Both explicit and implicit concessions may be considered; for example, an implicit concession may be supported by a history of the provider offering discounts to customers.

To assess the financial capability of an entity (e.g., a customer) in order to assess the likelihood of collection, any one or more of the following may be considered: credit risk, credit history, past experience with the category of entity to which the entity belongs, the current economic condition of the entity's industry, and/or the entity's income.

Retrievability may be re-assessed if one or more facts or circumstances change significantly, such as when one party declares bankruptcy during the contract term and/or one party reports negative cash flow after the contract begins. If the re-assessment indicates that collection is unlikely, the provider may cease recognizing revenue, but the provider may not need to reverse previously recognized revenue.

If a partial payment is received but collection of the entire payment (e.g., the remainder of the payment) is unlikely, then in some embodiments one or more of three events must occur for the payment to be treated as revenue: (1) the payee has no remaining obligation to the payer, and all of the consideration promised by the payer has been received and is nonrefundable; (2) the contract/agreement has terminated and the payment is nonrefundable; and/or (3) the payee has transferred control of the goods or services associated with the received consideration, the payee has stopped transferring goods or services to the other party (as applicable) and has no obligation to transfer additional goods or services under the contract, and the consideration received from the other party is nonrefundable.
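The three alternative events can be written as a boolean rule. The following is only an illustrative encoding: the `PartialPaymentFacts` fields and the function `may_treat_as_revenue` are hypothetical names, and in practice each fact would have to be established from the underlying contract and payment data.

```python
from dataclasses import dataclass


@dataclass
class PartialPaymentFacts:
    """Facts about a contract for which a partial payment was received."""
    no_remaining_obligations: bool
    all_consideration_received: bool
    contract_terminated: bool
    control_transferred: bool
    stopped_transferring: bool
    no_obligation_to_transfer_more: bool
    payment_nonrefundable: bool


def may_treat_as_revenue(f):
    """True if at least one of the three described events has occurred."""
    event_1 = (f.no_remaining_obligations
               and f.all_consideration_received
               and f.payment_nonrefundable)
    event_2 = f.contract_terminated and f.payment_nonrefundable
    event_3 = (f.control_transferred
               and f.stopped_transferring
               and f.no_obligation_to_transfer_more
               and f.payment_nonrefundable)
    return event_1 or event_2 or event_3


# Illustrative case: the contract was terminated and the payment received
# cannot be refunded, so the second event applies.
terminated = PartialPaymentFacts(
    no_remaining_obligations=False,
    all_consideration_received=False,
    contract_terminated=True,
    control_transferred=False,
    stopped_transferring=False,
    no_obligation_to_transfer_more=False,
    payment_nonrefundable=True,
)
```

Note that every event includes the nonrefundable condition, so a refundable payment never qualifies under this encoding.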

In some embodiments, a system may receive one or more documents and may subject the received documents to one or more document understanding techniques, e.g., as described herein, in order to extract data from the received documents. The data extracted from the document may be used to arbitrate the retrievability.

In some embodiments, the arbitration regarding the retrievability may be based at least in part on endogenous information (received by the system from one or more sources that are endogenous to the interaction to arbitrate the retrievability). In some embodiments, the arbitration regarding the retrievability may be based at least in part on exogenous information (received by the system from one or more sources that are exogenous to the interaction to arbitrate the retrievability). In some embodiments, the system may receive the endogenous information and the exogenous information together, and may subject the received information to one or more data processing operations (e.g., models) to identify the endogenous information and to identify the exogenous information. In some embodiments, the system may receive initial input indicative of certain endogenous and/or exogenous information, and the system may locate and identify other endogenous and/or exogenous information based on the received information.

In some embodiments, the received endogenous information may include information and knowledge related to the contract (or other interaction) that may be used to determine a level of uncertainty regarding on-time payment (e.g., to evaluate the likelihood of collection). The received endogenous information may include, for example:

payment history, including differences in different products, services and/or product/service categories;

credit assessment (e.g., at the time of customer onboarding, prior to initiating the contract/interaction whose retrievability is being assessed);

payment history of other entities (e.g., entities within the same sector/industry, to establish benchmarks);

payment histories of other entities that are part of the value chain (upstream and/or downstream) of the target entity whose retrievability is being evaluated.

In some embodiments, the received exogenous information may include information and knowledge that may be used to determine the uncertainty level for on-time payment (e.g., to evaluate the likelihood of retrieval). The received exogenous information may include, for example:

economic behavior of industries related to the target entity;

economic behavior of the value chain of the target entity (upstream and/or downstream);

information about news events related to the target entity, industry, and/or value chain of the target entity;

product review information;

employee mood information (e.g., from social media);

consumer mood information (e.g., derived from social media).

In some embodiments, the arbitration of retrievability may be made based at least in part on information related to one or more disputes between two or more entities, the one or more disputes being related to contracts or other interactions whose retrievability is being assessed. This may include disputes regarding the target contract/interaction and/or disputes regarding other contracts/interactions. This information may be received as part of and/or in addition to the endogenous and/or exogenous information described above. In some embodiments, in the case of consignment agreements, disputes between entities may be included in the price.

Once the endogenous and exogenous information (optionally, along with any other information) has been received by the system and subjected to any data processing operations (e.g., document understanding models), the system can use the received information to generate a retrievability uncertainty model. The retrievability uncertainty model can be developed based at least in part on the endogenous information and/or the exogenous information. The retrievability uncertainty model may be configured to generate an output regarding the retrievability uncertainty, e.g., an uncertainty in predicted on-time payment behavior, for a particular entity or group of entities and/or for a particular interaction/contract or group of interactions/contracts.

The baseline uncertainty of the retrievability uncertainty model may originate from (e.g., be determined based on) previous payment behavior. Uncertainty may increase when an entity has previously paid late, when the entity's payment behavior worsens over time, when there are persistent disputes regarding the particular contract/interaction being evaluated, and/or when worsening payment behavior is observed from one or more other entities in the same industry as the target entity.
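The way these signals raise the baseline uncertainty can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `PaymentSignals` fields, the `baseline_uncertainty` function, and every numeric weight are hypothetical and chosen only for readability.

```python
from dataclasses import dataclass


@dataclass
class PaymentSignals:
    """Hypothetical summary of the signals named in the preceding paragraph."""
    late_payment_ratio: float        # share of past invoices paid late, in [0, 1]
    payment_trend_worsening: bool    # the entity's own payment behavior is deteriorating
    open_dispute_on_contract: bool   # persistent dispute on the contract being evaluated
    industry_trend_worsening: bool   # peers in the same industry are paying worse


def baseline_uncertainty(signals: PaymentSignals) -> float:
    """Return an uncertainty score in [0, 1]; higher means retrieval is less certain.

    All weights are illustrative assumptions, not values from the disclosure.
    """
    # Start from previous payment behavior.
    score = 0.5 * signals.late_payment_ratio
    # Each additional adverse signal increases the uncertainty.
    if signals.payment_trend_worsening:
        score += 0.2
    if signals.open_dispute_on_contract:
        score += 0.2
    if signals.industry_trend_worsening:
        score += 0.1
    return min(1.0, score)
```

In practice the increments would more plausibly be learned from historical data than fixed by hand; the additive form is used here only to make the direction of each effect visible.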

In addition to the baseline uncertainty, one or more predictive models may be used to predict the cash flow of the target entity over the relevant time period of the assessed contract/interaction. In some embodiments, the predictive model may be based at least in part on exogenous information received by the system, such as information about the economic behavior of industries related to the target entity, the economic behavior of the value chain of the target entity, the prior financial performance of the target entity (e.g., available if the target entity is a publicly traded company), and/or information about the broader (e.g., local, national, and/or global) economic environment.

The system may be configured to apply one or more stress tests to the retrievability uncertainty model (and/or one or more predictive models included therein) in order to evaluate the performance of the model in response to black swan events. These stress tests may be used to verify the resilience of the model in response to black swan events, e.g., to assess the accuracy of the model's predictions in response to such events. Such stress tests may be performed after a black swan event occurs, when actual data on the outcome of the event is available, in order to evaluate the performance of the model. In some embodiments, the model may be refined or otherwise updated based on the results of one or more of the stress tests (e.g., output data regarding the accuracy of the model).
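One way to picture such a post-event check is to compare the model's accuracy on ordinary data with its accuracy on data collected during the event. The sketch below uses the Brier score (mean squared error between predicted probabilities and observed 0/1 outcomes) as the accuracy measure. That choice, the `stress_test` function, and the `max_degradation` tolerance are assumptions made for illustration; the text does not specify a metric.

```python
from typing import Dict, Sequence


def brier_score(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def stress_test(
    normal_predicted: Sequence[float],
    normal_actual: Sequence[int],
    event_predicted: Sequence[float],
    event_actual: Sequence[int],
    max_degradation: float = 0.10,  # hypothetical tolerance
) -> Dict[str, object]:
    """Compare accuracy in normal conditions with accuracy during a rare event."""
    normal = brier_score(normal_predicted, normal_actual)
    event = brier_score(event_predicted, event_actual)
    degradation = event - normal
    return {
        "normal_brier": normal,
        "event_brier": event,
        "degradation": degradation,
        # The model "passes" if its error grew by no more than the tolerance.
        "passed": degradation <= max_degradation,
    }
```

A failed test would be one possible trigger for refining the model, as the paragraph above describes.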

The retrievability uncertainty model can be configured to leverage information about product reviews, employee moods, and/or consumer moods to generate an output about the retrievability uncertainty (e.g., uncertainty in predicting on-time payment behavior).

In some embodiments, the retrievability uncertainty model may be configured to receive as input information about a particular contract/interaction to arbitrate, and to use this information to arbitrate the retrievability. (In some embodiments, the system may consider this information in an alternative or additional manner in addition to applying the retrievability uncertainty model described herein.) In some embodiments, the information about the particular contract/interaction to be arbitrated may include fine-grained information, including due diligence information, communications with the entity since the invoice was created, and any dispute information between the entities (whether about the contract/interaction to be arbitrated or about one or more other contracts/interactions). In some embodiments, in the case of consignment agreements, disputes between entities may be included in the price.

After generating and optionally refining the retrievability uncertainty model, the system can apply the model to arbitrate the retrievability of the target contract/interaction. Applying the retrievability uncertainty model can include providing the information regarding the particular contract/interaction as input (e.g., as described above) such that the model can generate output data indicative of the retrievability.

The output data may include a score for retrievability, a categorization of retrievability (e.g., "retrievable" or "non-retrievable"), a predicted percentage likelihood that full payment will be made, a predicted percentage likelihood that partial payment will be made, and/or a predicted percentage likelihood that full or partial payment will be made at one or more particular points in time. The generated output data may be displayed to a user, transmitted to one or more other systems for storage, used as a basis for one or more visualizations, or used as a trigger event for applying one or more data processing operations to the generated graph data structure(s) for automatic determination.

FIG. 3A illustrates an exemplary logical architecture for arbitrating retrievability. All or part of the logical architecture shown in FIG. 3A may be applied by the systems described herein, including as part of the retrievability uncertainty model described above. As shown, the logical architecture in FIG. 3A may identify a customer whose retrievability is to be assessed and may determine whether available credit data indicates a low risk. If the available credit data does not indicate a low risk, the system may make an arbitration indicating that the interaction is not retrievable. If the available credit data does indicate a low risk, the system may analyze the available payment history data for the past 12 months. If the payment history data indicates that days sales outstanding (DSO) and/or delinquent invoice data exceeds a threshold, the system may make an arbitration indicating that the interaction is not retrievable. If the payment history data does not result in an arbitration that the interaction is not retrievable, the system may analyze adverse event data to determine whether an adverse event may significantly affect the cash flow of the customer. If a significant cash flow effect is determined to be sufficiently likely, the system may make an arbitration indicating that the interaction is not retrievable. If the system determines that a significant cash flow effect is unlikely, the system may make an arbitration indicating that the interaction is retrievable.
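The decision sequence described for FIG. 3A can be sketched as a short cascade of checks. This is a minimal sketch under stated assumptions: the `CustomerSnapshot` fields, the function name, and all threshold values are hypothetical, since the text says only that thresholds exist and does not give them.

```python
from dataclasses import dataclass


@dataclass
class CustomerSnapshot:
    """Hypothetical inputs for one customer; field names are illustrative."""
    credit_low_risk: bool                    # does available credit data indicate low risk?
    dso_days: float                          # days sales outstanding over the past 12 months
    delinquent_invoice_count: int            # delinquent invoices over the past 12 months
    adverse_event_impact_probability: float  # estimated chance of a significant cash flow effect


def arbitrate_retrievability(
    customer: CustomerSnapshot,
    dso_threshold_days: float = 60.0,           # assumed threshold
    delinquent_threshold: int = 3,              # assumed threshold
    impact_probability_threshold: float = 0.5,  # assumed threshold
) -> str:
    """Apply the three checks in order and return the arbitration."""
    # Step 1: credit data must indicate low risk.
    if not customer.credit_low_risk:
        return "not retrievable"
    # Step 2: payment history must not exceed the DSO or delinquency threshold.
    if (customer.dso_days > dso_threshold_days
            or customer.delinquent_invoice_count > delinquent_threshold):
        return "not retrievable"
    # Step 3: adverse events must be unlikely to significantly affect cash flow.
    if customer.adverse_event_impact_probability >= impact_probability_threshold:
        return "not retrievable"
    return "retrievable"
```

For example, `arbitrate_retrievability(CustomerSnapshot(True, 45.0, 1, 0.1))` passes all three checks, while raising `dso_days` above the assumed 60-day threshold stops the cascade at the second step.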

FIG. 3B illustrates an exemplary method 301 for applying multiple models to arbitrate retrievability based on customer data and multiple data sources. The method 301 may be applied by a system comprising one or more processors. Method 301 may share any one or more features in common with the method described above with reference to the logical architecture for making decisions of retrievability shown in fig. 3A.

At block 302, in some embodiments, the system may identify data representing accounts receivable. In some embodiments, method 301 may be applied to each receivable in the available data set or across multiple data sets.

At block 304, in some embodiments, the system may identify the entity (e.g., customer) indicated in the accounts receivable identified at block 302. In some embodiments, method 301 may be applied to each entity in the identified accounts receivable.

At block 306, in some embodiments, the system may retrieve data (if available) indicating a third-party rating or quantification of the identified customer. This data may be retrieved from any suitable public or private data source. For example, the system may retrieve data related to a D&B rating or to a rating from any other suitable organization that quantifies or characterizes the customer's ability to pay or creditworthiness.

At block 308, in some embodiments, the system may retrieve data (if available) indicative of industry benchmark data, industry trend data, or the like. This data may be retrieved from any suitable public or private data source. In some embodiments, industry trend data may be generated by the system based on data related to multiple individual organizations in the same industry or sector. For example, the system may retrieve data regarding the financial performance of other entities in the same industry as the identified entity.

At block 310, in some embodiments, the system may retrieve data (if available) indicative of any news or current events related to the identified entity, related to the industry or sector of the identified entity, and/or that may otherwise be expected to affect the identified entity (e.g., by affecting cash flow). This data may be retrieved from any suitable public or private data source.

At block 312, in some embodiments, the system may retrieve data (if available) indicating past payment behavior of the identified entity. This data may be retrieved from any suitable public or private data source.

At block 314, following block 302, the system may identify an invoice associated with the identified accounts receivable. In some embodiments, method 301 may be applied to each invoice in the identified accounts receivable.

At block 316, in some embodiments, the system may retrieve data (if available) indicating one or more questions and/or disputes (including disputes regarding the identified invoice and/or involving other invoices or other interactions) related to the two or more entities related to the identified invoice.

At block 318, the data retrieved at blocks 306, 308, 310, 312, and/or 316 may be processed via a retrievability prediction model. In some embodiments, the retrievability prediction model may be configured according to the data retrieved at blocks 306, 308, 310, 312, and/or 316. If data from one or more of blocks 306, 308, 310, 312, and/or 316 is not available, the model may be configured based on other data available.
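One simple way to let a model work when some of these sources are missing is to pair each input with an availability flag, so that a missing value is never confused with a genuine zero. The sketch below illustrates that idea only. The source names in `SOURCES` are hypothetical labels for blocks 306, 308, 310, 312, and 316, and the flag-based encoding is an assumption, not a technique stated in the text.

```python
from typing import Dict, Optional

# Hypothetical labels for the data retrieved at blocks 306, 308, 310, 312, and 316.
SOURCES = (
    "third_party_rating",   # block 306
    "industry_benchmark",   # block 308
    "news_events",          # block 310
    "payment_history",      # block 312
    "disputes",             # block 316
)


def assemble_features(retrieved: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Build a fixed-width feature dictionary from whatever data is available.

    Each source contributes its value plus an availability flag. Missing
    sources get a neutral value of 0.0 and a flag of 0.0.
    """
    features: Dict[str, float] = {}
    for name in SOURCES:
        value = retrieved.get(name)
        features[name + "_available"] = 0.0 if value is None else 1.0
        features[name] = 0.0 if value is None else float(value)
    return features
```

The resulting dictionary has the same keys regardless of which sources were found, which keeps the downstream model's input shape stable.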

In some embodiments, the retrievability prediction model can be configured to accept data regarding the invoice identified at block 314 (or another invoice involving the identified entity) and process the received data to generate an output, wherein the output can include a retrievability expiration date 320 and an associated confidence level 322. The output data may include tuples indicating the likelihood of retrieval and the associated confidence of the retrieval prediction. As an example, (100%, 90%) may indicate full retrieval before the expiration date with 90% confidence in the prediction, while (50%, 65%) may indicate 50% retrieval before the expiration date with 65% confidence in the prediction.
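The tuple output can be represented with a small typed structure. The following is an illustrative sketch only: the `RetrievabilityPrediction` type, its field names, and the `describe` helper are hypothetical and are not named in the text.

```python
from typing import NamedTuple


class RetrievabilityPrediction(NamedTuple):
    """Hypothetical container for the (retrieval, confidence) tuple."""
    retrieval_fraction: float  # share expected to be retrieved before the expiration date
    confidence: float          # confidence in that prediction


def describe(prediction: RetrievabilityPrediction) -> str:
    """Render a prediction as a human-readable sentence fragment."""
    for value in prediction:
        if not 0.0 <= value <= 1.0:
            raise ValueError("both values must lie in [0, 1]")
    return (
        f"{prediction.retrieval_fraction:.0%} retrieval before the expiration date, "
        f"{prediction.confidence:.0%} confidence"
    )


# The two examples from the paragraph above.
full = RetrievabilityPrediction(1.0, 0.90)
partial = RetrievabilityPrediction(0.50, 0.65)
```

Because a `NamedTuple` is still a plain tuple, `full == (1.0, 0.90)` holds, so the structure stays compatible with code that expects the bare tuple form.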

At block 324, in some embodiments, the system may receive data indicative of a payment or collection event associated with the identified invoice for which the output of the model 318 was previously generated. Based on the data indicative of the collection or payment event (and optionally in response to receiving the data), the system may apply one or more continuous learning data processing techniques to process the received data and update the retrievability prediction model 318 so that the model 318 may be improved for future applications.
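The feedback loop at block 324 can be illustrated with an online learner that adjusts itself each time a payment outcome arrives. The sketch below uses logistic regression updated by stochastic gradient descent purely as a stand-in: the text does not say which continuous learning technique is used, and the class name, feature layout, and learning rate are all assumptions.

```python
import math
from typing import List, Sequence


class OnlineRetrievabilityModel:
    """Hypothetical online logistic regression, updated one outcome at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> float:
        """Return the predicted probability that the invoice is paid."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Sequence[float], paid: bool) -> None:
        """Take one gradient step on the log loss for an observed outcome."""
        error = self.predict(features) - (1.0 if paid else 0.0)
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error
```

Each call to `update` corresponds to one payment or collection event: a paid invoice nudges the prediction for similar invoices upward, and an unpaid one nudges it downward.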

Computer

FIG. 4 illustrates an example of a computer according to some embodiments. Computer 400 may be a component of a system for providing an AI-enhanced audit platform and/or for performing AI-enhanced arbitration of business substance, related parties, and/or retrievability. In some embodiments, computer 400 may perform any one or more of the methods described herein.

The computer 400 may be a host computer connected to a network. The computer 400 may be a client computer or a server. As shown in fig. 4, computer 400 may be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device (such as a telephone or tablet computer). The computer may include, for example, one or more of a processor 410, an input device 420, an output device 430, a storage 440, and a communication device 460. Input device 420 and output device 430 may correspond to those described above and may be connected to or integrated with a computer.

The input device 420 may be any suitable device that provides input, such as a touch screen or monitor, keyboard, mouse, or voice recognition device. The output device 430 may be any suitable device that provides output, such as a touch screen, monitor, printer, disk drive, or speaker.

Storage 440 may be any suitable device that provides storage, such as electronic, magnetic, or optical memory, including Random Access Memory (RAM), cache, hard disk drive, CD-ROM drive, tape drive, or removable storage disk. Communication device 460 may include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or card. The components of the computer may be connected in any suitable manner, such as via a physical bus or wirelessly. The storage 440 may be a non-transitory computer readable storage medium including one or more programs that, when executed by one or more processors (such as processor 410), cause the one or more processors to perform the methods described herein.

Software 450, which may be stored in storage 440 and executed by processor 410, may include, for example, programming (e.g., as described above, as embodied in a system, computer, server, and/or device) that implements the functionality of the disclosure. In some embodiments, software 450 may include a combination of servers such as an application server and a database server.

Software 450 may also be stored and/or transmitted within any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute the instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a computer-readable storage medium may be any medium, such as storage 440, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.

Software 450 may also be propagated within any transmission medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute the instructions associated with the software. In the context of this disclosure, a transmission medium may be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device. The transmission medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.

The computer 400 may be connected to a network, which may be any suitable type of interconnected communication system. The network may implement any suitable communication protocol and may be secured by any suitable security protocol. The network may include any suitably arranged network link, such as a wireless network connection, T1 or T3 line, wired network, DSL, or telephone line, that enables transmission and reception of network signals.

Computer 400 may implement any operating system suitable for operating on a network. The software 450 may be written in any suitable programming language, such as C, C++, Java, or Python. In various embodiments, for example, application software implementing the functionality of the present disclosure may be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service.

The following is a list of embodiments:

Embodiment 1, a system for classifying documents, the system comprising one or more processors configured to cause the system to:

receiving data representing a document;

applying one or more natural language processing techniques to the received data to generate a feature vector representing the document;

identifying, based on the feature vector, a second feature vector from a use case library based on its similarity to the feature vector;

applying a plurality of models to the feature vector to calculate respective changes to a plurality of features represented by the document; and

determining an arbitration for the document based on the identified second feature vector and based on the calculated respective changes to the plurality of features, wherein the arbitration includes an arbitration classification and an arbitration confidence score.

Embodiment 2, the system of embodiment 1, wherein:

the one or more processors are configured to identify, based on the feature vector, a cluster of feature vectors in the use case library that has the highest level of similarity to the feature vector among the clusters of feature vectors in the use case library; and

the determining of the arbitration is further based on the identified cluster of feature vectors.

Embodiment 3, the system of any of embodiments 1-2, wherein the plurality of features includes one or more of: risk characteristics, timing characteristics, and monetary characteristics.

Embodiment 4, the system of any of embodiments 1-3, wherein applying the plurality of models to the feature vector includes calculating a plurality of features and comparing the calculated plurality of features to corresponding baseline features obtained from the ERP data source to calculate the corresponding changes.

Embodiment 5, the system of any of embodiments 1-4, wherein calculating the respective changes includes generating a plurality of respective change values and a plurality of respective change confidence levels.

Embodiment 6, the system of any of embodiments 1-5, wherein applying the one or more natural language processing techniques to the received data to generate a feature vector comprises:

applying a plurality of model sets in parallel with each other, wherein each of the model sets is configured to process the received data to generate corresponding output data; and

storing output data from each of the models in the feature vector.

Embodiment 7, the system of embodiment 6, wherein a first set of models of the plurality of sets of models includes a first sentence classification module and a classification module configured to generate output data related to a first type of content of the document.

Embodiment 8, the system of any of embodiments 6-7, wherein a second set of models of the plurality of sets of models includes a structural classification module, a language modality classification module, and a classification module configured to generate output data related to a second type of content of the document.

Embodiment 9, the system of any of embodiments 6-8, wherein a third set of models of the plurality of sets of models includes a second sentence classification module and a classification module configured to generate output data related to a third type of content of the document.

Embodiment 10, the system of any of embodiments 1-9, wherein determining the arbitration classification includes determining whether the document meets business parenchyma criteria.

Embodiment 11, the system of any of embodiments 1-10, wherein determining the arbitration classification and the arbitration confidence score includes applying arbitration coordination data processing operations based on data associated with the identified second feature vector and based on the calculated respective changes to the plurality of features.

Embodiment 12, a non-transitory computer-readable storage medium storing instructions for classifying documents, the instructions configured to be executed by one or more processors to cause a system to:

receiving data representing a document;

Embodiment 13, a method for classifying documents, wherein the method is performed by a system comprising one or more processors, the method comprising:

Receiving data representing a document;

Embodiment 14, a system for identifying interested parties within a plurality of databases, the system comprising one or more processors configured to cause the system to:

receiving a data set indicating a first set of parties associated with an entity;

generating a graph data structure representing a first plurality of relationships between the entity and the first set of parties based on the first set of parties;

submitting one or more of the first set of parties as one or more input queries to obtain a second set of parties from a plurality of databases that are related to the one or more input queries; and

updating the graph data structure based on the second set of parties to represent a second plurality of relationships between the entity and the second set of parties.

Embodiment 15, the system of embodiment 14, wherein the one or more processors are configured to apply one or more disambiguation models to the second set of parties prior to updating the graph data structure based on the second set of parties.

Embodiment 16, a non-transitory computer-readable storage medium storing instructions for identifying interested parties within a plurality of databases, the instructions configured to be executed by a system comprising one or more processors configured to cause the system to:

receiving a dataset indicating a first set of parties associated with an entity;

Embodiment 17, a method for identifying interested parties within a plurality of databases, wherein the method is performed by a system comprising one or more processors, the method comprising:

Embodiment 18, a system for anomaly identification and analysis, the system comprising one or more processors configured to cause the system to:

receiving input data representing a plurality of interactions between a first entity and a plurality of respective entities;

applying one or more anomaly identification models to generate anomaly data representing a first subset of interactions as anomalies; and

identifying a second subset of interactions, wherein the second subset is a subset of the first subset, wherein the identification of the second subset is based on the anomaly data and on a data structure representing a plurality of relationships between the first entity and a set of entities associated with the first entity.

Embodiment 19, the system of embodiment 18, wherein the input data comprises transaction data.

Embodiment 20, the system of any of embodiments 18-19, wherein the second subset of interactions is identified as transactions having an elevated risk of related-party anomalies.

Embodiment 21, a non-transitory computer-readable storage medium storing instructions for anomaly identification and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to:

Embodiment 22, a method for anomaly identification and analysis, wherein the method is performed by a system comprising one or more processors, the method comprising:

Embodiment 23, a system for behavioral modeling and analysis, the system comprising one or more processors configured to cause the system to:

receiving first input data comprising a data structure representing a relationship between a plurality of entities;

receiving second input data representing the behavior of one or more of the entities represented in the data structure; and

applying one or more behavioral models to determine, based on the first input data and the second input data, a risk of a related-party anomaly represented by the second input data.

Embodiment 24, a non-transitory computer-readable storage medium storing instructions for behavioral modeling and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to:

Embodiment 25, a method for behavioral modeling and analysis, wherein the method is performed by a system comprising one or more processors, the method comprising:

Embodiment 26, a system for identifying relationships between entities represented in one or more data sets, the system comprising one or more processors configured to cause the system to:

receiving one or more data sets representing a plurality of entities;

generating a graph data structure based at least in part on the one or more data sets, the graph data structure representing entities of the plurality of entities as nodes and relationships between pairs of entities as edges between corresponding pairs of nodes;

receiving input data indicating a query entity pair; and

determining, based at least in part on the graph data structure, whether the query entity pair meets one or more related entity criteria.

Embodiment 27, the system of embodiment 26, wherein generating the graph data structure comprises:

selecting a first target entity from among the plurality of entities;

identifying a first set of relationships between the first target entity and one or more other entities within the plurality of entities; and

storing data representing the first set of relationships in the graph data structure.

Embodiment 28, the system of embodiment 27, wherein generating the graph data structure comprises:

selecting a second target entity from the one or more entities identified in the first set of relationships that are related to the first target entity;

identifying a second set of relationships between the second target entity and one or more other entities within the plurality of entities; and

storing data representing the second set of relationships in the graph data structure.

Embodiment 29, the system of any of embodiments 26-28, wherein generating the graph data structure includes iteratively expanding the graph data structure until one or more stop conditions are met.

Embodiment 30, the system of any of embodiments 26-29, wherein the edges of the graph data structure are weighted according to a relationship score representing a strength of a relationship between entities represented by the linked nodes.

Embodiment 31, the system of any of embodiments 26-30, wherein determining whether the query entity pair meets one or more related entity criteria comprises determining whether the query entities are both represented as nodes in the graph data structure.

Embodiment 32, the system of any of embodiments 26-31, wherein determining whether the query entity pair meets one or more related entity criteria comprises determining whether the query entities are separated within the graph data structure by a distance that is less than a predetermined number of hops.

Embodiment 33, the system of any of embodiments 26-32, wherein determining whether the query entity pair meets one or more related entity criteria comprises determining whether the query entities are separated within the graph data structure by a weighted distance that is less than a predetermined threshold distance, wherein the weighted distance is calculated based on the number of hops between the query entities and based on the weights of the one or more edges that link the query entities along those hops.

Embodiment 34, the system of any of embodiments 26-33, wherein determining whether the query entity pair meets one or more related entity criteria comprises applying a behavior modeling algorithm to the query entities.

Embodiment 35, a non-transitory computer-readable medium storing instructions for identifying relationships between entities represented in one or more data sets, the instructions configured to be executed by a system comprising one or more processors to cause the system to:

receiving one or more data sets representing a plurality of entities;

receiving input data indicating a query entity pair; and

Embodiment 36, a method for identifying relationships between entities represented in one or more data sets, wherein the method is performed by a system comprising one or more processors, the method comprising:

Receiving one or more data sets representing a plurality of entities;

receiving input data indicating a query entity pair; and

Embodiment 37, a system for predicting a likelihood of retrieval, the system comprising one or more processors configured to cause the system to:

receiving a first data set comprising endogenous information related to a transaction;

receiving a second data set comprising exogenous information related to one or more parties to the transaction;

configuring a retrievability uncertainty model based on the first data set and the second data set;

receiving a third data set comprising information about the transaction; and

providing the information about the transaction to the retrievability uncertainty model to generate an output indicative of a likelihood of retrieval for the transaction.

Embodiment 38, the system of embodiment 37, wherein the endogenous information comprises one or more selected from the group consisting of: payment history information of a party to the transaction; credit assessment information performed prior to initiation of the transaction; and payment history information for one or more parties associated with the party to the transaction.

Embodiment 39, the system of any of embodiments 37-38, wherein the exogenous information comprises one or more selected from the group consisting of: economic behavior information of an industry related to a party to the transaction; economic behavior information of a value chain of a party to the transaction; news information related to a party to the transaction, a related industry, or a related value chain; product review information; employee mood information; and consumer mood information.

Embodiment 40, the system of any of embodiments 37-39, wherein the third data set includes information regarding prior disputes over transactions between the plurality of entities.

Embodiment 41, the system of any of embodiments 37-40, wherein applying the retrievability uncertainty model comprises:

generating an initial prediction of uncertainty based on the first data set comprising endogenous information; and

applying one or more predictive models based on the second data set comprising exogenous information.
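
By way of non-limiting illustration, the two-stage application described in embodiment 41 might be sketched as follows. The field names, the weights 0.6 and 0.4, and the log-odds adjustment are assumptions introduced only for this example and are not taken from the disclosure; the sketch shows one possible way an initial, endogenous-only estimate could be refined by exogenous signals.

```python
import math
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Endogenous:
    """Hypothetical endogenous inputs; the field names are assumptions."""
    on_time_payment_rate: float  # share of past invoices paid on time, 0..1
    credit_score: float          # normalized pre-transaction credit assessment, 0..1


def initial_uncertainty(e: Endogenous) -> float:
    """Stage 1: initial uncertainty estimate from endogenous data only.

    The 0.6 / 0.4 weights are illustrative, not values from the disclosure.
    """
    reliability = 0.6 * e.on_time_payment_rate + 0.4 * e.credit_score
    return 1.0 - reliability


def adjust_with_exogenous(uncertainty: float, signals: Mapping[str, float]) -> float:
    """Stage 2: shift the initial estimate using exogenous signals.

    Each signal is assumed to lie in [-1, 1], where positive means adverse.
    The shift is applied in log-odds space so the result stays in (0, 1).
    """
    u = min(max(uncertainty, 1e-6), 1.0 - 1e-6)
    logit = math.log(u / (1.0 - u)) + sum(signals.values())
    return 1.0 / (1.0 + math.exp(-logit))


base = initial_uncertainty(Endogenous(on_time_payment_rate=0.9, credit_score=0.8))
adjusted = adjust_with_exogenous(base, {"industry_downturn": 0.5})
```

In this sketch an adverse exogenous signal raises the uncertainty above the endogenous-only estimate, and a favorable signal lowers it.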

Embodiment 42, the system of any of embodiments 37-41, wherein the retrievability uncertainty model is validated after the occurrence of a rare event, based on the model's prediction in response to the rare event.

Embodiment 43, the system of any of embodiments 37-42, wherein the retrievability uncertainty model is configured to generate output data comprising a retrievable expiration date and an associated confidence level.

Embodiment 44, the system of any of embodiments 37-43, wherein the system is configured to:

receive data regarding a retrieval event associated with a transaction; and

apply a continuous learning feedback loop to update the retrievability uncertainty model based on the data regarding the retrieval event.
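
By way of non-limiting illustration, a continuous learning feedback loop of the kind described in embodiment 44 might be sketched as an online model that takes one gradient step per observed retrieval event. The class name, the logistic form, and the learning rate are assumptions made for this example only; the disclosure does not specify a particular update rule.

```python
import math
from typing import List, Sequence


class OnlineRetrievabilityModel:
    """Toy logistic model updated one retrieval event at a time (assumed design)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> float:
        """Return the estimated likelihood of retrieval, in (0, 1)."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Sequence[float], retrieved: bool) -> None:
        """Take one stochastic-gradient step on the log loss for an observed event."""
        error = self.predict(features) - (1.0 if retrieved else 0.0)
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


model = OnlineRetrievabilityModel(n_features=2)
# Feed observed outcomes back into the model as they arrive.
for _ in range(50):
    model.update([1.0, 0.0], retrieved=True)
    model.update([0.0, 1.0], retrieved=False)
```

Each call to `update` plays the role of one pass through the feedback loop: the model's prior prediction is compared with the observed outcome, and the parameters move to reduce that error.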

Embodiment 45, a non-transitory computer-readable storage medium storing instructions for predicting a likelihood of retrieval, the instructions configured to be executed by a system comprising one or more processors to cause the system to:

receive a third data set comprising information about the transaction; and

Embodiment 46, a method for predicting a likelihood of retrieval, wherein the method is performed by a system comprising one or more processors, the method comprising:

receiving a third data set comprising information about the transaction; and

U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ASSESSMENT OF VOUCHING EVIDENCE", filed on June 30, 2022, attorney docket number 13574-20068.00.

U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR APPLYING A COMPOSABLE ASSURANCE INTEGRITY FRAMEWORK", filed on June 30, 2022, attorney docket number 13574-20070.00.

U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED DOCUMENT PROCESSING", filed on June 30, 2022, attorney docket number 13574-20071.00.

U.S. patent application entitled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR PROVIDING AI-EXPLAINABILITY FOR PROCESSING DATA THROUGH MULTIPLE LAYERS", filed on June 30, 2022, attorney docket number 13574-20072.00.

Claims

1. A system for classifying documents, the system comprising one or more processors configured to cause the system to:

receive data representing a document;

apply one or more natural language processing techniques to the received data to generate a feature vector representing the document;

identify, based on the feature vector, a second feature vector from a use case library according to its similarity to the feature vector;

apply a plurality of models to the feature vector to calculate respective changes in a plurality of features represented by the document; and

2. The system of claim 1, wherein:

the one or more processors are configured to identify, based on the feature vector, a cluster of feature vectors in the use case library that has a highest level of similarity to the feature vector among the clusters of feature vectors in the use case library; and

3. The system of any of claims 1-2, wherein the plurality of features includes one or more of: risk characteristics, timing characteristics, and monetary characteristics.
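
By way of non-limiting illustration, the use case library lookup recited in claims 1-2 might be sketched as follows. The use of cosine similarity, the comparison against cluster centroids, and the cluster names are assumptions made for this example only; the claims do not specify a particular similarity measure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty cluster of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def most_similar_cluster(
    query: Sequence[float],
    library: Dict[str, List[List[float]]],
) -> Tuple[str, float]:
    """Return the name and similarity of the cluster whose centroid best matches query."""
    scores = {
        name: cosine_similarity(query, centroid(vectors))
        for name, vectors in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical library with two clusters of previously processed feature vectors.
use_case_library = {
    "cluster_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "cluster_b": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
name, similarity = most_similar_cluster([0.8, 0.2, 0.0], use_case_library)
```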

4. The system of any of claims 1-3, wherein applying the plurality of models to the feature vector includes calculating a plurality of features and comparing the calculated plurality of features to corresponding baseline features obtained from an ERP data source to calculate the corresponding changes.

5. The system of any of claims 1-4, wherein calculating the respective changes includes generating a plurality of respective change values and a plurality of respective change confidence levels.
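
By way of non-limiting illustration, the comparison against baseline features described in claims 4-5 might be sketched as follows. The feature names, the dictionary layout, and the choice to carry each model's own confidence through as the change confidence are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FeatureChange:
    value: float       # calculated feature value minus the baseline value
    confidence: float  # confidence level associated with the change, 0..1


def calculate_changes(
    calculated: Dict[str, Tuple[float, float]],
    baseline: Dict[str, float],
) -> Dict[str, FeatureChange]:
    """Compare model-calculated features with baseline features.

    calculated maps a feature name to (value, model_confidence).
    baseline maps a feature name to the baseline value, for example a value
    previously exported from an ERP data source.
    Features without a baseline are skipped because no change can be computed.
    """
    changes: Dict[str, FeatureChange] = {}
    for name, (value, model_confidence) in calculated.items():
        if name not in baseline:
            continue
        changes[name] = FeatureChange(value - baseline[name], model_confidence)
    return changes


changes = calculate_changes(
    calculated={"monetary": (1200.0, 0.9), "timing": (45.0, 0.7), "risk": (0.3, 0.5)},
    baseline={"monetary": 1000.0, "timing": 30.0},
)
```

Each entry in the result pairs a change value with a change confidence level, mirroring the two outputs recited in claim 5.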

6. The system of any of claims 1-5, wherein applying the one or more natural language processing techniques to the received data to generate a feature vector comprises:

storing output data from each of the models in the feature vector.

7. The system of claim 6, wherein a first set of models of the plurality of sets of models includes a first sentence classification module and a classification module configured to generate output data related to a first type of content of the document.

8. The system of any of claims 6-7, wherein a second set of models of the plurality of sets of models includes a structural classification module, a language modality classification module, and a classification module configured to generate output data related to a second type of content of the document.

9. The system of any of claims 6-8, wherein a third set of models of the plurality of sets of models includes a second sentence classification module and a classification module configured to generate output data related to a third type of content of the document.

10. The system of any of claims 1-9, wherein determining an arbitration classification includes determining whether the document meets business parenchymal criteria.

11. The system of any of claims 1-10, wherein determining an arbitration classification and an arbitration confidence score includes applying an arbitration coordination data processing operation based on data associated with the identified second feature vector and based on the calculated respective changes of the plurality of features.

12. A non-transitory computer-readable storage medium storing instructions for classifying documents, the instructions configured to be executed by one or more processors to cause a system to:

receive data representing a document;

13. A method for classifying documents, wherein the method is performed by a system comprising one or more processors, the method comprising:

receiving data representing a document;