WO2023274187A1 - Information processing method and apparatus based on natural language inference, and electronic device


Info

Publication number
WO2023274187A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
answer
question
graph
nodes
Prior art date
Application number
PCT/CN2022/101739
Other languages
French (fr)
Chinese (zh)
Inventor
孙长志
张欣勃
周浩
李磊
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023274187A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Definitions

  • the present disclosure relates to the technical field of the Internet, and in particular to an information processing method, device and electronic equipment based on natural language reasoning.
  • the knowledge base can be used for automatic reasoning.
  • to achieve automatic reasoning on the knowledge base, early work focused on reasoning over formal representations, that is, each sentence in the knowledge base is expressed as logic rules, such as first-order logic.
  • Embodiments of the present disclosure provide an information processing method, device, and electronic device based on natural language reasoning.
  • the embodiment of the present disclosure provides an information processing method based on natural language reasoning, including: receiving a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; determining, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and an argument graph for deriving the answer; the argument graph represents a process of deriving the answer from the associated context by reasoning.
  • an embodiment of the present disclosure provides an information processing device based on natural language reasoning, including: a receiving unit configured to receive a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; and a determining unit configured to determine, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and an argument graph for deriving the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning.
  • an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information processing method based on natural language reasoning as described in the first aspect.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the information processing method based on natural language reasoning as described in the first aspect is implemented.
  • the information processing method, device, and electronic device based on natural language reasoning receive a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; based on the question feature information of the question statement and the context feature information of the associated context, they determine the answer to the question and the argument graph from which the answer is obtained, so that when the answer to the question is determined from the associated context, the argument graph supporting the answer can be determined at the same time, which helps the user understand how the answer was obtained.
  • the argument diagrams of this scheme can assist in the prediction of answers and improve the ability to answer questions.
  • FIG. 1 is a flowchart of some embodiments of an information processing method based on natural language reasoning according to the present disclosure
  • Fig. 2 is a flowchart of other embodiments of information processing methods based on natural language reasoning according to the present disclosure
  • Fig. 3 is a schematic flow chart of the establishment of the probability graph neural network model in the embodiment shown in Fig. 2;
  • Fig. 4a shows a schematic diagram of the relationship among the answer variable A, the node variables Vi, and the edge variables Eij in the joint distribution in the embodiment shown in Fig. 2;
  • Fig. 4b shows the factor graph of the joint distribution for the example in Fig. 4a;
  • FIG. 5 is a schematic diagram of an application scenario of an information processing method based on natural language reasoning provided by the present disclosure
  • Fig. 6 is a schematic structural diagram of some embodiments of an information processing device based on natural language reasoning provided by the present disclosure
  • FIG. 7 is an exemplary system architecture in which the information processing method based on natural language reasoning according to an embodiment of the present disclosure can be applied;
  • Fig. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 shows the flow of some embodiments of an information processing method based on natural language reasoning according to the present disclosure.
  • the information processing method based on natural language reasoning includes the following steps:
  • Step 101 receiving a question statement and an associated context, wherein the question statement is used to characterize a question to be answered.
  • question sentences and associated contexts are represented by natural language. For example, question sentences and associated contexts expressed in Chinese; or question sentences and associated contexts expressed in other languages.
  • the above-mentioned associated context may include facts and rules expressed in natural language.
  • the above facts may be, for example:
  • F1 The circuit includes a battery
  • F2 The connecting wire is a metal connecting wire.
  • Such rules may include, for example:
  • the above question statement may be a declarative statement.
  • the above-mentioned answer to be given may be an answer for giving a judgment result, and the above-mentioned answer may include, for example, "right" or "wrong", "yes" or "no", and so on.
  • Step 102, based on the question feature information of the question statement and the context feature information of the associated context, determine the answer to the question and the argument graph for obtaining the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning.
  • the argument graph may be a directed acyclic graph; the directed acyclic graph includes nodes and directed edges between nodes, the nodes are statements in the associated context, and a directed edge between two nodes represents an inference relationship between the two associated nodes.
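  • As an illustration of this structure (not part of the disclosed embodiments), the following minimal Python sketch represents an argument graph as a directed acyclic graph whose nodes are context sentences and whose directed edges are inference relations; the node identifiers and sentences are hypothetical.

```python
# Minimal sketch: an argument graph as a DAG of natural-language sentences.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ArgumentGraph:
    # node id -> sentence (a fact, a rule, or a NAF node)
    nodes: Dict[str, str] = field(default_factory=dict)
    # (source id, target id): "source is used to infer target"
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, sentence: str) -> None:
        self.nodes[node_id] = sentence

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))


# Hypothetical example loosely mirroring the application scenario below.
graph = ArgumentGraph()
graph.add_node("F2", "The connecting wire is a metal connecting wire.")
graph.add_node("R2", "A judgment rule about metal wires (illustrative placeholder).")
graph.add_edge("F2", "R2")
print(graph.nodes, graph.edges)
```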
  • the question feature information can be extracted from the question statement and the context feature information can be extracted from the associated context in various ways.
  • the feature information includes feature vectors.
  • the above step 102 may include the following steps:
  • the question sentence and the associated context are input into a pre-trained language model to obtain a question feature vector and an associated context feature vector.
  • the above-mentioned language model may be various existing models for determining feature vectors of natural languages.
  • the aforementioned models may be various types of machine learning models.
  • Various analyzes can be performed on the question feature vector and the associated context feature vector to determine the answer to the question statement and the argument graph.
  • the above step 102 further includes: retrieving at least one sentence from the associated context according to the question statement using a preset retrieval method; and encoding the retrieved at least one sentence using a preset encoding method. In this case, inputting the question statement and the associated context into the pre-trained language model to obtain the question feature vector and the associated context feature vector includes: inputting the encoded question statement and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
  • the aforementioned associated context may be a relatively long article or sentence paragraph.
  • At least one sentence may be retrieved from the associated context by using a preset retrieval method according to the question sentence.
  • the above sentence may be a sentence that is highly related to the question sentence.
  • keywords of the question statement can be used to retrieve at least one sentence from the associated context.
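  • As a minimal sketch (the preset retrieval method is not specified in the disclosure; simple keyword overlap is assumed here purely for illustration), retrieval of relevant sentences could look like the following; the sentences are hypothetical.

```python
# Minimal sketch: keyword-overlap retrieval of context sentences (an assumption,
# standing in for the unspecified "preset retrieval method").
from typing import List


def retrieve_sentences(question: str, context_sentences: List[str], top_k: int = 3) -> List[str]:
    question_words = set(question.lower().split())

    def overlap(sentence: str) -> int:
        # score a sentence by how many question words it shares
        return len(question_words & set(sentence.lower().split()))

    return sorted(context_sentences, key=overlap, reverse=True)[:top_k]


# Hypothetical usage:
context = [
    "The circuit includes a battery.",
    "The connecting wire is a metal connecting wire.",
    "If a wire is metal then the wire conducts electricity.",
]
print(retrieve_sentences("The wire conducts electricity.", context, top_k=2))
```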
  • a feature vector analysis model such as a word vector model can be used to determine the feature vector of the encoded associated context, the feature vector of each sentence in the associated context; and determine the feature vector of the encoded question sentence.
  • the information processing method based on natural language reasoning receives a question statement and an associated context, wherein the question statement is used to characterize the question to be answered; based on the question feature information of the question statement and the context feature information of the associated context, it determines the answer to the question and the argument graph for the answer, so that when the answer to the question is determined from the associated context, the argument graph supporting the answer can be determined at the same time, and the user can easily understand how the answer was obtained.
  • the argument graph in this embodiment can assist in the prediction of the answer and improve the ability to answer questions.
  • FIG. 2 shows a flow chart of some other embodiments of information processing methods based on natural language reasoning provided by the present disclosure.
  • the information processing method based on natural language inference provided by these embodiments includes the following steps:
  • Step 201 receiving a question sentence and associated context, wherein the question sentence is used to characterize the question to be answered.
  • Natural language inference can use machine learning models to judge the semantic relationship between sentences. For example, a set of sentences describing facts and judgment rules is input, a question is then input, and the answer to the question is determined from those sentences and judgment rules.
  • the above machine learning model may include a language model and a probabilistic graph neural network model.
  • question sentences and associated contexts are represented by natural language. For example, question sentences and associated context expressed in Chinese; or question sentences and associated context expressed in other languages.
  • the above-mentioned associated context may include facts and rules expressed in natural language.
  • the above question statement may be a declarative statement.
  • the above-mentioned answer to be given may be an answer for giving a judgment result, and the above-mentioned answer may include, for example, "right" or "wrong", "yes" or "no", and so on.
  • Step 202 input the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector.
  • the information processing method based on natural language reasoning further includes: retrieving at least one sentence from the associated context according to the question statement using a preset retrieval method; and encoding the retrieved at least one sentence using a preset encoding method. Step 202 may then include: inputting the encoded question statement and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
  • the concatenation of the associated context C (facts and rules) and the question Q can be input into the language model.
  • the associated context C and question Q can be separated using preset tags.
  • the aforementioned preset tags may include "[SEP]".
  • the concatenation of the input context C and the question Q can be represented as: [CLS], C, [SEP], [SEP], Q, [SEP], where the [CLS] token corresponds to the global representation of the input context.
  • H_CLS is the global feature vector of the associated context C; H_i is the feature vector of node S_i in the associated context; the feature vector of the directed edge S_i -> S_j is formed from the feature vectors of S_i and S_j by a concatenation operation of vectors.
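  • As a sketch of this encoding step (assuming a Hugging Face BERT-style encoder as the pre-trained language model; the model name, pooling choices, and sentences are illustrative assumptions, not the disclosed implementation):

```python
# Sketch: encode [CLS] C [SEP] Q [SEP] with a pre-trained encoder and derive
# a global vector, per-sentence node vectors, and concatenated edge vectors.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

context_sentences = [
    "The circuit includes a battery.",
    "The connecting wire is a metal connecting wire.",
]
question = "The wire conducts electricity."

# Passing a text pair makes the tokenizer insert [CLS]/[SEP] automatically.
inputs = tokenizer(" ".join(context_sentences), question, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state       # (1, seq_len, dim)
h_cls = hidden[:, 0, :]                                 # global vector H_CLS

# One simple (assumed) way to obtain node features H_i: encode each sentence
# separately and take its [CLS] vector.
node_features = []
for sentence in context_sentences:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        node_features.append(encoder(**enc).last_hidden_state[:, 0, :])

# Edge feature for S_1 -> S_2 via vector concatenation.
edge_feature_12 = torch.cat([node_features[0], node_features[1]], dim=-1)
print(h_cls.shape, edge_feature_12.shape)
```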
  • Step 203 input the question feature vector and the associated context feature vector into a probabilistic graph neural network model, and deduce the answer and an argument graph of the answer from the probabilistic graph neural network model.
  • the above answers can be used to characterize answers that mean right or wrong, yes or no.
  • the above argument graph may include multiple nodes.
  • a node can be a fact, a rule (both expressed in natural language) or a NAF node.
  • a NAF node stands for Negation As Failure, which means that under the Closed World Assumption (CWA), for a statement S, if it cannot be inferred from the existing facts and rules that S is correct, then S is taken to be wrong, and not-S can be introduced as correct. It should be noted that under the closed world assumption there are no facts in negative form and no rules that draw negative conclusions, because negative facts and rules are redundant under the CWA.
  • Fig. 3 shows the establishment steps of the probability graph neural network model in the embodiment shown in Fig. 2 above.
  • the establishment steps of the probabilistic graph neural network model include the following:
  • Step 301 define the argument graphs of all possible answers to the question statement and the joint distribution of the answers through the probabilistic graph model, so as to explicitly establish the dependence between the argument graphs and the answers.
  • the argument graph includes nodes representing sentences of the associated context and directed edges between nodes, and the joint distribution includes answer variables, node variables and directed edge variables.
  • each node can correspond to a statement in the associated context.
  • Possible answers, nodes in the possible argument graph, and directed edges between nodes in the possible argument graph may be included as variables in the joint distribution.
  • Step 302 using the designed answer potential function, node potential function and edge potential function to explicitly establish the dependence relationship between different variables in the joint distribution.
  • the above edge potential functions are related to nodes, answers, and directed edges between nodes.
  • probabilistic graphical models can use graph theory to represent the dependency relationships among several random variables.
  • the graph obtained according to the probabilistic graphical model can include multiple nodes, and each node can be a random variable. If there is no edge between two nodes, the two variables can be considered conditionally independent of each other.
  • Two common probabilistic graph models are graphs with directed edges and graphs with undirected edges. According to the directionality of graphs, probabilistic graphical models can be divided into two categories, Bayesian networks and Markov networks. The present disclosure may employ undirected probabilistic graphical models.
  • a joint distribution can be defined over all possible configurations Y, denoted p(Y), where:
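  • As a sketch, one plausible factorized form, assuming the answer, node, and edge potential functions described below, is:

$$
p(Y) \;=\; \frac{1}{Z}\;\phi_A(A)\,\prod_{i}\phi_V(V_i, A)\,\prod_{i \neq j}\phi_E(V_i, V_j, E_{ij}, A)
$$

  • where Z is a normalizing constant, φ_A is the answer potential function, φ_V is the node potential function, and φ_E is the edge potential function.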
  • the factorization in the above formula can describe the correlation among the answer variable A, the node variable V i and the edge variable E ij .
  • Fig. 4a shows a schematic diagram of the relationship among answer variable A, node variable V i and edge variable E ij .
  • the associated context in Fig. 4a includes statements S 1 , S 2 , S 3 .
  • the above association context may include three nodes: S 1 , S 2 , and S 3 .
  • the answer to the question statement is "True”.
  • the proof graph (proof) includes node S1, node S3, and a directed edge from node S1 to node S3.
  • the solid circles of the nodes in the right figure indicate that when the answer variable A is 1, the value of node V 1 is 1, the value of node V 2 is 0, and the value of node V 3 is 1.
  • the value of the directed edge E13 (from node V1 to node V3) is 1, and the values of the other directed edges E12, E21, E23, E31, and E32 are 0.
  • Figure 4b shows a factor plot of the joint distribution p(Y) for the example in Figure 4a.
  • the above factor graph includes the node variables V1, V2, V3, the edge variables E12, E13, E21, E23, E31, E32, the answer variable A, the potential function between each node and the answer, the potential function φ_A of the answer, and the potential function corresponding to each edge, and it depicts the relationships among these factors.
  • Step 303 using the neural network to parameterize each potential function to obtain a parameterized joint distribution.
  • MLP Multilayer Perceptron
  • the feature vector of the sentence S i can be calculated by step 302
  • another multi-layer perceptron, MLP2, can be used as a non-linear transformation of the feature vector of the sentence, yielding the node potential function for the node variable.
  • Dimension 4 represents the number of possible combinations of node variable V and answer variable A. Additionally the parameters of MLP 2 can be shared across all sentences.
  • Dimension 16 represents the number of possible values of combinations of four variables (V i , V j , E ij , A).
  • the parameters of MLP 3 can be shared across all sentence pairs.
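  • As a sketch (assuming PyTorch; the hidden sizes and input dimension are illustrative assumptions), the MLP parameterization of the node and edge potential functions described above could look like the following, with MLP2 shared across all sentences and MLP3 shared across all sentence pairs:

```python
# Sketch: parameterize node and edge potentials with shared MLPs.
import torch
import torch.nn as nn

hidden_dim = 768  # assumed size of a sentence feature vector

mlp2 = nn.Sequential(              # node potential, shared across sentences
    nn.Linear(hidden_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 4),             # 4 = combinations of node variable V_i and answer A
)
mlp3 = nn.Sequential(              # edge potential, shared across sentence pairs
    nn.Linear(2 * hidden_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 16),            # 16 = combinations of (V_i, V_j, E_ij, A)
)

h_i = torch.randn(1, hidden_dim)   # feature vector of sentence S_i (illustrative)
h_j = torch.randn(1, hidden_dim)   # feature vector of sentence S_j (illustrative)
node_potential_i = mlp2(h_i)                                   # shape (1, 4)
edge_potential_ij = mlp3(torch.cat([h_i, h_j], dim=-1))        # shape (1, 16)
print(node_potential_i.shape, edge_potential_ij.shape)
```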
  • Step 304 determining the pseudo-likelihood function of the parameterized joint distribution; and determining a variational approximation for approximately characterizing the pseudo-likelihood function, so as to obtain a computer-solvable probability graph neural network model.
  • a pseudo-likelihood function can be used to approximate the above parameterized joint distribution (also known as joint probability distribution).
  • the variational approximation used to approximate the pseudo-likelihood function can be determined to obtain a computer-solvable probability graph neural network model .
  • based on the mean-field assumption, the pseudo-likelihood of Y is approximated using a variational distribution (variational approximation) q(Y), in which each variable y ∈ Y is independent of the others. Likewise, each independent distribution can be parameterized with a neural network.
  • the variational approximation is expressed by formulas (11) to (12):
  • an approximation can thereby be provided for each conditional in the pseudo-likelihood, so that sampling to determine the optimal distribution of the pseudo-likelihood function is avoided.
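  • As a sketch of the forms these quantities plausibly take (a reconstruction under the mean-field assumption described above, not the literal formulas (10) to (12) of the original), the pseudo-likelihood and the variational distribution can be written as:

$$
\mathrm{PL}(Y) \;=\; \prod_{y \in Y} p\!\left(y \mid Y \setminus \{y\}\right),
\qquad
q(Y) \;=\; q_A(A)\,\prod_{i} q_{V_i}(V_i)\,\prod_{i \neq j} q_{E_{ij}}(E_{ij}),
$$

  • where each factor of q is parameterized by a neural network and every variable (the answer, each node, and each directed edge) is treated as independent of the others.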
  • the probabilistic graph neural network model is established through the above steps.
  • the above probabilistic graph neural network model can be trained to obtain the trained probabilistic graph neural network model.
  • the trained probabilistic graph neural network model can be used to determine the answer and argument graph corresponding to the question sentence according to the input associated context and the question sentence.
  • the probabilistic graph neural network model is obtained through the following steps of training:
  • the training sample set includes multiple sets of training samples, and each set of training samples includes a sample associated context, a sample question statement, a sample answer corresponding to the sample question statement, and a sample argument graph from which the sample answer is derived from the sample associated context; the sample argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between nodes, the nodes are statements in the context, a directed edge between nodes represents an inference relationship between the two associated nodes, and the sample associated context is an associated context containing the answer corresponding to the sample question statement.
  • sample associated context and the sample question statement are used as input, and the sample answer and the sample argument graph are used as output to train the probability graph neural network model to obtain the trained probability graph neural network model.
  • the trained probabilistic graph neural network model can be obtained after training the probabilistic graph neural network model for a preset number of times using the above training samples.
  • the preset number of times mentioned above may include 1000 times, 5000 times, and so on, which is not limited here.
  • in some embodiments, using the sample associated context and the sample question statement as input and the sample answer and the sample argument graph as output to train the initial model and obtain the trained probabilistic graph neural network model includes the following.
  • the loss function is established based on the following steps: based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the joint distribution and the approximate variational distribution among the sample answer, the node feature vectors, and the edge feature vectors are established; the loss function is then determined according to the joint distribution and the approximate variational distribution.
  • the sample associated context and the sample question statement are used as input, the sample answer and the sample argument graph are used as output, and the probabilistic graph neural network model is trained with the backpropagation algorithm based on the preset loss function until the preset conditions are met.
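  • As a sketch of such a training procedure (assuming PyTorch; model, compute_loss, and train_loader are illustrative placeholders rather than components named in the disclosure):

```python
# Sketch: train the model with backpropagation against a preset loss.
import torch


def train(model, train_loader, compute_loss, epochs: int = 10, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                      # stop by epoch budget or convergence
        for batch in train_loader:
            # batch: sample context, sample question, sample answer,
            # and sample argument graph (nodes and directed edges).
            optimizer.zero_grad()
            prediction = model(batch["context"], batch["question"])
            loss = compute_loss(prediction, batch["answer"], batch["argument_graph"])
            loss.backward()                      # backpropagation
            optimizer.step()
    return model
```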
  • the loss function can be established first, and then the probability graph neural network model can be trained using the loss function.
  • the established loss function is related to the node feature vector of the sample node, the edge feature vector corresponding to the directed edge, and the joint distribution and approximate variational distribution between the sample answer, node feature vector, and edge feature vector.
  • such a loss function matches the probabilistic graph neural network model of the present disclosure well, so that the above probabilistic graph neural network model can be optimized quickly during training.
  • establishing the joint distribution among the sample answer, the node feature vectors, and the edge feature vectors based on the node feature vector of each sample node and the edge feature vector corresponding to each directed edge includes: determining a first potential function about the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node in the sample argument graph, establishing a second potential function about the sample node and the sample answer according to the feature vector of the sample node; and for each directed edge of the sample argument graph, establishing a third potential function about the directed edge, the sample answer, and the two associated sample nodes according to the feature vector of the directed edge, where the feature vector of a directed edge is related to the feature vectors of the two associated nodes.
  • the first potential function about the sample answer, the second potential function about the sample node and the sample answer, and the third potential function about the directed edge, the sample answer, and the two associated sample nodes are similar to those described above and will not be repeated here.
  • establishing the approximate variational distribution among the sample answer, the node feature vectors, and the edge feature vectors includes: determining the pseudo-likelihood function corresponding to the joint distribution; and, based on the mean-field assumption, using a variational distribution to approximate the pseudo-likelihood function, wherein each variable in the variational distribution is independent of the others, and the variables include: the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph.
  • the joint distribution among the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph can be approximated as a pseudo-likelihood function (refer to the relevant content of formula (10), which is not repeated here).
  • determining the loss function according to the joint distribution and the approximate variational distribution includes: determining a first loss function and a second loss function according to the variational distribution, and determining a third loss function according to the pseudo-likelihood function; the first loss function is used to characterize the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function is used to characterize the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; the third loss function is used to characterize the deviation between the answer predicted by the probabilistic graph neural network model and the sample answer; wherein the nodes and directed edges in the third loss function determined by the pseudo-likelihood function are the prediction results obtained through the variational distribution.
  • the above-mentioned first loss function is characterized by the following formula:
  • P() is the pseudo-likelihood function of the joint distribution.
  • a * is the answer labeled by the training sample.
  • the sum of the first loss function, the second loss function, and the third loss function may be determined as the optimization target.
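  • As a sketch (an assumption about the concrete form, consistent with the description above but not the literal formulas of the original), the optimization target may combine cross-entropy-style terms on the variational node and edge distributions with a negative log pseudo-likelihood term on the answer:

$$
\mathcal{L} \;=\; \underbrace{-\sum_{i}\log q\!\left(V_i = v_i^{*}\right)}_{\text{first loss (nodes)}}
\;+\;\underbrace{-\sum_{i \neq j}\log q\!\left(E_{ij} = e_{ij}^{*}\right)}_{\text{second loss (edges)}}
\;+\;\underbrace{-\log P\!\left(A = a^{*} \mid \hat{V}, \hat{E}\right)}_{\text{third loss (answer)}}
$$

  • where v*, e*, and a* are the labels of the training sample, and the nodes and edges used in the third term are the prediction results obtained through the variational distribution.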
  • the preset conditions mentioned above include:
  • the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition.
  • the change of the sum of the first loss function, the second loss function, and the third loss function obtained in every two adjacent trainings is less than a preset change threshold;
  • the sum of the first loss function, the second loss function, and the third loss function is the smallest.
  • the process of calculating the value of the loss function can be simplified, which is conducive to improving the training efficiency of the probabilistic graph neural network model.
  • the aforementioned preset condition includes that the number of training times reaches a preset number of times threshold.
  • the above probabilistic graph neural network model can achieve higher prediction accuracy.
  • a large-sample training data set can include, for example, 70,000 sets of training data;
  • after testing the probabilistic graph neural network model trained on this data set, the answer accuracy rate is 99.99%;
  • the accuracy rate of the argument graph is 88.8%.
  • the above-mentioned probabilistic graph neural network model is trained in a small-sample training data set (for example, 30,000, 10,000, and 1,000 sets of training data randomly selected from the above-mentioned large-sample training samples).
  • the test results are as follows: with 30,000 groups of training samples, the answer accuracy rate is 99.9% and the argument graph accuracy rate is 86.8%; with 10,000 groups, the answer accuracy rate is 99.9% and the argument graph accuracy rate is 72.4%; with 1,000 groups, the answer accuracy rate is 82.1% and the argument graph accuracy rate is 21.1%.
  • a probabilistic graph neural network model with high accuracy can be obtained. That is, a small amount of training data can be used to train the probabilistic graph neural network model, and a probabilistic graph neural network model with high prediction result accuracy can be obtained.
  • the results of inference on the test sample of the target category are as follows: the accuracy of the answer is 96.3%; the accuracy of the argument graph is 79.3% .
  • FIG. 5 shows a schematic diagram of an application scenario of the information processing method based on natural language reasoning provided by the present disclosure.
  • the input information in FIG. 5 includes: associated context and question statement.
  • the associated context may include fact statements F1, F2 and judgment rule statements R1, R2, R3, R4, R5, R6.
  • Question sentences Q1 and Q2 can be input.
  • the above question statements Q1 and Q2 can be input in batches.
  • the answer A1 and argument diagram 1 corresponding to the question statement Q1 can be obtained through the above steps 302-303, and the answer A2 and argument diagram 2 corresponding to the question statement Q2 can be obtained.
  • Each statement can be thought of as a node.
  • the answer corresponding to question statement Q1 is A1: TRUE, and the argument graph corresponding to answer A1 is a directed edge from node F2 to node R2.
  • the above argument graph illustrates the process of obtaining the answer A1 from nodes F2 and R4: first, the fact statement at node F2 determines that the wire is metal, and then the judgment rule provided by the rule statement R4 determines that the answer to the question is "Yes".
  • a NAF node stands for Negation As Failure, which means that under the Closed World Assumption, for a statement S, if it cannot be inferred from the existing facts and rules that statement S is correct, then statement S is taken to be wrong, and the statement not-S can be introduced as correct.
  • in this example, the NAF node means "the circuit does not have the switch". The above answer A2 can be obtained from the nodes and directed edges in argument graph 2.
  • argument graph 2 includes: the directed edge from node NAF to node R1; the directed edge from node F1 to node R1; the directed edge from node R1 to node R6; the directed edge from node NAF to node R3; and the directed edge from node R3 to node R6.
  • the above argument graph 2 shows the argumentation process that leads to answer A2.
  • the information processing method based on natural language reasoning of these embodiments highlights the steps of using a language model and a probabilistic graph neural network model to obtain the answer to the question and the argument graph; the above probabilistic graph neural network model is obtained from the joint distribution of the answer, the nodes, and the directed edges, so the obtained answer and argument graph are highly correlated, and the argument graph provides stronger support for the answer.
  • the argument graph given by the above probabilistic graph neural network model can assist in the prediction of answers and improve the ability to answer questions.
  • because the probabilistic graph neural network model is obtained from the joint distribution of answers, nodes, and directed edges, it can be trained with few samples, and a probabilistic graph neural network model with high result accuracy can still be obtained.
  • the present disclosure provides some embodiments of an information processing device based on natural language reasoning, which correspond to the method embodiment shown in FIG. 1; the device can be applied to various electronic devices.
  • the information processing device based on natural language reasoning in this embodiment includes: a receiving unit 601 and a determining unit 602 .
  • the receiving unit 601 is used to receive the question statement and the associated context, wherein the question statement is used to characterize the question to be answered;
  • the determining unit 602 is used to determine, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and the argument graph for deriving the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning.
  • for the specific processing of the receiving unit 601 and the determining unit 602 of the information processing device based on natural language inference and the technical effects they bring, reference may be made to the relevant descriptions of step 101 and step 102 in the corresponding embodiment of FIG. 1, which will not be repeated here.
  • the argument graph is a directed acyclic graph
  • the directed acyclic graph includes nodes and directed edges between nodes
  • the nodes are statements in the associated context
  • a directed edge between the nodes represents an inferred relationship between the associated two nodes.
  • the determining unit 602 is further configured to: input the question statement and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector; And the associated context feature vector is input to a probabilistic graph neural network model, and the answer and a demonstration graph of the answer are obtained by inference from the probabilistic graph neural network model.
  • the probabilistic graph neural network model is obtained by the following steps: define the argument graphs of all possible answers to the question statement and the joint distribution of the answers through the probabilistic graph model, so as to explicitly establish the dependence between the argument graphs and the answers, where the argument graph includes the nodes representing the sentences of the associated context and the directed edges between the nodes, and the joint distribution includes answer variables, node variables, and directed edge variables; use the designed answer potential function, node potential function, and edge potential function to explicitly establish the dependencies between different variables in the joint distribution, where the node potential function is related to the nodes and the answer, and the edge potential function is related to the nodes, the answer, and the directed edges between nodes; use the neural network to parameterize each potential function to obtain a parameterized joint distribution; determine the pseudo-likelihood function of the parameterized joint distribution; and determine a variational approximation for approximately characterizing the pseudo-likelihood function, so as to obtain a computer-solvable probabilistic graph neural network model.
  • the shown natural language reasoning-based information processing apparatus further includes a training unit (not shown in the figure).
  • the training unit is used to train the probabilistic graph neural network model based on the following steps to obtain the trained probabilistic graph neural network model: obtain a training sample set, where the training sample set includes multiple sets of training samples, and each set of training samples includes a sample associated context, a sample question statement, a sample answer corresponding to the sample question statement, and a sample argument graph from which the sample answer is derived from the sample associated context; the sample argument graph is a directed acyclic graph that includes nodes and directed edges between nodes, the nodes are statements in the context, a directed edge between nodes represents an inference relationship between the two associated nodes, and the sample associated context is an associated context containing the answer corresponding to the sample question statement; the sample associated context and the sample question statement are used as input, and the sample answer and the sample argument graph are used as output, to train the probabilistic graph neural network model and obtain the trained probabilistic graph neural network model.
  • the training unit is further used to establish the loss function based on the following steps: based on the node feature vector of each sample node and the edge feature vector corresponding to each directed edge, establish the joint distribution and the approximate variational distribution among the sample answer, the node feature vectors, and the edge feature vectors; determine the loss function according to the joint distribution and the approximate variational distribution; and, using the sample associated context and the sample question statement as input and the sample answer and the sample argument graph as output, train the probabilistic graph neural network model with the backpropagation algorithm based on the preset loss function until a preset condition is met.
  • the training unit is further used to: determine the first potential function of the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node of the sample argument graph, establish the second potential function about the sample node and the sample answer according to the feature vector of the sample node; for each directed edge of the sample argument graph, establish the third potential function about the directed edge, the sample answer, and the two associated sample nodes according to the feature vector of the directed edge, where the feature vector of a directed edge is related to the feature vectors of the two associated nodes; and, based on the first potential function, the second potential function corresponding to each sample node, and the third potential function corresponding to each directed edge, parameterize the joint distribution among the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph.
  • the training unit is further used to: determine the pseudo-likelihood function corresponding to the joint distribution; and use a variational distribution to approximate the pseudo-likelihood function based on the mean-field assumption, wherein each variable in the variational distribution is independent of the others, and the variables include: the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph.
  • the training unit is further configured to: determine a first loss function and a second loss function according to the approximate variational distribution; and determine a third loss function according to the pseudo-likelihood function; the first loss function is used to characterize the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function is used to characterize the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; the third loss function is used to characterize the deviation between the answer predicted by the probabilistic graph neural network model and the sample answer; wherein the nodes and directed edges in the third loss function determined by the pseudo-likelihood function are the prediction results obtained through the variational distribution.
  • the preset condition includes: the sum of the first loss function, the second loss function, and the third loss function satisfies a convergence condition; or the number of training times reaches a preset number threshold.
  • the determining unit 602 is further configured to: retrieve at least one sentence from the associated context according to the question statement using a preset retrieval method; encode the retrieved at least one sentence using a preset encoding method; and input the encoded question statement and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
  • FIG. 7 shows an exemplary system architecture in which the information processing method based on natural language reasoning according to an embodiment of the present disclosure can be applied.
  • the system architecture may include terminal devices 701 , 702 , and 703 , a network 704 , and a server 705 .
  • the network 704 is used as a medium for providing communication links between the terminal devices 701 , 702 , 703 and the server 705 .
  • Network 704 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • the terminal devices 701, 702, 703 can interact with the server 705 through the network 704 to receive or send messages and the like.
  • client applications such as web browser applications, search applications, and news information applications, may be installed on the terminal devices 701, 702, and 703.
  • the client applications in the terminal devices 701, 702, and 703 can receive user instructions and complete corresponding functions according to the user instructions, such as adding corresponding information to information according to the user instructions.
  • Terminal devices 701, 702, and 703 may be hardware or software.
  • the terminal devices 701, 702, and 703 may be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, moving picture expert compression standard audio layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, moving picture expert compression standard audio layer 4) player, laptop portable computer and desktop computer, etc.
  • when the terminal devices 701, 702, and 703 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple software programs or software modules (for example, software or software modules for providing distributed services), or as a single software program or software module. No specific limitation is made here.
  • the server 705 can provide various services, such as receiving question sentences and associated contexts sent by terminal devices 701 , 702 , and 703 , analyzing and processing the question sentences and associated contexts, and sending analysis and processing results to the terminal devices.
  • the information processing method provided by the embodiments of the present disclosure can be executed by a terminal device, and accordingly, the information processing apparatus can be set in the terminal devices 701, 702, and 703.
  • the information processing method provided by the embodiments of the present disclosure may also be executed by the server 705, and accordingly, the information processing apparatus may be set in the server 705.
  • the numbers of terminal devices, networks, and servers in FIG. 7 are only illustrative; according to implementation needs, there can be any number of terminal devices, networks, and servers.
  • referring now to FIG. 8, it shows a schematic structural diagram of an electronic device (such as the terminal device or the server in FIG. 7) suitable for implementing the embodiments of the present disclosure.
  • the terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 8 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • an electronic device may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 801, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803.
  • a processing device such as a central processing unit, a graphics processing unit, etc.
  • RAM random access memory
  • in the RAM 803, various programs and data necessary for the operation of the electronic device 800 are also stored.
  • the processing device 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804 .
  • the following devices can be connected to the I/O interface 805: an input device 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 808 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 809.
  • the communication means 809 may allow the electronic device to perform wireless or wired communication with other devices to exchange data. While FIG. 8 shows an electronic device having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 809, or from storage means 808, or from ROM 802.
  • when the computer program is executed by the processing device 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (for example, a communication network) in any form or medium.
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: receives a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; and determines, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and the argument graph for obtaining the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning, the argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between nodes, the nodes are statements in the associated context, and a directed edge between nodes represents an inference relationship between the two associated nodes.
  • computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of a unit does not constitute a limitation of the unit itself under certain circumstances.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logical device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed in embodiments of the present disclosure are an information processing method and apparatus based on natural language inference, and an electronic device. One specific embodiment of the method comprises: receiving a question statement and an associated context, wherein the question statement is used for representing a question to be answered; and on the basis of question feature information of the question statement and context feature information of the associated context, determining an answer to the question and a demonstration graph for obtaining the answer, the demonstration graph representing a process of obtaining the answer by means of inference of the associated context. The demonstration graph for demonstration of the answer is determined at the same time, such that a user can know the process of obtaining the answer conveniently, and the credibility of the answer is improved.

Description

Information processing method, apparatus and electronic device based on natural language reasoning
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application No. 202110744658.3, entitled "Information Processing Method, Apparatus and Electronic Device Based on Natural Language Reasoning", filed on July 1, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of Internet technologies, and in particular to an information processing method, apparatus and electronic device based on natural language reasoning.
Background
With the development of artificial intelligence, it is hoped that artificial intelligence can be used to understand natural language and, on that basis, to realize human-computer dialogue and the like.
In the related art, a knowledge base can be used for automatic reasoning. Early work on automatic reasoning over a knowledge base focused on reasoning over formal representations, that is, each sentence in the knowledge base is expressed as logic rules, such as first-order logic.
Summary
This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description that follows. This Summary is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.
Embodiments of the present disclosure provide an information processing method and apparatus based on natural language reasoning, and an electronic device.
In a first aspect, an embodiment of the present disclosure provides an information processing method based on natural language reasoning, including: receiving a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given; and determining, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, where the argument graph represents a process of deriving the answer by reasoning over the associated context.
In a second aspect, an embodiment of the present disclosure provides an information processing apparatus based on natural language reasoning, including: a receiving unit configured to receive a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given; and a determining unit configured to determine, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, where the argument graph represents a process of deriving the answer by reasoning over the associated context.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information processing method based on natural language reasoning according to the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium having a computer program stored thereon, where the program, when executed by a processor, implements the information processing method based on natural language reasoning according to the first aspect.
According to the information processing method, apparatus and electronic device based on natural language reasoning provided by the embodiments of the present disclosure, a question sentence and an associated context are received, where the question sentence represents a question for which an answer is to be given; and an answer to the question and an argument graph from which the answer is derived are determined based on question feature information of the question sentence and context feature information of the associated context. Thus, when the answer to the question is determined from the associated context, the argument graph that supports the answer is determined at the same time, which makes it easy for a user to understand how the answer was obtained. Compared with solutions that only give an answer or that generate the answer and the argument graph separately, the argument graph in this solution can assist in predicting the answer and improve the ability to answer questions.
Brief Description of the Drawings
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale.
Fig. 1 is a flowchart of some embodiments of an information processing method based on natural language reasoning according to the present disclosure;
Fig. 2 is a flowchart of other embodiments of the information processing method based on natural language reasoning according to the present disclosure;
Fig. 3 is a schematic flowchart of establishing the probabilistic graph neural network model in the embodiments shown in Fig. 2;
Fig. 4a is a schematic diagram of the relationship among the answer variable A, the node variables Vi and the edge variables Eij in the joint distribution in the embodiments shown in Fig. 2;
Fig. 4b shows the factor graph of the joint distribution of the example in Fig. 4a;
Fig. 5 is a schematic diagram of an application scenario of the information processing method based on natural language reasoning provided by the present disclosure;
Fig. 6 is a schematic structural diagram of some embodiments of an information processing apparatus based on natural language reasoning provided by the present disclosure;
Fig. 7 is an exemplary system architecture to which the information processing method based on natural language reasoning according to an embodiment of the present disclosure can be applied;
Fig. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be executed in different orders and/or in parallel. In addition, the method implementations may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules or units, and are not used to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules or units.
It should be noted that the modifiers "one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Please refer to Fig. 1, which shows the flow of some embodiments of an information processing method based on natural language reasoning according to the present disclosure. As shown in Fig. 1, the information processing method based on natural language reasoning includes the following steps:
Step 101: receive a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given.
The question sentence and the associated context are expressed in natural language, for example, a question sentence and an associated context expressed in Chinese, or a question sentence and an associated context expressed in another language.
The associated context may include facts and rules expressed in natural language.
As an illustration, the facts may be, for example:
F1: the circuit includes a battery; F2: the connecting wire is a metal connecting wire.
The rules may include, for example:
R: if the circuit includes a switch and the switch is turned on, the circuit is complete.
In some application scenarios, the question sentence may be a declarative sentence. The answer to be given may be an answer that expresses a judgment result, for example, "right" or "wrong", "yes" or "no", and so on.
Step 102: determine, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, where the argument graph represents the process of deriving the answer by reasoning over the associated context.
The argument graph may be a directed acyclic graph. The directed acyclic graph includes nodes and directed edges between the nodes, where each node is a sentence in the associated context, and a directed edge between two nodes represents an inference relationship between the two associated nodes.
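As a non-limiting illustration of this data structure, the following sketch shows one possible in-memory representation of such an argument graph; the class name, field names and example sentences are assumptions made only for illustration and are not part of the disclosed method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ArgumentGraph:
    """A directed acyclic graph whose nodes are sentences of the associated context."""
    nodes: List[str] = field(default_factory=list)               # sentences used in the proof
    edges: List[Tuple[int, int]] = field(default_factory=list)   # (i, j) means nodes[i] -> nodes[j]

    def add_edge(self, src: int, dst: int) -> None:
        # A directed edge states that the source sentence is used to infer the target sentence.
        self.edges.append((src, dst))

# Example: a fact supports the application of a rule.
graph = ArgumentGraph(nodes=["the wire is a metal wire",
                             "if a wire is metal, it conducts electricity"])
graph.add_edge(0, 1)
```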
The question feature information may be extracted from the question sentence, and the context feature information may be extracted from the associated context, in various ways.
In some optional implementations, the feature information includes feature vectors, and step 102 may include the following steps:
First, the question sentence and the associated context are input into a pre-trained language model to obtain a question feature vector and an associated-context feature vector.
Second, the answer and the argument graph are determined according to the question feature vector and the associated-context feature vector.
The language model may be any of various existing models for determining feature vectors of natural language, and may be any of various types of machine learning models.
Various analyses may be performed on the question feature vector and the associated-context feature vector to determine the answer to the question sentence and the argument graph.
Optionally, before the question sentence and the associated context are input into the pre-trained language model to obtain the question feature vector and the associated-context feature vector, step 102 further includes: retrieving at least one sentence from the associated context according to the question sentence by using a preset retrieval method; and encoding the retrieved at least one sentence by using a preset encoding method. Inputting the question sentence and the associated context into the pre-trained language model to obtain the question feature vector and the associated-context feature vector then includes: inputting the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
In these optional implementations, the associated context may be a relatively long article or a passage of sentences. At least one sentence may be retrieved from the associated context according to the question sentence by using the preset retrieval method, and the retrieved sentences may be sentences that are highly related to the question sentence.
For example, keywords of the question sentence may be used to retrieve at least one sentence from the associated context.
The characters and words included in the at least one sentence may be encoded using a single-byte character set (SBCS), a multi-byte character set (MBCS), a Unicode method, or the like, so as to obtain the encoded at least one sentence that can be processed by a computer. The characters and words included in the question sentence may be encoded using a word-vector encoding method or the like, so as to obtain the encoded question sentence.
A feature-vector analysis model such as a word-vector model may be used to determine the feature vector of the encoded associated context, the feature vectors of the sentences in the associated context, and the feature vector of the encoded question sentence.
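As a non-limiting illustration of the keyword-based retrieval described above, the following sketch scores context sentences by keyword overlap with the question sentence; the tokenization and scoring rule are simplifying assumptions and are not prescribed by the present disclosure.

```python
from typing import List

def retrieve_related_sentences(question: str, context_sentences: List[str], top_k: int = 5) -> List[str]:
    """Return the context sentences that share the most keywords with the question."""
    question_words = set(question.lower().split())
    scored = []
    for sentence in context_sentences:
        overlap = len(question_words & set(sentence.lower().split()))
        scored.append((overlap, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for overlap, sentence in scored[:top_k] if overlap > 0]

# Usage: pick the sentences most related to the question before encoding them.
related = retrieve_related_sentences(
    "does the circuit have a battery",
    ["the circuit includes a battery", "the connecting wire is a metal connecting wire"])
```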
In the related art in which a knowledge base is used for automatic reasoning, since reasoning is performed on formal representations, logic rules must be constructed for every sentence in the knowledge base, and converting a sentence into logic rules requires semantic parsing. Such knowledge-base reasoning solutions can give the answer corresponding to a question, but they do not give an argument together with the answer, so the user cannot determine whether the obtained answer is reasonable, which makes these solutions appear to have a rather weak question-answering ability.
According to the information processing method based on natural language reasoning provided by this embodiment, a question sentence and an associated context are received, where the question sentence represents a question for which an answer is to be given; and an answer to the question and an argument graph from which the answer is derived are determined based on question feature information of the question sentence and context feature information of the associated context. Thus, when the answer to the question is determined from the associated context, the argument graph that supports the answer is determined at the same time, which makes it easy for a user to understand how the answer was obtained. Compared with solutions that only give an answer or that generate the answer and the argument graph separately, the argument graph in this embodiment can assist in predicting the answer and improves the ability to answer questions.
Please continue to refer to Fig. 2, which shows a flowchart of other embodiments of the information processing method based on natural language reasoning provided by the present disclosure.
As shown in Fig. 2, the information processing method based on natural language reasoning provided by these embodiments includes the following steps:
Step 201: receive a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given.
Natural language reasoning may use machine learning models to judge the semantic relationship between sentences. For example, a set of sentences describing facts and a set of judgment rules are input, then a question is input, and the answer to the question is determined from the sentences and the judgment rules. In this embodiment, the machine learning models may include a language model and a probabilistic graph neural network model.
The question sentence and the associated context are expressed in natural language, for example, a question sentence and an associated context expressed in Chinese, or a question sentence and an associated context expressed in another language.
The associated context may include facts and rules expressed in natural language.
In some application scenarios, the question sentence may be a declarative sentence. The answer to be given may be an answer that expresses a judgment result, for example, "right" or "wrong", "yes" or "no", and so on.
Step 202: input the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated-context feature vector.
Before step 202, the information processing method based on natural language reasoning further includes: retrieving at least one sentence from the associated context according to the question sentence by using a preset retrieval method, and encoding the retrieved at least one sentence by using a preset encoding method. Step 202 may then include: inputting the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
In this embodiment, the concatenation of the associated context C (facts and rules) and the question Q may be input into the language model. A preset tag, for example "SEP", may be used to separate the associated context C from the question Q. As an example, the concatenated input of the associated context C and the question may be expressed as: [CLS], C, [SEP], [SEP], Q, [SEP], where "CLS" corresponds to the global representation of the associated context.
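As a non-limiting illustration, one way such an input sequence could be assembled with an off-the-shelf tokenizer is sketched below; the choice of the roberta-base model and the way the context sentences are joined are assumptions made only for illustration.

```python
from transformers import AutoTokenizer

# Any pre-trained encoder could be used here; roberta-base is only an example.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

context_sentences = ["the circuit includes a battery.",
                     "the connecting wire is a metal connecting wire."]
question = "the circuit is complete."

# Encode the context and the question as a sentence pair; the tokenizer inserts the
# special tokens ([CLS]/<s>, [SEP]/</s>) that delimit the two segments.
encoding = tokenizer(" ".join(context_sentences), question,
                     return_tensors="pt", truncation=True)
```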
The following three feature vectors are determined through the language model:
h_A = h_CLS    (1)
h_{V_i} = the representation of sentence s_i produced by the language model    (2)
h_{E_ij} = h_{V_i} ⊕ h_{V_j}    (3)
where h_CLS is the global feature vector of the associated context C, h_{V_i} is the feature vector of the node s_i in the associated context, h_{E_ij} is the feature vector of the directed edge s_i -> s_j, and ⊕ denotes the concatenation of vectors.
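The sketch below shows one plausible way to obtain h_A, h_{V_i} and h_{E_ij} from the encoder output; the use of mean pooling over each sentence's tokens and the sentence_spans helper are assumptions made for illustration and are not fixed by the present disclosure.

```python
import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("roberta-base")

def sentence_features(encoding, sentence_spans):
    """encoding: tokenizer output; sentence_spans: [(start, end)] token spans of each context sentence."""
    hidden = encoder(**encoding).last_hidden_state[0]            # (seq_len, hidden_size)
    h_a = hidden[0]                                              # h_A: the [CLS] (global) vector
    h_v = [hidden[s:e].mean(dim=0) for s, e in sentence_spans]   # one vector per sentence s_i
    # Edge features: concatenation of the two sentence vectors, h_Eij = h_Vi (+) h_Vj.
    h_e = {(i, j): torch.cat([h_v[i], h_v[j]])
           for i in range(len(h_v)) for j in range(len(h_v)) if i != j}
    return h_a, h_v, h_e
```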
Step 203: input the question feature vector and the associated-context feature vector into a probabilistic graph neural network model, and obtain, through inference by the probabilistic graph neural network model, the answer and the argument graph from which the answer is derived.
The answer may be an answer that expresses the meaning of right or wrong, yes or no.
The argument graph may include a plurality of nodes. A node may be a fact, a rule (both expressed in natural language), or an NAF node. NAF stands for Negation As Failure: under the Closed World Assumption (CWA), for a statement S, if S cannot be inferred to be correct from the existing facts and rules, that is, S is false, then not-S can be concluded to be correct. It should be noted that, under the Closed World Assumption, there are no facts in negated form and no rules that derive negated conclusions, because negated facts and rules are redundant under the CWA.
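As a non-limiting illustration of negation as failure under the closed world assumption, the toy sketch below reduces the provability check to membership in a set of derivable statements; this simplification is an assumption made only for illustration.

```python
def naf_holds(statement: str, derivable_statements: set) -> bool:
    """Under the closed-world assumption, 'not statement' holds exactly when
    the statement cannot be derived from the known facts and rules."""
    return statement not in derivable_statements

derivable = {"the circuit includes a battery", "the wire is metal"}
# "the circuit has a switch" cannot be derived, so NAF lets us conclude its negation.
assert naf_holds("the circuit has a switch", derivable)
```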
Please refer to Fig. 3, which shows the steps of establishing the probabilistic graph neural network model used in the embodiments shown in Fig. 2. As shown in Fig. 3, establishing the probabilistic graph neural network model includes the following steps:
Step 301: define, through a probabilistic graphical model, the joint distribution over the argument graphs and answers of all possible answers to the question sentence, so as to explicitly establish the dependence between the argument graph and the answer.
The argument graph includes nodes, which represent sentences of the associated context, and directed edges between the nodes; the joint distribution involves an answer variable, node variables and directed-edge variables.
In the argument graph, each node may correspond to one sentence in the associated context. The possible answers, the nodes of the possible argument graphs, and the directed edges between nodes of the possible argument graphs may be taken as the variables of the joint distribution.
Step 302: explicitly establish the dependencies between the different variables in the joint distribution by using a designed answer potential function, node potential functions and edge potential functions.
The node potential functions are related to the nodes and the answer.
The edge potential functions are related to the nodes, the answer and the directed edges between nodes.
A probabilistic graphical model uses graph theory to represent the relationships among a number of random variables. The graph obtained from a probabilistic graphical model may include a plurality of nodes, each node being a random variable; if two nodes are not connected by an edge, the two variables may be regarded as conditionally independent of each other. Two common kinds of probabilistic graphical models are graphs with directed edges and graphs with undirected edges. According to the directionality of the graph, probabilistic graphical models can be divided into two categories: Bayesian networks and Markov networks. The present disclosure may adopt an undirected probabilistic graphical model.
Specifically, given an associated context C = s_1, s_2, ..., s_n and a question sentence Q, true/false values are assigned to all variables, namely the answer variable A, the node variables V_i and the edge variables E_ij.
All output variables are represented by the following expression (4):
Y = {A} ∪ {V_i | 1 ≤ i ≤ n} ∪ {E_ij | i ≠ j}    (4)
A joint distribution can be defined over all possible Y, denoted p(Y), where:
p(Y) ∝ Φ_A(A) · ∏_i Φ_{V_i}(V_i, A) · ∏_{i≠j} Φ_{E_ij}(V_i, V_j, E_ij, A)    (5)
where Φ_A, Φ_{V_i} and Φ_{E_ij} are, respectively, the answer potential function corresponding to the answer variable A, the node potential function corresponding to the node variable V_i, and the edge potential function corresponding to the edge variable E_ij.
In a Markov network, a series of functions are defined to evaluate how closely the variables influence one another; these functions are called potential functions or factors.
The factorization in the above formula captures the interdependence among the answer variable A, the node variables V_i and the edge variables E_ij.
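As a non-limiting illustration of what this factorization means computationally, the unnormalized log-score of one assignment (an answer together with a candidate argument graph) can be accumulated factor by factor; the tensor shapes follow the dimensions given later in this description (2, 4 and 16 possible value combinations) and are otherwise assumptions.

```python
def joint_log_score(phi_a, phi_v, phi_e, a, v, e):
    """phi_a: (2,) log-scores for A; phi_v[i]: (2, 2) log-scores for (V_i, A);
    phi_e[(i, j)]: (2, 2, 2, 2) log-scores for (V_i, V_j, E_ij, A).
    a, v, e hold the 0/1 assignment being scored."""
    score = phi_a[a]
    for i, phi in enumerate(phi_v):
        score = score + phi[v[i], a]
    for (i, j), phi in phi_e.items():
        score = score + phi[v[i], v[j], e[(i, j)], a]
    return score  # log of the unnormalized joint probability
```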
Fig. 4a shows a schematic diagram of the relationship among the answer variable A, the node variables V_i and the edge variables E_ij. The associated context in Fig. 4a includes the sentences s_1, s_2 and s_3, so the context contains three nodes: s_1, s_2 and s_3. The answer corresponding to the question sentence is "True". The argument graph (proof) includes the node s_1, the node s_3 and the directed edge from node s_1 to node s_3.
The filled circles of the nodes in the right-hand part of the figure indicate that, when the answer variable A takes the value 1, the node V_1 has the value 1, the node V_2 has the value 0 and the node V_3 has the value 1. The directed edge E_13 (from node V_1 to node V_3) has the value 1, and the other directed edges E_12, E_23, E_32 and E_21 have the value 0.
The potential functions corresponding to Fig. 4a include the answer potential function Φ_A, the node potential functions Φ_{V_1}, Φ_{V_2} and Φ_{V_3}, and the edge potential functions Φ_{E_ij} of the directed edges.
Fig. 4b shows the factor graph of the joint distribution p(Y) for the example in Fig. 4a. As shown in Fig. 4b, the factors include the nodes V_1, V_2 and V_3, the edges E_12, E_13, E_21, E_23, E_31 and E_32, the answer V_A, the node potential functions Φ_{V_i} relating each node to the answer, the answer potential function Φ_A, the edge potential functions Φ_{E_ij} corresponding to the edges, and the relationships among these factors.
Theoretically, for the ground-truth assignment y*, the following objective is minimized:
L_joint = -log p(Y = y*)    (6)
Step 303: parameterize each potential function with a neural network to obtain the parameterized joint distribution.
Potential function Φ_A(a): in order to score the possible values (0 or 1) of the answer variable A, a multilayer perceptron (MLP) is used as a nonlinear transformation function and applied to the global feature vector of the associated context C, giving the answer potential function of the answer variable A:
Φ_A = MLP_1(h_A) ∈ R^2    (7)
Potential function Φ_{V_i}(v_i, a): for each sentence s_i (a fact or a rule), the feature vector h_{V_i} of the sentence s_i is obtained as described above. In order to score the possible values of the variable pair (V_i, A), another multilayer perceptron MLP_2 is used as the nonlinear transformation function and applied to the feature vector h_{V_i}, giving the node potential function of the node variable:
Φ_{V_i} = MLP_2(h_{V_i}) ∈ R^4    (8)
where the dimension 4 is the number of possible value combinations of the node variable V_i and the answer variable A. The parameters of MLP_2 can be shared across all sentences.
Potential function Φ_{E_ij}(v_i, v_j, e_ij, a): for each sentence pair (s_i, s_j), the sentence-pair representation h_{E_ij} is obtained. In order to score the four variables (V_i, V_j, E_ij, A), a further multilayer perceptron MLP_3 is used as the nonlinear transformation function and applied to the representation of the directed edge E_ij, giving the edge potential function of the directed-edge variable:
Φ_{E_ij} = MLP_3(h_{V_i} ⊕ h_{V_j}) ∈ R^16    (9)
where ⊕ denotes the concatenation of vectors, and the dimension 16 is the number of possible value combinations of the four variables (V_i, V_j, E_ij, A). The parameters of MLP_3 can be shared across all sentence pairs.
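A minimal sketch of the three potential heads described above is given below; the hidden size and the two-layer shape of each multilayer perceptron are assumptions, and only the output dimensions (2, 4 and 16) follow the text.

```python
import torch
from torch import nn

class PotentialHeads(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # MLP_1 scores the 2 possible values of the answer variable A.
        self.mlp_answer = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU(),
                                        nn.Linear(hidden_size, 2))
        # MLP_2 scores the 4 combinations of (V_i, A); parameters shared across all sentences.
        self.mlp_node = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU(),
                                      nn.Linear(hidden_size, 4))
        # MLP_3 scores the 16 combinations of (V_i, V_j, E_ij, A); shared across sentence pairs.
        self.mlp_edge = nn.Sequential(nn.Linear(2 * hidden_size, hidden_size), nn.ReLU(),
                                      nn.Linear(hidden_size, 16))

    def forward(self, h_a, h_v, h_e):
        phi_a = self.mlp_answer(h_a)                         # (2,)
        phi_v = [self.mlp_node(h).view(2, 2) for h in h_v]   # one (V_i, A) table per sentence
        phi_e = {key: self.mlp_edge(h).view(2, 2, 2, 2) for key, h in h_e.items()}
        return phi_a, phi_v, phi_e
```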
Step 304: determine the pseudo-likelihood function of the parameterized joint distribution, and determine a variational approximation used to approximately characterize the pseudo-likelihood function, so as to obtain a probabilistic graph neural network model that can be solved by a computer.
In order to simplify the computation, a pseudo-likelihood function can be used to approximately characterize the parameterized joint distribution (also called the joint probability distribution):
p(Y) ≈ ∏_{y ∈ Y} p(y | Y \ {y})    (10)
In order to reduce the difficulty of determining the optimal assignment with the pseudo-likelihood function and to allow the joint distribution to be solved by a computer, a variational approximation used to approximately characterize the pseudo-likelihood can be determined, thereby obtaining a computer-solvable probabilistic graph neural network model.
Variational approximation: based on the mean-field assumption, a variational distribution (variational approximation) q(Y) is used to approximate the pseudo-likelihood of Y, in which the variables y ∈ Y are mutually independent. Likewise, each independent factor can be parameterized with a neural network. The variational approximation is expressed by formulas (11) and (12):
q(Y) = ∏_{y ∈ Y} q(y), where each factor q(y) is parameterized with a neural network    (11)-(12)
Once the variational distribution (variational approximation) q(Y) is obtained, it can provide the conditioning values p(y | Y \ {y}) for the pseudo-likelihood, thereby avoiding the sampling that would otherwise be needed to determine the optimal assignment of the pseudo-likelihood function.
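The sketch below illustrates one way the mean-field factors could be produced from the feature vectors of the encoder; treating each factor as a softmax over a small linear head is an assumption and is not a detail fixed by the present disclosure.

```python
import torch
from torch import nn

class MeanFieldHeads(nn.Module):
    """Independent (mean-field) distributions q(A), q(V_i), q(E_ij)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.q_answer = nn.Linear(hidden_size, 2)
        self.q_node = nn.Linear(hidden_size, 2)
        self.q_edge = nn.Linear(2 * hidden_size, 2)

    def forward(self, h_a, h_v, h_e):
        q_a = torch.softmax(self.q_answer(h_a), dim=-1)
        q_v = [torch.softmax(self.q_node(h), dim=-1) for h in h_v]
        q_e = {key: torch.softmax(self.q_edge(h), dim=-1) for key, h in h_e.items()}
        return q_a, q_v, q_e  # each is a distribution over the values {0, 1}
```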
Through the above steps, the probabilistic graph neural network model is established. The probabilistic graph neural network model can then be trained to obtain a trained probabilistic graph neural network model, and the trained model can be used to determine, from an input associated context and question sentence, the answer corresponding to the question sentence and the argument graph.
In some optional implementations, the probabilistic graph neural network model is trained through the following steps:
First, a training sample set is acquired. The training sample set includes a plurality of groups of training samples, and each group of training samples includes a sample associated context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample argument graph through which the sample answer is derived from the sample associated context. The sample argument graph is a directed acyclic graph that includes nodes and directed edges between the nodes, where each node is a sentence in the context and a directed edge between two nodes represents an inference relationship between the two associated nodes; the sample associated context is an associated context that contains the answer corresponding to the sample question sentence.
Second, the sample associated context and the sample question sentence are taken as input, the sample answer and the sample argument graph are taken as output, and the probabilistic graph neural network model is trained to obtain the trained probabilistic graph neural network model.
In some application scenarios, the trained probabilistic graph neural network model may be obtained after the probabilistic graph neural network model has been trained a preset number of times with the training samples, for example 1000 times or 5000 times, which is not limited here.
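As a non-limiting illustration, one way of holding such a group of training samples in code is sketched below; the field names are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainingSample:
    context_sentences: List[str]        # sample associated context (facts and rules)
    question: str                       # sample question sentence
    answer: bool                        # sample answer (true/false)
    proof_nodes: List[int]              # indices of context sentences in the sample argument graph
    proof_edges: List[Tuple[int, int]]  # directed edges (i, j) of the sample argument graph
```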
In some other optional implementations, taking the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, training the initial model to obtain the trained neural network model includes the following:
First, a loss function is established based on the following steps: based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, a joint distribution and an approximate variational distribution over the sample answer, the node feature vectors and the edge feature vectors are established; and the loss function is determined according to the joint distribution and the approximate variational distribution.
Second, the sample associated context and the sample question sentence are taken as input, the sample answer and the sample argument graph are taken as output, and the probabilistic graph neural network model is trained with a back-propagation algorithm based on the preset loss function until a preset condition is met.
For training the probabilistic graph neural network model with a back-propagation algorithm based on the preset loss function, reference may be made to existing methods of training neural network models with back-propagation, which will not be described in detail here.
In these optional implementations, the loss function may be established first, and the probabilistic graph neural network model is then trained with the loss function. The established loss function is related to the node feature vectors of the sample nodes, the edge feature vectors corresponding to the directed edges, and the joint distribution and approximate variational distribution established over the sample answer, the node feature vectors and the edge feature vectors. It therefore matches the probabilistic graph neural network model of the present disclosure closely, and during training it helps to optimize the model quickly.
Further optionally, establishing the joint distribution over the sample answer, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges includes: determining a first potential function for the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node of the sample argument graph, establishing, according to the feature vector of that sample node, a second potential function for that sample node and the sample answer; and, for each directed edge of the sample argument graph, establishing, according to the feature vector of that directed edge, a third potential function for that directed edge, the sample answer and the two associated sample nodes, where the feature vector of the directed edge is related to the feature vectors of the two associated nodes.
For the training samples, the first potential function for the sample answer, the second potential function for a sample node and the sample answer, and the third potential function for a sample directed edge, the sample answer and the two associated sample nodes can be established in the same way as in formulas (7), (8) and (9), respectively, which will not be repeated here.
Establishing the approximate variational distribution over the sample answer, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges includes: determining the pseudo-likelihood function corresponding to the joint distribution; and, based on the mean-field assumption, using a variational distribution to approximate the pseudo-likelihood function, where the variables in the variational distribution are mutually independent, the variables including the sample answer, the nodes of the sample argument graph and the directed edges of the sample argument graph.
In order to facilitate computing the value of the loss function, the joint distribution over the sample answer, the nodes of the sample argument graph and the directed edges of the sample argument graph may be approximated by a pseudo-likelihood function (see the description of formula (10), not repeated here). Specifically, determining the loss function according to the joint distribution and the approximate variational distribution includes: determining a first loss function and a second loss function according to the variational distribution, and determining a third loss function according to the pseudo-likelihood function. The first loss function characterizes the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function characterizes the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; and the third loss function characterizes the deviation between the predicted answer of the probabilistic graph neural network model and the sample answer, where the nodes and directed edges that enter the third loss function determined through the pseudo-likelihood function are the predictions of the variational distribution.
For the approximate variational distribution over the sample answer, the node feature vectors and the edge feature vectors, reference may be made to formulas (11) and (12), which will not be repeated here.
Specifically, the first loss function is characterized by the following formula:
L_V = -Σ_i log q(V_i = v*_i)    (13)
The second loss function is characterized by the following formula:
L_E = -Σ_{i,j} log q(E_ij = e*_ij)    (14)
The third loss function can be characterized by the following formula:
L_A = -log p(A = a* | V = V̂, E = Ê)    (15)
where V̂ and Ê are the node and edge predictions of the variational approximation, p(·|·) here is the conditional given by the pseudo-likelihood function of the joint distribution, v*_i and e*_ij are the nodes and directed edges annotated in the training sample, and a* is the answer annotated in the training sample.
In some application scenarios, the sum of the first loss function, the second loss function and the third loss function may be taken as the optimization objective.
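As a non-limiting illustration, the sketch below combines the three loss terms into a single training objective; it assumes the mean-field heads and training-sample structure sketched earlier and 0/1 value encodings, all of which are illustrative assumptions.

```python
import torch

def total_loss(q_a, q_v, q_e, p_answer_given_pred, sample):
    """q_a, q_v[i], q_e[(i, j)]: mean-field distributions over {0, 1};
    p_answer_given_pred: pseudo-likelihood of the answer conditioned on the predicted graph;
    sample: a TrainingSample with annotated proof nodes, edges and answer."""
    node_labels = [1 if i in sample.proof_nodes else 0 for i in range(len(q_v))]
    loss_nodes = -sum(torch.log(q_v[i][label]) for i, label in enumerate(node_labels))
    loss_edges = -sum(torch.log(q_e[key][1 if key in sample.proof_edges else 0]) for key in q_e)
    loss_answer = -torch.log(p_answer_given_pred[int(sample.answer)])
    return loss_nodes + loss_edges + loss_answer
```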
Optionally, the preset condition includes: the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition.
The convergence condition may be that, over a plurality of consecutive training iterations, the change in the sum of the first loss function, the second loss function and the third loss function between every two adjacent iterations is smaller than a preset change threshold, or that the sum of the first loss function, the second loss function and the third loss function is minimized.
Through the above process, the computation of the value of the loss function can be simplified, which helps to improve the training efficiency of the probabilistic graph neural network model.
Optionally, the preset condition includes that the number of training iterations reaches a preset number threshold.
The probabilistic graph neural network model described above can achieve a high prediction accuracy.
After the probabilistic graph neural network model was trained on a large training data set (for example, a data set containing 70,000 training examples), the test results of the model were: an answer accuracy of 99.99% and an argument-graph accuracy of 88.8%.
The probabilistic graph neural network model was also trained on small training data sets (for example, 30,000, 10,000 and 1,000 groups of training data randomly sampled from the above large training sample set), and the trained models were tested. The test results are as follows: with 30,000 groups of training samples, the answer accuracy is 99.9% and the argument-graph accuracy is 86.8%; with 10,000 groups, the answer accuracy is 99.9% and the argument-graph accuracy is 72.4%; with 1,000 groups, the answer accuracy is 82.1% and the argument-graph accuracy is 21.1%. That is, the probabilistic graph neural network model can still reach a relatively high accuracy when trained with a small number of samples; in other words, a probabilistic graph neural network model with accurate prediction results can be obtained with a relatively small amount of training data.
Evaluating the model by training it on training samples of non-target categories and then using the trained model to reason about questions of a target category according to that category's associated context is referred to as zero-shot evaluation.
After the probabilistic graph neural network model was trained on a large data set that did not include the target category, the results of reasoning on test samples of the target category were: an answer accuracy of 96.3% and an argument-graph accuracy of 79.3%.
It can be seen from the above data that the trained probabilistic graph neural network model is highly transferable: when the trained model is applied to natural language reasoning settings of other categories, good reasoning results can still be obtained.
Please refer to Fig. 5, which shows a schematic diagram of an application scenario of the information processing method based on natural language reasoning provided by the present disclosure.
As shown in Fig. 5, the input information in Fig. 5 includes an associated context and question sentences. The associated context may include the fact sentences F1 and F2 and the judgment-rule sentences R1, R2, R3, R4, R5 and R6. The question sentences Q1 and Q2 may be input, and may be input separately. Through the steps described above, the answer A1 and argument graph 1 corresponding to the question sentence Q1, and the answer A2 and argument graph 2 corresponding to the question sentence Q2, can be obtained. Each sentence can be regarded as a node. The answer A1 corresponding to the question sentence Q1 is TRUE, and the argument graph for the answer A1 is a directed edge from node F2 to node R2. This argument graph describes the process of obtaining the answer A1 from node F2 and node R2: first, the fact sentence of node F2 determines that the wire is metal, and then, according to the judgment rule provided by the rule sentence R4, the answer to the question is determined to be "true".
For the question sentence Q2, "no current flows through the circuit", the corresponding answer A2 is FALSE, and argument graph 2 shows the process of obtaining the answer A2. An NAF node (Negation As Failure) indicates that, under the Closed World Assumption, for a statement S, if it cannot be inferred from the existing facts and rules that the statement S is correct, that is, the statement S is false, then the negation of the statement S can be concluded to be correct. In Fig. 5, the NAF node represents "the circuit does not have the switch". The answer A2 can be derived from the nodes and directed edges in argument graph 2, that is, from the directed edge from the NAF node to node R1 and the directed edge from node F1 to node R1; the directed edge from node R1 to node R6; the directed edge from the NAF node to node R3 and the directed edge from node R3 to node R6; and the directed edge from node F2 to node R4 and the directed edge from node R4 to node R6. Argument graph 2 thus gives the argument from which the answer A2 is obtained.
The information processing method based on natural language reasoning provided by this embodiment highlights the steps of obtaining the answer matching the question and the argument graph by using the language model and the probabilistic graph neural network model. Because the probabilistic graph neural network model is derived from the joint distribution over the answer, the nodes and the directed edges, the obtained answer and argument graph are closely correlated, and the argument graph provides strong support for the answer. The argument graph given by the probabilistic graphical model neural network can assist in predicting the answer and improves the ability to answer questions. In addition, since the probabilistic graph neural network model is derived from the joint distribution over the answer, the nodes and the directed edges, it can be trained with a small number of samples while still yielding highly accurate results.
进一步参考图6,作为对上述各图所示方法的实现,本公开提供了一种基于自然语言推理的信息处理装置的一些实施例,该装置实施例 与图1所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。Further referring to FIG. 6 , as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an information processing device based on natural language reasoning, which corresponds to the method embodiment shown in FIG. 1 , the device can be specifically applied to various electronic devices.
如图6所示,本实施例的基于自然语言推理的信息处理装置包括:接收单元601、确定单元602。其中,接收单元601,用于接收问题语句以及关联上下文,其中所述问题语句用于表征待给定答案的问题;确定单元602,用于基于所述问题语句的问题特征信息以及所述关联上下文的上下文特征信息,确定所述问题的答案以及得出答案的论证图;所述论证图表征由所述关联上下文推理得到所述答案的过程As shown in FIG. 6 , the information processing device based on natural language reasoning in this embodiment includes: a receiving unit 601 and a determining unit 602 . Wherein, the receiving unit 601 is used to receive the question statement and the associated context, wherein the question statement is used to characterize the question to be answered; the determining unit 602 is used to receive the question characteristic information based on the question statement and the associated context The contextual feature information of the question, determine the answer to the question and the argument graph for deriving the answer; the argument graph represents the process of deriving the answer from the associated context reasoning
在本实施例中,基于自然语言推理的信息处理装置的生成单元601、接收单元601、确定单元602的具体处理及其所带来的技术效果可分别参考图1对应实施例中步骤101、步骤102的相关说明,在此不再赘述。In this embodiment, the specific processing of the generating unit 601, the receiving unit 601, and the determining unit 602 of the information processing device based on natural language inference and the technical effects brought about by them can refer to step 101 and step 101 in the corresponding embodiment in FIG. Relevant descriptions of 102 will not be repeated here.
In some optional implementations, the argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between the nodes, the nodes are sentences in the associated context, and a directed edge between nodes represents an inference relationship between the two associated nodes.
In some optional implementations, the determining unit 602 is further configured to: input the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated-context feature vector; and input the question feature vector and the associated-context feature vector into a probabilistic graph neural network model, which infers the answer and the argument graph from which the answer is derived.
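As a rough sketch of the two-stage data flow just described (the function and object names below, such as language_model and pgnn, are hypothetical placeholders rather than components named in the disclosure):

    # Hypothetical data-flow sketch of the determining unit 602.
    def determine_answer(question, context_sentences, language_model, pgnn):
        # Stage 1: the pre-trained language model encodes the question and the
        # associated context into a question feature vector and context feature vectors.
        q_vec, ctx_vecs = language_model.encode(question, context_sentences)
        # Stage 2: the probabilistic graph neural network model infers the answer
        # together with the argument graph (nodes and directed edges).
        answer, nodes, edges = pgnn.infer(q_vec, ctx_vecs)
        return answer, (nodes, edges)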
In some optional implementations, the probabilistic graph neural network model is obtained by the following steps: defining, through a probabilistic graphical model, the joint distribution of the argument graphs and answers over all possible answers to the question sentence, so as to explicitly establish the dependency between the argument graph and the answer, where the argument graph includes nodes representing sentences of the associated context and directed edges between the nodes, and the joint distribution involves answer variables, node variables and directed-edge variables; explicitly establishing the dependencies among the different variables in the joint distribution with designed answer, node and edge potential functions, where the node potential function relates to a node and the answer, and the edge potential function relates to nodes, the answer and a directed edge between nodes; parameterizing each potential function with a neural network to obtain the parameterized joint distribution; determining the pseudo-likelihood function of the parameterized joint distribution; and determining a variational approximation that approximately characterizes the pseudo-likelihood function, so as to obtain a computer-solvable probabilistic graph neural network model.
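Read as a formula (an editorial gloss with assumed notation, where a is the answer variable, v_i the node variables, e_ij the directed-edge variables, and q, c the question and associated context), the construction above amounts to a joint distribution factored over potential functions:

    p(a, V, E \mid q, c) \;\propto\; \phi_{\mathrm{ans}}(a \mid q, c)\,
        \prod_{i} \phi_{\mathrm{node}}(v_i, a)\,
        \prod_{(i,j)} \phi_{\mathrm{edge}}(e_{ij}, v_i, v_j, a)

Each potential function is parameterized by a neural network; the pseudo-likelihood replaces this joint with a product of conditionals of each variable given the remaining ones, and the variational approximation then makes that product tractable to optimize.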
In some optional implementations, the information processing apparatus based on natural language reasoning further includes a training unit (not shown in the figure). The training unit is configured to train the probabilistic graph neural network model based on the following steps to obtain a trained probabilistic graph neural network model: acquiring a training sample set, the training sample set including multiple groups of training samples, each group of training samples including a sample associated context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample argument graph from which the sample answer is derived on the basis of the sample associated context, where the sample argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between the nodes, the nodes are sentences in the context, a directed edge between nodes represents an inference relationship between the two associated nodes, and the sample associated context is an associated context that contains the answer corresponding to the sample question sentence; and training the probabilistic graph neural network model with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, to obtain the trained probabilistic graph neural network model.
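For orientation only, one training sample of the set described above could be represented as follows (the field names and types are hypothetical illustration, not the format fixed by the disclosure):

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class TrainingSample:
        context_sentences: List[str]          # sample associated context (contains the answer)
        question: str                         # sample question sentence
        answer: bool                          # sample answer, e.g. TRUE/FALSE
        graph_nodes: List[int]                # indices of context sentences used as nodes
        graph_edges: List[Tuple[int, int]]    # directed edges of the sample argument graph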
In some optional implementations, the training unit is further configured to establish the loss function based on the following steps: establishing, on the basis of the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the joint distribution and an approximate variational distribution among the sample answer, the node feature vectors and the edge feature vectors; determining the loss function according to the joint distribution and the approximate variational distribution; and, with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, training the probabilistic graph neural network model with a back-propagation algorithm based on the preset loss function until a preset condition is met.
In some optional implementations, the training unit is further configured to: determine a first potential function for the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node of the sample argument graph, establish a second potential function for that sample node and the sample answer according to the feature vector of the sample node; for each directed edge of the sample argument graph, establish a third potential function for the directed edge, the sample answer and the two associated sample nodes according to the feature vector of the directed edge, the feature vector of the directed edge being related to the feature vectors of the two nodes it connects; and, based on the first potential function, the second potential functions corresponding to the sample nodes and the third potential functions corresponding to the sample directed edges, parameterize the joint distribution among the sample answer, the sample argument graph nodes and the sample argument graph directed edges.
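Purely as an illustrative sketch of how the three potential functions might be parameterized with neural networks (the use of PyTorch, the module shapes and the binary, one-hot encoded answer are assumptions of this sketch, not details fixed by the disclosure):

    import torch
    import torch.nn as nn

    class PotentialFunctions(nn.Module):
        # Sketch: small networks score the answer, each node, and each directed edge,
        # mirroring the first, second and third potential functions described above.
        def __init__(self, dim):
            super().__init__()
            self.answer_pot = nn.Linear(dim, 2)                      # first: answer given global features
            self.node_pot = nn.Bilinear(dim, 2, 1)                   # second: node feature vs. answer
            self.edge_pot = nn.Sequential(                           # third: edge endpoints vs. answer
                nn.Linear(2 * dim + 2, dim), nn.ReLU(), nn.Linear(dim, 1))

        def forward(self, global_feat, node_feats, edge_index, answer_onehot):
            # answer_onehot is a length-2 one-hot float vector for the candidate answer.
            ans_score = self.answer_pot(global_feat)                 # score of each answer value
            a = answer_onehot.expand(node_feats.size(0), -1)
            node_scores = self.node_pot(node_feats, a)               # one score per node
            src, dst = edge_index                                    # edge feature from its two endpoints
            edge_feats = torch.cat([node_feats[src], node_feats[dst],
                                    answer_onehot.expand(src.size(0), -1)], dim=-1)
            edge_scores = self.edge_pot(edge_feats)                  # one score per directed edge
            return ans_score, node_scores, edge_scores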
In some optional implementations, the training unit is further configured to: determine the pseudo-likelihood function corresponding to the joint distribution; and approximate the pseudo-likelihood function with a variational distribution based on the mean-field assumption, where the variables in the variational distribution are mutually independent, the variables including the sample answer, the nodes in the sample argument graph and the directed edges in the sample argument graph.
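In the same assumed notation as above (again an editorial gloss rather than a formula reproduced from the disclosure), the mean-field assumption means that the variational distribution factorizes over the answer, the nodes and the directed edges:

    q(a, V, E) \;=\; q(a)\,\prod_{i} q(v_i)\,\prod_{(i,j)} q(e_{ij})

so that each factor can be fitted to the corresponding conditional of the pseudo-likelihood independently of the others.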
In some optional implementations, the training unit is further configured to: determine a first loss function and a second loss function according to the approximate variational distribution, and determine a third loss function according to the pseudo-likelihood function. The first loss function characterizes the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function characterizes the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; and the third loss function characterizes the deviation between the predicted answer given by the probabilistic graph neural network model and the sample answer, where the nodes and directed edges appearing in the third loss function determined through the pseudo-likelihood function are prediction results of the variational distribution.
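A minimal sketch of how the three loss terms could be combined during training (the cross-entropy form and the equal weighting of the terms are assumptions of this sketch):

    import torch.nn.functional as F

    def total_loss(node_logits, gold_nodes, edge_logits, gold_edges, answer_logits, gold_answer):
        # First loss: deviation between predicted nodes and the sample nodes.
        loss_nodes = F.binary_cross_entropy_with_logits(node_logits, gold_nodes.float())
        # Second loss: deviation between predicted directed edges and the sample edges.
        loss_edges = F.binary_cross_entropy_with_logits(edge_logits, gold_edges.float())
        # Third loss: deviation between the predicted answer and the sample answer,
        # computed with nodes/edges predicted by the variational distribution.
        loss_answer = F.cross_entropy(answer_logits.unsqueeze(0), gold_answer.view(1))
        return loss_nodes + loss_edges + loss_answer

Back-propagation would then be run on this sum until a stopping condition such as the one described below is met.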
In some optional implementations, the preset condition includes: the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition; or the number of training iterations reaches a preset threshold.
In some optional implementations, the determining unit 602 is further configured to: retrieve at least one sentence from the associated context according to the question sentence by using a preset retrieval method; encode the retrieved at least one sentence by using a preset encoding method; and input the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
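As a minimal illustration of the retrieve-then-encode step (the word-overlap scoring and the top_k parameter below are placeholder assumptions; the disclosure only requires some preset retrieval and encoding method):

    def retrieve(question, context_sentences, top_k=5):
        # Toy lexical-overlap retrieval: keep the sentences that share the most words
        # with the question sentence.
        q_words = set(question.lower().split())
        scored = sorted(context_sentences,
                        key=lambda s: len(q_words & set(s.lower().split())),
                        reverse=True)
        return scored[:top_k]

The retrieved sentences and the question sentence are then encoded (for example, tokenized) and fed to the pre-trained language model to obtain the question feature vector and the associated-context feature vector.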
Referring to FIG. 7, FIG. 7 shows an exemplary system architecture to which the information processing method based on natural language reasoning of an embodiment of the present disclosure can be applied.
As shown in FIG. 7, the system architecture may include terminal devices 701, 702 and 703, a network 704, and a server 705. The network 704 serves as a medium for providing communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.

The terminal devices 701, 702 and 703 may interact with the server 705 through the network 704 to receive or send messages and the like. Various client applications, such as web browser applications, search applications and news applications, may be installed on the terminal devices 701, 702 and 703. The client applications in the terminal devices 701, 702 and 703 can receive user instructions and perform corresponding functions according to the user instructions, for example, adding corresponding information to existing information according to a user instruction.

The terminal devices 701, 702 and 703 may be hardware or software. When the terminal devices 701, 702 and 703 are hardware, they may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers and the like. When the terminal devices 701, 702 and 703 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.

The server 705 may provide various services, for example, receiving question sentences and associated contexts sent by the terminal devices 701, 702 and 703, analyzing and processing the question sentences and associated contexts, and sending the analysis and processing results to the terminal devices.
It should be noted that the information processing method provided by the embodiments of the present disclosure may be executed by a terminal device, and accordingly, the information processing apparatus may be provided in the terminal devices 701, 702 and 703. In addition, the information processing method provided by the embodiments of the present disclosure may also be executed by the server 705, and accordingly, the information processing apparatus may be provided in the server 705.
It should be understood that the numbers of terminal devices, networks and servers in FIG. 7 are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation needs.
Referring now to FIG. 8, it shows a schematic structural diagram of an electronic device (for example, the terminal device or the server in FIG. 7) suitable for implementing the embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 8, the electronic device may include a processing apparatus (for example, a central processing unit, a graphics processing unit, etc.) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage apparatus 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing apparatus 801, the ROM 802 and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Generally, the following apparatuses may be connected to the I/O interface 805: an input apparatus 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope and the like; an output apparatus 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator and the like; a storage apparatus 808 including, for example, a magnetic tape, a hard disk and the like; and a communication apparatus 809. The communication apparatus 809 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows an electronic device having various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 809, or installed from the storage apparatus 808, or installed from the ROM 802. When the computer program is executed by the processing apparatus 801, the above functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to an electrical wire, an optical cable, RF (radio frequency) and the like, or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet) and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.

The above computer-readable medium may be included in the above electronic device, or may exist independently without being incorporated into the electronic device.

The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a question sentence and an associated context, where the question sentence represents a question to be answered; and determine, based on question feature information of the question sentence and context feature information of the associated context, the answer to the question and an argument graph from which the answer is derived, where the argument graph represents the process of deriving the answer from the associated context by reasoning, the argument graph is a directed acyclic graph including nodes and directed edges between the nodes, the nodes are sentences in the associated context, and a directed edge between nodes represents an inference relationship between the two associated nodes.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of the systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs) and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved herein is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.

In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments individually or in any suitable sub-combination.

Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (14)

1. An information processing method based on natural language reasoning, comprising:
    receiving a question sentence and an associated context, wherein the question sentence represents a question to be answered; and
    determining, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, the argument graph representing a process of deriving the answer from the associated context by reasoning.
2. The method according to claim 1, wherein the argument graph is a directed acyclic graph, the directed acyclic graph comprises nodes and directed edges between the nodes, the nodes are sentences in the associated context, and a directed edge between the nodes represents an inference relationship between the two associated nodes.
3. The method according to claim 1, wherein determining, based on the question feature information of the question sentence and the context feature information of the associated context, the answer to the question and the argument graph from which the answer is derived comprises:
    inputting the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated-context feature vector; and
    inputting the question feature vector and the associated-context feature vector into a probabilistic graph neural network model, the probabilistic graph neural network model inferring the answer and the argument graph from which the answer is derived.
4. The method according to claim 3, wherein the argument graph comprises nodes and directed edges between the nodes, and the probabilistic graph neural network model is obtained based on the following steps:
    defining, through a probabilistic graphical model, a joint distribution of the argument graphs and the answers over all possible answers to the question sentence, so as to explicitly establish the dependency between the argument graph and the answer, the argument graph comprising nodes representing sentences of the associated context and directed edges between the nodes, and the joint distribution comprising answer variables, node variables and directed-edge variables;
    explicitly establishing dependencies among different variables in the joint distribution by using a designed answer potential function, node potential function and edge potential function, wherein the node potential function relates to a node and the answer, and the edge potential function relates to nodes, the answer and a directed edge between nodes;
    parameterizing each potential function with a neural network to obtain a parameterized joint distribution; and
    determining a pseudo-likelihood function of the parameterized joint distribution, and determining a variational approximation that approximately characterizes the pseudo-likelihood function, so as to obtain a computer-solvable probabilistic graph neural network model.
5. The method according to claim 3, wherein the probabilistic graph neural network model is trained through the following steps:
    acquiring a training sample set, the training sample set comprising multiple groups of training samples, each group of training samples comprising a sample associated context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample argument graph from which the sample answer is derived on the basis of the sample associated context; and
    training the probabilistic graph neural network model with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, to obtain a trained probabilistic graph neural network model.
6. The method according to claim 5, wherein the argument graph comprises nodes and directed edges between the nodes, and training the probabilistic graph neural network model with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output to obtain the trained probabilistic graph neural network model comprises:
    establishing a loss function based on the following steps: establishing, on the basis of node feature vectors of the sample nodes and edge feature vectors corresponding to the directed edges, a joint distribution and an approximate variational distribution among the sample answer, the node feature vectors and the edge feature vectors; and determining the loss function according to the joint distribution and the approximate variational distribution; and
    with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, training the probabilistic graph neural network model with a back-propagation algorithm based on the loss function until a preset condition is met.
7. The method according to claim 6, wherein establishing, on the basis of the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the joint distribution among the sample answer, the node feature vectors and the edge feature vectors comprises:
    determining a first potential function for the sample answer according to a global feature representation of the sample question and the sample associated context;
    for each sample node of the sample argument graph, establishing a second potential function for the sample node and the sample answer according to a feature vector of the sample node;
    for each directed edge of the sample argument graph, establishing a third potential function for the directed edge, the sample answer and the two associated sample nodes according to a feature vector of the directed edge, the feature vector of the directed edge being related to the feature vectors respectively corresponding to the two nodes it connects; and
    parameterizing, based on the first potential function, the second potential functions respectively corresponding to the sample nodes and the third potential functions respectively corresponding to the sample directed edges, the joint distribution among the sample answer, the sample argument graph nodes and the sample argument graph directed edges.
8. The method according to claim 6, wherein establishing, on the basis of the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the approximate variational distribution among the sample answer, the node feature vectors and the edge feature vectors comprises:
    determining a pseudo-likelihood function corresponding to the joint distribution; and
    approximating the pseudo-likelihood function with a variational distribution based on a mean-field assumption, wherein variables in the variational distribution are mutually independent, the variables comprising the sample answer, the nodes in the sample argument graph and the directed edges in the sample argument graph.
9. The method according to claim 8, wherein determining the loss function according to the joint distribution and the approximate variational distribution comprises:
    determining a first loss function and a second loss function according to the variational distribution, and determining a third loss function according to the pseudo-likelihood function;
    wherein the first loss function characterizes a deviation between predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph;
    the second loss function characterizes a deviation between predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; and
    the third loss function characterizes a deviation between a predicted answer given by the probabilistic graph neural network model and the sample answer, the nodes and directed edges in the third loss function determined through the pseudo-likelihood function being prediction results of the variational distribution.
10. The method according to claim 6, wherein the preset condition comprises:
    the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition; or
    the number of training iterations reaches a preset threshold.
11. The method according to claim 3, wherein, before inputting the question sentence and the associated context into the pre-trained language model to obtain the question feature vector and the associated-context feature vector, determining, based on the question feature information of the question sentence and the context feature information of the associated context, the answer to the question and the argument graph from which the answer is derived further comprises:
    retrieving at least one sentence from the associated context according to the question sentence by using a preset retrieval method; and
    encoding the retrieved at least one sentence by using a preset encoding method;
    and inputting the question sentence and the associated context into the pre-trained language model to obtain the question feature vector and the associated-context feature vector comprises:
    inputting the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
12. An information processing apparatus based on natural language reasoning, comprising:
    a receiving unit, configured to receive a question sentence and an associated context, wherein the question sentence represents a question to be answered; and
    a determining unit, configured to determine, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, the argument graph representing a process of deriving the answer from the associated context by reasoning.
13. An electronic device, comprising:
    one or more processors; and
    a storage apparatus for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-11.
14. A computer-readable medium on which a computer program is stored, wherein, when the program is executed by a processor, the method according to any one of claims 1-11 is implemented.
PCT/CN2022/101739 2021-07-01 2022-06-28 Information processing method and apparatus based on natural language inference, and electronic device WO2023274187A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110744658.3 2021-07-01
CN202110744658.3A CN113505206B (en) 2021-07-01 2021-07-01 Information processing method and device based on natural language reasoning and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023274187A1 true WO2023274187A1 (en) 2023-01-05

Family

ID=78009578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/101739 WO2023274187A1 (en) 2021-07-01 2022-06-28 Information processing method and apparatus based on natural language inference, and electronic device

Country Status (2)

Country Link
CN (1) CN113505206B (en)
WO (1) WO2023274187A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505206B (en) * 2021-07-01 2023-04-18 北京有竹居网络技术有限公司 Information processing method and device based on natural language reasoning and electronic equipment
CN116226478B (en) * 2022-12-27 2024-03-19 北京百度网讯科技有限公司 Information processing method, model training method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040254903A1 (en) * 2003-06-10 2004-12-16 Heckerman David E. Systems and methods for tractable variational approximation for inference in decision-graph bayesian networks
CN106934012A (en) * 2017-03-10 2017-07-07 上海数眼科技发展有限公司 A kind of question answering in natural language method and system of knowledge based collection of illustrative plates
CN108763284A (en) * 2018-04-13 2018-11-06 华南理工大学 A kind of question answering system implementation method based on deep learning and topic model
CN109947912A (en) * 2019-01-25 2019-06-28 四川大学 A kind of model method based on paragraph internal reasoning and combined problem answer matches
CN111538819A (en) * 2020-03-27 2020-08-14 北京工商大学 Method for constructing question-answering system based on document set multi-hop inference
CN111814982A (en) * 2020-07-15 2020-10-23 四川大学 Multi-hop question-answer oriented dynamic reasoning network and method
CN112380835A (en) * 2020-10-10 2021-02-19 中国科学院信息工程研究所 Question answer extraction method fusing entity and sentence reasoning information and electronic device
US20210056445A1 (en) * 2019-08-22 2021-02-25 International Business Machines Corporation Conversation history within conversational machine reading comprehension
CN112597316A (en) * 2020-12-30 2021-04-02 厦门渊亭信息科技有限公司 Interpretable reasoning question-answering method and device
CN113505206A (en) * 2021-07-01 2021-10-15 北京有竹居网络技术有限公司 Information processing method and device based on natural language reasoning and electronic equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120077180A1 (en) * 2010-09-26 2012-03-29 Ajay Sohmshetty Method and system for knowledge representation and processing using a structured visual idea map
CN107632968B (en) * 2017-05-22 2021-01-05 南京大学 Method for constructing evidence chain relation model for referee document
CN108509411B (en) * 2017-10-10 2021-05-11 腾讯科技(深圳)有限公司 Semantic analysis method and device
CN109344240B (en) * 2018-09-21 2022-11-22 联想(北京)有限公司 Data processing method, server and electronic equipment
CN110309283B (en) * 2019-06-28 2023-03-21 创新先进技术有限公司 Answer determination method and device for intelligent question answering
US11461613B2 (en) * 2019-12-05 2022-10-04 Naver Corporation Method and apparatus for multi-document question answering
CN112132143B (en) * 2020-11-23 2021-02-23 北京易真学思教育科技有限公司 Data processing method, electronic device and computer readable medium
CN112860865A (en) * 2021-02-10 2021-05-28 达而观信息科技(上海)有限公司 Method, device, equipment and storage medium for realizing intelligent question answering


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272937A (en) * 2023-11-03 2023-12-22 腾讯科技(深圳)有限公司 Text coding model training method, device, equipment and storage medium
CN117272937B (en) * 2023-11-03 2024-02-23 腾讯科技(深圳)有限公司 Text coding model training method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113505206A (en) 2021-10-15
CN113505206B (en) 2023-04-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22832001

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE