CN113505206B - Information processing method and device based on natural language reasoning and electronic equipment

Information processing method and device based on natural language reasoning and electronic equipment

Info

Publication number
CN113505206B
CN113505506B... CN113505206B (application CN202110744658.3A)
Authority
CN
China
Prior art keywords
sample
question
graph
nodes
context
Prior art date
Legal status
Active
Application number
CN202110744658.3A
Other languages
Chinese (zh)
Other versions
CN113505206A (en)
Inventor
孙长志
张欣勃
周浩
李磊
Current Assignee
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110744658.3A
Publication of CN113505206A
Priority to PCT/CN2022/101739 (WO2023274187A1)
Application granted
Publication of CN113505206B
Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the disclosure disclose an information processing method and apparatus based on natural language reasoning, and an electronic device. One embodiment of the method comprises: receiving a question statement and an associated context, wherein the question statement represents a question to be answered; and determining an answer to the question and a demonstration graph from which the answer is obtained, based on question feature information of the question statement and context feature information of the associated context. The demonstration graph represents the process of inferring the answer from the associated context; it is a directed acyclic graph comprising nodes and directed edges between the nodes, where the nodes are statements in the associated context and a directed edge between two nodes represents an inference relation between the two associated nodes. Because the demonstration graph proving the answer is determined at the same time as the answer, the user can conveniently follow the process by which the answer is obtained, and the reliability of the answer is improved.

Description

Information processing method and device based on natural language reasoning and electronic equipment
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to an information processing method and apparatus based on natural language reasoning, and an electronic device.
Background
With the development of artificial intelligence, there is a desire to understand natural language by means of artificial intelligence and, on that basis, to realize human-machine conversation and similar applications.
In the related art, a knowledge base can be used for automatic reasoning. Early work on automatic reasoning over knowledge bases, however, focused on reasoning over formal representations, i.e., each statement (sentence) in the knowledge base is represented as a logical rule, such as first-order logic. Building such knowledge bases and reasoning over them still presents significant challenges; for example, converting a sentence into a logical rule requires semantic parsing. The overall process is complicated and does not achieve the desired effect.
Disclosure of Invention
This disclosure is provided to introduce concepts in a simplified form that are further described below in the detailed description. This disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The embodiment of the disclosure provides a natural language reasoning-based information processing method and device and electronic equipment.
In a first aspect, an embodiment of the present disclosure provides an information processing method based on natural language reasoning, including: receiving a question statement and an associated context, wherein the question statement represents a question to be answered; and determining an answer to the question and a demonstration graph from which the answer is obtained, based on question feature information of the question statement and context feature information of the associated context; wherein the demonstration graph represents the process of inferring the answer from the associated context, the demonstration graph is a directed acyclic graph comprising nodes and directed edges between the nodes, the nodes are statements in the associated context, and a directed edge between nodes represents an inference relation between the two associated nodes.
In a second aspect, an embodiment of the present disclosure provides an information processing apparatus based on natural language reasoning, including: a receiving unit, configured to receive a question statement and an associated context, where the question statement represents a question to be answered; and a determining unit, configured to determine an answer to the question and a demonstration graph from which the answer is obtained, based on question feature information of the question statement and context feature information of the associated context; wherein the demonstration graph represents the process of inferring the answer from the associated context, the demonstration graph comprises associated nodes associated with the answer and directed edges between the associated nodes, and the associated nodes are at least one statement in the associated context.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information processing method based on natural language inference as described in the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer readable medium, on which a computer program is stored, where the program, when executed by a processor, implements the natural language inference based information processing method according to the first aspect.
According to the information processing method and apparatus based on natural language reasoning and the electronic device provided by the embodiments of the disclosure, a question statement and an associated context are received, wherein the question statement represents a question to be answered; and an answer to the question and a demonstration graph from which the answer is obtained are determined based on question feature information of the question statement and context feature information of the associated context. In this way, when the answer to the question is determined from the associated context, the demonstration graph proving the answer is determined at the same time, allowing the user to follow the process by which the answer is obtained. Compared with schemes that only give an answer, or that generate the answer and the demonstration graph separately, the demonstration graph of this scheme can assist in predicting the answer and improves the ability to answer questions.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIG. 1 is a flow diagram of some embodiments of a natural language reasoning based information processing method according to the present disclosure;
FIG. 2 is a flow diagram of additional embodiments of a natural language reasoning based information processing method according to the present disclosure;
FIG. 3 is a schematic flow diagram of probabilistic graphical neural network modeling in the embodiment shown in FIG. 2;
FIG. 4a is a schematic diagram of the relationship among the answer variable A, the node variables V_i, and the edge variables E_ij in the joint distribution in the embodiment shown in FIG. 2;
FIG. 4b shows a factor graph of the joint distribution for the example of FIG. 4a;
FIG. 5 is a schematic diagram of an application scenario of the information processing method based on natural language reasoning provided by the present disclosure;
FIG. 6 is a block diagram of some embodiments of a natural language inference based information processing apparatus provided by the present disclosure;
FIG. 7 is an exemplary system architecture to which the natural language inference based information processing method of one embodiment of the present disclosure may be applied;
fig. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more complete and thorough understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein is intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Referring to fig. 1, a flow diagram of some embodiments of a natural language reasoning based information processing method according to the present disclosure is shown. The information processing method based on natural language reasoning, as shown in fig. 1, includes the following steps:
step 101, receiving a question statement and an associated context, wherein the question statement is used for characterizing a question to be given an answer.
The question statement and the associated context are expressed in natural language, for example in Chinese, or in other languages.
The associated context may include facts and rules expressed in natural language.
As an illustrative illustration, the above facts may be, for example:
F1: the circuit comprises a battery. F2: the connecting wire is a metal connecting wire.
The rules may include, for example:
r: if the circuit includes a switch that is open, the circuit is complete.
In some application scenarios, the question statement may be a declarative statement. The answer to be given may be an answer expressing a judgment result, and may include, for example, "true" or "false", "yes" or "no", and the like.
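As an illustrative sketch only, the inputs received in step 101 could be organized as simple data structures like the following; the identifiers (F1, F2, R1), the example question, and the field names are hypothetical and not part of the claimed method:

```python
# Illustrative inputs for step 101; identifiers and values are hypothetical.
associated_context = {
    "facts": {
        "F1": "The circuit includes a battery.",
        "F2": "The connecting wire is a metal connecting wire.",
    },
    "rules": {
        "R1": "If the circuit includes a switch that is open, the circuit is complete.",
    },
}

# A declarative question statement whose answer is a judgment such as "true" or "false".
question_statement = "The circuit is complete."

print(associated_context["facts"]["F2"], question_statement)
```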
Step 102, based on the question feature information of the question statement and the context feature information of the associated context, determining an answer to the question and obtaining a demonstration graph of the answer.
The demonstration graph represents the process of inferring the answer from the associated context. The demonstration graph is a directed acyclic graph comprising nodes and directed edges between the nodes, where the nodes are statements in the associated context and a directed edge between two nodes represents an inference relation between the two associated nodes.
Question feature information may be extracted from the question statement, and context feature information may be extracted from the associated context, in various ways.
In some optional implementations, the feature information includes a feature vector. The step 102 may include the following steps:
firstly, inputting the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector.
Secondly, according to the question feature vector and the associated context feature vector, the answer and the demonstration graph are determined.
The language model may be any of various existing models for determining feature vectors of natural language. The model may be various types of machine learning models.
Various analyses may be performed on the question feature vectors and associated context feature vectors to determine answers to the question statements and the demonstration graph.
Optionally, before inputting the question statement and the associated context into the pre-trained language model to obtain the question feature vector and the associated context feature vector, step 102 further includes: retrieving at least one statement from the associated context using a preset retrieval method according to the question statement; and encoding the retrieved at least one statement using a preset encoding method. In this case, inputting the question statement and the associated context into the pre-trained language model to obtain the question feature vector and the associated context feature vector includes: inputting the encoded question statement and the encoded at least one statement into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
In these alternative implementations, the associated context may be a long article or a paragraph of sentences. At least one statement may be retrieved from the associated context using a preset retrieval method based on the question statement; the retrieved statement may be a statement highly associated with the question statement.
For example, at least one statement may be retrieved from the associated context using keywords of the question statement.
The words and phrases included in the at least one statement may be encoded using a single-byte character set (SBCS), a multi-byte character set (MBCS), Unicode, or the like, so that the at least one statement is encoded into a form a computer can process. The words and phrases included in the question statement may likewise be encoded, for example using a word-vector encoding method, to obtain the encoded question statement.
A feature-vector analysis model, such as a word-vector model, can then be used to determine the feature vector of the encoded associated context and the feature vectors of the statements in the associated context, and to determine the feature vector of the encoded question statement.
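The following is a minimal sketch of the optional keyword-based retrieval described above, assuming a simple keyword-overlap score; the function name and the scoring rule are illustrative choices, not the patent's prescribed retrieval method:

```python
import re

def retrieve_statements(question: str, context_statements: list[str], top_k: int = 5) -> list[str]:
    """Return up to top_k context statements sharing the most keywords with the question."""
    def keywords(text: str) -> set[str]:
        return set(re.findall(r"[a-zA-Z]+", text.lower()))

    q_words = keywords(question)
    scored = [(len(q_words & keywords(s)), s) for s in context_statements]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]

statements = [
    "The circuit includes a battery.",
    "The connecting wire is a metal connecting wire.",
    "The sky is blue.",
]
print(retrieve_statements("Is the circuit complete?", statements))
```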
In the information processing method based on natural language reasoning provided by this embodiment, a question statement and an associated context are received, where the question statement represents a question to be answered; and an answer to the question and a demonstration graph of the answer are determined based on the question feature information of the question statement and the context feature information of the associated context. Thus, when the answer to the question is determined from the associated context, the demonstration graph proving the answer is determined at the same time, allowing the user to follow the process by which the answer is obtained. Compared with schemes that only give an answer, or that generate the answer and the demonstration graph separately, the demonstration graph of this embodiment can assist in predicting the answer and improves the ability to answer questions.
Continuing to refer to FIG. 2, a flow diagram of additional embodiments of a natural language reasoning based information processing method provided by the present disclosure is shown.
As shown in fig. 2, the information processing method based on natural language reasoning provided by the embodiments includes the following steps:
step 201, receiving a question statement and an associated context, wherein the question statement is used for characterizing a question to be given an answer.
Natural language reasoning mainly uses machine learning models to judge semantic relationships between sentences. For example, a set of sentences describing facts and judgment rules is input together with a question, and the answer to the question is determined from those sentences and judgment rules. In this embodiment, the machine learning models may include a language model and a probabilistic graph neural network model.
The question statement and the associated context are expressed in natural language, for example in Chinese, or in other languages.
The associated context may include facts and rules expressed in natural language.
In some application scenarios, the question statement may be a declarative statement. The answer to be given may be an answer expressing a judgment result, and may include, for example, "true" or "false", "yes" or "no", and the like.
Step 202, inputting the question statement and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector.
Prior to step 202, the method may further comprise: retrieving at least one statement from the associated context using a preset retrieval method according to the question statement; and encoding the retrieved at least one statement using a preset encoding method. Step 202 may then comprise: inputting the encoded question statement and the encoded at least one statement into a pre-trained language model to obtain the question feature vector and the associated context feature vector.
In this embodiment, a concatenation of the associated context C (facts and rules) and the question Q may be input into the language model, separated by [SEP] markers, i.e., expressed as: [CLS] C [SEP] [SEP] Q [SEP], where the [CLS] position represents the associated context globally.
the following three feature vectors are determined through the language model:
h A =h CLS (1);
Figure RE-GDA0003257674720000071
Figure RE-GDA0003257674720000072
wherein
h CLS A global feature vector for the context of association C;
Figure RE-GDA0003257674720000073
a feature vector of a node Si in the associated context;
Figure RE-GDA0003257674720000074
is an oriented edge Si->Sj, the feature vector of the vector. Wherein +>
Figure RE-GDA0003257674720000075
Representing a concatenation operation of the vectors.
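A hedged sketch of how the feature vectors of equations (1) to (3) could be obtained with an off-the-shelf pre-trained language model is given below. The model choice (bert-base-uncased via the Hugging Face transformers library) is an assumption, and for simplicity each sentence is encoded separately to obtain h_{s_i}, whereas the method above derives all vectors from one concatenated input:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The model is an assumption; the patent only requires "a pre-trained language model".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "The circuit includes a battery.",
    "The connecting wire is a metal connecting wire.",
    "If the circuit includes a switch that is open, the circuit is complete.",
]
question = "The circuit is complete."

with torch.no_grad():
    # Global vector h_A = h_CLS: encode the context and the question together,
    # approximating the "[CLS] C [SEP] [SEP] Q [SEP]" input described above
    # (BERT's tokenizer inserts a single [SEP] between the two segments).
    joint = tokenizer(" ".join(sentences), question, return_tensors="pt")
    h_cls = encoder(**joint).last_hidden_state[:, 0, :]          # (1, hidden)

    # Node vectors h_{V_i}: here each sentence is encoded separately for simplicity.
    per_sentence = tokenizer(sentences, padding=True, return_tensors="pt")
    h_s = encoder(**per_sentence).last_hidden_state[:, 0, :]     # (n, hidden)

    # Edge vectors h_{E_ij} = h_{s_i} (+) h_{s_j} (concatenation), per equation (3).
    n, d = h_s.shape
    h_e = torch.cat(
        [h_s.unsqueeze(1).expand(n, n, d), h_s.unsqueeze(0).expand(n, n, d)], dim=-1
    )                                                            # (n, n, 2*hidden)

print(h_cls.shape, h_s.shape, h_e.shape)
```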
Step 203, inputting the question feature vector and the associated context feature vector into a designed probabilistic graph neural network model, and obtaining the answer and a demonstration graph of the answer through inference by the probabilistic graph neural network model.
The above answers may be used to characterize answers in the sense of true or false, yes or no.
The demonstration graph may include a plurality of nodes. A node may be a fact, a rule (both expressed in natural language), or an NAF node. An NAF (Negation as Failure) node indicates that, under the Closed World Assumption (CWA), for a statement S, if it cannot be inferred from the existing facts and rules that S holds, i.e., S is regarded as wrong, then it can be concluded that not-S is correct. It should be noted that under the Closed World Assumption there are no negative-form facts and no rules for drawing negative conclusions, because negative facts and rules are redundant under the CWA.
Please refer to FIG. 3, which illustrates the steps of establishing the probabilistic graph neural network model in the embodiment shown in FIG. 2. As shown in FIG. 3, establishing the probabilistic graph neural network model includes the following steps:
step 301, defining a joint distribution of the argument graph and the answer of all possible answers of the question sentence through a probability graph model to explicitly establish the dependency between the argument graph and the answer.
The demonstration graph comprises nodes representing the sentences of the associated context and directed edges among the nodes, and the joint distribution comprises answer variables, node variables and directed edge variables.
In the demonstration graph, each node may correspond to a statement in the associated context. The possible answers, the nodes of the possible demonstration graphs, and the directed edges between the nodes of the possible demonstration graphs may be used as variables in the joint distribution.
And step 302, explicitly establishing a dependency relationship among different variables in the joint distribution by using the designed answer potential function, node potential function and edge potential function.
The node potential function is related to the node and the answer.
The edge potential function is related to the directed edge, the answer, and the two nodes connected by the directed edge.
A probabilistic graphical model is a modeling method that uses graph theory to express the relationships among a number of random variables. The graph can comprise a plurality of nodes, each node being a random variable; if two nodes are not connected by an edge, the two variables are independent of each other. Two common kinds of probabilistic graphical models are graphs with directed edges and graphs with undirected edges; according to the directionality of the graph, probabilistic graphical models can be divided into two broad categories, Bayesian networks and Markov networks. The present disclosure may employ an undirected probabilistic graphical model.
Specifically, given an associated context C = {s_1, s_2, …, s_n} and a question statement Q, true/false values are assigned to all variables. The variables include the answer variable A, the node variables V_i, and the edge variables E_ij.
All output variables are expressed using the following expression (4):

Y = {A} ∪ {V_i | 1 ≤ i ≤ n} ∪ {E_ij | 1 ≤ i, j ≤ n, i ≠ j} (4);

A joint distribution can be defined over all possible Y, denoted p(Y), where:

p(Y) = (1/Z) Φ_A(a) ∏_i Φ_{V_i}(v_i, a) ∏_{i≠j} Φ_{E_ij}(e_ij, v_i, v_j, a) (5);

where Φ_A(a), Φ_{V_i}(v_i, a), and Φ_{E_ij}(e_ij, v_i, v_j, a) are the answer potential function corresponding to the answer variable A, the node potential function corresponding to the node variable V_i, and the edge potential function corresponding to the edge variable E_ij, respectively, and Z is a normalization constant.
A Markov network evaluates how closely variables influence one another by defining a series of functions, which are called potential functions or factors.
The factorization in the above formula characterizes the correlations among the answer variable A, the node variables V_i, and the edge variables E_ij.
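To make the factorization of equation (5) concrete, the toy sketch below sums the log-potentials of one assignment Y = (A, V, E) for a two-sentence context; all potential values are made up for illustration, and the normalizer Z is omitted because it does not affect the comparison of assignments:

```python
import math

# Toy potentials for a two-sentence context; all values are illustrative.
phi_a = {0: 1.0, 1: 2.0}  # Phi_A(a)
phi_v = {i: {(v, a): 1.0 + v + a for v in (0, 1) for a in (0, 1)} for i in range(2)}
phi_e = {
    (i, j): {(e, vi, vj, a): 1.0 + e
             for e in (0, 1) for vi in (0, 1) for vj in (0, 1) for a in (0, 1)}
    for i in range(2) for j in range(2) if i != j
}

def unnormalized_log_score(a, v, e):
    """Sum of log-potentials of one assignment Y = (A, V, E), per equation (5);
    the constant 1/Z is dropped since it does not change the ranking of assignments."""
    score = math.log(phi_a[a])
    for i in range(2):
        score += math.log(phi_v[i][(v[i], a)])
    for (i, j), table in phi_e.items():
        score += math.log(table[(e[(i, j)], v[i], v[j], a)])
    return score

# One example assignment: answer A=1, nodes V=(1, 0), edge E_01=1, edge E_10=0.
print(unnormalized_log_score(1, (1, 0), {(0, 1): 1, (1, 0): 0}))
```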
FIG. 4a is a schematic diagram of the relationship among the answer variable A, the node variables V_i, and the edge variables E_ij. The associated context in FIG. 4a comprises the statements S1, S2 and S3, and may thus include three nodes: S1, S2 and S3. The answer corresponding to the question statement is "True". The demonstration graph (proof) includes node S1, node S3, and the directed edge from node S1 to node S3.
The solid node circles in the right-hand graph indicate that, when the answer variable A takes the value 1, node V1 takes the value 1, node V2 takes the value 0, and node V3 takes the value 1. The directed edge E13 (from node V1 to node V3) takes the value 1, and the other directed edges E12, E23, E32 and E21 take the value 0.
The potential functions corresponding to FIG. 4a include the answer potential function Φ_A(a), the node potential functions Φ_{V_i}(v_i, a), and the edge potential functions Φ_{E_ij}(e_ij, v_i, v_j, a).
FIG. 4b shows the factor graph of the joint distribution p(Y) for the example of FIG. 4a. As shown in FIG. 4b, the factors include the nodes V1, V2, V3, the edges E12, E13, E21, E23, E31, E32, the answer variable A, the node potential functions Φ_{V_i}(v_i, a) between each node and the answer, the answer potential function Φ_A(a), the edge potential functions Φ_{E_ij}(e_ij, v_i, v_j, a) corresponding to each edge, and the association relationships among these factors.
Theoretically, for the gold-standard answer y*, the following objective is minimized:

L_joint = -log p(Y = y*) (6);
and 303, parameterizing each potential function by using a neural network to obtain the parameterized combined distribution.
Potential function phi A (a) In order to score possible values of the answer variable a (0 or 1), the global feature vector of the associated context C is nonlinearly converted using a Multilayer Perceptron (MLP) as a nonlinear conversion function to obtain an answer potential function of the answer variable a:
Figure RE-GDA0003257674720000097
potential function type
Figure RE-GDA0003257674720000101
For each sentence S i (a fact or a rule), the sentence S can be calculated by step 302 i Characteristic vector->
Figure RE-GDA0003257674720000102
To score the possible values of the variable (V, A), another multi-level perceptron MLP2 can be used as a non-linear transfer function to pick up the feature vector->
Figure RE-GDA0003257674720000103
And (3) carrying out nonlinear transformation to obtain a node potential function of the node variable: />
Figure RE-GDA0003257674720000104
Where dimension 4 represents the number of possible values for the node variable V, and answer variable a combination. In addition, MLP can be shared among all sentences 2 The parameter (c) of (c).
Potential function
Figure RE-GDA0003257674720000105
For each sentence pair(s) i ,s j ) Get the sentence pair representation h si,sj ,. To cope with four variables (V) i ,V j ,E ij And, A). Scoring. Using one MLP 3 As a non-linear function, there will be a directed edge E ij Carrying out nonlinear transformation to obtain an edge potential function of the directed edge variable:
Figure RE-GDA0003257674720000106
wherein
Figure RE-GDA0003257674720000107
Figure RE-GDA0003257674720000108
Representing vector stitching.
Where dimension 16 represents four variables (V) i ,V j ,E ij And A) the number of possible values of the combination. MLP can be shared among all sentence pairs 3 The parameter (c) of (c).
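A minimal sketch of the parameterization in equations (7) to (9) is shown below. The hidden size, the two-layer MLP structure, and the ReLU nonlinearity are assumptions; the patent only requires that each potential function be parameterized by a multilayer perceptron:

```python
import torch
import torch.nn as nn

hidden = 768  # hidden size of the language model; value assumed for illustration

# Equation (7): scores the 2 possible values of the answer variable A.
mlp1 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2))
# Equation (8): scores the 4 value combinations of (V_i, A).
mlp2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 4))
# Equation (9): scores the 16 value combinations of (V_i, V_j, E_ij, A);
# the input is the concatenated pair representation h_{s_i,s_j}.
mlp3 = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 16))

h_a = torch.randn(1, hidden)        # h_CLS
h_s = torch.randn(3, hidden)        # one feature vector per sentence
h_pair = torch.cat([h_s[0], h_s[2]]).unsqueeze(0)   # h_{s_1,s_3} = h_{s_1} (+) h_{s_3}

phi_a = mlp1(h_a)          # shape (1, 2)
phi_v = mlp2(h_s)          # shape (3, 4), one row of node-potential scores per sentence
phi_e = mlp3(h_pair)       # shape (1, 16)
print(phi_a.shape, phi_v.shape, phi_e.shape)
```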
Step 304, determining a pseudo-likelihood function of the parameterized joint distribution; and determining a variational approximation for approximating the representation of the pseudo-likelihood function to obtain a computer-solvable probabilistic graphical neural network model.
To simplify the computation, the parameterized joint distribution (also called the joint probability distribution) can be approximately characterized using a pseudo-likelihood function:

p(Y) ≈ ∏_{y ∈ Y} p(y | Y \ {y}) (10);
To reduce the difficulty of determining the optimal assignment using the pseudo-likelihood function and to make the joint distribution convenient for a computer to solve, a variational approximation that approximately represents the pseudo-likelihood function can be determined, thereby obtaining a probabilistic graph neural network model that a computer can solve.
Variational approximation: based on the mean-field assumption, the pseudo-likelihood of Y is approximated using a variational distribution q(Y), in which the variables y ∈ Y are independent of each other. Likewise, a neural network may be used to parameterize each individual distribution. The variational approximation is expressed by equations (11) to (12):

q(Y) = ∏_{y ∈ Y} q(y) = q(A) ∏_i q(V_i) ∏_{i≠j} q(E_ij) (11);

q(y) = softmax(MLP_q(h_y)), y ∈ Y (12);

where h_y is the feature vector corresponding to the variable y (h_A for the answer variable, h_{s_i} for a node variable, and h_{s_i,s_j} for an edge variable), and each individual distribution is parameterized by its own multilayer perceptron.
Once the variational distribution q(Y) is obtained, it can provide the conditioning values Y \ {y} for each conditional term p(y | Y \ {y}) of the pseudo-likelihood, thereby avoiding sampling to determine the optimal assignment of the pseudo-likelihood function.
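The mean-field factors of equations (11) and (12) can be sketched as independent softmax heads, one per variable, whose hard predictions then provide the conditioning values for the pseudo-likelihood terms; the layer shapes and head names below are assumptions:

```python
import torch
import torch.nn as nn

hidden, n = 768, 3   # hidden size and number of sentences; values assumed

# Mean-field factors of equation (11): every variable gets its own independent
# categorical distribution, each parameterized by its own small network (eq. (12)).
q_a_head = nn.Linear(hidden, 2)
q_v_head = nn.Linear(hidden, 2)
q_e_head = nn.Linear(2 * hidden, 2)

h_a = torch.randn(1, hidden)
h_s = torch.randn(n, hidden)
h_e = torch.randn(n, n, 2 * hidden)

q_a = q_a_head(h_a).softmax(dim=-1)          # q(A), shape (1, 2)
q_v = q_v_head(h_s).softmax(dim=-1)          # q(V_i), shape (n, 2)
q_e = q_e_head(h_e).softmax(dim=-1)          # q(E_ij), shape (n, n, 2)

# Hard assignments used to condition the pseudo-likelihood terms p(y | Y \ {y})
# without sampling, as described above.
v_hat = q_v.argmax(dim=-1)
e_hat = q_e.argmax(dim=-1)
print(q_a, v_hat, e_hat, sep="\n")
```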
The probabilistic graph neural network model is established through the above steps. The probabilistic graph neural network model can then be trained to obtain a trained probabilistic graph neural network model. The trained probabilistic graph neural network model may be used to determine, from the input associated context and question statement, the answer and the demonstration graph corresponding to the question statement.
In some optional implementations, the probabilistic neural network model is trained by the following steps:
firstly, obtaining a training sample set, wherein the training sample set comprises a plurality of groups of training samples, and each group of training samples comprises a sample association context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample demonstration graph of the sample answer obtained by the sample association context; the sample demonstration graph is a directed acyclic graph, the directed acyclic graph comprises nodes and directed edges between the nodes, the nodes are statements in the context, the directed edges between the nodes represent an inferred relationship between two associated nodes, and the sample associated context is associated context comprising answers corresponding to sample question statements;
secondly, taking the sample associated context and the sample question sentences as input, taking the sample answers and the sample demonstration graph as output, and training the probability graph neural network model to obtain the trained probability graph neural network model.
In some application scenarios, the training samples may be used to train the probabilistic graph neural network model for a preset number of times, so as to obtain the trained probabilistic graph neural network model. The preset number of times may be, for example, 1000 or 5000, and is not limited here.
In some other optional implementation manners, the training an initial training model by using the sample association context and the sample question statement as inputs and using the sample answer and the sample demonstration graph as outputs to obtain a trained neural network model includes:
first, a loss function is established based on the following steps: establishing joint distribution and approximate variation distribution among sample answers, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges; determining the loss function according to the joint distribution and the approximate variation distribution.
Secondly, taking sample associated context and sample question sentences as input, taking sample answers and sample demonstration graphs as output, and training the probability graph neural network model by utilizing a back propagation algorithm based on the preset loss function until a preset condition is met.
The probability map neural network model is trained by using the back propagation algorithm based on the preset loss function, and reference may be made to the existing method for training the neural network model by using the back propagation algorithm, which is not described herein again.
In these alternative implementations, the loss function may be established first, and the probabilistic graph neural network model may then be trained using the loss function. The established loss function is related to the node feature vectors of the sample nodes, the edge feature vectors corresponding to the directed edges, and the joint distribution and approximate variational distribution among the sample answers, the node feature vectors and the edge feature vectors. Such a loss function matches the probabilistic graph neural network model closely and helps to optimize the model quickly during training.
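A minimal training-loop sketch for the back-propagation training described in these implementations might look as follows; `model` and `train_loader` are hypothetical placeholders for the probabilistic graph neural network model and the training sample set, and the AdamW optimizer and learning rate are arbitrary choices:

```python
import torch

# `model` is assumed to return the three loss terms described below for one batch,
# and `train_loader` to yield (sample_context, sample_question, sample_answer,
# sample_proof_graph) tuples. Both names are hypothetical.
def train(model, train_loader, epochs: int = 10, lr: float = 1e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in train_loader:
            loss_node, loss_edge, loss_answer = model(*batch)
            loss = loss_node + loss_edge + loss_answer   # preset loss function
            optimizer.zero_grad()
            loss.backward()                              # back-propagation
            optimizer.step()
```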
Further optionally, the establishing of joint distribution among the sample answers, the node feature vectors, and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges includes: determining a first potential function for the sample answer based on the sample question and the global feature representation of the sample associated context; for each sample node of the sample demonstration graph, establishing a second potential function related to the sample node and the sample answer according to the feature vector of the sample node; and for each directed edge of the sample demonstration graph, establishing a third potential function related to the directed edge, the sample answer and two associated sample nodes according to the feature vector of the directed edge, wherein the feature vector of the directed edge is related to the feature vector corresponding to each of the two associated nodes.
For the training samples, the same methods as the formulas (7), (8), and (9) can be referred to respectively establish a first potential function related to the sample answer, a second potential function related to the sample node and the sample answer, and a third potential function related to the sample node and the sample answer and the two associated sample nodes, which are not described herein again.
Establishing the approximate variational distribution among the sample answers, the node feature vectors and the edge feature vectors, based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, includes: determining a pseudo-likelihood function corresponding to the joint distribution; and approximating the pseudo-likelihood function using a variational distribution based on a mean-field assumption, wherein the variables in the variational distribution are independent of each other, the variables comprising: the sample answer, the nodes in the sample demonstration graph, and the directed edges in the sample demonstration graph.
In order to facilitate calculation of the value of the loss function, the joint distribution among the sample answers, the sample demonstration graph nodes, and the directed edges of the sample demonstration graph may be approximated as a pseudo-likelihood function (refer to equation (10), not repeated here). Specifically, determining the loss function according to the joint distribution and the approximate variational distribution includes: determining a first loss function and a second loss function according to the variational distribution, and determining a third loss function according to the pseudo-likelihood function. The first loss function characterizes the deviation between the predicted nodes included in a demonstration graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample demonstration graph; the second loss function characterizes the deviation between the predicted directed edges included in the demonstration graph predicted by the model and the directed edges included in the sample demonstration graph; and the third loss function characterizes the deviation between the predicted answer given by the model and the sample answer; wherein the nodes and directed edges used in the third loss function, which is determined by the pseudo-likelihood function, are the prediction results of the variational distribution.
The approximate variation distribution among the sample answers, the node feature vectors, and the edge feature vectors can refer to equations (11) and (12), which are not described herein again.
Specifically, the first loss function described above may be characterized by the following formula:

L_node = -Σ_i log q(V_i = v_i*);

the second loss function may be characterized by the following formula:

L_edge = -Σ_{i≠j} log q(E_ij = e_ij*);

and the third loss function may be characterized by the following formula:

L_answer = -log p(A = a* | V = v', E = e');

where v' and e' are the prediction results of the variational approximation, p(·) here is the pseudo-likelihood function of the joint distribution, v_i* are the nodes labeled in the training sample, e_ij* are the directed edges labeled in the training sample, and a* is the answer labeled in the training sample.
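The three loss terms above can be sketched as follows; the tensor shapes and variable names are assumptions, with q_v and q_e standing for the variational node and edge distributions and p_answer_given_proof for the pseudo-likelihood of the answer conditioned on the variational predictions:

```python
import torch
import torch.nn.functional as F

def proof_losses(q_v, q_e, p_answer_given_proof, v_star, e_star, a_star):
    """Sketch of the three loss terms (shapes and names are assumptions).

    q_v:  (n, 2)     variational distributions over node variables V_i
    q_e:  (n, n, 2)  variational distributions over edge variables E_ij
    p_answer_given_proof: (2,) pseudo-likelihood of A conditioned on v', e'
    v_star, e_star, a_star: labels from the training sample
    """
    # First loss: deviation of predicted nodes from labeled nodes.
    loss_node = F.nll_loss(q_v.log(), v_star)
    # Second loss: deviation of predicted directed edges from labeled edges.
    loss_edge = F.nll_loss(q_e.log().view(-1, 2), e_star.view(-1))
    # Third loss: deviation of the predicted answer from the labeled answer,
    # with nodes and edges fixed to the variational predictions.
    loss_answer = -p_answer_given_proof.log()[a_star]
    return loss_node, loss_edge, loss_answer

# Toy shapes for a three-sentence context.
q_v = torch.full((3, 2), 0.5)
q_e = torch.full((3, 3, 2), 0.5)
p_a = torch.tensor([0.3, 0.7])
print(proof_losses(q_v, q_e, p_a, torch.tensor([1, 0, 1]),
                   torch.zeros(3, 3, dtype=torch.long), torch.tensor(1)))
```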
In some application scenarios, the sum of the first loss function, the second loss function, and the third loss function may be determined as the optimization objective.
Optionally, the preset condition includes:
the sum of the first loss function, the second loss function and the third loss function satisfying a convergence condition.
The convergence condition may be that, over multiple consecutive training iterations, the change in the sum of the first, second and third loss functions between every two adjacent iterations is smaller than a preset change threshold; or
that the sum of the first loss function, the second loss function and the third loss function is minimal.
Through the above process, the calculation of the loss value can be simplified, and the training efficiency of the probabilistic graph neural network model can be improved.
Optionally, the preset condition includes that the number of training times reaches a preset number threshold.
The probabilistic graph neural network model can achieve high prediction accuracy.
After the probabilistic graph neural network model is trained on a large-sample training data set (for example, 70,000 training examples), the test results of the model are as follows: the answer accuracy is 99.99%, and the demonstration graph accuracy is 88.8%.
The probabilistic graph neural network model was also trained on small-sample training data sets (for example, 30,000, 10,000, and 1,000 groups of training data randomly drawn from the large-sample training set), and the trained model was tested. The test results are as follows: with 30,000 groups of training samples, the answer accuracy is 99.9% and the demonstration graph accuracy is 86.8%; with 10,000 groups, the answer accuracy is 99.9% and the demonstration graph accuracy is 72.4%; with 1,000 groups, the answer accuracy is 82.1% and the demonstration graph accuracy is 21.1%. That is, a probabilistic graph neural network model with relatively high prediction accuracy can still be obtained by training with a small amount of training data.
After the probabilistic graph neural network model is trained using training samples that do not contain the target class, using the trained model to reason about questions of the target class according to the associated contexts of that class is called zero-shot evaluation.
After the probabilistic graph neural network model is trained on a large-sample data set that excludes the target class, inference on test samples of the target class yields the following results: the answer accuracy is 96.3%, and the demonstration graph accuracy is 79.3%.
These data show that the trained probabilistic graph neural network model has high transferability: when applied to natural language reasoning tasks of other categories, it can still achieve good inference results.
Please refer to fig. 5, which illustrates an application scenario diagram of the information processing method based on natural language reasoning provided by the present disclosure.
As shown in FIG. 5, the input information in FIG. 5 includes an associated context and question statements. The associated context may include the fact statements F1, F2 and the judgment rule statements R1, R2, R3, R4, R5, R6. The question statements Q1 and Q2 may be input, and may be input separately. Through the above steps, the answer A1 and demonstration graph 1 corresponding to question statement Q1 can be obtained, and the answer A2 and demonstration graph 2 corresponding to question statement Q2 can be obtained. Each statement can be regarded as a node. The answer A1 corresponding to question statement Q1 is TRUE, and the demonstration graph corresponding to answer A1 is a directed edge from node F2 to node R2. This demonstration graph illustrates the process by which answer A1 can be obtained from node F2 and node R2: the fact statement at node F2 establishes that the wire is metal, and the judgment rule provided by rule statement R4 determines that the answer to the question is "true".
For question statement Q2, "no current flows through the circuit", the corresponding answer A2 is FALSE. Demonstration graph 2 of answer A2 shows the process of deriving answer A2. An NAF (Negation as Failure) node indicates that, under the Closed World Assumption, for a statement S, if S cannot be inferred to hold from the existing facts and rules, i.e., S is regarded as wrong, it can be concluded that not-S is correct. In FIG. 5, the NAF node means "the circuit does not have the switch". Answer A2 can be obtained from the nodes and directed edges in demonstration graph 2, namely: a directed edge from node NAF to node R1 and a directed edge from node F1 to node R1; a directed edge from node R1 to node R6; a directed edge from node NAF to node R3 and a directed edge from node R3 to node R6; a directed edge from node F2 to node R4 and a directed edge from node R4 to node R6. Demonstration graph 2 thus gives a demonstration of answer A2.
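As a sketch of the data structure only, the demonstration graph for answer A2 in FIG. 5 could be represented as a set of nodes and directed edges and checked for acyclicity as follows; this representation is illustrative, not the patent's prescribed format:

```python
# Nodes are statement identifiers (including the NAF node) and directed edges
# encode the inference relationships listed above for demonstration graph 2.
proof_graph_a2 = {
    "nodes": ["F1", "F2", "R1", "R3", "R4", "R6", "NAF"],
    "edges": [
        ("NAF", "R1"), ("F1", "R1"), ("R1", "R6"),
        ("NAF", "R3"), ("R3", "R6"),
        ("F2", "R4"), ("R4", "R6"),
    ],
}

def is_directed_acyclic(graph: dict) -> bool:
    """Check that the demonstration graph is a DAG by iteratively removing sinks."""
    edges = set(graph["edges"])
    nodes = set(graph["nodes"])
    while nodes:
        sinks = {n for n in nodes if not any(src == n for src, _ in edges)}
        if not sinks:
            return False  # every remaining node has an outgoing edge: a cycle exists
        nodes -= sinks
        edges = {(s, d) for s, d in edges if s not in sinks and d not in sinks}
    return True

print(is_directed_acyclic(proof_graph_a2))  # True
```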
The information processing method based on natural language reasoning provided by this embodiment highlights the steps of obtaining the answer matching the question, together with the demonstration graph, by using a language model and a probabilistic graph neural network model. Because the probabilistic graph neural network model is obtained from the joint distribution of the answer, the nodes and the directed edges, the obtained answer and demonstration graph are highly associated, and the demonstration graph provides a strong proof of the answer. The demonstration graph provided by the probabilistic graph neural network model can assist in predicting the answer and improves the ability to answer questions. In addition, because the model is built on the joint distribution of answers, nodes and directed edges, a probabilistic graph neural network model with high result accuracy can be obtained by training with relatively few samples.
With further reference to fig. 6, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of an information processing apparatus based on natural language reasoning, which correspond to the method embodiment shown in fig. 1, and which can be applied in various electronic devices.
As shown in FIG. 6, the information processing apparatus based on natural language inference of this embodiment includes: a receiving unit 601 and a determining unit 602. The receiving unit 601 is configured to receive a question statement and an associated context, where the question statement is used to characterize a question to be given an answer; the determining unit 602 is configured to determine an answer to the question and a demonstration graph of the obtained answer based on the question feature information of the question statement and the context feature information of the associated context; the demonstration graph represents a process of reasoning to obtain the answer through the associated context, the demonstration graph comprises associated nodes associated with the answer and directed edges between the associated nodes, and the associated nodes are at least one statement in the associated context.
In this embodiment, for the specific processing of the receiving unit 601 and the determining unit 602 of the information processing apparatus based on natural language inference and the technical effects they bring, reference can be made to the related descriptions of step 101 and step 102 in the embodiment corresponding to FIG. 1, which are not repeated here.
In some optional implementations, the determining unit 602 is further configured to: inputting the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector; and inputting the question feature vector and the associated context feature vector into a probability graph neural network model, and obtaining the answer and a demonstration graph of the answer through inference by the probability graph neural network model.
In some optional implementations, the probability map neural network model is obtained by: defining a combined distribution of a demonstration graph and answers of all possible answers of the question sentence through a probabilistic graph model to explicitly establish the dependence between the demonstration graph and the answers, wherein the demonstration graph comprises nodes of the sentence representing the associated context and directed edges among the nodes, and the combined distribution comprises answer variables, node variables and directed edge variables; explicitly establishing a dependency relationship among different variables in the joint distribution by using a designed answer potential function, a node potential function and an edge potential function; the node potential function is related to the nodes and the answers, and the edge potential function is related to the nodes, the answers and directed edges among the nodes; parameterizing each potential function by using a neural network to obtain parameterized combined distribution; determining a pseudo-likelihood function of the parameterized joint distribution; and determining a variational approximation for approximating and characterizing the pseudo-likelihood function to obtain a computer-solverable probabilistic graphical neural network model.
In some alternative implementations, the illustrated natural language inference based information processing apparatus further includes a training unit (not shown in the figures). The training unit is used for training the probability graph neural network model to obtain a trained probability graph neural network model based on the following steps: obtaining a training sample set, wherein the training sample set comprises a plurality of groups of training samples, and each group of training samples comprises a sample association context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample demonstration graph of the sample answer obtained by the sample association context; the sample demonstration graph is a directed acyclic graph, the directed acyclic graph comprises nodes and directed edges between the nodes, the nodes are statements in the context, the directed edges between the nodes represent an inferred relationship between two associated nodes, and the sample associated context is associated context comprising answers corresponding to sample question statements; and taking the sample associated context and the sample question sentence as input, taking the sample answer and the sample demonstration graph as output, and training the probability graph neural network model to obtain the trained probability graph neural network model.
In some optional implementations, the training unit is further configured to establish the loss function based on: establishing joint distribution and approximate variation distribution among sample answers, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges; determining the loss function according to the joint distribution and the approximate variation distribution; and taking the sample associated context and the sample question sentence as input, taking the sample answer and the sample demonstration graph as output, and training the probability graph neural network model by utilizing a back propagation algorithm based on the preset loss function until the preset condition is met.
In some optional implementations, the training unit is further to: determining a first potential function for the sample answer based on the sample question and the global feature representation of the sample associated context; for each sample node of the sample demonstration graph, establishing a second potential function related to the sample node and the sample answer according to the feature vector of the sample node; for each directed edge of the sample demonstration graph, establishing a third potential function related to the directed edge, the sample answer and two associated sample nodes according to the feature vector of the directed edge, wherein the feature vector of the directed edge is related to the feature vector corresponding to each of the two associated nodes; and parameterizing the joint distribution among the sample answers, the sample demo graph nodes and the sample demo graph directed edges on the basis of the first potential function, the second potential function corresponding to each sample node and the third potential function corresponding to each sample directed edge.
In some optional implementations, the training unit is further to: determining a pseudo-likelihood function corresponding to the joint distribution; approximating the pseudo-likelihood function using a variational distribution based on a mean field assumption, wherein each variable in the variational distribution is independent of each other, the variables comprising: sample answers, nodes in the sample demographics, and directed edges in the sample demographics.
In some optional implementations, the training unit is further to: determining a first loss function and a second loss function according to the approximate variation distribution; determining a third loss function according to the pseudo-likelihood function; the first loss function is used to characterize: the deviation between a prediction node included in a demonstration graph predicted by the probability graph neural network model and a sample node included in a sample demonstration graph; the second loss function is used to characterize: the deviation between a predicted directed edge included in a demo graph predicted by the probabilistic graph neural network model and a directed edge included in a sample demo graph; the third loss function is used to characterize: a deviation between a predicted answer included in a demo predicted by the probabilistic neural network model and a sample answer; wherein the nodes and directed edges in the third loss function determined by the pseudo-likelihood function are prediction results distributed by variation.
In some optional implementations, the preset condition includes: the sum of the first loss function, the second loss function and the third loss function meets a convergence condition; or the training times reach a preset time threshold.
In some optional implementations, the determining unit 602 is further configured to: retrieving at least one statement from the associated context using a preset retrieval method according to the question statement; using a preset coding method to code the retrieved at least one sentence; and inputting the encoded question sentence and the encoded at least one sentence into a pre-trained language model to obtain the question feature vector and the associated context feature vector.
Referring to FIG. 7, FIG. 7 illustrates an exemplary system architecture to which the information processing method based on natural language inference of an embodiment of the present disclosure may be applied.
As shown in fig. 7, the system architecture may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 701, 702, 703 may interact with a server 705 over a network 704 to receive or send messages or the like. The terminal devices 701, 702, 703 may have various client applications installed thereon, such as a web browser application, a search-type application, and a news-information-type application. The client applications in the terminal devices 701, 702, and 703 may receive the instruction of the user, and complete corresponding functions according to the instruction of the user, for example, add corresponding information to the information according to the instruction of the user.
The terminal devices 701, 702, and 703 may be hardware or software. When the terminal devices 701, 702, and 703 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like. When the terminal devices 701, 702, and 703 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module, which is not specifically limited herein.
The server 705 may provide various services, such as receiving question sentences and associated contexts sent by the terminal devices 701, 702, and 703, analyzing the question sentences and the associated contexts, and sending the analysis results to the terminal devices.
It should be noted that the information processing method based on natural language reasoning provided by the embodiments of the present disclosure may be executed by a terminal device, and accordingly, the information processing apparatus may be disposed in the terminal devices 701, 702, and 703. Alternatively, the information processing method provided by the embodiments of the present disclosure may also be executed by the server 705, and accordingly, an information processing apparatus may be provided in the server 705.
It should be understood that the numbers of terminal devices, networks, and servers in fig. 7 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Referring now to fig. 8, shown is a schematic diagram of an electronic device (e.g., a terminal device or a server of fig. 7) suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 8 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a question sentence and an associated context, wherein the question sentence is used for representing a question to be given an answer; and determine an answer to the question and a demonstration graph for obtaining the answer based on the question feature information of the question sentence and the context feature information of the associated context; the demonstration graph represents the process of inferring the answer from the associated context, and is a directed acyclic graph comprising nodes and directed edges between the nodes, wherein the nodes are statements in the associated context and a directed edge between two nodes represents the inference relation between the two associated nodes.
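Purely as an illustrative sketch of the data structure described above (all names and the example statements are assumptions), the demonstration graph can be held as a directed acyclic graph whose nodes are statements from the associated context and whose directed edges record the inference relation between two associated nodes:

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class DemonstrationGraph:
        nodes: List[str] = field(default_factory=list)              # statements from the associated context
        edges: List[Tuple[int, int]] = field(default_factory=list)  # (premise index, conclusion index)

        def add_edge(self, premise_idx: int, conclusion_idx: int):
            # A directed edge states that one statement supports inferring the other.
            self.edges.append((premise_idx, conclusion_idx))

    # Hypothetical usage: two context statements jointly support the answer-bearing statement.
    graph = DemonstrationGraph(nodes=["Harry is a cat.",
                                      "All cats are animals.",
                                      "Harry is an animal."])
    graph.add_edge(0, 2)
    graph.add_edge(1, 2)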
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only exemplary of the preferred embodiments of the present disclosure and illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (12)

1. An information processing method based on natural language reasoning, comprising:
receiving a question sentence and an associated context, wherein the question sentence is used for representing a question to be given an answer;
determining an answer to the question and a demonstration graph for obtaining the answer based on the question feature information of the question sentence and the context feature information of the associated context; wherein the demonstration graph represents the process of inferring the answer from the associated context, the demonstration graph is a directed acyclic graph comprising nodes and directed edges between the nodes, the nodes are statements in the associated context, and a directed edge between two nodes represents the inference relation between the two associated nodes;
wherein the determining an answer to the question and a demonstration graph for obtaining the answer based on the question feature information of the question sentence and the context feature information of the associated context comprises:
inputting the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector;
and inputting the question feature vector and the associated context feature vector into a probability graph neural network model, and reasoning by the probability graph neural network model to obtain the answer and obtain a demonstration graph of the answer.
2. The method of claim 1, wherein the probabilistic graph neural network model is obtained based on the following steps:
defining, through a probabilistic graphical model, a joint distribution over the demonstration graph and all possible answers of the question sentence, so as to explicitly establish the dependence between the demonstration graph and the answer, wherein the demonstration graph comprises nodes representing statements of the associated context and directed edges among the nodes, and the joint distribution comprises an answer variable, node variables and directed-edge variables;
explicitly establishing a dependency relationship among different variables in the joint distribution by using a designed answer potential function, a node potential function and an edge potential function; the node potential function is related to the nodes and the answers, and the edge potential function is related to the nodes, the answers and directed edges among the nodes;
parameterizing each potential function by using a neural network to obtain a parameterized joint distribution;
determining a pseudo-likelihood function of the parameterized joint distribution; and determining a variational approximation for approximately representing the pseudo-likelihood function, to obtain a computer-solvable probabilistic graph neural network model.
3. The method of claim 1, wherein the probabilistic graph neural network model is trained by:
obtaining a training sample set, wherein the training sample set comprises a plurality of groups of training samples, each group of training samples comprises a sample associated context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample demonstration graph of the sample answer obtained from the sample associated context; the sample demonstration graph is a directed acyclic graph comprising nodes and directed edges between the nodes, the nodes are statements in the sample associated context, the directed edges between the nodes represent an inference relation between two associated nodes, and the sample associated context is an associated context comprising the answer corresponding to the sample question sentence;
and taking the sample associated context and the sample question sentence as input, taking the sample answer and the sample demonstration graph as output, and training the probability graph neural network model to obtain the trained probability graph neural network model.
4. The method of claim 3, wherein the training a probabilistic graph neural network model with the sample associated context and the sample question statements as inputs and the sample answers and the sample demonstration graph as outputs to obtain the trained probabilistic graph neural network model comprises:
establishing a loss function based on the following steps: establishing joint distribution and approximate variation distribution among sample answers, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges; determining the loss function according to the joint distribution and the approximate variation distribution;
and taking the sample associated context and the sample question sentence as input, taking the sample answer and the sample demonstration graph as output, and training the probability graph neural network model by utilizing a back propagation algorithm based on the loss function until a preset condition is met.
5. The method of claim 4, wherein the establishing of the joint distribution among the sample answers, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges comprises:
determining a first potential function for the sample answer based on the sample question and the global feature representation of the sample associated context;
for each sample node of the sample demonstration graph, establishing a second potential function related to the sample node and the sample answer according to the feature vector of the sample node;
for each directed edge of the sample demonstration graph, establishing a third potential function related to the directed edge, the sample answer and two associated sample nodes according to the feature vector of the directed edge, wherein the feature vector of the directed edge is related to the feature vector corresponding to each of the two associated nodes;
and parameterizing the joint distribution among the sample answer, the nodes of the sample demonstration graph and the directed edges of the sample demonstration graph based on the first potential function, the second potential function corresponding to each sample node and the third potential function corresponding to each sample directed edge.
6. The method of claim 4, wherein establishing approximate variational distributions among the sample answers, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges comprises:
determining a pseudo likelihood function corresponding to the joint distribution;
approximating the pseudo-likelihood function using a variational distribution based on a mean-field assumption, wherein the variables in the variational distribution are independent of each other, the variables comprising: the sample answer, the nodes in the sample demonstration graph, and the directed edges in the sample demonstration graph.
7. The method of claim 6, wherein said determining the loss function from the joint distribution and the approximated variation distribution comprises:
determining a first loss function and a second loss function according to the approximate variational distribution, and determining a third loss function according to the pseudo-likelihood function;
the first loss function is used to characterize: the deviation between prediction nodes included in a demonstration graph predicted by the probabilistic graph neural network model and sample nodes included in the sample demonstration graph;
the second loss function is used to characterize: the deviation between predicted directed edges included in a demonstration graph predicted by the probabilistic graph neural network model and directed edges included in the sample demonstration graph;
the third loss function is used to characterize: the deviation between an answer predicted by the probabilistic graph neural network model and the sample answer; wherein the nodes and directed edges used in the third loss function determined by the pseudo-likelihood function are the prediction results given by the variational distribution.
8. The method of claim 4, wherein the preset conditions include:
the sum of the first loss function, the second loss function and the third loss function meets a convergence condition; or
the number of training iterations reaches a preset iteration threshold.
9. The method of claim 1, wherein, before the inputting the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector, the determining an answer to the question and a demonstration graph for obtaining the answer based on the question feature information of the question sentence and the context feature information of the associated context further comprises:
retrieving at least one statement from the associated context using a preset retrieval method according to the question sentence;
encoding the retrieved at least one statement using a preset encoding method;
and the inputting the question sentence and the associated context into the pre-trained language model to obtain the question feature vector and the associated context feature vector comprises:
inputting the encoded question sentence and the encoded at least one statement into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
10. An information processing apparatus based on natural language reasoning, comprising:
a receiving unit, configured to receive a question statement and an associated context, where the question statement is used to characterize a question to be given an answer;
a determining unit, configured to determine an answer to the question and a demonstration graph for obtaining the answer based on the question feature information of the question statement and the context feature information of the associated context; wherein the demonstration graph represents the process of inferring the answer through the associated context, the demonstration graph comprises associated nodes associated with the answer and directed edges among the associated nodes, and the associated nodes are at least one statement in the associated context;
the determining unit is specifically configured to input the question statement and the associated context to a pre-trained language model to obtain a question feature vector and an associated context feature vector; and inputting the question feature vector and the associated context feature vector into a probability graph neural network model, and reasoning by the probability graph neural network model to obtain the answer and obtain a demonstration graph of the answer.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN202110744658.3A 2021-07-01 2021-07-01 Information processing method and device based on natural language reasoning and electronic equipment Active CN113505206B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110744658.3A CN113505206B (en) 2021-07-01 2021-07-01 Information processing method and device based on natural language reasoning and electronic equipment
PCT/CN2022/101739 WO2023274187A1 (en) 2021-07-01 2022-06-28 Information processing method and apparatus based on natural language inference, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110744658.3A CN113505206B (en) 2021-07-01 2021-07-01 Information processing method and device based on natural language reasoning and electronic equipment

Publications (2)

Publication Number Publication Date
CN113505206A CN113505206A (en) 2021-10-15
CN113505206B true CN113505206B (en) 2023-04-18

Family

ID=78009578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110744658.3A Active CN113505206B (en) 2021-07-01 2021-07-01 Information processing method and device based on natural language reasoning and electronic equipment

Country Status (2)

Country Link
CN (1) CN113505206B (en)
WO (1) WO2023274187A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505206B (en) * 2021-07-01 2023-04-18 北京有竹居网络技术有限公司 Information processing method and device based on natural language reasoning and electronic equipment
CN116226478B (en) * 2022-12-27 2024-03-19 北京百度网讯科技有限公司 Information processing method, model training method, device, equipment and storage medium
CN117272937B (en) * 2023-11-03 2024-02-23 腾讯科技(深圳)有限公司 Text coding model training method, device, equipment and storage medium

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7184993B2 (en) * 2003-06-10 2007-02-27 Microsoft Corporation Systems and methods for tractable variational approximation for interference in decision-graph Bayesian networks
US20120077180A1 (en) * 2010-09-26 2012-03-29 Ajay Sohmshetty Method and system for knowledge representation and processing using a structured visual idea map
CN106934012B (en) * 2017-03-10 2020-05-08 上海数眼科技发展有限公司 Natural language question-answering implementation method and system based on knowledge graph
CN107632968B (en) * 2017-05-22 2021-01-05 南京大学 Method for constructing evidence chain relation model for referee document
CN108509411B (en) * 2017-10-10 2021-05-11 腾讯科技(深圳)有限公司 Semantic analysis method and device
CN108763284B (en) * 2018-04-13 2021-07-20 华南理工大学 Question-answering system implementation method based on deep learning and topic model
CN109344240B (en) * 2018-09-21 2022-11-22 联想(北京)有限公司 Data processing method, server and electronic equipment
CN109947912B (en) * 2019-01-25 2020-06-23 四川大学 Model method based on intra-paragraph reasoning and joint question answer matching
CN110309283B (en) * 2019-06-28 2023-03-21 创新先进技术有限公司 Answer determination method and device for intelligent question answering
US11593672B2 (en) * 2019-08-22 2023-02-28 International Business Machines Corporation Conversation history within conversational machine reading comprehension
US11461613B2 (en) * 2019-12-05 2022-10-04 Naver Corporation Method and apparatus for multi-document question answering
CN111538819B (en) * 2020-03-27 2024-02-20 深圳乐读派科技有限公司 Method for constructing question-answering system based on document set multi-hop reasoning
CN111814982B (en) * 2020-07-15 2021-03-16 四川大学 Multi-hop question-answer oriented dynamic reasoning network system and method
CN112380835B (en) * 2020-10-10 2024-02-20 中国科学院信息工程研究所 Question answer extraction method integrating entity and sentence reasoning information and electronic device
CN112132143B (en) * 2020-11-23 2021-02-23 北京易真学思教育科技有限公司 Data processing method, electronic device and computer readable medium
CN112597316B (en) * 2020-12-30 2023-12-26 厦门渊亭信息科技有限公司 Method and device for interpretive reasoning question-answering
CN112860865A (en) * 2021-02-10 2021-05-28 达而观信息科技(上海)有限公司 Method, device, equipment and storage medium for realizing intelligent question answering
CN113505206B (en) * 2021-07-01 2023-04-18 北京有竹居网络技术有限公司 Information processing method and device based on natural language reasoning and electronic equipment

Also Published As

Publication number Publication date
CN113505206A (en) 2021-10-15
WO2023274187A1 (en) 2023-01-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant