WO2023274187A1 - Information processing method and apparatus based on natural language inference, and electronic device


Info

Publication number
WO2023274187A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
answer
question
graph
nodes
Prior art date
Application number
PCT/CN2022/101739
Other languages
French (fr)
Chinese (zh)
Inventor
孙长志
张欣勃
周浩
李磊
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023274187A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Definitions

  • the present disclosure relates to the technical field of the Internet, and in particular to an information processing method, device and electronic equipment based on natural language reasoning.
  • the knowledge base can be used for automatic reasoning.
  • to achieve automatic reasoning on the knowledge base, early work focused on reasoning over formal representations, that is, each sentence in the knowledge base is expressed as logic rules, such as first-order logic.
  • Embodiments of the present disclosure provide an information processing method, device, and electronic device based on natural language reasoning.
  • the embodiment of the present disclosure provides an information processing method based on natural language reasoning, including: receiving a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; determining, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and an argument graph for deriving the answer; the argument graph represents a process of deriving the answer from the associated context by reasoning.
  • an embodiment of the present disclosure provides an information processing device based on natural language reasoning, including: a receiving unit configured to receive a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; and a determining unit configured to determine, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and an argument graph for deriving the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning.
  • an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information processing method based on natural language reasoning as described in the first aspect.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the information processing method based on natural language reasoning as described in the first aspect is implemented.
  • the information processing method, device, and electronic device based on natural language reasoning receive a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; based on the question feature information of the question statement and the context feature information of the associated context, they determine the answer to the question and the argument graph from which the answer is obtained, so that when the answer to the question is determined from the associated context, the argument graph supporting the answer can be determined at the same time, which helps the user understand how the answer was obtained.
  • the argument diagrams of this scheme can assist in the prediction of answers and improve the ability to answer questions.
  • FIG. 1 is a flowchart of some embodiments of an information processing method based on natural language reasoning according to the present disclosure
  • Fig. 2 is a flowchart of other embodiments of information processing methods based on natural language reasoning according to the present disclosure
  • Fig. 3 is a schematic flow chart of the establishment of the probability graph neural network model in the embodiment shown in Fig. 2;
  • Fig. 4a shows a schematic diagram of the relationship among the answer variable A, the node variables Vi, and the edge variables Eij in the joint distribution in the embodiment shown in Fig. 2;
  • Fig. 4b shows the factor graph of the joint distribution for the example in Fig. 4a;
  • FIG. 5 is a schematic diagram of an application scenario of an information processing method based on natural language reasoning provided by the present disclosure
  • Fig. 6 is a schematic structural diagram of some embodiments of an information processing device based on natural language reasoning provided by the present disclosure
  • FIG. 7 is an exemplary system architecture in which the information processing method based on natural language reasoning according to an embodiment of the present disclosure can be applied;
  • Fig. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 shows the flow of some embodiments of an information processing method based on natural language reasoning according to the present disclosure.
  • the information processing method based on natural language reasoning includes the following steps:
  • Step 101 receiving a question statement and an associated context, wherein the question statement is used to characterize a question to be answered.
  • question sentences and associated contexts are represented by natural language. For example, question sentences and associated contexts expressed in Chinese; or question sentences and associated contexts expressed in other languages.
  • the above-mentioned associated context may include facts and rules expressed in natural language.
  • the above facts may be, for example:
  • F1 The circuit includes a battery
  • F2 The connecting wire is a metal connecting wire.
  • Such rules may include, for example:
  • the above question statement may be a declarative statement.
  • the above-mentioned answer to be given may be an answer for giving a judgment result, and the above-mentioned answer may include, for example, "right" or "wrong", "yes" or "no", and so on.
  • Step 102, based on the question feature information of the question statement and the context feature information of the associated context, determine the answer to the question and the argument graph for obtaining the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning.
  • the argument graph may be a directed acyclic graph; the directed acyclic graph includes nodes and directed edges between nodes, the nodes are statements in the associated context, and a directed edge between two nodes represents an inference relationship between the two associated nodes.
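  • As an illustration of this structure (not part of the disclosed embodiments), the following minimal Python sketch represents an argument graph as a directed acyclic graph whose nodes are context sentences and whose directed edges are inference relations; the node identifiers and sentences are hypothetical.

```python
# Minimal sketch: an argument graph as a DAG of natural-language sentences.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ArgumentGraph:
    # node id -> sentence (a fact, a rule, or a NAF node)
    nodes: Dict[str, str] = field(default_factory=dict)
    # (source id, target id): "source is used to infer target"
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, sentence: str) -> None:
        self.nodes[node_id] = sentence

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))


# Hypothetical example loosely mirroring the application scenario below.
graph = ArgumentGraph()
graph.add_node("F2", "The connecting wire is a metal connecting wire.")
graph.add_node("R2", "A judgment rule about metal wires (illustrative placeholder).")
graph.add_edge("F2", "R2")
print(graph.nodes, graph.edges)
```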
  • the question feature information can be extracted from the question statement and the context feature information can be extracted from the associated context in various ways.
  • the feature information includes feature vectors.
  • the above step 102 may include the following steps:
  • the question sentence and the associated context are input into a pre-trained language model to obtain a question feature vector and an associated context feature vector.
  • the above-mentioned language model may be various existing models for determining feature vectors of natural languages.
  • the aforementioned models may be various types of machine learning models.
  • Various analyzes can be performed on the question feature vector and the associated context feature vector to determine the answer to the question statement and the argument graph.
  • the above step 102 further includes: retrieving at least one sentence from the associated context according to the question statement using a preset retrieval method; and encoding the retrieved at least one sentence using a preset encoding method. In this case, inputting the question statement and the associated context into the pre-trained language model to obtain the question feature vector and the associated context feature vector includes: inputting the encoded question statement and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
  • the aforementioned associated context may be a relatively long article or sentence paragraph.
  • At least one sentence may be retrieved from the associated context by using a preset retrieval method according to the question sentence.
  • the above sentence may be a sentence that is highly related to the question sentence.
  • keywords of the question statement can be used to retrieve at least one sentence from the associated context.
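  • As a minimal sketch (the preset retrieval method is not specified in the disclosure; simple keyword overlap is assumed here purely for illustration), retrieval of relevant sentences could look like the following; the sentences are hypothetical.

```python
# Minimal sketch: keyword-overlap retrieval of context sentences (an assumption,
# standing in for the unspecified "preset retrieval method").
from typing import List


def retrieve_sentences(question: str, context_sentences: List[str], top_k: int = 3) -> List[str]:
    question_words = set(question.lower().split())

    def overlap(sentence: str) -> int:
        # score a sentence by how many question words it shares
        return len(question_words & set(sentence.lower().split()))

    return sorted(context_sentences, key=overlap, reverse=True)[:top_k]


# Hypothetical usage:
context = [
    "The circuit includes a battery.",
    "The connecting wire is a metal connecting wire.",
    "If a wire is metal then the wire conducts electricity.",
]
print(retrieve_sentences("The wire conducts electricity.", context, top_k=2))
```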
  • a feature vector analysis model such as a word vector model can be used to determine the feature vector of the encoded associated context, the feature vector of each sentence in the associated context; and determine the feature vector of the encoded question sentence.
  • the information processing method based on natural language reasoning receives a question statement and an associated context, wherein the question statement is used to characterize the question to be answered; based on the question feature information of the question statement and the context feature information of the associated context, it determines the answer to the question and the argument graph for the answer, so that when the answer to the question is determined from the associated context, the argument graph supporting the answer can be determined at the same time, and the user can easily understand how the answer was obtained.
  • the argument graph in this embodiment can assist in the prediction of the answer and improve the ability to answer questions.
  • FIG. 2 shows a flow chart of some other embodiments of information processing methods based on natural language reasoning provided by the present disclosure.
  • the information processing method based on natural language inference provided by these embodiments includes the following steps:
  • Step 201 receiving a question sentence and associated context, wherein the question sentence is used to characterize the question to be answered.
  • Natural language inference can use machine learning models to judge the semantic relationship between sentences. For example, a set of sentences describing facts and judgment rules is input, a question is then input, and the answer to the question is determined from those sentences and judgment rules.
  • the above machine learning model may include a language model and a probabilistic graph neural network model.
  • question sentences and associated contexts are represented by natural language. For example, question sentences and associated context expressed in Chinese; or question sentences and associated context expressed in other languages.
  • the above-mentioned associated context may include facts and rules expressed in natural language.
  • the above question statement may be a declarative statement.
  • the above-mentioned answer to be given may be an answer for giving a judgment result, and the above-mentioned answer may include, for example, "right" or "wrong", "yes" or "no", and so on.
  • Step 202 input the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector.
  • the information processing method based on natural language reasoning further includes: retrieving at least one sentence from the associated context according to the question statement using a preset retrieval method; and encoding the retrieved at least one sentence using a preset encoding method. Step 202 may then include: inputting the encoded question statement and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
  • the concatenation of the associated context C (facts and rules) and the question Q can be input into the language model.
  • the associated context C and question Q can be separated using preset tags.
  • the aforementioned preset tags may include "[SEP]".
  • the concatenation of the input context C and the question Q can be represented as: [CLS], C, [SEP], [SEP], Q, [SEP], where the [CLS] token corresponds to the global representation of the input context.
  • H_CLS is the global feature vector of the associated context C; H_i is the feature vector of node S_i in the associated context; the feature vector of the directed edge S_i -> S_j is formed from the feature vectors of S_i and S_j by a concatenation operation of vectors.
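  • As a sketch of this encoding step (assuming a Hugging Face BERT-style encoder as the pre-trained language model; the model name, pooling choices, and sentences are illustrative assumptions, not the disclosed implementation):

```python
# Sketch: encode [CLS] C [SEP] Q [SEP] with a pre-trained encoder and derive
# a global vector, per-sentence node vectors, and concatenated edge vectors.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

context_sentences = [
    "The circuit includes a battery.",
    "The connecting wire is a metal connecting wire.",
]
question = "The wire conducts electricity."

# Passing a text pair makes the tokenizer insert [CLS]/[SEP] automatically.
inputs = tokenizer(" ".join(context_sentences), question, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state       # (1, seq_len, dim)
h_cls = hidden[:, 0, :]                                 # global vector H_CLS

# One simple (assumed) way to obtain node features H_i: encode each sentence
# separately and take its [CLS] vector.
node_features = []
for sentence in context_sentences:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        node_features.append(encoder(**enc).last_hidden_state[:, 0, :])

# Edge feature for S_1 -> S_2 via vector concatenation.
edge_feature_12 = torch.cat([node_features[0], node_features[1]], dim=-1)
print(h_cls.shape, edge_feature_12.shape)
```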
  • Step 203 input the question feature vector and the associated context feature vector into a probabilistic graph neural network model, and deduce the answer and an argument graph of the answer from the probabilistic graph neural network model.
  • the above answers can be used to characterize answers that mean right or wrong, yes or no.
  • the above argument graph may include multiple nodes.
  • a node can be a fact, a rule (both expressed in natural language) or a NAF node.
  • a NAF node stands for Negation As Failure, which means that under the Closed World Assumption (CWA), for a statement S, if it cannot be inferred from the existing facts and rules that S is correct, then S is taken to be wrong, and not-S can be introduced as correct. It should be noted that under the closed world assumption there are no facts in negative form and no rules that draw negative conclusions, because negative facts and rules are redundant under the CWA.
  • Fig. 3 shows the establishment steps of the probability graph neural network model in the embodiment shown in Fig. 2 above.
  • the establishment steps of the probabilistic graph neural network model include the following:
  • Step 301 define the argument graphs of all possible answers to the question statement and the joint distribution of the answers through the probabilistic graph model, so as to explicitly establish the dependence between the argument graphs and the answers.
  • the argument graph includes nodes representing sentences of the associated context and directed edges between nodes, and the joint distribution includes answer variables, node variables and directed edge variables.
  • each node can correspond to a statement in the associated context.
  • Possible answers, nodes in the possible argument graph, and directed edges between nodes in the possible argument graph may be included as variables in the joint distribution.
  • Step 302 using the designed answer potential function, node potential function and edge potential function to explicitly establish the dependence relationship between different variables in the joint distribution.
  • the above edge potential functions are related to nodes, answers, and directed edges between nodes.
  • probabilistic graphical models can use graph theory to represent the dependency relationships among several random variables.
  • the graph obtained according to the probabilistic graphical model can include multiple nodes, and each node can be a random variable. If there is no edge between two nodes, the two variables can be considered conditionally independent of each other.
  • Two common probabilistic graph models are graphs with directed edges and graphs with undirected edges. According to the directionality of graphs, probabilistic graphical models can be divided into two categories, Bayesian networks and Markov networks. The present disclosure may employ undirected probabilistic graphical models.
  • a joint distribution can be defined over all possible configurations Y, denoted p(Y), where:
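  • As a sketch, one plausible factorized form, assuming the answer, node, and edge potential functions described below, is:

$$
p(Y) \;=\; \frac{1}{Z}\;\phi_A(A)\,\prod_{i}\phi_V(V_i, A)\,\prod_{i \neq j}\phi_E(V_i, V_j, E_{ij}, A)
$$

  • where Z is a normalizing constant, φ_A is the answer potential function, φ_V is the node potential function, and φ_E is the edge potential function.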
  • the factorization in the above formula can describe the correlation among the answer variable A, the node variable V i and the edge variable E ij .
  • Fig. 4a shows a schematic diagram of the relationship among answer variable A, node variable V i and edge variable E ij .
  • the associated context in Fig. 4a includes statements S 1 , S 2 , S 3 .
  • the above association context may include three nodes: S 1 , S 2 , and S 3 .
  • the answer to the question statement is "True”.
  • the proof graph (proof) includes node S1, node S3, and a directed edge from node S1 to node S3.
  • the solid circles of the nodes in the right figure indicate that when the answer variable A is 1, the value of node V 1 is 1, the value of node V 2 is 0, and the value of node V 3 is 1.
  • the value of the directed edge E13 (from node V1 to node V3) is 1, and the values of the other directed edges E12, E21, E23, E31, and E32 are 0.
  • Figure 4b shows a factor plot of the joint distribution p(Y) for the example in Figure 4a.
  • the above factor graph includes the node variables V1, V2, V3, the edge variables E12, E13, E21, E23, E31, E32, the answer variable A, the potential function between each node and the answer, the potential function φ_A of the answer, and the potential function corresponding to each edge, and it depicts the relationships among these factors.
  • Step 303 using the neural network to parameterize each potential function to obtain a parameterized joint distribution.
  • MLP Multilayer Perceptron
  • the feature vector of the sentence S i can be calculated by step 302
  • another multi-layer perceptron, MLP2, can be used as a non-linear transformation of the feature vector of the sentence, yielding the node potential function for the node variable.
  • Dimension 4 represents the number of possible combinations of node variable V and answer variable A. Additionally the parameters of MLP 2 can be shared across all sentences.
  • Dimension 16 represents the number of possible values of combinations of four variables (V i , V j , E ij , A).
  • the parameters of MLP 3 can be shared across all sentence pairs.
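  • As a sketch (assuming PyTorch; the hidden sizes and input dimension are illustrative assumptions), the MLP parameterization of the node and edge potential functions described above could look like the following, with MLP2 shared across all sentences and MLP3 shared across all sentence pairs:

```python
# Sketch: parameterize node and edge potentials with shared MLPs.
import torch
import torch.nn as nn

hidden_dim = 768  # assumed size of a sentence feature vector

mlp2 = nn.Sequential(              # node potential, shared across sentences
    nn.Linear(hidden_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 4),             # 4 = combinations of node variable V_i and answer A
)
mlp3 = nn.Sequential(              # edge potential, shared across sentence pairs
    nn.Linear(2 * hidden_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 16),            # 16 = combinations of (V_i, V_j, E_ij, A)
)

h_i = torch.randn(1, hidden_dim)   # feature vector of sentence S_i (illustrative)
h_j = torch.randn(1, hidden_dim)   # feature vector of sentence S_j (illustrative)
node_potential_i = mlp2(h_i)                                   # shape (1, 4)
edge_potential_ij = mlp3(torch.cat([h_i, h_j], dim=-1))        # shape (1, 16)
print(node_potential_i.shape, edge_potential_ij.shape)
```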
  • Step 304 determining the pseudo-likelihood function of the parameterized joint distribution; and determining a variational approximation for approximately characterizing the pseudo-likelihood function, so as to obtain a computer-solvable probability graph neural network model.
  • a pseudo-likelihood function can be used to approximate the above parameterized joint distribution (also known as joint probability distribution).
  • the variational approximation used to approximate the pseudo-likelihood function can be determined to obtain a computer-solvable probability graph neural network model .
  • based on the mean-field assumption, the pseudo-likelihood of Y is approximated using a variational distribution (variational approximation) q(Y), in which each variable y ∈ Y is independent of the others. Likewise, each independent distribution can be parameterized with a neural network.
  • the variational approximation is expressed by formulas (11) to (12):
  • an approximation can thereby be provided for each conditional in the pseudo-likelihood, so that sampling to determine the optimal distribution of the pseudo-likelihood function is avoided.
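  • As a sketch of the forms these quantities plausibly take (a reconstruction under the mean-field assumption described above, not the literal formulas (10) to (12) of the original), the pseudo-likelihood and the variational distribution can be written as:

$$
\mathrm{PL}(Y) \;=\; \prod_{y \in Y} p\!\left(y \mid Y \setminus \{y\}\right),
\qquad
q(Y) \;=\; q_A(A)\,\prod_{i} q_{V_i}(V_i)\,\prod_{i \neq j} q_{E_{ij}}(E_{ij}),
$$

  • where each factor of q is parameterized by a neural network and every variable (the answer, each node, and each directed edge) is treated as independent of the others.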
  • the probabilistic graph neural network model is established through the above steps.
  • the above probabilistic graph neural network model can be trained to obtain the trained probabilistic graph neural network model.
  • the trained probabilistic graph neural network model can be used to determine the answer and argument graph corresponding to the question sentence according to the input associated context and the question sentence.
  • the probabilistic graph neural network model is obtained through the following steps of training:
  • the training sample set includes multiple sets of training samples, and each set of training samples includes a sample associated context, a sample question statement, a sample answer corresponding to the sample question statement, and a sample argument graph from which the sample answer is derived from the sample associated context; the sample argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between nodes, the nodes are statements in the context, a directed edge between nodes represents an inference relationship between the two associated nodes, and the sample associated context is an associated context containing the answer corresponding to the sample question statement.
  • sample associated context and the sample question statement are used as input, and the sample answer and the sample argument graph are used as output to train the probability graph neural network model to obtain the trained probability graph neural network model.
  • the trained probabilistic graph neural network model can be obtained after training the probabilistic graph neural network model for a preset number of times using the above training samples.
  • the preset number of times mentioned above may include 1000 times, 5000 times, and so on, which is not limited here.
  • in some embodiments, using the sample associated context and the sample question statement as input and the sample answer and the sample argument graph as output to train the initial model and obtain the trained probabilistic graph neural network model includes the following.
  • the loss function is established based on the following steps: based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the joint distribution and the approximate variational distribution among the sample answer, the node feature vectors, and the edge feature vectors are established; the loss function is then determined according to the joint distribution and the approximate variational distribution.
  • the sample associated context and the sample question statement are used as input, the sample answer and the sample argument graph are used as output, and the probabilistic graph neural network model is trained with the backpropagation algorithm based on the preset loss function until the preset conditions are met.
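  • As a sketch of such a training procedure (assuming PyTorch; model, compute_loss, and train_loader are illustrative placeholders rather than components named in the disclosure):

```python
# Sketch: train the model with backpropagation against a preset loss.
import torch


def train(model, train_loader, compute_loss, epochs: int = 10, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                      # stop by epoch budget or convergence
        for batch in train_loader:
            # batch: sample context, sample question, sample answer,
            # and sample argument graph (nodes and directed edges).
            optimizer.zero_grad()
            prediction = model(batch["context"], batch["question"])
            loss = compute_loss(prediction, batch["answer"], batch["argument_graph"])
            loss.backward()                      # backpropagation
            optimizer.step()
    return model
```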
  • the loss function can be established first, and then the probability graph neural network model can be trained using the loss function.
  • the established loss function is related to the node feature vector of the sample node, the edge feature vector corresponding to the directed edge, and the joint distribution and approximate variational distribution between the sample answer, node feature vector, and edge feature vector.
  • such a loss function matches the probabilistic graph neural network model of the present disclosure well, so that the above probabilistic graph neural network model can be optimized quickly during training.
  • establishing the joint distribution among the sample answer, the node feature vectors, and the edge feature vectors based on the node feature vector of each sample node and the edge feature vector corresponding to each directed edge includes: determining a first potential function about the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node in the sample argument graph, establishing a second potential function about the sample node and the sample answer according to the feature vector of the sample node; and for each directed edge of the sample argument graph, establishing a third potential function about the directed edge, the sample answer, and the two associated sample nodes according to the feature vector of the directed edge, where the feature vector of a directed edge is related to the feature vectors of the two associated nodes.
  • the first potential function about the sample answer, the second potential function about the sample node and the sample answer, and the third potential function about the directed edge, the sample answer, and the two associated sample nodes are similar to those described above and will not be repeated here.
  • establishing the approximate variational distribution among the sample answer, the node feature vectors, and the edge feature vectors includes: determining the pseudo-likelihood function corresponding to the joint distribution; and, based on the mean-field assumption, using a variational distribution to approximate the pseudo-likelihood function, wherein each variable in the variational distribution is independent of the others, and the variables include: the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph.
  • the joint distribution among the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph can be approximated as a pseudo-likelihood function (refer to the relevant content of formula (10), which is not repeated here).
  • determining the loss function according to the joint distribution and the approximate variational distribution includes: determining a first loss function and a second loss function according to the variational distribution, and determining a third loss function according to the pseudo-likelihood function; the first loss function is used to characterize the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function is used to characterize the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; the third loss function is used to characterize the deviation between the answer predicted by the probabilistic graph neural network model and the sample answer; wherein the nodes and directed edges in the third loss function determined by the pseudo-likelihood function are the prediction results obtained through the variational distribution.
  • the above-mentioned first loss function is characterized by the following formula:
  • P() is the pseudo-likelihood function of the joint distribution.
  • a * is the answer labeled by the training sample.
  • the sum of the first loss function, the second loss function, and the third loss function may be determined as the optimization target.
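  • As a sketch (an assumption about the concrete form, consistent with the description above but not the literal formulas of the original), the optimization target may combine cross-entropy-style terms on the variational node and edge distributions with a negative log pseudo-likelihood term on the answer:

$$
\mathcal{L} \;=\; \underbrace{-\sum_{i}\log q\!\left(V_i = v_i^{*}\right)}_{\text{first loss (nodes)}}
\;+\;\underbrace{-\sum_{i \neq j}\log q\!\left(E_{ij} = e_{ij}^{*}\right)}_{\text{second loss (edges)}}
\;+\;\underbrace{-\log P\!\left(A = a^{*} \mid \hat{V}, \hat{E}\right)}_{\text{third loss (answer)}}
$$

  • where v*, e*, and a* are the labels of the training sample, and the nodes and edges used in the third term are the prediction results obtained through the variational distribution.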
  • the preset conditions mentioned above include:
  • the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition.
  • the change of the sum of the first loss function, the second loss function, and the third loss function obtained in every two adjacent trainings is less than a preset change threshold;
  • the sum of the first loss function, the second loss function, and the third loss function is the smallest.
  • the process of calculating the value of the loss function can be simplified, which is conducive to improving the training efficiency of the probabilistic graph neural network model.
  • the aforementioned preset condition includes that the number of training times reaches a preset number of times threshold.
  • the above probabilistic graph neural network model can achieve higher prediction accuracy.
  • a large-sample training data set can include, for example, 70,000 sets of training data;
  • after testing the probabilistic graph neural network model trained on this data set, the answer accuracy rate is 99.99%;
  • the accuracy rate of the argument graph is 88.8%.
  • the above-mentioned probabilistic graph neural network model is trained in a small-sample training data set (for example, 30,000, 10,000, and 1,000 sets of training data randomly selected from the above-mentioned large-sample training samples).
  • the test results are as follows: with 30,000 groups of training samples, the answer accuracy rate is 99.9% and the argument graph accuracy rate is 86.8%; with 10,000 groups, the answer accuracy rate is 99.9% and the argument graph accuracy rate is 72.4%; with 1,000 groups, the answer accuracy rate is 82.1% and the argument graph accuracy rate is 21.1%.
  • a probabilistic graph neural network model with high accuracy can be obtained. That is, a small amount of training data can be used to train the probabilistic graph neural network model, and a probabilistic graph neural network model with high prediction result accuracy can be obtained.
  • the results of inference on the test sample of the target category are as follows: the accuracy of the answer is 96.3%; the accuracy of the argument graph is 79.3% .
  • FIG. 5 shows a schematic diagram of an application scenario of the information processing method based on natural language reasoning provided by the present disclosure.
  • the input information in FIG. 5 includes: associated context and question statement.
  • the associated context may include fact statements F1, F2 and judgment rule statements R1, R2, R3, R4, R5, R6.
  • Question sentences Q1 and Q2 can be input.
  • the above question statements Q1 and Q2 can be input in batches.
  • the answer A1 and argument diagram 1 corresponding to the question statement Q1 can be obtained through the above steps 302-303, and the answer A2 and argument diagram 2 corresponding to the question statement Q2 can be obtained.
  • Each statement can be thought of as a node.
  • the answer corresponding to question statement Q1 is A1: TRUE, and the argument graph corresponding to answer A1 is a directed edge from node F2 to node R2.
  • the above argument graph illustrates the process of obtaining the answer A1 from nodes F2 and R4: first, the fact statement at node F2 determines that the wire is metal, and then the judgment rule provided by the rule statement R4 determines that the answer to the question is "Yes".
  • a NAF node stands for Negation As Failure, which means that under the Closed World Assumption, for a statement S, if it cannot be inferred from the existing facts and rules that statement S is correct, then statement S is taken to be wrong, and the statement not-S can be introduced as correct.
  • in this example, the NAF node means "the circuit does not have the switch". The above answer A2 can be obtained from the nodes and directed edges in argument graph 2.
  • argument graph 2 includes: the directed edge from node NAF to node R1; the directed edge from node F1 to node R1; the directed edge from node R1 to node R6; the directed edge from node NAF to node R3; and the directed edge from node R3 to node R6.
  • the above argument graph 2 shows the argumentation process that leads to answer A2.
  • the information processing method based on natural language reasoning of these embodiments highlights the steps of using a language model and a probabilistic graph neural network model to obtain the answer to the question and the argument graph; the above probabilistic graph neural network model is obtained from the joint distribution of the answer, the nodes, and the directed edges, so the obtained answer and argument graph are highly correlated, and the argument graph provides stronger support for the answer.
  • the argument graph given by the above probabilistic graph neural network model can assist in the prediction of answers and improve the ability to answer questions.
  • because the probabilistic graph neural network model is obtained from the joint distribution of answers, nodes, and directed edges, it can be trained with few samples, and a probabilistic graph neural network model with high result accuracy can still be obtained.
  • the present disclosure provides some embodiments of an information processing device based on natural language reasoning, which correspond to the method embodiment shown in FIG. 1; the device can be applied to various electronic devices.
  • the information processing device based on natural language reasoning in this embodiment includes: a receiving unit 601 and a determining unit 602 .
  • the receiving unit 601 is used to receive the question statement and the associated context, wherein the question statement is used to characterize the question to be answered;
  • the determining unit 602 is used to determine, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and the argument graph for deriving the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning.
  • for the specific processing of the receiving unit 601 and the determining unit 602 of the information processing device based on natural language inference and the technical effects they bring, reference may be made to the relevant descriptions of step 101 and step 102 in the corresponding embodiment of FIG. 1, which will not be repeated here.
  • the argument graph is a directed acyclic graph
  • the directed acyclic graph includes nodes and directed edges between nodes
  • the nodes are statements in the associated context
  • a directed edge between the nodes represents an inferred relationship between the associated two nodes.
  • the determining unit 602 is further configured to: input the question statement and the associated context into a pre-trained language model to obtain a question feature vector and an associated context feature vector; And the associated context feature vector is input to a probabilistic graph neural network model, and the answer and a demonstration graph of the answer are obtained by inference from the probabilistic graph neural network model.
  • the probabilistic graph neural network model is obtained by the following steps: define the argument graphs of all possible answers to the question statement and the joint distribution of the answers through the probabilistic graph model, so as to explicitly establish the dependence between the argument graphs and the answers, where the argument graph includes the nodes representing the sentences of the associated context and the directed edges between the nodes, and the joint distribution includes answer variables, node variables, and directed edge variables; use the designed answer potential function, node potential function, and edge potential function to explicitly establish the dependencies between different variables in the joint distribution, where the node potential function is related to the nodes and the answer, and the edge potential function is related to the nodes, the answer, and the directed edges between nodes; use the neural network to parameterize each potential function to obtain a parameterized joint distribution; determine the pseudo-likelihood function of the parameterized joint distribution; and determine a variational approximation for approximately characterizing the pseudo-likelihood function, so as to obtain a computer-solvable probabilistic graph neural network model.
  • the shown natural language reasoning-based information processing apparatus further includes a training unit (not shown in the figure).
  • the training unit is used to train the probabilistic graph neural network model based on the following steps to obtain the trained probabilistic graph neural network model: obtain a training sample set, where the training sample set includes multiple sets of training samples, and each set of training samples includes a sample associated context, a sample question statement, a sample answer corresponding to the sample question statement, and a sample argument graph from which the sample answer is derived from the sample associated context; the sample argument graph is a directed acyclic graph that includes nodes and directed edges between nodes, the nodes are statements in the context, a directed edge between nodes represents an inference relationship between the two associated nodes, and the sample associated context is an associated context containing the answer corresponding to the sample question statement; the sample associated context and the sample question statement are used as input, and the sample answer and the sample argument graph are used as output, to train the probabilistic graph neural network model and obtain the trained probabilistic graph neural network model.
  • the training unit is further used to establish the loss function based on the following steps: based on the node feature vector of each sample node and the edge feature vector corresponding to each directed edge, establish the joint distribution and the approximate variational distribution among the sample answer, the node feature vectors, and the edge feature vectors; determine the loss function according to the joint distribution and the approximate variational distribution; and, using the sample associated context and the sample question statement as input and the sample answer and the sample argument graph as output, train the probabilistic graph neural network model with the backpropagation algorithm based on the preset loss function until a preset condition is met.
  • the training unit is further used to: determine the first potential function of the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node of the sample argument graph, establish the second potential function about the sample node and the sample answer according to the feature vector of the sample node; for each directed edge of the sample argument graph, establish the third potential function about the directed edge, the sample answer, and the two associated sample nodes according to the feature vector of the directed edge, where the feature vector of a directed edge is related to the feature vectors of the two associated nodes; and, based on the first potential function, the second potential function corresponding to each sample node, and the third potential function corresponding to each directed edge, parameterize the joint distribution among the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph.
  • the training unit is further used to: determine the pseudo-likelihood function corresponding to the joint distribution; and use a variational distribution to approximate the pseudo-likelihood function based on the mean-field assumption, wherein each variable in the variational distribution is independent of the others, and the variables include: the sample answer, the nodes of the sample argument graph, and the directed edges of the sample argument graph.
  • the training unit is further configured to: determine a first loss function and a second loss function according to the approximate variational distribution; and determine a third loss function according to the pseudo-likelihood function; the first loss function is used to characterize the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function is used to characterize the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; the third loss function is used to characterize the deviation between the answer predicted by the probabilistic graph neural network model and the sample answer; wherein the nodes and directed edges in the third loss function determined by the pseudo-likelihood function are the prediction results obtained through the variational distribution.
  • the preset condition includes: the sum of the first loss function, the second loss function, and the third loss function satisfies a convergence condition; or the number of training times reaches a preset number threshold.
  • the determining unit 602 is further configured to: retrieve at least one sentence from the associated context according to the question statement using a preset retrieval method; encode the retrieved at least one sentence using a preset encoding method; and input the encoded question statement and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated context feature vector.
  • FIG. 7 shows an exemplary system architecture in which the information processing method based on natural language reasoning according to an embodiment of the present disclosure can be applied.
  • the system architecture may include terminal devices 701 , 702 , and 703 , a network 704 , and a server 705 .
  • the network 704 is used as a medium for providing communication links between the terminal devices 701 , 702 , 703 and the server 705 .
  • Network 704 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • the terminal devices 701, 702, 703 can interact with the server 705 through the network 704 to receive or send messages and the like.
  • client applications such as web browser applications, search applications, and news information applications, may be installed on the terminal devices 701, 702, and 703.
  • the client applications in the terminal devices 701, 702, and 703 can receive user instructions and complete corresponding functions according to the user instructions, such as adding corresponding information to information according to the user instructions.
  • Terminal devices 701, 702, and 703 may be hardware or software.
  • the terminal devices 701, 702, and 703 may be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, moving picture expert compression standard audio layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, moving picture expert compression standard audio layer 4) player, laptop portable computer and desktop computer, etc.
  • when the terminal devices 701, 702, and 703 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple software programs or software modules (for example, software or software modules for providing distributed services), or as a single software program or software module. No specific limitation is made here.
  • the server 705 can provide various services, such as receiving question sentences and associated contexts sent by terminal devices 701 , 702 , and 703 , analyzing and processing the question sentences and associated contexts, and sending analysis and processing results to the terminal devices.
  • the information processing method provided by the embodiments of the present disclosure can be executed by a terminal device, and accordingly, the information processing apparatus can be set in the terminal devices 701, 702, and 703.
  • the information processing method provided by the embodiments of the present disclosure may also be executed by the server 705, and accordingly, the information processing apparatus may be set in the server 705.
  • the numbers of terminal devices, networks, and servers in FIG. 7 are only illustrative; according to implementation needs, there can be any number of terminal devices, networks, and servers.
  • referring now to FIG. 8, it shows a schematic structural diagram of an electronic device (such as the terminal device or the server in FIG. 7) suitable for implementing the embodiments of the present disclosure.
  • the terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 8 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • an electronic device may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 801, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803.
  • a processing device such as a central processing unit, a graphics processing unit, etc.
  • RAM random access memory
  • in the RAM 803, various programs and data necessary for the operation of the electronic device 800 are also stored.
  • the processing device 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804 .
  • the following devices can be connected to the I/O interface 805: an input device 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 808 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 809.
  • the communication means 809 may allow the electronic device to perform wireless or wired communication with other devices to exchange data. While FIG. 8 shows an electronic device having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 809, or from storage means 808, or from ROM 802.
  • when the computer program is executed by the processing device 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (for example, a communication network) in any form or medium.
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: receives a question statement and an associated context, wherein the question statement is used to represent a question to be given an answer; and determines, based on the question feature information of the question statement and the context feature information of the associated context, the answer to the question and the argument graph for obtaining the answer; the argument graph represents the process of deriving the answer from the associated context by reasoning, the argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between nodes, the nodes are statements in the associated context, and a directed edge between nodes represents an inference relationship between the two associated nodes.
  • computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of a unit does not constitute a limitation of the unit itself under certain circumstances.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logical device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed in embodiments of the present disclosure are an information processing method and apparatus based on natural language inference, and an electronic device. One specific embodiment of the method comprises: receiving a question statement and an associated context, wherein the question statement is used for representing a question to be answered; and on the basis of question feature information of the question statement and context feature information of the associated context, determining an answer to the question and a demonstration graph for obtaining the answer, the demonstration graph representing a process of obtaining the answer by means of inference of the associated context. The demonstration graph for demonstration of the answer is determined at the same time, such that a user can know the process of obtaining the answer conveniently, and the credibility of the answer is improved.

Description

Information processing method, apparatus and electronic device based on natural language reasoning
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application No. 202110744658.3, entitled "Information Processing Method, Apparatus and Electronic Device Based on Natural Language Reasoning", filed on July 1, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of Internet technologies, and in particular to an information processing method, apparatus and electronic device based on natural language reasoning.
Background
With the development of artificial intelligence, it is hoped that artificial intelligence can be used to understand natural language and, on that basis, to realize human-computer dialogue and the like.
In the related art, a knowledge base can be used for automatic reasoning. Early work on automatic reasoning over a knowledge base focused on reasoning over formal representations, that is, each sentence in the knowledge base is expressed as logic rules, such as first-order logic.
Summary
This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description that follows. This Summary is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.
Embodiments of the present disclosure provide an information processing method and apparatus based on natural language reasoning, and an electronic device.
In a first aspect, an embodiment of the present disclosure provides an information processing method based on natural language reasoning, including: receiving a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given; and determining, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, where the argument graph represents a process of deriving the answer by reasoning over the associated context.
In a second aspect, an embodiment of the present disclosure provides an information processing apparatus based on natural language reasoning, including: a receiving unit configured to receive a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given; and a determining unit configured to determine, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, where the argument graph represents a process of deriving the answer by reasoning over the associated context.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information processing method based on natural language reasoning according to the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium having a computer program stored thereon, where the program, when executed by a processor, implements the information processing method based on natural language reasoning according to the first aspect.
According to the information processing method, apparatus and electronic device based on natural language reasoning provided by the embodiments of the present disclosure, a question sentence and an associated context are received, where the question sentence represents a question for which an answer is to be given; and an answer to the question and an argument graph from which the answer is derived are determined based on question feature information of the question sentence and context feature information of the associated context. Thus, when the answer to the question is determined from the associated context, the argument graph that supports the answer is determined at the same time, which makes it easy for a user to understand how the answer was obtained. Compared with solutions that only give an answer or that generate the answer and the argument graph separately, the argument graph in this solution can assist in predicting the answer and improve the ability to answer questions.
Brief Description of the Drawings
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale.
Fig. 1 is a flowchart of some embodiments of an information processing method based on natural language reasoning according to the present disclosure;
Fig. 2 is a flowchart of other embodiments of the information processing method based on natural language reasoning according to the present disclosure;
Fig. 3 is a schematic flowchart of establishing the probabilistic graph neural network model in the embodiments shown in Fig. 2;
Fig. 4a is a schematic diagram of the relationship among the answer variable A, the node variables Vi and the edge variables Eij in the joint distribution in the embodiments shown in Fig. 2;
Fig. 4b shows the factor graph of the joint distribution of the example in Fig. 4a;
Fig. 5 is a schematic diagram of an application scenario of the information processing method based on natural language reasoning provided by the present disclosure;
Fig. 6 is a schematic structural diagram of some embodiments of an information processing apparatus based on natural language reasoning provided by the present disclosure;
Fig. 7 is an exemplary system architecture to which the information processing method based on natural language reasoning according to an embodiment of the present disclosure can be applied;
Fig. 8 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be executed in different orders and/or in parallel. In addition, the method implementations may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules or units, and are not used to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules or units.
It should be noted that the modifiers "one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Please refer to Fig. 1, which shows the flow of some embodiments of an information processing method based on natural language reasoning according to the present disclosure. As shown in Fig. 1, the information processing method based on natural language reasoning includes the following steps:
Step 101: receive a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given.
The question sentence and the associated context are expressed in natural language, for example, a question sentence and an associated context expressed in Chinese, or a question sentence and an associated context expressed in another language.
The associated context may include facts and rules expressed in natural language.
As an illustration, the facts may be, for example:
F1: the circuit includes a battery; F2: the connecting wire is a metal connecting wire.
The rules may include, for example:
R: if the circuit includes a switch and the switch is turned on, the circuit is complete.
In some application scenarios, the question sentence may be a declarative sentence. The answer to be given may be an answer that expresses a judgment result, for example, "right" or "wrong", "yes" or "no", and so on.
Step 102: determine, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, where the argument graph represents the process of deriving the answer by reasoning over the associated context.
The argument graph may be a directed acyclic graph. The directed acyclic graph includes nodes and directed edges between the nodes, where each node is a sentence in the associated context, and a directed edge between two nodes represents an inference relationship between the two associated nodes.
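As a non-limiting illustration of this data structure, the following sketch shows one possible in-memory representation of such an argument graph; the class name, field names and example sentences are assumptions made only for illustration and are not part of the disclosed method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ArgumentGraph:
    """A directed acyclic graph whose nodes are sentences of the associated context."""
    nodes: List[str] = field(default_factory=list)               # sentences used in the proof
    edges: List[Tuple[int, int]] = field(default_factory=list)   # (i, j) means nodes[i] -> nodes[j]

    def add_edge(self, src: int, dst: int) -> None:
        # A directed edge states that the source sentence is used to infer the target sentence.
        self.edges.append((src, dst))

# Example: a fact supports the application of a rule.
graph = ArgumentGraph(nodes=["the wire is a metal wire",
                             "if a wire is metal, it conducts electricity"])
graph.add_edge(0, 1)
```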
The question feature information may be extracted from the question sentence, and the context feature information may be extracted from the associated context, in various ways.
In some optional implementations, the feature information includes feature vectors, and step 102 may include the following steps:
First, the question sentence and the associated context are input into a pre-trained language model to obtain a question feature vector and an associated-context feature vector.
Second, the answer and the argument graph are determined according to the question feature vector and the associated-context feature vector.
The language model may be any of various existing models for determining feature vectors of natural language, and may be any of various types of machine learning models.
Various analyses may be performed on the question feature vector and the associated-context feature vector to determine the answer to the question sentence and the argument graph.
Optionally, before the question sentence and the associated context are input into the pre-trained language model to obtain the question feature vector and the associated-context feature vector, step 102 further includes: retrieving at least one sentence from the associated context according to the question sentence by using a preset retrieval method; and encoding the retrieved at least one sentence by using a preset encoding method. Inputting the question sentence and the associated context into the pre-trained language model to obtain the question feature vector and the associated-context feature vector then includes: inputting the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
In these optional implementations, the associated context may be a relatively long article or a passage of sentences. At least one sentence may be retrieved from the associated context according to the question sentence by using the preset retrieval method, and the retrieved sentences may be sentences that are highly related to the question sentence.
For example, keywords of the question sentence may be used to retrieve at least one sentence from the associated context.
The characters and words included in the at least one sentence may be encoded using a single-byte character set (SBCS), a multi-byte character set (MBCS), a Unicode method, or the like, so as to obtain the encoded at least one sentence that can be processed by a computer. The characters and words included in the question sentence may be encoded using a word-vector encoding method or the like, so as to obtain the encoded question sentence.
A feature-vector analysis model such as a word-vector model may be used to determine the feature vector of the encoded associated context, the feature vectors of the sentences in the associated context, and the feature vector of the encoded question sentence.
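As a non-limiting illustration of the keyword-based retrieval described above, the following sketch scores context sentences by keyword overlap with the question sentence; the tokenization and scoring rule are simplifying assumptions and are not prescribed by the present disclosure.

```python
from typing import List

def retrieve_related_sentences(question: str, context_sentences: List[str], top_k: int = 5) -> List[str]:
    """Return the context sentences that share the most keywords with the question."""
    question_words = set(question.lower().split())
    scored = []
    for sentence in context_sentences:
        overlap = len(question_words & set(sentence.lower().split()))
        scored.append((overlap, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for overlap, sentence in scored[:top_k] if overlap > 0]

# Usage: pick the sentences most related to the question before encoding them.
related = retrieve_related_sentences(
    "does the circuit have a battery",
    ["the circuit includes a battery", "the connecting wire is a metal connecting wire"])
```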
In the related art in which a knowledge base is used for automatic reasoning, since reasoning is performed on formal representations, logic rules must be constructed for every sentence in the knowledge base, and converting a sentence into logic rules requires semantic parsing. Such knowledge-base reasoning solutions can give the answer corresponding to a question, but they do not give an argument together with the answer, so the user cannot determine whether the obtained answer is reasonable, which makes these solutions appear to have a rather weak question-answering ability.
According to the information processing method based on natural language reasoning provided by this embodiment, a question sentence and an associated context are received, where the question sentence represents a question for which an answer is to be given; and an answer to the question and an argument graph from which the answer is derived are determined based on question feature information of the question sentence and context feature information of the associated context. Thus, when the answer to the question is determined from the associated context, the argument graph that supports the answer is determined at the same time, which makes it easy for a user to understand how the answer was obtained. Compared with solutions that only give an answer or that generate the answer and the argument graph separately, the argument graph in this embodiment can assist in predicting the answer and improves the ability to answer questions.
Please continue to refer to Fig. 2, which shows a flowchart of other embodiments of the information processing method based on natural language reasoning provided by the present disclosure.
As shown in Fig. 2, the information processing method based on natural language reasoning provided by these embodiments includes the following steps:
Step 201: receive a question sentence and an associated context, where the question sentence represents a question for which an answer is to be given.
Natural language reasoning may use machine learning models to judge the semantic relationship between sentences. For example, a set of sentences describing facts and a set of judgment rules are input, then a question is input, and the answer to the question is determined from the sentences and the judgment rules. In this embodiment, the machine learning models may include a language model and a probabilistic graph neural network model.
The question sentence and the associated context are expressed in natural language, for example, a question sentence and an associated context expressed in Chinese, or a question sentence and an associated context expressed in another language.
The associated context may include facts and rules expressed in natural language.
In some application scenarios, the question sentence may be a declarative sentence. The answer to be given may be an answer that expresses a judgment result, for example, "right" or "wrong", "yes" or "no", and so on.
Step 202: input the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated-context feature vector.
Before step 202, the information processing method based on natural language reasoning further includes: retrieving at least one sentence from the associated context according to the question sentence by using a preset retrieval method, and encoding the retrieved at least one sentence by using a preset encoding method. Step 202 may then include: inputting the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
In this embodiment, the concatenation of the associated context C (facts and rules) and the question Q may be input into the language model. A preset tag, for example "SEP", may be used to separate the associated context C from the question Q. As an example, the concatenated input of the associated context C and the question may be expressed as: [CLS], C, [SEP], [SEP], Q, [SEP], where "CLS" corresponds to the global representation of the associated context.
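As a non-limiting illustration, one way such an input sequence could be assembled with an off-the-shelf tokenizer is sketched below; the choice of the roberta-base model and the way the context sentences are joined are assumptions made only for illustration.

```python
from transformers import AutoTokenizer

# Any pre-trained encoder could be used here; roberta-base is only an example.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

context_sentences = ["the circuit includes a battery.",
                     "the connecting wire is a metal connecting wire."]
question = "the circuit is complete."

# Encode the context and the question as a sentence pair; the tokenizer inserts the
# special tokens ([CLS]/<s>, [SEP]/</s>) that delimit the two segments.
encoding = tokenizer(" ".join(context_sentences), question,
                     return_tensors="pt", truncation=True)
```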
The following three feature vectors are determined through the language model:
h_A = h_CLS    (1)
h_{V_i} = the representation of sentence s_i produced by the language model    (2)
h_{E_ij} = h_{V_i} ⊕ h_{V_j}    (3)
where h_CLS is the global feature vector of the associated context C, h_{V_i} is the feature vector of the node s_i in the associated context, h_{E_ij} is the feature vector of the directed edge s_i -> s_j, and ⊕ denotes the concatenation of vectors.
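The sketch below shows one plausible way to obtain h_A, h_{V_i} and h_{E_ij} from the encoder output; the use of mean pooling over each sentence's tokens and the sentence_spans helper are assumptions made for illustration and are not fixed by the present disclosure.

```python
import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("roberta-base")

def sentence_features(encoding, sentence_spans):
    """encoding: tokenizer output; sentence_spans: [(start, end)] token spans of each context sentence."""
    hidden = encoder(**encoding).last_hidden_state[0]            # (seq_len, hidden_size)
    h_a = hidden[0]                                              # h_A: the [CLS] (global) vector
    h_v = [hidden[s:e].mean(dim=0) for s, e in sentence_spans]   # one vector per sentence s_i
    # Edge features: concatenation of the two sentence vectors, h_Eij = h_Vi (+) h_Vj.
    h_e = {(i, j): torch.cat([h_v[i], h_v[j]])
           for i in range(len(h_v)) for j in range(len(h_v)) if i != j}
    return h_a, h_v, h_e
```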
Step 203: input the question feature vector and the associated-context feature vector into a probabilistic graph neural network model, and obtain, through inference by the probabilistic graph neural network model, the answer and the argument graph from which the answer is derived.
The answer may be an answer that expresses the meaning of right or wrong, yes or no.
The argument graph may include a plurality of nodes. A node may be a fact, a rule (both expressed in natural language), or an NAF node. NAF stands for Negation As Failure: under the Closed World Assumption (CWA), for a statement S, if S cannot be inferred to be correct from the existing facts and rules, that is, S is false, then not-S can be concluded to be correct. It should be noted that, under the Closed World Assumption, there are no facts in negated form and no rules that derive negated conclusions, because negated facts and rules are redundant under the CWA.
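As a non-limiting illustration of negation as failure under the closed world assumption, the toy sketch below reduces the provability check to membership in a set of derivable statements; this simplification is an assumption made only for illustration.

```python
def naf_holds(statement: str, derivable_statements: set) -> bool:
    """Under the closed-world assumption, 'not statement' holds exactly when
    the statement cannot be derived from the known facts and rules."""
    return statement not in derivable_statements

derivable = {"the circuit includes a battery", "the wire is metal"}
# "the circuit has a switch" cannot be derived, so NAF lets us conclude its negation.
assert naf_holds("the circuit has a switch", derivable)
```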
Please refer to Fig. 3, which shows the steps of establishing the probabilistic graph neural network model used in the embodiments shown in Fig. 2. As shown in Fig. 3, establishing the probabilistic graph neural network model includes the following steps:
Step 301: define, through a probabilistic graphical model, the joint distribution over the argument graphs and answers of all possible answers to the question sentence, so as to explicitly establish the dependence between the argument graph and the answer.
The argument graph includes nodes, which represent sentences of the associated context, and directed edges between the nodes; the joint distribution involves an answer variable, node variables and directed-edge variables.
In the argument graph, each node may correspond to one sentence in the associated context. The possible answers, the nodes of the possible argument graphs, and the directed edges between nodes of the possible argument graphs may be taken as the variables of the joint distribution.
Step 302: explicitly establish the dependencies between the different variables in the joint distribution by using a designed answer potential function, node potential functions and edge potential functions.
The node potential functions are related to the nodes and the answer.
The edge potential functions are related to the nodes, the answer and the directed edges between nodes.
A probabilistic graphical model uses graph theory to represent the relationships among a number of random variables. The graph obtained from a probabilistic graphical model may include a plurality of nodes, each node being a random variable; if two nodes are not connected by an edge, the two variables may be regarded as conditionally independent of each other. Two common kinds of probabilistic graphical models are graphs with directed edges and graphs with undirected edges. According to the directionality of the graph, probabilistic graphical models can be divided into two categories: Bayesian networks and Markov networks. The present disclosure may adopt an undirected probabilistic graphical model.
Specifically, given an associated context C = s_1, s_2, ..., s_n and a question sentence Q, true/false values are assigned to all variables, namely the answer variable A, the node variables V_i and the edge variables E_ij.
All output variables are represented by the following expression (4):
Y = {A} ∪ {V_i | 1 ≤ i ≤ n} ∪ {E_ij | i ≠ j}    (4)
A joint distribution can be defined over all possible Y, denoted p(Y), where:
p(Y) ∝ Φ_A(A) · ∏_i Φ_{V_i}(V_i, A) · ∏_{i≠j} Φ_{E_ij}(V_i, V_j, E_ij, A)    (5)
where Φ_A, Φ_{V_i} and Φ_{E_ij} are, respectively, the answer potential function corresponding to the answer variable A, the node potential function corresponding to the node variable V_i, and the edge potential function corresponding to the edge variable E_ij.
In a Markov network, a series of functions are defined to evaluate how closely the variables influence one another; these functions are called potential functions or factors.
The factorization in the above formula captures the interdependence among the answer variable A, the node variables V_i and the edge variables E_ij.
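As a non-limiting illustration of what this factorization means computationally, the unnormalized log-score of one assignment (an answer together with a candidate argument graph) can be accumulated factor by factor; the tensor shapes follow the dimensions given later in this description (2, 4 and 16 possible value combinations) and are otherwise assumptions.

```python
def joint_log_score(phi_a, phi_v, phi_e, a, v, e):
    """phi_a: (2,) log-scores for A; phi_v[i]: (2, 2) log-scores for (V_i, A);
    phi_e[(i, j)]: (2, 2, 2, 2) log-scores for (V_i, V_j, E_ij, A).
    a, v, e hold the 0/1 assignment being scored."""
    score = phi_a[a]
    for i, phi in enumerate(phi_v):
        score = score + phi[v[i], a]
    for (i, j), phi in phi_e.items():
        score = score + phi[v[i], v[j], e[(i, j)], a]
    return score  # log of the unnormalized joint probability
```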
Fig. 4a shows a schematic diagram of the relationship among the answer variable A, the node variables V_i and the edge variables E_ij. The associated context in Fig. 4a includes the sentences s_1, s_2 and s_3, so the context contains three nodes: s_1, s_2 and s_3. The answer corresponding to the question sentence is "True". The argument graph (proof) includes the node s_1, the node s_3 and the directed edge from node s_1 to node s_3.
The filled circles of the nodes in the right-hand part of the figure indicate that, when the answer variable A takes the value 1, the node V_1 has the value 1, the node V_2 has the value 0 and the node V_3 has the value 1. The directed edge E_13 (from node V_1 to node V_3) has the value 1, and the other directed edges E_12, E_23, E_32 and E_21 have the value 0.
The potential functions corresponding to Fig. 4a include the answer potential function Φ_A, the node potential functions Φ_{V_1}, Φ_{V_2} and Φ_{V_3}, and the edge potential functions Φ_{E_ij} of the directed edges.
Fig. 4b shows the factor graph of the joint distribution p(Y) for the example in Fig. 4a. As shown in Fig. 4b, the factors include the nodes V_1, V_2 and V_3, the edges E_12, E_13, E_21, E_23, E_31 and E_32, the answer V_A, the node potential functions Φ_{V_i} relating each node to the answer, the answer potential function Φ_A, the edge potential functions Φ_{E_ij} corresponding to the edges, and the relationships among these factors.
Theoretically, for the ground-truth assignment y*, the following objective is minimized:
L_joint = -log p(Y = y*)    (6)
Step 303: parameterize each potential function with a neural network to obtain the parameterized joint distribution.
Potential function Φ_A(a): in order to score the possible values (0 or 1) of the answer variable A, a multilayer perceptron (MLP) is used as a nonlinear transformation function and applied to the global feature vector of the associated context C, giving the answer potential function of the answer variable A:
Φ_A = MLP_1(h_A) ∈ R^2    (7)
Potential function Φ_{V_i}(v_i, a): for each sentence s_i (a fact or a rule), the feature vector h_{V_i} of the sentence s_i is obtained as described above. In order to score the possible values of the variable pair (V_i, A), another multilayer perceptron MLP_2 is used as the nonlinear transformation function and applied to the feature vector h_{V_i}, giving the node potential function of the node variable:
Φ_{V_i} = MLP_2(h_{V_i}) ∈ R^4    (8)
where the dimension 4 is the number of possible value combinations of the node variable V_i and the answer variable A. The parameters of MLP_2 can be shared across all sentences.
Potential function Φ_{E_ij}(v_i, v_j, e_ij, a): for each sentence pair (s_i, s_j), the sentence-pair representation h_{E_ij} is obtained. In order to score the four variables (V_i, V_j, E_ij, A), a further multilayer perceptron MLP_3 is used as the nonlinear transformation function and applied to the representation of the directed edge E_ij, giving the edge potential function of the directed-edge variable:
Φ_{E_ij} = MLP_3(h_{V_i} ⊕ h_{V_j}) ∈ R^16    (9)
where ⊕ denotes the concatenation of vectors, and the dimension 16 is the number of possible value combinations of the four variables (V_i, V_j, E_ij, A). The parameters of MLP_3 can be shared across all sentence pairs.
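A minimal sketch of the three potential heads described above is given below; the hidden size and the two-layer shape of each multilayer perceptron are assumptions, and only the output dimensions (2, 4 and 16) follow the text.

```python
import torch
from torch import nn

class PotentialHeads(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # MLP_1 scores the 2 possible values of the answer variable A.
        self.mlp_answer = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU(),
                                        nn.Linear(hidden_size, 2))
        # MLP_2 scores the 4 combinations of (V_i, A); parameters shared across all sentences.
        self.mlp_node = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU(),
                                      nn.Linear(hidden_size, 4))
        # MLP_3 scores the 16 combinations of (V_i, V_j, E_ij, A); shared across sentence pairs.
        self.mlp_edge = nn.Sequential(nn.Linear(2 * hidden_size, hidden_size), nn.ReLU(),
                                      nn.Linear(hidden_size, 16))

    def forward(self, h_a, h_v, h_e):
        phi_a = self.mlp_answer(h_a)                         # (2,)
        phi_v = [self.mlp_node(h).view(2, 2) for h in h_v]   # one (V_i, A) table per sentence
        phi_e = {key: self.mlp_edge(h).view(2, 2, 2, 2) for key, h in h_e.items()}
        return phi_a, phi_v, phi_e
```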
Step 304: determine the pseudo-likelihood function of the parameterized joint distribution, and determine a variational approximation used to approximately characterize the pseudo-likelihood function, so as to obtain a probabilistic graph neural network model that can be solved by a computer.
In order to simplify the computation, a pseudo-likelihood function can be used to approximately characterize the parameterized joint distribution (also called the joint probability distribution):
p(Y) ≈ ∏_{y ∈ Y} p(y | Y \ {y})    (10)
In order to reduce the difficulty of determining the optimal assignment with the pseudo-likelihood function and to allow the joint distribution to be solved by a computer, a variational approximation used to approximately characterize the pseudo-likelihood can be determined, thereby obtaining a computer-solvable probabilistic graph neural network model.
Variational approximation: based on the mean-field assumption, a variational distribution (variational approximation) q(Y) is used to approximate the pseudo-likelihood of Y, in which the variables y ∈ Y are mutually independent. Likewise, each independent factor can be parameterized with a neural network. The variational approximation is expressed by formulas (11) and (12):
q(Y) = ∏_{y ∈ Y} q(y), where each factor q(y) is parameterized with a neural network    (11)-(12)
Once the variational distribution (variational approximation) q(Y) is obtained, it can provide the conditioning values p(y | Y \ {y}) for the pseudo-likelihood, thereby avoiding the sampling that would otherwise be needed to determine the optimal assignment of the pseudo-likelihood function.
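The sketch below illustrates one way the mean-field factors could be produced from the feature vectors of the encoder; treating each factor as a softmax over a small linear head is an assumption and is not a detail fixed by the present disclosure.

```python
import torch
from torch import nn

class MeanFieldHeads(nn.Module):
    """Independent (mean-field) distributions q(A), q(V_i), q(E_ij)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.q_answer = nn.Linear(hidden_size, 2)
        self.q_node = nn.Linear(hidden_size, 2)
        self.q_edge = nn.Linear(2 * hidden_size, 2)

    def forward(self, h_a, h_v, h_e):
        q_a = torch.softmax(self.q_answer(h_a), dim=-1)
        q_v = [torch.softmax(self.q_node(h), dim=-1) for h in h_v]
        q_e = {key: torch.softmax(self.q_edge(h), dim=-1) for key, h in h_e.items()}
        return q_a, q_v, q_e  # each is a distribution over the values {0, 1}
```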
Through the above steps, the probabilistic graph neural network model is established. The probabilistic graph neural network model can then be trained to obtain a trained probabilistic graph neural network model, and the trained model can be used to determine, from an input associated context and question sentence, the answer corresponding to the question sentence and the argument graph.
In some optional implementations, the probabilistic graph neural network model is trained through the following steps:
First, a training sample set is acquired. The training sample set includes a plurality of groups of training samples, and each group of training samples includes a sample associated context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample argument graph through which the sample answer is derived from the sample associated context. The sample argument graph is a directed acyclic graph that includes nodes and directed edges between the nodes, where each node is a sentence in the context and a directed edge between two nodes represents an inference relationship between the two associated nodes; the sample associated context is an associated context that contains the answer corresponding to the sample question sentence.
Second, the sample associated context and the sample question sentence are taken as input, the sample answer and the sample argument graph are taken as output, and the probabilistic graph neural network model is trained to obtain the trained probabilistic graph neural network model.
In some application scenarios, the trained probabilistic graph neural network model may be obtained after the probabilistic graph neural network model has been trained a preset number of times with the training samples, for example 1000 times or 5000 times, which is not limited here.
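As a non-limiting illustration, one way of holding such a group of training samples in code is sketched below; the field names are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainingSample:
    context_sentences: List[str]        # sample associated context (facts and rules)
    question: str                       # sample question sentence
    answer: bool                        # sample answer (true/false)
    proof_nodes: List[int]              # indices of context sentences in the sample argument graph
    proof_edges: List[Tuple[int, int]]  # directed edges (i, j) of the sample argument graph
```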
In some other optional implementations, taking the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, training the initial model to obtain the trained neural network model includes the following:
First, a loss function is established based on the following steps: based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, a joint distribution and an approximate variational distribution over the sample answer, the node feature vectors and the edge feature vectors are established; and the loss function is determined according to the joint distribution and the approximate variational distribution.
Second, the sample associated context and the sample question sentence are taken as input, the sample answer and the sample argument graph are taken as output, and the probabilistic graph neural network model is trained with a back-propagation algorithm based on the preset loss function until a preset condition is met.
For training the probabilistic graph neural network model with a back-propagation algorithm based on the preset loss function, reference may be made to existing methods of training neural network models with back-propagation, which will not be described in detail here.
In these optional implementations, the loss function may be established first, and the probabilistic graph neural network model is then trained with the loss function. The established loss function is related to the node feature vectors of the sample nodes, the edge feature vectors corresponding to the directed edges, and the joint distribution and approximate variational distribution established over the sample answer, the node feature vectors and the edge feature vectors. It therefore matches the probabilistic graph neural network model of the present disclosure closely, and during training it helps to optimize the model quickly.
Further optionally, establishing the joint distribution over the sample answer, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges includes: determining a first potential function for the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node of the sample argument graph, establishing, according to the feature vector of that sample node, a second potential function for that sample node and the sample answer; and, for each directed edge of the sample argument graph, establishing, according to the feature vector of that directed edge, a third potential function for that directed edge, the sample answer and the two associated sample nodes, where the feature vector of the directed edge is related to the feature vectors of the two associated nodes.
For the training samples, the first potential function for the sample answer, the second potential function for a sample node and the sample answer, and the third potential function for a sample directed edge, the sample answer and the two associated sample nodes can be established in the same way as in formulas (7), (8) and (9), respectively, which will not be repeated here.
Establishing the approximate variational distribution over the sample answer, the node feature vectors and the edge feature vectors based on the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges includes: determining the pseudo-likelihood function corresponding to the joint distribution; and, based on the mean-field assumption, using a variational distribution to approximate the pseudo-likelihood function, where the variables in the variational distribution are mutually independent, the variables including the sample answer, the nodes of the sample argument graph and the directed edges of the sample argument graph.
In order to facilitate computing the value of the loss function, the joint distribution over the sample answer, the nodes of the sample argument graph and the directed edges of the sample argument graph may be approximated by a pseudo-likelihood function (see the description of formula (10), not repeated here). Specifically, determining the loss function according to the joint distribution and the approximate variational distribution includes: determining a first loss function and a second loss function according to the variational distribution, and determining a third loss function according to the pseudo-likelihood function. The first loss function characterizes the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function characterizes the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; and the third loss function characterizes the deviation between the predicted answer of the probabilistic graph neural network model and the sample answer, where the nodes and directed edges that enter the third loss function determined through the pseudo-likelihood function are the predictions of the variational distribution.
For the approximate variational distribution over the sample answer, the node feature vectors and the edge feature vectors, reference may be made to formulas (11) and (12), which will not be repeated here.
Specifically, the first loss function is characterized by the following formula:
L_V = -Σ_i log q(V_i = v*_i)    (13)
The second loss function is characterized by the following formula:
L_E = -Σ_{i,j} log q(E_ij = e*_ij)    (14)
The third loss function can be characterized by the following formula:
L_A = -log p(A = a* | V = V̂, E = Ê)    (15)
where V̂ and Ê are the node and edge predictions of the variational approximation, p(·|·) here is the conditional given by the pseudo-likelihood function of the joint distribution, v*_i and e*_ij are the nodes and directed edges annotated in the training sample, and a* is the answer annotated in the training sample.
In some application scenarios, the sum of the first loss function, the second loss function and the third loss function may be taken as the optimization objective.
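As a non-limiting illustration, the sketch below combines the three loss terms into a single training objective; it assumes the mean-field heads and training-sample structure sketched earlier and 0/1 value encodings, all of which are illustrative assumptions.

```python
import torch

def total_loss(q_a, q_v, q_e, p_answer_given_pred, sample):
    """q_a, q_v[i], q_e[(i, j)]: mean-field distributions over {0, 1};
    p_answer_given_pred: pseudo-likelihood of the answer conditioned on the predicted graph;
    sample: a TrainingSample with annotated proof nodes, edges and answer."""
    node_labels = [1 if i in sample.proof_nodes else 0 for i in range(len(q_v))]
    loss_nodes = -sum(torch.log(q_v[i][label]) for i, label in enumerate(node_labels))
    loss_edges = -sum(torch.log(q_e[key][1 if key in sample.proof_edges else 0]) for key in q_e)
    loss_answer = -torch.log(p_answer_given_pred[int(sample.answer)])
    return loss_nodes + loss_edges + loss_answer
```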
Optionally, the preset condition includes: the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition.
The convergence condition may be that, over a plurality of consecutive training iterations, the change in the sum of the first loss function, the second loss function and the third loss function between every two adjacent iterations is smaller than a preset change threshold, or that the sum of the first loss function, the second loss function and the third loss function is minimized.
Through the above process, the computation of the value of the loss function can be simplified, which helps to improve the training efficiency of the probabilistic graph neural network model.
Optionally, the preset condition includes that the number of training iterations reaches a preset number threshold.
The probabilistic graph neural network model described above can achieve a high prediction accuracy.
After the probabilistic graph neural network model was trained on a large training data set (for example, a data set containing 70,000 training examples), the test results of the model were: an answer accuracy of 99.99% and an argument-graph accuracy of 88.8%.
The probabilistic graph neural network model was also trained on small training data sets (for example, 30,000, 10,000 and 1,000 groups of training data randomly sampled from the above large training sample set), and the trained models were tested. The test results are as follows: with 30,000 groups of training samples, the answer accuracy is 99.9% and the argument-graph accuracy is 86.8%; with 10,000 groups, the answer accuracy is 99.9% and the argument-graph accuracy is 72.4%; with 1,000 groups, the answer accuracy is 82.1% and the argument-graph accuracy is 21.1%. That is, the probabilistic graph neural network model can still reach a relatively high accuracy when trained with a small number of samples; in other words, a probabilistic graph neural network model with accurate prediction results can be obtained with a relatively small amount of training data.
Evaluating the model by training it on training samples of non-target categories and then using the trained model to reason about questions of a target category according to that category's associated context is referred to as zero-shot evaluation.
After the probabilistic graph neural network model was trained on a large data set that did not include the target category, the results of reasoning on test samples of the target category were: an answer accuracy of 96.3% and an argument-graph accuracy of 79.3%.
It can be seen from the above data that the trained probabilistic graph neural network model is highly transferable: when the trained model is applied to natural language reasoning settings of other categories, good reasoning results can still be obtained.
Please refer to Fig. 5, which shows a schematic diagram of an application scenario of the information processing method based on natural language reasoning provided by the present disclosure.
As shown in Fig. 5, the input information in Fig. 5 includes an associated context and question sentences. The associated context may include the fact sentences F1 and F2 and the judgment-rule sentences R1, R2, R3, R4, R5 and R6. The question sentences Q1 and Q2 may be input, and may be input separately. Through the steps described above, the answer A1 and argument graph 1 corresponding to the question sentence Q1, and the answer A2 and argument graph 2 corresponding to the question sentence Q2, can be obtained. Each sentence can be regarded as a node. The answer A1 corresponding to the question sentence Q1 is TRUE, and the argument graph for the answer A1 is a directed edge from node F2 to node R2. This argument graph describes the process of obtaining the answer A1 from node F2 and node R2: first, the fact sentence of node F2 determines that the wire is metal, and then, according to the judgment rule provided by the rule sentence R4, the answer to the question is determined to be "true".
For the question sentence Q2, "no current flows through the circuit", the corresponding answer A2 is FALSE, and argument graph 2 shows the process of obtaining the answer A2. An NAF node (Negation As Failure) indicates that, under the Closed World Assumption, for a statement S, if it cannot be inferred from the existing facts and rules that the statement S is correct, that is, the statement S is false, then the negation of the statement S can be concluded to be correct. In Fig. 5, the NAF node represents "the circuit does not have the switch". The answer A2 can be derived from the nodes and directed edges in argument graph 2, that is, from the directed edge from the NAF node to node R1 and the directed edge from node F1 to node R1; the directed edge from node R1 to node R6; the directed edge from the NAF node to node R3 and the directed edge from node R3 to node R6; and the directed edge from node F2 to node R4 and the directed edge from node R4 to node R6. Argument graph 2 thus gives the argument from which the answer A2 is obtained.
The information processing method based on natural language reasoning provided by this embodiment highlights the steps of obtaining the answer matching the question and the argument graph by using the language model and the probabilistic graph neural network model. Because the probabilistic graph neural network model is derived from the joint distribution over the answer, the nodes and the directed edges, the obtained answer and argument graph are closely correlated, and the argument graph provides strong support for the answer. The argument graph given by the probabilistic graphical model neural network can assist in predicting the answer and improves the ability to answer questions. In addition, since the probabilistic graph neural network model is derived from the joint distribution over the answer, the nodes and the directed edges, it can be trained with a small number of samples while still yielding highly accurate results.
进一步参考图6,作为对上述各图所示方法的实现,本公开提供了一种基于自然语言推理的信息处理装置的一些实施例,该装置实施例 与图1所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。Further referring to FIG. 6 , as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an information processing device based on natural language reasoning, which corresponds to the method embodiment shown in FIG. 1 , the device can be specifically applied to various electronic devices.
如图6所示,本实施例的基于自然语言推理的信息处理装置包括:接收单元601、确定单元602。其中,接收单元601,用于接收问题语句以及关联上下文,其中所述问题语句用于表征待给定答案的问题;确定单元602,用于基于所述问题语句的问题特征信息以及所述关联上下文的上下文特征信息,确定所述问题的答案以及得出答案的论证图;所述论证图表征由所述关联上下文推理得到所述答案的过程As shown in FIG. 6 , the information processing device based on natural language reasoning in this embodiment includes: a receiving unit 601 and a determining unit 602 . Wherein, the receiving unit 601 is used to receive the question statement and the associated context, wherein the question statement is used to characterize the question to be answered; the determining unit 602 is used to receive the question characteristic information based on the question statement and the associated context The contextual feature information of the question, determine the answer to the question and the argument graph for deriving the answer; the argument graph represents the process of deriving the answer from the associated context reasoning
在本实施例中,基于自然语言推理的信息处理装置的生成单元601、接收单元601、确定单元602的具体处理及其所带来的技术效果可分别参考图1对应实施例中步骤101、步骤102的相关说明,在此不再赘述。In this embodiment, the specific processing of the generating unit 601, the receiving unit 601, and the determining unit 602 of the information processing device based on natural language inference and the technical effects brought about by them can refer to step 101 and step 101 in the corresponding embodiment in FIG. Relevant descriptions of 102 will not be repeated here.
In some optional implementations, the argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between the nodes, the nodes are sentences in the associated context, and a directed edge between nodes represents an inference relationship between the two associated nodes.
In some optional implementations, the determining unit 602 is further configured to: input the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated-context feature vector; and input the question feature vector and the associated-context feature vector into a probabilistic graph neural network model, which infers the answer and the argument graph from which the answer is derived.
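As a rough sketch of the two-stage data flow just described (the function and object names below, such as language_model and pgnn, are hypothetical placeholders rather than components named in the disclosure):

    # Hypothetical data-flow sketch of the determining unit 602.
    def determine_answer(question, context_sentences, language_model, pgnn):
        # Stage 1: the pre-trained language model encodes the question and the
        # associated context into a question feature vector and context feature vectors.
        q_vec, ctx_vecs = language_model.encode(question, context_sentences)
        # Stage 2: the probabilistic graph neural network model infers the answer
        # together with the argument graph (nodes and directed edges).
        answer, nodes, edges = pgnn.infer(q_vec, ctx_vecs)
        return answer, (nodes, edges)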
In some optional implementations, the probabilistic graph neural network model is obtained by the following steps: defining, through a probabilistic graphical model, the joint distribution of the argument graphs and answers over all possible answers to the question sentence, so as to explicitly establish the dependency between the argument graph and the answer, where the argument graph includes nodes representing sentences of the associated context and directed edges between the nodes, and the joint distribution involves answer variables, node variables and directed-edge variables; explicitly establishing the dependencies among the different variables in the joint distribution with designed answer, node and edge potential functions, where the node potential function relates to a node and the answer, and the edge potential function relates to nodes, the answer and a directed edge between nodes; parameterizing each potential function with a neural network to obtain the parameterized joint distribution; determining the pseudo-likelihood function of the parameterized joint distribution; and determining a variational approximation that approximately characterizes the pseudo-likelihood function, so as to obtain a computer-solvable probabilistic graph neural network model.
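Read as a formula (an editorial gloss with assumed notation, where a is the answer variable, v_i the node variables, e_ij the directed-edge variables, and q, c the question and associated context), the construction above amounts to a joint distribution factored over potential functions:

    p(a, V, E \mid q, c) \;\propto\; \phi_{\mathrm{ans}}(a \mid q, c)\,
        \prod_{i} \phi_{\mathrm{node}}(v_i, a)\,
        \prod_{(i,j)} \phi_{\mathrm{edge}}(e_{ij}, v_i, v_j, a)

Each potential function is parameterized by a neural network; the pseudo-likelihood replaces this joint with a product of conditionals of each variable given the remaining ones, and the variational approximation then makes that product tractable to optimize.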
In some optional implementations, the information processing apparatus based on natural language reasoning further includes a training unit (not shown in the figure). The training unit is configured to train the probabilistic graph neural network model based on the following steps to obtain a trained probabilistic graph neural network model: acquiring a training sample set, the training sample set including multiple groups of training samples, each group of training samples including a sample associated context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample argument graph from which the sample answer is derived on the basis of the sample associated context, where the sample argument graph is a directed acyclic graph, the directed acyclic graph includes nodes and directed edges between the nodes, the nodes are sentences in the context, a directed edge between nodes represents an inference relationship between the two associated nodes, and the sample associated context is an associated context that contains the answer corresponding to the sample question sentence; and training the probabilistic graph neural network model with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, to obtain the trained probabilistic graph neural network model.
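For orientation only, one training sample of the set described above could be represented as follows (the field names and types are hypothetical illustration, not the format fixed by the disclosure):

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class TrainingSample:
        context_sentences: List[str]          # sample associated context (contains the answer)
        question: str                         # sample question sentence
        answer: bool                          # sample answer, e.g. TRUE/FALSE
        graph_nodes: List[int]                # indices of context sentences used as nodes
        graph_edges: List[Tuple[int, int]]    # directed edges of the sample argument graph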
In some optional implementations, the training unit is further configured to establish the loss function based on the following steps: establishing, on the basis of the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the joint distribution and an approximate variational distribution among the sample answer, the node feature vectors and the edge feature vectors; determining the loss function according to the joint distribution and the approximate variational distribution; and, with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, training the probabilistic graph neural network model with a back-propagation algorithm based on the preset loss function until a preset condition is met.
In some optional implementations, the training unit is further configured to: determine a first potential function for the sample answer according to the global feature representation of the sample question and the sample associated context; for each sample node of the sample argument graph, establish a second potential function for that sample node and the sample answer according to the feature vector of the sample node; for each directed edge of the sample argument graph, establish a third potential function for the directed edge, the sample answer and the two associated sample nodes according to the feature vector of the directed edge, the feature vector of the directed edge being related to the feature vectors of the two nodes it connects; and, based on the first potential function, the second potential functions corresponding to the sample nodes and the third potential functions corresponding to the sample directed edges, parameterize the joint distribution among the sample answer, the sample argument graph nodes and the sample argument graph directed edges.
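Purely as an illustrative sketch of how the three potential functions might be parameterized with neural networks (the use of PyTorch, the module shapes and the binary, one-hot encoded answer are assumptions of this sketch, not details fixed by the disclosure):

    import torch
    import torch.nn as nn

    class PotentialFunctions(nn.Module):
        # Sketch: small networks score the answer, each node, and each directed edge,
        # mirroring the first, second and third potential functions described above.
        def __init__(self, dim):
            super().__init__()
            self.answer_pot = nn.Linear(dim, 2)                      # first: answer given global features
            self.node_pot = nn.Bilinear(dim, 2, 1)                   # second: node feature vs. answer
            self.edge_pot = nn.Sequential(                           # third: edge endpoints vs. answer
                nn.Linear(2 * dim + 2, dim), nn.ReLU(), nn.Linear(dim, 1))

        def forward(self, global_feat, node_feats, edge_index, answer_onehot):
            # answer_onehot is a length-2 one-hot float vector for the candidate answer.
            ans_score = self.answer_pot(global_feat)                 # score of each answer value
            a = answer_onehot.expand(node_feats.size(0), -1)
            node_scores = self.node_pot(node_feats, a)               # one score per node
            src, dst = edge_index                                    # edge feature from its two endpoints
            edge_feats = torch.cat([node_feats[src], node_feats[dst],
                                    answer_onehot.expand(src.size(0), -1)], dim=-1)
            edge_scores = self.edge_pot(edge_feats)                  # one score per directed edge
            return ans_score, node_scores, edge_scores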
In some optional implementations, the training unit is further configured to: determine the pseudo-likelihood function corresponding to the joint distribution; and approximate the pseudo-likelihood function with a variational distribution based on the mean-field assumption, where the variables in the variational distribution are mutually independent, the variables including the sample answer, the nodes in the sample argument graph and the directed edges in the sample argument graph.
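In the same assumed notation as above (again an editorial gloss rather than a formula reproduced from the disclosure), the mean-field assumption means that the variational distribution factorizes over the answer, the nodes and the directed edges:

    q(a, V, E) \;=\; q(a)\,\prod_{i} q(v_i)\,\prod_{(i,j)} q(e_{ij})

so that each factor can be fitted to the corresponding conditional of the pseudo-likelihood independently of the others.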
In some optional implementations, the training unit is further configured to: determine a first loss function and a second loss function according to the approximate variational distribution, and determine a third loss function according to the pseudo-likelihood function. The first loss function characterizes the deviation between the predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph; the second loss function characterizes the deviation between the predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; and the third loss function characterizes the deviation between the predicted answer given by the probabilistic graph neural network model and the sample answer, where the nodes and directed edges appearing in the third loss function determined through the pseudo-likelihood function are prediction results of the variational distribution.
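A minimal sketch of how the three loss terms could be combined during training (the cross-entropy form and the equal weighting of the terms are assumptions of this sketch):

    import torch.nn.functional as F

    def total_loss(node_logits, gold_nodes, edge_logits, gold_edges, answer_logits, gold_answer):
        # First loss: deviation between predicted nodes and the sample nodes.
        loss_nodes = F.binary_cross_entropy_with_logits(node_logits, gold_nodes.float())
        # Second loss: deviation between predicted directed edges and the sample edges.
        loss_edges = F.binary_cross_entropy_with_logits(edge_logits, gold_edges.float())
        # Third loss: deviation between the predicted answer and the sample answer,
        # computed with nodes/edges predicted by the variational distribution.
        loss_answer = F.cross_entropy(answer_logits.unsqueeze(0), gold_answer.view(1))
        return loss_nodes + loss_edges + loss_answer

Back-propagation would then be run on this sum until a stopping condition such as the one described below is met.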
In some optional implementations, the preset condition includes: the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition; or the number of training iterations reaches a preset threshold.
In some optional implementations, the determining unit 602 is further configured to: retrieve at least one sentence from the associated context according to the question sentence by using a preset retrieval method; encode the retrieved at least one sentence by using a preset encoding method; and input the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
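As a minimal illustration of the retrieve-then-encode step (the word-overlap scoring and the top_k parameter below are placeholder assumptions; the disclosure only requires some preset retrieval and encoding method):

    def retrieve(question, context_sentences, top_k=5):
        # Toy lexical-overlap retrieval: keep the sentences that share the most words
        # with the question sentence.
        q_words = set(question.lower().split())
        scored = sorted(context_sentences,
                        key=lambda s: len(q_words & set(s.lower().split())),
                        reverse=True)
        return scored[:top_k]

The retrieved sentences and the question sentence are then encoded (for example, tokenized) and fed to the pre-trained language model to obtain the question feature vector and the associated-context feature vector.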
Referring to FIG. 7, FIG. 7 shows an exemplary system architecture to which the information processing method based on natural language reasoning of an embodiment of the present disclosure can be applied.
As shown in FIG. 7, the system architecture may include terminal devices 701, 702 and 703, a network 704, and a server 705. The network 704 serves as a medium for providing communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.

The terminal devices 701, 702 and 703 may interact with the server 705 through the network 704 to receive or send messages and the like. Various client applications, such as web browser applications, search applications and news applications, may be installed on the terminal devices 701, 702 and 703. The client applications in the terminal devices 701, 702 and 703 can receive user instructions and perform corresponding functions according to the user instructions, for example, adding corresponding information to existing information according to a user instruction.

The terminal devices 701, 702 and 703 may be hardware or software. When the terminal devices 701, 702 and 703 are hardware, they may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers and the like. When the terminal devices 701, 702 and 703 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.

The server 705 may provide various services, for example, receiving question sentences and associated contexts sent by the terminal devices 701, 702 and 703, analyzing and processing the question sentences and associated contexts, and sending the analysis and processing results to the terminal devices.
It should be noted that the information processing method provided by the embodiments of the present disclosure may be executed by a terminal device, and accordingly, the information processing apparatus may be provided in the terminal devices 701, 702 and 703. In addition, the information processing method provided by the embodiments of the present disclosure may also be executed by the server 705, and accordingly, the information processing apparatus may be provided in the server 705.
It should be understood that the numbers of terminal devices, networks and servers in FIG. 7 are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation needs.
Referring now to FIG. 8, it shows a schematic structural diagram of an electronic device (for example, the terminal device or the server in FIG. 7) suitable for implementing the embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 8, the electronic device may include a processing apparatus (for example, a central processing unit, a graphics processing unit, etc.) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage apparatus 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing apparatus 801, the ROM 802 and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Generally, the following apparatuses may be connected to the I/O interface 805: an input apparatus 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope and the like; an output apparatus 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator and the like; a storage apparatus 808 including, for example, a magnetic tape, a hard disk and the like; and a communication apparatus 809. The communication apparatus 809 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows an electronic device having various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 809, or installed from the storage apparatus 808, or installed from the ROM 802. When the computer program is executed by the processing apparatus 801, the above functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to an electrical wire, an optical cable, RF (radio frequency) and the like, or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet) and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.

The above computer-readable medium may be included in the above electronic device, or may exist independently without being incorporated into the electronic device.

The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a question sentence and an associated context, where the question sentence represents a question to be answered; and determine, based on question feature information of the question sentence and context feature information of the associated context, the answer to the question and an argument graph from which the answer is derived, where the argument graph represents the process of deriving the answer from the associated context by reasoning, the argument graph is a directed acyclic graph including nodes and directed edges between the nodes, the nodes are sentences in the associated context, and a directed edge between nodes represents an inference relationship between the two associated nodes.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of the systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs) and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved herein is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.

In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments individually or in any suitable sub-combination.

Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (14)

1. An information processing method based on natural language reasoning, comprising:
    receiving a question sentence and an associated context, wherein the question sentence represents a question to be answered; and
    determining, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, the argument graph representing a process of deriving the answer from the associated context by reasoning.
2. The method according to claim 1, wherein the argument graph is a directed acyclic graph, the directed acyclic graph comprises nodes and directed edges between the nodes, the nodes are sentences in the associated context, and a directed edge between the nodes represents an inference relationship between the two associated nodes.
3. The method according to claim 1, wherein determining, based on the question feature information of the question sentence and the context feature information of the associated context, the answer to the question and the argument graph from which the answer is derived comprises:
    inputting the question sentence and the associated context into a pre-trained language model to obtain a question feature vector and an associated-context feature vector; and
    inputting the question feature vector and the associated-context feature vector into a probabilistic graph neural network model, the probabilistic graph neural network model inferring the answer and the argument graph from which the answer is derived.
4. The method according to claim 3, wherein the argument graph comprises nodes and directed edges between the nodes, and the probabilistic graph neural network model is obtained based on the following steps:
    defining, through a probabilistic graphical model, a joint distribution of the argument graphs and the answers over all possible answers to the question sentence, so as to explicitly establish the dependency between the argument graph and the answer, the argument graph comprising nodes representing sentences of the associated context and directed edges between the nodes, and the joint distribution comprising answer variables, node variables and directed-edge variables;
    explicitly establishing dependencies among different variables in the joint distribution by using a designed answer potential function, node potential function and edge potential function, wherein the node potential function relates to a node and the answer, and the edge potential function relates to nodes, the answer and a directed edge between nodes;
    parameterizing each potential function with a neural network to obtain a parameterized joint distribution; and
    determining a pseudo-likelihood function of the parameterized joint distribution, and determining a variational approximation that approximately characterizes the pseudo-likelihood function, so as to obtain a computer-solvable probabilistic graph neural network model.
5. The method according to claim 3, wherein the probabilistic graph neural network model is trained through the following steps:
    acquiring a training sample set, the training sample set comprising multiple groups of training samples, each group of training samples comprising a sample associated context, a sample question sentence, a sample answer corresponding to the sample question sentence, and a sample argument graph from which the sample answer is derived on the basis of the sample associated context; and
    training the probabilistic graph neural network model with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, to obtain a trained probabilistic graph neural network model.
6. The method according to claim 5, wherein the argument graph comprises nodes and directed edges between the nodes, and training the probabilistic graph neural network model with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output to obtain the trained probabilistic graph neural network model comprises:
    establishing a loss function based on the following steps: establishing, on the basis of node feature vectors of the sample nodes and edge feature vectors corresponding to the directed edges, a joint distribution and an approximate variational distribution among the sample answer, the node feature vectors and the edge feature vectors; and determining the loss function according to the joint distribution and the approximate variational distribution; and
    with the sample associated context and the sample question sentence as input and the sample answer and the sample argument graph as output, training the probabilistic graph neural network model with a back-propagation algorithm based on the loss function until a preset condition is met.
7. The method according to claim 6, wherein establishing, on the basis of the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the joint distribution among the sample answer, the node feature vectors and the edge feature vectors comprises:
    determining a first potential function for the sample answer according to a global feature representation of the sample question and the sample associated context;
    for each sample node of the sample argument graph, establishing a second potential function for the sample node and the sample answer according to a feature vector of the sample node;
    for each directed edge of the sample argument graph, establishing a third potential function for the directed edge, the sample answer and the two associated sample nodes according to a feature vector of the directed edge, the feature vector of the directed edge being related to the feature vectors respectively corresponding to the two nodes it connects; and
    parameterizing, based on the first potential function, the second potential functions respectively corresponding to the sample nodes and the third potential functions respectively corresponding to the sample directed edges, the joint distribution among the sample answer, the sample argument graph nodes and the sample argument graph directed edges.
8. The method according to claim 6, wherein establishing, on the basis of the node feature vectors of the sample nodes and the edge feature vectors corresponding to the directed edges, the approximate variational distribution among the sample answer, the node feature vectors and the edge feature vectors comprises:
    determining a pseudo-likelihood function corresponding to the joint distribution; and
    approximating the pseudo-likelihood function with a variational distribution based on a mean-field assumption, wherein variables in the variational distribution are mutually independent, the variables comprising the sample answer, the nodes in the sample argument graph and the directed edges in the sample argument graph.
9. The method according to claim 8, wherein determining the loss function according to the joint distribution and the approximate variational distribution comprises:
    determining a first loss function and a second loss function according to the variational distribution, and determining a third loss function according to the pseudo-likelihood function;
    wherein the first loss function characterizes a deviation between predicted nodes included in the argument graph predicted by the probabilistic graph neural network model and the sample nodes included in the sample argument graph;
    the second loss function characterizes a deviation between predicted directed edges included in the argument graph predicted by the probabilistic graph neural network model and the directed edges included in the sample argument graph; and
    the third loss function characterizes a deviation between a predicted answer given by the probabilistic graph neural network model and the sample answer, the nodes and directed edges in the third loss function determined through the pseudo-likelihood function being prediction results of the variational distribution.
10. The method according to claim 6, wherein the preset condition comprises:
    the sum of the first loss function, the second loss function and the third loss function satisfies a convergence condition; or
    the number of training iterations reaches a preset threshold.
11. The method according to claim 3, wherein, before inputting the question sentence and the associated context into the pre-trained language model to obtain the question feature vector and the associated-context feature vector, determining, based on the question feature information of the question sentence and the context feature information of the associated context, the answer to the question and the argument graph from which the answer is derived further comprises:
    retrieving at least one sentence from the associated context according to the question sentence by using a preset retrieval method; and
    encoding the retrieved at least one sentence by using a preset encoding method;
    and inputting the question sentence and the associated context into the pre-trained language model to obtain the question feature vector and the associated-context feature vector comprises:
    inputting the encoded question sentence and the encoded at least one sentence into the pre-trained language model to obtain the question feature vector and the associated-context feature vector.
12. An information processing apparatus based on natural language reasoning, comprising:
    a receiving unit, configured to receive a question sentence and an associated context, wherein the question sentence represents a question to be answered; and
    a determining unit, configured to determine, based on question feature information of the question sentence and context feature information of the associated context, an answer to the question and an argument graph from which the answer is derived, the argument graph representing a process of deriving the answer from the associated context by reasoning.
13. An electronic device, comprising:
    one or more processors; and
    a storage apparatus for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-11.
14. A computer-readable medium on which a computer program is stored, wherein, when the program is executed by a processor, the method according to any one of claims 1-11 is implemented.
PCT/CN2022/101739 2021-07-01 2022-06-28 Information processing method and apparatus based on natural language inference, and electronic device WO2023274187A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110744658.3 2021-07-01
CN202110744658.3A CN113505206B (en) 2021-07-01 2021-07-01 Information processing method and device based on natural language reasoning and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023274187A1 true WO2023274187A1 (en) 2023-01-05

Family

ID=78009578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/101739 WO2023274187A1 (en) 2021-07-01 2022-06-28 Information processing method and apparatus based on natural language inference, and electronic device

Country Status (2)

Country Link
CN (1) CN113505206B (en)
WO (1) WO2023274187A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505206B (en) * 2021-07-01 2023-04-18 北京有竹居网络技术有限公司 Information processing method and device based on natural language reasoning and electronic equipment
CN116226478B (en) * 2022-12-27 2024-03-19 北京百度网讯科技有限公司 Information processing method, model training method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040254903A1 (en) * 2003-06-10 2004-12-16 Heckerman David E. Systems and methods for tractable variational approximation for inference in decision-graph bayesian networks
CN106934012A (en) * 2017-03-10 2017-07-07 上海数眼科技发展有限公司 A kind of question answering in natural language method and system of knowledge based collection of illustrative plates
CN108763284A (en) * 2018-04-13 2018-11-06 华南理工大学 A kind of question answering system implementation method based on deep learning and topic model
CN109947912A (en) * 2019-01-25 2019-06-28 四川大学 A kind of model method based on paragraph internal reasoning and combined problem answer matches
CN111538819A (en) * 2020-03-27 2020-08-14 北京工商大学 Method for constructing question-answering system based on document set multi-hop inference
CN111814982A (en) * 2020-07-15 2020-10-23 四川大学 Multi-hop question-answer oriented dynamic reasoning network and method
CN112380835A (en) * 2020-10-10 2021-02-19 中国科学院信息工程研究所 Question answer extraction method fusing entity and sentence reasoning information and electronic device
US20210056445A1 (en) * 2019-08-22 2021-02-25 International Business Machines Corporation Conversation history within conversational machine reading comprehension
CN112597316A (en) * 2020-12-30 2021-04-02 厦门渊亭信息科技有限公司 Interpretable reasoning question-answering method and device
CN113505206A (en) * 2021-07-01 2021-10-15 北京有竹居网络技术有限公司 Information processing method and device based on natural language reasoning and electronic equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120077180A1 (en) * 2010-09-26 2012-03-29 Ajay Sohmshetty Method and system for knowledge representation and processing using a structured visual idea map
CN107632968B (en) * 2017-05-22 2021-01-05 南京大学 Method for constructing evidence chain relation model for referee document
CN108509411B (en) * 2017-10-10 2021-05-11 腾讯科技(深圳)有限公司 Semantic analysis method and device
CN109344240B (en) * 2018-09-21 2022-11-22 联想(北京)有限公司 Data processing method, server and electronic equipment
CN110309283B (en) * 2019-06-28 2023-03-21 创新先进技术有限公司 Answer determination method and device for intelligent question answering
US11461613B2 (en) * 2019-12-05 2022-10-04 Naver Corporation Method and apparatus for multi-document question answering
CN112132143B (en) * 2020-11-23 2021-02-23 北京易真学思教育科技有限公司 Data processing method, electronic device and computer readable medium
CN112860865A (en) * 2021-02-10 2021-05-28 达而观信息科技(上海)有限公司 Method, device, equipment and storage medium for realizing intelligent question answering


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272937A (en) * 2023-11-03 2023-12-22 腾讯科技(深圳)有限公司 Text coding model training method, device, equipment and storage medium
CN117272937B (en) * 2023-11-03 2024-02-23 腾讯科技(深圳)有限公司 Text coding model training method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113505206A (en) 2021-10-15
CN113505206B (en) 2023-04-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22832001

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE