CN114547273B - Question answering method and related device, electronic equipment and storage medium - Google Patents

Question answering method and related device, electronic equipment and storage medium

Info

Publication number
CN114547273B
CN114547273B (granted publication of application CN202210271016.0A)
Authority
CN
China
Prior art keywords
feature
feature representation
text
sample
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210271016.0A
Other languages
Chinese (zh)
Other versions
CN114547273A (en)
Inventor
孟福利
石庭豪
刘加新
郑新
李直旭
李明洹
陈志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iflytek Suzhou Technology Co Ltd
Original Assignee
Iflytek Suzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iflytek Suzhou Technology Co Ltd filed Critical Iflytek Suzhou Technology Co Ltd
Priority to CN202210271016.0A priority Critical patent/CN114547273B/en
Publication of CN114547273A publication Critical patent/CN114547273A/en
Application granted granted Critical
Publication of CN114547273B publication Critical patent/CN114547273B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/194 - Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a question answering method, a related device, an electronic device and a storage medium. The question answering method includes: acquiring a question text and a knowledge graph; interacting the segment feature representation of each segment with the first feature representation of a candidate entity to obtain a second feature representation of the question text mapped into the target space; obtaining the feature similarity between the question text and the candidate entity in the target space based on the first feature representation and the second feature representation; and selecting at least one candidate entity as the answer text of the question text based on the feature similarity between each candidate entity and the question text. With this scheme, the relevance between the question text and the candidate entities can be reflected more accurately, the discrepancy caused by the question text and the candidate entities lying in different feature spaces is greatly reduced, and the accuracy of question answering can be improved.

Description

Question answering method and related device, electronic equipment and storage medium
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a question answering method, a related apparatus, an electronic device, and a storage medium.
Background
With the continuous development of artificial intelligence, Knowledge Graphs (KGs) are attracting more and more attention as a foundation for subsequent strong artificial intelligence technologies. A knowledge graph represents concepts and attributes of the real world: it consists of entities and the relations between them, which together form a graph structure.
Knowledge-base question answering (KBQA) is the task of finding the correct answer to a question posed in natural language over a knowledge graph. Research has found that it is difficult for traditional question answering approaches to obtain sufficiently accurate answer texts. In view of this, how to improve the accuracy of question answering has become an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a question answering method, a related device, an electronic device and a storage medium capable of improving the accuracy of question answering.
In order to solve the above technical problem, a first aspect of the present application provides a question answering method, including: acquiring a question text and a knowledge graph, wherein the question text includes a plurality of segments and the knowledge graph includes a plurality of entities; interacting the segment feature representation of each segment with the first feature representation of a candidate entity to obtain a second feature representation of the question text mapped into a target space, wherein the candidate entity is selected from the plurality of entities and the target space is the feature space in which the candidate entity is located; obtaining the feature similarity between the question text and the candidate entity in the target space based on the first feature representation and the second feature representation; and selecting at least one candidate entity as the answer text of the question text based on the feature similarity between each candidate entity and the question text.
In order to solve the above technical problem, a second aspect of the present application provides a question answering device, including an acquisition module, an interaction module, a measurement module and a selection module. The acquisition module is used for acquiring a question text and a knowledge graph, wherein the question text includes a plurality of segments and the knowledge graph includes a plurality of entities. The interaction module is used for interacting the segment feature representation of each segment with the first feature representation of a candidate entity to obtain a second feature representation of the question text mapped into a target space, wherein the candidate entity is selected from the plurality of entities and the target space is the feature space in which the candidate entity is located. The measurement module is used for obtaining the feature similarity between the question text and the candidate entity in the target space based on the first feature representation and the second feature representation. The selection module is used for selecting at least one candidate entity as the answer text of the question text based on the feature similarity between each candidate entity and the question text.
In order to solve the above technical problem, a third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the memory stores program instructions, and the processor is configured to execute the program instructions to implement the question answering method of the first aspect.
In order to solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium storing program instructions executable by a processor, the program instructions being for implementing the question answering method of the first aspect described above.
According to the above scheme, a question text and a knowledge graph are obtained, the question text including a plurality of segments and the knowledge graph including a plurality of entities; the segment feature representation of each segment is interacted with the first feature representation of a candidate entity to obtain a second feature representation of the question text mapped into the target space, the candidate entity being selected from the plurality of entities and the target space being the feature space in which the candidate entity is located. On this basis, the feature similarity between the question text and the candidate entity in the target space is obtained based on the first feature representation and the second feature representation, and at least one candidate entity is then selected as the answer text of the question text based on the feature similarity between each candidate entity and the question text. Therefore, on the one hand, the feature interaction helps to measure how much attention the candidate entity pays to different positions of the question text, so that the second feature representation of the question text reflects the relevance between the question text and the candidate entity more accurately; on the other hand, through the feature interaction the first feature representation of the candidate entity and the second feature representation of the question text tend toward the same feature space, so the discrepancy caused by the question text and the candidate entity originally lying in different feature spaces can be greatly reduced. The accuracy of question answering can thus be improved.
Drawings
FIG. 1 is a schematic flow chart diagram of one embodiment of a question answering method of the present application;
FIG. 2 is a schematic diagram of a process for obtaining an embodiment of a knowledge-graph;
FIG. 3 is a schematic diagram of an embodiment of a knowledge-graph;
FIG. 4 is a schematic diagram of another embodiment of a knowledge-graph;
FIG. 5 is a process diagram of one embodiment of a feature similarity metric;
FIG. 6 is a process diagram of an embodiment of the question answering method of the present application;
FIG. 7 is a block diagram of an embodiment of the question answering device of the present application;
FIG. 8 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 9 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the segment "/" herein generally indicates that the former and latter associated objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a question answering method according to the present application. Specifically, the method may include the steps of:
step S11: problem text and a knowledge graph are obtained.
In the disclosed embodiment, the question text contains several segments, and the knowledge graph contains several entities. It should be noted that the embodiments of the present disclosure limit neither the number of segments contained in the question text nor the number of entities contained in the knowledge graph. Further, each segment contains at least one character.
In one implementation scenario, the number of characters contained in a segment may be set according to the actual application, and specifically according to the language of the question text. For example, when the question text is expressed in Chinese, the segments may be words (i.e., each word constitutes a different segment) or Chinese characters (i.e., each Chinese character constitutes a different segment); when the question text is expressed in English, the segments may be words (i.e., each word constitutes a different segment). Other cases may be deduced by analogy and are not enumerated here one by one.
In one implementation scenario, the question text and the knowledge graph may both relate to a target domain. Target domains may include, but are not limited to: police incident (alert) handling, tax payment, social interaction, etc., which are not limited here.
In an implementation scenario, the question text may be obtained by a user through keyboard input, voice input, or the like, and the obtaining manner of the question text is not limited herein.
In one implementation scenario, to further improve the accuracy of question answering, first data and second data related to the target domain may be acquired, where the first data is unstructured data and the second data is structured data. On this basis, event extraction can be performed on the first data to obtain a plurality of first entities, knowledge fusion can be performed on the first entities based on the second data to obtain a plurality of second entities, and the knowledge graph can be constructed from the second entities. It should be noted that the unstructured data may include, but is not limited to, sentences, paragraphs, chapters and the like expressed in natural language, and the structured data may include, but is not limited to, databases with a specific structure and the like, which are not limited here. In this way, the knowledge graph is constructed by combining the unstructured first data and the structured second data, which can greatly alleviate the sparsity problem of the knowledge graph; on the one hand this facilitates the subsequent mining of deep-level information based on the knowledge graph and improves the accuracy of question answering, and on the other hand the more complete knowledge graph also helps to improve the efficiency of searching for answers based on it.
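As an illustration of this two-stage construction, the following sketch pairs a hypothetical event-extraction step with a key-based knowledge-fusion step; the Entity class, the extract_events() stub and the use of an ID number as the fusion key are illustrative assumptions, not the patent's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str                       # fusion key, e.g. an ID-card number
    etype: str                     # "person", "incident", ...
    attrs: dict = field(default_factory=dict)

def extract_events(unstructured_docs):
    """Event extraction over the first (unstructured) data -> first entities."""
    first_entities = []
    for doc in unstructured_docs:
        # ... run pattern matching or a learned extractor over the text ...
        pass
    return first_entities

def fuse(first_entities, structured_records):
    """Knowledge fusion: merge first entities with second (structured) data
    rows, e.g. household-registration records, sharing the same key."""
    by_key = {e.eid: e for e in first_entities}
    for rec in structured_records:
        ent = by_key.setdefault(rec["id"], Entity(rec["id"], rec.get("type", "person")))
        ent.attrs.update(rec)      # augment attributes such as gender, residence
    return list(by_key.values())   # second entities for graph construction
```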
In a specific implementation scenario, taking police incidents as the target domain as an example, the unstructured first data may include, but is not limited to, incident descriptions, written records, verbal statements and other content related to incident information, which is not limited here. It should be noted that the content related to incident information may cover the classic elements: who, when, where, what, why and how. In addition, incidents can be classified into unlawful incidents (such as theft and fraud) and non-unlawful incidents (such as disputes and complaints). The structured second data may include, but is not limited to, household registration records and the like, which is likewise not limited here. Other cases may be deduced by analogy and are not enumerated here one by one. With a knowledge graph constructed from such information and the corresponding elements, related incidents or related elements can be combined, and multi-hop analysis can be performed on this basis, enabling deep reasoning for linking serial or related cases; this facilitates the mining of deeper useful information and improves the accuracy of question answering.
In one specific implementation scenario, please refer to fig. 2, which is a schematic process diagram of an embodiment of obtaining a knowledge graph. As shown in fig. 2, by performing event extraction on the first data, the conversion from unstructured data to structured knowledge can be completed. It should be noted that the event extraction may be performed by pattern matching, machine learning and the like, which is not limited here. For the specific manner of event extraction, reference may be made to the technical details of event extraction, which are not repeated here.
In a specific implementation scenario, again taking police incidents as the target domain, in order to alleviate the sparsity problem of the knowledge graph, various sources of information such as household registration records, the incident information base and background-check records may be fused through the identity card number. In addition, attributes of persons (e.g., gender, residence, education, etc.), relationships between persons (e.g., father and son, mother and son, spouse, colleague, etc.) and relationships between persons and incident events (e.g., witness, party involved, etc.) can be added, thereby making the knowledge graph more complete.
Step S12: the segment feature representation of each segment is interacted with the first feature representation of the candidate entity to obtain a second feature representation of the question text mapped into the target space.
In the embodiment of the disclosure, the candidate entities are selected from the plurality of entities, and the target space is the feature space in which the candidate entities are located. Specifically, all entities in the knowledge graph may be selected as candidate entities. Of course, in order to improve answering efficiency, only some entities in the knowledge graph may be selected as candidate entities, which is not limited here. It should be noted that, since the first feature representation and the second feature representation lie in the same feature space, they have the same feature dimension. Furthermore, for each candidate entity, the step of obtaining the second feature representation of the question text mapped into the target space by interacting the segment feature representation of each segment with the first feature representation of that candidate entity may be performed; that is, for each candidate entity, one obtains the first feature representation of the candidate entity itself and the second feature representation of the question text obtained after each segment of the question text interacts with that candidate entity. It should also be noted that, unless otherwise specified, each feature representation in the embodiments disclosed in the present application may be represented in vector form, and the vector dimension may be set according to the actual situation, for example 128 or 256, which is not limited here.
In one implementation scenario, in order to improve question answering efficiency, a subgraph related to answering the question text may be extracted from the knowledge graph based on the question text, and each entity in the subgraph is taken as a candidate entity. Illustratively, subgraphs may be extracted from the knowledge graph by algorithms such as PullNet; for the specific extraction process, reference may be made to the technical details of such algorithms, which are not repeated here. Referring to fig. 3 and fig. 4, fig. 3 is a schematic diagram of an embodiment of a knowledge graph and fig. 4 is a schematic diagram of another embodiment of a knowledge graph, where circles represent entities and the connecting lines between circles represent relations between entities. As shown in fig. 3, for the path indicated by the bold solid line, the two circles at the ends of the path represent entities corresponding to two different incident reports, and the entities they point to represent persons related to those reports. The two person entities on the path correspond to the same household registration entity; that is, the two persons are related to each other while being involved in different incidents, and such information helps to provide support for subsequent work. Fig. 4 shows that there are also connections between the incidents involved across different types of interpersonal relationships: joint involvement is common not only among family members but also among neighbors and colleagues. Therefore, limiting the hop count of the reasoning path during question answering may eliminate the correct answer, whereas extracting a subgraph related to answering the question text supports multi-hop analysis and reasoning during question answering. In this manner, by extracting, based on the question text, a subgraph related to answering the question text from the knowledge graph and taking each entity in the subgraph as a candidate entity, the situation in a KBQA task where the correct answer is excluded because the hop count of the inference path is limited can be avoided.
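For intuition, a minimal k-hop subgraph extraction is sketched below. Note that PullNet itself iteratively learns which facts to pull; this plain breadth-first expansion from the question's topic entities is only an illustrative stand-in, and the graph encoding is an assumption.

```python
from collections import deque

def extract_subgraph(graph, topic_entities, max_hops=3):
    """graph: dict mapping entity -> iterable of (relation, neighbor) pairs."""
    visited = set(topic_entities)
    frontier = deque((e, 0) for e in topic_entities)
    triples = []
    while frontier:
        node, hop = frontier.popleft()
        if hop >= max_hops:
            continue
        for rel, nbr in graph.get(node, ()):
            triples.append((node, rel, nbr))
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, hop + 1))
    return visited, triples   # the visited entities become the candidate entities
```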
In one implementation scenario, the segment feature representations may be extracted by a pre-trained language model such as BERT (Bidirectional Encoder Representations from Transformers) or RoBERTa. Taking segment feature extraction based on the RoBERTa model as an example, in order to improve the accuracy of feature extraction, dynamic masking can be adopted during pre-training; meanwhile, the NSP (Next Sentence Prediction) task can be removed and the FULL-SENTENCES mechanism added. Under this mechanism, unlike the NSP task, the input is no longer two sentences but a piece of text filled up to a specified byte length. For the specific pre-training process, reference may be made to the technical details of the pre-trained language model, which are not repeated here. It should be noted that the output of the last layer of the pre-trained language model can be used as the text feature representation of the whole question text. For convenience of description, the segment feature representation of the i-th segment in the question text may be denoted as e_qi ∈ C^d, where d denotes the dimension of the target space. When the question text contains N segments, the combination of the segment feature representations of the N segments may be regarded as the text feature representation. For the specific meaning of a segment, reference may be made to the foregoing description, which is not repeated here.
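As a concrete sketch of this step, the snippet below pulls per-token features from the last layer of a pre-trained RoBERTa encoder; it assumes the HuggingFace transformers API, and the checkpoint name is an illustrative assumption rather than the model the patent uses.

```python
import torch
from transformers import AutoTokenizer, AutoModel

name = "hfl/chinese-roberta-wwm-ext"      # assumed Chinese RoBERTa checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

question = "..."                           # the question text
inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# One d-dimensional vector per token (special tokens included); these play
# the role of the segment feature representations e_qi.
segment_features = outputs.last_hidden_state[0]    # shape: (n_segments, d)
```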
In one implementation scenario, the first feature representation may be extracted based on the ComplEx model. The ComplEx model is a network model based on complex-valued representations; by means of tensor decomposition it can better capture both the symmetric and the asymmetric binary relations in the knowledge base, and each entity and relation can be represented by a complex vector. For each triple contained in the knowledge graph, the representation vector e_s of the subject s, the representation vector e_o of the object o and the representation vector w_r of the predicate r can be extracted based on the ComplEx model, where all three satisfy ∈ C^d. On this basis, the representation vector of a candidate entity can be taken as its first feature representation. When the first feature representation is extracted with the ComplEx model, only the real part of the representation vector may be taken as the first feature representation. During training of the ComplEx model, the representation vectors of a sample triple (r, s, o) extracted by the model may be processed by a scoring function to obtain the score φ(r, s, o; θ) of the sample triple:
φ(r, s, o; θ) = Re( Σ_{k=1..K} w_rk · e_sk · ē_ok ) ……(1)
In formula (1), θ denotes the network parameters of the ComplEx model and K denotes the dimension of the representation vectors; w_rk, e_sk and e_ok denote the k-th components of the predicate, subject and object representation vectors respectively, the overbar denotes taking the complex conjugate, and Re(·) denotes taking the real part of a complex number. On this basis, the score φ(r, s, o; θ) of each sample triple can be normalized to obtain the predicted probability P(Y_rso = 1) that the sample triple holds:
P(Y_rso = 1) = σ(φ(r, s, o; θ)) ……(2)
In formula (2), σ(·) denotes a normalization function such as the sigmoid, which is not limited here. On this basis, the sample label of the sample triple can be obtained, the sample label indicating whether the sample triple holds; for example, the label "1" may indicate that the sample triple holds and the label "0" that it does not. Based on the sample label, the predicted probability can be processed with a loss function such as the cross entropy to obtain the prediction loss, and the network parameters θ of the ComplEx model are adjusted based on this loss. For the specific measurement of the prediction loss, reference may be made to the technical details of loss functions such as the cross entropy, and for the specific adjustment of the network parameters, to optimization methods such as gradient descent, which are not repeated here. During training, the ComplEx model continuously learns the feature information of correct triples, so that after multiple rounds of training it can extract increasingly accurate vector representations, improving the accuracy of the first feature representation finally extracted in actual application.
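A toy sketch of the ComplEx scoring of formulas (1) and (2) follows, using complex-valued tensors; the embedding dimension and the random embeddings are placeholders, not trained values.

```python
import torch

def complex_score(w_r, e_s, e_o):
    """phi(r, s, o) = Re( sum_k w_rk * e_sk * conj(e_ok) ), formula (1)."""
    return torch.real(torch.sum(w_r * e_s * torch.conj(e_o)))

d = 128
w_r = torch.randn(d, dtype=torch.cfloat)   # predicate representation vector
e_s = torch.randn(d, dtype=torch.cfloat)   # subject representation vector
e_o = torch.randn(d, dtype=torch.cfloat)   # object representation vector

score = complex_score(w_r, e_s, e_o)
p_true = torch.sigmoid(score)              # formula (2): P(Y_rso = 1)
```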
In an implementation scenario, attention interaction may be performed between each segment feature representation and the first feature representation to obtain the weight of each segment, and the segment feature representations are then weighted by these weights to obtain the second feature representation of the question text mapped into the target space. Specifically, the weighted feature representation of each segment may be obtained by weighting its segment feature representation with its weight, and the weighted feature representations of all segments are summed to obtain the second feature representation. In this manner, the degree of attention the candidate entity pays to each segment can be accurately measured through attention interaction, so that the second feature representation of the question text reflects the relevance between the question text and the candidate entity more accurately, which further improves the accuracy of the second feature representation.
In a specific implementation scenario, the weights may be obtained by an attention network (e.g., a cross-attention network) trained on sample data, the sample data including a sample question text, a positive-case entity and a negative-case entity, where the positive-case entity constitutes a sample answer text of the sample question text and the negative-case entity does not. In particular, the positive-case and negative-case entities may both come from a sample knowledge graph, and the numbers of positive and negative cases are not limited. For example, an entity that correctly answers the sample question text may be selected from the sample knowledge graph as the positive-case entity, and at least one entity that answers it incorrectly as a negative-case entity. In this manner, the attention network can learn, during training, both the feature information of positive-case entities that answer the sample question text correctly and that of negative-case entities that answer it incorrectly; this greatly reduces the discrepancy caused by the question text and the candidate entities lying in different feature spaces, improves the generalization of the network, and helps improve the weight prediction accuracy of the attention network.
In a specific implementation scenario, for convenience of description, the first feature representation of the i-th candidate entity may be denoted as e_ai and the segment feature representation of the j-th segment in the question text as e_qj. On this basis, the first feature representation and the segment feature representations may be processed by the attention network to obtain the weight of each segment. Illustratively, for the i-th candidate entity, the weight a_ij of the j-th segment can be expressed as:
a_ij = exp(w_ij) / Σ_{j'=1..n} exp(w_ij') ……(3)
w_ij = f(W^T [e_qj ; e_ai]) + b ……(4)
In formulas (3) and (4), W and b denote the network parameters of the attention network and T denotes transposition; W is an intermediate parameter matrix mapping the concatenated feature [e_qj ; e_ai] to a scalar, b is a bias term, f is an activation function, and n denotes the total number of segments in the question text. On this basis, the second feature representation q_i of the question text, obtained after each segment of the question text interacts with the i-th candidate entity, is obtained by weighted summation:
q_i = Σ_{j=1..n} a_ij · e_qj ……(5)
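A compact sketch of formulas (3) through (5) in tensor form is given below; the choice of tanh for the activation f is an assumption, as the text does not name one.

```python
import torch

def interact(segment_feats, entity_feat, W, b, f=torch.tanh):
    """segment_feats: (n, d); entity_feat: (d,); W: (2*d,); b: scalar.
    Returns the second feature representation q_i and the weights a_ij."""
    n = segment_feats.size(0)
    # [e_qj ; e_ai]: concatenate each segment feature with the entity feature.
    concat = torch.cat([segment_feats, entity_feat.expand(n, -1)], dim=1)
    w = f(concat @ W) + b                              # formula (4)
    a = torch.softmax(w, dim=0)                        # formula (3)
    q = (a.unsqueeze(1) * segment_feats).sum(dim=0)    # formula (5)
    return q, a
```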
In a specific implementation scenario, similarly to the question text, the sample question text includes a plurality of sample segments. The sample segment feature representation of each sample segment may then be interacted with the sample positive-case feature representation of the positive-case entity to obtain a first question feature representation of the sample question text mapped into a first space, where the first space is the feature space in which the sample positive-case feature representation is located; that is, the first question feature representation and the sample positive-case feature representation have the same feature dimension. Meanwhile, the sample segment feature representation of each sample segment may be interacted with the sample negative-case feature representation of the negative-case entity to obtain a second question feature representation of the sample question text mapped into a second space, where the second space is the feature space in which the sample negative-case feature representation is located; that is, the second question feature representation and the sample negative-case feature representation have the same feature dimension. For the specific process of acquiring the first and second question feature representations, reference may be made to the foregoing process of acquiring the second feature representation, which is not repeated here. On this basis, a first similarity may be obtained based on the first question feature representation and the sample positive-case feature representation, a second similarity may be obtained based on the second question feature representation and the sample negative-case feature representation, and the network parameters of the attention network may be adjusted based on the difference between the first similarity and the second similarity. It should be noted that the inner product of the first question feature representation and the sample positive-case feature representation may be taken as the first similarity, and the inner product of the second question feature representation and the sample negative-case feature representation as the second similarity. Furthermore, the difference between the first similarity and the second similarity may be processed with a hinge loss function to obtain the prediction loss L of the attention network:
L = Σ_{a'} [γ - S(q, a) + S(q, a')]_+ ……(6)
In formula (6), q denotes the sample question text, a denotes the positive-case entity, a' denotes a negative-case entity, S(q, a) denotes the first similarity, S(q, a') denotes the second similarity, and [x]_+ = max(0, x). In addition, γ is a margin parameter controlling the degree to which positive-case and negative-case entities are distinguished; its specific value may be set according to the actual situation and is not limited here. Thus, by minimizing the prediction loss L during model training, the first similarity is made as large as possible and the second similarity as small as possible. In actual application, question-answer pairs (q, a) can be constructed in advance, and positive and negative cases constructed from the candidate answer set C_q, i.e., a correct-answer set P_q and a wrong-answer set N_q: a positive-case entity a satisfies a ∈ P_q, and k (k ≥ 1) wrong answers are randomly selected from the wrong-answer set N_q as negative-case entities, so as to construct the sample data for training the attention network. During training, SGD (Stochastic Gradient Descent) may specifically be used to adjust the network parameters of the attention network. In the above manner, the sample segment feature representations are interacted with the sample positive-case feature representation to obtain the first question feature representation mapped into the first space, and with the sample negative-case feature representation to obtain the second question feature representation mapped into the second space; the first similarity is obtained from the first question feature representation and the sample positive-case feature representation, the second similarity from the second question feature representation and the sample negative-case feature representation, and the network parameters of the attention network are adjusted based on the difference between the two, so that the attention network distinguishes positive and negative cases as far as possible, improving the weight prediction accuracy.
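A minimal sketch of the hinge ranking loss of formula (6) follows; the margin value is a placeholder.

```python
import torch

def hinge_loss(s_pos, s_neg, gamma=0.5):
    """s_pos: scalar S(q, a) for the positive-case entity;
    s_neg: (k,) tensor of S(q, a') for k sampled negative-case entities."""
    return torch.clamp(gamma - s_pos + s_neg, min=0).sum()   # formula (6)
```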
In an implementation scenario, different from the aforementioned obtaining of the weight of each segment through attention interaction, in order to simplify the process of obtaining the weight, the weight of each segment may also be obtained by performing a similarity measure (e.g., inner product, etc.) with the first feature representation based on each segment feature representation, and then weighting the segment feature representations of the segments based on the weights of the segments to obtain the second feature representation. For a specific weighting process, reference may be made to the foregoing related description, which is not repeated herein.
Step S13: and obtaining the feature similarity of the question text and the candidate entity in the target space based on the first feature representation and the second feature representation.
In one implementation scenario, as previously described, for each candidate entity, the first feature representation of the candidate entity itself and the second feature representation of the question text obtained after each segment of the question text interacts with that candidate entity may be obtained. In this case, for each candidate entity, the feature similarity between it and the question text in the target space can be obtained, for example by inner product, based on the first feature representation of the candidate entity and the second feature representation obtained after interaction. That is, for each candidate entity, the aforementioned steps of attention interaction and similarity measurement are performed to obtain the feature similarity between that candidate entity and the question text.
In one implementation scenario, for convenience of description, the second feature representation obtained after the question text undergoes the aforementioned attention interaction with the i-th candidate entity may be denoted as q_i and, as previously described, the first feature representation of the i-th candidate entity as e_ai. The feature similarity S(q, a) between the i-th candidate entity and the question text can then be expressed as:
S(q, a) = h(q_i, e_ai) ……(7)
In formula (7), h(·,·) denotes a similarity measure function, which may include, but is not limited to, the inner product, and is not limited here.
In one implementation scenario, referring to fig. 5, fig. 5 is a process diagram of an embodiment of the feature similarity measurement. As shown in fig. 5, for each segment x_1, x_2, …, x_n in the question text, the segment feature representation can be extracted by a pre-trained language model (e.g., RoBERTa); meanwhile, the first feature representation of each candidate entity (e.g., e_a1, e_a2, e_a3, e_a4) is extracted by the ComplEx model. On this basis, for each candidate entity, the segment feature representations can be subjected to attention interaction with the first feature representation of that candidate entity to obtain a second feature representation of the question text. For example, after the first feature representation of candidate entity e_a1 undergoes attention interaction with the segment feature representations, a second feature representation q_a1 of the question text can be obtained. Further, for each candidate entity, the feature similarity between the question text and that candidate entity can be obtained based on its own first feature representation and the second feature representation obtained after the attention interaction.
Step S14: and selecting at least one candidate entity as answer text of the question text based on the feature similarity between each candidate entity and the question text.
In one implementation scenario, as described above, for each candidate entity, the first feature representation of the candidate entity itself and the second feature representation of the question text obtained after each segment of the question text interacts with that candidate entity may be obtained, and the feature similarity between the candidate entity and the question text is obtained based on the first feature representation of the candidate entity itself and the corresponding second feature representation. On this basis, the candidate entity corresponding to the maximum feature similarity can be selected as the answer text. Referring to fig. 5, as mentioned above, the feature similarities between each candidate entity e_a1, e_a2, e_a3, e_a4 and the question text can be obtained through the above processing; if the feature similarity between candidate entity e_a1 and the question text is the largest, candidate entity e_a1 can be directly taken as the answer text of the question text. Other cases may be deduced by analogy and are not enumerated here one by one. For convenience of description, the answer text may be expressed as:
Answer = argmax_{a∈A} S(q, a) ……(8)
In formula (8), A denotes the set of candidate entities, argmax denotes taking the candidate with the maximum value, and S(q, a) denotes the feature similarity between the question text q and candidate entity a. By this method, the candidate entity corresponding to the maximum feature similarity is directly selected as the answer text, so that a candidate entity can be quickly and accurately designated as the answer text according to the feature similarity, which is beneficial to improving the efficiency of question answering.
In an implementation scenario, unlike the above-mentioned method of directly selecting the candidate entity corresponding to the maximum feature similarity as the only answer text, the candidate entities whose feature similarity differs from the target similarity by less than a preset threshold may also be selected as answer texts, where the target similarity is the maximum feature similarity. That is, the difference between each feature similarity and the maximum feature similarity may be calculated, and the candidate entities whose difference is smaller than the preset threshold are selected as answer texts. Referring to fig. 5, as mentioned above, the feature similarities between each candidate entity e_a1, e_a2, e_a3, e_a4 and the question text can be obtained through the above processing; if the feature similarity between candidate entity e_a1 and the question text is the largest, the differences between the feature similarities of candidate entities e_a1, e_a2, e_a3, e_a4 and the maximum feature similarity can be calculated respectively, and any candidate entity whose difference is smaller than the preset threshold can be taken as an answer text. For convenience of description, the answer texts may be expressed as:
Answer = {a | S_max - S(q, a) < γ} ……(9)
In formula (9), S_max denotes the maximum feature similarity, i.e., the target similarity, S(q, a) denotes the feature similarity between the question text q and candidate entity a, and γ denotes the preset threshold. In the above manner, the candidate entities whose feature similarity differs from the target similarity by less than the preset threshold are selected as answer texts, where the target similarity is the maximum feature similarity; this is applicable not only to scenarios where the question text has exactly one answer text but also to scenarios where it has more than one, so the application range can be greatly expanded.
In one implementation scenario, please refer to fig. 6, which is a process diagram of an embodiment of the question answering method of the present application. As shown in fig. 6, the question text is passed through a feature extraction network (e.g., a pre-trained language model such as RoBERTa) to obtain the segment feature representations of the segments; meanwhile, each entity in the knowledge graph is passed through the knowledge representation network to obtain its first feature representation. Each segment feature representation performs attention interaction with the first feature representation to obtain the second feature representation of the question text, and the feature similarity between the entity and the question text is obtained based on the second feature representation obtained after the entity interacts with the question text and the first feature representation of the entity. On this basis, either of the above selection methods may be used to select at least one entity as the answer text of the question text based on the feature similarity corresponding to each entity.
According to the above scheme, a question text and a knowledge graph are obtained, the question text including a plurality of segments and the knowledge graph including a plurality of entities; the segment feature representation of each segment is interacted with the first feature representation of a candidate entity to obtain a second feature representation of the question text mapped into the target space, the candidate entity being selected from the plurality of entities and the target space being the feature space in which the candidate entity is located. On this basis, the feature similarity between the question text and the candidate entity in the target space is obtained based on the first feature representation and the second feature representation, and at least one candidate entity is then selected as the answer text of the question text based on the feature similarity between each candidate entity and the question text. Therefore, on the one hand, the feature interaction helps to measure how much attention the candidate entity pays to different positions of the question text, so that the second feature representation of the question text reflects the relevance between the question text and the candidate entity more accurately; on the other hand, through the feature interaction the first feature representation of the candidate entity and the second feature representation of the question text tend toward the same feature space, so the discrepancy caused by the question text and the candidate entity originally lying in different feature spaces can be greatly reduced. The accuracy of question answering can thus be improved.
Referring to fig. 7, fig. 7 is a block diagram of an embodiment of the question answering device 70 of the present application. Specifically, the question answering device 70 includes an acquisition module 71, an interaction module 72, a measurement module 73 and a selection module 74. The acquisition module 71 is used for acquiring a question text and a knowledge graph, wherein the question text includes a plurality of segments and the knowledge graph includes a plurality of entities. The interaction module 72 is used for interacting the segment feature representation of each segment with the first feature representation of a candidate entity to obtain a second feature representation of the question text mapped into the target space, wherein the candidate entity is selected from the plurality of entities and the target space is the feature space in which the candidate entity is located. The measurement module 73 is used for obtaining the feature similarity between the question text and the candidate entity in the target space based on the first feature representation and the second feature representation. The selection module 74 is used for selecting at least one candidate entity as the answer text of the question text based on the feature similarity between each candidate entity and the question text.
According to the above scheme, a question text and a knowledge graph are obtained, the question text including a plurality of segments and the knowledge graph including a plurality of entities; the segment feature representation of each segment is interacted with the first feature representation of a candidate entity to obtain a second feature representation of the question text mapped into the target space, the candidate entity being selected from the plurality of entities and the target space being the feature space in which the candidate entity is located. On this basis, the feature similarity between the question text and the candidate entity in the target space is obtained based on the first feature representation and the second feature representation, and at least one candidate entity is then selected as the answer text of the question text based on the feature similarity between each candidate entity and the question text. Therefore, on the one hand, the feature interaction helps to measure how much attention the candidate entity pays to different positions of the question text, so that the second feature representation of the question text reflects the relevance between the question text and the candidate entity more accurately; on the other hand, through the feature interaction the first feature representation of the candidate entity and the second feature representation of the question text tend toward the same feature space, so the discrepancy caused by the question text and the candidate entity originally lying in different feature spaces can be greatly reduced. The accuracy of question answering can thus be improved.
In some disclosed embodiments, the interaction module 72 includes an attention sub-module configured to perform attention interaction between each segment feature representation and the first feature representation to obtain the weight of each segment, and a weighting sub-module configured to weight the segment feature representations by these weights to obtain the second feature representation.
Therefore, attention interaction is performed between each segment feature representation and the first feature representation to obtain the weight of each segment, and the segment feature representations are then weighted by these weights to obtain the second feature representation of the question text mapped into the target space; the degree of attention the candidate entity pays to each segment can thus be accurately measured through attention interaction, so that the second feature representation of the question text reflects the relevance between the question text and the candidate entity more accurately, further improving the accuracy of the second feature representation.
In some disclosed embodiments, the weights are obtained by an attention network trained on sample data, the sample data including a sample question text, a positive-case entity and a negative-case entity, where the positive-case entity constitutes a sample answer text of the sample question text and the negative-case entity does not.
Therefore, during training the attention network can learn both the feature information of positive-case entities that answer the sample question text correctly and that of negative-case entities that answer it incorrectly; this greatly reduces the discrepancy caused by the question text and the candidate entities lying in different feature spaces, improves the generalization of the network, and helps improve the weight prediction accuracy of the attention network.
In some disclosed embodiments, the sample question text includes a plurality of sample segments, and the question answering device 70 includes a first sample interaction module, configured to interact the sample segment feature representations of the sample segments with the sample positive-case feature representation of the positive-case entity to obtain a first question feature representation of the sample question text mapped into a first space, the first space being the feature space in which the sample positive-case feature representation is located; a second sample interaction module, configured to interact the sample segment feature representations of the sample segments with the sample negative-case feature representation of the negative-case entity to obtain a second question feature representation of the sample question text mapped into a second space, the second space being the feature space in which the sample negative-case feature representation is located; a similarity measure module, configured to derive a first similarity from the first question feature representation and the sample positive-case feature representation and a second similarity from the second question feature representation and the sample negative-case feature representation; and a parameter adjustment module, configured to adjust the network parameters of the attention network based on the difference between the first similarity and the second similarity.
Therefore, by obtaining the first question feature representation mapped into the first space through interaction with the sample positive-case feature representation, obtaining the second question feature representation mapped into the second space through interaction with the sample negative-case feature representation, deriving the first and second similarities accordingly, and adjusting the network parameters of the attention network based on the difference between them, the attention network is trained to distinguish positive and negative cases as far as possible, which improves the weight prediction accuracy.
In some disclosed embodiments, the knowledge-graph relates to a target domain, and the question answering device 70 includes a data acquisition module for acquiring first data and second data relating to the target domain; the first data is unstructured data, and the second data is structured data; the question answering device 70 includes an event extraction module for performing event extraction based on the first data to obtain a plurality of first entities; the question answering device 70 includes a knowledge fusion module for performing knowledge fusion on the plurality of first entities based on the second data to obtain a plurality of second entities; the question answering device 70 includes a graph construction module for constructing a knowledge graph based on a plurality of second entities.
In this way, the knowledge graph is constructed by combining unstructured first data with structured second data, which greatly alleviates the sparsity problem of the knowledge graph. On the one hand, this facilitates the subsequent mining of deeper information from the knowledge graph, improving the accuracy of question answering; on the other hand, a more complete knowledge graph also helps to improve the efficiency of searching for answers based on it.
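The construction flow could be outlined as below; extract_events and fuse_entities are hypothetical placeholders standing in for an event-extraction model and an entity-alignment step, and the dict-based entity layout is likewise an assumption of the sketch, not an API defined by the patent.

    import networkx as nx

    def build_knowledge_graph(first_data, second_data, extract_events, fuse_entities):
        # Event extraction over the unstructured first data yields first entities.
        first_entities = extract_events(first_data)
        # Knowledge fusion against the structured second data merges duplicate
        # mentions and enriches attributes, yielding second entities.
        second_entities = fuse_entities(first_entities, second_data)
        graph = nx.MultiDiGraph()
        for entity in second_entities:
            graph.add_node(entity["name"], **entity.get("attributes", {}))
            for relation, tail in entity.get("relations", []):
                graph.add_edge(entity["name"], tail, relation=relation)
        return graph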
In some disclosed embodiments, the question answering device 70 includes a sub-graph extraction module for extracting, based on the question text, a sub-graph relevant to answering the question text from the knowledge graph, and a candidate entity selection module for taking each entity in the sub-graph as a candidate entity.
In this way, a sub-graph relevant to answering the question text is extracted from the knowledge graph based on the question text, and each entity in the sub-graph serves as a candidate entity. This limits the number of hops of the inference path in the KBQA task, excluding irrelevant entities while increasing the likelihood that the correct answer is retained within the limited number of hops.
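A hop-limited breadth-first traversal is one plausible way to realize such extraction. In the sketch below, the adjacency layout, the max_hops value, and the assumption that topic entities have already been linked from the question text are all illustrative.

    from collections import deque

    def extract_subgraph(adjacency, topic_entities, max_hops=2):
        """adjacency: dict mapping an entity to an iterable of its neighbours."""
        visited = set(topic_entities)
        frontier = deque((entity, 0) for entity in topic_entities)
        while frontier:
            entity, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for neighbour in adjacency.get(entity, ()):
                if neighbour not in visited:
                    visited.add(neighbour)
                    frontier.append((neighbour, hops + 1))
        return visited  # each entity here becomes a candidate entity

    # Example: starting from topic entity "Q", a 2-hop sub-graph covers A, B and C.
    candidates = extract_subgraph({"Q": ["A", "B"], "A": ["C"]}, ["Q"], max_hops=2)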
In some disclosed embodiments, the selection module 74 includes a first selection sub-module for selecting the candidate entity corresponding to the maximum feature similarity as the answer text, and a second selection sub-module for selecting, as the answer text, each candidate entity whose feature similarity differs from the target similarity by less than a preset threshold, where the target similarity is the maximum feature similarity.
In this way, directly selecting the candidate entity corresponding to the maximum feature similarity yields an answer text quickly and accurately, improving the efficiency of answering questions. Alternatively, selecting every candidate entity whose feature similarity differs from the target similarity (the maximum feature similarity) by less than the preset threshold suits not only the scenario where the question text has a single answer text but also the scenario where it has more than one, which greatly broadens the range of application.
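Both selection strategies could be combined as in the following sketch; the candidate names, similarity scores, and threshold value are illustrative assumptions.

    def select_answers(similarities, threshold=None):
        best = max(similarities.values())  # the target similarity
        if threshold is None:
            # Single-answer scenario: take the candidate with the maximum similarity.
            return [c for c, s in similarities.items() if s == best][:1]
        # Multi-answer scenario: keep every candidate whose gap to the
        # target similarity is below the preset threshold.
        return [c for c, s in similarities.items() if best - s < threshold]

    # "Lyon" is close enough to the maximum to also count as an answer text.
    print(select_answers({"Paris": 0.92, "Lyon": 0.90, "Bonn": 0.41}, threshold=0.05))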
Referring to fig. 8, fig. 8 is a schematic block diagram of an embodiment of an electronic device 80 according to the present application. The electronic device 80 includes a memory 81 and a processor 82 coupled to each other; the memory 81 stores program instructions, and the processor 82 executes the program instructions to implement the steps of any of the above-described embodiments of the question answering method. Specifically, the electronic device 80 may include, but is not limited to, a desktop computer, a notebook computer, a server, a mobile phone, a tablet computer, and the like.
In particular, the processor 82 is configured to control itself and the memory 81 to implement the steps of any of the above-described embodiments of the question answering method. The processor 82 may also be referred to as a CPU (Central Processing Unit) and may be an integrated circuit chip with signal processing capabilities. The processor 82 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 82 may be implemented collectively by multiple integrated circuit chips.
According to the above scheme, on the one hand, feature interaction measures how much attention the candidate entity pays to different positions of the question text, so that the second feature representation of the question text more accurately reflects the relevance between the question text and the candidate entity; on the other hand, feature interaction draws the first feature representation of the candidate entity and the second feature representation of the question text toward the same feature space, greatly reducing the discrepancy caused by the question text and the candidate entity originally residing in different feature spaces. The accuracy of question answering can therefore be improved.
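At inference time the pieces could fit together as sketched below; this reuses the illustrative AttentionNetwork from the training sketch above and assumes pre-computed feature representations, so it is an assumption-laden outline rather than the patent's concrete implementation.

    import torch.nn.functional as F

    def score_candidates(net, segment_feats, candidate_feats):
        """candidate_feats: dict mapping candidate name -> first feature representation."""
        scores = {}
        for name, first_feat in candidate_feats.items():
            # Interaction maps the question text into this candidate's feature space.
            second_feat = net(segment_feats, first_feat)
            scores[name] = F.cosine_similarity(second_feat, first_feat, dim=0).item()
        return scores  # feed the scores into select_answers(...) above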
Referring to fig. 9, fig. 9 is a block diagram illustrating an embodiment of a computer-readable storage medium 90 according to the present application. The computer readable storage medium 90 stores program instructions 91 executable by the processor, the program instructions 91 for implementing the steps in any of the above-described embodiments of the question answering method.
According to the above scheme, on the one hand, feature interaction measures how much attention the candidate entity pays to different positions of the question text, so that the second feature representation of the question text more accurately reflects the relevance between the question text and the candidate entity; on the other hand, feature interaction draws the first feature representation of the candidate entity and the second feature representation of the question text toward the same feature space, greatly reducing the discrepancy caused by the question text and the candidate entity originally residing in different feature spaces. The accuracy of question answering can therefore be improved.
In some embodiments, the functions of, or modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for specific implementation, reference may be made to the description of those embodiments, which is not repeated here for brevity.
The foregoing description of the various embodiments emphasizes the differences between them; for the same or similar parts, the embodiments may be referred to one another, and these parts are not repeated here for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division of function, and an actual implementation may divide them differently; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections between devices or units through interfaces, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, may be embodied in whole or in part as a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (9)

1. A method of answering questions, comprising:
acquiring a question text and a knowledge graph; wherein the question text comprises a plurality of segments, and the knowledge graph comprises a plurality of entities;
interacting with the first feature representation of the candidate entity respectively based on the segment feature representation of each segment to obtain a second feature representation of the question text mapped in a target space; wherein the candidate entities are selected from the plurality of entities, and the target space is the feature space where the candidate entities are located;
obtaining the feature similarity of the question text and the candidate entity in the target space based on the first feature representation and the second feature representation;
selecting at least one candidate entity as an answer text of the question text based on the feature similarity between each candidate entity and the question text;
wherein the interacting with the first feature representation of the candidate entity respectively based on the segment feature representation of each segment to obtain the second feature representation of the question text mapped in the target space comprises:
performing attention interaction with the first feature representation respectively based on each segment feature representation to obtain the weight of each segment, or performing similarity measurement with the first feature representation respectively based on each segment feature representation to obtain the weight of each segment;
and weighting the segment feature representation of each segment based on the weight of the segment to obtain the second feature representation.
2. The method of claim 1, wherein the weights are derived based on an attention network trained based on sample data comprising a sample question text, a positive case entity and a negative case entity, and wherein the positive case entity constitutes a sample answer text of the sample question text and the negative case entity does not constitute the sample answer text.
3. The method of claim 2, wherein the sample question text comprises a plurality of sample segments, and wherein the step of training the attention network comprises:
respectively interacting with the sample positive case feature representations of the positive case entities based on the sample segment feature representations of the sample segments to obtain a first question feature representation of the sample question text mapped in a first space; wherein the first space is the feature space in which the sample positive case feature representation is located; and,
interacting with the sample negative case feature representation of the negative case entity respectively based on the sample segment feature representation of each sample segment to obtain a second question feature representation of the sample question text mapped in a second space; wherein the second space is the feature space in which the sample negative case feature representation is located;
obtaining a first similarity based on the first question feature representation and the sample positive case feature representation, and obtaining a second similarity based on the second question feature representation and the sample negative case feature representation;
adjusting a network parameter of the attention network based on a difference between the first similarity and the second similarity.
4. The method of claim 1, wherein the knowledge-graph relates to a target domain, and the obtaining step of the knowledge-graph comprises:
acquiring first data and second data related to the target field; wherein the first data is unstructured data and the second data is structured data;
extracting events based on the first data to obtain a plurality of first entities;
performing knowledge fusion on the plurality of first entities based on the second data to obtain a plurality of second entities;
and constructing the knowledge graph based on the plurality of second entities.
5. The method of claim 1, wherein the step of selecting the candidate entity comprises:
extracting, from the knowledge graph, a sub-graph related to answering the question text based on the question text;
and taking each entity in the sub-graph as the candidate entity respectively.
6. The method according to claim 1, wherein the selecting at least one candidate entity as the answer text of the question text based on the feature similarity between each candidate entity and the question text comprises:
selecting the candidate entity corresponding to the maximum feature similarity as the answer text;
or selecting, as the answer text, a candidate entity corresponding to a feature similarity whose difference from a target similarity is smaller than a preset threshold; wherein the target similarity is the maximum feature similarity.
7. A question answering device, comprising:
the acquisition module is used for acquiring a question text and a knowledge graph; wherein the question text comprises a plurality of segments, and the knowledge graph comprises a plurality of entities;
the interaction module is used for respectively interacting with the first feature representation of the candidate entity based on the segment feature representation of each segment to obtain a second feature representation of the question text mapped in the target space; wherein the candidate entities are selected from the plurality of entities, and the target space is the feature space where the candidate entities are located;
the measurement module is used for obtaining the feature similarity of the question text and the candidate entity in the target space based on the first feature representation and the second feature representation;
the selection module is used for selecting at least one candidate entity as an answer text of the question text based on the feature similarity between each candidate entity and the question text;
the interaction module comprises an attention submodule or a similarity measurement submodule, the attention submodule is used for performing attention interaction with the first feature representation respectively based on the segment feature representations to obtain the weight of each segment, and the similarity measurement submodule is used for performing similarity measurement with the first feature representation respectively based on the segment feature representations to obtain the weight of each segment; the interaction module comprises a weighting submodule for weighting the segment feature representation of the segment based on the weight of the segment to obtain the second feature representation.
8. An electronic device comprising a memory and a processor coupled to each other, the memory having stored therein program instructions, the processor being configured to execute the program instructions to implement the question answering method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that it stores program instructions executable by a processor, the program instructions being for implementing the question answering method according to any one of claims 1 to 6.
CN202210271016.0A 2022-03-18 2022-03-18 Question answering method and related device, electronic equipment and storage medium Active CN114547273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210271016.0A CN114547273B (en) 2022-03-18 2022-03-18 Question answering method and related device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210271016.0A CN114547273B (en) 2022-03-18 2022-03-18 Question answering method and related device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114547273A CN114547273A (en) 2022-05-27
CN114547273B (en) 2022-08-16

Family

ID=81663276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210271016.0A Active CN114547273B (en) 2022-03-18 2022-03-18 Question answering method and related device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114547273B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117216194B (en) * 2023-11-08 2024-01-30 天津恒达文博科技股份有限公司 Knowledge question-answering method and device, equipment and medium in literature and gambling field


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619051B (en) * 2019-08-16 2023-08-04 科大讯飞(苏州)科技有限公司 Question sentence classification method, device, electronic equipment and storage medium
CN111488441B (en) * 2020-04-08 2023-08-01 北京百度网讯科技有限公司 Question analysis method and device, knowledge graph question answering system and electronic equipment
CN112487827B (en) * 2020-12-28 2024-07-02 科大讯飞华南人工智能研究院(广州)有限公司 Question answering method, electronic equipment and storage device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472242A (en) * 2019-08-05 2019-11-19 腾讯科技(深圳)有限公司 A kind of text handling method, device and computer readable storage medium
CN110991464A (en) * 2019-11-08 2020-04-10 华南理工大学 Commodity click rate prediction method based on deep multi-mode data fusion
CN113139121A (en) * 2020-01-20 2021-07-20 阿里巴巴集团控股有限公司 Query method, model training method, device, equipment and storage medium
CN111639171A (en) * 2020-06-08 2020-09-08 吉林大学 Knowledge graph question-answering method and device
CN111930906A (en) * 2020-07-29 2020-11-13 北京北大软件工程股份有限公司 Knowledge graph question-answering method and device based on semantic block
CN113516143A (en) * 2020-11-26 2021-10-19 腾讯科技(深圳)有限公司 Text image matching method and device, computer equipment and storage medium
CN113761153A (en) * 2021-05-19 2021-12-07 腾讯科技(深圳)有限公司 Question and answer processing method and device based on picture, readable medium and electronic equipment
CN113836318A (en) * 2021-09-26 2021-12-24 合肥智能语音创新发展有限公司 Dynamic knowledge graph completion method and device and electronic equipment
CN114186035A (en) * 2021-11-17 2022-03-15 泰康保险集团股份有限公司 Problem information acquisition method and device, electronic equipment and storage medium
CN114168749A (en) * 2021-12-06 2022-03-11 北京航空航天大学 Question generation system based on knowledge graph and question word drive

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An Intelligent Question Answering System based on Power Knowledge Graph; Yachen Tang et al.; 2021 IEEE Power & Energy Society General Meeting; 2021-12-20; pp. 1-5 *
Chinese Medical Question Answer Matching Method Based on Knowledge Graph and Keyword Attention Mechanism; Qiao Kai et al.; Pattern Recognition and Artificial Intelligence; 2021-08-15; Vol. 34, No. 08, pp. 733-741 *
Key Technologies and Applications of Adaptive Learning for Intelligent Education; Chen Enhong et al.; CAAI Transactions on Intelligent Systems; 2021-07-13; Vol. 16, No. 05, pp. 886-898 *

Also Published As

Publication number Publication date
CN114547273A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN111708873B (en) Intelligent question-answering method, intelligent question-answering device, computer equipment and storage medium
US11301637B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
WO2019242297A1 (en) Method for intelligent dialogue based on machine reading comprehension, device, and terminal
CN113961705B (en) Text classification method and server
WO2020237869A1 (en) Question intention recognition method and apparatus, computer device, and storage medium
CN107832326B (en) Natural language question-answering method based on deep convolutional neural network
CN113535974B (en) Diagnostic recommendation method and related device, electronic equipment and storage medium
EP4068113A1 (en) Method for determining text similarity, method for obtaining semantic answer text, and question answering method
JP2005122533A (en) Question-answering system and question-answering processing method
US11650979B2 (en) Assigning a new entigen to a word group
CN108287875B (en) Character co-occurrence relation determining method, expert recommending method, device and equipment
CN110347802B (en) Text analysis method and device
CN112685550B (en) Intelligent question-answering method, intelligent question-answering device, intelligent question-answering server and computer readable storage medium
CN117521814B (en) Question answering method and device based on multi-modal input and knowledge graph
CN115146068B (en) Method, device, equipment and storage medium for extracting relation triples
CN113220832A (en) Text processing method and device
WO2021174923A1 (en) Concept word sequence generation method, apparatus, computer device, and storage medium
Yang et al. Place deduplication with embeddings
CN114547273B (en) Question answering method and related device, electronic equipment and storage medium
Zhao et al. Interactive attention networks for semantic text matching
CN113342944B (en) Corpus generalization method, apparatus, device and storage medium
CN110969005A (en) Method and device for determining similarity between entity corpora
Liu et al. Attention based r&cnn medical question answering system in chinese
CN110334204B (en) Exercise similarity calculation recommendation method based on user records

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant