CN109033318B - Intelligent question and answer method and device

Intelligent question and answer method and device

Info

Publication number
CN109033318B
CN109033318B (application CN201810790249.5A)
Authority
CN
China
Prior art keywords
text
similarity
question
context
word segmentation
Prior art date
Legal status
Active
Application number
CN201810790249.5A
Other languages
Chinese (zh)
Other versions
CN109033318A (en)
Inventor
余军
罗长寿
郑亚明
魏清凤
王富荣
曹承忠
陆阳
郭强
于维水
王静宇
Current Assignee
Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Beijing Academy of Agriculture and Forestry Sciences
Priority date
Filing date
Publication date
Application filed by Beijing Academy of Agriculture and Forestry Sciences filed Critical Beijing Academy of Agriculture and Forestry Sciences
Priority to CN201810790249.5A priority Critical patent/CN109033318B/en
Publication of CN109033318A publication Critical patent/CN109033318A/en
Application granted granted Critical
Publication of CN109033318B publication Critical patent/CN109033318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent question answering method and device. Word segmentation is performed on the text of the question to be answered, and a context for semantic similarity judgment is determined according to the word segmentation result of the question to be answered; a number of common questions are collected according to the context; word segmentation is performed on the texts of all the common questions, and a context map is established according to the word segmentation results of all the common questions; for any one of the common questions, the semantic similarity between the common question and the question to be answered is calculated according to the word segmentation result of the common question, the word segmentation result of the question to be answered, and the context map; and the candidate answer corresponding to the common question with the highest similarity is taken as the answer to the question to be answered. The embodiment of the invention can analyze the question to be answered more accurately and provide the answer.

Description

Intelligent question and answer method and device
Technical Field
The invention relates to the technical field of natural language processing, in particular to an intelligent question answering method and device.
Background
In general chat-oriented question answering systems, the answers that are pushed are highly random, but in professional application fields the reply content must be accurate. Research on using a computer to compare a user question semantically with the existing sentences in a sentence library is referred to as sentence similarity research; as a key problem in natural language processing, it has long been both a research hotspot and a difficulty. Besides computing sentence similarity by mining inter-word relationships and the degree of overlap between sentences (e.g., approaches relying on the WordNet or HowNet architectures and on corpora), sentence similarity research has also begun to develop around feature extraction based on neural networks.
Scholars have conducted extensive research on methods for computing the semantic similarity of words. The first is the statistical method based on word co-occurrence, which mainly performs statistics on word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient method, and Metzler's improved method based on overlap. These methods are simple and efficient to implement, but completely ignore the lexical and semantic information of sentences. The second is an approach based on lexical and semantic information, which takes the relevant elements of semantic information into account but is relatively complex to construct, such as ontology-based semantic similarity calculation. The third, feature extraction by training neural networks on corpora, has also developed vigorously in recent years, such as Word2vec-based sentence semantic similarity calculation; it depends on the quality and quantity of the corpus, emphasizes feature extraction, neglects comprehension of the sentence meaning, and cannot mine the genuine semantics. The fourth is a comprehensive fusion approach, such as sentence semantic similarity calculation based on multi-feature fusion. As research has progressed and application experience has accumulated, it has been found in practice that when these methods are divorced from the application scenario, the algorithms are complex to implement or inefficient, many uncertain factors interfere, and certain operational limitations exist. The prior art therefore provides "a word similarity calculation method based on context", which builds on a similarity calculation method, introduces the context of words, and adopts concepts from fuzzy mathematics to evaluate word-sense similarity. It constructs the fuzzy importance of words in their context through membership determination and improves the word-level contribution to sentence meaning similarity, but falls short on the overall sentence-level meaning similarity.
Disclosure of Invention
The present invention provides an intelligent question-answering method and apparatus that overcomes, or at least partially solves, the above-mentioned problems.
According to a first aspect of the present invention, there is provided an intelligent question answering method, comprising:
performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved;
collecting a number of common questions according to the context;
performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems;
for any one of the common questions, calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map;
taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved;
wherein the context map is an undirected graph representing the combined relationship between the participles of all the common questions.
According to a second aspect of the present invention, there is provided an intelligent question-answering device, comprising:
the context acquisition module is used for determining a context for semantic similarity judgment according to the problem to be solved;
a common question acquisition module for collecting a certain number of common questions according to the context;
the contextual graph acquisition module is used for performing word segmentation processing on the texts of all the common problems and establishing a contextual graph according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
the similarity calculation module is used for calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the contextual graph for any one of the common questions;
and the answer matching module is used for taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered.
According to a third aspect of the present invention, there is also provided an electronic apparatus comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the intelligent question and answer method provided by any one of the various possible implementations of the first aspect.
According to a fourth aspect of the present invention, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the smart question-and-answer method provided in any one of the various possible implementations of the first aspect.
The intelligent question answering method and device perform context analysis on the question to be answered to obtain the context used for the subsequent semantic similarity judgment; this context is clearly related to the question to be answered. A certain number of common questions with the same or a similar context are then obtained, so that the question to be answered and the common questions are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. A context map is then constructed from the word segmentation results of the obtained common questions. In embodiments of the present invention, the context map is constructed based on a large number of common questions and characterizes big data; the context in the embodiment of the present invention is a macroscopic context, which is completely different from the existing context constructed only from the contexts of the question to be answered and the common question whose semantic similarity is to be compared. The embodiment of the invention can analyze the question to be answered more accurately and provide the answer.
Drawings
FIG. 1 is a schematic flow chart of an intelligent question answering method according to an embodiment of the present invention;
FIG. 2 is a context diagram according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating a process of calculating similarity between a common question and a question to be solved according to a segmentation result of the common question, a segmentation result of the question to be solved and a context map according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process of obtaining a similarity between any participle of the first text and any participle of the second text according to a context map to calculate an offset similarity between the first text and the second text according to an embodiment of the present invention;
FIG. 5 is a functional block diagram of an intelligent question answering device according to an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In the prior art, the following methods are used for calculating the semantic similarity of words. The first is a statistical method based on word co-occurrence, which mainly performs statistics on word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient method, and Metzler's improved method based on overlap. These methods are simple and efficient to implement, but completely ignore the lexical and semantic information of sentences. The second is an approach based on lexical and semantic information, which takes the relevant elements of semantic information into account but is relatively complex to construct, such as ontology-based semantic similarity calculation. The third is feature extraction by training neural networks on corpora, which has developed vigorously in recent years, such as Word2vec-based sentence semantic similarity calculation; it depends on the quality and quantity of the corpus, emphasizes feature extraction, neglects comprehension of the sentence meaning, and cannot mine the genuine semantics. The fourth is a comprehensive fusion approach, such as sentence semantic similarity calculation based on multi-feature fusion. As research has progressed and application experience has accumulated, it has been found in practice that when these methods are divorced from the application scenario, the algorithms are complex to implement or inefficient, many uncertain factors interfere, and certain operational limitations exist. The prior art therefore provides "a word similarity calculation method based on context", which builds on a similarity calculation method, introduces the context of words, and adopts concepts from fuzzy mathematics to evaluate word-sense similarity calculation. It constructs the fuzzy importance of words in their context through membership determination and improves the word-level contribution to sentence meaning similarity, but falls short on the overall sentence-level meaning similarity.
In order to overcome the above problems in the prior art, embodiments of the present invention provide an intelligent question answering method based on semantic similarity calculation. The idea of the invention is to perform context analysis on the question to be solved to obtain the context used for the subsequent semantic similarity judgment; this context is clearly related to the question to be solved. A certain number of common questions with the same or a similar context are then obtained, so that the question to be solved and the common questions are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. A context map is then constructed from the word segmentation results of the obtained common questions. In embodiments of the present invention, the context map is constructed based on a large number of common questions and characterizes big data, which is completely different from the existing context constructed only from the contexts of the question to be solved and the common question whose semantic similarity is to be compared; the context in embodiments of the present invention is macroscopic. The embodiment of the invention can analyze the question to be solved more accurately and provide the answer.
Fig. 1 is a flowchart illustrating an intelligent question answering method according to an embodiment of the present invention, as shown in the figure, including:
s101, performing word segmentation processing on the text of the question to be solved, and determining a context for semantic similarity judgment according to the word segmentation result of the question to be solved.
Specifically, the process of acquiring the text of the question to be solved in the embodiment of the present invention may be:
and receiving text data of the question to be solved as a text of the question to be solved.
And receiving voice data of the question to be solved, carrying out voice recognition on the voice data to obtain text data subjected to voice recognition, and taking the text data subjected to voice recognition as a text of the question to be solved.
It should be understood that the above-described process of obtaining the text of the question to be solved is only a few possible implementations, and should not constitute any limitation to the embodiments of the present invention.
To describe the basic principle of the embodiments of the present invention more conveniently, the text of the question to be solved is referred to as the first text p1. Using existing word segmentation technology, p1 is segmented into the participles S1, S2, ..., Sm, where m is the number of participles obtained by segmenting p1; the participles of the text and their number are thus obtained.
The context for semantic similarity judgment in the embodiment of the present invention is determined according to the word segmentation result of the first text. The word segmentation result may reveal the technical field, environment, topic, mood and other information of the question to be solved. For example, if the first text is "a method for culturing tomato seedlings in a greenhouse", its word segmentation result is: tomato, greenhouse, seedling raising, method. By analyzing this word segmentation result, it can be determined that the context of the first text is agricultural cultivation, and specifically the field of tomato cultivation.
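As an illustration of step S101, the sketch below segments the question text and maps the resulting participles to a context; the jieba segmentation library and the CONTEXT_KEYWORDS lookup table are assumptions for illustration only and are not part of the patent.

```python
# Minimal sketch of S101. Assumptions: jieba for Chinese word segmentation,
# CONTEXT_KEYWORDS as a hypothetical keyword-to-context lookup table.
import jieba

CONTEXT_KEYWORDS = {
    "番茄": "tomato cultivation",   # tomato
    "温室": "tomato cultivation",   # greenhouse
    "育苗": "tomato cultivation",   # seedling raising
}

def segment(text: str) -> list[str]:
    """Return the word segmentation result of a text."""
    return [w for w in jieba.lcut(text) if w.strip()]

def determine_context(question: str) -> str:
    """Map the segmented question to a context for the similarity judgment."""
    for w in segment(question):
        if w in CONTEXT_KEYWORDS:
            return CONTEXT_KEYWORDS[w]
    return "general"

# Example: determine_context("温室番茄育苗的方法") -> "tomato cultivation"
```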
S102, collecting a certain number of common problems according to the context.
It should be noted that, after determining the context, the embodiment of the present invention collects a certain number of common questions from a preset database. It will be appreciated that a vast number of frequently asked questions, along with the answer to each, are stored in the database; these common questions and answers may be collected from the internet through web crawler processing methods. In the above example, the question to be solved is determined to belong to the field of tomato cultivation, so a certain number of common questions in the tomato cultivation field can be retrieved from the database. It should be understood that the above-described process of collecting a certain number of common questions according to the context is only one possible implementation and should not constitute any limitation to the present application.
S103, performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems.
Specifically, the process of performing word segmentation processing on the text with the common problems may refer to the description of the above embodiments, and is not described herein again.
It should be noted that the context map in the embodiment of the present invention is a net map, vertices in the net map are participles, and edges or arcs connecting the words indicate that a combination relationship (which may also be a weight relationship, and this is not limited by the embodiment of the present invention) exists between two words. In the embodiment of the present invention, the context map is an undirected graph, and if the context-relationship undirected graph G has n vertices (i.e., n different words), the adjacency matrix is an n × n square matrix defined as:
g[i][j] = 1, if (Vi, Vj) ∈ E; g[i][j] = 0, otherwise;
where g[i][j] represents the value, in the adjacency matrix, of the word pair formed by participle i and participle j, and E denotes the set of word pairs that have a combination relationship.
For example, suppose there are two common-question texts, hereinafter referred to as sample text 1 and sample text 2. Sample text 1: a method for growing seedlings of tomatoes in a greenhouse; sample text 2: a method for culturing tomato seedlings. After word segmentation, stop-word removal and feature-word extraction, four words are obtained: tomato, greenhouse, seedling raising and method. For convenience of expression they are denoted V1 (tomato), V2 (greenhouse), V3 (seedling raising) and V4 (method). The context map generated by the edge relationships (V1V2), (V1V3), (V2V3) and (V3V4) (the embodiment of the present invention does not consider positional directionality, so the map is undirected) is shown in FIG. 2, and the corresponding adjacency matrix is as follows:
        V1  V2  V3  V4
V1  [   0   1   1   0  ]
V2  [   1   0   1   0  ]
V3  [   1   1   0   1  ]
V4  [   0   0   1   0  ]
After the context map is converted into the adjacency matrix, the degree of any vertex (word) can be obtained, i.e., for vertex Vi the degree (the number of adjacent words) is the sum of the elements in the i-th row of the adjacency matrix. Example: the degree of V1 is 2, the degree of V2 is 2, the degree of V3 is 3, and the degree of V4 is 1. All the adjacent points of vertex Vi are found by scanning the elements of the i-th row of the adjacency matrix, and the word set formed by all the adjacent points is the context word set of that word: the context word set of V1 includes V2 and V3; the context word set of V2 includes V1 and V3; the context word set of V3 includes V1, V2 and V4; and the context word set of V4 includes V3.
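A sketch of step S103 is given below. The rule that two participles are combined when they appear next to each other in a question's segmentation result is an assumption inferred from the V1 to V4 example above; the patent text itself only speaks of a combination relationship.

```python
# Sketch of S103: build the context map as an undirected adjacency matrix.
# Assumption: two participles have a "combination relationship" when they
# appear consecutively in a question's segmentation result (inferred from
# the V1 to V4 example; the patent does not state the rule as code).
def build_context_map(segmented_questions: list[list[str]]):
    vocab = sorted({w for words in segmented_questions for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    g = [[0] * n for _ in range(n)]          # n x n adjacency matrix
    for words in segmented_questions:
        for a, b in zip(words, words[1:]):   # consecutive participles
            if a != b:
                g[index[a]][index[b]] = 1
                g[index[b]][index[a]] = 1    # undirected: symmetric entries
    return vocab, index, g

def adjacent_points(word: str, vocab: list[str], index: dict, g) -> set[str]:
    """Context word set of `word`: all vertices adjacent to it in the map."""
    i = index[word]
    return {vocab[j] for j in range(len(vocab)) if g[i][j] == 1}
```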
And S104, for any one common question, calculating the semantic similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map.
It should be noted that, when calculating the semantic similarity, the embodiment of the present invention performs calculation by mapping the segmentation results of the to-be-solved question and the common question to the corresponding contexts, so as to improve the precision of the difference analysis between the questions, and improve the accuracy of calculating the semantic similarity.
S105, taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered;
specifically, after semantic similarity judgment is performed on each of the common questions and the question to be answered, a common question with the highest similarity can be obtained, and the answer of the common question with the highest similarity is used as the answer of the question to be answered, so that the intelligent question answering effect is achieved.
Based on the content of the above embodiments, as an alternative embodiment, the process of calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map involves two levels of calculation, namely the presentation layer similarity and the semantic layer similarity. The presentation layer similarity refers to the morphological similarity of two sentences and is measured by the number of identical words or synonyms the two sentences contain and by the relative positions of those words in the sentences. The semantic layer refers to meaning that cannot be read directly from the surface form and must be understood as the semantics implied by the sentence. There are various methods for calculating the presentation layer similarity, such as cosine similarity and generalized Jaccard similarity; the semantic layer similarity can use a semantic dictionary and the word-sense context.
Fig. 3 is a schematic flow chart illustrating a process of calculating semantic similarity between the frequently asked question and the question to be solved according to the word segmentation result of the frequently asked question, the word segmentation result of the question to be solved, and the context map, in an embodiment of the present invention, as shown in fig. 3, specifically:
s301, calculating cosine similarity of the first text and the second text according to the context map. Wherein, the first text is the text of the file to be solved, the second text is the text of the common question, and p is used2And (4) showing. Wherein p is2Is W1、W2、…WnN is from p2And the number of the participles obtained by the participles.
It should be noted that cosine similarity is a cosine value of an included angle between two vectors, and the cosine similarity is used to represent a difference degree between two sentences; cosine similarity focuses on the difference in direction of vectors, i.e., the difference in trend, rather than the magnitude of absolute distance. The formula is as follows:
Cosin(p1, p2) = Σ_i (x_i · y_i) / ( sqrt(Σ_i x_i^2) · sqrt(Σ_i y_i^2) )
where x_i represents the TF-IDF weight of the i-th participle of the first text p1 and y_i represents the TF-IDF weight of the i-th participle of the second text p2. TF-IDF (term frequency-inverse document frequency) is a commonly used weighting technique in information retrieval and data mining: TF is the term frequency and IDF is the inverse document frequency. Because the context map is a word-set relation map, once a sentence has been segmented, TF-IDF can be used to weight the words in the sentence and extract its key words. After word extraction, the cosine-angle similarity measure of the space vectors is not affected by the index scale, and the cosine value falls in the interval [0, 1]; the larger the value, the smaller the difference.
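A minimal sketch of the cosine term, assuming the two texts' participles have been projected onto a shared vocabulary so that the TF-IDF weight vectors x and y are aligned (that alignment is an assumption of the sketch, not spelled out above):

```python
# Cosine similarity of two aligned TF-IDF weight vectors.
import math

def cosine_similarity(x: list[float], y: list[float]) -> float:
    """x[i], y[i]: TF-IDF weights of the i-th vocabulary word in p1 and p2."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)
```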
S302, obtaining the similarity of any word segmentation of the first text and any word segmentation of the second text according to the context map so as to calculate the offset similarity of the first text and the second text.
It should be noted that, when calculating the offset similarity, the embodiment of the present invention is obtained according to the similarity of the segmented words in the two texts in the context map, and since the context map records the adjacent point (i.e., the context word set) of each segmented word, the similarity of the two texts in the word position relationship can be determined by comparing the approximate conditions of the adjacent points between every two segmented words.
And S303, obtaining the context word sets of all the participles which are not present in the second text in the first text and the context word sets of all the participles which are not present in the first text in the second text according to the context map so as to calculate the semantic layer similarity of the first text and the second text.
It should be noted that, the semantic layer similarity represents the relationship between the implied semantics of the two texts, and since the information cannot be directly translated literally, in the embodiment of the present invention, the context word sets of all the participles that are not present in the other text in each text are respectively obtained through the context map, and the semantic layer similarity is calculated through the two context word sets.
S304, calculating the semantic similarity of the first text and the second text according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
According to the method provided by the embodiment of the present invention, the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text are respectively obtained through the context map. The similarity of the participles of the two texts in terms of the space-vector cosine angle and positional relationship, and the semantic-layer similarity of the words the texts do not share, are thus captured, and the semantic similarity is finally obtained, which can improve the reliability and accuracy of the similarity judgment.
Based on the content of the above embodiment, as an optional embodiment, the method for obtaining the TF-IDF weight of the participle in the first/second text specifically includes:
forming word set A by using adjacent points of all the participles in the first text on the context map, and forming word set B by using adjacent points of all the participles in the second text on the context map;
all participles in the word set A and the word set B form a word set T, where T = A ∪ B;
forming word set C by using adjacent points of the participles which do not exist in the second text in the first text on the context map;
and forming word set D by the adjacent points of the participles which are not existed in the first text in the second text on the context map.
For a participle x_i in the first/second text, the adjacent points of x_i on the context map are obtained to form a word set E, and the coincidence degree between the participles of word set E and word set T is taken as the TF value of x_i; lg(n_T / n_{E∩T}) is taken as the IDF value of x_i; and the product of the TF value and the IDF value is taken as the TF-IDF weight of x_i, where n_T represents the total number of participles in word set T and n_{E∩T} represents the total number of participles common to word set E and word set T.
The method for obtaining the TF-IDF weights of the participles in the first/second text in the embodiment of the present invention incorporates the combination relationships of the participles in the context map, that is, the TF-IDF weights are obtained in combination with the context in which the texts are located, which can further improve the precision of the cosine similarity of the texts.
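The word sets A, B, T and E above translate into a short sketch; the exact definition of the "coincidence degree" used as the TF value is not spelled out, so the ratio |E ∩ T| / |E| below is an assumption, while the IDF term follows lg(n_T / n_{E∩T}) as stated. adjacent_points is the helper from the context-map sketch.

```python
# Context-map TF-IDF sketch. Assumption: "coincidence degree" = |E ∩ T| / |E|.
import math

def word_set_T(words_p1: list[str], words_p2: list[str],
               vocab, index, g) -> set[str]:
    """T = A ∪ B: adjacent points of all participles of both texts."""
    a = set().union(*(adjacent_points(w, vocab, index, g)
                      for w in words_p1 if w in index))
    b = set().union(*(adjacent_points(w, vocab, index, g)
                      for w in words_p2 if w in index))
    return a | b

def context_tfidf(word: str, t: set[str], vocab, index, g) -> float:
    if word not in index:
        return 0.0
    e = adjacent_points(word, vocab, index, g)    # word set E
    common = e & t                                # E ∩ T
    if not e or not common:
        return 0.0
    tf = len(common) / len(e)                     # assumed coincidence degree
    idf = math.log10(len(t) / len(common))        # lg(n_T / n_{E∩T})
    return tf * idf
```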
Based on the content of the foregoing embodiment, as an optional embodiment, a similarity between any participle of the first text and any participle of the second text is obtained according to the context map to calculate an offset similarity between the first text and the second text, as shown in fig. 4, specifically:
s401, according to the first text p1The total number m of the word segmentation in the first text and the length len (P) of the first text are obtained as the word segmentation result1) And word segmentation SiRelative position pos (S) in the first texti)。
It should be noted that the word segmentation SiRelative position pos (S) in the first texti) By the formula
Figure GDA0001835606080000101
And calculating, wherein i represents the position of the participle in the first text.
S402, according to the word segmentation result of the second text p2, obtaining the total number n of participles in the second text, the length len(p2) of the second text, and the relative position pos(W_j) of each participle W_j in the second text.
Likewise, the relative position pos(W_j) of participle W_j in the second text is calculated by the formula
[formula not reproduced in the text: pos(W_j) is a function of the position j of the participle and the length len(p2) of the second text]
where j represents the position of the participle in the second text. It should be noted that the embodiment of the present invention does not limit the order of steps S401 and S402.
S403, calculating the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map.
It should be noted that, unlike the prior art, which calculates the similarity between participles only from the participles' own contexts, the embodiment of the present invention obtains the adjacent points of participle S_i and participle W_j from the context map and derives the similarity sim(S_i, W_j) by comparing their adjacent-point data, thereby realizing similarity judgment of the participles within the macroscopic context.
S404, according to the formula
[formula not reproduced in the text: the offset similarity Sim_p(p1, p2) aggregates the word similarities sim(S_i, W_j) over the word pairs, weighted by how close the relative positions pos(S_i) and pos(W_j) are]
computing the offset similarity Sim_p(p1, p2) of the first text p1 and the second text p2.
It should be noted that, as can be seen from the offset similarity formula, when the similarity of two participles is held fixed, the more consistent their relative positions, the greater the total offset similarity; and when the relative positions are held fixed, the greater the similarity of the participles, the greater the total offset similarity.
According to the method for calculating the offset similarity, the offset similarity of the two texts is obtained from the context map, and compared with the offset similarity obtained by only considering the context relation of word segmentation in the prior art, the difference precision between the texts is further improved, so that the semantic similarity calculation accuracy is higher.
Based on the content of the above embodiments, as an alternative embodiment, calculating the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map specifically includes:
obtaining the adjacent points π(S_i) of participle S_i on the context map and their degree len(π(S_i));
obtaining the adjacent points π(W_j) of participle W_j on the context map and their degree len(π(W_j));
and calculating the similarity sim(S_i, W_j) according to the formula
[formula not reproduced in the text: sim(S_i, W_j) combines the common adjacent points T(π(S_i) ∩ π(W_j)) with the degrees len(π(S_i)) and len(π(W_j))]
where T(π(S_i) ∩ π(W_j)) represents the adjacent points common to participle S_i and participle W_j.
Compared with the prior art, which considers only the participles' own context relations, this way of calculating the word similarity further improves the precision of the difference analysis between the texts and makes the semantic similarity calculation more accurate.
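Because the formula images for sim(S_i, W_j) and Sim_p(p1, p2) are not reproduced above, the sketch below substitutes assumed formulas built only from the stated ingredients: a Dice-style ratio of common adjacent points to the two degrees for the word similarity, and a position-weighted average for the offset similarity, with the relative position taken as the word index over the participle count. None of these exact forms should be read as the patent's own equations.

```python
# Assumed stand-ins for the unpublished formulas (see lead-in above).
def word_similarity(si: str, wj: str, vocab, index, g) -> float:
    if si == wj:
        return 1.0
    if si not in index or wj not in index:
        return 0.0
    pi_s = adjacent_points(si, vocab, index, g)   # π(S_i)
    pi_w = adjacent_points(wj, vocab, index, g)   # π(W_j)
    if not pi_s or not pi_w:
        return 0.0
    # assumed Dice-style use of |π(S_i) ∩ π(W_j)| and the two degrees
    return 2 * len(pi_s & pi_w) / (len(pi_s) + len(pi_w))

def offset_similarity(words_p1: list[str], words_p2: list[str],
                      vocab, index, g) -> float:
    m, n = len(words_p1), len(words_p2)
    if m == 0 or n == 0:
        return 0.0
    total = 0.0
    for i, si in enumerate(words_p1, start=1):
        pos_s = i / m                                # assumed relative position
        for j, wj in enumerate(words_p2, start=1):
            pos_w = j / n
            sim = word_similarity(si, wj, vocab, index, g)
            total += sim * (1 - abs(pos_s - pos_w))  # assumed position weighting
    return total / (m * n)
```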
Based on the content of the foregoing embodiment, as an optional embodiment, the context word sets of all the segmented words that are not present in the second text in the first text and the context word sets of all the segmented words that are not present in the first text in the second text are obtained according to the context map, so as to calculate the semantic layer similarity between the first text and the second text, specifically:
in the first text p1, the participles that do not exist in the second text p2 are obtained to form the first participle set, and the context words of all participles in the first participle set are obtained on the context map to form the first context word set π(P1); in the second text p2, the participles that do not exist in the first text p1 are obtained to form the second participle set, and the context words of all participles in the second participle set are obtained on the context map to form the second context word set π(P2).
Take as an example a first text of "a method for growing seedlings of tomatoes in a greenhouse" and a second text of "a method for growing seedlings of American tomatoes". The word segmentation result of the first text is: tomato, greenhouse, seedling raising, method; the word segmentation result of the second text is: America, tomato, seedling raising, method. The participle of the first text that does not exist in the second text is "greenhouse", and its context word set is obtained from the context map; similarly, the participle of the second text that does not exist in the first text is "America", and its context word set is likewise obtained from the context map.
According to the formula
Sim_L(p1, p2) = α × T(π(P1) ∩ π(P2)) / T(π(P1) ∪ π(P2))
the semantic layer similarity Sim_L(p1, p2) of the first text and the second text is calculated;
where α = 1 when p1 and p2 contain no antonyms, and α = -1 when p1 and p2 contain antonyms; T(π(P1) ∩ π(P2)) represents the context words common to π(P1) and π(P2); and T(π(P1) ∪ π(P2)) represents all the context words in π(P1) and π(P2).
It should be noted that, before the semantic layer similarity is calculated with the above formula, it is necessary to check in advance whether the first text and the second text contain antonyms; when antonyms are present, the semantics of the two texts are more likely to be opposite. Based on the proportion of the context words common to π(P1) and π(P2) among all the context words of π(P1) and π(P2), together with whether antonyms are present, the embodiment of the present invention calculates the semantic layer similarity. In combination with the context map, the method provided by the embodiment of the present invention analyzes with higher precision, at the semantic layer, the similarity of the words that the two sentences do not share.
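The semantic-layer term can be sketched directly from the sets above; the antonym check needs a lexicon, and the ANTONYMS set below is a hypothetical stand-in for whatever antonym resource an implementation would use.

```python
# Semantic-layer similarity: Sim_L = alpha * |common context words| / |all
# context words|, with alpha = -1 when the two texts contain an antonym pair.
# ANTONYMS is a hypothetical antonym lexicon (pairs stored as frozensets).
ANTONYMS: set[frozenset[str]] = set()   # e.g. {frozenset({"增加", "减少"})}

def contains_antonym_pair(words_p1: list[str], words_p2: list[str]) -> bool:
    return any(frozenset({a, b}) in ANTONYMS
               for a in words_p1 for b in words_p2)

def semantic_layer_similarity(words_p1, words_p2, vocab, index, g) -> float:
    only_p1 = [w for w in words_p1 if w not in words_p2]   # first participle set
    only_p2 = [w for w in words_p2 if w not in words_p1]   # second participle set
    pi_p1 = set().union(*(adjacent_points(w, vocab, index, g)
                          for w in only_p1 if w in index))  # π(P1)
    pi_p2 = set().union(*(adjacent_points(w, vocab, index, g)
                          for w in only_p2 if w in index))  # π(P2)
    union = pi_p1 | pi_p2
    if not union:
        return 0.0
    alpha = -1.0 if contains_antonym_pair(words_p1, words_p2) else 1.0
    return alpha * len(pi_p1 & pi_p2) / len(union)
```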
Based on the content of the above embodiment, as an optional embodiment, the semantic similarity between the first text and the second text is calculated according to the cosine similarity, the offset similarity, and the semantic layer similarity between the first text and the second text, specifically:
according to the formula: simb(p1,p2)=Cosin(p1,p2)+α1×Simp(p1,p2) Obtaining a first text p1And a second text p2Is the layer similarity Simb(p1,p2);
According to the formula: m (p)1,p2)=Simb(p1,p2)+β1×SimL(p1,p2) Obtaining a first text p1And a second text p2Semantic similarity m (p) of1,p2);
Wherein, Cosin (p)1,p2)、Simp(p1,p2) And SimL(p1,p2) Respectively representing a first text p1And a second text p2Cosine similarity, offset similarity and semantic layer similarity, alpha1Factor, β, representing the influence of offset similarity on the similarity of the representation layers1And representing the influence factor of the semantic layer similarity on the semantic similarity.
It should be noted that, in the embodiment of the present invention, the cosine similarity and the offset similarity together form the presentation layer similarity, and the semantic similarity is then obtained by combining the presentation layer similarity with the semantic layer similarity. The embodiment of the present invention fully considers the influence of the macroscopic context on the semantics and mines the semantics more deeply.
Based on the content of the above embodiments, as an alternative embodiment, practical analysis shows that the value of α1 should ensure that its product with the offset similarity remains smaller than the cosine similarity value, and that this product first grows as the cosine similarity increases from 0 and then shrinks once the cosine similarity exceeds a certain value. Therefore, the influence factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2).
Practical analysis likewise shows that the value of β1 should ensure that its product with the semantic layer similarity remains smaller than the presentation layer similarity, and that this product first grows as the presentation layer similarity increases from 0 and then shrinks once the presentation layer similarity exceeds a certain value. Therefore, the influence factor β1 is obtained according to the formula β1 = (1 - Sim_b(p1, p2)) × Sim_b(p1, p2).
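Since α1 and β1 are given in closed form, the final combination can be sketched directly; the only added assumption is using the union of the two texts' participles as the basis for the cosine vectors, reusing the helpers from the earlier sketches.

```python
# Combine the three similarities as described: alpha1 = (1 - Cosin) * Cosin,
# Sim_b = Cosin + alpha1 * Sim_p, beta1 = (1 - Sim_b) * Sim_b,
# m = Sim_b + beta1 * Sim_L.
def semantic_similarity(words_p1, words_p2, vocab, index, g) -> float:
    t = word_set_T(words_p1, words_p2, vocab, index, g)
    basis = sorted(set(words_p1) | set(words_p2))   # assumed cosine vector basis
    x = [context_tfidf(w, t, vocab, index, g) if w in words_p1 else 0.0
         for w in basis]
    y = [context_tfidf(w, t, vocab, index, g) if w in words_p2 else 0.0
         for w in basis]
    cosin = cosine_similarity(x, y)
    sim_p = offset_similarity(words_p1, words_p2, vocab, index, g)
    sim_l = semantic_layer_similarity(words_p1, words_p2, vocab, index, g)
    alpha1 = (1 - cosin) * cosin
    sim_b = cosin + alpha1 * sim_p                  # presentation layer similarity
    beta1 = (1 - sim_b) * sim_b
    return sim_b + beta1 * sim_l                    # semantic similarity m(p1, p2)
```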
According to another aspect of the present invention, an intelligent question-answering device is further provided in the embodiments of the present invention, and referring to fig. 5, fig. 5 shows a functional block diagram of the intelligent question-answering device in the embodiments of the present invention, and the system is used for matching answers according to semantic similarity of the question to be solved and the common question in the foregoing embodiments. Therefore, the descriptions and definitions in the intelligent question answering method in the foregoing embodiments can be used for understanding the execution modules in the embodiments of the present invention.
As shown in the figure, the intelligent question answering device comprises:
the context acquisition module 501 is configured to perform word segmentation processing on a text of a question to be solved, and determine a context for semantic similarity judgment according to a word segmentation result of the question to be solved;
a frequently asked questions obtaining module 502 for collecting a certain number of frequently asked questions according to the context;
a contextual graph obtaining module 503, configured to perform word segmentation processing on the texts of all the common problems, and establish a contextual graph according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
a similarity calculation module 504, configured to calculate, for any one of the common questions, a similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved, and the context map;
the answer matching module 505 is configured to use the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered.
In the intelligent question answering device of the embodiment of the present invention, the context acquisition module determines the context for semantic similarity judgment according to the question to be answered, and the common question acquisition module collects a certain number of common questions according to that context, so that the question to be answered and the common questions are mapped into the same context for analysis; this improves the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. The contextual graph acquisition module establishes a contextual graph according to the word segmentation results of the collected common questions, and the similarity calculation module calculates the similarity between the question to be answered and each common question according to the contextual graph. In the embodiment of the present invention, the context map is constructed based on a large number of common questions and characterizes big data, which is completely different from the existing context constructed only from the contexts of the question to be answered and the common question whose semantic similarity is to be compared; the context in the embodiment of the present invention is macroscopic. Finally, the answer matching module takes the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered. The embodiment of the present invention can thus analyze the question to be answered more accurately and provide an answer.
The embodiment of the invention provides electronic equipment. Referring to fig. 6, the apparatus includes: a processor (processor)601, a memory (memory)602, and a bus 603;
the processor 601 and the memory 602 complete communication with each other through the bus 603, respectively; the processor 601 is configured to call the program instructions in the memory 602 to execute the semantic similarity calculation method provided in the foregoing embodiments, for example, including: performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved; collecting a number of common questions according to the context; performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems; for any one of the common questions, calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map; taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved; wherein the context map is an undirected graph representing the combined relationship between the participles of all the common questions.
An embodiment of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, where the computer instructions cause a computer to execute the method for calculating semantic similarity provided in the foregoing embodiment, for example, the method includes: performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved; collecting a number of common questions according to the context; performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems; for any one of the common questions, calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map; taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved; wherein the context map is an undirected graph representing the combined relationship between the participles of all the common questions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An intelligent question answering method is characterized by comprising the following steps:
performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved;
collecting a number of common questions according to the context;
performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems;
for any one of the common questions, calculating semantic similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the contextual graph;
taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved;
wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
the calculating the semantic similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map specifically comprises the following steps:
calculating cosine similarity of the first text and the second text according to the contextual graph; the first text is the text of the question to be solved, and the second text is the text of the common question;
obtaining the similarity of any word segmentation of the first text and any word segmentation of the second text according to the context map so as to calculate the offset similarity of the first text and the second text;
obtaining context word sets of all the participles which are not present in the second text in the first text and context word sets of all the participles which are not present in the first text in the second text according to the context map so as to calculate semantic layer similarity of the first text and the second text;
and calculating the semantic similarity of the first text and the second text according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
2. The intelligent question-answering method according to claim 1, wherein the obtaining of the similarity between any participle of the first text and any participle of the second text according to the context map to calculate the offset similarity of the first text and the second text specifically comprises:
according to the word segmentation result of the first text p1, obtaining the total number m of participles in the first text, the length len(p1) of the first text, and the relative position pos(S_i) of each participle S_i in the first text;
according to the word segmentation result of the second text p2, obtaining the total number n of participles in the second text, the length len(p2) of the second text, and the relative position pos(W_j) of each participle W_j in the second text;
calculating the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map;
and computing the offset similarity Sim_p(p1, p2) of the first text p1 and the second text p2 according to the formula
[formula not reproduced in the text: Sim_p(p1, p2) aggregates sim(S_i, W_j) over the word pairs, weighted by how close pos(S_i) and pos(W_j) are].
3. The intelligent question-answering method according to claim 2, wherein the calculating of the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map specifically comprises:
obtaining the adjacent points π(S_i) of participle S_i on the context map and their degree len(π(S_i));
obtaining the adjacent points π(W_j) of participle W_j on the context map and their degree len(π(W_j));
and calculating the similarity sim(S_i, W_j) according to the formula
[formula not reproduced in the text: sim(S_i, W_j) combines the common adjacent points T(π(S_i) ∩ π(W_j)) with the degrees len(π(S_i)) and len(π(W_j))];
wherein T(π(S_i) ∩ π(W_j)) represents the adjacent points common to participle S_i and participle W_j.
4. The intelligent question-answering method according to claim 1, wherein the obtaining of the context word sets of all the participles in the first text that are not present in the second text and the context word sets of all the participles in the second text that are not present in the first text according to the context map to calculate the semantic layer similarity of the first text and the second text specifically comprises:
in the first text p1, obtaining the participles that do not exist in the second text p2 to form a first participle set, and obtaining the context words of all participles in the first participle set on the context map to form a first context word set π(P1);
in the second text p2, obtaining the participles that do not exist in the first text p1 to form a second participle set, and obtaining the context words of all participles in the second participle set on the context map to form a second context word set π(P2);
and calculating the semantic layer similarity Sim_L(p1, p2) of the first text and the second text according to the formula
Sim_L(p1, p2) = α × T(π(P1) ∩ π(P2)) / T(π(P1) ∪ π(P2));
wherein α = 1 when p1 and p2 contain no antonyms, and α = -1 when p1 and p2 contain antonyms; T(π(P1) ∩ π(P2)) represents the context words common to π(P1) and π(P2); and T(π(P1) ∪ π(P2)) represents all the context words in π(P1) and π(P2).
5. The intelligent question answering method according to claim 1, wherein the semantic similarity of the first text and the second text is calculated according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text, specifically:
according to the formula Sim_b(p1, p2) = Cosin(p1, p2) + α1 × Sim_p(p1, p2), obtaining the presentation layer similarity Sim_b(p1, p2) of the first text p1 and the second text p2;
according to the formula m(p1, p2) = Sim_b(p1, p2) + β1 × Sim_L(p1, p2), obtaining the semantic similarity m(p1, p2) of the first text p1 and the second text p2;
wherein Cosin(p1, p2), Sim_p(p1, p2) and Sim_L(p1, p2) respectively represent the cosine similarity, the offset similarity and the semantic layer similarity of the first text p1 and the second text p2, α1 represents the influence factor of the offset similarity on the presentation layer similarity, and β1 represents the influence factor of the semantic layer similarity on the semantic similarity.
6. The intelligent question-answering method according to claim 5, wherein
the influence factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2); and
the influence factor β1 is obtained according to the formula β1 = (1 - Sim_b(p1, p2)) × Sim_b(p1, p2).
7. An intelligent question answering device, comprising:
the context acquisition module is used for performing word segmentation processing on the text of the question to be solved and determining a context for performing semantic similarity judgment according to the word segmentation result of the question to be solved;
a common question acquisition module for collecting a certain number of common questions according to the context;
the contextual graph acquisition module is used for performing word segmentation processing on the texts of all the common problems and establishing a contextual graph according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
the similarity calculation module is used for calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the contextual graph for any one of the common questions;
the answer matching module is used for taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered;
wherein the similarity calculation module is specifically configured to:
calculating the cosine similarity of the first text and the second text according to the context map; wherein the first text is the text of the question to be solved, and the second text is the text of the common question;
obtaining the similarity of any word segmentation of the first text and any word segmentation of the second text according to the context map so as to calculate the offset similarity of the first text and the second text;
obtaining, according to the context map, the context word set of the participles of the first text that are not present in the second text and the context word set of the participles of the second text that are not present in the first text, so as to calculate the semantic layer similarity of the first text and the second text;
and calculating the semantic similarity of the first text and the second text according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
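As a rough, non-authoritative sketch of how the five modules of claim 7 could be wired together: the segmentation routine, candidate collection, context map construction and similarity scoring are all injected, since the claim does not prescribe any particular implementation of them.

```python
class IntelligentQA:
    def __init__(self, segment, collect_common_questions, build_context_map, score):
        self.segment = segment                    # word segmentation, e.g. a dictionary-based segmenter
        self.collect = collect_common_questions   # segmentation result -> list of (question, answer) pairs
        self.build_map = build_context_map        # segmented questions -> undirected context map
        self.score = score                        # (q_words, cand_words, context_map) -> semantic similarity

    def answer(self, question_text):
        q_words = self.segment(question_text)                    # context acquisition module
        candidates = self.collect(q_words)                       # common question acquisition module
        context_map = self.build_map(
            [self.segment(q) for q, _ in candidates])            # context map acquisition module
        scored = [(self.score(q_words, self.segment(q), context_map), ans)
                  for q, ans in candidates]                      # similarity calculation module
        if not scored:
            return None
        return max(scored, key=lambda t: t[0])[1]                # answer matching module
```

A concrete system would plug in, for example, the semantic similarity of claims 4 to 6 as the scoring function.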
8. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 6.
CN201810790249.5A 2018-07-18 2018-07-18 Intelligent question and answer method and device Active CN109033318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Publications (2)

Publication Number Publication Date
CN109033318A CN109033318A (en) 2018-12-18
CN109033318B (en) 2020-11-27

Family

ID=64643328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810790249.5A Active CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Country Status (1)

Country Link
CN (1) CN109033318B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840277A (en) * 2019-02-20 2019-06-04 西南科技大学 A kind of government affairs Intelligent Service answering method and system
CN109918494B (en) * 2019-03-22 2022-11-04 元来信息科技(湖州)有限公司 Context association reply generation method based on graph, computer and medium
CN110069613A (en) * 2019-04-28 2019-07-30 河北省讯飞人工智能研究院 A kind of reply acquisition methods and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9135240B2 (en) * 2013-02-12 2015-09-15 International Business Machines Corporation Latent semantic analysis for application in a question answer system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566998A (en) * 2009-05-26 2009-10-28 华中师范大学 Chinese question-answering system based on neural network
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
CN107766511A (en) * 2017-10-23 2018-03-06 深圳市前海众兴电子商务有限公司 Intelligent answer method, terminal and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Semantic Mining of Question-Answer Pairs in Web Communities; Wang Baoxun (王宝勋); China Doctoral Dissertations Full-text Database, Information Science and Technology; 2014-01-15; full text *

Also Published As

Publication number Publication date
CN109033318A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109145085B (en) Semantic similarity calculation method and system
US10831769B2 (en) Search method and device for asking type query based on deep question and answer
CN110750959B (en) Text information processing method, model training method and related device
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
CN110532571B (en) Text processing method and related device
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN103425635B (en) Method and device for recommending answers
CN109033318B (en) Intelligent question and answer method and device
CN110297893B (en) Natural language question-answering method, device, computer device and storage medium
CN110781413B (en) Method and device for determining interest points, storage medium and electronic equipment
CN109213853B (en) CCA algorithm-based Chinese community question-answer cross-modal retrieval method
CN112559684A (en) Keyword extraction and information retrieval method
US20210279622A1 (en) Learning with limited supervision for question-answering with light-weight markov models
CN107544958B (en) Term extraction method and device
CN108287875B (en) Character co-occurrence relation determining method, expert recommending method, device and equipment
CN107679070B (en) Intelligent reading recommendation method and device and electronic equipment
CN111198946A (en) Network news hotspot mining method and device
CN113220832A (en) Text processing method and device
CN117473053A (en) Natural language question-answering method, device, medium and equipment based on large language model
CN112434134A (en) Search model training method and device, terminal equipment and storage medium
CN110969005A (en) Method and device for determining similarity between entity corpora
CN111125329A (en) Text information screening method, device and equipment
CN115795018A (en) Multi-strategy intelligent searching question-answering method and system for power grid field
CN114372478A (en) Knowledge distillation-based question and answer method, terminal equipment and storage medium
CN111858895B (en) Sequencing model determining method, sequencing device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant