CN109033318B - Intelligent question and answer method and device

Intelligent question and answer method and device

Info

Publication number
CN109033318B
CN109033318B (application CN201810790249.5A)
Authority
CN
China
Prior art keywords
text
similarity
question
context
word segmentation
Prior art date
Legal status
Active
Application number
CN201810790249.5A
Other languages
Chinese (zh)
Other versions
CN109033318A (en)
Inventor
余军
罗长寿
郑亚明
魏清凤
王富荣
曹承忠
陆阳
郭强
于维水
王静宇
Current Assignee
Beijing Academy of Agriculture and Forestry Sciences
Original Assignee
Beijing Academy of Agriculture and Forestry Sciences
Priority date
Filing date
Publication date
Application filed by Beijing Academy of Agriculture and Forestry Sciences filed Critical Beijing Academy of Agriculture and Forestry Sciences
Priority to CN201810790249.5A priority Critical patent/CN109033318B/en
Publication of CN109033318A publication Critical patent/CN109033318A/en
Application granted granted Critical
Publication of CN109033318B publication Critical patent/CN109033318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent question answering method and device. Word segmentation is performed on the text of the question to be answered, and a context for semantic similarity judgment is determined according to the word segmentation result of the question to be answered; a number of common questions are collected according to the context; word segmentation is performed on the texts of all the common questions, and a context map is established according to the word segmentation results of all the common questions; for any one of the common questions, the semantic similarity between the common question and the question to be answered is calculated according to the word segmentation result of the common question, the word segmentation result of the question to be answered, and the context map; and the candidate answer corresponding to the common question with the highest similarity is taken as the answer to the question to be answered. The embodiment of the invention can analyze the question to be answered more accurately and provide the answer.

Description

Intelligent question and answer method and device
Technical Field
The invention relates to the technical field of natural language processing, in particular to an intelligent question answering method and device.
Background
In general chat-oriented question answering systems, the answers that are pushed are highly random, but in professional application fields the reply content must be accurate. Research on using a computer to compare a user question semantically with the existing sentences in a sentence library is referred to as sentence similarity research; as a key problem in natural language processing, it has long been both a research hotspot and a difficulty. Besides computing sentence similarity by mining inter-word relationships and the degree of overlap between sentences (e.g., approaches relying on the WordNet or HowNet architectures and on corpora), sentence similarity research has also begun to develop around feature extraction based on neural networks.
Scholars have conducted extensive research on methods for computing the semantic similarity of words. The first is the statistical method based on word co-occurrence, which mainly performs statistics on word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient method, and Metzler's improved method based on overlap. These methods are simple and efficient to implement, but completely ignore the lexical and semantic information of sentences. The second is an approach based on lexical and semantic information, which takes the relevant elements of semantic information into account but is relatively complex to construct, such as ontology-based semantic similarity calculation. The third, feature extraction by training neural networks on corpora, has also developed vigorously in recent years, such as Word2vec-based sentence semantic similarity calculation; it depends on the quality and quantity of the corpus, emphasizes feature extraction, neglects comprehension of the sentence meaning, and cannot mine the genuine semantics. The fourth is a comprehensive fusion approach, such as sentence semantic similarity calculation based on multi-feature fusion. As research has progressed and application experience has accumulated, it has been found in practice that when these methods are divorced from the application scenario, the algorithms are complex to implement or inefficient, many uncertain factors interfere, and certain operational limitations exist. The prior art therefore provides "a word similarity calculation method based on context", which builds on a similarity calculation method, introduces the context of words, and adopts concepts from fuzzy mathematics to evaluate word-sense similarity. It constructs the fuzzy importance of words in their context through membership determination and improves the word-level contribution to sentence meaning similarity, but falls short on the overall sentence-level meaning similarity.
Disclosure of Invention
The present invention provides an intelligent question-answering method and apparatus that overcomes, or at least partially solves, the above-mentioned problems.
According to a first aspect of the present invention, there is provided an intelligent question answering method, comprising:
performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved;
collecting a number of common questions according to the context;
performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems;
for any one of the common questions, calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map;
taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved;
wherein the context map is an undirected graph representing the combined relationship between the participles of all the common questions.
According to a second aspect of the present invention, there is provided an intelligent question-answering device, comprising:
the context acquisition module is used for determining a context for semantic similarity judgment according to the problem to be solved;
a common question acquisition module for collecting a certain number of common questions according to the context;
the contextual graph acquisition module is used for performing word segmentation processing on the texts of all the common problems and establishing a contextual graph according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
the similarity calculation module is used for calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the contextual graph for any one of the common questions;
and the answer matching module is used for taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered.
According to a third aspect of the present invention, there is also provided an electronic apparatus comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the intelligent question and answer method provided by any one of the various possible implementations of the first aspect.
According to a fourth aspect of the present invention, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the smart question-and-answer method provided in any one of the various possible implementations of the first aspect.
The intelligent question answering method and device perform context analysis on the question to be answered to obtain the context used for the subsequent semantic similarity judgment; this context is clearly related to the question to be answered. A certain number of common questions with the same or a similar context are then obtained, so that the question to be answered and the common questions are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. A context map is then constructed from the word segmentation results of the obtained common questions. In embodiments of the present invention, the context map is constructed based on a large number of common questions and characterizes big data; the context in the embodiment of the present invention is a macroscopic context, which is completely different from the existing context constructed only from the contexts of the question to be answered and the common question whose semantic similarity is to be compared. The embodiment of the invention can analyze the question to be answered more accurately and provide the answer.
Drawings
FIG. 1 is a schematic flow chart of an intelligent question answering method according to an embodiment of the present invention;
FIG. 2 is a context diagram according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating a process of calculating similarity between a common question and a question to be solved according to a segmentation result of the common question, a segmentation result of the question to be solved and a context map according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process of obtaining a similarity between any participle of the first text and any participle of the second text according to a context map to calculate an offset similarity between the first text and the second text according to an embodiment of the present invention;
FIG. 5 is a functional block diagram of an intelligent question answering device according to an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In the prior art, the following methods are used for calculating the semantic similarity of words. The first is a statistical method based on word co-occurrence, which mainly performs statistics on word frequencies in sentences, such as the TF-IDF algorithm, the Jaccard similarity coefficient method, and Metzler's improved method based on overlap. These methods are simple and efficient to implement, but completely ignore the lexical and semantic information of sentences. The second is an approach based on lexical and semantic information, which takes the relevant elements of semantic information into account but is relatively complex to construct, such as ontology-based semantic similarity calculation. The third is feature extraction by training neural networks on corpora, which has developed vigorously in recent years, such as Word2vec-based sentence semantic similarity calculation; it depends on the quality and quantity of the corpus, emphasizes feature extraction, neglects comprehension of the sentence meaning, and cannot mine the genuine semantics. The fourth is a comprehensive fusion approach, such as sentence semantic similarity calculation based on multi-feature fusion. As research has progressed and application experience has accumulated, it has been found in practice that when these methods are divorced from the application scenario, the algorithms are complex to implement or inefficient, many uncertain factors interfere, and certain operational limitations exist. The prior art therefore provides "a word similarity calculation method based on context", which builds on a similarity calculation method, introduces the context of words, and adopts concepts from fuzzy mathematics to evaluate word-sense similarity calculation. It constructs the fuzzy importance of words in their context through membership determination and improves the word-level contribution to sentence meaning similarity, but falls short on the overall sentence-level meaning similarity.
In order to overcome the above problems in the prior art, embodiments of the present invention provide an intelligent question answering method based on semantic similarity calculation. The idea of the invention is to perform context analysis on the question to be solved to obtain the context used for the subsequent semantic similarity judgment; this context is clearly related to the question to be solved. A certain number of common questions with the same or a similar context are then obtained, so that the question to be solved and the common questions are mapped into the same context for analysis, which improves the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. A context map is then constructed from the word segmentation results of the obtained common questions. In embodiments of the present invention, the context map is constructed based on a large number of common questions and characterizes big data, which is completely different from the existing context constructed only from the contexts of the question to be solved and the common question whose semantic similarity is to be compared; the context in embodiments of the present invention is macroscopic. The embodiment of the invention can analyze the question to be solved more accurately and provide the answer.
Fig. 1 is a flowchart illustrating an intelligent question answering method according to an embodiment of the present invention, as shown in the figure, including:
s101, performing word segmentation processing on the text of the question to be solved, and determining a context for semantic similarity judgment according to the word segmentation result of the question to be solved.
Specifically, the process of acquiring the text of the question to be solved in the embodiment of the present invention may be:
and receiving text data of the question to be solved as a text of the question to be solved.
And receiving voice data of the question to be solved, carrying out voice recognition on the voice data to obtain text data subjected to voice recognition, and taking the text data subjected to voice recognition as a text of the question to be solved.
It should be understood that the above-described process of obtaining the text of the question to be solved is only a few possible implementations, and should not constitute any limitation to the embodiments of the present invention.
To describe the basic principle of the embodiments of the present invention more conveniently, the text of the question to be solved is referred to as the first text p1. Using existing word segmentation technology, p1 is segmented into the participles S1, S2, ..., Sm, where m is the number of participles obtained by segmenting p1; the participles of the text and their number are thus obtained.
The context for semantic similarity judgment in the embodiment of the present invention is determined according to the word segmentation result of the first text. The word segmentation result may reveal the technical field, environment, topic, mood and other information of the question to be solved. For example, if the first text is "a method for culturing tomato seedlings in a greenhouse", its word segmentation result is: tomato, greenhouse, seedling raising, method. By analyzing this word segmentation result, it can be determined that the context of the first text is agricultural cultivation, and specifically the field of tomato cultivation.
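As an illustration of step S101, the sketch below segments the question text and maps the resulting participles to a context; the jieba segmentation library and the CONTEXT_KEYWORDS lookup table are assumptions for illustration only and are not part of the patent.

```python
# Minimal sketch of S101. Assumptions: jieba for Chinese word segmentation,
# CONTEXT_KEYWORDS as a hypothetical keyword-to-context lookup table.
import jieba

CONTEXT_KEYWORDS = {
    "番茄": "tomato cultivation",   # tomato
    "温室": "tomato cultivation",   # greenhouse
    "育苗": "tomato cultivation",   # seedling raising
}

def segment(text: str) -> list[str]:
    """Return the word segmentation result of a text."""
    return [w for w in jieba.lcut(text) if w.strip()]

def determine_context(question: str) -> str:
    """Map the segmented question to a context for the similarity judgment."""
    for w in segment(question):
        if w in CONTEXT_KEYWORDS:
            return CONTEXT_KEYWORDS[w]
    return "general"

# Example: determine_context("温室番茄育苗的方法") -> "tomato cultivation"
```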
S102, collecting a certain number of common problems according to the context.
It should be noted that, after determining the context, the embodiment of the present invention collects a certain number of common questions from a preset database. It will be appreciated that a vast number of frequently asked questions, along with the answer to each, are stored in the database; these common questions and answers may be collected from the internet through web crawler processing methods. In the above example, the question to be solved is determined to belong to the field of tomato cultivation, so a certain number of common questions in the tomato cultivation field can be retrieved from the database. It should be understood that the above-described process of collecting a certain number of common questions according to the context is only one possible implementation and should not constitute any limitation to the present application.
S103, performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems.
Specifically, the process of performing word segmentation processing on the text with the common problems may refer to the description of the above embodiments, and is not described herein again.
It should be noted that the context map in the embodiment of the present invention is a net map, vertices in the net map are participles, and edges or arcs connecting the words indicate that a combination relationship (which may also be a weight relationship, and this is not limited by the embodiment of the present invention) exists between two words. In the embodiment of the present invention, the context map is an undirected graph, and if the context-relationship undirected graph G has n vertices (i.e., n different words), the adjacency matrix is an n × n square matrix defined as:
g[i][j] = 1, if (Vi, Vj) ∈ E; g[i][j] = 0, otherwise;
where g[i][j] represents the value, in the adjacency matrix, of the word pair formed by participle i and participle j, and E denotes the set of word pairs that have a combination relationship.
For example, suppose there are two common-question texts, hereinafter referred to as sample text 1 and sample text 2. Sample text 1: a method for growing seedlings of tomatoes in a greenhouse; sample text 2: a method for culturing tomato seedlings. After word segmentation, stop-word removal and feature-word extraction, four words are obtained: tomato, greenhouse, seedling raising and method. For convenience of expression they are denoted V1 (tomato), V2 (greenhouse), V3 (seedling raising) and V4 (method). The context map generated by the edge relationships (V1V2), (V1V3), (V2V3) and (V3V4) (the embodiment of the present invention does not consider positional directionality, so the map is undirected) is shown in FIG. 2, and the corresponding adjacency matrix is as follows:
        V1  V2  V3  V4
V1  [   0   1   1   0  ]
V2  [   1   0   1   0  ]
V3  [   1   1   0   1  ]
V4  [   0   0   1   0  ]
After the context map is converted into the adjacency matrix, the degree of any vertex (word) can be obtained, i.e., for vertex Vi the degree (the number of adjacent words) is the sum of the elements in the i-th row of the adjacency matrix. Example: the degree of V1 is 2, the degree of V2 is 2, the degree of V3 is 3, and the degree of V4 is 1. All the adjacent points of vertex Vi are found by scanning the elements of the i-th row of the adjacency matrix, and the word set formed by all the adjacent points is the context word set of that word: the context word set of V1 includes V2 and V3; the context word set of V2 includes V1 and V3; the context word set of V3 includes V1, V2 and V4; and the context word set of V4 includes V3.
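A sketch of step S103 is given below. The rule that two participles are combined when they appear next to each other in a question's segmentation result is an assumption inferred from the V1 to V4 example above; the patent text itself only speaks of a combination relationship.

```python
# Sketch of S103: build the context map as an undirected adjacency matrix.
# Assumption: two participles have a "combination relationship" when they
# appear consecutively in a question's segmentation result (inferred from
# the V1 to V4 example; the patent does not state the rule as code).
def build_context_map(segmented_questions: list[list[str]]):
    vocab = sorted({w for words in segmented_questions for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    g = [[0] * n for _ in range(n)]          # n x n adjacency matrix
    for words in segmented_questions:
        for a, b in zip(words, words[1:]):   # consecutive participles
            if a != b:
                g[index[a]][index[b]] = 1
                g[index[b]][index[a]] = 1    # undirected: symmetric entries
    return vocab, index, g

def adjacent_points(word: str, vocab: list[str], index: dict, g) -> set[str]:
    """Context word set of `word`: all vertices adjacent to it in the map."""
    i = index[word]
    return {vocab[j] for j in range(len(vocab)) if g[i][j] == 1}
```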
And S104, for any one common question, calculating the semantic similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map.
It should be noted that, when calculating the semantic similarity, the embodiment of the present invention performs calculation by mapping the segmentation results of the to-be-solved question and the common question to the corresponding contexts, so as to improve the precision of the difference analysis between the questions, and improve the accuracy of calculating the semantic similarity.
S105, taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered;
specifically, after semantic similarity judgment is performed on each of the common questions and the question to be answered, a common question with the highest similarity can be obtained, and the answer of the common question with the highest similarity is used as the answer of the question to be answered, so that the intelligent question answering effect is achieved.
Based on the content of the above embodiments, as an alternative embodiment, the process of calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map involves two levels of calculation, namely the presentation layer similarity and the semantic layer similarity. The presentation layer similarity refers to the morphological similarity of two sentences and is measured by the number of identical words or synonyms the two sentences contain and by the relative positions of those words in the sentences. The semantic layer refers to meaning that cannot be read directly from the surface form and must be understood as the semantics implied by the sentence. There are various methods for calculating the presentation layer similarity, such as cosine similarity and generalized Jaccard similarity; the semantic layer similarity can use a semantic dictionary and the word-sense context.
Fig. 3 is a schematic flow chart illustrating a process of calculating semantic similarity between the frequently asked question and the question to be solved according to the word segmentation result of the frequently asked question, the word segmentation result of the question to be solved, and the context map, in an embodiment of the present invention, as shown in fig. 3, specifically:
s301, calculating cosine similarity of the first text and the second text according to the context map. Wherein, the first text is the text of the file to be solved, the second text is the text of the common question, and p is used2And (4) showing. Wherein p is2Is W1、W2、…WnN is from p2And the number of the participles obtained by the participles.
It should be noted that cosine similarity is a cosine value of an included angle between two vectors, and the cosine similarity is used to represent a difference degree between two sentences; cosine similarity focuses on the difference in direction of vectors, i.e., the difference in trend, rather than the magnitude of absolute distance. The formula is as follows:
Cosin(p1, p2) = Σ_i (x_i · y_i) / ( sqrt(Σ_i x_i^2) · sqrt(Σ_i y_i^2) )
where x_i represents the TF-IDF weight of the i-th participle of the first text p1 and y_i represents the TF-IDF weight of the i-th participle of the second text p2. TF-IDF (term frequency-inverse document frequency) is a commonly used weighting technique in information retrieval and data mining: TF is the term frequency and IDF is the inverse document frequency. Because the context map is a word-set relation map, once a sentence has been segmented, TF-IDF can be used to weight the words in the sentence and extract its key words. After word extraction, the cosine-angle similarity measure of the space vectors is not affected by the index scale, and the cosine value falls in the interval [0, 1]; the larger the value, the smaller the difference.
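A minimal sketch of the cosine term, assuming the two texts' participles have been projected onto a shared vocabulary so that the TF-IDF weight vectors x and y are aligned (that alignment is an assumption of the sketch, not spelled out above):

```python
# Cosine similarity of two aligned TF-IDF weight vectors.
import math

def cosine_similarity(x: list[float], y: list[float]) -> float:
    """x[i], y[i]: TF-IDF weights of the i-th vocabulary word in p1 and p2."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)
```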
S302, obtaining the similarity of any word segmentation of the first text and any word segmentation of the second text according to the context map so as to calculate the offset similarity of the first text and the second text.
It should be noted that, when calculating the offset similarity, the embodiment of the present invention is obtained according to the similarity of the segmented words in the two texts in the context map, and since the context map records the adjacent point (i.e., the context word set) of each segmented word, the similarity of the two texts in the word position relationship can be determined by comparing the approximate conditions of the adjacent points between every two segmented words.
And S303, obtaining the context word sets of all the participles which are not present in the second text in the first text and the context word sets of all the participles which are not present in the first text in the second text according to the context map so as to calculate the semantic layer similarity of the first text and the second text.
It should be noted that, the semantic layer similarity represents the relationship between the implied semantics of the two texts, and since the information cannot be directly translated literally, in the embodiment of the present invention, the context word sets of all the participles that are not present in the other text in each text are respectively obtained through the context map, and the semantic layer similarity is calculated through the two context word sets.
S304, calculating the semantic similarity of the first text and the second text according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
According to the method provided by the embodiment of the present invention, the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text are respectively obtained through the context map. The similarity of the participles of the two texts in terms of the space-vector cosine angle and positional relationship, and the semantic-layer similarity of the words the texts do not share, are thus captured, and the semantic similarity is finally obtained, which can improve the reliability and accuracy of the similarity judgment.
Based on the content of the above embodiment, as an optional embodiment, the method for obtaining the TF-IDF weight of the participle in the first/second text specifically includes:
forming word set A by using adjacent points of all the participles in the first text on the context map, and forming word set B by using adjacent points of all the participles in the second text on the context map;
all participles in the word set A and the word set B form a word set T, where T = A ∪ B;
forming word set C by using adjacent points of the participles which do not exist in the second text in the first text on the context map;
and forming word set D by the adjacent points of the participles which are not existed in the first text in the second text on the context map.
For a participle x_i in the first/second text, the adjacent points of x_i on the context map are obtained to form a word set E, and the coincidence degree between the participles of word set E and word set T is taken as the TF value of x_i; lg(n_T / n_{E∩T}) is taken as the IDF value of x_i; and the product of the TF value and the IDF value is taken as the TF-IDF weight of x_i, where n_T represents the total number of participles in word set T and n_{E∩T} represents the total number of participles common to word set E and word set T.
The method for obtaining the TF-IDF weights of the participles in the first/second text in the embodiment of the present invention incorporates the combination relationships of the participles in the context map, that is, the TF-IDF weights are obtained in combination with the context in which the texts are located, which can further improve the precision of the cosine similarity of the texts.
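The word sets A, B, T and E above translate into a short sketch; the exact definition of the "coincidence degree" used as the TF value is not spelled out, so the ratio |E ∩ T| / |E| below is an assumption, while the IDF term follows lg(n_T / n_{E∩T}) as stated. adjacent_points is the helper from the context-map sketch.

```python
# Context-map TF-IDF sketch. Assumption: "coincidence degree" = |E ∩ T| / |E|.
import math

def word_set_T(words_p1: list[str], words_p2: list[str],
               vocab, index, g) -> set[str]:
    """T = A ∪ B: adjacent points of all participles of both texts."""
    a = set().union(*(adjacent_points(w, vocab, index, g)
                      for w in words_p1 if w in index))
    b = set().union(*(adjacent_points(w, vocab, index, g)
                      for w in words_p2 if w in index))
    return a | b

def context_tfidf(word: str, t: set[str], vocab, index, g) -> float:
    if word not in index:
        return 0.0
    e = adjacent_points(word, vocab, index, g)    # word set E
    common = e & t                                # E ∩ T
    if not e or not common:
        return 0.0
    tf = len(common) / len(e)                     # assumed coincidence degree
    idf = math.log10(len(t) / len(common))        # lg(n_T / n_{E∩T})
    return tf * idf
```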
Based on the content of the foregoing embodiment, as an optional embodiment, a similarity between any participle of the first text and any participle of the second text is obtained according to the context map to calculate an offset similarity between the first text and the second text, as shown in fig. 4, specifically:
s401, according to the first text p1The total number m of the word segmentation in the first text and the length len (P) of the first text are obtained as the word segmentation result1) And word segmentation SiRelative position pos (S) in the first texti)。
It should be noted that the word segmentation SiRelative position pos (S) in the first texti) By the formula
Figure GDA0001835606080000101
And calculating, wherein i represents the position of the participle in the first text.
S402, according to the word segmentation result of the second text p2, obtaining the total number n of participles in the second text, the length len(p2) of the second text, and the relative position pos(W_j) of each participle W_j in the second text.
Likewise, the relative position pos(W_j) of participle W_j in the second text is calculated by the formula
[formula not reproduced in the text: pos(W_j) is a function of the position j of the participle and the length len(p2) of the second text]
where j represents the position of the participle in the second text. It should be noted that the embodiment of the present invention does not limit the order of steps S401 and S402.
S403, calculating the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map.
It should be noted that, unlike the prior art, which calculates the similarity between participles only from the participles' own contexts, the embodiment of the present invention obtains the adjacent points of participle S_i and participle W_j from the context map and derives the similarity sim(S_i, W_j) by comparing their adjacent-point data, thereby realizing similarity judgment of the participles within the macroscopic context.
S404, according to the formula
[formula not reproduced in the text: the offset similarity Sim_p(p1, p2) aggregates the word similarities sim(S_i, W_j) over the word pairs, weighted by how close the relative positions pos(S_i) and pos(W_j) are]
computing the offset similarity Sim_p(p1, p2) of the first text p1 and the second text p2.
It should be noted that, as can be seen from the offset similarity formula, when the similarity of two participles is held fixed, the more consistent their relative positions, the greater the total offset similarity; and when the relative positions are held fixed, the greater the similarity of the participles, the greater the total offset similarity.
According to the method for calculating the offset similarity, the offset similarity of the two texts is obtained from the context map, and compared with the offset similarity obtained by only considering the context relation of word segmentation in the prior art, the difference precision between the texts is further improved, so that the semantic similarity calculation accuracy is higher.
Based on the content of the above embodiments, as an alternative embodiment, calculating the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map specifically includes:
obtaining the adjacent points π(S_i) of participle S_i on the context map and their degree len(π(S_i));
obtaining the adjacent points π(W_j) of participle W_j on the context map and their degree len(π(W_j));
and calculating the similarity sim(S_i, W_j) according to the formula
[formula not reproduced in the text: sim(S_i, W_j) combines the common adjacent points T(π(S_i) ∩ π(W_j)) with the degrees len(π(S_i)) and len(π(W_j))]
where T(π(S_i) ∩ π(W_j)) represents the adjacent points common to participle S_i and participle W_j.
Compared with the prior art, which considers only the participles' own context relations, this way of calculating the word similarity further improves the precision of the difference analysis between the texts and makes the semantic similarity calculation more accurate.
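Because the formula images for sim(S_i, W_j) and Sim_p(p1, p2) are not reproduced above, the sketch below substitutes assumed formulas built only from the stated ingredients: a Dice-style ratio of common adjacent points to the two degrees for the word similarity, and a position-weighted average for the offset similarity, with the relative position taken as the word index over the participle count. None of these exact forms should be read as the patent's own equations.

```python
# Assumed stand-ins for the unpublished formulas (see lead-in above).
def word_similarity(si: str, wj: str, vocab, index, g) -> float:
    if si == wj:
        return 1.0
    if si not in index or wj not in index:
        return 0.0
    pi_s = adjacent_points(si, vocab, index, g)   # π(S_i)
    pi_w = adjacent_points(wj, vocab, index, g)   # π(W_j)
    if not pi_s or not pi_w:
        return 0.0
    # assumed Dice-style use of |π(S_i) ∩ π(W_j)| and the two degrees
    return 2 * len(pi_s & pi_w) / (len(pi_s) + len(pi_w))

def offset_similarity(words_p1: list[str], words_p2: list[str],
                      vocab, index, g) -> float:
    m, n = len(words_p1), len(words_p2)
    if m == 0 or n == 0:
        return 0.0
    total = 0.0
    for i, si in enumerate(words_p1, start=1):
        pos_s = i / m                                # assumed relative position
        for j, wj in enumerate(words_p2, start=1):
            pos_w = j / n
            sim = word_similarity(si, wj, vocab, index, g)
            total += sim * (1 - abs(pos_s - pos_w))  # assumed position weighting
    return total / (m * n)
```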
Based on the content of the foregoing embodiment, as an optional embodiment, the context word sets of all the segmented words that are not present in the second text in the first text and the context word sets of all the segmented words that are not present in the first text in the second text are obtained according to the context map, so as to calculate the semantic layer similarity between the first text and the second text, specifically:
in the first text p1, the participles that do not exist in the second text p2 are obtained to form the first participle set, and the context words of all participles in the first participle set are obtained on the context map to form the first context word set π(P1); in the second text p2, the participles that do not exist in the first text p1 are obtained to form the second participle set, and the context words of all participles in the second participle set are obtained on the context map to form the second context word set π(P2).
Take as an example a first text of "a method for growing seedlings of tomatoes in a greenhouse" and a second text of "a method for growing seedlings of American tomatoes". The word segmentation result of the first text is: tomato, greenhouse, seedling raising, method; the word segmentation result of the second text is: America, tomato, seedling raising, method. The participle of the first text that does not exist in the second text is "greenhouse", and its context word set is obtained from the context map; similarly, the participle of the second text that does not exist in the first text is "America", and its context word set is likewise obtained from the context map.
According to the formula
Sim_L(p1, p2) = α × T(π(P1) ∩ π(P2)) / T(π(P1) ∪ π(P2))
the semantic layer similarity Sim_L(p1, p2) of the first text and the second text is calculated;
where α = 1 when p1 and p2 contain no antonyms, and α = -1 when p1 and p2 contain antonyms; T(π(P1) ∩ π(P2)) represents the context words common to π(P1) and π(P2); and T(π(P1) ∪ π(P2)) represents all the context words in π(P1) and π(P2).
It should be noted that, before the semantic layer similarity is calculated with the above formula, it is necessary to check in advance whether the first text and the second text contain antonyms; when antonyms are present, the semantics of the two texts are more likely to be opposite. Based on the proportion of the context words common to π(P1) and π(P2) among all the context words of π(P1) and π(P2), together with whether antonyms are present, the embodiment of the present invention calculates the semantic layer similarity. In combination with the context map, the method provided by the embodiment of the present invention analyzes with higher precision, at the semantic layer, the similarity of the words that the two sentences do not share.
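The semantic-layer term can be sketched directly from the sets above; the antonym check needs a lexicon, and the ANTONYMS set below is a hypothetical stand-in for whatever antonym resource an implementation would use.

```python
# Semantic-layer similarity: Sim_L = alpha * |common context words| / |all
# context words|, with alpha = -1 when the two texts contain an antonym pair.
# ANTONYMS is a hypothetical antonym lexicon (pairs stored as frozensets).
ANTONYMS: set[frozenset[str]] = set()   # e.g. {frozenset({"增加", "减少"})}

def contains_antonym_pair(words_p1: list[str], words_p2: list[str]) -> bool:
    return any(frozenset({a, b}) in ANTONYMS
               for a in words_p1 for b in words_p2)

def semantic_layer_similarity(words_p1, words_p2, vocab, index, g) -> float:
    only_p1 = [w for w in words_p1 if w not in words_p2]   # first participle set
    only_p2 = [w for w in words_p2 if w not in words_p1]   # second participle set
    pi_p1 = set().union(*(adjacent_points(w, vocab, index, g)
                          for w in only_p1 if w in index))  # π(P1)
    pi_p2 = set().union(*(adjacent_points(w, vocab, index, g)
                          for w in only_p2 if w in index))  # π(P2)
    union = pi_p1 | pi_p2
    if not union:
        return 0.0
    alpha = -1.0 if contains_antonym_pair(words_p1, words_p2) else 1.0
    return alpha * len(pi_p1 & pi_p2) / len(union)
```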
Based on the content of the above embodiment, as an optional embodiment, the semantic similarity between the first text and the second text is calculated according to the cosine similarity, the offset similarity, and the semantic layer similarity between the first text and the second text, specifically:
according to the formula: simb(p1,p2)=Cosin(p1,p2)+α1×Simp(p1,p2) Obtaining a first text p1And a second text p2Is the layer similarity Simb(p1,p2);
According to the formula: m (p)1,p2)=Simb(p1,p2)+β1×SimL(p1,p2) Obtaining a first text p1And a second text p2Semantic similarity m (p) of1,p2);
Wherein, Cosin (p)1,p2)、Simp(p1,p2) And SimL(p1,p2) Respectively representing a first text p1And a second text p2Cosine similarity, offset similarity and semantic layer similarity, alpha1Factor, β, representing the influence of offset similarity on the similarity of the representation layers1And representing the influence factor of the semantic layer similarity on the semantic similarity.
It should be noted that, in the embodiment of the present invention, the cosine similarity and the offset similarity together form the presentation layer similarity, and the semantic similarity is then obtained by combining the presentation layer similarity with the semantic layer similarity. The embodiment of the present invention fully considers the influence of the macroscopic context on the semantics and mines the semantics more deeply.
Based on the content of the above embodiments, as an alternative embodiment, practical analysis shows that the value of α1 should ensure that its product with the offset similarity remains smaller than the cosine similarity value, and that this product first grows as the cosine similarity increases from 0 and then shrinks once the cosine similarity exceeds a certain value. Therefore, the influence factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2).
Practical analysis likewise shows that the value of β1 should ensure that its product with the semantic layer similarity remains smaller than the presentation layer similarity, and that this product first grows as the presentation layer similarity increases from 0 and then shrinks once the presentation layer similarity exceeds a certain value. Therefore, the influence factor β1 is obtained according to the formula β1 = (1 - Sim_b(p1, p2)) × Sim_b(p1, p2).
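Since α1 and β1 are given in closed form, the final combination can be sketched directly; the only added assumption is using the union of the two texts' participles as the basis for the cosine vectors, reusing the helpers from the earlier sketches.

```python
# Combine the three similarities as described: alpha1 = (1 - Cosin) * Cosin,
# Sim_b = Cosin + alpha1 * Sim_p, beta1 = (1 - Sim_b) * Sim_b,
# m = Sim_b + beta1 * Sim_L.
def semantic_similarity(words_p1, words_p2, vocab, index, g) -> float:
    t = word_set_T(words_p1, words_p2, vocab, index, g)
    basis = sorted(set(words_p1) | set(words_p2))   # assumed cosine vector basis
    x = [context_tfidf(w, t, vocab, index, g) if w in words_p1 else 0.0
         for w in basis]
    y = [context_tfidf(w, t, vocab, index, g) if w in words_p2 else 0.0
         for w in basis]
    cosin = cosine_similarity(x, y)
    sim_p = offset_similarity(words_p1, words_p2, vocab, index, g)
    sim_l = semantic_layer_similarity(words_p1, words_p2, vocab, index, g)
    alpha1 = (1 - cosin) * cosin
    sim_b = cosin + alpha1 * sim_p                  # presentation layer similarity
    beta1 = (1 - sim_b) * sim_b
    return sim_b + beta1 * sim_l                    # semantic similarity m(p1, p2)
```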
According to another aspect of the present invention, an intelligent question-answering device is further provided in the embodiments of the present invention, and referring to fig. 5, fig. 5 shows a functional block diagram of the intelligent question-answering device in the embodiments of the present invention, and the system is used for matching answers according to semantic similarity of the question to be solved and the common question in the foregoing embodiments. Therefore, the descriptions and definitions in the intelligent question answering method in the foregoing embodiments can be used for understanding the execution modules in the embodiments of the present invention.
As shown in the figure, the intelligent question answering device comprises:
the context acquisition module 501 is configured to perform word segmentation processing on a text of a question to be solved, and determine a context for semantic similarity judgment according to a word segmentation result of the question to be solved;
a frequently asked questions obtaining module 502 for collecting a certain number of frequently asked questions according to the context;
a contextual graph obtaining module 503, configured to perform word segmentation processing on the texts of all the common problems, and establish a contextual graph according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
a similarity calculation module 504, configured to calculate, for any one of the common questions, a similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved, and the context map;
the answer matching module 505 is configured to use the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered.
In the intelligent question answering device of the embodiment of the present invention, the context acquisition module determines the context for semantic similarity judgment according to the question to be answered, and the common question acquisition module collects a certain number of common questions according to that context, so that the question to be answered and the common questions are mapped into the same context for analysis; this improves the precision of the difference analysis between questions and makes the semantic similarity calculation more accurate. The contextual graph acquisition module establishes a contextual graph according to the word segmentation results of the collected common questions, and the similarity calculation module calculates the similarity between the question to be answered and each common question according to the contextual graph. In the embodiment of the present invention, the context map is constructed based on a large number of common questions and characterizes big data, which is completely different from the existing context constructed only from the contexts of the question to be answered and the common question whose semantic similarity is to be compared; the context in the embodiment of the present invention is macroscopic. Finally, the answer matching module takes the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered. The embodiment of the present invention can thus analyze the question to be answered more accurately and provide an answer.
The embodiment of the invention provides electronic equipment. Referring to fig. 6, the apparatus includes: a processor (processor)601, a memory (memory)602, and a bus 603;
the processor 601 and the memory 602 complete communication with each other through the bus 603, respectively; the processor 601 is configured to call the program instructions in the memory 602 to execute the semantic similarity calculation method provided in the foregoing embodiments, for example, including: performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved; collecting a number of common questions according to the context; performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems; for any one of the common questions, calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map; taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved; wherein the context map is an undirected graph representing the combined relationship between the participles of all the common questions.
An embodiment of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, where the computer instructions cause a computer to execute the method for calculating semantic similarity provided in the foregoing embodiment, for example, the method includes: performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved; collecting a number of common questions according to the context; performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems; for any one of the common questions, calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map; taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved; wherein the context map is an undirected graph representing the combined relationship between the participles of all the common questions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An intelligent question answering method is characterized by comprising the following steps:
performing word segmentation processing on a text of a question to be solved, and determining a context for semantic similarity judgment according to a word segmentation result of the question to be solved;
collecting a number of common questions according to the context;
performing word segmentation processing on the texts of all the common problems, and establishing a context map according to word segmentation results of all the common problems;
for any one of the common questions, calculating semantic similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the contextual graph;
taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be solved;
wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
the calculating the semantic similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the context map specifically comprises the following steps:
calculating cosine similarity of the first text and the second text according to the contextual graph; the first text is the text of the question to be solved, and the second text is the text of the common question;
obtaining the similarity of any word segmentation of the first text and any word segmentation of the second text according to the context map so as to calculate the offset similarity of the first text and the second text;
obtaining context word sets of all the participles which are not present in the second text in the first text and context word sets of all the participles which are not present in the first text in the second text according to the context map so as to calculate semantic layer similarity of the first text and the second text;
and calculating the semantic similarity of the first text and the second text according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
2. The intelligent question-answering method according to claim 1, wherein the obtaining of the similarity between any participle of the first text and any participle of the second text according to the context map to calculate the offset similarity of the first text and the second text specifically comprises:
according to the word segmentation result of the first text p1, obtaining the total number m of participles in the first text, the length len(p1) of the first text, and the relative position pos(S_i) of each participle S_i in the first text;
according to the word segmentation result of the second text p2, obtaining the total number n of participles in the second text, the length len(p2) of the second text, and the relative position pos(W_j) of each participle W_j in the second text;
calculating the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map;
and computing the offset similarity Sim_p(p1, p2) of the first text p1 and the second text p2 according to the formula
[formula not reproduced in the text: Sim_p(p1, p2) aggregates sim(S_i, W_j) over the word pairs, weighted by how close pos(S_i) and pos(W_j) are].
3. The intelligent question-answering method according to claim 2, wherein the calculating of the similarity sim(S_i, W_j) between participle S_i and participle W_j according to the context map specifically comprises:
obtaining the adjacent points π(S_i) of participle S_i on the context map and their degree len(π(S_i));
obtaining the adjacent points π(W_j) of participle W_j on the context map and their degree len(π(W_j));
and calculating the similarity sim(S_i, W_j) according to the formula
[formula not reproduced in the text: sim(S_i, W_j) combines the common adjacent points T(π(S_i) ∩ π(W_j)) with the degrees len(π(S_i)) and len(π(W_j))];
wherein T(π(S_i) ∩ π(W_j)) represents the adjacent points common to participle S_i and participle W_j.
4. The intelligent question-answering method according to claim 1, wherein the obtaining of the context word sets of all the participles in the first text that are not present in the second text and the context word sets of all the participles in the second text that are not present in the first text according to the context map to calculate the semantic layer similarity of the first text and the second text specifically comprises:
in the first text p1, obtaining the participles that do not exist in the second text p2 to form a first participle set, and obtaining the context words of all participles in the first participle set on the context map to form a first context word set π(P1);
in the second text p2, obtaining the participles that do not exist in the first text p1 to form a second participle set, and obtaining the context words of all participles in the second participle set on the context map to form a second context word set π(P2);
and calculating the semantic layer similarity Sim_L(p1, p2) of the first text and the second text according to the formula
Sim_L(p1, p2) = α × T(π(P1) ∩ π(P2)) / T(π(P1) ∪ π(P2));
wherein α = 1 when p1 and p2 contain no antonyms, and α = -1 when p1 and p2 contain antonyms; T(π(P1) ∩ π(P2)) represents the context words common to π(P1) and π(P2); and T(π(P1) ∪ π(P2)) represents all the context words in π(P1) and π(P2).
5. The intelligent question answering method according to claim 1, wherein the semantic similarity of the first text and the second text is calculated according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text, specifically:
according to the formula Sim_b(p1, p2) = Cosin(p1, p2) + α1 × Sim_p(p1, p2), obtaining the presentation layer similarity Sim_b(p1, p2) of the first text p1 and the second text p2;
according to the formula m(p1, p2) = Sim_b(p1, p2) + β1 × Sim_L(p1, p2), obtaining the semantic similarity m(p1, p2) of the first text p1 and the second text p2;
wherein Cosin(p1, p2), Sim_p(p1, p2) and Sim_L(p1, p2) respectively represent the cosine similarity, the offset similarity and the semantic layer similarity of the first text p1 and the second text p2, α1 represents the influence factor of the offset similarity on the presentation layer similarity, and β1 represents the influence factor of the semantic layer similarity on the semantic similarity.
6. The intelligent question-answering method according to claim 5, wherein
the influence factor α1 is obtained according to the formula α1 = (1 - Cosin(p1, p2)) × Cosin(p1, p2); and
the influence factor β1 is obtained according to the formula β1 = (1 - Sim_b(p1, p2)) × Sim_b(p1, p2).
7. An intelligent question answering device, comprising:
the context acquisition module is used for performing word segmentation processing on the text of the question to be solved and determining a context for performing semantic similarity judgment according to the word segmentation result of the question to be solved;
a common question acquisition module for collecting a certain number of common questions according to the context;
the contextual graph acquisition module is used for performing word segmentation processing on the texts of all the common problems and establishing a contextual graph according to word segmentation results of all the common problems; wherein the context map is an undirected graph representing the combined relationship among the participles of all the common problems;
the similarity calculation module is used for calculating the similarity between the common question and the question to be solved according to the word segmentation result of the common question, the word segmentation result of the question to be solved and the contextual graph for any one of the common questions;
the answer matching module is used for taking the candidate answer corresponding to the common question with the highest similarity as the answer corresponding to the question to be answered;
wherein the similarity calculation module is specifically configured to:
calculating the cosine similarity of the first text and the second text according to the context map; wherein the first text is the text of the question to be solved, and the second text is the text of the common question;
obtaining the similarity of any word segmentation of the first text and any word segmentation of the second text according to the context map so as to calculate the offset similarity of the first text and the second text;
obtaining, according to the context map, the context word set of the participles of the first text that are not present in the second text and the context word set of the participles of the second text that are not present in the first text, so as to calculate the semantic layer similarity of the first text and the second text;
and calculating the semantic similarity of the first text and the second text according to the cosine similarity, the offset similarity and the semantic layer similarity of the first text and the second text.
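As a rough, non-authoritative sketch of how the five modules of claim 7 could be wired together: the segmentation routine, candidate collection, context map construction and similarity scoring are all injected, since the claim does not prescribe any particular implementation of them.

```python
class IntelligentQA:
    def __init__(self, segment, collect_common_questions, build_context_map, score):
        self.segment = segment                    # word segmentation, e.g. a dictionary-based segmenter
        self.collect = collect_common_questions   # segmentation result -> list of (question, answer) pairs
        self.build_map = build_context_map        # segmented questions -> undirected context map
        self.score = score                        # (q_words, cand_words, context_map) -> semantic similarity

    def answer(self, question_text):
        q_words = self.segment(question_text)                    # context acquisition module
        candidates = self.collect(q_words)                       # common question acquisition module
        context_map = self.build_map(
            [self.segment(q) for q, _ in candidates])            # context map acquisition module
        scored = [(self.score(q_words, self.segment(q), context_map), ans)
                  for q, ans in candidates]                      # similarity calculation module
        if not scored:
            return None
        return max(scored, key=lambda t: t[0])[1]                # answer matching module
```

A concrete system would plug in, for example, the semantic similarity of claims 4 to 6 as the scoring function.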
8. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 6.
CN201810790249.5A 2018-07-18 2018-07-18 Intelligent question and answer method and device Active CN109033318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810790249.5A CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Publications (2)

Publication Number Publication Date
CN109033318A CN109033318A (en) 2018-12-18
CN109033318B (en) 2020-11-27

Family

ID=64643328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810790249.5A Active CN109033318B (en) 2018-07-18 2018-07-18 Intelligent question and answer method and device

Country Status (1)

Country Link
CN (1) CN109033318B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840277A (en) * 2019-02-20 2019-06-04 西南科技大学 A kind of government affairs Intelligent Service answering method and system
CN109918494B (en) * 2019-03-22 2022-11-04 元来信息科技(湖州)有限公司 Context association reply generation method based on graph, computer and medium
CN110069613A (en) * 2019-04-28 2019-07-30 河北省讯飞人工智能研究院 A kind of reply acquisition methods and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9135240B2 (en) * 2013-02-12 2015-09-15 International Business Machines Corporation Latent semantic analysis for application in a question answer system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566998A (en) * 2009-05-26 2009-10-28 华中师范大学 Chinese question-answering system based on neural network
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
CN107766511A (en) * 2017-10-23 2018-03-06 深圳市前海众兴电子商务有限公司 Intelligent answer method, terminal and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Semantic Mining of Question-Answer Pairs in Web Communities; Wang Baoxun (王宝勋); China Doctoral Dissertations Full-text Database, Information Science and Technology; 2014-01-15; full text *

Also Published As

Publication number Publication date
CN109033318A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109145085B (en) Semantic similarity calculation method and system
US10831769B2 (en) Search method and device for asking type query based on deep question and answer
CN110750959B (en) Text information processing method, model training method and related device
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
CN110532571B (en) Text processing method and related device
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN103425635B (en) Method and device for recommending answers
CN109033318B (en) Intelligent question and answer method and device
CN110297893B (en) Natural language question-answering method, device, computer device and storage medium
CN110781413B (en) Method and device for determining interest points, storage medium and electronic equipment
CN109213853B (en) CCA algorithm-based Chinese community question-answer cross-modal retrieval method
CN112559684A (en) Keyword extraction and information retrieval method
US20210279622A1 (en) Learning with limited supervision for question-answering with light-weight markov models
CN107544958B (en) Term extraction method and device
CN108287875B (en) Character co-occurrence relation determining method, expert recommending method, device and equipment
CN107679070B (en) Intelligent reading recommendation method and device and electronic equipment
CN111198946A (en) Network news hotspot mining method and device
CN113220832A (en) Text processing method and device
CN117473053A (en) Natural language question-answering method, device, medium and equipment based on large language model
CN112434134A (en) Search model training method and device, terminal equipment and storage medium
CN110969005A (en) Method and device for determining similarity between entity corpora
CN111125329A (en) Text information screening method, device and equipment
CN115795018A (en) Multi-strategy intelligent searching question-answering method and system for power grid field
CN114372478A (en) Knowledge distillation-based question and answer method, terminal equipment and storage medium
CN111858895B (en) Sequencing model determining method, sequencing device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant