CN108681574B - Text abstract-based non-fact question-answer selection method and system - Google Patents
- Publication number
- CN108681574B CN108681574B CN201810428163.8A CN201810428163A CN108681574B CN 108681574 B CN108681574 B CN 108681574B CN 201810428163 A CN201810428163 A CN 201810428163A CN 108681574 B CN108681574 B CN 108681574B
- Authority
- CN
- China
- Prior art keywords
- text
- answer
- sentence
- abstract
- question
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a text abstract-based non-factual question-answer selection method and system, belonging to the technical field of intelligent retrieval. The method comprises the steps of: extracting the first sentence and the last sentence of an answer text to be selected; extracting an abstract of the remaining text of the answer text to be selected, excluding the first and last sentences, with the text abstract model TextRank to obtain a preliminary text abstract; combining the first sentence, the preliminary text abstract and the last sentence in order to obtain an answer text abstract to be selected; taking the question and the answer text abstract to be selected as the input of a neural network semantic representation model to obtain the semantic correlation degree of the question and the answer text abstract to be selected; and returning the answer text abstract with the highest semantic correlation to the question as the answer. When the abstract of an answer is extracted, the first and last sentences of the answer text are retained as components of the abstract, which preserves the topical completeness of the extracted abstract and improves the accuracy of answer selection.
Description
Technical Field
The invention relates to the technical field of intelligent retrieval, and in particular to a text abstract-based non-factual question-answer selection method and system.
Background
Currently, question-answering systems have become an important research topic in the field of natural language processing, and they are applied in many areas of information acquisition, such as information retrieval, expert systems, automatic question answering, and natural language human-machine interaction. A question-answering system differs from information retrieval in that it does not require the user to sift through results for an answer; it returns an answer directly.
According to their data sources, question-answering systems are divided into three types: systems based on structured data, systems based on free text, and systems based on question-answer pairs. The workflow of a system based on question-answer pairs is that, after a user poses a question, the system performs semantic feature analysis and returns the answer that best matches the question semantically; the data mainly come from community question-answering sites.
Early research on answer selection was generally based on traditional semantic feature extraction: text features are selected manually and then a high-performance classifier is trained. Manually defined features give the semantic representation strong interpretability, and the feature selection covers the whole data set. The selected features mainly reflect the sentence quality of the answer text and the correlation between the question and the answer content. Manually selected features typically include word N-grams, syntactic structures, and grammatical dependencies. The most common early approach was to perform word segmentation, part-of-speech tagging, or syntactic analysis on the text with existing natural language processing tools, and then train an answer selection model on manually defined features.
However, answer texts in non-factual question answering take diverse forms and contain noisy information, and it is difficult to match the correct answer with general linguistic rules. Therefore, for the answer selection task of a non-factual question-answering system, the current mainstream approach is to mine the semantic information of text with supervised machine learning on labeled text, for example:
the SVM model is utilized to train matching features at a word level, such as keyword matching features, phrase-level non-semantic features, and some named entity-based features. Still other researchers have developed a series of lexical features related to answer quality including whether punctuation, hyperlinks, the number of special words, part of speech and frequency of named entity features, and N-gram language models by extracting features of text through natural language processing tools. The syntax tree can be used for better capturing the local structural information of the sentence, and the answer selection method based on the syntax tree can effectively reduce the workload of feature selection. Answer selection is performed by a combined approach of syntactic and semantic features, the syntactic aspect calculates tree edit distances between dependency syntax trees for questions and answers, and the semantic aspect uses shallow semantic features such as entity types, synonyms, and the like.
The tree edit distance is the total cost of the operations (insertion, deletion, and substitution) required to transform one tree into the other; its computation is similar to the string edit distance. One line of work labels the sequence in a question-answer pair with a Conditional Random Field (CRF), with practical features including the tree edit distance and the string edit distance; this was the first work to cast answer selection in community question answering as a sequence labeling problem. Besides syntax trees, some researchers compare the relevance of question and answer text from the perspective of language models and word vectors, for example, using translation-based models that treat the question and the candidate answer as two different languages to measure how relevant a question is to an answer.
Answer selection methods based on traditional semantic feature extraction are usually well interpretable: the basis of each decision can be traced back to the manually selected features, which makes them easy to understand. However, this approach has drawbacks. First, it relies on many toolkits from basic natural language research, so the quality of the selected features depends on the quality of that underlying research, and a simple feature extraction scheme may fail on texts with complex structure. Second, the features used by the answer selection model ultimately depend on human choices; the model has no self-learning capability, which limits its applicability.
Disclosure of Invention
The invention aims to provide a text abstract-based non-fact question-answer selection method and system to improve the answer selection accuracy of a question-answer system.
In order to achieve the above purpose, the present invention adopts a text abstract-based answer selection method for non-factual question answers, which comprises the following steps:
extracting a first sentence and a last sentence of the answer text to be selected;
extracting the abstracts of the remaining texts of the answer text to be selected except the first sentence and the last sentence by using a text abstract model TextRank to obtain a preliminary text abstract;
sequentially combining the first sentence, the preliminary text abstract and the tail sentence to obtain an answer text abstract to be selected;
taking the question and the answer text abstract to be selected as input of a neural network semantic representation model to obtain semantic correlation degree of the question and the answer text abstract to be selected;
and returning the answer text abstract with the highest semantic relevance degree with the question as an answer.
Preferably, the extracting the first sentence and the last sentence of the answer text to be selected includes:
and extracting the first sentence and the tail sentence of the answer text to be selected according to the positions of the first sentence and the tail sentence in the answer text to be selected.
Preferably, the extracting the abstract of the remaining text of the answer text to be selected except the first sentence and the last sentence by using the text abstract model TextRank to obtain a preliminary text abstract comprises:
dividing the answer text to be selected into sentences, and segmenting each sentence;
labeling the part of speech of each word, and filtering the labeled words to retain only terms with specific parts of speech;
taking the terms or sentences of the specific words as text units, forming nodes by the text units, and forming edges between the nodes by the similarity between the text units to obtain a weight graph model;
calculating the similarity of any two nodes, and taking the similarity value as a calculation parameter of a node weight calculation formula;
iterating the node weight calculation formula until convergence is achieved to obtain a score result of each node;
according to the scores among all the nodes during convergence, all the nodes are sorted to obtain the sorted nodes;
and extracting text units from the sorted nodes according to a set extraction ratio to form a preliminary text abstract.
Preferably, the method for calculating the similarity between any two nodes includes: a vocabulary overlap method, a character string method, a cosine similarity method and a maximum common subsequence method.
On the other hand, a text abstract-based non-factual question-answer selection system is provided, comprising a first extraction module, a second extraction module, a combination module, a matching module and a determining module which are connected in sequence;
the first extraction module is used for extracting a first sentence and a last sentence of the answer text to be selected;
the second extraction module is used for extracting the abstracts of the remaining texts of the answer text to be selected except the first sentence and the last sentence by using a text abstract model TextRank to obtain a primary text abstract;
the combination module is used for sequentially combining the first sentence, the preliminary text abstract and the tail sentence to obtain an answer text abstract to be selected;
the matching module is used for taking the question and the answer text abstract to be selected as the input of a neural network semantic representation model to obtain the semantic correlation degree of the question and the answer text abstract to be selected;
and the determining module is used for returning the answer text abstract with the highest semantic relevance degree with the question as an answer.
Preferably, the first extraction module is specifically configured to:
and extracting the first sentence and the tail sentence of the answer text to be selected according to the positions of the first sentence and the tail sentence in the answer text to be selected.
Preferably, the second extraction module comprises a segmentation unit, a filtering unit, a weight graph model construction unit, a similarity calculation unit, an iteration unit, a sorting unit and a composition unit which are connected in sequence;
the segmentation unit is used for segmenting the answer text to be selected into sentences and segmenting each sentence;
the filtering unit is used for labeling the part of speech of each word and filtering the labeled words to retain only terms with specific parts of speech;
the weight graph model building unit is used for taking the terms or sentences of the specific words as text units, forming the text units into nodes, and forming edges between the nodes by the similarity between the text units to obtain a weight graph model;
the similarity calculation unit is used for calculating the similarity of any two nodes and taking the similarity value as a calculation parameter of the node weight calculation formula;
the iteration unit is used for iterating the node weight calculation formula until convergence is achieved, and obtaining the score result of each node;
the sorting unit is used for sorting the nodes according to scores among the nodes during convergence to obtain the sorted nodes;
and the composition unit is used for extracting text units from the sorted nodes at the set extraction ratio to form a preliminary text abstract.
Preferably, the similarity calculation method adopted by the similarity calculation unit includes: a vocabulary overlap method, a character string method, a cosine similarity method and a maximum common subsequence method.
Compared with the prior art, the invention has the following technical effects. In practice, in a question-answer pair of a non-factual question-answering system, the answer text is much longer than the question. If a single text abstract extraction method is used, only the global information of the text is considered and the intrinsic feature information of the text units, such as sentence position and term position, is lost; when the extraction ratio of the abstract is set low, topic drift easily occurs. When extracting the abstract of the answer text, the invention retains the first and last sentences of the answer text, applies the abstract extraction method to the remaining content, and combines the first sentence, the abstract, and the last sentence in order as the final extracted abstract. Because the first sentence of an answer in question answering is generally a brief restatement of the question, and the last sentence is generally a brief summary of the answer content, retaining them as components of the abstract preserves the topical completeness of the extracted abstract and improves the accuracy of answer selection.
Drawings
The following detailed description of embodiments of the invention refers to the accompanying drawings in which:
FIG. 1 is a flow chart of a method for selecting answers to non-factual question answers based on text summaries;
FIG. 2 is a schematic diagram of text summarization of answers;
FIG. 3 is a TextRank weight diagram;
FIG. 4 is a block diagram of a neural network semantic representation model;
fig. 5 is a schematic structural diagram of a text abstract-based non-fact question-answer selection system.
Detailed Description
To further illustrate the features of the present invention, refer to the following detailed description of the invention and the accompanying drawings. The drawings are for reference and illustration purposes only and are not intended to limit the scope of the present disclosure.
The embodiment of the application provides a text abstract-based non-factual question-answer selection method, which addresses the low answer-selection accuracy of existing question-answering systems.
To solve the above problem, the main idea of this embodiment is to keep the first sentence and the last sentence of the answer text to be selected, extract an abstract from the remaining text after the first and last sentences are removed, combine the first sentence, the abstract, and the last sentence in order into the final text abstract, match the final abstract against the question, and return the best-matching answer.
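This combination idea can be sketched in Python as follows (the helper names here are illustrative, not from the patent; `summarize_middle` stands in for any extractive summarizer such as TextRank):

```python
def build_answer_digest(sentences, summarize_middle):
    """Keep the first and last sentences and summarize everything between.

    `sentences` is the answer text already split into sentences;
    `summarize_middle` returns a subset of the sentences it is given.
    """
    if len(sentences) <= 2:
        return list(sentences)          # nothing between first and last
    first, last = sentences[0], sentences[-1]
    middle = summarize_middle(sentences[1:-1])
    return [first] + middle + [last]    # combine in the original order
```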
As shown in fig. 1 to fig. 2, the detailed description of the answer selection method for non-factual question answering based on text abstract according to the present embodiment includes the following steps S1 to S5:
s1, extracting a first sentence and a last sentence of the answer text to be selected;
s2, abstracting the remaining text except the first sentence and the last sentence of the answer text to be selected by using a text abstraction model TextRank to obtain a primary text abstract;
s3, sequentially combining the first sentence, the preliminary text abstract and the tail sentence to obtain an answer text abstract to be selected;
s4, taking the question and the answer text abstract to be selected as the input of a neural network semantic representation model to obtain the semantic correlation degree of the question and the answer text abstract to be selected;
and S5, returning the answer text abstract with the highest semantic relevance degree with the question as an answer.
It should be noted that the question and the answer text abstract are input into the neural network answer selection model; the neural network encodes them and mines the text semantics to obtain vector representations, and the semantic correlation degree is finally obtained by computing the similarity of the semantic vectors of the question and the answer text.
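As a minimal sketch of this final matching step, the relevance computation over already-encoded semantic vectors might look as follows; the neural encoder itself is out of scope here, and the function names are illustrative assumptions:

```python
import math

def cosine_similarity(u, v):
    """Semantic correlation of two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_answer(question_vec, candidate_vecs):
    """Return the index of the candidate abstract most relevant to the question."""
    scores = [cosine_similarity(question_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)
```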
More preferably, in step S1, the first sentence and the last sentence of the answer text to be selected are extracted as follows: first, the positions of the first and last sentences of the answer text are identified, and then the sentences are extracted according to those positions. For example, the position of the first period in the answer text is identified and the sentence before it is extracted as the first sentence; the positions of the last two periods in the answer text are identified and the sentence between them is extracted as the last sentence.
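A simple sketch of this position-based extraction, assuming sentences are terminated by periods (real answer text would need more robust sentence splitting):

```python
def split_first_middle_last(text, terminator="."):
    """Split an answer into (first sentence, middle sentences, last sentence)."""
    sentences = [s.strip() for s in text.split(terminator) if s.strip()]
    if not sentences:
        return "", [], ""
    if len(sentences) == 1:
        return sentences[0], [], ""      # no distinct last sentence
    return sentences[0], sentences[1:-1], sentences[-1]
```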
More preferably, in step S2: and abstracting the rest texts of the answer text to be selected except the first sentence and the last sentence by using a text abstraction model TextRank to obtain a preliminary text abstraction. The detailed description is as follows:
When the TextRank algorithm is used to extract key sentences, sentences are taken as nodes and a graph model is built with edges weighted by sentence similarity. The similarity measure most commonly used in TextRank is word overlap: the more words two sentences share, the higher their similarity. Besides word overlap, sentence similarity can also be computed with string-based methods, cosine similarity, or the maximum common subsequence, all of which are based on statistical information. After the graph model is built, the PageRank algorithm computes node scores recursively; the higher a node's score, the more important the corresponding sentence. Once the sentences are ranked by importance, key sentences are extracted at the required ratio to form the text abstract.
The main steps are as follows:
(1) Preprocessing: the text is divided into text units (terms or sentences), and part-of-speech tagging is performed after word segmentation. The tagged words are then filtered to remove stop words and unwanted parts of speech, so that only terms with specific parts of speech remain.
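A minimal preprocessing sketch, assuming whitespace tokenization and a toy stop-word list (a real system would use a proper segmenter and part-of-speech tagger for the target language):

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is"}  # toy list

def preprocess(text):
    """Split text into sentences, tokenize each, and filter stop words."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    units = []
    for sent in sentences:
        terms = [w.lower() for w in sent.split() if w.lower() not in STOP_WORDS]
        units.append(terms)
    return units
```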
(2) Constructing a weight graph model: and forming nodes by the text units, and forming edges among the nodes by the similarity among the text units to form a weight graph model.
(3) Sentence similarity calculation: the similarity of two sentences is computed with a word-overlap-based method. For sentences S_i and S_j, the similarity is computed with the following formula:

Similarity(S_i, S_j) = |{w_k : w_k ∈ S_i and w_k ∈ S_j}| / (log(|S_i|) + log(|S_j|))

where S_i and S_j denote the two sentences, sentence S_i is represented by its N_i terms (S_i = w_1^i, w_2^i, ..., w_{N_i}^i), and w_k denotes a word contained in both sentences. The weight of the edge between the two sentence nodes is W_ji = Similarity(S_i, S_j).
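The word-overlap similarity can be sketched as:

```python
import math

def overlap_similarity(sent_i, sent_j):
    """Word-overlap similarity used by TextRank:
    |S_i ∩ S_j| / (log|S_i| + log|S_j|), for tokenized sentences."""
    common = set(sent_i) & set(sent_j)
    denom = math.log(len(sent_i)) + math.log(len(sent_j))
    return len(common) / denom if denom > 0 else 0.0
```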
(4) Iterating the node score calculation formula until convergence to obtain each node's score: the TextRank algorithm model can be represented by G = (V, E), where V denotes the set of all nodes in the graph and E the set of all edges, E being a subset of V × V; together V and E constitute the whole graph. The score of node V_i is:

WS(V_i) = (1 - d) + d × Σ_{V_j ∈ In(V_i)} ( w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ) × WS(V_j)

where w_ji is the weight of the edge between node V_j and node V_i, usually represented by the similarity of V_j and V_i; In(V_i) denotes the set of nodes pointing to node V_i; Out(V_j) denotes the set of nodes that node V_j points to; and d is the damping coefficient (0 ≤ d ≤ 1), representing the probability of jumping from a given node in the graph to any other node, generally set to d = 0.85.
In addition, two points should be noted when using the TextRank algorithm. First, initialization: the initial score of every node is generally set to 1. Second, convergence: a typical convergence threshold is 0.0001, i.e., iteration stops once the score change of every node in the graph is below 0.0001.
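The iteration of step (4), with the conventional initial score of 1, damping d = 0.85, and convergence threshold 0.0001, can be sketched as:

```python
def textrank_scores(weights, d=0.85, tol=1e-4, max_iter=100):
    """weights[j][i] is the edge weight between nodes j and i.

    Iterates WS(V_i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(V_j)
    until every node's score changes by less than `tol`.
    """
    n = len(weights)
    scores = [1.0] * n                       # all nodes initialized to 1
    out_sum = [sum(row) for row in weights]  # total outgoing weight per node
    for _ in range(max_iter):
        new = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            new.append((1 - d) + d * rank)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```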
(5) Sorting all nodes according to their scores, and extracting text units from the top-ranked nodes at the set extraction ratio to form a preliminary abstract text.
It should be noted that the extraction ratio is set according to actual needs; this removes spoken expressions and redundant information from the answer text and helps ensure accurate extraction.
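Step (5), keeping the top-scoring units at a set ratio while preserving document order, might be sketched as:

```python
def extract_top_units(units, scores, ratio=0.3):
    """Keep the highest-scoring text units, preserving document order."""
    k = max(1, int(len(units) * ratio))            # keep at least one unit
    top = sorted(range(len(units)), key=lambda i: -scores[i])[:k]
    return [units[i] for i in sorted(top)]         # restore original order
```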
It should be noted that the TextRank algorithm is a classic method for extracting keywords and abstract sentences from text; it is an unsupervised, graph-based algorithm. In this embodiment, TextRank ranks the keywords and key sentences in the text using the PageRank algorithm.
For example, to calculate the similarity of sentence S_i and sentence S_j, a weight graph as shown in FIG. 3 is built: node V_i represents sentence S_i and node V_j represents sentence S_j. The weight of the edge between node V_j and node V_k is denoted w_jk, and the similarity of node V_j and node V_{k+1} gives the weight w_{j,k+1}; both are obtained from the similarity formula. The TextRank score of node V_i can then be computed from the score formula, in which the sum w_jk + w_{j,k+1} is the total outgoing edge weight of node V_j:
it should be noted that the TextRank algorithm is an unsupervised method for extracting keywords and key sentences. The method has the advantages that a corpus does not need to be trained, the method can be well used for texts with contents in different fields, linguistic knowledge or domain knowledge does not need to be considered, and the overall structure of the texts is comprehensively considered. The disadvantage is that the TextRank algorithm only considers the global information of the text, and lacks the self characteristic information of the text unit, such as the position of a sentence, the position of a term, and the like.
In practical application, in a question-answer pair of a non-factual question-answering system, the answer text is much longer than the question, and with a single text abstract extraction method a low extraction ratio easily causes topic drift. As shown in fig. 4, when extracting the answer text abstract, the first and last sentences of the answer text are retained, and the abstract extraction method is then applied to the rest. The characteristics of answer text in question answering show that the first sentence of an answer is generally a brief restatement of the question followed by a solution method, while the end of an answer is typically a brief summary of the answer content. Therefore, retaining the first and last sentences of the answer text when extracting the answer abstract ensures the topical integrity of the abstract and further improves the accuracy of answer selection.
Meanwhile, relative to the original answer text, the extracted abstract removes spoken expressions and redundant information without practical meaning, yielding an efficient answer text representation; the neural network semantic representation model then produces semantic vectors that contain more of the key information.
As shown in fig. 5, the embodiment discloses a non-factual question-answer selection system based on a text abstract, which includes a first extraction module 10, a second extraction module 20, a combination module 30, a matching module 40 and a determination module 50, which are connected in sequence;
the first extraction module 10 is configured to extract a first sentence and a last sentence of the answer text to be selected;
a second extraction module 20, configured to extract the abstracts of the remaining texts of the answer text to be selected, except for the first sentence and the last sentence, by using a text abstraction model TextRank, so as to obtain a preliminary text abstraction;
the combination module 30 is configured to sequentially combine the first sentence, the preliminary text abstract, and the last sentence to obtain an answer text abstract to be selected;
the matching module 40 is configured to use the question and the to-be-selected answer text abstract as inputs of a neural network semantic representation model to obtain semantic correlation degrees of the question and the to-be-selected answer text abstract;
and the determining module 50 is used for returning the answer text abstract with the highest semantic relevance degree with the question as the answer.
As a further preferred scheme, the first extraction module 10 is specifically configured to:
and extracting the first sentence and the tail sentence of the answer text to be selected according to the positions of the first sentence and the tail sentence in the answer text to be selected.
As a further preferred scheme, the second extraction module 20 includes a segmentation unit, a filtering unit, a weight map model construction unit, a similarity calculation unit, an iteration unit, a sorting unit, and a composition unit, which are connected in sequence;
the segmentation unit is used for segmenting the answer text to be selected into sentences and segmenting each sentence;
the filtering unit is used for labeling the part of speech of each word and filtering the labeled words to retain only terms with specific parts of speech;
the weight graph model building unit is used for taking the terms or sentences of the specific words as text units, forming the text units into nodes, and forming edges between the nodes by the similarity between the text units to obtain a weight graph model;
the similarity calculation unit is used for calculating the similarity of any two nodes and taking the similarity value as a calculation parameter of the node weight calculation formula;
the iteration unit is used for iterating the node weight calculation formula until convergence is achieved, and obtaining the score result of each node;
the sorting unit is used for sorting the nodes according to scores among the nodes during convergence to obtain the sorted nodes;
and the composition unit is used for extracting text units from the sorted nodes at the set extraction ratio to form a preliminary text abstract.
As a further preferable aspect, the similarity calculation method adopted by the similarity calculation unit includes: a vocabulary overlap method, a character string method, a cosine similarity method and a maximum common subsequence method.
It should be understood that the text abstract-based non-factual question-answer selection system of this embodiment implements the processes of fig. 1 and has the same technical features and effects as the text abstract-based non-factual question-answer selection method of this embodiment, which are not described again here.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (6)
1. A non-factual question answer selection method based on text abstract is characterized by comprising the following steps:
extracting a first sentence and a last sentence of an answer text to be selected;
extracting the abstracts of the remaining texts of the answer text to be selected except the first sentence and the last sentence by using a text abstract model TextRank to obtain a preliminary text abstract, which comprises the following steps:
dividing the answer text to be selected into sentences, and segmenting each sentence;
labeling the part of speech of each word, and filtering the labeled words to retain only terms with specific parts of speech;
taking the terms or sentences of the specific words as text units, forming nodes by the text units, and forming edges between the nodes by the similarity between the text units to obtain a weight graph model;
calculating the similarity of any two nodes, and taking the similarity value as a calculation parameter of a node weight calculation formula;
iterating the node weight calculation formula until convergence is achieved to obtain a score result of each node;
according to the scores among all the nodes during convergence, all the nodes are sorted to obtain the sorted nodes;
extracting text units from the sorted nodes according to a set extraction ratio to form a preliminary text abstract;
sequentially combining the first sentence, the preliminary text abstract and the last sentence to obtain an answer text abstract to be selected;
taking the question and the answer text abstract to be selected as input of a neural network semantic representation model to obtain semantic correlation degree of the question and the answer text abstract to be selected;
and returning the answer text abstract with the highest semantic relevance to the question as the answer.
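The extraction pipeline of claim 1 can be illustrated with the following minimal sketch. This is a hedged approximation, not the patented implementation: it assumes whitespace tokenization and skips part-of-speech filtering, and the vocabulary-overlap similarity and damping factor d = 0.85 follow the original TextRank paper, details the claim itself does not specify.

```python
import math

def sentence_similarity(a, b):
    # Vocabulary-overlap similarity from the original TextRank paper:
    # |A ∩ B| / (log|A| + log|B|), over whitespace-tokenized word sets.
    wa, wb = set(a.split()), set(b.split())
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))

def textrank_summary(sentences, ratio=0.3, d=0.85, tol=1e-6):
    """Score sentence nodes by iterating the node-weight formula to
    convergence, then keep the top fraction in original order."""
    n = len(sentences)
    if n == 0:
        return ""
    # Edges: pairwise similarity between sentence nodes (weighted graph).
    sim = [[0.0 if i == j else sentence_similarity(sentences[i], sentences[j])
            for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in sim]
    scores = [1.0] * n
    while True:
        new = [(1 - d) + d * sum(sim[j][i] / out_sums[j] * scores[j]
                                 for j in range(n) if out_sums[j] > 0)
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    # Sort nodes by score and extract the set extraction ratio.
    k = max(1, int(n * ratio))
    top = sorted(sorted(range(n), key=scores.__getitem__, reverse=True)[:k])
    return " ".join(sentences[i] for i in top)

def build_candidate_summary(answer_sentences, ratio=0.3):
    # First sentence + preliminary TextRank abstract of the middle + last
    # sentence, combined in order, as in claim 1.
    first, last = answer_sentences[0], answer_sentences[-1]
    middle = answer_sentences[1:-1]
    core = textrank_summary(middle, ratio) if middle else ""
    return " ".join(s for s in (first, core, last) if s)
```

The first/last sentences bypass ranking entirely, which is the claimed mechanism for preserving topic completeness of the abstract.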
2. The method for selecting a non-factual question-answer based on a text abstract as claimed in claim 1, wherein said extracting the first sentence and the last sentence of the answer text to be selected comprises:
and extracting the first sentence and the last sentence of the answer text to be selected according to their positions in the answer text to be selected.
3. The method for selecting a non-factual question-answer based on a text abstract as claimed in claim 1, wherein the method for calculating the similarity between any two nodes comprises: a vocabulary overlap method, a character string method, a cosine similarity method and a longest common subsequence method.
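Claim 3 names the similarity measures without giving formulas. Two of them, cosine similarity over term-frequency vectors and the longest common subsequence over word sequences, can be sketched as follows; this is an illustrative assumption about the intended definitions, since the patent does not specify tokenization or weighting.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    # Cosine over term-frequency vectors of the two text units.
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def lcs_length(a, b):
    # Longest common subsequence over word sequences (dynamic programming).
    wa, wb = a.split(), b.split()
    dp = [[0] * (len(wb) + 1) for _ in range(len(wa) + 1)]
    for i, x in enumerate(wa, 1):
        for j, y in enumerate(wb, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j],
                                                               dp[i][j - 1])
    return dp[-1][-1]
```

Either value can serve as the edge weight on the graph of claim 1, normalized as the node-weight formula requires.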
4. A non-factual question-answer selection system based on a text abstract is characterized by comprising a first extraction module, a second extraction module, a combination module, a matching module and a determination module which are connected in sequence;
the first extraction module is used for extracting a first sentence and a last sentence of the answer text to be selected;
the second extraction module is used for performing abstract extraction, with the text abstract model TextRank, on the remaining text of the answer text to be selected excluding the first sentence and the last sentence, to obtain a preliminary text abstract; it comprises a segmentation unit, a filtering unit, a weighted graph model construction unit, a similarity calculation unit, an iteration unit, a sorting unit and a composition unit which are connected in sequence;
the segmentation unit is used for segmenting the answer text to be selected into sentences and segmenting each sentence;
the filtering unit is used for labeling the part of speech of each word and filtering the labeled words to retain terms of the specified parts of speech;
the weighted graph model construction unit is used for taking the retained terms or the sentences as text units, forming nodes from the text units, and forming edges between nodes from the similarity between text units, to obtain a weighted graph model;
the similarity calculation unit is used for calculating the similarity of any two nodes and taking the similarity value as a calculation parameter of the node weight calculation formula;
the iteration unit is used for iterating the node weight calculation formula until convergence is achieved, and obtaining the score result of each node;
the sorting unit is used for sorting all the nodes according to their scores at convergence to obtain the sorted nodes;
the composition unit is used for extracting text units from the sorted nodes according to a set extraction ratio to form a preliminary text abstract;
the combination module is used for sequentially combining the first sentence, the preliminary text abstract and the last sentence to obtain an answer text abstract to be selected;
the matching module is used for taking the question and the answer text abstract to be selected as the input of a neural network semantic representation model to obtain the semantic correlation degree of the question and the answer text abstract to be selected;
and the determining module is used for returning the answer text abstract with the highest semantic relevance degree with the question as an answer.
5. The system of claim 4, wherein the first extraction module is specifically configured to:
and extracting the first sentence and the last sentence of the answer text to be selected according to their positions in the answer text to be selected.
6. The system for selecting a non-factual question-answer based on a text abstract according to claim 4, wherein the similarity calculation method employed by the similarity calculation unit comprises: a vocabulary overlap method, a character string method, a cosine similarity method and a longest common subsequence method.
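The matching step in claims 1 and 4 relies on an unspecified "neural network semantic representation model". As a stand-in, the sketch below scores question/abstract pairs with a mean-pooled embedding encoder and cosine similarity; `embed_word` is a hypothetical placeholder (deterministic pseudo-random vectors so that identical words always match), not the patent's trained model.

```python
import hashlib
import math
import random

DIM = 16
_cache = {}

def embed_word(word):
    # Hypothetical placeholder for learned word embeddings: a deterministic
    # pseudo-random vector per word (seeded from an MD5 digest so results
    # are stable across runs).
    if word not in _cache:
        seed = int(hashlib.md5(word.encode()).hexdigest(), 16) % (2 ** 32)
        rng = random.Random(seed)
        _cache[word] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return _cache[word]

def encode(text):
    # Mean-pool word vectors into a fixed-size semantic representation.
    vecs = [embed_word(w) for w in text.split()] or [[0.0] * DIM]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def semantic_relevance(question, abstract):
    # Cosine similarity between the two pooled representations.
    q, a = encode(question), encode(abstract)
    dot = sum(x * y for x, y in zip(q, a))
    nq = math.sqrt(sum(x * x for x in q))
    na = math.sqrt(sum(x * x for x in a))
    return dot / (nq * na) if nq and na else 0.0

def select_answer(question, candidate_abstracts):
    # Return the candidate abstract with the highest semantic relevance,
    # as the determining module of claim 4 does.
    return max(candidate_abstracts, key=lambda c: semantic_relevance(question, c))
```

In the patented system the encoder would be a trained neural network; only the scoring-and-argmax structure is taken from the claims.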
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810428163.8A CN108681574B (en) | 2018-05-07 | 2018-05-07 | Text abstract-based non-fact question-answer selection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108681574A CN108681574A (en) | 2018-10-19 |
CN108681574B true CN108681574B (en) | 2021-11-05 |
Family
ID=63801897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810428163.8A Active CN108681574B (en) | 2018-05-07 | 2018-05-07 | Text abstract-based non-fact question-answer selection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108681574B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543089A (en) * | 2018-11-30 | 2019-03-29 | 南方电网科学研究院有限责任公司 | Classification method, system and related device of network security information data |
CN109766418B (en) * | 2018-12-13 | 2021-08-24 | 北京百度网讯科技有限公司 | Method and apparatus for outputting information |
CN109902284A (en) * | 2018-12-30 | 2019-06-18 | 中国科学院软件研究所 | A kind of unsupervised argument extracting method excavated based on debate |
CN109829052A (en) * | 2019-02-19 | 2019-05-31 | 田中瑶 | A kind of open dialogue method and system based on human-computer interaction |
CN110674286A (en) * | 2019-09-29 | 2020-01-10 | 出门问问信息科技有限公司 | Text abstract extraction method and device and storage equipment |
CN111241288A (en) * | 2020-01-17 | 2020-06-05 | 烟台海颐软件股份有限公司 | Emergency sensing system of large centralized power customer service center and construction method |
CN111401033B (en) | 2020-03-19 | 2023-07-25 | 北京百度网讯科技有限公司 | Event extraction method, event extraction device and electronic equipment |
CN113806500B (en) * | 2021-02-09 | 2024-05-28 | 京东科技控股股份有限公司 | Information processing method, device and computer equipment |
CN113282711B (en) * | 2021-06-03 | 2023-09-22 | 中国软件评测中心(工业和信息化部软件与集成电路促进中心) | Internet of vehicles text matching method and device, electronic equipment and storage medium |
CN113688231A (en) * | 2021-08-02 | 2021-11-23 | 北京小米移动软件有限公司 | Abstract extraction method and device of answer text, electronic equipment and medium |
CN113918702B (en) * | 2021-10-25 | 2022-07-01 | 北京航空航天大学 | Semantic matching-based online law automatic question-answering method and system |
CN114997175B (en) * | 2022-05-16 | 2024-06-18 | 电子科技大学 | Emotion analysis method based on domain countermeasure training |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104679728A (en) * | 2015-02-06 | 2015-06-03 | 中国农业大学 | Text similarity detection device |
CN104699763A (en) * | 2015-02-11 | 2015-06-10 | 中国科学院新疆理化技术研究所 | Text similarity measuring system based on multi-feature fusion |
CN106126492A (en) * | 2016-06-07 | 2016-11-16 | 北京高地信息技术有限公司 | Statement recognition methods based on two-way LSTM neutral net and device |
CN106202042A (en) * | 2016-07-06 | 2016-12-07 | 中央民族大学 | A kind of keyword abstraction method based on figure |
CN106844368A (en) * | 2015-12-03 | 2017-06-13 | 华为技术有限公司 | For interactive method, nerve network system and user equipment |
CN107562792A (en) * | 2017-07-31 | 2018-01-09 | 同济大学 | A kind of question and answer matching process based on deep learning |
CN107590163A (en) * | 2016-07-06 | 2018-01-16 | 北京京东尚科信息技术有限公司 | The methods, devices and systems of text feature selection |
CN107832457A (en) * | 2017-11-24 | 2018-03-23 | 国网山东省电力公司电力科学研究院 | Power transmission and transforming equipment defect dictionary method for building up and system based on TextRank algorithm |
CN107980130A (en) * | 2017-11-02 | 2018-05-01 | 深圳前海达闼云端智能科技有限公司 | It is automatic to answer method, apparatus, storage medium and electronic equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2544324A1 (en) * | 2005-06-10 | 2006-12-10 | Unicru, Inc. | Employee selection via adaptive assessment |
US10431205B2 (en) * | 2016-04-27 | 2019-10-01 | Conduent Business Services, Llc | Dialog device with dialog support generated using a mixture of language models combined using a recurrent neural network |
2018-05-07 CN CN201810428163.8A patent/CN108681574B/en active Active
Non-Patent Citations (2)
Title |
---|
"Intent Identification for Knowledge Base Question Answering";Feifei Dai 等;《2017 Conference on Technologies and Applications of Artificial Intelligence (TAAI)》;20171203;第96-99页 * |
"基于卷积神经网络的自动问答";金丽娇 等;《华东师范大学学报(自然科学版)》;20171006;第66-79页 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108681574B (en) | Text abstract-based non-fact question-answer selection method and system | |
CN109408642B (en) | Domain entity attribute relation extraction method based on distance supervision | |
US11775760B2 (en) | Man-machine conversation method, electronic device, and computer-readable medium | |
CN109190117B (en) | Short text semantic similarity calculation method based on word vector | |
JP6309644B2 (en) | Method, system, and storage medium for realizing smart question answer | |
JP6813591B2 (en) | Modeling device, text search device, model creation method, text search method, and program | |
CN106599032B (en) | Text event extraction method combining sparse coding and structure sensing machine | |
Brodsky et al. | Characterizing motherese: On the computational structure of child-directed language | |
CN109960786A (en) | Chinese Measurement of word similarity based on convergence strategy | |
US10496756B2 (en) | Sentence creation system | |
CN111177365A (en) | Unsupervised automatic abstract extraction method based on graph model | |
CN104391942A (en) | Short text characteristic expanding method based on semantic atlas | |
CN109783806B (en) | Text matching method utilizing semantic parsing structure | |
CN109271524B (en) | Entity linking method in knowledge base question-answering system | |
Sun et al. | Mining dependency relations for query expansion in passage retrieval | |
CN110188174B (en) | Professional field FAQ intelligent question and answer method based on professional vocabulary mining | |
CN105528437A (en) | Question-answering system construction method based on structured text knowledge extraction | |
Chen et al. | Automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features | |
CN107844608B (en) | Sentence similarity comparison method based on word vectors | |
CN112417170B (en) | Relationship linking method for incomplete knowledge graph | |
CN109508460A (en) | Unsupervised composition based on Subject Clustering is digressed from the subject detection method and system | |
Ruiz-Casado et al. | Using context-window overlapping in synonym discovery and ontology extension | |
CN116227466A (en) | Sentence generation method, device and equipment with similar semantic different expressions | |
CN110059318B (en) | Discussion question automatic evaluation method based on Wikipedia and WordNet | |
CN109002540B (en) | Method for automatically generating Chinese announcement document question answer pairs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||