CN108681574A - Non-factoid question-answering answer selection method and system based on text summarization - Google Patents
Non-factoid question-answering answer selection method and system based on text summarization
- Publication number: CN108681574A (application CN201810428163.8A)
- Authority: CN (China)
- Prior art keywords: sentence, text, answer, summary, node
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a non-factoid question-answering answer selection method and system based on text summarization, belonging to the field of intelligent search. The method comprises: extracting the first and last sentences of each candidate answer text; applying the text summarization model TextRank to the remaining text of the candidate answer, excluding the first and last sentences, to obtain a preliminary summary; concatenating the first sentence, the preliminary summary, and the last sentence, in that order, to obtain the candidate answer summary; taking the question and each candidate answer summary as inputs to a neural semantic representation model to obtain the semantic relevance between the question and the candidate answer summary; and returning the answer summary with the highest semantic relevance to the question as the answer. When extracting the answer summary, the present invention extracts the first and last sentences of the answer text as components of the summary, which preserves the topical integrity of the extracted summary and thereby improves the accuracy of answer selection.
Description
Technical field
The present invention relates to the field of intelligent search, and in particular to a non-factoid question-answering answer selection method and system based on text summarization.
Background art
Question answering (QA) systems have become an important research topic in natural language processing, with applications in many fields of information acquisition, such as information retrieval, expert systems, automatic question answering, and human-machine natural-language interaction. Unlike information retrieval, a QA system does not require the user to find the answer: it returns the answer directly.
By data source, QA systems fall into three classes: systems based on structured data, systems based on free text, and systems based on question-answer pairs. In a system based on question-answer pairs, after the user poses a question the system performs semantic feature analysis and returns the semantically best-matching answer; the data mainly come from community question answering on the Web.
Early research on answer selection was generally based on traditional semantic feature extraction: text features were chosen manually, and a high-performance classifier was then trained on them. Compared with learned representations, manually defined features are more interpretable, and the chosen features cover the whole data set. These features mainly reflect, from the answer text, the linguistic quality of the sentences and the relevance between the question and the answer content. Manually chosen features typically include word N-gram language models, syntactic structure, and grammatical dependencies. When studying answer selection, early researchers most commonly used existing natural-language-processing tools to segment, part-of-speech tag, or parse the text, and then trained an answer selection model on manually defined features.
However, answer texts in non-factoid QA are diverse in form and contain noise, and general language rules are rarely enough to match the correct answer. For the answer selection task in non-factoid QA systems, the current mainstream approach is therefore to mine the semantic information of the text with supervised machine learning over the received text, for example: training SVM models on word-level matching features such as keyword-matching features, phrase-level non-semantic features, and features based on named entities. Researchers have also extracted text features with natural-language-processing tools to develop a series of lexical features related to answer quality, including the presence of punctuation and hyperlinks, the number of special words, parts of speech, the frequency of named-entity features, and N-gram language models. Syntax trees capture the local structure of a sentence well, and syntax-tree-based answer selection can effectively reduce the feature-selection workload. Methods that combine syntactic and semantic features compute, on the syntactic side, the tree edit distance between the dependency trees of the question and the answer, and use, on the semantic side, shallow semantic features such as entity types and synonyms.
Here, the tree edit distance is the total cost of the operations (insert, delete, and replace) required to transform one tree into the other; its computation is analogous to the edit distance between strings. Conditional Random Fields (CRFs) have been used to label the sequences in question-answer pairs, with features including tree edit distance and string edit distance; this was the first time the answer selection problem in community QA was cast as a sequence labeling problem. Besides syntax trees, some researchers compare the relevance of the question and the answer text from the perspective of language models and word vectors, for example treating the question and the candidate answers as two different languages and using a translation-based model to compare their relevance.
Answer selection methods based on traditional semantic feature extraction are often highly interpretable: the manually chosen features have a traceable rationale and are easy to understand. But they also have two defects. First, they depend on toolkits from basic research in natural language processing, so the effectiveness of the chosen features depends on the state of that basic research; a feature-extraction idea may be well founded yet still fail to deliver the desired result on complex text. Second, the features in the answer selection model ultimately depend on human choice; the model has no ability to learn on its own, which limits its applicability.
Summary of the invention
The object of the present invention is to provide a non-factoid question-answering answer selection method and system based on text summarization, so as to improve the accuracy of answer selection in question-answering systems.
To achieve the above object, the present invention adopts a non-factoid question-answering answer selection method based on text summarization, comprising the following steps:
extracting the first and last sentences of the candidate answer text;
applying the text summarization model TextRank to the remaining text of the candidate answer, excluding the first and last sentences, to obtain a preliminary summary;
concatenating the first sentence, the preliminary summary, and the last sentence, in order, to obtain the candidate answer summary;
taking the question and the candidate answer summary as inputs to a neural semantic representation model, and obtaining the semantic relevance between the question and the candidate answer summary;
returning the answer summary with the highest semantic relevance to the question as the answer.
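The five steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `summarize` stands in for the TextRank model, `relevance` for the neural semantic representation model, and sentences are naively delimited by the Chinese full stop; all of these names and choices are hypothetical.

```python
def select_answer(question, candidate_answers, summarize, relevance):
    """Pick the candidate whose summary is most relevant to the question.

    `summarize` maps a list of middle sentences to a summarized sublist
    (standing in for TextRank); `relevance` scores (question, summary)
    pairs (standing in for the neural semantic representation model).
    """
    best_answer, best_score = None, float("-inf")
    for text in candidate_answers:
        sentences = [s for s in text.split("。") if s]  # naive split
        if len(sentences) <= 2:
            summary = text  # too short: keep the whole answer
        else:
            first, last = sentences[0], sentences[-1]
            middle = summarize(sentences[1:-1])
            # first sentence + preliminary summary + last sentence, in order
            summary = "。".join([first] + middle + [last]) + "。"
        score = relevance(question, summary)
        if score > best_score:
            best_answer, best_score = summary, score
    return best_answer
```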
Preferably, extracting the first and last sentences of the candidate answer text comprises: extracting the first and last sentences of the candidate answer text according to their positions within it.
Preferably, applying the text summarization model TextRank to the remaining text of the candidate answer, excluding the first and last sentences, to obtain a preliminary summary comprises:
splitting the candidate answer text into sentences and segmenting each sentence into words;
tagging the part of speech of each word and filtering the tagged word information to obtain the terms of specific words;
taking the specific terms or sentences as text units, building nodes from the text units and edges from the similarities between text units, to obtain a weighted graph model;
computing the similarity between every two nodes and using the similarity values as parameters of the node-weight formula;
iterating the node-weight formula until convergence to obtain the score of each node;
ranking the nodes by their scores at convergence;
extracting text units from the ranked nodes according to a set extraction ratio to form the preliminary summary.
Preferably, the methods for computing the similarity between two nodes include: word overlap, string-based similarity, cosine similarity, and longest common subsequence.
In another aspect, a non-factoid question-answering answer selection system based on text summarization is adopted, comprising a first extraction module, a second extraction module, a combination module, a matching module, and a determination module, connected in sequence:
the first extraction module is used to extract the first and last sentences of the candidate answer text;
the second extraction module is used to apply the text summarization model TextRank to the remaining text of the candidate answer, excluding the first and last sentences, to obtain a preliminary summary;
the combination module is used to concatenate the first sentence, the preliminary summary, and the last sentence, in order, to obtain the candidate answer summary;
the matching module is used to take the question and the candidate answer summary as inputs to the neural semantic representation model and obtain the semantic relevance between the question and the candidate answer summary;
the determination module is used to return the answer summary with the highest semantic relevance to the question as the answer.
Preferably, the first extraction module is specifically used to: extract the first and last sentences of the candidate answer text according to their positions within it.
Preferably, the second extraction module comprises a segmentation unit, a filtering unit, a weighted-graph construction unit, a similarity computation unit, an iteration unit, a ranking unit, and a combination unit, connected in sequence:
the segmentation unit is used to split the candidate answer text into sentences and segment each sentence into words;
the filtering unit is used to tag the part of speech of each word and filter the tagged word information to obtain the terms of specific words;
the weighted-graph construction unit is used to take the specific terms or sentences as text units, build nodes from the text units and edges from the similarities between text units, and obtain the weighted graph model;
the similarity computation unit is used to compute the similarity between every two nodes and use the similarity values as parameters of the node-weight formula;
the iteration unit is used to iterate the node-weight formula until convergence and obtain the score of each node;
the ranking unit is used to rank the nodes by their scores at convergence;
the combination unit is used to extract text units from the ranked nodes according to the set extraction ratio to form the preliminary summary.
Preferably, the similarity computation methods used by the similarity computation unit include: word overlap, string-based similarity, cosine similarity, and longest common subsequence.
Compared with the prior art, the present invention has the following technical effect. In practice, in the question-answer pairs of a non-factoid QA system the answer text is much longer than the question. A summarization method used on its own considers only the global information of the text and lacks features of the text units themselves, such as the position of a sentence or the position of a term; when the extraction ratio is set very low, topic drift easily occurs. When extracting the answer summary, this scheme retains the first and last sentences of the answer text, applies extractive summarization to the rest, and then concatenates the first sentence, the summary, and the last sentence, in order, as the final extracted summary. Since the first sentence of an answer in QA is usually a brief restatement of the question and the last sentence is usually a short summary of the answer content, extracting the first and last sentences of the answer text as components of the summary preserves the topical integrity of the extracted summary and thereby improves the accuracy of answer selection.
Brief description of the drawings
The specific embodiments of the present invention are described in detail below with reference to the accompanying drawings:
Fig. 1 is a flow diagram of a non-factoid question-answering answer selection method based on text summarization;
Fig. 2 is a schematic diagram of answer summary extraction;
Fig. 3 is a TextRank weight graph;
Fig. 4 is the framework of the neural semantic representation model;
Fig. 5 is a structural diagram of a non-factoid question-answering answer selection system based on text summarization.
Detailed description of the embodiments
To further illustrate the features of the present invention, refer to the following detailed description and the accompanying drawings. The drawings are for reference and discussion only and are not intended to limit the scope of protection of the present invention.
The embodiments of the present application provide a non-factoid question-answering answer selection method based on text summarization to address the low answer-selection accuracy of existing QA systems.
To solve this problem, the main idea of this embodiment is: when extracting a summary from a candidate answer text, retain its first and last sentences; extract a summary from the remaining text after removing the first and last sentences; concatenate the first sentence, the summary, and the last sentence, in order, into the final summary; and match the final summary against the question to obtain the answer to return.
As shown in Fig. 1 and Fig. 2, the non-factoid question-answering answer selection method based on text summarization provided by this embodiment comprises the following steps S1 to S5:
S1. Extract the first and last sentences of the candidate answer text.
S2. Apply the text summarization model TextRank to the remaining text of the candidate answer, excluding the first and last sentences, to obtain a preliminary summary.
S3. Concatenate the first sentence, the preliminary summary, and the last sentence, in order, to obtain the candidate answer summary.
S4. Take the question and the candidate answer summary as inputs to the neural semantic representation model, and obtain the semantic relevance between the question and the candidate answer summary.
S5. Return the answer summary with the highest semantic relevance to the question as the answer.
It should be noted that the question and the answer summary are fed into the neural answer selection model: the neural network encodes the question and the answer summary, mines their semantics to obtain vector representations, and finally obtains their semantic relevance by computing the similarity between the semantic vectors of the question and the answer text.
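The final step, computing the similarity between the two semantic vectors, can be sketched as a plain cosine similarity. The encoder that produces the vectors is not specified in detail here and is assumed to exist separately:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length semantic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # degenerate (all-zero) vector: define similarity as 0
    return dot / (norm_u * norm_v)
```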
As a further preferred scheme, step S1, extracting the first and last sentences of the candidate answer text, proceeds as follows: first identify the positions of the first and last sentences of the candidate answer text, then extract the first and last sentences according to those positions. For example, identify the position of the first full stop in the answer text and extract the sentence before it as the first sentence; identify the positions of the last two full stops in the answer text and extract the sentence between them as the last sentence.
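The position-based extraction in this example can be sketched as follows, assuming sentences are delimited by the Chinese full stop `。`; the helper name is illustrative only:

```python
def split_first_last(answer_text):
    """Return (first sentence, middle sentences, last sentence).

    The first sentence is the text before the first full stop; the last
    sentence is the text between the last two full stops, matching the
    position-based extraction described above.
    """
    parts = [s for s in answer_text.split("。") if s]
    if len(parts) < 3:
        # too short to separate first / middle / last meaningfully
        return answer_text, [], ""
    first = parts[0] + "。"
    middle = [p + "。" for p in parts[1:-1]]
    last = parts[-1] + "。"
    return first, middle, last
```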
As a further preferred scheme, step S2, applying the text summarization model TextRank to the remaining text of the candidate answer, excluding the first and last sentences, to obtain a preliminary summary, is explained in detail as follows.
When TextRank extracts key sentences, sentences are first marked as nodes, and a graph model is then built from the edges. For sentence similarity, TextRank generally uses a word-overlap method: the more repeated words two sentences share, the higher their similarity. Besides word overlap, sentence-similarity methods such as string-based similarity, cosine similarity, longest common subsequence, and part-of-speech methods can also be used; all are based on statistical information. After the graph model is built, the PageRank algorithm performs the recursive computation and finally yields the score of each node. The higher a node's score, the more important its sentence. After ranking sentences by importance, key sentences are extracted at the required ratio to form the summary.
The key steps are as follows:
(1) Preprocessing: the text is split into several text units (terms or sentences); after word segmentation, part-of-speech tagging is performed. The tagged word information is then filtered, removing stop words and filtering by part of speech, so that only terms of specific parts of speech are retained.
(2) Building the weighted graph model: text units become the nodes, and the similarities between text units become the edges, forming the weighted graph model.
(3) Sentence similarity computation: the similarity of two sentences is computed with the word-overlap method. For sentences $S_i$ and $S_j$:

$$\mathrm{Similarity}(S_i, S_j) = \frac{|\{w_k \mid w_k \in S_i \wedge w_k \in S_j\}|}{\log|S_i| + \log|S_j|}$$

where sentence $S_i$ is represented by its $N_i$ terms, $S_i = \{w_1, w_2, \ldots, w_{N_i}\}$, and $w_k$ denotes a word contained in both sentences. The edge weight is then $w_{ji} = \mathrm{Similarity}(S_i, S_j)$.
(4) The node-score formula is iterated to convergence to obtain each node's score. The TextRank model can be written as $G = (V, E)$, where $V$ denotes the set of all nodes in the graph and $E$, a subset of $V \times V$, denotes the set of all edges; together $V$ and $E$ constitute the entire graph. The score of node $V_i$ is:

$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$

where $w_{ji}$ is the weight of the edge between nodes $V_j$ and $V_i$, generally taken to be the similarity of $V_j$ and $V_i$; $In(V_i)$ is the set of nodes pointing to $V_i$; $Out(V_j)$ is the set of nodes that $V_j$ points to; and $d$ is the damping coefficient ($0 \le d \le 1$), the probability that a node in Fig. 3 jumps to any other node, usually set to 0.85.
In addition, two points should be noted when using TextRank. First, initialization: all nodes generally start with an initial score of 1. Second, the convergence test: the convergence threshold is generally 0.0001, i.e., iteration stops once the error of every node in the graph falls below 0.0001.
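The iteration with these two conventions (initial score 1, convergence threshold 0.0001, damping 0.85) can be sketched as follows; `weights[j][i]` is assumed to hold the similarity between nodes j and i, with a zero diagonal:

```python
def textrank_scores(weights, d=0.85, tol=1e-4, max_iter=200):
    """Iterate WS(V_i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(V_j)
    until every node's score changes by less than `tol`."""
    n = len(weights)
    scores = [1.0] * n                       # all initial scores are 1
    out_sum = [sum(row) for row in weights]  # total outgoing weight of node j
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            new_scores.append((1 - d) + d * incoming)
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores
```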
(5) All nodes are ranked by their scores, and key sentences are extracted from them at different extraction ratios to form the preliminary summary text.
It should be noted that setting the extraction ratio according to actual needs removes colloquial expressions and redundant information from the answer text and ensures the accuracy of key-sentence extraction.
It should also be noted that TextRank is a classic method for extracting keywords and summary sentences from text; in principle, it is an unsupervised graph-based algorithm. In this embodiment, TextRank ranks the keywords and key sentences of the text using the PageRank algorithm.
For example, to compute the similarity of sentences $S_i$ and $S_j$, first build the weight graph shown in Fig. 3, in which node $V_i$ represents sentence $S_i$ and node $V_j$ represents sentence $S_j$. The similarity of nodes $V_j$ and $V_k$ is the edge weight $w_{jk}$, and the similarity of $V_j$ and $V_{k+1}$ is $w_{j,k+1}$, both obtained from the similarity formula. The TextRank score of node $V_i$ can then be computed from the score formula, in which the sum $w_{jk} + w_{j,k+1}$ is the total weight of the edges leaving node $V_j$, used to normalize its contribution.
It should be noted that TextRank is an unsupervised method for extracting keywords and key sentences. Its advantage is that it needs no training corpus, performs well on text from different domains, requires no linguistic or domain knowledge, and takes the overall structure of the text into account. Its disadvantage is that, since it considers only the global information of the text, it lacks features of the text units themselves, such as the position of a sentence or the position of a term.
In practice, in the question-answer pairs of a non-factoid QA system the answer text is much longer than the question, and with a summarization method used on its own, topic drift easily occurs when the extraction ratio is set very low. As shown in Fig. 4, when extracting the answer summary this scheme retains the first and last sentences of the answer text and then applies extractive summarization to the rest. From the characteristics of answer texts in QA, the first sentence of an answer is usually a brief restatement of the question, followed by the way to solve the problem, and the last sentence is usually a short summary of the answer content. Therefore, using the first and last sentences of the answer text when extracting the answer summary preserves the topical integrity of the summary and in turn improves the accuracy of answer selection.
At the same time, compared with the original answer text, the extracted summary suppresses colloquial expressions without practical meaning and redundant information, yielding an efficient answer representation; the neural semantic representation model then produces a semantic vector containing more of the key information.
As shown in Fig. 5, this embodiment discloses a non-factoid question-answering answer selection system based on text summarization, comprising a first extraction module 10, a second extraction module 20, a combination module 30, a matching module 40, and a determination module 50, connected in sequence:
the first extraction module 10 is used to extract the first and last sentences of the candidate answer text;
the second extraction module 20 is used to apply the text summarization model TextRank to the remaining text of the candidate answer, excluding the first and last sentences, to obtain a preliminary summary;
the combination module 30 is used to concatenate the first sentence, the preliminary summary, and the last sentence, in order, to obtain the candidate answer summary;
the matching module 40 is used to take the question and the candidate answer summary as inputs to the neural semantic representation model and obtain the semantic relevance between the question and the candidate answer summary;
the determination module 50 is used to return the answer summary with the highest semantic relevance to the question as the answer.
As a further preferred scheme, the first extraction module 10 is specifically used to: extract the first and last sentences of the candidate answer text according to their positions within it.
As a further preferred scheme, the second extraction module 20 comprises a segmentation unit, a filtering unit, a weighted-graph construction unit, a similarity computation unit, an iteration unit, a ranking unit, and a combination unit, connected in sequence:
the segmentation unit is used to split the candidate answer text into sentences and segment each sentence into words;
the filtering unit is used to tag the part of speech of each word and filter the tagged word information to obtain the terms of specific words;
the weighted-graph construction unit is used to take the specific terms or sentences as text units, build nodes from the text units and edges from the similarities between text units, and obtain the weighted graph model;
the similarity computation unit is used to compute the similarity between every two nodes and use the similarity values as parameters of the node-weight formula;
the iteration unit is used to iterate the node-weight formula until convergence and obtain the score of each node;
the ranking unit is used to rank the nodes by their scores at convergence;
the combination unit is used to extract text units from the ranked nodes according to the set extraction ratio to form the preliminary summary.
As a further preferred scheme, the similarity computation methods used by the similarity computation unit include: word overlap, string-based similarity, cosine similarity, and longest common subsequence.
It should be understood that the non-factoid question-answering answer selection system based on text summarization disclosed in this embodiment implements the flow of Fig. 1 and has the same technical features and effects as the non-factoid question-answering answer selection method based on text summarization disclosed in the embodiment above, so it is not described in detail again here.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (8)
1. a kind of non-true class quiz answers selection method based on text snippet, which is characterized in that including:
The first sentence and tail sentence of answer text to be selected described in extraction;
Using text snippet model TextRank to the answer text to be selected in addition to first sentence and tail sentence remaining text into
Row abstract extracts, and obtains preliminary text snippet;
The first sentence, the preliminary text snippet and the tail sentence are combined successively, obtain the answer text snippet for waiting for selection;
Using question sentence and the answer text snippet for waiting selecting as the input of neural network semantic expressiveness model, obtain question sentence and
The semantic degree of correlation of the answer text snippet for waiting for selection;
It will be returned as answer with the highest answer text snippet of question semanteme degree of correlation.
2. the non-true class quiz answers selection method based on text snippet as described in claim 1, which is characterized in that described
The first sentence and tail sentence of answer text to be selected described in extraction, including:
According to the position of first sentence and tail sentence in the answer text to be selected, by the first sentence and tail sentence of the answer text to be selected
It extracts.
3. The text-summarization-based non-factoid question-answer selection method according to claim 1, wherein performing abstract extraction, using the text summarization model TextRank, on the remaining text of the candidate answer text excluding the first sentence and the last sentence, to obtain a preliminary text summary, comprises:
splitting the candidate answer text into sentences, and tokenizing each sentence;
tagging the part of speech of each word, and filtering the tagged words to obtain terms of specific parts of speech;
taking the specific terms or the sentences as text units, constructing graph nodes from the text units and graph edges from the pairwise similarity between text units, to obtain a weighted graph model;
computing the similarity of every pair of nodes, and using the similarity values as parameters of the node-weight formula;
iterating the node-weight formula until convergence, to obtain the score of each node;
ranking the nodes by their scores at convergence;
extracting text units from the ranked nodes according to a preset extraction ratio, to form the preliminary text summary.
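The graph construction and iteration steps above follow the standard TextRank recipe. A compact sketch, under stated assumptions: the damping factor `d = 0.85`, the convergence tolerance, and the similarity function are illustrative defaults, not values fixed by the claim.

```python
def textrank(units, similarity, d=0.85, tol=1e-6, max_iter=100):
    """Score text units by iterating the weighted-PageRank formula to convergence."""
    n = len(units)
    # Edge weights: pairwise similarity between text units (no self-edges).
    w = [[similarity(units[i], units[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) or 1.0 for row in w]  # guard isolated nodes
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                 for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            scores = new
            break
        scores = new
    return scores

def extract_summary(units, similarity, ratio=0.3):
    """Keep the top-scoring fraction of units, restored to document order."""
    scores = textrank(units, similarity)
    k = max(1, int(len(units) * ratio))
    top = sorted(range(len(units)), key=lambda i: -scores[i])[:k]
    return [units[i] for i in sorted(top)]
```

Restoring document order after ranking (the final `sorted(top)`) keeps the extracted summary readable, which matters when it is concatenated between the preserved first and last sentences.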
4. The text-summarization-based non-factoid question-answer selection method according to claim 3, wherein the methods for computing the similarity of any two nodes include: the vocabulary-overlap method, the string-matching method, the cosine-similarity method, and the longest-common-subsequence method.
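Three of the four named measures can be illustrated directly; the tokenization and normalization choices below are assumptions, since the claim names the measures without defining them.

```python
import math

def word_overlap(a, b):
    """Vocabulary overlap, normalized here by the shorter sentence's vocabulary."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / (min(len(sa), len(sb)) or 1)

def cosine_sim(a, b):
    """Cosine similarity over term-frequency vectors."""
    ta, tb = a.split(), b.split()
    vocab = set(ta) | set(tb)
    va = [ta.count(w) for w in vocab]
    vb = [tb.count(w) for w in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    return dot / norm if norm else 0.0

def lcs_len(a, b):
    """Longest-common-subsequence length over word sequences (dynamic programming)."""
    ta, tb = a.split(), b.split()
    dp = [[0] * (len(tb) + 1) for _ in range(len(ta) + 1)]
    for i, x in enumerate(ta):
        for j, y in enumerate(tb):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]
```

Any of these can be passed as the `similarity` argument of a TextRank implementation, since the graph construction only requires a symmetric pairwise score.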
5. A text-summarization-based non-factoid question-answer selection system, comprising, connected in sequence, a first extraction module, a second extraction module, a combination module, a matching module, and a determination module;
the first extraction module is configured to extract the first sentence and the last sentence of the candidate answer text;
the second extraction module is configured to perform abstract extraction, using the text summarization model TextRank, on the remaining text of the candidate answer text excluding the first sentence and the last sentence, to obtain a preliminary text summary;
the combination module is configured to combine the first sentence, the preliminary text summary, and the last sentence in order, to obtain a candidate answer summary;
the matching module is configured to take the question and the candidate answer summary as the input of a neural network semantic representation model, to obtain the semantic relevance between the question and the candidate answer summary;
the determination module is configured to return, as the answer, the candidate answer summary with the highest semantic relevance to the question.
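The matching module's neural semantic-representation model is not specified further in the claims, so any architecture that maps question and summary to comparable vectors fits. A deliberately minimal stand-in, assuming hypothetical pre-trained word vectors: encode each text as the average of its word embeddings and score relevance as the cosine of the two encodings.

```python
import math

def encode(text, word_vectors, dim):
    """Average-of-word-embeddings encoder; words missing from the table are skipped."""
    vecs = [word_vectors[w] for w in text.split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def relevance(question, summary, word_vectors, dim):
    """Cosine similarity between the two text encodings."""
    q = encode(question, word_vectors, dim)
    s = encode(summary, word_vectors, dim)
    dot = sum(a * b for a, b in zip(q, s))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in s))
    return dot / norm if norm else 0.0
```

In practice the patent's model would be a trained network rather than a bag of embeddings, but the interface is the same: a scalar relevance score per (question, summary) pair, maximized over candidates.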
6. The text-summarization-based non-factoid question-answer selection system according to claim 5, wherein the first extraction module is specifically configured to:
extract the first sentence and the last sentence of the candidate answer text according to their positions in the candidate answer text.
7. The text-summarization-based non-factoid question-answer selection system according to claim 5, wherein the second extraction module comprises, connected in sequence, a segmentation unit, a filtering unit, a weighted-graph construction unit, a similarity calculation unit, an iteration unit, a ranking unit, and a combination unit;
the segmentation unit is configured to split the candidate answer text into sentences, and to tokenize each sentence;
the filtering unit is configured to tag the part of speech of each word, and to filter the tagged words to obtain terms of specific parts of speech;
the weighted-graph construction unit is configured to take the specific terms or the sentences as text units, constructing graph nodes from the text units and graph edges from the pairwise similarity between text units, to obtain a weighted graph model;
the similarity calculation unit is configured to compute the similarity of every pair of nodes, and to use the similarity values as parameters of the node-weight formula;
the iteration unit is configured to iterate the node-weight formula until convergence, to obtain the score of each node;
the ranking unit is configured to rank the nodes by their scores at convergence;
the combination unit is configured to extract text units from the ranked nodes according to a preset extraction ratio, to form the preliminary text summary.
8. The text-summarization-based non-factoid question-answer selection system according to claim 7, wherein the similarity calculation methods used by the similarity calculation unit include: the vocabulary-overlap method, the string-matching method, the cosine-similarity method, and the longest-common-subsequence method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810428163.8A CN108681574B (en) | 2018-05-07 | 2018-05-07 | Text abstract-based non-fact question-answer selection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810428163.8A CN108681574B (en) | 2018-05-07 | 2018-05-07 | Text abstract-based non-fact question-answer selection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108681574A true CN108681574A (en) | 2018-10-19 |
CN108681574B CN108681574B (en) | 2021-11-05 |
Family
ID=63801897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810428163.8A Active CN108681574B (en) | 2018-05-07 | 2018-05-07 | Text abstract-based non-fact question-answer selection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108681574B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060282306A1 (en) * | 2005-06-10 | 2006-12-14 | Unicru, Inc. | Employee selection via adaptive assessment |
CN104679728A (en) * | 2015-02-06 | 2015-06-03 | 中国农业大学 | Text similarity detection device |
CN104699763A (en) * | 2015-02-11 | 2015-06-10 | 中国科学院新疆理化技术研究所 | Text similarity measuring system based on multi-feature fusion |
CN106126492A (en) * | 2016-06-07 | 2016-11-16 | 北京高地信息技术有限公司 | Statement recognition methods based on two-way LSTM neutral net and device |
CN106202042A (en) * | 2016-07-06 | 2016-12-07 | 中央民族大学 | A kind of keyword abstraction method based on figure |
CN106844368A (en) * | 2015-12-03 | 2017-06-13 | 华为技术有限公司 | For interactive method, nerve network system and user equipment |
US20170316775A1 (en) * | 2016-04-27 | 2017-11-02 | Conduent Business Services, Llc | Dialog device with dialog support generated using a mixture of language models combined using a recurrent neural network |
CN107562792A (en) * | 2017-07-31 | 2018-01-09 | 同济大学 | A kind of question and answer matching process based on deep learning |
CN107590163A (en) * | 2016-07-06 | 2018-01-16 | 北京京东尚科信息技术有限公司 | The methods, devices and systems of text feature selection |
CN107832457A (en) * | 2017-11-24 | 2018-03-23 | 国网山东省电力公司电力科学研究院 | Power transmission and transforming equipment defect dictionary method for building up and system based on TextRank algorithm |
CN107980130A (en) * | 2017-11-02 | 2018-05-01 | 深圳前海达闼云端智能科技有限公司 | It is automatic to answer method, apparatus, storage medium and electronic equipment |
2018
- 2018-05-07: CN application CN201810428163.8A filed (granted as CN108681574B, status: Active)
Non-Patent Citations (2)
Title |
---|
FEIFEI DAI 等: ""Intent Identification for Knowledge Base Question Answering"", 《2017 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI)》 * |
金丽娇 等: ""基于卷积神经网络的自动问答"", 《华东师范大学学报(自然科学版)》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543089A (en) * | 2018-11-30 | 2019-03-29 | 南方电网科学研究院有限责任公司 | A kind of classification method, system and the relevant apparatus of network security information data |
CN109766418A (en) * | 2018-12-13 | 2019-05-17 | 北京百度网讯科技有限公司 | Method and apparatus for output information |
CN109902284A (en) * | 2018-12-30 | 2019-06-18 | 中国科学院软件研究所 | A kind of unsupervised argument extracting method excavated based on debate |
CN109829052A (en) * | 2019-02-19 | 2019-05-31 | 田中瑶 | A kind of open dialogue method and system based on human-computer interaction |
CN110674286A (en) * | 2019-09-29 | 2020-01-10 | 出门问问信息科技有限公司 | Text abstract extraction method and device and storage equipment |
CN111241288A (en) * | 2020-01-17 | 2020-06-05 | 烟台海颐软件股份有限公司 | Emergency sensing system of large centralized power customer service center and construction method |
CN111401033A (en) * | 2020-03-19 | 2020-07-10 | 北京百度网讯科技有限公司 | Event extraction method, event extraction device and electronic equipment |
US11928435B2 (en) | 2020-03-19 | 2024-03-12 | Beijing Baidu Netcom Science Technology Co., Ltd. | Event extraction method, event extraction device, and electronic device |
CN113806500A (en) * | 2021-02-09 | 2021-12-17 | 京东科技控股股份有限公司 | Information processing method and device and computer equipment |
CN113806500B (en) * | 2021-02-09 | 2024-05-28 | 京东科技控股股份有限公司 | Information processing method, device and computer equipment |
CN113282711A (en) * | 2021-06-03 | 2021-08-20 | 中国软件评测中心(工业和信息化部软件与集成电路促进中心) | Internet of vehicles text matching method and device, electronic equipment and storage medium |
CN113282711B (en) * | 2021-06-03 | 2023-09-22 | 中国软件评测中心(工业和信息化部软件与集成电路促进中心) | Internet of vehicles text matching method and device, electronic equipment and storage medium |
CN113688231A (en) * | 2021-08-02 | 2021-11-23 | 北京小米移动软件有限公司 | Abstract extraction method and device of answer text, electronic equipment and medium |
CN113918702A (en) * | 2021-10-25 | 2022-01-11 | 北京航空航天大学 | Semantic matching-based online legal automatic question-answering method and system |
CN114997175A (en) * | 2022-05-16 | 2022-09-02 | 电子科技大学 | Emotion analysis method based on field confrontation training |
Also Published As
Publication number | Publication date |
---|---|
CN108681574B (en) | 2021-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108681574A (en) | A kind of non-true class quiz answers selection method and system based on text snippet | |
CN109189942B (en) | Construction method and device of patent data knowledge graph | |
CN106844658B (en) | Automatic construction method and system of Chinese text knowledge graph | |
CN106997382B (en) | Innovative creative tag automatic labeling method and system based on big data | |
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
CN105528437B (en) | A kind of question answering system construction method extracted based on structured text knowledge | |
CN108052593A (en) | A kind of subject key words extracting method based on descriptor vector sum network structure | |
JP2017511922A (en) | Method, system, and storage medium for realizing smart question answer | |
CN107729468A (en) | Answer extracting method and system based on deep learning | |
CN109271524B (en) | Entity linking method in knowledge base question-answering system | |
Al-Taani et al. | An extractive graph-based Arabic text summarization approach | |
CN114065758A (en) | Document keyword extraction method based on hypergraph random walk | |
CN110188174B (en) | Professional field FAQ intelligent question and answer method based on professional vocabulary mining | |
CN114912449B (en) | Technical feature keyword extraction method and system based on code description text | |
CN114428850B (en) | Text retrieval matching method and system | |
CN112036178A (en) | Distribution network entity related semantic search method | |
CN106777080A (en) | Short abstraction generating method, database building method and interactive method | |
Nityasya et al. | Hypernym-hyponym relation extraction from indonesian wikipedia text | |
CN112417170B (en) | Relationship linking method for incomplete knowledge graph | |
Wu et al. | Domain Event Extraction and Representation with Domain Ontology. | |
CN109002540B (en) | Method for automatically generating Chinese announcement document question answer pairs | |
CN113919339A (en) | Artificial intelligence auxiliary writing method | |
CN109215797B (en) | Method and system for extracting non-classification relation of traditional Chinese medicine medical case based on extended association rule | |
Tohalino et al. | Using virtual edges to extract keywords from texts modeled as complex networks | |
Jebbor et al. | Overview of knowledge extraction techniques in five question-answering systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||