CN108681574A - A non-factoid question-answering answer selection method and system based on text summarization - Google Patents

A non-factoid question-answering answer selection method and system based on text summarization

Info

Publication number
CN108681574A
CN108681574A CN201810428163.8A CN201810428163A
Authority
CN
China
Prior art keywords
sentence
text
answer
snippet
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810428163.8A
Other languages
Chinese (zh)
Other versions
CN108681574B (en)
Inventor
马荣强
张健
李淼
陈雷
高会议
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS filed Critical Hefei Institutes of Physical Science of CAS
Priority to CN201810428163.8A priority Critical patent/CN108681574B/en
Publication of CN108681574A publication Critical patent/CN108681574A/en
Application granted granted Critical
Publication of CN108681574B publication Critical patent/CN108681574B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a non-factoid question-answering answer selection method and system based on text summarization, belonging to the field of intelligent search technology. The method comprises: extracting the first sentence and the last sentence of a candidate answer text; performing extractive summarization, with the text summarization model TextRank, on the remaining text of the candidate answer other than the first and last sentences, to obtain a preliminary summary; combining the first sentence, the preliminary summary, and the last sentence in order to obtain the candidate answer summary; taking the question and each candidate answer summary as input to a neural semantic representation model, to obtain the semantic relatedness of the question and the candidate answer summary; and returning as the answer the answer summary with the highest semantic relatedness to the question. By extracting the first and last sentences of the answer text as components of the summary, the invention preserves the topical integrity of the extracted summary and thereby improves the accuracy of answer selection.

Description

A non-factoid question-answering answer selection method and system based on text summarization
Technical field
The present invention relates to the field of intelligent search technology, and in particular to a non-factoid question-answering answer selection method and system based on text summarization.
Background technology
At present, question answering has become an important research topic in natural language processing and is used in many information-acquisition fields, such as information retrieval, expert systems, automatic question answering, and human-machine natural-language interaction. A question answering system differs from information retrieval in that it does not require the user to find the answer; it returns the answer directly.
According to their data sources, question answering systems fall into three classes: systems based on structured data, systems based on free text, and systems based on question-answer pairs. In a system based on question-answer pairs, after the user poses a question, the workflow analyzes semantic features and returns the answer that best matches the question semantically; the data mainly come from Web community question answering.
Early research on answer selection was generally based on traditional semantic feature extraction: text features were chosen manually, and a high-performance classifier was then trained. Compared with methods of semantic representation, manually defined features are more interpretable, and the chosen features cover the entire data set. The selected features mainly reflect the linguistic quality of the answer text and the correlation between the question and the answer content. Manually chosen features typically include word N-gram language models, syntactic structure, and grammatical dependencies. When studying answer selection, early researchers most commonly used existing natural-language-processing tools to segment, part-of-speech tag, or syntactically parse the text, and then trained an answer selection model based on manually defined features.
However, answer texts in non-factoid question answering are diverse in form and contain noise, so general language rules are rarely able to match the correct answer. For the answer-selection task of non-factoid question answering systems, the current mainstream approach is therefore to mine the semantic information of the text with supervised machine-learning methods over the received text, for example:
Training SVM models with word-level matching features, such as keyword-matching features, phrase-level non-semantic features, and features based on named entities. Researchers have also used natural-language-processing tools to extract text features and develop a series of lexical features related to answer quality, including the presence of punctuation and hyperlinks, the number of special words, part-of-speech and named-entity features, N-gram language-model frequencies, and so on. Syntax trees capture the partial structure of a sentence well, and answer-selection methods based on syntax trees can effectively reduce the workload of feature selection. Methods combining syntactic and semantic features perform answer selection by computing, on the syntactic side, the tree edit distance between the dependency trees of the question and the answer, and using, on the semantic side, shallow semantic features such as entity types and synonyms.
Here, tree edit distance is the total cost of the operations (insertion, deletion, and substitution) required to transform one tree into another; its computation is analogous to string edit distance. Conditional Random Fields (CRF) have been used to label the sequences in question-answer pairs, with features including tree edit distance and string edit distance; this was the first time the community question-answering answer-selection problem was cast as a sequence-labeling problem. Besides syntax trees, some researchers compare the question and the answer text from the angle of language models and word vectors, for example using translation-based models that treat the question and the candidate answer as two different languages in order to compare their relatedness.
Answer-selection methods based on traditional semantic feature extraction are usually well interpretable: the manually chosen features have a traceable rationale and are easy to understand. But this approach also has defects. First, it depends on toolkits from basic research in natural language processing, so the effect of the selected features depends on the state of that basic research; the idea behind a feature may be well founded, yet fail to obtain the desired result on complex text. Second, the features extracted in the answer-selection model ultimately depend on human choice; the model has no self-learning ability, which limits its applicability.
Summary of the invention
The purpose of the present invention is to provide a non-factoid question-answering answer selection method and system based on text summarization, so as to improve the accuracy of answer selection in question answering systems.
To achieve the above purpose, the present invention adopts a non-factoid question-answering answer selection method based on text summarization, comprising the following steps:
extracting the first sentence and the last sentence of a candidate answer text;
performing extractive summarization, with the text summarization model TextRank, on the remaining text of the candidate answer other than the first and last sentences, to obtain a preliminary summary;
combining the first sentence, the preliminary summary, and the last sentence in order to obtain the candidate answer summary;
taking the question and the candidate answer summary as input to a neural semantic representation model, to obtain the semantic relatedness of the question and the candidate answer summary;
returning as the answer the answer summary with the highest semantic relatedness to the question.
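The steps above can be sketched as a small pipeline. This is an illustrative sketch only: the patent does not specify an implementation language, sentences are naively split on '.', and `extract_middle` and `relatedness` are hypothetical placeholders standing in for the TextRank extractor and the neural semantic representation model.

```python
def split_sentences(text):
    """Split an answer text into sentences on full stops (simplifying assumption)."""
    return [s.strip() for s in text.split('.') if s.strip()]

def summarize_answer(text, extract_middle):
    """Keep the first and last sentences; summarize only the middle part."""
    sents = split_sentences(text)
    if len(sents) <= 2:                      # too short to summarize further
        return text
    first, last = sents[0], sents[-1]
    middle = extract_middle(sents[1:-1])     # placeholder for a TextRank extractor
    return '. '.join([first] + middle + [last]) + '.'

def select_answer(question, candidates, extract_middle, relatedness):
    """Return the candidate answer summary most related to the question."""
    summaries = [summarize_answer(c, extract_middle) for c in candidates]
    scores = [relatedness(question, s) for s in summaries]
    return summaries[scores.index(max(scores))]
```

Plugging in a trivial `extract_middle` (keep the first middle sentence) and a word-overlap `relatedness` already reproduces the first-sentence / summary / last-sentence structure the invention describes.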
Preferably, extracting the first and last sentences of the candidate answer text comprises:
extracting the first and last sentences of the candidate answer text according to their positions in that text.
Preferably, performing extractive summarization with the text summarization model TextRank on the remaining text of the candidate answer other than the first and last sentences, to obtain the preliminary summary, comprises:
splitting the candidate answer text into sentences, and segmenting each sentence into words;
tagging each word with its part of speech, and filtering the tagged word information to obtain the terms of specific words;
taking the specific terms or sentences as text units, building nodes from the text units and edges from the similarity between text units, to obtain a weighted graph model;
computing the similarity of every two nodes, and taking the similarity values as parameters of the node-weight formula;
iterating the node-weight formula until convergence, to obtain the score of each node;
ranking the nodes by their scores at convergence, to obtain the sorted nodes;
extracting text units from the sorted nodes according to a set extraction ratio, to form the preliminary summary.
Preferably, the methods for computing the similarity of two nodes include: the word-overlap method, the string method, the cosine-similarity method, and the longest-common-subsequence method.
On the other hand, a non-factoid question-answering answer selection system based on text summarization is adopted, comprising a first extraction module, a second extraction module, a combination module, a matching module, and a determination module connected in sequence;
the first extraction module is used for extracting the first sentence and the last sentence of the candidate answer text;
the second extraction module is used for performing extractive summarization, with the text summarization model TextRank, on the remaining text of the candidate answer other than the first and last sentences, to obtain a preliminary summary;
the combination module is used for combining the first sentence, the preliminary summary, and the last sentence in order, to obtain the candidate answer summary;
the matching module is used for taking the question and the candidate answer summary as input to a neural semantic representation model, to obtain the semantic relatedness of the question and the candidate answer summary;
the determination module is used for returning as the answer the answer summary with the highest semantic relatedness to the question.
Preferably, the first extraction module is specifically used for:
extracting the first and last sentences of the candidate answer text according to their positions in that text.
Preferably, the second extraction module comprises a segmentation unit, a filtering unit, a weighted-graph construction unit, a similarity computation unit, an iteration unit, a sorting unit, and a combination unit connected in sequence;
the segmentation unit is used for splitting the candidate answer text into sentences, and segmenting each sentence into words;
the filtering unit is used for tagging each word with its part of speech, and filtering the tagged word information to obtain the terms of specific words;
the weighted-graph construction unit is used for taking the specific terms or sentences as text units, building nodes from the text units and edges from the similarity between text units, to obtain the weighted graph model;
the similarity computation unit is used for computing the similarity of every two nodes, and taking the similarity values as parameters of the node-weight formula;
the iteration unit is used for iterating the node-weight formula until convergence, to obtain the score of each node;
the sorting unit is used for ranking the nodes by their scores at convergence, to obtain the sorted nodes;
the combination unit is used for extracting text units from the sorted nodes according to a set extraction ratio, to form the preliminary summary.
Preferably, the similarity computation methods used by the similarity computation unit include: the word-overlap method, the string method, the cosine-similarity method, and the longest-common-subsequence method.
Compared with the prior art, the present invention has the following technical effects. In practice, in the question-answer pairs of a non-factoid question answering system, the answer text is much longer than the question. A single text-summarization method considers only the global information of the text and lacks features of the text units themselves, such as sentence position and term position; when the extraction ratio is set very low, topic drift easily occurs. When summarizing the answer text, this scheme retains the first and last sentences of the answer, applies the extractive summarization method to the rest, and combines the first sentence, the summary, and the last sentence in order as the final extracted summary. Since the first sentence of an answer in question answering is usually a brief restatement of the question, and the last sentence is usually a short summary of the answer content, extracting the first and last sentences of the answer text as components of the summary preserves the topical integrity of the extracted summary and thus improves the accuracy of answer selection.
Description of the drawings
The specific embodiments of the present invention are described in detail below with reference to the accompanying drawings:
Fig. 1 is a flow diagram of a non-factoid question-answering answer selection method based on text summarization;
Fig. 2 is a schematic diagram of answer text summarization;
Fig. 3 is a TextRank weighted graph;
Fig. 4 is a framework diagram of the neural semantic representation model;
Fig. 5 is a structural diagram of a non-factoid question-answering answer selection system based on text summarization.
Specific embodiments
To further illustrate the features of the present invention, please refer to the following detailed description and the accompanying drawings. The drawings are for reference and discussion only and are not used to limit the protection scope of the present invention.
The embodiments of the present application solve the problem of low answer-selection accuracy in existing question answering systems by providing a non-factoid question-answering answer selection method based on text summarization.
To solve the above problem, the main idea of this embodiment is: when summarizing a candidate answer text, retain its first and last sentences; extract a summary from the remaining content after removing the first and last sentences; then combine the first sentence, the summary, and the last sentence in order into the final text summary; and match the final summary against the question to obtain the answer to return.
As shown in Fig. 1 and Fig. 2, the non-factoid question-answering answer selection method based on text summarization provided by this embodiment is described in detail. It comprises the following steps S1 to S5:
S1, extracting the first sentence and the last sentence of the candidate answer text;
S2, performing extractive summarization, with the text summarization model TextRank, on the remaining text of the candidate answer other than the first and last sentences, to obtain a preliminary summary;
S3, combining the first sentence, the preliminary summary, and the last sentence in order, to obtain the candidate answer summary;
S4, taking the question and the candidate answer summary as input to a neural semantic representation model, to obtain the semantic relatedness of the question and the candidate answer summary;
S5, returning as the answer the answer summary with the highest semantic relatedness to the question.
It should be noted that the question and the answer text summaries are input to the neural answer-selection model: the neural network encodes the question and each answer summary, mines the text semantics to obtain their vector representations, and finally obtains their semantic relatedness by computing the similarity of the semantic vectors of the question and the answer text.
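The final comparison of the two semantic vectors is typically a cosine similarity. A minimal sketch, assuming the encoder outputs are already available as plain lists of numbers (the neural encoder itself is not shown and is outside the scope of this sketch):

```python
import math

def cosine_relatedness(u, v):
    """Cosine similarity of two semantic vectors (e.g. encoder outputs).

    Returns a value in [-1, 1]; 0.0 if either vector has zero norm.
    """
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

The candidate whose summary vector has the highest cosine with the question vector would then be returned as the answer.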
As a further preferred scheme, the extraction process of the above step S1 (extracting the first and last sentences of the candidate answer text) is: first identify the positions of the first sentence and the last sentence of the candidate answer text, then extract the first and last sentences according to those positions. For example, identify the position of the first full stop in the answer text and extract the sentence before it as the first sentence; identify the positions of the last two full stops in the answer text and extract the sentence between them as the last sentence.
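The full-stop rule described above can be sketched in a few lines. This is an assumption-laden illustration (plain '.' as the only sentence delimiter), not the patented implementation:

```python
def extract_first_and_tail(text):
    """Extract the first sentence (up to the first full stop) and the
    tail sentence (between the last two full stops) of an answer text."""
    stops = [i for i, ch in enumerate(text) if ch == '.']
    if len(stops) < 2:                       # no full sentence pair to extract
        return text.strip(), ''
    first = text[:stops[0] + 1].strip()
    tail = text[stops[-2] + 1:stops[-1] + 1].strip()
    return first, tail
```

A production version would also handle question marks, exclamation marks, and abbreviations, which this sketch deliberately ignores.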
As a further preferred scheme, the above step S2 (performing extractive summarization with the text summarization model TextRank on the candidate answer text other than the first and last sentences, to obtain the preliminary summary) is explained in detail as follows:
When the TextRank algorithm extracts key sentences, sentences are first marked as nodes, and a graph model is then built from the edges between them. For sentence-similarity calculation, TextRank generally uses a method based on word overlap: the more repeated words two sentences share, the higher their similarity. Besides word overlap, sentence-similarity methods such as string matching, cosine similarity, longest common subsequence, and part-of-speech methods can also be used, all based on statistical information. After the graph model is built, the PageRank algorithm performs recursive calculation and finally yields each node's score; the higher a node's score, the more important its sentence. After ranking the sentences by importance, key sentences are extracted at the required ratio to form the text summary.
The main steps are as follows:
(1) Preprocessing: the text is divided into several text units (terms or sentences); after word segmentation, part-of-speech tagging is performed. The tagged word information is filtered, the filtered content including stop words and certain parts of speech, so that finally only terms of specific parts of speech are retained.
(2) Building the weighted graph model: nodes are built from the text units and edges from the similarity between text units, forming the weighted graph model.
(3) Sentence-similarity calculation: the similarity of two sentences S_i and S_j is calculated with the word-overlap method, using the following formula:

    Similarity(S_i, S_j) = |{w_k : w_k ∈ S_i and w_k ∈ S_j}| / (log(|S_i|) + log(|S_j|))

where S_i and S_j are the two sentences, sentence S_i is represented by its N_i terms, w_k denotes a word contained in both sentences, and the edge weight is w_ji = Similarity(S_i, S_j).
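The word-overlap similarity can be sketched directly from the formula above, assuming whitespace tokenization (the original operates on filtered terms, which this sketch does not reproduce):

```python
import math

def sentence_similarity(si, sj):
    """TextRank edge weight: shared-word count over log|S_i| + log|S_j|."""
    wi, wj = si.split(), sj.split()
    overlap = len(set(wi) & set(wj))
    denom = math.log(len(wi)) + math.log(len(wj))
    return overlap / denom if denom > 0 else 0.0   # guard one-word sentences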
(4) The node-score formula is iterated to convergence to obtain each node's score. The TextRank algorithm model can be written G = (V, E): the algorithm denotes the set of all nodes in the graph by V and the set of all edges by E, V and E constitute everything in the graph, and E is a subset of V × V. The score of node V_i is:

    WS(V_i) = (1 - d) + d * Σ_{V_j ∈ In(V_i)} [ w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ] * WS(V_j)

where w_ji is the weight of the edge connecting node V_j and node V_i, generally represented by the similarity of V_j and V_i; In(V_i) is the set of all nodes pointing to node V_i; Out(V_j) is the set of all nodes that V_j points to; and d is the damping coefficient (0 ≤ d ≤ 1), representing the probability that a node in Fig. 3 jumps to any other node, generally taken as 0.85.
In addition, two points should be noted when using the TextRank algorithm. First, the initial values: all node scores are generally initialized to 1. Second, the convergence test: the convergence threshold is generally 0.0001, i.e. iteration stops, and convergence is reached, when the error of every node in the graph is below 0.0001.
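The iteration with initial scores of 1 and the 0.0001 convergence threshold can be sketched as follows, assuming the graph is given as a symmetric similarity matrix with a zero diagonal (self-edges carry no weight):

```python
def textrank_scores(w, d=0.85, tol=1e-4, max_iter=200):
    """Iterate WS(V_i) = (1-d) + d * sum_j [w_ji / sum_k w_jk] * WS(V_j)
    until every node's score changes by less than tol."""
    n = len(w)
    scores = [1.0] * n                          # all nodes start at 1
    out_sum = [sum(row) for row in w]           # total out-edge weight per node
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                 for j in range(n)
                                 if j != i and out_sum[j] > 0)
               for i in range(n)]
        if max(abs(new[i] - scores[i]) for i in range(n)) < tol:
            return new                          # converged
        scores = new
    return scores
```

On a uniform graph every node keeps score 1, the fixed point of the formula; on a chain, the middle node ends up highest, as expected.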
(5) All nodes are ranked by score and, according to the chosen extraction ratio, key sentences are extracted from the nodes to form the preliminary summary text.
It should be noted that setting different extraction ratios according to actual needs removes colloquial expressions and redundant information from the answer text and ensures the accuracy of the extraction.
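Extraction at a configurable ratio can be sketched as follows; re-emitting the kept sentences in their original order is an assumption the text does not state explicitly, but it preserves readability:

```python
def extract_by_ratio(sentences, scores, ratio):
    """Keep the top `ratio` fraction of sentences by TextRank score,
    returned in their original document order."""
    k = max(1, int(len(sentences) * ratio))            # always keep at least one
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```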
It should be noted that TextRank is the classic method for keyword extraction and summary-sentence extraction from text; in principle it is an unsupervised graph-based algorithm. In this embodiment, the TextRank algorithm ranks the keywords and key sentences of the text using the PageRank algorithm.
For example, to calculate the similarity of sentence S_i and sentence S_j, first build the weighted graph shown in Fig. 3, where node V_i represents sentence S_i and node V_j represents sentence S_j. The similarity of node V_j and node V_k is written w_jk, and the similarity w_j(k+1) of node V_j and node V_(k+1) is obtained by the similarity formula. The TextRank score of node V_i can then be calculated by the node-score formula, in which w_jk + w_j(k+1) is the sum of the weights of the edges leaving node V_j.
It should be noted that TextRank is an unsupervised method for extracting keywords and key sentences. Its advantages are that it needs no training corpus, performs well on text from different fields, requires neither linguistic nor domain knowledge, and takes the overall structure of the text into account. Its disadvantage is that, because it considers only the global information of the text, it lacks the features of the text units themselves, such as sentence position and term position.
In practical application, in the question-answer pairs of a non-factoid question answering system, the answer text is much longer than the question, and with a single text-summarization method, topic drift easily occurs when the extraction ratio is set very low. As shown in Fig. 4, when summarizing the answer text, this scheme retains the first and last sentences of the answer and then applies the extractive summarization method. From the characteristics of answer texts in question answering, the first sentence of an answer is usually a brief restatement of the question, followed by the way to solve it, and the last sentence is usually a short summary of the answer content. So when summarizing the answer, the first and last sentences of the answer text ensure the topical integrity of the summary and thereby improve the accuracy of answer selection.
Meanwhile, relative to the original answer text, the extracted summary suppresses meaningless colloquial expressions and redundant information, giving an efficient answer representation, from which the neural semantic representation model then obtains a semantic vector containing more key information.
As shown in Fig. 5, this embodiment discloses a non-factoid question-answering answer selection system based on text summarization, comprising a first extraction module 10, a second extraction module 20, a combination module 30, a matching module 40, and a determination module 50 connected in sequence;
the first extraction module 10 is used for extracting the first sentence and the last sentence of the candidate answer text;
the second extraction module 20 is used for performing extractive summarization, with the text summarization model TextRank, on the remaining text of the candidate answer other than the first and last sentences, to obtain a preliminary summary;
the combination module 30 is used for combining the first sentence, the preliminary summary, and the last sentence in order, to obtain the candidate answer summary;
the matching module 40 is used for taking the question and the candidate answer summary as input to a neural semantic representation model, to obtain the semantic relatedness of the question and the candidate answer summary;
the determination module 50 is used for returning as the answer the answer summary with the highest semantic relatedness to the question.
As a further preferred scheme, the first extraction module 10 is specifically used for:
extracting the first and last sentences of the candidate answer text according to their positions in that text.
As a further preferred scheme, the second extraction module 20 comprises a segmentation unit, a filtering unit, a weighted-graph construction unit, a similarity computation unit, an iteration unit, a sorting unit, and a combination unit connected in sequence;
the segmentation unit is used for splitting the candidate answer text into sentences, and segmenting each sentence into words;
the filtering unit is used for tagging each word with its part of speech, and filtering the tagged word information to obtain the terms of specific words;
the weighted-graph construction unit is used for taking the specific terms or sentences as text units, building nodes from the text units and edges from the similarity between text units, to obtain the weighted graph model;
the similarity computation unit is used for computing the similarity of every two nodes, and taking the similarity values as parameters of the node-weight formula;
the iteration unit is used for iterating the node-weight formula until convergence, to obtain the score of each node;
the sorting unit is used for ranking the nodes by their scores at convergence, to obtain the sorted nodes;
the combination unit is used for extracting text units from the sorted nodes according to a set extraction ratio, to form the preliminary summary.
As a further preferred scheme, the similarity computation methods used by the similarity computation unit include: the word-overlap method, the string method, the cosine-similarity method, and the longest-common-subsequence method.
It should be understood that the non-factoid question-answering answer selection system based on text summarization disclosed in this embodiment implements each flow in Fig. 1 and has the same technical features and the same effects as the method disclosed in the above embodiment, so it is not described in detail here.
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (8)

1. A non-factoid question-answering answer selection method based on text summarization, characterized by comprising:
extracting the first sentence and the last sentence of a candidate answer text;
performing extractive summarization, with the text summarization model TextRank, on the remaining text of the candidate answer other than the first and last sentences, to obtain a preliminary summary;
combining the first sentence, the preliminary summary, and the last sentence in order to obtain the candidate answer summary;
taking the question and the candidate answer summary as input to a neural semantic representation model, to obtain the semantic relatedness of the question and the candidate answer summary;
returning as the answer the answer summary with the highest semantic relatedness to the question.
2. The non-factoid question-answering answer selection method based on text summarization of claim 1, characterized in that extracting the first and last sentences of the candidate answer text comprises:
extracting the first and last sentences of the candidate answer text according to their positions in that text.
3. The non-factoid question-answer selection method based on text summarization according to claim 1, characterized in that performing summary extraction on the remaining text of the candidate answer text, excluding the first and last sentences, using the TextRank text summarization model to obtain a preliminary text summary comprises:
segmenting the candidate answer text into sentences, and tokenizing each sentence;
tagging the part of speech of each word, and filtering the tagged word information, to obtain the terms of specific words;
taking the specific terms or sentences as text units, constructing nodes from the text units and edges between nodes from the similarity between text units, to obtain a weighted graph model;
calculating the similarity of any two nodes, and taking the similarity value as a parameter of the node weight calculation formula;
iterating the node weight calculation formula until convergence, to obtain the score of each node;
ranking the nodes according to their scores at convergence, to obtain the ranked nodes;
extracting text units from the ranked nodes according to a set extraction ratio, to form the preliminary text summary.
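Claim 3 refers to a "node weight calculation formula" without stating it; the published TextRank algorithm uses the recurrence WS(Vi) = (1 - d) + d * Σ_{Vj} [w_ji / Σ_{Vk} w_jk] * WS(Vj) with damping factor d, and assuming that formula is intended, the iteration-until-convergence step can be sketched as follows (all names here are illustrative assumptions):

```python
def textrank_scores(sim, d=0.85, tol=1e-6, max_iter=100):
    """Iterate the TextRank node-weight recurrence on a similarity matrix until convergence.

    `sim[i][j]` is the symmetric similarity between text units i and j and defines
    the weighted edges of the graph. Returns one score per node.
    """
    n = len(sim)
    scores = [1.0] * n
    out_sum = [sum(row) for row in sim]  # denominator: total edge weight leaving each node j
    for _ in range(max_iter):
        new = []
        for i in range(n):
            rank = sum(sim[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if j != i and out_sum[j] > 0)
            new.append((1 - d) + d * rank)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```

Sorting the returned scores and keeping the top fraction of text units, per the set extraction ratio, yields the preliminary summary the claim describes.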
4. The non-factoid question-answer selection method based on text summarization according to claim 3, characterized in that the methods for calculating the similarity of any two nodes include: the word-overlap method, the string method, the cosine similarity method, and the longest common subsequence method.
5. A non-factoid question-answer selection system based on text summarization, characterized by comprising a first extraction module, a second extraction module, a combination module, a matching module, and a determination module connected in sequence;
the first extraction module is used for extracting the first sentence and the last sentence of a candidate answer text;
the second extraction module is used for performing summary extraction on the remaining text of the candidate answer text, excluding the first and last sentences, using the TextRank text summarization model, to obtain a preliminary text summary;
the combination module is used for combining the first sentence, the preliminary text summary, and the last sentence in order, to obtain a candidate answer summary;
the matching module is used for taking the question sentence and the candidate answer summary as the input of a neural network semantic representation model, to obtain the semantic relevance between the question sentence and the candidate answer summary;
the determination module is used for returning the candidate answer summary with the highest semantic relevance to the question as the answer.
6. The non-factoid question-answer selection system based on text summarization according to claim 5, characterized in that the first extraction module is specifically used for:
extracting the first sentence and the last sentence of the candidate answer text according to their positions within the candidate answer text.
7. The non-factoid question-answer selection system based on text summarization according to claim 5, characterized in that the second extraction module comprises a segmentation unit, a filtering unit, a weighted-graph-model construction unit, a similarity calculation unit, an iteration unit, a ranking unit, and a composition unit connected in sequence;
the segmentation unit is used for segmenting the candidate answer text into sentences and tokenizing each sentence;
the filtering unit is used for tagging the part of speech of each word and filtering the tagged word information, to obtain the terms of specific words;
the weighted-graph-model construction unit is used for taking the specific terms or sentences as text units, constructing nodes from the text units and edges between nodes from the similarity between text units, to obtain a weighted graph model;
the similarity calculation unit is used for calculating the similarity of any two nodes and taking the similarity value as a parameter of the node weight calculation formula;
the iteration unit is used for iterating the node weight calculation formula until convergence, to obtain the score of each node;
the ranking unit is used for ranking the nodes according to their scores at convergence, to obtain the ranked nodes;
the composition unit is used for extracting text units from the ranked nodes according to a set extraction ratio, to form the preliminary text summary.
8. The non-factoid question-answer selection system based on text summarization according to claim 7, characterized in that the similarity calculation methods used by the similarity calculation unit include: the word-overlap method, the string method, the cosine similarity method, and the longest common subsequence method.
CN201810428163.8A 2018-05-07 2018-05-07 Text abstract-based non-fact question-answer selection method and system Active CN108681574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810428163.8A CN108681574B (en) 2018-05-07 2018-05-07 Text abstract-based non-fact question-answer selection method and system


Publications (2)

Publication Number Publication Date
CN108681574A true CN108681574A (en) 2018-10-19
CN108681574B CN108681574B (en) 2021-11-05

Family

ID=63801897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810428163.8A Active CN108681574B (en) 2018-05-07 2018-05-07 Text abstract-based non-fact question-answer selection method and system

Country Status (1)

Country Link
CN (1) CN108681574B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543089A (en) * 2018-11-30 2019-03-29 南方电网科学研究院有限责任公司 A kind of classification method, system and the relevant apparatus of network security information data
CN109766418A (en) * 2018-12-13 2019-05-17 北京百度网讯科技有限公司 Method and apparatus for output information
CN109829052A (en) * 2019-02-19 2019-05-31 田中瑶 A kind of open dialogue method and system based on human-computer interaction
CN109902284A (en) * 2018-12-30 2019-06-18 中国科学院软件研究所 A kind of unsupervised argument extracting method excavated based on debate
CN110674286A (en) * 2019-09-29 2020-01-10 出门问问信息科技有限公司 Text abstract extraction method and device and storage equipment
CN111241288A (en) * 2020-01-17 2020-06-05 烟台海颐软件股份有限公司 Emergency sensing system of large centralized power customer service center and construction method
CN111401033A (en) * 2020-03-19 2020-07-10 北京百度网讯科技有限公司 Event extraction method, event extraction device and electronic equipment
CN113282711A (en) * 2021-06-03 2021-08-20 中国软件评测中心(工业和信息化部软件与集成电路促进中心) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN113688231A (en) * 2021-08-02 2021-11-23 北京小米移动软件有限公司 Abstract extraction method and device of answer text, electronic equipment and medium
CN113806500A (en) * 2021-02-09 2021-12-17 京东科技控股股份有限公司 Information processing method and device and computer equipment
CN113918702A (en) * 2021-10-25 2022-01-11 北京航空航天大学 Semantic matching-based online legal automatic question-answering method and system
CN114997175A (en) * 2022-05-16 2022-09-02 电子科技大学 Emotion analysis method based on field confrontation training

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060282306A1 (en) * 2005-06-10 2006-12-14 Unicru, Inc. Employee selection via adaptive assessment
CN104679728A (en) * 2015-02-06 2015-06-03 中国农业大学 Text similarity detection device
CN104699763A (en) * 2015-02-11 2015-06-10 中国科学院新疆理化技术研究所 Text similarity measuring system based on multi-feature fusion
CN106126492A (en) * 2016-06-07 2016-11-16 北京高地信息技术有限公司 Statement recognition methods based on two-way LSTM neutral net and device
CN106202042A (en) * 2016-07-06 2016-12-07 中央民族大学 A kind of keyword abstraction method based on figure
CN106844368A (en) * 2015-12-03 2017-06-13 华为技术有限公司 For interactive method, nerve network system and user equipment
US20170316775A1 (en) * 2016-04-27 2017-11-02 Conduent Business Services, Llc Dialog device with dialog support generated using a mixture of language models combined using a recurrent neural network
CN107562792A (en) * 2017-07-31 2018-01-09 同济大学 A kind of question and answer matching process based on deep learning
CN107590163A (en) * 2016-07-06 2018-01-16 北京京东尚科信息技术有限公司 The methods, devices and systems of text feature selection
CN107832457A (en) * 2017-11-24 2018-03-23 国网山东省电力公司电力科学研究院 Power transmission and transforming equipment defect dictionary method for building up and system based on TextRank algorithm
CN107980130A (en) * 2017-11-02 2018-05-01 深圳前海达闼云端智能科技有限公司 It is automatic to answer method, apparatus, storage medium and electronic equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FEIFEI DAI et al.: "Intent Identification for Knowledge Base Question Answering", 2017 Conference on Technologies and Applications of Artificial Intelligence (TAAI) *
JIN Lijiao et al.: "Automatic Question Answering Based on Convolutional Neural Networks", Journal of East China Normal University (Natural Science Edition) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543089A (en) * 2018-11-30 2019-03-29 南方电网科学研究院有限责任公司 A kind of classification method, system and the relevant apparatus of network security information data
CN109766418A (en) * 2018-12-13 2019-05-17 北京百度网讯科技有限公司 Method and apparatus for output information
CN109902284A (en) * 2018-12-30 2019-06-18 中国科学院软件研究所 A kind of unsupervised argument extracting method excavated based on debate
CN109829052A (en) * 2019-02-19 2019-05-31 田中瑶 A kind of open dialogue method and system based on human-computer interaction
CN110674286A (en) * 2019-09-29 2020-01-10 出门问问信息科技有限公司 Text abstract extraction method and device and storage equipment
CN111241288A (en) * 2020-01-17 2020-06-05 烟台海颐软件股份有限公司 Emergency sensing system of large centralized power customer service center and construction method
CN111401033A (en) * 2020-03-19 2020-07-10 北京百度网讯科技有限公司 Event extraction method, event extraction device and electronic equipment
US11928435B2 (en) 2020-03-19 2024-03-12 Beijing Baidu Netcom Science Technology Co., Ltd. Event extraction method, event extraction device, and electronic device
CN113806500A (en) * 2021-02-09 2021-12-17 京东科技控股股份有限公司 Information processing method and device and computer equipment
CN113806500B (en) * 2021-02-09 2024-05-28 京东科技控股股份有限公司 Information processing method, device and computer equipment
CN113282711A (en) * 2021-06-03 2021-08-20 中国软件评测中心(工业和信息化部软件与集成电路促进中心) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN113282711B (en) * 2021-06-03 2023-09-22 中国软件评测中心(工业和信息化部软件与集成电路促进中心) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN113688231A (en) * 2021-08-02 2021-11-23 北京小米移动软件有限公司 Abstract extraction method and device of answer text, electronic equipment and medium
CN113918702A (en) * 2021-10-25 2022-01-11 北京航空航天大学 Semantic matching-based online legal automatic question-answering method and system
CN114997175A (en) * 2022-05-16 2022-09-02 电子科技大学 Emotion analysis method based on field confrontation training

Also Published As

Publication number Publication date
CN108681574B (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN108681574A (en) A kind of non-true class quiz answers selection method and system based on text snippet
CN109189942B (en) Construction method and device of patent data knowledge graph
CN106844658B (en) Automatic construction method and system of Chinese text knowledge graph
CN106997382B (en) Innovative creative tag automatic labeling method and system based on big data
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN105528437B (en) A kind of question answering system construction method extracted based on structured text knowledge
CN108052593A (en) A kind of subject key words extracting method based on descriptor vector sum network structure
JP2017511922A (en) Method, system, and storage medium for realizing smart question answer
CN107729468A (en) Answer extracting method and system based on deep learning
CN109271524B (en) Entity linking method in knowledge base question-answering system
Al-Taani et al. An extractive graph-based Arabic text summarization approach
CN114065758A (en) Document keyword extraction method based on hypergraph random walk
CN110188174B (en) Professional field FAQ intelligent question and answer method based on professional vocabulary mining
CN114912449B (en) Technical feature keyword extraction method and system based on code description text
CN114428850B (en) Text retrieval matching method and system
CN112036178A (en) Distribution network entity related semantic search method
CN106777080A (en) Short abstraction generating method, database building method and interactive method
Nityasya et al. Hypernym-hyponym relation extraction from indonesian wikipedia text
CN112417170B (en) Relationship linking method for incomplete knowledge graph
Wu et al. Domain Event Extraction and Representation with Domain Ontology.
CN109002540B (en) Method for automatically generating Chinese announcement document question answer pairs
CN113919339A (en) Artificial intelligence auxiliary writing method
CN109215797B (en) Method and system for extracting non-classification relation of traditional Chinese medicine medical case based on extended association rule
Tohalino et al. Using virtual edges to extract keywords from texts modeled as complex networks
Jebbor et al. Overview of knowledge extraction techniques in five question-answering systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant