CN102411621A - Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode - Google Patents


Info

Publication number
CN102411621A
Authority
CN
China
Prior art keywords
sentence
cloud
digest
inquiry
chinese
Legal status
Granted
Application number
CN2011103737529A
Other languages
Chinese (zh)
Other versions
CN102411621B (en)
Inventor
陈劲光
何婷婷
胡珀
赵军民
李芳
Current Assignee
Huazhong Normal University
Original Assignee
Huazhong Normal University
Application filed by Huazhong Normal University
Priority to CN201110373752.9A
Publication of CN102411621A
Application granted
Publication of CN102411621B
Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a cloud-model-based, query-oriented multi-document automatic summarization method for Chinese, comprising the following steps: performing sentence segmentation, word segmentation and stop-word removal on the query and the multi-document collection; representing the query and the documents as vectors; processing the resulting vectors with the cloud model; building a Chinese corpus and modifying the source code of the English automatic summary evaluation tool ROUGE so that Chinese summaries can be evaluated automatically and parameters can be trained; finding the sentences related to the query and computing the importance of each sentence within the document collection; scoring each sentence by combining these two factors; and removing redundancy to generate an initial summary. Given a query, the technical solution automatically obtains the relevant document collection through a search engine and then automatically generates the summary the user needs. It returns the important content the user needs directly, sparing users the time otherwise spent looking for results across web pages. As far as is known, the invention is the first complete system suitable for generating Chinese query-oriented multi-document summaries, and experiments on large-scale Chinese and English corpora show that the system performs well.

Description

Cloud-model-based query-oriented multi-document automatic summarization method for Chinese
Technical field
The present invention relates to the field of information processing, and more specifically to a cloud-model-based, query-oriented multi-document automatic summarization method.
Background art
With the spread of the Internet, the amount of information online is massive and grows constantly. For a simple query entered by a user, a search engine generally returns a ranked list of web pages the user may need, many of which contain irrelevant or duplicate data, so the user must spend considerable effort finding useful results. Query-oriented multi-document summarization condenses the content of a large number of query-relevant documents and reorganizes it into a short summary of a given length, speeding up the user's access to information. It reduces the difficulty of extracting information from massive data and increases the speed at which information is obtained and understood, thereby improving the efficiency with which users obtain and use information and their competitiveness in the information society.
Query-oriented multi-document summarization is related to, yet distinct from, technologies such as information retrieval and automatic question answering. The main task of information retrieval is to find documents that satisfy particular search conditions; users must then work hard to locate the information they need in a returned document list full of redundant information. The main task of automatic question answering is to find answers to specific questions; current systems are still limited to questions of particular domains and types, and the answers they give are sometimes too terse to understand. Research on open-domain question answering also faces substantial difficulties, and results remain unsatisfactory. Query-oriented multi-document summarization combines the strengths of existing technologies such as multi-document summarization, information retrieval and automatic question answering while, to some extent, avoiding their shortcomings. It has important research significance and broad application prospects in areas such as personalized information recommendation and customization, massive information acquisition, digital libraries, business intelligence analysis, e-government and mobile computing.
By how the summary is produced, query-oriented multi-document summarization can be divided into abstractive (information-extraction) and extractive approaches. The main difference is that the former extracts useful information from sentences and rewrites it into a summary, whereas the latter selects the most important sentences by some method to form the summary. Extractive summarization is currently the mainstream research direction. By research object, query-oriented multi-document summarization can be divided into domain-specific and open-domain summarization. Although open-domain systems are generally less readable than domain-specific ones, they have broad applicability and good portability, and are currently the mainstream direction. The method of the invention is extractive and open-domain.
The cloud model, proposed by Academician Li Deyi, is a qualitative-quantitative transformation model that handles the fuzziness and randomness of uncertain concepts and the relationship between them. Starting from the uncertainty of natural-language concepts, the cloud model opens up research on artificial intelligence with uncertainty. Although the cloud model originates from concepts in natural language, regrettably, judging from the papers collected so far, work applying it directly to natural language processing itself is relatively rare. The method of the invention is a typical application of the cloud model in natural language processing and can be extended to other areas of the field.
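The method below repeatedly applies a backward cloud generator, which turns a set of observed values (cloud drops) into the three numerical characteristics of a cloud: Ex (expectation), En (entropy) and He (hyper-entropy). The following Python sketch implements the commonly cited estimator that works without certainty degrees. The patent text does not say which backward-generator variant it uses, so treat this choice, and the clamping of He for small samples, as assumptions.

```python
import math


def backward_cloud(drops):
    """Estimate (Ex, En, He) from one-dimensional cloud drops.

    Backward cloud generator without certainty degrees (assumed variant):
      Ex = sample mean
      En = sqrt(pi / 2) * mean absolute deviation from Ex
      He = sqrt(S^2 - En^2), where S^2 is the unbiased sample variance
    """
    n = len(drops)
    if n == 0:
        raise ValueError("need at least one cloud drop")
    ex = sum(drops) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in drops) / n
    s2 = sum((x - ex) ** 2 for x in drops) / (n - 1) if n > 1 else 0.0
    # For small samples S^2 - En^2 can be negative; clamp it to 0 (assumption).
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he
```

For example, `backward_cloud([0.0, 2.0])` gives Ex = 1, En = sqrt(pi/2) and He = sqrt(2 - pi/2). Identical drops give a degenerate cloud with En = He = 0.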
A query-oriented multi-document summarization system generally consists of three stages: internal text representation, text analysis, and summary extraction and generation. The internal representation stage converts the input text into an internal form. The text analysis stage analyzes the text at various levels to determine the importance of each basic text unit (sentence, paragraph, section, etc.). The extraction and generation stage orders the extracted units to produce a coherent summary that reflects the theme of the original text. Current summarization systems differ mainly in the latter two stages.
In the text analysis stage, extraction-based methods mainly include methods based on high-frequency words, graphs, topics and semantics. These existing methods can be summarized as follows: find some random distribution of the summary units, analyze that distribution with statistics, graph methods or more complex language models, and use the result to estimate the importance of each unit. After text analysis, the most important sentences can be selected to form a summary directly; however, because this merely quotes and stacks sentences, the resulting summary is highly redundant, has poor coherence and readability, and is hard for readers to understand.
Building on the previous stage, the summary extraction and generation stage adjusts and refines the selected sentences. The main techniques currently are redundancy removal, sentence pruning and sentence ordering. Redundancy removal generally uses MMR (Maximal Marginal Relevance): when selecting summary sentences, it considers not only a sentence's importance but also its similarity to the already selected summary sentences, and chooses sentences that are important yet dissimilar to those already selected.
Sentence pruning removes content from a sentence that carries little or no useful information, expressing the sentence's core content in a relatively simple, grammatical form. It can effectively raise the information content of the summary and express more within limited space. In recent years, mobile Internet access has gradually become a mainstream way of obtaining information, and a marked difference between mobile and desktop platforms is screen size. Short, concise summaries help mobile users obtain the information they need faster, so sentence pruning is likely to receive more attention. Research on pruning Chinese sentences, however, remains extremely rare.
Sentence ordering rearranges the sentences of a summary so that the ordered summary is more coherent and easier to understand; it is also one of the key technologies of automatic summarization. Current sentence-ordering methods fall into three main categories: chronological ordering, majority ordering and probabilistic ordering. Chronological ordering sorts sentences by the publication or release dates of their source documents; its limitations are that the actual time of the information is often very hard to obtain and that topics are ignored. The basic idea of majority ordering is that the order of summary sentences is decided by the order of the topics they belong to, and a topic's order is determined by the positions of most of its sentences. Its limitation is that the summaries are readable only when the relative positions of the topics in the documents are fairly stable; when relative positions change frequently, the summary structure easily becomes confused. Probabilistic ordering decomposes summary sentences into features, learns the order of these features from a corpus, and then uses the feature order to determine the order of the summary sentences; its limitation is its dependence on the corpus, since the quality of a manually selected corpus strongly affects ordering. Liu Dexi of Wuhan University proposed a hybrid model for ordering multi-document summary sentences that integrates positional, temporal, dependency and topical relations through linear combination. Jiang Xiaoyu of Beijing Institute of Technology proposed a sentence-ordering method that combines local inter-topic cohesion with majority ordering. Ma of Central China Normal University proposed a summary-sentence ordering strategy based on single-template fusion: a representative template is selected according to the summaries of the documents, and the summary sentences are ordered by the template, ensuring their logical coherence. Xu Yongdong et al. of Harbin Institute of Technology proposed a sentence-ordering method based on processing textual temporal information, including algorithms for extracting Chinese temporal information, semantic computation and temporal inference.
In 2011 the inventors published, in a journal, a cloud-model-based query-oriented multi-document summarization method. That method was limited to English corpora and innovated only in the second stage described above, namely text analysis.
Summary of the invention
To solve the above technical problems, the invention provides a cloud-model-based, query-oriented multi-document automatic summarization method for Chinese. It takes the latest results of cloud-model research, a field devoted to uncertainty, as theoretical guidance, and flexibly applies cloud concepts and methods at every stage of system construction, fully accounting for the uncertain factors in summary generation and exploiting them to improve system performance. Given a Chinese document collection and a query, the system automatically generates a concise, coherent summary of a specified length that satisfies the query. The method is suitable for Chinese corpora; the generated summaries agree well with manual summaries and are highly readable, reducing the time users spend searching for information.
To achieve the above objective, the invention provides a cloud-model-based, query-oriented multi-document automatic summarization method for Chinese, comprising the following steps:
1) Perform sentence segmentation, word segmentation and stop-word removal on the query and the multi-document collection, and represent the query and the documents as vectors;
2) Process the resulting vectors with the cloud model: build a Chinese corpus and modify the source code of the English automatic summary evaluation tool ROUGE to enable automatic evaluation of Chinese summaries and parameter training; find the sentences related to the query and compute each sentence's importance within the document collection; and score each sentence by combining these two factors;
3) Remove redundancy and generate an initial summary.
In addition, a sentence pruning step follows step 3): pruning rules are formulated and applied to the sentences of the initial summary, producing several candidate sentences; multidimensional clouds are then used to choose a pruned sentence to replace each original summary sentence, producing a refined summary.
Finally, the method also includes a sentence-ordering step: the document collection is clustered to find the sub-topics that contain one or more summary sentences; every document in the collection is regarded as a template, and the set of templates forms a cloud, the cloud template. The cloud template is used to order first the sub-topics and then the summary sentences within each sub-topic, finally generating a concise, coherent summary that satisfies the query.
In the sentence pruning step, the pruning rules are 10 manually written rules based on dependency parsing.
In the sentence pruning step, using multidimensional clouds to select the pruned sentence that replaces an original summary sentence specifically means the following. Three aspects of each word (its distribution across the document collection, its distribution across all sentences, and its relatedness to all query words) are each treated as cloud drops. A backward cloud generator yields the numerical characteristics of the three clouds, which form a word multidimensional cloud, and synthesized (comprehensive) cloud computation turns this into a word one-dimensional cloud. The word one-dimensional clouds form a sentence multidimensional cloud, from which the importance score of each candidate sentence is computed. The information density of each candidate is then computed from this score and the candidate's length, and the candidate with the highest information density replaces the original summary sentence.
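The text does not spell out how the synthesized (comprehensive) cloud computation merges several clouds into one. A minimal Python sketch follows. It assumes the synthesis rule commonly cited in the cloud-model literature, in which expectation and hyper-entropy are weighted by entropy and the entropies are summed. The function name and the fallback for all-zero entropies are hypothetical.

```python
def synthesize_clouds(clouds):
    """Merge clouds given as (Ex, En, He) tuples into a single cloud.

    Assumed synthesis rule:
      Ex = sum(Ex_i * En_i) / sum(En_i)
      En = sum(En_i)
      He = sum(He_i * En_i) / sum(En_i)
    """
    if not clouds:
        raise ValueError("need at least one cloud")
    total_en = sum(en for _, en, _ in clouds)
    if total_en == 0:
        # Degenerate case (all entropies zero): fall back to plain means.
        k = len(clouds)
        return (sum(ex for ex, _, _ in clouds) / k, 0.0,
                sum(he for _, _, he in clouds) / k)
    ex = sum(ex * en for ex, en, _ in clouds) / total_en
    he = sum(he * en for _, en, he in clouds) / total_en
    return ex, total_en, he
```

In the pruning step, `clouds` would hold a word's three clouds: its document distribution, its sentence distribution and its query relatedness. Under this rule, clouds with larger entropy pull the merged expectation more strongly toward themselves.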
In the sentence pruning step, computing a candidate sentence's importance score means computing the similarity between the candidate's sentence multidimensional cloud and the original sentence's multidimensional cloud. This similarity is the candidate's importance score and is computed as:
[Equation image in the original, not reproduced: similarity between multidimensional clouds C1 and C2]
where C1 and C2 are two multidimensional clouds; $Ex_{1k}$, $Ex_{2k}$, $En_{1k}$, $En_{2k}$, $He_{1k}$ and $He_{2k}$ are, respectively, the expectation, entropy and hyper-entropy of the k-th attribute of concepts C1 and C2; and $V_k$ is the weight of attribute k, with a value between 0 and 1 chosen according to the specific attribute and its relationships.
In the sentence pruning step, the information density of a candidate sentence is computed as:
[Equation image in the original, not reproduced: information density of candidate sentence C relative to original sentence O]
where C and O denote the candidate sentence and the original sentence, and the function Length returns sentence length measured in words.
In the sentence-ordering step, using the cloud template to order the sub-topics specifically means the following. The one-dimensional clouds of the summary sentences contained in each topic form a topic relative-position multidimensional cloud. Synthesized cloud computation turns it into a topic relative-position one-dimensional cloud, whose expectation Ex gives the topic's relative-position score, and the topics are ordered by this score.
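Ordering sub-topics by the expectation of a synthesized position cloud can be sketched in Python as below. The sketch assumes each summary sentence already has a relative-position cloud (Ex, En, He), with a smaller Ex meaning earlier in the documents, and that the clouds are merged with an assumed entropy-weighted synthesis rule. The function names are hypothetical.

```python
def synthesize(clouds):
    """Assumed synthesis rule: entropy-weighted Ex and He, summed En."""
    total_en = sum(en for _, en, _ in clouds)
    if total_en == 0:
        k = len(clouds)
        return sum(c[0] for c in clouds) / k, 0.0, sum(c[2] for c in clouds) / k
    ex = sum(ex * en for ex, en, _ in clouds) / total_en
    he = sum(he * en for _, en, he in clouds) / total_en
    return ex, total_en, he


def order_topics(topics):
    """Order topics by the Ex of their synthesized relative-position cloud.

    topics maps a topic id to the position clouds of its summary sentences.
    Topics whose sentences tend to appear earlier in the documents come first.
    """
    scored = sorted((synthesize(clouds)[0], tid) for tid, clouds in topics.items())
    return [tid for _, tid in scored]
```

Only Ex is used for ranking here, which follows the text. En and He are still computed because the synthesized cloud is a full cloud.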
In the sentence-ordering step, using the cloud template to order the summary sentences within a sub-topic specifically means the following. In every document, find the sentence most similar to the summary sentence obtained earlier, and take its position as that summary sentence's relative position in the document. Each relative position is treated as a cloud drop, and backward cloud computation yields the numerical characteristics of the sentence relative-position cloud. Its expectation Ex gives the sentence's relative-position score, and the sentences within the topic are ordered by this score.
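The within-topic ordering can be sketched in Python as follows. Several choices here are assumptions: sentences are token lists, similarity is Jaccard overlap of word sets (the text does not name the measure), and the relative position of sentence i in a document of L sentences is (i + 1) / L. All function names are hypothetical.

```python
import math


def jaccard(a, b):
    """Word-set overlap; a stand-in for the unspecified similarity measure."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def backward_cloud(drops):
    """(Ex, En, He) via the backward cloud generator without certainty degrees."""
    n = len(drops)
    ex = sum(drops) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in drops) / n
    s2 = sum((x - ex) ** 2 for x in drops) / (n - 1) if n > 1 else 0.0
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he


def order_within_topic(summary_sents, documents):
    """Order summary sentences by the Ex of their relative-position clouds.

    summary_sents: list of token lists
    documents:     list of documents, each a non-empty list of token lists
    """
    keyed = []
    for idx, sent in enumerate(summary_sents):
        positions = []
        for doc in documents:
            # The most similar sentence in this document supplies one cloud drop.
            best = max(range(len(doc)), key=lambda i: jaccard(sent, doc[i]))
            positions.append((best + 1) / len(doc))
        ex, _, _ = backward_cloud(positions)
        keyed.append((ex, idx))
    return [summary_sents[i] for _, i in sorted(keyed)]
```

Keying on the index `idx` keeps the sort stable for ties and avoids comparing token lists directly.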
Compared with the prior art, the method of the invention has the following effects. Because it adopts the cloud model and fully accounts for uncertainty, it ensures better performance at every stage of summary generation. Sentence pruning makes summaries briefer, so the method is more likely to generalize to fields such as mobile search that demand concise summaries. Sentence ordering reduces jumps in summary content, making summaries more coherent. Experiments on large-scale corpora demonstrate the effectiveness of the proposed method.
For a given query, the technology of the invention can automatically obtain the relevant document collection through a search engine and then automatically generate the summary the user needs. The invention can directly return the important content the user needs, sparing users the time spent searching web pages for the results they need. As far as is known, the invention is the first complete system suitable for generating Chinese query-oriented multi-document summaries, and experiments on large-scale Chinese and English corpora show that the system performs well.
Description of drawings
Fig. 1 is the overall flowchart of the cloud-model-based, query-oriented multi-document automatic summarization method for Chinese of the invention.
Fig. 2 is the flowchart of the sentence selection process.
Fig. 3 is the flowchart of the redundancy removal process.
Fig. 4 is the flowchart of the sentence pruning process.
Fig. 5 is the flowchart of the sentence ordering process.
Fig. 6 shows results on TAC 2010 evaluation data set A, whose task is similar to query-oriented multi-document summarization. The cloud summarization system of Embodiment 1 is No. 23; among the 43 participating systems it ranked 3rd on ROUGE-2 (a), 2nd on ROUGE-SU4 (b), 2nd on Basic Elements (c) and 3rd on the manual evaluation Average Overall Responsiveness (d). A to H are manual summaries and 1 to 43 are machine summaries (only the top ten are listed).
Fig. 7 shows the ROUGE evaluation results, with 95% confidence intervals, of the system of Embodiment 1 and the baseline system SumFocus.
Fig. 8 shows manual evaluation results comparing the method of Embodiment 2 of the invention with manually pruned sentences.
Fig. 9 shows the effect of sentence pruning on ROUGE evaluation results.
Fig. 10 shows the manual readability evaluation of summaries as the percentage of each result category.
Embodiment
Embodiment 1
Embodiment 1 corresponds to the leftmost dashed path in Fig. 1, in which the summary is generated directly after redundancy removal. It comprises the following steps:
1. Perform sentence segmentation, word segmentation and stop-word removal on the query and the document collection, using the sentence-splitting module (SplitSentence) and the word-segmentation module (CRFWordSeg) of LTP v2.01, developed by Harbin Institute of Technology. The latest LTP word-segmentation module is built on a CRF model and achieves an F1 score of 97.4%. After segmentation, stop words are removed using an in-house stop-word list.
2. Represent each processed sentence of the document collection and of the query as a vector. Together these form a matrix whose rows are sentences and whose columns are word types; each element is the number of times a given word occurs in a given sentence.
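The representation in step 2 amounts to a sentence-by-term count matrix. Here is a minimal Python sketch, assuming the sentences arrive as lists of already segmented, stop-word-filtered words. Sorting the vocabulary is an arbitrary choice made only to fix the column order, and the function name is hypothetical.

```python
def count_matrix(sentences):
    """Build a sentence-by-term count matrix.

    sentences: list of sentences, each a list of segmented words.
    Returns (vocab, matrix) where matrix[i][j] counts vocab[j] in sentence i.
    """
    vocab = sorted({w for s in sentences for w in s})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in sentences]
    for i, s in enumerate(sentences):
        for w in s:
            matrix[i][index[w]] += 1
    return vocab, matrix
```

For segmented Chinese input such as `[["广州", "亚运会", "开幕"], ["亚运会", "亚运会"]]`, each row counts how often each word type appears in that sentence.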
3. Cloud-model-based sentence selection (Fig. 2): every sentence in the document collection is scored, in four steps:
(1) Compute the sentence's query-relevance score, in the following steps:
a. Use the HAL (Hyperspace Analogue to Language) method to compute the relatedness between the words in the document collection and the query words. HAL can be described figuratively as a window co-occurrence method: it uses the co-occurrence of a word and a query word within a window of a certain length to compute a relatedness score between them, thereby capturing the semantic association between the word and the query word. Within a window of length K, the co-occurrences of a word w in the document collection and a query word w' are observed; the window is then moved across the entire document collection, one word at a time. Co-occurrences of the word and query word are counted at each distance: the shorter the distance and the more frequent the co-occurrence, the more related the word is to the query word.
Let n(w, w', k) denote the number of co-occurrences of w and w' at distance k, and let W(k) = K - k + 1 denote the co-occurrence strength of w and w' at that distance. The relatedness between a word and a query word can then be expressed as:
$$\mathrm{HAL}(w, w') = \sum_{k=1}^{K} W(k)\, n(w, w', k)$$
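The HAL co-occurrence weighting can be sketched in Python as below. The sketch assumes relatedness is the weighted sum over distances of W(k) = K - k + 1 times the co-occurrence count. It also assumes the window looks both before and after the query word; the text does not say whether the window is one- or two-sided. The function name is hypothetical. Running it per document and adding the results keeps windows from crossing document boundaries.

```python
from collections import Counter


def hal_relatedness(tokens, query_word, K):
    """HAL-style relatedness of every word to one query word in one document.

    tokens: segmented words of a single document, in order
    K:      window length; a co-occurrence at distance k is weighted K - k + 1
    Returns a Counter mapping word w to sum_k W(k) * n(w, query_word, k).
    """
    scores = Counter()
    for p, tok in enumerate(tokens):
        if tok != query_word:
            continue
        for k in range(1, K + 1):
            weight = K - k + 1
            # Two-sided window: look k positions before and after (assumption).
            for q in (p - k, p + k):
                if 0 <= q < len(tokens) and tokens[q] != query_word:
                    scores[tokens[q]] += weight
    return scores
```

With K = 2, a word adjacent to the query word gains weight 2 per co-occurrence and a word two positions away gains weight 1. This matches the idea that closer, more frequent co-occurrence means higher relatedness.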
b. Treat the relatedness between a word and each query word as cloud drops, and use the backward cloud generator to obtain the cloud's numerical characteristics (Ex, En, He).
c. Linearly combine the numerical characteristics of the cloud obtained in step b (linear combination 1) to obtain the word's query-relevance score:
[Equation image in the original, not reproduced: word query-relevance score]
d. Treat the query-relevance scores (from step c) of the words in a sentence as cloud drops, and use the backward cloud generator to obtain the cloud's numerical characteristics (Ex, En, He).
e. Linearly combine the numerical characteristics of the cloud obtained in step d (linear combination 2) to obtain the sentence's query-relevance score:
[Equation image in the original, not reproduced: sentence query-relevance score]
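Steps b to e chain two backward-cloud computations. The Python sketch below assumes each linear combination has the form a*Ex + b*En + c*He. The actual coefficients are trained parameters that appear only as images, so the default weights (1, 0, 0), which reduce each score to the plain expectation, are placeholders. `rel` is a hypothetical lookup of word-query relatedness values, such as those produced by the HAL step.

```python
import math


def backward_cloud(drops):
    """(Ex, En, He) via the backward cloud generator without certainty degrees."""
    n = len(drops)
    ex = sum(drops) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in drops) / n
    s2 = sum((x - ex) ** 2 for x in drops) / (n - 1) if n > 1 else 0.0
    return ex, en, math.sqrt(max(s2 - en ** 2, 0.0))


def combine(cloud, weights):
    """Linear combination a*Ex + b*En + c*He (weights are placeholders)."""
    return sum(w * c for w, c in zip(weights, cloud))


def sentence_query_score(sentence, query_words, rel,
                         word_weights=(1.0, 0.0, 0.0),
                         sent_weights=(1.0, 0.0, 0.0)):
    """rel[(w, q)] is the relatedness of word w to query word q."""
    word_scores = []
    for w in sentence:
        # Steps b-c: relatedness to each query word -> cloud -> word score.
        drops = [rel.get((w, q), 0.0) for q in query_words]
        word_scores.append(combine(backward_cloud(drops), word_weights))
    # Steps d-e: word scores of the sentence -> cloud -> sentence score.
    return combine(backward_cloud(word_scores), sent_weights)
```

Non-zero En or He weights let the score reward or penalize how spread out or unstable the relatedness values are, which the plain mean cannot express.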
(2) Compute the sentence's importance score, in the following steps:
a. Compute the cosine similarity between sentences.
The vector space model is used to compute the similarity between sentences. For a given document collection, each sentence is represented as an m-dimensional vector $(w_{i1}, w_{i2}, \ldots, w_{im})$, where m is the number of word types in the document collection and each dimension corresponds to one word in the vocabulary. The weight of each element is the TF-ISF score of the word corresponding to that dimension:
$$w_{ik} = \mathrm{TF}(w, S) \times \mathrm{ISF}(w)$$
where TF is the frequency of word w in sentence S, and ISF is the inverse sentence frequency, computed as:
$$\mathrm{ISF}(w) = \log\frac{N}{n}$$
where N is the total number of sentences in the document collection and n is the number of sentences containing w.
The similarity between sentences is computed as the cosine similarity between their vectors:
$$\mathrm{sim}(S_i, S_j) = \frac{\sum_{k=1}^{m} w_{ik}\, w_{jk}}{\sqrt{\sum_{k=1}^{m} w_{ik}^2}\,\sqrt{\sum_{k=1}^{m} w_{jk}^2}}$$
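The TF-ISF weighting and cosine similarity can be sketched in Python as follows. The sketch assumes ISF(w) = log(N / n) with the natural logarithm. The base does not affect cosine similarity, because it scales every weight by the same constant. Under this formula a word that occurs in every sentence gets weight 0. The function names are hypothetical.

```python
import math
from collections import Counter


def tfisf_vectors(sentences):
    """TF-ISF vectors for segmented sentences (lists of words)."""
    N = len(sentences)
    vocab = sorted({w for s in sentences for w in s})
    # Sentence frequency: number of sentences containing each word.
    sf = Counter(w for s in sentences for w in set(s))
    vectors = []
    for s in sentences:
        tf = Counter(s)
        vectors.append([tf[w] * math.log(N / sf[w]) for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Returning 0.0 for a zero vector is a design choice for sentences made up entirely of ubiquitous words, where the formula would otherwise divide by zero.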
b. Treat the similarities between the sentence and each sentence in the document collection as cloud drops, and use the backward cloud generator to obtain the cloud's numerical characteristics (Ex, En, He).
c. Linearly combine the numerical characteristics of the cloud obtained in step b (linear combination 3) to obtain the sentence's importance score:
[Equation image in the original, not reproduced: sentence importance score]
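Steps b and c can be sketched in Python as follows, starting from a precomputed sentence similarity matrix, such as one built with the cosine similarity of step a. Two details are assumptions: a sentence's similarity to itself is excluded from its drops, and the linear-combination weights are placeholders for the trained parameters. The sketch needs at least two sentences.

```python
import math


def backward_cloud(drops):
    """(Ex, En, He) via the backward cloud generator without certainty degrees."""
    n = len(drops)
    ex = sum(drops) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in drops) / n
    s2 = sum((x - ex) ** 2 for x in drops) / (n - 1) if n > 1 else 0.0
    return ex, en, math.sqrt(max(s2 - en ** 2, 0.0))


def importance_scores(sim, weights=(1.0, 0.0, 0.0)):
    """sim: square matrix of sentence similarities; one score per sentence."""
    scores = []
    for i, row in enumerate(sim):
        # Cloud drops: similarity of sentence i to every other sentence.
        drops = [s for j, s in enumerate(row) if j != i]
        ex, en, he = backward_cloud(drops)
        scores.append(weights[0] * ex + weights[1] * en + weights[2] * he)
    return scores
```

With the default weights this reduces to a sentence's average similarity to the rest of the collection, i.e. how central it is.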
(3) Compute the sentence's integrated score.
Linearly combine the query-relevance score from (1) and the importance score from (2) (linear combination 4) to obtain the sentence's integrated score:
[Equation image in the original, not reproduced: sentence integrated score]
(4) Parameter training.
For Chinese corpora, a Chinese query-oriented multi-document summarization corpus and a Chinese automatic summary evaluation tool are built to determine the parameters in (1), (2) and (3), namely the parameters shown as images in the original together with δ. This proceeds in the following steps:
a. Build a Chinese query-oriented multi-document summarization corpus.
No public Chinese query-oriented multi-document summarization corpus currently exists, so the invention builds one. First, 100 trending news topics from 2009-2010 are selected manually, and each event title (for example, "the Guangzhou Asian Games") is entered into a search engine as the query. The search results are converted into relevant documents by automatic extraction, using a web-page content extraction method based on tag density. Once the relevant document collections are determined, experts write the queries and the manual summaries. The resulting corpus contains 100 document collections, 1000 relevant documents and 400 manual summaries.
b. Build a Chinese summary evaluation tool, used to score the summaries generated under different parameter settings automatically.
The tool, ROUGE-CN, is obtained by modifying the English automatic evaluation tool ROUGE. The specific source-code modifications are:
Step 1: Use the CRFWordSeg module of the LTP v2.01 platform to segment the manual and automatic summaries into words, separated by spaces.
Step 2: Replace the entire contents of "smart_common_words.txt" in the "data" folder of the ROUGE installation package with a Chinese stop-word list.
Step 3: Find and delete the statements in the source code that filter out Chinese characters.
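ROUGE compares n-grams of whitespace-delimited tokens. Segmenting with spaces (Step 1) and removing the character filter (Step 3) are roughly what allow Chinese words to be matched at all. The Python sketch below illustrates ROUGE-N recall over pre-segmented strings. It is a simplified illustration, not the ROUGE package itself: it omits stemming, stop-word removal, jackknifing and the package's other options, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Simplified ROUGE-N recall.

    candidate, references: space-separated, already word-segmented strings.
    Matched n-gram counts are clipped by the candidate's counts.
    """
    cand = ngrams(candidate.split(), n)
    hit = total = 0
    for ref in references:
        ref_counts = ngrams(ref.split(), n)
        hit += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return hit / total if total else 0.0


print(rouge_n_recall("广州 亚运会 开幕", ["广州 亚运会 昨晚 开幕"]))
```

Here the reference has three bigrams, of which only ("广州", "亚运会") appears in the candidate, so ROUGE-2 recall is 1/3.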
According to the constraints, the three parameters shown as images in the original must take one of the groups in the following candidate parameter set:
[Candidate parameter set given as an image in the original, not reproduced]
In the actual training, the locally optimal values of these three parameters and of δ are determined in turn. The best three values of each are then combined exhaustively, generating 3^4 = 81 summaries, and the optimal parameter combination is selected by automatic evaluation.
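The two-phase search can be sketched in Python as below. In the first phase, each parameter is tuned in turn while the others are held fixed, and a shortlist of its best three values is kept. The second phase exhaustively evaluates every combination of the shortlists, 3^4 = 81 for four parameters. `evaluate` is a hypothetical callable that stands in for "generate summaries with these parameters, then score them with ROUGE-CN", and the parameter names are placeholders.

```python
from itertools import product


def train_parameters(candidates, evaluate, defaults, keep=3):
    """Per-parameter search followed by an exhaustive search over shortlists.

    candidates: dict parameter name -> list of candidate values
    evaluate:   callable(dict of parameter values) -> score (higher is better)
    defaults:   starting values used while each parameter is tuned in turn
    Returns (best parameter dict, number of combinations evaluated).
    """
    current = dict(defaults)
    shortlist = {}
    for name, values in candidates.items():
        # Tune one parameter, holding the others at their current values.
        ranked = sorted(values, key=lambda v: evaluate({**current, name: v}),
                        reverse=True)
        shortlist[name] = ranked[:keep]
        current[name] = ranked[0]
    names = list(shortlist)
    combos = list(product(*(shortlist[n] for n in names)))
    best = max(combos, key=lambda combo: evaluate(dict(zip(names, combo))))
    return dict(zip(names, best)), len(combos)
```

Shortlisting keeps the exhaustive phase at 81 evaluations rather than the full Cartesian product of all candidate values, at the risk of missing interactions between parameters outside the shortlists.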
The parameter training process determines the parameters, and therefore also the integrated sentence score of step 3.
4. Redundancy removal (Fig. 3)
(1) Sort the sentences by score from high to low, and choose the highest-scoring sentence as the first summary sentence.
(2) Adjust the scores of all remaining sentences: the more similar a sentence is to the selected summary sentences, the more its score is lowered:
[Equation image in the original, not reproduced: score adjustment of candidate sentence S_i]
where R is the set of all sentences, F is the set of all summary sentences selected so far, $S_i$ is a candidate summary sentence and $S_L$ is the most recently selected summary sentence.
(3) Check whether the summary length requirement has been met. If so, stop; otherwise, return to step (1).
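The greedy loop of steps (1) to (3) can be sketched in Python as below. The exact score adjustment is given only as an image, so the sketch assumes the simple MMR-style penalty score(S_i) -= omega * sim(S_i, S_L), where S_L is the sentence just selected. `omega` and the function name are hypothetical.

```python
def select_sentences(scores, sim, lengths, max_len, omega=1.0):
    """Greedy summary-sentence selection with a redundancy penalty.

    scores:  integrated score of each sentence
    sim:     sim[i][j] is the similarity between sentences i and j
    lengths: length of each sentence (same unit as max_len)
    omega:   hypothetical penalty weight
    Returns the indices of the chosen sentences, in selection order.
    """
    remaining = dict(enumerate(scores))
    chosen, total = [], 0
    while remaining and total < max_len:
        best = max(remaining, key=remaining.get)  # step (1): top score
        chosen.append(best)
        total += lengths[best]
        del remaining[best]
        for i in remaining:  # step (2): penalize similarity to S_L
            remaining[i] -= omega * sim[i][best]
    return chosen  # step (3): stop once the length limit is reached
```

The last chosen sentence may overshoot `max_len`; step 5 trims that excess. Taking the maximum of the adjusted scores on each pass is equivalent to re-sorting them.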
5. Generate the summary.
If the total length of the selected sentences exceeds the summary length limit, the part of the last sentence that exceeds the limit is removed, producing the final summary.
6. Results
Although this embodiment targets Chinese corpora, it is equally applicable to English corpora. Because no large-scale public Chinese evaluation corpus exists, experimental results on English corpora are given first as validation, followed by results on Chinese corpora.
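Step 5 can be sketched in Python as below. The sketch assumes lengths are counted in words (the text does not state the unit) and that sentences are token lists in selection order. The function name is hypothetical.

```python
def truncate_summary(sentences, max_len):
    """Keep whole sentences until the limit, then cut the overflowing one.

    sentences: token lists in selection order
    max_len:   summary length limit, in tokens
    """
    out, used = [], 0
    for s in sentences:
        room = max_len - used
        if room <= 0:
            break
        out.append(s[:room])  # only the last sentence can actually be cut
        used += len(out[-1])
    return out
```

For example, with a limit of 5 words, two 3-word sentences become the first sentence plus the first two words of the second.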
(1) In 2010 we entered the cloud summarization system formed by the five steps above in the TAC 2010 guided summarization task. To relate the guided task to query-oriented summarization, we used only the category information provided by the organizers as the query; in all other respects the setup was consistent with our earlier experiments.
The corpus of the international Text Analysis Conference TAC 2008 was used as training material. Through the parameter training process of step 3 (4), the following parameters were obtained:
[Parameter values given as an image in the original, not reproduced]
We submitted two systems, with IDs 6 and 23; the cloud summarization system of Embodiment 1 is No. 23. Fig. 6 shows its results on TAC 2010 evaluation data set A, whose task is similar to query-oriented multi-document summarization. Among the 43 participating systems, the cloud-model-based summaries ranked 3rd on ROUGE-2 (a), 2nd on ROUGE-SU4 (b) and 2nd on Basic Elements (c), which are the three automatic metrics, and 3rd on the manual overall metric Average Overall Responsiveness (d).
(2) First, the 100 document sets of the Chinese query-focused multi-document summarization corpus described in step 3 (4) a are randomly divided into two parts of 50 document sets each, used as the training corpus and the test corpus respectively. The training corpus is used to train the parameters of the cloud summarization system, and the test corpus is used to verify the experimental results.
The parameter training process of step 3 (4) yields the following summarization parameters for Chinese:
[Formula image: trained parameter values for Chinese]
Fig. 7 shows the average evaluation results, with 95% confidence intervals, of the cloud summarization system of this embodiment and of the baseline system SumFocus on the test corpus of 50 document sets. SumFocus is the summarization system developed by Vanderwende et al. at Microsoft Research. It took part in the DUC 2006 evaluation and was one of the best-performing systems there, ranking first among 22 systems in the pyramid evaluation. We reimplemented it to generate summaries on the Chinese corpus. As Fig. 7 shows, the method of this embodiment scores significantly higher than SumFocus on every metric. This indicates that the method agrees well with human summaries in content. Put simply, measured in words, on average one third of the content of the summaries generated by the method matches the human-written summaries.
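The closing remark above (about one third of the content, counted in words, matches the human summary) is what a ROUGE-1 recall of roughly 0.33 expresses. The following is a minimal sketch of unigram recall with clipped counts against a single reference. It assumes the text has already been word-segmented. The official ROUGE toolkit has further options, such as multiple references and stemming, that are not modelled here.

```python
from collections import Counter
from typing import List


def rouge1_recall(candidate: List[str], reference: List[str]) -> float:
    """Share of reference words (with multiplicity) also found in the candidate."""
    if not reference:
        return 0.0
    cand, ref = Counter(candidate), Counter(reference)
    # Clip each word's match count at its count in the candidate
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())
```

For example, the candidate `["中国", "经济", "增长"]` against the reference `["中国", "经济", "发展"]` gives a recall of 2/3.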
Embodiment 2
Embodiment 2 differs from Embodiment 1 in that a sentence pruning step is added between steps 4 and 5 of Embodiment 1. Deleting less important or irrelevant sentence constituents from the summary sentences further increases the information content of the summary.
Embodiment 2 corresponds to the rightmost dashed path in Fig. 1.
The sentence pruning process, shown in Fig. 4, comprises the following steps:
1. Build a manual rule base for sentence pruning.
The table below briefly describes the 10 manual rules adopted here, with an example of each; underlined text marks the content deleted by the rule.
 
<tables num="0001"> <table > <tgroup cols="3"> <colspec colname="c001" colwidth="2%" /> <colspec colname="c002" colwidth="12%" /> <colspec colname="c003" colwidth="84%" /> <tbody > <row > <entry morerows="1">Rule</entry> <entry morerows="1">Description</entry> <entry morerows="1">Example</entry> </row> <row > <entry morerows="1"> <b >1</b> </entry> <entry morerows="1">Parenthetical text</entry> <entry morerows="1"> <i >Xinhua News Agency, Bonn, Dec. 30 <u >(reporter Lv Hong)</u> German Foreign Minister Kinkel told the press here on the 30th that Europe had made achievements in the economic, political and diplomatic fields in 1997.</i></entry></row><row ><entry morerows=" 1 "><b >2</b></entry><entry morerows=" 1 ">Independent (parenthetical) element</entry><entry morerows=" 1 "><i ><u >It is reported that</u> the Harbin Municipal Youth League Committee, starting with helping laid-off young workers improve their employability, provides practical services for young people's reemployment.</i></entry></row><row ><entry morerows=" 1 "><b >3</b></entry><entry morerows=" 1 ">Sentence-initial independent adverbial</entry><entry morerows=" 1 "><i ><u >Early this morning,</u> amid the resounding national anthem, a flag-raising ceremony was solemnly held in Lhasa.</i></entry></row><row ><entry morerows=" 1 "><b >4</b></entry><entry morerows=" 1 ">Sentence-initial "XX says:"</entry><entry morerows=" 1 "><i ><u >Paris, May 26, by this newspaper's reporter Liu Zhengxue:</u> Li Ruihuan, Chairman of the National Committee of the Chinese People's Political Consultative Conference, met members of the French Senate's France-China friendship group in Paris on the 26th.</i></entry></row><row ><entry morerows=" 1 "><b >5</b></entry><entry morerows=" 1 ">Sentence-initial independent conjunction</entry><entry morerows=" 1 "><i ><u >So,</u> national reunification can be achieved.</i></entry></row><row ><entry morerows=" 1 "><b >6</b></entry><entry morerows=" 1 ">Prepositional phrase used as adverbial</entry><entry morerows=" 1 "><i >Carrying out this work<u >, for establishing the socialist market economic system and promoting sustained, rapid and healthy development of the national economy,</u> is of great significance.</i></entry></row><row ><entry morerows=" 1 "><b >7</b></entry><entry morerows=" 1 ">"的" (de) attributive construction</entry><entry morerows=" 1 "><i >It is reported that the Harbin Municipal Youth League Committee, starting with helping laid-off young workers improve their employability, provides <u >practical</u> services for young people's reemployment.</i></entry></row><row ><entry morerows=" 1 "><b >8</b></entry><entry morerows=" 1 ">"地" (de) adverbial construction</entry><entry morerows=" 1 "><i >China has achieved the fastest growth among large economies, and it is <u >successfully</u> moving toward a more open market economy.</i></entry></row><row ><entry morerows=" 1 "><b >9</b></entry><entry morerows=" 1 ">Adverb</entry><entry morerows=" 1 "><i >If we <u >further</u> emancipate our minds, seek truth from facts, seize the opportunity and forge ahead, the road of building socialism with Chinese characteristics will grow ever broader.</i></entry></row><row ><entry morerows=" 1 "><b >10</b></entry><entry morerows=" 1 ">Adjective</entry><entry morerows=" 1 "><i >Knowledge industrialization was <u >first</u> proposed by the famous American professor Machlup in his 1962 book "The Production and Distribution of Knowledge".</i> </entry></row></tbody></tgroup></table></tables>
2. Apply the manual rules to the sentence in turn to produce multiple candidate sentences.
(1) For a sentence to be pruned, match it against each rule in the rule base in order, checking whether the rule applies.
(2) If the rule applies, prune the sentence as the rule specifies to obtain a candidate sentence, and use this candidate as the input to the next rule.
(3) After the last rule has been matched, output all candidate sentences obtained along the way.
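Steps (1) to (3) form a cascade, sketched below. The patent's rules are based on dependency analysis. Purely for illustration, this sketch approximates two of them with hypothetical regular expressions over raw Chinese text: rule 1 (parenthetical text) and rule 5 (sentence-initial conjunction). The conjunction list is likewise an assumption. Each rule that fires produces a new candidate, which becomes the input to the next rule.

```python
import re
from typing import Callable, List, Optional

# A rule returns the pruned sentence, or None when it does not apply.
Rule = Callable[[str], Optional[str]]


def regex_rule(pattern: str) -> Rule:
    compiled = re.compile(pattern)

    def apply(sentence: str) -> Optional[str]:
        pruned = compiled.sub("", sentence)
        return pruned if pruned != sentence else None

    return apply


# Hypothetical approximations of rules 1 and 5 from the table.
RULES: List[Rule] = [
    regex_rule(r"[（(][^（）()]*[）)]"),        # rule 1: parenthetical text
    regex_rule(r"^(所以|因此|但是|然而)[，,]"),  # rule 5: sentence-initial conjunction
]


def prune_candidates(sentence: str, rules: List[Rule] = RULES) -> List[str]:
    candidates: List[str] = []
    current = sentence
    for rule in rules:              # (1) try every rule in order
        pruned = rule(current)
        if pruned is not None:      # (2) rule fired: keep the result and chain it
            candidates.append(pruned)
            current = pruned
    return candidates               # (3) every candidate produced along the way
```

For example, `prune_candidates("所以，新华社（记者吕鸿）报道。")` yields two candidates: `"所以，新华社报道。"` after rule 1, then `"新华社报道。"` after rule 5.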
3. Score the importance of each word in the candidate sentences from three different aspects to obtain the word multidimensional cloud:
(1) Treat the word's frequency in each document as cloud drops and apply the backward cloud generator to obtain the cloud's numerical characteristics $(Ex_1, En_1, He_1)$.
(2) Treat the word's frequency in each sentence of the document set as cloud drops and apply the backward cloud generator to obtain the cloud's numerical characteristics $(Ex_2, En_2, He_2)$.
(3) Treat the word's relevance to each query word as cloud drops and apply the backward cloud generator to obtain the cloud's numerical characteristics $(Ex_3, En_3, He_3)$. This step is the same as in step 3 of Embodiment 1, and the resulting cloud is identical to the one obtained in step 3 d of Embodiment 1.
(4) Combine the three clouds obtained above into the word multidimensional cloud (WMC):
$WMC = \{(Ex_1, En_1, He_1), (Ex_2, En_2, He_2), (Ex_3, En_3, He_3)\}$
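Each of (1) to (3) applies a backward cloud generator to a list of drops. The patent does not say which variant it uses. This sketch assumes the common estimator that needs no certainty degrees:
- Ex is the sample mean.
- En = sqrt(π/2) · mean(|x − Ex|).
- He = sqrt(S² − En²), where S² is the sample variance, clamped at 0.

The function names are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class Cloud(NamedTuple):
    ex: float  # expectation
    en: float  # entropy
    he: float  # hyper-entropy


def backward_cloud(drops: Sequence[float]) -> Cloud:
    """Estimate (Ex, En, He) from cloud drops without certainty degrees."""
    n = len(drops)
    if n < 2:
        raise ValueError("need at least two drops")
    ex = sum(drops) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in drops) / n
    s2 = sum((x - ex) ** 2 for x in drops) / (n - 1)  # sample variance
    he = math.sqrt(max(s2 - en ** 2, 0.0))           # clamp: the estimate can go negative
    return Cloud(ex, en, he)


def word_multidim_cloud(
    doc_freqs: Sequence[float],   # (1) frequency in each document
    sent_freqs: Sequence[float],  # (2) frequency in each sentence
    query_rels: Sequence[float],  # (3) relevance to each query word
) -> List[Cloud]:
    """WMC = [(Ex1, En1, He1), (Ex2, En2, He2), (Ex3, En3, He3)]."""
    return [backward_cloud(d) for d in (doc_freqs, sent_freqs, query_rels)]
```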
4. Use the comprehensive cloud operation to convert the word multidimensional cloud into a word one-dimensional cloud.
The comprehensive cloud operation, which assigns a weight to each dimension, is defined as:
[Formula image: comprehensive cloud operation]
Substituting the dimension weights into this formula yields the word one-dimensional cloud.
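The comprehensive cloud formula is available only as an image, so the following sketch is an assumption. It uses a weighted generalisation of the standard synthesized-cloud formula from the cloud-model literature:
- Ex = Σ w_i·Ex_i·En_i / Σ w_i·En_i
- En = Σ w_i·En_i
- He = Σ w_i·He_i·En_i / Σ w_i·En_i

The weights passed in are up to the caller.

```python
from typing import NamedTuple, Sequence


class Cloud(NamedTuple):
    ex: float
    en: float
    he: float


def comprehensive_cloud(clouds: Sequence[Cloud], weights: Sequence[float]) -> Cloud:
    """Collapse a multidimensional cloud into one cloud (assumed weighted synthesis)."""
    if len(clouds) != len(weights):
        raise ValueError("one weight per dimension")
    total = sum(w * c.en for c, w in zip(clouds, weights))  # weighted entropy sum
    if total == 0:
        raise ValueError("all weighted entropies are zero")
    ex = sum(w * c.ex * c.en for c, w in zip(clouds, weights)) / total
    he = sum(w * c.he * c.en for c, w in zip(clouds, weights)) / total
    return Cloud(ex, total, he)
```

Weighting Ex and He by entropy lets the more spread-out dimensions pull the combined expectation harder. Whether the patent's weights play the same role cannot be confirmed from this section.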
4. Represent each sentence as a sentence multidimensional cloud, each dimension of which is a word one-dimensional cloud.
If the original sentence contains m words, each sentence in the candidate set can be represented as an m-dimensional vector. A word that occurs several times in the same sentence is treated as separate components, and the positions in the vector correspond one-to-one to the word positions in the original sentence.
If a word of the original sentence has been deleted in a candidate sentence, its position is represented by a zero vector.
The sentence multidimensional cloud (SMC) is then the vector whose components are the word one-dimensional clouds at each position, with zero vectors at deleted positions.
Note that although the candidate sentences derived from the same sentence differ in length, their SMCs all have the same dimension.
5. Compute the similarity between each candidate's sentence multidimensional cloud and that of the original sentence to obtain the candidate's information-importance score.
Because the three numerical characteristics of the cloud model play different roles, the invention proposes an improved multidimensional cloud similarity measure that assigns a different weight to each numerical characteristic. The more similar a pruned sentence is to the original sentence, the more important information it retains.
The similarity between two multidimensional clouds C1 and C2 is defined as:
[Formula image: multidimensional cloud similarity]
where $Ex_{1k}$, $Ex_{2k}$, $En_{1k}$, $En_{2k}$, $He_{1k}$ and $He_{2k}$ are the expectations, entropies and hyper-entropies of the k-th attribute of concepts C1 and C2. $V_k$ is the weight of attribute k, a value between 0 and 1 chosen according to the specific attributes and their relationships. Computing the similarity between sentences requires fixing the values of two weight vectors. Event-related words, namely nouns and verbs, receive higher weights, with nouns weighted above verbs. When the sentence multidimensional cloud is converted into a sentence one-dimensional cloud, the weight of each dimension is given by:
[Formula image: part-of-speech-based dimension weight]
where pos denotes the word's part of speech (POS).
With the weight of each dimension defined, the similarity between each candidate pruned sentence and the original sentence can be computed and used as the pruned sentence's information-importance score.
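The similarity formula and the POS weight formula are both images in the source, so the sketch below is hypothetical throughout. It makes these assumptions:
- Per-characteristic closeness is 1 − |a − b| / (|a| + |b|).
- The weights for (Ex, En, He) are 0.6, 0.3 and 0.1.
- The per-dimension weights V_k are noun 1.0, verb 0.8 and other 0.5. These values only preserve the ordering stated in the text (nouns above verbs above the rest).

A deleted word is a zero cloud, so that dimension contributes nothing unless the original is also zero.

```python
from typing import Dict, NamedTuple, Sequence


class Cloud(NamedTuple):
    ex: float
    en: float
    he: float


ZERO = Cloud(0.0, 0.0, 0.0)                          # deleted word position
POS_WEIGHT: Dict[str, float] = {"n": 1.0, "v": 0.8}  # assumed values: noun > verb
DEFAULT_WEIGHT = 0.5                                 # assumed weight for other POS
CHAR_WEIGHTS = (0.6, 0.3, 0.1)                       # assumed weights for Ex, En, He


def closeness(a: float, b: float) -> float:
    denom = abs(a) + abs(b)
    return 1.0 if denom == 0 else 1.0 - abs(a - b) / denom


def smc_similarity(c1: Sequence[Cloud], c2: Sequence[Cloud], pos_tags: Sequence[str]) -> float:
    """Weighted similarity of two equal-length sentence multidimensional clouds."""
    if not (len(c1) == len(c2) == len(pos_tags)):
        raise ValueError("SMCs derived from one sentence must have the same dimension")
    wa, wb, wc = CHAR_WEIGHTS
    num = den = 0.0
    for x, y, pos in zip(c1, c2, pos_tags):
        v_k = POS_WEIGHT.get(pos, DEFAULT_WEIGHT)  # per-dimension weight V_k
        num += v_k * (wa * closeness(x.ex, y.ex)
                      + wb * closeness(x.en, y.en)
                      + wc * closeness(x.he, y.he))
        den += v_k
    return num / den if den else 0.0
```

For example, if a verb is deleted from a two-word noun-verb sentence, the similarity is 1.0 / 1.8 ≈ 0.56. That score would be the pruned sentence's importance.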
6. Compute the information density of each candidate sentence.
The invention proposes an improved information-density measure suited to selecting among candidate sentences:
[Formula image: information density]
where C and O denote the candidate sentence and the original sentence, respectively. The importance term is the pruned sentence's importance score obtained in step 5, and the function Length gives the sentence length in words.
7. Replace the original summary sentence with the candidate sentence of highest information density.
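Steps 6 and 7 can be sketched as below. The text says only that the density combines the step-5 importance score with Length(C) and Length(O). This sketch therefore assumes density = importance / (Length(C) / Length(O)), that is, importance per unit of retained length. The formula and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def information_density(importance: float, cand_len: int, orig_len: int) -> float:
    """Assumed form: importance score per unit of relative length (lengths in words)."""
    if cand_len <= 0 or orig_len <= 0:
        raise ValueError("lengths must be positive")
    return importance / (cand_len / orig_len)


def pick_replacement(original: Sequence[str],
                     candidates: List[Tuple[List[str], float]]) -> List[str]:
    """candidates: (word list, importance from step 5). Returns the densest candidate."""
    if not candidates:
        return list(original)  # nothing was pruned: keep the original sentence
    best_words, _ = max(
        candidates,
        key=lambda c: information_density(c[1], len(c[0]), len(original)),
    )
    return best_words
```

Under this form, take a 10-word sentence with three candidates: 8 words scoring 0.95, 5 words scoring 0.7, and 6 words scoring 0.9. Their densities are 1.19, 1.4 and 1.5, so the 6-word candidate replaces the original.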
8. Fill the space freed by the deleted content with the pruned version of the part of the last summary sentence that was cut for exceeding the length limit, or with new summary sentences, and regenerate the summary.
9. Results
(1) Manual evaluation of pruned-sentence quality
The 50 document sets used in 6 (2) of Embodiment 1 served as test data. The first two sentences of each document set, 100 sentences in total, were selected for manual evaluation.
Four assessors rated the pruned sentences produced by three methods for grammatical correctness and information importance, each on a five-level scale from 1 to 5. A higher score means the sentence is more grammatical or retains more important information. During the evaluation the assessors could see the original sentence. The sentences produced by the three methods were randomly mixed, so the assessors did not know in advance which method had produced each sentence.
Fig. 8 shows the results: the method preserves important information well, losing only 18% of the important information while shortening sentences by 32.2%.
(2) Automatic evaluation of summary quality
Fig. 9 shows the effect of the Chinese sentence pruning method on summary quality. It reports average ROUGE-1 scores with 95% confidence intervals on the test data of 50 document sets.
The results show that the ROUGE-1 score of Embodiment 2 is 4.7% higher than that of Embodiment 1.
Embodiment 3
Embodiment 3 is the preferred embodiment of the invention and corresponds to the main flow of Fig. 1, the solid-line part in the middle. It differs from Embodiment 2 in that a sentence ordering step is added after the sentence pruning step of Embodiment 2.
Compared with Embodiment 1, Embodiment 3 therefore adds both the sentence pruning step and the sentence ordering step between steps 4 and 5 of Embodiment 1.
The sentence ordering process, shown in the flowchart of Fig. 5, comprises the following steps:
1. Cluster the sentences in the document set to obtain sub-topics, using an adaptive clustering algorithm based on community discovery.
2. Among the sub-topics obtained in step 1, keep those that contain one or more summary sentences and discard those that contain none.
3. Obtain the sentence relative position cloud of each summary sentence:
A. In each document, find the sentence most similar to the summary sentence and take its position as the summary sentence's relative position in that document.
For a sentence S, its relative position rp in document D is defined as the relative position of the sentence in D that is most similar to S:
$rp(S, D) = \frac{Position(\arg\max_{S' \in D} Sim(S, S'))}{N}$
where the function Position returns a sentence's absolute position in its document, and N is the number of sentences in D. Sentence similarity is computed as in step 3 (2) a of Embodiment 1.
B. Treat the summary sentence's relative positions in all documents as cloud drops and apply the backward cloud generator to compute the cloud's numerical characteristics. This yields the sentence relative position cloud (SRPCloud).
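Steps A and B can be sketched as follows, following rp = Position(most similar sentence) / N. Several details are assumptions:
- Positions are counted from 1.
- The Jaccard function `jaccard` is a hypothetical stand-in for the similarity of Embodiment 1 step 3 (2) a, which is not reproduced in this section.
- The backward cloud uses the common certainty-free estimator: Ex is the mean, En = sqrt(π/2) · mean absolute deviation, and He = sqrt(S² − En²).

```python
import math
from typing import Callable, List, NamedTuple, Sequence

Sentence = List[str]


class Cloud(NamedTuple):
    ex: float
    en: float
    he: float


def jaccard(a: Sentence, b: Sentence) -> float:
    """Hypothetical stand-in for the sentence similarity of Embodiment 1."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def relative_position(s: Sentence, doc: Sequence[Sentence],
                      sim: Callable[[Sentence, Sentence], float] = jaccard) -> float:
    """rp(S, D) = Position(most similar sentence in D) / N, positions counted from 1."""
    best = max(range(len(doc)), key=lambda i: sim(s, doc[i]))
    return (best + 1) / len(doc)


def srp_cloud(s: Sentence, docs: Sequence[Sequence[Sentence]]) -> Cloud:
    """Step B: backward cloud over the relative positions of s in every document."""
    drops = [relative_position(s, d) for d in docs]
    n = len(drops)
    ex = sum(drops) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in drops) / n
    s2 = sum((x - ex) ** 2 for x in drops) / (n - 1) if n > 1 else 0.0
    return Cloud(ex, en, math.sqrt(max(s2 - en ** 2, 0.0)))
```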
4. Obtain the topic relative position multidimensional cloud.
For a topic T, suppose the summary sentences $S_1, \dots, S_k$ come from T. T can then be expressed as a k-dimensional multidimensional cloud whose dimensions are the sentence relative position clouds of $S_1, \dots, S_k$.
5. Use the comprehensive cloud operation to obtain the topic relative position one-dimensional cloud.
6. Compute the topic relative position scores and sort the topics.
The relative position score of a topic is determined directly by the expectation Ex of its relative position one-dimensional cloud.
Finally, the topics are sorted by this score in ascending order. The topic with the lowest score forms the first part of the summary, the topic with the next-lowest score forms the second part, and so on until all topics have been ordered.
7. Sort the sentences within each topic.
Sentences within a topic are likewise ordered using only the expectation of their SRPCloud. A sentence's relative position score is the expectation Ex of its SRPCloud, and the smaller the score, the earlier the sentence appears within its topic.
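Steps 2 and 4 to 7 can be sketched as below, assuming each summary sentence already has its SRPCloud. The comprehensive cloud operation is assumed here to be the equal-weight synthesized cloud, with entropy-weighted Ex and He and summed En. The patent's own formula is an image, so this is not necessarily its definition. Topics are ordered by the Ex of their synthesized cloud, and sentences within a topic by the Ex of their own SRPCloud, both ascending.

```python
from typing import Dict, List, NamedTuple, Sequence


class Cloud(NamedTuple):
    ex: float
    en: float
    he: float


def comprehensive_cloud(clouds: Sequence[Cloud]) -> Cloud:
    """Assumed equal-weight synthesis: entropy-weighted Ex and He, summed En."""
    total = sum(c.en for c in clouds)
    if total == 0:  # degenerate case: fall back to plain averages
        n = len(clouds)
        return Cloud(sum(c.ex for c in clouds) / n, 0.0, sum(c.he for c in clouds) / n)
    ex = sum(c.ex * c.en for c in clouds) / total
    he = sum(c.he * c.en for c in clouds) / total
    return Cloud(ex, total, he)


def order_summary(topics: Dict[str, Dict[str, Cloud]]) -> List[str]:
    """topics: topic id -> {summary sentence -> its SRPCloud}. Returns ordered sentences."""
    kept = {t: s for t, s in topics.items() if s}              # step 2: drop empty topics
    topic_score = {t: comprehensive_cloud(list(s.values())).ex  # steps 4 to 6
                   for t, s in kept.items()}
    ordered: List[str] = []
    for t in sorted(kept, key=topic_score.__getitem__):  # lowest score first
        sents = kept[t]
        ordered.extend(sorted(sents, key=lambda x: sents[x].ex))  # step 7
    return ordered
```

Here is a small worked example. Topic T1 holds s1 and s2, with SRPCloud expectations 0.8 and 0.6 and equal entropies. Topic T2 holds only s3, with expectation 0.2. T2 comes first, then T1 with s2 before s1.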
8. Results
Fig. 10 gives the manual readability evaluation of summaries before and after applying the sentence ordering module, alongside that of the human summaries. The grades are defined as follows. Perfect: no change to the sentence order could improve the summary. Acceptable: the summary is understandable; adjustments might improve it but are not necessary. Poor: the summary is incoherent in places but could reach the Acceptable level with minor adjustment. Unacceptable: too many places need adjustment, and the summary has to be reorganized.
The evaluation shows that sentence ordering improves summary readability considerably. 30% of the summaries are rated Perfect and a further 47.5% Acceptable, so 77.5% of the summaries need no further modification. In addition, because it also uses sentence pruning, Embodiment 3 retains the advantage of Embodiment 2: its automatic ROUGE-1 score is 4.7% higher than that of Embodiment 1.

Claims (9)

1. A cloud-model-based method for query-oriented Chinese multi-document automatic summarization, characterized by comprising the following steps:
1) performing sentence segmentation, word segmentation and stop-word removal on the query and the multi-document set, and representing the query and the documents as vectors;
2) processing the resulting vectors with the cloud model; building a Chinese corpus and modifying the source code of the English automatic summarization evaluation tool ROUGE so that Chinese summaries can be evaluated automatically, and training the parameters; finding the sentences relevant to the query and computing the importance of each sentence in the document set; and scoring each sentence by combining these two factors;
3) removing redundancy and generating an initial summary.
2. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 1, characterized by further comprising a sentence pruning step after said step 3): formulating sentence pruning rules and pruning the initial summary sentences to produce multiple candidate sentences, using multidimensional clouds to select pruned sentences that replace the original summary sentences, and generating a refined summary.
3. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 2, further comprising a final sentence ordering step: clustering the document set; finding the sub-topics that contain one or more summary sentences; treating each document in the document set as a template, the set of templates forming a cloud, namely a cloud template; and using the cloud template to order, in turn, the sub-topics and the summary sentences within each sub-topic, finally generating the required summary.
4. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 2, characterized in that the sentence pruning rules are 10 manual rules based on dependency analysis.
5. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 2, characterized in that using multidimensional clouds to select a pruned sentence to replace the original summary sentence specifically means:
- treating three aspects of a word as cloud drops, namely its distribution across the document set, its distribution across all sentences, and its relevance to all query words;
- obtaining the numerical characteristics of the three resulting clouds with the backward cloud generator, to form a word multidimensional cloud;
- obtaining a word one-dimensional cloud through the comprehensive cloud operation;
- composing the word one-dimensional clouds into a sentence multidimensional cloud;
- computing the importance score of each candidate sentence;
- computing each candidate sentence's information density using its length; and
- replacing the original summary sentence with the candidate sentence of highest information density.
6. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 5, characterized in that computing the candidate sentence importance score means computing the similarity between the candidate's sentence multidimensional cloud and that of the original sentence, thereby obtaining the candidate's importance score. The similarity between the two sentence multidimensional clouds is computed as:
[Formula image: multidimensional cloud similarity]
where C1 and C2 are two multidimensional clouds; $Ex_{1k}$, $Ex_{2k}$, $En_{1k}$, $En_{2k}$, $He_{1k}$ and $He_{2k}$ are the expectations, entropies and hyper-entropies of the k-th attribute of concepts C1 and C2; and $V_k$ is the weight of attribute k, with a value between 0 and 1.
7. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 5, characterized in that the information density of a candidate sentence is computed as:
[Formula image: information density]
where C and O denote the candidate sentence and the original sentence, respectively, and the function Length gives the sentence length in words.
8. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 3, characterized in that using the cloud template to order the sub-topics specifically means:
- forming a topic relative position multidimensional cloud from the one-dimensional clouds of the summary sentences contained in each topic;
- obtaining a topic relative position one-dimensional cloud with the comprehensive cloud operation;
- taking its expectation Ex as the topic relative position score; and
- ordering the topics by this score.
9. The cloud-model-based method for query-oriented Chinese multi-document automatic summarization according to claim 3, characterized in that using the cloud template to order the summary sentences within each sub-topic specifically means:
- finding, in every document, the sentence most similar to the summary sentence obtained in the preceding steps, and taking its position as the relative position of the summary sentence in that document;
- treating each relative position as a cloud drop and applying the backward cloud operation to obtain the numerical characteristics of the sentence relative position cloud; and
- taking the expectation Ex of each sentence within a topic as its sentence relative position score, by which the sentences within the topic are ordered.
CN201110373752.9A 2011-11-22 2011-11-22 Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode Expired - Fee Related CN102411621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110373752.9A CN102411621B (en) 2011-11-22 2011-11-22 Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode

Publications (2)

Publication Number Publication Date
CN102411621A true CN102411621A (en) 2012-04-11
CN102411621B CN102411621B (en) 2014-01-08

Family

ID=45913692

Country Status (1)

Country Link
CN (1) CN102411621B (en)

CN112732901A (en) * 2021-01-15 2021-04-30 联想(北京)有限公司 Abstract generation method and device, computer readable storage medium and electronic equipment
CN112732901B (en) * 2021-01-15 2024-05-28 联想(北京)有限公司 Digest generation method, digest generation device, computer-readable storage medium, and electronic device
CN113420545B (en) * 2021-08-24 2021-11-09 平安科技(深圳)有限公司 Abstract generation method, device, equipment and storage medium
CN113420545A (en) * 2021-08-24 2021-09-21 平安科技(深圳)有限公司 Abstract generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN102411621B (en) 2014-01-08

Similar Documents

Publication Publication Date Title
CN102411621B (en) Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode
Aizawa et al. NTCIR-11 Math-2 Task Overview.
CN107993724B (en) Medical intelligent question and answer data processing method and device
US8332434B2 (en) Method and system for finding appropriate semantic web ontology terms from words
CN100458795C (en) Intelligent word input method and input method system and updating method thereof
CN101630314B (en) Semantic query expansion method based on domain knowledge
CN104537116B (en) A kind of books searching method based on label
CN102902806B (en) A kind of method and system utilizing search engine to carry out query expansion
CN102081668B (en) Information retrieval optimizing method based on domain ontology
CN105447080B (en) A kind of inquiry complementing method in community's question and answer search
CN104765769A (en) Short text query expansion and indexing method based on word vector
CN103049440A (en) Recommendation processing method and processing system for related articles
US20080222138A1 (en) Method and Apparatus for Constructing a Link Structure Between Documents
CN103229223A (en) Providing answers to questions using multiple models to score candidate answers
CN102236677A (en) Question answering system-based information matching method and system
CN104915405B (en) It is a kind of based on multi-level microblogging enquiry expanding method
CN107844493B (en) File association method and system
CN106484797A (en) Accident summary abstracting method based on sparse study
Minkov et al. Improving graph-walk-based similarity with reranking: Case studies for personal information management
CN101853298B (en) Event-oriented query expansion method
CN115563313A (en) Knowledge graph-based document book semantic retrieval system
CN103064846B (en) Retrieval device and search method
CN1916904A (en) Method of abstracting single file based on expansion of file
JP2013168177A (en) Information provision program, information provision apparatus, and provision method of retrieval service
JP5315726B2 (en) Information providing method, information providing apparatus, and information providing program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2014-01-08

Termination date: 2014-11-22

EXPY Termination of patent right or utility model