CN105488021A - Method and device generating multi-file summary - Google Patents

Publication number
CN105488021A
Authority
CN
China
Prior art keywords
phrase
sentence
pond
documents
generation device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410469449.2A
Other languages
Chinese (zh)
Other versions
CN105488021B (en
Inventor
邴立东
林伟
张轶博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201410469449.2A priority Critical patent/CN105488021B/en
Publication of CN105488021A publication Critical patent/CN105488021A/en
Application granted granted Critical
Publication of CN105488021B publication Critical patent/CN105488021B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present invention disclose a method and a device for generating a multi-document summary, so that the generated summary achieves high coverage of the important information in the documents while reducing redundancy. The method comprises: decomposing the sentence set of multiple documents into a phrase pool, and obtaining the features and relations of each phrase in the phrase pool; selecting from the phrase pool, according to the features and relations, a set of phrases that satisfies preset constraint conditions as the summary phrase set; and combining the selected summary phrase set into summary sentences according to a preset combination mode, thereby forming the multi-document summary.

Description

Method and apparatus for generating a multi-document summary
Technical field
The present invention relates to the field of data processing, and in particular to a method and apparatus for generating a multi-document summary.
Background
In the era of information explosion, people face massive amounts of information and increasingly need fast, effective means of processing it. News, as one channel for acquiring information, takes up a considerable share of people's reading time, yet its sheer volume and redundancy make reading inconvenient. Multi-document summarization (MDS) technology takes multiple documents on the same topic as input and automatically generates a summary text of a specified length for the user to read, thereby improving the efficiency of reading and acquiring information.
Currently, sentences from multiple documents can be clustered, and a dependency tree or dependency graph is then used to fuse the clustered sentences, thereby generating new sentences.
However, when a summary is generated in this way, clustering is performed at the sentence level, which is too coarse a granularity. The similarity computation is easily misled by lengthy and insignificant parts of a sentence, so the resulting multi-document summary judges important information inaccurately and contains high redundancy.
Summary of the invention
Embodiments of the present invention provide a method and an apparatus for generating a multi-document summary, so that the generated summary reduces redundancy while maintaining high coverage of the important information in the documents.
A first aspect of the embodiments of the present invention provides a method for generating a multi-document summary, comprising:
A summary generation device decomposes the sentence set of multiple documents into a phrase pool;
The summary generation device obtains the features and relations of each phrase in the phrase pool, where a feature represents a characteristic of the phrase itself and a relation represents the relationship between the phrase and other phrases;
The summary generation device selects from the phrase pool, according to the features and relations of each phrase, a set of phrases that satisfies preset constraint conditions as the summary phrase set;
The summary generation device combines the summary phrase set into summary sentences according to a preset combination mode, generating the summary of the multiple documents.
With reference to the first aspect of the embodiments of the present invention, in a first implementation of the first aspect, the summary generation device obtaining the features and relations of each phrase in the phrase pool specifically comprises:
The summary generation device obtains the importance of each phrase in the phrase pool, and the compatibility and similarity between phrases. The importance measures how significant the concept or information represented by a phrase is in expressing the semantics of the documents; the compatibility measures how likely two phrases are to collocate and appear in the same sentence; and the similarity measures the degree of semantic similarity between phrases.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the summary generation device selecting from the phrase pool, according to the features and relations of each phrase, a set of phrases that satisfies preset constraint conditions as the summary phrase set specifically comprises:
The summary generation device applies a method for solving an optimization (mathematical programming) problem and selects from the phrase pool the summary phrase set that satisfies the preset constraint conditions to the greatest extent. The optimization problem is defined by the preset constraint conditions, which comprise a constraint on phrase importance, a constraint on phrase compatibility, and a constraint on phrase similarity.
With reference to the second implementation of the first aspect, in a third implementation of the first aspect, the preset constraint conditions further comprise a constraint on phrase candidate weights;
The method further comprises:
The summary generation device solves for the extreme value of a given objective function according to the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, and obtains the candidate weight of each phrase in the phrase pool. The objective function is formed by combining the importance of each phrase with the compatibility and similarity between phrases, and describes the information content and redundancy of a combination; when the objective function reaches its extreme value, the information content is greatest and the redundancy is lowest.
With reference to the first aspect or any one of the first to third implementations of the first aspect, in a fourth implementation of the first aspect, decomposing the sentence set of multiple documents into a phrase pool specifically comprises:
Building a syntax tree from the sentence set of the multiple documents using a semantic analysis tool;
Extracting all phrases on the syntax tree to form the phrase pool.
With reference to the first aspect or any one of the first to third implementations of the first aspect, in a fifth implementation of the first aspect, the summary generation device combining the summary phrase set into summary sentences according to a preset combination mode and generating the summary of the multiple documents specifically comprises:
The summary generation device arranges the summary phrases according to the order in which each summary phrase of the summary phrase set appears in the sentences of the multiple documents, obtaining summary sentences;
The summary sentences are ordered according to the earliest time at which their verb phrases appear in the multiple documents, obtaining the summary of the multiple documents.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, before the step of ordering the summary sentences according to the earliest time at which their verb phrases appear in the multiple documents to obtain the summary, the method further comprises:
For a summary sentence that contains multiple verb phrases, adding a conjunction between the verb phrases of that summary sentence.
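The fifth and sixth implementations above can be sketched as follows. This is only an illustrative sketch: the `Phrase` record, its `offset` and `first_seen` fields, and the fixed conjunction "and" are assumptions introduced here, not details given in the source.

```python
from typing import List, NamedTuple

class Phrase(NamedTuple):
    text: str
    kind: str        # "NP" or "VP"
    offset: int      # position of the phrase in its source sentence (assumed field)
    first_seen: int  # earliest time the phrase appears in the documents (assumed field)

def build_sentence(phrases: List[Phrase], conjunction: str = "and") -> str:
    """Arrange phrases by source order; join successive verb phrases with a conjunction."""
    parts: List[str] = []
    seen_verb = False
    for phrase in sorted(phrases, key=lambda p: p.offset):
        if phrase.kind == "VP":
            if seen_verb:
                parts.append(conjunction)
            seen_verb = True
        parts.append(phrase.text)
    return " ".join(parts)

def order_sentences(groups: List[List[Phrase]]) -> List[str]:
    """Order summary sentences by the earliest appearance time of their verb phrases."""
    def earliest(group: List[Phrase]) -> float:
        return min((p.first_seen for p in group if p.kind == "VP"),
                   default=float("inf"))
    return [build_sentence(g) for g in sorted(groups, key=earliest)]

storm = [
    Phrase("hit the coast", "VP", 1, 5),
    Phrase("the storm", "NP", 0, 5),
    Phrase("flooded the town", "VP", 2, 7),
]
warning = [
    Phrase("officials", "NP", 0, 2),
    Phrase("issued a warning", "VP", 1, 2),
]
summary = order_sentences([storm, warning])
```

Here the warning sentence is placed first because its verb phrase appears earlier, and the storm sentence receives a conjunction between its two verb phrases.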
A second aspect of the embodiments of the present invention provides a summary generation device, comprising:
A decomposition module, configured to decompose the sentence set of multiple documents into a phrase pool;
A first acquisition module, configured to obtain the features and relations of each phrase in the phrase pool produced by the decomposition module, where a feature represents a characteristic of the phrase itself and a relation represents the relationship between the phrase and other phrases;
A selection module, configured to select from the phrase pool, according to the features and relations of each phrase obtained by the first acquisition module, a set of phrases that satisfies preset constraint conditions as the summary phrase set;
A combination module, configured to combine the summary phrase set selected by the selection module into summary sentences according to a preset combination mode, generating the summary of the multiple documents.
With reference to the second aspect of the embodiments of the present invention, in a first implementation of the second aspect, the first acquisition module is specifically configured to obtain the importance of each phrase in the phrase pool and the compatibility and similarity between phrases. The importance measures how significant the concept or information represented by a phrase is in expressing the semantics of the documents; the compatibility measures how likely two phrases are to collocate and appear in the same sentence; and the similarity measures the degree of semantic similarity between phrases.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the selection module is specifically configured to apply a method for solving an optimization (mathematical programming) problem and select from the phrase pool the summary phrase set that satisfies the preset constraint conditions to the greatest extent. The optimization problem is defined by the preset constraint conditions, which comprise a constraint on phrase importance, a constraint on phrase compatibility, and a constraint on phrase similarity.
With reference to the second implementation of the second aspect, in a third implementation of the second aspect, the preset constraint conditions further comprise a constraint on phrase candidate weights;
The summary generation device further comprises:
A second acquisition module, configured to solve for the extreme value of a given objective function according to the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, and to obtain the candidate weight of each phrase in the phrase pool. The objective function is formed by combining the importance of each phrase with the compatibility and similarity between phrases, and describes the information content and redundancy of a combination; when the objective function reaches its extreme value, the information content is greatest and the redundancy is lowest.
With reference to the second aspect or any one of the first to third implementations of the second aspect, in a fourth implementation of the second aspect, the decomposition module specifically comprises:
A construction unit, configured to build a syntax tree from the sentence set of the multiple documents using a semantic analysis tool;
An extraction unit, configured to extract all phrases on the syntax tree built by the construction unit to form the phrase pool.
With reference to the second aspect or any one of the first to third implementations of the second aspect, in a fifth implementation of the second aspect, the combination module specifically comprises:
A phrase arrangement unit, configured to arrange the summary phrases according to the order in which each summary phrase of the summary phrase set selected by the selection module appears in the sentences of the multiple documents, obtaining summary sentences;
A sentence arrangement module, configured to order the summary sentences obtained by the phrase arrangement unit according to the earliest time at which their verb phrases appear in the multiple documents, obtaining the summary of the multiple documents.
With reference to the fifth implementation of the second aspect, in a sixth implementation of the second aspect, the combination module further comprises:
An adding unit, configured to add, for a summary sentence that contains multiple verb phrases, a conjunction between the verb phrases of that summary sentence.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages. The sentence set of multiple documents is first decomposed into a phrase pool, and the features and relations of each phrase in the phrase pool are obtained. According to these features and relations, a set of phrases that satisfies preset constraint conditions is selected from the phrase pool as the summary phrase set, and the selected summary phrase set is then combined into summary sentences according to a preset combination mode to generate the multi-document summary. Because phrases are selected according to their features and relations, the phrase serves as the basic unit for judging importance and redundancy, which makes the judgment more fine-grained. By selecting and combining phrases, the generated multi-document summary reduces redundancy while maintaining high coverage of the important information in the documents.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for generating a multi-document summary in an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a method for generating a multi-document summary in an embodiment of the present invention;
Fig. 3 is a schematic example of building compatibility relations in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a summary generation device in an embodiment of the present invention;
Fig. 5 is another schematic structural diagram of a summary generation device in an embodiment of the present invention;
Fig. 6 is another schematic structural diagram of a summary generation device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, one embodiment of the method for generating a multi-document summary in the embodiments of the present invention comprises:
101, the summary generation device decomposes the sentence set of multiple documents into a phrase pool;
When a multi-document summary needs to be generated for multiple documents, the summary generation device decomposes the sentence set of the documents into a phrase pool.
The phrase pool may contain noun phrases and verb phrases, and may also contain phrases of other types, such as adjective phrases and numeral phrases. Its contents depend on the phrases present in the documents and are not limited herein. It can be understood that, in natural language processing, noun phrases include pronouns, since a pronoun is regarded as a kind of noun.
102, the summary generation device obtains the features and relations of each phrase in the phrase pool;
After decomposing the sentence set into the phrase pool, the summary generation device obtains the features and relations of each phrase in the pool. A feature represents a characteristic of the phrase itself, such as its importance; a relation represents the relationship between the phrase and other phrases, such as compatibility or similarity.
103, the summary generation device selects from the phrase pool, according to the features and relations of each phrase, a set of phrases that satisfies preset constraint conditions as the summary phrase set;
After obtaining the features and relations of each phrase, the summary generation device selects from the phrase pool, according to these features and relations, a set of phrases that satisfies the preset constraint conditions as the summary phrase set.
It can be understood that the preset constraint conditions contain constraints on the features and relations of phrases. Phrases that do not satisfy the preset constraint conditions are discarded, and the retained phrases that do satisfy them form the summary phrase set, which is used to compose the summary.
It can be understood that the features of a phrase can represent its importance in the documents, and the relations of a phrase can represent its redundancy in the documents. Through their constraints on features and relations, the preset constraint conditions screen phrases by importance and redundancy.
104, the summary generation device combines the summary phrase set into summary sentences according to a preset combination mode, generating the summary of the multiple documents.
After obtaining the summary phrase set, the summary generation device combines it into summary sentences according to the preset combination mode and generates the multi-document summary of the documents.
In this embodiment of the present invention, the sentence set of multiple documents is first decomposed into a phrase pool, and the features and relations of each phrase in the phrase pool are obtained. According to these features and relations, a set of phrases that satisfies preset constraint conditions is selected from the phrase pool as the summary phrase set, and the selected summary phrase set is then combined into summary sentences according to a preset combination mode to generate the multi-document summary. Because phrases are selected according to their features and relations, the phrase serves as the basic unit for judging importance and redundancy, which makes the judgment more fine-grained. By selecting and combining phrases, the generated multi-document summary reduces redundancy while maintaining high coverage of the important information in the documents.
The method for generating a multi-document summary in the embodiments of the present invention is described in detail below. Referring to Fig. 2, the method comprises:
201, the summary generation device builds a syntax tree from the sentence set of the multiple documents using a semantic analysis tool;
In this step, the semantic analysis tool performs semantic analysis on each sentence in the documents and builds a syntax tree that decomposes each sentence into phrases; each resulting phrase becomes a branch of the syntax tree.
It can be understood that the semantic analysis tool may be built into the summary generation device, or may be a semantic analysis tool that the device requests over a network, which is not limited herein.
202, the summary generation device extracts all phrases on the syntax tree to form the phrase pool;
After building the syntax tree for the sentence set of the documents, the summary generation device extracts all phrases on the syntax tree to form the phrase pool.
The phrase pool may contain noun phrases and verb phrases, and may also contain phrases of other types, such as adjective phrases and numeral phrases. Its contents depend on the phrases present in the documents and are not limited herein. It can be understood that, in natural language processing, noun phrases include pronouns, since a pronoun is regarded as a kind of noun.
It can be understood that steps 201 and 202 constitute the process of decomposing the sentence set of the documents into phrases. In practice, besides building a syntax tree, many other approaches may be used, provided that the sentence set of the documents can be decomposed into phrases; no specific limitation is imposed herein.
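Steps 201 and 202 can be illustrated with a minimal sketch. It assumes the syntax tree has already been produced by some parser and is represented as nested `(label, children)` tuples; this representation, the label names, and the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple, Union

# A tree node is (label, children); a leaf is a plain word string.
Tree = Union[str, Tuple[str, list]]

def leaves(tree: Tree) -> List[str]:
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    words: List[str] = []
    for child in children:
        words.extend(leaves(child))
    return words

def extract_phrases(tree: Tree, labels=("NP", "VP")) -> List[Tuple[str, str]]:
    """Collect (label, text) for every subtree whose label is in `labels`."""
    if isinstance(tree, str):
        return []
    label, children = tree
    found: List[Tuple[str, str]] = []
    if label in labels:
        found.append((label, " ".join(leaves(tree))))
    for child in children:
        found.extend(extract_phrases(child, labels))
    return found

# Hand-built parse of "the storm hit the coast" (assumed parser output).
sentence = ("S", [
    ("NP", [("DT", ["the"]), ("NN", ["storm"])]),
    ("VP", [("VBD", ["hit"]),
            ("NP", [("DT", ["the"]), ("NN", ["coast"])])]),
])
phrase_pool = extract_phrases(sentence)
```

Nested phrases are kept as separate entries, so the pool contains both "hit the coast" and the noun phrase "the coast" inside it.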
203, the summary generation device obtains the importance of each phrase in the phrase pool and the compatibility and similarity between phrases;
After obtaining the phrase pool, which contains all phrases into which the sentence set of the documents was decomposed, the summary generation device obtains the importance of each phrase in the pool and the compatibility and similarity between phrases. The importance measures how significant the concept or information represented by a phrase is in expressing the semantics of the documents; the compatibility measures how likely two phrases are to collocate and appear in the same sentence; and the similarity measures the degree of semantic similarity between phrases.
It can be understood that the importance of each phrase reflects the feature of that phrase, while the compatibility and similarity between phrases reflect the relations between phrases.
Importance, similarity, and compatibility can each be calculated in various ways; one of them is described below:
One, phrase importance may be calculated using position-weighted concept frequency statistics, as follows:
1, build the concept set, where concepts comprise unigrams (single words), bigrams (two-word sequences), and named entities.
2, count the position-weighted concept frequency: for each concept, count its occurrences in the documents, weighting each occurrence by the position at which it appears, so that earlier occurrences receive larger weights.
3, the importance of a phrase is the sum of the frequencies of all concepts it contains.
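The three steps above can be sketched as follows. This is a sketch under stated assumptions: the position weight `1 / (1 + sentence index)` is an illustrative choice (the source only says that earlier occurrences weigh more), named entities are omitted because they would require a separate recognizer, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def concepts(tokens: List[str]) -> List[str]:
    """Unigrams and bigrams of a token list (named entities omitted in this sketch)."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams

def concept_weights(documents: List[List[List[str]]]) -> Dict[str, float]:
    """Position-weighted concept frequency; a document is a list of tokenized sentences."""
    weights: Dict[str, float] = defaultdict(float)
    for doc in documents:
        for position, sentence in enumerate(doc):
            position_weight = 1.0 / (1 + position)  # assumed decay: earlier counts more
            for concept in concepts(sentence):
                weights[concept] += position_weight
    return weights

def phrase_importance(phrase_tokens: List[str], weights: Dict[str, float]) -> float:
    """Importance of a phrase: sum of the weighted frequencies of its concepts."""
    return sum(weights.get(c, 0.0) for c in set(concepts(phrase_tokens)))

docs = [
    [["storm", "hit", "coast"], ["coast", "flooded"]],
    [["storm", "warning", "issued"]],
]
weights = concept_weights(docs)
score = phrase_importance(["storm"], weights)
```

In this toy input, "storm" opens both documents and so accumulates two full-weight occurrences, while the second mention of "coast" counts only half.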
Two, phrase similarity may be calculated using cosine similarity or the Jaccard index, giving the pairwise similarity between verb phrases and the pairwise similarity between noun phrases.
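Both measures can be sketched over bag-of-words representations of two phrases. Treating a phrase as a list of tokens is an assumption made here for illustration; the source does not fix the representation.

```python
import math
from collections import Counter
from typing import List

def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine of the angle between the term-count vectors of two phrases."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def jaccard_index(a: List[str], b: List[str]) -> float:
    """Size of the token intersection divided by the size of the token union."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

np_a = ["the", "storm"]
np_b = ["the", "coast"]
cos = cosine_similarity(np_a, np_b)
jac = jaccard_index(np_a, np_b)
```

The two phrases share one of three distinct tokens, so the Jaccard index is one third, while the cosine similarity is about one half.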
Three, phrase compatibility is a numerical value expressing how compatible two phrases are. Specifically, it indicates whether a noun phrase and a verb phrase are compatible, that is, whether they can jointly form a sentence. Compatibility is determined as follows:
1, for each noun phrase or verb phrase, find its several nearest neighbors; each nearest neighbor is regarded as a candidate for replacing the current phrase.
2, build the compatibility relations. Fig. 3 shows a schematic example of building compatibility relations, where NP denotes a noun phrase and VP denotes a verb phrase; identical subscripts indicate phrases from the same sentence, and different subscripts indicate phrases from different sentences. $NP_0$ and $VP_0$ come from the same sentence, and $NP_1$ and $NP_0$ are nearest neighbors. Dashed lines denote newly added compatible NP-VP pairs, and solid lines denote pre-existing compatibility relations.
3, according to the compatibility relations built, the compatibility between phrases can be quantified: pairs with good compatibility receive a high compatibility value, and pairs with poor compatibility receive a low one.
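The construction of compatibility relations can be sketched as follows, assuming the nearest neighbors of each phrase have already been found. The dictionary-based neighbor tables and the treatment of compatibility as simple set membership are illustrative assumptions; the source leaves the numerical scale open.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (noun phrase, verb phrase)

def build_compatibility(
    original_pairs: Set[Pair],
    noun_neighbors: Dict[str, List[str]],
    verb_neighbors: Dict[str, List[str]],
) -> Set[Pair]:
    """Extend the original NP-VP pairs with pairs obtained by neighbor substitution."""
    compatible = set(original_pairs)  # solid lines: pairs from the same sentence
    for noun, verb in original_pairs:
        # Dashed lines: a nearest neighbor may replace the phrase it resembles.
        for noun_alt in noun_neighbors.get(noun, []):
            compatible.add((noun_alt, verb))
        for verb_alt in verb_neighbors.get(verb, []):
            compatible.add((noun, verb_alt))
    return compatible

original = {("NP0", "VP0"), ("NP1", "VP1")}
relations = build_compatibility(original, {"NP0": ["NP1"]}, {})
```

Because NP1 is a nearest neighbor of NP0, the pair (NP1, VP0) is added alongside the two original pairs.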
It can be understood that the above ways of calculating importance, similarity, and compatibility are only examples; many other ways of calculating them may be used, which is not limited herein.
204, the summary generation device solves for the extreme value of a given objective function according to the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, and obtains the candidate weight of each phrase in the phrase pool.
The objective function is formed by combining the importance of each phrase with the compatibility and similarity between phrases, and describes the information content and redundancy of a combination; when the objective function reaches its extreme value, the information content is greatest and the redundancy is lowest.
It can be understood that phrase importance is related to information content and phrase similarity is related to redundancy. For the objective function to reach its extreme value when the information content is greatest and the redundancy is lowest, the objective function must reward importance-related parameters and penalize similarity-related parameters.
Optionally, one such objective function is as follows:
$$\max \left\{ \sum_i \alpha_i S_i^n - \sum_{i<j} \alpha_{ij} \left(S_i^n + S_j^n\right) R_{ij}^n + \sum_i \beta_i S_i^v - \sum_{i<j} \beta_{ij} \left(S_i^v + S_j^v\right) R_{ij}^v \right\},$$
The noun phrases and the verb phrases in the phrase pool are numbered separately. S is the importance parameter of a phrase and is related to its importance. The indices i and j denote the sequence numbers of selected phrases; the index n indicates that the selected phrase is a noun phrase, and the index v indicates that it is a verb phrase. Thus $S_i^n$ is the importance parameter of the noun phrase numbered i, $S_i^v$ is the importance parameter of the verb phrase numbered i, $S_j^n$ is the importance parameter of the noun phrase numbered j, and $S_j^v$ is the importance parameter of the verb phrase numbered j. R is the redundancy parameter of phrases and is related to similarity. Because similarity is a relation between phrases, R is indexed by the sequence numbers of two noun phrases or two verb phrases and represents the redundancy between the two selected phrases: $R_{ij}^n$ is the redundancy between the noun phrases numbered i and j, and $R_{ij}^v$ is the redundancy between the verb phrases numbered i and j. The first and third terms of the objective function reward the importance parameters: the weighted importance of each phrase is summed to give the importance part of the objective function. The second and fourth terms penalize the redundancy parameters: the weighted redundancy of each pair of phrases is subtracted. $\alpha_i$ is the candidate weight of the noun phrase numbered i, $\beta_i$ is the candidate weight of the verb phrase numbered i, $\alpha_{ij}$ is the association weight between the noun phrases numbered i and j, and $\beta_{ij}$ is the association weight between the verb phrases numbered i and j.
Solving the above objective function for its extreme value yields the candidate weight of each phrase in the phrase pool and the association weight between phrases at which the information content is greatest and the redundancy is lowest.
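To make the structure of the objective concrete, the sketch below evaluates it for one fixed choice of weights. It does not solve for the extreme value, which would require an optimization solver; the function and parameter names are hypothetical, and pairs are stored in dictionaries keyed by `(i, j)` with `i < j` as an illustrative convention.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

def half_objective(weights: List[float], pair_weights: Dict[Pair, float],
                   importance: List[float], redundancy: Dict[Pair, float]) -> float:
    """Importance reward minus redundancy penalty for one phrase type (nouns or verbs)."""
    reward = sum(w * s for w, s in zip(weights, importance))
    penalty = sum(
        pw * (importance[i] + importance[j]) * redundancy.get((i, j), 0.0)
        for (i, j), pw in pair_weights.items() if i < j
    )
    return reward - penalty

def objective(alpha, alpha_pair, s_n, r_n, beta, beta_pair, s_v, r_v) -> float:
    """Sum of the noun-phrase part and the verb-phrase part of the objective."""
    return (half_objective(alpha, alpha_pair, s_n, r_n)
            + half_objective(beta, beta_pair, s_v, r_v))

# Two noun phrases that are half redundant with each other, and one verb phrase.
value = objective(
    alpha=[1.0, 1.0], alpha_pair={(0, 1): 1.0}, s_n=[3.0, 2.0], r_n={(0, 1): 0.5},
    beta=[1.0], beta_pair={}, s_v=[4.0], r_v={},
)
```

Selecting both noun phrases earns their combined importance of 5 but loses 2.5 to the redundancy penalty, so together with the verb phrase the objective evaluates to 6.5.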
It can be understood that the above is only one example of an objective function. Objective functions of various other forms may also be used to obtain the candidate weight or association weight of each phrase, provided that the objective function rewards importance and penalizes redundancy; the specific form of the objective function is not limited herein.
It can be understood that, if subsequent steps do not need the candidate weights or association weights obtained from the objective function, step 204 may be skipped, which is not limited herein.
205, the summary generation device applies a method for solving an optimization (mathematical programming) problem and selects from the phrase pool the summary phrase set that satisfies the preset constraint conditions to the greatest extent.
The optimization problem is defined by the preset constraint conditions, which may comprise a constraint on phrase importance, a constraint on phrase compatibility, and a constraint on phrase similarity.
Optionally, the preset constraint conditions may also comprise further constraints, such as constraints on the number of phrases and on the total length of the summary. In addition, the preset constraint conditions may comprise a constraint on the candidate weights of phrases and a constraint on the association weights between phrases, which is not limited herein.
It can be understood that the preset constraint conditions express the requirements on summary phrases in mathematical form. In practice, they may take the form of a set of inequalities that limit the value ranges of phrase importance, compatibility, similarity, candidate weight, and so on, so as to select a summary phrase set that meets the requirements.
Several constraint conditions are given below as examples:
$N_i$ and $V_i$ denote the noun phrase with sequence number i and the verb phrase with sequence number i;
$\alpha_i$ denotes the candidate weight of noun phrase i, $\beta_i$ denotes the candidate weight of verb phrase i, $\alpha_{ij}$ denotes the contact weight between noun phrases i and j, and $\beta_{ij}$ denotes the contact weight between verb phrases i and j;
$S_i^n$ denotes the importance parameter of noun phrase i, and $S_i^v$ denotes the importance parameter of verb phrase i;
$R_{ij}^n$ denotes the redundancy between noun phrases i and j, and $R_{ij}^v$ denotes the redundancy between verb phrases i and j;
$\gamma_{ij}$ denotes the compatibility parameter that measures the compatibility between noun phrase $N_i$ and verb phrase $V_j$;
Formula (1) below is one constraint condition, which may be used to constrain noun-phrase validity:
$$\forall i, j,\ \alpha_i \ge \tilde{\gamma}_{ij}; \qquad \forall i,\ \sum_j \tilde{\gamma}_{ij} \ge \alpha_i. \qquad (1)$$
Formula (2) below is another constraint condition, which may be used to constrain verb-phrase validity:
$$\forall j,\ \sum_i \tilde{\gamma}_{ij} = \beta_j. \qquad (2)$$
Formula (3) below is another constraint condition, which may be used to avoid selecting phrases that contain one another:
Formula (4) below is another constraint condition, which may be used to limit the length of a single sentence:
$$\forall i,\ \sum_i \alpha_i \le K, \qquad (4)$$
Formula (5) below is another constraint condition, which may be used to limit the total length of the summary:
$$\sum_i \{ l(N_i) \cdot \alpha_i \} + \sum_j \{ l(V_j) \cdot \beta_j \} \le L, \qquad (5)$$
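Formula (6) or (7) below is another constraint condition, which may be used to limit word repetition:
$$\alpha_{ij} - \alpha_i \le 0, \qquad \alpha_{ij} - \alpha_j \le 0, \qquad \alpha_i + \alpha_j - \alpha_{ij} \le 1. \qquad (6)$$
$$\beta_{ij} - \beta_i \le 0, \qquad \beta_{ij} - \beta_j \le 0, \qquad \beta_i + \beta_j - \beta_{ij} \le 1. \qquad (7)$$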
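The role of the three inequalities in formula (6) can be checked by enumeration. The sketch below assumes that the candidate weights and contact weights are binary and that the inequalities read $\alpha_{ij} - \alpha_i \le 0$, $\alpha_{ij} - \alpha_j \le 0$ and $\alpha_i + \alpha_j - \alpha_{ij} \le 1$; the function name is hypothetical. Under these assumptions the only feasible contact weight is the logical AND of the two candidate weights, so a pair contributes its redundancy only when both phrases are selected.

```python
from itertools import product


def satisfies_pair_constraints(a_i, a_j, a_ij):
    """Check the three inequalities assumed for formula (6) on one phrase pair."""
    return a_ij - a_i <= 0 and a_ij - a_j <= 0 and a_i + a_j - a_ij <= 1


# For every binary choice of the two candidate weights, list the contact
# weights that remain feasible.
feasible = {
    (a_i, a_j): [
        a_ij for a_ij in (0, 1) if satisfies_pair_constraints(a_i, a_j, a_ij)
    ]
    for a_i, a_j in product((0, 1), repeat=2)
}
```

The same check applies to formula (7) with the verb-phrase weights in place of the noun-phrase weights.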
Formula (8) below is another constraint condition, which may be used to avoid pronouns:
if $N_i$ is a pronoun, then $\alpha_i = 0$. (8)
Formula (9) below is another constraint condition, which may be used to avoid overly short sentences:
if $l(S) < M$ and $V_i \in S$, then $\beta_i = 0$. (9)
It can be understood that the constraint conditions above are only some examples of the preset constraint condition, and further constraint conditions are possible. Depending on the needs of the application, these constraint conditions may be used alone or in combination; this is not limited here.
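To show how such inequalities act on a candidate selection, the following sketch checks constraints (1), (2), (4), (5) and (8) for given 0/1 values. It is an illustration only: the function name, the argument layout, and the reading of $\tilde{\gamma}_{ij}$ as a 0/1 indicator that noun phrase i and verb phrase j are paired in the summary are assumptions, and a real system would hand the constraints to a solver rather than test selections one at a time.

```python
def is_feasible(alpha, beta, gamma, np_len, vp_len, is_pronoun, K, L):
    """Return True if a 0/1 selection satisfies constraints (1), (2), (4), (5), (8).

    alpha[i], beta[j]: candidate weights of noun phrase i / verb phrase j.
    gamma[i][j]: assumed 0/1 indicator that noun phrase i is paired with
                 verb phrase j.
    np_len[i], vp_len[j]: phrase lengths l(N_i) and l(V_j).
    """
    n, m = len(alpha), len(beta)
    for i in range(n):
        row = gamma[i]
        # (1) a pairing needs its noun phrase, and a selected noun phrase
        #     needs at least one pairing.
        if any(row[j] > alpha[i] for j in range(m)) or sum(row) < alpha[i]:
            return False
        # (8) a pronoun is never selected.
        if is_pronoun[i] and alpha[i] != 0:
            return False
    # (2) the pairings of verb phrase j sum to its candidate weight.
    for j in range(m):
        if sum(gamma[i][j] for i in range(n)) != beta[j]:
            return False
    # (4) upper bound K on the sum of noun-phrase candidate weights.
    if sum(alpha) > K:
        return False
    # (5) upper bound L on the total length of the selected phrases.
    total = sum(l * a for l, a in zip(np_len, alpha))
    total += sum(l * b for l, b in zip(vp_len, beta))
    return total <= L
```

For example, selecting noun phrase 0 together with verb phrase 0 is feasible when their combined length fits within L, while selecting a noun phrase with no paired verb phrase violates constraint (1).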
206. The summary generation device arranges the summary phrases according to the order in which each summary phrase in the summary phrase set appears in the sentences of the multiple documents, obtaining summary sentences;
After obtaining the summary phrase set, the summary generation device sorts the summary phrases according to the order in which each of them appears in the sentences of the multiple documents, obtaining summary sentences.
It should be noted that, when summary phrases are arranged into summary sentences, noun phrases and verb phrases form phrase groups. The order of the noun phrase and the verb phrase within a phrase group is determined by the order in which they appear in the document. The order among multiple phrase groups is determined by the order in which the verb phrases of each group appear in the documents, specifically: 1) verb phrases from the same document are sorted by their natural order in that document; 2) verb phrases from different documents are sorted by the timestamp of the source document in which each verb phrase appears.
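The two ordering rules can be expressed as a single sort key. The sketch below is a minimal illustration; the `VerbPhrase` record, its field names, and the tie-break on document identifier when two documents share a timestamp are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbPhrase:
    text: str
    doc_id: str
    doc_timestamp: float  # timestamp of the source document
    position: int         # natural order of the phrase inside its document


def order_verb_phrases(phrases):
    """Sort by source-document timestamp, then by position within a document."""
    return sorted(phrases, key=lambda p: (p.doc_timestamp, p.doc_id, p.position))
```

Phrases from the same document share a timestamp and identifier, so they fall back to their natural order (rule 1), while phrases from different documents are separated by timestamp (rule 2).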
207. For a summary sentence that contains multiple verb phrases, a conjunction is added between the verb phrases of that summary sentence;
After obtaining the summary sentences, the summary generation device adds a conjunction between the verb phrases of any summary sentence that contains multiple verb phrases, in order to improve the readability and fluency of the summary sentence.
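A minimal sketch of this step follows. The choice of "and" as the default conjunction and the function name are assumptions; the text does not specify which conjunction is inserted.

```python
def join_verb_phrases(noun_phrase, verb_phrases, conjunction="and"):
    """Build one summary sentence from a noun phrase and its ordered verb phrases."""
    if not verb_phrases:
        raise ValueError("a summary sentence needs at least one verb phrase")
    # Place the conjunction between every pair of adjacent verb phrases.
    predicate = f" {conjunction} ".join(verb_phrases)
    return f"{noun_phrase} {predicate}."
```

A sentence with a single verb phrase is returned unchanged apart from the final period, since there is nothing to join.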
208. The summary sentences are arranged according to the earliest time at which their verb phrases appear in the multiple documents, obtaining the summary of the multiple documents.
The summary generation device arranges the summary sentences according to the earliest time at which their verb phrases appear in the multiple documents, obtaining the summary of the multiple documents.
The process of arranging the summary sentences according to the earliest time at which their verb phrases appear in the multiple documents may be as follows:
1. The timestamp of each summary sentence is defined as the timestamp of the earliest verb phrase in the sentence;
2. The summary sentences are sorted according to their timestamps.
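The two steps above can be sketched as follows. The input layout, a list of pairs holding the sentence text and the timestamps of its verb phrases, is an assumption made for the example.

```python
def order_summary_sentences(sentences):
    """Order summary sentences by the timestamp of their earliest verb phrase.

    sentences: list of (text, [timestamps of the verb phrases in the sentence]).
    """
    # Step 1: the sentence timestamp is that of its earliest verb phrase.
    stamped = [(min(vp_times), text) for text, vp_times in sentences]
    # Step 2: sort the sentences by that timestamp (stable for ties).
    stamped.sort(key=lambda pair: pair[0])
    return [text for _, text in stamped]
```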
It can be understood that, in steps 206 to 208, the phrases in the summary phrase set are arranged mainly according to the order of the verb phrases in the sentences of the original documents and the time at which they appear. In practice, other arrangement modes may also be used, for example other statistics-based arrangement modes, or arrangement modes based on combination rules or combination templates; this is not limited here.
In the embodiment of the present invention, the summary generation device ensures the coverage of important information by the multi-document summary through the constraints on phrase importance and candidate weight in the preset constraint condition, and reduces the redundancy of the summary sentences through the constraints on similarity, compatibility or contact weight. Furthermore, adding conjunctions to summary sentences that contain multiple verb phrases ensures the fluency of the newly combined sentences.
The effect of forming a multi-document summary at phrase granularity in the embodiment of the present invention is compared below with the effect of forming one at sentence granularity in the prior art:
To evaluate the important-information coverage of the present invention, we used the Pyramid method; the evaluation results are shown in Table 1 below:
Table 1
Here, the Pyramid method is a method for evaluating summarization results proposed in 2004 by Ani Nenkova and Rebecca J. Passonneau in the paper "Evaluating content selection in summarization", and it is widely used in the field; the TAC competition is an authoritative competition in the field for evaluating summarization results. Table 1 compares the scores obtained when the Pyramid method is used to evaluate the importance coverage of the summaries produced by the top three teams in the TAC 2011 competition and of the summaries produced by the method of the embodiment of the present invention. The first column of Table 1 gives each team's number in the TAC competition; the top three teams are numbers 22, 43 and 17, and "the present invention" denotes the scheme of the embodiment of the present invention. The second and third columns give the scores obtained by automatic evaluation of the important-information coverage of each team's summaries when the baseline value in the Pyramid evaluation model is 0.6 and 0.625 respectively, and the final column gives each team's ranking in the TAC 2011 competition. As Table 1 shows, the summaries obtained with the scheme of the embodiment of the present invention are clearly higher in important-information coverage than those obtained by the other teams, even though those teams took the top three places in the TAC competition.
While achieving the high coverage above, the present invention also has lower redundancy. Specifically, summary lengths are compared in Table 2 below:
Table 2
| Team number in the TAC 2011 competition | Summary length |
| --- | --- |
| The present invention | 94.3 |
| 22 | 99.4 |
| 43 | 99.8 |
| 17 | 99.6 |
As Table 2 shows, the summaries obtained with the scheme of the embodiment of the present invention are also clearly lower in redundancy than the summaries of the top three teams in the TAC competition.
The proportions of the three types of sentence in the summaries generated by the present invention, namely new sentences, compressed sentences and original sentences, are shown in Table 3 below:
Table 3
| Sentence type | Proportion |
| --- | --- |
| New sentence | 33.0% |
| Compressed sentence | 44.3% |
| Original sentence | 22.7% |
As Table 3 shows, the summaries obtained with the scheme of the embodiment of the present invention contain new sentences, compressed sentences, and original sentences from the source documents. The present invention is therefore partly compatible with existing methods while also offering a new option.
We also evaluated sentence readability manually. The sentence scores are defined as follows:
3 points: the newly generated sentence appropriately merges the relevant facts about the same NP and has good fluency and readability.
2 points: the newly generated sentence correctly merges the relevant facts about the same NP and is fairly readable, but its fluency is only average.
1 point: the newly generated sentence correctly merges the relevant facts about the same NP, but the reader needs some effort to read and understand it.
0 points: the newly generated sentence contains an incorrect fact as a result of phrase merging.
We selected 20 newly generated summary sentences for evaluation; the mean score was 2.65, so the readability of the new sentences is sufficiently good.
The summary generation device in the embodiment of the present invention is described below. Referring to Fig. 4, the summary generation device in the embodiment of the present invention comprises:
A deconstruction module 401, configured to deconstruct the sentence set of multiple documents into a phrase pool;
A first acquisition module 402, configured to obtain the features and relations of each phrase in the phrase pool produced by the deconstruction module 401, where the features represent the characteristics of each phrase itself and the relations represent the relations between each phrase and the other phrases;
A selection module 403, configured to select from the phrase pool, according to the features and relations of each phrase obtained by the first acquisition module 402, a phrase set that satisfies the preset constraint condition as the summary phrase set;
A combination module 404, configured to combine the summary phrase set selected by the selection module 403 into summary sentences according to a preset combination mode, generating the summary of the multiple documents.
In the embodiment of the present invention, the deconstruction module 401 first deconstructs the sentence set of multiple documents into a phrase pool, the first acquisition module 402 obtains the features and relations of each phrase in the phrase pool, the selection module 403 selects from the phrase pool, according to these features and relations, a phrase set that satisfies the preset constraint condition as the summary phrase set, and the combination module 404 then combines the selected summary phrase set into summary sentences according to the preset combination mode, generating the multi-document summary. Because phrases are selected according to their features and relations, and the phrase is used as the basic unit for judging importance and redundancy, the judgment is more fine-grained. By selecting and combining phrases, the generated multi-document summary reduces redundancy while maintaining high coverage of the important information in the documents.
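The data flow between the four modules can be sketched as a simple pipeline. This is only an outline of the module order described above: the function name, the type aliases, and the choice to pass each module in as a callable are assumptions, and the internals of each module are deliberately left open.

```python
from typing import Callable, Dict, List, Sequence

Phrase = str


def generate_summary(
    sentences: Sequence[str],
    deconstruct: Callable[[Sequence[str]], List[Phrase]],
    acquire: Callable[[List[Phrase]], Dict],
    select: Callable[[List[Phrase], Dict], List[Phrase]],
    combine: Callable[[List[Phrase]], str],
) -> str:
    """Chain the four modules in the order the embodiment describes."""
    pool = deconstruct(sentences)   # deconstruction module 401
    info = acquire(pool)            # first acquisition module 402
    chosen = select(pool, info)     # selection module 403
    return combine(chosen)          # combination module 404
```

Keeping the modules as interchangeable callables mirrors the statement that the constraint condition and the combination mode are presets that can be varied.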
The summary generation device in the embodiment of the present invention is described in detail below. Referring to Fig. 5, the summary generation device in the embodiment of the present invention specifically comprises:
A deconstruction module 501, configured to deconstruct the sentence set of multiple documents into a phrase pool;
A first acquisition module 502, configured to obtain the features and relations of each phrase in the phrase pool produced by the deconstruction module 501, where the features represent the characteristics of each phrase itself and the relations represent the relations between each phrase and the other phrases;
A selection module 503, configured to select from the phrase pool, according to the features and relations of each phrase obtained by the first acquisition module 502, a phrase set that satisfies the preset constraint condition as the summary phrase set;
A combination module 504, configured to combine the summary phrase set selected by the selection module 503 into summary sentences according to a preset combination mode, generating the summary of the multiple documents;
In this embodiment, the first acquisition module 502 is specifically configured to obtain the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, where the importance measures how significant the concept or information represented by a phrase is in expressing the semantics of the documents, the compatibility measures the likelihood that phrases collocate and appear in the same sentence, and the similarity measures the degree of semantic similarity between phrases;
The selection module 503 is specifically configured to apply a method for solving a planning problem and to select the summary phrase set from the phrase pool such that the preset constraint condition is satisfied to the greatest extent, where the planning problem is given by the preset constraint condition, and the preset constraint condition includes a constraint on phrase importance, a constraint on phrase compatibility and a constraint on phrase similarity;
Optionally, the preset constraint condition may also include a constraint on candidate weight;
The summary generation device may further comprise:
A second acquisition module 505, configured to solve for the extremum of a given objective function according to the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, obtaining the candidate weight of each phrase in the phrase pool, where the objective function is formed by combining the importance of each phrase with the compatibility and similarity between phrases and describes the amount of information and the redundancy of a combination; when the objective function takes its extremum, the amount of information is largest and the redundancy is smallest;
Optionally, the deconstruction module 501 may specifically comprise:
A construction unit 5011, configured to build syntax trees from the sentence set of the multiple documents using a semantic analysis tool;
An extraction unit 5012, configured to extract all phrases on the syntax trees built by the construction unit 5011 to form the phrase pool;
Optionally, the combination module 504 may specifically comprise:
A phrase arrangement unit 5041, configured to arrange the summary phrases according to the order in which each summary phrase in the summary phrase set selected by the selection module 503 appears in the sentences of the multiple documents, obtaining summary sentences;
A sentence arrangement unit 5042, configured to arrange the summary sentences obtained by the phrase arrangement unit 5041 according to the earliest time at which their verb phrases appear in the multiple documents, obtaining the summary of the multiple documents;
Optionally, the combination module 504 may further comprise:
An adding unit 5043, configured to add, for a summary sentence that contains multiple verb phrases, a conjunction between the verb phrases of that summary sentence.
In the embodiment of the present invention, the selection module 503 ensures the coverage of important information by the multi-document summary through the constraints on phrase importance and candidate weight in the preset constraint condition, and reduces the redundancy of the summary sentences through the constraints on similarity, compatibility or contact weight. Furthermore, the adding unit 5043 adds conjunctions to summary sentences that contain multiple verb phrases, which ensures the fluency of the newly combined sentences.
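The work of the construction unit 5011 and the extraction unit 5012 can be illustrated on a syntax tree that has already been built. The sketch below assumes a tree encoded as nested tuples of the form (label, child, ...), with plain strings as leaf tokens, and collects only subtrees labelled NP or VP; the encoding, the label names and the function name are assumptions, and the parser that produces the tree is outside the example.

```python
def extract_phrases(tree, labels=("NP", "VP")):
    """Collect every subtree whose label is in `labels`.

    A tree is (label, child, child, ...); a leaf is a plain string token.
    Returns (label, text) pairs in pre-order.
    """
    def leaves(node):
        if isinstance(node, str):
            return [node]
        return [tok for child in node[1:] for tok in leaves(child)]

    found = []

    def walk(node):
        if isinstance(node, str):
            return
        if node[0] in labels:
            found.append((node[0], " ".join(leaves(node))))
        for child in node[1:]:
            walk(child)

    walk(tree)
    return found
```

Nested phrases are all returned, so a noun phrase inside a verb phrase enters the pool alongside the verb phrase that contains it.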
The summary generation device in the embodiment of the present invention has been described above from the perspective of modular functional entities, and is described below from the perspective of hardware processing. Referring to Fig. 6, another embodiment of the summary generation device 600 in the embodiment of the present invention comprises:
An input device 601, an output device 602, a processor 603 and a memory 604 (the number of processors 603 in the summary generation device 600 may be one or more; one processor 603 is taken as an example in Fig. 6). In some embodiments of the present invention, the input device 601, the output device 602, the processor 603 and the memory 604 may be connected by a bus or in another manner; connection by a bus is taken as an example in Fig. 6.
By calling the operation instructions stored in the memory 604, the processor 603 is configured to perform the following steps:
Deconstructing the sentence set of multiple documents into a phrase pool;
Obtaining the features and relations of each phrase in the phrase pool, where the features represent the characteristics of each phrase itself and the relations represent the relations between each phrase and the other phrases;
Selecting from the phrase pool, according to the features and relations of each phrase, a phrase set that satisfies the preset constraint condition as the summary phrase set;
Combining the summary phrase set into summary sentences according to a preset combination mode, generating the summary of the multiple documents;
In some embodiments of the present invention, the processor 603 specifically performs the following step:
Obtaining the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, where the importance measures how significant the concept or information represented by a phrase is in expressing the semantics of the documents, the compatibility measures the likelihood that phrases collocate and appear in the same sentence, and the similarity measures the degree of semantic similarity between phrases;
In some embodiments of the present invention, the processor 603 specifically performs the following step:
Applying a method for solving a planning problem and selecting the summary phrase set from the phrase pool such that the preset constraint condition is satisfied to the greatest extent, where the planning problem is given by the preset constraint condition, and the preset constraint condition includes a constraint on phrase importance, a constraint on phrase compatibility and a constraint on phrase similarity;
In some embodiments of the present invention, the preset constraint condition also includes a constraint on phrase candidate weight, and the processor 603 further performs the following step:
Solving for the extremum of a given objective function according to the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, obtaining the candidate weight of each phrase in the phrase pool, where the objective function is formed by combining the importance of each phrase with the compatibility and similarity between phrases and describes the amount of information and the redundancy of a combination; when the objective function takes its extremum, the amount of information is largest and the redundancy is smallest;
In some embodiments of the present invention, the processor 603 specifically performs the following steps:
Building syntax trees from the sentence set of the multiple documents using a semantic analysis tool;
Extracting all phrases on the syntax trees to form the phrase pool;
In some embodiments of the present invention, the processor 603 specifically performs the following steps:
Arranging the summary phrases according to the order in which each summary phrase in the summary phrase set appears in the sentences of the multiple documents, obtaining summary sentences;
Arranging the summary sentences according to the earliest time at which their verb phrases appear in the multiple documents, obtaining the summary of the multiple documents;
In some embodiments of the present invention, the processor 603 further performs the following step:
For a summary sentence that contains multiple verb phrases, adding a conjunction between the verb phrases of that summary sentence.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A method for generating a multi-document summary, characterized by comprising:
A summary generation device deconstructing the sentence set of multiple documents into a phrase pool;
The summary generation device obtaining the features and relations of each phrase in the phrase pool, where the features represent the characteristics of each phrase itself and the relations represent the relations between each phrase and the other phrases;
The summary generation device selecting from the phrase pool, according to the features and relations of each phrase, a phrase set that satisfies a preset constraint condition as the summary phrase set;
The summary generation device combining the summary phrase set into summary sentences according to a preset combination mode, generating the summary of the multiple documents.
2. The method according to claim 1, characterized in that the summary generation device obtaining the features and relations of each phrase in the phrase pool specifically comprises:
The summary generation device obtaining the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, where the importance measures how significant the concept or information represented by a phrase is in expressing the semantics of the documents, the compatibility measures the likelihood that phrases collocate and appear in the same sentence, and the similarity measures the degree of semantic similarity between phrases.
3. The method according to claim 2, characterized in that the summary generation device selecting from the phrase pool, according to the features and relations of each phrase, a phrase set that satisfies the preset constraint condition as the summary phrase set specifically comprises:
The summary generation device applying a method for solving a planning problem and selecting the summary phrase set from the phrase pool such that the preset constraint condition is satisfied to the greatest extent, where the planning problem is given by the preset constraint condition, and the preset constraint condition includes a constraint on phrase importance, a constraint on phrase compatibility and a constraint on phrase similarity.
4. The method according to claim 3, characterized in that:
The preset constraint condition also includes a constraint on phrase candidate weight;
The method further comprises:
The summary generation device solving for the extremum of a given objective function according to the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, obtaining the candidate weight of each phrase in the phrase pool, where the objective function is formed by combining the importance of each phrase with the compatibility and similarity between phrases and describes the amount of information and the redundancy of a combination; when the objective function takes its extremum, the amount of information is largest and the redundancy is smallest.
5. The method according to any one of claims 1 to 4, characterized in that deconstructing the sentence set of multiple documents into a phrase pool specifically comprises:
Building syntax trees from the sentence set of the multiple documents using a semantic analysis tool;
Extracting all phrases on the syntax trees to form the phrase pool.
6. The method according to any one of claims 1 to 4, characterized in that the summary generation device combining the summary phrase set into summary sentences according to the preset combination mode and generating the summary of the multiple documents specifically comprises:
The summary generation device arranging the summary phrases according to the order in which each summary phrase in the summary phrase set appears in the sentences of the multiple documents, obtaining summary sentences;
Arranging the summary sentences according to the earliest time at which their verb phrases appear in the multiple documents, obtaining the summary of the multiple documents.
7. The method according to claim 6, characterized in that, before the step of arranging the summary sentences according to the earliest time at which their verb phrases appear in the multiple documents to obtain the summary of the multiple documents, the method further comprises:
For a summary sentence that contains multiple verb phrases, adding a conjunction between the verb phrases of that summary sentence.
8. a summarization generation device, is characterized in that, comprising:
Destructing module, for being phrase pond by the sentence set destructing of many sections of documents;
First acquisition module, for obtaining the characteristic sum relation of each phrase in phrase pond that the destructing of described destructing module obtains, described feature is for representing the characteristic of described each phrase self, and described relation is for representing the relation between described each phrase and other phrases;
Choose module, for the characteristic sum relation of each phrase according to described first acquisition module acquisition, choosing the phrase book cooperation meeting preset constraint condition from described phrase pond is the set of summary phrase;
Composite module, for choosing that summary phrase book that module selects is charge-coupled is combined into summary sentence according to pre-set combinations mode by described, generates the summary of described many sections of documents.
9. to remove the summarization generation device described in 8 according to right, it is characterized in that, described first acquisition module specifically for, obtain the compatibility between the importance degree of each phrase in described phrase pond and each phrase and similarity, the significance level that described importance degree embodies in statement document semanteme for weighing concept representated by phrase or information, described compatibility is for weighing the possibility being formed between phrase and arrange in pairs or groups and appear in same sentence, and described similarity is for weighing the degree of semantic similitude between phrase.
10. The summary generation device according to claim 9, characterized in that the selection module is specifically configured to apply a method for solving a planning problem to select the summary phrase set from the phrase pool such that the preset constraint conditions are satisfied to the greatest extent, wherein the planning problem is given by the preset constraint conditions, and the preset constraint conditions comprise a constraint on phrase importance, a constraint on phrase compatibility, and a constraint on phrase similarity.
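The selection in claim 10 can be illustrated with a small constrained search. The sketch below is an assumption, not the claimed solver: it reads the planning problem as an optimization problem and uses brute-force enumeration in place of a dedicated solver. The thresholds `max_sim` and `min_compat` and the size budget `max_phrases` are hypothetical parameters.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def choose_summary_phrases(
    pool: List[str],
    importance: Dict[str, float],
    compat: Dict[FrozenSet[str], float],
    sim: Dict[FrozenSet[str], float],
    max_phrases: int = 3,
    max_sim: float = 0.5,
    min_compat: float = 0.5,
) -> List[str]:
    """Maximize total importance subject to similarity and compatibility limits."""
    best, best_score = (), 0.0
    for k in range(1, max_phrases + 1):
        for subset in combinations(pool, k):
            pairs = [frozenset(p) for p in combinations(subset, 2)]
            # Similarity constraint: no two chosen phrases may be near-duplicates.
            if any(sim.get(p, 0.0) > max_sim for p in pairs):
                continue
            # Compatibility constraint: each chosen phrase needs a compatible partner.
            if k > 1 and not all(
                any(compat.get(frozenset((a, b)), 0.0) >= min_compat
                    for b in subset if b != a)
                for a in subset
            ):
                continue
            # Importance objective: prefer the most informative feasible subset.
            score = sum(importance[p] for p in subset)
            if score > best_score:
                best, best_score = subset, score
    return list(best)
```

Enumeration is exponential in the pool size, so it is only suitable for toy inputs; it is used here because it makes the three constraints easy to see.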
11. The summary generation device according to claim 10, characterized in that the preset constraint conditions further comprise a constraint on phrase candidate weight;
The summary generation device further comprises:
A second acquisition module, configured to solve for the extreme value of a given objective function according to the importance of each phrase in the phrase pool and the compatibility and similarity between phrases, so as to obtain the candidate weight of each phrase in the phrase pool, wherein the objective function is formed by combining the importance of each phrase with the compatibility and similarity between phrases, the objective function describes the information content and redundancy of a combination, and when the objective function takes its extreme value, the information content is maximal and the redundancy is minimal.
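As an illustration of how candidate weights might be obtained from an objective function, the sketch below assumes one possible form: total weighted importance, plus a reward for compatible pairs, minus a penalty for similar pairs, with each weight confined to [0, 1]. This form, the coefficients `alpha` and `beta`, and the use of projected gradient ascent are all assumptions; the claim only states that the objective combines importance, compatibility and similarity.

```python
from typing import List


def candidate_weights(
    importance: List[float],
    compat: List[List[float]],
    sim: List[List[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
    step: float = 0.1,
    iters: int = 200,
) -> List[float]:
    """Projected gradient ascent on the assumed objective

        f(w) = sum_i I_i w_i + sum_{i<j} (alpha*C_ij - beta*S_ij) w_i w_j

    with every weight w_i kept in [0, 1]. `compat` and `sim` are symmetric.
    """
    n = len(importance)
    w = [0.5] * n
    for _ in range(iters):
        # Partial derivative of f with respect to each w_i.
        grad = [
            importance[i]
            + sum((alpha * compat[i][j] - beta * sim[i][j]) * w[j]
                  for j in range(n) if j != i)
            for i in range(n)
        ]
        # Ascent step followed by projection back onto [0, 1].
        w = [min(1.0, max(0.0, w[i] + step * grad[i])) for i in range(n)]
    return w
```

In this toy form, a phrase that is highly similar to a more important phrase is pushed toward weight 0, which mirrors the idea of maximizing information while minimizing redundancy.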
12. The summary generation device according to any one of claims 8 to 11, characterized in that the deconstruction module specifically comprises:
A construction unit, configured to build syntax trees from the sentence set of the multiple documents using a semantic analysis tool;
An extraction unit, configured to extract all phrases on the syntax trees built by the construction unit to form the phrase pool.
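The extraction step of claim 12 can be sketched on a hand-built constituency tree. The nested-tuple tree format and the restriction to `NP` and `VP` labels are assumptions made for illustration; in practice the tree would come from whatever parsing tool the construction unit uses.

```python
from typing import List, Tuple, Union

# A tree is either a leaf word (str) or a tuple (label, child, child, ...).
Tree = Union[str, Tuple]


def leaves(tree: Tree) -> List[str]:
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def extract_phrases(tree: Tree, labels: Tuple[str, ...] = ("NP", "VP")) -> List[Tuple[str, str]]:
    """Collect (label, text) for every node whose label is a phrase label."""
    if isinstance(tree, str):
        return []
    found: List[Tuple[str, str]] = []
    if tree[0] in labels:
        found.append((tree[0], " ".join(leaves(tree))))
    for child in tree[1:]:
        found.extend(extract_phrases(child, labels))
    return found


# Hypothetical parse of "the storm hit the coast".
example_tree = (
    "S",
    ("NP", ("DT", "the"), ("NN", "storm")),
    ("VP", ("VBD", "hit"), ("NP", ("DT", "the"), ("NN", "coast"))),
)
phrase_pool = extract_phrases(example_tree)
```

Nested phrases are kept as separate pool entries, so a noun phrase inside a verb phrase appears both on its own and as part of the larger phrase.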
13. The summary generation device according to any one of claims 8 to 11, characterized in that the combination module specifically comprises:
A phrase arrangement unit, configured to arrange the summary phrases in the summary phrase set selected by the selection module according to the order of each summary phrase in the sentences of the multiple documents, to obtain summary sentences;
A sentence arrangement module, configured to arrange the summary sentences obtained by the phrase arrangement unit according to the earliest time at which their verb phrases appear in the multiple documents, to obtain the summary of the multiple documents.
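The two ordering steps of claim 13 can be sketched as two sorts. The `SummaryPhrase` record and its fields are hypothetical; in particular, the sketch assumes that phrases are already grouped into sentences, that each group contains at least one verb phrase, and that the time of appearance is available as a date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class SummaryPhrase:
    text: str
    kind: str         # "NP" or "VP"
    position: int     # position of the phrase within its source sentence
    first_seen: date  # earliest date on which the phrase appears in the documents


def arrange_phrases(phrases: List[SummaryPhrase]) -> List[SummaryPhrase]:
    """Order phrases as they appeared in the source sentence."""
    return sorted(phrases, key=lambda p: p.position)


def earliest_verb_time(sentence: List[SummaryPhrase]) -> date:
    """Earliest appearance among the sentence's verb phrases (needs at least one VP)."""
    return min(p.first_seen for p in sentence if p.kind == "VP")


def arrange_summary(groups: List[List[SummaryPhrase]]) -> List[str]:
    """Order phrases inside each sentence, then order sentences chronologically."""
    sentences = [arrange_phrases(g) for g in groups]
    sentences.sort(key=earliest_verb_time)
    return [" ".join(p.text for p in s) for s in sentences]
```

Sorting sentences by the earliest verb phrase time yields a chronological summary even when the input groups arrive in arbitrary order.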
14. The summary generation device according to claim 13, characterized in that the combination module further comprises:
An adding unit, configured to, for each summary sentence comprising multiple verb phrases, add a conjunction between the verb phrases of that summary sentence.
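The conjunction step can be sketched in a few lines. The function name `add_conjunctions`, the fixed conjunction "and", and the assumption that a summary sentence is one noun phrase followed by its verb phrases are all illustrative choices, not details given in the claim.

```python
from typing import List


def add_conjunctions(noun_phrase: str, verb_phrases: List[str], conjunction: str = "and") -> str:
    """Join the verb phrases with a conjunction and attach them to the noun phrase."""
    joined = f" {conjunction} ".join(verb_phrases)
    return f"{noun_phrase} {joined}".strip()
```

A sentence with a single verb phrase passes through unchanged, since there is nothing to join.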
CN201410469449.2A 2014-09-15 2014-09-15 A kind of method and apparatus generating multi-document summary Active CN105488021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410469449.2A CN105488021B (en) 2014-09-15 2014-09-15 A kind of method and apparatus generating multi-document summary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410469449.2A CN105488021B (en) 2014-09-15 2014-09-15 A kind of method and apparatus generating multi-document summary

Publications (2)

Publication Number Publication Date
CN105488021A true CN105488021A (en) 2016-04-13
CN105488021B CN105488021B (en) 2018-09-28

Family

ID=55675005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410469449.2A Active CN105488021B (en) 2014-09-15 2014-09-15 A kind of method and apparatus generating multi-document summary

Country Status (1)

Country Link
CN (1) CN105488021B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912700A (en) * 2016-04-26 2016-08-31 上海电机学院 Abstract generation method based on TMPP (Topic Model based on Phrase Parameter)
CN106844341A (en) * 2017-01-10 2017-06-13 北京百度网讯科技有限公司 News in brief extracting method and device based on artificial intelligence
CN107169049A (en) * 2017-04-25 2017-09-15 腾讯科技(深圳)有限公司 The label information generation method and device of application
CN107391460A (en) * 2017-07-04 2017-11-24 北京航空航天大学 A kind of industry security theme multi-document summary automatic generation method and device
CN108280112A (en) * 2017-06-22 2018-07-13 腾讯科技(深圳)有限公司 Abstraction generating method, device and computer equipment
CN108733682A (en) * 2017-04-14 2018-11-02 华为技术有限公司 A kind of method and device generating multi-document summary
CN109657053A (en) * 2018-12-13 2019-04-19 北京百度网讯科技有限公司 More text snippet generation methods, device, server and storage medium
CN110162618A (en) * 2019-02-22 2019-08-23 北京捷风数据技术有限公司 A kind of the text summaries generation method and device of non-control corpus
CN110705273A (en) * 2019-09-02 2020-01-17 腾讯科技(深圳)有限公司 Information processing method and device based on neural network, medium and electronic equipment
US10929452B2 (en) 2017-05-23 2021-02-23 Huawei Technologies Co., Ltd. Multi-document summary generation method and apparatus, and terminal
CN112836016A (en) * 2021-02-05 2021-05-25 北京字跳网络技术有限公司 Conference summary generation method, device, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398814A (en) * 2007-09-26 2009-04-01 北京大学 Method and system for simultaneously abstracting document summarization and key words
US20090300486A1 (en) * 2008-05-28 2009-12-03 Nec Laboratories America, Inc. Multiple-document summarization using document clustering

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DRAGOMIR R. RADEV 等: "Generating Natural Language Summaries from Multiple On-Line Sources", 《COMPUTATIONAL LINGUISTICS》 *
张永刚: "基于统计的多文档关键短语和文摘抽取研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
徐永东: "多文档自动文摘关键技术研究", 《中国博士学位论文全文数据库 信息科技辑》 *
胡柏: "中文短语摘要的研究和系统开发", 《中国优秀博硕士学位论文全文数据库 (硕士) 信息科技辑》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912700A (en) * 2016-04-26 2016-08-31 上海电机学院 Abstract generation method based on TMPP (Topic Model based on Phrase Parameter)
CN106844341B (en) * 2017-01-10 2020-04-07 北京百度网讯科技有限公司 Artificial intelligence-based news abstract extraction method and device
CN106844341A (en) * 2017-01-10 2017-06-13 北京百度网讯科技有限公司 News in brief extracting method and device based on artificial intelligence
CN108733682A (en) * 2017-04-14 2018-11-02 华为技术有限公司 A kind of method and device generating multi-document summary
CN107169049A (en) * 2017-04-25 2017-09-15 腾讯科技(深圳)有限公司 The label information generation method and device of application
US10929452B2 (en) 2017-05-23 2021-02-23 Huawei Technologies Co., Ltd. Multi-document summary generation method and apparatus, and terminal
CN108280112A (en) * 2017-06-22 2018-07-13 腾讯科技(深圳)有限公司 Abstraction generating method, device and computer equipment
US11409960B2 (en) 2017-06-22 2022-08-09 Tencent Technology (Shenzhen) Company Limited Summary generation method, apparatus, computer device, and storage medium
CN107391460A (en) * 2017-07-04 2017-11-24 北京航空航天大学 A kind of industry security theme multi-document summary automatic generation method and device
CN109657053B (en) * 2018-12-13 2021-09-14 北京百度网讯科技有限公司 Multi-text abstract generation method, device, server and storage medium
CN109657053A (en) * 2018-12-13 2019-04-19 北京百度网讯科技有限公司 More text snippet generation methods, device, server and storage medium
CN110162618A (en) * 2019-02-22 2019-08-23 北京捷风数据技术有限公司 A kind of the text summaries generation method and device of non-control corpus
CN110162618B (en) * 2019-02-22 2021-09-17 北京捷风数据技术有限公司 Text summary generation method and device of non-contrast corpus
CN110705273A (en) * 2019-09-02 2020-01-17 腾讯科技(深圳)有限公司 Information processing method and device based on neural network, medium and electronic equipment
CN112836016A (en) * 2021-02-05 2021-05-25 北京字跳网络技术有限公司 Conference summary generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN105488021B (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN105488021A (en) Method and device generating multi-file summary
Mesgar et al. A neural local coherence model for text quality assessment
US9959776B1 (en) System and method for automated scoring of texual responses to picture-based items
Burstein et al. The e-rater® automated essay scoring system
Li et al. A clustering-based approach on sentiment analysis
CN109299865B (en) Psychological evaluation system and method based on semantic analysis and information data processing terminal
Agirre et al. CLEF 2008: Ad hoc track overview
Savoy Authorship attribution: A comparative study of three text corpora and three languages
KR20080021017A (en) Comparing text based documents
JP2010015571A (en) Automated evaluation of overly repetitive word use in essay
Lalata et al. A sentiment analysis model for faculty comment evaluation using ensemble machine learning algorithms
Somasundaran et al. Evaluating argumentative and narrative essays using graphs
Chen et al. Improve the detection of improperly used Chinese characters in students’ essays with error model
O'Rourke et al. Visualizing Topic Flow in Students' Essays.
Kumar et al. Discovering the predictive power of five baseline writing competences
François Combining a statistical language model with logistic regression to predict the lexical and syntactic difficulty of texts for FFL
Curtotti et al. A right to access implies a right to know: An open online platform for research on the readability of law
JP6942759B2 (en) Information processing equipment, programs and information processing methods
Putri et al. Software feature extraction using infrequent feature extraction
Doewes et al. Individual Fairness Evaluation for Automated Essay Scoring System.
Rubtsova Automatic term extraction for sentiment classification of dynamically updated text collections into three classes
CN109325096A (en) A kind of knowledge resource search system of knowledge based resource classification
CN116362208A (en) Text processing method, apparatus, device and computer readable storage medium
Richter et al. Tracking the evolution of written language competence: an NLP–based approach
Sinha et al. Influence of target reader background and text features on text readability in bangla: A computational approach

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant