CN101751425A - Method for acquiring document set abstracts and device - Google Patents


Info

Publication number
CN101751425A
CN101751425A · CN200810239344A
Authority
CN
China
Prior art keywords
sentence
weights
importance value
document
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810239344A
Other languages
Chinese (zh)
Inventor
万小军
杨建武
肖建国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING FOUNDER E-GOVERNMENT INFORMATION TECHNOLOGY Co Ltd
Peking University
Peking University Founder Group Co Ltd
Original Assignee
BEIJING FOUNDER E-GOVERNMENT INFORMATION TECHNOLOGY Co Ltd
Peking University
Peking University Founder Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING FOUNDER E-GOVERNMENT INFORMATION TECHNOLOGY Co Ltd, Peking University, Peking University Founder Group Co Ltd filed Critical BEIJING FOUNDER E-GOVERNMENT INFORMATION TECHNOLOGY Co Ltd
Priority to CN200810239344A priority Critical patent/CN101751425A/en
Publication of CN101751425A publication Critical patent/CN101751425A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for acquiring document set abstracts that improve the quality of the acquired abstracts. The method extracts every sentence contained in each document of a document set to form a sentence set, determines an importance weight for each sentence in the sentence set based on the text similarity between the documents in the document set and the sentences in the sentence set, and forms the document set abstract by selecting a specified number of sentences in descending order of their determined importance weights.

Description

Method and device for acquiring document set abstracts
Technical field
The present invention relates to the field of spoken and written language processing and to the technical field of information retrieval, and in particular to a method and device for acquiring document set abstracts.
Background technology
With the rapid spread and application of Internet technology, techniques for acquiring document set abstracts have been widely used in text and web content retrieval. Document set abstract acquisition refers to a computer system automatically obtaining, from a document set containing many documents, information that reflects the main points of the documents' content. The technique gives users a brief and concise description of a document set's content and makes it convenient to survey large volumes of documents. For example, the news service provided by an Internet portal typically works by first collecting news items from the network, sorting them by topic and document type into multiple document sets, and then using such an abstract acquisition technique to obtain a summary of each document set, so that users can quickly and conveniently browse the news they are interested in.
Existing methods for acquiring document set abstracts fall mainly into two classes: methods based on sentence extraction and methods based on sentence generation (abstraction). In a sentence-extraction method, every document in the document set is split into sentences, the importance weight of each sentence is determined according to predetermined sentence-weighting indicators, for example sentence position, word clusters, topic signatures, or term frequency/inverse document frequency (TF/IDF), and at least one sentence with the largest importance weight is selected to form the abstract of the document set. In a sentence-generation method, natural language understanding techniques are used to perform syntactic and semantic analysis of each sentence in the document set, and information extraction or natural language generation techniques are used to produce new sentences, thereby obtaining the abstract of the document set. As this description shows, the abstract obtained by a sentence-extraction method is composed of sentences that already exist in the documents of the document set, so the content of the document set need not be analyzed with complex deep natural language understanding techniques; sentence-extraction methods are therefore simpler to implement than sentence-generation methods.
When determining the importance weight of each sentence in the document set, existing sentence-extraction methods can, besides the predetermined sentence-weighting indicators introduced above, also use graph-model-based methods. For example, the article "Summarizing Similarities and Differences Among Related Documents" (by I. Mani and E. Bloedorn, published in the journal Information Retrieval in 2000) discloses a method named WebSumm. The WebSumm method uses a link graph model in which each sentence of the document set is represented by a vertex, and assumes that a vertex connected to more other vertices represents a more important sentence; the importance weights of the sentences in the document set are determined on this basis, and the abstract of the document set is obtained accordingly.
In the method for the weights of importance value of determining each sentence in the document sets based on graph model of above-mentioned introduction, only considered the relation between the sentence in the document sets, do not consider of the influence of the relation of sentence and document to the importance of sentence, suppose that promptly the importance of all documents all equates in the document sets, yet the importance of different document is different in the document sets usually, the difference of importance that existing method for acquiring document set abstracts based on graph model can not reflect different document in the document sets is to obtaining document set abstracts result's influence, thus document set abstracts obtain poor effect.
Summary of the invention
The embodiments of the invention provide a method and device for acquiring document set abstracts, in order to solve the problem that existing graph-model-based abstract acquisition produces poor abstracts.
The technical scheme that the embodiment of the invention provides is as follows:
A method for acquiring document set abstracts comprises:
extracting each sentence contained in each document of the document set to form a sentence set;
determining the importance weight of each sentence in the sentence set based on the text similarity between the documents in the document set and the sentences in the sentence set;
selecting, according to the determined importance weights and in descending order of importance weight, a specified number of sentences to form the document set abstract.
A document set abstract acquisition device comprises:
a sentence set extraction unit, used to extract each sentence contained in each document of the document set to form a sentence set;
a sentence importance weight determining unit, used to determine the importance weight of each sentence in the sentence set based on the text similarity between the documents in the document set and the sentences in the sentence set;
an abstract determining unit, used to select, according to the importance weights determined by the sentence importance weight determining unit and in descending order of importance weight, a specified number of sentences to form the document set abstract.
The multi-document summarization method proposed by the embodiments of the invention exploits the relations between the sentences and the documents in the document set and takes into account the influence of the differing importance of different documents on sentence importance weights. It can therefore determine the importance weights of the sentences in the document set more accurately, and by selecting the sentences with high importance weights to form the abstract, it obtains a better document set abstract.
Description of drawings
Fig. 1 is a flow chart of the main implementation principle of an embodiment of the invention;
Fig. 2 is a schematic diagram of the document set bipartite graph in an embodiment of the invention;
Fig. 3 is a schematic structural diagram of the document set abstract acquisition device provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of the sentence importance weight determining unit in an embodiment of the invention;
Fig. 5 is a schematic structural diagram of the sentence importance weight determining subunit in an embodiment of the invention;
Fig. 6 is a schematic structural diagram of the abstract determining unit in an embodiment of the invention;
Fig. 7 is a schematic structural diagram of the importance weight adjusting subunit in an embodiment of the invention.
Embodiment
Because existing graph-model-based methods for acquiring document set abstracts cannot reflect the influence of the importance of the document containing a sentence on that sentence's importance weight, the resulting abstracts are poor. By building a bipartite graph model that includes the relation information between sentences and documents when the graph model is set up, the embodiments of the invention solve the above problem and provide a better scheme for acquiring document set abstracts.
The main implementation principle of the technical scheme of the embodiments of the invention, specific embodiments, and the beneficial effects that can be achieved are explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the main implementation flow of an embodiment of the invention is as follows:
Step 10: build a document set bipartite graph model that includes the relation information between the sentences and the documents;
Step 20: determine the importance weight of each sentence in the sentence set of the bipartite graph model built in step 10;
Step 30: select the sentences with high importance weights to form the document set abstract.
In step 30, the importance weight of each sentence obtained in step 20 is adjusted according to the similarity values between the sentences in the sentence set of the bipartite graph model built in step 10: among sentences with similar content, only one sentence keeps its importance weight unchanged while the importance weights of the other sentences are reduced, which keeps the redundancy between the sentences forming the document set abstract low.
An embodiment is described in detail below to explain and illustrate the main implementation principle of the method according to the above inventive principle.
In the first step, the bipartite graph model of the document set, which contains the relation information between sentences and documents, is built as follows:
Let D = {d_j | 1 ≤ j ≤ m} denote the document set, where d_j is the j-th document in the document set and m is a natural number, the number of documents in the document set.
Each document in the document set is split into sentences, giving the sentence set S = {s_i | 1 ≤ i ≤ n} of all documents in the document set, where s_i is the i-th sentence in the sentence set and n is a natural number, the number of sentences in the sentence set.
The sentence set and the document set are taken as the two vertex sets of the bipartite graph model (see Fig. 2). Between every pair of vertices representing a sentence and a document an edge is added, giving the edge set E_SD = {e_ij | s_i ∈ S, d_j ∈ D}, where e_ij is the edge connecting the vertex of the i-th sentence and the vertex of the j-th document. Each edge e_ij carries a similarity value w_ij describing the degree of text similarity between sentence s_i and document d_j, which can usually be determined with the cosine formula commonly used in text information processing. The adjacency matrix describing the relations between the sentence vertices and the document vertices of the bipartite graph model is L = (w_ij)_{n×m}.
The bipartite graph model obtained by the above processing can be written G = <S, D, E_SD>.
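As a non-authoritative sketch of the construction just described: the patent only specifies that w_ij can be computed with the commonly used cosine formula, so the bag-of-words tokenization below (lowercased whitespace splitting, raw term frequencies) and the function names are illustrative assumptions, not the patented implementation.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_bipartite_matrix(documents):
    """documents: list of documents, each given as a list of sentence strings.
    Returns (sentences, L) where L[i][j] = w_ij = cosine(s_i, d_j)."""
    sentences = [s for doc in documents for s in doc]
    sent_vecs = [Counter(s.lower().split()) for s in sentences]
    doc_vecs = [Counter(w for s in doc for w in s.lower().split())
                for doc in documents]
    L = [[cosine(sv, dv) for dv in doc_vecs] for sv in sent_vecs]
    return sentences, L
```

A TF-IDF weighting or a proper sentence splitter could be substituted without changing the structure of the adjacency matrix L.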
In the second step, the importance weight of each sentence in the sentence set is determined from the bipartite graph model obtained in the first step, as follows:
Assume that the initial importance weights of all sentences in the sentence set are identical, and that the initial importance weights of all documents in the document set are likewise identical. In this embodiment, for example, the initial importance weight of each sentence in the sentence set is 1, i.e. AuthScore^(0)(s_i) = 1, and the initial importance weight of each document in the document set is 1, i.e. HubScore^(0)(d_j) = 1, where the superscript denotes the round of the iteration.
According to the following iterative formulas, the importance weight of each sentence in the sentence set and of each document in the document set is determined after every round of the iteration, until these weights equal the weights obtained after the previous round, i.e. until AuthScore^(t+1)(s_i) = AuthScore^(t)(s_i) and HubScore^(t+1)(d_j) = HubScore^(t)(d_j):

AuthScore^(t+1)(s_i) = Σ_{d_j ∈ D} w_ij × HubScore^(t)(d_j),
HubScore^(t+1)(d_j) = Σ_{s_i ∈ S} w_ij × AuthScore^(t)(s_i);

where t is a natural number, AuthScore^(t+1)(s_i) and HubScore^(t+1)(d_j) denote the importance weights of sentence s_i and document d_j after round t+1 of the iteration, and AuthScore^(t)(s_i) and HubScore^(t)(d_j) denote the importance weights of sentence s_i and document d_j after the previous round, i.e. round t.
In matrix form, the above iterative formulas are:

A^(t+1) = L H^(t)
H^(t+1) = L^T A^(t)

where A = [AuthScore(s_i)]_{n×1} and H = [HubScore(d_j)]_{m×1} denote the sentence importance weight vector and the document importance weight vector, respectively.
The sentence importance weight vector and the document importance weight vector obtained in each round of the iteration are normalized so that the importance weights of all sentences in the sentence set sum to 1 and the importance weights of all documents in the document set sum to 1, i.e.

A^(t+1) = A^(t+1) / ‖A^(t+1)‖₁
H^(t+1) = H^(t+1) / ‖H^(t+1)‖₁

where ‖A^(t+1)‖₁ and ‖H^(t+1)‖₁ denote the sums of the importance weights of all elements of the vectors A^(t+1) and H^(t+1), respectively.
The basic idea of the above iteration is to regard the relation between the sentences in the sentence set and the documents in the document set as analogous to the authority-hub relation between web pages in networked information retrieval, and to solve it with the HITS iterative algorithm, which rests on the following two assumptions:
a. an important document is usually associated with more important sentences;
b. an important sentence is usually associated with more important documents.
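Under the same caveat, the HITS-style iteration of the second step could be sketched as follows. The patent specifies the update, normalization, and convergence rules but not the implementation; the `tol` tolerance and `max_iter` cap are assumed safeguards, since the patent iterates until the weights are exactly equal.

```python
def hits_scores(L, tol=1e-8, max_iter=1000):
    """L: n x m similarity matrix (w_ij between sentence i and document j).
    Returns (auth, hub): L1-normalized sentence and document weights."""
    n, m = len(L), len(L[0])
    auth, hub = [1.0] * n, [1.0] * m          # AuthScore^(0), HubScore^(0)
    for _ in range(max_iter):
        # A^(t+1) = L H^(t);  H^(t+1) = L^T A^(t)
        new_auth = [sum(L[i][j] * hub[j] for j in range(m)) for i in range(n)]
        new_hub = [sum(L[i][j] * auth[i] for i in range(n)) for j in range(m)]
        # normalize so each weight vector sums to 1
        sa, sh = sum(new_auth), sum(new_hub)
        new_auth = [a / sa for a in new_auth]
        new_hub = [h / sh for h in new_hub]
        converged = (max(abs(a - b) for a, b in zip(new_auth, auth)) < tol
                     and max(abs(a - b) for a, b in zip(new_hub, hub)) < tol)
        auth, hub = new_auth, new_hub
        if converged:
            break
    return auth, hub
```

Note that, as in the patent, each round updates the new authority scores from the previous hub scores and the new hub scores from the previous authority scores.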
In the third step, the importance weights of all sentences in the sentence set obtained in the second step are adjusted according to the text similarity values between the sentences, and sentences with high importance weight and low textual redundancy are chosen to form the document set abstract. This can be implemented in many ways; in this embodiment the specific process is as follows:
(1) Obtain the sentence relation matrix M = (M_ij)_{n×n}, where M_ij denotes the text similarity value between any two sentences s_i and s_j of the sentence set S; similarly to the similarity value w_ij between sentence s_i and document d_j in the first step, it can be determined with the cosine formula. M is then normalized so that each row sums to 1, i.e. so that the similarity values between any sentence s_i and the other sentences of the sentence set sum to 1, giving the normalized matrix M̃.
(2) Initialize two sets A = φ (the empty set) and B = {s_i | i = 1, 2, …, n}. The initial value of the final importance weight RankScore(s_i) of each sentence is the importance weight AuthScore(s_i) obtained in the second step, i.e. RankScore(s_i) = AuthScore(s_i).
(3) Sort the elements of set B in descending order of final importance weight.
(4) Let s_i be the sentence ranked first in the sequence obtained in step (3). Move s_i from set B to set A, and apply the following redundancy penalty to each remaining sentence s_j (j ≠ i) in set B:

RankScore(s_j) = RankScore(s_j) − ω × M̃_ji × AuthScore(s_i),

where ω > 0 is the penalty degree factor; the larger ω is, the stronger the redundancy penalty. In this embodiment, ω = 10. M̃ is the normalized sentence relation matrix obtained in (1).
(5) Repeat steps (3) and (4) until B = φ.
(6) Select from set A the specified number of sentences with the largest importance weights to form the abstract.
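Steps (1) to (6) above admit a straightforward greedy implementation. The sketch below is illustrative rather than the patented implementation: the names `select_summary`, `auth`, and `M` are assumptions, and `omega` defaults to 10 as in the embodiment.

```python
def select_summary(auth, M, k, omega=10.0):
    """Greedy redundancy-penalized selection.
    auth: AuthScore of each sentence (from the HITS step);
    M: n x n sentence-sentence similarity matrix;
    k: number of sentences to pick; omega: penalty degree factor (> 0)."""
    n = len(auth)
    # (1) row-normalize M so each row sums to 1, giving M-tilde
    Mn = []
    for row in M:
        s = sum(row)
        Mn.append([v / s if s else 0.0 for v in row])
    # (2) RankScore starts at AuthScore; A holds selected, B remaining
    rank = list(auth)
    A, B = [], set(range(n))
    while B:
        # (3)-(4) pick the remaining sentence with the highest RankScore
        i = max(B, key=lambda idx: rank[idx])
        B.discard(i)
        A.append(i)
        # penalize the remaining sentences for redundancy with sentence i
        for j in B:
            rank[j] -= omega * Mn[j][i] * auth[i]
    # (5) loop until B is empty; (6) the first k picks form the summary
    return A[:k]
```

Because the sentences are appended to A in decreasing order of their adjusted scores, taking the first k elements of A is equivalent to step (6).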
The multi-document summarization method proposed by the embodiments of the invention exploits the relations between the sentences and the documents in the document set and takes into account the influence of the differing importance of different documents on sentence importance weights. Compared with prior-art schemes that consider only the relations between sentences when determining sentence importance weights, it can therefore determine the importance weights of the sentences in the document set more accurately; by selecting the sentences with high importance weights to form the abstract, it obtains a better document set abstract.
To verify the effectiveness of the method proposed by the embodiments of the invention, the following tests were carried out with the evaluation data and tasks of the Document Understanding Conference (DUC). The DUC2001 data, containing 30 document sets, and the DUC2002 data, containing 59 document sets, were selected; each abstract acquisition method was required to produce a document set abstract of at most 100 words, and the obtained abstracts were compared with manually written abstracts to evaluate the methods. The effectiveness of an abstract acquisition method is usually measured with the ROUGE evaluation system, which includes the three indicators ROUGE-1, ROUGE-2 and ROUGE-W; the larger their values, the better the method. The evaluation results of the proposed method and of the existing graph-model method based only on sentence relations are shown in Table 1 and Table 2.
Table 1: Summarization results on the DUC2001 evaluation data

System            ROUGE-1   ROUGE-2   ROUGE-W
Proposed method   0.37744   0.06966   0.11252
Existing method   0.35474   0.05733   0.10667

Table 2: Summarization results on the DUC2002 evaluation data

System            ROUGE-1   ROUGE-2   ROUGE-W
Proposed method   0.38569   0.08519   0.12500
Existing method   0.37510   0.07973   0.12198
Correspondingly, an embodiment of the invention also provides a document set abstract acquisition device. Referring to Fig. 3, the device comprises a sentence set extraction unit 310, a sentence importance weight determining unit 320 and an abstract determining unit 330, wherein:
the sentence set extraction unit 310 is used to extract each sentence contained in each document of the document set to form the sentence set; in a specific implementation, the documents in the document set can be split into sentences and each sentence contained in each document extracted;
the sentence importance weight determining unit 320 is used to determine the importance weight of each sentence in the sentence set based on the text similarity between the documents in the document set and the sentences in the sentence set;
the abstract determining unit 330 is used to select, according to the importance weights determined by the sentence importance weight determining unit 320 and in descending order of importance weight, a specified number of sentences to form the document set abstract.
Referring to Fig. 4, the sentence importance weight determining unit comprises a text similarity determining subunit 410 and a sentence importance weight determining subunit 420, wherein:
the text similarity determining subunit 410 is used to determine the text similarity between each document in the document set and each sentence in the sentence set; in a specific implementation, the cosine formula is used to determine these text similarities;
the sentence importance weight determining subunit 420 is used to determine, by iterative computation according to the text similarities determined by the text similarity determining subunit 410, the importance weight of each sentence in the sentence set.
Referring to Fig. 5, the sentence importance weight determining subunit comprises an iteration subunit 510, an iteration termination judging subunit 520 and a sentence importance weight determining subunit 530, wherein:
the iteration subunit 510 is used to determine the sentence importance weights obtained in each iteration according to the following formulas:

AuthScore^(t+1)(s_i) = Σ_{d_j ∈ D} w_ij × HubScore^(t)(d_j),
HubScore^(t+1)(d_j) = Σ_{s_i ∈ S} w_ij × AuthScore^(t)(s_i);

where t is a natural number, t+1 denotes the current iteration and t the previous iteration;
AuthScore^(t+1)(s_i) denotes the importance weight of the i-th sentence s_i of the sentence set in the current iteration;
HubScore^(t+1)(d_j) denotes the importance weight of the j-th document d_j of the document set in the current iteration;
AuthScore^(t)(s_i) denotes the importance weight of the i-th sentence s_i of the sentence set in the previous iteration;
HubScore^(t)(d_j) denotes the importance weight of the j-th document d_j of the document set in the previous iteration;
w_ij denotes the degree of text similarity between the i-th sentence s_i of the sentence set and the j-th document d_j of the document set;
the iteration termination judging subunit 520 is used to stop the iteration processing of the iteration subunit 510 when, after the latest iteration, the importance weight of each sentence in the sentence set and of each document in the document set equals the corresponding importance weight after the previous iteration;
the sentence importance weight determining subunit 530 is used, when the iteration termination judging subunit 520 stops the iteration processing of the iteration subunit 510, to take the importance weight of each sentence in the sentence set obtained after the last iteration of the iteration subunit 510 as the sought importance weight of each sentence in the sentence set.
Referring to Fig. 6, the abstract determining unit comprises an importance weight adjusting subunit 610 and a document set abstract obtaining subunit 620, wherein:
the importance weight adjusting subunit 610 is used to adjust the importance weight of each sentence according to the text similarity values between the sentences;
the document set abstract obtaining subunit 620 is used to select, in descending order of the importance weights adjusted by the importance weight adjusting subunit 610, a specified number of sentences to form the document set abstract.
Referring to Fig. 7, the importance weight adjusting subunit comprises a sorting module 710, a sentence repeated-selection module 720 and an importance weight determining module 730, wherein:
the sorting module 710 is used to sort the sentences of the sentence set in descending order of importance weight to obtain the sentence sequence;
the sentence repeated-selection module 720 is used to repeat the following processing on the sentence sequence obtained by the sorting module 710 until all sentences of the sentence sequence have been selected:
select the sentence with the highest importance weight, and for each of the remaining sentences in the sequence, adjust its importance weight to the difference between its importance weight and a penalty value, the penalty value being the product of three factors: a penalty factor, the text similarity value between this sentence and the selected sentence, and the importance weight of the selected sentence, where the penalty factor is greater than 0;
the importance weight determining module 730 is used to take the importance weights of all sentences selected by the sentence repeated-selection module 720 as the adjusted importance weights of all sentences.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to cover them.

Claims (11)

1. A method for acquiring document set abstracts, characterized in that it comprises:
extracting each sentence contained in each document of the document set to form a sentence set;
determining the importance weight of each sentence in the sentence set based on the text similarity between the documents in the document set and the sentences in the sentence set;
selecting, according to the determined importance weights and in descending order of importance weight, a specified number of sentences to form the document set abstract.
2. the method for claim 1 is characterized in that, based on the text similarity between the sentence in each document in the document sets and the sentence set, determines the weights of importance value of each sentence in the sentence set, specifically comprises:
Determine the text similarity between the sentence in the set of document in the document sets and sentence; And
According to the text similarity between each sentence in document in the document sets and the sentence set,, determine the weights of importance value of each sentence in the sentence set by the interative computation mode.
3. The method as claimed in claim 2, characterized in that the importance weight value of each sentence in the sentence set is determined by iterative computation as follows:
AuthScore^(t+1)(s_i) = Σ_{d_j ∈ D} w_ij × HubScore^(t)(d_j),
HubScore^(t+1)(d_j) = Σ_{s_i ∈ S} w_ij × AuthScore^(t)(s_i);
wherein t is a natural number, t+1 denotes the current iteration, and t denotes the previous iteration;
AuthScore^(t+1)(s_i) denotes the importance weight value of the i-th sentence s_i in the sentence set S at the current iteration;
HubScore^(t+1)(d_j) denotes the importance weight value of the j-th document d_j in the document set D at the current iteration;
AuthScore^(t)(s_i) denotes the importance weight value of the i-th sentence s_i in the sentence set at the previous iteration;
HubScore^(t)(d_j) denotes the importance weight value of the j-th document d_j in the document set at the previous iteration;
w_ij denotes the text similarity between the i-th sentence s_i in the sentence set and the j-th document d_j in the document set;
The above iterative computation is repeated until, after the latest iteration, the importance weight value of each sentence in the sentence set and of each document in the document set equal the corresponding values obtained after the previous iteration;
After the iteration stops, the importance weight value of each sentence obtained in the last iteration is taken as the importance weight value of that sentence in the sentence set.
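The update rule of claim 3 is a HITS-style mutual reinforcement between sentences (authorities) and documents (hubs). The following Python sketch adds L2 normalization after each round to keep the scores bounded, and replaces exact equality with a small tolerance as the stopping test; both are practical assumptions, since the claim states only the two update formulas and the equality condition:

```python
import math

def iterate_scores(w, tol=1e-9, max_iter=10000):
    """Mutually reinforce sentence scores (AuthScore) and document scores (HubScore).

    w[i][j] is the text similarity between sentence i and document j.
    Per-round L2 normalization is an assumption made here for convergence;
    it is not part of the claim.
    """
    n_sent, n_doc = len(w), len(w[0])
    auth = [1.0] * n_sent   # AuthScore^(0)
    hub = [1.0] * n_doc     # HubScore^(0)
    for _ in range(max_iter):
        # AuthScore^(t+1)(s_i) = sum_j w_ij * HubScore^(t)(d_j)
        new_auth = [sum(w[i][j] * hub[j] for j in range(n_doc)) for i in range(n_sent)]
        # HubScore^(t+1)(d_j) = sum_i w_ij * AuthScore^(t)(s_i)
        new_hub = [sum(w[i][j] * auth[i] for i in range(n_sent)) for j in range(n_doc)]
        na = math.sqrt(sum(x * x for x in new_auth)) or 1.0
        nh = math.sqrt(sum(x * x for x in new_hub)) or 1.0
        new_auth = [x / na for x in new_auth]
        new_hub = [x / nh for x in new_hub]
        # Stop when scores no longer change between consecutive iterations.
        done = (max(abs(a - b) for a, b in zip(new_auth, auth)) < tol
                and max(abs(a - b) for a, b in zip(new_hub, hub)) < tol)
        auth, hub = new_auth, new_hub
        if done:
            break
    return auth, hub
```

With normalization, the iteration converges toward the principal singular vectors of the similarity matrix, so sentences similar to many important documents end up with the highest AuthScore.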
4. The method as claimed in claim 1, characterized in that selecting a specified number of sentences to form the document set abstract, in order from the highest determined importance weight value to the lowest, specifically comprises:
Adjusting the importance weight value of each sentence according to the text similarity values between the sentences;
Selecting a specified number of sentences in order from the highest adjusted importance weight value to the lowest, to form the document set abstract.
5. The method as claimed in claim 4, characterized in that adjusting the importance weight value of each sentence according to the text similarity values between the sentences specifically comprises:
Sorting the sentences in the sentence set in order from the highest importance weight value to the lowest, to obtain a sentence sequence;
Repeating the following processing on the sentence sequence until all sentences in the sequence have been selected:
Selecting the sentence with the highest importance weight value; for each remaining sentence in the sequence, adjusting its importance weight value to the difference between that value and a penalty value, the penalty value being the product of three terms: a penalty factor, the text similarity value between the remaining sentence and the selected sentence, and the importance weight value of the remaining sentence, wherein the penalty factor is greater than 0; and taking the importance weight values with which the sentences were selected as the adjusted importance weight values of all sentences.
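The adjustment in claims 4 and 5 is a greedy selection with a redundancy penalty: once a sentence is picked, similar sentences are pushed down the ranking. A sketch assuming the scores and sentence-to-sentence similarity matrix come from the earlier steps; the function and parameter names are illustrative, with penalty_factor=10 following claim 6:

```python
def select_summary(sentences, scores, sent_sim, num_sentences, penalty_factor=10.0):
    """Greedily pick sentences, penalizing those similar to already-picked ones.

    After each pick, every remaining sentence's weight is lowered by
    penalty_factor * sim(remaining, picked) * weight(remaining),
    i.e. the three-way product named in claim 5.
    """
    scores = list(scores)               # work on a copy
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < num_sentences:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        chosen.append(best)
        for i in remaining:
            scores[i] -= penalty_factor * sent_sim[best][i] * scores[i]
    return [sentences[i] for i in chosen]
```

With a large penalty factor, a sentence nearly identical to one already selected gets a negative weight and is effectively excluded, so the abstract favors diverse content.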
6. The method as claimed in claim 5, characterized in that the penalty factor is 10.
7. A document set abstract acquiring device, characterized by comprising:
A sentence set extraction unit, configured to extract the sentences contained in each document of the document set to form a sentence set;
A sentence importance weight value determining unit, configured to determine the importance weight value of each sentence in the sentence set based on the text similarity between the documents in the document set and the sentences in the sentence set;
An abstract determining unit, configured to select, according to the importance weight values determined by the sentence importance weight value determining unit and in order from the highest importance weight value to the lowest, a specified number of sentences to form the document set abstract.
8. The device as claimed in claim 7, characterized in that the sentence importance weight value determining unit specifically comprises:
A text similarity determining subunit, configured to determine the text similarity between each document in the document set and each sentence in the sentence set;
A sentence importance weight value determining subunit, configured to determine the importance weight value of each sentence in the sentence set by iterative computation, according to the text similarity determined by the text similarity determining subunit.
9. The device as claimed in claim 8, characterized in that the sentence importance weight value determining subunit specifically comprises:
An iterative computation subunit, configured to determine the sentence importance weight values obtained at each iteration as follows:
AuthScore^(t+1)(s_i) = Σ_{d_j ∈ D} w_ij × HubScore^(t)(d_j),
HubScore^(t+1)(d_j) = Σ_{s_i ∈ S} w_ij × AuthScore^(t)(s_i);
wherein t is a natural number, t+1 denotes the current iteration, and t denotes the previous iteration;
AuthScore^(t+1)(s_i) denotes the importance weight value of the i-th sentence s_i in the sentence set at the current iteration;
HubScore^(t+1)(d_j) denotes the importance weight value of the j-th document d_j in the document set at the current iteration;
AuthScore^(t)(s_i) denotes the importance weight value of the i-th sentence s_i in the sentence set at the previous iteration;
HubScore^(t)(d_j) denotes the importance weight value of the j-th document d_j in the document set at the previous iteration;
w_ij denotes the text similarity between the i-th sentence s_i in the sentence set and the j-th document d_j in the document set;
An iteration termination judging subunit, configured to stop the iterative processing of the iterative computation subunit when, after the latest iteration, the importance weight value of each sentence in the sentence set and of each document in the document set equal the corresponding values obtained after the previous iteration;
A sentence importance weight value determining subunit, configured to take, when the iteration termination judging subunit stops the iterative processing of the iterative computation subunit, the importance weight value of each sentence in the sentence set obtained in the last iteration as the importance weight value of that sentence.
10. The device as claimed in claim 7, characterized in that the abstract determining unit specifically comprises:
An importance weight value adjusting subunit, configured to adjust the importance weight value of each sentence according to the text similarity values between the sentences;
A document set abstract obtaining subunit, configured to select a specified number of sentences, in order from the highest importance weight value adjusted by the importance weight value adjusting subunit to the lowest, to form the document set abstract.
11. The device as claimed in claim 10, characterized in that the importance weight value adjusting subunit specifically comprises:
A sorting module, configured to sort the sentences in the sentence set in order from the highest importance weight value to the lowest, to obtain a sentence sequence;
A sentence repeated selection module, configured to repeat the following processing on the sentence sequence obtained by the sorting module until all sentences in the sequence have been selected:
Selecting the sentence with the highest importance weight value; for each remaining sentence in the sequence, adjusting its importance weight value to the difference between that value and a penalty value, the penalty value being the product of three terms: a penalty factor, the text similarity value between the remaining sentence and the selected sentence, and the importance weight value of the remaining sentence, wherein the penalty factor is greater than 0;
An importance weight value determining module, configured to take the importance weight values with which the sentence repeated selection module selected the sentences as the adjusted importance weight values of all sentences.
CN200810239344A 2008-12-10 2008-12-10 Method for acquiring document set abstracts and device Pending CN101751425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810239344A CN101751425A (en) 2008-12-10 2008-12-10 Method for acquiring document set abstracts and device


Publications (1)

Publication Number Publication Date
CN101751425A (en) 2010-06-23

Family

ID=42478416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810239344A Pending CN101751425A (en) 2008-12-10 2008-12-10 Method for acquiring document set abstracts and device

Country Status (1)

Country Link
CN (1) CN101751425A (en)


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102111505B (en) * 2011-03-04 2013-06-05 中山大学 Short message prompting display method for mobile terminal
CN102111505A (en) * 2011-03-04 2011-06-29 中山大学 Short message prompting display method for mobile terminal
CN105706079A (en) * 2013-10-31 2016-06-22 隆沙有限公司 Topic-wise collaboration integration
US10296582B2 (en) 2014-02-22 2019-05-21 Tencent Technology (Shenzhen) Company Limited Method and apparatus for determining morpheme importance analysis model
WO2015124096A1 (en) * 2014-02-22 2015-08-27 Tencent Technology (Shenzhen) Company Limited Method and apparatus for determining morpheme importance analysis model
CN108009135A (en) * 2016-10-31 2018-05-08 深圳市北科瑞声科技股份有限公司 The method and apparatus for generating documentation summary
CN108009135B (en) * 2016-10-31 2021-05-04 深圳市北科瑞声科技股份有限公司 Method and device for generating document abstract
WO2018233647A1 (en) * 2017-06-22 2018-12-27 腾讯科技(深圳)有限公司 Abstract generation method, device and computer device and storage medium
US11409960B2 (en) 2017-06-22 2022-08-09 Tencent Technology (Shenzhen) Company Limited Summary generation method, apparatus, computer device, and storage medium
CN108763206A (en) * 2018-05-22 2018-11-06 南京邮电大学 Method for quickly ranking keywords in a single text
CN108763206B (en) * 2018-05-22 2022-04-05 南京邮电大学 Method for quickly sequencing keywords of single text
CN109325109A (en) * 2018-08-27 2019-02-12 中国人民解放军国防科技大学 Attention encoder-based extraction type news abstract generating device
CN109325109B (en) * 2018-08-27 2021-11-19 中国人民解放军国防科技大学 Attention encoder-based extraction type news abstract generating device
CN110781227A (en) * 2019-10-30 2020-02-11 中国联合网络通信集团有限公司 Information processing method and device
CN111125301A (en) * 2019-11-22 2020-05-08 泰康保险集团股份有限公司 Text method and device, electronic equipment and computer readable storage medium
CN112328783A (en) * 2020-11-24 2021-02-05 腾讯科技(深圳)有限公司 Abstract determining method and related device

Similar Documents

Publication Publication Date Title
CN101751425A (en) Method for acquiring document set abstracts and device
CN107122413B (en) Keyword extraction method and device based on graph model
CN100517330C (en) Word sense based local file searching method
CN101315624B (en) A kind of method and apparatus of text subject recommending
CN103207899B (en) Text recommends method and system
CN105243152B (en) A kind of automaticabstracting based on graph model
CN103514183B (en) Information search method and system based on interactive document clustering
CN101398814B (en) Method and system for simultaneously abstracting document summarization and key words
CN104199857B (en) A kind of tax document hierarchy classification method based on multi-tag classification
CN101625680B (en) Document retrieval method in patent field
CN101430695B (en) System and method for computing difference affinities of word
CN101446940B (en) Method and device of automatically generating a summary for document set
CN101944099B (en) Method for automatically classifying text documents by utilizing body
CN104598532A (en) Information processing method and device
CN102063469B (en) Method and device for acquiring relevant keyword message and computer equipment
CN103049470B (en) Viewpoint searching method based on emotion degree of association
CN105808526A (en) Commodity short text core word extracting method and device
CN103914478A (en) Webpage training method and system and webpage prediction method and system
CN103473317A (en) Method and equipment for extracting keywords
CN104866572A (en) Method for clustering network-based short texts
CN104077407B (en) A kind of intelligent data search system and method
CN107943824A (en) A kind of big data news category method, system and device based on LDA
CN101231634A (en) Autoabstract method for multi-document
CN102945244A (en) Chinese web page repeated document detection and filtration method based on full stop characteristic word string
CN103377239A (en) Method and device for calculating inter-textual similarity

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20100623