CN103885935A - Book section abstract generating method based on book reading behaviors - Google Patents

Book section abstract generating method based on book reading behaviors

Info

Publication number
CN103885935A
CN103885935A
Authority
CN
China
Prior art keywords
sentence
page
books
user
book
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410090143.6A
Other languages
Chinese (zh)
Other versions
CN103885935B (en)
Inventor
鲁伟明
安文佳
吴江琴
庄越挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201410090143.6A priority Critical patent/CN103885935B/en
Publication of CN103885935A publication Critical patent/CN103885935A/en
Application granted granted Critical
Publication of CN103885935B publication Critical patent/CN103885935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for generating book chapter abstracts based on book reading behaviors. The technique is essentially a document summarization method into which users' reading behaviors are incorporated, applied to engineering, science, and education book resources. The method first computes the weight of each page in a book chapter using a quantified reading-behavior page scoring mechanism, then splits the chapter into sentences, computes inter-sentence similarity from distances, propagates the existing sentence weights over the data manifold structure, and finally, based on the idea of data reconstruction, selects the sentences that best represent the chapter's content as the chapter abstract. Users' reading behaviors are collected and used to evaluate page importance, and the corresponding chapter abstract is obtained through data reconstruction, helping users grasp chapter content quickly and improving reading efficiency.

Description

Method for generating book chapter abstracts based on book reading behavior
Technical field
The present invention relates to document summarization methods, and in particular to a method for generating book chapter abstracts based on book reading behavior.
Background technology
With the continued growth of digital libraries, users hope to understand the content of book chapters quickly and accurately before reading, so a chapter-abstract service is urgently needed in digital libraries.
Book chapter abstract generation is essentially a document summarization method based on reading behavior: user reading behavior is modeled, and according to the behavior model the user's reading factors are incorporated into the summarization algorithm, so that the resulting abstract is influenced by how users actually read. If a traditional summarization method were adopted directly, the chapter abstract might not express the chapter content accurately from the users' reading perspective, and thus could not meet their needs.
In traditional reading, the object the reader reads is simply a fixed sequence of linguistic symbols. From the start of reading to its end, the reader gains understanding only from the textual content, in isolation from any social encouragement. With the emergence of social networked reading, the whole process, from selecting reading content to finishing it, becomes partly or wholly linked to a social network. Within such interconnected communities, readers' reading behaviors themselves become objects that deserve attention and study.
Social reading is a new reading mode that takes content as its core and social relations as its ties, emphasizing sharing, interaction, and dissemination. While reading, users can interact with users of similar interests; after reading, they can associate with the wider group that has read the same content, even forming communities around shared topics. Sharing, interaction, and dissemination run through the whole process of social reading, and these interactions generate a large amount of new, valuable content, such as comments, summaries, notes, and association or cross-reference information.
The basic algorithm adopted for book chapter abstract generation is document summarization based on data reconstruction (DSDR). DSDR is an extractive method built on the premise that a good summary should satisfy one property: the original document should be reconstructible from the resulting summary as far as possible, that is, the summary should cover as much of the content of the whole document as it can.
On top of the data-reconstruction summarization algorithm, the various behaviors of users during social reading are taken into account, such as reading time and the sentences users underline or circle: sentences marked by users are considered more representative and are given a higher weight than unmarked ones.
Summary of the invention
The object of the present invention is to provide a method for generating book chapter abstracts based on book reading behavior, so that users can quickly understand the information in book chapters.
The technical solution adopted by the present invention to solve this problem is as follows:
The steps of the method for generating book chapter abstracts based on book reading behavior are as follows:
1) Build quantified reading-behavior page scores: the user's reading behaviors are divided by reading depth into four levels from shallow to deep, namely browsing, bookmarking, shallow reading, and deep reading, and a page score based on user reading behavior is derived from these four levels;
2) Sentence weight propagation: the quantified page scores based on user reading behavior are obtained from step 1), the chapter is split into sentences, the quantified page score assigns each sentence an initial weight, and, based on the distances between sentences, a manifold ranking algorithm on the data propagates the sentence weights;
3) Chapter abstract generation: after the sentence weights have been propagated, they are incorporated into the data-reconstruction summarization algorithm, and important sentences are selected from the chapter as its abstract.
Said step 1) comprises:
2.1 The user's behavior of reading a given page is divided into four levels, namely browsing, bookmarking, shallow reading, and deep reading; different levels contribute differently to the page score;
2.2 Retention rate, churn rate, and exponential score decay are used to measure the difficulty of reaching each level, and the score is derived accordingly. The page user retention rate is, for a given page, the ratio of the number of users retained at bookmarking, shallow reading, or deep reading to the number of users who browsed the page; the page churn rate refers, relative to the users retained at the previous step, to the change in users at this step;
An evaluation formula based on user reading behavior is established:

V_i = [(p_i + q_i) / p_i] · exp(1 − p_i),  i = 1, 2, 3, 4

Page user retention rate:

p_i = U_i / U_1,  i = 1, 2, 3, 4

Page churn rate:

q_i = U_i / U_{i−1} for i = 2, 3, 4;  q_1 = 1

where V_i is the score contribution of step i of the whole user group's reading behavior to the page; p_i is the retention rate of step i relative to browsing; q_i is the churn rate of step i relative to step i−1; and U_i is the number of users proceeding to step i.
2.3 Page accesses are ordered in time: the earlier a user accesses and scores a page, the larger that user's contribution to the page. From the scores at the key behavior nodes of a page, the importance of the page can be computed; the combined page score formula is as follows:

s_j = Σ_{u∈R_j} (W_uj × S_uj) / Σ_{u∈R_j} W_uj

W_uj = log2(T_j / (t_uj − t_j)) if t_uj ≠ t_j;  W_uj = log2(T_j) if t_uj = t_j

S_uj = Σ_{i=1}^{L} V_ij

In the formulas above: s_j is the score of page j; W_uj is the contribution weight of user u to page j; T_j is the total time page j has been accessed; t_uj is the time user u first accessed page j; t_j is the time page j was first accessed; S_uj is the sum of the scores of the key behavior steps user u reached on page j; V_ij is the score of the i-th key behavior step user u reached on page j; and L is the reading depth, i.e. the number of key steps, user u reached on page j;
2.4 The scoring method above gives each page a quantified importance score within the book. Because reading populations differ, and to avoid pages receiving inflated scores when the number of visiting users is small, the number of visiting users and the score are normalized in the actual page evaluation; the final combined page score formula is:

PageScore_j = [log u_j − log ū] + [log2 s_j − log2 s̄]

where u_j is the number of users who browsed page j, s_j is the score of page j, ū and s̄ are the respective means over all pages, and PageScore_j is the page's combined score. By comparing with the means, the combined score is high only when both the number of users browsing the page and the readers' scores for it are high. Based on the characteristics of user reading behavior, a page-importance evaluation system is established: user behavior is quantified through the four reading levels of a page, the difficulty of progressing from browsing to deep reading is defined by computing the evaluation contribution of each level, and finally the importance of a page is quantified from the reading behavior of the user group on it.
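As a concrete illustration, the page-scoring formulas of step 1) can be sketched in Python. This is a hedged sketch, not part of the patent: the function names (`level_scores`, `page_raw_score`) and the input representation (per-level user counts; per-user first-access times and reached depths) are assumptions introduced here, and edge cases such as t_uj − t_j exceeding T_j are not handled.

```python
import math

def level_scores(U):
    """Per-level score contributions V_i from the user counts
    U[0..3] = users reaching browse, bookmark, shallow read, deep read."""
    V = []
    for i in range(4):
        p = U[i] / U[0]                          # p_i: retention vs. browsing
        q = U[i] / U[i - 1] if i > 0 else 1.0    # q_i: step-to-step churn ratio
        V.append((p + q) / p * math.exp(1 - p))  # V_i = [(p_i+q_i)/p_i]*exp(1-p_i)
    return V

def page_raw_score(users, V, T_j, t_j):
    """Combined page score s_j. `users` is a list of (t_uj, L) pairs:
    user u's first-access time on page j and the deepest level L (1..4)
    that user reached."""
    num = den = 0.0
    for t_uj, L in users:
        # W_uj: the earlier the first access, the larger the contribution weight
        w = math.log2(T_j) if t_uj == t_j else math.log2(T_j / (t_uj - t_j))
        s = sum(V[:L])                           # S_uj: sum over reached steps
        num += w * s
        den += w
    return num / den
```

For example, with 100 browsing users of whom 50 bookmark, 20 read shallowly, and 5 read deeply, `level_scores([100, 50, 20, 5])` yields the four contributions V_1..V_4 that feed into s_j.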
Said step 2) comprises:
3.1 Step 1) gives the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, a sentence marked on that page has a relative importance within the page. The relation between sentence importance and page score is:

w_i = (L_i · PageScore_j) / Σ_{i=1}^{n} (L_i · PageScore_j) if L_i ≠ 0;  w_i = 0 if L_i = 0

where w_i is the current weight of sentence v_i. Suppose the set of document sentences is V = {v_1, ..., v_n}, where v_i is the i-th sentence in the set V; the sentences underlined by users are placed at the front of the set, and assuming the first k sentences are the ones users underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 Let dis: V × V → R denote the distance metric on the set V, giving the distance dis(v_i, v_j) between every pair of sentences v_i and v_j. Let the mapping f denote the ranking function that assigns each sentence v_i a weight f_i; let the vector f = [f_1, ..., f_n]^T and the vector w = [w_1, ..., w_n]^T, where w_i ≠ 0 if sentence v_i was underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 The weight propagation algorithm on the data manifold structure is as follows:
Step 1: compute the pairwise sentence-vector distances dis(v_i, v_j) and sort them in ascending order; following this order, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[−dis²(v_i, v_j) / 2σ²] if there is an edge between the points for v_i and v_j, W_ij = 0 if there is none, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{−1/2} W D^{−1/2}, where D is the diagonal matrix with diagonal elements D_ii = Σ_{j=1}^{n} W_ij;
Step 4: iterate f(t+1) = αSf(t) + (1 − α)w until convergence, where α is a parameter with range [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit sentence weights are {f_1*, ..., f_n*}, and the sentence weight vector is f* = [f_1*, ..., f_n*]^T;
3.4 In Step 4, the parameter α balances the weight contribution of neighboring nodes against a node's initial weight. Because the matrix S in the algorithm is symmetric, the propagation of weights is symmetric. The limit of the sequence {f(t)} can also be computed in closed form as f* = (I − αS)^{−1} w (up to a constant factor that does not affect the ranking). After this propagation, each sentence in the chapter has obtained a reasonable weight.
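The propagation of Steps 1-5 can be sketched as follows. This is a hedged illustration under simplifying assumptions: for brevity it uses a fully connected similarity graph rather than growing edges in ascending-distance order until connectivity, and it runs a fixed number of iterations rather than testing convergence; `propagate_weights`, the value of `sigma`, and the Euclidean distance are choices made for this sketch.

```python
import numpy as np

def propagate_weights(X, w, alpha=0.85, sigma=1.0, iters=200):
    """Spread initial sentence weights w over a similarity graph built
    from the sentence vectors X (one row per sentence)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # dis(v_i, v_j)
    W = np.exp(-d**2 / (2 * sigma**2))        # W_ij = exp[-dis^2 / 2 sigma^2]
    np.fill_diagonal(W, 0.0)                  # W_ii = 0
    Dinv = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    S = Dinv @ W @ Dinv                       # S = D^(-1/2) W D^(-1/2)
    f = w.copy()
    for _ in range(iters):                    # f(t+1) = a*S*f(t) + (1-a)*w
        f = alpha * (S @ f) + (1 - alpha) * w
    return f
```

As expected from the formulation, a sentence close to an underlined one ends up with a larger propagated weight than a distant one.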
Said step 3) comprises:
4.1 Obtain the weight f_i* of each chapter sentence v_i; this weight reflects the importance of sentence v_i within the chapter. Take the n weights f_1*, ..., f_n* as the diagonal elements of a matrix F, i.e. F_ii = f_i*, to obtain the diagonal matrix F, which is then added to the data-reconstruction summarization algorithm;
4.2 In the summarization process, the objective function of the linear non-negative data-reconstruction algorithm is redefined as:

min_{a_i, β} J = Σ_{i=1}^{n} { f_i* ||v_i − V^T a_i||² + Σ_{j=1}^{n} a_ij² / β_j } + γ ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n

Here the selection of each sentence incorporates the chapter sentence weight f_i*; the constraint a_ij ≥ 0 means the method allows only additive combinations of sentences in the ensemble space, with no subtraction; β = [β_1, β_2, ..., β_n]^T is an auxiliary variable: if β_j = 0, then a_1j, ..., a_nj are all 0, meaning the candidate sentence of column j is not selected; γ is the regularization parameter;
4.3 The objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed. Fixing a_i and setting the derivative of J with respect to β to 0, the minimizing β is:

β_j = sqrt( Σ_{i=1}^{n} a_ij² / γ )

With this minimizing β, the minimization problem under the non-negativity constraint can be solved with the Lagrangian method;
4.4 Let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0 with A = [a_ij]; the Lagrangian L is:

L = J + Tr[αA^T] = Tr[F(V − AV)(V − AV)^T + diag(β)^{−1} A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij]

where F is the diagonal matrix of step 4.1, with diagonal entries f_1*, ..., f_n*; diag(β) is also a diagonal matrix, with diagonal entries β_1, ..., β_n;
4.5 Differentiating the Lagrangian L with respect to A gives:

∂L/∂A = −2FVV^T + 2FAVV^T + 2A diag(β)^{−1} + α

Setting the derivative to 0, α can be expressed as:

α = 2FVV^T − 2FAVV^T − 2A diag(β)^{−1}

By the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above by a_ij gives the equation:

(FVV^T)_ij a_ij − (FAVV^T)_ij a_ij − (A diag(β)^{−1})_ij a_ij = 0

which yields the following update rule:

a_ij ← a_ij (FVV^T)_ij / [FAVV^T + A diag(β)^{−1}]_ij

Iterating this update until convergence finally yields the summary sentences of the book chapter.
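The alternating β/A updates of steps 4.2-4.5 can be sketched as follows. This is an illustrative sketch, not the patent's exact implementation: `weighted_dsdr`, the uniform initialization of A, the fixed iteration count, and the assumption of non-negative term-frequency sentence vectors are all choices made for the example, and the small epsilon guards against division by zero.

```python
import numpy as np

def weighted_dsdr(V, f, gamma=0.1, iters=100, eps=1e-12):
    """Multiplicative updates for A under the weighted reconstruction
    objective; returns beta, where a large beta_j marks sentence j as a
    summary candidate. V: n x d non-negative sentence vectors;
    f: n propagated sentence weights."""
    n = V.shape[0]
    F = np.diag(f)                 # F_ii = f_i*
    A = np.full((n, n), 1.0 / n)   # non-negative starting point
    G = V @ V.T                    # Gram matrix V V^T
    for _ in range(iters):
        beta = np.sqrt((A**2).sum(axis=0) / gamma)  # beta_j = sqrt(sum_i a_ij^2 / gamma)
        denom = F @ A @ G + A @ np.diag(1.0 / np.maximum(beta, eps))
        A *= (F @ G) / np.maximum(denom, eps)       # a_ij <- a_ij (FVV^T)_ij / [...]_ij
    return np.sqrt((A**2).sum(axis=0) / gamma)
```

Sentences whose columns of A carry large coefficients (large beta_j) are the ones from which many other sentences are reconstructed, and these are selected for the abstract.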
Compared with the prior art, the method of the invention has the following beneficial effects:
1. The method combines user reading-behavior modeling with document summarization, applying the data-reconstruction summarization algorithm to book chapter abstract generation to obtain the summary information of book chapters;
2. The method analyzes and models user reading behavior; the modeling adopts the idea of reading depth, dividing reading behavior into levels and finally producing a combined page scoring system in which a higher score indicates a more important page;
3. The method takes the sentences of a book chapter as units and propagates the existing sentence weights over the data manifold space, finally obtaining a reasonable weight for each sentence, so that user behavior is reflected more accurately.
Brief description of the drawings
Fig. 1 is the system architecture diagram of the method for generating book chapter abstracts based on book reading behavior;
Fig. 2 is a block diagram of the sentence weight propagation method of the present invention;
Fig. 3 is the book catalogue view of the embodiment of the present invention;
Fig. 4 is a schematic view of the first section of the embodiment of the present invention;
Fig. 5 shows the chapter abstract generation result of the embodiment of the present invention.
Detailed description
As shown in Fig. 1 and Fig. 2, the steps of the method for generating book chapter abstracts based on book reading behavior are as follows:
1) Build quantified reading-behavior page scores: the user's reading behaviors are divided by reading depth into four levels from shallow to deep, namely browsing, bookmarking, shallow reading, and deep reading, and a page score based on user reading behavior is derived from these four levels;
2) Sentence weight propagation: the quantified page scores based on user reading behavior are obtained from step 1), the chapter is split into sentences, the quantified page score assigns each sentence an initial weight, and, based on the distances between sentences, a manifold ranking algorithm on the data propagates the sentence weights;
3) Chapter abstract generation: after the sentence weights have been propagated, they are incorporated into the data-reconstruction summarization algorithm, and important sentences are selected from the chapter as its abstract.
Said step 1) comprises:
2.1 The user's behavior of reading a given page is divided into four levels, namely browsing, bookmarking, shallow reading, and deep reading; different levels contribute differently to the page score;
2.2 Retention rate, churn rate, and exponential score decay are used to measure the difficulty of reaching each level, and the score is derived accordingly. The score decays exponentially with the retention rate: the score value at a given step is related both to the churn rate of the previous step and to the retention rate of the initial stage. First, the page user retention rate and churn rate are defined. The page user retention rate is, for a given page, the ratio of the number of users retained at bookmarking, shallow reading, or deep reading to the number of users who browsed the page; the page churn rate refers, relative to the users retained at the previous step, to the change in users at this step;
An evaluation formula based on user reading behavior is established:

V_i = [(p_i + q_i) / p_i] · exp(1 − p_i),  i = 1, 2, 3, 4

Page user retention rate:

p_i = U_i / U_1,  i = 1, 2, 3, 4

Page churn rate:

q_i = U_i / U_{i−1} for i = 2, 3, 4;  q_1 = 1

where V_i is the score contribution of step i of the whole user group's reading behavior to the page; p_i is the retention rate of step i relative to browsing; q_i is the churn rate of step i relative to step i−1; and U_i is the number of users proceeding to step i.
2.3 Page accesses are ordered in time: the earlier a user accesses and scores a page, the larger that user's contribution to the page; for example, if the first visiting user already reads a page deeply, that page's importance is relatively higher. From the scores at the key behavior nodes of a page, the importance of the page can be computed; the combined page score formula is as follows:

s_j = Σ_{u∈R_j} (W_uj × S_uj) / Σ_{u∈R_j} W_uj

W_uj = log2(T_j / (t_uj − t_j)) if t_uj ≠ t_j;  W_uj = log2(T_j) if t_uj = t_j

S_uj = Σ_{i=1}^{L} V_ij

In the formulas above: s_j is the score of page j; W_uj is the contribution weight of user u to page j; T_j is the total time page j has been accessed; t_uj is the time user u first accessed page j; t_j is the time page j was first accessed; S_uj is the sum of the scores of the key behavior steps user u reached on page j; V_ij is the score of the i-th key behavior step user u reached on page j; and L is the reading depth, i.e. the number of key steps, user u reached on page j;
2.4 The scoring method above gives each page a quantified importance score within the book. Because reading populations differ, and to avoid pages receiving inflated scores when the number of visiting users is small, the number of visiting users and the score are normalized in the actual page evaluation; the final combined page score formula is:

PageScore_j = [log u_j − log ū] + [log2 s_j − log2 s̄]

where u_j is the number of users who browsed page j, s_j is the score of page j, ū and s̄ are the respective means over all pages, and PageScore_j is the page's combined score. By comparing with the means, the combined score is high only when both the number of users browsing the page and the readers' scores for it are high. Based on the characteristics of user reading behavior, a page-importance evaluation system is established: user behavior is quantified through the four reading levels of a page, the difficulty of progressing from browsing to deep reading is defined by computing the evaluation contribution of each level, and finally the importance of a page is quantified from the reading behavior of the user group on it.
Said step 2) comprises:
3.1 Step 1) gives the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, a sentence marked on that page has a relative importance within the page. The relation between sentence importance and page score is:

w_i = (L_i · PageScore_j) / Σ_{i=1}^{n} (L_i · PageScore_j) if L_i ≠ 0;  w_i = 0 if L_i = 0

where w_i is the current weight of sentence v_i. Suppose the set of document sentences is V = {v_1, ..., v_n}, where v_i is the i-th sentence in the set V; the sentences underlined by users are placed at the front of the set, and assuming the first k sentences are the ones users underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 Let dis: V × V → R denote the distance metric on the set V, giving the distance dis(v_i, v_j) between every pair of sentences v_i and v_j. Let the mapping f denote the ranking function that assigns each sentence v_i a weight f_i; let the vector f = [f_1, ..., f_n]^T and the vector w = [w_1, ..., w_n]^T, where w_i ≠ 0 if sentence v_i was underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 The weight propagation algorithm on the data manifold structure is as follows:
Step 1: compute the pairwise sentence-vector distances dis(v_i, v_j) and sort them in ascending order; following this order, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[−dis²(v_i, v_j) / 2σ²] if there is an edge between the points for v_i and v_j, W_ij = 0 if there is none, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{−1/2} W D^{−1/2}, where D is the diagonal matrix with diagonal elements D_ii = Σ_{j=1}^{n} W_ij;
Step 4: iterate f(t+1) = αSf(t) + (1 − α)w until convergence, where α is a parameter with range [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit sentence weights are {f_1*, ..., f_n*}, and the sentence weight vector is f* = [f_1*, ..., f_n*]^T;
3.4 In Step 4, the parameter α balances the weight contribution of neighboring nodes against a node's initial weight. Because the matrix S in the algorithm is symmetric, the propagation of weights is symmetric. The limit of the sequence {f(t)} can also be computed in closed form as f* = (I − αS)^{−1} w (up to a constant factor that does not affect the ranking). After this propagation, each sentence in the chapter has obtained a reasonable weight.
Said step 3) comprises:
4.1 Obtain the weight f_i* of each chapter sentence v_i; this weight reflects the importance of sentence v_i within the chapter. Take the n weights f_1*, ..., f_n* as the diagonal elements of a matrix F, i.e. F_ii = f_i*, to obtain the diagonal matrix F, which is then added to the data-reconstruction summarization algorithm;
4.2 In the summarization process, the objective function of the linear non-negative data-reconstruction algorithm is redefined as:

min_{a_i, β} J = Σ_{i=1}^{n} { f_i* ||v_i − V^T a_i||² + Σ_{j=1}^{n} a_ij² / β_j } + γ ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n

Here the selection of each sentence incorporates the chapter sentence weight f_i*; the constraint a_ij ≥ 0 means the method allows only additive combinations of sentences in the ensemble space, with no subtraction; β = [β_1, β_2, ..., β_n]^T is an auxiliary variable: if β_j = 0, then a_1j, ..., a_nj are all 0, meaning the candidate sentence of column j is not selected; γ is the regularization parameter;
4.3 The objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed. Fixing a_i and setting the derivative of J with respect to β to 0, the minimizing β is:

β_j = sqrt( Σ_{i=1}^{n} a_ij² / γ )

With this minimizing β, the minimization problem under the non-negativity constraint can be solved with the Lagrangian method;
4.4 Let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0 with A = [a_ij]; the Lagrangian L is:

L = J + Tr[αA^T] = Tr[F(V − AV)(V − AV)^T + diag(β)^{−1} A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij]

where F is the diagonal matrix of step 4.1, with diagonal entries f_1*, ..., f_n*; diag(β) is also a diagonal matrix, with diagonal entries β_1, ..., β_n;
4.5 Differentiating the Lagrangian L with respect to A gives:

∂L/∂A = −2FVV^T + 2FAVV^T + 2A diag(β)^{−1} + α

Setting the derivative to 0, α can be expressed as:

α = 2FVV^T − 2FAVV^T − 2A diag(β)^{−1}

By the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above by a_ij gives the equation:

(FVV^T)_ij a_ij − (FAVV^T)_ij a_ij − (A diag(β)^{−1})_ij a_ij = 0

which yields the following update rule:

a_ij ← a_ij (FVV^T)_ij / [FAVV^T + A diag(β)^{−1}]_ij

Iterating this update until convergence finally yields the summary sentences of the book chapter.
Embodiment
As shown in Figs. 3 to 5, an application example of the book chapter abstract generation method is given. The concrete implementation steps of this example are described in detail below in conjunction with the method of this technology:
(1) All book chapters are preprocessed in the system to obtain the chapter document content. Suppose a user is reading the first section, "Definition", of Chapter 1, "Introduction to Distributed Computing", of the book "Principles and Applications of Distributed Computing", and wants the chapter abstract of this section. The user clicks the catalogue button and double-clicks the corresponding chapter; the system first obtains data such as the text of the chapter and the users' reading behaviors.
(2) From the users' reading-behavior data, the types and levels of user reading in the chapter are analyzed, and the quantified importance score of each page is obtained from the combined page scoring formula.
(3) The text of the chapter is split into sentences, and, combining the users' underlining behavior with the quantified page scores, the initial weights of the underlined sentences are obtained.
(4) Each sentence is segmented into words, stop words are removed, and so on; each sentence is built into a high-dimensional vector, and the pairwise sentence similarities are obtained from the distances between vectors.
(5) The initial sentence weights are propagated with the ranking method on the data manifold space, finally yielding a reasonable weight for each sentence.
(6) The sentence weight matrix F is added to the data-reconstruction summarization algorithm, which is run until convergence; a number of sentences (depending on chapter length) are chosen from the chapter as its summary information and finally returned to the user.
The operation result of this example at accompanying drawing 3 to middle demonstration, user is just at read books, can check by catalogue the clip Text of corresponding chapters and sections, facilitate the faster more detailed chapters and sections content of understanding of user, this books chapters and sections abstraction generating method has good use value and application prospect.
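Step (4) above can be illustrated with a toy bag-of-words sketch in Python. The stop-word list and sentences are hypothetical, and real Chinese text would first need a word segmenter; the similarity uses the form exp(−dis²/(2σ²)) that the weight propagation step later relies on:

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "is", "and"}   # illustrative stop list

def sentence_vector(sentence, vocab):
    """Bag-of-words vector over a fixed vocabulary, stop words removed."""
    counts = Counter(w for w in sentence.lower().split() if w not in STOP_WORDS)
    return [counts[w] for w in vocab]

def similarity(u, v, sigma=1.0):
    """Distance-based similarity: exp(-dis^2 / (2 sigma^2))."""
    dis2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-dis2 / (2 * sigma ** 2))

sents = ["distributed computing is a field", "a field of computer science"]
vocab = sorted({w for s in sents for w in s.lower().split()} - STOP_WORDS)
vecs = [sentence_vector(s, vocab) for s in sents]
sim = similarity(vecs[0], vecs[1])     # pairwise sentence similarity
```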

Claims (4)

1. A book section abstract generating method based on book reading behavior, characterized in that its steps are as follows:
1) Building a quantified reading-behavior score per page: the user's reading behavior is divided into four levels from shallow to deep according to reading depth, namely the browsing level, the bookmarking level, the shallow-reading level and the deep-reading level, and a page score based on user reading behavior is obtained from these four levels;
2) Sentence weight propagation: a quantified page score is obtained from the behavior-based page score of step 1); the book section is divided into sentences and the quantified page score is assigned to each sentence as an initial weight value; then, based on the distances between sentences, a ranking algorithm on the data manifold structure is used to propagate the sentence weight values;
3) Book section summary generation: after the sentence weight values have been propagated, they are added to the document summary generating algorithm based on data reconstruction, and the important sentences are selected from the book section as its summary.
2. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 1) is:
2.1 The user's reading behavior on a given page is divided into four levels, namely the browsing level, the bookmarking level, the shallow-reading level and the deep-reading level; different levels make different score contributions to the page;
2.2 The retention rate, the churn rate and score-index decay are used to measure the difficulty of reaching a given level, and the page is scored accordingly. The page user retention rate is, for a given page, the ratio of the number of users retained at the bookmarking, shallow-reading or deep-reading step to the number of users at the browsing step; the page churn rate is the ratio of the number of users at the current step to the number of users retained at the previous step.
The scoring formula based on user reading behavior is established as:

V_i = [(p_i + q_i) / p_i]·exp(1 − p_i),  i = 1, 2, 3, 4

Page user retention rate formula:

p_i = U_i / U_1,  i = 1, 2, 3, 4

Page churn rate formula:

q_i = U_i / U_(i−1) for i = 2, 3, 4;  q_i = 1 for i = 1

where: V_i is the score contribution to the page of step i of the whole user group's reading behavior; p_i is the retention rate of step i relative to browsing; q_i is the churn rate of step i relative to step i−1; and U_i is the number of users who proceed to step i;
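The three formulas above can be checked with a short Python sketch; the user counts are hypothetical example numbers, with U given as [U_1, U_2, U_3, U_4] for the four levels:

```python
import math

def level_contributions(U):
    """Score contribution V_i of each reading level (browse, bookmark,
    shallow read, deep read) from the user counts U = [U_1, ..., U_4].

    p_i = U_i / U_1                        retention relative to browsing
    q_i = U_i / U_{i-1} (i >= 2), q_1 = 1  step-to-step churn rate
    V_i = ((p_i + q_i) / p_i) * exp(1 - p_i)
    """
    p = [u / U[0] for u in U]
    q = [1.0] + [U[i] / U[i - 1] for i in range(1, len(U))]
    return [((p[i] + q[i]) / p[i]) * math.exp(1 - p[i]) for i in range(len(U))]

# e.g. 100 users browsed, 40 bookmarked, 20 read shallowly, 5 read deeply
V = level_contributions([100, 40, 20, 5])
# V[0] == 2.0 exactly, since p_1 = q_1 = 1; deeper (harder to reach)
# levels receive larger score contributions.
```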
2.3 Accesses to a page are ordered in time: a user who accesses and scores the page earlier contributes more to it. The importance of a page is calculated from the scoring of critical behavior nodes, and the comprehensive importance score of a page is as follows:
s_j = Σ_{u∈R_j} (W_uj × S_uj) / Σ_{u∈R_j} W_uj

W_uj = log_2(T_j / (t_uj − t_j)) if t_uj ≠ t_j;  W_uj = log_2(T_j) if t_uj = t_j

S_uj = Σ_{i=1}^{L} V_ij

In the above formulas: s_j is the score value of page j; W_uj is the contribution weight of user u to page j; R_j is the set of users who have accessed page j; T_j is the total time over which page j has been accessed; t_uj is the time at which user u first accessed page j; t_j is the time at which page j was first accessed; S_uj is the sum of the score values of the critical behavior steps reached by user u on page j; V_ij is the score value of the i-th critical behavior step reached by user u on page j; and L is the reading depth, i.e. the number of critical steps, that user u reached on page j;
2.4 With the above scoring method, a quantified importance score can be given to every page of a book. Because of the diversity of the reading population, and to avoid pages receiving high scores merely because few users accessed them, the number of accessing users and the score are normalized in the actual page evaluation, giving the final comprehensive page scoring formula as follows:
PageScore_j = [log u_j − log ū] + [log_2 s_j − log_2 s̄]

In the above formula: u_j is the number of users who browsed page j; s_j is the score of page j; ū and s̄ are the corresponding mean values over the pages; and PageScore_j is the comprehensive score of the page. By comparison with the mean values, the comprehensive score is high only when both the number of users browsing the page and the readers' score for the page are high. According to the characteristics of user reading behavior in book reading, a page importance evaluation system based on user reading behavior is thus established: user behavior is quantified through the four reading levels of a page, the evaluation contributions of the four levels are calculated to capture the difficulty of a user advancing from the browsing level to the deep-reading level, and the importance of a page is finally quantified from the reading behavior of the user group on that page.
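A small sketch of the comprehensive scoring formula in Python. It assumes (as the comparison-with-the-mean wording implies) that the bar terms are the averages of the user counts and scores over all pages being evaluated:

```python
import math

def page_scores(users, scores):
    """PageScore_j = (log u_j - log mean(u)) + (log2 s_j - log2 mean(s)).

    users  : browsing user counts u_j per page (all > 0)
    scores : behavior-based scores s_j per page (all > 0)
    """
    u_mean = sum(users) / len(users)
    s_mean = sum(scores) / len(scores)
    return [
        (math.log(u) - math.log(u_mean)) + (math.log2(s) - math.log2(s_mean))
        for u, s in zip(users, scores)
    ]

# A page only scores high when it is both widely browsed and highly rated.
```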
3. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 2) is:
3.1 Step 1) gives the score PageScore_j of page j, which reflects the importance of page j in the book; at the same time, an underlined sentence has a relative importance within that page. The relation between sentence importance and page score is as follows:

w_i = (L_i · PageScore_j) / Σ_{i=1}^{n} (L_i · PageScore_j) if L_i ≠ 0;  w_i = 0 if L_i = 0

In the above formula, w_i represents the current weight value of sentence v_i. Suppose the given set of document sentences is V = {v_1, v_2, …, v_n}, where v_i represents the i-th sentence in the set V. The sentences underlined by the user are placed at the front of the set; assuming the first k sentences are the ones the user underlined, the weight values of the remaining sentences are obtained through their relation to the first k sentences;
3.2 Let dis denote a distance metric on the set V, so that the distance dis(v_i, v_j) between every pair of sentences v_i and v_j can be obtained. Let the mapping f: V → R represent a ranking function that assigns a weight value f_i to each sentence v_i, and write the vectors f = [f_1, …, f_n]^T and w = [w_1, …, w_n]^T, where w_i ≠ 0 if sentence v_i was underlined and w_i = 0 otherwise; w_i represents the initial weight value of each sentence;
3.3 The weight propagation algorithm on the data manifold structure is expressed as follows:
Step 1: Compute the pairwise distances dis(v_i, v_j) between the sentence vectors and sort them in ascending order; following this order, connect an edge between the pair of nodes corresponding to each sentence pair until a connected graph is obtained;
Step 2: Define the affinity matrix W such that W_ij = exp[−dis^2(v_i, v_j) / (2σ^2)] if there is an edge between the points corresponding to v_i and v_j, W_ij = 0 if there is no edge, and W_ii = 0;
Step 3: Symmetrically normalize W to obtain the matrix S: S = D^(−1/2)·W·D^(−1/2), where D is the diagonal matrix with diagonal elements D_ii = Σ_{j=1}^{n} W_ij;
Step 4: Iterate f(t+1) = α·S·f(t) + (1 − α)·w until convergence, where α is a parameter with value range [0, 1);
Step 5: Let f_i* denote the limit of the sequence {f_i(t)}; the resulting limit of the sentence weights is the sentence weight vector f* = [f_1*, …, f_n*]^T;
3.4 In Step 4, the parameter α specifies the relative contributions of the neighboring nodes' weight values and of the initial weight value to each node's weight. Because the matrix S in the algorithm is symmetric, the propagation of weight values is symmetric; and the convergence value of the sequence {f(t)} can be computed in closed form as f* = (I − αS)^(−1)·w (up to a constant factor, which does not affect the ranking). Through this propagation of weight values, a reasonable weight value is obtained for each sentence of the book section.
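Steps 2 to 5 of the propagation can be sketched with NumPy. The illustration below takes a dense affinity matrix W as given (skipping the connected-graph construction of Step 1); the iteration converges to (1 − α)(I − αS)^(−1)·w, which is proportional to the closed form stated in the text and therefore yields the same sentence ranking:

```python
import numpy as np

def manifold_ranking(W, w, alpha=0.6, iters=500):
    """Propagate initial sentence weights w over the affinity matrix W:
        f(t+1) = alpha * S f(t) + (1 - alpha) * w,
    with S = D^{-1/2} W D^{-1/2} the symmetrically normalized affinity.
    """
    W = np.asarray(W, dtype=float)
    np.fill_diagonal(W, 0.0)                  # W_ii = 0 (Step 2)
    d = W.sum(axis=1)                         # D_ii = sum_j W_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt           # Step 3
    w = np.asarray(w, dtype=float)
    f = w.copy()
    for _ in range(iters):                    # Step 4
        f = alpha * (S @ f) + (1 - alpha) * w
    return f                                  # Step 5: converged weights
```

Sentences underlined by the user seed the vector w; after propagation, every sentence carries a weight reflecting both the underlining and its position on the sentence manifold.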
4. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 3) is:
4.1 The weight value f_i* of each book-section sentence v_i has been obtained; f_i* reflects the importance of sentence v_i within the section. The n weight values f_i* are taken as the diagonal elements of a matrix F, i.e. F_ii = f_i*, giving the diagonal matrix F, which is added to the document summary generating algorithm based on data reconstruction;
4.2 In the document summary generating process, the objective function of the linear nonnegative reconstruction algorithm is redefined as follows:

min_{a_i,β} J = Σ_{i=1}^{n} { f_i*·||v_i − V^T·a_i||^2 + Σ_{j=1}^{n} a_ij^2 / β_j } + γ·||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n

In the above formula, the weight value f_i* of each book-section sentence v_i has been added to the sentence selection process. The constraint a_ij ≥ 0 means that the method only allows additive combinations of sentences in the ensemble space, not subtractive ones. Meanwhile β = [β_1, β_2, …, β_n]^T is an auxiliary variable; if β_j = 0, then a_1j, …, a_nj are all 0, which means the candidate sentence of column j is not selected; γ is the regularization parameter;
4.3 The objective function of the document summary generating algorithm based on data reconstruction is a convex optimization problem, so a globally optimal solution can be guaranteed. Fixing a_i and setting the derivative of J with respect to β to 0, the minimizing solution for β is as follows:

β_j = sqrt( Σ_{i=1}^{n} a_ij^2 / γ )

After the minimizing solution for β has been obtained, the minimization problem under the nonnegativity constraint can be solved with the Lagrangian method;
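As a consistency check (an added derivation step, not in the original text): with a_i fixed, only the terms Σ_j a_ij²/β_j and γ·||β||_1 of J depend on β_j, so setting the partial derivative to zero gives the stated minimal solution:

```latex
\frac{\partial J}{\partial \beta_j}
  = -\sum_{i=1}^{n}\frac{a_{ij}^{2}}{\beta_j^{2}} + \gamma = 0
\quad\Longrightarrow\quad
\beta_j = \sqrt{\frac{1}{\gamma}\sum_{i=1}^{n} a_{ij}^{2}}
```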
4.4 Let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0 with A = [a_ij]; the Lagrangian L is as follows:

L = J + Tr[αA^T] = Tr[F(V − AV)(V − AV)^T + diag(β)^(−1)·A^T·A] + γ·||β||_1 + Tr[αA^T],  α = [α_ij]

where F is the diagonal matrix of step 4.1, whose diagonal elements are f_1*, …, f_n*; diag(β) is likewise a diagonal matrix, whose diagonal elements are β_1, …, β_n;
4.5 Differentiating the Lagrangian L with respect to A gives:

∂L/∂A = −2FVV^T + 2FAVV^T + 2A·diag(β)^(−1) + α

Setting ∂L/∂A = 0, α can be expressed as follows:

α = 2FVV^T − 2FAVV^T − 2A·diag(β)^(−1)
According to the Karush-Kuhn-Tucker condition α_ij·a_ij = 0, multiplying each term of the above formula by a_ij gives the following equation:

(F V V^T)_ij·a_ij − (F A V V^T)_ij·a_ij − (A·diag(β)^(−1))_ij·a_ij = 0

From the above, the following update formula is obtained:

a_ij ← a_ij · (F V V^T)_ij / [F A V V^T + A·diag(β)^(−1)]_ij

The above update formula is iterated until convergence, finally yielding the summary sentences of the book section.
CN201410090143.6A 2014-03-12 2014-03-12 Books chapters and sections abstraction generating method based on books reading behavior Active CN103885935B (en)

Publications (2)

Publication Number Publication Date
CN103885935A true CN103885935A (en) 2014-06-25
CN103885935B CN103885935B (en) 2016-06-29




