CN104615779A

CN104615779A - Method for personalized recommendation of Web text

Info

Publication number: CN104615779A
Application number: CN201510090280.4A
Authority: CN
Inventors: 尹子都; 岳昆; 张骥先; 武浩; 刘惟一
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2015-02-28
Filing date: 2015-02-28
Publication date: 2015-05-13
Anticipated expiration: 2035-02-28
Also published as: CN104615779B

Abstract

The invention discloses a method for personalized recommendation of Web text. The method comprises the following steps: performing feature extraction on a number of Web texts generated before a certain time t to obtain a feature matrix E of the Web text set, and then clustering to obtain n categories; in addition, for each Web text o_j in the subset of Web texts involved in the behavior of a user u_i before time t, calculating its preference influence degree d_j on user u_i from the time span h_j between the time the text was generated and time t, obtaining the category number-influence degree pair c_j of Web text o_j, and generating a dynamic preference vector for user u_i; and, if the user's preference influence degree for the category of a Web text to be recommended is greater than or equal to a threshold τ, recommending that Web text to the user. The method takes into account how the influence of a user's historical behavior on current preference changes over time, so its recommendations are more accurate, dynamic, and closer to actual conditions.

Description

A personalized recommendation method for Web text

Technical field

The invention belongs to the technical field of massive information processing and data mining, and more specifically relates to a personalized recommendation method for Web text, which derives user preferences from historical user-behavior data and recommends to users the Web texts they are interested in or potentially interested in.

Background technology

The emergence and spread of the Internet have met users' demand for information in the information age, but the evolution of the network and the improvement of people's cognitive abilities have continually accelerated the rate at which information is generated.

Web text is any Web information expressed as text; Internet news, microblog posts, and the textual descriptions or reviews of goods on e-commerce sites are all typical examples. With the rapid development and spread of Internet technology, large amounts of Web text are produced and have become an important carrier of Internet information. The amount of Web text that users can obtain and browse exceeds what they can actually process, giving rise to the information overload problem; what users now need is to obtain the information they require as efficiently as possible.

For personalized recommendation of Web text, one needs to analyze the records of a user's browsing, rating, following, or forwarding of Web texts over a past period, i.e., the historical data of user behavior, to calculate the user's preferences; at the same time, Web texts are processed and their features extracted, and the Web texts that satisfy a user's preferences are pushed to that user.

A Web text recommendation system mainly processes and recommends Web information expressed as text, and comprises a user modeling module, a Web text modeling module, and a recommendation method module. The Web text modeling module depends on the user modeling module, and the recommendation method module must take both into account, so the user modeling module and its associated methods are the core of the whole recommendation system. An effective user model and a corresponding matching mechanism are therefore needed. As is known, user modeling is based on the historical data of user behavior, i.e., the user's past browsing, rating, following, or forwarding of Web texts; from these data a user preference model is built, and personalized recommendation of Web text is finally carried out for the specific user.

Lu Meilian et al. proposed a "Topic-based personalized research direction recommendation system and recommendation method" (Chinese invention patent application, publication No. CN103425799A, published December 4, 2013), which uses the historical data of user browsing records to perform user modeling; Wang Xiaolong et al. proposed "A click-feedback personalized recommendation system" (Chinese invention patent, grant publication No. CN102685565B, granted July 16, 2014), a personalized recommendation system fused with a related-recommendation system that automatically adjusts results based on click feedback and historical user-preference data, thereby producing more accurate recommendations; Zhao Yanbin et al. proposed a "Community-based related post recommendation system and recommendation method" (Chinese invention patent application, publication No. CN103823805A, published May 28, 2014), which obtains recommendation results for a specific user from historical user-preference data and the correlations among them; Wang Licai et al. (Journal of Software, 2012, No. 1) proposed "a method for obtaining user preferences based on contextual information of users' historical behavior"; Zhong Xiaowu et al. proposed "A recommendation system based on domain experts" (Chinese invention patent application, publication No. CN102479202A, published May 30, 2012), which mines, from item data, user data, and historical user-behavior data, users' ratings of item quality, the fields users are interested or potentially interested in, and expert-user data, and computes a list of neighboring experts for the current user, which is returned to the user as the recommendation result set.

Although existing personalized recommendation methods for Web text take historical user-behavior data into account, their recommendation accuracy still needs improvement.

Summary of the invention

The object of the invention is to provide, on the basis of the prior art, a personalized recommendation method for Web text that further improves the accuracy of recommending to users the Web texts they are interested in or potentially interested in.

To achieve the above object, the personalized recommendation method for Web text of the present invention is characterized by comprising the following steps:

(1) Web text feature extraction

1.1) Generation of Web text keyword sets

A number of Web texts generated before a certain time t form the Web text set; the content of each Web text in the set is segmented into words and stop words are removed, yielding the keyword set that describes the Web text;

1.2) Generation of Web text feature dimensions

The keyword set of each Web text is scanned in turn, and its keywords are added to an ordered set without duplicate elements, giving the ordered keyword set S = {s_1, s_2, ..., s_m}, where m is the size of S, i.e., the number of distinct keywords; each keyword in S serves as one dimension for measuring Web texts, thereby establishing the feature dimensions of Web text;

1.3) Generation of the Web text feature matrix

For each Web text in the Web text set, the term frequency of each keyword of S that appears in the text is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword in S does not appear in the Web text, the value of the corresponding dimension is 0; this m-dimensional row vector is the feature vector of the Web text;

The feature vectors of all Web texts form the feature matrix E of the Web text set; E has m columns and as many rows as there are Web texts;

(2) Web text model construction

The k-means clustering algorithm is used to cluster the feature vectors of the Web texts in feature matrix E, dividing the Web texts of the Web text set into several categories that form the category set R = {r_1, r_2, ..., r_n}, where n is the total number of categories, r_z (z = 1, 2, ..., n) denotes a class label, and z is the class number;

(3) Dynamic user preference modeling

Let U = {u_1, u_2, ..., u_l} denote the user set, where l is the number of users; the subset of Web texts involved in the behavior of user u_i (i = 1, 2, ..., l) before time t is O = {o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span from the moment Web text o_j (j = 1, 2, ..., v) was generated to time t is h_j;

3.1) Generation of the influence degree of the Web texts involved in user behavior on user preference

The influence degree of Web text o_j on the preference of user u_i is d_j:

d_{j} = \frac{G(h_{j})}{\sum_{k=1}^{v} G(h_{k})} \qquad (1)

where G(h_j) and G(h_k) can be expressed as:

\begin{matrix} G(h_{j}) = e^{-\frac{h_{j}}{b}} \\ G(h_{k}) = e^{-\frac{h_{k}}{b}} \end{matrix} \qquad (2)

In formula (2), e is the base of the natural logarithm and b is the relative memory strength, which is set empirically (1 ≤ b ≤ 10);

3.2) Generation of Web text categories

The category of Web text o_j is looked up: o_j is searched for in the Web text set and the class number z_j of the category it belongs to is returned; combined with the influence degree of o_j calculated in step 3.1), this gives the class number-influence degree pair of Web text o_j, denoted c_j = (z_j, d_j);

The set of class number-influence degree pairs of the Web texts involved in all of the user's behavior is denoted C = {c_1, c_2, ..., c_v};

3.3) Generation of the user's dynamic preference vector

If two class number-influence degree pairs c_m and c_n in set C (m, n = 1, 2, ..., v; m ≠ n) have the same class number, the influence degree of c_n is added to that of c_m and c_n is removed; this is repeated until no class number is repeated among the pairs, at which point the number of pairs is v′ (v′ ≤ v), and these v′ class number-influence degree pairs form the user preference vector, i.e., the dynamic preference vector of user u_i;

(4) Personalized recommendation of Web text

Web texts generated after time t are the Web texts to be recommended;

4.1) First, keywords are extracted from the Web text to be recommended using the method of step 1.1), giving its keyword set, and its feature vector is obtained using the method of step 1.2); then the center coordinates of each category in category set R are calculated, i.e., the centroid of all Web text feature vectors belonging to that category; next, the distance from the feature vector of the Web text to be recommended to each category center is calculated; finally, the Web text to be recommended is assigned to the corresponding category according to the MMD (minimax distance) classification algorithm, giving the class number it belongs to;

4.2) Generation of the Web texts the user likes

The dynamic preference vectors of all users in user set U are searched to find every user whose vector contains the class number of the Web text to be recommended; given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if the user's preference influence degree for the category of the Web text to be recommended is not less than τ, the Web text is recommended to that user.

The object of the present invention is achieved as follows:

The personalized recommendation method for Web text of the present invention performs feature extraction on a number of Web texts generated before a certain time t to obtain the feature matrix E of the Web text set, and then clusters them into n categories. At the same time, for each Web text o_j in the subset of Web texts involved in the behavior of a user u_i before time t, the time span h_j from the moment o_j was generated to time t is used to calculate its preference influence degree d_j on user u_i, giving the class number-influence degree pair c_j of o_j and, from these pairs, the dynamic preference vector of user u_i. At recommendation time, the Web text to be recommended is assigned to a category according to the distance from its feature vector to each category center; the dynamic preference vectors of all users in user set U are searched to find every user whose vector contains the class number of the Web text to be recommended; and if a user's preference influence degree for that category is not less than the threshold τ, the Web text is recommended to that user.

The invention takes into account the dynamic influence of a user's historical behavior on current preference as it changes over time, and provides a personalized recommendation method for Web text that is more accurate, dynamic, and closer to actual conditions. Compared with existing recommendation methods, it better reflects the dynamic influence of historical behavior on current preference, rather than treating all historical behavior as equally influential regardless of time.

Description of the drawings

Fig. 1 is a flow chart of one embodiment of the personalized recommendation method for Web text.

Detailed description

Specific embodiments of the present invention are described below with reference to the accompanying drawing so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.

Embodiment

In this embodiment, as shown in Fig. 1, the personalized recommendation method for Web text comprises: step (1), Web text feature extraction, in which features are extracted from Web texts by word segmentation, keyword addition, and keyword term-frequency counting; step (2), Web text model construction, in which a concept matrix of the Web text set is generated based on the Chinese thesaurus and then clustered to build the Web text model; step (3), dynamic user preference modeling, in which a dynamic user preference model, expressing user preferences that change continually over time, is built on a memory curve model from the historical data of user behavior, i.e., the subset of Web texts involved in the user's behavior before time t, together with its time information; and step (4), personalized recommendation of Web text, in which the similarity between user preferences and Web text features is considered and user recommendation lists are built by matching, completing the personalized recommendation of Web text.

The four steps are described in detail below.

1. Web text feature extraction

The Web text set consists of a number of Web texts generated before a certain time t. The purpose of Web text feature extraction is to generate the Web text feature matrix.

1.1 Generation of Web text keyword sets

The content of each Web text in the Web text set is segmented into words and stop words are removed, yielding the keyword set that describes the Web text.

1.2 Generation of Web text feature dimensions

The keyword set of each Web text is scanned in turn, and its keywords are added to an ordered set without duplicate elements, giving the ordered keyword set S = {s_1, s_2, ..., s_m}, where m is the size of S, i.e., the number of distinct keywords. Each keyword in S serves as one dimension for measuring Web texts, thereby establishing the feature dimensions of Web text.

1.3 Generation of the Web text feature matrix

For each Web text in the Web text set, the term frequency of each keyword of S that appears in the text is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword in S does not appear in the Web text, the value of the corresponding dimension is 0. This row vector is the feature vector of the Web text, so that each Web text is represented by an m-dimensional row vector in the space spanned by the ordered keyword set S.

The feature vectors of all Web texts form the feature matrix E of the Web text set; E has m columns and as many rows as there are Web texts.
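Steps 1.2 and 1.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes word segmentation and stop-word removal (step 1.1) have already produced a keyword list per text, and the function name `build_feature_matrix` and the toy keyword lists are hypothetical.

```python
from collections import Counter

def build_feature_matrix(keyword_lists):
    """Return the ordered keyword set S and the term-frequency matrix E."""
    ordered_keywords = []   # ordered set S, no duplicates, first-seen order
    seen = set()
    for keywords in keyword_lists:
        for word in keywords:
            if word not in seen:
                seen.add(word)
                ordered_keywords.append(word)
    matrix = []
    for keywords in keyword_lists:
        counts = Counter(keywords)
        # Counter yields 0 for keywords of S that do not occur in this text.
        matrix.append([counts[word] for word in ordered_keywords])
    return ordered_keywords, matrix

# Hypothetical keyword lists, one per Web text (segmentation assumed done).
docs = [["car"], ["protection"], ["suv"], ["rifle"], ["rifle", "rifle", "car"]]
S, E = build_feature_matrix(docs)
print(S)      # ['car', 'protection', 'suv', 'rifle']
print(E[4])   # [1, 0, 0, 2]
```

Each row of `E` has one entry per keyword in `S`, so the number of columns grows with the vocabulary, which is what motivates the dimensionality reduction in step 2.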

2. Web text model construction

In this embodiment, the purpose of Web text model construction is to generate the concept matrix of the Web text set and then the Web text category set R, laying the foundation for personalized recommendation of Web text.

To obtain the Web text categories, the well-known k-means clustering algorithm can be used to cluster the Web texts in the Web text set, i.e., to cluster the feature vectors of the Web texts in feature matrix E, thereby building the Web text model.

In addition, as Web text content keeps growing, the ordered keyword set S also keeps growing, i.e., the number of dimensions used to measure Web texts keeps increasing. Therefore, exploiting the correlations that may exist among the keywords in S, the word relations in the Chinese thesaurus are used to map different keywords with the same meaning to the same concept, thereby reducing the dimensionality of S.

In this embodiment, the specific steps of Web text model construction are as follows:

2.1 Generation of the concept matrix of the Web text set

The basic idea of the Chinese thesaurus is a mapping from a given thesaurus W of p words to a concept set W′ of q concepts, where p > q, so that different words can be mapped to the same concept.

On this basis, each keyword s_x (x = 1, 2, ..., m) in the ordered keyword set S is looked up in thesaurus W in turn. If a word identical to s_x is found in W, s_x is replaced by the concept corresponding to that word, and it is checked whether the concept of s_x duplicates that of an earlier keyword in S that has already been replaced. If there is a keyword s_y (y = 1, 2, ..., x-1) whose concept is the same as that of s_x, then s_x is merged into s_y as follows: by a column transformation of feature matrix E, the values of column x of E are added to column y, column x is removed from E, and keyword s_x is removed from S.

After all keywords in S have been processed, the concept matrix E′ of the Web text set is obtained, the ordered keyword set S becomes the ordered keyword set S′, and the dimension drops from m to m′; each row of E′ is the concept vector of one Web text. Compared with feature matrix E, concept matrix E′ has fewer columns: concept mapping reduces the feature dimension of the Web text set from m to m′, which serves as dimensionality-reduction preprocessing for the clustering of Web texts.
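The column merging of step 2.1 can be sketched as below. This is an illustrative sketch under assumptions: the thesaurus is modelled as a plain dictionary from word to concept, keywords absent from it are kept unchanged, and the function name `map_to_concepts` and the toy data are hypothetical.

```python
def map_to_concepts(ordered_keywords, matrix, thesaurus):
    """Merge columns of E whose keywords share a concept; return (S', E')."""
    concepts = []                        # ordered concept set S'
    column_of = {}                       # concept -> column index in E'
    new_matrix = [[] for _ in matrix]
    for x, keyword in enumerate(ordered_keywords):
        # Assumption: a keyword not found in the thesaurus is kept as is.
        concept = thesaurus.get(keyword, keyword)
        if concept in column_of:
            y = column_of[concept]
            for row, new_row in zip(matrix, new_matrix):
                new_row[y] += row[x]     # add column x onto the earlier column y
        else:
            column_of[concept] = len(concepts)
            concepts.append(concept)
            for row, new_row in zip(matrix, new_matrix):
                new_row.append(row[x])
    return concepts, new_matrix

# Hypothetical thesaurus and feature matrix.
thesaurus = {"car": "vehicle", "suv": "vehicle", "rifle": "weapon"}
S = ["car", "protection", "suv", "rifle"]
E = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
S2, E2 = map_to_concepts(S, E, thesaurus)
print(S2)   # ['vehicle', 'protection', 'weapon']
print(E2)   # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
```

Building a new matrix instead of deleting columns in place avoids index shifts while scanning; the resulting S′ and E′ are the same as those produced by the add-then-remove column transformation described above.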

2.2 Generation of Web text categories

The k-means clustering algorithm is used to cluster the Web text set; during clustering, the mathematically simple vector inner product is used to measure the distance between Web texts. Clustering divides the Web texts in the Web text set into several categories that form the category set R = {r_1, r_2, ..., r_n}, where n is the total number of categories, r_z (z = 1, 2, ..., n) denotes a class label, and z is the class number.

Thus, in this embodiment, clustering the feature vectors of the Web texts in feature matrix E means first applying the concept mapping to E to obtain concept matrix E′, and then using the k-means clustering algorithm to cluster the concept vectors of the Web texts in E′.

3. Dynamic user preference modeling

Let U = {u_1, u_2, ..., u_l} denote the user set, where l is the number of users; the subset of Web texts involved in the behavior of user u_i (i = 1, 2, ..., l) before time t is O = {o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span from the moment Web text o_j (j = 1, 2, ..., v) was generated to time t is h_j. The dynamic user model of user u_i is expressed as a user preference vector, which describes how the preferences of user u_i change continually over time. The purpose of dynamic user preference modeling is to generate this dynamic user preference vector.

3.1 Generation of the influence degree of the Web texts involved in user behavior on user preference

The influence degree of Web text o_j on the preference of user u_i is calculated as d_j; in the dynamic user preference model, d_j is the proportion of user u_i's preference for Web text o_j among all of u_i's preferences:

d_{j} = \frac{G(h_{j})}{\sum_{k=1}^{v} G(h_{k})} \qquad (1)

G(h_j) denotes the memory curve model; G(h_j) > 0 is a decreasing function of the time span h_j, expressing that the influence of Web text o_j on the preference of user u_i declines as time passes. It can be expressed as:

\begin{matrix} G(h_{j}) = e^{-\frac{h_{j}}{b}} \\ G(h_{k}) = e^{-\frac{h_{k}}{b}} \end{matrix} \qquad (2)

In formula (2), e is the base of the natural logarithm and b is the relative memory strength; the value range of b is 1 ≤ b ≤ 10, and the specific value is set empirically.
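Formulas (1) and (2) amount to normalizing exponentially decayed weights. A minimal sketch follows; the function name `influence_degrees` and the sample time spans are hypothetical, and the spans are assumed to be measured in days.

```python
import math

def influence_degrees(time_spans, b=5.0):
    """d_j = G(h_j) / sum_k G(h_k), with G(h) = exp(-h / b)."""
    weights = [math.exp(-h / b) for h in time_spans]   # formula (2)
    total = sum(weights)
    return [w / total for w in weights]                # formula (1)

# Hypothetical user who touched two texts 31 and 12 days before time t.
d = influence_degrees([31, 12], b=5)
print([round(x, 4) for x in d])   # [0.0219, 0.9781]
```

A smaller b makes older behavior fade faster; the normalization guarantees that the influence degrees of one user always sum to 1.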

3.2 Generation of Web text categories

The category of Web text o_j is looked up: o_j is searched for in the Web text set and the class number z_j of the category it belongs to is returned. Combined with the influence degree of o_j calculated in step 3.1, this gives the class number-influence degree pair of Web text o_j, denoted c_j = (z_j, d_j).

The set of class number-influence degree pairs of the Web texts involved in all of the user's behavior is denoted C = {c_1, c_2, ..., c_v}.

3.3 Generation of the user's dynamic preference vector

If two class number-influence degree pairs c_m and c_n in set C (m, n = 1, 2, ..., v; m ≠ n) have the same class number, the influence degree of c_n is added to that of c_m and c_n is removed; this is repeated until no class number is repeated among the pairs, at which point the number of pairs is v′ (v′ ≤ v), and these v′ class number-influence degree pairs form the user preference vector, i.e., the dynamic preference vector of user u_i. Each element of the user preference vector is a class number-influence degree pair, whose influence degree expresses how much user u_i likes the corresponding one of the v′ categories; all influence degrees sum to 1.

Because the invention is based on a dynamic user preference model, it reflects how user preferences change over time: the smaller the time span h_j, the more recent a given preference of the user and the better it represents the user's current preference, so the resulting Web text recommendations match the user's current preference more closely.
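The merging in step 3.3 is a group-by-sum over class numbers. The sketch below is illustrative only; the function name `dynamic_preference_vector` and the sample pairs are hypothetical.

```python
def dynamic_preference_vector(pairs):
    """Merge (class_number, influence_degree) pairs that share a class number."""
    merged = {}
    for class_number, influence in pairs:
        merged[class_number] = merged.get(class_number, 0.0) + influence
    return sorted(merged.items())   # v' pairs, one per distinct class number

# Hypothetical pairs c_j = (z_j, d_j) for one user; the d_j sum to 1.
pairs = [(2, 0.1), (3, 0.3), (3, 0.6)]
vector = dynamic_preference_vector(pairs)
print(len(vector))   # 2
```

Since the input influence degrees sum to 1 and merging only adds them, the influence degrees in the resulting vector also sum to 1.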

4. Personalized recommendation of Web text

A Web text to be recommended is a Web text generated after time t; all such texts form the set of Web texts to be recommended, denoted A. Features are extracted from each Web text in A using the method of step 1 above, and each of these Web texts is assigned to one of the previously obtained Web text categories.

For each Web text to be recommended, the class number z_s of the category it belongs to is obtained; the users whose dynamic preference vectors contain class number z_s are then found; given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if a user's preference influence degree is not less than τ, the Web text is recommended to that user.

Specifically, personalized recommendation of Web text comprises the following steps:

4.1 The Web texts in set A are classified to obtain the class number of the category each belongs to.

Keywords are extracted from each Web text to be recommended in set A using the method of step 1.1, giving its keyword set, and its feature vector is obtained using the method of step 1.3. Its concept vector is then obtained using the method of step 2.1, i.e., the concept mapping method based on the Chinese thesaurus: for two keywords in the keyword set of the Web text to be recommended that map to the same concept, the value of the dimension corresponding to the later keyword in the feature vector is added to the value of the dimension corresponding to the earlier keyword, and the dimension corresponding to the later keyword is deleted, giving the concept vector of the Web text to be recommended.

Then the center coordinates of each category in category set R are calculated. In this embodiment, the polygon centroid calculation method is used: all Web text concept vectors of a category are regarded as the vertices of a polygon, and the centroid coordinates are calculated.

Next, the well-known formula for the distance between two points is used to calculate the distance from the concept vector of each Web text in set A to the center coordinates of each category.

Finally, according to the well-known MMD (minimax distance) classification algorithm, each Web text in set A is assigned to its corresponding category, giving the class number it belongs to.
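Step 4.1 can be approximated by a nearest-centroid classifier, sketched below. This is a simplification under stated assumptions rather than the MMD algorithm itself: the category center is taken as the component-wise mean of the member vectors, the distance is Euclidean, and the text is assigned to the category with the smallest distance. The function names and the toy data are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of the concept vectors of one category."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(vector, members_by_class):
    """Return the class number whose center is nearest to the vector."""
    best_class, best_distance = None, math.inf
    for class_number, members in sorted(members_by_class.items()):
        distance = math.dist(vector, centroid(members))   # Euclidean distance
        if distance < best_distance:
            best_class, best_distance = class_number, distance
    return best_class

# Hypothetical categories: class number -> concept vectors of its members.
members_by_class = {
    1: [[1, 0, 0], [1, 0, 0]],
    2: [[0, 1, 0]],
    3: [[0, 0, 1], [0, 0, 1]],
}
print(classify([0, 0, 1], members_by_class))   # 3
```

Ties are resolved in favor of the lowest class number because categories are visited in sorted order and only a strictly smaller distance replaces the current best.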

4.2 Generation of the Web texts the user likes

The dynamic preference vectors of all users in user set U are searched to find every user whose vector contains the class number of the Web text to be recommended. Given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if the user's preference influence degree for the category of the Web text to be recommended is not less than τ, the Web text is recommended to that user and placed in the user's recommendation list. The Web texts in a user's recommendation list are sorted by their respective preference influence degrees.

In the present invention, the influence degree threshold τ is used to exclude users who do not have a strong liking for the category, thereby improving the relevance and quality of the recommendation results.
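The threshold filtering and ranking of step 4.2 can be sketched as follows. The data layout (preference vectors as lists of pairs, texts identified by string ids), the function name `build_recommendation_lists`, and the sample values are assumptions made for illustration; descending order is one reasonable reading of "sorted by preference influence degree".

```python
def build_recommendation_lists(text_classes, preference_vectors, tau=0.3):
    """Return user -> list of (text_id, influence degree), highest first."""
    lists = {user: [] for user in preference_vectors}
    for text_id, class_number in text_classes.items():
        for user, vector in preference_vectors.items():
            # A class missing from the vector counts as influence degree 0.
            influence = dict(vector).get(class_number, 0.0)
            if influence >= tau:          # "not less than tau"
                lists[user].append((text_id, influence))
    for items in lists.values():
        items.sort(key=lambda item: item[1], reverse=True)
    return lists

# Hypothetical users and texts to be recommended (text id -> class number).
preference_vectors = {"user_a": [(1, 1.0)], "user_b": [(2, 0.4), (3, 0.6)]}
text_classes = {"t1": 3, "t2": 2, "t3": 1}
lists = build_recommendation_lists(text_classes, preference_vectors, tau=0.3)
print(lists["user_b"])   # [('t1', 0.6), ('t2', 0.4)]
```

Raising τ shortens the lists: with τ = 0.5 the hypothetical `user_b` would only receive `t1`.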

In steps 1 to 4 above, the Web text model is built starting from Web text feature extraction; a dynamic user preference model is then obtained from the memory curve model and the user's historical behavior, reflecting the dynamic influence of historical user-behavior data on current preference; finally, personalized recommendation based on user preference is completed according to the relation between the Web texts to be recommended and the users.

Compared with the prior art, the present invention has the following advantages and positive effects:

(1) On the one hand, it takes into account the dynamic influence of a user's historical behavior on current preference as it changes over time, and provides a personalized recommendation method for Web text that is more accurate, dynamic, and closer to actual conditions. Compared with prior-art recommendation methods, it better reflects the dynamic influence of historical behavior on current preference, rather than treating all historical behavior as equally influential regardless of time.

(2) On the other hand, the concept mapping method based on the Chinese thesaurus takes into account the latent relations among Web text keywords, making the clustering results of Web texts more reasonable and semantically coherent and closer to users' actual usage habits, while also reducing the amount of computation in Web text modeling.

Example: personalized news recommendation based on users' dynamic preferences

In this example, the Web texts are news texts. For the historical behavior of users browsing news, the 5 news texts browsed by 2 users before February 1, 2015, including time information and news content, are shown in Table 1, and the relevant Chinese thesaurus is shown in Table 2. A news item newly generated on February 2, 2015, "Soldier holds Type 95 rifle on guard to cover companions' assault", is the news text to be recommended.

Table 1 shows the user, time, and news text browsing data.

User	Browsing time	News No.	News text browsed
Li Yi	2015-1-1	1	Sina Auto releases its 2015 e-commerce strategy: a social e-commerce platform
Wang Er	2014-12-10	2	Where is the boundary of animal protection?
Li Yi	2015-1-20	3	Spy photos of the production Haval H7, positioned as a mid-to-large sport SUV
Wang Er	2015-1-12	4	Russia to exhibit the world's first amphibious combat rifle
Wang Er	2015-1-19	5	Type 95 rifle faults appear frequently; foreign militaries dare not be too dissatisfied with the PLA

Table 1

Table 2 shows the relevant Chinese thesaurus.

Word	Concept
Automobile	Vehicle
SUV	Vehicle
Rifle	Weapon
Protection

Table 2

(1) Feature extraction of news texts

The content of the news texts in Table 1 is segmented into words, stop words are removed, and keywords are extracted, as shown in Table 3.

News No.	Keyword
1	Automobile
2	Protection
3	SUV
4	Rifle
5	Rifle

Table 3

First, a keyword set without duplicate elements is built; from Table 3, the ordered keyword set S = {automobile, protection, SUV, rifle} is obtained.

Then, in the dimensional space represented by the ordered keyword set S, the feature vector of each news text is built, as shown in Table 4. The feature vectors of the news texts form the feature matrix E of the news text set.

News No.	Feature vector
1	(1,0,0,0)
2	(0,1,0,0)
3	(0,0,1,0)
4	(0,0,0,1)
5	(0,0,0,1)

Table 4

(2) News text model construction

First, concept mapping is applied to the feature vector of each news text using the Chinese thesaurus in Table 2, in order to reduce dimensionality. Concept mapping yields the concept vector of each news text and the corresponding new ordered keyword set S′ = {vehicle, protection, weapon}, as shown in Table 5. The concept vectors of the news texts form the concept matrix of the news text set.

News No.	Concept vector
1	(1,0,0)
2	(0,1,0)
3	(1,0,0)
4	(0,0,1)
5	(0,0,1)

Table 5

Then the k-means algorithm is used to cluster the news texts, i.e., to cluster the concept vectors of the Web texts in the concept matrix of the Web text set, giving 3 categories, as shown in Table 6.

News No.	Category
1	1
2	2
3	1
4	3
5	3

Table 6
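The clustering of the five concept vectors in Table 5 can be sketched with a plain k-means loop. This sketch rests on assumptions not stated in the source: Euclidean distance is used in place of the inner-product measure mentioned in the embodiment, the initial centers are simply the first k distinct vectors, and the input must contain at least k distinct vectors. The function name `k_means` is hypothetical.

```python
import math

def k_means(vectors, k, iterations=10):
    """Plain k-means; returns a 1-based class number for each vector."""
    centers = []
    for v in vectors:                     # assumed init: first k distinct vectors
        if list(v) not in centers:
            centers.append(list(v))
        if len(centers) == k:
            break
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                  for v in vectors]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [label + 1 for label in labels]

concept_vectors = [(1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 1)]
classes = k_means(concept_vectors, k=3)
print(classes)   # [1, 2, 1, 3, 3]
```

With this initialization the class numbers coincide with Table 6; a different initialization could yield the same grouping under permuted class numbers.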

(3) Dynamic user preference modeling

In this embodiment, the relative memory strength b is set to 5, and the preference influence degree of the news texts involved in each user's behavior is calculated. Taking user "Li Yi" as an example, the user browsed 2 news texts in total, whose preference influence degrees are calculated as follows:

d_{1} = \frac{G(h_{1})}{\sum_{\alpha=1}^{2} G(h_{\alpha})} = \frac{e^{-\frac{31}{5}}}{e^{-\frac{31}{5}} + e^{-\frac{12}{5}}} = 0.0219

d_{2} = \frac{G(h_{2})}{\sum_{\alpha=1}^{2} G(h_{\alpha})} = \frac{e^{-\frac{12}{5}}}{e^{-\frac{31}{5}} + e^{-\frac{12}{5}}} = 0.9781

User "Li Yi" browsed the news texts numbered 1 and 3, which, as shown in Table 6, both belong to category 1. Therefore d_1 and d_2 are both influence degrees for category 1, and the user's preference lies entirely in category 1; that is, the influence degree of category 1 on user "Li Yi" is d_1 + d_2 = 1.
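
The calculation above can be reproduced with a few lines of Python. This is a minimal sketch using the values of the embodiment (b = 5, time spans 31 and 12, both texts in category 1); the function names `retention` and `influence_degrees` are hypothetical.

```python
import math


def retention(h, b):
    """Memory-curve weight G(h) = exp(-h / b)."""
    return math.exp(-h / b)


def influence_degrees(spans, b):
    """Normalise the memory-curve weights so that they sum to 1."""
    weights = [retention(h, b) for h in spans]
    total = sum(weights)
    return [w / total for w in weights]


b = 5                # relative memory strength
spans = [31, 12]     # time spans h_1 and h_2 of the two browsed news texts
categories = [1, 1]  # per Table 6, news texts 1 and 3 both belong to category 1

d = influence_degrees(spans, b)
print([round(x, 4) for x in d])  # [0.0219, 0.9781]

# Sum the influence degrees per category to get the dynamic preference vector.
preference = {}
for category, degree in zip(categories, d):
    preference[category] = preference.get(category, 0.0) + degree
print(preference)
```

Because both texts belong to category 1, the two influence degrees add up to a single category number-influence degree pair.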

In the same way, the influence degrees of the news texts involved in the behavior (that is, browsing) of user "Wang Er" can be obtained, as shown in Table 7.

User	Category number of the preference	Influence degree
Li Yi	1	1
Wang Er	2	0.0003
Wang Er	3	0.9997

Table 7

The dynamic preference vector of each user is then obtained. The user preference vectors formed from each user's category number-influence degree pairs are shown in Table 8.

User	Dynamic user preference vector
Li Yi	(1,1)
Wang Er	(2,0.0003),(3,0.9997)

Table 8

(4) News text personalized recommendation

For the news text to be recommended, "soldier holds 95 rifle warning shielding companion assaults", the keyword "rifle" is obtained by feature extraction and projected to the concept "weapon". In the dimensional space corresponding to the ordered keyword set S', the concept vector (0,0,1) is obtained, and the text is therefore assigned to category 3. With the threshold τ set to 0.3, the user "Wang Er", whose preference vector has an influence degree for category 3 not less than τ, is selected, and this news text is added to the recommendation list of "Wang Er", completing the personalized recommendation of the news text.
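
The recommendation step can be sketched as follows. This is only an illustration under assumptions: the category centroids are derived from Tables 5 and 6, nearest-centroid assignment stands in for the classification of the new text, and the identifiers `user_1` and `user_2` are hypothetical keys for the first and second user of Table 8.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_category(vector, centroids):
    """Assign a concept vector to the category with the closest centroid."""
    return min(centroids, key=lambda c: squared_distance(vector, centroids[c]))


def users_to_recommend(category, preference_vectors, tau):
    """Users whose influence degree for the category is not less than tau."""
    return [
        user
        for user, vector in preference_vectors.items()
        if vector.get(category, 0.0) >= tau
    ]


# Category centroids (assumption: means of the concept vectors per category).
centroids = {1: (1.0, 0.0, 0.0), 2: (0.0, 1.0, 0.0), 3: (0.0, 0.0, 1.0)}

# Dynamic preference vectors from Table 8 as {category number: influence degree}.
preference_vectors = {
    "user_1": {1: 1.0},
    "user_2": {2: 0.0003, 3: 0.9997},
}

new_text_vector = (0, 0, 1)  # concept vector of the news text to be recommended
tau = 0.3                    # threshold from the embodiment

category = nearest_category(new_text_vector, centroids)
recipients = users_to_recommend(category, preference_vectors, tau)
print(category, recipients)
```

Only the second user of Table 8 has an influence degree for category 3 that reaches the threshold, so only that user's recommendation list receives the text.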

The present invention uses a memory curve model to describe user preferences that change over time, thereby obtaining dynamic user preferences. Taking the personalized recommendation of Web text as its starting point, it improves the soundness of Web text modeling through concept projection and constructs a Web text personalized recommendation method based on dynamic user preferences, obtaining more accurate recommendation results in a way that better matches actual conditions.

Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, as long as the various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, such changes are apparent, and all inventions and creations that make use of the inventive concept are protected.

Claims

1. A Web text personalized recommendation method, characterized in that it comprises the following steps:

(1) Web text feature extraction

1.1) Web text keyword set generation

1.2) Web text feature dimension generation

1.3) Web text feature matrix generation

(2) Web text model construction

(3) Dynamic user preference modeling

The preference influence degree of Web text o_j on user u_i is d_j:

d_{j} = \frac{G(h_{j})}{\sum_{k = 1}^{v} G(h_{k})} \quad (1)

where G(h_j) and G(h_k) can be expressed as:

G(h_{j}) = e^{-\frac{h_{j}}{b}}

G(h_{k}) = e^{-\frac{h_{k}}{b}} \quad (2)

3.2) Web text category generation

3.3) Dynamic user preference vector generation

(4) Web text personalized recommendation

The Web texts produced after time t are the Web texts to be recommended;

4.2) Generation of the Web texts liked by the user

2. The recommendation method according to claim 1, characterized in that said clustering of the feature vector of each Web text in the feature matrix E using the k-means clustering algorithm is: first, mapping processing is applied to the feature matrix E to obtain the concept matrix E'; then the k-means clustering algorithm is used to cluster the feature vector of each Web text in the concept matrix E';

Said mapping processing is: each keyword s_x (x = 1, 2, ..., m) in the ordered keyword set S is looked up in turn in the thesaurus W; if a word identical to keyword s_x is found in the thesaurus W, keyword s_x is replaced by the concept corresponding to that word, and it is checked whether any earlier keyword in the ordered keyword set S that has already been replaced by a concept duplicates keyword s_x; if there is a keyword s_y (y = 1, 2, ..., x-1) whose corresponding concept is identical to that of keyword s_x, keyword s_x is merged with keyword s_y. Specifically, by a column transformation of the feature matrix E, the values of column x of E are added to column y, column x of E is removed, and keyword s_x is removed from the ordered keyword set S;

After all keywords in the ordered keyword set S have been processed, the concept matrix E' of the Web text set is obtained, the ordered keyword set S becomes the ordered keyword set S', and the dimension drops from m to m', where each row of the concept matrix E' corresponds to the concept vector of one Web text;

In said step 4.1), for two keywords in the keyword set that are mapped to the same concept, the dimension value corresponding to the latter keyword in the feature vector of the Web text to be recommended is added to the dimension value corresponding to the former keyword, and the dimension value corresponding to the latter keyword is deleted, obtaining the concept vector of the Web text to be recommended; the centroid coordinate calculation and classification are then carried out according to the concept vector of the Web text to be recommended.