CN104615779A - Method for personalized recommendation of Web text - Google Patents

Method for personalized recommendation of Web text

Info

Publication number
CN104615779A
Authority
CN
China
Prior art keywords
web text
user
keyword
web
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510090280.4A
Other languages
Chinese (zh)
Other versions
CN104615779B (en)
Inventor
尹子都
岳昆
张骥先
武浩
刘惟一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN201510090280.4A priority Critical patent/CN104615779B/en
Publication of CN104615779A publication Critical patent/CN104615779A/en
Application granted granted Critical
Publication of CN104615779B publication Critical patent/CN104615779B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • G06F16/337Profile generation, learning or modification

Abstract

The invention discloses a method for personalized recommendation of Web text. The method comprises the following steps: performing feature extraction on a number of Web texts generated before a certain time t to obtain a feature matrix E of the Web text set, and then clustering to obtain n categories; meanwhile, for each Web text o_j in the subset of Web texts involved in the behavior of a user u_i before time t, calculating its preference influence degree d_j on user u_i from the time span h_j between the time o_j was generated and time t, thereby obtaining the class number-influence degree pair c_j of o_j and generating the dynamic preference vector of user u_i; if the user's preference influence degree for the category of a Web text to be recommended is greater than or equal to a threshold τ, recommending that Web text to the user. The method takes into account how the influence of a user's historical behavior on current preference changes over time, so the recommendation is more accurate, dynamic, and closer to actual conditions.

Description

A personalized recommendation method for Web text
Technical field
The invention belongs to the technical field of massive information processing and data mining, and more specifically relates to a personalized recommendation method for Web text, which derives user preferences from the historical data of user behavior and recommends Web texts of interest and potential interest to users.
Background art
The emergence and spread of the Internet has met users' demand for information in the information age, but the evolution of the network and the improvement of people's cognitive ability have kept accelerating the rate at which information is generated.
Web text refers to the various kinds of Web information expressed as text; Internet news, microblog content, and the textual descriptions or reviews of goods on e-commerce websites are all typical examples. With the rapid development and popularization of Internet technology, large amounts of Web text are produced and have become an important carrier of Internet information. The amount of Web text available for users to obtain and browse exceeds what users can actually process, which gives rise to the information overload problem; what users need is to obtain the information they require to the greatest extent possible.
For the personalized recommendation of Web text, the historical data of user behavior, i.e., records of a user's browsing, rating, following, or forwarding of Web texts over a past period, must be analyzed to compute the user's preferences; at the same time the Web texts are processed and their features extracted, and the Web texts that satisfy a user's preferences are pushed to that user.
A Web text recommendation system mainly processes and recommends the various kinds of Web information expressed as text, and comprises a user modeling module, a Web text modeling module, and a recommendation method module. The Web text modeling module depends on the user modeling module, and the recommendation method module must take both into account, so the user modeling module and its associated method are the core and key of the whole recommendation system. It is therefore necessary to establish an effective user model and a corresponding matching mechanism. User modeling is known to be based on the historical data of user behavior, i.e., the user's past browsing, rating, following, or forwarding of Web texts; from this data the user preference model is established, and the personalized recommendation of Web text is finally carried out for each specific user.
Lu Meilian et al. proposed a "topic-based personalized research direction recommendation system and recommendation method" (Chinese invention patent application published on December 4, 2013, publication No. CN103425799A), which uses the historical data of user browsing records to perform user modeling; Wang Xiaolong et al. proposed "a click-feedback personalized recommendation system" (Chinese invention patent granted and announced on July 16, 2014, grant publication No. CN102685565B), a personalized recommendation system fused with a related-recommendation system that automatically adjusts its results based on click feedback and the historical data of user preferences, thereby producing more accurate recommendations; Zhao Yanbin et al. proposed a "community-based related post recommendation system and recommendation method" (Chinese invention patent application published on May 28, 2014, publication No. CN103823805A), which gives a method for obtaining recommendations for a specific user from the historical data of user preferences and the correlations among them; Wang Licai et al. (Journal of Software, 2012, No. 1) proposed "a method for obtaining user preferences based on the contextual information of users' historical behavior"; Zhong Xiaowu et al. proposed "a recommendation system based on domain experts" (Chinese invention patent application published on May 30, 2012, publication No. CN102479202A), which mines, from item data, user data, and user behavior history, the users' ratings of item quality, the fields of interest and potential interest to users, and expert user data, and computes the list of neighboring experts of the current user, which is returned to the user as the recommendation result set.
Existing personalized recommendation methods for Web text take the historical data of user behavior into account, but their recommendation accuracy still needs to be improved.
Summary of the invention
The object of the invention is to provide, on the basis of the prior art, a personalized recommendation method for Web text that further improves the accuracy of recommending Web texts of interest and potential interest to users.
To achieve the above object, the personalized recommendation method for Web text of the invention is characterized by comprising the following steps:
(1) Web text feature extraction
1.1) Generation of the Web text keyword sets
The Web texts produced before a certain time t form the Web text collection; the content of each Web text in the collection is segmented into words and stop words are removed, yielding the keyword set that describes the Web text;
1.2) Generation of the Web text feature dimensions
The keyword set of each Web text is scanned in turn and its keywords are added to an ordered set without repeated elements, yielding the ordered keyword set S={s_1, s_2, ..., s_m}, where m is the size of S, i.e., the number of distinct keywords; each keyword in S serves as one dimension for measuring Web texts, which establishes the feature dimensions of the Web texts;
1.3) Generation of the Web text feature matrix
For each Web text in the collection, the term frequency of each keyword of S that occurs in the Web text is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword of S does not occur in the Web text, the value of the corresponding dimension is 0; this m-dimensional row vector is the feature vector of the Web text;
The feature vectors of all Web texts form the feature matrix E of the Web text collection; E has m columns, and its number of rows equals the number of Web texts;
(2) Web text model construction
The k-means clustering algorithm is used to cluster the feature vectors of the Web texts in feature matrix E, dividing the Web texts in the collection into several categories that form the category set R={r_1, r_2, ..., r_n}, where n is the total number of categories, r_z (z=1,2,...,n) denotes a class label, and z is the class number;
(3) Dynamic user preference modeling
Let U={u_1, u_2, ..., u_l} denote the set of users, where l is the number of users; the subset of Web texts involved in the behavior of user u_i (i=1,2,...,l) before time t is O={o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span from the moment Web text o_j (j=1,2,...,v) was produced to time t is h_j;
3.1) Generation of the influence degree of the Web texts involved in user behavior on user preference
The preference influence degree of Web text o_j on user u_i is d_j:
$$d_j = \frac{G(h_j)}{\sum_{k=1}^{v} G(h_k)} \qquad (1)$$
where G(h_j) and G(h_k) can be expressed as:
$$G(h_j) = e^{-h_j/b}, \qquad G(h_k) = e^{-h_k/b} \qquad (2)$$
In formula (2), e is the base of the natural logarithm and b is the relative memory strength, set empirically within the range 1≤b≤10;
3.2) Generation of Web text categories
Look up the category of Web text o_j: search for o_j in the Web text collection and return the class number z_j of the category it belongs to; combined with the influence degree of o_j calculated in step 3.1), this gives the class number-influence degree pair of o_j, denoted c_j=(z_j, d_j);
The set of class number-influence degree pairs of the Web texts involved in all of the user's behaviors is denoted C={c_1, c_2, ..., c_v};
3.3) Generation of the user's dynamic preference vector
If two pairs c_m and c_n (m, n=1,2,...,v; m≠n) in set C have the same class number, the influence degree of c_n is added to that of c_m and c_n is removed; this is repeated until no class number is repeated among the pairs. The number of pairs is then v' (v'≤v), and these v' class number-influence degree pairs form the user preference vector, i.e., the dynamic preference vector of user u_i;
(4) Personalized recommendation of Web text
The Web texts produced after time t are the Web texts to be recommended;
4.1) First, keywords are extracted from the Web text to be recommended using the method of step 1.1), giving its keyword set, and its feature vector is obtained using the method of step 1.2); then the center coordinates of each category in category set R are calculated, i.e., the centroid of the feature vectors of all Web texts belonging to that category; next, the distance from the feature vector of the Web text to be recommended to the center coordinates of each category is calculated; finally, the Web text to be recommended is assigned to the corresponding category according to the MMD (minimax distance) classification algorithm, giving the class number of the category it belongs to;
4.2) Generation of the Web texts the user prefers
The dynamic preference vectors of all users in user set U are searched to find all users whose vector contains the class number of the Web text to be recommended; given an influence degree threshold τ (0.1≤τ≤0.7), if a user's preference influence degree for the category of the Web text to be recommended is not less than τ, the Web text is recommended to that user.
The object of the invention is achieved as follows:
The personalized recommendation method for Web text of the invention performs feature extraction on the Web texts produced before a certain time t to obtain the feature matrix E of the Web text collection, and then clusters them into n categories. Meanwhile, for each Web text o_j in the subset of Web texts involved in the behavior of a user u_i before time t, its preference influence degree d_j on user u_i is calculated from the time span h_j between the moment o_j was produced and time t, giving the class number-influence degree pair c_j of o_j, from which the dynamic preference vector of user u_i is generated. At recommendation time, a Web text to be recommended is assigned to a category according to the distance from its feature vector to the center coordinates of each category; the dynamic preference vectors of all users in user set U are searched to find all users whose vector contains the class number of that Web text; if a user's preference influence degree for that category is not less than the threshold τ, the Web text is recommended to that user.
The invention takes into account that the influence of a user's historical behavior on current preference changes dynamically over time, and provides a personalized recommendation method for Web text that is more accurate, dynamic, and closer to actual conditions. Compared with existing recommendation methods, it better reflects the dynamic influence of historical behavior on current preference, rather than treating all historical behaviors as having the same effect regardless of time.
Brief description of the drawings
Fig. 1 is a flow chart of one embodiment of the personalized recommendation method for Web text.
Detailed description
Specific embodiments of the invention are described below with reference to the accompanying drawing so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the invention.
Embodiment
In this embodiment, as shown in Fig. 1, the personalized recommendation method for Web text of the invention comprises: step (1), Web text feature extraction, in which features are extracted from the Web texts by word segmentation, keyword collection, and keyword term-frequency counting; step (2), Web text model construction, in which the concept matrix of the Web text collection is generated based on the Chinese thesaurus and then clustered to build the Web text model; step (3), dynamic user preference modeling, in which a dynamic user preference model is established from the historical data of user behavior, i.e., the subset of Web texts involved in the user's behavior before time t together with its time information, based on a memory curve model, so as to express user preferences that keep changing over time; step (4), personalized recommendation of Web text, in which the similarity between user preferences and Web text features is considered and a recommendation list is built for each user through the matching relationship, completing the personalized recommendation of Web text.
The four steps are described in detail below.
1. Web text feature extraction
The Web text collection consists of the Web texts produced before a certain time t. The purpose of Web text feature extraction is to generate the Web text feature matrix.
1.1 Generation of the Web text keyword sets
The content of each Web text in the collection is segmented into words and stop words are removed, yielding the keyword set that describes the Web text.
1.2 Generation of the Web text feature dimensions
The keyword set of each Web text is scanned in turn and its keywords are added to an ordered set without repeated elements, yielding the ordered keyword set S={s_1, s_2, ..., s_m}, where m is the size of S, i.e., the number of distinct keywords. Each keyword in S serves as one dimension for measuring Web texts, which establishes the feature dimensions of the Web texts.
1.3 Generation of the Web text feature matrix
For each Web text in the collection, the term frequency of each keyword of S that occurs in the Web text is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword of S does not occur in the Web text, the value of the corresponding dimension is 0. This row vector is the feature vector of the Web text, so each Web text is represented by an m-dimensional row vector in the space spanned by the ordered keyword set S.
The feature vectors of all Web texts form the feature matrix E of the Web text collection; E has m columns, and its number of rows equals the number of Web texts.
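Steps 1.2 and 1.3 can be illustrated with a minimal Python sketch. It assumes word segmentation and stop-word removal (step 1.1) have already been done, so each Web text arrives as a list of keywords; the function name `build_feature_matrix` and the sample data (one keyword per text, mirroring the news example later in this description) are illustrative assumptions, not part of the disclosed method.

```python
from collections import Counter

def build_feature_matrix(keyword_lists):
    """Return (ordered keyword set S, feature matrix E) for tokenized texts."""
    ordered_keywords = []   # S: keywords in first-seen order, no duplicates
    seen = set()
    for keywords in keyword_lists:
        for kw in keywords:
            if kw not in seen:
                seen.add(kw)
                ordered_keywords.append(kw)
    matrix = []
    for keywords in keyword_lists:
        counts = Counter(keywords)  # term frequency of each keyword in this text
        # One row per text; a keyword absent from the text contributes 0.
        matrix.append([counts.get(kw, 0) for kw in ordered_keywords])
    return ordered_keywords, matrix

# Hypothetical input: the keyword sets of five texts after step 1.1.
docs = [["automobile"], ["protection"], ["SUV"], ["rifle"], ["rifle"]]
S, E = build_feature_matrix(docs)
print(S)  # ['automobile', 'protection', 'SUV', 'rifle']
for row in E:
    print(row)
```

Each row of `E` has m = len(S) entries, and the number of rows equals the number of texts, as stated above.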
2. Web text model construction
In this embodiment, the purpose of Web text model construction is to generate the concept matrix of the Web text collection and then the Web text category set R, laying the foundation for the personalized recommendation of Web text.
To obtain the Web text categories, the known k-means clustering algorithm can be used to cluster the Web texts in the collection, i.e., to cluster the feature vectors of the Web texts in feature matrix E, thereby building the Web text model.
In addition, as Web text content keeps growing, the ordered keyword set S also keeps growing, i.e., the number of dimensions for measuring Web texts keeps increasing. Therefore, the correlations that may exist between the keywords in S are exploited: using the word relationships in the Chinese thesaurus, different keywords expressing the same concept are mapped to that concept, which reduces the dimension of the ordered keyword set S.
In this embodiment, the specific steps of Web text model construction are as follows:
2.1 Generation of the Web text collection concept matrix
The basic idea of the Chinese thesaurus is a mapping from a thesaurus W consisting of p words to a concept set W' consisting of q concepts, where p>q, so that different words can be mapped to the same concept.
On this basis, each keyword s_x (x=1,2,...,m) in the ordered keyword set S is looked up in thesaurus W in turn. If W contains a word identical to s_x, s_x is replaced by the concept corresponding to that word, and the keywords preceding s_x in S are checked for one that has been replaced by the same concept. If there is a keyword s_y (y=1,2,...,x-1) whose concept is the same as that of s_x, then s_x is merged into s_y as follows: by a column transformation of feature matrix E, the values of column x are added to column y, column x is removed from E, and keyword s_x is removed from S.
After all keywords in S have been processed, the concept matrix E' of the Web text collection is obtained, the ordered keyword set S becomes the ordered keyword set S', and the dimension drops from m to m'; each row of E' is the concept vector of one Web text. Compared with feature matrix E, concept matrix E' has fewer columns: concept mapping reduces the feature dimension of the Web text collection from m to m', which serves as dimensionality-reduction preprocessing for the clustering of Web texts;
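The column-merging procedure of step 2.1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the thesaurus is modeled as a plain dictionary from word to concept, keywords missing from the thesaurus are kept unchanged as their own dimension, and the function name `map_to_concepts` is hypothetical. Instead of deleting columns in place, the sketch builds E' directly, which yields the same result.

```python
def map_to_concepts(ordered_keywords, matrix, thesaurus):
    """Merge columns of E whose keywords share a concept; return (S', E')."""
    concepts = []        # S': ordered concept set
    position = {}        # concept -> column index in E'
    column_target = []   # for column x of E, the column of E' it is added to
    for kw in ordered_keywords:
        concept = thesaurus.get(kw, kw)  # assumption: unmapped keywords stay as-is
        if concept not in position:
            position[concept] = len(concepts)
            concepts.append(concept)
        column_target.append(position[concept])
    concept_matrix = []
    for row in matrix:
        new_row = [0] * len(concepts)
        for x, value in enumerate(row):
            new_row[column_target[x]] += value  # add column x onto its concept column
        concept_matrix.append(new_row)
    return concepts, concept_matrix

# Hypothetical thesaurus and feature matrix matching the news example.
thesaurus = {"automobile": "vehicle", "SUV": "vehicle",
             "rifle": "weapon", "protection": "protection"}
S = ["automobile", "protection", "SUV", "rifle"]
E = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
S_prime, E_prime = map_to_concepts(S, E, thesaurus)
print(S_prime)  # ['vehicle', 'protection', 'weapon']
print(E_prime)
```

Here the dimension drops from m = 4 to m' = 3 because "automobile" and "SUV" share the concept "vehicle".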
2.2 Generation of Web text categories
The k-means clustering algorithm is used to cluster the Web text collection; during clustering, the distance between Web texts is measured with the mathematically simple vector inner product. Clustering divides the Web texts in the collection into several categories that form the category set R={r_1, r_2, ..., r_n}, where n is the total number of categories, r_z (z=1,2,...,n) denotes a class label, and z is the class number.
Therefore, in this embodiment, clustering the feature vectors of the Web texts in feature matrix E means first applying the concept mapping to E to obtain concept matrix E', and then using the k-means clustering algorithm to cluster the concept vectors of the Web texts in E'.
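For readers unfamiliar with k-means, a minimal pure-Python sketch of the clustering step is given below. It is not the disclosed implementation: it measures distance with squared Euclidean distance rather than the inner-product measure mentioned above, and it seeds the centers deterministically with the first k distinct vectors; both choices, and the function name `kmeans`, are assumptions made for illustration.

```python
def kmeans(vectors, k, max_iter=100):
    """Plain k-means; returns a 1-based class number for each input vector."""
    # Assumption: seed the centers with the first k distinct vectors.
    centers = []
    for v in vectors:
        if list(v) not in centers:
            centers.append(list(v))
        if len(centers) == k:
            break
    if len(centers) < k:
        raise ValueError("fewer than k distinct vectors")
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [lab + 1 for lab in labels]

# Hypothetical concept vectors of five texts over three concepts.
E_prime = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
class_numbers = kmeans(E_prime, k=3)
print(class_numbers)  # [1, 2, 1, 3, 3]
```

In practice the number of categories n and the initialization strategy would be chosen to suit the data; a library implementation could be substituted for this sketch.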
3. Dynamic user preference modeling
Let U={u_1, u_2, ..., u_l} denote the set of users, where l is the number of users; the subset of Web texts involved in the behavior of user u_i (i=1,2,...,l) before time t is O={o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span from the moment Web text o_j (j=1,2,...,v) was produced to time t is h_j. The dynamic user model of user u_i is expressed as a user preference vector, which describes how the preference of user u_i keeps changing over time. The purpose of dynamic user preference modeling is to generate this dynamic user preference vector.
3.1 Generation of the influence degree of the Web texts involved in user behavior on user preference
Calculate the influence degree d_j of Web text o_j on the preference of user u_i; in the dynamic user preference model, d_j is the proportion of user u_i's preference for Web text o_j among all of u_i's preferences:
$$d_j = \frac{G(h_j)}{\sum_{k=1}^{v} G(h_k)} \qquad (1)$$
G(h_j) denotes the memory curve model; G(h_j)>0 is a decreasing function of the time span h_j, expressing that the influence of Web text o_j on the preference of user u_i declines as time passes, and can be expressed as:
$$G(h_j) = e^{-h_j/b}, \qquad G(h_k) = e^{-h_k/b} \qquad (2)$$
In formula (2), e is the base of the natural logarithm and b is the relative memory strength; the value of b lies in the range 1≤b≤10 and is set empirically.
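Formulas (1) and (2) amount to an exponential-decay weight normalized over all of a user's texts. The following short sketch shows the arithmetic; the function names are hypothetical, and the time spans are taken to be in days, as in the news example later in this description.

```python
import math

def memory_weight(h, b=5.0):
    """Formula (2): G(h) = e^(-h/b), decreasing in the time span h."""
    return math.exp(-h / b)

def influence_degrees(time_spans, b=5.0):
    """Formula (1): normalize the memory weights so that they sum to 1."""
    weights = [memory_weight(h, b) for h in time_spans]
    total = sum(weights)
    return [w / total for w in weights]

# Two texts produced 31 and 12 days before time t, with b = 5.
d = influence_degrees([31, 12], b=5)
print([round(x, 4) for x in d])  # [0.0219, 0.9781]
```

The more recent text (h = 12) receives almost all of the weight, which is the intended effect of the memory curve; a larger b flattens the decay and spreads the influence more evenly.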
3.2 Generation of Web text categories
Look up the category of Web text o_j: search for o_j in the Web text collection and return the class number z_j of the category it belongs to. Combined with the influence degree of o_j calculated in step 3.1, this gives the class number-influence degree pair of o_j, denoted c_j=(z_j, d_j).
The set of class number-influence degree pairs of the Web texts involved in all of the user's behaviors is denoted C={c_1, c_2, ..., c_v}.
3.3 Generation of the user's dynamic preference vector
If two pairs c_m and c_n (m, n=1,2,...,v; m≠n) in set C have the same class number, the influence degree of c_n is added to that of c_m and c_n is removed; this is repeated until no class number is repeated among the pairs. The number of pairs is then v' (v'≤v), and these v' class number-influence degree pairs form the user preference vector, i.e., the dynamic preference vector of user u_i. Each element of the user preference vector is a class number-influence degree pair whose influence degree represents user u_i's degree of preference for the corresponding one of the v' categories; all influence degrees sum to 1.
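The merging in step 3.3 is a group-by-and-sum over class numbers. A minimal sketch follows, assuming the pairs are given as (class number, influence degree) tuples and representing the preference vector as a dictionary; both the representation and the function name `build_preference_vector` are illustrative assumptions.

```python
def build_preference_vector(pairs):
    """Sum the influence degrees of pairs that share a class number."""
    preference = {}  # class number -> accumulated influence degree
    for class_number, influence in pairs:
        preference[class_number] = preference.get(class_number, 0.0) + influence
    return preference

# Hypothetical set C for one user: three texts, two of them in class 3.
C = [(3, 0.2), (1, 0.5), (3, 0.3)]
pref = build_preference_vector(C)
for class_number, influence in pref.items():
    print(class_number, round(influence, 4))
```

Because the influence degrees from formula (1) sum to 1 before merging, the merged values still sum to 1, and the number of entries v' never exceeds v.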
Because the invention is based on a dynamic user preference model, it can reflect how a user's preferences change over time: the smaller the time span h_j, the newer the preference and the better it represents the user's current preference, so the resulting Web text recommendations agree more closely with the user's current preference.
4. Personalized recommendation of Web text
Web texts to be recommended are the Web texts produced after time t; all of them together form the collection of Web texts to be recommended, denoted A. The method of step 1 above is used to extract features from each Web text in A, and each of these Web texts is assigned to one of the Web text categories obtained earlier.
For each Web text to be recommended, the class number z_s of the category it belongs to is obtained; then, from the users' dynamic preference vectors, the users whose vector contains class number z_s are found. Given an influence degree threshold τ (0.1≤τ≤0.7), if a user's preference influence degree is not less than τ, the Web text is recommended to that user.
Specifically, the personalized recommendation of Web text comprises the following steps:
4.1 Classify the Web texts in collection A to obtain the class number of the category each belongs to;
The method of step 1.1 is used to extract keywords from each Web text to be recommended in collection A, giving its keyword set, and the method of step 1.3 is used to obtain its feature vector. The method of step 2.1, i.e., the concept mapping method based on the Chinese thesaurus, is then used to obtain its concept vector: for two keywords in the keyword set of the Web text to be recommended that map to the same concept, the dimension value of the later keyword in the feature vector is added to the dimension value of the earlier keyword and the dimension of the later keyword is deleted, giving the concept vector of the Web text to be recommended.
Then the center coordinates of each category in category set R are calculated. In this embodiment the polygon centroid method is used: the concept vectors of all Web texts of a category are regarded as the vertices of a polygon, and the centroid coordinates are calculated.
Next, the distance from the concept vector of each Web text in collection A to the center coordinates of each category is calculated with the known two-point distance formula;
Finally, according to the known MMD (minimax distance) classification algorithm, each Web text in collection A is assigned to its corresponding category, giving the class number of the category it belongs to.
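The classification in step 4.1 can be sketched as a nearest-center assignment. The sketch below makes two simplifying assumptions that should not be read as the disclosed algorithm: the center of a category is taken as the arithmetic mean of its members' concept vectors (standing in for the polygon centroid calculation), and a text is assigned to the category whose center is closest in Euclidean distance (standing in for the MMD classification algorithm). The function names are hypothetical.

```python
import math

def class_centers(vectors, class_numbers):
    """Mean concept vector of each category (assumed stand-in for the centroid)."""
    grouped = {}
    for v, z in zip(vectors, class_numbers):
        grouped.setdefault(z, []).append(v)
    return {z: [sum(col) / len(vs) for col in zip(*vs)]
            for z, vs in grouped.items()}

def classify(vector, centers):
    """Return the class number whose center is nearest to the given vector."""
    return min(centers, key=lambda z: math.dist(vector, centers[z]))

# Hypothetical clustered concept vectors and their class numbers.
E_prime = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
class_numbers = [1, 2, 1, 3, 3]
centers = class_centers(E_prime, class_numbers)

# A new text whose only concept is the third one (e.g. "weapon").
z_s = classify([0, 0, 1], centers)
print(z_s)  # 3
```

The new text's concept vector must use the same concept dimensions S' as the clustered collection, which is why step 2.1 is applied before this step.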
4.2 Generation of the Web texts the user prefers
The dynamic preference vectors of all users in user set U are searched to find all users whose vector contains the class number of the Web text to be recommended. Given an influence degree threshold τ (0.1≤τ≤0.7), if a user's preference influence degree for the category of the Web text to be recommended is not less than τ, the Web text is recommended to that user and placed in that user's recommendation list. The Web texts in a user's recommendation list are sorted by their respective preference influence degrees.
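The threshold test and the ordering of each recommendation list can be sketched as below. The data structures (dictionaries keyed by user and by text identifier), the function name `build_recommendation_lists`, and the sample values are assumptions for illustration; the sketch sorts each list in descending order of influence degree, which is one reading of "sorted by preference influence degree".

```python
def build_recommendation_lists(candidates, preferences, tau=0.3):
    """candidates: {text_id: class number};
    preferences: {user: {class number: influence degree}}."""
    lists = {user: [] for user in preferences}
    for text_id, z in candidates.items():
        for user, pref in preferences.items():
            influence = pref.get(z, 0.0)  # 0 if the user never touched class z
            if influence >= tau:          # "not less than tau"
                lists[user].append((text_id, influence))
    # Assumption: higher influence degree first within each user's list.
    return {user: [t for t, _ in sorted(items, key=lambda p: p[1], reverse=True)]
            for user, items in lists.items()}

# Hypothetical preference vectors and classified candidate texts.
preferences = {"u1": {1: 0.7, 3: 0.3}, "u2": {2: 0.95, 3: 0.05}}
candidates = {"a": 3, "b": 1, "c": 2}
lists = build_recommendation_lists(candidates, preferences, tau=0.3)
print(lists)  # {'u1': ['b', 'a'], 'u2': ['c']}
```

User u2 has touched class 3 but with influence 0.05, below τ, so text "a" is withheld from u2; this is the filtering effect the threshold is meant to provide.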
In the invention, the influence degree threshold τ is used to exclude users whose preference for the category is not high, which improves the pertinence and quality of the recommendation results;
In steps 1 to 4 above, the Web text model is built starting from Web text feature extraction; the dynamic user preference model is then obtained from the memory curve model and the user's historical behavior, reflecting the dynamic influence of historical behavior data on current preference; finally, personalized recommendation based on user preference is completed according to the relationship between the Web texts to be recommended and the users.
Compared with the prior art, the invention has the following advantages and positive effects:
(1) On the one hand, it takes into account that the influence of a user's historical behavior on current preference changes dynamically over time, and provides a personalized recommendation method for Web text that is more accurate, dynamic, and closer to actual conditions. Compared with prior-art recommendation methods, it better reflects the dynamic influence of historical behavior on current preference, rather than treating all historical behaviors as having the same effect regardless of time.
(2) On the other hand, the concept mapping method based on the Chinese thesaurus takes the latent relationships between Web text keywords into account, which makes the clustering of Web texts more reasonable and semantically coherent, brings it closer to users' actual usage habits, and reduces the amount of computation in Web text modeling.
Example: personalized news recommendation based on users' dynamic preferences
In this example the Web texts are news texts and the historical behavior is users browsing news. The 5 news texts browsed by 2 users before February 1, 2015, with their time information and news content, are shown in Table 1, and the relevant Chinese thesaurus is shown in Table 2. A news item newly produced on February 2, 2015, "Soldier holding a Type 95 rifle stands guard to cover comrades' assault", is the news text to be recommended.
Table 1 shows the users, browsing times, and news texts browsed.
User | Browsing time | News No. | News text browsed
Li Yi | 2015-1-1 | 1 | Sina Auto releases its 2015 e-commerce strategy: a social e-commerce platform
Wang Er | 2014-12-10 | 2 | Where is the boundary of animal protection?
Li Yi | 2015-1-20 | 3 | Positioned as a mid-to-large sport SUV: spy photos of the Haval H7 production model
Wang Er | 2015-1-12 | 4 | Russia to exhibit the world's first amphibious combat rifle
Wang Er | 2015-1-19 | 5 | Type 95 rifle faults appear frequently; foreign militaries dare not be discontented many with PLA
Table 1
Table 2 shows the relevant Chinese thesaurus.
Word | Concept
Automobile | Vehicle
SUV | Vehicle
Rifle | Weapon
Protection | Protection
Table 2
(1) Feature extraction of the news texts
The content of the news texts in Table 1 is segmented into words, stop words are removed, and keywords are extracted, as shown in Table 3.
News No. | Keyword
1 | Automobile
2 | Protection
3 | SUV
4 | Rifle
5 | Rifle
Table 3
First, a keyword set without repeated elements is built; from Table 3 the ordered keyword set is S={automobile, protection, SUV, rifle}.
Then, in the dimensional space represented by the ordered keyword set S, the feature vector of each news text is built, as shown in Table 4. The feature vectors of the news texts form the feature matrix E of the news text collection.
News No. | Feature vector
1 | (1,0,0,0)
2 | (0,1,0,0)
3 | (0,0,1,0)
4 | (0,0,0,1)
5 | (0,0,0,1)
Table 4
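The construction of S and E from Table 3 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: word segmentation and stop-word removal are assumed to have been done already, and the names `keywords_by_news`, `build_keyword_set` and `build_feature_matrix` are hypothetical.

```python
from collections import Counter

# Keywords per news item, copied from Table 3 (segmentation and stop-word
# removal are assumed to have happened upstream).
keywords_by_news = {
    1: ["automobile"],
    2: ["protection"],
    3: ["SUV"],
    4: ["rifle"],
    5: ["rifle"],
}

def build_keyword_set(docs):
    """Scan the documents in order and keep each keyword once, first-seen order."""
    ordered, seen = [], set()
    for doc_id in sorted(docs):
        for word in docs[doc_id]:
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered

def build_feature_matrix(docs, keyword_set):
    """One row per document; each entry is the term frequency of one keyword."""
    matrix = []
    for doc_id in sorted(docs):
        counts = Counter(docs[doc_id])  # a missing keyword counts as 0
        matrix.append([counts[word] for word in keyword_set])
    return matrix

S = build_keyword_set(keywords_by_news)
E = build_feature_matrix(keywords_by_news, S)
```

With the Table 3 data, `S` lists the four keywords in first-seen order and the rows of `E` reproduce the feature vectors of Table 4.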
(2) News text model construction
First, using the Chinese thesaurus in Table 2, concept projection is applied to the feature vector of each news text in order to reduce dimensionality. Concept projection yields the concept vector of each news text and the corresponding new ordered keyword set S' = {vehicle, protection, weapon}, as shown in Table 5. The concept vectors of all news texts form the concept matrix of the news texts.
News No. | Concept vector
1 | (1,0,0)
2 | (0,1,0)
3 | (1,0,0)
4 | (0,0,1)
5 | (0,0,1)
Table 5
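Concept projection amounts to merging the columns of E whose keywords map to the same concept. The sketch below is one possible reading, under the assumption that a keyword absent from the thesaurus keeps its own dimension; `project_to_concepts` and the variable names are hypothetical.

```python
# Thesaurus from Table 2: keyword -> concept.
thesaurus = {
    "automobile": "vehicle",
    "SUV": "vehicle",
    "rifle": "weapon",
    "protection": "protection",
}

def project_to_concepts(keyword_set, matrix, thesaurus):
    """Merge feature columns that share a concept; return (S', E')."""
    concepts = []    # ordered concept set S'
    column_of = {}   # concept -> column index in the projected matrix
    target = []      # projected column for each original column
    for word in keyword_set:
        # Assumption: a keyword without a thesaurus entry stays as its own dimension.
        concept = thesaurus.get(word, word)
        if concept not in column_of:
            column_of[concept] = len(concepts)
            concepts.append(concept)
        target.append(column_of[concept])
    projected = []
    for row in matrix:
        new_row = [0] * len(concepts)
        for value, col in zip(row, target):
            new_row[col] += value  # add the merged column's values together
        projected.append(new_row)
    return concepts, projected

S = ["automobile", "protection", "SUV", "rifle"]
E = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
S_prime, E_prime = project_to_concepts(S, E, thesaurus)
```

Here the "automobile" and "SUV" columns collapse into a single "vehicle" column, which is how news items 1 and 3 end up with identical concept vectors.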
Then, the k-means algorithm is used to cluster the news texts, i.e. to cluster the concept vectors of the Web texts in the concept matrix of the Web text collection, which yields 3 classes, as shown in Table 6.
News No. | Class
1 | 1
2 | 2
3 | 1
4 | 3
5 | 3
Table 6
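For readers who want to reproduce the grouping, here is a small pure-Python k-means (Lloyd's algorithm). It is a sketch under stated assumptions: the text does not say how the initial centres are chosen, so this version seeds them with the first k distinct vectors to keep the run deterministic, and the function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids) with 0-based labels."""
    # Assumption: seed with the first k distinct points (the source does not
    # specify an initialization strategy).
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer than k distinct points")
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

concept_vectors = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
labels, centroids = kmeans(concept_vectors, 3)
class_numbers = [lab + 1 for lab in labels]  # 1-based class numbers as in Table 6
```

Because the five concept vectors take only three distinct values, the clusters coincide with those values and the class numbers match Table 6.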
(3) Dynamic user preference modeling
In this embodiment, the relative memory strength b is set to 5, and the preference influence degree of the news text involved in each user behavior is calculated. Taking user Li Yi as an example, the user browsed 2 news texts in total, whose preference influence degrees are calculated as follows:
$$d_1 = \frac{G(h_1)}{\sum_{\alpha=1}^{2} G(h_\alpha)} = \frac{e^{-\frac{31}{5}}}{e^{-\frac{31}{5}} + e^{-\frac{12}{5}}} = 0.0219$$
$$d_2 = \frac{G(h_2)}{\sum_{\alpha=1}^{2} G(h_\alpha)} = \frac{e^{-\frac{12}{5}}}{e^{-\frac{31}{5}} + e^{-\frac{12}{5}}} = 0.9781$$
User Li Yi browsed the news texts numbered 1 and 3, which, as shown in Table 6, both belong to class 1. Therefore d_1 and d_2 are both influence degrees for class 1, the user's preference lies entirely in class 1, and the influence degree of class 1 on user Li Yi is d_1 + d_2 = 1.
In the same way, the influence degrees of the news texts involved in the browsing behavior of user King two can be obtained, as shown in Table 7.
User | Class No. of the preference | Influence degree
Li Yi | 1 | 1
King two | 2 | 0.0003
King two | 3 | 0.9997
Table 7
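The values for user King two can be checked by hand. The following worked computation is a sketch that assumes, as in the calculation for user Li Yi, that the time span is counted in days up to February 1, 2015 and that b = 5; the subscripts here are news numbers from Table 1, and intermediate values are rounded.

$$h_2 = 53, \quad h_4 = 20, \quad h_5 = 13$$
$$G(h_2) = e^{-\frac{53}{5}} \approx 2.49 \times 10^{-5}, \quad G(h_4) = e^{-\frac{20}{5}} \approx 0.018316, \quad G(h_5) = e^{-\frac{13}{5}} \approx 0.074274$$
$$d_{\text{class 2}} = \frac{G(h_2)}{G(h_2) + G(h_4) + G(h_5)} \approx 0.0003$$
$$d_{\text{class 3}} = \frac{G(h_4) + G(h_5)}{G(h_2) + G(h_4) + G(h_5)} \approx 0.9997$$

News item 2 was browsed almost two months before the reference date, so its weight is negligible next to the two recent items about rifles.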
The dynamic user preference vectors are then obtained; the preference vector formed by each user's class number-influence degree pairs is shown in Table 8.
User | Dynamic user preference vector
Li Yi | (1, 1)
King two | (2, 0.0003), (3, 0.9997)
Table 8
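The whole preference-modeling step (exponential forgetting weight, normalization, and summing the influence degrees per class) can be sketched as follows. This is an illustration under assumptions: time spans are measured in whole days up to February 1, 2015, and the function and variable names are hypothetical.

```python
import math
from datetime import date

# Class number of each news item (Table 6) and browsing history (Table 1).
class_of = {1: 1, 2: 2, 3: 1, 4: 3, 5: 3}
history = {
    "Li Yi": [(1, date(2015, 1, 1)), (3, date(2015, 1, 20))],
    "King two": [(2, date(2014, 12, 10)), (4, date(2015, 1, 12)), (5, date(2015, 1, 19))],
}

def preference_vector(events, class_of, now, b=5.0):
    """Return {class number: influence degree} for one user's browsing events."""
    # G(h) = exp(-h / b), with h the number of days between browsing and `now`.
    weights = [(news_id, math.exp(-(now - day).days / b)) for news_id, day in events]
    total = sum(w for _, w in weights)
    vector = {}
    for news_id, w in weights:
        cls = class_of[news_id]
        # Influence degrees of items in the same class are added together.
        vector[cls] = vector.get(cls, 0.0) + w / total
    return vector

now = date(2015, 2, 1)  # assumed reference moment t
vectors = {user: preference_vector(events, class_of, now) for user, events in history.items()}
```

Rounded to four decimals, `vectors` agrees with Table 8. A smaller `b` makes older browsing records fade faster, while a larger `b` spreads the influence more evenly over the history.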
(4) Personalized recommendation of the news text
For the news text to be recommended, "soldier holds 95 rifle warning shielding companion assaults", feature extraction yields the keyword "rifle", which is projected to the concept "weapon". In the dimensional space of the ordered keyword set S', its concept vector is (0,0,1), so it is assigned to class 3. With the threshold τ set to 0.3, the user King two, whose preference vector has an influence degree for class 3 that is not less than τ, is selected, and the news text is added to King two's recommendation list, which completes the personalized recommendation of the news text.
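The recommendation step can be sketched as a class lookup followed by a threshold filter. The code below assumes that the classification assigns the new text to the class whose centre is nearest in Euclidean distance, which is one interpretation of the classification rule named in the claims; the names `nearest_class` and `recommend_to` are hypothetical.

```python
import math

# Concept vectors (Table 5) grouped by class number (Table 6).
class_members = {
    1: [(1, 0, 0), (1, 0, 0)],
    2: [(0, 1, 0)],
    3: [(0, 0, 1), (0, 0, 1)],
}
# Dynamic preference vectors (Table 8): user -> {class number: influence degree}.
preferences = {
    "Li Yi": {1: 1.0},
    "King two": {2: 0.0003, 3: 0.9997},
}

def centroid(vectors):
    """Component-wise mean of the vectors of one class."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest_class(vector, class_members):
    """Assumption: assign the text to the class with the closest centre."""
    centres = {cls: centroid(vs) for cls, vs in class_members.items()}
    return min(centres, key=lambda cls: math.dist(vector, centres[cls]))

def recommend_to(vector, class_members, preferences, tau):
    """Return the class of the new text and the users whose influence degree >= tau."""
    cls = nearest_class(vector, class_members)
    users = [u for u, pref in preferences.items() if pref.get(cls, 0.0) >= tau]
    return cls, users

# Concept vector of the news text to be recommended ("rifle" -> "weapon").
target_class, recipients = recommend_to((0, 0, 1), class_members, preferences, tau=0.3)
```

With τ = 0.3 the new text falls into class 3 and only King two passes the filter; Li Yi has no entry for class 3 and is therefore skipped.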
The present invention uses a memory curve model to describe user preferences that change over time and thereby obtains dynamic user preferences. Taking the personalized recommendation of Web texts as its starting point, it improves the soundness of Web text modeling through concept projection and constructs a personalized Web text recommendation method based on dynamic user preference, obtaining more accurate recommendation results in a way that better matches real situations.
Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, as long as the various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, these changes are obvious, and all innovations that make use of the inventive concept are protected.

Claims (2)

1. A personalized Web text recommendation method, characterized by comprising the following steps:
(1) Web text feature extraction
1.1) Generating the Web text keyword sets
The Web texts produced before a certain moment t form a Web text collection. The content of each Web text in the collection is segmented into words and stop words are removed, yielding the keyword set that describes that Web text;
1.2) Generating the Web text feature dimensions
The keyword set of each Web text is scanned in turn, and its keywords are added to an ordered set of keywords that contains no repeated elements, yielding the ordered keyword set S = {s_1, s_2, ..., s_m}, where m is the size of S, i.e. the number of distinct keywords. Each keyword in S serves as one dimension for measuring a Web text, which establishes the feature dimensions of the Web texts;
1.3) Generating the Web text feature matrix
For each Web text in the collection, the term frequency of each keyword that occurs in the Web text and is contained in S is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword in S does not occur in the Web text, the value of the corresponding dimension is 0. This m-dimensional row vector is the feature vector of the Web text;
The feature vectors of all Web texts form the feature matrix E of the Web text collection; E has m columns and as many rows as there are Web texts;
(2) Web text model construction
The k-means clustering algorithm is used to cluster the feature vectors of the Web texts in the feature matrix E, dividing the Web texts in the collection into several classes that form the class set R = {r_1, r_2, ..., r_n}, where n is the total number of classes, r_z (z = 1, 2, ..., n) denotes a class label, and z is the class number;
(3) Dynamic user preference modeling
Let U = {u_1, u_2, ..., u_l} denote the user set, where l is the number of users. The set of Web texts involved in the behavior of user u_i (i = 1, 2, ..., l) before moment t is O = {o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span between the moment at which Web text o_j (j = 1, 2, ..., v) was produced and moment t is h_j;
3.1) Generating the preference influence degree of the Web texts involved in user behavior
The preference influence degree of Web text o_j on user u_i is d_j:
$$d_j = \frac{G(h_j)}{\sum_{k=1}^{v} G(h_k)} \qquad (1)$$
where G(h_j) and G(h_k) are expressed as:
$$G(h_j) = e^{-\frac{h_j}{b}}, \qquad G(h_k) = e^{-\frac{h_k}{b}} \qquad (2)$$
In formula (2), e is the base of the natural logarithm and b is the relative memory strength, which is set empirically (1 ≤ b ≤ 10);
3.2) Generating the Web text classes
The class of Web text o_j is looked up: o_j is searched for in the Web text collection and the number z_j of the class it belongs to is returned. Combined with the influence degree of o_j calculated in step 3.1), this gives the class number-influence degree pair of o_j, denoted c_j = (z_j, d_j);
The set of class number-influence degree pairs of the Web texts involved in all behaviors of the user is denoted C = {c_1, c_2, ..., c_v};
3.3) Generating the dynamic user preference vector
If two class number-influence degree pairs c_m and c_n (m, n = 1, 2, ..., v; m ≠ n) in set C have the same class number, the influence degree of c_n is added to that of c_m and c_n is removed; this is repeated until no class number is repeated among the pairs. The number of pairs is then v' (v' ≤ v), and these v' class number-influence degree pairs form the user preference vector, i.e. the dynamic preference vector of user u_i is generated;
(4) Personalized Web text recommendation
A Web text produced after moment t is a Web text to be recommended;
4.1) First, keyword extraction is performed on the Web text to be recommended using the method of step 1.1) to obtain its keyword set, and its feature vector is obtained using the method of step 1.2); then, the center coordinates of each class in the class set R are calculated, i.e. the centroid of the feature vectors of all Web texts belonging to that class; next, the distance from the feature vector of the Web text to be recommended to the center of each class is calculated; finally, according to the MMD (minimax distance) classification algorithm, the Web text to be recommended is assigned to the corresponding class, giving the class number it belongs to;
4.2) Generating the Web texts preferred by users
The dynamic preference vectors of all users in the user set U are searched to find all users whose vectors contain the class number of the Web text to be recommended. Given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if a user's preference influence degree for the class of the Web text to be recommended is not less than τ, the Web text is recommended to that user.
2. The recommendation method according to claim 1, characterized in that said clustering of the feature vectors of the Web texts in the feature matrix E using the k-means clustering algorithm is: first, mapping is performed on the feature matrix E to obtain a concept matrix E'; then the k-means clustering algorithm is used to cluster the vector of each Web text in the concept matrix E';
Said mapping is: each keyword s_x (x = 1, 2, ..., m) in the ordered keyword set S is looked up in turn in the thesaurus W. If a word identical to s_x is found in W, s_x is replaced by the concept corresponding to that word, and it is checked whether s_x now repeats any earlier keyword in S that has already been replaced by a concept. If there is a keyword s_y (y = 1, 2, ..., x-1) whose concept is the same as that of s_x, s_x is merged into s_y, specifically: by a column transformation of the feature matrix E, the values of column x of E are added to column y, column x of E is removed, and s_x is removed from the ordered keyword set S;
After all keywords in the ordered keyword set S have been processed, the concept matrix E' of the Web text collection is obtained, S becomes the ordered keyword set S', and the dimension drops from m to m', where each row of E' is the concept vector of one Web text;
In said step 4.1), for any two keywords in the keyword set of the Web text to be recommended that map to the same concept, the value of the dimension corresponding to the later keyword in the feature vector is added to the value of the dimension corresponding to the earlier keyword, and the dimension of the later keyword is deleted, giving the concept vector of the Web text to be recommended; the centroid calculation and classification are then carried out on the concept vector of the Web text to be recommended.
CN201510090280.4A 2015-02-28 2015-02-28 A kind of Web text individuations recommend method Expired - Fee Related CN104615779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510090280.4A CN104615779B (en) 2015-02-28 2015-02-28 A kind of Web text individuations recommend method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510090280.4A CN104615779B (en) 2015-02-28 2015-02-28 A kind of Web text individuations recommend method

Publications (2)

Publication Number Publication Date
CN104615779A true CN104615779A (en) 2015-05-13
CN104615779B CN104615779B (en) 2017-08-11

Family

ID=53150221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510090280.4A Expired - Fee Related CN104615779B (en) 2015-02-28 2015-02-28 A kind of Web text individuations recommend method

Country Status (1)

Country Link
CN (1) CN104615779B (en)

Also Published As

Publication number Publication date
CN104615779B (en) 2017-08-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170811

Termination date: 20200228
