CN104615779A - Method for personalized recommendation of Web text - Google Patents
- Publication number: CN104615779A
- Application number: CN201510090280.4A
- Authority: CN (China)
- Prior art keywords: web text, user, keyword, web, text
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/337—Profile generation, learning or modification
Abstract
The invention discloses a method for personalized recommendation of Web text. Feature extraction is performed on the Web texts generated before a certain time t to obtain a feature matrix E of the Web text set, which is then clustered into n categories. In addition, for each Web text o_j in the subset of Web texts involved in the behavior of a user u_i before time t, the time span h_j from the time the text was generated to time t is used to calculate its preference influence degree d_j on user u_i, giving a category number-influence degree pair c_j for Web text o_j, from which the dynamic preference vector of user u_i is generated. If the user's preference influence degree for the category of a Web text to be recommended is greater than or equal to a threshold τ, that Web text is recommended to the user. The method takes into account how the influence of a user's historical behavior on current preference changes over time, and therefore gives recommendations that are more accurate, dynamic, and closer to actual conditions.
Description
Technical field
The invention belongs to the technical field of massive information processing and data mining. More specifically, it relates to a personalized Web text recommendation method that derives user preferences from historical data of user behavior and recommends to users the Web texts they are interested in or potentially interested in.
Background art
The emergence and spread of the Internet has met users' demand for information in the information age, but the evolution of the network and the growth of people's cognitive abilities have caused information to be generated at an ever-increasing rate.
Web text is Web information of any kind expressed as text. Internet news, microblog content, and the text descriptions or reviews of goods on e-commerce sites are all typical examples of current Web text. With the rapid development and spread of Internet technology, Web texts are produced in large quantities and have become an important carrier of Internet information. The amount of Web text available for users to obtain and browse exceeds what users can actually process, giving rise to the information overload problem, and users' need has become to obtain the information they require to the greatest possible extent.
For personalized recommendation of Web text, it is necessary to analyze records of a user's browsing, rating, following, or forwarding of Web texts over a past period, that is, the historical data of user behavior, to calculate the user's preferences. At the same time, Web texts are processed and their features extracted, and the Web texts that satisfy a user's preference conditions are pushed to that user.
A Web text recommender system, which mainly processes and recommends Web information expressed as text, comprises a user modeling module, a Web text modeling module, and a recommendation method module. The Web text modeling module depends on the user modeling module, and the recommendation method module must take both the user modeling module and the Web text modeling module into account; the user modeling module and its associated methods are therefore the core of the whole recommender system. An effective user model and a corresponding matching mechanism are thus needed. The user modeling module is known to be built on the historical data of user behavior, that is, the user's past browsing, rating, following, or forwarding of Web texts. This data is used to complete user modeling, that is, to build the user preference model, and personalized recommendation of Web text is finally carried out for each specific user.
Lu Meilian et al. proposed "Topic-based personalized research direction recommendation system and recommendation method" (Chinese invention patent application published on December 4, 2013, publication No. CN103425799A), which uses the historical data of user browsing records to complete user modeling. Wang Xiaolong et al. proposed "A click-feedback personalized recommendation system" (Chinese invention patent with grant announced on July 16, 2014, grant publication No. CN102685565B), a personalized recommendation system fused with an associative recommendation system that automatically adjusts its results based on click feedback and historical user preference data, thereby producing more accurate recommendations. Zhao Yanbin et al. proposed "Community-based related post recommendation system and recommendation method" (Chinese invention patent application published on May 28, 2014, publication No. CN103823805A), which gives a method for obtaining recommendation results for a specific user from historical user preference data and the correlations among them. Wang Licai et al. (Journal of Software, 2012, No. 1) proposed "a method for obtaining user preferences based on the contextual information of users' historical behavior". Zhong Xiaowu et al. proposed "A recommendation system based on domain experts" (Chinese invention patent application published on May 30, 2012, publication No. CN102479202A), which mines, from item data, user data, and historical user behavior data, users' ratings of item quality, the fields users are interested in or potentially interested in, and expert user data, and computes a list of neighboring experts for the current user, which is returned to the user as the recommendation result set.
Existing personalized Web text recommendation methods take the historical data of user behavior into account, but the accuracy of their recommendations still needs improvement.
Summary of the invention
The object of the invention is to provide, on the basis of the prior art, a personalized Web text recommendation method that further improves the accuracy of recommending Web texts that users are interested in or potentially interested in.
To achieve this object, the personalized Web text recommendation method of the invention is characterized by comprising the following steps:
(1) Web text feature extraction
1.1) Generating the Web text keyword sets
The Web texts produced before a certain time t form the Web text set. The content of each Web text in the set is segmented into words and the stop words are removed, giving a keyword set that describes the Web text;
1.2) Generating the Web text feature dimensions
The keyword set of each Web text is scanned in turn, and its keywords are added to an ordered set with no repeated elements, giving the ordered keyword set S = {s_1, s_2, ..., s_m}, where m is the size of S, that is, the number of distinct keywords. Each keyword in S serves as one dimension for measuring a Web text, which establishes the feature dimensions of the Web texts;
1.3) Generating the Web text feature matrix
For each Web text in the set, the term frequency of every keyword of S that occurs in the text is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword of S does not occur in the text, the value of the corresponding dimension is 0. This m-dimensional row vector is the feature vector of the Web text;
The feature vectors of all Web texts form the feature matrix E of the Web text set; E has m columns and as many rows as there are Web texts;
(2) Building the Web text model
The k-means clustering algorithm is used to cluster the feature vectors of the Web texts in feature matrix E, dividing the Web texts in the set into several categories that form the category set R = {r_1, r_2, ..., r_n}, where n is the total number of categories, r_z (z = 1, 2, ..., n) is a category label, and z is the category number;
(3) Dynamic user preference modeling
Let U = {u_1, u_2, ..., u_l} denote the user set, where l is the number of users. The subset of Web texts involved in the behavior of user u_i (i = 1, 2, ..., l) before time t is O = {o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span from the time Web text o_j (j = 1, 2, ..., v) was produced to time t is h_j;
3.1) Generating the preference influence degree of each Web text involved in user behavior
The preference influence degree of Web text o_j on user u_i is d_j:
$$d_j = \frac{G(h_j)}{\sum_{k=1}^{v} G(h_k)} \qquad (1)$$
where G(h_j) and G(h_k) can be expressed as:
$$G(h_j) = e^{-h_j / b} \qquad (2)$$
In formula (2), e is the base of the natural logarithm and b is the relative memory strength, which is set empirically (1 ≤ b ≤ 10);
3.2) Generating the Web text categories
Look up the category of Web text o_j: search for o_j in the Web text set and return the category number z_j to which it belongs. Combining this with the influence degree of o_j calculated in step 3.1) gives the category number-influence degree pair of Web text o_j, denoted c_j = (z_j, d_j);
The set of category number-influence degree pairs of the Web texts involved in all of the user's behavior is denoted C = {c_1, c_2, ..., c_v};
3.3) Generating the user's dynamic preference vector
If two category number-influence degree pairs c_m and c_n in set C (m, n = 1, 2, ..., v; m ≠ n) have the same category number, the influence degree of c_n is added to that of c_m and c_n is removed; this is repeated until no category number appears in more than one pair. The number of pairs is then v′ (v′ ≤ v), and these v′ category number-influence degree pairs form the user preference vector, which is the dynamic preference vector of user u_i;
(4) Personalized Web text recommendation
A Web text produced after time t is a Web text to be recommended;
4.1) First, the method of step 1.1) is used to extract keywords from the Web text to be recommended, giving its keyword set, and the method of step 1.2) is used to obtain its feature vector. Next, the center coordinates of each category in category set R are calculated, that is, the centroid of the feature vectors of all Web texts belonging to the category. Then the distance from the feature vector of the Web text to be recommended to the center of each category is calculated. Finally, according to the MMD (minimax distance) classification algorithm, the Web text to be recommended is assigned to the corresponding category and the number of that category is obtained;
4.2) Generating the Web texts the user likes
The dynamic preference vectors of all users in user set U are searched to find every user whose vector contains the category number of the Web text to be recommended. Given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if a user's preference influence degree for the category of the Web text to be recommended is not less than τ, the Web text is recommended to that user.
The object of the invention is achieved as follows:
In the personalized Web text recommendation method of the invention, feature extraction is performed on the Web texts produced before a certain time t to obtain the feature matrix E of the Web text set, which is then clustered into n categories. Meanwhile, for each Web text o_j in the subset of Web texts involved in the behavior of a user u_i before time t, its preference influence degree d_j on user u_i is calculated from the time span h_j between the time the text was produced and time t, giving the category number-influence degree pair c_j of Web text o_j and, from these pairs, the dynamic preference vector of user u_i. At recommendation time, a Web text to be recommended is assigned to a category according to the distance from its feature vector to the center of each category; the dynamic preference vectors of all users in user set U are searched to find every user whose vector contains the category number of that Web text; and if a user's preference influence degree for that category is not less than the threshold τ, the Web text is recommended to that user.
The invention takes into account how the influence of a user's historical behavior on current preference changes over time, and provides a personalized Web text recommendation method that is more accurate, dynamic, and closer to actual conditions. Compared with existing recommendation methods, it better reflects the dynamic influence of historical behavior on current preference, rather than treating all historical behavior as having the same effect regardless of time.
Brief description of the drawings
Fig. 1 is a flowchart of one embodiment of the personalized Web text recommendation method.
Detailed description
Specific embodiments of the invention are described below with reference to the drawing so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Embodiment
In this embodiment, as shown in Fig. 1, the personalized Web text recommendation method of the invention comprises: step (1), Web text feature extraction, in which features are extracted from Web texts by word segmentation, keyword addition, and keyword term frequency counting; step (2), Web text model building, in which the concept matrix of the Web text set is generated using a synonym thesaurus and then clustered to build the Web text model; step (3), dynamic user preference modeling, in which a dynamic user preference model, expressing user preferences that change continually over time, is built on a memory curve model from the historical data of user behavior, namely the subset of Web texts involved in the user's behavior before time t, together with its time information; and step (4), personalized Web text recommendation, in which the similarity between user preferences and Web text features is considered and user recommendation lists are built from the matching relationship to complete the personalized recommendation of Web texts.
The four steps are described in detail below.
1. Web text feature extraction
The Web text set consists of the Web texts produced before a certain time t. The purpose of Web text feature extraction is to generate the Web text feature matrix.
1.1 Generating the Web text keyword sets
The content of each Web text in the Web text set is segmented into words and the stop words are removed, giving a keyword set that describes the Web text.
1.2 Generating the Web text feature dimensions
The keyword set of each Web text is scanned in turn, and its keywords are added to an ordered set with no repeated elements, giving the ordered keyword set S = {s_1, s_2, ..., s_m}, where m is the size of S, that is, the number of distinct keywords. Each keyword in S serves as one dimension for measuring a Web text, which establishes the feature dimensions of the Web texts.
1.3 Generating the Web text feature matrix
For each Web text in the Web text set, the term frequency of every keyword of S that occurs in the text is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword of S does not occur in the text, the value of the corresponding dimension is 0. This row vector is the feature vector of the Web text; in this way, an m-dimensional row vector is established for each Web text in the space whose dimensions are given by the ordered keyword set S.
The feature vectors of all Web texts form the feature matrix E of the Web text set; E has m columns and as many rows as there are Web texts.
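Steps 1.2 and 1.3 can be sketched as follows. This is a minimal illustration that assumes word segmentation and stop-word removal (step 1.1) have already produced one keyword list per Web text; the function name `build_feature_matrix` and the variable names are hypothetical and do not come from the patent.

```python
from collections import Counter

def build_feature_matrix(keyword_lists):
    """Return (S, E): the ordered keyword set and the term-frequency matrix."""
    ordered_keywords = []   # S: ordered, no repeated elements
    seen = set()
    for keywords in keyword_lists:          # scan each text's keywords in turn
        for kw in keywords:
            if kw not in seen:
                seen.add(kw)
                ordered_keywords.append(kw)
    matrix = []                             # E: one row per Web text, m columns
    for keywords in keyword_lists:
        counts = Counter(keywords)          # a Counter returns 0 for absent keywords
        matrix.append([counts[kw] for kw in ordered_keywords])
    return ordered_keywords, matrix

# Keyword lists taken from Table 3 of the example further below.
docs = [["automobile"], ["protection"], ["SUV"], ["rifle"], ["rifle"]]
S, E = build_feature_matrix(docs)
```

With the keywords of Table 3 as input, `S` and `E` correspond to the ordered keyword set and the feature vectors listed in Table 4.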
2. Building the Web text model
In this embodiment, the purpose of building the Web text model is to generate the concept matrix of the Web text set and, from it, the Web text category set R, laying the foundation for personalized Web text recommendation.
To obtain the Web text categories, the well-known k-means clustering algorithm can be used to cluster the Web texts in the Web text set, that is, to cluster the feature vectors of the Web texts in feature matrix E, thereby building the Web text model.
In addition, as Web text content keeps growing, the ordered keyword set S also keeps growing, that is, the number of dimensions for measuring Web texts keeps increasing. The correlations that may exist among the keywords of S are therefore exploited: using the word relationships in a Chinese thesaurus, different keywords that express the same concept are mapped to that concept, which reduces the dimensionality of S.
In this embodiment, the specific steps for building the Web text model are as follows:
2.1 Generating the concept matrix of the Web text set
The basic idea of the Chinese thesaurus is a mapping from a given thesaurus W of p words to a concept set W′ of q concepts, where p > q, so that different words can be mapped to the same concept.
On this basis, each keyword s_x (x = 1, 2, ..., m) of the ordered keyword set S is looked up in thesaurus W in turn. If a word in W is identical to keyword s_x, s_x is replaced by the concept corresponding to that word, and the keywords of S that precede s_x and have already been replaced by concepts are checked for a duplicate of s_x. If some keyword s_y (y = 1, 2, ..., x−1) corresponds to the same concept as s_x, then s_x is merged into s_y as follows: by a column transformation of feature matrix E, the values of column x of E are added to column y and column x is removed from E, and keyword s_x is removed from the ordered keyword set S.
After all keywords of S have been processed, the concept matrix E′ of the Web text set is obtained; the ordered keyword set S becomes the ordered keyword set S′, and the number of dimensions falls from m to m′. Each row of E′ is the concept vector of one Web text. Compared with feature matrix E, concept matrix E′ has fewer columns: concept mapping reduces the feature dimension of the Web text set from m to m′, which serves as a dimensionality-reduction preprocessing step for clustering the Web texts.
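The column merge of step 2.1 can be sketched as below. The thesaurus is modeled as a plain word-to-concept dictionary, and keywords that are not found in it are kept unchanged; both choices, like the name `map_to_concepts`, are assumptions made for illustration.

```python
def map_to_concepts(keywords, matrix, thesaurus):
    """Map keywords to concepts and merge the columns that share a concept.

    Returns (S', E'): the ordered concept set and the concept matrix.
    """
    concepts = []                        # S': ordered concept set
    column_of = {}                       # concept -> column index in E'
    mapped = [[] for _ in matrix]        # rows of E', built column by column
    for x, kw in enumerate(keywords):
        concept = thesaurus.get(kw, kw)  # assumption: unknown keywords stay as they are
        if concept in column_of:
            y = column_of[concept]
            for row, new_row in zip(matrix, mapped):
                new_row[y] += row[x]     # add column x of E to the earlier column y
        else:
            column_of[concept] = len(concepts)
            concepts.append(concept)
            for row, new_row in zip(matrix, mapped):
                new_row.append(row[x])
    return concepts, mapped

# Thesaurus of Table 2 and feature matrix of Table 4 from the example further below.
thesaurus = {"automobile": "vehicle", "SUV": "vehicle",
             "rifle": "weapon", "protection": "protection"}
S = ["automobile", "protection", "SUV", "rifle"]
E = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
S_prime, E_prime = map_to_concepts(S, E, thesaurus)
```

Building E′ as a new matrix, rather than deleting columns of E in place, avoids index shifts while iterating; the result is the same as the column transformation described above.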
2.2 Generating the Web text categories
The k-means clustering algorithm is used to cluster the Web text set; during clustering, the mathematically simple vector inner product is used to measure the distance between Web texts. Clustering divides the Web texts in the Web text set into several categories, which form the category set R = {r_1, r_2, ..., r_n}, where n is the total number of categories, r_z (z = 1, 2, ..., n) is a category label, and z is the category number.
Therefore, in this embodiment, clustering the feature vectors of the Web texts in feature matrix E means first applying the mapping to feature matrix E to obtain concept matrix E′, and then using the k-means clustering algorithm to cluster the vectors of the Web texts in concept matrix E′.
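A compact k-means sketch is given below for orientation only. It departs from the text in two labeled ways: it uses squared Euclidean distance instead of the inner-product measure mentioned above, and it takes the initial centers as an argument so that the run is deterministic. The function name and parameters are hypothetical.

```python
def kmeans(vectors, initial_centers, max_iter=100):
    """Plain k-means; returns (labels, centers) with 0-based cluster labels."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance
        # (an assumption; the text mentions an inner-product measure).
        new_labels = [
            min(range(len(centers)),
                key=lambda z: sum((a - b) ** 2 for a, b in zip(v, centers[z])))
            for v in vectors
        ]
        if new_labels == labels:         # assignments are stable: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for z in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == z]
            if members:                  # an empty cluster keeps its old center
                centers[z] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Concept vectors of Table 5 from the example further below.
concept_vectors = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
labels, centers = kmeans(concept_vectors, initial_centers=[[1, 0, 0], [0, 1, 0], [0, 0, 1]])
category_numbers = [z + 1 for z in labels]   # the text numbers categories from 1
```

On the concept vectors of Table 5, this assignment reproduces the categories listed in Table 6.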
3. Dynamic user preference modeling
Let U = {u_1, u_2, ..., u_l} denote the user set, where l is the number of users. The subset of Web texts involved in the behavior of user u_i (i = 1, 2, ..., l) before time t is O = {o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span from the time Web text o_j (j = 1, 2, ..., v) was produced to time t is h_j. The dynamic user model of user u_i is expressed as a user preference vector that describes how the preferences of user u_i change continually over time. The purpose of dynamic user preference modeling is to generate this dynamic user preference vector.
3.1 Generating the preference influence degree of each Web text involved in user behavior
Let d_j denote the influence degree of Web text o_j on the preference of user u_i. In the dynamic user preference model, d_j is the proportion of user u_i's preference for Web text o_j among all of u_i's preferences:
$$d_j = \frac{G(h_j)}{\sum_{k=1}^{v} G(h_k)} \qquad (1)$$
G(h_j) represents the memory curve model. G(h_j) > 0 is a decreasing function of the time span h_j, expressing that the influence of Web text o_j on the preference of user u_i declines as time passes, and can be expressed as:
$$G(h_j) = e^{-h_j / b} \qquad (2)$$
In formula (2), e is the base of the natural logarithm and b is the relative memory strength; b takes a value in the range 1 ≤ b ≤ 10 and is set empirically.
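The influence degree calculation can be sketched as follows. The sketch assumes that the memory curve takes the exponential form G(h) = e^(−h/b) and that time spans are measured in days; both the unit and the sample time spans are assumptions, and the function names are hypothetical.

```python
import math

def memory_curve(h, b=5.0):
    """Assumed memory curve G(h) = exp(-h / b): positive and decreasing in h."""
    return math.exp(-h / b)

def influence_degrees(time_spans, b=5.0):
    """Normalize G(h_j) over all texts of one user so the degrees sum to 1."""
    weights = [memory_curve(h, b) for h in time_spans]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical time spans (in days) for three Web texts of one user.
d = influence_degrees([53, 20, 13], b=5)
```

Because every G(h_j) is divided by the same sum, the degrees of one user always add up to 1, and the most recent text (smallest h_j) receives the largest share.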
3.2 Generating the Web text categories
Look up the category of Web text o_j: search for o_j in the Web text set and return the category number z_j to which it belongs. Combining this with the influence degree of o_j calculated in step 3.1 gives the category number-influence degree pair of Web text o_j, denoted c_j = (z_j, d_j).
The set of category number-influence degree pairs of the Web texts involved in all of the user's behavior is denoted C = {c_1, c_2, ..., c_v}.
3.3 Generating the user's dynamic preference vector
If two category number-influence degree pairs c_m and c_n in set C (m, n = 1, 2, ..., v; m ≠ n) have the same category number, the influence degree of c_n is added to that of c_m and c_n is removed; this is repeated until no category number appears in more than one pair. The number of pairs is then v′ (v′ ≤ v), and these v′ category number-influence degree pairs form the user preference vector, which is the dynamic preference vector of user u_i. Each element of the user preference vector is a category number-influence degree pair whose influence degree expresses how much user u_i likes the corresponding one of the v′ categories; all influence degrees sum to 1.
Because the invention is based on a dynamic user preference model, it can reflect how user preferences change over time: the smaller the time span h_j, the more recent the user's preference and the better it represents the user's current preference, so the resulting Web text recommendations agree more closely with the user's current preferences.
4. Personalized Web text recommendation
A Web text to be recommended is a Web text produced after time t; all Web texts to be recommended form the set of Web texts to be recommended, denoted A. The method of step 1 above is used to extract features from each Web text in A, and each of these Web texts is assigned to one of the Web text categories obtained earlier.
For each Web text to be recommended, the category number z_s to which it belongs is obtained; the dynamic preference vectors of all users are then searched for the users whose vectors contain category number z_s. Given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if a user's preference influence degree is not less than τ, the Web text is recommended to that user.
Specifically, personalized Web text recommendation comprises the following steps:
4.1 Classifying the Web texts in set A to obtain the number of the category to which each belongs
The method of step 1.1 is used to extract keywords from each Web text in A, that is, each Web text to be recommended, giving its keyword set, and the method of step 1.3 is used to obtain its feature vector. The method of step 2.1, that is, the concept mapping method based on the Chinese thesaurus, is then used to obtain its concept vector: for any two keywords in the keyword set of the Web text that map to the same concept, the value of the dimension corresponding to the later keyword in the feature vector is added to the value of the dimension corresponding to the earlier keyword, and the dimension corresponding to the later keyword is deleted, giving the concept vector of the Web text to be recommended.
Next, the center coordinates of each category in category set R are calculated. In this embodiment, the polygon centroid calculation method is used: the concept vectors of all Web texts in a category are regarded as the vertices of a polygon, and the coordinates of its centroid are calculated.
Then, using the well-known formula for the distance between two points, the distance from the concept vector of each Web text in A to the center of each category is calculated.
Finally, according to the well-known MMD (minimax distance) classification algorithm, each Web text in A is assigned to its corresponding category, and the number of that category is obtained.
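Step 4.1 can be approximated by the sketch below. It takes each category center to be the coordinate-wise mean of the category's concept vectors, and it assigns a new text to the category with the nearest center; this nearest-center rule is a stand-in, since the text does not spell out the MMD classification algorithm. The function names are hypothetical.

```python
import math

def category_centers(vectors, labels):
    """Center of each category: coordinate-wise mean of its concept vectors."""
    groups = {}
    for v, z in zip(vectors, labels):
        groups.setdefault(z, []).append(v)
    return {z: [sum(col) / len(vs) for col in zip(*vs)] for z, vs in groups.items()}

def classify(vector, centers):
    """Assumed rule: return the number of the category with the nearest center."""
    return min(centers, key=lambda z: math.dist(vector, centers[z]))

# Concept vectors of Table 5 and category numbers of Table 6.
concept_vectors = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
labels = [1, 2, 1, 3, 3]
centers = category_centers(concept_vectors, labels)
category = classify([0, 0, 1], centers)   # concept vector of a new "weapon" text
```

`math.dist` computes the Euclidean distance between two points and requires Python 3.8 or later.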
4.2 Generating the Web texts the user likes
The dynamic preference vectors of all users in user set U are searched to find every user whose vector contains the category number of the Web text to be recommended. Given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if a user's preference influence degree for the category of the Web text to be recommended is not less than τ, the Web text is recommended to that user and placed in that user's recommendation list. The Web texts in a user's recommendation list are sorted by their respective preference influence degrees.
In the invention, the influence degree threshold τ is used to exclude users who have little liking for the category, which makes the recommendation results better targeted and improves recommendation quality.
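The threshold test and the sorted recommendation lists can be sketched as follows. The data layout (a dictionary of preference vectors per user and a dictionary of category numbers per text) and the name `build_recommendation_lists` are assumptions; the sample texts other than the category 3 text are hypothetical.

```python
def build_recommendation_lists(text_categories, preference_vectors, tau=0.3):
    """Recommend each text to the users whose influence degree for its category is >= tau."""
    lists = {user: [] for user in preference_vectors}
    for text_id, category in text_categories.items():
        for user, pairs in preference_vectors.items():
            # A category missing from the vector counts as influence 0 (tau is positive).
            influence = dict(pairs).get(category, 0.0)
            if influence >= tau:
                lists[user].append((text_id, influence))
    for items in lists.values():
        items.sort(key=lambda item: item[1], reverse=True)   # highest influence first
    return lists

# Preference vectors of Table 8; the text identifiers are hypothetical.
preferences = {"Li Yi": [(1, 1.0)], "Wang Er": [(2, 0.0003), (3, 0.9997)]}
texts = {"rifle-news": 3, "animal-news": 2, "car-news": 1}
lists = build_recommendation_lists(texts, preferences, tau=0.3)
```

A text in category 2 reaches no one here, because the only user with that category in the preference vector has an influence degree far below τ.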
In steps 1 to 4 above, the Web text model is built starting from Web text feature extraction; the dynamic user preference model is then obtained from the memory curve model and the user's historical behavior, reflecting the dynamic influence of the historical data of user behavior on current preference; finally, personalized recommendation based on user preference is completed according to the relationship between the Web texts to be recommended and the users.
Compared with the prior art, the invention has the following advantages and positive effects:
(1) On the one hand, it takes into account how the influence of a user's historical behavior on current preference changes over time, and provides a personalized Web text recommendation method that is more accurate, dynamic, and closer to actual conditions. Compared with prior-art recommendation methods, it better reflects the dynamic influence of historical behavior on current preference, rather than treating all historical behavior as having the same effect regardless of time.
(2) On the other hand, it adopts the concept mapping method based on the Chinese thesaurus and takes into account the latent connections among Web text keywords. This makes the clustering of Web texts more reasonable and reduces the computation required for Web text modeling; the results carry more semantic logic and are closer to users' actual usage habits.
Example: personalized news recommendation based on dynamic user preferences
In this example, the Web texts are news texts and the historical behavior is users browsing news. The 5 news texts browsed by 2 users before February 1, 2015, with their time information and news content, are shown in Table 1, and the relevant Chinese thesaurus is shown in Table 2. A news item newly produced on February 2, 2015, "Soldier with 95 rifle stands guard to cover companions' assault", is the news text to be recommended.
Table 1 shows the users, browsing times, and Internet news texts browsed.
User | Browsing time | News No. | News text browsed |
Li Yi | 2015-1-1 | 1 | Sina Automobile releases 2015 e-commerce strategy: a social e-commerce platform |
Wang Er | 2014-12-10 | 2 | Where is the boundary of animal protection? |
Li Yi | 2015-1-20 | 3 | Spy photos of the production Haval H7, positioned as a mid-to-large sport SUV |
Wang Er | 2015-1-12 | 4 | Russia to exhibit the world's first amphibious combat rifle |
Wang Er | 2015-1-19 | 5 | 95 rifle fault Pin Xian foreign militaries dare not be discontented many with PLA |
Table 1
Table 2 shows the relevant Chinese thesaurus.
Word | Concept |
Automobile | Vehicle |
SUV | Vehicle |
Rifle | Weapon |
Protection | Protection |
Table 2
(1) Feature extraction of the news texts
The content of the news texts in Table 1 is segmented into words, the stop words are removed, and the keywords are extracted, as shown in Table 3.
News No. | Keyword |
1 | Automobile |
2 | Protection |
3 | SUV |
4 | Rifle |
5 | Rifle |
Table 3
First, a keyword set with no repeated elements is built; from Table 3, the ordered keyword set is S = {automobile, protection, SUV, rifle}.
Then, in the dimensional space represented by the ordered keyword set S, the feature vector of each news text is built, as shown in Table 4. The feature vectors of the news texts form the feature matrix E of the Internet news text set.
News No. | Feature vector |
1 | (1,0,0,0) |
2 | (0,1,0,0) |
3 | (0,0,1,0) |
4 | (0,0,0,1) |
5 | (0,0,0,1) |
Table 4
(2) Building the news text model
First, concept mapping is applied to the feature vector of each news text using the Chinese thesaurus in Table 2 to reduce the dimensionality. Concept mapping gives the concept vector of each news text and the corresponding new ordered keyword set S′ = {vehicle, protection, weapon}, as shown in Table 5. The concept vectors of the news texts form the concept matrix of the news texts.
News No. | Concept vector |
1 | (1,0,0) |
2 | (0,1,0) |
3 | (1,0,0) |
4 | (0,0,1) |
5 | (0,0,1) |
Table 5
Then the k-means algorithm is used to cluster the news texts, that is, to cluster the concept vectors of the Web texts in the concept matrix of the Web text set, giving 3 categories, as shown in Table 6.
News No. | Category |
1 | 1 |
2 | 2 |
3 | 1 |
4 | 3 |
5 | 3 |
Table 6
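A small, dependency-free k-means sketch shows how the concept vectors of Table 5 fall into the three categories of Table 6. The text does not say how the initial centers are chosen, so the deterministic initialization used here (the first k distinct vectors) is an assumption, as are the function names.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iter=100):
    """Plain k-means; returns (labels, centers) with labels in 0..k-1."""
    # ASSUMPTION: initial centers are the first k distinct vectors.
    centers = []
    for v in vectors:
        if list(v) not in centers:
            centers.append(list(v))
        if len(centers) == k:
            break
    if len(centers) < k:
        raise ValueError("fewer than k distinct vectors")
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every vector.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

concept_matrix = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
labels, centers = kmeans(concept_matrix, k=3)
categories = [lab + 1 for lab in labels]  # 1-based category numbers as in Table 6
```

Under this initialization the category numbers coincide with Table 6; with a different initialization the same grouping could come out with permuted numbers.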
(3) Dynamic user preference modeling
In this embodiment, the relative memory strength b is set to 5, and the preference influence degree of the news text involved in each user behavior is calculated. Taking user "Li Yi" as an example, the user browsed 2 news texts in total, and their preference influence degrees are calculated as follows:
User "Li Yi" browsed the news texts numbered 1 and 3, which, as Table 6 shows, both belong to category 1. Therefore d_1 and d_2 are both influence degrees for category 1, the user's preference lies entirely in category 1, and their sum gives user "Li Yi" an influence degree of 1 for category 1.
By the same method, the influence degrees of the news texts involved in the browsing behavior of user "King two" can be obtained, as shown in Table 7.
User | Category number of preference | Influence degree |
Li Yi | 1 | 1 |
King two | 2 | 0.0003 |
King two | 3 | 0.9997 |
Table 7
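The calculation behind Table 7 can be sketched as follows. The formulas themselves are not reproduced in this text, so the sketch rests on two labeled assumptions: the memory curve has the exponential form G(h) = exp(-h / b), and each influence degree is G(h_j) divided by the sum of G over all texts the user browsed, which is consistent with the values in Table 7 summing to 1 per user. The time spans below are hypothetical.

```python
import math

def retention(h, b):
    """Memory retained after time span h. ASSUMPTION: G(h) = exp(-h / b)."""
    return math.exp(-h / b)

def influence_degrees(time_spans, b):
    """Normalize retention values so that one user's influence degrees sum to 1."""
    weights = [retention(h, b) for h in time_spans]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical time spans (in days) for two browsed texts; b = 5 as in the embodiment.
d = influence_degrees([10, 3], b=5)
```

Whatever the individual values are, two texts in the same category contribute `d[0] + d[1]`, which is 1 up to rounding, and the more recently produced text receives the larger share.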
The dynamic preference vector of each user is then obtained: the category number-influence degree pairs of each user form that user's preference vector, as shown in Table 8.
User | Dynamic preference vector |
Li Yi | (1,1) |
King two | (2,0.0003),(3,0.9997) |
Table 8
(4) Personalized recommendation of news texts
For the news text to be recommended, "Soldier holding a Type 95 rifle stands guard to cover comrades' assault", feature extraction yields the keyword "rifle", which concept mapping maps to "weapon". In the dimensional space of the ordered keyword set S', this gives the concept vector (0,0,1), so the text is assigned to category 3. With the threshold τ set to 0.3, the user "King two", whose influence degree for category 3 in the preference vector is not less than τ, is selected, and the news text is added to the recommendation list of "King two", which completes the personalized recommendation of the news text.
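The final filtering step can be sketched with the preference vectors of Table 8. The dictionary layout and the function name `users_to_recommend` are illustrative choices, not part of the patent.

```python
# Dynamic preference vectors from Table 8: user -> {category number: influence degree}.
preference_vectors = {
    "Li Yi": {1: 1.0},
    "King two": {2: 0.0003, 3: 0.9997},
}

def users_to_recommend(preferences, category, tau):
    """Users whose influence degree for `category` is not less than tau."""
    return [
        user
        for user, prefs in preferences.items()
        if prefs.get(category, 0.0) >= tau
    ]

# The text to be recommended was assigned to category 3; tau = 0.3 as in the embodiment.
recipients = users_to_recommend(preference_vectors, category=3, tau=0.3)
```

"Li Yi" has no entry for category 3 and is therefore skipped, while the influence degree 0.9997 of "King two" clears the threshold.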
The present invention uses a memory curve model to describe user preferences that change over time and thereby obtains dynamic user preferences. Taking the personalized recommendation of Web texts as its starting point, it makes Web text modeling more rigorous through concept mapping and constructs a Web text personalized recommendation method based on dynamic user preferences, obtaining more accurate recommendation results in a way that better matches actual conditions.
Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.
Claims (2)
1. A Web text personalized recommendation method, characterized in that it comprises the following steps:
(1) Web text feature extraction
1.1) Generation of Web text keyword sets
The Web texts produced before a certain time t form the Web text set; the content of each Web text in the Web text set is segmented into words and stop words are removed, yielding the keyword set that describes the Web text;
1.2) Generation of Web text feature dimensions
The keyword sets of the Web texts are scanned in turn, and their keywords are added to an ordered set without duplicate elements, yielding the ordered keyword set S = {s_1, s_2, ..., s_m}, where m is the size of S, i.e., the number of distinct keywords; each keyword in S serves as one dimension for measuring Web texts, thereby establishing the feature dimensions of the Web texts;
1.3) Generation of the Web text feature matrix
For each Web text in the Web text set, the term frequency of each keyword of the ordered keyword set S that occurs in the Web text is counted and used as the value of the corresponding dimension of an m-dimensional row vector; if a keyword in S does not occur in the Web text, the value of the corresponding dimension is 0; this m-dimensional row vector is the feature vector of the Web text;
The feature vectors of all Web texts form the feature matrix E of the Web text set; E has m columns, and its number of rows equals the number of Web texts;
(2) Web text model construction
The k-means clustering algorithm is used to cluster the feature vectors of the Web texts in the feature matrix E, dividing the Web texts in the Web text set into several categories that form the category set R = {r_1, r_2, ..., r_n}, where n is the total number of categories, r_z (z = 1, 2, ..., n) denotes a category label, and z is the category number;
(3) Dynamic user preference modeling
Let U = {u_1, u_2, ..., u_l} denote the user set, where l is the number of users; the subset of Web texts involved in the behavior of user u_i (i = 1, 2, ..., l) before time t is O = {o_1, o_2, ..., o_v}, where v is the number of Web texts, and the time span from the moment Web text o_j (j = 1, 2, ..., v) was produced to time t is h_j;
3.1) Generation of the preference influence degree of the Web texts involved in user behavior
The preference influence degree of Web text o_j on user u_i is d_j:
where G(h_j) and G(h_k) can be expressed as:
In formula (2), e is the base of the natural logarithm and b is the relative memory strength, which is set empirically (1 ≤ b ≤ 10);
3.2) Generation of Web text categories
Look up the category of Web text o_j: search for Web text o_j in the Web text set and return the category number z_j to which o_j belongs; combined with the influence degree of Web text o_j calculated in step 3.1), this gives the category number-influence degree pair of Web text o_j, denoted c_j = (z_j, d_j);
The set of category number-influence degree pairs of the Web texts involved in all behaviors of the user is denoted C = {c_1, c_2, ..., c_v};
3.3) Generation of the user's dynamic preference vector
If category number-influence degree pairs c_m and c_n (m, n = 1, 2, ..., v; m ≠ n) in set C have the same category number, the influence degree of c_n is added to the influence degree of c_m and c_n is removed, until no category number is repeated among the pairs; the number of pairs is then v' (v' ≤ v), and these v' category number-influence degree pairs form the user preference vector, which is the dynamic preference vector of user u_i;
(4) Web text personalized recommendation
The Web texts produced after time t are the Web texts to be recommended;
4.1) First, keyword extraction is performed on the Web text to be recommended using the method of step 1.1), yielding its keyword set, and its feature vector is obtained using the method of step 1.2); then the center coordinates of each category in the category set R are calculated, i.e., the centroid of the feature vectors of all Web texts belonging to that category; next, the distance from the feature vector of the Web text to be recommended to the center of each category is calculated; finally, the Web text to be recommended is assigned to the corresponding category according to the MMD (minimax distance) classification algorithm, yielding the category number to which it belongs;
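The centroid and distance computations of step 4.1) can be sketched as a nearest-centroid classifier. Reading the MMD classification as "assign to the category with the closest centroid" is an assumption of this sketch, as are the function names; the sample vectors are the concept vectors of Table 5 grouped by the categories of Table 6.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(vector, members_by_category):
    """Return the category number whose centroid is closest to `vector`."""
    centers = {cat: centroid(vs) for cat, vs in members_by_category.items()}
    return min(centers, key=lambda cat: squared_distance(vector, centers[cat]))

# Vectors of already clustered texts, grouped by category number.
members_by_category = {
    1: [(1, 0, 0), (1, 0, 0)],
    2: [(0, 1, 0)],
    3: [(0, 0, 1), (0, 0, 1)],
}

# Vector of the text to be recommended (keyword "rifle" mapped to "weapon").
category = classify((0, 0, 1), members_by_category)
```

Squared Euclidean distance is used because it orders the candidates the same way as Euclidean distance while avoiding the square root.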
4.2) Generation of the Web texts the user likes
The dynamic preference vectors of all users in the user set U are searched to find all users whose vectors contain the category number of the Web text to be recommended; given an influence degree threshold τ (0.1 ≤ τ ≤ 0.7), if a user's preference influence degree for the category of the Web text to be recommended is not less than τ, the Web text to be recommended is recommended to that user.
2. The recommendation method according to claim 1, characterized in that said using the k-means clustering algorithm to cluster the feature vectors of the Web texts in the feature matrix E comprises: first applying mapping processing to the feature matrix E to obtain the concept matrix E', and then using the k-means clustering algorithm to cluster the feature vectors of the Web texts in the concept matrix E';
The mapping processing is: each keyword s_x (x = 1, 2, ..., m) in the ordered keyword set S is looked up in turn in the thesaurus W; if a word identical to keyword s_x is found in the thesaurus W, keyword s_x is replaced by the concept corresponding to that word, and it is checked whether any earlier keyword in S has been replaced by the same concept; if there is a keyword s_y (y = 1, 2, ..., x-1) whose concept is the same as that of keyword s_x, keyword s_x is merged into keyword s_y, specifically: by a column transformation of the feature matrix E, the values of column x are added to column y, column x is removed from E, and keyword s_x is removed from the ordered keyword set S;
After all keywords in the ordered keyword set S have been processed, the concept matrix E' of the Web text set is obtained, the ordered keyword set S becomes the ordered keyword set S', and the dimension drops from m to m', where each row of the concept matrix E' is the concept vector of one Web text;
In said step 4.1), for two keywords in the keyword set of the Web text to be recommended that map to the same concept, the dimension value corresponding to the later keyword in the feature vector of the Web text is added to the dimension value corresponding to the earlier keyword, and the dimension value corresponding to the later keyword is deleted, yielding the concept vector of the Web text to be recommended; centroid calculation and classification are then carried out using the concept vector of the Web text to be recommended.
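For a single Web text to be recommended, the mapping described in the last paragraph of claim 2 can be sketched as follows. The function name `concept_vector` is hypothetical, the thesaurus and concept order are those of the embodiment (Tables 2 and 5), and ignoring keywords that fall outside the concept dimensions is an assumption of this sketch.

```python
thesaurus = {
    "automobile": "vehicle",
    "SUV": "vehicle",
    "rifle": "weapon",
    "protection": "protection",
}
S_prime = ["vehicle", "protection", "weapon"]  # ordered concept dimensions

def concept_vector(keywords, thesaurus, concepts):
    """Count keyword occurrences per concept; same-concept keywords are summed."""
    vector = [0] * len(concepts)
    for word in keywords:
        concept = thesaurus.get(word, word)
        # ASSUMPTION: keywords outside the concept dimensions are ignored.
        if concept in concepts:
            vector[concepts.index(concept)] += 1
    return vector

new_text_vector = concept_vector(["rifle"], thesaurus, S_prime)
mixed_vector = concept_vector(["automobile", "SUV", "rifle"], thesaurus, S_prime)
```

In `mixed_vector` the counts of "automobile" and "SUV" are added into the single "vehicle" dimension, which mirrors the merge of the later keyword into the earlier one.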
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510090280.4A CN104615779B (en) | 2015-02-28 | 2015-02-28 | Method for personalized recommendation of Web text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104615779A (en) | 2015-05-13 |
CN104615779B (en) | 2017-08-11 |
Family
ID=53150221
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510090280.4A Expired - Fee Related CN104615779B (en) | Method for personalized recommendation of Web text | 2015-02-28 | 2015-02-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104615779B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102208086A (en) * | 2010-03-31 | 2011-10-05 | 北京邮电大学 | Field-oriented personalized intelligent recommendation system and implementation method |
US20120030190A1 (en) * | 2010-08-02 | 2012-02-02 | Lee Hong-Lin | Method of recording and searching for a web page and method of recording a browsed web page |
CN102508907A (en) * | 2011-11-11 | 2012-06-20 | 北京航空航天大学 | Dynamic recommendation method based on training set optimization for recommendation system |
CN102495873A (en) * | 2011-11-30 | 2012-06-13 | 北京航空航天大学 | Video recommending method based on video affective characteristics and conversation models |
CN103544623A (en) * | 2013-11-06 | 2014-01-29 | 武汉大学 | Web service recommendation method based on user preference feature modeling |
Non-Patent Citations (1)
Title |
---|
李米娜: ""基于web聚类的个性化推荐服务研究"", 《万方数据企业知识服务平台》 * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104991968A (en) * | 2015-07-24 | 2015-10-21 | 成都云堆移动信息技术有限公司 | Text mining based attribute analysis method for internet media users |
WO2017016059A1 (en) * | 2015-07-24 | 2017-02-02 | 成都云堆移动信息技术有限公司 | Text mining-based attribute analysis method for internet media users |
CN104991968B (en) * | 2015-07-24 | 2018-04-20 | 成都云堆移动信息技术有限公司 | The Internet media user property analysis method based on text mining |
CN111858934A (en) * | 2015-12-04 | 2020-10-30 | 杭州数梦工场科技有限公司 | Method and device for predicting article popularity |
CN107292412A (en) * | 2016-03-31 | 2017-10-24 | 阿里巴巴集团控股有限公司 | A kind of problem Forecasting Methodology and forecasting system |
US11281675B2 (en) | 2016-05-12 | 2022-03-22 | Advanced New Technologies Co., Ltd. | Method for determining user behavior preference, and method and device for presenting recommendation information |
CN107368488A (en) * | 2016-05-12 | 2017-11-21 | 阿里巴巴集团控股有限公司 | A kind of method for determining user behavior preference, the methods of exhibiting and device of recommendation information |
US11086882B2 (en) | 2016-05-12 | 2021-08-10 | Advanced New Technologies Co., Ltd. | Method for determining user behavior preference, and method and device for presenting recommendation information |
CN106250526A (en) * | 2016-08-05 | 2016-12-21 | 浪潮电子信息产业股份有限公司 | A kind of text class based on content and user behavior recommends method and apparatus |
CN106446059A (en) * | 2016-09-02 | 2017-02-22 | 广东聚联电子商务股份有限公司 | Big data-based page customization method |
CN106339507A (en) * | 2016-10-31 | 2017-01-18 | 腾讯科技(深圳)有限公司 | Method and device for pushing streaming media message |
CN106339507B (en) * | 2016-10-31 | 2018-09-18 | 腾讯科技(深圳)有限公司 | Streaming Media information push method and device |
CN108733669A (en) * | 2017-04-14 | 2018-11-02 | 优路(北京)信息科技有限公司 | A kind of personalized digital media content recommendation system and method based on term vector |
CN107577690B (en) * | 2017-05-17 | 2021-01-05 | 中广核工程有限公司 | Recommendation method and recommendation device for mass information data |
CN107577690A (en) * | 2017-05-17 | 2018-01-12 | 中广核工程有限公司 | The recommendation method and recommendation apparatus of magnanimity information data |
CN108959329A (en) * | 2017-05-27 | 2018-12-07 | 腾讯科技(北京)有限公司 | A kind of file classification method, device, medium and equipment |
CN108959329B (en) * | 2017-05-27 | 2023-05-16 | 腾讯科技(北京)有限公司 | Text classification method, device, medium and equipment |
CN107341199B (en) * | 2017-06-21 | 2020-05-22 | 北京林业大学 | Recommendation method based on document information commonality mode |
CN107341199A (en) * | 2017-06-21 | 2017-11-10 | 北京林业大学 | A kind of recommendation method based on documentation & info general model |
CN108563690A (en) * | 2018-03-15 | 2018-09-21 | 中山大学 | A kind of collaborative filtering recommending method based on object-oriented cluster |
CN108563690B (en) * | 2018-03-15 | 2022-01-21 | 中山大学 | Collaborative filtering recommendation method based on object-oriented clustering |
CN109460519A (en) * | 2018-12-28 | 2019-03-12 | 上海晶赞融宣科技有限公司 | Browse object recommendation method and device, storage medium, server |
CN110059261A (en) * | 2019-03-18 | 2019-07-26 | 智者四海(北京)技术有限公司 | Content recommendation method and device |
CN110826726A (en) * | 2019-11-08 | 2020-02-21 | 腾讯科技(深圳)有限公司 | Object processing method, object processing apparatus, object processing device, and medium |
CN110826726B (en) * | 2019-11-08 | 2023-09-08 | 腾讯科技(深圳)有限公司 | Target processing method, target processing device, target processing apparatus, and medium |
Also Published As
Publication number | Publication date |
---|---|
CN104615779B (en) | 2017-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170811; Termination date: 20200228 |