CN103309900A

CN103309900A - Personalized multidimensional document sequencing method and system

Info

Publication number: CN103309900A
Application number: CN201210069568XA
Authority: CN
Inventors: 祁勇
Original assignee: Individual
Current assignee: Individual
Priority date: 2012-03-06
Filing date: 2012-03-06
Publication date: 2013-09-18

Abstract

The invention discloses a personalized multidimensional document sequencing method and a personalized multidimensional document sequencing system. The method comprises the following steps of iteratively calculating a sequencing value of each document under each characteristic according to a link relationship between the documents and parameter vectors of the documents; acquiring a group of documents according to a searching condition submitted by a query user, and calculating a personalized sequencing value of each document in the group according to a sequencing vector of the document in the group and a query vector set by the query user; transmitting the documents of the group to the query user according to the magnitudes of the personalized sequencing values, wherein the parameter vectors of the documents are updated according to a parameter vector updating algorithm. The invention also discloses the personalized multidimensional document sequencing system. According to the method, the precision ratio of a search engine and the anti-cheating capability of a webpage ranking algorithm can be improved.

Description

Personalized multidimensional document ranking method and system

Technical field

The present invention relates to the field of the Internet, and in particular to a personalized multidimensional document ranking method and system.

Background technology

A search engine is an application that uses information retrieval techniques to crawl, index and rank web pages on a large scale, and presents web pages to the querying user according to the ranking results. With the rapid growth of online information, search engines have become an indispensable tool for obtaining information on the web.

The core technology of a search engine is its ranking algorithm. The most effective ranking algorithms are hyperlink analysis algorithms, such as Google's PageRank. Although existing search engines have been enormously successful commercially, there is still considerable room for improvement. Link analysis is essentially based on the link structure that web page authors build according to their own subjective intentions. It fully reflects the authors' personal preferences and their understanding of how pages relate, but it does not reflect the personal preferences of the search engine's users. Users who work in different industries or have different interests usually judge the importance of the same page differently, yet hyperlink analysis algorithms such as PageRank cannot make this distinction: they assign each page a single, unique rank. The design of hyperlink analysis algorithms is therefore deficient. A feasible solution is to improve search results by taking users' personalized features into account, so that the rank of each page depends not only on the links between pages but also on the personalized features of the user who submits the query. Analysis shows that using the personalized features of users and documents can improve the precision of a search engine and reduce the time users spend scanning and browsing irrelevant information.

However, obtaining the personalized features of users and web pages on the Internet raises several difficult problems. The first is automatic acquisition. There are estimated to be nearly one trillion web pages and two billion users on the Internet today, so maintaining the personalized features of pages and users by hand is impractical; acquiring them automatically is a major challenge for personalized search. The second is updating. Over time, personal information such as a user's interests, workplace, industry and education level changes, but it is hard to require all users to update their profiles in real time. The third is semantic variation. For example, one user may state on a website that he likes Mozart while another states that she likes classical music; the two may share the same interest, but because they describe it in different words it is difficult to group them effectively. The fourth is completeness. The personal information users enter on websites is usually sparse. For example, under "interests" a user typically writes something like music, baseball or reading, and it is difficult to require users to describe all of their areas of interest comprehensively and in detail.

In summary, an urgent problem is how to obtain the personalized features of web pages and users and use them to optimize the page ranking algorithm, so that a search engine can more effectively filter the enormous number of web pages and find those the user is looking for.

Summary of the invention

In view of the above problems in the prior art, an object of the present invention is to provide a personalized multidimensional document ranking method and system that acquires the personalized features of documents and users automatically, improves the search ranking algorithm, and thereby improves the precision of the search engine.

To achieve the above object, the present invention provides a personalized multidimensional document ranking method, characterized in that a domain feature set K = {1, 2, ..., L} is set up on a server and the following steps are performed:

Obtain a plurality of documents to form a document set D = {1, 2, ..., M}. The document set D contains at least two document subsets S ⊆ D and E ⊆ D, where each document in S contains at least one link pointing to a document in D, and each document in E is pointed to by at least one link contained in a document in S; moreover S ∪ E = D and S ∩ E ≠ ∅;

Each document in D is given a ranking vector and a parameter vector. The ranking vector of document m (m ∈ D) is K_p(m) = [PR(m, 1), PR(m, 2), ..., PR(m, k), ..., PR(m, L)], where PR(m, k) is the ranking value of document m within D under feature k (k ∈ K). The parameter vector of document m is K_d(m) = (dw_{m1}, dw_{m2}, ..., dw_{mk}, ..., dw_{mL}), where dw_{mk} is the degree of relevance between document m and feature k (k ∈ K). K_d(m) is updated by the parameter vector update algorithm;

Update the ranking vector of each document in D. The ranking vector update algorithm is as follows: the ranking value of any document m in D under feature k (k ∈ K) is a function of the ranking values, under feature k, of the documents in S that link to m, and of the degrees of relevance between those linking documents and feature k;

Process the documents in D according to their ranking vectors.

Compared with the prior art, the method automatically updates the parameter vectors of documents and users from the signals generated when users access documents (such as web pages), and optimizes the page ranking algorithm according to those parameter vectors, which improves the precision of the search engine. In addition, because it relies on the personalized features of documents and users, the method also makes page ranking more resistant to cheating.

Description of drawings

Fig. 1 shows how the parameter vector of each document in the document set D is represented;

Fig. 2 shows how the parameter vector of each user in the user set U is represented;

Fig. 3 shows how the ranking vector of each document in the document set D is represented;

Fig. 4 is a flowchart of the parameter vector update algorithm for documents and users;

Fig. 5 is a flowchart of the personalized multidimensional document ranking method;

Fig. 6 is a structural diagram of the personalized multidimensional document ranking system.

Embodiment

The method of the invention is described in further detail below with reference to the accompanying drawings.

The description of the embodiments consists of the following parts. First, it explains how documents, users and domain features are numbered, how the parameter vectors of documents and users are represented, and how the ranking vector of a document is represented. Next, it describes the page ranking algorithm based on parameter vectors. Then, it describes the parameter vector update algorithm for documents and users. Finally, it presents the personalized multidimensional document ranking method and system.

First, the numbering of documents, users and domain features. A document is usually a web page obtained from the Internet by a crawler program; it may take the form of text, video, music, images and so on. Numbering the crawled web pages by their URLs yields the document set D = {1, 2, ..., M}, where M is the number of documents. Likewise, each user on the Internet has a unique identifier, such as an account name, mobile phone number, cookie ID, IP address, e-mail address or instant-messaging number. Gathering a group of uniquely identified Internet users yields the user set U = {1, 2, ..., N}, where N is the number of users.

The domain features shared by documents and users are also numbered uniformly, forming the feature set K = {1, 2, ..., L}, where L is the number of features. Domain features describe attributes of users and documents, for example news, finance, science, music, military affairs and sports. The same feature can describe both a document attribute and a user attribute: a document with the feature "science" has content related to science, while a user with the feature "science" likes content related to science.

Next, the representation of document and user parameter vectors. It is similar to the vector space model proposed by Gerard Salton, with feature items as the basic units of document or user features. A document's parameter vector is the set of its degrees of relevance to each feature; likewise, a user's parameter vector is the set of the user's degrees of relevance to each feature.

Fig. 1 shows the parameter vector representation of each document in D. The parameter vector of any document i in D (i ∈ D) is K_d(i) = (dw_{i1}, dw_{i2}, ..., dw_{ik}, ..., dw_{iL}), where dw_{ik} is the degree of relevance between document i and feature k (k ∈ K), dw_{ik} ∈ [a, b], and a and b are nonnegative constants. Collecting the degrees of relevance to the k-th feature of all documents in D gives the vector (dw_{1k}, dw_{2k}, ..., dw_{ik}, ..., dw_{Mk}), called the k-th document column vector of D.

Fig. 2 shows the parameter vector representation of each user in the user set U. The parameter vector of any user j in U (j ∈ U) is K_u(j) = (uw_{j1}, uw_{j2}, ..., uw_{jk}, ..., uw_{jL}), where uw_{jk} is the degree of relevance between user j and feature k (k ∈ K), uw_{jk} ∈ [a, b], and a and b are nonnegative constants. Collecting the degrees of relevance to the k-th feature of all users in U gives the vector (uw_{1k}, uw_{2k}, ..., uw_{jk}, ..., uw_{Nk}), called the k-th user column vector of U.

The degree of relevance is a real number expressing how closely a document or user is related to each feature in K. If a document or user is more related to the music feature and less related to the sports feature, we say that its degree of relevance to music is high and to sports is low. In addition, some features are correlated with each other, for example physics and mechanics. Reducing such correlation during feature selection reduces the dimension of K, which lowers the server storage requirement and improves algorithm efficiency. Some features need not be listed in the feature set directly, because their degrees of relevance can be computed from those of one or more other features in K.

The page ranking algorithm based on parameter vectors is described next. The standard PageRank algorithm can be expressed by the following formula:

PR(m) = \frac{1 - d}{M} + d \sum_{i \in T} \frac{PR(i)}{C(i)} \qquad (1)

where the set T ⊆ D is the set of web pages that link to page m (m ∈ D), and C(i) is the number of outgoing links of page i (i ∈ T); d ∈ (0, 1) is the damping factor, the probability that a user reaches page m by following a link from another page, while 1 − d is the probability that the user reaches m directly by typing its URL; PR(m) is the PageRank value of page m in D; and M is the number of pages in D. The initial ranking value of every page is set to 1/M.
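To make formula (1) concrete, here is a minimal Python sketch of the standard PageRank iteration. It assumes the link graph is a dict from each page to the pages it links to, that every page has at least one outgoing link, and that iteration stops when the total change falls below a tolerance; the function name `pagerank` and its parameters are illustrative, not part of the patent.

```python
def pagerank(links, d=0.85, max_iter=100, tol=1e-10):
    """Iterate formula (1): PR(m) = (1 - d)/M + d * sum_{i in T} PR(i)/C(i).

    links maps every page to the list of pages it links to. Every link target
    must itself be a key, and every page is assumed to have at least one
    outgoing link (rank leaks already removed).
    """
    pages = list(links)
    M = len(pages)
    # T for each page m: the pages that link to m.
    inlinks = {p: [] for p in pages}
    for src, dsts in links.items():
        for dst in dsts:
            inlinks[dst].append(src)
    # C(i): number of outgoing links of page i.
    out_deg = {p: len(dsts) for p, dsts in links.items()}
    pr = {p: 1.0 / M for p in pages}  # initial ranking value 1/M
    for _ in range(max_iter):
        new = {
            m: (1 - d) / M + d * sum(pr[i] / out_deg[i] for i in inlinks[m])
            for m in pages
        }
        change = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if change < tol:
            break
    return pr
```

For the three-page cycle `{1: [2], 2: [3], 3: [1]}` every page keeps the initial value 1/3, since each page receives exactly the rank of its single predecessor.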

The drawback of standard PageRank is that each web page on the Internet has only one ranking value, whereas in reality users in different industries or with different interests often judge the importance of the same page differently. The existing ranking algorithm therefore needs to be improved.

Fig. 3 shows the ranking vector representation of each document in D. To support personalized search, we extend the conventional PageRank value of a page: the one-dimensional ranking value PR(i) of any document i in D is extended to a multidimensional ranking vector over the domain features. The ranking vector of document i (i ∈ D) is K_p(i) = [PR(i, 1), PR(i, 2), ..., PR(i, k), ..., PR(i, L)], where PR(i, k) is the ranking value of document i under feature k (k ∈ K). Formula (1) can then be transformed into the following ranking vector update algorithm:

PR(m, k) = \frac{1 - d}{M} + d \sum_{i \in T} \frac{PR(i, k) \cdot dw_{ik}}{C(i)} \qquad (2)

where T ⊆ D is the set of web pages that link to page m (m ∈ D); d ∈ (0, 1) is the damping factor, the probability that a user reaches page m by following a link from another page, while 1 − d is the probability that the user reaches m directly by typing its URL; and PR(m, k) is the ranking value of page m under feature k (k ∈ K). C(i) is defined in one of two ways: first, C(i) is the number of outgoing links of page i (i ∈ T); second, C(i) is a preset constant, for example C(i) = 1 for every i ∈ T. In addition, the initial ranking value is PR(m, k) = 1/M for every page m ∈ D and every feature k ∈ K.

Before formula (2) is applied, D needs some preprocessing to remove the pages that cause rank sinks and rank leaks. A rank sink is a group of interlinked pages none of which links to any page outside the group; a rank leak is a page with no outgoing links. Within D, rank sinks are resolved by adding reverse links that point back to the pages linking in, and the effect of rank leaks is eliminated by removing the pages that cause them. After the iteration finishes, the removed pages are restored and assigned ranking values using formula (2).
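The rank-leak part of this preprocessing can be sketched as follows. The sketch assumes that removal is repeated until no page without out-links remains, because deleting one page can leave its predecessors without out-links; the rank-sink fix (adding reverse links) is not shown, and the helper name `remove_rank_leaks` is hypothetical.

```python
def remove_rank_leaks(links):
    """Repeatedly remove pages with no outgoing links (rank leaks).

    Returns the pruned link graph and the removed pages in removal order, so
    they can be restored and scored with formula (2) after the iteration.
    """
    graph = {p: set(dsts) for p, dsts in links.items()}
    removed = []
    while True:
        leaks = [p for p, dsts in graph.items() if not dsts]
        if not leaks:
            return graph, removed
        for p in leaks:
            del graph[p]
        removed.extend(leaks)
        leak_set = set(leaks)
        # Links into removed pages disappear, which may create new leaks.
        for p in graph:
            graph[p] -= leak_set
```

For a chain 1 → 2 → 3 the whole chain is removed, page 3 first, because each removal exposes the next dangling page.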

Formula (2) computes, iteratively, the ranking value PR(m, k) of every document m in D under every feature k ∈ K. Iteration stops when the ranking vectors from two consecutive iterations satisfy \sum_{i \in D} \| K_p^{n+1}(i) - K_p^{n}(i) \| < \varepsilon for a preset threshold \varepsilon, or when the number of iterations reaches a preset constant, where n is the iteration step. In one application example of formula (2), the ranking column vector [PR(1, k), PR(2, k), ..., PR(M, k)] is normalized after each iteration to ensure that formula (2) converges.
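The iteration of formula (2), with per-step column normalization and the stopping rule above, might look like the following Python sketch. It assumes that "normalization" means scaling each column to sum to 1 and that the vector norm in the stopping rule is the L1 norm; the text does not fix either choice, and the name `multidim_rank` is hypothetical.

```python
def multidim_rank(links, dw, L, d=0.85, eps=1e-9, max_iter=200, c_const=None):
    """Iterate formula (2) for every feature k in range(L).

    links:   page -> list of pages it links to (every target is also a key).
    dw:      page -> list of L relevance values dw_ik (its parameter vector).
    c_const: None uses C(i) = out-degree of i; a number uses C(i) = c_const.
    """
    pages = list(links)
    M = len(pages)
    inlinks = {p: [] for p in pages}
    for src, dsts in links.items():
        for dst in dsts:
            inlinks[dst].append(src)
    C = {p: (c_const if c_const is not None else len(links[p])) for p in pages}
    pr = {p: [1.0 / M] * L for p in pages}  # PR(m, k) = 1/M initially
    for _ in range(max_iter):
        new = {
            m: [(1 - d) / M
                + d * sum(pr[i][k] * dw[i][k] / C[i] for i in inlinks[m])
                for k in range(L)]
            for m in pages
        }
        # Normalize each column [PR(1,k), ..., PR(M,k)] to sum to 1 (assumed scheme).
        for k in range(L):
            total = sum(new[p][k] for p in pages)
            for p in pages:
                new[p][k] /= total
        # Stopping rule: sum over documents of the L1 change of the ranking vector.
        change = sum(abs(new[p][k] - pr[p][k]) for p in pages for k in range(L))
        pr = new
        if change < eps:
            break
    return pr
```

Because (1 − d)/M is positive, every column sum is positive and the normalization never divides by zero; a page with no out-links never appears in an in-link list, so C(i) = 0 is never used as a divisor either.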

Apart from knowing the link topology of the web pages in advance, the key step in applying formula (2) is obtaining the parameter vector of each document (i.e., web page).

The update method for the parameter vector K_d(i) = (dw_{i1}, dw_{i2}, ..., dw_{ik}, ..., dw_{iL}) of document i is described next. First, three examples illustrate how initial parameter vectors are set for some documents and users; then the parameter vector update algorithm for documents and users is described. If no initial value is set for a document or user, its parameter vector defaults to the zero vector.

Example 1 sets the initial parameter vector of document i (i ∈ D) manually. For example, with L = 5 features and K = (science, finance, education, music, sports), set K_d(i) = (dw_{i1}, dw_{i2}, dw_{i3}, dw_{i4}, dw_{i5}) = (0, 0.00032, 0, 0.00059, 0); that is, document i has a degree of relevance of 0.00032 to "finance", 0.00059 to "music", and zero to the other features. The initial user parameter vector K_u(j) can be set in the same way.

Example 2 sets the initial parameter vector of user j (j ∈ U). Let H ⊆ D be a set of documents submitted by user j, and let the parameter vector of each document r ∈ H be K_d(r) = (dw_{r1}, dw_{r2}, ..., dw_{rL}). Then, for each k ∈ K, set uw_{jk} = (σ_1 / s) \sum_{r \in H} [dw_{rk} / \sum_{k' \in K} dw_{rk'}], where s is the number of elements in H and σ_1 is a preset constant. In a similar way, the initial parameter vector of user j can also be computed from a group of users in U selected by user j.

Example 3 is another way to set the initial parameter vector of a document. A directory is a special kind of document and has its own document number. We assume that documents in the same directory share some features; for example, documents in a sports directory are all related to sports. If document i (i ∈ D) is in directory n (n ∈ D), the initial parameter vector of i is determined by that of n: for each k ∈ K, set dw_{ik} = σ_2 · dw_{nk}, where σ_2 is a constant.
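Examples 2 and 3, together with the zero-vector default, translate into a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and skipping documents whose parameter vector is all zeros (to avoid dividing by zero in Example 2) is an assumption the text does not address.

```python
def init_user_from_docs(H_vectors, sigma1=1.0):
    """Example 2: uw_jk = (sigma1 / s) * sum_{r in H} dw_rk / sum_k' dw_rk'."""
    s = len(H_vectors)
    L = len(H_vectors[0])
    uw = [0.0] * L
    for vec in H_vectors:
        total = sum(vec)
        if total == 0:  # assumption: an all-zero document contributes nothing
            continue
        for k in range(L):
            uw[k] += vec[k] / total
    return [sigma1 * x / s for x in uw]


def init_doc_from_directory(dir_vector, sigma2=1.0):
    """Example 3: dw_ik = sigma2 * dw_nk for a document i in directory n."""
    return [sigma2 * x for x in dir_vector]


def default_vector(L):
    """No initial value set: the parameter vector defaults to the zero vector."""
    return [0.0] * L
```

For H with vectors (1, 0, 1) and (0, 2, 2), each row is first scaled to sum to 1, giving (0.5, 0, 0.5) and (0, 0.5, 0.5), and the average (0.25, 0.25, 0.5) becomes the user's initial vector when σ_1 = 1.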

Fig. 4 is a flowchart of the parameter vector update algorithm for documents and users. The algorithm comprises the following steps:

S11. Set initial parameter vectors for some of the documents in D and some of the users in U;

S12. Receive a signal that a user j (j ∈ U) has accessed a document i (i ∈ D); the signal contains at least the identifiers of document i and user j;

S13. Using the identifier of document i, read its parameter vector K_d(i) = (dw_{i1}, dw_{i2}, ..., dw_{ik}, ..., dw_{iL}), where dw_{ik} is the degree of relevance between document i and feature k (k ∈ K);

S14. Using the identifier of user j, read its parameter vector K_u(j) = (uw_{j1}, uw_{j2}, ..., uw_{jk}, ..., uw_{jL}), where uw_{jk} is the degree of relevance between user j and feature k (k ∈ K);

S15. Update the parameter vectors of document i and user j:

K_d^*(i) = \mathrm{function1}[K_d(i), K_u(j)]

K_u^*(j) = \mathrm{function2}[K_d(i), K_u(j)]

where K_d(i) and K_d^*(i) are the parameter vectors of document i before and after the update, and K_u(j) and K_u^*(j) are the parameter vectors of user j before and after the update, with K_d^*(i) = (dw_{i1}^*, dw_{i2}^*, ..., dw_{ik}^*, ..., dw_{iL}^*) and K_u^*(j) = (uw_{j1}^*, uw_{j2}^*, ..., uw_{jk}^*, ..., uw_{jL}^*). function1 indicates that K_d^*(i) is a function of K_d(i) and K_u(j), and function2 indicates that K_u^*(j) is a function of K_d(i) and K_u(j).

In the method of Fig. 4, step S15 is followed by a step that commits the updates, i.e., sets K_d(i) = K_d^*(i) and K_u(j) = K_u^*(j).
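Steps S12 to S15 and the commit step amount to a loop over access signals. In this hypothetical sketch `function1` and `function2` stand in for whichever update rules are chosen (application example 1 gives one), and the (user, document, type) tuple format of a signal is an assumption. The point it illustrates is that both updated vectors are computed from the pre-update values before either is committed.

```python
def process_signals(signals, doc_vecs, user_vecs, function1, function2):
    """Steps S12-S15 plus the commit step for a stream of access signals.

    signals: iterable of (user_j, doc_i, T) tuples (hypothetical format).
    function1/function2: placeholders for the update rules of step S15.
    """
    for user_j, doc_i, _signal_type in signals:
        kd = doc_vecs[doc_i]        # S13: read K_d(i)
        ku = user_vecs[user_j]      # S14: read K_u(j)
        kd_new = function1(kd, ku)  # S15: both updates use the old vectors
        ku_new = function2(kd, ku)
        doc_vecs[doc_i] = kd_new    # commit: K_d(i) = K_d*(i)
        user_vecs[user_j] = ku_new  # commit: K_u(j) = K_u*(j)
```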

In the method of Fig. 4, the signal is of one of the following types: T = 1, user j clicks a link to document i; T = 2, user j types the address of document i; T = 3, user j tags document i; T = 4, user j bookmarks document i; T = 5, user j marks document i as liked (for example with Facebook's "Like" or Google's "+1"); T = 6, user j comments on document i; T = 7, user j forwards document i. In one application example of the method of Fig. 4, the signals are collected from web logs, including server logs, error logs and cookie logs.

In one application example of the method of Fig. 4, the method satisfies K_d^*(i) ≥ K_d(i) and K_u^*(j) ≥ K_u(j). Here K_d^*(i) ≥ K_d(i) means that dw_{ik}^* ≥ dw_{ik} for every k ∈ K, and K_u^*(j) ≥ K_u(j) means that uw_{jk}^* ≥ uw_{jk} for every k ∈ K.

In one application example of the method of Fig. 4, the component dw_{ik}^* of K_d^*(i) is an increasing function of uw_{jk} and a decreasing function of \sum_{k \in K} uw_{jk}; the component uw_{jk}^* of K_u^*(j) is an increasing function of dw_{ik} and a decreasing function of \sum_{k \in K} dw_{ik}.

In one application example of the method of Fig. 4, for each feature k ∈ K the document column vector (dw_{1k}, dw_{2k}, ..., dw_{ik}, ..., dw_{Mk}) and the user column vector (uw_{1k}, uw_{2k}, ..., uw_{jk}, ..., uw_{Nk}) are normalized. Normalization is triggered by at least one of the following conditions: (1) the parameter vector update algorithm has been executed a preset number of times; (2) a preset time is reached. Specific normalization methods are as follows:

Example 1: normalize the document column vector (dw_{1k}, dw_{2k}, ..., dw_{ik}, ..., dw_{Mk}) by setting dw_{ik} = dw_{ik} / \sum_{t \in D} dw_{tk} for each i ∈ D; normalize the user column vector (uw_{1k}, uw_{2k}, ..., uw_{jk}, ..., uw_{Nk}) by setting uw_{jk} = uw_{jk} / \sum_{t \in U} uw_{tk} for each j ∈ U.

Example 2: randomly draw R values from the document column vector (dw_{1k}, dw_{2k}, ..., dw_{ik}, ..., dw_{Mk}) and sort them to obtain the set {s_1, s_2, ..., s_R} with s_1 < s_2 < ... < s_R. For each i ∈ D, if s_m ≤ dw_{ik} ≤ s_{m+1}, set dw_{ik} = g(s_m), where g is an increasing function with g(s_m) ∈ [a, b], and a and b are nonnegative constants. The user column vector (uw_{1k}, uw_{2k}, ..., uw_{jk}, ..., uw_{Nk}) can be normalized in the same way.
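Both normalization examples can be sketched in Python as follows. For Example 2 the text leaves g and the handling of values outside [s_1, s_R] open, so this sketch assumes a rank-based g that spaces the buckets evenly over [a, b], clamps out-of-range values, and drops duplicate sample values so that s_1 < s_2 < ... < s_R holds; all function names are hypothetical.

```python
import bisect
import random


def normalize_by_sum(column):
    """Example 1: divide every entry of a column vector by the column sum."""
    total = sum(column)
    if total == 0:  # assumption: leave an all-zero column unchanged
        return list(column)
    return [v / total for v in column]


def normalize_by_sample(column, R, a=0.0, b=1.0, seed=0):
    """Example 2: map each value to g(s_m) using R sampled, sorted values.

    Assumptions: duplicates in the sample are dropped so s_1 < ... < s_R holds,
    values outside [s_1, s_R] are clamped to the nearest bucket, and
    g(s_m) = a + (b - a) * (m - 1) / (R - 1) for 1-based m, which is
    increasing and stays in [a, b].
    """
    rng = random.Random(seed)
    s = sorted(set(rng.sample(column, min(R, len(column)))))
    r = len(s)

    def g(m):  # m is a 0-based index into s
        return a if r == 1 else a + (b - a) * m / (r - 1)

    out = []
    for v in column:
        m = bisect.bisect_right(s, v) - 1  # largest m with s[m] <= v
        m = max(0, min(m, r - 1))
        out.append(g(m))
    return out
```

The sampled variant is order-preserving (a larger relevance value never maps to a smaller output), which is what lets it replace exact division when the column is too large to sum frequently.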

Application example 1.

This application example of the parameter vector update algorithm of Fig. 4 computes the updated parameter vector K_d^*(i) of document i (i ∈ D) and K_u^*(j) of user j (j ∈ U) with the following algorithm:

dw_{ik}^* = dw_{ik} + \lambda_1(i, j, T) \cdot f_1[K_u(j)] \qquad (\text{for each } k \in UK_j)

uw_{jk}^* = uw_{jk} + \lambda_2(i, j, T) \cdot f_2[K_d(i)] \qquad (\text{for each } k \in DK_i)

where λ_1(i, j, T) is the influence coefficient of user j on document i for signal type T, and λ_2(i, j, T) is the influence coefficient of document i on user j for signal type T. DK_i is the set of features corresponding to the P_i largest components of the parameter vector K_d(i) = (dw_{i1}, dw_{i2}, ..., dw_{ik}, ..., dw_{iL}) of document i, and UK_j is the set of features corresponding to the Q_j largest components of the parameter vector K_u(j) = (uw_{j1}, uw_{j2}, ..., uw_{jk}, ..., uw_{jL}) of user j; P_i and Q_j are preset parameters. For example, for i = 30 with P_30 = 3, DK_30 = {literature, computers, biology}; for j = 265 with Q_265 = 2, UK_265 = {science, history}. dw_{ik} and dw_{ik}^* are the k-th components of K_d(i) before and after the update, and uw_{jk} and uw_{jk}^* are the k-th components of K_u(j) before and after the update. After the algorithm runs, the results are committed: uw_{jk} = uw_{jk}^* for each k ∈ DK_i, and dw_{ik} = dw_{ik}^* for each k ∈ UK_j.
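A minimal sketch of one update step of application example 1, assuming the Example 2 forms of f_1 and f_2 listed in this section and taking λ_1 and λ_2 as precomputed numbers. Function names are hypothetical; ties among equal components in the top-P/top-Q selection are broken by lower feature index, which the text does not specify.

```python
def top_features(vec, count):
    """Indices of the `count` largest components (ties: lower index first)."""
    order = sorted(range(len(vec)), key=lambda k: vec[k], reverse=True)
    return set(order[:count])


def update_pair(dw_i, uw_j, P_i, Q_j, lam1, lam2, sigma3=1.0, sigma4=1.0):
    """One update of application example 1, using Example 2 forms of f1 and f2.

    Document components change only for k in UK_j (user's top Q_j features);
    user components change only for k in DK_i (document's top P_i features).
    Both sets and both f-values are computed from the pre-update vectors.
    """
    DK = top_features(dw_i, P_i)
    UK = top_features(uw_j, Q_j)
    su, sd = sum(uw_j), sum(dw_i)
    new_dw, new_uw = list(dw_i), list(uw_j)
    for k in UK:
        f1 = sigma3 * uw_j[k] / su if su else 0.0
        new_dw[k] = dw_i[k] + lam1 * f1
    for k in DK:
        f2 = sigma4 * dw_i[k] / sd if sd else 0.0
        new_uw[k] = uw_j[k] + lam2 * f2
    return new_dw, new_uw
```

With nonnegative λ and σ values every change is an increase, so this form also satisfies the monotonicity condition K_d^*(i) ≥ K_d(i), K_u^*(j) ≥ K_u(j).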

In application example 1, the algorithm can be further restricted so that uw_{jk}^* ≥ uw_{jk} for each k ∈ DK_i and dw_{ik}^* ≥ dw_{ik} for each k ∈ UK_j.

In application example 1, f_1[K_u(j)] is a function of the parameter vector K_u(j) of user j, and f_2[K_d(i)] is a function of the parameter vector K_d(i) of document i. Specific forms of f_1[K_u(j)] and f_2[K_d(i)] include:

Example 1: f_1[K_u(j)] is an increasing function of uw_{jk} and a decreasing function of \sum_{k \in K} uw_{jk}; f_2[K_d(i)] is an increasing function of dw_{ik} and a decreasing function of \sum_{k \in K} dw_{ik}.

Example 2: f_1[K_u(j)] = σ_3 · uw_{jk} / \sum_{k' \in K} uw_{jk'} and f_2[K_d(i)] = σ_4 · dw_{ik} / \sum_{k' \in K} dw_{ik'}, where σ_3 and σ_4 are preset constants.

Example 3: f_1[K_u(j)] = σ_5 · uw_{jk} and f_2[K_d(i)] = σ_6 · dw_{ik}, where σ_5 and σ_6 are preset constants.

Example 4: f_1[K_u(j)] = σ_7 · 1 / [1 + exp(uw_{jk})] and f_2[K_d(i)] = σ_8 · 1 / [1 + exp(dw_{ik})], where σ_7 and σ_8 are preset constants.
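The concrete forms of f_1 in Examples 2 to 4 can be written as small Python functions taking a parameter vector and a feature index k; f_2 has the same shape, with the document vector and σ_4, σ_6 or σ_8 in place of the user vector and σ_3, σ_5 or σ_7. The function names are hypothetical. Note that 1/(1 + e^x) decreases as x grows, so Example 4 as written is not an increasing function of the component, unlike Example 1.

```python
import math


def f_ratio(vec, k, sigma=1.0):
    """Example 2: sigma * vec[k] / sum(vec)."""
    total = sum(vec)
    return sigma * vec[k] / total if total else 0.0  # zero vector: assumed 0


def f_linear(vec, k, sigma=1.0):
    """Example 3: sigma * vec[k]."""
    return sigma * vec[k]


def f_logistic(vec, k, sigma=1.0):
    """Example 4 as written: sigma * 1 / (1 + exp(vec[k]))."""
    return sigma / (1.0 + math.exp(vec[k]))
```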

In application example 1, specific forms of λ_1(i, j, T) and λ_2(i, j, T) include:

Example 1: λ_1(i, j, T) and λ_2(i, j, T) are functions of the similarity sim(i, j) between the parameter vectors of document i and user j, for example λ_1(i, j, T) = c_1 · sim(i, j) and λ_2(i, j, T) = c_2 · sim(i, j), where sim(i, j) = [\sum_k dw_{ik} uw_{jk}] / ( [\sum_k (dw_{ik})^2]^{1/2} [\sum_k (uw_{jk})^2]^{1/2} ), i.e., the cosine of the angle between K_d(i) and K_u(j), and c_1 and c_2 are preset constants. The idea is that the larger sim(i, j) is, the more similar the document and the user are, and therefore the more weight their "votes" for each other carry.

Example 2: λ_1(i, j, T) = u_1(j) · d_1(i) and λ_2(i, j, T) = u_2(j) · d_2(i), where u_1(j) indicates whether the parameter vector of user j may be used to update the parameter vectors of documents in D; d_1(i) indicates whether the parameter vector of document i may be updated by the parameter vectors of users in U; u_2(j) indicates whether the parameter vector of user j may be updated by the parameter vectors of documents in D; and d_2(i) indicates whether the parameter vector of document i may be used to update the parameter vectors of users in U. u_1(j), u_2(j), d_1(i) and d_2(i) are preset parameters taking the value 1 (yes) or 0 (no). The purpose of this example is to prevent malicious attacks: a document (or user) that has passed reliability certification can be protected so that its parameter vector is not updated by the parameter vectors of other users (or documents), and a document (or user) that has not passed reliability certification cannot update the parameter vectors of other users (or documents).

Example 3: λ_1(i, j, T) = s_1(T) and λ_2(i, j, T) = s_2(T), where T is the type of the signal of the user accessing the document, and s_1(T) and s_2(T) are functions of T.

Example 4: the methods of Examples 1 to 3 are combined to generate λ_1(i, j, T) and λ_2(i, j, T), for example:

\lambda_1(i, j, T) = \{c_1 \cdot sim(i, j)\} \cdot \{u_1(j) \cdot d_1(i)\} \cdot s_1(T)

\lambda_2(i, j, T) = \{c_2 \cdot sim(i, j)\} \cdot \{u_2(j) \cdot d_2(i)\} \cdot s_2(T)

Example 5: λ_1(i, j, T) and λ_2(i, j, T) are preset constants.
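Example 4's combined coefficients can be sketched in Python as below, with sim(i, j) computed as the cosine similarity of Example 1. Returning 0 when either vector is all zeros is an assumption (the formula is undefined there); the function names and argument layout are hypothetical, and s_1 and s_2 are passed in as callables of the signal type T.

```python
import math


def cosine_sim(dw_i, uw_j):
    """sim(i, j) from Example 1: cosine of the angle between K_d(i) and K_u(j)."""
    dot = sum(a * b for a, b in zip(dw_i, uw_j))
    norm_d = math.sqrt(sum(a * a for a in dw_i))
    norm_u = math.sqrt(sum(b * b for b in uw_j))
    if norm_d == 0 or norm_u == 0:
        return 0.0  # assumption: undefined for a zero vector, treated as 0
    return dot / (norm_d * norm_u)


def combined_lambdas(dw_i, uw_j, T, c1, c2, u1, d1, u2, d2, s1, s2):
    """Example 4: lambda = {c * sim} * {reliability flags} * s(T)."""
    sim = cosine_sim(dw_i, uw_j)
    lam1 = (c1 * sim) * (u1 * d1) * s1(T)  # influence of user j on document i
    lam2 = (c2 * sim) * (u2 * d2) * s2(T)  # influence of document i on user j
    return lam1, lam2
```

Because parameter vectors default to the zero vector, under this sketch's convention a system using this combination needs nonzero initial vectors (step S11) for at least some documents and users; otherwise sim(i, j) stays 0 and no update takes effect.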

In application example 1, for each feature k ∈ K, the document column vector (dw_{1k}, dw_{2k}, ..., dw_{ik}, ..., dw_{Mk}) and the user column vector (uw_{1k}, uw_{2k}, ..., uw_{jk}, ..., uw_{Nk}) are normalized. Normalization is triggered by at least one of the following conditions: (1) the specific parameter vector update algorithm has been executed a preset number of times; (2) a preset time is reached.

Application example 2

This is a concrete instance of application example 1. For ease of illustration, suppose there are two users and three documents on the Internet, and that users and documents each have two features: U = {1, 2}, D = {1, 2, 3} and K = {1, 2}. The parameter vectors of users 1 and 2 are (uw_{11}, uw_{12}) and (uw_{21}, uw_{22}); those of documents 1, 2 and 3 are (dw_{11}, dw_{12}), (dw_{21}, dw_{22}) and (dw_{31}, dw_{32}). Here uw_{21} is the degree of relevance between user 2 and feature 1, and dw_{32} is the degree of relevance between document 3 and feature 2; the other subscripts are defined likewise.

Suppose a signal of type T = 1 is received indicating that user 2 has accessed document 3. The parameter vectors of document 3 and user 2 are then updated by the following parameter vector update algorithm:

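dw_{31}^{*} = dw_{31} + \lambda_1(2, 3, 1) \cdot \frac{uw_{21}}{uw_{21} + uw_{22}}

dw_{32}^{*} = dw_{32} + \lambda_1(2, 3, 1) \cdot \frac{uw_{22}}{uw_{21} + uw_{22}}

uw_{21}^{*} = uw_{21} + \lambda_2(2, 3, 1) \cdot \frac{dw_{31}}{dw_{31} + dw_{32}}

uw_{22}^{*} = uw_{22} + \lambda_2(2, 3, 1) \cdot \frac{dw_{32}}{dw_{31} + dw_{32}}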

Here λ₁(2, 3, 1) denotes the influence coefficient of user 2 on document 3 under signal type T = 1, and λ₂(2, 3, 1) denotes the influence coefficient of document 3 on user 2 under signal type T = 1. For example, λ₁(2, 3, 1) = c₁·sim(2, 3)·s₁(1) and λ₂(2, 3, 1) = c₂·sim(2, 3)·s₂(1), where s₁(1) = s₂(1) = 1.5, c₁ and c₂ are preset constants, and sim(2, 3) denotes the mathematical distance between the parameter vectors of document 3 and user 2, computed as the cosine of the angle between them:

\mathrm{sim}(2, 3) = \frac{uw_{21} \cdot dw_{31} + uw_{22} \cdot dw_{32}}{\left[(uw_{21})^2 + (uw_{22})^2\right]^{1/2} \cdot \left[(dw_{31})^2 + (dw_{32})^2\right]^{1/2}}

After the algorithm above has been executed, the parameter vectors of document 3 and user 2 are updated by setting dw₃₁ = dw₃₁*, dw₃₂ = dw₃₂*, uw₂₁ = uw₂₁* and uw₂₂ = uw₂₂*; all four starred values are computed from the values before the update.
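The update of Application Example 2 can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: the function names `cosine` and `update_pair` are hypothetical, and leaving a vector unchanged when the other party's vector sums to zero (for example, a user still at the default zero vector) is an assumption, because the text does not say how that case is handled. Both starred vectors are computed from the pre-update vectors before either is overwritten.

```python
import math


def cosine(u, d):
    """sim(., .) as in Application Example 2: cosine of the angle between u and d."""
    dot = sum(a * b for a, b in zip(u, d))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0  # zero vector -> similarity 0 (assumption)


def update_pair(dw, uw, c1, c2, s1, s2):
    """One access-signal update for a (document, user) pair.

    dw: document parameter vector, uw: user parameter vector.
    lambda1 = c1 * sim * s1 and lambda2 = c2 * sim * s2, as in the example.
    Returns (dw*, uw*); both are computed from the old vectors only.
    """
    sim = cosine(uw, dw)
    lam1 = c1 * sim * s1  # influence of the user on the document
    lam2 = c2 * sim * s2  # influence of the document on the user
    su, sd = sum(uw), sum(dw)
    # dw_k* = dw_k + lambda1 * uw_k / sum(uw); skipped if sum(uw) == 0 (assumption)
    dw_new = [x + lam1 * (u / su) for x, u in zip(dw, uw)] if su else list(dw)
    # uw_k* = uw_k + lambda2 * dw_k / sum(dw); skipped if sum(dw) == 0 (assumption)
    uw_new = [y + lam2 * (d / sd) for y, d in zip(uw, dw)] if sd else list(uw)
    return dw_new, uw_new
```

For instance, with the assumed values (dw₃₁, dw₃₂) = (0.2, 0.6), (uw₂₁, uw₂₂) = (0.5, 0.5) and c₁ = c₂ = 1, `update_pair([0.2, 0.6], [0.5, 0.5], 1.0, 1.0, 1.5, 1.5)` gives sim(2, 3) = 2/√5 and increases every component, in line with the non-decreasing property stated in claim 5.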

After the parameter vector update algorithm has been executed, the user column vectors (uw₁₁, uw₂₁) and (uw₁₂, uw₂₂) and the document column vectors (dw₁₁, dw₂₁, dw₃₁) and (dw₁₂, dw₂₂, dw₃₂) are normalized as follows, where every denominator uses the values before normalization:

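uw_{11} \leftarrow \frac{uw_{11}}{uw_{11} + uw_{21}}, \quad uw_{21} \leftarrow \frac{uw_{21}}{uw_{11} + uw_{21}} \quad \text{(feature } k = 1\text{)}

uw_{12} \leftarrow \frac{uw_{12}}{uw_{12} + uw_{22}}, \quad uw_{22} \leftarrow \frac{uw_{22}}{uw_{12} + uw_{22}} \quad \text{(feature } k = 2\text{)}

dw_{11} \leftarrow \frac{dw_{11}}{dw_{11} + dw_{21} + dw_{31}}, \quad dw_{21} \leftarrow \frac{dw_{21}}{dw_{11} + dw_{21} + dw_{31}}, \quad dw_{31} \leftarrow \frac{dw_{31}}{dw_{11} + dw_{21} + dw_{31}} \quad \text{(feature } k = 1\text{)}

dw_{12} \leftarrow \frac{dw_{12}}{dw_{12} + dw_{22} + dw_{32}}, \quad dw_{22} \leftarrow \frac{dw_{22}}{dw_{12} + dw_{22} + dw_{32}}, \quad dw_{32} \leftarrow \frac{dw_{32}}{dw_{12} + dw_{22} + dw_{32}} \quad \text{(feature } k = 2\text{)}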
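The normalization above divides each entry of a feature column by that column's sum, so each column sums to 1 afterwards. A minimal sketch, assuming the vectors are held as a list of rows (one row per user or document, one column per feature); the function name is hypothetical, and leaving an all-zero column untouched is an assumption the text does not address.

```python
def normalize_columns(weights):
    """L1-normalize each feature column of a row-per-entity weight matrix.

    weights[r][k] is the relevance of entity r (user or document) to feature k.
    All column sums are taken before any entry is changed, so every denominator
    uses pre-normalization values. All-zero columns are left as-is (assumption).
    """
    if not weights:
        return []
    n_features = len(weights[0])
    col_sums = [sum(row[k] for row in weights) for k in range(n_features)]
    return [
        [row[k] / col_sums[k] if col_sums[k] else row[k] for k in range(n_features)]
        for row in weights
    ]
```

Applied to the example, the user rows [[uw₁₁, uw₁₂], [uw₂₁, uw₂₂]] and the document rows [[dw₁₁, dw₁₂], [dw₂₁, dw₂₂], [dw₃₁, dw₃₂]] are passed as two separate calls, since users and documents are normalized independently.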

Fig. 5 is a flowchart of the personalized multidimensional document ranking method. This integrated application example of the personalized document ranking method comprises the following steps:

S10. Apply the parameter vector update algorithm repeatedly to update the parameter vectors of the documents in document set D and of the users in user set U; this is implemented by steps S11 to S15 in Fig. 4;

S20. Set the initial ranking vector of each document in document set D;

S30. Apply the ranking vector update algorithm to iteratively update the ranking value of each document in document set D under each feature, that is, update the ranking vector of each document in D;

S40. User n (n ∈ U) sets a query vector and submits a search condition, from which search keywords are extracted; the search condition comprises all the information the user submits in the search dialog;

S50. Retrieve from document set D a group of documents Q matching the search keywords;

S60. Calculate the personalized ranking value UR(i, n) of each document in Q from the query vector and the ranking vector of each document in Q, where UR(i, n) denotes the personalized ranking value of document i (i ∈ Q) based on the query vector of user n;

S70. Sort Q by the personalized ranking values UR(i, n), and send Q to user n according to the ranking result.
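Steps S50 to S70 can be sketched as a retrieve-then-sort function. This is a minimal sketch under stated assumptions: the term index `doc_terms`, the rule that a document matches when it contains every keyword, and breaking ties by document id are all hypothetical choices not specified in the text; the personalized ranking value UR(i, n) is passed in as a function so either of the two calculation methods can be plugged in.

```python
from typing import Callable, Dict, List, Set


def retrieve_and_rank(doc_terms: Dict[int, Set[str]],
                      keywords: List[str],
                      ur: Callable[[int], float]) -> List[int]:
    """Sketch of steps S50-S70 for one querying user n.

    doc_terms: hypothetical index, document id -> set of indexed terms.
    keywords:  search keywords extracted from the search condition (S40).
    ur:        returns the personalized ranking value UR(i, n) of document i.
    """
    # S50: group Q = documents containing every keyword (assumed matching rule)
    q = [i for i, terms in doc_terms.items() if all(k in terms for k in keywords)]
    # S60 + S70: order Q by descending UR(i, n); ties broken by id (assumption)
    return sorted(q, key=lambda i: (-ur(i), i))
```

Usage: with `docs = {1: {"ranking", "web"}, 2: {"ranking"}, 3: {"ranking", "web", "user"}}` and UR values `{1: 0.2, 2: 0.5, 3: 0.9}`, the query `["ranking", "web"]` returns `[3, 1]`.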

In the method of Fig. 5, let the query vector of user n be K_s(n) = (sw_n1, sw_n2, ..., sw_nk, ..., sw_nL), where sw_nk denotes the degree of relevance between the queried documents and feature k (k ∈ K), sw_nk ∈ [a, b], and a and b are preset non-negative constants. The query vector K_s(n) can be set in any of the following ways.

First, user n selects features from feature set K and assigns each selected feature a relevance value, for example sw_n2 = 0.00023 and sw_n6 = 0.00061, with all other components set to 0.

Second, the parameter vector of user n is assigned to the query vector K_s(n).

Third, user n submits a set of user or document identifiers S_n = {..., r, ...}, and the query vector is derived from the parameter vectors of the members of S_n. Let s be the number of elements of S_n, and let σ₉ and σ₁₀ be preset constants.

When S_n contains only users, the parameter vector of user r (r ∈ S_n) is (uw_r1, uw_r2, ..., uw_rL), and for each feature k ∈ K the query vector of user n is set to:

sw_{nk} = \frac{\sigma_9}{s} \sum_{r \in S_n} \frac{uw_{rk}}{\sum_{k' \in K} uw_{rk'}}

When S_n contains only documents, the parameter vector of document r (r ∈ S_n) is (dw_r1, dw_r2, ..., dw_rL), and for each feature k ∈ K the query vector of user n is set to:

sw_{nk} = \frac{\sigma_{10}}{s} \sum_{r \in S_n} \frac{dw_{rk}}{\sum_{k' \in K} dw_{rk'}}

When S_n contains both users and documents, for each feature k ∈ K the query vector of user n is set to:

sw_{nk} = \frac{\sigma_9}{s} \sum_{r \in S_n \cap U} \frac{uw_{rk}}{\sum_{k' \in K} uw_{rk'}} + \frac{\sigma_{10}}{s} \sum_{r \in S_n \cap D} \frac{dw_{rk}}{\sum_{k' \in K} dw_{rk'}}
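The third way of setting K_s(n) averages the L1-normalized parameter vectors of the members of S_n. The mixed-case formula already covers the other two cases, because the user or document sum is simply empty when S_n contains only one kind. A minimal sketch, assuming the member vectors have already been looked up; the function name is hypothetical, and skipping all-zero member vectors (whose normalization would divide by zero) is an assumption.

```python
def query_vector_from_set(user_vecs, doc_vecs, sigma9, sigma10):
    """Third way of setting the query vector K_s(n) from a set S_n.

    user_vecs: parameter vectors of the users in S_n ∩ U
    doc_vecs:  parameter vectors of the documents in S_n ∩ D
    s = |S_n| = len(user_vecs) + len(doc_vecs)
    """
    s = len(user_vecs) + len(doc_vecs)
    if s == 0:
        raise ValueError("S_n must not be empty")
    n_features = len((user_vecs or doc_vecs)[0])
    sw = [0.0] * n_features
    for vecs, sigma in ((user_vecs, sigma9), (doc_vecs, sigma10)):
        for v in vecs:
            total = sum(v)  # sum over k' of the member's weights
            if total == 0:
                continue  # all-zero member vector contributes nothing (assumption)
            for k in range(n_features):
                sw[k] += (sigma / s) * v[k] / total
    return sw
```

For example, one user with (1, 3) and one document with (2, 2), with σ₉ = σ₁₀ = 1, give K_s(n) = (0.375, 0.625).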

In the method of Fig. 5, the personalized ranking value UR(i, n) can be calculated in either of two ways.

The first method calculates the personalized ranking value UR(i, n) of document i (i ∈ Q) from the query vector K_s(n) of user n and the ranking vector of document i, namely:

UR(i, n) = \sum_{k=1}^{L} PR(i, k) \cdot sw_{nk}

where PR(i, k) denotes the ranking value of document i within document set D under feature k (k ∈ K), and sw_nk denotes the degree of relevance between the queried documents and feature k.
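The first method is a weighted sum: each per-feature ranking value PR(i, k) is weighted by the user's query relevance sw_nk for that feature, so features the user left at 0 do not contribute. A minimal sketch; the function name is hypothetical, and Python indices run from 0 to L−1 rather than 1 to L.

```python
def ur_weighted(pr_i, sw_n):
    """First method: UR(i, n) = sum over k of PR(i, k) * sw_nk.

    pr_i: ranking vector of document i, pr_i[k] = PR(i, k)
    sw_n: query vector K_s(n) of user n, sw_n[k] = sw_nk
    """
    if len(pr_i) != len(sw_n):
        raise ValueError("ranking vector and query vector must both have L components")
    return sum(p * w for p, w in zip(pr_i, sw_n))
```

With the ranking vector (0.1, 0.4, 0.0) and the query vector (0, 0.00023, 0.5), only the second feature contributes, giving 0.4 × 0.00023.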

The second method defines UR(i, n) as the mathematical distance between the query vector of user n and the parameter vector of document i in Q. If the query vector of user n is K_s(n) = (sw_n1, sw_n2, ..., sw_nL) and the parameter vector of document i (i ∈ Q) is K_d(i) = (dw_i1, dw_i2, ..., dw_iL), then:

UR(i, n) = \| K_s(n), K_d(i) \| = \frac{\sum_k sw_{nk} \cdot dw_{ik}}{\left[\sum_k (sw_{nk})^2\right]^{1/2} \cdot \left[\sum_k (dw_{ik})^2\right]^{1/2}}
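The second method is the cosine similarity between the query vector and the document's parameter vector, so it depends only on the direction of the two vectors, not their length. A minimal sketch; the function name is hypothetical, and returning 0 when either vector is all zeros (for example, a document still at the default zero vector) is an assumption, since the formula is undefined in that case.

```python
import math


def ur_cosine(sw_n, dw_i):
    """Second method: UR(i, n) as the cosine of K_s(n) and K_d(i)."""
    num = sum(a * b for a, b in zip(sw_n, dw_i))
    den = math.sqrt(sum(a * a for a in sw_n)) * math.sqrt(sum(b * b for b in dw_i))
    return num / den if den else 0.0  # zero vector -> 0 (assumption)
```

Unlike the first method, this value does not use the link-based ranking values PR(i, k) at all; it measures only how well document i's feature profile matches the user's query profile.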

Fig. 6 is a structural diagram of the personalized multidimensional document ranking system. Document ranking system 200 comprises the following functional modules:

Document/user feature setting module 211: obtains multiple Web documents via a crawler program to form document set D = {1, 2, ..., M}, where each document has a ranking vector and a parameter vector, both stored in document database 220; obtains multiple users on the Internet to form user set U = {1, 2, ..., N}, where each user has a parameter vector stored in user database 230; and sets the domain feature set K = {1, 2, ..., L}, which is stored in feature database 240;

Document and user initial value setting module 212: sets initial parameter vectors for some of the documents in document set D and an initial ranking vector for every document in D, and stores these initial values in document database 220; sets initial parameter vectors for some of the users in user set U and stores them in user database 230; users and documents that are not given an initial parameter vector default to the zero vector;

User document-access signal acquisition module 213: collects signals indicating that any user j (j ∈ U) (102) has accessed any document i (i ∈ D); each signal contains at least the identifiers of user j (102) and document i, and is stored in Web log database 250. The signal of user j (102) accessing document i is sent to at least one application server, the application servers including portal site server 301, social networking server 302, search engine server 303 and instant messaging server 304;

Document and user parameter vector update module 214: using the identifiers of document i and user j (102) obtained by signal acquisition module 213, reads the parameter vector of document i from document database 220 and the parameter vector of user j from user database 230; then updates the parameter vectors of document i and user j (102) with the parameter vector update algorithm; finally, writes the updated parameter vectors of document i and user j (102) back to document database 220 and user database 230, respectively;
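The read-update-write cycle of module 214 can be sketched with in-memory dictionaries standing in for document database 220 and user database 230. This is an illustrative sketch under stated assumptions: `AccessSignal`, `handle_signal` and the `update` callback signature are hypothetical names, and the concrete parameter vector update algorithm is passed in rather than fixed, since the patent gives several variants.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# Hypothetical signature of the update algorithm: (K_d(i), K_u(j), T) -> (K_d*(i), K_u*(j))
UpdateFn = Callable[[Vector, Vector, int], Tuple[Vector, Vector]]


@dataclass
class AccessSignal:
    user_id: int      # identifier of user j
    doc_id: int       # identifier of document i
    signal_type: int  # signal type T (click, bookmark, comment, ...)


def handle_signal(sig: AccessSignal,
                  document_db: Dict[int, Vector],
                  user_db: Dict[int, Vector],
                  update: UpdateFn,
                  n_features: int) -> None:
    """Read, update and write back K_d(i) and K_u(j) for one access signal."""
    # Ids without a stored vector fall back to the zero vector, as module 212 specifies.
    k_d = document_db.get(sig.doc_id, [0.0] * n_features)
    k_u = user_db.get(sig.user_id, [0.0] * n_features)
    new_d, new_u = update(k_d, k_u, sig.signal_type)
    # Write the updated vectors back to the document and user databases.
    document_db[sig.doc_id] = new_d
    user_db[sig.user_id] = new_u
```

Passing the update rule as a function keeps the module independent of which λ₁/λ₂ variant (Examples 1 to 5) is in use.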

Document ranking vector update module 215: according to the link relationships among the documents in document set D and the parameter vector of each document, iteratively updates the ranking value of each document in D under each feature, then updates document database 220 with the updated ranking values;

User query module 216: first, the user sets a query vector and submits a search condition; next, search keywords are extracted from the search condition, and the group of documents Q matching them is retrieved from document set D; then the personalized ranking value of each document in Q is calculated from the query vector and the ranking vector of each document in Q, and Q is sorted by these values; finally, Q is sent in batches to the querying user according to the ranking result.

The application examples above are merely preferred application examples of the present invention and do not limit its scope of protection.

Claims

1. A personalized multidimensional document ranking method, characterized in that a domain feature set K = {1, 2, ..., L} is set in a server, and the following steps are performed:

2. The method according to claim 1, characterized in that the ranking vector update algorithm is applied repeatedly to iteratively update the ranking vector of each document in document set D.

3. The method according to claim 1, characterized in that the ranking vector update algorithm further comprises the step of adjusting the ranking value of each document in document set S under each feature k (k ∈ K) according to the number of outbound links of that document.

4. The method according to claim 1, characterized in that the parameter vector update algorithm comprises the following steps: first, obtain multiple users to form user set U = {1, 2, ..., N}, and let the parameter vector of user j (j ∈ U) be K_u(j) = (uw_j1, uw_j2, ..., uw_jk, ..., uw_jL), where uw_jk denotes the degree of relevance between user j and feature k (k ∈ K); next, set initial parameter vectors for some of the documents in document set D and for some of the users in user set U; finally, repeatedly perform the following steps:

Receive a signal indicating that any user j (j ∈ U) has accessed any document i (i ∈ D), the signal containing at least the identifiers of document i and user j;

Read the parameter vector K_d(i) of document i according to the identifier of document i;

Read the parameter vector K_u(j) of user j according to the identifier of user j;

Update the parameter vectors of document i and user j:

K_d^{*}(i) = \mathrm{function1}[K_d(i), K_u(j)]

K_u^{*}(j) = \mathrm{function2}[K_d(i), K_u(j)]

where K_d(i) and K_d^*(i) denote the parameter vector of document i before and after the update, respectively, and K_u(j) and K_u^*(j) denote the parameter vector of user j before and after the update, respectively.

5. The method according to claim 4, characterized in that the parameter vector update algorithm satisfies K_d^*(i) ≥ K_d(i) and K_u^*(j) ≥ K_u(j).

6. The method according to claim 4, characterized in that, letting K_d^*(i) = (dw_i1^*, dw_i2^*, ..., dw_ik^*, ..., dw_iL^*) and K_u^*(j) = (uw_j1^*, uw_j2^*, ..., uw_jk^*, ..., uw_jL^*), dw_ik^* is an increasing function of uw_jk and a decreasing function of ∑_{k ∈ K} uw_jk, and uw_jk^* is an increasing function of dw_ik and a decreasing function of ∑_{k ∈ K} dw_ik.

7. The method according to claim 4, characterized in that the signal is one of the following types: user j clicks a link to document i; user j types the address of document i; user j adds a tag to document i; user j bookmarks document i; user j marks document i as liked; user j comments on document i; user j forwards document i.

8. The method according to claim 4, characterized in that the parameter vector K_d^*(i) of document i and the parameter vector K_u^*(j) of user j are updated specifically as follows:

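dw_{ik}^{*} = dw_{ik} + \lambda_1(i, j, T) \cdot f_1[K_u(j)] \quad \text{(for each } k \in \ldots\text{)}

uw_{jk}^{*} = uw_{jk} + \lambda_2(i, j, T) \cdot f_2[K_d(i)] \quad \text{(for each } k \in \ldots\text{)}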

where λ₁(i, j, T) is the influence coefficient of user j on document i under signal type T; λ₂(i, j, T) is the influence coefficient of document i on user j under signal type T; DK_i is the set of features corresponding to the P_i largest components of the parameter vector K_d(i) = (dw_i1, dw_i2, ..., dw_ik, ..., dw_iL) of document i; UK_j is the set of features corresponding to the Q_j largest components of the parameter vector K_u(j) = (uw_j1, uw_j2, ..., uw_jk, ..., uw_jL) of user j; P_i and Q_j are preset parameters; dw_ik and dw_ik^* denote the k-th component of the parameter vector K_d(i) of document i before and after the update, respectively; and uw_jk and uw_jk^* denote the k-th component of the parameter vector K_u(j) of user j before and after the update, respectively.

9. The method according to claim 8, characterized in that λ₁(i, j, T) and λ₂(i, j, T) are each a function of the mathematical distance between the parameter vector of document i and the parameter vector of user j.

10. The method according to claim 4, characterized in that, for each feature k ∈ K, the user column vector (uw_1k, uw_2k, ..., uw_jk, ..., uw_Nk) is normalized, and, for each feature k ∈ K, the document column vector (dw_1k, dw_2k, ..., dw_ik, ..., dw_Mk) is normalized.

11. The method according to claim 1, characterized in that the method comprises a personalized document ranking application example with the following steps:

Repeatedly apply the parameter vector update algorithm to update the parameter vectors of the documents in document set D and of the users in user set U;

Set the initial ranking vector of each document in document set D;

Apply the ranking vector update algorithm to iteratively update the ranking value of each document in document set D under each feature;

User n (n ∈ U) sets a query vector and submits a search condition; search keywords are extracted from the search condition;

Retrieve from document set D a group of documents Q matching the search keywords;

Calculate the personalized ranking value of each document in Q from the query vector and the ranking vector of each document in Q;

Sort Q by the personalized ranking values, and send Q to user n according to the ranking result.

12. A personalized multidimensional document ranking system, characterized in that it comprises the following functional modules:

The document/user feature setting module: obtains multiple Web pages via a crawler program to form document set D = {1, 2, ..., M}, where each document has a ranking vector and a parameter vector, both stored in a document database; obtains multiple users on the Internet to form user set U = {1, 2, ..., N}, where each user has a parameter vector stored in a user database; and sets the domain feature set K = {1, 2, ..., L}, which is stored in a feature database;

The document and user initial value setting module: sets initial parameter vectors for some of the documents in document set D and an initial ranking vector for every document in D, and stores these initial values in the document database; sets initial parameter vectors for some of the users in user set U and stores them in the user database; users and documents that are not given an initial parameter vector default to the zero vector;

The user document-access signal acquisition module: collects signals indicating that any user j (j ∈ U) has accessed any document i (i ∈ D); each signal contains at least the identifiers of user j and document i and is stored in a Web log database; the signal of user j accessing document i is sent to at least one application server, the application servers including a portal site server, a social networking server, a search engine server and an instant messaging server;

The document and user parameter vector update module: using the identifiers of document i and user j obtained by the signal acquisition module, reads the parameter vector of document i from the document database and the parameter vector of user j from the user database, then updates the parameter vectors of document i and user j with the parameter vector update algorithm;

The document ranking vector update module: according to the initial ranking vector of each document in document set D and the parameter vector of each document, iteratively updates the ranking value of each document in D under each feature, and updates the document database with the updated ranking values;

The user query module: first, the user sets a query vector and submits a search condition, and the query module extracts search keywords from the search condition; next, the group of documents Q matching the search keywords is retrieved from document set D; then the personalized ranking value of each document in Q is calculated from the query vector and the ranking vector of each document in Q; finally, Q is sorted by the personalized ranking values and sent in batches to the querying user according to the ranking result.