Embodiment
The method of the invention is described in further detail below with reference to the accompanying drawings.
The specific embodiments of this method comprise the following parts. First, the numbering scheme for users, documents and features is described, together with how parameter vectors represent users and documents. Next, the parameter vector update algorithm driven by user-accesses-user signals is described, followed by the parameter vector update algorithm driven by user-accesses-document signals. Then, a method for finding users or user groups on a social network from given features is described. Finally, a system for obtaining users' personalized features is described.
First, the numbering scheme for users, documents and features is described.
A plurality of users is obtained from the Internet. Each user has at least one user identifier, such as a user account number, phone number, cookie identifier, IP address, email address or instant-messaging number. The obtained users are numbered uniformly, and the user numbers are collected into the user set $U = \{1, 2, \ldots, M\}$, where $M$ is the number of users and each user has a unique user code. Likewise, a plurality of documents is obtained from the Internet, for example Web pages collected by a crawler (spider). Each document on the Internet has a unique identifier, for example the URL of a Web page. The obtained documents are numbered uniformly, and the document codes are collected into the document set $D = \{1, 2, \ldots, N\}$, where $N$ is the number of documents and each document has a unique document code.
The features possessed by the elements of the user set $U$ and the document set $D$ are numbered uniformly as well, forming the feature set $K = \{1, 2, \ldots, L\}$, where $L$ is the number of features. A feature denotes an attribute of users and documents, for example news, finance, science, music, military affairs or sports.
The representation of user and document parameter vectors is introduced next. It resembles the vector representation of the vector space model: feature items serve as the basic units of user and document characteristics. A user's parameter vector is the set of that user's relevance degrees to each feature, and a document's parameter vector is the set of that document's relevance degrees to each feature. If a user or document lacks a certain feature, its relevance degree to that feature is zero.
Fig. 1 shows the parameter vector representation of each user in the user set $U$. The parameter vector of any user $i \in U$ is $K_u(i) = (uw_{i1}, uw_{i2}, \ldots, uw_{ik}, \ldots, uw_{iL})$, where $uw_{ik}$ denotes the relevance degree between user $i$ and feature $k \in K$, and $uw_{ik} \in [a, b]$ with $a$ and $b$ nonnegative constants. In addition, the relevance degrees of all users in $U$ to the $k$-th feature of $K$ are collected into a vector called the $k$-th user column vector of $U$, $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$.
Fig. 2 shows the parameter vector representation of each document in the document set $D$. The parameter vector of any document $n \in D$ is $K_d(n) = (dw_{n1}, dw_{n2}, \ldots, dw_{nk}, \ldots, dw_{nL})$, where $dw_{nk}$ denotes the relevance degree between document $n$ and feature $k \in K$, and $dw_{nk} \in [a, b]$ with $a$ and $b$ nonnegative constants. In addition, the relevance degrees of all documents in $D$ to the $k$-th feature of $K$ are collected into a vector called the $k$-th document column vector of $D$, $(dw_{1k}, dw_{2k}, \ldots, dw_{Nk})$.
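As an illustration only, the representation above can be stored as two dense matrices. Each row is a parameter vector, and each column is a user or document column vector. The sizes, the 0-based indexing and the helper names `user_column` and `doc_column` are assumptions for this sketch, not part of the described method. A large deployment would likely use sparse storage instead.

```python
import numpy as np

# Hypothetical sizes: M users, N documents, L features.
# Indices are 0-based here, while the text numbers from 1.
M, N, L = 4, 3, 5

# Row i of user_w is K_u(i) = (uw_i1, ..., uw_iL); row n of doc_w is K_d(n).
# Rows default to the zero vector (no known relevance to any feature).
user_w = np.zeros((M, L))
doc_w = np.zeros((N, L))

# Illustrative values for one user.
user_w[1] = [0.0, 0.00032, 0.0, 0.00059, 0.0]


def user_column(k: int) -> np.ndarray:
    """k-th user column vector (uw_1k, ..., uw_Mk): all users' relevance to feature k."""
    return user_w[:, k]


def doc_column(k: int) -> np.ndarray:
    """k-th document column vector (dw_1k, ..., dw_Nk)."""
    return doc_w[:, k]


print(user_column(3))  # relevance of every user to feature index 3
```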
The relevance degree is a real value expressing how closely a document or user is related to a given feature in $K$. If a document or user is strongly related to the music feature and only weakly related to the sports feature, its relevance degree to music is high and its relevance degree to sports is low. Moreover, some features are correlated with one another. The dimension of $K$ can therefore be reduced during feature selection by eliminating correlated features, which lowers server storage requirements and improves algorithm efficiency. Some features need not be listed in the feature set at all, because their relevance degrees can be computed from the relevance degrees of one or more other features in $K$.
The following three examples describe how initial values of user or document parameter vectors are set. If no initial value is set for a user or document parameter vector, it defaults to the zero vector.
Example 1 sets the initial parameter vector of a user $i \in U$ or a document $n \in D$ manually. For instance, with $L = 5$ features and $K = $ (science, finance, education, music, sports), set $K_u(i) = (uw_{i1}, uw_{i2}, uw_{i3}, uw_{i4}, uw_{i5}) = (0, 0.00032, 0, 0.00059, 0)$. That is, user $i$ has relevance degree 0.00032 to the finance feature, 0.00059 to the music feature, and zero to every other feature. In the same way, the initial value of the parameter vector $K_d(n) = (dw_{n1}, dw_{n2}, \ldots, dw_{nk}, \ldots, dw_{nL})$ of any document $n$ can be set.
Example 2 sets the initial parameter vector of a user $i \in U$ from documents. User $i$ submits a document set $H = \{\ldots, r, \ldots\}$, and each document $r \in H$ has the parameter vector $K_d(r) = (dw_{r1}, dw_{r2}, \ldots, dw_{rL})$. Then, for each $k \in K$, set $uw_{ik} = \frac{\sigma_1}{s} \sum_{r \in H} \frac{dw_{rk}}{\sum_{k' \in K} dw_{rk'}}$, where $s$ is the number of elements of $H$ and $\sigma_1$ is a preset constant. In a similar way, user $i$ may instead select a group of users from $U$ and compute its initial parameter vector from theirs.
Example 3 sets the initial parameter vector of a document. A directory is a special kind of document and has its own document code. Portal websites, for example, are usually divided into directories such as news, music, sports, finance, and science and technology. We assume that documents under the same directory share some features; for example, documents under the sports directory are all related to sports. If document $n \in D$ lies under directory $h \in D$, the initial parameter vector of $n$ is determined by that of $h$. For example, for each $k \in K$, set $dw_{nk} = \sigma_2 \cdot dw_{hk}$, where $\sigma_2$ is a preset constant.
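Initialization Examples 2 and 3 can be sketched in Python as follows. The sketch assumes that parameter vectors are NumPy rows and that directory membership is a plain dict mapping a document index to its directory index. Both the dict structure and the function names are hypothetical. The zero-denominator handling is an added assumption, since the text does not cover it.

```python
import numpy as np


def init_user_from_documents(doc_vectors: np.ndarray, sigma1: float) -> np.ndarray:
    """Example 2: uw_ik = (sigma1 / s) * sum over r in H of dw_rk / sum_k' dw_rk'.

    doc_vectors holds one row K_d(r) per submitted document r in H (s rows).
    """
    totals = doc_vectors.sum(axis=1, keepdims=True)
    # Assumption: an all-zero document contributes nothing (the text leaves
    # the zero-denominator case open).
    shares = np.divide(doc_vectors, totals,
                       out=np.zeros(doc_vectors.shape, dtype=float),
                       where=totals > 0)
    s = doc_vectors.shape[0]
    return (sigma1 / s) * shares.sum(axis=0)


def init_documents_from_directories(doc_w: np.ndarray, parent: dict, sigma2: float) -> None:
    """Example 3: dw_nk = sigma2 * dw_hk for every document n under directory h.

    parent maps a document index n to its directory index h (hypothetical
    structure). Directory vectors are assumed to be set already.
    """
    for n, h in parent.items():
        doc_w[n] = sigma2 * doc_w[h]
```

For example, two submitted documents with vectors (1, 1, 0) and (0, 2, 2) and $\sigma_1 = 1$ give the initial user vector (0.25, 0.5, 0.25).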
Fig. 3 shows the method for obtaining users' personalized features from user-accesses-user signals. It comprises the following steps:
S11. Obtain a plurality of users from the Internet and store the user set $U = \{1, 2, \ldots, M\}$ formed by them; set a plurality of features and store the feature set $K = \{1, 2, \ldots, L\}$ formed by them;
S12. Set initial parameter vectors for the users in $U$;
S13. Receive a signal that any user $i \in U$ has accessed any user $j \in U$;
S14. Read the parameter vector $K_u(i) = (uw_{i1}, uw_{i2}, \ldots, uw_{ik}, \ldots, uw_{iL})$ of user $i$, where $uw_{ik}$ denotes the relevance degree between user $i$ and feature $k \in K$;
S15. Read the parameter vector $K_u(j) = (uw_{j1}, uw_{j2}, \ldots, uw_{jk}, \ldots, uw_{jL})$ of user $j$, where $uw_{jk}$ denotes the relevance degree between user $j$ and feature $k \in K$;
S16. Update the parameter vectors of user $i$ and user $j$ with the following parameter vector update algorithm:
$K_u^*(i) = \mathrm{function1}[K_u(i), K_u(j)]$,
$K_u^*(j) = \mathrm{function2}[K_u(i), K_u(j)]$;
then return to step S13.
Here $K_u(i)$ and $K_u^*(i)$ denote the parameter vector of user $i$ before and after the update, and $K_u(j)$ and $K_u^*(j)$ denote those of user $j$. The updated vectors are $K_u^*(i) = (uw_{i1}^*, uw_{i2}^*, \ldots, uw_{ik}^*, \ldots, uw_{iL}^*)$ and $K_u^*(j) = (uw_{j1}^*, uw_{j2}^*, \ldots, uw_{jk}^*, \ldots, uw_{jL}^*)$. function1 expresses $K_u^*(i)$ as a function of $K_u(i)$ and $K_u(j)$, and function2 expresses $K_u^*(j)$ as a function of $K_u(i)$ and $K_u(j)$. Users $i$ and $j$ stand for any two users in $U$, not two particular users. For example, when step S13 is executed for the $n$-th time the signal may carry $i = 1023$, $j = 29328$, and at the $(n+1)$-th execution $i = 737443$, $j = 837487$.
In the method of Fig. 3, step S16 is followed by a step that updates $K_u(i)$ and $K_u(j)$, that is, one that performs the assignments $K_u(i) = K_u^*(i)$ and $K_u(j) = K_u^*(j)$.
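The following is a minimal Python sketch of this event loop. It assumes that user parameter vectors are rows of a NumPy array and that signals arrive as (i, j) index pairs. function1 and function2 are passed in as callables because the text leaves them open; the stand-ins used below are hypothetical. Both new vectors are computed from the pre-update vectors, which is one reading of step S16.

```python
from typing import Callable, Iterable, Tuple

import numpy as np

UpdateFn = Callable[[np.ndarray, np.ndarray], np.ndarray]


def process_user_signals(user_w: np.ndarray,
                         signals: Iterable[Tuple[int, int]],
                         function1: UpdateFn,
                         function2: UpdateFn) -> np.ndarray:
    """Steps S13 to S16 plus the assignment step, one iteration per signal (i, j)."""
    for i, j in signals:                 # S13: user i accessed user j
        k_i = user_w[i].copy()           # S14: read K_u(i)
        k_j = user_w[j].copy()           # S15: read K_u(j)
        new_i = function1(k_i, k_j)      # S16: K_u*(i)
        new_j = function2(k_i, k_j)      # S16: K_u*(j), also from the pre-update vectors
        user_w[i] = new_i                # K_u(i) = K_u*(i)
        user_w[j] = new_j                # K_u(j) = K_u*(j)
    return user_w


# Usage with hypothetical stand-ins for function1 and function2.
w = np.array([[1.0, 0.0], [0.0, 1.0]])
process_user_signals(w, [(0, 1)], lambda a, b: a + 0.1 * b, lambda a, b: b + 0.1 * a)
```

Copying the rows before updating keeps the result independent of the order in which the two users' vectors are written back.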
In one application example of the method of Fig. 3, the signal has one of the following types: T=1, user $i$ follows user $j$; T=2, user $i$ adds user $j$ as a friend; T=3, user $i$ forwards information posted by user $j$; T=4, user $i$ comments on information published by user $j$; T=5, user $i$ bookmarks (collects) information of user $j$; T=6, user $i$ sends a private message to user $j$; T=7, user $i$ tags user $j$; T=8, user $i$ likes information published by user $j$. In one application example of the method of Fig. 3, the signals are collected from system logs.
In one application example of the method of Fig. 3, the method satisfies $K_u^*(i) \geq K_u(i)$ and $K_u^*(j) \geq K_u(j)$. The inequality $K_u^*(i) \geq K_u(i)$ means that $uw_{ik}^* \geq uw_{ik}$ for each $k \in K$, and the inequality $K_u^*(j) \geq K_u(j)$ means that $uw_{jk}^* \geq uw_{jk}$ for each $k \in K$.
In one application example of the method of Fig. 3, for each $k \in K$, $uw_{ik}^*$ is an increasing function of $uw_{jk}$, and $uw_{jk}^*$ is an increasing function of $uw_{ik}$.
In one application example of the method of Fig. 3, for each $k \in K$, $uw_{ik}^*$ is a decreasing function of $\sum_{k' \in K} uw_{jk'}$, and $uw_{jk}^*$ is a decreasing function of $\sum_{k' \in K} uw_{ik'}$.
In one application example of the method of Fig. 3, the signal contains the user identifiers of user $i$ and user $j$, and each user identifier corresponds to a unique user code. This application example reads the user code of user $i$ from its user identifier and then reads the parameter vector of user $i$ by that user code. It does the same for user $j$: it reads the user code from user $j$'s identifier and then reads user $j$'s parameter vector by that code.
In the method of Fig. 3, after the parameter vector update algorithm has been executed a preset number of times, the $k$-th user column vector $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$ of $U$ must be normalized for each feature $k \in K$. Executing the update algorithm once means substituting $K_u(i)$ and $K_u(j)$ into function1 and function2 to obtain $K_u^*(i)$ and $K_u^*(j)$. Concrete application examples of the normalization are as follows:
Example 1: To normalize the $k$-th user column vector $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$ of $U$, set $temp = \sum_{t \in U} uw_{tk}$ and, for each $i \in U$, set $uw_{ik} = uw_{ik} / temp$.
Example 2: The $k$-th user column vector $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$ of $U$ is normalized as follows. First compute $temp = \sum_{t \in U} uw_{tk}$ and, for each $i \in U$, set $uw_{ik} = uw_{ik} / temp$. Then sort $uw_{1k}, uw_{2k}, \ldots, uw_{Mk}$, divide them into $r$ groups according to the sorted order, and take the minimum of each group to form the set $\{s_1, s_2, \ldots, s_r\}$ with $s_1 < s_2 < \cdots < s_r$. Finally, process $uw_{1k}, uw_{2k}, \ldots, uw_{Mk}$ as follows. For each $i \in U$: if $uw_{ik} < s_1$, set $uw_{ik} = a$; if $s_m \leq uw_{ik} \leq s_{m+1}$, set $uw_{ik} = g(s_m)$; if $uw_{ik} > s_r$, set $uw_{ik} = b$. Here $g(s_m)$ is an increasing function with $g(s_m) \in (a, b)$ for $1 \leq m < r$; $a$ and $b$ are nonnegative constants, and $r$ is a preset parameter.
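Both normalization examples can be sketched in Python for a single column vector. The sketch makes the following assumptions, none of which the text specifies:

- The $r$ groups are near-equal slices of the ascending sort.
- The values are distinct, so that $s_1 < \cdots < s_r$ holds.
- $g$ is the hypothetical choice $g(s_m) = a + (b - a) \cdot m / r$. This is increasing in $s_m$ because the $s_m$ are strictly increasing, and it lies in $(a, b)$ for $1 \leq m < r$.

Because the text's intervals are closed, a value equal to a group minimum $s_{m+1}$ satisfies two intervals. In that case the sketch uses the lower one, $g(s_m)$.

```python
import numpy as np


def normalize_column(column) -> np.ndarray:
    """Example 1: divide each entry by the column sum (left unchanged if the sum is 0)."""
    x = np.asarray(column, dtype=float)
    total = x.sum()
    return x / total if total > 0 else x.copy()


def bucket_normalize(column, r: int, a: float, b: float) -> np.ndarray:
    """Example 2: sum-normalize, then map values onto levels by group minima s_1..s_r."""
    x = normalize_column(column)
    if not 2 <= r <= x.size:
        raise ValueError("need 2 <= r <= number of entries")
    # Assumption: the r groups are near-equal slices of the ascending sort,
    # and the values are distinct so that s_1 < s_2 < ... < s_r holds.
    s = [grp.min() for grp in np.array_split(np.sort(x), r)]

    def g(m: int) -> float:
        # Hypothetical g: a + (b - a) * m / r lies in (a, b) for 1 <= m < r and
        # increases with m, hence with s_m, since the s_m are strictly increasing.
        return a + (b - a) * m / r

    out = np.empty_like(x)
    for idx, v in enumerate(x):
        if v < s[0]:
            out[idx] = a      # cannot occur when s_1 is the overall minimum
        elif v > s[-1]:
            out[idx] = b
        else:
            # First m with s_m <= v <= s_{m+1}; on a shared boundary the
            # lower interval wins, since the text's intervals are closed.
            m = next(m for m in range(1, r) if s[m - 1] <= v <= s[m])
            out[idx] = g(m)
    return out
```

As the comment notes, when the groups cover every value, $s_1$ is the overall minimum, so the branch that assigns $a$ is never taken.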
Application example 1
This is an application example of the method of Fig. 3. In it, the parameter vector update algorithm updates the parameter vectors of user $i$ and user $j$ as follows:
$uw_{ik}^* = uw_{ik} + \lambda_1(j, i, T) \cdot f_1[K_u(j)]$ (for each $k \in UK_j$)
$uw_{jk}^* = uw_{jk} + \lambda_2(i, j, T) \cdot f_2[K_u(i)]$ (for each $k \in UK_i$)
Here $uw_{ik}$ and $uw_{ik}^*$ denote the $k$-th component of user $i$'s parameter vector before and after the update, and $uw_{jk}$ and $uw_{jk}^*$ denote the same for user $j$. $\lambda_1(j, i, T)$ is the influence coefficient of user $j$ on user $i$ for signal type $T$, and $\lambda_2(i, j, T)$ is the influence coefficient of user $i$ on user $j$ for signal type $T$. $UK_i$ is the set of features corresponding to the $P_i$ largest components of $K_u(i) = (uw_{i1}, uw_{i2}, \ldots, uw_{ik}, \ldots, uw_{iL})$. Likewise, $UK_j$ is the set of features corresponding to the $P_j$ largest components of $K_u(j) = (uw_{j1}, uw_{j2}, \ldots, uw_{jk}, \ldots, uw_{jL})$. $P_i$ and $P_j$ are preset parameters with $P_i \leq L$ and $P_j \leq L$. For example, for $i = 30$, $P_{30} = 3$ and $UK_{30} = \{$literature, computer, biology$\}$; for $j = 265$, $P_{265} = 2$ and $UK_{265} = \{$music, history$\}$. After the algorithm has been executed, set $uw_{jk} = uw_{jk}^*$ for each $k \in UK_i$, and set $uw_{ik} = uw_{ik}^*$ for each $k \in UK_j$.
In application example 1, the algorithm can be further constrained so that $uw_{jk}^* \geq uw_{jk}$ for each $k \in UK_i$ and $uw_{ik}^* \geq uw_{ik}$ for each $k \in UK_j$.
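One update of application example 1 can be sketched in Python. The sketch assumes that parameter vectors are NumPy rows and that $\lambda_1$ and $\lambda_2$ are already computed numbers. It uses the hypothetical choice $f(K)_k = uw_k / \sum_{k'} uw_{k'}$ (a proportional share) for both $f_1$ and $f_2$. Ties in the top-$P$ selection are broken toward the lower feature index, which the text does not specify.

```python
import numpy as np


def top_features(vec: np.ndarray, p: int) -> np.ndarray:
    """Indices of the p largest components (the set UK); ties go to the lower index."""
    return np.argsort(-vec, kind="stable")[:p]


def share(vec: np.ndarray) -> np.ndarray:
    """A hypothetical choice of f: uw_k / sum_k uw_k (zero vector if the sum is 0)."""
    total = vec.sum()
    return vec / total if total > 0 else np.zeros_like(vec)


def update_pair(w: np.ndarray, i: int, j: int, lam1: float, lam2: float,
                p_i: int, p_j: int) -> np.ndarray:
    """One update for a signal 'user i accessed user j'; rows of w are K_u(.)."""
    k_i, k_j = w[i].copy(), w[j].copy()          # pre-update vectors
    uk_i, uk_j = top_features(k_i, p_i), top_features(k_j, p_j)
    # User i gains only on user j's top features, and vice versa.
    w[i, uk_j] = k_i[uk_j] + lam1 * share(k_j)[uk_j]
    w[j, uk_i] = k_j[uk_i] + lam2 * share(k_i)[uk_i]
    return w
```

With nonnegative $\lambda$ values and nonnegative components, this choice automatically satisfies the constraint that updated components never decrease.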
In application example 1, $f_1[K_u(j)]$ is a function of user $j$'s parameter vector $K_u(j)$, and $f_2[K_u(i)]$ is a function of user $i$'s parameter vector $K_u(i)$. Concrete implementations of $f_1[K_u(j)]$ and $f_2[K_u(i)]$ include the following examples:
Example 1: $f_1[K_u(j)]$ is an increasing function of $uw_{jk}$ and a decreasing function of $\sum_{k' \in K} uw_{jk'}$; $f_2[K_u(i)]$ is an increasing function of $uw_{ik}$ and a decreasing function of $\sum_{k' \in K} uw_{ik'}$.
Example 2: $f_1[K_u(j)] = \sigma_3 \cdot uw_{jk} / \sum_{k' \in K} uw_{jk'}$ and $f_2[K_u(i)] = \sigma_4 \cdot uw_{ik} / \sum_{k' \in K} uw_{ik'}$, where $\sigma_3$ and $\sigma_4$ are preset constants.
Example 3: $f_1[K_u(j)] = \sigma_5 \cdot uw_{jk}$ and $f_2[K_u(i)] = \sigma_6 \cdot uw_{ik}$, where $\sigma_5$ and $\sigma_6$ are preset constants.
Example 4: $f_1[K_u(j)] = \sigma_7 \cdot \frac{1}{1 + \exp(uw_{jk})}$ and $f_2[K_u(i)] = \sigma_8 \cdot \frac{1}{1 + \exp(uw_{ik})}$, where $\sigma_7$ and $\sigma_8$ are preset constants.
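Examples 2 to 4 of $f$ can be sketched as vector-valued Python functions. Each function returns one value per feature, so `f(vec)[k]` plays the role of the $k$-th term in the update. Returning a whole per-feature vector is an interpretation, since the text writes $f_1[K_u(j)]$ without an explicit $k$. Note that, as written, Example 4 decreases as $uw_{jk}$ grows, unlike the increasing behaviour described in Example 1.

```python
import numpy as np


def f_share(vec: np.ndarray, sigma: float) -> np.ndarray:
    """Example 2: sigma * uw_k / sum_k' uw_k' for every k (zeros if the sum is 0)."""
    total = vec.sum()
    return sigma * vec / total if total > 0 else np.zeros_like(vec, dtype=float)


def f_linear(vec: np.ndarray, sigma: float) -> np.ndarray:
    """Example 3: sigma * uw_k."""
    return sigma * vec


def f_logistic(vec: np.ndarray, sigma: float) -> np.ndarray:
    """Example 4 as written: sigma / (1 + exp(uw_k))."""
    return sigma / (1.0 + np.exp(vec))
```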
In application example 1, concrete implementations of $\lambda_1(j, i, T)$ and $\lambda_2(i, j, T)$ include the following examples:
Example 1: $\lambda_1(j, i, T)$ and $\lambda_2(i, j, T)$ are each functions of the similarity $sim(i, j)$ between the parameter vectors of users $i$ and $j$. For example, $\lambda_1(j, i, T) = c_1 \cdot sim(i, j)$ and $\lambda_2(i, j, T) = c_2 \cdot sim(i, j)$, where $c_1$ and $c_2$ are preset constants and $sim(i, j)$ is the cosine similarity of $K_u(i)$ and $K_u(j)$:
$sim(i, j) = \frac{\sum_{k \in K} uw_{ik} \cdot uw_{jk}}{\left[\sum_{k \in K} (uw_{ik})^2\right]^{1/2} \left[\sum_{k \in K} (uw_{jk})^2\right]^{1/2}}$.
The idea is that the more similar the parameter vectors of users $i$ and $j$ are, the larger the weight of the "votes" they cast for each other.
Example 2: $\lambda_1(j, i, T) = u_1(j) \cdot u_2(i)$ and $\lambda_2(i, j, T) = u_1(i) \cdot u_2(j)$. Here $u_1(j)$ indicates whether the parameter vector of user $j$ may be used to update the parameter vectors of other users in $U$, and $u_1(i)$ indicates the same for user $i$. Similarly, $u_2(i)$ indicates whether the parameter vector of user $i$ may be updated by the parameter vectors of other users in $U$, and $u_2(j)$ indicates the same for user $j$. $u_1(j)$, $u_2(j)$, $u_1(i)$ and $u_2(i)$ are preset parameters taking the value 0 or 1, where 1 means yes and 0 means no. The purpose of this example is to prevent malicious attacks. A user who has not passed reliability certification cannot have its parameter vector update other users' parameter vectors, and the parameter vectors of certain special users cannot be updated by other users' parameter vectors.
Example 3: $\lambda_1(j, i, T) = s_1(T)$ and $\lambda_2(i, j, T) = s_2(T)$, where $T$ is the type of the user-accesses-user signal and $s_1(T)$ and $s_2(T)$ are functions of $T$.
Example 4: $\lambda_1(j, i, T)$ and $\lambda_2(i, j, T)$ are generated by combining the methods of Examples 1 to 3, for example:
$\lambda_1(j, i, T) = \{c_1 \cdot sim(i, j)\} \cdot \{u_1(j) \cdot u_2(i)\} \cdot s_1(T)$
$\lambda_2(i, j, T) = \{c_2 \cdot sim(i, j)\} \cdot \{u_1(i) \cdot u_2(j)\} \cdot s_2(T)$
Example 5: $\lambda_1(j, i, T)$ is a function of the number of users in user $j$'s relationship network, and $\lambda_2(i, j, T)$ is a function of the number of users in user $i$'s relationship network.
Example 6: $\lambda_1(j, i, T)$ and $\lambda_2(i, j, T)$ are preset constants.
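The combined coefficient of Example 4 can be sketched in Python. The sketch assumes that the 0/1 permission flags $u_1$, $u_2$ are stored in dicts keyed by user, and that the type weights $s_1$, $s_2$ are stored in dicts keyed by signal type $T$. These containers and the function names are hypothetical. Returning a similarity of 0 when either vector is all zeros is also an added assumption, since the cosine is undefined there.

```python
import numpy as np


def cosine_sim(x: np.ndarray, y: np.ndarray) -> float:
    """sim(i, j) from Example 1; 0.0 if either vector is all zeros (an assumption)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0 or ny == 0:
        return 0.0
    return float(x @ y / (nx * ny))


def combined_lambdas(k_i, k_j, i, j, T, c1, c2, u1, u2, s1, s2):
    """Example 4: returns (lambda_1(j, i, T), lambda_2(i, j, T)).

    u1, u2: dicts user -> 0 or 1 (Example 2); s1, s2: dicts type T -> weight (Example 3).
    """
    sim = cosine_sim(np.asarray(k_i, dtype=float), np.asarray(k_j, dtype=float))
    lam1 = c1 * sim * u1[j] * u2[i] * s1[T]   # influence of user j on user i
    lam2 = c2 * sim * u1[i] * u2[j] * s2[T]   # influence of user i on user j
    return lam1, lam2
```

Setting $u_1(j) = 0$ makes $\lambda_1$ vanish, which blocks user $j$ from influencing user $i$. It leaves $\lambda_2$ untouched, matching the one-directional intent of Example 2.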
In application example 1, after the algorithm has been executed a preset number of times, the $k$-th user column vector $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$ must be normalized for each feature $k \in K$.
Application example 2
This is an example of the method of application example 1. For ease of illustration, suppose there are three users on the Internet and each user has two features; that is, the user set is $U = \{1, 2, 3\}$ and the feature set is $K = \{1, 2\}$. The parameter vectors of users 1, 2 and 3 are $(uw_{11}, uw_{12})$, $(uw_{21}, uw_{22})$ and $(uw_{31}, uw_{32})$ respectively, where $uw_{ik}$ ($i \in U$, $k \in K$) denotes the relevance degree between user $i$ and feature $k$.
Suppose a signal is received that user 2 has accessed user 3, with signal type T=1. The parameter vectors of users 2 and 3 are then updated with the following algorithm:
$uw_{21}^* = uw_{21} + \lambda_1(3, 2, 1) \cdot \frac{uw_{31}}{uw_{31} + uw_{32}}$
$uw_{22}^* = uw_{22} + \lambda_1(3, 2, 1) \cdot \frac{uw_{32}}{uw_{31} + uw_{32}}$
$uw_{31}^* = uw_{31} + \lambda_2(2, 3, 1) \cdot \frac{uw_{21}}{uw_{21} + uw_{22}}$
$uw_{32}^* = uw_{32} + \lambda_2(2, 3, 1) \cdot \frac{uw_{22}}{uw_{21} + uw_{22}}$
Here $\lambda_1(3, 2, 1)$ is the influence coefficient of user 3 on user 2 under signal type T=1, and $\lambda_2(2, 3, 1)$ is the influence coefficient of user 2 on user 3 under signal type T=1. Let $\lambda_1(3, 2, 1) = c_1 \cdot sim(2, 3) \cdot u_1(3) \cdot u_2(2) \cdot s_1(1)$ and $\lambda_2(2, 3, 1) = c_2 \cdot sim(2, 3) \cdot u_1(2) \cdot u_2(3) \cdot s_2(1)$, with $s_1(1) = 3$ and $s_2(1) = 1.5$; $c_1$ and $c_2$ are preset constants. $u_1(3)$ indicates whether user 3's parameter vector may be used to update other users' parameter vectors in $U$, and $u_1(2)$ indicates the same for user 2. $u_2(2)$ indicates whether user 2's parameter vector may be updated by other users' parameter vectors in $U$, and $u_2(3)$ indicates the same for user 3. Here $u_1(2) = u_2(2) = u_1(3) = u_2(3) = 1$. $sim(2, 3)$ is the similarity between the parameter vectors of users 2 and 3, namely
$sim(2, 3) = \frac{uw_{21} \cdot uw_{31} + uw_{22} \cdot uw_{32}}{\left[(uw_{21})^2 + (uw_{22})^2\right]^{1/2} \cdot \left[(uw_{31})^2 + (uw_{32})^2\right]^{1/2}}$.
After the algorithm has been executed, the parameter vectors of users 2 and 3 are updated by setting $uw_{31} = uw_{31}^*$, $uw_{32} = uw_{32}^*$, $uw_{21} = uw_{21}^*$ and $uw_{22} = uw_{22}^*$.
After the algorithm has been executed, the user column vectors $(uw_{11}, uw_{21}, uw_{31})$ and $(uw_{12}, uw_{22}, uw_{32})$ are normalized as follows. For feature $k = 1$, let $temp1 = uw_{11} + uw_{21} + uw_{31}$ and set $uw_{11} = uw_{11} / temp1$, $uw_{21} = uw_{21} / temp1$, $uw_{31} = uw_{31} / temp1$. For feature $k = 2$, let $temp2 = uw_{12} + uw_{22} + uw_{32}$ and set $uw_{12} = uw_{12} / temp2$, $uw_{22} = uw_{22} / temp2$, $uw_{32} = uw_{32} / temp2$.
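The worked example can be run numerically as a rough check. The starting vectors and $c_1 = c_2 = 1$ below are hypothetical values chosen only for illustration. $s_1(1) = 3$, $s_2(1) = 1.5$ and the unit permission flags come from the text.

```python
import numpy as np

# Hypothetical starting vectors (rows: users 1, 2, 3; columns: features 1, 2).
w = np.array([[0.2, 0.4],
              [0.3, 0.1],
              [0.1, 0.3]])
c1 = c2 = 1.0          # assumed values for the preset constants
s1, s2 = 3.0, 1.5      # s_1(1) and s_2(1) as given in the text
i, j = 1, 2            # 0-based rows of user 2 (accessor) and user 3 (accessed)

k2, k3 = w[i].copy(), w[j].copy()   # pre-update K_u(2) and K_u(3)
sim = float(k2 @ k3 / (np.linalg.norm(k2) * np.linalg.norm(k3)))
lam1 = c1 * sim * 1 * 1 * s1        # lambda_1(3, 2, 1) with u_1(3) = u_2(2) = 1
lam2 = c2 * sim * 1 * 1 * s2        # lambda_2(2, 3, 1) with u_1(2) = u_2(3) = 1

w[i] = k2 + lam1 * k3 / k3.sum()    # uw_2k* for k = 1, 2
w[j] = k3 + lam2 * k2 / k2.sum()    # uw_3k* for k = 1, 2

w = w / w.sum(axis=0)               # normalize both user column vectors
print(np.round(w, 4))
```

With these assumed numbers, $sim(2, 3) = 0.6$, so $\lambda_1(3, 2, 1) = 1.8$ and $\lambda_2(2, 3, 1) = 0.9$. Before normalization, user 2 becomes (0.75, 1.45) and user 3 becomes (0.775, 0.525). After normalization, each feature column sums to 1.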
Fig. 4 shows the method for obtaining users' personalized features from user-accesses-document signals. It comprises the following steps:
S21. Obtain a plurality of users from the Internet and store the user set $U = \{1, 2, \ldots, M\}$ formed by them; obtain a plurality of documents from the Internet and store the document set $D = \{1, 2, \ldots, N\}$ formed by them; set a plurality of features and store the feature set $K = \{1, 2, \ldots, L\}$ formed by them;
S22. Set initial parameter vectors for the users in $U$ and for the documents in $D$;
S23. Receive a signal that any user $m \in U$ has accessed any document $n \in D$;
S24. Read the parameter vector $K_u(m) = (uw_{m1}, uw_{m2}, \ldots, uw_{mk}, \ldots, uw_{mL})$ of user $m$, where $uw_{mk}$ denotes the relevance degree between user $m$ and feature $k \in K$;
S25. Read the parameter vector $K_d(n) = (dw_{n1}, dw_{n2}, \ldots, dw_{nk}, \ldots, dw_{nL})$ of document $n$, where $dw_{nk}$ denotes the relevance degree between document $n$ and feature $k \in K$;
S26. Update the parameter vectors of user $m$ and document $n$ with the following parameter vector update algorithm 2:
$K_u^*(m) = \mathrm{function3}[K_u(m), K_d(n)]$,
$K_d^*(n) = \mathrm{function4}[K_u(m), K_d(n)]$;
after step S26 has been executed, return to step S23.
Here $K_u(m)$ and $K_u^*(m)$ denote the parameter vector of user $m$ before and after the update, and $K_d(n)$ and $K_d^*(n)$ denote those of document $n$. The updated vectors are $K_u^*(m) = (uw_{m1}^*, uw_{m2}^*, \ldots, uw_{mk}^*, \ldots, uw_{mL}^*)$ and $K_d^*(n) = (dw_{n1}^*, dw_{n2}^*, \ldots, dw_{nk}^*, \ldots, dw_{nL}^*)$. function3 expresses $K_u^*(m)$ as a function of $K_u(m)$ and $K_d(n)$, and function4 expresses $K_d^*(n)$ as a function of $K_u(m)$ and $K_d(n)$. User $m$ stands for any user in $U$ and document $n$ for any document in $D$, not a particular user or document. For example, at one execution of step S23 the signal may carry $m = 1023$, $n = 3428$, and at the next execution $m = 33456$, $n = 28477$.
In the method of Fig. 4, step S26 is followed by a step that updates $K_u(m)$ and $K_d(n)$, that is, one that performs the assignments $K_d(n) = K_d^*(n)$ and $K_u(m) = K_u^*(m)$.
In the method of Fig. 4, the signal types include at least the following: T=9, user $m$ clicks a link to document $n$; T=10, user $m$ types in the address of document $n$; T=11, user $m$ tags document $n$; T=12, user $m$ bookmarks document $n$; T=13, user $m$ likes document $n$ (as with Facebook's Like or Google's +1); T=14, user $m$ forwards document $n$; T=15, user $m$ comments on document $n$; T=16, user $m$ collects document $n$. In one application example of the method of Fig. 4, the signals are collected from Web logs, which include server logs, error logs and cookie logs.
In one application example of the method of Fig. 4, the method satisfies $K_u^*(m) \geq K_u(m)$ and $K_d^*(n) \geq K_d(n)$. The inequality $K_u^*(m) \geq K_u(m)$ means that $uw_{mk}^* \geq uw_{mk}$ for each $k \in K$, and the inequality $K_d^*(n) \geq K_d(n)$ means that $dw_{nk}^* \geq dw_{nk}$ for each $k \in K$.
In one application example of the method of Fig. 4, for each $k \in K$, $uw_{mk}^*$ is an increasing function of $dw_{nk}$ and a decreasing function of $\sum_{k' \in K} dw_{nk'}$. Likewise, for each $k \in K$, $dw_{nk}^*$ is an increasing function of $uw_{mk}$ and a decreasing function of $\sum_{k' \in K} uw_{mk'}$.
In one application example of the method of Fig. 4, the signal contains the user identifier of user $m$ and the document identifier of document $n$. Each user identifier corresponds to a unique user code, and each document identifier corresponds to a unique document code. This application example reads the user code of user $m$ from its user identifier and then reads the parameter vector of user $m$ by that user code. Likewise, it reads the document code of document $n$ from its document identifier and then reads the parameter vector of document $n$ by that document code.
In the method of Fig. 4, after parameter vector update algorithm 2 has been executed a preset number $t_1$ of times, the $k$-th user column vector $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$ is normalized for each feature $k \in K$. After update algorithm 2 has been executed a preset number $t_2$ of times, the $k$-th document column vector $(dw_{1k}, dw_{2k}, \ldots, dw_{Nk})$ is normalized for each feature $k \in K$. Here $t_1$ and $t_2$ are positive integers. Executing update algorithm 2 once means substituting $K_u(m)$ and $K_d(n)$ into function3 and function4 to obtain $K_u^*(m)$ and $K_d^*(n)$. Concrete application examples of the normalization are as follows:
Example 1: To normalize the $k$-th user column vector $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$ of $U$, set $temp = \sum_{t \in U} uw_{tk}$ and, for each $m \in U$, set $uw_{mk} = uw_{mk} / temp$. To normalize the $k$-th document column vector $(dw_{1k}, dw_{2k}, \ldots, dw_{Nk})$ of $D$, set $temp = \sum_{t \in D} dw_{tk}$ and, for each $n \in D$, set $dw_{nk} = dw_{nk} / temp$.
Example 2: The $k$-th document column vector $(dw_{1k}, dw_{2k}, \ldots, dw_{Nk})$ of $D$ is normalized as follows. First compute $temp = \sum_{t \in D} dw_{tk}$ and, for each $n \in D$, set $dw_{nk} = dw_{nk} / temp$. Then sort $dw_{1k}, dw_{2k}, \ldots, dw_{Nk}$, divide them into $r$ groups according to the sorted order, and take the minimum of each group to form the set $\{s_1, s_2, \ldots, s_r\}$ with $s_1 < s_2 < \cdots < s_r$. Finally, process $dw_{1k}, dw_{2k}, \ldots, dw_{Nk}$ as follows. For each $n \in D$: if $dw_{nk} < s_1$, set $dw_{nk} = a$; if $s_m \leq dw_{nk} \leq s_{m+1}$, set $dw_{nk} = g(s_m)$; if $dw_{nk} > s_r$, set $dw_{nk} = b$. Here $g(s_m)$ is an increasing function with $g(s_m) \in (a, b)$ for $1 \leq m < r$; $a$ and $b$ are nonnegative constants, and $r$ is a preset parameter. The same method can be used to normalize the $k$-th user column vector of $U$.
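The normalization schedule of the Fig. 4 method, with separate counts $t_1$ for users and $t_2$ for documents, can be sketched as a loop over signals. The sketch has two assumptions. First, "after $t_1$ executions" is read as "after every $t_1$ executions". Second, `update` is a hypothetical callable standing in for function3 and function4. Normalization uses the column-sum method of Example 1.

```python
from typing import Callable, Iterable, Tuple

import numpy as np


def normalize_columns(w: np.ndarray) -> None:
    """Divide each column by its sum in place; all-zero columns are left as they are."""
    totals = w.sum(axis=0)
    nz = totals > 0
    w[:, nz] /= totals[nz]


def run_document_signals(user_w: np.ndarray, doc_w: np.ndarray,
                         signals: Iterable[Tuple[int, int]],
                         update: Callable, t1: int, t2: int):
    """Apply update algorithm 2 for each (m, n) signal and normalize on a schedule.

    Assumption: 'after t1 executions' is read as 'after every t1 executions'.
    """
    for count, (m, n) in enumerate(signals, start=1):
        new_u, new_d = update(user_w[m].copy(), doc_w[n].copy())  # function3, function4
        user_w[m], doc_w[n] = new_u, new_d
        if count % t1 == 0:
            normalize_columns(user_w)   # user column vectors
        if count % t2 == 0:
            normalize_columns(doc_w)    # document column vectors
    return user_w, doc_w
```

Choosing $t_1 \neq t_2$ lets user and document vectors be rescaled at different rates. This may help when, for example, documents receive far more signals than users.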
Application example 3
This is an application example of the method of Fig. 4. In it, parameter vector update algorithm 2 updates the parameter vectors of user $m$ and document $n$ as follows:
$uw_{mk}^* = uw_{mk} + \lambda_3(n, m, T) \cdot f_3[K_d(n)]$ (for each $k \in DK_n$)
$dw_{nk}^* = dw_{nk} + \lambda_4(m, n, T) \cdot f_4[K_u(m)]$ (for each $k \in UK_m$)
Here $uw_{mk}$ and $uw_{mk}^*$ denote the $k$-th component of user $m$'s parameter vector before and after the update, and $dw_{nk}$ and $dw_{nk}^*$ denote the same for document $n$. $\lambda_3(n, m, T)$ is the influence coefficient of document $n$ on user $m$ for signal type $T$, and $\lambda_4(m, n, T)$ is the influence coefficient of user $m$ on document $n$ for signal type $T$. $UK_m$ is the set of features corresponding to the $P_m$ largest components of $K_u(m) = (uw_{m1}, uw_{m2}, \ldots, uw_{mk}, \ldots, uw_{mL})$. Likewise, $DK_n$ is the set of features corresponding to the $Q_n$ largest components of $K_d(n) = (dw_{n1}, dw_{n2}, \ldots, dw_{nk}, \ldots, dw_{nL})$. $P_m$ and $Q_n$ are preset parameters with $P_m \leq L$ and $Q_n \leq L$. For example, for $m = 30$, $P_{30} = 3$ and $UK_{30} = \{$music, sports, finance$\}$; for $n = 265$, $Q_{265} = 2$ and $DK_{265} = \{$music, architecture$\}$. In addition, after the algorithm has been executed, set $dw_{nk} = dw_{nk}^*$ for each $k \in UK_m$, and set $uw_{mk} = uw_{mk}^*$ for each $k \in DK_n$.
In application example 3, the algorithm can be further constrained so that $uw_{mk}^* \geq uw_{mk}$ for each $k \in DK_n$ and $dw_{nk}^* \geq dw_{nk}$ for each $k \in UK_m$.
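One update of application example 3 can be sketched in Python. The sketch assumes that $\lambda_3$ and $\lambda_4$ are precomputed numbers. It uses the hypothetical choice $f(K)_k = w_k / \sum_{k'} w_{k'}$ for both $f_3$ and $f_4$, and it breaks ties in the top-$P$ and top-$Q$ selections toward the lower feature index. The function returns new vectors and leaves the write-back to the caller.

```python
import numpy as np


def top_p(vec: np.ndarray, p: int) -> np.ndarray:
    """Indices of the p largest components; ties go to the lower index."""
    return np.argsort(-vec, kind="stable")[:p]


def share(vec: np.ndarray) -> np.ndarray:
    """Hypothetical f_3 / f_4: component divided by the vector sum (zeros if the sum is 0)."""
    total = vec.sum()
    return vec / total if total > 0 else np.zeros_like(vec)


def update_user_document(user_vec, doc_vec, lam3, lam4, p_m, q_n):
    """One execution of the update for 'user m accessed document n'.

    Returns (K_u*(m), K_d*(n)) computed from the pre-update vectors.
    """
    u = np.asarray(user_vec, dtype=float)
    d = np.asarray(doc_vec, dtype=float)
    uk_m, dk_n = top_p(u, p_m), top_p(d, q_n)   # UK_m and DK_n
    new_u, new_d = u.copy(), d.copy()
    new_u[dk_n] += lam3 * share(d)[dk_n]        # user m moves toward document n's top features
    new_d[uk_m] += lam4 * share(u)[uk_m]        # document n moves toward user m's top features
    return new_u, new_d
```

For example, a user (0.6, 0.3, 0.1) who accesses a document (0, 0.2, 0.8) with $\lambda_3 = \lambda_4 = 1$ and $P_m = Q_n = 1$ gains on the third feature, becoming (0.6, 0.3, 0.9). The document gains on the first feature, becoming (0.6, 0.2, 0.8).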
In application example 3, $f_3[K_d(n)]$ is a function of document $n$'s parameter vector $K_d(n)$, and $f_4[K_u(m)]$ is a function of user $m$'s parameter vector $K_u(m)$. Concrete implementations of $f_3[K_d(n)]$ and $f_4[K_u(m)]$ include:
Example 1: $f_3[K_d(n)]$ is an increasing function of $dw_{nk}$ and a decreasing function of $\sum_{k' \in K} dw_{nk'}$; $f_4[K_u(m)]$ is an increasing function of $uw_{mk}$ and a decreasing function of $\sum_{k' \in K} uw_{mk'}$.
Example 2: $f_3[K_d(n)] = \sigma_3 \cdot dw_{nk} / \sum_{k' \in K} dw_{nk'}$ and $f_4[K_u(m)] = \sigma_4 \cdot uw_{mk} / \sum_{k' \in K} uw_{mk'}$, where $\sigma_3$ and $\sigma_4$ are preset constants.
Example 3: $f_3[K_d(n)] = \sigma_5 \cdot dw_{nk}$ and $f_4[K_u(m)] = \sigma_6 \cdot uw_{mk}$, where $\sigma_5$ and $\sigma_6$ are preset constants.
Example 4: $f_3[K_d(n)] = \sigma_7 \cdot \frac{1}{1 + \exp(dw_{nk})}$ and $f_4[K_u(m)] = \sigma_8 \cdot \frac{1}{1 + \exp(uw_{mk})}$, where $\sigma_7$ and $\sigma_8$ are preset constants.
In application example 3, concrete implementations of $\lambda_3(n, m, T)$ and $\lambda_4(m, n, T)$ include the following examples:
Example 1: $\lambda_3(n, m, T)$ and $\lambda_4(m, n, T)$ are each functions of the similarity $sim(m, n)$ between the parameter vectors of user $m$ and document $n$. For example, $\lambda_3(n, m, T) = c_1 \cdot sim(m, n)$ and $\lambda_4(m, n, T) = c_2 \cdot sim(m, n)$, where $c_1$ and $c_2$ are preset constants and $sim(m, n)$ is the cosine similarity of $K_u(m)$ and $K_d(n)$:
$sim(m, n) = \frac{\sum_{k \in K} uw_{mk} \cdot dw_{nk}}{\left[\sum_{k \in K} (uw_{mk})^2\right]^{1/2} \left[\sum_{k \in K} (dw_{nk})^2\right]^{1/2}}$.
The idea is that the more similar the parameter vectors of a user and a document are, the larger the weight of the "votes" they cast for each other.
Example 2: $\lambda_3(n, m, T) = u_2(m) \cdot d_1(n)$ and $\lambda_4(m, n, T) = u_1(m) \cdot d_2(n)$. Here $d_1(n)$ indicates whether the parameter vector of document $n$ may be used to update users' parameter vectors in $U$, and $u_2(m)$ indicates whether the parameter vector of user $m$ may be updated by documents' parameter vectors in $D$. Similarly, $u_1(m)$ indicates whether the parameter vector of user $m$ may be used to update documents' parameter vectors in $D$, and $d_2(n)$ indicates whether the parameter vector of document $n$ may be updated by users' parameter vectors in $U$. $u_1(m)$, $u_2(m)$, $d_1(n)$ and $d_2(n)$ are preset parameters taking the value 0 or 1, where 1 means yes and 0 means no. The purpose of this example is to prevent malicious attacks. A document (or user) that has not passed reliability certification cannot have its parameter vector update the parameter vectors of users (or documents). In addition, the parameter vectors of certain special documents (or users) cannot be updated by the parameter vectors of users (or documents).
Example 3: $\lambda_3(n, m, T) = s_1(T)$ and $\lambda_4(m, n, T) = s_2(T)$. Here T is the type of the user-accesses-document signal, and $s_1(T)$ and $s_2(T)$ are functions of T.
Example 4: the methods of Examples 1 to 3 above are combined to generate $\lambda_3(n, m, T)$ and $\lambda_4(m, n, T)$, namely:
$\lambda_3(n, m, T) = \{c_1 \cdot sim(m, n)\} \cdot \{u_2(m) \cdot d_1(n)\} \cdot s_1(T)$
$\lambda_4(m, n, T) = \{c_2 \cdot sim(m, n)\} \cdot \{u_1(m) \cdot d_2(n)\} \cdot s_2(T)$
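The combined coefficients of Example 4 multiply three factors: the similarity term, the 0/1 trust gates and the signal-type weight. The sketch below is hypothetical. It assumes that $s_1$ and $s_2$ are passed in as callables, and its function names are not from the text.

```python
import math

def sim(u, d):
    """Cosine similarity of two parameter vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, d))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def combined_lambdas(u, d, T, c1, c2, u1, u2, d1, d2, s1, s2):
    """Example 4: (c * sim) * (0/1 gates) * s(T).

    u1, u2, d1, d2 are the 0/1 flags of Example 2;
    s1 and s2 map a signal type T to a weight (Example 3).
    """
    base = sim(u, d)
    lam3 = (c1 * base) * (u2 * d1) * s1(T)  # influence of document n on user m
    lam4 = (c2 * base) * (u1 * d2) * s2(T)  # influence of user m on document n
    return lam3, lam4
```

For example, the weights $s_1(9) = 3$ and $s_2(9) = 1.5$ from application example 4 could be supplied as `lambda T: 3.0` and `lambda T: 1.5`. Setting any gate to 0 switches the corresponding direction of influence off.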
Example 5: $\lambda_3(n, m, T)$ is a function of the number of times document n has been accessed, or of its PageRank value. $\lambda_4(m, n, T)$ is a function of the number of users in the relationship network of user m.
Example 6: $\lambda_3(n, m, T)$ and $\lambda_4(m, n, T)$ are preset constants.
In application example 3, the specific parameter vector update algorithm is executed a set number of times. After that, for each feature $k \in K$, the k-th document column vector $(dw_{1k}, dw_{2k}, \ldots, dw_{Nk})$ and the k-th user column vector $(uw_{1k}, uw_{2k}, \ldots, uw_{Mk})$ must each be normalized.
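The per-feature normalization can be sketched as below. The text here says only that the vectors are normalized. This sketch assumes the sum-to-one scaling that application example 4 spells out. Its handling of all-zero columns is also an assumption, and `normalize_columns` is a hypothetical name.

```python
def normalize_columns(rows):
    """Scale each feature column of `rows` so it sums to 1, in place.

    rows[i][k] is the weight of user (or document) i for feature k, so
    column k is the column vector (w_1k, ..., w_Nk) described in the text.
    """
    if not rows:
        return rows
    for k in range(len(rows[0])):
        total = sum(row[k] for row in rows)
        if total == 0:
            continue  # assumption: leave an all-zero column unchanged
        for row in rows:
            row[k] /= total
    return rows
```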
Application example 4
This is an application of the method described in application example 3. For ease of illustration, suppose there are two users and three documents on the internet, and each user and each document has two features. That is, the user set is U = {1, 2}, the document set is D = {1, 2, 3} and the feature set is K = {1, 2}. The parameter vectors of user 1 and user 2 are $(uw_{11}, uw_{12})$ and $(uw_{21}, uw_{22})$. The parameter vectors of documents 1, 2 and 3 are $(dw_{11}, dw_{12})$, $(dw_{21}, dw_{22})$ and $(dw_{31}, dw_{32})$. Here $uw_{mk}$ ($m \in U$, $k \in K$) denotes the degree of correlation between user m and feature k, and $dw_{nk}$ ($n \in D$, $k \in K$) denotes the degree of correlation between document n and feature k.
Suppose the server receives a signal that user 2 has accessed document 3, with signal type T = 9. The parameter vectors of user 2 and document 3 are then updated by the following algorithm:
$uw_{21}^{*} = uw_{21} + \lambda_3(3,2,9)\,\{dw_{31}/(dw_{31}+dw_{32})\}$
$uw_{22}^{*} = uw_{22} + \lambda_3(3,2,9)\,\{dw_{32}/(dw_{31}+dw_{32})\}$
$dw_{31}^{*} = dw_{31} + \lambda_4(2,3,9)\,\{uw_{21}/(uw_{21}+uw_{22})\}$
$dw_{32}^{*} = dw_{32} + \lambda_4(2,3,9)\,\{uw_{22}/(uw_{21}+uw_{22})\}$
Here $\lambda_3(3,2,9)$ is the influence coefficient of document 3 on user 2 under signal type T = 9, and $\lambda_4(2,3,9)$ is the influence coefficient of user 2 on document 3 under signal type T = 9. For example, let $\lambda_3(3,2,9) = c_1 \cdot sim(2,3) \cdot s_1(9)$ and $\lambda_4(2,3,9) = c_2 \cdot sim(2,3) \cdot s_2(9)$, with $s_1(9) = 3$ and $s_2(9) = 1.5$. $c_1$ and $c_2$ are preset constants. sim(2,3) is the similarity between the parameter vectors of user 2 and document 3, that is:
$sim(2,3) = (uw_{21} \cdot dw_{31} + uw_{22} \cdot dw_{32}) / \{[(uw_{21})^2 + (uw_{22})^2]^{1/2} \cdot [(dw_{31})^2 + (dw_{32})^2]^{1/2}\}$
After the algorithm above has been executed, the parameter vectors of user 2 and document 3 are updated. That is, set $uw_{21} = uw_{21}^{*}$, $uw_{22} = uw_{22}^{*}$, $dw_{31} = dw_{31}^{*}$ and $dw_{32} = dw_{32}^{*}$.
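The update for this worked example can be sketched as follows. The vectors passed in are hypothetical values, and so are the choices $c_1 = c_2 = 1$. The weights $s_1(9) = 3$ and $s_2(9) = 1.5$ come from the text. The sketch computes both starred vectors from the old values before assigning either of them, which matches the order described above.

```python
import math

def sim(u, d):
    """Cosine similarity of two parameter vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, d))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def shares(vec):
    """Each weight divided by the vector's total, e.g. dw_31 / (dw_31 + dw_32)."""
    total = sum(vec)
    return [w / total for w in vec] if total else [0.0] * len(vec)

def update_pair(u, d, c1, c2, s1_T, s2_T):
    """One user-accesses-document update; returns (u*, d*) from the old u and d."""
    s = sim(u, d)
    lam3 = c1 * s * s1_T  # influence of the document on the user
    lam4 = c2 * s * s2_T  # influence of the user on the document
    u_new = [w + lam3 * p for w, p in zip(u, shares(d))]
    d_new = [w + lam4 * p for w, p in zip(d, shares(u))]
    return u_new, d_new

# Hypothetical vectors for user 2 and document 3 under signal type T = 9.
user2, doc3 = update_pair([3.0, 4.0], [3.0, 4.0], c1=1.0, c2=1.0, s1_T=3.0, s2_T=1.5)
```

In this example the two vectors are parallel, so sim(2,3) = 1. User 2 then gains $3 \cdot 3/7$ and $3 \cdot 4/7$ on its two features.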
Next, the user column vectors $(uw_{11}, uw_{21})$ and $(uw_{12}, uw_{22})$ are normalized. The document column vectors $(dw_{11}, dw_{21}, dw_{31})$ and $(dw_{12}, dw_{22}, dw_{32})$ are normalized as well.
The user column vectors are normalized as follows. For feature k = 1, let $temp_1 = uw_{11} + uw_{21}$ and set $uw_{11} = uw_{11}/temp_1$ and $uw_{21} = uw_{21}/temp_1$. For feature k = 2, let $temp_2 = uw_{12} + uw_{22}$ and set $uw_{12} = uw_{12}/temp_2$ and $uw_{22} = uw_{22}/temp_2$.
The document column vectors are normalized as follows. For feature k = 1, let $temp_1 = dw_{11} + dw_{21} + dw_{31}$ and set $dw_{11} = dw_{11}/temp_1$, $dw_{21} = dw_{21}/temp_1$ and $dw_{31} = dw_{31}/temp_1$. For feature k = 2, let $temp_2 = dw_{12} + dw_{22} + dw_{32}$ and set $dw_{12} = dw_{12}/temp_2$, $dw_{22} = dw_{22}/temp_2$ and $dw_{32} = dw_{32}/temp_2$.
Fig. 5 is a flowchart of the method for querying users who have specific features. The server performs the following steps:
A11. Receive the query vector set by any querying user e ($e \in U$).
A12. The querying user e selects a group of users Q from the user set U ($Q \subseteq U$). For example, the group may be the users in a social network whose ages fall within a given range, or all users located within a set geographic area. If the user makes no such selection, the default is Q = U.
A13. Using the query vector and the parameter vector of each user in Q, calculate the personalized ranking value UR(e, m) of each user m ($m \in Q$). UR(e, m) denotes the personalized ranking value of user m based on the query vector of user e.
A14. Send to the querying user e the identifiers of the users in Q with the largest personalized ranking values.
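Steps A12 to A14 amount to selecting the top n users in Q. The example formula for UR(e, m) does not survive in this text. The sketch below therefore assumes that UR is the cosine similarity between $K_s(e)$ and $K_u(m)$, and that assumption is hypothetical. The names `top_users` and `candidates`, and the dictionary layout, are also hypothetical.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_users(query, user_vectors, candidates=None, n=3):
    """Return the ids of the n users in Q with the largest UR(e, m).

    query: the query vector K_s(e); user_vectors: {m: K_u(m)};
    candidates: the chosen group Q (defaults to all of U, as in step A12).
    UR is assumed here to be cosine similarity (hypothetical choice).
    """
    Q = user_vectors.keys() if candidates is None else candidates
    scores = {m: cosine(query, user_vectors[m]) for m in Q}
    return heapq.nlargest(n, scores, key=scores.get)
```

`heapq.nlargest` avoids sorting all of Q when only a few identifiers are returned.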
In the method of Fig. 5, the query vector set by user e is $K_s(e) = (sw_{e1}, sw_{e2}, \ldots, sw_{ek}, \ldots, sw_{eL})$. Here $sw_{ek}$ is the degree of correlation with feature k ($k \in K$) that user e wants the queried users to have, and $sw_{ek} \in [a, b]$, where a and b are preset nonnegative constants. The query vector $K_s(e)$ can be set in the following ways.
The first method: user e selects features from the feature set K and assigns each of them a degree of correlation. For example, user e sets $sw_{e2} = 0.00023$ and $sw_{e6} = 0.00061$, and $sw_{ek} = 0$ for all other features.
The second method: the parameter vector $K_u(e)$ of user e is assigned to the query vector $K_s(e)$.
The third method: user e submits a group of users or documents $S_e = \{\ldots, r, \ldots\}$.
- When $S_e \subseteq U$, the parameter vector of user r ($r \in S_e$) is $(uw_{r1}, uw_{r2}, \ldots, uw_{rL})$. The query vector of user e is then set, for each feature $k \in K$, to $sw_{ek} = (\sigma_9/s) \sum_{r \in S_e} [uw_{rk} / \sum_{j \in K} uw_{rj}]$.
- When $S_e \subseteq D$, the parameter vector of document r ($r \in S_e$) is $(dw_{r1}, dw_{r2}, \ldots, dw_{rL})$. The query vector of user e is then set, for each feature $k \in K$, to $sw_{ek} = (\sigma_{10}/s) \sum_{r \in S_e} [dw_{rk} / \sum_{j \in K} dw_{rj}]$.

In both cases, s is the number of elements of $S_e$.
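The third method averages each submitted item's vector after scaling it to sum to 1. The sketch below assumes that s is the size of $S_e$ and that items with an all-zero vector add nothing. `query_from_set` is a hypothetical name. The user and document formulas have the same shape, so passing $\sigma_9$ as `sigma` covers the user case and passing $\sigma_{10}$ covers the document case.

```python
def query_from_set(vectors, sigma=1.0):
    """sw_ek = (sigma / s) * sum over r in S_e of w_rk / sum_j w_rj.

    vectors: the parameter vectors of the submitted users (or documents);
    s is taken to be the number of submitted items.
    """
    s = len(vectors)
    L = len(vectors[0])
    acc = [0.0] * L
    for v in vectors:
        total = sum(v)
        if total == 0:
            continue  # assumption: an all-zero vector contributes nothing
        for k in range(L):
            acc[k] += v[k] / total
    return [sigma * x / s for x in acc]
```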
In one application example of the method of Fig. 5, the personalized ranking value UR(e, m) is calculated from the query vector $K_s(e) = (sw_{e1}, sw_{e2}, \ldots, sw_{ek}, \ldots, sw_{eL})$ of user e and the parameter vector $K_u(m) = (uw_{m1}, uw_{m2}, \ldots, uw_{mk}, \ldots, uw_{mL})$ of user m ($m \in Q$), for example
Fig. 6 is a flowchart of the method for querying user groups that have specific features. A plurality of user groups is obtained to form the user-group set G = {1, 2, ..., E}, where E is the number of user groups. The parameter vector of user group i ($i \in G$) is $(gw_{i1}, gw_{i2}, \ldots, gw_{ik}, \ldots, gw_{iL})$, where $gw_{ik}$ denotes the degree of correlation between user group i and feature k ($k \in K$). The method for querying user groups that have specific features is then as follows:
A21. Calculate the parameter vector of each user group in G. The parameter vector of a user group is calculated from the parameter vectors of the users it contains. For example, let all users in user group i form the user set $B_i$. The parameter vector of group i is then computed by setting, for each feature $k \in K$, $gw_{ik} = (\sigma_{11}/s) \sum_{t \in B_i} [uw_{tk} / \sum_{j \in K} uw_{tj}]$, where s is the number of elements of $B_i$ and $\sigma_{11}$ is a preset constant.
A22. Receive the query vector set by any querying user e ($e \in U$).
A23. Using the query vector and the parameter vector of each user group in G, calculate the personalized ranking value GR(e, i) of each user group i ($i \in G$). GR(e, i) denotes the personalized ranking value of user group i based on the query vector of user e.
A24. Send to the querying user e the identifiers of the user groups in G with the largest personalized ranking values.
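Steps A21 to A24 can be sketched as follows. The group-vector formula follows step A21. The scoring function GR(e, i) is assumed here to be cosine similarity, because this text does not reproduce the example formula. The names and the `{i: B_i}` layout are hypothetical.

```python
import heapq
import math

def group_vector(member_vectors, sigma11=1.0):
    """Step A21: gw_ik = (sigma11 / s) * sum over t in B_i of uw_tk / sum_j uw_tj."""
    s = len(member_vectors)
    L = len(member_vectors[0])
    acc = [0.0] * L
    for v in member_vectors:
        total = sum(v)
        if total == 0:
            continue  # assumption: an all-zero member vector contributes nothing
        for k in range(L):
            acc[k] += v[k] / total
    return [sigma11 * x / s for x in acc]

def cosine(a, b):
    """Cosine similarity of two vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_groups(query, groups, user_vectors, n=1):
    """Steps A23-A24: rank groups by an assumed GR(e, i) and return the top n ids.

    groups: {i: B_i}, the member user ids of each group.
    """
    scores = {
        i: cosine(query, group_vector([user_vectors[t] for t in members]))
        for i, members in groups.items()
    }
    return heapq.nlargest(n, scores, key=scores.get)
```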
In the method of Fig. 6, the query vector set by user e is $K_s(e) = (sw_{e1}, sw_{e2}, \ldots, sw_{ek}, \ldots, sw_{eL})$. Here $sw_{ek}$ is the degree of correlation with feature k ($k \in K$) that user e wants the queried user groups to have, and $sw_{ek} \in [a, b]$, where a and b are preset nonnegative constants. The query vector $K_s(e)$ can be set in four ways. The first three are the same as the three methods described for Fig. 5. In the fourth, user e submits a user group, and the parameter vector of that group is assigned to $K_s(e)$.
In the method for querying user groups, the personalized ranking value GR(e, i) is calculated from the query vector $K_s(e) = (sw_{e1}, sw_{e2}, \ldots, sw_{ek}, \ldots, sw_{eL})$ of user e and the parameter vector $K_u(i) = (gw_{i1}, gw_{i2}, \ldots, gw_{ik}, \ldots, gw_{iL})$ of user group i ($i \in G$), for example
Fig. 7 is a structure diagram of a system for obtaining users' personalized features. The system 200 comprises the following functional modules:
User, document and feature setting module 211: obtains a plurality of users on the internet, forms the user set U = {1, 2, ..., M}, and stores U in the user database 220. It also obtains a plurality of documents on the internet, forms the document set D = {1, 2, ..., N}, and stores D in the document database 230. Finally, it sets a plurality of features, forms the feature set K = {1, 2, ..., L}, and stores K in the feature database 240.
User and document parameter-vector initialization module 212: sets initial parameter vectors for a plurality of users in U and stores them in the user database 220. It also sets initial parameter vectors for a plurality of documents in D and stores them in the document database 230. Any user or document that is not given an initial value has its parameter vector initialized to the zero vector by default.
User-accesses-user signal acquisition module 213: collects signal 1, in which any user i ($i \in U$) accesses any user j ($j \in U$), and stores signal 1 in the Web log database 250. The signal of user i (101) accessing user j (102) is sent to the social networking server 302.
User-accesses-document signal acquisition module 214: collects signal 2, in which any user m ($m \in U$) (103) accesses any document n ($n \in D$), and stores signal 2 in the Web log database 250. The signal of user m (103) accessing document n is sent to at least one application server. The application servers include the portal site server 301, the social networking server 302, the search engine server 303 and the instant messaging server 304.
User and document parameter vector update module 215 handles the two signal types as follows:
- Signal 1: the module reads the parameter vectors of user i (101) and user j (102) from the user database 220. It updates them with the parameter vector update algorithm and stores the updated parameter vectors of users i and j in the user database 220.
- Signal 2: the module reads the parameter vector of user m (103) from the user database 220 and the parameter vector of document n from the document database 230. It updates them with parameter vector update algorithm 2. It then stores the updated parameter vector of user m in the user database 220 and the updated parameter vector of document n in the document database 230.
User query module 216: this module provides a user query function and a user-group query function.
- The user query function receives the query vector set by the querying user and obtains a group of users Q ($Q \subseteq U$). It calculates the personalized ranking value of each user in Q from the query vector and that user's parameter vector. It then sends the identifier of at least one user in Q to the querying user, chosen by the size of these values (see steps A11 to A14).
- The user-group query function receives the query vector set by the querying user. It calculates the personalized ranking value of each user group in the user-group set G from the query vector and that group's parameter vector. It then sends the identifier of at least one user group in G to the querying user, chosen by the size of these values (see steps A21 to A24).
The application examples above are merely preferred application examples of the present invention and are not intended to limit its scope of protection.