CN102591966A - Filtering method of search results in mobile environment - Google Patents
- Publication number: CN102591966A (also listed under CN201110458155A and CN2011104581556A)
- Authority: CN (China)
- Prior art keywords: user, formula, users, value, result
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
- Landscapes: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for filtering search results in a mobile environment. The method comprises the steps of: segmenting users into fine-grained groups according to features of their historical location information; building a feature model of each user from the user's historical query records; analyzing users' historical call records to build a social relationship network and compute the importance of the social relationships between users; and, at search time, first filtering the search results by content using the established user feature model, then collaboratively filtering them using the fine-grained user group information and the information mined from the users' social network, and finally returning the results to the user. By mining user features and filtering information in this way, the method filters search results in a more personalized manner, removes large numbers of irrelevant results, reduces the size of the result set, and enables precise personalized search in a mobile environment.
Description
Technical field
The invention belongs to the field of information retrieval and specifically relates to a method for filtering search results in a mobile scenario; the method is suited to personalized search in mobile scenarios.
Background art
Over the past decade or more, search engine technology has developed rapidly. Traditional Internet search has matured both technically and commercially and has been enormously successful. In recent years, emerging technologies and applications led by the mobile Internet have appeared one after another, and mobile search is one of the most important mobile Internet applications.
Because mobile terminals are mobile and portable, and are limited in screen size, processing power, and available bandwidth, mobile search cannot simply copy the way existing Internet search is implemented, for two main reasons:
(1) A traditional Internet search engine usually returns a large number of results, and in most cases more than half of them are irrelevant to the user. A major cause is that the engine only performs simple matching on the search keywords without considering other information (such as the user's context or personal preferences); combined with the explosive growth of information on the Internet, this produces many "junk results" that users must screen out themselves, which greatly increases their burden. In a mobile scenario this is unacceptable because of the small screens and keyboards, limited processing power, and limited bandwidth of mobile terminals: first, large numbers of junk results waste valuable data traffic; second, paging through and screening results on a mobile terminal is very inconvenient. Mobile search must therefore be precise, returning as few and as accurate results as possible.
(2) For the same search keywords, a conventional Internet search engine returns the same one-size-fits-all results to every user. Different users, however, have different background knowledge, preferences, and information needs, and the same keyword can mean different things to different people, in different fields, and at different times and places; what a user needs is often only a very small subset of the results. The mobility, portability, and personal nature of mobile terminals let users obtain information anywhere and at any time, which makes the demand for personalized search even stronger. Mobile search must therefore be personalized search that takes into account the user's personal characteristics (such as interests) and context (such as time, place, and weather).
Mobile search therefore needs to achieve precise personalized search. Domestic research on mobile search is still at an early stage, and the technologies used to implement it, like existing Internet search technology, are not yet mature. Early approaches include vertical search, such as mobile music search and novel search. The most common current approach combines existing Internet search technology with auxiliary techniques such as information filtering: a feature model of the user is built first, and the search results are then filtered with this model in a personalized way to remove irrelevant results and achieve precise personalized search.
Common techniques for user feature modeling include the vector space model and the ontology model; the vector space model is widely used because its principle is simple and it is relatively easy to implement.
Commonly used information filtering techniques include content-based filtering and collaborative filtering. Content-based filtering extracts features from each result, computes the similarity between the result and the filtering profile (the user model), and filters by a preset threshold; because it analyzes the content of the results, it usually filters well, but its computational cost is relatively high. Collaborative filtering is based on the idea that people of the same type usually share interests and preferences: a user's search results are filtered using other users whose interests are similar to those of the current user. This technique has been well developed and widely applied in e-commerce.
Summary of the invention
The purpose of the invention is to provide a method for filtering search results in a mobile scenario. The method mines user data (such as users' historical location information and historical call records) to build a user feature model and a user social network, and then applies content-based filtering and collaborative filtering to the search results according to the user feature model and the social network, respectively. Irrelevant search results are filtered out, achieving precise personalized search in a mobile scenario, which is of great value for improving the mobile search user experience and user stickiness.
The method for filtering search results in a mobile scenario provided by the invention comprises the following steps:
Step 1: For the initial result set to be filtered $R_1, R_2, \ldots, R_Z$ of user $U_i$ ($i = 1, 2, \ldots, N$), a feature vector is built for each result over the $d$ dimensions. The feature vector of $R_r$ is $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ is the weight on each dimension. The weights $v_a$ of $f_{R_r}$ are computed with the term frequency/inverse document frequency (TF/IDF) model: for each word $q_a$ among $q_1, q_2, \ldots, q_d$, if it does not appear in $R_r$, its weight is 0; otherwise its weight is its TF/IDF value, where TF is the number of times the word appears in $R_r$ and, for IDF (inverse document frequency), $z$ is the number of results that contain the word;
where the IDF value is $\log(Z/z)$, $Z$ is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$;
Step 2: Find the similar users of the current user $U_i$. They are chosen from two user sets: first, the group $G_g$ to which the user belongs, where $g$ is the index of that group and ranges from 1 to $m$; second, the set of users in the user's social network. Merging these two sets gives the set $S$, whose users are denoted $U_{is}$. The similarity between $U_i$ and each user $U_{is}$ in $S$ is computed with the vector cosine formula shown in Formula I, as shown in Formula II: the smaller the angle between the vectors, the larger the cosine and the greater the similarity, and vice versa. Here $i$ is the user index, $N$ is the number of users, $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$; and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$, which takes the corresponding value if $U_{is}$ is in the social network of $U_i$ and is zero otherwise. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are chosen in descending order of similarity; if $S$ contains fewer than $\eta$ users, all users in $S$ are chosen. $\eta$ is a preset value;
Step 3, content-based filtering:
For each initial result $R_r$ to be filtered, Formula III is used to compute its similarity to user $U_i$, where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$. Results are filtered by the preset threshold $\zeta$: initial results whose similarity is below $\zeta$ are removed, giving the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$, in which the intermediate results keep their original order;
Next, collaborative filtering is applied to the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$: using the $\eta$ similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$, the similarity $\mathrm{sim}'(U_i, R_r)$ of each intermediate result $R_r$ is computed by Formula IV, in which the two similarity terms denote the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$, respectively;
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad \text{(Formula V)}$$
Collaborative filtering is then performed on $\mathrm{sim}'(U_i, R_r)$ with the preset threshold $\varepsilon$: intermediate results whose similarity is below $\varepsilon$ are removed, giving the temporary result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where $r$ is the position of the result in the temporary set, numbered $1, 2, \ldots, Z_\varepsilon$ in order. For each temporary result $R_r$, Formula V with the preset weighting coefficient $\theta$ gives the weighted sum of its position $r$ and $\mathrm{sim}'(U_i, R_r)$ as its final rank $\mathrm{Rank}_r$. The temporary result set is re-sorted by this rank to obtain the final results, which are returned to the user, and the filtering process ends.
The method for filtering search results in a mobile scenario provided by the invention combines data mining methods (classification and clustering), content-based filtering, and collaborative filtering. Specifically, the invention has the following effects and advantages:
(1) High accuracy: on top of traditional content-based filtering, the invention additionally analyzes the user's social network information and uses it, in a novel way, for collaborative filtering, which greatly improves accuracy.
(2) Strong adaptability: the invention takes into account the diversity of mobile user groups and individuals and adapts well to the personalized needs of various user groups and individuals.
(3) High extensibility: besides mobile search, the filtering method can be used in other mobile Internet applications, such as precision advertising, and the user feature modeling method can also be applied to customer relationship management (CRM) and the like.
Description of drawings
Fig. 1 is the overall flowchart of the method of the invention;
Fig. 2 is a sketch of the historical location change frequency of mobile users;
Fig. 3 is a flowchart of clustering mobile users by location;
Fig. 4 is a diagram of the social network structure of a mobile user;
Fig. 5 is a detailed flowchart of filtering mobile search results.
Embodiment
The invention is described in detail below with reference to the accompanying drawings.
The method for filtering search results in a mobile scenario provided by the invention is shown in Fig. 1. It begins with a filtering preprocessing stage, which mainly comprises user segmentation, building the user feature model, and building the user social network, corresponding to steps (1) to (3) below; this is followed by the result filtering stage, corresponding to step (4) below. The specific processing steps are as follows:
1. Filtering preprocessing stage, comprising steps (1) to (3).
(1) User segmentation: data mining methods are used to segment users. The user data sets provided by existing telecom operators contain large amounts of user data, such as users' historical location information, historical call records, historical query and browsing records, and historical business data. The invention mainly uses users' historical location information to segment them, as follows:
(a) Users are first divided according to the change frequency of their historical locations. A user's historical location information records historical locations $L$ and the corresponding times $T$; a location $L$ is stored in the data set as a latitude-longitude pair, such as (30.2332, 114.3243), and a time $T$ is stored as a point in time. Given the longitudes and latitudes of two consecutive historical locations of a user, their distance is easily computed with the latitude-longitude distance formula, formula (1). Let the first location $L_1$ have longitude and latitude $(\mathrm{lon}_1, \mathrm{lat}_1)$ and the second location $L_2$ have $(\mathrm{lon}_2, \mathrm{lat}_2)$. Taking the 0° meridian as the reference, east longitude is positive and west longitude is negative; a north latitude enters the calculation as $(90° - \mathrm{lat})$ and a south latitude as $(90° + \mathrm{lat})$. The distance between the two points can then be computed with formula (1).
$$C = \sin(\mathrm{lat}_1)\,\sin(\mathrm{lat}_2)\,\cos(\mathrm{lon}_1 - \mathrm{lon}_2) + \cos(\mathrm{lat}_1)\,\cos(\mathrm{lat}_2) \qquad (1)$$
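The distance computation in step (a) can be sketched in Python as below. Only the expression for $C$ survives in this text; turning $C$ into a distance as $R \cdot \arccos(C)$ with a mean Earth radius of 6371 km is an assumption (the standard spherical law of cosines, which $C$ matches once the latitudes are replaced by the angles $90° \mp \mathrm{lat}$ described above). The function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not given in the text


def formula_angle(lat_deg: float) -> float:
    """Angle substituted for 'lat' in formula (1).

    North latitudes enter as 90° - lat and south latitudes as 90° + |lat|.
    If south latitudes are stored as negative numbers, both cases are 90° - lat.
    """
    return 90.0 - lat_deg


def location_distance(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    a1 = math.radians(formula_angle(lat1))
    a2 = math.radians(formula_angle(lat2))
    # Formula (1): C is the cosine of the central angle between the two points.
    c = (math.sin(a1) * math.sin(a2) * math.cos(math.radians(lon1 - lon2))
         + math.cos(a1) * math.cos(a2))
    c = max(-1.0, min(1.0, c))  # clamp floating-point drift before acos
    return EARTH_RADIUS_KM * math.acos(c)  # assumed conversion of C to a distance
```

Storing south latitudes as negative numbers lets a single expression cover both the north and south cases of the text.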
For each user $U_i$ ($i = 1, 2, \ldots, N$), the cumulative change frequency $F_i$ of historical locations within a recent period $\Delta T$ (such as one month) is computed, where $N$ is the number of users.
As shown in formula (2), $(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ is the historical location information of user $U_i$ ($i = 1, 2, \ldots, N$) within the recent period $\Delta T$; $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two consecutive historical locations and times of the user; $\mathrm{Dis}(L_k, L_{k-1})$ and $T_k - T_{k-1}$ are, respectively, the distance and the time difference between the two consecutive locations; $M$ is the number of historical locations of the current user, and $k$ is the index of a historical location.
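Formula (2) itself is not reproduced in this text, so the exact definition of $F_i$ is unknown. The sketch below is a hypothetical reading that accumulates $\mathrm{Dis}(L_k, L_{k-1}) / (T_k - T_{k-1})$ over consecutive records within $\Delta T$; the helper `_distance_km` repeats the formula (1) computation with an assumed 6371 km Earth radius, and both function names are illustrative.

```python
import math


def _distance_km(lon1, lat1, lon2, lat2):
    # Formula (1) on the angles 90° - lat; a 6371 km Earth radius is assumed.
    a1, a2 = math.radians(90.0 - lat1), math.radians(90.0 - lat2)
    c = (math.sin(a1) * math.sin(a2) * math.cos(math.radians(lon1 - lon2))
         + math.cos(a1) * math.cos(a2))
    return 6371.0 * math.acos(max(-1.0, min(1.0, c)))


def change_frequency(history):
    """Hypothetical form of formula (2).

    history: [(lon, lat, t), ...] for one user within ΔT, sorted by time t
    (e.g. in hours). Accumulates Dis(L_k, L_{k-1}) / (T_k - T_{k-1}).
    """
    total = 0.0
    for (lon_prev, lat_prev, t_prev), (lon_k, lat_k, t_k) in zip(history, history[1:]):
        dt = t_k - t_prev
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        total += _distance_km(lon_k, lat_k, lon_prev, lat_prev) / dt
    return total
```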
The values $F$ of all users are collected to obtain the overall range $\Omega$ of $F$, which is divided into several sub-intervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where $n$ is the number of user groups. These sub-intervals characterize different user groups by $F$, and each user is assigned to the sub-interval that contains its $F$. As shown in Fig. 2, user A has a high $F$ and may be a business person who travels frequently, whereas user B has a low $F$ and probably stays at a fixed location for long periods, for example a college student. In this way users receive a preliminary division by location change frequency $F$ into different groups $\Omega_1, \Omega_2, \ldots, \Omega_n$. $\Omega$ may be divided into equal parts, or the system may preset a division criterion.
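When $\Omega$ is divided into equal parts, the preliminary grouping reduces to equal-width binning of $F$. The following Python sketch, with a hypothetical function name, assigns each user the 1-based index $j$ of its sub-interval $\Omega_j$; the maximum value is placed in the last interval so that every user lands in one of the $n$ groups.

```python
def divide_into_groups(freqs, n):
    """Split the range Ω of F into n equal sub-intervals Ω_1..Ω_n.

    freqs: {user_id: F_i}. Returns {user_id: j}, the 1-based sub-interval index.
    """
    lo, hi = min(freqs.values()), max(freqs.values())
    width = (hi - lo) / n
    groups = {}
    for user, f in freqs.items():
        # If every user has the same F, everyone falls into the first interval.
        idx = 0 if width == 0 else min(int((f - lo) / width), n - 1)
        groups[user] = idx + 1
    return groups
```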
(b) Next, the users in each $\Omega_j$ ($j = 1, 2, \ldots, n$, where $j$ is the group index) are clustered by historical location, so that users at nearby locations are grouped into one class; related research shows that geographically close users tend to have somewhat similar characteristics. The k-means clustering algorithm is applied to the users in each $\Omega_j$ as follows:
(b1) First, compute for each user $U_i$ ($i = 1, 2, \ldots, N$) the center $O_i$ of the user's historical locations within $\Delta T$; users are clustered according to $O_i$, where $i$ is the user index;
(b2) Randomly select $k$ users from $\Omega_j$ ($j = 1, 2, \ldots, n$); each selected user $U_q$ ($q = 1, 2, \ldots, k$) represents an initial user cluster $C_q$, and its center $O_q$ is the initial center of that cluster;
(b3) For each remaining user in $\Omega_j$, compute the distance (with the latitude-longitude distance formula) to the center $O_q$ of each user cluster $C_q$ ($q = 1, 2, \ldots, k$), and assign the user to the nearest cluster;
(b4) Then recompute the new center $O_q$ ($q = 1, 2, \ldots, k$) of each user cluster to replace the old one, and compute the criterion function $E_j$ by formula (3). If $E_j$ has converged, the clustering process ends; otherwise, return to step (b3).
As shown in formula (3), $\mathrm{Dis}(U, C_q)$ is the distance between a user $U$ in $\Omega_j$ ($j = 1, 2, \ldots, n$) and the center $O_q$ of user cluster $C_q$ ($q = 1, 2, \ldots, k$).
Clustering yields compact user clusters. Thus, building on the division into $\Omega_1, \Omega_2, \ldots, \Omega_n$, users are further divided into smaller groups $G_1, G_2, \ldots, G_m$, which completes user segmentation.
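Steps (b1) to (b4) can be sketched as a small k-means loop over the users of one $\Omega_j$. Several details are assumptions rather than statements of the text: formula (3) is not reproduced, so $E_j$ is taken here as the total distance of users to their cluster centers; a new center $O_q$ is the plain average of its members' longitudes and latitudes (reasonable only over small areas); and convergence means $E_j$ changes by less than a tolerance. All names are hypothetical.

```python
import math
import random


def _dis(p, q):
    # Formula (1) distance in km between (lon, lat) points; 6371 km radius assumed.
    a1, a2 = math.radians(90.0 - p[1]), math.radians(90.0 - q[1])
    c = (math.sin(a1) * math.sin(a2) * math.cos(math.radians(p[0] - q[0]))
         + math.cos(a1) * math.cos(a2))
    return 6371.0 * math.acos(max(-1.0, min(1.0, c)))


def _mean(points):
    # Assumed center update: plain average of longitude and latitude.
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def kmeans_users(centers_by_user, k, tol=1e-6, max_iter=100, seed=0):
    """centers_by_user: {user_id: O_i as (lon, lat)} for the users of one Ω_j.

    Returns {user_id: cluster index q}, 0-based.
    """
    users = list(centers_by_user)
    k = min(k, len(users))
    rng = random.Random(seed)
    # (b2) k randomly chosen users seed the initial clusters.
    cluster_centers = [centers_by_user[u] for u in rng.sample(users, k)]
    prev_e = math.inf
    assign = {}
    for _ in range(max_iter):
        # (b3) assign every user to the nearest cluster center.
        assign = {u: min(range(k), key=lambda q: _dis(centers_by_user[u], cluster_centers[q]))
                  for u in users}
        # (b4) recompute centers; an emptied cluster keeps its old center.
        for q in range(k):
            members = [centers_by_user[u] for u in users if assign[u] == q]
            if members:
                cluster_centers[q] = _mean(members)
        # Assumed criterion E_j: total user-to-center distance.
        e = sum(_dis(centers_by_user[u], cluster_centers[assign[u]]) for u in users)
        if abs(prev_e - e) < tol:
            break
        prev_e = e
    return assign
```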
(2) Build the user feature model. A user's historical query records characterize the user's interests well, so the vector space model is applied to them to build a feature model of the user, in the following steps:
(a) Collect all historical query records of all users within $\Delta T$; this yields $d$ distinct words $q_1, q_2, \ldots, q_d$, which serve as the $d$ dimensions of the vector space. The feature vector of a user is $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$ ($i = 1, 2, \ldots, N$), where $v_a$ ($a = 1, 2, \ldots, d$) is the weight of each dimension.
(b) Using the TF/IDF (term frequency/inverse document frequency) model, compute the weight of each dimension of the feature vector of each user $U_i$ ($i = 1, 2, \ldots, N$). For each word $q_a$ ($a = 1, 2, \ldots, d$) among $q_1, q_2, \ldots, q_d$: if it does not appear in the user's historical query records, its weight $v_a$ is 0; otherwise the weight is its TF/IDF value. TF is the term frequency, here the number of times the word appears in the user's historical query records; for IDF (inverse document frequency), count the number $D$ of users whose historical query records contain the word. The IDF value is $\log(N/D)$, where $N$ is the total number of users, and the TF/IDF value is the product of TF and IDF.
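The TF/IDF weighting of steps (a) and (b) can be sketched as below, assuming the query records have already been split into words and using the natural logarithm (the text does not specify the base). The function name and input layout are illustrative.

```python
import math
from collections import Counter


def user_feature_vectors(queries_by_user):
    """queries_by_user: {user_id: list of query words within ΔT}.

    Returns (vocabulary q_1..q_d, {user_id: [v_1, ..., v_d]}).
    """
    vocab = sorted({w for words in queries_by_user.values() for w in words})
    n_users = len(queries_by_user)
    # D: number of users whose query history contains each word.
    doc_freq = Counter(w for words in queries_by_user.values() for w in set(words))
    vectors = {}
    for user, words in queries_by_user.items():
        tf = Counter(words)  # TF: occurrences of each word in this user's queries
        vectors[user] = [tf[w] * math.log(n_users / doc_freq[w]) if tf[w] else 0.0
                         for w in vocab]
    return vocab, vectors
```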
(3) Mine user social network information by analyzing users' historical call records. For each user $U_i$ ($i = 1, 2, \ldots, N$), the social network is a star topology centered on that user, as shown in Fig. 4: the center node B represents the user, and the surrounding nodes A, C, D, E, F, G represent users who have call records with B. The weight $\psi$ of an edge represents the degree of relationship between two users; this step mainly estimates the value of $\psi$.
The historical call record data record the calls between all users, including the ID numbers of both parties, the call start time, the call end time, and so on. For each user $U_i$ ($i = 1, 2, \ldots, N$), the call records within $\Delta T$ are analyzed. For each user $u_x$ who has call records with $U_i$ ($x = 1, 2, \ldots, e$, where $e$ is the number of such users), the total number of calls $\alpha$ with $U_i$ within $\Delta T$, the total call duration $\beta$, and the call regularity $\gamma$ are analyzed; taken together, these factors allow the degree of relationship $\psi_{ix}$ between $U_i$ and $u_x$ to be roughly inferred.
The total number of calls $\alpha$ and the total call duration $\beta$ are relatively easy to obtain, but both are aggregate statistics: they are one-dimensional, give only a rough overall estimate of the degree of relationship between users, and ignore important details, such as how the calls are distributed over time (uniformly overall, locally uniformly, and so on). The call regularity $\gamma$ is therefore also introduced to characterize the degree of relationship between $U_i$ ($i = 1, 2, \ldots, N$) and $u_x$ ($x = 1, 2, \ldots, e$). It is obtained by analyzing the temporal distribution of all calls within $\Delta T$ using the idea of variance, as in formulas (4), (5), and (6): $t_h$ ($h = 1, 2, \ldots, \alpha$) is the start time of each call, $\Delta t_h$ is the time difference between two consecutive calls, and $S_t$ is its variance. As shown in formula (6), $\gamma$ is inversely proportional to $S_t$: a small variance means that the calls in this period are more regular, so $\gamma$ is larger, and vice versa.
$$\Delta t_h = t_h - t_{h-1}, \quad h = 2, 3, \ldots, \alpha \qquad (4)$$
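Formulas (5) and (6) are not reproduced here; the text only states that $S_t$ is the variance of the gaps $\Delta t_h$ and that $\gamma$ is inversely proportional to $S_t$. The sketch below therefore assumes the population variance and $\gamma = 1/(S_t + \epsilon)$, where the small $\epsilon$ is a hypothetical guard against division by zero for perfectly regular calls; returning 0 when there are fewer than three calls is likewise an assumption.

```python
from statistics import pvariance


def call_regularity(start_times, eps=1e-9):
    """Call regularity γ for one pair of users.

    start_times: call start times t_1..t_α within ΔT (any consistent unit).
    """
    t = sorted(start_times)
    if len(t) < 3:
        return 0.0  # assumption: fewer than two gaps say nothing about regularity
    gaps = [t[h] - t[h - 1] for h in range(1, len(t))]  # formula (4)
    s_t = pvariance(gaps)  # S_t: population variance of the gaps (assumed)
    return 1.0 / (s_t + eps)  # γ inversely proportional to S_t; eps is a guard
```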
$\alpha$, $\beta$, and $\gamma$ are normalized to values between 0 and 1, and $\psi_{ix}$ ($i = 1, 2, \ldots, N$; $x = 1, 2, \ldots, e$) is computed with formula (8) as a weighted combination of $\alpha$, $\beta$, and $\gamma$. In formula (8), $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$, and $\lambda_1 + \lambda_2 + \lambda_3 = 1$; by default each weight takes the average value 1/3.
$$\psi_{ix} = \lambda_1 \alpha + \lambda_2 \beta + \lambda_3 \gamma, \quad \lambda_1 + \lambda_2 + \lambda_3 = 1 \qquad (8)$$
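Formula (8) can be sketched as follows. The text does not say how $\alpha$, $\beta$, and $\gamma$ are normalized to [0, 1]; dividing each by its maximum over the user's contacts is an assumption made here, as are the function names.

```python
def _scale_by_max(column):
    """Assumed normalisation: divide by the column maximum (values are >= 0)."""
    top = max(column)
    return [v / top if top > 0 else 0.0 for v in column]


def relationship_degrees(stats, lambdas=(1 / 3, 1 / 3, 1 / 3)):
    """stats: {contact u_x: (alpha, beta, gamma)} for one user U_i, all values >= 0.

    Returns {contact u_x: psi_ix} computed with formula (8).
    """
    if abs(sum(lambdas) - 1.0) > 1e-9 or any(not 0.0 <= lam <= 1.0 for lam in lambdas):
        raise ValueError("need 0 <= lambda_k <= 1 and lambda_1 + lambda_2 + lambda_3 = 1")
    if not stats:
        return {}
    contacts = list(stats)
    # Transpose to the alpha, beta, gamma columns and normalise each column.
    alpha, beta, gamma = (_scale_by_max(list(col))
                          for col in zip(*(stats[c] for c in contacts)))
    l1, l2, l3 = lambdas
    return {c: l1 * alpha[x] + l2 * beta[x] + l3 * gamma[x]
            for x, c in enumerate(contacts)}
```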
Through the analysis and the calculating of this step, just obtained each user U like this
i, (i=1,2 ..., social networks information N) comprises its associated with it user u
x, (x=1,2 ..., the ψ of degree of relationship between e)
Ix
(4) Search result filtering. Steps (1) to (3) above are preparation that serves the search result filtering of this step: the user feature model built in step (2) is used for content-based filtering of the search results, and the user segmentation of step (1) together with the social network information mined in step (3) is used for collaborative filtering of the search results.
This step first applies content-based filtering to the search results and then collaborative filtering, in order to personalize and simplify the search results.
User $U_i$ ($i = 1, 2, \ldots, N$) submits a search $Q$. The search request is first processed by an existing Internet search engine, which returns an initial result set for $Q$. This result set is usually large, so the first $\varphi$ results are selected for filtering (if there are fewer than $\varphi$ results, the whole initial result set is selected) as the result set to be filtered, $R_1, R_2, \ldots, R_Z$. $\varphi$ is an empirical value preset by the system, for example 300, and $Z$ is the number of results to be filtered. The result filtering process is shown in Fig. 5 and comprises the following steps:
(a) Build feature vectors for the result set to be filtered $R_1, R_2, \ldots, R_Z$ over the $d$ dimensions established in step (2). The feature vector of $R_r$ ($r = 1, 2, \ldots, Z$) is $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ ($a = 1, 2, \ldots, d$) is the weight of each dimension. The same TF/IDF (term frequency/inverse document frequency) model used in step (2) computes the weights $v_a$ of $f_{R_r}$: for each word $q_a$ among $q_1, q_2, \ldots, q_d$, if it does not appear in $R_r$, its weight is 0; otherwise the weight is its TF/IDF value, where TF is the number of times the word appears in $R_r$ and, for IDF (inverse document frequency), $z$ is the number of results that contain the word. The IDF value is $\log(Z/z)$, where $Z$ is the total number of results, and the TF/IDF value is the product of TF and IDF.
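Building the result vectors differs from step (2) only in the document set: the vocabulary $q_1, \ldots, q_d$ is fixed by the users' queries, and IDF is computed over the $Z$ results as $\log(Z/z)$. A minimal sketch, assuming tokenized results and the natural logarithm, with a hypothetical function name:

```python
import math
from collections import Counter


def result_feature_vectors(vocab, results):
    """vocab: the d words q_1..q_d from step (2); results: token lists R_1..R_Z.

    Returns one weight list per result, with IDF = log(Z / z).
    """
    z_total = len(results)
    token_counts = [Counter(tokens) for tokens in results]
    # z: number of results that contain each vocabulary word.
    contains = {w: sum(1 for tf in token_counts if tf[w] > 0) for w in vocab}
    # Words outside the vocabulary are ignored; absent words get weight 0.
    return [[tf[w] * math.log(z_total / contains[w]) if tf[w] else 0.0 for w in vocab]
            for tf in token_counts]
```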
(b) Next, find the similar users of the current user $U_i$ ($i = 1, 2, \ldots, N$). They are chosen from two user sets: first, the group $G_g$ from step (1) to which the user belongs, where $g$ is the index of that group and ranges from 1 to $m$; second, the set of users in the social network built in step (3). The two sets, which may share users, are merged to obtain the set $S$, from which several similar users are chosen.
In formula (10), $\|\cdot\|$ denotes the norm of a vector.
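The text describes formula (10) only as the vector cosine formula. Its standard form, which the norm notation above fits, would be the following, where $v_a^{X}$ and $v_a^{Y}$ (notation introduced here) are the weights of two $d$-dimensional feature vectors $f_X$ and $f_Y$:

$$\cos(f_X, f_Y) = \frac{f_X \cdot f_Y}{\|f_X\|\,\|f_Y\|} = \frac{\sum_{a=1}^{d} v_a^{X} v_a^{Y}}{\sqrt{\sum_{a=1}^{d} \left(v_a^{X}\right)^2}\,\sqrt{\sum_{a=1}^{d} \left(v_a^{Y}\right)^2}}$$

Because TF/IDF weights are non-negative, this value lies in [0, 1], consistent with the thresholds $\zeta$ and $\varepsilon$ being chosen in [0, 1].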
The vector cosine formula shown in formula (10) is used to compute the similarity between $U_i$ ($i = 1, 2, \ldots, N$) and each user $U_{is}$ in $S$, as shown in formula (9): the smaller the angle between the vectors, the larger the cosine and the greater the similarity, and vice versa. $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$, and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value; otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are chosen in descending order of similarity; if $S$ contains fewer than $\eta$ users, all users in $S$ are chosen. $\eta$ is an empirical value preset by the system; its default can be, for example, 10.
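Formula (9), which combines the cosine similarity with $\psi(U_i, U_{is})$, is not reproduced in this text. The sketch below uses the sum $\cos(f_{U_i}, f_{U_{is}}) + \psi(U_i, U_{is})$ purely as a hypothetical stand-in; excluding $U_i$ from $S$ and breaking ties by user id are also choices made here, and all names are illustrative.

```python
import math


def cosine(x, y):
    # Formula (10): vector cosine; 0.0 when either vector is all zeros or empty.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def similar_users(user, group_members, social_psi, vectors, eta=10):
    """group_members: users of G_g; social_psi: {contact: psi(U_i, contact)};
    vectors: {user_id: feature vector}. Returns up to eta (user, sim) pairs."""
    s = (set(group_members) | set(social_psi)) - {user}  # merge sets, drop U_i itself
    scored = []
    for other in s:
        # Hypothetical stand-in for formula (9): cosine plus relationship degree.
        sim = cosine(vectors[user], vectors.get(other, [])) + social_psi.get(other, 0.0)
        scored.append((other, sim))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # high to low, ties by id
    return scored[:eta]
```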
(c) The results are then filtered in two stages: a content-based filtering stage and a collaborative filtering stage.
(c1) First, content-based filtering: for each initial result $R_r$ ($r = 1, 2, \ldots, Z$) from (a), compute its similarity to user $U_i$ ($i = 1, 2, \ldots, N$), again using formula (10), as shown in formula (11), where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$. Results are filtered by the threshold $\zeta$: results whose similarity is below $\zeta$ are removed, giving the intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$), in which the results keep their original order. The threshold $\zeta$ is an empirical value preset by the system, with $0 \le \zeta \le 1$; its default can be set to 0.65.
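Stage (c1) is a threshold on formula (11). A minimal sketch, assuming the feature vectors are plain lists of weights in the same dimension order; the function names are illustrative:

```python
import math


def cosine(x, y):
    # Formula (10): vector cosine; 0.0 when either vector is all zeros.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def content_filter(user_vec, result_vecs, zeta=0.65):
    """Return (original index, similarity) for results with similarity >= zeta.

    Surviving results keep their original order.
    """
    kept = []
    for idx, vec in enumerate(result_vecs):
        sim = cosine(user_vec, vec)  # formula (11)
        if sim >= zeta:  # results below zeta are filtered out
            kept.append((idx, sim))
    return kept
```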
(c2) Next, collaborative filtering is applied to the intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$). Collaborative filtering rests on the idea that similar users usually have similar interests, so the current user's similar users are used to make collaborative recommendations for the current user. Using the $\eta$ similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$ ($i = 1, 2, \ldots, N$) computed in step (b), the similarity $\mathrm{sim}'(U_i, R_r)$ of each intermediate result $R_r$ ($r = 1, 2, \ldots, Z_\zeta$) is computed by formula (12) for collaborative filtering. In that formula, the two similarity terms, both computed with the vector cosine of formula (10), denote the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$, respectively.
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad (13)$$
Collaborative filtering is performed on $\mathrm{sim}'(U_i, R_r)$ with the threshold $\varepsilon$: results whose similarity is below $\varepsilon$ are removed, giving the temporary result set $R_r$ ($r = 1, 2, \ldots, Z_\varepsilon$), where $r$ is the position of the result in the temporary set, numbered $1, 2, \ldots, Z_\varepsilon$ in order. For each $R_r$, the weighted sum of its position $r$ and $\mathrm{sim}'(U_i, R_r)$ with weighting coefficient $\theta$ is taken as its final rank $\mathrm{Rank}_r$, as shown in formula (13), and the results $R_r$ ($r = 1, 2, \ldots, Z_\varepsilon$) are re-sorted by this rank to obtain the final results, which are returned to the user; the filtering process then ends. The threshold $\varepsilon$ and the weighting coefficient $\theta$ are empirical values preset by the system, with $0 \le \varepsilon \le 1$ and $0 \le \theta \le 1$; the default of $\varepsilon$ can be set to 0.85 and that of $\theta$ to 0.5.
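Stage (c2) and formula (13) can be sketched as below. Formula (12) is not reproduced in this text, so $\mathrm{sim}'(U_i, R_r)$ is assumed here to be the average of $\mathrm{sim}(U_{is}, R_r)$ weighted by $\mathrm{sim}(U_{is}, U_i)$ over the $\eta$ similar users, a common collaborative filtering form. The text also does not state the direction of the final sort; ascending $\mathrm{Rank}_r$ is assumed. The function name and input layout are hypothetical.

```python
def collaborative_rank(neighbor_user_sims, neighbor_result_sims, epsilon=0.85, theta=0.5):
    """neighbor_user_sims: {U_is: sim(U_is, U_i)} for the eta similar users.
    neighbor_result_sims: per intermediate result, in original order,
    {U_is: sim(U_is, R_r)}. Returns [(Rank_r, original index)], sorted."""
    total = sum(neighbor_user_sims.values())
    temporary = []
    for idx, res_sims in enumerate(neighbor_result_sims):
        if total > 0:
            # Hypothetical formula (12): similarity-weighted average over neighbours.
            sim_prime = sum(w * res_sims.get(u, 0.0)
                            for u, w in neighbor_user_sims.items()) / total
        else:
            sim_prime = 0.0
        if sim_prime >= epsilon:  # results below epsilon are filtered out
            temporary.append((idx, sim_prime))
    # Formula (13): r is the 1-based position within the temporary result set.
    ranked = [(theta * r + (1 - theta) * s, idx)
              for r, (idx, s) in enumerate(temporary, start=1)]
    ranked.sort()  # assumption: smaller Rank_r is listed first
    return ranked
```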
The invention is not limited to the embodiment above. Based on the content disclosed herein, persons skilled in the art can implement the invention in various other embodiments; therefore, any simple change or modification of design that adopts the design structure and ideas of the invention falls within the scope of protection of the invention.
Claims (7)
1. A method for filtering search results in a mobile scenario, comprising the following steps:
Step 1: For the initial result set to be filtered $R_1, R_2, \ldots, R_Z$ of user $U_i$ ($i = 1, 2, \ldots, N$), a feature vector is built for each result over the $d$ dimensions. The feature vector of $R_r$ is $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ is the weight on each dimension. The weights $v_a$ of $f_{R_r}$ are computed with the term frequency/inverse document frequency (TF/IDF) model: for each word $q_a$ among $q_1, q_2, \ldots, q_d$, if it does not appear in $R_r$, its weight is 0; otherwise its weight is its TF/IDF value, where TF is the number of times the word appears in $R_r$ and, for IDF (inverse document frequency), $z$ is the number of results that contain the word;
where the IDF value is $\log(Z/z)$, $Z$ is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$;
Step 2: Find the similar users of the current user $U_i$. They are chosen from two user sets: first, the group $G_g$ to which the user belongs, where $g$ is the index of that group and ranges from 1 to $m$; second, the set of users in the user's social network. Merging these two sets gives the set $S$, whose users are denoted $U_{is}$. The similarity between $U_i$ and each user $U_{is}$ in $S$ is computed with the vector cosine formula shown in Formula I, as shown in Formula II: the smaller the angle between the vectors, the larger the cosine and the greater the similarity, and vice versa. Here $i$ is the user index, $N$ is the number of users, $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$; and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$, which takes the corresponding value if $U_{is}$ is in the social network of $U_i$ and is zero otherwise. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are chosen in descending order of similarity; if $S$ contains fewer than $\eta$ users, all users in $S$ are chosen. $\eta$ is a preset value;
Step 3, content-based filtering:
For each initial result $R_r$ to be filtered, Formula III is used to compute its similarity to user $U_i$, where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$. Results are filtered by the preset threshold $\zeta$: initial results whose similarity is below $\zeta$ are removed, giving the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$, in which the intermediate results keep their original order;
Next, collaborative filtering is applied to the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$: using the $\eta$ similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$, the similarity $\mathrm{sim}'(U_i, R_r)$ of each intermediate result $R_r$ is computed by Formula IV, in which the two similarity terms denote the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$, respectively;
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad \text{(Formula V)}$$
Collaborative filtering then applies a preset threshold $\varepsilon$ to $\mathrm{sim}'(U_i, R_r)$. Intermediate results whose similarity is below $\varepsilon$ are discarded, giving the temporary result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where $r$ is the result's position in the temporary set. For each temporary result $R_r$, Formula V uses the preset weighting coefficient $\theta$ to compute the weighted sum of its position $r$ and $\mathrm{sim}'(U_i, R_r)$; this sum is the final rank $\mathrm{Rank}_r$. The temporary result set is re-sorted by this rank to produce the final results, which are returned to the user, and the filtering process ends.
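The collaborative step can be sketched as follows. Formula IV is not reproduced in the text, so `cf_similarity` is a hypothetical stand-in. It takes a $\mathrm{sim}(U_{is}, U_i)$-weighted average of each similar user's similarity to $R_r$, which is a common collaborative-filtering form and not necessarily the patent's. The claim also does not say whether $\mathrm{Rank}_r$ is sorted ascending or descending. The sketch assumes ascending order, the same direction as the position $r$.

```python
from typing import Dict, List


def cf_similarity(neighbour_sims: Dict[str, float],
                  neighbour_to_result: Dict[str, float]) -> float:
    """Hypothetical stand-in for Formula IV.

    neighbour_sims maps each similar user U_is to sim(U_is, U_i);
    neighbour_to_result maps U_is to its similarity with result R_r.
    Returns the sim(U_is, U_i)-weighted average of the neighbour-result similarities.
    """
    total = sum(neighbour_sims.values())
    if total == 0.0:
        return 0.0
    return sum(w * neighbour_to_result.get(u, 0.0)
               for u, w in neighbour_sims.items()) / total


def collaborative_rerank(intermediate: List[str], neighbour_sims: Dict[str, float],
                         result_sims: Dict[str, Dict[str, float]],
                         epsilon: float, theta: float) -> List[str]:
    """Step 4 sketch: threshold sim'(U_i, R_r) at epsilon, then re-sort by Formula V."""
    scored = [(rid, cf_similarity(neighbour_sims, result_sims.get(rid, {})))
              for rid in intermediate]
    # Temporary result set: survivors keep their relative order, so r runs 1..Z_eps.
    temporary = [(rid, s) for rid, s in scored if s >= epsilon]
    # Formula V: Rank_r = theta * r + (1 - theta) * sim'(U_i, R_r).
    ranked = [(theta * r + (1 - theta) * s, rid)
              for r, (rid, s) in enumerate(temporary, start=1)]
    # Assumption: ascending Rank_r, matching the direction of r; the claim does not say.
    ranked.sort(key=lambda pair: pair[0])
    return [rid for _, rid in ranked]


neighbours = {"A": 0.8, "B": 0.2}
result_sims = {"R1": {"A": 0.9, "B": 0.1}, "R2": {"A": 0.1, "B": 0.1},
               "R3": {"A": 0.5, "B": 1.0}}
print(collaborative_rerank(["R1", "R2", "R3"], neighbours, result_sims,
                           epsilon=0.3, theta=0.5))  # ['R1', 'R3']
```

Here R2 scores 0.1 against $\varepsilon = 0.3$ and is dropped. R1 (0.74) and R3 (0.6) survive and are then ranked by Formula V.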
2. The method for filtering search results in a mobile environment according to claim 1, characterized in that the initial result set in step 1 is obtained as follows:
When user $U_i$ submits a search $Q$, the request is first handled by an existing Internet search engine, which returns an initial result set for $Q$. The top $\varphi$ results of this set are selected for filtering (if there are fewer than $\varphi$, the whole initial result set is selected) as the result set to be filtered, $R_1, R_2, \ldots, R_Z$. $\varphi$ is preset by the system, and $Z$ is the number of results to be filtered.
3. The method for filtering search results in a mobile environment according to claim 1, characterized in that the feature vectors of the results to be filtered in step 1 are obtained as follows:
All historical query records of all users within time $\Delta T$ are collected, yielding $d$ distinct words $q_1, q_2, \ldots, q_d$ that serve as the $d$ dimensions of the vector space. A user's feature vector is expressed as $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, $i = 1, 2, \ldots, N$, where $v_a$, $a = 1, 2, \ldots, d$, is the weight of each dimension.
4. The method for filtering search results in a mobile environment according to claim 1, characterized in that in step 2 similar users are obtained as follows:
Step 4.1: to find the users similar to the current user $U_i$, the group $G_g$ to which the user belongs is merged with the set of users in the user's social network to obtain the set $S$. Here $g$ is the index of the user's group and ranges from 1 to $m$, where $m$ is the number of groups.
Step 4.2: Formula VI is used to compute the similarity $\mathrm{sim}(U_i, U_{is})$ between $U_i$ and each user $U_{is}$ in $S$. $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$, respectively. $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ in descending order of similarity are selected; if there are fewer than $\eta$, all users in $S$ are selected. $\eta$ is a preset value.
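Steps 4.1 and 4.2 can be sketched as below. Formula VI is not reproduced in the text, so the sketch makes a hypothetical assumption: similarity is the cosine of the feature vectors plus $\psi(U_i, U_{is})$, with $\psi = 0$ outside the social network. The `group` and `psi` arguments and the function names are illustrative and do not come from the patent.

```python
import heapq
import math
from typing import Dict, List, Set

Vector = Dict[str, float]


def cosine(f_a: Vector, f_b: Vector) -> float:
    """Cosine of the angle between two sparse word-weight vectors."""
    dot = sum(w * f_b.get(q, 0.0) for q, w in f_a.items())
    norm = (math.sqrt(sum(w * w for w in f_a.values()))
            * math.sqrt(sum(w * w for w in f_b.values())))
    return dot / norm if norm else 0.0


def similar_users(u: str, group: Set[str], psi: Dict[str, float],
                  features: Dict[str, Vector], eta: int) -> List[str]:
    """Steps 4.1-4.2 sketch.

    group is G_g; psi maps each contact in the social network of u to psi(U_i, U_is).
    """
    # Step 4.1: S = G_g merged with the social-network contacts (excluding u itself).
    s = (group | set(psi)) - {u}

    # Step 4.2, hypothetical Formula VI: cosine similarity plus psi, with psi = 0
    # for users outside the social network.
    def sim(v: str) -> float:
        return cosine(features[u], features[v]) + psi.get(v, 0.0)

    # nlargest returns every candidate when S has fewer than eta users.
    return heapq.nlargest(eta, sorted(s), key=sim)


features = {"u": {"a": 1.0}, "v": {"a": 1.0}, "w": {"b": 1.0},
            "x": {"a": 1.0, "b": 1.0}}
print(similar_users("u", {"u", "v", "w"}, {"x": 0.5}, features, eta=2))  # ['x', 'v']
```

`heapq.nlargest` covers the "fewer than $\eta$" case directly. Sorting the candidates first only makes tie-breaking deterministic.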
5. The method for filtering search results in a mobile environment according to claim 4, characterized in that in step 4.1 the group $G_g$ to which the user belongs is obtained as follows:
Step 5.1: users are divided according to how frequently their historical positions change. A user's location history records historical positions $L$ and the corresponding times $T$. $L$ is stored in the data set as latitude and longitude, and $T$ is stored as a time point. Given the latitude and longitude of two adjacent historical positions, their distance is computed with the latitude/longitude distance formula.
For each user $U_i$, the cumulative historical position change frequency $F_{ij}$ over the most recent period $\Delta T$ is computed according to Formula VII:
where $(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ are the historical positions and times of user $U_i$ within the most recent period $\Delta T$. $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two adjacent historical positions and their times. $\mathrm{Dis}(L_k, L_{k-1})$ and $T_k - T_{k-1}$ are, respectively, the distance and the time difference between the two adjacent positions. $M$ is the number of historical positions of the current user, and $k$ is the index of a historical position.
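Step 5.1 can be sketched as below. Two assumptions are involved, because Formula VII and the "latitude/longitude distance formula" are not reproduced in the text. First, the sketch uses the haversine great-circle distance on a spherical Earth of radius 6371 km. Second, it takes $F$ as the sum of $\mathrm{Dis}(L_k, L_{k-1}) / (T_k - T_{k-1})$ over $k = 2, \ldots, M$. Both choices are hypothetical, as are the function names.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def change_frequency(history: List[Tuple[LatLon, float]]) -> float:
    """Hypothetical Formula VII: sum of Dis(L_k, L_{k-1}) / (T_k - T_{k-1}), k = 2..M.

    history is [(L_1, T_1), ..., (L_M, T_M)] in time order, T in hours (assumed).
    """
    total = 0.0
    for (l_prev, t_prev), (l_cur, t_cur) in zip(history, history[1:]):
        dt = t_cur - t_prev
        if dt > 0:  # skip repeated timestamps instead of dividing by zero
            total += haversine_km(l_cur, l_prev) / dt
    return total


history = [((0.0, 0.0), 0.0), ((0.0, 1.0), 2.0), ((0.0, 1.0), 3.0)]
print(round(change_frequency(history), 2))  # 55.6
```

In this example the user moves about 111 km (one degree of longitude at the equator) over 2 hours, then stays put. That gives roughly 55.6 km/h of accumulated change.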
Step 5.2: the cumulative change frequencies $F$ of all users are collected to obtain the overall range $\Omega$ of $F$. $\Omega$ is divided into sub-intervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where $n$ is the number of user groups, and each sub-interval characterizes a different user group by $F$. Each user is assigned to the sub-interval that contains its $F$, which divides the users into the groups $\Omega_1, \Omega_2, \ldots, \Omega_n$.
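The claim does not say how $\Omega$ is split. The sketch of step 5.2 below therefore assumes $n$ equal-width sub-intervals over $[\min F, \max F]$; `segment_by_frequency` is a hypothetical name.

```python
from typing import Dict, List


def segment_by_frequency(freq: Dict[str, float], n: int) -> List[List[str]]:
    """Step 5.2 sketch: bucket users into n sub-intervals of the range of F.

    Equal-width sub-intervals are an assumption; the claim only says the overall
    range is divided into n sub-intervals.
    """
    lo, hi = min(freq.values()), max(freq.values())
    width = (hi - lo) / n
    groups: List[List[str]] = [[] for _ in range(n)]
    for user, f in sorted(freq.items()):
        # Clamp so the maximum value lands in the last sub-interval, not index n.
        idx = 0 if width == 0 else min(int((f - lo) / width), n - 1)
        groups[idx].append(user)
    return groups


print(segment_by_frequency({"a": 0.0, "b": 1.0, "c": 5.0, "d": 10.0}, n=2))
# [['a', 'b'], ['c', 'd']]
```

The intervals are half-open, so a value exactly on a boundary (here 5.0) falls into the higher group. Quantile-based boundaries would be an equally plausible reading if groups should be balanced in size.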
Step 5.3: the users in each $\Omega_j$ are clustered by historical position, so that users whose positions are close together fall into the same cluster. This further divides the users into smaller groups $G_1, G_2, \ldots, G_m$. Here $j = 1, 2, \ldots, n$ is the group index.
6. The method for filtering search results in a mobile environment according to claim 5, characterized in that step 5.3 uses the k-means clustering algorithm to cluster the users in each $\Omega_j$, as follows:
(b1) First compute, for each user $U_i$, the center $O_i$ of its historical positions within the most recent period $\Delta T$; users are clustered by their centers $O_i$. $i$ is the user index.
(b2) Randomly select $k$ users from $\Omega_j$. Each selected user $U_q$ represents an initial user cluster $C_q$, and its center $O_q$ is that cluster's initial center, $q = 1, 2, \ldots, k$.
(b3) For each remaining user in $\Omega_j$, compute its distance to the center $O_q$ of each user cluster $C_q$ and assign it to the nearest cluster.
(b4) Recompute the new center $O_q$ of each cluster to replace the old one, then compute the criterion function $E_j$ by Formula VIII. If $E_j$ has converged, clustering ends; otherwise, return to step (b3). In Formula VIII, $\mathrm{Dis}(U, C_q)$ is the distance between a user in $\Omega_j$ and the center $O_q$ of user cluster $C_q$.
(b5) Clustering yields compact user clusters. Building on the division into $\Omega_1, \Omega_2, \ldots, \Omega_n$, the users are thus further divided into smaller groups $G_1, G_2, \ldots, G_m$, which achieves user segmentation.
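Steps (b1) to (b5) follow standard k-means, sketched below for the users of one $\Omega_j$. The sketch makes three simplifying assumptions. It uses squared planar Euclidean distance on (latitude, longitude) pairs. It takes Formula VIII, which is not reproduced in the text, to be the usual sum of squared distances to the cluster centers. It declares convergence when $E_j$ stops changing. The names `centre`, `sq_dist` and `kmeans_users` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centre(points: List[Point]) -> Point:
    """Arithmetic mean of a list of points, usable for O_i in (b1) and O_q in (b4)."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_users(centres: Dict[str, Point], k: int, seed: int = 0,
                 tol: float = 1e-9, max_iter: int = 100) -> List[List[str]]:
    """(b2)-(b5) sketch over the users of one Omega_j; centres holds each O_i from (b1)."""
    users = sorted(centres)
    rng = random.Random(seed)
    # (b2) k randomly chosen users seed the initial clusters C_q with centres O_q.
    cluster_centres = [centres[u] for u in rng.sample(users, k)]
    prev_e = float("inf")
    clusters: List[List[str]] = []
    for _ in range(max_iter):
        # (b3) assign every user to the cluster with the nearest centre O_q.
        clusters = [[] for _ in range(k)]
        for u in users:
            q = min(range(k), key=lambda i: sq_dist(centres[u], cluster_centres[i]))
            clusters[q].append(u)
        # (b4) recompute each O_q (an emptied cluster keeps its old centre).
        cluster_centres = [centre([centres[u] for u in c]) if c else cluster_centres[i]
                           for i, c in enumerate(clusters)]
        # Assumed Formula VIII: E_j = sum of squared distances to the cluster centres.
        e = sum(sq_dist(centres[u], cluster_centres[i])
                for i, c in enumerate(clusters) for u in c)
        if abs(prev_e - e) < tol:  # E_j has converged
            break
        prev_e = e
    return clusters


centres = {"a": (0.0, 0.0), "b": (0.0, 0.1), "c": (10.0, 10.0), "d": (10.0, 10.1)}
print(sorted(sorted(c) for c in kmeans_users(centres, k=2)))  # [['a', 'b'], ['c', 'd']]
```

For users spread over a wide area, a geodesic distance such as haversine would be more accurate than planar distance on raw degrees.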
7. The method for filtering search results in a mobile environment according to claim 4, characterized in that in step 4.1 the user social network is constructed as follows:
Step 7.1: the term frequency/inverse document frequency (TF/IDF) model is used to compute, for each user $U_i$, the weight of each dimension of its feature vector. For each word $q_a$ among $q_1, q_2, \ldots, q_d$: if the word does not appear in the user's historical query records, its weight $v_a$ is 0; otherwise, $v_a$ is its TF/IDF value. TF is the term frequency. IDF is the inverse document frequency: count the number $D$ of users whose historical query records contain the word, and IDF is $\log(N/D)$, where $N$ is the total number of users. The TF/IDF value is the product of TF and IDF.
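Step 7.1 can be sketched as follows, treating each user's query history as one "document". TF is assumed to be the raw count of the word in that history, since the claim only says "term frequency". IDF is $\log(N/D)$ as stated. `tfidf_vectors` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(queries: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Step 7.1 sketch: queries maps each user to the words of their query history in ΔT.

    Assumes TF is the raw count of the word in the user's history. Words the user
    never queried are omitted, which is the sparse form of v_a = 0.
    """
    n_users = len(queries)
    # D for each word: the number of users whose history contains it.
    doc_freq = Counter(word for words in queries.values() for word in set(words))
    vectors: Dict[str, Dict[str, float]] = {}
    for user, words in queries.items():
        tf = Counter(words)
        vectors[user] = {w: tf[w] * math.log(n_users / doc_freq[w]) for w in tf}
    return vectors


queries = {"u1": ["hotel", "hotel", "map"], "u2": ["map"], "u3": ["stock"]}
v = tfidf_vectors(queries)
print(round(v["u1"]["hotel"], 3))  # 2.197, i.e. 2 * log(3)
```

A word queried by every user gets $\log(N/N) = 0$. Ubiquitous terms therefore carry no weight in the similarity computations.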
Step 7.2: for each user $U_i$, analyze its call records within the most recent period $\Delta T$. For each user $u_x$ with whom $U_i$ has call records, determine their total number of calls $\alpha$ within $\Delta T$, their total call duration $\beta$, and their call regularity $\gamma$. Then compute the degree of relationship $\psi_{ix}$ between $U_i$ and $u_x$ with Formula IX:
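$\psi_{ix} = \lambda_1 \alpha + \lambda_2 \beta + \lambda_3 \gamma$ (Formula IX)
where $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$, and $\lambda_1 + \lambda_2 + \lambda_3 = 1$.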
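$\Delta t_h = t_h - t_{h-1}, \quad h = 2, 3, \ldots, \alpha$
that is, the interval between consecutive calls, where $t_h$ is the time of the $h$-th call.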
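Formula IX and the intervals $\Delta t_h$ can be sketched as below. The formula that turns $\Delta t_h$ into the regularity $\gamma$ is not reproduced in the text. `hypothetical_regularity` is therefore an explicitly made-up placeholder: one over one plus the coefficient of variation of the intervals. It can be swapped out through the `regularity` argument. The time units, the default $\lambda$ values, and all function names are likewise assumptions.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Call = Tuple[float, float]  # (start time t_h, duration), both in hours (assumed units)


def call_intervals(calls: Sequence[Call]) -> List[float]:
    """Delta t_h = t_h - t_{h-1}, h = 2..alpha, over the calls sorted by time."""
    t = sorted(start for start, _ in calls)
    return [t[h] - t[h - 1] for h in range(1, len(t))]


def hypothetical_regularity(intervals: Sequence[float]) -> float:
    """Placeholder for gamma, NOT the patent's formula: 1 / (1 + coefficient of
    variation) of the call intervals, so perfectly periodic calls score 1."""
    if len(intervals) < 2 or mean(intervals) == 0:
        return 0.0
    return 1.0 / (1.0 + pstdev(intervals) / mean(intervals))


def relationship_degree(
        calls: Sequence[Call],
        lambdas: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
        regularity: Callable[[Sequence[float]], float] = hypothetical_regularity,
) -> float:
    """Formula IX: psi_ix = lambda_1 * alpha + lambda_2 * beta + lambda_3 * gamma."""
    l1, l2, l3 = lambdas
    if not all(0.0 <= lam <= 1.0 for lam in lambdas) or abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("each lambda must lie in [0, 1] and they must sum to 1")
    alpha = len(calls)               # total number of calls in the window
    beta = sum(d for _, d in calls)  # total call duration
    gamma = regularity(call_intervals(calls))
    return l1 * alpha + l2 * beta + l3 * gamma


calls = [(0.0, 0.5), (24.0, 0.25), (48.0, 0.25)]  # one call a day
print(round(relationship_degree(calls, lambdas=(0.5, 0.3, 0.2)), 6))  # 2.0
```

As written, Formula IX adds a call count to a duration without normalizing either. In practice $\alpha$ and $\beta$ would likely need rescaling to comparable ranges before weighting, but the claim does not specify how.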
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110458155 CN102591966B (en) | 2011-12-31 | 2011-12-31 | Filtering method of search results in mobile environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102591966A (en) | 2012-07-18 |
CN102591966B (en) | 2013-12-18 |
Family ID: 46480604
Family Applications (1)
Application Number | Title | Priority Date | Filing Date | Status |
---|---|---|---|---|
CN 201110458155 (CN102591966B) | Filtering method of search results in mobile environment | 2011-12-31 | 2011-12-31 | Expired - Fee Related |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102591966B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102867031A (en) * | 2012-08-27 | 2013-01-09 | 百度在线网络技术(北京)有限公司 | Method and system for optimizing point of interest (POI) searching results, mobile terminal and server |
WO2014101846A1 (en) * | 2012-12-28 | 2014-07-03 | Huawei Technologies Co., Ltd. | Predictive caching in a distributed communication system |
CN104317900A (en) * | 2014-10-24 | 2015-01-28 | 重庆邮电大学 | Multiattribute collaborative filtering recommendation method oriented to social network |
CN104462239A (en) * | 2014-11-18 | 2015-03-25 | 电信科学技术第十研究所 | Customer relation discovery method based on data vectorization spatial analysis |
CN104866474A (en) * | 2014-02-20 | 2015-08-26 | 阿里巴巴集团控股有限公司 | Personalized data searching method and device |
CN105243135A (en) * | 2015-09-30 | 2016-01-13 | 百度在线网络技术(北京)有限公司 | Method and apparatus for showing search result |
CN106570699A (en) * | 2015-10-08 | 2017-04-19 | 平安科技(深圳)有限公司 | Client contact information excavation method and server |
CN111212381A (en) * | 2019-12-18 | 2020-05-29 | 中通服建设有限公司 | Mobile user behavior data analysis method and device, computer equipment and medium |
CN113220969A (en) * | 2020-02-06 | 2021-08-06 | 百度在线网络技术(北京)有限公司 | Advertisement determination method, device, equipment and storage medium |
CN113704604A (en) * | 2021-08-24 | 2021-11-26 | 山东库睿科技有限公司 | Search system and search method |
CN113792180A (en) * | 2021-08-30 | 2021-12-14 | 北京百度网讯科技有限公司 | Duplicate removal method and device in recommendation scene, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1903460A1 (en) * | 2006-09-21 | 2008-03-26 | Sony Corporation | Information processing |
CN101819572A (en) * | 2009-09-15 | 2010-09-01 | 电子科技大学 | Method for establishing user interest model |
CN101923545A (en) * | 2009-06-15 | 2010-12-22 | 北京百分通联传媒技术有限公司 | Method for recommending personalized information |
CN102236646A (en) * | 2010-04-20 | 2011-11-09 | 得利在线信息技术(北京)有限公司 | Personalized item-level vertical pagerank algorithm iRank |
2011-12-31: application CN 201110458155 filed (CN102591966B), not active, Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
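Wang Xiuping et al.: "Design and Implementation of a Personalized Learning Recommendation System" (个性化学习推荐系统的设计与实现), Microcomputer Applications (《微型电脑应用》) * |
Hu Juanli et al.: "Personalized Text Information Filtering Based on Typical Feedback" (基于典型反馈的个性化文本信息过滤), Journal of Computer Applications (《计算机应用》) * |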
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102867031A (en) * | 2012-08-27 | 2013-01-09 | 百度在线网络技术(北京)有限公司 | Method and system for optimizing point of interest (POI) searching results, mobile terminal and server |
WO2014101846A1 (en) * | 2012-12-28 | 2014-07-03 | Huawei Technologies Co., Ltd. | Predictive caching in a distributed communication system |
CN104866474A (en) * | 2014-02-20 | 2015-08-26 | 阿里巴巴集团控股有限公司 | Personalized data searching method and device |
CN104866474B (en) * | 2014-02-20 | 2018-10-09 | 阿里巴巴集团控股有限公司 | Individuation data searching method and device |
CN104317900A (en) * | 2014-10-24 | 2015-01-28 | 重庆邮电大学 | Multiattribute collaborative filtering recommendation method oriented to social network |
CN104462239B (en) * | 2014-11-18 | 2017-08-25 | 电信科学技术第十研究所 | A kind of customer relationship based on data vector spatial analysis finds method |
CN104462239A (en) * | 2014-11-18 | 2015-03-25 | 电信科学技术第十研究所 | Customer relation discovery method based on data vectorization spatial analysis |
CN105243135B (en) * | 2015-09-30 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | Show the method and device of search result |
CN105243135A (en) * | 2015-09-30 | 2016-01-13 | 百度在线网络技术(北京)有限公司 | Method and apparatus for showing search result |
CN106570699A (en) * | 2015-10-08 | 2017-04-19 | 平安科技(深圳)有限公司 | Client contact information excavation method and server |
CN111212381A (en) * | 2019-12-18 | 2020-05-29 | 中通服建设有限公司 | Mobile user behavior data analysis method and device, computer equipment and medium |
CN113220969A (en) * | 2020-02-06 | 2021-08-06 | 百度在线网络技术(北京)有限公司 | Advertisement determination method, device, equipment and storage medium |
CN113704604A (en) * | 2021-08-24 | 2021-11-26 | 山东库睿科技有限公司 | Search system and search method |
CN113792180A (en) * | 2021-08-30 | 2021-12-14 | 北京百度网讯科技有限公司 | Duplicate removal method and device in recommendation scene, electronic equipment and storage medium |
CN113792180B (en) * | 2021-08-30 | 2024-02-23 | 北京百度网讯科技有限公司 | Method and device for removing duplicate in recommended scene, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102591966B (en) | 2013-12-18 |
Similar Documents
Publication | Title |
---|---|
CN102591966B (en) | Filtering method of search results in mobile environment | |
CN102654860B (en) | Personalized music recommendation method and system | |
CN103473291B (en) | Personalized service recommendation system and method based on latent semantic probability models | |
CN103678647B (en) | A kind of method and system for realizing information recommendation | |
CN108595461B (en) | Interest exploration method, storage medium, electronic device and system | |
CN108737856B (en) | Social relation perception IPTV user behavior modeling and program recommendation method | |
CN106649657A (en) | Recommended system and method with facing social network for context awareness based on tensor decomposition | |
CN101770520A (en) | User interest modeling method based on user browsing behavior | |
CN103425763B (en) | User based on SNS recommends method and device | |
CN102663047B (en) | Method and device for mining social relationship during mobile reading | |
CN106375369A (en) | Mobile Web service recommendation method and collaborative recommendation system based on user behavior analysis | |
CN101192235A (en) | Method, system and equipment for delivering advertisement based on user feature | |
CN103854065A (en) | Customer loss prediction method and device | |
CN105608121B (en) | Personalized recommendation method and device | |
CN101127046A (en) | Method and system for sequencing to blog article | |
CN109902235A (en) | User preference based on bat optimization clusters Collaborative Filtering Recommendation Algorithm | |
CN105718576A (en) | Individual position recommending system related to geographical features | |
CN107679101A (en) | It is a kind of that method is recommended based on the network service of position and trusting relationship | |
CN113961712B (en) | Knowledge-graph-based fraud telephone analysis method | |
CN109359868A (en) | A kind of construction method and system of power grid user portrait | |
CN106779946A (en) | A kind of film recommends method and device | |
CN108521586A (en) | The IPTV TV program personalizations for taking into account time context and implicit feedback recommend method | |
EP2652909A1 (en) | Method and system for carrying out predictive analysis relating to nodes of a communication network | |
CN110852224B (en) | Expression recognition method and related device | |
CN107368499A (en) | A kind of client's tag modeling and recommendation method and device |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2013-12-18; termination date: 2020-12-31 |