CN102591966A - Filtering method of search results in mobile environment - Google Patents

Filtering method of search results in mobile environment

Info

Publication number
CN102591966A
CN102591966A CN2011104581556A CN201110458155A CN102591966A CN 102591966 A CN102591966 A CN 102591966A CN 2011104581556 A CN2011104581556 A CN 2011104581556A CN 201110458155 A CN201110458155 A CN 201110458155A CN 102591966 A CN102591966 A CN 102591966A
Authority
CN
China
Prior art keywords
user
formula
users
value
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011104581556A
Other languages
Chinese (zh)
Other versions
CN102591966B (en)
Inventor
金海
赵峰
袁平鹏
严奉伟
方飞
谢海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN 201110458155
Publication of CN102591966A
Application granted
Publication of CN102591966B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for filtering search results in a mobile environment. The method comprises: subdividing users into groups according to the characteristics of their historical location information; building a feature model of each user from the user's historical query records; analyzing the users' historical call records to build each user's social network and to compute the strength of the social relationships among users; and, at search time, first filtering the search results by content using the user feature model, then collaboratively filtering them using the user-group information and the mined social network information, and finally returning the results to the user. By mining user characteristics and filtering information in this way, search results can be filtered in a more personalized manner, a large number of irrelevant results can be removed, the result set can be reduced, and personalized precise search in a mobile environment can be achieved.

Description

Method for filtering search results in a mobile scenario
Technical field
The invention belongs to the field of information retrieval and specifically relates to a method for filtering search results in a mobile scenario. The method is suitable for personalized search in mobile scenarios.
Background
Over the past decade or so, search engine technology has developed rapidly. Traditional Internet search has become quite mature in both technology and business model, and has achieved great success. In recent years, emerging technologies and applications represented by the mobile Internet have appeared in large numbers, and mobile search is one of the important applications of the mobile Internet.
Because of the mobility and portability of mobile terminals, and their limits in screen size, processing power and available bandwidth, mobile search cannot simply copy the implementation of existing Internet search. There are two main reasons. (1) A traditional Internet search engine usually returns a large number of results, and in most cases more than half of them are irrelevant to the user. One major reason is that the search engine only performs simple matching on the search keywords and does not consider other information (such as the user's background or personal preferences); together with the rapid growth of information on the Internet, this produces many "junk results". The user has to screen the results personally, which greatly increases the user's burden. In a mobile scenario, given the limits of the terminal's screen and keyboard size, processing power and available bandwidth, this situation is intolerable to the user: first, the many junk results waste valuable data traffic; second, paging through and screening search results on a mobile terminal is very inconvenient. Mobile search must therefore be precise search that returns as few, and as accurate, results as possible. (2) For the same search keyword, a traditional Internet search engine returns identical results to all users. However, users differ in background knowledge, interests and information needs, and the same keyword may express different meanings for different people, in different fields, and at different times and places; what a user needs is often only a very small subset of the search results. The mobility, portability and personal nature of mobile terminals let users obtain information anytime and anywhere, which makes the demand for personalized search stronger. Mobile search is therefore a personalized search that depends on the user's individual characteristics (such as interests) and the user's context (such as time, place and weather).
What mobile search needs to achieve, therefore, is personalized precise search. At present, domestic research on mobile search is still at an early stage, and the techniques used to implement it are still immature compared with existing Internet search technology. Earlier techniques include vertical search, such as mobile music search and novel search. The implementation most often adopted at present combines existing Internet search technology with auxiliary techniques such as information filtering: the user is first modeled, the model is then used to filter the search results in a personalized way, and irrelevant results are filtered out to achieve personalized precise search.
Common techniques for user feature modeling are the vector space model and the ontology model. The vector space model is widely used because its principle is simple and it is relatively easy to implement.
Commonly used information filtering techniques are content-based filtering and collaborative filtering. Content-based filtering extracts features from each result, computes the similarity between the result and the filtering profile (the user model), and filters by a set threshold. Because it analyzes the content of the results, it usually achieves a good filtering effect, but the amount of computation is relatively large. Collaborative filtering is based on the idea that people of the same type usually share the same interests and preferences: a user's search results are filtered collaboratively through users whose interests are similar to the current user's. This technique has been well developed and applied in e-commerce.
Summary of the invention
The purpose of the invention is to provide a method for filtering search results in a mobile scenario. The method mines user data (the user's historical location information, historical call records, and so on) to build a user feature model and a user social network, and applies content-based filtering and collaborative filtering to the search results according to the user feature model and the user social network, respectively. Irrelevant search results are filtered out and personalized precise search in a mobile scenario is achieved, which is valuable for improving the mobile search user experience and user stickiness.
The method for filtering search results in a mobile scenario provided by the invention comprises the following steps:
Step 1: For the initial result set to be filtered $R_1, R_2, \ldots, R_Z$ of user $U_i$, $i = 1, 2, \ldots, N$, build a feature vector for each result in the d-dimensional vector space. The feature vector of $R_r$ is expressed as $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ is the weight on each dimension. The weights $v_a$ of $f_{R_r}$ are computed with the term frequency/inverse document frequency (TF/IDF) model: for each word $q_a$ in $q_1, q_2, \ldots, q_d$, if it does not appear in $R_r$ its weight is 0; otherwise its weight is its TF/IDF value. TF is the number of times the word appears in $R_r$. IDF, the inverse document frequency, is derived from the number $z$ of results that contain the word.
Here the IDF value is $\log(Z/z)$, $Z$ is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$.
Step 2: Find the similar users of the current user $U_i$. They are chosen from the following two user sets: the first is the group $G_g$ to which the user belongs, where $g$ is the index of that group and ranges from 1 to $m$; the second is the set of users in the user's social network. The two sets are merged to obtain the set $S$, whose members are denoted $U_{is}$. The similarity between $U_i$ and each user $U_{is}$ in $S$ is computed by Formula I, using the vector cosine of Formula II: the smaller the angle between the vectors, the larger the cosine and the higher the similarity, and vice versa. $i$ is the user index, $N$ is the number of users, $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$; $\psi(U_i, U_{is})$ is the relationship degree between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if there are fewer than $\eta$, all users in $S$ are selected. $\eta$ is a preset value.
$$\mathrm{sim}(U_i, U_{is}) = (1 + \psi(U_i, U_{is})) \cdot \cos(f_{U_i}, f_{U_{is}}) \qquad \text{(Formula I)}$$
$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\|f_{U_i}\| \cdot \|f_{U_{is}}\|} \qquad \text{(Formula II)}$$
Step 3: Content-based filtering:
For each initial result to be filtered $R_r$, compute its similarity to user $U_i$ by Formula III, where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$. Filter by the preset threshold $\zeta$: initial results whose similarity is below $\zeta$ are filtered out, giving the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$. The intermediate results keep their original order.
$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \qquad \text{(Formula III)}$$
where
$$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\|f_{U_i}\| \cdot \|f_{R_r}\|}$$
Step 4: Apply collaborative filtering to the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$. Using the $\eta$ similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$, compute for each intermediate result $R_r$ the similarity $\mathrm{sim}'(U_i, R_r)$ by Formula IV, in which $\cos(f_{U_{is}}, f_{U_i})$ and $\cos(f_{U_{is}}, f_{R_r})$ are the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$, respectively.
$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \left( \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \right) \qquad \text{(Formula IV)}$$
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad \text{(Formula V)}$$
Collaborative filtering is carried out on $\mathrm{sim}'(U_i, R_r)$ with the preset threshold $\varepsilon$: intermediate results whose similarity is below $\varepsilon$ are filtered out, giving the interim result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where $r$ is the position of the result in the interim result set, taking the values $1, 2, \ldots, Z_\varepsilon$. For each interim result $R_r$, Formula V computes the weighted sum of its position $r$ and $\mathrm{sim}'(U_i, R_r)$ with the preset weighting coefficient $\theta$, and this sum is the final rank $\mathrm{Rank}_r$. The interim result set is reordered by this rank to obtain the final results, which are returned to the user, and the filtering process ends.
The method for filtering search results in a mobile scenario provided by the invention combines data mining methods (classification and clustering), a content-based filtering algorithm and collaborative filtering. Specifically, the invention has the following effects and advantages:
(1) High accuracy. On top of traditional content-based filtering, the invention also analyzes the user's social network information and applies collaborative filtering, which improves accuracy considerably.
(2) Strong adaptability. The invention takes into account the diversity of mobile user groups and individuals, and adapts well to the personalized needs of different user groups and individuals.
(3) High extensibility. Besides mobile search, the filtering method provided by the invention can also be used in other mobile Internet applications such as targeted advertising, and the user feature modeling method can also be applied to customer relationship management (CRM) and similar areas.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the historical position change frequency of mobile users;
Fig. 3 is the flow chart of clustering mobile users by position;
Fig. 4 is the structure diagram of a mobile user's social network;
Fig. 5 is the detailed flow chart of filtering mobile search results.
Detailed description
The invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the method for filtering search results in a mobile scenario provided by the invention begins with a filtering preprocessing stage, which mainly comprises user segmentation, building the user feature model and building the user's social network, corresponding to steps (1) to (3) below. It is followed by the result filtering stage, corresponding to step (4) below. The specific processing steps are as follows:
1. Filtering preprocessing stage, comprising steps (1) to (3).
(1) User segmentation. Data mining methods are used to segment the users. The user data sets provided by existing telecom operators hold a large amount of user data, such as the user's historical location information, historical call records, historical query and browsing records, and historical service data. The invention mainly uses the user's historical location information to segment users, as follows:
(a) Divide users according to their historical position change frequency. The user's historical location information records the user's historical positions $L$ and the corresponding times $T$. A position $L$ is stored in the data set as a longitude/latitude pair, such as (30.2332, 114.3243), and a time $T$ is stored as a point in time. Given the longitude and latitude of two adjacent historical positions of a user, their distance is easily computed with the longitude/latitude distance formula (formula (1)). Let the first position $L_1$ have longitude and latitude $(lon_1, lat_1)$ and the second position $L_2$ have $(lon_2, lat_2)$. Taking the 0° meridian as reference, east longitude is positive and west longitude is negative; north latitude enters the calculation as (90° − lat) and south latitude as (90° + lat). The distance between the two points is then given by formula (1).
$$C = \sin(lat_1) \cdot \sin(lat_2) \cdot \cos(lon_1 - lon_2) + \cos(lat_1) \cdot \cos(lat_2)$$
$$\mathrm{Dis}(L_1, L_2) = R \cdot \arccos(C) \cdot \frac{\pi}{180} \qquad (1)$$
In formula (1), $R$ is the radius of the Earth and $\arccos(C)$ is expressed in degrees.
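The distance computation of formula (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `distance_km`, the signed-degree input convention and the mean Earth radius of 6371 km are not given in the text, which only names $R$.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the text only names R


def distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points given in signed degrees.

    East longitude and north latitude are positive, west and south negative.
    """
    # Colatitude 90 - lat covers both cases of the text:
    # (90 - lat) for north latitude and (90 + |lat|) for south latitude.
    colat1 = math.radians(90.0 - lat1)
    colat2 = math.radians(90.0 - lat2)
    dlon = math.radians(lon1 - lon2)
    c = (math.sin(colat1) * math.sin(colat2) * math.cos(dlon)
         + math.cos(colat1) * math.cos(colat2))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    # math.acos already returns radians, so the pi/180 factor is not applied.
    return EARTH_RADIUS_KM * math.acos(c)
```

Because `math.acos` works in radians, the sketch drops the $\pi/180$ conversion that formula (1) applies to an arccosine expressed in degrees.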
For each user $U_i$ ($i = 1, 2, \ldots, N$), compute the cumulative historical position change frequency $F_i$ over the most recent period $\Delta T$ (for example one month), where $N$ is the number of users.
$$F_i = \frac{1}{\Delta T} \sum_{k=2}^{M} \left| \frac{\mathrm{Dis}(L_k, L_{k-1})}{T_k - T_{k-1}} \right| \qquad (2)$$
In formula (2), $(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ is the historical location information of user $U_i$ within the most recent period $\Delta T$; $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two adjacent historical positions of the user with their times; $\mathrm{Dis}(L_k, L_{k-1})$ and $T_k - T_{k-1}$ are the distance and the time difference between the two adjacent positions. $M$ is the number of historical positions of the current user and $k$ is the index of a historical position.
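Formula (2) can be illustrated with a short Python sketch. The function name `change_frequency`, the representation of a track as a time-sorted list of `(position, time)` pairs, and the choice to skip pairs with identical timestamps are assumptions made for the example; the distance function is passed in so that any implementation of formula (1) can be used.

```python
def change_frequency(track, delta_t, dist):
    """Cumulative position change frequency F_i of one user.

    track   -- list of (position, time) pairs sorted by time
    delta_t -- length of the observation period
    dist    -- function returning the distance between two positions
    """
    total = 0.0
    for (prev_pos, prev_t), (pos, t) in zip(track, track[1:]):
        dt = t - prev_t
        if dt == 0:
            continue  # assumption: ignore records with identical timestamps
        total += abs(dist(pos, prev_pos) / dt)
    return total / delta_t


# Usage with positions on a line and |a - b| as a stand-in distance.
demo_track = [(0.0, 0.0), (10.0, 2.0), (10.0, 4.0), (40.0, 5.0)]
demo_f = change_frequency(demo_track, 10.0, lambda a, b: abs(a - b))
```

Each adjacent pair contributes its speed, so a user who moves far and often within $\Delta T$ gets a high $F_i$.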
The $F$ values of all users are collected to obtain the overall range $\Omega$ of $F$, and $\Omega$ is divided into several sub-intervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where $n$ is the number of user groups. These sub-intervals characterize different user groups by $F$, and each user is assigned to the sub-interval that contains the user's $F$. As shown in Fig. 2, user A has a high $F$ and may be a business person who travels frequently. User B has a low $F$ and probably stays at one fixed position for long periods, for example a college student. In this way the users are given a preliminary division by position change frequency $F$ into the groups $\Omega_1, \Omega_2, \ldots, \Omega_n$. $\Omega$ may be divided into equal parts, or the system may preset a division criterion.
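The equal-division option can be sketched as follows. This is a minimal illustration; the function name `assign_groups` and the zero-based group indices are assumptions, and a system-defined division criterion would replace the equal-width rule.

```python
def assign_groups(freqs, n):
    """Map each F value to a sub-interval index 0..n-1 of equal width."""
    lo, hi = min(freqs), max(freqs)
    if hi == lo:
        return [0] * len(freqs)  # all users share one F value
    width = (hi - lo) / n
    # The maximum F would fall into bin n, so it is clamped into the last bin.
    return [min(int((f - lo) / width), n - 1) for f in freqs]


demo_groups = assign_groups([0.0, 1.0, 2.5, 5.0, 9.9, 10.0], 2)
```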
(b) Next, the users in each $\Omega_j$ ($j = 1, 2, \ldots, n$, where $j$ is the group index) are clustered by historical location information, so that users at neighboring positions fall into the same cluster. Related research shows that users at neighboring geographic locations share similar user characteristics to some extent. The k-means clustering algorithm is applied to the users in each $\Omega_j$, as follows:
(b1) First compute, for each user $U_i$ ($i = 1, 2, \ldots, N$), the center $O_i$ of the user's historical positions within $\Delta T$; users are clustered by $O_i$. $i$ is the user index.
(b2) Randomly pick $k$ users from $\Omega_j$. Each picked user $U_q$ ($q = 1, 2, \ldots, k$) represents an initial user cluster $C_q$, and its $O_q$ is the initial center of that cluster.
(b3) For each remaining user in $\Omega_j$, compute the distance (by the longitude/latitude distance formula) to the center $O_q$ of each user cluster $C_q$ and assign the user to the nearest cluster.
(b4) Then recompute the new center $O_q$ of each user cluster and replace the old one. Compute the criterion function $E_j$ by formula (3); if $E_j$ has converged, the clustering ends, otherwise return to step (b3).
$$E_j = \sum_{q=1}^{k} \sum_{U \in \Omega_j} \mathrm{Dis}(U, C_q), \quad (j = 1, 2, \ldots, n) \qquad (3)$$
In formula (3), $\mathrm{Dis}(U, C_q)$ is the distance between a user in $\Omega_j$ and the center $O_q$ of user cluster $C_q$ ($q = 1, 2, \ldots, k$).
Clustering yields compact user clusters. On top of the division $\Omega_1, \Omega_2, \ldots, \Omega_n$, the users are thus further divided into smaller groups $G_1, G_2, \ldots, G_m$, which completes user segmentation.
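Steps (b2) to (b4) can be sketched in Python as follows. The sketch makes several assumptions that the text does not state: positions are treated as planar points with Euclidean distance (`math.dist`) instead of the longitude/latitude distance, a cluster center is the coordinate-wise mean of its members, the criterion $E$ sums each user's distance to the center of the cluster it was assigned to, and `max_iter`, `tol` and `seed` are illustrative parameters.

```python
import math
import random


def kmeans(points, k, dist=math.dist, max_iter=100, tol=1e-9, seed=0):
    """Cluster 2-D position centers O_i into k user clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # (b2): k random users as initial centers
    clusters = [[] for _ in range(k)]
    prev_e = None
    for _ in range(max_iter):
        # (b3): assign every user to the nearest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda q: dist(p, centers[q]))
            clusters[nearest].append(p)
        # (b4): recompute each center as the mean of its members.
        for q, members in enumerate(clusters):
            if members:
                centers[q] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
        # Criterion E: total distance of users to their own cluster center.
        e = sum(dist(p, centers[q])
                for q, members in enumerate(clusters) for p in members)
        if prev_e is not None and abs(prev_e - e) <= tol:
            break  # E has converged
        prev_e = e
    return centers, clusters


demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_centers, demo_clusters = kmeans(demo_points, 2)
```

Passing a longitude/latitude distance as `dist` would bring the sketch closer to step (b3), at the cost of the coordinate mean being only an approximation of a geographic center.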
(2) Build the user feature model. A user's historical query records characterize the user's interests well. By analyzing these records, the vector space model is used to model each user's features, as follows:
(a) Collect all historical query records of all users within $\Delta T$ and extract the $d$ distinct words $q_1, q_2, \ldots, q_d$ as the $d$ dimensions of the vector space. The feature vector of a user is expressed as $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$ ($i = 1, 2, \ldots, N$), where $v_a$ ($a = 1, 2, \ldots, d$) is the weight of each dimension.
(b) Use the TF/IDF (term frequency/inverse document frequency) model to compute, for each user $U_i$ ($i = 1, 2, \ldots, N$), the weight of each dimension of the feature vector. For each word $q_a$ ($a = 1, 2, \ldots, d$) in $q_1, q_2, \ldots, q_d$, if it does not appear in the user's historical query records its weight $v_a$ is 0; otherwise its weight is its TF/IDF value. TF is the term frequency, here the number of times the word appears in the user's historical query records. IDF, the inverse document frequency, is derived from the number $D$ of users whose historical query records contain the word: the IDF value is $\log(N/D)$, where $N$ is the total number of users. The TF/IDF value is the product of TF and IDF.
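The feature vector construction of steps (a) and (b) can be sketched as follows. The function name `user_feature_vectors`, the input format (a dictionary from user id to a list of already segmented query words) and the use of the natural logarithm are assumptions; the text does not specify the logarithm base or how queries are split into words.

```python
import math
from collections import Counter


def user_feature_vectors(histories):
    """Build TF/IDF feature vectors from per-user query word lists.

    histories -- dict: user id -> list of query words within the period
    Returns (vocabulary, dict: user id -> list of d weights).
    """
    # (a): the d distinct words form the dimensions of the vector space.
    vocab = sorted({w for words in histories.values() for w in words})
    n_users = len(histories)
    # D: number of users whose history contains the word.
    user_count = Counter(w for words in histories.values() for w in set(words))
    vectors = {}
    for user, words in histories.items():
        tf = Counter(words)  # missing words count as 0, giving weight 0
        vectors[user] = [tf[w] * math.log(n_users / user_count[w])
                         for w in vocab]
    return vocab, vectors


demo_vocab, demo_vectors = user_feature_vectors({
    "u1": ["hotel", "hotel", "map"],
    "u2": ["map", "music"],
})
```

A word used by every user, such as "map" in the example, gets IDF $\log(1) = 0$ and therefore does not distinguish users.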
(3) Mine the user's social network information by analyzing the user's historical call records. For each user $U_i$ ($i = 1, 2, \ldots, N$), the social network appears as a star topology centered on that user, as shown in Fig. 4. The central node B represents the user; the surrounding nodes A, C, D, E, F, G and so on represent users who have call records with B; and the weight $\psi$ of an edge represents the relationship degree between the two users. This step mainly estimates the value of $\psi$.
The historical call record data contains the call records between all users, including the ID numbers of both parties, the call start time, the call end time, and so on. For each user $U_i$ ($i = 1, 2, \ldots, N$), the call records within $\Delta T$ are analyzed. For each user $u_x$ ($x = 1, 2, \ldots, e$, where $e$ is the number of users who have call records with $U_i$), the total number of calls $\alpha$, the total call duration $\beta$ and the call regularity $\gamma$ between $u_x$ and $U_i$ within $\Delta T$ are analyzed. Combining these factors gives a rough estimate of the relationship degree $\psi_{ix}$ between $U_i$ and $u_x$.
The total number of calls $\alpha$ and the total call duration $\beta$ are relatively easy to obtain by counting, but both are aggregate statistics and rather one-dimensional: they give only a rough overall estimate of the relationship degree between users and ignore important details, such as how the call events are distributed over time, whether that distribution is uniform, and whether it is uniform overall or only locally. The call regularity $\gamma$ is therefore introduced as a further factor characterizing the relationship degree between $U_i$ and $u_x$. It is obtained by statistically analyzing the time distribution of all call events within $\Delta T$, using the idea of variance, as in formulas (4) to (7): $t_h$ ($h = 1, 2, \ldots, \alpha$) is the start time of each call, $\Delta t_h$ is the time difference between two adjacent calls, $\overline{\Delta t}$ is the mean of these differences, and $S_t$ is their variance. $\gamma$ is inversely proportional to $S_t$, as shown in formula (7): a small variance means that the calls in the period are more regular and $\gamma$ is correspondingly larger, and vice versa.
$$\Delta t_h = t_h - t_{h-1}, \quad (h = 2, 3, \ldots, \alpha) \qquad (4)$$
$$\overline{\Delta t} = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \Delta t_h \qquad (5)$$
$$S_t = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \left( \overline{\Delta t} - \Delta t_h \right)^2 \qquad (6)$$
$$\gamma = \frac{1}{S_t} \qquad (7)$$
The computed $\alpha$, $\beta$ and $\gamma$ are normalized to values in the range 0 to 1. The value of $\psi_{ix}$ ($i = 1, 2, \ldots, N$, $x = 1, 2, \ldots, e$) is computed by formula (8) as a weighted combination of $\alpha$, $\beta$ and $\gamma$. In formula (8), $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$ and $\lambda_1 + \lambda_2 + \lambda_3 = 1$; by default each weight takes the equal value 1/3.
$$\psi_{ix} = \lambda_1 \cdot \alpha + \lambda_2 \cdot \beta + \lambda_3 \cdot \gamma, \quad (\lambda_1 + \lambda_2 + \lambda_3 = 1) \qquad (8)$$
Through the analysis and calculation of this step, the social network information of each user $U_i$ ($i = 1, 2, \ldots, N$) is obtained, including the relationship degree $\psi_{ix}$ between the user and each associated user $u_x$ ($x = 1, 2, \ldots, e$).
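The estimation of $\psi$ from formulas (4) to (8) can be sketched as follows. Two points are assumptions that the text leaves open: normalization is done by min-max scaling across the contacts of one user, and a small constant `eps` is added to the variance so that perfectly regular calls (variance 0) do not cause a division by zero. The function names and the input format are likewise illustrative.

```python
def call_regularity(start_times, eps=1e-9):
    """gamma = 1 / S_t for the sorted call start times t_1..t_alpha."""
    gaps = [b - a for a, b in zip(start_times, start_times[1:])]  # formula (4)
    if not gaps:
        return 0.0  # assumption: a single call carries no regularity signal
    mean_gap = sum(gaps) / len(gaps)                               # formula (5)
    variance = sum((mean_gap - g) ** 2 for g in gaps) / len(gaps)  # formula (6)
    return 1.0 / (variance + eps)                                  # formula (7)


def min_max(values):
    """Scale values to [0, 1]; identical values all map to 0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def relationship_degrees(contacts, weights=(1 / 3, 1 / 3, 1 / 3)):
    """contacts: dict contact id -> list of (start, end) calls sorted by start."""
    ids = list(contacts)
    alpha = min_max([len(contacts[c]) for c in ids])
    beta = min_max([sum(end - start for start, end in contacts[c]) for c in ids])
    gamma = min_max([call_regularity([s for s, _ in contacts[c]]) for c in ids])
    l1, l2, l3 = weights
    # Formula (8): weighted combination of the normalized factors.
    return {c: l1 * alpha[i] + l2 * beta[i] + l3 * gamma[i]
            for i, c in enumerate(ids)}


demo_psi = relationship_degrees({
    "A": [(0, 10), (100, 110), (200, 210), (300, 310)],  # frequent and regular
    "C": [(0, 5), (50, 55), (300, 305)],                 # fewer, irregular calls
})
```

In the example, contact A calls more often, longer and at even intervals, so it receives the higher relationship degree.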
(4) Search result filtering. Steps (1) to (3) above are all preparation for the search result filtering of this step. The user feature model built in step (2) is used for content-based filtering of the search results, and the user segmentation of step (1) together with the user social network information mined in step (3) is used for collaborative filtering of the search results.
In this step the search results are first filtered by content and then filtered collaboratively, so as to personalize and reduce the search results.
User $U_i$ ($i = 1, 2, \ldots, N$) submits a search $Q$. The search request is first processed by an existing Internet search engine, which returns an initial result set for $Q$. This set is usually large, so the first $\varphi$ results are selected for filtering; if there are fewer than $\varphi$ results, the whole initial result set is selected. The selected results form the result set to be filtered $R_1, R_2, \ldots, R_Z$. $\varphi$ is an empirical value preset by the system, for example 300, and $Z$ is the number of results to be filtered. The filtering flow is shown in Fig. 5, and the steps are as follows:
(a) Build feature vectors for the result set to be filtered $R_1, R_2, \ldots, R_Z$, using the d-dimensional vector space established in step (2). The feature vector of $R_r$ ($r = 1, 2, \ldots, Z$) is expressed as $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ ($a = 1, 2, \ldots, d$) is the weight of each dimension. The same TF/IDF (term frequency/inverse document frequency) model as in step (2) is used to compute the weights $v_a$ of $f_{R_r}$: for each word $q_a$ ($a = 1, 2, \ldots, d$) in $q_1, q_2, \ldots, q_d$, if it does not appear in $R_r$ its weight is 0; otherwise its weight is its TF/IDF value. TF is the number of times the word appears in $R_r$. IDF, the inverse document frequency, is derived from the number $z$ of results that contain the word: the IDF value is $\log(Z/z)$, where $Z$ is the total number of results. The TF/IDF value is the product of TF and IDF.
(b) Next, find the similar users of the current user $U_i$ ($i = 1, 2, \ldots, N$). They are chosen from two user sets: the first is the group $G_g$ from step (1) to which the user belongs, where $g$ is the index of that group and ranges from 1 to $m$; the second is the set of users in the user's social network built in step (3). The two sets are merged (they may contain duplicate users) to obtain the set $S$, from which several similar users are chosen.
$$\mathrm{sim}(U_i, U_{is}) = (1 + \psi(U_i, U_{is})) \cdot \cos(f_{U_i}, f_{U_{is}}) \qquad (9)$$
$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\|f_{U_i}\| \cdot \|f_{U_{is}}\|} \qquad (10)$$
In formula (10), $\|\cdot\|$ denotes the norm of a vector.
The similarity between $U_i$ and each user $U_{is}$ in $S$ is computed by formula (9), using the vector cosine of formula (10): the smaller the angle between the vectors, the larger the cosine and the higher the similarity, and vice versa. $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$, and $\psi(U_i, U_{is})$ is the relationship degree between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if there are fewer than $\eta$, all users in $S$ are selected. $\eta$ is an empirical value preset by the system; its default value may be 10.
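The selection of similar users by formulas (9) and (10) can be sketched as follows. The function names, the plain-list vector representation and the convention that a zero vector has cosine 0 with everything are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Vector cosine of formula (10); defined as 0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_similar_users(f_user, candidates, psi, eta=10):
    """Return the ids of the eta most similar users in the merged set S.

    candidates -- dict: user id -> feature vector (the set S)
    psi        -- dict: user id -> relationship degree; users outside the
                  social network are absent and count as 0
    """
    scores = {uid: (1.0 + psi.get(uid, 0.0)) * cosine(f_user, vec)  # formula (9)
              for uid, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:eta]


demo_similar = top_similar_users(
    [1.0, 0.0],
    {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]},
    psi={"b": 1.0},
    eta=2,
)
```

In the example, user "b" has a lower cosine than "a" but a strong social tie, and the factor $(1 + \psi)$ moves "b" ahead of "a".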
(c) Result filtering then begins. The filtering process has two stages, the content-based filtering stage and the collaborative filtering stage:
(c1) Content-based filtering comes first. For each initial result to be filtered $R_r$ ($r = 1, 2, \ldots, Z$) from (a), compute its similarity to user $U_i$ ($i = 1, 2, \ldots, N$). As before, the vector cosine of formula (10) is used, as shown in formula (11), where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$. Filter by the threshold $\zeta$: results whose similarity is below $\zeta$ are filtered out, giving the intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$). The intermediate results keep their original order. The threshold $\zeta$ is an empirical value preset by the system, with $0 \le \zeta \le 1$; its default value may be set to 0.65.
$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \qquad (11)$$
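The content-based stage of formula (11) amounts to a threshold filter that preserves order, which can be sketched as follows. The function names and the representation of results as `(id, feature vector)` pairs are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Vector cosine; defined as 0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_filter(f_user, results, zeta=0.65):
    """Keep results whose similarity to the user is at least zeta.

    results -- list of (result id, feature vector) in the original order
    """
    return [(rid, vec) for rid, vec in results
            if cosine(f_user, vec) >= zeta]


demo_kept = content_filter(
    [1.0, 0.0],
    [("r1", [1.0, 0.0]), ("r2", [0.0, 1.0]), ("r3", [1.0, 1.0])],
)
```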
(c2) Next, collaborative filtering is applied to the intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$). Collaborative filtering is based on the idea that similar users usually have similar interests, so the users similar to the current user are used to make collaborative recommendations for the current user. Using the $\eta$ similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$ ($i = 1, 2, \ldots, N$) obtained in step (b), the similarity $\mathrm{sim}'(U_i, R_r)$ is computed for each intermediate result $R_r$ by formula (12). In this formula, which uses the vector cosine of formula (10), $\cos(f_{U_{is}}, f_{U_i})$ and $\cos(f_{U_{is}}, f_{R_r})$ are the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$, respectively.
$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \left( \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \right) \qquad (12)$$
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad (13)$$
Collaborative filtering is carried out on $\mathrm{sim}'(U_i, R_r)$ with the threshold $\varepsilon$: results whose similarity is below $\varepsilon$ are filtered out, giving the interim result set $R_r$ ($r = 1, 2, \ldots, Z_\varepsilon$), where $r$ is the position of the result in the interim result set, taking the values $1, 2, \ldots, Z_\varepsilon$. For each $R_r$, the weighted sum of its position $r$ and $\mathrm{sim}'(U_i, R_r)$ is computed with the weighting coefficient $\theta$, as shown in formula (13), and this sum is the final rank $\mathrm{Rank}_r$. The results $R_r$ ($r = 1, 2, \ldots, Z_\varepsilon$) are reordered by this rank to obtain the final results, which are returned to the user, and the filtering process ends. The threshold $\varepsilon$ and the weighting coefficient $\theta$ are empirical values preset by the system, with $0 \le \varepsilon \le 1$ and $0 \le \theta \le 1$; the default value of $\varepsilon$ may be set to 0.85 and that of $\theta$ to 0.5.
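The collaborative stage of formulas (12) and (13) can be sketched as follows. The function names and data layout are assumptions. The text does not say whether a larger or a smaller $\mathrm{Rank}_r$ should come first, so the sketch exposes the sort direction as the hypothetical parameter `descending` instead of deciding it.

```python
import math


def cosine(u, v):
    """Vector cosine; defined as 0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def collaborative_rank(f_user, similar_users, results,
                       epsilon=0.85, theta=0.5, descending=False):
    """Filter intermediate results by sim' and reorder them by Rank_r.

    similar_users -- feature vectors of the eta similar users
    results       -- list of (result id, feature vector) in current order
    Returns a list of (result id, Rank_r).
    """
    kept = []
    for rid, f_res in results:
        # Formula (12): sum over the similar users.
        score = sum(cosine(f_s, f_user) * cosine(f_s, f_res)
                    for f_s in similar_users)
        if score >= epsilon:
            kept.append((rid, score))
    # Formula (13): r is the 1-based position in the interim result set.
    ranked = [(rid, theta * r + (1.0 - theta) * score)
              for r, (rid, score) in enumerate(kept, start=1)]
    return sorted(ranked, key=lambda item: item[1], reverse=descending)


demo_ranked = collaborative_rank(
    [1.0, 0.0],
    [[1.0, 0.0], [1.0, 1.0]],
    [("r1", [1.0, 0.0]), ("r2", [0.0, 1.0]), ("r3", [1.0, 1.0])],
)
```

Because formula (12) sums over $\eta$ users, $\mathrm{sim}'$ can exceed 1, so a threshold $\varepsilon$ in $[0, 1]$ becomes easier to pass as $\eta$ grows.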
The invention is not limited to the embodiment described above. Based on the content disclosed here, a person skilled in the art can implement the invention in various other embodiments; therefore, any design that adopts the structure and ideas of the invention with simple changes or modifications falls within the scope of protection of the invention.

Claims (7)

1. the Search Results filter method under the mobile scene, this method comprises the steps:
The 1st step is to user U i, i=1,2 ..., the initial results collection R to be filtered of N 1, R 2..., R Z, utilize the d gt to treat filter result and set up proper vector, R rProper vector be expressed as f Rr={ q 1, v 1), (q 2, v 2) ..., (q d, v d), v aRepresent the weights on each dimension; Utilize word frequency/contrary document frequency TF/IDF Model Calculation f Rr, the weights v on each dimension a, to q 1, q 2... q dIn each speech q aIf it does not appear at R r, in, then its weights are 0, otherwise are its TF/IDF value, TF is that it is at R rThe middle number of times that occurs, the promptly contrary document frequency of IDF is added up the z of number as a result that those comprise this speech;
Wherein, the IDF value is log (Z/z), and Z is the number of initial results to be filtered, and the TF/IDF value is the product of TF and IDF, r=1, and 2 ..., Z, a=1,2 ..., d;
Step 2: find the similar users of the current user $U_i$. Candidates are drawn from two user sets: the first is the group $G_g$ to which the user belongs, where $g$ is the index of that group and ranges from 1 to $m$; the second is the set of users in the user's social network. The two sets are merged to obtain a set $S$, whose users are denoted $U_{is}$. The similarity between $U_i$ and each user $U_{is}$ in $S$ is computed by formula I, using the vector cosine defined in formula II; the smaller the angle between the vectors, the larger the cosine and the greater the similarity, and vice versa. Here $i$ is the user index, $N$ is the number of users, $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$ respectively; and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if fewer than $\eta$ users are available, all users in $S$ are selected; $\eta$ is a preset value;
$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}}) \qquad \text{(Formula I)}$$
$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\|f_{U_i}\| \cdot \|f_{U_{is}}\|} \qquad \text{(Formula II)}$$
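Formulas I and II and the top-$\eta$ selection can be sketched as follows. This is an illustrative sketch, not the patented implementation: the names `cosine` and `top_similar_users` are hypothetical, feature vectors are plain lists of weights, and returning 0 for a zero-length vector is an assumption the claim does not address.

```python
import math

def cosine(u, v):
    """Formula II: cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # zero vector: treated as no similarity (assumption)

def top_similar_users(f_ui, candidates, psi, eta):
    """candidates: dict user_id -> feature vector for every user in the merged set S.
    psi: dict user_id -> relationship degree; users outside the social network are absent.
    Returns up to eta user ids, most similar first."""
    scored = [((1.0 + psi.get(uid, 0.0)) * cosine(f_ui, vec), uid)  # Formula I
              for uid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored[:eta]]
```

Because slicing stops at the end of the list, the case of fewer than $\eta$ candidates returns all users in $S$, as the step requires.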
Step 3: content-based filtering:
For each initial result to be filtered $R_r$, compute in turn its similarity to user $U_i$ by formula III, where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$ respectively. The results are filtered against a preset threshold $\zeta$: initial results whose similarity is less than $\zeta$ are filtered out, giving the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$; the intermediate results keep their original order;
$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \qquad \text{(Formula III)}$$
wherein $$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\|f_{U_i}\| \cdot \|f_{R_r}\|}$$
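A minimal sketch of the threshold test in step 3 is given below. The helper names are hypothetical, and the user and result vectors are assumed to share the same $d$ dimensions in the same order.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two weight vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def content_filter(f_ui, result_vectors, zeta):
    """Return the indices of the results whose similarity to the user (Formula III)
    is at least zeta. Indices come back in the original result order."""
    return [r for r, f_rr in enumerate(result_vectors)
            if cosine(f_ui, f_rr) >= zeta]
```

Iterating with `enumerate` and never sorting is what preserves the original ordering of the surviving results.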
Step 4: perform collaborative filtering on the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$. Using the $\eta$ similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$, compute for each intermediate result $R_r$ the similarity $\mathrm{sim}'(U_i, R_r)$ by formula IV, in which $\cos(f_{U_{is}}, f_{U_i})$ and $\cos(f_{U_{is}}, f_{R_r})$ are the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$, respectively;
$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \qquad \text{(Formula IV)}$$
$$\mathrm{Rank}_r = \theta\, r + (1 - \theta)\, \mathrm{sim}'(U_i, R_r) \qquad \text{(Formula V)}$$
Collaborative filtering is performed on $\mathrm{sim}'(U_i, R_r)$ with a preset threshold $\varepsilon$: intermediate results whose similarity is less than $\varepsilon$ are filtered out, giving the interim result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where $r$ is the position of the result in the interim result set, taking the values $1, 2, \ldots, Z_\varepsilon$ in order. For each interim result $R_r$, formula V is used with a preset weighting coefficient $\theta$ to compute the weighted sum of its position $r$ and $\mathrm{sim}'(U_i, R_r)$ as its final rank $\mathrm{Rank}_r$; the interim result set is re-sorted by this rank to obtain the final results, which are returned to the user, and the filtering process ends.
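Formulas IV and V can be sketched as below. The function name `collaborative_scores` is hypothetical. The sketch returns the $\mathrm{Rank}_r$ values and leaves the final sort to the caller, because the claim does not state whether results are ordered by ascending or descending rank.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two weight vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def collaborative_scores(f_ui, similar_user_vectors, intermediate, epsilon, theta):
    """intermediate: list of (result_id, f_Rr) in their current order.
    Returns (result_id, Rank_r) for every result that passes threshold epsilon."""
    kept = []
    for result_id, f_rr in intermediate:
        # Formula IV: sum over the eta similar users
        sim = sum(cosine(f_us, f_ui) * cosine(f_us, f_rr)
                  for f_us in similar_user_vectors)
        if sim >= epsilon:
            kept.append((result_id, sim))
    # Formula V: r is the 1-based position inside the interim result set
    return [(rid, theta * r + (1 - theta) * sim)
            for r, (rid, sim) in enumerate(kept, start=1)]
```

Note that $r$ is assigned after filtering, so positions are counted within the interim result set rather than the intermediate one.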
2. The method for filtering search results in a mobile environment according to claim 1, characterized in that the initial result set in step 1 is obtained as follows:
User $U_i$ submits a search $Q$. The search request is first handled by an existing Internet search engine, which returns an initial result set for the search $Q$. The first $\varphi$ results of this result set are selected for filtering; if there are fewer than $\varphi$ results, the whole initial result set is selected. The selected results form the result set to be filtered $R_1, R_2, \ldots, R_Z$, where $\varphi$ is preset by the system and $Z$ is the number of results to be filtered.
3. The method for filtering search results in a mobile environment according to claim 1, characterized in that in step 1 the feature vectors of the results to be filtered are obtained as follows:
All historical query records of all users within the period $\Delta T$ are collected, yielding $d$ distinct words $q_1, q_2, \ldots, q_d$, which serve as the $d$ dimensions of the vector space. The feature vector of a user is expressed as $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, $i = 1, 2, \ldots, N$, where $v_a$, $a = 1, 2, \ldots, d$, is the weight on each dimension.
4. The method for filtering search results in a mobile environment according to claim 1, characterized in that in step 2 the similar users are obtained as follows:
Step 4.1: find the similar users of the current user $U_i$: the group $G_g$ to which the user belongs is merged with the set of users in the user's social network to obtain the set $S$, where $g$ is the index of the group to which the user belongs and ranges from 1 to $m$, and $m$ is the number of groups;
Step 4.2: compute by formula VI the similarity $\mathrm{sim}(U_i, U_{is})$ between $U_i$ and each user $U_{is}$ in $S$, where $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$ respectively, and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if fewer than $\eta$ users are available, all users in $S$ are selected; $\eta$ is a preset value;
$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}}) \qquad \text{(Formula VI)}$$
wherein $$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\|f_{U_i}\| \cdot \|f_{U_{is}}\|}.$$
5. The method for filtering search results in a mobile environment according to claim 4, characterized in that in step 4.1 the group $G_g$ to which the user belongs is obtained as follows:
Step 5.1: divide the users according to how frequently their historical position changes. A user's historical position information records the historical positions $L$ and the corresponding time information $T$; the positions $L$ are recorded in the data set as longitude and latitude, and the time information $T$ is recorded as time points. Given the longitude and latitude of two adjacent historical positions of a user, the distance between them is computed with the longitude/latitude distance formula;
For each user $U_i$, compute by formula VII the cumulative historical position change frequency $F_i$ over the most recent period $\Delta T$:
$$F_i = \frac{1}{\Delta T} \sum_{k=2}^{M} \left| \frac{\mathrm{Dis}(L_k, L_{k-1})}{T_k - T_{k-1}} \right| \qquad \text{(Formula VII)}$$
where $(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ is the historical position information of user $U_i$ over the most recent period $\Delta T$; $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two adjacent historical positions of the user with their time information; $\mathrm{Dis}(L_k, L_{k-1})$ and $T_k - T_{k-1}$ are the distance and the time difference between the two adjacent historical positions; $M$ is the number of historical positions of the current user, and $k$ is the index of a historical position;
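Formula VII can be sketched in Python as follows. The claim only refers to a longitude/latitude distance formula, so the use of the haversine formula, the mean Earth radius, and the function names are assumptions of this sketch; skipping records with identical timestamps is also an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def change_frequency(track, delta_t):
    """track: list of ((lat, lon), t) sorted by time t, all inside the window.
    delta_t: length of the window, in the same time unit as t."""
    total = 0.0
    for (loc_prev, t_prev), (loc_cur, t_cur) in zip(track, track[1:]):
        if t_cur == t_prev:
            continue  # skip zero-length intervals (assumption)
        total += abs(haversine_km(loc_cur, loc_prev) / (t_cur - t_prev))
    return total / delta_t
```

Pairing `track` with `track[1:]` yields exactly the $M - 1$ adjacent position pairs that the sum in formula VII ranges over.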
Step 5.2: collect the cumulative historical position change frequency $F$ of all users to obtain the overall range $\Omega$ of $F$, and divide $\Omega$ into several sub-intervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where $n$ is the number of user groups. These sub-intervals characterize different user groups by $F$: each user is assigned to the sub-interval corresponding to its $F$, so that the users are divided into the different groups $\Omega_1, \Omega_2, \ldots, \Omega_n$;
Step 5.3: for each $\Omega_j$, cluster its users by historical position information so that users at nearby positions are gathered into one class, thereby further dividing the users into the smaller groups $G_1, G_2, \ldots, G_m$; $j = 1, 2, \ldots, n$, where $j$ is the index of the group.
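The division in step 5.2 can be illustrated with the sketch below. The claim does not say how $\Omega$ is split, so the equal-width sub-intervals used here are an assumption, as is the function name `assign_to_subintervals`.

```python
def assign_to_subintervals(freq_by_user, n):
    """freq_by_user: dict user_id -> cumulative change frequency F.
    n: number of sub-intervals. Returns dict user_id -> j, with j in 1..n."""
    lo = min(freq_by_user.values())
    hi = max(freq_by_user.values())
    width = (hi - lo) / n  # equal-width split of the overall range (assumption)
    groups = {}
    for uid, f in freq_by_user.items():
        if width == 0:
            groups[uid] = 1  # every user has the same F
        else:
            # the largest F would land in bin n + 1, so clamp it to n
            groups[uid] = min(int((f - lo) / width) + 1, n)
    return groups
```

The users of each resulting $\Omega_j$ would then be clustered separately by position, as step 5.3 describes.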
6. The method for filtering search results in a mobile environment according to claim 5, characterized in that step 5.3 uses the k-means clustering algorithm to cluster the users in each $\Omega_j$, as follows:
(b1) first compute, for each user $U_i$, the center $O_i$ of its historical positions over the most recent period $\Delta T$; users are clustered according to the centers $O_i$; $i$ is the user index;
(b2) randomly select $k$ users from $\Omega_j$; each selected user $U_q$ represents an initial user cluster $C_q$, and its center $O_q$ represents the initial center of that cluster, $q = 1, 2, \ldots, k$;
(b3) for each remaining user in $\Omega_j$, compute its distance to the center $O_q$ of each user cluster $C_q$, and assign it to the nearest user cluster;
(b4) recompute the new center $O_q$ of each user cluster to replace the old center value; compute the value of the criterion function $E_j$ by formula VIII; if the value of $E_j$ converges, the clustering process ends; otherwise, return to step (b3);
$$E_j = \sum_{q=1}^{k} \sum_{U \in \Omega_j} \mathrm{Dis}(U, C_q), \quad j = 1, 2, \ldots, n \qquad \text{(Formula VIII)}$$
In formula VIII, $\mathrm{Dis}(U, C_q)$ is the distance between a user in $\Omega_j$ and the center $O_q$ of user cluster $C_q$;
(b5) the clustering yields compact user clusters, so that, on the basis of the division into $\Omega_1, \Omega_2, \ldots, \Omega_n$, the users are further divided into the smaller groups $G_1, G_2, \ldots, G_m$, achieving user segmentation.
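Steps (b2) to (b4) can be sketched as a small k-means loop. Several choices here are assumptions rather than part of the claim: the function name and parameters are hypothetical, centers are treated as planar points with Euclidean distance, the criterion is summed over each user's own cluster (the usual k-means reading), and convergence is detected when the criterion stops changing by more than `tol`.

```python
import math
import random

def kmeans_users(centers_by_user, k, max_iter=100, tol=1e-9, seed=0):
    """centers_by_user: dict user_id -> (x, y) center O_i of the user's recent positions.
    Returns dict user_id -> cluster index in 0..k-1."""
    rng = random.Random(seed)
    users = list(centers_by_user)
    # (b2) k randomly chosen users provide the initial cluster centers
    centroids = [centers_by_user[u] for u in rng.sample(users, k)]
    previous_e = None
    assignment = {}
    for _ in range(max_iter):
        # (b3) assign every user to the nearest cluster center
        assignment = {
            u: min(range(k), key=lambda q: math.dist(centers_by_user[u], centroids[q]))
            for u in users
        }
        # (b4) recompute each cluster center as the mean of its members
        for q in range(k):
            members = [centers_by_user[u] for u in users if assignment[u] == q]
            if members:
                centroids[q] = tuple(sum(c) / len(members) for c in zip(*members))
        # criterion E: total distance from each user to its own cluster center
        e = sum(math.dist(centers_by_user[u], centroids[assignment[u]]) for u in users)
        if previous_e is not None and abs(previous_e - e) <= tol:
            break
        previous_e = e
    return assignment
```

A fixed `seed` makes the random initial selection reproducible; in practice the result of k-means can depend on that selection.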
7. The method for filtering search results in a mobile environment according to claim 4, characterized in that in step 4.1 the user social network is built as follows:
Step 7.1: using the term frequency/inverse document frequency (TF/IDF) model, compute for each user $U_i$ the weight on each dimension of its feature vector. For each word $q_a$ among $q_1, q_2, \ldots, q_d$, if the word does not appear in the user's historical query records, its weight $v_a$ is 0; otherwise the weight is its TF/IDF value, where TF is the term frequency and IDF, the inverse document frequency, is derived from the number $D$ of users whose historical query records contain the word: the IDF value is $\log(N/D)$, $N$ is the total number of users, and the TF/IDF value is the product of TF and IDF;
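The user-side weighting of step 7.1 can be sketched as below. It assumes TF is the raw count of a word in the user's query history and uses the natural logarithm; neither detail is fixed by the claim. The function name `user_feature_vectors` is hypothetical.

```python
import math
from collections import Counter

def user_feature_vectors(query_words_by_user):
    """query_words_by_user: dict user_id -> list of words from that user's recent queries.
    Returns (vocabulary, vectors), where vectors maps user_id -> list of d weights."""
    n_users = len(query_words_by_user)
    counts = {u: Counter(words) for u, words in query_words_by_user.items()}
    # the d distinct words over all users form the dimensions of the vector space
    vocabulary = sorted({w for c in counts.values() for w in c})
    # D for each word: number of users whose history contains it
    users_with_word = {w: sum(1 for c in counts.values() if w in c) for w in vocabulary}
    vectors = {
        u: [c[w] * math.log(n_users / users_with_word[w]) for w in vocabulary]
        for u, c in counts.items()
    }
    return vocabulary, vectors
```

A word the user never queried has a count of 0, so its weight is 0 without a separate branch.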
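Step 7.2: for each user $U_i$, analyze its call records over the most recent period $\Delta T$. For each user $u_x$ that has call records with $U_i$, determine the total number of calls $\alpha$ between $u_x$ and $U_i$ within $\Delta T$, the total call duration $\beta$, and the call regularity $\gamma$, and compute the degree of relationship $\psi_{ix}$ between $U_i$ and $u_x$ by formula IX:
$$\psi_{ix} = \lambda_1 \alpha + \lambda_2 \beta + \lambda_3 \gamma \qquad \text{(Formula IX)}$$
where $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$, and $\lambda_1 + \lambda_2 + \lambda_3 = 1$, and
$$\gamma = \frac{1}{S_t}$$
$$S_t = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \left( \overline{\Delta t} - \Delta t_h \right)^2$$
$$\Delta t_h = t_h - t_{h-1}, \quad h = 2, 3, \ldots, \alpha$$
$$\overline{\Delta t} = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \Delta t_h.$$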
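Formula IX and the regularity term can be sketched as follows. The function name and the `gamma_cap` parameter are hypothetical. The claim does not say what $\gamma$ should be when there are fewer than two calls or when the calls are perfectly regular ($S_t = 0$), so both fallbacks are assumptions. The sketch also takes $t_h$ to be the time of the $h$-th call.

```python
def relationship_degree(call_times, call_durations, lambdas, gamma_cap=1.0):
    """call_times: sorted times t_1..t_alpha of the calls between U_i and u_x in the window.
    call_durations: duration of each of those calls.
    lambdas: (lambda_1, lambda_2, lambda_3), non-negative and summing to 1."""
    alpha = len(call_times)      # total number of calls
    beta = sum(call_durations)   # total call duration
    if alpha < 2:
        gamma = 0.0  # no interval between calls to measure (assumption)
    else:
        gaps = [t - s for s, t in zip(call_times, call_times[1:])]
        mean_gap = sum(gaps) / (alpha - 1)
        s_t = sum((mean_gap - g) ** 2 for g in gaps) / (alpha - 1)
        # perfectly regular calls give S_t = 0; cap gamma instead of dividing by zero (assumption)
        gamma = 1.0 / s_t if s_t > 0 else gamma_cap
    l1, l2, l3 = lambdas
    return l1 * alpha + l2 * beta + l3 * gamma  # Formula IX
```

Since $\alpha$, $\beta$ and $\gamma$ are in different units, a practical system would probably normalize them before weighting; the claim does not describe such a step, so the sketch uses the raw values.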
CN 201110458155 2011-12-31 2011-12-31 Filtering method of search results in mobile environment Expired - Fee Related CN102591966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110458155 CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110458155 CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Publications (2)

Publication Number Publication Date
CN102591966A true CN102591966A (en) 2012-07-18
CN102591966B CN102591966B (en) 2013-12-18

Family

ID=46480604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110458155 Expired - Fee Related CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Country Status (1)

Country Link
CN (1) CN102591966B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867031A (en) * 2012-08-27 2013-01-09 百度在线网络技术(北京)有限公司 Method and system for optimizing point of interest (POI) searching results, mobile terminal and server
WO2014101846A1 (en) * 2012-12-28 2014-07-03 Huawei Technologies Co., Ltd. Predictive caching in a distributed communication system
CN104317900A (en) * 2014-10-24 2015-01-28 重庆邮电大学 Multiattribute collaborative filtering recommendation method oriented to social network
CN104462239A (en) * 2014-11-18 2015-03-25 电信科学技术第十研究所 Customer relation discovery method based on data vectorization spatial analysis
CN104866474A (en) * 2014-02-20 2015-08-26 阿里巴巴集团控股有限公司 Personalized data searching method and device
CN105243135A (en) * 2015-09-30 2016-01-13 百度在线网络技术(北京)有限公司 Method and apparatus for showing search result
CN106570699A (en) * 2015-10-08 2017-04-19 平安科技(深圳)有限公司 Client contact information excavation method and server
CN111212381A (en) * 2019-12-18 2020-05-29 中通服建设有限公司 Mobile user behavior data analysis method and device, computer equipment and medium
CN113220969A (en) * 2020-02-06 2021-08-06 百度在线网络技术(北京)有限公司 Advertisement determination method, device, equipment and storage medium
CN113704604A (en) * 2021-08-24 2021-11-26 山东库睿科技有限公司 Search system and search method
CN113792180A (en) * 2021-08-30 2021-12-14 北京百度网讯科技有限公司 Duplicate removal method and device in recommendation scene, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1903460A1 (en) * 2006-09-21 2008-03-26 Sony Corporation Information processing
CN101819572A (en) * 2009-09-15 2010-09-01 电子科技大学 Method for establishing user interest model
CN101923545A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for recommending personalized information
CN102236646A (en) * 2010-04-20 2011-11-09 得利在线信息技术(北京)有限公司 Personalized item-level vertical pagerank algorithm iRank

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王秀平等: "个性化学习推荐系统的设计与实现", 《微型电脑应用》 *
胡娟丽等: "基于典型反馈的个性化文本信息过滤", 《计算机应用》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867031A (en) * 2012-08-27 2013-01-09 百度在线网络技术(北京)有限公司 Method and system for optimizing point of interest (POI) searching results, mobile terminal and server
WO2014101846A1 (en) * 2012-12-28 2014-07-03 Huawei Technologies Co., Ltd. Predictive caching in a distributed communication system
CN104866474A (en) * 2014-02-20 2015-08-26 阿里巴巴集团控股有限公司 Personalized data searching method and device
CN104866474B (en) * 2014-02-20 2018-10-09 阿里巴巴集团控股有限公司 Individuation data searching method and device
CN104317900A (en) * 2014-10-24 2015-01-28 重庆邮电大学 Multiattribute collaborative filtering recommendation method oriented to social network
CN104462239B (en) * 2014-11-18 2017-08-25 电信科学技术第十研究所 A kind of customer relationship based on data vector spatial analysis finds method
CN104462239A (en) * 2014-11-18 2015-03-25 电信科学技术第十研究所 Customer relation discovery method based on data vectorization spatial analysis
CN105243135B (en) * 2015-09-30 2019-09-20 百度在线网络技术(北京)有限公司 Show the method and device of search result
CN105243135A (en) * 2015-09-30 2016-01-13 百度在线网络技术(北京)有限公司 Method and apparatus for showing search result
CN106570699A (en) * 2015-10-08 2017-04-19 平安科技(深圳)有限公司 Client contact information excavation method and server
CN111212381A (en) * 2019-12-18 2020-05-29 中通服建设有限公司 Mobile user behavior data analysis method and device, computer equipment and medium
CN113220969A (en) * 2020-02-06 2021-08-06 百度在线网络技术(北京)有限公司 Advertisement determination method, device, equipment and storage medium
CN113704604A (en) * 2021-08-24 2021-11-26 山东库睿科技有限公司 Search system and search method
CN113792180A (en) * 2021-08-30 2021-12-14 北京百度网讯科技有限公司 Duplicate removal method and device in recommendation scene, electronic equipment and storage medium
CN113792180B (en) * 2021-08-30 2024-02-23 北京百度网讯科技有限公司 Method and device for removing duplicate in recommended scene, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102591966B (en) 2013-12-18

Similar Documents

Publication Publication Date Title
CN102591966B (en) Filtering method of search results in mobile environment
CN102654860B (en) Personalized music recommendation method and system
CN103473291B (en) Personalized service recommendation system and method based on latent semantic probability models
CN103678647B (en) A kind of method and system for realizing information recommendation
CN108595461B (en) Interest exploration method, storage medium, electronic device and system
CN108737856B (en) Social relation perception IPTV user behavior modeling and program recommendation method
CN106649657A (en) Recommended system and method with facing social network for context awareness based on tensor decomposition
CN101770520A (en) User interest modeling method based on user browsing behavior
CN103425763B (en) User based on SNS recommends method and device
CN102663047B (en) Method and device for mining social relationship during mobile reading
CN106375369A (en) Mobile Web service recommendation method and collaborative recommendation system based on user behavior analysis
CN101192235A (en) Method, system and equipment for delivering advertisement based on user feature
CN103854065A (en) Customer loss prediction method and device
CN105608121B (en) Personalized recommendation method and device
CN101127046A (en) Method and system for sequencing to blog article
CN109902235A (en) User preference based on bat optimization clusters Collaborative Filtering Recommendation Algorithm
CN105718576A (en) Individual position recommending system related to geographical features
CN107679101A (en) It is a kind of that method is recommended based on the network service of position and trusting relationship
CN113961712B (en) Knowledge-graph-based fraud telephone analysis method
CN109359868A (en) A kind of construction method and system of power grid user portrait
CN106779946A (en) A kind of film recommends method and device
CN108521586A (en) The IPTV TV program personalizations for taking into account time context and implicit feedback recommend method
EP2652909A1 (en) Method and system for carrying out predictive analysis relating to nodes of a communication network
CN110852224B (en) Expression recognition method and related device
CN107368499A (en) A kind of client's tag modeling and recommendation method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131218

Termination date: 20201231
