CN102591966A - Filtering method of search results in mobile environment - Google Patents
- Publication number: CN102591966A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for filtering search results in a mobile environment. The method comprises: subdividing users into groups according to the characteristics of their historical location information; building a feature model for each user from the user's historical query records; and analyzing the users' historical call records to build each user's social network and calculate the degree of relationship between users. At search time, the search results are first filtered by content using the user feature model, then collaboratively filtered using the user-group information and the mined social network information, and finally returned to the user. By mining user characteristics and filtering information in this way, the method personalizes the filtering of search results, removes a large number of irrelevant results, reduces the result set, and enables personalized, accurate search in a mobile environment.
Description
Technical Field
The invention belongs to the field of information retrieval, and particularly relates to a search result filtering method in a mobile scene.
Background
In the past decade, search engine technology has developed rapidly, and traditional internet search has become mature and successful in both its technical implementation and its business model. In recent years, emerging technologies and applications represented by the mobile internet have appeared, and mobile search is one of the important applications of the mobile internet.
Because of the mobility and portability of mobile terminals and their limited screen size, processing capability and available bandwidth, mobile search cannot directly adopt the conventional internet search implementation, for two main reasons. (1) A conventional internet search engine typically returns a large number of results, and in more than half of cases these results are not relevant to the user. A main cause is that the search engine simply matches the search keywords and does not consider other information (such as the user's context or personal preferences), while the proliferation of information on the internet produces many "junk results". The user has to screen the search results manually, which greatly increases the user's burden. In a mobile scene this is intolerable: given the limits on screen and keyboard size, processing capability and available bandwidth, junk results waste valuable data traffic, and paging through and screening results on a mobile terminal is inconvenient. Mobile search must therefore be accurate and return as few results as possible, all of them relevant. (2) For the same search keyword, a conventional internet search engine returns the same results to all users according to a uniform rule, yet different users have different background knowledge, interests and hobbies, and therefore different information needs.
The mobility, portability and privacy of the mobile terminal let a user obtain information anytime and anywhere, so the demand for personalized search is stronger. Mobile search must therefore be a personalized search that takes into account the user's personal characteristics (such as interests) and context (such as time, place and weather).
Therefore, what mobile search needs to achieve is personalized, accurate search. At present, domestic research on mobile search is still at an early stage, and the implementation technology is less mature than existing internet search technology. The earlier technology was vertical search, such as mobile phone music search and novel search. Most current implementations combine existing internet search technology with auxiliary technologies such as information filtering: a feature model of the user is built first, and the search results are then filtered through that model in a personalized way to remove irrelevant results and achieve personalized, accurate search.
The common techniques for user feature modeling are the vector space model and the ontology model. The vector space model is simple in principle, easy to implement and widely used.
The commonly used information filtering techniques are content-based filtering and collaborative filtering. Content-based filtering extracts features from each result, calculates the similarity between the result and a filtering template (the user model), and filters according to a set threshold. Because the content of each result is analyzed, it generally filters well, but the amount of calculation is larger. Collaborative filtering is based on the idea that people of the same type usually share interests and preferences: the search results of the current user are filtered collaboratively using users whose interests are similar to the current user's. This technology has been well developed and applied in the field of electronic commerce.
Disclosure of Invention
The invention aims to provide a method for filtering search results in a mobile scene. By mining user data (users' historical location information, historical call records and so on), the method builds a user feature model and a user social network, performs content-based filtering and collaborative filtering on the search results according to the user feature model and the user social network respectively, and removes irrelevant search results, thereby achieving personalized, accurate search in a mobile scene. This is valuable for improving the mobile search user experience and user stickiness.
The invention provides a method for filtering search results in a mobile scene, which comprises the following steps:
Step 1: for a user $U_i$, $i = 1, 2, \ldots, N$, and the initial result set to be filtered $R_1, R_2, \ldots, R_Z$, establish a feature vector for each result to be filtered in the d-dimensional vector space. The feature vector of $R_r$ is expressed as $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ is the weight in each dimension. The term frequency/inverse document frequency (TF/IDF) model is used to calculate the weight $v_a$ of $f_{R_r}$ in each dimension. For each word $q_a$ in $q_1, q_2, \ldots, q_d$: if it does not appear in $R_r$, its weight is 0; otherwise its weight is its TF/IDF value. TF is the number of times the word appears in $R_r$. IDF is the inverse document frequency: count the number z of results that contain the word;
wherein the IDF value is $\log(Z/z)$, Z is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$;
Step 2: search for users similar to the current user $U_i$. The similar users are selected from two user sets: one is the group $G_g$ to which the user belongs, where g is the serial number of the group and ranges from 1 to m; the other is the set of users in the user's social network. The two sets are merged to obtain a set S, and a user in S is denoted $U_{is}$. The similarity between $U_i$ and each user $U_{is}$ in S is calculated as shown in formula II, using the vector cosine formula shown in formula I; the smaller the angle between the vectors, the larger the cosine value and the greater the similarity, and vice versa. Here i is the serial number of the user, N is the number of users, and $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$ respectively; and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if S contains fewer than $\eta$ users, all users in S are selected. $\eta$ is a preset value;
Step 3: content-based filtering.
For each initial result $R_r$ to be filtered, calculate in turn its similarity to user $U_i$ using formula III, where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$ respectively. Filter by this similarity against a preset threshold $\zeta$: initial results whose similarity is less than $\zeta$ are removed, giving an intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$, in which the remaining results keep their original order;
Wherein,
$$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\|f_{U_i}\| \cdot \|f_{R_r}\|}$$
Step 4: perform collaborative filtering on the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$. Using the $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ most similar to user $U_i$, calculate for each intermediate result $R_r$ the similarity $sim'(U_i, R_r)$ according to formula IV, in which the two similarity terms denote, respectively, the similarity between $U_{is}$ and $U_i$ and the similarity between $U_{is}$ and $R_r$.
Collaborative filtering is then carried out on $sim'(U_i, R_r)$ with a preset threshold $\varepsilon$: intermediate results whose similarity is less than $\varepsilon$ are removed, giving a temporary result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where r is the position of a result in the temporary result set. For each $R_r$ in the temporary result set, a preset weighting coefficient $\theta$ is used to calculate, by formula V, the weighted sum of its position r and $sim'(U_i, R_r)$ as the final ranking value $Rank_r$:
$$Rank_r = \theta \cdot r + (1 - \theta) \cdot sim'(U_i, R_r) \qquad \text{(Formula V)}$$
The temporary result set is reordered by $Rank_r$ to obtain the final result, which is returned to the user, and the filtering process ends.
The search result filtering method for a mobile scene combines data mining methods (classification and clustering) with a content-based filtering algorithm and a collaborative filtering algorithm. Specifically, the invention has the following effects and advantages:
(1) The invention has high accuracy. It analyzes the user's social network information and performs collaborative filtering on top of conventional content-based filtering, which greatly improves accuracy.
(2) The invention has strong adaptability, and can well adapt to the individual requirements of various user groups and individuals in consideration of the diversity of the mobile user groups and individuals.
(3) The method has high expandability, can be used for mobile search, mobile internet application, accurate advertisement delivery and the like, and can also be used for Customer Relationship Management (CRM) and the like.
Drawings
FIG. 1 is an overall flow diagram of the process of the present invention;
FIG. 2 is a simplified diagram of a mobile user's historical location change frequency;
FIG. 3 is a flow chart of mobile user clustering by location;
FIG. 4 is a diagram of a mobile user social network architecture;
FIG. 5 is a flow diagram illustrating the detailed filtering of mobile search results.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
The method for filtering the search results in the mobile scene, as shown in fig. 1, includes a filtering preprocessing stage, which mainly includes user segmentation, user feature model construction and user social network construction, which correspond to the following steps (1) to (3), respectively, and a result filtering stage, which corresponds to the following step (4). The specific treatment steps are as follows:
1. Filtering preprocessing stage, comprising the following steps (1) to (3).
(1) User subdivision. A data mining method is used to subdivide the users. The user data set provided by an existing telecommunication operator contains a large amount of user data, such as users' historical location information, historical call records, historical query and browsing records, and historical service data. The invention subdivides users mainly according to their historical location information, as follows:
(a) Divide the users according to how frequently their historical positions change. A user's historical location information records the historical positions L and the corresponding time information T. The position L is recorded in the data set as a longitude/latitude pair, for example (30.2332, 114.3243), and the time T is recorded as a time point. Given the longitude and latitude of two adjacent historical positions of a user, the distance between them is easily calculated with the longitude/latitude distance formula (formula (1)). Let the first position $L_1$ have longitude and latitude $(lon_1, lat_1)$ and the second position $L_2$ have longitude and latitude $(lon_2, lat_2)$. Taking the 0° meridian as the reference, east longitude takes a positive value and west longitude a negative value; a north latitude is substituted as (90° - lat) and a south latitude as (90° + lat). With these substitutions, formula (1) gives C, the cosine of the central angle between the two points, from which the distance between the two points can be calculated.
$$C = \sin(lat_1) \cdot \sin(lat_2) \cdot \cos(lon_1 - lon_2) + \cos(lat_1) \cdot \cos(lat_2) \qquad (1)$$
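The distance calculation described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function names, the signed-latitude convention (north positive, south negative, so that 90° - lat covers both substitutions) and the mean Earth radius of 6371 km are assumptions that do not appear in the source.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not given in the source


def central_angle_cosine(lat1, lon1, lat2, lon2):
    """Formula (1): C for two points given in signed degrees.

    With signed latitudes, (90 - lat) equals 90 - |lat| in the north and
    90 + |lat| in the south, matching the substitution in the text.
    """
    colat1 = math.radians(90.0 - lat1)
    colat2 = math.radians(90.0 - lat2)
    dlon = math.radians(lon1 - lon2)
    return (math.sin(colat1) * math.sin(colat2) * math.cos(dlon)
            + math.cos(colat1) * math.cos(colat2))


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance: radius times the central angle arccos(C)."""
    c = central_angle_cosine(lat1, lon1, lat2, lon2)
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return EARTH_RADIUS_KM * math.acos(c)
```

For very short distances the arccos form loses precision, so a production system might prefer the haversine form; the sketch keeps the form of formula (1) for readability.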
For each user $U_i$ ($i = 1, 2, \ldots, N$), calculate the historical cumulative change frequency $F_i$ of the user's position over the most recent period $\Delta T$ (for example, one month), where N is the number of users.
In formula (2), $(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ is the historical location information of user $U_i$ over the recent period $\Delta T$; $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two adjacent historical positions of the user with their time information; and $Dis(L_k, L_{k-1})$ and $T_k - T_{k-1}$ are, respectively, the distance and the time difference between the two adjacent historical positions. M is the number of historical positions of the current user, and k is the serial number of a historical position.
The values of F for all users are collected to obtain the overall range $\Omega$ of F, and $\Omega$ is divided into several subintervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where n is the number of user groups. Each subinterval represents a different user group, and each user is assigned to the subinterval that contains the user's F. As shown in FIG. 2, user A has a high F and may be a business person who travels frequently, whereas user B has a low F and may stay at a fixed position for long periods, for example a college student. The users are thus divided into different groups $\Omega_1, \Omega_2, \ldots, \Omega_n$ according to the position change frequency F. $\Omega$ may be divided into equal parts, or according to a division standard preset by the system.
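The grouping by change frequency can be sketched as follows. Formula (2) is not reproduced in this text, so the sketch assumes that F is the sum, over adjacent historical positions, of the distance divided by the time difference; that form, the function names and the `dist` callable are all illustrative assumptions. The binning uses the equal-division option mentioned above.

```python
def change_frequency(track, dist):
    """Assumed form of F: sum of Dis(L_k, L_{k-1}) / (T_k - T_{k-1}).

    track: list of (location, time) pairs sorted by time.
    dist:  callable returning the distance between two locations.
    """
    total = 0.0
    for (loc_prev, t_prev), (loc_cur, t_cur) in zip(track, track[1:]):
        dt = t_cur - t_prev
        if dt > 0:  # skip records with identical or unordered time stamps
            total += dist(loc_cur, loc_prev) / dt
    return total


def equal_width_groups(freqs, n):
    """Assign each user to one of n equal-width subintervals of the range of F.

    freqs: dict mapping user id to F. Returns dict user id -> index 0..n-1.
    """
    lo, hi = min(freqs.values()), max(freqs.values())
    width = (hi - lo) / n
    groups = {}
    for user, f in freqs.items():
        if width == 0:
            groups[user] = 0  # all users share the same F
        else:
            groups[user] = min(n - 1, int((f - lo) / width))
    return groups
```

The largest F is clamped into the last subinterval so that the upper bound of the range stays inside a group.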
(b) Then, for each $\Omega_j$ ($j = 1, 2, \ldots, n$, where j is the serial number of the group), the users in the group are clustered according to their historical location information, so that users at nearby positions fall into one class; related research shows that users who are geographically close have, to a certain extent, similar user characteristics. The k-means clustering algorithm is used to cluster the users in each $\Omega_j$ as follows:
(b1) First, calculate for each user $U_i$ ($i = 1, 2, \ldots, N$, where i is the serial number of the user) the center position $O_i$ of the user's historical positions within the period $\Delta T$; users are clustered according to $O_i$.
(b2) Randomly select k users from $\Omega_j$. Each selected user $U_q$ ($q = 1, 2, \ldots, k$) represents an initial user cluster $C_q$, and its center position $O_q$ is the initial center of that cluster.
(b3) For each remaining user in $\Omega_j$, calculate the distance (using the longitude/latitude distance formula) between the user and the center $O_q$ of each user cluster $C_q$, and assign the user to the nearest cluster.
(b4) Recalculate the center value $O_q$ of each user cluster and replace the old center value. Calculate the value of the criterion function $E_j$ according to formula (3); if the value of $E_j$ has converged, the clustering process ends, otherwise go to step (b3).
In formula (3), $Dis(U, C_q)$ is the distance between a user in $\Omega_j$ and the center $O_q$ of user cluster $C_q$.
Clustering yields compact user clusters. On the basis of the division into $\Omega_1, \Omega_2, \ldots, \Omega_n$, the users are thus further divided into smaller groups $G_1, G_2, \ldots, G_m$, which completes the user subdivision.
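Steps (b2) to (b4) can be sketched as a small k-means loop. Formula (3) is not reproduced in this text, so the criterion function below is assumed to be the usual sum of squared distances to the cluster centers; the function name, the `dist` callable, the coordinate-wise mean for new centers, the tolerance and the iteration cap are likewise illustrative assumptions.

```python
import math
import random


def kmeans_users(centers, k, dist, rng=None, tol=1e-9, max_iter=100):
    """Cluster users of one group by their center position O_i.

    centers: dict user id -> coordinate tuple (the user's O_i).
    dist:    callable giving the distance between two coordinate tuples.
    Returns (assignment dict user -> cluster index, list of cluster centers).
    """
    rng = rng or random.Random(0)
    users = list(centers)
    # (b2) pick k users at random; their positions are the initial centers
    means = [centers[u] for u in rng.sample(users, k)]
    prev_e = math.inf
    assign = {}
    for _ in range(max_iter):
        # (b3) assign every user to the nearest cluster center
        assign = {u: min(range(k), key=lambda q: dist(centers[u], means[q]))
                  for u in users}
        # (b4) recompute each center as the coordinate-wise mean of its members
        for q in range(k):
            members = [centers[u] for u in users if assign[u] == q]
            if members:
                means[q] = tuple(sum(c) / len(members) for c in zip(*members))
        # assumed criterion E: sum of squared distances to the assigned center
        e = sum(dist(centers[u], means[assign[u]]) ** 2 for u in users)
        if abs(prev_e - e) <= tol:
            break
        prev_e = e
    return assign, means
```

A coordinate-wise mean of longitude/latitude pairs is only an approximation of a geographic center, which is usually acceptable for users within one city-scale region.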
(2) User feature model construction. A user's historical query records represent the user's interests well. By analyzing these records, a vector space model is used to build a feature model of the user, as follows:
(a) Collect all historical query records of all users within the period $\Delta T$ and count the d distinct words $q_1, q_2, \ldots, q_d$ they contain. These words serve as the d dimensions of the vector space, and the feature vector of user $U_i$ is expressed as $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$ ($i = 1, 2, \ldots, N$), where $v_a$ ($a = 1, 2, \ldots, d$) is the weight in each dimension.
(b) Use the TF/IDF (term frequency/inverse document frequency) model to calculate, for each user $U_i$ ($i = 1, 2, \ldots, N$), the weight of each dimension of the user's feature vector. For each word $q_a$ ($a = 1, 2, \ldots, d$) in $q_1, q_2, \ldots, q_d$: if it does not appear in the user's historical query records, its weight $v_a$ is 0; otherwise its weight is its TF/IDF value. TF is the term frequency, that is, the number of times the word appears in the user's historical query records. IDF is the inverse document frequency: count the number D of users whose historical query records contain the word; the IDF value is $\log(N/D)$, where N is the number of all users. The TF/IDF value is the product of TF and IDF.
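The user feature vectors can be sketched as below. The function name and input layout are illustrative assumptions, and the natural logarithm is assumed because the source does not state the base of the logarithm.

```python
import math
from collections import Counter


def user_feature_vectors(histories):
    """Build TF/IDF feature vectors from users' historical query words.

    histories: dict user id -> list of words from that user's queries.
    Returns (vocab, vectors): the d words q_1..q_d in sorted order, and a
    dict user id -> list of weights v_1..v_d aligned with vocab.
    """
    vocab = sorted({w for words in histories.values() for w in words})
    n_users = len(histories)
    # D: number of users whose history contains the word
    users_with_word = Counter(w for words in histories.values() for w in set(words))
    vectors = {}
    for user, words in histories.items():
        tf = Counter(words)  # missing words count as 0, giving weight 0
        vectors[user] = [tf[w] * math.log(n_users / users_with_word[w])
                         for w in vocab]
    return vocab, vectors
```

Note that a word used by every user gets an IDF of log(1) = 0, so it contributes nothing to any user's vector.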
(3) Mining users' social network information. The users' historical call records are analyzed. For each user $U_i$ ($i = 1, 2, \ldots, N$), the social network appears as a star topology centered on the user. As shown in FIG. 4, the central node B represents the user, the surrounding nodes A, C, D, E, F, G and so on represent users who have call records with B, and the weight $\psi$ of each edge represents the degree of relationship between the two users. This step mainly estimates the value of $\psi$.
The historical call record data records the calls among all users, including the ID numbers of both parties, the call start time, the call end time and so on. For each user $U_i$ ($i = 1, 2, \ldots, N$), the call records within the period $\Delta T$ are analyzed. For each user $u_x$ ($x = 1, 2, \ldots, e$, where e is the number of users who have call records with $U_i$), the total number of calls $\alpha$, the total call duration $\beta$ and the call regularity $\gamma$ between $u_x$ and $U_i$ within $\Delta T$ are analyzed. By considering these factors together, the degree of relationship $\psi_{ix}$ between $U_i$ and $u_x$ can be roughly inferred.
The total number of calls $\alpha$ and the total call duration $\beta$ are easy to count, but they are coarse, one-dimensional statistics: they allow only a rough overall estimate of the degree of relationship between users and ignore important details, such as whether the calls are evenly distributed over time, and whether they are even over the whole period or only locally. The call regularity $\gamma$ is therefore introduced as a further characteristic factor of the degree of relationship between $U_i$ and $u_x$. It is obtained by statistically analyzing the time distribution of all call events within $\Delta T$, borrowing the idea of variance, as shown in formulas (4), (5) and (6). Here $t_h$ ($h = 1, 2, \ldots, \alpha$) is the start time of each call, $\Delta t_h$ is the time difference between two adjacent calls, and $S_t$ is the variance of these differences. As shown in formula (6), $\gamma$ is inversely proportional to $S_t$: a small variance indicates that the calls within the period are regular, so $\gamma$ is correspondingly large, and vice versa.
$$\Delta t_h = t_h - t_{h-1}, \quad (h = 2, 3, \ldots, \alpha) \qquad (4)$$
The calculated $\alpha$, $\beta$ and $\gamma$ are each normalized to a value between 0 and 1. The value of $\psi_{ix}$ ($i = 1, 2, \ldots, N$; $x = 1, 2, \ldots, e$) is then calculated by formula (8) as a weighted value that takes $\alpha$, $\beta$ and $\gamma$ into account together. In formula (8), $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$ and $\lambda_1 + \lambda_2 + \lambda_3 = 1$; by default each weight takes the average value 1/3.
$$\psi_{ix} = \lambda_1 \cdot \alpha + \lambda_2 \cdot \beta + \lambda_3 \cdot \gamma, \quad (\lambda_1 + \lambda_2 + \lambda_3 = 1) \qquad (8)$$
Thus, through the analysis and calculation of this step, the social network information of each user $U_i$ ($i = 1, 2, \ldots, N$) is obtained, including the degree of relationship $\psi_{ix}$ between the user and each user $u_x$ ($x = 1, 2, \ldots, e$) connected to it.
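The estimation of the relationship degree can be sketched as follows. Only formulas (4) and (8) survive in this text, so several choices here are assumptions made for illustration: the regularity is taken as 1 / (1 + S_t) so that it is inversely related to the variance and already lies in (0, 1]; contacts with fewer than two intervals get regularity 0; and the call count and duration are normalized by the maximum over the user's contacts. The function names are hypothetical.

```python
from statistics import pvariance


def call_regularity(start_times):
    """Assumed gamma: 1 / (1 + variance of the gaps between adjacent calls)."""
    gaps = [b - a for a, b in zip(start_times, start_times[1:])]  # formula (4)
    if len(gaps) < 2:
        return 0.0  # assumption: too few calls to judge regularity
    return 1.0 / (1.0 + pvariance(gaps))


def relationship_degrees(calls, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Formula (8): psi = l1*alpha + l2*beta + l3*gamma for each contact.

    calls: dict contact id -> list of (start, end) times sorted by start,
           covering the calls between the user and that contact within the
           analysis window.
    """
    counts = {c: len(records) for c, records in calls.items()}
    durations = {c: sum(end - start for start, end in records)
                 for c, records in calls.items()}
    max_count = max(counts.values()) or 1      # assumed max-normalization
    max_duration = max(durations.values()) or 1
    l1, l2, l3 = weights
    psi = {}
    for contact, records in calls.items():
        alpha = counts[contact] / max_count
        beta = durations[contact] / max_duration
        gamma = call_regularity([start for start, _ in records])
        psi[contact] = l1 * alpha + l2 * beta + l3 * gamma
    return psi
```

With the default weights, a contact who is called most often, for the longest total time and at perfectly even intervals receives the maximum value of 1.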
(4) Search result filtering. The preceding steps (1) to (3) are preparation stages that serve the search result filtering of this step: the user feature model established in step (2) is used for content-based filtering of the search results, and the user subdivision performed in step (1) and the social network information mined in step (3) are used for collaborative filtering of the search results.
This step performs content-based filtering and then collaborative filtering on the search results, so as to personalize and reduce them.
User $U_i$ ($i = 1, 2, \ldots, N$) submits a search Q. The search request is first processed by an existing internet search engine, which returns an initial result set for Q. This result set is usually large, so the first $\varphi$ results are selected for filtering; if there are fewer than $\varphi$ results, the whole initial result set is selected. The selected results form the result set to be filtered, $R_1, R_2, \ldots, R_Z$, where Z is the number of results to be filtered. $\varphi$ is an empirical value preset by the system, for example 300. The result filtering flow is shown in FIG. 5, with the following steps:
(a) For the result set to be filtered $R_1, R_2, \ldots, R_Z$, establish a feature vector for each result in the d-dimensional vector space established in step (2). The feature vector of $R_r$ ($r = 1, 2, \ldots, Z$) is expressed as $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ ($a = 1, 2, \ldots, d$) is the weight in each dimension. The TF/IDF (term frequency/inverse document frequency) model used in step (2) is used to calculate the weight $v_a$ of $f_{R_r}$ in each dimension. For each word $q_a$ in $q_1, q_2, \ldots, q_d$: if it does not appear in $R_r$, its weight is 0; otherwise its weight is its TF/IDF value. TF is the number of times the word appears in $R_r$. IDF is the inverse document frequency: count the number z of results that contain the word; the IDF value is $\log(Z/z)$, where Z is the total number of results. The TF/IDF value is the product of TF and IDF.
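Projecting the results onto the vocabulary of the user model can be sketched like this. The function name and the assumption that each result has already been split into a list of words are illustrative; the natural logarithm is assumed because the source does not give the base.

```python
import math
from collections import Counter


def result_feature_vectors(vocab, results):
    """TF/IDF vectors for search results over the fixed words q_1..q_d.

    vocab:   list of the d words from the user feature model.
    results: list of word lists, one per result R_1..R_Z.
    Returns a list of weight lists aligned with vocab.
    """
    total = len(results)  # Z
    # z: number of results containing each word
    results_with_word = Counter(w for words in results for w in set(words))
    vectors = []
    for words in results:
        tf = Counter(words)
        vectors.append([
            tf[w] * math.log(total / results_with_word[w]) if tf[w] else 0.0
            for w in vocab
        ])
    return vectors
```

Words that occur in a result but not in the vocabulary are ignored, because the vector space is fixed by the user model built in step (2).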
(b) Then search for users similar to the current user $U_i$ ($i = 1, 2, \ldots, N$). The similar users are selected from two user sets: one is the group $G_g$ to which the user belongs from step (1), where g is the serial number of the group and ranges from 1 to m; the other is the set of users in the user's social network established in step (3). The two sets are merged (they may contain duplicate users) to obtain a set S, from which several similar users are selected.
The similarity between $U_i$ ($i = 1, 2, \ldots, N$) and each user $U_{is}$ in the set S is calculated as shown in formula (9), using the vector cosine formula shown in formula (10); the smaller the angle between the vectors, the larger the cosine value and the greater the similarity, and vice versa. In formula (10), $\|\cdot\|$ denotes the modulus of a vector. $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$ respectively, and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if S contains fewer than $\eta$ users, all users in S are selected. $\eta$ is an empirical value preset by the system; its default value may be 10.
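The selection of similar users can be sketched as follows. Formula (9) is not reproduced in this text, so the way the cosine value and the relationship degree are combined is left as a parameter; the default of simply adding the two is purely an assumption, as are the function names and the rule of excluding the current user from the candidates.

```python
import math


def cosine(u, v):
    """Vector cosine of two equal-length weight lists (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_similar_users(current, group, contacts_psi, vectors, eta=10,
                      combine=lambda cos_sim, psi: cos_sim + psi):
    """Pick the eta users most similar to `current` from the merged set S.

    group:        user ids in the same subdivided group as `current`.
    contacts_psi: dict contact id -> relationship degree psi.
    vectors:      dict user id -> feature vector (must cover all candidates).
    combine:      assumed stand-in for the missing formula (9).
    """
    # merging as sets removes users that appear in both sources
    candidates = (set(group) | set(contacts_psi)) - {current}
    scored = [
        (combine(cosine(vectors[current], vectors[u]), contacts_psi.get(u, 0.0)), u)
        for u in candidates
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))  # high to low, ties by id
    return [u for _, u in scored[:eta]]
```

If S holds fewer than eta users, the slice simply returns all of them, which matches the rule in the text.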
(c) Then, result filtering is started, and the filtering process is divided into two stages, namely a content-based filtering stage and a collaborative filtering stage:
(c1) First, content-based filtering. For each initial result $R_r$ ($r = 1, 2, \ldots, Z$) to be filtered from (a), calculate in turn its similarity to user $U_i$ ($i = 1, 2, \ldots, N$), as shown in formula (11), which uses the vector cosine formula of formula (10); $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$ respectively. Filter by this similarity against the threshold $\zeta$: results whose similarity is less than $\zeta$ are removed, giving an intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$), in which the remaining results keep their original order. The threshold $\zeta$ is an empirical value preset by the system, with $0 \le \zeta \le 1$; its default value may be set to 0.65.
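The content-based stage can be sketched in a few lines. The function names and the (id, vector) input layout are illustrative assumptions; the default threshold of 0.65 is the default value given in the text.

```python
import math


def cosine(u, v):
    """Vector cosine of two equal-length weight lists (0.0 for a zero vector)."""
    denom = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0


def content_filter(user_vec, results, zeta=0.65):
    """Keep results whose cosine similarity to the user vector is at least zeta.

    results: list of (result_id, feature_vector) in the original order.
    The returned list preserves that order.
    """
    return [(rid, vec) for rid, vec in results if cosine(user_vec, vec) >= zeta]
```

For example, with a user vector `[1, 0]`, a result with vector `[1, 1]` has similarity of about 0.71 and is kept, while a result with vector `[0, 1]` has similarity 0 and is removed.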
(c2) Next, perform collaborative filtering on the intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$). Collaborative filtering is based on the idea that similar users usually have similar interests, so users similar to the current user are used to make collaborative recommendations for the current user. Using the $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ most similar to user $U_i$ ($i = 1, 2, \ldots, N$), obtained in step (b), the similarity $sim'(U_i, R_r)$ is calculated for each intermediate result $R_r$ according to formula (12), which uses the vector cosine formula of formula (10); the two similarity terms in formula (12) denote, respectively, the similarity between $U_{is}$ and $U_i$ and the similarity between $U_{is}$ and $R_r$.
Collaborative filtering is then carried out on $sim'(U_i, R_r)$ with the threshold $\varepsilon$: results whose similarity is less than $\varepsilon$ are removed, giving a temporary result set $R_r$ ($r = 1, 2, \ldots, Z_\varepsilon$), where r is the position of a result in the temporary result set. For each $R_r$ in the temporary result set, the weighting coefficient $\theta$ is used to calculate the weighted sum of its position r and $sim'(U_i, R_r)$ as the final ranking value $Rank_r$, as shown in formula (13):
$$Rank_r = \theta \cdot r + (1 - \theta) \cdot sim'(U_i, R_r) \qquad (13)$$
The temporary result set is reordered by $Rank_r$ to obtain the final result, which is returned to the user, and the filtering process ends. The threshold $\varepsilon$ and the weighting coefficient $\theta$ are empirical values preset by the system, with $0 \le \varepsilon \le 1$ and $0 \le \theta \le 1$; the default value of $\varepsilon$ may be set to 0.85 and that of $\theta$ to 0.5.
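The collaborative stage and the final reordering can be sketched as follows. Formula (12) is not reproduced in this text, so the sketch assumes a similarity-weighted average of the neighbours' similarities to each result; that form, the function name and the input layout are assumptions. The text also does not say whether results are sorted by ascending or descending rank value, so the direction is exposed as a parameter and the ascending default is an arbitrary choice.

```python
def collaborative_rerank(results, sim_to_user, sim_to_result,
                         epsilon=0.85, theta=0.5, reverse=False):
    """Filter intermediate results with sim' and reorder them by formula (13).

    results:       result ids in intermediate order.
    sim_to_user:   dict neighbour id -> similarity to the current user.
    sim_to_result: dict neighbour id -> dict result id -> similarity.
    reverse:       sort direction for the rank value (not stated in the source).
    """
    weight_total = sum(sim_to_user.values())
    kept = []
    for rid in results:
        if weight_total == 0:
            score = 0.0
        else:
            # assumed formula (12): weighted average over the eta neighbours
            score = sum(w * sim_to_result[s].get(rid, 0.0)
                        for s, w in sim_to_user.items()) / weight_total
        if score >= epsilon:
            kept.append((rid, score))
    # formula (13): weighted sum of the position r (starting at 1) and sim'
    ranked = [(theta * pos + (1 - theta) * score, rid)
              for pos, (rid, score) in enumerate(kept, start=1)]
    ranked.sort(key=lambda item: item[0], reverse=reverse)
    return [rid for _, rid in ranked]
```

Because the position r grows with list length while sim' stays within [0, 1], the position term dominates the rank value for all but the smallest values of theta; a deployment would likely need to rescale one of the two terms.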
The present invention is not limited to the above embodiments. Those skilled in the art can implement the invention in various other embodiments according to its disclosure, and changes or modifications that follow the design and concept of the invention remain within its scope.
Claims (7)
1. A method for filtering search results in a mobile scene comprises the following steps:
Step 1: for a user $U_i$, $i = 1, 2, \ldots, N$, and the initial result set to be filtered $R_1, R_2, \ldots, R_Z$, establish a feature vector for each result to be filtered in the d-dimensional vector space. The feature vector of $R_r$ is expressed as $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ is the weight in each dimension. The term frequency/inverse document frequency (TF/IDF) model is used to calculate the weight $v_a$ of $f_{R_r}$ in each dimension. For each word $q_a$ in $q_1, q_2, \ldots, q_d$: if it does not appear in $R_r$, its weight is 0; otherwise its weight is its TF/IDF value. TF is the number of times the word appears in $R_r$. IDF is the inverse document frequency: count the number z of results that contain the word;
wherein the IDF value is $\log(Z/z)$, Z is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$;
Step 2: search for users similar to the current user $U_i$. The similar users are selected from two user sets: one is the group $G_g$ to which the user belongs, where g is the serial number of the group and ranges from 1 to m; the other is the set of users in the user's social network. The two sets are merged to obtain a set S, and a user in S is denoted $U_{is}$. The similarity between $U_i$ and each user $U_{is}$ in S is calculated as shown in formula II, using the vector cosine formula shown in formula I; the smaller the angle between the vectors, the larger the cosine value and the greater the similarity, and vice versa. Here i is the serial number of the user, N is the number of users, and $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$ respectively; and $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if S contains fewer than $\eta$ users, all users in S are selected. $\eta$ is a preset value;
Step 3: content-based filtering.
For each initial result $R_r$ to be filtered, calculate in turn its similarity to user $U_i$ using formula III, where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$ respectively. Filter by this similarity against a preset threshold $\zeta$: initial results whose similarity is less than $\zeta$ are removed, giving an intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$, in which the remaining results keep their original order;
Wherein,
$$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\lVert f_{U_i} \rVert \cdot \lVert f_{R_r} \rVert}$$
Step 4: perform collaborative filtering on the intermediate result set R_r, r = 1, 2, ..., Z_ζ: using the η users U_i1, U_i2, ..., U_iη most similar to user U_i, calculate for each intermediate result R_r the similarity sim′(U_i, R_r) according to formula IV, on which the collaborative filtering is based; in formula IV, the two similarity terms represent the similarity between U_is and U_i and between U_is and R_r, respectively;
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad \text{(Formula V)}$$
Collaborative filtering is performed according to sim′(U_i, R_r) and a preset threshold ε: intermediate results whose similarity is smaller than ε are filtered out, giving a temporary result set R_r, r = 1, 2, ..., Z_ε, where r denotes the position of a result in the temporary result set; for each temporary result R_r, the final ranking value Rank_r is calculated by formula V as the weighted sum of its order r and sim′(U_i, R_r), using a preset weighting coefficient θ; the temporary result set R_r is reordered by Rank_r to obtain the final result, which is returned to the user, and the filtering process ends.
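Steps 1 to 3 of claim 1 can be illustrated with a short Python sketch. It is a sketch under assumptions, not the patented implementation: all function names are hypothetical, results are assumed to be pre-tokenised word lists, the logarithm is taken as natural, the ψ term of the user similarity is omitted because its combination formula is not reproduced here, and an all-zero vector is treated as having similarity 0.

```python
import math
from typing import Dict, List


def tfidf_vectors(results: List[List[str]], vocab: List[str]) -> List[List[float]]:
    """One d-dimensional TF/IDF vector per result, with IDF = log(Z / z)."""
    Z = len(results)
    # z[a]: number of results that contain vocabulary word q_a.
    z = [sum(1 for doc in results if word in doc) for word in vocab]
    vectors = []
    for doc in results:
        vec = []
        for word, z_a in zip(vocab, z):
            tf = doc.count(word)
            # The weight is 0 when the word does not occur in this result.
            vec.append(tf * math.log(Z / z_a) if tf else 0.0)
        vectors.append(vec)
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Assumption: a zero vector has similarity 0 with everything.
    return dot / (nu * nv) if nu and nv else 0.0


def top_similar_users(user_vec: List[float],
                      candidates: Dict[str, List[float]],
                      eta: int) -> List[str]:
    """Up to eta users from the merged set S, most similar first (cosine only)."""
    scored = sorted(candidates.items(),
                    key=lambda item: (-cosine(user_vec, item[1]), item[0]))
    return [uid for uid, _ in scored[:eta]]


def content_filter(user_vec: List[float],
                   result_vecs: List[List[float]],
                   zeta: float) -> List[int]:
    """Indices of results with similarity >= zeta, kept in their original order."""
    return [r for r, vec in enumerate(result_vecs)
            if cosine(user_vec, vec) >= zeta]
```

Because `content_filter` returns indices in increasing order, the surviving intermediate results keep the order in which the search engine returned them, as step 3 requires.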
2. The method for filtering search results in a mobile scene according to claim 1, wherein the initial result set in step 1 is obtained as follows:
user U_i submits a search Q; the search request is processed by an existing internet search engine, which returns an initial result set for the search Q; the top φ entries of the result set are selected for filtering, and if there are fewer than φ entries, the whole initial result set is selected, giving the result set to be filtered R_1, R_2, ..., R_Z; φ is preset by the system and Z is the number of results to be filtered.
3. The method for filtering search results in a mobile scene according to claim 1, wherein the feature vector in step 1 is obtained as follows:
all historical query records of all users within the time ΔT are collected, and the d distinct words q_1, q_2, ..., q_d obtained from them serve as the d dimensions of the vector space; the feature vector of a user is represented as f_{U_i} = {(q_1, v_1), (q_2, v_2), ..., (q_d, v_d)}, i = 1, 2, ..., N, where v_a, a = 1, 2, ..., d, represents the weight of each dimension.
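As a rough illustration of how the d dimensions could be collected from query logs, the following Python sketch builds the word list q_1, ..., q_d. The function name `build_vocabulary` is hypothetical, the input is assumed to be already restricted to the ΔT window, and whitespace tokenisation with lower-casing is an assumption that would not suit unsegmented Chinese queries.

```python
from typing import Dict, List


def build_vocabulary(history: Dict[str, List[str]]) -> List[str]:
    """Distinct words across every user's queries, in first-seen order."""
    seen: Dict[str, None] = {}
    for queries in history.values():
        for query in queries:
            # Assumption: words are separated by whitespace; case is ignored.
            for word in query.lower().split():
                seen.setdefault(word, None)
    return list(seen)
```

The length of the returned list is d, and the position of each word fixes the dimension it occupies in every user and result feature vector.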
4. The method for filtering search results in a mobile scene according to claim 1, wherein the most similar users in step 2 are obtained as follows:
Step 4.1: to find similar users of the current user U_i, the group G_g to which the user belongs and the set of users in the user's social network are merged to obtain a set S, where g is the serial number of the group to which the user belongs, ranging from 1 to m, and m is the number of groups;
Step 4.2: calculate the similarity sim(U_i, U_is) between U_i and each user U_is in the set S using formula VI, where f_{U_i} and f_{U_is} represent the feature vectors of U_i and U_is respectively, and ψ(U_i, U_is) represents the degree of relationship between U_i and U_is: if U_is is in the social network of U_i, ψ(U_i, U_is) takes the corresponding value, otherwise it takes the value zero; the top η users U_i1, U_i2, ..., U_iη are selected in descending order of similarity; if fewer than η users are available, all users in S are selected; η is a preset value;
Wherein,
$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\lVert f_{U_i} \rVert \cdot \lVert f_{U_{is}} \rVert}.$$
5. The method for filtering search results in a mobile scene according to claim 4, wherein in step 4.1 the group G_g to which the user belongs is obtained as follows:
Step 5.1: divide the users according to the change frequency of their historical positions; the historical position information of a user records the historical positions L and the corresponding time information T, where L is recorded in the data set as longitude and latitude and T is recorded as a time point; given the longitude and latitude of two adjacent historical positions of a user, the distance between them is calculated with a longitude-latitude distance formula;
for each user U_i, calculate the cumulative change frequency F_i of its historical positions within the latest period ΔT according to formula VII:
(L_1, T_1), (L_2, T_2), ..., (L_M, T_M) is the historical position information of user U_i over the latest period ΔT; (L_{k-1}, T_{k-1}) and (L_k, T_k) are two adjacent historical positions of the user with their time information; Dis(L_k, L_{k-1}) and T_k − T_{k-1} represent the distance and the time difference between the two adjacent historical positions, respectively; M is the number of historical positions of the current user, and k is the serial number of a historical position;
Step 5.2: compute the cumulative change frequency F of historical positions for all users to obtain the total range Ω of F, and divide Ω into subintervals Ω_1, Ω_2, ..., Ω_n, where n is the number of user groups and each subinterval represents a user group with a different F; assign the users to the corresponding subintervals according to their F, thereby dividing the users into the groups Ω_1, Ω_2, ..., Ω_n;
Step 5.3: for each Ω_j, cluster its users according to their historical position information so that users at nearby positions fall into the same cluster, thereby further dividing the users into smaller groups G_1, G_2, ..., G_m; j = 1, 2, ..., n, where j is the serial number of the group.
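Steps 5.1 and 5.2 can be sketched in Python as below. Several points are assumptions rather than statements from the claim: formula VII is not reproduced in this text, so the cumulative change frequency is taken here as the sum of Dis(L_k, L_{k-1}) / (T_k − T_{k-1}) over adjacent records; the haversine formula is used as the longitude-latitude distance; and Ω is split into equal-width subintervals. All function names are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0
LatLon = Tuple[float, float]


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def change_frequency(track: List[Tuple[LatLon, float]]) -> float:
    """Assumed form of formula VII: sum of distance / time difference over
    adjacent (position, time) records, ordered by time."""
    total = 0.0
    for (prev_pos, prev_t), (pos, t) in zip(track, track[1:]):
        dt = t - prev_t
        if dt > 0:  # skip records with identical or out-of-order timestamps
            total += haversine_km(prev_pos, pos) / dt
    return total


def assign_interval(f: float, f_min: float, f_max: float, n: int) -> int:
    """0-based index of the equal-width subinterval of [f_min, f_max] holding f."""
    if f_max == f_min:
        return 0
    idx = int((f - f_min) / (f_max - f_min) * n)
    return min(idx, n - 1)  # the upper bound falls into the last subinterval
```

A user who never moves gets F = 0 and lands in the first subinterval, while frequent travellers fall into the later ones; the clustering of step 5.3 is then run separately inside each subinterval.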
6. The method for filtering search results in a mobile scene according to claim 5, wherein in step 5.3 a k-means clustering algorithm is used to cluster the users in each Ω_j, with the following steps:
(b1) first, calculate for each user U_i the center position O_i of its historical positions within the latest period ΔT, and cluster the users according to the center position O_i; i represents the serial number of the user;
(b2) randomly select k users from Ω_j; each user U_q represents an initial user cluster C_q, and its center position O_q represents the initial center of that user cluster, q = 1, 2, ..., k;
(b3) for each remaining user in Ω_j, calculate its distance to the center position O_q of each user cluster C_q and assign it to the closest user cluster;
(b4) then recalculate the new center position O_q of each user cluster to replace the old center value; calculate the value of the criterion function E_j according to formula VIII; if the value of E_j has converged, the clustering process ends; otherwise go to step (b3);
In formula VIII, Dis(U, C_q) represents the distance between a user in Ω_j and the center position O_q of user cluster C_q;
(b5) the clustering yields compact user clusters, so that on the basis of the division Ω_1, Ω_2, ..., Ω_n the users are further divided into smaller groups G_1, G_2, ..., G_m, realizing user subdivision.
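Steps (b2) to (b4) follow the usual k-means loop, which the following Python sketch illustrates. It makes several assumptions: center positions are treated as planar (x, y) points with Euclidean distance, the criterion E_j is taken as the sum of squared distances to the assigned center because formula VIII is not reproduced in this text, and convergence means E_j stops changing within a tolerance. The function name and the parameters `seed`, `max_iter`, and `tol` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans_users(points: List[Point], k: int, seed: int = 0,
                 max_iter: int = 100, tol: float = 1e-9):
    """Cluster users by their center positions O_i; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # (b2) k random users seed the clusters
    labels: List[int] = []
    prev_e = math.inf
    for _ in range(max_iter):
        # (b3) assign each user to the cluster with the nearest center
        labels = [min(range(k), key=lambda q: math.dist(p, centers[q]))
                  for p in points]
        # (b4) recompute each center as the mean position of its members
        for q in range(k):
            members = [p for p, lab in zip(points, labels) if lab == q]
            if members:  # an empty cluster keeps its previous center
                centers[q] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
        # Assumed criterion: sum of squared distances to the assigned center
        e = sum(math.dist(p, centers[lab]) ** 2
                for p, lab in zip(points, labels))
        if abs(prev_e - e) <= tol:
            break
        prev_e = e
    return labels, centers
```

Running this once per subinterval Ω_j and concatenating the resulting clusters gives the smaller groups G_1, ..., G_m. Real longitude-latitude data would need a geographic distance in place of `math.dist`.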
7. The method for filtering search results in a mobile scene according to claim 4, wherein in step 4.1 the user social network is constructed as follows:
Step 7.1: for each user U_i, calculate the weight of each dimension of its feature vector using the term frequency/inverse document frequency (TF/IDF) model: for each word q_a in q_1, q_2, ..., q_d, if it does not appear in the user's historical query records, its corresponding weight v_a is 0; otherwise the weight is its TF/IDF value, where TF is the number of times it occurs in the user's historical query records and IDF is the inverse document frequency, obtained by counting the number D of users whose historical query records contain the word; the IDF value is log(N/D), N is the number of all users, and the TF/IDF value is the product of TF and IDF;
Step 7.2: for each user U_i, analyze the user's call records within the latest period ΔT; for each call contact u_x, determine the total number of calls α, the total call duration β and the call regularity γ between u_x and U_i within ΔT, and calculate the degree of relationship ψ_ix between U_i and u_x using formula IX;
$$\psi_{ix} = \lambda_1 \cdot \alpha + \lambda_2 \cdot \beta + \lambda_3 \cdot \gamma \qquad \text{(Formula IX)}$$
In the formula, 0 ≤ λ_1 ≤ 1, 0 ≤ λ_2 ≤ 1, 0 ≤ λ_3 ≤ 1 and λ_1 + λ_2 + λ_3 = 1
Δt_h = t_h − t_{h−1}, h = 2, 3, ..., α
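Formula IX and the call intervals Δt_h can be illustrated with a small Python sketch. The function names are hypothetical, the call regularity γ is taken as an input because its formula is not reproduced in this text, and α, β and γ are assumed to have been scaled to comparable ranges beforehand.

```python
from typing import List, Sequence, Tuple


def call_intervals(timestamps: Sequence[float]) -> List[float]:
    """Delta t_h = t_h - t_(h-1) for h = 2..alpha, from ordered call times."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def relation_degree(alpha: float, beta: float, gamma: float,
                    lambdas: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)
                    ) -> float:
    """psi_ix = lambda1 * alpha + lambda2 * beta + lambda3 * gamma."""
    l1, l2, l3 = lambdas
    # Enforce the constraints stated with formula IX.
    if any(not 0.0 <= lam <= 1.0 for lam in lambdas):
        raise ValueError("each lambda must lie in [0, 1]")
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("lambdas must sum to 1")
    return l1 * alpha + l2 * beta + l3 * gamma
```

For example, with weights (0.5, 0.3, 0.2), ten calls, a scaled duration of 20 and a regularity of 0.5 give ψ = 5 + 6 + 0.1 = 11.1.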
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110458155 CN102591966B (en) | 2011-12-31 | 2011-12-31 | Filtering method of search results in mobile environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102591966A (en) | 2012-07-18 |
CN102591966B (en) | 2013-12-18 |
Family
ID=46480604
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110458155 Expired - Fee Related CN102591966B (en) | 2011-12-31 | 2011-12-31 | Filtering method of search results in mobile environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102591966B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102867031A (en) * | 2012-08-27 | 2013-01-09 | 百度在线网络技术(北京)有限公司 | Method and system for optimizing point of interest (POI) searching results, mobile terminal and server |
WO2014101846A1 (en) * | 2012-12-28 | 2014-07-03 | Huawei Technologies Co., Ltd. | Predictive caching in a distributed communication system |
CN104317900A (en) * | 2014-10-24 | 2015-01-28 | 重庆邮电大学 | Multiattribute collaborative filtering recommendation method oriented to social network |
CN104462239A (en) * | 2014-11-18 | 2015-03-25 | 电信科学技术第十研究所 | Customer relation discovery method based on data vectorization spatial analysis |
CN104866474A (en) * | 2014-02-20 | 2015-08-26 | 阿里巴巴集团控股有限公司 | Personalized data searching method and device |
CN105243135A (en) * | 2015-09-30 | 2016-01-13 | 百度在线网络技术(北京)有限公司 | Method and apparatus for showing search result |
CN106570699A (en) * | 2015-10-08 | 2017-04-19 | 平安科技(深圳)有限公司 | Client contact information excavation method and server |
CN111212381A (en) * | 2019-12-18 | 2020-05-29 | 中通服建设有限公司 | Mobile user behavior data analysis method and device, computer equipment and medium |
CN113220969A (en) * | 2020-02-06 | 2021-08-06 | 百度在线网络技术(北京)有限公司 | Advertisement determination method, device, equipment and storage medium |
CN113704604A (en) * | 2021-08-24 | 2021-11-26 | 山东库睿科技有限公司 | Search system and search method |
CN113792180A (en) * | 2021-08-30 | 2021-12-14 | 北京百度网讯科技有限公司 | Duplicate removal method and device in recommendation scene, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1903460A1 (en) * | 2006-09-21 | 2008-03-26 | Sony Corporation | Information processing |
CN101819572A (en) * | 2009-09-15 | 2010-09-01 | 电子科技大学 | Method for establishing user interest model |
CN101923545A (en) * | 2009-06-15 | 2010-12-22 | 北京百分通联传媒技术有限公司 | Method for recommending personalized information |
CN102236646A (en) * | 2010-04-20 | 2011-11-09 | 得利在线信息技术(北京)有限公司 | Personalized item-level vertical pagerank algorithm iRank |
Non-Patent Citations (2)
Title |
---|
王秀平等: "个性化学习推荐系统的设计与实现", 《微型电脑应用》 * |
胡娟丽等: "基于典型反馈的个性化文本信息过滤", 《计算机应用》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102867031A (en) * | 2012-08-27 | 2013-01-09 | 百度在线网络技术(北京)有限公司 | Method and system for optimizing point of interest (POI) searching results, mobile terminal and server |
WO2014101846A1 (en) * | 2012-12-28 | 2014-07-03 | Huawei Technologies Co., Ltd. | Predictive caching in a distributed communication system |
CN104866474A (en) * | 2014-02-20 | 2015-08-26 | 阿里巴巴集团控股有限公司 | Personalized data searching method and device |
CN104866474B (en) * | 2014-02-20 | 2018-10-09 | 阿里巴巴集团控股有限公司 | Individuation data searching method and device |
CN104317900A (en) * | 2014-10-24 | 2015-01-28 | 重庆邮电大学 | Multiattribute collaborative filtering recommendation method oriented to social network |
CN104462239B (en) * | 2014-11-18 | 2017-08-25 | 电信科学技术第十研究所 | A kind of customer relationship based on data vector spatial analysis finds method |
CN104462239A (en) * | 2014-11-18 | 2015-03-25 | 电信科学技术第十研究所 | Customer relation discovery method based on data vectorization spatial analysis |
CN105243135B (en) * | 2015-09-30 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | Show the method and device of search result |
CN105243135A (en) * | 2015-09-30 | 2016-01-13 | 百度在线网络技术(北京)有限公司 | Method and apparatus for showing search result |
CN106570699A (en) * | 2015-10-08 | 2017-04-19 | 平安科技(深圳)有限公司 | Client contact information excavation method and server |
CN111212381A (en) * | 2019-12-18 | 2020-05-29 | 中通服建设有限公司 | Mobile user behavior data analysis method and device, computer equipment and medium |
CN113220969A (en) * | 2020-02-06 | 2021-08-06 | 百度在线网络技术(北京)有限公司 | Advertisement determination method, device, equipment and storage medium |
CN113704604A (en) * | 2021-08-24 | 2021-11-26 | 山东库睿科技有限公司 | Search system and search method |
CN113792180A (en) * | 2021-08-30 | 2021-12-14 | 北京百度网讯科技有限公司 | Duplicate removal method and device in recommendation scene, electronic equipment and storage medium |
CN113792180B (en) * | 2021-08-30 | 2024-02-23 | 北京百度网讯科技有限公司 | Method and device for removing duplicate in recommended scene, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102591966B (en) | 2013-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102591966A (en) | Filtering method of search results in mobile environment | |
Mirzasoleiman et al. | Deletion-robust submodular maximization: Data summarization with “the right to be forgotten” | |
Khrouf et al. | Hybrid event recommendation using linked data and user diversity | |
US9195679B1 (en) | Method and system for the contextual display of image tags in a social network | |
CN105138653B (en) | It is a kind of that method and its recommendation apparatus are recommended based on typical degree and the topic of difficulty | |
CN101320375A (en) | Digital book search method based on user click action | |
CN108415928B (en) | Book recommendation method and system based on weighted mixed k-nearest neighbor algorithm | |
Kwak et al. | What we read, what we search: Media attention and public attention among 193 countries | |
CN105718576A (en) | Individual position recommending system related to geographical features | |
CN115408618B (en) | Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features | |
KR20120033821A (en) | System and method for providing search result based on personal network | |
CN106709076A (en) | Social network recommendation device and method based on collaborative filtering | |
CN105654267A (en) | Cold-chain logistic stowage intelligent recommendation method based on spectral cl9ustering | |
CN114282120A (en) | Graph embedding interest point recommendation algorithm fusing multidimensional relation | |
CN116383519A (en) | Group recommendation method based on double weighted self-attention | |
Liu et al. | Clustering analysis of urban fabric detection based on mobile traffic data | |
Sinha | Summarization of archived and shared personal photo collections | |
Cohen et al. | Leveraging discarded samples for tighter estimation of multiple-set aggregates | |
Shiratsuchi et al. | Finding unknown interests utilizing the wisdom of crowds in a social bookmark service | |
CN105447013A (en) | News recommendation system | |
Al-Ghossein et al. | Exploiting contextual and external data for hotel recommendation | |
CN108710620B (en) | Book recommendation method based on k-nearest neighbor algorithm of user | |
Badami et al. | Cross-domain hashtag recommendation and story revelation in social media | |
CN102163227A (en) | Method for analyzing web social network behavior tracks and obtaining control subsets | |
CN115618127A (en) | Collaborative filtering algorithm of neural network recommendation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20131218 Termination date: 20201231 |