CN102591966A - Filtering method of search results in mobile environment - Google Patents

Publication number
CN102591966A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011104581556A
Other languages
Chinese (zh)
Other versions
CN102591966B (en
Inventor
金海
赵峰
袁平鹏
严奉伟
方飞
谢海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN 201110458155
Publication of CN102591966A
Application granted
Publication of CN102591966B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a filtering method of search results in a mobile environment. The method comprises the steps of: finely dividing users into different groups according to history position information characteristics of the users; characteristically modeling the users according to the history query records of the users; analyzing history call records of the users, establishing a social intercourse relation network of the users and calculating the social intercourse relation importance among the users; and during search, firstly, filtering the search results based on contents by using an established user characteristic model, secondly, cooperatively filtering the search results with the finely divided user group information and the excavated information of the social intercourse relation network of the users, and thirdly, returning the search results to the users. With the method for excavating the user characteristics and filtering the information, the search results can be better filtered in a personalized way, a mass of unrelated search results can be removed, a result set can be simplified, and the personalized precise search in the mobile environment can be realized.

Description

Search result filtering method in mobile scene
Technical Field
The invention belongs to the field of information retrieval, and particularly relates to a search result filtering method in a mobile scene.
Background
In the past decade, the technology of search engines has been rapidly developed, and the traditional internet search has been developed from technical implementation to business model, and has been extremely mature and successful. In recent years, emerging technologies and applications represented by the mobile internet are emerging, and mobile search is one of important applications of the mobile internet.
Because mobile terminals are constrained in mobility, portability, screen size, processing capability and available bandwidth, mobile search cannot simply reuse conventional internet search implementations, for two main reasons. (1) Conventional internet search engines typically return a large number of results, and in more than half of cases most of those results are irrelevant to the user. One main cause is that the search engine simply matches the search keywords without considering other information (such as user context or personal preference), while the proliferation of information on the internet produces many "junk results"; users must filter the results themselves, which greatly increases their burden. In a mobile scene, the limits on screen size, keyboard, processing capacity and available bandwidth make this intolerable: junk results waste precious data traffic, and paging through and screening results on a mobile terminal is inconvenient. Mobile search must therefore be accurate, returning a small number of accurate results. (2) For the same search keyword, a general-purpose internet search engine returns results to all users by one uniform rule, yet users differ in background knowledge, interests and hobbies, and therefore in information needs.
The mobility, portability and privacy of the mobile terminal enable a user to acquire required information anytime and anywhere, so that the personalized search requirement is stronger, which determines that the mobile search is a personalized search related to personal characteristics (such as interests and the like) of the user and the context (such as time, place, weather and the like) of the user.
Therefore, what mobile search needs to achieve is personalized, accurate search. At present, domestic mobile search research is still at an early stage, and its implementation technology is less mature than existing internet search technology. The earliest approach was vertical search, such as mobile phone music search and novel search. Most current schemes combine existing internet search technology with auxiliary technologies such as information filtering: a feature model is first built for the user, the search results are then filtered through that model so that irrelevant results are removed, and personalized accurate search is achieved.
The common techniques for user feature modeling include a vector space model and an ontology model, and the vector space model is simple in principle, easy to implement and relatively wide in application.
Information filtering commonly uses content-based filtering and collaborative filtering. Content-based filtering extracts features from each result, calculates the similarity between the result and a filtering template (the user model), and filters according to a set threshold. Because the result content itself is analyzed, it generally filters well, but the amount of calculation is larger. Collaborative filtering rests on the idea that people of the same type usually share interests and preferences: the search results of the current user are filtered with the help of users whose interests are similar. This technique is well developed and widely applied in electronic commerce.
Disclosure of Invention
The invention aims to provide a method for filtering search results in a mobile scene, which builds a user characteristic model and a user social network by mining user data (user historical position information, historical call records and the like), respectively carries out content-based filtering and collaborative filtering on the search results according to the user characteristic model and the user social network, filters irrelevant search results, realizes personalized accurate search in the mobile scene, and is valuable for improving the mobile search user experience and the user stickiness.
The invention provides a method for filtering search results in a mobile scene, which comprises the following steps:
Step 1: for the initial result set $R_1, R_2, \ldots, R_Z$ to be filtered for user $U_i$ ($i = 1, 2, \ldots, N$), establish a feature vector for each result using the d-dimensional vector space. The feature vector of $R_r$ is expressed as $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ represents the weight in each dimension. The weight $v_a$ of $f_{R_r}$ in each dimension is calculated with the term frequency/inverse document frequency (TF/IDF) model: for each word $q_a$ in $q_1, q_2, \ldots, q_d$, if it does not occur in $R_r$ its weight is 0; otherwise its weight is its TF/IDF value. TF is the number of times the word occurs in $R_r$; IDF is the inverse document frequency, obtained by counting the number $z$ of results that contain the word;
wherein the IDF value is $\log(Z/z)$, $Z$ is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$;
Step 2: search for the users similar to the current user $U_i$. Similar users are selected from two user sets: one is the group $G_g$ to which the user belongs, where $g$ is the serial number of that group and ranges from 1 to $m$; the other is the set of users in the user's social network. The two sets are merged to obtain a set $S$, whose users are denoted $U_{is}$. The similarity between $U_i$ and each user $U_{is}$ in $S$ is calculated by formula I, using the vector cosine formula shown in formula II: the smaller the angle between the vectors, the larger the cosine value and the greater the similarity, and vice versa. Here $i$ denotes the serial number of the user, $N$ denotes the number of users, $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ represent the feature vectors of $U_i$ and $U_{is}$ respectively; $\psi(U_i, U_{is})$ represents the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it is zero. The top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if $S$ contains fewer than $\eta$ users, all users in $S$ are selected. $\eta$ is a preset value;
$$sim(U_i, U_{is}) = (1 + \psi(U_i, U_{is})) \cdot \cos(f_{U_i}, f_{U_{is}}) \quad \text{(formula I)}$$
$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\|f_{U_i}\| \cdot \|f_{U_{is}}\|} \quad \text{(formula II)}$$
Step 3, content-based filtering:
for each initial result $R_r$ to be filtered, calculate in turn its similarity to user $U_i$ with formula III, where $f_{U_i}$ and $f_{R_r}$ represent the feature vectors of $U_i$ and $R_r$ respectively. Filter by similarity against a preset threshold $\zeta$, removing the initial results whose similarity is smaller than $\zeta$, to obtain an intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$; the intermediate results keep their original order;
$$sim(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \quad \text{(formula III)}$$
wherein
$$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\|f_{U_i}\| \cdot \|f_{R_r}\|}$$
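The content-based stage of step 3 can be sketched in a few lines of Python. This is a minimal illustration and not the patent's implementation: the function names `cosine` and `content_filter` are hypothetical, feature vectors are assumed to be plain lists of weights over the same d words, and an all-zero vector is assumed to have similarity 0. The default threshold of 0.65 is the default value the description gives for ζ.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors.
    Assumption: an all-zero vector has similarity 0.0 with everything."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def content_filter(user_vec, result_vecs, zeta=0.65):
    """Formula III plus thresholding: keep the indices of results whose
    similarity to the user is at least zeta, preserving the original order."""
    return [r for r, vec in enumerate(result_vecs)
            if cosine(user_vec, vec) >= zeta]

# Toy data over a 3-word vocabulary (values are illustrative only).
user = [1.0, 0.0, 1.0]
results = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
kept = content_filter(user, results)
```

Returning indices rather than the results themselves keeps the original order available to the later ranking step, which uses each result's position.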
Step 4: perform collaborative filtering on the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$. Using the $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ most similar to user $U_i$, calculate the similarity $sim'(U_i, R_r)$ for each intermediate result $R_r$ according to formula IV, in which $\cos(f_{U_{is}}, f_{U_i})$ and $\cos(f_{U_{is}}, f_{R_r})$ represent the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$ respectively;
$$sim'(U_i, R_r) = \sum_{s=1}^{\eta} \left( \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \right) \quad \text{(formula IV)}$$
$$Rank_r = \theta \cdot r + (1 - \theta) \cdot sim'(U_i, R_r) \quad \text{(formula V)}$$
Collaborative filtering is performed on $sim'(U_i, R_r)$ against a preset threshold $\varepsilon$: intermediate results whose similarity is smaller than $\varepsilon$ are removed, giving a temporary result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where $r$ denotes the position in the temporary result set. For each temporary result $R_r$, formula V then uses a preset weighting factor $\theta$ to compute the weighted sum of the position $r$ and $sim'(U_i, R_r)$ as the final ranking value $Rank_r$. The temporary result set is reordered by $Rank_r$ to obtain the final result, which is returned to the user, and the filtering process ends.
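As a rough sketch of formulas IV and V, the Python below scores each intermediate result against the η similar users and then computes the ranking value. The names `collaborative_scores` and `filter_and_rank` are hypothetical, as are the toy vectors and the chosen ε and θ. The text does not state whether the final list is sorted by ascending or descending Rank, so the sketch returns the Rank values and leaves the sort direction to the caller.

```python
import math

def cosine(u, v):
    """Cosine similarity; all-zero vectors are assumed to score 0.0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def collaborative_scores(user_vec, neighbor_vecs, result_vecs):
    """Formula IV: for each result, sum over the similar users of
    cos(neighbor, user) * cos(neighbor, result)."""
    return [
        sum(cosine(n, user_vec) * cosine(n, r) for n in neighbor_vecs)
        for r in result_vecs
    ]

def filter_and_rank(scores, epsilon, theta):
    """Drop results scoring below epsilon, then apply formula V.
    Returns (original_index, score, rank_value) tuples, where r is the
    1-based position of the survivor in the temporary result set."""
    survivors = [(i, s) for i, s in enumerate(scores) if s >= epsilon]
    return [(i, s, theta * r + (1 - theta) * s)
            for r, (i, s) in enumerate(survivors, start=1)]

# Toy data: two similar users, two intermediate results.
user = [1.0, 0.0]
neighbors = [[1.0, 0.0], [1.0, 1.0]]
results = [[1.0, 0.0], [0.0, 1.0]]
scores = collaborative_scores(user, neighbors, results)
ranked = filter_and_rank(scores, epsilon=1.0, theta=0.5)
```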
The search result filtering method for the mobile scene combines data mining methods (classification and clustering) with a content-based filtering algorithm and a collaborative filtering algorithm. Specifically, the present invention has the following effects and advantages:
(1) The invention has high accuracy: it analyzes the user's social network information and performs collaborative filtering on top of traditional content-based filtering, which greatly improves accuracy.
(2) The invention has strong adaptability, and can well adapt to the individual requirements of various user groups and individuals in consideration of the diversity of the mobile user groups and individuals.
(3) The method has high expandability, can be used for mobile search, mobile internet application, accurate advertisement delivery and the like, and can also be used for Customer Relationship Management (CRM) and the like.
Drawings
FIG. 1 is an overall flow diagram of the process of the present invention;
FIG. 2 is a simplified diagram of a mobile user's historical location change frequency;
FIG. 3 is a flow chart of mobile user clustering by location;
FIG. 4 is a diagram of a mobile user social network architecture;
FIG. 5 is a flow diagram illustrating the detailed filtering of mobile search results.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
As shown in FIG. 1, the method for filtering search results in a mobile scene comprises a filtering preprocessing stage and a result filtering stage. The preprocessing stage mainly comprises user segmentation, user feature model construction and user social network construction, corresponding to steps (1) to (3) below; the result filtering stage corresponds to step (4) below. The specific processing steps are as follows:
1. Filtering preprocessing stage, comprising the following steps (1) to (3).
(1) User segmentation. A data mining method is used to subdivide the users. The user data set provided by an existing telecommunication operator collects a large amount of user data, such as historical position information, historical call records, historical query and browsing records, and historical service data. The invention subdivides users mainly according to their historical position information, in the following specific steps:
(a) Divide users according to the change frequency of their historical positions. The historical position information of a user records the user's historical positions L and the corresponding time information T. A position L is recorded in the data set as longitude and latitude, for example (30.2332, 114.3243), and the time T is recorded as a time point. Given the longitude and latitude of two adjacent historical positions, their distance is easily calculated with the longitude-latitude distance formula (1). Let the first position $L_1$ have longitude and latitude $(lon_1, lat_1)$ and the second position $L_2$ have $(lon_2, lat_2)$. Taking the 0-degree meridian as reference, east longitude is positive and west longitude is negative; a north latitude is replaced by (90° - lat) and a south latitude by (90° + lat) before substitution. The distance between the two points is then calculated with formula (1).
$$C = \sin(lat_1) \cdot \sin(lat_2) \cdot \cos(lon_1 - lon_2) + \cos(lat_1) \cdot \cos(lat_2)$$
$$Dis(L_1, L_2) = R \cdot \arccos(C) \cdot \frac{\pi}{180} \quad (1)$$
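Formula (1) can be illustrated with a short Python function. This is a sketch under stated assumptions: R is taken to be the mean Earth radius of 6371 km (the text names R without giving a value), inputs are signed degrees with north and east positive so that the single substitution 90 - lat covers both hemispheres, and the function name `distance_km` is hypothetical. Because `math.acos` returns radians, the π/180 factor of formula (1), which converts an arccos expressed in degrees, is not needed here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius; the source only names R

def distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance following formula (1).
    Inputs are signed degrees (east and north positive). Latitudes are
    converted to colatitudes (90 - lat), as the text prescribes."""
    colat1 = math.radians(90.0 - lat1)
    colat2 = math.radians(90.0 - lat2)
    dlon = math.radians(lon1 - lon2)
    c = (math.sin(colat1) * math.sin(colat2) * math.cos(dlon)
         + math.cos(colat1) * math.cos(colat2))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return EARTH_RADIUS_KM * math.acos(c)  # acos is in radians already

# A quarter of the equator: from (lon 0, lat 0) to (lon 90, lat 0).
quarter = distance_km(0.0, 0.0, 90.0, 0.0)
same_point = distance_km(114.3243, 30.2332, 114.3243, 30.2332)
```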
For each user $U_i$ ($i = 1, 2, \ldots, N$, where $N$ is the number of users), calculate the accumulated position change frequency $F_i$ over the most recent period $\Delta T$ (for example, one month):
$$F_i = \frac{1}{\Delta T} \sum_{k=2}^{M} \left| \frac{Dis(L_k, L_{k-1})}{T_k - T_{k-1}} \right| \quad (2)$$
In formula (2), $(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ is the historical position information of user $U_i$ within the most recent period $\Delta T$; $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two adjacent historical positions of the user with their time information; $Dis(L_k, L_{k-1})$ and $T_k - T_{k-1}$ are the distance and the time difference between them. $M$ is the number of historical positions of the current user and $k$ is the serial number of a historical position; the sum starts at $k = 2$ because the first position has no predecessor.
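A minimal Python sketch of formula (2) follows. The function name `change_frequency` and the track layout (a time-sorted list of (location, time) pairs) are assumptions made for illustration, and the demo uses a planar Euclidean distance in place of the longitude-latitude distance so that the arithmetic is easy to check by hand. Skipping pairs with identical timestamps is also an assumption, since the text does not say how to treat them.

```python
import math

def change_frequency(track, delta_t, dist):
    """Formula (2): sum |distance / time gap| over consecutive position
    fixes, then divide by the length of the observation window delta_t.
    track: list of (location, time) pairs sorted by time.
    dist: callable returning the distance between two locations."""
    total = 0.0
    for (loc_prev, t_prev), (loc_cur, t_cur) in zip(track, track[1:]):
        gap = t_cur - t_prev
        if gap == 0:
            continue  # assumption: skip duplicate timestamps
        total += abs(dist(loc_cur, loc_prev) / gap)
    return total / delta_t

# Toy track on a plane: move 5 units in 1 time unit, then stay put.
# math.dist (Python 3.8+) stands in for the longitude-latitude distance.
track = [((0.0, 0.0), 0.0), ((3.0, 4.0), 1.0), ((3.0, 4.0), 3.0)]
f = change_frequency(track, delta_t=10.0, dist=math.dist)
```

Passing the distance function as a parameter lets the same code run with a great-circle distance on real longitude and latitude data.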
The F values of all users are collected to obtain the total range $\Omega$ of F, and $\Omega$ is divided into subintervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where $n$ is the number of user groups. Each subinterval represents a different user group, and each user is assigned to the subinterval that contains the user's F. As shown in FIG. 2, user A has a high F and may be a business person who travels frequently, while user B has a low F and may stay in a fixed place for long periods, for example a college student. Users are thus divided into groups $\Omega_1, \Omega_2, \ldots, \Omega_n$ according to position change frequency F. $\Omega$ may be divided into equal parts, or according to a division standard preset by the system.
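The equal-division option can be sketched as follows. The function name `assign_groups` and the 0-based group indices are illustrative choices, and placing the maximum F in the last subinterval is an assumption about how the upper boundary is handled.

```python
def assign_groups(freqs, n):
    """Split [min(F), max(F)] into n equal-width subintervals and return
    the 0-based subinterval index of each user's F value."""
    lo, hi = min(freqs), max(freqs)
    width = (hi - lo) / n
    if width == 0:
        return [0] * len(freqs)  # all users share one F value
    # The maximum F would land in index n, so clamp it into the last group.
    return [min(int((f - lo) / width), n - 1) for f in freqs]

# Five users, five groups: subinterval width is (10 - 0) / 5 = 2.
groups = assign_groups([0.0, 1.0, 2.5, 9.9, 10.0], n=5)
```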
(b) Then, for each $\Omega_j$ ($j = 1, 2, \ldots, n$, where $j$ is the serial number of the group), cluster the users in the group according to their historical position information, so that users at adjacent positions fall into one class. Related research shows that users at adjacent geographical positions have similar user characteristics to a certain extent. A k-means clustering algorithm is used to cluster the users in each $\Omega_j$ as follows:
(b1) First, calculate the center position $O_i$ of the historical positions of each user $U_i$ ($i = 1, 2, \ldots, N$, where $i$ is the user's serial number) within $\Delta T$; users are clustered according to $O_i$.
(b2) Randomly select k users from $\Omega_j$. Each selected user $U_q$ ($q = 1, 2, \ldots, k$) represents an initial user cluster $C_q$, and its center position $O_q$ is the initial center of that cluster.
(b3) For each remaining user in $\Omega_j$, calculate the distance (by the longitude-latitude distance formula) to the center $O_q$ of each user cluster $C_q$, and assign the user to the nearest cluster.
(b4) Recalculate the center $O_q$ of each user cluster and replace the old value. Calculate the criterion function $E_j$ according to formula (3). If the value of $E_j$ has converged, the clustering process ends; otherwise go to step (b3).
$$E_j = \sum_{q=1}^{k} \sum_{U \in \Omega_j} Dis(U, C_q), \quad j = 1, 2, \ldots, n \quad (3)$$
In formula (3), $Dis(U, C_q)$ is the distance between a user in $\Omega_j$ and the center $O_q$ of user cluster $C_q$.
Clustering yields compact user clusters, so that on the basis of the division into $\Omega_1, \Omega_2, \ldots, \Omega_n$ the users are further divided into smaller groups $G_1, G_2, \ldots, G_m$, which completes the user segmentation.
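Steps (b2) to (b4) follow the usual k-means loop, sketched below in Python. Several things here are assumptions for illustration: the function name `kmeans`, planar Euclidean distance on the users' center positions instead of the longitude-latitude distance, initial centers passed in explicitly so the demo is deterministic (step (b2) picks them at random, for example with `random.sample`), and a criterion E that sums each user's distance to the center of the cluster it was assigned to, which is the common k-means reading of formula (3).

```python
import math

def kmeans(points, initial_centers, max_iter=100, tol=1e-9):
    """Assign each point to its nearest center, recompute the centers,
    and stop once the criterion E changes by less than tol."""
    centers = list(initial_centers)
    prev_e = None
    labels = []
    for _ in range(max_iter):
        # (b3) nearest-center assignment
        labels = [min(range(len(centers)),
                      key=lambda q: math.dist(p, centers[q]))
                  for p in points]
        # (b4) recompute each center as the mean of its members
        for q in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == q]
            if members:  # keep the old center if a cluster is empty
                centers[q] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
        # criterion: total distance of points to their own cluster center
        e = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
        if prev_e is not None and abs(prev_e - e) < tol:
            break
        prev_e = e
    return labels, centers

# Two obvious clusters; the first and third users seed the centers.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, initial_centers=[points[0], points[2]])
```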
(2) Construct the user feature model. A user's historical query records represent the user's interests well, so the invention analyzes these records and models each user with a vector space model, in the following steps:
(a) Collect all historical query records of all users within $\Delta T$ and obtain the $d$ distinct words $q_1, q_2, \ldots, q_d$ they contain; these words serve as the $d$ dimensions of the vector space. The feature vector of a user is expressed as $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, $i = 1, 2, \ldots, N$, where $v_a$ ($a = 1, 2, \ldots, d$) is the weight of each dimension.
(b) Use the TF/IDF (term frequency/inverse document frequency) model to calculate, for each user $U_i$ ($i = 1, 2, \ldots, N$), the weight of each dimension of the feature vector. For each word $q_a$ ($a = 1, 2, \ldots, d$) in $q_1, q_2, \ldots, q_d$, if it does not appear in the user's historical query records, its weight $v_a$ is 0; otherwise $v_a$ is its TF/IDF value. TF is the term frequency, that is, the number of times the word appears in the user's historical query records. IDF is the inverse document frequency: count the number $D$ of users whose historical query records contain the word; the IDF value is $\log(N/D)$, where $N$ is the number of all users. The TF/IDF value is the product of TF and IDF.
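The user feature model can be sketched in Python as below. This is an illustration under assumptions: each user's history is already tokenized into a list of words, the logarithm is natural (the text writes log without a base), D is read as the number of users whose history contains the word, and the function name `build_user_vectors` is hypothetical.

```python
import math
from collections import Counter

def build_user_vectors(histories):
    """histories: one list of query words per user.
    Returns (vocabulary, vectors) where vectors[i][a] is TF * log(N / D)
    for vocabulary word a in user i's history, and 0 if the word is absent."""
    n_users = len(histories)
    vocab = sorted({w for h in histories for w in h})
    # D per word: number of users whose history contains the word
    doc_freq = Counter(w for h in histories for w in set(h))
    vectors = []
    for h in histories:
        tf = Counter(h)  # a missing word has count 0, so its weight is 0
        vectors.append([tf[w] * math.log(n_users / doc_freq[w])
                        for w in vocab])
    return vocab, vectors

histories = [["hotel", "flight", "hotel"], ["flight", "exam"]]
vocab, vectors = build_user_vectors(histories)
```

In this toy data "flight" appears in every user's history, so its IDF is log(2/2) = 0 and it contributes nothing to either vector, which is the intended effect of the IDF factor.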
(3) Mine the users' social network information by analyzing their historical call records. For each user $U_i$ ($i = 1, 2, \ldots, N$), the social network appears as a star topology centered on the user. As shown in FIG. 4, the center node B represents the user, the surrounding nodes A, C, D, E, F, G, etc. represent users who have call records with B, and the edge weight $\psi$ represents the degree of relationship between users. This step mainly estimates the value of $\psi$.
The historical call record data records the calls among all users, including the id numbers of both parties, the call start time, the call end time and so on. For each user $U_i$ ($i = 1, 2, \ldots, N$), the call records within $\Delta T$ are analyzed. For each user $u_x$ ($x = 1, 2, \ldots, e$, where $e$ is the number of users who have call records with $U_i$), the total number of calls $\alpha$, the total call duration $\beta$ and the call regularity $\gamma$ with $U_i$ within $\Delta T$ are analyzed; combining these factors gives a rough estimate of the degree of relationship $\psi_{ix}$ between $U_i$ and $u_x$.
The total number of calls $\alpha$ and the total call duration $\beta$ are easy to count, but they are coarse aggregate statistics: they only estimate the relationship roughly and ignore important details, such as whether the calls are evenly distributed over time, globally or only locally. The call regularity $\gamma$ is therefore introduced as a further feature of the degree of relationship between $U_i$ and $u_x$. It is obtained by statistically analyzing the time distribution of all call events within $\Delta T$, borrowing the idea of variance, as shown in formulas (4) to (7). Here $t_h$ ($h = 1, 2, \ldots, \alpha$) is the start time of each call, $\Delta t_h$ is the time difference between two adjacent call records, $\overline{\Delta t}$ is the mean of these differences, and $S_t$ is their variance. $\gamma$ is inversely proportional to $S_t$, as shown in formula (7): a small variance indicates that the calls are regular over the period, so $\gamma$ is correspondingly large, and vice versa.
$$\Delta t_h = t_h - t_{h-1}, \quad h = 2, 3, \ldots, \alpha \quad (4)$$
$$\overline{\Delta t} = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \Delta t_h \quad (5)$$
$$S_t = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \left( \overline{\Delta t} - \Delta t_h \right)^2 \quad (6)$$
$$\gamma = \frac{1}{S_t} \quad (7)$$
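Formulas (4) to (7) translate into a few lines of Python. The function name `call_regularity` is hypothetical, and the two edge cases are assumptions the text does not cover: fewer than two calls gives γ = 0, and perfectly even gaps (zero variance) give an infinite γ that a caller would need to cap before normalizing.

```python
def call_regularity(start_times):
    """Gaps between consecutive call start times (formula 4), their mean
    (formula 5), their variance S_t (formula 6), and gamma = 1 / S_t
    (formula 7)."""
    times = sorted(start_times)
    alpha = len(times)
    if alpha < 2:
        return 0.0  # assumption: no gaps exist, so treat gamma as 0
    gaps = [b - a for a, b in zip(times, times[1:])]  # alpha - 1 gaps
    mean_gap = sum(gaps) / (alpha - 1)
    variance = sum((mean_gap - g) ** 2 for g in gaps) / (alpha - 1)
    if variance == 0:
        return float("inf")  # assumption: perfectly regular calls
    return 1.0 / variance

# Gaps of 10 and 20 have mean 15 and variance 25, so gamma = 1/25.
gamma_irregular = call_regularity([0, 10, 30])
gamma_regular = call_regularity([0, 10, 20])
```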
The calculated $\alpha$, $\beta$ and $\gamma$ are normalized to values between 0 and 1, and $\psi_{ix}$ ($i = 1, 2, \ldots, N$; $x = 1, 2, \ldots, e$) is then calculated by formula (8) as a weighted value that takes $\alpha$, $\beta$ and $\gamma$ into account together. In formula (8), $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$ and $\lambda_1 + \lambda_2 + \lambda_3 = 1$; by default each weight takes the average value 1/3.
$$\psi_{ix} = \lambda_1 \cdot \alpha + \lambda_2 \cdot \beta + \lambda_3 \cdot \gamma, \quad \lambda_1 + \lambda_2 + \lambda_3 = 1 \quad (8)$$
Through the analysis and calculation of this step, the social network information of each user $U_i$ ($i = 1, 2, \ldots, N$) is obtained, including the degree of relationship $\psi_{ix}$ between $U_i$ and each connected user $u_x$ ($x = 1, 2, \ldots, e$).
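The weighted combination of formula (8) might be sketched as follows. The text says α, β and γ are normalized to between 0 and 1 but does not say how, so min-max scaling across one user's contacts is an assumption here, as are the function names `min_max` and `relationship_degrees` and the toy numbers.

```python
def min_max(values):
    """Scale values to [0, 1]. Assumption: min-max scaling; if all values
    are equal, every scaled value is 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def relationship_degrees(alphas, betas, gammas, weights=(1/3, 1/3, 1/3)):
    """Formula (8): psi = l1*alpha + l2*beta + l3*gamma on normalized
    inputs, one value per contact; weights must sum to 1."""
    l1, l2, l3 = weights
    a, b, g = min_max(alphas), min_max(betas), min_max(gammas)
    return [l1 * x + l2 * y + l3 * z for x, y, z in zip(a, b, g)]

# Two contacts: the second is called more often, longer, and more regularly.
psi = relationship_degrees(alphas=[2, 10], betas=[60, 600], gammas=[0.1, 0.5])
```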
(4) Filter the search results. Steps (1) to (3) above are preparation for this step. The user feature model established in step (2) is used for content-based filtering of the search results; the user segmentation from step (1) and the social network information mined in step (3) are used for collaborative filtering of the search results.
This step applies content-based filtering first and collaborative filtering second, so that the search results are both personalized and reduced.
User $U_i$ ($i = 1, 2, \ldots, N$) submits a search Q. The search request is first processed by an existing internet search engine, which returns an initial result set for Q. This set is usually large, so the first $\varphi$ results are selected for filtering; if there are fewer than $\varphi$ results, the whole initial result set is selected. The selected results form the result set $R_1, R_2, \ldots, R_Z$ to be filtered, where $\varphi$ is an empirical value preset by the system (for example 300) and $Z$ is the number of results to be filtered. The result filtering flow is shown in FIG. 5 and has the following steps:
(a) For the result set $R_1, R_2, \ldots, R_Z$ to be filtered, establish a feature vector for each result using the d-dimensional vector space established in step (2). The feature vector of $R_r$ ($r = 1, 2, \ldots, Z$) is denoted $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ ($a = 1, 2, \ldots, d$) is the weight in each dimension. The weights are calculated with the same TF/IDF (term frequency/inverse document frequency) model used in step (2): for each word $q_a$ in $q_1, q_2, \ldots, q_d$, if it does not occur in $R_r$ its weight is 0; otherwise its weight is its TF/IDF value. TF is the number of times the word occurs in $R_r$. IDF is the inverse document frequency: count the number $z$ of results that contain the word; the IDF value is $\log(Z/z)$, where $Z$ is the total number of results. The TF/IDF value is the product of TF and IDF.
(b) Then search for the users similar to the current user $U_i$ ($i = 1, 2, \ldots, N$). Similar users are selected from two user sets: one is the group $G_g$ to which the user belongs from step (1), where $g$ is the serial number of that group and ranges from 1 to $m$; the other is the user's social network established in step (3). The two sets are merged (they may contain duplicate users) to obtain a set $S$, and several similar users are selected from $S$.
<math> <mrow> <mi>sim</mi> <mrow> <mo>(</mo> <msub> <mi>U</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>U</mi> <mi>is</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>+</mo> <mi>&psi;</mi> <mrow> <mo>(</mo> <msub> <mi>U</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>U</mi> <mi>is</mi> </msub> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mo>&CenterDot;</mo> <mi>cos</mi> <mrow> <mo>(</mo> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>i</mi> </msub> </msub> <mo>,</mo> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>is</mi> </msub> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>9</mn> <mo>)</mo> </mrow> </mrow> </math>
<math> <mrow> <mi>cos</mi> <mrow> <mo>(</mo> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>i</mi> </msub> </msub> <mo>,</mo> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>is</mi> </msub> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>i</mi> </msub> </msub> <mo>&CenterDot;</mo> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>is</mi> </msub> </msub> </mrow> <mrow> <mo>|</mo> <mo>|</mo> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>is</mi> </msub> </msub> <mo>|</mo> <mo>|</mo> <mo>&CenterDot;</mo> <mo>|</mo> <mo>|</mo> <msub> <mi>f</mi> <msub> <mi>U</mi> <mi>is</mi> </msub> </msub> <mo>|</mo> <mo>|</mo> </mrow> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>10</mn> <mo>)</mo> </mrow> </mrow> </math>
In the formula (10), | | | | represents a modulus of the vector.
The similarity between $U_i$ ($i = 1, 2, \ldots, N$) and each user $U_{is}$ in the set S is calculated as shown in formula (9), using the vector cosine formula (10): the smaller the angle between the two vectors, the larger the cosine value and the greater the similarity, and vice versa. $f_{U_i}$ and $f_{U_{is}}$ denote the feature vectors of $U_i$ and $U_{is}$ respectively, and $\psi(U_i, U_{is})$ denotes the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value; otherwise it is zero. The top η users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity; if S contains fewer than η users, all users in S are selected. η is an empirical value preset by the system, and its default value may be 10.
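A short Python sketch of the selection in step (b), under stated assumptions: the merged set S is passed as a dictionary from a user identifier to a feature vector (which also removes duplicate users), the relationship degrees ψ are passed as a second dictionary, and an all-zero feature vector is treated as having cosine 0 with everything. The names `cosine` and `top_similar_users` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors, as in formula (10)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def top_similar_users(current_vec, candidates, relation, eta=10):
    """Return the ids of the eta most similar users, most similar first.

    candidates -- dict: user id -> feature vector (the merged set S)
    relation   -- dict: user id -> psi, only for users in the social network
    """
    scored = []
    for uid, vec in candidates.items():
        psi = relation.get(uid, 0.0)  # zero outside the social network
        scored.append(((1.0 + psi) * cosine(current_vec, vec), uid))  # formula (9)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored[:eta]]  # all of S if it has fewer than eta users
```

Because ψ only ever increases the score, a social contact with a moderately similar query history can outrank a stranger with a slightly more similar one.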
(c) Result filtering then begins. The filtering process is divided into two stages, a content-based filtering stage and a collaborative filtering stage:
(c1) First, content-based filtering is performed. For each initial result $R_r$ ($r = 1, 2, \ldots, Z$) to be filtered from (a), its similarity to the user $U_i$ ($i = 1, 2, \ldots, N$) is calculated in turn as shown in formula (11), using the vector cosine formula (10), where $f_{U_i}$ and $f_{R_r}$ denote the feature vectors of $U_i$ and $R_r$ respectively. The results are then filtered against a threshold ζ: results whose similarity is smaller than ζ are filtered out, giving an intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$) in which the retained results keep their original order. The threshold ζ is an empirical value preset by the system, with $0 \le \zeta \le 1$; its default value may be set to 0.65.
$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \qquad (11)$$
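The content-based stage reduces to a threshold test on a cosine similarity. The following Python sketch assumes the user and result feature vectors are plain lists of weights over the same d words; the function names are illustrative, and treating a zero vector as having similarity 0 is an assumption of the sketch.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors, as in formula (10)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def content_filter(user_vec, result_vecs, zeta=0.65):
    """Return the indices of results with sim(U_i, R_r) >= zeta, in original order."""
    return [r for r, vec in enumerate(result_vecs) if cosine(user_vec, vec) >= zeta]
```

Returning indices rather than vectors keeps the original order of the retained results, as the text requires.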
(c2) Next, collaborative filtering is performed on the intermediate result set $R_r$ ($r = 1, 2, \ldots, Z_\zeta$). Collaborative filtering is based on the idea that similar users usually have similar interests, so the similar users of the current user are used to make collaborative recommendations for the current user. Using the η most similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$ ($i = 1, 2, \ldots, N$) obtained in step (b), the similarity $\mathrm{sim}'(U_i, R_r)$ is calculated for each intermediate result according to formula (12). The vector cosine formula (10) is used, and $\cos(f_{U_{is}}, f_{U_i})$ and $\cos(f_{U_{is}}, f_{R_r})$ denote the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$ respectively.
$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \bigl( \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \bigr) \qquad (12)$$
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad (13)$$
Collaborative filtering is performed on $\mathrm{sim}'(U_i, R_r)$ against a threshold ε: results whose similarity is smaller than ε are filtered out, giving a temporary result set $R_r$ ($r = 1, 2, \ldots, Z_\varepsilon$), where r denotes the position of a result in the temporary result set. For each $R_r$, the weighted sum of its position r and its similarity $\mathrm{sim}'(U_i, R_r)$ is calculated with a weighting coefficient θ, as shown in formula (13), and used as the final ranking value $\mathrm{Rank}_r$. The temporary results are reordered by $\mathrm{Rank}_r$ to obtain the final result, which is returned to the user, and the filtering process ends. The threshold ε and the weighting coefficient θ are empirical values preset by the system, with $0 \le \varepsilon \le 1$ and $0 \le \theta \le 1$; the default value of ε may be set to 0.85 and that of θ to 0.5.
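A Python sketch of the collaborative stage, computing formulas (12) and (13) for each intermediate result. The function names and the list-based inputs are assumptions made for illustration. The text does not state whether the final reordering by $\mathrm{Rank}_r$ is ascending or descending, so the sketch only returns the ranking values and leaves the sort to the caller.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors, as in formula (10)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def collaborative_scores(user_vec, neighbor_vecs, result_vecs, epsilon=0.85, theta=0.5):
    """Return (original_index, sim_prime, rank_value) for each surviving result.

    neighbor_vecs -- feature vectors of the eta most similar users
    result_vecs   -- feature vectors of the intermediate results, in order
    """
    kept = []
    for index, vec in enumerate(result_vecs):
        # formula (12): sum over the similar users
        sim_prime = sum(cosine(n, user_vec) * cosine(n, vec) for n in neighbor_vecs)
        if sim_prime >= epsilon:  # results below the threshold are filtered out
            kept.append((index, sim_prime))
    scored = []
    # r is the 1-based position inside the temporary (post-filter) result set
    for r, (index, sim_prime) in enumerate(kept, start=1):
        scored.append((index, sim_prime, theta * r + (1 - theta) * sim_prime))  # formula (13)
    return scored
```

Since formula (12) is a sum over up to η users, $\mathrm{sim}'$ can exceed 1 even though each cosine term is at most 1.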
The present invention is not limited to the above embodiments. Those skilled in the art can implement the invention in various other embodiments according to its disclosure, so any design that adopts the design and concept of the present invention with simple changes or modifications falls within the scope of protection of the present invention.

Claims (7)

1. A method for filtering search results in a mobile scene, comprising the following steps:
Step 1: for user $U_i$, $i = 1, 2, \ldots, N$, establishing a feature vector for each result to be filtered in the result set $R_1, R_2, \ldots, R_Z$ using the d-dimensional vector space, the feature vector of $R_r$ being expressed as $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ represents the weight in each dimension; calculating the weight $v_a$ of $f_{R_r}$ in each dimension using the term frequency/inverse document frequency (TF/IDF) model: for each word $q_a$ in $q_1, q_2, \ldots, q_d$, if it does not occur in $R_r$, its weight is 0, otherwise its weight is its TF/IDF value; TF is the number of times it occurs in $R_r$; for the IDF (inverse document frequency), the number z of results containing the word is counted;
wherein the IDF value is $\log(Z/z)$, Z is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$;
Step 2: searching for similar users of the current user $U_i$; the similar users are selected from two user sets, one being the group $G_g$ to which the user belongs, where g is the serial number of the group and ranges from 1 to m, and the other being the set of users in the user's social network; the two sets are merged to obtain a set S, whose users are denoted $U_{is}$; the similarity between user $U_i$ and each user $U_{is}$ in the set S is calculated as shown in formula I, using the vector cosine formula shown in formula II: the smaller the angle between the vectors, the larger the cosine value and the greater the similarity, and vice versa; i denotes the serial number of the user, N denotes the number of users, $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ denote the feature vectors of $U_i$ and $U_{is}$ respectively; $\psi(U_i, U_{is})$ denotes the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it takes the value zero; the top η users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity, and if S contains fewer than η users, all users in S are selected; η is a preset value;
$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}})$$ Formula I
$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\|f_{U_i}\| \cdot \|f_{U_{is}}\|}$$ Formula II
Step 3: content-based filtering:
for each initial result $R_r$ to be filtered, its similarity to user $U_i$ is calculated in turn using formula III, where $f_{U_i}$ and $f_{R_r}$ denote the feature vectors of $U_i$ and $R_r$ respectively; the initial results are filtered against a preset threshold ζ, and those whose similarity is smaller than ζ are filtered out, giving an intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$, in which the retained results keep their original order;
$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r})$$ Formula III
wherein $$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\|f_{U_i}\| \cdot \|f_{R_r}\|}$$
Step 4: performing collaborative filtering on the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$, using the η most similar users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ of user $U_i$: for each intermediate result $R_r$, the similarity $\mathrm{sim}'(U_i, R_r)$ is calculated according to formula IV, in which $\cos(f_{U_{is}}, f_{U_i})$ and $\cos(f_{U_{is}}, f_{R_r})$ denote the similarity between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$ respectively;
$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \bigl( \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \bigr)$$ Formula IV
$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r)$$ Formula V
collaborative filtering is performed on $\mathrm{sim}'(U_i, R_r)$ against a preset threshold ε: intermediate results whose similarity is smaller than ε are filtered out, giving a temporary result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where r denotes the position of a result in the temporary result set; for each temporary result $R_r$, the weighted sum of its position r and its similarity $\mathrm{sim}'(U_i, R_r)$ is calculated by formula V using a preset weighting coefficient θ and used as the final ranking value $\mathrm{Rank}_r$; the temporary result set is reordered by this ranking value to obtain a final result, the final result is returned to the user, and the filtering process ends.
2. The method for filtering search results in a mobile scene according to claim 1, wherein the initial result set in step 1 is obtained as follows:
user $U_i$ submits a search Q; the search request is processed by an existing Internet search engine, which returns an initial result set for the search Q; the first φ results of this set are selected for filtering, and if there are fewer than φ results, the entire initial result set is selected, as the result set $R_1, R_2, \ldots, R_Z$ to be filtered; φ is preset by the system and Z is the number of results to be filtered.
3. The method for filtering search results in a mobile scene according to claim 1, wherein, in step 1, the feature vector of a result to be filtered is obtained as follows:
all historical query records of all users within the time ΔT are collected, and d different words $q_1, q_2, \ldots, q_d$ are obtained by statistics and used as the d dimensions of the vector space; the feature vector of a user is expressed as $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, $i = 1, 2, \ldots, N$, where $v_a$, $a = 1, 2, \ldots, d$, represents the weight in each dimension.
4. The method for filtering search results in a mobile scene according to claim 1, wherein the most similar users in step 2 are obtained as follows:
Step 4.1: to find similar users of the current user $U_i$, the group $G_g$ to which the user belongs and the set of users in the user's social network are merged to obtain a set S, where g is the serial number of the group to which the user belongs and ranges from 1 to m, and m denotes the number of groups;
Step 4.2: the similarity $\mathrm{sim}(U_i, U_{is})$ between $U_i$ and each user $U_{is}$ in the set S is calculated using formula VI, where $f_{U_i}$ and $f_{U_{is}}$ denote the feature vectors of $U_i$ and $U_{is}$ respectively and $\psi(U_i, U_{is})$ denotes the degree of relationship between $U_i$ and $U_{is}$: if $U_{is}$ is in the social network of $U_i$, $\psi(U_i, U_{is})$ takes the corresponding value, otherwise it takes the value zero; the top η users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ are selected in descending order of similarity, and if S contains fewer than η users, all users in S are selected; η is a preset value;
$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}})$$ Formula VI
wherein $$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\|f_{U_i}\| \cdot \|f_{U_{is}}\|}.$$
5. The method for filtering search results in a mobile scene according to claim 4, wherein, in step 4.1, the group $G_g$ to which the user belongs is obtained as follows:
Step 5.1: the users are divided according to the change frequency of their historical positions; the historical position information of a user records the historical positions L of the user and the corresponding time information T; the historical positions L are recorded in the data set as longitude and latitude, and the time information T is recorded as time points; given the longitude and latitude of two adjacent historical positions of a user, the distance between them is calculated using a longitude-latitude distance formula;
for each user $U_i$, the cumulative change frequency $F_i$ of the historical positions within the latest period of time ΔT is calculated according to formula VII;
$$F_i = \frac{1}{\Delta T} \sum_{k=2}^{M} \left| \frac{\mathrm{Dis}(L_k, L_{k-1})}{T_k - T_{k-1}} \right|$$ Formula VII
$(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ is the historical position information of user $U_i$ within the latest period of time ΔT; $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two adjacent historical positions of the user with their time information; $\mathrm{Dis}(L_k, L_{k-1})$ and $T_k - T_{k-1}$ denote the distance and the time difference between the two adjacent historical positions respectively; M denotes the number of historical positions of the current user, and k denotes the serial number of a historical position;
Step 5.2: the cumulative change frequencies F of the historical positions of all users are collected to obtain the total range interval Ω of F, and Ω is divided into several subintervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where n denotes the number of user groups; the subintervals represent different user groups distinguished by F; each user is assigned to the corresponding subinterval according to the user's F, so that the users are divided into different groups $\Omega_1, \Omega_2, \ldots, \Omega_n$;
Step 5.3: for each $\Omega_j$, the users in it are clustered according to historical position information, users at nearby positions being clustered into one class, so that the users are further divided into smaller groups $G_1, G_2, \ldots, G_m$; $j = 1, 2, \ldots, n$, where j denotes the serial number of the group.
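Steps 5.1 and 5.2 can be sketched in Python as follows. The claim only refers to "a longitude-latitude distance formula", so the haversine great-circle distance used here is an assumption, as are the Earth radius constant, the equal-width division of Ω into subintervals, and the function names. Timestamps are assumed to be strictly increasing numbers in a consistent unit.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def change_frequency(track, delta_t):
    """Formula VII for one user.

    track   -- list of ((lat, lon), t) pairs ordered by strictly increasing time t
    delta_t -- length of the observation window, in the same unit as t
    """
    total = 0.0
    for (prev_loc, prev_t), (loc, t) in zip(track, track[1:]):
        total += abs(haversine_km(loc, prev_loc) / (t - prev_t))
    return total / delta_t


def assign_interval(f, f_min, f_max, n):
    """Map F to a 0-based subinterval index, assuming n equal-width subintervals."""
    if f_max == f_min:
        return 0
    index = int((f - f_min) / (f_max - f_min) * n)
    return min(index, n - 1)  # the maximum F falls into the last subinterval
```

A user who never moves has $F_i = 0$ and lands in the first subinterval; how the boundaries of the subintervals are actually chosen is not specified in the claim.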
6. The method for filtering search results in a mobile scene according to claim 5, wherein, in step 5.3, a k-means clustering algorithm is used to cluster the users in each $\Omega_j$, with the following steps:
(b1) first, for each user $U_i$, the center position $O_i$ of the user's historical positions within the latest period of time ΔT is calculated, and the users are clustered according to the center position $O_i$; i denotes the serial number of the user;
(b2) k users are randomly selected from $\Omega_j$; each selected user $U_q$ represents an initial user cluster $C_q$, and its center position $O_q$ represents the initial center of the user cluster, $q = 1, 2, \ldots, k$;
(b3) for each remaining user in $\Omega_j$, the distance to the center position $O_q$ of each user cluster $C_q$ is calculated, and the user is assigned to the closest user cluster;
(b4) the new center position $O_q$ of each user cluster is then recalculated and replaces the old center value; the value of the criterion function $E_j$ is calculated according to formula VIII; if the value of $E_j$ has converged, the clustering process ends, otherwise the process returns to step (b3);
$$E_j = \sum_{q=1}^{k} \sum_{U \in \Omega_j} \mathrm{Dis}(U, C_q), \quad j = 1, 2, \ldots, n$$ Formula VIII
in formula VIII, $\mathrm{Dis}(U, C_q)$ denotes the distance between a user in $\Omega_j$ and the center position $O_q$ of user cluster $C_q$;
(b5) the clustering yields compact user clusters, so that, on the basis of the division into $\Omega_1, \Omega_2, \ldots, \Omega_n$, the users are further divided into smaller groups $G_1, G_2, \ldots, G_m$, realizing the subdivision of users.
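Steps (b2) to (b4) can be sketched in Python as a plain k-means loop. Several choices here are assumptions rather than part of the claim: center positions are treated as points in a plane with Euclidean distance (via `math.dist`, Python 3.8+), the criterion is computed as the sum of each user's distance to the center of the cluster it is assigned to, convergence means the criterion changes by at most `tol` between iterations, and an empty cluster keeps its previous center.

```python
import math
import random


def kmeans(points, k, seed=0, tol=1e-9, max_iter=100):
    """Cluster user center positions O_i into k clusters.

    points -- list of coordinate tuples, one center position per user
    Returns (clusters, centers), where clusters is a list of k lists of points.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # (b2): k randomly chosen users as initial centers
    previous = None
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        criterion = 0.0
        for p in points:  # (b3): assign each user to the closest cluster center
            distances = [math.dist(p, c) for c in centers]
            q = distances.index(min(distances))
            clusters[q].append(p)
            criterion += distances[q]
        # (b4): recompute each center; an empty cluster keeps its old center
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centers[q]
            for q, cluster in enumerate(clusters)
        ]
        if previous is not None and abs(previous - criterion) <= tol:
            break  # the criterion has converged
        previous = criterion
    return clusters, centers
```

For real longitude and latitude data a great-circle distance would be more appropriate than the planar distance used in this sketch.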
7. The method for filtering search results in a mobile scene according to claim 4, wherein, in step 4.1, the user social network is constructed as follows:
Step 7.1: for each user $U_i$, the weight of each dimension of the feature vector is calculated using the term frequency/inverse document frequency (TF/IDF) model: for each word $q_a$ in $q_1, q_2, \ldots, q_d$, if it does not appear in the user's historical query records, its corresponding weight $v_a$ is 0; otherwise the weight is its TF/IDF value, where TF is the number of times the word occurs in the user's historical query records and, for the IDF, the number D of users whose historical query records contain the word is counted; the IDF value is $\log(N/D)$, N is the number of all users, and the TF/IDF value is the product of TF and IDF;
Step 7.2: for each user $U_i$, the call records of the user within the latest time period ΔT are analyzed; for each call partner $u_x$ of the user, the total number of calls α, the total call duration β and the call regularity γ between $u_x$ and $U_i$ within ΔT are analyzed, and the degree of relationship $\psi_{ix}$ between $U_i$ and $u_x$ is calculated using formula IX;
$$\psi_{ix} = \lambda_1 \cdot \alpha + \lambda_2 \cdot \beta + \lambda_3 \cdot \gamma$$ Formula IX
in the formula, $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$ and $\lambda_1 + \lambda_2 + \lambda_3 = 1$;
$$\gamma = \frac{1}{S_t}$$
$$S_t = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \left( \overline{\Delta t} - \Delta t_h \right)^2$$
$$\Delta t_h = t_h - t_{h-1}, \quad h = 2, 3, \ldots, \alpha$$
$$\overline{\Delta t} = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \Delta t_h.$$
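The relationship degree of step 7.2 can be sketched in Python as follows. The claim does not say what happens when fewer than two calls exist (no interval can be formed) or when the calls are perfectly regular ($S_t = 0$, so $1/S_t$ is undefined); the sketch returns 0 in the first case and a hypothetical cap `gamma_cap` in the second. The function names and the default weights are likewise illustrative.

```python
def call_regularity(call_times, gamma_cap=1.0):
    """gamma = 1 / S_t, where S_t is the variance of the gaps between consecutive calls.

    call_times -- call start times t_1..t_alpha in increasing order
    gamma_cap  -- hypothetical value returned when S_t is 0 (perfectly regular calls)
    """
    gaps = [later - earlier for earlier, later in zip(call_times, call_times[1:])]
    if not gaps:
        return 0.0  # assumption: fewer than two calls carry no regularity information
    mean_gap = sum(gaps) / len(gaps)  # len(gaps) equals alpha - 1
    s_t = sum((mean_gap - g) ** 2 for g in gaps) / len(gaps)
    return gamma_cap if s_t == 0 else 1.0 / s_t


def relation_degree(call_times, call_durations, weights=(0.4, 0.3, 0.3), gamma_cap=1.0):
    """Formula IX: psi = lambda1 * alpha + lambda2 * beta + lambda3 * gamma."""
    l1, l2, l3 = weights
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("the weights lambda1..lambda3 must sum to 1")
    alpha = len(call_times)     # total number of calls
    beta = sum(call_durations)  # total call duration
    gamma = call_regularity(call_times, gamma_cap)
    return l1 * alpha + l2 * beta + l3 * gamma
```

In this sketch α, β and γ are combined in their raw units, as in formula IX; the claim does not mention any normalization, so the choice of units strongly affects which term dominates.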
CN 201110458155 2011-12-31 2011-12-31 Filtering method of search results in mobile environment Expired - Fee Related CN102591966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110458155 CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Publications (2)

Publication Number Publication Date
CN102591966A true CN102591966A (en) 2012-07-18
CN102591966B CN102591966B (en) 2013-12-18

Family

ID=46480604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110458155 Expired - Fee Related CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Country Status (1)

Country Link
CN (1) CN102591966B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1903460A1 (en) * 2006-09-21 2008-03-26 Sony Corporation Information processing
CN101819572A (en) * 2009-09-15 2010-09-01 电子科技大学 Method for establishing user interest model
CN101923545A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for recommending personalized information
CN102236646A (en) * 2010-04-20 2011-11-09 得利在线信息技术(北京)有限公司 Personalized item-level vertical pagerank algorithm iRank

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
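Wang Xiuping et al.: "Design and Implementation of a Personalized Learning Recommendation System", Microcomputer Applications (《微型电脑应用》) *
Hu Juanli et al.: "Personalized Text Information Filtering Based on Typical Feedback", Journal of Computer Applications (《计算机应用》) *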

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867031A (en) * 2012-08-27 2013-01-09 百度在线网络技术(北京)有限公司 Method and system for optimizing point of interest (POI) searching results, mobile terminal and server
WO2014101846A1 (en) * 2012-12-28 2014-07-03 Huawei Technologies Co., Ltd. Predictive caching in a distributed communication system
CN104866474A (en) * 2014-02-20 2015-08-26 阿里巴巴集团控股有限公司 Personalized data searching method and device
CN104866474B (en) * 2014-02-20 2018-10-09 阿里巴巴集团控股有限公司 Individuation data searching method and device
CN104317900A (en) * 2014-10-24 2015-01-28 重庆邮电大学 Multiattribute collaborative filtering recommendation method oriented to social network
CN104462239B (en) * 2014-11-18 2017-08-25 电信科学技术第十研究所 A kind of customer relationship based on data vector spatial analysis finds method
CN104462239A (en) * 2014-11-18 2015-03-25 电信科学技术第十研究所 Customer relation discovery method based on data vectorization spatial analysis
CN105243135B (en) * 2015-09-30 2019-09-20 百度在线网络技术(北京)有限公司 Show the method and device of search result
CN105243135A (en) * 2015-09-30 2016-01-13 百度在线网络技术(北京)有限公司 Method and apparatus for showing search result
CN106570699A (en) * 2015-10-08 2017-04-19 平安科技(深圳)有限公司 Client contact information excavation method and server
CN111212381A (en) * 2019-12-18 2020-05-29 中通服建设有限公司 Mobile user behavior data analysis method and device, computer equipment and medium
CN113220969A (en) * 2020-02-06 2021-08-06 百度在线网络技术(北京)有限公司 Advertisement determination method, device, equipment and storage medium
CN113704604A (en) * 2021-08-24 2021-11-26 山东库睿科技有限公司 Search system and search method
CN113792180A (en) * 2021-08-30 2021-12-14 北京百度网讯科技有限公司 Duplicate removal method and device in recommendation scene, electronic equipment and storage medium
CN113792180B (en) * 2021-08-30 2024-02-23 北京百度网讯科技有限公司 Method and device for removing duplicate in recommended scene, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102591966B (en) 2013-12-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131218

Termination date: 20201231
