CN102591966A - Filtering method of search results in mobile environment



Publication number
CN102591966A
Authority
CN
China
Legal status
Granted
Application number
CN2011104581556A
Other languages
Chinese (zh)
Other versions
CN102591966B
Inventor
金海
赵峰
袁平鹏
严奉伟
方飞
谢海洋
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN 201110458155
Publication of CN102591966A
Application granted
Publication of CN102591966B
Expired - Fee Related (current legal status)
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for filtering search results in a mobile environment. The method comprises: segmenting users into fine-grained groups according to the characteristics of their historical location information; building a feature model of each user from the user's historical query records; analyzing users' historical call records to build each user's social relationship network and calculating the importance of the social relationships between users; and, at search time, first filtering the search results by content using the user feature model, then collaboratively filtering them using the fine-grained user group information and the mined social-network information, and finally returning the results to the user. By mining user characteristics and filtering information in this way, the method filters search results in a more personalized manner, removes a large number of irrelevant results, reduces the size of the result set, and enables personalized, precise search in a mobile environment.

Description

A Method for Filtering Search Results in Mobile Scenarios

Technical Field

The invention belongs to the field of information retrieval and relates specifically to a method for filtering search results in mobile scenarios; the method is suited to personalized search in mobile scenarios.

Background Art

Over the past decade or so, search engine technology has developed rapidly; traditional Internet search has matured in both its technical implementation and its business model, and has been enormously successful. In recent years, new technologies and applications, led by the mobile Internet, have emerged continuously, and mobile search is one of the most important mobile Internet applications.

Because of the mobility and portability of mobile terminals, and their limited screen size, processing power, and available bandwidth, mobile search cannot simply copy existing Internet search solutions, for two main reasons:

(1) Traditional Internet search engines usually return a large number of results, and in most cases more than half of them are irrelevant to the user. A major reason is that the search engine only performs simple matching on the search keywords without considering other information (such as the user's context and personal preferences); combined with the explosive growth of information on the Internet, this produces many "junk results" that users must sift through themselves, which greatly increases their burden. In a mobile scenario, given the limited screen and keyboard size, processing power, and bandwidth of mobile terminals, this is unacceptable to users: first, a large number of junk results wastes valuable data traffic; second, paging through and screening search results on a mobile terminal is very inconvenient. Mobile search must therefore be precise, returning as few and as accurate results as possible.

(2) For the same search keyword, traditional Internet search engines return identical results to all users. However, different users have different background knowledge, interests, and information needs; the same keyword may mean different things to different people, in different domains, and at different times and places, so a user often needs only a small subset of all the search results. The mobility, portability, and personal nature of mobile terminals let users obtain information anytime and anywhere, which makes the demand for personalized search even stronger. Mobile search must therefore be a personalized search that takes into account the user's personal characteristics (such as interests) and context (such as time, location, and weather).

Mobile search therefore needs to deliver personalized, precise search. At present, domestic research on mobile search is still in its infancy, and its implementation techniques are less mature than those of existing Internet search. Earlier approaches include vertical search, such as mobile music search and novel search. The approach most widely used at present combines existing Internet search technology with auxiliary techniques such as information filtering: the user's characteristics are first modeled, and the model is then used to filter search results in a personalized way, removing irrelevant results and achieving personalized, precise search.

Common techniques for user feature modeling include the vector space model and the ontology model; the vector space model is relatively widely used because its principle is simple and it is easy to implement.

Commonly used information filtering techniques include content-based filtering and collaborative filtering. Content-based filtering extracts features from each result, calculates the similarity between the result and a filtering template (the user model), and filters according to a preset threshold; because it analyzes the content of the results, it usually filters well, but it is computationally expensive. Collaborative filtering is based on the idea that people of the same type usually have similar interests and preferences, and filters a user's search results using users whose interests resemble those of the current user; this technique has been well developed and widely applied in e-commerce.

Summary of the Invention

The object of the present invention is to provide a method for filtering search results in mobile scenarios. The method mines user data (such as historical location information and historical call records) to build a user feature model and a user social network, and uses them to perform content-based filtering and collaborative filtering, respectively, on the search results. Irrelevant results are filtered out, enabling personalized, precise search in mobile scenarios, which is valuable for improving the mobile search user experience and user retention.

The method for filtering search results in mobile scenarios provided by the present invention comprises the following steps:

Step 1: for the initial result set $R_1, R_2, \ldots, R_Z$ to be filtered for user $U_i$ ($i = 1, 2, \ldots, N$), build a feature vector for each result in the $d$-dimensional vector space. The feature vector of $R_r$ is $f_{R_r} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$, where $v_a$ is the weight on each dimension. The weight $v_a$ of $f_{R_r}$ on each dimension is computed with the term frequency/inverse document frequency (TF/IDF) model: for each term $q_a$ among $q_1, q_2, \ldots, q_d$, if $q_a$ does not appear in $R_r$, its weight is 0; otherwise its weight is its TF/IDF value, where TF is the number of times $q_a$ appears in $R_r$, and IDF, the inverse document frequency, is computed from the number $z$ of results that contain the term;

here the IDF value is $\log(Z/z)$, $Z$ is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, $r = 1, 2, \ldots, Z$, and $a = 1, 2, \ldots, d$;
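The TF/IDF weighting of Step 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes each result has already been tokenized into a list of terms, that the vocabulary $q_1, \ldots, q_d$ is given, and that `log` is the natural logarithm (the text does not state the base). The function name `result_vectors` is hypothetical.

```python
import math
from collections import Counter


def result_vectors(results, vocab):
    """Build TF/IDF feature vectors f_Rr for the Z results to be filtered.

    results: list of token lists, one per result R_r (tokenization is assumed).
    vocab:   the d terms q_1..q_d of the user-model vector space.
    Returns one list of d weights per result.
    """
    Z = len(results)
    counts = [Counter(tokens) for tokens in results]
    # z: number of results containing each term
    z = {q: sum(1 for c in counts if c[q] > 0) for q in vocab}
    vectors = []
    for c in counts:
        # weight is 0 if the term is absent, otherwise TF * log(Z / z)
        vectors.append([c[q] * math.log(Z / z[q]) if c[q] > 0 else 0.0 for q in vocab])
    return vectors
```

Because a term with TF > 0 appears in at least one result, $z \ge 1$ whenever the logarithm is evaluated, so no division by zero can occur.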

Step 2: find users similar to the current user $U_i$, choosing them from two user sets: the group $G_g$ to which the user belongs ($g$ is the index of that group, ranging from 1 to $m$), and the set of users in the user's social network. Merge the two sets into a set $S$ and denote a user in $S$ by $U_{is}$. Calculate the similarity between $U_i$ and each user $U_{is}$ in $S$ with formula I, whose cosine term is given by formula II: the smaller the angle between the vectors, the larger the cosine and the greater the similarity, and vice versa. Here $i$ is the index of the user, $N$ is the number of users, and $i = 1, 2, \ldots, N$; $f_{U_i}$ and $f_{U_{is}}$ are the feature vectors of $U_i$ and $U_{is}$; $\psi(U_i, U_{is})$ is the degree of relationship between $U_i$ and $U_{is}$, which takes the corresponding value if $U_{is}$ is in $U_i$'s social network and zero otherwise. Select the top $\eta$ users $U_{i1}, U_{i2}, \ldots, U_{i\eta}$ in descending order of similarity; if $S$ contains fewer than $\eta$ users, select all of them. $\eta$ is a preset value;

$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}}) \qquad \text{(I)}$$

$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\lVert f_{U_i} \rVert \cdot \lVert f_{U_{is}} \rVert} \qquad \text{(II)}$$
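A sketch of Step 2 under stated assumptions: feature vectors are dense Python lists over the same $d$ dimensions, the $\psi$ values come from the social-network step as a dictionary, and the helper names `cosine` and `top_similar_users` are hypothetical. Breaking ties in similarity by user id, and excluding $U_i$ itself from $S$, are also assumptions the text does not spell out.

```python
import math


def cosine(u, v):
    """Formula II: cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_similar_users(user_id, f_ui, group, social, features, eta):
    """Return up to eta ids from S = G_g ∪ social network, ranked by formula I.

    group:    ids of users in the same fine-grained group G_g
    social:   dict id -> psi(U_i, U_is) for users in U_i's social network
    features: dict id -> feature vector f_Uis
    """
    candidates = (set(group) | set(social)) - {user_id}   # union drops duplicates; exclude U_i itself
    scored = []
    for uid in candidates:
        psi = social.get(uid, 0.0)                        # psi is zero outside the social network
        scored.append(((1.0 + psi) * cosine(f_ui, features[uid]), uid))
    scored.sort(key=lambda t: (-t[0], t[1]))              # most similar first; ties broken by id
    return [uid for _, uid in scored[:eta]]               # all of S if it has fewer than eta users
```

The factor $1 + \psi$ means a contact from the social network is boosted relative to an equally similar stranger from the same group, while $\psi = 0$ leaves the plain cosine unchanged.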

Step 3: content-based filtering:

For each initial result $R_r$ to be filtered, calculate its similarity to user $U_i$ with formula III, where $f_{U_i}$ and $f_{R_r}$ are the feature vectors of $U_i$ and $R_r$. Filter by a preset threshold $\zeta$, discarding initial results whose similarity is less than $\zeta$, to obtain the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$; the intermediate results keep their original order;

$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \qquad \text{(III)}$$

where

$$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\lVert f_{U_i} \rVert \cdot \lVert f_{R_r} \rVert}$$
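Step 3 reduces to a threshold test on formula III. The sketch below, with the hypothetical name `content_filter`, keeps results whose cosine similarity is at least $\zeta$ (only those below $\zeta$ are discarded, as the text states) and preserves their original order; treating an all-zero vector as having similarity 0 is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def content_filter(f_ui, result_vecs, zeta):
    """Formula III plus threshold: indices of results with sim(U_i, R_r) >= zeta, in original order."""
    return [r for r, f_rr in enumerate(result_vecs) if cosine(f_ui, f_rr) >= zeta]
```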

Step 4: perform collaborative filtering on the intermediate result set $R_r$, $r = 1, 2, \ldots, Z_\zeta$. Using the $\eta$ users most similar to $U_i$, namely $U_{i1}, U_{i2}, \ldots, U_{i\eta}$, calculate for each intermediate result $R_r$ the similarity $\mathrm{sim}'(U_i, R_r)$ with formula IV, where $\cos(f_{U_{is}}, f_{U_i})$ and $\cos(f_{U_{is}}, f_{R_r})$ are the similarities between $U_{is}$ and $U_i$ and between $U_{is}$ and $R_r$, respectively;

$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \qquad \text{(IV)}$$

$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \qquad \text{(V)}$$

Using $\mathrm{sim}'(U_i, R_r)$, perform collaborative filtering with a preset threshold $\varepsilon$, discarding intermediate results whose similarity is less than $\varepsilon$, to obtain the temporary result set $R_r$, $r = 1, 2, \ldots, Z_\varepsilon$, where $r$ is the result's position in the temporary result set (1, 2, ..., $Z_\varepsilon$ in order). For each temporary result $R_r$, compute the weighted sum of its position $r$ and $\mathrm{sim}'(U_i, R_r)$ with formula V and a preset weighting coefficient $\theta$, and use it as the final ranking $\mathrm{Rank}_r$. Reorder the temporary result set by this ranking to obtain the final results, return them to the user, and end the filtering process.
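Formulas IV and V can be sketched as below. The patent does not say in which direction the temporary set is sorted by $\mathrm{Rank}_r$; note that a smaller position $r$ is better while a larger $\mathrm{sim}'$ is better, so the sort direction here (ascending by default, switchable with the hypothetical `descending` flag) is an assumption, as are the function and parameter names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def collaborative_rerank(f_ui, neighbor_vecs, kept, epsilon, theta, descending=False):
    """Collaborative filtering with formula IV, then re-ranking with formula V.

    f_ui:          feature vector of the current user U_i
    neighbor_vecs: feature vectors of the eta most similar users U_i1..U_iη
    kept:          (result_id, f_Rr) pairs of the intermediate set, in their current order
    Returns the ids of the temporary result set, re-ordered by Rank_r.
    """
    weights = [cosine(f_us, f_ui) for f_us in neighbor_vecs]   # cos(f_Uis, f_Ui)
    temp = []
    for rid, f_rr in kept:
        # formula IV: sim'(U_i, R_r) = sum over s of cos(f_Uis, f_Ui) * cos(f_Uis, f_Rr)
        score = sum(w * cosine(f_us, f_rr) for w, f_us in zip(weights, neighbor_vecs))
        if score >= epsilon:                                    # drop results with sim' < epsilon
            temp.append((rid, score))
    # formula V, with r the 1-based position in the temporary set
    ranked = [(theta * r + (1 - theta) * s, rid) for r, (rid, s) in enumerate(temp, start=1)]
    ranked.sort(reverse=descending)                             # direction is an assumption
    return [rid for _, rid in ranked]
```

With $\theta$ close to 1 the original engine order dominates; with $\theta$ close to 0 the collaborative score dominates.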

The method for filtering search results in mobile scenarios provided by the present invention combines data mining methods (classification and clustering), content-based filtering, and collaborative filtering. Specifically, the present invention has the following effects and advantages:

(1) High accuracy: the present invention innovatively analyzes users' social network information and performs collaborative filtering in addition to conventional content-based filtering, which substantially improves accuracy.

(2) Strong adaptability: the present invention accounts for the diversity of mobile user groups and individuals and adapts well to the personalized needs of different user groups and individuals.

(3) High extensibility: besides mobile search, the filtering method provided by the present invention can be used in other mobile Internet applications, precision advertising, and so on, and the user feature modeling method can also be applied to customer relationship management (CRM).

Brief Description of the Drawings

Fig. 1 is an overall flowchart of the method of the present invention;

Fig. 2 is a schematic diagram of the historical location change frequency of mobile users;

Fig. 3 is a flowchart of clustering mobile users by location;

Fig. 4 is a structural diagram of a mobile user's social network;

Fig. 5 is a detailed flowchart of filtering mobile search results.

Detailed Description of the Embodiments

The present invention is described in detail below with reference to the accompanying drawings.

As shown in Fig. 1, the method for filtering search results in mobile scenarios provided by the present invention begins with a filtering preprocessing stage, which consists of user segmentation, building the user feature model, and building the user social network (steps (1) to (3) below), followed by a result filtering stage (step (4) below). The specific processing steps are as follows:

1. Filtering preprocessing stage, comprising steps (1) to (3) below.

(1) User segmentation. Users are segmented with data mining methods. The user data sets provided by telecom operators contain a large amount of user data, such as historical location information, historical call records, historical query and browsing records, and historical service data. The present invention segments users mainly by their historical location information, in the following steps:

(a) Divide users according to how frequently their historical location changes. A user's historical location information records the historical location $L$ and the corresponding time $T$. The location $L$ is recorded in the data set as latitude and longitude, e.g. (30.2332, 114.3243), and the time $T$ is recorded as a point in time. Given the latitude and longitude of two consecutive historical locations of a user, the distance between them is easily computed with the latitude/longitude distance formula (formula (1)). Let the first location $L_1$ be $(\mathrm{lon}_1, \mathrm{lat}_1)$ and the second location $L_2$ be $(\mathrm{lon}_2, \mathrm{lat}_2)$. Taking the 0° meridian as the reference, east longitude is positive and west longitude negative; a north latitude is substituted as $(90° - \mathrm{lat})$ and a south latitude as $(90° + \mathrm{lat})$. Formula (1) then gives the distance between the two points:

$$C = \sin(\mathrm{lat}_1)\sin(\mathrm{lat}_2)\cos(\mathrm{lon}_1 - \mathrm{lon}_2) + \cos(\mathrm{lat}_1)\cos(\mathrm{lat}_2)$$

$$\mathrm{Dis}(L_1, L_2) = R \cdot \arccos(C) \cdot \frac{\pi}{180} \qquad (1)$$

where $R$ is the radius of the Earth and $\arccos(C)$ is expressed in degrees.
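Formula (1) is the spherical law of cosines written with colatitudes and with $\arccos$ taken in degrees. The following sketch follows it literally; it assumes points are given as (latitude, longitude) in signed decimal degrees (north and east positive) and that $R$ is the mean Earth radius of 6371 km, neither of which the text fixes. The names `to_colatitude` and `dis` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumption: R in formula (1) is the mean Earth radius


def to_colatitude(lat):
    """Signed latitude in degrees (north > 0) -> 90 - lat: 90 - lat for north, 90 + |lat| for south."""
    return 90.0 - lat


def dis(p1, p2, radius=EARTH_RADIUS_KM):
    """Formula (1): distance between two (lat, lon) points given in signed decimal degrees."""
    lat1, lon1 = to_colatitude(p1[0]), p1[1]
    lat2, lon2 = to_colatitude(p2[0]), p2[1]
    rad = math.radians
    c = (math.sin(rad(lat1)) * math.sin(rad(lat2)) * math.cos(rad(lon1 - lon2))
         + math.cos(rad(lat1)) * math.cos(rad(lat2)))
    c = max(-1.0, min(1.0, c))               # guard against rounding just outside [-1, 1]
    angle_deg = math.degrees(math.acos(c))   # arccos(C) expressed in degrees
    return radius * angle_deg * math.pi / 180
```

Substituting $90° - \mathrm{lat}$ for a signed latitude reproduces both rules in the text: a north latitude gives $90° - \mathrm{lat}$, and a south latitude of magnitude $|\mathrm{lat}|$ gives $90° + |\mathrm{lat}|$. Under these assumptions, one degree of longitude along the equator comes out at about 111.2 km.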

For each user $U_i$ ($i = 1, 2, \ldots, N$), compute the cumulative historical location change frequency $F_i$ over a recent period $\Delta T$ (e.g. one month), where $N$ is the number of users:

$$F_i = \frac{1}{\Delta T} \sum_{k=2}^{M} \left| \frac{\mathrm{Dis}(L_k, L_{k-1})}{T_k - T_{k-1}} \right| \qquad (2)$$

In formula (2), $(L_1, T_1), (L_2, T_2), \ldots, (L_M, T_M)$ are the historical locations and times of user $U_i$ within the recent period $\Delta T$; $(L_{k-1}, T_{k-1})$ and $(L_k, T_k)$ are two consecutive historical locations and times of the user; $\mathrm{Dis}(L_k, L_{k-1})$ and $T_k - T_{k-1}$ are the distance and the time difference between them. $M$ is the number of historical locations of the current user, and $k$ is the index of a historical location.
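Formula (2) can be sketched as below. The distance helper is the spherical law of cosines in radians with signed latitudes, which is mathematically equivalent to formula (1); the (latitude, longitude) input order, the 6371 km radius, the time unit, and skipping consecutive records with identical timestamps (to avoid dividing by zero) are all assumptions, and the function names are hypothetical. The sum starts at the second record because each term needs a predecessor.

```python
import math


def dis(p1, p2, radius=6371.0):
    """Great-circle distance in km between (lat, lon) points in signed degrees (equivalent to formula (1))."""
    la1, lo1, la2, lo2 = map(math.radians, (p1[0], p1[1], p2[0], p2[1]))
    c = math.sin(la1) * math.sin(la2) + math.cos(la1) * math.cos(la2) * math.cos(lo1 - lo2)
    return radius * math.acos(max(-1.0, min(1.0, c)))


def change_frequency(history, delta_t):
    """Formula (2): F_i = (1 / ΔT) * sum over k = 2..M of |Dis(L_k, L_{k-1}) / (T_k - T_{k-1})|.

    history: time-ordered list of ((lat, lon), t) records inside the window ΔT.
    delta_t: length of the window, in the same time unit as t.
    """
    total = 0.0
    for (loc_prev, t_prev), (loc, t) in zip(history, history[1:]):
        if t == t_prev:   # assumption: skip records with identical timestamps
            continue
        total += abs(dis(loc, loc_prev) / (t - t_prev))
    return total / delta_t
```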

Compute $F$ for all users to obtain its overall range $\Omega$, and divide $\Omega$ into sub-intervals $\Omega_1, \Omega_2, \ldots, \Omega_n$, where $n$ is the number of user groups. Each sub-interval of $F$ characterizes a different user group, and each user is assigned to the sub-interval containing his or her $F$. As shown in Fig. 2, user A has a high $F$ and may be a businessperson who travels frequently, while user B has a low $F$ and may stay at one fixed location for long periods, for example a university student. The location change frequency $F$ thus gives a preliminary division of users into groups $\Omega_1, \Omega_2, \ldots, \Omega_n$. $\Omega$ may be divided into equal sub-intervals, or according to a division standard preset by the system.
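For the equal-width option mentioned above, the division of $\Omega$ might look like this. The hypothetical `partition_by_frequency` assigns the maximum $F$ to the last sub-interval and puts every user in the first group when all $F$ values are equal; both edge-case rules are assumptions.

```python
def partition_by_frequency(freqs, n):
    """Split users into n groups Ω_1..Ω_n by equal-width sub-intervals of the range of F.

    freqs: dict user_id -> F_i. Returns a list of n lists of user ids.
    """
    lo, hi = min(freqs.values()), max(freqs.values())
    width = (hi - lo) / n
    groups = [[] for _ in range(n)]
    for uid, f in freqs.items():
        j = int((f - lo) / width) if width > 0 else 0
        groups[min(j, n - 1)].append(uid)   # the maximum F falls into the last sub-interval
    return groups
```

For example, with $F$ values A = 50.0, B = 0.5, C = 1.0, D = 30.0 and $n = 2$, users B and C fall in $\Omega_1$ and users A and D in $\Omega_2$.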

(b) Next, cluster the users in each $\Omega_j$ ($j = 1, 2, \ldots, n$, where $j$ is the group index) by their historical location information, so that users whose locations are close together fall into the same cluster; related studies indicate that geographically close users tend, to some extent, to have similar characteristics. The users in each $\Omega_j$ are clustered with the k-means algorithm, as follows:

(b1) First compute, for each user $U_i$ ($i = 1, 2, \ldots, N$, where $i$ is the user index), the center $O_i$ of the user's historical locations within $\Delta T$; users are clustered by $O_i$;

(b2) Randomly select $k$ users from $\Omega_j$ ($j = 1, 2, \ldots, n$); each selected user $U_q$ ($q = 1, 2, \ldots, k$) represents an initial user cluster $C_q$, and its $O_q$ is the initial center of that cluster;

(b3) For each remaining user in $\Omega_j$, compute the distance (with the latitude/longitude distance formula) to the center $O_q$ of each user cluster $C_q$ ($q = 1, 2, \ldots, k$), and assign the user to the nearest cluster;

(b4) Recompute the center $O_q$ ($q = 1, 2, \ldots, k$) of each user cluster to replace the old center. Calculate the criterion function $E_j$ with formula (3); if $E_j$ has converged, clustering ends; otherwise, return to step (b3).

$$E_j = \sum_{q=1}^{k} \sum_{U \in \Omega_j} \mathrm{Dis}(U, C_q), \quad j = 1, 2, \ldots, n \qquad (3)$$

In formula (3), $\mathrm{Dis}(U, C_q)$ is the distance between a user $U$ in $\Omega_j$ ($j = 1, 2, \ldots, n$) and the center $O_q$ of user cluster $C_q$ ($q = 1, 2, \ldots, k$).
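Steps (b1) to (b4) can be sketched as a standard k-means loop over the users' center positions. Assumptions not fixed by the text: $O_i$ is taken as the arithmetic mean of the user's (latitude, longitude) points, which is reasonable over a city-sized area; new cluster centers are likewise coordinate means; a fixed random seed is used for reproducibility; and the convergence test on $E_j$ is replaced by the stopping rule "no user changes cluster", after which the centers, and hence $E_j$, no longer change. All names are hypothetical.

```python
import math
import random


def dis(p1, p2, radius=6371.0):
    """Great-circle distance in km between (lat, lon) points in signed degrees."""
    la1, lo1, la2, lo2 = map(math.radians, (p1[0], p1[1], p2[0], p2[1]))
    c = math.sin(la1) * math.sin(la2) + math.cos(la1) * math.cos(la2) * math.cos(lo1 - lo2)
    return radius * math.acos(max(-1.0, min(1.0, c)))


def mean_position(points):
    """Coordinate-wise mean of (lat, lon) points; used here for both O_i and cluster centers."""
    return (sum(p[0] for p in points) / len(points), sum(p[1] for p in points) / len(points))


def kmeans_users(centers, k, seed=0, max_iter=100):
    """Cluster the users of one Ω_j by their center positions O_i (steps b2 to b4).

    centers: dict user_id -> O_i as (lat, lon). Returns dict user_id -> cluster index 0..k-1.
    """
    rng = random.Random(seed)
    ids = sorted(centers)
    cluster_centers = [centers[u] for u in rng.sample(ids, k)]   # b2: k random users seed the clusters
    assign = {}
    for _ in range(max_iter):
        # b3: assign every user to the nearest cluster center
        new_assign = {u: min(range(k), key=lambda q: dis(centers[u], cluster_centers[q])) for u in ids}
        if new_assign == assign:   # no user moved, so centers and E_j no longer change
            break
        assign = new_assign
        # b4: recompute each center; an empty cluster keeps its previous center
        for q in range(k):
            members = [centers[u] for u in ids if assign[u] == q]
            if members:
                cluster_centers[q] = mean_position(members)
    return assign
```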

Clustering yields compact user clusters. Building on the division into $\Omega_1, \Omega_2, \ldots, \Omega_n$, users are thereby further divided into smaller groups $G_1, G_2, \ldots, G_m$, which completes user segmentation.

(2) Build the user feature model. A user's historical query records characterize the user's interests well; by analyzing them, each user is modeled with the vector space model, in the following steps:

(a) Collect all historical query records of all users within $\Delta T$ and extract the $d$ distinct terms $q_1, q_2, \ldots, q_d$ as the $d$ dimensions of the vector space. The feature vector of a user is $f_{U_i} = \{(q_1, v_1), (q_2, v_2), \ldots, (q_d, v_d)\}$ ($i = 1, 2, \ldots, N$), where $v_a$ ($a = 1, 2, \ldots, d$) is the weight of each dimension.

(b) Using the TF/IDF (term frequency/inverse document frequency) model, compute the weight of each dimension of the feature vector of each user $U_i$ ($i = 1, 2, \ldots, N$). For each term $q_a$ ($a = 1, 2, \ldots, d$) among $q_1, q_2, \ldots, q_d$, if it does not appear in the user's historical query records, its weight $v_a$ is 0; otherwise the weight is its TF/IDF value. TF, the term frequency, is here the number of times the term appears in the user's historical query records; IDF, the inverse document frequency, is $\log(N/D)$, where $D$ is the number of users whose historical query records contain the term and $N$ is the total number of users. The TF/IDF value is the product of TF and IDF.
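The user model of steps (a) and (b) can be sketched as follows, assuming each user's query history has already been split into terms and that `log` is the natural logarithm. Unlike the result vectors, IDF here is $\log(N/D)$ computed over users rather than over results. The function name `user_vectors` is hypothetical.

```python
import math
from collections import Counter


def user_vectors(query_logs):
    """Build user feature vectors f_Ui from historical query terms.

    query_logs: dict user_id -> list of query terms issued within ΔT (tokenization is assumed).
    Returns (vocab, vectors): the d terms q_1..q_d and, per user, a list of d TF/IDF weights.
    """
    vocab = sorted({t for terms in query_logs.values() for t in terms})   # d distinct terms
    N = len(query_logs)
    counts = {u: Counter(terms) for u, terms in query_logs.items()}
    # D: number of users whose query history contains each term
    D = {q: sum(1 for c in counts.values() if c[q] > 0) for q in vocab}
    # TF is 0 for absent terms, so their weight is 0; otherwise TF * log(N / D)
    vectors = {u: [c[q] * math.log(N / D[q]) for q in vocab] for u, c in counts.items()}
    return vocab, vectors
```

A term that every user has searched for gets IDF $\log(1) = 0$, so it carries no weight in any user's vector.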

(3) Mine the users' social network information by analyzing their historical call records. For each user $U_i$ ($i = 1, 2, \ldots, N$), the social network takes the form of a star topology centered on that user. As shown in Fig. 4, the central node B represents the user, the surrounding nodes A, C, D, E, F, G, etc. represent users who have call records with B, and the edge weight $\psi$ represents the degree of relationship between two users; this step mainly estimates $\psi$.

The historical call record data record the calls between all users, including the ID numbers of both parties, the call start time, the call end time, and so on. For each user $U_i$ ($i = 1, 2, \ldots, N$), analyze the call records within $\Delta T$: for each user $u_x$ with whom $U_i$ has call records ($x = 1, 2, \ldots, e$, where $e$ is the number of such users), determine the total number of calls $\alpha$, the total call duration $\beta$, and the call regularity $\gamma$ between $u_x$ and $U_i$ within $\Delta T$. Considering these factors together gives an approximate estimate of the degree of relationship $\psi_{ix}$ between $U_i$ and $u_x$.

The total number of calls $\alpha$ and the total call duration $\beta$ are easy to obtain, but they are aggregate statistics: they give only a rough overall estimate of the relationship between users and ignore important details, such as whether calls are evenly distributed over time, and whether the distribution is globally or only locally even. The call regularity $\gamma$ is therefore also introduced to characterize the degree of relationship between $U_i$ and $u_x$. It is obtained by statistically analyzing the time distribution of all calls within $\Delta T$, borrowing the idea of variance, as in formulas (4) to (7): $t_h$ ($h = 1, 2, \ldots, \alpha$) is the start time of each call, $\Delta t_h$ is the time difference between two consecutive calls, $S_t$ is the variance of these differences, and $\gamma$ is inversely proportional to $S_t$, as shown in formula (7). A small variance means the calls in the period are fairly regular, so $\gamma$ is correspondingly large, and vice versa.

$$\Delta t_h = t_h - t_{h-1}, \quad h = 2, 3, \ldots, \alpha \qquad (4)$$

$$\overline{\Delta t} = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \Delta t_h \qquad (5)$$

$$S_t = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \left(\overline{\Delta t} - \Delta t_h\right)^2 \qquad (6)$$

$$\gamma = \frac{1}{S_t} \qquad (7)$$
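Formulas (4) to (7) can be sketched as below. The text does not say what happens when a user has fewer than two calls with a contact or when the calls are perfectly regular ($S_t = 0$); this sketch returns 0 in the first case and floors $S_t$ at a small `eps` in the second, both of which are assumptions, as is the name `call_regularity`.

```python
def call_regularity(start_times, eps=1e-9):
    """Formulas (4)-(7): gamma = 1 / S_t for the start times t_1..t_alpha of calls with one contact.

    Assumptions: fewer than two calls -> 0.0; S_t is floored at eps so perfectly regular
    calls give a large but finite gamma.
    """
    t = sorted(start_times)
    alpha = len(t)
    if alpha < 2:
        return 0.0
    gaps = [t[h] - t[h - 1] for h in range(1, alpha)]            # Δt_h for h = 2..α, formula (4)
    mean_gap = sum(gaps) / (alpha - 1)                           # formula (5)
    s_t = sum((mean_gap - g) ** 2 for g in gaps) / (alpha - 1)   # formula (6)
    return 1.0 / max(s_t, eps)                                   # formula (7)
```

For calls starting at hours 0, 1, and 3, the gaps are 1 and 2, their mean is 1.5, $S_t = 0.25$, and $\gamma = 4$.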

Normalize the computed $\alpha$, $\beta$, and $\gamma$ to values between 0 and 1. The value of $\psi_{ix}$ ($i = 1, 2, \ldots, N$; $x = 1, 2, \ldots, e$) is then calculated with formula (8) as a weighted combination of $\alpha$, $\beta$, and $\gamma$, where $0 \le \lambda_1 \le 1$, $0 \le \lambda_2 \le 1$, $0 \le \lambda_3 \le 1$, and $\lambda_1 + \lambda_2 + \lambda_3 = 1$; by default each weight is 1/3.

$$\psi_{ix} = \lambda_1 \cdot \alpha + \lambda_2 \cdot \beta + \lambda_3 \cdot \gamma, \quad \lambda_1 + \lambda_2 + \lambda_3 = 1 \qquad (8)$$

Through the analysis and calculation in this step, the social network information of each user $U_i$ ($i = 1, 2, \ldots, N$) is obtained, namely the degree of relationship $\psi_{ix}$ between $U_i$ and each user $u_x$ ($x = 1, 2, \ldots, e$) with whom $U_i$ is connected.
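Formula (8) can be sketched as follows. The text only says that $\alpha$, $\beta$, and $\gamma$ are normalized into $[0, 1]$; dividing each by its maximum over the current user's contacts is an assumption, as is the function name `relationship_degrees`. The default weights are the 1/3 values given above.

```python
def relationship_degrees(stats, lambdas=(1 / 3, 1 / 3, 1 / 3)):
    """Formula (8): psi_ix = λ1·α + λ2·β + λ3·γ after scaling α, β, γ into [0, 1].

    stats: dict contact_id -> (alpha, beta, gamma) for one user U_i, all non-negative.
    Scaling by the maximum over U_i's contacts is an assumption.
    """
    l1, l2, l3 = lambdas
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("lambda1 + lambda2 + lambda3 must equal 1")
    if not stats:
        return {}

    def scale(col):
        top = max(col)
        return [v / top if top > 0 else 0.0 for v in col]

    ids = list(stats)
    # one column per factor (all alphas, all betas, all gammas), each scaled into [0, 1]
    a, b, g = (scale(col) for col in zip(*(stats[c] for c in ids)))
    return {c: l1 * a[i] + l2 * b[i] + l3 * g[i] for i, c in enumerate(ids)}
```

With the default weights, a contact who has the most calls, the longest total talk time, and the most regular call pattern among $U_i$'s contacts gets $\psi = 1$.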

(4) Search result filtering. Steps (1) to (3) are the preparation stage and serve the search result filtering of this step: the user feature model built in step (2) is used for content-based filtering of the search results, while the user segmentation from step (1) and the social network information mined in step (3) are used for collaborative filtering.

In this step, the search results first undergo content-based filtering and then collaborative filtering, so as to personalize and condense the search results.

User $U_i$ ($i = 1, 2, \ldots, N$) submits a search $Q$. The request is first processed by an existing Internet search engine, which returns an initial result set for $Q$. This result set is usually large, so the top $\varphi$ results are selected for filtering (or the whole initial result set if it contains fewer than $\varphi$ results), giving the result set to be filtered, $R_1, R_2, \ldots, R_Z$. $\varphi$ is an empirical value preset by the system, e.g. 300, and $Z$ is the number of results to be filtered. The result filtering process is shown in Fig. 5; the steps are as follows:

(a) Build feature vectors for the results to be filtered, R_1, R_2, ..., R_Z, in the d-dimensional vector space established in step (2). The feature vector of R_r (r = 1, 2, ..., Z) is f_Rr = {(q_1, v_1), (q_2, v_2), ..., (q_d, v_d)}, where v_a (a = 1, 2, ..., d) is the weight on dimension a. The weights are computed with the same TF/IDF (term frequency/inverse document frequency) model used in step (2): for each word q_a among q_1, q_2, ..., q_d, the weight is 0 if q_a does not occur in R_r, and otherwise it is the TF/IDF value. TF is the number of occurrences of q_a in R_r; IDF is log(Z/z), where z is the number of results containing q_a and Z is the total number of results; the TF/IDF value is the product of TF and IDF.
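Step (a) can be sketched as below. It assumes each result has already been tokenized into a list of words (the text does not describe tokenization) and uses the natural logarithm for IDF, since the log base is not stated; `result_feature_vectors` is a hypothetical name.

```python
import math
from typing import List, Sequence


def result_feature_vectors(vocab: Sequence[str], results: List[List[str]]) -> List[List[float]]:
    """Build the TF/IDF vector f_Rr of every result R_r over the d words q_1..q_d in vocab.

    Each result is a list of already-tokenized words (assumption).
    IDF = log(Z / z) with the natural log (assumption on the base).
    """
    total = len(results)  # Z: number of results to be filtered
    # z for each vocabulary word: how many results contain it
    containing = [sum(1 for doc in results if word in doc) for word in vocab]
    vectors = []
    for doc in results:
        vector = []
        for word, z in zip(vocab, containing):
            tf = doc.count(word)  # occurrences of q_a in R_r
            # Weight is 0 when the word is absent; otherwise TF * log(Z / z).
            vector.append(tf * math.log(total / z) if tf else 0.0)
        vectors.append(vector)
    return vectors
```

A word that occurs in every result gets IDF log(1) = 0, so it contributes nothing to any result vector; the weight check on `tf` also avoids dividing by z = 0 for words that occur in no result.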

(b) Next, find users similar to the current user U_i (i = 1, 2, ..., N). Candidates come from two user sets: the group G_g to which the user was assigned in step (1), where g is the group index ranging from 1 to m, and the set of users in the user's social network built in step (3). The two sets are merged (they may share users) into a set S, from which several similar users are selected.

$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}}) \tag{9}$$

$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\lVert f_{U_i} \rVert \cdot \lVert f_{U_{is}} \rVert} \tag{10}$$

In formula (10), ‖·‖ denotes the norm (modulus) of a vector.

The similarity between U_i (i = 1, 2, ..., N) and each user U_is in S is computed with formula (9), whose cosine term is the vector-cosine formula (10): the smaller the angle between the vectors, the larger the cosine and the greater the similarity, and vice versa. f_Ui and f_Uis are the feature vectors of U_i and U_is, and ψ(U_i, U_is) is the degree of relationship between U_i and U_is: if U_is is in U_i's social network, ψ(U_i, U_is) takes the corresponding value; otherwise it is zero. The top η users U_i1, U_i2, ..., U_iη are selected in descending order of similarity; if S contains fewer than η users, all of them are selected. η is an empirical value preset by the system, with a default of, for example, 10.
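Step (b) with formulas (9) and (10) might look like this. The function names are hypothetical; excluding U_i itself from S (it is normally a member of its own group G_g), returning 0 for a zero vector, and breaking ties by user id are all assumptions not stated in the text.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Formula (10); returns 0.0 if either vector is zero (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_similar_users(
    target: str,
    group: Set[str],
    social: Dict[str, float],
    vectors: Dict[str, Sequence[float]],
    eta: int = 10,
) -> List[str]:
    """Pick up to eta users from S = G_g ∪ social network, ranked by formula (9).

    social maps each contact of the target to psi(U_i, U_is); users outside
    the social network get psi = 0.
    """
    candidates = (set(group) | set(social)) - {target}  # assumption: exclude U_i itself
    scored = [
        ((1.0 + social.get(u, 0.0)) * cosine(vectors[target], vectors[u]), u)
        for u in candidates
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest similarity first; ties by id
    return [u for _, u in scored[:eta]]
```

The (1 + ψ) factor means a social contact can outrank a group member with a higher raw cosine: with ψ = 0.8, a contact at cosine 0.71 scores about 1.27 and beats a group member at cosine 1.0.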

(c) The results are then filtered in two stages, a content-based filtering stage followed by a collaborative filtering stage:

(c1) Content-based filtering. For each initial result R_r (r = 1, 2, ..., Z) from (a), compute its similarity to user U_i (i = 1, 2, ..., N), again with the cosine formula (10), as shown in formula (11), where f_Ui and f_Rr are the feature vectors of U_i and R_r. Results whose similarity is below the threshold ζ are discarded, yielding the intermediate result set R_r (r = 1, 2, ..., Z_ζ), which keeps the original order. ζ is an empirical value preset by the system, 0 ≤ ζ ≤ 1, with a default of 0.65.

$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \tag{11}$$
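Step (c1) reduces to a threshold on formula (11). The sketch below (hypothetical names, zero-vector convention assumed) returns the indices of the kept results so that the original order is preserved.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Vector cosine of formula (10); 0.0 for a zero vector (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_filter(
    user_vector: Sequence[float],
    result_vectors: List[Sequence[float]],
    zeta: float = 0.65,
) -> List[int]:
    """Step (c1): keep results with cos(f_Ui, f_Rr) >= zeta, in their original order."""
    return [r for r, vec in enumerate(result_vectors) if cosine(user_vector, vec) >= zeta]
```

For a user vector [1, 0] and result vectors [1, 0], [0, 1] and [1, 1], the cosines are 1.0, 0.0 and about 0.71, so the default ζ = 0.65 keeps indices 0 and 2.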

(c2) Next, collaborative filtering is applied to the intermediate result set R_r (r = 1, 2, ..., Z_ζ). Collaborative filtering rests on the idea that similar users usually have similar interests, so the current user's similar users are used to make collaborative recommendations for that user. Using the η most similar users U_i1, U_i2, ..., U_iη of U_i (i = 1, 2, ..., N) found in step (b), the similarity sim′(U_i, R_r) of each intermediate result R_r is computed with formula (12). In that formula, cos(f_Uis, f_Ui) and cos(f_Uis, f_Rr), both given by the vector-cosine formula (10), are the similarity between U_is and U_i and between U_is and R_r, respectively.

$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \tag{12}$$

$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \tag{13}$$

Collaborative filtering is then performed on sim′(U_i, R_r) with the threshold ε: results whose similarity is below ε are discarded, giving the temporary result set R_r (r = 1, 2, ..., Z_ε), where r is the result's position in the temporary set, numbered 1, 2, ..., Z_ε. For each R_r, the weighted sum of its position r and sim′(U_i, R_r), with weighting coefficient θ, is taken as its final ranking value Rank_r, as in formula (13). The results R_r (r = 1, 2, ..., Z_ε) are re-sorted by this ranking to give the final result, which is returned to the user, and the filtering process ends. The threshold ε and the weighting coefficient θ are empirical values preset by the system, with 0 ≤ ε ≤ 1 and 0 ≤ θ ≤ 1; the default value of ε may be set to 0.85 and that of θ to 0.5.
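Step (c2) combines formula (12), the ε threshold and the ranking of formula (13). In this sketch the function name is hypothetical, and the sort direction is exposed as an `ascending` flag because the text does not state whether results are ordered by increasing or decreasing Rank_r.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Vector cosine of formula (10); 0.0 for a zero vector (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def collaborative_rerank(
    user_vector: Sequence[float],
    neighbor_vectors: List[Sequence[float]],
    result_vectors: List[Sequence[float]],
    epsilon: float = 0.85,
    theta: float = 0.5,
    ascending: bool = True,  # assumption: sort direction is not given in the source
) -> List[int]:
    """Filter the (c1) intermediate set with formula (12) and re-sort it by formula (13).

    result_vectors must be in the intermediate set's original order;
    the returned list holds indices into it.
    """
    kept = []
    for idx, vec in enumerate(result_vectors):
        # Formula (12): sum over the eta most similar users U_is
        score = sum(cosine(nv, user_vector) * cosine(nv, vec) for nv in neighbor_vectors)
        if score >= epsilon:
            kept.append((idx, score))
    # Formula (13): r is the 1-based position in the temporary result set
    ranked = [(theta * r + (1 - theta) * score, idx) for r, (idx, score) in enumerate(kept, start=1)]
    ranked.sort(key=lambda pair: pair[0], reverse=not ascending)
    return [idx for _, idx in ranked]
```

Note that sim′ is a sum of η products, so it can exceed 1. This is why a threshold such as ε = 0.85 still lets several results through when η is large.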

The present invention is not limited to the specific embodiments above; those skilled in the art may implement it in various other embodiments based on the content disclosed here. Therefore, any design that adopts the design structure and ideas of the present invention with simple variations or modifications falls within its scope of protection.

Claims (7)

1. A method for filtering search results in a mobile environment, the method comprising the following steps:

Step 1: for the initial result set to be filtered, R_1, R_2, ..., R_Z, of user U_i (i = 1, 2, ..., N), build feature vectors for the results in a d-dimensional vector space. The feature vector of R_r is f_Rr = {(q_1, v_1), (q_2, v_2), ..., (q_d, v_d)}, where v_a is the weight on each dimension. The weight v_a of f_Rr on each dimension is computed with the term frequency/inverse document frequency (TF/IDF) model: for each word q_a among q_1, q_2, ..., q_d, the weight is 0 if the word does not appear in R_r, and otherwise it is the word's TF/IDF value, where TF is the number of times the word appears in R_r and IDF is the inverse document frequency, obtained by counting the number z of results that contain the word;

wherein the IDF value is log(Z/z), Z is the number of initial results to be filtered, the TF/IDF value is the product of TF and IDF, r = 1, 2, ..., Z, and a = 1, 2, ..., d;

Step 2: find users similar to the current user U_i, selected from the following two user sets: the group G_g to which the user belongs, where g is the index of that group and ranges from 1 to m, and the set of users in the user's social network. Merge these two sets into a set S, denote a user in this set by U_is, and compute the similarity between user U_i and each user U_is in S with formula I, whose cosine term is the vector-cosine formula II; the smaller the angle between the vectors, the larger the cosine and the greater the similarity, and vice versa. Here i is the index of the user, N is the number of users, i = 1, 2, ..., N; f_Ui and f_Uis are the feature vectors of U_i and U_is; ψ(U_i, U_is) is the degree of relationship between U_i and U_is, which takes the corresponding value if U_is is in U_i's social network and zero otherwise. Select the top η users U_i1, U_i2, ..., U_iη in descending order of similarity, or all users in S if there are fewer than η; η is a preset value;

$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}}) \tag{I}$$

$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\lVert f_{U_i} \rVert \cdot \lVert f_{U_{is}} \rVert} \tag{II}$$

Step 3: content-based filtering:

for each initial result R_r to be filtered, compute its similarity to user U_i with formula III, where f_Ui and f_Rr are the feature vectors of U_i and R_r. Filter by similarity against a preset threshold ζ: initial results whose similarity is below ζ are discarded, yielding the intermediate result set R_r, r = 1, 2, ..., Z_ζ, in which the intermediate results keep their original order;

$$\mathrm{sim}(U_i, R_r) = \cos(f_{U_i}, f_{R_r}) \tag{III}$$

where

$$\cos(f_{U_i}, f_{R_r}) = \frac{f_{U_i} \cdot f_{R_r}}{\lVert f_{U_i} \rVert \cdot \lVert f_{R_r} \rVert}$$

Step 4: perform collaborative filtering on the intermediate result set R_r, r = 1, 2, ..., Z_ζ: using the η most similar users U_i1, U_i2, ..., U_iη of user U_i, compute the similarity sim′(U_i, R_r) of each intermediate result R_r with formula IV, where cos(f_Uis, f_Ui) and cos(f_Uis, f_Rr) are the similarity between U_is and U_i and between U_is and R_r, respectively;

$$\mathrm{sim}'(U_i, R_r) = \sum_{s=1}^{\eta} \cos(f_{U_{is}}, f_{U_i}) \cdot \cos(f_{U_{is}}, f_{R_r}) \tag{IV}$$

$$\mathrm{Rank}_r = \theta \cdot r + (1 - \theta) \cdot \mathrm{sim}'(U_i, R_r) \tag{V}$$

Perform collaborative filtering on sim′(U_i, R_r) with a preset threshold ε: intermediate results whose similarity is below ε are discarded, yielding the temporary result set R_r, r = 1, 2, ..., Z_ε, where r is the result's position in the temporary set, numbered 1, 2, ..., Z_ε. For each temporary R_r, compute with formula V, using a preset weighting coefficient θ, the weighted sum of its position r and sim′(U_i, R_r) as its final ranking value Rank_r; re-sort the temporary result set R_r by this ranking to obtain the final result, return it to the user, and end the filtering process.
2. The method for filtering search results in a mobile environment according to claim 1, characterized in that the initial result set in Step 1 is obtained as follows: user U_i submits a query Q; the request is first processed by an existing Internet search engine, which returns an initial result set for Q; the first φ results of this set are selected for filtering, or the whole initial result set if it has fewer than φ results, as the result set to be filtered R_1, R_2, ..., R_Z, where φ is preset by the system and Z is the number of results to be filtered.

3. The method for filtering search results in a mobile environment according to claim 1, characterized in that Step 1 obtains the feature vectors of the results to be filtered as follows: all historical query records of all users within the time ΔT are collected, yielding d distinct words q_1, q_2, ..., q_d that serve as the d dimensions of the vector space; the user's feature vector is expressed as f_Ui = {(q_1, v_1), (q_2, v_2), ..., (q_d, v_d)}, i = 1, 2, ..., N, where v_a (a = 1, 2, ..., d) is the weight of each dimension.

4. The method for filtering search results in a mobile environment according to claim 1, characterized in that, in Step 2, the most similar users are obtained as follows:

Step 4.1: to find users similar to the current user U_i, merge the group G_g to which the user belongs with the set of users in the user's social network to obtain the set S, where g is the index of the group to which the user belongs and ranges from 1 to m, and m is the number of groups;

Step 4.2: compute the similarity sim(U_i, U_is) between U_i and each user U_is in S with formula VI, where f_Ui and f_Uis are the feature vectors of U_i and U_is, and ψ(U_i, U_is) is the degree of relationship between U_i and U_is, which takes the corresponding value if U_is is in U_i's social network and zero otherwise; select the top η users U_i1, U_i2, ..., U_iη in descending order of similarity, or all users in S if there are fewer than η; η is a preset value;

$$\mathrm{sim}(U_i, U_{is}) = \bigl(1 + \psi(U_i, U_{is})\bigr) \cdot \cos(f_{U_i}, f_{U_{is}}) \tag{VI}$$

where

$$\cos(f_{U_i}, f_{U_{is}}) = \frac{f_{U_i} \cdot f_{U_{is}}}{\lVert f_{U_i} \rVert \cdot \lVert f_{U_{is}} \rVert}$$

5. The method for filtering search results in a mobile environment according to claim 4, characterized in that, in Step 4.1, the group G_g to which the user belongs is obtained as follows:

Step 5.1: users are partitioned by how frequently their historical location changes. A user's historical location information records historical locations L and the corresponding times T; L is stored in the data set as latitude and longitude, and T as points in time. Given the latitude and longitude of two consecutive historical locations of a user, the distance between them is computed with the latitude-longitude distance formula.

For each user U_i, the accumulated change frequency F_i of the historical location within the most recent period ΔT is computed with formula VII:

$$F_i = \frac{1}{\Delta T} \sum_{k=1}^{M} \left| \frac{\mathrm{Dis}(L_k, L_{k-1})}{T_k - T_{k-1}} \right| \tag{VII}$$

where (L_1, T_1), (L_2, T_2), ..., (L_M, T_M) is the historical location information of user U_i within the most recent period ΔT; (L_{k-1}, T_{k-1}) and (L_k, T_k) are two consecutive historical locations and times of the user, and Dis(L_k, L_{k-1}) and T_k − T_{k-1} are the distance and the time difference between them; M is the number of historical locations of the current user, and k is the index of a historical location;

Step 5.2: the accumulated location change frequencies F of all users are collected to obtain the overall range Ω of F, which is divided into sub-intervals Ω_1, Ω_2, ..., Ω_n, where n is the number of user groups. These sub-intervals characterize different user groups by F; each user is assigned to the sub-interval containing its F, so that the users are divided into groups Ω_1, Ω_2, ..., Ω_n;

Step 5.3: the users in each Ω_j are clustered by historical location information so that users at nearby locations fall into the same cluster, further dividing the users into smaller groups G_1, G_2, ..., G_m, where j = 1, 2, ..., n is the group index.

6. The method for filtering search results in a mobile environment according to claim 5, characterized in that Step 5.3 clusters the users in each Ω_j with the k-means clustering algorithm, as follows:

(b1) first compute the center O_i of the historical locations of each user U_i within the most recent period ΔT, and cluster the users by their centers O_i; i is the index of the user;

(b2) randomly select k users from Ω_j; each selected user U_q represents an initial user cluster C_q, and its center O_q is the initial center of that cluster, q = 1, 2, ..., k;

(b3) for each remaining user in Ω_j, compute its distance to the center O_q of every user cluster C_q and assign it to the nearest cluster;

(b4) recompute the new center O_q of each user cluster to replace the old center, and compute the criterion function E_j with formula VIII; if E_j has converged, the clustering ends; otherwise return to step (b3);

$$E_j = \sum_{q=1}^{k} \sum_{U \in \Omega_j} \mathrm{Dis}(U, C_q), \quad j = 1, 2, \ldots, n \tag{VIII}$$

In formula VIII, Dis(U, C_q) is the distance between a user in Ω_j and the center O_q of user cluster C_q;

(b5) clustering yields compact user clusters, so that, on the basis of the partition Ω_1, Ω_2, ..., Ω_n, the users are further divided into smaller groups G_1, G_2, ..., G_m, achieving user segmentation.

7. The method for filtering search results in a mobile environment according to claim 4, characterized in that, in Step 4.1, the user's social network is constructed as follows:

Step 7.1: using the term frequency/inverse document frequency (TF/IDF) model, compute for each user U_i the weight of each dimension of its feature vector: for each word q_a among q_1, q_2, ..., q_d, the corresponding weight v_a is 0 if the word does not appear in the user's historical query records, and otherwise it is the word's TF/IDF value, where TF is the term frequency and IDF is the inverse document frequency: count the number D of users whose historical query records contain the word; the IDF value is log(N/D), N is the total number of users, and the TF/IDF value is the product of TF and IDF;

Step 7.2: for each user U_i, analyze the call records within the most recent period ΔT; for each user u_x who has call records with U_i, determine the total number of calls α, the total call duration β and the call regularity γ between u_x and U_i within ΔT, and compute the degree of relationship ψ_ix between U_i and u_x with formula IX;

$$\psi_{ix} = \lambda_1 \alpha + \lambda_2 \beta + \lambda_3 \gamma \tag{IX}$$

where 0 ≤ λ1 ≤ 1, 0 ≤ λ2 ≤ 1, 0 ≤ λ3 ≤ 1 and λ1 + λ2 + λ3 = 1, and

$$\gamma = \frac{1}{S_t}, \qquad S_t = \frac{1}{\alpha - 1} \sum_{h=2}^{\alpha} \left( \overline{\Delta t} - \Delta t_h \right)^2$$

$$\Delta t_h = t_h - t_{h-1}, \quad h = 2, 3, \ldots, \alpha$$

$$\overline{\Delta t} = \frac{1}{\alpha - 1} \sum_{h=1}^{\alpha} \Delta t_h$$
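The accumulated location change frequency of formula VII (claim 5) can be sketched as follows. The claims only refer to "the latitude-longitude distance formula"; this sketch assumes the haversine great-circle distance with a mean Earth radius of 6371 km, and it iterates over the M − 1 consecutive pairs of fixes. The names `haversine_km` and `location_change_frequency` are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# ((latitude, longitude) in degrees, timestamp)
Fix = Tuple[Tuple[float, float], float]


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in km (haversine is an assumed choice of Dis)."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def location_change_frequency(history: List[Fix], delta_t: float) -> float:
    """Formula VII: (1 / dT) * sum of |Dis(L_k, L_{k-1}) / (T_k - T_{k-1})|."""
    total = 0.0
    for (prev_loc, prev_t), (loc, t) in zip(history, history[1:]):
        if t == prev_t:
            continue  # assumption: skip pairs with no elapsed time
        total += abs(haversine_km(loc, prev_loc) / (t - prev_t))
    return total / delta_t
```

The result is in kilometres per time unit, divided by ΔT in the same time unit; a user who moves one degree of longitude along the equator in one hour and then stays put for an hour gets F = 111.19 / 2 ≈ 55.6 over a window of ΔT = 2 hours.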
CN 201110458155 2011-12-31 2011-12-31 Filtering method of search results in mobile environment Expired - Fee Related CN102591966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110458155 CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110458155 CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Publications (2)

Publication Number Publication Date
CN102591966A true CN102591966A (en) 2012-07-18
CN102591966B CN102591966B (en) 2013-12-18

Family

ID=46480604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110458155 Expired - Fee Related CN102591966B (en) 2011-12-31 2011-12-31 Filtering method of search results in mobile environment

Country Status (1)

Country Link
CN (1) CN102591966B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1903460A1 (en) * 2006-09-21 2008-03-26 Sony Corporation Information processing
CN101819572A (en) * 2009-09-15 2010-09-01 电子科技大学 Method for establishing user interest model
CN101923545A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for recommending personalized information
CN102236646A (en) * 2010-04-20 2011-11-09 得利在线信息技术(北京)有限公司 Personalized item-level vertical pagerank algorithm iRank

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
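王秀平 (Wang Xiuping) et al.: "Design and Implementation of a Personalized Learning Recommendation System" (个性化学习推荐系统的设计与实现), Microcomputer Applications (《微型电脑应用》) *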
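胡娟丽 (Hu Juanli) et al.: "Personalized Text Information Filtering Based on Typical Feedback" (基于典型反馈的个性化文本信息过滤), Journal of Computer Applications (《计算机应用》) *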

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867031A (en) * 2012-08-27 2013-01-09 百度在线网络技术(北京)有限公司 Method and system for optimizing point of interest (POI) searching results, mobile terminal and server
WO2014101846A1 (en) * 2012-12-28 2014-07-03 Huawei Technologies Co., Ltd. Predictive caching in a distributed communication system
CN104866474A (en) * 2014-02-20 2015-08-26 阿里巴巴集团控股有限公司 Personalized data searching method and device
CN104866474B (en) * 2014-02-20 2018-10-09 阿里巴巴集团控股有限公司 Individuation data searching method and device
CN104317900A (en) * 2014-10-24 2015-01-28 重庆邮电大学 Multiattribute collaborative filtering recommendation method oriented to social network
CN104462239B (en) * 2014-11-18 2017-08-25 电信科学技术第十研究所 A kind of customer relationship based on data vector spatial analysis finds method
CN104462239A (en) * 2014-11-18 2015-03-25 电信科学技术第十研究所 Customer relation discovery method based on data vectorization spatial analysis
CN105243135B (en) * 2015-09-30 2019-09-20 百度在线网络技术(北京)有限公司 Show the method and device of search result
CN105243135A (en) * 2015-09-30 2016-01-13 百度在线网络技术(北京)有限公司 Method and apparatus for showing search result
CN106570699A (en) * 2015-10-08 2017-04-19 平安科技(深圳)有限公司 Client contact information excavation method and server
CN111212381A (en) * 2019-12-18 2020-05-29 中通服建设有限公司 Mobile user behavior data analysis method and device, computer equipment and medium
CN113220969A (en) * 2020-02-06 2021-08-06 百度在线网络技术(北京)有限公司 Advertisement determination method, device, equipment and storage medium
CN113704604A (en) * 2021-08-24 2021-11-26 山东库睿科技有限公司 Search system and search method
CN113792180A (en) * 2021-08-30 2021-12-14 北京百度网讯科技有限公司 Duplicate removal method and device in recommendation scene, electronic equipment and storage medium
CN113792180B (en) * 2021-08-30 2024-02-23 北京百度网讯科技有限公司 Method and device for removing duplicate in recommended scene, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102591966B (en) 2013-12-18

Similar Documents

Publication Publication Date Title
CN102591966B (en) Filtering method of search results in mobile environment
US20110145234A1 (en) Search method and system
CN103593425B (en) Intelligent retrieval method and system based on preference
CN110462604A (en) Data processing system and method based on device usage associated with Internet devices
CN104217030B (en) A kind of method and apparatus that user's classification is carried out according to server search daily record data
CN103700018B (en) A kind of crowd division methods in mobile community network
CN108154425A (en) Method is recommended by the Xian Xia trade companies of a kind of combination community network and position
CN101556603A (en) Coordinate search method used for reordering search results
WO2011054245A1 (en) Mobile search method, device and system
CN101770520A (en) User interest modeling method based on user browsing behavior
CN108595461A (en) Interest heuristic approach, storage medium, electronic equipment and system
CN107896153B (en) A method and device for recommending a data package based on the online behavior of a mobile user
Liu et al. Using collaborative filtering algorithms combined with Doc2Vec for movie recommendation
CN105608121A (en) Personalized recommendation method and apparatus
CN110362740A (en) A kind of water conservancy gateway information mixed recommendation method
CN108874916A (en) A kind of stacked combination collaborative filtering recommending method
CN111079009A (en) User interest detection method and system for government map service
WO2010037314A1 (en) A method for searching and the device and system thereof
CN107368499B (en) Client label modeling and recommending method and device
CN104063555B (en) The user model modeling method intelligently distributed towards remote sensing information
CN115034206B (en) Customer service hot event discovery method and system
CN107133268A (en) A kind of collaborative filtering for Web service recommendation
CN106649733B (en) Online video recommendation method based on wireless access point context classification and perception
CN105873119A (en) Method for classifying flow use behaviors of mobile network user groups
Liu et al. Clustering analysis of urban fabric detection based on mobile traffic data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131218

Termination date: 20201231