CN102355664A - Method for identifying and matching user identity by user-based social network - Google Patents


Info

Publication number
CN102355664A
Authority
CN
China
Application number
CN2011102263262A
Other languages
Chinese (zh)
Inventor
郑毅
Original Assignee
郑毅
Application filed by 郑毅 (Zheng Yi)
Priority to CN2011102263262A
Publication of CN102355664A


Abstract

The invention discloses a user identification and matching method and system for telecom operators, email service providers, instant-messaging service providers and social-network service providers. The method and system comprise three parts: user data acquisition; user feature extraction and identity-database construction; and user identification and matching. User identity features are derived from a user's communication characteristics, the most important being that the user connects several groups that are otherwise independent of one another. The method and system can be used to identify and analyse users moving between service providers, and by security agencies to identify and monitor suspicious persons.

Description

Method for identifying and matching user identity based on the user's social network

Technical Field

[0001] The present invention relates to the fields of communications and e-commerce, and in particular to a method and apparatus for identifying and matching users.

Background

[0002] User identification and matching, as used here, means determining from limited data on users' communication behaviour whether the users of different accounts are the same person. Taking a telecom operator as an example, the main purpose of user identification is to determine whether the same person is using, or has used, several different telephone numbers. These numbers may be served by the same operator or by different operators. Most of the applications described here use telecom operators as the example, but the scope is not limited to identifying and matching the telephone users of telecom operators; the method and apparatus can also be used to identify and match users of email, instant-messaging software and microblogs.

[0003] As competition in the telecommunications industry intensifies, telephone users increasingly leave their network for other operators' services. When a user switches operators, that user changes from an existing customer of the operator into a potential customer; because the operator cannot recognise the user's identity, it is hard to direct targeted promotion and services at that potential customer. In addition, a user who has decided to switch operators will often use two phones at the same time during the transition, to keep communication with existing contacts uninterrupted. If the operator can promptly and reliably recognise that a user is running two phones at once, it can take targeted measures to retain users who are about to switch. Some users even change numbers within the same operator, to obtain a cheaper tariff package or because of other promotions. If the operator cannot discover that the users of different numbers are actually the same person, it cannot provide that user with continuous service. The main purpose of identifying telephone users is therefore to let the operator understand and serve telephone users on a person-centred basis.

[0004] In sociology, researchers have studied in depth, from the perspective of the social capital of firms or individuals, the phenomenon of a firm or individual in a social network acting as an intermediary between two networks that are not directly connected; Ronald Burt called this a "structural hole". However, there is no precedent for using the concept of structural holes to identify and match individuals in a social network.

Summary

[0005] A communication user, as a social being, plays different roles in different groups, and the user's relationships in society define the user's identity. This method identifies and matches the owner of a telephone number by analysing the user's social network. The basic idea is as follows. Because of their different roles, a communication user belongs to several tightly knit circles, for example relatives, friends, colleagues, classmates and club members. Members within each circle are closely connected, whereas members of different circles are less closely related or may not know each other at all. In other words, the user has several closely connected contacts who are not closely connected to one another, so the user acts as a bridge between different circles. Because each person's life history is unique, different users bridge different circles, and this distinctive pattern can serve as a feature for identifying the user.

[0006] An advantage of this method is that the operator can identify and match users without obtaining a user's complete communication records. Consider an intuitive example: communication user A has made friend B at work, and also keeps in close contact with friend C, who still lives in a small county town in A's home region. Because of differences in geography and social class, B and C very likely do not know each other, and A is the only bridge between them. The operator only needs to know that a number is in close contact with both B and C to judge, with reasonable confidence, that the number is used by A. Even if A has changed numbers several times, the operator can judge with considerable confidence that all these numbers belong to A. Moreover, even if A has hundreds of contacts, the operator needs only the records between A and B and between A and C, and can ignore A's records with other users, and still reach a fairly accurate judgement. Since no operator has a 100% market share, no operator can obtain all communication records of all telephone users. Because this method does not require a user's complete records, an operator can identify and match other operators' users using only the inter-network records between its own subscribers and those other operators' subscribers. This property also lets an operator quickly identify and match a user from partial records at an early stage, just after the user switches operator, leaves the network, or is about to leave, and then launch targeted promotion.

[0007] The method has three phases: data preparation, feature extraction and feature matching. Data preparation extracts users' communication records from the communication network. Feature extraction uses the historical records to characterise each user by the user's social-network information, and builds a database of user interaction features. Feature matching extracts the records of certain accounts of interest and, based on those records, matches each account against the existing user-identity feature database.

[0008] The flow of the three phases (data preparation, feature extraction and feature matching) is shown in Figure 1 and consists of steps 101 to 103:

[0009] 101. Extract users' communication history. While providing communication services, telecom operators accumulate large volumes of communication history in their systems for billing or system maintenance. This history includes, but is not limited to: voice-call records between users, SMS records between users, MMS records between users, and records of the instant-messaging products that users use. These inter-user communication records form the data basis of the method.

[0010] 102. Build the user interaction feature database from the communication records. Based on users' historical records, extract the social relationships that best represent and identify each user, and build the interaction feature database on that basis. The method used in this step is described in detail in Figure 2 and the corresponding steps 201 to 204.

[0011] 103. Search the user interaction feature database for historical accounts whose interaction features resemble those of the accounts of interest. Once the interaction feature database has been built from historical records, the operator can use the interaction features of newly appearing numbers to check whether they match historical numbers in the database, as shown by the methods of Figures 3 and 4. Figure 3 and steps 301 to 305 describe how an operator uses its inter-network records with other operators to determine where other operators' new users come from and whether its own users show signs of imminent churn. Figure 4 and steps 401 to 404 describe how an operator uses the database to determine where its own new users come from. Further, the technique can be used to build a real-time tracking and alerting system for the number changes and usage of sensitive persons likely to change numbers (for example, criminal suspects); the steps are shown in Figure 5 and steps 501 to 505.

[0012] After users' communication data have been obtained, a database of user behavioural features must be built from them. The aim is to find features of user behaviour that are unique, stable, hard to forge and able to identify users fairly accurately. A user's pattern of social ties is a behavioural feature that meets these requirements. The following describes how to build the interaction feature database from users' interaction behaviour, in steps 201 to 204:

[0013] 201. From the communication records, compute the strength of the tie between each user and every other user. Whether two users are connected can be determined from whether communication records exist between their two phones. The closer and more stable the contact between two users, the stronger their tie is considered to be. Call duration, call count and similar indicators are not recommended as measures of tie strength, because call duration depends heavily on personal communication habits, and because two users may communicate at length for a short period by chance without having formed a long-term, stable social tie. For example, while renovating a new home a user may exchange many calls with the foreman of the renovation crew, but after the work is finished they seldom have contact; such a tie provides little useful information for future identification and matching. We therefore define the strength of the relationship between two users A and B as:

[0014]

$$
w_{AB}(t) =
\begin{cases}
\alpha \cdot w_{AB}(t-1) + 1, & \text{if A and B had contact between time } t-1 \text{ and time } t,\\
\alpha \cdot w_{AB}(t-1), & \text{if A and B had no contact between time } t-1 \text{ and time } t.
\end{cases}
\tag{1}
$$

[0015] Each time interval may be, for example, one day, one week or one month. Here 0 ≤ α ≤ 1 is the decay coefficient. When α = 0, only the most recent interval affects the strength of the relationship. When α = 1, the strength equals the number of intervals, among the most recent t intervals (the last t days, t weeks, t months, ...), in which the two users had contact. A suggested value is α = 0.9. The length of each interval can be chosen according to the characteristics of the communication service. In a concrete implementation, the definition of tie strength between two users can be further refined.
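The recurrence in formula (1) can be sketched in Python as follows. This is a minimal illustration, assuming the strength starts at 0 before the first interval and that the per-interval contact flags for a pair (A, B) have already been derived from the call and message records; the function name `relationship_strength` is hypothetical.

```python
from typing import Iterable


def relationship_strength(contacted: Iterable[bool], alpha: float = 0.9) -> float:
    """Decayed tie strength w_AB(t) after the last interval in `contacted`.

    contacted[k] is True if A and B had any contact in interval k (oldest first).
    Each step applies w(t) = alpha * w(t-1) + 1 on contact and
    w(t) = alpha * w(t-1) otherwise, starting from an assumed w(0) = 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    w = 0.0
    for had_contact in contacted:
        w = alpha * w + (1.0 if had_contact else 0.0)
    return w


# Weekly contact flags for one pair over the last five weeks (oldest first).
weeks = [True, True, False, True, False]
print(relationship_strength(weeks, alpha=1.0))  # 3.0: weeks with contact
print(relationship_strength(weeks, alpha=0.0))  # 0.0: only the last week counts
```

With α = 1 the function simply counts intervals with contact, matching the description above; a value such as 0.9 lets old contact fade gradually instead of vanishing at once.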

[0016] 202. Filter the communication records, removing nodes and edges unsuitable for user identification. If a user has d contacts, there are d(d−1)/2 possible triads consisting of the user and any two of those contacts. Commercial numbers, such as public-service numbers and telemarketing numbers, may contact tens of thousands of mobile numbers over a period, giving hundreds of millions of possible triads and consuming large amounts of computing and storage resources. Numbers with too many contacts, that is, nodes of excessively high degree, therefore need to be handled separately according to the system's capacity. Likewise, weak edges should be removed, that is, ties too weak to be useful for future identification and matching; such ties may have been accidental, and the two parties are unlikely to stay in contact. In practice, how many edges to keep for each user (each node) depends on the size of the network and the computing capacity of the system. An empirical suggestion is to keep no more than 50 edges per node.
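The filtering in step 202 might look like the sketch below. It assumes tie strengths are held in a nested dictionary; the 50-edge default follows the empirical suggestion above, while `max_degree`, `min_strength` and the choice to drop high-degree numbers outright (rather than handle them in some other way) are hypothetical.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: tie strength}


def triad_count(degree: int) -> int:
    """Number of (user, contact, contact) triads for a node with `degree` contacts."""
    return degree * (degree - 1) // 2


def prune_graph(strength: Graph, max_edges: int = 50,
                max_degree: int = 10_000, min_strength: float = 0.0) -> Graph:
    """Drop very high-degree nodes and keep only each node's strongest ties."""
    # Assumption: numbers with more than max_degree contacts (hotlines,
    # telemarketers) are removed both as centres and as neighbours.
    hubs = {node for node, nbrs in strength.items() if len(nbrs) > max_degree}
    pruned: Graph = {}
    for node, nbrs in strength.items():
        if node in hubs:
            continue
        kept = [(nb, w) for nb, w in nbrs.items()
                if nb not in hubs and w > min_strength]
        kept.sort(key=lambda item: item[1], reverse=True)  # strongest first
        pruned[node] = dict(kept[:max_edges])
    return pruned


print(triad_count(50))  # 1225 triads for a node that keeps 50 edges
```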

[0017] 203. Compute the strength of each user as a bridge connecting two other users. Suppose node n is in contact with both node $n_1$ and node $n_2$; then node n's tie strengths to $n_1$ and $n_2$ constitute a feature of n. Let the tie strengths between n and $n_1$ and between n and $n_2$ be $w_{nn_1}$ and $w_{nn_2}$ respectively. The strength of n as a bridge connecting $n_1$ and $n_2$ should be defined so as to satisfy the following properties:

[0018] a) The larger $w_{nn_1}$ and $w_{nn_2}$ are, the stronger the bridge.
[0019] b) The closer $w_{nn_1}$ and $w_{nn_2}$ are to each other, the stronger the bridge. (This property accounts for the fact that different users have different communication habits: if n's tie strengths to $n_1$ and $n_2$ are similar, this indicates that his relationships with the two …)

[0020] c) The weaker the tie between $n_1$ and $n_2$, the stronger the bridge. (Rationale: if $n_1$ and $n_2$ are connected, then n, $n_1$ and $n_2$ are probably in the same circle; many users within that circle may be connected to one another, which makes it hard to identify and match a user from the ties internal to that social network.)

[0021] To satisfy these properties, the strength of node n as a bridge connecting $n_1$ and $n_2$ is defined as follows:

[0022]

(Equation (2), the definition of the bridge strength, appears only as an image in the original and is not reproduced here.)

[0023] Here $s_{n_1 n_2}(n)$ is the strength of node n as a bridge connecting nodes $n_1$ and $n_2$, and $w_{nn_1}$, $w_{nn_2}$ and $w_{n_1 n_2}$ are the tie strengths between the corresponding nodes. To satisfy property b), the definition uses an expression similar to information entropy. To satisfy property c), a logarithm is placed in the denominator; the "+1.01" inside the logarithm prevents the denominator from being negative or zero, and β in the denominator is a constant that adjusts the influence of $w_{n_1 n_2}$. From the formula it can be seen that, for an undirected graph, the bridge strength has the following property:

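[0024]

(Equation (3), the property of the bridge strength on an undirected graph, appears only as an image in the original and is not reproduced here.)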
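Because formula (2) is not reproduced in this text, the function below is only a hypothetical stand-in and not the patent's definition: it is one simple expression that satisfies properties a) to c), using an entropy-like balance term for b) and a logarithm of β·w(n1, n2) + 1.01 in the denominator for c), as the text describes. The name `bridge_strength` and the default β = 1 are assumptions.

```python
import math


def bridge_strength(w1: float, w2: float, w12: float, beta: float = 1.0) -> float:
    """Hypothetical strength of n as a bridge between n1 and n2 (NOT formula (2)).

    w1 = w(n, n1), w2 = w(n, n2), w12 = w(n1, n2); all assumed >= 0, beta >= 0.
    """
    total = w1 + w2  # property a): stronger ties give a stronger bridge
    if total <= 0.0:
        return 0.0
    # Property b): binary entropy of the shares w1/total and w2/total,
    # 1.0 when w1 == w2 and 0.0 when one of the two ties is absent.
    balance = 0.0
    for w in (w1, w2):
        p = w / total
        if p > 0.0:
            balance -= p * math.log2(p)
    # Property c): the denominator grows with w12; since beta * w12 + 1.01 > 1,
    # the logarithm is always positive.
    return total * balance / math.log(beta * w12 + 1.01)
```

When n1 and n2 have no tie, the denominator is log(1.01), roughly 0.01, so such pairs dominate the vector; a larger β makes the score fall faster once n1 and n2 are in contact.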

[0025] 204. Characterise each node using the end points of its bridges as variables and the bridge strengths as values. For the node n corresponding to any user, let the degree of the node be $d_n$ and the nodes connected to n be $n_1, n_2, \ldots, n_{d_n}$; the user's interaction features are then represented by the following vector:

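[0026]

$$
v_n = \sum_{1 \le i < j \le d_n} s_{n_i n_j}(n)\,(n_i \cdot n_j) \tag{4}
$$

where each contact combination $n_i \cdot n_j$ is a variable and $s_{n_i n_j}(n)$, the strength of n as a bridge between $n_i$ and $n_j$, is its value.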
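Step 204 amounts to building a sparse vector keyed by contact pairs. The sketch below assumes an undirected graph stored as a nested dictionary and takes the bridge-strength function as a parameter, since formula (2) is not reproduced here; `feature_vector`, `tie` and the toy bridge function in the example are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: tie strength}
Pair = Tuple[str, str]               # contact combination n_i . n_j, with n_i < n_j


def tie(strength: Graph, a: str, b: str) -> float:
    """Undirected tie strength, looked up in either direction (0.0 if absent)."""
    return strength.get(a, {}).get(b, strength.get(b, {}).get(a, 0.0))


def feature_vector(node: str, strength: Graph,
                   bridge: Callable[[float, float, float], float]) -> Dict[Pair, float]:
    """Sparse vector v_n mapping each contact pair to n's bridge strength."""
    contacts = sorted(strength.get(node, {}))
    vec: Dict[Pair, float] = {}
    for a, b in combinations(contacts, 2):  # d_n * (d_n - 1) / 2 pairs
        s = bridge(tie(strength, node, a), tie(strength, node, b), tie(strength, a, b))
        if s > 0.0:                         # store non-zero entries only
            vec[(a, b)] = s
    return vec


g = {"n": {"x": 2.0, "y": 1.0, "z": 1.0}, "x": {"y": 1.0}}
toy_bridge = lambda w1, w2, w12: min(w1, w2) / (1.0 + w12)  # illustrative only
print(feature_vector("n", g, toy_bridge))
# {('x', 'y'): 0.5, ('x', 'z'): 1.0, ('y', 'z'): 1.0}
```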

[0027] Denote by Con(n) the set of all combinations $n_i \cdot n_j$ of the nodes $n_1, n_2, \ldots, n_{d_n}$ connected to n in the expression above. The similarity between the interaction features $v_m$ and $v_n$ of two nodes m and n can then be defined using cosine similarity as follows:

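[0028]

$$
\mathrm{Similarity}(v_m, v_n) = \frac{\sum_{n_i \cdot n_j \in Con(m) \cap Con(n)} s_{n_i n_j}(m)\, s_{n_i n_j}(n)}{\sqrt{\sum_{n_i \cdot n_j \in Con(m)} s_{n_i n_j}(m)^2}\;\sqrt{\sum_{n_i \cdot n_j \in Con(n)} s_{n_i n_j}(n)^2}} \tag{5}
$$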

[0029] In practice, however, the inner product $\langle v_m \cdot v_n \rangle$ of $v_m$ and $v_n$ is recommended as the similarity measure between m and n, that is:

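[0030]

$$
\mathrm{Similarity}(v_m, v_n) = \langle v_m \cdot v_n \rangle = \sum_{n_i \cdot n_j \in Con(m) \cap Con(n)} s_{n_i n_j}(m)\, s_{n_i n_j}(n) \tag{6}
$$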

[0031] The larger the similarity $\mathrm{Similarity}(v_m, v_n)$, the more likely it is that m and n are the same user. In practice, the similarity formula between m and n may need further improvement and optimisation.
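Both similarity measures, formulas (5) and (6), can be computed directly on sparse vectors. A minimal sketch, assuming each feature vector is a dictionary from contact pair to bridge strength, so that the keys present in both vectors are exactly Con(m) ∩ Con(n); the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Vector = Dict[Tuple[str, str], float]  # contact pair -> bridge strength


def inner_product(vm: Vector, vn: Vector) -> float:
    """<v_m . v_n>: sum over contact pairs present in both Con(m) and Con(n)."""
    if len(vm) > len(vn):
        vm, vn = vn, vm  # iterate over the smaller vector
    return sum(w * vn[pair] for pair, w in vm.items() if pair in vn)


def cosine_similarity(vm: Vector, vn: Vector) -> float:
    """Inner product divided by the product of the Euclidean norms."""
    norm = (math.sqrt(sum(w * w for w in vm.values()))
            * math.sqrt(sum(w * w for w in vn.values())))
    return inner_product(vm, vn) / norm if norm > 0.0 else 0.0


vm = {("a", "b"): 2.0, ("a", "c"): 1.0}
vn = {("a", "b"): 3.0, ("b", "c"): 4.0}
print(inner_product(vm, vn))  # 6.0: only the pair a.b is shared
```

The inner product recommended in [0029] rewards numbers that share many strong bridges in absolute terms, while cosine similarity removes the effect of vector length; the text leaves the final choice to practice.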

[0032] A further refinement: the strength of node n as a bridge connecting $n_1$ and $n_2$ in formula (2) can be optimised; the optimised strength, denoted $s'_{n_1 n_2}(n)$, is:

[0033]

(Equation (7), the optimised bridge strength, appears only as an image in the original and is not reproduced here.)

[0034] That is, if several nodes simultaneously act as bridges connecting $n_1$ and $n_2$, the ability of $n_1 \cdot n_2$ to discriminate node n's identity is greatly reduced. The definition in formula (7) can therefore replace the definition in formula (2) to build optimised user interaction feature vectors.
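Formula (7) is not reproduced here. As a hypothetical stand-in that captures the stated intent, the sketch below divides each bridge strength by the number of nodes in the population that bridge the same pair, in the spirit of inverse document frequency; the discount actually used in formula (7) may differ, and `discount_shared_pairs` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Tuple

Vector = Dict[Tuple[str, str], float]  # contact pair -> bridge strength


def discount_shared_pairs(vectors: Dict[str, Vector]) -> Dict[str, Vector]:
    """Divide each bridge strength by how many nodes bridge the same pair."""
    bridges_per_pair = Counter(pair for vec in vectors.values() for pair in vec)
    return {
        node: {pair: s / bridges_per_pair[pair] for pair, s in vec.items()}
        for node, vec in vectors.items()
    }


vectors = {"n": {("a", "b"): 2.0, ("a", "c"): 1.0}, "m": {("a", "b"): 4.0}}
print(discount_shared_pairs(vectors))
# {'n': {('a', 'b'): 1.0, ('a', 'c'): 1.0}, 'm': {('a', 'b'): 2.0}}
```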

[0035] As telecom-service penetration approaches saturation and competition in the industry intensifies, many telecom users are attracted by better or cheaper services from other operators and leave their current operator for another. The user interaction feature database built by this method, together with the inter-network records between an operator and other operators, helps the operator analyse where other operators' new users come from, and hence where its own departing users go, so that it can improve its operations and services accordingly. The specific method is shown in Figure 3 and steps 301 to 305:

[0036] 301. From the inter-network records between other operators' telephone users and its own telephone users, the operator maintains a database of other operators' telephone numbers. When new numbers appear in another operator's number list, or some of another operator's long-inactive numbers become active again, this may mean that the other operator has acquired a batch of new telephone users.
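Detecting new or reactivated numbers in step 301 could be sketched as below, assuming the operator keeps, for each other-operator number seen in inter-network records, the date it was last active before the current window. The 90-day dormancy threshold and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set


def newly_active_numbers(active_now: Iterable[str],
                         last_seen_before: Dict[str, date],
                         window_start: date,
                         dormancy: timedelta = timedelta(days=90)) -> Set[str]:
    """Other-operator numbers that are new, or active again after dormancy.

    active_now: numbers seen in inter-network records during the current window.
    last_seen_before: last activity date of each known number before the window.
    """
    flagged: Set[str] = set()
    for number in active_now:
        last = last_seen_before.get(number)
        if last is None or window_start - last > dormancy:
            flagged.add(number)
    return flagged


history = {"n1": date(2011, 1, 5), "n2": date(2011, 6, 20)}
print(sorted(newly_active_numbers(["n1", "n2", "n3"], history, date(2011, 7, 1))))
# ['n1', 'n3']: n1 was dormant for 177 days, n3 was never seen before
```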

[0037] 302. Extract the communication records between other operators' new numbers and the operator's own subscribers, and build interaction feature vectors for those new numbers following steps 201 to 204, using formulas (1) to (4).

[0038] 303. Using formula (6), compute the similarity between other operators' new numbers and the operator's own numbers. For each new number k, select the operator's own number i that is most similar to it. If their similarity exceeds a threshold, number i is placed on a list for further examination.
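Step 303 reduces to a nearest-neighbour search with a threshold. The sketch assumes feature vectors are dictionaries from contact pair to bridge strength and takes the similarity function (for example, the inner product of formula (6)) as a parameter; `watch_list` and the threshold value in the test are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Vector = Dict[Tuple[str, str], float]  # contact pair -> bridge strength


def watch_list(new_vectors: Dict[str, Vector],
               own_vectors: Dict[str, Vector],
               similarity: Callable[[Vector, Vector], float],
               threshold: float) -> List[Tuple[str, str, float]]:
    """(new number k, most similar own number i, similarity) above the threshold."""
    hits: List[Tuple[str, str, float]] = []
    for k, vk in new_vectors.items():
        scored = [(similarity(vk, vi), i) for i, vi in own_vectors.items()]
        if not scored:
            continue
        best_sim, best_i = max(scored)  # most similar own number for k
        if best_sim > threshold:
            hits.append((k, best_i, best_sim))
    return hits
```

This brute-force search compares every new number with every existing one. An inverted index from each contact pair to the numbers whose vectors contain it would limit comparisons to numbers sharing at least one pair; that is a natural optimisation, not one described in the text.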

[0039] 304. Check whether the calling behaviour of the numbers on that list fluctuates. Such fluctuations include, but are not limited to: the line being suspended, the number being cancelled, call volume falling by more than a threshold, SMS volume falling by more than a threshold, or data traffic falling by more than a threshold. If so, the users on the list can be judged with considerable confidence to be about to leave, or to have already left, for another operator.
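The fluctuation check in step 304 might be expressed as follows. The `Usage` fields and the single 50% drop threshold applied to calls, SMS and data are hypothetical; the text only says that each decline must exceed a certain threshold.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Activity of one number over a comparison period (hypothetical fields)."""
    active: bool     # False if the line is suspended or cancelled
    calls: float     # call volume
    sms: float       # SMS volume
    data_mb: float   # data traffic


def shows_churn_signal(before: Usage, after: Usage, max_drop: float = 0.5) -> bool:
    """True if the line stopped, or any volume fell by more than max_drop (a fraction)."""
    if not after.active:
        return True
    for old, new in ((before.calls, after.calls),
                     (before.sms, after.sms),
                     (before.data_mb, after.data_mb)):
        if old > 0 and (old - new) / old > max_drop:
            return True
    return False


print(shows_churn_signal(Usage(True, 100, 40, 800), Usage(True, 20, 35, 780)))
# True: call volume fell by 80%
```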

[0040] 305. Take appropriate measures for the operator's users who are likely to switch networks, for example: contact and survey these users through the customer service centre to find out why they are leaving, and apply targeted care and retention measures to them.

[0041] Operators are now working hard to acquire new users, but some of these users have switched from competing operators, some are genuinely new users, and some are the operator's own existing customers who have changed numbers. If the sources of new telephone users cannot be distinguished effectively, the operator cannot properly evaluate its user-acquisition policies, nor provide continuous service to users who have changed numbers. The user interaction feature database proposed by this method helps the operator identify where new telephone customers come from, as shown in Figure 4 and steps 401 to 404:

[0042] 401. The operator obtains the telephone numbers of users who have newly joined its network from channels such as the business operation support system and the customer relationship management system.

[0043] 402. Extract the communication records of the newly joined numbers and build their interaction feature vectors following steps 201 to 204, using formulas (1) to (4).

[0044] 403. Compare the interaction feature vectors of the newly joined numbers with the interaction feature database of historical numbers, and compute the similarity between new and historical numbers using formula (6). Historical numbers include not only the operator's own historical numbers but also numbers served by other operators, obtained from inter-network records.

[0045] 404. Determine the source of each new user from the similarity between the new number and historical numbers, classifying new users into several categories: the operator's existing users who have changed numbers, users who have switched from other operators, and genuinely new users. Produce a business-intelligence report analysing the sources of new users.
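The three-way classification of step 404 could be sketched as below, assuming the inner product of formula (6) as the similarity and a single threshold separating a match from no match; the category labels and function names are illustrative.

```python
from typing import Dict, Optional, Tuple

Vector = Dict[Tuple[str, str], float]  # contact pair -> bridge strength


def inner_product(vm: Vector, vn: Vector) -> float:
    """Sum of products over contact pairs present in both vectors."""
    return sum(w * vn[pair] for pair, w in vm.items() if pair in vn)


def classify_new_subscriber(v_new: Vector,
                            own_history: Dict[str, Vector],
                            rival_history: Dict[str, Vector],
                            threshold: float) -> Tuple[str, Optional[str]]:
    """Return (category, best-matching historical number or None)."""
    category, match, best = "genuinely new user", None, threshold
    for label, history in (("existing user, changed number", own_history),
                           ("switched from another operator", rival_history)):
        for number, v_old in history.items():
            sim = inner_product(v_new, v_old)
            if sim > best:  # must beat the threshold and every earlier match
                category, match, best = label, number, sim
    return category, match
```

A new number is labelled by whichever history (own or rival) contains its best match above the threshold; if neither does, it is treated as a genuinely new user.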

[0046] Another use of the method and apparatus of this patent is in public security, for identifying and monitoring criminal suspects. Suspects, especially fugitives, often change phone numbers to contact their associates. Manually monitoring and intercepting all of a suspect's contacts raises several problems: (1) manual monitoring and interception take a great deal of time and manpower, and accuracy is low; (2) manual monitoring and interception are slow to respond; (3) manual monitoring and interception infringe the privacy of many people, and such an approach is difficult to authorise legally. The method and apparatus of this patent can automatically narrow down the range of numbers a suspect may be using after a number change; because all of the work is done automatically by computer, the problems above can be effectively avoided. The specific method is shown in Figure 5 and steps 501 to 505:

[0047] 501. After obtaining approval under the relevant laws, the public security department provides the suspect's historical phone numbers to the telecom operator.

[0048] 502. The telecom operator retrieves the communication records of the suspect's historical phone numbers from its historical communication records.

[0049] 503. From the historical communication records of the suspect's historical phone number $s$, the telecom operator constructs the contact feature vector $V_s$ as described in formulas (2) to (4).
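
Formulas (2) to (4) are not reproduced in this section, so the sketch below only illustrates the general shape of step 503 under loudly stated assumptions. It assumes the feature vector is indexed by pairs of contacts (matching the contact combinations $s_i \cdot s_j$ used in step 504), that contacts reached fewer than `min_count` times are dropped, that pairs who talk to each other directly are skipped (following the abstract's point that a user links mutually independent groups), and that a pair's weight is the smaller of the two contact counts. The function name, `min_count`, `linked` and the weighting rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import AbstractSet, Dict, Iterable, Tuple

Pair = Tuple[str, str]


def build_contact_vector(s: str, records: Iterable[Tuple[str, str]],
                         min_count: int = 2,
                         linked: AbstractSet[Pair] = frozenset()) -> Dict[Pair, float]:
    """Hypothetical stand-in for formulas (2) to (4): weights over contact pairs."""
    counts: Counter = Counter()
    for caller, callee in records:
        # Keep only records involving s and count each counterpart.
        if caller == s and callee != s:
            counts[callee] += 1
        elif callee == s and caller != s:
            counts[caller] += 1
    # Drop occasional contacts that are too weak to characterise s.
    contacts = sorted(c for c, n in counts.items() if n >= min_count)
    vector: Dict[Pair, float] = {}
    for a, b in combinations(contacts, 2):
        # Skip pairs who talk to each other directly: they are in the same group.
        if (a, b) in linked or (b, a) in linked:
            continue
        # Assumed rule: a pair is only as strong as s's weaker tie to either member.
        vector[(a, b)] = float(min(counts[a], counts[b]))
    return vector


records = [("s", "A"), ("A", "s"), ("s", "B"), ("B", "s"),
           ("C", "s"), ("s", "C"), ("s", "D"), ("X", "Y")]
print(build_contact_vector("s", records, linked={("B", "A")}))
# {('A', 'C'): 2.0, ('B', 'C'): 2.0}
```

Sorting the contacts before taking combinations gives each pair a single canonical key, so the same combination is never stored twice in reversed order.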

[0050] 504. The telecom operator sets early-warning rules on its communication equipment to screen for phone numbers suspected of currently being used by the suspect. The early-warning rules are set as follows: extract the several largest-weight variables $\{w_{s_1 s_2}, \ldots, w_{s_i s_j}, \ldots\}$ of the contact feature vector $V_s$ of suspect $s$, together with the corresponding contact combinations $\{s_1 \cdot s_2, \ldots, s_i \cdot s_j, \ldots\}$. Whenever any phone number $p$ has been in contact with both $s_i$ and $s_j$ of a combination $s_i \cdot s_j$, $p$ is added to the list of suspected phone numbers of suspect $s$ so that it can be investigated in more depth in the next step.
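
The early-warning rule of step 504 can be sketched directly: take the top-k weighted contact combinations from $V_s$ and flag any number that has contacted both members of one combination. This is a sketch under assumptions: $V_s$ is a mapping from contact pairs to weights, `k` and all numbers are hypothetical, and the historical number itself and the pair's own members are excluded, which the text does not state explicitly.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def top_pairs(vector: Dict[Pair, float], k: int) -> List[Pair]:
    """The k contact combinations s_i·s_j with the largest weights in V_s."""
    return [pair for pair, _ in heapq.nlargest(k, vector.items(), key=lambda kv: kv[1])]


def suspected_numbers(s: str, vector: Dict[Pair, float],
                      records: Iterable[Tuple[str, str]], k: int = 3) -> Set[str]:
    """Numbers that contacted both members of one of the top-k combinations."""
    pairs = top_pairs(vector, k)
    watched = {c for pair in pairs for c in pair}
    # For every number, remember which watched contacts it has talked to.
    touched: Dict[str, Set[str]] = defaultdict(set)
    for a, b in records:
        if b in watched:
            touched[a].add(b)
        if a in watched:
            touched[b].add(a)
    flagged: Set[str] = set()
    for p, contacts in touched.items():
        if p == s:
            continue  # the suspect's historical number is already known
        for si, sj in pairs:
            # Rule of step 504: p contacted both s_i and s_j of one combination.
            if p not in (si, sj) and si in contacts and sj in contacts:
                flagged.add(p)
                break
    return flagged


vector = {("A", "B"): 5.0, ("A", "C"): 4.0, ("B", "C"): 1.0, ("D", "E"): 0.5}
records = [("P1", "A"), ("B", "P1"), ("P2", "A"), ("P3", "C"), ("P3", "B"),
           ("s", "A"), ("s", "B")]
print(sorted(suspected_numbers("s", vector, records, k=2)))  # ['P1']
```

Requiring contact with both members of a combination, rather than with any single contact, is what keeps the list short: an ordinary acquaintance of one contact rarely also talks to a second contact from an unrelated group.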

[0051] 505. The telecom operator's technical staff manually assess the suspicious phone numbers generated by the system and screen out the numbers most likely to be currently used by the suspect. The criteria for this manual assessment include how long each suspicious number has been on the network and its historical number of communications and charges. The communication records of the numbers retained after manual screening are handed over to the public security department in accordance with legal procedures for further analysis.
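
Step 505 is a manual judgement, so code can at most pre-sort the candidates for the reviewers. The sketch below assumes, without support from the text, that a replacement number is likely to be recently activated and lightly used, and therefore orders candidates by shortest network tenure, then fewest historical communications, then lowest charges. The `Candidate` fields and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    number: str
    tenure_days: int   # how long the number has been on the network
    call_count: int    # historical number of communications
    charges: float     # historical communication charges


def review_order(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates for manual review; the ordering itself is an assumption."""
    # Shorter tenure, fewer calls and lower charges are listed first.
    return sorted(candidates, key=lambda c: (c.tenure_days, c.call_count, c.charges))


candidates = [
    Candidate("P1", 400, 900, 300.0),
    Candidate("P2", 12, 40, 15.0),
    Candidate("P3", 12, 10, 5.0),
]
print([c.number for c in review_order(candidates)])  # ['P3', 'P2', 'P1']
```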

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overall flowchart of user identity identification and matching.
FIG. 2 is a flowchart of contact feature extraction for communication users.
FIG. 3 is a flowchart for identifying where subscribers who leave the telecom operator's network go.
FIG. 4 is a flowchart for identifying the sources of the telecom operator's new subscribers.
FIG. 5 is a flowchart of automatic identification and monitoring of suspicious users.

Claims (5)

1. A method and apparatus, usable by telecom operators, e-mail service providers, instant messaging service providers, social network service providers and in the security field, for quickly and accurately identifying and matching user identities based on users' social networks using a small amount of user behavior data, the core of the method and apparatus comprising: identifying association relationships between users; defining the strength of the association relationships between users; screening out users unsuitable for the method and apparatus; screening out user relationships unsuitable for use as user features; defining and characterizing user contact-relationship features and building a communication-user contact feature database; and matching and identifying user features.
2. A system constructed to implement the functions of claim 1, comprising the functions of: extracting the communication history of communication users from the telecom operator's operating systems; building the communication-user contact feature database from the communication records; and matching unknown users against the features in the communication-user contact feature database.
3. Use of the user contact feature database of claim 1 to identify the sources of a rival telecom operator's new subscribers, and to identify signs that this telecom operator's subscribers are about to leave the network and where they go after leaving, together with a system built for this purpose comprising the following functional modules: a module that discovers new phone numbers of other telecom operators from inter-network call records; a module that extracts the communication records between other telecom operators' new phone numbers and this operator's network; and a module that compares the contact feature vectors of other telecom operators' new phone numbers with the contact feature library of this telecom operator's phone numbers.
4. A system that uses the user contact feature database of claim 1 to identify the sources of this telecom operator's newly subscribed phone numbers.
5. A method and system that use the user behavior feature vectors of claim 1 to help telecom operators promptly discover, for security departments, new phone numbers used by criminal suspects while protecting the privacy of telephone users to the greatest possible extent, comprising the following functions: the telecom operator retrieves the communication records of a suspect's historical phone numbers from historical communication records; the telecom operator builds the suspect's contact feature vector from the historical communication records of the suspect's historical phone numbers; and the telecom operator sets early-warning rules on its communication equipment to screen for phone numbers suspected of currently being used by the suspect.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011102263262A CN102355664A (en) 2011-08-09 2011-08-09 Method for identifying and matching user identity by user-based social network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011102263262A CN102355664A (en) 2011-08-09 2011-08-09 Method for identifying and matching user identity by user-based social network

Publications (1)

Publication Number Publication Date
CN102355664A true CN102355664A (en) 2012-02-15

Family

ID=45579145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011102263262A CN102355664A (en) 2011-08-09 2011-08-09 Method for identifying and matching user identity by user-based social network

Country Status (1)

Country Link
CN (1) CN102355664A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030217283A1 (en) * 2002-05-20 2003-11-20 Scott Hrastar Method and system for encrypted network management and intrusion detection
CN1547402A (en) * 2003-12-11 2004-11-17 上海正前信息科技发展有限公司 Reverse authentication system and reverse authentication method for identity registration handset short message
WO2005011305A1 (en) * 2003-07-31 2005-02-03 Huawei Technologies Co., Ltd. A optimization mutual method of the user terminal select accessing mobile network in wlan
CN1889730A (en) * 2005-09-30 2007-01-03 华为技术有限公司 Wireless user identification module, communication terminal equipment and communication control method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103916244A (en) * 2013-01-04 2014-07-09 深圳市腾讯计算机系统有限公司 Verification method and device
WO2014106422A1 (en) * 2013-01-04 2014-07-10 Tencent Technology (Shenzhen) Company Limited Method and apparatus for user authentication
US9985944B2 (en) 2013-01-04 2018-05-29 Tencent Technology (Shenzhen) Company Limited Method and apparatus for user authentication
CN103914494B (en) * 2013-01-09 2017-05-17 北大方正集团有限公司 Method and system for identifying identity of microblog user
CN103914494A (en) * 2013-01-09 2014-07-09 北大方正集团有限公司 Method and system for identifying identity of microblog user
CN103279713B (en) * 2013-06-08 2015-11-18 广西师范大学 A kind of social network diagram data publication method for secret protection of optimization
CN103279713A (en) * 2013-06-08 2013-09-04 广西师范大学 Optimized SNS (social network service) graph data publication privacy protection method
CN105590232A (en) * 2014-11-11 2016-05-18 中国移动通信集团广东有限公司 Client relation generation method and apparatus, and electronic device
CN104933139A (en) * 2015-06-17 2015-09-23 中国科学院计算技术研究所 Social network user identity real-virtual mapping method and device
CN104933139B (en) * 2015-06-17 2018-06-01 中国科学院计算技术研究所 A kind of method and device of social network user identity actual situation mapping
CN105306213A (en) * 2015-09-23 2016-02-03 中国联合网络通信集团有限公司 User information processing method and system
CN105243566A (en) * 2015-10-28 2016-01-13 联动优势科技有限公司 Method and apparatus for evaluating credit of users through different mobile phone number information from operators
CN105959911A (en) * 2016-06-22 2016-09-21 中国联合网络通信集团有限公司 Method and device for identifying user
CN105959911B (en) * 2016-06-22 2019-07-16 中国联合网络通信集团有限公司 Identify the method and device of user
CN106331060A (en) * 2016-08-12 2017-01-11 广州市高奈特网络科技有限公司 Control execution method and system based on WIFI
CN106331060B (en) * 2016-08-12 2019-12-31 广州市高奈特网络科技有限公司 WIFI-based deployment and control method and system

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)