WO2018113370A1 - Method, apparatus, and system for extending users - Google Patents

Method, apparatus, and system for extending users

Info

Publication number
WO2018113370A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavior
user
seed
users
behavior data
Prior art date
Application number
PCT/CN2017/104079
Other languages
English (en)
French (fr)
Inventor
张海滨
杨怡玲
张旭
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17884770.3A (published as EP3537365A4)
Publication of WO2018113370A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/60: Business processes related to postal services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0204: Market segmentation

Definitions

  • the present invention relates to the field of communications, and in particular, to a method, device, and system for extending a user.
  • To attract more users, telecom operators constantly introduce new services. Accurately targeting the people who need a new service, and thereby expanding the number of potential users, is one of the key factors in whether the new service can be launched successfully.
  • A telecom operator can usually obtain a list of users who already use a certain telecommunication service (such as a ring back tone service); these users are called "seed users". Based on these seed users, the operator hopes to extend its users by promoting new services to potential extended users among the larger population of non-seed users.
  • For example, telecom operator A has designed, based on market research, a new discounted "campus package" for holiday interns.
  • The package includes services such as free incoming calls and reduced data charges from 8:00 pm to 6:00 am the next day.
  • Operator A has obtained the list of users who subscribed to the "campus package" on their own initiative (the seed users). It needs to analyze these seed users and then, from the large group of "non-seed users" who have not subscribed to the package, find people with similar user attributes as potential extended users, thereby increasing the marketing success rate.
  • One common existing solution extends users based on rules: human experience is used to define the criteria for potential users, and users who meet the criteria are treated as potential users.
  • This solution needs a large amount of prior knowledge to define the criteria, and in many practical scenarios such prior knowledge is hard to obtain.
  • It also usually requires business experts to help define the criteria, and the number of extended users produced by the criteria tends to be either too large or too small, so potential extended users are hard to identify accurately.
  • Embodiments of the present invention provide a method, an apparatus, and a system for extending users, which can solve the problem that existing solutions have difficulty identifying potential extended users accurately.
  • To solve the above technical problem, the embodiments of the present invention adopt the following technical solutions:
  • A method for extending users, comprising the following steps:
  • acquiring behavior data information of a plurality of users, the behavior data information including time- and/or space-related information about when and where a user's behavior occurred;
  • The plurality of users include seed users and non-seed users across the entire network, where a seed user is a user who already uses a certain telecommunication service (such as a ring back tone), and a non-seed user is a user who has not yet used that service.
  • The rasterization process is specifically: segmenting the time dimension along the time axis and mapping the user's behavior data to the corresponding time period; and/or using latitude and longitude to divide the two-dimensional position space into a grid and mapping the user's behavior data to the corresponding grid cell.
  • Segmenting the time axis discretizes continuous time. For example, with 8:00-9:00 as one time period, a behavior at 8:04 and a behavior at 8:08 fall into the same time period. This makes shared regularities easier to find and helps extract the behavior pattern set of the seed users.
  • The behavior pattern set includes a plurality of behavior patterns. Each behavior pattern represents one behavior characteristic of the seed users, that is, a behavioral habit: at what time and/or in what place a certain behavior of the seed users occurs.
  • The step of pattern-matching each non-seed user's behavior data against the behavior pattern set is specifically:
  • The score reflects the degree of similarity.
  • High similarity means that the non-seed user's behavior patterns resemble those of the seed users.
  • Low similarity means that the non-seed user's behavior patterns differ from those of the seed users.
  • The weight coefficients used for the fusion weight may be determined by dividing a pattern's frequency by the total number of seed users.
  • a data mining and analyzing device including:
  • a receiving unit configured to acquire behavior data information of a plurality of users, where the behavior data information includes time and/or space-related information of a behavior of the user;
  • a mapping unit configured to perform rasterization processing on the behavior data information of the plurality of users, and convert the behavior data information into mapping behavior data information;
  • a mode extracting unit configured to perform behavior pattern extraction by using mapping behavior data information of a plurality of seed users of the plurality of users, to obtain a behavior pattern set of the plurality of seed users;
  • a pattern matching unit configured to perform pattern matching on the behavior data of each non-seed user and the behavior pattern set, and obtain similarity between each non-seed user and the plurality of seed users;
  • a determining unit configured to determine, according to the similarity, whether at least one of the plurality of non-seed users is a potential extended user;
  • a sending unit configured to send the information of the potential extended user to the marketing platform.
  • a system for extending a user including:
  • a storage device configured to store behavior data information of multiple users, where the behavior data information includes time and/or space-related information of a user's behavior;
  • the marketing platform launches a marketing service for potential extended users according to the judgment result of the data mining and analysis device.
  • Yet another aspect of the present application provides a computer program product for performing the method of the above aspects when the computer product is executed.
  • a further aspect of the present application provides a computer readable storage medium having stored therein instructions for performing the methods described in the above aspects.
  • Embodiments of the present invention provide a method, an apparatus, and a system for extending users: the spatio-temporal behavior data of seed users is rasterized to obtain a behavior pattern set of the seed users; the behavior data of a plurality of non-seed users is then pattern-matched against that set to obtain the similarity between each non-seed user and the seed users; and whether a non-seed user is a potential extended user is determined based on the similarity.
  • The solution analyzes the user attributes of the seed users, summarizes the regularities they share to obtain a behavior pattern set, matches the user attributes of non-seed users against these patterns, and then scores each non-seed user according to the match to determine whether that user is a potential user, thereby locating the potential user group more accurately.
  • FIG. 1 is a schematic structural diagram of an extended user system according to an embodiment of the present invention.
  • FIG. 2 is a structural block diagram of a data mining and analyzing device according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of spatial rasterization processing according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart diagram of an extended user method according to an embodiment of the present invention.
  • the system for extending users of the embodiment of the present invention includes: a storage device 10, a data mining and analysis device 20, a marketing platform 30, and a plurality of users 40.
  • the storage device 10 may be a storage server, an optical disk, a hard disk, or the like that can store user behavior data.
  • the communication base station aggregates the captured location information (latitude and longitude) of the user's mobile phone, the user's APP usage record, and the like to obtain the behavior data of the user, and then stores the acquired user behavior data in the form of the basic metadata table to the storage device 10.
  • the contents of the underlying metadata table are shown in Table 1 below:
  • The first record of Table 1 indicates that user 1 used Taobao at 8:00:00 on September 15, 2016 at the location (longitude 120.2001, latitude 30.0602).
  • the data mining and analysis device 20 is configured to mine and analyze a plurality of user behavior data stored by the storage device 10, and select a part of users from the plurality of users as potential extended users, so that the marketing platform 30 can use the potential extended user to perform new services. Promotion.
  • the data mining and analyzing device 20 of the embodiment of the present invention further includes: a receiving unit 201, a mapping unit 202, a mode extracting unit 203, a pattern matching unit 204, a determining unit 205, and a transmitting unit 206.
  • The receiving unit 201 is configured to acquire behavior data information of multiple users from the storage device 10, where the behavior data information includes time- and/or space-related information about the users' behavior.
  • The mapping unit 202 is configured to perform rasterization processing on the behavior data information of the plurality of users acquired by the receiving unit, and convert the behavior data information into mapping behavior data information.
  • Time rasterization specifically segments the time dimension along the time axis. For example, a certain point in time, such as the beginning of each month, can be taken as the origin of the time axis, with 1 hour as one time period (also called a time slot), so that each day is divided into 24 time periods and any time between 8:00 and 9:00 is discretized to the number 8.
  • Time periods can also be divided in other ways as needed, such as 10 minutes or 30 minutes per period. If a user browses multiple websites within one time period, the 3 most-visited websites can be taken as representative.
  • Spatial rasterization specifically uses latitude and longitude information to divide the two-dimensional position space into a grid.
  • For example, the geographic space is divided into squares with a side length of 500 meters, so that the user's position at any moment can be located in one of the grid cells.
  • In this embodiment, latitude/longitude 0:0 is the coordinate origin, a distance of 500 meters corresponds to a longitude step (step1) of 0.0045 degrees, and likewise the latitude step (step2) is 0.0045 degrees.
  • Each 500 meters to the right increases the longitude by 0.0045, and each 500 meters upward increases the latitude by 0.0045.
  • For a point A whose original coordinates are 0.0045:0.0045, the converted coordinates are 1:1.
  • For another point B whose original coordinates are 0.0055:0.0055, the converted coordinates are still 1:1. After rasterization, point A and point B fall in the same square, represented by 1:1.
  • After the basic metadata table of Table 1 is rasterized in time and space, the mapping behavior data information is obtained, as shown in Table 3:
  • the mode extracting unit 203 performs pattern extraction on the behavior data of the seed user (assuming that the number of seed users is 500) in the mapping behavior data information, and obtains a behavior pattern set sorted based on the frequency or the number of customers, as shown in Table 4 below:
  • Mode 1 means: Taobao is used in time period 8 and Yahoo in time period 10; 209 people have this behavior pattern.
  • Mode 2 means: Taobao is used in time period 8 and Baidu in time period 11; 189 people have this behavior pattern.
  • Each behavior pattern represents one behavior characteristic of the seed users, that is, a behavioral habit of the seed users.
  • The pattern matching unit 204 pattern-matches the behavior data of each non-seed user against the behavior pattern set, with the following results:
  • User 1 matches behavior patterns 1, 2, 3, 4, and 5;
  • The similarity between each non-seed user and the seed users is then calculated using fusion weights.
  • The judging unit 205 sorts the non-seed users by similarity score. The higher the score, the closer the non-seed user's behavior patterns are considered to be to those of the seed users and the higher the expected success rate of promoting new services, so the user is treated as a potential extended user.
  • the sending unit 206 provides the list of potential extended users to the marketing platform 30 so that the marketing platform 30 utilizes the list of potential extended users for promotion of new services.
  • the method for extending a user in the embodiment of the present invention includes the following steps:
  • the data mining and analyzing device acquires the behavior data information of the plurality of users from the storage device, where the multiple users include multiple seed users and multiple non-seed users, and the behavior data information specifically includes one of the following three situations:
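  • the user's behavior and the time at which the behavior occurred, or
  • the user's behavior and the place where the behavior occurred, or
  • the user's behavior and information related to both the time and the place at which the behavior occurred.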
  • the data mining and analyzing device performs rasterization processing on the behavior data information of the plurality of users, and converts the behavior data information into mapping behavior data information.
  • Rasterization processing means rasterizing the behavior data information so that the user's behavior data is mapped to the corresponding time periods and/or spaces. Specifically, the time dimension is segmented along the time axis and the user's behavior data is mapped to the corresponding time period; and/or latitude and longitude are used to divide the two-dimensional position space into a grid and the user's behavior data is mapped to the corresponding square.
  • The form and content of the mapping behavior data information are as shown in Table 3 of the previous embodiment and are not described again.
  • the data mining and analyzing device performs behavior pattern extraction by using mapping behavior data information of the plurality of seed users of the plurality of users, and obtains a behavior pattern set of the plurality of seed users.
  • the association rule algorithm Prefixspan is used to perform pattern mining on the behavior data of the seed user, and the behavior pattern set based on the frequency or the number of users is obtained, as shown in Table 4 of the previous embodiment, and details are not described herein.
  • the behavior pattern set includes a plurality of behavior patterns, each behavior pattern represents a behavior characteristic of the seed user, and the behavior characteristic refers to a behavior habit of the seed user, that is, when and/or where the certain behavior of the seed user occurs .
  • the data mining and analyzing device uses the behavior pattern set to perform pattern matching on the behavior data of each non-seed user and the behavior pattern set of the plurality of seed users, respectively, to obtain each of the non-seed users and the plurality of The similarity of the seed users.
  • The behavior data of each non-seed user is matched against the behavior pattern set of the seed users. If a non-seed user matches multiple spatio-temporal patterns, the matching result is obtained using fusion weights, which are calculated in the same way as in the previous embodiment and are not described again.
  • the data mining and analyzing device determines, according to the similarity, whether the non-seed user is a potential extended user. For example, if the total number of non-seed users is 500, if 100 people need to be selected as potential extended users, the similarity scores of the 500 non-seed users can be sorted in descending order. It is considered that the top 100 non-seed users have similar spatiotemporal behavior patterns as the seed users, and are determined to be potential extended users.
  • Compared with the prior art, the present invention analyzes the temporal and/or spatial behavior data of the seed users, summarizes the regularities they share, and extracts the behavior pattern set of the seed users. It then matches the behavior data of each non-seed user against that set one by one and obtains a similarity score from the matching result to determine whether the non-seed user is a potential extended user. This enables telecom operators to locate potential customers more accurately and improves the efficiency of expanding new services.
  • the apparatus and method disclosed in the several embodiments provided by the present application may be implemented in other manners.
  • the division of the unit is only a logical function division, and the actual implementation may have another division manner.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • Embodiments of the present invention also provide a computer program product and a storage medium storing the above computer program.
  • the computer program product includes program code stored in a computer readable storage medium, and the program code is loaded by a processor to implement the above method.
  • the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program codes.

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

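A method for extending users is disclosed, including: acquiring behavior data information of a plurality of users, where the behavior data information includes time- and/or space-related information about when and where the users' behavior occurred; rasterizing the behavior data information to convert it into mapping behavior data information; extracting behavior patterns from the mapping behavior data information to obtain a behavior pattern set of a plurality of seed users; pattern-matching the behavior data of each non-seed user against the behavior pattern set to obtain the similarity between each non-seed user and the plurality of seed users; and determining, according to the similarity, whether at least one of the plurality of non-seed users is a potential extended user.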

Description

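Method, apparatus, and system for extending users
This application claims priority to Chinese Patent Application No. 201611194189.8, filed with the Chinese Patent Office on December 21, 2016 and entitled "Method, apparatus, and system for extending users", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communications, and in particular to a method, an apparatus, and a system for extending users.
Background
To attract more users, telecom operators constantly launch new services. Accurately targeting the people who need a new service, and thereby expanding the number of potential users, is one of the key factors in whether the new service can be launched successfully. Typically, a telecom operator can obtain a list of users who already use a certain telecommunication service (such as a ring back tone service); these users are called "seed users". Based on these seed users, the operator hopes to extend its users by some method, promoting the new service to potential extended users among the larger population of non-seed users.
For example, telecom operator A has designed, based on market research, a new discounted "campus package" for holiday interns. The package includes services such as free incoming calls and reduced data charges from 8:00 pm to 6:00 am the next day. Operator A has obtained the list of users who subscribed to this "campus package" on their own initiative (the seed users). It needs to analyze these seed users and then, from the large group of "non-seed users" who have not subscribed to the package, find people with similar user attributes as potential extended users, thereby improving the marketing success rate.
One common existing solution extends users based on rules: human experience is used to define the criteria for potential users, and users who meet the criteria are treated as potential users. This solution needs a large amount of prior knowledge to define the criteria, and in many practical scenarios such prior knowledge is hard to obtain. In addition, the solution usually requires business experts to help define the criteria, and the number of extended users obtained from the criteria is either too large or too small, so potential extended users are hard to identify accurately.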
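Summary
Embodiments of the present invention provide a method, an apparatus, and a system for extending users, which can solve the problem that existing solutions have difficulty identifying potential extended users accurately.
To solve the above technical problem, the embodiments of the present invention adopt the following technical solutions:
According to a first aspect, a method for extending users is provided, including the following steps:
acquiring behavior data information of a plurality of users, where the behavior data information includes time- and/or space-related information about when and where a user's behavior occurred;
performing rasterization processing on the behavior data information of the plurality of users to convert the behavior data information into mapping behavior data information;
extracting behavior patterns from the mapping behavior data information of a plurality of seed users among the plurality of users, to obtain a behavior pattern set of the plurality of seed users;
pattern-matching the behavior data of each non-seed user against the behavior pattern set, to obtain the similarity between each non-seed user and the plurality of seed users;
determining, according to the similarity, whether at least one of the plurality of non-seed users is a potential extended user.
The plurality of users include seed users and non-seed users across the entire network, where a seed user is a user who already uses a certain telecommunication service (such as a ring back tone), and a non-seed user is a user who has not yet used that service.
The rasterization processing is specifically: segmenting the time dimension along the time axis and mapping the user's behavior data to the corresponding time period; and/or using latitude and longitude to divide the two-dimensional position space into a grid and mapping the user's behavior data to the corresponding grid cell. Segmenting the time axis discretizes continuous time. For example, with 8:00-9:00 as one time period, a behavior at 8:04 and a behavior at 8:08 fall into the same time period, which makes shared regularities easier to find and helps extract the behavior pattern set of the seed users.
The behavior pattern set includes a plurality of behavior patterns. Each behavior pattern represents one behavior characteristic of the seed users, that is, a behavioral habit: at what time and/or in what place a certain behavior of the seed users occurs.
The step of pattern-matching the behavior data of each non-seed user against the behavior pattern set is specifically:
matching the behavior data of each non-seed user against the behavior pattern set of the seed users; if a non-seed user matches multiple behavior patterns, obtaining the matching result using fusion weights; and finally obtaining, for each non-seed user, a similarity score against the behavior pattern set of the seed users. The score reflects the degree of similarity: high similarity means that the non-seed user's behavior patterns resemble those of the seed users, and low similarity means that they differ.
The weight coefficients used for the fusion weight may be determined by dividing a pattern's frequency by the total number of seed users.
According to a second aspect of the present invention, a data mining and analysis device is provided, including:
a receiving unit, configured to acquire behavior data information of a plurality of users, where the behavior data information includes time- and/or space-related information about when and where a user's behavior occurred;
a mapping unit, configured to perform rasterization processing on the behavior data information of the plurality of users to convert the behavior data information into mapping behavior data information;
a pattern extraction unit, configured to extract behavior patterns from the mapping behavior data information of a plurality of seed users among the plurality of users, to obtain a behavior pattern set of the plurality of seed users;
a pattern matching unit, configured to pattern-match the behavior data of each non-seed user against the behavior pattern set, to obtain the similarity between each non-seed user and the plurality of seed users;
a judging unit, configured to determine, according to the similarity, whether at least one of the plurality of non-seed users is a potential extended user;
a sending unit, configured to send information about the potential extended users to a marketing platform.
According to a third aspect of the present invention, a system for extending users is provided, including:
a storage device, configured to store behavior data information of a plurality of users, where the behavior data information includes time- and/or space-related information about when and where a user's behavior occurred;
the data mining and analysis device according to the second aspect; and
a marketing platform, which launches marketing services for the potential extended users according to the determination result of the data mining and analysis device.
Yet another aspect of this application provides a computer program product which, when executed, is used to perform the method described in the above aspects.
Yet another aspect of this application provides a computer-readable storage medium storing instructions used to perform the method described in the above aspects.
Embodiments of the present invention provide a method, an apparatus, and a system for extending users: the spatio-temporal behavior data of seed users is rasterized to obtain a behavior pattern set of the seed users; the behavior pattern data of a plurality of non-seed users is then pattern-matched against that set to obtain the similarity between the non-seed users and the seed users; and whether a non-seed user is a potential extended user is determined based on the similarity. The solution analyzes the user attributes of the seed users, summarizes the regularities they share to obtain a behavior pattern set, matches the user attributes of non-seed users against these patterns, and then scores each non-seed user according to the match to determine whether that user is a potential user, thereby locating the potential user group more accurately.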
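Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the composition of a system for extending users according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a data mining and analysis device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of spatial rasterization processing according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a method for extending users according to an embodiment of the present invention.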
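Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments.
As shown in FIG. 1, the system for extending users in this embodiment of the present invention includes a storage device 10, a data mining and analysis device 20, a marketing platform 30, and a plurality of users 40. The storage device 10 may be any device that can store user behavior data, such as a storage server, an optical disc, or a hard disk. A communication base station aggregates the captured location information of the users' mobile phones (longitude and latitude), the users' APP usage records, and so on to obtain the users' behavior data, and then stores that behavior data on the storage medium of the storage device 10 in the form of a basic metadata table. The content of the basic metadata table is shown in Table 1 below:
Figure PCTCN2017104079-appb-000001
Table 1: Basic metadata table
The first record of Table 1 indicates that user 1 used Taobao at 8:00:00 on September 15, 2016 at the location (longitude 120.2001, latitude 30.0602).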
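As a minimal sketch of what one row of the basic metadata table might look like in code, the record described above can be modelled as a small immutable data class. Table 1 itself is only available as an image, so the class name `BehaviorRecord` and all field names are assumptions made for illustration, not names taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BehaviorRecord:
    """One row of the basic metadata table (field names are hypothetical)."""
    user_id: str
    timestamp: datetime
    longitude: float
    latitude: float
    app: str


# The first record of Table 1 as described in the text.
first_record = BehaviorRecord(
    user_id="ID1",
    timestamp=datetime(2016, 9, 15, 8, 0, 0),
    longitude=120.2001,
    latitude=30.0602,
    app="Taobao",
)
```

Keeping the raw timestamp and coordinates in the record leaves both rasterization steps free to choose their own slot length and grid step later.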
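The data mining and analysis device 20 is configured to mine and analyze the behavior data of the plurality of users stored in the storage device 10 and to select some of those users as potential extended users, so that the marketing platform 30 can use the potential extended users to promote new services.
As shown in FIG. 2, the data mining and analysis device 20 in this embodiment of the present invention further includes a receiving unit 201, a mapping unit 202, a pattern extraction unit 203, a pattern matching unit 204, a judging unit 205, and a sending unit 206.
The receiving unit 201 is configured to acquire behavior data information of a plurality of users from the storage device 10, where the behavior data information includes time- and/or space-related information about when and where a user's behavior occurred.
The mapping unit 202 is configured to perform rasterization processing on the behavior data information of the plurality of users acquired by the receiving unit, to convert the behavior data information into mapping behavior data information.
Time rasterization specifically segments the time dimension along the time axis. For example, a certain point in time, such as the beginning of each month, can be taken as the origin of the time axis, with 1 hour as one time period (also called a time slot), so that each day is divided into 24 time periods and any time between 8:00 and 9:00 is discretized to the number 8. Time periods can also be divided in other ways as needed, such as 10 minutes or 30 minutes per period. If a user browses multiple websites within one time period, the 3 most-visited websites can be taken as representative.
After time rasterization, the behavior data in the basic metadata table of Table 1 becomes the data shown in Table 2 below:
User ID   Time   APP usage
ID1       8      Taobao
ID1       10     Yahoo
ID1       11     Baidu
ID1       15     Sina
ID2       8      Taobao
ID2       9      Taobao
ID2       10     Yahoo
ID2       11     Baidu
ID3       8      Taobao
IDn       23     Tmall
Table 2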
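The time rasterization described above can be sketched as follows. This is an illustrative sketch rather than the patent's implementation: the function names `time_slot` and `top_apps_per_slot` are hypothetical, the slot index is taken within the day (matching the values 8, 10, 11 in Table 2), and "top 3" is interpreted as the three most frequent items in a slot.

```python
from collections import Counter, defaultdict
from datetime import datetime


def time_slot(ts, slot_minutes=60):
    """Discretize a timestamp to the index of its slot within the day."""
    return (ts.hour * 60 + ts.minute) // slot_minutes


def top_apps_per_slot(events, slot_minutes=60, top_n=3):
    """Map (user_id, datetime, app) events to {(user_id, slot): [apps]}.

    Only the top_n most frequently used apps are kept for each slot.
    """
    counts = defaultdict(Counter)
    for user_id, ts, app in events:
        counts[(user_id, time_slot(ts, slot_minutes))][app] += 1
    return {
        key: [app for app, _ in counter.most_common(top_n)]
        for key, counter in counts.items()
    }
```

With the default one-hour slot, a behavior at 8:04 and one at 8:08 both map to slot 8; passing `slot_minutes=30` or `slot_minutes=10` gives the finer divisions the text mentions.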
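Spatial rasterization specifically uses latitude and longitude information to divide the two-dimensional position space into a grid. For example, the geographic space is divided into squares with a side length of 500 meters, so that the user's position at any moment can be located in one of the grid cells. As shown in FIG. 3, in this embodiment latitude/longitude 0:0 is the coordinate origin, a distance of 500 meters corresponds to a longitude step (step1) of 0.0045 degrees, and likewise the latitude step (step2) is 0.0045 degrees.
Each grid cell can be computed with the following formulas:
Longi = Math.floor(longitude / step1)   (longitude conversion, rounded down)
Lati = Math.floor(latitude / step2)   (latitude conversion, rounded down)
According to FIG. 3, each 500 meters to the right increases the longitude by 0.0045, and each 500 meters upward increases the latitude by 0.0045. For a point A in the grid whose original coordinates are 0.0045:0.0045, the conversion formulas give Longi = floor(0.0045/0.0045) and Lati = floor(0.0045/0.0045), both rounded down, so the rasterized coordinates are 1:1. For another point B whose original coordinates are 0.0055:0.0055, the formulas give Longi = floor(0.0055/0.0045) and Lati = floor(0.0055/0.0045), both rounded down, so the rasterized coordinates are still 1:1. After rasterization, point A and point B therefore fall in the same square, represented by 1:1.
Thus, any point in space with coordinates 120.20XX:30.06XX is mapped by rasterization to floor(120.20XX/0.0045) = 23XXX and floor(30.06XX/0.0045) = 67XX, that is, to the square represented by Longi:Lati (23XXX:67XX).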
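The two conversion formulas can be checked with a short sketch. The function name `grid_cell` is hypothetical; the step values are the ones stated in the text. Computing directly with these steps, the Table 1 position (120.2001, 30.0602) lands in cell 26711:6680, so the 23XXX:67XX form in the text is treated here as a placeholder rather than an exact value. The fixed 0.0045 degree step for 500 meters is the embodiment's simplification; the ground distance covered by a degree of longitude in practice varies with latitude.

```python
import math

STEP_LON = 0.0045  # step1: longitude step for roughly 500 m, as stated in the text
STEP_LAT = 0.0045  # step2: latitude step for roughly 500 m, as stated in the text


def grid_cell(longitude, latitude, step_lon=STEP_LON, step_lat=STEP_LAT):
    """Map a coordinate to its (Longi, Lati) grid cell by rounding down."""
    return (math.floor(longitude / step_lon), math.floor(latitude / step_lat))


# Points A and B from the text fall into the same cell.
cell_a = grid_cell(0.0045, 0.0045)
cell_b = grid_cell(0.0055, 0.0055)
```

Because `math.floor` rounds toward negative infinity, western longitudes and southern latitudes get negative cell indices without any special handling.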
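After the basic metadata table of Table 1 is rasterized in time and space, the mapping behavior data information is obtained, as shown in Table 3 below:
User ID   Time   Location      APP usage
ID1       8      23XXX:67XX    Taobao
ID1       10     23XXX:67XX    Yahoo
ID1       11     23XXX:67XX    Baidu
ID1       15     23XXX:67XX    Sina
ID2       8      23XXX:67XX    Taobao
ID2       9      23XXX:67XX    Taobao
ID2       10     23XXX:67XX    Yahoo
ID2       11     23XXX:67XX    Baidu
ID3       8      23XXX:67XX    Taobao
IDn       23     23XXX:67XX    Tmall
Table 3: Mapping behavior data information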
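The pattern extraction unit 203 performs pattern extraction on the behavior data of the seed users in the mapping behavior data information (assuming 500 seed users) and obtains a behavior pattern set sorted by frequency or by number of customers, as shown in Table 4 below:
Mode     Behavior pattern                        Number of people
Mode 1   8 Taobao, 10 Yahoo                      209
Mode 2   8 Taobao, 11 Baidu                      189
Mode 3   8 Taobao, 10 Yahoo, 15 Sina             80
Mode 4   10 location A, 11 Baidu                 50
Mode 5   8 location A Taobao                     30
Mode n   6 location A Sina, 8 location B Yahoo   2
Table 4: Behavior pattern set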
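Mode 1 means: Taobao is used in time period 8 and Yahoo in time period 10; 209 people have this behavior pattern. Mode 2 means: Taobao is used in time period 8 and Baidu in time period 11; 189 people have this behavior pattern. Each behavior pattern represents one behavior characteristic of the seed users, that is, a behavioral habit of the seed users.
The pattern matching unit 204 pattern-matches the behavior data of each non-seed user against the behavior pattern set, with the following results:
User 1 matches behavior patterns 1, 2, 3, 4, and 5;
User 2 matches behavior patterns 1, 2, and 4;
User 3 matches behavior pattern 5;
The similarity between each non-seed user and the seed users is then calculated using fusion weights.
Score of user 1: (209+189+80+50+30)/500 = 1.116 (the number of seed users is 500)
Score of user 2: (209+189+50)/500 = 0.896
Score of user 3: 30/500 = 0.06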
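The scoring above can be reproduced with a small sketch. `fusion_score` follows the arithmetic in the text exactly: the pattern counts of all matched modes are summed and divided by the number of seed users. The `matches` helper is an assumption: the text does not define when a user "matches" a pattern, so it is interpreted here as the pattern's items appearing in the user's time-ordered events in the same order. Both function names are hypothetical.

```python
PATTERN_COUNTS = {1: 209, 2: 189, 3: 80, 4: 50, 5: 30}  # from Table 4
SEED_TOTAL = 500  # number of seed users assumed in the text


def fusion_score(matched_ids, counts=PATTERN_COUNTS, seed_total=SEED_TOTAL):
    """Sum of (pattern count / seed_total) over all matched patterns."""
    return sum(counts[i] for i in matched_ids) / seed_total


def matches(pattern, events):
    """True if the items of `pattern` occur in `events` in the same order.

    Assumed matching rule; items are (time slot, behavior) tuples.
    """
    remaining = iter(events)
    return all(item in remaining for item in pattern)


score_user1 = fusion_score([1, 2, 3, 4, 5])  # user 1 in the text
score_user2 = fusion_score([1, 2, 4])        # user 2 in the text
score_user3 = fusion_score([5])              # user 3 in the text
```

Note that the score is a sum of per-pattern weights, so it can exceed 1 when a user matches many frequent patterns, as user 1 does; it is a ranking score rather than a probability.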
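The judging unit 205 sorts the non-seed users by similarity score. The higher the score, the closer the non-seed user's behavior patterns are considered to be to those of the seed users and the higher the expected success rate of promoting new services, so the user is treated as a potential extended user.
The sending unit 206 provides the list of potential extended users to the marketing platform 30, so that the marketing platform 30 can use the list to promote new services.
As shown in FIG. 4, the method for extending users in this embodiment of the present invention includes the following steps:
S102: The data mining and analysis device acquires behavior data information of a plurality of users from the storage device. The plurality of users include a plurality of seed users and a plurality of non-seed users, and the behavior data information specifically includes one of the following three cases:
the user's behavior and the time at which the behavior occurred, or
the user's behavior and the place where the behavior occurred, or
the user's behavior and information related to both the time and the place at which the behavior occurred.
The form and content in which the behavior data information is recorded are as shown in Table 1 of the previous embodiment and are not described again.
S104: The data mining and analysis device performs rasterization processing on the behavior data information of the plurality of users to convert the behavior data information into mapping behavior data information. Rasterization processing means rasterizing the behavior data information so that the user's behavior data is mapped to the corresponding time periods and/or spaces. Specifically, the time dimension is segmented along the time axis and the user's behavior data is mapped to the corresponding time period; and/or latitude and longitude are used to divide the two-dimensional position space into a grid and the user's behavior data is mapped to the corresponding square.
The form and content of the mapping behavior data information are as shown in Table 3 of the previous embodiment and are not described again.
S106: The data mining and analysis device extracts behavior patterns from the mapping behavior data information of the plurality of seed users among the plurality of users, to obtain a behavior pattern set of the plurality of seed users.
Specifically, the association rule algorithm PrefixSpan is used to mine patterns from the behavior data of the seed users, yielding a behavior pattern set sorted by frequency or by number of users, as shown in Table 4 of the previous embodiment, which is not described again. The behavior pattern set includes a plurality of behavior patterns. Each behavior pattern represents one behavior characteristic of the seed users, that is, a behavioral habit: at what time and/or in what place a certain behavior of the seed users occurs.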
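To make the mining step concrete, here is a compact sketch of PrefixSpan for sequences of single items, using pseudo-projection (each projected entry is a sequence index plus a start offset). It is a simplified illustration under stated assumptions: each seed user's rasterized behavior is encoded as a time-ordered list of hashable items such as `(time slot, app)` tuples, which is an encoding chosen here and not specified in the text, and `prefixspan` and `min_support` are hypothetical names.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple of items): support} for all sequential
    patterns contained in at least `min_support` sequences."""
    results = {}

    def mine(prefix, projected):
        # projected: list of (sequence index, start offset) pairs.
        # For every item, record its first position in each projected suffix.
        occurrences = defaultdict(dict)
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                occurrences[seq[pos]].setdefault(sid, pos)
        for item, first_pos in occurrences.items():
            support = len(first_pos)  # number of sequences containing the item
            if support >= min_support:
                pattern = prefix + (item,)
                results[pattern] = support
                # Recurse on the suffixes that follow the first occurrence.
                mine(pattern, [(sid, pos + 1) for sid, pos in first_pos.items()])

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


# Three seed-user sequences taken from the rows of Table 2.
seed_sequences = [
    [(8, "Taobao"), (10, "Yahoo"), (11, "Baidu"), (15, "Sina")],
    [(8, "Taobao"), (9, "Taobao"), (10, "Yahoo"), (11, "Baidu")],
    [(8, "Taobao")],
]
patterns = prefixspan(seed_sequences, min_support=2)
ranked = sorted(patterns.items(), key=lambda kv: -kv[1])  # most frequent first
```

Sorting the result by support gives a frequency-ordered pattern set in the spirit of Table 4; a production system would more likely rely on an existing sequence-mining library than on a hand-written recursion.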
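S108: Using the behavior pattern set, the data mining and analysis device pattern-matches the behavior data of each non-seed user against the behavior pattern set of the plurality of seed users one by one, to obtain the similarity between each non-seed user and the plurality of seed users.
Specifically, the behavior data of each non-seed user is matched against the behavior pattern set of the seed users. If a non-seed user matches multiple spatio-temporal patterns, the matching result is obtained using fusion weights, which are calculated in the same way as in the previous embodiment and are not described again.
S110: The data mining and analysis device determines, according to the similarity, whether each non-seed user is a potential extended user. Taking a total of 500 non-seed users as an example, if 100 of these 500 people need to be selected as potential extended users, the similarity scores of the 500 non-seed users can be sorted from high to low; the top 100 non-seed users are then considered to have spatio-temporal behavior patterns similar to those of the seed users and are determined to be potential extended users.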
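The selection in step S110 amounts to a top-k pick over the similarity scores. A minimal sketch, assuming the scores are held in a dictionary keyed by user ID (the function name `select_potential_users` and the generated example data are hypothetical):

```python
import heapq


def select_potential_users(scores, k):
    """Return the k user ids with the highest similarity scores,
    ordered from highest to lowest score."""
    return heapq.nlargest(k, scores, key=scores.get)


# Illustrative data only: 500 non-seed users with made-up scores.
example_scores = {f"user{i}": i / 500 for i in range(500)}
potential = select_potential_users(example_scores, 100)
```

`heapq.nlargest` avoids sorting the whole population when k is small relative to the number of non-seed users; for 500 users a full sort would work equally well.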
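Compared with the prior art, the present invention analyzes the temporal and/or spatial behavior data of the seed users, summarizes the regularities they share, and extracts the behavior pattern set of the seed users. It then matches the behavior data of each non-seed user against that set one by one and obtains a similarity score from the matching result to determine whether the non-seed user is a potential extended user. This enables telecom operators to locate potential customer groups more accurately and improves the efficiency of expanding new services.
The apparatus and method disclosed in the several embodiments provided in this application may be implemented in other ways. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
Embodiments of the present invention further provide a computer program product and a storage medium storing the above computer program. The computer program product includes program code stored in a computer-readable storage medium, and the program code is loaded by a processor to implement the above method. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing describes only specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.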

Claims (15)

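  1. A method for extending users, comprising the following steps:
    acquiring behavior data information of a plurality of users, wherein the behavior data information includes time- and/or space-related information about when and where a user's behavior occurred;
    performing rasterization processing on the behavior data information of the plurality of users to convert the behavior data information into mapping behavior data information;
    extracting behavior patterns from the mapping behavior data information of a plurality of seed users among the plurality of users, to obtain a behavior pattern set of the plurality of seed users;
    pattern-matching the behavior data of each non-seed user against the behavior pattern set, to obtain the similarity between each non-seed user and the plurality of seed users;
    determining, according to the similarity, whether at least one of the plurality of non-seed users is a potential extended user.
  2. The method according to claim 1, wherein the rasterization processing specifically comprises: rasterizing the behavior data information so that the user's behavior data is mapped to the corresponding time periods and/or spaces.
  3. The method according to claim 2, wherein the rasterization processing is specifically:
    segmenting the time dimension along the time axis and mapping the user's behavior data to the corresponding time period; and/or using latitude and longitude to divide the two-dimensional position space into a grid and mapping the user's behavior data to the corresponding grid cell.
  4. The method according to any one of claims 1 to 3, wherein the step of extracting behavior patterns from the mapping behavior data information of the plurality of seed users among the plurality of users, to obtain the behavior pattern set of the plurality of seed users, is specifically: using the association rule algorithm PrefixSpan to mine patterns from the behavior data of the seed users, to obtain a behavior pattern set sorted by frequency or by number of users.
  5. The method according to any one of claims 1 to 4, wherein the behavior pattern set includes a plurality of behavior patterns, each behavior pattern represents one behavior characteristic of the seed users, and a behavior characteristic is a behavioral habit of the seed users, that is, at what time and/or in what place a certain behavior of the seed users occurs.
  6. The method according to any one of claims 1 to 5, wherein the step of pattern-matching the behavior data of each non-seed user against the behavior pattern set is specifically:
    matching the behavior data of each non-seed user against the behavior pattern set of the seed users;
    if a non-seed user matches multiple behavior patterns, obtaining the matching result using fusion weights.
  7. The method according to any one of claims 1 to 6, wherein the step of determining, according to the similarity, whether at least one of the plurality of non-seed users is a potential extended user is specifically:
    sorting the plurality of non-seed users by similarity score, wherein a higher score means that the non-seed user's behavior patterns are considered closer to those of the seed users.
  8. A data mining and analysis device, comprising:
    a receiving unit, configured to acquire behavior data information of a plurality of users, wherein the behavior data information includes time- and/or space-related information about when and where a user's behavior occurred;
    a mapping unit, configured to perform rasterization processing on the behavior data information of the plurality of users to convert the behavior data information into mapping behavior data information;
    a pattern extraction unit, configured to extract behavior patterns from the mapping behavior data information of a plurality of seed users among the plurality of users, to obtain a behavior pattern set of the plurality of seed users;
    a pattern matching unit, configured to pattern-match the behavior data of each non-seed user against the behavior pattern set, to obtain the similarity between each non-seed user and the plurality of seed users;
    a judging unit, configured to determine, according to the similarity, whether at least one of the plurality of non-seed users is a potential extended user;
    a sending unit, configured to send information about the potential extended users to a marketing platform.
  9. The device according to claim 8, wherein the rasterization processing specifically comprises:
    segmenting the time dimension along the time axis and mapping the user's behavior data to the corresponding time period; and/or using latitude and longitude to divide the two-dimensional position space into a grid and mapping the user's behavior data to the corresponding grid cell.
  10. The device according to claim 8, wherein the pattern extraction unit is specifically configured to: use the association rule algorithm PrefixSpan to mine patterns from the behavior data of the seed users, to obtain a behavior pattern set sorted by frequency or by number of users.
  11. The device according to claim 8, wherein the behavior pattern set includes a plurality of behavior patterns, each behavior pattern represents one behavior characteristic of the seed users, and a behavior characteristic is a behavioral habit of the seed users, that is, at what time and/or in what place a certain behavior of the seed users occurs.
  12. The device according to claim 8, wherein the pattern matching unit is specifically configured to: match the behavior data of each non-seed user against the behavior pattern set of the seed users; and if a non-seed user matches multiple behavior patterns, obtain the matching result using fusion weights.
  13. The device according to claim 8, wherein the judging unit is specifically configured to:
    sort the plurality of non-seed users by similarity score, wherein a higher score means that the non-seed user's behavior patterns are considered closer to those of the seed users.
  14. A system for extending users, comprising:
    a storage device, configured to store behavior data information of a plurality of users, wherein the behavior data information includes time- and/or space-related information about when and where a user's behavior occurred;
    the device according to any one of claims 8 to 13; and
    a marketing platform, which launches marketing services for the potential extended users according to the determination result of the data mining and analysis device.
  15. A computer-readable medium storing executable instructions, wherein the executable instructions are used to perform the method according to any one of claims 1 to 7.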
PCT/CN2017/104079 2016-12-21 2017-09-28 Method, apparatus, and system for extending users WO2018113370A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP17884770.3A EP3537365A4 (en) 2016-12-21 2017-09-28 METHOD, DEVICE AND SYSTEM FOR INCREASING THE USER NUMBER

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611194189.8A CN108230001A (zh) 2016-12-21 2016-12-21 Method, apparatus, and system for expanding users
CN201611194189.8 2016-12-21

Publications (1)

Publication Number Publication Date
WO2018113370A1 (zh) 2018-06-28

Family

ID=62624383

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/104079 WO2018113370A1 (zh) 2016-12-21 2017-09-28 Method, apparatus, and system for expanding users

Country Status (3)

Country Link
EP (1) EP3537365A4 (zh)
CN (1) CN108230001A (zh)
WO (1) WO2018113370A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876470A (zh) * 2018-06-29 2018-11-23 腾讯科技(深圳)有限公司 Tagged-user expansion method, computer device, and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
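CN111225112B (zh) * 2020-01-03 2021-02-19 北京小米移动软件有限公司 Data traffic usage control method, apparatus, and storage medium
CN111325267B (zh) * 2020-02-18 2024-02-13 京东城市(北京)数字科技有限公司 Data fusion method, apparatus, and computer-readable storage medium
CN111738774A (zh) * 2020-06-30 2020-10-02 中国平安财产保险股份有限公司 Method, apparatus, computer device, and storage medium for identifying potential target users
CN112107866A (zh) * 2020-09-28 2020-12-22 腾讯科技(深圳)有限公司 User behavior data processing method, apparatus, device, and storage medium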

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120054040A1 (en) * 2010-08-30 2012-03-01 Abraham Bagherjeiran Adaptive Targeting for Finding Look-Alike Users
WO2012034105A2 (en) * 2010-09-10 2012-03-15 Turnkey Intelligence, Llc Systems and methods for generating prospect scores for sales leads, spending capacity scores for sales leads, and retention scores for renewal of existing customers
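CN104598557A (zh) * 2015-01-05 2015-05-06 华为技术有限公司 Method and apparatus for data rasterization and user behavior analysis
CN104751354A (zh) * 2015-04-13 2015-07-01 合一信息技术(北京)有限公司 Method for screening advertising audiences
CN105260414A (zh) * 2015-09-24 2016-01-20 精硕世纪科技(北京)有限公司 Method and apparatus for computing user behavior similarity
CN105550903A (zh) * 2015-12-25 2016-05-04 腾讯科技(深圳)有限公司 Method and apparatus for determining target users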

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110191142A1 (en) * 2010-02-04 2011-08-04 Microsoft Corporation Using networking site interactions to generate a target list of potential consumers

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3537365A4
ZHONG, ZHAO-MAN: "Discovering Similar Users for Specific User on Microblog", CHINESE JOURNAL OF COMPUTERS, vol. 39, no. 4, 30 April 2016 (2016-04-30), pages 765 - 779, XP009515826, ISSN: 0254-4164 *

Also Published As

Publication number Publication date
EP3537365A4 (en) 2019-12-11
CN108230001A (zh) 2018-06-29
EP3537365A1 (en) 2019-09-11

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17884770; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2017884770; Country of ref document: EP; Effective date: 20190607
NENP Non-entry into the national phase. Ref country code: DE