WO2019119635A1 - Seed user development method, electronic device and computer-readable storage medium - Google Patents


Publication number
WO2019119635A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
seed
expanded
seed user
users
Application number
PCT/CN2018/076181
Other languages
French (fr)
Chinese (zh)
Inventor
安欣
许开河
王建明
肖京
Original Assignee
平安科技(深圳)有限公司
Priority to CN201711364792.0A (CN107944931A)
Priority to CN201711364792.0
Application filed by 平安科技(深圳)有限公司
Publication of WO2019119635A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06K: RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00: Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62: Methods or arrangements for recognition using electronic means
    • G06K9/6217: Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6218: Clustering techniques
    • G06K9/622: Non-hierarchical partitioning techniques
    • G06K9/6221: Non-hierarchical partitioning techniques based on statistics
    • G06K9/6223: Non-hierarchical partitioning techniques based on statistics with a fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06Q: DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce, e.g. shopping or e-commerce
    • G06Q30/02: Marketing, e.g. market research and analysis, surveying, promotions, advertising, buyer profiling, customer management or rewards; Price estimation or determination
    • G06Q30/0201: Market data gathering, market analysis or market modelling

Abstract

Disclosed is a seed user development method comprising the steps of: performing cluster analysis on a predetermined number of seed users using a preset clustering method, and dividing the seed users into several seed user communities having specific features; calculating, using a preset similarity calculation method, the similarity between a user to be developed and each seed user community; if the similarity between the user to be developed and a specific seed user community is greater than or equal to a first preset threshold, classifying the user to be developed into that seed user community; and counting, for each user to be developed, the number of seed user communities into which the user has been classified, ranking the users by this count, and determining, according to the ranking result, a development rule for developing users to be developed into seed users. The present application can reduce the computational complexity of seed user development and improve its accuracy.

Description

Seed user expansion method, electronic device, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 201711364792.0, filed with the Chinese Patent Office on December 18, 2017 and entitled "Seed user expansion method, electronic device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer information technology, and in particular to a seed user expansion method, an electronic device, and a computer-readable storage medium.
Background
At present, seed users are usually expanded by searching for similar users: the similarity between every pair of users is calculated, and users are expanded on that basis. However, when the user data reaches big-data scale, the computational complexity grows exponentially. How to reduce the computational complexity and expand seed users efficiently is a technical problem that urgently needs to be solved. The seed user expansion methods of the prior art are therefore not reasonably designed and need improvement.
Summary
In view of this, the present application proposes a seed user expansion method, an electronic device, and a computer-readable storage medium that combine an unsupervised clustering method with a distance-based similarity algorithm, reducing the computational complexity of seed user expansion and improving its accuracy.
First, to achieve the above object, the present application provides an electronic device comprising a memory, a processor, and a seed user expansion system stored in the memory and executable on the processor, wherein the seed user expansion system, when executed by the processor, implements the following steps:
performing cluster analysis on a predetermined number of seed users by a preset clustering method, and dividing the seed users into several seed user communities having specific features;
calculating, by a preset similarity calculation method, the similarity between a user to be expanded and each seed user community;
if the similarity between the user to be expanded and a specific seed user community is greater than or equal to a first preset threshold, classifying the user to be expanded into that seed user community; and
counting, for each user to be expanded, the number of seed user communities into which the user has been classified, ranking the users by this count, and determining, according to the ranking result, an expansion rule for expanding users to be expanded into seed users.
Preferably, the specific features include the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product.
Preferably, calculating the similarity between the user to be expanded and each seed user community comprises: calculating the similarity between the user to be expanded and the center point of each seed user community, and using it as the similarity between the user and that community.
Preferably, the expansion rule is set as follows: a specified number of users to be expanded are selected in descending order of count, and the selected users are expanded into seed users.
Preferably, the expansion rule is set as follows: if the number of seed user communities into which a user to be expanded has been classified is greater than or equal to a second preset threshold, the user is expanded into a seed user, where the second preset threshold is set to a predetermined proportion of the total number of seed user communities.
In addition, to achieve the above object, the present application further provides a seed user expansion method applied to an electronic device, the method comprising:
performing cluster analysis on a predetermined number of seed users by a preset clustering method, and dividing the seed users into several seed user communities having specific features;
calculating, by a preset similarity calculation method, the similarity between a user to be expanded and each seed user community;
if the similarity between the user to be expanded and a specific seed user community is greater than or equal to a first preset threshold, classifying the user to be expanded into that seed user community; and
counting, for each user to be expanded, the number of seed user communities into which the user has been classified, ranking the users by this count, and determining, according to the ranking result, an expansion rule for expanding users to be expanded into seed users.
Preferably, the specific features include the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product; and
calculating the similarity between the user to be expanded and each seed user community comprises:
calculating the similarity between the user to be expanded and the center point of each seed user community, and using it as the similarity between the user and that community.
Preferably, the expansion rule is set as follows: a specified number of users to be expanded are selected in descending order of count, and the selected users are expanded into seed users.
Preferably, the expansion rule is set as follows: if the number of seed user communities into which a user to be expanded has been classified is greater than or equal to a second preset threshold, the user is expanded into a seed user, where the second preset threshold is set to a predetermined proportion of the total number of seed user communities.
Further, to achieve the above object, the present application also provides a computer-readable storage medium storing a seed user expansion system, the seed user expansion system being executable by at least one processor to cause the at least one processor to perform the steps of the seed user expansion method described above.
Compared with the prior art, the electronic device, seed user expansion method, and computer-readable storage medium proposed by the present application combine an unsupervised clustering method with a distance-based similarity algorithm, reducing the computational complexity of seed user expansion and improving its accuracy.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional hardware architecture of the electronic device of the present application;
FIG. 2 is a schematic diagram of the program modules of an embodiment of the seed user expansion system in the electronic device of the present application;
FIG. 3 is a schematic flowchart of an embodiment of the seed user expansion method of the present application.
Reference numerals:
Electronic device: 2
Memory: 21
Processor: 22
Network interface: 23
Seed user expansion system: 20
Analysis module: 201
Calculation module: 202
Expansion module: 203
Process steps: S31-S34
The implementation, functional features, and advantages of the present application are further described below with reference to the embodiments and the accompanying drawings.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present application, without creative effort, fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, provided that a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, the combination should be regarded as nonexistent and as falling outside the scope of protection claimed by the present application.
It should further be noted that, in this document, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
First, the present application proposes an electronic device 2.
FIG. 1 is a schematic diagram of an optional hardware architecture of the electronic device 2 of the present application. In this embodiment, the electronic device 2 may include, but is not limited to, a memory 21, a processor 22, and a network interface 23 that are communicatively connected to one another via a system bus. Note that FIG. 1 shows only the electronic device 2 with components 21-23; not all of the illustrated components are required, and more or fewer components may be implemented instead.
The electronic device 2 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server. It may be a standalone server or a server cluster composed of multiple servers.
The memory 21 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as its hard disk or main memory. In other embodiments, the memory 21 may be an external storage device of the electronic device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 2. The memory 21 may also include both an internal storage unit and an external storage device of the electronic device 2. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the program code of the seed user expansion system 20. The memory 21 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2, for example to perform control and processing related to data interaction or communication with the electronic device 2. In this embodiment, the processor 22 is used to run program code or process data stored in the memory 21, for example to run the seed user expansion system 20.
The network interface 23 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 connects the electronic device 2 to an external data platform through a network and establishes a data transmission channel and a communication connection between them. The network may be a wireless or wired network such as an intranet, the Internet, Global System of Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
The application environment of the embodiments of the present application and the hardware structure and functions of the related devices have now been described in detail. The embodiments of the present application are presented below on the basis of this application environment and these devices.
FIG. 2 is a program module diagram of an embodiment of the seed user expansion system 20 in the electronic device 2 of the present application. In this embodiment, the seed user expansion system 20 may be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to carry out the present application. For example, in FIG. 2 the seed user expansion system 20 is divided into an analysis module 201, a calculation module 202, and an expansion module 203. A program module, as the term is used in this application, is a series of computer program instruction segments that performs a specific function, and is better suited than a program as a whole to describing how the seed user expansion system 20 executes in the electronic device 2. The functions of the program modules 201-203 are described in detail below.
The analysis module 201 is configured to perform cluster analysis on a predetermined number of seed users (possibly at big-data scale) by a preset clustering method (such as the unsupervised K-means clustering method), and to divide the seed users into several seed user communities having specific features (or salient features). The specific features include, but are not limited to, the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product (such as property insurance).
For example, assuming that the number of clusters is k (that is, the number of seed user communities) and the predetermined number of seed users is N, the K-means clustering method includes the following steps:
(A1) first, arbitrarily select k data objects from the N data objects (that is, the N seed users) as the initial cluster centers;
(A2) for each remaining data object, calculate its similarity to each cluster center (for example, by Euclidean distance, where a smaller distance means a higher similarity), and assign it to the cluster represented by the most similar cluster center;
(A3) recalculate the center of each resulting cluster (that is, the mean of all data objects in the cluster);
(A4) repeat steps A2 and A3 until the preset standard measure function converges. In this embodiment, the mean squared error may be used as the preset standard measure function.
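Steps A1 to A4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names, the `max_iter`, `tol`, and `seed` parameters, and the representation of each seed user as a list of numeric features are all hypothetical. For simplicity every point is assigned in A2, including the initial centers, which trivially fall into their own clusters.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance; a smaller value means higher similarity."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Return (centers, labels) for k clusters (hypothetical helper)."""
    rng = random.Random(seed)
    # A1: arbitrarily pick k data objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    previous_mse = None
    for _ in range(max_iter):
        # A2: assign each object to its nearest (most similar) center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # A3: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean_point(members)
        # A4: stop once the mean squared error no longer changes.
        mse = sum(
            squared_distance(p, centers[lab]) for p, lab in zip(points, labels)
        ) / len(points)
        if previous_mse is not None and abs(previous_mse - mse) <= tol:
            break
        previous_mse = mse
    return centers, labels
```

Calling `kmeans(seed_user_vectors, k)` returns the k center points, which the later similarity step can use in place of the individual seed users.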
The calculation module 202 is configured to calculate, by a preset similarity calculation method, the similarity between each of one or more users to be expanded and each seed user community (that is, each seed user community having specific features). The preset similarity calculation method may be, for example, Euclidean distance, cosine similarity, or Hamming distance.
Preferably, in this embodiment, calculating the similarity between a user to be expanded and each seed user community comprises: calculating the similarity between the user to be expanded and the center point of each seed user community (using the same method as for the similarity between two users), and using it as the similarity between the user and that community. A seed user community is a set of users with similar features; each set gathers around one center point, which is the center point of that seed user community. Because the similarity does not have to be calculated against every user in the seed user community, the computational complexity is greatly reduced.
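The saving can be made concrete with a rough operation count. The symbols below are assumptions introduced only for this illustration and do not appear in the application: M users to be expanded, N seed users, k communities, d features per user, and t K-means iterations.

```latex
\begin{align*}
\text{comparison with every seed user:} \quad & O(M \cdot N \cdot d) \\
\text{comparison with center points only:} \quad & O(M \cdot k \cdot d)
  + \underbrace{O(t \cdot N \cdot k \cdot d)}_{\text{one-time clustering}}
\end{align*}
```

Under these assumptions the center-point approach is cheaper whenever k is much smaller than N and the number of users to be expanded is large compared with t times k, since the clustering cost is paid only once.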
For example, the similarity between a user to be expanded and the center point of a seed user community can be calculated with the cosine method as shown in Formula 1:

$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert} \tag{1}$$

where $\cos(\vec{a}, \vec{b})$ denotes the cosine similarity between the user to be expanded a and the center point b of a seed user community, $\vec{a}$ denotes the rating vector of the user to be expanded a, and $\vec{b}$ denotes the rating vector of the center point b.
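The cosine similarity between a user's rating vector and a community center point can be computed as in the following sketch. The function name is hypothetical, and returning 0.0 for a zero-length vector is an assumption made here to avoid division by zero; the application does not address that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between rating vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, so the result can be compared directly with a percentage threshold when the ratings are non-negative.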
The expansion module 203 is configured to classify a user to be expanded into a specific seed user community if the similarity between the user and that community is greater than or equal to a first preset threshold (for example, 80%).
For example, suppose the cluster analysis yields three seed user communities, B1, B2, and B3, and the similarity of a user to be expanded A is S1 = 60% with community B1, S2 = 85% with community B2 (above the first preset threshold), and S3 = 90% with community B3 (above the first preset threshold). User A is then classified into seed user communities B2 and B3.
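The threshold test in this example can be sketched as follows. The function name and the dictionary representation of the similarities are hypothetical; the values 60%, 85%, 90% and the 80% threshold come from the example above.

```python
def assign_communities(similarities, first_threshold=0.80):
    """Return the communities whose similarity meets the first threshold.

    `similarities` maps a community name to the user's similarity with it.
    """
    return [
        name
        for name, similarity in similarities.items()
        if similarity >= first_threshold
    ]


user_a = {"B1": 0.60, "B2": 0.85, "B3": 0.90}
print(assign_communities(user_a))  # prints ['B2', 'B3']
```

A user can therefore belong to several communities at once, and the number of communities returned is the count used by the ranking step.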
The expansion module 203 is further configured to count, for each user to be expanded, the number of seed user communities into which the user has been classified, to rank the users by this count, and to determine, according to the ranking result, an expansion rule for expanding users to be expanded into seed users. A higher rank indicates a higher similarity.
Preferably, in this embodiment, the expansion rule is set as follows: a specified number of users to be expanded (for example, the top two) are selected in descending order of count, and the selected users are expanded into seed users. For example, if user A is classified into 5 seed user communities, user B into 3, and user C into 2, then users A and B are expanded into seed users.
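The top-N rule can be sketched as below. The function name and the dictionary of counts are hypothetical; the counts 5, 3, and 2 and the selection of the top two users come from the example above. How ties are broken is not specified in the application; this sketch keeps the input order for equal counts.

```python
def expand_top_n(membership_counts, n):
    """Select the n users classified into the most seed user communities."""
    ranked = sorted(
        membership_counts.items(),
        key=lambda item: item[1],
        reverse=True,  # highest count first; ties keep input order
    )
    return [user for user, _ in ranked[:n]]


counts = {"A": 5, "B": 3, "C": 2}
print(expand_top_n(counts, 2))  # prints ['A', 'B']
```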
Preferably, in other embodiments, the expansion rule may instead be set as follows: if the number of seed user communities into which a user to be expanded has been classified is greater than or equal to a second preset threshold, the user is expanded into a seed user, where the second preset threshold may be set to a predetermined proportion (for example, 50%) of the total number of seed user communities (that is, seed user communities having specific features). For example, if the total number of seed user communities is 4 and the predetermined proportion is 50%, the second preset threshold is 2.
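The proportion-based rule can be sketched as follows. The function name and the sample counts for users A, B, and C are hypothetical; only the total of 4 communities and the 50% proportion come from the example above. The application does not say how a non-integer threshold should be rounded, so this sketch compares against the unrounded product.

```python
def expand_by_ratio(membership_counts, total_communities, ratio=0.5):
    """Select users whose community count meets the second threshold."""
    # Second preset threshold: a fixed proportion of all communities.
    second_threshold = total_communities * ratio
    return [
        user
        for user, count in membership_counts.items()
        if count >= second_threshold
    ]


# Hypothetical counts; with 4 communities and 50% the threshold is 2.
counts = {"A": 3, "B": 2, "C": 1}
print(expand_by_ratio(counts, total_communities=4))  # prints ['A', 'B']
```

Unlike the top-N rule, this variant does not fix how many users are expanded; the number depends on how many users clear the threshold.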
Through the program modules 201-203 described above, the seed user expansion system 20 proposed by the present application combines an unsupervised clustering method with a distance-based similarity algorithm, reducing the computational complexity of seed user expansion and improving its accuracy.
In addition, the present application also proposes a seed user expansion method.
FIG. 3 is a schematic flowchart of an embodiment of the seed user expansion method of the present application. In this embodiment, depending on requirements, the order of the steps in the flowchart shown in FIG. 3 may be changed, and some steps may be omitted.
Step S31: perform cluster analysis on a predetermined number of seed users (possibly at big-data scale) by a preset clustering method (such as the unsupervised K-means clustering method), and divide the seed users into several seed user communities having specific features (or salient features). The specific features include, but are not limited to, the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product (such as property insurance).
For example, assuming that the number of clusters is k (that is, the number of seed user communities) and the predetermined number of seed users is N, the K-means clustering method includes the following steps:
(A1) first, arbitrarily select k data objects from the N data objects (that is, the N seed users) as the initial cluster centers;
(A2) for each remaining data object, calculate its similarity to each cluster center (for example, by Euclidean distance, where a smaller distance means a higher similarity), and assign it to the cluster represented by the most similar cluster center;
(A3) recalculate the center of each resulting cluster (that is, the mean of all data objects in the cluster);
(A4) repeat steps A2 and A3 until the preset standard measure function converges. In this embodiment, the mean squared error may be used as the preset standard measure function.
Step S32: calculate, by a preset similarity calculation method, the similarity between each of one or more users to be expanded and each seed user community (that is, each seed user community having specific features). The preset similarity calculation method may be, for example, Euclidean distance, cosine similarity, or Hamming distance.
Preferably, in this embodiment, calculating the similarity between a user to be expanded and each seed user community comprises: calculating the similarity between the user to be expanded and the center point of each seed user community (using the same method as for the similarity between two users), and using it as the similarity between the user and that community. A seed user community is a set of users with similar features; each set gathers around one center point, which is the center point of that seed user community. Because the similarity does not have to be calculated against every user in the seed user community, the computational complexity is greatly reduced.
举例而言,采用夹角余弦方法计算待拓展用户与某个种子用户群落中心点的相似度可以采用如下公式1所示。For example, using the angle cosine method to calculate the similarity between the user to be expanded and the center point of a seed user community may be as shown in the following formula 1.
$$\mathrm{sim}(a,b)=\cos(\vec{a},\vec{b})=\frac{\vec{a}\cdot\vec{b}}{\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert}\qquad\text{(Formula 1)}$$
where $\mathrm{sim}(a,b)$ denotes the cosine similarity between user a to be expanded and center point b of a seed user community, $\vec{a}$ denotes the rating vector of user a to be expanded, and $\vec{b}$ denotes the rating vector of center point b of that seed user community.
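The cosine calculation of Formula 1 can be sketched as follows. The function names are hypothetical, and returning 0.0 for a zero-length vector is an assumption made here to avoid division by zero; the source does not say how that case is handled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between rating vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def community_similarities(user_vector, centers):
    """Similarity of one user to each community, using only the center points."""
    return [cosine_similarity(user_vector, c) for c in centers]
```

Comparing a user against k center points costs k similarity evaluations, rather than one per seed user, which is the saving described in the text.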
Step S33: If the similarity between a user to be expanded and a specific seed user community is greater than or equal to a first preset threshold (for example, 80%), assign that user to that seed user community.
For example, suppose cluster analysis yields three seed user communities, B1, B2, and B3, and user A to be expanded has similarity S1 = 60% with B1, S2 = 85% with B2 (above the first preset threshold), and S3 = 90% with B3 (above the first preset threshold). User A is then assigned to seed user communities B2 and B3.
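The worked example for step S33 can be reproduced with a short sketch. The function name and the dictionary layout are assumptions made for illustration; only the threshold of 80% and the three similarity values come from the text.

```python
def assign_communities(similarities, threshold=0.80):
    """Return the communities whose similarity meets the first preset threshold."""
    return [name for name, s in similarities.items() if s >= threshold]


# Similarities of user A to communities B1, B2, B3, as in the example.
sims_a = {"B1": 0.60, "B2": 0.85, "B3": 0.90}
print(assign_communities(sims_a))  # ['B2', 'B3']
```

Note that a user can belong to more than one community, which is what makes the count in the next step meaningful.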
Step S34: For each user to be expanded, count the number of seed user communities to which the user has been assigned, sort the users by this count, and determine from the sorted result the expansion rule for expanding users to be expanded into seed users. A higher rank indicates a higher similarity.
Preferably, in this embodiment, the expansion rule is: in descending order of count, select a specified number of users to be expanded (for example, the top 2) and expand the selected users into seed users. For example, if user A is assigned to 5 seed user communities, user B to 3, and user C to 2, then users A and B are expanded into seed users.
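A minimal sketch of this top-N rule, using the counts from the example. The function name is hypothetical. The source does not specify how ties are broken; here Python's stable sort keeps tied users in their input order, which is an assumption.

```python
def expand_top_n(membership_counts, n=2):
    """Pick the n users assigned to the most seed user communities."""
    ranked = sorted(membership_counts.items(),
                    key=lambda item: item[1], reverse=True)
    return [user for user, _ in ranked[:n]]


# Users A, B, C are assigned to 5, 3, and 2 communities respectively.
counts = {"A": 5, "B": 3, "C": 2}
print(expand_top_n(counts, n=2))  # ['A', 'B']
```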
Preferably, in other embodiments, the expansion rule may instead be: if the number of seed user communities to which a user to be expanded has been assigned is greater than or equal to a second preset threshold, expand that user into a seed user. The second preset threshold may be set to a predetermined proportion (for example, 50%) of the total number of seed user communities (that is, seed user communities having specific features). For example, if the total number of seed user communities is 4 and the predetermined proportion is 50%, the second preset threshold is 2.
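The proportion-based rule can be sketched as below. The function names are hypothetical, and rounding the threshold up with `math.ceil` is an assumption: the source only gives the case 4 × 50% = 2, where no rounding is needed.

```python
import math


def second_threshold(total_communities, ratio=0.5):
    """Second preset threshold as a proportion of the community count.

    Assumption: fractional results are rounded up.
    """
    return math.ceil(total_communities * ratio)


def expand_by_threshold(membership_counts, total_communities, ratio=0.5):
    """Expand every user whose community count meets the second threshold."""
    threshold = second_threshold(total_communities, ratio)
    return [user for user, count in membership_counts.items()
            if count >= threshold]


print(second_threshold(4, 0.5))  # 2
print(expand_by_threshold({"A": 3, "B": 2, "C": 1}, total_communities=4))  # ['A', 'B']
```

Unlike the top-N rule, this rule does not fix how many users are expanded; the number depends on how many users reach the threshold.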
Through steps S31 to S34 above, the seed user expansion method proposed in this application combines an unsupervised clustering method with a distance-based similarity algorithm, reducing the computational complexity of seed user expansion and improving its accuracy.
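To show how steps S32 to S34 chain together once the community center points are known, here is a compact end-to-end sketch. It assumes cosine similarity, the 80% first threshold, and the top-N expansion rule; the function names, the input layout (a dictionary of candidate rating vectors and a list of center points), and the zero-vector handling are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 for a zero vector (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_seed_users(candidates, centers, sim_threshold=0.8, top_n=2):
    """Return (selected users, per-user community counts).

    candidates: {user_id: rating_vector}; centers: list of center points.
    """
    counts = {}
    for user_id, vector in candidates.items():
        # S32 + S33: compare with each center, count threshold hits.
        counts[user_id] = sum(
            1 for c in centers if cosine(vector, c) >= sim_threshold)
    # S34: rank by count (descending) and keep the top_n users.
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[:top_n], counts
```

The cost per candidate is proportional to the number of communities rather than the number of seed users, which reflects the complexity reduction the text claims.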
Further, to achieve the above object, this application also provides a computer-readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disc). The computer-readable storage medium stores a seed user expansion system 20, which can be executed by at least one processor 22 to cause the at least one processor 22 to perform the following steps of the seed user expansion method.
(1) Using a preset clustering method (for example, the unsupervised K-means clustering method), perform cluster analysis on a predetermined number of seed users (possibly at big-data scale), dividing the seed users into several seed user communities, each having specific (or salient) features. The specific features include, but are not limited to, the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product (such as property insurance).
For example, assuming the number of clusters is k (that is, the number of seed user communities) and the predetermined number of seed users is N, the K-means clustering method includes the following steps:
(A1) First, arbitrarily select k data objects from the N data objects (that is, the N seed users) as the initial cluster centers;
(A2) For each remaining data object, compute its similarity to each cluster center (for example, by Euclidean distance, where a smaller distance means a higher similarity), and assign the data object to the cluster represented by its most similar cluster center;
(A3) Recompute the cluster center of each resulting cluster (that is, the mean of all data objects in that cluster);
(A4) Repeat steps A2 and A3 until the preset standard measure function converges. In this embodiment, the mean squared error may be used as the preset standard measure function.
(2) Using a preset similarity calculation method, compute the similarity between each user to be expanded (one or more) and each seed user community (that is, each seed user community having specific features). The preset similarity calculation method may be, for example, Euclidean distance, cosine of the included angle (cosine similarity), or Hamming distance.
Preferably, in this embodiment, computing the similarity between a user to be expanded and each seed user community includes: computing the similarity between the user to be expanded and the center point of each seed user community (using the same calculation method as for similarity between users), and taking the result as the similarity between that user and that community. A seed user community is a set of users with similar features; each set aggregates toward one center point, which is the center point of that seed user community. Because similarity need not be computed against every user in a seed user community, the computational complexity can be greatly reduced.
For example, the similarity between a user to be expanded and the center point of a seed user community can be computed with the cosine method, as shown in Formula 1.
$$\mathrm{sim}(a,b)=\cos(\vec{a},\vec{b})=\frac{\vec{a}\cdot\vec{b}}{\lVert\vec{a}\rVert\,\lVert\vec{b}\rVert}\qquad\text{(Formula 1)}$$
where $\mathrm{sim}(a,b)$ denotes the cosine similarity between user a to be expanded and center point b of a seed user community, $\vec{a}$ denotes the rating vector of user a to be expanded, and $\vec{b}$ denotes the rating vector of center point b of that seed user community.
(3) If the similarity between a user to be expanded and a specific seed user community is greater than or equal to a first preset threshold (for example, 80%), assign that user to that seed user community.
For example, suppose cluster analysis yields three seed user communities, B1, B2, and B3, and user A to be expanded has similarity S1 = 60% with B1, S2 = 85% with B2 (above the first preset threshold), and S3 = 90% with B3 (above the first preset threshold). User A is then assigned to seed user communities B2 and B3.
(4) For each user to be expanded, count the number of seed user communities to which the user has been assigned, sort the users by this count, and determine from the sorted result the expansion rule for expanding users to be expanded into seed users. A higher rank indicates a higher similarity.
Preferably, in this embodiment, the expansion rule is: in descending order of count, select a specified number of users to be expanded (for example, the top 2) and expand the selected users into seed users. For example, if user A is assigned to 5 seed user communities, user B to 3, and user C to 2, then users A and B are expanded into seed users.
Preferably, in other embodiments, the expansion rule may instead be: if the number of seed user communities to which a user to be expanded has been assigned is greater than or equal to a second preset threshold, expand that user into a seed user. The second preset threshold may be set to a predetermined proportion (for example, 50%) of the total number of seed user communities (that is, seed user communities having specific features). For example, if the total number of seed user communities is 4 and the predetermined proportion is 50%, the second preset threshold is 2.
Through steps (1) to (4) above, the computer-readable storage medium proposed in this application combines an unsupervised clustering method with a distance-based similarity algorithm, reducing the computational complexity of seed user expansion and improving its accuracy.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of this application.
The preferred embodiments of this application have been described above with reference to the drawings; this does not limit the scope of the rights of this application. The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Those skilled in the art can implement this application in many variants without departing from its scope and essence; for example, a feature of one embodiment can be used in another embodiment to yield a further embodiment. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. An electronic device, characterized in that the electronic device comprises a memory, a processor, and a seed user expansion system stored in the memory and executable on the processor, wherein the seed user expansion system, when executed by the processor, implements the following steps:
    performing cluster analysis on a predetermined number of seed users by a preset clustering method, and dividing the seed users into several seed user communities having specific features;
    computing, by a preset similarity calculation method, the similarity between a user to be expanded and each seed user community;
    if the similarity between the user to be expanded and a specific seed user community is greater than or equal to a first preset threshold, assigning the user to be expanded to that specific seed user community; and
    counting, for each user to be expanded, the number of specific seed user communities to which the user has been assigned, sorting by that number, and determining, according to the sorting result, an expansion rule for expanding users to be expanded into seed users.
  2. The electronic device according to claim 1, characterized in that the specific features comprise the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product.
  3. The electronic device according to claim 1, characterized in that computing the similarity between the user to be expanded and each seed user community comprises: computing the similarity between the user to be expanded and the center point of each seed user community, and taking it as the similarity between the user to be expanded and that seed user community.
  4. The electronic device according to claim 2, characterized in that the expansion rule is set as: in descending order of the number, selecting a specified number of users to be expanded, and expanding the selected users into seed users.
  5. The electronic device according to claim 3, characterized in that the expansion rule is set as: in descending order of the number, selecting a specified number of users to be expanded, and expanding the selected users into seed users.
  6. The electronic device according to claim 2, characterized in that the expansion rule is set as: if the number of specific seed user communities to which a user to be expanded has been assigned is greater than or equal to a second preset threshold, expanding that user into a seed user, wherein the second preset threshold is set to a predetermined proportion of the total number of seed user communities.
  7. The electronic device according to claim 3, characterized in that the expansion rule is set as: if the number of specific seed user communities to which a user to be expanded has been assigned is greater than or equal to a second preset threshold, expanding that user into a seed user, wherein the second preset threshold is set to a predetermined proportion of the total number of seed user communities.
  8. A seed user expansion method, applied to an electronic device, characterized in that the method comprises:
    performing cluster analysis on a predetermined number of seed users by a preset clustering method, and dividing the seed users into several seed user communities having specific features;
    computing, by a preset similarity calculation method, the similarity between a user to be expanded and each seed user community;
    if the similarity between the user to be expanded and a specific seed user community is greater than or equal to a first preset threshold, assigning the user to be expanded to that specific seed user community; and
    counting, for each user to be expanded, the number of specific seed user communities to which the user has been assigned, sorting by that number, and determining, according to the sorting result, an expansion rule for expanding users to be expanded into seed users.
  9. The seed user expansion method according to claim 8, characterized in that the specific features comprise the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product.
  10. The seed user expansion method according to claim 8, characterized in that computing the similarity between the user to be expanded and each seed user community comprises:
    computing the similarity between the user to be expanded and the center point of each seed user community, and taking it as the similarity between the user to be expanded and that seed user community.
  11. The seed user expansion method according to claim 9, characterized in that the expansion rule is set as: in descending order of the number, selecting a specified number of users to be expanded, and expanding the selected users into seed users.
  12. The seed user expansion method according to claim 10, characterized in that the expansion rule is set as: in descending order of the number, selecting a specified number of users to be expanded, and expanding the selected users into seed users.
  13. The seed user expansion method according to claim 9, characterized in that the expansion rule is set as: if the number of specific seed user communities to which a user to be expanded has been assigned is greater than or equal to a second preset threshold, expanding that user into a seed user, wherein the second preset threshold is set to a predetermined proportion of the total number of seed user communities.
  14. The seed user expansion method according to claim 10, characterized in that the expansion rule is set as: if the number of specific seed user communities to which a user to be expanded has been assigned is greater than or equal to a second preset threshold, expanding that user into a seed user, wherein the second preset threshold is set to a predetermined proportion of the total number of seed user communities.
  15. A computer-readable storage medium storing a seed user expansion system, the seed user expansion system being executable by at least one processor, wherein the seed user expansion system, when executed by the processor, implements the following steps:
    performing cluster analysis on a predetermined number of seed users by a preset clustering method, and dividing the seed users into several seed user communities having specific features;
    computing, by a preset similarity calculation method, the similarity between a user to be expanded and each seed user community;
    if the similarity between the user to be expanded and a specific seed user community is greater than or equal to a first preset threshold, assigning the user to be expanded to that specific seed user community; and
    counting, for each user to be expanded, the number of specific seed user communities to which the user has been assigned, sorting by that number, and determining, according to the sorting result, an expansion rule for expanding users to be expanded into seed users.
  16. The computer-readable storage medium according to claim 15, characterized in that the specific features comprise the user's geographic location, whether the user is a registered user, and whether the user has purchased a specific product.
  17. The computer-readable storage medium according to claim 15, characterized in that computing the similarity between the user to be expanded and each seed user community comprises: computing the similarity between the user to be expanded and the center point of each seed user community, and taking it as the similarity between the user to be expanded and that seed user community.
  18. The computer-readable storage medium according to claim 16, characterized in that the expansion rule is set as: in descending order of the number, selecting a specified number of users to be expanded, and expanding the selected users into seed users.
  19. The computer-readable storage medium according to claim 17, characterized in that the expansion rule is set as: in descending order of the number, selecting a specified number of users to be expanded, and expanding the selected users into seed users.
  20. The computer-readable storage medium according to claim 16 or 17, characterized in that the expansion rule is set as: if the number of specific seed user communities to which a user to be expanded has been assigned is greater than or equal to a second preset threshold, expanding that user into a seed user, wherein the second preset threshold is set to a predetermined proportion of the total number of seed user communities.
PCT/CN2018/076181 2017-12-18 2018-02-10 Seed user development method, electronic device and computer-readable storage medium WO2019119635A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711364792.0A CN107944931A (en) 2017-12-18 2017-12-18 Seed user expanding method, electronic equipment and computer-readable recording medium
CN201711364792.0 2017-12-18

Publications (1)

Publication Number Publication Date
WO2019119635A1 true WO2019119635A1 (en) 2019-06-27

Family

ID=61943698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076181 WO2019119635A1 (en) 2017-12-18 2018-02-10 Seed user development method, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN107944931A (en)
WO (1) WO2019119635A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737327A (en) * 2011-03-31 2012-10-17 国际商业机器公司 Computer implemented method and system for dividing customer clusters
US20150302436A1 (en) * 2003-08-25 2015-10-22 Thomas J. Reynolds Decision strategy analytics
CN106022800A (en) * 2016-05-16 2016-10-12 北京百分点信息科技有限公司 User feature data processing method and device
CN107067045A (en) * 2017-05-31 2017-08-18 北京京东尚科信息技术有限公司 Data clustering method, device, computer-readable medium and electronic equipment

Also Published As

Publication number Publication date
CN107944931A (en) 2018-04-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18891887

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 22/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18891887

Country of ref document: EP

Kind code of ref document: A1