WO2020151152A1 - User-portrait-based clustering method, electronic device and storage medium - Google Patents

User-portrait-based clustering method, electronic device and storage medium

Info

Publication number
WO2020151152A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
variables
clustering
feature
weight
Application number
PCT/CN2019/089151
Other languages
English (en)
French (fr)
Inventor
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2020151152A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Definitions

  • This application relates to the field of data analysis technology, and more specifically, to a clustering method, electronic device and storage medium based on user portraits.
  • A user portrait is the labeling of user information, and a label is usually a highly condensed feature identifier, such as age, gender or user preference.
  • taking all of a user's labels together outlines a three-dimensional "portrait" of the user
  • a user portrait can abstract the full picture of the user's information.
  • at present, when user portraits are clustered,
  • the data sources are usually divided into life attributes, behavior attributes, etc., so targeted, accurate clustering is not possible.
  • the purpose of the present application is to provide a clustering method, electronic device and storage medium based on user portraits that perform targeted clustering while retaining all feature information.
  • the present application provides an electronic device that includes a memory and a processor; the memory includes a clustering program based on user portraits, and the following steps are implemented when the processor executes the program:
  • the characteristic variables corresponding to the user characteristics are divided into continuous variables and discrete variables
  • a continuous variable is a numerical variable with an order attribute
  • a discrete variable is a non-numeric variable
  • this application also provides a clustering method based on user portraits, including:
  • the characteristic variables are divided into continuous variables and discrete variables
  • a continuous variable is a numeric variable with an order attribute
  • a discrete variable is a non-numeric variable
  • the present application also provides a computer-readable storage medium that includes a clustering program based on user portraits; when the program is executed by a processor, the steps of the above clustering method based on user portraits are implemented.
  • the clustering method, electronic device and computer-readable storage medium based on user portraits described in this application achieve targeted clustering while retaining all feature information. At the same time, because ordered and unordered discrete features are processed differently, overall accuracy is improved.
  • FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the clustering method based on user portraits of this application;
  • FIG. 2 is a schematic diagram of modules of a preferred embodiment of the clustering program based on user portraits in FIG. 1;
  • Fig. 3 is a flowchart of a preferred embodiment of a clustering method based on user portraits of the present application.
  • This application provides a clustering method based on user portraits, which is applied to an electronic device 1.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of the clustering method based on user portraits of this application.
  • the electronic device 1 may be a terminal client with computing functions such as a server, a mobile phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the memory 11 includes at least one type of readable storage medium.
  • the at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory, and the like.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • the readable storage medium may also be an external memory of the electronic device 1, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store a clustering program 10 based on a user portrait installed in the electronic device 1 and the like.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip that runs program code or processes data stored in the memory 11, for example, by executing the clustering program 10 based on user portraits.
  • the network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic clients.
  • the communication bus 14 is used to realize the connection and communication between these components.
  • FIG. 1 only shows the electronic device 1 with the components 11-14, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 1 may also include a user interface.
  • the user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another client with speech recognition, and a voice output device such as speakers or earphones.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 1 may also include a display, which may also be called a display screen or a display unit.
  • it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (Organic Light-Emitting Diode, OLED) touch device, etc.
  • the display is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the electronic device 1 further includes a touch sensor.
  • the area provided by the touch sensor for the user to perform touch operations is called a touch area.
  • the touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
  • the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
  • the electronic device 1 may also include logic gate circuits, sensors, audio circuits, etc., which will not be repeated here.
  • the memory 11, as a computer storage medium, may include an operating system and a clustering program 10 based on user portraits; the following steps are implemented when the processor 12 executes the clustering program 10 stored in the memory 11:
  • the characteristic variables corresponding to the user characteristics are divided into continuous variables and discrete variables
  • a continuous variable is a numerical variable with an order attribute
  • a discrete variable is a non-numeric variable
  • the clustering program 10 based on user portraits may also be divided into one or more modules, and one or more modules are stored in the memory 11 and executed by the processor 12 to complete the application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • FIG. 2 is a functional block diagram of a preferred embodiment of the clustering program 10 based on user portraits in FIG. 1.
  • the clustering program 10 based on user portraits can be divided into:
  • the user characteristic acquisition module 110 acquires the user characteristics of multiple users and their corresponding characteristic variables
  • the conversion module 120 converts user characteristics into word vectors
  • the first clustering module 130 clusters the word vectors and determines the category to which each user feature belongs;
  • the dividing module 140 divides the characteristic variable into a continuous variable and a discrete variable, the continuous variable is a numeric variable with an order attribute, and the discrete variable is a non-numeric variable;
  • the quantization module 150 quantifies discrete variables and continuous variables
  • the preference selection module 160 selects the categories of user characteristics that have a preference and assigns a weight greater than 1 to their quantified discrete and continuous variables; the preference refers to the user characteristics of interest, which is also the bias of the clustering process;
  • the second clustering module 170 clusters all quantified discrete and continuous variables, clustering the feature variables of the weighted user feature categories together with those of the unweighted categories to obtain a biased user feature clustering.
  • this application also provides a clustering method based on user portraits.
  • FIG. 3 is a flowchart of a preferred embodiment of a clustering method based on user portraits in this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the clustering method based on user portraits includes:
  • Step S1 Obtain the user characteristics of multiple users and the characteristic variables corresponding to the user characteristics.
  • the user characteristics and characteristic variables can be obtained from the network with web crawler technology, or from dedicated data sources.
  • for example, the user feature is gender and its feature variable is female;
  • Step S2 converting user characteristics into word vectors, for example, searching for word vectors corresponding to user characteristics from a word vector dictionary.
  • the word vector dictionary is a pre-prepared dictionary, preferably using the Word2Vec algorithm to generate the word vector dictionary;
  • Step S3 cluster the word vector to determine the category of each user feature.
  • This step can be implemented by the SKLearn module in Python.
  • for example, name, gender, age, native place, etc. can be clustered into personal attributes;
  • educational background, certificates, work experience, etc. can be clustered into business ability;
  • birth order, family structure, family happiness and family education can be clustered into sense of family responsibility;
  • Step S4 Divide the characteristic variable into a continuous variable and a discrete variable.
  • the continuous variable is a numerical variable with an order attribute
  • the discrete variable is a non-numeric variable (such as place name, rank information).
  • the characteristic variables can be distinguished automatically by a program;
  • Step S5 quantify discrete variables and continuous variables
  • Step S6 Screen out the user characteristic categories with preference, and assign a weight greater than 1 to the quantified discrete variables and continuous variables of the preference user characteristic categories.
  • the preference refers to the bias of the clustering process. For example, for character-biased clustering, the proportion of characteristic variables related to character-related user characteristics will be increased, and the clustering results will have more significant differences in character;
  • Step S7 clustering all discrete variables and continuous variables that have been quantified, that is, clustering the feature variables of the weighted user feature categories and the feature variables of the unweighted user feature categories (for example, hierarchical clustering, K-Means clustering, etc.) to obtain biased user feature clustering.
  • This step can be achieved through the K-Prototypes library in Python.
  • the above-mentioned clustering method is an unsupervised classification method.
  • a weighted clustering algorithm is established according to the characteristics of user portraits.
  • the user classification function can be weighted and modified according to specific application scenarios, and the preference of clustering methods can be increased according to business requirements.
  • in step S5, the method for quantifying discrete variables and continuous variables includes:
  • converting ordered discrete variables (for example, ranks) into numeric form;
  • transforming unordered discrete variables whose number of distinct values exceeds a set number (for example, 20), such as place names, into higher-level forms (such as province or city tier);
  • encoding the discrete variables transformed into higher-level forms (for example, with one-hot encoding);
  • selecting the ordered discrete variables obtained after encoding and the continuous variables, and normalizing them.
  • the preferred user feature categories are one or more categories; when there is one preferred category, the weight of the feature variables of that category is greater than 1 and not greater than n-1; when there are multiple preferred categories, the weight of the feature variables of each preferred category is greater than 1 and the sum of the weights is not greater than n-1, where n is the number of categories after user feature clustering.
  • in another embodiment, the preferred user feature categories are one or more categories; when there is one preferred category, the weight of its feature variables is greater than 1, within the range in which the number of user features of the category multiplied by the weight equals the total number of user features of the other categories; when there are multiple preferred categories, the weight of the feature variables of each preferred category is greater than 1, within the range in which the sum of the weights equals the total number of user features of the non-preferred categories.
  • for example, the total number of user features is 800, and there are 4 user feature categories.
  • the numbers of user features in the first to fourth categories are 100, 300, 200 and 200, respectively. If the first category is the preferred one, its weight varies within the range greater than 1 and not greater than 7 (100 × 7 = 300 + 200 + 200).
  • the weights assigned to the preferred user feature categories in the above two embodiments can be varied within the above ranges, giving different rounds of weighting and therefore different rounds of clustering.
  • one of the following embodiments, or a combination of several, can be used to obtain the optimal weight of the preferred user feature category.
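  • the method of assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories includes: counting the number n of categories after user feature clustering; varying the weight of the feature variables of the preferred categories within the range greater than 1 and not greater than n-1; and determining the optimal weight according to the silhouette coefficient and/or interpretability of the weighted clustering.
  • preferably, it also includes:
  • taking the clustering result corresponding to the optimal weight as the optimally biased user feature clustering, which includes:
  • $s_i$ is the silhouette coefficient of the i-th clustering
  • $a_i$ and $b_i$ are the two feature variables, belonging to different categories, that are farthest apart in the i-th clustering result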
  • the method of assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories includes:
  • $b_{ij}$ is the j-th characteristic variable of the i-th user characteristic
  • matrix W holds the weights assigned, in different rounds, to the feature variables of the one or more preferred user characteristic categories
  • Θ is the vector of linear coefficients of the successive weighting rounds
  • $w_{n,l}$ is the weight assigned to the n-th feature variable in the l-th round
  • each assigned weight is greater than 1 and not greater than n-1
  • n is the number of feature variables
  • l is the number of weighting rounds
  • $w_l$ is the weight vector formed by the weights of the l-th round, and the sum of the weights in each weight vector is not greater than n-1
  • $\theta_l$ is the linear coefficient of the l-th weighting round
  • $\theta_k \ge 0$
  • $k = 1, 2, \ldots, l$
  • $F_n$ is the combined weight of the n-th feature
  • the optimal solution of the combined weight matrix at which the first derivative of the weight evaluation model is zero is taken as the optimal weight of each characteristic variable.
  • in a further embodiment, the method of assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories includes:
  • $b_{ij}$ is the j-th characteristic variable of the i-th user characteristic
  • matrix W holds the weights assigned, in different rounds, to the feature variables of the one or more preferred user characteristic categories
  • Θ is the vector of linear coefficients of the successive weighting rounds
  • $w_{n,l}$ is the weight assigned to the n-th feature variable in the l-th round
  • each assigned weight is greater than 1 and not greater than n-1
  • n is the number of feature variables
  • l is the number of weighting rounds
  • $w_l$ is the weight vector formed by the weights of the l-th round, and the sum of the weights in each weight vector is not greater than n-1
  • $\theta_l$ is the linear coefficient of the l-th weighting round
  • $\theta_k \ge 0$
  • $k = 1, 2, \ldots, l$
  • $F_n$ is the combined weight of the n-th feature
  • the optimal solution of the combined weight matrix at which the first derivative of the weight evaluation model is zero is taken as the optimal weight of each characteristic variable.
  • using the vector difference matrix to construct the weight evaluation model reflects the differences between characteristic variables belonging to different user characteristics, so the categories obtained when clustering the characteristic variables are clearly distinct and well interpretable.
  • using the vector-sum matrix to construct the weight evaluation model reflects the connections between different user characteristics, so the clustering of characteristic variables has a good silhouette. Therefore, the two can be combined with weights to construct the evaluation model.
  • the method for quantifying discrete variables and continuous variables includes:
  • the degree of dispersion of the discrete variables is determined; it can be obtained from one or more of the range, interquartile range, variance, standard deviation, mean variance and coefficient of variation of the word vectors, for example, by evaluating dispersion with the mean variance,
  • PC is the degree of dispersion of the discrete variable of a user characteristic
  • N is the number of users
  • $y_i$ and $o_i$ are, respectively, the discrete variable of the user characteristic of the i-th user and its expected value
  • the expected value is a set value chosen to reduce the degree of dispersion
  • discrete variables whose degree of dispersion exceeds the threshold (a settable value: the higher the required clustering accuracy, the lower the threshold) are generalized until the degree of dispersion no longer exceeds the threshold.
  • for example, a discrete feature for place of residence can be generalized from residential compound to street; if its degree of dispersion still exceeds the threshold after generalization to street, it can be further generalized to district/county.
  • the method of clustering all quantified discrete and continuous variables to obtain a biased user feature clustering includes: performing multiple initial clusterings with different weights; building a tree structure from their results; and taking the ratio of the edge length between nodes to the difference between the maximum and minimum edge lengths as the similarity between nodes.
  • the nodes are clustered according to the similarity (for example, with the k-means method), and the intersection of the initial clusters in the clustering result is taken as the optimal clustering result.
  • an embodiment of the present application also proposes a computer-readable storage medium that includes a clustering program based on user portraits, and the following steps are implemented when the program is executed by a processor:
  • the characteristic variables are divided into continuous variables and discrete variables
  • a continuous variable is a numeric variable with an order attribute
  • a discrete variable is a non-numeric variable
  • the specific implementation of the computer-readable storage medium of the present application is substantially the same as the specific implementation of the above-mentioned clustering method and electronic device based on user portraits, and will not be repeated here.
  • the above clustering method, electronic device and storage medium based on user portraits can select several fields of greater interest and adjust their weights (greater than 1) to achieve targeted clustering (targeted classification: for example, if this group of users should be classified mainly by personal attributes, the weights of those attributes are increased).

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Technology Law (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

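This application relates to data analysis technology and provides a clustering method based on user portraits, including: obtaining user features of multiple users and their feature variables; converting the user features into word vectors; clustering the word vectors to determine the category to which each user feature belongs; dividing the feature variables into continuous variables and discrete variables; quantifying the discrete and continuous variables; selecting the categories of user features that have a preference, and assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories; and clustering all quantified discrete and continuous variables to obtain a biased user feature clustering. This application also provides an electronic device and a storage medium. This application performs targeted clustering while retaining all feature information.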

Description

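User-portrait-based clustering method, electronic device and storage medium
This application claims priority to Chinese Patent Application No. 2019100688777, filed on January 24, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data analysis technology, and more specifically, to a clustering method, electronic device and storage medium based on user portraits.
Background
The concept of the user portrait arose to serve precision marketing and, in turn, to mine latent business value in depth. A user portrait is the labeling of user information, and a label is usually a highly condensed feature identifier such as age, gender or user preference. Taking all of a user's labels together outlines a three-dimensional "portrait" of the user, so a user portrait can abstract the full picture of the user's information. At present, when user portraits are clustered, the data sources are usually divided into life attributes, behavior attributes and so on, and targeted, accurate clustering is not possible.
Summary
In view of the above problems, the purpose of this application is to provide a user-portrait-based clustering method, electronic device and storage medium that perform targeted clustering while retaining all feature information.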
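To achieve the above purpose, this application provides an electronic device that includes a memory and a processor. The memory includes a user-portrait-based clustering program, and the following steps are implemented when the processor executes the program:
obtaining user features of multiple users and the feature variables corresponding to the user features;
converting the user features into word vectors;
clustering the word vectors to determine the category to which each user feature belongs;
dividing the feature variables corresponding to the user features into continuous variables and discrete variables, where a continuous variable is a numeric variable with an order attribute and a discrete variable is a non-numeric variable;
quantifying the discrete variables and continuous variables;
selecting the categories of user features that have a preference, and assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories, where the preference refers to the bias of the clustering process;
clustering all quantified discrete and continuous variables to obtain a biased user feature clustering.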
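In addition, to achieve the above purpose, this application also provides a user-portrait-based clustering method, including:
obtaining user features of multiple users and their corresponding feature variables;
converting the user features into word vectors;
clustering the word vectors to determine the category to which each user feature belongs;
dividing the feature variables into continuous variables and discrete variables, where a continuous variable is a numeric variable with an order attribute and a discrete variable is a non-numeric variable;
quantifying the discrete variables and continuous variables;
selecting the categories of user features that have a preference, and assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories, where the preference refers to the bias of the clustering process;
clustering all quantified discrete and continuous variables to obtain a biased user feature clustering.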
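In addition, to achieve the above purpose, this application also provides a computer-readable storage medium that includes a user-portrait-based clustering program; when the program is executed by a processor, the steps of the above user-portrait-based clustering method are implemented.
The user-portrait-based clustering method, electronic device and computer-readable storage medium of this application achieve targeted clustering while retaining all feature information, and, because ordered and unordered discrete features are processed separately, overall accuracy is improved.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the user-portrait-based clustering method of this application;
FIG. 2 is a schematic diagram of the modules of a preferred embodiment of the user-portrait-based clustering program in FIG. 1;
FIG. 3 is a flowchart of a preferred embodiment of the user-portrait-based clustering method of this application.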
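Detailed Description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Specific embodiments of this application are described in detail below with reference to the accompanying drawings.
This application provides a user-portrait-based clustering method applied to an electronic device 1. FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the user-portrait-based clustering method of this application.
In this embodiment, the electronic device 1 may be a server, mobile phone, tablet computer, portable computer, desktop computer or other terminal client with computing capability.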
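The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, it may be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the electronic device 1.
In this embodiment, the readable storage medium of the memory 11 is generally used to store the user-portrait-based clustering program 10 installed on the electronic device 1, among other things. The memory 11 can also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip that runs program code or processes data stored in the memory 11, for example by executing the user-portrait-based clustering program 10.
The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic clients.
The communication bus 14 connects these components and carries communication between them.
FIG. 1 shows only the electronic device 1 with components 11-14, but it should be understood that not all illustrated components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another client with speech recognition, and a voice output device such as speakers or earphones; optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may further include a display, which may also be called a display screen or display unit.
In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device or the like. The display shows information processed in the electronic device 1 and a visual user interface.
Optionally, the electronic device 1 further includes a touch sensor. The area the touch sensor provides for the user's touch operations is called the touch area. The touch sensor may be a resistive touch sensor, a capacitive touch sensor or the like, and includes not only contact-type touch sensors but also proximity-type touch sensors. The touch sensor may be a single sensor or, for example, multiple sensors arranged in an array.
Optionally, the electronic device 1 may further include logic gate circuits, sensors, audio circuits and the like, which are not described further here.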
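In the device embodiment shown in FIG. 1, the memory 11, as a computer storage medium, may include an operating system and the user-portrait-based clustering program 10; when the processor 12 executes the program 10 stored in the memory 11, the following steps are implemented:
obtaining user features of multiple users and the feature variables corresponding to the user features;
converting the user features into word vectors;
clustering the word vectors to determine the category to which each user feature belongs;
dividing the feature variables corresponding to the user features into continuous variables and discrete variables, where a continuous variable is a numeric variable with an order attribute and a discrete variable is a non-numeric variable;
quantifying the discrete variables and continuous variables;
selecting the categories of user features that have a preference, and assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories, where the preference refers to the bias of the clustering process;
clustering all quantified discrete and continuous variables to obtain a biased user feature clustering.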
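In other embodiments, the user-portrait-based clustering program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to complete this application. A module in this application refers to a series of computer program instruction segments that can perform a specific function. FIG. 2 is a functional block diagram of a preferred embodiment of the user-portrait-based clustering program 10 in FIG. 1. The program 10 can be divided into:
a user feature acquisition module 110, which obtains user features of multiple users and their corresponding feature variables;
a conversion module 120, which converts the user features into word vectors;
a first clustering module 130, which clusters the word vectors and determines the category to which each user feature belongs;
a division module 140, which divides the feature variables into continuous variables and discrete variables, where a continuous variable is a numeric variable with an order attribute and a discrete variable is a non-numeric variable;
a quantification module 150, which quantifies the discrete variables and continuous variables;
a preference selection module 160, which selects the categories of user features that have a preference and assigns a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories, where the preference refers to the user features of interest, that is, the bias of the clustering process;
a second clustering module 170, which clusters all quantified discrete and continuous variables, clustering the feature variables of the weighted user feature categories together with those of the unweighted categories to obtain a biased user feature clustering.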
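As a rough illustration of how the weights assigned by module 160 can influence the clustering done by module 170, the sketch below uses a k-prototypes-style dissimilarity: a weighted squared Euclidean distance on the quantified continuous variables plus `gamma` times the weighted count of mismatching discrete values. The function names, the `gamma` parameter and the toy data are assumptions for illustration, not the applicant's implementation; the sketch only shows that raising a preferred variable's weight above 1 can change which prototype a user is assigned to.

```python
from typing import List, Sequence, Tuple

# A user (or prototype) is (continuous values, discrete values).
Record = Tuple[Sequence[float], Sequence[str]]


def weighted_dissimilarity(
    x: Record,
    y: Record,
    num_weights: Sequence[float],
    cat_weights: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Weighted squared Euclidean distance on continuous variables plus
    gamma times the weighted number of mismatching discrete variables."""
    (x_num, x_cat), (y_num, y_cat) = x, y
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(num_weights, x_num, y_num))
    categorical = sum(w for w, a, b in zip(cat_weights, x_cat, y_cat) if a != b)
    return numeric + gamma * categorical


def assign(
    user: Record,
    prototypes: List[Record],
    num_weights: Sequence[float],
    cat_weights: Sequence[float],
    gamma: float = 1.0,
) -> int:
    """Index of the nearest prototype under the weighted dissimilarity."""
    distances = [
        weighted_dissimilarity(user, p, num_weights, cat_weights, gamma)
        for p in prototypes
    ]
    return distances.index(min(distances))


# Toy data: one continuous variable (normalized age) and one discrete one.
user = ([0.5], ["introvert"])
prototypes = [([0.5], ["extrovert"]), ([1.0], ["introvert"])]

print(assign(user, prototypes, num_weights=[1.0], cat_weights=[1.0]))  # 1
print(assign(user, prototypes, num_weights=[5.0], cat_weights=[1.0]))  # 0
```

With equal weights the user joins the prototype that shares its personality value; giving the age variable a weight of 5 lets age dominate, so the user joins the prototype with the same age instead.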
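In addition, this application provides a user-portrait-based clustering method. FIG. 3 is a flowchart of a preferred embodiment of the user-portrait-based clustering method of this application. The method may be executed by a device, which may be implemented in software and/or hardware.
In this embodiment, the user-portrait-based clustering method includes:
Step S1: obtain the user features of multiple users and the feature variables corresponding to the user features. For example, the user features and feature variables may be obtained from the network with web crawler technology or from dedicated data sources; for instance, the user feature may be gender and its feature variable female;
Step S2: convert the user features into word vectors, for example by looking up the word vector of each user feature in a word vector dictionary. Specifically, the word vector dictionary is prepared in advance and is preferably generated with the Word2Vec algorithm;
Step S3: cluster the word vectors to determine the category to which each user feature belongs. This step can be implemented with the SKLearn module in Python. For example, name, gender, age, native place and so on can be clustered into personal attributes; education, certificates, work experience and so on into business ability; and birth order, family structure, family happiness, family education and so on into sense of family responsibility;
Step S4: divide the feature variables into continuous variables and discrete variables, where a continuous variable is a numeric variable with an order attribute and a discrete variable is a non-numeric variable (such as place names or rank information); the feature variables can be distinguished automatically by a program;
Step S5: quantify the discrete variables and continuous variables;
Step S6: select the categories of user features that have a preference, and assign a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories. The preference refers to the bias of the clustering process; for example, for clustering biased toward personality, the proportion of the feature variables of personality-related user features is raised, so the clustering results differ more markedly in personality;
Step S7: cluster all quantified discrete and continuous variables, that is, cluster the feature variables of the weighted user feature categories together with those of the unweighted categories (for example, by hierarchical clustering or K-Means clustering) to obtain a biased user feature clustering. This step can be implemented with the K-Prototypes library in Python.
The above clustering method is an unsupervised classification method. A weighted clustering algorithm is built according to the features of the user portraits, so the user classification can be weighted and modified for a specific application scenario, and the preference of the clustering method can be strengthened in a targeted way according to business requirements.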
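Steps S2 to S4 can be sketched in plain Python as follows. The two-dimensional "word vectors" are made-up values standing in for a Word2Vec dictionary, the small k-means (initialised deterministically from the first k vectors) stands in for the SKLearn clustering the text mentions (for example `sklearn.cluster.KMeans`), and `split_variables` treats numbers as continuous and everything else as discrete, following the definitions in step S4. All names and data here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Made-up 2-D vectors standing in for a pre-trained Word2Vec dictionary.
WORD_VECTORS: Dict[str, List[float]] = {
    "age": [0.9, 0.1],
    "education": [0.1, 0.9],
    "gender": [0.8, 0.2],
    "certificate": [0.2, 0.8],
    "hometown": [0.85, 0.15],
    "work experience": [0.15, 0.95],
}


def kmeans(vectors: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means; centroids start at the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def categorize(features: Sequence[str], k: int) -> Dict[int, List[str]]:
    """Steps S2-S3: look up each feature's vector and group features by cluster."""
    labels = kmeans([WORD_VECTORS[f] for f in features], k)
    groups: Dict[int, List[str]] = {}
    for feature, label in zip(features, labels):
        groups.setdefault(label, []).append(feature)
    return groups


def split_variables(values: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Step S4: numbers are continuous, everything else is discrete."""
    continuous = {
        k: v for k, v in values.items()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    discrete = {k: v for k, v in values.items() if k not in continuous}
    return continuous, discrete


print(categorize(list(WORD_VECTORS), k=2))
print(split_variables({"age": 34, "gender": "female", "hometown": "Shenzhen"}))
```

With these toy vectors the features split into a personal-attribute group (age, gender, hometown) and a business-ability group (education, certificate, work experience), mirroring the grouping described in step S3.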
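In step S5, the method for quantifying the discrete variables and continuous variables includes:
converting ordered discrete variables (such as ranks) into numeric form;
converting unordered discrete variables whose number of distinct values exceeds a set number (for example, 20), such as place names, into a higher-level form (such as province or city tier);
encoding the discrete variables converted into the higher-level form (for example, with one-hot encoding);
selecting the ordered discrete variables obtained after encoding and the continuous variables, and normalizing them.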
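A minimal sketch of the four quantification steps follows. It assumes min-max scaling as the normalization (the text does not name one) and a hand-written city-to-province map as the "higher-level form"; the helper names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def encode_ordinal(values: Sequence[str], order: Sequence[str]) -> List[float]:
    """Ordered discrete variable (e.g. a rank) -> its position in `order`."""
    position = {level: i for i, level in enumerate(order)}
    return [float(position[v]) for v in values]


def generalize(values: Sequence[str], parent: Dict[str, str], max_distinct: int = 20) -> List[str]:
    """Unordered variable with too many distinct values -> higher-level form."""
    if len(set(values)) > max_distinct:
        return [parent[v] for v in values]
    return list(values)


def one_hot(values: Sequence[str]) -> Tuple[List[List[float]], List[str]]:
    """One-hot encode; columns follow the sorted category names."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories


def min_max(values: Sequence[float]) -> List[float]:
    """Scale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


ranks = encode_ordinal(["junior", "senior", "middle"], ["junior", "middle", "senior"])
provinces = generalize(
    ["Shenzhen", "Guangzhou", "Hangzhou"],
    {"Shenzhen": "Guangdong", "Guangzhou": "Guangdong", "Hangzhou": "Zhejiang"},
    max_distinct=2,  # tiny threshold so the toy example triggers generalization
)
print(min_max(ranks))         # [0.0, 1.0, 0.5]
print(one_hot(provinces))     # ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], ['Guangdong', 'Zhejiang'])
print(min_max([20, 40, 30]))  # [0.0, 1.0, 0.5]
```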
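In an embodiment of this application, in step S6, the preferred user feature categories are one or more categories. When there is one preferred category, the weight of the feature variables of that category is greater than 1 and not greater than n-1; when there are multiple preferred categories, the weight of the feature variables of each preferred category is greater than 1 and the sum of the weights is not greater than n-1, where n is the number of categories obtained by clustering the user features.
In another embodiment of this application, the preferred user feature categories are one or more categories. When there is one preferred category, the weight of the feature variables of that category is greater than 1, within the range in which the number of user features of that category multiplied by the weight equals the total number of user features of the other categories; when there are multiple preferred categories, the weight of the feature variables of each preferred category is greater than 1, within the range in which the sum of the weights equals the total number of user features of the non-preferred categories. For example, with 800 user features in total across 4 user feature categories, where the first to fourth categories contain 100, 300, 200 and 200 user features respectively, if the first category is the preferred one, its weight varies within the range greater than 1 and not greater than 7 (100 × 7 = 300 + 200 + 200).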
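The two weight ranges can be checked with a few lines of arithmetic. This sketch, with hypothetical helper names, reproduces the worked example (100, 300, 200 and 200 features, first category preferred, upper bound 7) and also enumerates candidate weights greater than 1 and up to the bound; the step size of 0.5 is an arbitrary assumption.

```python
from typing import List, Sequence


def upper_bound_by_categories(n_categories: int) -> float:
    """First embodiment: a single preferred category's weight lies in (1, n - 1]."""
    return float(n_categories - 1)


def upper_bound_by_feature_counts(counts: Sequence[int], preferred: int) -> float:
    """Second embodiment: counts[preferred] * w equals the other categories' total."""
    others = sum(c for i, c in enumerate(counts) if i != preferred)
    return others / counts[preferred]


def candidate_weights(upper: float, step: float = 0.5) -> List[float]:
    """Weights greater than 1 and not greater than `upper`, spaced by `step`."""
    weights, w = [], 1.0 + step
    while w <= upper + 1e-9:  # tolerance for floating-point accumulation
        weights.append(round(w, 6))
        w += step
    return weights


counts = [100, 300, 200, 200]  # user features per category (800 in total)
print(upper_bound_by_categories(len(counts)))              # 3.0
print(upper_bound_by_feature_counts(counts, preferred=0))  # 7.0
print(candidate_weights(3.0))                              # [1.5, 2.0, 2.5, 3.0]
```

For the same four categories, the first embodiment caps the weight at n - 1 = 3, while the second embodiment allows weights up to 7.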
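The weights assigned to the preferred user feature categories in the above two embodiments can be varied within the above ranges, giving different rounds of weighting and therefore different rounds of clustering; one of the following embodiments, or a combination of several, can be used to obtain the optimal weight of the preferred user feature categories.
In an optional embodiment, assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories includes:
counting the number n of categories obtained by clustering the user features;
varying the weight of the feature variables of the preferred user feature categories within the range greater than 1 and not greater than n-1;
determining the optimal weight according to the silhouette coefficient and/or interpretability of the clustering after weighting.
Preferably, the method further includes:
taking the clustering result corresponding to the optimal weight as the optimally biased user feature clustering, which includes:
calculating the silhouette coefficient of each clustering according to the following formula:
Figure PCTCN2019089151-appb-000001
where $s_i$ is the silhouette coefficient of the i-th clustering, and $a_i$ and $b_i$ are, respectively, the two feature variables belonging to different categories that are farthest apart in the i-th clustering result;
repeating the above steps to obtain the curve of the silhouette coefficient against the weight, checking whether the curve has an extremum, taking the weight with the maximum silhouette coefficient as the optimal weight, and taking the clustering result with the maximum silhouette coefficient as the optimally biased user feature clustering.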
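The weight sweep of this embodiment might look like the sketch below. Because the silhouette formula itself is given only as an image, the code uses the standard silhouette coefficient, (b - a) / max(a, b) averaged over points, where a is a point's mean distance to its own cluster and b its mean distance to the nearest other cluster; treat that as an assumption. `cluster_fn` is a hypothetical callable that reclusters the data for a given weight and returns the points and labels.

```python
import math
from typing import Sequence


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean of the standard per-point silhouette (b - a) / max(a, b)."""
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of p's own cluster
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_weight(weights, cluster_fn):
    """Recluster for each candidate weight; keep the highest silhouette."""
    curve = {w: silhouette(*cluster_fn(w)) for w in weights}
    return max(curve, key=curve.get), curve


def toy_cluster(weight: float):
    """Hypothetical reclustering: the preferred variable x is scaled by `weight`."""
    base = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    return [(x * weight, y) for x, y in base], [0, 0, 1, 1]


best, curve = best_weight([1.5, 2.0, 3.0], toy_cluster)
print(best)  # 3.0
```

Here the labels are fixed, so the curve simply rises with the weight; with a real reclustering the curve can peak inside the range, which is the extremum the embodiment looks for.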
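In an optional embodiment, assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories includes:
obtaining the quantification matrix formed by the quantified discrete and continuous variables of the one or more preferred user feature categories
$B = (b_{ij})_{m \times n}$
where $b_{ij}$ is the j-th feature variable of the i-th user feature;
constructing the combined weight matrix that assigns different weights, in different rounds, to the feature variables of the preferred user feature categories
$F = W\Theta = [F_1\ F_2\ \cdots\ F_n]^{T}$
Figure PCTCN2019089151-appb-000002
Figure PCTCN2019089151-appb-000003
$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,l}\theta_l$
where the matrix W holds the weights assigned in different rounds to the feature variables of the one or more preferred user feature categories; Θ is the vector of linear coefficients of the weighting rounds; $w_{n,l}$ is the weight assigned to the n-th feature variable in the l-th round, and each weight is greater than 1 and not greater than n-1; n is the number of feature variables; l is the number of weighting rounds; $w_l$ is the weight vector formed by the weights of the l-th round, and the sum of the weights in each weight vector is not greater than n-1; $\theta_l$ is the linear coefficient of the l-th round; and $\theta_k \ge 0$, $k = 1, 2, \ldots, l$,
Figure PCTCN2019089151-appb-000004
$F_n$ is the combined weight of the n-th feature;
constructing the vector difference matrix C from the vector matrix,
Figure PCTCN2019089151-appb-000005
obtaining the weight evaluation model from the vector difference matrix and the combined weight matrix:
$M(F) = CF = CW\Theta$;
taking the optimal solution of the combined weight matrix at which the first derivative of the weight evaluation model is zero as the optimal weight of each feature variable.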
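The combined weight $F_n = \sum_k w_{n,k}\theta_k$ is a plain matrix-vector product. The sketch below computes $F = W\Theta$ and checks the stated constraints $\theta_k \ge 0$ and $w > 1$. The matrices C and H are defined only in figures that are not reproduced here, so the evaluation model $M(F)$ is not implemented. Rows of `W` index feature variables and columns index weighting rounds, matching $w_{n,l}$; the function name and example numbers are invented.

```python
from typing import List, Sequence


def combined_weights(W: Sequence[Sequence[float]], theta: Sequence[float]) -> List[float]:
    """F = W @ Theta, i.e. F_n = sum over rounds k of w_{n,k} * theta_k.

    Rows of W are feature variables, columns are weighting rounds.
    """
    if any(t < 0 for t in theta):
        raise ValueError("linear coefficients theta_k must be >= 0")
    for row in W:
        if len(row) != len(theta):
            raise ValueError("each row of W needs one weight per weighting round")
        if any(w <= 1 for w in row):
            raise ValueError("weights of preferred feature variables must be > 1")
    return [sum(w * t for w, t in zip(row, theta)) for row in W]


# Two preferred feature variables, two weighting rounds, equal coefficients.
W = [[1.5, 2.0],
     [2.0, 3.0]]
theta = [0.5, 0.5]
print(combined_weights(W, theta))  # [1.75, 2.5]
```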
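In an optional embodiment, assigning a weight greater than 1 to the quantified discrete and continuous variables of the preferred user feature categories includes:
obtaining the quantification matrix formed by the quantified discrete and continuous variables of the one or more preferred user feature categories
$B = (b_{ij})_{m \times n}$
where $b_{ij}$ is the j-th feature variable of the i-th user feature;
constructing the combined weight matrix that assigns different weights, in different rounds, to the feature variables of the preferred user feature categories
$F = W\Theta = [F_1\ F_2\ \cdots\ F_n]^{T}$
Figure PCTCN2019089151-appb-000006
Figure PCTCN2019089151-appb-000007
$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,l}\theta_l$
where the matrix W holds the weights assigned in different rounds to the feature variables of the one or more preferred user feature categories; Θ is the vector of linear coefficients of the weighting rounds; $w_{n,l}$ is the weight assigned to the n-th feature variable in the l-th round, and each weight is greater than 1 and not greater than n-1; n is the number of feature variables; l is the number of weighting rounds; $w_l$ is the weight vector formed by the weights of the l-th round, and the sum of the weights in each weight vector is not greater than n-1; $\theta_l$ is the linear coefficient of the l-th round; and $\theta_k \ge 0$, $k = 1, 2, \ldots, l$,
Figure PCTCN2019089151-appb-000008
$F_n$ is the combined weight of the n-th feature;
constructing the vector-sum matrix H from the vector matrix,
Figure PCTCN2019089151-appb-000009
obtaining the weight evaluation model from the vector-sum matrix and the combined weight matrix:
$M'(F) = HF = HW\Theta$;
taking the optimal solution of the combined weight matrix at which the first derivative of the weight evaluation model is zero as the optimal weight of each feature variable.
Building the weight evaluation model with the vector difference matrix reflects the differences between feature variables belonging to different user features, so the categories produced by clustering the feature variables are clearly distinct and highly interpretable; building it with the vector-sum matrix reflects the connections between different user features, so the clustering of the feature variables has a good silhouette. The two can therefore be combined with weights to construct the evaluation model.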
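In an embodiment of this application, the method for quantifying the discrete variables and continuous variables includes:
determining the degree of dispersion of the discrete variables, which can be obtained from one or more of the range, interquartile range, variance, standard deviation, mean variance and coefficient of variation of the word vectors; for example, the degree of dispersion is evaluated by the mean variance,
Figure PCTCN2019089151-appb-000010
where PC is the degree of dispersion of the discrete variable of a user feature, N is the number of users, $y_i$ and $o_i$ are, respectively, the discrete variable of the user feature of the i-th user and its expected value, and the expected value is a set value chosen to reduce the degree of dispersion;
generalizing the discrete variables whose degree of dispersion exceeds a threshold (a settable value: the higher the required clustering accuracy, the lower the threshold) until the degree of dispersion no longer exceeds the threshold. For example, a discrete feature for place of residence can be generalized from residential compound to street; if the degree of dispersion of the feature still exceeds the threshold after generalizing to street, it can be further generalized to district/county.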
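The generalization loop can be sketched as below. The mean-variance formula is available only as an image, so this version takes the dispersion measure as a parameter and defaults to a stand-in, the share of distinct values among users; the residence hierarchy (compound, street, district) and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def distinct_ratio(values: Sequence[str]) -> float:
    """Stand-in dispersion measure: share of distinct values among the users."""
    return len(set(values)) / len(values)


def generalize_until(
    values: Sequence[str],
    levels: Sequence[Dict[str, str]],
    threshold: float,
    dispersion: Callable[[Sequence[str]], float] = distinct_ratio,
) -> List[str]:
    """Climb the hierarchy one level at a time while dispersion > threshold.

    If every level is used up, the top-level values are returned as they are.
    """
    current = list(values)
    for parent in levels:
        if dispersion(current) <= threshold:
            break
        current = [parent[v] for v in current]
    return current


compound_to_street = {"A1": "Street A", "A2": "Street A", "B1": "Street B", "C1": "Street C"}
street_to_district = {"Street A": "District 1", "Street B": "District 1", "Street C": "District 2"}
residences = ["A1", "A2", "B1", "C1"]  # one residential compound per user
levels = [compound_to_street, street_to_district]

print(generalize_until(residences, levels, threshold=0.8))
# ['Street A', 'Street A', 'Street B', 'Street C']
print(generalize_until(residences, levels, threshold=0.5))
# ['District 1', 'District 1', 'District 1', 'District 2']
```

A lower threshold (higher required accuracy in the text's terms) pushes the values further up the hierarchy, from street to district in this example.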
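In an embodiment of this application, clustering all quantified discrete and continuous variables to obtain a biased user feature clustering includes:
performing multiple initial clusterings with different weights;
constructing a tree structure from the results of the initial clusterings, where the nodes, from top to bottom, are each cluster of the first initial clustering result through each cluster of the last initial clustering result, and the length of an edge is the proportion of feature variables with the same user feature in the clustering results among all feature variables;
taking the ratio of the edge length between two nodes to the difference between the maximum and minimum edge lengths as the similarity between the nodes;
clustering the nodes according to the similarity (for example, with the k-means method), and taking the intersection of the initial clusters in the clustering result as the optimal clustering result.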
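One possible reading of the tree construction is sketched below: each initial clustering is a list of sets of feature-variable ids, edges join clusters of consecutive runs, an edge's length is the share of all feature variables the two clusters have in common, and similarity is edge length divided by (max edge length - min edge length). That interpretation, the fallback when all edges are equal, and the function names are assumptions. The k-means step over nodes is left out; `consensus` just intersects the clusters of whichever nodes end up grouped together, here the pair joined by the strongest edge.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (index of the initial clustering, index of the cluster)


def edge_lengths(runs: List[List[Set[int]]], n_variables: int) -> Dict[Tuple[Node, Node], float]:
    """Link clusters of consecutive runs; length = shared variables / all variables."""
    edges = {}
    for r in range(len(runs) - 1):
        for i, a in enumerate(runs[r]):
            for j, b in enumerate(runs[r + 1]):
                shared = len(a & b)
                if shared:  # clusters with nothing in common get no edge
                    edges[((r, i), (r + 1, j))] = shared / n_variables
    return edges


def similarities(edges: Dict[Tuple[Node, Node], float]) -> Dict[Tuple[Node, Node], float]:
    """Edge length divided by (longest edge - shortest edge)."""
    spread = max(edges.values()) - min(edges.values())
    if spread == 0:  # assumption: all edges count as equally similar
        return {e: 1.0 for e in edges}
    return {e: length / spread for e, length in edges.items()}


def consensus(runs: List[List[Set[int]]], nodes: List[Node]) -> Set[int]:
    """Intersection of the initial clusters behind a group of nodes."""
    return set.intersection(*(runs[r][i] for r, i in nodes))


# Two initial clusterings (different weights) of six feature variables 0..5.
runs = [[{0, 1, 2}, {3, 4, 5}],
        [{0, 1}, {2, 3, 4, 5}]]
edges = edge_lengths(runs, n_variables=6)
sims = similarities(edges)
strongest = max(sims, key=sims.get)
print(strongest, consensus(runs, list(strongest)))  # ((0, 1), (1, 1)) {3, 4, 5}
```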
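In addition, an embodiment of the present application further provides a computer-readable storage medium. The medium contains a user-profile-based clustering program which, when executed by a processor, implements the following steps:
Obtain the user features of multiple users and their corresponding feature variables;
Convert the user features into word vectors;
Cluster the word vectors to determine the category to which each user feature belongs;
Divide the feature variables into continuous variables and discrete variables, where continuous variables are numeric variables with an ordinal attribute and discrete variables are non-numeric variables;
Quantize the discrete and continuous variables;
Select the preferred user-feature categories and assign weights greater than 1 to the quantized discrete and continuous variables of those categories, where preference refers to the bias of the clustering process;
Cluster all the quantized discrete and continuous variables to obtain a biased user-feature clustering.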
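The computer-readable storage medium of the present application is implemented in substantially the same way as the user-profile-based clustering method and electronic device described above, so the details are not repeated here.
The above user-profile-based clustering method, electronic device, and storage medium can select several fields of particular interest and raise their weights above 1, achieving targeted clustering. For example, if a group of users should be classified mainly by personal attributes, the weights of those attributes are increased.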
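The following Python sketch illustrates this weight adjustment, together with the weight sweep recited in claims 3 and 4. It works in three stages:

1. It raises the weight of selected feature columns above 1.
2. It clusters with a small k-means under a per-feature weighted Euclidean distance.
3. It keeps the candidate weight with the highest mean silhouette.

It is a sketch under stated assumptions, not the applicant's implementation:

- It uses the standard silhouette (b - a) / max(a, b), where a is the mean intra-cluster distance and b is the mean distance to the nearest other cluster. This differs from the definition of a_i and b_i given in claim 4.
- The farthest-point initialisation, the toy data, the candidate grid and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Point = Sequence[float]


def weighted_distance(x: Point, y: Point, weights: Sequence[float]) -> float:
    """Euclidean distance with per-feature weights: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def kmeans(points: Sequence[Point], k: int, weights: Sequence[float], iters: int = 50) -> List[int]:
    """Minimal k-means under the weighted distance, seeded deterministically
    with farthest-point initialisation (an assumption for reproducibility)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(weighted_distance(p, c, weights) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: weighted_distance(p, centers[c], weights)) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # the per-feature mean minimises the weighted squared error
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: Sequence[Point], labels: List[int], weights: Sequence[float]) -> float:
    """Mean of the standard silhouette (b - a) / max(a, b); singletons score 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [weighted_distance(p, q, weights)
               for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(weighted_distance(p, q, weights) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def sweep_weights(points: Sequence[Point], preferred: Set[int], k: int,
                  candidates: Sequence[float]) -> Optional[Tuple[float, float, List[int]]]:
    """Give each candidate weight to the preferred feature columns (others keep
    weight 1), re-cluster, and return (weight, silhouette, labels) of the best run."""
    best: Optional[Tuple[float, float, List[int]]] = None
    for w in candidates:
        weights = [w if j in preferred else 1.0 for j in range(len(points[0]))]
        labels = kmeans(points, k, weights)
        if len(set(labels)) < 2:
            continue
        score = silhouette(points, labels, weights)
        if best is None or score > best[1]:
            best = (w, score, labels)
    return best


# Toy data: column 0 is the preferred feature, column 1 is minor variation.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
# Candidate weights in (1, n-1], e.g. for n = 4 user-feature categories.
best = sweep_weights(points, preferred={0}, k=2, candidates=[1.5, 2.0, 3.0])
print(best[0], best[2])  # 3.0 [0, 0, 1, 1]
```

In this toy case, raising the weight of column 0 widens the gap between the two groups relative to the within-group spread. The silhouette therefore grows with the weight, and the largest candidate wins. On real data the curve may have an interior maximum instead.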
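It should be noted that the terms "comprise", "include", and any of their variants are intended herein to cover non-exclusive inclusion. A process, device, article, or method that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, device, article, or method that includes it.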
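The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. From the description above, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform. They may also be implemented in hardware, but in many cases the former is the better implementation. On this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product. That product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc). It includes several instructions that cause a terminal client (such as a mobile phone, computer, server, or network client) to execute the methods described in the embodiments of the present application.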
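The above are only preferred embodiments of the present application and do not limit its patent scope. The scope of patent protection of the present application likewise covers any equivalent structural or process transformation made using the specification and drawings of the present application, and any direct or indirect use of them in other related technical fields.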

Claims (20)

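  1. A user-profile-based clustering method, comprising:
    obtaining user features of multiple users and the feature variables corresponding to the user features;
    converting the user features into word vectors;
    clustering the word vectors to determine the category to which each user feature belongs;
    dividing the feature variables corresponding to the user features into continuous variables and discrete variables, the continuous variables being numeric variables with an ordinal attribute and the discrete variables being non-numeric variables;
    quantizing the discrete variables and the continuous variables;
    selecting preferred user-feature categories and assigning weights greater than 1 to the quantized discrete and continuous variables of the preferred user-feature categories, the preference referring to the bias of the clustering process;
    clustering all the quantized discrete and continuous variables to obtain a biased user-feature clustering.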
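  2. The user-profile-based clustering method according to claim 1, wherein the method of quantizing the discrete and continuous variables comprises:
    converting ordinal discrete variables into numeric form;
    converting non-ordinal discrete variables whose number of distinct values exceeds a set number into a higher-order form;
    encoding the discrete variables converted into the higher-order form;
    selecting the discrete variables that are ordinal after encoding and normalizing them together with the continuous variables.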
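  3. The user-profile-based clustering method according to claim 1, wherein the method of assigning weights greater than 1 to the quantized discrete and continuous variables of the preferred user-feature categories comprises:
    counting the number n of categories obtained by clustering the user features;
    varying the weights of the feature variables of the preferred user-feature categories within the range greater than 1 and not greater than n-1;
    determining the optimal weight according to the silhouette coefficient and/or the interpretability of the clustering after weighting.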
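  4. The user-profile-based clustering method according to claim 3, further comprising, after the step of determining the optimal weight according to the silhouette coefficient and/or the interpretability of the clustering after weighting:
    taking the clustering result corresponding to the optimal weight as the optimally biased user-feature clustering, which comprises:
    calculating the silhouette coefficient of each clustering according to the following formula:
    (formula given as an equation image, not reproduced in the source)
    where s_i is the silhouette coefficient of the i-th clustering, and a_i and b_i are respectively the two feature variables, belonging to different categories, that are farthest apart in the i-th clustering result;
    repeating the above steps to obtain a curve of the silhouette coefficient against the weight, and observing whether the curve has an extremum; the weight corresponding to the maximum silhouette coefficient is taken as the optimal weight, and the clustering result corresponding to the maximum silhouette coefficient is taken as the optimally biased user-feature clustering.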
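  5. The user-profile-based clustering method according to claim 1, wherein there are one or more preferred user-feature categories, n being the number of categories obtained by clustering the user features:
    when there is one preferred category, the weights of the feature variables of that category are greater than 1 and not greater than n-1;
    when there are multiple preferred categories, the weights of the feature variables of each preferred category are greater than 1, and their sum is not greater than n-1.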
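  6. The user-profile-based clustering method according to claim 5, wherein the method of assigning weights greater than 1 to the quantized discrete and continuous variables of the preferred user-feature categories further comprises:
    obtaining a quantization matrix formed by the quantized discrete and continuous variables of the one or more preferred user-feature categories;
    $$B = (b_{ij})_{m \times n}$$
    where $b_{ij}$ is the $j$-th feature variable of the $i$-th user feature;
    constructing a combined weight matrix in which the feature variables of the preferred user-feature categories are assigned different weights in different weighting rounds;
    $$F = W\Theta = [F_1\ F_2\ \cdots\ F_n]^T$$
    $$W = [w_1\ w_2\ \cdots\ w_l] = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,l} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,l} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n,1} & w_{n,2} & \cdots & w_{n,l} \end{bmatrix}$$
    $$\Theta = [\theta_1\ \theta_2\ \cdots\ \theta_l]^T$$
    $$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,l}\theta_l$$
    where $W$ is the matrix of weights assigned, in different weighting rounds, to the feature variables of the one or more preferred user-feature categories; $\Theta$ is the vector of linear coefficients of the weighting rounds; $w_{n,l}$ is the weight assigned to the $n$-th feature variable in the $l$-th round, and every weight is greater than 1 and not greater than $n-1$; $n$ is the number of feature variables; $l$ is the number of weighting rounds; $w_l$ is the weight vector formed by the weights of the $l$-th round, and the weights in each weight vector sum to no more than $n-1$; $\theta_l$ is the linear coefficient of the $l$-th round, with $\theta_k \ge 0$, $k = 1, 2, \ldots, l$;
    (equation image not reproduced in the source)
    $F_n$ is the combined weight of the $n$-th feature;
    constructing a vector difference matrix $C$ from the vector matrix;
    (definition of $C$ given as an equation image, not reproduced in the source)
    obtaining a weight evaluation model from the vector difference matrix and the combined weight matrix;
    $$M(F) = CF = CW\Theta$$
    taking the optimal solution of the combined weight matrix at which the first derivative of the weight evaluation model is zero as the optimal weight of each feature variable.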
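  7. The user-profile-based clustering method according to claim 5, wherein the method of assigning weights greater than 1 to the quantized discrete and continuous variables of the preferred user-feature categories further comprises:
    obtaining a quantization matrix formed by the quantized discrete and continuous variables of the one or more preferred user-feature categories;
    $$B = (b_{ij})_{m \times n}$$
    where $b_{ij}$ is the $j$-th feature variable of the $i$-th user feature;
    constructing a combined weight matrix in which the feature variables of the preferred user-feature categories are assigned different weights in different weighting rounds;
    $$F = W\Theta = [F_1\ F_2\ \cdots\ F_n]^T$$
    $$W = [w_1\ w_2\ \cdots\ w_l] = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,l} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,l} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n,1} & w_{n,2} & \cdots & w_{n,l} \end{bmatrix}$$
    $$\Theta = [\theta_1\ \theta_2\ \cdots\ \theta_l]^T$$
    $$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,l}\theta_l$$
    where $W$ is the matrix of weights assigned, in different weighting rounds, to the feature variables of the one or more preferred user-feature categories; $\Theta$ is the vector of linear coefficients of the weighting rounds; $w_{n,l}$ is the weight assigned to the $n$-th feature variable in the $l$-th round, and every weight is greater than 1 and not greater than $n-1$; $n$ is the number of feature variables; $l$ is the number of weighting rounds; $w_l$ is the weight vector formed by the weights of the $l$-th round, and the weights in each weight vector sum to no more than $n-1$; $\theta_l$ is the linear coefficient of the $l$-th round, with $\theta_k \ge 0$, $k = 1, 2, \ldots, l$;
    (equation image not reproduced in the source)
    $F_n$ is the combined weight of the $n$-th feature;
    constructing a vector sum matrix $H$ from the vector matrix;
    (definition of $H$ given as an equation image, not reproduced in the source)
    obtaining a weight evaluation model from the vector sum matrix and the combined weight matrix;
    $$M'(F) = HF = HW\Theta$$
    taking the optimal solution of the combined weight matrix at which the first derivative of the weight evaluation model is zero as the optimal weight of each feature variable.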
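  8. The user-profile-based clustering method according to claim 1, wherein the method of assigning weights greater than 1 to the quantized discrete and continuous variables of the preferred user-feature categories comprises:
    counting the total number of user features and the number of user features belonging to each user-feature category;
    assigning the preferred user-feature category a weight in the range from greater than 1 up to the value at which the number of user features of that category equals the sum of the numbers of user features of the other categories.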
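  9. The user-profile-based clustering method according to claim 1, wherein,
    when there is one preferred user-feature category, the weights of the feature variables of that category lie in the range from greater than 1 up to the value at which the product of the weight and the number of user features of that category equals the sum of the numbers of user features of the other categories.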
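  10. The user-profile-based clustering method according to claim 1, wherein
    the method of quantizing the discrete and continuous variables comprises:
    determining the degree of dispersion of the discrete variables, and generalizing each discrete variable whose degree of dispersion exceeds a threshold until its degree of dispersion no longer exceeds the threshold.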
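  11. The user-profile-based clustering method according to claim 10, wherein
    the degree of dispersion is obtained by one or more of the following measures of the word vectors: range, interquartile range, variance, standard deviation, mean variance, and coefficient of variation.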
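  12. The user-profile-based clustering method according to claim 11, wherein
    the formula for evaluating dispersion by the mean variance is as follows:
    (formula for PC given as an equation image, not reproduced in the source)
    where PC is the degree of dispersion of the discrete variable of one user feature and N is the number of users; y_i is the discrete variable of the i-th user's user feature and o_i is its expected value, a set value chosen to reduce the degree of dispersion.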
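  13. The user-profile-based clustering method according to claim 1, wherein
    the method of clustering all the quantized discrete and continuous variables to obtain the biased user-feature clustering comprises:
    performing multiple initial clusterings with different weights;
    constructing a tree structure from the results of the multiple initial clusterings, wherein, from the root level downward, the nodes are the clusters of the first through the last initial clustering result, and the length of an edge is the share of all feature variables made up by the feature variables in a clustering result that have the same user feature;
    clustering the nodes according to the similarity between nodes, and taking the intersection of the initial clusterings in the clustering result as the optimal clustering result.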
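  14. The user-profile-based clustering method according to claim 13, wherein
    the ratio of the edge length between two nodes to the difference between the maximum and minimum edge lengths is taken as the similarity between those nodes.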
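  15. The user-profile-based clustering method according to claim 1, wherein
    a word-vector dictionary is generated with the Word2Vec algorithm, and the word vector corresponding to each user feature is looked up in the dictionary, thereby converting the user features into word vectors.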
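  16. The user-profile-based clustering method according to claim 2, wherein
    the discrete variables converted into the higher-order form are encoded with one-hot encoding.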
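  17. The user-profile-based clustering method according to claim 1, wherein
    the user features and their feature variables are obtained from the network by web-crawler technology.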
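  18. The user-profile-based clustering method according to claim 1, wherein
    the word vectors are clustered with the SKLearn (scikit-learn) module in Python to determine the category to which each user feature belongs.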
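  19. An electronic device, comprising a memory and a processor, wherein the memory stores a user-profile-based clustering program which, when executed by the processor, implements the following steps:
    obtaining user features of multiple users and their corresponding feature variables;
    converting the user features into word vectors;
    clustering the word vectors to determine the category to which each user feature belongs;
    dividing the feature variables into continuous variables and discrete variables, the continuous variables being numeric variables with an ordinal attribute and the discrete variables being non-numeric variables;
    quantizing the discrete variables and the continuous variables;
    selecting preferred user-feature categories and assigning weights greater than 1 to the quantized discrete and continuous variables of the preferred user-feature categories, the preference referring to the bias of the clustering process;
    clustering all the quantized discrete and continuous variables to obtain a biased user-feature clustering.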
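  20. A computer-readable storage medium, wherein the computer-readable storage medium contains a user-profile-based clustering program which, when executed by a processor, implements the steps of the user-profile-based clustering method according to any one of claims 1 to 18.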
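The quantization steps of claims 2 and 16 can be sketched in Python as follows:

- **Ordinal variables.** Ordinal discrete variables become numeric ranks.
- **High-cardinality nominal variables.** These are mapped to a coarser "higher-order" value and then one-hot encoded.
- **Normalization.** Ordinal and continuous columns are min-max normalized.

Reading "higher-order form" as a coarser grouping (here, city to province) is an interpretation, not something the claims spell out. The category names, the mapping and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def encode_ordinal(values: Sequence[str], order: Sequence[str]) -> List[float]:
    """Ordinal discrete variable -> numeric rank (its position in `order`)."""
    rank = {v: float(i) for i, v in enumerate(order)}
    return [rank[v] for v in values]


def generalize(values: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Map high-cardinality nominal values to coarser ('higher-order') values;
    values missing from the mapping are kept unchanged."""
    return [mapping.get(v, v) for v in values]


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """One-hot encode a nominal variable; columns follow sorted category order."""
    categories = sorted(set(values))
    return categories, [[1.0 if v == c else 0.0 for c in categories] for v in values]


def min_max(column: Sequence[float]) -> List[float]:
    """Scale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


education = ["bachelor", "high school", "master"]
edu = min_max(encode_ordinal(education, ["high school", "bachelor", "master"]))
to_province = {"Shenzhen": "Guangdong", "Guangzhou": "Guangdong", "Hangzhou": "Zhejiang"}
cats, rows = one_hot(generalize(["Shenzhen", "Guangzhou", "Hangzhou"], to_province))
income = min_max([3000.0, 5000.0, 7000.0])
print(edu, cats, rows, income)
# [0.5, 0.0, 1.0] ['Guangdong', 'Zhejiang'] [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]] [0.0, 0.5, 1.0]
```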
PCT/CN2019/089151 2019-01-24 2019-05-30 User-profile-based clustering method, electronic device, and storage medium WO2020151152A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910068877.7A CN109903082B (zh) 2019-01-24 2019-01-24 User-profile-based clustering method, electronic device, and storage medium
CN201910068877.7 2019-01-24

Publications (1)

Publication Number Publication Date
WO2020151152A1 true WO2020151152A1 (zh) 2020-07-30

Family

ID=66944108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089151 WO2020151152A1 (zh) 2019-01-24 2019-05-30 User-profile-based clustering method, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN109903082B (zh)
WO (1) WO2020151152A1 (zh)

Also Published As

Publication number Publication date
CN109903082A (zh) 2019-06-18
CN109903082B (zh) 2022-10-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19911985

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 14.09.2021)

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.04.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19911985

Country of ref document: EP

Kind code of ref document: A1