WO2020143305A1 - Group information classification method, apparatus, computer device, and storage medium - Google Patents

Group information classification method, apparatus, computer device, and storage medium

Info

Publication number
WO2020143305A1
Authority
WO
WIPO (PCT)
Prior art keywords
group information
variable
group
variables
continuous
Prior art date
Application number
PCT/CN2019/117529
Inventor
邓悦
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020143305A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • the present application relates to a group information classification method, device, computer equipment and storage medium.
  • the information involved includes continuous variables and discrete variables corresponding to the group information.
  • conventionally, the distances measured for the two types of variables are combined according to assigned weights to obtain the final clustering result, and thereby the group classification result.
  • however, the weights cannot be calculated accurately, so different weight choices lead to inaccurate group classification results.
  • a group information classification method, device, computer device, and storage medium that can improve the accuracy of a classification result corresponding to group information are provided.
  • a group information classification method including:
  • receiving a classification task, the classification task carrying a group identifier
  • a group information classification device including:
  • a communication module for receiving a classification task, the classification task carrying a group identifier
  • a variable identification module configured to obtain group information according to the group identification, and identify the first continuous variable and the discrete variable corresponding to the group information;
  • a variable processing module configured to perform continuous processing on the discrete variables to obtain a second continuous variable corresponding to the group information, and to standardize the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information;
  • a clustering module configured to cluster the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps:
  • receiving a classification task, the classification task carrying a group identifier
  • One or more non-volatile computer-readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • receiving a classification task, the classification task carrying a group identifier
  • FIG. 1 is an application scenario diagram of a group information classification method in one or more embodiments.
  • FIG. 2 is a schematic flowchart of a group information classification method in one or more embodiments.
  • FIG. 3 is a schematic flowchart of a continuous variable processing step for discrete variables in one or more embodiments.
  • FIG. 4 is a block diagram of a group information classification device in one or more embodiments.
  • FIG. 5 is a block diagram of a computer device in one or more embodiments.
  • the group information classification method provided in this application can be applied to the application scenario shown in FIG. 1.
  • the terminal 102 communicates with the server 104 via the network.
  • the server 104 receives the classification task uploaded by the terminal 102, and the classification task carries a group identifier.
  • the server 104 obtains group information according to the group identifier, and identifies the first continuous variable and the discrete variable corresponding to the group information.
  • the server 104 performs continuous processing on the discrete variables to obtain the second continuous variable corresponding to the group information.
  • the server 104 performs standardization processing on the first continuous variable and the second continuous variable to obtain a standardized variable corresponding to the group information.
  • the server 104 clusters the standardized variables corresponding to the group information to obtain the classification result corresponding to the group information. Because the distances of the discrete variables can be measured without setting weights, the influence of distance weighting between different types of variables on the group classification result is avoided, which improves the accuracy of the classification result corresponding to the group information.
  • the terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a group information classification method is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
  • Step 202: Receive a classification task, the classification task carrying a group identifier.
  • Step 204: Obtain group information according to the group identifier, and identify the first continuous variable and the discrete variable corresponding to the group information.
  • Step 206: Perform continuous processing on the discrete variables to obtain a second continuous variable corresponding to the group information.
  • the server receives the classification task uploaded by the terminal, analyzes the classification task, and obtains the group identifier carried by the classification task.
  • the server obtains the corresponding group information according to the group identifier.
  • the group information may be information of a certain number of people within a preset range. For example, employee performance information for October 2018.
  • the server recognizes the variable corresponding to the group information.
  • Variables include continuous variables and discrete variables. Continuous variables include a first continuous variable and a second continuous variable. The first continuous variable can be directly identified through the group information. The second continuous variable can be obtained by transforming the discrete variable. The first continuous variable may be information expressed in numerical values. Discrete variables can be information expressed in multiple dimensions.
  • the first continuous variable may be the number of courses attended, the number of attendance days, the number of years since joining, and so on.
  • Discrete variables can be training results, gender, etc.
  • After the server recognizes the variables corresponding to the group information, it can perform continuous processing on the discrete variables among them. The server obtains the multiple dimensions corresponding to each discrete variable in the group information and encodes those dimensions to obtain the second continuous variable corresponding to the discrete variable. At this point, the variables corresponding to the group information are all continuous variables.
  • the server receives the employee group classification task uploaded by the terminal, analyzes the classification task, and obtains the employee group identifier carried by the classification task.
  • the server obtains the employee ID of each employee according to the employee group ID and the employee performance information according to the employee ID.
  • the performance information corresponding to all employee IDs can be called group information.
  • the server identifies the group information and obtains the first continuous variable and the discrete variables corresponding to the group information.
  • the first continuous variable may be the number of courses attended, the number of attendance days, the number of years since joining, etc.
  • Discrete variables can be training results, gender, etc.
  • the server performs continuous processing on the training results to obtain continuous variables corresponding to the training results (whether it is excellent, good, qualified, or unqualified).
  • the server performs continuous processing on gender to obtain continuous variables corresponding to gender (whether male or female).
  • Step 208: Standardize the first continuous variable and the second continuous variable to obtain a standardized variable corresponding to the group information.
  • the server may normalize the first continuous variable and the second continuous variable. Specifically, the server calculates the mean and standard deviation of the variable corresponding to the group information, and obtains the standardized variable corresponding to the group information according to the variable corresponding to the group information, the mean and standard deviation of the variable corresponding to the group information, and the preset relationship.
  • Variables include continuous variables and discrete variables. Continuous variables include a first continuous variable and a second continuous variable. The preset relationship may be to subtract the mean of the variable and then divide by the standard deviation.
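The preset relationship described above (subtract the mean, then divide by the standard deviation) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `standardize`, the sample values, the use of the population standard deviation, and the handling of a constant column are all assumptions.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return (x - mean) / std for every value of one variable."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column cannot be scaled; map it to zeros (an assumption).
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical data: one first continuous variable and one 0/1 column
# produced by encoding a discrete variable.
attendance_days = [20, 22, 18, 20]
is_excellent = [1, 0, 0, 1]

standardized_attendance = standardize(attendance_days)
standardized_excellent = standardize(is_excellent)
```

Each variable is standardized independently, so the original columns and the encoded columns end up on a comparable scale before clustering.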
  • Step 210: Cluster the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.
  • the server may cluster the standardized variable corresponding to the group information to obtain the classification result corresponding to the group information. Specifically, the server performs distance measurement on the standardized variables corresponding to the group information, clusters the standardized variables according to the measured distances between the standardized variables, and obtains a variety of standardized variable types.
  • the groups corresponding to the group information are divided into various group types.
  • a classification result corresponding to corresponding group information is obtained according to multiple standardized variable types, and the classification result includes multiple group types.
  • Each standardized variable type corresponds to a type of group type.
  • the server receives the employee group classification task uploaded by the terminal.
  • the classification task carries the employee group ID.
  • the server obtains the employee ID of each employee based on the employee group ID, obtains employee performance information based on the employee ID, and refers to the performance information corresponding to all employee IDs as group information.
  • the server identifies the group information and obtains the first continuous variable and discrete variable corresponding to the group information.
  • the server performs continuous processing on the discrete variables corresponding to the group information to obtain a second continuous variable corresponding to the group information.
  • the server normalizes the first continuous variable and the second continuous variable to obtain the standardized variable corresponding to the group information.
  • the server clusters the standardized variables corresponding to the group information to obtain a variety of standardized variable types, thereby obtaining a classification result corresponding to the group information.
  • the classification result can be multiple employee types with different performance levels, or multiple employee types within the same performance level.
  • the server recognizes the first continuous variable corresponding to the group information and the discrete variable, and performs continuous processing on the discrete variable to obtain the second continuous variable corresponding to the group information.
  • the first continuous variable and the second continuous variable are standardized to obtain the standardized variable corresponding to the group information, and then the standardized variable is clustered to obtain the classification result corresponding to the group information.
  • the step of performing continuous processing on the discrete variables includes:
  • Step 302: Acquire multiple dimensions corresponding to discrete variables in the group information.
  • Step 304: Encode the multiple dimensions corresponding to the discrete variable to obtain a second continuous variable corresponding to the discrete variable.
  • the server may obtain multiple dimensions corresponding to the discrete variable in the group information.
  • the discrete variables can be training results, gender, etc. Training scores correspond to four dimensions (excellent, good, qualified, and unqualified), and gender corresponds to two dimensions (male and female).
  • the server may encode the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the discrete variable.
  • the encoding method may be one-hot encoding.
  • After the server encodes the discrete variable, the multiple dimensions corresponding to the discrete variable can be expressed as numerical values, so that the discrete variable is converted into continuous variables of multiple dimensions and the distances between them can be measured.
  • the performance information corresponding to all employee IDs can be called group information.
  • the discrete variable corresponding to the group information can be the training result, which corresponds to four dimensions (excellent, good, qualified, and unqualified). Encoding the training result yields four continuous variables (whether excellent, whether good, whether qualified, whether unqualified), each taking the value 0 or 1: "yes" corresponds to 1 and "no" corresponds to 0. If the training result is excellent, the continuous variable corresponding to the discrete variable is [1,0,0,0]. If the training result is qualified, the continuous variable corresponding to the discrete variable is [0,0,1,0].
  • the discrete variable corresponding to the group information can also be gender, which corresponds to two dimensions (male and female). Encoding gender yields two continuous variables (whether male, whether female), each taking the value 0 or 1: "yes" corresponds to 1 and "no" corresponds to 0. If the gender is male, the continuous variable corresponding to the discrete variable is [1, 0]. If the gender is female, the continuous variable corresponding to the discrete variable is [0, 1].
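The two encoding examples above can be reproduced with a short sketch. The helper names `one_hot` and `encode_employee`, the category lists, and the record layout are hypothetical and chosen only for illustration; the category order follows the order in which the text lists the dimensions.

```python
def one_hot(value, categories):
    """Encode one discrete value as 0/1 indicators, one per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Dimensions in the order given in the text.
TRAINING_RESULTS = ["excellent", "good", "qualified", "unqualified"]
GENDERS = ["male", "female"]


def encode_employee(record):
    """Concatenate the encodings of all discrete variables of one record."""
    return (one_hot(record["training_result"], TRAINING_RESULTS)
            + one_hot(record["gender"], GENDERS))


# Hypothetical employee record.
employee = {"training_result": "qualified", "gender": "female"}
encoded = encode_employee(employee)  # [0, 0, 1, 0, 0, 1]
```

Raising an error on an unknown category is a design choice of this sketch; the source does not say how unexpected values are handled.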
  • the server obtains the multiple dimensions corresponding to the discrete variables in the group information and encodes them, converting the discrete variables into continuous variables so that distances can be measured without weighting the distances between discrete variables and continuous variables, which effectively improves the accuracy of the classification results corresponding to the group information.
  • encoding multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information includes: encoding the multiple dimensions corresponding to the discrete variable to obtain the value of each dimension; and obtaining the second continuous variable corresponding to the group information according to the values of the multiple dimensions corresponding to the discrete variable.
  • the server encodes the discrete variable and can represent the multiple dimensions corresponding to the discrete variable with numerical values, thereby converting the discrete variable into continuous variables of multiple dimensions so that the distances between them can be measured.
  • standardizing the first continuous variable and the second continuous variable to obtain the standardized variable corresponding to the group information includes: calculating the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information; and obtaining the standardized variable corresponding to the group information according to the first continuous variable and the second continuous variable, the mean, the standard deviation, and the preset relationship.
  • the server obtains the standardized variable corresponding to the group information according to the mean value, standard deviation, and preset relationship of the first continuous variable and the second continuous variable of the group information.
  • the preset relationship may be to subtract the mean value of the first continuous variable and the second continuous variable, and then divide by the standard deviation.
  • by standardizing the first continuous variable and the second continuous variable, the server can obtain the weights of the first continuous variable and the second continuous variable among all the variables and stabilize the values of all the variables within an appropriate range.
  • for example, the range of the variables can be stabilized in [0,1], which prevents the classification results corresponding to the group information from being dominated by variables with large scales, which would produce unreasonable group classification results.
  • for example, the variables for clustering include age, income (in RMB), height (in meters), and weight (in kg). Since the scale of income is much larger than that of the other variables, standardizing all variables prevents the classification result corresponding to the group information from being affected only by income.
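The effect of scale on distance can be checked with a small sketch using the four variables named above. The three people and their values are invented for illustration, and the helper names `euclidean` and `standardize_rows` are assumptions; Euclidean distance is also an assumption, since the source does not name a distance measure.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize_rows(rows):
    """Standardize each column: subtract its mean, divide by its std."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - mu) / sigma if sigma else 0.0 for x, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]


# Hypothetical people: age, income (RMB), height (m), weight (kg).
people = [
    [25, 5000, 1.60, 50],  # person 0
    [55, 5200, 1.90, 95],  # person 1: similar income, otherwise different
    [26, 9000, 1.61, 51],  # person 2: different income, otherwise similar
]

raw_01 = euclidean(people[0], people[1])
raw_02 = euclidean(people[0], people[2])

scaled = standardize_rows(people)
std_01 = euclidean(scaled[0], scaled[1])
std_02 = euclidean(scaled[0], scaled[2])
```

On the raw values, person 0 appears closer to person 1 because the income gap dominates the distance. After standardization, person 0 is closer to person 2, since age, height, and weight now contribute on the same scale as income.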
  • the server calculates the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information, and obtains the standardized variables corresponding to the group information according to the first continuous variable and the second continuous variable, the mean, the standard deviation, and the preset relationship. This stabilizes the values of all variables within an appropriate range, effectively prevents the scales of the variables from affecting the group classification results, and further improves the accuracy of the classification results corresponding to the group information.
  • clustering the standardized first continuous variable and second continuous variable to obtain the classification result corresponding to the group information includes: performing distance measurement on the standardized variables corresponding to the group information; clustering the standardized variables according to the measured distances between them to obtain multiple standardized variable types; and obtaining the classification result corresponding to the group information according to the multiple standardized variable types.
  • the server can perform distance measurement on the first continuous variable and the second continuous variable, and arbitrarily select n variables among the first continuous variable and the second continuous variable as the initial clustering centers.
  • the remaining variables are classified according to the distance between each remaining variable and each clustering center. The closer a variable is to a clustering center, the higher the similarity between them, and each remaining variable is assigned to the group type represented by the nearest clustering center.
  • the cluster center of each new cluster is then calculated, that is, the average value of all variables in the new cluster. This process is repeated until the clustering results no longer change.
  • in this way, multiple standardized variable types are obtained, and thereby the classification results corresponding to the group information.
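The iterative procedure described above (pick n initial centers, assign each point to the nearest center, recompute each center as the mean of its cluster, repeat until nothing changes) matches the usual k-means scheme, and can be sketched as below. The function names, the Euclidean distance, the choice of the first n points as initial centers (the text allows any n), the iteration cap, and the sample points are assumptions of this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise average of a non-empty list of points."""
    return [sum(values) / len(points) for values in zip(*points)]


def k_means(points, n, max_iterations=100):
    """Cluster points into n groups; return (labels, centers)."""
    # The text selects n initial centers arbitrarily; the first n points
    # are used here so that the sketch is deterministic.
    centers = [list(p) for p in points[:n]]
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest cluster center.
        new_labels = [
            min(range(n), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # clustering result no longer changes
            break
        labels = new_labels
        # Recompute each center as the mean of the points assigned to it.
        for c in range(n):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = centroid(members)
    return labels, centers


# Hypothetical standardized variables for four group members.
points = [[-1.0, -1.1], [1.0, 0.9], [-0.9, -1.0], [1.1, 1.0]]
labels, centers = k_means(points, n=2)
```

Each distinct label plays the role of one standardized variable type, and therefore one group type, in the classification result.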
  • the server measures the distances between the standardized first continuous variable and second continuous variable and clusters the variables according to those distances to obtain multiple types of continuous variables. According to the multiple types of continuous variables obtained by clustering, the corresponding group classification results are obtained, which improves the accuracy of the group classification results.
  • the above method further includes: analyzing the classification results corresponding to the group information to obtain the distinguishing characteristics of multiple group types; and comparing the feature values of the distinguishing characteristics of each group type with those of the same distinguishing characteristics of the other group types in the classification results corresponding to the group information, to obtain the group characteristics of each group type.
  • the server clusters the standardized variables according to the measured distances between them to obtain multiple standardized variable types, and thereby the classification results corresponding to the group information, and calculates the first mean of the continuous variables in each group type in the classification results corresponding to the group information.
  • the server then calculates the second mean of the continuous variables in the other group types in the classification result corresponding to the group information.
  • the first mean value is compared with the second mean value, and the two are then combined according to a preset relationship to obtain the distinguishing characteristics of the various group types.
  • the preset relationship may be |first mean - second mean| / (first mean + second mean).
  • the server compares the characteristic values of the distinguishing characteristics of multiple group types, so as to obtain the group characteristics of each group type.
  • for example, the performance information corresponding to all the above employee IDs is called group information.
  • by clustering the first continuous variable and the second continuous variable corresponding to the group information, the classification result corresponding to the group information is obtained. The classification result is then analyzed, and the distinguishing characteristics of a certain employee type within the same performance level are found to be examination scores and work plan completion.
  • enterprises can train employees to improve their learning ability, thereby improving employee performance.
  • the server analyzes the group classification result to obtain the distinguishing characteristics of multiple group types, and compares the distinguishing characteristics of each group type with the same distinguishing characteristics of the other group types in the group classification result to obtain the group characteristics of each group type. The group characteristics of each group type can thus be obtained accurately and used to adapt to different business needs.
  • analyzing the classification result corresponding to the group information to obtain the distinguishing characteristics of multiple group types includes: selecting a target group type from the classification result corresponding to the group information; calculating the first mean of the continuous variables in the target group type according to the classification result corresponding to the group information; calculating the second mean of the continuous variables in the remaining group types according to the classification result corresponding to the group information; calculating the distinguishing characteristics of the target group type based on the first mean and the second mean; and repeating the steps of analyzing the classification result until the distinguishing characteristics of all group types in the classification result are obtained.
  • the server may select a target group type from the classification result corresponding to the group information, and calculate the first mean value of continuous variables in the target group type.
  • the server further calculates the second mean value of continuous variables in the remaining group types in the classification result corresponding to the group information.
  • the server compares the first mean value with the second mean value, and then calculates the first mean value and the second mean value according to a preset relationship, thereby obtaining the distinguishing characteristics of the target group type.
  • the preset relationship may be |first mean - second mean| / (first mean + second mean).
  • the server selects the next target group type from the classification results corresponding to the group information, and calculates the first mean value of continuous variables in the next target group type.
  • the server further calculates the second mean value of continuous variables in the remaining group types in the classification result corresponding to the group information.
  • the server compares the first mean value with the second mean value, and then calculates the first mean value and the second mean value according to a preset relationship, so as to obtain the distinguishing feature of the next target group type.
  • the server repeats the above steps of analyzing the classification results corresponding to the group information until the distinguishing characteristics of all group types are obtained.
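The per-group analysis loop can be sketched as follows, assuming the preset relationship is |first mean - second mean| / (first mean + second mean), where the first mean is taken over the target group type and the second mean over all remaining group types. The function name `distinguishing_scores`, the zero-denominator fallback, and the sample data are hypothetical.

```python
from statistics import mean


def distinguishing_scores(rows, labels, target):
    """Score every variable for one target group type.

    For each variable, m1 is its mean inside the target group and m2 its
    mean over all remaining groups; the score is |m1 - m2| / (m1 + m2).
    """
    inside = [row for row, label in zip(rows, labels) if label == target]
    outside = [row for row, label in zip(rows, labels) if label != target]
    scores = []
    for col_in, col_out in zip(zip(*inside), zip(*outside)):
        m1, m2 = mean(col_in), mean(col_out)
        denominator = m1 + m2
        # Fallback for a zero denominator is an assumption of this sketch.
        scores.append(abs(m1 - m2) / denominator if denominator else 0.0)
    return scores


# Hypothetical variables per employee: examination score, attendance days.
rows = [[90, 20], [85, 21], [60, 20], [55, 21]]
labels = [0, 0, 1, 1]  # group type assigned to each employee by clustering

# Repeat the analysis for every group type in the classification result.
features = {
    group: distinguishing_scores(rows, labels, group)
    for group in sorted(set(labels))
}
```

In this invented data set the examination score separates the two group types while attendance does not, so only the first variable receives a non-zero score. Non-negative raw values are used here because the ratio is not well behaved when the two means sum to zero or a negative number.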
  • although the steps in the flowcharts of FIGS. 2 to 3 are displayed sequentially as indicated by the arrows, the steps are not necessarily executed in the order indicated by the arrows. Unless clearly stated herein, the execution of these steps is not strictly limited in order, and the steps can be executed in other orders. Moreover, at least some of the steps in FIGS. 2 to 3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time, but may be executed at different times. Their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
  • a group information classification device including: a communication module 402, a variable identification module 404, a variable processing module 406, and a clustering module 408, wherein:
  • the communication module 402 is used to receive a classification task, and the classification task carries a group identifier;
  • the variable identification module 404 is used to obtain group information according to the group identification, and identify the first continuous variable and the discrete variable corresponding to the group information;
  • the variable processing module 406 is used to perform continuous processing on the discrete variables to obtain the second continuous variable corresponding to the group information, and to standardize the first continuous variable and the second continuous variable to obtain the standardized variable corresponding to the group information;
  • the clustering module 408 is used to cluster the standardized variables corresponding to the group information to obtain the classification result corresponding to the group information.
  • the variable processing module 406 is used to obtain multiple dimensions corresponding to the discrete variables in the group information, and to encode the multiple dimensions corresponding to the discrete variables to obtain the second continuous variable corresponding to the group information.
  • the variable processing module 406 is also used to encode multiple dimensions corresponding to the discrete variable to obtain the value of each dimension, and to obtain the second continuous variable corresponding to the group information according to the values of the multiple dimensions corresponding to the discrete variable.
  • the variable processing module 406 is further used to calculate the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information, and to obtain the standardized variables corresponding to the group information according to the first continuous variable and the second continuous variable, the mean, the standard deviation, and the preset relationship.
  • the clustering module 408 is used to measure the distances between the standardized variables corresponding to the group information; cluster the standardized variables according to the measured distances between them to obtain multiple standardized variable types; and obtain the classification result corresponding to the group information according to the multiple standardized variable types.
  • the above-mentioned device further includes an analysis module, which is used to analyze the classification results corresponding to the group information to obtain the distinguishing characteristics of multiple group types, and to compare the feature values of the distinguishing characteristics of each group type with those of the same distinguishing characteristics of the other group types in the classification result corresponding to the group information, to obtain the group characteristics of each group type.
  • the analysis module is also used to select the target group type from the classification result corresponding to the group information; calculate the first mean of the continuous variables in the target group type according to the classification result corresponding to the group information; calculate the second mean of the continuous variables in the remaining group types according to the classification result corresponding to the group information; calculate the distinguishing characteristics of the target group type according to the first mean and the second mean; and repeat the steps of analyzing the classification result corresponding to the group information until the distinguishing characteristics of all group types in the classification result are obtained.
  • Each module in the above group information classification device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above modules may be embedded in, or independent of, the processor in the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 5.
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store group information.
  • the network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a group information classification method.
  • FIG. 5 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device which includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps in the foregoing method embodiments.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps in the foregoing method embodiments.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM random access memory
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDRSDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM synchronous chain (Synchlink) DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM
  • RDRAM Rambus dynamic RAM

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

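A group information classification method, comprising: receiving a classification task that carries a group identifier; obtaining group information according to the group identifier, and identifying a first continuous variable and a discrete variable corresponding to the group information; converting the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information; standardizing the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and clustering the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.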

Description

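Group information classification method, device, computer device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 7, 2019, with application number 2019100126040 and entitled "Group information classification method, device, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to a group information classification method, device, computer device and storage medium.
Background
To meet different business needs, enterprises need to divide groups into types in order to understand the characteristics of different groups. The information involved in this division includes continuous variables and discrete variables corresponding to the group information. In the traditional approach, the pairwise distances between continuous variables and the pairwise distances between discrete variables are measured, and the distances measured for the two types of variables are combined using weights to obtain the final clustering result and hence the group classification result. However, the weights cannot be calculated accurately, and different choices of weights make the group classification result inaccurate.
Summary
According to various embodiments disclosed in this application, a group information classification method, device, computer device and storage medium capable of improving the accuracy of the classification result corresponding to group information are provided.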
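A group information classification method, comprising:
receiving a classification task, the classification task carrying a group identifier;
obtaining group information according to the group identifier, and identifying a first continuous variable and a discrete variable corresponding to the group information;
converting the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information;
standardizing the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
clustering the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.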
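A group information classification device, comprising:
a communication module, configured to receive a classification task, the classification task carrying a group identifier;
a variable identification module, configured to obtain group information according to the group identifier and identify a first continuous variable and a discrete variable corresponding to the group information;
a variable processing module, configured to convert the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information, and to standardize the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
a clustering module, configured to cluster the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.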
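A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
receiving a classification task, the classification task carrying a group identifier;
obtaining group information according to the group identifier, and identifying a first continuous variable and a discrete variable corresponding to the group information;
converting the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information;
standardizing the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
clustering the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.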
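One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving a classification task, the classification task carrying a group identifier;
obtaining group information according to the group identifier, and identifying a first continuous variable and a discrete variable corresponding to the group information;
converting the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information;
standardizing the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
clustering the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.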
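Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of the group information classification method in one or more embodiments.
FIG. 2 is a schematic flowchart of the group information classification method in one or more embodiments.
FIG. 3 is a schematic flowchart of the step of converting discrete variables into continuous form in one or more embodiments.
FIG. 4 is a block diagram of the group information classification device in one or more embodiments.
FIG. 5 is a block diagram of the computer device in one or more embodiments.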
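Detailed Description
To make the technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The group information classification method provided by this application can be applied in the application scenario shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. The server 104 receives the classification task uploaded by the terminal 102; the classification task carries a group identifier. The server 104 obtains group information according to the group identifier and identifies the first continuous variable and the discrete variable corresponding to the group information. The server 104 converts the discrete variable into continuous form to obtain the second continuous variable corresponding to the group information. The server 104 standardizes the first continuous variable and the second continuous variable to obtain the standardized variables corresponding to the group information. The server 104 clusters the standardized variables to obtain the classification result corresponding to the group information. Distances involving discrete variables can thus be measured without setting weights, which avoids the influence of distance weighting between different types of variables on the group classification result and improves the accuracy of the classification result. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.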
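In one embodiment, as shown in FIG. 2, a group information classification method is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
Step 202: receive a classification task, the classification task carrying a group identifier.
Step 204: obtain group information according to the group identifier, and identify the first continuous variable and the discrete variable corresponding to the group information.
Step 206: convert the discrete variable into continuous form to obtain the second continuous variable corresponding to the group information.
The server receives the classification task uploaded by the terminal and parses it to obtain the group identifier it carries. The server obtains the corresponding group information according to the group identifier. The group information may be information about a certain number of people within a preset scope, for example, employee performance information for October 2018. The server identifies the variables corresponding to the group information. The variables include continuous variables and discrete variables. The continuous variables include first continuous variables and second continuous variables. A first continuous variable can be identified directly from the group information. A second continuous variable can only be obtained by converting a discrete variable. A first continuous variable may be information represented by a numerical value. A discrete variable may be information represented by multiple dimensions.
For example, the first continuous variables may be the number of courses attended, the number of attendance days, years of service, and so on. The discrete variables may be training grade, gender, and so on. After identifying the variables corresponding to the group information, the server may convert the discrete variables among them into continuous form: the server obtains the multiple dimensions corresponding to each discrete variable from the group information and encodes those dimensions to obtain the second continuous variables corresponding to the discrete variable. At this point, all variables corresponding to the group information are continuous variables.
For example, the server receives an employee group classification task uploaded by the terminal and parses it to obtain the employee group identifier it carries. The server obtains the employee identifier of each employee according to the employee group identifier and obtains employee performance information according to the employee identifiers. The performance information corresponding to all employee identifiers may be called the group information. The server identifies the group information to obtain the corresponding first continuous variables and discrete variables. The first continuous variables may be the number of courses attended, the number of attendance days, years of service, and so on. The discrete variables may be training grade, gender, and so on. The server converts the training grade into continuous form to obtain the corresponding continuous variables (whether excellent, whether good, whether pass, whether fail). The server converts gender into continuous form to obtain the corresponding continuous variables (whether male, whether female).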
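Step 208: standardize the first continuous variable and the second continuous variable to obtain the standardized variables corresponding to the group information.
After converting the discrete variables into continuous form and obtaining the second continuous variables corresponding to the group information, the server may standardize the first continuous variables and the second continuous variables. Specifically, the server calculates the mean and standard deviation of the variables corresponding to the group information, and obtains the standardized variables from the variables, their mean and standard deviation, and a preset relationship. The variables include continuous variables and discrete variables. The continuous variables include first continuous variables and second continuous variables. The preset relationship may be to subtract the mean from the variable and then divide by the standard deviation.
Step 210: cluster the standardized variables corresponding to the group information to obtain the classification result corresponding to the group information.
After standardizing the first continuous variables and the second continuous variables and obtaining the standardized variables corresponding to the group information, the server may cluster the standardized variables to obtain the classification result corresponding to the group information. Specifically, the server measures distances between the standardized variables and clusters the standardized variables according to the measured distances to obtain multiple standardized variable types. The group corresponding to the group information is thereby divided into multiple group types. The classification result corresponding to the group information is obtained from the multiple standardized variable types; the classification result includes multiple group types, and each standardized variable type corresponds to one group type.
For example, to understand employee performance, an enterprise needs to divide its employees into types in order to understand the characteristics of employee groups at different performance levels and within the same performance level. The server receives an employee group classification task uploaded by the terminal; the task carries an employee group identifier. The server obtains the employee identifier of each employee according to the employee group identifier and obtains employee performance information according to the employee identifiers. The performance information corresponding to all employee identifiers is called the group information. The server identifies the group information to obtain the corresponding first continuous variables and discrete variables. The server converts the discrete variables into continuous form to obtain the second continuous variables corresponding to the group information. The server standardizes the first continuous variables and the second continuous variables to obtain the standardized variables corresponding to the group information. The server clusters the standardized variables to obtain multiple standardized variable types and hence the classification result corresponding to the group information. The classification result may be multiple employee types at different performance levels, or multiple employee types within the same performance level.
In this embodiment, the server identifies the first continuous variables and discrete variables corresponding to the group information and converts the discrete variables into continuous form to obtain the second continuous variables. It standardizes the first continuous variables and the second continuous variables to obtain the standardized variables, and then clusters the standardized variables to obtain the classification result corresponding to the group information. Compared with the traditional approach, distances involving the discrete variables of the group information can be measured without setting weights, which avoids the influence of distance weighting between different types of variables on the group classification result and effectively improves the accuracy of the classification result.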
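In one embodiment, as shown in FIG. 3, the step of converting the discrete variable into continuous form includes:
Step 302: obtain the multiple dimensions corresponding to the discrete variable from the group information.
Step 304: encode the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the discrete variable.
After identifying the first continuous variables and discrete variables corresponding to the group information, the server may obtain the multiple dimensions corresponding to each discrete variable from the group information. For example, the discrete variables may be training grade, gender, and so on. Training grade corresponds to four dimensions (excellent, good, pass, fail), and gender corresponds to two dimensions (male, female).
After obtaining the multiple dimensions corresponding to a discrete variable, the server may encode those dimensions to obtain the second continuous variables corresponding to the discrete variable. The encoding may be one-hot encoding. After encoding, each dimension of the discrete variable is represented by a numerical value, so the discrete variable is converted into continuous variables of multiple dimensions, and distances between discrete variables can be measured without setting weights.
For example, the performance information corresponding to all employee identifiers may be called the group information. A discrete variable of this group information may be the training grade, which corresponds to four dimensions (excellent, good, pass, fail). Encoding the training grade yields four continuous variables (whether excellent, whether good, whether pass, whether fail), each taking the value 0 or 1, where "yes" corresponds to 1 and "no" corresponds to 0. If the training grade is excellent, the continuous variables corresponding to the discrete variable are [1,0,0,0]. If the training grade is pass, they are [0,0,1,0].
As another example, a discrete variable of the group information may be gender, which corresponds to two dimensions (male, female). Encoding gender yields two continuous variables (whether male, whether female), each taking the value 0 or 1, where "yes" corresponds to 1 and "no" corresponds to 0. If the gender is male, the continuous variables corresponding to the discrete variable are [1,0]. If the gender is female, they are [0,1].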
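The one-hot encoding of steps 302 and 304 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `one_hot` and the English category labels are assumptions made for the example and do not come from the application.

```python
def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector with one slot per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical labels standing in for the dimensions named in the text.
GRADES = ["excellent", "good", "pass", "fail"]
GENDERS = ["male", "female"]

print(one_hot("excellent", GRADES))  # [1, 0, 0, 0]
print(one_hot("pass", GRADES))       # [0, 0, 1, 0]
print(one_hot("female", GENDERS))    # [0, 1]
```

The order of the category list fixes the position of each dimension, so the same list should be reused for every record in the group.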
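In this embodiment, the server obtains the multiple dimensions corresponding to the discrete variable from the group information and encodes them, so that the discrete variable is converted into continuous variables and distances can be measured. The distances of the two variable types, discrete and continuous, no longer need to be combined using weights, which effectively improves the accuracy of the classification result corresponding to the group information.
In one embodiment, encoding the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information includes: encoding the multiple dimensions corresponding to the discrete variable to obtain a numerical value for each dimension; and obtaining the second continuous variable corresponding to the group information from the numerical values of the multiple dimensions.
By encoding the discrete variable, the server represents each of its dimensions by a numerical value, so the discrete variable is converted into continuous variables of multiple dimensions, and distances between discrete variables can be measured without setting weights.
In one embodiment, standardizing the first continuous variable and the second continuous variable to obtain the standardized variables corresponding to the group information includes: calculating the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information; and obtaining the standardized variables from the first continuous variable and the second continuous variable, the mean, the standard deviation, and a preset relationship.
The server obtains the standardized variables corresponding to the group information from the first and second continuous variables, their mean and standard deviation, and the preset relationship. The preset relationship may be to subtract the mean from the first and second continuous variables and then divide by the standard deviation. By standardizing the first and second continuous variables, the server can obtain the weight these variables carry among all variables and keep the values of all variables within a suitable range. The variable ranges may, for example, be stabilized within [0,1]; note that subtracting the mean and dividing by the standard deviation by itself yields zero mean and unit standard deviation rather than a bounded interval, so a [0,1] range implies a different preset relationship such as min-max scaling. Standardization prevents the classification result from being dominated by variables with large scales, which would make the group classification result unreasonable. For example, the variables to be clustered may include age, income (in RMB), height (in meters), and weight (in kilograms). Because the scale of income is much larger than that of the other variables, standardizing all variables prevents the classification result from being influenced by income alone.
In this embodiment, the server calculates the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information, and obtains the standardized variables from these variables, the mean, the standard deviation, and the preset relationship. The values of all variables are kept within a suitable range, which effectively prevents the scale of a variable from affecting the group classification result and further improves the accuracy of the classification result.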
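The preset relationship described above (subtract the mean, then divide by the standard deviation) can be illustrated with a small Python sketch. The function name `standardize`, the sample values, and the choice of the population standard deviation are assumptions for the example; the application does not say whether the population or sample standard deviation is meant.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return (x - mean) / std for each value; a constant column maps to 0.0."""
    mu = mean(column)
    sigma = pstdev(column)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical sample values: income in RMB and height in meters.
income = [4000, 6000, 8000]
height = [1.60, 1.70, 1.80]

print(standardize(income))
print(standardize(height))
```

Although the raw income values are thousands of times larger than the heights, both columns end up on the same scale after standardization, so neither dominates a distance computed over them.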
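In one embodiment, clustering the standardized first continuous variables and second continuous variables to obtain the classification result corresponding to the group information includes: measuring distances between the standardized variables corresponding to the group information; clustering the standardized variables according to the measured distances to obtain multiple standardized variable types; and obtaining the classification result corresponding to the group information from the multiple standardized variable types.
After standardizing the first and second continuous variables, the server may measure distances between them. The server arbitrarily selects n variables from the first and second continuous variables as initial cluster centers. Each remaining variable is classified according to its distance to each cluster center: the closer a variable is to a cluster center, the higher the similarity between them, so the remaining variable is assigned to the group type represented by the nearest cluster center. Each time a variable is added, the center of the resulting new cluster is recalculated, that is, the mean of all variables in the new cluster is computed. This process is repeated until the clustering result no longer changes. Clustering yields multiple standardized variable types and hence the classification result corresponding to the group information.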
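The clustering procedure described here resembles k-means. The following Python sketch is a hedged illustration, not the application's implementation: it assumes Euclidean distance, takes the initial centers as an explicit argument instead of choosing them arbitrarily, and recomputes the centers once per pass over the data rather than after every single assignment. The names `kmeans`, `points`, and `initial_centers` are hypothetical.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Assign each point to its nearest center and update the centers
    until the assignment stops changing (or max_iter is reached)."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # clustering result no longer changes
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


# Hypothetical standardized records with two variables each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, centers = kmeans(points, initial_centers=[points[0], points[2]])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[0.0, 0.5], [5.0, 5.5]]
```

Each label plays the role of one standardized variable type, and therefore of one group type in the classification result.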
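In this embodiment, the server measures distances between the standardized first continuous variables and second continuous variables and clusters the variables according to those distances to obtain multiple types of continuous variables. The corresponding group classification result is obtained from the clustered variable types, which improves the accuracy of the group classification result.
In one embodiment, the above method further includes: analyzing the classification result corresponding to the group information to obtain the distinguishing features of multiple group types; and comparing the feature value of each group type's distinguishing feature with that of the same distinguishing feature in the other group types of the classification result, to obtain the group characteristics of each group type.
The server clusters the standardized variables according to the measured distances between them to obtain multiple standardized variable types and hence the classification result corresponding to the group information. For each group type in the classification result, it calculates the first mean of the continuous variables in that group type and the second mean of the continuous variables in the other group types. The first mean is compared with the second mean, and the two are then combined according to a preset relationship to obtain the distinguishing features of the group types. The preset relationship may be |first mean - second mean| / (first mean + second mean). The server compares the feature values of the distinguishing features across the group types to obtain the group characteristics of each group type.
For example, the performance information corresponding to all employee identifiers above is called the group information. Clustering the first and second continuous variables corresponding to the group information yields the classification result, and analyzing the classification result shows that the distinguishing features of a certain employee type within a performance level are exam score and the amount of planned work completed. First, the exam scores of this employee type are compared with those of the other types. If they are higher than those of the other employee types at the same level, this employee type has relatively strong learning ability, and its characteristic can be concluded to be "good at learning". To improve employee performance, the enterprise can train employees to improve their learning ability. Next, the amount of planned work completed by this employee type is compared with that of the other types. If it is higher than that of the other employee types at the same level, the characteristics of this employee type are clear goals and strong self-discipline: planned work is completed within the specified time. To improve employee performance, the enterprise can likewise emphasize goal setting and self-discipline.
In this embodiment, the server analyzes the group classification result to obtain the distinguishing features of multiple group types, and compares the feature value of each group type's distinguishing feature with that of the same distinguishing feature in the other group types, to obtain the group characteristics of each group type. The group characteristics of each group type can thus be obtained accurately and used to meet different business needs.
In one embodiment, analyzing the classification result corresponding to the group information to obtain the distinguishing features of multiple group types includes: selecting a target group type from the classification result corresponding to the group information; calculating the first mean of the continuous variables in the target group type according to the classification result; calculating the second mean of the continuous variables in the remaining group types according to the classification result; calculating the distinguishing features of the target group type from the first mean and the second mean; and repeating the steps of analyzing the classification result until the distinguishing features of all group types in the classification result are obtained.
The server may select a target group type from the classification result corresponding to the group information and calculate the first mean of the continuous variables in that target group type. The server then calculates the second mean of the continuous variables in the remaining group types of the classification result. The server compares the first mean with the second mean and combines them according to a preset relationship to obtain the distinguishing features of the target group type. The preset relationship may be |first mean - second mean| / (first mean + second mean). The server then selects the next target group type from the classification result and repeats the same calculation of the first mean, the second mean, and the preset relationship to obtain the distinguishing features of that group type. The server repeats these analysis steps until the distinguishing features of all group types are obtained. By calculating the first mean for the target group type and the second mean for the remaining group types, the server can accurately calculate the distinguishing features of the target group type, which facilitates the subsequent analysis of the group characteristics of each group type.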
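The preset relationship |first mean - second mean| / (first mean + second mean) can be sketched as follows. This is an illustrative sketch under stated assumptions: the variable names, sample records, and the function `distinguishing_scores` are hypothetical, the score is computed on raw non-negative values (with standardized values the denominator can be zero or negative), and both the target type and the remaining types are assumed to be non-empty.

```python
from statistics import mean


def distinguishing_scores(rows, labels, target):
    """Score every continuous variable of the target group type with
    |m1 - m2| / (m1 + m2), where m1 is the mean inside the target type
    and m2 the mean over all remaining types."""
    inside = [r for r, lab in zip(rows, labels) if lab == target]
    outside = [r for r, lab in zip(rows, labels) if lab != target]
    scores = {}
    for name in rows[0]:
        m1 = mean(r[name] for r in inside)
        m2 = mean(r[name] for r in outside)
        denom = m1 + m2
        scores[name] = abs(m1 - m2) / denom if denom else 0.0
    return scores


# Hypothetical employee records and cluster labels.
rows = [
    {"exam_score": 90, "attendance_days": 20},
    {"exam_score": 80, "attendance_days": 20},
    {"exam_score": 60, "attendance_days": 21},
    {"exam_score": 50, "attendance_days": 19},
]
labels = [0, 0, 1, 1]

# Repeat the analysis for every group type in the classification result.
all_scores = {t: distinguishing_scores(rows, labels, t) for t in sorted(set(labels))}
print(all_scores[0])
```

In this sample, `exam_score` receives a clearly larger score than `attendance_days` for group type 0, so it would be the candidate distinguishing feature to compare across types.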
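It should be understood that although the steps in the flowcharts of FIGS. 2 to 3 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2 to 3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed sequentially, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.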
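In one embodiment, as shown in FIG. 4, a group information classification device is provided, including a communication module 402, a variable identification module 404, a variable processing module 406, and a clustering module 408, where:
the communication module 402 is configured to receive a classification task, the classification task carrying a group identifier;
the variable identification module 404 is configured to obtain group information according to the group identifier and identify the first continuous variable and the discrete variable corresponding to the group information;
the variable processing module 406 is configured to convert the discrete variable into continuous form to obtain the second continuous variable corresponding to the group information, and to standardize the first continuous variable and the second continuous variable to obtain the standardized variables corresponding to the group information;
the clustering module 408 is configured to cluster the standardized variables corresponding to the group information to obtain the classification result corresponding to the group information.
In one embodiment, the variable processing module 406 is configured to obtain the multiple dimensions corresponding to the discrete variable from the group information, and to encode the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information.
In one embodiment, the variable processing module 406 is further configured to encode the multiple dimensions corresponding to the discrete variable to obtain a numerical value for each dimension, and to obtain the second continuous variable corresponding to the group information from the numerical values of the multiple dimensions.
In one embodiment, the variable processing module 406 is further configured to calculate the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information, and to obtain the standardized variables from the first continuous variable and the second continuous variable, the mean, the standard deviation, and a preset relationship.
In one embodiment, the clustering module 408 is configured to measure distances between the standardized variables corresponding to the group information; cluster the standardized variables according to the measured distances to obtain multiple standardized variable types; and obtain the classification result corresponding to the group information from the multiple standardized variable types.
In one embodiment, the above device further includes an analysis module, which is configured to analyze the classification result corresponding to the group information to obtain the distinguishing features of multiple group types, and to compare the feature value of each group type's distinguishing feature with that of the same distinguishing feature in the other group types of the classification result, to obtain the group characteristics of each group type.
In one embodiment, the analysis module is further configured to select a target group type from the classification result corresponding to the group information; calculate the first mean of the continuous variables in the target group type according to the classification result; calculate the second mean of the continuous variables in the remaining group types according to the classification result; calculate the distinguishing features of the target group type from the first mean and the second mean; and repeat the steps of analyzing the classification result until the distinguishing features of all group types in the classification result are obtained.
For specific limitations of the group information classification device, refer to the limitations of the group information classification method above, which are not repeated here. Each module in the above group information classification device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.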
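In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store group information. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a group information classification method.
Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps in the foregoing method embodiments.
One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the foregoing method embodiments.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.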

Claims (20)

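  1. A group information classification method, comprising:
    receiving a classification task, the classification task carrying a group identifier;
    obtaining group information according to the group identifier, and identifying a first continuous variable and a discrete variable corresponding to the group information;
    converting the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information;
    standardizing the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
    clustering the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.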
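  2. The method according to claim 1, wherein converting the discrete variable into continuous form comprises:
    obtaining multiple dimensions corresponding to the discrete variable from the group information; and
    encoding the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information.
  3. The method according to claim 2, wherein encoding the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information comprises:
    encoding the multiple dimensions corresponding to the discrete variable to obtain a numerical value for each dimension; and
    obtaining the second continuous variable corresponding to the group information from the numerical values of the multiple dimensions corresponding to the discrete variable.
  4. The method according to claim 1, wherein standardizing the first continuous variable and the second continuous variable to obtain the standardized variables corresponding to the group information comprises:
    calculating the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information; and
    obtaining the standardized variables corresponding to the group information from the first continuous variable and the second continuous variable, the mean, the standard deviation, and a preset relationship.
  5. The method according to claim 1, wherein clustering the standardized variables corresponding to the group information to obtain the classification result corresponding to the group information comprises:
    measuring distances between the standardized variables corresponding to the group information;
    clustering the standardized variables according to the measured distances between them to obtain multiple standardized variable types; and
    obtaining the classification result corresponding to the group information from the multiple standardized variable types.
  6. The method according to any one of claims 1-5, wherein the method further comprises:
    analyzing the classification result corresponding to the group information to obtain distinguishing features of multiple group types; and
    comparing the feature value of each group type's distinguishing feature with that of the same distinguishing feature in the other group types of the classification result corresponding to the group information, to obtain group characteristics of each group type.
  7. The method according to claim 5, wherein analyzing the classification result corresponding to the group information to obtain the distinguishing features of multiple group types comprises:
    selecting a target group type from the classification result corresponding to the group information;
    calculating a first mean of the continuous variables in the target group type according to the classification result corresponding to the group information;
    calculating a second mean of the continuous variables in the remaining group types according to the classification result corresponding to the group information;
    calculating the distinguishing features of the target group type from the first mean and the second mean;
    repeating the steps of analyzing the classification result corresponding to the group information until the distinguishing features of all group types in the classification result are obtained.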
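  8. A group information classification device, comprising:
    a communication module, configured to receive a classification task, the classification task carrying a group identifier;
    a variable identification module, configured to obtain group information according to the group identifier and identify a first continuous variable and a discrete variable corresponding to the group information;
    a variable processing module, configured to convert the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information, and to standardize the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
    a clustering module, configured to cluster the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.
  9. The device according to claim 8, wherein the variable processing module is configured to obtain multiple dimensions corresponding to the discrete variable from the group information; and to encode the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information.
  10. The device according to claim 9, wherein the variable processing module is further configured to encode the multiple dimensions corresponding to the discrete variable to obtain a numerical value for each dimension; and to obtain the second continuous variable corresponding to the group information from the numerical values of the multiple dimensions corresponding to the discrete variable.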
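  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    receiving a classification task, the classification task carrying a group identifier;
    obtaining group information according to the group identifier, and identifying a first continuous variable and a discrete variable corresponding to the group information;
    converting the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information;
    standardizing the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
    clustering the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.
  12. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps: obtaining multiple dimensions corresponding to the discrete variable from the group information; and encoding the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information.
  13. The computer device according to claim 12, wherein the processor, when executing the computer-readable instructions, further performs the following steps: encoding the multiple dimensions corresponding to the discrete variable to obtain a numerical value for each dimension; and obtaining the second continuous variable corresponding to the group information from the numerical values of the multiple dimensions corresponding to the discrete variable.
  14. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps: calculating the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information; and obtaining the standardized variables corresponding to the group information from the first continuous variable and the second continuous variable, the mean, the standard deviation, and a preset relationship.
  15. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps: measuring distances between the standardized variables corresponding to the group information; clustering the standardized variables according to the measured distances between them to obtain multiple standardized variable types; and obtaining the classification result corresponding to the group information from the multiple standardized variable types.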
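  16. One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    receiving a classification task, the classification task carrying a group identifier;
    obtaining group information according to the group identifier, and identifying a first continuous variable and a discrete variable corresponding to the group information;
    converting the discrete variable into continuous form to obtain a second continuous variable corresponding to the group information;
    standardizing the first continuous variable and the second continuous variable to obtain standardized variables corresponding to the group information; and
    clustering the standardized variables corresponding to the group information to obtain a classification result corresponding to the group information.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: obtaining multiple dimensions corresponding to the discrete variable from the group information; and encoding the multiple dimensions corresponding to the discrete variable to obtain the second continuous variable corresponding to the group information.
  18. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: encoding the multiple dimensions corresponding to the discrete variable to obtain a numerical value for each dimension; and obtaining the second continuous variable corresponding to the group information from the numerical values of the multiple dimensions corresponding to the discrete variable.
  19. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: calculating the mean and standard deviation of the first continuous variable and the second continuous variable corresponding to the group information; and obtaining the standardized variables corresponding to the group information from the first continuous variable and the second continuous variable, the mean, the standard deviation, and a preset relationship.
  20. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: measuring distances between the standardized variables corresponding to the group information; clustering the standardized variables according to the measured distances between them to obtain multiple standardized variable types; and obtaining the classification result corresponding to the group information from the multiple standardized variable types.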
PCT/CN2019/117529 2019-01-07 2019-11-12 Group information classification method, device, computer device and storage medium WO2020143305A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910012604.0 2019-01-07
CN201910012604.0A CN109858525A (zh) 2019-01-07 2019-01-07 Group information classification method, device, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020143305A1 true WO2020143305A1 (zh) 2020-07-16

Family

ID=66894092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117529 WO2020143305A1 (zh) 2019-01-07 2019-11-12 Group information classification method, device, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109858525A (zh)
WO (1) WO2020143305A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765467A (zh) * 2021-01-19 2021-05-07 北京嘀嘀无限科技发展有限公司 Service recommendation method, device, electronic device and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
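CN109858525A (zh) * 2019-01-07 2019-06-07 平安科技(深圳)有限公司 Group information classification method, device, computer device and storage medium
CN110399430A (zh) * 2019-06-14 2019-11-01 平安科技(深圳)有限公司 User feature determination method, device, equipment and computer-readable storage medium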

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
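CN103236025A (zh) * 2013-04-25 2013-08-07 国家电网公司 Data consolidation processing method based on electric power user data
US20140214492A1 (en) * 2004-05-28 2014-07-31 Vendavo, Inc. Systems and methods for price point analysis
CN105719661A (zh) * 2016-01-29 2016-06-29 西安交通大学 Method for automatically discriminating the playing sound quality of stringed instruments
CN108549973A (zh) * 2018-03-22 2018-09-18 中国平安人寿保险股份有限公司 Method, device, storage medium and terminal for constructing and evaluating a recognition model
CN109858525A (zh) * 2019-01-07 2019-06-07 平安科技(深圳)有限公司 Group information classification method, device, computer device and storage medium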

Also Published As

Publication number Publication date
CN109858525A (zh) 2019-06-07

Similar Documents

Publication Publication Date Title
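WO2020248843A1 (zh) Big data-based profile analysis method, device, computer device and storage medium
WO2020143305A1 (zh) Group information classification method, device, computer device and storage medium
CN109272396B (zh) Customer risk early-warning method, device, computer device and medium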
Unal Defining an optimal cut‐point value in ROC analysis: an alternative approach
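WO2020077895A1 (zh) Signing intention determination method, device, computer device and storage medium
WO2021027317A1 (zh) Relationship network-based attribute information processing method, device, computer device and storage medium
WO2020119030A1 (zh) Model training method, device, equipment and storage medium for answering questions
WO2021120677A1 (zh) Warehousing model training method, device, computer device and storage medium
WO2020015089A1 (zh) Identity information risk assessment method, device, computer device and storage medium
WO2020253357A1 (zh) Data product recommendation method, device, computer device and storage medium
WO2019041439A1 (zh) Underwriting difficulty prediction method, device, computer device and storage medium
WO2020057021A1 (zh) Data table processing method, device, computer device and storage medium
WO2020177366A1 (zh) Time series data-based data processing method, device and computer device
WO2022252454A1 (zh) Abnormal data detection method, device, computer device and readable storage medium
WO2023050534A1 (zh) Method, device, equipment and storage medium for predicting energy consumption of rail transit station equipment
WO2020034801A1 (zh) Medical feature screening method, device, computer device and storage medium
WO2020056968A1 (zh) Data denoising method, device, computer device and storage medium
CN110458601B (zh) Resource data processing method, device, computer device and storage medium
CN110610431A (zh) Big data-based intelligent claim settlement method and intelligent claim settlement system
WO2021135063A1 (zh) Pathological data analysis method, device, equipment and storage medium
CN113707296B (zh) Medical plan data processing method, device, equipment and storage medium
CN108009740B (zh) Intelligent fine identification system and method for tobacco flavors and fragrances
CN116842330B (zh) Health care information processing method and device capable of comparing historical records
WO2022022042A1 (zh) Monitoring data reporting method, device, computer device and storage medium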
US11367311B2 (en) Face recognition method and apparatus, server, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19909018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 31.08.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19909018

Country of ref document: EP

Kind code of ref document: A1