CN110852392A - User grouping method, device, equipment and medium - Google Patents

Info

Publication number
CN110852392A
Authority
CN
China
Prior art keywords
attribute
user
user group
data
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911107058.5A
Other languages
Chinese (zh)
Inventor
邓杨
高宏华
王杰明
傅立霖
张佳煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN201911107058.5A
Publication of CN110852392A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a user grouping method, device, equipment and medium. The method comprises the following steps: generating a preset number of user groups according to the existing user data; determining the central numerical attribute features of each user group according to the numerical attribute data of the users in the user group; determining the central classification attribute features of each user group according to the classification attribute data of the users in the user group; and matching the numerical attribute data of an existing user with the central numerical attribute features of the user groups, matching the classification attribute data of the existing user with the central classification attribute features of the user groups, and selecting the user group to which the existing user belongs from the generated user groups. Because the central numerical attribute features and the central classification attribute features of a user group are determined from the numerical attribute data and the classification attribute data of the users in that group, the grouping result obtained by matching against these central attribute features is more accurate and more stable.

Description

User grouping method, device, equipment and medium
Technical Field
The embodiment of the invention relates to the technical field of data mining, in particular to a user grouping method, a device, equipment and a medium.
Background
At present, domestic insurance companies have tried to use big data technology for data analysis and decision support. Specifically, users are grouped by a data mining method to find user groups with different requirements and characteristics, personalized and differentiated services and products are provided for each user group, and more potentially valuable customers and users at risk of churn are identified, so as to maximize user satisfaction and loyalty, expand the enterprise's market share and profit, and consolidate its competitive advantage.
Existing user clustering methods include the k-means, k-modes and k-prototypes clustering methods, but the user clustering results they produce are inaccurate, often deviate from the actual situation, and have poor stability.
Disclosure of Invention
The embodiment of the invention provides a user clustering method, a user clustering device, user clustering equipment and a user clustering medium, which are used for solving the problems of inaccurate clustering result and poor stability of the existing user clustering method.
In a first aspect, an embodiment of the present invention provides a user grouping method, where the method includes:
generating a preset number of user groups according to the existing user data; wherein the user data comprises a numeric attribute and a categorical attribute;
determining the central numerical attribute characteristics of the user group according to the numerical attribute data of the users in the user group;
determining the center classification attribute characteristics of the user group according to the classification attribute data of the users in the user group;
matching the numerical attribute data of the existing user with the central numerical attribute features of the user group, matching the classification type attribute data of the existing user with the central classification type attribute features of the user group, and selecting the user group to which the existing user belongs from the generated user group.
In a second aspect, an embodiment of the present invention provides a user grouping apparatus, where the apparatus includes:
the user group generating module is used for generating a preset number of user groups according to the existing user data; wherein the user data comprises a numeric attribute and a categorical attribute;
the first attribute characteristic determining module is used for determining the central numerical attribute characteristic of the user group according to the numerical attribute data of the users in the user group;
the second attribute characteristic determining module is used for determining the center classification attribute characteristics of the user group according to the classification attribute data of the users in the user group;
and the user group selection module is used for matching the numerical attribute data of the existing user with the central numerical attribute characteristics of the user group, matching the classification attribute data of the existing user with the central classification attribute characteristics of the user group, and selecting the user group to which the existing user belongs from the generated user group.
In a third aspect, an embodiment of the present invention provides an apparatus, where the apparatus includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the user grouping method according to any of the embodiments of the invention.
In a fourth aspect, the present invention provides a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the user grouping method according to any one of the embodiments of the present invention.
According to the embodiment of the invention, the central numerical attribute feature and the central classification attribute feature of the user group are determined according to the user numerical attribute data and the classification attribute data in the user group, the numerical attribute data and the classification attribute data of the existing user are respectively matched with the central numerical attribute feature and the central classification attribute feature of the user group, and the user group to which the existing user belongs is selected from the generated user group according to the matching result, so that the final grouping result after matching based on the central attribute feature is more accurate and has high grouping stability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a user grouping method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a user grouping method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a user grouping apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the embodiments of the invention and that no limitation of the invention is intended. It should be further noted that, for convenience of description, only the structures related to the embodiments of the present invention are shown in the drawings, not all the structures.
Example one
Fig. 1 is a flowchart of a user grouping method according to an embodiment of the present invention. The present embodiment is suitable for a case where users are grouped according to existing user data, and the method may be executed by the user grouping device provided in the embodiment of the present invention, and the user grouping device may be implemented by software and/or hardware. As shown in fig. 1, the method may include:
Step 101, generating a preset number of user groups according to existing user data; wherein the user data comprises a numeric attribute and a categorical attribute.
The user data may include basic information of existing users who have purchased insurance through the bank's insurance channel, insurance purchase behavior information, policy surrender behavior information, customer insurance life cycle information, policy value information, in-bank investment and wealth management behavior information, in-bank value level information, asset information, and the like. Each user corresponds to one piece of user data, which comprises numerical attribute data in at least one field and classification attribute data in at least one field. Numerical attribute data is expressed as numbers, such as age or monthly wage; classification attribute data takes one of at least two candidate values, such as gender (male or female) or education level (primary school, junior middle school, high school, or university).
Specifically, user data of a target data amount is randomly extracted from the existing user data to serve as one user group, and the same operation is repeated until the preset number of user groups is obtained. The preset number is determined by the number of groups into which the users are to be divided; for example, if the existing users are to be divided into 3 groups, the preset number is 3. The target data amount is determined by the ratio of the number of existing users to the preset number; for example, if there are 30 existing users and the preset number is 3, the target data amount is 30/3 = 10 pieces, and if there are 20 existing users and the preset number is 3, the target data amount is [20/3] = 6 pieces, where [·] denotes rounding down. The user data may be extracted without replacement, or each extracted batch may be returned to the existing user data before the next batch of the target data amount is extracted.
Optionally, this embodiment further provides a specific implementation manner for generating a preset number of user groups:
Assume that the existing user data is expressed as a data set $X = \{X_1, X_2, \dots, X_n\}$, where $X_i$ represents the user data corresponding to the i-th user.
1. Divide the whole data set $X = \{X_1, X_2, \dots, X_n\}$ into groups $\{[X_1, \dots, X_k], [X_{k+1}, \dots, X_{2k}], \dots\}$, where $k$ is the preset number. If any user data remains after the data set has been divided into $n/k$ groups, allocate the remaining user data to the first group.
2. Randomly extract one piece of user data from each of the groups obtained in step 1 to obtain $n/k$ pieces of user data, and take these $n/k$ pieces of user data as one user group.
3. Repeat step 2 until the preset number of user groups has been generated.
The user groups with the preset number are generated according to the existing user data, and a foundation is laid for subsequently selecting the user groups to which the existing users belong from the generated user groups.
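The three-step procedure above can be sketched in Python as follows. This is only an illustrative sketch: the function name `generate_user_groups`, the `seed` parameter, and the use of integer division for n/k are assumptions made for the example and are not specified in the original text.

```python
import random

def generate_user_groups(users, k, seed=None):
    """Build k candidate user groups from a list of user records.

    Step 1: split the data set into blocks of k consecutive records;
            leftover records are appended to the first block.
    Step 2: draw one random record from every block to form one group.
    Step 3: repeat step 2 until k groups exist.
    """
    rng = random.Random(seed)
    n_blocks = len(users) // k  # assumption: n/k is rounded down
    if n_blocks == 0:
        raise ValueError("need at least k users")
    blocks = [list(users[i * k:(i + 1) * k]) for i in range(n_blocks)]
    blocks[0].extend(users[n_blocks * k:])  # remaining records go to the first block
    return [[rng.choice(block) for block in blocks] for _ in range(k)]

# 20 users, preset number 3: each group holds [20/3] = 6 records.
groups = generate_user_groups(list(range(20)), k=3, seed=0)
```

Because every group is drawn independently, the same record may appear in more than one group; whether that is intended is not stated in the text.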
Step 102, determining the central numerical attribute features of the user group according to the numerical attribute data of the users in the user group.
Specifically, according to the obtained preset number of user groups, the numerical attribute data of each user in each user group is determined, mathematical statistical calculation is performed on the numerical attribute data of the users in each user group according to a preset mathematical statistical algorithm, the result after the statistical calculation is used as the central numerical attribute feature of the corresponding user group, and the preset mathematical statistical algorithm includes but is not limited to mean calculation, median calculation, mode calculation, weighted average calculation and the like.
Optionally, step 102 includes: taking the average value of the numerical attribute data of the users in the user group as the central numerical attribute feature of the user group.
For example, assume that the numerical attribute data of the users in user group A includes three fields: "age", "monthly income" and "loan amount". If the mean calculation gives an average "age" of 35 years, an average "monthly income" of 7831 yuan, and an average "loan amount" of 2461 yuan, then "age 35 years", "monthly income 7831 yuan" and "loan amount 2461 yuan" are taken as the central numerical attribute features of user group A.
Determining the central numerical attribute features of a user group from the numerical attribute data of all users in the group avoids a problem of conventional grouping methods, in which one piece of user data is selected at random and its numerical attribute data is used as the central numerical attribute feature of the user group: the grouping result then depends too heavily on that feature and changes whenever it changes.
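As a minimal sketch of the optional mean-based variant of step 102, the snippet below averages each numerical field over the users of one group. The record layout (a dict per user), the field names, and the sample values are assumptions chosen for illustration only.

```python
from statistics import mean

def central_numerical_features(group, fields):
    """Return the per-field mean of the numerical attributes of a user group."""
    return {field: mean(user[field] for user in group) for field in fields}

# Hypothetical two-user group; values are illustrative, not from the patent.
group_a = [
    {"age": 30, "monthly_income": 7000, "loan_amount": 2000},
    {"age": 40, "monthly_income": 9000, "loan_amount": 3000},
]
centre_a = central_numerical_features(
    group_a, ["age", "monthly_income", "loan_amount"]
)
```

Swapping `mean` for `statistics.median` or `statistics.mode` would give the other statistics that the text lists as alternatives.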
Step 103, determining the central classification attribute features of the user group according to the classification attribute data of the users in the user group.
Specifically, according to the obtained preset number of user groups, the classification attribute data of each user in each user group is determined, the distribution statistics is performed on the classification attribute data of the users in each user group, the statistical distribution result is used as the center classification attribute feature of the corresponding user group, and the distribution statistics process can include the frequency statistics of the classification attribute data, the number statistics of the classification attribute data and the like.
Optionally, step 103 includes: determining the central classification attribute features of the user group according to the frequency of the classification attribute data of the users in the user group.
For example, assume that the classification attribute data of the users in user group B includes three fields: "gender", "education level" and "marital status". After distribution statistics are computed on the classification attribute data of the users in user group B, it is determined that the frequency of "gender: male" is 55% and the frequency of "gender: female" is 45%; the frequency of "primary school" is 10%, of "junior middle school" 30%, of "high school" 40%, and of "university" 20%; and the frequency of "married" is 73%, of "unmarried" 20%, and of "divorced" 7%. Then "gender male, 55%", "gender female, 45%", "primary school, 10%", "junior middle school, 30%", "high school, 40%", "university, 20%", "married, 73%", "unmarried, 20%" and "divorced, 7%" are taken as the central classification attribute features of user group B.
Determining the central classification attribute features of a user group from the classification attribute data of all users in the group avoids a problem of conventional grouping methods, in which one piece of user data is selected at random and its classification attribute data is used as the central classification attribute feature of the user group: the grouping result then depends too heavily on that feature and changes whenever it changes.
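The frequency-based variant of step 103 can be sketched as below. The dict-per-user record layout, the field names, and the four sample users are assumptions for illustration; frequencies are returned as fractions rather than percentages.

```python
from collections import Counter

def central_classification_features(group, fields):
    """Return, for each classification field, the relative frequency of every value."""
    n = len(group)
    features = {}
    for field in fields:
        counts = Counter(user[field] for user in group)
        features[field] = {value: count / n for value, count in counts.items()}
    return features

# Hypothetical four-user group; values are illustrative only.
group_b = [
    {"gender": "male", "marital_status": "married"},
    {"gender": "male", "marital_status": "married"},
    {"gender": "male", "marital_status": "unmarried"},
    {"gender": "female", "marital_status": "married"},
]
centre_b = central_classification_features(group_b, ["gender", "marital_status"])
```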
Step 104, matching the numerical attribute data of the existing user with the central numerical attribute features of the user group, matching the classification attribute data of the existing user with the central classification attribute features of the user group, and selecting the user group to which the existing user belongs from the generated user groups.
Specifically, the user group to which the existing user belongs is selected and determined jointly according to the matching result of the numerical attribute data of the existing user and the central numerical attribute features of each user group, and the matching result of the classification type attribute data of the existing user and the central classification type attribute features of each user group.
Optionally, matching the numerical attribute data of the existing user with the central numerical attribute features of the user group includes: calculating the difference between each item of numerical attribute data of the existing user and the corresponding central numerical attribute feature of the user group, squaring each difference, and summing the squared differences to obtain the matching result. For example, the process of matching the numerical attribute data of the existing user with the central numerical attribute features of the user group can be expressed by a formula based on the Euclidean distance:
$$d^{r}(X_l, Z_x) = \sum_{j=1}^{p} \left(x_{lj}^{r} - z_{xj}^{r}\right)^{2}$$
where $r$ denotes the numerical attributes, $d^{r}(X_l, Z_x)$ denotes the distance between the numerical attribute data of user $X_l$ and the central numerical attribute features of user group $Z_x$, $x_{lj}^{r}$ denotes the j-th numerical attribute data of user $X_l$, $z_{xj}^{r}$ denotes the j-th central numerical attribute feature of user group $Z_x$, and $p$ denotes the number of numerical attribute data of user $X_l$, which equals the number of central numerical attribute features of user group $Z_x$.
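The numerical matching described above (difference, square, sum) can be sketched as follows. The function name `numerical_match` and the sample values are assumptions; the sketch follows the prose description and therefore returns the sum of squared differences without taking a square root.

```python
def numerical_match(user_values, centre_values):
    """Sum of squared differences between a user's numerical attributes
    and a group's central numerical attribute features."""
    if len(user_values) != len(centre_values):
        raise ValueError("user and centre must have the same number of attributes")
    return sum((x - z) ** 2 for x, z in zip(user_values, centre_values))

# Illustrative, already-normalized values: (0.2-0.1)^2 + (0.5-0.9)^2
first_match = numerical_match([0.2, 0.5], [0.1, 0.9])
```

If a true Euclidean distance is wanted, `math.sqrt` can be applied to the result; the ranking of user groups by this value is the same either way.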
Optionally, matching the classification attribute data of the existing user with the central classification attribute features of the user group includes: taking the classification attribute data of the existing user as first classification attribute data, and taking the classification attribute data in the classification attribute total data other than the first classification attribute data as second classification attribute data; taking a first frequency value as the first classification attribute feature of the first classification attribute data, and taking a second frequency value as the second classification attribute feature of the second classification attribute data, so as to obtain existing-user classification attribute features comprising the first classification attribute feature and the second classification attribute feature; and matching the existing-user classification attribute features with the central classification attribute features of the user group. The first frequency value is 100%, and the second frequency value is 0%.
Optionally, selecting the user group to which an existing user belongs from the generated user groups includes: determining a total matching result from a first matching result, between the numerical attribute data of the existing user and the central numerical attribute features of the user group, and a second matching result, between the classification attribute data of the existing user and the central classification attribute features of the user group, wherein the total matching result is the sum of the first matching result and the second matching result; and taking the user group with the smallest total matching result as the user group to which the existing user belongs. For example, suppose that the user groups include user group A, user group B and user group C. The first matching result between user Z and user group A is 1, and the second matching result is 1; the first matching result between user Z and user group B is 1.5, and the second matching result is 2; the first matching result between user Z and user group C is 0.5, and the second matching result is 2.5. The total matching results are therefore 2, 3.5 and 3 respectively, and the smallest is that of user group A (1 + 1 = 2), so user group A is the user group to which user Z belongs.
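The selection rule can be illustrated with the figures from the example. This is a small sketch: the function name `select_user_group` and the dictionary layout are assumptions, and the first and second matching results are taken as given rather than computed.

```python
def select_user_group(match_results):
    """Pick the group whose total (first + second) matching result is smallest.

    match_results maps a group name to a (first_result, second_result) pair.
    """
    totals = {group: first + second for group, (first, second) in match_results.items()}
    best_group = min(totals, key=totals.get)
    return best_group, totals

# Matching results of user Z against groups A, B and C, as in the example.
best_group, totals = select_user_group({"A": (1, 1), "B": (1.5, 2), "C": (0.5, 2.5)})
```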
The numerical attribute data of the existing users are matched with the central numerical attribute characteristics of the user groups, the classification attribute data of the existing users are matched with the central classification attribute characteristics of the user groups, and the user groups to which the existing users belong are selected from the generated user groups, so that the technical effect of user grouping on the existing users is realized.
According to the technical scheme provided by the embodiment of the invention, the central numerical attribute features and the central classification attribute features of each user group are determined from the numerical attribute data and the classification attribute data of the users in the group. This avoids a problem of conventional grouping methods, in which one piece of user data is selected at random and its numerical and classification attribute data are used as the central features of the user group, so that the grouping result depends too heavily on those features and changes whenever they change. The accuracy and stability of the grouping result are thereby improved.
On the basis of the above embodiment, before "generating the preset number of user groups" in step 101, the method includes:
A. Performing index processing on the existing user data to generate an index wide table.
By processing the existing user data into the index wide table, the implementation efficiency of the subsequent related steps related to the user data can be greatly improved.
B. Performing missing value processing, abnormal value processing and standardization processing on the user data in the index wide table.
Specifically, missing value processing handles missing values in the numerical attribute data and the classification attribute data of the user data. For numerical attribute data, missing values are filled with zero or with the average value. For classification attribute data, a missing value is treated as an independent value of the attribute; for example, if "gender" is missing, it is assigned a "third category" value that is independent of "male" and "female".
Abnormal value processing handles abnormal values in the numerical attribute data of the user data. Given the mean of a numerical attribute and a preset threshold, a value that is greater than the mean plus the threshold, or less than the mean minus the threshold, is determined to be an abnormal value and is replaced with a preset quantile of the value range of that numerical attribute. For example, if the mean of the numerical attribute "age" is 35 years, the preset threshold is 30 years, and the preset quantile is the 99th percentile, then any "age" greater than 35 + 30 = 65 or less than 35 - 30 = 5 is treated as an abnormal value and is replaced with the 99th percentile of the value range of "age".
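A minimal sketch of the missing value processing described above is given below. It assumes that records are dicts, that a missing value is represented by `None`, and that the independent category is labelled `"missing"`; all three are illustrative choices rather than details from the original text.

```python
from statistics import mean

def fill_missing(records, numeric_fields, categorical_fields, strategy="mean"):
    """Fill missing numerical values with the field mean (or zero) and
    map missing classification values to an independent category."""
    filled = [dict(record) for record in records]  # leave the input untouched
    for field in numeric_fields:
        present = [r[field] for r in records if r.get(field) is not None]
        fill_value = mean(present) if strategy == "mean" and present else 0
        for record in filled:
            if record.get(field) is None:
                record[field] = fill_value
    for field in categorical_fields:
        for record in filled:
            if record.get(field) is None:
                record[field] = "missing"  # hypothetical label for the extra category
    return filled

records = [
    {"age": 30, "gender": "male"},
    {"age": None, "gender": None},
    {"age": 40, "gender": "female"},
]
cleaned = fill_missing(records, ["age"], ["gender"])
```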
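The abnormal value rule can be sketched as follows. The text does not say how the quantile is computed, so the nearest-rank percentile used here is an assumption, as are the function names and the sample data.

```python
from statistics import mean

def percentile_nearest_rank(values, q):
    """q-th percentile (integer q in 1..100) by the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # ceil(q * n / 100)
    return ordered[rank - 1]

def replace_outliers(values, threshold, q=99):
    """Replace values farther than `threshold` from the mean with the q-th percentile."""
    centre = mean(values)
    replacement = percentile_nearest_rank(values, q)
    return [
        replacement if (v > centre + threshold or v < centre - threshold) else v
        for v in values
    ]

# Illustrative data: 199 users aged 35 and one implausible age of 150.
ages = [35] * 199 + [150]
cleaned_ages = replace_outliers(ages, threshold=30)
```

With very small samples the 99th percentile is simply the largest value, so an abnormal maximum would be replaced by itself; the rule only becomes useful on larger data sets.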
Standardization processing normalizes the numerical attribute data in the user data. The normalization can be expressed by the formula
$$s_{ij} = \frac{x_{ij} - \min(x_j)}{\max(x_j) - \min(x_j)}$$
where $x_{ij}$ denotes the numerical attribute data before normalization, $s_{ij}$ denotes the numerical attribute data after normalization, $\min(x_j)$ denotes the minimum value of the numerical attribute data before normalization, and $\max(x_j)$ denotes the maximum value of the numerical attribute data before normalization.
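Min-max normalization of one numerical attribute column, using the minimum and maximum defined above, can be sketched as follows. The handling of a constant column (all values mapped to 0.0) is an assumption added to avoid division by zero and is not covered by the text.

```python
def min_max_normalize(column):
    """Scale a list of numbers to the range [0, 1] using its minimum and maximum."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        return [0.0 for _ in column]  # assumption: a constant column maps to 0.0
    return [(x - lowest) / (highest - lowest) for x in column]

normalized_ages = min_max_normalize([20, 35, 50])
```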
By carrying out missing value processing, abnormal value processing and standardization processing on the user data in the index wide table, the quality of the user data is ensured, and the final user clustering result is more reliable.
On the basis of the above embodiment, step 104 includes:
Taking the grouping result of the existing user data as training samples, training on the samples with a neural network algorithm, and calculating the weight of each item of numerical attribute data and classification attribute data in the user data, so as to obtain a trained neural network model. When the existing user data is updated, the updated user data can be input directly into the trained neural network model to obtain the user group to which the user belongs.
Training on the samples with the neural network algorithm yields the weights of the numerical attribute data and the classification attribute data in the user data together with a trained neural network model, which prevents individual numerical or classification attributes from unduly influencing the grouping result, so that a more accurate grouping result that meets actual requirements can be obtained.
Example two
Fig. 2 is a flowchart of a user clustering method according to a second embodiment of the present invention. This embodiment provides a specific implementation manner for "matching the classified attribute data of the existing user with the central classified attribute feature of the user group" in step 104 of the above embodiment, as shown in fig. 2, the method may include:
Step 201, taking the classification attribute data of the existing user as the first classification attribute data, and taking the classification attribute data in the classification attribute total data other than the first classification attribute data as the second classification attribute data.
The total data of the classified attributes includes all possible values of the classified attributes. The number of the first and second classification attribute data may be one or plural.
For example, assume that the classification attribute data of user A covers three fields, "gender", "education level" and "marital status", and that the values of these fields are "gender: male", "university" and "married" respectively. Then "gender: male", "university" and "married" are taken as the first classification attribute data. Assume further that the classification attribute total data is "gender: male", "gender: female", "primary school", "junior middle school", "high school", "university", "married", "unmarried" and "divorced". Then the classification attribute data other than the first classification attribute data, namely "gender: female", "primary school", "junior middle school", "high school", "unmarried" and "divorced", is taken as the second classification attribute data.
Step 202, using the first frequency value as a first classification attribute feature of the first classification attribute data, and using the second frequency value as a second classification attribute feature of the second classification attribute data, so as to obtain an existing user classification attribute feature including the first classification attribute feature and the second classification attribute feature.
Wherein the first frequency value is 100%, and the second frequency value is 0%.
For example, assuming that the first classification attribute data is "gender: male", "education: university" and "married", "100%" is taken as the first classification attribute feature of each of these values; assuming that the second classification attribute data is "gender: female", "education: primary school", "education: junior middle school", "education: high school", "unmarried" and "divorced", "0%" is taken as the second classification attribute feature of each of these values.
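The encoding in step 202 amounts to a one-hot style vector over all possible values. A minimal sketch, assuming frequencies are stored as fractions (1.0 for 100%, 0.0 for 0%); the function name `encode_user` and the value labels are hypothetical:

```python
def encode_user(user_values, all_values):
    """Map each possible value to 1.0 (100%) if the user has it, else 0.0 (0%)."""
    owned = set(user_values)
    return {v: (1.0 if v in owned else 0.0) for v in all_values}

all_values = [
    "gender: male", "gender: female",
    "education: primary school", "education: junior middle school",
    "education: high school", "education: university",
    "married", "unmarried", "divorced",
]
features = encode_user(["gender: male", "education: university", "married"], all_values)
print(features)
```

The resulting dictionary is the existing user classification attribute feature: three entries at 1.0 and six at 0.0.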
Step 203, determining a first center classification attribute feature of the first classification attribute data in the center classification attribute features, and a second center classification attribute feature of the second classification attribute data in the center classification attribute features.
For example, assuming that the first classification attribute data is "gender: male", "education: university" and "married", then "gender: male, 50%", "education: university, 20%" and "married, 73%" in the center classification attribute features are taken as the first center classification attribute features; assuming that the second classification attribute data is "gender: female", "education: primary school", "education: junior middle school", "education: high school", "unmarried" and "divorced", then "gender: female, 45%", "education: primary school, 10%", "education: junior middle school, 30%", "education: high school, 40%", "unmarried, 20%" and "divorced, 7%" in the center classification attribute features are taken as the second center classification attribute features.
Step 204, taking the square of the difference between the first classification attribute feature and the first center classification attribute feature as the first classification attribute feature difference.
Illustratively, assuming that the first classification attribute features are "gender: male, 100%", "education: university, 100%" and "married, 100%", and the first center classification attribute features are "gender: male, 50%", "education: university, 20%" and "married, 73%", then $(100\% - 50\%)^2$, $(100\% - 20\%)^2$ and $(100\% - 73\%)^2$ are taken as the first classification attribute feature differences.
Step 205, taking the square of the difference between the second classification attribute feature and the second center classification attribute feature as the second classification attribute feature difference.
Illustratively, assume that the second classification attribute features are "gender: female, 0%", "education: primary school, 0%", "education: junior middle school, 0%", "education: high school, 0%", "unmarried, 0%" and "divorced, 0%", and the second center classification attribute features are "gender: female, 45%", "education: primary school, 10%", "education: junior middle school, 30%", "education: high school, 40%", "unmarried, 20%" and "divorced, 7%". Then $(0\% - 45\%)^2$, $(0\% - 10\%)^2$, $(0\% - 30\%)^2$, $(0\% - 40\%)^2$, $(0\% - 20\%)^2$ and $(0\% - 7\%)^2$ are taken as the second classification attribute feature differences.
Step 206, summing the first classification attribute feature difference and the second classification attribute feature difference to obtain a target classification attribute feature difference, from which the matching result is obtained.
For example, assume that the first classification attribute feature differences are $(100\% - 50\%)^2$, $(100\% - 20\%)^2$ and $(100\% - 73\%)^2$, and the second classification attribute feature differences are $(0\% - 45\%)^2$, $(0\% - 10\%)^2$, $(0\% - 30\%)^2$, $(0\% - 40\%)^2$, $(0\% - 20\%)^2$ and $(0\% - 7\%)^2$. Then $(100\% - 50\%)^2 + (100\% - 20\%)^2 + (100\% - 73\%)^2 + (0\% - 45\%)^2 + (0\% - 10\%)^2 + (0\% - 30\%)^2 + (0\% - 40\%)^2 + (0\% - 20\%)^2 + (0\% - 7\%)^2$ is taken as the target classification attribute feature difference.
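The arithmetic of steps 203 to 206 can be checked with a short Python sketch. The center frequencies below are the ones from the example, written as fractions; the function name `classification_difference` and the value labels are hypothetical.

```python
# Center frequencies of one user group, taken from the example (as fractions).
center = {
    "gender: male": 0.50, "gender: female": 0.45,
    "education: primary school": 0.10, "education: junior middle school": 0.30,
    "education: high school": 0.40, "education: university": 0.20,
    "married": 0.73, "unmarried": 0.20, "divorced": 0.07,
}
user_values = {"gender: male", "education: university", "married"}

def classification_difference(user_values, center):
    """Sum of squared differences between the user's 1/0 features and the center."""
    total = 0.0
    for value, frequency in center.items():
        user_feature = 1.0 if value in user_values else 0.0
        total += (user_feature - frequency) ** 2
    return total

difference = classification_difference(user_values, center)
print(round(difference, 4))  # 1.4703
```

With these numbers the target classification attribute feature difference is about 1.4703 when percentages are expressed as fractions.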
According to the technical solution provided by this embodiment of the invention, the square of the difference between the first classification attribute feature and the first center classification attribute feature is taken as the first classification attribute feature difference, the square of the difference between the second classification attribute feature and the second center classification attribute feature is taken as the second classification attribute feature difference, and the two differences are summed to obtain the target classification attribute feature difference, from which the matching result is obtained.
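In compact form, the computation of this embodiment can be written as follows. The symbols are assumptions introduced here for illustration only: $V_1$ is the set of values the existing user has, $V_2$ the set of all other possible values, and $c_v$ the frequency of value $v$ in the center classification attribute features of the user group.

$$
D = \sum_{v \in V_1} (1 - c_v)^2 + \sum_{v \in V_2} (0 - c_v)^2
$$

The first sum collects the first classification attribute feature differences (user feature 100%), the second sum collects the second classification attribute feature differences (user feature 0%), and $D$ is the target classification attribute feature difference.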
Example three
Fig. 3 is a schematic structural diagram of a user grouping apparatus according to a third embodiment of the present invention, which is capable of performing a user grouping method according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the performing method. As shown in fig. 3, the apparatus may include:
a user group generating module 31, configured to generate a preset number of user groups according to existing user data; wherein the user data comprises a numeric attribute and a categorical attribute;
a first attribute feature determination module 32, configured to determine a central numerical attribute feature of the user group according to numerical attribute data of users in the user group;
a second attribute feature determination module 33, configured to determine a central classification attribute feature of the user group according to the classification attribute data of the users in the user group;
and the user group selection module 34 is configured to match the numerical attribute data of the existing user with the central numerical attribute features of the user group, match the classification attribute data of the existing user with the central classification attribute features of the user group, and select the user group to which the existing user belongs from the generated user group.
On the basis of the foregoing embodiment, the first attribute feature determining module 32 is specifically configured to:
and taking the average value of the numerical attribute data of the users in the user group as the central numerical attribute characteristic of the user group.
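A minimal sketch of this averaging step, assuming each user is a dictionary of numerical attributes. The attribute names `age` and `balance` and the function name are hypothetical and do not come from the original text.

```python
from statistics import mean

def central_numerical_feature(group_users, numeric_keys):
    """Average each numerical attribute over all users in one group."""
    return {key: mean(user[key] for user in group_users) for key in numeric_keys}

group = [
    {"age": 30, "balance": 1000.0},
    {"age": 40, "balance": 3000.0},
]
center_numeric = central_numerical_feature(group, ["age", "balance"])
print(center_numeric)
```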
On the basis of the foregoing embodiment, the second attribute feature determination module 33 is specifically configured to:
and determining the center classification attribute features of the user group according to the frequency of the classification attribute data of the users in the user group.
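One plausible reading of "frequency" here, consistent with percentages such as "50%" in the examples of the second embodiment, is the share of users in the group that have each value. The sketch below assumes that reading; the function name and the value labels are hypothetical.

```python
from collections import Counter

def central_classification_feature(group_users, all_values):
    """Share of users in the group that have each possible value."""
    counts = Counter(v for user_values in group_users for v in user_values)
    size = len(group_users)
    return {v: counts[v] / size for v in all_values}

group = [
    {"gender: male", "married"},
    {"gender: male", "unmarried"},
    {"gender: female", "married"},
    {"gender: female", "married"},
]
values = ["gender: male", "gender: female", "married", "unmarried", "divorced"]
center_cat = central_classification_feature(group, values)
print(center_cat)
```

Values that no user in the group has, such as "divorced" here, get a frequency of 0.0.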
On the basis of the foregoing embodiment, the user group selecting module 34 is specifically configured to:
taking the classification attribute data of the existing user as first classification attribute data, and taking the classification attribute data other than the first classification attribute data in the total classification attribute data as second classification attribute data;
taking a first frequency value as a first classification attribute feature of the first classification attribute data, and taking a second frequency value as a second classification attribute feature of the second classification attribute data, so as to obtain an existing user classification attribute feature comprising the first classification attribute feature and the second classification attribute feature;
and matching the existing user classification attribute feature with the center classification attribute features of the user group.
On the basis of the foregoing embodiment, the user group selecting module 34 is further specifically configured to:
determining a first center classification attribute feature of the first classification attribute data in the center classification attribute features and a second center classification attribute feature of the second classification attribute data in the center classification attribute features;
taking the square of the difference between the first classification attribute feature and the first center classification attribute feature as a first classification attribute feature difference;
taking the square of the difference between the second classification attribute feature and the second center classification attribute feature as a second classification attribute feature difference;
and summing the first classification attribute feature difference and the second classification attribute feature difference to obtain a target classification attribute feature difference so as to obtain a matching result.
The user grouping device provided by the embodiment of the invention can execute the user grouping method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to a user grouping method provided in any embodiment of the present invention.
Example four
Fig. 4 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary device 400 suitable for use in implementing embodiments of the present invention. The apparatus 400 shown in fig. 4 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present invention.
As shown in FIG. 4, device 400 is in the form of a general purpose computing device. The components of device 400 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples the various system components (including the system memory 402 and the processing unit 401).
Bus 403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Device 400 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 400 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)404 and/or cache memory 405. The device 400 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. Memory 402 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 408 having a set (at least one) of program modules 407 may be stored, for example, in memory 402, such program modules 407 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 407 generally perform the functions and/or methods of the described embodiments of the invention.
Device 400 may also communicate with one or more external devices 409 (e.g., keyboard, pointing device, display 410, etc.), with one or more devices that enable a user to interact with device 400, and/or with any devices (e.g., network card, modem, etc.) that enable device 400 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interface 411. Also, device 400 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) through network adapter 412. As shown, the network adapter 412 communicates with the other modules of the device 400 over the bus 403. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 401 executes various functional applications and data processing by running the program stored in the system memory 402, for example, to implement the user clustering method provided by the embodiment of the present invention, including:
generating a preset number of user groups according to the existing user data; wherein the user data comprises a numeric attribute and a categorical attribute;
determining the central numerical attribute characteristics of the user group according to the numerical attribute data of the users in the user group;
determining the center classification attribute characteristics of the user group according to the classification attribute data of the users in the user group;
matching the numerical attribute data of the existing user with the central numerical attribute features of the user group, matching the classification type attribute data of the existing user with the central classification type attribute features of the user group, and selecting the user group to which the existing user belongs from the generated user group.
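The final selection step can be sketched as follows. Several parts are assumptions made for illustration and are not stated in this section: numerical attributes are taken to be normalized, numerical matching uses a squared difference, the two differences are combined as a weighted sum, and the group with the smallest combined difference is chosen. The weights `w_num` and `w_cat` are hypothetical placeholders for the weights that the first embodiment obtains from the trained neural network.

```python
def numeric_difference(user_num, center_num):
    """Squared difference over numerical attributes (assumed normalized)."""
    return sum((user_num[k] - c) ** 2 for k, c in center_num.items())

def classification_difference(user_cat, center_cat):
    """Squared difference over classification values; missing values count as 0%."""
    return sum((user_cat.get(v, 0.0) - c) ** 2 for v, c in center_cat.items())

def select_group(user_num, user_cat, groups, w_num=0.5, w_cat=0.5):
    """Return the name of the group with the smallest weighted difference."""
    def total(group):
        return (w_num * numeric_difference(user_num, group["num"])
                + w_cat * classification_difference(user_cat, group["cat"]))
    return min(groups, key=lambda name: total(groups[name]))

groups = {
    "A": {"num": {"age": 0.30}, "cat": {"married": 0.73, "unmarried": 0.20}},
    "B": {"num": {"age": 0.70}, "cat": {"married": 0.10, "unmarried": 0.85}},
}
chosen = select_group({"age": 0.35}, {"married": 1.0}, groups)
print(chosen)  # A
```

In this toy input the user is closer to group A on both the numerical and the classification attributes, so A is selected for any positive weights.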
Example five
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a computer processor, perform a user grouping method, the method including:
generating a preset number of user groups according to the existing user data; wherein the user data comprises a numeric attribute and a categorical attribute;
determining the central numerical attribute characteristics of the user group according to the numerical attribute data of the users in the user group;
determining the center classification attribute characteristics of the user group according to the classification attribute data of the users in the user group;
matching the numerical attribute data of the existing user with the central numerical attribute features of the user group, matching the classification type attribute data of the existing user with the central classification type attribute features of the user group, and selecting the user group to which the existing user belongs from the generated user group.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in a user clustering method provided by any embodiment of the present invention. The computer-readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A method for user clustering, the method comprising:
generating a preset number of user groups according to the existing user data; wherein the user data comprises a numeric attribute and a categorical attribute;
determining the central numerical attribute characteristics of the user group according to the numerical attribute data of the users in the user group;
determining the center classification attribute characteristics of the user group according to the classification attribute data of the users in the user group;
matching the numerical attribute data of the existing user with the central numerical attribute features of the user group, matching the classification type attribute data of the existing user with the central classification type attribute features of the user group, and selecting the user group to which the existing user belongs from the generated user group.
2. The method of claim 1, wherein determining the central numerical attribute features of the user group according to the numerical attribute data of the users in the user group comprises:
and taking the average value of the numerical attribute data of the users in the user group as the central numerical attribute characteristic of the user group.
3. The method of claim 1, wherein determining the center classification attribute features of the user group according to the classification attribute data of the users in the user group comprises:
and determining the center classification attribute features of the user group according to the frequency of the classification attribute data of the users in the user group.
4. The method of claim 3, wherein matching the classification attribute data of the existing user with the center classification attribute features of the user group comprises:
taking the classification attribute data of the existing user as first classification attribute data, and taking the classification attribute data other than the first classification attribute data in the total classification attribute data as second classification attribute data;
taking a first frequency value as a first classification attribute feature of the first classification attribute data, and taking a second frequency value as a second classification attribute feature of the second classification attribute data, so as to obtain an existing user classification attribute feature comprising the first classification attribute feature and the second classification attribute feature;
and matching the existing user classification attribute feature with the center classification attribute features of the user group.
5. The method of claim 4, wherein matching the existing user classification attribute feature with the center classification attribute features of the user group comprises:
determining a first center classification attribute feature of the first classification attribute data in the center classification attribute features and a second center classification attribute feature of the second classification attribute data in the center classification attribute features;
taking the square of the difference between the first classification attribute feature and the first center classification attribute feature as a first classification attribute feature difference;
taking the square of the difference between the second classification attribute feature and the second center classification attribute feature as a second classification attribute feature difference;
and summing the first classification attribute feature difference and the second classification attribute feature difference to obtain a target classification attribute feature difference so as to obtain a matching result.
6. A user grouping apparatus, the apparatus comprising:
the user group generating module is used for generating a preset number of user groups according to the existing user data; wherein the user data comprises a numeric attribute and a categorical attribute;
the first attribute characteristic determining module is used for determining the central numerical attribute characteristic of the user group according to the numerical attribute data of the users in the user group;
the second attribute characteristic determining module is used for determining the center classification attribute characteristics of the user group according to the classification attribute data of the users in the user group;
and the user group selection module is used for matching the numerical attribute data of the existing user with the central numerical attribute characteristics of the user group, matching the classification attribute data of the existing user with the central classification attribute characteristics of the user group, and selecting the user group to which the existing user belongs from the generated user group.
7. The apparatus of claim 6, wherein the first attribute characteristic determination module is specifically configured to:
and taking the average value of the numerical attribute data of the users in the user group as the central numerical attribute characteristic of the user group.
8. The apparatus according to claim 6, wherein the second attribute characteristic determining module is specifically configured to:
and determining the center classification attribute features of the user group according to the frequency of the classification attribute data of the users in the user group.
9. The apparatus of claim 8, wherein the user group selection module is specifically configured to:
taking the classification attribute data of the existing user as first classification attribute data, and taking the classification attribute data other than the first classification attribute data in the total classification attribute data as second classification attribute data;
taking a first frequency value as a first classification attribute feature of the first classification attribute data, and taking a second frequency value as a second classification attribute feature of the second classification attribute data, so as to obtain an existing user classification attribute feature comprising the first classification attribute feature and the second classification attribute feature;
and matching the existing user classification attribute feature with the center classification attribute features of the user group.
10. The apparatus of claim 9, wherein the user group selection module is further configured to:
determining a first center classification attribute feature of the first classification attribute data in the center classification attribute features and a second center classification attribute feature of the second classification attribute data in the center classification attribute features;
taking the square of the difference between the first classification attribute feature and the first center classification attribute feature as a first classification attribute feature difference;
taking the square of the difference between the second classification attribute feature and the second center classification attribute feature as a second classification attribute feature difference;
and summing the first classification attribute feature difference and the second classification attribute feature difference to obtain a target classification attribute feature difference so as to obtain a matching result.
11. A device, characterized in that the device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the user grouping method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the user grouping method according to any one of claims 1 to 5.
CN201911107058.5A 2019-11-13 2019-11-13 User grouping method, device, equipment and medium Pending CN110852392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911107058.5A CN110852392A (en) 2019-11-13 2019-11-13 User grouping method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911107058.5A CN110852392A (en) 2019-11-13 2019-11-13 User grouping method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN110852392A true CN110852392A (en) 2020-02-28

Family

ID=69600841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911107058.5A Pending CN110852392A (en) 2019-11-13 2019-11-13 User grouping method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110852392A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021151330A1 (en) * 2020-09-08 2021-08-05 平安科技(深圳)有限公司 User grouping method, apparatus and device, and computer-readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139035A (en) * 2015-08-31 2015-12-09 浙江工业大学 Mixed attribute data flow clustering method for automatically determining clustering center based on density
CN108369674A (en) * 2015-12-09 2018-08-03 甲骨文国际公司 The system and method that the client with mixed attributes type is finely divided using target clustering method
CN108846687A (en) * 2018-04-02 2018-11-20 平安科技(深圳)有限公司 Client segmentation method, apparatus and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
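Chang Qianqian, Zhang Yueqin: "A Partition-Based Clustering Algorithm for Mixed Data" (一种基于划分的混合数据聚类算法), Computer Applications and Software (《计算机应用与软件》) *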

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220909

Address after: 12/F, 15/F, 99 Yincheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant after: Jianxin Financial Science and Technology Co.,Ltd.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200228