CN117312892A

CN117312892A - User clustering method, device, computer equipment and storage medium

Info

Publication number: CN117312892A
Application number: CN202311196615.1A
Authority: CN
Inventors: 李娅汝
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2023-09-15
Filing date: 2023-09-15
Publication date: 2023-12-29

Abstract

The application relates to a user clustering method, a user clustering apparatus, a computer device, a storage medium and a computer program product. The method comprises the following steps: acquiring an initial connection parameter value and a proximity graph corresponding to the initial connection parameter value; determining the maximal cliques corresponding to the proximity graph based on the connection relations among the users to be divided in the proximity graph; clustering the maximal cliques based on the users to be divided included in each maximal clique to obtain an initial clustering result; evaluating the initial clustering result to obtain an evaluation value corresponding to the initial clustering result; updating the initial connection parameter value to obtain an updated initial connection parameter value, repeatedly executing the steps starting from acquiring the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, and determining the initial clustering result corresponding to the largest evaluation value as the target clustering result. By adopting the method, the accuracy of user clustering can be improved.

Description

User clustering method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technology, and in particular, to a user clustering method, apparatus, computer device, storage medium, and computer program product.

Background

With the development of computer technology, more and more users use banking applications, and banks recommend financial products to users through these applications. To improve the accuracy of recommendation, users need to be clustered, that is, users with the same or similar characteristics are grouped into the same category, and financial products that meet each user's needs are then recommended according to the category to which the user belongs.

In the conventional technology, the category of each user is predicted by a neural network model trained on a large amount of training data, and the users are clustered according to the predicted categories. This approach does not fully consider the characteristics of the other users, so the accuracy of user clustering is low.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a user clustering method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve user clustering accuracy.

In a first aspect, the present application provides a user clustering method. The method comprises the following steps:

acquiring an initial connection parameter value and a proximity graph corresponding to the initial connection parameter value; the proximity graph comprises a plurality of users to be divided, and the number of other users to be divided connected with each user to be divided is equal to the initial connection parameter value;

determining maximal cliques corresponding to the proximity graph based on the connection relations between the users to be divided in the proximity graph; each maximal clique comprises a plurality of users to be divided, and any two users to be divided in it have a connection relationship;

clustering the maximal cliques based on the users to be divided included in each maximal clique to obtain an initial clustering result;

evaluating the initial clustering result to obtain an evaluation value corresponding to the initial clustering result;

updating the initial connection parameter value to obtain an updated initial connection parameter value, repeatedly executing the steps starting from acquiring the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, and determining the initial clustering result corresponding to the largest evaluation value as the target clustering result.

In one embodiment, the obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value includes:

acquiring initial connection parameter values and characteristic parameter sets corresponding to each user to be divided;

determining a connection user corresponding to each user to be divided based on the characteristic parameter set corresponding to the user to be divided and the characteristic parameter set corresponding to other users to be divided; the number of the connection users corresponding to the users to be divided is equal to the initial connection parameter value;

and connecting the users to be divided with the corresponding connection users to obtain the proximity graph corresponding to the initial connection parameter value.

In one embodiment, the determining, for each user to be divided, a connection user corresponding to the user to be divided based on the feature parameter set corresponding to the user to be divided and the feature parameter set corresponding to the other user to be divided includes:

for each user to be divided, determining cosine similarity between a feature parameter set corresponding to the user to be divided and feature parameter sets corresponding to other users to be divided, and obtaining feature similarity between the user to be divided and the other users to be divided;

and comparing the feature similarities, and determining the other users to be divided corresponding to the k largest feature similarities, where k is the initial connection parameter value, as the connection users corresponding to the user to be divided.

In one embodiment, the obtaining the feature parameter set corresponding to each user to be divided includes:

acquiring an initial feature set corresponding to the user to be divided;

performing digital processing on the initial feature set to obtain an initial parameter set corresponding to the user to be divided; the initial parameter set comprises parameter values corresponding to a plurality of dimensions;

for each dimension, performing standardization processing on the parameter values of that dimension across the plurality of users to be divided, to obtain the feature parameter of each user to be divided in that dimension;

and obtaining a characteristic parameter set corresponding to the user to be divided based on the characteristic parameters of each dimension of the user to be divided.

In one embodiment, clustering the maximal cliques based on the users to be divided included in each maximal clique to obtain an initial clustering result includes:

for each maximal clique, counting the users to be divided included in the maximal clique to obtain the count corresponding to the maximal clique;

and clustering the maximal cliques that have the same count to obtain an initial clustering result.

In one embodiment, the step of updating the initial connection parameter value to obtain an updated initial connection parameter value, and repeatedly executing the step of obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, and determining an initial clustering result corresponding to the largest evaluation value as a target clustering result includes:

acquiring a preset update step length, and fusing the initial connection parameter value with the preset update step length to obtain an updated initial connection parameter value;

repeating the steps starting from acquiring the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, to obtain a plurality of initial clustering results and the evaluation value corresponding to each initial clustering result;

and comparing the evaluation values, and determining an initial clustering result corresponding to the largest evaluation value as a target clustering result.

In one embodiment, the target clustering result comprises a plurality of clustering sets, and each clustering set comprises a plurality of users to be partitioned; the method further comprises the steps of:

determining a target product identifier corresponding to each cluster set;

and sending the target product identifier to each user to be divided in the corresponding clustering set.

In a second aspect, the present application further provides a user clustering apparatus. The device comprises:

the acquisition module is used for acquiring the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value; the proximity graph comprises a plurality of users to be divided, and the number of other users to be divided connected with each user to be divided is equal to the initial connection parameter value;

the determining module is used for determining the maximal cliques corresponding to the proximity graph based on the connection relations between the users to be divided in the proximity graph; each maximal clique comprises a plurality of users to be divided, and any two users to be divided in it have a connection relationship;

the clustering module is used for clustering the maximal cliques based on the users to be divided included in each maximal clique to obtain an initial clustering result;

the evaluation module is used for evaluating the initial clustering result to obtain an evaluation value corresponding to the initial clustering result;

and the comparison module is used for updating the initial connection parameter value to obtain an updated initial connection parameter value, repeatedly executing the steps starting from acquiring the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, and determining the initial clustering result corresponding to the largest evaluation value as the target clustering result.

In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any one of the first aspects when the computer program is executed by the processor.

In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.

In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.

According to the user clustering method, apparatus, computer device, storage medium and computer program product, an initial connection parameter value and the proximity graph corresponding to it are acquired, where a connection in the proximity graph indicates that the connected users to be divided have similar or identical characteristics. The maximal cliques corresponding to the proximity graph are determined based on the connection relations among the users to be divided; any two users to be divided in a maximal clique have similar or identical characteristics, so this step can be understood as a first clustering of the users to be divided. The maximal cliques are then clustered based on the users to be divided included in each maximal clique to obtain an initial clustering result, which can be understood as a second clustering performed according to the similar or identical characteristics among the maximal cliques. The initial clustering result is evaluated to obtain its evaluation value, the initial connection parameter value is updated, and the steps starting from acquiring the initial connection parameter value and its proximity graph are repeated until a preset cycle stop condition is reached. The initial clustering result with the largest evaluation value, that is, the most accurate one, is determined as the target clustering result, which improves the accuracy of user clustering.

Drawings

FIG. 1 is a diagram of an application environment for a user clustering method in one embodiment;

FIG. 2 is a flow diagram of a user clustering method in one embodiment;

FIG. 3 is a flow chart of a proximity graph determination step in one embodiment;

FIG. 4 is a flow chart of a feature parameter set determination step in one embodiment;

FIG. 5 is a flowchart illustrating a target cluster result determination step in one embodiment;

FIG. 6 is a block diagram of a user clustering device in one embodiment;

FIG. 7 is an internal structure diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The user clustering method provided by the embodiments of the present application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal and the server can each independently execute the user clustering method provided in the embodiments of the present application, or they can cooperate to execute it. For example, the computer device acquires an initial connection parameter value and a proximity graph corresponding to the initial connection parameter value; the proximity graph comprises a plurality of users to be divided, and the number of other users to be divided connected with each user to be divided is equal to the initial connection parameter value; determines the maximal cliques corresponding to the proximity graph based on the connection relations among the users to be divided in the proximity graph; each maximal clique comprises a plurality of users to be divided, and any two users to be divided in it have a connection relationship; clusters the maximal cliques based on the users to be divided included in each maximal clique to obtain an initial clustering result; evaluates the initial clustering result to obtain an evaluation value corresponding to the initial clustering result; updates the initial connection parameter value to obtain an updated initial connection parameter value, repeatedly executes the steps starting from acquiring the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, and determines the initial clustering result corresponding to the largest evaluation value as the target clustering result. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.

In one embodiment, as shown in fig. 2, a user clustering method is provided, and this embodiment is described by taking the application of the method to a computer device as an example, and includes steps 202 to 210.

Step 202, obtaining an initial connection parameter value and a proximity graph corresponding to the initial connection parameter value; the proximity graph comprises a plurality of users to be divided, and the number of other users to be divided connected with each user to be divided is equal to the initial connection parameter value.

The initial connection parameter value refers to the initial value of the connection parameter, and the connection parameter represents the number of other users to be divided connected with each user to be divided in the proximity graph. For example, if the initial connection parameter value is 5, each user to be divided in the proximity graph is connected with 5 other users to be divided. The initial connection parameter value can be set according to actual requirements. The proximity graph refers to a graph structure representing the proximity relations between users to be divided: each user to be divided is a node in the proximity graph, and a connection between two nodes characterizes the proximity relation between the corresponding users to be divided. Different initial connection parameter values yield different proximity graphs among the users to be divided. The users to be divided refer to users who need to be clustered, such as the users of a bank.

The computer device obtains an initial connection parameter value and a proximity graph corresponding to the initial connection parameter value, wherein the proximity graph comprises a plurality of users to be divided, and the number of other users to be divided connected with each user to be divided is equal to the initial connection parameter value.

Step 204, determining maximal cliques corresponding to the proximity graph based on the connection relations between the users to be divided in the proximity graph; each maximal clique comprises a plurality of users to be divided, and any two users to be divided in it have a connection relationship.

The connection relationship refers to the relationship represented by a connecting line segment in the proximity graph. A maximal clique refers to a set of users to be divided in the proximity graph in which any two users to be divided have a connection relationship, and to which no further user to be divided can be added while keeping this property. A proximity graph may contain multiple maximal cliques.

The computer device determines, by using a preset method, the maximal cliques corresponding to the proximity graph based on the connection relations between the users to be divided in the proximity graph, where each maximal clique includes a plurality of users to be divided, and any two users to be divided in a maximal clique have a connection relationship between them. The preset method may be the Bron-Kerbosch algorithm (a maximal clique search algorithm), a modified Bron-Kerbosch algorithm, or the like.
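As an illustration of the clique search named above, the following is a minimal sketch of the basic Bron-Kerbosch recursion without pivoting. The function name `maximal_cliques`, the adjacency-map representation and the four-user toy graph are assumptions made for this example; they do not come from the application.

```python
from typing import Dict, List, Set


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate the maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: clique built so far, p: candidates that can still extend r,
        # x: vertices already processed (prevents reporting duplicates)
        if not p and not x:
            cliques.append(set(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


# Hypothetical proximity graph: users a, b, c are mutually connected; d is linked to c only.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = maximal_cliques(graph)
```

For this toy graph the search yields two maximal cliques, {a, b, c} and {c, d}. A pivoting variant would reduce the number of recursive calls on larger graphs.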

Step 206, clustering the maximal cliques based on the users to be divided included in each maximal clique to obtain an initial clustering result.

The clustering process refers to grouping maximal cliques that have the same or similar characteristics into one set, according to the characteristics of the maximal cliques. The initial clustering result is the result obtained by clustering the maximal cliques. The initial clustering result comprises a plurality of clustering sets, and the users to be divided in the same clustering set have the same or similar characteristics.

Illustratively, the computer device clusters the maximal cliques based on the users to be divided included in each maximal clique, and obtains an initial clustering result.

In one embodiment, the computer device clusters the maximal cliques that contain the same user to be divided to obtain an initial clustering result.
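One possible reading of this embodiment is that cliques sharing at least one user end up in the same clustering set. The sketch below implements that reading with a small union-find structure; the function name `merge_overlapping_cliques` and the transitive merging (A overlaps B, B overlaps C, so all three merge) are assumptions, since the application does not specify the merging rule in detail.

```python
from typing import Dict, List, Set


def merge_overlapping_cliques(cliques: List[Set[str]]) -> List[Set[str]]:
    """Group cliques that share at least one user and return the merged user sets."""
    parent = list(range(len(cliques)))

    def find(i: int) -> int:
        # Follow parent links to the representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: Dict[str, int] = {}
    for idx, clique in enumerate(cliques):
        for user in clique:
            if user in first_seen:
                # The user already appeared in an earlier clique: join the two groups.
                parent[find(idx)] = find(first_seen[user])
            else:
                first_seen[user] = idx

    merged: Dict[int, Set[str]] = {}
    for idx, clique in enumerate(cliques):
        merged.setdefault(find(idx), set()).update(clique)
    return list(merged.values())


# Hypothetical cliques: the first two share user "c", the third is separate.
clusters = merge_overlapping_cliques([{"a", "b", "c"}, {"c", "d"}, {"e", "f"}])
```

Here the first two cliques collapse into one clustering set {a, b, c, d}, while {e, f} stays on its own.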

Step 208, evaluating the initial clustering result to obtain an evaluation value corresponding to the initial clustering result.

Here, evaluation refers to assessing the accuracy of the initial clustering result. The evaluation indices include, but are not limited to, ARI (Adjusted Rand Index) and RI (Rand Index). The evaluation value is a numerical value representing the accuracy of the initial clustering result; the larger the evaluation value, the higher the accuracy of the initial clustering result.

The computer device evaluates the initial clustering result based on a preset evaluation index to obtain an evaluation value corresponding to the initial clustering result.
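To make the Rand Index concrete, the sketch below computes it by pair counting: it is the fraction of user pairs on which two labelings agree about "same cluster" versus "different cluster". The existence of a reference labeling is an assumption of this example; the application does not state where reference labels come from, and the name `rand_index` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def rand_index(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Fraction of user pairs on which the two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # fewer than two users: nothing to disagree about
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical labels for four users: the prediction splits the second reference cluster.
reference = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]
score = rand_index(reference, predicted)
```

Of the six user pairs, only the pair formed by the last two users is treated differently by the two labelings, so the score is 5/6.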

Step 210, updating the initial connection parameter value to obtain an updated initial connection parameter value, repeating the steps of obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, and determining an initial clustering result corresponding to the maximum evaluation value as a target clustering result.

The target clustering result is the final clustering result, and can be understood as the initial clustering result with the highest accuracy. The preset cycle stop condition is a condition, set in advance according to actual requirements, under which the repetition stops; for example, the number of cycles reaches a preset number, or the evaluation value is greater than or equal to a preset evaluation value.

The computer device obtains a preset update step length, updates the initial connection parameter value based on the preset update step length to obtain an updated initial connection parameter value, and repeatedly executes steps 202-210 until the preset cycle stop condition is reached, obtaining a plurality of initial clustering results and the evaluation value corresponding to each initial clustering result. It then determines the initial clustering result corresponding to the largest evaluation value as the target clustering result.
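The repeat-and-select loop described here can be sketched as follows. The clustering and evaluation steps are passed in as callables so that the sketch stays self-contained; `search_connection_parameter`, `cluster_for_k` and `evaluate` are hypothetical names. Two further assumptions: the update adds the step length to the current value, and the stop condition is a fixed number of rounds.

```python
from typing import Callable, List, Tuple, TypeVar

Result = TypeVar("Result")


def search_connection_parameter(
    initial_k: int,
    step: int,
    max_rounds: int,
    cluster_for_k: Callable[[int], Result],
    evaluate: Callable[[Result], float],
) -> Tuple[int, Result, float]:
    """Try k = initial_k, initial_k + step, ... and keep the best-scoring result."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    history: List[Tuple[int, Result, float]] = []
    k = initial_k
    for _ in range(max_rounds):  # assumed stop condition: fixed number of rounds
        result = cluster_for_k(k)  # stands in for steps 202 to 206
        history.append((k, result, evaluate(result)))  # step 208
        k += step  # assumed update rule: add the preset step length
    return max(history, key=lambda item: item[2])


# Toy stand-ins so the sketch runs: the "result" is just k, and the score peaks at k = 5.
best_k, best_result, best_score = search_connection_parameter(
    initial_k=3,
    step=1,
    max_rounds=5,
    cluster_for_k=lambda k: k,
    evaluate=lambda r: -abs(r - 5),
)
```

With these stand-ins the loop tries k = 3 through 7 and returns k = 5. An evaluation-threshold stop condition could be added by breaking out of the loop once the score reaches a preset value.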

In this embodiment, an initial connection parameter value and the proximity graph corresponding to it are obtained, where a connection in the proximity graph indicates that the connected users to be divided have similar or identical characteristics. The maximal cliques corresponding to the proximity graph are determined based on the connection relations among the users to be divided; any two users to be divided in a maximal clique have similar or identical characteristics, so this step can be understood as a first clustering of the users to be divided. The maximal cliques are then clustered based on the users to be divided included in each maximal clique to obtain an initial clustering result, which can be understood as a second clustering performed according to the similar or identical characteristics among the maximal cliques. The initial clustering result is evaluated to obtain its evaluation value, the initial connection parameter value is updated, and the steps starting from obtaining the initial connection parameter value and its proximity graph are repeated until the preset cycle stop condition is reached. The initial clustering result with the largest evaluation value, that is, the most accurate one, is determined as the target clustering result, which improves the accuracy of user clustering.

In one embodiment, as shown in fig. 3, obtaining an initial connection parameter value and a proximity graph corresponding to the initial connection parameter value includes:

Step 302, obtaining an initial connection parameter value and a feature parameter set corresponding to each user to be divided.

The feature parameter set refers to a set composed of parameter values corresponding to a plurality of features representing users to be divided. The feature parameter set contains parameter values of multiple dimensions, wherein each dimension represents one feature of the user to be divided, and the feature parameter set comprises dimensions including, but not limited to, payment habit, consumption level, liability condition, family environment, purchase category, purchase quantity, purchase capability and the like.

The computer device obtains initial connection parameter values and corresponding characteristic parameter sets of each user to be divided.

Step 304, for each user to be divided, determining a connection user corresponding to the user to be divided based on the feature parameter set corresponding to the user to be divided and the feature parameter set corresponding to other users to be divided; the number of connection users corresponding to the users to be divided is equal to the initial connection parameter value.

A connection user refers to another user to be divided that has an adjacency relationship with the user to be divided; it can be understood as another user to be divided that is connected with the user to be divided in the proximity graph.

For each user to be divided, the computer device determines the connection users corresponding to the users to be divided based on the feature parameter set corresponding to the user to be divided and the feature parameter set corresponding to other users to be divided, wherein the number of the connection users corresponding to the users to be divided is equal to the initial connection parameter value.

In one embodiment, for each user to be divided, the computer device calculates the straight-line distance between the user to be divided and each of the other users to be divided based on their feature parameter sets, and determines the other users to be divided with the k smallest straight-line distances, where k is the initial connection parameter value, as the connection users corresponding to the user to be divided.

Step 306, connecting the user to be divided with the corresponding connection users to obtain a proximity graph corresponding to the initial connection parameter value.

The computer device connects the user to be divided with the corresponding connection user to obtain a proximity graph corresponding to the initial connection parameter value.
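A minimal sketch of the connecting step, assuming the connection users of each user are already known and that a connection is treated as undirected. Under that assumption a user chosen by many others can end up with more than k neighbours; the application does not say how this case is handled. The name `build_proximity_graph` and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def build_proximity_graph(connections: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Turn per-user connection lists into an undirected adjacency map."""
    adj: Dict[str, Set[str]] = {user: set() for user in connections}
    for user, neighbours in connections.items():
        for other in neighbours:
            if other == user:
                continue  # ignore self-links
            adj[user].add(other)
            adj.setdefault(other, set()).add(user)  # assumed: edges are symmetric
    return adj


# Hypothetical connection users for k = 1.
connections = {"a": ["b"], "b": ["c"], "c": ["b"]}
graph = build_proximity_graph(connections)
```

In this toy input user b ends up with two neighbours (a and c) even though k is 1, which shows the effect of the symmetry assumption.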

In this embodiment, according to a feature parameter set corresponding to a user to be divided and feature parameter sets corresponding to other users to be divided, a connection user corresponding to the user to be divided is determined, the user to be divided is connected with the corresponding connection user, a proximity graph corresponding to an initial connection parameter value is obtained, and a connection relationship in the proximity graph represents that the users to be divided have similar or identical characteristics.

In one embodiment, for each user to be divided, determining a connection user corresponding to the user to be divided based on a feature parameter set corresponding to the user to be divided and a feature parameter set corresponding to other users to be divided includes:

for each user to be divided, determining the cosine similarity between the feature parameter set corresponding to the user to be divided and the feature parameter set corresponding to each of the other users to be divided, to obtain the feature similarity between the user to be divided and each of the other users to be divided; and comparing the feature similarities, and determining the other users to be divided corresponding to the k largest feature similarities, where k is the initial connection parameter value, as the connection users corresponding to the user to be divided.

Cosine similarity refers to the cosine of the angle between the two feature parameter sets treated as vectors, and is used to evaluate the similarity between the users to be divided corresponding to the two feature parameter sets. Feature similarity refers to a numerical value characterizing the similarity between a user to be divided and another user to be divided.

For each user to be divided, the computer device determines the cosine similarity between the feature parameter set corresponding to the user to be divided and the feature parameter set corresponding to each of the other users to be divided, obtaining the feature similarity between the user to be divided and each of the other users to be divided. It then compares the feature similarities corresponding to the user to be divided and determines the other users to be divided with the k largest feature similarities, where k is the initial connection parameter value, as the connection users corresponding to the user to be divided.
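The selection of connection users by cosine similarity can be sketched as below. The function names, the dictionary layout and the three sample feature vectors are assumptions for illustration; ties are broken by input order, which the application does not specify.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_connections(
    features: Dict[str, Sequence[float]], k: int
) -> Dict[str, List[str]]:
    """For each user, pick the k other users with the largest cosine similarity."""
    result: Dict[str, List[str]] = {}
    for user, vec in features.items():
        scored = [
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in features.items()
            if other != user
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[user] = [other for _, other in scored[:k]]
    return result


# Hypothetical two-dimensional feature parameter sets.
features = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
connections = top_k_connections(features, k=1)
```

With k = 1, users a and b select each other, and user c selects b because b has a small component along c's direction while a has none.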

In this embodiment, the other users to be divided corresponding to the k largest feature similarities, where k is the initial connection parameter value, are determined as the connection users corresponding to the user to be divided. In other words, the other users to be divided whose features are most similar to those of the user to be divided become its connection users, which provides accurate basic data for generating the proximity graph.

In one embodiment, as shown in fig. 4, obtaining a set of feature parameters corresponding to each user to be divided includes:

Step 402, obtaining an initial feature set corresponding to a user to be divided.

The initial feature set refers to a set composed of feature initial values corresponding to a plurality of features of a user to be divided, and the feature initial values refer to unprocessed feature values.

Illustratively, the computer device obtains an initial feature set corresponding to the user to be partitioned.

Step 404, performing digital processing on the initial feature set to obtain an initial parameter set corresponding to the user to be divided; the initial parameter set comprises parameter values corresponding to a plurality of dimensions.

The digitization process refers to converting non-numeric feature values into numbers, where non-numeric feature values are features described in text, for example, the profession of the user to be divided. The initial parameter set refers to a set composed of a plurality of initial parameter values corresponding to the user to be divided, and can be understood as the set obtained by digitizing the initial feature set. The dimensions refer to the features of the user to be divided, and different dimensions represent different features. The parameter value refers to the initial parameter value corresponding to a feature.

The computer device performs digital processing on the initial feature set to obtain an initial parameter set corresponding to the user to be divided, wherein the initial parameter set comprises parameter values corresponding to a plurality of dimensions.

Step 406, for each dimension, performing standardization processing on parameter values of the same dimension of the plurality of users to be divided, to obtain feature parameters corresponding to the dimension of the users to be divided.

The standardization processing is a normalization process, which can be understood as converting each parameter value into a value between 0 and 1. The feature parameters are the values between 0 and 1 obtained by standardizing the parameter values.

For each dimension, the computer device calculates the mean and standard deviation of the parameter values of that dimension across the plurality of users to be divided and, based on the mean and standard deviation, applies a normal function (a mathematical function used to normalize the magnitudes of the elements of a vector or matrix so that the elements sum to 1) to perform the standardization processing, obtaining the feature parameter of each user to be divided in that dimension.

Step 408, obtaining a feature parameter set corresponding to the user to be divided based on the feature parameters of each dimension of the user to be divided.

Illustratively, the computer device combines the feature parameters of each dimension of the user to be divided into the feature parameter set corresponding to that user.

In this embodiment, the initial feature set corresponding to the user to be divided is subjected to digital processing and standardized processing to obtain the feature parameter set corresponding to the user to be divided, and feature parameters of each dimension in the feature parameter set are numbers between 0 and 1, so that basic data is provided for calculating cosine similarity between the user to be divided and other users to be divided.
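Steps 402 to 408 can be sketched as follows, under stated assumptions. The text mentions a mean, a standard deviation and a "normal function" without giving its formula, and also states that the resulting feature parameters lie between 0 and 1; this sketch swaps in plain min-max scaling, which guarantees that range, and should not be read as the function used in the application. Encoding text features as integer codes is likewise an assumption, and all names (`digitize`, `scale_01`, `feature_parameter_sets`) are hypothetical.

```python
import numpy as np

def digitize(column: list[str]) -> np.ndarray:
    """Map each distinct text value to an integer code (codes follow sorted order)."""
    codes = {value: i for i, value in enumerate(sorted(set(column)))}
    return np.array([codes[v] for v in column], dtype=float)

def scale_01(values: np.ndarray) -> np.ndarray:
    """Min-max scale one dimension across all users into [0, 1] (assumed method)."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant dimension carries no information; map it to 0.
        return np.zeros_like(values, dtype=float)
    return (values - lo) / (hi - lo)

def feature_parameter_sets(raw: dict[str, list]) -> np.ndarray:
    """Build one row of feature parameters per user from per-dimension columns."""
    columns = []
    for col in raw.values():
        if isinstance(col[0], str):
            numeric = digitize(col)                  # step 404: digital processing
        else:
            numeric = np.asarray(col, dtype=float)
        columns.append(scale_01(numeric))            # step 406: per-dimension scaling
    return np.column_stack(columns)                  # step 408: one row per user
```

For example, an `age` column of `[20, 30, 40]` becomes `[0.0, 0.5, 1.0]`, and each row of the returned array is the feature parameter set of one user to be divided.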

In one embodiment, clustering the maximum cliques based on the users to be divided included in each maximum clique to obtain an initial clustering result includes:

for each maximum clique, counting the users to be divided included in the maximum clique to obtain the statistical quantity corresponding to the maximum clique; and clustering the maximum cliques with the same statistical quantity to obtain an initial clustering result.

The statistical quantity refers to the number of users to be divided contained in a maximum clique.

The computer device counts the users to be divided included in each maximum clique to obtain the statistical quantity corresponding to that maximum clique, and then clusters the maximum cliques with the same statistical quantity to obtain an initial clustering result.

In this embodiment, the statistical quantity of each maximum clique is used as the feature of that maximum clique, and the maximum cliques with the same feature are clustered together to obtain an initial clustering result.
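Grouping by statistical quantity can be sketched as below. The text does not say how the grouped cliques are merged into user sets or how a user appearing in several cliques is handled, so this sketch only groups the cliques by size and leaves them otherwise untouched; the function name `cluster_by_clique_size` is hypothetical.

```python
from collections import defaultdict

def cluster_by_clique_size(cliques):
    """Group cliques by the number of users they contain.

    Returns a dict mapping statistical quantity (clique size) to the
    list of cliques of that size, in their original order.
    """
    groups = defaultdict(list)
    for clique in cliques:
        groups[len(clique)].append(frozenset(clique))
    return dict(groups)
```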

In one embodiment, as shown in fig. 5, updating the initial connection parameter value to obtain an updated initial connection parameter value, repeatedly executing the steps of obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, determining an initial clustering result corresponding to the maximum evaluation value as a target clustering result, including:

Step 502, obtaining a preset update step length, and fusing the initial connection parameter value with the preset update step length to obtain an updated initial connection parameter value.

The preset update step length is a step length, set in advance, by which the initial connection parameter value is updated; it can be set according to actual requirements.

The computer device obtains a preset update step length, and adds the initial connection parameter value to the preset update step length to obtain an updated initial connection parameter value.

Step 504, repeatedly executing the steps of obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, to obtain a plurality of initial clustering results and the evaluation value corresponding to each initial clustering result.

Illustratively, the computer device repeatedly executes steps 202 to 210 until a preset cycle stop condition is reached, and a plurality of initial clustering results and evaluation values corresponding to the initial clustering results are obtained.

Step 506, comparing the evaluation values, and determining the initial clustering result corresponding to the maximum evaluation value as the target clustering result.

Illustratively, the computer device compares the plurality of evaluation values and determines an initial cluster result corresponding to the largest evaluation value as a target cluster result.

In this embodiment, the initial clustering result corresponding to the maximum evaluation value is determined as the target clustering result; the larger the evaluation value, the more accurate the clustering result. By updating the initial connection parameter value multiple times, the users to be divided are clustered multiple times, and the initial clustering result with the highest accuracy is determined as the target clustering result, which improves the accuracy of the target clustering result.
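Steps 502 to 506 amount to a one-dimensional search over the connection parameter value, which can be sketched as follows. The clustering routine and the evaluation function are passed in as the hypothetical callables `cluster_for_k` and `evaluate`, and the cycle stop condition is assumed here to be an upper bound `k_max`; the text only says that the condition is preset.

```python
from typing import Callable

def search_connection_parameter(
    cluster_for_k: Callable[[int], object],
    evaluate: Callable[[object], float],
    k_init: int,
    step: int,
    k_max: int,
):
    """Try k_init, k_init + step, ... up to k_max and keep the best-scoring result."""
    if step <= 0:
        raise ValueError("step must be positive, otherwise the loop never ends")
    best_k, best_result, best_score = None, None, float("-inf")
    k = k_init
    while k <= k_max:                      # assumed cycle stop condition
        result = cluster_for_k(k)          # proximity graph -> cliques -> initial clustering
        score = evaluate(result)           # evaluation value of this initial clustering
        if score > best_score:
            best_k, best_result, best_score = k, result, score
        k += step                          # step 502: add the preset update step length
    return best_k, best_result, best_score
```

The second element of the returned tuple corresponds to the target clustering result, that is, the initial clustering result with the maximum evaluation value.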

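In one embodiment, the target clustering result includes a plurality of cluster sets, and each cluster set includes a plurality of users to be divided; the method further includes: determining a target product identifier corresponding to each cluster set; and sending the target product identifier to each user to be divided in the corresponding cluster set.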

The target product identifier is a character string representing a product suitable for recommendation to the users to be divided in a cluster set, and may be the name or the number of the target product. There may be one or more target product identifiers.

Illustratively, the computer device determines a target product identification corresponding to each cluster set, and then sends the target product identification to each user to be partitioned in the corresponding cluster set.

In this embodiment, the target product identifier corresponding to each cluster set is sent to each user to be divided in that cluster set, which improves the accuracy with which target product identifiers are delivered.
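The delivery step can be sketched as a simple lookup. How the target product identifier of a cluster set is determined, and the channel over which it is sent, are not described in the text, so this sketch only builds the per-user list of identifiers; the function name `build_notifications` and the dictionary layouts are assumptions.

```python
def build_notifications(cluster_sets, product_ids):
    """Map each user to the target product identifiers of its cluster set(s).

    cluster_sets: dict of cluster id -> iterable of user ids
    product_ids:  dict of cluster id -> list of target product identifiers
    """
    notifications = {}
    for cluster_id, users in cluster_sets.items():
        for user in users:
            # A cluster without identifiers contributes nothing to its users.
            notifications.setdefault(user, []).extend(product_ids.get(cluster_id, []))
    return notifications
```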

In one exemplary embodiment, the user clustering flow is as follows:

The computer device obtains the initial feature set corresponding to each user to be divided, and performs digital processing on the initial feature set to obtain the initial parameter set corresponding to the user to be divided. For each dimension, the computer device calculates the mean and standard deviation of the parameter values of that dimension across the plurality of users to be divided and, based on the mean and standard deviation, applies a normal function (a mathematical function used to normalize the magnitudes of the elements of a vector or matrix so that the elements sum to 1) to perform standardization processing, obtaining the feature parameter of the user to be divided in that dimension. The feature parameters of each dimension of the user to be divided are then combined into the feature parameter set corresponding to the user to be divided.

The computer device obtains the initial connection parameter value. For each user to be divided, the computer device determines the cosine similarity between the feature parameter set corresponding to the user to be divided and the feature parameter set corresponding to each of the other users to be divided, obtaining the feature similarity between the user to be divided and each other user to be divided. The computer device compares the plurality of feature similarities, and determines the other users to be divided that correspond to the largest feature similarities, their number being equal to the initial connection parameter value, as the connection users corresponding to the user to be divided. Each user to be divided is then connected with its corresponding connection users to obtain the proximity graph corresponding to the initial connection parameter value.

Using an improved Bron-Kerbosch algorithm, the computer device determines the maximum cliques corresponding to the proximity graph based on the connection relations among the users to be divided in the proximity graph, counts the users to be divided included in each maximum clique to obtain the statistical quantity corresponding to that maximum clique, and clusters the maximum cliques with the same statistical quantity to obtain an initial clustering result. The computer device then evaluates the initial clustering result to obtain the evaluation value corresponding to the initial clustering result.

The computer device obtains a preset update step length, adds the preset update step length to the initial connection parameter value to obtain an updated initial connection parameter value, and repeats the above steps until a preset cycle stop condition is reached, obtaining a plurality of initial clustering results and the evaluation value corresponding to each initial clustering result. The computer device compares the plurality of evaluation values and determines the initial clustering result corresponding to the maximum evaluation value as the target clustering result, where the target clustering result includes a plurality of cluster sets and each cluster set includes a plurality of users to be divided. The computer device determines the target product identifier corresponding to each cluster set, and then sends the target product identifier to each user to be divided in the corresponding cluster set.
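The clique-finding step of the flow above can be sketched as follows. The text names an "improved Bron-Kerbosch algorithm" without stating the improvement; this sketch uses the well-known pivoting variant, which is an assumption and may differ from the application. It also assumes that connections are treated as undirected, so a user may end up with more neighbours than the connection parameter value. The function names are hypothetical, and the cliques returned are what graph theory calls maximal cliques.

```python
def build_proximity_graph(neighbors):
    """Build undirected adjacency sets from per-user lists of connection users."""
    adj = {u: set() for u in range(len(neighbors))}
    for u, conns in enumerate(neighbors):
        for v in conns:
            adj[u].add(v)
            adj[v].add(u)   # assumption: a connection holds in both directions
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques
```

In a graph consisting of a triangle 0-1-2 with an extra edge 2-3, the function yields the two cliques {0, 1, 2} and {2, 3}.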

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the present application also provide a user clustering device for implementing the user clustering method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more user clustering device embodiments provided below, reference may be made to the limitations of the user clustering method above, which are not repeated here.

In one embodiment, as shown in fig. 6, there is provided a user clustering apparatus, including: an acquisition module 602, a determination module 604, a clustering module 606, an evaluation module 608, and a comparison module 610, wherein:

an acquisition module 602, configured to obtain an initial connection parameter value and a proximity graph corresponding to the initial connection parameter value; the proximity graph comprises a plurality of users to be divided, and the number of other users to be divided connected with each user to be divided is equal to the initial connection parameter value;

a determination module 604, configured to determine the maximum cliques corresponding to the proximity graph based on the connection relations among the users to be divided in the proximity graph; each maximum clique comprises a plurality of users to be divided, any two of which have a connection relation;

a clustering module 606, configured to cluster the maximum cliques based on the users to be divided included in each maximum clique, so as to obtain an initial clustering result;

the evaluation module 608 is configured to evaluate the initial clustering result to obtain an evaluation value corresponding to the initial clustering result;

a comparison module 610, configured to update the initial connection parameter value to obtain an updated initial connection parameter value, repeatedly execute the steps of obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, and determine the initial clustering result corresponding to the maximum evaluation value as the target clustering result.

In one embodiment, the acquisition module 602 is further configured to: acquire an initial connection parameter value and the feature parameter set corresponding to each user to be divided; for each user to be divided, determine the connection users corresponding to the user to be divided based on the feature parameter set corresponding to the user to be divided and the feature parameter sets corresponding to other users to be divided, the number of connection users corresponding to the user to be divided being equal to the initial connection parameter value; and connect the user to be divided with the corresponding connection users to obtain the proximity graph corresponding to the initial connection parameter value.

In one embodiment, the acquisition module 602 is further configured to: for each user to be divided, determine the cosine similarity between the feature parameter set corresponding to the user to be divided and the feature parameter sets corresponding to other users to be divided, obtaining the feature similarity between the user to be divided and each other user to be divided; and compare the feature similarities, and determine the other users to be divided that correspond to the largest feature similarities, their number being equal to the initial connection parameter value, as the connection users corresponding to the user to be divided.

In one embodiment, the acquisition module 602 is further configured to: acquiring an initial feature set corresponding to a user to be divided; performing digital processing on the initial feature set to obtain an initial parameter set corresponding to the user to be divided; the initial parameter set comprises parameter values corresponding to a plurality of dimensions; for each dimension, carrying out standardization processing on parameter values of the same dimension of a plurality of users to be divided to obtain characteristic parameters corresponding to the dimension of the users to be divided; and obtaining a characteristic parameter set corresponding to the user to be divided based on the characteristic parameters of each dimension of the user to be divided.

In one embodiment, the clustering module 606 is further configured to: for each maximum clique, count the users to be divided included in the maximum clique to obtain the statistical quantity corresponding to the maximum clique; and cluster the maximum cliques with the same statistical quantity to obtain an initial clustering result.

In one embodiment, the comparison module 610 is further configured to: acquire a preset update step length, and fuse the initial connection parameter value with the preset update step length to obtain an updated initial connection parameter value; repeatedly execute the steps of obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, obtaining a plurality of initial clustering results and the evaluation value corresponding to each initial clustering result; and compare the evaluation values, and determine the initial clustering result corresponding to the maximum evaluation value as the target clustering result.

In one embodiment, the user clustering device further comprises a transmission module, and the transmission module is configured to: determining a target product identifier corresponding to each cluster set; and sending the target product identification to each user to be divided in the corresponding cluster set.

Each module in the above user clustering device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can invoke them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a user clustering method. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by all parties.

Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored on a non-transitory computer readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above embodiments represent only several implementations of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of user clustering, the method comprising:

2. The method according to claim 1, wherein the obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value includes:

3. The method according to claim 2, wherein the determining, for each of the users to be divided, the connection user corresponding to the user to be divided based on the set of feature parameters corresponding to the user to be divided and the set of feature parameters corresponding to other users to be divided, includes:

4. The method according to claim 2, wherein the obtaining the set of feature parameters corresponding to each of the users to be divided includes:

acquiring an initial feature set corresponding to the user to be divided;

5. The method according to claim 1, wherein the clustering the maximum cliques based on the users to be partitioned included in each maximum clique to obtain an initial clustering result includes:

6. The method according to claim 1, wherein the step of updating the initial connection parameter value to obtain an updated initial connection parameter value, repeating the steps of obtaining the initial connection parameter value and the proximity graph corresponding to the initial connection parameter value until a preset cycle stop condition is reached, determining an initial clustering result corresponding to the maximum evaluation value as a target clustering result, includes:

7. The method according to claim 1, wherein the target clustering result comprises a plurality of clustering sets, and each clustering set comprises a plurality of users to be partitioned; the method further comprises the steps of:

determining a target product identifier corresponding to each cluster set;

8. A user clustering device, the device comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.