CN112541530B - Data preprocessing method and device for clustering model - Google Patents

Data preprocessing method and device for clustering model

Info

Publication number
CN112541530B
CN112541530B (application CN202011409579.9A)
Authority
CN
China
Prior art keywords
vector, mapping, node, module, characterization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011409579.9A
Other languages
Chinese (zh)
Other versions
CN112541530A (en)
Inventor
熊涛
赵文龙
吴若凡
漆远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202011409579.9A
Publication of CN112541530A
Application granted
Publication of CN112541530B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a method for preprocessing data for a clustering model and for clustering business entities using an attribute map. The attribute map is characterized by characterization vectors and, based on information theory, the clustering model is trained using the information loss incurred in the transfer between a characterization vector and the prototype vectors of the cluster categories. This information loss is measured by the similarity between the characterization vector and a mapping vector determined from the prototype vectors. Further, when determining mutual information, substituting the empirical probability distribution for the expectations over the overall distribution yields an empirically computable approximation of the mutual information. The method thus puts information theory to effective use and provides a business entity clustering method that exploits the attribute map more effectively.

Description

Data preprocessing method and device for clustering model
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for preprocessing data for a clustering model and clustering business entities by using an attribute map.
Background
With the development of computer technology, the application of graph data is becoming more and more widespread. Graph data is a data form that describes the association relationships between entities; its visual representations include relational networks, knowledge graphs, and the like. Graph data generally includes a plurality of nodes, each corresponding to a business entity. Where business entities have a predefined association, the corresponding nodes have a corresponding association relationship. For example, in graph data represented by a plurality of triples, the triple (a, r, b) indicates that the association relationship r exists between node a and node b. In a visualized relational network, node a and node b can be connected by an edge corresponding to the association relationship r, as illustrated by the sketch below.
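For illustration only, the following is a minimal Python sketch (not taken from the patent; the entity names are hypothetical) of representing graph data as triples (a, r, b) and deriving an adjacency structure from them:

```python
from collections import defaultdict

# Hypothetical triples: (entity_a, relation, entity_b)
triples = [
    ("user_a", "transfer", "user_b"),
    ("user_b", "chat", "user_c"),
]

# Build an adjacency list; the relation r labels the connecting edge.
adjacency = defaultdict(list)
for a, r, b in triples:
    adjacency[a].append((r, b))  # edge a -(r)- b
    adjacency[b].append((r, a))  # treated as undirected here

print(adjacency["user_b"])  # [('transfer', 'user_a'), ('chat', 'user_c')]
```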
An attribute map is graph data in which each node is described by a plurality of attributes, and some nodes may have rather discrete attributes. Such discrete attributes complicate business processing based on the attribute map. How to process attribute maps efficiently, especially graph data containing nodes with discrete attributes, is therefore a problem worth considering.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for preprocessing data for a clustering model and clustering business entities using attribute maps, so as to solve one or more of the problems mentioned in the background art.
According to a first aspect, a data preprocessing method for a clustering model is provided, the clustering model being used for clustering business entities by using an attribute map, wherein the attribute map comprises a plurality of nodes in one-to-one correspondence with a plurality of business entities, each node has a feature vector determined based on the attributes of the corresponding business entity, the clustering model comprises an encoding module, a mapping module and a discrimination module, and the plurality of nodes comprise a first node; the method comprises: processing the attribute map by using the encoding module to obtain the characterization vectors corresponding to the respective nodes, the first node corresponding to a first characterization vector; determining, by the mapping module and using the first characterization vector, a first mapping vector that maps the first node to several cluster categories, wherein the first mapping vector is formed by combining the prototype vectors respectively corresponding to the cluster categories, with the combination parameters determined based on the first characterization vector; detecting, based on the discrimination module, the degree of similarity between the first characterization vector and the first mapping vector, thereby determining a clustering loss of the clustering model, wherein the degree of similarity between the first characterization vector and the first mapping vector is determined via empirical mutual information constructed on the basis of a discriminant function, in which the empirical distributions of the characterization vectors and mapping vectors replace the overall distributions, and the clustering loss is inversely related to the degree of similarity between the first characterization vector and the first mapping vector; and adjusting the model parameters of the encoding module, the prototype vectors, and the intermediate vector in the discriminant function of the discrimination module with the aim of minimizing the clustering loss, thereby training the clustering model.
According to one embodiment, the encoding module is a graph neural network, and the first characterization vector is determined based on a fusion result of a feature vector of the first node and a feature vector of a neighboring node thereof.
According to one embodiment, the first mapping vector is determined by: determining each importance coefficient corresponding to each prototype vector based on the first characterization vector and each prototype vector; and combining the prototype vectors in a weighted summation mode according to the combination parameters determined by the importance coefficients to obtain the first mapping vector.
According to one embodiment, each importance coefficient is determined based on an attention mechanism, the prototype vectors comprising a first prototype vector, and the first importance coefficient corresponding to the first prototype vector is positively correlated with the similarity between the first prototype vector and the first characterization vector and negatively correlated with the sum of the similarities between each prototype vector and the first characterization vector.
According to one embodiment, detecting, based on the discrimination module, the degree of similarity between the first characterization vector and the first mapping vector comprises: determining the similarity between the first characterization vector and the first mapping vector based on the product of the first characterization vector, the intermediate vector of the discriminant function, and the first mapping vector.
According to one embodiment, the clustering loss is also positively correlated with the degree of similarity between the first characterization vector and the mapping vectors corresponding to other nodes.
According to one embodiment, detecting, based on the discrimination module, the degree of similarity between the first characterization vector and the first mapping vector comprises: updating the first characterization vector with a weighted combination of the first characterization vector and the first mapping vector; and detecting, based on the discrimination module, the degree of similarity between the updated first characterization vector and the first mapping vector.
According to a second aspect, a data preprocessing method for a clustering model is provided, the clustering model being used for clustering business entities by using an attribute map, wherein the attribute map comprises a plurality of nodes in one-to-one correspondence with a plurality of business entities, each node has a feature vector determined based on the attributes of the corresponding business entity, the clustering model comprises an encoding module, a mapping module and a discrimination module, and the plurality of nodes comprise a first node; the method comprises: processing the attribute map by using the encoding module to obtain the characterization vectors corresponding to the respective nodes, the first node corresponding to a first characterization vector; determining a coding loss of the encoding module based on the degree of similarity between the first characterization vector and the first feature vector corresponding to the first node; adjusting the model parameters of the encoding module with the aim of minimizing the coding loss; processing the attribute map by using the encoding module with adjusted model parameters to obtain a third characterization vector corresponding to the first node; determining, by the mapping module and using the third characterization vector, a first mapping vector that maps the first node to several cluster categories, wherein the first mapping vector is formed by combining the prototype vectors respectively corresponding to the cluster categories, with the combination parameters determined based on the first characterization vector; detecting, based on the discrimination module, the degree of similarity between the third characterization vector and the first mapping vector, thereby determining a clustering loss of the clustering model, wherein the degree of similarity between the third characterization vector and the first mapping vector is determined via empirical mutual information constructed on the basis of a discriminant function, in which the empirical distributions of the characterization vectors and mapping vectors replace the overall distributions, and the clustering loss is inversely related to the degree of similarity between the third characterization vector and the first mapping vector; and adjusting the prototype vectors and the intermediate vector in the discriminant function with the aim of minimizing the clustering loss, thereby training the mapping module and the discrimination module.
According to one embodiment, the encoding module is a graph neural network, and the characterization vector of the first node is determined based on a fusion result of the feature vector of the first node and the feature vector of its neighboring nodes.
According to one embodiment, the degree of similarity between the first characterization vector and the first feature vector corresponding to the first node is measured via a first discriminant function, the similarity being determined based on the product of the first characterization vector, the intermediate vector of the first discriminant function, and the first feature vector.
According to one embodiment, the clustering loss is also positively correlated with the degree of similarity between the first characterization vector and the mapping vectors corresponding to other nodes.
According to one embodiment, the attribute map corresponds to a change map whose feature vectors have been randomly adjusted; the change map contains a second node corresponding to the first node, and the second node corresponds to a second characterization vector obtained by processing the change map with the encoding module; the coding loss then also includes a term negatively correlated with the degree of similarity between the first feature vector and the second characterization vector.
According to a third aspect, a method for clustering business entities is provided, for clustering business entities by using an attribute map through a pre-trained clustering model, wherein the attribute map comprises a plurality of nodes in one-to-one correspondence with a plurality of business entities, each node has a feature vector determined based on the attributes of the corresponding business entity, and the clustering model comprises an encoding module, a mapping module and a discrimination module; the method comprises: processing the attribute map by using the encoding module to obtain the characterization vectors corresponding to the respective nodes; determining, by the mapping module and using the characterization vectors, the mapping vectors obtained by mapping the respective nodes to several cluster categories, wherein a single mapping vector is formed by combining the prototype vectors respectively corresponding to the cluster categories, with the combination parameters determined based on the corresponding characterization vector; detecting, based on the discrimination module, the degree of similarity between the characterization vector of a first node and the mapping vector of a second node; and, where the degree of similarity satisfies a preset condition, determining that the business entity corresponding to the first node and the business entity corresponding to the second node belong to the same cluster category.
According to a fourth aspect, a data preprocessing device for a clustering model is provided, where the clustering model is used for clustering service entities by using an attribute map, the attribute map includes a plurality of nodes corresponding to a plurality of service entities one to one, each node has a feature vector determined based on an attribute of a corresponding service entity, the clustering model includes an encoding module, a mapping module and a discriminating module, and the plurality of nodes includes a first node; the device comprises:
the coding unit is configured to process the attribute map by utilizing the coding module to obtain each characterization vector corresponding to each node, and the first node corresponds to the first characterization vector;
the mapping unit is configured to determine, through the mapping module, a first mapping vector for mapping the first node to a plurality of cluster categories by using the first characterization vector, wherein the first mapping vector is formed by combining prototype vectors corresponding to the cluster categories respectively, and a combination parameter is determined based on the first characterization vector;
a discrimination unit configured to detect, based on the discrimination module, the degree of similarity between the first characterization vector and the first mapping vector, thereby determining a clustering loss of the clustering model, wherein the degree of similarity between the first characterization vector and the first mapping vector is determined via empirical mutual information constructed on the basis of a discriminant function, in which the empirical distributions of the characterization vectors and mapping vectors replace the overall distributions, the clustering loss being inversely related to the degree of similarity between the first characterization vector and the first mapping vector;
and an adjusting unit configured to adjust the model parameters of the encoding module, the prototype vectors, and the intermediate vector in the discriminant function of the discrimination module with the aim of minimizing the clustering loss, thereby training the clustering model.
According to a fifth aspect, there is provided a data preprocessing apparatus for a clustering model for clustering business entities using an attribute map, wherein the attribute map includes a plurality of nodes in one-to-one correspondence with a plurality of business entities, each node has a feature vector determined based on an attribute of a corresponding business entity, the clustering model includes a coding module, a mapping module, and a discriminating module, and the plurality of nodes includes a first node; the device comprises:
the coding unit is configured to process the attribute map by utilizing the coding module to obtain each characterization vector corresponding to each node, and the first node corresponds to the first characterization vector;
a first discrimination unit configured to determine a coding loss of the coding module based on a degree of similarity between the first characterization vector and a first feature vector corresponding to the first node;
a first adjustment unit configured to adjust model parameters of the encoding module with the aim of minimizing the encoding loss;
the coding unit being further configured to process the attribute map by using the encoding module with adjusted model parameters, to obtain a third characterization vector corresponding to the first node;
the mapping unit is configured to determine, through the mapping module, a first mapping vector for mapping the first node to a plurality of cluster categories by using the third characterization vector, wherein the first mapping vector is formed by combining prototype vectors corresponding to the cluster categories respectively, and a combination parameter is determined based on the first characterization vector;
a second discrimination unit configured to detect, based on the discrimination module, the degree of similarity between the third characterization vector and the first mapping vector, thereby determining a clustering loss of the clustering model, wherein the degree of similarity between the third characterization vector and the first mapping vector is determined via empirical mutual information constructed on the basis of a discriminant function, in which the empirical distributions of the characterization vectors and mapping vectors replace the overall distributions, the clustering loss being inversely related to the degree of similarity between the third characterization vector and the first mapping vector;
and a second adjusting unit configured to adjust the prototype vectors and the intermediate vector in the discriminant function with the aim of minimizing the clustering loss, thereby training the mapping module and the discrimination module.
According to a sixth aspect, there is provided an apparatus for clustering business entities, configured to perform business entity clustering by using an attribute map through a pre-trained clustering model, where the attribute map includes a plurality of nodes corresponding to a plurality of business entities one to one, each node has a feature vector determined based on an attribute of a corresponding business entity, and the clustering model includes a coding module, a mapping module, and a discriminating module; the device comprises:
the coding unit is configured to process the attribute map by utilizing the coding module to obtain each characterization vector corresponding to each node;
the mapping unit is configured to respectively determine mapping vectors obtained by mapping the nodes to a plurality of clustering categories by utilizing the characterization vectors through the mapping module, wherein a single mapping vector is formed by combining prototype vectors respectively corresponding to the clustering categories, and combination parameters are determined based on the corresponding characterization vectors;
a discriminating unit configured to detect a degree of similarity of the characterization vector of the first node and the mapping vector of the second node based on the discriminating module;
and the determining unit is configured to determine that the service entity corresponding to the first node and the service entity corresponding to the second node belong to the same cluster category under the condition that the similarity degree meets a preset condition.
According to a seventh aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first, second or third aspects.
According to an eighth aspect, there is provided a computing device comprising a memory and a processor, characterised in that the memory has executable code stored therein, the processor, when executing the executable code, implementing the method of the first, second or third aspect.
According to the method and device provided by the embodiments of this specification, based on information theory, the attribute map is characterized by characterization vectors, and the clustering model is trained using the information loss incurred in the transfer between the characterization vectors and the prototype vectors of the cluster categories. Further, when determining mutual information, substituting the empirical probability distribution for the expectations over the overall distribution yields an empirically computable approximation of the mutual information. The method thus makes effective use of information theory and provides a business entity clustering method that exploits the attribute map more effectively.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a specific implementation architecture under the technical concepts of the present specification;
FIG. 2 is a schematic diagram of a specific architecture of a clustering model under the technical concept of the present specification;
FIG. 3 illustrates a method flow diagram for data preprocessing for a cluster model, according to one embodiment;
FIG. 4 shows a method flow diagram for data preprocessing for a cluster model, according to another embodiment;
FIG. 5 illustrates a flow chart of a method of clustering business entities according to another embodiment;
FIG. 6 shows a schematic block diagram of an apparatus for data preprocessing for a cluster model, according to one embodiment;
FIG. 7 shows a schematic block diagram of an apparatus for data preprocessing for a cluster model according to another embodiment;
fig. 8 shows a schematic block diagram of an apparatus for clustering business entities according to another embodiment.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
First, a description will be given with reference to one embodiment scenario shown in fig. 1. As shown in fig. 1, a specific implementation scenario of node clustering based on graph data is shown. In this implementation scenario, the computing platform may acquire pre-built graph data and then cluster nodes in the graph data through a clustering model for the graph data.
Each entity corresponding to each node of the graph data is associated with a specific business scenario. In the case where a specific service scenario is related to a user, such as community discovery or user grouping, each service entity corresponding to each node in the graph data may be, for example, a user or the like. In a specific scenario of paper classification, social platform article classification, etc., each business entity corresponding to each node in the graph data may be, for example, an article, etc. In other specific service scenarios, the service entity corresponding to the graph data may also be any other reasonable entity, which is not limited herein.
In the graph data, the entity corresponding to a single node can have various attributes on which the clustering depends; such graph data may therefore also be called an attribute map. For example, a business entity that is a user may have attributes such as age, income, and frequented locations. A business entity that is an article may have attributes such as keywords, the field it belongs to, and its circulation. In an alternative embodiment, every two nodes with an association relationship may also have association attributes, which may serve as edge attributes of the corresponding connecting edge. For example, users associated through social behavior may have social attributes between them (e.g., chat frequency, transfer behavior, red packet behavior, etc.), i.e., association attributes between the two corresponding nodes, which may be the edge attributes of the connecting edge between those nodes.
The computing platform may process the above attribute map through a clustering model to classify the nodes into a plurality of categories. As shown in fig. 1, the business entities corresponding to nodes X1, X3, X8, X9, etc. form one cluster category, the business entities corresponding to nodes X2, X4, X6, X7, etc. form another cluster category, and so on.
Where the business entities corresponding to the nodes have discrete attributes, clustering directly on node expression vectors determined from those attributes can hurt the accuracy of the clustering result. For this reason, under the implementation architecture of this specification, the clustering model is divided into two parts: characterization of the business entities, and clustering based on that characterization. To this end, this specification provides a clustering model with the encoding module, mapping module, and discrimination module shown in fig. 1. The encoding module may be implemented as an encoder that determines the corresponding characterization vector for each node. The characterization vector describes the corresponding node more fully and accurately, and can be viewed as a hidden representation that allows the nodes to be clustered better. The mapping module completes the mapping from characterization vectors to cluster categories and determines the corresponding mapping vectors. The discrimination module clusters the nodes based on discriminating the degree of similarity between characterization vectors and mapping vectors.
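For orientation, the following hypothetical Python skeleton shows how the three modules described above might fit together; the callables `encode`, `map_to_prototypes`, and `discriminate` are placeholder names filled in by the sketches later in this description, not interfaces defined by the patent:

```python
class ClusteringModel:
    """Skeleton of the three-part model: encoder -> mapper -> discriminator."""

    def __init__(self, encode, map_to_prototypes, discriminate):
        self.encode = encode                        # encoding module: graph -> Z
        self.map_to_prototypes = map_to_prototypes  # mapping module: z -> Q(z)
        self.discriminate = discriminate            # discrimination module: (z, q) -> score

    def node_score(self, graph, node):
        Z = self.encode(graph)         # characterization vectors for all nodes
        z = Z[node]
        q = self.map_to_prototypes(z)  # mapping vector over cluster prototypes
        return self.discriminate(z, q)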
For ease of understanding, the technical idea process of the present specification is described below in connection with the model architecture shown in fig. 2.
Assume that a single node's features are extracted from its attributes, that the resulting feature vector is denoted X, and that each dimension of X is denoted x_i. The graph data at this point is the "original graph" G(V, E, X) among the input graph data shown in fig. 2. Assume that each feature inclines the sample toward some cluster category, and denote by y_i the cluster category onto which a single feature x_i maps. Assuming that the total number of cluster categories is K, y_i takes its value among the K cluster categories. According to information theory, the mutual information of two random variables is the KL divergence between their joint distribution and the product of their marginal distributions; the larger this divergence, the larger the difference between the product of the marginal distributions and the joint distribution, and the higher the correlation of the two random variables. For example, taking the feature vector X and the cluster Y of a sample as two random variables, the information transfer from X to Y can be represented via the KL divergence as:

$$I(X;Y) = D_{KL}\big(p(X,Y)\,\|\,p(X)\,p(Y)\big) \tag{1}$$

Those skilled in the art will readily appreciate that, when a model processes data, information is transferred between the input and output variables; this transfer can be described by mutual information, and when the mutual information is maximal the information loss can be considered to be 0. Thus, in a clustering process, it is desirable to maximize the mutual information between the input and the output. To obtain an empirical estimate, the empirical distribution of X is used to approximate p(x), and a clustering model $p_\theta(y|x)$ is introduced. The resulting empirical mutual information is:

$$\hat{I}(X;Y) = \frac{1}{N}\sum_{i=1}^{N}\sum_{y=1}^{K} p_\theta(y\,|\,x_i)\,\log\frac{p_\theta(y\,|\,x_i)}{\hat{p}(y)},\qquad \hat{p}(y)=\frac{1}{N}\sum_{i=1}^{N} p_\theta(y\,|\,x_i) \tag{2}$$
to better characterize a node, a characterization vector Z of the node is defined, which may be the encoding result of each feature. Assume that a code pattern epsilon is used θ Then there is a relationship z=epsilon between the token vector Z and the feature vector X θ (X). In this way, discrete features may be fused in a coded manner. The coding scheme may be any suitable coding scheme, for example, a graph neural network GCN coding scheme shown in fig. 2, or an ebedding scheme.
Taking a graph neural network as an example, the feature vectors of a node and its neighbor nodes can be fused in the encoding process, so as to obtain an encoding vector that expresses each node more comprehensively. A single-layer graph neural network updates a node's expression vector from the fusion of the current expression vectors of the node itself and its neighbor nodes. The neighbor nodes of a single node may be the nodes connected to it by connecting edges. The neighbor nodes used in the encoding process may be predefined, for example neighbor nodes within 3 orders, or a random predetermined number (e.g., 5) of first-order neighbor nodes, etc. The fusion of the current expression vectors (initially, the feature vectors) of the node itself and its neighbor nodes may be, for example: addition, averaging, weighted summation, taking the maximum, and so on. Taking weighted summation as an example, the weights may be determined in various manners, such as negative correlation with node degree (the number of first-order neighbor nodes), importance values determined by an attention mechanism, and the like. In a specific example, for node v, after fusing the current expression vectors of its neighbor nodes, the node expression vector at layer l of the graph neural network may be updated as:

$$h_v^{(l)} = \sigma\Big(W^{(l)} \sum_{u\in N_v\cup\{v\}} \frac{h_u^{(l-1)}}{\sqrt{d_u\,d_v}}\Big) \tag{3}$$

where $W^{(l)}$ denotes the model parameters of layer l, d denotes node degree, $N_v$ denotes the neighbor node set of node v, and h denotes the current node expression vector determined at layer l-1; when l = 1, the current expression vector is the feature vector of the corresponding node. Through the processing of several graph neural network layers, the corresponding characterization vector can be obtained for each node in the original graph, e.g., $\varepsilon_\theta(X)$ in fig. 2. A sketch of this layer update follows below.
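The following is a minimal NumPy sketch of the single-layer update in formula (3); the sqrt-of-degrees normalization is one common choice consistent with the degree-based weighting described above, not the only one the text permits:

```python
import numpy as np

def gcn_layer(H, neighbors, degrees, W):
    """One layer of formula (3): fuse each node's current vector with its
    neighbors', normalize by sqrt(d_u * d_v), then apply W and sigma (ReLU)."""
    H_new = np.zeros((H.shape[0], W.shape[1]))
    for v, nbrs in neighbors.items():
        agg = np.zeros(H.shape[1])
        for u in list(nbrs) + [v]:          # neighbors plus the node itself
            agg += H[u] / np.sqrt(degrees[u] * degrees[v])
        H_new[v] = np.maximum(0.0, agg @ W)  # sigma taken as ReLU
    return H_new

# Toy usage: 3 nodes on a path 0-1-2, 4-dim features, 8-dim hidden layer.
rng = np.random.default_rng(0)
H0 = rng.normal(size=(3, 4))
neighbors = {0: [1], 1: [0, 2], 2: [1]}
degrees = {0: 1, 1: 2, 2: 1}
H1 = gcn_layer(H0, neighbors, degrees, rng.normal(size=(4, 8)))
```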
According to the principles of information theory, assume that each cluster category corresponds to a vector, called a prototype vector and denoted, for example, μ. The characterization vector Z is randomly quantized to the respective prototype vectors; that is to say, one of the categories must be chosen, and the mutual information of that quantization should be maximized. Denoting by Q*(Z) the mapping vector obtained by randomly quantizing Z to the prototype vectors, the objective of mapping Z to the cluster categories through the clustering model can be written as:

$$\max\; I\big(Z,\,Q^*(Z)\big),\qquad Z=\varepsilon_\theta(X) \tag{4}$$
The mapping vector Q*(Z) can be associated with the prototype vectors $\mu_j$ and can be determined from them in various reasonable ways. As a specific example, an attention mechanism may be used. For instance, let the attention value of each prototype vector be:

$$att_j(z) = \frac{\exp(z^{\top}\mu_j)}{\sum_{k=1}^{K}\exp(z^{\top}\mu_k)} \tag{5}$$

where z is a specific instance of the random variable Z, such as the characterization vector of a specific node (e.g., node a). The attention value represents the importance of the corresponding cluster category with respect to the individual node, and may also be called an importance value. Further, the prototype vectors may be fused according to the importance values to obtain the mapping vector, for example by weighted summation with the importance values as weights:

$$Q_{att}(z,\mu) = \sum_{j=1}^{K} att_j(z)\,\mu_j \tag{6}$$

It is worth noting that $Q_{att}(z,\mu)$ here is only one instance of Q*, which may be defined in a variety of reasonable ways and is not limited here. A sketch of this attention mapping follows below.
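As one concrete reading of formulas (5) and (6), the following NumPy sketch computes the softmax attention of a characterization vector z over the K prototype vectors and the resulting mapping vector:

```python
import numpy as np

def mapping_vector(z, mu):
    """z: (d,) characterization vector; mu: (K, d) prototype vectors.
    Returns Q_att(z, mu) = sum_j att_j(z) * mu_j with softmax attention."""
    scores = mu @ z                      # z . mu_j for each prototype, (K,)
    att = np.exp(scores - scores.max())  # numerically stable softmax
    att = att / att.sum()                # importance value per cluster category
    return att @ mu                      # weighted sum of prototypes, (d,)

# Toy usage: 4 cluster prototypes in 5 dimensions.
rng = np.random.default_rng(0)
q = mapping_vector(rng.normal(size=5), rng.normal(size=(4, 5)))
```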
The prototype vector mu can be used as a model parameter, can be randomly defined in the initial stage, and can be continuously adjusted in the process of mutual information maximization.
In the above formulas, K is the number of cluster categories. In common clustering methods such as K-means, the number of cluster categories is often easy to determine. For information-based clustering methods, following the Regularized Information Maximization (RIM) principle, the complexity of the cluster embedding is typically penalized, with the goal of minimizing the clustering loss. For example, one way of determining the clustering loss associated with the number of cluster categories is:

$$L_{rim} = -\hat{I}\big(Z,\,Q^*(Z)\big) + g(K) \tag{7}$$

where g is a non-negative regularization function that can be set according to the actual situation. The parameter K that minimizes $L_{rim}$ within a predetermined range is selected as the number of cluster categories, e.g. as sketched below.
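Purely as an illustration of the selection rule in formula (7), the sketch below scans a range of K values; `empirical_mi` and `g` are hypothetical stand-ins for the empirical mutual information and the regularization function, not implementations from the patent:

```python
def select_k(empirical_mi, g, k_range):
    """Pick the K minimizing L_rim(K) = -I_hat(K) + g(K) over a given range."""
    return min(k_range, key=lambda k: -empirical_mi(k) + g(k))

# Toy usage with placeholder callables (assumed shapes, not from the patent):
best_k = select_k(empirical_mi=lambda k: 1.0 - 1.0 / k,  # stand-in MI curve
                  g=lambda k: 0.05 * k,                   # stand-in regularizer
                  k_range=range(2, 21))
```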
Those skilled in the art will appreciate that maximizing an exact estimate of mutual information may learn entangled and useless representations. The technical concept of this specification therefore provides a way of determining an approximation of the mutual information, which further generalizes the mutual-information principle to allow more flexible information-preservation measures. Since information preservation is usually measured by an f-divergence between two distributions, of which mutual information (the KL case) is one choice, the mutual information can be generalized to the f-divergence and recorded as f-information; f-information promotes the measure of distributional difference from the KL divergence to the f-divergence. The choice of f-divergence may include commonly used divergence types such as the Kullback-Leibler divergence, the chi-square divergence, the Jensen-Shannon divergence, and so on. Thus, for two random variables X, Y, the generalized f-information can be, for example: $I_f(X,Y) = D_f(P_{XY}\,\|\,P_X P_Y)$. The mutual information between the characterization vector Z and the mapping vector Q*(Z) is accordingly generalized to:

$$I_f\big(Z,Q^*(Z)\big) = \sup_{\phi}\;\mathbb{E}_{Z\sim P_Z}\Big[D_\phi\big(Z,\,Q^*(Z)\big)\Big] \;-\; \mathbb{E}_{Z\sim P_Z,\,Z'\sim P_{Q^*(Z)}}\Big[f^*\big(D_\phi(Z,\,Z')\big)\Big] \tag{8}$$

where sup denotes the supremum; $D_\phi$ is a discriminant function, defined for example as $D_\phi(h,h') = \sigma(h^{\top}\phi\,h')$, which determines through the intermediate vector φ the degree of similarity of the distributions of two other vectors (e.g., h and h'); and f* is the convex conjugate function of f. The first expectation is taken with Z obeying the overall distribution $P_Z$; the second is taken with Z obeying the overall distribution $P_Z$ and Z' obeying the overall distribution $P_{Q^*(Z)}$.

During model training, μ and φ may be the model parameters to be adjusted. Because the mathematical expectations over the overall distributions in the above formula are not available, the mutual information is not easy to compute directly. In order to obtain a sufficiently similar representation of the mutual information, this specification proposes replacing the overall distributions (e.g., $P_Z$, $P_{Q^*(Z)}$) with empirical distributions, thereby obtaining an approximate representation of the mutual information. A sketch of the discriminant function follows below.
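The following minimal sketch implements the bilinear discriminant function $D_\phi(h,h') = \sigma(h^{\top}\phi\,h')$ named above; note that in this bilinear form the intermediate parameter Phi acts as a matrix:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminate(h, h_prime, Phi):
    """Bilinear discriminant D_phi(h, h') = sigma(h^T Phi h'): a scalar score
    of how similar the two vectors' distributions are, through trainable Phi."""
    return sigmoid(h @ Phi @ h_prime)

# Toy usage in d = 5 dimensions.
rng = np.random.default_rng(0)
score = discriminate(rng.normal(size=5), rng.normal(size=5),
                     rng.normal(size=(5, 5)))
```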
Further, the distribution of a node's characterization vector and that node's mapping vector should be as consistent as possible, that is, the mutual information determined via the discriminant function should be as large as possible; conversely, the distributions of a node's characterization vector and other nodes' mapping vectors should be as inconsistent as possible. The model loss can therefore be determined by constructing samples from the nodes: for example, inversely related to the mutual information of a node with itself and positively related to the mutual information across different nodes. In this way, as the model loss is reduced, the mutual information of the same node tends to be maximized and the mutual information of different nodes tends to be minimized.
As a specific example, the model loss may be:

$$L_{cluster} = -\frac{1}{N}\sum_{i=1}^{N} D_\phi\Big(\varepsilon_\theta(x_i^+),\; Q^*\big(\varepsilon_\theta(x_i^+)\big)\Big) \;+\; \frac{1}{M}\sum_{(x,\,x^-)\in C} f^*\Big(D_\phi\big(\varepsilon_\theta(x),\; Q^*(\varepsilon_\theta(x^-))\big)\Big) \tag{9}$$

where $\varepsilon_\theta(x)$ is one encoding definition of Z, and $Q^*(\varepsilon_\theta(x))$ is, for example, the Q*(Z) described above, determined by the prototype vectors μ. N is, for example, the number of positive samples, M is the number of negative samples, and C denotes the set of negative sample pairs. In one embodiment, M = N·m, where m is a positive integer (e.g., 20), indicating that for a given node, m nodes different from it are selected as negative samples. $\varepsilon_\theta(x^+)$ denotes the characterization vector of the current node, and $Q^*(\varepsilon_\theta(x^-))$ denotes the mapping vector of a negative sample selected for that node.
The model loss may also be determined by other methods, which are not detailed here. The model parameters can then be adjusted in the direction that decreases the model loss, thereby training the clustering model, for example via the gradients of the model loss with respect to each model parameter, using gradient descent, Newton's method, and the like. An illustrative computation of this loss follows below.
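As an illustrative reading of formula (9), the sketch below computes the loss over positive (same-node) and negative (cross-node) pairs; using softplus as the convex conjugate f* is an assumption for illustration, since the text leaves the choice of f open:

```python
import numpy as np

def cluster_loss(Z, Q, neg_pairs, Phi):
    """Z: (N, d) characterization vectors; Q: (N, d) mapping vectors;
    neg_pairs: list of (i, j) with node j serving as a negative for node i."""
    score = lambda a, b: 1.0 / (1.0 + np.exp(-(a @ Phi @ b)))  # D_phi
    pos = np.mean([score(Z[i], Q[i]) for i in range(Z.shape[0])])
    neg = np.mean([np.log1p(np.exp(score(Z[i], Q[j])))  # softplus as f*
                   for i, j in neg_pairs])
    return -pos + neg  # push same-node pairs together, cross-node pairs apart

# Toy usage: 4 nodes, 3-dim vectors, one negative partner per node.
rng = np.random.default_rng(0)
Z, Q = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
loss = cluster_loss(Z, Q, [(0, 2), (1, 3), (2, 0), (3, 1)],
                    rng.normal(size=(3, 3)))
```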
Under the implementation architecture of this specification, the characterization part (e.g., corresponding to the encoding module) and the clustering part (e.g., corresponding to the mapping module and the discrimination module) may be trained as a whole (adjusting the corresponding model parameters together), or trained separately. In training the clustering part, the goal can be to minimize the information loss of mapping the characterization vectors obtained from the characterization part to the cluster categories, i.e., to maximize the mutual information. In case the parameters of the characterization part are substituted into the clustering module, the model parameters of both parts, e.g. $\varepsilon_\theta$ and μ, can be adjusted simultaneously. In some possible designs, the characterization part may also measure its model loss separately, adjust its parameters accordingly, and substitute the adjusted parameters into the clustering part, which then adjusts its own parameters.
As an example, the encoding module of the characterization part may determine the characterization vectors based on Deep Graph Infomax (DGI). DGI is typically implemented with an information-based graph convolutional network, which can encode node feature vectors in an unsupervised manner into hidden representations, i.e., characterization vectors, that are more suitable for clustering. For DGI, a coding loss can be considered directly, for example:

$$L_{dgi} = -\frac{1}{N+M}\Big(\sum_{i=1}^{N} \log D_\psi\big(\varepsilon_\theta(x_i),\, R(X)\big) \;+\; \sum_{j=1}^{M} \log\big(1 - D_\psi(\varepsilon_\theta(\tilde{x}_j),\, R(X))\big)\Big) \tag{10}$$

where R is a readout operator that operates on a subset of X and outputs a vector of a predetermined dimension, and the comparison is constrained by the parameterized discriminant function $D_\psi$. The first term in this formula corresponds to the "+" term in fig. 2, i.e., the positive sample term, and the second term corresponds to the "-" term in fig. 2, i.e., the negative sample term. The negative sample term may be constructed from a node and nodes different from that node: for example, for node v, M paired nodes are taken at random from the nodes other than v to construct the negative samples $\tilde{x}_j$. It will be appreciated that, for node v, X may be understood as the random variable and x as a specific value of that random variable.
According to one embodiment, the negative samples may be constructed using the input graph data. As shown in fig. 2, the node features can be randomly changed without disturbing the connection structure; the changed graph data is called a change graph, e.g. $\tilde{G}(\tilde{V},\tilde{E},\tilde{X})$. Negative samples are then constructed by sampling from the change graph: for example, for node v, a node paired with v is taken from the node set of the change graph to construct a negative sample $\tilde{x}$. The number of negative samples is, for example, M. A sketch of this construction follows below.
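A minimal sketch of this negative-sample construction, under the assumption that corruption is a row shuffle of the feature matrix (one simple way of randomly changing node features without disturbing the connection structure):

```python
import numpy as np

def corrupt_features(X, rng):
    """Build the change graph's feature matrix: row-shuffle X, edges untouched."""
    return X[rng.permutation(X.shape[0])]

def negative_pairs(num_nodes, m, rng):
    """For each node v, sample m partner indices into the change graph."""
    return [(v, int(j)) for v in range(num_nodes)
            for j in rng.choice(num_nodes, size=m, replace=False)]

# Toy usage: 6 nodes with 4-dim features, m = 2 negatives per node.
rng = np.random.default_rng(0)
X_tilde = corrupt_features(rng.normal(size=(6, 4)), rng)
pairs = negative_pairs(6, 2, rng)
```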
In one embodiment, the model parameters of the encoding module can be adjusted based on the coding loss $L_{dgi}$, so as to determine each parameter of the encoder structure $\varepsilon_\theta$: for example, the parameters are adjusted in the direction that decreases $L_{dgi}$, following the gradients of $L_{dgi}$ with respect to $\varepsilon_\theta$, via gradient descent, Newton's method, and the like. Once the parameters of the encoder $\varepsilon_\theta$ are well tuned (e.g., tending to converge), they can be substituted into the model training of the subsequent part. At that point, $\varepsilon_\theta$ in the model loss $L_{cluster}$ of the clustering part is determined; if the only undetermined parameter is μ, the model parameter μ can be adjusted in the direction that decreases $L_{cluster}$, so as to train the mapping module and the discrimination module of the clustering part.
In an alternative implementation, in order to improve the encoding module so that it better produces characterization vectors that favor clustering, feedback from the mapping vector can be added to the encoding module. For example, the characterization vector produced by the encoding is modified to:

$$z' = (1-\epsilon)\,z + \epsilon\,Q^*(z) \tag{11}$$

where ε is a predetermined parameter, typically a value between 0 and 1, for example 0.1. The improved encoding module can balance between the characterization part and the clustering part, and ε can be understood as a balance weight parameter (see the sketch below). Applying the improved encoding result to the foregoing procedure yields a clustering model based on depth-graph information and information theory, which may be called Cluster-Aware Deep Graph Infomax (CADGI).
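Formula (11) reduces to a one-line blend; a minimal sketch:

```python
def update_characterization(z, q, eps=0.1):
    """Formula (11): blend the characterization vector z with its mapping
    vector q using the balance weight eps (a value between 0 and 1)."""
    return (1.0 - eps) * z + eps * q
```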
Based on the above theory, this specification provides a method for preprocessing data for a clustering model and for clustering business entities using an attribute map. The attribute map may include a plurality of nodes in one-to-one correspondence with a plurality of business entities, each node having a feature vector determined based on the attributes of the corresponding business entity. The attributes of a business entity are determined according to the actual business requirements: for example, where the business entity is a user, the corresponding attributes may include the user's gender, activity track, and the like; where the business entity is an article, the corresponding attributes may include keywords, the field it belongs to, and the like. The values in the respective dimensions of the feature vector may separately quantize and describe the respective attributes. Based on the theory described above, the clustering model proposed in this specification may include an encoding module, a mapping module, and a discrimination module.
As shown in fig. 3, a flow diagram of data preprocessing for a cluster model according to one embodiment. The execution subject of this flow may be an apparatus, device, or server having some computing power, such as the computing platform shown in fig. 1.
As shown in fig. 3, taking a first node among the plurality of nodes in the attribute map as an example, the process includes the following steps: step 301, processing the attribute map by using the encoding module to obtain the characterization vectors corresponding to the respective nodes, the first node corresponding to a first characterization vector; step 302, determining, by the mapping module and using the first characterization vector, a first mapping vector that maps the first node to several cluster categories, wherein the first mapping vector is formed by combining the prototype vectors respectively corresponding to the cluster categories, with the combination parameters determined based on the first characterization vector; step 303, detecting, based on the discrimination module, the degree of similarity between the first characterization vector and the first mapping vector, thereby determining the clustering loss of the clustering model, wherein the degree of similarity is determined via empirical mutual information constructed on the basis of the discriminant function, in which the empirical distributions of the characterization vectors and mapping vectors replace the overall distributions, and the clustering loss is inversely related to the degree of similarity between the first characterization vector and the first mapping vector; and step 304, adjusting the model parameters of the encoding module, the prototype vectors, and the intermediate vector in the discriminant function of the discrimination module with the aim of minimizing the clustering loss, thereby training the clustering model.
First, in step 301, the attribute map is processed by the encoding module to obtain the characterization vectors corresponding to the respective nodes, the first node corresponding to a first characterization vector. The encoding module here may be any reasonable model, such as a convolutional neural network, a graph neural network, etc. Where the encoding module is a graph neural network, the characterization vector of each node may be determined based on the fusion of the feature vector of the corresponding node with the feature vectors of its neighbor nodes. For example, the first characterization vector is determined based on fusing the feature vector of the first node with the feature vectors of its neighbor nodes. The feature vectors of a single node and its neighbor nodes may be fused, for example, as provided in formula (3), which is not repeated here. The relationship between the characterization vector Z and the feature vector X can be expressed, for example, as $Z = \varepsilon_\theta(X)$.
Next, in step 302, a first mapping vector that maps the first node to the several cluster categories is determined by the mapping module using the first characterization vector. It will be appreciated that, per the information theory principles described earlier, the effect of each feature (here, each feature in the characterization vector) of a single node on a single cluster category is regarded as a transfer of information. The total information transfer between one node's characterization vector and the cluster categories can be represented by the mutual information between that characterization vector and the mapping vector obtained by mapping it to the cluster categories. The mapping vector is determined based on the characterization vector and can be associated with it.
Assuming the nodes are clustered into K cluster categories, each cluster category may correspond to a prototype vector; e.g., the prototype vector corresponding to the i-th cluster category may be denoted $\mu_i$. For a single node, it is desirable that the information mapped from the node's characterization vector to the cluster categories be consistent with the characterization vector, since some information may be transferred to each cluster category. Under the implementation architecture of this specification, the information mapped from a node's characterization vector to the cluster categories may be represented by the mapping vector, which may be a combination of the K prototype vectors. The combination may be linear or nonlinear, without limitation here; the combination parameters, or mapping parameters, required in the combination may be related to the corresponding characterization vector.
Taking the first node as an example, the corresponding first mapping vector is formed by combining the prototype vectors respectively corresponding to the cluster categories, with the combination parameters determined based on the association relationship between the first characterization vector and the prototype vectors, because this association represents the amount of information that the first characterization vector assigns to the corresponding prototype vector. The first mapping vector corresponding to the first node may be determined, for example, by formulas (5) and (6).
The number K of cluster categories may be predetermined or determined according to a machine learning method. In an alternative embodiment, the number of cluster categories K may be determined by a Regularization Information Maximization (RIM) principle as described in formula (7), and will not be described in detail herein.
Then, in step 303, the degree of similarity between the first characterization vector and the first mapping vector is detected based on the discrimination module, thereby determining the clustering loss of the clustering model. The degree of similarity between the first characterization vector and the first mapping vector may be described by mutual information. As discussed above, because mutual information is very difficult to determine directly, it is generalized to f-information, giving the expression for empirical mutual information constructed on the basis of a discriminant function, as described in formula (8). Even after this conversion, however, the expectations over the vector probability distributions are difficult to determine. It is therefore proposed to replace the expected distributions with empirical probability distributions, so that the discriminant function approximates the corresponding mutual information. The discriminant function is, for example, $D_\phi(h,h') = \sigma(h^{\top}\phi\,h')$, which discriminates the similarity of the distributions of the two vectors h and h' via the intermediate vector.
It can be understood that in the clustering process, the better the clustering result, the greater the information retention degree between the characterization vector of the single node and the mapping vector of the single node, that is, the greater the mutual information, the smaller the loss. Thus, for the token vector and the mapping vector of the individual node itself, the cluster penalty is inversely related to the corresponding mutual information. For example, the cluster penalty is inversely related to the degree of similarity between the first token vector and the first map vector. As shown in equation (9). At this time, the current node and itself can also be considered to constitute a positive sample.
On the other hand, for the current node, its token vector should theoretically have a different distribution than the mapping vectors of other nodes. Thus, in alternative embodiments, the cluster penalty may also be positively correlated with the degree of similarity between the token vector of the current node and other mapping vectors corresponding to other nodes. At this time, the current node and other nodes may also be considered to constitute a negative sample.
According to a possible design, the clustering loss may be the sum of losses corresponding to a plurality of nodes (such as all nodes) in the attribute map, as shown in formula (9).
In alternative embodiments, the attribute map may also be changed in order to sample negative samples, for example by randomly scrambling the feature vectors of individual nodes.
Next, in step 304, the model parameters of the coding module, the prototype vectors, and the intermediate vectors in the discriminant function in the discriminant module are adjusted with the goal of minimizing the clustering loss, thereby training the clustering model. It can be understood that in the above process, the model parameters of the coding module, the prototype vectors in the mapping module, and the intermediate vectors in the discriminant function in the discriminant module are all undetermined model parameters in the clustering model. Model loss is actually the difference between the actual situation and the expected result, and in order to make the clustering model have a better clustering effect, the model parameters can be adjusted towards the direction of decreasing the model loss. For example, the gradient of the model loss to each model parameter may be determined and then the model parameters adjusted using a gradient descent method, newton method, or the like.
In a possible design, the mapping vector can be fed back into the characterization vector as supplementary information, so as to obtain characterization information more suitable for clustering. For example, for the first node, the first characterization vector may be updated with a weighted combination of the first characterization vector and the first mapping vector, and the degree of similarity between the updated first characterization vector and the first mapping vector is then detected based on the discrimination module. The weighting weight may be an artificially set hyperparameter, e.g. a value between 0 and 1; refer to the specific example of formula (11).
The flow shown in fig. 3 is an embodiment of adjusting model parameters together in the encoding process and in the subsequent clustering process. FIG. 4 illustrates a data preprocessing flow for a cluster model according to another embodiment. In the flow shown in fig. 4, the model parameters in the encoding process and the subsequent clustering process can also be separately adjusted. Still taking the first node as an example, the flow shown in fig. 4 includes the following steps:
and step 401, processing the attribute map by using the coding module to obtain each characterization vector corresponding to each node. Wherein the first node corresponds to the first token vector. In the case where the encoding module is a graph neural network, the token vector of the first node may be determined based on a fusion result of the feature vector of the first node and the feature vector of its neighboring nodes.
In step 402, a coding loss of the coding module is determined based on the degree of similarity between the first characterization vector and the first feature vector corresponding to the first node. This degree of similarity may be measured via a first discriminant function, for example as the product of the first characterization vector, an intermediate vector of the first discriminant function, and the first feature vector.
In alternative embodiments, the comparison of a single node with itself may be treated as a positive sample and the comparison of a single node with other nodes as negative samples, the coding loss being negatively correlated with the similarity of positive samples and positively correlated with the similarity of negative samples. In practice, the coding loss may be a sum of the losses determined over multiple nodes in the attribute map, as shown in equation (10).
In step 403, the model parameters of the coding module are adjusted with the goal of minimizing the coding loss. For example, the gradient of the coding loss with respect to each model parameter may be determined, and the parameters then adjusted using gradient descent, Newton's method, or the like.
In step 404, the attribute map is processed by the coding module with adjusted model parameters to obtain a third characterization vector corresponding to the first node. After the coding module has been adjusted, the trained coding module can process the attribute map to obtain characterization vectors more favorable to clustering for subsequent processing. At this point, the characterization vector of each node can be determined once and reused while training the subsequent modules, reducing the per-iteration computation.
In step 405, the mapping module determines, using the third characterization vector, a first mapping vector that maps the first node to the plurality of cluster categories. The first mapping vector is a combination of the prototype vectors corresponding to the respective cluster categories, with combination parameters determined based on the third characterization vector.
In step 406, the discrimination module detects the degree of similarity between the third characterization vector and the first mapping vector, thereby determining the clustering loss of the clustering model. The degree of similarity between the third characterization vector and the first mapping vector is determined as empirical mutual information constructed on the basis of the discriminant function, with the empirical distribution of the characterization vectors and mapping vectors replacing the overall distribution. The clustering loss is negatively correlated with the degree of similarity between the third characterization vector and the first mapping vector.
In an alternative embodiment, the attribute map corresponds to a change map in which feature vectors are randomly adjusted; in the change map there is a second node corresponding to the first node, which corresponds to a second characterization vector obtained by processing the change map through the encoding module. The coding loss is then also negatively correlated with the degree of similarity between the first characterization vector and the second characterization vector.
In another optional embodiment, there is a third node in the attribute map different from the first node, which corresponds to a fourth characterization vector obtained by processing the change map through the encoding module. As with other negative samples, the coding loss is then also positively correlated with the degree of similarity between the first characterization vector and the fourth characterization vector.
In step 407, the prototype vectors and the intermediate vector in the discriminant function are adjusted with the goal of minimizing the clustering loss, thereby training the mapping module and the discrimination module.
It should be noted that, in the flow shown in fig. 4, steps 401 and 405-407 are similar to steps 301-304 in fig. 3, respectively. The difference is that in fig. 4 the loss of the coding module is determined separately and its model parameters are adjusted first, so that the trained coding module can then process the attribute map to produce characterization vectors for each node (corresponding to each service entity). These characterization vectors can be reused for model training in the subsequent steps.
Fig. 5 illustrates a flow for clustering business entities according to an embodiment of the present specification, in which a trained clustering model is used to cluster business entities via the attribute map. The clustering model may include an encoding module, a mapping module, and a discrimination module, and may be trained by a process such as that of fig. 3 or fig. 4. The attribute map may include a plurality of nodes in one-to-one correspondence with a plurality of business entities, each node having a feature vector determined based on the attributes of the corresponding business entity.
As shown in fig. 5, the clustering procedure for the business entity may include the following steps:
In step 501, the attribute map is processed by the coding module to obtain the characterization vectors corresponding to the respective nodes.
In step 502, the mapping module determines, for each node and using its characterization vector, the mapping vector obtained by mapping that node to the plurality of cluster categories. Each mapping vector is a combination of the prototype vectors corresponding to the respective cluster categories, with combination parameters determined based on the corresponding characterization vector.
In step 503, the discrimination module detects the degree of similarity between the characterization vector of a first node and the mapping vector of a second node. In this step, the discrimination module can compare the characterization vector and the mapping vector between every two nodes: for any pair of nodes, it detects the similarity of the first node's characterization vector to the second node's mapping vector, or of the second node's characterization vector to the first node's mapping vector.
In step 504, when the degree of similarity satisfies a predetermined condition, the business entity corresponding to the first node and the business entity corresponding to the second node are determined to belong to the same cluster category. Where the degree of similarity is defined by a discriminant function, the predetermined condition may be, for example, that the resulting similarity exceeds a predetermined threshold. Taking the first and second nodes as an example, the predetermined condition may be that either one of the two similarities (the first node's characterization vector vs. the second node's mapping vector, and the second node's characterization vector vs. the first node's mapping vector) exceeds a predetermined threshold, or that both do, or that one exceeds a first predetermined threshold (e.g. 0.8) while the other is not less than a second predetermined threshold (e.g. 0.5).
It should be noted that the information theory described above with reference to fig. 1 and fig. 2 is the theoretical basis of the clustering model training processes of fig. 3 and fig. 4 and of the clustering process of fig. 5, so the alternatives related to that theory may also be applied to the embodiments of fig. 3, fig. 4 and fig. 5; they are not repeated here.
In review, based on information theory, the method provided by the embodiments of the present specification characterizes the attribute map through characterization vectors and trains the clustering model using the information loss incurred in the transfer between the characterization vectors and the prototype vectors of the cluster categories. This information loss is measured by the similarity between a characterization vector and the mapping vector determined from the prototype vectors. Further, in determining mutual information, the empirical probability distribution is used in place of the overall distribution, providing a way to approximate mutual information empirically. The method thus puts information theory to effective use and offers a business entity clustering method that exploits the attribute map more effectively.
According to an embodiment of another aspect, a data preprocessing device for a cluster model is further provided. Fig. 6 shows a data preprocessing apparatus for a cluster model according to one embodiment of the present specification. The apparatus 600 includes:
The encoding unit 61 is configured to process the attribute map with the encoding module to obtain the characterization vectors corresponding to the respective nodes, the first node corresponding to the first characterization vector;
the mapping unit 62 is configured to determine, through the mapping module and using the first characterization vector, a first mapping vector mapping the first node to the plurality of cluster categories, where the first mapping vector is a combination of the prototype vectors corresponding to the respective cluster categories, with combination parameters determined based on the first characterization vector;
the discrimination unit 63 is configured to detect, based on the discrimination module, the degree of similarity between the first characterization vector and the first mapping vector, thereby determining the clustering loss of the clustering model, where the degree of similarity is determined as empirical mutual information constructed on the basis of the discriminant function, with the empirical distribution of the characterization vectors and mapping vectors replacing the overall distribution, and the clustering loss is negatively correlated with the degree of similarity between the first characterization vector and the first mapping vector;
the adjusting unit 64 is configured to adjust the model parameters of the coding module, the prototype vectors, and the intermediate vector in the discriminant function of the discrimination module with the goal of minimizing the clustering loss, thereby training the clustering model.
According to an embodiment of another aspect, another data preprocessing device for a cluster model is also provided. Fig. 7 shows a data preprocessing apparatus for a cluster model according to one embodiment of the present specification. The apparatus 700 includes:
the encoding unit 71 is configured to process the attribute map by using an encoding module to obtain each characterization vector corresponding to each node, and the first node corresponds to the first characterization vector;
a first discriminating unit 72 configured to determine a coding loss of the coding module based on a degree of similarity between the first characterization vector and a first feature vector corresponding to the first node;
a first adjustment unit 73 configured to adjust model parameters of the encoding module with the aim of minimizing encoding loss;
the encoding unit 71 is further configured to process the attribute map by using an encoding module with adjusted model parameters to obtain a third characterization vector corresponding to the first node;
the mapping unit 74 is configured to determine, through the mapping module and using the third characterization vector, a first mapping vector mapping the first node to the plurality of cluster categories, where the first mapping vector is a combination of the prototype vectors corresponding to the respective cluster categories, with combination parameters determined based on the third characterization vector;
the second discrimination unit 75 is configured to detect, based on the discrimination module, the degree of similarity between the third characterization vector and the first mapping vector, thereby determining the clustering loss of the clustering model, where the degree of similarity is determined as empirical mutual information constructed on the basis of the discriminant function, with the empirical distribution of the characterization vectors and mapping vectors replacing the overall distribution, and the clustering loss is negatively correlated with the degree of similarity between the third characterization vector and the first mapping vector;
the second adjusting unit 76 is configured to adjust the prototype vectors and the intermediate vector in the discriminant function with the goal of minimizing the clustering loss, thereby training the mapping module and the discrimination module.
Fig. 8 illustrates an apparatus for clustering business entities according to one embodiment. The apparatus 800 includes:
the encoding unit 81 is configured to process the attribute map by using the encoding module to obtain each characterization vector corresponding to each node;
the mapping unit 82 is configured to respectively determine, by using the mapping module and by using each characterization vector, mapping vectors obtained by mapping each node to a plurality of cluster categories, wherein a single mapping vector is formed by combining prototype vectors respectively corresponding to each cluster category, and combination parameters are determined based on the corresponding characterization vectors;
A discriminating unit 83 configured to detect a degree of similarity of the characterization vector of the first node and the mapping vector of the second node based on the discriminating module;
and a determining unit 84, configured to determine that the service entity corresponding to the first node and the service entity corresponding to the second node belong to the same cluster category when the similarity degree satisfies a predetermined condition.
It should be noted that the apparatuses 600, 700, 800 shown in fig. 6, 7, 8 are apparatus embodiments corresponding to the method embodiments shown in fig. 3, 4, 5, respectively, and the corresponding descriptions in the method embodiments shown in fig. 3, 4, 5 are also applicable to the apparatuses 600, 700, 800, which are not repeated herein.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3, 4 or 5.
According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 3, 4 or 5.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present disclosure may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing detailed description further illustrates the technical concept of the present disclosure. It should be understood that it is merely illustrative of that technical concept and is not intended to limit its scope; any modifications, equivalents, or improvements made on the basis of the technical solutions of the embodiments of the present disclosure shall fall within the scope of protection of the technical concept of the present disclosure.

Claims (18)

1. The data preprocessing method for the clustering model is used for clustering service entities by utilizing an attribute map, wherein the service entities are users, the attribute map comprises a plurality of nodes which are in one-to-one correspondence with the plurality of service entities, each node is provided with a feature vector determined based on the attribute of the corresponding service entity, a connecting edge is arranged between the nodes corresponding to the users related through social behavior, the clustering model comprises a coding module, a mapping module and a judging module, and the plurality of nodes comprise a first node; the method comprises the following steps:
processing the attribute map by using the coding module to obtain each characterization vector corresponding to each user, wherein the user corresponding to the first node is described by the first characterization vector;
Determining, by the mapping module, a first mapping vector for classifying the user corresponding to the first node to a plurality of user groups by using the first characterization vector, where the first mapping vector is formed by combining prototype vectors respectively characterizing the user groups according to an association relationship with the first characterization vector;
determining, based on the discrimination module's detection of the degree of similarity between the first characterization vector and the first mapping vector, the clustering loss of the clustering model for classifying the user corresponding to the first node into the corresponding user group, wherein the degree of similarity between the first characterization vector and the first mapping vector is determined as empirical mutual information constructed on the basis of a discriminant function, with the empirical distribution of the characterization vectors and mapping vectors replacing the overall distribution, and the clustering loss is negatively correlated with the degree of similarity between the first characterization vector and the first mapping vector;
and adjusting the model parameters of the coding module, the prototype vectors, and the intermediate vector in the discriminant function of the discrimination module with the goal of minimizing the clustering loss, so that the clustering model is trained based on the adjusted characterizations of users and user groups, and the trained clustering model is used for user grouping business processing.
2. The method of claim 1, the encoding module being a graph neural network, the first characterization vector being determined based on a fusion of feature vectors of the first node and feature vectors of its neighboring nodes.
3. The method of claim 1, wherein the first mapping vector is determined by:
determining each importance coefficient corresponding to each prototype vector based on the first characterization vector and each prototype vector;
and combining the prototype vectors in a weighted summation mode according to the combination parameters determined by the importance coefficients to obtain the first mapping vector.
4. A method according to claim 3, wherein each importance coefficient is determined based on an attention mechanism, each prototype vector comprising a first prototype vector, the first importance coefficient corresponding to the first prototype vector being positively correlated with the similarity of the first prototype vector and the first token vector and negatively correlated with the sum of the similarities of each prototype vector and the first token vector.
5. The method of claim 1, wherein the detecting, by the discrimination module, a degree of similarity of the first token vector and the first map vector comprises:
And determining the similarity of the first characterization vector and the first mapping vector based on the product of the first characterization vector, the intermediate vector of the discriminant function and the first mapping vector.
6. The method of claim 1, wherein the clustering loss is also positively correlated with the degree of similarity between the first characterization vector and other mapping vectors corresponding to other nodes.
7. The method of claim 1, wherein the detecting, by the discrimination module, a degree of similarity of the first token vector and the first map vector comprises:
updating the first characterization vector by using the weighted vector of the first characterization vector and the first mapping vector;
and detecting the similarity degree of the updated first characterization vector and the first mapping vector based on the judging module.
8. The data preprocessing method for the clustering model is used for clustering service entities by utilizing an attribute map, wherein the service entities are users, the attribute map comprises a plurality of nodes which are in one-to-one correspondence with the plurality of service entities, each node is provided with a feature vector determined based on the attribute of the corresponding service entity, a connecting edge is arranged between the nodes corresponding to the users related through social behavior, the clustering model comprises a coding module, a mapping module and a judging module, and the plurality of nodes comprise a first node; the method comprises the following steps:
Processing the attribute map by using the coding module to obtain each characterization vector corresponding to each user, wherein the user corresponding to the first node is described by the first characterization vector;
determining the coding loss of the coding module based on the degree of similarity between the first characterization vector and the first feature vector corresponding to the user indicated by the first node;
adjusting model parameters of the coding module with the aim of minimizing the coding loss;
processing the attribute map by using an encoding module with adjusted model parameters to obtain a third characterization vector corresponding to the user indicated by the first node;
determining, by the mapping module, a first mapping vector for classifying the users indicated by the first node to a plurality of user groups by using the third characterization vector, where the first mapping vector is formed by combining prototype vectors respectively characterizing the user groups according to an association relationship with the first characterization vector;
determining, based on the discrimination module's detection of the degree of similarity between the third characterization vector and the first mapping vector, the clustering loss of the clustering model for classifying the user corresponding to the first node into the corresponding user group, wherein the degree of similarity between the third characterization vector and the first mapping vector is determined as empirical mutual information constructed on the basis of a discriminant function, with the empirical distribution of the characterization vectors and mapping vectors replacing the overall distribution, and the clustering loss is negatively correlated with the degree of similarity between the third characterization vector and the first mapping vector;
and adjusting the prototype vectors and the intermediate vector in the discriminant function with the goal of minimizing the clustering loss, thereby training the mapping module and the discrimination module, the clustering model with the trained mapping module and discrimination module being used for user grouping business processing.
9. The method of claim 8, the encoding module being a graph neural network, the token vector of the first node being determined based on a fusion of the feature vector of the first node with the feature vectors of its neighboring nodes.
10. The method of claim 8, wherein the degree of similarity between the first characterization vector and the first feature vector corresponding to the first node is measured via a first discriminant function, the degree of similarity being determined based on a product of the first characterization vector, an intermediate vector of the first discriminant function, and the first feature vector.
11. The method of claim 8, wherein the clustering loss is also positively correlated with the degree of similarity between the first characterization vector and other mapping vectors corresponding to other nodes.
12. The method of claim 11, wherein the attribute map corresponds to a change map in which feature vectors are randomly adjusted, the change map having a second node corresponding to the first node, the second node corresponding to a second characterization vector obtained by processing the change map through the encoding module; the coding loss is also negatively correlated with the degree of similarity between the first characterization vector and the second characterization vector.
13. A method for clustering business entities, which is used for clustering business entities by utilizing an attribute map through a pre-trained clustering model, wherein the business entities are users, the attribute map comprises a plurality of nodes which are in one-to-one correspondence with the plurality of business entities, each node is provided with a feature vector which is determined based on the attribute of the corresponding business entity, and connecting edges are arranged between the nodes which are corresponding to the users and are related through social behaviors, and the clustering model comprises a coding module, a mapping module and a judging module; the method comprises the following steps:
processing the attribute map by using the coding module to obtain each characterization vector corresponding to each user;
the mapping module is used for respectively determining mapping vectors obtained by mapping the users to a plurality of user groups by using the characterization vectors, wherein a single mapping vector is formed by combining prototype vectors which characterize the user groups according to association relations with the characterization vectors;
based on the judging module, detecting the similarity degree of the characterization vector of the user corresponding to the first node and the mapping vector of the user corresponding to the second node;
and under the condition that the similarity degree meets the preset condition, determining that the user corresponding to the first node and the user corresponding to the second node belong to the same user group.
14. The data preprocessing device for a clustering model is used for clustering service entities by utilizing an attribute map, wherein the service entities are users, the attribute map comprises a plurality of nodes which are in one-to-one correspondence with the plurality of service entities, each node is provided with a feature vector determined based on the attribute of the corresponding service entity, a connecting edge is arranged between the nodes corresponding to the users related through social behavior, the clustering model comprises a coding module, a mapping module and a judging module, and the plurality of nodes comprise a first node; the device comprises:
the coding unit is configured to process the attribute map by utilizing the coding module to obtain each characterization vector corresponding to each user, and the user corresponding to the first node describes the characterization vector through the first characterization vector;
the mapping unit is configured to determine, through the mapping module and using the first characterization vector, a first mapping vector for classifying the user corresponding to the first node to a plurality of user groups, wherein the first mapping vector is formed by combining prototype vectors respectively characterizing the user groups according to an association relationship with the first characterization vector;
a discrimination unit configured to determine, based on the discrimination module's detection of the degree of similarity between the first characterization vector and the first mapping vector, the clustering loss of the clustering model for classifying the user corresponding to the first node into the corresponding user group, wherein the degree of similarity between the first characterization vector and the first mapping vector is determined as empirical mutual information constructed on the basis of a discriminant function, with the empirical distribution of the characterization vectors and mapping vectors replacing the overall distribution, and the clustering loss is negatively correlated with the degree of similarity between the first characterization vector and the first mapping vector;
and the adjusting unit is configured to adjust the model parameters of the coding module, the prototype vectors, and the intermediate vector in the discriminant function of the discrimination module with the goal of minimizing the clustering loss, so that the clustering model is trained based on the adjusted characterizations of users and user groups, and the trained clustering model is used for user grouping business processing.
15. The data preprocessing device for a clustering model is used for clustering service entities by utilizing an attribute map, wherein the service entities are users, the attribute map comprises a plurality of nodes which are in one-to-one correspondence with the plurality of service entities, each node is provided with a feature vector determined based on the attribute of the corresponding service entity, a connecting edge is arranged between the nodes corresponding to the users related through social behavior, the clustering model comprises a coding module, a mapping module and a judging module, and the plurality of nodes comprise a first node; the device comprises:
The coding unit is configured to process the attribute map by utilizing the coding module to obtain each characterization vector corresponding to each user, and the user corresponding to the first node describes the characterization vector through the first characterization vector;
a first discrimination unit configured to determine a coding loss of the coding module based on a degree of similarity between the first characterization vector and a first feature vector corresponding to a user indicated by the first node;
a first adjustment unit configured to adjust model parameters of the encoding module with the aim of minimizing the encoding loss;
the coding unit is further configured to process the attribute map by using a coding module with adjusted model parameters to obtain a third characterization vector corresponding to the user indicated by the first node;
the mapping unit is configured to determine, through the mapping module, a first mapping vector for classifying the users indicated by the first node to a plurality of user groups by using the third characterization vector, wherein the first mapping vector is formed by combining prototype vectors respectively characterizing the user groups according to an association relationship with the first characterization vector;
a second discrimination unit configured to determine, based on the discrimination module's detection of the degree of similarity between the third characterization vector and the first mapping vector, the clustering loss of the clustering model for classifying the user corresponding to the first node into the corresponding user group, wherein the degree of similarity between the third characterization vector and the first mapping vector is determined as empirical mutual information constructed on the basis of a discriminant function, with the empirical distribution of the characterization vectors and mapping vectors replacing the overall distribution, and the clustering loss is negatively correlated with the degree of similarity between the third characterization vector and the first mapping vector;
and the second adjusting unit is configured to adjust the prototype vectors and the intermediate vector in the discriminant function with the goal of minimizing the clustering loss, thereby training the mapping module and the discrimination module, the clustering model with the trained mapping module and discrimination module being used for user grouping business processing.
16. The device for clustering the business entities is used for clustering the business entities by utilizing an attribute map through a pre-trained clustering model, wherein the business entities are users, the attribute map comprises a plurality of nodes which are in one-to-one correspondence with the plurality of business entities, each node is provided with a feature vector determined based on the attribute of the corresponding business entity, and connecting edges are arranged between the nodes corresponding to the users which are related through social behaviors, and the clustering model comprises a coding module, a mapping module and a judging module; the device comprises:
the coding unit is configured to process the attribute map by utilizing the coding module to obtain each characterization vector corresponding to each user;
a mapping unit configured to pass through the mapping module,
respectively determining mapping vectors obtained by mapping the users to a plurality of user groups by using the characterization vectors, wherein a single mapping vector is formed by combining prototype vectors respectively representing the user groups according to the similarity with the characterization vectors;
The judging unit is configured to detect the similarity degree of the characterization vector of the user corresponding to the first node and the mapping vector of the user corresponding to the second node based on the judging module;
and the determining unit is configured to determine that the user corresponding to the first node and the user corresponding to the second node belong to the same user group under the condition that the similarity degree meets a preset condition.
17. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-13.
18. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-13.
CN202011409579.9A 2020-12-06 2020-12-06 Data preprocessing method and device for clustering model Active CN112541530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011409579.9A CN112541530B (en) 2020-12-06 2020-12-06 Data preprocessing method and device for clustering model


Publications (2)

Publication Number Publication Date
CN112541530A CN112541530A (en) 2021-03-23
CN112541530B true CN112541530B (en) 2023-06-20

Family

ID=75015989






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40048342)
GR01 Patent grant