CN117725442A - Method, device, equipment and storage medium for classifying a demand party - Google Patents

Method, device, equipment and storage medium for classifying a demand party

Info

Publication number: CN117725442A
Application number: CN202410040812.2A
Authority: CN
Other languages: Chinese (zh)
Prior art keywords: data, clusters, determined, cluster, difference
Legal status: Pending
Inventors: 赵昱榕, 贾岩, 张恕迪, 李戎, 全威龙
Assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Application filed by: Industrial and Commercial Bank of China Ltd (ICBC)
Classification: Information Retrieval, Db Structures And Fs Structures Therefor
Abstract

The application relates to a method, a device, equipment and a storage medium for classifying demand parties, and belongs to the technical field of big data. The method comprises: acquiring classification reference data of different demand parties, the classification reference data comprising basic attribute data of each demand party and interaction behavior data between the corresponding demand party and candidate items; constructing a first objective function by taking the number of clusters as the independent variable and taking the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable; in a case where the first objective function conforms to a preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number; and performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties. The method can accurately determine the total number of clusters and ensure the accuracy of the clustering result, thereby improving the product recommendation effect.

Description

Method, device, equipment and storage medium for classifying a demand party
Technical Field
The present application relates to the field of big data technology, may be applied in the field of financial technology or other related fields, and in particular relates to a method, an apparatus, a device and a storage medium for classifying a demand party.
Background
With the rapid development of the information age, the amount of information keeps growing, and the product recommendation practices of various industries are changing accordingly. For example, a financial institution can cluster its users (demand parties) and recommend to a target user the products matched with the category that user belongs to, thereby improving the efficiency and accuracy of product recommendation.
However, because current clustering algorithms determine the total number of clusters with a degree of randomness, the total number of clusters may turn out to be too large or too small, which affects the accuracy of the clustering result and, in turn, the product recommendation effect.
Disclosure of Invention
In view of the above, it is necessary to provide a demand party classification method, apparatus, device and storage medium that can accurately determine the total number of clusters and ensure the accuracy of the clustering result, thereby improving the product recommendation effect.
In a first aspect, the present application provides a method for classifying demand parties, including:
acquiring classification reference data of different demand parties, the classification reference data comprising basic attribute data of each demand party and interaction behavior data between the corresponding demand party and candidate items;
constructing a first objective function by taking the number of clusters as the independent variable and taking the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable;
in a case where the first objective function conforms to a preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number; and
performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties.
In one embodiment, in the case where the first objective function conforms to the preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number includes: determining, based on the first objective function, the reference clustering degree corresponding to the clustering result of performing cluster analysis with each determined reference cluster number; constructing a second objective function according to the data difference between the reference cluster number to be determined and the determined reference cluster numbers and according to the reference clustering degrees; taking the cluster number that maximizes the second objective function as a new reference cluster number and iteratively performing the reference-clustering-degree determination operation until a preset iteration termination condition is met; and taking the reference cluster number determined when the preset iteration termination condition is met as the target cluster number.
In one embodiment, constructing the second objective function according to the data difference between the reference cluster number to be determined and the determined reference cluster numbers and according to the reference clustering degrees includes: determining reference difference data between the reference cluster number to be determined and the determined reference cluster numbers based on a covariance function of the preset Gaussian distribution; taking the difference between the reference clustering degrees of the determined reference cluster numbers and the mean of those reference clustering degrees as difference adjustment data; and constructing the second objective function according to the product of the reference difference data and the difference adjustment data.
In one embodiment, constructing the second objective function according to the product of the reference difference data and the difference adjustment data includes: determining difference weight data according to the data differences among the determined reference cluster numbers; weighting the difference adjustment data with the difference weight data to update the difference adjustment data; and constructing the second objective function according to the product of the reference difference data and the updated difference adjustment data.
In one embodiment, determining the difference weight data according to the data differences among the determined reference cluster numbers includes: determining the difference weight data among the determined reference cluster numbers based on the covariance function of the preset Gaussian distribution.
In one embodiment, if there is only one determined reference cluster number, determining the difference weight data among the determined reference cluster numbers based on the covariance function of the preset Gaussian distribution includes: determining the difference weight data between the determined reference cluster number and itself based on the covariance function of the preset Gaussian distribution.
In one embodiment, performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties includes: selecting an initial cluster center from the classification reference data; determining, according to the Manhattan distance between each piece of classification reference data not selected as a cluster center and each existing cluster center, the reference probability that the corresponding classification reference data becomes a cluster center, wherein the existing cluster centers include the initial cluster center; selecting the next cluster center according to the reference probabilities based on a roulette-wheel method, until the number of existing cluster centers reaches the target cluster number; and performing cluster analysis on the classification reference data not selected as cluster centers according to their Manhattan distances to the cluster centers, so as to classify the different demand parties.
In a second aspect, the present application further provides a demand party classification apparatus, including:
a data acquisition module configured to acquire classification reference data of different demand parties, the classification reference data comprising basic attribute data of each demand party and interaction behavior data between the corresponding demand party and candidate items;
a function construction module configured to construct a first objective function by taking the number of clusters as the independent variable and taking the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable;
a quantity determination module configured to take, in a case where the first objective function conforms to a preset Gaussian distribution, the number of clusters corresponding to the Gaussian mean as the target cluster number; and
a classification module configured to perform cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties.
In a third aspect, the present application further provides a computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the following steps:
acquiring classification reference data of different demand parties, the classification reference data comprising basic attribute data of each demand party and interaction behavior data between the corresponding demand party and candidate items;
constructing a first objective function by taking the number of clusters as the independent variable and taking the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable;
in a case where the first objective function conforms to a preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number; and
performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps:
acquiring classification reference data of different demand parties, the classification reference data comprising basic attribute data of each demand party and interaction behavior data between the corresponding demand party and candidate items;
constructing a first objective function by taking the number of clusters as the independent variable and taking the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable;
in a case where the first objective function conforms to a preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number; and
performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties.
In a fifth aspect, the present application further provides a computer program product comprising a computer program that, when executed by a processor, implements the following steps:
acquiring classification reference data of different demand parties, the classification reference data comprising basic attribute data of each demand party and interaction behavior data between the corresponding demand party and candidate items;
constructing a first objective function by taking the number of clusters as the independent variable and taking the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable;
in a case where the first objective function conforms to a preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number; and
performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties.
According to the above demand party classification method, apparatus, device and storage medium, a first objective function is constructed from the acquired classification reference data of different demand parties by taking the number of clusters as the independent variable and the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable; in the case where the first objective function conforms to a preset Gaussian distribution, the number of clusters corresponding to the Gaussian mean is taken as the target cluster number; and cluster analysis is then performed on the classification reference data based on the target cluster number so as to classify the different demand parties. Because the first objective function describes how the clustering degree of the classification reference data changes with the number of clusters and conforms to the preset Gaussian distribution, the number of clusters at which the clustering degree is most typical (i.e., the number of clusters corresponding to the Gaussian mean) is taken as the target cluster number. Performing cluster analysis with this target cluster number therefore yields a better clustering of the classification reference data, so the whole process improves the accuracy of classifying demand parties and, in turn, the effect of subsequent product recommendation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a diagram of the application environment of a demand party classification method in one embodiment;
FIG. 2 is a flow chart of a demand party classification method in one embodiment;
FIG. 3 is a flow chart of determining the target cluster number in one embodiment;
FIG. 4 is a flow chart of constructing a second objective function in one embodiment;
FIG. 5 is a flow chart of classifying demand parties in another embodiment;
FIG. 6 is a flow chart of a demand party classification method in another embodiment;
FIG. 7 is a flow chart of an item recommendation method in one embodiment;
FIG. 8 is a block diagram of an item recommendation apparatus in one embodiment;
FIG. 9 is a block diagram of a demand party classification apparatus in one embodiment;
FIG. 10 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The demand party classification method provided in the embodiments of the present application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or placed on a cloud or other network server. Specifically, the server 104 obtains the classification reference data of different demand parties through the terminal 102, and constructs a first objective function by taking the number of clusters as the independent variable and taking the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable. In the case where the first objective function conforms to a preset Gaussian distribution, the server 104 takes the number of clusters corresponding to the Gaussian mean as the target cluster number and performs cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, an Internet-of-Things device or a portable wearable device; the Internet-of-Things device may be a smart speaker, a smart television, a smart air conditioner, a smart vehicle-mounted device, or the like, and the portable wearable device may be a smart watch, a smart bracelet, a head-mounted device, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an exemplary embodiment, as shown in FIG. 2, a demand party classification method is provided. Taking the application of the method to the server 104 in FIG. 1 as an example, the method includes the following steps:
S201, acquiring classification reference data of different demand parties.
The demand party may be a user who has a product recommendation demand, the user identification of such a user, or the user equipment used by such a user. The classification reference data characterize data that influence the classification result of the demand party. For example, the classification reference data may include basic attribute data of the demand party, such as essential information of the demand party, for example at least one of age information and gender information. The classification reference data may also include interaction behavior data between the demand party and the candidate items, for example operation data of the demand party on each candidate item (such as at least one of browsing, favoriting and trading operations) and interaction preference data determined from the interaction behavior data of the demand party within a preset period of time. A candidate item may be data that the demand party has a need to acquire; illustratively, for a financial institution, the candidate items may be financial products offered by the financial institution to its customers.
Optionally, in this embodiment, when a classification requirement for demand parties is detected, the server may acquire the classification reference data of different demand parties from a database storing such data. Alternatively, when receiving a classification request, the server may parse the request to obtain the classification reference data of each demand party.
In order to make the classification reference data more accurate, the data acquired by the server in the above manner may be regarded as raw data in this embodiment, and the server may perform data processing on the raw data to determine the classification reference data. To make the classification reference data more valuable as a reference, the data processing in this embodiment may include preprocessing the raw data and then screening the preprocessed data to determine the classification reference data for subsequent use.
Optionally, taking raw data that include character-type variables and numerical variables as an example, the preprocessing of the raw data may be implemented with at least one of the following conventional techniques, which is not limited in this application.
In one implementation, meaningless data in the raw data may be removed. For example, for character-type variables, variables with the same name or the same meaning (for example, two fields that both record the user name) may be identified and only one of them retained. For numerical variables, the standard deviation may be determined and variables whose standard deviation is zero may be deleted.
In another implementation, missing values may be completed. For example, the raw data may be classified with a hierarchical clustering method to obtain a classification result; for each class of raw data, the mean of that class is determined as the supplementary value, and the missing values in that class are set to the supplementary value.
In yet another implementation, outliers may be handled. Illustratively, whether outliers exist in the raw data may be detected with a preset detection algorithm such as the interquartile range (IQR) method, and outliers may be replaced with a preset quantile (e.g., a quartile).
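As a concrete illustration, the sketch below shows one way the IQR-based outlier handling described above could look, assuming NumPy and the conventional 1.5×IQR fences; replacing low and high outliers with the first and third quartile respectively follows the replace-with-a-quantile idea in the text, while the function name and the 1.5 factor are illustrative assumptions rather than values fixed by the embodiment.

```python
import numpy as np

def replace_outliers_iqr(values: np.ndarray) -> np.ndarray:
    """Detect outliers with the IQR rule and replace them with the nearest quartile."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    cleaned = values.astype(float)
    cleaned[values < low] = q1   # low outliers -> first quartile
    cleaned[values > high] = q3  # high outliers -> third quartile
    return cleaned
```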
Optionally, the data screening of the preprocessed data may be implemented with at least one of the following conventional techniques, which is not limited in this application.
In one implementation, the operation data that characterize transaction records in the interaction behavior data may be extracted from the preprocessed data as classification reference data, and the other data may be taken as first candidate data. Data that are highly correlated with the target data are then removed from the first candidate data, and the remaining data are taken as second candidate data. The second candidate data are then ranked by importance, and a preset number of the top-ranked second candidate data are also taken as classification reference data.
Optionally, the second candidate data may be determined from the first candidate data by determining the correlation coefficients between the different first candidate data and the target data according to a predetermined correlation coefficient algorithm, where the correlation coefficient may include at least one of the Pearson correlation coefficient and the Spearman correlation coefficient; the correlation coefficients are then sorted from high to low, and the first candidate data whose correlation coefficient is lower than a preset coefficient threshold, or the 2N first candidate data with the lowest correlation coefficients (N being an integer), are selected as second candidate data. The classification reference data may be selected from the second candidate data by determining the importance index of each second candidate datum according to how it is used by the decision trees in a random forest model, sorting the second candidate data in descending order of importance index, and selecting the top-ranked second candidate data as classification reference data. It can be understood that the larger the importance index, the greater the influence of the corresponding second candidate datum on the classification result. The preset coefficient threshold may be set by a technician as needed or from experience, or determined through multiple experiments, which is not limited in this application.
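A minimal sketch of the screening step described above, assuming pandas and scikit-learn, numeric candidate variables and a regression-type target; the function name and the parameters corr_threshold and top_k are illustrative, and RandomForestRegressor stands in for whatever random forest model the embodiment actually uses.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def screen_reference_data(candidates: pd.DataFrame, target: pd.Series,
                          corr_threshold: float = 0.8, top_k: int = 10) -> pd.DataFrame:
    """First candidate data -> second candidate data -> classification reference data."""
    # Keep only candidates whose correlation with the target data is below the threshold.
    corr = candidates.corrwith(target, method="spearman").abs()
    second = candidates.loc[:, corr[corr < corr_threshold].index]

    # Rank the remaining candidates by how much the random forest's trees rely on them.
    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(second, target)
    importance = pd.Series(forest.feature_importances_, index=second.columns)

    # The top-ranked variables become classification reference data.
    return second.loc[:, importance.sort_values(ascending=False).head(top_k).index]
```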
S202, constructing a first objective function by taking the number of clusters as the independent variable and taking the clustering degree of cluster analysis performed on the classification reference data under that number of clusters as the dependent variable.
The clustering degree may be a distance-related parameter between the sample data in each cluster and the cluster center, and represents the sum of the squared distances between each sample in a cluster and the center of that cluster.
In this embodiment, the number of clusters may be taken as the independent variable, and the clustering degree of performing cluster analysis on the classification reference data under that number of clusters may be taken as the dependent variable, so that the first objective function can be constructed according to a predetermined function construction manner.
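A minimal sketch of the clustering degree (WCSS) that serves as the dependent variable of the first objective function, assuming NumPy and scikit-learn; the embodiment does not prescribe a particular partitioning routine, so k-means is used here only to obtain a k-cluster partition for evaluating WCSS.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustering_degree(data: np.ndarray, k: int, random_state: int = 0) -> float:
    """WCSS(k): sum of squared distances between each sample and the centroid
    of the cluster it is assigned to, for a k-cluster partition of the data."""
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(data)
    assigned_centroids = model.cluster_centers_[model.labels_]
    return float(((data - assigned_centroids) ** 2).sum())  # equals model.inertia_
```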
S203, in a case where the first objective function conforms to a preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number.
The target cluster number represents the total number of classes after the demand parties are classified. The ideal mean of the preset Gaussian distribution may be set or adjusted by a technician as needed or from experience, or determined through multiple experiments, which is not limited in this application.
Optionally, the clustering degree data corresponding to the first objective function under different cluster numbers may be determined; the clustering degree data are sampled, data fitting based on the preset Gaussian distribution is performed on the sampling result, the Gaussian mean corresponding to the fitted preset Gaussian distribution is determined, and the number of clusters corresponding to that Gaussian mean is taken as the target cluster number.
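One possible reading of the sampling-and-fitting procedure above, assuming SciPy: a Gaussian curve is fitted to sampled (k, WCSS(k)) pairs and the cluster number closest to the fitted Gaussian mean is returned. The parameterization, initial guess and function names are illustrative assumptions, not the embodiment's prescribed fitting routine.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(k, amplitude, mean, std):
    return amplitude * np.exp(-((k - mean) ** 2) / (2.0 * std ** 2))

def target_cluster_number(ks, wcss) -> int:
    """Fit a Gaussian curve to sampled (k, WCSS(k)) pairs and return the
    cluster number closest to the fitted Gaussian mean."""
    ks = np.asarray(ks, dtype=float)
    wcss = np.asarray(wcss, dtype=float)
    p0 = [wcss.max(), ks.mean(), max((ks.max() - ks.min()) / 4.0, 1.0)]
    popt, _ = curve_fit(gaussian, ks, wcss, p0=p0)
    return int(np.clip(np.rint(popt[1]), ks.min(), ks.max()))

# e.g. ks = np.arange(2, 21); wcss = [clustering_degree(data, int(k)) for k in ks]
```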
Optionally, the first objective function and the preset Gaussian distribution may be input into a pre-trained Gaussian mean determination model; the model processes the received data and outputs a Gaussian mean, which is substituted into the first objective function to determine the number of clusters corresponding to that Gaussian mean.
It should be noted that the Gaussian mean determination model may be constructed based on a common neural network algorithm, which is not described in detail here. Further, the model may be trained by inputting pre-obtained function samples, the Gaussian distribution samples corresponding to the function samples, and the labels corresponding to the samples into the Gaussian mean determination model, and performing supervised training according to the Gaussian mean output by the model and the label of the corresponding sample, so as to improve the accuracy with which the model determines the Gaussian mean. The training may follow common model-training practice and is not described in detail here.
S204, performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties.
In an alternative embodiment, the target cluster number and the classification reference data may be input into a pre-trained classification model, and the classification model processes the received data and outputs a classification result.
It should be noted that the classification model may be constructed based on a common neural network model, which is not described in detail here. Further, the classification model may be trained by inputting the data samples to be classified and the classification labels corresponding to the samples into the classification model, and performing supervised training according to the classification result output by the model and the labels of the corresponding samples, so as to improve the accuracy with which the classification model determines the classification result. The training may follow common model-training practice and is not described in detail here. The number of categories handled by the classification model is the target cluster number.
In another alternative embodiment, the classification reference data may be clustered based on a common clustering algorithm, such as the X-means clustering algorithm, a hierarchical clustering algorithm, a density clustering algorithm, or a mean-shift clustering algorithm.
Since each piece of classification reference data corresponds to a demand party, performing cluster analysis on the classification reference data is equivalent to classifying the demand parties. That is, in this embodiment, different demand parties may be divided into the same or different categories, and the classification reference data of one category form one cluster.
In the above demand party classification method, a first objective function is constructed from the acquired classification reference data of different demand parties by taking the number of clusters as the independent variable and the clustering degree obtained by performing cluster analysis on the classification reference data under that number of clusters as the dependent variable; in the case where the first objective function conforms to a preset Gaussian distribution, the number of clusters corresponding to the Gaussian mean is taken as the target cluster number; and cluster analysis is then performed on the classification reference data based on the target cluster number so as to classify the different demand parties. Because the first objective function describes how the clustering degree of the classification reference data changes with the number of clusters and conforms to the preset Gaussian distribution, the number of clusters at which the clustering degree is most typical (i.e., the number of clusters corresponding to the Gaussian mean) is taken as the target cluster number. Performing cluster analysis with this target cluster number therefore yields a better clustering of the classification reference data, so the whole process improves the accuracy of classifying demand parties and, in turn, the effect of subsequent product recommendation.
Further, in order to improve the accuracy of the target cluster number, in one embodiment, as shown in FIG. 3, in the case where the first objective function conforms to the preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean as the target cluster number includes:
S301, determining, based on the first objective function, the reference clustering degree corresponding to the clustering result of performing cluster analysis with each determined reference cluster number.
Illustratively, the first objective function may be as follows:

$$\mathrm{WCSS}(k)=\sum_{i=1}^{k}\sum_{x_j^{(i)}\in C_i}\left\|x_j^{(i)}-\mu_i\right\|^2$$

where WCSS denotes the clustering degree and is the dependent variable of the first objective function; $x_j^{(i)}$ denotes the jth piece of classification reference data within the ith cluster $C_i$; $\mu_i$ denotes the centroid of the ith cluster; and $k$ is the number of clusters and is the independent variable of the first objective function.
The reference cluster numbers may be the cluster numbers determined before the current determination of the target cluster number. A reference clustering degree may be the clustering degree determined when cluster analysis is performed on the classification reference data with a given reference cluster number.
For each determined reference cluster number, cluster analysis is performed on the classification reference data to obtain that number of clusters, and the reference clustering degree corresponding to the determined reference cluster number is determined with the following formula:

$$\mathrm{WCSS}(k)=\sum_{i=1}^{k}\sum_{x_j^{(i)}\in C_i}\left\|x_j^{(i)}-\mu_i\right\|^2$$

where WCSS denotes the reference clustering degree corresponding to the determined reference cluster number; $x_j^{(i)}$ denotes the jth piece of classification reference data within the ith cluster $C_i$; $\mu_i$ denotes the centroid of the ith cluster; and $k$ is the determined reference cluster number.
For example, if the reference cluster number corresponds to three clusters, this embodiment may determine the squared distances between the centroid of each of the three clusters and the classification reference data in it, and take the sum of these squared distances as the reference clustering degree. If there is only one determined reference cluster number, the reference clustering degree determined by the first objective function may be used directly as the reference clustering degree of this embodiment.
If there are at least two determined reference cluster numbers, the reference clustering degrees corresponding to the reference cluster numbers may be arranged into a matrix to obtain the reference clustering degree of this embodiment, or the sum of the reference clustering degrees may be used as the reference clustering degree of this embodiment, which is not limited.
S302, constructing a second objective function according to the data difference between the reference cluster number to be determined and the determined reference cluster numbers and according to the reference clustering degrees.
The reference cluster number to be determined is the cluster number determined by the current calculation; it can be understood that it is an unknown quantity. The data difference condition characterizes the difference between the reference cluster number to be determined and each determined reference cluster number, and may be, for example, at least one of the sum, mean, variance or covariance of the differences between the reference cluster number to be determined and the reference cluster numbers.
It is noted that if no reference cluster number has yet been determined, the first reference cluster number may be determined as follows: one or more points are randomly selected from the classification reference data as initial cluster centers; for each piece of classification reference data that is not a cluster center, the shortest distance to the existing cluster centers is calculated and the probability of its becoming the next cluster center is computed; the next cluster center is then selected based on a roulette-wheel method until all cluster centers have been selected, and the number of cluster centers is taken as the first reference cluster number.
Optionally, in order to make the second objective function more accurate and thereby improve the accuracy of the reference cluster number to be determined, in this embodiment the reference difference data between the reference cluster number to be determined and the determined reference cluster numbers may be determined based on a covariance function of the preset Gaussian distribution; the difference between the reference clustering degrees of the determined reference cluster numbers and their mean is taken as the difference adjustment data; and the second objective function is constructed according to the product of the reference difference data and the difference adjustment data.
The reference difference data can characterize the difference between the reference cluster number to be determined and any determined reference cluster number. Optionally, the value of the reference difference data may be used directly as the data difference condition, or the data difference condition corresponding to the determined reference difference data may be used according to a preset correspondence between data values and data difference conditions, which is not limited. The difference adjustment data are used to adjust the reference difference data.
Optionally, the reference difference data may be used to measure the data difference between different cluster numbers. For example, the data difference between the reference cluster number to be determined and any determined reference cluster number may be expressed by the following formula:

$$k(k_*,k_i)=\exp\!\left(-\frac{(k_*-k_i)^2}{2\sigma^2}\right)$$

where $k_*$ denotes the reference cluster number to be determined; $k_i$ denotes a determined reference cluster number; $k(k_*,k_i)$ denotes the reference difference data between the reference cluster number to be determined and that reference cluster number (i.e., their covariance); and $\sigma$ is a parameter of the reference difference data that is set in advance and may be a fixed value.
When at least one reference cluster number has been determined, the reference difference data of this embodiment may be expressed as $k(k_*,K)$, where $k_*$ denotes the reference cluster number to be determined and $K$ denotes the matrix formed by the determined reference cluster numbers. It should be noted that when there is only one determined reference cluster number, $K$ is a matrix containing a single element, namely that determined reference cluster number.
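A minimal sketch of the reference difference data under the squared-exponential reading of the covariance function assumed above, using NumPy; sigma stands for the preset fixed parameter.

```python
import numpy as np

def reference_difference(k_star: float, k_determined, sigma: float = 1.0) -> np.ndarray:
    """Covariance-style reference difference data k(k*, K) between the cluster
    number to be determined and each determined reference cluster number."""
    k_determined = np.asarray(k_determined, dtype=float)
    return np.exp(-((k_star - k_determined) ** 2) / (2.0 * sigma ** 2))
```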
Optionally, the difference adjustment data may be determined as follows: the mean of the reference clustering degrees of the determined reference cluster numbers is determined, and the difference between the reference clustering degrees determined in S301 and that mean is taken as the difference adjustment data.
For example, the difference adjustment data may be determined with the following formula:

$$\Delta(K)=\mathrm{WCSS}(K)-\overline{\mathrm{WCSS}(K)}$$

where $\Delta(K)$ denotes the difference adjustment data corresponding to the determined reference cluster numbers; $K$ denotes the matrix formed by the determined reference cluster numbers; $\mathrm{WCSS}(K)$ denotes the reference clustering degrees corresponding to the determined reference cluster numbers; and $\overline{\mathrm{WCSS}(K)}$ denotes the mean of the reference clustering degrees corresponding to the determined reference cluster numbers.
Further, the present embodiment may use the product of the determined difference adjustment data and the reference difference data as the second objective function, and the second objective function may be as follows, for example:
$$f(k_*)=\mu_0+k(k_*,K)\,\Delta(K)$$

where $f(k_*)$ is the value of the second objective function; $\mu_0$ denotes the preset ideal mean of the preset Gaussian distribution, which may be determined from experience or through multiple experiments and may vary with specific needs (illustratively, it may be 0 or 10, without limitation); $k_*$ denotes the reference cluster number to be determined; $K$ denotes the matrix formed by the determined reference cluster numbers; $\Delta(K)$ denotes the difference adjustment data corresponding to the determined reference cluster numbers; and $k(k_*,K)$ denotes the reference difference data.
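A minimal sketch of this simple form of the second objective function, assuming NumPy; mu0 and sigma stand for the preset ideal mean and the preset covariance parameter and are illustrative defaults.

```python
import numpy as np

def second_objective_simple(k_star: float, k_determined, wcss_determined,
                            mu0: float = 0.0, sigma: float = 1.0) -> float:
    """f(k*) = mu0 + k(k*, K) . (WCSS(K) - mean(WCSS(K)))."""
    k = np.asarray(k_determined, dtype=float)
    y = np.asarray(wcss_determined, dtype=float)
    cov = np.exp(-((k_star - k) ** 2) / (2.0 * sigma ** 2))  # reference difference data
    delta = y - y.mean()                                     # difference adjustment data
    return float(mu0 + cov @ delta)
```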
S303, taking the cluster number that maximizes the second objective function as the new reference cluster number, and iteratively performing the reference-clustering-degree determination operation until a preset iteration termination condition is met.
Specifically, in this embodiment, the constructed second objective function is solved with the goal of maximizing it, so as to determine the new reference cluster number; that is, the new reference cluster number is $k_*=\arg\max_{k} f(k)$.
Optionally, the second objective function may be solved and the new reference cluster number determined in many ways. For example, according to a predetermined value interval of the new reference cluster number, the integers in the interval may be substituted into the second objective function in turn, the value of the second objective function determined, and the value corresponding to the maximum of the second objective function taken as the new reference cluster number.
It should be noted that the value interval of the new reference cluster number may be determined based on human experience or through a large number of experiments, which is not limited. Alternatively, the second objective function may be solved with a function-solving model, which may be constructed based on a common convolutional neural network and is not described in detail here. In this embodiment, the cluster number corresponding to the maximum of the second objective function is taken as the new reference cluster number, and the step of determining the reference clustering degree in S301 is returned to and performed again, until the number of iterations reaches a preset requirement, or until the difference between the cluster number corresponding to the current maximum of the second objective function and the previously determined reference cluster number is smaller than a preset difference several times in a row, at which point the preset iteration termination condition is considered to be met.
S304, taking the reference cluster number determined when the preset iteration termination condition is met as the target cluster number.
Specifically, in this embodiment, the reference cluster number determined in the current iteration when the preset iteration termination condition is met is taken as the target cluster number. For example, the preset iteration termination condition may be that a preset number of iterations is reached, or that the determined reference cluster number keeps fluctuating within a preset range. The preset iteration termination condition may be determined based on human experience or through a large number of experiments, which is not limited.
Further, in order to make the value of the target cluster number more accurate, in this embodiment the reference cluster number determined when the preset iteration termination condition is met may also be verified in order to determine the target cluster number. For example, the reference cluster number may be verified based on a cross-validation method; if the verification passes, the reference cluster number is taken as the target cluster number; otherwise, the iteration continues until the target cluster number is determined.
In the above embodiment, the reference clustering degrees are determined according to the first objective function, and the second objective function is constructed according to the data difference between the reference cluster number to be determined and the determined reference cluster numbers and according to the reference clustering degrees. The cluster number that maximizes the second objective function is taken as the new reference cluster number and the process is iterated until the preset iteration termination condition is met, and the cluster number determined in the last iteration is taken as the target cluster number. The second objective function can characterize both the difference between the reference cluster number to be determined and the determined reference cluster numbers and the clustering degree; when the second objective function is maximal, the corresponding reference clustering degree is also comparatively large, so the target cluster number determined when the iteration termination condition is met gives a better clustering effect on the classification reference data.
It should be noted that, in the process of determining the target cluster number, the value of the first reference cluster number may be preset or may be determined according to an existing clustering algorithm, which is not limited. Determining a cluster number with an existing clustering algorithm is a common technical means in the art and is not described here.
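Putting S301–S304 together, the sketch below iterates the (unweighted) second objective over a candidate interval of cluster numbers, assuming NumPy and scikit-learn; the candidate interval, the initial reference cluster number and the termination rule (a repeated proposal or a maximum number of iterations) are illustrative assumptions rather than values fixed by the embodiment.

```python
import numpy as np
from sklearn.cluster import KMeans

def determine_target_clusters(data, k_candidates=tuple(range(2, 21)),
                              k_init=3, max_iter=10, mu0=0.0, sigma=1.0):
    """Iteratively propose new reference cluster numbers by maximising the
    simple second objective, until a proposal repeats or max_iter is reached."""
    def wcss(k):
        return float(KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_)

    k_determined, wcss_determined = [k_init], [wcss(k_init)]
    for _ in range(max_iter):
        ks = np.array(k_determined, dtype=float)
        delta = np.array(wcss_determined) - np.mean(wcss_determined)
        scores = [mu0 + float(np.exp(-((k - ks) ** 2) / (2.0 * sigma ** 2)) @ delta)
                  for k in k_candidates]
        k_new = int(k_candidates[int(np.argmax(scores))])
        if k_new in k_determined:   # proposal repeats: treat as converged
            break
        k_determined.append(k_new)
        wcss_determined.append(wcss(k_new))
    return k_determined[-1]
```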
On the basis of the above embodiments, in order to further improve the accuracy of the difference adjustment data, another way of constructing the second objective function is provided so as to improve the accuracy and rationality of the construction result. As shown in FIG. 4, it includes the following steps:
S401, determining difference weight data according to the data differences among the determined reference cluster numbers.
The difference weight data may be data used to weight the data difference condition.
Optionally, in order to make the difference weight data more accurate and further improve the accuracy of the reference cluster number to be determined, the difference weight data may be determined in this embodiment as follows: the difference weight data among the determined reference cluster numbers are determined based on the covariance function of the preset Gaussian distribution. That is, the data difference between any two reference cluster numbers in the set of determined reference cluster numbers is determined in the manner of S302, and the difference weight data among the determined reference cluster numbers are then determined from these data differences.
For example, the mean or the sum of the data differences among the determined reference cluster numbers may be used as the difference weight data. Alternatively, a data difference matrix may be constructed from the data differences, and the difference weight data among the determined reference cluster numbers determined from the data difference matrix. Illustratively, they can be determined by the following formula:
$$W=\left(D+\sigma_n^2 I\right)^{-1}$$

where $W$ denotes the difference weight data of the determined reference cluster numbers; $D$ denotes the data difference matrix, whose $(a,b)$ entry is $k(k_a,k_b)$, i.e. the quantized data difference between the $a$th and $b$th determined reference cluster numbers $k_a$ and $k_b$; $\sigma_n$ is a preset noise coefficient; and $I$ is an identity matrix with the same number of rows as $D$ and diagonal entries equal to 1.
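A minimal sketch of the difference weight data under the formula above, assuming NumPy; sigma and noise stand for the preset covariance parameter and the preset noise coefficient, with illustrative default values.

```python
import numpy as np

def difference_weight(k_determined, sigma: float = 1.0, noise: float = 1e-3) -> np.ndarray:
    """W = (D + noise^2 * I)^{-1}, with D the pairwise covariance matrix of
    the determined reference cluster numbers."""
    k = np.asarray(k_determined, dtype=float)
    D = np.exp(-((k[:, None] - k[None, :]) ** 2) / (2.0 * sigma ** 2))
    return np.linalg.inv(D + noise ** 2 * np.eye(len(k)))
```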
It should be noted that if there is only one determined reference cluster number, this embodiment may determine the difference weight data between that reference cluster number and itself based on the covariance function of the preset Gaussian distribution. Illustratively, if the only determined reference cluster number is $k_1$, then in determining the difference weight data $W$, the data difference matrix $D$ in the above formula reduces to the single entry $k(k_1,k_1)$, the covariance of $k_1$ with itself.
Since the difference weight data between a reference cluster number and other reference cluster numbers cannot be determined when there is only one determined reference cluster number, this embodiment provides a specific way of determining the difference weight data in that case, which makes the determination of the difference weight data more rigorous.
S402, weighting the difference adjustment data according to the difference weight data to update the difference adjustment data.
Specifically, in this embodiment, the product of the difference weight data and the difference adjustment data may be used as updated difference adjustment data.
Illustratively, the updated difference adjustment data may be expressed as $\Delta'(K)=W\,\Delta(K)$, where $W$ denotes the difference weight data of the determined reference cluster numbers and $\Delta(K)$ denotes the difference adjustment data before updating, i.e. the difference adjustment data corresponding to the determined reference cluster numbers.
S403, constructing a second objective function according to the product of the reference difference data and the updated difference adjustment data.
Optionally, in this embodiment the product of the reference difference data and the updated difference adjustment data may be used as the second objective function, which may be expressed by the following formula:

$$f(k_*)=\mu_0+k(k_*,K)\left(D+\sigma_n^2 I\right)^{-1}\left(\mathrm{WCSS}(K)-\overline{\mathrm{WCSS}(K)}\right)$$

where $f(k_*)$ is the value of the second objective function; $\mu_0$ denotes the preset ideal mean of the preset Gaussian distribution, which may be determined from experience or through multiple experiments and may vary with specific needs (illustratively, 0 or 10, without limitation); $k_*$ denotes the reference cluster number to be determined; $K$ denotes the matrix formed by the determined reference cluster numbers; $k(k_*,K)$ denotes the reference difference data; $\mathrm{WCSS}(K)$ denotes the reference clustering degrees and $\overline{\mathrm{WCSS}(K)}$ their mean, so that their difference is the difference adjustment data; $D$ denotes the data difference matrix, whose $(a,b)$ entry $k(k_a,k_b)$ is the quantized data difference between the $a$th and $b$th determined reference cluster numbers; $\sigma_n$ is the preset noise coefficient; and $I$ is an identity matrix with the same number of rows as $D$ and diagonal entries equal to 1.
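A minimal sketch of the weighted second objective function above, assuming NumPy; parameter names and defaults are illustrative.

```python
import numpy as np

def second_objective_weighted(k_star: float, k_determined, wcss_determined,
                              mu0: float = 0.0, sigma: float = 1.0,
                              noise: float = 1e-3) -> float:
    """f(k*) = mu0 + k(k*, K) (D + noise^2 I)^{-1} (WCSS(K) - mean(WCSS(K)))."""
    k = np.asarray(k_determined, dtype=float)
    y = np.asarray(wcss_determined, dtype=float)
    cov_star = np.exp(-((k_star - k) ** 2) / (2.0 * sigma ** 2))        # reference difference data
    D = np.exp(-((k[:, None] - k[None, :]) ** 2) / (2.0 * sigma ** 2))  # data difference matrix
    weight = np.linalg.inv(D + noise ** 2 * np.eye(len(k)))             # difference weight data
    delta = weight @ (y - y.mean())                                     # updated difference adjustment data
    return float(mu0 + cov_star @ delta)
```

This expression has the same shape as the posterior mean of Gaussian-process regression under a squared-exponential covariance, which would explain why the construction is phrased in terms of a covariance function of a Gaussian distribution; that reading is an interpretation, not something stated in the embodiment.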
In the above embodiment, the difference weight data is determined according to the determined data difference condition between the number of each reference cluster, and the determined difference adjustment data is weighted by using the difference weight data, and since the weighted difference adjustment data can better adjust the reference difference data, the second objective function constructed according to the product of the weighted difference adjustment data and the reference difference data is more reasonable and accurate.
Further, as shown in FIG. 5, the process of performing cluster analysis on the classification reference data based on the target cluster number so as to classify the different demand parties may include the following steps:
S501, selecting an initial cluster center from the classification reference data.
The initial cluster center may be any one or more pieces of the classification reference data that can become cluster centers. In this embodiment, one piece, or a preset number of pieces, of the classification reference data may be randomly selected as initial cluster centers.
S502, determining, according to the Manhattan distance between each piece of classification reference data not selected as a cluster center and each existing cluster center, the reference probability that the corresponding classification reference data becomes a cluster center.
The existing cluster centers include the initial cluster center. The reference probability characterizes the probability that each piece of classification reference data not selected as a cluster center becomes a cluster center; it can be understood that the larger the reference probability, the more likely the classification reference data is to become a cluster center, and vice versa.
Specifically, in this embodiment, for each piece of classification reference data not selected as a cluster center, the Manhattan distances between it and the existing cluster centers are calculated and the minimum of these distances is selected. The reference probability of each such piece of classification reference data becoming a cluster center is then determined according to its minimum distance.
For example, for each piece of classification reference data not selected as a cluster center, the ratio of its minimum distance to the sum of all such minimum distances may be used as its reference probability of becoming a cluster center. The reference probability can be determined by the following formula:

$$P(x)=\frac{D(x)}{\sum_{x'\in X}D(x')}$$

where $P(x)$ denotes the reference probability corresponding to the xth piece of classification reference data not selected as a cluster center; $D(x)$ denotes the minimum distance corresponding to that piece of classification reference data; and $X$ denotes the set of classification reference data not selected as cluster centers.
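A minimal sketch of the reference probabilities above, assuming NumPy; data holds the classification reference data not yet selected as cluster centers and centers holds the existing cluster centers.

```python
import numpy as np

def reference_probabilities(data: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """P(x) = D(x) / sum(D(x')), with D(x) the minimum Manhattan distance from
    sample x to the existing cluster centers."""
    dists = np.abs(data[:, None, :] - centers[None, :, :]).sum(axis=2)  # Manhattan distances
    min_dists = dists.min(axis=1)                                       # D(x) for each sample
    return min_dists / min_dists.sum()
```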
It should be noted that the Manhattan distance requires the feature vectors of the samples to be numerical and the dimensions to be comparable. Therefore, in this embodiment, data processing and feature encoding need to be performed on the classification reference data before the Manhattan distance between the classification reference data not selected as cluster centers and the existing cluster centers is determined. Illustratively, the data processing of the classification reference data may be performed based on a Generative Pre-trained Transformer (GPT) model, specifically as follows. Step one, the classification reference data are segmented: the classification reference data are divided into individual tokens, which may be words, subwords, characters, or the like. Step two, word embedding is performed on each token: each token is encoded and mapped into a word embedding vector of fixed dimension, and the word embedding vector of each token represents its semantics and context information. Step three, the vectors are processed by an encoder: the word embedding vectors are processed through multiple layers of self-attention and feed-forward network layers, so that the dependency relationships and context information among the tokens are determined. Step four, the output of the encoder is input into a corresponding processing layer (for example, a fully connected layer in a classification task), and the processing layer processes the received data so as to output the data processing result of the classification reference data, i.e. its numerical representation.
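A heavily simplified sketch of the GPT-based encoding described above, assuming the Hugging Face transformers library and the public gpt2 checkpoint; mean-pooling the final hidden states stands in for the task-specific processing layer of step four, so this is an illustrative assumption rather than the embodiment's exact pipeline.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
encoder = GPT2Model.from_pretrained("gpt2")

def encode_reference_data(texts):
    """Tokenize, embed and encode each text, then pool to one numerical vector."""
    vectors = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)  # step one: tokenization
        with torch.no_grad():
            hidden = encoder(**inputs).last_hidden_state                # steps two/three: embedding + self-attention encoder
        vectors.append(hidden.mean(dim=1).squeeze(0))                   # step four (simplified): pooled numerical representation
    return torch.stack(vectors).numpy()
```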
Further, the Manhattan distance between the classification reference data not selected as cluster centers and an existing cluster center is determined from the numerical representation of each piece of classification reference data. The Manhattan distance may be calculated with the following formula:

$$d(x,c_j)=\sum_{z=1}^{m}\left|x_z-c_{j,z}\right|$$

where $x_z$ is the zth-dimension feature of the xth piece of classification reference data not selected as a cluster center; $c_{j,z}$ is the zth-dimension feature of the classification reference data at the jth cluster center; and $m$ is the largest dimension of the feature data.
S503, selecting the next cluster center according to each reference probability based on a wheel disc method until the number of the existing cluster centers reaches the number of the target clusters.
Specifically, in this embodiment, the classification reference data with the largest reference probability is taken as the next cluster center, the set of existing cluster centers is updated, and the operation of determining the reference probability of each classification reference data as a cluster center is re-executed according to the updated existing cluster centers. This repeats until the number of cluster centers reaches the target number of clusters; the wheel disc selection then stops, yielding the target number of cluster centers and the position of each cluster center.
S504, carrying out cluster analysis on the corresponding classification reference data according to the Manhattan distance between the classification reference data which is not selected as a cluster center and each cluster center, so as to classify different demanding parties.
Specifically, in this embodiment, the manhattan distance between the classification reference data not selected as the cluster center and each cluster center is determined, and according to the manhattan distance and a preset distance threshold, the cluster center corresponding to the classification reference data not selected as the cluster center is determined, and the classification reference data is clustered to the cluster where the cluster center is located. The distance threshold may be determined according to manual experience, or may be determined according to a plurality of experiments, which is not limited.
Optionally, the specific manner of determining the manhattan distance is described in detail in S402, and is not described herein. In this embodiment, if the manhattan distance between the classified reference data and a cluster center is smaller than the distance threshold, the classified reference data is clustered to the cluster in which the cluster center is located, so as to determine the number of clusters of the target clusters, and the classified reference data and the corresponding demander in each cluster, thereby achieving the effect of classifying different demanders.
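The assignment step described above can be sketched as follows. This is an illustrative reading in which each point is attached to its nearest cluster center only when that Manhattan distance is below the preset threshold; the handling of points that match no center (label -1) and the example values are assumptions of the sketch.

import numpy as np

def assign_to_clusters(points, centers, distance_threshold):
    """Cluster each classification reference data to its nearest center within the threshold."""
    labels = []
    for x in points:
        dists = [np.abs(np.asarray(x, dtype=float) - np.asarray(c, dtype=float)).sum()
                 for c in centers]
        j = int(np.argmin(dists))
        # Assumption: points farther than the threshold from every center are left unassigned (-1).
        labels.append(j if dists[j] < distance_threshold else -1)
    return labels

labels = assign_to_clusters(points=[[0.2, 0.1], [3.9, 4.0]],
                            centers=[[0.0, 0.0], [4.0, 4.0]],
                            distance_threshold=1.0)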
Optionally, the present embodiment may divide different demand parties into multiple types such as a conservative type, a balanced type, or an active type.
In the above embodiment, the next cluster center is determined according to the Manhattan distance between the classification reference data not selected as a cluster center and each existing cluster center, and after the target number of clusters is determined, the classification reference data are divided among that number of clusters, thereby classifying different demand parties. Unlike the Euclidean distance, the Manhattan distance is adopted in determining the next cluster center, which makes the determination of cluster centers more accurate and further improves the accuracy of classifying different demand parties.
For ease of understanding by those skilled in the art, the above-mentioned method for classifying a demanding party is described in detail below. As shown in fig. 6, the method may include:
S601, acquiring classification reference data of different demanding parties.
The classification reference data comprises basic attribute data of the demander and interaction behavior data between the corresponding demander and the candidate object.
S602, determining cluster centers among the classification reference data based on a wheel disc method, and taking the number of the determined cluster centers as the first determined number of reference clusters.
It should be noted that the first number of reference clusters may be determined as follows: randomly select one or more points from the classification reference data as initial cluster centers; for each classification reference data that is not a cluster center, calculate the shortest distance to each existing cluster center and, from it, the probability value that this classification reference data becomes the next cluster center; select the next cluster center based on a wheel disc method; and repeat until all cluster centers are selected, taking the number of cluster centers as the first number of reference clusters.
S603, constructing a first objective function by taking the number of clusters as an independent variable and taking the clustering degree of clustering analysis on each classified reference data based on the independent variable as a dependent variable.
S604, determining a reference clustering degree corresponding to a clustering result of the clustering analysis through the determined reference cluster number based on the first objective function.
S605, determining reference difference data between the number of the reference clusters to be determined and the number of the determined reference clusters based on a covariance function of a preset Gaussian distribution.
S606, taking the difference between the reference clustering degree corresponding to each determined reference cluster number and the average value of these reference clustering degrees as difference adjustment data.
Illustratively, the difference adjustment data may be expressed as y − μ̄, where y denotes the vector of reference clustering degrees corresponding to the determined reference cluster numbers and μ̄ denotes their average value. The reference difference data of S605 may be determined through the covariance function as:

K*_i = k(k*, k_i)

where k* denotes the number of reference clusters to be determined; k_i denotes the i-th determined reference cluster number; K*_i denotes the reference difference data (i.e., the covariance) between k* and k_i; and k(·, ·) is the covariance function of the preset Gaussian distribution, whose parameter may be set in advance to a fixed value.
It should be noted that, when at least one reference cluster number has been determined, the reference difference data in this embodiment may be expressed as the vector K* = [k(k*, k_1), k(k*, k_2), …, k(k*, k_n)], where k* denotes the number of reference clusters to be determined and K = [k_1, k_2, …, k_n] denotes the matrix of determined reference cluster numbers. When only one reference cluster number has been determined, K is a matrix containing a single element, that element being the determined reference cluster number.
S607, determining the difference weight data among the determined reference cluster numbers based on a covariance function of a preset Gaussian distribution.
Illustratively, the difference weight data may be determined by the following formula:

W = (K_d + σ_n^2 · I)^(-1)

where W denotes the difference weight data for the determined reference cluster numbers; K_d denotes the data difference matrix, composed of the elements k(k_i, k_j), in which k_i and k_j denote the i-th and j-th determined reference cluster numbers and k(k_i, k_j) denotes the data difference between them; σ_n is the preset noise coefficient; and I is the identity matrix with the same number of rows as K_d.
If there is only one determined number of reference clusters, the difference weight data between that reference cluster number and itself is determined based on the covariance function of the preset Gaussian distribution. Illustratively, if the determined reference cluster number is k_1, then in determining the difference weight data W, the matrix K_d in the above formula reduces to the single element k(k_1, k_1), i.e., the data difference between k_1 and itself.
S608, weighting the difference adjustment data according to the difference weight data to update the difference adjustment data.
Illustratively, the updated difference adjustment data may be expressed as W · (y − μ̄), i.e., (K_d + σ_n^2 · I)^(-1) · (y − μ̄), where y denotes the vector of reference clustering degrees corresponding to the determined reference cluster numbers and μ̄ denotes their average value.
S609, constructing a second objective function according to the product of the reference difference data and the updated difference adjustment data.
Illustratively, the second objective function may be formulated as follows:
f(k*) = μ_0 + K*^T · (K_d + σ_n^2 · I)^(-1) · (y − μ̄)

where f(k*) is the value of the second objective function; μ_0 denotes a preset ideal mean value, which may be determined empirically or through a number of experiments and may vary with specific needs (illustratively, the ideal mean value of the preset Gaussian distribution may be 0 or 10, which is not limited); k* denotes the number of reference clusters to be determined; K* denotes the reference difference data between k* and the determined reference cluster numbers; y denotes the vector of reference clustering degrees corresponding to the determined reference cluster numbers; μ̄ denotes the average value of these reference clustering degrees, so that (y − μ̄) is the difference adjustment data; K_d denotes the data difference matrix, composed of the elements k(k_i, k_j), in which k_i and k_j denote the i-th and j-th determined reference cluster numbers and k(k_i, k_j) denotes the data difference between them; σ_n is the preset noise coefficient; and I is the identity matrix with the same number of rows as K_d.
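The computation of the second objective function can be sketched as follows. This is a minimal illustration only: the RBF covariance function, its length parameter, the noise coefficient, and all numeric values are assumptions, since the form of the covariance function is not fixed here.

import numpy as np

def second_objective(k_star, determined_ks, clustering_degrees, cov, sigma_n, mu_0):
    """f(k*) = mu_0 + K*^T (K_d + sigma_n^2 I)^(-1) (y - mean(y))."""
    ks = np.asarray(determined_ks, dtype=float)
    y = np.asarray(clustering_degrees, dtype=float)
    k_vec = np.array([cov(k_star, k) for k in ks])            # reference difference data (S605)
    K_d = np.array([[cov(a, b) for b in ks] for a in ks])     # data difference matrix
    W = np.linalg.inv(K_d + sigma_n ** 2 * np.eye(len(ks)))   # difference weight data (S607)
    delta = y - y.mean()                                      # difference adjustment data (S606)
    return mu_0 + k_vec @ (W @ delta)                         # weighting and product (S608-S609)

# Assumed RBF covariance function; the length scale and all inputs are illustrative.
rbf = lambda a, b, length=2.0: float(np.exp(-((a - b) ** 2) / (2 * length ** 2)))
value = second_objective(k_star=6, determined_ks=[3, 5, 8],
                         clustering_degrees=[0.42, 0.61, 0.55],
                         cov=rbf, sigma_n=0.1, mu_0=0.0)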
S610, taking the corresponding cluster number when the second objective function is maximum as a new reference cluster number; if the preset iteration termination condition is not satisfied, the routine returns to S604.
S611, under the condition that the preset iteration termination condition is met, taking the number of reference clusters determined by the last iteration as the number of target clusters.
When the number of iterations reaches a preset number or the determined number of reference clusters continuously fluctuates within a preset range, the iteration is stopped, and the number of reference clusters determined in the last iteration is used as the number of target clusters.
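As a sketch of the overall loop S604–S611, the following outline shows one possible reading; the callables clustering_degree and posterior_argmax are hypothetical placeholders (the former would run the cluster analysis with a given cluster number and evaluate the first objective function, the latter would pick the candidate maximising the second objective function given the observations so far), and the stabilisation test stands in for the preset iteration termination condition.

def select_target_cluster_count(candidate_ks, clustering_degree, posterior_argmax, max_iters=20):
    """Iteratively add reference cluster numbers until the choice stabilises (S611)."""
    determined_ks, degrees = [], []
    k = candidate_ks[0]  # first determined reference cluster number (e.g., from S602)
    for _ in range(max_iters):
        determined_ks.append(k)
        degrees.append(clustering_degree(k))                             # S604
        k_next = posterior_argmax(candidate_ks, determined_ks, degrees)  # S605-S610
        if k_next == k:                                                  # termination test (assumed)
            break
        k = k_next
    return k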
S612, selecting an initial clustering center from the classified reference data.
S613, determining the reference probability of the corresponding classification reference data as the clustering center according to the Manhattan distance between the classification reference data which is not selected as the clustering center and each existing clustering center.
Wherein the existing cluster centers comprise initial cluster centers.
S614, selecting the next cluster center according to each reference probability based on a wheel disc method until the number of the existing cluster centers reaches the number of the target clusters.
S615, performing cluster analysis on the corresponding classification reference data according to the Manhattan distance between the classification reference data which is not selected as a cluster center and each cluster center, so as to classify different demanding parties.
Further, in one embodiment, the method may be used to make an item recommendation after classifying the demander, for example, as shown in fig. 7, and the method includes the following steps:
S701, acquiring article demand data generated based on the cluster category of the to-be-recommended demand party.
The to-be-recommended requirement party can be a requirement party with an article recommendation requirement. The item demand data characterizes item demand preference data corresponding to the cluster category.
The category of the to-be-recommended demander is converted into a vector representation using TF-IDF (term frequency–inverse document frequency) coding. The basic idea of TF-IDF is: the more times a keyword w_i occurs in a document d_j, the greater the importance of w_i to d_j, so w_i can be used to express the meaning of d_j; in addition, the more documents the keyword w_i occurs in, the smaller its contribution to distinguishing documents. Let the number of documents contained in the document set be N, the number of documents containing the keyword w_i be n_i, and m_ij denote the number of occurrences of w_i in document d_j. The word frequency tf_ij of w_i in d_j is then defined as:

tf_ij = m_ij / Σ_k m_kj

where Σ_k m_kj denotes the total number of keyword occurrences in document d_j. The inverse document frequency of w_i over the document set is defined as idf_i = log(N / n_i).
Illustratively, a K-dimensional vector U = (u_1, u_2, …, u_K) is used to represent the item demand data, where each component is calculated by the following formula:

u_i = tf_ij × idf_i

where u_i represents the i-th component of the item demand data vector; tf_ij is the word frequency of the keyword w_i in document d_j; and idf_i is the inverse document frequency of w_i over the document set.
S702, acquiring article description data of different candidate articles.
Illustratively, a K-dimensional vector V = (v_1, v_2, …, v_K) is used to represent the item description data, where each component is calculated by the following formula:

v_i = tf_ij × idf_i

where v_i represents the i-th component of the item description data vector; tf_ij is the word frequency of the keyword w_i in the item description document d_j; and idf_i is the inverse document frequency of w_i over the document set.
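By way of illustration, both vectors can be built with a small TF-IDF routine such as the sketch below; the tokenized corpus, the vocabulary construction, and the variable names are assumptions of the sketch.

import math
from collections import Counter

def tfidf_vector(doc_tokens, corpus, vocab):
    """Build the K-dimensional TF-IDF vector (component = tf_ij * idf_i) for one tokenized document."""
    counts = Counter(doc_tokens)
    total = sum(counts.values()) or 1
    N = len(corpus)
    vec = []
    for w in vocab:
        tf = counts[w] / total                      # tf_ij = m_ij / sum_k m_kj
        n_i = sum(1 for d in corpus if w in d)      # number of documents containing w
        idf = math.log(N / n_i) if n_i else 0.0     # idf_i = log(N / n_i)
        vec.append(tf * idf)
    return vec

# Hypothetical example: a cluster category's demand text and two candidate item descriptions.
corpus = [["low", "risk", "fund"], ["high", "yield", "stock", "fund"], ["low", "fee", "bond"]]
vocab = sorted({w for d in corpus for w in d})
demand_vec = tfidf_vector(corpus[0], corpus, vocab)
description_vec = tfidf_vector(corpus[1], corpus, vocab)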
S703, selecting the articles to be recommended corresponding to the to-be-recommended demand party from the candidate articles according to the matching degree between the article demand data and the article description data.
Specifically, the similarity between the item demand data and the item description data can be determined. For example, the cosine similarity and the Pearson correlation coefficient similarity (Pearson correlation coefficient, Pearson) between the item demand data and the item description data are determined respectively, the two similarities are weighted and summed according to preset weights to obtain the target similarity, and the matching degree between the item demand data and the item description data is determined according to the magnitude of the target similarity. It will be appreciated that the greater the target similarity, the higher the degree of matching, and vice versa.
Illustratively, the cosine similarity between the item demand data and the item description data may be determined according to the following formula:
sim_cos(U, V) = Σ_{i=1}^{K} u_i · v_i / ( sqrt(Σ_{i=1}^{K} u_i^2) · sqrt(Σ_{i=1}^{K} v_i^2) )

where sim_cos(U, V) is the cosine similarity between the item demand data and the item description data; U denotes the K-dimensional vector of the item demand data; V denotes the K-dimensional vector of the item description data; and u_i and v_i denote the individual components of the item demand data vector and the item description data vector, respectively.
The Pearson similarity between the item demand data and the item description data may be determined according to the following formula:
sim_p(U, V) = Σ_{i=1}^{K} (u_i − ū)(v_i − v̄) / ( sqrt(Σ_{i=1}^{K} (u_i − ū)^2) · sqrt(Σ_{i=1}^{K} (v_i − v̄)^2) )

where sim_p(U, V) is the Pearson similarity between the item demand data and the item description data; U and V denote the K-dimensional vectors of the item demand data and the item description data; u_i and v_i denote their individual components; and ū and v̄ denote the average values of the components of U and V, respectively.
The determination formula of the target similarity can be as follows:
sim(U, V) = α · sim_cos(U, V) + (1 − α) · sim_p(U, V)

where sim(U, V) is the target similarity between the item demand data and the item description data; sim_cos(U, V) is the cosine similarity; sim_p(U, V) is the Pearson similarity; U and V denote the K-dimensional vectors of the item demand data and the item description data; and α is a preset weighting coefficient. Note that the cosine similarity and the Pearson similarity both take values in [−1, 1], so the target similarity sim(U, V) also takes values in [−1, 1]; the closer the value is to 1, the more similar the two vectors are.
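A minimal sketch of the target similarity follows, assuming the α·cosine + (1 − α)·Pearson weighting reconstructed above and a default α of 0.5; zero-length vectors are not handled, for brevity, and the example inputs are illustrative.

import numpy as np

def target_similarity(u, v, alpha=0.5):
    """Weighted combination of cosine and Pearson similarity between two TF-IDF vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    du, dv = u - u.mean(), v - v.mean()
    pearson = (du @ dv) / (np.linalg.norm(du) * np.linalg.norm(dv))
    return alpha * cos + (1 - alpha) * pearson

score = target_similarity([0.4, 0.0, 0.7], [0.1, 0.3, 0.6])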
Further, based on the above-described demand-side classification method, the present embodiment also provides an item recommendation dedicated input terminal (Portable Android Device, PAD) including, as shown in fig. 8, a wireless network unit 801, a processing system 802, an information receiving unit 803, an information matching unit 804, and an information transmitting unit 805. The PAD processing system includes a customer classification system and a financial product recommendation system, and the wireless network unit 801 of the PAD is connected with a number calling system. When recommending an article, the to-be-recommended demand party first calls a number through the number calling system; when a number calling operation is detected, the information receiving unit 803 of the PAD receives the basic information of the to-be-recommended demand party and transfers it to the processing system 802 for further processing. After receiving the basic information of the client, the processing system 802 pre-processes it to obtain the classification reference data corresponding to the to-be-recommended demander. The information matching unit 804 then matches the classification reference data with the financial product information and determines the degree of matching (similarity) between the item demand data of the to-be-recommended demander and the item description information of each candidate recommended product. If the similarity reaches a preset threshold value, the candidate recommended item is recommended to the to-be-recommended demander (displayed on the PAD interface) through the information transmission unit 805. If no similarity reaches the preset threshold value, it is determined that there is no article which can be recommended to the to-be-recommended demander, and a prompt of no recommended product is sent to the PAD display interface through the information sending unit 805. Meanwhile, the processing system 802 may also call the information sending unit 805 to send information to the terminal of the item recommendation user (staff) to remind them that the to-be-recommended demand party is a target demand party, so that item recommendation can be performed. This real-time reminding mechanism can help staff better grasp the item recommendation timing and improve the item recommendation effect.
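The threshold decision in this flow can be sketched as follows; the threshold value, the dictionary input, and the fallback prompt string are assumptions of the sketch, and the similarity scores are assumed to have been computed beforehand (for example with a target-similarity function such as the one sketched above).

def pick_recommendations(similarity_by_item, threshold=0.6):
    """similarity_by_item: {item name: target similarity with the to-be-recommended demander}.
    Items whose similarity reaches the threshold are recommended, best first; otherwise a
    'no recommended product' prompt is returned."""
    hits = [item for item, score in sorted(similarity_by_item.items(), key=lambda kv: -kv[1])
            if score >= threshold]
    return hits if hits else "no recommended product"

result = pick_recommendations({"money market fund A": 0.82, "equity fund B": 0.35})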
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a demand side classification device for realizing the above related demand side classification method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the one or more demand sorting devices provided below may be referred to the limitation of the demand sorting method hereinabove, and will not be repeated here.
In one exemplary embodiment, as shown in fig. 9, there is provided a demand side classification apparatus including: a data acquisition module 901, a function construction module 902, a quantity determination module 903 and a classification module 904, wherein:
the data acquisition module 901 is configured to acquire classification reference data of different demanding parties.
The classification reference data comprises basic attribute data of the demander and interaction behavior data between the corresponding demander and the candidate object.
The function construction module 902 is configured to construct a first objective function with the number of clusters as an argument and a clustering degree of performing cluster analysis on each classification reference data based on the argument as an argument.
The number determining module 903 is configured to take, as the number of target clusters, the number of clusters corresponding to the gaussian mean value when the first objective function meets a preset gaussian distribution.
The classification module 904 is configured to perform cluster analysis on each classification reference data based on the number of target clusters, so as to classify different demanding parties.
In one embodiment, the number determination module 903 includes a reference cluster degree determination unit, a second objective function construction unit, an iterative clustering unit, and a number determination unit. Wherein:
And the reference cluster degree determining unit is used for determining the reference cluster degree corresponding to the clustering result of the clustering analysis through the determined reference cluster number based on the first objective function. And the second objective function construction unit is used for constructing a second objective function according to the data difference condition between the number of the reference clusters to be determined and the number of the determined reference clusters and the degree of each reference cluster. And the iterative clustering unit is used for taking the corresponding cluster number when the second objective function is maximum as the new reference cluster number, and iteratively executing the determination operation of the reference clustering degree until the preset iteration termination condition is met. The number determining unit is used for taking the number of the reference clusters determined when the preset iteration termination condition is met as the number of the target clusters.
In one embodiment, the second objective function construction unit comprises a reference difference data determination subunit, a difference adjustment data determination subunit, and a second objective function construction subunit, wherein: a reference difference data determining subunit, configured to determine reference difference data between the number of reference clusters to be determined and the number of determined reference clusters based on a covariance function of a preset gaussian distribution. And the difference adjustment data determining subunit is used for taking the difference value between the reference clustering degree of the determined number of each reference cluster and the average value of the determined number of each reference cluster as difference adjustment data. And the second objective function construction subunit is used for constructing a second objective function according to the product of the reference difference data and the difference adjustment data.
In one embodiment, the second objective function construction subunit is specifically configured to determine difference weight data according to the determined data difference situation between the number of reference clusters; weighting the difference adjustment data according to the difference weight data to update the difference adjustment data; and constructing a second objective function according to the product of the reference difference data and the updated difference adjustment data.
In an embodiment, the second objective function construction subunit is further configured to determine difference weight data between the determined number of reference clusters based on a covariance function of a preset gaussian distribution.
In an embodiment, if there is only one determined number of reference clusters, the second objective function construction subunit is further configured to determine difference weight data between the determined number of reference clusters and itself based on a covariance function of a preset gaussian distribution.
In one embodiment, the classification module 904 includes an initial cluster center determination unit, a reference probability determination unit, a wheel disc unit, and a classification unit, wherein: the initial cluster center determination unit is used for selecting an initial cluster center from the classification reference data; the reference probability determination unit is used for determining the reference probability of the corresponding classification reference data as a cluster center according to the Manhattan distance between the classification reference data which is not selected as a cluster center and each existing cluster center, wherein the existing cluster centers comprise the initial cluster center; the wheel disc unit is used for selecting the next cluster center according to each reference probability based on a wheel disc method until the number of the existing cluster centers reaches the number of target clusters; and the classification unit is used for carrying out cluster analysis on the corresponding classification reference data according to the Manhattan distance between the classification reference data which is not selected as a cluster center and each cluster center, so as to classify different demanding parties.
The respective modules in the above-described demand side sorting apparatus may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In an exemplary embodiment, a computer device, which may be a terminal, is provided, and an internal structure thereof may be as shown in fig. 10. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of demand side classification. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one exemplary embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring classification reference data of different demanding parties; the classification reference data comprise basic attribute data of the demander and interaction behavior data between the corresponding demander and the candidate object;
the method comprises the steps of taking the number of clusters as independent variables, and taking the clustering degree of clustering analysis on each classified reference data based on the independent variables as dependent variables to construct a first objective function;
under the condition that the first objective function accords with the preset Gaussian distribution, taking the cluster number corresponding to the Gaussian mean value as the target cluster number;
and carrying out cluster analysis on each classified reference data based on the number of the target clusters so as to classify different demanding parties.
In one embodiment, the processor when executing the computer program further performs the steps of: determining a reference clustering degree corresponding to a clustering result of the clustering analysis through the number of the determined reference clusters based on a first objective function; constructing a second objective function according to the data difference condition between the number of the reference clusters to be determined and the number of the determined reference clusters and the degree of each reference cluster; taking the corresponding cluster number when the second objective function is maximum as the new reference cluster number, and iteratively executing the determining operation of the reference cluster degree until the preset iteration termination condition is met; and taking the number of the reference clusters determined when the preset iteration termination condition is met as the number of the target clusters.
In one embodiment, the processor when executing the computer program further performs the steps of: determining reference difference data between the number of the reference clusters to be determined and the number of the determined reference clusters based on a covariance function of a preset Gaussian distribution; taking the difference between the reference clustering degree of the determined number of the reference clusters and the average value of the determined number of the reference clusters as difference adjustment data; and constructing a second objective function according to the product of the reference difference data and the difference adjustment data.
In one embodiment, the processor when executing the computer program further performs the steps of: determining difference weight data according to the determined data difference conditions among the number of the reference clusters; weighting the difference adjustment data according to the difference weight data to update the difference adjustment data; and constructing a second objective function according to the product of the reference difference data and the updated difference adjustment data.
In one embodiment, the processor when executing the computer program further performs the steps of: and determining difference weight data among the determined number of each reference cluster based on a covariance function of a preset Gaussian distribution.
In one embodiment, the processor when executing the computer program further performs the steps of: determining difference weight data between the determined number of the reference clusters and itself based on a covariance function of a preset Gaussian distribution.
In one embodiment, the processor when executing the computer program further performs the steps of: selecting an initial clustering center from the classified reference data; determining the reference probability of the corresponding classified reference data as the clustering center according to the Manhattan distance between the classified reference data which is not selected as the clustering center and each existing clustering center; wherein the existing cluster centers comprise initial cluster centers; based on a wheel disc method, selecting the next clustering center according to each reference probability until the number of the existing clustering centers reaches the number of target clusters; and carrying out cluster analysis on the corresponding classified reference data according to the Manhattan distance between the classified reference data which is not selected as the cluster center and each cluster center so as to classify different demanding parties.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring classification reference data of different demanding parties; the classification reference data comprise basic attribute data of the demander and interaction behavior data between the corresponding demander and the candidate object; the method comprises the steps of taking the number of clusters as independent variables, and taking the clustering degree of clustering analysis on each classified reference data based on the independent variables as dependent variables to construct a first objective function; under the condition that the first objective function accords with the preset Gaussian distribution, taking the cluster number corresponding to the Gaussian mean value as the target cluster number; and carrying out cluster analysis on each classified reference data based on the number of the target clusters so as to classify different demanding parties.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining a reference clustering degree corresponding to a clustering result of the clustering analysis through the number of the determined reference clusters based on a first objective function; constructing a second objective function according to the data difference condition between the number of the reference clusters to be determined and the number of the determined reference clusters and the degree of each reference cluster; taking the corresponding cluster number when the second objective function is maximum as the new reference cluster number, and iteratively executing the determining operation of the reference cluster degree until the preset iteration termination condition is met; and taking the number of the reference clusters determined when the preset iteration termination condition is met as the number of the target clusters.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining reference difference data between the number of the reference clusters to be determined and the number of the determined reference clusters based on a covariance function of a preset Gaussian distribution; taking the difference between the reference clustering degree of the determined number of the reference clusters and the average value of the determined number of the reference clusters as difference adjustment data; and constructing a second objective function according to the product of the reference difference data and the difference adjustment data.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining difference weight data according to the determined data difference conditions among the number of the reference clusters; weighting the difference adjustment data according to the difference weight data to update the difference adjustment data; and constructing a second objective function according to the product of the reference difference data and the updated difference adjustment data.
In one embodiment, the computer program when executed by the processor further performs the steps of: and determining difference weight data among the determined number of each reference cluster based on a covariance function of a preset Gaussian distribution.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining difference weight data between the determined number of the reference clusters and itself based on a covariance function of a preset Gaussian distribution.
In one embodiment, the computer program when executed by the processor further performs the steps of: selecting an initial clustering center from the classified reference data; determining the reference probability of the corresponding classified reference data as the clustering center according to the Manhattan distance between the classified reference data which is not selected as the clustering center and each existing clustering center; wherein the existing cluster centers comprise initial cluster centers; based on a wheel disc method, selecting the next clustering center according to each reference probability until the number of the existing clustering centers reaches the number of target clusters; and carrying out cluster analysis on the corresponding classified reference data according to the Manhattan distance between the classified reference data which is not selected as the cluster center and each cluster center so as to classify different demanding parties.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of: acquiring classification reference data of different demanding parties; the classification reference data comprise basic attribute data of the demander and interaction behavior data between the corresponding demander and the candidate object; the method comprises the steps of taking the number of clusters as independent variables, and taking the clustering degree of clustering analysis on each classified reference data based on the independent variables as dependent variables to construct a first objective function; under the condition that the first objective function accords with the preset Gaussian distribution, taking the cluster number corresponding to the Gaussian mean value as the target cluster number; and carrying out cluster analysis on each classified reference data based on the number of the target clusters so as to classify different demanding parties.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining a reference clustering degree corresponding to a clustering result of the clustering analysis through the number of the determined reference clusters based on a first objective function; constructing a second objective function according to the data difference condition between the number of the reference clusters to be determined and the number of the determined reference clusters and the degree of each reference cluster; taking the corresponding cluster number when the second objective function is maximum as the new reference cluster number, and iteratively executing the determining operation of the reference cluster degree until the preset iteration termination condition is met; and taking the number of the reference clusters determined when the preset iteration termination condition is met as the number of the target clusters.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining reference difference data between the number of the reference clusters to be determined and the number of the determined reference clusters based on a covariance function of a preset Gaussian distribution; taking the difference between the reference clustering degree of the determined number of the reference clusters and the average value of the determined number of the reference clusters as difference adjustment data; and constructing a second objective function according to the product of the reference difference data and the difference adjustment data.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining difference weight data according to the determined data difference conditions among the number of the reference clusters; weighting the difference adjustment data according to the difference weight data to update the difference adjustment data; and constructing a second objective function according to the product of the reference difference data and the updated difference adjustment data.
In one embodiment, the computer program when executed by the processor further performs the steps of: and determining difference weight data among the determined number of each reference cluster based on a covariance function of a preset Gaussian distribution.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining difference weight data between the determined number of the reference clusters and itself based on a covariance function of a preset Gaussian distribution.
In one embodiment, the computer program when executed by the processor further performs the steps of: selecting an initial clustering center from the classified reference data; determining the reference probability of the corresponding classified reference data as the clustering center according to the Manhattan distance between the classified reference data which is not selected as the clustering center and each existing clustering center; wherein the existing cluster centers comprise initial cluster centers; based on a wheel disc method, selecting the next clustering center according to each reference probability until the number of the existing clustering centers reaches the number of target clusters; and carrying out cluster analysis on the corresponding classified reference data according to the Manhattan distance between the classified reference data which is not selected as the cluster center and each cluster center so as to classify different demanding parties.
It should be noted that, the user information (including, but not limited to, basic attribute data of the requiring party, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (11)

1. A method of classifying a party in need thereof, comprising:
acquiring classification reference data of different demanding parties; the classification reference data comprise basic attribute data of a demand party and interaction behavior data between the corresponding demand party and candidate articles;
the number of clusters is taken as an independent variable, and the clustering degree of clustering analysis on each classified reference data based on the independent variable is taken as a dependent variable, so that a first objective function is constructed;
Under the condition that the first objective function accords with a preset Gaussian distribution, taking the number of clusters corresponding to the Gaussian mean value as the number of target clusters;
and carrying out cluster analysis on each classified reference data based on the number of the target clusters so as to classify different demanding parties.
2. The method according to claim 1, wherein, in the case where the first objective function conforms to a preset gaussian distribution, taking the number of clusters corresponding to the gaussian mean as the target number of clusters includes:
determining a reference clustering degree corresponding to a clustering result of the clustering analysis through the number of the determined reference clusters based on the first objective function;
constructing a second objective function according to the data difference condition between the number of the reference clusters to be determined and the number of the determined reference clusters and the reference clustering degree;
taking the corresponding cluster number when the second objective function is maximum as a new reference cluster number, and iteratively executing the determining operation of the reference cluster degree until a preset iteration termination condition is met;
and taking the number of the reference clusters determined when the preset iteration termination condition is met as the number of the target clusters.
3. The method according to claim 2, wherein the constructing a second objective function according to the data difference condition between the number of the reference clusters to be determined and the number of the determined reference clusters and the reference clustering degree comprises:
Determining reference difference data between the number of the reference clusters to be determined and the number of the determined reference clusters based on the covariance function of the preset Gaussian distribution;
taking the difference value between the reference clustering degree of the determined number of the reference clusters and the average value of the reference clustering degree corresponding to the determined number of the reference clusters as difference adjustment data;
and constructing the second objective function according to the product of the reference difference data and the difference adjustment data.
4. A method according to claim 3, wherein said constructing said second objective function from the product of said reference difference data and said difference adjustment data comprises:
determining difference weight data according to the determined data difference conditions among the number of the reference clusters;
weighting the difference adjustment data according to the difference weight data to update the difference adjustment data;
and constructing the second objective function according to the product of the reference difference data and the updated difference adjustment data.
5. The method of claim 4, wherein determining difference weight data based on the determined data difference between the number of reference clusters comprises:
And determining difference weight data among the determined reference cluster numbers based on the covariance function of the preset Gaussian distribution.
6. The method of claim 5, wherein if there is only one determined number of reference clusters, determining difference weight data between the determined number of reference clusters based on the covariance function of the preset gaussian distribution, comprises:
and determining difference weight data between the determined number of the reference clusters and itself based on the covariance function of the preset Gaussian distribution.
7. The method of any of claims 1-6, wherein performing cluster analysis on each of the classified reference data based on the number of target clusters to classify different desirors comprises:
selecting an initial clustering center from each classified reference data;
determining the reference probability of the corresponding classified reference data as the clustering center according to the Manhattan distance between the classified reference data which is not selected as the clustering center and each existing clustering center; wherein the existing cluster center comprises the initial cluster center;
based on a wheel disc method, selecting the next clustering center according to each reference probability until the number of the existing clustering centers reaches the number of the target clusters;
And carrying out cluster analysis on the corresponding classified reference data according to the Manhattan distance between the classified reference data which is not selected as the cluster center and each cluster center so as to classify different demanding parties.
8. A demand side classification apparatus, the apparatus comprising:
the data acquisition module is used for acquiring the classified reference data of different demanding parties; the classification reference data comprise basic attribute data of a demand party and interaction behavior data between the corresponding demand party and candidate articles;
the function construction module is used for constructing a first objective function by taking the number of clusters as independent variables and taking the clustering degree of clustering analysis on the classified reference data based on the independent variables as dependent variables;
the quantity determining module is used for taking the quantity of clusters corresponding to the Gaussian mean value as the quantity of target clusters under the condition that the first objective function accords with the preset Gaussian distribution;
and the classification module is used for carrying out cluster analysis on each classification reference data based on the number of the target clusters so as to classify different demands.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1-7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1-7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.