CN116720094A - Client information clustering method and device, processor and electronic equipment - Google Patents

Client information clustering method and device, processor and electronic equipment

Info

Publication number
CN116720094A
CN116720094A CN202310719562.0A CN202310719562A CN116720094A CN 116720094 A CN116720094 A CN 116720094A CN 202310719562 A CN202310719562 A CN 202310719562A CN 116720094 A CN116720094 A CN 116720094A
Authority
CN
China
Prior art keywords
centroid
candidate
error function
client information
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310719562.0A
Other languages
Chinese (zh)
Inventor
程永龙
王钰
范淑君
王睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310719562.0A
Publication of CN116720094A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Computational Mathematics (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and device for clustering client information, a processor, and electronic equipment. The method is applied in the technical field of big data and comprises the following steps: S101, calculating an error function of each piece of customer information in a customer information set to obtain a first error function, and determining the customer information corresponding to the first error function as a first centroid; S102, determining a first candidate centroid set according to the determined centroids and the other client information, determining the next centroid according to the first candidate centroid set, and repeating step S102 until K centroids are determined; and S103, clustering the client information set according to the K centroids to obtain a target clustering result. The application addresses a problem in the related art: when the centroids used for clustering client information are selected at random, discrete points or noise data in the client information may be chosen as centroids, which makes the clustering result inaccurate.

Description

Client information clustering method and device, processor and electronic equipment
Technical Field
The application relates to the technical field of big data, in particular to a clustering method and device of client information, a processor and electronic equipment.
Background
Currently, to minimize default events at a financial institution (for example, a loan that cannot be recovered and becomes a bad debt), staff handling a transaction with a customer often need to evaluate the customer's credit and determine the institution's transaction policy toward that customer according to the evaluation result. In the prior art, customers are generally clustered according to their transaction data with the financial institution and their asset condition, and each customer's credit level is determined from the clustering result.
In the prior art, a K-Means clustering algorithm is generally used to classify clients. However, the traditional K-Means algorithm initializes by extracting the initial centroids at random, so the extracted centroids may be too scattered or too concentrated. The resulting uneven distribution of initial centroids slows convergence, degrades the clustering effect, and makes the classification of clients inaccurate.
In the related art, when client information is clustered and the centroids used for clustering are selected at random, discrete points or noise data in the client information may be used as centroids, which makes the clustering result inaccurate. No effective solution to this problem has yet been proposed.
Disclosure of Invention
The main purpose of the application is to provide a clustering method and device for client information, a processor, and electronic equipment, so as to solve the problem in the related art that, when client information is clustered and the centroids used for clustering are selected at random, discrete points or noise data in the client information may be used as centroids, making the clustering result inaccurate.
In order to achieve the above object, according to one aspect of the present application, there is provided a clustering method of client information, the method comprising: S101, calculating an error function of each piece of customer information in a customer information set to obtain a plurality of error functions, determining a first error function from the plurality of error functions, and determining the customer information corresponding to the first error function as a first centroid, wherein the error function is the sum of squares of the cosine distances between the customer information and the other customer information in the customer information set; S102, determining a first candidate centroid set according to the determined centroids and the client information in the client information set other than the determined centroids, determining the next centroid according to the first candidate centroid set, and repeating step S102 until K centroids are determined, wherein K is a positive integer greater than 3 and the determined centroids comprise the first centroid; and S103, clustering the client information in the client information set according to the K centroids to obtain a target clustering result.
Further, determining a first candidate centroid set according to the determined centroids and the customer information in the customer information set other than the determined centroids comprises: calculating the cosine distances between the determined centroids and the other client information in the client information set, to obtain N cosine distances between each piece of client information and the determined centroids, wherein the determined centroids comprise N centroids; determining first customer information according to the N cosine distances between each piece of customer information and the determined centroids, and determining the first customer information as a next candidate centroid; and determining the first candidate centroid set according to the cosine distances between the next candidate centroid and the client information in the client information set other than the determined centroids.
Further, determining a next centroid from the first candidate centroid set includes: clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results, wherein the first candidate centroid set contains M pieces of client data, M is an integer, and each clustering result contains a plurality of clusters; obtaining a candidate error function set corresponding to the M clustering results according to the candidate error function of each cluster in each clustering result; determining a next error function according to the determined error functions and the candidate error function set; and determining the target clustering result corresponding to the next error function, and determining the first candidate centroid corresponding to the target clustering result as the next centroid.
Further, clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results includes: combining the determined centroids with each piece of client information in the first candidate centroid set to obtain M second candidate centroid sets; and clustering the client information set M times according to the M second candidate centroid sets to obtain the M clustering results.
Further, obtaining the candidate error function set corresponding to the M clustering results according to the candidate error function of each cluster in each clustering result includes: calculating an error function of each cluster in each clustering result to obtain the candidate error function corresponding to each cluster in that clustering result; determining an average candidate error function of the clustering result according to the candidate error functions of its clusters; and collecting the average candidate error functions of all the clustering results to obtain the candidate error function set.
Further, determining a next error function according to the determined error functions and the candidate error function set comprises: calculating the difference between a target error function and each candidate error function in the candidate error function set to obtain M differences, wherein the target error function is one of the determined error functions; and sorting the M differences, and determining the candidate error function corresponding to the first-ranked difference as the next error function.
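The averaging and ranking described in the two preceding paragraphs can be illustrated with a minimal Python sketch. The text does not state the sign of the difference or the sort direction, so the sketch assumes the difference is the target error minus the candidate error and that the largest difference ranks first. The function names and the numbers are hypothetical.

```python
def average_candidate_error(cluster_errors):
    """Average the per-cluster candidate error functions of one clustering result."""
    return sum(cluster_errors) / len(cluster_errors)


def pick_next_error(target_error, candidate_errors):
    """Return (index, error) of the candidate whose difference ranks first.

    Assumption: difference = target_error - candidate_error, sorted in
    descending order, so the largest reduction in error wins.
    """
    diffs = [target_error - err for err in candidate_errors]
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    best = order[0]
    return best, candidate_errors[best]


# Hypothetical per-cluster errors for M = 3 clustering results.
per_cluster = [[0.5, 0.25], [0.125, 0.375], [0.5, 0.5]]
candidate_set = [average_candidate_error(errs) for errs in per_cluster]
print(pick_next_error(0.75, candidate_set))
```

The returned index identifies which of the M clustering results, and therefore which first candidate centroid, would be kept under these assumptions.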
Further, after clustering the client information in the client information set according to the K centroids to obtain a target clustering result, the method further includes: determining K classes of clients corresponding to the client information in the client information set according to the target clustering result; and adjusting the transaction strategy of each type of clients according to the preset transaction strategy.
In order to achieve the above object, according to another aspect of the present application, there is provided a clustering apparatus of client information, the apparatus comprising: the computing unit is used for computing an error function of each piece of customer information in the customer information set to obtain a plurality of error functions, determining a first error function from the plurality of error functions, and determining customer information corresponding to the first error function as a first centroid, wherein the error function is the sum of squares of cosine distances between the customer information and other pieces of customer information in the customer information set; a first determining unit, configured to determine a first candidate centroid set according to the determined centroids and other client information in the client information set except the determined centroids, and determine a next centroid according to the first candidate centroid set, and repeatedly execute the steps of the first determining unit until K centroids are determined, where K is a positive integer greater than 3, and the determined centroids include the first centroid; and the clustering unit is used for clustering the client information in the client information set according to the K centroids to obtain a target clustering result.
Further, the first determination unit includes: a first calculating subunit, configured to calculate cosine distances between the determined centroid and other client information in the client information set except for the determined centroid, to obtain N cosine distances between each client information and the determined centroid, where the determined centroid includes N centroids; a first determining subunit, configured to determine first customer information according to N cosine distances between each customer information and the determined centroid, and determine the first customer information as a next candidate centroid; and the second determination subunit is used for determining the first candidate centroid set according to the cosine distance between the next candidate centroid and the client information except the determined centroid in the client information set.
Further, the first determination unit includes: a clustering subunit, configured to cluster the client information set M times according to the first candidate centroid set and the determined centroid, to obtain M clustering results, where the first candidate centroid set includes M client data, M is an integer, and the clustering results include a plurality of clusters; the second calculation subunit is used for obtaining candidate error function sets corresponding to the M clustering results according to candidate error functions in each cluster in each clustering result; a third determining subunit, configured to determine a next error function according to the determined error function and the candidate error function set; and the fourth determination subunit is used for determining a target clustering result corresponding to the next error function and determining a first candidate centroid corresponding to the target clustering result as the next centroid.
Further, the clustering subunit includes: a first processing module, configured to combine the determined centroid with each client information in the first candidate centroid set to obtain M second candidate centroid sets; and the clustering module is used for carrying out M times of clustering on the client information sets according to the M second candidate centroid sets to obtain M clustering results.
Further, the second computing subunit includes: the first calculation module is used for calculating the error function of each cluster in each clustering result to obtain a candidate error function corresponding to each cluster in the clustering result; the determining module is used for determining an average candidate error function of the clustering result according to the candidate error function corresponding to each cluster in the clustering result; and the second processing module is used for synthesizing the average candidate error functions of all the clustering results to obtain the candidate error function set.
Further, the third determining subunit includes: a second calculation module, configured to calculate the difference between a target error function and each candidate error function in the candidate error function set to obtain M differences, wherein the target error function is one of the determined error functions; and a third processing module, configured to sort the M differences and determine the candidate error function corresponding to the first-ranked difference as the next error function.
Further, the apparatus further comprises: the second determining unit is used for determining K classes of clients corresponding to the client information in the client information set according to the target clustering result after clustering the client information in the client information set according to the K centroids to obtain the target clustering result; and the adjusting unit is used for adjusting the transaction strategy of each type of clients according to the preset transaction strategy.
In order to achieve the above object, according to one aspect of the present application, there is provided a processor for executing a program, wherein the program executes a clustering method of client information as set forth in any one of the above.
In order to achieve the above object, according to one aspect of the present application, there is provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for clustering client information according to any one of the above.
According to the application, the following steps are adopted: S101, calculating an error function of each piece of customer information in a customer information set to obtain a plurality of error functions, determining a first error function from the plurality of error functions, and determining the customer information corresponding to the first error function as a first centroid, wherein the error function is the sum of squares of the cosine distances between the customer information and the other customer information in the customer information set; S102, determining a first candidate centroid set according to the determined centroids and the client information in the client information set other than the determined centroids, determining the next centroid according to the first candidate centroid set, and repeating step S102 until K centroids are determined, wherein K is a positive integer greater than 3 and the determined centroids comprise the first centroid; and S103, clustering the client information in the client information set according to the K centroids to obtain a target clustering result. This solves the problem in the related art that, when client information is clustered and the centroids used for clustering are selected at random, discrete points or noise data in the client information may be taken as centroids, making the clustering result inaccurate.
By calculating the error function of each piece of customer information in the customer information set, a first centroid that is close to the other customer information can be determined. K centroids with a dispersed distribution are then determined according to the determined centroids and the customer information other than the determined centroids. This avoids the poor clustering results that arise when the traditional K-Means algorithm randomly selects discrete noise data as centroids, and thereby improves both the convergence speed of the K-Means algorithm and the quality of the clustering result for the customer information.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a method for clustering customer information provided in accordance with a first embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative method for clustering customer information provided in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of a clustering apparatus for client information provided according to a second embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device for clustering client information provided according to a fifth embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, user transaction information, user information stored in financial institutions, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, processed data, calculated data, inputted data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data is required to comply with the related laws and regulations and standards of the related country and region, and is provided with a corresponding operation portal for the user to select authorization or rejection.
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The present application is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of a method for clustering client information according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, calculating an error function of each piece of customer information in the customer information set to obtain a plurality of error functions, determining a first error function from the plurality of error functions, and determining the customer information corresponding to the first error function as a first centroid, wherein the error function is the sum of squares of the cosine distances between the customer information and the other customer information in the customer information set.
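For reference, the error function of step S101 can be written out as follows. The notation is illustrative: x_i denotes the vector of the i-th piece of customer information in a set of n vectors. The formula for cosine distance (one minus cosine similarity) is the common definition and is assumed here, since the text does not spell it out.

```latex
E(x_i) = \sum_{\substack{j = 1 \\ j \neq i}}^{n} d_{\cos}(x_i, x_j)^2,
\qquad
d_{\cos}(x_i, x_j) = 1 - \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert}
```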
In the first embodiment, the customer information is information related to the customer that is stored at the financial institution, for example the customer's transaction information at the financial institution, the customer's asset information, and the customer's liability information. The customer information may also be information derived by processing such data, for example the customer's credit rating or risk rating at the financial institution.
In an alternative embodiment, the first centroid in the customer information set may be determined as follows. Step S1011: calculate an error function for each piece of client information in the client information set. For example, suppose the client information set includes client information A, client information B, and client information C, the cosine distance between A and B is D1, and the cosine distance between A and C is D2. The error function of client information A is then the sum of the square of D1 and the square of D2; the error functions of client information B and client information C are calculated in the same way. Step S1012: among the error functions of all the customer information in the set, the smallest may be determined as the first error function, and the customer information corresponding to it is determined as the first centroid. For example, if the error function of customer information A is the smallest of the three, it is determined as the first error function and customer information A is determined as the first centroid.
It should be noted that, before the error functions are calculated, the text data of the client information must be acquired and preprocessed, converted into word vectors, and the client information set obtained from the resulting set of word vectors. Specifically, text data of client information within a preset time period (e.g., 6 months or 1 year) is first acquired from a preset database of the financial institution that contains customer information; the acquired text data is then preprocessed (e.g., word segmentation, stop-word removal) to obtain preprocessed client information; next, the preprocessed customer information is input into a word2vec model to obtain word vectors; finally, the word vectors corresponding to the client information form the client information set.
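Steps S1011 and S1012 can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: cosine distance is assumed to be one minus cosine similarity, the function names are hypothetical, and the three toy vectors simply stand in for customer information A, B, and C.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, assumed here to be 1 - cosine similarity.

    Both vectors must be non-zero, otherwise the division fails.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def error_function(index, vectors):
    """Sum of squared cosine distances from vectors[index] to every other vector."""
    return sum(
        cosine_distance(vectors[index], other) ** 2
        for j, other in enumerate(vectors)
        if j != index
    )


def first_centroid_index(vectors):
    """Index of the vector with the smallest error function (steps S1011 and S1012)."""
    errors = [error_function(i, vectors) for i in range(len(vectors))]
    return min(range(len(vectors)), key=errors.__getitem__)


# Toy vectors standing in for customer information A, B and C.
customers = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(first_centroid_index(customers))
```

With these toy vectors, A lies between B and C, so its error function is the smallest and index 0 is selected.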
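The preprocessing pipeline above might look roughly like the following sketch. Several parts are assumptions made only for illustration: whitespace splitting stands in for a real word segmenter, the stop-word list is invented, a small hand-written lookup table stands in for a trained word2vec model, and averaging the word vectors into one vector per customer is an assumed pooling choice that the text does not specify.

```python
STOP_WORDS = {"the", "a", "of"}  # hypothetical stop-word list

# Hypothetical embedding table standing in for a trained word2vec model.
TOY_EMBEDDINGS = {
    "loan": [1.0, 0.0],
    "repaid": [0.0, 1.0],
    "overdue": [1.0, 1.0],
}


def preprocess(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def to_vector(tokens, embeddings):
    """Average the vectors of the known tokens (an assumed pooling choice)."""
    known = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not known:
        return None  # no usable tokens; the caller decides how to handle this
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Invented text records standing in for customer information.
records = ["The loan repaid", "A loan overdue"]
customer_set = [to_vector(preprocess(r), TOY_EMBEDDINGS) for r in records]
print(customer_set)
```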
Step S102, a first candidate centroid set is determined according to the determined centroids and other client information except the determined centroids in the client information set, a next centroid is determined according to the first candidate centroid set, and the step S102 is repeatedly executed until K centroids are determined, wherein K is a positive integer greater than 3, and the determined centroids comprise the first centroid.
In the first embodiment, the determined centroids are the centroids that have already been determined when the step is performed, and they include at least the first centroid. For example, after the first centroid is determined, the determined centroid is the first centroid, and step S102 specifically means determining a first candidate centroid set according to the first centroid and the client information other than the first centroid in the client information set, and determining a second centroid according to that first candidate centroid set. After the second centroid is determined, the determined centroids are the first centroid and the second centroid, and step S102 specifically means determining a first candidate centroid set according to the first centroid, the second centroid, and the customer information other than the first and second centroids in the customer information set, and determining a third centroid according to that first candidate centroid set.
Step S103, clustering the client information in the client information set according to the K centroids to obtain a target clustering result.
In the first embodiment, in order to divide clients into different categories, the client information in the client information set needs to be clustered according to the K centroids obtained in step S102.
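As a rough illustration of step S103, the sketch below performs a single nearest-centroid assignment pass using cosine distance (assumed to be one minus cosine similarity). The text does not describe how the final clustering iterates; a full K-Means run would alternate this assignment with centroid updates. The function names and data are hypothetical, with K = 4 to respect the stated constraint that K is greater than 3.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def assign_to_centroids(vectors, centroids):
    """Return, for each vector, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: cosine_distance(vec, centroids[k]))
        for vec in vectors
    ]


# Toy customer vectors and K = 4 centroids.
customers = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.2], [0.2, -0.9], [0.9, -0.1]]
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
labels = assign_to_centroids(customers, centroids)
print(labels)
```

Each label is the class of the corresponding customer, which is what later steps would use to apply a per-class transaction policy.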
In summary, in the clustering method of the client information provided in the first embodiment of the present application, through step S101, an error function of each client information in a client information set is calculated to obtain a plurality of error functions, a first error function is determined from the plurality of error functions, and client information corresponding to the first error function is determined as a first centroid, where the error function is a sum of squares of cosine distances between the client information and other client information in the client information set; step S102, a first candidate centroid set is determined according to the determined centroids and other client information except the determined centroids in the client information set, a next centroid is determined according to the first candidate centroid set, and the step S102 is repeatedly executed until K centroids are determined, wherein K is a positive integer greater than 3, and the determined centroids comprise the first centroid; step S103, clustering the client information in the client information set according to K centroids to obtain a target clustering result, and solving the problem that the clustering result is inaccurate because discrete points or noise data in the client information are taken as centroids when the client information is clustered and centroids used for clustering are acquired in a random selection mode in the related technology. 
By calculating the error function of each piece of customer information in the customer information set, a first centroid which is closer to other pieces of customer information can be determined in the customer information set, K centroids which are distributed more dispersedly are determined according to the determined centroids and the pieces of customer information except the determined centroids in the customer information set, and the problem that the clustering result is poor due to the fact that discrete noise data (namely, the pieces of customer information which are farther away from other pieces of customer information and are distributed more dispersedly in the customer information set) are randomly selected as centroids for clustering by the traditional K-Means algorithm is avoided, so that the convergence speed of the K-Means algorithm and the quality of the clustering result are improved, and the clustering quality of the clustering result of the customer information is further improved.
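As an illustration of step S101, the following minimal Python sketch computes the error function of each record as the sum of squared cosine distances to all other records and takes the record with the smallest value as the first centroid (the flow of fig. 2 selects the smallest error function). The function names, the representation of client information as numeric feature vectors, and the toy data are assumptions made for illustration and are not part of the application.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def error_function(index, vectors):
    """Sum of squared cosine distances from vectors[index] to every other vector."""
    return sum(
        cosine_distance(vectors[index], other) ** 2
        for j, other in enumerate(vectors)
        if j != index
    )

def first_centroid(vectors):
    """Return (index of the record with the smallest error function, all errors)."""
    errors = [error_function(i, vectors) for i in range(len(vectors))]
    return min(range(len(vectors)), key=errors.__getitem__), errors

# Hypothetical toy data: three similar records and one outlier (the last one).
clients = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.3], [-0.2, 1.0]]
centroid_index, errors = first_centroid(clients)
print(centroid_index, errors)
```

In this toy data the outlier has by far the largest error function, so it cannot be picked as the first centroid, which is the behavior the paragraph above describes.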
Optionally, in the method for clustering client information provided in the first embodiment of the present application, determining the first candidate centroid set according to the determined centroid and other client information except the determined centroid in the client information set includes: calculating cosine distances between the determined centroid and other client information except the determined centroid in the client information set to obtain N cosine distances between each client information and the determined centroid, wherein the determined centroid comprises N centroids; determining first customer information according to N cosine distances between each customer information and the determined centroid, and determining the first customer information as the next candidate centroid; a first set of candidate centroids is determined based on the cosine distance of the next candidate centroid from the customer information in the set of customer information other than the determined centroid.
In the first embodiment, in order to determine the next centroid based on the determined centroids, the N cosine distances between each piece of client information and the determined centroids may be calculated and averaged to obtain the average distance between each piece of client information and the determined centroids. The client information corresponding to the maximum average distance is determined as the first client information, namely the next candidate centroid. The client information in the client information set whose cosine distance from the next candidate centroid is smaller than a threshold T1 forms the first candidate centroid set, and the next centroid is determined according to the first candidate centroid set.
In an alternative embodiment, the client information set includes client information A, client information B, client information C and client information D, where client information A has been determined as the first centroid and client information D as the second centroid. The step of determining the first candidate centroid set according to client information A and client information D (i.e., the determined centroids) and client information B and client information C (i.e., the client information other than the determined centroids in the client information set) may be as follows:
step S201, calculating the cosine distance D1 between client information B and client information A as 0.6 and the cosine distance D2 between client information B and client information D as 0.8 (i.e., the 2 cosine distances between client information B and the determined centroids), and calculating the cosine distance D3 between client information C and client information A as 0.5 and the cosine distance D4 between client information C and client information D as 1.0;
step S202, calculating the average value of the cosine distance D1 and the cosine distance D2 to obtain the average distance between the client information B and the determined centroid as 0.7, calculating the average value of the cosine distance D3 and the cosine distance D4 to obtain the average distance between the client information C and the determined centroid as 0.75, and determining the client information C corresponding to the maximum average distance of 0.75 as the next candidate centroid, namely a third candidate centroid;
In step S203, if the cosine distance between the client information B and the client information C is greater than the threshold T1, the first candidate centroid set only includes the client information C, and if the cosine distance between the client information B and the client information C is less than or equal to the threshold T1, the first candidate centroid set includes the client information B and the client information C.
The N cosine distances between each piece of customer information and the determined centroid are calculated, the customer information with the largest average distance from the determined centroid is determined as the next candidate centroid, and the first candidate centroid set is determined according to the next candidate centroid, so that the distance between the next centroid determined from the first candidate centroid set and the determined centroid is far, further, centroids with more scattered distribution can be obtained, the quality of centroids for clustering is improved, and further, the effect of improving the clustering quality of clustering results is achieved.
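The worked example of steps S201 to S203 can be sketched in Python as follows. The cosine distances 0.6, 0.8, 0.5 and 1.0 come from the example; the function names, the distance between client information B and client information C, and the value of the threshold T1 are hypothetical, because the text gives no numbers for them. The sketch includes a record when its distance is less than or equal to T1, following step S203.

```python
def next_candidate(distances_to_centroids):
    """Pick the record with the largest average cosine distance to the determined centroids.

    distances_to_centroids maps each non-centroid record to its N cosine distances.
    """
    averages = {name: sum(d) / len(d) for name, d in distances_to_centroids.items()}
    candidate = max(averages, key=averages.get)
    return candidate, averages

def first_candidate_set(candidate, distances_to_candidate, t1):
    """Candidate plus every other remaining record within cosine distance t1 of it."""
    members = [candidate]
    members += [name for name, d in distances_to_candidate.items() if d <= t1]
    return members

# Cosine distances from steps S201 and S202 (to centroids A and D).
distances = {"B": [0.6, 0.8], "C": [0.5, 1.0]}
candidate, averages = next_candidate(distances)

# Hypothetical values: distance(B, C) = 0.2 and T1 = 0.3.
candidate_set = first_candidate_set(candidate, {"B": 0.2}, t1=0.3)
print(candidate, averages, candidate_set)
```

With these numbers client information C has the larger average distance (0.75 against roughly 0.7), so it becomes the next candidate centroid, and client information B joins the first candidate centroid set only because the assumed distance 0.2 does not exceed the assumed T1.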
Optionally, in the method for clustering client information provided in the first embodiment of the present application, determining the next centroid according to the first candidate centroid set includes: clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results, wherein the first candidate centroid set contains M pieces of client information, M is an integer, and each clustering result contains a plurality of clusters; obtaining a candidate error function set corresponding to the M clustering results according to the candidate error function of each cluster in each clustering result; determining a next error function according to the determined error function and the candidate error function set; and determining a target clustering result corresponding to the next error function, and determining the first candidate centroid corresponding to the target clustering result as the next centroid.
In the first embodiment, in order to determine the next centroid from the first candidate centroid set, the client information set may be clustered M times according to the client information in the first candidate centroid set and the determined centroid, so as to obtain M clustering results, then candidate error functions in each cluster in each clustering result are calculated, candidate error function sets corresponding to the M clustering results are obtained, differences between each candidate error function in the candidate error function sets and the determined last error function are calculated respectively, the candidate error function with the largest difference is determined as the next error function, the clustering result corresponding to the candidate error function with the largest difference is determined as the target clustering result, and the candidate centroid corresponding to the target clustering result is determined as the next centroid.
Specifically, the first candidate centroid set includes client information a and client information B, the determined centroid is client information C, an error function corresponding to the client information C is a first error function H1, the client information set further includes client information D, and the step of determining the next centroid in the first candidate centroid set is as follows:
step S301, clustering is performed according to client information A and client information C (i.e., the determined centroid) to obtain a clustering result Yac, which comprises two clusters (i.e., a cluster G1 containing client information A and a cluster G2 containing client information C); clustering is performed according to client information B and client information C to obtain a clustering result Ybc, which comprises two clusters (i.e., a cluster G3 containing client information B and a cluster G4 containing client information C);
Step S302, calculating a candidate error function H2 of the clustering result Yac (namely, a candidate error function obtained by calculating the error function of the cluster G1 and the error function of the cluster G2) and a candidate error function H3 of the clustering result Ybc, wherein the candidate error function H2 and the candidate error function H3 form a candidate error function set;
step S303, subtracting the candidate error function H2 from the determined first error function H1 to obtain a difference L1, subtracting the candidate error function H3 from the first error function H1 to obtain a difference L2, and, when the difference L1 is the larger of the two, determining the candidate error function H2 corresponding to the difference L1 as the second error function (namely, the next error function);
in step S304, the clustering result Yac corresponding to the candidate error function H2 is determined as the target clustering result, and client information A (i.e., the candidate centroid) corresponding to the clustering result Yac is determined as the second centroid (i.e., the next centroid).
The client information in the first candidate centroid set and the determined centroids are clustered for a plurality of times, and the clustering quality of a plurality of clustering results is evaluated (namely, the candidate error function of each clustering result is calculated, and the next error function is determined), so that the next centroid with better clustering result can be determined in the first candidate centroid set, the quality of the centroids used for clustering is improved, and the effect of improving the clustering quality of the clustering results is achieved.
Optionally, in the method for clustering client information provided in the first embodiment of the present application, clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results includes: combining the determined centroids with each piece of client information in the first candidate centroid set to obtain M second candidate centroid sets; and clustering the client information set M times according to the M second candidate centroid sets to obtain M clustering results.
Specifically, if the determined centroids are client information A (i.e., the determined first centroid), client information B (i.e., the determined second centroid) and client information C (i.e., the determined third centroid), the first candidate centroid set includes client information D and client information E, and the client information set also includes client information F and client information H, then the client information set is clustered 2 times (i.e., M is equal to 2) according to the determined centroids and the first candidate centroid set to obtain 2 clustering results, through the following steps:
step S401, combining the client information A, the client information B, the client information C and the client information D to obtain a second candidate centroid set J1, and combining the client information A, the client information B, the client information C and the client information E to obtain a second candidate centroid set J2;
Step S402, clustering is carried out according to the second candidate centroid set J1{client information A, client information B, client information C, client information D}: the cosine distance between client information F in the client information set and client information B in the second candidate centroid set J1 is the smallest, so client information B and client information F are divided into the same cluster; the cosine distance between client information H in the client information set and client information C in the second candidate centroid set J1 is the smallest, so client information H and client information C are divided into the same cluster; a clustering result Y1 is thus obtained (namely, it comprises the clusters G1{client information A}, G2{client information B, client information F}, G3{client information C, client information H} and G4{client information D});
in step S403, clustering is performed according to the second candidate centroid set J2{ client information a, client information B, client information C, client information E }, the cosine distance between the client information F in the client information set and the client information a in the second candidate centroid set J2 is the smallest, the client information a and the client information F are divided into the same cluster, the cosine distance between the client information H in the client information set and the client information E in the second candidate centroid set J2 is the smallest, and the client information H and the client information E are divided into the same cluster, so as to obtain a clustering result Y2 (i.e., the cluster includes the cluster G5{ client information a, client information F }, G6{ client information B }, G7{ client information C }, G8{ client information E, and client information H }).
The client information in the first candidate centroid set is combined with the determined centroids to obtain M second candidate centroid sets, and clustering is carried out according to the M second candidate centroid sets to obtain M clustering results, so that the next centroid in the first candidate centroid set is selected according to the clustering quality of the M clustering results, and the quality of centroids for clustering is improved.
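A minimal Python sketch of steps S401 to S403 is given below. It assumes that each of the M clusterings is a single pass in which every non-centroid record joins the cluster of its nearest centroid by cosine distance, as the example suggests. The two-dimensional vectors are hypothetical and were chosen only so that F is closest to B and H is closest to C (or to E when E is a centroid); the text does not say which cluster the unselected candidate joins, so the sketch simply assigns it to its nearest centroid as well.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def second_candidate_sets(determined, candidates):
    """Combine the determined centroids with each candidate: M sets for M candidates."""
    return [determined + [c] for c in candidates]

def assign(points, centroid_names):
    """Single nearest-centroid pass; returns {centroid name: member names}."""
    clusters = {c: [c] for c in centroid_names}
    for name, vec in points.items():
        if name in clusters:
            continue  # a centroid stays in its own cluster
        nearest = min(centroid_names, key=lambda c: cosine_distance(vec, points[c]))
        clusters[nearest].append(name)
    return clusters

# Hypothetical feature vectors for client information A..H.
points = {
    "A": [1.0, 0.0], "B": [0.0, 1.0], "C": [-1.0, 0.0],
    "D": [0.0, -1.0], "E": [-1.0, -0.8],
    "F": [0.1, 1.0], "H": [-1.0, -0.6],
}
candidate_sets = second_candidate_sets(["A", "B", "C"], ["D", "E"])  # J1, J2
results = [assign(points, names) for names in candidate_sets]        # Y1, Y2
for names, clusters in zip(candidate_sets, results):
    print(names, clusters)
```

Each entry of `results` plays the role of one clustering result (Y1 or Y2) whose quality is then compared to choose between the candidates D and E.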
Optionally, in the method for clustering client information provided in the first embodiment of the present application, obtaining a candidate error function set corresponding to M clustering results according to candidate error functions in each cluster in each clustering result includes: calculating an error function of each cluster in each clustering result to obtain a candidate error function corresponding to each cluster in the clustering result; determining an average candidate error function of the clustering result according to the candidate error function corresponding to each cluster in the clustering result; and synthesizing the average candidate error functions of all the clustering results to obtain a candidate error function set.
In the first embodiment, in order to evaluate the clustering quality of the M clustering results, an error function of each cluster in each clustering result may be calculated, and an average candidate error function of each clustering result may be calculated, so as to obtain a candidate error function set, and then a clustering result with better clustering quality in the M clustering results may be determined according to the candidate error function set.
Specifically, if the clustering result Y1 includes the cluster G1 and the cluster G2, the step of calculating the average candidate error function of the clustering result Y1 may be as follows: step S501, calculating the average of the word vectors of the pieces of client information in the cluster G1, and taking the average word vector as the cluster center P1 of the cluster G1; step S502, calculating the sum of squares of the cosine distances between each piece of client information in the cluster G1 and the cluster center P1, for example, if the cosine distance between client information A and the cluster center P1 is the cosine distance D1 and the cosine distance between client information B and the cluster center P1 is the cosine distance D2, the candidate error function of the cluster G1 is the sum of the squares of the cosine distance D1 and the cosine distance D2; step S503, similarly, calculating the candidate error function of the cluster G2; step S504, calculating the average value of the candidate error function of the cluster G1 and the candidate error function of the cluster G2 to obtain the average candidate error function of the clustering result Y1.
By calculating the error function of each cluster in each clustering result, the clustering quality of the clustering result can be determined according to the error function of each cluster in each clustering result, the clustering result with the optimal clustering quality can be determined in M clustering results, the next centroid can be determined according to the clustering result with the optimal clustering quality, and the quality of the centroid used for clustering is improved.
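The calculation of the average candidate error function (cluster center as the mean vector, per-cluster sum of squared cosine distances to that center, then the mean over clusters) can be sketched in Python as follows. The function names and the toy clusters are hypothetical, and the sketch assumes that client information is already represented as numeric vectors whose mean is not the zero vector.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def cluster_center(vectors):
    """Component-wise mean of the vectors in one cluster."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cluster_error(vectors):
    """Sum of squared cosine distances between each vector and the cluster center."""
    center = cluster_center(vectors)
    return sum(cosine_distance(v, center) ** 2 for v in vectors)

def average_candidate_error(clusters):
    """Mean of the per-cluster error functions of one clustering result."""
    return sum(cluster_error(c) for c in clusters) / len(clusters)

# Hypothetical clustering result with two clusters.
g1 = [[1.0, 0.0], [0.0, 1.0]]   # two orthogonal records
g2 = [[1.0, 1.0], [2.0, 2.0]]   # two records pointing in the same direction
print(cluster_error(g1), cluster_error(g2), average_candidate_error([g1, g2]))
```

Because cosine distance depends only on direction, the second toy cluster contributes an error of essentially zero even though its two records differ in magnitude.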
Optionally, in the method for clustering client information provided in the first embodiment of the present application, determining the next error function according to the determined error function and the candidate error function set includes: calculating the difference between the target error function and each candidate error function in the candidate error function set to obtain M differences, wherein the target error function is one of the determined error functions; and sorting the M differences, and determining the candidate error function corresponding to the first-ranked difference as the next error function.
In the first embodiment, in order to determine the next centroid according to the clustering quality of the clustering results, the next error function may be selected from the candidate error functions according to the determined error functions, and the next centroid may then be determined according to the next error function.
Specifically, the determined centroids include a first centroid C1 and a second centroid C2, the error function corresponding to the first centroid C1 is a first error function L1, the error function corresponding to the second centroid C2 is a second error function L2, and the candidate error function set includes a candidate error function LS1, a candidate error function LS2 and a candidate error function LS3. The process of determining the next error function in the candidate error function set may be as follows:
Step S601, calculating the difference between the most recently determined error function among the determined error functions and each candidate error function in the candidate error function set, namely the difference Y1 obtained by subtracting the candidate error function LS1 from the second error function L2, the difference Y2 obtained by subtracting the candidate error function LS2 from the second error function L2, and the difference Y3 obtained by subtracting the candidate error function LS3 from the second error function L2;
in step S602, the difference Y1, the difference Y2 and the difference Y3 are sorted in descending order, and the candidate error function corresponding to the difference that ranks first and is positive is determined as the next error function.
By calculating the differences between the determined error function and the candidate error functions corresponding to the M clustering results, the obtained differences can be used to represent the clustering quality of the M clustering results. The candidate error function corresponding to the largest difference is determined as the next error function, so that the candidate centroid corresponding to the clustering result with the best clustering quality among the M clustering results can be determined as the next centroid, which improves the quality of the centroids used for clustering.
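Steps S601 and S602 amount to ranking the differences between the last determined error function and each candidate error function, then keeping the candidate with the largest positive difference. The Python sketch below illustrates this under stated assumptions: the numeric values of L2, LS1, LS2 and LS3 are hypothetical, and returning None when no difference is positive is a choice made for the sketch, since the text does not describe that case.

```python
def next_error_function(last_error, candidate_errors):
    """Rank last_error - candidate_error and keep the largest positive difference.

    candidate_errors maps a candidate label to its average candidate error function.
    Returns (label or None, differences); None when no difference is positive,
    which is an assumption of this sketch rather than something the text specifies.
    """
    differences = {name: last_error - err for name, err in candidate_errors.items()}
    ranked = sorted(differences.items(), key=lambda item: item[1], reverse=True)
    best_name, best_difference = ranked[0]
    if best_difference <= 0:
        return None, differences
    return best_name, differences

# Hypothetical values for the second error function L2 and the candidates LS1..LS3.
l2 = 5.0
candidates = {"LS1": 4.0, "LS2": 2.5, "LS3": 6.0}
chosen, differences = next_error_function(l2, candidates)
print(chosen, differences)
```

Here the differences are 1.0, 2.5 and -1.0, so LS2 is kept as the next error function, and the candidate centroid that produced it would become the next centroid.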
Optionally, in the method for clustering client information provided in the first embodiment of the present application, after clustering client information in a client information set according to K centroids to obtain a target clustering result, the method further includes: determining K classes of clients corresponding to the client information in the client information set according to the target clustering result; and adjusting the transaction strategy of each type of clients according to the preset transaction strategy.
In the first embodiment, in order to reduce the transaction risk of the financial institution and reduce the default event occurring in the financial institution, K-class clients may be determined according to the client information included in each cluster in the target clustering result. And then, adjusting the transaction policies of the financial institutions and clients of different categories according to the preset transaction policies. For example, for a first type customer with better repayment credit, the operations such as loan amount promotion, repayment period relaxation and the like can be properly performed so as to maintain good transaction relationship between the financial institution and the first type customer; for the second kind of customers with poor repayment credit, the loan amount and repayment period can be properly reduced, so that the risk of default events between the financial institution and the second kind of customers is reduced. The target clustering result is used for adjusting the transaction strategy of the transaction with clients of different categories, so that the effect of reducing the probability of occurrence of default events in the financial institutions is achieved.
Alternatively, in the first embodiment, the flow of determining the K centroids may be as shown in fig. 2.
Step 201, client information of a plurality of clients is collected to obtain a client information set Dataset.
Step 202, an error function of each piece of client information in the client information set Dataset is calculated, and the client information with the smallest error function is taken as the first centroid C1.
Step 203, the cosine distance between the first centroid C1 and the other client information in the client information set Dataset is calculated, the client information with the smallest cosine distance is taken as the second candidate centroid Cn2, and the client information in the client information set Dataset whose cosine distance from Cn2 is smaller than the threshold T1 forms the first candidate centroid set list_c2.
Step 204, each piece of client information in the first candidate centroid set list_c2 is combined with the first centroid C1 to obtain a plurality of second candidate centroid sets. The client information set Dataset is clustered according to the plurality of second candidate centroid sets to obtain a clustering result corresponding to each second candidate centroid set, and an error function of each second candidate centroid set is calculated. The difference between the error function of the first centroid C1 and the error function of each second candidate centroid set is calculated, the error function corresponding to the largest positive difference is determined as the second error function, and the second centroid C2 is determined according to the candidate centroid corresponding to the second error function.
Step 205, it is determined whether the number of determined centroids reaches K.
Step 206, if the number of determined centroids reaches K, the calculation ends; if the number of determined centroids does not reach K, steps 207 to 210 are repeated until the number of determined centroids reaches K.
Step 207, the average distance between each piece of client information in the client information set Dataset and the determined centroids is calculated, the client information corresponding to the maximum average distance is determined as the next candidate centroid CnX, and the client information whose cosine distance from CnX is less than T1 forms the first candidate centroid set list_CX.
Step 208, each piece of client information in list_CX is combined with the determined centroids to obtain a plurality of second candidate centroid sets, and clustering is performed according to each second candidate centroid set.
Step 209, the clustering result corresponding to each second candidate centroid set is determined, and the average error function corresponding to each second candidate centroid set is calculated. The second candidate centroid set whose average error function has the largest positive difference from the last determined error function is determined.
Step 210, the next centroid CX and the error function corresponding to CX are determined according to the second candidate centroid set determined in the previous step.
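The flow of fig. 2 can be put together in a compact Python sketch. This is only one possible reading under several assumptions that the text leaves open: client information is represented by hypothetical numeric vectors; each trial clustering is a single nearest-centroid assignment pass; the maximum-average-distance rule of step 207 is used in every round, including the round that picks the second centroid (step 203 as written refers to the smallest cosine distance); and the candidate with the largest difference is taken even if that difference is not positive, because the text does not say what happens in that case.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def assign(data, centroids):
    """One nearest-centroid pass; returns {centroid index: member indices}."""
    clusters = {c: [c] for c in centroids}
    for i, vec in enumerate(data):
        if i in clusters:
            continue
        nearest = min(centroids, key=lambda c: cosine_distance(vec, data[c]))
        clusters[nearest].append(i)
    return clusters

def average_error(data, clusters):
    """Mean over clusters of the summed squared cosine distance to the cluster mean."""
    total = 0.0
    for members in clusters.values():
        vectors = [data[i] for i in members]
        center = [sum(col) / len(vectors) for col in zip(*vectors)]
        total += sum(cosine_distance(v, center) ** 2 for v in vectors)
    return total / len(clusters)

def select_centroids(data, k, t1):
    """Return the indices of k initial centroids (sketch of steps 202 to 210)."""
    n = len(data)
    # Step 202: the record with the smallest error function is the first centroid.
    errors = [sum(cosine_distance(data[i], data[j]) ** 2 for j in range(n) if j != i)
              for i in range(n)]
    first = min(range(n), key=errors.__getitem__)
    centroids, last_error = [first], errors[first]
    while len(centroids) < k:  # steps 205 and 206
        rest = [i for i in range(n) if i not in centroids]
        # Step 207: farthest record on average, then its neighbours within t1.
        average = {i: sum(cosine_distance(data[i], data[c]) for c in centroids)
                      / len(centroids) for i in rest}
        seed = max(average, key=average.get)
        candidates = [i for i in rest
                      if i == seed or cosine_distance(data[i], data[seed]) < t1]
        # Steps 208 and 209: one trial clustering per candidate, scored by average error.
        trial = {c: average_error(data, assign(data, centroids + [c]))
                 for c in candidates}
        # Step 210: keep the candidate with the largest difference to the last error.
        best = max(candidates, key=lambda c: last_error - trial[c])
        centroids.append(best)
        last_error = trial[best]
    return centroids

# Hypothetical data: three groups of three records pointing in different directions.
data = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.2],
        [0.1, 1.0], [0.0, 1.0], [-0.1, 1.0],
        [-1.0, -1.0], [-1.0, -0.9], [-0.9, -1.0]]
centroids = select_centroids(data, k=3, t1=0.05)
print(centroids)
```

On this toy data the three selected indices fall in three different groups, which is the dispersed initialization the flow aims for; the resulting centroids would then seed the clustering of step S103.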
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example two
The second embodiment of the present application also provides a device for clustering client information, and it should be noted that the device for clustering client information in the second embodiment of the present application may be used to execute the method for clustering client information provided in the first embodiment of the present application. The following describes a clustering device for client information provided in the second embodiment of the present application.
Fig. 3 is a schematic diagram of a clustering apparatus for client information according to a second embodiment of the present application. As shown in fig. 3, the apparatus includes: a calculation unit 301, a first determination unit 302 and a clustering unit 303.
Specifically, the calculating unit 301 is configured to calculate an error function of each client information in the client information set to obtain a plurality of error functions, determine a first error function from the plurality of error functions, and determine client information corresponding to the first error function as a first centroid, where the error function is a sum of squares of cosine distances between the client information and other client information in the client information set.
A first determining unit 302, configured to determine a first candidate centroid set according to the determined centroids and other client information except the determined centroids in the client information set, and determine a next centroid according to the first candidate centroid set, and repeatedly perform the steps of the first determining unit until K centroids are determined, where K is a positive integer greater than 3, and the determined centroids include the first centroid.
And the clustering unit 303 is configured to cluster the client information in the client information set according to the K centroids, so as to obtain a target clustering result.
According to the clustering device for the client information provided by the second embodiment of the application, the error function of each client information in the client information set is calculated through the calculating unit 301 to obtain a plurality of error functions, a first error function is determined from the plurality of error functions, the client information corresponding to the first error function is determined as a first centroid, and the error function is the sum of squares of cosine distances between the client information and other client information in the client information set; the first determining unit 302 determines a first candidate centroid set according to the determined centroids and other client information except the determined centroids in the client information set, and determines a next centroid according to the first candidate centroid set, and repeatedly executes the steps of the first determining unit 302 until K centroids are determined, wherein K is a positive integer greater than 3, and the determined centroids include the first centroid; the clustering unit 303 clusters the client information in the client information set according to the K centroids to obtain a target clustering result, so that the problem that the clustering result is inaccurate due to the fact that discrete points or noise data in the client information are taken as centroids when the client information is clustered in the related technology and centroids used for clustering are acquired in a random selection mode is solved. 
By calculating the error function of each piece of customer information in the customer information set, a first centroid which is closer to other pieces of customer information can be determined in the customer information set, K centroids which are distributed in a scattered manner are determined according to the determined centroids and the customer information except the determined centroids in the customer information set, the problem that the clustering result is poor due to the fact that the traditional K-Means algorithm randomly selects discrete noise data as the centroids used for clustering is avoided, the convergence speed of the K-Means algorithm and the quality of the clustering result are improved, and the clustering quality of the clustering result of the customer information is further improved.
Optionally, in the clustering apparatus for client information provided in the second embodiment of the present application, the first determining unit 302 includes: a first calculating subunit, configured to calculate cosine distances between the determined centroid and other client information in the client information set except for the determined centroid, to obtain N cosine distances between each client information and the determined centroid, where the determined centroid includes N centroids; a first determining subunit, configured to determine first customer information according to N cosine distances between each customer information and the determined centroid, and determine the first customer information as a next candidate centroid; and the second determination subunit is used for determining the first candidate centroid set according to the cosine distance between the next candidate centroid and the client information except the determined centroid in the client information set.
Optionally, in the clustering apparatus for client information provided in the second embodiment of the present application, the first determining unit 302 includes: the clustering subunit is used for carrying out M times of clustering on the client information sets according to the first candidate centroid set and the determined centroids to obtain M clustering results, wherein the first candidate centroid set comprises M client data, M is an integer, and the clustering results comprise a plurality of clusters; the second calculation subunit is used for obtaining candidate error function sets corresponding to M clustering results according to candidate error functions in each cluster in each clustering result; a third determining subunit, configured to determine a next error function according to the determined error function and the candidate error function set; and the fourth determination subunit is used for determining a target clustering result corresponding to the next error function and determining a first candidate centroid corresponding to the target clustering result as the next centroid.
Optionally, in the clustering device for client information provided in the second embodiment of the present application, the above-mentioned clustering subunit includes: the first processing module is used for combining the determined centroids with each piece of client information in the first candidate centroid set to obtain M second candidate centroid sets; and the clustering module is used for clustering the client information set M times according to the M second candidate centroid sets to obtain M clustering results.
Optionally, in the clustering device for client information provided in the second embodiment of the present application, the second computing subunit includes: the first calculation module is used for calculating the error function of each cluster in each clustering result to obtain a candidate error function corresponding to each cluster in the clustering result; the determining module is used for determining an average candidate error function of the clustering result according to the candidate error function corresponding to each cluster in the clustering result; and the second processing module is used for synthesizing the average candidate error functions of all the clustering results to obtain a candidate error function set.
Optionally, in the client information clustering device provided in the second embodiment of the present application, the third determining subunit includes: a second calculation module, configured to calculate the difference between a target error function and each candidate error function in the candidate error function set to obtain M differences, wherein the target error function is one of the determined error functions; and a third processing module, configured to sort the M differences and determine the candidate error function corresponding to the first-ranked difference as the next error function.
Optionally, in the client information clustering device provided in the second embodiment of the present application, the device further includes: the second determining unit is used for determining K classes of clients corresponding to the client information in the client information set according to the target clustering result after clustering the client information in the client information set according to the K centroids to obtain the target clustering result; and the adjusting unit is used for adjusting the transaction strategy of each type of clients according to the preset transaction strategy.
The client information clustering device includes a processor and a memory. The computing unit 301, the first determining unit 302, the clustering unit 303, and the like are stored in the memory as program units, and the processor executes these program units to realize the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the accuracy of the clustering result is improved by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
A third embodiment of the present invention provides a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements a clustering method of client information.
The fourth embodiment of the invention provides a processor, which is used for running a program, wherein the clustering method of client information is executed when the program runs.
As shown in FIG. 4, a fifth embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor implements the following steps when executing the program: S101, calculating an error function of each piece of customer information in a customer information set to obtain a plurality of error functions, determining a first error function from the plurality of error functions, and determining the customer information corresponding to the first error function as a first centroid, wherein the error function is the sum of squares of cosine distances between the customer information and other customer information in the customer information set; S102, determining a first candidate centroid set according to the determined centroids and other client information except the determined centroids in the client information set, determining the next centroid according to the first candidate centroid set, and repeatedly executing step S102 until K centroids are determined, wherein K is a positive integer greater than 3, and the determined centroids comprise the first centroid; and S103, clustering the client information in the client information set according to the K centroids to obtain a target clustering result.
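As an illustration only, step S101 might be sketched as follows. The sketch assumes that each piece of customer information is a non-zero numeric feature vector and that the "first error function" is the smallest of the computed error functions; this passage does not state the selection rule, and the function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def error_function(index, records):
    """Sum of squared cosine distances from records[index] to every other record."""
    return sum(
        cosine_distance(records[index], other) ** 2
        for j, other in enumerate(records)
        if j != index
    )

def first_centroid_index(records):
    """Index of the record chosen as the first centroid.

    Assumption: the record with the smallest error function is selected.
    """
    errors = [error_function(i, records) for i in range(len(records))]
    return min(range(len(records)), key=errors.__getitem__)

# Three toy feature vectors; the middle one is closest to the other two.
records = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
first = first_centroid_index(records)
```

Under this assumption the first centroid is the record that lies, in the cosine sense, nearest to the rest of the set as a whole.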
The processor also realizes the following steps when executing the program: determining a first set of candidate centroids based on the determined centroids and other customer information in the set of customer information other than the determined centroids comprises: calculating cosine distances between the determined centroid and other client information except the determined centroid in the client information set to obtain N cosine distances between each client information and the determined centroid, wherein the determined centroid comprises N centroids; determining first customer information according to N cosine distances between each customer information and the determined centroid, and determining the first customer information as the next candidate centroid; a first set of candidate centroids is determined based on the cosine distance of the next candidate centroid from the customer information in the set of customer information other than the determined centroid.
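The candidate-set construction described above could be sketched as below. Two points are assumptions made for illustration, since this passage leaves them open: the "first customer information" is taken to be the record whose smallest distance to any determined centroid is largest, and the first candidate centroid set is taken to be the `m` remaining records nearest to that next candidate (the candidate itself included). The names `first_candidate_set` and `m` are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def first_candidate_set(records, centroid_ids, m):
    """Return (next_candidate, candidate_ids) for the current determined centroids."""
    remaining = [i for i in range(len(records)) if i not in centroid_ids]
    # N cosine distances per remaining record, one per determined centroid.
    dists = {
        i: [cosine_distance(records[i], records[c]) for c in centroid_ids]
        for i in remaining
    }
    # Assumption: pick the record farthest from its nearest determined centroid.
    next_candidate = max(remaining, key=lambda i: min(dists[i]))
    # Assumption: the candidate set is the m remaining records closest to it.
    ranked = sorted(
        remaining,
        key=lambda i: cosine_distance(records[i], records[next_candidate]),
    )
    return next_candidate, ranked[:m]

records = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.2, 1.0]]
next_candidate, candidates = first_candidate_set(records, centroid_ids=[0], m=2)
```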
The processor also realizes the following steps when executing the program: determining a next centroid from the first set of candidate centroids includes: clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results, wherein the first candidate centroid set contains M pieces of client data, M is an integer, and each clustering result contains a plurality of clusters; obtaining a candidate error function set corresponding to the M clustering results according to the candidate error functions of the clusters in each clustering result; determining a next error function according to the determined error functions and the candidate error function set; and determining a target clustering result corresponding to the next error function, and determining the first candidate centroid corresponding to the target clustering result as the next centroid.
The processor also realizes the following steps when executing the program: performing M times of clustering on the client information set according to the first candidate centroid set and the determined centroid, and obtaining M clustering results comprises: combining the determined centroid with each piece of customer information in the first candidate centroid set to obtain M second candidate centroid sets; and carrying out M times of clustering on the client information sets according to the M second candidate centroid sets to obtain M clustering results.
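A minimal sketch of this step, under stated assumptions, follows. Each of the M candidates is appended to the already determined centroids to form one second candidate centroid set, and the client information set is clustered once per set. The clustering here is a single nearest-centroid assignment by cosine distance, which is an assumption; this passage does not specify the clustering procedure. All function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def second_candidate_sets(centroid_ids, candidate_ids):
    """One centroid set per candidate: determined centroids plus that candidate."""
    return [list(centroid_ids) + [c] for c in candidate_ids]

def assign_clusters(records, centroid_ids):
    """Assign every record to its nearest centroid (assumed single-pass clustering)."""
    clusters = {c: [] for c in centroid_ids}
    for i, rec in enumerate(records):
        nearest = min(centroid_ids, key=lambda c: cosine_distance(rec, records[c]))
        clusters[nearest].append(i)
    return clusters

records = [[1.0, 0.0], [2.0, 1.0], [0.0, 1.0], [0.2, 1.0]]
sets = second_candidate_sets(centroid_ids=[0], candidate_ids=[2, 3])
# M = 2 candidates, hence 2 clustering results.
results = [assign_clusters(records, s) for s in sets]
```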
The processor also realizes the following steps when executing the program: according to the candidate error function in each cluster result, obtaining candidate error function sets corresponding to M cluster results comprises: calculating an error function of each cluster in each clustering result to obtain a candidate error function corresponding to each cluster in the clustering result; determining an average candidate error function of the clustering result according to the candidate error function corresponding to each cluster in the clustering result; and synthesizing the average candidate error functions of all the clustering results to obtain a candidate error function set.
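The averaging step can be illustrated as follows. The sketch assumes that the error function of a cluster is the sum of squared cosine distances between the cluster's centroid and its members, by analogy with the error function defined in step S101; this passage does not give the per-cluster formula. Clustering results are represented as dictionaries mapping a centroid index to its member indices, which is a hypothetical representation.

```python
import math

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cluster_error(records, centroid_id, member_ids):
    """Assumed per-cluster error: squared cosine distances of members to the centroid."""
    return sum(
        cosine_distance(records[i], records[centroid_id]) ** 2 for i in member_ids
    )

def average_candidate_error(records, clusters):
    """Mean of the per-cluster errors of one clustering result."""
    errors = [cluster_error(records, c, members) for c, members in clusters.items()]
    return sum(errors) / len(errors)

def candidate_error_set(records, clustering_results):
    """One average candidate error per clustering result, in the same order."""
    return [average_candidate_error(records, r) for r in clustering_results]

records = [[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]]
clustering_results = [
    {0: [0, 1], 2: [2]},   # record 1 grouped with centroid 0
    {0: [0], 2: [1, 2]},   # record 1 grouped with centroid 2
]
errors = candidate_error_set(records, clustering_results)
```

In this toy data, record 1 points mostly along the first axis, so grouping it with centroid 0 gives the smaller average error.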
The processor also realizes the following steps when executing the program: determining a next error function from the determined error function and the set of candidate error functions includes: calculating the difference between a target error function and each candidate error function in the candidate error function set to obtain M differences, wherein the target error function is one of the determined error functions; and sorting the M differences, and determining the candidate error function corresponding to the first-ranked difference as the next error function.
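The selection of the next error function might look like the sketch below. It assumes that each difference is computed as the target error minus the candidate error and that the differences are sorted in descending order, so the first-ranked candidate is the one that reduces the error most. Neither the sign convention nor the sort direction is fixed by this passage, and the function name is hypothetical.

```python
def next_error_function(target_error, candidate_errors):
    """Return (index, value) of the candidate chosen as the next error function."""
    # M differences, one per candidate error function.
    differences = [target_error - e for e in candidate_errors]
    # Assumption: descending order, largest reduction ranked first.
    order = sorted(
        range(len(differences)), key=lambda i: differences[i], reverse=True
    )
    best = order[0]
    return best, candidate_errors[best]

best_index, best_error = next_error_function(0.5, [0.3, 0.1, 0.4])
```

The returned index identifies the clustering result, and therefore the first candidate centroid, that becomes the next centroid.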
The processor also realizes the following steps when executing the program: after clustering the client information in the client information set according to the K centroids to obtain the target clustering result, the method further comprises the following steps: determining K classes of clients corresponding to the client information in the client information set according to the target clustering result; and adjusting the transaction strategy of each type of clients according to the preset transaction strategy.
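As a simple illustration, the mapping from the K client classes to preset transaction strategies could be expressed as a lookup. The strategy names and the dictionary-based representation of the target clustering result are hypothetical and are not taken from the application.

```python
def assign_strategies(clusters, preset_strategies):
    """Map each record index to the preset strategy of the cluster it belongs to."""
    strategy_by_record = {}
    for cluster_label, member_ids in clusters.items():
        for record_id in member_ids:
            strategy_by_record[record_id] = preset_strategies[cluster_label]
    return strategy_by_record

# Hypothetical target clustering result: centroid index -> member record indices.
target_clusters = {0: [0, 1], 2: [2, 3]}
# Hypothetical preset strategies, one per client class.
preset = {0: "strategy_a", 2: "strategy_b"}
strategies = assign_strategies(target_clusters, preset)
```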
The device herein may be a server, PC, PAD, cell phone, etc.
The application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: S101, calculating an error function of each piece of customer information in a customer information set to obtain a plurality of error functions, determining a first error function from the plurality of error functions, and determining the customer information corresponding to the first error function as a first centroid, wherein the error function is the sum of squares of cosine distances between the customer information and other customer information in the customer information set; S102, determining a first candidate centroid set according to the determined centroids and other client information except the determined centroids in the client information set, determining the next centroid according to the first candidate centroid set, and repeatedly executing step S102 until K centroids are determined, wherein K is a positive integer greater than 3, and the determined centroids comprise the first centroid; and S103, clustering the client information in the client information set according to the K centroids to obtain a target clustering result.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: determining a first set of candidate centroids based on the determined centroids and other customer information in the set of customer information other than the determined centroids comprises: calculating cosine distances between the determined centroid and other client information except the determined centroid in the client information set to obtain N cosine distances between each client information and the determined centroid, wherein the determined centroid comprises N centroids; determining first customer information according to N cosine distances between each customer information and the determined centroid, and determining the first customer information as the next candidate centroid; and determining a first set of candidate centroids based on the cosine distance of the next candidate centroid from the customer information in the set of customer information other than the determined centroid.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: determining a next centroid from the first set of candidate centroids includes: clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results, wherein the first candidate centroid set contains M pieces of client data, M is an integer, and each clustering result contains a plurality of clusters; obtaining a candidate error function set corresponding to the M clustering results according to the candidate error functions of the clusters in each clustering result; determining a next error function according to the determined error functions and the candidate error function set; and determining a target clustering result corresponding to the next error function, and determining the first candidate centroid corresponding to the target clustering result as the next centroid.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results comprises: combining the determined centroids with each piece of customer information in the first candidate centroid set to obtain M second candidate centroid sets; and clustering the client information set M times according to the M second candidate centroid sets to obtain M clustering results.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: obtaining the candidate error function set corresponding to the M clustering results according to the candidate error functions of the clusters in each clustering result comprises: calculating an error function of each cluster in each clustering result to obtain a candidate error function corresponding to each cluster in the clustering result; determining an average candidate error function of the clustering result according to the candidate error function corresponding to each cluster in the clustering result; and combining the average candidate error functions of all the clustering results to obtain the candidate error function set.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: determining a next error function from the determined error function and the set of candidate error functions includes: calculating the difference between a target error function and each candidate error function in the candidate error function set to obtain M differences, wherein the target error function is one of the determined error functions; and sorting the M differences, and determining the candidate error function corresponding to the first-ranked difference as the next error function.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: after clustering the client information in the client information set according to the K centroids to obtain the target clustering result, the method further comprises: determining K classes of clients corresponding to the client information in the client information set according to the target clustering result; and adjusting the transaction strategy of each class of clients according to a preset transaction strategy.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A method for clustering customer information, comprising:
S101, calculating an error function of each piece of customer information in a customer information set to obtain a plurality of error functions, determining a first error function from the plurality of error functions, and determining customer information corresponding to the first error function as a first centroid, wherein the error function is a sum of squares of cosine distances between the customer information and other customer information in the customer information set;
S102, determining a first candidate centroid set according to the determined centroids and other client information except the determined centroids in the client information set, determining the next centroid according to the first candidate centroid set, and repeatedly executing the step S102 until K centroids are determined, wherein K is a positive integer greater than 3, and the determined centroids comprise the first centroid;
and S103, clustering the client information in the client information set according to the K centroids to obtain a target clustering result.
2. The method of claim 1, wherein determining a first set of candidate centroids based on the determined centroids and other customer information in the set of customer information other than the determined centroids comprises:
calculating cosine distances between the determined centroid and other client information except the determined centroid in the client information set to obtain N cosine distances between each client information and the determined centroid, wherein the determined centroid comprises N centroids;
determining first customer information according to N cosine distances between each customer information and the determined centroid, and determining the first customer information as a next candidate centroid;
and determining the first candidate centroid set according to the cosine distance between the next candidate centroid and the client information except the determined centroid in the client information set.
3. The method of claim 1, wherein determining a next centroid from the first set of candidate centroids comprises:
clustering the client information set M times according to the first candidate centroid set and the determined centroids to obtain M clustering results, wherein the first candidate centroid set contains M pieces of client data, M is an integer, and each clustering result contains a plurality of clusters;
obtaining candidate error function sets corresponding to the M clustering results according to candidate error functions in each cluster in each clustering result;
determining a next error function according to the determined error function and the candidate error function set;
and determining a target clustering result corresponding to the next error function, and determining a first candidate centroid corresponding to the target clustering result as the next centroid.
4. The method of claim 3, wherein clustering the set of customer information M times in accordance with the first set of candidate centroids and the determined centroids, obtaining M cluster results comprises:
combining the determined centroids with each piece of client information in the first candidate centroid set to obtain M second candidate centroid sets;
and clustering the client information set for M times according to the M second candidate centroid sets to obtain M clustering results.
5. The method of claim 3, wherein obtaining the set of candidate error functions corresponding to the M cluster results from candidate error functions within each cluster in each cluster result comprises:
calculating an error function of each cluster in each clustering result to obtain a candidate error function corresponding to each cluster in the clustering result;
determining an average candidate error function of the clustering result according to the candidate error function corresponding to each cluster in the clustering result;
and integrating all the average candidate error functions of the clustering results to obtain the candidate error function set.
6. The method of claim 3, wherein determining a next error function from the determined error function and the set of candidate error functions comprises:
calculating the difference between a target error function and each candidate error function in the candidate error function set to obtain M differences, wherein the target error function is one of the determined error functions;
and sorting the M differences, and determining the candidate error function corresponding to the first-ranked difference as the next error function.
7. The method of claim 1, wherein after clustering the customer information in the set of customer information according to the K centroids to obtain a target cluster result, the method further comprises:
determining K classes of clients corresponding to the client information in the client information set according to the target clustering result;
and adjusting the transaction strategy of each type of clients according to the preset transaction strategy.
8. A client information clustering device, comprising:
a computing unit, configured to calculate an error function of each piece of customer information in a customer information set to obtain a plurality of error functions, determine a first error function from the plurality of error functions, and determine the customer information corresponding to the first error function as a first centroid, wherein the error function is the sum of squares of cosine distances between the customer information and other pieces of customer information in the customer information set;
a first determining unit, configured to determine a first candidate centroid set according to the determined centroids and other client information in the client information set except the determined centroids, and determine a next centroid according to the first candidate centroid set, and repeatedly execute the steps of the first determining unit until K centroids are determined, where K is a positive integer greater than 3, and the determined centroids include the first centroid;
and a clustering unit, configured to cluster the client information in the client information set according to the K centroids to obtain a target clustering result.
9. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, performs the method of clustering client information according to any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of clustering customer information of any one of claims 1-7.
CN202310719562.0A 2023-06-16 2023-06-16 Client information clustering method and device, processor and electronic equipment Pending CN116720094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310719562.0A CN116720094A (en) 2023-06-16 2023-06-16 Client information clustering method and device, processor and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310719562.0A CN116720094A (en) 2023-06-16 2023-06-16 Client information clustering method and device, processor and electronic equipment

Publications (1)

Publication Number Publication Date
CN116720094A true CN116720094A (en) 2023-09-08

Family

ID=87865728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310719562.0A Pending CN116720094A (en) 2023-06-16 2023-06-16 Client information clustering method and device, processor and electronic equipment

Country Status (1)

Country Link
CN (1) CN116720094A (en)

Similar Documents

Publication Publication Date Title
US10614073B2 (en) System and method for using data incident based modeling and prediction
Hung et al. Customer segmentation using hierarchical agglomerative clustering
CN111882426A (en) Business risk classifier training method, device, equipment and storage medium
CN116450951A (en) Service recommendation method and device, storage medium and electronic equipment
CN114529400A (en) Consumption loan preauthorization evaluation method, device and medium
CN114782201A (en) Stock recommendation method and device, computer equipment and storage medium
CN116542747A (en) Product recommendation method and device, storage medium and electronic equipment
KR20110114181A (en) Loan underwriting method for improving forecasting accuracy
CN116720094A (en) Client information clustering method and device, processor and electronic equipment
CN111667307B (en) Method and device for predicting financial product sales volume
CN114049202A (en) Operation risk identification method and device, storage medium and electronic equipment
CN113849618A (en) Strategy determination method and device based on knowledge graph, electronic equipment and medium
CN111429232A (en) Product recommendation method and device, electronic equipment and computer-readable storage medium
CN113064944A (en) Data processing method and device
CN117635225A (en) Electronic red envelope issuing method and device, storage medium and electronic equipment
CN117952688A (en) Classification method and device for merchants and electronic equipment
CN117648438A (en) Classification processing method and device for consultation type worksheets, storage medium and electronic equipment
CN116611930A (en) Recommendation method and device for financial products, processor and electronic equipment
US20160042091A1 (en) System And Method Of Forming An Index
CN116485526A (en) Service processing method and device and electronic equipment
CN116228371A (en) Recommendation method and device for financial products, storage medium and electronic equipment
CN117350839A (en) Account checking method and device, storage medium and electronic equipment
CN117726398A (en) Product information recommendation method and device, processor and electronic equipment
CN116151987A (en) Financial product transaction processing method, device and system
CN115048470A (en) Customer classification method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination