CN113688906A - Customer segmentation method and system based on quantum K-means algorithm - Google Patents
- Publication number
- CN113688906A (application number CN202110982944.3A)
- Authority
- CN
- China
- Prior art keywords
- quantum
- data
- quantum state
- sample
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a customer segmentation method and system based on a quantum K-means algorithm. The method comprises the following steps: acquiring a customer behavior data set D; converting each sample x_m in the customer behavior data set D into a quantum state |x_m⟩ according to the feature values of the sample, and converting the cluster centers c into quantum states |c⟩ according to the feature values of the k selected cluster centers c_i; performing quantum computation on the customer behavior data and the cluster centers to output the similarity between each data sample and the cluster centers, the similarity being stored in a quantum state |α_m⟩; and searching the quantum state |α_m⟩ for the minimum similarity value between the data sample |x_m⟩ and the cluster centers |c_i⟩, so as to find the cluster center c_j nearest to the sample x_m. The invention normalizes the input data and feeds it in without destroying the relationships within the data, which can help enterprises analyze their customers in depth, while also providing the computational speed-up, accuracy and energy savings brought by quantum computing.
Description
Technical Field
The invention relates to the field of quantum finance, in particular to a customer segmentation method and a customer segmentation system based on a quantum K-means algorithm.
Background
The Pareto principle, also called the 80/20 rule, plays an important role in economics. It holds that, in any situation, only a small share of the factors mainly determine the outcome. A large number of studies have found that only 20% of customers contribute 80% of a business's profits. In every sector of the financial market, the cost of acquiring new users is far higher than the cost of retaining existing customers, and market competition makes the products and services of different enterprises more and more alike, which limits their room for growth. As competition changes, maintaining the relationship between enterprise and customers, stratifying customers by their characteristics, optimizing the allocation of enterprise resources across customer types, and maximizing revenue are fundamental requirements for an enterprise pursuing long-term, stable development.
Traditional statistical surveys consume a relatively large amount of manpower and material resources, and their results carry some error because of various external factors. Compared with traditional survey statistics, customer behavior data retrieved from an existing database has a lower information cost and higher reliability. Machine learning algorithms work on relatively large volumes of such data and so avoid the problems of statistical surveys, but the large data volume brings problems of its own: long computation times and high consumption of computing resources, which are common shortcomings of existing algorithms.
Distinguishing customers of different value is crucial to the core development of an enterprise. Common segmentation methods can be divided into prior segmentation and posterior segmentation. Posterior segmentation is usually carried out by cluster analysis, after which the value of each class is analyzed from the classes obtained. Existing customer segmentation methods are also fragmented: they examine each aspect in isolation and cannot tie together the common laws of different disciplines. Because the data structure is complex, some preprocessing method, such as principal component analysis, is usually applied before the customer information is processed with a cluster analysis algorithm. Preprocessing does bring some computational benefit, but the processed data lose the implicit relationships among the original data, which reduces the integrity of the data information. The classic k-means algorithm is simple in concept and easy to implement, but by its nature it needs repeated iterative computation to produce a clustering. In a complex, large-scale computing environment its computational cost grows considerably as the data volume increases, so the k-means algorithm is generally not suited to clustering problems with large data volumes. Quantum computing builds on the basic properties of quantum systems and offers powerful parallel computation and a higher data capacity; when processing large data, its performance can far exceed the processing capability of a classical computer. The quantum k-means algorithm is a k-means algorithm based on quantum computing theory, and can effectively improve the computational efficiency of k-means and reduce its space complexity.
Therefore, to address the slow processing, low efficiency and high energy consumption an enterprise faces when handling customer information in a big data environment, providing a customer segmentation method and system based on the quantum K-means algorithm, one that makes it possible to study customer segmentation as an integrated whole and to portray customers more completely and rigorously, is a problem to be solved in this field.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a customer segmentation method and a customer segmentation system based on a quantum K-means algorithm.
The purpose of the invention is realized by the following technical scheme:
the invention provides a customer segmentation method based on a quantum K-means algorithm, which comprises the following steps:
determining a segmentation angle, and hence the number of features d, and acquiring a customer behavior data set D;
converting each sample x_m in the customer behavior data set D into a quantum state |x_m⟩ according to the feature values of the sample; and converting the cluster centers c into quantum states |c⟩ according to the feature values of the k selected cluster centers c_i;
performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and the cluster centers, i.e. computing the similarity of the quantum states |x_m⟩ and |c⟩, the similarity being stored in a quantum state |α_m⟩;
searching the quantum state |α_m⟩ for the minimum similarity value between the data sample |x_m⟩ and the cluster centers |c_i⟩, so as to find the cluster center c_j nearest to the sample x_m.
Further, the obtaining of the customer behavior data set D includes:
data extraction: extracting required data from a database;
data cleaning: checking every variable for missing, unknown and invalid values; then, according to the distribution of each variable and the actual requirements, applying corresponding rules to replace the missing, unknown and invalid values with valid ones;
data conversion: converting data of different types into a quantum-state form that the quantum k-means algorithm can use.
Further, in converting the sample x_m into the quantum state |x_m⟩ according to the feature values of the sample x_m in the customer behavior data set D, the conversion formula is:
where x_mj denotes the jth feature of the mth sample x_m;
in converting the cluster center c into the quantum state |c⟩ according to the feature values of the k selected cluster centers c_i, the conversion formula is:
where c_ij denotes the jth feature of the ith cluster center c_i.
Further, performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and the cluster centers, i.e. computing the similarity of the quantum states |x_m⟩ and |c⟩ and storing it in the quantum state |α_m⟩, comprises:
using a controlled-swap gate to compute the similarity result |ψ⟩, given by:
where n denotes the number of samples x_m, and s(x_m, c_i) denotes the similarity between x_m and c_i;
the quantum state |ψ⟩ is used as the input of the phase estimation algorithm, whose output is | |c_i - x_m| ⟩; this is the similarity between the data sample |x_m⟩ and the cluster center |c⟩, stored in the quantum state |α_m⟩ and given by:
Further, searching the quantum state |α_m⟩ for the minimum similarity value between the data sample |x_m⟩ and the cluster centers |c_i⟩, so as to find the cluster center c_j nearest to the sample x_m, comprises:
randomly selecting a cluster center c_i as the initial value, then repeating the following steps to find the minimum in |α_m⟩ by continued iteration:
preparing the quantum state |β⟩ of the initial cluster center c_i;
taking |α_m⟩ and |β⟩ as inputs, with |β⟩ as the control input, and using the Grover algorithm to find c'_j, where c'_j denotes a temporary cluster center;
if |c'_j - x_m| < |c_j - x_m|, replacing c_j with c'_j.
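The iteration above follows the threshold-update pattern of Grover-based minimum finding: keep a current best center, ask the search for any center that is strictly closer, and replace the current best when one is found. The sketch below is a purely classical stand-in for that loop, not the patent's quantum circuit. The function name `nearest_center_index`, the `distances` list (standing in for the values held in |α_m⟩) and the `seed` parameter are all hypothetical, and the list comprehension plays the role that the Grover search would play on quantum hardware.

```python
import random
from typing import Sequence


def nearest_center_index(distances: Sequence[float], seed: int = 0) -> int:
    """Classical stand-in for the threshold-update minimum search.

    distances[i] plays the role of |c_i - x_m| for one fixed sample x_m.
    """
    rng = random.Random(seed)
    # Randomly select an initial cluster center c_j.
    best = rng.randrange(len(distances))
    while True:
        # Stand-in for the Grover step: find centers strictly closer
        # than the current best (assumption: done classically here).
        closer = [i for i, dist in enumerate(distances) if dist < distances[best]]
        if not closer:
            return best  # no closer center exists, so best is the minimum
        # A temporary center c'_j replaces c_j.
        best = rng.choice(closer)


if __name__ == "__main__":
    print(nearest_center_index([0.9, 0.2, 0.5]))
```

The loop always terminates because the distance of the current best strictly decreases at every replacement.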
In a second aspect of the present invention, there is provided a customer segmentation system based on quantum K-means algorithm, comprising:
a segmentation angle module: used for determining a segmentation angle, and hence the number of features d, and acquiring a customer behavior data set D;
a quantum state conversion module: used for converting each sample x_m in the customer behavior data set D into a quantum state |x_m⟩ according to the feature values of the sample, and for converting the cluster centers c into quantum states |c⟩ according to the feature values of the k selected cluster centers c_i;
a similarity calculation module: used for performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and the cluster centers, i.e. computing the similarity of the quantum states |x_m⟩ and |c⟩, the similarity being stored in a quantum state |α_m⟩;
a cluster center search module: used for searching the quantum state |α_m⟩ for the minimum similarity value between the data sample |x_m⟩ and the cluster centers |c_i⟩, so as to find the cluster center c_j nearest to the sample x_m.
Further, the obtaining of the customer behavior data set D includes:
a data extraction submodule: extracting required data from a database;
a data cleaning submodule: checking every variable for missing, unknown and invalid values; then, according to the distribution of each variable and the actual requirements, applying corresponding rules to replace the missing, unknown and invalid values with valid ones;
a data conversion submodule: converting data of different types into a quantum-state form that the quantum k-means algorithm can use.
Further, in converting the sample x_m into the quantum state |x_m⟩ according to the feature values of the sample x_m in the customer behavior data set D, the conversion formula is:
where x_mj denotes the jth feature of the mth sample x_m;
in converting the cluster center c into the quantum state |c⟩ according to the feature values of the k selected cluster centers c_i, the conversion formula is:
where c_ij denotes the jth feature of the ith cluster center c_i.
Further, the similarity calculation module performs the following:
using a controlled-swap gate to compute the similarity result |ψ⟩, given by:
where n denotes the number of samples x_m, and s(x_m, c_i) denotes the similarity between x_m and c_i;
the quantum state |ψ⟩ is used as the input of the phase estimation algorithm, whose output is | |c_i - x_m| ⟩; this is the similarity between the data sample |x_m⟩ and the cluster center |c⟩, stored in the quantum state |α_m⟩ and given by:
Further, the cluster center search module performs the following:
randomly selecting a cluster center c_i as the initial value, then repeating the following steps to find the minimum in |α_m⟩ by continued iteration:
preparing the quantum state |β⟩ of the initial cluster center c_i;
taking |α_m⟩ and |β⟩ as inputs, with |β⟩ as the control input, and using the Grover algorithm to find c'_j, where c'_j denotes a temporary cluster center;
if |c'_j - x_m| < |c_j - x_m|, replacing c_j with c'_j.
In a third aspect of the invention, a storage medium is provided having stored thereon computer instructions which, when executed, perform the steps of the customer segmentation method based on the quantum K-means algorithm.
In a fourth aspect of the present invention, there is provided a terminal comprising a memory and a processor, wherein the memory stores computer instructions executable on the processor, and the processor, when executing the computer instructions, performs the steps of the customer segmentation method based on the quantum K-means algorithm.
The invention has the beneficial effects that:
(1) In an exemplary embodiment of the present invention: conventional customer segmentation has to preprocess the data first, which yields a computational benefit at the cost of data integrity. The customer segmentation method based on the quantum k-means algorithm in this exemplary embodiment instead normalizes the input data and feeds it in without destroying the relationships within the data, which can help an enterprise analyze its customers in depth. The computational speed-up, accuracy and energy savings brought by quantum computing are also what enterprises expect.
Exemplary embodiments of the system, the storage medium and the terminal of the present invention have the same advantages.
(2) In yet another exemplary embodiment of the present invention, customer data is obtained by extracting the required data, according to the enterprise's requirements, from an existing or authorized database within the enterprise as a training data set. The data obtained needs to be normalized to make model processing easier, but the normalization step does not destroy the relationships within the data.
(3) In yet another exemplary embodiment of the present invention, specific implementations of the subsequent steps are disclosed.
Drawings
Fig. 1 is a flow chart of the method provided by an exemplary embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that directions or positional relationships indicated by "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are directions or positional relationships described based on the drawings, and are only for convenience of description and simplification of description, and do not indicate or imply that the device or element referred to must have a specific orientation, be configured and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Existing customer segmentation methods are fragmented: they examine each aspect in isolation and cannot tie together the common laws of different disciplines. Because the data structure is complex, some preprocessing method, such as principal component analysis, is usually applied before the customer information is processed with a cluster analysis algorithm. Preprocessing does bring some computational benefit, but the processed data lose the implicit relationships among the original data, which reduces the integrity of the data information. The classic k-means algorithm is simple in concept and easy to implement, but by its nature it needs repeated iterative computation to produce a clustering. In a complex, large-scale computing environment its computational cost grows considerably as the data volume increases, so the k-means algorithm is generally not suited to clustering problems with large data volumes. Quantum computing builds on the basic properties of quantum systems and offers powerful parallel computation and a higher data capacity; when processing large data, its performance can far exceed the processing capability of a classical computer. The quantum k-means algorithm is a k-means algorithm based on quantum computing theory, and can effectively improve the computational efficiency of k-means and reduce its space complexity.
In essence, customer segmentation means that an enterprise takes an existing customer information data set D and, using some segmentation criterion M, divides it into a number of subclasses Y_i, where the number of subclasses, i.e. the number of customer groups, is at least 2. Specifically, a customer information data set D = (d_1, d_2, ..., d_n) containing n samples is divided into k groups Y = (Y_1, Y_2, ..., Y_k). Each subclass Y_i must be non-empty, and each sample must ultimately belong to exactly one class. The following exemplary embodiments segment customers by customer behavior in the context of a power grid; the method can also be applied in other fields with corresponding adjustments, which are not limited herein.
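The constraints just stated (at least two groups, no empty group, every sample in exactly one group) can be checked mechanically. The following is a minimal sketch of such a check; the function name `is_valid_segmentation` is hypothetical, and samples are assumed to be identified by unique hashable IDs such as customer numbers.

```python
from typing import Hashable, Iterable, Sequence


def is_valid_segmentation(samples: Sequence[Hashable],
                          groups: Iterable[Iterable[Hashable]]) -> bool:
    """Check that `groups` splits `samples` into at least two non-empty,
    non-overlapping classes that together cover every sample."""
    group_lists = [list(g) for g in groups]
    # At least two customer groups, none of them empty.
    if len(group_lists) < 2 or any(len(g) == 0 for g in group_lists):
        return False
    assigned = [s for g in group_lists for s in g]
    # Each sample appears exactly once across all groups.
    return (len(assigned) == len(samples)
            and len(set(assigned)) == len(assigned)
            and set(assigned) == set(samples))


if __name__ == "__main__":
    customers = ["d1", "d2", "d3"]
    print(is_valid_segmentation(customers, [["d1"], ["d2", "d3"]]))
```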
Referring to fig. 1, fig. 1 illustrates a customer segmentation method based on a quantum K-means algorithm according to an exemplary embodiment of the present invention, including:
determining a segmentation angle, and hence the number of features d, and acquiring a customer behavior data set D;
converting each sample x_m in the customer behavior data set D into a quantum state |x_m⟩ according to the feature values of the sample; and converting the cluster centers c into quantum states |c⟩ according to the feature values of the k selected cluster centers c_i;
performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and the cluster centers, i.e. computing the similarity of the quantum states |x_m⟩ and |c⟩, the similarity being stored in a quantum state |α_m⟩;
searching the quantum state |α_m⟩ for the minimum similarity value between the data sample |x_m⟩ and the cluster centers |c_i⟩, so as to find the cluster center c_j nearest to the sample x_m.
Determining the segmentation angle lets an enterprise divide its customers into several groups according to given criteria and its own needs. Common segmentation angles include geographic, demographic, psychological and behavioral segmentation. The segmentation angle chosen in the exemplary embodiment of the present invention is behavioral segmentation, i.e. customer behavior; this choice determines that the data set used is a customer behavior data set. The number of data features in the data set used is denoted d.
The number of cluster centers equals the number of classes, and the samples near a given cluster center form one class.
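For comparison, the classical k-means loop that the quantum variant aims to accelerate can be sketched as follows. This is a classical baseline under assumptions, not the patent's method: Euclidean distance is assumed as the similarity measure, and the function name `kmeans`, its parameters and the fixed iteration count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]],
           centers: Sequence[Sequence[float]],
           iterations: int = 10) -> Tuple[List[List[float]], List[int]]:
    """Classical k-means: assign each sample to its nearest center,
    then move each center to the mean of its assigned samples."""
    current = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(len(current)), key=lambda i: math.dist(p, current[i]))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its center.
        for i in range(len(current)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                current[i] = [sum(col) / len(members) for col in zip(*members)]
    return current, labels


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(kmeans(data, [[0.0, 0.0], [10.0, 10.0]]))
```

Each iteration compares every one of the n samples with every one of the k centers across d features, which is the cost that grows with the data volume.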
The invention provides a customer segmentation method based on a quantum machine learning algorithm. Conventional customer segmentation has to preprocess the data first, which yields a computational benefit at the cost of data integrity. The customer segmentation method based on the quantum machine learning algorithm in this exemplary embodiment instead normalizes the input data and feeds it in without destroying the relationships within the data, which can help enterprises analyze their customers in depth. The computational speed-up, accuracy and energy savings brought by quantum computing are also what enterprises expect.
The following exemplary embodiments will set forth the individual steps in detail.
Preferably, in an exemplary embodiment, the obtaining the customer behavior data set D includes:
data extraction: extracting required data from a database;
data cleaning: checking every variable for missing, unknown and invalid values; then, according to the distribution of each variable and the actual requirements, applying corresponding rules to replace the missing, unknown and invalid values with valid ones. This step prevents abnormal data from adversely affecting the clustering process.
data conversion: converting data of different kinds (different formats, types and distributions) into a quantum-state form that the quantum k-means algorithm can use.
Specifically, in the exemplary embodiment, customer data is obtained by extracting the required data, according to the enterprise's requirements, from an existing or authorized database within the enterprise as a training data set. The data obtained needs to be normalized to make model processing easier.
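The cleaning and normalization steps can be sketched in a few lines. The text does not say which rules are used to replace invalid values or which normalization is applied, so the choices below are assumptions made for illustration: missing or non-finite entries are replaced by the column median, and each column is then min-max scaled to [0, 1]. The names `clean_and_scale` and `_is_valid` are hypothetical.

```python
import math
from statistics import median
from typing import List, Optional, Sequence


def _is_valid(value: Optional[float]) -> bool:
    """Treat None, NaN and infinities as missing or invalid (assumed rule)."""
    return value is not None and math.isfinite(value)


def clean_and_scale(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Impute invalid entries with the column median, then min-max scale."""
    cleaned = [list(r) for r in rows]
    for j in range(len(rows[0])):
        # Assumes every column has at least one valid entry.
        fill = median(r[j] for r in rows if _is_valid(r[j]))
        for r in cleaned:
            if not _is_valid(r[j]):
                r[j] = fill
        low = min(r[j] for r in cleaned)
        span = max(r[j] for r in cleaned) - low
        for r in cleaned:
            # A constant column carries no information; map it to 0.0.
            r[j] = (r[j] - low) / span if span else 0.0
    return cleaned


if __name__ == "__main__":
    raw = [[1.0, None], [3.0, 10.0], [float("nan"), 20.0]]
    print(clean_and_scale(raw))
```

Min-max scaling is a monotone rescaling of each feature, so the ordering of customers within a feature is preserved.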
After the customer data is obtained, the segmentation angle must be determined; different segmentation angles lead to different analysis results. The exemplary embodiment starts from an analysis of power grid customer behavior and studies the characteristics of the resulting customer groups. The grid customer behavior data includes daily load, summer and winter electricity consumption, load factor, payment type, number of hotline consultations, number of online logins, and business-handling and other preferences.
A cluster analysis method is then chosen and cluster centers are selected to obtain the customer classes. The initial cluster centers are obtained by screening the grid customer behavior data with factor analysis. They can also be chosen from experience, but centers chosen that way are generally subjective, which makes the clustering result less scientific and objective. The exemplary embodiment generates the cluster centers by factor analysis and takes the screened variables as cluster centers so that the algorithm converges as quickly as possible. The segmentation method used in the present exemplary embodiment is the quantum k-means algorithm, with the following specific steps:
(1) Preparing the quantum states. The grid customer behavior data set D is normalized. Each of the n samples x_m in the data set D has d features, and each feature of each sample is denoted x_mj. The cluster centers are denoted c; there are k of them, c = {c_1, c_2, ..., c_k}, each denoted c_i. The grid customer behavior data, as the input of the quantum k-means algorithm, is represented by quantum states |x_m⟩, and the selected cluster centers are represented by quantum states |c⟩. For example, x_0j denotes the jth feature value of the 0th data point, and c_ij denotes the jth feature value of the ith cluster center. This step mainly converts classical data into quantum states.
Preferably, in an exemplary embodiment, the samples x in the customer behavior based dataset DmCharacteristic value of (2), sample xmConversion to quantum state | xm>Expressed, the conversion formula is:
in the formula, xmjDenotes the m-th sample xmThe jth feature of (1);
the k cluster centers c according to the selectioniThe characteristic value of (2) converts the clustering center c into a quantum state | c>Expressed, the conversion formula is:
in the formula, cijThe jth feature representing the ith cluster center c.
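The conversion formulas are not written out in this text. The sketch below illustrates one common choice, amplitude encoding, under the assumption that |x_m> = (1/||x_m||) Σ_j x_mj |j>; the formula intended above may differ. The function name `amplitude_encode`, the zero padding to a power-of-two length, and the sample values are hypothetical, and the code only computes the amplitudes classically rather than preparing a state on quantum hardware.

```python
import math


def amplitude_encode(features):
    """Return the amplitudes of |x> = (1/||x||) * sum_j x_j |j>.

    The vector is zero-padded to the next power of two so that it fits
    an integer number of qubits (an assumption of this sketch).
    """
    norm = math.sqrt(sum(v * v for v in features))
    if norm == 0.0:
        raise ValueError("cannot encode the zero vector as a quantum state")
    dim = 1
    while dim < len(features):
        dim *= 2
    padded = list(features) + [0.0] * (dim - len(features))
    return [v / norm for v in padded]


sample = [3.0, 4.0, 0.0]          # hypothetical feature values of one sample x_m
state = amplitude_encode(sample)  # amplitudes of |x_m> on two qubits
```

The same function can be applied to each cluster center c_i to obtain the amplitudes of its state.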
(2) Calculating the similarity between the power grid customer behavior data samples and the cluster centers, that is, the similarity between the quantum states |x_m> and |c>. A controlled-swap gate (Control-Swap Gate) computes the similarity, and a phase estimation algorithm stores it in qubits. The controlled-swap result |ψ> is passed through the phase estimation algorithm to obtain ||c_i - x_m|>, the similarity between the data sample |x_m> and the cluster center |c>, which is stored in the quantum state |α_m>; the smaller the value, the higher the similarity. This step takes the power grid customer behavior data and the cluster centers as input and outputs, through quantum computation, the similarity between each data sample and each cluster center.
Preferably, in an exemplary embodiment, performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and each cluster center, that is, computing the similarity between the quantum states |x_m> and |c> and storing it in the quantum state |α_m>, comprises:
computing the similarity result |ψ> with the controlled-swap gate, where the formula is:
where n denotes the number of samples x_m, and s(x_m, c_i) denotes the similarity between x_m and c_i;
using the quantum state |ψ> as the input to the phase estimation algorithm, whose output is ||c_i - x_m|>, the similarity between the data sample |x_m> and the cluster center |c>, stored in the quantum state |α_m>. The formula is:
Here, the quantum phase estimation method computes the phase of the target quantum state and is implemented mainly by the quantum Fourier transform, $\sum_{j=0}^{N-1} x_j |j\rangle \mapsto \sum_{k=0}^{N-1} y_k |k\rangle$ with $y_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} x_j e^{2\pi i jk/N}$, where the quantum state |ψ> serves as the input x_j|j> and ||c_i - x_m|> is the output y_k|k>. In addition, ||c_i - x_m|> is s(x_m, c_i).
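The text does not spell out the controlled-swap circuit. A standard way to use such a gate is the swap test, in which an ancilla qubit is measured in |0> with probability 1/2 + |<x|c>|^2 / 2. The sketch below evaluates that probability classically, assuming real-valued, amplitude-encoded vectors; the function names and inputs are hypothetical, and the phase estimation stage is not modelled.

```python
import math


def normalize(v):
    """Scale a real vector to unit length, as amplitude encoding requires."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def swap_test_p0(x, c):
    """Probability of measuring the ancilla in |0> after a swap test on |x>, |c>."""
    overlap = sum(a * b for a, b in zip(normalize(x), normalize(c)))
    return 0.5 + 0.5 * overlap ** 2


def fidelity_from_p0(p0):
    """Invert the swap-test relation to recover |<x|c>|^2."""
    return 2.0 * p0 - 1.0


p_same = swap_test_p0([1.0, 2.0], [2.0, 4.0])  # parallel vectors, full overlap
p_orth = swap_test_p0([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors, no overlap
```

On hardware the probability would be estimated from repeated measurements, so the recovered overlap carries sampling error that this sketch ignores.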
(3) Searching for the maximum similarity between the power grid customer behavior data samples and the cluster centers. |α> contains nk values ||c_i - x_m|>, and each |α_m> contains k of them. A quantum minimum search algorithm finds the minimum value between the data sample |x_m> and the cluster centers |c_i> in the quantum state |α_m>; because a smaller value means a higher similarity, this minimum corresponds to the maximum similarity. This step uses the quantum minimum search algorithm to find the value that gives the best clustering of the power grid customer behavior data.
More preferably, in an exemplary embodiment, searching the quantum state |α_m> for the minimum between the data sample |x_m> and the cluster centers |c_i>, so as to find the cluster center c_j nearest to the sample x_m, comprises:
randomly selecting a cluster center c_i as the initial value, and then repeating the following steps to search iteratively for the minimum in |α_m>:
preparing the quantum state |β> of the initial cluster center c_i;
taking |α_m> and |β> as inputs and |b> as the control input, and using the Grover algorithm to find c'_j, where c'_j denotes a temporary cluster center (|b> may be a conditional control input: when the inputs as a whole satisfy a certain condition, the control input b yields the desired output);
if |c'_j - x_m| < |c_j - x_m|, replacing c_j with c'_j.
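The threshold-update loop described in these steps can be emulated classically, as a rough illustration only. In the sketch below the Grover step is replaced by a uniform random pick among the entries that lie strictly below the current threshold, so the code shows the control flow and not the quantum speedup. The function name `find_nearest_center` and the distance values are hypothetical.

```python
import random


def find_nearest_center(distances, rng=None):
    """Classically emulate the threshold-update loop of a minimum search.

    distances[i] plays the role of |c_i - x_m|. The Grover search is
    replaced by a random choice among indices below the current threshold.
    """
    if rng is None:
        rng = random.Random(0)
    best = rng.randrange(len(distances))  # random initial center c_j
    while True:
        better = [i for i, d in enumerate(distances) if d < distances[best]]
        if not better:                    # no closer center remains
            return best
        best = rng.choice(better)         # temporary center c'_j replaces c_j


dists = [0.9, 0.2, 0.5, 0.7]              # hypothetical |c_i - x_m| values
nearest = find_nearest_center(dists)
```

Each pass either lowers the threshold or stops, so the loop ends at an index holding the smallest distance regardless of the random starting point.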
Finally, the computed results are combined: the quantum minimum search algorithm finds the cluster center c_j nearest to x_m, and x_m is assigned to c_j. The quantum k-means algorithm thus helps the enterprise analyze the power grid customer behavior data. Once the power grid customer behavior has been segmented, personalized electricity services can be provided to different customers according to the actual market, yielding a differentiated value-added service scheme and helping the enterprise generate stable revenue.
Based on the same inventive concept as the above exemplary embodiment, still another exemplary embodiment of the present invention provides a customer segmentation system based on the quantum K-means algorithm, including:
an angle subdivision module, used for determining the subdivision angle, namely the number of features d, and acquiring the customer behavior data set D;
a quantum state conversion module, used for converting each sample x_m into the quantum state |x_m> according to the feature values of the samples x_m in the customer behavior data set D, and for converting the cluster centers c into the quantum state |c> according to the feature values of the k selected cluster centers c_i;
a similarity calculation module, used for performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and each cluster center, that is, computing the similarity between the quantum states |x_m> and |c>, which is stored in the quantum state |α_m>;
a cluster center search module, used for searching the quantum state |α_m> for the minimum between the data sample |x_m> and the cluster centers |c_i>, so as to find the cluster center c_j nearest to the sample x_m.
Correspondingly, in an exemplary embodiment, acquiring the customer behavior data set D involves:
a data extraction submodule, which extracts the required data from a database;
a data cleaning submodule, which checks every variable for missing, unknown, and invalid values and then, according to the distribution of each variable and the actual requirements, applies corresponding rules to replace the missing, unknown, and invalid values with valid ones;
a data conversion submodule, which converts data of different types into a form that the quantum k-means algorithm can use as quantum states.
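The cleaning and conversion rules are left open above ("corresponding rules" chosen from the variable distribution and the actual requirements). As a minimal sketch under stated assumptions, the code below treats any non-finite or non-numeric entry as invalid, fills it with the column median, and rescales the column to [0, 1] before encoding. The median rule, the min-max scaling, and all names and values are hypothetical choices.

```python
import math
import statistics


def is_valid(value):
    """Treat only finite numbers as valid (booleans are excluded)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def clean_column(values):
    """Replace missing or invalid entries with the median of the valid ones."""
    valid = [float(v) for v in values if is_valid(v)]
    if not valid:
        raise ValueError("column has no valid values")
    fill = statistics.median(valid)
    return [float(v) if is_valid(v) else fill for v in values]


def min_max_scale(values):
    """Rescale a cleaned column to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw_load = [12.0, None, 18.0, float("nan"), 30.0]  # hypothetical daily load column
cleaned = clean_column(raw_load)
scaled = min_max_scale(cleaned)
```

A median fill is robust to outliers in skewed variables such as electricity consumption; a mean or a domain-specific default could be substituted where the distribution warrants it.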
Correspondingly, in an exemplary embodiment, each sample x_m is converted into the quantum state |x_m> according to the feature values of the samples x_m in the customer behavior data set D. The conversion formula is:
where x_mj denotes the j-th feature of the m-th sample x_m;
the cluster centers c are converted into the quantum state |c> according to the feature values of the k selected cluster centers c_i. The conversion formula is:
where c_ij denotes the j-th feature of the i-th cluster center.
Correspondingly, in an exemplary embodiment, the similarity calculation module is configured for:
computing the similarity result |ψ> with the controlled-swap gate, where the formula is:
where n denotes the number of samples x_m, and s(x_m, c_i) denotes the similarity between x_m and c_i;
using the quantum state |ψ> as the input to the phase estimation algorithm, whose output is ||c_i - x_m|>, the similarity between the data sample |x_m> and the cluster center |c>, stored in the quantum state |α_m>. The formula is:
Correspondingly, in an exemplary embodiment, the cluster center search module is configured for:
randomly selecting a cluster center c_i as the initial value, and then repeating the following steps to search iteratively for the minimum in |α_m>:
preparing the quantum state |β> of the initial cluster center c_i;
taking |α_m> and |β> as inputs and |b> as the control input, and using the Grover algorithm to find c'_j, where c'_j denotes a temporary cluster center;
if |c'_j - x_m| < |c_j - x_m|, replacing c_j with c'_j.
Based on the same inventive concept as the above exemplary embodiments, an exemplary embodiment of the present invention provides a storage medium storing computer instructions that, when executed, perform the steps of the customer segmentation method based on the quantum K-means algorithm.
Based on the same inventive concept as the above exemplary embodiments, an exemplary embodiment of the present invention provides a terminal including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor, when executing the computer instructions, performs the steps of the customer segmentation method based on the quantum K-means algorithm.
Based on this understanding, the technical solution of the present embodiment, or parts of it, may essentially be implemented as a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It is to be understood that the above embodiments are illustrative only and do not restrict the invention; persons skilled in the art may make various other modifications and changes based on the above teachings. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications may be made without departing from the spirit or scope of the invention.
Claims (10)
1. A customer segmentation method based on the quantum K-means algorithm, characterized in that the method comprises the following steps:
determining a subdivision angle, namely the number of features d, and acquiring a customer behavior data set D;
converting each sample x_m into the quantum state |x_m> according to the feature values of the samples x_m in the customer behavior data set D, and converting the cluster centers c into the quantum state |c> according to the feature values of the k selected cluster centers c_i;
performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and each cluster center, that is, computing the similarity between the quantum states |x_m> and |c>, which is stored in the quantum state |α_m>;
searching the quantum state |α_m> for the minimum between the data sample |x_m> and the cluster centers |c_i>, so as to find the cluster center c_j nearest to the sample x_m.
2. The customer segmentation method based on the quantum K-means algorithm of claim 1, wherein acquiring the customer behavior data set D includes:
data extraction: extracting the required data from a database;
data cleaning: checking every variable for missing, unknown, and invalid values, and then, according to the distribution of each variable and the actual requirements, applying corresponding rules to replace the missing, unknown, and invalid values with valid ones;
data conversion: converting data of different types into a form that the quantum k-means algorithm can use as quantum states.
3. The customer segmentation method based on the quantum K-means algorithm of claim 1, wherein each sample x_m is converted into the quantum state |x_m> according to the feature values of the samples x_m in the customer behavior data set D, and the conversion formula is:
where x_mj denotes the j-th feature of the m-th sample x_m;
the cluster centers c are converted into the quantum state |c> according to the feature values of the k selected cluster centers c_i, and the conversion formula is:
where c_ij denotes the j-th feature of the i-th cluster center.
4. The customer segmentation method based on the quantum K-means algorithm of claim 3, wherein performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and each cluster center, that is, computing the similarity between the quantum states |x_m> and |c> and storing it in the quantum state |α_m>, comprises:
computing the similarity result |ψ> with the controlled-swap gate, where the formula is:
where n denotes the number of samples x_m, and s(x_m, c_i) denotes the similarity between x_m and c_i;
using the quantum state |ψ> as the input to the phase estimation algorithm, whose output is ||c_i - x_m|>, the similarity between the data sample |x_m> and the cluster center |c>, stored in the quantum state |α_m>. The formula is:
5. The customer segmentation method based on the quantum K-means algorithm of claim 4, wherein searching the quantum state |α_m> for the minimum between the data sample |x_m> and the cluster centers |c_i>, so as to find the cluster center c_j nearest to the sample x_m, comprises:
randomly selecting a cluster center c_i as the initial value, and then repeating the following steps to search iteratively for the minimum in |α_m>:
preparing the quantum state |β> of the initial cluster center c_i;
taking |α_m> and |β> as inputs and |b> as the control input, and using the Grover algorithm to find c'_j, where c'_j denotes a temporary cluster center;
if |c'_j - x_m| < |c_j - x_m|, replacing c_j with c'_j.
6. A customer segmentation system based on the quantum K-means algorithm, characterized in that the system comprises:
an angle subdivision module, used for determining a subdivision angle, namely the number of features d, and acquiring a customer behavior data set D;
a quantum state conversion module, used for converting each sample x_m into the quantum state |x_m> according to the feature values of the samples x_m in the customer behavior data set D, and for converting the cluster centers c into the quantum state |c> according to the feature values of the k selected cluster centers c_i;
a similarity calculation module, used for performing quantum computation on the customer behavior data and the cluster centers and outputting the similarity between each data sample and each cluster center, that is, computing the similarity between the quantum states |x_m> and |c>, which is stored in the quantum state |α_m>;
a cluster center search module, used for searching the quantum state |α_m> for the minimum between the data sample |x_m> and the cluster centers |c_i>, so as to find the cluster center c_j nearest to the sample x_m.
7. The customer segmentation system based on the quantum K-means algorithm of claim 6, wherein acquiring the customer behavior data set D involves:
a data extraction submodule, which extracts the required data from a database;
a data cleaning submodule, which checks every variable for missing, unknown, and invalid values and then, according to the distribution of each variable and the actual requirements, applies corresponding rules to replace the missing, unknown, and invalid values with valid ones;
a data conversion submodule, which converts data of different types into a form that the quantum k-means algorithm can use as quantum states.
8. The customer segmentation system based on the quantum K-means algorithm of claim 6, wherein each sample x_m is converted into the quantum state |x_m> according to the feature values of the samples x_m in the customer behavior data set D, and the conversion formula is:
where x_mj denotes the j-th feature of the m-th sample x_m;
the cluster centers c are converted into the quantum state |c> according to the feature values of the k selected cluster centers c_i, and the conversion formula is:
where c_ij denotes the j-th feature of the i-th cluster center.
9. The customer segmentation system based on the quantum K-means algorithm of claim 8, wherein the similarity calculation module is configured for:
computing the similarity result |ψ> with the controlled-swap gate, where the formula is:
where n denotes the number of samples x_m, and s(x_m, c_i) denotes the similarity between x_m and c_i;
using the quantum state |ψ> as the input to the phase estimation algorithm, whose output is ||c_i - x_m|>, the similarity between the data sample |x_m> and the cluster center |c>, stored in the quantum state |α_m>. The formula is:
10. The customer segmentation system based on the quantum K-means algorithm of claim 9, wherein the cluster center search module is configured for:
randomly selecting a cluster center c_i as the initial value, and then repeating the following steps to search iteratively for the minimum in |α_m>:
preparing the quantum state |β> of the initial cluster center c_i;
taking |α_m> and |β> as inputs and |b> as the control input, and using the Grover algorithm to find c'_j, where c'_j denotes a temporary cluster center;
if |c'_j - x_m| < |c_j - x_m|, replacing c_j with c'_j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110982944.3A CN113688906A (en) | 2021-08-25 | 2021-08-25 | Customer segmentation method and system based on quantum K-means algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113688906A (en) | 2021-11-23 |
Family
ID=78582677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110982944.3A Pending CN113688906A (en) | 2021-08-25 | 2021-08-25 | Customer segmentation method and system based on quantum K-means algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113688906A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114219048A (en) * | 2022-02-21 | 2022-03-22 | 合肥本源量子计算科技有限责任公司 | Spectral clustering method and device based on quantum computation, electronic equipment and storage medium |
CN114282000A (en) * | 2022-02-21 | 2022-04-05 | 合肥本源量子计算科技有限责任公司 | Text clustering method, text clustering device, text clustering medium and electronic device based on quantum computation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040117403A1 (en) * | 2001-05-14 | 2004-06-17 | David Horn | Method and apparatus for quantum clustering |
CN110852380A (en) * | 2019-11-11 | 2020-02-28 | 安徽师范大学 | Quantum ant lion and k-means based clustering method and intrusion detection method |
US20200410380A1 (en) * | 2019-06-28 | 2020-12-31 | International Business Machines Corporation | Unsupervised clustering in quantum feature spaces using quantum similarity matrices |
CN112686328A (en) * | 2021-01-06 | 2021-04-20 | 成都信息工程大学 | Data classification system and method based on quantum fuzzy information |
Non-Patent Citations (4)
Title |
---|
K. BENLAMINE: "Quantum Collaborative K-means", 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), pages 1 - 7 * |
余健等: "一种基于聚类分析的电力计量自动化检定流水线故障诊断方法", 电子设计工程, vol. 28, no. 8, pages 76 - 79 * |
刘雪娟等: "量子k-means算法", 吉林大学学报(工学版), vol. 48, no. 2, pages 2 * |
李杰: "基于聚类算法的电力客户行为优化模型研究", 中国优秀硕士学位论文全文数据库工程科技Ⅱ辑, no. 2, pages 1 - 3 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||