CN114663690B

CN114663690B - System for realizing breast cancer classification based on novel quantum frame

Info

Publication number: CN114663690B
Application number: CN202210411357.3A
Authority: CN
Inventors: 单征; 丁晓东; 郭佳郁; 许瑾晨; 侯一凡; 连航; 范智强
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2022-04-19
Filing date: 2022-04-19
Publication date: 2023-04-28
Anticipated expiration: 2042-04-19
Also published as: CN114663690A

Abstract

The invention discloses a system for breast cancer classification based on a novel quantum framework. The system executes the following steps: performing quantum encoding according to the features of the breast cancer data, so that sample features are encoded onto a quantum circuit; performing quantum kernel entropy principal component analysis on the breast cancer data in combination with a quantum kernel estimation method, thereby preprocessing the breast cancer data; quantum-encoding the preprocessed breast cancer data one item at a time and feeding it into a variational quantum circuit, namely a quantum variational classifier; optimizing the parameters of the quantum variational classifier with a quantum gradient descent algorithm; and judging whether the loss function of the quantum variational classifier meets the actual requirement. If it does, the quantum variational classification process ends; if it does not, the next item of preprocessed breast cancer data is quantum-encoded. When the data set has few features and classification accuracy is low, the system can effectively improve breast cancer classification accuracy.

Description

System for realizing breast cancer classification based on novel quantum frame

Technical Field

The invention belongs to the technical field of breast cancer classification and identification, and particularly relates to a system for realizing breast cancer classification based on a novel quantum frame.

Background

The existing main technology for realizing breast cancer classification based on quantum machine learning is a method based on quantum kernel estimation and a method based on quantum variation classification.

The main disadvantages of these techniques are:

1. Data preprocessing is mainly carried out with traditional methods. In particular, dimension reduction relies mainly on SVD (singular value decomposition) and PCA (principal component analysis), both of which have obvious shortcomings. First, they cannot capture the complex nonlinear relations present in real data. Second, a certain amount of redundant information exists in the data, so the information of one attribute is very likely to be over-emphasized while some useful features are neglected, which hinders the search for the true underlying structure of the data.

2. The model parameter optimization stage mainly uses the traditional gradient descent algorithm, which takes a long time to converge.

Machine learning and quantum computing are two different technologies, both of which show potential for problems that were previously difficult to handle [Document 1: Vojtěch Havlíček, Antonio D. Córcoles, et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209-212, 2019.]. Many experimental proposals for noisy intermediate-scale quantum devices involve training a parameterized quantum circuit with a classical optimization loop. Such hybrid quantum-classical algorithms are widely applied in quantum simulation, optimization, and machine learning. Their resilience to certain types of errors and their flexibility with respect to coherence time and gate requirements make them particularly attractive for implementation on Noisy Intermediate-Scale Quantum (NISQ) systems. Random circuits are often proposed as initial guesses for exploring the quantum state space because of their simplicity and hardware efficiency.

The classification problem belongs to supervised machine learning. Given a labeled data set $C$, a training sample set $T$ and a test sample set $S$ are randomly selected from it such that $T \cup S \subseteq C$. An objective function $f$ is obtained through learning so that each attribute set $C_i$ is correctly mapped to a predefined class label, and samples of unknown class are then classified with $f$. Classification requires the information of every category to be known explicitly in advance, and it is a supervised learning algorithm for modeling or predicting discrete random variables. Quantum machine learning represents the feature space of a classification problem with quantum states and exploits the large dimension of the quantum Hilbert space. The theory and quantum circuit structure of the variational classification method are presented most representatively in Document 1, and quantum variational classifiers that classify the training set with variational quantum circuits are built in [Document 2: Mitarai, K., Negoro, M., Kitagawa, M. & Fujii, K. Quantum circuit learning. arXiv preprint arXiv:1803.00745 (2018).] and [Document 3: Farhi, E. & Neven, H. Classification with quantum neural networks on near term processors. arXiv preprint arXiv:1802.06002 (2018).]. However, these approaches still rely on traditional machine learning methods to find the optimal parameters for a given task, and training a quantum circuit with classical gradient-based or gradient-free optimization can be severely affected by barren plateaus in the cost landscape.

Kernel Principal Component Analysis (KPCA) efficiently computes principal components in a high-dimensional feature space through integral operators and nonlinear kernel functions. Compared with other nonlinear Principal Component Analysis (PCA) techniques, KPCA only requires solving an eigenvalue problem, without any nonlinear optimization. Its limitation is that the cost of kernel estimation increases significantly when the data set is large.

Disclosure of Invention

Existing methods for breast cancer classification based on quantum machine learning mainly preprocess data with traditional methods, which cannot guarantee that the useful features of the original information are preserved while the dimension is reduced; in the model optimization stage they mainly use a traditional gradient descent algorithm, which takes a long time to converge. To address these problems, a system for breast cancer classification based on a novel quantum framework is provided. The relevant parameters of the cost function that a traditional method would have to solve are encoded, through a transformation, onto the relative phases of a superposition state in Hilbert space, and a quantum optimization algorithm is used to find the optimal parameters for the task, with the aim of alleviating the barren plateau problem. At the same time, a quantum kernel estimation method is used to optimize and accelerate Kernel Principal Component Analysis (KPCA), so that principal component analysis can be carried out quickly. When the data set has few features and classification accuracy is low, breast cancer classification accuracy can be effectively improved.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

a system for achieving breast cancer classification based on a novel quantum framework, the system being configured to perform the steps of:

step 1: quantum coding is carried out according to the breast cancer data characteristics, and sample characteristics are coded on a quantum circuit;

step 2: performing quantum kernel entropy principal component analysis on the breast cancer data by combining a quantum kernel estimation method, so as to achieve the aim of preprocessing the breast cancer data;

step 3: quantum-encoding the preprocessed breast cancer data obtained in step 2 one item at a time and feeding it into a variational quantum circuit, namely a quantum variational classifier;

step 4: optimizing the parameters of the quantum variational classifier with a quantum gradient descent algorithm;

step 5: judging whether the loss function of the quantum variational classifier meets the actual requirement; if so, ending the quantum variational classification process; if not, quantum-encoding the next item of preprocessed breast cancer data and returning to step 3.

Further, the quantum encoding mode in the step 1 is to encode the breast cancer data characteristic onto the phase of the state.

Further, the step 2 includes:

step 2.1: solving for the eigenvalues lam and eigenvectors vec of the kernel matrix K;

let $X^T = [\phi(x_1), \dots, \phi(x_N)]$; then $K = XX^T$,

where $\phi(x)$ is the mapping function and N is the total number of breast cancer data items processed;

step 2.2: calculating the entropy corresponding to each eigenvalue and eigenvector;

step 2.3: rearranging the eigenvalues and eigenvectors according to the magnitude of the entropy;

step 2.4: selecting the first n eigenvalues $\lambda$ and eigenvectors $u$ with the largest entropy;

step 2.5: merging the first n eigenvalues and eigenvectors into a single matrix;

step 2.6: calculating K' according to the quantum kernel estimation method;

step 2.7: calculating the dimension-reduced data from K' and the matrix from step 2.5,

where $\lambda$ and $u$ are, respectively, the eigenvalues and eigenvectors of the kernel function K arranged according to the magnitude of the entropy.
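Steps 2.1 to 2.7 can be sketched classically as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the kernel matrix is computed with a classical RBF kernel as a stand-in for the quantum kernel estimate (so K' is taken equal to K), and the entropy score of each eigenpair is assumed to be the Renyi entropy contribution $\lambda_i (u_i^T \mathbf{1})^2$ used in kernel entropy component analysis, since the source does not state its entropy formula. The names `rbf_kernel` and `kernel_entropy_pca` are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Classical RBF kernel matrix (assumed stand-in for a quantum kernel estimate)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_entropy_pca(K, n_components):
    """Reduce dimension by keeping the eigenpairs with the largest entropy score."""
    lam, vec = np.linalg.eigh(K)                 # step 2.1: eigenvalues and eigenvectors
    lam = np.clip(lam, 0.0, None)                # guard against tiny negative round-off
    ones = np.ones(K.shape[0])
    entropy = lam * (vec.T @ ones) ** 2          # step 2.2: assumed entropy score
    order = np.argsort(entropy)[::-1]            # step 2.3: sort by entropy, descending
    keep = order[:n_components]                  # step 2.4: first n eigenpairs
    A = vec[:, keep] / np.sqrt(lam[keep])        # step 2.5: merge u / sqrt(lambda) into a matrix
    return K @ A                                 # steps 2.6-2.7 with K' = K (assumption)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                      # six toy samples with three features
K = rbf_kernel(X, gamma=0.5)
Z = kernel_entropy_pca(K, n_components=2)        # reduced data, one row per sample
```

Unlike ordinary kernel PCA, the selected components are not necessarily those with the largest eigenvalues, because the score also depends on how each eigenvector aligns with the all-ones vector.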

Further, in the step 3, the method includes:

encoding each item of preprocessed breast cancer data $X_i$ into a quantum state and passing that state through a variational quantum circuit with parameters $\theta$, where $\theta$ is set to a given initial value.

Further, the step 4 includes:

step 4.1: setting the bit string $\beta$ that corresponds to the correct classification prediction result for $X_i$;

step 4.2: measuring in the z direction to obtain the output bit string $\alpha$ and its probability; after each measurement, comparing the bit strings $\alpha$ and $\beta$ to obtain the classification prediction result $y_i$ for the data $X_i$;

step 4.3: comparing the classification prediction result $y_i$ with the true classification result $Y_i$ given in the training set and calculating the loss function;

step 4.4: returning to step 3, sequentially inputting the remaining preprocessed breast cancer data into the quantum circuit to obtain the corresponding loss values, and calculating the total cost function;

step 4.5: optimizing the total cost function with the quantum gradient descent algorithm, repeatedly updating the parameters $\theta$ of the quantum circuit until the cutoff condition of the optimization is reached.
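The loop formed by steps 4.3 to 4.5 (per-sample loss, total cost, repeated parameter updates until a cutoff) can be illustrated with a toy model. The sketch below is an assumption-laden stand-in: it simulates a one-qubit classifier whose probability of outputting 1 is $\sin^2((x + \theta)/2)$, uses a squared-error loss, and computes the gradient with the parameter-shift rule rather than the quantum gradient descent circuit described in this document. All function names and the toy data are hypothetical.

```python
import math

def p_one(x, theta):
    """Probability of measuring 1 for the toy circuit RY(theta) RY(x) |0>."""
    return math.sin((x + theta) / 2) ** 2

def total_cost(data, theta):
    """Mean squared error between predicted probability and true label."""
    return sum((p_one(x, theta) - y) ** 2 for x, y in data) / len(data)

def cost_gradient(data, theta):
    """Gradient of the total cost, using the parameter-shift rule for dP/dtheta."""
    grad = 0.0
    for x, y in data:
        dp = (p_one(x, theta + math.pi / 2) - p_one(x, theta - math.pi / 2)) / 2
        grad += 2 * (p_one(x, theta) - y) * dp
    return grad / len(data)

def train(data, theta, eta=0.5, tol=1e-6, max_iter=5000):
    """Repeat theta <- theta - eta * gradient until the gradient is below tol."""
    for _ in range(max_iter):
        grad = cost_gradient(data, theta)
        if abs(grad) < tol:          # cutoff condition (assumed form)
            break
        theta -= eta * grad
    return theta

data = [(0.1, 0), (0.3, 0), (2.9, 1), (3.0, 1)]   # toy (feature, label) pairs
theta_init = 1.0
theta_star = train(data, theta_init)
initial_cost = total_cost(data, theta_init)
final_cost = total_cost(data, theta_star)
```

The cutoff here is a small gradient magnitude; a threshold on the total cost or a fixed iteration budget would fit the same loop.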

Compared with the prior art, the invention has the beneficial effects that:

1. The invention adopts quantum kernel entropy principal component analysis, a process that first raises and then reduces the dimension of the data. This process removes redundant information, yields a more compact and economical data representation, and expresses the underlying structure of the data more effectively. Ultimately, the method converts a nonlinearly separable problem into a linearly separable one and achieves better results on data with more complex structure, thereby genuinely fulfilling the purpose of data preprocessing.

2. Gradient descent, often also called the steepest descent method, is an optimization algorithm commonly used in machine learning and artificial intelligence to iteratively approach a model with minimum error. It takes the negative gradient direction as the search direction and seeks a minimum along it. During training, each forward pass yields a loss value between the output and the true value; the smaller the loss, the better the model. Gradient descent is therefore used to find the minimum loss, from which the corresponding learned parameters b and w are derived, thereby optimizing the model. In practice, however, the traditional optimization process consumes considerable computing resources and time. By adopting the quantum gradient descent algorithm, the invention makes full use of the advantages of quantum computing and can effectively reduce both.

3. The invention encodes the preprocessed data onto a quantum circuit and performs classification with a variational quantum circuit, which can effectively improve accuracy and computing power on the breast cancer classification problem.

Drawings

FIG. 1 is a basic flow chart of a system for implementing breast cancer classification based on a novel quantum framework in accordance with an embodiment of the present invention;

FIG. 2 is a diagram illustrating linear segmentation according to an embodiment of the present invention;

FIG. 3 is a general flow chart of a quantum kernel estimation method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of data encoding according to an embodiment of the present invention;

FIG. 5 is a representation of data encoding in accordance with an embodiment of the present invention;

FIG. 6 is a schematic diagram of the spatial transformation relationship of the quantum kernel entropy principal component analysis method according to an embodiment of the present invention;

FIG. 7 is a general flow chart of a quantum variational classification method according to an embodiment of the invention;

FIG. 8 is a diagram illustrating the first encoding scheme according to an embodiment of the present invention;

FIG. 9 is a diagram illustrating the second encoding scheme according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a quantum variational classifier according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of a quantum circuit optimized based on a quantum gradient descent algorithm according to an embodiment of the present invention;

FIG. 12 is a graph showing correlation of sample features corresponding to breast cancer data used in an experiment according to an embodiment of the present invention;

FIG. 13 shows the experimental results of the embodiment of the present invention.

Detailed Description

The invention is further illustrated by the following description of specific embodiments in conjunction with the accompanying drawings:

As shown in fig. 1, a system for breast cancer classification based on a novel quantum framework executes steps 1 to 5 set out above.

It is worth noting that steps 1 and 2 belong to the quantum kernel entropy principal component analysis method, while steps 3 to 5 belong to the quantum variational classification method.

The quantum kernel entropy principal component analysis method is described as follows.

In machine learning algorithms, kernel functions are often used for nonlinear problems. Statistical Learning Methods [Li Hang. Beijing: Tsinghua University Press, 2012.] gives the definition of a kernel function: let $\mathcal{X}$ be the input space and $\mathcal{H}$ the feature space (a Hilbert space). If there exists a mapping from $\mathcal{X}$ to $\mathcal{H}$,

$$\phi(x): \mathcal{X} \rightarrow \mathcal{H},$$

such that for all $x, z \in \mathcal{X}$ the function $K(x, z)$ satisfies $K(x, z) = \phi(x) \cdot \phi(z)$, then $K(x, z)$ is called a kernel function, $\phi(x)$ is the mapping function, and $\phi(x) \cdot \phi(z)$ is the inner product of $\phi(x)$ and $\phi(z)$.

As shown in fig. 2, when the training set is not linearly separable, it is typically mapped into a high-dimensional space for linear separation, which requires evaluating the classification function in that space. The vectors in the high-dimensional space do not need to be known explicitly; only the dot product of two vectors in that space is needed, i.e. the kernel function $K(x, z) = \phi(x) \cdot \phi(z)$. Evaluating the kernel function is therefore equivalent to evaluating the dot product in the high-dimensional space, and hence to performing the separation in that space.

When the input space is a Euclidean space or a discrete set and the feature space is a Hilbert space, the kernel function represents the inner product between the feature vectors obtained by mapping the input space to the feature space. With kernel functions, a nonlinear support vector machine can be learned, which is equivalent to learning a linear support vector machine in a higher-dimensional feature space; this approach is called the kernel method. The overall flow of the quantum kernel estimation method is shown in fig. 3.
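The statement that a kernel function equals an inner product in the feature space can be checked on a small worked example. The sketch below uses the degree-2 homogeneous polynomial kernel $K(x, z) = (x \cdot z)^2$ on two-dimensional inputs, for which an explicit mapping is $\phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$. This kernel is chosen only for illustration and is not the kernel used in this document.

```python
import math

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated entirely in the input space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
in_input_space = kernel(x, z)             # no feature vectors are ever built
in_feature_space = dot(phi(x), phi(z))    # same value via the 3-D feature space
```

Both quantities agree, which is why the mapped vectors themselves never need to be constructed.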

In this framework, a quantum simulator is used to estimate the $|T| \times |T|$ kernel matrix over all sample points in the training set $T$. A feature map takes each classical data vector (breast cancer data in this application) $x$ to a quantum state $|\phi(x)\rangle$; the data is encoded by applying a unitary transformation to the initial state $|0\rangle$.

the specific process is shown in fig. 4.

The concrete expression is shown in fig. 5.
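A kernel matrix of this kind can be simulated classically for small inputs. The sketch below is illustrative only and rests on two assumptions not taken from this document: the feature map is a simple product of single-qubit RY rotations (hypothetical), and the kernel entry is the squared state overlap $|\langle\phi(x_i)|\phi(x_j)\rangle|^2$, the form used in Document 1.

```python
import numpy as np

def feature_state(x):
    """State vector of a product feature map: qubit k is RY(pi * x[k]) applied to |0>."""
    state = np.array([1.0])
    for xk in x:
        qubit = np.array([np.cos(np.pi * xk / 2), np.sin(np.pi * xk / 2)])
        state = np.kron(state, qubit)
    return state

def quantum_kernel(samples):
    """Kernel matrix whose entries are squared overlaps between encoded states."""
    states = np.array([feature_state(x) for x in samples])
    overlaps = states @ states.conj().T
    return np.abs(overlaps) ** 2

T = np.array([[0.1, 0.4], [0.3, 0.9], [0.8, 0.2]])   # three toy samples, two features
K = quantum_kernel(T)                                 # |T| x |T| kernel matrix
```

On hardware or a shot-based simulator each entry would be estimated from measurement statistics rather than computed exactly from state vectors.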

The spatial transformation relationship of the quantum kernel entropy principal component analysis method based on quantum kernel estimation is shown in fig. 6:

First, points in the input space χ are mapped to the feature space through a nonlinear mapping function ψ; the points in the feature space are then mapped to the kernel space through a kernel function, and finally mapped back to the feature space through a certain mapping relation.

The quantum (linear) kernel estimation method is described as follows.

The kernel matrix is expressed as

$$K = XX^T,$$

where $X^T = [\phi(x_1), \dots, \phi(x_N)]$ and N is the total number of breast cancer data items processed, so that

$$K_{ij} = \phi(x_i) \cdot \phi(x_j).$$

The eigenvalues $\lambda$ of K and the corresponding eigenvectors $u$ are computed:

$$XX^T u = \lambda u.$$

The entropy corresponding to each eigenvalue and eigenvector is obtained, and the first $\kappa$ eigenvalues $\lambda$ and corresponding eigenvectors $u$ are selected according to the entropy.

Multiplying both sides of the above equation on the left by $X^T$ gives

$$X^T X (X^T u) = \lambda (X^T u).$$

Normalizing $X^T u$ to unit length, with $u$ itself of unit length, gives

$$v = \frac{X^T u}{\lVert X^T u \rVert} = \frac{X^T u}{\sqrt{\lambda}}.$$

Taking the inner product of both sides with $\phi(x_j)$ gives

$$v^T \phi(x_j) = \frac{1}{\sqrt{\lambda}} u^T X \phi(x_j) = \frac{1}{\sqrt{\lambda}} u^T \left[ K(x_1, x_j), \dots, K(x_N, x_j) \right]^T,$$

where $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. This is the data after dimension reduction.
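The chain of operations in this derivation (multiply by $X^T$, normalize $X^T u$, project $\phi(x_j)$) can be verified numerically. The sketch below assumes, purely for illustration, that the mapping $\phi$ is the identity, so the rows of `X` are the feature vectors themselves; it then checks that the projection computed in the feature space matches the one computed from the kernel matrix alone.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))           # rows play the role of phi(x_1), ..., phi(x_N)
K = X @ X.T                           # kernel matrix of inner products

lam, U = np.linalg.eigh(K)            # eigenvalues in ascending order
lam_top, u = lam[-1], U[:, -1]        # largest eigenvalue and its unit eigenvector

v = X.T @ u / np.sqrt(lam_top)        # normalized X^T u, a direction in feature space
proj_feature = X @ v                  # v . phi(x_j) for every j, using feature vectors
proj_kernel = K @ u / np.sqrt(lam_top)   # the same projections, using only K
```

The second form is the one that matters in practice, because it needs only kernel values and never the feature vectors.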

The algorithm of the quantum kernel entropy principal component analysis method follows steps 2.1 to 2.7 above.

The algorithm framework of the quantum variational classification method is shown in fig. 7.

Description of the algorithm:

1. Each item of preprocessed breast cancer data $X_i$ is encoded into a quantum state and passed through a variational quantum circuit with parameters $\theta$, where $\theta$ is set to a given initial value. The bit string $\beta$ that corresponds to the correct classification prediction result is set.

2. A measurement in the z direction yields the output bit string $\alpha$ and its probability. After each measurement, the bit strings $\alpha$ and $\beta$ are compared to obtain the classification prediction result $y_i$ for the data $X_i$.

3. The classification prediction result $y_i$ is compared with the true classification result $Y_i$ given in the training set, and the loss function is calculated.

4. Returning to the first step, the remaining data $X_i$ are input into the quantum circuit in sequence to obtain the corresponding loss values, and the total cost function is calculated.

5. The total cost function is optimized with the quantum gradient descent algorithm, and the parameters $\theta$ of the quantum circuit are updated repeatedly until the cutoff condition of the optimization is reached.
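Steps 2 and 3 of this description, turning measured bit strings into a class prediction and a loss, might look like the following sketch. The rule that assigns a class to each bit string (parity here), the squared-error loss, and the shot counts are all assumptions made for illustration; the document does not specify them.

```python
from collections import Counter

def predict_from_shots(shots, label_of_bitstring):
    """Estimate class probabilities from measured bit strings and pick the likeliest class."""
    counts = Counter(shots)
    total = sum(counts.values())
    class_prob = Counter()
    for bits, n in counts.items():
        class_prob[label_of_bitstring[bits]] += n / total
    y_pred = max(class_prob, key=class_prob.get)
    return y_pred, dict(class_prob)

# Hypothetical rule: even-parity bit strings mean class 0, odd-parity mean class 1.
label_of_bitstring = {"00": 0, "11": 0, "01": 1, "10": 1}

# Hypothetical measurement record of 100 shots for one sample.
shots = ["01"] * 55 + ["10"] * 20 + ["00"] * 15 + ["11"] * 10

y_pred, class_prob = predict_from_shots(shots, label_of_bitstring)
y_true = 1
loss = (1.0 - class_prob[y_true]) ** 2    # assumed squared-error loss for this sample
```

Summing such per-sample losses over the training set gives a total cost of the kind optimized in step 5.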

Specifically, the quantum encoding scheme is as follows:

Two encoding schemes are generally used: encoding data into circuit parameters, and encoding data into the phases of states. In this embodiment, step 1 uses the second encoding scheme and step 3 uses the first.

The first encoding scheme is shown in fig. 8, and the corresponding code is:

    import math

    # cost_cir, x, i and qubitlist are defined by the surrounding program.
    # Encode the four features of sample i as RZ rotation angles, one qubit per feature.
    for q in range(4):
        cost_cir.rz(-2 * math.pi * x[i][q], qubitlist[q])

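The second encoding scheme is shown in fig. 9, and the corresponding code is:

    import numpy as np

    def convertDataToAngles(data):
        """Convert a 4-element vector (assumed to have unit norm) into three rotation angles."""
        # Probability that qreg[1] is measured as 1.
        prob1 = data[2]**2 + data[3]**2
        prob0 = 1 - prob1
        angle1 = 2 * np.arcsin(np.sqrt(prob1))
        # Conditional probabilities that qreg[0] is 1; assumes prob1 and prob0 are nonzero.
        cond1 = data[3]**2 / prob1
        angle2 = 2 * np.arcsin(np.sqrt(cond1))
        cond0 = data[1]**2 / prob0
        angle3 = 2 * np.arcsin(np.sqrt(cond0))
        return np.array([angle1, angle2, angle3])

    def encodeData(qc, qreg, angles):
        """Apply the encoding rotations to a two-qubit register."""
        qc.ry(angles[0], qreg[1])
        # Rotate qreg[0] conditioned on qreg[1] being 1.
        qc.cry(angles[1], qreg[1], qreg[0])
        # Flip qreg[1] so that the next rotation is conditioned on it being 0.
        qc.x(qreg[1])
        qc.cry(angles[2], qreg[1], qreg[0])
        qc.x(qreg[1])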
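The effect of this angle conversion can be checked without a quantum library. The sketch below re-implements the conversion in plain NumPy and computes the measurement probabilities that the rotation sequence (one RY followed by two controlled RY gates) would produce, under the assumption that the input vector has unit norm and non-negative entries. The function names are hypothetical, and the basis ordering is `|qreg[1] qreg[0]>`.

```python
import numpy as np

def convert_data_to_angles(data):
    """Same conversion as above, written with distinct variable names."""
    p_hi = data[2] ** 2 + data[3] ** 2             # P(qreg[1] = 1)
    p_lo = 1 - p_hi                                # P(qreg[1] = 0)
    a1 = 2 * np.arcsin(np.sqrt(p_hi))
    a2 = 2 * np.arcsin(np.sqrt(data[3] ** 2 / p_hi))
    a3 = 2 * np.arcsin(np.sqrt(data[1] ** 2 / p_lo))
    return np.array([a1, a2, a3])

def basis_probabilities(angles):
    """Probabilities of |00>, |01>, |10>, |11> after the encoding rotations."""
    a1, a2, a3 = angles
    p_q1 = np.sin(a1 / 2) ** 2                     # RY(a1) on qreg[1]
    p_q0_given_1 = np.sin(a2 / 2) ** 2             # CRY(a2), active when qreg[1] = 1
    p_q0_given_0 = np.sin(a3 / 2) ** 2             # CRY(a3), active when qreg[1] = 0
    return np.array([
        (1 - p_q1) * (1 - p_q0_given_0),
        (1 - p_q1) * p_q0_given_0,
        p_q1 * (1 - p_q0_given_1),
        p_q1 * p_q0_given_1,
    ])

data = np.array([0.5, 0.5, 0.1, 0.7])
data = data / np.linalg.norm(data)                 # normalize to unit length
probs = basis_probabilities(convert_data_to_angles(data))
```

The resulting probabilities equal the squares of the input entries, so the circuit stores the magnitudes of the four features in the amplitudes of two qubits; the signs of the entries are not preserved by this conversion.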

The circuit structure of the quantum variational classifier in the present application is shown in fig. 10.

Following the structure of the feature mapping circuit, the classifier part of the variational algorithm is constructed by appending layers of single-qubit unitaries and entangling gates. Each subsequent layer, or depth, contains an additional set of entanglers across all qubits used by the algorithm. A coherently controllable quantum mechanical system, such as a superconducting chip with n transmon qubits, is used to prepare short-depth quantum circuits. In the experiment here, the system consists of n = 2 qubits, and one controlled-phase gate is added per depth. The single-qubit unitaries used in the classifier are restricted to Y and Z rotations to reduce the number of parameters that the classical optimizer has to handle. Controlled-phase gates, rather than CNOT gates, are used as the entangling gates in order to increase the generality of the framework: with controlled-phase gates, this part of the algorithm does not need to be adapted in detail to different system topologies. The compiler can then use the specific entanglement graph of a given device to translate each controlled-phase gate into the CNOT gates available on that system. The general circuit consists of single-qubit and multi-qubit gates.

The principle of the quantum gradient descent algorithm is as follows.

A loss function is first defined, and its partial derivative with respect to each parameter $\theta_i$ is taken. Expanding this partial derivative with the product rule and applying Hermitian conjugation brings it into an inner product form that a quantum circuit can evaluate. The gate circuits used for the Y and Z rotations in this calculation are defined as follows:

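    import cmath
    import math

    import numpy as np
    from qiskit.circuit.library import UnitaryGate

    # Note: each matrix below carries a scalar prefactor of 1/2, so it is not unitary
    # as written. UnitaryGate validates unitarity by default, so the prefactor may need
    # to be handled classically, outside the gate.

    def GRYGate(theta):
        """Gate built from the matrix used for the gradient of a Y rotation."""
        u00 = -1/2 * math.cos(theta/2)
        u01 = -1/2 * math.sin(theta/2)
        u10 = 1/2 * math.sin(theta/2)
        u11 = -1/2 * math.cos(theta/2)
        gateLabel = "G({})".format(theta)
        return UnitaryGate(np.array([[u00, u01], [u10, u11]]), label=gateLabel)

    def GRZGate(theta):
        """Gate built from the matrix used for the gradient of a Z rotation."""
        # 1j is the imaginary unit; cmath.exp accepts complex arguments.
        u00 = -1j/2 * cmath.exp(-1j*theta/2)
        u01 = 0
        u10 = 0
        u11 = -1j/2 * cmath.exp(1j*theta/2)
        gateLabel = "G({})".format(theta)
        return UnitaryGate(np.array([[u00, u01], [u10, u11]]), label=gateLabel)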
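As background for these gates, the derivative of a standard rotation gate is itself one half of a rotation gate with its angle shifted by $\pi$, which is why a gradient can be evaluated with ordinary quantum gates plus a classical factor of 1/2. The sketch below checks this identity numerically for the conventional definitions $R_Y(\theta) = e^{-i\theta Y/2}$ and $R_Z(\theta) = e^{-i\theta Z/2}$; these conventions are an assumption and the matrices may differ in sign from the ones in the code above.

```python
import numpy as np

def ry(theta):
    """Standard single-qubit Y rotation, exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def rz(theta):
    """Standard single-qubit Z rotation, exp(-i * theta * Z / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

theta, h = 0.7, 1e-6

# Central finite differences approximate the derivative with respect to theta.
d_ry = (ry(theta + h) - ry(theta - h)) / (2 * h)
d_rz = (rz(theta + h) - rz(theta - h)) / (2 * h)

# Identity under test: dR(theta)/dtheta = (1/2) * R(theta + pi).
ry_ok = np.allclose(d_ry, 0.5 * ry(theta + np.pi), atol=1e-8)
rz_ok = np.allclose(d_rz, 0.5 * rz(theta + np.pi), atol=1e-8)
```

Because the shifted rotation is unitary, it can be placed in a circuit directly, while the factor 1/2 is applied to the measured result afterwards.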

Quantum circuit parameter optimization based on the quantum gradient descent algorithm is carried out as follows.

A quantum circuit is designed to obtain the required inner product form. It is implemented with the Hadamard test. First, the input quantum state and an auxiliary (ancilla) qubit are prepared. The operation $Y_k W(\theta) V(x_k)$ is applied when the ancilla is in state $|1\rangle$. The ancilla is then flipped, and the second operation is applied when the ancilla is in state $|0\rangle$. A Hadamard gate is then applied to the ancilla, and the probability of measuring the ancilla in state 0 is obtained.

Finally, the probability of the ancilla being 0 is used to calculate the gradient with respect to $\theta_i$, and $\theta_i$ is updated by a step against this gradient, where $\eta$ is the step size.

The final quantum circuit diagram is shown in fig. 11.
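The role of the ancilla probability can be illustrated with the textbook Hadamard test, in which an ancilla prepared by a Hadamard gate controls a single unitary $U$ and the probability of reading 0 equals $(1 + \mathrm{Re}\langle\psi|U|\psi\rangle)/2$. The sketch below simulates that simpler circuit with state vectors; it is a simplification of the circuit in fig. 11, which applies different operations on the two ancilla branches, and the function name is hypothetical.

```python
import numpy as np

def hadamard_test_p0(U, psi):
    """Probability of measuring the ancilla as 0 in the circuit H, controlled-U, H."""
    dim = len(psi)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    eye = np.eye(dim)
    zero = np.zeros((dim, dim))
    # The ancilla is the first tensor factor; U acts only when the ancilla is |1>.
    controlled_U = np.block([[eye, zero], [zero, U]])
    H_anc = np.kron(H, eye)
    state = np.kron(np.array([1.0, 0.0]), psi)       # ancilla |0> tensor |psi>
    state = H_anc @ controlled_U @ H_anc @ state
    return float(np.sum(np.abs(state[:dim]) ** 2))   # weight on ancilla = 0

# Example: U is a Z rotation by 0.8 and |psi> is an equal superposition.
U = np.diag([np.exp(-0.4j), np.exp(0.4j)])
psi = np.array([1.0, 1.0]) / np.sqrt(2)

p0 = hadamard_test_p0(U, psi)
real_overlap = float(np.real(np.vdot(psi, U @ psi)))  # Re <psi|U|psi>
```

Rearranging gives $\mathrm{Re}\langle\psi|U|\psi\rangle = 2 p_0 - 1$, so an inner product needed for a gradient can be read off from the ancilla statistics.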

To verify the effect of the invention, the following experiments were performed:

Global data production currently keeps growing explosively, at about 24% per year. Computing technologies represented by machine learning are developing rapidly, and machine learning has branched into many subfields such as unsupervised learning, supervised learning, semi-supervised learning, reinforcement learning, and deep learning. Machine learning applies different learning strategies to existing data to explore the associations and structure implicit in the data, thereby obtaining a learned model that is then used for analysis and prediction. The breast cancer data set selected by the invention is the standard breast cancer diagnosis data set of the scikit-learn machine learning library (https://scikit-learn.org/stable/). The data set consists of 569 samples, each with 30 features that describe the mean, standard deviation, and maximum of 10 measurements of the breast tumor, such as radius, texture, perimeter, area, and smoothness; 212 samples are malignant and 357 are benign. Sample feature correlations are shown in fig. 12. The classification result obtained by the method is shown in fig. 13; the experimental result shows that the method achieves high classification prediction accuracy.

In summary, the invention provides a novel quantum framework, namely a hybrid classical-quantum framework for solving classification problems, and applies it to breast cancer classification and identification. It improves on the quantum variational classification framework proposed in Document 1: the relevant parameters of the cost function that a traditional method would have to solve are encoded, through a transformation, onto the relative phases of a superposition state in Hilbert space, and a quantum optimization algorithm is used to search for the optimal parameters for the task, which is expected to alleviate the barren plateau problem. At the same time, a quantum kernel estimation method is used to optimize and accelerate Kernel Principal Component Analysis (KPCA), so that principal component analysis can be carried out quickly. When the data set has few features and classification accuracy is low, the framework can effectively improve classification accuracy.

Specifically, the invention adopts quantum kernel entropy principal component analysis, a process that first raises and then reduces the dimension of the data. This process removes redundant information, yields a more compact and economical data representation, and expresses the underlying structure of the data more effectively. Ultimately, the method converts a nonlinearly separable problem into a linearly separable one and achieves better results on data with more complex structure, thereby genuinely fulfilling the purpose of data preprocessing.

Gradient descent, often also called the steepest descent method, is an optimization algorithm commonly used in machine learning and artificial intelligence to iteratively approach a model with minimum error. It takes the negative gradient direction as the search direction and seeks a minimum along it. During training, each forward pass yields a loss value between the output and the true value; the smaller the loss, the better the model. Gradient descent is therefore used to find the minimum loss, from which the corresponding learned parameters b and w are derived, thereby optimizing the model. In practice, however, the traditional optimization process consumes considerable computing resources and time. By adopting the quantum gradient descent algorithm, the invention makes full use of the advantages of quantum computing and can effectively reduce both.

The invention encodes the preprocessed data onto a quantum circuit and performs classification with a variational quantum circuit, which can effectively improve accuracy and computing power on the breast cancer classification problem.

The foregoing is merely illustrative of the preferred embodiments of this invention, and it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of this invention, and it is intended to cover such modifications and changes as fall within the true scope of the invention.

Claims

1. A system for achieving breast cancer classification based on a novel quantum framework, characterized in that the system is used for executing the following steps:

step 1: quantum coding is carried out according to the breast cancer data characteristics, and sample characteristics are coded on a quantum circuit; the breast cancer data are data in a scikit machine learning library breast cancer diagnosis standard data set;

the step 2 comprises the following steps:

let $X^T = [\phi(x_1), \dots, \phi(x_N)]$; then $K = XX^T$;

step 2.5: merging the first n eigenvalues and eigenvectors into a single matrix;

step 2.6: calculating K' according to the quantum kernel estimation method;

step 2.7: calculating the dimension-reduced data from K' and the matrix from step 2.5,

wherein $\lambda$ and $u$ are, respectively, the eigenvalues and eigenvectors of the kernel function K arranged according to the magnitude of the entropy;

the step 3 comprises the following steps:

encoding each item of preprocessed breast cancer data $X_i$ into a quantum state and passing that state through a variational quantum circuit with parameters $\theta$, wherein $\theta$ is set to a given initial value;

the step 4 comprises the following steps:

step 4.1: setting the bit string $\beta$ that corresponds to the correct classification prediction result for $X_i$;

and calculating the total cost function;

i = 0, 1, ... until the cutoff condition of the optimization is reached.

2. The system for classifying breast cancer based on the novel quantum frame according to claim 1, wherein the quantum encoding mode in the step 1 is to encode the breast cancer data feature onto the phase of the state.