CN114282000A - Text clustering method, text clustering device, text clustering medium and electronic device based on quantum computation - Google Patents
- Publication number
- CN114282000A (application CN202210154196.4A)
- Authority
- CN
- China
- Prior art keywords
- clustering
- text
- quantum state
- center
- quantum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a text clustering method, a text clustering device, a text clustering medium and an electronic device based on quantum computing, wherein the method comprises the following steps: preparing a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, wherein one clustering center corresponds to one class; operating and measuring a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state; and clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state. With the embodiments of the invention, quantum computing technology can be combined with a clustering algorithm, so that the parallel acceleration advantage of quantum computing is exploited, the amount of computation required by a traditional clustering task is reduced, and a gap in the related art is filled.
Description
Technical Field
The invention relates to the technical field of quantum computation, in particular to a text clustering method, a text clustering device, a text clustering medium and an electronic device based on quantum computation.
Background
A clustering algorithm is a classic unsupervised machine learning method: for a given sample set, it divides the samples into a plurality of clusters according to the distances between samples. The goal is to make the points within a cluster as close together as possible and the distance between clusters as large as possible, so that samples with similar characteristics are grouped into one class.
At present, the amount of data a clustering algorithm must process grows with the number of samples, and when the number of samples is large enough the computational cost of the clustering algorithm also becomes large. How to reduce the computational complexity of the clustering algorithm is therefore a problem to be solved urgently.
Disclosure of Invention
The invention provides a text clustering method, a text clustering device, a text clustering medium and an electronic device based on quantum computing, which combine quantum computing with a clustering algorithm, give play to the parallel acceleration advantage of quantum computing, and are used for reducing the amount of computation required for processing the traditional clustering task.
One embodiment of the invention provides a text clustering method based on quantum computation, which comprises the following steps:
preparing a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, wherein one clustering center corresponds to one class;
operating and measuring a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state;
and clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
Optionally, the preparing a first quantum state corresponding to the text to be clustered and a second quantum state corresponding to the clustering center for clustering includes:
obtaining a text vector corresponding to a text to be clustered and a center vector corresponding to a clustering center for clustering, and preparing a first quantum state corresponding to the text vector and a second quantum state corresponding to the center vector.
Optionally, the obtaining a text vector corresponding to the text to be clustered and a center vector corresponding to a clustering center for clustering includes:
obtaining a text vector corresponding to a text to be clustered and a center vector corresponding to a clustering center for clustering, and performing dimension reduction processing on the text vector and the center vector.
Optionally, the obtaining a text vector corresponding to a text to be clustered and a center vector corresponding to a clustering center for clustering, and performing a dimension reduction process on the text vector and the center vector, includes:
acquiring a text to be clustered, and inputting the text to be clustered to a pre-trained bidirectional encoder BERT model to obtain a text vector corresponding to the text to be clustered;
initializing a center vector corresponding to a clustering center for clustering, and inputting the text vector and the center vector to a pre-trained Principal Component Analysis (PCA) model to obtain the text vector and the center vector after dimensionality reduction.
Optionally, the clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state includes:
for each first quantum state, searching a second quantum state corresponding to the similarity meeting a first preset condition in the similarity between the first quantum state and each second quantum state;
and dividing the text to be clustered corresponding to the first quantum state into the searched class corresponding to the second quantum state.
Optionally, the method further includes:
determining the clustering centers of all the classes after current clustering, and judging whether the determined clustering centers meet a second preset condition or not;
and if the second preset condition is not met, updating the clustering center for clustering to the determined clustering center, and returning to the step of preparing the first quantum state corresponding to the text to be clustered and the second quantum state corresponding to the clustering center for clustering.
Optionally, the determining whether the determined clustering center meets a second preset condition includes:
calculating an offset distance between the determined cluster center and the cluster center for clustering;
if the offset distance is smaller than a preset threshold value, judging that the determined clustering center meets a second preset condition;
and if the offset distance is not smaller than a preset threshold value, judging that the determined clustering center does not meet a second preset condition.
Yet another embodiment of the present invention provides a text clustering apparatus based on quantum computing, the apparatus including:
a preparation module, configured to prepare a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, wherein one clustering center corresponds to one class;
the acquisition module is used for operating and measuring a first preset quantum circuit to acquire the similarity between the first quantum state and the second quantum state;
and the clustering module is used for clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
An embodiment of the invention provides a storage medium having a computer program stored thereon, wherein the computer program is arranged to perform any of the above methods when executed.
An embodiment of the invention provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform any of the methods described above.
Compared with the prior art, the text clustering method based on quantum computation provided by the invention comprises the steps of preparing a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, wherein one clustering center corresponds to one class; operating and measuring a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state; and clustering the text to be clustered according to the similarity between the first quantum state and the second quantum state. Quantum computation is thereby combined with a clustering algorithm, the parallel acceleration advantage of quantum computation is exploited, the amount of computation required by a traditional clustering task is reduced, and a gap in the related art is filled.
Drawings
Fig. 1 is a block diagram of a hardware structure of a computer terminal of a text clustering method based on quantum computing according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a text clustering method based on quantum computing according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a first preset quantum circuit according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a text clustering device based on quantum computing according to an embodiment of the present invention.
Detailed Description
The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
The embodiment of the invention firstly provides a text clustering method based on quantum computing, and the method can be applied to electronic equipment, such as computer terminals, specifically common computers, quantum computers and the like.
This will be described in detail below by way of example as it would run on a computer terminal. Fig. 1 is a block diagram of a hardware structure of a computer terminal of a text clustering method based on quantum computing according to an embodiment of the present invention. As shown in fig. 1, the computer terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the text clustering method based on quantum computing in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implementing the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
It should be noted that a real quantum computer is a hybrid structure comprising two major parts: a classical computer, which is responsible for classical computation and control, and a quantum device, which is responsible for running quantum programs and thereby realizing quantum computation. A quantum program is a sequence of instructions, written in a quantum language such as the Qrun language, that can run on a quantum computer; it supports the operation of quantum logic gates and ultimately realizes quantum computation. In particular, a quantum program is a sequence of instructions that operates quantum logic gates in a certain time sequence.
In practical applications, due to the limited development of quantum device hardware, quantum computation simulation is usually required to verify quantum algorithms, quantum applications, and the like. The quantum computing simulation is a process of realizing the simulation operation of a quantum program corresponding to a specific problem by means of a virtual architecture (namely a quantum virtual machine) built by resources of a common computer. In general, it is necessary to build quantum programs for a particular problem. The quantum program referred in the embodiment of the invention is a program written in a classical language for representing quantum bits and evolution thereof, wherein the quantum bits, quantum logic gates and the like related to quantum computation are all represented by corresponding classical codes.
A quantum circuit, which is one embodiment of a quantum program and is also called a quantum logic circuit, is the most common general-purpose quantum computation model. It represents, in the abstract, a circuit that operates on qubits, and comprises qubits, lines (timelines) and various quantum logic gates; finally, the result usually needs to be read out through a quantum measurement operation.
Unlike a conventional circuit, whose metal wires carry voltage or current signals, the lines of a quantum circuit can be regarded as connections in time: the state of a qubit evolves naturally over time, as governed by the Hamiltonian, until it encounters a logic gate and is operated on.
A quantum program as a whole corresponds to a total quantum circuit, and the total number of qubits in the total quantum circuit is the same as the total number of qubits of the quantum program. It can be understood that a quantum program may consist of quantum circuits, measurement operations on the qubits in the quantum circuits, registers that hold the measurement results, and control flow nodes (jump instructions), and that a quantum circuit may contain tens, hundreds or even thousands of quantum logic gate operations. Executing a quantum program means executing all of its quantum logic gates in a certain time sequence. It should be noted that the timing is the order in time in which each individual quantum logic gate is executed.
It should be noted that in classical computation the most basic unit is the bit and the most basic control mode is the logic gate; a circuit is controlled by combining logic gates. Similarly, qubits are manipulated by quantum logic gates. Quantum logic gates make a quantum state evolve and are the basis of quantum circuits. They include single-qubit quantum logic gates, such as the Hadamard gate (H gate), Pauli-X gate (X gate), Pauli-Y gate (Y gate), Pauli-Z gate (Z gate), RX gate, RY gate and RZ gate, and two-qubit or multi-qubit quantum logic gates, such as the CNOT gate, CR gate, CZ gate, iSWAP gate and Toffoli gate. A quantum logic gate is typically represented by a unitary matrix, which is both a matrix and an operation or transformation. In general, the effect of a quantum logic gate on a quantum state is calculated by left-multiplying the column vector (ket) corresponding to the quantum state by the unitary matrix.
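The matrix view of a gate can be illustrated with a minimal sketch in plain Python. The helper name `apply_gate` is hypothetical and is not part of the patent; the H and X matrices are the standard ones.

```python
import math

def apply_gate(gate, state):
    """Left-multiply a state (list of amplitudes) by a gate matrix (list of rows)."""
    return [sum(gate[r][c] * state[c] for c in range(len(state)))
            for r in range(len(gate))]

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]   # Hadamard gate
X = [[0, 1], [1, 0]]    # Pauli-X gate

ket0 = [1.0, 0.0]            # |0>
plus = apply_gate(H, ket0)   # equal superposition of |0> and |1>
one = apply_gate(X, ket0)    # |1>
```

Larger gates work the same way, with a 2^n by 2^n matrix acting on a 2^n-element amplitude vector.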
Clustering algorithms are machine learning techniques that involve grouping of data points, and given a set of data points, a clustering algorithm can be used to divide each data point into a particular cluster. In some cases, the data points in the same group should have similar attributes or characteristics, and the data points in different groups should have highly different attributes or characteristics, and further, when a group of data points is divided into clusters, the more densely distributed points may also be divided into the same cluster, and a cluster is a category.
In the field of natural language processing, text with a word count in the range of hundreds to fifteen is called short text, for example, text whose entire content fits on one page of A4 paper. Short text clustering is one of the core techniques of text information mining; it aims to divide a document set into a plurality of clusters such that the similarity between documents in the same cluster is as large as possible and the similarity between different clusters is as small as possible.
Referring to fig. 2, fig. 2 is a schematic flowchart of a text clustering method based on quantum computation according to an embodiment of the present invention, which may include the following steps:
s201, preparing a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, wherein one clustering center corresponds to one class;
specifically, a text vector corresponding to the text to be clustered and a center vector corresponding to the clustering center for clustering can be obtained, and a first quantum state corresponding to the text vector and a second quantum state corresponding to the center vector are prepared. Preferably, after the text vector and the center vector are obtained, dimension reduction processing can be performed on them, which reduces the computational complexity and facilitates the subsequent preparation of the quantum states.
The text to be clustered can be a short text set including a plurality of short texts to be clustered, the clustering center for clustering refers to a class center of a plurality of classes to be clustered, and the data form of the clustering center can be represented as a center vector.
Exemplarily, a text to be clustered can be obtained, and the text to be clustered is input to a pre-trained bidirectional encoder BERT model to obtain a text vector corresponding to the text to be clustered; a center vector corresponding to a clustering center for clustering is initialized, and the text vector and the center vector are input into a pre-trained Principal Component Analysis (PCA) model to obtain a text vector and a center vector after dimensionality reduction. The center vector corresponding to a clustering center for clustering can be initialized randomly, and the text vector and the center vector have the same dimension after dimensionality reduction.
It should be noted that BERT (Bidirectional Encoder Representations from Transformers) is a language representation model that can convert a short text data set into vectors of a certain dimension. Principal Component Analysis (PCA) is a common data analysis method, often used for dimensionality reduction of high-dimensional data; it extracts the principal feature components of the data and, by reducing a high-dimensional vector to a low-dimensional one, lowers the resource consumption of subsequent processing. The BERT model, the PCA model and their training methods are all prior art and are not described further here.
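The dimension reduction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the tiny two-dimensional `text_vectors` are stand-ins for BERT embeddings, the PCA is fitted on the text vectors only, and the function names `pca_fit` and `pca_transform` are hypothetical. Power iteration with deflation is used here simply to avoid external libraries.

```python
import math
import random

def pca_fit(rows, k, iters=200, seed=0):
    """Return (mean, components): top-k principal directions via power iteration."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / max(n - 1, 1)
            for b in range(d)] for a in range(d)]
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:          # no variance left in any direction
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        cov_v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        lam = sum(v[a] * cov_v[a] for a in range(d))   # variance along v
        components.append(v)
        # Deflate so the next pass finds the next direction.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, components

def pca_transform(row, mean, components):
    """Project one vector onto the fitted principal directions."""
    return [sum((x - m) * c for x, m, c in zip(row, mean, comp)) for comp in components]

# Stand-ins for BERT text vectors and one initialized center vector (assumed values).
text_vectors = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
center_vectors = [[1.5, 3.0]]

mean, components = pca_fit(text_vectors, k=1)
reduced_texts = [pca_transform(v, mean, components) for v in text_vectors]
reduced_centers = [pca_transform(c, mean, components) for c in center_vectors]
```

Applying the same fitted projection to both the text vectors and the center vectors keeps them in the same reduced dimension, as the method requires.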
In a specific implementation, a second preset quantum circuit may be constructed based on the existing angle encoding (angle_encoding) scheme, and used to prepare the first quantum state corresponding to the text vector and the second quantum state corresponding to the center vector. Illustratively, the vector information x is encoded using rotation logic gates whose parameters, i.e. rotation angles, are determined by the vector information, and the encoded quantum state $|x\rangle$ has the following form:
$$|x\rangle = \bigotimes_{i=1}^{n} R(x_i)\,|0\rangle$$
where R can be any one of the RX gate, RY gate and RZ gate, $x_i$ is the i-th element of the vector, n is the dimension of the vector, and the number of qubits in the second preset quantum circuit may be equal to the dimension of the vector.
As can be appreciated by those skilled in the art, assuming $x = (\pi, \pi, \pi)$ and using RY gates, the angle encoding rotates each qubit by 180 degrees around the y-axis, giving the quantum state $|111\rangle$ corresponding to x.
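The angle encoding above can be checked with a small state-vector sketch. It assumes RY gates, for which RY(θ)|0⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩; the function names are hypothetical, and the first vector element is taken as the most significant qubit.

```python
import math

def ry_on_zero(theta):
    """Amplitudes of RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return [math.cos(theta / 2), math.sin(theta / 2)]

def kron(a, b):
    """Tensor product of two amplitude vectors."""
    return [x * y for x in a for y in b]

def angle_encode(vector):
    """One qubit per vector element, each rotated by RY(x_i) from |0>."""
    state = [1.0]
    for x in vector:
        state = kron(state, ry_on_zero(x))
    return state

# x = (pi, pi, pi): every qubit is rotated by 180 degrees, giving |111>.
state = angle_encode([math.pi, math.pi, math.pi])
```

The resulting vector has eight amplitudes, with essentially all weight on index 7, i.e. the basis state |111⟩.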
S202, operating and measuring a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state;
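specifically, the first quantum state and the second quantum state may be used as the initial quantum state of the first preset quantum circuit, the first preset quantum circuit is run and its target qubit is measured, and the similarity between the first quantum state and the second quantum state is obtained from the measurement result.
An exemplary process of calculating the similarity between quantum states is described with reference to fig. 3. As shown in fig. 3, $q_0$ is the control qubit, $q_1$ and $q_2$ are the input qubits, $|\phi\rangle$ and $|\psi\rangle$ are the quantum states whose similarity is to be calculated, H denotes an H gate, SWAP denotes a SWAP gate acting on $q_1$ and $q_2$ under the control of $q_0$, and M denotes a quantum measurement operation. The initial quantum state of the circuit is $|v_1\rangle$ and the final quantum state after the circuit is run is $|v_2\rangle$, as follows:
$$|v_1\rangle = |0\rangle|\phi\rangle|\psi\rangle$$
$$|v_2\rangle = \tfrac{1}{2}|0\rangle\big(|\phi\rangle|\psi\rangle + |\psi\rangle|\phi\rangle\big) + \tfrac{1}{2}|1\rangle\big(|\phi\rangle|\psi\rangle - |\psi\rangle|\phi\rangle\big)$$
The control qubit $q_0$ is measured, and the probability $P(|0\rangle)$ that $q_0$ is in the $|0\rangle$ state is calculated as follows:
$$P(|0\rangle) = \tfrac{1}{2} + \tfrac{1}{2}\,|\langle\psi|\phi\rangle|^2$$
where $|\langle\psi|\phi\rangle|^2$ is the square of the absolute value of the inner product of $|\psi\rangle$ and $|\phi\rangle$, that is, the similarity between $|\psi\rangle$ and $|\phi\rangle$, and it can be obtained from the above formula as $|\langle\psi|\phi\rangle|^2 = 2P(|0\rangle) - 1$.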
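The circuit of fig. 3 can be simulated classically to see where the similarity comes from. The sketch below assumes the standard swap test (H on the control qubit, a controlled SWAP, then H again) and tracks the two control branches explicitly; the function names are hypothetical. On real hardware P(|0⟩) would be estimated from repeated measurements rather than computed exactly.

```python
import math

def kron(a, b):
    """Tensor product of two amplitude vectors."""
    return [x * y for x in a for y in b]

def swap_test_p0(phi, psi):
    """Probability of measuring the control qubit in |0> after H, CSWAP, H."""
    joint = kron(phi, psi)      # amplitudes of |phi>|psi>
    swapped = kron(psi, phi)    # amplitudes of |psi>|phi> (registers exchanged)
    # After the first H both control branches hold joint / sqrt(2); the
    # controlled SWAP turns the control=1 branch into swapped / sqrt(2);
    # the second H leaves (joint + swapped) / 2 in the control=0 branch.
    branch0 = [(a + b) / 2 for a, b in zip(joint, swapped)]
    return sum(abs(a) ** 2 for a in branch0)

def similarity(phi, psi):
    """Squared overlap recovered from P(|0>) = 1/2 + 1/2 * overlap."""
    return 2 * swap_test_p0(phi, psi) - 1

ket0 = [1.0, 0.0]
ket1 = [0.0, 1.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

same = similarity(ket0, ket0)         # identical states
orthogonal = similarity(ket0, ket1)   # orthogonal states
half = similarity(ket0, plus)         # overlap squared is 1/2
```

Identical states give a similarity of 1 and orthogonal states give 0, which matches the interpretation of the similarity as a squared inner product.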
S203, clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
Specifically, for each first quantum state, in the similarity between the first quantum state and each second quantum state, the second quantum state corresponding to the similarity meeting the first preset condition is searched, and the text to be clustered corresponding to the first quantum state is classified into the class corresponding to the searched second quantum state. In practical application, the similarity satisfying the first preset condition may be: the similarity with the largest absolute value or the similarity larger than a predetermined value, and so on.
Illustratively, suppose there are 10 texts to be clustered, that is, 10 short texts, corresponding to 10 first quantum states; if they are to be grouped into 3 classes, 3 clustering centers are set, corresponding to 3 second quantum states. A given first quantum state then has 3 similarities, one with each of the 3 second quantum states. If their values are 0.1, 0.6 and 0.9, the short text corresponding to that first quantum state is assigned to the class of the clustering center whose second quantum state has the maximum similarity of 0.9; that is, the short text belongs to the same class as the clustering center with the maximum similarity.
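The assignment rule in this example amounts to an argmax over the similarities. A minimal sketch, taking "maximum similarity" as the first preset condition (the function name is hypothetical):

```python
def assign_to_cluster(similarities):
    """Index of the clustering center with the largest similarity."""
    return max(range(len(similarities)), key=lambda k: similarities[k])

# Similarities of one text to the 3 clustering centers, as in the example above.
label = assign_to_cluster([0.1, 0.6, 0.9])
```

Here `label` is 2, the zero-based index of the third clustering center.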
Further, in order to improve the clustering precision, the clustering centers of the various classes after current clustering can be determined, and whether the determined clustering centers meet a second preset condition or not is judged; and if the second preset condition is not met, updating the clustering center for clustering to the determined clustering center, and returning to the step of preparing the first quantum state corresponding to the text to be clustered and the second quantum state corresponding to the clustering center for clustering.
The actual clustering center re-determined after clustering may differ from the clustering center used for that clustering. If its position changes only slightly, the clustering result is good; if it changes considerably, the clustering center needs to be updated and clustering iterated again until the position of the clustering center is stable.
For example, to judge whether the determined clustering center meets the second preset condition, an offset distance between the determined clustering center and the clustering center for clustering may be calculated; if the offset distance is smaller than a preset threshold value, it is judged that the determined clustering center meets the second preset condition; and if the offset distance is not smaller than the preset threshold value, it is judged that the determined clustering center does not meet the second preset condition.
For example, the offset distance between the clustering center newly determined after clustering and the clustering center used for the current clustering is calculated, where the offset distance is expressed as the coordinate distance between the corresponding center vectors. If the offset distance is smaller than the preset threshold value of 0.01, the second preset condition is met and clustering is finished. If the offset distance between the clustering centers before and after clustering is not smaller than 0.01, the clustering center is updated: the re-determined clustering center is taken as the clustering center for the next clustering, and the method returns to the step of preparing the first quantum state corresponding to the text to be clustered and the second quantum state corresponding to the clustering center for clustering, until the offset distance between the clustering centers before and after clustering is smaller than 0.01.
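The offset check can be sketched as below. Two points are assumptions rather than statements of the patent: the "coordinate distance" is taken to be the Euclidean distance, and the condition is required to hold for every clustering center. The function names are hypothetical.

```python
import math

def offset_distance(old_center, new_center):
    """Euclidean distance between two center vectors (assumed distance measure)."""
    return math.dist(old_center, new_center)

def meets_second_condition(old_centers, new_centers, threshold=0.01):
    """True when every center moved by less than the preset threshold."""
    return all(offset_distance(o, n) < threshold
               for o, n in zip(old_centers, new_centers))

old = [[0.0, 0.0], [3.0, 3.0]]
barely_moved = [[0.001, 0.002], [3.0, 3.004]]
moved = [[0.15, 0.15], [2.95, 2.95]]

stop = meets_second_condition(old, barely_moved)    # offsets below 0.01
keep_going = meets_second_condition(old, moved)     # offsets well above 0.01
```

With the threshold of 0.01 from the example, the first case ends clustering and the second triggers another iteration.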
Or, judging whether the determined clustering center meets the second preset condition may be: acquiring the updating times corresponding to the determined clustering center, and if the updating times reach the preset times, meeting a second preset condition and finishing clustering; otherwise, updating the clustering center, taking the re-determined clustering center as the clustering center used by the next clustering, and returning to the step of preparing the first quantum state corresponding to the text to be clustered and the second quantum state corresponding to the clustering center used for clustering until the updating times of the clustering center reach the preset times.
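Putting the steps together, the iteration can be sketched as a classical stand-in. Several assumptions are made here and are not taken from the patent: the similarity is computed analytically as the squared overlap of two RY angle-encoded states, which equals the product of cos²((x_i − c_i)/2), instead of being measured on a quantum circuit; new clustering centers are taken as the mean of their members; and both stopping rules (offset threshold and maximum number of updates) are combined. All names are hypothetical.

```python
import math

def fidelity(x, c):
    """Squared overlap of RY angle-encoded states for vectors x and c."""
    p = 1.0
    for a, b in zip(x, c):
        p *= math.cos((a - b) / 2) ** 2
    return p

def cluster(vectors, centers, threshold=0.01, max_updates=20):
    """Assign by maximum similarity, then update centers until they are stable."""
    labels = []
    for _ in range(max_updates):
        # Assign each vector to the center with the largest similarity.
        labels = [max(range(len(centers)), key=lambda k: fidelity(v, centers[k]))
                  for v in vectors]
        # Re-determine each center as the mean of its members (assumption).
        new_centers = []
        for k, old in enumerate(centers):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(list(old))   # empty class keeps its center
        shift = max(math.dist(o, n) for o, n in zip(centers, new_centers))
        if shift < threshold:   # second preset condition met: clustering ends
            break
        centers = new_centers   # otherwise update and iterate again
    return labels, centers

vectors = [[0.1, 0.2], [0.2, 0.1], [2.9, 3.0], [3.0, 2.9]]
labels, centers = cluster(vectors, [[0.0, 0.0], [3.0, 3.0]])
```

In this toy run the first two vectors end up in one class and the last two in the other, and the loop stops on its second pass once the centers no longer move.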
Therefore, a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering are prepared, wherein one clustering center corresponds to one class; a first preset quantum circuit is operated and measured to obtain the similarity between the first quantum state and the second quantum state; and the text to be clustered is clustered according to the similarity between the first quantum state and the second quantum state. Quantum computation is thereby combined with a clustering algorithm, the parallel acceleration advantage of quantum computation is exploited, the amount of computation required by a traditional clustering task is reduced, and a gap in the related art is filled.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a text clustering device based on quantum computation according to an embodiment of the present invention, which corresponds to the flow shown in fig. 2, and the device includes:
the preparation module 401 is configured to prepare a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, where one clustering center corresponds to one class;
an obtaining module 402, configured to operate and measure a first preset quantum circuit, and obtain a similarity between the first quantum state and the second quantum state;
a clustering module 403, configured to cluster the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
Specifically, the preparation module comprises:
the acquisition and preparation unit is used for acquiring a text vector corresponding to a text to be clustered and a center vector corresponding to a clustering center for clustering, and preparing a first quantum state corresponding to the text vector and a second quantum state corresponding to the center vector.
Specifically, the acquisition and preparation unit includes:
an obtaining subunit, configured to obtain a text vector corresponding to the text to be clustered and a center vector corresponding to a clustering center used for clustering, and to perform dimension reduction on the text vector and the center vector.
Optionally, the obtaining subunit is specifically configured to:
acquire the text to be clustered, and input it into a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model to obtain the text vector corresponding to the text to be clustered; and
initialize the center vector corresponding to a clustering center used for clustering, and input the text vector and the center vector into a pre-trained principal component analysis (PCA) model to obtain the dimension-reduced text vector and center vector.
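The dimension reduction step can be sketched as follows. This is a minimal illustration only: the vectors are hypothetical stand-ins for BERT embeddings, PCA is fitted here by power iteration with deflation purely to keep the example dependency-free, and all names are assumptions rather than the patent's implementation.

```python
import math
import random


def fit_pca(rows, k, iters=200, seed=0):
    """Fit k principal components; returns (mean, components)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left; keep the current direction
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found direction before the next search.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigenvalue * v[i] * v[j]
    return mean, components


def transform(vector, mean, components):
    """Project one vector onto the fitted components."""
    shifted = [x - m for x, m in zip(vector, mean)]
    return [sum(s * c for s, c in zip(shifted, comp)) for comp in components]


# Hypothetical stand-ins for text embeddings and one initial center vector.
text_vectors = [[t, 2.0 * t, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean, components = fit_pca(text_vectors, k=1)
reduced_texts = [transform(v, mean, components) for v in text_vectors]
reduced_center = transform([1.0, 2.0, 0.0], mean, components)
```

The same fitted model is applied to both the text vectors and the center vector, so that the two kinds of vectors remain comparable after reduction.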
Specifically, the clustering module is configured to:
for each first quantum state, find, among the similarities between that first quantum state and each second quantum state, the second quantum state whose similarity satisfies a first preset condition; and
assign the text to be clustered that corresponds to the first quantum state to the class corresponding to the second quantum state found.
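The assignment performed by the clustering module can be sketched as below. The sketch assumes that the first preset condition is "largest similarity", which this section does not state explicitly; the function name and the sample values are hypothetical.

```python
def assign_clusters(similarities):
    """Assign each text to the class of its most similar center.

    similarities[i][k] is the measured similarity between the first quantum
    state of text i and the second quantum state of center k.
    Assumption: the first preset condition selects the maximum similarity;
    ties go to the lowest center index.
    """
    assignments = []
    for row in similarities:
        best_center = max(range(len(row)), key=lambda k: row[k])
        assignments.append(best_center)
    return assignments


# Hypothetical similarities for three texts and two centers.
labels = assign_clusters([[0.1, 0.9], [0.8, 0.3], [0.5, 0.5]])
```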
Specifically, the device may further include:
a judging module, configured to determine the clustering center of each class obtained from the current clustering, and to judge whether the determined clustering centers satisfy a second preset condition; and
an updating module, configured to, when the determined clustering centers do not satisfy the second preset condition, update the clustering centers used for clustering to the determined clustering centers and return to the step of preparing the first quantum state corresponding to the text to be clustered and the second quantum state corresponding to the clustering center used for clustering.
Specifically, the judging module is configured to:
calculate an offset distance between the determined clustering center and the clustering center used for clustering;
if the offset distance is smaller than a preset threshold, judge that the determined clustering center satisfies the second preset condition; and
if the offset distance is not smaller than the preset threshold, judge that the determined clustering center does not satisfy the second preset condition.
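The judging and updating logic can be illustrated with the following sketch. It assumes that a class's clustering center is the mean of its member vectors, that the offset distance is Euclidean, and that every center must move by less than the threshold; none of these choices is fixed by this section, and the names are hypothetical.

```python
import math


def recompute_centers(vectors, assignments, old_centers):
    """Determine the center of each class as the mean of its members."""
    new_centers = []
    for k, old in enumerate(old_centers):
        members = [v for v, a in zip(vectors, assignments) if a == k]
        if not members:
            # An empty class keeps its previous center.
            new_centers.append(list(old))
            continue
        new_centers.append([sum(m[j] for m in members) / len(members)
                            for j in range(len(old))])
    return new_centers


def centers_converged(new_centers, old_centers, threshold):
    """Second preset condition: every offset distance is below threshold."""
    return all(math.dist(new, old) < threshold
               for new, old in zip(new_centers, old_centers))


# Hypothetical data: three vectors, two classes.
vectors = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
old_centers = [[0.0, 0.0], [10.0, 10.0]]
new_centers = recompute_centers(vectors, [0, 0, 1], old_centers)
done = centers_converged(new_centers, old_centers, threshold=1.5)
```

When the check fails, the new centers replace the old ones and the state preparation step is repeated with them.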
The device therefore prepares a first quantum state corresponding to each text to be clustered and a second quantum state corresponding to each clustering center used for clustering, where one clustering center corresponds to one class; runs and measures a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state; and clusters the texts to be clustered according to that similarity. As with the method, quantum computation is combined with a clustering algorithm, which exploits the parallel acceleration offered by quantum computation, reduces the amount of computation required by a conventional clustering task, and fills a gap in the related art.
An embodiment of the present invention further provides a storage medium storing a computer program, where the computer program is configured to perform, when run, the steps of any of the above method embodiments.
Specifically, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:
S1, preparing a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center used for clustering, where one clustering center corresponds to one class;
S2, running and measuring a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state;
S3, clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
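For orientation, steps S1 to S3 and the optional center update can be strung together as in the sketch below. It is a purely classical stand-in: the exact squared overlap of the normalised vectors replaces the measured circuit output, the largest similarity is taken as the first preset condition, and the mean of the members is taken as the new center. All of these are assumptions, and the names are hypothetical.

```python
import math


def normalise(vector):
    """S1 stand-in: unit-length amplitudes for a non-zero vector."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity(a, b):
    """S2 stand-in: exact squared overlap instead of a measured circuit."""
    return sum(x * y for x, y in zip(normalise(a), normalise(b))) ** 2


def cluster(vectors, centers, threshold=1e-6, max_rounds=50):
    """Run S1-S3 repeatedly until the centers stop moving."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_rounds):
        sims = [[similarity(v, c) for c in centers] for v in vectors]
        # S3: assign each text to the class of its most similar center.
        labels = [max(range(len(centers)), key=row.__getitem__)
                  for row in sims]
        new_centers = []
        for k, old in enumerate(centers):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(old)
        shift = max(math.dist(n, o) for n, o in zip(new_centers, centers))
        centers = new_centers
        if shift < threshold:
            break
    return labels, centers


# Hypothetical two-dimensional text vectors and initial centers.
labels, centers = cluster(
    [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]],
    [[1.0, 0.0], [0.0, 1.0]],
)
```

Note that a squared overlap cannot distinguish a vector from its negation, so this stand-in treats antiparallel vectors as identical.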
An embodiment of the present invention further provides an electronic apparatus including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the steps of any of the above method embodiments.
Specifically, the electronic apparatus may further include a transmission device and an input/output device, each of which is connected to the processor.
Specifically, in this embodiment, the processor may be configured to perform the following steps by means of the computer program:
S1, preparing a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center used for clustering, where one clustering center corresponds to one class;
S2, running and measuring a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state;
S3, clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
The construction, features, and functions of the present invention have been described in detail above with reference to the embodiments shown in the drawings. These are only preferred embodiments, and the invention is not limited to what the drawings show. Any change made according to the idea of the present invention, or any modification into an equivalent embodiment, falls within the protection scope of the present invention as long as it does not depart from the spirit covered by the description and the drawings.
Claims (10)
1. A text clustering method based on quantum computation, comprising:
preparing a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, wherein one clustering center corresponds to one class;
running and measuring a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state; and
clustering the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
2. The method of claim 1, wherein the preparing a first quantum state corresponding to the text to be clustered and a second quantum state corresponding to the clustering center for clustering comprises:
obtaining a text vector corresponding to a text to be clustered and a center vector corresponding to a clustering center for clustering, and preparing a first quantum state corresponding to the text vector and a second quantum state corresponding to the center vector.
3. The method according to claim 2, wherein the obtaining a text vector corresponding to the text to be clustered and a center vector corresponding to a clustering center for clustering comprises:
obtaining a text vector corresponding to a text to be clustered and a center vector corresponding to a clustering center for clustering, and performing dimension reduction processing on the text vector and the center vector.
4. The method according to claim 3, wherein the obtaining a text vector corresponding to a text to be clustered and a center vector corresponding to a clustering center for clustering, and performing dimension reduction processing on the text vector and the center vector comprises:
acquiring a text to be clustered, and inputting the text to be clustered into a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model to obtain a text vector corresponding to the text to be clustered; and
initializing a center vector corresponding to a clustering center for clustering, and inputting the text vector and the center vector into a pre-trained principal component analysis (PCA) model to obtain the text vector and the center vector after dimension reduction.
5. The method according to claim 1, wherein the clustering the text to be clustered according to the similarity between the first quantum state and the second quantum state comprises:
for each first quantum state, searching a second quantum state corresponding to the similarity meeting a first preset condition in the similarity between the first quantum state and each second quantum state;
and dividing the text to be clustered corresponding to the first quantum state into the searched class corresponding to the second quantum state.
6. The method of claim 1, further comprising:
determining the clustering centers of all the classes after current clustering, and judging whether the determined clustering centers meet a second preset condition or not;
and if the second preset condition is not met, updating the clustering center for clustering to the determined clustering center, and returning to the step of preparing the first quantum state corresponding to the text to be clustered and the second quantum state corresponding to the clustering center for clustering.
7. The method according to claim 6, wherein the determining whether the determined cluster center satisfies a second preset condition comprises:
calculating an offset distance between the determined cluster center and the cluster center for clustering;
if the offset distance is smaller than a preset threshold value, judging that the determined clustering center meets a second preset condition;
and if the offset distance is not smaller than a preset threshold value, judging that the determined clustering center does not meet a second preset condition.
8. A text clustering device based on quantum computation, the device comprising:
a preparation module, configured to prepare a first quantum state corresponding to a text to be clustered and a second quantum state corresponding to a clustering center for clustering, wherein one clustering center corresponds to one class;
an obtaining module, configured to run and measure a first preset quantum circuit to obtain the similarity between the first quantum state and the second quantum state; and
a clustering module, configured to cluster the texts to be clustered according to the similarity between the first quantum state and the second quantum state.
9. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 7 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210154196.4A CN114282000A (en) | 2022-02-21 | 2022-02-21 | Text clustering method, text clustering device, text clustering medium and electronic device based on quantum computation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114282000A (en) | 2022-04-05 |
Family
ID=80882060
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210154196.4A Pending CN114282000A (en) | 2022-02-21 | 2022-02-21 | Text clustering method, text clustering device, text clustering medium and electronic device based on quantum computation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114282000A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120321188A1 (en) * | 2011-06-20 | 2012-12-20 | Michael Benjamin Selkowe Fertik | Identifying information related to a particular entity from electronic sources, using dimensional reduction and quantum clustering |
CN108595532A (en) * | 2018-04-02 | 2018-09-28 | 三峡大学 | A kind of quantum clustering system and method for Law Text |
CN113688906A (en) * | 2021-08-25 | 2021-11-23 | 四川元匠科技有限公司 | Customer segmentation method and system based on quantum K-means algorithm |
CN113743457A (en) * | 2021-07-29 | 2021-12-03 | 暨南大学 | Quantum density peak value clustering method based on quantum Grover search technology |
Non-Patent Citations (3)
Title |
---|
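Liu Xuejuan (刘雪娟): "Research on Clustering Algorithms for Big Data" (面向大数据的聚类算法研究), 《中国博士学位论文全文数据库 信息科技辑》 (China Doctoral Dissertations Full-text Database, Information Science and Technology series) * |
Wu Yongfei (吴永飞) et al.: "Application of Quantum Clustering Algorithms in Smart Banking Operation Scenarios" (量子聚类算法在银行智慧运营场景中的应用), 《银行家》 * |
Gong Jing (龚静): "Research on Chinese Text Clustering" (《中文文本聚类研究》), 31 March 2012, Communication University of China Press (中国传媒大学出版社) * |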
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114358319B (en) | | Machine learning framework-based classification method and related device |
CN114358216B (en) | | Quantum clustering method based on machine learning framework and related device |
CN114819163B (en) | | Training method and device for quantum generation countermeasure network, medium and electronic device |
CN114792378A (en) | | Quantum image identification method and device |
CN114821217A (en) | | Image identification method and device based on quantum classical hybrid neural network |
CN114358318B (en) | | Machine learning framework-based classification method and related device |
CN115828999A (en) | | Quantum convolution neural network construction method and system based on quantum state amplitude transformation |
CN114511094A (en) | | Quantum algorithm optimization method and device, storage medium and electronic device |
CN114764619A (en) | | Convolution operation method and device based on quantum circuit |
CN114819170A (en) | | Method, apparatus, medium, and electronic apparatus for estimating options based on quantum line |
CN114819168A (en) | | Quantum comparison method and device for matrix eigenvalues |
CN114219048A (en) | | Spectral clustering method and device based on quantum computation, electronic equipment and storage medium |
EP4414901A1 (en) | | Model weight acquisition method and related system |
CN115879562B (en) | | Quantum program initial mapping determination method and device and quantum computer |
CN114282000A (en) | | Text clustering method, text clustering device, text clustering medium and electronic device based on quantum computation |
CN116431807A (en) | | Text classification method and device, storage medium and electronic device |
CN114881238A (en) | | Method and apparatus for constructing quantum discriminator, medium, and electronic apparatus |
CN114881239A (en) | | Method and apparatus for constructing quantum generator, medium, and electronic apparatus |
CN115983392A (en) | | Method, device, medium and electronic device for determining quantum program mapping relation |
CN114372584A (en) | | Transfer learning method based on machine learning framework and related device |
CN115409185A (en) | | Construction method and device of quantum line corresponding to linear function |
CN114862079A (en) | | Risk value estimation method, device, medium, and electronic device based on quantum line |
CN115907021B (en) | | Quantum calculation-based data clustering method and device and quantum computer |
CN114372582B (en) | | Quantum automatic coding method based on machine learning framework and related device |
CN115423108B (en) | | Quantum circuit cutting processing method and device and quantum computer operating system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20220405 |