CN108038505A - A kind of producing region weather data fusion method based on lattice contraction - Google Patents

A kind of producing region weather data fusion method based on lattice contraction

Info

Publication number
CN108038505A
CN108038505A (application CN201711317380.1A)
Authority
CN
China
Prior art keywords: data, quantum, lattice, contraction, vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711317380.1A
Other languages
Chinese (zh)
Other versions
CN108038505B (en)
Inventor
彭伟民
陈爱红
陈婧
徐海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201711317380.1A priority Critical patent/CN108038505B/en
Publication of CN108038505A publication Critical patent/CN108038505A/en
Application granted granted Critical
Publication of CN108038505B publication Critical patent/CN108038505B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Abstract

The invention discloses a producing-area weather data fusion method based on lattice contraction. Exploiting the data representation capability of the lattice structure, the invention expresses the data samples of a source data set as lattice nodes and organizes them into a hierarchical lattice structure. Through a lattice contraction method based on adjacent bits, similar lattice nodes are aggregated in a stepwise fashion. On the basis of the lattice contraction results, the invention proposes corresponding classical and quantum-based producing-area weather data fusion methods. The invention realizes effective fusion of producing-area weather data in a controllable manner.

Description

Method for fusing producing-area weather data based on lattice contraction
Technical Field
The invention relates to the technical field of producing-area weather data fusion, and in particular to a producing-area weather data fusion method based on lattice contraction.
Background
Data fusion aims to detect duplicate data in a source data set and fuse them into a more compact and clear representation. Fusion results are valuable information for decision support if they can show the overall distribution and relevance of the source data. There is a lot of valuable information in the vast production area weather data generated by data-related companies. Considerable economic benefits can be generated if this valuable information is presented as a compact data fusion result and used for decision support.
Conventionally, data fusion techniques have mainly focused on the fusion of sensor data and image data. In recent years, data fusion techniques have also been applied to fields such as fault diagnosis, scene classification, and behavior and object recognition. In the financial field, applications of data fusion technology include producing-area weather prediction, agricultural product market prediction and the like, but research and inventions concerning the fusion of producing-area weather and other financial data are rare. In order to obtain a concise and clear fusion result and use it for decision support, the main task of source producing-area weather data fusion is to improve the integrity and accuracy of the source producing-area weather data in a gradual and controllable manner. Existing classical and quantum-based data fusion methods are not sufficient to support this task. Therefore, a new method for fusing producing-area weather data is needed.
The lattice structure has a strong data representation capability and can represent both traditional binary states and quantum basis states. After the source data units are converted into lattice nodes and organized into a hierarchical structure, progressive fusion of the source data can be achieved through lattice contraction. Because the size of the lattice structure depends on the length of the lattice nodes, the contraction of the lattice structure is the contraction of the lattice nodes, and this process resembles a data fusion process: duplicate data are gradually aggregated during lattice contraction. Since the lattice contraction process is controllable, the data fusion process based on lattice contraction is also controllable. Once the duplicate data in a subset are detected, they are fused into a new data unit through a fusion operation. In summary, it is appropriate to implement producing-area weather data fusion through lattice contraction; the logical process is shown in FIG. 1.
Before the source producing-area weather data units are converted into lattice nodes, integer preprocessing needs to be carried out on them. The representation of the lattice nodes and the construction of the lattice structure depend on the number of unique values of the elements in each data vector. Based on the lattice representation of the producing-area weather data, the producing-area weather data fusion process mainly comprises the following three steps: lattice contraction, duplicate detection and data fusion. Lattice contraction brings similar lattice nodes together; duplicate detection divides the source data set into different subsets according to the lattice contraction result; and data fusion fuses the duplicate data units in the same subset into a single target data unit.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a producing area weather data fusion method based on lattice contraction.
The invention comprises the following steps:
1) Lattice contraction based on adjacent bits and quantum representation

For a data sample s_i in a source producing-area weather data set X = [s_1, s_2, ..., s_i, ..., s_n]^T, the data elements x_i1, x_i2, ..., x_ij, ..., x_iL come from L different data vectors X_1, X_2, ..., X_j, ..., X_L; L is determined by the number of producing-area weather attributes. Here, X_1, X_2, ..., X_j, ..., X_L denote the following producing-area weather attributes, respectively: average maximum air temperature, average minimum air temperature, historical maximum air temperature, historical minimum air temperature, weekly average air temperature, weekly air temperature deviation, weekly cumulative rainfall, weekly rainfall deviation, 24-hour maximum rainfall, monthly cumulative rainfall, annual cumulative rainfall, average maximum relative humidity, average minimum relative humidity, days with air temperature above 90°F, days with air temperature below 32°F, days with rainfall above 0.01 inches, and days with rainfall above 0.5 inches.

Define the data sample s_i as (x_i1, x_i2, ..., x_ij, ..., x_iL). Data samples are recorded once a week, and the producing-area weather data set X contains more than 15 years of data sample records for a given producing area. If the data vector X_j has m_j unique data-element values, then a data element x_ij is quantized into a quantum state |x_ij> containing m_j qubits. Only one qubit in each such quantum state equals |1>, and its subscript is consistent with the position of x_ij in the sequence of the m_j unique values. Once the unique-value numbers m_1, m_2, ..., m_j, ..., m_L of all data vectors are determined, the data sample s_i is quantized into a quantum state with M = m_1 + m_2 + ... + m_L qubits. Clearly, exactly L qubits in each quantized sample equal |1>. The lattice structure is constructed by distributing the quantized samples between the all-|0> quantum state |00...0...00> and the all-|1> quantum state |11...1...11>.
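This quantization step can be illustrated with a small Python sketch; the quantum states are simulated as classical one-hot bit lists rather than actual qubits, and the helper names are illustrative.

```python
def unique_values(vector):
    """Sorted sequence of the unique values of one data vector X_j."""
    return sorted(set(vector))

def quantize_element(value, uniques):
    """One-hot code of length m_j: the single 1 marks the position of
    `value` in the unique-value sequence (the subscript of the |1> qubit)."""
    code = [0] * len(uniques)
    code[uniques.index(value)] = 1
    return code

def quantize_sample(sample, uniques_per_vector):
    """Concatenate the L element codes into one lattice node of M bits."""
    node = []
    for value, uniques in zip(sample, uniques_per_vector):
        node.extend(quantize_element(value, uniques))
    return node

# Example: L = 2 attributes with m_1 = 3 and m_2 = 4 unique values, so M = 7.
X1, X2 = [10, 12, 11, 10], [5, 8, 6, 7]
uniques = [unique_values(X1), unique_values(X2)]
print(quantize_sample((11, 6), uniques))  # -> [0, 1, 0, 0, 1, 0, 0]
```

As expected, exactly L of the M bits of the resulting node equal 1.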
The first layer of the lattice structure has only one node, which contains M qubits equal to |0> and has M child nodes; each node in the second layer has M-1 child nodes and one qubit equal to |1>; each node in the third layer has M-2 child nodes and two qubits equal to |1>; and so on for the other layers. The (M/2+1)-th layer has the largest number of lattice nodes, and the quantized samples containing L qubits equal to |1> are located in the (L+1)-th layer. Taking the data sample |s_i> as an example, the progressive lattice contraction process based on adjacent bits is described below.

The quantum representation |s_i> of a data sample s_i consists of the quantum representations of the L data elements it contains, and the quantized element |x_ij> containing m_j qubits has the following expression:

|x_ij> = |q_1 q_2 ... q_(m_j)>    (1)

where each qubit q_k equals |0> or |1>. The quantized data sample |s_i> is accordingly defined as:

|s_i> = |x_i1>|x_i2>...|x_ij>...|x_iL>    (2)

Since the expression of the sample |s_i> is composed of the expressions of the elements it contains, the contraction of |s_i> decomposes into the contraction of those elements. The contraction of two adjacent qubits can be achieved by a controlled operation: the two adjacent qubits serve as (non-joint) control bits and a qubit |0> is taken as the target bit; if either of the two control qubits equals |1>, an X gate is applied to the target qubit |0>. When a particular attribute value of a data vector reaches a given threshold, the contraction of the data elements in that vector ends.
The following inequality (3) is set as the lattice contraction threshold based on adjacent bits, where m_j^(r) denotes the number of unique data elements contained in the j-th data vector after the r-th round of lattice contraction. If inequality (3) is satisfied, the lattice contraction process continues; otherwise, it terminates. That is, the contraction process continues as long as the unique data elements decrease more slowly than the lattice node length shortens.
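Inequality (3) is only described verbally here; the sketch below encodes one plausible reading of it, namely that contraction of a vector continues while its unique-value count falls by less than the factor of two by which the node length falls each round. This is an assumed reading, not the exact inequality of the patent.

```python
def keep_contracting(m_prev, m_curr):
    """Hedged reading of threshold (3) for one data vector: continue while
    the unique-element count drops more slowly than the node length, which
    halves every round (m_curr / m_prev > 1/2)."""
    return 2 * m_curr > m_prev

print(keep_contracting(16, 10))  # True:  10/16 > 1/2, keep contracting
print(keep_contracting(16, 7))   # False: 7/16 < 1/2, stop this vector
```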
2) Lattice contraction based on adjacent bits and classical representation

The principle of lattice contraction based on adjacent bits and the classical representation is similar to that based on adjacent bits and the quantum representation. Replacing the quantum states with binary states yields a classical lattice structure in which each lattice node contains M binary bits. Compared with the contraction of two qubits, the contraction of two binary bits can be achieved by a logical OR operation. Because the contraction of lattice nodes leads to the aggregation of similar data elements in a data vector, the differences between the data elements in a data vector are enlarged after one round of contraction. Based on this, the vector variance, which reflects the dispersion among the nodes in a vector, is taken as the key factor of the contraction threshold, expressed as inequality (4) in terms of the variance of the j-th data vector after the r-th round of lattice contraction. That is, if the vector variance keeps increasing, the lattice contraction process continues.
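Threshold (4) is described only in terms of the vector variance; the minimal classical sketch below assumes the variance is computed over the integer-coded elements of a vector before and after a round of OR-based contraction, which is one plausible reading rather than the exact formula.

```python
from statistics import pvariance

def contract_binary(code):
    """One round of classical contraction: OR each adjacent pair of bits."""
    result = [code[k] | code[k + 1] for k in range(0, len(code) - 1, 2)]
    if len(code) % 2 == 1:
        result.append(code[-1])
    return result

def keep_contracting_classical(elements_prev, elements_curr):
    """Hedged reading of threshold (4): continue while the variance of the
    (integer-coded) elements of the vector keeps increasing."""
    return pvariance(elements_curr) > pvariance(elements_prev)

print(contract_binary([0, 0, 1, 0, 0, 1]))                      # -> [0, 1, 1]
print(keep_contracting_classical([3, 5, 5, 9], [2, 6, 6, 10]))  # True: variance grew
```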
3) Quantum-based duplicate detection and data fusion

According to the quantum representation of a data element shown in equation (1), the subscripts 1, 2, ..., k_j, ..., m_j of the qubits in |x_ij> represent the sequence numbers of all its unique values. During the contraction of |x_ij>, the subscript of each qubit remains unchanged; that is, after one round of contraction, the subscript values of the qubits in the contraction result |x_ij>' are derived from the subscript values of the qubits in |x_ij>.

For convenience, the result of the data element |x_ij> after r rounds of contraction is denoted |x_ij>^(r) and has the defined form of equation (5). In |x_ij>^(r), only one qubit equals |1>; this qubit is called the pole qubit. As the quantum data elements in a vector contract, the subscript gap between the pole qubits of two data elements becomes smaller. Taking the quantum states |x_i1> and |x_i2> as an example, the change of the subscript gap between two pole qubits is illustrated below, where the subscript sequence of |x_i1> and |x_i2> is {1, 2, ..., 8, ..., 15, 16}.

Clearly, |x_i1> and |x_i2> each have one pole qubit, and before contraction the gap between the subscripts of these pole qubits is evaluated over the subscript sequence {1, 2, ..., 16}. After one round of contraction, |x_i1> and |x_i2> become |x_i1>' and |x_i2>', whose subscript sequence is {1, 3, 5, 7, 9, 11, 13, 15}, and the subscript gap between their pole qubits is reduced accordingly. In the same way, after two rounds of contraction |x_i1>'' and |x_i2>'' have the subscript sequence {1, 5, 9, 13}, and the subscript gap between their pole qubits is reduced further.
The change of the subscript gap between pole qubits indicates that, as the data elements in a vector contract, the subscript gap between the pole qubits is reduced by no less than r after r rounds of contraction. Therefore, the setting of the duplicate detection criterion mainly considers the reduction of the subscript gap and the different numbers of contraction rounds of the data elements in the same data sample. Let the contraction result of the data sample |s_i> be |s_i>^rdt; the pole qubits of |s_i>^rdt are those of the data elements contained in |s_i>^rdt.

The duplicate detection criterion is defined by inequality (7), where r_j is the number of contraction rounds of the data elements in the j-th data vector. If two data samples |s_i> and |s_i'> satisfy inequality (7), they are duplicate data belonging to one subset. According to the foregoing definitions, the expression can be converted into equation (8), so that the duplicate detection criterion is finally expressed as inequality (9).
when all the duplicate data samples in a subset are determined, the duplicate data samples are fused into a single target data sample; for source data samples in a subset, statistics on the observation probability of all data elements in each vector is needed; the data element with the highest probability of observation in a vector will be selected as the target data element for that vector; with a subset of data ss t Middle data sampleFor example, a quantum-based data fusion model is shown; wherein n is t Represents a subset ss t The number of medium data samples; if the source data set is divided into T subsets, then n 1 +…+n t +…+n T =n。
Data samplesCan be decomposed into a fusion of data elements in different data vectors; let target data sample be | s t >. It represents the subset ss t Fusing results of the source data samples; | s t Data element of target > middleAs source data samplesMiddle source data element And equals the source data element with the maximum probability of observation, i.e.:wherein the content of the first and second substances,as data elementsThe observation probability of (2); if it will be Denoted as max (p) tj ) Then:
in formula (10), k =1 t ,...,n t
4) Classical duplicate detection and data fusion

For the classical data fusion model, the focus is on the inclusiveness of the fusion result; that is, a target data element contains all the components of the associated source data elements. Taking the data samples in the subset ss_t as an example, the classical data fusion model is presented. The target data sample corresponding to the source data samples in the subset ss_t is s_t, and for the source data elements of the data vector X_j, a simple way to calculate the corresponding target data element of s_t is to use a weighted-average strategy, given by equation (11), in which each source data element contributes according to its weight.
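Equation (11) is sketched below under the reading, noted later in the detailed description, that the weight of a source element is its frequency; with frequency weights the weighted average reduces to the arithmetic mean of the subset's elements.

```python
from collections import Counter

def fuse_subset_classical(subset_samples):
    """Classical fusion per equation (11): weighted average of the source
    elements of each vector, with weights taken as relative frequencies."""
    target = []
    for j in range(len(subset_samples[0])):
        column = [sample[j] for sample in subset_samples]
        counts = Counter(column)
        n = len(column)
        target.append(sum(value * count / n for value, count in counts.items()))
    return target

print(fuse_subset_classical([(3, 7), (3, 8), (4, 8), (3, 8)]))  # -> [3.25, 7.75]
```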
The invention has the beneficial effects that:
1) The invention expresses the producing-area weather data as lattice nodes and further organizes the lattice nodes into a lattice structure. According to the characteristics of the lattice structure, the aggregation of similar lattice nodes can be realized through lattice contraction. Because the lattice-node representation depends on the number of unique values of the data elements in each vector, the node length can become very long, which is unfavorable for the vector and matrix operations adopted by traditional lattice contraction. For this situation, the invention proposes a lattice contraction method based on adjacent bits, which realizes the contraction of lattice nodes in a progressive manner.
2) According to the lattice contraction result and the aggregation condition of similar lattice nodes, the method divides the source data set into different subsets by analyzing the subscripts of the quantum bits (or binary bits) in the data samples, thereby realizing the detection of repeated data samples in the same subset. For the fusion of data samples in subsets, the quantum-based fusion model is based on a maximum observed probability strategy, whereas the classical fusion model is based on a weighted average strategy.
Drawings
FIG. 1 shows the logical process of producing-area weather data fusion based on lattice contraction.

FIG. 2 shows the quantum-based lattice structure; each quantized lattice node contains M qubits.

FIG. 3 shows one round of contraction of a lattice node, where the unique-value numbers m_1, m_2, ..., m_j, ..., m_L of the data vectors are set to be even and m_1/2, m_2/2, ..., m_j/2, ..., m_L/2 are odd.

FIG. 4 shows the classical lattice structure; each lattice node contains M binary bits.

FIG. 5 shows one round of qubit subscript migration, where 2k'_j = k_j and 2m'_j = m_j. A qubit in |x_ij>' is the contraction result of two qubits in |x_ij>: if the subscripts of those two qubits are 2k'_j - 1 and 2k'_j, then the subscript of the resulting qubit equals 2k'_j - 1, and it occupies the k'_j-th position in the subscript sequence of |x_ij>', while the two original qubits occupy the (2k'_j - 1)-th and (2k'_j)-th positions in the subscript sequence of |x_ij>.

FIG. 6 shows the fusion of the data samples in a subset.
Detailed Description
The embodiments of the invention are further described below with reference to the drawings:
1) Lattice contraction based on adjacent bits and quantum representation

As the basic unit of the lattice structure, a lattice node depends on the assignment of the data elements in one data sample. For a data sample s_i in a source producing-area weather data set X = [s_1, s_2, ..., s_i, ..., s_n]^T, the data elements x_i1, x_i2, ..., x_ij, ..., x_iL come from L different data vectors X_1, X_2, ..., X_j, ..., X_L. L is determined by the number of producing-area weather attributes. Here, X_1, X_2, ..., X_j, ..., X_L denote the following producing-area weather attributes, respectively: average maximum air temperature, average minimum air temperature, historical maximum air temperature, historical minimum air temperature, weekly average air temperature, weekly air temperature deviation, weekly cumulative rainfall, weekly rainfall deviation, 24-hour maximum rainfall (weekly value), monthly cumulative rainfall, annual cumulative rainfall, average maximum relative humidity, average minimum relative humidity, days with air temperature above 90°F, days with air temperature below 32°F, days with rainfall above 0.01 inches, and days with rainfall above 0.5 inches. The data sample s_i is thus defined as (x_i1, x_i2, ..., x_ij, ..., x_iL). Typically, the data samples are recorded once a week, and the producing-area weather data set X contains more than 15 years of data sample records for a given producing area. If the data vector X_j has m_j unique data-element values, the data element x_ij can be quantized into a quantum state |x_ij> containing m_j qubits. Only one qubit in each such quantum state equals |1>, and its subscript is consistent with the position of x_ij in the sequence of the m_j unique values. Once the unique-value numbers m_1, m_2, ..., m_j, ..., m_L of all data vectors are determined, the data sample s_i can be quantized into a quantum state with M = m_1 + m_2 + ... + m_L qubits. It is apparent that exactly L qubits in each quantized sample equal |1>. The lattice structure is constructed by distributing the quantized samples between the all-|0> quantum state |00...0...00> and the all-|1> quantum state |11...1...11>, as shown in FIG. 2.

The first layer of the lattice structure has only one node, which contains M qubits equal to |0> and has M child nodes; each node in the second layer has M-1 child nodes and one qubit equal to |1>; each node in the third layer has M-2 child nodes and two qubits equal to |1>; and so on for the other layers. Therefore, the (M/2+1)-th layer has the largest number of lattice nodes, and the quantized samples containing L qubits equal to |1> are located in the (L+1)-th layer. Taking the data sample |s_i> as an example, the progressive lattice contraction process based on adjacent bits is described below.

The quantum representation |s_i> of a data sample s_i consists of the quantum representations of the L data elements it contains, and the quantized element |x_ij> containing m_j qubits has the expression given in equation (1), where each qubit equals |0> or |1>. The quantized data sample |s_i> is thus defined by equation (2).

Typically, lattice contraction is achieved by vector and matrix operations. However, as shown in equation (2), the long quantized data samples are unfavorable for vector and matrix operations and for the definition of aggregation relationships between quantized data samples. If the length of the quantized data sample expressions can be shortened in a progressive manner, the definition of the aggregation relationships between the quantized data samples becomes easier. In this process, the lattice nodes are contracted in a controlled and progressive manner, as shown in FIG. 3. Since the expression of the sample |s_i> is composed of the expressions of the elements it contains, the contraction of |s_i> decomposes into the contraction of those elements. The contraction of two adjacent qubits can be achieved by a controlled operation: the two adjacent qubits serve as (non-joint) control bits and a qubit |0> is taken as the target bit; if either of the two control qubits equals |1>, an X gate is applied to the target qubit |0>. When a particular attribute value of a data vector reaches a given threshold, the contraction of the data elements in that vector ends.

For setting the contraction threshold, the degree of disorder of the data system and the number of unique elements it contains need to be considered. Specifically, the contraction of the lattice nodes and the lattice structure can reduce the degree of disorder, i.e. the number of microscopic states involved in the data system, while avoiding a drastic reduction in the number of unique data elements. Moreover, if the speed at which the number of unique data elements decreases is less than the contraction speed of the lattice structure, the accuracy of the contraction result can be further improved. For this purpose, the following inequality (3) is set as the lattice contraction threshold based on adjacent bits, where m_j^(r) denotes the number of unique data elements contained in the j-th data vector after the r-th round of lattice contraction. If inequality (3) is satisfied, the lattice contraction process continues; otherwise, it terminates. In other words, the lattice contraction process continues as long as the unique data elements decrease more slowly than the lattice node length shortens.
2) Lattice contraction based on adjacent bits and classical representation

The principle of lattice contraction based on adjacent bits and the classical representation is similar to that based on adjacent bits and the quantum representation. By replacing the quantum states in FIG. 2 with binary states, a classical lattice structure is obtained, as shown in FIG. 4, in which each lattice node contains M binary bits. Compared with the contraction of two qubits, the contraction of two binary bits can be achieved by a logical OR operation. Because the contraction of lattice nodes leads to the aggregation of similar data elements in a data vector, the differences between the data elements in a data vector are enlarged after one round of contraction. Based on this, the vector variance, which reflects the dispersion among the nodes in a vector, is taken as the key factor of the contraction threshold, expressed as inequality (4) in terms of the variance of the j-th data vector after the r-th round of lattice contraction. That is, if the vector variance keeps increasing, the lattice contraction process continues.
3) Quantum-based duplicate detection and data fusion
For quantum-based duplicate detection, it is not easy to divide the source data set directly into different subsets. Therefore, a duplicate detection model needs to be built using classical computing ideas. Here, the duplicate detection model is built by analyzing the subscripts of the data elements after lattice contraction. Taking a data sample s_i and the data elements it contains as an example, the subscript analysis process for the data elements is set forth.

According to the quantum representation of a data element shown in equation (1), the subscripts 1, 2, ..., k_j, ..., m_j of the qubits in |x_ij> represent the sequence numbers of all its unique values. During the contraction of |x_ij>, the subscript of each qubit remains unchanged; that is, after one round of contraction, the subscript values of the qubits in the contraction result |x_ij>' are derived from the subscript values of the qubits in |x_ij>. FIG. 5 shows the subscript migration of the qubits after one round of contraction.

For convenience, the result of the data element |x_ij> after r rounds of contraction is denoted |x_ij>^(r) and has the defined form of equation (5). In |x_ij>^(r), only one qubit equals |1>; this qubit is called the pole qubit. As the quantum data elements in a vector contract, the subscript gap between the pole qubits of two data elements becomes smaller. Taking the quantum states |x_i1> and |x_i2> as an example, the change of the subscript gap between two pole qubits is illustrated below, where the subscript sequence of |x_i1> and |x_i2> is {1, 2, ..., 8, ..., 15, 16}.

Clearly, |x_i1> and |x_i2> each have one pole qubit, and before contraction the gap between the subscripts of these pole qubits is evaluated over the subscript sequence {1, 2, ..., 16}. After one round of contraction, |x_i1> and |x_i2> become |x_i1>' and |x_i2>', whose subscript sequence is {1, 3, 5, 7, 9, 11, 13, 15}, and the subscript gap between their pole qubits is reduced accordingly. In the same way, after two rounds of contraction |x_i1>'' and |x_i2>'' have the subscript sequence {1, 5, 9, 13}, and the subscript gap between their pole qubits is reduced further.
The change of the subscript gap between pole qubits indicates that, as the data elements in a vector contract, the subscript gap between the pole qubits is reduced by no less than r after r rounds of contraction. Therefore, the setting of the duplicate detection criterion mainly considers the reduction of the subscript gap and the different numbers of contraction rounds of the data elements in the same data sample. Let the contraction result of the data sample |s_i> be |s_i>^rdt; the pole qubits of |s_i>^rdt are those of the data elements contained in |s_i>^rdt.

Thus, the duplicate detection criterion is defined by inequality (7), where r_j is the number of contraction rounds of the data elements in the j-th data vector. If two data samples |s_i> and |s_i'> satisfy inequality (7), they are duplicate data belonging to one subset. According to the foregoing definitions, the expression can be converted into equation (8), so that the duplicate detection criterion is finally expressed as inequality (9).

When all duplicate data samples in a subset have been determined, they are fused into a single target data sample. For the quantum-based data fusion method, the collapse caused by measuring a quantum superposition state means that the focus of interest is the representativeness of the target data sample, and this representativeness can be reflected by the observation probability. Thus, for the source data samples in a subset, the observation probability of every data element in each vector needs to be counted, and the data element with the highest observation probability in a vector is selected as the target data element of that vector. Taking the data samples of a data subset ss_t as an example, the quantum-based data fusion model is presented, where n_t denotes the number of data samples in the subset ss_t. If the source data set is divided into T subsets, then n_1 + ... + n_t + ... + n_T = n.

The fusion of these data samples can be decomposed into the fusion of the data elements in the different data vectors. Let the target data sample be |s_t>; it represents the fusion result of the source data samples in the subset ss_t. A target data element of |s_t> is the fusion of the corresponding source data elements of data vector X_j in the source data samples and equals the source data element with the largest observation probability, as given by equation (10), where max(p_tj) denotes this largest observation probability and k = 1, ..., h_t, ..., n_t indexes the samples of the subset. Algorithm 1 shows the quantum-based duplicate detection and data fusion process. With step 23 as the basic operation, the computational complexity of Algorithm 1 is O(n^2 L).
4) Classical duplicate detection and data fusion

Classical and quantum-based lattice contraction follow the same principles, and duplicate detection depends on the lattice contraction results, so the classical duplicate detection model is built on the same principle as the quantum-based one. For the classical data fusion model, the focus is on the inclusiveness of the fusion result; in other words, a target data element contains all the components of the associated source data elements. Taking the data samples in the subset ss_t as an example, the classical data fusion model is presented. The target data sample corresponding to the source data samples in the subset ss_t is s_t, and for the source data elements of the data vector X_j, a simple way to calculate the corresponding target data element of s_t is to use a weighted-average strategy, given by equation (11), in which each source data element contributes according to its weight (frequency). Generally, the weighted-average strategy can increase the diversity of the fusion result, i.e. the number of unique data elements in the target data set, thereby improving the accuracy of the target data set. The classical duplicate detection and data fusion process can likewise refer to Algorithm 1.
Example:
1) Input a producing-area weather data set X = [s_1, s_2, ..., s_i, ..., s_n]^T containing n samples, where n is the number of data samples recorded for a given producing area over more than 15 years. The data elements of each sample s_i come from L different data vectors X_1, X_2, ..., X_j, ..., X_L. All data elements are first integer-preprocessed, and then each sample is converted into a lattice node (a quantum state or a binary state); a small sketch of this preprocessing is given after these steps. For the elements in vector X_j, the number of unique values is m_j = max(X_j) - min(X_j), where max(X_j) and min(X_j) are the maximum and minimum values of the elements in X_j, respectively. From equations (1) and (2), the lattice representation (quantum or binary states) of each sample is obtained and organized into the lattice structures shown in FIG. 2 and FIG. 4.
2) For each element contained in a lattice node, contraction is performed as shown in FIG. 3. The contraction of the elements in a vector is complete once they satisfy the conditions given by inequalities (3) and (4). When the elements in all vectors have been contracted, the lattice contraction result is obtained.

3) According to the lattice contraction process, subscript analysis is carried out on the qubits or binary bits in each element according to FIG. 5 and equation (6), giving the contracted elements of equation (5) and the samples composed of these elements. The data samples in each subset are then obtained according to inequalities (7) and (9).

4) The data samples in each subset are fused into a single target sample according to FIG. 6 and equations (10) and (11). When the data samples in all subsets have been fused, the final producing-area weather data fusion result is obtained.
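The integer preprocessing of step 1) is not specified beyond producing integer values; the sketch below, referenced in step 1), assumes simple rounding and computes m_j = max(X_j) - min(X_j) as in the embodiment.

```python
def integer_preprocess(vector):
    """Round every raw attribute value to an integer (assumed preprocessing)."""
    return [int(round(v)) for v in vector]

def unique_value_number(vector):
    """m_j = max(X_j) - min(X_j), as used in the embodiment."""
    return max(vector) - min(vector)

X_j = integer_preprocess([68.2, 71.9, 70.4, 75.1])   # -> [68, 72, 70, 75]
print(unique_value_number(X_j))                      # -> 7
```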

Claims (1)

1. A producing-area weather data fusion method based on lattice contraction, characterized by comprising the following steps:
1) Lattice contraction based on adjacent bits and quantum representation:

for a data sample s_i in a source producing-area weather data set X = [s_1, s_2, ..., s_i, ..., s_n]^T, the data elements x_i1, x_i2, ..., x_ij, ..., x_iL come from L different data vectors X_1, X_2, ..., X_j, ..., X_L; L is determined by the number of producing-area weather attributes; here, X_1, X_2, ..., X_j, ..., X_L denote the following producing-area weather attributes, respectively: average maximum air temperature, average minimum air temperature, historical maximum air temperature, historical minimum air temperature, weekly average air temperature, weekly air temperature deviation, weekly cumulative rainfall, weekly rainfall deviation, 24-hour maximum rainfall, monthly cumulative rainfall, annual cumulative rainfall, average maximum relative humidity, average minimum relative humidity, days with air temperature above 90°F, days with air temperature below 32°F, days with rainfall above 0.01 inches, and days with rainfall above 0.5 inches;

the data sample s_i is defined as (x_i1, x_i2, ..., x_ij, ..., x_iL); data samples are recorded once a week, and the producing-area weather data set X contains more than 15 years of data sample records for a given producing area; if the data vector X_j has m_j unique data-element values, the data element x_ij is quantized into a quantum state |x_ij> containing m_j qubits, in which only one qubit equals |1> and its subscript is consistent with the position of x_ij in the sequence of the m_j unique values; once the unique-value numbers m_1, m_2, ..., m_j, ..., m_L of all data vectors are determined, the data sample s_i is quantized into a quantum state with M qubits; clearly, exactly L qubits in each quantized sample equal |1>; the lattice structure is constructed by distributing the quantized samples between the all-|0> quantum state |00...0...00> and the all-|1> quantum state |11...1...11>;

the first layer of the lattice structure has only one node, which contains M qubits equal to |0> and has M child nodes; each node in the second layer has M-1 child nodes and one qubit equal to |1>; each node in the third layer has M-2 child nodes and two qubits equal to |1>; and so on for the other layers; the (M/2+1)-th layer has the largest number of lattice nodes, and the quantized samples containing L qubits equal to |1> are located in the (L+1)-th layer; taking the data sample |s_i> as an example, the progressive lattice contraction process based on adjacent bits is as follows;

the quantum representation |s_i> of the data sample s_i consists of the quantum representations of the L data elements it contains, and the quantized element |x_ij> containing m_j qubits has the expression given in equation (1), in which each qubit equals |0> or |1>; the quantized data sample |s_i> is defined by equation (2);

since the expression of the sample |s_i> is composed of the expressions of the elements it contains, the contraction of |s_i> decomposes into the contraction of those elements; the contraction of two adjacent qubits is achieved by a controlled operation: the two adjacent qubits serve as control bits and a qubit |0> is taken as the target bit; if either of the two control qubits equals |1>, an X gate is applied to the target qubit |0>; when a particular attribute value of a data vector reaches a given threshold, the contraction of the data elements in that vector ends;

the following inequality (3) is set as the lattice contraction threshold based on adjacent bits, where m_j^(r) denotes the number of unique data elements contained in the j-th data vector after the r-th round of lattice contraction; if inequality (3) is satisfied, the lattice contraction process continues; otherwise, it terminates; that is, the contraction process continues as long as the unique data elements decrease more slowly than the lattice node length shortens;
2) Lattice contraction based on adjacent bits and classical representation:

the principle of lattice contraction based on adjacent bits and the classical representation is similar to that based on adjacent bits and the quantum representation; replacing the quantum states with binary states yields a classical lattice structure in which each lattice node contains M binary bits; compared with the contraction of two qubits, the contraction of two binary bits is achieved by a logical OR operation; because the contraction of lattice nodes leads to the aggregation of similar data elements in a data vector, the differences between the data elements in a data vector are enlarged after one round of contraction; based on this, the vector variance, which reflects the dispersion among the nodes in a vector, is taken as the key factor of the contraction threshold, expressed as inequality (4) in terms of the variance of the j-th data vector after the r-th round of lattice contraction; if the vector variance keeps increasing, the lattice contraction process continues;
3) Quantum-based duplicate detection and data fusion:

according to the quantum representation of a data element shown in equation (1), the subscripts 1, 2, ..., k_j, ..., m_j of the qubits in |x_ij> represent the sequence numbers of all its unique values; during the contraction of |x_ij>, the subscript of each qubit remains unchanged; that is, after one round of contraction, the subscript values of the qubits in the contraction result |x_ij>' are derived from the subscript values of the qubits in |x_ij>;

for convenience, the result of the data element |x_ij> after r rounds of contraction is denoted |x_ij>^(r) and has the defined form of equation (5); in |x_ij>^(r), only one qubit equals |1>, and this qubit is called the pole qubit; as the quantum data elements in a vector contract, the subscript gap between the pole qubits of two data elements becomes smaller; taking the quantum states |x_i1> and |x_i2> as an example, whose subscript sequence is {1, 2, ..., 8, ..., 15, 16}, the change of the subscript gap between two pole qubits is as follows;

clearly, |x_i1> and |x_i2> each have one pole qubit, and before contraction the gap between the subscripts of these pole qubits is evaluated over the subscript sequence {1, 2, ..., 16}; after one round of contraction, |x_i1> and |x_i2> become |x_i1>' and |x_i2>', whose subscript sequence is {1, 3, 5, 7, 9, 11, 13, 15}, and the subscript gap between their pole qubits is reduced accordingly; in the same way, after two rounds of contraction |x_i1>'' and |x_i2>'' have the subscript sequence {1, 5, 9, 13}, and the subscript gap between their pole qubits is reduced further;

the change of the subscript gap between pole qubits indicates that, as the data elements in a vector contract, the subscript gap between the pole qubits is reduced by no less than r after r rounds of contraction; therefore, the setting of the duplicate detection criterion mainly considers the reduction of the subscript gap and the different numbers of contraction rounds of the data elements in the same data sample; let the contraction result of the data sample |s_i> be |s_i>^rdt; the pole qubits of |s_i>^rdt are those of the data elements contained in |s_i>^rdt;

the duplicate detection criterion is defined by inequality (7), where r_j is the number of contraction rounds of the data elements in the j-th data vector; if two data samples |s_i> and |s_i'> satisfy inequality (7), they are duplicate data belonging to one subset; according to the foregoing definitions, the expression can be converted into equation (8), so that the duplicate detection criterion is finally expressed as inequality (9);

when all duplicate data samples in a subset have been determined, they are fused into a single target data sample; for the source data samples in a subset, the observation probability of every data element in each vector needs to be counted, and the data element with the highest observation probability in a vector is selected as the target data element of that vector; taking the data samples of a data subset ss_t as an example, the quantum-based data fusion model is as follows, where n_t denotes the number of data samples in the subset ss_t; if the source data set is divided into T subsets, then n_1 + ... + n_t + ... + n_T = n;

the fusion of these data samples can be decomposed into the fusion of the data elements in the different data vectors; let the target data sample be |s_t>, representing the fusion result of the source data samples in the subset ss_t; a target data element of |s_t> is the fusion of the corresponding source data elements of data vector X_j in the source data samples and equals the source data element with the largest observation probability, as given by equation (10), where max(p_tj) denotes this largest observation probability and k = 1, ..., h_t, ..., n_t indexes the samples of the subset;
4) Classical duplicate detection and data fusion:

for the classical data fusion model, the focus is on the inclusiveness of the fusion result; that is, a target data element contains all the components of the associated source data elements; taking the data samples in the subset ss_t as an example, the classical data fusion model is as follows: the target data sample corresponding to the source data samples in the subset ss_t is s_t, and for the source data elements of the data vector X_j, the corresponding target data element of s_t is calculated by a weighted-average strategy, given by equation (11), in which each source data element contributes according to its weight.
CN201711317380.1A 2017-12-12 2017-12-12 Method for fusing weather data of production area based on dot matrix shrinkage Active CN108038505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711317380.1A CN108038505B (en) 2017-12-12 2017-12-12 Method for fusing weather data of production area based on dot matrix shrinkage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711317380.1A CN108038505B (en) 2017-12-12 2017-12-12 Method for fusing weather data of production area based on dot matrix shrinkage

Publications (2)

Publication Number Publication Date
CN108038505A true CN108038505A (en) 2018-05-15
CN108038505B CN108038505B (en) 2020-07-03

Family

ID=62102289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711317380.1A Active CN108038505B (en) 2017-12-12 2017-12-12 Method for fusing weather data of production area based on dot matrix shrinkage

Country Status (1)

Country Link
CN (1) CN108038505B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019223598A1 (en) * 2018-05-25 2019-11-28 杭州海康威视数字技术股份有限公司 Method and device for fusing data table
CN113192157A (en) * 2021-05-14 2021-07-30 杭州电子科技大学 Unified lattice plane representation method for multi-source medical examination data


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030191610A1 (en) * 2002-03-26 2003-10-09 Hai-Wen Chen Method and system for multi-sensor data fusion using a modified dempster-shafer theory

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEIMIN PENG et al.: "Quantum inspired method of feature fusion based on von Neumann entropy", Information Fusion *
WEIMIN PENG et al.: "Using general master equation for feature fusion", IEEE Intelligent Systems *
彭伟民 (PENG Weimin): "特征数据的量子表示与融合方法" (Quantum representation and fusion methods of feature data), China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019223598A1 (en) * 2018-05-25 2019-11-28 杭州海康威视数字技术股份有限公司 Method and device for fusing data table
CN113192157A (en) * 2021-05-14 2021-07-30 杭州电子科技大学 Unified lattice plane representation method for multi-source medical examination data
CN113192157B (en) * 2021-05-14 2023-12-22 杭州电子科技大学 Unified lattice plane representation method for multi-source medical examination data

Also Published As

Publication number Publication date
CN108038505B (en) 2020-07-03

Similar Documents

Publication Publication Date Title
Xu et al. Pc-darts: Partial channel connections for memory-efficient architecture search
CN111612066B (en) Remote sensing image classification method based on depth fusion convolutional neural network
Feng et al. Autoencoder by forest
Wang et al. Accelerate cnns from three dimensions: A comprehensive pruning framework
CN110782663A (en) Road network traffic flow short-time prediction method combining time-space characteristics
CN108038505B (en) Method for fusing weather data of production area based on dot matrix shrinkage
Boutros et al. Quantface: Towards lightweight face recognition by synthetic data low-bit quantization
CN111027630B (en) Image classification method based on convolutional neural network
KR20220000871A (en) Neural network processing method and apparatus using temporally adaptive, region-selective signaling
Zhuang et al. Training compact neural networks with binary weights and low precision activations
CN113328755A (en) Compressed data transmission method facing edge calculation
CN108805844B (en) Lightweight regression network construction method based on prior filtering
Elthakeb et al. Divide and conquer: Leveraging intermediate feature representations for quantized training of neural networks
CN114528971A (en) Atlas frequent relation mode mining method based on heterogeneous atlas neural network
CN116680343A (en) Link prediction method based on entity and relation expression fusing multi-mode information
CN114037051A (en) Deep learning model compression method based on decision boundary
CN114638342A (en) Image anomaly detection method based on depth unsupervised automatic encoder
CN115938505A (en) Drug molecule screening method and system based on fusion of graph neural network block structure and multi-head attention mechanism
Minu et al. An efficient squirrel search algorithm based vector quantization for image compression in unmanned aerial vehicles
Mei et al. Deep residual refining based pseudo‐multi‐frame network for effective single image super‐resolution
Panguluri et al. Image generation using variational autoencoders
Lin et al. Optimization strategies in quantized neural networks: A review
CN114386600A (en) Network characterization method based on self-adaptive structure and position coding
Shymyrbay et al. Training-aware low precision quantization in spiking neural networks
CN109934304B (en) Blind domain image sample classification method based on out-of-limit hidden feature model

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant