CN113742428B - Neural network data set storage method based on blockchain

Neural network data set storage method based on blockchain

Info

Publication number
CN113742428B
CN113742428B
Authority
CN
China
Prior art keywords: source domain, dimensional, distribution, domain feature, low
Legal status: Active
Application number
CN202111102572.7A
Other languages
Chinese (zh)
Other versions
CN113742428A (en)
Inventor
朱永恒
朱登超
Current Assignee
Yidian Life E Commerce Co ltd
Original Assignee
Yidian Life E Commerce Co ltd
Application filed by Yidian Life E Commerce Co ltd
Priority to CN202111102572.7A
Publication of CN113742428A
Application granted
Publication of CN113742428B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the technical field of blockchains, in particular to a blockchain-based neural network data set storage method. The method first obtains the source domain feature distribution and the target feature distribution corresponding to each end from the data each end has stored in the blockchain and the data it has yet to store. According to the feature spaces in which the source domain feature distributions of any two ends lie, two source domain feature distributions from different feature spaces are mapped and aligned. From the resulting mapping alignment, the rationality of each target feature vector in the target feature distributions corresponding to the two source domain feature distributions is obtained, and from it the reasonable degree of each target feature distribution. The target end whose data to be stored is written to the blockchain is then determined according to a preset judgment condition. By using the reasonable degree of the data to be stored to select the best data to store, the invention improves the diversity and completeness of the data stored on the blockchain, so that neural networks with higher accuracy and stronger feature extraction capability can be trained.

Description

Neural network data set storage method based on blockchain
Technical Field
The invention relates to the technical field of blockchains, in particular to a neural network data set storage method based on a blockchain.
Background
With the rapid development of society, many industries need to record the large amounts of continuously generated data that can assist social construction and development. For example, traffic image data can assist the construction of urban traffic; financial and foreign trade data, such as client and transaction data, can be used for big data analysis, financial risk early warning, and the like; and medical image data in the field of medical health can be used for the diagnosis of diseases.
Existing blockchain-based storage methods store data in blocks at random. When a neural network is trained on the data in such a blockchain, training a network with strong feature extraction capability requires extracting a large amount of data from the chain, yet that data may be stored in blocks that are far apart or weakly correlated. This seriously slows down the training of the neural network and can also cause over-fitting and under-fitting, reducing the accuracy of the neural network.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a neural network data set storage method based on a blockchain, which adopts the following technical scheme:
One embodiment of the invention provides a neural network data set storage method based on a blockchain, which comprises the following steps:
obtaining the feature vectors of the data each end has stored in the blockchain and the feature vectors of the data to be stored acquired by each end, and obtaining the source domain feature distribution and the target feature distribution corresponding to each end;
according to the feature space where the source domain feature distribution corresponding to any two ends is located, mapping and aligning two source domain feature distributions of different feature spaces;
obtaining rationality of each target feature vector in target feature distribution corresponding to any two source domain feature distributions according to mapping alignment results of any two source domain feature distributions;
obtaining the reasonable degree of the target feature distribution of each end according to the rationality of each target feature vector in the target feature distribution corresponding to any two source domain feature distributions;
and determining the target end capable of storing the data to be stored in the block chain according to the reasonable degree of the target feature distribution of each end and a preset judging condition.
Preferably, the mapping and aligning the two source domain feature distributions in different feature spaces according to the feature space in which the source domain feature distribution corresponding to any two ends is located includes:
acquiring the low-dimensional source domain feature distribution and the high-dimensional source domain feature distribution among the source domain feature distributions corresponding to any two ends, wherein the dimension of the low-dimensional source domain feature distribution is smaller than that of the high-dimensional source domain feature distribution;

calculating the low-dimensional source domain Gram matrix corresponding to the low-dimensional source domain feature distribution and the up-dimensional source domain Gram matrix corresponding to the up-dimensional source domain feature distribution, the up-dimensional source domain feature distribution being the result of mapping and aligning the low-dimensional source domain feature distribution;

constructing an objective function:

$$\mathrm{Loss}=\left\|M_0\odot\left(\tilde{M}_a-M_a\right)\right\|_2+\left\|\tilde{D}_a-\left(D_b-\Delta D_b\right)\right\|_2+\left\|\Delta D_b\right\|_2$$

wherein $M_0$ is the attention matrix of the low-dimensional source domain feature distribution a; $\tilde{M}_a$ is the up-dimensional source domain Gram matrix corresponding to the up-dimensional source domain feature distribution $\tilde{a}$; $M_a$ is the low-dimensional source domain Gram matrix corresponding to the low-dimensional source domain feature distribution a; $\odot$ denotes the Hadamard product of the attention matrix $M_0$ and the matrix $\left(\tilde{M}_a-M_a\right)$; $\tilde{D}_a$ is the up-dimensional source domain feature matrix corresponding to $\tilde{a}$; $D_b$ is the high-dimensional source domain feature matrix corresponding to the high-dimensional source domain feature distribution b; $\Delta D_b$ is the noise matrix contained in $D_b$; $\|\cdot\|_2$ denotes the L2 norm;

solving the objective function for the up-dimensional source domain feature matrix $\tilde{D}_a$ and the preset noise matrix $\Delta D_b$ that minimize it;

and obtaining the up-dimensional source domain feature distribution and the equal-dimensional source domain feature distribution that result from mapping and aligning the low-dimensional source domain feature distribution and the high-dimensional source domain feature distribution.
Preferably, the method for acquiring the attention matrix includes:
presetting the attention matrix to have the same rows and columns as the low-dimensional source domain Gram matrix, with the elements at the same position in the two matrices corresponding to the same low-dimensional source domain feature vectors;
setting all element values of the attention matrix as preset values;
calculating Euclidean distance between any target feature vector in the target feature distribution at any end and each source domain feature vector in the corresponding source domain feature distribution; acquiring the minimum Top-k Euclidean distances corresponding to each target feature vector and the source domain feature vectors corresponding to the minimum Top-k Euclidean distances as first source domain feature vectors to form a first source domain set;
forming a total set from the plurality of first source domain sets; counting the number of occurrences of each first source domain feature vector in the total set, and resetting the element value at the position corresponding to each such source domain feature vector in the attention matrix to the number of occurrences of that first source domain feature vector;
And carrying out normalization processing on the element values of the attention matrix after reset to obtain the attention matrix after normalization.
Preferably, the method for obtaining the mapping alignment result includes:
for a low-dimensional target feature vector in the low-dimensional target feature distribution, acquiring the first source domain feature vector and the first source domain set corresponding to the low-dimensional target feature vector;
presetting an undetermined parameter sequence, and constructing a linear mathematical model containing the undetermined parameters from the undetermined parameter sequence and the first source domain feature vectors corresponding to the low-dimensional target feature vector;
the linear mathematical model is:

$$x=\sum_{q=1}^{Q}\theta_q a_q$$

wherein x is a low-dimensional target feature vector in the low-dimensional target feature distribution; $\theta_q$ is the q-th undetermined parameter in the undetermined parameter sequence; $a_q$ is the q-th first source domain feature vector corresponding to the low-dimensional target feature vector x; Q is the number of first source domain feature vectors;
obtaining the undetermined parameter sequence by using a RANSAC algorithm;
mapping and aligning the first source domain set corresponding to the low-dimensional target feature vector to obtain an up-dimensional first source domain set; and obtaining the mapping alignment result of the low-dimensional target feature vector in the low-dimensional target feature distribution corresponding to the low-dimensional source domain feature distribution according to the up-dimensional first source domain set and the undetermined parameters;
the mapping alignment result is:

$$f(x)=\sum_{q=1}^{Q}\theta_q\tilde{a}_q$$

wherein f(x) is the mapping alignment result of the low-dimensional target feature vector, and $\tilde{a}_q$ is the up-dimensional first source domain feature vector corresponding to the q-th first source domain feature vector.
Preferably, the obtaining the rationality of each target feature vector in the target feature distribution corresponding to any two source domain feature distributions according to the mapping alignment result of any two source domain feature distributions includes:
calculating the L2 norms of the differences between the mapping alignment result and any two up-dimensional source domain feature vectors in the up-dimensional source domain feature distribution, and the L2 norm of the difference between those two up-dimensional source domain feature vectors, and taking the average of the three L2 norms as the first dispersion degree;

calculating the L2 norms of the differences between the mapping alignment result and any two equal-dimensional source domain feature vectors in the equal-dimensional source domain feature distribution, and the L2 norm of the difference between those two equal-dimensional source domain feature vectors, and taking the average of the three L2 norms as the second dispersion degree;
obtaining the rationality of the target feature vector corresponding to the mapping alignment result according to the first dispersion degree, the second dispersion degree, and the absolute value of the difference between the inner products of the source domain feature vectors before and after the corresponding mapping alignment;
the rationality is calculated as:

$$R_x(a,b)=\sum_{y,z\in\tilde{a}}\rho_x(y,z)\,e^{-\Delta M_{yz}}+\sum_{m,n\in\bar{b}}\rho_x(m,n)$$

wherein $R_x(a,b)$ is the rationality of any target feature vector x in the low-dimensional target feature distribution a1 corresponding to the low-dimensional source domain feature distribution a and in the high-dimensional target feature distribution b1 corresponding to the high-dimensional source domain feature distribution b; y, z are any two up-dimensional source domain feature vectors in the up-dimensional source domain feature distribution $\tilde{a}$; m, n are any two equal-dimensional source domain feature vectors in the equal-dimensional source domain feature distribution $\bar{b}$; $\rho_x(y,z)$ is the first dispersion degree of the mapping alignment result f(x) with respect to y, z; $\rho_x(m,n)$ is the second dispersion degree of f(x) with respect to m, n; and $\Delta M_{yz}$ is the absolute value of the difference between the inner product of y, z and the inner product of the corresponding low-dimensional source domain feature vectors before mapping alignment.
Preferably, the obtaining the reasonable degree of the target feature distribution at each end according to the rationality of each target feature vector in the target feature distribution corresponding to any two source domain feature distributions includes:
the reasonable degree is calculated as:

$$P=\frac{1}{N}\sum_{b\in z}\sum_{x\in a1}R_x(a,b)$$

wherein P is the reasonable degree of the target feature distribution; $x\in a1$ means x is any target feature vector in the low-dimensional target feature distribution a1; $b\in z$ means b is any low-dimensional source domain feature distribution in the set z of all low-dimensional source domain feature distributions other than the low-dimensional source domain feature distribution a; and N is the number of low-dimensional source domain feature distributions contained in the set z.
Preferably, the determining, according to the reasonable degree of the target feature distribution of each end and a preset judgment condition, the target end capable of storing the data to be stored in the blockchain includes:
acquiring the final target feature distribution meeting the preset judgment condition and taking the end corresponding to it as the target end, the preset judgment condition being that the target feature distribution corresponds to the maximum value of the reasonable degree.
The invention has the following beneficial effects:
according to the embodiment of the invention, the characteristic vector of the data to be stored is divided into the source domain characteristic distribution and the target characteristic distribution according to the characteristic vector of the data stored in the block chain at each end and the characteristic vector of the data to be stored acquired at each end by using the block chain technology, so that the data which can be stored on the block chain in the data to be stored is acquired, and the accuracy of acquiring the storable data is improved; mapping the low-dimensional source domain feature distribution of each end to the high-dimensional source domain feature distribution, and improving the rationality calculation accuracy of the subsequent target feature vector; the rationality and the rationality of the target feature vector are analyzed, and the distribution relation between the data to be stored and the data features stored on the blockchain in different feature spaces can be further obtained according to the rationality; and storing the data to be stored into the blockchain according to the reasonable degree of the target feature vector. The method ensures that the correlation exists between the data contained in the block newly added to the block chain and the feature vectors contained in the data of other adjacent blocks, ensures that the data stored in the local block contains various, comprehensive and uniformly distributed feature information, is beneficial to training of the neural network, improves the training speed of the neural network, avoids the problems of over fitting and under fitting, improves the accuracy of the neural network, and is beneficial to training the neural network with higher accuracy and stronger feature extraction capability.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for storing a neural network data set based on a blockchain according to an embodiment of the present invention.
Detailed Description
In order to further explain the technical means and effects adopted by the invention to achieve its intended purpose, the specific implementation, structure, features, and effects of the blockchain-based neural network data set storage method according to the invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The embodiment of the invention provides a specific implementation of a blockchain-based neural network data set storage method. It is suitable for storing traffic data, i.e. image data captured under different weather conditions together with the label data corresponding to each image, and can likewise be used to store financial and foreign trade data or medical and health data. The images contain information such as roads, vehicles, pedestrians, and road signs, and each image carries a label; in the embodiment of the invention, a label is the mask image of the semantic regions (roads, vehicles, pedestrians, road signs, and the like) in the image, and such image data can be used to train neural networks such as semantic segmentation networks. A practitioner may also store other data, and the label data corresponding to each datum, from different individuals, enterprises, institutions, and so on.
The blocks on the blockchain in the embodiment of the invention are generated by the ends, where an end is an individual, enterprise, institution, or similar party with some computing power and data acquisition capability. In this embodiment there is already a blockchain storing a large amount of data, and a large amount of newly generated data still needs to be stored on it. Different DNN networks (Deep Neural Networks, hereinafter simply neural networks) can be trained on the data stored in the blockchain; these DNN networks may differ in structure, application scenario, or function, and may extract different features from the same data. Some DNN networks have already been trained for different purposes on the data of the existing blockchain and have a certain feature extraction capability.
When a new DNN network is trained with the data on the blockchain, the data must either be read from the blockchain continuously or be cached locally before training. The data used to train a DNN network comes from certain local stretches of blocks on the blockchain, and its features need to have a good, diverse, and complete distribution, so that the DNN network can learn comprehensive and accurate features. This avoids the severe over-fitting or under-fitting caused by unbalanced data distributions and thereby improves the accuracy of the DNN network.
The idea of the embodiment of the invention is to obtain data and store it on the blockchain if it can increase the diversity and completeness of the features of the data already stored there. Data transmission, communication, encryption and decryption, identity authentication, and the mechanics of storage are not considered when storing data with blockchain technology, since these are all prior art.
The embodiment of the invention uses blockchain technology to divide feature vectors into source domain feature distributions and target feature distributions, maps and aligns the low-dimensional source domain feature distribution of each end to the high-dimensional source domain feature distribution, analyzes the rationality and reasonable degree of the target feature vectors, obtains the optimal target feature distribution according to the reasonable degree, and stores the data to be stored corresponding to the optimal target feature distribution in the blockchain. In this way, the data to be stored that corresponds to the optimal target feature distribution increases the diversity and completeness of the data stored on the blockchain, which helps train DNN networks with higher accuracy and stronger feature extraction capability and improves training accuracy and efficiency.
The following specifically describes a specific scheme of the neural network data set storage method based on the blockchain, which is provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a method for storing a neural network data set based on a blockchain is shown in an embodiment of the present invention.
Step S100, obtaining feature vectors of data stored in the block chain by each end and feature vectors of data to be stored obtained by each end, and obtaining source domain feature distribution and target feature distribution corresponding to each end respectively.
All ends can acquire a trained DNN network, one end corresponding to one DNN network. These DNN networks are trained on the data on the blockchain and may differ in structure and function.
Each end uses its own DNN network to extract the feature vector of the data stored on the latest Q blocks on the blockchain, in this embodiment the value of Q is 50.
The specific method for extracting the feature vector of the stored data of the latest block on the block chain comprises the following steps:
(1) Each end reads all the data stored in the latest Q blocks on the blockchain and then obtains the feature vector extracted by its DNN network from each piece of data; each feature vector is a high-dimensional vector and can be regarded as a point in a high-dimensional space.

Here, the feature vector extracted by the DNN network from a piece of data is obtained as follows: the data is input into the DNN network, the feature map or feature vector output by a specific convolution layer of the DNN network is taken, and that feature map or feature vector is flattened into a one-dimensional vector; the result is the feature vector of the data. The specific layer can be, for example, the encoder output of a semantic segmentation network (such as SegNet or U-Net) or of a key point detection network, or the output of the last fully connected layer of a fully connected classification or regression network.

The feature vectors extracted by a DNN network are abstract, high-dimensional representations of the input data, although their dimension is usually much lower than that of the input data itself. Note that which DNN network is used, and which convolution layer's feature map or feature vector serves as the feature vector of the data, is decided by each end; the embodiment of the invention is only concerned with how the feature vectors extracted at each end are distributed, not with how they are obtained.
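As an illustration of this extraction step, the following sketch flattens the output of a chosen layer into a one-dimensional feature vector. The torchvision backbone and the layer choice are stand-ins, since the patent leaves both decisions to each end:

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # placeholder for an end's trained DNN network
model.eval()
# keep everything up to and including global average pooling, drop the FC head
encoder = torch.nn.Sequential(*list(model.children())[:-1])

def extract_feature(image_tensor):
    """image_tensor: (3, H, W) input datum; returns the flattened 1-D feature vector."""
    with torch.no_grad():
        fmap = encoder(image_tensor.unsqueeze(0))  # output of the chosen layer
    return fmap.flatten()  # flatten the feature map into a one-dimensional vector

v = extract_feature(torch.randn(3, 224, 224))  # stand-in for one image datum
```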
(2) Each end acquires some data to be stored, i.e. data that needs to be stored on the blockchain but has not yet been stored there, and extracts the feature vector of each piece of this data using its own DNN network. Note that the data to be stored acquired by the various ends may differ, owing to different data sources or to delays in network data transmission.
At this point, each end has used its own DNN network to extract a number of data feature vectors. A large portion of these feature vectors is extracted from the data stored on the blockchain; they constitute a feature distribution, i.e. a set of feature vectors, referred to as the source domain feature distribution.
Another portion of the feature vectors is extracted from the data to be stored; these likewise constitute a feature distribution, i.e. a set of feature vectors, referred to as the target feature distribution. Note that a feature distribution can also be regarded as a matrix in which each row corresponds to one feature vector: the matrix corresponding to the source domain feature distribution is called the source domain feature vector matrix, and the matrix corresponding to the target feature distribution is called the target feature vector matrix.
Each end shares the source domain feature distribution and target feature distribution it has obtained with all other ends, i.e. each end can obtain the source domain feature distributions and target feature distributions of the other ends.
It should be noted that one end corresponds to one source domain feature distribution and one target feature distribution; a feature distribution is in fact a set of feature vectors and can also be represented by a matrix, one matrix corresponding to one feature distribution.
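A minimal sketch of how the extracted vectors could be stacked into the two matrices just described, one row per feature vector; the counts and the dimension 512 are illustrative:

```python
import numpy as np

stored_vectors = [np.random.randn(512) for _ in range(50)]   # from blockchain data
pending_vectors = [np.random.randn(512) for _ in range(10)]  # from data to be stored

D_source = np.vstack(stored_vectors)    # source domain feature vector matrix
D_target = np.vstack(pending_vectors)   # target feature vector matrix
```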
Step S200, according to the feature space where the source domain feature distribution corresponding to any two ends is located, mapping and aligning two source domain feature distributions of different feature spaces.
Acquire the source domain feature distributions corresponding to any two ends, say source domain feature distribution a and source domain feature distribution b. Because the two source domain feature distributions are sets of feature vectors extracted from the data of the same blockchain by different DNN networks, they characterize different features of the same data, so the spatial dimensions in which they lie may differ.
In the embodiment of the invention, the source domain feature distribution a is assumed to lie in a low-dimensional space and b in a high-dimensional space, the dimension of the low-dimensional source domain feature distribution being smaller than that of the high-dimensional source domain feature distribution; that is, a is the low-dimensional source domain feature distribution and b is the high-dimensional source domain feature distribution. A feature space is determined by the number of feature directions, which is determined by the number of non-collinear feature vectors.
Mapping and aligning two source domain feature distributions in different feature spaces means mapping the source domain feature vectors of the low-dimensional space up into the space of the high-dimensional source domain feature vectors. Conventional up-dimensioning alignment methods use projective transformations for the spatial transformation, but projective transformations are generally applicable only to linear spatial transformations.
The embodiment of the invention provides a mapping up-dimensioning alignment method, hereinafter called the mapping alignment method, which aligns one low-dimensional source domain feature distribution with another high-dimensional source domain feature distribution. The idea is to up-dimension the feature vectors of the low-dimensional source domain feature distribution so that the up-dimensioned result differs little from the feature vectors of the high-dimensional source domain feature distribution. The precondition is that the two source domain feature distributions must be feature distributions of different dimensions characterizing the same quantity of data. If the two source domain feature distributions have the same dimension, the method still applies: one of them is simply chosen arbitrarily and treated as low-dimensional. This special case of equal dimensions is therefore not described separately in the embodiment of the invention.
The mapping alignment method comprises the following specific steps:
(1) Denote the up-dimensional source domain feature distribution obtained by mapping and aligning the low-dimensional source domain feature distribution a as $\tilde{a}$. Note that $\tilde{a}$ is still a quantity to be solved. From $\tilde{a}$ the corresponding up-dimensional source domain feature matrix $\tilde{D}_a$ is obtained; a feature distribution can be represented as a feature vector matrix, each row of which corresponds to one feature vector of the distribution.
Calculate the low-dimensional source domain Gram matrix $M_a$ corresponding to all low-dimensional source domain feature vectors in the low-dimensional source domain feature distribution a, where one source domain feature distribution corresponds to one source domain Gram matrix, and likewise the up-dimensional source domain Gram matrix $\tilde{M}_a$ corresponding to all up-dimensional source domain feature vectors in $\tilde{a}$. In $M_a$, the inner product of any two low-dimensional source domain feature vectors of a is one element of the matrix: specifically, if the inner product of the i-th and the j-th low-dimensional source domain feature vector of a is v, the element in row i, column j of $M_a$ is v.

The low-dimensional source domain Gram matrix $M_a$ represents the geometric relationship of the feature vectors in the low-dimensional source domain feature distribution a. Each row of $M_a$ corresponds to one low-dimensional source domain feature vector, and the elements of that row are the inner products of this feature vector with all low-dimensional source domain feature vectors; each column likewise corresponds to one low-dimensional source domain feature vector, with the analogous interpretation. The numbers of rows and columns of $M_a$ both equal the number of feature vectors in a, and the same holds for the up-dimensional source domain Gram matrix $\tilde{M}_a$.
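As a sketch, the Gram matrix of a feature distribution stored as a row-per-vector matrix is a single matrix product; the shapes below are illustrative:

```python
import numpy as np

def gram_matrix(D):
    # element (i, j) is the inner product of feature vectors i and j,
    # so the matrix encodes the geometric relations of the distribution
    return D @ D.T

D_a = np.random.randn(50, 128)  # 50 low-dimensional source vectors of dimension 128
M_a = gram_matrix(D_a)          # 50 x 50 low-dimensional source domain Gram matrix
```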
(2) Let the high-dimensional source domain feature matrix of the high-dimensional source domain feature distribution b be $D_b$, and preset the noise matrix $\Delta D_b$ contained in $D_b$. The noise matrix $\Delta D_b$ has the same size as $D_b$; it characterizes the noise contained in $D_b$, i.e. the error of $D_b$.
The purpose of denoising the high-dimensional source domain feature matrix is as follows. Since the dimension of the up-dimensional source domain feature distribution after mapping alignment equals the dimension of the high-dimensional source domain feature distribution, alignment aims to make the feature vectors of the two distributions as similar as possible; without denoising, the high-dimensional source domain feature distribution may contain a large number of high-dimensional feature vectors that differ from the up-dimensional feature vectors of the up-dimensional source domain feature distribution, so the purpose of alignment would not be achieved.
In the embodiment of the invention, the noise matrix $\Delta D_b$ is desired to be sparse, and the source domain feature distribution obtained by mapping and aligning the low-dimensional source domain feature distribution a is desired to differ as little as possible from the high-dimensional source domain feature distribution b. At the same time, the embodiment of the invention also aims to keep the geometric relationship encoded by the source domain Gram matrix before and after the up-dimensioning as undamaged as possible.
Therefore, an objective function Loss is constructed. In the embodiment of the invention, the smaller the objective function Loss, the better: a small Loss brings the up-dimensioned low-dimensional source domain feature vectors of a close to the high-dimensional source domain feature vectors of b, and keeps the geometric relationship before and after the up-dimensioning unchanged. When Loss reaches its minimum, the up-dimensional source domain feature distribution $\tilde{a}$, the corresponding up-dimensional source domain feature matrix $\tilde{D}_a$, and the noise matrix $\Delta D_b$ are obtained.

The objective function Loss is constructed as:

$$\mathrm{Loss}=\left\|M_0\odot\left(\tilde{M}_a-M_a\right)\right\|_2+\left\|\tilde{D}_a-\left(D_b-\Delta D_b\right)\right\|_2+\left\|\Delta D_b\right\|_2$$

wherein $M_0$ is the attention matrix of the low-dimensional source domain feature distribution a; $\tilde{M}_a$ is the up-dimensional source domain Gram matrix corresponding to the up-dimensional source domain feature distribution $\tilde{a}$; $M_a$ is the low-dimensional source domain Gram matrix corresponding to the low-dimensional source domain feature distribution a; $\tilde{D}_a$ is the up-dimensional source domain feature matrix corresponding to $\tilde{a}$; $D_b$ is the high-dimensional source domain feature matrix corresponding to the high-dimensional source domain feature distribution b; and $\Delta D_b$ is the noise matrix contained in $D_b$.

Here $D_b-\Delta D_b$ is the denoised high-dimensional source domain feature matrix, i.e. the equal-dimensional source domain feature matrix $\bar{D}_b$. The term $\left\|\tilde{D}_a-\left(D_b-\Delta D_b\right)\right\|_2$ measures the difference between the up-dimensional source domain feature matrix $\tilde{D}_a$ and $\bar{D}_b$; the embodiment of the invention wants this difference to be as small as possible, since the smaller it is, the smaller the gap between the up-dimensional source domain feature vectors of $\tilde{a}$ and the high-dimensional source domain feature vectors of b.

The term $\left(\tilde{M}_a-M_a\right)$ represents the change of the source domain Gram matrix of the low-dimensional source domain feature distribution a before and after mapping alignment, i.e. the change of the geometric relationship between its feature vectors. The smaller the difference between the up-dimensional source domain Gram matrix $\tilde{M}_a$ and the low-dimensional source domain Gram matrix $M_a$, the less the geometric relationship between the source domain feature vectors of a changes before and after mapping alignment.
$M_0\odot\left(\tilde{M}_a-M_a\right)$ denotes the Hadamard product of the attention matrix $M_0$ and the matrix $\left(\tilde{M}_a-M_a\right)$, i.e. the matrix formed by multiplying corresponding elements of the two matrices. The attention matrix $M_0$ is introduced to assign greater attention to those low-dimensional source domain feature vectors of a that have smaller Euclidean distances to the low-dimensional target feature vectors of the low-dimensional target feature distribution a1, so that the low-dimensional target feature vectors of a1 can be mapped and aligned later.

The attention matrix $M_0$ has the same size as the low-dimensional source domain Gram matrix $M_a$, i.e. the same rows and columns, and its elements characterize the importance of the low-dimensional source domain feature vectors in a. Each row and each column of $M_a$ corresponds to one low-dimensional source domain feature vector; each row and each column of $M_0$ corresponds to the same low-dimensional source domain feature vector, and the elements at the same position in $M_0$ and $M_a$ correspond to the same source domain feature vectors. For example, if a certain row or column of $M_a$ corresponds to the low-dimensional source domain feature vector u, the same row or column of $M_0$ also corresponds to u.
Specifically, the attention matrix $M_0$ is acquired as follows:
1) All element values of the attention matrix $M_0$ are set to a preset value; in the embodiment of the invention, the preset value is 1.0.
2) For any low-dimensional target feature vector c in the low-dimensional target feature distribution a1 corresponding to the low-dimensional source domain feature distribution a, the Euclidean distance between each low-dimensional source domain feature vector in a and c is calculated; the minimum Top-k Euclidean distances, and the low-dimensional source domain feature vectors corresponding to them, are acquired, and these Top-k low-dimensional source domain feature vectors are taken as first source domain feature vectors to form a first source domain set $\{a_1,a_2,\dots,a_q,\dots,a_Q\}$. In this embodiment, the practitioner can adjust the value of k according to the actual situation.

Note that since there are a plurality of low-dimensional target feature vectors in a1, each low-dimensional target feature vector has its own corresponding first source domain set $\{a_1,a_2,\dots,a_q,\dots,a_Q\}$.
A total set S is formed from the first source domain sets corresponding to all low-dimensional target feature vectors, and the number of occurrences of each low-dimensional source domain feature vector in S is counted. For example, if the first source domain feature vector $a_q$ occurs n times in S (all first source domain feature vectors are low-dimensional source domain feature vectors), then there are n low-dimensional target feature vectors in a1 whose Euclidean distance to $a_q$ is small. For each first source domain feature vector in S, the element values of its corresponding row and column in the attention matrix $M_0$ are then reset to the number of occurrences of that vector: for example, if the m-th row and m-th column of $M_0$ correspond to $a_q$, the element values of the m-th row and m-th column are set to the number of occurrences n of $a_q$. If the low-dimensional source domain feature vector corresponding to an element of the attention matrix does not belong to S, the element value is set to 0, i.e. it occurs 0 times in S.
3) The element values of the reset attention matrix $M_0$ are normalized: each element value is reset to the ratio of that element value to the sum of all element values, giving the normalized attention matrix $M_0$.
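The following sketch follows one literal reading of this construction: row i and column i of $M_0$ correspond to source vector i, and where a row and a column intersect the larger count is kept (an interpretation, since the patent does not specify the intersection case). D_a holds the low-dimensional source vectors, T_a1 the low-dimensional target vectors, one per row:

```python
import numpy as np

def attention_matrix(D_a, T_a1, k=5):
    n = D_a.shape[0]
    counts = np.zeros(n)                            # occurrences in the total set S
    for c in T_a1:
        dists = np.linalg.norm(D_a - c, axis=1)     # Euclidean distances to c
        counts[np.argsort(dists)[:k]] += 1          # Top-k nearest: first source set
    M0 = np.zeros((n, n))                           # vectors outside S stay at 0
    for i in range(n):
        M0[i, :] = np.maximum(M0[i, :], counts[i])  # row i of source vector i
        M0[:, i] = np.maximum(M0[:, i], counts[i])  # column i of source vector i
    total = M0.sum()
    return M0 / total if total > 0 else M0          # normalize the element values
```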
(3) Up to this point, the up-dimensional source domain feature distribution $\tilde{a}$ and the equal-dimensional source domain feature distribution $\bar{b}$ resulting from mapping and aligning the low-dimensional source domain feature distribution a and the high-dimensional source domain feature distribution b are obtained.
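The patent does not state how the objective function is minimized; the following sketch uses PyTorch autograd with Adam over the two unknown matrices purely to illustrate the three competing terms of Loss, under the assumption that the two distributions contain the same number of vectors:

```python
import torch

def map_align(D_a_low, D_b_high, M0, steps=2000, lr=1e-2):
    """D_a_low: (n, d_low) low-dimensional source matrix; D_b_high: (n, d_high)
    high-dimensional source matrix; M0: (n, n) attention matrix (float tensors)."""
    M_a = D_a_low @ D_a_low.T                                  # low-dim Gram matrix
    D_a_up = torch.randn(D_b_high.shape, requires_grad=True)   # unknown up-dim matrix
    dD_b = torch.zeros(D_b_high.shape, requires_grad=True)     # unknown noise matrix
    opt = torch.optim.Adam([D_a_up, dD_b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        geometry = torch.norm(M0 * (D_a_up @ D_a_up.T - M_a))  # Gram change, weighted
        closeness = torch.norm(D_a_up - (D_b_high - dD_b))     # match denoised b
        sparsity = torch.norm(dD_b)                            # keep the noise small
        (geometry + closeness + sparsity).backward()
        opt.step()
    # the up-dimensional feature matrix and the equal-dimensional (denoised) matrix
    return D_a_up.detach(), (D_b_high - dD_b).detach()
```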
And step S300, obtaining the rationality of each target feature vector in the target feature distribution corresponding to any two source domain feature distributions according to the mapping alignment result of any two source domain feature distributions.
Specifically, the rationality of each target feature vector in the target feature distributions is obtained as follows:
(1) First, any target feature vector x is taken from the low-dimensional target feature distribution a1 corresponding to the low-dimensional source domain feature distribution a or from the high-dimensional target feature distribution b1 corresponding to the high-dimensional source domain feature distribution b.
The target feature vector x may come from the low-dimensional target feature distribution a1 or from the high-dimensional target feature distribution b1, and the embodiment of the invention needs to map it into the high-dimensional space where the up-dimensional source domain feature distribution $\tilde{a}$ and the equal-dimensional source domain feature distribution $\bar{b}$ lie. Note that if x comes from b1, then b1 itself is in that high-dimensional space and no up-dimensioning is required: x is already in the high-dimensional space of $\tilde{a}$ and $\bar{b}$. If x comes from a1, its dimension is low, so x must be mapped and aligned into the high-dimensional space of $\tilde{a}$ and $\bar{b}$. Since a target feature vector from b1 needs no up-dimensioning, only the case where x comes from a1 and is up-dimensioned by mapping alignment is considered below.
(2) When the target feature vector x comes from the low-dimensional target feature distribution a1, x is a low-dimensional target feature vector and must now be up-dimensioned to the high-dimensional space.

The specific steps of up-dimensioning are as follows:
1) In the low-dimensional source domain feature distribution a, the low-dimensional source domain feature vectors corresponding to the minimum Top-k Euclidean distances to the target feature vector x are obtained, i.e. the first source domain set $\{a_1,a_2,\dots,a_q,\dots,a_Q\}$ described in 2) of step S200.
2) A sequence of undetermined parameters $\{\theta_1,\theta_2,\dots,\theta_q,\dots,\theta_Q\}$ is preset, and a linear mathematical model containing the undetermined parameters is constructed from the undetermined parameter sequence and the first source domain feature vectors. The embodiment of the invention solves the undetermined parameters $\{\theta_1,\theta_2,\dots,\theta_q,\dots,\theta_Q\}$ by means of the RANSAC algorithm.

The linear mathematical model containing the undetermined parameters can also be regarded as a hyperplane in the high-dimensional space. The random variable of the model is the first source domain feature vector $a_q$, and the sample data of the random variable is the first source domain set $\{a_1,a_2,\dots,a_q,\dots,a_Q\}$; the undetermined parameter sequence is solved so that the linear mathematical model fits the sample data. The RANSAC algorithm is a standard means of solving such a mathematical model and is not described further here.
The linear mathematical model containing the undetermined parameters is:

$$x=\sum_{q=1}^{Q}\theta_q a_q$$

wherein x is a low-dimensional target feature vector in the low-dimensional target feature distribution; $\theta_q$ is the q-th undetermined parameter in the undetermined parameter sequence; $a_q$ is the q-th first source domain feature vector corresponding to the low-dimensional target feature vector x; and Q is the number of first source domain feature vectors.
Note that the Top-k low-dimensional source domain feature vectors nearest to the low-dimensional target feature vector x (i.e. the first source domain feature vectors) are used because the closer a vector is to x, the more accurately it reflects the distribution around x. The distribution around x is assumed to remain unchanged before and after up-dimensioning: the linear relation between x and its Top-k nearest low-dimensional source domain feature vectors before up-dimensioning is kept consistent with the linear relation after up-dimensioning. On this basis, the low-dimensional target feature vector x is up-dimensioned to the high-dimensional space.
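The patent only names RANSAC for solving the undetermined parameters, so the subset size, tolerance, and the coordinate-wise inlier count in the following sketch are illustrative assumptions:

```python
import numpy as np

def fit_theta_ransac(x, A, iters=200, tol=1e-3, seed=0):
    """x: (d,) low-dimensional target vector; A: (Q, d) matrix whose rows are
    the first source domain feature vectors; returns theta of shape (Q,)."""
    rng = np.random.default_rng(seed)
    Q = A.shape[0]
    best_theta, best_inliers = np.zeros(Q), -1
    for _ in range(iters):
        subset = rng.choice(Q, size=max(1, Q // 2), replace=False)
        theta = np.zeros(Q)
        # least-squares fit of x as a linear combination of the sampled vectors
        theta[subset], *_ = np.linalg.lstsq(A[subset].T, x, rcond=None)
        residual = np.abs(A.T @ theta - x)     # per-coordinate residuals
        inliers = int((residual < tol).sum())  # coordinates explained well
        if inliers > best_inliers:
            best_theta, best_inliers = theta, inliers
    return best_theta
```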
3) According to the low-dimensional source domain feature distribution a and its corresponding up-dimensional source domain feature distribution $\tilde{a}$, the up-dimensional first source domain set $\{\tilde{a}_1,\tilde{a}_2,\dots,\tilde{a}_q,\dots,\tilde{a}_Q\}$ obtained by mapping and aligning the first source domain set $\{a_1,a_2,\dots,a_q,\dots,a_Q\}$ is acquired; the feature vectors in the up-dimensional first source domain set are the up-dimensional first source domain feature vectors, and the mapping alignment process is the same as in step S200.

The mapping alignment result of the low-dimensional target feature vector in the low-dimensional target feature distribution a1 is then obtained from the up-dimensional first source domain set and the undetermined parameters.
The mapping alignment result is:

$$f(x)=\sum_{q=1}^{Q}\theta_q\tilde{a}_q$$

wherein f(x) is the mapping alignment result of the low-dimensional target feature vector x; $\tilde{a}_q$ is the up-dimensional first source domain feature vector corresponding to the q-th first source domain feature vector; Q is the number of first source domain feature vectors; and $\theta_q$ is the q-th undetermined parameter in the undetermined parameter sequence. Note that the number of up-dimensional first source domain feature vectors equals the number of first source domain feature vectors.
Up to this point, the low-dimensional target feature vector x has been up-dimensioned and aligned, so that the up-dimensioned target feature vector lies in the same high-dimensional space as the up-dimensional source domain feature distribution $\tilde{a}$ and the equal-dimensional source domain feature distribution $\bar{b}$.
4) Finally, the mapping alignment result of any target feature vector x in the low-dimensional target feature distribution a1 or the high-dimensional target feature distribution b1 is:

$$f(x)=\begin{cases}\sum_{q=1}^{Q}\theta_q\tilde{a}_q, & x\in a1\\ x, & x\in b1\end{cases}$$

wherein $x\in a1$ means the target feature vector x comes from the low-dimensional target feature distribution a1, and $x\in b1$ means the target feature vector x comes from the high-dimensional target feature distribution b1.
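As a sketch, the mapping alignment result follows directly from the fitted parameters and the up-dimensional first source set; A_up below stands for the matrix whose rows are the up-dimensional first source domain feature vectors:

```python
import numpy as np

def mapping_result(x, theta, A_up, from_low_dim=True):
    # x from a1: up-dimension it with the fitted linear combination;
    # x from b1: it is already in the high-dimensional space, keep it as-is
    return A_up.T @ theta if from_low_dim else x
```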
(3) If some source domain feature vectors in the up-dimensional source domain feature distribution $\tilde{a}$ and the equal-dimensional source domain feature distribution $\bar{b}$ are far from the target feature vector x and their distribution is discrete, then the target feature vector x can increase the diversity and completeness of those source domain feature vectors. If some source domain feature vectors in $\tilde{a}$ and $\bar{b}$ are close to x and their distribution is concentrated, then x cannot add much to the diversity and completeness of the source domain feature vectors. In addition, if the geometric relationship of the source domain feature vectors of $\tilde{a}$ changes greatly during mapping alignment, the rationality of the target feature vector x with respect to the source domain feature vectors is also affected.
Specifically, the method for calculating rationality is as follows:
First, the L2 norm of the difference between the mapping alignment result and one up-dimensional source domain feature vector of the up-dimensional source domain feature distribution $\tilde{a}$ is calculated, then the L2 norm of the difference between the mapping alignment result and another up-dimensional source domain feature vector of $\tilde{a}$, and finally the L2 norm of the difference between the two up-dimensional source domain feature vectors; the average of the three L2 norms is taken as the first dispersion degree. The L2 norm here is the Euclidean distance between two feature vectors: a larger first dispersion degree means the mapping alignment result is more discrete with respect to the up-dimensional source domain feature vectors, and a smaller one means it is more concentrated.
The first dispersion degree is calculated as:

$$\rho_x(y,z)=\frac{1}{3}\left(\left\|f(x)-y\right\|_2+\left\|f(x)-z\right\|_2+\left\|y-z\right\|_2\right)$$

wherein $\rho_x(y,z)$ is the first dispersion degree with respect to the up-dimensional source domain feature vectors y, z of the up-dimensional source domain feature distribution $\tilde{a}$; f(x) is the mapping alignment result; y is any up-dimensional source domain feature vector in $\tilde{a}$; and z is any up-dimensional source domain feature vector in $\tilde{a}$ other than y.
Then the L2 norm of the difference between the mapping alignment result and one equal-dimensional source domain feature vector of the equal-dimensional source domain feature distribution $\bar{b}$ is calculated, then the L2 norm of the difference between the mapping alignment result and another equal-dimensional source domain feature vector of $\bar{b}$, and finally the L2 norm of the difference between the two equal-dimensional source domain feature vectors; the average of the three L2 norms is taken as the second dispersion degree. The L2 norm is again the Euclidean distance between two feature vectors: a larger second dispersion degree means the mapping alignment result is more discrete with respect to the equal-dimensional source domain feature vectors, and a smaller one means it is more concentrated.
The second degree of dispersion is calculated as follows:

ρ_x(m, n) = ( ‖f(x) − m‖₂ + ‖f(x) − n‖₂ + ‖m − n‖₂ ) / 3

wherein ρ_x(m, n) is the second degree of dispersion of the equal-dimensional source domain feature vectors m and n; f(x) is the mapping alignment result; m is any equal-dimensional source domain feature vector in the equal-dimensional source domain feature distribution b′; and n is any equal-dimensional source domain feature vector in b′ other than m.
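As a non-authoritative illustration, the following sketch computes both degrees of dispersion exactly as defined above; the function name and array shapes are assumptions introduced here, not part of the patented method.

```python
import numpy as np

def dispersion_degree(f_x: np.ndarray, u: np.ndarray, v: np.ndarray) -> float:
    """Average of the three pairwise Euclidean (L2) distances among the
    mapping alignment result f(x) and two source domain feature vectors.
    With (u, v) = (y, z) from the up-dimensional distribution this is the
    first degree of dispersion; with (u, v) = (m, n) from the
    equal-dimensional distribution it is the second degree of dispersion.
    """
    d1 = np.linalg.norm(f_x - u)  # ||f(x) - u||_2
    d2 = np.linalg.norm(f_x - v)  # ||f(x) - v||_2
    d3 = np.linalg.norm(u - v)    # ||u - v||_2
    return (d1 + d2 + d3) / 3.0
```

For example, dispersion_degree(f_x, y, z) yields ρ_x(y, z), and dispersion_degree(f_x, m, n) yields ρ_x(m, n).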
The rationality of the target feature vector corresponding to the mapping alignment result is then obtained from the first degree of dispersion, the second degree of dispersion, and the absolute value of the difference between the inner products of the source domain feature vectors before and after mapping alignment.
The rationality is calculated as follows:

R_x(a, b) = Σ_{y,z∈a′} ρ_x(y, z) · ΔM_yz + Σ_{m,n∈b′} ρ_x(m, n)

wherein R_x(a, b) is the rationality of any target feature vector x in the low-dimensional target feature distribution a1 corresponding to the low-dimensional source domain feature distribution a and in the high-dimensional target feature distribution b1 corresponding to the high-dimensional source domain feature distribution b; y, z ∈ a′ indicates that y and z are any two up-dimensional source domain feature vectors in the up-dimensional source domain feature distribution a′; m, n ∈ b′ indicates that m and n are any two equal-dimensional source domain feature vectors in the equal-dimensional source domain feature distribution b′; ρ_x(y, z) is the first degree of dispersion of the mapping alignment result f(x) and the up-dimensional source domain feature vectors y, z; ρ_x(m, n) is the second degree of dispersion of f(x) and the equal-dimensional source domain feature vectors m, n; and ΔM_yz is the absolute value of the difference between the inner product of the up-dimensional source domain feature vectors y, z and the inner product of the corresponding low-dimensional source domain feature vectors before mapping alignment.
The larger ρ_x(y, z) is, the more discretely distributed the mapping alignment result f(x) of the target feature vector x and the up-dimensional source domain feature vectors y, z are; putting the target feature vector x into the low-dimensional source domain feature distribution a then lets f(x) increase the diversity or completeness of the up-dimensional source domain feature vectors y, z. Similarly, the larger ρ_x(m, n) is, the more discretely distributed f(x) and the equal-dimensional source domain feature vectors m, n are; putting x into the high-dimensional source domain feature distribution b lets f(x) increase the diversity or completeness of the equal-dimensional source domain feature vectors m, n.
The larger ΔM_yz is, the more the inner product of the up-dimensional source domain feature vectors y, z differs from the inner product before mapping alignment. Denote the two low-dimensional source domain feature vectors in the low-dimensional source domain feature distribution a that correspond to y and z before mapping alignment as h and g. A large ΔM_yz means that h and g could not be aligned to their corresponding high-dimensional source domain feature vectors in the high-dimensional source domain feature distribution b with the original geometric relationship unchanged; that is, h and g are not perfectly aligned with their counterparts in b. The reason for the imperfect alignment is that the geometric relationship between the two low-dimensional feature vectors is not strong. To align h and g perfectly, other feature vectors need to be put in, increasing the number and diversity of the feature vectors in the low-dimensional source domain feature distribution a and the high-dimensional source domain feature distribution b, so that the geometric constraint between the two low-dimensional source domain feature vectors is strengthened and the alignment error is further reduced.
The larger ρ_x(y, z) · ΔM_yz or ρ_x(m, n) is, the more necessary it is to put the target feature vector x into the low-dimensional source domain feature distribution a and the high-dimensional source domain feature distribution b (the specific procedure is given in step S400 below). After the target feature vector x is put into the two distributions, the mapping alignment result f(x) joins the up-dimensional source domain feature vectors y, z, which increases their diversity or completeness; similarly, the diversity and completeness of the equal-dimensional source domain feature vectors m and n can be increased.
Therefore, the larger R_x(a, b) is, the more necessary the target feature vector x in the low-dimensional target feature distribution a1 and the high-dimensional target feature distribution b1 is to all feature vectors obtained after mapping alignment of the low-dimensional source domain feature distribution a and the high-dimensional source domain feature distribution b. In other words, the target feature vector x is important and reasonable: it is indispensable to the two source domain feature distributions and can increase the diversity and completeness of their feature vectors after mapping alignment. Equivalently, the data corresponding to the target feature vector x enables the data corresponding to the two source domain feature distributions to contain diverse and complete features, which is what the embodiment of the present invention requires.
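For illustration only, a sketch of R_x(a, b) under the pairwise-sum reconstruction above, reusing dispersion_degree from the previous sketch; the function name and argument layout are assumptions.

```python
import numpy as np
from itertools import combinations

def rationality(f_x, up_vectors, low_vectors, eq_vectors):
    """Sketch of R_x(a, b): over all pairs of up-dimensional source domain
    feature vectors, sum dispersion * |inner-product change|; over all
    pairs of equal-dimensional source domain feature vectors, sum the
    dispersion alone. low_vectors[i] is the pre-alignment low-dimensional
    vector corresponding to up_vectors[i].
    """
    r = 0.0
    for (i, y), (j, z) in combinations(enumerate(up_vectors), 2):
        # Delta M_yz: inner product after alignment vs. before alignment
        delta_m = abs(np.dot(y, z) - np.dot(low_vectors[i], low_vectors[j]))
        r += dispersion_degree(f_x, y, z) * delta_m
    for m, n in combinations(eq_vectors, 2):
        r += dispersion_degree(f_x, m, n)
    return r
```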
Step S400, obtaining the reasonable degree of the target feature distribution of each end according to the rationality of each feature vector in the target feature distribution corresponding to any two source domain feature distributions.
According to step S300, the rationality R_x(a, b) of the target feature vector x in the low-dimensional target feature distribution a1 and the high-dimensional target feature distribution b1, corresponding to the low-dimensional source domain feature distribution a and the high-dimensional source domain feature distribution b, is obtained.
For the low-dimensional target feature distribution a1, each low-dimensional target feature vector in a1 is a feature vector extracted by the DNN network from the data to be stored at the end corresponding to a1. The embodiment of the invention expects the feature vectors of the data to be stored to complement the feature vectors of the data already stored on the blockchain, that is, the former should enrich the latter so that the latter gains diversity, generalization, and completeness. The embodiment of the invention therefore uses the reasonable degree of the low-dimensional target feature distribution to describe whether the feature vectors of the data to be stored and the feature vectors of the data stored on the blockchain have complementarity, diversity, and completeness.
The reasonable degree is calculated as follows:

P = (1/N) Σ_{b∈z} Σ_{x∈a1} R_x(a, b)

wherein P is the reasonable degree of the target feature distribution; x is any target feature vector in the low-dimensional target feature distribution a1; b ∈ z indicates that b is any source domain feature distribution in the set z formed by all source domain feature distributions other than the low-dimensional source domain feature distribution a; and N is the number of source domain feature distributions contained in the set z.
Thus, given any source domain feature distribution, the reasonable degree of the corresponding target feature distribution can be calculated. That is, one end corresponds to one source domain feature distribution, one target feature distribution, and one reasonable degree.
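A minimal sketch of this per-end computation, assuming a helper align(x, a, b) that returns the quantities needed by rationality() above; both the helper and its signature are assumptions of this sketch.

```python
def reasonable_degree(target_vectors, a, other_distributions, align):
    """Sketch of P for one end: sum the rationality of every target
    feature vector over each of the N other source domain feature
    distributions, then average over those N distributions.
    """
    n = len(other_distributions)
    total = 0.0
    for b in other_distributions:
        for x in target_vectors:
            f_x, up_vecs, low_vecs, eq_vecs = align(x, a, b)
            total += rationality(f_x, up_vecs, low_vecs, eq_vecs)
    return total / n
```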
Step S500, determining the target end for storing the data to be stored in the blockchain according to the reasonable degree of the target feature distribution of each end and a preset judgment condition.
Based on the reasonable degrees of the target feature distributions, the final target feature distribution meeting the preset judgment condition and the end corresponding to it are obtained. In the embodiment of the invention, the preset judgment condition is to take the maximum reasonable degree, and the target feature distribution corresponding to the maximum reasonable degree is used as the final target feature distribution.
The data to be stored at the end corresponding to the final target feature distribution is stored in a new block, and the block is stored on the blockchain. The original data to be stored undergoes a series of processing steps to obtain data that can be stored on the blockchain; this storing process is the process of putting the target feature vector x into the low-dimensional source domain feature distribution a and the high-dimensional source domain feature distribution b described in step S300.
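Purely as an illustration of this selection-and-append step (the block layout, field names, and SHA-256 hashing are assumptions of this sketch, not part of the patent):

```python
import hashlib
import json
import time

def select_and_store(ends, chain):
    """Pick the end whose target feature distribution has the largest
    reasonable degree and append its pending data as a new block.
    `ends` is assumed to be a list of dicts with keys 'reasonable_degree'
    and 'pending_data'; `chain` is a list of block dicts.
    """
    winner = max(ends, key=lambda e: e["reasonable_degree"])
    block = {
        "data": winner["pending_data"],
        "prev_hash": chain[-1]["hash"] if chain else "0" * 64,
        "timestamp": time.time(),
    }
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True, default=str).encode()
    ).hexdigest()
    chain.append(block)
    return winner, block
```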
In general, according to the feature distribution of the data already stored on the blockchain and the feature distribution of the data to be stored at each end, the embodiment of the invention assigns a reasonable degree to the feature distribution of the data to be stored at each end, obtains the end with the largest reasonable degree, lets that end store its data in a block, and connects the block to the blockchain. This process of determining which end stores data on the blockchain is the consensus mechanism of the blockchain in the embodiment of the invention. In addition, the winning end needs to be rewarded, for example with economic benefits; how to reward is not the focus of the embodiment of the present invention and is therefore not described in detail.
As new blocks are continuously generated, more and more data is stored on the blockchain, and its feature distribution exhibits both local diversity and completeness, so that DNN networks for different tasks can be trained and different purposes realized.
In summary, the embodiment of the present invention uses blockchain technology as follows: according to the feature vectors of the data stored on the blockchain at each end and the feature vectors of the data to be stored obtained at each end, the feature vectors are divided into source domain feature distributions and target feature distributions. The low-dimensional source domain feature distribution of each end is mapped and aligned to the high-dimensional source domain feature distribution, the rationality of each target feature vector and the reasonable degree of each target feature distribution are analyzed, the optimal target feature distribution is obtained according to the reasonable degree, and the data to be stored corresponding to the optimal target feature distribution is stored in the blockchain. The data to be stored corresponding to the optimal target feature distribution increases the diversity and completeness of the data stored on the blockchain, so that a DNN network with higher accuracy and stronger feature extraction capability can be trained.
It should be noted that the order of the above embodiments of the present invention is for description only and does not represent the relative merits of the embodiments. The foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible and may be advantageous.

In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.

The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise forms disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (3)

1. A blockchain-based neural network data set storage method, comprising the steps of:
obtaining feature vectors of the data stored at each end in the blockchain and feature vectors of the data to be stored obtained by each end, and obtaining the source domain feature distribution and the target feature distribution corresponding to each end respectively;
according to the feature space where the source domain feature distribution corresponding to any two ends is located, mapping and aligning two source domain feature distributions of different feature spaces;
obtaining rationality of each target feature vector in target feature distribution corresponding to any two source domain feature distributions according to mapping alignment results of any two source domain feature distributions;
obtaining the reasonable degree of the target feature distribution of each end according to the rationality of each target feature vector in the target feature distribution corresponding to any two source domain feature distributions;
determining a target end capable of storing data to be stored in the blockchain according to the reasonable degree of the target feature distribution of each end and a preset judgment condition;
the mapping and aligning the two source domain feature distributions in different feature spaces according to the feature space in which the source domain feature distribution corresponding to any two ends is located includes:
acquiring a low-dimensional source domain feature distribution and a high-dimensional source domain feature distribution from the source domain feature distributions corresponding to any two ends, wherein the dimension of the low-dimensional source domain feature distribution is smaller than the dimension of the high-dimensional source domain feature distribution;
calculating a low-dimensional source domain Gram matrix corresponding to the low-dimensional source domain feature distribution and an up-dimensional source domain Gram matrix corresponding to the up-dimensional source domain feature distribution, the up-dimensional source domain feature distribution being obtained after mapping and aligning the low-dimensional source domain feature distribution;
constructing an objective function:

min_{D_{a′}, ΔD_b} ‖M₀ ∘ (M_{a′} − M_a)‖₂ + ‖D_{a′} − (D_b − ΔD_b)‖₂

wherein M₀ is the attention matrix of the low-dimensional source domain feature distribution a; M_{a′} is the up-dimensional source domain Gram matrix corresponding to the up-dimensional source domain feature distribution a′; M_a is the low-dimensional source domain Gram matrix corresponding to the low-dimensional source domain feature distribution a; M₀ ∘ (M_{a′} − M_a) represents the Hadamard product of the attention matrix M₀ and the matrix M_{a′} − M_a; D_{a′} is the up-dimensional source domain feature matrix corresponding to the up-dimensional source domain feature distribution a′; D_b is the high-dimensional source domain feature matrix corresponding to the high-dimensional source domain feature distribution b; ΔD_b is the noise matrix contained in the high-dimensional source domain feature matrix D_b; and ‖·‖₂ denotes the L2 norm;
solving the objective function to obtain the up-dimensional source domain feature matrix D_{a′} and the preset noise matrix ΔD_b that minimize the objective function;
obtaining the up-dimensional source domain feature distribution and the equal-dimensional source domain feature distribution after the low-dimensional source domain feature distribution and the high-dimensional source domain feature distribution are mapped and aligned;
The method for acquiring the attention matrix comprises the following steps:
presetting the attention matrix to have the same rows and columns as the low-dimensional source domain Gram matrix, elements at the same position in the two matrices corresponding to the same low-dimensional source domain feature vectors;
setting all element values of the attention matrix as preset values;
calculating Euclidean distance between any target feature vector in the target feature distribution at any end and each source domain feature vector in the corresponding source domain feature distribution; acquiring the minimum Top-k Euclidean distances corresponding to each target feature vector and the source domain feature vectors corresponding to the minimum Top-k Euclidean distances as first source domain feature vectors to form a first source domain set;
forming a total set by a plurality of the first source domain sets; calculating the occurrence times of the same first source domain feature vector in the total set, and resetting the element value at the corresponding position of each source domain feature vector in the attention matrix to the occurrence times of the corresponding first source domain feature vector;
normalizing the element values of the reset attention matrix to obtain a normalized attention matrix;
the method for acquiring the mapping alignment result comprises the following steps:
For a low-dimensional target feature vector in low-dimensional target feature distribution, acquiring the first source domain feature vector and the first source domain set corresponding to the low-dimensional target feature vector;
presetting an undetermined parameter sequence, and constructing a linear mathematical model containing undetermined parameters from the undetermined parameter sequence and the first source domain feature vectors corresponding to the low-dimensional target feature vector;
the linear mathematical model is:

x = Σ_{q=1}^{Q} θ_q · a_q

wherein x is a low-dimensional target feature vector in the low-dimensional target feature distribution; θ_q is the q-th undetermined parameter in the undetermined parameter sequence; a_q is the q-th first source domain feature vector corresponding to the low-dimensional target feature vector x; and Q is the number of the first source domain feature vectors;
obtaining the undetermined parameter sequence by using a RANSAC algorithm;
mapping and aligning the first source domain set corresponding to the low-dimensional target feature vector to obtain an up-dimensional first source domain set; obtaining a mapping alignment result of the low-dimensional target feature vector in the low-dimensional target feature distribution corresponding to the low-dimensional source domain feature distribution according to the up-dimensional first source domain set and the undetermined parameters;
the mapping alignment result is:

f(x) = Σ_{q=1}^{Q} θ_q · a′_q

wherein f(x) is the mapping alignment result of the low-dimensional target feature vector; and a′_q is the up-dimensional first source domain feature vector corresponding to the q-th first source domain feature vector;
the obtaining the rationality of each target feature vector in the target feature distribution corresponding to any two source domain feature distributions according to the mapping alignment result of any two source domain feature distributions comprises:
calculating the L2 norm of the difference between the mapping alignment result and each of any two up-dimensional source domain feature vectors in the up-dimensional source domain feature distribution, and the L2 norm of the difference between the two up-dimensional source domain feature vectors, and taking the average value of the three L2 norms as a first degree of dispersion;

calculating the L2 norm of the difference between the mapping alignment result and each of any two equal-dimensional source domain feature vectors in the equal-dimensional source domain feature distribution, and the L2 norm of the difference between the two equal-dimensional source domain feature vectors, and taking the average value of the three L2 norms as a second degree of dispersion;

obtaining the rationality of the target feature vector corresponding to the mapping alignment result according to the first degree of dispersion, the second degree of dispersion, and the absolute value of the difference between the inner products of the source domain feature vectors before and after the corresponding mapping alignment;
the rationality calculation formula is as follows:

R_x(a, b) = Σ_{y,z∈a′} ρ_x(y, z) · ΔM_yz + Σ_{m,n∈b′} ρ_x(m, n)

wherein R_x(a, b) is the rationality of any target feature vector x in the low-dimensional target feature distribution a1 corresponding to the low-dimensional source domain feature distribution a and in the high-dimensional target feature distribution b1 corresponding to the high-dimensional source domain feature distribution b; y, z ∈ a′ represents that y and z are any two up-dimensional source domain feature vectors in the up-dimensional source domain feature distribution a′; m, n ∈ b′ represents that m and n are any two equal-dimensional source domain feature vectors in the equal-dimensional source domain feature distribution b′; ρ_x(y, z) is the first degree of dispersion of the mapping alignment result f(x) and the up-dimensional source domain feature vectors y, z; ρ_x(m, n) is the second degree of dispersion of the mapping alignment result f(x) and the equal-dimensional source domain feature vectors m, n; and ΔM_yz is the absolute value of the difference between the inner product of the up-dimensional source domain feature vectors y, z and the inner product of the corresponding low-dimensional source domain feature vectors before mapping alignment.
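A minimal sketch of the objective-function minimization in claim 1, assuming the reconstructed objective above; the optimizer choice, learning rate, iteration count, and initialization are assumptions of this sketch, not part of the claim.

```python
import torch

def solve_alignment(D_a, D_b, M0, steps=2000, lr=1e-2):
    """Sketch: minimize ||M0 * (Gram(D_up) - Gram(D_a))||_2
                      + ||D_up - (D_b - dDb)||_2
    over the up-dimensional source domain feature matrix D_up and the
    noise matrix dDb. Assumed shapes: D_a is (n, d_low), D_b is
    (n, d_high), M0 is (n, n); rows are feature vectors.
    """
    M_a = D_a @ D_a.T                        # low-dimensional Gram matrix
    D_up = D_b.clone().requires_grad_(True)  # initialize at D_b (assumption)
    dDb = torch.zeros_like(D_b, requires_grad=True)
    opt = torch.optim.Adam([D_up, dDb], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        M_up = D_up @ D_up.T                 # up-dimensional Gram matrix
        loss = (torch.norm(M0 * (M_up - M_a))
                + torch.norm(D_up - (D_b - dDb)))
        loss.backward()
        opt.step()
    return D_up.detach(), dDb.detach()
```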
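Likewise, a sketch of the attention-matrix construction; placing each occurrence count on the diagonal entry of the corresponding source vector and normalizing by the maximum are assumptions of this sketch, since the claim fixes only the Top-k counting and a normalization step.

```python
import numpy as np

def attention_matrix(targets, sources, k=5, preset=0.0):
    """Sketch of M0: for each target feature vector, take the k source
    domain feature vectors with the smallest Euclidean distances (the
    first source domain set); count occurrences of each source vector
    across all such sets; write each count at that vector's position in
    an n x n matrix initialized to `preset`; normalize the result.
    targets: (t, d) array, sources: (n, d) array.
    """
    n = sources.shape[0]
    counts = np.zeros(n)
    for x in targets:
        dists = np.linalg.norm(sources - x, axis=1)   # Euclidean distances
        for idx in np.argsort(dists)[:k]:             # Top-k nearest
            counts[idx] += 1
    M0 = np.full((n, n), preset)
    np.fill_diagonal(M0, counts)                      # assumed placement
    if M0.max() > 0:
        M0 = M0 / M0.max()                            # assumed normalization
    return M0
```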
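Finally, a sketch of estimating the undetermined parameter sequence θ and forming the mapping alignment result f(x); the RANSAC inlier threshold, subset size, and iteration count are assumptions of this sketch.

```python
import numpy as np

def fit_theta_ransac(x, A, iters=200, tol=1e-3, seed=None):
    """Sketch: estimate theta in the linear model x ~ A.T @ theta, where
    the rows of A are the Q first source domain feature vectors, via a
    RANSAC-style loop: least-squares fit on a random subset of the
    coordinates of x, keeping the theta with the most inlier coordinates.
    """
    rng = np.random.default_rng(seed)
    Q, d = A.shape
    subset = min(d, max(Q, d // 2))
    best_theta, best_inliers = None, -1
    for _ in range(iters):
        idx = rng.choice(d, size=subset, replace=False)
        theta, *_ = np.linalg.lstsq(A[:, idx].T, x[idx], rcond=None)
        inliers = int(np.sum(np.abs(A.T @ theta - x) < tol))
        if inliers > best_inliers:
            best_theta, best_inliers = theta, inliers
    return best_theta

def map_align(theta, A_up):
    """f(x) = sum_q theta_q * a'_q; the rows of A_up are the
    up-dimensional first source domain feature vectors."""
    return A_up.T @ theta
```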
2. The blockchain-based neural network data set storage method according to claim 1, wherein the obtaining the rationality of the target feature distribution at each end according to the rationality of each target feature vector in the target feature distribution corresponding to any two source domain feature distributions includes:
the reasonable degree calculation formula is as follows:

P = (1/N) Σ_{b∈z} Σ_{x∈a1} R_x(a, b)

wherein P is the reasonable degree of the target feature distribution; x ∈ a1 represents that x is any target feature vector in the low-dimensional target feature distribution a1; b ∈ z represents that b is any source domain feature distribution in the set z formed by all source domain feature distributions other than the low-dimensional source domain feature distribution a; and N represents the number of source domain feature distributions contained in the set z.
3. The method for storing a neural network data set based on a blockchain according to claim 1, wherein the determining, according to the reasonable degree of the target feature distribution of each end and a preset judgment condition, a target end capable of storing data to be stored in the blockchain includes:
acquiring the final target feature distribution meeting the preset judgment condition, and taking the end corresponding to the final target feature distribution as the target end, wherein the preset judgment condition is to take the target feature distribution corresponding to the maximum value of the reasonable degree.
CN202111102572.7A 2021-09-20 2021-09-20 Neural network data set storage method based on blockchain Active CN113742428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111102572.7A CN113742428B (en) 2021-09-20 2021-09-20 Neural network data set storage method based on blockchain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111102572.7A CN113742428B (en) 2021-09-20 2021-09-20 Neural network data set storage method based on blockchain

Publications (2)

Publication Number Publication Date
CN113742428A CN113742428A (en) 2021-12-03
CN113742428B 2023-11-03

Family

ID=78740031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111102572.7A Active CN113742428B (en) 2021-09-20 2021-09-20 Neural network data set storage method based on blockchain

Country Status (1)

Country Link
CN (1) CN113742428B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100295A (en) * 2020-10-12 2020-12-18 平安科技(深圳)有限公司 User data classification method, device, equipment and medium based on federal learning
CN112784919A (en) * 2021-02-03 2021-05-11 华南理工大学 Intelligent manufacturing multi-mode data oriented classification method
CN113268760A (en) * 2021-07-19 2021-08-17 浙江数秦科技有限公司 Distributed data fusion platform based on block chain

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190332921A1 (en) * 2018-04-13 2019-10-31 Vosai, Inc. Decentralized storage structures and methods for artificial intelligence systems

Also Published As

Publication number Publication date
CN113742428A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
Bai et al. Optimization of deep convolutional neural network for large scale image retrieval
CN112307958B (en) Micro-expression recognition method based on space-time appearance motion attention network
CN110059198B (en) Discrete hash retrieval method of cross-modal data based on similarity maintenance
Khan et al. Fruits diseases classification: exploiting a hierarchical framework for deep features fusion and selection
CN107316013B (en) Hyperspectral image classification method based on NSCT (non-subsampled Contourlet transform) and DCNN (data-to-neural network)
Alshazly et al. Handcrafted versus CNN features for ear recognition
Ding et al. Convolutional neural networks based hyperspectral image classification method with adaptive kernels
Bunte et al. Adaptive local dissimilarity measures for discriminative dimension reduction of labeled data
CN111860674A (en) Sample class identification method and device, computer equipment and storage medium
Bunte et al. Exploratory observation machine (XOM) with Kullback-Leibler divergence for dimensionality reduction and visualization
Zellinger et al. Multi-source transfer learning of time series in cyclical manufacturing
Cheng et al. Supervised hashing with deep convolutional features for palmprint recognition
Li et al. A novel visual codebook model based on fuzzy geometry for large-scale image classification
Aiadi et al. MDFNet: An unsupervised lightweight network for ear print recognition
Chang et al. Progressive dimensionality reduction by transform for hyperspectral imagery
WO2019144608A1 (en) Image processing method, processing apparatus and processing device
ABAWATEW et al. Attention augmented residual network for tomato disease detection andclassification
Yu et al. Affine invariant fusion feature extraction based on geometry descriptor and BIT for object recognition
Guan et al. A novel measure to evaluate generative adversarial networks based on direct analysis of generated images
CN113742428B (en) Neural network data set storage method based on blockchain
Hao et al. Evaluation of ground distances and features in EMD-based GMM matching for texture classification
CN113409351B (en) Unsupervised field self-adaptive remote sensing image segmentation method based on optimal transmission
Jin et al. Blind image quality assessment for multiple distortion image
Li et al. High-order local pooling and encoding Gaussians over a dictionary of Gaussians
Tabejamaat et al. A coding-guided holistic-based palmprint recognition approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230919

Address after: Zone A, Unit 16-3, Hongxing Erke Group Building, No. 11 Hualian Road, Siming District, Xiamen City, Fujian Province, 361000

Applicant after: Yidian life e-commerce Co.,Ltd.

Address before: 450000 room 2002-26, 20 / F, block B, Xicheng science and technology building, No. 41, Jinsuo Road, high tech Industrial Development Zone, Zhengzhou, Henan Province

Applicant before: Zhengzhou Songhe Lizhi Information Technology Co.,Ltd.

GR01 Patent grant