CN113780450B - Distributed storage method and system based on self-coding neural network - Google Patents

Distributed storage method and system based on self-coding neural network

Info

Publication number
CN113780450B
CN113780450B (application CN202111088135.4A)
Authority
CN
China
Prior art keywords
data
error
reasoning
network
obtaining
Prior art date
Legal status
Active
Application number
CN202111088135.4A
Other languages
Chinese (zh)
Other versions
CN113780450A (en)
Inventor
秦志伟
张乾坤
董得东
Current Assignee
Zhengzhou Yunzhi Xin'an Security Technology Co ltd
Original Assignee
Zhengzhou Yunzhi Xin'an Security Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhengzhou Yunzhi Xin'an Security Technology Co ltd filed Critical Zhengzhou Yunzhi Xin'an Security Technology Co ltd
Priority to CN202111088135.4A priority Critical patent/CN113780450B/en
Publication of CN113780450A publication Critical patent/CN113780450A/en
Application granted granted Critical
Publication of CN113780450B publication Critical patent/CN113780450B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 — Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 — Protocols
    • H04L 67/10 — Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 — Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a distributed storage method and system based on a self-coding neural network. The method comprises the following steps: inputting training data into a compression network for reasoning to obtain reasoning data, and obtaining a first error from the difference between the training data and the reasoning data; constructing a mapping relation between the reasoning data and the corresponding first error, wherein the output of the compression network is not less than the input; acquiring target data to be stored, obtaining the first error corresponding to the target data from the mapping relation, and obtaining control data from the difference between the target data and the first error; inputting the control data into the compression network to obtain the shortest hidden layer; and taking the shortest hidden layer data as the representation data, and splitting and combining the representation data to obtain the storage data for each server. By combining the self-coding technique of deep learning neural networks, the distributed storage technology becomes more efficient and safer.

Description

Distributed storage method and system based on self-coding neural network
Technical Field
The invention relates to the field of artificial intelligence, in particular to a distributed storage method and system based on a self-coding neural network.
Background
When data are stored in a distributed mode, the data are scattered across different devices. To improve the fault tolerance of a distributed system, that is, to keep the distributed storage system operating normally when a single device or a few devices fail so that device failures do not affect the stored data, multiple backups of the data are required. However, improving the fault tolerance of the system in this way inevitably sacrifices a large amount of storage space for duplicate data, whereas dimension-reducing compression of the data can save storage resources effectively.
Neural networks are commonly used for data dimension reduction, most commonly self-coding networks, which need no elaborate training labels and are simple in structure and easy to train. However, self-coding networks introduce reconstruction errors when recovering data, i.e. self-coding networks can only implement lossy compression and reconstruction.
The current common way of processing data to be stored is as follows: the data are taken as network input, reasoning is performed, and the dimension-reduced reasoning result (the hidden code Z) is taken as the stored data; when the data are recovered, reconstruction reasoning is performed on the dimension-reduced data and the reasoning output is used. However, because of network errors, the recovered data are obviously no longer, for the most part, the original data. This processing mode only stores data carrying the corresponding reconstruction error and cannot recover the original data accurately.
Disclosure of Invention
In order to solve the technical problems, the invention provides a distributed storage method based on a self-coding neural network, which comprises the following steps:
inputting training data into a compression network to perform reasoning and outputting reasoning data, wherein the output of the compression network is not less than the input; obtaining a first error according to the difference value between the training data and the reasoning data; constructing a mapping relation between the reasoning data and the corresponding first error;
obtaining target data to be stored, obtaining a first error corresponding to the target data according to the mapping relation, and obtaining control data according to a difference value between the target data and the first error;
inputting control data into a compression network to obtain a shortest hidden layer; and taking the shortest hidden layer data as the representation data, and splitting and combining the representation data to obtain the data to be stored in each server.
Preferably, the obtaining the first error corresponding to the target data according to the mapping relationship includes:
the mapping relation is a lookup table, the lookup table is constructed according to the reasoning data corresponding to the training data and the first error, and the reasoning data in the lookup table is a real data point; and if the target data to be stored is in the lookup table, acquiring a corresponding first error, otherwise, acquiring two real data points closest to the target data in the lookup table, and presuming the first error corresponding to the target data according to the first error corresponding to the two real data points.
Preferably, the method further comprises:
if the target data to be stored is not in the lookup table, taking the difference value between the target data and the first error as mapping data;
performing iterative reasoning on the mapping data to obtain control data, wherein the iterative reasoning comprises the following steps: inputting the mapping data into a compression network for reasoning to obtain actual reasoning data, and obtaining a second error according to the difference value between the target data and the actual reasoning data; and if the second error is zero, stopping iteration, wherein the mapping data are control data, otherwise, adjusting the mapping data according to the second error and carrying out iterative reasoning on the adjusted mapping data.
Preferably, the adjusting the mapping data according to the second error includes:
performing curve fitting to a first curve according to the reasoning data and the training data corresponding to the reasoning data; the first curve is adjusted according to the actual reasoning data and the second error corresponding to the actual reasoning data, two real data points on the first curve, which are closest to the actual reasoning data, are obtained, and an adjustment amount is obtained according to the slope of the first curve segment between the second error and the two real data points; and adjusting the mapping data according to the adjustment amount.
Preferably, the method further comprises:
obtaining iteration necessity according to the ratio of the second error to the first error and the uncertainty of the target data corresponding to the first error; and when the iteration necessity is smaller than the set threshold value, no iteration reasoning is carried out, and hidden layer data and a second error are stored.
Preferably, the method further comprises:
and obtaining uncertainty of a first error corresponding to the target data according to the average interval quantity of the adjacent training data, the interval quantity between the two real data points and the interval quantity between the target data and the two real data points.
Preferably, the penalty coefficient is used for correcting the difference value between the input and the output of the compression network to obtain the loss of the compression network; the output of the compression network is controlled to be not smaller than the input through loss, and the penalty coefficient is specifically:
obtaining the current error direction abnormality degree according to the current input and output difference value of the compression network and the influence coefficient; obtaining the outlier degree of the current error according to the difference value of the current input and output of the compression network and the average value of the difference values of all the input and output; and carrying out weighted summation on the error direction abnormality degree and the outlier degree of the error to obtain a penalty coefficient.
The invention also provides a distributed storage system based on the self-coding neural network, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, and is characterized in that the computer program is executed by the processor to realize the steps of the distributed storage method based on the self-coding neural network.
The embodiment of the invention has at least the following beneficial effects:
the error distribution of the compression network meets the requirement of a distributed system by improving the loss function of the compression network; and splitting and combining the representation data of the target data, so that the data reading efficiency and the fault tolerance rate are improved, and the distributed storage technology is more efficient and safer.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a method.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to specific implementation, structure, characteristics and effects of a distributed storage method and system based on a self-coding neural network according to the present invention, with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The main purpose of the invention is as follows: the data are processed with a deep learning neural network to obtain encoded multi-copy data, so as to meet the requirements of a distributed storage system for data consistency and fault tolerance. To realize this, the invention designs a distributed storage method and system based on a self-coding neural network. By combining the self-coding technique of deep learning neural networks, the distributed storage technology becomes more efficient and safer. The invention is aimed at the following situation: the data are compressed by dimension reduction to obtain consistent representation data, and the representation data are then combined and distributed to realize distributed storage with high fault tolerance.
The following specifically describes a specific scheme of a distributed storage method and system based on a self-coding neural network.
Referring to fig. 1, a flowchart illustrating a distributed storage method based on a self-coding neural network according to an embodiment of the present invention is shown, and the method includes the following steps:
step 1, inputting training data into a compression network to perform reasoning and outputting reasoning data, wherein the output of the compression network is not less than the input; obtaining a first error according to the difference value between the training data and the reasoning data; and constructing a mapping relation between the reasoning data and the corresponding first error.
In this embodiment, the compression network is a self-coding network, which is divided into an encoding part and a decoding part; in theory the output should be equal to the input.
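For illustration only, a minimal Python (PyTorch) sketch of such a compression network is given below. The layer sizes, the hidden width and the ReLU activations are assumptions made for the example; the embodiment only requires an encoding part, a decoding part and a short hidden code.

import torch.nn as nn

class CompressionNet(nn.Module):
    """Sketch of the compression (self-coding) network: an encoder that maps the
    input down to a short hidden code and a decoder that reconstructs it."""

    def __init__(self, dim_in: int = 8, dim_hidden: int = 2):
        super().__init__()
        mid = dim_in // 2
        self.encoder = nn.Sequential(nn.Linear(dim_in, mid), nn.ReLU(), nn.Linear(mid, dim_hidden))
        self.decoder = nn.Sequential(nn.Linear(dim_hidden, mid), nn.ReLU(), nn.Linear(mid, dim_in))

    def forward(self, x):
        z = self.encoder(x)      # shortest hidden layer (hidden code Z)
        return self.decoder(z)   # reconstruction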
At present, data are also processed in connection with self-coding networks. However, operations such as dimension reduction, compression and encryption of data often require high precision, and even completely error-free recovery of the data. The neural network itself has the disadvantage of uncertainty; in fact, recovering data from the coding network does not achieve the absolutely lossless recovery that such a system requires. No matter how the loss function used in training is improved, the loss cannot be allowed to converge to 0 (a loss of 0 would mean that the reasoning of the network no longer has any error); the network can only be made to approach 0, and if the loss truly converged to 0 the network would be overfitted. DNN networks can only find approximate results with limited accuracy. Combining DNN technology with distributed storage therefore requires that the problem of data accuracy be addressed.
At present, when a DNN processes data, the common practice is to take the target data as the network input, perform reasoning, and take the output of an intermediate network layer as the representation data of the target data. This approach, however, ignores or bypasses the uncertainty and inaccuracy of the network. Uncertainty means that when the network takes the target data as input and performs reconstruction reasoning, the output of the network is uncertain: it may be the target data themselves (in which case the result happens to be lossless), data with a large error, or data with a small error. For example, when the input data 22 is encoded and decoded by the self-coding network and reconstructed into the corresponding output data M, the value of M is unknown before reasoning: it may be 22, it may be 20, and it may even be 15. Inaccuracy means that the degree of error of the network reasoning is uncertain, i.e. the loss convergence value only represents the overall average error on the training data; the reconstruction error is uncertain when processing new data outside the training data set. The problem is therefore that the hidden layer output data being used is not representation data of the target data but representation data of M. For example, it is desired to use the compressed hidden layer output data z1 to represent and reconstruct the data [10,15,29,30]. But the data reconstructed by the network reasoning carry errors and are [13,17,30,28], so the stored z1 actually corresponds to [13,17,30,28], and even when the decoder performs reconstruction reasoning the resulting data are [13,17,30,28].
Only by controlling the output of the network so that the reconstruction is [10,15,29,30] is the corresponding z1 correct and usable as the stored data (the network has reconstruction errors, so the data must be processed to obtain the representation data; the target data cannot be taken as the input data, they must be the output data).
Firstly, a compression network suitable for the distributed storage system is trained so that it can be used to compress the data; otherwise the same data would be backed up in multiple copies, the occupied memory would be too large, and the cost performance would be low. The invention makes the output of the compression network not less than the input by improving the loss function used to train the compression network. The loss of the compression network is specifically: obtaining the current error direction abnormality degree according to the current input-output difference of the compression network and the influence coefficient; obtaining the outlier degree of the current error according to the difference between the current input and output of the compression network and the average value of the differences of all inputs and outputs; and carrying out weighted summation of the error direction abnormality degree and the outlier degree of the error to obtain a penalty coefficient. The difference between the input and the output of the compression network is then corrected with the penalty coefficient to obtain the loss of the compression network.
In particular, to minimize the uncertainty of the self-coding network reasoning process, the loss function is improved in combination with the special requirements of the current scenario. Uncertainty here refers to the uncertain sign of the error and the non-uniformity of its magnitude. The loss function used is an improvement of the mean square error loss function, of the form:
e = (1/N) * Σ_{n=1}^{N} ε * (TL_n − BQ_n)²
where ε is the penalty coefficient, TL_n is the network reasoning value of the nth data, BQ_n is the label value of the nth data, and N is the total number of data in the training batch.
To facilitate the later search, it is desirable to ensure that the error is unidirectional, i.e. that the output is always greater than the input (or always less than the input), so that after an output is obtained later and found to be smaller than the target data, the input can simply be scaled up (or down) instead of searching on both sides without any directionality. At the same time the errors should be as consistent as possible, reducing their randomness. Therefore the mean square error loss is improved by adding the penalty coefficient ε: when the error is positive (TL > BQ) and the outlier degree is small, the influence is normal and approximately 1; when the error is negative (TL < BQ) and the outlier degree is large, the influence of the error is enlarged so that the corresponding neuron weights are adjusted preferentially.
The error direction abnormality degree P_n is calculated from the difference pw_n and the influence coefficient γ:
pw_n = TL_n − BQ_n
where γ is the influence coefficient of the error direction; its empirical value is 1 and an implementer can adjust it according to actual requirements. When the difference is negative, the corresponding network weight parameters are preferentially modified according to the error, so that in the end the reasoning values in the trained network are all larger than the input values, which guarantees the unidirectionality of the error.
The outlier degree Q_n of the error is then calculated; Q_n is the outlier degree of the current error, i.e. the difference between the current error and the average error.
Therefore, the penalty coefficient ε is calculated as follows:
ε = α*P_n + β*Q_n
where α and β are the influence coefficients corresponding to the direction and the difference value, respectively; an implementer can adjust them according to actual operating conditions, and the empirical values in this embodiment are both 1.
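As an illustration of the improved loss described above, a small Python (PyTorch) sketch follows. The exact functional form of the direction-abnormality term P_n is not reproduced in the text, so an exponential form is assumed here, and the outlier degree Q_n is taken as the absolute deviation from the mean error; both choices are assumptions made only for this example.

import torch

def penalized_mse_loss(tl, bq, gamma=1.0, alpha=1.0, beta=1.0):
    """Sketch of the improved MSE loss with a per-sample penalty coefficient.

    tl: network reasoning (reconstruction) values, shape (N,)
    bq: label values (the original inputs), shape (N,)
    """
    pw = tl - bq                      # per-sample error pw_n = TL_n - BQ_n
    p = torch.exp(-gamma * pw)        # assumed direction-abnormality degree P_n: near 1 for
                                      # positive errors, enlarged for negative errors
    q = torch.abs(pw - pw.mean())     # assumed outlier degree Q_n: deviation from the mean error
    eps = alpha * p + beta * q        # penalty coefficient epsilon = alpha*P_n + beta*Q_n
    return (eps * pw ** 2).mean()     # e = (1/N) * sum eps * (TL_n - BQ_n)^2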
Then, inputting the training data into a compression network to perform reasoning and outputting reasoning data, and obtaining a first error according to the difference value between the training data and the reasoning data; and constructing a mapping relation between the reasoning data and the corresponding first error. In this embodiment, the mapping relationship is a lookup table, and the lookup table is constructed according to the reasoning data and the first error corresponding to the training data, where the reasoning data in the lookup table is a real data point.
After the compression network training is completed, i.e. after the loss has converged and stabilized, all weight parameters in the network are fixed. The training data are then reasoned over again with the trained network (with fixed parameters) to obtain the reasoning data TL corresponding to each training datum YJ. The reasoning operation takes the training data YJ as the network input, lets the network compute with its internally fixed parameters, and outputs the reasoning data TL.
First, a first error WC1 is obtained, namely, corresponding errors of input and output of a compression network are obtained:
WC1=TL-YJ
After the first errors are obtained, a corresponding lookup table is built from the reasoning data TL of all training data and the corresponding first errors WC1, i.e. each output (reasoning data) corresponds to a first error. The lookup table can be updated dynamically after each use. Meanwhile, data scatter points are drawn with the training data as the horizontal-axis coordinate and the reasoning data as the vertical-axis coordinate, and a corresponding first curve is then obtained by fitting.
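A minimal Python sketch of building the lookup table and fitting the first curve follows; the compress_net.infer() helper and the cubic polynomial fit are assumptions made for the example (the embodiment only requires some form of fitting).

import numpy as np

def build_lookup_and_curve(yj, compress_net, poly_degree=3):
    """Sketch: build the (reasoning data -> first error) lookup table and fit the
    first curve (training data on the x-axis, reasoning data on the y-axis)."""
    yj = np.asarray(yj, dtype=float)
    tl = np.array([compress_net.infer(x) for x in yj])   # reasoning data TL
    wc1 = tl - yj                                        # first error WC1 = TL - YJ
    lookup = dict(zip(tl.tolist(), wc1.tolist()))        # real data points: TL -> WC1
    curve = np.polyfit(yj, tl, deg=poly_degree)          # first curve coefficients
    return lookup, curve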
And 2, acquiring target data to be stored, acquiring a first error corresponding to the target data according to the mapping relation, and acquiring control data according to a difference value between the target data and the first error. And if the target data to be stored is in the lookup table, acquiring a corresponding first error, otherwise, acquiring two real data points closest to the target data in the lookup table, and presuming the first error corresponding to the target data according to the first error corresponding to the two real data points.
When the network is trained, limited, discrete data are used, whereas the actual data space is continuous and unbounded. If the target data to be stored are in the lookup table, the corresponding first error can be found directly; the reasoning data in the lookup table are called real data points, the corresponding mapping data (control data) can be obtained by a simple calculation, and the data can then be compressed and encrypted. If the target data to be stored are not in the data lookup table, the corresponding first error needs to be estimated.
The first error and the corresponding uncertainty can be obtained in two ways.
First embodiment: if the data is not a real data point, the invention refers to a virtual data point, and the corresponding first error needs to be estimated according to the real data points nearest to the two sides of the virtual data point. The calculation process of the first error WC1 at this time is:
WC1=WC z +f*zd
in WC z Representing a first error corresponding to a real data point to the left of the current virtual data point (the real data point closest to and smaller than the virtual data point), WC y And representing a first error corresponding to a real data point (a real data point which is closest to the virtual data point and is larger than the virtual data point) on the right side of the current virtual data point, wherein f is a proportional relation obtained by the slope, and zd is the distance between the current virtual data point and the real data point.
Meanwhile, the uncertainty pb corresponding to the first error is calculated; this uncertainty is used to decide whether iterative reasoning is necessary. If the target data to be stored are in the data lookup table, the uncertainty pb is 0; otherwise the first error corresponding to the data has to be guessed, and an uncertainty pb exists.
Here PD is the average spacing of adjacent data in the training data, KD is the spacing between the real data points on either side of the data, zd is the spacing from the target data to be stored to the left real data point, and yd is the spacing from the data to the right real data point. The larger the interval between the two real data points, the less accurate the error estimated inside it, so pb grows in proportion to this interval. The closer the current data are to the real data point on one side, the more accurate the error estimate, so this distance enters inversely. Finally, the pb values are normalized so that they all lie in the range [0,1].
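A Python sketch of the interpolation of WC1 for a virtual data point, together with a possible uncertainty surrogate, follows. The exact pb formula is not reproduced in the text, so the expression used here only mirrors the proportionality relations described above and is an assumption; the slope-based factor f is likewise assumed to be the linear-interpolation slope.

import numpy as np

def estimate_wc1_and_pb(mb, lookup, pd_mean):
    """Sketch: estimate the first error WC1 and its uncertainty pb for target data MB.

    lookup maps real data points (reasoning values) to their first errors;
    pd_mean is the average spacing PD of adjacent training data."""
    points = np.array(sorted(lookup.keys()))
    if mb in lookup:                                    # real data point: exact error, no uncertainty
        return lookup[mb], 0.0
    left = points[points < mb].max()                    # nearest real point below MB
    right = points[points > mb].min()                   # nearest real point above MB
    kd = right - left                                   # spacing KD between the two real points
    zd, yd = mb - left, right - mb                      # distances to the left/right real points
    f = (lookup[right] - lookup[left]) / kd             # assumed slope-based proportional factor
    wc1 = lookup[left] + f * zd                         # WC1 = WC_z + f * zd
    pb = min(1.0, (kd / pd_mean) * (min(zd, yd) / kd))  # assumed uncertainty surrogate, clipped to [0,1]
    return wc1, pb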
The second embodiment obtains and analyses the error data by training an error reasoning network. The trained error reasoning network takes reasoning data as input and outputs the first error WC1 and the corresponding uncertainty pb. All training data and their corresponding label data (first error and uncertainty) are obtained as in the first embodiment, and a fully connected network (FC) is then constructed with M input-layer neurons and 2*M output-layer neurons, i.e. each input value corresponds to one first error and one uncertainty value. The training samples TL can be obtained randomly, and the corresponding first error WC1 and uncertainty pb are calculated in the manner described above. The loss function of this network is the mean square error loss function. With use, the data in the data lookup table grow continuously, some virtual data points turn into real data points, and the corresponding uncertainties change accordingly. The neural network therefore needs to be retrained periodically.
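A minimal PyTorch sketch of such an error reasoning network follows; the hidden width and the ReLU activation are assumptions, since the embodiment only fixes the input size M, the output size 2*M and the mean square error loss.

import torch.nn as nn

class ErrorInferenceNet(nn.Module):
    """Sketch of the error reasoning network: a fully connected network with M
    input neurons and 2*M output neurons (a first error and an uncertainty
    per input value)."""

    def __init__(self, m: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(m, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * m),   # first half: WC1 estimates, second half: pb estimates
        )

    def forward(self, x):
        return self.net(x)

# Training would use nn.MSELoss() against labels built from the lookup table,
# and the network would be retrained periodically as the lookup table grows.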
The first error WC1 can thus be obtained either from the real data points on both sides of the target data MB, or by reasoning with the error reasoning network, taking the target data MB as input. The mapping data YS = MB − WC1 is then obtained from MB and WC1.
If the target data to be stored is not in the lookup table, taking the difference value between the target data and the first error as mapping data; performing iterative reasoning on the mapping data to obtain control data, wherein the iterative reasoning comprises the following steps: inputting the mapping data into a compression network for reasoning to obtain actual reasoning data, and obtaining a second error according to the difference value between the target data and the actual reasoning data; and if the second error is zero, stopping iteration, wherein the mapping data are control data, otherwise, adjusting the mapping data according to the second error and carrying out iterative reasoning on the adjusted mapping data. The adjusting the mapping data according to the second error includes: performing curve fitting to a first curve according to the reasoning data and the training data corresponding to the reasoning data; the first curve is adjusted according to the actual reasoning data and the second error corresponding to the actual reasoning data, two real data points on the first curve, which are closest to the actual reasoning data, are obtained, and an adjustment amount is obtained according to the slope of the first curve segment between the second error and the two real data points; and adjusting the mapping data according to the adjustment amount. In particular, obtaining iteration necessity according to the ratio of the second error to the first error and the uncertainty of the target data corresponding to the first error; when the iteration necessity is smaller than the set threshold, no iteration reasoning is performed, and hidden layer data and error data are stored at the same time.
Specifically, the obtained mapping data YS is used as an input of the compression network, and the corresponding actual reasoning data SJ is obtained by the compression network. Calculating a second error WC2:
WC2=MB-SJ
if wc2=0, the map data at this time can be used as the control data SR of the target data MB.
If wc2+.0, then the error of the fitting at this time is not accurate, so that there is a difference between the mapping data and the desired control data SR. It is necessary to make an adjustment again on the basis of the map data YS until the second error WC2 satisfies the requirement (equal to 0 or smaller than the set error).
The necessity of iteration is calculated to judge whether iterative adjustment is needed:
by = 1 − nt
where nt is obtained from the uncertainty pb, which lies in [0,1], and the iteration difficulty nd, which also lies in [0,1] and is determined by the ratio of the second error to the first error. In an ideal prediction WC2 is 0; the larger the second error, the harder the adjustment and the larger the prediction deviation. At the same time, the higher the uncertainty, the farther the virtual data point is from the real data points and the harder the iterative process of obtaining the real data point.
When the iteration necessity by is smaller than the set threshold (set to 0.5 in this embodiment), obtaining the optimal data by iterative adjustment is considered too hard. Therefore no iterative adjustment is performed, and both the hidden layer data and the error data are stored. The number of bytes of the stored data is then Dq = Mg + Mc + Md, where Mg is the minimum number of bytes of the hidden layer, Mc is the number of bytes of WC2, and Md is the number of bytes of the marker, i.e. the bytes used to separate the data from the error data.
When the iteration necessity by is greater than the set threshold, it is considered necessary to perform a plurality of iterations to adjust the mapping data YS so that the lossless input SR corresponding to MB is finally obtained. The specific process of iterative adjustment is as follows:
(1) The adjustment direction is obtained, i.e. whether the adjustment amount is added to or subtracted from the current mapping data YS. The adjustment direction follows from the behaviour of the first curve, and the sign of the second error is used to judge it. Specifically: when WC2 > 0, WC1 is decreased, i.e. the input data YS is increased; when WC2 < 0, WC1 is increased, i.e. the input data YS is decreased. The sign of WC2 and the change required of YS are consistent.
(2) The adjustment amount is obtained. The adjustment amount cl is related to the current second error and to the slope tan θ of the first curve segment between the two real data points. Depending on the value of WC2, different real data points (with indices A and B) are involved: when WC2 > 0, the larger real data point minus the current inference point is used, and when WC2 < 0, the smaller real data point minus the current inference point is used. Combining this with the adjustment direction gives the final adjustment. Each adjustment involves three points in total: the preceding real point (index -1), the current point (index 0) and the following real point (index 1); the real points are data points obtained by reasoning with the compression network, not guessed fitting points. The mapping data are adjusted by the adjustment amount to obtain YS + tz, YS + tz is input into the compression network as the new mapping data for another iteration to obtain new output data, and a new second error WC2 is obtained in the same way.
The adjustment is performed cyclically, and the adjustment amount is obtained from the new WC1 and WC2, until WC2 meets the requirement; the corresponding mapping data YS is then the required control data SR.
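The iterative adjustment of the mapping data can be sketched in Python as follows; the compress_net.infer() helper, the fixed iteration cap and the use of the local slope of the first curve as the step scale are assumptions made for the example, not the prescribed update rule.

def find_control_data(mb, wc1, compress_net, curve_slope, max_iters=50, tol=0.0):
    """Sketch: iteratively adjust the mapping data YS until the reconstruction
    matches the target data MB (second error WC2 = 0)."""
    ys = mb - wc1                          # initial mapping data YS = MB - WC1
    for _ in range(max_iters):
        sj = compress_net.infer(ys)        # actual reasoning data SJ
        wc2 = mb - sj                      # second error WC2 = MB - SJ
        if abs(wc2) <= tol:                # requirement met: YS is the control data SR
            return ys
        ys = ys + wc2 / curve_slope        # step in the direction of WC2 (assumed step size)
    return ys                              # best effort if the tolerance is not reached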
Step 3, inputting the control data into a compression network to obtain a shortest hidden layer; and taking the shortest hidden layer data as the representation data, and splitting and combining the representation data to obtain the data to be stored in each server.
The target data MB can be recovered by inputting SR into the compression network and reasoning. The hidden layer data with the minimum number of bytes in this reasoning process is taken as the representation data of the target data MB and denoted Mg. The representation data Mg is split so that the distributed storage of the data improves reading efficiency and fault tolerance, and the split data are then distributed to the distributed storage servers.
Preferably, the distributed storage can be performed as follows. The number R of servers that can store data and the number of bytes Ge of Mg are obtained. If R >= Ge, i.e. the number of servers is not smaller than the number of bytes of Mg, Mg is first shifted and combined to obtain the storage data and then stored on each server. For example, the data to be stored are {1,2,3,4,5} and there are 7 servers; the data on the servers are {1,2,3,4,5}, {2,3,4,5,1}, {3,4,5,1,2}, {4,5,1,2,3}, {5,1,2,3,4}, {1,2,3,4,5}, {2,3,4,5,1}. The number of data bytes is 5 and the number of servers is 7, so after padding, the data are cyclically shifted and stored on each server, and the complete data can be obtained by reading only the first byte from each server. If R < Ge, i.e. the number of bytes is greater than the number of servers, the data must be divided evenly, with about Ge/R bytes per segment. For example, the data to be stored are {1,2,3,4,5,6,7} and there are 3 servers; the data on the servers are {1,2,3,4,5,6,7}, {3,4,5,6,7,1,2}, {5,6,7,1,2,3,4}. The number of data bytes is 7 and the number of servers is 3, so the data are divided evenly into segments of 2, 2 and 3 bytes, and each segment is shifted in turn to obtain the data stored on each server.
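A Python sketch of this shift-and-combine distribution follows, reproducing the two worked examples above; the padding rule used when the number of servers exceeds the number of bytes is an assumption (a simple cyclic wrap).

def make_server_copies(mg: bytes, r: int) -> list[bytes]:
    """Sketch: distribute the representation data Mg over R servers by cyclic
    shifting (R >= Ge) or by even segmentation and segment-wise shifting (R < Ge)."""
    ge = len(mg)
    copies = []
    if r >= ge:
        # one cyclic shift per server; reading the first byte of each server in
        # order reassembles the original data
        for i in range(r):
            k = i % ge
            copies.append(mg[k:] + mg[:k])
    else:
        # split evenly into R segments (sizes differ by at most one byte) and
        # rotate by whole segments
        base, extra = divmod(ge, r)
        bounds, start = [], 0
        for i in range(r):
            size = base + (1 if i >= r - extra else 0)
            bounds.append((start, start + size))
            start += size
        for i in range(r):
            order = list(range(i, r)) + list(range(0, i))
            copies.append(b"".join(mg[bounds[j][0]:bounds[j][1]] for j in order))
    return copies

# Example: make_server_copies(bytes([1,2,3,4,5,6,7]), 3) yields the three
# server copies {1,2,3,4,5,6,7}, {3,4,5,6,7,1,2}, {5,6,7,1,2,3,4} given above.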
Example 2:
the embodiment provides a distributed storage system based on a self-coding neural network, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, and is characterized in that the computer program is executed by the processor to realize the steps of the distributed storage method based on the self-coding neural network.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (5)

1. A distributed storage method based on a self-encoding neural network, the method comprising the steps of:
inputting training data into a compression network to perform reasoning and outputting reasoning data, wherein the output of the compression network is not less than the input; obtaining a first error according to the difference value between the training data and the reasoning data; constructing a mapping relation between the reasoning data and the corresponding first error;
obtaining target data to be stored, obtaining a first error corresponding to the target data according to the mapping relation, and obtaining control data according to a difference value between the target data and the first error;
inputting control data into a compression network to obtain a shortest hidden layer; taking the shortest hidden layer data as representing data, splitting and combining the representing data to obtain data to be stored in each server;
correcting the difference value between the input and output of the compression network by using a punishment coefficient to obtain the loss of the compression network; the output of the compression network is controlled to be not smaller than the input through loss, and the penalty coefficient is specifically:
obtaining the current error direction abnormality degree according to the current input and output difference value of the compression network and the influence coefficient; obtaining the outlier degree of the current error according to the difference value of the current input and output of the compression network and the average value of the difference values of all the input and output; carrying out weighted summation on the error direction abnormality degree and the outlier degree of the error to obtain a punishment coefficient;
correcting the difference value between the input and the output of the compression network by using a penalty coefficient to obtain a loss function of the compression network, wherein the loss function is as follows:
e = (1/N) * Σ_{n=1}^{N} ε * (TL_n − BQ_n)²
where e is the loss function, ε is the penalty coefficient, TL_n is the network reasoning value of the nth data, BQ_n is the label value of the nth data, and N is the total number of data in the training batch;
calculating the error direction abnormality degree P_n from the difference pw_n and the influence coefficient γ:
pw_n = TL_n − BQ_n
where γ is the influence coefficient of the error direction and pw_n is the difference between the network reasoning value and the label value of the nth data;
calculating the outlier degree Q_n of the error, where Q_n is the outlier degree of the current error;
the penalty coefficient ε is calculated as follows:
ε = α*P_n + β*Q_n
where α and β are the influence coefficients corresponding to the direction and the difference value, respectively;
the obtaining the first error corresponding to the target data according to the mapping relation includes:
the mapping relation is a lookup table, the lookup table is constructed according to the reasoning data corresponding to the training data and the first error, and the reasoning data in the lookup table is a real data point; if the target data to be stored is in the lookup table, acquiring a corresponding first error, otherwise, acquiring two real data points closest to the target data in the lookup table, and presuming the first error corresponding to the target data according to the first error corresponding to the two real data points;
the method further comprises the steps of:
if the target data to be stored is not in the lookup table, taking the difference value between the target data and the first error as mapping data;
performing iterative reasoning on the mapping data to obtain control data, wherein the iterative reasoning comprises the following steps: inputting the mapping data into a compression network for reasoning to obtain actual reasoning data, and obtaining a second error according to the difference value between the target data and the actual reasoning data; and if the second error is zero, stopping iteration, wherein the mapping data are control data, otherwise, adjusting the mapping data according to the second error and carrying out iterative reasoning on the adjusted mapping data.
2. The method of claim 1, wherein adjusting the mapping data based on the second error comprises:
performing curve fitting to a first curve according to the reasoning data and the training data corresponding to the reasoning data; the first curve is adjusted according to the actual reasoning data and the second error corresponding to the actual reasoning data, two real data points on the first curve, which are closest to the actual reasoning data, are obtained, and an adjustment amount is obtained according to the slope of the first curve segment between the second error and the two real data points; and adjusting the mapping data according to the adjustment amount.
3. The method according to claim 2, wherein the method further comprises:
obtaining iteration necessity according to the ratio of the second error to the first error and the uncertainty of the target data corresponding to the first error; and when the iteration necessity is smaller than the set threshold value, no iteration reasoning is carried out, and hidden layer data and a second error are stored.
4. A method according to claim 3, characterized in that the method further comprises:
and obtaining uncertainty of a first error corresponding to the target data according to the average interval quantity of the adjacent training data, the interval quantity between the two real data points and the interval quantity between the target data and the two real data points.
5. A distributed storage system based on a self-encoding neural network, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program when executed by the processor implements the steps of the method according to any one of claims 1 to 4.
CN202111088135.4A 2021-09-16 2021-09-16 Distributed storage method and system based on self-coding neural network Active CN113780450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111088135.4A CN113780450B (en) 2021-09-16 2021-09-16 Distributed storage method and system based on self-coding neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111088135.4A CN113780450B (en) 2021-09-16 2021-09-16 Distributed storage method and system based on self-coding neural network

Publications (2)

Publication Number Publication Date
CN113780450A CN113780450A (en) 2021-12-10
CN113780450B true CN113780450B (en) 2023-07-28

Family

ID=78851595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111088135.4A Active CN113780450B (en) 2021-09-16 2021-09-16 Distributed storage method and system based on self-coding neural network

Country Status (1)

Country Link
CN (1) CN113780450B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122809A (en) * 2017-04-24 2017-09-01 北京工业大学 Neural network characteristics learning method based on image own coding
CN108921343A (en) * 2018-06-26 2018-11-30 浙江工业大学 Based on storehouse self-encoding encoder-support vector regression traffic flow forecasting method
CN110751264A (en) * 2019-09-19 2020-02-04 清华大学 Electricity consumption mode identification method based on orthogonal self-coding neural network

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106713385A (en) * 2015-11-13 2017-05-24 中国电信股份有限公司 Distributed storage redundant data compression method and system, client and server
EP3443735A4 (en) * 2016-04-12 2019-12-11 Quidient, LLC Quotidian scene reconstruction engine
CN106707099B (en) * 2016-11-30 2019-04-12 国网上海市电力公司 Monitoring and positioning method based on abnormal electricity consumption detection model
WO2018148195A1 (en) * 2017-02-08 2018-08-16 Marquette University Robotic tracking navigation with data fusion
NZ759818A (en) * 2017-10-16 2022-04-29 Illumina Inc Semi-supervised learning for training an ensemble of deep convolutional neural networks
CN108304556B (en) * 2018-02-06 2019-06-07 中国传媒大学 The personalized recommendation method combined based on content with collaborative filtering
CN108764064A (en) * 2018-05-07 2018-11-06 西北工业大学 SAR Target Recognition Algorithms based on Steerable filter device and self-encoding encoder
CN109783603B (en) * 2018-12-13 2023-05-26 平安科技(深圳)有限公司 Text generation method, device, terminal and medium based on self-coding neural network
US11210554B2 (en) * 2019-03-21 2021-12-28 Illumina, Inc. Artificial intelligence-based generation of sequencing metadata
US20200334680A1 (en) * 2019-04-22 2020-10-22 Paypal, Inc. Detecting anomalous transactions using machine learning
CN110119447B (en) * 2019-04-26 2023-06-16 平安科技(深圳)有限公司 Self-coding neural network processing method, device, computer equipment and storage medium
CN112100645A (en) * 2019-06-18 2020-12-18 中国移动通信集团浙江有限公司 Data processing method and device
CN110550518B (en) * 2019-08-29 2020-07-28 电子科技大学 Elevator operation abnormity detection method based on sparse denoising self-coding
CN110929843A (en) * 2019-10-29 2020-03-27 国网福建省电力有限公司 Abnormal electricity consumption behavior identification method based on improved deep self-coding network
CN111401236A (en) * 2020-03-16 2020-07-10 西北工业大学 Underwater sound signal denoising method based on self-coding neural network
CN113191439A (en) * 2021-05-10 2021-07-30 中南大学 Deviation punishment enhanced stacking automatic encoder processing method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122809A (en) * 2017-04-24 2017-09-01 北京工业大学 Neural network characteristics learning method based on image own coding
CN108921343A (en) * 2018-06-26 2018-11-30 浙江工业大学 Based on storehouse self-encoding encoder-support vector regression traffic flow forecasting method
CN110751264A (en) * 2019-09-19 2020-02-04 清华大学 Electricity consumption mode identification method based on orthogonal self-coding neural network

Also Published As

Publication number Publication date
CN113780450A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN109815223B (en) Completion method and completion device for industrial monitoring data loss
DE102018121905A1 (en) Method and apparatus for quantizing artificial neural networks
US20200134463A1 (en) Latent Space and Text-Based Generative Adversarial Networks (LATEXT-GANs) for Text Generation
CN111310852B (en) Image classification method and system
CN111078911A (en) Unsupervised hashing method based on self-encoder
CN110705711A (en) Quantum state information dimension reduction coding method and device
US20200372340A1 (en) Neural network parameter optimization method and neural network computing method and apparatus suitable for hardware implementation
CN110892419A (en) Stop-code tolerant image compression neural network
Yoon et al. Bitwidth heterogeneous federated learning with progressive weight dequantization
CN114332500A (en) Image processing model training method and device, computer equipment and storage medium
CN113780450B (en) Distributed storage method and system based on self-coding neural network
CN115525771A (en) Context data enhancement-based learning method and system for representation of few-sample knowledge graph
CN111898799B (en) BFA-Elman-based power load prediction method
Li et al. Online Bayesian dictionary learning for large datasets
Zhou et al. Improving robustness of random forest under label noise
CN112667394B (en) Computer resource utilization rate optimization method
CN115660096A (en) Quantum random walking error correction method based on multiple particles
CN114745104A (en) Information transmission method for eliminating noise interference based on multi-dimensional quantum error correction
CN114301889A (en) Efficient federated learning method and system based on weight compression
CN114595802A (en) Data compression-based impulse neural network acceleration method and device
CN113177627A (en) Optimization system, retraining system, and method thereof, and processor and readable medium
Zhu et al. End-to-end topology-aware machine learning for power system reliability assessment
Wei et al. Compression and storage algorithm of key information of communication data based on backpropagation neural network
Han et al. Online aware synapse weighted autoencoder for recovering random missing data in wastewater treatment process
CN117633712B (en) Sea level height data fusion method, device and equipment based on multi-source data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant