CN115857823A

CN115857823A - Distributed compression storage method based on data sharing

Info

Publication number: CN115857823A
Application number: CN202211660453.8A
Authority: CN
Inventors: 祁朋涛
Original assignee: Hangyin Consumer Finance Co ltd
Current assignee: Hangyin Consumer Finance Co ltd
Priority date: 2022-12-23
Filing date: 2022-12-23
Publication date: 2023-03-28

Abstract

The invention relates to the technical field of electric digital data processing, and in particular to a distributed compression storage method based on data sharing. The method comprises the following steps: obtaining an uplink adjustment coefficient and a downlink adjustment coefficient for a data in the feature sequence from the uplink data and downlink data corresponding to that data; obtaining the base tolerance of the data from its uplink and downlink adjustment coefficients together with its uplink data and downlink data; obtaining the tolerance weight of the data from the number of times the data appears in the feature sequence and from its uplink data and downlink data; obtaining the final tolerance of each data from its base tolerance and tolerance weight; and compressing the data sequence corresponding to the image using the final tolerance of each data, then sharing the compressed data to each secondary server for distributed storage. The adaptive final tolerance obtained in this way improves the compression of the image data.

Description

Distributed compression storage method based on data sharing

Technical Field

The invention relates to the technical field of electric digital data processing, in particular to a distributed compression storage method based on data sharing.

Background

In recent years the internet has developed rapidly, big data is applied in more and more industries, and data density keeps rising, so the volume of generated data grows accordingly. The storage of client image data in particular has a serious impact on storage cost, so big data needs to be compressed before storage to reduce the data volume and thereby the storage cost.

When picture data of a client is stored, the characteristics of the human visual system allow part of the image information to be lost, so the image data can be compressed with a lossy compression algorithm: the image data is converted into one-dimensional data and compressed with the swinging door trending (SDT) compression algorithm (also translated as the revolving door compression algorithm). In the SDT algorithm the tolerance is the factor that determines both distortion and compression rate. The existing algorithm uses a fixed tolerance, so it compresses data with different variation characteristics poorly; that is, the compression rate and the distortion rate cannot be guaranteed to reach a reasonable balance.

Disclosure of Invention

In order to solve the problem that the tolerance in the existing swinging door trending (SDT) compression algorithm is a fixed value, which compresses data with different variation trends poorly, the invention provides a distributed compression storage method based on data sharing. The adopted technical scheme is as follows:

One embodiment of the invention provides a distributed compression storage method based on data sharing, which comprises the following steps:

the gray value of one channel of each pixel point of the image forms a one-dimensional data sequence; removing redundant data in the data sequence to obtain a characteristic sequence;

acquiring uplink data and downlink data corresponding to one data in the SDT compression algorithm process of the characteristic sequence, and acquiring an uplink adjustment coefficient and a downlink adjustment coefficient of the data based on the uplink data and the downlink data;

obtaining the basic tolerance of data according to the uplink adjustment coefficient and the downlink adjustment coefficient of the data in the characteristic sequence and the uplink data and the downlink data of the data;

obtaining tolerance weight of data according to frequency number of the data appearing in the characteristic sequence, uplink data and downlink data of the data;

obtaining a final tolerance of each data based on the basic tolerance and the tolerance weight of each data; and compressing the data sequence corresponding to the image by using the final tolerance of each piece of data, and sharing the compressed data to each secondary server for distributed storage.

Preferably, the acquiring of the uplink data and the downlink data corresponding to one data of the feature sequence in the SDT compression algorithm process includes: selecting one data, obtaining data which is equal to the data in the characteristic sequence, and recording the data as equal data; obtaining data which is backward adjacent to each equal data and is larger than the equal data, and recording the data as uplink data of the data; and obtaining data which is backward adjacent to each equal data and is smaller than the equal data, and recording the data as downlink data of the data.

Preferably, obtaining the uplink adjustment coefficient and the downlink adjustment coefficient of the data based on the uplink data and the downlink data includes:

obtaining the ratio of the number of uplink data corresponding to one data in the characteristic sequence to the sum of the number of the uplink data and the number of the downlink data; obtaining the reciprocal and standard deviation of the average value of all uplink data; the product of the ratio, the reciprocal of the average value and the standard deviation is a first calculation parameter of the uplink data, and a second calculation parameter of the downlink data is obtained in the same way; and obtaining the result of adding the first calculation parameter and the second calculation parameter, wherein the ratio of the first calculation parameter to the result of adding is the uplink adjustment coefficient of the data, and the ratio of the second calculation parameter to the result of adding is the downlink adjustment coefficient of the data.

Preferably, obtaining the basic tolerance of the data according to the uplink adjustment coefficient and the downlink adjustment coefficient of the data in the feature sequence, and the uplink data and the downlink data of the data includes: taking an uplink adjustment coefficient of one data in the characteristic sequence as the weight of the average value of the difference value of each uplink data and the data, taking a downlink adjustment coefficient as the weight of the average value of the difference value of the data and each downlink data, and summing to obtain a summation result; and calculating the average value of the summation result, and taking the average value as the basic tolerance of the data.

Preferably, obtaining the tolerance weight of a data according to the frequency of occurrence of the data in the feature sequence, the uplink data and the downlink data of the data comprises: dividing the same data in the characteristic sequence into one class, and obtaining the quantity of each class of data and the total class quantity; acquiring uplink data and a difference value between the downlink data and the data; and obtaining the tolerance weight of the data according to the frequency of occurrence of the data, the quantity of each type of data in the characteristic sequence, the total class number and the absolute value of the ratio of the difference value to the data.

Preferably, the tolerance weight of the j-th data in the feature sequence is given by a formula whose terms are: the frequency of occurrence of the j-th data in the feature sequence; the number of data in the class whose data have the same size as the j-th data; the total number of categories in the data sequence; the number of uplink data and the number of downlink data of the data; each data among the uplink data and downlink data of the data; and an exponential function with the natural constant e as its base.

Preferably, obtaining the final tolerance of each data from its base tolerance and tolerance weight comprises: adding a first preset value to the tolerance weight, dividing the sum by a second preset value, and multiplying the resulting ratio by the base tolerance to obtain the final tolerance.
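The arithmetic of this step can be sketched as follows. This is a minimal sketch, assuming the first and second preset values are plain numeric constants; their actual values are not stated here, and the function and parameter names are illustrative only.

```python
def final_tolerance(base_tolerance, tolerance_weight, first_preset, second_preset):
    """Scale the base tolerance by (first_preset + tolerance_weight) / second_preset."""
    if second_preset == 0:
        raise ValueError("second_preset must be non-zero")
    return (first_preset + tolerance_weight) / second_preset * base_tolerance


# Hypothetical inputs, chosen only to show the arithmetic: (1.0 + 0.5) / 2.0 * 4.0
print(final_tolerance(base_tolerance=4.0, tolerance_weight=0.5,
                      first_preset=1.0, second_preset=2.0))  # 3.0
```

With these illustrative presets, a tolerance weight above 1 enlarges the base tolerance and a weight below 1 shrinks it.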

Preferably, removing redundant data in the data sequence to obtain the feature sequence comprises: in the data sequence, retaining only one data from each run of consecutively repeated data and retaining non-consecutive repeats as they are, thereby obtaining the feature sequence.

The embodiment of the invention has at least the following beneficial effects. Converting the image data into one-dimensional sequence data allows the image data to be analyzed as a whole. The feature sequence obtained from the data sequence represents the characteristics of the data in the data sequence while containing less data, so each data can be analyzed more conveniently. The uplink data and downlink data of a data in the feature sequence are obtained, and an uplink adjustment coefficient and a downlink adjustment coefficient are derived from them; these coefficients are used as adjustment coefficients when calculating the base tolerance of the data, and together with the uplink and downlink data they capture the variation characteristics of the data in the feature sequence, yielding a more appropriate base tolerance. A tolerance weight is then obtained from the number of times the data appears in the feature sequence, combined with its uplink and downlink data, and is used to adjust the base tolerance into the final tolerance. Because the adaptive final tolerance of each data reflects the variation characteristics of that data in the feature sequence, compression of the data sequence converted from the image improves. The compressed data is then shared to each secondary server for distributed storage. The compression rate and the distortion rate thus reach a reasonable balance: the compression rate is raised while the distortion rate stays within a reasonable range, so that viewing of the decompressed image data by human eyes is not affected.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart of a distributed compression storage method based on data sharing according to an embodiment of the present invention.

Detailed Description

To further explain the technical means adopted by the present invention to achieve its intended purpose, and their effects, the specific implementation, structure, features and effects of the distributed compression storage method based on data sharing are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following describes a specific scheme of the distributed compression storage method based on data sharing in detail with reference to the accompanying drawings.

Example:

The main application scenario of the invention is as follows: when client image data is stored in a shared, distributed manner, the picture data transmitted by the client is compressed with the swinging door trending (SDT) compression algorithm to reduce the amount of data stored.

The purpose of the invention is as follows: a feature sequence is extracted from the image data, the feature sequence is used to adapt the tolerance to each gray value in the image, and the SDT compression algorithm then performs lossy compression of the image data with these adaptive tolerances, realizing distributed storage with data sharing.

Referring to fig. 1, a flowchart of a method for distributed compressed storage based on data sharing according to an embodiment of the present invention is shown, where the method includes the following steps:

Step S1: the gray values of one channel of each pixel point of an image form a one-dimensional data sequence; redundant data in the data sequence is removed to obtain a feature sequence.

When the image data uploaded by the user client is compressed by using the SDT compression algorithm, firstly, the characteristic sequence of each image needs to be acquired, and the duplicate removal processing is needed when the characteristic sequence is acquired.

The invention performs SDT compression with adaptive tolerance on the image data to be compressed, and the SDT compression algorithm achieves its best compression effect on sequence data, so the two-dimensional image data must first be serialized into one dimension. Consecutively repeated data contributes nothing when the tolerance of each data in the one-dimensional sequence is derived and only increases the amount of calculation, so it is removed, giving a feature sequence of serialized image data without consecutive repeats. For example, suppose the gray values of part of a row of pixel points are [a, a, a, a, a, b]. To quantify the tolerance from these gray values only a and b are needed, but if the original sequence [a, a, a, a, a, b] were used, each a would be compared with the preceding or following a, which adds data processing cost. The feature sequence without consecutive repeats still reflects how the gray values of the pixel points change, and the tolerance is finally adapted on that basis.

Taking the $i$-th image to be compressed as an example, the gray values of each channel of each pixel point in the image form a one-dimensional data sequence. Consecutively repeated data in the data sequence is processed so that only one data of each run is retained, which gives the feature sequence. The gray values of the several channels of one image give several one-dimensional data sequences, and therefore several feature sequences, one per channel. The feature sequence is acquired as follows.

First, the two-dimensional image data of the $i$-th image to be compressed is serialized: the first row of pixel points of the image is taken as the basis, and each subsequent row of pixel points is appended in order after it. This gives the serialized data sequence $X_i$ of the $i$-th image to be compressed:

$$X_i = \{x_1, x_2, \dots, x_N\}$$

wherein $x_k$ is the size of the $k$-th data of the serialized $i$-th image to be compressed, with $k \in [1, N]$; $N$ is the total number of data in the data sequence, that is, the total number of pixel points of the $i$-th image before serialization; and $x_k$ equals the gray value of the $k$-th pixel point of the $i$-th image before serialization.
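The row-by-row serialization can be sketched as follows. This is an illustrative sketch only; the function name `serialize_channel` and the nested-list representation of one image channel are assumptions, not part of the described method.

```python
def serialize_channel(channel_rows):
    """Flatten one image channel into a 1-D list, first row first, then each later row."""
    sequence = []
    for row in channel_rows:
        sequence.extend(row)
    return sequence


# Hypothetical 2x3 single-channel image of gray values.
channel = [[10, 10, 12],
           [12, 11, 11]]
print(serialize_channel(channel))  # [10, 10, 12, 12, 11, 11]
```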

Further, the feature sequence is acquired from the data sequence. Taking the $k$-th data $x_k$ of the data sequence $X_i$ as an example, backward repeated data is removed as follows.

First, the backward data repetition value $r_k$ is calculated:

$$r_k = x_{k+1} - x_k$$

wherein $x_k$ is the size of the $k$-th data in the data sequence, $x_{k+1}$ is the size of the next data after the $k$-th data, and $r_k$ is the data repetition value of the $k$-th data.

The result falls into one of two cases:

$$r_k \neq 0 \quad \text{or} \quad r_k = 0$$

If $r_k \neq 0$, then $x_k$ and $x_{k+1}$ are not repeated, and the $(k+1)$-th data is not backward repeated data of the $k$-th data. The search for backward repeated data of the $k$-th data stops, and the $(k+1)$-th data is selected to look for its own backward repeated data.

If $r_k = 0$, then $x_k$ and $x_{k+1}$ are repeated, and the $(k+1)$-th data is backward repeated data of the $k$-th data. The comparison then continues with $x_k$ and $x_{k+2}$: if the corresponding data repetition value is 0, $x_{k+2}$ is also backward repeated data of $x_k$; if it is not 0, it is not, and the search for backward repeated data of the $(k+2)$-th data $x_{k+2}$ starts, in the same way as for the $k$-th data. Note that "backward" here refers to data located after a given data.

The backward repeated data of the $k$-th data is then removed, so that only the $k$-th data itself is retained. For example, if the data sequence is [A, B, B, B, C, C, A, A], its feature sequence is [A, B, C, A]: the consecutively repeated runs BBB, CC and AA are each reduced to a single data, namely B, C and A, while the first A is not consecutively repeated and is retained directly.
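The removal of consecutively repeated data can be sketched in a few lines. This is a minimal sketch; the function name `feature_sequence` is an assumption, and the equality test stands in for the check that the data repetition value is zero.

```python
def feature_sequence(data_sequence):
    """Keep one data from each run of consecutive repeats; keep non-consecutive repeats."""
    features = []
    for value in data_sequence:
        # A value is a backward repeat only if it equals the data retained just before it.
        if not features or value != features[-1]:
            features.append(value)
    return features


print(feature_sequence(["A", "B", "B", "B", "C", "C", "A", "A"]))  # ['A', 'B', 'C', 'A']
```

The second A survives because it is separated from the first A by other data, so it is not a consecutive repeat.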

At this point, the feature sequence $Y_i$ of the data sequence corresponding to the $i$-th image data to be stored has been obtained:

$$Y_i = \{y_1, y_2, \dots, y_M\}$$

wherein $y_j$ denotes the $j$-th data in the feature sequence and $y_M$ denotes the $M$-th data in the feature sequence; $j \in [1, M]$, and $M$ does not exceed the number of data in the data sequence.

Step S2: acquire the uplink data and downlink data corresponding to one data of the feature sequence in the SDT compression algorithm process, and obtain the uplink adjustment coefficient and downlink adjustment coefficient of the data based on the uplink data and downlink data.

When image data is compressed with the SDT compression algorithm, the overall compression rate depends on the tolerance: the larger the tolerance, the higher the compression rate but also the higher the distortion rate of the data; conversely, the smaller the tolerance, the lower the compression rate but also the lower the distortion rate. The invention calculates the base tolerance of each data from the differences between data of each size in the feature sequence (corresponding to the different gray values in the image) and their backward adjacent data, because the SDT compression algorithm compresses backward, following the fluctuation trend of the data.
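For orientation, the trade-off between tolerance and compression rate can be seen in a sketch of the conventional fixed-tolerance SDT algorithm on evenly spaced samples. This is the classic algorithm, not the adaptive-tolerance variant of the invention; the function name `sdt_compress` and the choice to return the indices of retained points are assumptions made for illustration.

```python
def sdt_compress(values, tolerance):
    """Classic swinging door trending with one fixed tolerance.

    Returns the indices of the samples that are retained.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    n = len(values)
    if n <= 2:
        return list(range(n))

    kept = [0]
    anchor = 0
    upper = float("-inf")  # steepest slope seen from the pivot at anchor value + tolerance
    lower = float("inf")   # shallowest slope seen from the pivot at anchor value - tolerance
    i = 1
    while i < n:
        dt = i - anchor
        upper = max(upper, (values[i] - values[anchor] - tolerance) / dt)
        lower = min(lower, (values[i] - values[anchor] + tolerance) / dt)
        if upper > lower:
            # The doors have opened past parallel: retain the previous sample,
            # restart from it, and re-evaluate the current sample against it.
            anchor = i - 1
            kept.append(anchor)
            upper = float("-inf")
            lower = float("inf")
            continue
        i += 1
    kept.append(n - 1)
    return kept


samples = [0, 0, 0, 10, 0, 0]
print(sdt_compress(samples, tolerance=1))   # [0, 2, 3, 4, 5]
print(sdt_compress(samples, tolerance=20))  # [0, 5]
```

A larger tolerance lets the two doors stay open over longer runs, so fewer points are retained; a smaller tolerance retains more points and follows the data more closely.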

Therefore, the backward adjacent data of each data must be obtained and analyzed to characterize the data. Backward adjacent data is the data immediately following a given data; for example, in AB, B is the backward adjacent data of A. Relative to a given data, the data in the feature sequence falls into three types: larger than the data, smaller than the data, and equal to the data. Because equal data share the same tolerance, the data larger than the given data and the data smaller than it are analyzed together to obtain the variation characteristics of the data.

The uplink data and downlink data corresponding to one data in the feature sequence are therefore obtained as follows: select one data and find the data in the feature sequence equal to it, recorded as equal data; the data that are backward adjacent to each equal data and larger than it are recorded as the uplink data of the data; the data that are backward adjacent to each equal data and smaller than it are recorded as the downlink data of the data. Note that all equal data share the same uplink data and downlink data.
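The collection of uplink and downlink data can be sketched as a single pass over adjacent pairs of the feature sequence. This is a minimal sketch; the function name `uplink_downlink` and the sample values are assumptions for illustration.

```python
def uplink_downlink(features, value):
    """Collect the backward adjacent data of every occurrence of `value`.

    Larger neighbours are uplink data, smaller neighbours are downlink data.
    """
    uplink, downlink = [], []
    for current, following in zip(features, features[1:]):
        if current != value:
            continue
        if following > value:
            uplink.append(following)
        elif following < value:
            downlink.append(following)
    return uplink, downlink


features = [5, 8, 5, 3, 5, 9, 2]  # hypothetical feature sequence
print(uplink_downlink(features, 5))  # ([8, 9], [3])
```

Because the feature sequence has no consecutive repeats, a backward adjacent data is never equal to the data itself, so every neighbour falls into one of the two lists.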

Further, the degree of dispersion of the uplink data and of the downlink data of one data is analyzed separately to obtain the uplink adjustment coefficient and the downlink adjustment coefficient of that data. The ratio of the number of uplink data of the data to the sum of the numbers of its uplink data and downlink data is obtained, together with the reciprocal of the mean and the standard deviation of all its uplink data. The product of the ratio, the reciprocal of the mean and the standard deviation is the first calculation parameter, for the uplink data; the second calculation parameter, for the downlink data, is obtained in the same way. The two calculation parameters are added; the ratio of the first calculation parameter to this sum is the uplink adjustment coefficient of the data, and the ratio of the second calculation parameter to this sum is the downlink adjustment coefficient. Specifically, the first calculation parameter corresponding to the uplink data is first obtained, expressed by the formula:

$$P_u = \frac{n_u}{n_u + n_d} \cdot \frac{1}{\bar{u}} \cdot \sigma(U)$$

wherein $P_u$ is the first calculation parameter corresponding to the uplink data; $n_u$ and $n_d$ are the number of uplink data and the number of downlink data of one data; $\bar{u} = \frac{1}{n_u}\sum_{t=1}^{n_u} u_t$ is the mean of all uplink data; $U$ is the set formed by all uplink data corresponding to the data, and $\sigma(U)$ is the standard deviation of its elements; and $u_t$ is the $t$-th uplink data in that set.

The first calculation parameter captures both the degree of dispersion of all uplink data corresponding to one data and the share of the uplink data. The tolerance of each data is calculated subsequently, and for that calculation the uplink data and the downlink data are analyzed separately: the larger the share of the uplink data and the larger its dispersion, the greater its influence on the calculated tolerance. In other words, a larger first calculation parameter indicates that the uplink data of the data is more dispersed and more numerous, and the more the tolerance needs to be adjusted using the uplink and downlink adjustment coefficients. The formula therefore includes the share of the uplink data among all uplink and downlink data, $\frac{n_u}{n_u + n_d}$ (the number of uplink data over the number of uplink plus downlink data); the larger this value, the more heavily the uplink data weighs on the tolerance. The standard deviation of the uplink data, $\sigma(U)$, represents its degree of dispersion; because different data may yield identical standard deviations, the reciprocal of the mean of the uplink data, $\frac{1}{\bar{u}}$, is used to distinguish them.

Similarly, the second calculation parameter $P_d$ of the downlink data is calculated by the same method as the first calculation parameter of the uplink data; only the parameters corresponding to the downlink data need to be substituted. The first and second calculation parameters, corresponding respectively to the uplink data and the downlink data of one data in the feature sequence, are thus obtained. They are then normalized to give the uplink adjustment coefficient for the uplink data and the downlink adjustment coefficient for the downlink data:

$$\alpha_u = \frac{P_u}{P_u + P_d}, \qquad \alpha_d = \frac{P_d}{P_u + P_d}$$

wherein $\alpha_u$ and $\alpha_d$ are the uplink adjustment coefficient and the downlink adjustment coefficient, and $P_u$ and $P_d$ are the first calculation parameter and the second calculation parameter. The normalization converts the two calculation parameters into weights that are subsequently applied when analyzing the uplink data and downlink data of the data, so as to obtain an appropriate base tolerance.
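The calculation parameters and their normalization can be sketched as follows. This is a sketch under stated assumptions: the population standard deviation is used (the text does not say which form applies), an empty or zero-mean side contributes a parameter of 0, and equal weights are returned when both parameters vanish. The function names are illustrative.

```python
from statistics import mean, pstdev


def calc_parameter(own, other):
    """Share of `own` among both sides, times 1/mean(own), times std(own)."""
    if not own:
        return 0.0  # assumption: a side with no data contributes nothing
    average = mean(own)
    if average == 0:
        return 0.0  # assumption: avoid dividing by a zero mean
    share = len(own) / (len(own) + len(other))
    return share * pstdev(own) / average


def adjustment_coefficients(uplink, downlink):
    """Normalize the two calculation parameters so that they sum to 1."""
    p_up = calc_parameter(uplink, downlink)
    p_down = calc_parameter(downlink, uplink)
    total = p_up + p_down
    if total == 0:
        return 0.5, 0.5  # assumption: equal weights when both parameters are zero
    return p_up / total, p_down / total


alpha_up, alpha_down = adjustment_coefficients([8, 10], [2, 4])
print(round(alpha_up, 6), round(alpha_down, 6))  # 0.25 0.75
```

In this hypothetical input both sides have the same share and the same standard deviation, so the smaller mean of the downlink data gives it the larger coefficient.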

Step S3: obtain the base tolerance of the data from the uplink adjustment coefficient and downlink adjustment coefficient of the data in the feature sequence and from the uplink data and downlink data of the data.

Because the SDT compression algorithm compresses backward, following the fluctuation trend of the data, the invention analyzes the weighted mean difference between a data in the feature sequence and its corresponding uplink data and downlink data, and uses this weighted mean difference as the base tolerance of data of each size in the feature sequence.

Taking the $j$-th data $y_j$ in the feature sequence as an example, its base tolerance is calculated as follows: the uplink adjustment coefficient of the data is used as the weight of the mean difference between each uplink data and the data, the downlink adjustment coefficient is used as the weight of the mean difference between the data and each downlink data, and the two weighted terms are summed; the average of the summation result is taken as the base tolerance of the data, formulated as:

$$\Delta_j = \frac{1}{2}\left[\alpha_u \cdot \frac{1}{n_u}\sum_{t=1}^{n_u}\left(u_t - y_j\right) + \alpha_d \cdot \frac{1}{n_d}\sum_{s=1}^{n_d}\left(y_j - d_s\right)\right]$$

wherein $\Delta_j$ is the base tolerance of the $j$-th data in the feature sequence; $\alpha_u$ is the uplink adjustment coefficient; $\alpha_d$ is the downlink adjustment coefficient; $u_t$ is the $t$-th uplink data corresponding to the $j$-th data; $y_j$ is the $j$-th data in the feature sequence, and $u_t$ is greater than $y_j$; $d_s$ is the $s$-th downlink data corresponding to the $j$-th data, and $d_s$ is less than $y_j$; $n_u$ and $n_d$ are the numbers of uplink data and downlink data corresponding to the $j$-th data, with $t \in [1, n_u]$ and $s \in [1, n_d]$.

The base tolerance $\Delta_j$ is the tolerance of the $j$-th data $y_j$ before other influences are considered, for example how many data of this size appear in the whole feature sequence. All data in the feature sequence of the $i$-th image to be compressed represent consecutive, non-repeated gray values of the pixel points of one channel of that image, and the SDT compression algorithm compresses backward. The invention therefore takes every data in the feature sequence whose size equals $y_j$, computes the differences between that size and the backward adjacent data of each such data, namely the differences from the uplink data, $u_t - y_j$, and from the downlink data, $y_j - d_s$, and uses the mean of these differences as the base tolerance of data of that size in the feature sequence.

In the feature sequence, the backward adjacent data of the data equal to the $j$-th data $y_j$ include, given the characteristics of the SDT compression algorithm, both data larger than $y_j$ and data smaller than $y_j$, that is, the uplink data and the downlink data of the data. The two cases are therefore processed separately. For the backward adjacent data greater than $y_j$ and for those less than $y_j$, the product of the degree of dispersion and the share of the data count is used as a weight on the mean difference; that is, the mean differences of the two cases are adjusted by the uplink adjustment coefficient and the downlink adjustment coefficient. For example, the more of the backward adjacent data are greater than $y_j$, the more the base tolerance of data of size $y_j$ should be determined by the data greater than $y_j$. The degree of dispersion works similarly: the more dispersed the backward adjacent data, the more the subsequent data fluctuates when data of size $y_j$ serves as a compression point, and the larger the tolerance should be so that compression is more efficient at that point, and vice versa. The mean differences are thus weighted according to the number and dispersion of the backward adjacent data, and the weighted mean is finally used as the base tolerance of data of size $y_j$ in the feature sequence. Note that data of the same size in the feature sequence have the same base tolerance.

In this way, the variation characteristics of the uplink data and downlink data of one data are analyzed to obtain the uplink and downlink adjustment coefficients, which adjust the differences between the data and its uplink and downlink data to yield an appropriate base tolerance for the data.
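The base tolerance calculation can be sketched as follows. This sketch reads "the average of the summation result" as halving the sum of the two weighted mean differences, which is an interpretation rather than a confirmed formula; a side with no data is assumed to contribute 0. The function name and sample values are illustrative.

```python
def base_tolerance(value, uplink, downlink, alpha_up, alpha_down):
    """Weighted mean gap to the uplink and downlink data, averaged over the two terms."""
    # Mean difference between each uplink data and the data (uplink data are larger).
    up_gap = sum(u - value for u in uplink) / len(uplink) if uplink else 0.0
    # Mean difference between the data and each downlink data (downlink data are smaller).
    down_gap = sum(value - d for d in downlink) / len(downlink) if downlink else 0.0
    # Assumption: the summation result is averaged by dividing by 2.
    return (alpha_up * up_gap + alpha_down * down_gap) / 2


# Hypothetical data of size 5 with uplink data [8, 10] and downlink data [2, 4].
print(base_tolerance(5, [8, 10], [2, 4], alpha_up=0.25, alpha_down=0.75))  # 1.25
```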

And S4, obtaining tolerance weight of data according to the frequency number of the data appearing in the characteristic sequence, and the uplink data and the downlink data of the data.

The basic tolerance calculated in step S3 is for data of the same size in the feature sequence, and the actual image condition is not considered, so that the basic tolerance needs to be corrected to obtain a tolerance with better compression effect.

For example, repeated data that appears more times in the feature sequence has smaller intervals in the image. When such a gray value is used as a compression point, the tolerance should be smaller so that the compressed smaller interval retains a better visual effect or a lower distortion rate. Conversely, data that appears fewer times indicates that the interval of the corresponding gray value in the image is longer, so the tolerance should be larger during compression to ensure the corresponding compression rate. Therefore, a tolerance weight needs to be calculated for data of each size to adjust the basic tolerance and obtain the final tolerance.

Specifically, taking a data of a given size in the feature sequence as an example: identical data in the feature sequence are divided into one class, and the number of data in each class and the total number of classes are obtained; the differences between the data and its uplink data and downlink data are obtained; and the tolerance weight of the data is obtained according to the frequency with which the data occurs in the feature sequence, the number of data in each class in the feature sequence, the total number of classes, and the absolute value of the ratio of each difference to the data.

The terms of the tolerance weight formula respectively represent: the tolerance weight of the jth data in the feature sequence; the number of data in a class, together with the size of the data in that class; the total number of categories in the data sequence; the number of uplink data and the number of downlink data of the data; an individual data among the uplink data and the downlink data of the data; and an exponential function with the natural constant e as the base.

Here, the term built from the number of data in each class characterizes the relative number of occurrences of the data: the larger its value, the more times the same data occurs in the feature sequence and the smaller its interval in the image to be compressed. The smaller the interval, the more the corresponding gray-scale sensitivity needs to be considered; that is, when the gray value is used as a compression point, the information loss rate between the gray value and the compression point needs to be reduced to compensate for the visual perception of human eyes, so a smaller tolerance is needed. Similarly, the term built from the uplink data and the downlink data indicates the difference between these data and the data itself: the larger its value, the more abundant the gray-scale change in the image, and the smaller the tolerance should also be when the data is used as a compression point. The tolerance weight of the data is thus finally obtained and is used to adjust the basic tolerance of the data, so that the adjusted basic tolerance improves the compression effect of the SDT compression algorithm when the image data is compressed.
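The quantities that enter the tolerance weight can be gathered as in the following sketch. It deliberately stops short of combining them, because the weight formula itself is not reproduced in this text. The function name `weight_ingredients`, the returned dictionary layout, the reading of "frequency" as count divided by sequence length, and the handling of a zero gray value (no ratios are returned) are all assumptions made for illustration.

```python
from collections import Counter


def weight_ingredients(feature_seq, value, up, down):
    """Gather the inputs named in step S4 for one data value.

    Returns the class sizes, the total number of classes, the relative
    frequency of `value`, and the absolute ratios |difference / value|
    for each uplink and downlink data.
    """
    counts = Counter(feature_seq)      # class (gray value) -> number of data
    neighbours = list(up) + list(down)
    if value == 0:
        abs_ratios = []                # assumption: avoid division by zero
    else:
        abs_ratios = [abs((n - value) / value) for n in neighbours]
    return {
        "class_counts": dict(counts),
        "total_classes": len(counts),
        "relative_frequency": counts[value] / len(feature_seq),
        "abs_ratios": abs_ratios,
    }


seq = [10, 12, 10, 7, 10, 14, 9]
info = weight_ingredients(seq, 10, up=[12, 14], down=[7])
print(info["total_classes"])   # 5
print(info["abs_ratios"])      # [0.2, 0.4, 0.3]
```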

S5, obtaining the final tolerance of each datum based on the basic tolerance and the tolerance weight of each datum; and compressing the data sequence corresponding to the image by using the final tolerance of each piece of data, and sharing the compressed data to each secondary server for distributed storage.

In steps S3 and S4, the basic tolerance size corresponding to each data in the feature sequence and the tolerance weight corresponding thereto are calculated, and the basic tolerance is adjusted by using the tolerance weight, so as to obtain the final tolerance size of the data in the feature sequence.

Taking a data of a given size in the feature sequence as an example, the sum of the first preset value and the tolerance weight of the data is obtained, and the ratio of this sum to the second preset value is multiplied by the basic tolerance of the data to obtain the final tolerance corresponding to the data. The final tolerance is calculated as follows:

$$\text{final tolerance} = \frac{\text{first preset value} + \text{tolerance weight}}{\text{second preset value}} \times \text{basic tolerance}$$

where the basic tolerance and the tolerance weight are those of the data of the given size in the feature sequence. Preferably, in this embodiment, the value of the first preset value is 1 and the value of the second preset value is 2. The basic tolerance is adjusted by the corresponding tolerance weight through this ratio, rather than by the tolerance weight directly, in order to narrow the variation range of the tolerance weight and to prevent special cases in which the final tolerance is too high or too low, which would cause an insufficient local compression rate or a high local information loss rate when the image is compressed with the final tolerance. An implementer can adjust the values of the first preset value and the second preset value according to the specific situation.
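As a small worked illustration of the final tolerance calculation, the sketch below applies the rule stated in claim 7 with the preset values of this embodiment as defaults. The function and parameter names are hypothetical.

```python
def final_tolerance(basic_tol, weight, first_preset=1.0, second_preset=2.0):
    """(first preset + tolerance weight) / second preset * basic tolerance."""
    return (first_preset + weight) / second_preset * basic_tol


print(final_tolerance(3.0, 0.5))   # 2.25
print(final_tolerance(3.0, 1.0))   # 3.0
```

With presets 1 and 2, a weight of 0 halves the basic tolerance and a weight of 1 leaves it unchanged, which shows how the ratio narrows the effect of the weight.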

The final tolerance of each data in the feature sequence can thus be obtained. The variation of the data is taken into account when the final tolerance is calculated, and since the data in the feature sequence are the gray values of the pixel points of one channel of the image, the final tolerance reflects the variation of the gray values of that channel. Therefore, when the image data is compressed by the SDT algorithm based on the final tolerance of each data, the compression rate can be improved while the distortion of the image does not affect its normal use.

In addition, the final tolerances of identical data in the feature sequence are all the same. An image has gray values of several channels and therefore yields several one-dimensional data sequences, each of which needs to be compressed.

The final tolerance of the data of each different size in the feature sequence is adaptive, so that identical data have the most suitable final tolerance. The data of each different size in the feature sequence corresponds to a different gray value in the image to be compressed, so the final tolerance corresponding to each gray value in the image to be compressed is obtained, and the SDT compression algorithm is then used to compress the one-dimensional data sequence corresponding to the image to be compressed.
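The compression step can be sketched with a textbook swinging door trending (SDT) pass in which the tolerance is looked up from the gray value of the current compression point. This is a generic SDT sketch under stated assumptions, not the exact procedure of the invention: the function name `sdt_compress`, the callable `tolerance_of`, and the use of the sample index as the time axis are all illustrative choices.

```python
def sdt_compress(values, tolerance_of):
    """Return the indices kept by a swinging-door pass over `values`.

    tolerance_of(v) gives the (non-negative) final tolerance for gray
    value v; it is re-read whenever a new compression point is archived.
    """
    def checked_tol(v):
        tol = tolerance_of(v)
        if tol < 0:
            raise ValueError("tolerance must be non-negative")
        return tol

    n = len(values)
    if n <= 2:
        return list(range(n))

    kept = [0]
    anchor = 0
    tol = checked_tol(values[anchor])
    up_slope = float("-inf")    # steepest slope seen from the upper pivot
    low_slope = float("inf")    # shallowest slope seen from the lower pivot
    i = 1
    while i < n:
        dt = i - anchor
        up_slope = max(up_slope, (values[i] - (values[anchor] + tol)) / dt)
        low_slope = min(low_slope, (values[i] - (values[anchor] - tol)) / dt)
        if up_slope > low_slope:
            # Doors opened past parallel: archive the previous point and
            # re-test the current point against the new compression point.
            anchor = i - 1
            kept.append(anchor)
            tol = checked_tol(values[anchor])
            up_slope, low_slope = float("-inf"), float("inf")
            continue
        i += 1
    if kept[-1] != n - 1:
        kept.append(n - 1)
    return kept


print(sdt_compress([5, 5, 5, 5], lambda v: 1.0))             # [0, 3]
print(sdt_compress([0, 0, 0, 10, 10, 10], lambda v: 1.0))    # [0, 2, 3, 5]
```

In practice `tolerance_of` could be the `get` method of a dictionary that maps each gray value to its final tolerance, since identical gray values share one final tolerance.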

Specifically, for each data in the data sequence, the compressed value obtained after SDT compression takes as its final value the integer nearest to that compressed value.

After compression by the SDT compression algorithm, the point on the compression curve is used to replace the gray value at each data point. The gray value is an integer, whereas the compressed value at the point may be a non-integer, so the assignment is performed in the above manner and each point is assigned to its nearest integer, where the rounding symbol in the formula indicates rounding down. At this point, after the image to be compressed has been compressed, the compressed data of the image is obtained.
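The nearest-integer assignment can be sketched as follows, using rounding down as the building block, as the description indicates. Sending a fractional part of exactly 0.5 upward, the linear interpolation between kept points, and the names `nearest_gray` and `reconstruct` are assumptions made for illustration.

```python
import math


def nearest_gray(compressed_value):
    """Assign a possibly non-integer compressed value to the nearest integer."""
    floor_v = math.floor(compressed_value)
    # Assumption: a fractional part of exactly 0.5 is sent upward.
    return floor_v if compressed_value - floor_v < 0.5 else floor_v + 1


def reconstruct(kept, kept_values, length):
    """Rebuild integer gray values by interpolating between kept points.

    `kept` holds strictly increasing indices and `kept_values` the gray
    values stored at those indices.
    """
    out = [0] * length
    points = list(zip(kept, kept_values))
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for i in range(i0, i1 + 1):
            y = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
            out[i] = nearest_gray(y)
    return out


print(nearest_gray(7.2), nearest_gray(7.5), nearest_gray(7.8))   # 7 8 8
print(reconstruct([0, 4], [0, 3], 5))                            # [0, 1, 2, 2, 3]
```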

After the compressed data is obtained, it needs to be stored in a distributed manner. The distributed file system used in the present invention is MooseFS, and an implementer may also select another distributed file system. The system comprises a metadata management server, a metadata log management server, a main server, a plurality of secondary servers and an NFS server. The role of each server is as follows:

NFS server: serves as the user client for uploading, changing and viewing image data;

metadata management server: manages the metadata generated in the storage system;

metadata log management server: logs all metadata and manages the logs;

main server: performs primary storage of all data;

secondary servers: share the data of the main server and provide backup for distributed storage.

The whole work flow is as follows:

First, an index is established in the metadata server, and the NFS server then communicates with the metadata server as a client. All operations of the user are based on the NFS server. Taking the upload of an image file as an example: the user uploads the image at the client; channel separation is then performed on the RGB image in the NFS server to obtain the gray values of each channel; the adaptive final tolerance of each data is obtained by the above method for obtaining the final tolerance; and the data is then compressed by the SDT compression algorithm.

The compressed image data uploaded by the user is then stored by the main server, and is backed up and shared to the plurality of secondary servers to realize distributed storage.
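The channel separation step of this workflow, together with the feature sequence of claim 8, can be sketched as below. The image representation (rows of `(r, g, b)` tuples), the row-major flattening order, and the function names are assumptions made for illustration; the source does not specify how pixels are ordered into the one-dimensional sequence.

```python
from itertools import groupby


def split_channels(rgb_image):
    """Turn rows of (r, g, b) pixels into three row-major 1-D sequences."""
    pixels = [px for row in rgb_image for px in row]
    return [[px[c] for px in pixels] for c in range(3)]


def feature_sequence(data_seq):
    """Keep one element of every run of consecutively repeated data."""
    return [value for value, _ in groupby(data_seq)]


image = [
    [(5, 0, 9), (5, 0, 9), (6, 1, 9)],
    [(6, 1, 8), (5, 2, 8), (5, 2, 8)],
]
r, g, b = split_channels(image)
print(r)                      # [5, 5, 6, 6, 5, 5]
print(feature_sequence(r))    # [5, 6, 5]
print(feature_sequence(b))    # [9, 8]
```

Each of the three sequences would then be given its own per-value final tolerances and compressed separately before being handed to the main server.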

It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. And specific embodiments thereof have been described above. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the present invention, and any modifications, equivalents, improvements and the like made within the scope of the present invention are intended to be included therein.

Claims

1. A distributed compression storage method based on data sharing is characterized by comprising the following steps:

2. The distributed compression storage method based on data sharing according to claim 1, wherein the obtaining of the uplink data and the downlink data corresponding to one data in the SDT compression algorithm process by the signature sequence comprises: selecting one data, obtaining data which is equal to the data in the characteristic sequence, and recording the data as equal data; obtaining data which is backward adjacent to each equal data and is larger than the equal data, and recording the data as uplink data of the data; and obtaining data which is backward adjacent to each equal data and is smaller than the equal data, and recording the data as downlink data of the data.

3. The method according to claim 1, wherein obtaining the uplink adjustment coefficient and the downlink adjustment coefficient of the data based on the uplink data and the downlink data comprises:

4. The method according to claim 1, wherein obtaining the base tolerance of a data according to the uplink adjustment coefficient and the downlink adjustment coefficient of the data in the signature sequence, and the uplink data and the downlink data of the data comprises: taking an uplink adjustment coefficient of one data in the characteristic sequence as the weight of the average value of the difference value of each uplink data and the data, taking a downlink adjustment coefficient as the weight of the average value of the difference value of the data and each downlink data, and summing to obtain a summation result; and calculating the average value of the summation result, and taking the average value as the basic tolerance of the data.

5. The method of claim 1, wherein obtaining the tolerance weight of a data according to the frequency of occurrence of the data in the signature sequence, the uplink data and the downlink data of the data comprises: dividing the same data in the characteristic sequence into one class, and obtaining the quantity of each class of data and the total class quantity; acquiring uplink data and a difference value between the downlink data and the data; and obtaining the tolerance weight of the data according to the frequency of occurrence of the data, the quantity of each type of data in the characteristic sequence, the total class number and the absolute value of the ratio of the difference value to the data.

6. The distributed compression storage method based on data sharing according to claim 5, wherein the tolerance weight of the data is:

wherein the terms of the formula respectively represent: the tolerance weight of the jth data in the feature sequence; the number of data in a class, together with the size of the data in that class; the total number of categories in the data sequence; the number of uplink data and the number of downlink data of the data; an individual data among the uplink data and the downlink data of the data; and an exponential function with the natural constant e as the base.

7. The method of claim 1, wherein obtaining the final tolerance of each data based on the basic tolerance and the tolerance weight of each data comprises: and obtaining the sum of the first preset value and the tolerance weight, and multiplying the ratio of the sum to the second preset value by the basic tolerance to obtain the final tolerance.

8. The distributed compression storage method based on data sharing according to claim 1, wherein the removing redundant data in the data sequence to obtain a feature sequence includes: in the data sequence, only one data of each run of continuously repeated data is retained, and discontinuously repeated data are retained directly, so that the feature sequence is obtained.