CN115857823A - Distributed compression storage method based on data sharing - Google Patents


Info

Publication number
CN115857823A
Authority
CN
China
Prior art keywords
data
tolerance
uplink
downlink
obtaining
Legal status
Pending
Application number
CN202211660453.8A
Other languages
Chinese (zh)
Inventor
祁朋涛
Current Assignee
Hangyin Consumer Finance Co ltd
Original Assignee
Hangyin Consumer Finance Co ltd
Application filed by Hangyin Consumer Finance Co ltd
Priority to CN202211660453.8A
Publication of CN115857823A
Legal status: Pending (current)

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to the technical field of electric digital data processing, and in particular to a distributed compression storage method based on data sharing. The method comprises the following steps. First, an uplink adjustment coefficient and a downlink adjustment coefficient are obtained for each datum from the uplink data and downlink data that correspond to that datum in the feature sequence. Next, a base tolerance is obtained for the datum from these two coefficients and from its uplink and downlink data. A tolerance weight is then obtained for the datum from the number of times it occurs in the feature sequence and from its uplink and downlink data. The final tolerance of each datum is obtained from its base tolerance and tolerance weight. Finally, the data sequence corresponding to the image is compressed using the final tolerance of each datum, and the compressed data is shared with each secondary server for distributed storage. Because these final tolerances are adaptive, the invention improves the compression of image data.

Description

Distributed compression storage method based on data sharing
Technical Field
The invention relates to the technical field of electric digital data processing, and in particular to a distributed compression storage method based on data sharing.
Background
In recent years the Internet has developed rapidly and big data is being applied in more and more industries. Data density keeps rising, and the volume of generated data grows accordingly. Storing clients' image data in particular has a serious impact on storage cost. Big data therefore needs to be compressed before it is stored, which reduces the stored volume and thus the storage cost.
When a client's picture data is stored, the characteristics of the human visual system allow part of the image information to be lost. The image data can therefore be compressed with a lossy algorithm: it is converted into one-dimensional data and compressed with the swinging door trending (SDT) compression algorithm. In SDT, the tolerance determines both the distortion and the compression ratio. Existing SDT implementations use a fixed tolerance, which compresses data with differing variation patterns poorly. As a result, a reasonable balance between compression ratio and distortion rate cannot be guaranteed.
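For readers unfamiliar with SDT, the Python sketch below shows one common textbook formulation of swinging door compression. It is written so that the tolerance can differ per value, which is the property the invention relies on. It is an illustration only. The function name `sdt_compress`, the use of sample indices as time stamps, and the rule that the door half-width comes from the value of the most recently archived point are all assumptions, not details taken from this patent.

```python
from typing import Callable, List, Sequence, Tuple


def sdt_compress(values: Sequence[float],
                 tolerance: Callable[[float], float]) -> List[Tuple[int, float]]:
    """Swinging door trending over equally spaced samples.

    Returns the archived (index, value) points. The door half-width E is
    tolerance(v) for the most recently archived value v (an illustrative
    assumption, not the patent's rule).
    """
    n = len(values)
    if n == 0:
        return []
    archived = [(0, values[0])]
    t0, v0 = 0, values[0]
    e = max(0.0, float(tolerance(v0)))  # negative widths are clamped to 0
    # Slopes s of a line from (t0, v0) that stays within +/-E of every
    # point seen so far lie in the interval [lo_slope, hi_slope].
    lo_slope, hi_slope = float("-inf"), float("inf")
    t = 1
    while t < n:
        dt = t - t0
        v = values[t]
        new_lo = max(lo_slope, (v - v0 - e) / dt)
        new_hi = min(hi_slope, (v - v0 + e) / dt)
        if new_lo > new_hi:
            # The doors have swung past parallel: archive the previous
            # point and restart the corridor from it.
            t0, v0 = t - 1, values[t - 1]
            archived.append((t0, v0))
            e = max(0.0, float(tolerance(v0)))
            lo_slope, hi_slope = float("-inf"), float("inf")
            continue  # re-test point t against the new archived point
        lo_slope, hi_slope = new_lo, new_hi
        t += 1
    if archived[-1][0] != n - 1:
        archived.append((n - 1, values[n - 1]))  # always keep the last sample
    return archived
```

The example shows how the tolerance trades compression ratio against distortion. `sdt_compress([0, 10, 0, 10, 0], lambda v: 1.0)` keeps all five points. With a tolerance of `20.0`, the same input keeps only `[(0, 0), (4, 0)]`.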
Disclosure of Invention
A fixed tolerance, as used in the existing swinging door compression algorithm, performs poorly on data with different variation trends. To solve this problem, the invention provides a distributed compression storage method based on data sharing. The technical solution adopted is as follows.
One embodiment of the invention provides a distributed compression storage method based on data sharing, comprising the following steps:
forming a one-dimensional data sequence from the gray values of one channel of each pixel point of the image, and removing redundant data from the data sequence to obtain a feature sequence;
for a datum in the feature sequence, acquiring the corresponding uplink data and downlink data used in the SDT compression process, and obtaining an uplink adjustment coefficient and a downlink adjustment coefficient of the datum from them;
obtaining the base tolerance of the datum from its uplink and downlink adjustment coefficients and its uplink and downlink data;
obtaining the tolerance weight of the datum from the number of times it occurs in the feature sequence and from its uplink and downlink data;
obtaining the final tolerance of each datum from its base tolerance and tolerance weight; and compressing the data sequence corresponding to the image using the final tolerance of each datum, then sharing the compressed data with each secondary server for distributed storage.
Preferably, acquiring the uplink data and downlink data that correspond to a datum of the feature sequence in the SDT compression process comprises the following. First, a datum is selected, and all data in the feature sequence equal to it are found and recorded as equal data. Each datum that immediately follows (is backward adjacent to) an equal datum and is larger than it is recorded as uplink data of the datum. Each datum that immediately follows an equal datum and is smaller than it is recorded as downlink data of the datum.
Preferably, obtaining the uplink and downlink adjustment coefficients of the datum from the uplink data and downlink data comprises the following steps:
The ratio of the number of uplink data of a datum in the feature sequence to the total number of its uplink and downlink data is computed, together with the reciprocal of the mean and the standard deviation of all uplink data. The product of the ratio, the reciprocal of the mean, and the standard deviation is the first calculation parameter (for the uplink data). The second calculation parameter (for the downlink data) is obtained in the same way. The two calculation parameters are then added. The ratio of the first calculation parameter to this sum is the uplink adjustment coefficient of the datum, and the ratio of the second calculation parameter to this sum is its downlink adjustment coefficient.
Preferably, obtaining the base tolerance of the datum from its uplink and downlink adjustment coefficients and its uplink and downlink data comprises the following. The uplink adjustment coefficient of a datum in the feature sequence weights the mean difference between each uplink datum and the datum. The downlink adjustment coefficient weights the mean difference between the datum and each downlink datum. The two weighted terms are summed. Because the two coefficients sum to 1, this sum is a weighted average of the two mean differences, and it is taken as the base tolerance of the datum.
Preferably, obtaining the tolerance weight of a datum from the number of times it occurs in the feature sequence and from its uplink and downlink data comprises the following. Identical data in the feature sequence are grouped into one class, and the number of data in each class and the total number of classes are obtained. The difference between each uplink or downlink datum and the datum is computed. The tolerance weight of the datum is then obtained from four quantities: the number of times the datum occurs, the number of data in each class of the feature sequence, the total number of classes, and the absolute value of the ratio of each difference to the datum.
Preferably, the tolerance weight of the data is given by the following formula:
[Formula image not available in this text]
In this formula:
$W_j$ denotes the tolerance weight of the $j$-th datum in the feature sequence.
$f_j$ denotes the number of times the $j$-th datum occurs in the feature sequence.
$n_q$ denotes the number of data in the $q$-th class; all data of a class share the same value.
$Q$ denotes the total number of classes in the data sequence.
$m_j^{\mathrm{up}}$ and $m_j^{\mathrm{down}}$ denote the number of uplink data and the number of downlink data of the datum, respectively.
$y_i$ denotes the $i$-th datum among the uplink and downlink data of the datum.
$\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base.
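The formula that combines these quantities is not available here. The Python sketch below therefore only gathers the inputs its definitions list for the $j$-th datum: its occurrence count, the class sizes, the number of classes, the uplink and downlink counts, and the ratios $|(y_i - c_j)/c_j|$. The names `WeightInputs` and `tolerance_weight_inputs` are hypothetical. The handling of a zero-valued datum, where the ratio is undefined, is also an assumption.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Sequence


class WeightInputs(NamedTuple):
    occurrences: int             # times c_j occurs in the feature sequence
    class_sizes: Dict[int, int]  # number of data in each class (equal values)
    num_classes: int             # total number of classes
    m_up: int                    # number of uplink data of c_j
    m_down: int                  # number of downlink data of c_j
    rel_diffs: List[float]       # |(y_i - c_j) / c_j| over uplink + downlink data


def tolerance_weight_inputs(feature_seq: Sequence[int], j: int) -> WeightInputs:
    value = feature_seq[j]
    class_sizes = dict(Counter(feature_seq))
    # Backward adjacent data of every datum equal to c_j.
    neighbours = [feature_seq[i + 1] for i in range(len(feature_seq) - 1)
                  if feature_seq[i] == value]
    up = [y for y in neighbours if y > value]
    down = [y for y in neighbours if y < value]
    # The ratio is undefined when c_j == 0; this sketch returns no ratios then.
    rel_diffs = [abs((y - value) / value) for y in up + down] if value != 0 else []
    return WeightInputs(class_sizes[value], class_sizes, len(class_sizes),
                        len(up), len(down), rel_diffs)
```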
Preferably, obtaining the final tolerance of each datum from its base tolerance and tolerance weight comprises three steps. First, a first preset value is added to the tolerance weight. Next, this sum is divided by a second preset value. Finally, the resulting ratio is multiplied by the base tolerance to give the final tolerance.
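Written as code, the final-tolerance rule is a single expression. The values of the two preset values are not given in this text, so they are parameters here. The numbers in the example call are hypothetical.

```python
def final_tolerance(base_tolerance: float, weight: float,
                    first_preset: float, second_preset: float) -> float:
    """Final tolerance = (first preset + tolerance weight) / second preset * base tolerance."""
    if second_preset == 0:
        raise ValueError("the second preset value must be non-zero")
    return (first_preset + weight) / second_preset * base_tolerance
```

For example, with hypothetical presets of 1.0 and 2.0, a base tolerance of 3.0, and a weight of 0.5, `final_tolerance(3.0, 0.5, 1.0, 2.0)` gives 2.25.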
Preferably, removing redundant data from the data sequence to obtain the feature sequence comprises the following. Within the data sequence, only one datum from each run of consecutive repeated data is retained. Repeated data that are not consecutive are retained unchanged. The result is the feature sequence.
The embodiments of the invention have at least the following beneficial effects.
Converting the image data into one-dimensional sequence data allows the image data to be analyzed as a whole. The feature sequence derived from the data sequence represents the characteristics of the data in the sequence, and it is also small, which makes each datum easier to analyze.
The uplink and downlink data of each datum in the feature sequence yield uplink and downlink adjustment coefficients. These serve as adjustment coefficients when the base tolerance of the datum is computed. Analyzing the variation of the data in the feature sequence together with the uplink and downlink data then gives a more appropriate base tolerance.
A tolerance weight is obtained from the number of times the datum occurs in the feature sequence and from its uplink and downlink data, and it adjusts the base tolerance to give the final tolerance. Because each datum's final tolerance adapts to the variation characteristics of that datum in the feature sequence, compression of the data sequence converted from the image improves. The compressed data is then shared with each secondary server for distributed storage.
Compression ratio and distortion rate thus reach a reasonable balance: the compression ratio increases while the distortion rate stays within a reasonable range, so the decompressed image still looks unaffected to the human eye.
Drawings
To illustrate the embodiments of the invention, and the technical solutions and advantages of the prior art, more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a distributed compression storage method based on data sharing according to an embodiment of the present invention.
Detailed Description
The following detailed description further illustrates the technical means by which the invention achieves its intended objects, and the effects of those means. With reference to the accompanying drawing and preferred embodiments, it covers the specific implementation, structure, features, and effects of the distributed compression storage method based on data sharing. In the following description, different occurrences of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the distributed compression storage method based on data sharing in detail with reference to the accompanying drawings.
Embodiment:
The main application scenario of the invention is as follows. Client image data is stored in a shared, distributed manner. To reduce the volume of stored data, the picture data transmitted by the client is compressed with the swinging door compression algorithm.
The purpose of the invention is as follows. A feature sequence is extracted from the image data and used to adapt a tolerance to each gray value in the image. The SDT compression algorithm then compresses the image data lossily using these per-gray-value adaptive tolerances, realizing distributed storage with data sharing.
Fig. 1 shows a flowchart of the distributed compression storage method based on data sharing according to an embodiment of the invention. Referring to Fig. 1, the method comprises the following steps:
S1. Form a one-dimensional data sequence from the gray values of one channel of each pixel point of an image, and remove redundant data from the data sequence to obtain a feature sequence.
Before image data uploaded by a user client is compressed with the SDT algorithm, the feature sequence of each image must be acquired, which requires de-duplication.
The invention applies adaptive-tolerance SDT compression to the image data. SDT works best on sequence data, so the two-dimensional image data is first serialized into one dimension. Consecutive repeated values contribute nothing when the tolerance of each datum in the one-dimensional sequence is computed, yet they add computation. Consecutive repeats are therefore removed, giving a feature sequence of the serialized image data in which no value repeats consecutively.
For example, suppose the gray values of part of a row of pixel points are [a, a, a, a, a, b]. Quantifying the tolerance from this segment essentially needs only a and b. Keeping the original values would require computing each a against the preceding or following a, which adds processing cost.
The feature sequence without consecutive repeats reflects how the pixel values of the image change. The tolerances are ultimately adapted to the pixel values of the image on the basis of this sequence.
Take the $k$-th image to be compressed as an example. The gray values of each channel of each pixel point in the image form a one-dimensional data sequence. Consecutive repeated data in the data sequence are processed so that only one datum of each run is retained, which gives the feature sequence. An image with several channels therefore has several one-dimensional data sequences and, correspondingly, several feature sequences. The feature sequence of a channel is denoted $C_k$. It is obtained as follows.
First, the two-dimensional data of the $k$-th image to be compressed is serialized. Starting from the first row of pixel points, each subsequent row is appended in turn after the previous one. This gives the serialized data sequence $X_k$ of the $k$-th image:
$$X_k = \{x_1, x_2, \ldots, x_i, \ldots, x_N\}$$
Here, $x_i$ is the value of the $i$-th datum of the serialized $k$-th image, with $i \in [1, N]$; it equals the gray value of the $i$-th pixel point of the image. $N$ is the total number of data in the sequence, that is, the total number of pixel points of the $k$-th image before serialization.
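The row-by-row serialization can be sketched in Python as follows. The sketch assumes the image is held as nested lists indexed `image[row][col][channel]`. That layout and the function names are assumptions made for illustration.

```python
from typing import List, Sequence

Image = Sequence[Sequence[Sequence[int]]]  # image[row][col][channel] (assumed layout)


def serialize_channel(image: Image, channel: int) -> List[int]:
    """Row-major flattening: row 1, then row 2 appended after it, and so on."""
    return [pixel[channel] for row in image for pixel in row]


def serialize_all_channels(image: Image) -> List[List[int]]:
    """One data sequence per channel; a 3-channel image yields three sequences."""
    if not image or not image[0]:
        return []
    return [serialize_channel(image, c) for c in range(len(image[0][0]))]
```

For a 2x2 two-channel image `[[(1, 10), (2, 20)], [(3, 30), (4, 40)]]`, this yields the sequences `[1, 2, 3, 4]` and `[10, 20, 30, 40]`.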
Next, the feature sequence is obtained from the data sequence. Taking the $i$-th datum $x_i$ of the data sequence $X_k$ as an example, its backward repeated data are removed as follows.
First, the backward data repetition value $r_i$ is computed:
$$r_i = x_{i+1} - x_i$$
Here, $x_i$ is the value of the $i$-th datum in the data sequence, $x_{i+1}$ is the value of the datum that follows it, and $r_i$ is the data repetition value of the $i$-th datum. $r_i$ falls into one of two cases.
If $r_i \neq 0$, then $x_{i+1}$ and $x_i$ are not repeated, and the $(i+1)$-th datum is not a backward repeat of the $i$-th datum. The search for backward repeats of the $i$-th datum stops. The $(i+1)$-th datum is then selected, and the search for its own backward repeats begins.
If $r_i = 0$, then $x_{i+1}$ repeats $x_i$, and the $(i+1)$-th datum is a backward repeat of the $i$-th datum. The search continues by checking whether the $(i+2)$-th datum $x_{i+2}$ repeats $x_i$, that is, whether its repetition value with respect to $x_i$ is 0. If it is 0, $x_{i+2}$ is a backward repeat of $x_i$. Otherwise it is not, and the search for backward repeats of $x_{i+2}$ begins, proceeding in the same way as for the $i$-th datum. Here, "backward" refers to the data that come after a given datum.
The backward repeated data of the $i$-th datum are then removed, and only the $i$-th datum $x_i$ itself is retained. For example, if the data sequence is [A, B, B, B, C, C, A, A], its feature sequence is [A, B, C, A]. The consecutive runs BBB, CC and AA are each reduced to a single datum (B, C and A). The first A is not consecutively repeated, so it is retained directly.
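The de-duplication procedure can be sketched in Python as below. The sketch mirrors the repetition-value test described above: a difference of 0 marks a backward repeat. It assumes numeric gray values, and the function name is illustrative.

```python
from typing import List, Sequence


def feature_sequence(data: Sequence[int]) -> List[int]:
    """Keep one datum from each run of consecutive equal values."""
    result: List[int] = []
    i, n = 0, len(data)
    while i < n:
        result.append(data[i])  # x_i itself is always retained
        j = i + 1
        # Repetition value data[j] - data[i] == 0 marks a backward repeat of x_i.
        while j < n and data[j] - data[i] == 0:
            j += 1
        i = j  # the first non-repeat starts the next search
    return result
```

With the example above mapped to numbers (A=1, B=2, C=3), `feature_sequence([1, 2, 2, 2, 3, 3, 1, 1])` returns `[1, 2, 3, 1]`. The same result can also be obtained with `[value for value, _ in itertools.groupby(data)]`.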
At this point, the feature sequence $C_k$ obtained from the data sequence of the $k$-th image to be stored is:
$$C_k = \{c_1, c_2, \ldots, c_j, \ldots, c_M\}$$
Here, $c_j$ is the $j$-th datum and $c_M$ the $M$-th (last) datum of the feature sequence, with $j \in [1, M]$ and $M \le N$.
S2. Acquire the uplink data and downlink data that correspond to each datum of the feature sequence in the SDT compression process, and obtain the uplink and downlink adjustment coefficients of the datum from them.
When image data is compressed with SDT, the overall compression ratio depends on the tolerance. The larger the tolerance, the higher the compression ratio, but the higher the distortion rate of the data. Conversely, the smaller the tolerance, the lower both the compression ratio and the distortion rate. The invention computes the base tolerance of each datum from the differences between data of different values in the feature sequence (which correspond to different gray values in the image) and the data that follow them. This approach fits SDT, which compresses in the backward direction and follows the fluctuation trend of the data as it proceeds through the sequence.
Therefore, the backward adjacent data of each datum must be obtained in order to analyze and characterize that datum. The backward adjacent datum of a datum is the neighbor that immediately follows it; for example, in AB, B is the backward adjacent datum of A.
Relative to a given datum, the data of the feature sequence fall into three types: larger than it, smaller than it, and equal to it. Since equal data share the same tolerance, the data larger and smaller than the datum must be analyzed together to obtain its variation characteristics.
Accordingly, the uplink data and downlink data of a datum in the feature sequence are obtained as follows. First, select a datum, and find all data in the feature sequence equal to it; record these as equal data. Each datum that immediately follows an equal datum and is larger than it is recorded as uplink data of the datum. Each datum that immediately follows an equal datum and is smaller than it is recorded as downlink data of the datum. Note that all equal data share the same uplink data and the same downlink data.
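The uplink and downlink data of a value can be collected in a single pass over the feature sequence. The sketch below pools the backward neighbours of every equal datum, so all equal data share one uplink set and one downlink set, as stated above. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def uplink_downlink(feature_seq: Sequence[int], value: int) -> Tuple[List[int], List[int]]:
    """Return (uplink data, downlink data) of `value` in the feature sequence."""
    up: List[int] = []
    down: List[int] = []
    for i in range(len(feature_seq) - 1):
        if feature_seq[i] == value:   # an "equal datum"
            nxt = feature_seq[i + 1]  # its backward adjacent datum
            if nxt > value:
                up.append(nxt)
            elif nxt < value:
                down.append(nxt)
    return up, down
```

In a feature sequence, neighbouring data are never equal, so every backward neighbour lands in exactly one of the two lists. For `[3, 5, 7, 3, 5, 2]`, the value 5 has uplink data `[7]` and downlink data `[2]`.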
Next, the dispersion of the uplink data and the dispersion of the downlink data of a datum are analyzed separately to obtain its uplink and downlink adjustment coefficients. This involves four steps:
1. Compute the ratio of the number of uplink data of the datum to the total number of its uplink and downlink data, together with the reciprocal of the mean and the standard deviation of all its uplink data.
2. Take the product of the ratio, the reciprocal of the mean, and the standard deviation as the first calculation parameter (for the uplink data).
3. Obtain the second calculation parameter (for the downlink data) in the same way.
4. Add the two parameters, and take the ratio of each parameter to the sum as the uplink and the downlink adjustment coefficient, respectively.
Specifically, for the $j$-th datum $c_j$ of the feature sequence, the first calculation parameter for the uplink data is:
$$P_j^{\mathrm{up}} = \frac{m_j^{\mathrm{up}}}{m_j^{\mathrm{up}} + m_j^{\mathrm{down}}} \cdot \frac{1}{\bar{u}_j} \cdot \sigma(U_j)$$
The symbols are defined as follows:
$P_j^{\mathrm{up}}$ is the first calculation parameter.
$m_j^{\mathrm{up}}$ and $m_j^{\mathrm{down}}$ are the numbers of uplink and downlink data of the datum, respectively.
$U_j = \{u_1, u_2, \ldots, u_{m_j^{\mathrm{up}}}\}$ is the set of all uplink data of the datum, and $u_i$ is its $i$-th element.
$\bar{u}_j$ is the mean of all uplink data.
$\sigma(U_j)$ is the standard deviation of $U_j$.
The first calculation parameter reflects both the dispersion of all uplink data of a datum and the proportion of uplink data. When the tolerance of each datum is computed later, its uplink and downlink data are analyzed separately. The larger the proportion and the dispersion of the uplink data, the greater their influence on the computed tolerance. In other words, a larger first calculation parameter means that the uplink data of the datum are more dispersed and make up a larger proportion, so the tolerance must be adjusted more strongly with the uplink and downlink adjustment coefficients.
For this reason, the formula includes the proportion of uplink data among all uplink and downlink data, $m_j^{\mathrm{up}}/(m_j^{\mathrm{up}} + m_j^{\mathrm{down}})$: the larger this proportion, the more weight the uplink data must carry in their influence on the tolerance. The term $\sigma(U_j)$ measures the dispersion of the uplink data. Different data may yield identical standard deviations, so the factor $1/\bar{u}_j$ is used to distinguish them.
Similarly, the second calculation parameter $P_j^{\mathrm{down}}$ of the downlink data is computed in the same way as the first, with the uplink quantities replaced by the corresponding downlink ones:
$$P_j^{\mathrm{down}} = \frac{m_j^{\mathrm{down}}}{m_j^{\mathrm{up}} + m_j^{\mathrm{down}}} \cdot \frac{1}{\bar{d}_j} \cdot \sigma(D_j)$$
Here, $D_j = \{d_1, d_2, \ldots, d_{m_j^{\mathrm{down}}}\}$ is the set of all downlink data of the datum, and $\bar{d}_j$ is their mean.
This gives the first and second calculation parameters for the uplink and downlink data of a datum in the feature sequence. They are then normalized to obtain the uplink adjustment coefficient for the uplink data and the downlink adjustment coefficient for the downlink data:
$$\alpha_j^{\mathrm{up}} = \frac{P_j^{\mathrm{up}}}{P_j^{\mathrm{up}} + P_j^{\mathrm{down}}}, \qquad \alpha_j^{\mathrm{down}} = \frac{P_j^{\mathrm{down}}}{P_j^{\mathrm{up}} + P_j^{\mathrm{down}}}$$
Here, $\alpha_j^{\mathrm{up}}$ and $\alpha_j^{\mathrm{down}}$ are the uplink and downlink adjustment coefficients, and $P_j^{\mathrm{up}}$ and $P_j^{\mathrm{down}}$ are the first and second calculation parameters. Normalization turns the two parameters into weights for the subsequent analysis of the datum's uplink and downlink data, so that an appropriate base tolerance can be obtained.
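The two calculation parameters and their normalization are sketched below in Python. Three details are assumptions, since this text does not fix them. First, the population standard deviation (`statistics.pstdev`) is used. Second, a side with no data, or with a zero mean (for example all-zero gray values), contributes a parameter of 0. Third, when both parameters are 0, the coefficients fall back to the count proportions.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def _calc_parameter(values: Sequence[float], total: int) -> float:
    """Ratio * (1 / mean) * standard deviation for one side (uplink or downlink)."""
    if not values:
        return 0.0
    mu = mean(values)
    if mu == 0:
        return 0.0  # assumption: a zero-mean set is treated as having no dispersion
    return (len(values) / total) * (1.0 / mu) * pstdev(values)


def adjustment_coefficients(up: Sequence[float],
                            down: Sequence[float]) -> Tuple[float, float]:
    """Return (uplink coefficient, downlink coefficient); they sum to 1."""
    total = len(up) + len(down)
    if total == 0:
        raise ValueError("the datum has no uplink or downlink data")
    p_up = _calc_parameter(up, total)      # first calculation parameter
    p_down = _calc_parameter(down, total)  # second calculation parameter
    s = p_up + p_down
    if s == 0:
        # Assumption: with zero dispersion on both sides, fall back to the
        # count proportions so that the coefficients still sum to 1.
        return len(up) / total, len(down) / total
    return p_up / s, p_down / s
```

For uplink data `[4, 8]` and downlink data `[1, 3]`, the parameters are 1/6 and 1/4, giving coefficients of about 0.4 and 0.6.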
S3. Obtain the base tolerance of each datum from its uplink and downlink adjustment coefficients and its uplink and downlink data.
SDT compresses in the backward direction, following the fluctuation trend of the data. The invention therefore analyzes the weighted mean difference between each datum in the feature sequence and its uplink and downlink data. This weighted mean difference is used as the base tolerance for data of each value in the feature sequence.
Taking the size in the characteristic sequence as the second order
Figure 110516DEST_PATH_IMAGE070
Number of data->
Figure 25382DEST_PATH_IMAGE068
Calculating the basic tolerance of the data, taking an uplink adjustment coefficient of one data in the characteristic sequence as the weight of the average value of the difference value of each uplink data and the data, taking a downlink adjustment coefficient as the weight of the average value of the difference value of the data and each downlink data, and summing to obtain a summation result; the average of the summation results is taken as the base tolerance of the data, and is formulated as:
$$T_j=\frac{1}{2}\left[\alpha_j\cdot\frac{1}{n_j}\sum_{i=1}^{n_j}\left(x^{\mathrm{up}}_{j,i}-x_j\right)+\beta_j\cdot\frac{1}{m_j}\sum_{k=1}^{m_j}\left(x_j-x^{\mathrm{down}}_{j,k}\right)\right]$$
where $T_j$ is the basic tolerance of the $j$-th datum in the feature sequence; $\alpha_j$ is the uplink adjustment coefficient; $\beta_j$ is the downlink adjustment coefficient; $x_j$ is the $j$-th datum in the feature sequence; $x^{\mathrm{up}}_{j,i}$ is the $i$-th uplink datum corresponding to the $j$-th datum, with $x^{\mathrm{up}}_{j,i}\ge x_j$; $x^{\mathrm{down}}_{j,k}$ is the $k$-th downlink datum corresponding to the $j$-th datum, with $x^{\mathrm{down}}_{j,k}\le x_j$; and $n_j$ and $m_j$ are the numbers of uplink and downlink data corresponding to the $j$-th datum, respectively.
$T_j$ is the basic tolerance of the $j$-th datum $x_j$ without taking other influences into account; for example, it ignores how many data of this value are present in the whole feature sequence. In the feature sequence of the image to be compressed, all data are gray values of the pixels in one channel with consecutive repeats removed, and the SDT compression algorithm compresses backward, toward the subsequent data. The method therefore takes every datum in the feature sequence whose value equals $x_j$, computes the differences between it and its backward adjacent datum, namely $x^{\mathrm{up}}_{j,i}-x_j$ and $x_j-x^{\mathrm{down}}_{j,k}$, and uses the average of these differences as the basic tolerance for data of value $x_j$ in the feature sequence.
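The feature-sequence construction of claim 8 and the uplink/downlink collection of claim 2 can be sketched in Python as below. This is a minimal illustration, assuming the gray values of one channel have already been flattened into a list of integers; the function names `feature_sequence` and `up_down_data` are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def feature_sequence(data):
    """Keep one value from each run of consecutive repeats (claim 8)."""
    return [value for value, _ in groupby(data)]


def up_down_data(feature):
    """For each distinct value, collect the backward adjacent datum of every
    occurrence: larger neighbours are uplink data, smaller ones downlink data."""
    up = defaultdict(list)
    down = defaultdict(list)
    for current, nxt in zip(feature, feature[1:]):
        if nxt > current:
            up[current].append(nxt)
        else:  # nxt < current, because consecutive repeats were removed
            down[current].append(nxt)
    return dict(up), dict(down)


if __name__ == "__main__":
    feat = feature_sequence([10, 10, 12, 12, 9, 10, 14, 14, 10])
    up, down = up_down_data(feat)
    print(feat)  # [10, 12, 9, 10, 14, 10]
    print(up)    # {10: [12, 14], 9: [10]}
    print(down)  # {12: [9], 14: [10]}
```

The last element of the feature sequence has no backward neighbour, so it contributes no uplink or downlink datum.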
In the feature sequence, the backward adjacent data of the data equal to $x_j$ include both data larger than $x_j$ and data smaller than $x_j$, namely the uplink data and the downlink data of $x_j$. Given the characteristics of the SDT compression algorithm, the two cases are handled separately: the backward adjacent data greater than or equal to $x_j$ and those less than or equal to $x_j$ are considered on their own. For each case, the product of the dispersion of the backward adjacent data and their share of the data count is used as a weight to adjust the average difference; that is, the uplink and downlink adjustment coefficients weight the average differences of the two cases. For example, the more of the backward adjacent data are greater than or equal to $x_j$, the more the basic tolerance of data of value $x_j$ should lean toward the differences above $x_j$ rather than those below it. Dispersion works in the same way: the more dispersed the backward data, the more the subsequent data fluctuate when a datum of value $x_j$ is used as the compression point, so a larger tolerance should be given to improve compression efficiency when that datum serves as a compression point, and vice versa. The average differences are thus weighted according to the number and dispersion of the backward adjacent data, and the weighted average is used as the basic tolerance of data of value $x_j$ in the feature sequence. Note that data of the same value in the feature sequence share the same basic tolerance.
In summary, the variation of a datum's uplink and downlink data is analyzed to obtain the uplink and downlink adjustment coefficients, which weight the differences between the datum and its subsequent uplink and downlink data to yield a suitable basic tolerance.
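A sketch of the adjustment coefficients (claim 3) and the basic tolerance (claim 4) follows. It assumes the population standard deviation and halves the weighted sum, following a literal reading of "the average of the summation result". The source does not say how an empty side, a zero mean, or two zero calculation parameters are handled; the choices in the code are assumptions, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def calc_parameter(side, n_up, n_down):
    """Count share x (1 / mean) x standard deviation for one side (claim 3).
    ASSUMPTION: an empty side or a zero mean gives a parameter of 0."""
    if not side:
        return 0.0
    m = mean(side)
    if m == 0:
        return 0.0
    return (len(side) / (n_up + n_down)) * (1.0 / m) * pstdev(side)


def adjustment_coefficients(up, down):
    """Normalise the two calculation parameters into (alpha, beta)."""
    a = calc_parameter(up, len(up), len(down))
    b = calc_parameter(down, len(up), len(down))
    if a + b == 0:
        return 0.5, 0.5  # ASSUMPTION: equal weights when both parameters vanish
    return a / (a + b), b / (a + b)


def base_tolerance(x, up, down):
    """Half of the weighted sum of the mean uplink and downlink differences."""
    alpha, beta = adjustment_coefficients(up, down)
    up_diff = mean(u - x for u in up) if up else 0.0
    down_diff = mean(x - d for d in down) if down else 0.0
    return (alpha * up_diff + beta * down_diff) / 2


if __name__ == "__main__":
    print(base_tolerance(10, [12, 14], [8]))  # 1.5
```

In the example, the single downlink datum has zero spread, so its coefficient is 0 and the tolerance comes entirely from the uplink side.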
S4, obtaining the tolerance weight of the data according to the frequency with which the data occur in the characteristic sequence and the uplink data and the downlink data of the data.
The basic tolerance calculated in step S3 applies to all data of the same value in the feature sequence and does not take the actual image content into account, so it must be corrected to obtain a tolerance with a better compression effect.
For example, a value that appears many times in the feature sequence occupies shorter intervals in the image. When such a gray value is used as a compression point, its tolerance should be smaller so that these short intervals keep a good visual effect or a low distortion rate after compression. Conversely, a value that appears few times corresponds to longer intervals in the image, so its tolerance should be larger to maintain the compression rate. A tolerance weight is therefore calculated for data of each value to adjust the basic tolerance and obtain the final tolerance.
Specifically, taking data of value $x_j$ in the feature sequence as an example: identical data in the feature sequence are grouped into one class, and the number of data in each class and the total number of classes are obtained; the differences between the uplink and downlink data and $x_j$ are obtained; and the tolerance weight of the data is obtained from the frequency with which $x_j$ occurs in the feature sequence, the number of data in each class, the total number of classes, and the absolute value of the ratio of each difference to $x_j$, as follows:
[Formula image for $W_j$ not available]
where $W_j$ is the tolerance weight of the $j$-th datum in the feature sequence; $f_j$ is the frequency with which the $j$-th datum occurs in the feature sequence; $N_c$ is the number of data in the $c$-th class, all of whose data have the value $x_c$; $C$ is the total number of classes in the data sequence; $n_j$ and $m_j$ are the numbers of uplink and downlink data of the datum, respectively; $y_{j,k}$ is the $k$-th datum among the uplink and downlink data of the datum; and $\exp(\cdot)$ is the exponential function with base $e$.
Here $N_c$ is the number of data in each class, and the term formed from the occurrence frequency $f_j$ of $x_j$ and the class sizes $N_c$ characterizes the relative number of occurrences of $x_j$: the larger its value, the more often the same value $x_j$ appears in the feature sequence and the shorter its intervals in the image to be compressed. Shorter intervals call for more attention to gray-level sensitivity; that is, when this gray value is used as a compression point, the information loss between it and the compression point must be reduced to suit human visual perception, which requires a smaller tolerance. Similarly, the term formed from the absolute ratios $\left|\left(y_{j,k}-x_j\right)/x_j\right|$ measures how far the uplink and downlink data of $x_j$ deviate from $x_j$; the larger its value, the richer the gray-level variation in the image, and the smaller the tolerance should be when $x_j$ is used as a compression point. This yields the tolerance weight of $x_j$, which is used to adjust its basic tolerance so that the SDT compression algorithm compresses the image data more effectively.
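Because the source formula for the tolerance weight survives only as an image, the sketch below is a hypothetical stand-in, not the patented formula. It computes the two ingredients the text describes, the relative occurrence frequency of $x_j$ (assumed here to be its count divided by the mean class size) and the mean absolute ratio $|(y_{j,k}-x_j)/x_j|$, and combines them as $\exp(-r\,d)$ so that the weight shrinks as either grows, which matches the stated direction. The combination, the normalisation, and the fallback value of 1.0 are all assumptions.

```python
import math
from collections import Counter


def tolerance_weight(x, feature, up, down):
    """HYPOTHETICAL tolerance weight for value x: exp(-r * d), where
    r = count of x / mean class size and d = mean |(y - x) / x| over the
    uplink and downlink data of x. Not the original (unavailable) formula."""
    counts = Counter(feature)  # classes of identical data and their sizes
    neighbours = up + down
    if counts[x] == 0 or not neighbours or x == 0:
        return 1.0  # ASSUMPTION: leave the basic tolerance unadjusted
    mean_class_size = len(feature) / len(counts)  # (1 / C) * sum of class sizes
    r = counts[x] / mean_class_size  # relative occurrence frequency of x
    d = sum(abs((y - x) / x) for y in neighbours) / len(neighbours)
    return math.exp(-r * d)


if __name__ == "__main__":
    feat = [10, 12, 9, 10, 14, 10]
    print(tolerance_weight(10, feat, [12, 14], []))  # exp(-0.6), about 0.5488
```

Under this assumed form the weight lies in $(0, 1]$, so it can only shrink the basic tolerance.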
S5, obtaining the final tolerance of each datum based on the basic tolerance and the tolerance weight of each datum; and compressing the data sequence corresponding to the image by using the final tolerance of each piece of data, and sharing the compressed data to each secondary server for distributed storage.
Steps S3 and S4 give the basic tolerance of each datum in the feature sequence and its corresponding tolerance weight; the basic tolerance is now adjusted with the tolerance weight to obtain the final tolerance of the data in the feature sequence.
Taking data of value $x_j$ in the feature sequence as an example, the sum of a first preset value and the tolerance weight of $x_j$ is obtained, and the ratio of this sum to a second preset value is multiplied by the basic tolerance to obtain the final tolerance $T'_j$ corresponding to $x_j$:
$$T'_j=\frac{P_1+W_j}{P_2}\,T_j$$
where $T_j$ is the basic tolerance of data of value $x_j$ in the feature sequence; $W_j$ is the tolerance weight of data of value $x_j$; and $P_1$ and $P_2$ are the first and second preset values, respectively.
Preferably, in this embodiment, the first preset value is 1 and the second preset value is 2, so the basic tolerance is adjusted through the factor $(1+W_j)/2$, where $W_j$ is the tolerance weight. This narrows the variation range of the tolerance weight and prevents special cases in which the final tolerance is too high or too low; such cases would cause an insufficient local compression rate or a high local information loss rate when the image to be compressed is compressed with the final tolerances. Implementers may therefore adjust the first and second preset values to suit the specific situation.
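With the preferred preset values, the final tolerance reduces to $T'_j=\frac{1+W_j}{2}T_j$. The short Python sketch below (function name hypothetical) makes the scaling explicit: if the weight lies in $[0,1]$, the basic tolerance is scaled by a factor between 0.5 and 1.

```python
def final_tolerance(base, weight, p1=1.0, p2=2.0):
    """Final tolerance = (p1 + weight) / p2 * base tolerance (claim 7).
    Defaults follow the preferred preset values p1 = 1 and p2 = 2."""
    return (p1 + weight) / p2 * base


if __name__ == "__main__":
    print(final_tolerance(1.5, 1.0))  # 1.5: weight 1 keeps the base tolerance
    print(final_tolerance(2.0, 0.0))  # 1.0: weight 0 halves it
```

Applied per distinct gray value, this gives a lookup such as `{v: final_tolerance(base[v], weight[v]) for v in base}`, where `base` and `weight` are hypothetical per-value dictionaries.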
The final tolerance of each datum in the feature sequence is thus obtained. Its calculation accounts for how the data vary, and because the data in the feature sequence are the gray values of the pixels of one image channel, it reflects how the gray values of that channel change. Consequently, when the SDT algorithm compresses the image data using the final tolerance of each datum, the compression rate improves while the resulting distortion does not interfere with normal use of the image.
In addition, identical data in the feature sequence share the same final tolerance. An image with several channels yields a one-dimensional data sequence for each channel, and each of these sequences is compressed.
The final tolerance is adapted to each distinct value in the feature sequence, so that identical data receive the most suitable final tolerance. Each distinct value in the feature sequence corresponds to a distinct gray value in the image to be compressed, so a final tolerance is obtained for every gray value in that image, and the SDT compression algorithm is then used to compress the one-dimensional data sequence corresponding to the image.
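The text does not detail the SDT variant, so the following is a sketch of standard swinging-door trending adapted to per-value tolerances, under the assumption that the door width for each segment is the final tolerance of the gray value at the current archived (compression) point. `tol` is a hypothetical mapping from gray value to final tolerance; values missing from it get a tolerance of 0. The doors are tracked as the range of slopes from the archived point that keep every later point within the tolerance; when that range becomes empty, the previous point is archived.

```python
import math


def sdt_compress(values, tol):
    """Swinging-door trending with a per-value door width.

    ASSUMPTION: the door width is tol[v0], the final tolerance of the gray
    value at the current archived point (0 if absent). Returns the archived
    (index, value) pairs, always including the first and last points.
    """
    if not values:
        return []
    t0, v0 = 0, values[0]
    archived = [(t0, v0)]
    e = tol.get(v0, 0.0)
    k_up, k_low = -math.inf, math.inf  # feasible slope range from (t0, v0)
    i = 1
    while i < len(values):
        dt = i - t0
        # a line from (t0, v0) stays within +/- e of point i only if its slope
        # lies in [(v - v0 - e) / dt, (v - v0 + e) / dt]
        k_up = max(k_up, (values[i] - v0 - e) / dt)
        k_low = min(k_low, (values[i] - v0 + e) / dt)
        if k_up > k_low:
            # the doors have swung past parallel: archive the previous point
            t0, v0 = i - 1, values[i - 1]
            archived.append((t0, v0))
            e = tol.get(v0, 0.0)
            k_up, k_low = -math.inf, math.inf
            continue  # re-check point i against the new archived point
        i += 1
    if archived[-1][0] != len(values) - 1:
        archived.append((len(values) - 1, values[-1]))
    return archived


if __name__ == "__main__":
    print(sdt_compress([10, 10, 10, 10, 20, 20, 20], {10: 1.0, 20: 1.0}))
    # [(0, 10), (3, 10), (4, 20), (6, 20)]
```

Tying the door width to the archived value is one reading of "when the data is used as a compression point"; other SDT variants place the tolerance elsewhere.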
Specifically, for the $i$-th datum $x_i$ of the data sequence $X$, let $\hat{x}_i$ be its value after compression; its final value $\tilde{x}_i$ is:
$$\tilde{x}_i=\begin{cases}\lfloor\hat{x}_i\rfloor, & \hat{x}_i-\lfloor\hat{x}_i\rfloor<0.5\\ \lfloor\hat{x}_i\rfloor+1, & \hat{x}_i-\lfloor\hat{x}_i\rfloor\ge 0.5\end{cases}$$
After compression with the SDT compression algorithm, the gray value of the $i$-th datum $x_i$ is replaced by the corresponding point on the compression curve. Gray values are integers, but the compressed value at that point may not be, so a membership calculation is performed as above: the point is assigned to the nearest integer, where $\lfloor\cdot\rfloor$ denotes rounding down. This completes the compression of the image to be compressed and yields its compressed data.
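Reconstruction with nearest-integer rounding can be sketched as follows, assuming linear interpolation between archived `(index, value)` pairs such as those produced by a swinging-door pass; the function name is hypothetical. Rounding uses $\lfloor\hat{x}+0.5\rfloor$, matching the floor-based nearest-integer rule described above; Python's built-in `round` would instead round halves to the nearest even integer.

```python
import math


def sdt_decompress(archived):
    """Rebuild a sequence from archived (index, value) pairs by linear
    interpolation, rounding each value to the nearest integer as floor(x + 0.5)."""
    if not archived:
        return []
    out = []
    for (t0, v0), (t1, v1) in zip(archived, archived[1:]):
        for t in range(t0, t1):
            x_hat = v0 + (v1 - v0) * (t - t0) / (t1 - t0)  # point on the compression curve
            out.append(math.floor(x_hat + 0.5))  # nearest integer gray value
    out.append(archived[-1][1])  # the last archived point is kept as-is
    return out


if __name__ == "__main__":
    print(sdt_decompress([(0, 0), (2, 3)]))  # [0, 2, 3]
```

In the example the midpoint value 1.5 is rounded up to 2.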
After the compressed data are obtained, they are stored in a distributed manner. The distributed file system used in the present invention is MooseFS, although implementers may choose another distributed file system. The deployment consists of a metadata management server, a metadata log management server, a main server, several secondary servers, and an NFS server, which work as follows:
NFS server: acts as the user client for uploading, modifying, and viewing image data;
metadata management server: manages the metadata generated in the storage system;
metadata log management server: logs all metadata and manages the logs;
main server: provides primary storage for all data;
secondary servers: share the data of the main server and back it up for distributed storage.
The overall workflow is as follows:
First, an index is established on the metadata server, and the NFS server communicates with the metadata server as the client; all user operations go through the NFS server. For example, when a user uploads an image at the client, the NFS server separates the RGB image into its channels to obtain the gray values of each channel, obtains the adaptive final tolerance of each datum using the method described above, and then compresses the data with the SDT compression algorithm.
The main server then stores the compressed image data uploaded from the user side and backs it up to several secondary servers, sharing the data among them to achieve distributed storage.
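The order of operations in this workflow can be illustrated with a toy, in-memory model. This is not MooseFS code and uses no MooseFS API; the `Server` class and both functions are hypothetical stand-ins that show only the sequence: channel separation, primary storage on the main server, then backup copies on the secondary servers.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory stand-in for a storage server."""
    name: str
    store: dict = field(default_factory=dict)


def split_channels(pixels):
    """Split (R, G, B) pixel tuples into one gray-value sequence per channel."""
    return [list(channel) for channel in zip(*pixels)]


def store_and_replicate(key, compressed, main, secondaries):
    """Store on the main server first, then copy the data to every secondary."""
    main.store[key] = compressed
    for server in secondaries:
        server.store[key] = list(compressed)  # independent backup copy


if __name__ == "__main__":
    r, g, b = split_channels([(1, 2, 3), (4, 5, 6)])
    main, secondaries = Server("main"), [Server("s1"), Server("s2")]
    store_and_replicate("img1/R", [(0, 1), (1, 4)], main, secondaries)
    print(r, [s.store for s in secondaries])
```

In a real deployment each channel sequence would first pass through the tolerance and SDT steps; here the compressed payload is simply a placeholder list.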
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate their relative merits. Specific embodiments have been described above. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results; in some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the present invention, and any modifications, equivalents, improvements and the like made within the scope of the present invention are intended to be included therein.

Claims (8)

1. A distributed compression storage method based on data sharing is characterized by comprising the following steps:
forming a one-dimensional data sequence from the gray values of one channel of the pixels of an image; removing redundant data from the data sequence to obtain a characteristic sequence;
acquiring the uplink data and downlink data corresponding to a datum of the characteristic sequence in the SDT compression algorithm process, and obtaining an uplink adjustment coefficient and a downlink adjustment coefficient of the datum based on the uplink data and the downlink data;
obtaining the basic tolerance of data according to the uplink adjustment coefficient and the downlink adjustment coefficient of the data in the characteristic sequence and the uplink data and the downlink data of the data;
obtaining the tolerance weight of the data according to the frequency with which the data occur in the characteristic sequence and the uplink data and downlink data of the data;
obtaining a final tolerance of each data based on the basic tolerance and the tolerance weight of each data; and compressing the data sequence corresponding to the image by using the final tolerance of each piece of data, and sharing the compressed data to each secondary server for distributed storage.
2. The distributed compression storage method based on data sharing according to claim 1, wherein acquiring the uplink data and downlink data corresponding to a datum of the characteristic sequence in the SDT compression algorithm process comprises: selecting a datum, obtaining the data in the characteristic sequence that are equal to it, and recording them as equal data; obtaining the data that are backward adjacent to each equal datum and larger than it, and recording them as the uplink data of the datum; and obtaining the data that are backward adjacent to each equal datum and smaller than it, and recording them as the downlink data of the datum.
3. The method according to claim 1, wherein obtaining the uplink adjustment coefficient and the downlink adjustment coefficient of the data based on the uplink data and the downlink data comprises:
obtaining the ratio of the number of uplink data corresponding to a datum in the characteristic sequence to the sum of the numbers of uplink and downlink data; obtaining the reciprocal of the mean of all uplink data and the standard deviation of all uplink data; taking the product of the ratio, the reciprocal of the mean, and the standard deviation as a first calculation parameter of the uplink data, and obtaining a second calculation parameter of the downlink data in the same way; and adding the first and second calculation parameters, the ratio of the first calculation parameter to the sum being the uplink adjustment coefficient of the datum and the ratio of the second calculation parameter to the sum being the downlink adjustment coefficient of the datum.
4. The method according to claim 1, wherein obtaining the basic tolerance of a datum according to the uplink adjustment coefficient and downlink adjustment coefficient of the datum in the characteristic sequence and the uplink data and downlink data of the datum comprises: taking the uplink adjustment coefficient of a datum in the characteristic sequence as the weight of the average difference between each uplink datum and the datum, taking the downlink adjustment coefficient as the weight of the average difference between the datum and each downlink datum, and summing to obtain a summation result; and calculating the average of the summation result and taking it as the basic tolerance of the datum.
5. The method of claim 1, wherein obtaining the tolerance weight of a datum according to the frequency with which the datum occurs in the characteristic sequence and the uplink data and downlink data of the datum comprises: grouping identical data in the characteristic sequence into one class, and obtaining the number of data in each class and the total number of classes; obtaining the differences between the uplink and downlink data and the datum; and obtaining the tolerance weight of the datum according to the frequency with which the datum occurs, the number of data in each class of the characteristic sequence, the total number of classes, and the absolute value of the ratio of each difference to the datum.
6. The distributed compression storage method based on data sharing according to claim 5, wherein the tolerance weight of the data is:
[Formula image for $W_j$ not available]
where $W_j$ is the tolerance weight of the $j$-th datum in the characteristic sequence; $f_j$ is the frequency with which the $j$-th datum occurs in the characteristic sequence; $N_c$ is the number of data in the $c$-th class, all of whose data have the value $x_c$; $C$ is the total number of classes in the data sequence; $n_j$ and $m_j$ are the numbers of uplink and downlink data of the datum, respectively; $y_{j,k}$ is the $k$-th datum among the uplink and downlink data of the datum; and $\exp(\cdot)$ is the exponential function with base $e$.
7. The method of claim 1, wherein obtaining the final tolerance of each data based on the basic tolerance and the tolerance weight of each data comprises: and obtaining the sum of the first preset value and the tolerance weight, and multiplying the ratio of the sum to the second preset value by the basic tolerance to obtain the final tolerance.
8. The distributed compression storage method based on data sharing according to claim 1, wherein removing redundant data from the data sequence to obtain the characteristic sequence comprises: in the data sequence, retaining only one datum from each run of consecutively repeated data and retaining non-consecutively repeated data as they are, thereby obtaining the characteristic sequence.
CN202211660453.8A 2022-12-23 2022-12-23 Distributed compression storage method based on data sharing Pending CN115857823A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211660453.8A CN115857823A (en) 2022-12-23 2022-12-23 Distributed compression storage method based on data sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211660453.8A CN115857823A (en) 2022-12-23 2022-12-23 Distributed compression storage method based on data sharing

Publications (1)

Publication Number Publication Date
CN115857823A (en) 2023-03-28

Family

ID=85654066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211660453.8A Pending CN115857823A (en) 2022-12-23 2022-12-23 Distributed compression storage method based on data sharing

Country Status (1)

Country Link
CN (1) CN115857823A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743280A (en) * 2024-02-21 2024-03-22 盈客通天下科技(大连)有限公司 Intelligent management method for highway bridge construction data
CN117998024B (en) * 2024-04-07 2024-05-31 中国医学科学院阜外医院 Ultrasonic image transmission method for heart auxiliary detection


Similar Documents

Publication Publication Date Title
US10140545B2 (en) Methods and systems for differentiating synthetic and non-synthetic images
US7439989B2 (en) Detecting doctored JPEG images
US8260067B2 (en) Detection technique for digitally altered images
CN115857823A (en) Distributed compression storage method based on data sharing
WO2017190691A1 (en) Picture compression method and apparatus
EP1215909B1 (en) Image encoding device
CN111199740B (en) Unloading method for accelerating automatic voice recognition task based on edge calculation
EP2871847A1 (en) Apparatus and method for image processing
CN115359807B (en) Noise online monitoring system for urban noise pollution
US20070116371A1 (en) Decoding apparatus, inverse quantization method, and computer readable medium
Lo et al. Exploring semantic segmentation on the dct representation
CN105913001A (en) On-line type multi-face image processing method based on clustering
CN115987294A (en) Multidimensional data processing method of Internet of things
US7302107B2 (en) JPEG encoding for document images using pixel classification
CN112698940A (en) Vehicle auxiliary edge computing task distribution system for vehicle-road cooperation
CN116744006B (en) Video monitoring data storage method based on block chain
Garg et al. Analysis of image types, compression techniques and performance assessment metrics: A review
CN112950491A (en) Video processing method and device
Li et al. JPEG reversible data hiding using dynamic distortion optimizing with frequency priority reassignment
CN114610234A (en) Storage system parameter recommendation method and related device
Zhang et al. Human visual system guided reversible data hiding based on multiple histograms modification
CN114677535A (en) Training method of domain-adaptive image classification network, image classification method and device
Ponomarenko et al. Automatic approaches to on-land/on-board filtering and lossy compression of AVIRIS images
Pan et al. Complexity-scalable transform coding using variable complexity algorithms
CN116614673B (en) Short video pushing system based on special crowd

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination