CN112559257B - Data storage method based on data screening

Info

Publication number: CN112559257B
Application number: CN202110189565.9A
Authority: CN
Inventors: 金树柏; 罗玲
Original assignee: Shenzhen Dcs Technology Co ltd
Current assignee: Shenzhen Dcs Technology Co ltd
Priority date: 2021-02-19
Filing date: 2021-02-19
Publication date: 2021-07-13
Anticipated expiration: 2041-02-19
Also published as: CN112559257A

Abstract

The invention relates to a data storage method based on data screening. The method comprises obtaining data to be stored, wherein the data to be stored comprises a plurality of hierarchical data and data information under each hierarchical data. A hierarchy matrix G (G1, G2, G3, G4, G5) and a data similarity coefficient matrix K (K1, K2, K3, K4, K5) are provided in the central processing unit. When data similarity is compared, if the similarity of two data of any level is greater than or equal to the coefficient corresponding to that level, the two data of that level are highly similar and are judged to be repeated data; otherwise no data screening is carried out. The screened hierarchical data and the corresponding data information under it are stored as off-site backup data of the current data level. The stored data are divided according to a specific format and a different similarity coefficient is set for each level, so that the similarity of the data can be judged accurately, and the accuracy of data comparison is improved by adopting parameters with higher similarity coefficients.

Description

Data storage method based on data screening

Technical Field

The invention relates to the field of data storage, in particular to a data storage method based on data screening.

Background

At present, in the course of informatization, large amounts of information data are generated rapidly. From these data, people can analyze the actual operating conditions of enterprises and mine high-value information from them. How to manage these various types of data efficiently has therefore become a key research topic. Even if enough storage devices are built to store the data, transmitting the data occupies a large amount of network bandwidth and causes network congestion.

Much of the data that is stored is duplicated. Such duplicated data are usually backup copies generated to ensure data stability and avoid loss, while some arise when the same data is stored more than once because of an operating error or other factors. With the rapid increase in data volume, current storage systems are challenged in many respects. To further increase storage speed, effective measures are required to eliminate the various kinds of redundant information, which is also a key means of overcoming the limits of storage capacity. A redundancy screening method can be introduced to eliminate, after analysis and processing, the repeated data existing in each file, thereby reducing the data and effectively reducing the storage space they occupy.

However, when data is screened it is usually divided into blocks, and the blocks are usually divided according to content, so that data blocks may be too long or too short. This is still unfavorable to data storage, limits the speed of data storage, and affects the integrity of the data content.

Disclosure of Invention

Therefore, the invention provides a data storage method based on data screening, which can determine the similarity of data through the comparison of hierarchical data so as to screen the data and improve the screening efficiency.

In order to achieve the above object, the present invention provides a data storage method based on data screening, which includes:

acquiring data to be stored, wherein the data to be stored comprises a plurality of hierarchical data and data information under each hierarchical data;

a hierarchy matrix G (G1, G2, G3, G4, G5) is further provided within the central processor, wherein G1 represents a first hierarchy, G2 represents a second hierarchy, G3 represents a third hierarchy, G4 represents a fourth hierarchy, and G5 represents a fifth hierarchy, wherein data of the first hierarchy is larger than data of the second hierarchy, data of the second hierarchy is larger than data of the third hierarchy, data of the third hierarchy is larger than data of the fourth hierarchy, and data of the fourth hierarchy is larger than data of the fifth hierarchy;

the central processing unit is also internally provided with a first coefficient K1, a second coefficient K2, a third coefficient K3, a fourth coefficient K4 and a fifth coefficient K5, wherein K1 is smaller than K2, K2 is smaller than K3, K3 is smaller than K4, and K4 is smaller than K5;

when the first level comprises a second level, the second level comprises a third level, the third level comprises a fourth level, the fourth level comprises a fifth level, and the data of the first level and the data of the fourth level are screened at the moment;

if the data is in the first level, selecting a first coefficient K1 from the data similarity coefficient matrix K (K1, K2, K3, K4 and K5) for comparison, and if the similarity of the two data in the first level is smaller than a first coefficient K1, indicating that the two data are different;

when a second level of data is compared, selecting a second coefficient K2 from the data similarity coefficient matrix K (K1, K2, K3, K4 and K5) for comparison, if the similarity of two data of the second level is greater than or equal to a second coefficient K2, indicating that the similarity of the two data of the second level is high, determining that the data are repeated, and performing data screening on the second level and the data contained in the second level;

if the similarity of the two data of the second hierarchy is smaller than a second coefficient K2, the two data are different;

when a third level of data is compared, selecting a third coefficient K3 from the data similarity coefficient matrix K (K1, K2, K3, K4 and K5) for comparison, if the similarity of two data of the third level is more than or equal to the third coefficient K3, indicating that the similarity of the two data of the third level is higher, determining that the data are repeated, and then performing data screening on the third level and the data contained in the third level;

if the similarity of the two data at the third level is less than a third coefficient K3, the two data are different;

when a fourth level of data is compared, selecting a fourth coefficient K4 from the data similarity coefficient matrix K (K1, K2, K3, K4 and K5) for comparison, if the similarity of two data of the same fourth level is greater than or equal to a fourth coefficient K4, indicating that the similarity of the two data of the fourth level is high, determining that the data are repeated, and performing data screening on the fourth level and the data contained in the fourth level;

if the similarity of the two data of the fourth level is smaller than a fourth coefficient K4, the two data are different;

when the fifth level of the data is compared, selecting a fifth coefficient K5 from the data similarity coefficient matrix K (K1, K2, K3, K4 and K5) for comparison, if the similarity of two data of the fifth level is more than or equal to a fifth coefficient K5, indicating that the similarity of the two data of the fifth level is higher, determining that the data are repeated, and performing data screening on the fifth level and the data contained in the fifth level;

if the similarity of the two data of the fifth level is smaller than a fifth coefficient K5, the two data are different;

storing the screened hierarchical data and corresponding data information under the hierarchical data into a mirror pool as off-site backup data of the current hierarchical data;

before any data is subjected to screening comparison, data to be compared with the data is determined, and a data correlation matrix D (R1, R2, R3 and R4) is further arranged in the central processing unit, wherein R1 represents a first correlation, R2 represents a second correlation, R3 represents a third correlation, R4 represents a fourth correlation, R1 is larger than R2, R2 is larger than R3, and R3 is larger than R4;

in the current database, determining the degree of correlation R between the current data and any one of the other data;

setting a data similarity comprehensive evaluation coefficient K00,

K00 = K1/K10 + K2/K20 + K3/K30 + K4/K40 + K5/K50 + R/(R1+R2+R3+R4),

wherein K10 represents a standard level coefficient corresponding to a first level, K20 represents a standard level coefficient corresponding to a second level, K30 represents a standard level coefficient corresponding to a third level, K40 represents a standard level coefficient corresponding to a fourth level, and K50 represents a standard level coefficient corresponding to a fifth level.

Further, if the degree of correlation R with the current data is greater than or equal to the first degree of correlation R1, the data has the highest priority for similarity comparison with the current data and is first-priority data;

if R1 > R ≥ R2, the data is second-priority data for similarity comparison with the current data;

if R2 > R ≥ R3, the data is third-priority data;

if R3 > R ≥ R4, the data has the lowest priority and is fourth-priority data;

if R < R4, no similarity comparison with the current data is required.

Further, the degree of correlation of two data is determined from their byte lengths and key information: if the byte lengths differ, the two data cannot be similar data; if the byte lengths are the same, the two data may be similar data, and it is then determined whether their key information is the same; if the key information is also the same, the levels in the data structure are further compared to determine the degree of correlation.

Further, for data Gi of any hierarchy, i = 1, 2, 3, 4, 5, which includes n data structures, each of which takes a value Lj, the similarity calculation formula for data Gi of any hierarchy is Si = Σ Lj × aj, where aj represents the weight coefficient corresponding to Lj, and j = 1, 2, 3, …, n.

Further, a data similarity coefficient standard K0 is set in the central processing unit, and when the similarity coefficient of any two hierarchical data in the hierarchical data is greater than or equal to the data similarity coefficient standard K0, one of the hierarchical data and the corresponding data information under the hierarchical data are deleted;

the data similarity coefficient criterion K0 is,

K0 = (K1/K2 + K2/K3 + K3/K4 + K4/K5)/4 + 4R4/(R1+R2+R3+R4).

Further, when n is 3, L1 corresponds to a weight coefficient a1 = 0.75;

L2 corresponds to a weight coefficient a2 = 0.15;

L3 corresponds to a weight coefficient a3 = 0.1.

Further, the storing the screened hierarchical data and the corresponding data information under the hierarchical data into a mirror pool as the remote backup data of the current data hierarchy includes:

and setting a key for the remote backup data, wherein the key is generated according to the byte length of the screened data.

Furthermore, a key matrix P (P1, P2, P3, P4, P5 …, Pn) is also arranged in the central processing unit, wherein P1 represents a first key corresponding to a first byte length, P2 represents a second key corresponding to a second byte length, P3 represents a third key corresponding to a third byte length, P4 represents a fourth key corresponding to a fourth byte length, P5 represents a fifth key corresponding to a fifth byte length, Pn represents an nth key corresponding to an nth byte length;

in a mirror pool, if the byte length of the off-site backup data belongs to a first byte length, a first key is selected from the key matrix P (P1, P2, P3, P4, P5 …, Pn) to encrypt the off-site backup data;

and if the byte length of the off-site backup data belongs to the nth byte length, selecting the nth key from the key matrix P (P1, P2, P3, P4, P5 …, Pn) to encrypt the off-site backup data.

Further, a key updating coefficient S (S1, S2, S3) is set in the central processor, wherein S1 represents a first updating coefficient, S2 represents a second updating coefficient, and S3 represents a third updating coefficient;

when 1/3 of the total amount of data in the database has been screened, updating the key matrix P (P1, P2, P3, P4, P5 …, Pn) by using the first update coefficient S1;

if 1/4 of the total amount of data has been screened, updating the key matrix P (P1, P2, P3, P4, P5 …, Pn) by using the second update coefficient S2;

if 1/5 of the total amount of data has been screened, updating the key matrix P (P1, P2, P3, P4, P5 …, Pn) by using the third update coefficient S3.

Further, the first update coefficient S1= R1/R2;

the second update coefficient S2= R2/R3;

the third update coefficient S3= R3/R4.

Compared with the prior art, the invention has the advantage that the stored data are divided according to a specific format and a different similarity coefficient is set for each level, so that the similarity evaluation standard differs from level to level. A lower coefficient can be used for higher levels: because higher-level data contain more data, too high a similarity coefficient would impair the accuracy of data screening.

Particularly, the relevance of other data and the current data is determined to be sorted, and the higher the relevance is, the higher the priority level of similarity comparison is, so that the analysis efficiency of data screening is higher, and the screening efficiency is improved.

Particularly, before hierarchical comparison is carried out on the data, the byte length of the data and the key information are required to be compared, and a part of data which is obviously impossible to be repeated data can be screened out by roughly comparing the byte length with the key information, so that the comparison time is saved, the efficiency of data comparison and analysis is greatly improved, and the efficiency of data screening is further improved.

In particular, different weight coefficients are set for different data structures of data of any hierarchy, so that the screening and analysis of the data structures are facilitated, and the screening accuracy and the screening efficiency are effectively improved.

Particularly, a data similarity coefficient standard K0 is set; if the similarity coefficient of any two levels of data is greater than or equal to K0, the data are to be screened, which makes the judgment more intuitive and convenient. In practical application, the data similarity coefficient standard is expressed using the coefficients in the similarity coefficient matrix and the parameters in the data correlation matrix.

In particular, in the data storage method based on data screening provided by the embodiment of the invention, the encrypted storage of the data is realized by setting the key for the backup data, the data tampering by a third party is prevented, the safety of the stored data is effectively improved, when the data is damaged, the damaged data can be repaired according to the data in the mirror image pool, and the safety and the stability of the database are improved.

Particularly, in the embodiment of the invention, data of different byte lengths are encrypted with different keys, which improves the security of the data. In actual operation, if the same key were used for all data, cracking that one key would expose all of the data, which is a great risk.

Particularly, the key matrix is updated through evaluation of the data amount to be screened in the database, and the dynamic key is used for storage, so that the storage of data is safer, and the update of the key matrix is performed according to the actual data amount of the database, so that association is established between data simplification and remote backup.

Drawings

Fig. 1 is a flowchart of a data storage method based on data screening according to an embodiment of the present invention.

Detailed Description

In order that the objects and advantages of the invention will be more clearly understood, the invention is further described below with reference to examples; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and do not limit the scope of the present invention.

It should be noted that in the description of the present invention, the terms of direction or positional relationship indicated by the terms "upper", "lower", "left", "right", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, which are only for convenience of description, and do not indicate or imply that the device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention.

Furthermore, it should be noted that, in the description of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

Referring to Fig. 1, a data storage method based on data screening according to an embodiment of the present invention includes:

S100, acquiring data to be stored, wherein the data to be stored comprises a plurality of hierarchical data and data information under each hierarchical data; the data information may be multimedia data such as images, video and audio, or data such as text and/or numbers.

S200, a hierarchy matrix G (G1, G2, G3, G4, G5) and a data similarity coefficient matrix K (K1, K2, K3, K4, K5) are provided in the central processing unit. When data similarity is compared, if the similarity of two data of any level is greater than or equal to the coefficient corresponding to that level, the two data of that level are highly similar and are judged to be repeated data. A data similarity coefficient standard K0 is also set in the central processing unit; when the similarity coefficient of any two hierarchical data is greater than or equal to K0, one of the hierarchical data and the corresponding data information under it are deleted;

S300, storing the screened hierarchical data and the corresponding data information under the hierarchical data.

Specifically, in the embodiment of the invention, any two hierarchical data are compared against the data similarity coefficient standard K0; if their similarity coefficient is greater than or equal to K0, one of the hierarchical data and the corresponding data information under it are deleted, so that the data are screened more conveniently and accurately and the screening efficiency is high.

A hierarchy matrix G (G1, G2, G3, G4, G5) is further provided within the central processor, wherein G1 represents a first hierarchy, G2 represents a second hierarchy, G3 represents a third hierarchy, G4 represents a fourth hierarchy, and G5 represents a fifth hierarchy, wherein data of the first hierarchy is larger than data of the second hierarchy, data of the second hierarchy is larger than data of the third hierarchy, data of the third hierarchy is larger than data of the fourth hierarchy, and data of the fourth hierarchy is larger than data of the fifth hierarchy; and for any data, any one of the first hierarchy, the second hierarchy, the third hierarchy, the fourth hierarchy and the fifth hierarchy can be empty.

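For any one of the other data in the current database, excluding the current data, the degree of correlation R between that data and the current data is determined, and a data similarity comprehensive evaluation coefficient K00 is set:

K00 = K1/K10 + K2/K20 + K3/K30 + K4/K40 + K5/K50 + R/(R1+R2+R3+R4),

wherein K10, K20, K30, K40 and K50 represent the standard level coefficients corresponding to the first, second, third, fourth and fifth levels, respectively.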

The data storage method based on data screening provided by the embodiment of the invention divides the stored data according to a specific format and sets a different similarity coefficient for each level, so that the similarity evaluation criterion differs from level to level. Higher-level data contain more data, and too high a similarity coefficient there would impair the accuracy of data screening; therefore, to improve screening accuracy, higher-level data use a lower similarity coefficient and lower-level data use a higher one. Lower-level data contain a smaller amount of data, so their similarity can be judged accurately, and using a higher similarity coefficient for them improves the accuracy of data comparison.

Specifically, setting the data similarity comprehensive evaluation coefficient K00 makes the data evaluation more accurate: in addition to the hierarchical similarity of the data, the real-time degree of correlation of the data and the parameters in the data correlation matrix D (R1, R2, R3, R4) are taken into account. This makes the data similarity evaluation more accurate, further improves the screening accuracy, and further improves the efficiency of data transmission and backup.
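The comprehensive evaluation coefficient K00 can be evaluated directly from the formula stated in the disclosure. The following is a minimal sketch rather than the patented implementation: the function name comprehensive_coefficient and every numeric value are assumptions chosen for illustration, since the source gives no values for K1 to K5, K10 to K50 or R1 to R4.

```python
from typing import Sequence


def comprehensive_coefficient(
    k: Sequence[float],
    k_std: Sequence[float],
    r: float,
    r_levels: Sequence[float],
) -> float:
    """K00 = K1/K10 + K2/K20 + K3/K30 + K4/K40 + K5/K50 + R/(R1+R2+R3+R4)."""
    if len(k) != len(k_std):
        raise ValueError("k and k_std must have the same length")
    # One term per level: level coefficient divided by its standard coefficient.
    level_term = sum(ki / ki0 for ki, ki0 in zip(k, k_std))
    # Correlation of the candidate data, normalised by the sum of R1..R4.
    correlation_term = r / sum(r_levels)
    return level_term + correlation_term


# Illustrative values only (assumptions, not taken from the source).
K = [0.5, 0.6, 0.7, 0.8, 0.9]       # K1 < K2 < K3 < K4 < K5
K_STD = [1.0, 1.0, 1.0, 1.0, 1.0]   # K10..K50
R_LEVELS = [0.8, 0.6, 0.4, 0.2]     # R1 > R2 > R3 > R4
k00 = comprehensive_coefficient(K, K_STD, r=0.5, r_levels=R_LEVELS)
```

With these assumed values the level terms sum to 3.5 and the correlation term is 0.5/2.0 = 0.25, so k00 is approximately 3.75.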

In the embodiment of the invention, when data are compared, if the similarity of two data of the first level is greater than or equal to the first coefficient, the subsequent levels do not need to be compared and the screening operation is executed directly, which saves the time for comparing data of the other levels and greatly improves the efficiency of data screening.
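The level-by-level comparison with early exit can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name first_duplicate_level is hypothetical, the per-level similarities are assumed to be already computed, an empty level is represented by None, and the coefficient values are invented (only the ordering K1 < K2 < K3 < K4 < K5 comes from the source).

```python
from typing import Optional, Sequence


def first_duplicate_level(
    similarities: Sequence[Optional[float]],
    coefficients: Sequence[float],
) -> Optional[int]:
    """Return the 1-based level at which two data items are judged duplicates.

    similarities[i] is the similarity of the two items at level i + 1, or
    None when that level is empty. Levels are checked from the first to the
    fifth, and the check stops at the first level whose similarity reaches
    the coefficient of that level.
    """
    for level, (s, k) in enumerate(zip(similarities, coefficients), start=1):
        if s is not None and s >= k:
            # Duplicate at this level: this level and the data it contains
            # are screened, and deeper levels are not compared.
            return level
    return None  # below the coefficient at every level: treated as different


# Illustrative coefficients (assumed values).
K = [0.5, 0.6, 0.7, 0.8, 0.9]
```

For example, similarities of (0.4, 0.5, 0.75, 0.1, 0.1) fall below K1 and K2 but reach K3, so the third level and the data it contains would be screened.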

Specifically, if the degree of correlation R with the current data is greater than or equal to the first degree of correlation R1, the data has the highest priority for similarity comparison with the current data and is first-priority data;

if R1 > R ≥ R2, the data is second-priority data for similarity comparison with the current data;

if R2 > R ≥ R3, the data is third-priority data;

if R3 > R ≥ R4, the data has the lowest priority and is fourth-priority data;

if R < R4, no similarity comparison with the current data is needed. When data similarity comparison is carried out, the data are compared with the current data in the order of first-priority data, second-priority data, third-priority data and fourth-priority data.
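The priority rule above amounts to bucketing each candidate by its degree of correlation R and comparing the buckets in order. A minimal sketch, assuming the correlation degrees are already known; the function names, the candidate names and the threshold values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def comparison_priority(r: float, thresholds: Sequence[float]) -> Optional[int]:
    """Map a correlation degree R to a priority 1..4, or None when R < R4.

    thresholds is (R1, R2, R3, R4) with R1 > R2 > R3 > R4.
    """
    for priority, bound in enumerate(thresholds, start=1):
        if r >= bound:
            return priority
    return None  # R < R4: no similarity comparison is needed


def comparison_order(candidates: Dict[str, float], thresholds: Sequence[float]) -> List[str]:
    """Return candidate names in comparison order, dropping those with R < R4."""
    ranked = []
    for name, r in candidates.items():
        priority = comparison_priority(r, thresholds)
        if priority is not None:
            ranked.append((priority, name))
    # The sort is stable, so candidates of equal priority keep their input order.
    ranked.sort(key=lambda item: item[0])
    return [name for _, name in ranked]


# Illustrative thresholds (assumed values).
R_LEVELS = (0.8, 0.6, 0.4, 0.2)
order = comparison_order({"a": 0.3, "b": 0.9, "c": 0.05, "d": 0.65}, R_LEVELS)
```

Here "b" is first-priority data, "d" second-priority, "a" fourth-priority, and "c" is skipped because its correlation is below R4.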

Specifically, according to the data storage method based on data screening provided by the embodiment of the present invention, the relevance between other data and the current data is determined and ranked, and the higher the relevance is, the higher the priority level of the similarity comparison is, so that the analysis efficiency of data screening is higher, and the screening efficiency is improved.

Specifically, the degree of correlation of two data is determined from their byte lengths and key information: if the byte lengths differ, the two data cannot be similar data; if the byte lengths are the same, the two data may be similar data, and it is then determined whether their key information is the same; if the key information is also the same, the levels in the data structure are further compared to determine the degree of correlation.

Specifically, according to the data storage method based on data screening provided in the embodiment of the present invention, before hierarchical comparison is performed on the data, the byte lengths and key information of the data are compared. This rough comparison screens out a part of the data that obviously cannot be duplicated data, which saves comparison time, greatly improves the efficiency of data comparison and analysis, and further improves the efficiency of data screening.
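The coarse pre-check can be sketched as a cheap predicate that runs before any hierarchical comparison. This is an illustration only: the Record type and its key_info field are hypothetical, because the source does not define what the key information is, and the sample strings are an assumed source-language form of the "I am Chinese" example used in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    payload: bytes
    key_info: str  # hypothetical field; "key information" is not defined in the source


def needs_level_comparison(a: Record, b: Record) -> bool:
    """Return True only if the two records pass the coarse pre-check."""
    # Different byte lengths: the records cannot be duplicates.
    if len(a.payload) != len(b.payload):
        return False
    # Same byte length: the key information must match as well.
    return a.key_info == b.key_info


# Assumed sample data, UTF-8 encoded.
current = Record("我是中国人".encode("utf-8"), "nationality")       # "I am Chinese"
negated = Record("我不是中国人".encode("utf-8"), "nationality")     # "I am not Chinese"
other_subject = Record("他是中国人".encode("utf-8"), "nationality")  # "He is Chinese"
```

The negated sentence has a different byte length and is ruled out immediately, while the sentence with a different subject has the same byte length and key information and goes on to the hierarchical comparison.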

Specifically, the data storage method based on data screening provided by the embodiment of the present invention is illustrated by the following example. Suppose the data to be stored in the database is "I am Chinese" and the other data is "I am not Chinese". A byte-length comparison directly rules the two out as similar data, so no screening is performed, and the data are stored in the database and in the mirror pool. If, however, the data to be stored are "I am Chinese" and "He is Chinese" (in the source language the two sentences have the same length and share the same verb), data of this structure are divided into three segments: the subject is the first segment L1, the verb is the second segment L2, and "Chinese" is the third segment L3. In this example the contents of the second and third segments are exactly the same and the contents of the first segment are completely different. In a specific implementation, the similarity S of the data is determined by setting the proportion of each data structure segment, where the weight of the first segment is a1, the weight of the second segment is a2 and the weight of the third segment is a3. The similarity calculation for the data "I am Chinese" is S1 = L1 × a1 + L2 × a2 + L3 × a3, and for the data "He is Chinese" it is S2 = L1' × a1 + L2 × a2 + L3 × a3. In the data structure, the weight a1 of the first segment is greater than the weight a2 of the second segment, which is greater than or equal to the weight a3 of the third segment; in the embodiment of the invention a1 is far greater than a2 and a3, so that the similarity of the data is judged accurately.

According to the embodiment of the invention, different weight coefficients are set for different data structures of data of any hierarchy, so that the screening analysis of the data structures is facilitated, and the screening accuracy and the screening efficiency are effectively improved.

Specifically, for data Gi of an arbitrary hierarchy, i =1, 2, 3, 4, 5, which includes n data structures, each of which takes a value Lj, the similarity calculation formula for data Gi of an arbitrary hierarchy is Si = Σ Lj × aj, where aj represents a weight coefficient corresponding to Lj, and j =1, 2, 3 …, n.

Specifically, in the embodiment of the invention, similarity calculation is performed on each data structure in the data, so that each data structure in data of any hierarchy participates in similarity calculation, and the weight coefficients set by each data structure are different, so that the similarity of the data can be flexibly reflected, the data is more accurate in screening, and the screening accuracy and the simplification efficiency are improved.
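The weighted sum Si = Σ Lj × aj can be sketched as below, using the weights a1 = 0.75, a2 = 0.15 and a3 = 0.1 that the source gives for n = 3. One point is an assumption: the source does not say how the value Lj of a data structure is obtained, so this sketch sets Lj to 1 when the corresponding segments of the two data match and to 0 otherwise. The function names and the segmented sample strings are likewise hypothetical.

```python
from typing import List, Sequence


def segment_match_values(a: Sequence[str], b: Sequence[str]) -> List[float]:
    """Assumed definition of Lj: 1.0 where segment j matches, else 0.0."""
    if len(a) != len(b):
        raise ValueError("both data must have the same number of segments")
    return [1.0 if x == y else 0.0 for x, y in zip(a, b)]


def level_similarity(values: Sequence[float], weights: Sequence[float]) -> float:
    """Si = sum over j of Lj * aj."""
    if len(values) != len(weights):
        raise ValueError("one weight is required per data structure")
    return sum(l * a for l, a in zip(values, weights))


WEIGHTS = (0.75, 0.15, 0.1)  # a1, a2, a3 for n = 3, as stated in the source

# Assumed segmentation of the example: subject, verb, "Chinese".
first = ("我", "是", "中国人")   # "I am Chinese"
second = ("他", "是", "中国人")  # "He is Chinese"
s = level_similarity(segment_match_values(first, second), WEIGHTS)
```

Under this assumption the two sentences differ only in the heavily weighted first segment, so the score is 0.15 + 0.1 = 0.25, well below what identical data would reach (1.0).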

Specifically, a data similarity coefficient standard K0 is set in the central processing unit, and when the similarity coefficient of any two hierarchical data in the hierarchical data is greater than or equal to the data similarity coefficient standard K0, one of the hierarchical data and the corresponding data information under the hierarchical data are deleted;

the data similarity coefficient criterion K0 is,

K0 = (K1/K2 + K2/K3 + K3/K4 + K4/K5)/4 + 4R4/(R1+R2+R3+R4).

Specifically, a data similarity coefficient standard K0 is set; if the similarity coefficient of any two levels of data is greater than or equal to K0, the data are to be screened, which makes the judgment more intuitive and convenient. In practical application, the data similarity coefficient standard is expressed using the coefficients in the similarity coefficient matrix and the parameters in the data correlation matrix.
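The standard K0 depends only on the level coefficients and the correlation thresholds, so it can be computed once and reused. A minimal sketch of the stated formula; the function name similarity_standard and the numeric values are assumptions for illustration.

```python
from typing import Sequence


def similarity_standard(k: Sequence[float], r_levels: Sequence[float]) -> float:
    """K0 = (K1/K2 + K2/K3 + K3/K4 + K4/K5)/4 + 4*R4/(R1+R2+R3+R4)."""
    if len(k) != 5 or len(r_levels) != 4:
        raise ValueError("expected five level coefficients and four correlation degrees")
    # Mean of the ratios between consecutive level coefficients.
    ratio_term = sum(k[i] / k[i + 1] for i in range(4)) / 4
    # Four times the lowest correlation degree R4 over the sum of R1..R4.
    correlation_term = 4 * r_levels[3] / sum(r_levels)
    return ratio_term + correlation_term


# Illustrative values only (assumptions, not taken from the source).
K = [0.5, 0.6, 0.7, 0.8, 0.9]     # K1 < K2 < K3 < K4 < K5
R_LEVELS = [0.8, 0.6, 0.4, 0.2]   # R1 > R2 > R3 > R4
k0 = similarity_standard(K, R_LEVELS)
```

Because K1 < K2 < K3 < K4 < K5, every ratio is below 1, and because R4 is the smallest positive correlation degree, 4R4 is below R1+R2+R3+R4; K0 therefore always stays below 2.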

Specifically, when n is 3, L1 corresponds to a weight coefficient a1 = 0.75;

L2 corresponds to a weight coefficient a2 = 0.15;

L3 corresponds to a weight coefficient a3 = 0.1.

Specifically, the weight coefficient of a simple data structure is set, and the similarity of data is calculated by adopting the weight coefficient in the embodiment of the invention, so that the method is visual, convenient, quick and accurate, the convenience of calculation is improved, the calculation efficiency is improved, and the processing speed of data screening is further improved.

Specifically, the storing the screened hierarchical data and the corresponding data information under the hierarchical data into a mirror pool as the remote backup data of the current data hierarchy includes:

and setting a key for the stored data, wherein the key is generated according to the byte length of the screened data.

Specifically, in the data storage method based on data screening provided by the embodiment of the present invention, by setting a key for backup data, encrypted storage of the data is realized, tampering of the data by a third party is prevented, security of the stored data is effectively improved, when the data is damaged, the damaged data can be repaired according to the data in the mirror image pool, and security and stability of the database are improved.

Specifically, a key matrix P (P1, P2, P3, P4, P5 …, Pn) is further arranged in the central processing unit, wherein P1 represents a first key corresponding to a first byte length, P2 represents a second key corresponding to a second byte length, P3 represents a third key corresponding to a third byte length, P4 represents a fourth key corresponding to a fourth byte length, P5 represents a fifth key corresponding to a fifth byte length, Pn represents an nth key corresponding to an nth byte length;

in the mirror pool, if the byte length of the data to be stored belongs to the first byte length, the first key is selected from the key matrix P (P1, P2, P3, P4, P5 …, Pn) to encrypt the remote backup data;

and if the byte length of the data to be stored belongs to the nth byte length, the nth key is selected from the key matrix P (P1, P2, P3, P4, P5 …, Pn) to encrypt the remote backup data.

Specifically, in the embodiment of the invention, different keys are used for encrypting data with different byte lengths, so that the safety of the data is improved.
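Selecting a key by byte length can be sketched as a lookup into byte-length classes. The source does not give the class boundaries or the keys themselves, so the bounds in `LENGTH_BOUNDS`, the placeholder labels in `KEY_MATRIX`, and the function name `select_key` are all assumptions made for illustration. The sketch covers only key selection, not encryption.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds, in bytes, of the first three
# byte-length classes. The source does not state these values.
LENGTH_BOUNDS = [1024, 64 * 1024, 1024 * 1024]

# Placeholder labels standing in for the key matrix P (P1 ... Pn).
# The last entry covers data longer than the final bound.
KEY_MATRIX = ["P1", "P2", "P3", "P4"]


def select_key(data: bytes) -> str:
    """Return the key-matrix entry whose byte-length class contains len(data)."""
    index = bisect_left(LENGTH_BOUNDS, len(data))
    return KEY_MATRIX[index]


if __name__ == "__main__":
    for size in (10, 1024, 1025, 2_000_000):
        print(size, select_key(b"\x00" * size))
```

A sorted list of bounds with a binary search keeps the lookup simple when the number of byte-length classes n grows. A real system would hold key material in place of the placeholder labels.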

Specifically, a key update coefficient S (S1, S2, S3) is further provided in the central processor, where S1 denotes a first update coefficient, S2 denotes a second update coefficient, and S3 denotes a third update coefficient.

In the embodiment of the invention, the key matrix is updated by evaluating the amount of data to be screened in the database, and a dynamic key is used for storage, which makes data storage more secure. Because the key matrix is updated according to the actual data volume of the database, an association is established between data screening and data storage. If more data is screened out, less data is stored in the database and the database holds little useful data, so a relatively simple key can be used for encryption. If less data is screened out, more data is stored in the database, so the key of the database needs to be upgraded and a complex key is used to protect the data effectively, which improves data security and reduces the risk of having to repair the data.

Specifically, the first update coefficient S1= R1/R2;

the second update coefficient S2= R2/R3;

the third update coefficient S3= R3/R4.

Specifically, the update coefficients in the embodiment of the present invention are expressed in terms of the correlation of the data. The higher the correlation of the data, the smaller the resulting quotient, and the size of the update coefficient determines how much the key changes: a smaller coefficient means a small change to the key, and a larger coefficient means a large change. This makes it more convenient to store the screened data in the mirror pool and makes the encryption more effective, thereby improving the security of data storage. Those skilled in the art will appreciate that the data may be stored in a mirror pool or in another storage structure.
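The three quotients can be computed as in the following minimal sketch. It assumes R1 to R4 are the non-zero correlation values the method compares against. The sample values and the function name `key_update_coefficients` are hypothetical. How a coefficient is then applied to change a key is not specified in the text, so the sketch stops at the coefficients.

```python
from typing import Tuple


def key_update_coefficients(r1: float, r2: float,
                            r3: float, r4: float) -> Tuple[float, float, float]:
    """Return (S1, S2, S3) with S1 = R1/R2, S2 = R2/R3, S3 = R3/R4."""
    for name, value in (("R2", r2), ("R3", r3), ("R4", r4)):
        if value == 0:
            raise ValueError(f"{name} must be non-zero to form the quotient")
    return (r1 / r2, r2 / r3, r3 / r4)


if __name__ == "__main__":
    # Hypothetical correlation values; the source does not give numbers.
    s1, s2, s3 = key_update_coefficients(0.8, 0.4, 0.2, 0.1)
    print(s1, s2, s3)
```

Each coefficient is the ratio of two adjacent correlation values, so it reflects the spacing between them rather than their absolute size.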

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A data storage method based on data screening is characterized by comprising the following steps:

a hierarchy matrix G (G1, G2, G3, G4, G5) is further provided within the central processor, wherein G1 represents a first hierarchy, G2 represents a second hierarchy, G3 represents a third hierarchy, G4 represents a fourth hierarchy, and G5 represents a fifth hierarchy, wherein the data of the first hierarchy is larger than the data of the second hierarchy, the data of the second hierarchy is larger than the data of the third hierarchy, the data of the third hierarchy is larger than the data of the fourth hierarchy, and the data of the fourth hierarchy is larger than the data of the fifth hierarchy;

when data similarity comparison is carried out, if the similarity of two data of the first level is greater than or equal to a first coefficient K1, the similarity of the two data of the first level is high, and the data is judged to be repeated data;

if the data is in the first level, selecting a first coefficient K1 from a data similarity coefficient matrix K (K1, K2, K3, K4 and K5) for comparison, and if the similarity of the two data in the first level is smaller than a first coefficient K1, indicating that the two data are different;

setting a data similarity comprehensive evaluation coefficient K00,

K00 = K1/K10 + K2/K20 + K3/K30 + K4/K40 + K5/K50 + R/(R1+R2+R3+R4),

wherein K10 represents a standard level coefficient corresponding to a first level, K20 represents a standard level coefficient corresponding to a second level, K30 represents a standard level coefficient corresponding to a third level, K40 represents a standard level coefficient corresponding to a fourth level, and K50 represents a standard level coefficient corresponding to a fifth level;

storing the screened hierarchical data and the corresponding data information under the hierarchical data into a mirror pool as the remote backup data of the current data hierarchy comprises the following steps:

setting a key for the remote backup data, wherein the key is generated according to the byte length of the screened data;

a key matrix P (P1, P2, P3, P4, P5 … and Pn) is further arranged in the central processing unit, wherein P1 represents a first key corresponding to a first byte length, P2 represents a second key corresponding to a second byte length, P3 represents a third key corresponding to a third byte length, P4 represents a fourth key corresponding to a fourth byte length, P5 represents a fifth key corresponding to a fifth byte length, and Pn represents an nth key corresponding to an nth byte length;

2. The data storage method based on data screening of claim 1,

if the correlation degree R with the current data is larger than or equal to the first correlation degree R1, the priority of similarity comparison with the current data is the highest and is the first priority data;

3. The data storage method based on data screening as claimed in claim 2, wherein the correlation of two data is determined by using byte length and key information: if the byte lengths are different, the two data are not similar data; if the byte lengths are the same, the two data may be similar data, and it is then determined whether the key information of the two data is the same; if the key information is also the same, the respective levels in the data structure need to be further compared to determine the correlation.

4. The data storage method based on data screening as claimed in claim 3, wherein, for data Gi at any level, i = 1, 2, 3, 4, 5, the data Gi includes n data structures, each data structure is assigned a value Lj, and the similarity calculation formula for data Gi at any level is Si = Σ Lj × aj, where aj represents the weight coefficient corresponding to Lj, and j = 1, 2, 3 …, n.

5. The data storage method based on data screening of claim 2,

setting a data similarity coefficient standard K0 in the central processing unit, and deleting one of the hierarchical data and corresponding data information under the hierarchical data when the similarity coefficient of any two hierarchical data in the hierarchical data is more than or equal to the data similarity coefficient standard K0;

the data similarity coefficient criterion K0 is,

K0 = (K1/K2 + K2/K3 + K3/K4 + K4/K5)/4 + 4R4/(R1+R2+R3+R4).

6. The data storage method based on data screening of claim 4, wherein when n is 3, L1 corresponds to a weight coefficient a1 = 0.75;

L2 corresponds to a weight coefficient a2 = 0.15;

L3 corresponds to a weight coefficient a3 = 0.1.

7. The data storage method based on data screening as claimed in claim 1, wherein a key update coefficient S (S1, S2, S3) is further provided in the central processor, wherein S1 represents a first update coefficient, S2 represents a second update coefficient, and S3 represents a third update coefficient;

8. The data storage method based on data screening of claim 7, wherein the first update coefficient S1 = R1/R2;

the second update coefficient S2= R2/R3;

the third update coefficient S3= R3/R4.