CN113126879A - Data storage method and device and electronic equipment - Google Patents

Info

Publication number
CN113126879A
Authority
CN
China
Prior art keywords
data
value
target data
module
fingerprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911391011.6A
Other languages
Chinese (zh)
Other versions
CN113126879B (en)
Inventor
李露璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Sichuan Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Sichuan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Sichuan Co Ltd
Priority to CN201911391011.6A
Publication of CN113126879A
Application granted
Publication of CN113126879B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of this specification disclose a data storage method, a data storage device and electronic equipment, which address the low efficiency of data deduplication in the prior art. The scheme comprises: cutting target data based on a data blocking algorithm to obtain a plurality of data blocks; matching an initial input value for the target data based on the security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data; sequentially performing data absorption and extrusion (squeezing) processing on the plurality of data blocks based on the padding value, and calculating data fingerprints of the plurality of data blocks corresponding to the target data; if a data fingerprint is in the fingerprint index table, adding the data fingerprint to the logical view; otherwise, adding the fingerprint index of the corresponding data block to the fingerprint index table. By matching an appropriate initial input value to target data of different security levels, the deduplication efficiency can be effectively improved, especially when the data volume is large.

Description

Data storage method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer software technologies, and in particular, to a data storage method and apparatus, and an electronic device.
Background
With the rapid development of information technology, data storage has become a relatively mature technology. However, as data volumes expand rapidly, reducing the waste of storage resources and improving storage utilization have become urgent requirements for efficient storage. Data deduplication is a key technology for data reduction: it removes duplicate data blocks to reduce the data capacity. The technology can greatly reduce the demand for physical storage space, reduce the network bandwidth used during transmission, and save equipment purchase and maintenance costs. It is also a green storage technology that can effectively reduce energy consumption.
At present, deduplication technology mainly identifies identical data blocks through the data fingerprints of the data blocks. Data fingerprints are generally calculated using the MD5 algorithm (Message-Digest Algorithm 5), the SHA-2 algorithm (Secure Hash Algorithm 2), and the like, but these two mainstream algorithms have reached bottlenecks in performance and security.
Therefore, it is highly desirable to find a new data storage scheme to improve the efficiency of data de-duplication.
Disclosure of Invention
An object of an embodiment of the present specification is to provide a data storage method, an apparatus, and an electronic device, so as to improve efficiency of data de-duplication.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
in a first aspect, a data storage method is provided, including:
cutting target data based on a data blocking algorithm to obtain a plurality of data blocks;
matching an initial input value for the target data based on a security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data;
sequentially performing data absorption and extrusion processing on the data blocks based on the filling values, and calculating to obtain data fingerprints of the data blocks corresponding to the target data;
if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
In a second aspect, a data storage device is presented, comprising:
the cutting module is used for cutting the target data based on a data blocking algorithm to obtain a plurality of data blocks;
a matching module for matching an initial input value for the target data based on the security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data;
the calculation module is used for sequentially performing data absorption and extrusion processing on the data blocks based on the filling values and calculating to obtain data fingerprints of the plurality of data blocks corresponding to the target data;
the processing module is used for adding the data fingerprint in the logic view if the data fingerprint is in the fingerprint index table; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
In a third aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a data storage method as described in the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing instructions which, when executed on a computer, cause the computer to perform the data storage method according to the first aspect.
According to the technical scheme provided by the embodiment of the specification, the target data is cut based on a data blocking algorithm to obtain a plurality of data blocks; then matching an initial input value for the target data based on the security level of the target data to determine a filling value, wherein the initial input value is positively correlated with the security level of the target data; then, sequentially performing data absorption and extrusion processing on the plurality of data blocks based on the filling values, and calculating to obtain data fingerprints of the plurality of data blocks corresponding to the target data; if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table. Therefore, target data with different safety levels can be dealt with by matching appropriate initial input values for the target data, and especially when the data are huge, the deleting efficiency of repeated data can be effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic step diagram of a data storage method provided in an embodiment of the present specification.
Fig. 2a and 2b are exemplary diagrams of two data blocking algorithms provided in an embodiment of the present specification.
Fig. 3 is a schematic diagram of the sponge structure of a Keccak encryption algorithm provided in an embodiment of the present specification.
Fig. 4a and 4b are schematic diagrams of two internal module structures of a preset function provided in an embodiment of the present specification.
Fig. 5 is a schematic structural diagram of a data storage device 200 according to an embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
The embodiment of the specification provides a data storage method and a data storage device, and aims to solve the problems that when data storage is carried out in the prior art, efficiency of deleting repeated data is low, and storage bottleneck exists. In the embodiment of the specification, the data fingerprint is calculated by adopting an improved Keccak encryption algorithm, and particularly when the data is huge, a proper initial input value can be matched based on the data security level to determine the filling value, so that the efficiency of deleting the repeated data can be effectively improved.
Fig. 1 is a schematic diagram of the steps of a data storage method provided in an embodiment of the present specification. It should be understood that the execution subject of the data storage method may be a data storage device having a data storage function, such as a terminal or a server, or an electronic product integrated with such a data storage device. The data storage method may include the following steps:
Step 102: cutting the target data based on a data blocking algorithm to obtain a plurality of data blocks.
In this embodiment, the target data may be understood as new data received locally, and it is not known whether the new data is duplicate data. In order to obtain a data fingerprint of the target data, the target data may be cut to obtain a plurality of data blocks.
In this embodiment, the data blocking algorithm used to cut the target data may include at least a fixed-length blocking algorithm or a variable-length blocking algorithm. Other algorithms that can slice data for the purpose of calculating data fingerprints, for example a sliding-block algorithm, may also be used and are not described in detail here.
The fixed-length blocking algorithm and the variable-length blocking algorithm are described as examples.
1. Fixed-length blocking algorithm
The fixed-length blocking algorithm cuts data into blocks of a predefined size and computes a weak checksum and an MD5 strong checksum for each block. An example of the fixed-length blocking algorithm is shown in fig. 2a. The algorithm is simple and fast, but it is very sensitive to data insertion and deletion, which results in a very low deduplication rate.
2. Variable-length blocking algorithm
The variable-length blocking algorithm, i.e. content-defined chunking (CDC), typically splits blocks based on the file content, so the block size is variable. During splitting, CDC calculates a data fingerprint of the file data using a sliding window of fixed size (e.g., 48 bytes). If the fingerprint satisfies a condition, for example its value modulo a particular integer equals a predetermined number, the window position is taken as a block boundary. The block size can be bounded during splitting by setting an upper limit and a lower limit. An example of the variable-length blocking algorithm is shown in fig. 2b. The algorithm is insensitive to changes in file content: inserting or deleting data affects only a few data blocks and leaves the remaining blocks unchanged, so the deduplication rate is much higher than with fixed-length blocking. However, it is more demanding in computation and algorithm design, and its processing speed is lower.
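The two blocking algorithms described above can be illustrated with the following minimal Python sketch. It is not the implementation of this specification. The function names, the block sizes, the divisor and the use of Adler-32 both as the weak checksum and as the sliding-window fingerprint are assumptions chosen for illustration, and the window fingerprint is recomputed at every position instead of being updated as a true rolling hash.

```python
import hashlib
import zlib


def fixed_length_chunks(data: bytes, size: int = 4096):
    """Cut data into blocks of a predefined size (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksums(block: bytes):
    """Return (weak checksum, strong checksum); Adler-32 is an assumed weak checksum."""
    return zlib.adler32(block), hashlib.md5(block).hexdigest()


def cdc_chunks(data: bytes, window: int = 48, divisor: int = 64, remainder: int = 0,
               min_size: int = 64, max_size: int = 1024):
    """Content-defined chunking: cut where the window fingerprint meets the condition."""
    if not window <= min_size <= max_size:
        raise ValueError("expected window <= min_size <= max_size")
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        length = end - start
        if length < min_size:          # lower limit on the block size
            continue
        # Fingerprint of the last `window` bytes (recomputed, not rolling).
        fingerprint = zlib.adler32(data[end - window:end])
        # Boundary: fingerprint condition met, or upper limit on the block size reached.
        if fingerprint % divisor == remainder or length >= max_size:
            chunks.append(data[start:end])
            start = end
    if start < len(data):              # trailing bytes form the last block
        chunks.append(data[start:])
    return chunks
```

Concatenating the blocks returned by either function reproduces the original data. With the content-defined variant, the boundaries depend on the bytes inside the window, which is why an insertion tends to disturb only nearby blocks.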
Step 104: matching an initial input value for the target data based on a security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data.
It should be understood that, in the embodiments of the present specification, the security level of the target data may be used to characterize the security, importance, and other security-related attributes of the target data.
Since the embodiment of the present specification calculates the data fingerprint of the target data by using the improved Keccak encryption algorithm, padding values must be added to the data block to reach the corresponding byte length. In the embodiment of the present specification, first, the target data needs to be matched with an appropriate padding value according to the security level of the target data. During specific implementation, a preset grade threshold value can be set according to historical data, namely, data are divided into different safety grade categories according to the safety grade of the historical data. If the security level of the target data is greater than a preset level threshold value, matching an initial input value from a first type initial value; otherwise, matching the initial input value from the second type initial value; wherein the byte length of the initial input value in the first class of initial values is greater than the byte length of the initial input value in the second class of initial values.
For example, the first class of initial values comprises 1600 bits, 800 bits and 400 bits, and the second class of initial values comprises 25 bits, 50 bits and 100 bits. In the Keccak encryption algorithm, the output efficiency of the preset function f depends on the initial input value b: when b is 1600 bits, the padding is largest, the security is highest, and the output efficiency is lowest; when b is 25 bits, the padding is smallest, the security is lowest, and the output efficiency is highest. Therefore, while the backup system is operating normally, the initial input value b can be filled with a first-class initial value and with a second-class initial value, the two calculations can be performed simultaneously, and two indexes can be kept. When new data is stored, a system tool automatically judges how busy the system is, or the importance and security of the data are graded manually; in other words, the security level of the target data is determined. If the system is idle or the importance and security of the data are low, the security level is determined to be less than the preset level threshold, and the target data is filled to any one of 25 bits, 50 bits and 100 bits for calculation. If the system is busy or the importance and security of the data are high, the security level is determined to be greater than or equal to the preset level threshold, and the target data is filled to any one of 1600 bits, 800 bits and 400 bits for calculation. In this way, when the data volume is large enough, the deduplication efficiency can be effectively improved. In addition, when the system is idle, a plurality of basic data indexes can be created to meet the requirements of different situations.
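The matching rule of step 104 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the integer security levels and the `pick` parameter are hypothetical, the strict comparison follows the "greater than a preset level threshold" wording, and the padding length is taken to be c = b - r, on the reading that the input of the preset function has length r + c.

```python
FIRST_CLASS_BITS = (1600, 800, 400)   # higher security, lower output efficiency
SECOND_CLASS_BITS = (25, 50, 100)     # lower security, higher output efficiency


def match_initial_input_value(security_level: int, threshold: int, pick: int = 0) -> int:
    """Return an initial input value b (in bits) for the given security level.

    `pick` selects one candidate inside the matched class; it is a hypothetical knob.
    """
    if security_level > threshold:
        candidates = FIRST_CLASS_BITS
    else:
        candidates = SECOND_CLASS_BITS
    return candidates[pick % len(candidates)]


def padding_bits(b: int, r: int) -> int:
    """Padding length c such that r + c = b (assumed relation between b, r and c)."""
    if not 0 < r < b:
        raise ValueError("the block length r must satisfy 0 < r < b")
    return b - r
```

For instance, with a threshold of 3, a level of 5 is matched to the first class and a level of 3 to the second class.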
Step 106: and sequentially performing data absorption and extrusion processing on the plurality of data blocks based on the filling values, and calculating to obtain data fingerprints of the plurality of data blocks corresponding to the target data.
After the padding value is determined, it may be used to sequentially perform data absorption and extrusion on the plurality of data blocks obtained in step 102. The process can be understood through the sponge structure shown in fig. 3, which mainly comprises a data absorption stage and a data extrusion (squeezing) stage.
In one implementation, when the security level of the target data is not greater than a preset level threshold, sequentially performing data absorption processing on the plurality of data blocks based on the padding value specifically includes:
directly filling the filling value for the first data block in the plurality of data blocks to obtain an output value with a fixed length; taking the output value as an input value of a preset function, and calculating to obtain an output value of the preset function; performing XOR processing on the output value of the preset function and the second data block, and filling the filling value for the result after the XOR processing; and taking the filled result as the input value of the preset function again, and repeating the calculation until the last data block obtains an output value to finish the data absorption processing.
In a specific implementation of the data absorption stage, when the importance and security of the target data are determined to be low, the padding value c may be appended directly to the first data block P1 to obtain an initial input value b of fixed length (b may be 25, 50, 100, 200, 400, 800 or 1600 bits); this is the initial input value matched in step 104. The initial input value b is fed into the preset function f, the output value is XORed with P2, the result is padded with c, and the padded result is fed into the function f again as a new input value. These steps are repeated until the last data block Pn has been processed, at which point the data absorption stage ends and the extrusion stage begins.
In the extrusion stage, the output value of the function f from the data absorption stage is saved as z0, and the entire output value is padded with c and used as the input value of the function f. The first r bits of the resulting output value of the function f are saved as z1, and the entire output value is padded with c and used as the input value of the function f again. These steps are repeated until a hash value of the required length is obtained, at which point the data extrusion stage ends.
The Keccak algorithm is used in the field of encryption, where the security requirements are often higher than those of deduplication. Therefore, provided that data security can still be guaranteed, the above approach can be adopted: the XOR processing with r is skipped and the padding value c is filled in directly to form the input value of the function f, which improves the deduplication efficiency.
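The absorption and extrusion stages can be illustrated with the following toy sponge in Python. It follows the textbook sponge construction and is not the exact variant of this specification. The function `f` is a stand-in mixing function built from `hashlib.shake_256`, not the Keccak preset function, and r and c are given in bytes for simplicity; all names are assumptions. Note that XORing the first block into an all-zero state is the same as appending c zero bytes to P1 directly.

```python
import hashlib


def f(state: bytes) -> bytes:
    """Stand-in mixing function (NOT Keccak-f): maps r + c bytes to r + c bytes."""
    return hashlib.shake_256(state).digest(len(state))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def sponge(blocks, r: int, c: int, out_len: int) -> bytes:
    """Absorb r-byte blocks, then squeeze out_len bytes of output."""
    state = bytes(r + c)                      # all-zero state of r + c bytes
    # Absorption: XOR each block into the first r bytes, keep the c-byte part, apply f.
    for block in blocks:
        if len(block) != r:
            raise ValueError("every block must be exactly r bytes long")
        state = f(xor_bytes(state[:r], block) + state[r:])
    # Extrusion: output the first r bytes (z0), apply f, output again (z1), and so on.
    out = state[:r]
    while len(out) < out_len:
        state = f(state)
        out += state[:r]
    return out[:out_len]
```

Because the output is produced r bytes at a time from successive states, a shorter output is always a prefix of a longer one for the same input.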
In another implementation scheme, when the security level of the target data is greater than a preset level threshold, sequentially performing data absorption processing on the plurality of data blocks based on the filling value specifically includes:
performing XOR processing on a first data block in the plurality of data blocks and an operation value, and padding the XOR result with the padding value to obtain an initial input value with a fixed length, wherein the operation value has the same byte length as the first data block; inputting the initial input value into a preset function, and calculating to obtain an output value of the preset function; performing XOR processing on the output value of the preset function and the second data block, and filling the filling value for the result after the XOR processing; and taking the filled result as the input value of the preset function again, and repeating the calculation until the last data block obtains an output value to finish the data absorption processing. It should be appreciated that the initial input values herein may be obtained from the matching in step 104.
When the data absorption and extrusion processing is performed on the plurality of data blocks in sequence based on the padding value, a preset function is called multiple times. When the security level of the target data is not greater than a preset level threshold, calling the preset function comprises calling a θ module, a ρ module and a π module in succession. Referring to fig. 4a, the θ module is used to construct an array in which every element has the same number of bytes. Specifically, the θ module may convert the r + c bits into a 5x5 array of 64-bit elements, compute the parity of each column, combine the column parities using the XOR operator, and finally XOR the resulting parity with each state bit. The ρ module cyclically shifts the elements of the array by offsets that follow the triangular numbers. The π module rearranges the elements of the array. With this scheme, only some of the modules are called when the preset function is used, which reduces the amount of calculation and improves the deduplication efficiency.
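The θ, ρ and π modules can be sketched in Python as below, assuming the 1600-bit case, where the state is a 5x5 array of 64-bit elements indexed as `lanes[x][y]`. The sketch follows the step mappings of the published Keccak-f[1600] permutation; the function names are illustrative assumptions, and the reduced three-module call sequence of this specification is not reproduced here.

```python
MASK64 = (1 << 64) - 1


def rotl64(value: int, shift: int) -> int:
    """Cyclically shift a 64-bit element left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def theta(lanes):
    """XOR every element with the parities of two neighbouring columns."""
    parity = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
              for x in range(5)]
    mix = [parity[(x - 1) % 5] ^ rotl64(parity[(x + 1) % 5], 1) for x in range(5)]
    return [[lanes[x][y] ^ mix[x] for y in range(5)] for x in range(5)]


def rho_pi(lanes):
    """rho: rotate by triangular-number offsets; pi: move elements to new positions."""
    out = [[0] * 5 for _ in range(5)]
    out[0][0] = lanes[0][0]               # element (0, 0) is neither rotated nor moved
    x, y = 1, 0
    for t in range(24):
        offset = (t + 1) * (t + 2) // 2   # triangular numbers 1, 3, 6, 10, ...
        out[y][(2 * x + 3 * y) % 5] = rotl64(lanes[x][y], offset)
        x, y = y, (2 * x + 3 * y) % 5
    return out
```

As a small check of the column-parity idea, a state whose only non-zero element is `lanes[0][0] = 1` has 11 non-zero elements after `theta`: the original one plus the two neighbouring columns.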
In another implementation, when the security level of the target data is not greater than the preset level threshold, after the θ module, the ρ module and the π module are called in succession, the method further comprises calling a χ module and/or an ι module. Referring to fig. 4b, the χ module adds non-linearity to the transformed array; specifically, the elements of each row may be combined using AND, NOT and XOR operations, and the result is written into the state array. The ι module eliminates the symmetry of the array; specifically, an element of the array may be XORed with a round constant. There are 24 round constants to choose from, which are defined internally by Keccak. With this scheme, all of the modules can be called in succession when the preset function is used, which improves the security of the extrusion result.
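The χ and ι modules can be sketched in the same style, again assuming the 1600-bit case with a 5x5 array of 64-bit elements indexed as `lanes[x][y]`. The 24 constants are generated here with the linear-feedback shift register used in the public Keccak reference code instead of being listed; the function names are illustrative assumptions.

```python
MASK64 = (1 << 64) - 1


def chi(lanes):
    """Combine the elements of each row with NOT, AND and XOR (the non-linear step)."""
    out = [[0] * 5 for _ in range(5)]
    for y in range(5):
        for x in range(5):
            not_next = ~lanes[(x + 1) % 5][y] & MASK64
            out[x][y] = lanes[x][y] ^ (not_next & lanes[(x + 2) % 5][y])
    return out


def round_constants():
    """Generate the 24 constants of Keccak-f[1600] with an 8-bit LFSR."""
    constants, r = [], 1
    for _ in range(24):
        rc = 0
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                rc ^= 1 << ((1 << j) - 1)   # bit positions 0, 1, 3, 7, 15, 31, 63
        constants.append(rc)
    return constants


def iota(lanes, round_index: int, constants):
    """XOR element (0, 0) with the constant of the given round to break symmetry."""
    out = [column[:] for column in lanes]
    out[0][0] ^= constants[round_index]
    return out
```

The first two generated constants are 0x1 and 0x8082, which match the first two constants of the published permutation.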
It should be understood that, throughout the algorithm, the preset function f performs a complex mixing operation on the input data and outputs the result. The input length of the preset function f is not r but r + c, which means that c bits of capacity are not directly affected by the content of the input block, so features of the input information are effectively prevented from leaking. Because the Keccak algorithm uses a new structure, earlier attacks aimed at the MD structure (e.g., MD5) are difficult to apply, which improves security. The Keccak algorithm can produce an output value of any length, which effectively avoids the collision problem caused by the fixed output length of SHA-2.
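The arbitrary output length can be observed with the Keccak-based SHAKE functions in the Python standard library. The sketch below uses `hashlib.shake_128` as a stand-in for the improved algorithm of this specification; the function name `fingerprint` and the default length are assumptions.

```python
import hashlib


def fingerprint(block: bytes, out_len: int = 32) -> bytes:
    """Return a data fingerprint of out_len bytes for one data block."""
    # SHAKE128 is an extendable-output function built on the Keccak sponge.
    return hashlib.shake_128(block).digest(out_len)
```

Requesting a longer fingerprint for the same block extends the shorter one, since the extra bytes come from further extrusion rounds of the same sponge state.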
Step 108: if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
After the data fingerprints of the multiple data blocks of the target data are obtained, whether the data fingerprint of each data block is in the fingerprint index table or not can be respectively judged, if the data fingerprint of any data block is in the fingerprint index table, the data block corresponding to the data fingerprint is considered to be repeated, otherwise, if the data fingerprint is not in the fingerprint index table, the data block corresponding to the data fingerprint is considered to be new.
For repeated data blocks: a data pointer or data fingerprint of a duplicate data block will be added in the logical view. Logically, the system has multiple copies of data blocks, while physically, the system has only a single copy of data blocks.
For a new data block: the system needs to save a new block of data, which is the actual block of data stored. When updating the fingerprint index table, the data fingerprint and metadata of the new data block are added to the fingerprint index table, thereby maintaining the consistency of the fingerprint index table and the stored data block.
Thus, the deduplication operation is completed during the data storage phase. When data needs to be restored, its metadata is read first, and the data blocks are read from the data area according to the data block pointers recorded in the metadata and reassembled into the file.
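Step 108 and the restore path can be sketched with an in-memory store in Python. This is a simplified illustration: the class and method names are hypothetical, a dictionary plays the role of both the fingerprint index table and the data area, and `hashlib.sha3_256` stands in for the fingerprint calculation of this specification.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store: one physical copy per distinct data block."""

    def __init__(self):
        self.index = {}   # fingerprint -> stored data block (index table and data area)
        self.views = {}   # data name -> ordered list of fingerprints (logical view)

    def put(self, name: str, blocks) -> None:
        view = []
        for block in blocks:
            fp = hashlib.sha3_256(block).hexdigest()
            if fp not in self.index:
                # New block: store it and add its fingerprint to the index table.
                self.index[fp] = block
            # Duplicate or new, the logical view records a reference to the block.
            view.append(fp)
        self.views[name] = view

    def get(self, name: str) -> bytes:
        """Restore the data by following the references recorded in the logical view."""
        return b"".join(self.index[fp] for fp in self.views[name])
```

Storing the blocks `[b"x", b"y", b"x"]` under one name keeps three references in the logical view but only two physical blocks, and `get` reassembles the original byte sequence.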
According to the technical scheme, target data are cut based on a data blocking algorithm to obtain a plurality of data blocks; then matching an initial input value for the target data based on the security level of the target data to determine a filling value, wherein the initial input value is positively correlated with the security level of the target data; then, sequentially performing data absorption and extrusion processing on the plurality of data blocks based on the filling values, and calculating to obtain data fingerprints of the plurality of data blocks corresponding to the target data; if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table. Therefore, target data with different safety levels can be dealt with by matching appropriate initial input values for the target data, and especially when the data are huge, the deleting efficiency of repeated data can be effectively improved.
Fig. 5 is a schematic structural diagram of a data storage device 200 according to an embodiment of the present disclosure. Referring to FIG. 5, in one software implementation, the data storage device 200 may include:
the cutting module 202 is configured to cut the target data based on a data blocking algorithm to obtain a plurality of data blocks;
a matching module 204, configured to match an initial input value for the target data based on the security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data;
a calculating module 206, configured to perform data absorption and extrusion processing on the multiple data blocks in sequence based on the filling values, and calculate to obtain data fingerprints of the multiple data blocks corresponding to the target data;
a processing module 208, configured to add the data fingerprint in the logical view if the data fingerprint is in the fingerprint index table; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
According to the technical scheme of the specification, target data are cut based on a data blocking algorithm to obtain a plurality of data blocks; then matching an initial input value for the target data based on the security level of the target data to determine a filling value, wherein the initial input value is positively correlated with the security level of the target data; then, sequentially performing data absorption and extrusion processing on the plurality of data blocks based on the filling values, and calculating to obtain data fingerprints of the plurality of data blocks corresponding to the target data; if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table. Therefore, target data with different safety levels can be dealt with by matching appropriate initial input values for the target data, and especially when the data are huge, the deleting efficiency of repeated data can be effectively improved.
As an embodiment, when the matching module 204 matches the target data with the initial input value based on the security level of the target data to determine the padding value, specifically, the matching module is configured to: if the security level of the target data is greater than a preset level threshold, matching an initial input value from a first type initial value to determine a filling value; otherwise, matching the initial input value from the second type initial value to determine a filling value; wherein the byte length of the initial input value in the first class of initial values is greater than the byte length of the initial input value in the second class of initial values.
As another embodiment, when the security level of the target data is not greater than the preset level threshold, in sequentially performing data absorption processing on the plurality of data blocks based on the padding value, the calculating module 206 is specifically configured to: directly pad the first data block of the plurality of data blocks with the padding value to obtain an initial input value with a fixed length; input the initial input value into a preset function and calculate an output value of the preset function; perform XOR processing on the output value of the preset function and the second data block, and pad the result of the XOR processing with the padding value; and input the padded result into the preset function as a new input value, repeating the calculation until an output value is obtained for the last data block, which completes the data absorption processing.
As another embodiment, when the security level of the target data is greater than the preset level threshold, in sequentially performing data absorption processing on the plurality of data blocks based on the padding value, the calculating module 206 is specifically configured to: perform XOR processing on the first data block of the plurality of data blocks and an operation value, and pad the result of the XOR processing with the padding value to obtain an initial input value with a fixed length, wherein the operation value has the same byte length as the first data block; input the initial input value into a preset function and calculate an output value of the preset function; perform XOR processing on the output value of the preset function and the second data block, and pad the result of the XOR processing with the padding value; and input the padded result into the preset function as a new input value, repeating the calculation until an output value is obtained for the last data block, which completes the data absorption processing.
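The two absorption variants differ only in how the first data block is prepared, which the following sketch tries to make concrete. It is an illustration under stated assumptions, not the specified implementation: all data blocks are assumed to have the same length, the XOR with each later block uses the leading bytes of the previous output, and SHAKE-256 from `hashlib` is used purely as a stand-in for the preset function. The names `preset_function`, `xor_bytes`, and `absorb` are hypothetical.

```python
import hashlib


def preset_function(state):
    """Stand-in for the preset function: maps a byte string to one of equal length."""
    return hashlib.shake_256(state).digest(len(state))


def xor_bytes(left, right):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(left, right))


def absorb(blocks, padding, operation_value=None):
    """Absorb equal-length blocks one after another and return the final output value.

    operation_value=None models the lower-security path (first block padded directly);
    otherwise the first block is XORed with the operation value before padding.
    """
    first = blocks[0]
    if operation_value is not None:
        if len(operation_value) != len(first):
            raise ValueError("operation value must match the first block's byte length")
        first = xor_bytes(first, operation_value)
    # Fixed-length initial input value: prepared first block followed by the padding.
    output = preset_function(first + padding)
    for block in blocks[1:]:
        # XOR the previous output with the next block, pad, and feed it back in.
        mixed = xor_bytes(output[:len(block)], block)
        output = preset_function(mixed + padding)
    return output
```

Calling `absorb(blocks, padding)` follows the lower-security path, while passing `operation_value` follows the higher-security path; both return a value of length `len(blocks[0]) + len(padding)`.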
As another embodiment, the calculating module 206 calls a preset function multiple times when sequentially performing data absorption and extrusion processing on the plurality of data blocks based on the padding value; accordingly,
when the security level of the target data is not greater than a preset level threshold, the calculating module 206 calls the preset function specifically by successively calling a θ module, a ρ module, and a π module; the θ module is used for constructing an array in which every element has the same bit length; the ρ module is used for cyclically shifting the elements in the array by offsets that follow the sequence of triangular numbers; the π module is used for transposing the elements in the array.
As another embodiment, when the security level of the target data is not greater than the preset level threshold, the calculating module 206 is further configured to call a χ module and/or an ι module after successively calling the θ module, the ρ module and the π module; the χ module is used for adding non-linearity to the transformed array; the ι module is used for breaking the symmetry of the array.
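The θ, ρ, π, χ and ι modules named above appear to correspond to the five step mappings of the publicly documented Keccak-f permutation, an assumption suggested by the Keccak literature among the non-patent citations rather than stated in the specification. The sketch below follows the public Keccak-f[1600] and SHA3-256 definitions with all five steps in every round; it does not reproduce the level-dependent selection of modules described here, and the function names are hypothetical.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply 24 rounds of theta, rho, pi, chi and iota to a 5x5 array of 64-bit lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: XOR each lane with the parities of two neighbouring columns.
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate by triangular numbers while moving lanes to new positions.
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied along each row.
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR a round constant into lane (0, 0) to break symmetry.
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def permute(state):
    """Run keccak_f1600 on a 200-byte state (little-endian lanes)."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lanes = keccak_f1600(lanes)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sha3_256_sketch(data):
    """Sponge with a 136-byte rate: absorb the padded input, then squeeze 32 bytes."""
    rate = 136
    padded = bytearray(data) + b"\x06"
    padded += bytes(-len(padded) % rate)
    padded[-1] ^= 0x80
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        state = permute(state)
    return bytes(state[:32])
```

Because the sketch implements the standard construction, its output can be compared with `hashlib.sha3_256(data).digest()` as a sanity check.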
As another embodiment, when adding the fingerprint index of the corresponding data block to the fingerprint index table, the processing module 208 is specifically configured to: add the data fingerprint and the metadata of the corresponding data block to the fingerprint index table.
It should be understood that the data storage device of the embodiments of the present disclosure may also perform the method performed by the data storage device (or apparatus) in fig. 1, and implement the functions of the data storage device (or apparatus) in the embodiments shown in fig. 1, which are not described herein again.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 6, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in fig. 6, but this does not mean that there is only one bus or one type of bus.
The memory is used for storing programs. Specifically, a program may include program code, and the program code comprises computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the data storage device at the logical level. The processor executes the program stored in the memory, and is specifically configured to perform the following operations:
cutting target data based on a data blocking algorithm to obtain a plurality of data blocks;
matching an initial input value for the target data based on a security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data;
sequentially performing data absorption and extrusion processing on the data blocks based on the filling values, and calculating to obtain data fingerprints of the data blocks corresponding to the target data;
if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
The method performed by the data storage device in the embodiment shown in fig. 1 of this specification can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of this specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the embodiments of this specification may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also execute the method shown in fig. 1 and implement the functions of the data storage apparatus in the embodiment shown in fig. 1, which are not described herein again in this specification.
Of course, besides the software implementation, the electronic device of the embodiment of the present disclosure does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
Embodiments of the present specification also provide a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a portable electronic device that includes a plurality of application programs, cause the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular to perform the following operations:
cutting target data based on a data blocking algorithm to obtain a plurality of data blocks;
matching an initial input value for the target data based on a security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data;
sequentially performing data absorption and extrusion processing on the data blocks based on the filling values, and calculating to obtain data fingerprints of the data blocks corresponding to the target data;
if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present specification shall be included in the protection scope of the present specification.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment.

Claims (10)

1. A method of storing data, comprising:
cutting target data based on a data blocking algorithm to obtain a plurality of data blocks;
matching an initial input value for the target data based on a security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data;
sequentially performing data absorption and extrusion processing on the plurality of data blocks based on the filling values, and calculating to obtain a data fingerprint of each data block corresponding to the target data;
if the data fingerprint is in a fingerprint index table, adding the data fingerprint in a logical view; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
2. The method of claim 1, wherein matching the target data with initial input values based on the security level of the target data to determine padding values comprises:
if the security level of the target data is greater than a preset level threshold, matching an initial input value from a first type initial value to determine a filling value;
otherwise, matching the initial input value from the second type initial value to determine a filling value;
wherein the byte length of the initial input value in the first class of initial values is greater than the byte length of the initial input value in the second class of initial values.
3. The method according to claim 1 or 2, wherein when the security level of the target data is not greater than a preset level threshold, sequentially performing data absorption processing on the plurality of data blocks based on the filling value, specifically comprising:
directly filling the filling value for the first data block in the plurality of data blocks to obtain an initial input value with a fixed length;
inputting the initial input value into a preset function, and calculating to obtain an output value of the preset function;
performing XOR processing on the output value of the preset function and the second data block, and filling the filling value for the result after the XOR processing;
and inputting the filled result as a new input value into the preset function, and repeatedly calculating until the last data block obtains an output value to finish data absorption processing.
4. The method according to claim 3, wherein when the security level of the target data is greater than a preset level threshold, sequentially performing data absorption processing on the plurality of data blocks based on the filling value, specifically comprising:
performing exclusive-or processing on a first data block in the plurality of data blocks and an operation value, and padding the result of the exclusive-or processing with the padding value to obtain an initial input value with a fixed length, wherein the operation value has the same byte length as the first data block;
inputting the initial input value into a preset function, and calculating to obtain an output value of the preset function;
performing XOR processing on the output value of the preset function and the second data block, and filling the filling value for the result after the XOR processing;
and inputting the filled result as a new input value into the preset function, and repeatedly calculating until the last data block obtains an output value to finish data absorption processing.
5. The method according to claim 1 or 2, characterized in that, when the data absorption and extrusion processing is sequentially performed on the plurality of data blocks based on the filling values, a preset function is called a plurality of times;
when the security level of the target data is not greater than a preset level threshold, calling the preset function comprises: continuously calling a theta module, a rho module and a pi module;
the θ module is used for constructing an array in which every element has the same bit length; the ρ module is used for cyclically shifting the elements in the array by offsets that follow the sequence of triangular numbers; the π module is used for transposing the elements in the array.
6. The method of claim 5, wherein when the security level of the target data is not greater than a preset level threshold, after successively calling the θ module, the ρ module, and the π module, the method further comprises: calling a χ module and/or an ι module;
the χ module is used for adding non-linearity to the transformed array; the ι module is used for breaking the symmetry of the array.
7. The method of claim 4, wherein adding the fingerprint index of the corresponding data chunk to the fingerprint index table specifically comprises:
and adding the data fingerprint and the metadata of the corresponding data block in the fingerprint index table.
8. A data storage device, comprising:
the cutting module is used for cutting the target data based on a data blocking algorithm to obtain a plurality of data blocks;
a matching module for matching an initial input value for the target data based on the security level of the target data to determine a padding value, wherein the initial input value is positively correlated with the security level of the target data;
the calculation module is used for sequentially performing data absorption and extrusion processing on the plurality of data blocks based on the filling values, and calculating to obtain a data fingerprint of each data block corresponding to the target data;
the processing module is used for adding the data fingerprint in the logic view if the data fingerprint is in the fingerprint index table; otherwise, adding the fingerprint index of the corresponding data block in the fingerprint index table.
9. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a data storage method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that,
the computer-readable storage medium comprises instructions which, when executed on a computer, cause the computer to carry out the data storage method of any one of claims 1 to 7.
CN201911391011.6A 2019-12-30 2019-12-30 Data storage method and device and electronic equipment Active CN113126879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911391011.6A CN113126879B (en) 2019-12-30 2019-12-30 Data storage method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911391011.6A CN113126879B (en) 2019-12-30 2019-12-30 Data storage method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113126879A true CN113126879A (en) 2021-07-16
CN113126879B CN113126879B (en) 2022-11-29

Family

ID=76767545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911391011.6A Active CN113126879B (en) 2019-12-30 2019-12-30 Data storage method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113126879B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356212A (en) * 2021-11-23 2022-04-15 阿里巴巴(中国)有限公司 Data processing method, system and computer readable storage medium
WO2023108360A1 (en) * 2021-12-13 2023-06-22 华为技术有限公司 Method and apparatus for managing data in storage system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104871155A (en) * 2012-10-01 2015-08-26 西部数据技术公司 Optimizing data block size for deduplication
EP3076584A1 (en) * 2015-03-31 2016-10-05 Université De Reims Champagne-Ardenne Hashed data retrieval method
CN109844750A (en) * 2016-09-30 2019-06-04 国际商业机器公司 Padding state determines
CN109918018A (en) * 2017-12-13 2019-06-21 华为技术有限公司 A kind of date storage method and storage equipment

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
MUTOUREND: "《strobe——面向IoT物联网应用的密码学协议框架》", 《CSDN》 *
PANASAYYA YALLA ET AL.: "《Comparison of multi-purpose cores of Keccak and AES》", 《DATE '15: PROCEEDINGS OF THE 2015 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, MARCH 2015》 *
吴武飞等: "可重构Keccak算法设计及FPGA实现", 《计算机应用》 *
吴涛等: "一种基于变参级联混沌的Hash函数算法", 《计算机研究与发展》 *
李梦东,杜飞: "《SHA-3第三轮候选算法简评》", 《北京电子科技学院学报》 *
赵太飞; 尹航; 李永明: "《基于Sponge结构的轻量级Hash函数设计》", 《小型微型计算机系统》 *
钱凯: "《云存储中快速安全的数据去重方法》", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *

Also Published As

Publication number Publication date
CN113126879B (en) 2022-11-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant