CN112394873A - Data management method, system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112394873A
Authority
CN
China
Prior art keywords
data block
file
data
inter
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910740429.7A
Other languages
Chinese (zh)
Other versions
CN112394873B (en)
Inventor
周玉坤
付忞
古亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN201910740429.7A priority Critical patent/CN112394873B/en
Publication of CN112394873A publication Critical patent/CN112394873A/en
Application granted granted Critical
Publication of CN112394873B publication Critical patent/CN112394873B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0617Improving the reliability of storage systems in relation to availability
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • G06F3/0641De-duplication techniques
    • G06F3/0643Management of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data management method, a system, an electronic device and a computer-readable storage medium. The method comprises the following steps: when a write request for a first target file is received, dividing the first target file into data blocks and generating metadata of the first target file; determining, according to the metadata, whether each data block in the first target file is an inter-file high-reference data block, where an inter-file high-reference data block is an inter-file repeated data block whose inter-file reference count is greater than or equal to a threshold; if the data block is an inter-file high-reference data block, performing redundancy management on the data block by using a copy strategy; and if the data block is not an inter-file high-reference data block, performing redundancy management on the data block by using an erasure code strategy. The data management method provided by the application thereby guarantees high data availability at low storage overhead and avoids data loss or inaccessibility caused by storage device failure.

Description

Data management method, system, electronic equipment and storage medium
Technical Field
The present application relates to the field of storage technologies, and in particular, to a data management method, a data management system, an electronic device, and a computer-readable storage medium.
Background
Data deduplication has been widely applied in backup systems, primary storage systems, virtual machines, and cloud storage systems as a system-level compression technology. Because storage systems inevitably face uncorrectable disk errors and latent sector errors, data availability is one of the important safety indicators of a storage system. Compared with a storage system that does not use it, data deduplication reduces storage overhead but inevitably compromises data availability: after deduplication, the logical layout of a file no longer matches its physical layout, and the same data block can be referenced by different files. The loss of one physical block therefore causes more serious data loss in the storage system, because every file that references the block is affected. Improving the data availability of data deduplication systems is thus a serious challenge.
At present, either an erasure code strategy or a copy strategy is usually applied to deduplicated data. Methods based on the erasure code strategy scale poorly and incur extra I/O overhead, while methods based on the copy strategy increase storage overhead.
Therefore, how to satisfy the requirements of availability and low storage overhead in a data deduplication storage system is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The present application aims to provide a data management method, system, an electronic device and a computer readable storage medium, which meet the requirements of availability and low storage overhead in a data deduplication storage system.
In order to achieve the above object, the present application provides a data management method, including:
when a write request of a first target file is received, carrying out blocking processing on the first target file, and generating metadata of the first target file;
determining whether the data blocks in the first target file are inter-file high-reference data blocks or not according to the metadata; the inter-file high-reference data blocks are inter-file repeated data blocks with inter-file reference counts larger than or equal to a threshold value;
if the data block is the inter-file high-reference data block, performing redundancy management on the data block by using a copy strategy;
and if the data block is a non-inter-file high-reference data block, performing redundancy management on the data block by using an erasure code strategy.
Wherein the determining whether the data block in the first target file is an inter-file high reference data block according to the metadata includes:
judging whether the data blocks in the first target file are repeated data blocks among files or not;
if the data block is an inter-file repeated data block, judging the data block to be an inter-file high-reference data block when its inter-file reference count is greater than or equal to the threshold, and judging it to be an inter-file low-reference data block when its inter-file reference count is less than the threshold;
and if the data block is a non-inter-file repeated data block, judging that the data block is the non-inter-file high-reference data block.
Wherein the determining whether the data block in the first target file is a duplicate data block between files includes:
judging whether the fingerprint of the data block is matched with a fingerprint sequence in the metadata of the first target file;
if the fingerprint sequence in the metadata of the first target file is matched, the data block is a repeated data block in the file;
if the fingerprint sequence is not matched with the fingerprint sequence in the metadata of the first target file, judging whether the fingerprint hits the fingerprint sequences of other files or not; if so, the data block is a repeated data block among files; if not, the data block is a non-repeated data block.
Wherein the performing redundancy management on the data block by using the copy policy includes:
when the inter-file reference count meets a preset condition, increasing the copy number of the data block. The preset condition comprises a first preset condition or a second preset condition; the first preset condition is that the inter-file reference count is equal to the threshold; the second preset condition is that the inter-file reference count, a target reference count and the current copy number satisfy a preset relationship, where the target reference count is the reference count of the data block at the time the copy number increased from the target copy number to the current copy number, and the target copy number is the current copy number minus one.
Wherein the performing redundancy management on the data block by using the erasure coding strategy includes:
adding the data block into a target container;
when the target container is filled, dividing the target container into k data objects, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
and storing the k data objects and the m check objects into k + m nodes.
Wherein the method further includes:
when a second target file reading request is received, reading an encoding object of each data block in a second target file from at least k nodes according to metadata information of erasure code stripes;
and decoding the coded object to obtain each data block in the second target file so as to respond to the reading request.
To achieve the above object, the present application provides a data management system, including:
the generating module is used for carrying out blocking processing on a first target file when a writing request of the first target file is received, and generating metadata of the first target file;
a determining module, configured to determine whether a data block in the first target file is an inter-file high-reference data block according to the metadata; the inter-file high-reference data blocks are inter-file repeated data blocks with inter-file reference counts larger than or equal to a threshold value; if yes, starting the working process of the first management module; if not, starting the working process of the second management module;
the first management module is used for performing redundancy management on the data block by using a copy policy;
and the second management module is used for performing redundancy management on the data block by using an erasure code strategy.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
a processor for implementing the steps of the data management method as described above when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the data management method as described above.
According to the scheme, the data management method provided by the application comprises the following steps: when a write request of a first target file is received, carrying out blocking processing on the first target file, and generating metadata of the first target file; determining whether the data blocks in the first target file are inter-file high-reference data blocks or not according to the metadata; the inter-file high-reference data blocks are inter-file repeated data blocks with inter-file reference counts larger than or equal to a threshold value; if so, carrying out redundancy management on the data block by using a copy strategy; and if not, performing redundancy management on the data block by using an erasure code strategy.
According to the data management method, redundancy management is performed on repeated data blocks among the files, namely high-reference data blocks among the files, of which the reference count among the files is larger than or equal to the threshold value by using the copy strategy, so that high data availability of the high-reference data blocks among the files can be ensured, redundancy management is performed on other data blocks by using the erasure code strategy, and storage overhead is reduced. Therefore, the data management method provided by the application guarantees high data availability with low storage overhead, and avoids the problem of data loss or inaccessibility caused by storage equipment failure. The application also discloses a data management system, an electronic device and a computer readable storage medium, which can also realize the technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is an architecture diagram of a data management method provided herein;
FIG. 2 is a flow chart illustrating a method of data management according to an exemplary embodiment;
FIG. 3 is a detailed flowchart of step S103 in FIG. 2;
FIG. 4 is a detailed flowchart of step S104 in FIG. 2;
FIG. 5 is a flow diagram illustrating another method of data management in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a data management system in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is an architecture diagram of the data management method provided by the present application. The architecture includes a data deduplication module, a duplicate data detection module, and a data management module. The input of the data deduplication module is file content, and its output is metadata information. The input of the duplicate data detection module is the file metadata information, and its output is a data block state. The data block states are: non-duplicate data block, intra-file duplicate data block, inter-file high-reference data block, and inter-file low-reference data block. The input of the data management module is the data block state, and its output is a data redundancy management strategy, either a copy strategy or an erasure code strategy.
The embodiment of the application discloses a data management method, which meets the requirements of availability and low storage overhead in a data deduplication storage system.
Referring to fig. 2, a flowchart of a data management method according to an exemplary embodiment is shown, as shown in fig. 2, including:
S101: when a write request of a first target file is received, carrying out blocking processing on the first target file, and generating metadata of the first target file;
The execution subject of this embodiment may be a processor of a data deduplication storage system, that is, a storage system that supports data deduplication and can thereby reduce storage overhead. The system segments input data or files into data blocks, calculates a hash value of each data block with a hash function (for example, SHA-1) as the fingerprint of the data block, and maintains a fingerprint index table in memory. The fingerprint index table records the one-to-one mapping between the fingerprint of each data block and its physical location.
In a specific implementation, when a write request of a first target file is received, the first target file is divided into data blocks, and metadata is generated, wherein the metadata mainly comprises a file name, a path, the number of the data blocks, a fingerprint sequence of all the data blocks, the length of the data blocks and the like.
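The blocking and metadata generation described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the fixed 4096-byte block size, the function name `build_metadata`, and the dictionary keys are assumptions introduced here, since the source does not specify a chunking method or a metadata layout. SHA-1 is used because the source names it as an example hash function.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; the source does not specify one


def build_metadata(path, name, data, block_size=BLOCK_SIZE):
    """Split file content into data blocks and record one fingerprint per block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # The SHA-1 digest of each block serves as its fingerprint.
    fingerprints = [hashlib.sha1(block).hexdigest() for block in blocks]
    metadata = {
        "file_name": name,
        "path": path,
        "block_count": len(blocks),
        "fingerprints": fingerprints,
        "block_lengths": [len(block) for block in blocks],
    }
    return blocks, metadata


# Hypothetical usage: an 8192-byte file yields two 4096-byte blocks.
blocks, meta = build_metadata("/data", "a.bin", b"x" * 5000 + b"y" * 3192)
```

A real deduplication system may use content-defined chunking instead of fixed-size blocks; fixed-size splitting is chosen here only to keep the sketch short.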
S102: determining whether the data blocks in the first target file are inter-file high-reference data blocks or not according to the metadata; if yes, entering S103; if not, entering S104;
An inter-file high-reference data block is an inter-file repeated data block whose inter-file reference count is greater than or equal to a threshold.
In this step, the data block state must be determined for each data block in the first target file. The states are non-duplicate data block, intra-file duplicate data block, and inter-file duplicate data block. A non-duplicate data block occurs only once in the system; an intra-file duplicate data block occurs more than once within the same file; an inter-file duplicate data block has a reference count greater than 1, that is, several different files contain it. Inter-file duplicate data blocks are further divided into inter-file high-reference data blocks and inter-file low-reference data blocks. The flow proceeds to S103 for an inter-file high-reference data block and to S104 otherwise.
The inter-file reference count of a data block indicates the number of files that reference the data block. For an input data block, if it is an inter-file repeated data block, its inter-file reference count is increased by 1. When the inter-file reference count is greater than or equal to the threshold, the data block is an inter-file high-reference data block. That is, this step may include: judging whether the data block in the first target file is an inter-file repeated data block; if so, determining the inter-file reference count of the data block; determining whether the inter-file reference count is greater than or equal to the threshold; if so, the data block is an inter-file high-reference data block; and if not, the data block is an inter-file low-reference data block.
The step of determining whether the data block in the first target file is a duplicate data block between files may include: judging whether the fingerprint of the data block is matched with a fingerprint sequence in the metadata of the first target file; if the data blocks are matched, the data blocks are repeated data blocks in the file; if not, judging whether the fingerprints hit the fingerprint sequences of other files; if so, the data block is a repeated data block among files; if not, the data block is a non-repeated data block.
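The classification order described above (first the file's own fingerprint sequence, then the fingerprints of other files, then the threshold test) can be sketched as below. The function name, the state strings, and the two lookup structures are hypothetical. The sketch also assumes that a block seen for the first time starts with an inter-file reference count of 1, which the source implies but does not state.

```python
def classify_block(fp, seen_in_file, ref_counts, threshold):
    """Return the data block state for fingerprint fp.

    seen_in_file: fingerprints already seen in the file being written.
    ref_counts:   fingerprint -> inter-file reference count (files seen so far).
    """
    if fp in seen_in_file:
        # Matches the fingerprint sequence of the file itself.
        return "intra_file_duplicate"
    seen_in_file.add(fp)
    if fp in ref_counts:
        # Hits the fingerprint sequence of another file: one more referencing file.
        ref_counts[fp] += 1
        if ref_counts[fp] >= threshold:
            return "inter_file_high_reference"
        return "inter_file_low_reference"
    ref_counts[fp] = 1  # assumption: a new block is referenced by one file
    return "non_duplicate"
```

Blocks classified as `inter_file_high_reference` would go to the copy strategy (S103); the source routes the remaining blocks to the erasure code strategy (S104).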
S103: performing redundancy management on the data block by using a copy strategy;
In this step, a copy strategy is adopted for redundancy management of inter-file high-reference data blocks, which ensures high availability of the data.
Preferably, a dynamic copy policy may be used for redundancy management, that is, this step may include: when the inter-file reference count meets a preset condition, increasing the copy number of the data block. A person skilled in the art may set the preset condition so that the number of copies of each data block is adjusted dynamically: when the inter-file reference count satisfies the preset condition, the number of copies is increased, and the increment may be set to 1.
S104: and carrying out redundancy management on the data block by utilizing an erasure code strategy.
In the step, redundancy management is performed on the low-reference data blocks between the non-repeated data blocks and the files by using an erasure code strategy, so that the storage overhead of the whole system is reduced.
According to the data management method provided by the embodiment of the application, redundancy management is performed on repeated data blocks among files, namely high-reference data blocks among files, of which the reference count among the files is greater than or equal to the threshold value by using the copy strategy, so that high data availability of the high-reference data blocks among the files can be ensured, and redundancy management is performed on other data blocks by using the erasure code strategy, so that storage overhead is reduced. Therefore, the data management method provided by the embodiment of the application guarantees high data availability with low storage overhead, and avoids the problem of data loss or inaccessibility caused by storage equipment failure.
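The storage trade-off between the two strategies can be made concrete with a short worked comparison. The figures below are illustrative assumptions, not values from the application: r = 3 copies, and an erasure code with k = 4 data objects and m = 2 check objects.

```latex
\text{copy overhead} = r = 3,
\qquad
\text{erasure code overhead} = \frac{k + m}{k} = \frac{4 + 2}{4} = 1.5
```

Under these assumed values, keeping copies stores three times the size of a block and survives the loss of r - 1 = 2 copies, while erasure coding stores 1.5 times the size and survives the loss of any m = 2 encoding objects, provided the code is maximum distance separable (for example, Reed-Solomon). This is why the method reserves copies for the blocks whose loss would affect the most files.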
Describing the flow of redundancy management using a copy policy in detail below, as shown in fig. 3, step S103 in the above embodiment may include:
S31: determining whether the inter-file reference count is equal to the threshold; if yes, go to S33; if not, go to S32;
S32: determining the current copy number of the data block, and judging whether the inter-file reference count and the current copy number meet a preset relational expression; if yes, go to S33;
wherein the preset relational expression is specifically:
$T - T_r = a \cdot r + b$;
wherein $T$ is the inter-file reference count, $r$ is the current copy number, $T_r$ is the reference count of the data block when the number of copies increased from $r-1$ to $r$, and $a$ and $b$ are preset parameters.
S33: increasing the number of copies of the data block by 1;
In this embodiment, the preset condition includes a first preset condition or a second preset condition. The first preset condition is that the inter-file reference count is equal to the threshold. The second preset condition is that the inter-file reference count is greater than the threshold and the inter-file reference count T, the target reference count T_r, and the current copy number r satisfy the preset relationship. When either condition is satisfied, the number of copies of the data block is increased by 1. The copies can be stored dispersedly on different nodes and written into the reserved space of those nodes, and the metadata information of each copy can be recorded in the copy table.
Preferably, before step S31, the method may further include: and judging whether the current copy number of the data block exceeds the maximum copy number, and if so, deleting the data block.
Therefore, for the inter-file high-reference data blocks, the number of the copies is dynamically added according to the increase of the inter-file reference count, so that the expansibility and the data availability of the system are improved.
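Steps S31 to S33 can be sketched as a small decision function plus a simulation of a growing reference count. The function names, the starting copy number of 1, and the parameter values a = 1, b = 0 used in the example are assumptions made for illustration; the source leaves a and b as preset parameters and does not state the initial copy number. The maximum-copy check mentioned above is omitted.

```python
def should_add_copy(T, threshold, r, T_r, a, b):
    """Decide whether to add one copy (steps S31 to S33).

    T:   inter-file reference count
    r:   current copy number
    T_r: reference count recorded when the copy number rose from r-1 to r
    """
    if T == threshold:  # S31: first preset condition
        return True
    # S32: second preset condition, T > threshold and T - T_r = a*r + b
    return T > threshold and (T - T_r) == a * r + b


def simulate(references, threshold, a, b, initial_copies=1):
    """Track the copy number as the reference count grows from 1 to references."""
    r, T_r = initial_copies, 0
    history = []
    for T in range(1, references + 1):
        if should_add_copy(T, threshold, r, T_r, a, b):
            r += 1   # S33: the copy number increases by 1
            T_r = T  # remember the count at which this increase happened
        history.append((T, r))
    return history


# Hypothetical parameters: threshold 3, a = 1, b = 0.
history = simulate(8, threshold=3, a=1, b=0)
```

With a positive a, each additional copy requires a larger rise in the reference count than the previous one, so the copy number grows more slowly than the reference count.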
Describing the process of redundancy management using erasure coding strategy in detail below, as shown in fig. 4, step S104 in the first embodiment may include:
S41: adding the data block into a target container;
In this embodiment, data blocks that are not inter-file high-reference data blocks, that is, non-duplicate data blocks and inter-file low-reference data blocks, are aggregated and then written into a container.
S42: when the target container is filled, dividing the target container into k data objects, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
In this step, when the container is full, the single container is divided into k data objects and m check objects are generated; the data objects and the check objects are collectively referred to as encoding objects. When a data block is read, the complete data block can be obtained by reading at least k encoding objects.
S43: and storing the k data objects and the m check objects into k + m nodes.
In this step, different encoding objects are stored into different nodes, that is, k data objects and m check objects are stored into k + m nodes, respectively.
Therefore, for the non-inter-file high-reference data blocks, the erasure code strategy is adopted by the embodiment to store the encoding objects into the multiple devices respectively, so that the storage overhead of the whole system is reduced.
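Steps S41 to S43 can be sketched as below. To stay short, the sketch swaps in a single XOR parity object (m = 1) for a general erasure code; a production system would more likely use a code such as Reed-Solomon that supports m > 1. The function names, the zero padding, and the integer node ids are assumptions introduced here.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_container(container, k):
    """Split a full container into k data objects plus one XOR check object."""
    size = -(-len(container) // k)  # ceiling division
    padded = container.ljust(size * k, b"\0")  # pad so all objects are equal length
    data_objects = [padded[i * size:(i + 1) * size] for i in range(k)]
    check_object = reduce(xor_bytes, data_objects)
    return data_objects, [check_object]


def place_on_nodes(data_objects, check_objects):
    """Assign each encoding object to its own node id (0 .. k+m-1)."""
    return dict(enumerate(data_objects + check_objects))


# Hypothetical usage: a 10-byte container split with k = 4, m = 1.
data_objects, check_objects = encode_container(b"abcdefghij", 4)
nodes = place_on_nodes(data_objects, check_objects)
```

Because each encoding object lands on a different node, the loss of any single node leaves k objects available, which is enough to rebuild the container in this m = 1 sketch.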
As shown in fig. 5, the following describes in detail a reading process in the data management method provided in the present application, and specifically includes:
S201: when a second target file reading request is received, reading an encoding object of each data block in a second target file from at least k nodes according to metadata information of erasure code stripes;
In this embodiment, when a read request for a second target file is received, the metadata of the second target file is looked up to obtain the file path, file name, and fingerprint sequence. A file is created according to the file path and file name. For each data block listed in the metadata, the encoding objects are read from at least k nodes according to the metadata information of the erasure code stripe.
Here, the metadata information of the erasure code stripe records the nodes on which the encoding objects of each data block in the second target file are stored. Because the whole container was divided into k data objects when the file was written, reading a data block requires selecting at least k of the nodes corresponding to that data block and reading from them; the complete data block can then be obtained.
S202: and decoding the coded object to obtain each data block in the second target file so as to respond to the reading request.
In this step, the encoding object is decoded to obtain a complete data block, all the data blocks are written into the new file, and the new file is returned to respond to the read request.
If any of the selected nodes has failed, the erasure code can be used to reconstruct the data of the failed node. If some data blocks cannot be reconstructed, the copy table is searched to determine whether other nodes hold copies; if so, the copy addresses are obtained and the copies are read to recover the data of the failed node.
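The read path with reconstruction can be sketched as follows, again with a single XOR check object (m = 1) standing in for a general erasure code. The node layout (ids 0 to k-1 hold data objects, id k holds the check object), the use of `None` to mark a failed node, and the function names are assumptions for illustration. The copy-table fallback is not modelled.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_container(nodes, k, length):
    """Rebuild a container from k data objects and one XOR check object.

    nodes:  node id -> encoding object, or None for a failed node.
    length: original container length, used to strip padding.
    """
    available = {i: obj for i, obj in nodes.items() if obj is not None}
    if len(available) < k:
        raise OSError("fewer than k encoding objects are readable")
    data = []
    for i in range(k):
        if i in available:
            data.append(available[i])
        else:
            # At most one object is missing, so XOR-ing every surviving
            # object (data and check) reproduces the missing data object.
            data.append(reduce(xor_bytes, available.values()))
    return b"".join(data)[:length]
```

When fewer than k objects can be read, this sketch raises an error; in the method described above, that is the point at which the copy table would be consulted.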
In the following, a data management system provided by an embodiment of the present application is introduced, and a data management system described below and a data management method described above may be referred to each other.
Referring to fig. 6, a block diagram of a data management system is shown according to an exemplary embodiment, as shown in fig. 6, including:
a generating module 601, configured to perform blocking processing on a first target file when a write request of the first target file is received, and generate metadata of the first target file;
a determining module 602, configured to determine, according to the metadata, whether a data block in the first target file is an inter-file high-reference data block; the inter-file high-reference data blocks are inter-file repeated data blocks with inter-file reference counts larger than or equal to a threshold value; if yes, starting the working process of the first management module; if not, starting the working process of the second management module;
the first management module 603 is configured to perform redundancy management on the data block by using a copy policy;
the second management module 604 is configured to perform redundancy management on the data block by using an erasure coding policy.
According to the data management system provided by the embodiment of the application, redundancy management is performed on repeated data blocks among files, namely high-reference data blocks among files, of which the reference count among the files is greater than or equal to the threshold value by using the copy strategy, so that high data availability of the high-reference data blocks among the files can be ensured, and redundancy management is performed on other data blocks by using the erasure code strategy, so that storage overhead is reduced. Therefore, the data management system provided by the embodiment of the application guarantees high data availability with low storage overhead, and avoids the problem of data loss or inaccessibility caused by storage equipment failure.
On the basis of the foregoing embodiment, as a preferred implementation manner, the determining module 602 includes:
the first judging unit is used for judging whether the data block in the first target file is an inter-file repeated data block; if yes, starting the working process of the first determination unit; if not, starting the working process of the second determination unit;
a first determination unit, configured to determine that the data block is an inter-file high-reference data block when the inter-file reference count of the data block is greater than or equal to a threshold, and determine that the data block is an inter-file low-reference data block when the inter-file reference count is less than the threshold;
and a second determination unit, configured to determine that the data block is a non-inter-file high-reference data block.
On the basis of the above embodiment, as a preferred implementation, the first judging unit includes:
a first judging subunit, configured to judge whether a fingerprint of the data block hits a fingerprint sequence in metadata of the first target file; if so, the data block is an intra-file repeated data block; if not, starting the working process of the second judgment subunit;
the second judgment subunit is used for judging whether the fingerprint hits the fingerprint sequences of other files; if so, the data block is a repeated data block among files; if not, the data block is a non-repeated data block.
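The two-stage fingerprint check performed by the judging subunits can be sketched as follows. This is only an illustration: the use of SHA-256 as the fingerprint function, the in-memory sets, and the names `fingerprint_of`, `classify_block` and `classify_file` are assumptions, not details taken from the application.

```python
import hashlib
from typing import Dict, List, Set

INTRA_FILE_REPEATED = "intra_file_repeated"
INTER_FILE_REPEATED = "inter_file_repeated"
NON_REPEATED = "non_repeated"


def fingerprint_of(block: bytes) -> str:
    """Assumed fingerprint function: SHA-256 hex digest of the block."""
    return hashlib.sha256(block).hexdigest()


def classify_block(
    fingerprint: str,
    own_fingerprints: Set[str],
    other_files: Dict[str, Set[str]],
) -> str:
    """First check the file's own fingerprint sequence, then other files."""
    if fingerprint in own_fingerprints:
        return INTRA_FILE_REPEATED
    if any(fingerprint in fps for fps in other_files.values()):
        return INTER_FILE_REPEATED
    return NON_REPEATED


def classify_file(blocks: List[bytes], other_files: Dict[str, Set[str]]) -> List[str]:
    """Classify every block of one file, growing its fingerprint set as it goes."""
    seen: Set[str] = set()
    labels = []
    for block in blocks:
        fp = fingerprint_of(block)
        labels.append(classify_block(fp, seen, other_files))
        seen.add(fp)
    return labels
```

Checking the file's own fingerprints first means a block that repeats inside the same file is never counted as an inter-file reference, which matches the order of the two subunits.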
On the basis of the foregoing embodiment, as a preferred implementation manner, the first management module 603 is specifically a module that increases the number of copies of the data block when the inter-file reference count satisfies a preset condition.
On the basis of the above embodiment, as a preferred implementation, the preset condition includes a first preset condition or a second preset condition; the first preset condition is that the inter-file reference count is equal to the threshold, the second preset condition is that the inter-file reference count, a target reference count and the current copy number satisfy a preset relationship, the target reference count is the reference count of the data block when the copy number increases from the target copy number to the current copy number, and the target copy number is the current copy number minus one.
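The copy-number rule can be sketched as below. This passage does not spell out the preset relationship between the inter-file reference count, the target reference count and the current copy number, so the `doubling_relation` used here (add a copy each time the reference count doubles since the last increase) is purely a hypothetical placeholder, as are the names `ReplicaState` and `on_new_reference`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReplicaState:
    ref_count: int = 0         # inter-file reference count of the block
    copies: int = 1            # current copy number
    target_ref_count: int = 0  # reference count recorded at the last copy increase


def doubling_relation(ref_count: int, target_ref_count: int, copies: int) -> bool:
    """HYPOTHETICAL preset relationship, not taken from the application."""
    return target_ref_count > 0 and ref_count >= 2 * target_ref_count


def on_new_reference(
    state: ReplicaState,
    threshold: int,
    relation: Callable[[int, int, int], bool] = doubling_relation,
) -> bool:
    """Register one more inter-file reference; return True if a copy was added."""
    state.ref_count += 1
    # First preset condition: the reference count has just reached the threshold.
    first = state.ref_count == threshold
    # Second preset condition: the pluggable relationship holds above the threshold.
    second = state.ref_count > threshold and relation(
        state.ref_count, state.target_ref_count, state.copies
    )
    if first or second:
        state.copies += 1
        state.target_ref_count = state.ref_count
        return True
    return False
```

Passing the relationship in as a function keeps the sketch independent of whichever concrete relationship an implementation chooses.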
On the basis of the foregoing embodiment, as a preferred implementation, the second management module 604 includes:
the adding unit is used for adding the data blocks into the target container;
the generating unit is used for dividing the target container into k data objects when the target container is filled, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
and the storage unit is used for storing the k data objects and the m check objects into k + m nodes.
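The container workflow of the second management module can be sketched as follows. For brevity the sketch fixes m = 1 and uses a single XOR check object, which is one simple erasure code; the application allows any positive integer m and does not name a specific code, so this choice, the zero padding, and the names `Container`, `split_into_data_objects` and `encode_stripe` are assumptions.

```python
from typing import List, Optional


def split_into_data_objects(payload: bytes, k: int) -> List[bytes]:
    """Divide the container payload into k equal-size data objects (zero padded)."""
    size = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(objects: List[bytes]) -> bytes:
    """Compute one check object as the byte-wise XOR of all data objects."""
    parity = bytearray(len(objects[0]))
    for obj in objects:
        for i, byte in enumerate(obj):
            parity[i] ^= byte
    return bytes(parity)


def encode_stripe(payload: bytes, k: int) -> List[bytes]:
    """Return k data objects followed by m = 1 check object, one per node."""
    data = split_into_data_objects(payload, k)
    return data + [xor_parity(data)]


class Container:
    """Accumulates data blocks until `capacity` bytes are reached."""

    def __init__(self, capacity: int, k: int) -> None:
        self.capacity = capacity
        self.k = k
        self.buffer = bytearray()

    def add(self, block: bytes) -> Optional[List[bytes]]:
        """Add a block; when the container is filled, return the encoded stripe."""
        self.buffer += block
        if len(self.buffer) < self.capacity:
            return None
        payload, self.buffer = bytes(self.buffer), bytearray()
        return encode_stripe(payload, self.k)
```

Each of the k + 1 returned objects would be written to a different node, so that the loss of any single node leaves k objects from which the stripe can be rebuilt.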
On the basis of the above embodiment, as a preferred implementation, the system further includes:
the reading module is used for reading the encoded object of each data block in the second target file from at least k nodes according to the metadata information of the erasure code stripe when a reading request of the second target file is received;
and the decoding module is used for performing a decoding operation on the encoded objects to obtain each data block in the second target file so as to respond to the reading request.
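The read path can be illustrated with a matching sketch that rebuilds a stripe from any k of its k + m objects. As an assumption for illustration it uses a single XOR check object (m = 1); the function names and the index convention (0 to k-1 for data objects, k for the check object) are hypothetical, and removal of padding via the stripe metadata is left out.

```python
from typing import Dict


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def decode_stripe(available: Dict[int, bytes], k: int) -> bytes:
    """Rebuild the container payload from the objects read from the nodes.

    `available` maps object index to content: 0..k-1 are data objects and
    index k is the XOR check object. At least k objects must be present.
    """
    if len(available) < k:
        raise ValueError("need at least k objects to decode")
    data = {i: obj for i, obj in available.items() if i < k}
    missing = [i for i in range(k) if i not in data]
    if missing:
        # Exactly one data object is missing, so the check object is present:
        # XOR of all surviving objects equals the lost data object.
        length = len(next(iter(available.values())))
        rebuilt = bytes(length)
        for obj in available.values():
            rebuilt = xor_bytes(rebuilt, obj)
        data[missing[0]] = rebuilt
    return b"".join(data[i] for i in range(k))
```

When all k data objects are readable no decoding work is needed; the check object is only consulted when a node holding a data object is unavailable.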
With regard to the system in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present application further provides an electronic device. Referring to fig. 7, which is a structure diagram of an electronic device 700 provided in an embodiment of the present application, the electronic device 700 may include a processor 11 and a memory 12. The electronic device 700 may also include one or more of a multimedia component 13, an input/output (I/O) interface 14, and a communication component 15.
The processor 11 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps in the data management method. The memory 12 is used to store various types of data to support operation of the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and so forth. The memory 12 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The multimedia component 13 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 12 or transmitted via the communication component 15. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 14 provides an interface between the processor 11 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 15 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component 15 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the data management method described above.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the above-described data management method. For example, the computer readable storage medium may be the memory 12 described above including program instructions that are executable by the processor 11 of the electronic device 700 to perform the data management method described above.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for managing data, comprising:
performing blocking processing on a first target file and generating metadata of the first target file;
determining whether the data blocks in the first target file are inter-file high-reference data blocks or not according to the metadata; the inter-file high-reference data blocks are inter-file repeated data blocks with inter-file reference counts larger than or equal to a threshold value;
if the data block is the inter-file high-reference data block, performing redundancy management on the data block by using a copy strategy;
and if the data block is a non-inter-file high-reference data block, performing redundancy management on the data block by using an erasure code strategy.
2. The data management method of claim 1, wherein the determining whether the data block in the first target file is an inter-file high reference data block according to the metadata comprises:
judging whether the data blocks in the first target file are repeated data blocks among files or not;
if the data block is an inter-file repeated data block, determining that the data block is an inter-file high-reference data block when the inter-file reference count of the data block is greater than or equal to a threshold, and determining that the data block is an inter-file low-reference data block when the inter-file reference count is less than the threshold;
and if the data block is a non-inter-file repeated data block, judging that the data block is the non-inter-file high-reference data block.
3. The data management method according to claim 2, wherein the determining whether the data block in the first target file is an inter-file repeated data block comprises:
judging whether the fingerprint of the data block hits a fingerprint sequence in the metadata of the first target file;
if the fingerprint hits the fingerprint sequence in the metadata of the first target file, the data block is an intra-file repeated data block;
if the fingerprint does not hit the fingerprint sequence in the metadata of the first target file, judging whether the fingerprint hits the fingerprint sequences of other files; if so, the data block is an inter-file repeated data block; if not, the data block is a non-repeated data block.
4. The data management method of claim 1, wherein the performing redundancy management on the data block by using the copy policy comprises:
and when the inter-file reference count meets a preset condition, increasing the copy number of the data block.
5. The data management method according to claim 4, wherein the preset condition comprises a first preset condition or a second preset condition; the first preset condition is that the inter-file reference count is equal to the threshold, the second preset condition is that the inter-file reference count, a target reference count and the current copy number satisfy a preset relationship, the target reference count is the reference count of the data block when the copy number increases from the target copy number to the current copy number, and the target copy number is the current copy number minus one.
6. The data management method of claim 1, wherein the performing redundancy management on the data block by using an erasure coding strategy comprises:
adding the data block into a target container;
when the target container is filled, dividing the target container into k data objects, and generating m check objects corresponding to the k data objects; wherein m and k are positive integers;
and storing the k data objects and the m check objects into k + m nodes.
7. The data management method according to any one of claims 1 to 6, further comprising:
when a second target file reading request is received, reading an encoding object of each data block in a second target file from at least k nodes according to metadata information of erasure code stripes;
and decoding the coded object to obtain each data block in the second target file so as to respond to the reading request.
8. A data management system, comprising:
the generating module is used for carrying out blocking processing on a first target file when a writing request of the first target file is received, and generating metadata of the first target file;
a determining module, configured to determine whether a data block in the first target file is an inter-file high-reference data block according to the metadata; the inter-file high-reference data blocks are inter-file repeated data blocks with inter-file reference counts larger than or equal to a threshold value; if yes, starting the working process of the first management module; if not, starting the working process of the second management module;
the first management module is used for performing redundancy management on the data block by using a copy policy;
and the second management module is used for performing redundancy management on the data block by using an erasure code strategy.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data management method according to any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the data management method according to any one of claims 1 to 7.
CN201910740429.7A 2019-08-12 2019-08-12 Data management method, system, electronic equipment and storage medium Active CN112394873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910740429.7A CN112394873B (en) 2019-08-12 2019-08-12 Data management method, system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910740429.7A CN112394873B (en) 2019-08-12 2019-08-12 Data management method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112394873A (en) 2021-02-23
CN112394873B CN112394873B (en) 2024-05-24

Family

ID=74602267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910740429.7A Active CN112394873B (en) 2019-08-12 2019-08-12 Data management method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112394873B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398006A (en) * 2021-12-24 2022-04-26 中国电信股份有限公司 Distributed storage mode control method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN102567218A (en) * 2010-12-17 2012-07-11 微软公司 Garbage collection and hotspots relief for a data deduplication chunk store
US8250035B1 (en) * 2008-09-30 2012-08-21 Emc Corporation Methods and apparatus for creating a branch file in a file system
CN103118133A (en) * 2013-02-28 2013-05-22 浙江大学 Mixed cloud storage method based on file access frequency
CN104917788A (en) * 2014-03-11 2015-09-16 中国移动通信集团公司 Data storage method and apparatus
CN109144417A (en) * 2018-08-16 2019-01-04 广州杰赛科技股份有限公司 A kind of cloud storage method, system and equipment
CN109522151A (en) * 2017-09-15 2019-03-26 北京京东尚科信息技术有限公司 Method and device for data redundancy storage

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
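Yuan Xuemei (袁雪梅): "Data Distribution and Hybrid Redundancy Methods in Multi-Cloud Storage" (多云存储中的数据分布及混合冗余方法), Master's Theses Electronic Journal (硕士电子期刊), pages 2 *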

Also Published As

Publication number Publication date
CN112394873B (en) 2024-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant