CN114415958A - Disk data processing method and system, storage medium and electronic equipment - Google Patents

Disk data processing method and system, storage medium and electronic equipment

Info

Publication number
CN114415958A
Authority
CN
China
Prior art keywords
data
deleted
storage file
file
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210067247.XA
Other languages
Chinese (zh)
Inventor
郝敬龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202210067247.XA priority Critical patent/CN114415958A/en
Publication of CN114415958A publication Critical patent/CN114415958A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Abstract

The present disclosure provides a disk data processing method and system, a storage medium, and an electronic device, and relates to the technical field of data storage. The method comprises: receiving a data deletion request for a disk, wherein the data deletion request comprises an identifier of the data to be deleted; determining, based on the identifier, the data to be deleted and the storage file type of the data to be deleted, wherein the storage file type is determined from the data size information of the data at the time it was written; determining a corresponding data deletion mode based on the storage file type of the data to be deleted; and deleting the data to be deleted using the data deletion mode corresponding to that storage file type. The disclosure can address the prior-art problems of high disk data redundancy and the resulting loss of disk read-write efficiency.

Description

Disk data processing method and system, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a disk data processing method, a disk data processing apparatus, a computer-readable storage medium, and an electronic device.
Background
Magnetic disks are the primary storage media for computers and can store large amounts of binary data using magnetic recording technology. Data on a disk is retained after power-off, and the disk can be erased and rewritten repeatedly.
In existing disk data processing, data uploaded by a user is usually appended to a file. When data is deleted, only the mapping relation from the user data address to the file is deleted; the data itself is not deleted. As a result, disk space is not released promptly after the user deletes data, so data redundancy is high. This in turn leads to read amplification in the subsequent Garbage Collection (GC) process and reduces the disk read-write rate.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a disk data processing method, a disk data processing system, a computer-readable storage medium, and an electronic device, so as to solve, at least to some extent, the prior-art problem that disk data redundancy is high and disk read-write efficiency suffers as a result.
According to a first aspect of the present disclosure, there is provided a disk data processing method, including:
receiving a data deletion request for a disk, wherein the data deletion request comprises a mapping relation between an identifier of the data to be deleted and the storage file location of the data to be deleted;
determining the data to be deleted and the storage file type of the data to be deleted based on the identifier of the data to be deleted, wherein the storage file type of the data to be deleted is determined from the data size information of the data at the time it was written;
determining a corresponding data deletion mode based on the storage file type of the data to be deleted;
and deleting the data to be deleted using the data deletion mode corresponding to the storage file type of the data to be deleted.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the method further includes:
determining a storage file type of a disk, wherein the storage file type comprises a dense file and a sparse file, and the storage space of the dense file is smaller than that of the sparse file;
and performing region division on the storage space of the disk based on the storage file types, wherein each region corresponds to one storage file.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the data deletion request includes a mapping relationship between an identifier of data to be deleted and a location of a storage file of the data to be deleted, and determining a corresponding data deletion mode based on a type of the storage file of the data to be deleted includes:
when the storage file type of the data to be deleted is a dense file, deleting the mapping relation between the identifier of the data to be deleted and the storage file location of the data to be deleted;
and when the storage file type of the data to be deleted is a sparse file, calling a Punch Hole interface of the file system to punch a hole in the storage file.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the method further includes:
determining the proportion of invalid data in the storage file;
and in response to a result of comparing the proportion with a first preset threshold, performing reclamation on the storage file based on the storage file type.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the performing, in response to a result of comparing the proportion with a first preset threshold, reclamation on the storage file based on the storage file type includes:
when the proportion is not less than the first preset threshold, for a dense file, reclaiming the storage file by reading the valid data in the storage file and migrating the read valid data to a new dense file;
and for a sparse file, skipping the holes in the storage file, reading the valid data, migrating the read valid data to a new sparse file, and reclaiming the storage file.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the method further includes:
receiving a data write request for a disk, wherein the data write request comprises the data to be written and data size information of the data to be written;
determining storage file type information of the data to be written based on the data size information of the data to be written;
and distributing the data to be written to corresponding storage files of the disk for data writing based on the storage file type information.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the determining, based on the size information of the data to be written, the storage file type information of the data to be written includes:
when the size information of the data to be written is smaller than a second preset threshold value, determining that the storage file type of the data to be written is a dense file;
and when the size information of the data to be written is larger than or equal to a second preset threshold, determining that the storage file type of the data to be written is a sparse file.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, before allocating the data to be written to a corresponding storage file of a disk for data writing, the method further includes:
and adding an identification header to the data to be written.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, skipping the holes in the storage file and reading the valid data includes:
determining the position of the data to be read in the storage file based on the identification header of the data written in the storage file;
reading the valid data in the storage file based on the position of the data to be read;
and performing a skip operation on the positions of the storage file other than the position of the data to be read.
According to a second aspect of the present disclosure, there is provided a disk data processing system comprising:
the receiving module is used for receiving a data deletion request for the disk, wherein the data deletion request comprises an identifier of the data to be deleted;
the type determining module is used for determining the data to be deleted and the storage file type of the data to be deleted based on the data to be deleted identifier; the type of the storage file of the data to be deleted is determined by the data size information of the data to be written in the data writing process;
the deletion mode determining module is used for determining a corresponding data deletion mode based on the storage file type of the data to be deleted;
and the data deleting module is used for deleting the data to be deleted by adopting a data deleting mode corresponding to the storage file type of the data to be deleted.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the disk data processing method according to any one of the above embodiments.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the disk data processing method of any of the above embodiments via executing the executable instructions.
Exemplary embodiments of the present disclosure may have some or all of the following benefits:
In the disk data processing method provided by the exemplary embodiments of the present disclosure, a user-triggered data deletion request for a disk may be received, where the data deletion request includes an identifier of the data to be deleted. The data to be deleted and its storage file type are determined based on that identifier; the storage file type is determined from the data size information of the data at the time it was written. A corresponding data deletion mode is then determined based on the storage file type, and the data is deleted in that mode. On the one hand, data can be written into different types of files on the disk according to its size, and when the user deletes data, different deletion modes are applied to different file types, so that part of the disk space can be released and disk data redundancy is reduced. On the other hand, because data is deleted in different modes based on its storage file type, read amplification can be avoided in the subsequent GC process, which improves disk read-write efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 is a schematic diagram illustrating an exemplary system architecture to which the disk data processing method and system of the embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of a disk data processing method according to one embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a disk zone partitioning process in one embodiment according to the present disclosure;
FIG. 4 is a schematic diagram illustrating distribution of internal data and holes of a file after a hole is punched in the file according to an embodiment of the disclosure;
FIG. 5 is a flow diagram that schematically illustrates a disk data reclamation process, in accordance with an embodiment of the present disclosure;
FIG. 6 is a flow diagram that schematically illustrates a disk data writing process, in accordance with an embodiment of the present disclosure;
FIG. 7 schematically illustrates a process flow diagram for garbage collection of a stored file according to an embodiment of the present disclosure;
FIG. 8 is a flow diagram that schematically illustrates an implementation of a disk data processing process, in accordance with an embodiment of the present disclosure;
FIG. 9 is a block diagram that schematically illustrates a disk data processing system, in accordance with an embodiment of the present disclosure;
FIG. 10 illustrates a block diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture 100 of an exemplary application environment to which the disk data processing method and system of the present disclosure may be applied. As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The disk data processing method provided by the embodiment of the present disclosure may be executed in the server 105, and accordingly, the disk data processing system is generally disposed in the server 105. The disk data processing method provided by the embodiment of the present disclosure may also be executed by the terminal devices 101, 102, and 103, and correspondingly, the disk data processing system may also be disposed in the terminal devices 101, 102, and 103.
In the disk data processing method provided by the embodiment of the present disclosure, the server 105 may be a background server of the object storage system. The background server may process the data deletion request and the data writing request of the user, and feed back the processing result to the terminal devices 101, 102, and 103.
In the existing disk data read-write process, in order to improve write throughput and reduce latency, data is usually appended to a file; when the size of the file exceeds a set threshold, the file is closed and a new file is opened. When disk data is deleted, generally only the mapping relation between the data identifier and the file position is deleted, and the data itself is not deleted, so disk data redundancy is high. When the data deleted by the user in a file reaches a certain proportion, a GC operation is performed on the file, that is, the valid data in the file is written into a new file and the old file is deleted. To solve the above problems, the present disclosure provides a disk data processing method.
The technical solution of the embodiment of the present disclosure is explained in detail below:
Referring to fig. 2, a disk data processing method according to an example embodiment of the present disclosure may include:
Step S210, receiving a data deletion request for the disk, where the data deletion request includes an identifier of data to be deleted.
In this example embodiment, a data deletion request sent by a user is received, where the data deletion request may include an identifier of data to be deleted, and the identifier of the data to be deleted is returned to the user by the server in the data writing process. In some embodiments, during the data writing process, the server may encode the written data, where the encoding may include information about the location of the file where the data is stored, and may also form a mapping relationship between the data encoding (data identifier) and the location of the file.
Step S220, determining data to be deleted and the storage file type of the data to be deleted based on the data to be deleted identifier; and the type of the storage file of the data to be deleted is determined by the data size information of the data to be written in the data writing process.
In this example embodiment, the identifier of the data to be deleted may be information returned to the user by the server in the data writing process. According to the data identifier to be deleted, the data (data to be deleted) and the storage file corresponding to the identifier can be found, and the file type of the storage file can be further determined.
The storage file type of the data to be deleted can be determined in the data writing process. During writing, data may be written into different types of storage files according to the data size information of the data to be written. For example, data larger than 500M may be written into a large file and data smaller than 500M into a small file; more file types may also be defined, which is not particularly limited in this example.
Step S230, determining a corresponding data deleting mode based on the storage file type of the data to be deleted.
In this example embodiment, the storage file types may include a large file and a small file, and may further include other file types, for example, a dense file and a sparse file, which is not particularly limited in this example. The deletion mode for the data to be deleted may be either deleting the mapping relation between the data identifier and the file storage position, or punching a hole in the file.
Step S240, deleting the data to be deleted by using a data deletion method corresponding to the storage file type of the data to be deleted.
In this exemplary embodiment, for data to be deleted in a large file (sparse file), a hole may be punched over the region of the file that holds the data, which releases the disk space of the punched portion and reduces disk data redundancy. For a small file (dense file), only the mapping relation between the data identifier and the file storage position is deleted, which avoids the large number of disk fragments that punching holes in small files would create and their effect on subsequent disk read-write efficiency.
In the disk data processing method provided by the present exemplary embodiment, a user-triggered data deletion request for a disk may be received, where the data deletion request includes an identifier of the data to be deleted. The data to be deleted and its storage file type are determined based on that identifier; the storage file type is determined from the data size information of the data at the time it was written. A corresponding data deletion mode is then determined based on the storage file type, and the data is deleted in that mode. On the one hand, data can be written into different types of files on the disk according to its size, and when the user deletes data, different deletion modes are applied to different file types, so that part of the disk space can be released and disk data redundancy is reduced. On the other hand, because data is deleted in different modes based on its storage file type, read amplification can be avoided in the subsequent GC process, which improves disk read-write efficiency.
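Steps S210 to S240 can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the disclosed implementation: the class names `StorageFile`, `Location`, and `DiskStore` are hypothetical, hole punching is simulated by recording the released range, and the sketch assumes the identifier-to-location mapping is dropped for both file types (the source only states this for dense files).

```python
from dataclasses import dataclass, field

DENSE, SPARSE = "dense", "sparse"


@dataclass
class StorageFile:
    kind: str                                   # DENSE or SPARSE
    size: int = 0                               # bytes appended so far
    holes: list = field(default_factory=list)   # simulated punched (offset, length) ranges


@dataclass
class Location:
    file: StorageFile
    offset: int
    length: int


class DiskStore:
    """Hypothetical model of the identifier -> storage file location mapping."""

    def __init__(self):
        self.index = {}

    def put(self, data_id, file, length):
        # Append-only write: the data lands at the current end of the file.
        self.index[data_id] = Location(file, file.size, length)
        file.size += length

    def delete(self, data_id):
        # S220: look up the data and its storage file via the identifier.
        # Assumption: the mapping is removed for both file types.
        loc = self.index.pop(data_id)
        # S230/S240: pick and apply the deletion mode by storage file type.
        if loc.file.kind == SPARSE:
            loc.file.holes.append((loc.offset, loc.length))  # simulated punch
            return "punch_hole"
        return "unmap_only"  # dense file: the bytes stay until GC
```

In this model a dense file keeps its bytes until garbage collection, while a sparse file records exactly which range was released.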
The various steps of the present disclosure are described in more detail below.
In some embodiments, referring to fig. 3, the method further comprises:
Step S310, determining the storage file type of the disk.
In the present exemplary embodiment, the storage file types may include a dense file (corresponding to a small file) and a sparse file (corresponding to a large file), and the storage space of the dense file is smaller than that of the sparse file.
Step S320, based on the storage file type, performing region division on the storage space of the disk, where each region corresponds to one storage file.
In this exemplary embodiment, the storage space of the disk may be divided into regions according to file types, and each region stores one file correspondingly. The number of each type of file divided may be determined according to the disk capacity and the size of data uploaded by the user, for example, the number of each type of file may be determined according to the disk capacity so that all the disk capacity can be utilized.
In some embodiments, the data deletion request includes a mapping relationship between an identifier of data to be deleted and a location of a storage file of the data to be deleted, and the determining a corresponding data deletion mode based on a type of the storage file of the data to be deleted includes:
When the storage file type of the data to be deleted is a dense file, the mapping relation between the identifier of the data to be deleted and the storage file location of the data to be deleted is deleted, and the data itself is not deleted. When the storage file type of the data to be deleted is a sparse file, a Punch Hole interface of the file system is called to punch a hole over the data to be deleted in the storage file. For example, on a Linux system, a file hole is generated at a given position by using lseek or truncate, and the storage occupied by the hole portion of the file is released directly, which reduces disk data redundancy.
In the present exemplary embodiment, the disk file system is a structure for organizing, storing, and naming files. There are many common file systems, such as FAT16, FAT32, NTFS, Minix, ext2, xiafs, HPFS, and VFAT. Different operating systems suit different file systems, and Linux supports multiple file systems. In this example, a Punch Hole interface of the file system can be called to punch a hole in the file; the hole portion of the file does not occupy disk space, which is equivalent to releasing part of the disk space inside the file. For example, the lseek function is used for hole punching, that is, the SEEK_DATA option of the lseek function is used to locate data in a storage file, and holes are punched over the invalid data regions on the disk. The punched file is shown in fig. 4, from which it can be seen that part of the disk space inside the file can be released through hole punching.
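The text names lseek and truncate. On Linux, a range inside an existing file is commonly deallocated with `fallocate(2)` using `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`. The sketch below assumes that call is what the "Punch Hole interface" refers to, which the source does not confirm. `punch_hole` and `demo` are hypothetical helper names, and the code falls back to reporting failure where the call or the file system does not support punching.

```python
import ctypes
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    """Try to deallocate [offset, offset + length) of an open file.

    Returns True if fallocate() succeeded, False if it is unavailable or
    the underlying file system does not support hole punching.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError, TypeError):
        return False  # not a Linux libc environment
    # int fallocate(int fd, int mode, off_t offset, off_t len);
    # off_t is assumed to match C long here.
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_long, ctypes.c_long]
    fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return fallocate(fd, mode, offset, length) == 0


def demo():
    """Write two 8 KiB records, punch the first, and read the file back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 8192 + b"y" * 8192)
        punched = punch_hole(fd, 0, 8192)
        size = os.fstat(fd).st_size          # KEEP_SIZE leaves this unchanged
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, size)
        return punched, size, data
    finally:
        os.close(fd)
        os.remove(path)
```

When the punch succeeds, the logical file size is unchanged and the punched range reads back as zeros, while the blocks behind it are returned to the file system.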
In some embodiments, referring to fig. 5, the method further comprises:
Step S510, determining the proportion of invalid data in the storage file.
In this example embodiment, the invalid data may include data whose mapping relation between data identifier and file location has been deleted, and may also include data that has been hole-punched. That is, invalid data is data on which the user has performed a deletion operation. For each storage file, each time a user deletes data, the proportion of invalid data relative to the total storage space (or the total stored data) of the file can be recorded, and when that proportion becomes large, a GC operation can be performed on the storage file.
Step S520, in response to a result of comparing the proportion with a first preset threshold, performing reclamation on the storage file based on the storage file type.
In the present exemplary embodiment, the first preset threshold may be set according to user requirements, for example, to 80%. When the proportion is smaller than the first preset threshold, no GC processing is performed. When the proportion is greater than or equal to the first preset threshold, GC processing is performed on the file. Specifically, for a dense file, the storage file is reclaimed by reading the valid data in the storage file and migrating the read valid data to a new dense file. For a sparse file, the holes in the storage file are skipped, the valid data is read and migrated to a new sparse file, and the storage file is reclaimed.
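The threshold check and the migration of valid data can be sketched as follows. This is a minimal illustration that uses the 80% example value; the function names and the record representation (a list of `(data_id, payload)` pairs where `None` marks invalid data) are assumptions made for the example.

```python
GC_THRESHOLD = 0.8  # first preset threshold, 80% as in the example above


def invalid_proportion(invalid_bytes, total_bytes):
    """Proportion of a storage file occupied by invalid (deleted) data."""
    return invalid_bytes / total_bytes if total_bytes else 0.0


def should_collect(invalid_bytes, total_bytes, threshold=GC_THRESHOLD):
    """GC runs only when the proportion is not less than the threshold."""
    return invalid_proportion(invalid_bytes, total_bytes) >= threshold


def migrate_valid(records):
    """Return the contents of the new file: only records that are still valid.

    `records` is a list of (data_id, payload) pairs in file order, where a
    payload of None stands for unmapped or hole-punched data.
    """
    return [(data_id, payload) for data_id, payload in records if payload is not None]
```

After migration the old file can be deleted as a whole, which is the point at which the space of a dense file is actually released.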
In some embodiments, referring to fig. 6, the method further comprises:
Step S610, receiving a data write request for the disk.
In the present exemplary embodiment, the server receives a data write request sent by a user. The data write request may contain data to be written and may further include data size information of the data to be written.
Step S620, determining the storage file type information of the data to be written based on the data size information of the data to be written.
In the present exemplary embodiment, data may be stored in different types of files according to data size information of data to be written. Specifically, when the size information of the data to be written is smaller than a second preset threshold, it is determined that the storage file type of the data to be written is a dense file. And when the size information of the data to be written is larger than or equal to a second preset threshold, determining that the storage file type of the data to be written is a sparse file. In this example, the second preset threshold may be set according to the size of the disk storage space and the user requirement, and may be set to 500M or 300M, for example.
Step S630, based on the storage file type information, allocating the data to be written to a corresponding storage file of a disk for data writing.
In this example embodiment, the data to be written may be allocated to the storage file corresponding to the disk according to the storage file type information, and data writing may be performed.
In this embodiment, data is written into different types of storage files according to its size during the data writing process, so that when the user subsequently deletes data, a different deletion mode can be applied to each type of file, which addresses the problem of high disk data redundancy.
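Steps S610 to S630 can be sketched as a small routing function. The sketch below is illustrative only: `Disk`, `storage_file_type` and the in-memory `bytearray` files are hypothetical, and the "500M" example value is read here as 500 MiB, which is an assumption.

```python
SECOND_THRESHOLD = 500 * 1024 * 1024  # "500M" from the text, read as 500 MiB (assumption)


def storage_file_type(size: int, threshold: int = SECOND_THRESHOLD) -> str:
    """Step S620: below the threshold -> dense file, otherwise -> sparse file."""
    return "dense" if size < threshold else "sparse"


class Disk:
    """Toy disk with one in-memory storage file per type (hypothetical model)."""

    def __init__(self, threshold: int = SECOND_THRESHOLD):
        self.threshold = threshold
        self.files = {"dense": bytearray(), "sparse": bytearray()}
        self.meta = {}  # data id -> (file type, offset, length)

    def write(self, data_id: str, data: bytes):
        """Steps S610 and S630: route the data by size and append it."""
        kind = storage_file_type(len(data), self.threshold)
        target = self.files[kind]
        offset = len(target)
        target.extend(data)
        self.meta[data_id] = (kind, offset, len(data))
        return self.meta[data_id]
```

The returned tuple plays the role of the mapping between a data identifier and its storage file location; a small threshold such as `Disk(threshold=8)` makes the routing easy to try out.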
In some embodiments, referring to fig. 7, the method comprises:
Step S710, adding an identification header to the data to be written.
In this exemplary embodiment, an identification header may be added to the batch of data to be written in each write request of the user, to indicate that valid data is stored on that segment of storage space on the disk. For example, the identification header "00" may be prepended to the batch of data. The identification headers of different batches of written data may be the same or different, and this example is not particularly limited in this respect.
Step S720, writing the data with the identification header into the storage file of the corresponding type on the disk.
In this exemplary embodiment, adding the identification header does not change the storage file type determined for the data; after the identification header is added, the data is written into the storage file of the corresponding type on the disk.
Step S730, determining the position of the data to be read on the storage file based on the identification header of the data written in the storage file.
In the present exemplary embodiment, when GC processing is performed on a file, the data in hole regions does not need to be read for a sparse file that has holes. Specifically, the positions of valid data in the file are determined by recognizing the identification headers, so that reading of the hole regions is avoided.
Step S740, reading the valid data on the storage file based on the position of the data to be read.
In the exemplary embodiment, valid data on a storage file can be read based on the position of data to be read, so that the problem of read amplification in the GC process is avoided.
Step S750, performing a jump operation on the positions of the storage file other than the position of the data to be read.
In this exemplary embodiment, the jump operation may be performed directly on the invalid data and the hole areas in the storage file; that is, the read position jumps directly over the length of the hole area (or invalid data), for example 12k, and the next valid data is read. The jump operation may be implemented by a skip instruction of the file system.
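Steps S710 to S750 can be illustrated with a header-driven scan over an in-memory file image. Only the header value "00" comes from the text; the record layout below (a length field after the header, padding to a fixed block size, zero-filled holes) and the names `frame` and `read_valid` are assumptions made for the sketch.

```python
import struct

MAGIC = b"00"                    # identification header value used as the example in the text
BLOCK = 4096                     # assumed hole/alignment granularity (hypothetical)
HEADER = struct.Struct(">2sI")   # header + payload length; the length field is an assumption


def frame(payload: bytes) -> bytes:
    """Step S710: prefix the payload with the header, then pad to a block boundary."""
    record = HEADER.pack(MAGIC, len(payload)) + payload
    return record + b"\x00" * ((-len(record)) % BLOCK)


def read_valid(image: bytes):
    """Steps S730-S750: yield (offset, payload) per record and jump over the rest."""
    pos = 0
    while pos + HEADER.size <= len(image):
        magic, length = HEADER.unpack_from(image, pos)
        if magic == MAGIC:
            start = pos + HEADER.size
            yield pos, image[start:start + length]
            consumed = HEADER.size + length
            pos += consumed + (-consumed) % BLOCK   # move to the next block boundary
        else:
            pos += BLOCK                            # no header here: jump over the block
```

On a real sparse file the jump could instead rely on the file system reporting where data resumes; the fixed-size block jump here only stands in for that behavior.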
For example, as shown in FIG. 8, the disk data processing procedure can be realized by steps S801-S809.
In step S801, a data write request to a disk is received. In this example, the data write request contains data to be written and data size information of the data to be written.
Step S802, determining the storage file type information of the data to be written based on the data size information of the data to be written. In this example, the storage file types may be divided into dense files and sparse files, where the storage space of a dense file is smaller than that of a sparse file. For example, reserved space may be provided in the sparse file, so that the size of the data stored in it is smaller than its storage space.
Step S803, adding an identification header to the data to be written. In this example, the identification header is a data identification header; that is, the same identification header can be added to different data to be written, which facilitates data positioning in the subsequent data reading process.
Step S804, based on the storage file type information, allocating the data with the identification header to a corresponding storage file of the disk for data writing. In this example, the data in the data write requests of different batches may be written into the same storage file, and are randomly allocated according to the size of the data to be written.
In step S805, a data deletion request for the disk is received. In this example, the data deletion request includes a mapping relationship, i.e., meta, between the identifier of the data to be deleted and the location of the data storage file to be deleted, and this information is returned to the user by the server when the data is written.
Step S806, determining the data to be deleted and the storage file type of the data to be deleted based on the identifier of the data to be deleted. In this example, the data to be deleted and the storage file of the data to be deleted may be found according to the identifier of the data to be deleted, and then the storage file type of the data to be deleted may be determined.
Step S807, deleting the data to be deleted in a corresponding manner based on the storage file type of the data to be deleted. In this example, for a dense file, the mapping relationship between the identifier of the data to be deleted and the storage file location may be deleted. For a sparse file, the Punch Hole interface of the file system is used to punch a hole in the file over the data to be deleted, which releases the disk space and reduces the redundancy of the disk data.
Step S808, judging whether the ratio of the invalid data on the storage file is greater than or equal to a first preset threshold value; if so, go to step S809, otherwise go to step S801 or step S805.
Step S809, for a dense file, performing resource recovery on the storage file by reading the valid data on the storage file and migrating the read valid data to a new dense file. For a sparse file, skipping the holes in the storage file, reading the valid data, migrating the read valid data to a new sparse file, and recovering the resources of the storage file.
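The type-dependent deletion of steps S805 to S807 can be sketched as a single dispatch function. The sketch is hedged in several ways: `delete_data` and the `Meta` layout are hypothetical, the hole punching is passed in as a callable rather than performed, and removing the mapping for sparse files as well is an assumption, since the text only states this for dense files. On Linux such a callable could wrap `fallocate(2)` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`; the text itself only refers to a Punch Hole interface of the file system.

```python
from typing import Callable, Dict, Tuple

# data id -> (file type, file name, offset, length); stands in for the "meta" mapping
Meta = Dict[str, Tuple[str, str, int, int]]


def delete_data(data_id: str, meta: Meta,
                punch_hole: Callable[[str, int, int], None]) -> str:
    """Delete one item in the mode that matches its storage file type."""
    kind, name, offset, length = meta[data_id]   # step S806: locate data and file type
    if kind == "sparse":
        punch_hole(name, offset, length)         # step S807, sparse: release the space
        mode = "punch_hole"
    else:
        mode = "unmap"                           # step S807, dense: bytes stay until GC
    del meta[data_id]                            # dropped for both types (assumption)
    return mode
```

Injecting `punch_hole` keeps the sketch independent of any particular file system and makes the dispatch easy to exercise with a recording stub.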
Before the disk data is deleted, the method also includes a disk data writing process, in which the disk files are divided into dense files and sparse files and data is written into files of different types based on the size of the data to be written. When a user then triggers a data deletion request for the disk, the data to be deleted can be deleted in the mode corresponding to its storage file type. On the one hand, data is written into different types of files on the disk according to the size of the data to be written, and when the user deletes data, a different deletion mode is applied to each type of file, so that part of the disk space can be released and the redundancy of the disk data is reduced. On the other hand, because the data is deleted in different modes based on its storage file type, read amplification can be avoided in the subsequent GC process and disk read/write efficiency is improved. In addition, the serious disk fragmentation that would be caused by punching holes in dense files (small files) is avoided.
Further, in the present exemplary embodiment, a magnetic disk data processing system 900 is also provided. The disk data processing system 900 may be applied to a server of a storage system. Referring to FIG. 9, the disk data processing system 900 may include:
a receiving module 910, configured to receive a data deletion request for a disk, where the data deletion request includes a mapping relationship between an identifier of data to be deleted and a location of a data storage file to be deleted;
a type determining module 920, configured to determine, based on the identifier of the data to be deleted, the data to be deleted and a storage file type of the data to be deleted; the type of the storage file of the data to be deleted is determined by the data size information of the data to be written in the data writing process;
a deletion mode determining module 930, configured to determine a corresponding data deletion mode based on the storage file type of the data to be deleted;
the data deleting module 940 may be configured to delete the data to be deleted in a data deleting manner corresponding to the storage file type of the data to be deleted.
In an exemplary embodiment of the present disclosure, the disk data processing system 900 further includes:
the disk file type determining module can be used for determining the storage file type of a disk, wherein the storage file type comprises a dense file and a sparse file, and the storage space of the dense file is smaller than that of the sparse file;
the disk region dividing module may be configured to perform region division on the storage space of the disk based on the storage file type, where each region corresponds to one storage file.
In an exemplary embodiment of the present disclosure, the data deletion request includes a mapping relationship between an identifier of data to be deleted and a location of a storage file of the data to be deleted, and the deletion mode determining module 930 may be further configured to:
when the storage file type of the data to be deleted is a dense file, deleting the mapping relation between the data identifier to be deleted and the storage file position of the data to be deleted;
and when the type of the storage file of the data to be deleted is a sparse file, calling a Punch Hole interface of the file system to perform punching processing on the storage file.
In an exemplary embodiment of the present disclosure, the disk data processing system 900 further includes:
a proportion determining module, which may be configured to determine the proportion of invalid data on the storage file;
a recovery processing module, which may be configured to perform recovery processing on the storage file based on the storage file type, in response to the result of comparing the proportion with a first preset threshold.
In an exemplary embodiment of the disclosure, the recovery processing module may be further configured to:
when the proportion is not less than the first preset threshold, for a dense file, perform resource recovery on the storage file by reading the valid data on the storage file and migrating the read valid data to a new dense file;
and for a sparse file, skip the holes in the storage file, read the valid data, migrate the read valid data to a new sparse file, and recover the resources of the storage file.
In an exemplary embodiment of the present disclosure, the disk data processing system 900 further includes:
the write request receiving module receives a data write request aiming at a disk; the data writing request comprises data to be written and data size information of the data to be written;
the writing type determining module is used for determining the storage file type information of the data to be written based on the data size information of the data to be written;
and the data writing module is used for distributing the data to be written to the corresponding storage file of the disk for data writing based on the storage file type information.
In an exemplary embodiment of the disclosure, the write type determination module may be further configured to:
when the size information of the data to be written is smaller than a second preset threshold value, determining that the storage file type of the data to be written is a dense file;
and when the size information of the data to be written is larger than or equal to a second preset threshold, determining that the storage file type of the data to be written is a sparse file.
In an exemplary embodiment of the present disclosure, the disk data processing system 900 further includes:
an identification header adding module, which may be configured to add an identification header to the data to be written.
In an exemplary embodiment of the disclosure, the recovery processing module may be further configured to:
determine the position of the data to be read on the storage file based on the identification header of the data written in the storage file; read the valid data on the storage file based on the position of the data to be read; and perform a jump operation on the positions of the storage file other than the position of the data to be read.
The specific details of each module or unit in the above-mentioned disk data processing system have been described in detail in the corresponding disk data processing method, and therefore are not described herein again.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments. For example, the electronic device may implement the steps shown in FIGS. 2 to 8.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
An electronic device 1000 according to an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 connecting different system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.
The storage unit 1020 stores program code that is executable by the processing unit 1010 to cause the processing unit 1010 to perform steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The storage unit 1020 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 10201 and/or a cache memory unit 10202, and may further include a read-only memory unit (ROM) 10203.
The storage unit 1020 may also include a program/utility 10204 having a set (at least one) of program modules 10205, such program modules 10205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc., are all considered part of this disclosure.
It should be understood that the disclosure disclosed and defined in this specification extends to all alternative combinations of two or more of the individual features mentioned or evident from the text and/or drawings. All of these different combinations constitute various alternative aspects of the present disclosure. The embodiments of this specification illustrate the best mode known for carrying out the disclosure and will enable those skilled in the art to utilize the disclosure.

Claims (12)

1. A disk data processing method, comprising:
receiving a data deletion request for a disk, wherein the data deletion request comprises an identifier of data to be deleted;
determining the data to be deleted and the storage file type of the data to be deleted based on the identifier of the data to be deleted; wherein the storage file type of the data to be deleted is determined by the data size information of the data to be written during the data writing process;
determining a corresponding data deleting mode based on the type of the storage file of the data to be deleted;
and deleting the data to be deleted by adopting a data deleting mode corresponding to the type of the storage file of the data to be deleted.
2. The method of claim 1, further comprising:
determining a storage file type of a disk, wherein the storage file type comprises a dense file and a sparse file, and the storage space of the dense file is smaller than that of the sparse file;
and performing region division on the storage space of the disk based on the storage file types, wherein each region corresponds to one storage file.
3. The magnetic disk data processing method according to claim 2, wherein the data deletion request includes a mapping relationship between an identifier of data to be deleted and a location of a storage file of the data to be deleted, and the determining a corresponding data deletion mode based on the type of the storage file of the data to be deleted includes:
when the storage file type of the data to be deleted is a dense file, deleting the mapping relation between the data identifier to be deleted and the storage file position of the data to be deleted;
and when the type of the storage file of the data to be deleted is a sparse file, calling a Punch Hole interface of the file system to perform punching processing on the storage file.
4. The method of claim 1, further comprising:
determining the proportion of invalid data on the storage file;
and in response to the result of comparing the proportion with a first preset threshold, performing recovery processing on the storage file based on the storage file type.
5. The magnetic disk data processing method according to claim 4, wherein the performing recovery processing on the storage file based on the storage file type in response to the result of comparing the proportion with the first preset threshold comprises:
when the proportion is not less than the first preset threshold, for a dense file, performing resource recovery on the storage file by reading the valid data on the storage file and migrating the read valid data to a new dense file;
and for the sparse file, skipping the holes in the storage file, reading the effective data, transferring the read effective data to a new sparse file, and recovering the resources of the storage file.
6. The method of claim 5, further comprising:
receiving a data write request aiming at a disk; the data writing request comprises data to be written and data size information of the data to be written;
determining storage file type information of the data to be written based on the data size information of the data to be written;
and distributing the data to be written to corresponding storage files of the disk for data writing based on the storage file type information.
7. The magnetic disk data processing method according to claim 6, wherein said determining storage file type information of the data to be written based on the size information of the data to be written comprises:
when the size information of the data to be written is smaller than a second preset threshold value, determining that the storage file type of the data to be written is a dense file;
and when the size information of the data to be written is larger than or equal to a second preset threshold, determining that the storage file type of the data to be written is a sparse file.
8. The method according to claim 6, wherein before allocating the data to be written to the corresponding storage file of the disk for data writing, the method further comprises:
adding an identification header to the data to be written.
9. The method of claim 8, wherein skipping holes in the storage file and reading valid data comprises:
determining the position of the data to be read on the storage file based on the identification header of the data written in the storage file;
reading effective data on the storage file based on the position of the data to be read;
and executing jump operation on the positions of the storage file except the position of the data to be read.
10. A disk data processing system, comprising:
a receiving module, configured to receive a data deletion request for a disk, wherein the data deletion request comprises an identifier of data to be deleted;
a type determining module, configured to determine the data to be deleted and the storage file type of the data to be deleted based on the identifier of the data to be deleted; wherein the storage file type of the data to be deleted is determined by the data size information of the data to be written during the data writing process;
the deletion mode determining module is used for determining a corresponding data deletion mode based on the storage file type of the data to be deleted;
and the data deleting module is used for deleting the data to be deleted by adopting a data deleting mode corresponding to the storage file type of the data to be deleted.
11. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-9.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any one of claims 1-9.
CN202210067247.XA 2022-01-20 2022-01-20 Disk data processing method and system, storage medium and electronic equipment Pending CN114415958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210067247.XA CN114415958A (en) 2022-01-20 2022-01-20 Disk data processing method and system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114415958A true CN114415958A (en) 2022-04-29

Family

ID=81275471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210067247.XA Pending CN114415958A (en) 2022-01-20 2022-01-20 Disk data processing method and system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114415958A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912698A (en) * 2016-04-25 2016-08-31 乐视控股(北京)有限公司 Deletion method and system of data file in disk
CN109739933A (en) * 2019-01-02 2019-05-10 郑州云海信息技术有限公司 Memory space method for releasing, device, terminal and computer readable storage medium
CN110531929A (en) * 2019-08-09 2019-12-03 济南浪潮数据技术有限公司 The small documents processing method and processing device of storage system
US10503697B1 (en) * 2016-06-30 2019-12-10 EMC IP Holding Company LLC Small file storage system
CN111104063A (en) * 2019-12-06 2020-05-05 浪潮电子信息产业股份有限公司 Data storage method and device, electronic equipment and storage medium
CN111813342A (en) * 2020-07-14 2020-10-23 济南浪潮数据技术有限公司 Data recovery method, device, equipment and computer readable storage medium
CN112783420A (en) * 2019-11-06 2021-05-11 阿里巴巴集团控股有限公司 Data deleting and garbage recycling method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination