CN114281238A

CN114281238A - Data storage method and device

Info

Publication number: CN114281238A
Application number: CN202011034068.3A
Authority: CN
Inventors: 李航
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-09-27
Filing date: 2020-09-27
Publication date: 2022-04-05

Abstract

Embodiments of the present application provide a data storage method and device. The method comprises: acquiring a file to be stored and attribute information of the file to be stored; acquiring, according to the attribute information, a first label corresponding to the file to be stored; and storing the file to be stored into a first storage interval according to the first label, where the first label matches a second label carried by a file in the first storage interval. With the embodiments of the present application, write amplification can be avoided during data storage, which improves the performance and service life of the memory.

Description

Data storage method and device

Technical Field

The present application relates to the field of data storage technologies, and in particular, to a data storage method and device.

Background

With the continuous upgrading of the digital economy and the successive adoption of industrial internet and 5G technologies, the required scale of data storage keeps growing, and the demand for high-throughput, low-latency storage is increasingly urgent. Existing all-flash distributed storage architectures replace mechanical hard disks with solid state disks on top of the original distributed storage, and use the high throughput and low latency of solid state disks to improve the performance of the whole cluster.

Before writing data, a solid state disk must erase data in units of blocks and then write the new data into the erased block. If the block to be erased contains both valid data and invalid data (valid data may still be accessed, whereas invalid data will not be accessed again; that is, the data in the block have inconsistent access probabilities), the valid data must be migrated elsewhere before the block is erased. This causes write amplification, which in turn degrades the performance and shortens the service life of the solid state disk.

In summary, how to avoid write amplification in the data storage process to improve the performance and the service life of the solid state disk is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The embodiment of the application discloses a data storage method and data storage equipment, which can avoid write amplification in the data storage process so as to improve the performance and the service life of a memory.

In a first aspect, an embodiment of the present application discloses a data storage method, including:

acquiring a file to be stored and attribute information of the file to be stored;

acquiring a first label corresponding to the file to be stored according to the attribute information;

and storing the file to be stored into a first storage interval according to the first label, wherein the first label matches a second label carried by a file in the first storage interval.

The first label is used to indicate the probability that the file to be stored is accessed. That the first label matches the second label carried by a file in the first storage interval means that the similarity between the first label and the second label is greater than or equal to a first threshold.

The attribute information of a file is used to indicate the probability of the file being accessed. In the present application, files are identified by tags: files whose attribute information matches also have matching tags, and files whose attribute information matches are files with the same or similar access probability. The tags can therefore indicate the access probability of the files, so that files can be grouped in a more concentrated and more accurate way.

Then, files with matching tags, that is, files with the same or similar access probability, are stored in the same storage interval or in adjacent storage intervals, which reduces write amplification. Specifically, if the access probabilities of the data in one storage interval are inconsistent, part of the data becomes invalid while the rest remains valid. If new data then needs to be written to that storage interval, the valid data must first be moved out and the storage interval erased before the new data can be written, which causes write amplification. In the present application, data with the same or similar access probability are written into the same storage interval from the outset. This reduces write amplification in subsequent writes and thus improves memory performance, and it reduces the number of erase operations on the storage interval and thus prolongs service life. In addition, because a file is stored in the same storage interval or in adjacent storage intervals, the efficiency of reading the file is also improved.

In a specific embodiment, after the first tag of the file to be stored is obtained, the first tag may be compared with tags of data in each storage interval, and if the tag of the data in a certain storage interval matches the first tag and there is a remaining storage space in the certain storage interval, the file to be stored may be stored in the certain storage interval.

Specifically, each storage interval may correspond to a work thread for executing a write task, and storing the file to be stored in the certain storage interval may specifically be that the data storage device adds the file to be stored to a queue of the work thread of the certain storage interval, and stores the file to be stored in the certain storage interval through the work thread.
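The per-interval work thread and its queue can be pictured with the following minimal sketch. It is only an illustration under stated assumptions: the class name `IntervalWorker`, its methods, and the use of an in-memory list in place of a real storage interval are hypothetical and do not come from the application.

```python
import queue
import threading
from typing import List, Optional


class IntervalWorker:
    """Hypothetical work thread that serves the write queue of one storage interval."""

    def __init__(self, interval_id: int) -> None:
        self.interval_id = interval_id
        self.stored: List[str] = []          # stands in for the storage interval itself
        self._tasks: "queue.Queue[Optional[str]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, file_name: str) -> None:
        # Add the file to be stored to the queue of this interval's work thread.
        self._tasks.put(file_name)

    def _run(self) -> None:
        while True:
            item = self._tasks.get()
            if item is None:                 # sentinel: no more write tasks
                break
            self.stored.append(item)         # the "write" into the interval

    def close(self) -> None:
        # Ask the thread to stop after draining the queue, then wait for it.
        self._tasks.put(None)
        self._thread.join()
```

Because a single thread drains each queue, writes to one interval are applied in the order in which they were submitted.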

If the tag of the data stored in a certain storage interval is matched with the first tag, but no storage space is left in the certain storage interval, the file to be stored can be stored in the storage space which is closest to the certain storage interval and in which the data is not stored. Specifically, a new worker thread may be created to store the file to be stored in the storage space, which is closest to the certain storage interval and in which data is not stored yet.

In another possible implementation, after the first tag of the file to be stored is obtained, the first tag may be compared with tags of data in each storage interval, and if the tags of the data in each storage interval are not matched with the first tag, the data to be stored may be stored in a storage space where no data is stored. Similarly, the data storage device may create a new worker thread for storing the file to be stored in the storage space where no data is stored.
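The three placement outcomes described above (a matching interval with remaining space, a matching interval that is full, and no matching interval) can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the `Interval` class, the `place` function, numeric tags, and the matching rule (a tag difference of at most 30) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interval:
    """Hypothetical model of one storage interval."""
    index: int                                       # adjacent indexes are neighbouring intervals
    capacity: int                                    # number of files the interval can hold
    tags: List[int] = field(default_factory=list)    # tags of the files already stored

    def is_empty(self) -> bool:
        return not self.tags

    def has_room(self) -> bool:
        return len(self.tags) < self.capacity


def tags_match(a: int, b: int, max_diff: int = 30) -> bool:
    # Assumed matching rule: numeric tags match when they differ by at most max_diff.
    return abs(a - b) <= max_diff


def place(intervals: List[Interval], tag: int) -> Optional[Interval]:
    """Choose an interval for a file carrying `tag`; return None if nothing is free."""
    matching = [iv for iv in intervals
                if iv.tags and all(tags_match(tag, t) for t in iv.tags)]
    empty = [iv for iv in intervals if iv.is_empty()]

    # Case 1: a matching interval still has remaining space.
    for iv in matching:
        if iv.has_room():
            iv.tags.append(tag)
            return iv

    if not empty:
        return None

    if matching:
        # Case 2: matching intervals are full, so take the closest empty interval.
        target = min(empty, key=lambda e: min(abs(e.index - m.index) for m in matching))
    else:
        # Case 3: nothing matches, so any empty interval will do (the first one here).
        target = empty[0]
    target.tags.append(tag)
    return target
```

In this sketch "closest" is measured by the difference between interval indexes; a real device would presumably use physical or logical adjacency.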

In a possible implementation manner, the attribute information of the first data is matched with the attribute information of the file to be stored.

In one possible embodiment, the file to be stored includes a plurality of data blocks, each of which corresponds to the first tag, and the storing the file to be stored in a first storage interval according to the first tag includes: storing the plurality of data blocks in the first storage interval according to the first tag.

Optionally, the first storage interval may include a plurality of sub-storage intervals, and the sub-storage intervals are adjacent storage intervals.

In the embodiment, the tags of the data blocks belonging to the same file are the same, so that the data blocks of the file can be stored in the same or adjacent storage spaces, thereby improving the reading efficiency of the file.

In a possible implementation manner, the attribute information of the file to be stored includes information of a logical address where the file to be stored is stored.

In this embodiment, the accessed probabilities of the files with the same or adjacent logical addresses are the same or similar, so that the files with the same or adjacent logical addresses can be stored in the same or adjacent storage space, so that the erasing times of the storage space can be reduced in the subsequent data writing process, and the write amplification can be avoided.

In a possible implementation manner, the information of the logical address includes a storage path of the file to be stored in the file system, wherein the first tag matches a tag of a file in the storage path in the same hierarchical directory as the file to be stored.

In this embodiment, the information of the logical address of the file may be represented as a storage path of the file in the file system, and in one storage path, the probability of accessing the file of the same directory hierarchy is similar, so that the file of the same directory hierarchy may be stored in the same or an adjacent storage space, so that the number of times of erasing the storage space may be reduced in the subsequent data writing process, and write amplification may be avoided.

In a possible implementation manner, the attribute information of the file to be stored includes an identifier of a user to which the file to be stored belongs, where the first tag is matched with a tag of a file carrying the identifier.

In the embodiment, the accessed probabilities of the files of the same user are similar, so that the files of the same user can be stored in the same or adjacent storage intervals, so that the erasing times of the storage space can be reduced in the subsequent data writing process, and the writing amplification is avoided.

In one possible embodiment, the attribute information of the file to be stored includes a plurality of items of sub-attribute information, and the obtaining the first tag of the file to be stored according to the attribute information includes:

respectively acquiring a number according to each item of sub-attribute information of the plurality of items of sub-attribute information;

and obtaining the first label according to the obtained plurality of numbers.

Optionally, the multiple items of sub-attribute information may be all attribute information of the file to be stored, or may be attribute information of a part of the file to be stored.

In this embodiment, the tags of the files can be configured according to a plurality of items of attribute information, and the division of the files can be more accurate by comprehensively considering each item of attribute information.
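As a rough illustration of deriving one tag from several items of sub-attribute information, the sketch below maps each sub-attribute to a number and adds the numbers together. The lookup tables, the function name `first_tag`, and the choice of addition as the combination rule are all assumptions for this example; the text above does not fix how the numbers are combined.

```python
from typing import Dict

# Hypothetical lookup tables: one number per value of each sub-attribute.
USER_NUMBERS: Dict[str, int] = {"user1": 10000, "user2": 20000}
TYPE_NUMBERS: Dict[str, int] = {"document": 100, "video": 200}


def first_tag(user: str, file_type: str) -> int:
    """Obtain a number for each sub-attribute, then combine the numbers into a tag."""
    user_number = USER_NUMBERS[user]
    type_number = TYPE_NUMBERS[file_type]
    # Assumed combination rule: add the per-attribute numbers together.
    return user_number + type_number
```

With these tables, files of the same user and type receive the same tag, and files of the same user but different types receive nearby tags.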

In one possible embodiment, the first storage section is located in a memory, and the memory includes a hot section for storing hot data and a cold section for storing cold data;

the storing the file to be stored into a first storage interval according to the first tag includes:

searching a storage interval for storing the file to be stored in the hot interval according to the first label under the condition that the file to be stored belongs to hot data; determining the first storage interval in the hot interval to be used for storing the file to be stored;

or, under the condition that the file to be stored belongs to cold data, searching a storage interval for storing the file to be stored in the cold interval according to the first tag; and determining the first storage interval in the cold interval to be used for storing the file to be stored.
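The hot and cold branches above can be sketched as a single lookup that first selects the region and then searches only inside it. The sketch assumes that each storage interval is described by the range of tag values it accepts and that the caller already knows whether the file is hot; the names `Region` and `find_interval` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: a region maps interval names to the (low, high) tag range they accept.
Region = Dict[str, Tuple[int, int]]


def find_interval(hot: Region, cold: Region, tag: int, is_hot: bool) -> Optional[str]:
    """Search only the hot region for hot data and only the cold region for cold data."""
    region = hot if is_hot else cold
    for name, (low, high) in region.items():
        if low <= tag <= high:
            return name
    return None  # no interval in the chosen region accepts this tag
```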

In a possible embodiment, the memory includes a plurality of flash memory granules, and the hot region and the cold region are configured in the plurality of flash memory granules according to a preset space configuration rule, where the preset space configuration rule is such that a ratio of the number of the hot region and the number of the cold region in each of the plurality of flash memory granules does not exceed a second threshold.

In this embodiment, the hot interval stores hot data and is therefore erased more often, whereas the cold interval stores cold data and is erased less often. Distributing the cold intervals and hot intervals evenly across the flash memory particles keeps the service lives of the particles comparable and avoids the situation in which some flash memory particles last much longer than others.
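One way to read the preset space configuration rule is as a per-particle check on the ratio of hot intervals to cold intervals. The sketch below is an interpretation under assumptions: the function name, the representation of each particle as a `(hot_count, cold_count)` pair, and the treatment of a particle without cold intervals as unbalanced are not taken from the application.

```python
from typing import List, Tuple


def layout_is_balanced(particles: List[Tuple[int, int]], second_threshold: float) -> bool:
    """Return True if hot/cold does not exceed the threshold in every particle.

    Each entry of `particles` is a (hot_count, cold_count) pair for one flash particle.
    """
    for hot_count, cold_count in particles:
        if cold_count == 0:
            # Assumption: a particle holding only hot intervals violates the rule.
            return False
        if hot_count / cold_count > second_threshold:
            return False
    return True
```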

In one possible embodiment, the first storage section is configured to store a file having a tag value within the first range, and the storing the file to be stored in the first storage section according to the first tag includes: and determining that the value of the first label is in the first range, and storing the file to be stored in a first storage interval.

In the embodiment, the range of the tag value of the file stored in each storage interval can be configured in advance, and the file to be stored can be quickly matched with the storage interval in the storage process, so that the storage efficiency is improved.
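When the tag ranges are configured in advance and do not overlap, the matching interval can be found with a binary search instead of a scan. The following sketch assumes three hypothetical ranges and a function name chosen for illustration.

```python
import bisect
from typing import List, Optional

# Hypothetical preconfigured ranges: interval i stores tags in
# [LOWER_BOUNDS[i], UPPER_BOUNDS[i]]; the lower bounds are sorted.
LOWER_BOUNDS: List[int] = [1000, 2000, 3000]
UPPER_BOUNDS: List[int] = [1999, 2999, 3999]


def interval_for_tag(tag: int) -> Optional[int]:
    """Return the index of the interval whose range contains `tag`, or None."""
    i = bisect.bisect_right(LOWER_BOUNDS, tag) - 1   # last range starting at or below tag
    if i >= 0 and tag <= UPPER_BOUNDS[i]:
        return i
    return None
```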

In a second aspect, the present application provides a data storage device comprising:

a first obtaining unit, configured to obtain a file to be stored and attribute information of the file to be stored;

the second obtaining unit is used for obtaining a first label corresponding to the file to be stored according to the attribute information;

and the storage unit is used for storing the file to be stored into a first storage interval according to the first label, wherein the first label is matched with a second label carried by the file in the first storage space.

In a possible implementation manner, the file to be stored includes a plurality of data blocks, where each data block in the plurality of data blocks corresponds to a first tag, and the storage unit is specifically configured to: and storing the plurality of data blocks to the first storage interval according to the first label.

In a possible implementation manner, the attribute information of the file to be stored includes a plurality of items of sub-attribute information, and the second obtaining unit is specifically configured to: respectively acquiring a number according to each item of sub-attribute information of the plurality of items of sub-attribute information; and obtaining the first label according to the obtained plurality of numbers.

In one possible embodiment, the first storage section is located in a memory, and the memory includes a hot section for storing hot data and a cold section for storing cold data; the storage unit is specifically configured to:

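searching, in the hot interval according to the first tag, for a storage interval for storing the file to be stored under the condition that the file to be stored belongs to hot data, and determining the first storage interval in the hot interval to be used for storing the file to be stored;

or, under the condition that the file to be stored belongs to cold data, searching, in the cold interval according to the first tag, for a storage interval for storing the file to be stored, and determining the first storage interval in the cold interval to be used for storing the file to be stored.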

In a possible embodiment, the memory includes a plurality of flash memory particles, and the hot region and the cold region are allocated in the plurality of flash memory particles according to a preset space allocation rule, where the preset space allocation rule is such that a ratio of the number of the hot region and the number of the cold region in each of the plurality of flash memory particles does not exceed a second threshold.

In one possible embodiment, the first storage section is configured to store a file with a tag value within the first range, and the storage unit is specifically configured to: determine that the value of the first tag is within the first range, and store the file to be stored in the first storage interval.

In a third aspect, the present application provides a data storage device comprising a processor and a memory, the memory storing a computer program, the processor being configured to invoke the computer program to perform the method of any of the first aspects described above.

In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the first aspects described above.

In summary, the files are identified by the tags, and the file tags of the files matched with the attribute information are also matched, so that the files can be divided more intensively and more accurately. The files matched with the labels are stored in the same storage area or the adjacent storage areas, so that the performance of the memory can be improved by avoiding write amplification in the subsequent data storage process, the erasing times of the storage areas are reduced to prolong the service life, and in addition, the reading efficiency of the files can be improved because the files are stored in the same storage area or the adjacent storage areas.

Drawings

The drawings to be used in the embodiments of the present application will be described below.

Fig. 1 is a schematic structural diagram of a storage system according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a controller included in a storage system according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a server of a distributed storage system according to an embodiment of the present application;

Fig. 5 is a schematic flowchart illustrating a data storage method according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a file storage path according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a file storage path according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a storage space according to an embodiment of the present application;

Fig. 9 is a schematic logical structure diagram of an apparatus according to an embodiment of the present application.

Detailed Description

A storage system suitable for use in embodiments of the present application will first be described.

As shown in fig. 1, the storage system in the embodiment of the present application may be a storage array. The storage array includes a storage controller 101 and a plurality of memories, where the memory may be a hard disk, the hard disk includes a Solid State Disk (SSD) or a magnetic disk, and the solid state disk may be a flash memory. As shown in fig. 2, the storage controller 101 includes a processor 201, a memory 202, and an interface 203, wherein the memory 202 stores computer programs, and the processor 201 executes the computer programs in the memory 202 to perform management and data access operations on the storage system. In addition, the processor 201 may be any one or combination of a microprocessor, an application specific integrated circuit, a Field Programmable Gate Array (FPGA), and other hardware, and is in communication with the interface 203. The interface 203 may be a Host Bus Adapter (HBA) or the like.

In one possible embodiment, the storage controller 101 may not include the memory 202, and the data stored by the storage controller 101 may be stored in a hard disk controlled by the controller 101.

As with the storage arrays described in fig. 1 and 2, the controller 101 is configured to perform the data storage method in the embodiments of the present application.

Further, the storage system according to the embodiment of the present application may also be a distributed storage system or the like. Illustratively, as shown in fig. 3, the distributed storage system includes a plurality of servers, such as server 1, server 2, server 3, ..., server n (where n is an integer greater than 1), and the servers communicate with each other via InfiniBand or an Ethernet network. In practical applications, the number of servers in the distributed storage system may be increased or decreased according to actual needs, which is not limited in the embodiments of the present application. The servers in a distributed storage system are also referred to as storage nodes.

The structure of a server of the distributed storage system is shown in fig. 4. As shown in fig. 4, each server in the distributed storage system includes a processor 401, a memory 402, an interface 403, and a plurality of storages, which may be a plurality of hard disks, for example the hard disk 1, the hard disk 2, and the hard disk 3 shown in fig. 4. The memory 402 stores a computer program, and the processor 401 executes the computer program in the memory 402 to perform corresponding operations, for example, the data storage method in the embodiment of the present application. The interface 403 may be a hardware interface, such as a Network Interface Card (NIC) or a host bus adapter, or may be a program interface module. The hard disk includes a solid state disk, a magnetic disk, or the like. In addition, the processor 401 may be a Central Processing Unit (CPU), a Field Programmable Gate Array (FPGA), or a combination of an FPGA (or other hardware) and a CPU. The memory 402 in the embodiments of the present application may provide memory for the processor 401.

Based on the above description, the technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.

Referring to fig. 5, fig. 5 shows a data storage method provided by an embodiment of the present application. The method may be applied to the storage systems shown in fig. 1 and fig. 3, and may be executed by the controller shown in fig. 1 and fig. 2, or by the server shown in fig. 3 and fig. 4 or a processor in that server. Hereinafter, the controller and the server are collectively referred to as a data storage device. The method may include, but is not limited to, the following steps:

S501, obtaining a file to be stored and attribute information of the file to be stored.

In a specific embodiment, the data storage device may receive the file to be stored and the attribute information of the file from the client, or may obtain the file to be stored and the attribute information of the file through its own file system.

The attribute information of the file may include one or more of information of a logical address of the file, an identification of a user to which the file belongs, and a type of the file. Optionally, the attribute information of the file may further include the size and creation time of the file, and the like.

The information of the logical address of the file may be represented as a storage path of the file in the file system. Illustratively, the storage path of the file may be "C:\Documents and Settings\User name\Application Data".

The identifier of the user to which the file belongs may be a default user name, or a custom user name, or the like.

The types of files may include, for example, files of various types of documents, audio, video, pictures, and various types of drawings, and the like, and the different types of files may be distinguished by a suffix name of the file.

S502, acquiring a first label corresponding to the file to be stored according to the attribute information.

In the embodiments of the present application, a tag is an identifier used to indicate whether the access probabilities of files are the same or similar. The following first describes, by way of example, how a tag is configured for a file in the embodiments of the present application. The following cases may be considered:

In the first case, the tag is configured for the file according to the logical address of the file.

In a specific embodiment, a tag may be configured for a file according to a logical address interval, for example, the logical address may be divided into a plurality of logical storage intervals, each logical storage interval may include one or more storage units with consecutive logical addresses, files stored in each logical storage interval may be classified into one class, and then a tag is configured for each class of files, where tags of different classes of files are different.

The logic of label configuration for files is that the probability of accessing the files stored in the storage units with continuous logical addresses is the same or similar, so that the labels of the files with the same or similar access probability can be configured to be the same or matched, and the concept of label matching will be described below, and will not be described in detail here.

Wherein, the accessed probabilities of the files are similar, which means that the difference between the accessed probabilities of the two files is less than or equal to a threshold, which may be, for example, 0.1, 0.2, or 0.05, etc. For example, assuming that the threshold is 0.1, the probability that one file is accessed is 0.5, the probability that another file is accessed is 0.58, and the difference between the two probabilities is 0.08, which is smaller than the threshold 0.1, so that the probabilities of being accessed of the one file and the another file are said to be similar.

For ease of understanding the tag configuration of a file by logical address, see table 1.

TABLE 1

Table 1 exemplarily lists the case of tag configuration of a file by a logical address. In table 1, LBA is an abbreviation of logical block address and indicates one logical address. Assuming that every three logical addresses constitute one logical storage interval, each logical storage interval may store one or more files. It can be seen that different tags are configured for different logical storage intervals: the tags of files stored in different logical storage intervals are different, and the tags of files stored in the same logical storage interval are the same. For example, the tags of file 1 and file 2, which belong to the first logical storage interval, are both 1000, while the tag of file 3, which belongs to the second logical storage interval, is 2000.
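The mapping from a logical address to a tag in this example can be sketched as below. The interval size of three comes from the example; the numbering of logical addresses from 0 and the step of 1000 between the tags of consecutive intervals are assumptions extrapolated from the two tag values mentioned (1000 and 2000), and the function name is hypothetical.

```python
INTERVAL_SIZE = 3   # from the example: every three logical addresses form one interval
TAG_STEP = 1000     # assumption: consecutive intervals receive tags 1000, 2000, ...


def tag_for_lba(lba: int) -> int:
    """Return the tag of the logical storage interval that contains `lba`."""
    if lba < 0:
        raise ValueError("logical block addresses are assumed to start at 0")
    interval_index = lba // INTERVAL_SIZE
    return (interval_index + 1) * TAG_STEP
```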

In a possible implementation, the information of the logical address of the file may be represented as a storage path of the file in the file system, that is, the file may be configured by a tag through the storage path of the file. For ease of understanding, the following is exemplified.

Suppose that the storage path of the file is, for example, "C:\Documents and Settings\User name\Application Data". The storage path includes four levels of storage directories: "C:", "Documents and Settings", "User name", and "Application Data". Here, "C:" is the root directory; "Documents and Settings" is a subdirectory of "C:" and also the parent directory of "User name"; "User name" is a subdirectory of "Documents and Settings" and also the parent directory of "Application Data".

The probability of the files of the same hierarchical directory being accessed is similar, so the tags of the files of the same hierarchical directory can be configured as matching tags, wherein two tags with tag similarity greater than or equal to the first threshold are matched.

In one possible embodiment, the label of the file may be a specific value, and the label similarity of two labels may be determined by the difference between the two label values, for example, the difference corresponds to a similarity within a certain range. See, for example, table 2.

TABLE 2

Difference between two tag values, C	C≤10	10<C≤20	20<C≤30
Tag similarity	90%	80%	70%

As can be seen in table 2, in the case where the difference between the two tag values is less than or equal to 10, then the tag similarity of the two tags is 90%; in the case where the difference between the two tag values is greater than 10 and less than or equal to 20, then the tag similarity of the two tags is 80%; in the case where the difference between the two tag values is greater than 20 and less than or equal to 30, then the tag similarity of the two tags is 70%. Assuming that the first threshold is 70%, then two tags match as long as the difference between their values is less than or equal to 30. It should be noted that the numerical values in table 2 are only an example and do not limit the present application.
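The lookup in table 2 and the resulting matching test can be written as a short sketch. The function names are hypothetical, and returning a similarity of 0 for differences above 30 is an assumption, since table 2 does not cover that case.

```python
def tag_similarity(tag_a: int, tag_b: int) -> float:
    """Map the difference between two tag values to a similarity, following table 2."""
    diff = abs(tag_a - tag_b)
    if diff <= 10:
        return 0.9
    if diff <= 20:
        return 0.8
    if diff <= 30:
        return 0.7
    return 0.0  # assumption: table 2 does not list differences above 30


def tags_match(tag_a: int, tag_b: int, first_threshold: float = 0.7) -> bool:
    """Two tags match when their similarity reaches the first threshold."""
    return tag_similarity(tag_a, tag_b) >= first_threshold
```

With the first threshold at 70%, `tags_match` holds exactly when the two tag values differ by at most 30, as stated above.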

In another possible embodiment, the tags of the files may be regular codes. In that case, the tag similarity of two tags may be represented by the similarity between the two codes, which may be determined by comparing whether the first m1 digits (m1 is an integer greater than 0) of the two codes are the same. Assuming that the total number of digits of a code is m2, the similarity of the codes may be m1/m2. For example, if one code is 11111 and the other code is 11211, the total number of digits is 5 and the first 2 digits of the two codes are the same, so the similarity between the two codes is 2/5 = 40%. For another example, if one code is 11111 and the other code is 11112, the total number of digits is 5 and the first 4 digits of the two codes are the same, so the similarity between the two codes is 4/5 = 80%. Assuming that the first threshold is 80%, two codes with a total of 5 digits match, that is, the two tags match, as long as their first 4 digits are the same. It should be noted that the number of digits and the first threshold are merely examples and do not limit the present application.
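The prefix comparison described above can be sketched as follows. The function name is hypothetical, and the codes are handled as strings of equal length, which is an assumption consistent with the two examples.

```python
def code_similarity(code_a: str, code_b: str) -> float:
    """Return m1/m2: the length of the common leading run divided by the code length."""
    if not code_a or len(code_a) != len(code_b):
        raise ValueError("codes are assumed to be non-empty and of equal length")
    m1 = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break            # stop at the first position where the codes differ
        m1 += 1
    return m1 / len(code_a)
```

For the two examples in the text, `code_similarity("11111", "11211")` gives 2/5 and `code_similarity("11111", "11112")` gives 4/5.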

The matching of the two tags may also be referred to as matching of attribute information of files corresponding to the two tags.

Based on the above two forms of tags, the following describes the process of configuring the tags of the file according to the storage path of the file.

First, a process of configuring a tag of a file according to a storage path of the file when the tag of the file is a specific value will be described. Illustratively, starting from the root directory of the storage path of the file, each folder and file in each layer directory is configured with a number, and each number in the same layer directory is different. And then, adding the numbers of the hierarchical directories to obtain the label of the corresponding file. For ease of understanding, reference may be made to fig. 6.

Fig. 6 illustrates a schematic diagram of files in a storage path of three directory hierarchies. Assume that the first hierarchical directory includes a folder, folder 1, configured with a number of 1000; the second hierarchical directory includes folder 2, folder 3 and file 1, which are respectively provided with numbers of 1000, 2000 and 3000; the third hierarchical directory includes file 2, file 3, file 4, and file 5, where file 2 belongs to folder 2, the configuration number is 1, and file 3, file 4, and file 5 belong to folder 3, and the configuration numbers are 1, 2, and 3, respectively.

Then, the label of file 1 is the sum of the numbers of folder 1 and file 1, i.e., 1000 + 3000 = 4000; the label of file 2 is the sum of the numbers of folder 1, folder 2, and file 2, i.e., 1000 + 1000 + 1 = 2001; the label of file 3 is the sum of the numbers of folder 1, folder 3, and file 3, i.e., 1000 + 2000 + 1 = 3001; the label of file 4 is the sum of the numbers of folder 1, folder 3, and file 4, i.e., 1000 + 2000 + 2 = 3002; and the label of file 5 is the sum of the numbers of folder 1, folder 3, and file 5, i.e., 1000 + 2000 + 3 = 3003.
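The additive scheme of fig. 6 can be reproduced with a few lines. The dictionary simply encodes the numbers given in the example; the function name `sum_tag` and the representation of a storage path as a list of names are assumptions.

```python
from typing import Dict, List

# Numbers configured for each folder and file in the fig. 6 example.
NUMBERS: Dict[str, int] = {
    "folder 1": 1000,
    "folder 2": 1000, "folder 3": 2000, "file 1": 3000,
    "file 2": 1, "file 3": 1, "file 4": 2, "file 5": 3,
}


def sum_tag(path: List[str]) -> int:
    """Add the numbers of every directory level on the path, ending with the file."""
    return sum(NUMBERS[name] for name in path)
```

Files under the same folder (file 3, file 4, and file 5) end up with tags that differ by at most 2, while files under different folders differ by about 1000.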

Then, a process of configuring the tag of the file according to the storage path of the file when the tag of the file is a regular code is introduced. Illustratively, starting from the root directory of the storage path of the file, each folder and file in each layer directory is configured with a number, and each number in the same layer directory is different. Then, the serial numbers of the hierarchical directories are sequentially arranged to be the labels of the corresponding files. For ease of understanding, reference may be made to fig. 7.

Fig. 7 illustrates a schematic diagram of files in a storage path of three directory hierarchies. Assuming that the first hierarchical directory includes one folder, folder 1, the configuration number is 1; the second hierarchical directory comprises a folder 2, a folder 3 and a file 1, wherein the numbers of the folders 2, the folders 3 and the files 1 are respectively configured as 1, 2 and 3; the third hierarchical directory includes file 2, file 3, file 4, and file 5, where file 2 belongs to folder 2, the configuration number is 1, and file 3, file 4, and file 5 belong to folder 3, and the configuration numbers are 1, 2, and 3, respectively.

Then, assuming that the numbers are arranged in order from the first-level directory to the third-level directory, the labels of file 1, file 2, file 3, file 4 and file 5 are 13, 111, 121, 122 and 123, respectively.
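The arrangement of numbers in the fig. 7 example can be sketched in the same way. The table `FIG7_NUMBERS` and the function `code_label` are hypothetical names, and representing the code as a string of concatenated digits is an assumption made for illustration.

```python
# Numbers from the fig. 7 example, keyed by the full path of each entry.
FIG7_NUMBERS = {
    ("folder1",): 1,
    ("folder1", "folder2"): 1,
    ("folder1", "folder3"): 2,
    ("folder1", "file1"): 3,
    ("folder1", "folder2", "file2"): 1,
    ("folder1", "folder3", "file3"): 1,
    ("folder1", "folder3", "file4"): 2,
    ("folder1", "folder3", "file5"): 3,
}


def code_label(path):
    """Arrange the numbers of each directory level in order, root first."""
    return "".join(
        str(FIG7_NUMBERS[path[:depth]]) for depth in range(1, len(path) + 1)
    )


print(code_label(("folder1", "file1")))             # 13
print(code_label(("folder1", "folder3", "file5")))  # 123
```

With this form, files in the same folder share a common prefix (for example, 121, 122 and 123), which is one way two labels could be compared for a match.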

In the second case, the label configuration of the file is performed through the user identification to which the file belongs.

Illustratively, different tags may be configured for different files based on different user identifications.

In a possible implementation manner, a base number may be configured for each user, and then different numbers are configured for the files of each user, where the base number plus the file number is the label of the file. For example, assume that file A and file B belong to user 1, and file C and file D belong to user 2. Base numbers 10000 and 20000 are set for user 1 and user 2, respectively, the numbers set for file A and file B are 10 and 20, respectively, and the numbers set for file C and file D are 10 and 20, respectively. Then, the labels of file A and file B are 10010 and 10020, respectively, and the labels of file C and file D are 20010 and 20020, respectively.

In another possible implementation, a base number may be configured for each user, and then different numbers may be configured for the files of each user, where the base number and the file number are arranged in sequence to form the label of the file. For example, assume that file A and file B belong to user 1, and file C and file D belong to user 2. Base numbers 1 and 2 are first set for user 1 and user 2, respectively, file A and file B have numbers 1 and 2, respectively, and file C and file D have numbers 1 and 2, respectively. Then, the labels of file A and file B are 11 and 12, respectively, and the labels of file C and file D are 21 and 22, respectively.
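Both user-based forms can be sketched with two small helpers. The function names `additive_user_label` and `code_user_label` are hypothetical, and the values are the ones used in the two examples above.

```python
def additive_user_label(user_base, file_number):
    """Specific-value form: base number of the user plus the file number."""
    return user_base + file_number


def code_user_label(user_base, file_number):
    """Code form: base number of the user followed by the file number."""
    return f"{user_base}{file_number}"


# User 1 with base 10000, file B with number 20.
print(additive_user_label(10000, 20))  # 10020
# User 2 with base 2, file C with number 1.
print(code_user_label(2, 1))           # 21
```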

And in the third case, the label configuration of the file is carried out according to the type of the file.

For example, different tags may be configured for different files according to different file types.

In a possible implementation manner, a base number may be configured for each type, and then different numbers are configured for the files of each type, where the base number plus the number of the file is the label of the file. For example, assume that there are three types, namely audio, video, and picture, where file a, file b, and file c belong to the audio type, file d and file e belong to the video type, and file f belongs to the picture type. Then, base numbers 10000, 20000, and 30000 may be set for audio, video, and picture, respectively; numbers 10, 20, and 30 may be set for file a, file b, and file c, respectively; numbers 10 and 20 may be set for file d and file e, respectively; and number 10 may be set for file f. The labels of file a, file b and file c are then 10010, 10020 and 10030, respectively, the labels of file d and file e are 20010 and 20020, respectively, and the label of file f is 30010.

In another possible implementation, a base number may be configured for each type, and then different numbers are configured for the files of each type, where the base number and the number of the file are arranged in sequence to form the label of the file. For example, assume that there are three types, namely audio, video, and picture, where file a, file b, and file c belong to the audio type, file d and file e belong to the video type, and file f belongs to the picture type. Then, base numbers 1, 2, and 3 may be set for audio, video, and picture, respectively; numbers 1, 2, and 3 may be set for file a, file b, and file c, respectively; numbers 1 and 2 may be set for file d and file e, respectively; and number 1 may be set for file f. The labels of file a, file b and file c are then 11, 12 and 13, respectively, the labels of file d and file e are 21 and 22, respectively, and the label of file f is 31.
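The type-based code form can be sketched as below. The text does not state how the per-file numbers are chosen, so numbering files of each type sequentially in arrival order is an assumption, as are the names `TYPE_BASE` and `assign_code_labels`.

```python
from collections import defaultdict

# Base numbers per file type, taken from the example above.
TYPE_BASE = {"audio": 1, "video": 2, "picture": 3}


def assign_code_labels(files):
    """Map each (name, type) pair to a label: type base followed by file number.

    Assumption: files of one type are numbered 1, 2, 3, ... in arrival order.
    """
    counters = defaultdict(int)
    labels = {}
    for name, file_type in files:
        counters[file_type] += 1
        labels[name] = f"{TYPE_BASE[file_type]}{counters[file_type]}"
    return labels


labels = assign_code_labels([
    ("a", "audio"), ("b", "audio"), ("c", "audio"),
    ("d", "video"), ("e", "video"),
    ("f", "picture"),
])
print(labels)
```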

And in the fourth case, the label of the file is configured by combining a plurality of items of attribute information of the file.

In this case, a number may be configured for each of the plurality of items of attribute information of the file to obtain a plurality of numbers, and then the tag of the file may be obtained from the plurality of numbers. The plurality of items of attribute information may include all of the attribute information of the file, or the plurality of items of attribute information may include the attribute information of the part of the file.

In a possible embodiment, the tag of the file may be a specific value. After a number is configured for each of the plurality of attributes of the file, the numbers of the plurality of attributes may be added to obtain the tag of the file. For example, assume that the plurality of attributes of the file include a storage path of the file, an identification of a user to which the file belongs, and a type of the file. Then, the tags of the file with respect to the storage path, the identifier of the user to which the file belongs, and the type of the file may be acquired according to the first, second, and third cases, respectively, and the three acquired tags may then be added to obtain a final tag of the file. For example, if the tags obtained according to the first case, the second case, and the third case are 1001, 20010, and 10020, respectively, then adding the three tags gives 1001+20010+10020=31031, which is the final tag of the file.

In another possible embodiment, the label of the file is a regular code. After a number is configured for each of the plurality of attributes of the file, the numbers of the plurality of attributes may be arranged in order to obtain the label of the file. For example, assume that the plurality of attributes of the file include a storage path of the file, an identification of a user to which the file belongs, and a type of the file. Then, the tags of the file with respect to the storage path, the identifier of the user to which the file belongs, and the type of the file may be obtained according to the first, second, and third cases, respectively, and the three obtained tags are then arranged in sequence to obtain the final tag of the file. For example, if the tags obtained according to the first case, the second case, and the third case are 11, 21, and 12, respectively, then arranging the three tags in the order of storage path, user identifier, and file type gives 112112, which is the final tag of the file.
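Combining several per-attribute tags can be sketched as follows, using the two numeric examples above. The function names `combine_additive` and `combine_code` are hypothetical, and the order of the attributes (storage path, user identifier, file type) is taken from the example.

```python
def combine_additive(tags):
    """Specific-value form: add the per-attribute tags."""
    return sum(tags)


def combine_code(tags):
    """Code form: arrange the per-attribute tags in the given order."""
    return "".join(str(tag) for tag in tags)


# Storage path, user identifier, file type.
print(combine_additive([1001, 20010, 10020]))  # 31031
print(combine_code(["11", "21", "12"]))        # 112112
```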

Based on the introduced file tag configuration method, the data storage device may obtain a tag of the file to be stored, which may be referred to as a first tag.

S503, storing the file to be stored in a first storage interval according to the first label; the first label is matched with a second label carried by a file in the first storage space.

In a specific embodiment, the first storage interval may include one or more storage intervals, and the storage interval may be a storage interval in the hard disk shown in fig. 1, fig. 2, or fig. 4. The size of each storage interval may be preconfigured. In one possible implementation, the storage intervals may all have the same size, for example, 4 megabytes, 8 megabytes, or 16 megabytes. In another possible implementation, the storage intervals may have different sizes, for example, some storage intervals may be 4 megabytes and others may be 8 megabytes. The size of a specific storage interval can be configured according to actual requirements, and this scheme does not limit the size.

The data storage device may number the respective storage sections, and illustratively, the respective storage sections may be consecutively numbered in the order of addresses.

In a possible implementation manner, after the data storage device obtains the first tag of the file to be stored, the data storage device may compare the first tag with tags of data in each storage interval, and if the tag of the data in a certain storage interval matches the first tag and there is a remaining storage space in the certain storage interval, the data storage device may store the file to be stored in the certain storage interval.

Specifically, each storage interval may correspond to a worker thread for executing write tasks. To store the file to be stored in the certain storage interval, the data storage device adds the file to be stored to the queue of the worker thread of that storage interval, and the worker thread then stores the file in that storage interval.
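The per-interval worker thread and its queue can be sketched with the Python standard library. This is a simplified illustration: the class `IntervalWorker` is a hypothetical name, appending to a list stands in for the actual write to the storage interval, and the `None` shutdown marker is an assumption of this sketch.

```python
import queue
import threading


class IntervalWorker:
    """One worker thread that serializes writes into one storage interval."""

    def __init__(self, interval_id):
        self.interval_id = interval_id
        self.stored = []            # stands in for the interval's contents
        self.tasks = queue.Queue()  # pending write tasks for this interval
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            file_name = self.tasks.get()
            if file_name is None:   # shutdown marker
                break
            self.stored.append(file_name)

    def submit(self, file_name):
        """Add a file to the queue of this interval's worker thread."""
        self.tasks.put(file_name)

    def close(self):
        """Drain the queue and stop the worker thread."""
        self.tasks.put(None)
        self.thread.join()


worker = IntervalWorker(interval_id=1)
worker.submit("file_a")
worker.submit("file_b")
worker.close()
print(worker.stored)  # ['file_a', 'file_b']
```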

If the tag of the data stored in a certain storage interval is matched with the first tag, but no storage space is left in the certain storage interval, the data storage device may store the file to be stored in the storage space which is closest to the certain storage interval and in which no data is stored.

Specifically, the data storage device may create a new worker thread for storing the file to be stored in the storage space which is closest to the certain storage interval and in which data is not stored yet.

To facilitate understanding of the storage space which is closest to the certain storage interval and in which no data is stored, see fig. 8. Fig. 8 exemplarily shows a part of the storage intervals. Assuming that storage interval 6 is the certain storage interval, then storage intervals 1, 2, 3, 5, 7, 8, 9 and 10 are adjacent to storage interval 6 and are the storage intervals closest to it. If any of these adjacent storage intervals stores no data, the data storage device may select one or more of them for storing the file to be stored. If data is stored in all of the adjacent storage intervals, the data storage device may select one or more storage intervals that store no data from storage intervals 4, 8 or 12 for storing the file to be stored.

In another possible implementation, after the data storage device obtains the first tag of the file to be stored, the first tag may be compared with the tags of the data in each storage interval, and if none of the tags of the data in the storage intervals matches the first tag, the data storage device may store the file to be stored in a storage space where no data is stored. Similarly, the data storage device may create a new worker thread for storing the file to be stored in the storage space where no data is stored.
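The three placement outcomes described above (a matching interval with room, a matching interval that is full, and no matching interval) can be sketched as one function. Several points are assumptions of this sketch rather than statements of the embodiment: tags match by exact equality, closeness is measured by the difference of interval numbers (the caller may pass another `distance`, for example a two-dimensional one for a layout such as fig. 8), and the names `Interval` and `place` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interval:
    number: int
    capacity: int
    tag: Optional[str] = None   # None means no data is stored yet
    used: int = 0


def place(intervals, tag, size, distance=lambda a, b: abs(a - b)):
    """Return the number of the interval chosen for a file, or None."""
    matching = [iv for iv in intervals if iv.tag == tag]
    empty = [iv for iv in intervals if iv.tag is None]
    # Case 1: a matching interval that still has room.
    target = next((iv for iv in matching if iv.capacity - iv.used >= size), None)
    if target is None and empty:
        if matching:
            # Case 2: matching intervals are full; take the closest empty one.
            anchor = matching[0].number
            target = min(empty, key=lambda iv: distance(anchor, iv.number))
        else:
            # Case 3: no matching interval; take any empty one.
            target = empty[0]
    if target is None:
        return None
    target.tag = tag
    target.used += size
    return target.number


intervals = [Interval(number=n, capacity=8) for n in range(1, 5)]
print(place(intervals, "11", 6))  # 1
print(place(intervals, "21", 8))  # 2
print(place(intervals, "11", 4))  # 3
print(place(intervals, "11", 2))  # 1
```

The sketch does not handle a file larger than an empty interval; a fuller version would split the file into data blocks first.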

In a possible implementation manner, the tag comparison in the process of storing the file to be stored may be performed in a mapping table, where the mapping table is used to record a mapping relationship among a file name, attribute information of the file, a tag of the file, and a storage interval of the file. For ease of understanding, see table 3.

TABLE 3

File name	Attribute information of file	Label of file	Storage interval of file
File 1	Attribute information 1	Label 1	Storage interval 1
File 2	Attribute information 2	Label 2	Storage interval 2
File 3	Attribute information 3	Label 3	Storage interval 3

Table 3 illustrates the contents of the mapping table. The data storage device may compare the first tag of the file to be stored with each tag in the mapping table. If a tag matching the first tag exists, the data storage device checks whether the storage interval mapped by the matching tag has remaining storage space. If so, the file to be stored is stored in that storage interval; if not, the file to be stored is stored in the storage space which is closest to that storage interval and in which no data is stored.

If the mapping table does not have a tag matching with the first tag, the data storage device may store the data to be stored in the storage space where no data is stored.

In the process of storing the file to be stored in the storage space, the data storage device may add information of the file to be stored to the mapping table.
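A mapping table of the kind shown in table 3 can be sketched as a list of rows. The field names and the helpers `intervals_for_tag` and `record_file` are hypothetical, and exact tag equality is used as a stand-in for whatever matching rule is configured.

```python
# Rows mirror table 3: file name, attribute information, label, storage interval.
mapping_table = [
    {"file": "File 1", "attributes": "Attribute information 1", "tag": "Label 1", "interval": 1},
    {"file": "File 2", "attributes": "Attribute information 2", "tag": "Label 2", "interval": 2},
    {"file": "File 3", "attributes": "Attribute information 3", "tag": "Label 3", "interval": 3},
]


def intervals_for_tag(table, tag):
    """Return the storage intervals that already hold files with this tag."""
    found = []
    for row in table:
        if row["tag"] == tag and row["interval"] not in found:
            found.append(row["interval"])
    return found


def record_file(table, file, attributes, tag, interval):
    """Add the information of a newly stored file to the mapping table."""
    table.append(
        {"file": file, "attributes": attributes, "tag": tag, "interval": interval}
    )


print(intervals_for_tag(mapping_table, "Label 2"))  # [2]
```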

In a possible implementation manner, the data storage device may pre-configure a mapping relationship between each storage interval and the tag range, that is, each storage interval is used for storing a file corresponding to the tag range mapped by the storage interval. For example, if the file has a tag of a value, then assuming that storage interval 1 is mapped with tag values 1 to 5000 and storage interval 2 is mapped with tag values 5001 to 10000, then this indicates that files with tag values in the range of 1 to 5000 may be stored in storage interval 1 and files with tag values in the range of 5001 to 10000 may be stored in storage interval 2. For another example, if the tag of the file is a code, assuming that the storage interval 1 is mapped with tags in the range of 11, 12, 13, 14 and 15 and the storage interval 2 is mapped with tags in the range of 21, 22, 23, 24 and 25, the file corresponding to the tag in the range of 11, 12, 13, 14 and 15 may be stored in the storage interval 1 and the file corresponding to the tag in the range of 21, 22, 23, 24 and 25 may be stored in the storage interval 2.
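The preconfigured mapping between tag ranges and storage intervals can be sketched as a lookup. The ranges are those of the example above. The names `UPPER_BOUNDS`, `RANGE_INTERVALS`, `CODE_RANGES`, `interval_for_value` and `interval_for_code` are hypothetical, and returning `None` for a tag outside every range is an assumption.

```python
import bisect

# Value tags: interval 1 holds 1..5000, interval 2 holds 5001..10000.
UPPER_BOUNDS = [5000, 10000]
RANGE_INTERVALS = [1, 2]

# Code tags: each interval is mapped to an explicit set of codes.
CODE_RANGES = {
    1: {"11", "12", "13", "14", "15"},
    2: {"21", "22", "23", "24", "25"},
}


def interval_for_value(tag):
    """Find the interval whose value range contains the tag, if any."""
    if tag < 1:
        return None
    index = bisect.bisect_left(UPPER_BOUNDS, tag)
    return RANGE_INTERVALS[index] if index < len(RANGE_INTERVALS) else None


def interval_for_code(tag):
    """Find the interval whose code set contains the tag, if any."""
    for interval, codes in CODE_RANGES.items():
        if tag in codes:
            return interval
    return None


print(interval_for_value(5000))  # 1
print(interval_for_value(5001))  # 2
print(interval_for_code("23"))   # 2
```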

The first storage interval may be configured to store files whose tags are in a first range, in which case the first tag is in the first range. The first range may be configured according to the actual situation, and this scheme does not limit it.

In one possible embodiment, before storing the file to be stored in the storage space, the data storage device may divide the file to be stored into a plurality of data blocks, and each data block has the same tag as the file to be stored, so that the plurality of data blocks may be stored in the same or adjacent storage spaces.

Or, the file to be stored includes a plurality of subfiles, and then the tags of the plurality of subfiles are the same as the tags of the file to be stored, so that the plurality of subfiles can be stored in the same or adjacent storage spaces.

In this embodiment, the tags of the data blocks belonging to the same file are the same, and the tags of the data blocks are the same as the tags of the file to which the data blocks belong, so that the data blocks of the file can be stored in the same or adjacent storage spaces, thereby improving the reading efficiency of the file.
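Dividing a file into data blocks that inherit the tag of the file can be sketched as follows. The function name `split_into_blocks` and the fixed block size are assumptions for illustration; the text does not specify how block boundaries are chosen.

```python
def split_into_blocks(data, tag, block_size):
    """Split file contents into blocks that all carry the file's tag."""
    return [
        (tag, data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


blocks = split_into_blocks(b"abcdefghij", tag="121", block_size=4)
print(blocks)  # [('121', b'abcd'), ('121', b'efgh'), ('121', b'ij')]
```

Because every block carries the same tag, a tag-based placement step would direct all of them to the same or adjacent storage spaces.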

In one possible embodiment, the files may be divided into hot data and cold data, and illustratively, the hot data and the cold data may be distinguished according to the type of the file, e.g., files that are frequently modified, such as editable documents and drawing files, may be divided into hot data, and files that are less frequently modified, such as audio or video, may be divided into cold data. Meanwhile, the storage interval can also be divided into a hot interval and a cold interval, wherein the hot interval is used for storing hot data, and the cold interval is used for storing cold data.

After the data storage device acquires the file to be stored and its attribute information, it may determine whether the file to be stored is hot data or cold data according to the attribute information, for example, the type of the file. If the file is hot data, the data storage device searches the hot intervals for a storage interval whose data carries a tag matching the first tag of the file to be stored. If the file is cold data, the data storage device searches the cold intervals for a storage interval whose data carries a tag matching the first tag of the file to be stored.
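Restricting the tag search to hot or cold intervals can be sketched as below. The set `HOT_TYPES`, the rule that every other type counts as cold, exact tag equality, and the names `temperature` and `candidates` are all assumptions of this sketch.

```python
# File types treated as frequently modified (hot); anything else is cold here.
HOT_TYPES = {"document", "drawing"}


def temperature(file_type):
    """Classify a file as hot or cold data from its type."""
    return "hot" if file_type in HOT_TYPES else "cold"


def candidates(intervals, file_type, tag):
    """List intervals of the right temperature whose data carries the tag.

    intervals maps an interval number to a (temperature, tag) pair.
    """
    kind = temperature(file_type)
    return [
        number
        for number, (interval_kind, interval_tag) in intervals.items()
        if interval_kind == kind and interval_tag == tag
    ]


intervals = {1: ("hot", "11"), 2: ("cold", "11"), 3: ("hot", "12")}
print(candidates(intervals, "document", "11"))  # [1]
print(candidates(intervals, "video", "11"))     # [2]
```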

In a possible implementation manner, the data storage device records the number of times of erasing data in each storage interval, and may periodically check the number of times of erasing data in each storage interval, and if the number of times of erasing data in a certain hot interval is less than the preset number of times of erasing within a preset time period, the data in the hot interval may be considered as cold data, and the data storage device may migrate the data in the hot interval to the cold interval for storage. If the erasing times of the data in a certain cold interval are greater than the preset erasing times within the preset duration, the data in the cold interval can be considered as hot data, and the data storage device can migrate the data in the cold interval to the hot interval for storage.
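The periodic check of erase counts can be sketched as a function that reports which intervals should change temperature. The name `reclassify`, the dictionary inputs, and treating a missing count as zero are assumptions of this sketch; the actual migration of data is not shown.

```python
def reclassify(erase_counts, kinds, threshold):
    """Return {interval: new_kind} for intervals whose data should migrate.

    erase_counts: erases observed per interval within the preset duration.
    kinds: current temperature ("hot" or "cold") of each interval.
    threshold: the preset number of erases.
    """
    moves = {}
    for interval, kind in kinds.items():
        count = erase_counts.get(interval, 0)
        if kind == "hot" and count < threshold:
            moves[interval] = "cold"   # rarely erased hot data becomes cold
        elif kind == "cold" and count > threshold:
            moves[interval] = "hot"    # frequently erased cold data becomes hot
    return moves


kinds = {1: "hot", 2: "hot", 3: "cold", 4: "cold"}
counts = {1: 50, 2: 3, 3: 40, 4: 1}
print(reclassify(counts, kinds, threshold=10))  # {2: 'cold', 3: 'hot'}
```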

In one possible embodiment, a memory, such as a flash memory, may include a plurality of flash memory granules, and the hot intervals and the cold intervals are configured in the plurality of flash memory granules according to a preset space configuration rule. The preset space configuration rule is that, in each of the plurality of flash memory granules, the ratio of the number of hot intervals to the number of cold intervals does not exceed a second threshold, where the second threshold may be any value between 0.7 and 1.3. For example, assume that a flash memory granule includes 100 storage intervals and the second threshold is 1.1. Solving hot intervals / cold intervals = 1.1 together with hot intervals + cold intervals = 100 gives approximately 52.4 hot intervals and 47.6 cold intervals, which is rounded to 52 hot intervals and 48 cold intervals (a ratio of about 1.08, which does not exceed 1.1). In the embodiment of the application, the number of hot intervals and the number of cold intervals in each flash memory granule can be balanced, so that the data storage load in the flash memory granules can be balanced, the service lives of the flash memory granules are kept even, and the service life of the flash memory granules is prolonged to a certain extent.
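The division of a flash memory granule into hot and cold intervals can be worked through in a few lines. Rounding the number of cold intervals up, so that the ratio never exceeds the second threshold, is an assumption of this sketch, as is the function name `split_hot_cold`.

```python
import math


def split_hot_cold(total, max_ratio):
    """Split `total` intervals so that hot / cold does not exceed max_ratio."""
    # hot / cold = r and hot + cold = total give cold = total / (1 + r).
    cold = math.ceil(total / (1 + max_ratio))
    hot = total - cold
    return hot, cold


hot, cold = split_hot_cold(100, 1.1)
print(hot, cold)  # 52 48
```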

The foregoing describes the solution provided by the embodiments of the present application mainly from the perspective of a data storage device. It will be appreciated that the data storage device, in order to carry out the above-described functions, may comprise corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the data storage device and the like may be divided into functional modules according to the above method examples, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.

In the case of dividing each functional module by corresponding functions, fig. 9 shows a schematic logical structure diagram of the data storage device according to the foregoing method embodiment, and the data storage device 900 includes:

a first obtaining unit 901, configured to obtain a file to be stored and attribute information of the file to be stored;

a second obtaining unit 902, configured to obtain, according to the attribute information, a first tag corresponding to the file to be stored;

a storage unit 903, configured to store the file to be stored in a first storage interval according to the first tag, where the first tag is matched with a second tag carried by the file in the first storage space.

In a possible implementation manner, the file to be stored includes a plurality of data blocks, where each data block in the plurality of data blocks corresponds to a first tag, and the storage unit 903 is specifically configured to: and storing the plurality of data blocks in the first storage interval according to the first tag.

In a possible embodiment, the information of the logical address includes a storage path of the file to be stored in the file system, wherein the first tag matches a tag of a file in the storage path in the same hierarchical directory as the file to be stored.

In a possible implementation manner, the attribute information of the file to be stored includes multiple items of sub-attribute information, and the second obtaining unit 902 is specifically configured to: respectively acquiring a number according to each item of sub-attribute information of the plurality of items of sub-attribute information; and obtaining the first label according to the obtained plurality of numbers.

In one possible embodiment, the first storage interval is located in a memory, the memory including a hot interval for storing hot data and a cold interval for storing cold data. The storage unit 903 is specifically configured to:

searching a storage interval for storing the file to be stored in the hot interval according to the first label under the condition that the file to be stored belongs to hot data; determining the first storage interval in the hot interval for storing the file to be stored;

or, under the condition that the file to be stored belongs to cold data, searching a storage interval for storing the file to be stored in the cold interval according to the first tag; and determining the first storage interval in the cold interval for storing the file to be stored.

In one possible embodiment, the first storage section is used to store a file with a tag value in the first range, and the storage unit 903 is specifically used to: and determining that the value of the first label is in the first range, and storing the file to be stored in a first storage interval.

For specific operations and advantages of the units in the data storage device 900 shown in fig. 9, reference may be made to the description in the foregoing embodiments, and details are not repeated here.

An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to implement the method embodiment described in fig. 5 and its possible implementation manner.

An embodiment of the present application further provides a computer program product, and when the computer program product is read and executed by a computer, the method embodiment described in the foregoing fig. 5 and its possible implementation manner is implemented.

The embodiment of the present application further provides a computer program, which when executed on a computer, will enable the computer to implement the method embodiment described in the foregoing fig. 5 and its possible implementation manner.

The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first image may be referred to as a second image, and similarly, a second image may be referred to as a first image, without departing from the scope of the various described examples. Both the first image and the second image may be images, and in some cases, may be separate and distinct images.

It should also be understood that, in the embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be appreciated that reference throughout this specification to "one embodiment," "an embodiment," "one possible implementation" means that a particular feature, structure, or characteristic described in connection with the embodiment or implementation is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" or "one possible implementation" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A method of storing data, comprising:

2. The method according to claim 1, wherein the first tag is used to indicate a probability that the file to be stored is accessed;

the first tag is matched with a second tag carried by a file in the first storage space, and the method comprises the following steps:

the similarity between the first label and the second label is greater than or equal to a first threshold value.

3. The method according to claim 1 or 2, wherein the attribute information of the file to be stored comprises at least one of information of a logical address where the file to be stored is stored, an identification of a user to which the file to be stored belongs, and file type information of the file to be stored.

4. The method according to claim 3, wherein the information of the logical address comprises a storage path of the file to be stored in a file system.

5. The method according to claim 1, wherein the attribute information of the file to be stored includes a plurality of items of sub-attribute information, and the obtaining the first tag of the file to be stored according to the attribute information includes:

and obtaining the first label according to the obtained plurality of numbers.

6. The method according to any one of claims 1 to 5, wherein the file to be stored comprises a plurality of data blocks, wherein each data block in the plurality of data blocks corresponds to a first tag,

the storing the file to be stored to a first storage interval according to the first tag comprises:

and storing the plurality of data blocks to the first storage interval according to the first label.

7. The method of any one of claims 1 to 6, wherein the first storage interval is located in a memory, the memory comprising a hot interval for storing hot data and a cold interval for storing cold data;

searching a storage interval for storing the file to be stored in the hot interval according to the first label under the condition that the file to be stored belongs to hot data;

determining the first storage interval in the hot intervals to be used for storing the file to be stored.

8. The method according to any one of claims 1 to 7, wherein the first storage interval is used for storing files having tags with values in the first range,

determining that the value of the first tag is within the first range,

and storing the file to be stored in a first storage interval.

9. A data storage device, comprising:

10. The apparatus according to claim 9, wherein the first tag is configured to indicate a probability that the file to be stored is accessed;

11. The apparatus according to claim 9 or 10, wherein the attribute information of the file to be stored comprises at least one of information of a logical address where the file to be stored is stored, an identification of a user to which the file to be stored belongs, and file type information of the file to be stored.

12. The apparatus according to claim 11, wherein the information of the logical address comprises a storage path of the file to be stored in a file system.

13. The apparatus according to claim 9, wherein the attribute information of the file to be stored includes a plurality of items of sub-attribute information, and the second obtaining unit is configured to:

and obtaining the first label according to the obtained plurality of numbers.

14. The apparatus according to any one of claims 9 to 13, wherein the file to be stored comprises a plurality of data blocks, wherein each data block in the plurality of data blocks corresponds to a first tag, and the storage unit is configured to:

15. The apparatus of any one of claims 9 to 14, wherein the first storage interval is located in a memory, the memory comprising a hot interval for storing hot data and a cold interval for storing cold data; the storage unit is used for:

searching a storage interval for storing the file to be stored in the hot interval according to the first label under the condition that the file to be stored belongs to hot data; determining the first storage interval in the hot intervals to be used for storing the file to be stored.

16. The apparatus according to any one of claims 9 to 15, wherein the first storage interval is configured to store a file with a tag value within the first range, and the storage unit is configured to:

determining that the value of the first tag is within the first range,

and storing the file to be stored in a first storage interval.

17. A data storage device comprising a processor and a memory, the memory storing a computer program, the processor being configured to invoke the computer program to perform the method of any one of claims 1 to 8.