CN110597452A - Data processing method and device of storage system, storage server and storage medium - Google Patents


Info

Publication number: CN110597452A
Application number: CN201810607955.1A
Authority: CN (China)
Prior art keywords: file, buffer area, buffer, cached, information
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 吴亦川, 郑健平
Current and original assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN201810607955.1A
Publication of CN110597452A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0611 Improving I/O performance in relation to response time
    • G06F3/0616 Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • G06F3/0656 Data buffering arrangements
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The embodiment of the invention discloses a data processing method and device of a storage system, a storage server and a storage medium. The method comprises the following steps: determining whether a first file requested for operation is cached in a buffer area; and if the first file is cached in the buffer area, accessing the first file in the buffer area, wherein accessing the first file located in the buffer area includes at least one of: reading the first file from the buffer area; updating the first file in the buffer area; and deleting the first file from the buffer area.

Description

Data processing method and device of storage system, storage server and storage medium
Technical Field
The present invention relates to the field of information technologies, and in particular, to a data processing method and apparatus for a storage system, a storage server, and a storage medium.
Background
A storage system may be used to store various information, for example various files. One kind of storage system is the distributed storage system, which comprises a plurality of storage servers arranged in a distributed manner. Owing to this distributed arrangement, a distributed storage system can support high Input/Output (I/O) throughput, reliability and scalability. However, in the prior art, even a distributed storage system still suffers from large access delay, and the problem is particularly prominent when hot (frequently accessed) files are stored in the system. Moreover, frequent disk access causes the disks of the storage servers to age quickly and shortens their service life.
Disclosure of Invention
The embodiment of the invention provides a data processing method and device of a storage system, a storage server and a storage medium.
The technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a data processing method for a storage system, including:
determining whether a first file requesting an operation is cached in a buffer;
if the first file is cached in the buffer area, accessing the first file in the buffer area; wherein, if the first file is cached in the buffer area, accessing the first file located in the buffer area includes at least one of:
if the first file is cached in the buffer area, reading the first file from the buffer area;
if the first file is cached in the buffer area, updating the first file in the buffer area;
and if the first file is cached in the buffer area, deleting the first file from the buffer area.
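The lookup-and-access flow described above can be sketched as follows. The dict-based buffer, the function names, and the disk fallback callback are illustrative assumptions, not the patent's actual implementation:

```python
# Illustrative sketch of the claimed flow: check the buffer first, and serve
# read/update/delete from it when the requested file is cached. All names
# and the dict-based layout are hypothetical.

class BufferArea:
    def __init__(self):
        self._files = {}          # file id -> file content

    def is_cached(self, file_id):
        return file_id in self._files

    def read(self, file_id):
        return self._files[file_id]

    def update(self, file_id, content):
        self._files[file_id] = content

    def delete(self, file_id):
        del self._files[file_id]


def access_file(buffer_area, file_id, op, content=None, disk_read=None):
    """Serve read/update/delete from the buffer when the file is cached."""
    if buffer_area.is_cached(file_id):
        if op == "read":
            return buffer_area.read(file_id)
        if op == "update":
            buffer_area.update(file_id, content)
            return content
        if op == "delete":
            buffer_area.delete(file_id)
            return None
    elif op == "read" and disk_read is not None:
        return disk_read(file_id)   # fall back to the disk copy
```

A read that misses the buffer falls through to the disk, matching the later optional step of accessing the first file from a disk when it is not in the buffer.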
Optionally, if the first file is cached in the buffer, updating the first file located in the buffer includes:
if the first file is cached in the buffer area, locking the buffer area, wherein the lock of the buffer area is used for suspending the read operation response of the buffer area;
modifying the first file located in the buffer;
unlocking the buffer after the modification of the first file is completed.
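The lock-modify-unlock sequence above can be sketched with a mutual-exclusion lock standing in for the patent's lock identification bit. The class and method names are assumptions for illustration:

```python
# Sketch of lock -> modify -> unlock: while the buffer is locked for a
# modification, read responses are suspended, preventing dirty reads of a
# half-updated file. threading.Lock stands in for the lock identification bit.
import threading

class LockedBuffer:
    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}

    def read(self, file_id):
        # Reads wait while the buffer is locked for modification.
        with self._lock:
            return self._files.get(file_id)

    def modify(self, file_id, new_content):
        self._lock.acquire()      # lock the buffer: suspend read responses
        try:
            self._files[file_id] = new_content
        finally:
            self._lock.release()  # unlock once the modification completes
```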
Optionally, the method further comprises:
determining whether a buffer start condition is satisfied;
if the starting condition of the buffer area is met, caching N files meeting the preset condition in the buffer area, wherein N is a positive integer.
Optionally, the determining whether the buffer start condition is satisfied includes:
and determining whether the starting condition of the buffer area is met or not according to the access condition information of the file in a preset time period.
Optionally, determining whether the buffer start condition is met according to the access status information of the file within the predetermined time period includes at least one of:
if the number of accesses to at least one file within the predetermined time period is greater than an access count threshold, determining that the buffer start condition is met;
if the number of accesses to at least one file within the predetermined time period is greater than the access count threshold, and the interval between the time of the most recent access to the at least one file and the current time is smaller than a first time interval, determining that the buffer start condition is met;
and if the number of accesses to at least one file within the predetermined time period is greater than the access count threshold, and the interval between the current time and the time the buffer was last started is smaller than a second time interval, determining that the buffer start condition is met.
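The three alternative start conditions can be sketched in one predicate. The record format, parameter names, and the way the variants are selected are illustrative assumptions:

```python
# Hedged sketch of the three start-condition variants: access count alone,
# access count plus a recent last access, or access count plus a recent
# buffer start. Thresholds and record format are assumptions.

def buffer_start_condition_met(records, now, count_threshold,
                               first_interval=None, last_start_time=None,
                               second_interval=None):
    """records maps file id -> (access_count, last_access_time) within
    the predetermined time period."""
    for count, last_access in records.values():
        if count <= count_threshold:
            continue
        # Variant 1: the access count alone exceeds the threshold.
        if first_interval is None and second_interval is None:
            return True
        # Variant 2: the file was also accessed very recently.
        if first_interval is not None and now - last_access < first_interval:
            return True
        # Variant 3: the buffer itself was started recently.
        if (second_interval is not None and last_start_time is not None
                and now - last_start_time < second_interval):
            return True
    return False
```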
Optionally, the method further comprises:
and when the buffer area is closed or the file of the buffer area is updated, synchronizing the file with the update in the buffer area to a disk.
Optionally, the method further comprises:
determining the cache information of the buffer area; wherein the cache information comprises at least one of:
the buffer area head structure body is used for storing the file information cached in the buffer area;
a tail pointer for indicating the tail address of the last file written in the buffer;
a buffer flag for indicating whether the buffer is open;
and the buffer area information is used for indicating a buffer area corresponding to the buffer area.
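One possible in-memory layout for the cache information listed above is sketched below. The field names mirror the description, but the concrete structure definitions are assumptions, not the patent's:

```python
# Illustrative layout for the cache information: a header mapping file ids
# to per-file info, a tail pointer, an open flag, and the buffer capacity.
from dataclasses import dataclass, field

@dataclass
class FileInfo:
    file_id: str        # file name and/or File Identifier (FID)
    start_addr: int     # start offset of the file inside the buffer
    size: int           # file size in bytes

@dataclass
class CacheInfo:
    header: dict = field(default_factory=dict)  # buffer header: id -> FileInfo
    tail_pointer: int = 0                       # tail address of last written file
    buffer_open: bool = False                   # buffer flag: open or closed
    buffer_capacity: int = 0                    # size of the backing buffer
```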
Optionally, the method further comprises:
determining whether the buffer area has enough residual space according to the file size of the second file to be written into the buffer area;
and if the buffer area has enough residual space, writing the second file into the residual space.
Optionally, the method further comprises at least one of:
if the buffer area does not have enough residual space, writing the second file from the tail pointer of the buffer area;
and if the second file is written to the tail part of the buffer area, continuing to write the second file from the head part of the buffer area.
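Writing from the tail pointer and continuing from the head when the tail of the buffer is reached is the classic ring-buffer write. A minimal sketch, with a bytearray standing in for the buffer memory (an illustrative assumption):

```python
# Sketch of writing a file into the buffer with wraparound: when the write
# reaches the tail of the buffer, it continues from the head, overwriting
# the oldest data there.

def write_file(buffer_mem, tail, data):
    """Write `data` starting at offset `tail`, wrapping to the head if the
    end of the buffer is reached. Returns the new tail offset."""
    capacity = len(buffer_mem)
    for i, byte in enumerate(data):
        buffer_mem[(tail + i) % capacity] = byte
    return (tail + len(data)) % capacity
```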
Optionally, the method further comprises:
and updating the cache information of the buffer area according to the condition that the second file is written into the buffer area.
Optionally, the updating the cache information of the buffer area according to the condition that the second file is written into the buffer area includes at least one of:
adding the file information of the second file in a buffer area head structure body of the cache information;
deleting file information of a third file covered by the second file within the buffer header structure;
and updating the tail pointer of the cache region according to the cache condition of the second file.
Optionally, the updating the tail pointer of the cache region according to the caching status of the second file includes at least one of:
if the buffer area has enough residual space, the cache address pointed by the tail pointer is shifted according to the file size of the second file;
and if the buffer area does not have enough residual space, adding the file size of the second file to the cache address currently indicated by the tail pointer and taking the remainder with respect to the space size of the buffer area, so as to obtain the updated tail pointer.
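The two tail-pointer update cases can be sketched as below, assuming a ring buffer of fixed capacity where a wrapped write leaves the tail at the old tail plus the file size, modulo the buffer capacity. This modulo reading of the second case is an interpretation of the wording above, and the names are illustrative:

```python
# Sketch of the two tail-pointer update cases for a fixed-capacity buffer.

def update_tail_pointer(tail, file_size, capacity):
    if tail + file_size < capacity:
        # Enough remaining space: simply shift the tail by the file size.
        return tail + file_size
    # Not enough remaining space: the write wrapped around the end of the
    # buffer, so the new tail is (old tail + file size) mod capacity.
    return (tail + file_size) % capacity
```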
Optionally, the method further comprises:
and if the first file is not positioned in the buffer area, accessing the first file from a disk.
In a second aspect, an embodiment of the present invention provides a data processing apparatus for a storage system, including:
the determining module is used for determining whether the first file requesting the operation is cached in a buffer area;
the cache access module is used for accessing the first file in the buffer area if the first file is cached in the buffer area;
the cache access module is specifically configured to execute at least one of the following:
if the first file is cached in the buffer area, reading the first file from the buffer area;
if the first file is cached in the buffer area, updating the first file in the buffer area;
and if the first file is cached in the buffer area, deleting the first file from the buffer area.
In a third aspect, an embodiment of the present invention provides a storage server, including:
a transceiver;
a memory;
and a processor connected to the transceiver and the memory respectively, configured to control information transceiving of the transceiver and information storage of the memory by executing computer-executable instructions stored on the memory, and to implement the data processing method of the storage system provided by any technical solution of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer storage medium having computer-executable instructions stored thereon; after being executed, the computer-executable instructions can implement the data processing method of the storage system provided by any technical scheme of the first aspect.
According to the data processing method and device of the storage system, the storage server and the storage medium provided by the embodiments of the invention, before a file requested for operation is accessed, it is determined whether the file is in the buffer area, and if so, the requested file is accessed directly from the buffer area. Compared with accessing the file from a disk every time, on one hand the access speed is increased and the access delay is reduced; on the other hand, since the file no longer has to be read from the disk on every access, the access frequency of the disk is reduced, which alleviates the problems of fast disk aging and short disk service life caused by frequent disk access.
Drawings
Fig. 1 is a schematic flowchart of a data processing method of a first storage system according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a data processing method of a second storage system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data processing apparatus of a second storage system according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a cache server according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a cache system according to an embodiment of the present invention;
FIG. 6 is a block diagram of a data processing apparatus of a third storage system according to an embodiment of the present invention;
FIG. 7 is a block diagram of a data processing apparatus of a fourth storage system according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a data processing apparatus of a fifth storage system according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail with reference to the drawings and the specific embodiments of the specification.
As shown in fig. 1, the present embodiment provides a data processing method of a storage system, including:
step S110: determining whether a first file requesting an operation is cached in a buffer;
step S120: and if the first file is cached in the buffer area, accessing the first file in the buffer area.
The data processing method of the storage system provided by the embodiment can be applied to various storage systems, for example, can be applied to a distributed storage system; the method can be particularly applied to the storage server of the distributed storage system.
The storage server may receive an operation request from a client, which may include one or more of: a read request, a write request, and a delete request. A read request asks to read a target file (referred to herein as the first file) from the storage server. A write request includes adding information to the first file or modifying its existing content, where modifying comprises: replacing part of the content of the first file, deleting part of the content, or adjusting the order of different parts of the content.
A delete request is used for deleting the first file as a whole.
In this embodiment, after receiving an operation request from a client, it is determined whether a first file requested to be operated is cached in a buffer. The buffer here may be a memory area such as a cache.
If the first file is located in the buffer area, it is accessed there directly, without having to be read from a disk. Therefore, once a frequently accessed (hot) file has been written into the buffer area, when other clients access it again the storage server can send the buffered copy to the client directly, or rewrite the buffered copy directly, which is much faster than the storage server accessing the file from the disk. On the other hand, for a hot first file, the frequency with which the storage server operates on the disk is greatly reduced, which mitigates disk aging caused by frequent high-speed reads and writes and prolongs the service life of the storage server.
The file content of the first file may include, without limitation, any of various kinds of data such as text data, audio data, image data, and program code.
Optionally, the step S120 may include at least one of:
if the first file is cached in the buffer area, reading the first file from the buffer area;
and if the first file is cached in the buffer area, updating the first file in the buffer area.
For example, if the operation request sent by the client is a read request, step S120 may include: in response to the read request, reading the first file from the buffer area and transmitting it over the network to the requester, for example the client that requested the file.
For another example, if the client sends a write request or a delete request, step S120 may include: updating the first file in the buffer area in response to that request.
For another example, if a delete request is currently received, step S120 may include: deleting the first file in the buffer area. For a delete request sent by the requester, in addition to deleting the first file in the buffer area, the first file on the disk is also deleted.
In summary, in this embodiment, if the target file is stored in the buffer area: to satisfy a read request, the target file is read from the buffer area and transmitted directly to the requester, which accelerates reading and reduces the number of disk reads; to satisfy a write request, the target file in the buffer area is operated on first, and the updated target file is written back to the disk when the buffer area is closed, completing the write; and to satisfy a delete request, the target file in the buffer area is deleted first, so that subsequent read or write requests do not hit a stale (dirty) copy cached in the buffer area, and the target file is also deleted from the disk so that it is removed completely.
Optionally, the step S120 may include:
if the first file is cached in the buffer area, locking the buffer area, wherein the lock of the buffer area is used for suspending the read operation response of the buffer area;
modifying the first file located in the buffer;
unlocking the buffer after the modification of the first file is completed.
When updating a file in the buffer area, in order to reduce dirty reads, the buffer area is locked before the file is changed. Locking the buffer area may specifically include: setting a lock identification bit for the buffer area; once the bit is set to a particular value, the buffer area is considered locked. While the buffer area is locked, some or all of the files stored in it are locked and responses to read operations are suspended. After the lock on the buffer area is released, reads of the buffer area resume, and the read operations that had not yet been answered are responded to.
Optionally, as shown in fig. 2, the method further includes:
step S101: determining whether a buffer start condition is satisfied;
step S102: if the starting condition of the buffer area is met, caching N files meeting the preset condition in the buffer area, wherein N is a positive integer.
In some embodiments, the buffer area of the storage server may be kept open at all times, caching recently accessed or frequently read files, so as to speed up access on one hand and, on the other, reduce the number of disk reads and writes to prolong the service life of the disk.
The caching of the N files satisfying the predetermined condition in the buffer may include at least one of:
if the starting condition of the buffer area is met, caching the N files which are accessed recently in the buffer area;
and if the starting condition of the buffer area is met, caching the file meeting the starting condition of the buffer area into the buffer area.
The recently accessed N files herein may include: and determining N files recently read, written or updated by the storage server according to the access record information of the files accessed by the storage server, and writing the N files into the buffer area. The value of N may be any preset value, for example, the value of N is 8, 10, 12, or 20.
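Selecting the N most recently accessed files from the server's access records can be sketched as follows. The record format (a list of id/timestamp pairs, e.g. parsed from the log file) is an illustrative assumption:

```python
# Sketch of picking the N most recently accessed files from access-record
# information; ties and record parsing are out of scope for this sketch.

def recently_accessed(access_records, n):
    """access_records: list of (file_id, last_access_time). Returns the ids
    of the n files accessed most recently, most recent first."""
    ranked = sorted(access_records, key=lambda r: r[1], reverse=True)
    return [file_id for file_id, _ in ranked[:n]]
```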
In other embodiments, the step S101 may include:
and determining whether the starting condition of the buffer area is met or not according to the access condition information of the file in a preset time period.
The storage server is aware of the files stored on it. Meanwhile, the storage server records information such as file reads and writes, for example through a log file; the information in the log file may include the access status information. The end of the predetermined time period may be the current time, and its start may be the current time minus a predetermined duration.
In this embodiment, whether a starting condition of the buffer is satisfied may be determined according to the access status information in a predetermined time period, and if the starting condition of the buffer is satisfied, the file access of the buffer is started. Starting up the buffer may comprise the steps of:
configuring the buffer according to the configuration file of the buffer. The configuration file at least includes configuration information indicating the space capacity of the buffer, and may further include a data structure for recording cache information of the buffer; the data structure may define various fields for recording different types of cache information, and these fields may be used to manage and accelerate operations on the files within the buffer.
The spatial capacity of the buffer indicated by the configuration information may be statically set in advance, for example, set to 100M, 256M, 512M, or the like. In other embodiments, the method further comprises:
determining a storage space required by high-heat file caching according to the access condition information of the file;
and setting the space capacity of the buffer area according to the storage space.
A hot file here is a file whose access frequency exceeds a specified frequency; there may be one or more hot files. The buffer capacity is not less than the storage space required to cache the hot files. For example, the capacity A of the buffer may equal a times the storage space required by the hot-file cache, where a is an integer not less than 1.
Thus, in some embodiments, the configuring the buffer may include:
initializing the buffer area according to the configuration file, for example, selecting a part of memory space in a memory to be set as the memory space of the buffer area;
and creating the cache information according to the data structure.
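The configuration steps above, together with the capacity rule (capacity equals a times the hot-file space, with a an integer not less than 1), can be sketched as one initialization routine. The names and the dict-based cache information are illustrative assumptions:

```python
# Sketch of buffer initialization: size the buffer as an integer multiple a
# of the space the hot files need, reserve the memory region, and create
# empty cache information.

def init_buffer(hot_file_sizes, a=2):
    assert a >= 1
    capacity = a * sum(hot_file_sizes)      # space capacity of the buffer
    memory = bytearray(capacity)            # reserved memory region
    cache_info = {"header": {}, "tail": 0, "open": True}
    return memory, cache_info
```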
After the configuration of the buffer area is completed, at least the files meeting the buffer start condition are written into the buffer area, and the cache information is updated according to the files written into the buffer area.
It is worth noting that, in this embodiment, the file whose access status information within the predetermined time period is used to determine whether the buffer start condition is satisfied may be a file in the buffer area and/or a file on the disk; the method may therefore include: collecting access status information for all files on the storage server.
For example, the duration of the predetermined period of time may be a fixed value, e.g., 30 minutes, 20 minutes, 15 minutes, or the like.
Optionally, the step S101 may include at least one of:
if the number of accesses to at least one file within the predetermined time period is greater than the access count threshold, determining that the buffer start condition is met. For example, when a storage server newly added to the storage system starts its buffer for the first time, the buffer may be started upon detecting that the access count of at least one file within the predetermined time period exceeds the threshold; of course, this determination strategy is not limited to the first start of the buffer;
if the number of accesses to at least one file within the predetermined time period is greater than the access count threshold, and the interval between the time of the most recent access to the at least one file and the current time is smaller than the first time interval, determining that the buffer start condition is met;
and if the number of accesses to at least one file within the predetermined time period is greater than the access count threshold, and the interval between the current time and the time the buffer was last started is smaller than the second time interval, determining that the buffer start condition is met.
The access count threshold may be preset. If the actual number of accesses to a file within the predetermined time period exceeds this threshold, it indicates that the current storage server holds a frequently accessed file; if that file were read from the disk on every access, on one hand the access rate would be limited, and on the other hand the disk would undergo heavy reading and writing. Therefore, the buffer start condition is determined to be satisfied, and the buffered file access flow is started. After the file has been written into the buffer area, subsequent client accesses can be served directly from the buffer area, and updates are applied directly in the buffer area, which speeds up access and reduces the number of disk reads and writes.
Access here includes reading in response to a read request, and also writing in response to a write request.
In other embodiments, if the number of accesses to a file within the predetermined time period is greater than the access count threshold, and the interval between the time of its most recent access and the current time is smaller than the first time interval, this indicates that the file has been accessed frequently within the period and was accessed very recently, so it is highly likely to continue being accessed frequently. It is therefore necessary to start, or keep running, the buffered file access flow: the buffer start condition is determined to be satisfied, and the buffer area is opened or kept open so that clients can conveniently access the hot file from the buffer area.
Optionally, the method further comprises: and when the buffer area is closed or the file of the buffer area is updated, synchronizing the file with the update in the buffer area to a disk.
In this embodiment, when the buffer area is closed, the files in it need to be written to the disk, because they may have been updated; writing them back ensures that even after the buffer area is closed, the latest version is stored on the disk, which allows subsequent accesses to succeed and avoids subsequent dirty reads.
Similarly, if an update operation on a file cached in the buffer area is received, after the file in the buffer area is updated directly, the updated file may also be written to the disk immediately, keeping the disk synchronized; in this way the final version of the file is ultimately saved on the disk, and data anomalies are avoided.
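The write-back step can be sketched as flushing every updated (dirty) buffer file to the disk when the buffer closes or a cached file changes. The dirty-set bookkeeping and the dict standing in for the disk are assumptions for illustration:

```python
# Sketch of write-back synchronization: copy updated buffer files to the
# disk so the disk always ends up holding the latest version, then clear
# the dirty bookkeeping.

def sync_to_disk(buffer_files, dirty_ids, disk):
    """buffer_files and disk map file ids to contents; dirty_ids is the set
    of file ids updated in the buffer since the last sync."""
    for file_id in sorted(dirty_ids):
        disk[file_id] = buffer_files[file_id]
    dirty_ids.clear()
```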
Optionally, the method further comprises:
determining the cache information of the buffer area; wherein the cache information comprises at least one of:
the buffer area head structure body is used for storing the file information cached in the buffer area;
a tail pointer for indicating the tail address of the last file written in the buffer;
a buffer flag for indicating whether the buffer is open;
and the buffer area information is used for indicating a buffer area corresponding to the buffer area.
In this embodiment, the buffer header structure stores file information of the files currently stored in the buffer area, where the file information at least includes a file identification, which may include a file name and/or a File Identifier (FID). Thus, in step S110, whether the first file exists in the current buffer area can be determined by matching against the file information. In some embodiments, the file information may further include index information for each file, indicating the storage location of the corresponding file within the buffer area. For example, the index information at least includes the start address at which the corresponding file is stored in the buffer area; in other embodiments, it may further include the end position at which the corresponding file is stored in the buffer area.
The buffer flag may be used to indicate whether the buffer is open; for example, the buffer flag may be an open flag bit of the buffer, indicating whether the buffer is in the open state or the closed state. Therefore, in some embodiments, the method further comprises: determining whether the buffer is in the open state; if so, executing step S110; otherwise, not executing step S110 and accessing the first file directly from the disk. Determining whether the buffer is in the open state may include: checking the buffer flag, and if the flag indicates that the buffer has been opened, determining that the buffer is in the open state.
Optionally, the method further comprises:
determining whether the buffer area has enough residual space according to the file size of the second file to be written into the buffer area;
and if the buffer area has enough residual space, writing the second file into the residual space.
Whether the buffer has enough space is determined according to the file size of the second file to be written: the capacity of the blank space that has not yet been written and of the releasable space holding invalid (overwritable) data in the current buffer is considered. Here, the blank space and the releasable space are collectively referred to as the remaining space. For example, the file size is compared with the capacity of the remaining space: if the capacity of the remaining space is larger than the file size, the buffer has enough remaining space; if it is smaller, the buffer does not have enough remaining space.
And if the buffer area has enough residual space, directly writing the second file into the residual space, thus adding the second file into the buffer area.
In this embodiment, when the second file is written in the remaining space, the writing may be started from a position indicated by a tail pointer of the buffer, that is, a position pointed by the tail pointer is a start position of the writing of the second file.
In some embodiments, the method further comprises at least one of:
if the buffer area does not have enough residual space, writing the second file from the tail pointer of the buffer area;
and if the second file is written to the tail part of the buffer area, continuing to write the second file from the head part of the buffer area.
If the current buffer does not have enough remaining space, the second file is still written starting from the position pointed to by the tail pointer; if the write reaches the tail of the buffer, the remainder of the second file continues from the head of the buffer. On the one hand this guarantees that the second file is written completely; on the other hand, because the buffer is written sequentially, files written earlier are overwritten by files written later, so files with lower access heat are replaced in the buffer by the files that are currently hotter.
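The wrap-around write described above can be sketched as follows in C. This is a minimal illustration: the names MemCache and g_pInsertPointer follow the example structures given later in the text, and the function itself is an assumption of the sketch, not the claimed implementation.

```c
#include <string.h>

#define MEMCACHE_SIZE (100 * 1024 * 1024)  /* 100 MB, the default size given later */

static char MemCache[MEMCACHE_SIZE];  /* the buffer itself */
static long long g_pInsertPointer;    /* tail pointer: next write position */

/* Write a file of `size` bytes starting at the tail pointer; if the write
 * reaches the tail of the buffer, the remainder continues from the head. */
static void write_file_to_buffer(const char *data, long long size)
{
    long long tail_room = MEMCACHE_SIZE - g_pInsertPointer;
    if (size <= tail_room) {
        memcpy(MemCache + g_pInsertPointer, data, (size_t)size);
    } else {
        memcpy(MemCache + g_pInsertPointer, data, (size_t)tail_room);
        memcpy(MemCache, data + tail_room, (size_t)(size - tail_room));
    }
    g_pInsertPointer = (g_pInsertPointer + size) % MEMCACHE_SIZE;
}
```

Note that a write crossing the tail always lands the remainder at offset 0, which is what later lets the covered-entry deletion treat the buffer as circular.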
Optionally, the method further comprises:
and updating the cache information of the buffer area according to the condition that the second file is written into the buffer area.
Here, updating the cache information may include updating the buffer header structure and may further include updating the tail pointer.
For example, the updating the cache information of the buffer area according to the status of writing the second file into the buffer area includes: adding the file information of the second file in a buffer area head structure body of the cache information; deleting file information of a third file covered by the second file within the buffer header structure; and updating the tail pointer of the cache region according to the cache condition of the second file.
The third file here is a file previously written to the buffer that is overwritten by the file newly written to the buffer. If only part of a file is covered, the space occupied by its uncovered part becomes releasable space, which can form part of the remaining space.
Optionally, the updating the tail pointer of the cache region according to the caching status of the second file includes at least one of:
if the buffer area has enough residual space, the cache address pointed by the tail pointer is shifted according to the file size of the second file;
and if the buffer area does not have enough remaining space, adding the file size of the second file to the cache address currently indicated by the tail pointer and taking the result modulo the buffer size, so as to obtain the updated tail pointer.
For example, if the second file is directly appended within the remaining space of the buffer, the tail pointer only needs to be shifted by the file size to obtain the address it newly points to.
If a previously written third file is overwritten while the second file is written, the file information of the third file needs to be deleted from the buffer header structure, so that the determination in step S110 of whether the first file is in the buffer remains accurate.
Optionally, the method further comprises: and if the first file is not positioned in the buffer area, accessing the first file from a disk.
If the first file is not located in the buffer, the file requested by the client still needs to be returned successfully, so the first file is accessed directly from the disk. If the current operation request of the client is a read request, the first file is read directly from the disk and sent to the client; if the operation request is a write request, the first file is read from the disk, the data to be written is applied, and the updated first file is written back to the disk.
As shown in fig. 3, the present embodiment provides a data processing apparatus of a storage system, including:
a determining module 110, configured to determine whether the first file requesting the operation is cached in a buffer;
the cache access module 120 is configured to access the first file located in the buffer area if the first file is cached in the buffer area.
The data processing device can be applied to the storage server.
The determining module 110 and the cache access module 120 may both be program modules. When executed by the processor, these modules can determine whether the first file is in the buffer and, if it is, access the first file directly in the buffer instead of accessing the disk, thereby accelerating the access rate.
Optionally, the cache access module 120 includes at least one of:
the reading unit is used for reading the first file from the buffer area if the first file is cached in the buffer area;
the updating unit is used for updating the first file in the buffer area if the first file is cached in the buffer area;
and the deleting unit is used for deleting the first file in the buffer area if the first file is cached in the buffer area.
In other embodiments, the updating unit is specifically configured to lock the buffer if the first file is cached in the buffer, where the lock of the buffer is used to suspend the read operation response of the buffer; modifying the first file located in the buffer; unlocking the buffer after the modification of the first file is completed.
Further, the apparatus further comprises:
the starting judgment module is used for determining whether the starting condition of the buffer area is met;
and the caching module is used for caching N files meeting preset conditions in the buffer if the buffer start condition is met, wherein N is a positive integer; in some embodiments, N may be, for example, 8 or 10.
In other embodiments, the start determining module is specifically configured to determine whether the buffer start condition is met according to access condition information of the file within a predetermined time period.
Here, the termination time of the predetermined period may be the current time.
Specifically, the start determining module is specifically configured to execute at least one of:
if the number of accesses of at least one file within the predetermined period is greater than the access count threshold, determining that the buffer start condition is met;
if the number of accesses of at least one file within the predetermined period is greater than the access count threshold and the interval between the time of that file's last access and the current time is smaller than a first time interval, determining that the buffer start condition is met;
and if the number of accesses of at least one file within the predetermined period is greater than the access count threshold and the interval between the current time and the time the buffer was last opened is smaller than a second time interval, determining that the buffer start condition is met.
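The second of these variants (access count plus recency of the last access) can be sketched as follows; the field, parameter and function names are illustrative assumptions of the sketch, not part of the claimed method.

```c
#include <stdbool.h>
#include <time.h>

/* Per-file access statistics within the predetermined period (illustrative). */
typedef struct {
    int access_count;        /* number of accesses within the period */
    time_t last_access_time; /* time of the most recent access */
} FileStats;

/* The start condition is met when the access count exceeds the threshold
 * AND the last access is more recent than the first time interval. */
static bool start_condition_met(const FileStats *f, time_t now,
                                int access_count_threshold,
                                double first_interval_sec)
{
    return f->access_count > access_count_threshold &&
           difftime(now, f->last_access_time) < first_interval_sec;
}
```

The first and third variants differ only in the recency term they test (none, or the buffer's last opening time instead of the file's last access time).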
Further, the apparatus further comprises:
and the disk writing module is used for synchronizing the updated file in the buffer area to the disk when the buffer area is closed or the file in the buffer area is updated.
When the buffer is closed or a file in the buffer is updated, the updated files in the buffer are synchronized to the disk, which avoids the problem that files newly written to the buffer, or cached files that have been updated, never reach the disk.
In other embodiments, the apparatus further comprises:
a buffer information determining module 110, configured to determine buffer information of the buffer; wherein the cache information comprises at least one of:
the buffer area head structure body is used for storing the file information cached in the buffer area;
a tail pointer for indicating the tail address of the last file written in the buffer;
a buffer flag for indicating whether the buffer is open;
and the buffer itself, namely the memory area in which the cached files are actually stored.
In other embodiments, the apparatus further comprises:
a remaining space determining module 110, configured to determine whether there is sufficient remaining space in the buffer area according to a file size of the second file to be written in the buffer area;
the cache module is specifically configured to write the second file into the remaining space if the buffer area has a sufficient remaining space.
In some embodiments, the cache module is specifically configured to perform at least one of:
if the buffer area does not have enough residual space, writing the second file from the tail pointer of the buffer area;
and if the second file is written to the tail part of the buffer area, continuing to write the second file from the head part of the buffer area.
Optionally, the apparatus further comprises:
and the cache information updating module is used for updating the cache information of the buffer area according to the condition that the second file is written into the buffer area.
The cache information updating module specifically includes at least one of the following:
the first updating submodule is used for adding the file information of the second file in a buffer area head structure body of the cache information;
a second update sub-module, configured to delete, in the buffer header structure, file information of a third file covered by the second file;
and the third updating submodule is used for updating the tail pointer of the cache region according to the cache condition of the second file.
Further, the third update submodule is specifically configured to: if the buffer has enough remaining space, offset the cache address pointed to by the tail pointer by the file size of the second file; and if the buffer does not have enough remaining space, add the file size of the second file to the cache address currently indicated by the tail pointer and take the result modulo the buffer size to obtain the updated tail pointer.
Furthermore, the apparatus further comprises:
and the disk access module is specifically used for accessing the first file from a disk if the first file is not located in the buffer area.
As shown in fig. 4, the present embodiment further provides a storage server, including:
a transceiver,
a memory,
and a processor connected to the transceiver and the memory respectively, configured, by executing computer-executable instructions located on the memory, to control the transceiver to transmit and receive information, to control the memory to store information, and to implement the data processing method of one or more of the foregoing storage systems, for example, one or more of the methods shown in fig. 1 to 2 and fig. 6 to 8.
The transceiver may include various devices having communication functions, such as a network interface and/or a transceiver antenna, etc.
The memory may include devices for storing media. In this embodiment, the memory may include a buffer and a disk; the buffer provides the buffer area, and the disk may be used for the writing of various files.
The processor may include: a central processing unit, a microprocessor, a digital signal processor, a programmable array or an application specific integrated circuit, etc.
The processor is respectively connected with the transceiver and the memory, and is used for implementing the data processing method of the storage system provided by one or more of the above technical solutions by executing the computer executable code.
The present embodiments also provide a computer storage medium having computer-executable instructions stored thereon; after being executed, the computer-executable instructions can implement the data processing method of the storage system provided by one or more of the technical solutions; for example, one or more of the methods illustrated in fig. 1-2 and 6-8.
Several specific examples are provided below in connection with any of the embodiments described above:
example 1:
the present example provides a data processing method in a storage server usable in a distributed file system, which may include:
a cache module is added in the storage server, and before each file download it is judged whether the condition for opening the buffer is met, namely whether the repetition rate of recently downloaded files is high. If so, the buffer is opened, and before each read of the local disk the buffer is searched first: on a hit, the file is read directly from memory, accelerating the reading speed; on a miss, the file is loaded from the disk, and the buffer header and the actual file storage position in memory are updated according to certain rules. Otherwise, the file is downloaded according to the normal logic.
There is also provided in this example a distributed storage system, as shown in fig. 5, comprising:
the tracking server, which can comprise a plurality of servers; the tracking server may be configured to provide the client requesting the operation with an access address of the target file to be accessed, for example, information such as the Internet Protocol (IP) address and port number of the storage server storing the target file.
The storage servers can be used for storing files in a distributed mode, connection can be established with the client, and Transmission Control Protocol (TCP)/IP Protocol Transmission of the files is carried out based on the established connection.
A distributed storage system (e.g., a distributed file system) refers to a large system in which data stored physically is not stored in a single computer node (corresponding to the storage server) but in a group of servers, which are connected to each other via a network and can be accessed to each other. The distributed file system has better mass data storage capacity, higher I/O throughput, reliability and expandability, thereby being more and more widely applied. The distributed file systems currently in the mainstream include GFS of Google corporation, HDFS of open source project Hadoop, TFS of Taobao, MooseFS, GlusterFS, MogileFS, FastDFS, and the like. The architecture and implementation principles of these systems are substantially the same.
The main functions of the distributed file system include file storage, file synchronization, file access (such as file uploading and file downloading), and the like, and simultaneously, the requirements of high capacity and load balancing are met. Typically, three roles are involved, client (client), tracker server (tracker server), and storage server (storage server).
(1) Client (client): the client initiates various operation requests on files. For any request, the client first needs to communicate with the tracking server to obtain the address of a storage server, and then communicates with that storage server.
(2) Tracking Server (Tracker Server): the tracking Server group can be composed of a plurality of servers, and the tracking Server is responsible for connecting the client and the Storage Server (Storage Server). The tracking server adopts a peer-to-peer structure, and can be added or taken off line at any time without influencing on-line service, so that the problem of single point of failure is avoided. The tracking server also plays a role in load balancing.
(3) Storage Server (Storage Server): the storage server group can also be composed of a plurality of servers, a grouped organization structure is adopted, the storage contents of the storage servers in the same group are the same, and the storage servers in the same group cannot communicate with each other, so that the organization structure plays roles in redundant backup and load balancing to a certain degree.
For example, the tracking server selects the IP address and the port number of the storage server with a low load rate to return to the client according to the load rate of the storage server currently storing the same file, so that load balancing is achieved.
The client communicates with the tracking server, the tracking server finds a storage server according to a certain algorithm and returns the IP address and port of the storage server to the client, and the client then communicates with the storage server directly over the TCP/IP protocol to upload files, without further interaction with the tracking server. When the upload is finished, the storage server returns the generated FID to the client, which stores it; the FID is used when the client later operates on the file. The generated FID mainly consists of four parts, namely: the group name of the packet, the disk path, the two-level directory, and a new file name generated by a hash (HASH) algorithm. The group name indicates the storage group storing the corresponding file, and the file indicated by the disk path is stored on the disk of the corresponding storage server.
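As an illustration only, an FID with the four parts described above might be split as follows; the exact '/'-separated layout ("group name / disk path / two-level directory / hashed file name") and all buffer sizes here are assumptions of the sketch.

```c
#include <stdio.h>

typedef struct {
    char group[64];      /* group name of the packet */
    char disk_path[64];  /* disk path, e.g. a virtual disk identifier */
    char subdir[64];     /* two-level directory */
    char filename[256];  /* file name generated by the hash algorithm */
} FidParts;

/* Parse an FID assumed to look like "group1/M00/00/01/abcdef.jpg".
 * Returns 0 on success, -1 if the string does not have the expected shape. */
static int parse_fid(const char *fid, FidParts *out)
{
    char dir1[32], dir2[32];
    if (sscanf(fid, "%63[^/]/%63[^/]/%31[^/]/%31[^/]/%255s",
               out->group, out->disk_path, dir1, dir2, out->filename) != 5)
        return -1;
    snprintf(out->subdir, sizeof out->subdir, "%s/%s", dir1, dir2);
    return 0;
}
```

With such a split, the tracking server only needs the group name to locate the storage group, and the storage server only needs the disk path and two-level directory to locate the file.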
The client downloading the file may include:
the client communicates with the tracking server first, the FID of the file to be downloaded is sent to the tracking server first,
the tracking server returns the IP address and port of the storage server storing the file to the client through the socket,
and the client side directly communicates with the storage server according to the returned information.
In this way, the client can download the file from the corresponding storage server.
The tracking server does not know the storage location of the file, but can find the file requested by the client. This is because the file download request of the client includes the FID of the file to be downloaded. After the tracking server obtains the download request, the storage server group where the file is located can be found according to the group name of the packet in the FID, then a storage server which can be used is found in the storage server group, the IP address and the port of the storage server are returned to the client, and the client actively sends the download request to the storage server. The request received by the storage server also contains the FID of the file, the file is quickly found according to the virtual disk path and the two-level directory in the FID, and then the content of the file is returned through the socket to finish downloading.
When the storage server is started, the same number of reporting threads can be created according to the number of the tracking servers in the configuration file to be respectively communicated with the tracking servers, and the communication request is sent out when the reporting threads of the storage server are started. After the tracking server obtains the reported information, the tracking server saves the grouping and the state information of the storage servers in the grouping in a linked list mode, and stores the storage server in the communication to a specific file, so that the tracking server can read a previously connected storage server list from a local file when being started next time.
The storage servers in the same group synchronize files in a push mode, the source server uploading the files is synchronized to the target server in the group, and the backed-up data does not need to be synchronized, so that a loop is prevented from being formed between the storage servers. And for the storage server which is just added, a certain storage server in the group can synchronize the backup data and the source data to the newly added server. Meanwhile, the storage server which is just added in communicates with the tracking server, the tracking server knows that a new storage server is added into the cluster, and the tracking server returns the storage server list in the group to the newly added storage server and returns the new storage server list in the group to the original storage server.
Files will be synchronized between storage servers of the same group, and files will not be synchronized between different groups. For example, there are now three storage servers numbered 1, 2, 3 in a group. When a file is uploaded to one of the storage servers 1, the file is synchronized to the storage server 2 and the storage server 3 as backup data.
The operations of file synchronization, uploading, downloading and the like are recorded in a binary log file; only the file name is recorded, not the content, and synchronization is then performed according to the binary log file, which simplifies the operation. An incremental synchronization mode is also designed: the position already synchronized is recorded in a file serving as the synchronization record mark, and the next synchronization continues from the recorded position.
In this embodiment, the storage server is configured with a buffer access service logic, and can start a special buffer to cache a file when a certain condition is met, and respond to various operation requests sent by the client by using the file cached in the buffer, so as to accelerate an access rate and reduce unnecessary read-write times of the disk.
Example 2:
the present example provides a data processing method of a storage system, including:
the first step is as follows: the client sends a file download request to the tracking server.
The second step is that: the tracking server returns information of an available storage server according to a load balancing mechanism, wherein the information mainly comprises an IP address and a port number of the storage server.
The third step: the client connects the storage server according to the obtained IP address and the port number, sends a file downloading command, and the storage server analyzes the data packet to obtain the file information of the file to be downloaded. The file download command is one of the read requests.
The fourth step: and the storage server automatically judges whether the buffer area is opened or not according to the corresponding configuration item of the configuration file.
The fifth step: the storage server judges whether the recent file download repetition rate is high according to whether the download count of a recently downloaded file stored in memory exceeds a preset threshold; if so, it continues to the next step; otherwise, it falls back to the normal flow, reads the file directly from the disk and then sends it to the client.
And a sixth step: the storage server traverses the index information of the buffer area, judges whether the buffer area hits the file according to the file Identification (ID), if so, reads the file from the buffer area, updates the index information of the buffer area, puts the index information of the file including the file ID and the file size at the last position of the index area, and executes the eighth step; otherwise, the next step is performed.
The seventh step: the storage server pushes the task into the disk read-write queue; when the disk read-write thread detects that a task has arrived in the queue, it starts reading the content and writing it to the disk. If the content read is smaller than the file size, the network I/O thread is notified to continue reading the file content from the network and writing it to the disk until the whole file has been written. Meanwhile, the index area of the buffer needs to be updated by adding the index information of the file to it, and it is queried whether the buffer has enough memory to accommodate the file.
Eighth step: the storage server writes in a log file, updates the statistical information of the number of downloaded files and updates the statistical information of the number of successful downloaded files so as to send the statistical information to the tracking server when the heartbeat is detected.
The ninth step: and the storage server updates a data structure for counting the download times of the recent file, adds 1 to the download times of the item corresponding to the file, and newly inserts the item if the item does not exist.
The tenth step: and closing the connection and finishing the file downloading.
The buffer area has the following characteristics:
(1) To prevent the hit rate from being too low, the buffer is only opened at specific times. A data structure is newly added: key-value pairs of 10 file names and their download counts, together with the time the buffer was opened. The storage server records the file names and download counts of the last 10 downloads; when the download count of one of the files exceeds 10 and the current time minus the last buffer opening time is less than 30 minutes (each opening is valid for 30 minutes), that is, when both conditions are met simultaneously, the buffer is in the open state.
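A minimal sketch of this data structure and check follows; the type and function names are illustrative assumptions, while the constants (10 files, more than 10 downloads, 30 minutes) come from the text.

```c
#include <stdbool.h>
#include <time.h>

#define RECENT_FILES 10

/* One key-value pair: file name and its download count. */
typedef struct {
    char name[256];
    int downloads;
} RecentEntry;

/* The buffer counts as open when some recent file was downloaded more than
 * 10 times and the last opening happened less than 30 minutes ago. */
static bool buffer_is_open(const RecentEntry recent[RECENT_FILES],
                           time_t now, time_t last_open_time)
{
    if (difftime(now, last_open_time) >= 30 * 60)
        return false;
    for (int i = 0; i < RECENT_FILES; i++)
        if (recent[i].downloads > 10)
            return true;
    return false;
}
```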
(2) Each time a file is downloaded, it is first judged whether the buffer is open, that is, the 10 recently accessed files are traversed to check whether some download count exceeds 10 and whether the current time minus the last buffer opening time is less than 30 minutes.
(3) The cache information recording the file names and download counts is updated: the download count of a file is updated at each download, and the buffer is closed when the current time minus the last opening time exceeds 30 minutes. The file names and download counts are reset every 10 minutes.
(4) The buffer size is read from the configuration file when the system is initialized, and defaults to 100 MB;
files in the buffer are stored in a linked manner rather than in groups: the distributed file system here is lightweight, grouping would increase the complexity of the system, and the cached files differ in size;
files larger than the total size of the buffer cannot be cached. To prevent dirty reads, each modify operation from the client first checks whether the operated file is in the buffer; if it is, the corresponding area is set to invalid;
each write into the buffer starts from g_pInsertPointer, i.e. the buffer is written continuously and sequentially;
g_pInsertPointer is initialized to 0; on each write it is updated to the start position of the written file plus the file size, i.e. the end position of the file is assigned to this global variable. The g_MemHeader structure is updated at the same time;
the buffer is locked for each write and unlocked after the write is finished;
(5) The data structure of the cache information of the buffer may include the following fields:
and the MEMITEM structure records the index information of each file, including the file name, the file size and the offset of the file in the buffer. It is defined as follows:
typedef struct {
char Filename[MAX_PATH]; // file name
long long FileSize; // file size
long long Offset; // offset of the file in the buffer
} MEMITEM;
The cache information structure comprises the following fields:
static MEMITEM g_MemHeader[MAX_ITEM]; // buffer header structure
static int g_pInsertPointer; // tail pointer
static bool g_bIsOpenCache_System; // flag indicating whether the buffer is open
static char MemCache[MEMCACHE_SIZE]; // the buffer
g_MemHeader stores the file information of the files cached in the buffer and is an array composed of MEMITEM entries.
g_pInsertPointer records the address obtained by adding the file size to the start address of the last file in the buffer; caching of the next file starts from that address. MemCache is the buffer holding the actual files; it is pre-allocated when the system is initialized to prevent repeated allocation and release from affecting system performance.
The search process of the buffer is as follows: each entry (item) in g_MemHeader is traversed sequentially; a match is a hit, and otherwise the file is not in the buffer. Each entry holds the file information of one file.
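This sequential lookup can be sketched as a self-contained fragment; the entry layout restates the index-entry structure above with normalized names, and the sizes and the g_ItemCount counter are assumptions of the sketch.

```c
#include <string.h>

#define MAX_PATH 260
#define MAX_ITEM 1024

typedef struct {
    char Filename[MAX_PATH];  /* file name */
    long long FileSize;       /* file size */
    long long Offset;         /* offset of the file in the buffer */
} MEMITEM;

static MEMITEM g_MemHeader[MAX_ITEM];  /* buffer header: one entry per cached file */
static int g_ItemCount;                /* number of valid entries (illustrative) */

/* Traverse the header sequentially; return the entry index on a hit, -1 on a miss. */
static int find_in_buffer(const char *filename)
{
    for (int i = 0; i < g_ItemCount; i++)
        if (strcmp(g_MemHeader[i].Filename, filename) == 0)
            return i;
    return -1;
}
```

A linear scan suffices here because the header is small and kept in time order, which the update operation below relies on.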
Updating the buffer: each file to be cached is written into memory starting at g_pInsertPointer (if the space from g_pInsertPointer to the end of the buffer is smaller than the file to be cached, writing wraps around and continues from the beginning). Then g_pInsertPointer is updated to (g_pInsertPointer + file size) % (buffer size), the g_MemHeader structure is traversed, the entries of files covered by the newly added file are deleted (computed from the buffer range covered by the newly added file), and finally the file information of the newly added file is appended to the end of g_MemHeader, which completes the update. Note that an exclusive lock must be held on the whole buffer during the entire update operation.
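Deciding which header entries a wrap-around write covers reduces to an overlap test between ranges on a circular buffer. A sketch follows; the helper name and the use of signed arithmetic are assumptions of this illustration.

```c
#define MEMCACHE_SIZE (100LL * 1024 * 1024)  /* buffer size, 100 MB default */

/* Do the (possibly wrapping) ranges [a, a+alen) and [b, b+blen) overlap
 * on a circular buffer of MEMCACHE_SIZE bytes?  Entries whose cached range
 * overlaps the range of a newly written file are the ones to delete. */
static int ranges_overlap(long long a, long long alen, long long b, long long blen)
{
    /* forward distance from a to b around the ring */
    long long fwd = ((b - a) % MEMCACHE_SIZE + MEMCACHE_SIZE) % MEMCACHE_SIZE;
    if (fwd < alen) return 1;  /* b begins inside a's range */
    long long back = ((a - b) % MEMCACHE_SIZE + MEMCACHE_SIZE) % MEMCACHE_SIZE;
    return back < blen;        /* a begins inside b's range */
}
```

The double modulo keeps the distances non-negative in C, whose % operator can return negative values for negative operands.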
In summary, each update operation guarantees that the entries in g_MemHeader are ordered by time, with the file added first at the top, and that the files in the buffer, starting from g_pInsertPointer, are also in chronological order.
For example, the data structure of g_MemHeader may be a first-in-first-out queue.
The file uploading, modifying and deleting operations also need to modify the files cached in the buffer and the cache information, because an update operation on a file could otherwise cause the data read from the buffer to be "dirty data". Therefore, the update flow of a file is modified so that, when a file is modified, it is first judged whether the file to be modified is in the buffer. The details are as follows:
when a file is updated, the configuration file is checked to determine whether the caching policy is enabled; if not, the file is modified according to the original flow. If it is enabled, the structure storing recently downloaded files is traversed to see whether it contains the file to be modified; if not, the file is modified according to the original logic. If it does, the file has been cached and the buffer contents must be modified as well.
Lock the buffer, traverse the buffer header to find the start address of the file in the buffer, modify the buffer contents according to the offset of the modification, and then release the buffer lock. Finally, modify the file on disk according to the original logic; if an error occurs during modification, delete the file from the buffer as well.
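A minimal sketch of the locked in-place modification just described (lock the buffer, locate the file, patch it at the given offset, unlock). The exclusive lock and the delete-on-error behaviour come from the text; every name and the error condition chosen here are assumptions for illustration:

```python
import threading

buf_lock = threading.Lock()   # exclusive lock over the whole buffer
buffer = bytearray(64)        # sketch of the MEMCACHE region
header = {"f1": (8, 6)}       # file_id -> (start, size), as in g_MEMHEADER

def update_cached_file(file_id, offset, data):
    """Patch a cached file in place; drop it from the cache on error."""
    with buf_lock:
        entry = header.get(file_id)
        if entry is None:
            return False               # not cached; caller follows original flow
        start, size = entry
        try:
            if offset + len(data) > size:
                raise ValueError("write past end of cached file")
            buffer[start + offset:start + offset + len(data)] = data
            return True
        except ValueError:
            del header[file_id]        # error while modifying: evict the file
            return False
```

In a real implementation the on-disk modification would follow the in-buffer patch, as the flow above specifies.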
In summary, the present example provides a method and system for implementing a distributed file system cache, which address the poor access performance of a distributed file system during file synchronization by adding a cache module. The core idea is as follows: a cache module is added to the storage server of the distributed file system, and before each file download the system checks whether the condition for opening the buffer is met, namely whether the repetition rate of recently downloaded files is high. If so, the buffer is opened and is searched before each read of the local disk: on a hit the file is read directly from memory, accelerating reads; on a miss the file is loaded from disk, and the buffer header and the actual storage position in memory are updated according to the rules above. If the condition is not met, the file is downloaded according to the normal logic.
Example 3:
as shown in fig. 6, the present example provides a data processing method of a storage system, including:
the client communicates with the tracking server to obtain the IP address and the port number of an available storage server;
the client establishes connection with a storage server according to the IP address and the port number;
the storage server analyzes a data packet sent by the client to obtain file information, wherein the data packet is a request data packet of the client for requesting to operate the file;
judging whether the buffer area is opened or not;
if the buffer area is not opened, directly downloading the file from the disk;
if the buffer is opened, the storage server determines whether the file is in the buffer according to the parsed file information (e.g., file name or FID) and the like;
if the file is in the buffer, directly reading the file from the buffer;
and if the file is not in the buffer area, the read request is pushed into a task queue, and the file is read from the disk according to the response order of the task queue.
Judging whether enough residual space exists in the buffer area;
if the residual space is enough, writing the file read from the disk into a buffer area, and then sending the file back to the client through the network;
if there is not enough space left, the file is written starting from where the last buffer update operation ended, overwriting older data, while the buffer index is updated at the same time.
Writing a log;
and executing a downloading callback function and processing ending work.
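The download path of Example 3 (check the buffer flag, try the cache, fall back to disk, then cache the result for later requests) might be sketched as follows. The disk read and the network send-back are stubbed out, and all names are assumptions, not the patent's API:

```python
def handle_download(file_id, cache, cache_open, read_from_disk):
    """Sketch of the Example 3 read path.

    cache: dict file_id -> bytes; cache_open: the buffer flag;
    read_from_disk: callable file_id -> bytes.
    """
    if not cache_open:
        return read_from_disk(file_id)   # buffer closed: normal flow
    data = cache.get(file_id)
    if data is not None:
        return data                      # cache hit: serve from memory
    data = read_from_disk(file_id)       # miss: read from disk (via task queue)
    cache[file_id] = data                # then write into the buffer
    return data
```

The second request for the same file is served from memory without touching the disk, which is the access-performance gain the example claims.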
Example 4:
as shown in fig. 7, the present example provides a data processing method of a storage system, including:
judging whether the residual space is larger than the size of the file to be written;
if yes, the file is written starting from the tail pointer g_pInsertPointer;
if not, the write wraps around and continues from the head of the buffer once it reaches the tail of the buffer; the position pointed to by the tail pointer is updated synchronously as the file is written;
deleting the index of the file covered by the current written file in g _ MEMHEADER;
inserting the file index of the current written file into the tail of g _ MEMHEADER;
judging whether the current writing reaches the tail of the buffer area;
if the tail of the buffer has been reached, the g_pInsertPointer value is updated by adding the size of the file and then taking the remainder modulo the size of the buffer;
if the tail of the buffer has not been reached, the g_pInsertPointer pointer is updated to point to the end of the newly added file.
The buffer is unlocked.
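The write flow of Example 4 (write at the tail pointer, wrap at the buffer end, take the new pointer modulo the buffer size) can be illustrated with a small circular-buffer sketch; the function name and signature are hypothetical:

```python
def write_circular(buf, insert_ptr, data):
    """Write `data` into `buf` at insert_ptr, wrapping at the buffer end.

    Returns the updated tail pointer: (insert_ptr + len(data)) % len(buf).
    """
    n, size = len(buf), len(data)
    assert size <= n, "file larger than the whole buffer"
    first = min(size, n - insert_ptr)       # bytes before the wrap point
    buf[insert_ptr:insert_ptr + first] = data[:first]
    buf[0:size - first] = data[first:]      # wrapped remainder, if any
    return (insert_ptr + size) % n
```

A write that reaches the buffer tail continues from the head, and the returned pointer is exactly the "(old pointer + file size) modulo buffer size" rule from the description.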
Example 5:
as shown in fig. 8, the present example provides a data processing method of a storage system, including:
it is determined whether the file to be updated is in the buffer;
if so, the file is updated directly in the buffer;
if the file is not in the buffer, the corresponding file is cached first;
the operation of updating the file in the buffer is entered.
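The Example 5 flow (update in place if cached; otherwise cache the file first and then perform the in-buffer update) can be sketched as follows; this is purely illustrative and all names are assumptions:

```python
def update_file(file_id, offset, patch, cache, load_from_disk):
    """Example 5 sketch: ensure the file is in the buffer, then
    apply the modification to the cached copy at the given offset."""
    if file_id not in cache:                       # not cached: cache it first
        cache[file_id] = bytearray(load_from_disk(file_id))
    buf = cache[file_id]
    buf[offset:offset + len(patch)] = patch        # update inside the buffer
    return bytes(buf)
```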
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may be separately used as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (16)

1. A data processing method of a storage system, comprising:
determining whether a first file requesting an operation is cached in a buffer;
if the first file is cached in the buffer area, accessing the first file in the buffer area; wherein,
if the first file is cached in the buffer area, accessing the first file located in the buffer area, including at least one of:
if the first file is cached in the buffer area, reading the first file from the buffer area;
if the first file is cached in the buffer area, updating the first file in the buffer area;
and if the first file is cached in the cache region, deleting the first file in the cache region.
2. The method of claim 1,
if the first file is cached in the buffer area, updating the first file located in the buffer area, including:
if the first file is cached in the buffer area, locking the buffer area, wherein the lock of the buffer area is used for suspending the read operation response of the buffer area;
modifying the first file located in the buffer;
unlocking the buffer after the modification of the first file is completed.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
determining whether a buffer start condition is satisfied;
if the starting condition of the buffer area is met, caching N files meeting the preset condition in the buffer area, wherein N is a positive integer.
4. The method of claim 2,
the determining whether the buffer starting condition is met includes:
and determining whether the starting condition of the buffer area is met or not according to the access condition information of the file in a preset time period.
5. The method of claim 4,
the determining whether the buffer starting condition is met according to the access condition information of the file in the preset time period includes at least one of the following steps:
if the access times of at least one file in the preset time period are larger than the access time threshold, determining that the starting condition of the buffer area is met;
if the access times of at least one file in the preset time period are larger than the access time threshold value, and the time interval between the access time of the last access of the at least one file and the current time is smaller than a first time interval, determining that the starting condition of the buffer area is met;
and if the access times of at least one file in the preset time period are larger than the access time threshold value and the time interval between the current time and the last starting time of the buffer area is smaller than a second time interval, determining that the starting condition of the buffer area is met.
6. The method of claim 3, further comprising:
and when the buffer area is closed or the file of the buffer area is updated, synchronizing the file with the update in the buffer area to a disk.
7. The method of claim 1, further comprising:
determining the cache information of the buffer area; wherein the cache information comprises at least one of:
the buffer area head structure body is used for storing the file information cached in the buffer area;
a tail pointer for indicating the tail address of the last file written in the buffer;
a buffer flag for indicating whether the buffer is open;
and buffer area information, used for indicating the buffer area corresponding to the cache information.
8. The method of claim 7, further comprising:
determining whether the buffer area has enough residual space according to the file size of the second file to be written into the buffer area;
and if the buffer area has enough residual space, writing the second file into the residual space.
9. The method of claim 8, further comprising at least one of:
if the buffer area does not have enough residual space, writing the second file from the tail pointer of the buffer area;
and if the second file is written to the tail part of the buffer area, continuing to write the second file from the head part of the buffer area.
10. The method of claim 9, further comprising:
and updating the cache information of the buffer area according to the condition that the second file is written into the buffer area.
11. The method according to claim 8, wherein the updating the cache information of the buffer according to the writing condition of the second file into the buffer comprises at least one of:
adding the file information of the second file in a buffer area head structure body of the cache information;
deleting file information of a third file covered by the second file within the buffer header structure;
and updating the tail pointer of the cache region according to the cache condition of the second file.
12. The method of claim 11, wherein the updating the tail pointer of the buffer according to the buffer status of the second file comprises at least one of:
if the buffer area has enough residual space, the cache address pointed by the tail pointer is shifted according to the file size of the second file;
and if the buffer area does not have enough residual space, adding the file size of the second file to the cache address currently indicated by the tail pointer and taking the remainder modulo the space size of the buffer area to obtain the updated tail pointer.
13. The method of claim 1, further comprising:
and if the first file is not positioned in the buffer area, accessing the first file from a disk.
14. A data processing apparatus of a storage system, comprising:
the determining module is used for determining whether the first file requesting the operation is cached in a buffer area;
the cache access module is used for accessing the first file in the buffer area if the first file is cached in the buffer area;
the cache access module is specifically configured to execute at least one of the following:
if the first file is cached in the buffer area, reading the first file from the buffer area;
if the first file is cached in the buffer area, updating the first file in the buffer area;
and if the first file is cached in the cache region, deleting the first file in the cache region.
15. A storage server, comprising:
a transceiver, configured to receive and send signals;
a memory, configured to store information;
a processor, connected to the transceiver and the memory respectively, for controlling the transceiver and the memory to store information by executing computer-executable instructions located on the memory, and implementing the method provided in any one of claims 1 to 13.
16. A computer storage medium having computer-executable instructions stored thereon; the computer-executable instructions, when executed, enable the method provided in any one of claims 1 to 13 to be carried out.
CN201810607955.1A 2018-06-13 2018-06-13 Data processing method and device of storage system, storage server and storage medium Pending CN110597452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810607955.1A CN110597452A (en) 2018-06-13 2018-06-13 Data processing method and device of storage system, storage server and storage medium

Publications (1)

Publication Number Publication Date
CN110597452A true CN110597452A (en) 2019-12-20

Family

ID=68849131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810607955.1A Pending CN110597452A (en) 2018-06-13 2018-06-13 Data processing method and device of storage system, storage server and storage medium

Country Status (1)

Country Link
CN (1) CN110597452A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209263A (en) * 2020-01-14 2020-05-29 中国建设银行股份有限公司 Data storage method, device, equipment and storage medium
CN111581158A (en) * 2020-05-04 2020-08-25 上海维信荟智金融科技有限公司 Distributed file storage method and system
CN112069139A (en) * 2020-07-28 2020-12-11 重庆攸亮科技股份有限公司 High-speed read-write cyclic coverage file transmission system and transmission method
CN113032356A (en) * 2021-03-31 2021-06-25 中电科航空电子有限公司 Cabin distributed file storage system and implementation method
CN113487278A (en) * 2021-07-02 2021-10-08 钦州云之汇大数据科技有限公司 Enterprise cooperative office system based on Internet of things
CN113885780A (en) * 2021-02-10 2022-01-04 京东科技控股股份有限公司 Data synchronization method, device, electronic equipment, system and storage medium
CN113992651A (en) * 2021-09-24 2022-01-28 深圳市有方科技股份有限公司 Downloading method based on File Transfer Protocol (FTP) and related product
CN114124926A (en) * 2021-04-02 2022-03-01 北京京东拓先科技有限公司 Data transmission method, device, storage medium and computer program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020178178A1 (en) * 2001-04-24 2002-11-28 Luosheng Peng Apparatus and methods for intelligently caching applications and data on a gateway
CN101075241A (en) * 2006-12-26 2007-11-21 腾讯科技(深圳)有限公司 Method and system for processing buffer
CN101188544A (en) * 2007-12-04 2008-05-28 浙江大学 File transfer method for distributed file server based on buffer
CN102902630A (en) * 2012-08-23 2013-01-30 深圳市同洲电子股份有限公司 Method and device for accessing local file



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination