CN111782622B - Log processing method, device, server and storage medium



Publication number
CN111782622B
Authority
CN
China
Prior art keywords
log
hard disk
shared
file system
logs
Prior art date
Legal status
Active
Application number
CN202010923654.7A
Other languages
Chinese (zh)
Other versions
CN111782622A (en)
Inventor
王利虎
李飞飞
颜红波
刘兴奎
刘攀
刘振军
Current Assignee
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202010923654.7A
Publication of CN111782622A
Application granted
Publication of CN111782622B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of this specification provide a log processing method, a log processing device, a server and a storage medium. Data and logs are stored on separate hard disks: the logs have a hard disk to themselves and are not affected by data processing tasks. Because the shared log hard disk holds no other data, logs can be written to it sequentially, which preserves log write performance. The shared log hard disk stores the logs of multiple file system hard disks, so one log hard disk is shared by several file systems, which reduces wasted space.

Description

Log processing method, device, server and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a log processing method, an apparatus, a server, and a storage medium.
Background
Network equipment, systems, service programs and the like generate logs while running; each log entry records the date, time, user and action. A server therefore has to store logs in addition to its other data, and a better log processing scheme is needed.
Disclosure of Invention
To overcome the problems in the related art, the present specification provides a log processing method, apparatus, server, and storage medium.
According to a first aspect of the embodiments of this specification, a server is provided. The server is configured with at least two hard disks: at least one shared log hard disk, which stores the logs generated by file systems, and at least one file system hard disk, which stores data written by at least one file system.
The server runs a shared log process, which is configured to:
acquire the logs generated when a file system operates on the data, and write the logs sequentially to the shared log hard disk.
According to a second aspect of the embodiments of this specification, a log processing method is provided. The method is applied to a server configured with at least one file system hard disk and at least one shared log hard disk; the file system hard disk stores data written by the corresponding file system, and the shared log hard disk sequentially stores the logs generated by at least one file system. The method comprises:
acquiring a log generated when the file system operates on the data; and
writing the log sequentially to the shared log hard disk.
According to a third aspect of the embodiments of this specification, a log processing apparatus is provided. The apparatus is applied to a server configured with at least one file system hard disk and at least one shared log hard disk; the file system hard disk stores data written by the corresponding file system, and the shared log hard disk sequentially stores the logs generated by at least one file system. The apparatus comprises:
a log acquisition module, configured to acquire a log generated when the file system operates on the data; and
a log writing module, configured to write the log sequentially to the shared log hard disk.
According to a fourth aspect of embodiments of the present specification, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an embodiment of the log processing method as described above.
The technical scheme provided by the embodiment of the specification can have the following beneficial effects:
in the embodiments of this specification, data and logs are stored on separate hard disks: the logs have a hard disk to themselves and are not affected by data processing tasks. Because the shared log hard disk holds no other data, logs can be written to it sequentially, which preserves log write performance. The shared log hard disk stores the logs of multiple file system hard disks, so one log hard disk is shared by several file systems, which reduces wasted space.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
FIG. 1A is a schematic diagram of a distributed system according to an exemplary embodiment.
FIG. 1B is a flowchart of a log processing method according to an exemplary embodiment of this specification.
FIG. 1C is a schematic diagram of log storage in a shared log hard disk according to an exemplary embodiment.
FIG. 1D is a schematic diagram of a shared log hard disk according to an exemplary embodiment of this specification.
FIG. 1E is a schematic diagram of creating an in-memory index according to an exemplary embodiment of this specification.
FIG. 1F is a schematic diagram of the interaction between a shared log process and a file system process according to an exemplary embodiment.
FIG. 1G is a schematic diagram of the three stages of storing a log in a shared log hard disk according to an exemplary embodiment of this specification.
FIG. 1H is a schematic diagram of moving logs within a shared log hard disk according to an exemplary embodiment.
FIG. 2 is a block diagram of a log processing apparatus according to an exemplary embodiment of this specification.
FIG. 3 is a hardware structure diagram of a computer device in which the log processing apparatus of this specification is located.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
As shown in FIG. 1A, which is a schematic diagram of a distributed system according to an exemplary embodiment, the distributed system includes a plurality of servers; FIG. 1A shows servers 1 to n. Each server is configured with at least two hard disks, which, distinguished by their function in the embodiments of this specification, include at least one file system hard disk and at least one shared log hard disk. In practice, the server of this embodiment may also be used outside a distributed scenario, for example as a standalone server.
The file system hard disks store data. Data stored on one or more file system hard disks is managed by one or more file systems. For example, each file system hard disk may be managed by its own independent file system that stores data on it. In other examples, two or more hard disks may be virtualized into one large hard disk on which a single file system is deployed to manage the data stored there.
The shared log hard disk sequentially stores the logs generated when the file systems operate on the data they manage.
The file system hard disk in this embodiment may be a solid-state drive, and so may the shared log hard disk; in practice, other types of hard disk may also be used, which is not limited in the embodiments of this specification.
In practice, the server may also be configured with hard disks serving other functions, which is not limited in the embodiments of this specification; for example, the server may have an additional hard disk dedicated to data associated with the server's operating system, such as operating system program files and the various computer program files installed in the operating system.
In this embodiment, the server runs an operating system, and a shared log process runs on that operating system. The shared log process acquires the logs generated when the file systems operate on the data and writes them sequentially to the shared log hard disk. When the operating system invokes a file system, the file system accesses its logs by communicating with the shared log process, and in this way manages the data.
As an example, FIG. 1B is a flowchart of a log processing method according to an exemplary embodiment of this specification, which includes the following steps:
Step 102: acquire a log generated when the file system operates on the data;
Step 104: write the log sequentially to the shared log hard disk.
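Steps 102 and 104 can be pictured with a minimal Python sketch. It assumes, for illustration only, that the shared log hard disk is modelled as an in-memory byte array and that a log is a short byte string; `SharedLogDisk` and `acquire_log` are hypothetical names, not part of the disclosure.

```python
from typing import List, Tuple


class SharedLogDisk:
    """Hypothetical stand-in for a shared log hard disk: an append-only byte array."""

    def __init__(self) -> None:
        self.media = bytearray()                        # raw device contents
        self.records: List[Tuple[int, int, int]] = []   # (fs_id, offset, length) per log

    def append(self, fs_id: int, content: bytes) -> int:
        """Step 104: write one log at the current end of the device; return its offset."""
        offset = len(self.media)   # sequential: the next write always goes at the end
        self.media += content
        self.records.append((fs_id, offset, len(content)))
        return offset


def acquire_log(fs_id: int, operation: str) -> bytes:
    """Step 102 (illustrative): the log a file system produces for one data operation."""
    return f"fs{fs_id}:{operation}".encode()


# Two different file systems share the same log device.
disk = SharedLogDisk()
first = disk.append(1, acquire_log(1, "write A"))
second = disk.append(2, acquire_log(2, "write B"))
```

Both file systems' logs land back to back on one device, which is the sharing this embodiment relies on.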
In the embodiments of this specification, data and logs are stored on separate hard disks: the logs have a hard disk to themselves and are not affected by data processing tasks. Because the shared log hard disk holds no other data, logs can be written to it sequentially, which preserves log write performance. The shared log hard disk stores the logs of the file system hard disks, so the log device is shared and wasted space is reduced.
In this embodiment, the server may receive from a requester a request to modify data stored on a file system hard disk, and may process it asynchronously. For example, after receiving the modification request, the server may return a modification-complete message to the requester as soon as the log corresponding to the modification has been generated, and then modify the data according to the request once the log has been stored sequentially on the shared log hard disk.
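The asynchronous flow just described can be sketched as follows, under simplifying assumptions: an in-process queue stands in for pending work, a Python list stands in for the shared log hard disk, and a dict stands in for the file system hard disk. `AsyncModifyServer`, `handle_modify` and `background_step` are hypothetical names.

```python
from collections import deque


class AsyncModifyServer:
    """Hypothetical model: acknowledge once the log exists, apply after it is stored."""

    def __init__(self) -> None:
        self.data = {}           # stands in for the file system hard disk
        self.shared_log = []     # stands in for the shared log hard disk (append-only)
        self.pending = deque()   # logs generated but not yet stored

    def handle_modify(self, key, value) -> str:
        log = ("set", key, value)      # log for this modification operation
        self.pending.append(log)
        return "modify-complete"       # reply to the requester right away

    def background_step(self) -> bool:
        """Store the oldest pending log sequentially, then apply the modification."""
        if not self.pending:
            return False
        log = self.pending.popleft()
        self.shared_log.append(log)    # 1) the log reaches the shared log disk first
        _, key, value = log
        self.data[key] = value         # 2) only then is the data modified
        return True


server = AsyncModifyServer()
reply = server.handle_modify("a", 1)
data_before = dict(server.data)        # nothing applied yet; only the log exists
while server.background_step():
    pass
```

The requester receives `modify-complete` while `data` is still unchanged; the data is modified only after the corresponding log has been appended to `shared_log`.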
The scheme of this embodiment involves data on the file system hard disks and shared logs on the shared log hard disk. Optionally, the server may handle them with separate processes. As an example, the server runs one file system process for each file system, plus a shared log process.
Optionally, a file system process may be configured to send the shared log process a log write request carrying the log for a modification of data, and to send the shared log process a log read request.
Optionally, the shared log process may be configured to store the log sequentially on the shared log hard disk when it receives a log write request, and, when it receives a log read request, to read the corresponding log from the shared log hard disk and send it to the file system process.
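A minimal sketch of the two request types and how the shared log process might dispatch them. The request classes, their fields, and the linear search used for reads are assumptions for illustration; the embodiment described later replaces such a search with an in-memory index.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class LogWriteRequest:
    fs_id: int        # which file system the log belongs to
    wal_id: int
    content: bytes


@dataclass
class LogReadRequest:
    fs_id: int
    wal_id: int


class SharedLogProcess:
    """Hypothetical request handler; a Python list stands in for the shared log disk."""

    def __init__(self) -> None:
        self.log_disk: List[Tuple[int, int, bytes]] = []   # records in arrival order

    def handle(self, req: Union[LogWriteRequest, LogReadRequest]) -> Union[int, Optional[bytes]]:
        if isinstance(req, LogWriteRequest):
            self.log_disk.append((req.fs_id, req.wal_id, req.content))  # sequential append
            return len(self.log_disk) - 1                                # position written
        if isinstance(req, LogReadRequest):
            # Without an index a read must search; the newest matching record wins.
            for fs_id, wal_id, content in reversed(self.log_disk):
                if (fs_id, wal_id) == (req.fs_id, req.wal_id):
                    return content
            return None
        raise TypeError(f"unsupported request: {req!r}")


proc = SharedLogProcess()
pos_a = proc.handle(LogWriteRequest(fs_id=1, wal_id=7, content=b"create /x"))
pos_b = proc.handle(LogWriteRequest(fs_id=2, wal_id=7, content=b"delete /y"))
```

The two logs share the same `wal_id` but come from different file systems, so `fs_id` is what keeps them apart on the shared device.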
As the foregoing embodiments show, because data in the server of this embodiment may be modified frequently, the file system hard disks often need to perform garbage collection, whereas log writes need to execute quickly.
Take the solid-state drive (SSD) as an example: an SSD is a storage device that uses semiconductor flash memory (NAND Flash) as its medium. A traditional hard disk drive (HDD) uses a mechanical "motor + actuator arm + platter" structure, whereas the SSD uses flash memory as the medium and a semiconductor chip as the controller, which gives it access performance far exceeding that of an HDD.
The basic storage unit of an SSD is the flash chip (die). A chip consists of multiple BLOCKs (typically 256 KB to 4 MB each), and the smallest access unit is the PAGE (typically 4 KB). Flash chips support random access (unlike an HDD, which performs well mainly on sequential access) and high-performance reads (about 100 microseconds), but writes are restricted. After a PAGE is formatted, it holds all 0s or all 1s; once the user writes it, its bits form the 0/1 pattern of the actual data. To write the same PAGE a second time, the whole BLOCK must first be read out and erased back to all 0s or all 1s; the data read out is then merged with the new PAGE contents and written back to the BLOCK as a whole. This entire process takes milliseconds.
To avoid erasing and rewriting a whole BLOCK every time a flash PAGE is written a second time, SSD controllers usually handle writes as follows:
1. each write is placed on an already erased PAGE;
2. if the data already exists elsewhere on the drive, the PAGE holding the old copy is marked as a garbage PAGE;
3. when the garbage PAGEs on a BLOCK exceed a certain proportion, background garbage collection is run:
read the BLOCK's data as a whole; write the valid data to another, already erased BLOCK; erase the current BLOCK.
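The three controller rules above can be illustrated with a toy flash translation layer. This is a simplified model under stated assumptions (4 PAGEs per BLOCK, a 75% garbage threshold, first-fit allocation of erased PAGEs, no ordering constraints within a BLOCK); real SSD firmware is far more involved, and every name here is hypothetical.

```python
from typing import Optional, Tuple

ERASED, GARBAGE = None, "garbage"   # PAGE states; any other value is a logical page number
PAGES_PER_BLOCK = 4                 # toy value; real BLOCKs contain far more PAGEs


class TinyFTL:
    """Toy flash translation layer following rules 1-3 above (not real firmware)."""

    def __init__(self, num_blocks: int, gc_ratio: float = 0.75) -> None:
        self.gc_ratio = gc_ratio
        self.pages = [[ERASED] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.l2p = {}        # logical page number -> (block, page) currently holding it
        self.erase_count = 0

    def _free_slot(self, exclude: Optional[int] = None) -> Tuple[int, int]:
        """First erased PAGE found, skipping the BLOCK being collected."""
        for b, block in enumerate(self.pages):
            if b == exclude:
                continue
            for p, state in enumerate(block):
                if state is ERASED:
                    return b, p
        raise RuntimeError("no erased PAGE left")

    def write(self, lpn: int) -> None:
        b, p = self._free_slot()        # rule 1: new data always goes to an erased PAGE
        old = self.l2p.get(lpn)
        self.pages[b][p] = lpn
        self.l2p[lpn] = (b, p)
        if old is not None:             # rule 2: the previous copy becomes a garbage PAGE
            ob, op = old
            self.pages[ob][op] = GARBAGE
            self._maybe_collect(ob)

    def _maybe_collect(self, b: int) -> None:
        block = self.pages[b]
        if block.count(GARBAGE) / PAGES_PER_BLOCK < self.gc_ratio:
            return
        # rule 3: copy still-valid PAGEs to erased PAGEs elsewhere, then erase this BLOCK
        for lpn in [s for s in block if s is not ERASED and s != GARBAGE]:
            nb, np_ = self._free_slot(exclude=b)
            self.pages[nb][np_] = lpn
            self.l2p[lpn] = (nb, np_)
        self.pages[b] = [ERASED] * PAGES_PER_BLOCK
        self.erase_count += 1


ftl = TinyFTL(num_blocks=3)
for lpn in range(4):      # fill BLOCK 0
    ftl.write(lpn)
for lpn in range(3):      # overwrite three of those pages out of place
    ftl.write(lpn)
```

After three of the four logical pages in BLOCK 0 are rewritten, BLOCK 0 reaches the threshold, its one remaining valid PAGE is relocated to BLOCK 1, and BLOCK 0 is erased. That relocation is the background traffic that competes with user writes for bandwidth.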
Because new data is always written to an empty PAGE, the SSD's write latency is preserved; however, background garbage collection competes with user data for the SSD's bandwidth, reducing the bandwidth available to the user.
When a user writes a large amount of random data to the SSD, some PAGEs are rewritten repeatedly and the SSD must keep allocating new PAGEs for them, which increases the computational burden on the SSD controller. Heavy random writes also scatter data across the BLOCKs of each chip, so the background garbage collection algorithm has difficulty finding BLOCKs in which garbage PAGEs are concentrated (PAGE aggregation).
On the other hand, when data and logs are stored on the same hard disk, the hard disk must be divided into an area dedicated to data and an area dedicated to logs, and the log area is usually not full. Current data servers can hold many hard disks, and if each hard disk is formatted as an independent file system, each one accesses its data independently; every hard disk then reserves its own log area, which wastes resources.
In the embodiments of this specification, data and logs are stored on separate hard disks: the logs have a hard disk to themselves and are not affected by data processing tasks. Because the shared log hard disk holds no other data, logs can be written to it sequentially, which preserves log write performance. The shared log hard disk stores the logs of the file system hard disks, so the log device is shared and wasted space is reduced.
The log in the embodiments of this specification may be a write-ahead log. In file systems, write-ahead logging (WAL) is a technique for ensuring data consistency that provides atomicity and durability (two of the ACID properties of databases). In a storage system that uses WAL, every modification is written to the log file before it is written to the location where the data is actually stored. Suppose a program loses power while performing some operations; on restart, it may need to know whether each operation succeeded, partially succeeded, or failed. With WAL, the program can check the log file and compare what it had planned to do when power was lost with what was actually done. Based on this comparison, it can decide whether to undo the operation, complete it, or leave it as it is. In file systems, WAL is commonly called journaling, and it is a technique that file systems and other storage systems generally implement. Applying the scheme of the embodiments of this specification to WAL helps ensure WAL performance.
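The redo half of that recovery decision can be sketched in a few lines. The sketch assumes that each log record carries an increasing log sequence number (LSN) and that the data store persists the highest LSN it has applied; both are common WAL conventions but are assumptions here, the names are hypothetical, and undo handling is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LogRecord = Tuple[int, str, int]          # (lsn, key, value); layout is illustrative


@dataclass
class DataStore:
    data: Dict[str, int] = field(default_factory=dict)
    applied_lsn: int = 0                  # highest LSN already reflected in data


def write_with_wal(log: List[LogRecord], store: DataStore, lsn: int, key: str,
                   value: int, crash_before_apply: bool = False) -> None:
    log.append((lsn, key, value))         # 1) the modification is logged first
    if crash_before_apply:
        return                            # simulated power loss before the data write
    store.data[key] = value               # 2) then written to where the data lives
    store.applied_lsn = lsn


def recover(log: List[LogRecord], store: DataStore) -> int:
    """Compare the log with what was applied and redo the missing operations."""
    redone = 0
    for lsn, key, value in log:
        if lsn > store.applied_lsn:       # logged but never applied: complete it
            store.data[key] = value
            store.applied_lsn = lsn
            redone += 1
    return redone


log: List[LogRecord] = []
store = DataStore()
write_with_wal(log, store, 1, "a", 1)
write_with_wal(log, store, 2, "b", 2, crash_before_apply=True)
state_after_crash = dict(store.data)
redone_count = recover(log, store)
```

Running `recover` a second time redoes nothing, because `applied_lsn` already covers every logged record.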
In the embodiments of this specification, each file system process must communicate with the shared log process, for example to send requests and exchange data. In some examples, they communicate through a message queue in shared memory: a dedicated region of the server's memory is set aside for the file system processes and the shared log process, and a message queue that manages messages runs in this region. Requests that a file system process sends to the shared log process, and data that the shared log process returns to a file system process, are placed in this region by the message queue. A file system process retrieves the data addressed to it by accessing the region, and the shared log process retrieves the requests sent by the file system processes in the same way. Because this communication is not disturbed by the server's other tasks, processing efficiency improves and latency is reduced.
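A single-producer, single-consumer ring buffer laid out inside one memory region illustrates the idea of a shared-memory message queue. The layout (two 64-bit counters followed by fixed-size slots, each with a 32-bit length prefix) and the class name `RingQueue` are assumptions, and one such queue would be needed per direction. The demo uses a plain `bytearray`; between real processes the buffer could, for example, be the `buf` of a `multiprocessing.shared_memory.SharedMemory` block, but this sketch does not provide the memory-ordering guarantees a production cross-process queue would need.

```python
import struct
from typing import Optional

U64 = struct.Struct("<Q")          # header counters
U32 = struct.Struct("<I")          # per-slot payload length prefix
RD_OFF, WR_OFF, DATA_OFF = 0, 8, 16


class RingQueue:
    """SPSC queue over a caller-supplied buffer (layout is illustrative)."""

    def __init__(self, buf, slot_count: int, slot_size: int) -> None:
        if len(buf) < DATA_OFF + slot_count * slot_size:
            raise ValueError("buffer too small for the requested slots")
        self.buf, self.slot_count, self.slot_size = buf, slot_count, slot_size

    def _slot(self, counter: int) -> int:
        return DATA_OFF + (counter % self.slot_count) * self.slot_size

    def put(self, msg: bytes) -> bool:
        """Producer side: write the payload, then advance only the write counter."""
        (rd,) = U64.unpack_from(self.buf, RD_OFF)
        (wr,) = U64.unpack_from(self.buf, WR_OFF)
        if wr - rd == self.slot_count or len(msg) > self.slot_size - U32.size:
            return False                                 # full, or message too large
        off = self._slot(wr)
        U32.pack_into(self.buf, off, len(msg))
        self.buf[off + U32.size:off + U32.size + len(msg)] = msg
        U64.pack_into(self.buf, WR_OFF, wr + 1)          # publish the message
        return True

    def get(self) -> Optional[bytes]:
        """Consumer side: read one payload, then advance only the read counter."""
        (rd,) = U64.unpack_from(self.buf, RD_OFF)
        (wr,) = U64.unpack_from(self.buf, WR_OFF)
        if rd == wr:
            return None                                  # empty
        off = self._slot(rd)
        (n,) = U32.unpack_from(self.buf, off)
        msg = bytes(self.buf[off + U32.size:off + U32.size + n])
        U64.pack_into(self.buf, RD_OFF, rd + 1)
        return msg


# The file system process (producer) and the shared log process (consumer)
# each wrap the same memory region.
region = bytearray(DATA_OFF + 2 * 32)
fs_side = RingQueue(region, slot_count=2, slot_size=32)
log_side = RingQueue(region, slot_count=2, slot_size=32)
```

Each side writes only its own counter, so the producer never overwrites the consumer's progress and vice versa.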
In the embodiments of this specification, the shared log process receives the log requests sent by the file system processes, and because the shared log hard disk is dedicated to logs, the shared log process can store the corresponding logs on it sequentially. Sequential storage in this embodiment means writing from the beginning to the end of the hard disk's physical space and, after reaching the end, wrapping around to continue writing sequentially from the beginning. Sequential storage makes logs easy to write and read, so log reads and writes are fast; it also simplifies garbage collection, embodiments of which are described later.
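The wrap-around behaviour can be sketched as an allocator over a fixed log area. It assumes, for illustration, that the log area starts at byte 4096 (just after a hypothetical superblock) and that a record is never split across the end of the area; reclaiming old logs before they are overwritten is the garbage collection deferred to later in the text and is not modelled. `CircularLogArea` is a hypothetical name.

```python
class CircularLogArea:
    """Hands out sequential offsets in [start, end), wrapping back to start."""

    def __init__(self, start: int, end: int) -> None:
        if end <= start:
            raise ValueError("empty log area")
        self.start, self.end = start, end
        self.write_pos = start

    def allocate(self, length: int) -> int:
        if length > self.end - self.start:
            raise ValueError("record larger than the whole log area")
        if self.write_pos + length > self.end:   # would run past the end of the disk
            self.write_pos = self.start          # wrap around and continue from the beginning
        offset = self.write_pos
        self.write_pos += length
        return offset


area = CircularLogArea(start=4096, end=4096 + 100)
offsets = [area.allocate(40) for _ in range(3)]
```

The third 40-byte record does not fit in the remaining 20 bytes, so it is placed back at the start of the area.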
In the embodiments of this specification, the log files of each file system are moved off the file system hard disk and handed to the shared log process to be stored on the shared log hard disk; what a file system sends to the shared log process is its logs. Because the shared log process stores the logs of multiple file systems, it must distinguish which file system each log belongs to. In this embodiment, therefore, the log write request sent by a file system process carries the file system identifier of the corresponding file system, and the shared log process, on receiving many log write requests, uses this identifier to tell which file system each log belongs to.
This specification also provides an embodiment of the storage format of logs on the hard disk. Optionally, in this embodiment each log stored on the shared log hard disk comprises a self-description block followed by a data block: the self-description block stores self-description information including the file system identifier, and the data block stores the log content. Optionally, the sizes of the self-description block and the data block need not be fixed in practice and depend on the actual length of the log; alternatively, the self-description block and the data block may each be set to a fixed size based on business needs and empirical values.
FIG. 1C is a schematic diagram of log storage on a shared log SSD according to an exemplary embodiment. FIG. 1C uses 3 WAL logs as an example, each stored as self-description information followed by log content. This format matters when the shared log process exits abnormally and restarts: because it handles WAL logs from different file systems, it could not identify a WAL log without self-description information, whereas the file system identifier in the self-description information tells it which file system each log belongs to. As shown in FIG. 1C, the WAL self-description block contains 4 fields: file system ID (the identifier of the file system), WAL ID (the identifier of the WAL log), WAL offset (the log's storage address) and WAL len (the log length), each 64 bits in size.
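A sketch of encoding and decoding one on-disk record: the four 64-bit self-description fields of FIG. 1C followed by the log content. Little-endian byte order, the field order, and treating `WAL offset` as the address of the record itself are assumptions; `encode_record` and `decode_record` are hypothetical names.

```python
import struct

# file system ID, WAL ID, WAL offset, WAL len: four unsigned 64-bit fields (32 bytes)
SELF_DESC = struct.Struct("<4Q")


def encode_record(fs_id: int, wal_id: int, wal_offset: int, content: bytes) -> bytes:
    """Self-description block first, then the data block holding the log content."""
    return SELF_DESC.pack(fs_id, wal_id, wal_offset, len(content)) + content


def decode_record(buf: bytes, pos: int = 0):
    """Parse the record at pos; return (fields, content, position of the next record)."""
    fs_id, wal_id, wal_offset, wal_len = SELF_DESC.unpack_from(buf, pos)
    start = pos + SELF_DESC.size
    content = bytes(buf[start:start + wal_len])
    return (fs_id, wal_id, wal_offset, wal_len), content, start + wal_len


record = encode_record(fs_id=3, wal_id=42, wal_offset=4096, content=b"mkdir /a")
fields, content, next_pos = decode_record(record)
```

Because every record begins with its file system ID, a restarted shared log process can attribute each record to its file system without any other state.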
The shared log process receives log write requests from multiple file system processes and writes the logs sequentially to the log SSD. It therefore needs to record persistently which areas of the log SSD have been written and which have not.
Optionally, because logs are written to the hard disk sequentially, the shared log process may record a head address and a tail address; the address range between them is the range on the hard disk occupied by all currently stored logs. Knowing this range, the shared log process also knows which areas of the hard disk have not been written. The head address may be the storage address of the newest of the stored logs, and the tail address the storage address of the oldest.
The head address and the tail address may be written to the superblock of the hard disk. The superblock stores information such as the attributes of the hard disk's file system, the disk layout, and resource usage; the file system learns the disk layout from the superblock and finds used and available resources through it. The superblock serves as the entry point, and file system operations usually start from it. By recording the head and tail addresses in the superblock, the shared log process persistently records which areas of the shared log hard disk hold logs and which do not, so when the shared log process exits and restarts it can recover the storage range of all logs currently on the hard disk.
FIG. 1D is a schematic diagram of a shared log hard disk according to an exemplary embodiment, where head_offset is the head address and tail_offset is the tail address; both are recorded in the superblock and are 64 bits each. When the hard disk is first put into use, logs are written one after another starting immediately after the superblock. The shared log process obtains the head and tail addresses as follows. In the first writing stage, the head address is determined by reading the superblock: because logs are written starting right after the superblock, the head address is the address just after it. The tail address is determined from the storage address of a log after the shared log process has written that log to the hard disk. For example, the shared log process may also be configured to update the storage address range in the superblock according to the storage addresses of the logs, after receiving a log write request and storing the logs sequentially on the shared log hard disk. That is, after each write, the shared log process updates the storage address range recorded in the superblock according to the storage address of the most recently written log. In later writing stages, the tail address can be read from the superblock.
In this embodiment, logs are generated by the file system, and a log generated by a file system usually includes a log identifier and a log length. The log write request sent by a file system process may therefore carry the file system identifier, the log length, and the log content. The self-description information written by the shared log process may further include the log identifier, the storage address of the log on the shared log hard disk, and the length of the log content; within the self-description block, the file system identifier may be placed first.
In this embodiment, the shared log process writes a log address in the self-description block, where the log address refers to a storage address of the log in a shared log hard disk, and optionally may be an index address.
In the related art, the logs generated by a file system are stored on the hard disk where the file system resides, so each log corresponds to a log identifier, a log address and a log length, where the log address is the log's storage address on that hard disk. In this embodiment, logs are no longer stored on the file system's hard disk, and the file system hard disk need not set aside an area for logs. To reduce the changes needed to existing file systems, a log generated by a file system may still carry a log address, which may be a null value or another preset field. After receiving a log write request from a file system process, the shared log process obtains the file system identifier, log length and log content from the request, takes an address currently available for storing the log on the hard disk as the log's storage address, and stores the log.
In the embodiments of this specification, if the file system process of a file system hard disk restarts because of an error, it needs to send a log read request to the shared log process to read the logs for data recovery. This requires the shared log process to retrieve the actual storage address of a log on the hard disk quickly, so that it can send the SSD a request to load the WAL content. The shared log process may therefore also be configured to maintain an in-memory index in the server's memory that records the storage address of each log on the shared log hard disk; in some examples, the in-memory index may be an in-memory hash index.
Fig. 1E is a schematic diagram of establishing a memory index in an embodiment of this specification, and a process of establishing a memory index is described in conjunction with the diagram:
after the shared log process starts, it reads the head address and the tail address recorded in the Super Block to obtain the address range of valid data, that is, the storage address range of all logs stored on the shared log hard disk;
it traverses the self-description blocks of all logs within that storage address range;
from each self-description block, it reads the self-description information (the file system identifier ID, the log identifier WAL ID, the storage address WAL offset and the log length WAL length, each 64 bits in size) and establishes the memory index, which records in memory the storage address of each log.
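The index-building steps above can be sketched as follows, under the same assumed 32-byte self-description layout. The sketch assumes the super block's head value is the address just past the newest log, and it ignores wrap-around; `build_memory_index` is a hypothetical name.

```python
import struct
from typing import Dict, Tuple

SELF_DESC = struct.Struct("<QQQQ")  # assumed: fs_id, wal_id, wal_offset, wal_length


def build_memory_index(disk: bytes, tail: int, head: int) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Walk the self-description blocks between tail and head.

    Returns a dict mapping (fs_id, wal_id) -> (storage address, log length).
    """
    index: Dict[Tuple[int, int], Tuple[int, int]] = {}
    pos = tail
    while pos < head:
        fs_id, wal_id, offset, length = SELF_DESC.unpack_from(disk, pos)
        index[(fs_id, wal_id)] = (offset, length)
        pos += SELF_DESC.size + length  # skip the data block to reach the next record
    return index
```

Only the self-description blocks are decoded; the data blocks are skipped using the recorded length, so startup cost grows with the number of logs rather than their size.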
The index in this embodiment may be a hash index. As an example, a hash code may be generated from parameters that uniquely identify a log, for example the combination of the log's file system identifier ID and log identifier. Once the hash code is generated, the hash index maps it to the actual storage address of the corresponding log on the hard disk. With hash indexing, each log corresponds to its own hash code, and comparing hash codes gives good query efficiency.
Based on the memory index, logs can be read quickly. For example, when a log read request sent during recovery of a file system process is received, carrying a file system identifier and a log identifier, the storage address on the shared log hard disk of the log to be read is obtained from the memory index; the log content stored at that address is then read and returned to the file system process.
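A Python `dict` is itself a hash table, so a minimal sketch of the hash index and the read path can key it on the `(file system ID, log ID)` pair described above. The function name `read_log` and the `(address, length)` index entries are assumptions for illustration.

```python
import struct
from typing import Dict, Optional, Tuple

SELF_DESC = struct.Struct("<QQQQ")  # assumed self-description layout (4 x u64)


def read_log(disk: bytes,
             index: Dict[Tuple[int, int], Tuple[int, int]],
             fs_id: int, wal_id: int) -> Optional[bytes]:
    """Serve a log read request from a recovering file system process."""
    entry = index.get((fs_id, wal_id))  # hash lookup on (fs_id, wal_id)
    if entry is None:
        return None  # unknown or already cleared log
    offset, length = entry
    start = offset + SELF_DESC.size  # the data block follows the self-description block
    return bytes(disk[start:start + length])
```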
In this embodiment, logs are stored continuously, so the storage address range of the logs on the shared log hard disk keeps changing. Accordingly, the shared log process of this embodiment may also be configured to: after receiving a log write request and storing the log sequentially on the shared log hard disk, update the storage address range in the super block according to the storage address of the log. That is, while the shared log process is running, each time it receives a log write request from a file system process and completes the write, it adds the storage address written into the log's self-description block to the memory index and updates the storage address range in the super block.
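The bookkeeping after each write (sequential write, memory index update, super block update) might look like the following sketch. `SuperBlock`, `SharedLog` and `append` are hypothetical names; the super block is kept in memory here, whereas the embodiment records it on the shared log hard disk, and the head is assumed to be the next write address.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, Tuple

SELF_DESC = struct.Struct("<QQQQ")  # assumed: fs_id, wal_id, wal_offset, wal_length


@dataclass
class SuperBlock:
    tail: int = 0  # start address of the oldest stored log
    head: int = 0  # assumed: end of the newest stored log, i.e. the next write address


@dataclass
class SharedLog:
    disk: bytearray
    sb: SuperBlock = field(default_factory=SuperBlock)
    index: Dict[Tuple[int, int], Tuple[int, int]] = field(default_factory=dict)

    def append(self, fs_id: int, wal_id: int, content: bytes) -> int:
        offset = self.sb.head
        record = SELF_DESC.pack(fs_id, wal_id, offset, len(content)) + content
        if offset + len(record) > len(self.disk):
            raise IOError("shared log hard disk is full")
        # 1. sequential write to the shared log hard disk
        self.disk[offset:offset + len(record)] = record
        # 2. register the new log in the memory index
        self.index[(fs_id, wal_id)] = (offset, len(content))
        # 3. extend the storage address range recorded in the super block
        self.sb.head = offset + len(record)
        return offset
```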
Next, a log read/write process between the shared log process and the file system process will be described.
FIG. 1F is a diagram illustrating the interaction between a shared log process and a file system process, according to an exemplary embodiment.
The processing procedure of the file system process may be:
1. User write request: a modification request for data is received from a requester, a corresponding log is generated based on the request, and a log write request carrying the log is sent to the message queue.
4. Return write success to the user: after the message queue acknowledges the request, a log write success message is returned to the requester.
5. Asynchronous write-back: the modification request is processed asynchronously, that is, the data stored on the file system hard disk is modified according to the modification request.
6. User read request: when a user's read request for data is received, the data is read from the file system hard disk according to the read request.
The processing procedure of the shared log process may be:
2. Acquire user log: a log write request from a file system process is received from the message queue.
3. Sequential shared-log write: based on the log write request, the log is written sequentially to the shared log hard disk.
After the log is successfully written to the shared log SSD, the memory index is updated and a write success message is placed in the message queue, through which it is delivered to the file system process.
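The hand-off in FIG. 1F can be approximated with two in-process queues from Python's standard `queue` module standing in for the shared-memory message queue; that substitution is an assumption made only to keep the sketch runnable. Following claim 10, the file system side acknowledges the requester as soon as the request is queued, and the shared log side replies through the queue after its (simulated) sequential write. All names are hypothetical.

```python
import queue
import threading
from typing import List, Optional, Tuple

Request = Tuple[int, int, bytes]  # (fs_id, wal_id, content)

# In-process queues stand in for the shared-memory message queue.
to_shared_log: "queue.Queue[Optional[Request]]" = queue.Queue()
to_file_system: "queue.Queue[Tuple[str, int, int]]" = queue.Queue()


def shared_log_process(stored: List[Request]) -> None:
    """Steps 2-3: take requests off the queue and append them in arrival order."""
    while True:
        req = to_shared_log.get()
        if req is None:  # shutdown sentinel used only by this sketch
            return
        stored.append(req)  # simulated sequential write to the shared log SSD
        to_file_system.put(("write_ok", req[0], req[1]))  # success message via the queue


def file_system_write(fs_id: int, wal_id: int, content: bytes) -> str:
    """Steps 1 and 4: enqueue the log write request, then acknowledge the requester."""
    to_shared_log.put((fs_id, wal_id, content))
    return "write accepted"
```

A usage run starts `shared_log_process` in a `threading.Thread`, issues writes from the file system side, and later drains `to_file_system` for the write success messages that trigger log clearing.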
Next, the log reclamation process will be explained.
FIG. 1G shows three stages of log storage on the shared log hard disk. In stage 1, 7 logs are stored in the address range between the head address Head and the tail address Tail. This embodiment takes 3 file systems as an example and shows 7 logs from file systems 1 to 3. The 3 columns in FIG. 1G show the logs in stage 1 and after each of 2 rounds of processing.
In this embodiment, after the operation corresponding to a log is completed, the log may be cleared as needed to release hard disk space. Optionally, the file system process is further configured to: after obtaining the write success message through the message queue, send log clearing requests to the shared log process at a set period; the shared log process is further configured to: upon receiving a log clearing request, clear from the shared log hard disk the logs that the request asks to delete.
For example, the shared log process receives a log clearing request from the file system process of file system 3 and clears the logs the request asks to delete. The resulting storage state of the shared log hard disk is shown as stage 2 in the figure: the hard disk space occupied by the 5th log is released. The shared log process may also update the memory index by deleting the index entries of the cleared logs. After this clearing, the head and tail addresses are unchanged.
The shared log process then receives a log clearing request from the file system process of file system 1 and clears the logs the request asks to delete. The resulting storage state is shown as stage 3 in the figure: the hard disk space occupied by the 1st log is released. Likewise, the shared log process may update the memory index. After this clearing, the oldest of all currently stored logs has changed, so the tail address changes and the storage address range in the super block can be updated accordingly.
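The effect of clearing on the tail address in stages 1 to 3 can be modelled at the granularity of whole logs. In this sketch each slot holds a hypothetical `(file system ID, log ID)` key and the class name `ClearableLog` is invented; the seven sample keys in the usage check are for illustration only and are not read from FIG. 1G.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (fs_id, wal_id)


class ClearableLog:
    """Slot-level model: logs kept in write order, oldest first."""

    def __init__(self, keys: List[Key]) -> None:
        self.slots: List[Optional[Key]] = list(keys)
        self.index: Dict[Key, int] = {k: i for i, k in enumerate(keys)}  # memory index
        self.tail = 0           # slot of the oldest live log
        self.head = len(keys)   # one past the newest slot

    def clear(self, keys: List[Key]) -> None:
        for k in keys:
            slot = self.index.pop(k, None)  # drop the index entry
            if slot is not None:
                self.slots[slot] = None     # release the hard disk space
        # The tail only moves when the oldest log itself was released.
        while self.tail < self.head and self.slots[self.tail] is None:
            self.tail += 1
```

Clearing a log in the middle (the 5th) leaves a hole but does not move the tail; clearing the oldest log (the 1st) moves the tail to the next live log, which is when the super block range needs updating.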
Because the clearing itself modifies the log, the shared log process, following the WAL mechanism, is further configured to: before clearing from the shared log hard disk the logs that a log clearing request asks to delete, write a log corresponding to that log clearing request to the shared log hard disk.
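A minimal sketch of this write-ahead ordering, and of how a clearing record found after a crash can be re-applied during exception recovery, assuming a Python list stands in for the on-disk journal and a hypothetical `("CLEAR", keys)` record format:

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]                   # (fs_id, wal_id)
Record = Tuple[str, Tuple[Key, ...]]    # hypothetical clearing-record format


class CrashSimulated(Exception):
    """Raised to simulate the shared log process failing mid-operation."""


def clear_logs(journal: List[Record], index: Dict[Key, int],
               keys: List[Key], crash_before_apply: bool = False) -> None:
    """Write the clearing request ahead, then release the logs."""
    journal.append(("CLEAR", tuple(keys)))  # 1. the clearing record is persisted first
    if crash_before_apply:
        raise CrashSimulated()              # failure between step 1 and step 2
    for k in keys:                          # 2. release the requested logs
        index.pop(k, None)


def replay_clears(journal: List[Record], index: Dict[Key, int]) -> None:
    """On restart, re-apply every clearing record found in the journal."""
    for kind, keys in journal:
        if kind == "CLEAR":
            for k in keys:
                index.pop(k, None)  # idempotent: already-removed keys are ignored
```

Because replay is idempotent, it is safe to re-apply a clearing record even if the crash happened after some of the logs had already been released.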
FIG. 1H shows the process of moving logs within the shared log hard disk.
In this embodiment, if a file system process remains abnormal for a long time, that file system may accumulate a large number of logs on the shared log disk. As the system keeps running, these logs become the oldest and therefore accumulate at the location pointed to by the tail address tail_offset. Meanwhile, other logs are cleared normally, which frees storage space between the head address and the tail address; however, because logs are stored sequentially, this freed space cannot be reused. This embodiment therefore also provides a processing mechanism for this case.
As an example, the shared log process is further configured to: search the shared log hard disk for one or more logs whose write duration exceeds a set time threshold, and sequentially move the found logs from their original storage positions to the position after the head address. The condition under which long-stored logs are moved can be chosen flexibly according to actual service needs; for example, thresholds such as a storage time longer than 24 hours or 48 hours may be set, which is not limited in this embodiment.
As shown in FIG. 1H, in stage 1 the first 3 logs have been stored for a long time and need to be moved behind the newest log. The processing in this embodiment can be understood as reading these accumulated logs and writing them sequentially at the position pointed to by the head address, that is, appending them after the newest log. Stages 2 to 4 in the figure show the logs being read out one by one and written sequentially.
After stage 3, the log "file system 2-3" needs to be read out and stored at the newest position; however, the written area has already reached the end of the hard disk space, so stage 4 shows the log "file system 2-3" stored at the start of the free area of the hard disk space. In this way the data is retained while the sequential write pattern of the log SSD is preserved.
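The relocation in FIG. 1H can be sketched by treating the shared log hard disk as a circular buffer of fixed-size slots. This is a simplification: real logs have variable lengths, and the sketch assumes that rewriting a log resets its write time so it is not moved again in the same pass. `RingLog` and its methods are hypothetical names.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, float]  # (log name, write time in seconds)


class RingLog:
    """Slot-level model of the shared log disk used as a circular buffer."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[Entry]] = [None] * capacity
        self.tail = 0  # slot of the oldest live log
        self.head = 0  # next slot to write, just after the newest log
        self.live = 0  # number of live logs

    def append(self, entry: Entry) -> None:
        if self.slots[self.head] is not None:
            raise IOError("shared log disk full: head has caught up with tail")
        self.slots[self.head] = entry
        self.head = (self.head + 1) % len(self.slots)  # wrap to the start of the disk
        self.live += 1

    def _free_tail(self) -> None:
        self.slots[self.tail] = None
        self.live -= 1
        if self.live == 0:
            self.tail = self.head  # empty: the next write becomes the oldest log
            return
        while self.slots[self.tail] is None:  # skip holes to the next oldest live log
            self.tail = (self.tail + 1) % len(self.slots)

    def relocate_old(self, now: float, threshold: float) -> int:
        """Move logs older than `threshold` from the tail to the head, in order."""
        moved = 0
        while self.live > 0:
            name, written = self.slots[self.tail]
            if now - written <= threshold:
                break
            self._free_tail()           # read the log out and free its old slot
            self.append((name, now))    # assumed: rewriting resets the write time
            moved += 1
        return moved

    def ordered(self) -> List[str]:
        """Live log names from oldest to newest."""
        n = len(self.slots)
        names = []
        for i in range(n):
            e = self.slots[(self.tail + i) % n]
            if e is not None:
                names.append(e[0])
        return names
```

With six slots and logs A to E, where A, B and C exceed the threshold, A lands in the last slot and B and C wrap around to the start of the free area, which mirrors stage 4 of the figure.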
In actual service, the shared log process may itself fail, for example because of a machine crash or an erroneous process shutdown. If the shared log process fails after writing a log to the hard disk but before updating the storage address range in the super block, the storage address range recorded in the super block at the time of the failure does not match the logs actually stored on the hard disk. This embodiment therefore also provides an exception recovery process, in which the shared log process is further configured to: if the hard disk space after the tail address stores a log, determine that the storage address range is recorded incorrectly, and update the storage address range recorded in the super block according to the read result.
As an example, the shared log process may perform the following processing after each startup:
read the Super Block to obtain the head address and the tail address;
starting from the head address, scan sequentially to check whether any logs were written after head_offset but not recorded;
if a valid log is read, update the head address;
update the memory index according to the logs read between the head address and the tail address.
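The recovery steps above can be sketched under the same assumed self-description layout: the sketch walks the recorded range to rebuild the index and then keeps scanning past the recorded head. It needs some way to tell a valid log from leftover bytes; here it assumes a record is valid if its recorded WAL offset equals its actual position and its length is non-zero and fits on the disk. That check is an assumption for illustration, and a production design would more likely rely on checksums or sequence numbers. `looks_valid` and `recover` are hypothetical names.

```python
import struct
from typing import Dict, Tuple

SELF_DESC = struct.Struct("<QQQQ")  # assumed: fs_id, wal_id, wal_offset, wal_length


def looks_valid(disk: bytes, pos: int) -> bool:
    """Hypothetical validity test: the record must describe itself at `pos`."""
    if pos + SELF_DESC.size > len(disk):
        return False
    _, _, offset, length = SELF_DESC.unpack_from(disk, pos)
    return offset == pos and length > 0 and pos + SELF_DESC.size + length <= len(disk)


def recover(disk: bytes, tail: int, head: int) -> Tuple[int, Dict[Tuple[int, int], int]]:
    """Return the corrected head address and a rebuilt memory index."""
    index: Dict[Tuple[int, int], int] = {}
    pos = tail
    # Walk the recorded range, then continue past the old head while valid
    # records are found (logs written just before the crash).
    while pos < head or looks_valid(disk, pos):
        fs_id, wal_id, offset, length = SELF_DESC.unpack_from(disk, pos)
        index[(fs_id, wal_id)] = offset
        pos += SELF_DESC.size + length
    return pos, index
```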
If the log written for a log clearing request is read during exception recovery, this indicates that the shared log process failed before the log clearing was completed, so the corresponding logs can be cleared following the log clearing procedure described above. Once these operations are completed, the shared log process has recovered successfully and starts receiving and processing requests from file system processes.
In correspondence with the foregoing embodiments, the present specification also provides embodiments of a log processing apparatus and a server to which the log processing apparatus is applied.
Fig. 2 is a block diagram of a log processing apparatus according to an exemplary embodiment, which is applied to a server configured with at least one file system hard disk and at least one shared log hard disk;
the file system hard disk is used for storing data written in by a corresponding file system;
the shared log hard disk is used for sequentially storing at least one log generated by the file system;
the device comprises:
a log obtaining module 21, configured to: acquire a log generated when the file system operates on the data;
a log writing module 22, configured to: write the logs sequentially to the shared log hard disk.
The implementation process of the functions and actions of each module in the log processing apparatus is specifically detailed in the implementation process of the corresponding step in the log processing method, and is not described herein again.
The embodiments of the log processing apparatus in this specification can be applied to a computer device, such as a server. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the computer device in which it resides reading the corresponding computer program instructions from the non-volatile memory into memory and running them. In terms of hardware, FIG. 3 is a hardware structure diagram of the computer device in which the log processing apparatus of this specification resides. In addition to the processor 310, the memory 330, the network interface 320 and the non-volatile memory 340 shown in FIG. 3, the computer device in which the log processing apparatus 331 resides may include other hardware according to its actual functions, which is not described again here. The non-volatile memory 340 may include: at least one file system hard disk and at least one shared log hard disk; the file system hard disk is used to store data written by the corresponding file system; the shared log hard disk is used to sequentially store at least one log generated by the file system.
Wherein the processor implements the following method when executing the program:
acquiring a log generated when the file system operates on the data;
writing the logs sequentially to the shared log hard disk.
Accordingly, the present specification further provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the embodiment of the log processing method as described above.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (22)

1. A server configured with at least two hard disks, the at least two hard disks comprising: at least one file system hard disk and at least one shared log hard disk; the at least one file system hard disk corresponds to at least two file systems, and each file system hard disk is used for storing data written by the file system corresponding to that hard disk; the at least one shared log hard disk is used for storing logs generated by the at least two file systems, and the logs are write-ahead logs;
the server runs a shared log process; a file system of the server receives a modification request of a requester for data, generates, based on the modification request, a log write request carrying a file system identifier and log content, and sends the log write request to the shared log process, and the shared log process is configured to: acquire the log write request, organize the logs in the shared log hard disk in the order of the physical addresses of the shared log hard disk, store the logs in the shared log hard disk, and return a write success message; wherein the storage format of a log in the shared log hard disk comprises a self-description block followed by a data block, the self-description block storing self-description information including the file system identifier of the file system, and the data block storing the log content;
and the file system accesses the log through communication with the shared log process so as to manage the data stored in the file system hard disk corresponding to the file system.
2. The server of claim 1, the file system process being configured to: send a log write request to the shared log process for a log corresponding to a modify operation on data; and send a log read request to the shared log process;
the shared log process being configured to: if the log write request is received, store the logs sequentially in the shared log hard disk; and if the log read request is received, read the corresponding log from the shared log hard disk and send it to the file system process.
3. The server of claim 2, wherein the file system process communicates with the shared log process by means of a shared-memory-based message queue.
4. The server of claim 3, the self-description information further comprising one or more of: a log identifier, the storage address of the log in the shared log hard disk, or the length of the log content.
5. The server of claim 4, the shared log process being further configured to write, in a super block of the shared log hard disk, the storage address range of all stored logs in the shared log hard disk.
6. The server of claim 5, the storage address range being composed of a head address and a tail address, the head address referring to the address at which the newest of all stored logs is stored, and the tail address referring to the address at which the oldest of all stored logs is stored.
7. The server of claim 6, the shared log process being further configured to store a memory index in a memory of the server, the memory index recording each log and the storage address of that log in the shared log hard disk.
8. The server of claim 7, the self-description information of the log further comprising a storage address of the corresponding log;
the establishing process of the memory index comprises the following steps:
after the shared log process is started, reading the super block in the shared log hard disk, and determining the storage address range of all logs in the shared log hard disk from the head address and the tail address;
reading the storage addresses and log identifiers in the self-description information of all logs within the storage address range, and then establishing the memory index.
9. The server of claim 8, the shared log process being further configured to: after receiving the log write request and storing the logs sequentially in the shared log hard disk, update the memory index and update the storage address range in the super block according to the storage addresses of the logs.
10. The server of claim 6, the modify operation being initiated by a requester; the file system process being further configured to: after sending the log write request to the message queue, return a log write success message to the requester;
the shared log process being further configured to: acquire the log write request from the message queue, store the logs sequentially in the shared log hard disk based on the log write request, and then send a write success message to the file system process through the message queue.
11. The server of claim 10, the file system process being further configured to: after obtaining the write success message through the message queue, send a log clearing request to the shared log process at a set period;
the shared log process being further configured to: if the log clearing request is received, clear, in the shared log hard disk, the logs that the log clearing request asks to delete.
12. The server of claim 11, the shared log process being further configured to: before clearing, in the shared log hard disk, the logs that the log clearing request asks to delete, write a log corresponding to the log clearing request into the shared log hard disk.
13. The server of claim 12, the shared log process being further configured to: after the process is started, acquire the storage address range recorded in the super block; read whether a log is stored in the hard disk space after the tail address, and determine, according to the read result, whether the storage address range is recorded incorrectly.
14. The server of claim 13, the shared log process being further configured to: if a log is stored in the hard disk space after the tail address, determine that the storage address range is recorded incorrectly, and update the storage address range recorded in the super block according to the read result.
15. The server of claim 14, the shared log process being further configured to:
search the shared log hard disk for one or more logs whose write duration exceeds a set time threshold, and sequentially move the found logs from their original storage positions to positions after the head address.
16. A log processing method applied to a shared log process of a server, wherein the server is configured with at least two hard disks, the at least two hard disks comprising: at least one file system hard disk and at least one shared log hard disk; the at least one file system hard disk corresponds to at least two file systems, and each file system hard disk is used for storing data written by the file system corresponding to that hard disk; the at least one shared log hard disk is used for storing logs generated by the at least two file systems, and the logs are write-ahead logs; a file system of the server receives a modification request of a requester for data, generates, based on the modification request, a log write request carrying a file system identifier and log content, and sends the log write request to the shared log process; the method comprising:
acquiring the log write request;
in the shared log hard disk, organizing the logs in the order of the physical addresses of the shared log hard disk, storing the logs in the shared log hard disk, and returning a write success message; wherein the storage format of a log in the shared log hard disk comprises a self-description block followed by a data block, the self-description block storing self-description information including the file system identifier of the file system, and the data block storing the log content; the file system accesses the log through communication with the shared log process so as to manage data stored in the file system hard disk corresponding to the file system; wherein each of the file systems accesses data independently.
17. The method of claim 16, the file system process being configured to: send a log write request to the shared log process for a log corresponding to a modify operation on the data; and send a log read request to the shared log process;
the shared log process being configured to: if the log write request is received, store the logs sequentially in the shared log hard disk; and if the log read request is received, read the corresponding log from the shared log hard disk and send it to the file system process.
18. The method of claim 16, the self-description information further comprising one or more of: a log identifier, the storage address of the log in the shared log hard disk, or the length of the log content.
19. The method of claim 18, the shared log process being further configured to write, in a super block of the shared log hard disk, the storage address range of all stored logs in the shared log hard disk.
20. The method of claim 19, the shared log process being further configured to store a memory index in a memory of the server, the memory index recording each log and the storage address of that log in the shared log hard disk.
21. A log processing device applied to a server, the server being configured with at least two hard disks, the at least two hard disks comprising: at least one file system hard disk and at least one shared log hard disk; the at least one file system hard disk corresponds to at least two file systems, and each file system hard disk is used for storing data written by the file system corresponding to that hard disk; the at least one shared log hard disk is used for storing logs generated by the at least two file systems, and the logs are write-ahead logs; a file system of the server receives a data modification request of a requester, generates, based on the modification request, a log write request carrying a file system identifier and log content, and sends the log write request to the log processing device;
the device comprises:
a log acquisition module configured to: acquire the log write request;
a log write module configured to: in the shared log hard disk, organize the logs in the order of the physical addresses of the shared log hard disk, store the logs in the shared log hard disk, and return a write success message; wherein the storage format of a log in the shared log hard disk comprises a self-description block followed by a data block, the self-description block storing self-description information including the file system identifier of the file system, and the data block storing the log content;
the file system accesses the log through communication with the log processing device so as to manage data stored in a file system hard disk corresponding to the file system; wherein each of the file systems independently accesses data.
22. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 16 to 20.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010923654.7A CN111782622B (en) 2020-09-04 2020-09-04 Log processing method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN111782622A CN111782622A (en) 2020-10-16
CN111782622B true CN111782622B (en) 2021-03-16

Family

ID=72762983

Country Status (1)

Country Link
CN (1) CN111782622B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant