WO2022222523A1

WO2022222523A1 - Log management method and apparatus

Info

Publication number: WO2022222523A1
Application number: PCT/CN2021/140172
Authority: WO
Inventors: 舒继武; 汪庆; 陈佩; 王硕; 陆游游; 姚建业; 赵玥
Original assignee: 华为技术有限公司; 清华大学
Priority date: 2021-04-22
Filing date: 2021-12-21
Publication date: 2022-10-27
Also published as: CN115237854A

Abstract

A log management method and apparatus, which relate to the field of data storage. The log management method comprises: firstly, a processor dividing a storage space of a hard disk into a plurality of segments, wherein one segment comprises one or more physical blocks, and the one segment is used for storing a log; secondly, the processor receiving a first log writing request and a second log writing request, wherein the first log writing request comprises a first log item, and the second log writing request comprises a second log item; and finally, the processor concurrently writing the first log item into a first segment of the plurality of segments and writing the second log item into the first segment. Since a storage space of a hard disk is divided into a plurality of segments, the hard disk is prevented from storing a log in the form of a log file, thereby reducing the overheads of a file lock during a log writing process, and improving the log writing efficiency. A plurality of log items can be concurrently written into one segment, thereby avoiding a locking process, which is required for writing the log items into the log file, and improving the efficiency of concurrently writing logs into the hard disk.

Description

A log management method and device

This application claims priority to Chinese patent application No. 202110436373.3, entitled "Data Update Method and Related Apparatus", filed with the State Intellectual Property Office on April 22, 2021, and to Chinese patent application No. 202110744793.8, entitled "A Log Management Method and Device", filed with the State Intellectual Property Office on June 30, 2021, the entire contents of which are incorporated in this application by reference.

Technical field

The present application relates to the field of data storage, and in particular, to a log management method and device.

Background

A write-ahead log (WAL) provides a highly concurrent, persistent mechanism for storing and replaying logs. Before a server writes business data to a storage device, it first records the data in the WAL. In a typical log writing process, the server writes multiple logs to the storage device by appending, and the storage device saves the logs as log files, each of which holds a fixed number of log entries. However, while appending a log, the server must lock the log file to which the new log entry is to be written, which slows log writing. How to improve log writing efficiency is therefore an urgent problem.

SUMMARY OF THE INVENTION

The present application provides a log management method and device, which solves the problem of slow log writing speed caused by the server writing logs with log files as granularity.

In order to achieve the above purpose, the present application adopts the following technical solutions.

In a first aspect, the present application provides a log management method. The method can be applied to a processor, or to a communication device that can support a processor in implementing the method; for example, the communication device includes a chip system. The log management method includes: first, the processor divides the storage space of the hard disk into a plurality of segments, where one segment includes one or more physical blocks (blocks) and is used to store logs; secondly, the processor receives a first log write request and a second log write request, where the first log write request includes a first log entry and the second log write request includes a second log entry; finally, the processor writes, in parallel, the first log entry into a first segment of the plurality of segments and the second log entry into the first segment. In the embodiment of the present application, because the storage space of the hard disk is divided into multiple segments, the hard disk manages logs by segment rather than storing them as log files, which reduces file-lock overhead in the log writing process and improves log writing efficiency. In addition, multiple log entries can be written to one segment in parallel, which avoids the locking required to write log entries into a log file and improves the efficiency of writing logs to the hard disk in parallel.
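The segment division step can be illustrated with a minimal sketch. The names `Segment` and `divide_into_segments`, the equal segment size, and the choice to leave a trailing partial segment unused are assumptions made for illustration; the application does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    index: int  # position of the segment on the disk
    start: int  # first byte address of the segment (inclusive)
    end: int    # address just past the segment (exclusive)


def divide_into_segments(disk_bytes: int, block_size: int,
                         blocks_per_segment: int) -> list:
    """Split [0, disk_bytes) into segments made of whole physical blocks."""
    if block_size <= 0 or blocks_per_segment <= 0:
        raise ValueError("block size and blocks per segment must be positive")
    segment_size = block_size * blocks_per_segment
    count = disk_bytes // segment_size  # assumption: a partial tail is unused
    return [Segment(i, i * segment_size, (i + 1) * segment_size)
            for i in range(count)]


# Example: a 1 MiB disk, 4 KiB blocks, 64 blocks per segment.
segments = divide_into_segments(disk_bytes=1 << 20, block_size=4096,
                                blocks_per_segment=64)
```

Each resulting segment is a contiguous address range, which is what allows later steps to address log entries directly instead of going through log files.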

The log in this embodiment of the present application refers to business data, or a copy, snapshot, or clone of the business data.

As a possible implementation manner, the processor does not apply a locking mechanism while writing the first log entry into the first segment or while writing the second log entry into the first segment. In the conventional technology, the log system applies a file locking mechanism, so multiple log entries cannot be written to the hard disk in parallel. In the log management method provided by the embodiment of the present application, because the processor does not apply a locking mechanism to the hard disk, it can write the first log entry and the second log entry to the first segment of the hard disk in parallel, which improves the efficiency with which the processor writes multiple log entries to the hard disk in parallel.
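As a rough illustration of writing two log entries into one segment without a lock, the sketch below lets two threads write into disjoint byte ranges of a single in-memory buffer. The buffer `segment`, the fixed `SLOT` size, and the helper `write_entry` are hypothetical stand-ins; a real implementation would write to the hard disk rather than to a `bytearray`.

```python
from concurrent.futures import ThreadPoolExecutor

SLOT = 16                      # assumed fixed space reserved per entry
segment = bytearray(SLOT * 4)  # stands in for one on-disk segment


def write_entry(slot_no: int, payload: bytes) -> None:
    """Write a payload into its own slot; no lock is taken."""
    if len(payload) > SLOT:
        raise ValueError("payload does not fit in one slot")
    start = slot_no * SLOT
    # Each writer touches only its own byte range, so writers never overlap.
    segment[start:start + len(payload)] = payload


with ThreadPoolExecutor(max_workers=2) as pool:
    # Two log entries are written to the same segment concurrently.
    list(pool.map(write_entry, [0, 1], [b"first entry", b"second entry"]))
```

The point of the sketch is only that writers whose target ranges are known in advance and do not overlap have nothing to serialize on.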

As another possible implementation manner, the log management method further includes: the processor matches the log sequence number (LSN) of the first log write request against a log index and determines that the segment to which the first log entry included in the first log write request is to be written is the first segment. The log index indicates the correspondence between the first segment and the LSN. By matching the LSN of a log write request against the log index to determine the segment to which the log entry in that request is to be written, the processor reduces the path query time required for log writing and improves log writing efficiency.

As another possible implementation manner, the log index includes a first index and a second index. The first index indicates the storage address range of the first segment in the hard disk, and the second index indicates, for at least one storage address range, the correspondence between the LSN and the storage address of each log entry in that range.

The process in which the processor matches the LSN against the log index to determine the first segment may include: the processor determines whether the LSN is continuous with the LSN of a third log entry, where the third log entry is any log entry recorded by the second index; if the LSN is continuous with the LSN of the third log entry, the processor uses the segment where the third log entry is located as the first segment; if the LSN is not continuous with the LSN of the third log entry, the processor matches the LSN against the first index to determine the first segment, where the LSN of the first log entry in the first segment matches the LSN of the first log write request. Because the cache of the processor has a small storage capacity, it cannot hold a large amount of index data. In the log management method provided by the embodiment of the present application, the log index includes a first index and a second index, and one of the two is matched preferentially. If that match succeeds, the processor does not need to search all the data in the log index, which reduces the path query time required for log writing and improves log writing efficiency.
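The two-level matching on the write path can be sketched as follows. The class `LogIndex` and its fields are hypothetical; the sketch assumes that "continuous" means the new LSN is exactly one greater than an LSN recorded in the second index, and that the first index is searched for the segment with the largest first LSN not exceeding the requested LSN.

```python
import bisect


class LogIndex:
    def __init__(self, segment_first_lsns, segment_ranges):
        # First index: sorted first LSN of each segment and the matching
        # (start, end) storage address range of that segment.
        self.segment_first_lsns = list(segment_first_lsns)
        self.segment_ranges = list(segment_ranges)
        # Second index: LSN -> (segment number, storage address).
        self.recent = {}

    def record(self, lsn, segment_no, address):
        """Remember where a written entry lives (second index)."""
        self.recent[lsn] = (segment_no, address)

    def segment_for_write(self, lsn):
        """Return the number of the segment the entry should be written to."""
        previous = self.recent.get(lsn - 1)
        if previous is not None:
            # Continuous with a recorded entry: reuse that entry's segment.
            return previous[0]
        # Otherwise fall back to the coarse first index.
        pos = bisect.bisect_right(self.segment_first_lsns, lsn) - 1
        if pos < 0:
            raise KeyError(lsn)
        return pos


index = LogIndex([0, 100, 200], [(0, 4096), (4096, 8192), (8192, 12288)])
index.record(149, 1, 5000)
```

With this sample data, `index.segment_for_write(150)` is answered from the second index alone, while `index.segment_for_write(200)` falls through to the first index.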

During log reading, the processor may likewise first match the LSN of the log read request against the second index. If the match succeeds (a hit), the processor determines from the second index the storage address of the log entry to be read, which reduces the path query time required for log reading and improves log reading efficiency. If the LSN of the log read request misses in the second index, the processor determines from the first index the log segment to be read and obtains the log entry corresponding to the log read request by traversing that log segment. This avoids the processor having to query the file system for the storage address of the log read request and improves log reading efficiency.
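The read path described above can be sketched in a few lines. All names here (`locate_for_read`, `second_index`, `first_index`, `scan_segment`) and the shapes of the two indexes are assumptions for illustration only.

```python
def locate_for_read(lsn, second_index, first_index, scan_segment):
    """Return the storage address of the log entry with the given LSN."""
    address = second_index.get(lsn)
    if address is not None:  # hit in the second index
        return address
    # Miss: use the first index to pick the segment, then traverse it.
    for segment_no, (first_lsn, last_lsn) in first_index.items():
        if first_lsn <= lsn <= last_lsn:
            for entry_lsn, entry_address in scan_segment(segment_no):
                if entry_lsn == lsn:
                    return entry_address
            break
    raise KeyError(lsn)


# Hypothetical sample data: two segments of ten 64-byte entries each.
second_index = {12: 4224}
first_index = {0: (0, 9), 1: (10, 19)}
segment_entries = {
    0: [(i, i * 64) for i in range(10)],
    1: [(10 + i, 4096 + i * 64) for i in range(10)],
}


def scan(segment_no):
    return iter(segment_entries[segment_no])
```

Only a miss in the second index pays for a segment traversal; a hit resolves the address with a single lookup.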

As another possible implementation, the log index is stored in a cache. The cache may be the internal memory of the processor. During log writing, the processing unit in the processor can read the log index from the cache to quickly determine the segment to which the log entry included in the log write request is to be written, which reduces the path query time required for log writing and improves log writing efficiency.

As another possible implementation manner, the first log entry includes description information and data information. The description information includes the LSN of the first log write request, the log entry length, and the log entry length checksum, and the data information indicates the log content recorded by the first log entry. The log entry length checksum is used to determine the integrity and accuracy of the first log entry, the log entry length is used to determine the size of the storage space occupied by the first log entry, and the LSN indicates the sequence number of the log append operation of the first log write request. In some possible examples, the description information included in the first log entry may also be referred to as metadata information of the first log entry.
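One possible byte layout for such a log entry is sketched below. The field widths, the little-endian encoding, and the use of CRC32 over the LSN and length fields as the log entry length checksum are all assumptions; the application does not specify them.

```python
import struct
import zlib

# Assumed header layout: LSN (8 bytes), entry length (4), checksum (4).
HEADER = struct.Struct("<QII")


def _header_checksum(lsn: int, length: int) -> int:
    # Assumption: the checksum covers the LSN and length fields.
    return zlib.crc32(struct.pack("<QI", lsn, length))


def encode_entry(lsn: int, payload: bytes) -> bytes:
    """Serialize description information followed by data information."""
    length = len(payload)
    return HEADER.pack(lsn, length, _header_checksum(lsn, length)) + payload


def decode_entry(buf: bytes, offset: int = 0):
    """Return (lsn, payload, offset of the next entry) or raise ValueError."""
    lsn, length, checksum = HEADER.unpack_from(buf, offset)
    if _header_checksum(lsn, length) != checksum:
        raise ValueError("corrupt log entry header")
    start = offset + HEADER.size
    payload = buf[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated log entry")
    return lsn, bytes(payload), start + length


raw = encode_entry(7, b"hello")
```

Because the length is validated before the payload is read, a reader traversing a segment can stop at the first entry whose header fails the check.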

As another possible implementation manner, the process in which the processor writes the first log entry into the first segment includes: the processor generates a starting storage address according to the LSN of the first log write request, where the starting storage address lies within the storage address range of the first segment; the processor then writes the first log entry into the first segment according to the starting storage address. When writing a log entry to the hard disk, the processor first generates the starting storage address according to the LSN of the log write request and then writes the log entry included in the log write request to the hard disk at that address, which ensures the accuracy of log writing.
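The application does not state here how the starting storage address is derived from the LSN. As one hedged possibility, the sketch below assumes that every entry occupies a fixed-size slot, so that the address is a pure function of the LSN; `start_address` and `slot_size` are hypothetical names.

```python
def start_address(lsn: int, segment_first_lsn: int, segment_start: int,
                  segment_end: int, slot_size: int) -> int:
    """Map an LSN to a starting storage address inside one segment.

    Assumption: entries occupy fixed-size slots laid out in LSN order.
    """
    if slot_size <= 0:
        raise ValueError("slot size must be positive")
    address = segment_start + (lsn - segment_first_lsn) * slot_size
    if lsn < segment_first_lsn or address + slot_size > segment_end:
        raise ValueError("LSN does not map into this segment")
    return address


# Example: the segment covers [4096, 8192) and begins at LSN 100.
address = start_address(102, segment_first_lsn=100, segment_start=4096,
                        segment_end=8192, slot_size=512)
```

Under this assumption, two requests with different LSNs always receive disjoint address ranges, which is consistent with writing them in parallel without a lock.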

As another possible implementation manner, after the processor generates the starting storage address according to the LSN of the first log write request, the log management method further includes: the processor sends the LSN and the starting storage address of the first log write request to a backup device. Where the log system has multiple storage devices, if the processor is the master replica node (master node for short), the master node sends the LSN and the starting storage address of the log write request to the slave replica node (slave node for short), and the slave node writes the log entry included in the log write request into the first segment according to that LSN and starting storage address. The master node therefore does not need to finish writing the log before backing it up to the slave node, which improves write efficiency in multi-write scenarios.

As another possible implementation manner, after the processor receives the first log write request, the log management method further includes: the processor receives the LSN and the starting storage address of the first log write request sent by the backup device, and writes the first log entry into the first segment according to the LSN and the starting storage address of the first log write request. Where the log system has multiple storage devices, if the processor is a slave node, then after receiving the log write request sent by the client, the processor can also receive the LSN and the starting storage address of the log write request sent by the backup device (the master node). The processor (slave node) writes the log entry included in the log write request into the first segment according to that LSN and starting storage address. The backup to the slave node therefore does not have to wait until the master node has finished writing the log, which improves write efficiency in multi-write scenarios.
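The master and slave roles can be illustrated with an in-memory sketch. `Replica`, `replicate_write` and `address_for` are hypothetical helpers; network transfer is replaced by direct method calls, and the slaves are assumed to already hold the log entry from the client's write request.

```python
class Replica:
    """In-memory stand-in for the hard disk of one node."""

    def __init__(self, size: int) -> None:
        self.disk = bytearray(size)

    def write_at(self, address: int, entry: bytes) -> None:
        if address < 0 or address + len(entry) > len(self.disk):
            raise ValueError("write falls outside the disk")
        self.disk[address:address + len(entry)] = entry


def replicate_write(master, slaves, lsn, entry, address_for):
    """Master picks the address once; every replica writes at that address."""
    address = address_for(lsn)
    placement = (lsn, address)  # what the master sends to the slave nodes
    for slave in slaves:
        # The slave does not wait for the master's own write to finish.
        slave.write_at(placement[1], entry)
    master.write_at(address, entry)
    return placement


master, slave = Replica(64), Replica(64)
placement = replicate_write(master, [slave], 3, b"abc", lambda lsn: lsn * 8)
```

Since only the pair of LSN and starting storage address travels from master to slave, all replicas end up with the entry at the same address without a separate backup pass.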

As another possible implementation manner, the log management method further includes: the processor sends a log write response to the client, where the log write response indicates that the first log entry and the second log entry have been written to the first segment. From the log write response fed back by the processor, the client can determine that the current round of log append operations has completed. This prevents the client from repeatedly sending log write requests to the log system, which would cause the requests to be backed up repeatedly, and thereby saves storage resources of the log system and improves its storage resource utilization.

As another possible implementation manner, the processor writing the first log entry into the first segment includes: when the remaining storage space of the first segment is less than the log entry length of the first log write request, the processor writes one part of the first log write request into the first segment and the other part into a second segment, where the second segment is the segment in the hard disk that is associated with the first segment for continuous reading; further, the processor sets the log tail address in the hard disk to the starting storage address of the first log entry in the second segment. In the embodiment of the present application, the log tail address is represented by the starting storage address of the log entry at the head of the log segment where the last log is located.

In the embodiment of the present application, a log append operation needs to update the starting storage address of the log segment where the log tail is located only when the log entry must be stored across two log segments, one of which is a newly allocated log segment (such as the second segment above). Because the log tail address is represented by the starting storage address of the log segment where the log tail is located, the log tail address does not have to be updated in the file system on every log append. This reduces the random small-write operations on the log tail address that log appends would otherwise require and improves log writing efficiency.
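The tail-update rule can be sketched as a small placement function. The name `place_entry`, its parameters, and the choice to set the log tail to the start address of the second segment are assumptions for illustration.

```python
def place_entry(entry_len: int, segment_end: int, write_offset: int,
                next_segment_start: int, log_tail: int):
    """Return ([(address, length), ...], new_log_tail) for one append."""
    remaining = segment_end - write_offset
    if entry_len <= remaining:
        # The entry fits in the current segment: the log tail is untouched.
        return [(write_offset, entry_len)], log_tail
    # The entry spans two segments: split it and move the log tail to the
    # newly used segment (assumed here to be that segment's start address).
    parts = [(write_offset, remaining),
             (next_segment_start, entry_len - remaining)]
    parts = [part for part in parts if part[1] > 0]
    return parts, next_segment_start


spanning = place_entry(100, segment_end=4096, write_offset=4000,
                       next_segment_start=8192, log_tail=0)
fitting = place_entry(50, segment_end=4096, write_offset=4000,
                      next_segment_start=8192, log_tail=0)
```

In the sketch, only the spanning case returns a new tail value, so an ordinary append causes no write to the persisted log tail address.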

As another possible implementation, before the processor generates the starting storage address according to the LSN of the log write request, the log management method further includes: the processor determines whether the log entry length of the log write request is less than or equal to the log read/write unit of the hard disk, where the log read/write unit of the hard disk is the minimum data granularity at which the processor reads and writes log entries in the hard disk. If the log entry length of the log write request is less than or equal to the log read/write unit of the hard disk, the processor performs the step of generating the starting storage address according to the LSN of the log write request. If the log entry length of the log write request is greater than the log read/write unit of the hard disk, the processor divides the log write request into multiple sub-logs, each of which has a log entry length equal to the log read/write unit, and then generates a starting storage address for each sub-log according to the LSN of the log write request. When the log entry included in the log write request is long, the processor writes it as multiple sub-logs sized to the log read/write unit of the hard disk. As a result, even if the hard disk and the processor lose power while the processor is writing the log entry directly to the hard disk, the sub-logs already written to the hard disk are not written (or read) again by the processor, which reduces the storage resource consumption of the hard disk and improves its storage resource utilization.

In the embodiment of the present application, the processor may split the log entry to be written, which is included in the log write request, according to the log read/write unit of the hard disk, and then atomically generate the starting storage address of each sub-log. Because the processor reads and writes logs at the granularity of log segments and does not use the file system, it does not need to lock the log segments. The processor can therefore append to the hard disk in parallel, which reduces the time required for log writing and improves log writing efficiency.
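Splitting a long log entry into sub-logs can be sketched as follows. The function name `split_into_sub_logs` and the zero-padding of the final sub-log (so that every sub-log matches the log read/write unit) are assumptions.

```python
def split_into_sub_logs(payload: bytes, unit: int) -> list:
    """Split a payload into sub-logs sized to the log read/write unit."""
    if unit <= 0:
        raise ValueError("read/write unit must be positive")
    if len(payload) <= unit:
        # Short entries are written as they are, without splitting.
        return [payload]
    chunks = [payload[i:i + unit] for i in range(0, len(payload), unit)]
    # Assumption: pad the last chunk with zero bytes up to the unit size.
    chunks[-1] = chunks[-1].ljust(unit, b"\x00")
    return chunks


sub_logs = split_into_sub_logs(b"a" * 10, unit=4)
```

Each sub-log can then be given its own starting storage address and written independently, so a power failure leaves earlier sub-logs intact on the hard disk.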

In a second aspect, the present application provides a log management device. For its beneficial effects, refer to the description of the first aspect; they are not repeated here. The log management apparatus has the function of implementing the behavior in the method example of any implementation of the first aspect. The function can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function. In a possible design, the log management apparatus can be applied to a processor, or to a communication device that can support the processor in implementing the method; for example, the communication device includes a chip system. The log management apparatus includes: a processing module, configured to divide the storage space of the hard disk into multiple segments, where one segment includes one or more physical blocks and is used to store logs; and a communication module, configured to receive a first log write request and a second log write request, where the first log write request includes a first log entry and the second log write request includes a second log entry. The processing module is further configured to write, in parallel, the first log entry into a first segment of the multiple segments and the second log entry into the first segment.

As an optional implementation manner, no locking mechanism is implemented in the process of writing the first log entry into the first segment or during the process of writing the second log entry into the first segment.

As another optional implementation manner, the log management apparatus further includes: an index module, configured to match the log sequence number (LSN) of the first log write request against the log index and determine that the segment to which the first log entry included in the first log write request is to be written is the first segment, where the log index indicates the correspondence between the first segment and the LSN.

As another optional implementation manner, the log index includes a first index and a second index. The first index indicates the storage address range of the first segment in the hard disk, and the second index indicates, for at least one storage address range, the correspondence between the LSN and the storage address of each log entry in that range. The index module is specifically configured to determine whether the LSN is continuous with the LSN of a third log entry, where the third log entry is any log entry recorded by the second index; if the LSN is continuous with the LSN of the third log entry, the index module is specifically configured to use the segment where the third log entry is located as the first segment; if the LSN is not continuous with the LSN of the third log entry, the index module is specifically configured to match the LSN against the first index and determine the first segment, where the LSN of the first log entry in the first segment matches the LSN of the first log write request.

As another optional implementation, the log index is stored in the cache.

As another optional implementation manner, the first log entry includes description information and data information. The description information includes the LSN of the first log write request, the log entry length, and the log entry length checksum, and the data information indicates the log content recorded by the first log entry.

As another optional implementation manner, the processing module is specifically configured to generate a starting storage address according to the LSN of the first log write request, where the starting storage address lies within the storage address range of the first segment, and to write the first log entry into the first segment according to the starting storage address.

As another optional implementation manner, the communication module is further configured to send the LSN and the starting storage address of the first log write request to the backup device.

As another optional implementation manner, the communication module is further configured to receive the LSN and the starting storage address of the first log write request sent by the backup device; the processing module is further configured to write the first log entry into the first segment according to the LSN and the starting storage address of the first log write request.

As another optional implementation manner, the communication module is further configured to send a log write response to the client, where the log write response is used to indicate that the first log and the second log have been written to the first segment.

In a third aspect, the present application provides a chip, including a memory and a processor. The memory is used to store computer instructions, and when the processor calls and runs the computer instructions from the memory, it performs the operation steps of the log management method of the first aspect or any possible implementation manner of the first aspect.

In a fourth aspect, the present application provides a communication device, including the chip described in the third aspect. The chip includes a processor and a memory, and the memory is used for storing computer instructions. When the processor executes the computer instructions, the communication device performs the operation steps of the log management method of the first aspect or any possible implementation manner of the first aspect.

For example, the communication device may be a network card, a server or a desktop computer, and so on.

In a fifth aspect, the present application provides a computer-readable storage medium in which a computer program or instruction is stored. When the computer program or instruction is executed by a processor or a communication device, the operation steps of the log management method of the first aspect or any possible implementation manner of the first aspect are implemented.

In a sixth aspect, the present application provides a computer program product that, when run on a communication device such as a computer or a server, enables the computer or server to perform the operation steps of the log management method of the first aspect or any possible implementation manner of the first aspect.

In a seventh aspect, the present application provides a log system that includes at least one server. First, the server divides the storage space of the hard disk into multiple segments, where one segment includes one or more physical blocks (blocks) and is used to store logs; secondly, the server receives a first log write request and a second log write request sent by the client, where the first log write request includes a first log entry and the second log write request includes a second log entry; finally, the server writes, in parallel, the first log entry into a first segment of the multiple segments and the second log entry into the first segment. In the embodiment of the present application, because the storage space of the hard disk is divided into multiple segments, the hard disk manages logs by segment rather than storing them as log files, which reduces file-lock overhead in the log writing process and improves log writing efficiency. In addition, multiple log entries can be written to one segment in parallel, which avoids the locking required to write log entries into a log file and improves the efficiency of writing logs to the hard disk in parallel.

The log system provided by the present application can implement the operation steps of the log management method of the first aspect and any possible implementation manner of the first aspect, and the beneficial effects can be found in the description of any aspect of the first aspect, which will not be repeated here.

On the basis of the implementation manners provided by the above aspects, the present application may further combine to provide more implementation manners.

Description of drawings

FIG. 1 is a schematic structural diagram of a log system provided by the present application;

FIG. 2 is a first schematic flowchart of a log management method provided by the present application;

FIG. 3 is a schematic diagram of the division of a hard disk provided by the present application;

FIG. 4 is a first schematic diagram of log writing provided by the present application;

FIG. 5 is a second schematic flowchart of a log management method provided by the present application;

FIG. 6 is a second schematic diagram of log writing provided by the present application;

FIG. 7 is a third schematic diagram of log writing provided by the present application;

FIG. 8 is a third schematic flowchart of a log management method provided by the present application;

FIG. 9 is a schematic structural diagram of a log management device provided by the present application;

FIG. 10 is a schematic structural diagram of a communication device provided by this application.

Detailed description

For the sake of clarity and conciseness in the description of the following embodiments, a brief introduction of related technologies is first given.

In a common technical solution, a multi-read, multi-write log system is used to read and write the write-ahead log of business data. Using the append, read, and other interfaces provided by the log system, an upper-layer application can implement its own application-layer methods with only a small amount of work, while the log system guarantees the strong consistency, data durability, fault atomicity, and transaction isolation that the upper-layer application requires.

However, CORFU, a distributed log system based on flash arrays (a shared log design for flash clusters), stores logs as files. Each log file stores a specific number (e.g., 1000) of log entries, and a log entry indicates the data recorded in the log for at least one data operation (such as a business data write). CORFU usually uses a separate file to store the metadata of the log system. For a single log entry append operation, CORFU needs to lock the log segment where the log entry is located, so parallel read and write performance is poor. Moreover, for every log append operation, CORFU needs to lock the log file to which the new log entry is to be written, which slows log writing.

The implementation of the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic structural diagram of a log system provided by the present application. The log system 100 includes at least one server, and the client 120 can access log data by accessing a server in the log system 100 over a network. The communication function of the network can be implemented by a switch or a router. In one possible example, the client may also communicate with the server through a wired connection, such as a peripheral component interconnect express (PCIe) high-speed bus.

The client 120 may be a computer running an application program, and that computer may be a physical machine or a virtual machine. For example, if the computer running the application program is a physical computing device, the physical computing device may be a server or a terminal. The terminal may also be referred to as terminal equipment, user equipment (UE), a mobile station (MS), a mobile terminal (MT), and the like. The terminal can be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal communication service (PCS) phone, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, and so on. The embodiments of the present application do not limit the specific technology or the specific device form adopted by the client 120.

The log system provided by the embodiment of the present application may be a distributed storage system or a centralized storage system.

In a possible situation, the log system 100 shown in FIG. 1 may be a distributed storage system. As shown in FIG. 1 , the distributed storage system provided in this embodiment includes a storage cluster integrating computing and storage. The storage cluster includes one or more servers (such as server 110A and server 110B as shown in FIG. 1 ), and each server can communicate with each other.

Here, the server 110A shown in FIG. 1 is used for description. The server 110A is a device having both computing capability and storage capability, such as a server or a desktop computer. For example, an Advanced Reduced Instruction Set Computer Machines (ARM) server or an X86 server can be used as the server 110A. In terms of hardware, as shown in FIG. 1, the server 110A includes at least a processor 112, a memory 113, a network card 114 and a hard disk 105, which are connected through a bus. The processor 112 and the memory 113 provide computing resources. Specifically, the processor 112 is a central processing unit (CPU), which processes data access requests (such as log write requests) from outside the server 110A (from an application server or other servers) as well as requests generated inside the server 110A. Exemplarily, when the processor 112 receives log write requests, it temporarily stores the data in the log write requests in the memory 113. When the total amount of data in the memory 113 reaches a certain threshold, the processor 112 sends the data stored in the memory 113 to the hard disk 105 for persistent storage. The processor 112 is also used for data calculation or processing. Only one processor 112 is shown in FIG. 1. In practical applications, there are often multiple processors 112, and one processor 112 has one or more CPU cores. This embodiment does not limit the number of CPUs or the number of CPU cores.

The memory 113 refers to the internal memory that directly exchanges data with the processor. It can read and write data at any time, is very fast, and serves as temporary data storage for the operating system or other running programs. The memory includes at least two kinds of memory; for example, the memory can be either a random access memory or a read only memory (ROM). For example, the random access memory is dynamic random access memory (DRAM) or storage class memory (SCM). DRAM is a semiconductor memory and, like most random access memory (RAM), is a volatile memory device. SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory. Storage class memory provides faster read and write speeds than hard disks, but its access speed is slower than that of DRAM, and it is cheaper than DRAM. However, DRAM and SCM are only exemplary in this embodiment, and the memory may also include other random access memories, such as static random access memory (SRAM). The read-only memory may be, for example, programmable read only memory (PROM) or erasable programmable read only memory (EPROM). In addition, the memory 113 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state disk (SSD). In practical applications, multiple memories 113 and different types of memories 113 may be configured in the server 110A. This embodiment does not limit the quantity and type of the memory 113. In addition, the memory 113 can be configured to have a power saving function. The power saving function means that when the system is powered off and then powered on again, the data stored in the memory 113 will not be lost. Memory with a power saving function is called non-volatile memory.

The hard disk 105 is used for providing storage resources, such as storing logs and the like. It can be a magnetic disk or other type of storage medium, such as a solid state drive or a shingled magnetic recording hard drive. For example, the hard disk 105 may be a solid state disk based on a non-volatile memory host controller interface specification (Non-Volatile Memory Express, NVMe), such as an NVMe SSD.

The network card 114 in the server 110A is used to communicate with the client 120 or other application server (such as the server 110B shown in FIG. 1 ).

In one embodiment, the functionality of processor 112 may be offloaded to network card 114 . In other words, in this embodiment, the processor 112 does not perform log reading and writing operations, but the network card 114 performs log reading and writing, address translation, and other computing functions.

Here, the server 110B shown in FIG. 1 is used as an example for description. The network card 114 shown in FIG. 1 may include a processing unit 1141 and a memory 1142. In this case, the network card 114 is an intelligent network card. The processing unit 1141 may be a CPU or another chip, and the other chip may be, but is not limited to, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 1142 may refer to the internal memory that directly exchanges data with the processor, it can read and write logs at any time, and is very fast, as a temporary data storage for the operating system or other running programs. The memory 1142 may be either random access memory or ROM. For example, random access memory is DRAM or SCM. The memory may also include other random access memories such as SRAM and the like. As for the read-only memory, for example, it can be PROM, EPROM and so on. In addition, the memory 1142 may also be a DIMM, that is, a module composed of DRAM, and may also be an SSD. In practical applications, multiple memories 1142 and different types of memories 1142 may be configured in the network card 114 . This embodiment does not limit the quantity and type of the memory 1142 . In addition, the memory 1142 can be configured to have a power saving function. The power saving function means that the data stored in the memory 1142 will not be lost when the system is powered off and then powered on again. Memory with a power-saving function is called non-volatile memory.

For example, in some application scenarios, the network card 114 may also have persistent memory media, such as persistent memory (PM), non-volatile random access memory (NVRAM), or phase change memory (PCM). The CPU is used to perform operations such as address translation and reading and writing logs. The memory is used to temporarily store data to be written into the hard disk 105, or data read from the hard disk 105 that is to be sent to the controller. The processing unit 1141 can also be a programmable electronic component, such as a data processing unit (DPU). The DPU has the generality and programmability of a CPU, but is more specialized and can operate efficiently on network packets, storage requests, or analytics requests. DPUs are distinguished from CPUs by a greater degree of parallelism (they need to handle a large number of requests). Optionally, the DPU here can also be replaced by other processing chips such as a graphics processing unit (GPU) or an embedded neural-network processing unit (NPU). There is no affiliation between the network card 114 and the hard disk 105, and the network card 114 can access any hard disk 105 in the server 110B where the network card 114 is located, so it is convenient to add hard disks when the storage space is insufficient.

The smart NIC can be a general-purpose smart NIC, such as a NIC with 8 processor cores and 2 x 100 gigabits per second (Gb/s) of network bandwidth, which can run a full operating system.

The smart network card may also be an application-oriented smart network card, and the network card may be an FPGA- or ASIC-based smart network card, such as GPU and NPU for deep neural network acceleration. For other hardware of the server 110B, reference may be made to the relevant content of the server 110A, which will not be repeated here.

In another possible situation, the log system provided by this embodiment of the present application may also be a storage cluster in which computing and storage are separated. The storage cluster includes a computing device cluster and a storage device cluster, the computing device cluster includes one or more computing devices, and the computing devices can communicate with each other. A computing device may be, for example, a server, a desktop computer, or a controller of a storage array. In terms of hardware, a computing device may include a processor, a memory, and a network card, among others. The processor is a CPU for processing data access requests from outside the computing device, or requests generated inside the computing device. Exemplarily, when the processor receives a log write request sent by a user, it temporarily stores the log in the log write request in the memory. When the total amount of log data in the memory reaches a certain threshold, the processor sends the log data stored in the memory to the storage device for persistent storage. In addition, the processor is also used for data calculation or processing, such as metadata management, data deduplication, data compression, storage space virtualization, and address translation.

As an optional implementation manner, the log system provided by the embodiment of the present application may also be a centralized storage system. The characteristic of the centralized storage system is that there is a unified entrance through which all data from external devices must pass. This entrance is the engine of the centralized storage system. The engine is the core component of the centralized storage system, and many advanced functions of the storage system are implemented in it. Illustratively, there can be one or more controllers in the engine. In a possible example, if the engine has multiple controllers, any two controllers can have a mirror channel between them, so that the two controllers can back each other up, thus avoiding the unavailability of the centralized storage system caused by a hardware failure. The engine also includes a front-end interface and a back-end interface. The front-end interface is used to communicate with the computing devices in the centralized storage system, thereby providing storage services for the computing devices. The back-end interface is used to communicate with the hard disks to expand the capacity of the centralized storage system. Through the back-end interface, the engine can connect more hard disks, thus forming a very large storage resource pool.

In order to improve the efficiency of log writing, on the basis of the log system 100 shown in FIG. 1, an embodiment of the present application provides a log management method. Please refer to FIG. 2, which is a flowchart of a log management method provided by this application. The client 21 can implement the functions of the client 120 shown in FIG. 1. The log management method is executed by the processor 22, which can be the processor 112 in the server 110A shown in FIG. 1, or the processing unit 1141 included in the network card 114 in the server 110B. The log management method includes the following steps.

S210, the processor 22 divides the storage space of the hard disk into multiple segments.

The multiple segments refer to multiple log segments (log segments) in the hard disk for storing logs. One of the plurality of segments includes one or more physical blocks, and the one segment may be used to store logs.

Here, the division of the hard disk 105 shown in FIG. 1 is used as an example for description. FIG. 3 is a schematic diagram of division of a hard disk provided by the present application. The hard disk 105 may be a package of one or more flash memory chips. A flash memory chip may include one or more dies, and one die may include multiple planes. As shown in FIG. 3, a die is internally divided into two planes, and the block numbers in the two planes are interleaved between odd and even, so odd/even interleaved operations can be performed during operation to improve performance. A plane contains multiple physical blocks, and a block consists of several pages. Taking a flash memory chip with a capacity of 16 gigabytes (GB) as an example, every 4314*8=34512 cells logically form a page, and each page can store 4 kilobytes (KB) of content and 218 bytes (B) of error checking and correction (ECC) check data. The page is also the smallest unit of input/output (IO) operations. Every 128 pages form a block, and every 2048 blocks form a plane. A whole flash memory chip consists of two planes: one plane stores the odd-numbered blocks, the other stores the even-numbered blocks, and the two planes can operate in parallel. This is just an example; the size of the page, the capacity of the block, and the capacity of the flash memory chip may all have different specifications, which are not limited in this embodiment.
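The figures in this example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant names are assumptions, and it assumes one cell per bit. With the stated page, block and plane counts, the content capacity of two planes works out to 2 GiB, that is, 16 Gibit, so the "16" in the example appears consistent with these figures when counted in bits; this is an observation from the arithmetic, not a statement in the source.

```python
# Geometry taken from the example in the text; the names are illustrative.
PAGE_DATA_BYTES = 4 * 1024   # 4 KB of content per page
PAGE_ECC_BYTES = 218         # ECC check data per page
PAGES_PER_BLOCK = 128
BLOCKS_PER_PLANE = 2048
PLANES_PER_CHIP = 2

# Assumption: one cell stores one bit.
cells_per_page = (PAGE_DATA_BYTES + PAGE_ECC_BYTES) * 8
block_bytes = PAGE_DATA_BYTES * PAGES_PER_BLOCK      # content bytes per block
plane_bytes = block_bytes * BLOCKS_PER_PLANE         # content bytes per plane
chip_bytes = plane_bytes * PLANES_PER_CHIP           # content bytes per chip

print(cells_per_page)             # 34512
print(block_bytes // 1024)        # 512 (KiB per block)
print(plane_bytes // 2**30)       # 1 (GiB per plane)
print(chip_bytes * 8 // 2**30)    # 16 (Gibit per chip)
```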

The processor 22 writes data into the block, and when a block is full, the master controller of the hard disk 105 selects the next block to continue writing. A page is the smallest unit of data writing. In other words, the master controller writes data into the block at the granularity of pages. A block is the smallest unit of data erasure; when the master controller erases data, it can only erase an entire block at a time.

The above-mentioned one segment may include one or more physical blocks. As shown in FIG. 3, the first segment includes two physical blocks, and the second segment includes one physical block. It is worth noting that the above-mentioned embodiments are only possible implementations provided by the present application. FIG. 3 illustrates an example in which a segment includes an integer number of blocks, but in other possible examples, a segment may also include part of a block. For example, if a block includes 4 pages, a segment can include 3 pages or 2 pages, and a segment can also include 1 block and 3 pages.
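As a minimal sketch of the division in FIG. 3, the following code lays out segments back to back, each covering a whole number of blocks. The `Segment` type, the `divide_into_segments` helper and the 512 KiB block size are assumptions made for illustration and are not taken from the source; segments that cover only part of a block are not modeled.

```python
from dataclasses import dataclass

BLOCK_BYTES = 512 * 1024  # assumed block size; real devices differ


@dataclass(frozen=True)
class Segment:
    segment_id: int
    start: int    # byte offset of the first block of the segment
    length: int   # segment size in bytes


def divide_into_segments(blocks_per_segment):
    """Lay out segments back to back, each spanning a whole number of blocks."""
    segments, offset = [], 0
    for segment_id, n_blocks in enumerate(blocks_per_segment, start=1):
        length = n_blocks * BLOCK_BYTES
        segments.append(Segment(segment_id, offset, length))
        offset += length
    return segments


# FIG. 3: the first segment has two blocks, the second segment has one.
segments = divide_into_segments([2, 1])
for seg in segments:
    print(seg.segment_id, hex(seg.start), seg.length)
```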

In a common technical solution, CORFU uses a file system to manage the storage space of the hard disk. However, in the process of reading and writing log files, log appending and log reading are simple read and write operations without complex semantics. Adopting a file system therefore introduces a lot of file management overhead for CORFU, such as file locks (which force a file to be accessed by only one user or process at any given time), resulting in a waste of processing and storage resources.

In the process of log reading and writing, compared with CORFU adding file locks to log files at file granularity, in the embodiment of the present application the storage space of the hard disk is divided into multiple segments, and the processor can manage logs based on the segments. This avoids adding a file lock to a log file in a file system, reduces the waste of processing resources and storage resources, and improves the efficiency of log reading and writing.

In a possible situation, the process of dividing the storage space of the hard disk by the processor 22 may be performed only once. For example, when the log system reads or writes logs for the first time, the processor 22 may divide the storage space used for storing log entries in the log system to obtain the above-mentioned multiple segments.

In another possible situation, the process of dividing the storage space of the hard disk by the processor 22 may be performed multiple times. For example, a timer may be set in the log system, and the timer may be implemented as computer software instructions. Each time the period set by the processor is reached, the processor 22 divides the storage space of the hard disk in the log system.

In other words, the process of dividing the storage space of the hard disk by the processor 22 may be set or adjusted according to the actual usage requirements of the log system, which is not limited in this application. For example, if the processor 22 receives the division instruction sent by the client, the processor 22 may further divide the storage space of the hard disk according to the division instruction.

S220, the client 21 sends the first log write request and the second log write request to the processor 22.

The log write request provided by the embodiment of the present application refers to a data write request including a log entry to be written. For example, the first log write request includes a first log entry, and the second log write request includes a second log entry.

It is worth noting that the embodiments of this application are described by taking a log system connected to one client as an example, but in some possible examples, multiple log write requests may be sent to the processor by two or more clients. For example, the above-mentioned first log write request and second log write request may be sent to the processor by two different clients.

In an optional implementation manner, the first log entry provided by this application includes description information and data information. The description information includes a log sequence number (LSN) of the first log write request, a log entry length, and a log entry length checksum. The data information is used to indicate the log content recorded by the first log entry.

In the log management method provided by the embodiment of the present application, a log may include one or more log entries, a log segment may store one or more log entries, and each log entry records part of the log content of the log. The log content refers to the event recorded by a log entry in the log. For example, the log content of the first log entry may be that the client performs an opening operation of an application (APP) at a first time; for another example, the log content of the above-mentioned second log entry may be that the client performs a closing operation of the APP at a second time. The log in this embodiment of the present application refers to business data, or a copy, snapshot, or clone of the business data.

FIG. 4 is schematic diagram 1 of log writing provided by this application. As shown in FIG. 4, the first log entry in the hard disk includes description information and data information, and the description information includes the log entry length checksum, the log entry length, and the LSN of the first log write request. The log entry length checksum is used to determine the integrity and accuracy of the first log entry, the log entry length is used to determine the size of the storage space occupied by the first log entry, and the LSN is used to indicate the sequence number of the log append operation of the first log write request. In some possible examples, the description information included in the first log entry may also be referred to as metadata information of the first log entry.
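The layout described above (length checksum, entry length, LSN, then data) can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions, not the format used by the application: the field widths (4, 4 and 8 bytes), little-endian byte order, the use of CRC32 over the length field as the "length checksum", and the helper names `encode_entry` and `decode_entry` are all hypothetical.

```python
import struct
import zlib

# Assumed header layout: length checksum (4 B), entry length (4 B), LSN (8 B).
HEADER = struct.Struct("<IIQ")


def encode_entry(lsn: int, data: bytes) -> bytes:
    """Build description information followed by the data information."""
    length = HEADER.size + len(data)
    checksum = zlib.crc32(struct.pack("<I", length))  # assumed: CRC32 of the length
    return HEADER.pack(checksum, length, lsn) + data


def decode_entry(buf: bytes):
    """Validate the length checksum, then return (lsn, data)."""
    checksum, length, lsn = HEADER.unpack_from(buf, 0)
    if zlib.crc32(struct.pack("<I", length)) != checksum:
        raise ValueError("length checksum mismatch")
    return lsn, bytes(buf[HEADER.size:length])


entry = encode_entry(1, b"open APP")
print(decode_entry(entry))
```

Storing the length next to a checksum of that length lets a reader detect a torn or corrupted header before trusting the length to locate the next entry.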

The data information included in the log entry refers to the log content recorded by the log entry. Usually, the log content recorded by the log entry can be one or a combination of the operation type of the data to be stored on the hard disk, the time when the data to be stored is received, the data access address of the data to be stored, and so on. The data to be stored refers to content data to be stored on the hard disk; for example, the content data may be a streaming media file. The format of the data to be stored is different from the format of the log entry included in the log write request.

Please continue to refer to FIG. 2 , after the processor 22 receives the first log writing request and the second log writing request, the log management method provided by the embodiment of the present application further includes step S230.

S230, the processor 22 writes the first log entry into the first segment of the plurality of segments in parallel, and writes the second log entry into the first segment.

No locking mechanism is implemented during the process of the processor writing the first log entry into the first segment, or during the process of the processor writing the second log entry into the first segment. A lock is a synchronization mechanism, and the lock overhead includes the memory space occupied by the lock, the time it takes for the processor to initialize and destroy the lock, acquire and release the lock, etc. The more locks an application uses, the greater the corresponding lock overhead.

The granularity of the lock determines the amount of data protected by each lock. In a common technical solution, CORFU uses file-granularity locks to lock log files. Therefore, in the process of writing multiple log entries in parallel to a log file on the hard disk, if the log file is locked, CORFU can only write one of the multiple log entries into the log file at a time, resulting in a slow writing speed for the multiple log entries.

In the conventional technology, multiple log entries cannot be written to the hard disk in parallel because the log system implements a file locking mechanism. In contrast, in the log management method provided by the embodiment of the present application, because the processor does not implement a locking mechanism on the hard disk, the processor can write the first log entry and the second log entry to the first segment of the hard disk in parallel, which improves the efficiency of the processor writing multiple log entries to the hard disk in parallel.

In this way, in the log management method provided by the embodiment of the present application, the storage space of the hardware is divided into multiple segments, and the processor can manage the logs based on the segments, avoiding the process of adding file locks to the log files in the file system, The waste of processing resources and storage resources is reduced, and the efficiency of log reading and writing is improved; in addition, since the processor does not perform a locking mechanism on the hard disk, the processor can write the first log entry and the second log entry in parallel The first segment of the hard disk improves the efficiency of the processor writing multiple log entries to the hard disk in parallel.
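The idea of writing two log entries into the same segment without taking a lock can be sketched as follows. This is an illustration only: an in-memory `bytearray` stands in for the first segment, and it is assumed that each entry's offset is fixed before the writes start (for example, from the order of their LSNs), so the two writers touch disjoint byte ranges. The names `write_entry` and `offsets` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_BYTES = 64                      # tiny segment for illustration
segment = bytearray(SEGMENT_BYTES)      # stand-in for the first segment


def write_entry(offset: int, payload: bytes) -> None:
    # Each writer touches only its own [offset, offset + len) range,
    # so no lock is acquired around the write.
    segment[offset:offset + len(payload)] = payload


first_entry = b"first-log-entry"
second_entry = b"second-log-entry"
# Assumption: offsets are assigned before writing and do not overlap.
offsets = [0, len(first_entry)]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(write_entry, off, data)
               for off, data in zip(offsets, [first_entry, second_entry])]
    for future in futures:
        future.result()   # surface any exception raised in a writer

print(bytes(segment[:len(first_entry) + len(second_entry)]))
```

Because the ranges are disjoint, the result is the same regardless of which writer finishes first.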

If, in the log management method provided by the above embodiment, the processor 22 is the processing unit 1141 in the network card 114 shown in FIG. 1, then in the process of log reading and writing, the network card 114 can implement the log management method provided by this application.

In one possible example, the network card 114 uses a direct memory access (DMA) technique to access data. For example, in this example, the processing unit 1141 uses the DMA technology to access the log segment of the hard disk 105 in the server 110B where the network card 114 is located to read and write the log, so as to realize direct hard disk access. This reduces the interaction between the network card 114 and the processor 112 during the log reading and writing process (such as copying and transmitting log requests), and improves the efficiency of log reading and writing.

In another possible example, the network card 114 uses remote direct memory access (RDMA) technology to access data. For example, in this example, the processing unit 1141 uses the RDMA technology to access the log segment of a hard disk 105 in a server of the log system other than the server where the network card 114 is located, so as to read and write the log and realize remote direct hard disk access. This avoids a process in which, during log reading and writing, the network card 114 first interacts with the processor 112 of the other server and that processor 112 then interacts with the hard disk 105 of the other server, which improves the efficiency of log reading and writing.

In this way, the log management method provided by the embodiments of the present application is implemented by the processing unit of the network card in the server, which can reduce the processing resource consumption of the CPU in the server, thereby improving the efficiency of processing other data by the CPU in the server.

As an optional implementation manner, in the process of log writing, the processor can also sort the log write requests. Please continue to refer to FIG. 4. The processor in FIG. 4 can implement the log management method provided by this application. The processor may include a processing unit and an internal memory, and the internal memory is provided with a sequencer. The sequencer may be computer software (such as an application program or a thread), or a hardware circuit integrating a log sorting function. The sequencer can globally sort the logs in the log system, so that all logs in the log system are globally ordered, which helps each server in the log system achieve consistent snapshots, cross-region atomic updates, and so on, based on the LSN of each log.

As shown in FIG. 4, the storage space of the hard disk includes a metadata area and a log segment area.

The metadata area is used to store the metadata information of each log segment in the hard disk. As shown in FIG. 4, the metadata information in the metadata area includes a segment header, a header LSN and a segment tail. The header LSN is used to indicate the LSN of a log entry, and the segment tail is used to indicate the end storage address of the log segment on the hard disk.

The log segment area is divided into 3 log segments (the first segment, the second segment, and the third segment), and each log segment can store one or more log entries. As shown in FIG. 4, a third log entry is stored in the first segment.

After the processor receives the first log write request and the second log write request, the sequencer sorts the first log write request and the second log write request. For example, the LSN of the first log entry in the first log write request is the first LSN, and the LSN of the second log entry in the second log write request is the second LSN.
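A sequencer of this kind can be sketched as a counter that stamps each incoming request with the next LSN in arrival order. The `Sequencer` class, its `assign` method, the dictionary request format and the choice of 1 as the first LSN are assumptions for illustration; the sketch is single-threaded and does not model a hardware sequencer.

```python
import itertools


class Sequencer:
    """Assigns strictly increasing LSNs to requests in arrival order."""

    def __init__(self, first_lsn: int = 1):
        self._next_lsn = itertools.count(first_lsn)

    def assign(self, request: dict) -> dict:
        # Return a copy of the request stamped with its LSN.
        return {**request, "lsn": next(self._next_lsn)}


sequencer = Sequencer()
first = sequencer.assign({"entry": b"first log entry"})
second = sequencer.assign({"entry": b"second log entry"})
print(first["lsn"], second["lsn"])
```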

It is worth noting that the sequencer shown in FIG. 4 is set in the internal memory of the processor, but in some possible examples, in order to reduce the occupation of the processor's processing resources, the sequencer may also be set in the client or in another processor. For example, the other processor is used to sort the log write requests sent by the client, which is not limited in this application.

In order to determine the position of the first segment among the multiple segments included in the hard disk, before the processor 22 writes the first log entry and the second log entry into the first segment in parallel, the embodiment of the present application further provides a method for determining the first segment. For example, the processor 22 may match the LSN of the first log write request against the log index to determine that the segment into which the first log entry included in the first log write request is to be written is the first segment. The log index is used to indicate the correspondence between the first segment and the LSN.

In a possible implementation manner, the above-mentioned log index is stored in the cache of the processor. As shown in FIG. 1, the cache may be a memory 1142, and the memory 1142 may be a DRAM. During the log writing process, the processing unit 1141 can read the log index in the cache (memory 1142), so as to quickly determine the segment to be written for the log item included in the log write request, and reduce the path query time required for log writing , which improves the efficiency of log writing.

In another possible implementation manner, the above-mentioned log index is stored in another cache connected to the processor. For example, the other cache includes one or more caches, and the processor can use DMA or RDMA technology to read the log index from the other cache. The processor then matches the LSN of the log write request against the log index to determine the segment into which the log entry included in the log write request is to be written, which reduces the path query time required for log writing and improves the efficiency of log writing. For another example, the above-mentioned other cache may be a distributed memory system, and the processor accesses the distributed memory system to read the above-mentioned log index.

As an optional implementation manner, the above-mentioned log index includes a first index and a second index.

As shown in FIG. 4, the first index may be called a first-level index or a segment cache index, and the first index is used to indicate the storage address range of the first segment in the hard disk. Exemplarily, the first index points to the secondary index corresponding to the first segment among the multiple segments of the hard disk and to the starting storage address of the log segment. Specifically, the first index includes the identifiers of the multiple segments in the hard disk, the LSN of the first log entry (that is, the log entry at the head of the segment) stored in each of the multiple segments, and the starting storage address corresponding to that log entry. In some possible examples, if the hard disk further includes other segments (the second segment and the third segment shown in FIG. 4), the first index is also used to indicate the storage address ranges of the other segments in the hard disk. As shown in Table 1 below, the first index may include the segment identifier, the LSN of the segment-head log entry, and the starting storage address of the segment-head log entry.

Table 1

Segment ID	LSN of the segment-head log entry	Starting storage address
First segment	001	0000 0000H
Second segment	101	0001 0000H
Third segment	201	0010 0000H

Among them, the H at the end is used to indicate that the starting storage address is expressed in hexadecimal form.

The LSN of the first log entry in the first segment is 001, and the starting storage address is 0000 0000H; the storage address range indicated by the first segment is: 0000 0000H~0000 FFFFH.

The LSN of the first log entry in the second segment is 101, and the starting storage address is 0001 0000H; the storage address range indicated by the second segment is: 0001 0000H~0001 FFFFH.

The LSN of the first log entry in the third segment is 201, and the starting storage address is 0010 0000H; the storage address range indicated by the third segment is: 0010 0000H~0010 FFFFH.
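A lookup against the first index can be sketched as a binary search over the segment-head LSNs of Table 1. The sketch assumes that a segment holds every LSN from its own head LSN up to, but not including, the head LSN of the next segment; that rule, and the names `FIRST_INDEX` and `find_segment`, are assumptions for illustration rather than details given in the source.

```python
import bisect

# (segment ID, LSN of the segment-head log entry, starting storage address)
FIRST_INDEX = [
    ("first segment", 1, 0x0000_0000),
    ("second segment", 101, 0x0001_0000),
    ("third segment", 201, 0x0010_0000),
]
HEAD_LSNS = [head_lsn for _, head_lsn, _ in FIRST_INDEX]


def find_segment(lsn: int):
    """Return the index row whose head LSN is the largest one not above lsn."""
    pos = bisect.bisect_right(HEAD_LSNS, lsn) - 1
    if pos < 0:
        return None   # lsn precedes every segment head
    return FIRST_INDEX[pos]


for lsn in (1, 150, 201):
    name, head, start = find_segment(lsn)
    print(lsn, name, f"{start:08X}H")
```

Keeping only one row per segment keeps the first index small, which suits a cache with limited capacity.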

As shown in FIG. 4, the second index may be referred to as a secondary index or a log entry cache index, and the second index is used to indicate the correspondence between the LSN and the storage address of each log entry in at least one storage address range. The second index includes the correspondence between the LSN and the storage address of each log entry in one or more segments in the hard disk.

In a first possible example, the second index includes the correspondence between the LSN and the storage address of each log entry in one segment of the hard disk. As shown in Table 2, the second index includes the LSN and storage address of each log entry in the first segment shown in FIG. 4.

Table 2

For example, the second index includes 4 log entries, and the correspondences between the LSNs of the 4 log entries and the storage addresses are: 001-0000 0000H, 002-0000 0010H, 003-0000 0011H, 004-0000 0030H.
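The four correspondences listed above can be loaded into a simple LSN-to-address map. The sketch below parses the "LSN-address" notation used in the text, where the trailing H marks a hexadecimal address; the helper name `parse_hex_address` and the use of a Python dictionary for the second index are assumptions for illustration.

```python
def parse_hex_address(text: str) -> int:
    """Convert an address such as '0000 0030H' to an integer."""
    return int(text.rstrip("H").replace(" ", ""), 16)


PAIRS = ["001-0000 0000H", "002-0000 0010H", "003-0000 0011H", "004-0000 0030H"]

second_index = {}
for pair in PAIRS:
    lsn_text, address_text = pair.split("-", 1)
    second_index[int(lsn_text)] = parse_hex_address(address_text)

print(second_index)
print(second_index.get(5))   # an LSN absent from the index yields None
```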

In a second possible example, the second index includes the correspondence between the LSN and the storage address of each log entry in multiple segments of the hard disk. For example, the second index includes the LSN and storage address of each log entry in the first segment and the second segment shown in FIG. 4. For the correspondence between the LSN of a log entry and the storage address, reference may be made to the relevant content in Table 2 above, which will not be repeated here.

Since the storage capacity of the cache of the processor is small, a large amount of index data cannot be stored in the cache. In the log management method provided by the embodiment of the present application, the log index includes a first index and a second index, and the processor preferentially matches against one of the first index and the second index. If the match is successful, the processor does not need to search all data in the log index, which reduces the path query time required for log writing and improves the log writing efficiency.

In the present application, the processor uses the log index to determine the first segment in which the first log entry is to be stored, without using a file system to determine a log file in which the first log entry is to be stored. This reduces the storage management overhead caused by the file system, thus reducing the storage path query time required for log writing and improving the efficiency of log writing.

In addition, in the process of writing the first log entry into the first segment by the processor, since the hard disk does not introduce a file system when storing the log, when the processor writes multiple log entries in parallel, the processor does not need to lock the log segments into which the multiple log entries are to be written. This removes the process of setting a file lock for each log segment by the processor and improves the efficiency of log writing.

To determine the first segment into which the first log entry and the second log entry are to be written, the present application provides a possible specific implementation. FIG. 5 is schematic flowchart 2 of a log management method provided by the present application. As shown in FIG. 5, after the above S220, the process in which the processor matches the LSN of the first log write request against the log index to determine the first segment includes the following steps S221 to S223.

S221: Determine whether the LSN of the first log write request is continuous with the LSN of the third log entry.

The third log entry is any log entry recorded by the second index. As shown in Table 2, the third log entry may refer to a log entry whose LSN is 004, and the starting storage address of the third log entry is 0000 0030H.

Usually, the processor writes logs into the log system by appending. In this example, the processor uses the LSN of the third log entry in the second index to determine whether the LSN of the first log write request is continuous with it. If the LSN of the first log write request is continuous with the LSN of the third log entry, S222 is executed; if it is not continuous, S223 is executed.

S222: Take the segment where the third log entry is located as the first segment.

If the LSN of the third log entry in the second index is continuous with the LSN of the first log write request, the processor takes the segment where the third log entry is located as the first segment, and writes the first log entry included in the first log write request into the first segment by appending. This saves the processor from searching a file system for the file into which the first log entry is to be written, shortens the write path query time during log writing, and improves the efficiency of log writing.

S223: Match the LSN of the first log write request with the first index to determine the first segment.

The LSN of the segment-head log entry (the log entry at the beginning of the segment) of the first segment matches the LSN of the first log write request. In this example, "matching" means that the LSN of the first log write request falls within the LSN range of the first segment, where the LSN range corresponds to the storage address range of the first segment. As shown in Table 1, the LSN of the segment-head log entry of the first segment is 001, that of the second segment is 101, and that of the third segment is 201, so the LSN range corresponding to the first segment is "001-100" and the LSN range corresponding to the second segment is "101-200".

For example, if the LSN of the first log write request is any one of "001-100", the segment into which the first log entry included in the first log write request is to be written is the first segment.

For another example, if the LSN of the first log write request is any one of "101-200", the segment into which the first log entry included in the first log write request is to be written is the second segment.

It should be noted that the above S221 to S223 are only one possible way of determining the first segment provided by the embodiments of the present application, and should not be construed as a limitation on the present application.

In the embodiments of the present application, the processor first matches the LSN of the first log write request against the second index, which includes the LSN of the third log entry. If the LSN of the first log write request is continuous with the LSN of the third log entry, the processor does not need to match the LSN of the first log write request against the first index. Compared with matching the first log write request against the directory of a file system, this saves a large amount of path query time and improves the efficiency of log writing.
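The lookup order of S221 to S223 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `second_index`, `SEGMENT_HEAD_LSNS`, `SEGMENT_IDS` and `find_segment` are hypothetical, "continuous" is assumed to mean that the requested LSN equals an indexed LSN plus one, and the second index is assumed to record the segment of each entry alongside its storage address. The sample values mirror the LSNs and addresses quoted from Table 1 and Table 2.

```python
from bisect import bisect_right

# First index (assumed layout): segment-head LSNs in ascending order,
# with the segment each one belongs to.
SEGMENT_HEAD_LSNS = [1, 101, 201]
SEGMENT_IDS = ["segment-1", "segment-2", "segment-3"]

# Second index (assumed layout): LSN -> (segment, starting storage address)
# for log entries that were written recently.
second_index = {
    1: ("segment-1", 0x0000_0000),
    2: ("segment-1", 0x0000_0010),
    3: ("segment-1", 0x0000_0011),
    4: ("segment-1", 0x0000_0030),
}


def find_segment(lsn: int) -> str:
    """Return the segment into which a log entry with this LSN is written."""
    # S221/S222: if the LSN directly follows an entry recorded in the
    # second index, reuse that entry's segment (the append case).
    previous = second_index.get(lsn - 1)
    if previous is not None:
        return previous[0]
    # S223: otherwise fall back to a range lookup in the first index.
    position = bisect_right(SEGMENT_HEAD_LSNS, lsn) - 1
    if position < 0:
        raise KeyError(f"LSN {lsn} precedes every known segment")
    return SEGMENT_IDS[position]
```

With these sample values, an LSN of 5 is resolved through the second index alone, while an LSN of 150 falls through to the range lookup and lands in the second segment.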

As an optional implementation, to improve the accuracy of log writing, the embodiments of the present application provide a possible implementation in which the processor writes the first log entry into the first segment. As shown in FIG. 5, the above S230 may include the following steps.

S2301: The processor 22 generates a starting storage address according to the LSN of the first log write request.

The starting storage address is within the storage address range of the first segment. According to Table 1, if the storage address range of the first segment is 0000 0000H to 0000 FFFFH and the LSN of the first log write request is continuous with the LSN (001) of the segment-head log entry of the first segment, the starting storage address can be 0000 0010H.

S2302: The processor 22 writes the first log entry into the first segment according to the starting storage address.

When writing log entries to the hard disk, the processor first generates the starting storage address according to the LSN of the log write request, and then writes the log entries included in the log write request to the hard disk according to that starting storage address, which ensures the accuracy of log writing.

As an optional implementation, the processor can write log entries to the hard disk atomically. "Atomic write" can be understood as the processor writing log entries to the hard disk in a preset read/write unit. The preset read/write unit may be the log read/write unit of the hard disk, which is the minimum data granularity at which the processor reads and writes log entries in the hard disk. The log read/write unit of the hard disk can be set according to the type of the hard disk, and can also be adjusted according to user needs, for example to 4KB, 8KB or 16KB.

In a possible example, before the above S2301, the processor further determines whether the log entry length of the log write request is less than or equal to the log read/write unit of the hard disk. If the log entry length of the log write request is less than or equal to the log read/write unit of the hard disk, the above-mentioned S2301 is executed.

In another possible example, the log entry length of the log write request (such as 12KB) is greater than the log read/write unit of the hard disk (such as 4KB). FIG. 6 is schematic diagram 2 of log writing provided by this application. As shown in FIG. 6, the processor divides the log entry included in the log write request into multiple sub-logs, where the length of each sub-log is consistent with the log read/write unit of the hard disk, and then generates the starting storage address of each of the multiple sub-logs according to the LSN. For the process in which the processor generates the starting storage address according to the LSN, refer to the example of S2301, which is not repeated here.

In this example, each of the multiple sub-logs includes the description information of the log entry to be written, and this description information is consistent with the description information in the log write request. For the content of the description information, refer to the related content of FIG. 4, which is not repeated here.

In a possible situation, to distinguish the above multiple sub-logs, the processor may also assign a sub-sequence number to each sub-log. The sub-sequence number indicates the position of the sub-log within the log entry to be written (such as the first log entry or the second log entry).

When the log entry included in the log write request is long, the processor can write it as multiple sub-logs, each sized to the log read/write unit of the hard disk. Even if the hard disk and the processor lose power while the processor is writing the log entry included in the log write request, the sub-logs that have already been written to the hard disk do not have to be written (or read) again by the processor, which reduces the storage resource consumption of the hard disk and improves its storage resource utilization.
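The splitting of a long log entry into sub-logs can be sketched as follows. This is a simplified illustration under stated assumptions: `SubLog`, `split_entry` and `UNIT` are hypothetical names, the 4KB unit is one of the example values given above, and the sketch sizes only the payload to the read/write unit, ignoring the space taken by the description information that each sub-log also carries.

```python
from dataclasses import dataclass

UNIT = 4 * 1024  # assumed log read/write unit of the hard disk (4KB)


@dataclass
class SubLog:
    lsn: int        # LSN copied from the log write request
    sub_seq: int    # position of this sub-log within the original log entry
    payload: bytes  # at most one read/write unit of the entry's content


def split_entry(lsn: int, data: bytes, unit: int = UNIT) -> list[SubLog]:
    """Split one log entry into sub-logs no larger than the read/write unit."""
    if len(data) <= unit:
        # The entry already fits in one unit and is written as a whole.
        return [SubLog(lsn, 1, data)]
    return [
        SubLog(lsn, number + 1, data[offset:offset + unit])
        for number, offset in enumerate(range(0, len(data), unit))
    ]
```

For the 12KB entry in the example, `split_entry` yields three sub-logs with sub-sequence numbers 1, 2 and 3, each carrying 4KB of content.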

As another optional implementation, if the remaining storage space of the first segment is less than the log entry length of the log write request, the processor may write one part of the content of the log write request into the first segment and write another part into the second segment, where the second segment is a log segment in the hard disk that is associated with the first segment for continuous reading. The processor then sets the log tail address in the hard disk to the starting storage address of the first log entry in the second segment.

FIG. 7 is schematic diagram 3 of log writing provided by this application. As shown in FIG. 7, if the remaining storage space of the first segment is less than the log entry length of the log write request, the processor divides the log entry to be written, which is included in the log write request, into multiple sub-logs. It writes one part of the content (the sub-logs with sub-sequence numbers 1 and 2 shown in FIG. 7) into the first segment, and writes another part of the content (the sub-log with sub-sequence number 3 shown in FIG. 7) into the second segment that is associated with the first segment for continuous reading.

As shown in FIG. 7 , if the storage space indicated by the second segment is not fully occupied, the second segment further includes a blank area, and the blank area can be used to store other log items.

In the embodiment of the present application, the log tail address is represented by the starting storage address of the log entry at the segment head of the log segment where the last log is located.

It is worth noting that FIG. 7 takes as an example the case in which the processor writes the log entry to be written, which is included in the log write request, into two log segments. In some possible situations, the processor may also write the log entry to be written into more log segments, which is not limited in this application.

That is to say, in the embodiments of the present application, a log append operation needs to update the starting storage address of the log segment where the log tail is located only when the log entry has to be stored in a newly allocated log segment (such as the above second segment). For example, suppose the size of a log segment is 32MB and the offset of the log tail within its log segment in the log system is "31×1024×1024+1024×1024-3×1024", that is, 3KB before the end of the segment. If the client performs a 4KB log append operation at this time, the log entry does not fit in the remaining 3KB, so the processor needs to allocate a new log segment to store the log entry and update the starting storage address of the log segment where the log tail is located. In other cases, where the log entry does not need to be stored in a newly allocated log segment, the processor does not need to update the starting storage address of the log segment where the log tail is located.
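The arithmetic in this example can be checked with a short sketch. The function name `needs_new_segment` and the variable names are hypothetical; the segment size, tail offset and append length are the values from the example above.

```python
SEGMENT_SIZE = 32 * 1024 * 1024  # 32MB log segment, as in the example


def needs_new_segment(tail_offset: int, append_len: int,
                      segment_size: int = SEGMENT_SIZE) -> bool:
    """Return True if the appended entry does not fit in the current segment."""
    return tail_offset + append_len > segment_size


# Offset of the log tail within its segment, taken from the example.
tail_offset = 31 * 1024 * 1024 + 1024 * 1024 - 3 * 1024
remaining = SEGMENT_SIZE - tail_offset  # bytes left before the segment ends

# A 4KB append exceeds the remaining space, so a new segment is allocated
# and the recorded starting address of the tail segment must be updated.
allocate = needs_new_segment(tail_offset, 4 * 1024)
```

Here `remaining` is 3072 bytes (3KB), so the 4KB append triggers the allocation, whereas an append of 3KB or less would leave the recorded tail segment unchanged.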

In this way, in the above log management method provided by the embodiments of the present application, the log tail address can be represented by the starting storage address of the log segment where the log tail is located. This avoids updating the log tail address every time a log is appended, as is done in a file system, reduces the small random writes to the log tail address required for log appending, and improves the efficiency of log writing.

To ensure the high availability of the log system, the client can also back up the log during the log writing process. FIG. 8 is schematic flowchart 3 of a log management method provided by this application. The master node 810 and the slave node 820 shown in FIG. 8 are any servers or storage devices in the log system 100 shown in FIG. 1; for example, the master node 810 is the server 110B and the slave node 820 is the server 110A. The meaning of each time mark in FIG. 8 is shown in Table 3 below.

Table 3

(a) in FIG. 8 shows the time taken for log writing in the conventional technique, and the total time is T_total (conventional) = 2T1 + T2 + 2T3 + 2T4.

(b) in FIG. 8 shows the time required for log writing in the log management method provided by the present application, and the total time is T_total (this application) = 2T1 + T2 + T3 + T4.
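Subtracting the two totals gives the time saved per log write. The derivation below uses only the two expressions stated for (a) and (b) in FIG. 8; the subscripts "conv" and "app" are labels introduced here for the conventional technique and for this application.

```latex
\begin{aligned}
T_{\text{conv}} - T_{\text{app}}
  &= (2T_1 + T_2 + 2T_3 + 2T_4) - (2T_1 + T_2 + T_3 + T_4) \\
  &= T_3 + T_4
\end{aligned}
```

The saving is therefore one occurrence each of T3 and T4, whatever their individual values are.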

Referring to (b) in FIG. 8, if the processor 22 is the master node 810 shown in FIG. 8, then after the master node 810 generates the starting storage address according to the LSN of the first log write request, that is, after T2, the master node 810 sends the LSN and the starting storage address of the first log write request to the slave node 820. The slave node 820 can then write the first log entry included in the first log write request into the first segment according to the LSN and the starting storage address of the first log write request. For the specific process, refer to the content related to FIG. 2 to FIG. 7 above, which is not repeated here.

That is to say, when the log system has multiple storage devices and the processor 22 is the master node 810, the master node 810 sends the LSN and the starting storage address of the log write request to the slave node 820, and the slave node 820 writes the log entries included in the log write request into the first segment according to that LSN and starting storage address. This avoids backing up to the slave node 820 only after the log writing of the master node 810 is completed, and improves the writing efficiency in multi-write scenarios.

As an optional implementation, if the processor 22 in the above embodiment is the slave node 820 shown in FIG. 8, then after the processor 22 (slave node 820) receives the first log write request, the processor 22 can also receive the LSN and the starting storage address of the first log write request sent by the backup device (master node 810). The processor 22 then writes the first log entry included in the first log write request into the first segment according to the LSN and the starting storage address of the first log write request. For the specific process, refer to the content related to FIG. 2 to FIG. 7 above, which is not repeated here.

That is to say, when the log system has multiple storage devices and the processor 22 is the slave node 820, the processor 22 can, after receiving the log write request sent by the client, also receive the LSN and the starting storage address of the log write request sent by the backup device (master node 810). The processor 22 (slave node 820) then writes the log entries included in the log write request into the first segment according to that LSN and starting storage address. This avoids backing up to the slave node 820 only after the log writing of the master node 810 is completed, and improves the writing efficiency in multi-write scenarios.

In the log management method provided by the embodiments of the present application, after the slave node 820 writes the log entries included in the log write request to the hard disk, the slave node 820 may also send a log write response to the client. The log write response indicates that the log entries included in the log write request (such as the above first log entry and second log entry) have been written to the hard disk. In this way, the client can determine from the log write response fed back by the slave node 820 that the current round of log append operation has been completed. This prevents the client from repeatedly sending log write requests to the log system and causing repeated backups, saves the storage resources of the log system, and improves its storage resource utilization.

Specifically, the present application separates control (the LSN and the starting storage address) from data (the log entry) in the log management method. For a log append operation, the client can concurrently send a log write request to the master node 810 and the slave node 820. After receiving the log write request, the master node 810 generates the corresponding LSN and the starting storage address of the log entry, and sends them to the slave node 820. The master node 810 and the slave node 820 can then persist the log entry in parallel. After the persistence operation is completed, the master node 810 and the slave node 820 update the cached log entry index (such as the above second index) and send confirmation information (such as the above log write response) to the client. When the client has received confirmation information from all nodes, it determines that the log append operation is complete.
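The separation of control and data described above can be sketched with an in-memory stand-in for the two nodes. Everything in this sketch is hypothetical: the classes `Node` and `Master`, the function `append`, the dictionaries standing in for the hard disk and the cached index, and the simple counters used to generate the LSN and the starting storage address are illustrative assumptions, and network transfer is replaced by direct method calls.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical in-memory stand-in for a master or slave storage node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disk: dict[int, bytes] = {}  # starting address -> persisted entry
        self.index: dict[int, int] = {}   # LSN -> starting address (cached index)

    def persist(self, lsn: int, address: int, entry: bytes) -> str:
        self.disk[address] = entry        # data path: persist the log entry
        self.index[lsn] = address         # then update the cached index
        return f"{self.name}: ack LSN {lsn}"


class Master(Node):
    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.next_lsn = 1
        self.next_address = 0

    def assign(self, length: int) -> tuple[int, int]:
        """Control path: generate the LSN and the starting storage address."""
        lsn, address = self.next_lsn, self.next_address
        self.next_lsn += 1
        self.next_address += length
        return lsn, address


def append(master: Master, slave: Node, entry: bytes) -> list[str]:
    """Persist one entry on both nodes in parallel and collect their acks."""
    # Only the small control record (LSN, address) passes through the master.
    lsn, address = master.assign(len(entry))
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(node.persist, lsn, address, entry)
                   for node in (master, slave)]
        # The client treats the append as complete once every node has acked.
        return [future.result() for future in futures]
```

A call such as `append(Master("master"), Node("slave"), b"abc")` returns one acknowledgement per node, and both nodes end up holding the entry at the same starting address.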

For log read operations, the client can read from the master node and the slave node at will, which can reduce the bandwidth occupied by the client on the network card of the server where the master node is located, and effectively improve the bandwidth utilization of the server where the master node is located.

It can be understood that, in order to implement the functions in the foregoing embodiments, the network device and the terminal device include corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should easily realize that the units and method steps of each example described in conjunction with the embodiments disclosed in the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software-driven hardware depends on the specific application scenarios and design constraints of the technical solution.

FIG. 9 is a schematic structural diagram of a log management apparatus provided by the present application. The log management apparatus 900 can be used to implement the function of the processor in the above method embodiments, and thus can also achieve the beneficial effects of the above method embodiments. In the embodiments of the present application, the log management apparatus 900 may be the processor 22 shown in FIG. 2, or the server 110A, the server 110B or the network card 114 shown in FIG. 1, or may be a module (such as a chip) applied to a server, such as the processor 112 or the processing unit 1141 shown in FIG. 1.

As shown in FIG. 9 , the log management apparatus 900 includes a processing module 910 , a communication module 920 and an indexing module 930 . The log management apparatus 900 is used to implement the functions of the processor or the hard disk in the method embodiments shown in the foregoing Fig. 2 to Fig. 8 .

When the log management apparatus 900 is used to implement the function of the processor 22 in the method embodiment shown in FIG. 2 : the processing module 910 is used to execute S210 and S230; the communication module 920 is used to execute S220.

When the log management apparatus 900 is used to implement the function of the processor 22 in the method embodiment shown in FIG. 5 : the processing module 910 is used to execute S210 and S2301-S2302; the communication module 920 is used to execute S220, and the index module 930 is used to execute S221 to S223.

More detailed descriptions of the above-mentioned processing module 910 , communication module 920 and indexing module 930 can be obtained directly by referring to the relevant descriptions in the method embodiments shown in FIG. 2 to FIG. 8 , and details are not repeated here.

FIG. 10 is a schematic structural diagram of a communication device provided by this application. The communication device 1000 can implement the operation steps of the log management methods shown in FIGS. 2 to 8 . The communication device 1000 includes a processor 1010 and a communication interface 1020 . The processor 1010 and the communication interface 1020 are coupled to each other. It can be understood that the communication interface 1020 can be a transceiver or an input-output interface.

The specific connection medium between the communication interface 1020, the processor 1010, and the memory 1030 is not limited in the embodiments of the present application. In the embodiments of the present application, the communication interface 1020, the processor 1010, and the memory 1030 are connected through a bus 1040 in FIG. 10, and the bus is represented by a thick line in FIG. 10. The connections between other components are shown only for schematic illustration and are not limiting. The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or one type of bus.

Optionally, the communication device 1000 may further include a memory 1030 for storing instructions executed by the processor 1010, input data required by the processor 1010 to run the instructions, or data generated after the processor 1010 runs the instructions. For example, the memory 1030 may be used to store the above log index. The memory 1030 may be used to store software programs and modules, such as the program instructions/modules corresponding to the log management method provided by the embodiments of the present application, and the processor 1010 performs various functional applications and data processing by executing the software programs and modules stored in the memory 1030. The communication interface 1020 can be used for signaling or data communication with other devices. In this application, the communication device 1000 may have multiple communication interfaces 1020.

When the communication device 1000 is used to implement the methods shown in FIG. 2 to FIG. 8, the processor 1010 is used to perform the functions of the above processing module 910, and the communication interface 1020 is used to perform the functions of the above communication module 920.

It can be understood that the processor in the embodiments of the present application may be a CPU, an NPU, or a GPU, and may also be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.

As shown in FIG. 10 , the communication device 1000 provided by the present application may be a network card (the network card 114 shown in FIG. 1 ), a server, a mobile phone, a tablet computer, a notebook computer or a desktop computer, and the like.

The method steps in the embodiments of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions. The software instructions can be composed of corresponding software modules, and the software modules can be stored in RAM, flash memory, ROM, PROM, EPROM, EEPROM, registers, hard disk, removable hard disk, CD-ROM or any other form of storage medium well known in the art . An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium can also be an integral part of the processor. The processor and storage medium may reside in an ASIC. Alternatively, the ASIC may be located in a network device or in an end device. Of course, the processor and the storage medium may also exist in the network device or the terminal device as discrete components.

The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented in software, they can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are executed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer programs or instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired or wireless manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media, such as floppy disks, hard disks and magnetic tapes; optical media, such as digital video discs (DVD); or semiconductor media, such as SSDs.

In the various embodiments of the present application, unless otherwise specified or logically conflicting, the terms and/or descriptions in different embodiments are consistent and can be referred to each other, and the technical features in different embodiments can be combined according to their inherent logical relationships to form new embodiments.

The terms "first", "second" and "third" in the description and claims of the present application and the above drawings are used to distinguish different objects, rather than to limit a specific order. In the embodiments of the present application, words such as "exemplary" or "for example" are used to present examples, illustrations or explanations. Any embodiment or design described in the embodiments of the present application as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the related concepts in a specific manner.

In this application, "at least one" refers to one or more, "multiple" refers to two or more, and other quantifiers are similar. "And/or" describes the association relationship between associated objects and indicates that three kinds of relationships are possible; for example, A and/or B can mean that A exists alone, that A and B exist at the same time, or that B exists alone. Furthermore, the singular forms "a", "an" and "the" do not mean "one and only one" unless the context clearly dictates otherwise, but rather "one or more". For example, "a device" means one or more such devices. Furthermore, "at least one of" means one or any combination of the subsequently associated objects; for example, "at least one of A, B and C" includes A, B, C, AB, AC, BC, or ABC. In the text description of this application, the character "/" generally indicates that the related objects before and after it are in an "or" relationship; in the formulas of this application, the character "/" indicates that the related objects are in a "division" relationship.

It can be understood that the various numerical designations involved in the embodiments of the present application are only for convenience of description and are not used to limit the scope of the embodiments of the present application. The magnitude of the sequence numbers of the above processes does not imply an order of execution; the order of execution of each process should be determined by its function and internal logic.

Claims

A log management method, comprising:

dividing the storage space of a hard disk into multiple segments, wherein one segment includes one or more physical blocks, and the one segment is used to store logs;

receiving a first log write request and a second log write request, the first log write request includes a first log entry, and the second log write request includes a second log entry;

writing, in parallel, the first log entry into a first segment of the multiple segments and the second log entry into the first segment.
The method according to claim 1, wherein no locking mechanism is applied during the writing of the first log entry into the first segment or during the writing of the second log entry into the first segment.
The method according to claim 1 or 2, wherein the method further comprises:

matching the log sequence number (LSN) of the first log write request against a log index to determine that the segment into which the first log entry is to be written is the first segment, wherein the log index is used to indicate the correspondence between the first segment and the LSN.
The method according to claim 3, wherein the log index comprises a first index and a second index, the first index is used to indicate a storage address range of the first segment in the hard disk, and the second index is used to indicate the correspondence between the LSN and the storage address of each log entry in at least one of the storage address ranges;

the matching of the LSN against the log index to determine the first segment comprises:

determining whether the LSN is continuous with the LSN of a third log entry, wherein the third log entry is any log entry recorded by the second index;

if the LSN is continuous with the LSN of the third log entry, taking the segment where the third log entry is located as the first segment;

if the LSN is not continuous with the LSN of the third log entry, matching the LSN against the first index to determine the first segment, wherein the LSN of the log entry at the head of the first segment matches the LSN of the first log write request.
The method according to claim 3 or 4, wherein the log index is stored in a cache.
The method according to any one of claims 1-5, wherein the first log entry includes description information and data information, and the description information includes an LSN of the first log write request and a log entry length and a log entry length checksum, the data information is used to indicate the log content recorded by the first log entry.
The method according to claim 6, wherein writing the first log entry into the first segment comprises:

generating a starting storage address according to the LSN of the first log write request, wherein the starting storage address is within the storage address range of the first segment;

writing the first log entry into the first segment according to the starting storage address.
The method according to claim 7, wherein after generating the starting storage address according to the LSN of the first log write request, the method further comprises:

sending the LSN and the starting storage address of the first log write request to a backup device.
The method according to any one of claims 1-8, wherein after the receiving the first log write request, the method further comprises:

receiving the LSN and the starting storage address of the first log write request sent by a backup device;

writing the first log entry into the first segment according to the LSN of the first log write request and the starting storage address.
The method according to any one of claims 1-9, wherein the method further comprises:

A log write response is sent to the client, where the log write response is used to indicate that the first log and the second log have been written to the first segment.
A log management device, comprising:

a processing module, configured to divide the storage space of the hard disk into multiple segments, one segment includes one or more physical blocks, and the one segment is used for storing logs;

a communication module, configured to receive a first log write request and a second log write request, the first log write request includes a first log entry, and the second log write request includes a second log entry;

The processing module is further configured to write, in parallel, the first log entry into a first segment of the plurality of segments and the second log entry into the first segment.
The apparatus according to claim 11, wherein no locking mechanism is applied while the first log entry is being written into the first segment or while the second log entry is being written into the first segment.
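Why no lock is needed can be illustrated with a small sketch: if each log entry is assigned its own non-overlapping byte range in the segment before writing, the writers never touch the same bytes. The code below assumes the ranges are already known, models the segment as a `bytearray`, and uses Python threads only to show the structure; a real system would issue positional writes to the disk. The name `write_concurrently` is hypothetical.

```python
import threading


def write_concurrently(segment: bytearray, placements) -> None:
    """Write (offset, payload) pairs into one segment from separate threads.

    No lock is taken around the writes; correctness relies on the ranges
    being disjoint, which is checked up front.
    """
    spans = sorted((off, off + len(data)) for off, data in placements)
    for (_, prev_end), (start, _) in zip(spans, spans[1:]):
        if start < prev_end:
            raise ValueError("overlapping write ranges")
    if spans and (spans[0][0] < 0 or spans[-1][1] > len(segment)):
        raise ValueError("write range outside the segment")

    def worker(offset: int, payload: bytes) -> None:
        # Same-length slice assignment touches only this entry's bytes.
        segment[offset:offset + len(payload)] = payload

    threads = [threading.Thread(target=worker, args=p) for p in placements]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling `write_concurrently(seg, [(0, b"AAAA"), (4, b"BBBB")])` fills the first eight bytes of `seg` regardless of which thread runs first.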
The device according to claim 11 or 12, wherein the device further comprises:

An indexing module, configured to match the log sequence number (LSN) of the first log write request against a log index and determine that the segment into which the first log entry is to be written is the first segment, wherein the log index is used to indicate the correspondence between the first segment and the LSN.
The apparatus according to claim 13, wherein the log index includes a first index and a second index, the first index is used to indicate a storage address range of the first segment in the hard disk, and the second index is used to indicate the correspondence between the LSN and the storage address of each log entry in at least one of the storage address ranges;

The index module is specifically configured to determine whether the LSN is continuous with the LSN of a third log entry, the third log entry being any log entry recorded by the second index;

If the LSN is continuous with the LSN of the third log entry, the index module is specifically configured to use the segment where the third log entry is located as the first segment;

If the LSN is not continuous with the LSN of the third log entry, the index module is specifically configured to match the LSN against the first index and determine the first segment, wherein the LSN of the first log entry in the first segment matches the LSN of the first log write request.
The apparatus according to claim 13 or 14, wherein the log index is stored in a cache.
The apparatus according to any one of claims 11-15, wherein the first log entry includes description information and data information, the description information includes the LSN of the first log write request, a log entry length, and a checksum of the log entry length, and the data information is used to indicate the log content recorded by the first log entry.
The apparatus according to claim 16, wherein the processing module is specifically configured to generate a starting storage address according to the LSN of the first log write request, the starting storage address being within the storage address range of the first segment;

The processing module is specifically configured to write the first log entry into the first segment according to the starting storage address.
The apparatus according to claim 17, wherein the communication module is further configured to send the LSN and the starting storage address of the first log write request to the backup device.
The apparatus according to any one of claims 11-18, wherein the communication module is further configured to receive the LSN of the first log write request and a starting storage address sent by the backup device;

The processing module is further configured to write the first log entry into the first segment according to the LSN of the first log write request and the starting storage address.
The apparatus according to any one of claims 11-19, wherein the communication module is further configured to send a log write response to the client, where the log write response is used to indicate that the first log and the second log have been written to the first segment.
A communication device, characterized in that it includes a chip, the chip includes a processor and a memory, the memory is used for storing computer instructions, and when the processor executes the computer instructions, the communication device is caused to execute the method of any one of claims 1-10.
A computer-readable storage medium, characterized in that a computer program or instructions are stored in the storage medium, and when the computer program or instructions are executed by a processor or a communication device, the method of any one of claims 1 to 10 is implemented.