CN117056294A - WAL processing method and device, electronic equipment and storage medium - Google Patents

WAL processing method and device, electronic equipment and storage medium

Info

Publication number
CN117056294A
Authority
CN
China
Prior art keywords
data
data block
area
memory mapping
processed
Prior art date
Legal status
Pending
Application number
CN202210491712.2A
Other languages
Chinese (zh)
Inventor
吴坤
刘京波
王能
徐丹
邵珠光
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210491712.2A
Publication of CN117056294A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/172: Caching, prefetching or hoarding of files
    • G06F 16/1737: Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G06F 16/18: File system types
    • G06F 16/1805: Append-only file systems, e.g. using logs or journals to store data
    • G06F 16/1815: Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the technical field of data processing, and in particular to a WAL processing method, a WAL processing device, an electronic device, and a storage medium. A data segment to be processed is received and added to a preset data block; when it is determined that the total number of data segments to be processed contained in the data block meets a preset number condition, the data block is transferred from a buffer area to a memory mapping area; a preset memory mapping function is then called to map the data block from the memory mapping area to a write-ahead log (WAL) on a disk, obtaining a mapped WAL. Because data segments to be processed are stored in the form of data blocks, reading a piece of data does not require reading all of the data in the WAL, which reduces the resource consumption of the operating system; and because the data block is mapped by calling a memory mapping function, the switching overhead from user mode to kernel mode is reduced, further lowering system resource consumption.

Description

WAL processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a WAL processing method, a WAL processing device, an electronic device, and a storage medium.
Background
Currently, with the development of computer technology, operation data may be stored in a Write-Ahead Log (WAL) to ensure that it can be recovered.
For example, while the operating system is running, operation data is written to the WAL; after the system crashes, the corresponding operation data is read back from the WAL to recover the data.
In the related art, referring to FIG. 1, which is a flow chart of a WAL processing method in the related art, the flow is as follows. Because the operating system has permission to access the disk only when running in kernel mode, storing operation data in the WAL first requires creating a WAL file; once the WAL file has been created, the operation data is copied from the storage space used by the operating system in user mode to the storage space used in kernel mode, and is then written from the kernel-mode storage space into the WAL on the disk. To allow the operation data to be read quickly, a corresponding log snapshot can be generated and stored when the size of the WAL file exceeds a capacity threshold. When reading operation data, if a log snapshot is determined to exist, both the snapshot and the portion of the WAL file not yet covered by the snapshot are read to obtain the operation data; if no un-snapshotted WAL file exists, the log snapshot alone is read to obtain the operation data.
However, because the operation data has to be copied from the user-mode storage space to the kernel-mode storage space, a large amount of system resources is consumed during the copy, which degrades system performance when storing operation data.
In addition, once the log snapshot has been generated, all operation data is bound together as a whole; to read a single segment of operation data, the entire log snapshot must be read and the required operation data extracted from it, which increases the resource consumption of the operating system.
Disclosure of Invention
The embodiment of the application provides a WAL processing method, a WAL processing device, electronic equipment and a storage medium, which are used for improving the performance of storing operation data and reducing the resource consumption.
The specific technical scheme provided by the embodiment of the application is as follows:
in one aspect, an embodiment of the present application provides a WAL processing method, including:
receiving a data segment to be processed, and adding the received data segment to be processed into a preset data block, wherein the data block is located in a buffer area of a user-mode storage area;
when it is determined that the total number of data segments to be processed contained in the data block meets a preset number condition, transferring the data block from the buffer area to a memory mapping area, the memory mapping area being located in the user-mode storage area; and
calling a preset memory mapping function, and mapping the data block from the memory mapping area to a write-ahead log (WAL) on a disk, to obtain a mapped WAL.
In one aspect, an embodiment of the present application provides a WAL processing apparatus, including:
an adding module, used to receive a data segment to be processed and add the received data segment to be processed into a preset data block, wherein the data block is located in a buffer area of a user-mode storage area;
a processing module, used to transfer the data block from the buffer area to a memory mapping area when it is determined that the total number of data segments to be processed contained in the data block meets a preset number condition, the memory mapping area being located in the user-mode storage area; and
a mapping module, used to call a preset memory mapping function and map the data block from the memory mapping area to a write-ahead log (WAL) on a disk, to obtain a mapped WAL.
Optionally, when the received data segment to be processed is added to a preset data block, the adding module is further configured to:
determining data attribute information corresponding to the received data segment to be processed;
splicing the data attribute information and the data segment to be processed according to a preset splicing format to obtain a spliced data segment to be processed; and
adding the spliced data segment to be processed into the preset data block.
Optionally, the processing module is further configured to:
if it is determined that the storage mode corresponding to the data block is a single data segment, transferring the data block from the buffer area to the memory mapping area; and
if it is determined that the storage mode is multiple data segments, transferring the data block from the buffer area to the memory mapping area when it is determined that the total number of data segments to be processed contained in the data block is greater than a preset number threshold.
Optionally, when the data block is transferred from the buffer area to the memory mapping area, the processing module is further configured to:
when the length of the data block corresponding to the data block is determined to be larger than the residual memory capacity of the memory mapping area, randomly reading a memory mapping area with an idle working state from a preset mapping pool, wherein the mapping pool comprises a plurality of memory mapping areas;
and transferring the data block to the read memory mapping area.
Optionally, when the data block is transferred from the buffer area to the memory mapping area, the processing module is further configured to:
when it is determined that the data block length corresponding to the data block is greater than the remaining memory capacity of the memory mapping area, generating a new memory mapping area in the user-mode storage area; and
transferring the data block from the buffer area to the new memory mapping area.
Optionally, the mapping module is further configured to:
determining the WAL mapped to the memory mapping area based on the mapping relationship between each memory mapping area and each WAL; and
calling the memory mapping function, and mapping the data block from the memory mapping area to the determined WAL to obtain the mapped WAL.
Optionally, the apparatus further includes a data reading module, where the data reading module is configured to:
receiving a data reading request, wherein the data reading request at least comprises a data index identifier;
and reading the target data segment from the corresponding data block based on the data index identifier.
Optionally, when the target data segment is read from the corresponding data block based on the data index identifier, the data reading module is further configured to:
calling the memory mapping function, and mapping the target data segment, determined based on the data index identifier, from the WAL into a memory mapping area to obtain a mapped memory mapping area.
Optionally, after the received data segment to be processed is added to the preset data block, the device further includes a buffer module, where the buffer module is configured to:
caching the data block into a data block cache area, wherein the data block cache area is located in the user-mode storage area.
Optionally, when the target data segment is read from the corresponding data block based on the data index identifier, the data reading module is further configured to:
and when the data block cache area is determined to have the target data block corresponding to the data block identifier, reading a target data segment corresponding to the data segment identifier from the target data block of the data block cache area.
Optionally, the device further includes a verification module, where the verification module is configured to:
calculating target attribute information corresponding to the target data segment, and determining the data attribute information corresponding to the target data segment from the corresponding data block;
and determining an integrity check result of the target data segment based on the target attribute information and the data attribute information.
Optionally, when determining the integrity check result of the target data segment based on the target attribute information and the data attribute information, the check module is further configured to:
when it is determined that the target length in the target attribute information is the same as the data length in the data attribute information, and that the target information digest value in the target attribute information is the same as the data information digest value in the data attribute information, determining that the integrity check result of the target data segment is complete; and
when it is determined that the target length is different from the data length and that the target information digest value is different from the data information digest value, determining that the integrity check result of the target data segment is incomplete.
In one aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores program code that, when executed by the processor, causes the processor to perform the steps of any of the above-mentioned WAL processing methods.
In one aspect, embodiments of the present application provide a computer storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the steps of any of the WAL processing methods described above.
In one aspect, embodiments of the present application provide a computer program product comprising computer instructions stored in a computer-readable storage medium; when the processor of the electronic device reads the computer instructions from the computer readable storage medium, the processor executes the computer instructions, causing the electronic device to perform the steps of any of the WAL processing methods described above.
Due to the adoption of the technical scheme, the embodiment of the application has at least the following technical effects:
A data segment to be processed is received and added into a preset data block; when it is determined that the total number of data segments to be processed contained in the data block meets the preset number condition, the data block is transferred from the buffer area of the user-mode storage area to the memory mapping area, which is also located in the user-mode storage area; and the data block is mapped from the memory mapping area to the WAL on the disk by calling the preset memory mapping function, thereby obtaining the mapped WAL.
In this way, because received data segments to be processed are collected into a data block, and the data block is transferred to the memory mapping area for mapping only when the total number of data segments it contains meets the preset number condition, a read of a single segment of data in the WAL can fetch the required data directly from the corresponding data block without reading all of the data in the WAL, which reduces the resource consumption of the operating system. In addition, when storing data to be processed, the memory mapping function maps the data directly from the user-mode memory mapping area to the WAL on the disk, so the data does not need to be copied from the user-mode memory space to the kernel-mode memory space; this reduces the switching overhead between user mode and kernel mode, lowers system resource consumption, and improves system performance when storing operation data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a WAL processing method in the related art;
fig. 2A is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 2B is a schematic structural diagram of a blockchain in an embodiment of the present application;
FIG. 2C is a schematic diagram of verification in an embodiment of the present application;
FIG. 3A is a flow chart of a WAL processing method according to an embodiment of the present application;
FIG. 3B is a flow chart of adding a data segment to be processed according to an embodiment of the application;
FIG. 3C is an exemplary diagram of splicing data segments to be processed according to an embodiment of the present application;
FIG. 3D is a schematic diagram of a data block according to an embodiment of the present application;
FIG. 3E is a schematic diagram illustrating a first flow of transferring data blocks according to an embodiment of the present application;
FIG. 3F is a diagram illustrating a first example of transferring a data block according to an embodiment of the present application;
FIG. 3G is a schematic diagram illustrating a second flow of transferring data blocks according to an embodiment of the present application;
FIG. 3H is a diagram illustrating a second example of transferring a data block according to an embodiment of the present application;
FIG. 3I is a third flow chart of transferring a data block according to an embodiment of the present application;
FIG. 3J is a diagram illustrating a third example of transferring a data block according to an embodiment of the present application;
FIG. 3K is a flow chart of mapping data blocks according to an embodiment of the present application;
FIG. 3L is an exemplary diagram of mapping data blocks in an embodiment of the present application;
FIG. 3M is a flow chart illustrating the reading of data segments according to an embodiment of the present application;
FIG. 3N is a flow chart illustrating the data integrity verification according to an embodiment of the present application;
FIG. 3O is a flowchart illustrating a method for determining an integrity check result according to an embodiment of the present application;
FIG. 4 is a diagram illustrating an exemplary WAL processing method according to an embodiment of the present application;
FIG. 5 is a flow chart illustrating the storage of data segments to be processed according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a flow of reading a data segment according to an embodiment of the present application;
FIG. 7 is a flow chart of writing a data segment to be processed into a memory map region according to an embodiment of the application;
FIG. 8 is a schematic diagram of a data writing process according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another flow of data writing in an embodiment of the present application;
FIG. 10 is a schematic diagram of a WAL processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic diagram of a hardware composition structure of an electronic device to which the embodiment of the present application is applied.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
Data block: a data set containing at least one data segment to be processed.
It should be noted that a data block may contain one data segment to be processed or a plurality of data segments to be processed, which is not limited in the embodiments of the present application.
User-mode storage area: the memory space used by the operating system when running in user mode; the user-mode storage area includes at least a buffer area and a memory mapping area.
Buffer area: a memory space located in the user-mode storage area and used to store data blocks.
Memory mapping area: a memory space located in the user-mode storage area and used to map data blocks.
It should be noted that, in the embodiments of the present application, each memory mapping area has a one-to-one mapping relationship with a WAL, and a data block stored in the memory mapping area can be mapped into the WAL through the memory mapping function.
Mapping pool: a memory space located in the user-mode storage area that contains at least one memory mapping area.
Memory mapping function: a function used to map data blocks from the memory mapping area into the WAL on the disk.
The memory mapping function in the embodiments of the present application may be, for example, a memory map (MemoryMap) function, which is not limited here.
Data block cache area: a memory space located in the user-mode storage area and used to cache data blocks.
User mode: an execution state with limited processing privileges. When the operating system runs in user mode, it runs user programs, can access only a limited memory space, and is not allowed to access peripheral devices.
Kernel mode: an execution state with full processing privileges. When the operating system runs in kernel mode, it can access all data in memory and all peripherals, such as hard disks and network cards, and it can also switch from one program to another.
The following briefly describes the design concept of the embodiment of the present application:
currently, with the development of computer technology, operating data of an operating system may be stored in a WAL, and when the operating system crashes, the operating data may be recovered by reading the WAL.
In the related art, because the operating system has permission to access the disk only when running in kernel mode, the process of storing operation data into the WAL is as follows: the operating system is called to create the WAL, the operation data is written into the user-mode storage space according to a certain naming rule, the operation data is copied from the user-mode storage space to the kernel-mode storage space, and finally the operation data is written from the kernel-mode storage space into the WAL on the disk. To speed up the reading of operation data, in the related art, when the WAL file on the disk grows too large, a corresponding log snapshot is generated and stored, so that the log snapshot can be read directly when operation data is read.
However, in this related-art approach, because the operation data has to be copied from the user-mode storage space to the kernel-mode storage space, a large amount of system resources is consumed during the copy, degrading the performance of the operating system when storing operation data.
In addition, in this related-art approach, when the WAL file becomes too large, a corresponding log snapshot is generated and stored, and all operation data is bound together as a whole; if the relevant data is contained in the log snapshot, the entire log snapshot must be read first and the corresponding operation data extracted from it, which increases the resource consumption of the operating system.
In view of the above, the embodiments of the present application provide a WAL processing method and apparatus, an electronic device, and a storage medium. A memory mapping function provided by the operating system is introduced: a piece of memory space in the user-mode storage area is mapped to the WAL on the disk, i.e. a mapping relationship exists between the memory mapping area and the WAL. When a data segment to be processed is received, the data block containing it is stored into the created memory mapping area, so that the data block can be mapped from the memory mapping area to the WAL on the disk by calling the memory mapping function; this reduces the switching overhead of the operating system between user mode and kernel mode and improves system performance when storing operation data.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and that the embodiments of the present application and the features of the embodiments may be combined with each other without conflict.
Fig. 2A is a schematic diagram of an application scenario in an embodiment of the present application. The application scenario diagram includes a data sharing system 200.
In the embodiments of the present application, the data sharing system 200 is a system for sharing data between nodes. The data sharing system may include a plurality of nodes 201, and the plurality of nodes 201 may be clients in the data sharing system on each of which an operating system is installed. Each node 201 can receive input information during normal operation and maintain the shared data in the data sharing system based on the received input information. To ensure information exchange within the data sharing system, an information connection may exist between every pair of nodes, and information can be transmitted over these connections. For example, when any node in the data sharing system receives input information, the other nodes in the data sharing system obtain the input information according to a consensus algorithm and store it as shared data, so that the data stored on all nodes in the data sharing system is consistent. In addition, when a node 201 runs the operating system, operation data, i.e. data segments to be processed, is generated; after each node 201 generates a data segment to be processed, the data block containing that data segment is mapped into the WAL on the disk through the memory mapping area.
In one possible implementation, the nodes of the blockchain may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present application.
It should be noted that each node in the data sharing system has a corresponding node identifier, and each node can store the node identifiers of the other nodes in the data sharing system, so that a generated block can be broadcast to the other nodes according to their node identifiers. After receiving a data segment to be processed, the other nodes likewise add the received data segment to a preset data block and map the data block containing the data segment into the WAL on their own disk through the memory mapping area. In addition, each node can maintain a node identifier list as shown in the following table, in which node names and node identifiers are stored correspondingly. The node identifier may be an IP (Internet Protocol) address or any other information that can be used to identify the node; Table 1 uses an IP address only as an example.
Table 1.
Node name Node identification
Node 1 117.114.151.174
Node 2 117.116.189.145
Node N 117.123.199.201
Each node in the data sharing system stores one identical blockchain. Referring to FIG. 2B, which is a schematic structural diagram of a blockchain in an embodiment of the present application, the blockchain is composed of a plurality of blocks. The genesis block includes a block header and a block body; the block header stores the input information feature value, the version number, the timestamp, and the difficulty value, and the block body stores the input information. The next block takes the genesis block as its parent block and likewise includes a block header and a block body; its block header stores the input information feature value of the current block, the block header feature value of the parent block, the version number, the timestamp, the difficulty value, and so on. In this way, the block data stored in each block of the blockchain is associated with the block data stored in its parent block, which ensures the security of the input information in the blocks.
When each block in the blockchain is generated, referring to FIG. 2C, which is a schematic diagram of verification in an embodiment of the present application, the node on which the blockchain resides verifies the input information when it is received; after the verification is completed, the input information is stored in the memory pool and the hash tree used to record the input information is updated. The update timestamp is then set to the time at which the input information was received, and different random numbers are tried, computing the feature value repeatedly until the computed feature value satisfies the following formula:
SHA256(SHA256(version + prev_hash + merkle_root + ntime + nbits + x)) < TARGET
where SHA256 is the feature value algorithm used to compute the feature value; version is the version information of the relevant block protocol in the blockchain; prev_hash is the block header feature value of the parent block of the current block; merkle_root is the feature value of the input information; ntime is the update time of the update timestamp; nbits is the current difficulty, which is fixed for a period of time and re-determined after a fixed period has elapsed; x is the random number; and TARGET is a feature value threshold that can be determined from nbits.
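For concreteness, the following is a minimal sketch in Go of how a proof-of-work check of this form could be evaluated. The header field widths, the serialization order, and the target value used here are illustrative assumptions made for this sketch and are not taken from the application.

package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/big"
)

// blockHeader bundles the quantities that appear in the formula above.
// Field widths and serialization order are illustrative assumptions.
type blockHeader struct {
	version    uint32   // version information of the block protocol
	prevHash   [32]byte // block header feature value of the parent block
	merkleRoot [32]byte // feature value of the input information
	ntime      uint32   // update time of the update timestamp
	nbits      uint32   // current difficulty
	x          uint32   // the random number being tried
}

// meetsTarget computes SHA256(SHA256(serialized header)) and checks it against TARGET.
func meetsTarget(h blockHeader, target *big.Int) bool {
	buf := make([]byte, 0, 80)
	buf = binary.LittleEndian.AppendUint32(buf, h.version)
	buf = append(buf, h.prevHash[:]...)
	buf = append(buf, h.merkleRoot[:]...)
	buf = binary.LittleEndian.AppendUint32(buf, h.ntime)
	buf = binary.LittleEndian.AppendUint32(buf, h.nbits)
	buf = binary.LittleEndian.AppendUint32(buf, h.x)

	first := sha256.Sum256(buf)
	second := sha256.Sum256(first[:])
	// Interpret the double hash as an integer and require it to be below the target.
	return new(big.Int).SetBytes(second[:]).Cmp(target) < 0
}

func main() {
	// An intentionally easy target so the loop finishes quickly in this sketch.
	target := new(big.Int).Lsh(big.NewInt(1), 240)
	h := blockHeader{version: 1, ntime: 1650000000, nbits: 0x1d00ffff}
	for !meetsTarget(h, target) {
		h.x++ // try different random numbers until the formula is satisfied
	}
	fmt.Println("found x:", h.x)
}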
Thus, when a random number satisfying the above formula has been found, the information can be stored accordingly to generate the block header and the block body, obtaining the current block. The node on which the blockchain resides then sends the newly generated block to the other nodes in its data sharing system according to their node identifiers; the other nodes verify the newly generated block and, after the verification is completed, add it to the blockchain they store.
The following describes an application scenario in the embodiment of the present application.
Application scenario one:
the method in the embodiments of the present application may be applied to the WAL used by a blockchain consensus mechanism. Specifically, to reach agreement in the blockchain network, a consensus algorithm may be used to decide an event. When a node in the blockchain network receives operation data sent by other nodes for a certain event, the received operation data is stored in a data block in the buffer area of the user-mode storage area; when it is determined that the total number of data segments to be processed contained in the data block meets the preset number condition, the data block is transferred from the buffer area to the memory mapping area, which is also located in the user-mode storage area; and the data block is mapped from the memory mapping area to the WAL on the disk by calling the preset memory mapping function, thereby storing the operation data. The decision result for the event can then be determined based on the stored operation data in combination with the consensus algorithm.
And (2) an application scene II:
while the operating system is being used, corresponding operation data is generated. After the operation data is generated, it is added to a data block preset in the buffer area; once the data block is full, it is transferred from the buffer area to the memory mapping area, and the data block is mapped from the memory mapping area into the WAL on the disk through the preset memory mapping function. When the operating system crashes, the operation data in the WAL can be read back through the memory mapping function, restoring the operating system to its working state before the crash.
The following describes the WAL processing procedure in the embodiment of the present application with reference to the accompanying drawings, and referring to fig. 3A, a schematic flow chart of a WAL processing method in the embodiment of the present application is shown, and a specific WAL processing flow is as follows:
S30: receiving the data segment to be processed, and adding the received data segment to be processed into a preset data block.
Wherein the data block is arranged in a buffer area of the user mode storage area.
In the embodiment of the application, when the data segment to be processed is determined to be received, the data block arranged in the buffer area is read, and the received data segment to be processed is added into the read data block.
It should be noted that, when the data segment to be processed is received, the data block that is read may not yet contain any data segment to be processed, or it may already contain at least one data segment to be processed, i.e. data segments received at other points in time; this is not limited here.
Optionally, in the embodiments of the present application, to make it easier to map the data into the memory mapping area, data segments to be processed are transferred into the memory mapping area in the form of data blocks. When a data segment to be processed is added to a data block, the data segment and its corresponding data attribute information may first be spliced together and then added to the data block. The process of adding a data segment to be processed to a data block is described below with reference to FIG. 3B, which is a schematic flow chart of adding a data segment to be processed in an embodiment of the present application, and specifically includes:
s301: and determining the data attribute information corresponding to the received data segment to be processed.
In the embodiment of the application, attribute calculation is carried out on the received data segment to be processed, and data attribute information corresponding to the data segment to be processed is obtained.
Wherein the data attribute information includes at least one of: data length, data information digest value.
It should be noted that, the data length is the length of the data segment to be processed, for example, may be 64kb, which is not limited in the embodiment of the present application.
The data information digest value is obtained by performing information digest calculation on the data segment to be processed by adopting a preset information digest algorithm, and the information digest algorithm can be, for example, a cyclic redundancy check (Cyclic Redundancy Check, CRC) 32 algorithm or a CRC16 algorithm, which is not limited in the embodiment of the application.
S302: and splicing the data attribute information and the data segments to be processed according to a preset splicing format to obtain spliced data segments to be processed.
In the embodiment of the application, the data attribute information and the data segment to be processed are spliced according to the preset splicing format, so that the spliced data segment to be processed is obtained, and the spliced data segment to be processed comprises the data segment to be processed and the data attribute information.
When the data segment to be processed and the data attribute information are spliced, the last character of the data attribute information can be connected with the first character of the data segment to be processed, and of course, the last character of the data segment to be processed can also be connected with the first character of the data attribute information.
If the data attribute information consists of the data length and the data information digest value, the data length and the data information digest value are first spliced according to the preset splicing format, and then the data segment to be processed is spliced with the spliced data length and data information digest value to obtain the spliced data segment to be processed.
For example, referring to FIG. 3C, which is an example diagram of splicing a data segment to be processed in an embodiment of the present application, assume that the data attribute information consists of a data length and a CRC32 value. First, the data length of the data segment to be processed is calculated, and the CRC32 algorithm is applied to the data segment to obtain its CRC32 value; here the data length occupies 8 bits, the CRC32 value occupies 8 bits, and the data segment to be processed occupies 184 bits. The three are then spliced in the format "data length + CRC32 value + data segment to be processed" to obtain the spliced data segment to be processed.
It should be noted that, in the embodiments of the present application, different data segments to be processed may have their data attribute information calculated in different ways, so the data formats of the resulting attribute information may differ; for example, a data information digest value calculated with the CRC32 algorithm has a different format from one calculated with a hash algorithm. To make the data attribute information easier to store, its format can therefore be normalized so that attribute information determined by different calculation algorithms has the same format, i.e. each piece of data attribute information occupies the same number of bytes.
In addition, it should be noted that, in the embodiment of the present application, the same attribute calculation manner may be used to determine the data attribute information of different data segments to be processed, and of course, different attribute calculation manners may be used to determine the data attribute information of different data segments to be processed, which is not limited in the embodiment of the present application.
S303: and adding the spliced data segments to be processed into a preset data block.
In the embodiments of the present application, after the spliced data segment to be processed is obtained, it is added to the preset data block to obtain the updated data block. Because the data attribute information corresponding to the data segment is added to the data block together with the data segment, the exact position of a required data segment can be located quickly when reading, which improves data reading efficiency.
For example, a data block in an embodiment of the present application may be divided into two parts: one part stores the data index and the other stores the data segments to be processed. The data segments are stored first, and each data segment to be processed is divided into three parts: the data length, the CRC32 value, and the data segment itself. Each time a serialized data segment to be processed is written, the length of the data segment is calculated first, then the CRC32 value is calculated, and finally the data is written into the data part of the data block in the form "data length + CRC32 value + data segment to be processed". Next, the data index of the data segment is recorded in the index area.
It should be noted that the data length is recorded so that, when the data segment is read, it can be checked whether any data has been lost, and the CRC32 value is recorded to verify the integrity of the data segment.
For example, referring to fig. 3D, a schematic structure of a data block in an embodiment of the present application is shown, where the data block includes a data index area and a data area, the data index area stores a start position 0 and an end position 199 corresponding to a data segment a to be processed, a start position 200 and an end position 399 corresponding to a data segment B to be processed, a start position 400 and an end position 599 corresponding to a data segment C to be processed, the data area stores a data length of the data segment a to be processed, a CRC32 value and a data segment to be processed, a data length of the data segment B to be processed, a CRC32 value and a data segment to be processed, and a data length of the data segment C to be processed, a CRC32 value and a data segment to be processed.
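As an illustration of this layout, the following Go sketch appends records to an in-memory data block in the "data length + CRC32 value + data segment" form and keeps start/end positions in an index area. The type and field names, and the 4-byte widths chosen here for the length and CRC32 fields, are assumptions made for the sketch rather than details taken from the application.

package wal

import (
	"encoding/binary"
	"hash/crc32"
)

// indexEntry records where one data segment starts and ends inside the data area.
type indexEntry struct {
	Start, End int
}

// Block is an illustrative in-memory data block with a data index area and a data area.
type Block struct {
	Index []indexEntry // data index area: one entry per appended segment
	Data  []byte       // data area: "data length + CRC32 value + data segment", repeated
}

// Append splices one data segment to be processed into the data area and
// records its start and end positions in the index area.
func (b *Block) Append(segment []byte) {
	start := len(b.Data)

	var header [8]byte
	binary.BigEndian.PutUint32(header[0:4], uint32(len(segment)))        // data length
	binary.BigEndian.PutUint32(header[4:8], crc32.ChecksumIEEE(segment)) // CRC32 value

	b.Data = append(b.Data, header[:]...)
	b.Data = append(b.Data, segment...)
	b.Index = append(b.Index, indexEntry{Start: start, End: len(b.Data) - 1})
}

Reading a segment then only needs its index entry: the length field allows a truncation check, and recomputing the CRC32 over the payload verifies integrity, matching the checks described above.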
S31: when it is determined that the total number of data segments to be processed contained in the data block meets the preset number condition, the data block is transferred from the buffer area to the memory mapping area.
The memory mapping area is located in the user-mode storage area.
In the embodiments of the present application, it is determined whether the total number of data segments to be processed contained in the data block meets the preset number condition, and when it is confirmed that the total number meets the preset number condition, the data block is transferred from the buffer area to the memory mapping area.
In the embodiments of the present application, data blocks are transferred to the memory mapping area in the order of their corresponding indexes, and a data block identifier is set for each data block, so that in a subsequent read of a data segment the required data block can be located and read accurately based on the data block identifier, improving data reading efficiency.
It should be noted that, in the embodiment of the present application, the total number of data segments to be processed that can be stored in a data block may be determined based on a storage mode corresponding to the data block, and referring to fig. 3E, a schematic flow chart for transferring the data block in the embodiment of the present application specifically includes:
S311: if it is determined that the storage mode corresponding to the data block is a single data segment, the data block is transferred from the buffer area to the memory mapping area.
In the embodiments of the present application, each data block is a data set, and that data set may contain one data segment or a plurality of data segments, so the storage mode of a data block can be divided into single data segment and multiple data segments based on the number of data segments it contains. If the storage mode corresponding to the data block is a single data segment, only one data segment to be processed can be stored in the data block; therefore, once the data segment to be processed has been added to the data block, the data block is transferred from the buffer area to the memory mapping area.
S312: if the storage mode is determined to be the multi-data segment, when the total number of the data segments to be processed contained in the data block is determined to be greater than a preset number threshold, the data block is transferred from the buffer area to the memory mapping area.
In the embodiments of the present application, if it is determined that the storage mode corresponding to the data block is multiple data segments, at least two data segments to be processed can be stored in the data block. Therefore, after a data segment to be processed is added to the data block, the total number of data segments it contains is determined. If the total number is greater than the preset number threshold, the data block is considered full and can accept no further data segments to be processed, and it is transferred from the buffer area to the memory mapping area; if the total number is not greater than the threshold, further data segments can still be stored in the data block, and additional data segments to be processed continue to be received until the total number exceeds the threshold, after which the data block is transferred from the buffer area to the memory mapping area.
For example, referring to FIG. 3F, which is an example diagram of transferring a data block in an embodiment of the present application, it is first determined that the storage mode corresponding to data block 2 is multiple data segments, and then that the total number of data segments to be processed contained in data block 2 is 2. Since this total reaches the preset number threshold of 2, data block 2 is transferred to the memory mapping area, which then contains data block 1 and data block 2.
In one possible implementation of the embodiments of the present application, the storage mode of a data block may be set to multiple data segments by setting a parameter in the service system, e.g. BlockMultiData=true, so that one data block supports the writing of multiple data segments; at the same time, when the total number of data segments to be processed contained in the data block is greater than the number threshold set by the BlockSize parameter, the data block is transferred to the memory mapping area, which improves write performance under high-frequency writes.
Whether to enable BlockMultiData can be configured according to the service requirements of the operating system. Specifically, when the data length of the data block mapped to the WAL each time is greater than a preset length threshold, BlockMultiData may be left disabled, i.e. the storage mode of the data block is set to a single data segment; when the data length mapped each time is not greater than the preset length threshold, BlockMultiData may be enabled, i.e. the storage mode is set to multiple data segments. The multi-segment mode is better suited to data with a small data length: when such data is mapped frequently, setting the storage mode to multiple data segments can greatly reduce the mapping frequency and improve system performance.
It should be noted that, in the embodiment of the present application, the length threshold may be set according to the memory capacity of the WAL, and may also be set according to the actual service requirement, which is not limited in the embodiment of the present application.
In addition, in the embodiments of the present application, setting a data block length parameter ensures that the data set in each data block is sufficiently large rather than too small.
It should be noted that, if the storage mode of the data block is multiple data segments, the data segments to be processed may be added to the data block in the order in which they arrive and according to the preset splicing format, so that the position of a target data segment within the data block can be located quickly in a subsequent read of the data segment.
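The storage-mode decision described above can be summarized in a few lines. The following Go sketch assumes configuration fields named after the BlockMultiData and BlockSize parameters mentioned in the text; whether the transfer happens exactly at the threshold or only above it is a convention chosen for this sketch and should be read as an assumption.

package wal

// Config mirrors the two parameters discussed above; the field names follow
// the BlockMultiData and BlockSize parameters mentioned in the text.
type Config struct {
	BlockMultiData bool // false: single data segment per block; true: multiple segments
	BlockSize      int  // number threshold of segments per block in multi-segment mode
}

// shouldTransfer reports whether a data block is ready to be moved from the
// buffer area to the memory mapping area.
func shouldTransfer(cfg Config, segmentsInBlock int) bool {
	if !cfg.BlockMultiData {
		// Single-segment mode: transfer as soon as one segment has been added.
		return segmentsInBlock >= 1
	}
	// Multi-segment mode: transfer once the block has reached the threshold.
	return segmentsInBlock >= cfg.BlockSize
}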
Further, the memory mapping area in the embodiments of the present application may either be read from the mapping pool when the remaining memory capacity is insufficient, or be newly created when the remaining memory capacity is insufficient, so the process of transferring a data block can be divided into the following two cases based on how the memory mapping area is established:
the first way of establishment: the memory map area is read from the map pool, and referring to fig. 3G, a first flow chart of transferring data blocks in an embodiment of the present application specifically includes:
A11: when the length of the data block corresponding to the data block is determined to be larger than the residual memory capacity of the memory mapping area, a memory mapping area with an idle working state is randomly read from a preset mapping pool.
The mapping pool comprises a plurality of memory mapping areas.
In the embodiments of the present application, it is determined whether the data block length corresponding to the data block is greater than the remaining memory capacity of the memory mapping area. If the data block length is greater than the remaining memory capacity, the memory mapping area does not have enough storage space for the data block, so one memory mapping area is read at random from among the memory mapping areas in the mapping pool whose working state is idle. If the data block length is not greater than the remaining memory capacity, the memory mapping area is determined to have enough storage space for the data block, and the data block can be transferred into it directly.
It should be noted that, in the embodiments of the present application, the remaining memory capacity is the memory capacity still unused in the memory mapping area. It is calculated as follows: first, the memory capacity occupied by each data block stored in the memory mapping area is determined, and then the remaining memory capacity of the memory mapping area is obtained as the difference between the total memory capacity of the memory mapping area and the occupied capacity.
A12: and transferring the data block into the read memory mapping area.
In the embodiment of the application, after the memory mapping area is read out, the data block is transferred to the read out memory mapping area.
For example, referring to FIG. 3H, which is an example diagram of transferring a data block in an embodiment of the present application, the data block length of data block M is first determined to be 12 kb, and the remaining memory capacity of memory mapping area 1 is determined to be 11 kb; it is therefore determined that memory mapping area 1 does not have enough memory space to store the data block, so memory mapping area 4 is read from among memory mapping area 2, memory mapping area 3, and memory mapping area 4 contained in the mapping pool, and data block M is stored in memory mapping area 4.
It should be noted that, since each memory mapped area has its mapped WAL, when the remaining memory capacity of the memory mapped area is insufficient, the remaining content capacity of the WAL in the disk is also insufficient.
In this way, when the operating system is initialized, a certain number of memory mapping areas and the WALs they map to are created in advance, and each pre-mapped memory mapping area is placed into the mapping pool. The operating system can then take an idle memory mapping area out of the pool for writing data blocks, and when the remaining memory capacity of the currently mapped memory mapping area is insufficient, a new memory mapping area and its corresponding WAL can be obtained from the mapping pool. There is no need to wait for a new memory mapping area to be created in the service flow, which saves blocking time and improves WAL write performance.
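One way the mapping pool could look in Go is sketched below: a buffered channel holds pre-created, idle memory mapping areas, and a region is taken from it only when the current area lacks the remaining capacity for the incoming data block. The types and names are illustrative assumptions, not the application's own structures.

package wal

import "os"

// mappedRegion pairs a memory mapping area with the WAL file it is mapped to.
type mappedRegion struct {
	buf  []byte   // the memory mapping area (e.g. obtained via mmap)
	file *os.File // the WAL file on disk backing this mapping
	used int      // bytes already occupied by stored data blocks
}

// remaining returns the remaining memory capacity: the total capacity of the
// mapping area minus the capacity occupied by the data blocks stored in it.
func (r *mappedRegion) remaining() int { return len(r.buf) - r.used }

// mappingPool holds memory mapping areas whose working state is idle,
// each pre-mapped to its own WAL during initialization.
type mappingPool struct {
	idle chan *mappedRegion
}

// regionFor returns a mapping area able to hold a data block of blockLen bytes:
// the current area if it still has room, otherwise an idle area from the pool.
func (p *mappingPool) regionFor(cur *mappedRegion, blockLen int) *mappedRegion {
	if blockLen <= cur.remaining() {
		return cur
	}
	// Take a pre-mapped idle region; creation happened at initialization,
	// so no new mapping has to be created in the service flow.
	return <-p.idle
}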
The second way is: referring to fig. 3I, a second flow diagram of transferring a data block according to an embodiment of the present application is shown, which specifically includes:
A21: when it is determined that the data block length corresponding to the data block is greater than the remaining memory capacity of the memory mapping area, a new memory mapping area is generated in the user-mode storage area.
In the embodiments of the present application, the data block length corresponding to the data block is calculated, and it is then determined whether the data block length is greater than the remaining memory capacity of the memory mapping area. If the data block length is not greater than the remaining memory capacity, the memory mapping area is determined to have enough storage space for the data block, so the data block is transferred from the buffer area to the memory mapping area directly; if the data block length is greater than the remaining memory capacity, the memory mapping area is determined not to have enough space for the data block, so a new memory mapping area is generated and a mapping relationship between the new memory mapping area and its corresponding WAL is established.
A22: and transferring the data blocks from the buffer area to a new memory mapping area.
In the embodiment of the application, after the memory mapping area is regenerated, the data block is transferred from the buffer area to the newly generated memory mapping area.
For example, referring to fig. 3J, in a second exemplary diagram of transferring a data block according to the embodiment of the present application, first, a data block N is calculated to have a corresponding data block length of 12kb, and at the same time, the remaining memory capacity of the memory map area 1 is determined to be 10kb, and it is determined that the memory map area 1 does not have enough memory space to store the data block, so that a memory map area 2 is regenerated, a mapping relationship between the regenerated memory map area 2 and a corresponding WAL is established, and finally, the data block is stored in the newly generated memory map area 2.
In this way, only one set consisting of a memory mapping area and a WAL needs to be mapped for storing and reading data segments. When the remaining capacity of the currently mapped memory mapping area and WAL is insufficient, a new mapping is created, that is, a new memory mapping area, a new WAL and the mapping relation between the two are created. Mapping resources therefore do not need to be consumed in advance and can instead be allocated on demand.
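The on-demand variant could be sketched as follows, assuming a Linux environment and the golang.org/x/sys/unix package; the WAL file naming scheme and the 64 MiB region size are illustrative assumptions rather than values fixed by the embodiment.

```go
package wal

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

const regionSize = 64 << 20 // 64 MiB per mapping, an assumed value

// newMapping creates a new WAL file of regionSize bytes and maps it into
// user space, establishing the mapping relation between the new memory
// mapping area and its corresponding WAL.
func newMapping(dir string, seq int) ([]byte, *os.File, error) {
	path := fmt.Sprintf("%s/wal-%06d.log", dir, seq) // hypothetical naming
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, nil, err
	}
	// Reserve the on-disk space so the mapped writes have a backing range.
	if err := f.Truncate(regionSize); err != nil {
		f.Close()
		return nil, nil, err
	}
	buf, err := unix.Mmap(int(f.Fd()), 0, regionSize,
		unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
	if err != nil {
		f.Close()
		return nil, nil, err
	}
	return buf, f, nil
}
```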
Further, in the embodiment of the present application, a primary-standby mapping manner may be used to create the mappings, that is, two mappings are created in advance, one primary and one standby. When data segments to be processed are stored, the primary mapping is generally used; when the disk file corresponding to the primary mapping is full, the standby mapping can be used directly, so no blocking occurs, and a new standby mapping can be created asynchronously by starting a coroutine or thread.
In the embodiment of the application, one primary mapping and a plurality of standby mappings may also be used. In this way, excessive mappings are not created in advance, which avoids wasting resources, and the performance impact of creating a mapping in the middle of the write flow is also avoided.
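The primary-standby behaviour could look roughly like the sketch below, which reuses the mappedRegion type from the earlier pool sketch; keeping exactly one standby mapping and the minimal error handling are assumptions made only for illustration.

```go
package wal

import (
	"errors"
	"sync"
)

// mappingSet keeps one active (primary) mapping and one standby mapping,
// and replenishes the standby asynchronously so the write path does not
// block on creating a new mapping.
type mappingSet struct {
	mu      sync.Mutex
	primary *mappedRegion
	standby *mappedRegion
	create  func() (*mappedRegion, error) // e.g. wraps a mapping constructor
}

// regionFor returns a mapping with enough room for the block, switching to
// the standby mapping when the primary is full and rebuilding the standby
// in the background.
func (s *mappingSet) regionFor(blockLen int) (*mappedRegion, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if blockLen <= s.primary.remaining() {
		return s.primary, nil
	}
	if s.standby == nil {
		return nil, errors.New("standby mapping not ready yet")
	}
	// Primary is full: promote the standby immediately, so no blocking...
	s.primary, s.standby = s.standby, nil
	// ...and create a replacement standby asynchronously in a goroutine.
	go func() {
		if r, err := s.create(); err == nil {
			s.mu.Lock()
			s.standby = r
			s.mu.Unlock()
		}
	}()
	return s.primary, nil
}
```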
S32: and calling a preset memory mapping function, and mapping the data blocks from the memory mapping area to the WAL in the disk to obtain the mapped WAL.
In the embodiment of the application, a preset memory mapping function is called to map the data block from the memory mapping area to the WAL, so that the mapped WAL is obtained, and the subsequent processing can be performed based on the mapped WAL.
In the embodiment of the application, before the data block is mapped from the memory mapping area to the WAL, the mapping relation between the memory mapping area and the WAL in the disk is established by calling the operating system. This mapping relation can be established at any time after the operating system starts and before the data block is mapped, and it can also be established in advance in the initialization stage of the operating system, thereby reducing the operation steps and the resource consumption of the operating system.
Optionally, in the embodiment of the present application, there may be a plurality of memory mapping areas and a plurality of WALs. A mapping relation is established in advance between each memory mapping area and its corresponding WAL, so that in the subsequent mapping process a data block can be mapped to the corresponding WAL based on that mapping relation. Referring to fig. 3K, the process of mapping a data block in the embodiment of the present application specifically includes:
S321: based on the mapping relation between each memory mapping area and each WAL, the WAL mapped with the memory mapping area is determined.
In the embodiment of the application, when a memory mapping area is established, the corresponding WAL is established in the disk at the same time, so the WAL mapped to a given memory mapping area can be determined from the WALs based on the mapping relation between each memory mapping area and its corresponding WAL.
S322: and calling a memory mapping function, and mapping the data block from the memory mapping area to the determined WAL to obtain the mapped WAL.
In the embodiment of the application, after the WAL is determined, a memory mapping function is called to map the data block from the memory mapping area to the determined WAL, so as to obtain the mapped WAL.
It should be noted that, in the embodiment of the present application, if only one memory mapping area and WAL are included, the WAL is not required to be determined, and the data block can be directly mapped into the WAL from the memory mapping area.
For example, referring to fig. 3L, which is an exemplary diagram of mapping a data block in an embodiment of the present application, each memory mapping area has a WAL mapped to it: a mapping relation exists between memory mapping area 1 and WAL1, between memory mapping area 2 and WAL2, between memory mapping area 3 and WAL3, and between memory mapping area 4 and WAL4. The mapped WAL1 is determined based on mapping area identifier 1 corresponding to memory mapping area 1, and then data block N is mapped from memory mapping area 1 to WAL1, thereby obtaining the mapped WAL1.
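On Linux, "mapping the data block from the memory mapping area to the WAL" can be made durable by asking the kernel to write the mapped pages back, for example with msync; the sketch below is one possible realization and not the only way the memory mapping function could be implemented.

```go
package wal

import "golang.org/x/sys/unix"

// flushMapping asks the kernel to write the dirty pages of a mapped memory
// mapping area back to its WAL file on disk. MS_SYNC blocks until the
// write-back completes; msync requires a page-aligned address, so the whole
// mapped slice (as returned by unix.Mmap) is passed here.
func flushMapping(mapped []byte) error {
	return unix.Msync(mapped, unix.MS_SYNC)
}
```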
Further, in the embodiment of the present application, after obtaining the mapped WAL, the corresponding data segment may be read from the WAL after the operating system crashes, as shown in fig. 3M, which is a schematic flow chart for reading the corresponding data segment in the embodiment of the present application, specifically including:
m1: a data read request is received.
Wherein the data read request includes at least a data index identification.
In the embodiment of the application, when the target object needs to recover the operating system, a data reading request carrying the data index identifier can be triggered and generated in the operating system.
It should be noted that the data index identifier includes at least one of the following: a data block identifier and a data segment identifier. When a specific data segment within a data block needs to be read, the data index identifier includes both the data block identifier and the data segment identifier.
M2: based on the data index identification, the target data segment is read from the corresponding data block.
In the embodiment of the application, when the data index identifier includes a data block identifier and a data segment identifier, the data block corresponding to the data block identifier is first determined from the data blocks based on the data block identifier, and then the target data segment corresponding to the data segment identifier is read from the determined data block based on the data segment identifier.
Optionally, the embodiment of the present application provides a possible implementation for reading the target data segment, specifically: the memory mapping function is called to map the target data segment, determined based on the data index identifier, from the WAL into the memory mapping area, so as to obtain a mapped memory mapping area.
In the embodiment of the application, the target data segment is first determined based on the data index identifier, and then the memory mapping function is called to map the target data segment from the WAL to the memory mapping area, so that the target data segment indicated by the data index identifier can then be read from the mapped memory mapping area.
Specifically, a target data block is determined from the data blocks based on the data block identifier in the data index identifier, a target data segment is determined from the target data block based on the data segment identifier in the data index identifier, and finally the determined target data segment is mapped from the WAL to the memory mapping area by calling the memory mapping function.
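A read along these lines might look like the sketch below; the per-block layout assumed here (a 4-byte segment count followed by length-prefixed segments) is purely illustrative, since the embodiment does not fix an exact on-disk format.

```go
package wal

import (
	"encoding/binary"
	"errors"
)

// readSegment locates a target data segment inside the mapped WAL bytes.
// Assumed layout per block: a 4-byte segment count, then for each segment a
// 4-byte length followed by the segment payload. blockOff is the byte offset
// of the target data block, segIdx the data segment identifier within it.
func readSegment(mapped []byte, blockOff int, segIdx int) ([]byte, error) {
	if blockOff+4 > len(mapped) {
		return nil, errors.New("block offset out of range")
	}
	count := int(binary.LittleEndian.Uint32(mapped[blockOff:]))
	if segIdx >= count {
		return nil, errors.New("segment index out of range")
	}
	off := blockOff + 4
	for i := 0; i < count; i++ {
		segLen := int(binary.LittleEndian.Uint32(mapped[off:]))
		off += 4
		if i == segIdx {
			return mapped[off : off+segLen], nil
		}
		off += segLen
	}
	return nil, errors.New("segment not found")
}
```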
Optionally, in the embodiment of the present application, since a data block cache area is also provided in the user state storage area, after the received data segments to be processed are added to the preset data block, the data block may additionally be cached in the data block cache area. A preset number of data blocks are kept in the data block cache area, which improves data reading performance.
It should be noted that, in the embodiment of the present application, the data block cache area holds the most recent preset number of data blocks, and the process of caching a data block into the data block cache area may be executed in parallel with the process of transferring the data block into the memory mapping area, so that the time consumed for storing data blocks is reduced and the performance of the operating system when storing operating data is improved.
In addition, it should be noted that, in the embodiment of the present application, data blocks are cached in the order in which they are generated, and the data block cache area holds the preset number of data blocks with the latest time points. If the data block cache area does not have enough storage space to cache another data block, then after a new data block is generated, the data block with the earliest time point in the data block cache area is deleted, so that the latest data blocks are always cached in the data block cache area. For example, when data block J is newly generated, assuming that 4 data blocks can be cached in the data block cache area and that data block L, data block M, data block N and data block O are currently cached, data block L, which has the earliest time point, is determined from these data blocks and deleted, and data block J is cached in the data block cache area.
Therefore, after receiving the data reading request, whether a target data block corresponding to the data block identifier exists in the data block cache area may be determined based on the data block identifier. Specifically, the following two cases are distinguished:
first case: the data block buffer area has a target data block corresponding to the data block identification.
The method specifically comprises the following steps: and when the target data block corresponding to the data block identification exists in the data block cache area, reading the target data segment corresponding to the data segment identification from the target data block of the data block cache area.
In the embodiment of the application, data segments are stored in the form of data blocks, and each data segment has a data segment identifier, so a data segment can be located quickly within a data block. Therefore, when it is determined that the data block cache area contains the target data block corresponding to the data block identifier, the target data segment corresponding to the data segment identifier is read from that target data block based on the data segment identifier. The corresponding target data segment can thus be read directly from the data block cache area, which greatly improves data reading performance.
Second case: the data block buffer area does not have a target data block corresponding to the data block identification.
In the embodiment of the application, when the target data block corresponding to the data block identifier does not exist in the data block cache region, the corresponding target data segment is read from the WAL.
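The caching and hit/miss behaviour described above could be sketched as follows; the map-plus-slice structure and the uint64 block identifier are illustrative assumptions.

```go
package wal

// blockCache keeps the newest `capacity` data blocks, evicting the block
// with the earliest generation time when a new block arrives and the cache
// is full.
type blockCache struct {
	capacity int
	order    []uint64          // block identifiers, oldest first
	blocks   map[uint64][]byte // block identifier -> block bytes
}

func newBlockCache(capacity int) *blockCache {
	return &blockCache{capacity: capacity, blocks: make(map[uint64][]byte)}
}

// put caches a newly generated block, deleting the earliest one if needed.
func (c *blockCache) put(id uint64, block []byte) {
	if len(c.order) == c.capacity {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.blocks, oldest)
	}
	c.order = append(c.order, id)
	c.blocks[id] = block
}

// get reports whether the cache is hit for the given data block identifier;
// on a miss the caller falls back to reading from the WAL.
func (c *blockCache) get(id uint64) ([]byte, bool) {
	b, ok := c.blocks[id]
	return b, ok
}
```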
Further, in the embodiment of the present application, in order to guarantee the integrity of the read target data segment, an integrity check may be performed on the target data segment. Referring to fig. 3N, the flow of the data integrity check in the embodiment of the present application specifically includes:
n1: and calculating target attribute information corresponding to the target data segment, and determining the data attribute information corresponding to the target data segment from the corresponding data block.
In the embodiment of the application, attribute calculation is performed on the target data segment to obtain the target attribute information corresponding to it. Since the target data block contains not only the target data segment but also the data attribute information corresponding to that segment, the data attribute information corresponding to the target data segment is determined from the target data block.
Therefore, the target attribute information in the embodiment of the application is determined by performing attribute calculation based on the read target data segment.
N2: and determining an integrity check result of the target data segment based on the target attribute information and the data attribute information.
In the embodiment of the application, the integrity check result of the target data segment is determined based on the target attribute information and the data attribute information. Specifically, if the target attribute information is the same as the data attribute information, the integrity check of the target data segment is determined to pass; if the target attribute information differs from the data attribute information, the integrity check is determined to fail. The integrity check result can therefore be determined from the target attribute information computed for the read target data segment and the data attribute information read from the data block, thereby ensuring the integrity of the target data segment.
Optionally, in the embodiment of the present application, a possible implementation manner is provided for determining the integrity check result of the target data segment, and referring to fig. 3O, a flowchart of determining the integrity check result in the embodiment of the present application is shown, which specifically includes:
n21: and when the target length in the target attribute information is determined to be the same as the data length in the data attribute information, and the target information abstract value in the target attribute information is determined to be the same as the data information abstract value in the data attribute information, determining that the integrity check result of the target data segment is complete.
In the embodiment of the application, whether the target length in the target attribute information is the same as the data length in the data attribute information or not and whether the target information abstract value in the target attribute information is the same as the data information abstract value in the data attribute information or not are judged, and the method can be concretely divided into the following four cases:
first case: the target length is the same as the data length and the target message digest value is the same as the data message digest value.
In the embodiment of the application, if the target length is determined to be the same as the data length and the target information abstract value is determined to be the same as the data information abstract value, the integrity check result of the target data segment is determined to be complete.
Second case: the target length is different from the data length, and the target information abstract value is the same as the data information abstract value.
In the embodiment of the application, if the target length is determined to be different from the data length, but the target information abstract value is the same as the data information abstract value, the integrity check result of the target data segment is determined to be incomplete.
For example, the target length of the read target data segment is calculated to be 184 and its target information abstract value is calculated to be a, while the data length recorded for that segment in the data block is 182 and the data information abstract value recorded in the data block is a. The target length therefore differs from the data length although the target information abstract value matches the data information abstract value, and so the integrity check result of the target data segment is determined to be incomplete.
Third case: the target length is the same as the data length, and the target message digest value is different from the data message digest value.
In the embodiment of the application, if the target length is the same as the data length but the target information abstract value is different from the data information abstract value, the integrity check result of the target data segment is determined to be incomplete.
Fourth case: the target length is different from the data length, and the target message digest value is different from the data message digest value.
In the embodiment of the application, if the target length is determined to be different from the data length and the target information abstract value is determined to be different from the data information abstract value, the integrity check result of the target data segment is determined to be incomplete.
N22: when the target length is determined to be different from the data length and the target information abstract value is determined to be different from the data information abstract value, determining that the integrity check result of the target data segment is incomplete.
In the embodiment of the present application, based on the fourth situation, it is known that when it is determined that the target length is different from the data length and the target information digest value is different from the data information digest value, it is determined that the integrity check result of the target data segment is incomplete.
In the embodiment of the application, a plurality of segments of data are stored in the data block, the integrity of the data segment is ensured by calculating the data length and the data information abstract value, and the consistency of the data segment can be ensured by verifying the data length and the data information abstract value when the data is read.
It should be noted that the CRC used in the embodiment of the present application is not limited to CRC32; other CRC variants, such as CRC12, CRC16 or CRC8, may also be used for the integrity check.
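The length-plus-digest comparison could be realized roughly as below, using CRC32 from Go's standard library as the information digest value; the helper itself is only an illustrative sketch.

```go
package wal

import "hash/crc32"

// segmentComplete recomputes the target attribute information (length and
// CRC32 digest) for the read target data segment and compares it with the
// data attribute information stored alongside the segment in the data block.
func segmentComplete(segment []byte, storedLen uint32, storedCRC uint32) bool {
	targetLen := uint32(len(segment))
	targetCRC := crc32.ChecksumIEEE(segment)
	// The check passes only when both the length and the digest match.
	return targetLen == storedLen && targetCRC == storedCRC
}
```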
In the embodiment of the application, calling the memory mapping function to map data blocks from the memory mapping area to the WAL on disk avoids the performance problems caused by many separate writes and reads, makes more effective use of operating system characteristics, and reduces the performance cost of data storage; at the same time, caching the data segments to be processed in order greatly improves reading performance.
Based on the foregoing embodiments, a specific example is used to describe the WAL processing method in the embodiment of the present application, and referring to fig. 4, an exemplary diagram of the WAL processing method in the embodiment of the present application specifically includes:
firstly, determining the data length and CRC32 value of a received data segment M to be processed, splicing the data length, the CRC32 value and the data segment M to be processed according to a preset splicing format to obtain a spliced data segment M to be processed, and adding the spliced data segment M to be processed into a data block P.
Secondly, since data block P contains the data segment M to be processed and the data segment N to be processed, the total number of data segments to be processed contained in data block P is determined to be 2; since this total number is not less than the preset number threshold 2, data block P is transferred from the buffer area to memory mapping area A.
Then, determining WAL1 with a mapping relation with the memory mapping area A, and mapping the data block P from the memory mapping area A to the WAL1 by calling a memory mapping function, thereby obtaining the mapped WAL1.
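Tying this example together, the splicing step might be implemented as in the sketch below; the [length | CRC32 | payload] layout follows the splicing format of the example, while the little-endian encoding and the slice-of-slices block representation are added assumptions.

```go
package wal

import (
	"encoding/binary"
	"hash/crc32"
)

// spliceSegment prepends the data attribute information (data length and
// CRC32 value) to a data segment to be processed, producing the spliced
// segment that is appended to a data block such as data block P.
func spliceSegment(segment []byte) []byte {
	out := make([]byte, 8+len(segment))
	binary.LittleEndian.PutUint32(out[0:4], uint32(len(segment)))
	binary.LittleEndian.PutUint32(out[4:8], crc32.ChecksumIEEE(segment))
	copy(out[8:], segment)
	return out
}

// appendToBlock adds the spliced segment to a block and reports whether the
// block now holds at least `threshold` segments and should be transferred
// to the memory mapping area.
func appendToBlock(block [][]byte, segment []byte, threshold int) ([][]byte, bool) {
	block = append(block, spliceSegment(segment))
	return block, len(block) >= threshold
}
```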
Based on the foregoing embodiments, referring to fig. 5, a schematic flow chart of storing WAL in an embodiment of the present application is shown, which specifically includes:
s500: and enters WAL.
S501: a data segment to be processed is received.
S502: and adding the received data segment to be processed into the data block.
S503: whether the data block is full is judged, if yes, S504 is executed, and if no, S501 is executed again.
In the embodiment of the present application, whether the data block is full may be judged by determining the total number of data segments to be processed contained in the data block. If this total number is determined to be greater than the preset number threshold, the data block is judged to be full and S504 is executed; if it is not greater than the preset number threshold, the data block is judged not to be full, S501 is executed again, and data segments to be processed continue to be received.
S504: and caching the data block into a data block cache area.
In the embodiment of the application, the data block buffer area is arranged in the user mode storage area.
S505: and judging whether the total number of the data blocks contained in the data block cache area exceeds a data block number threshold, if so, executing S506, and if not, executing S507.
S506: the data block with the earliest time point is deleted.
S507: and (5) finishing the caching.
In the embodiment of the application, after the caching is finished, the data blocks are stored in the data block cache area.
S508: and transferring the data block into the memory mapping area.
S509: and calling a memory mapping function to map the data block from the memory mapping area to the WAL.
S510: the WAL ends.
Based on the above embodiments, referring to fig. 6, a flow chart of reading a data segment in an embodiment of the application specifically includes:
s600: and enters WAL.
S601: whether all WALs are read is determined, if yes, S602 is executed, and if no, S603 is executed.
In the embodiment of the application, it is judged whether all the data segments in the WAL need to be read. If yes, all the data segments are mapped from the WAL to the memory mapping area by calling the memory mapping function; if not, it is determined that only a target data segment needs to be read.
S602: and mapping all the data segments from the WAL to the memory mapping area by calling a memory mapping function, and reading all the data segments from the memory mapping area.
S603: whether the data block cache area is hit is judged, if yes, S604 is executed, and if not, S605 is executed.
In the embodiment of the application, based on the data block identifier, it is determined whether a target data block corresponding to the data block identifier exists in the data block cache area. If such a target data block exists, the data block cache area is hit and the corresponding target data segment is read from the data block cache area; if no such target data block exists, the data block cache area is missed and the corresponding target data segment is read from the memory mapping area.
S604: based on the data segment identification, the corresponding target data segment is read from the data block in the data block cache.
S605: and mapping the target data segment corresponding to the data segment identifier from the WAL to the memory mapping area by calling a memory mapping function, and reading the mapped target data segment from the memory mapping area.
S606: the WAL ends.
Based on the above embodiments, referring to fig. 7, a flow chart of writing a data segment to be processed into a memory map area according to an embodiment of the present application specifically includes:
s700: and enters WAL.
S701: waiting for the addition of a pending data segment to the data block.
S702: to the current data block.
In the embodiment of the application, the data segment to be processed is added to the current data block.
S703: whether the storage mode of the data block is the multi-data segment is judged, if yes, S704 is executed, and if not, S705 is executed.
S704: whether the data block is full is judged, if yes, S705 is executed, and if not, S701 is executed again.
In the embodiment of the application, whether the data block is full is judged by determining whether the total number of the data segments to be processed contained in the data block is larger than the preset number threshold value.
S705: and transferring the data block into the memory mapping area.
S706: other flows are performed.
In an embodiment of the present application, other processes include mapping the data blocks in the memory mapped region to the WAL.
Based on the above embodiments, referring to fig. 8, a flow chart of data writing in an embodiment of the present application specifically includes:
s800: initializing a memory mapping area.
S801: and transferring the data block containing the data segment to be processed into the memory mapping area.
S802: whether the memory mapping area is full is determined, if yes, S803 is executed, and if no, S804 is executed.
S803: a new memory map area is generated.
S804: other flows are performed.
In an embodiment of the present application, other processes include transferring the newly received data block to a newly generated memory map region.
Based on the above embodiments, referring to fig. 9, another flow chart of data writing in the embodiment of the application specifically includes:
s900: initializing a memory mapping area.
S901: and transferring the data block containing the data segment to be processed into the memory mapping area.
S902: and judging whether the memory mapping area is full, if so, writing a new mapping, and if not, executing S903.
S903: other flows are performed.
In an embodiment of the present application, other processes include transferring a newly received data block to a newly acquired memory map region.
Based on the same inventive concept, the embodiment of the application also provides a WAL processing device. Referring to fig. 10, a schematic structural diagram of a WAL processing apparatus according to an embodiment of the present application may include:
the adding module 1000 is configured to receive a data segment to be processed, and add the received data segment to be processed to a preset data block, where the data block is set in a buffer area of the user state storage area;
The processing module 1010 is configured to, when the total number of data segments to be processed contained in the data block is determined to satisfy a preset number condition, transfer the data block from the buffer area to the memory mapping area, where the memory mapping area is set in the user state storage area;
the mapping module 1020 is configured to call a preset memory mapping function, map a data block from the memory mapping area to a pre-written log WAL in the disk, and obtain a mapped WAL.
Optionally, when adding the received data segment to be processed to a preset data block, the adding module 1000 is further configured to:
determining data attribute information corresponding to the received data segment to be processed;
splicing the data attribute information and the data segments to be processed according to a preset splicing format to obtain spliced data segments to be processed;
and adding the spliced data segments to be processed into a preset data block.
Optionally, the processing module 1010 is further configured to:
if the storage mode corresponding to the data block is determined to be a single data segment, the data block is transferred from the buffer area to the memory mapping area;
if the storage mode is determined to be the multi-data segment, when the total number of the data segments to be processed contained in the data block is determined to be greater than a preset number threshold, the data block is transferred from the buffer area to the memory mapping area.
Optionally, when the data block is transferred from the buffer area to the memory mapping area, the processing module 1010 is further configured to:
when the length of the data block corresponding to the data block is determined to be larger than the residual memory capacity of the memory mapping area, randomly reading a memory mapping area with an idle working state from a preset mapping pool, wherein the mapping pool comprises a plurality of memory mapping areas;
and transferring the data block into the read memory mapping area.
Optionally, when the data block is transferred from the buffer area to the memory mapping area, the processing module 1010 is further configured to:
when the length of the data block corresponding to the data block is determined to be larger than the residual memory capacity of the memory mapping area, generating a new memory mapping area in the user state storage area;
and transferring the data blocks from the buffer area to a new memory mapping area.
Optionally, the mapping module 1020 is further configured to:
determining WALs mapped with the memory mapping areas based on mapping relations between the memory mapping areas and the WALs;
and calling a memory mapping function, and mapping the data block from the memory mapping area to the determined WAL to obtain the mapped WAL.
Optionally, the apparatus further includes a data reading module 1030, where the data reading module 1030 is configured to:
Receiving a data reading request, wherein the data reading request at least comprises a data index identifier;
based on the data index identification, the target data segment is read from the corresponding data block.
Optionally, when the target data segment is read from the corresponding data block based on the data index identifier, the data reading module 1030 is further configured to:
and calling a memory mapping function, and mapping the target data segment determined based on the data index identifier from the WAL into a memory mapping area, so as to obtain a mapped memory mapping area.
Optionally, after adding the received data segment to be processed to the preset data block, the apparatus further includes a buffer module 1040, where the buffer module 1040 is configured to:
and caching the data blocks into a data block cache area, wherein the data block cache area is arranged in the user mode storage area.
Optionally, when the target data segment is read from the corresponding data block based on the data index identifier, the data reading module 1030 is further configured to:
and when the target data block corresponding to the data block identification exists in the data block cache area, reading the target data segment corresponding to the data segment identification from the target data block of the data block cache area.
Optionally, the apparatus further comprises a verification module 1050, where the verification module 1050 is configured to:
Calculating target attribute information corresponding to the target data segment, and determining data attribute information corresponding to the target data segment from the corresponding data block;
and determining an integrity check result of the target data segment based on the target attribute information and the data attribute information.
Optionally, when determining the integrity check result of the target data segment based on the target attribute information and the data attribute information, the check module 1050 is further configured to:
when the target length in the target attribute information is determined to be the same as the data length in the data attribute information, and the target information abstract value in the target attribute information is the same as the data information abstract value in the data attribute information, determining that the integrity check result of the target data segment is complete;
and when the target length is determined to be different from the data length and the target information abstract value is different from the data information abstract value, determining that the integrity check result of the target data segment is incomplete.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein generally as a "circuit", "module" or "system".
In some possible embodiments, a WAL processing device according to the application may comprise at least a processor and a memory. Wherein the memory stores program code that, when executed by the processor, causes the processor to perform the steps in the WAL processing methods according to various exemplary embodiments of the application described in this specification. For example, the processor may perform the steps as shown in fig. 3A.
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method. In one embodiment, the electronic device may be a node 201 as shown in fig. 2A, and in this embodiment, the electronic device may be configured as shown in fig. 11, including a memory 1101, a communication module 1103, and one or more processors 1102.
The memory 1101 is used for storing a computer program executed by the processor 1102. The memory 1101 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, programs required for running an instant communication function, and the like; the storage data area may store various instant messaging information, operation instruction sets, and the like.
The memory 1101 may be a volatile memory, such as a random-access memory (RAM); the memory 1101 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 1101 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1101 may also be a combination of the above memories.
The processor 1102 may include one or more central processing units (central processing unit, CPU) or digital processing units, or the like. A processor 1102 for implementing the WAL processing method described above when calling a computer program stored in the memory 1101.
The communication module 1103 is used for communicating with a terminal device and other servers.
The embodiment of the present application does not limit the specific connection medium between the memory 1101, the communication module 1103 and the processor 1102. In fig. 11 the memory 1101 and the processor 1102 are connected by a bus 1104, which is drawn as a bold line; the connection manner between the other components is merely illustrative and not limiting. The bus 1104 may be divided into an address bus, a data bus, a control bus, and the like. For ease of description only one bold line is drawn in fig. 11, but this does not mean that there is only one bus or only one type of bus.
The memory 1101 stores a computer storage medium in which computer-executable instructions are stored, and the computer-executable instructions are used for implementing the WAL processing method of the embodiments of the present application. The processor 1102 is configured to perform the WAL processing method described above, as shown in fig. 3A.
In some possible embodiments, aspects of the WAL processing method provided by the present application may also be implemented in the form of a program product comprising program code for causing a computer device to perform the steps of the WAL processing method according to various exemplary embodiments of the application as described herein above when the program product is run on the computer device, for example, the computer device may perform the steps as shown in fig. 3A.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code and may run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's equipment, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this is not required to either imply that the operations must be performed in that particular order or that all of the illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (16)

1. A WAL processing method, comprising:
receiving a data segment to be processed, and adding the received data segment to be processed into a preset data block, wherein the data block is arranged in a buffer area of a user state storage area;
when the total number of the data segments to be processed contained in the data block is determined to meet a preset number condition, the data block is transferred from the buffer area to a memory mapping area, and the memory mapping area is arranged in the user state storage area;
and calling a preset memory mapping function, and mapping the data block from the memory mapping area to a pre-written log WAL in a disk to obtain a mapped WAL.
2. The method of claim 1, wherein adding the received data segment to be processed to a preset data block comprises:
determining data attribute information corresponding to the received data segment to be processed;
Splicing the data attribute information and the data segment to be processed according to a preset splicing format to obtain a spliced data segment to be processed;
and adding the spliced data segment to be processed into a preset data block.
3. The method of claim 1, wherein transferring the data block from the buffer to the memory map area when the total number of the data segments to be processed included in the data block is determined to satisfy a predetermined number condition comprises:
if the storage mode corresponding to the data block is determined to be a single data segment, the data block is transferred from the buffer area to the memory mapping area;
and if the storage mode is determined to be the multi-data segment, transferring the data block from the buffer area to the memory mapping area when the total number of the data segments to be processed contained in the data block is determined to be greater than a preset number threshold.
4. A method according to any one of claims 1-3, wherein transferring the data block from the buffer into the memory map comprises:
when the length of the data block corresponding to the data block is determined to be larger than the residual memory capacity of the memory mapping area, randomly reading a memory mapping area with an idle working state from a preset mapping pool, wherein the mapping pool comprises a plurality of memory mapping areas;
And transferring the data block to the read memory mapping area.
5. A method according to any one of claims 1-3, wherein transferring the data block from the buffer into the memory map comprises:
when the length of the data block corresponding to the data block is determined to be larger than the residual memory capacity of the memory mapping area, generating a new memory mapping area in the user state storage area;
and transferring the data block from the buffer area to the new memory mapping area.
6. The method of claim 5, wherein the calling a predetermined memory mapping function to map the data block from the memory mapped region to the WAL in the disk to obtain a mapped WAL comprises:
determining WALs mapped with the memory mapping areas based on mapping relations between the memory mapping areas and the WALs;
and calling the memory mapping function, and mapping the data block from the memory mapping area to the determined WAL to obtain the mapped WAL.
7. A method according to any one of claims 1-3, wherein the method further comprises:
receiving a data reading request, wherein the data reading request at least comprises a data index identifier;
And reading the target data segment from the corresponding data block based on the data index identifier.
8. The method of claim 7, wherein the reading the target data segment from the corresponding data block based on the data index identification comprises:
and calling the memory mapping function, and mapping the target data segment determined based on the data index identifier from the WAL into a memory mapping area to obtain a mapped memory mapping area.
9. The method of claim 7, wherein after the adding the received data segment to be processed to the preset data block, the method further comprises:
and caching the data block into a data block cache area, wherein the data block cache area is arranged in the user mode storage area.
10. The method of claim 9, wherein the reading the target data segment from the corresponding data block based on the data index identification comprises:
and when the data block cache area is determined to have the target data block corresponding to the data block identifier, reading a target data segment corresponding to the data segment identifier from the target data block of the data block cache area.
11. The method of claim 1, wherein the method further comprises:
calculating target attribute information corresponding to the target data segment, and determining the data attribute information corresponding to the target data segment from the corresponding data block;
and determining an integrity check result of the target data segment based on the target attribute information and the data attribute information.
12. The method of claim 11, wherein the determining the integrity check result for the target data segment based on the target attribute information and the data attribute information comprises:
when the target length in the target attribute information is determined to be the same as the data length in the data attribute information, and the target information abstract value in the target attribute information is determined to be the same as the data information abstract value in the data attribute information, determining that the integrity check result of the target data segment is complete;
and when the target length is determined to be different from the data length and the target information abstract value is determined to be different from the data information abstract value, determining that the integrity check result of the target data segment is incomplete.
13. A WAL processing apparatus, comprising:
the adding module is used for receiving the data segment to be processed and adding the received data segment to be processed into a preset data block, wherein the data block is arranged in a buffer area of the user state storage area;
the processing module is used for transferring the data block from the buffer area to a memory mapping area when the total number of the data segments to be processed contained in the data block is determined to meet a preset number condition, and the memory mapping area is arranged in the user state storage area;
and the mapping module is used for calling a preset memory mapping function, mapping the data block from the memory mapping area to a pre-written log WAL in a disk, and obtaining the mapped WAL.
14. An electronic device comprising a processor and a memory, wherein the memory stores program code that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-12.
15. A computer readable storage medium, characterized in that it comprises a program code for causing an electronic device to perform the steps of the method of any of claims 1-12, when said program code is run on the electronic device.
16. A computer program product comprising computer instructions stored in a computer readable storage medium; when the computer instructions are read from the computer-readable storage medium by a processor of an electronic device, the processor executes the computer instructions, causing the electronic device to perform the steps of the method of any one of claims 1-12.