CN115292394A - Data processing method, data processing device, computer equipment and storage medium - Google Patents

Publication number
CN115292394A
Authority
CN
China
Prior art keywords
data
written log
storage
written
wal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210464796.0A
Other languages
Chinese (zh)
Inventor
狄静舒
宋怀明
郭庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a data processing method, a data processing device, a computer device and a storage medium, wherein the method comprises the following steps: the computer equipment calls a computing service node to receive the database operation request, and if the database operation request is a data writing request, a pre-written log is constructed based on the data writing request; and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments. In the scheme, the storage service and the calculation service in the database are isolated, the coupling between the storage service and the calculation service is reduced, and the influence on the performance of the database is reduced in the data reading/writing process.

Description

Data processing method, data processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
With the rapid growth of the internet, the amount of data generated is growing geometrically. Large data volumes and a wide variety of data types place ever higher demands on database performance.
In a traditional database, data writing and data storage are coupled inside the database engine. While writing data, a traditional database must synchronously process several kinds of files, such as data files, log files, and redo logs.
Because the data processing of a traditional database involves a large volume of data, processing takes longer and data processing efficiency is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, an apparatus, a computer device, and a storage medium capable of improving the efficiency of database data processing.
In a first aspect, the present application provides a data processing method, including:
calling a computing service node to receive a database operation request, and if the database operation request is a data writing request, constructing a pre-written log based on the data writing request;
and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments.
In the embodiment, the storage service and the computing service in the database are isolated, the coupling between the storage service and the computing service is reduced, and the influence on the performance of the database is reduced in the data reading/writing process.
In one optional embodiment, the storage service node comprises a plurality of storage nodes; calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log segments, and storing the plurality of pre-written log segments, wherein the method comprises the following steps:
dividing the pre-written log according to the corresponding relation between the written data in the pre-written log and the storage nodes to obtain pre-written log segments corresponding to the storage nodes;
and storing each pre-written log segment into corresponding storage nodes.
In this embodiment, the wal (pre-written log) is divided and stored according to the data division rule, which reduces the amount of data written to each storage node.
In one optional embodiment, the storage node comprises a pre-write log segment file, a data file and a data cache; the method for storing the plurality of pre-written log segments comprises the following steps:
sequentially storing each pre-written log segment into a pre-written log segment file corresponding to each storage node according to a preset sequencing rule;
and reading the corresponding pre-written log segments from the pre-written log segment files according to a preset first operating frequency, and replaying the pre-written log segments to the data files and the data cache of each storage node.
In this embodiment, the storage service node can replay wal segments to the data file and the data cache from the local wal segment file without crossing nodes, thereby avoiding network overhead and improving the efficiency of wal segment playback.
In one optional embodiment, reading a corresponding pre-written log segment from each pre-written log segment file according to a preset first operating frequency, and replaying the pre-written log segment to a data file and a data cache of each storage node includes:
aiming at each storage node, according to a first operation frequency, acquiring a first sequence number of a first pre-written log segment in a pre-written log segment file, a second sequence number of a second pre-written log segment in a data file and a third sequence number of a third pre-written log segment in a data cache;
and replaying the first pre-written log segment to the data file and the data cache of each storage node according to the first sequence number, the second sequence number and the third sequence number.
In this embodiment, the storage service node can replay the latest wal segments to the data file and the data cache based on the sequence numbers of the wal segments in the local wal segment file, which ensures data consistency and improves the efficiency of wal segment playback without crossing nodes.
In one optional embodiment, the replaying the first pre-written log segment to the data file and the data cache of each storage node according to the first sequence number, the second sequence number, and the third sequence number includes:
under the condition that the first sequence number is larger than the second sequence number, replaying the first pre-written log segment to the data file;
and under the condition that the first sequence number is larger than the third sequence number, replaying the first pre-written log segment to the data cache.
In this embodiment, replaying wal segments to both the data file and the data cache guarantees that the wal segments in the data cache are the latest data, which ensures the consistency and validity of the data.
In one optional embodiment, the method further comprises:
and synchronizing the pre-written log segments in the current storage node to other storage nodes according to the preset copy number.
In this embodiment, after receiving the synchronization response of the slave node, a response that data writing is successful can be returned to the computation service node, and it is not necessary to wait until the storage service node completes the processing of wal persistence, wal playback, and the like, thereby improving the writing performance of the database.
In one optional embodiment, the method further comprises:
obtaining the number of effective copies of the pre-written log segments of the current storage node;
under the condition that the number of the effective copies is smaller than the preset number of the copies, calculating the number of the required copies based on the number of the effective copies and the preset number of the copies;
and determining other storage nodes with the required copy number as candidate storage nodes, and synchronizing the pre-written log segments of the current storage node into the candidate storage nodes.
In this embodiment, the storage service node can effectively manage the number of copies in time, so as to implement effective copies of the pre-written log segments of each storage node, and improve the efficiency of data recovery.
In one optional embodiment, the method further comprises:
under the condition that the database operation request is a data reading request, calling a storage service node to acquire a data page corresponding to a data identifier according to the data identifier carried in the data reading request;
and returns the data page to the compute service node.
In this embodiment, the storage service node may obtain the corresponding data page from the storage service node based on the data identifier and return the data page to the computation service node, thereby improving the efficiency of data reading.
In a second aspect, the application further provides a data processing device. The device comprises:
the calculation service module is used for receiving the database operation request and constructing a pre-written log based on the data writing request under the condition that the database operation request is the data writing request;
and the storage service module is used for dividing the pre-written log to obtain a plurality of pre-written log fragments and storing the plurality of pre-written log fragments.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory and a processor; the memory stores a computer program, and the processor implements the method provided in the first aspect above when executing the computer program.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method provided by the first aspect.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program that, when executed by a processor, implements the method provided by the first aspect.
According to the data processing method and device, the computer equipment and the storage medium, the computer equipment calls the computing service node to receive the database operation request, and if the database operation request is a data writing request, a pre-written log is constructed based on the data writing request; and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments. In the scheme, the storage service and the calculation service in the database are isolated, the coupling between the storage service and the calculation service is reduced, and the influence on the performance of the database is reduced in the data reading/writing process.
Drawings
FIG. 1 is a diagram of an application environment of a data processing method in one embodiment;
FIG. 2 is a block diagram of a database system in one embodiment;
FIG. 3 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 4 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 5 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 6 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 7 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 8 is a flow diagram that illustrates a data processing method in one embodiment;
FIG. 9 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 10 is a block diagram showing the structure of a data processing apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The data processing method provided by the embodiment of the application can be applied to the application environment shown in fig. 1. In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 1. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a data processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
A database system is deployed in the computer device. The database system provided in this embodiment includes a computing service node and a storage service node, as shown in fig. 2. The computing service node receives an SQL request for a database write operation, constructs a pre-written log (wal) based on the data carried in the SQL request, executes computing tasks, and performs transaction management. The storage service node comprises storage drive service nodes and storage nodes.
The storage drive service node receives the wal, divides it, and writes each wal fragment into the corresponding storage node. It also controls wal synchronization among the storage drive service nodes and maintains the number of copies of the wal files and data files of the storage nodes. In addition, when a database read request is received, the storage drive service node reads the data page from the storage node and returns it to the computing service node: it judges from the transaction log sequence number (lsn) of the wal whether the data in the cache is the latest; if so, the data is returned directly, and if not, the wal is first replayed to the cache and the data file. The storage drive service node is also used for persisting the wal in the wal data file of the storage node and for periodically replaying the wal to update the data file.
Illustratively, the storage drive service nodes include one master node (i.e., a read-write node) and a plurality of read nodes. The storage drive service comprises a pre-written log processing module (Wal Processor), a read module (Read Block), and a copy management module (Replication Manager).
The Wal Processor performs wal synchronization between the master node and the read nodes, wal partitioning, wal persistence, and wal playback.
Wal synchronization: after receiving the wal, the master node generates a synchronization message and sends it to all the read nodes. When more than half of the read nodes have returned a successful-synchronization response to the master node, the master node returns a data-write-success response to the computing service node. In this way, the consistency of the wal is guaranteed even if an individual read node fails. Optionally, the data synchronized between the master node and the read nodes includes attribute information in addition to the wal, for example data definition language (DDL) updates, wal segment blocking information, the minimum wal Log Sequence Number (lsn), and the maximum lsn of persisted data. Because the master node can return the write-success response as soon as it receives the synchronization responses of the read nodes, it does not need to wait for the storage service node to complete wal persistence, wal playback, and similar processing, which improves the write performance of the database.
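The majority rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `write_acknowledged` and the representation of read-node responses as a list of booleans are assumptions made for the example, not part of the application.

```python
from typing import Iterable


def write_acknowledged(sync_results: Iterable[bool], read_node_count: int) -> bool:
    """Return True when more than half of the read nodes confirmed the wal sync.

    sync_results holds one boolean per response received so far
    (hypothetical representation; True means "synchronization succeeded").
    """
    successes = sum(1 for ok in sync_results if ok)
    # "More than half" is a strict majority, so 2 of 4 is not enough.
    return 2 * successes > read_node_count


if __name__ == "__main__":
    # One of three read nodes failed; the write can still be acknowledged.
    print(write_acknowledged([True, True, False], read_node_count=3))
```

Using a strict majority means the write path tolerates the failure of a minority of read nodes without blocking on them.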
Wal partitioning: the master node divides the wal according to the data division rules to obtain a plurality of wal fragments, stores the wal fragments into the corresponding storage nodes, and sorts the wal fragments in the wal data files by lsn. As a result, data persistence can be completed by replaying the local wal without crossing nodes, which avoids network overhead.
Wal persistence: the master node persists the sorted wal fragments in a wal data file. The number of copies is 3, and the write can return once 2 nodes have been written successfully. The wal data file (wal segment) organizes copies as small storage blocks, for example one 2 GB segment file, and uses a segment file as the failure recovery unit, which reduces the average recovery time and provides highly available storage with self-recovery from failures. The storage drive node writes the wal fragments belonging to storage node1, storage node2, and storage node3 into wal segment11, wal segment21, and wal segment31 respectively; meanwhile, the storage nodes synchronize wal fragments with one another in a pipelined manner, so that the wal fragments in each storage node are replicated.
Wal playback: the master node periodically replays the wal fragments in the wal segment to the data file to flush the data to disk. During playback, it compares the lsn of the wal fragment in the wal segment with the lsn of the wal fragment in the data file and the lsn of the wal fragment in the data cache. If the lsn of the wal fragment in the wal segment is larger than that in the data cache and the data file, the playback operation is performed, that is, the wal fragment in the wal segment is applied to the data file and the data cache; if the lsn of the wal fragment in the data cache and the data file is equal to the lsn of the wal fragment in the wal segment, the playback operation is not performed. The data cache is therefore updated on every playback, so the wal fragments in the data cache always remain the latest.
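The lsn comparison that gates playback can be illustrated as follows. This is a minimal sketch under stated assumptions: `StorageNode` is a hypothetical in-memory stand-in for a storage node, and pages are modelled as dictionary entries rather than real files.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class StorageNode:
    """Hypothetical in-memory stand-in for one storage node."""
    segment_lsn: int = 0  # lsn of the newest fragment in the wal segment
    segment_pages: Dict[str, bytes] = field(default_factory=dict)
    data_file_lsn: int = 0  # lsn already applied to the data file
    data_file: Dict[str, bytes] = field(default_factory=dict)
    cache_lsn: int = 0  # lsn already applied to the data cache
    cache: Dict[str, bytes] = field(default_factory=dict)


def replay(node: StorageNode) -> Tuple[bool, bool]:
    """Apply the wal segment to the data file and cache only where it is newer.

    Returns (file_replayed, cache_replayed).
    """
    file_replayed = node.segment_lsn > node.data_file_lsn
    if file_replayed:
        node.data_file.update(node.segment_pages)
        node.data_file_lsn = node.segment_lsn
    cache_replayed = node.segment_lsn > node.cache_lsn
    if cache_replayed:
        node.cache.update(node.segment_pages)
        node.cache_lsn = node.segment_lsn
    return file_replayed, cache_replayed
```

When the lsn values are equal, both comparisons are false and the call is a no-op, which matches the rule that playback is skipped for data that is already current.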
The Read Block reads data in units of data pages and returns it to the computing service. When data is read, whether the corresponding target wal fragment in the data cache is the latest is judged according to the current transaction id carried in the operation request, the lsn of the wal fragment in the data cache, and the cached data tid. If it is the latest, the wal fragment is obtained directly from the data cache; if not, a wal fragment playback operation is triggered, the wal fragments in the data cache and the data file are updated, and the wal fragment is then read from the data cache and returned to the computing service node. When the wal fragment is replayed, whether the latest data of the current transaction has been persisted in a wal data file is judged; if so, the wal fragments persisted to the wal data file are collected first and the playback operation is then performed; if not, the playback operation is performed directly. Therefore, the latest data of the current transaction can be read whenever data is read, which ensures data consistency.
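A simplified sketch of this read path is given below. It is an assumption-laden illustration: the freshness check is reduced to comparing the cache lsn with the lsn the current transaction needs (`required_lsn`), whereas the text also involves the transaction id and the cached tid; `PageCache`, `read_page`, and the `replay_to` callback are hypothetical names.

```python
from typing import Callable, Dict


class PageCache:
    """Hypothetical data cache that remembers the lsn it was last replayed to."""

    def __init__(self) -> None:
        self.lsn = 0
        self.pages: Dict[str, bytes] = {}


def read_page(cache: PageCache, page_id: str, required_lsn: int,
              replay_to: Callable[[PageCache, int], None]) -> bytes:
    """Return a page, replaying wal first if the cache is older than required."""
    if cache.lsn < required_lsn:
        # Cache is stale: trigger playback so the cache catches up.
        replay_to(cache, required_lsn)
    return cache.pages[page_id]
```

Passing the playback routine in as a callback keeps the read path independent of how wal fragments are actually stored and applied.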
The Replication Manager is responsible for managing the copies of the wal segments and the copies of the data files. If the system copy number is set to 3, then when the number of valid copies of a file falls below 3, the file is replicated from the other valid copies until the 3-copy requirement is met.
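The copy-count check can be sketched as a small planning function. The name `plan_replication` and the rule of picking candidates in list order are assumptions for illustration; the text does not say how candidate nodes are chosen.

```python
from typing import List, Sequence


def plan_replication(valid_holders: Sequence[str], all_nodes: Sequence[str],
                     preset_copies: int = 3) -> List[str]:
    """Pick the nodes that should receive a new copy of a file.

    Returns an empty list when enough valid copies already exist.
    """
    required = preset_copies - len(valid_holders)
    if required <= 0:
        return []
    # Candidates are nodes that do not yet hold a valid copy.
    candidates = [n for n in all_nodes if n not in valid_holders]
    # Hypothetical selection policy: take the first `required` candidates.
    return candidates[:required]
```

If fewer candidate nodes exist than copies are required, the function returns as many as it can, and the caller would have to retry later.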
The following embodiments provide a data processing method based on a database system to further illustrate the data processing process.
In one embodiment, as shown in fig. 3, a data processing method is provided, which is described by taking the example that the method is applied to the computer device in fig. 1, and includes the following steps:
step 201, a computation service node is called to receive a database operation request, and if the database operation request is a data write request, a pre-write log is constructed based on the data write request.
Optionally, the database operation request includes a database-based data write request and a database-based data read request. The data writing request can carry data to be written, and the computing service node can construct a corresponding pre-written log wal based on the data to be written; the data reading request can carry a data identifier to be read, and the computing service node can obtain a corresponding data page from the storage service node according to the data identifier to be read.
In this embodiment, the computer device calls the computing service node to receive a database operation request (an SQL request). When the database operation request is a data write request, the computer device calls the computing service node to obtain the data to be written carried in the data write request and constructs the corresponding pre-written log (wal) based on that data. Optionally, the wal includes the data to be written and other attribute information, for example the log sequence number (lsn) of the wal, the operation time, and the like.
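As a rough illustration of constructing a wal from a write request, the sketch below builds a record that carries the data to be written together with an lsn and an operation time. The `WalRecord` layout, the dictionary-shaped request, and the in-process counter used as an lsn source are all assumptions; the application does not specify a record format.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Any, Dict

_lsn_counter = itertools.count(1)  # hypothetical source of increasing lsn values


@dataclass(frozen=True)
class WalRecord:
    lsn: int
    operation_time: float
    data: Dict[str, Any]


def build_wal(write_request: Dict[str, Any]) -> WalRecord:
    """Build a wal record from the data carried in a write request."""
    return WalRecord(lsn=next(_lsn_counter),
                     operation_time=time.time(),
                     data=dict(write_request["data"]))
```

Copying the request data into the record keeps the wal unaffected by later changes to the request object.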
Optionally, when the database operation request is a data reading request, the storage service node is called to obtain a data page corresponding to the data identifier according to the data identifier carried in the data reading request, and the data page is returned to the computation service node.
In this embodiment, the computer device calls the computing service node to receive the database operation request (an SQL request). When the database operation request is a data read request, the computer device calls the computing service node to obtain, from the storage service node, the data page corresponding to the data identifier carried in the data read request; the storage service node returns the data page to the computing service node.
Step 202, calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log segments, and storing the plurality of pre-written log segments.
Optionally, after the computer device calls the computing service node to construct the wal, the computer device may transmit the constructed wal to the storage service node, and the storage service node performs storage processing based on the received wal.
In this embodiment, the storage service node may perform partition processing on the received wal, optionally, the storage service node may partition wal according to a preset partition rule, where the partition rule may include partitioning based on wal data size; or, the division rule may be based on the corresponding relationship between the data in wal and the storage nodes; alternatively, the division rule may be based on the processing time period of the data in wal. After the storage service node divides wal, obtaining a plurality of wal fragments, and storing each wal fragment in a corresponding storage node; optionally, the storage service node may perform storage processing according to the timing sequence of wal fragment, or the storage service node may also perform sorting according to lsn of wal fragment, and perform storage processing in ascending order or descending order.
In the data processing method, computer equipment calls a calculation service node to receive a database operation request, and if the database operation request is a data writing request, a pre-written log is constructed based on the data writing request; and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log segments, and storing the plurality of pre-written log segments. In addition, in the scheme, only the constructed pre-written log is transmitted in the data writing process of the database, the data is not transmitted, the data transmission amount is greatly reduced, in addition, in the process of storing the pre-written log, the pre-written log is divided and stored, the rapid data recovery capability is improved, and the writing performance of the database is optimized.
In an alternative embodiment, the storage service node includes a plurality of storage nodes; as shown in fig. 4, the step of calling the storage service node to divide the pre-written log to obtain a plurality of pre-written log segments, and performing storage processing on the plurality of pre-written log segments, includes:
step 301, dividing the pre-written log according to the corresponding relationship between the written data in the pre-written log and the storage nodes to obtain pre-written log segments corresponding to the storage nodes.
In this embodiment, the pre-written log (wal) may store the correspondence between the written data and the storage nodes; for example, the storage node corresponding to written data A is node1, and the storage node corresponding to written data B is node2. Based on this correspondence, the storage service node divides written data A and written data B in the wal to obtain wal segment1 corresponding to written data A and wal segment2 corresponding to written data B.
Step 302, storing each pre-written log segment into corresponding storage nodes.
In this embodiment, based on the correspondence, the storage service node writes wal segment1 into the corresponding storage node1 and wal segment2 into the corresponding storage node2.
In this embodiment, wal is divided and stored according to data division rules, so that the data volume written into storage nodes is reduced, and the playback operation of local wal can be realized based on each storage node, that is, wal fragments can be directly written into a data file without node crossing, thereby avoiding network overhead.
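The division by data-to-node correspondence can be sketched as a simple grouping step. The `(lsn, data key, payload)` entry layout and the `node_of` mapping are hypothetical; they stand in for whatever correspondence the wal actually records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WalEntry = Tuple[int, str, bytes]  # (lsn, data key, payload), a hypothetical layout


def partition_wal(entries: List[WalEntry],
                  node_of: Dict[str, str]) -> Dict[str, List[WalEntry]]:
    """Group wal entries by the storage node that owns each data key."""
    fragments: Dict[str, List[WalEntry]] = defaultdict(list)
    for entry in entries:
        _, key, _ = entry
        fragments[node_of[key]].append(entry)
    # Keep each fragment ordered by lsn so it can be appended sequentially.
    return {node: sorted(frag, key=lambda e: e[0])
            for node, frag in fragments.items()}
```

Each storage node then receives only the entries for data it owns, which is what allows later playback to stay local to the node.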
When storing the wal fragments, the computer device may carry out the persistence process and the playback process of the wal. In one optional embodiment, the storage node comprises a pre-write log segment file, a data file and a data cache; as shown in fig. 5, the storing process for multiple pre-written log segments includes:
step 401, sequentially storing each pre-written log segment into a pre-written log segment file corresponding to each storage node according to a preset sorting rule.
In this embodiment, the preset sorting rule may be a sorting rule based on lsn of wal fragment, may be a descending sorting of lsn, and may also be an ascending sorting of lsn. Illustratively, the storage service nodes write wal fragments to their corresponding storage nodes, optionally persisting wal fragments to the storage nodes' pre-write log fragment file wal segment, sorted in ascending order by lsn of wal fragments.
Optionally, the process that the storage service node persists the wal fragment to the storage node further includes synchronizing the pre-written log fragment in the current storage node to other storage nodes according to the preset copy number.
Optionally, the preset number of copies may be determined according to parameters such as the actual number of database storage nodes and the number of wal fragments. Here the number of copies is, by way of example, 3; that is, synchronization is considered successful when the wal fragment of the current storage node has been successfully written to two other storage nodes. Referring to FIG. 6, the sorted wal fragment in the current storage node1 is persisted into wal segment11, and wal segment11 is synchronized to node2 and node3. The wal segments of each storage node store the copies in small storage blocks, for example one segment file per 2 GB, and one segment file serves as the unit of failure recovery in order to reduce the average recovery time. Illustratively, in a 10-gigabit network environment, the recovery time of one segment file is less than 2 seconds, which provides high availability of storage and self-healing from failures. Each storage node may hold multiple segment files; as shown in FIG. 6, after node1, node2, and node3 have synchronized with one another, node1 includes wal segment11, wal segment21, and wal segment31. Optionally, if wal segment11 of storage node1 is full, subsequent wal fragments are written sequentially into wal segment12, following the order of the segment files.
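Synchronization to a preset number of copies can be illustrated with a small in-memory model. The `StorageNode` class and the `synchronize` function are hypothetical; the sketch only assumes, as in the example above, that a copy count of 3 means the current node plus two other nodes.

```python
PRESET_COPIES = 3  # example value taken from the text


class StorageNode:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.segments = {}  # segment name -> list of wal fragments

    def write_segment(self, segment_name, fragments):
        """Store a copy of a segment; an unhealthy node refuses the write."""
        if not self.healthy:
            return False
        self.segments[segment_name] = list(fragments)
        return True


def synchronize(current, others, segment_name, preset_copies=PRESET_COPIES):
    """Copy one segment of `current` to other nodes until enough copies exist."""
    needed = preset_copies - 1  # the current node already holds one copy
    acked = 0
    fragments = current.segments[segment_name]
    for node in others:
        if acked == needed:
            break
        if node.write_segment(segment_name, fragments):
            acked += 1
    return acked == needed


node1, node2, node3 = StorageNode("node1"), StorageNode("node2"), StorageNode("node3")
node1.segments["wal segment11"] = [{"lsn": 1}, {"lsn": 2}]
ok = synchronize(node1, [node2, node3], "wal segment11")
```

In this model the call reports success only after both other nodes have accepted the segment, mirroring the success condition stated in the text.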
After the wal segments are synchronized, the storage service node may obtain the number of valid copies of each wal segment. A reduced number of valid copies means that a storage node may have become abnormal or failed. In an optional embodiment, as shown in FIG. 7, the method further includes:
Step 501, obtaining the number of valid copies of the pre-written log segment of the current storage node.
In this embodiment, the copy management module in the storage service node may obtain the number of valid copies of the wal segment of the current storage node, that is, obtain the number of other storage nodes storing the wal segment of the current storage node.
Step 502, under the condition that the number of effective copies is less than the preset number of copies, calculating the number of required copies based on the number of effective copies and the preset number of copies.
In this embodiment, for example, the preset number of copies may be 3. If the number of valid copies is 2, which is less than the preset number of copies, a storage node may have become abnormal or failed. In this case, the copy management module calculates the number of copies that need to be created; in this example, that number is 3 - 2 = 1.
Step 503, determining other storage nodes with the required copy number as candidate storage nodes, and synchronizing the pre-written log segments of the current storage node to the candidate storage nodes.
In this embodiment, when the copy management module determines that the number of required copies is 1, it acquires one other storage node whose operating state is normal and synchronizes the wal segment of the current storage node to that node. Optionally, the synchronization process includes synchronizing both the wal segment and the data file of the current storage node.
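Steps 501 to 503 amount to a subtraction followed by a candidate selection, as sketched below. The function names and the dictionary of node health flags are hypothetical, and the sketch assumes that the valid-copy count includes the copy held by the current node, which matches the 3 - 2 = 1 example.

```python
def required_copies(valid_copies, preset_copies):
    """Number of additional copies to create (never negative)."""
    return max(preset_copies - valid_copies, 0)


def pick_candidates(nodes, holders, count):
    """Choose `count` healthy nodes that do not yet hold the segment.

    nodes:   dict of node name -> True if its operating state is normal
    holders: set of node names that currently hold a valid copy
    """
    candidates = [
        name for name, healthy in nodes.items()
        if healthy and name not in holders
    ]
    return candidates[:count]


nodes = {"node1": True, "node2": True, "node3": False, "node4": True}
holders = {"node1", "node2"}  # node3 has failed, so two valid copies remain
missing = required_copies(len(holders), 3)
candidates = pick_candidates(nodes, holders, missing)
```

The segment (and optionally the data file) of the current node would then be synchronized to every node in `candidates`.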
Step 402, reading the corresponding pre-written log segment from each pre-written log segment file according to a preset first operating frequency, and replaying the pre-written log segment to the data file and the data cache of each storage node.
Optionally, the preset first operating frequency may be once every 1 s, or it may be synchronized with the frequency at which the storage service node persists wal fragments; that is, the storage service node performs the wal fragment playback operation after it completes wal fragment persistence.
In this embodiment, the storage service node persists the wal fragments to the wal segment of the storage node, and then replays them to the data file and the data cache of that storage node based on the fragments in the wal segment.
In this embodiment, the storage service node can replay wal fragments to the data file and the data cache from the local wal segment without crossing nodes, which avoids network overhead and improves the efficiency of wal fragment playback.
Further, to ensure the validity of the wal fragments in the data file and the data cache, in an optional embodiment, as shown in FIG. 8, reading the corresponding pre-written log segment from each pre-written log segment file according to the preset first operating frequency and replaying it to the data file and the data cache of each storage node includes:
Step 601, for each storage node, acquiring, according to the first operating frequency, a first sequence number of a first pre-written log segment in the pre-written log segment file, a second sequence number of a second pre-written log segment in the data file, and a third sequence number of a third pre-written log segment in the data cache.
In this embodiment, the storage service node obtains the lsn of the first pre-written log segment, the lsn of the second pre-written log segment, and the lsn of the third pre-written log segment. Note that an lsn is a numeric sequence number: the larger the number, the later the corresponding wal fragment was generated, that is, the newer the data version it represents. The storage service node can obtain the lsn of the wal fragment stored in the wal segment, in the data file, and in the data cache, respectively.
Step 602, replaying the first pre-written log segment to the data file and the data cache of each storage node according to the first sequence number, the second sequence number, and the third sequence number.
In this embodiment, the wal fragments in the wal segment are replayed to the data file and the data cache according to the first lsn, the second lsn, and the third lsn.
Optionally, replaying the first pre-written log segment to the data file and the data cache of each storage node according to the first sequence number, the second sequence number, and the third sequence number includes the following two cases:
Replaying the first pre-written log segment into the data file when the first sequence number is larger than the second sequence number.
In this embodiment, the first lsn being larger than the second lsn indicates that the wal fragment in the wal segment has been updated while the data file still holds old data; in this case, the storage service node may replay the wal fragment in the wal segment into the data file.
Replaying the first pre-written log segment into the data cache when the first sequence number is larger than the third sequence number.
Similarly, the first lsn being larger than the third lsn indicates that the wal fragment in the wal segment has been updated while the data cache still holds old data; in this case, the storage service node may replay the wal fragment in the wal segment into the data cache, so that the data cache always holds the latest valid data.
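The two lsn comparisons above can be sketched as a single playback routine. The `Store` class and the fragment layout (`lsn`, `page`, `value`) are hypothetical; the sketch only assumes that each target remembers the lsn of the last fragment applied to it.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Stands in for either the data file or the data cache."""
    lsn: int = 0                               # lsn of the last applied fragment
    pages: dict = field(default_factory=dict)  # page id -> page content


def replay(segment, data_file, data_cache):
    """Apply every fragment that is newer than what a target already holds."""
    for fragment in sorted(segment, key=lambda f: f["lsn"]):
        for target in (data_file, data_cache):
            if fragment["lsn"] > target.lsn:  # first lsn > second or third lsn
                target.pages[fragment["page"]] = fragment["value"]
                target.lsn = fragment["lsn"]


segment = [
    {"lsn": 1, "page": "p1", "value": "v1"},
    {"lsn": 2, "page": "p1", "value": "v2"},
    {"lsn": 3, "page": "p2", "value": "x"},
]
data_file = Store(lsn=2, pages={"p1": "v2"})   # already up to lsn 2
data_cache = Store(lsn=1, pages={"p1": "v1"})  # only up to lsn 1
replay(segment, data_file, data_cache)
```

Because the comparison is made per target, a fragment already present in the data file is not written there again, while a cache that lags further behind still catches up.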
Further, in the data reading process, when the storage service node acquires the data page corresponding to the data identifier from the data cache, it may first determine whether the wal fragment corresponding to the data identifier in the data cache is the latest data, that is, whether its lsn is the maximum lsn. If so, the wal fragment and the data page corresponding to the data identifier are acquired directly from the data cache. If not, the storage service node determines whether wal persistence has already been executed for the data page corresponding to the data identifier. If wal persistence has been executed, the storage service node directly replays the wal fragment in the wal segment to the data file and the data cache. If wal persistence has not been executed, the storage service node first performs wal persistence for the wal fragment corresponding to the data identifier and then replays the wal fragment in the wal segment to the data file and the data cache. The wal fragment and the data page corresponding to the data identifier are then read from the data cache. This embodiment does not limit the specific procedure.
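The read path just described can be summarized in a short sketch. All names are hypothetical, each store is modeled as a dictionary from data identifier to an `(lsn, page)` pair, and playback to the data file is omitted for brevity; the sketch also assumes that a fragment not yet persisted is always newer than the persisted one.

```python
def read_page(data_id, cache, segment, pending):
    """Return the page for data_id after making sure the cache is current.

    cache:   data_id -> (lsn, page), the data cache
    segment: data_id -> (lsn, page), the persisted wal segment
    pending: data_id -> (lsn, page), fragments not yet persisted
    Raises KeyError if data_id is unknown everywhere.
    """
    cached_lsn = cache.get(data_id, (0, None))[0]
    newest_lsn = max(
        cached_lsn,
        segment.get(data_id, (0, None))[0],
        pending.get(data_id, (0, None))[0],
    )
    if cached_lsn < newest_lsn:
        if data_id in pending:
            # Not yet persisted: persist the fragment first.
            segment[data_id] = pending.pop(data_id)
        # Replay from the local segment into the cache.
        cache[data_id] = segment[data_id]
    return cache[data_id][1]


cache = {"A": (1, "old"), "B": (5, "b")}
segment = {"A": (2, "mid"), "B": (5, "b")}
pending = {"A": (3, "new")}
page_a = read_page("A", cache, segment, pending)
page_b = read_page("B", cache, segment, pending)
```

Only the fragment for the requested identifier is persisted and replayed, which corresponds to replaying part of the pre-written log at read time.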
In this embodiment, replaying the wal fragments to the data file and the data cache guarantees that the data in the data cache is the latest, which ensures data consistency and validity.
To better explain the above method, as shown in FIG. 9, this embodiment provides a data processing method, which specifically includes:
S101, calling the computing service node to receive a database operation request;
S102, when the database operation request is a data writing request, calling the computing service node to construct a pre-written log based on the data writing request;
S103, calling the storage service node to divide the pre-written log according to the correspondence between the written data in the pre-written log and the storage nodes, to obtain the pre-written log segments corresponding to the storage nodes;
S104, calling the storage service node to sequentially store the pre-written log segments into the pre-written log segment files corresponding to the storage nodes according to a preset sorting rule;
S105, calling the storage service node to read the corresponding pre-written log segments from each pre-written log segment file according to a preset first operating frequency;
S106, acquiring a first sequence number of a first pre-written log segment in the pre-written log segment file, a second sequence number of a second pre-written log segment in the data file, and a third sequence number of a third pre-written log segment in the data cache;
S107, when the first sequence number is larger than the second sequence number, calling the storage service node to replay the first pre-written log segment into the data file;
S108, when the first sequence number is larger than the third sequence number, calling the storage service node to replay the first pre-written log segment into the data cache;
S109, calling the storage service node to synchronize the pre-written log segments in the current storage node to other storage nodes according to the preset number of copies;
S110, calling the storage service node to acquire the number of valid copies of the pre-written log segments of the current storage node;
S111, when the number of valid copies is smaller than the preset number of copies, calling the storage service node to calculate the number of required copies based on the number of valid copies and the preset number of copies;
S112, calling the storage service node to determine other storage nodes, equal in number to the required copies, as candidate storage nodes, and to synchronize the pre-written log segments of the current storage node to the candidate storage nodes;
S113, when the database operation request is a data reading request, calling the storage service node to acquire the data page corresponding to the data identifier carried in the data reading request, and to return the data page to the computing service node.
In this embodiment, the storage service and the computing service of the database are isolated from each other, which reduces the coupling between them and lessens the impact on database performance during data reading and writing. Because the result is returned as soon as synchronization of the pre-written log cache of the read-write node completes, there is no need to wait for the pre-written log to be persisted, which improves the write performance of the database. Because the pre-written log is stored according to the data division rule, each storage node can write its data file by replaying its local pre-written log without crossing nodes, which avoids network overhead. When data is read, part of the pre-written log is replayed so that the latest data is read, which guarantees data consistency.
The data processing method provided by the above embodiment has similar implementation principles and technical effects to those of the above embodiment, and is not described herein again.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a data processing apparatus for implementing the above-mentioned data processing method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so the specific limitations in one or more embodiments of the data processing device provided below may refer to the limitations on the data processing method in the above description, and are not described herein again.
In one embodiment, as shown in FIG. 10, there is provided a data processing apparatus including:
a computing service module 01, configured to receive a database operation request and, when the database operation request is a data writing request, construct a pre-written log based on the data writing request;
a storage service module 02, configured to divide the pre-written log to obtain a plurality of pre-written log segments, and store the plurality of pre-written log segments.
In one optional embodiment, the storage service node comprises a plurality of storage nodes; the storage service module 02 is configured to divide the pre-written log according to a correspondence between data written in the pre-written log and storage nodes to obtain pre-written log segments corresponding to the storage nodes; and storing each pre-written log segment into each corresponding storage node.
In one optional embodiment, the storage node comprises a pre-write log segment file, a data file and a data cache; the storage service module 02 is used for sequentially storing each pre-written log segment into a pre-written log segment file corresponding to each storage node according to a preset sequencing rule; and reading the corresponding pre-written log segments from the pre-written log segment files according to a preset first operation frequency, and replaying the pre-written log segments to the data files and the data cache of each storage node.
In one optional embodiment, the storage service module 02 is further configured to, for each storage node, according to the first operating frequency, obtain a first sequence number of a first pre-written log segment in the pre-written log segment file, a second sequence number of a second pre-written log segment in the data file, and a third sequence number of a third pre-written log segment in the data cache; and replay the first pre-written log segment to the data file and the data cache of each storage node according to the first sequence number, the second sequence number, and the third sequence number.
In one optional embodiment, the storage service module 02 is configured to replay the first pre-written log segment into the data file when the first sequence number is greater than the second sequence number, and to replay the first pre-written log segment into the data cache when the first sequence number is greater than the third sequence number.
In one optional embodiment, the storage service module 02 is further configured to synchronize the pre-written log segments in the current storage node to other storage nodes according to a preset number of copies.
In an optional embodiment, the storage service module 02 is further configured to obtain the number of valid copies of the pre-written log segment of the current storage node; under the condition that the number of the effective copies is smaller than the preset number of the copies, calculating the number of the required copies based on the number of the effective copies and the preset number of the copies; and determining other storage nodes with the required copy number as candidate storage nodes, and synchronizing the pre-written log segments of the current storage node into the candidate storage nodes.
In an optional embodiment, the storage service module 02 is further configured to, when the database operation request is a data reading request, obtain the data page corresponding to the data identifier carried in the data reading request, and return the data page to the computing service node.
The modules in the data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
calling a computing service node to receive a database operation request, and if the database operation request is a data writing request, constructing a pre-written log based on the data writing request;
and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments.
The implementation principle and technical effect of the computer device provided by the above embodiment are similar to those of the above method embodiment, and are not described herein again.
In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
calling a computing service node to receive a database operation request, and if the database operation request is a data writing request, constructing a pre-written log based on the data writing request;
and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
In one embodiment, a computer program product is provided, comprising a computer program which when executed by a processor performs the steps of:
calling a computing service node to receive a database operation request, and if the database operation request is a data writing request, constructing a pre-written log based on the data writing request;
and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments.
The computer program product provided by the above embodiments has similar implementation principles and technical effects to those of the above method embodiments, and is not described herein again.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of data processing, the method comprising:
calling a computing service node to receive a database operation request, and if the database operation request is a data writing request, constructing a pre-written log based on the data writing request;
and calling a storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments.
2. The method of claim 1, wherein the storage service node comprises a plurality of storage nodes; and wherein calling the storage service node to divide the pre-written log to obtain a plurality of pre-written log fragments, and storing the plurality of pre-written log fragments, comprises:
dividing the pre-written log according to the corresponding relation between the written data in the pre-written log and the storage nodes to obtain pre-written log segments corresponding to the storage nodes;
and storing each pre-written log segment into each corresponding storage node.
3. The method of claim 2, wherein the storage nodes comprise a pre-write log segment file, a data file, and a data cache; the storing the plurality of pre-written log segments comprises:
sequentially storing each pre-written log segment into a pre-written log segment file corresponding to each storage node according to a preset sorting rule;
and reading the corresponding pre-written log segment from each pre-written log segment file according to a preset first operating frequency, and replaying the pre-written log segment to the data file and the data cache of each storage node.
4. The method of claim 3, wherein reading the corresponding pre-written log segment from each pre-written log segment file according to the preset first operating frequency, and replaying the pre-written log segment to the data file and the data cache of each storage node comprises:
for each storage node, according to the first operating frequency, acquiring a first sequence number of a first pre-written log segment in the pre-written log segment file, a second sequence number of a second pre-written log segment in the data file, and a third sequence number of a third pre-written log segment in the data cache;
and replaying the first pre-written log segment to a data file and a data cache of each storage node according to the first sequence number, the second sequence number and the third sequence number.
5. The method of claim 4, wherein replaying the pre-written log segments into the data files and data caches of the storage nodes according to the first sequence number, the second sequence number, and the third sequence number comprises:
replaying the first pre-written log segment file into the data file if the first sequence number is greater than the second sequence number;
and replaying the first pre-written log segment file to the data cache if the first sequence number is greater than the third sequence number.
6. The method of claim 1, further comprising:
and synchronizing the pre-written log segments in the current storage node to other storage nodes according to the preset copy number.
7. The method of claim 6, further comprising:
obtaining the number of effective copies of the pre-written log segments of the current storage node;
under the condition that the number of the effective copies is smaller than the preset number of the copies, calculating the number of the required copies based on the number of the effective copies and the preset number of the copies;
and determining other storage nodes with the required copy number as candidate storage nodes, and synchronizing the pre-written log segments of the current storage node into the candidate storage nodes.
8. A data processing apparatus, characterized in that the apparatus comprises:
the calculation service module is used for receiving a database operation request, and constructing a pre-written log based on the data writing request under the condition that the database operation request is the data writing request;
and the storage service module is used for dividing the pre-written log to obtain a plurality of pre-written log fragments and storing the plurality of pre-written log fragments.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202210464796.0A 2022-04-29 2022-04-29 Data processing method, data processing device, computer equipment and storage medium Pending CN115292394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210464796.0A CN115292394A (en) 2022-04-29 2022-04-29 Data processing method, data processing device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210464796.0A CN115292394A (en) 2022-04-29 2022-04-29 Data processing method, data processing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115292394A true CN115292394A (en) 2022-11-04

Family

ID=83820479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210464796.0A Pending CN115292394A (en) 2022-04-29 2022-04-29 Data processing method, data processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115292394A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115840539A (en) * 2023-01-31 2023-03-24 天津南大通用数据技术股份有限公司 Data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
JP6415513B2 (en) System and method for providing high availability data
CN104598459B (en) database processing, data access method and system
CN105468473B (en) Data migration method and data migration device
US9223841B2 (en) System and method for providing high availability data
US7788233B1 (en) Data store replication for entity based partition
US8200624B2 (en) Membership tracking and data eviction in mobile middleware scenarios
JP2023546249A (en) Transaction processing methods, devices, computer equipment and computer programs
CN102929786A (en) Volatile memory representation of nonvolatile storage device set
CN115599747B (en) Metadata synchronization method, system and equipment of distributed storage system
US10515228B2 (en) Commit and rollback of data streams provided by partially trusted entities
CN112162846B (en) Transaction processing method, device and computer readable storage medium
CN109739684B (en) Vector clock-based copy repair method and device for distributed key value database
CN112749198A (en) Multi-level data caching method and device based on version number
CN115292394A (en) Data processing method, data processing device, computer equipment and storage medium
CN109726264A (en) Method, apparatus, equipment and the medium updated for index information
CN110532243A (en) Data processing method, device and electronic equipment
CN116339626A (en) Data processing method, device, computer equipment and storage medium
CN116048878A (en) Business service recovery method, device and computer equipment
CN113590643B (en) Data synchronization method, device, equipment and storage medium based on dual-track database
Vilaça et al. On the expressiveness and trade-offs of large scale tuple stores
CN117688099A (en) Method, device, equipment and storage medium for synchronizing main and standby data of distributed database
CN116541399A (en) Database partition table management method and device
CN114647630A (en) File synchronization method, information generation method, file synchronization device, information generation device, computer equipment and storage medium
CN116955495A (en) Data processing method, device, medium and equipment
CN117555679A (en) Service data processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination