WO2023279962A1 - Data processing method, device, and computing system - Google Patents

Data processing method, device, and computing system

Info

Publication number
WO2023279962A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
operation command
command
logs
partition
Prior art date
Application number
PCT/CN2022/100432
Inventor
阙鸣健
冯犇
薛忠斌
陆云飞
郑渊悦
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP22836716.5A (published as EP4361836A1)
Publication of WO2023279962A1
Priority to US18/405,483 (published as US20240143566A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Definitions

  • the present application relates to the field of databases, in particular to a data processing method, device and computing system.
  • a relational database refers to a database that organizes data based on a relational model and stores data in the form of rows and columns.
  • PostgreSQL, MySQL, and openGauss are all relational databases.
  • Relational databases have high concurrency characteristics, that is, a large number of users access related data in relational databases.
  • the processing performance of relational databases is improved by means of multi-core processors, extended computing and storage resources, and other methods.
  • the horizontal expansion of hardware resources cannot fundamentally solve the problem of processing performance of relational databases.
  • the processor converts an operation command on the relational database into multiple operators, and the operators carry out the operation on the relational database through multiple recursive calls, which degrades the performance of the relational database and reduces hardware resource utilization. Therefore, how to improve the performance of relational databases and the utilization of hardware resources is an urgent problem to be solved.
  • the application provides a data processing method, device and computing system, thereby ensuring the improvement of the performance of the relational database and the utilization rate of hardware resources.
  • in a first aspect, a data processing method is provided, which can be executed by a computing device. Specifically, after obtaining a first operation command for a relational database, the computing device determines an acceleration strategy for the first operation command and executes the operation of the first operation command according to the acceleration strategy.
  • the acceleration policy is used to indicate a manner of accelerating the processing of the first operation command.
  • because the computing device accelerates the processing of the first operation command according to the acceleration strategy, the time the computing device needs to process the first operation command is reduced, which improves the performance of the relational database and the utilization of hardware resources.
  • the computing device determining the acceleration strategy for the first operation command includes: the computing device determines a processing mode according to an identifier of the first operation command, where the identifier indicates a processing mode that the first operation command can adopt, and the processing mode includes processing the first operation command using the bypass framework.
  • the computing device selects the bypass framework to process the operation of the first operation command. After determining an operation step set according to the first operation command, the computing device merges the operation steps in the set according to the type of the first operation command to obtain a merged operation step set, and then completes the operation of the first operation command according to the merged operation step set.
  • the set of operation steps includes operation steps required to execute the first operation command.
  • in the processing mode indicated by the acceleration strategy provided by the embodiment of the present application, multiple operators are merged based on the bypass framework to obtain a merged operator, and the merged operator is executed. This reduces the number of recursive calls between operators, completes the operation on the relational database as quickly as possible, and improves the performance of the relational database. It also avoids occupying excessive hardware resources for recursive calls between operators, which improves the utilization of hardware resources.
  • determining the acceleration policy for the first operation command by the computing device includes determining the target partition in a data table associated with the first operation command in a dynamic partition pruning manner.
  • the computing device determines the target partition from the data table according to the attribute of the target partition indicated by the operation of the first operation command, and executes the operation of the first operation command on the target partition.
  • the target partition contains data for at least one attribute in the data table.
  • the executor in the computing device can modify the execution plan of the SQL statement according to the partition indicated by the partition identifier, so that the executor can dynamically access the target partition indicated by the partition identifier, avoiding scanning useless partitions, reducing competition for resources, and improving performance of relational databases.
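  • the idea of dynamic partition pruning described above can be illustrated with a minimal Python sketch. It assumes a range-partitioned table held in memory; the class `PartitionedTable`, its methods, and the `scanned` bookkeeping list are hypothetical names introduced for illustration and are not taken from the application or from any database.

```python
import bisect


class PartitionedTable:
    """Toy range-partitioned table (illustrative assumption, not a real engine)."""

    def __init__(self, upper_bounds):
        # upper_bounds[i] is the exclusive upper bound of partition i (sorted ascending).
        self.upper_bounds = list(upper_bounds)
        self.partitions = [[] for _ in self.upper_bounds]
        self.scanned = []  # partition identifiers touched by queries, for illustration

    def partition_id(self, key):
        """Resolve the target partition from the key value at execution time."""
        idx = bisect.bisect_right(self.upper_bounds, key)
        if idx == len(self.upper_bounds):
            raise ValueError(f"key {key!r} is outside every partition")
        return idx

    def insert(self, key, value):
        self.partitions[self.partition_id(key)].append((key, value))

    def select_pruned(self, key):
        """Scan only the target partition instead of every partition."""
        pid = self.partition_id(key)
        self.scanned.append(pid)
        return [v for k, v in self.partitions[pid] if k == key]
```

  • in this sketch, a lookup such as `select_pruned(150)` on a table with upper bounds 100, 200, and 300 touches only the second partition; the other partitions are never scanned.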
  • the method further includes: the computing device stores the operation logs in a first operation log group to the memory in parallel according to a first rule. The first operation log group includes at least two operation logs, and the at least two operation logs include a first operation log. The first rule is used to determine whether the operation logs in the first operation log group satisfy the condition for being stored in the memory.
  • the computing device obtains the write permission of the redo log buffer with the first operation log group, writes the logs in the first operation log group into the redo log buffer in parallel, and writes at least one log in the redo log buffer that is in the write-allowed state from the redo log buffer to disk in parallel.
  • a redo log writer thread writes redo logs into a redo log buffer.
  • the redo log writer thread detects redo logs in the redo log buffer that can be written to disk.
  • the redo log writer thread checks the redo log buffer for consecutive redo logs that can be written to disk.
  • the redo log writing thread initializes the storage space for storing log files in the disk according to the amount of logs to be written to the disk.
  • the redo log writer thread writes continuous redo logs that can be written to disk in the redo log buffer to disk.
  • the above step of initializing the storage space for storing log files in the disk by the redo log writer thread may be performed by the redo log file initialization thread (walfileinit).
  • the step of writing the redo log from the redo log buffer to the disk can be performed by the redo log flushing thread (walflusher).
  • the computing device merges multiple transaction logs into one group, uses the group to preempt the write permission of the redo log buffer, and writes the transaction logs in the group into the redo log buffer in parallel. This reduces the number of times the write permission of the redo log buffer is contended for and improves the efficiency of writing transaction logs.
  • the redo log writer thread in the computing device decouples the process of writing redo logs into the redo log buffer from the process of writing redo logs to disk. The log writer thread does not need to wait for the write permission of the redo log buffer to be released; it writes logs to disk according to the log status recorded in the array, which removes the dependence on the write permission of the redo log buffer and improves the overall performance of the relational database.
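  • the grouped log write and the decoupled flush described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the buffer is an in-memory list, "disk" is another list, and the class `RedoLogBuffer` with its methods `reserve`, `fill`, `write_group`, and `flush_contiguous` are hypothetical names, not the walfileinit or walflusher threads of any real database.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RedoLogBuffer:
    """Toy redo log buffer (illustrative assumption, not a real implementation)."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the contended write permission
        self.slots = []                # log payloads, indexed by sequence number
        self.ready = []                # per-slot status array: True once the log is copied in
        self.flushed_upto = 0          # slots [0, flushed_upto) are already on "disk"
        self.disk = []                 # stand-in for the redo log file
        self.lock_acquisitions = 0

    def reserve(self, count):
        """Take the write permission once and reserve `count` consecutive slots."""
        with self._lock:
            self.lock_acquisitions += 1
            start = len(self.slots)
            self.slots.extend([None] * count)
            self.ready.extend([False] * count)
            return start

    def fill(self, slot, payload):
        """Copy one log into its reserved slot and mark it as write-allowed."""
        self.slots[slot] = payload
        self.ready[slot] = True

    def write_group(self, logs):
        """One reservation for the whole group, then copy the logs in parallel."""
        start = self.reserve(len(logs))
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(lambda item: self.fill(start + item[0], item[1]),
                          enumerate(logs)))

    def flush_contiguous(self):
        """Write the longest run of consecutive write-allowed logs to disk."""
        end = self.flushed_upto
        while end < len(self.ready) and self.ready[end]:
            end += 1
        self.disk.extend(self.slots[self.flushed_upto:end])
        written = end - self.flushed_upto
        self.flushed_upto = end
        return written
```

  • in this sketch the flusher never takes the buffer lock; it only reads the status array, so a log that is reserved but not yet copied in simply stops the flush at that position until a later call.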
  • the first rule includes: obtaining the write permission of the redo log buffer for storing logs in the memory, or at least one operation log included in the first operation log group is in a write-allowed state.
  • before the operation logs in the first operation log group are stored in the memory in parallel according to the first rule, the method further includes: dividing out the first operation log group according to the number of operation logs to be written.
  • the relational database is openGauss.
  • the operation log includes at least one of a redo log and a write-ahead log (WAL).
  • in a second aspect, a data processing device is provided, which includes modules for executing the data processing method in the first aspect or any possible design of the first aspect.
  • in a third aspect, a computing system is provided, which includes at least one processor and a memory, where the memory is used to store a set of computer instructions; when the processor executes the set of computer instructions, it performs the operation steps of the data processing method in the first aspect or any possible implementation manner of the first aspect.
  • the computing system may be a single computing device or a cluster composed of multiple computing devices.
  • a computer-readable storage medium is provided, including computer software instructions; when the computer software instructions are run in a terminal, the computer is made to execute the operation steps of the method described in the first aspect or any possible implementation manner of the first aspect.
  • a computer program product is provided. When the computer program product is run on a computer, the computer is made to execute the operation steps of the method described in the first aspect or any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic diagram of a database system provided by the present application.
  • FIG. 2 is a flowchart of a data processing method provided by the present application.
  • FIG. 3 is a schematic diagram of a stack provided by the present application.
  • FIG. 4 is a flow chart of a partition operation method provided by the present application.
  • FIG. 5 is a flow chart of a method for writing a log provided by the present application.
  • FIG. 6 is a schematic diagram of a log writing process provided by the present application.
  • FIG. 7 is a schematic diagram of a log writing process provided by the present application.
  • FIG. 8 is a schematic structural diagram of a data processing device provided by the present application.
  • FIG. 9 is a schematic structural diagram of a computing system provided by the present application.
  • a database system includes a database (database) and a database management system (database management system, DBMS).
  • a database is a computer software system that stores and manages data according to a data structure. It can also be understood as a collection, held in a computer, for storing and managing a large amount of data, that is, an electronic filing cabinet. The data stored in the database can include travel records, consumption records, web pages browsed, messages sent, images, music, sounds, and so on.
  • a database management system is software used to manage databases and to create, use, and maintain them. The database management system manages and controls the database in a unified manner to ensure the security and integrity of the database. Users can access the data in the database through the database management system.
  • the database administrator can maintain the database through the database management system.
  • the database management system supports multiple database client programs to create, modify and query databases.
  • databases include relational databases (such as PostgreSQL, MySQL, openGauss database) and non-relational databases (such as Cassandra database).
  • the embodiment of the present application mainly provides a solution for improving the performance of a relational database.
  • the implementation of the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
  • FIG. 1 is a schematic diagram of a database system provided in this embodiment.
  • Client 110 communicates with database system 120 over network 130 .
  • the network 130 may refer to the Internet (Internet).
  • the client 110 may also be referred to as a user end; it corresponds to a server and is a program that provides local services for users.
  • the client 110 may also refer to a computer connected to the network 130, and may also be called a workstation (workstation).
  • the developer user 140 can call the application programming interface (API) 111, the command-line interface (CLI) 112, or the Java database connectivity (JDBC) interface 113 through the client 110 to access the database system 120.
  • the database system 120 includes a database management system 121 and a database storage system 122 .
  • the database management system 121 includes a service layer 1211 , a storage engine layer 1212 and a database process 1213 .
  • the service layer 1211 is used for processing the structured query language (SQL) statements that access the database system 120.
  • SQL is a database query and programming language for accessing data and for querying, updating, and managing relational databases.
  • the service layer 1211 may include modules such as a connector 1211a, an analyzer 1211b, an optimizer 1211c, and an executor 1211d to process SQL statements for accessing the database system 120 .
  • the connector 1211a is used to receive the SQL statement sent by the client 110 ; and to authenticate the user who sends the SQL statement to ensure that the legal user accesses the database system 120 and guarantee the security of the database system 120 .
  • the analyzer 1211b is used to perform lexical and grammatical analysis on the SQL statement to obtain the SQL statement including semantic information.
  • the optimizer 1211c is used to generate the operation steps of the SQL statement according to the SQL statement containing semantic information.
  • the executor 1211d is used to execute the operation steps of the SQL statement.
  • the execution framework d1 and the bypass framework d2 are configured in the executor 1211d.
  • the execution framework d1 and the bypass framework d2 may refer to rules for executing the operation steps of the SQL statement.
  • the executor 1211d can select a framework from the execution framework d1 and the bypass framework d2, and execute the operation steps of the SQL statement based on the selected framework to realize the operation on the database system 120.
  • alternatively, the bypass framework d2 is configured in the executor 1211d.
  • the executor 1211d executes the operation steps of the SQL statement based on the bypass framework d2 to realize the operation on the database system 120.
  • An execution framework can refer to a model that iteratively executes operational steps, such as a volcano model.
  • the executor 1211d abstracts each operation of the SQL statement into an operator based on the execution framework, builds the operators of the SQL statement into an operator tree, and recursively calls the calculation function of each operator from top to bottom, from the root node to the leaf nodes of the operator tree, to execute the SQL statement. Operators include filtering (limit), aggregation (aggregate), sorting (sort), index scan (index scan), partition operation (PartIterator), modify table (modify table), and so on.
  • the bypass framework can refer to a model that merges operation steps and then executes them.
  • the executor 1211d combines multiple operators of the SQL statement based on the bypass framework to obtain a combined operator, and executes the combined operator to complete the operation of the SQL statement.
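  • the difference between the two frameworks can be illustrated with a small Python sketch. It assumes an iterator-style (volcano) operator tree of limit, filter, and index scan, and compares it with a single merged operator that does the same work in one loop. The classes `IndexScan`, `Filter`, and `Limit`, the functions `run_volcano` and `run_merged`, and the `CALLS` counter are hypothetical names used only for illustration; they are not the operators of any real executor.

```python
CALLS = {"next": 0}  # counts per-row operator calls made by the iterator model


class IndexScan:
    """Leaf operator: returns one row per call, or None when exhausted."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        CALLS["next"] += 1
        return next(self._it, None)


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def next(self):
        CALLS["next"] += 1
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None


class Limit:
    """Root operator: stops after n rows."""

    def __init__(self, child, n):
        self.child, self.left = child, n

    def next(self):
        CALLS["next"] += 1
        if self.left <= 0:
            return None
        row = self.child.next()
        if row is not None:
            self.left -= 1
        return row


def run_volcano(root):
    """Execution framework: every row travels through nested next() calls."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


def run_merged(rows, pred, n):
    """Bypass-style merged operator: scan, filter, and limit in one loop."""
    out = []
    for row in rows:
        if len(out) >= n:
            break
        if pred(row):
            out.append(row)
    return out
```

  • both functions return the same rows; the merged version simply avoids the chain of per-row calls between operators, which is the effect the bypass framework aims for.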
  • the storage engine layer 1212 may include a shared pool (shared global pool) 12121, a data high-speed buffer (data buffer cache) 12122, and a redo log buffer (redo log buffer) 12123.
  • the redo log buffer may also be called the writeback log buffer.
  • the shared pool 12121 is used to cache at least one of executed SQL statements, SQL programs, and data dictionary information; for example, these may be periodically cached to the shared pool 12121. The shared pool is the area where SQL statements and SQL programs are parsed, compiled, and executed.
  • the data high-speed buffer 12122 is used to store the data read from the data file and the data written to the data file.
  • the redo log buffer 12123 is used to cache the redo records generated when the user modifies the database, that is, transaction logs, for example, the redo log and the write-ahead log (WAL). The redo log is also called XLog in the openGauss and PostgreSQL databases.
  • the database process 1213 includes a system monitoring process, a process monitoring process, a database writing process, a log writing process (log write, LGWR) and a checkpoint process. Together, these database processes 1213 complete database management tasks. In addition, all processes in this embodiment can also be described as threads instead.
  • the log writing process is used to write the transaction log of the redo log buffer into the redo log file on the disk for permanent storage.
  • the start conditions of the log writing process include: the developer user 140 commits a transaction through an instruction (for example, a commit statement); the remaining storage capacity of the redo log buffer is greater than or equal to a preset threshold; dirty cache blocks (dirty buffers) in the data high-speed buffer are written to data files; and the log writing process is started periodically, for example, every 10 seconds.
  • Database storage system 122 may refer to files stored on disk.
  • files include data files (data files) 1221, control files (control files) 1222, redo log files (redo log files) 1223, parameter files (parameter files) 1224, and archived log files (archived log files) 1225.
  • the data file 1221 includes the data of the database.
  • Control file 1222 includes binary content that records database structure information. When the database is started, the data files and redo log files are loaded according to the information in the control file, and finally the database is opened.
  • the parameter file 1224 includes content required for the database startup process, for example, records the setting of explicit parameters of the database.
  • the archived log file 1225 is used for backing up and recording the data of the redo log file 1223 to avoid loss of recorded data when the redo log file 1223 is rewritten.
  • the redo log file 1223 records and saves transaction logs in the form of redo records, that is, the change operations performed by users on the database, and is the most important physical file in the database. Redo log files can be used to redo or roll back (undo) transactions.
  • the functions or functional modules of the database system 120 described in the above embodiments can be realized by a server or server cluster, and the specific form of the database system 120 is not limited in this application.
  • an embodiment of the present application provides a data processing method.
  • after the computing device obtains the first operation command for the relational database, it determines the acceleration strategy for the first operation command and executes the operation of the first operation command according to the acceleration strategy.
  • the acceleration strategy is used to indicate a manner of accelerating the processing of the first operation command.
  • the computing device selects the bypass framework to process the operation of the first operation command, or uses dynamic partition pruning to determine the target partition for the data table associated with the first operation command.
  • FIG. 2 is a flowchart of a data processing method provided by an embodiment of the present application.
  • a computing device may refer to a server or a device in a server cluster. The method includes the following steps.
  • Step 210 the computing device acquires a first operation command of the relational database.
  • the first operation command may refer to an SQL statement.
  • the computing device may receive the first operation command from a client (for example, client 110).
  • the computing device performs the function of the above-mentioned connector 1211a: it receives a message containing the first operation command from the client, parses the message to obtain the first operation command, and authenticates the user who sent the first operation command, so that only legitimate users access the relational database (such as the database system 120), which guarantees the security of the relational database.
  • Step 220 the computing device acquires a set of operation steps according to the first operation command.
  • the set of operation steps includes operation steps required to execute the first operation command.
  • the operation steps required by the first operation command may refer to an execution plan for executing the first operation command.
  • the computing device executes the functions of the analyzer 1211b and the optimizer 1211c, that is, the computing device performs lexical and grammatical analysis on the first operation command to obtain semantic information of the first operation command.
  • the optimizer 1211c is configured to generate an execution plan of the first operation command according to the first operation command including semantic information, that is, obtain an operator of a relational database required to execute the first operation.
  • after the executor 1211d acquires the set of operation steps, it may determine the processing mode according to the identifier of the first operation command, where the identifier is used to indicate the processing mode that the first operation command can adopt.
  • the processing mode includes executing the processing of the first operation command using the bypass framework.
  • the operation steps of the executor 1211d executing the SQL statement based on the bypass framework d2 are defined as the first processing mode.
  • the first processing mode is used to indicate to speed up the execution of operations on the relational database.
  • the operation steps of the executor 1211d executing the SQL statement based on the execution framework d1 are defined as the second processing mode. That is to say, in the embodiments provided in the present application, the first processing mode or the second processing mode may be selected to perform data processing according to requirements.
  • the computing device may determine whether to execute the first operation command in the first processing mode according to the following steps 230 and 240 .
  • Step 230, the computing device judges, according to the value of the enable flag, whether the enable flag indicates performing operations on the relational database in the first processing mode.
  • if the enable flag is 1, it indicates that operations on the relational database are performed in the first processing mode, that is, the operation steps of the SQL statement are allowed to be executed based on the bypass framework d2; if the enable flag is 0, the first processing mode is disabled and operations on the relational database are performed in the second processing mode, that is, the operation steps of the SQL statement are not executed based on the bypass framework d2 but based on the execution framework d1.
  • the specific value of the enable flag and the meaning of each value can be pre-configured by the system administrator according to business requirements.
  • the computing device can enable or disable the first processing mode at the database configuration file, user interface, database client tool configuration and other entries.
  • if the enable flag indicates that the operation on the relational database is performed in the first processing mode, perform step 240; if the enable flag indicates that the operation on the relational database is performed in the second processing mode, perform step 270.
  • Step 240 the computing device judges whether to execute the first operation command in the first processing mode according to the operation steps required for executing the first operation command.
  • the executor 1211d first determines the command type according to the first operation command, and the command type may be select, insert, update, or delete.
  • the executor 1211d obtains the preset operator tree of the command type to which the first operation command belongs, compares the operator tree of the operation steps required by the first operation command with the preset operator tree of the command type to which the first operation command belongs, and determines Whether to execute the first operation command in the first processing mode.
  • the executor 1211d can compare the operators in each layer of the two operator trees. If every operator in each layer is the same, or the operators that realize the main operation function in each layer are the same, the executor determines to execute the first operation command in the first processing mode.
  • the top-level nodes include filter operators, aggregation operators, sort operators, and index scan operators; intermediate nodes include partition operation operators, filter operators, aggregation operators, and sort operators; leaf nodes include index scan operators.
  • the top-level node includes the modify table operator.
  • the top-level nodes include table modification operators; middle nodes include partition operation operators; leaf nodes include index scan operators.
  • the top-level nodes include the modify table operator; the middle nodes include the partition operation operator; and the leaf nodes include the index scan operator.
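  • the decision made in steps 230 and 240 can be sketched in Python as follows. The sketch assumes that a plan is a simple list of operator names from the top-level node to the leaf node, and that a preset tree exists per command type. The table `PRESET_TREES`, the operator names, the mapping of trees to command types, and the function `choose_mode` are all hypothetical and chosen only for illustration.

```python
# Hypothetical preset operator trees (top-level node first, leaf node last).
# The operator names and the mapping to command types are illustrative assumptions.
PRESET_TREES = {
    "insert": ["ModifyTable"],
    "update": ["ModifyTable", "PartIterator", "IndexScan"],
    "delete": ["ModifyTable", "PartIterator", "IndexScan"],
}


def choose_mode(enable_flag, command_type, plan):
    """Return "first" (bypass framework) or "second" (execution framework)."""
    # Step 230: the enable flag must allow the first processing mode.
    if enable_flag != 1:
        return "second"
    # Step 240: compare the plan layer by layer with the preset tree of its type.
    preset = PRESET_TREES.get(command_type)
    if preset is not None and list(plan) == preset:
        return "first"
    return "second"
```

  • a real comparison could also accept plans whose main operators match even if auxiliary operators differ, as the text allows; this sketch only covers the exact-match case.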
  • the computing device may store the operation steps of the first operation command.
  • after the computing device receives the first operation command of the relational database again, it can obtain the historical operation steps of the first operation command from the cache. The historical operation steps can be the operation steps of the first operation command in the first processing mode or in the second processing mode. This avoids repeating steps 230 to 250, reduces the steps involved in executing the first operation command on the relational database, and improves the performance of the relational database and the utilization of hardware resources.
  • when the computing device receives an operation command of the same type, it can learn the operation steps of that command type from historical commands of the same type that have already been executed.
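  • reusing historical operation steps can be sketched as a small cache in Python. The sketch assumes the cache key is the command text itself and that a planner callable produces the operation steps on a miss; the class `PlanCache`, its `planner_calls` counter, and the planner argument are hypothetical names introduced for illustration.

```python
class PlanCache:
    """Toy cache of operation steps keyed by command text (illustrative assumption)."""

    def __init__(self, planner):
        self._planner = planner   # callable that builds the operation steps
        self._cache = {}
        self.planner_calls = 0    # how often the steps had to be rebuilt

    def get_steps(self, command):
        key = command.strip()
        if key not in self._cache:
            # Cache miss: determine the operation steps once and store them.
            self.planner_calls += 1
            self._cache[key] = self._planner(command)
        return self._cache[key]
```

  • with this sketch, receiving the same command a second time returns the stored steps without calling the planner again.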
  • the steps of the data processing method provided in the embodiment of the present application may be omitted or reordered as required.
  • the computing device does not need to execute step 230, but executes step 240 after executing step 220.
  • when the first operation command is not executed in the first processing mode, the computing device obtains the operation steps of the first operation command in the second processing mode according to the first operation command, that is, performs step 270.
  • Step 250, the computing device performs a merge operation on the operation steps in the operation step set according to the type of the first operation command, to obtain a merged operation step set.
  • the executor 1211d performs a merge operation on the operation steps in the operation step set according to the type of the first operation command according to the merge rule to obtain the merged operation step set.
  • the merging rule indicates, for example, the operators on which a merge operation can be performed for each operation command type.
  • the operation command types may include select commands, insert commands, delete commands, update commands, and update-based select commands.
  • the merge rule indicates an operator of a select command, an operator of an insert command, an operator of a delete command, an operator of an update command, and an operator of an update-based select command that can perform a merge operation.
  • the executor 1211d merges the operators of the operation steps in the operation step set according to the operators performing the merge operation indicated by the merge rule to obtain the merged operation step set.
  • the set of post-merge operation steps may contain at least one post-merge operation step.
  • the post-merge operation step includes at least one relational database operator.
  • the post-merge operation steps include select-merge operations, insert-merge operations, delete-merge operations, update-merge operations, update-based select-merge operations, scan-merge operations, aggregation-merge operations, sort-merge operations, and the like.
  • the scan merge operation is basically consistent with the scan operation.
  • Different post-merge operation steps implement different relational database operations.
  • Different post-merge operation steps include at least one different operator, but they may also share at least one operator.
  • the embodiment of this application does not strictly distinguish the operators included in the post-merging operation steps.
  • The system administrator can adaptively configure the post-merge operation steps according to business needs, so that the executor 1211d reasonably combines the relational database operators required for an operation, improving the performance of the relational database and the utilization of hardware resources.
  • the computing device merges the operation steps in the operation step set based on the merging rule, so that the number of recursive operations between operators is correspondingly reduced, thereby improving the processing efficiency of the first operation.
  • If the relational database operators required to execute the first operation command include operators that are not indicated by the merge rule, and these operators are not included in the preset operator tree, the executor 1211d may omit them and merge the remaining operators based on the merge rule to obtain the merged operation step set.
  • Alternatively, the executor 1211d does not omit any of the relational database operators required by the first operation command, and merges these operators based on the merge rule to obtain the operation steps of the first operation command in the first processing mode.
  • Compared with the second processing mode, the operation steps of the first operation command in the first processing mode include fewer operators and involve fewer recursive calls between operators.
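  • The merge operation can be sketched as follows. This is only an illustration under stated assumptions: the rule table `MERGE_RULE`, its contents, and the function `merge_steps` are hypothetical, and operators not indicated by the rule are kept here as standalone steps (the variant in which no operator is omitted).

```python
from typing import Dict, List, Set

# Hypothetical merge rule: operation command type -> operators that may be fused together.
MERGE_RULE: Dict[str, Set[str]] = {
    "select": {"aggregate", "join", "partition", "scan"},
    "insert": {"partition", "insert"},
}


def merge_steps(command_type: str, operators: List[str]) -> List[List[str]]:
    """Fuse consecutive mergeable operators into one merged step; others stay alone."""
    mergeable = MERGE_RULE.get(command_type, set())
    merged: List[List[str]] = []
    run: List[str] = []
    for op in operators:
        if op in mergeable:
            run.append(op)
            continue
        if run:  # close the current run of mergeable operators
            merged.append(run)
            run = []
        merged.append([op])
    if run:
        merged.append(run)
    return merged


merged_plan = merge_steps("select", ["aggregate", "join", "partition", "scan"])
```

  • With this assumed rule, the four operators of the select command collapse into a single merged step, so one call replaces four nested calls.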
  • The SQL statement execution plan includes a client, an aggregation operation, a join table operation, a partition operation, and a scan operation.
  • Aggregation operations are used to combine data of the same category and perform operations on the combined data, such as addition and subtraction.
  • the join table operation is used to associate data in different data tables.
  • the partition operation is used to perform a partition operation on at least one partition in the data table.
  • the scan operation is used to obtain the data required by the operation command from the relational database.
  • Aggregation operations, join table operations, partition operations, and scan operations may refer to four operators in relational databases.
  • the client calls the aggregation operator, the aggregation operator calls the join table operator, the join table operator calls the partition operator, and the partition operator calls the scan operator.
  • The scan operator reads the data in the data table; the partition operator selects the corresponding columns from the data read by the scan operator for the table join; the join table operator performs the join on the results returned by the partition operator that meet the join conditions; and the aggregation operator performs aggregation on the results of the join table operator.
  • the aggregation operator feeds back the data of the aggregation result to the client.
  • the executor 1211d abstracts the operators based on the execution framework so that each operator can be implemented independently without concern for the logic of other operators. But the recursive calls of multiple operators make the entire stack very deep.
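  • The call chain described above, in which each operator recursively calls its child, can be sketched as follows. The class `Operator`, the toy table, and the per-operator lambdas are hypothetical placeholders and are not the executor's actual operators; the sketch only records how deep the call stack becomes.

```python
from typing import Callable, List, Optional, Tuple

Rows = List[int]


class Operator:
    """Operator in a tree-shaped plan: it calls its child, then does its own work."""

    def __init__(self, name: str, work: Callable[[Rows], Rows],
                 child: Optional["Operator"] = None) -> None:
        self.name = name
        self.work = work
        self.child = child

    def execute(self, depth: int, trace: List[Tuple[int, str]]) -> Rows:
        trace.append((depth, self.name))  # record the call depth of this operator
        source = self.child.execute(depth + 1, trace) if self.child else []
        return self.work(source)


TABLE = [3, 8, 5, 12]  # toy data table
scan = Operator("scan", lambda _rows: list(TABLE))
partition = Operator("partition", lambda rows: [r for r in rows if r >= 5], scan)
join = Operator("join", lambda rows: rows, partition)
aggregate = Operator("aggregate", lambda rows: [sum(rows)], join)

trace: List[Tuple[int, str]] = []
result = aggregate.execute(0, trace)
```

  • Each operator is independent of the others, but a single request from the client already nests four calls before any data is read.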
  • (b) in FIG. 3 is a schematic diagram of an execution stack of an SQL statement.
  • the ExecProcNode function represents the execution entry of each operator.
  • ExecScan, ExecNestLoop, ExecAgg, and ExecPartIterator are specific instantiated operators.
  • The instantiation of ExecProcNode is a polymorphic process of virtual functions. A large number of virtual function calls leaves the processor underutilized for core business.
  • The executor 1211d combines the above-mentioned aggregation operator, scan operator, join table operator, and partition operator based on the bypass framework to obtain an aggregation merge operator. After the executor 1211d invokes the aggregation merge operator, the functions of the aggregation operator, the scan operator, the join table operator, and the partition operator are all realized. This reduces the branch judgments and recursive operator calls needed in the original complex scenarios and accelerates the entire SQL statement execution process.
  • When the executor 1211d executes the acceleration stack, it does not need to iterate over the operators in a tree-like manner as the execution stack does. Instead, it accesses the storage engine interface directly to process the data (for example, the index_getnext interface is provided by the storage engine to access the corresponding data through an index), which effectively saves processor and input/output overhead.
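  • A merged operator that reads from the storage engine directly, instead of walking an operator tree, might look like the sketch below. The Python function `index_getnext` is only a toy stand-in that borrows the name of the interface mentioned above; its signature, the `INDEX` dictionary, and `merged_aggregate` are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Toy storage engine: an index maps a key to the (key, value) rows stored under it.
INDEX: Dict[int, List[Tuple[int, int]]] = {1: [(1, 10), (1, 20)], 2: [(2, 7)]}


def index_getnext(key: int) -> Iterator[Tuple[int, int]]:
    """Stand-in for a storage-engine interface that returns rows through an index."""
    yield from INDEX.get(key, [])


def merged_aggregate(key: int) -> int:
    """Single merged operator: index scan and aggregation in one loop, no operator tree."""
    total = 0
    for _key, value in index_getnext(key):
        total += value
    return total


total_for_key_1 = merged_aggregate(1)
```

  • The whole request is served by one function call and one loop, which is the effect the merged operator is meant to achieve.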
  • Step 260: the computing device completes the first operation command according to the merged operation step set.
  • the executor 1211d executes multiple merged operations included in the operation steps of the first operation command in the first processing mode, and completes the first operation on the relational database by reducing the number of recursive calls. Thus, the performance of the relational database and the utilization of hardware resources are improved.
  • Step 270: the computing device completes the first operation command according to the set of operation steps of the first operation command.
  • the executor 1211d executes the operation steps of the first operation command in the second processing mode to complete the first operation on the relational database.
  • Compared with using the execution framework, in which multiple operators operate on the relational database through multiple recursive calls, the processing mode indicated by the acceleration strategy provided by the embodiment of the present application merges multiple operators based on the bypass framework to obtain a merged operator. Executing the merged operator reduces the number of recursive calls between operators and completes the operation on the relational database as quickly as possible to obtain the operation result. This improves the performance of the relational database, avoids the hardware spending excessive resources on recursive calls between operators, and improves the utilization of hardware resources.
  • the data processing process may also be accelerated based on the partition function.
  • the data table includes multiple partitions obtained by dividing the data table, and each partition can be a small data table.
  • The partition operation (or partIterator operator) can access the partition indicated by the partition identifier (also called partition attribute, partition key, or partition key parameter) provided by the SQL statement execution plan.
  • Partition operations include selection, insertion, deletion, and update of data tables. As shown in FIG. 4 , the embodiment of the present application provides a partition operation method including the following steps.
  • Step 410: the computing device determines the target partition in the data table associated with the first operation command by means of dynamic partition pruning.
  • a partition identifier may refer to an attribute in a data table. Attributes are also different for different data tables, for example, for financial statements, attributes can include income and expenses. As another example, for the performance statistics table, the attributes can include class, name, gender and grades.
  • Step 420: the computing device executes the operation of the first operation command on the target partition.
  • the executor 1211d executes the operation of the first operation command on the target partition according to the target partition indicated by the partition identifier. For example, the executor 1211d performs select, insert, delete, update, etc. on data in the target partition.
  • For example, the first operation command indicates partition ID 1 and partition ID 2: it instructs a delete operation on partition 1, indicated by partition ID 1, and an update operation on partition 2, indicated by partition ID 2.
  • the executor 1211d determines the partition 1 indicated by the partition ID 1, and determines the partition 2 indicated by the partition ID 2.
  • the executor 1211d performs a delete operation on partition 1 and an update operation on partition 2 .
  • The function of the executor 1211d dynamically accessing partitions provided by the embodiment of the present application is not limited to the processing mode described in the above embodiment; the executor 1211d can dynamically access partitions in either the first processing mode or the second processing mode.
  • The embodiment of this application provides a partition operation method.
  • The executor 1211d can modify the execution plan of the SQL statement according to the partition indicated by the partition identifier, so that the executor 1211d dynamically accesses the target partition indicated by the partition identifier. This avoids scanning unneeded partitions, reduces competition for resources, and improves the performance of the relational database.
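  • The delete and update example above can be sketched as follows. The dictionary `table`, the list `scanned`, and the function `run_on_target_partitions` are hypothetical, and a delete is modelled simply as emptying the partition; the point is only that partitions not named by a partition identifier are never touched.

```python
from typing import Dict, List, Tuple

# Hypothetical partitioned table: partition identifier -> rows stored in that partition.
table: Dict[int, List[str]] = {
    1: ["a", "b"],
    2: ["c"],
    3: ["d", "e"],
}
scanned: List[int] = []  # records which partitions were actually accessed


def run_on_target_partitions(ops: List[Tuple[int, str]]) -> None:
    """Apply each (partition identifier, action) pair only to its target partition."""
    for partition_id, action in ops:
        if partition_id not in table:
            continue  # unknown identifier: nothing to access
        scanned.append(partition_id)
        if action == "delete":
            table[partition_id] = []
        elif action == "update":
            table[partition_id] = [row.upper() for row in table[partition_id]]


run_on_target_partitions([(1, "delete"), (2, "update")])
```

  • Partition 3 is neither scanned nor modified, which is the saving that pruning is intended to provide.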
  • the computing device After the computing device operates the relational database, it will generate a transaction log (or called a redo log), and the transaction log is used to describe specific change information on the data. How to store the transaction log will also affect the process of data processing.
  • The business thread that generates the first operation log can also be added to a log group, and the operation logs generated by all business threads in the group are written into the memory in parallel.
  • the computing device may concurrently store the operation logs in the first operation log group to the memory according to the first rule.
  • the first operation log group includes at least two operation logs. At least two operation logs include the first operation log.
  • the first rule is used for judging whether the operation logs in the first operation log group meet the condition of being stored in the memory.
  • the first rule may include obtaining a write permission to a redo log buffer for storing logs in the memory, or at least one operation log included in the first operation log group has been written into the main memory.
  • the embodiment of the present application describes the process of writing a transaction log.
  • Step 510: the computing device acquires the write permission of the redo log buffer for the first operation log group, and writes the logs in the first operation log group into the redo log buffer in parallel.
  • The processor in the computing device may include multiple processor cores, and different processor cores may process different operations on the relational database. Specifically, one or more processes or threads may run on each processor core, and operation instructions are executed by a specific process or thread; different processor cores can therefore generate different operation logs at the same time. A processor core can also indicate the business thread that generates an operation log. For example, after the first service thread in the computing device executes the first operation command on the relational database, a first operation log is generated.
  • the members included in the first operation log group are service threads, and the number of members included in the first operation log group may be pre-configured or determined according to the number of service threads generating operation logs.
  • the service threads contained in the first operation log group are preconfigured, for example, the first operation log group contains specified service threads in the computing device. In other words, the number of operation log groups and the business threads included in each operation log group can be preconfigured.
  • The number of operation log groups and the number of business threads in each operation log group can be set according to the capability of the computing device to persist logs in parallel. The business threads in each operation log group can be divided according to the identifier of the processor core, the processing capability of the processor core, or the priority of the business thread.
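  • One of the division options above, grouping by processor core identifier, can be sketched as follows. The modulo mapping, the thread names, and the function `assign_log_groups` are assumptions chosen for illustration; the application does not prescribe a particular mapping.

```python
from collections import defaultdict
from typing import Dict, List


def assign_log_groups(thread_core: Dict[str, int], group_count: int) -> Dict[int, List[str]]:
    """Place each business thread in the group selected by its processor core identifier."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for thread_name in sorted(thread_core):
        groups[thread_core[thread_name] % group_count].append(thread_name)
    return dict(groups)


# Four hypothetical business threads, each running on the core with the given identifier.
log_groups = assign_log_groups({"t0": 0, "t1": 1, "t2": 2, "t3": 3}, group_count=2)
```

  • A division by core processing capability or by thread priority would replace only the expression that selects the group.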
  • the redo log buffer may be the redo log buffer 12123 in the storage engine layer 1212 shown in FIG. 1 .
  • the redo log buffer may be a storage space in a memory (such as main memory) of the computing device performing the above steps 210 to 270, or it may be a storage space in other memories connected to the computing device.
  • the size of the storage space for the redo log buffer can be 1G.
  • the representative thread may also be determined according to the identifier of the service thread, the resource occupancy rate of the service thread, the running time, and the free resources of the hardware where the process is located.
  • N business threads form an operation log group. The first business thread that joins the operation log group serves as the representative thread (leader), and the business threads that join afterwards are member threads (followers).
  • The representative thread competes for the write permission of the redo log buffer on behalf of the group, and the other member threads in the group sleep, waiting to be woken up by the representative thread.
  • the business thread tries to obtain the write permission of the redo log buffer in an exclusive manner.
  • The number of write permissions for the redo log buffer determines the concurrency of transaction log writes. For example, if the number of write permissions for the redo log buffer is 48, then 48 service threads can write transaction logs to the redo log buffer at the same time. The representative thread therefore competes, on behalf of the group, for one write permission out of the preset number of write permissions for the redo log buffer.
  • After the representative thread obtains the write permission of the redo log buffer, it traverses the sizes of the transaction logs to be written by each business thread in the group and obtains the total storage space required for all business threads in the group to write their transaction logs to the redo log buffer. After the representative thread applies to the controller of the main memory for the required storage space in the redo log buffer, all business threads in the group write their transaction logs into the redo log buffer in parallel. Each business thread serially writes the multiple transaction logs it has generated into the redo log buffer.
  • the representative thread releases the write permission of the redo log buffer, and wakes up all dormant business threads in the group.
  • The computing device merges multiple transaction logs into one group, uses the group to compete for the write permission of the redo log buffer, and writes the transaction logs in the group into the redo log buffer in parallel. This reduces the number of times the write permission of the redo log buffer is contended for and improves the efficiency of writing transaction logs.
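  • The group write can be sketched with Python threads as follows. This is a simplified illustration, not the described implementation: the caller of `write_group` plays the role of the representative thread and starts the member threads itself, rather than member threads joining a group and sleeping. The buffer size, `WRITE_PERMITS`, `alloc_lock`, and all function names are hypothetical.

```python
import threading
from typing import List

WRITE_PERMITS = threading.Semaphore(48)  # 48 write permissions, as in the example above
alloc_lock = threading.Lock()            # guards the allocation cursor of the buffer
redo_log_buffer = bytearray(1024)        # toy buffer size chosen for illustration
next_free = 0                            # next unallocated offset in the buffer


def write_group(logs: List[bytes]) -> int:
    """Write one log per member thread; return the start offset reserved for the group."""
    global next_free
    with WRITE_PERMITS:  # the representative competes once for the whole group
        total = sum(len(log) for log in logs)  # traverse the members' log sizes
        with alloc_lock:  # reserve one contiguous region for the group
            start = next_free
            if start + total > len(redo_log_buffer):
                raise MemoryError("redo log buffer is full")
            next_free += total

        def member_write(offset: int, log: bytes) -> None:
            # Each member owns a disjoint slice, so members do not contend with each other.
            redo_log_buffer[offset:offset + len(log)] = log

        members = []
        offset = start
        for log in logs:
            members.append(threading.Thread(target=member_write, args=(offset, log)))
            offset += len(log)
        for member in members:
            member.start()
        for member in members:
            member.join()
    return start  # leaving the with-block has released the write permission


group_start = write_group([b"INSERT-1;", b"UPDATE-2;", b"DELETE-3;"])
```

  • Three logs are written with a single acquisition of a write permission, whereas writing them individually would require three acquisitions.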
  • Step 520: the computing device writes at least one log in the redo log buffer that is in the write-allowed state from the redo log buffer to the disk.
  • the redo log writer thread detects redo logs in the redo log buffer that can be written to disk.
  • the computing device sets an array for the redo logs in the redo log buffer to record the status of the transaction logs in the redo log buffer.
  • The depth of the array can be set according to business requirements; for example, the depth of the array can be 100.
  • the array contains the log sequence number (log sequence number, LSN), log record count (Log Record Count, LRC) and status.
  • the log sequence number is used to ensure that the interval in which the transaction log is written to the redo log buffer is globally unique.
  • the log sequence number indicates the length of the transaction log.
  • the number of log records is used to ensure the uniqueness of the transaction log number.
  • The state is either a write-allowed state or a write-not-allowed state.
  • The write-allowed state indicates that the log is allowed to be written to disk from the redo log buffer.
  • The write-not-allowed state indicates that the log is not allowed to be written to disk from the redo log buffer.
  • If the business thread has finished writing the transaction log into the redo log buffer and the write permission of the redo log buffer has been released, the log status is the write-allowed state; if the business thread is still writing the transaction log into the redo log buffer, the log status is the write-not-allowed state.
  • After the service thread writes the transaction log into the redo log buffer, the redo log writer thread writes at least one redo log in the redo log buffer that is in the write-allowed state from the redo log buffer to the disk. If two or more redo logs in the redo log buffer are in the write-allowed state, the redo log writer thread writes them to disk serially.
  • After the redo log writer thread writes at least one redo log that is in the write-allowed state from the redo log buffer to the disk, it resets the state of that log in the redo log buffer from the write-allowed state to the write-not-allowed state.
  • FIG. 7 is a schematic diagram of an array provided by the embodiment of the present application.
  • the array depth is 8, that is, the status of 8 redo logs can be recorded.
  • Because the status of the redo logs indicated by LSN0-LSN4 is the write-allowed state, the redo logs indicated by LSN0-LSN4 are complete logs that business threads have written into the redo log buffer, and the redo log writer thread writes them from the redo log buffer to disk.
  • Because the status of the redo log indicated by LSN5 is the write-not-allowed state, the redo log indicated by LSN5 has not yet been completely written into the redo log buffer by the business thread, so the redo log writer thread writes only the redo logs of LSN0-LSN4 from the redo log buffer to disk.
  • The states of LSN0-LSN4 are then updated to the write-not-allowed state.
  • The redo log writer thread traverses each item in the array and determines that the status of the redo logs indicated by LSN6-LSN7 is the write-allowed state, indicating that the redo logs indicated by LSN6-LSN7 are complete logs that business threads have written into the redo log buffer.
  • the redo log writer thread writes the redo logs of LSN6-LSN7 from the redo log buffer to disk.
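  • The status array and the behaviour of the redo log writer thread can be sketched as follows. The `Slot` record, the state constants, and `flush_allowed` are hypothetical names, and the sketch assumes that the logs of LSN6 and LSN7 become complete between the two passes; it is one plausible reading of the FIG. 7 walk-through, not the described implementation.

```python
from dataclasses import dataclass
from typing import List

WRITE_ALLOWED = "write-allowed"
WRITE_NOT_ALLOWED = "write-not-allowed"


@dataclass
class Slot:
    lsn: int        # log sequence number of the entry
    payload: bytes  # log content held in the redo log buffer
    state: str      # WRITE_ALLOWED once the business thread has finished writing


def flush_allowed(slots: List[Slot], disk: List[bytes]) -> List[int]:
    """Serially write every write-allowed entry to disk and reset its state."""
    flushed = []
    for slot in slots:  # traverse each item in the array
        if slot.state == WRITE_ALLOWED:
            disk.append(slot.payload)
            slot.state = WRITE_NOT_ALLOWED  # reset so the entry is not written twice
            flushed.append(slot.lsn)
    return flushed


# Array of depth 8: LSN0 to LSN4 are complete, LSN5 to LSN7 are still being written.
slots = [Slot(i, f"log{i}".encode(), WRITE_ALLOWED if i < 5 else WRITE_NOT_ALLOWED)
         for i in range(8)]
disk: List[bytes] = []
first_pass = flush_allowed(slots, disk)          # writes LSN0 to LSN4
slots[6].state = slots[7].state = WRITE_ALLOWED  # business threads finish LSN6 and LSN7
second_pass = flush_allowed(slots, disk)         # writes LSN6 and LSN7; LSN5 is pending
```

  • The writer consults only the array and never waits for the write permission of the redo log buffer.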
  • When the redo log writer thread writes the redo log from the redo log buffer to the disk, it also needs to initialize the storage space for storing log files in the disk according to the amount of logs to be written to the disk.
  • the above step of initializing the storage space for storing log files in the disk by the redo log writer thread may be performed by the redo log file initialization thread (walfileinit).
  • the step of writing the redo log from the redo log buffer to the disk can be performed by the redo log flushing thread (walflusher).
  • The redo log writer thread decouples the process of writing the redo log into the redo log buffer from the process of writing the redo log to the disk. The log writer thread does not need to wait for the write permission of the redo log buffer to be released; it writes logs to the disk according to the log status recorded in the array, which removes the dependence on the write permission of the redo log buffer and improves the overall performance of the relational database.
  • the computing device includes corresponding hardware structures and/or software modules for performing various functions.
  • The present application can be implemented in the form of hardware or a combination of hardware and computer software with reference to the units and method steps of the examples described in the embodiments disclosed in the present application. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
  • FIG. 8 is a schematic structural diagram of a possible data processing device provided by this embodiment.
  • the data processing device may be a module for implementing the service layer 1211 shown in FIG. 1 , or a module (such as a chip) applied to a server.
  • the data processing device 800 includes a communication module 810 , an analysis module 820 , an optimization module 830 , an execution module 840 , a log writing module 850 and a storage module 860 .
  • the data processing apparatus 800 is configured to implement the functions of the computing device in the method embodiments shown in FIG. 2 , FIG. 3 , or FIG. 5 above.
  • the communication module 810 is configured to obtain a first operation command, where the first operation command is, for example, an SQL statement.
  • the first operation command is used to execute data processing in the relational database.
  • the communication module 810 is used to execute step 210 in FIG. 2 .
  • the analysis module 820 is used to perform lexical and grammatical analysis on the SQL statement to obtain the SQL statement including semantic information.
  • the optimization module 830 is used to generate the operation steps of the SQL statement according to the SQL statement containing the semantic information.
  • the execution module 840 is configured to determine an acceleration strategy for the first operation command, and the acceleration strategy is used to accelerate the processing of the first operation command; and execute the operation of the first operation command according to the acceleration strategy.
  • the execution module 840 is specifically configured to determine the acceleration strategy for the first operation command, including at least one of the following methods: determine the processing mode according to the identification of the first operation command, where the identification is used to indicate the processing mode that the first operation command can adopt , the processing mode includes using the bypass framework to execute the processing of the first operation command; or, using the dynamic partition pruning method to determine the target partition in the data table associated with the first operation command.
  • the executing module 840 is configured to execute steps 220 to 270 in FIG. 2 .
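  • How the execution module might choose an acceleration strategy can be sketched as follows. The dictionary keys `bypass_ok` and `partition_ids`, the mode constants, and `determine_strategy` are hypothetical; the application states only that an identifier of the command indicates the processing mode and that dynamic partition pruning may be used.

```python
from typing import Dict, List

BYPASS = "bypass-framework"        # first processing mode (merged operators)
EXECUTION = "execution-framework"  # second processing mode (operator tree)
PRUNING = "dynamic-partition-pruning"


def determine_strategy(command: Dict[str, object]) -> List[str]:
    """Pick a processing mode from the command's identifier and add pruning if applicable."""
    strategy = [BYPASS if command.get("bypass_ok") else EXECUTION]
    if command.get("partition_ids"):
        strategy.append(PRUNING)
    return strategy


chosen = determine_strategy({"bypass_ok": True, "partition_ids": [1, 2]})
```

  • A command without a bypass identifier and without partition identifiers falls back to the execution framework alone.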
  • the storage module 860 is used for storing acceleration policies, bypass frameworks, execution frameworks, log files and so on.
  • the data processing device 800 also includes an update module 870 .
  • The update module 870 is used to update parameters such as the acceleration strategy, the bypass framework, and the execution framework stored in the storage module 860.
  • The data processing device 800 in the embodiment of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • When the data processing method shown in FIG. 2, FIG. 3, or FIG. 5 is implemented by software, the data processing device 800 and its modules may also be software modules.
  • The data processing device 800 may correspond to the implementation of the method described in the embodiment of the present application, and the above-mentioned and other operations and/or functions of the units in the data processing device 800 are respectively intended to implement the corresponding flows of the methods in FIG. 2, FIG. 3, or FIG. 5. For the sake of brevity, details are not repeated here.
  • FIG. 9 is a schematic structural diagram of a computing system 900 provided in this embodiment. As shown, computing system 900 includes processor 910 , bus 920 , memory 930 and communication interface 940 .
  • The processor 910 may be a CPU, or it may be another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the processor can also be a GPU, NPU, microprocessor, ASIC, or one or more integrated circuits used to control the program execution of the solution of this application.
  • The communication interface 940 is used to realize communication between the computing system 900 and external devices or components.
  • the communication interface 940 is used to obtain operation commands for relational databases.
  • Bus 920 may include a path for transferring information between the components described above (eg, processor 910 and memory 930).
  • the bus 920 may also include a power bus, a control bus, a status signal bus, and the like. However, for clarity of illustration, the various buses are labeled as bus 920 in the figure.
  • computing system 900 may include multiple processors.
  • the processor may be a multi-CPU processor.
  • a processor herein may refer to one or more devices, circuits, and/or computing units for processing data (eg, computer program instructions).
  • the processor 910 may call the bypass framework stored in the memory 930 to perform a merge operation on the operation steps in the operation step set of the first operation command, obtain a combined operation step set, and complete the first operation command according to the merged operation step set.
  • the computing system 900 includes only one processor 910 and one memory 930 as an example.
  • The processor 910 and the memory 930 each indicate a type of component or device.
  • the quantity of each type of device or equipment can be determined according to business needs.
  • the memory 930 may correspond to a storage medium for storing information such as an acceleration strategy, a bypass framework, an execution framework, and a log file in the above method embodiments, for example, a magnetic disk, such as a mechanical hard disk or a solid-state hard disk.
  • computing system 900 may be a general-purpose device or a special-purpose device.
  • computing system 900 may also be a server or other computing-capable devices.
  • The computing system 900 can also be a cluster that includes multiple computing devices connected through a network. The structure of each computing device is as shown in FIG. 9 and, for the sake of brevity, is not repeated here.
  • The computing system 900 may correspond to the data processing device 800 in this embodiment, and may correspond to the subject that executes any of the methods in FIG. 2, FIG. 3, or FIG. 5. The above-mentioned and other operations and/or functions of the modules in the data processing device 800 are respectively intended to implement the corresponding flows of the methods in FIG. 2, FIG. 3, or FIG. 5. For the sake of brevity, details are not repeated here.
  • the method steps in this embodiment may be implemented by means of hardware, and may also be implemented by means of a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC can be located in a network device or a terminal device.
  • the processor and the storage medium may also exist in the network device or the terminal device as discrete components.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • The computer program or instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer program or instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium can be a magnetic medium, for example, a floppy disk, a hard disk, or a magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Operations Research (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

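A data processing method and apparatus, and a computing system, relating to the database field. After obtaining a first operation command for a relational database, a computing device determines an acceleration policy for the first operation command and performs the operation of the first operation command according to the acceleration policy. Because the computing device accelerates the processing of the first operation command in the manner indicated by the acceleration policy, the time the computing device takes to process the first operation command is reduced, the performance of the relational database is improved, and hardware resource utilization is increased.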

Description

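Data processing method and apparatus, and computing system
This application claims priority to Chinese Patent Application No. 202110778003.8, filed with the China National Intellectual Property Administration on July 9, 2021 and entitled "Data processing method and apparatus, and computing system", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the database field, and in particular to a data processing method and apparatus, and a computing system.
Background
A relational database organizes data according to the relational model and stores it in rows and columns; PostgreSQL, MySQL, and openGauss are all relational databases. Relational databases are highly concurrent, that is, a large number of users access related data in the database at the same time. To return results to users as quickly as possible while keeping the data correct, processing performance is usually improved with multi-core processors and by scaling out compute and storage resources. However, scaling out hardware does not fundamentally solve the performance problem. For example, a processor using the Volcano model converts an operation command for the relational database into multiple operators, and these operators operate on the database through many recursive calls, which lowers both database performance and hardware resource utilization. How to improve the performance of a relational database and its hardware resource utilization is therefore a problem that urgently needs to be solved.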
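Summary
This application provides a data processing method and apparatus, and a computing system, to improve the performance of a relational database and hardware resource utilization.
According to a first aspect, a data processing method is provided. The method may be performed by a computing device and specifically includes: after obtaining a first operation command for a relational database, the computing device determines an acceleration policy for the first operation command and performs the operation of the first operation command according to the acceleration policy. The acceleration policy indicates the manner in which the processing of the first operation command is accelerated.
In this way, after the computing device accelerates the processing of the first operation command according to the acceleration policy, the time the computing device takes to process the first operation command is reduced, the performance of the relational database is improved, and hardware resource utilization is increased.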
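In a possible implementation, determining the acceleration policy for the first operation command includes: the computing device determines a processing mode based on an identifier of the first operation command, where the identifier indicates the processing mode that the first operation command can use, and the processing mode includes processing the first operation command using a bypass framework.
For example, the computing device selects the bypass framework to process the first operation command. After determining an operation step set based on the first operation command, the computing device merges the operation steps in the set according to the type of the first operation command to obtain a merged operation step set, and then completes the first operation command according to the merged operation step set. The operation step set includes the operation steps required to execute the first operation command.
In this way, compared with operating on the relational database through many recursive calls among multiple operators under the execution framework, the processing mode indicated by the acceleration policy in the embodiments of this application merges multiple operators under the bypass framework into merged operators. Executing the merged operators reduces the number of recursive calls between operators, so that operations on the relational database complete and return results as quickly as possible. This improves the performance of the relational database and, because the hardware no longer spends excessive resources on recursive calls between operators, increases hardware resource utilization.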
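In another possible implementation, determining the acceleration policy for the first operation command includes determining a target partition, by means of dynamic partition pruning, in the data table associated with the first operation command.
For example, the computing device determines the target partition from the data table based on the attribute of the target partition indicated by the operation of the first operation command, and performs the operation of the first operation command on the target partition. The target partition contains the data of at least one attribute in the data table.
In this way, the executor in the computing device can modify the SQL statement execution plan according to the partition indicated by the partition identifier, so that the executor dynamically accesses the target partition indicated by the partition identifier. This avoids scanning irrelevant partitions, reduces resource contention, and improves the performance of the relational database.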
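In another possible implementation, the method further includes: the computing device stores the operation logs in a first operation log group to a memory in parallel according to a first rule. The first operation log group includes at least two operation logs, and the at least two operation logs include a first operation log. The first rule is used to determine whether the operation logs in the first operation log group meet the condition for being stored to the memory.
Specifically, the computing device obtains the write permission for the redo log buffer on behalf of the first operation log group, writes the logs in the first operation log group into the redo log buffer in parallel, and writes at least one log in the redo log buffer that is in the write-allowed state from the redo log buffer to disk in parallel.
For example, a redo log write thread writes redo logs into the redo log buffer. The redo log write thread detects the redo logs in the redo log buffer that can be written to disk. Optionally, the redo log write thread detects contiguous redo logs in the redo log buffer that can be written to disk. The redo log write thread initializes the storage space for log files on disk according to the amount of log data to be written. The redo log write thread then writes the contiguous redo logs that can be written to disk from the redo log buffer to disk.
Optionally, the step in which the redo log write thread (walwriter) initializes the storage space for log files on disk may be performed by a redo log file initialization thread (walfileinit), and the step of writing redo logs from the redo log buffer to disk may be performed by a redo log flush thread (walflusher).
Thus, the computing device combines multiple transaction logs into one group, competes for the write permission of the redo log buffer as a group, and writes the transaction logs in the group into the redo log buffer in parallel, which reduces the number of times the write permission is contended for and improves the efficiency of writing transaction logs. In addition, writing redo logs into the redo log buffer is decoupled from writing redo logs to disk: the log write thread does not need to wait for the write permission of the redo log buffer to be released, and writes logs to disk according to the log states recorded in an array. This removes the constraint imposed by the write permission of the redo log buffer and improves the overall performance of the relational database.
The first rule includes: the write permission for the redo log buffer used to store logs in the memory has been obtained; or the state of at least one operation log in the first operation log group is the write-allowed state.
In another possible implementation, before the operation logs in the first operation log group are stored to the memory in parallel according to the first rule, the method further includes: forming the first operation log group according to the number of operation logs to be written.
The relational database is openGauss, and the operation log includes at least one of a redo log and a write-ahead log (WAL).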
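According to a second aspect, a data processing apparatus is provided. The apparatus includes modules for performing the data processing method in the first aspect or any possible design of the first aspect.
According to a third aspect, a computing system is provided. The computing system includes at least one processor and a memory, and the memory is used to store a set of computer instructions. When the processor, acting as the executing device in the first aspect or any possible implementation of the first aspect, executes the set of computer instructions, it performs the operation steps of the data processing method in the first aspect or any possible implementation of the first aspect.
As a possible implementation, the computing system may be a single computing device or a cluster of multiple computing devices.
According to a fourth aspect, a computer-readable storage medium is provided, including computer software instructions. When the computer software instructions are run in a terminal, they cause a computer to perform the operation steps of the method in the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, a computer program product is provided. When the computer program product runs on a computer, it causes the computer to perform the operation steps of the method in the first aspect or any possible implementation of the first aspect.
The implementations provided in the foregoing aspects of this application may be further combined to provide more implementations.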
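Brief Description of the Drawings
Figure 1 is a schematic diagram of a database system provided in this application;
Figure 2 is a flowchart of a data processing method provided in this application;
Figure 3 is a schematic diagram of stacks provided in this application;
Figure 4 is a flowchart of a partition operation method provided in this application;
Figure 5 is a flowchart of a log writing method provided in this application;
Figure 6 is a schematic diagram of a log writing process provided in this application;
Figure 7 is a schematic diagram of a log writing process provided in this application;
Figure 8 is a schematic structural diagram of a data processing apparatus provided in this application;
Figure 9 is a schematic structural diagram of a computing system provided in this application.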
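Detailed Description
A database system includes a database and a database management system (DBMS). A database is a computer software system that stores and manages data according to data structures. It can also be understood as a collection in a computer that stores and manages large amounts of data, that is, an electronic filing cabinet. The data stored in a database may include travel records, purchase records, browsed web pages, sent messages, images, music, sounds, and so on. A database management system is software for managing a database and is used to create, use, and maintain it. It manages and controls the database in a unified way to ensure the security and integrity of the database. Users access the data in the database through the database management system, and database administrators use it to maintain the database. The database management system allows multiple database client programs to create, modify, and query the database. By type, databases include relational databases (for example, PostgreSQL, MySQL, and openGauss) and non-relational databases (for example, Cassandra).
The embodiments of this application mainly provide a solution for improving the performance of a relational database. The implementation of the embodiments is described in detail below with reference to the drawings.
Figure 1 is a schematic diagram of a database system provided in this embodiment. Client 110 communicates with database system 120 through network 130. Network 130 may be the Internet. Client 110, also called the user end, is the counterpart of a server: a program that provides local services to the user. Client 110 may also be a computer connected to network 130, also called a workstation. Developer user 140 may access database system 120 through client 110 by calling application platform interface (API) 111, command-line interface (CLI) 112, or Java Database Connectivity (JDBC) interface 113.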
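Database system 120 includes database management system 121 and database storage system 122.
Database management system 121 includes service layer 1211, storage engine layer 1212, and database processes 1213. Service layer 1211 processes the Structured Query Language (SQL) statements that access database system 120. SQL is a database query and programming language used to access data and to query, update, and manage relational databases. Service layer 1211 may include modules such as connector 1211a, analyzer 1211b, optimizer 1211c, and executor 1211d to process these SQL statements. Connector 1211a receives the SQL statements sent by client 110 and authenticates the user who sent them, so that only legitimate users access database system 120 and its security is protected. Analyzer 1211b performs lexical and syntactic parsing on an SQL statement to obtain an SQL statement containing semantic information. Optimizer 1211c generates the operation steps of the SQL statement from the SQL statement containing semantic information. Executor 1211d executes the operation steps of the SQL statement.
In one possible example, executor 1211d is configured with execution framework d1 and bypass framework d2. Execution framework d1 and bypass framework d2 may be rules for executing the operation steps of an SQL statement. Executor 1211d may select one of the two frameworks and execute the operation steps of the SQL statement based on the selected framework, thereby operating on database system 120.
In another possible example, executor 1211d is configured with bypass framework d2 only. Executor 1211d executes the operation steps of the SQL statement based on bypass framework d2, thereby operating on database system 120.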
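The execution framework may be a model that executes operation steps iteratively, such as the Volcano model. Under the execution framework, executor 1211d abstracts each operation of the SQL statement as an operator, builds the operators of the SQL statement into an operator tree, and executes the SQL statement by recursively calling each operator's compute function from the root node of the tree down to the leaf nodes. Operators include limit, aggregate, sort, index scan, partition operation (PartIterator), and modify table.
The bypass framework may include a model that merges operation steps for execution. Under the bypass framework, executor 1211d merges multiple operators of the SQL statement into a merged operator and executes the merged operator to complete the operation of the SQL statement.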
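The contrast between the two frameworks can be illustrated with a minimal sketch. It is not taken from openGauss: the class names, `run_volcano`, and `run_merged` are hypothetical, and the query shape (scan, filter, limit) is chosen only for illustration. The Volcano-style tree pulls one row at a time through nested `next()` calls, while the merged version does the same work in a single loop.

```python
# Illustrative sketch only: names are hypothetical, not openGauss APIs.
class Operator:
    calls = 0  # counts next() invocations across all operators

    def __init__(self, child=None):
        self.child = child

    def next(self):
        raise NotImplementedError


class IndexScan(Operator):
    def __init__(self, rows):
        super().__init__()
        self.it = iter(rows)

    def next(self):
        Operator.calls += 1
        return next(self.it, None)  # None marks end of data


class Filter(Operator):
    def __init__(self, child, pred):
        super().__init__(child)
        self.pred = pred

    def next(self):
        Operator.calls += 1
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None


class Limit(Operator):
    def __init__(self, child, n):
        super().__init__(child)
        self.left = n

    def next(self):
        Operator.calls += 1
        if self.left == 0:
            return None
        self.left -= 1
        return self.child.next()


def run_volcano(root):
    """Execution framework: pull rows through the operator tree, root to leaf."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


def run_merged(rows, pred, n):
    """Bypass-style merged operator: scan, filter, and limit in one loop."""
    out = []
    for row in rows:
        if pred(row):
            out.append(row)
            if len(out) == n:
                break
    return out
```

Both functions return the same rows; the difference is that the merged version makes no per-row calls between operators, which is the cost the bypass framework aims to avoid.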
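Storage engine layer 1212 may include shared pool (shared global pool) 12121, data buffer cache 12122, and redo log buffer 12123. The redo log buffer may also be called the write-back log buffer.
Shared pool 12121 caches at least one of executed SQL statements, SQL programs, and data dictionary information; for example, at least one of these may be cached to shared pool 12121 periodically. The shared pool is the area in which SQL statements and SQL programs are parsed, compiled, and executed. Data buffer cache 12122 stores data read from data files and data to be written to data files. Redo log buffer 12123 caches the redo records generated when users modify the database, that is, transaction logs such as redo logs and write-ahead logs (WAL). In openGauss and PostgreSQL, the redo log is also called XLog.
Database processes 1213 include a system monitor process, a process monitor process, a database writer process, a log writer process (log write, LGWR), and a checkpoint process. Together, these database processes 1213 complete the database management tasks. In addition, every process in this embodiment may alternatively be described as a thread.
The log writer process writes the transaction logs in the redo log buffer to the redo log files on disk for permanent storage. The conditions that start the log writer process include: developer user 140 commits a transaction with an instruction (for example, a commit statement); the remaining storage capacity of the redo log buffer is greater than or equal to a preset threshold; the log writer process writes dirty buffers in the data buffer cache to data files; and the log writer process is started periodically, for example, every 10 seconds.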
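Database storage system 122 may refer to files stored on disk, for example, data files 1221, control files 1222, redo log files 1223, parameter file 1224, and archived log files 1225.
Data files 1221 contain the data of the database.
Control files 1222 contain binary content that records the structure of the database. When the database starts, the data files and redo log files are loaded according to the information in the control files, and the database is then opened.
Parameter file 1224 contains the content required during database startup, for example, the settings of explicit database parameters.
Archived log files 1225 back up the data recorded in redo log files 1223, so that the recorded data is not lost when redo log files 1223 are overwritten.
Redo log files 1223 record and store transaction logs, that is, the changes that users make to the database, in the form of redo records. They are the most important physical files in the database. Redo log files can be used to redo or undo transactions.
The functions or functional modules of database system 120 described in the foregoing embodiment may be implemented by one server or a server cluster; this application does not limit the specific form of database system 120.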
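To improve the performance of a relational database, an embodiment of this application provides a data processing method: after obtaining a first operation command for a relational database, a computing device determines an acceleration policy for the first operation command and performs the operation of the first operation command according to the acceleration policy. The acceleration policy indicates the manner in which the processing of the first operation command is accelerated. For example, the computing device selects the bypass framework to process the first operation command, or determines a target partition by dynamic partition pruning in the data table associated with the first operation command.
The data processing procedure is described in detail below with reference to the drawings.
Figure 2 is a flowchart of a data processing method provided in an embodiment of this application. The computing device may be a server or a device in a server cluster. The method includes the following steps.
Step 210: The computing device obtains a first operation command for a relational database.
The first operation command may be an SQL statement. The computing device may receive the first operation command from a client (for example, client 110). For example, the computing device performs the functions of connector 1211a: it receives a message containing the first operation command from the client, parses the message to obtain the first operation command, and authenticates the user who sent the first operation command, so that only legitimate users access the relational database (for example, database system 120) and its security is protected.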
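Step 220: The computing device obtains an operation step set based on the first operation command.
The operation step set includes the operation steps required to execute the first operation command. These operation steps may be the execution plan of the first operation command. The computing device performs the functions of analyzer 1211b and optimizer 1211c: it performs lexical and syntactic parsing on the first operation command to obtain its semantic information, and optimizer 1211c generates the execution plan of the first operation command from the first operation command containing semantic information, that is, obtains the relational database operators required to perform the first operation.
After obtaining the operation step set, executor 1211d may determine a processing mode based on the identifier of the first operation command. The identifier indicates the processing mode that the first operation command can use. The processing mode includes processing the first operation command using the bypass framework.
For ease of description, executing the operation steps of an SQL statement by executor 1211d based on bypass framework d2 is defined as the first processing mode. The first processing mode indicates accelerated execution of operations on the relational database. Executing the operation steps of an SQL statement by executor 1211d based on execution framework d1 is defined as the second processing mode. In other words, in the embodiments provided in this application, either the first processing mode or the second processing mode may be selected for data processing as required.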
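In some embodiments, the computing device may determine whether to execute the first operation command in the first processing mode according to steps 230 and 240 below.
Step 230: The computing device determines, based on the value of an enable flag, whether the enable flag indicates performing operations on the relational database in the first processing mode.
If the enable flag is 1, it enables performing operations on the relational database in the first processing mode, that is, executing the operation steps of the SQL statement based on bypass framework d2 is allowed. If the enable flag is 0, it disables the first processing mode and enables performing operations on the relational database in the second processing mode, that is, executing the operation steps of the SQL statement based on bypass framework d2 is not allowed and execution framework d1 is used instead. The specific values of the enable flag and the meaning of each value may be preconfigured by the system administrator according to service requirements. The first processing mode may be enabled or disabled through entry points such as the database configuration file, a user interface, or the configuration of a database client tool.
If the enable flag indicates performing operations on the relational database in the first processing mode, step 240 is performed; if the enable flag indicates performing operations on the relational database in the second processing mode, step 270 is performed.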
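Step 240: The computing device determines, based on the operation steps required to execute the first operation command, whether to execute the first operation command in the first processing mode.
Executor 1211d first determines the command type from the first operation command; the command type may be select, insert, update, or delete. Executor 1211d obtains the preset operator tree of the command type to which the first operation command belongs, and compares the operator tree of the operation steps required by the first operation command with that preset operator tree to determine whether to execute the first operation command in the first processing mode.
Executor 1211d may compare every operator in every layer of the operator trees. If every operator in every layer is the same, or the operators that implement the main operation function in every layer are the same, it determines to execute the first operation command in the first processing mode.
For example, in the preset operator tree of the select command, the top-level nodes include the limit operator, aggregate operator, sort operator, and index scan operator; the intermediate nodes include the partition operation operator, filter operator, aggregate operator, and sort operator; and the leaf nodes include the index scan operator.
In the preset operator tree of the insert command, the top-level nodes include the modify table operator.
In the preset operator tree of the update command, the top-level nodes include the modify table operator; the intermediate nodes include the partition operation operator; and the leaf nodes include the index scan operator.
In the preset operator tree of the delete command, the top-level nodes include the modify table operator; the intermediate nodes include the partition operation operator; and the leaf nodes include the index scan operator.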
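Steps 230 and 240 can be sketched as a small decision function. The sketch rests on several assumptions that are not in the text: the plan is simplified to a linear chain of operator names from root to leaf, the first element is checked against the top-level set, the last against the leaf set, and everything in between against the intermediate set. The names `PRESET_TREES`, `matches_preset`, and `choose_mode` are hypothetical.

```python
# Preset operator trees per command type, following the sets listed above.
PRESET_TREES = {
    "select": {"top": {"limit", "aggregate", "sort", "index_scan"},
               "middle": {"part_iterator", "filter", "aggregate", "sort"},
               "leaf": {"index_scan"}},
    "insert": {"top": {"modify_table"}, "middle": set(), "leaf": set()},
    "update": {"top": {"modify_table"}, "middle": {"part_iterator"},
               "leaf": {"index_scan"}},
    "delete": {"top": {"modify_table"}, "middle": {"part_iterator"},
               "leaf": {"index_scan"}},
}


def matches_preset(command_type, chain):
    """Step 240 (simplified): compare a root-to-leaf operator chain with the preset tree."""
    preset = PRESET_TREES.get(command_type)
    if preset is None or not chain or chain[0] not in preset["top"]:
        return False
    if len(chain) == 1:
        return True
    return (chain[-1] in preset["leaf"]
            and all(op in preset["middle"] for op in chain[1:-1]))


def choose_mode(enable_flag, command_type, chain):
    """Return 'first' (bypass framework) or 'second' (execution framework)."""
    if enable_flag != 1:          # step 230: first mode disabled
        return "second"
    return "first" if matches_preset(command_type, chain) else "second"
```

A plan containing an operator outside the preset tree, such as a nested-loop join in this sketch, falls back to the second processing mode.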
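Optionally, if the computing device has already executed the first operation command on the relational database, it may store the operation steps of the first operation command. When the computing device receives the first operation command for the relational database again, it can obtain the historical operation steps of the first operation command from a cache; these may be the operation steps of the first operation command in the first processing mode or in the second processing mode. This avoids repeating steps 230 to 250, reduces the number of steps involved in executing the first operation command on the relational database, and improves the performance of the relational database and hardware resource utilization. In other words, when the computing device receives an operation command of the same type, it can learn the operation steps for that type of command from commands of the same type that have already been executed, that is, from their execution history.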
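A minimal sketch of this reuse, assuming the cache is keyed by the command text and that a hypothetical `planner` callable stands in for steps 230 to 250:

```python
class PlanCache:
    """Caches the operation steps chosen for a command so later runs skip planning."""

    def __init__(self, planner):
        self._planner = planner   # hypothetical callable: command -> (mode, steps)
        self._cache = {}
        self.misses = 0

    def get_steps(self, command):
        if command not in self._cache:
            self.misses += 1      # first time this command is seen: plan it
            self._cache[command] = self._planner(command)
        return self._cache[command]
```

The text also allows reuse across commands of the same type; keying by exact command text is the simplest variant and is chosen here only for illustration.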
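It should be noted that the steps of the data processing method provided in the embodiments of this application may be trimmed or reordered as needed. For example, the computing device may skip step 230 and perform step 240 directly after step 220.
If it is determined that the first operation command is to be executed in the first processing mode, the operation steps of the first operation command in the first processing mode are obtained based on the first operation command, that is, steps 250 and 260 are performed. If it is determined that the first operation is to be executed in the second processing mode, the operation steps of the first operation command in the second processing mode are obtained based on the first operation command, that is, step 270 is performed.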
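Step 250: The computing device merges the operation steps in the operation step set according to the type of the first operation command to obtain a merged operation step set.
Based on a merge rule, executor 1211d merges the operation steps in the operation step set according to the type of the first operation command to obtain the merged operation step set. The merge rule indicates, for example, the operators on which the merge is performed for each operation command type. The operation command types may include the select command, insert command, delete command, update command, and update-based select command. The merge rule indicates the operators of the select command, insert command, delete command, update command, and update-based select command that can be merged.
Executor 1211d merges the operators of the operation steps in the operation step set according to the operators that the merge rule indicates for merging, and obtains the merged operation step set. The merged operation step set may contain at least one merged operation step, and a merged operation step contains at least one relational database operator. For example, merged operation steps include the select merge operation, insert merge operation, delete merge operation, update merge operation, update-based select merge operation, scan merge operation, aggregate merge operation, and sort merge operation. Because the scan operation is the operator that reads data from the relational database, the scan merge operation is implemented in essentially the same way as the scan operation.
Because different merged operation steps implement different relational database functions, different merged operation steps contain at least one different operator; however, they may also contain at least one operator in common. The embodiments of this application do not strictly distinguish the operators contained in a merged operation step. In practice, the system administrator can configure the merged operation steps according to service needs, so that executor 1211d merges the relational database operators required by an operation in a reasonable way, improving the performance of the relational database and hardware resource utilization.
Because the computing device merges the operation steps in the operation step set based on the merge rule, the number of operation steps of the first operation is reduced and the number of recursive operations between operators is reduced accordingly, which improves the processing efficiency of the first operation.
In some embodiments, the relational database operators required to execute the first operation command include operators that the merge rule does not indicate and that are also absent from the preset operator tree. Executor 1211d may omit these unindicated operators and, based on the merge rule, merge the remaining operators required to execute the first operation command to obtain the merged operation step set. In other embodiments, executor 1211d does not omit any of the operators required by the first operation command; it merges the relational database operators required to execute the first operation command based on the merge rule and obtains the operation steps of the first operation command in the first processing mode. The operation steps of the first operation command in the first processing mode contain fewer operators than those in the second processing mode, and they also involve fewer recursive calls between operators.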
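Step 250 can be sketched as grouping consecutive mergeable operators into one merged step. The rule table and the function name `merge_steps` are hypothetical, and the handling of operators outside the rule is an assumption: with `omit_unlisted=True` they are dropped (the first variant described above), otherwise they are kept as separate, unmerged steps.

```python
# Hypothetical merge rule: operators that may be merged, per command type.
MERGE_RULES = {
    "select": {"limit", "aggregate", "sort", "filter", "part_iterator", "index_scan"},
    "update": {"modify_table", "part_iterator", "index_scan"},
}


def merge_steps(command_type, steps, omit_unlisted=True):
    """Group consecutive mergeable operators into tuples (merged operation steps)."""
    mergeable = MERGE_RULES[command_type]
    merged, group = [], []
    for op in steps:
        if op in mergeable:
            group.append(op)
        elif not omit_unlisted:
            if group:                      # close the group before the unlisted operator
                merged.append(tuple(group))
                group = []
            merged.append((op,))           # keep the unlisted operator as its own step
    if group:
        merged.append(tuple(group))
    return merged
```

Each tuple in the result stands for one merged operation step, so a shorter result list means fewer calls between steps at execution time.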
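For example, Figure 3(a) shows an SQL statement execution plan. The execution plan contains the client, an aggregate operation, a table join operation, a partition operation, and a scan operation. The aggregate operation merges data of the same category and performs calculations on the merged data, such as addition or subtraction. The table join operation associates data in different data tables. The partition operation performs partition operations on at least one partition of a data table. The scan operation fetches the data required by the operation command from the relational database.
The aggregate operation, table join operation, partition operation, and scan operation may be four operators in the relational database. The client calls the aggregate operator, the aggregate operator calls the join operator, the join operator calls the partition operator, and the partition operator calls the scan operator. The scan operator reads the data in the data table; the partition operator selects the corresponding columns for the join based on the data read by the scan operator; the join operator performs the table join based on the results returned by the partition operator that satisfy the join condition; and the aggregate operator aggregates the results of the join operator. The aggregate operator returns the aggregated data to the client.
Under the execution framework, executor 1211d abstracts operators so that each operator can be implemented independently, without regard to the logic of the other operators. However, the recursive calls among multiple operators make the whole call stack very deep. Figure 3(b) is a schematic diagram of the execution stack of an SQL statement. The ExecProcNode function is the execution entry of each operator, and ExecScan, ExecNestLoop, ExecAgg, and ExecPartIterator are concrete operator instances. Taking C++ virtual functions as an example, dispatching ExecProcNode to a concrete operator is a polymorphic virtual function call, and the large number of virtual function calls leaves the processor poorly utilized for the core workload.
Under the bypass framework, executor 1211d merges the aggregate operator, scan operator, join operator, and partition operator into an aggregate merge operator. When executor 1211d calls the aggregate merge operator, the functions of the aggregate, scan, join, and partition operators are carried out. This removes the branch decisions and recursive operator calls that the original framework needs for a wide range of complex scenarios, and speeds up the whole SQL statement execution flow.
Compare the execution stack in Figure 3(b) with the accelerated stack in Figure 3(c): when executing the accelerated stack, executor 1211d does not need to iterate over the operators in tree form as it does for the execution stack. Instead, it accesses storage engine interfaces to process data (for example, index_getnext is an interface that the storage engine exposes for accessing data through an index), which effectively saves processor and input/output overhead. For example, the stack in Figure 3(b) contains 4 entries, whereas the stack in Figure 3(c) contains 3 entries.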
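Step 260: The computing device completes the first operation command according to the merged operation step set.
Executor 1211d executes the merged operations contained in the operation steps of the first operation command in the first processing mode and completes the first operation on the relational database with fewer recursive calls, which improves the performance of the relational database and hardware resource utilization.
Step 270: The computing device completes the first operation command according to the operation step set of the first operation command.
Executor 1211d executes the operation steps of the first operation command in the second processing mode to complete the first operation on the relational database.
In this way, compared with operating on the relational database through many recursive calls among multiple operators under the execution framework, the processing mode indicated by the acceleration policy in the embodiments of this application merges multiple operators under the bypass framework into merged operators. Executing the merged operators reduces the number of recursive calls between operators, so that operations on the relational database complete and return results as quickly as possible. This improves the performance of the relational database and, because the hardware no longer spends excessive resources on recursive calls between operators, increases hardware resource utilization.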
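As a possible embodiment, in the data processing method shown in Figure 2, data processing may also be accelerated based on the partitioning function.
A data table contains multiple partitions obtained by dividing the table, and each partition can be regarded as a small data table. The partition operation (also called the partition table iterator (PartIterator) operator) can access the partition indicated by a partition identifier (also called a partition attribute, partition key, or partition key parameter) supplied by the SQL statement execution plan. Partition operations include select, insert, delete, and update on the data table. As shown in Figure 4, a partition operation method provided in an embodiment of this application includes the following steps.
Step 410: The computing device determines a target partition, by means of dynamic partition pruning, in the data table associated with the first operation command.
After obtaining the partition identifier, executor 1211d may determine, from the data table, the target partition indicated by the partition identifier. The partition identifier may be an attribute in the data table. Different data tables have different attributes; for example, for a financial statement, the attributes may include income and expenditure, and for a grade table, the attributes may include class, name, gender, and score.
Step 420: The computing device performs the operation of the first operation command on the target partition.
Executor 1211d performs the operation of the first operation command on the target partition indicated by the partition identifier. For example, executor 1211d performs select, insert, delete, or update on the data in the target partition.
For example, suppose that the data table contains 4 partitions and the first operation command indicates partition identifier 1 and partition identifier 2: a delete operation on partition 1, indicated by partition identifier 1, and an update operation on partition 2, indicated by partition identifier 2.
Executor 1211d determines partition 1 indicated by partition identifier 1 and partition 2 indicated by partition identifier 2. Executor 1211d then performs the delete operation on partition 1 and the update operation on partition 2.
It should be noted that the dynamic partition access function of executor 1211d provided in the embodiments of this application is not restricted to a particular processing mode described in the foregoing embodiments; executor 1211d may access partitions dynamically in either the first processing mode or the second processing mode.
With the partition operation method provided in the embodiments of this application, executor 1211d can modify the SQL statement execution plan according to the partition indicated by the partition identifier, so that executor 1211d dynamically accesses the target partition indicated by the partition identifier. This avoids scanning irrelevant partitions, reduces resource contention, and improves the performance of the relational database.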
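Steps 410 and 420 can be sketched with a range-partitioned table in which the partition key value is known only at execution time. The class `PartitionedTable`, its range-partitioning scheme, and the `scanned` bookkeeping list are assumptions made for illustration; the text does not specify how partitions are laid out.

```python
import bisect


class PartitionedTable:
    """Range-partitioned table: partition i holds keys below upper_bounds[i]."""

    def __init__(self, upper_bounds):
        self.bounds = upper_bounds                 # ascending, exclusive upper bounds
        self.parts = [[] for _ in upper_bounds]
        self.scanned = []                          # indexes of partitions actually read

    def target(self, key):
        """Step 410: resolve the target partition from the partition key value."""
        idx = bisect.bisect_right(self.bounds, key)
        if idx == len(self.bounds):
            raise KeyError(f"no partition for key {key}")
        return idx

    def insert(self, key, row):
        self.parts[self.target(key)].append((key, row))

    def select(self, key):
        """Step 420: run the operation on the target partition only."""
        idx = self.target(key)
        self.scanned.append(idx)
        return [row for k, row in self.parts[idx] if k == key]
```

In this sketch a lookup touches exactly one partition, so the other partitions are never scanned.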
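After operating on the relational database, the computing device generates transaction logs (also called redo logs), which describe the specific changes made to the data. How the transaction logs are stored also affects the data processing procedure.
As another possible embodiment, after a service thread in the computing device executes the first operation command on the relational database, the service thread that generated the first operation log may join a log group, and the operation logs generated by all service threads in the log group are written to a memory in parallel as a group. For example, the computing device may store the operation logs in a first operation log group to the memory in parallel according to a first rule. The first operation log group includes at least two operation logs, and the at least two operation logs include the first operation log. The first rule is used to determine whether the operation logs in the first operation log group meet the condition for being stored to the memory. The first rule may include: the write permission for the redo log buffer used to store logs in the memory has been obtained; or the at least one operation log included in the first operation log group has been written to main memory. The procedure for writing transaction logs is described below with reference to Figure 5.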
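Step 510: The computing device obtains the write permission for the redo log buffer on behalf of the first operation log group and writes the logs in the first operation log group into the redo log buffer in parallel.
The processor in the computing device may include multiple processor cores, and different processor cores may handle different operations on the relational database. Specifically, each processor core can run one or more processes or threads, which execute the operation instructions; at the same time, different processor cores may generate different operation logs. A processor core may also denote the service thread that generates an operation log. For example, the first service thread in the computing device generates the first operation log after executing the first operation command on the relational database.
After generating the first operation log, the first service thread joins the first operation log group. The members of the first operation log group are service threads, and the number of members may be preconfigured or determined by the number of service threads that generate operation logs. Alternatively, the service threads in the first operation log group are preconfigured; for example, the group contains designated service threads in the computing device. In other words, both the number of operation log groups and the service threads in each group can be preconfigured. For example, the number of operation log groups and the number of service threads in each group may be set according to the capacity of the computing device for persisting logs in parallel, and the service threads in each group may be assigned by processor core identifier, by processor core processing capability, or by service thread priority.
Once all service threads in the first operation log group have generated their operation logs, one service thread in the group acts as the representative thread (leader) and competes for the write permission of the redo log buffer. The write permission indicates that write operations on the redo log buffer are allowed. The redo log buffer may be redo log buffer 12123 in storage engine layer 1212 shown in Figure 1. The redo log buffer may be storage space in a memory (for example, main memory) of the computing device that performs steps 210 to 270, or storage space in another memory connected to the computing device. The size of the redo log buffer may be 1 GB.
Optionally, the leader may also be determined by the identifier of a service thread, its resource usage, its running time, the free resources of the hardware on which it runs, and so on.
For example, as shown in Figure 6, N service threads form an operation log group. The first service thread to join the group is the representative thread (leader), and the service threads that join afterwards are member threads (followers).
The leader competes for the write permission of the redo log buffer on behalf of the group; the other member threads in the group sleep and wait for the leader to wake them.
It should be noted that a service thread tries to acquire the write permission of the redo log buffer exclusively. The number of write permissions of the redo log buffer determines the concurrency of transaction log writing. For example, if the redo log buffer has 48 write permissions, 48 service threads can write transaction logs into the redo log buffer at the same time. The leader therefore competes, on behalf of the group, for one of the preset number of write permissions of the redo log buffer.
After winning the write permission, the leader iterates over the sizes of the transaction logs that the service threads in the group are to write and obtains the total storage space that the group needs in the redo log buffer. After the leader requests that amount of space in the redo log buffer from the controller that manages main memory, all service threads in the group write their transaction logs into the redo log buffer in parallel. Each service thread writes the multiple transaction logs it generated into the redo log buffer serially.
The leader then releases the write permission of the redo log buffer and wakes all sleeping service threads in the group.
Because the leader has written the transaction logs of all service threads in the group into the redo log buffer, the other member threads do not need to compete for the write permission; after being woken, they continue with their subsequent processing.
Thus, the computing device combines multiple transaction logs into one group, competes for the write permission of the redo log buffer as a group, and writes the transaction logs in the group into the redo log buffer in parallel, which reduces the number of times the write permission is contended for and improves the efficiency of writing transaction logs.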
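The leader and follower behaviour of step 510 can be sketched with Python threads. This is a simplification under stated assumptions, not the patented implementation: the group size is fixed up front, a single `threading.Lock` stands in for one write permission, and the leader copies every member's record in one reservation rather than having members copy in parallel. `RedoLogBuffer` and `LogGroup` are hypothetical names.

```python
import threading


class RedoLogBuffer:
    def __init__(self):
        self.data = bytearray()
        self.lock = threading.Lock()   # stands in for one write permission
        self.acquisitions = 0          # how many times the permission was taken

    def write_group(self, records):
        with self.lock:
            self.acquisitions += 1
            start = len(self.data)
            self.data.extend(b"".join(records))   # reserve space and copy in one go
            return start


class LogGroup:
    def __init__(self, size, buffer):
        self.size = size
        self.buffer = buffer
        self.records = []
        self.cond = threading.Condition()
        self.done = False

    def submit(self, record):
        """Join the group; the first thread to join becomes the leader."""
        with self.cond:
            is_leader = not self.records
            self.records.append(record)
            if is_leader:
                # Leader waits until every member has produced its log.
                self.cond.wait_for(lambda: len(self.records) == self.size)
                self.buffer.write_group(self.records)
                self.done = True
                self.cond.notify_all()             # wake the sleeping followers
            else:
                if len(self.records) == self.size:
                    self.cond.notify_all()         # group is complete: wake the leader
                self.cond.wait_for(lambda: self.done)
        return is_leader
```

With four threads in one group, the buffer's permission is taken once instead of four times, which is the reduction in contention that the text describes.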
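After writing the transaction logs into the redo log buffer, the computing device writes the logs in the redo log buffer to a persistent medium (for example, a disk), that is, it performs step 520.
Step 520: The computing device writes at least one log in the redo log buffer that is in the write-allowed state from the redo log buffer to disk.
The redo log write thread detects the redo logs in the redo log buffer that can be written to disk. In some embodiments, the computing device sets up an array for the redo logs in the redo log buffer to record the states of the transaction logs in the buffer. The depth of the array can be set according to service requirements; for example, the depth may be 100. The array contains the log sequence number (LSN), the log record count (LRC), and the state. The log sequence number ensures that the interval into which a transaction log is written in the redo log buffer is globally unique, and it indicates the length of the transaction log. The log record count ensures that the numbering of transaction logs is unique. The state is either write-allowed or write-not-allowed. The write-allowed state indicates that the log may be written from the redo log buffer to disk; the write-not-allowed state indicates that it may not.
If a service thread has finished writing a transaction log into the redo log buffer and released the write permission of the redo log buffer, the log state is write-allowed; if the service thread has not finished writing the transaction log into the redo log buffer, the log state is write-not-allowed.
After the service thread has finished writing the transaction log into the redo log buffer, the redo log write thread writes at least one redo log in the write-allowed state from the redo log buffer to disk. If two or more redo logs in the redo log buffer are in the write-allowed state, the redo log write thread writes them to disk serially.
After writing the at least one redo log in the write-allowed state from the redo log buffer to disk, the redo log write thread resets the state of those logs in the redo log buffer from write-allowed to write-not-allowed.
For example, Figure 7(a) is a schematic diagram of an array provided in an embodiment of this application. The array depth is 8, so it can record the states of 8 redo logs. When the redo logs indicated by LSN0 to LSN4 are in the write-allowed state, they are complete logs that service threads have already written into the redo log buffer, and the redo log write thread writes the LSN0 to LSN4 redo logs from the redo log buffer to disk. Because the redo log indicated by LSN5 is in the write-not-allowed state, meaning that the service thread has not finished writing it into the redo log buffer, the redo log write thread does not write the LSN5 redo log from the redo log buffer to disk.
As shown in Figure 7(b), after the LSN0 to LSN4 logs have been written, their states are updated to write-not-allowed. The redo log write thread traverses every entry in the array and determines that the redo logs indicated by LSN6 and LSN7 are in the write-allowed state, meaning that they are complete logs that service threads have already written into the redo log buffer, and writes the LSN6 and LSN7 redo logs from the redo log buffer to disk.
As shown in Figure 7(c), if a service thread finishes writing the LSN8 redo log into the redo log buffer, the state of LSN8 is updated to write-allowed, and LRC0 is updated to LRC8. If a service thread finishes writing the LSN9 log into the redo log buffer, the state of LSN9 is updated to write-allowed, and LRC1 is updated to LRC9.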
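The state array can be sketched as a ring of slots indexed by the log record count modulo the array depth. The sketch follows the optional variant in which the writer flushes only a contiguous run of write-allowed logs; the class `LogStatusArray` and its method names are hypothetical, and LSN bookkeeping is left out for brevity.

```python
ALLOWED, NOT_ALLOWED = "allowed", "not_allowed"


class LogStatusArray:
    """Ring of depth slots; slot i tracks the log whose LRC maps to i."""

    def __init__(self, depth=8):
        self.depth = depth
        self.slots = [{"lrc": i, "state": NOT_ALLOWED} for i in range(depth)]
        self.next_lrc = 0   # next log record the writer expects to flush

    def mark_copied(self, lrc):
        """Service thread finished copying log lrc into the redo log buffer."""
        slot = self.slots[lrc % self.depth]
        slot["lrc"], slot["state"] = lrc, ALLOWED

    def flush_contiguous(self):
        """Writer thread: flush the contiguous run of write-allowed logs."""
        flushed = []
        while True:
            slot = self.slots[self.next_lrc % self.depth]
            if slot["lrc"] != self.next_lrc or slot["state"] != ALLOWED:
                break
            flushed.append(self.next_lrc)   # a real writer would write to disk here
            slot["state"] = NOT_ALLOWED     # reset so the slot can be reused
            self.next_lrc += 1
        return flushed
```

Because the writer consults only the slot states, it never needs the buffer's write permission, which matches the decoupling between buffer writes and disk writes described in the text.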
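It should be noted that, before writing redo logs from the redo log buffer to disk, the redo log write thread also needs to initialize the storage space for log files on disk according to the amount of log data to be written.
Optionally, the step in which the redo log write thread (walwriter) initializes the storage space for log files on disk may be performed by a redo log file initialization thread (walfileinit), and the step of writing redo logs from the redo log buffer to disk may be performed by a redo log flush thread (walflusher).
In this way, writing redo logs into the redo log buffer is decoupled from writing redo logs to disk: the log write thread does not need to wait for the write permission of the redo log buffer to be released, and writes logs to disk according to the log states recorded in the array. This removes the constraint imposed by the write permission of the redo log buffer and improves the overall performance of the relational database.
It can be understood that, to implement the functions in the foregoing embodiments, the computing device includes hardware structures and/or software modules corresponding to each function. A person skilled in the art will readily appreciate that, with the units and method steps of the examples described in the embodiments disclosed in this application, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application scenario and design constraints of the technical solution.
The data processing method provided in this embodiment is described in detail above with reference to Figures 1 to 7. The data processing apparatus provided in this embodiment is described below with reference to Figures 8 and 9.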
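Figure 8 is a schematic structural diagram of a possible data processing apparatus provided in this embodiment. Such data processing apparatuses can implement the functions of the computing device in the foregoing method embodiments and can therefore also achieve the beneficial effects of those method embodiments. In this embodiment, the data processing apparatus may be a module that implements service layer 1211 shown in Figure 1, or a module (for example, a chip) applied to a server.
As shown in Figure 8, data processing apparatus 800 includes communication module 810, analysis module 820, optimization module 830, execution module 840, log writing module 850, and storage module 860. Data processing apparatus 800 implements the functions of the computing device in the method embodiments shown in Figure 2, Figure 3, or Figure 5.
Communication module 810 is used to obtain a first operation command, for example, an SQL statement. The first operation command is used to perform data processing in a relational database. For example, communication module 810 performs step 210 in Figure 2.
Analysis module 820 is used to perform lexical and syntactic parsing on the SQL statement to obtain an SQL statement containing semantic information.
Optimization module 830 is used to generate the operation steps of the SQL statement from the SQL statement containing semantic information.
Execution module 840 is used to determine an acceleration policy for the first operation command, where the acceleration policy is used to accelerate the processing of the first operation command, and to perform the operation of the first operation command according to the acceleration policy.
Specifically, execution module 840 determines the acceleration policy for the first operation command in at least one of the following ways: determining a processing mode based on the identifier of the first operation command, where the identifier indicates the processing mode that the first operation command can use, and the processing mode includes processing the first operation command using the bypass framework; or determining a target partition by dynamic partition pruning in the data table associated with the first operation command. For example, execution module 840 performs steps 220 to 270 in Figure 2.
Storage module 860 is used to store the acceleration policy, the bypass framework, the execution framework, log files, and so on.
Data processing apparatus 800 further includes update module 870.
Update module 870 is used to update parameters stored in storage module 860, such as the acceleration policy, the bypass framework, and the execution framework.
It should be understood that data processing apparatus 800 in the embodiments of this application may be implemented by an ASIC or by a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the data processing method shown in Figure 2, Figure 3, or Figure 5 is implemented by software, data processing apparatus 800 and its modules may also be software modules.
Data processing apparatus 800 according to the embodiments of this application may correspond to the entity that performs the methods described in the embodiments of this application, and the foregoing and other operations and/or functions of the units in data processing apparatus 800 respectively implement the corresponding procedures of the methods in Figure 2, Figure 3, or Figure 5. For brevity, details are not repeated here.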
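Figure 9 is a schematic structural diagram of computing system 900 provided in this embodiment. As shown in the figure, computing system 900 includes processor 910, bus 920, memory 930, and communication interface 940.
It should be understood that, in this embodiment, processor 910 may be a CPU, or it may be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The processor may also be a GPU, an NPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the solutions of this application.
Communication interface 940 is used for communication between computing system 900 and external devices or components. In this embodiment, communication interface 940 is used to obtain operation commands for the relational database.
Bus 920 may include a path for transferring information between the foregoing components (for example, processor 910 and memory 930). In addition to a data bus, bus 920 may include a power bus, a control bus, a status signal bus, and so on. For clarity, however, all of these buses are labeled bus 920 in the figure.
As an example, computing system 900 may include multiple processors. A processor may be a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or computing units for processing data (for example, computer program instructions). Processor 910 may call the bypass framework stored in memory 930 to merge the operation steps in the operation step set of the first operation command, obtain the merged operation step set, and complete the first operation command according to the merged operation step set.
It is worth noting that Figure 9 shows computing system 900 with one processor 910 and one memory 930 only as an example. Here, processor 910 and memory 930 each denote a class of component or device, and in specific embodiments the number of components or devices of each type can be determined according to service requirements.
Memory 930 may correspond to the storage medium used in the foregoing method embodiments to store information such as the acceleration policy, the bypass framework, the execution framework, and log files, for example, a disk such as a mechanical hard disk or a solid state drive.
Computing system 900 may be a general-purpose device or a dedicated device. For example, computing system 900 may be a server or another device with computing capability.
As a possible embodiment, computing system 900 may also be a cluster of multiple computing devices. The computing devices in the cluster may be connected through a network, and the structure of each computing device is as shown in Figure 9. For brevity, details are not repeated here.
It should be understood that computing system 900 according to this embodiment may correspond to data processing apparatus 800 in this embodiment and to the entity that performs any of the methods in Figure 2, Figure 3, or Figure 5, and the foregoing and other operations and/or functions of the modules in data processing apparatus 800 respectively implement the corresponding procedures of the methods in Figure 2, Figure 3, or Figure 5. For brevity, details are not repeated here.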
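The method steps in this embodiment may be implemented by hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a network device or a terminal device. The processor and the storage medium may also exist in the network device or the terminal device as discrete components.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium, for example, a floppy disk, hard disk, or magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, a solid state drive (SSD).
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited to them. Any equivalent modification or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.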

Claims (17)

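  1. A data processing method, wherein the method comprises:
    obtaining a first operation command, wherein the first operation command is used to perform data processing in a relational database;
    determining an acceleration policy for the first operation command, wherein the acceleration policy is used to accelerate the processing of the first operation command; and
    performing the operation of the first operation command according to the acceleration policy.
  2. The method according to claim 1, wherein determining the acceleration policy for the first operation command comprises at least one of the following:
    determining a processing mode based on an identifier of the first operation command, wherein the identifier indicates the processing mode that the first operation command can use, and the processing mode comprises processing the first operation command using a bypass framework;
    or determining a target partition, by means of dynamic partition pruning, in a data table associated with the first operation command.
  3. The method according to claim 2, wherein determining the processing mode based on the identifier of the first operation command comprises:
    determining an operation step set based on the first operation command, wherein the operation step set comprises the operation steps required to execute the first operation command; and
    merging the operation steps in the operation step set according to the type of the first operation command to obtain a merged operation step set;
    and performing the operation of the first operation command according to the acceleration policy comprises:
    completing the first operation command according to the merged operation step set.
  4. The method according to claim 2, wherein determining the target partition, by means of dynamic partition pruning, in the data table associated with the first operation command comprises:
    determining the target partition from the data table based on an attribute of the target partition indicated by the operation of the first operation command, wherein the target partition contains the data of at least one attribute in the data table;
    and performing the operation of the first operation command according to the acceleration policy comprises:
    performing the operation of the first operation command on the target partition.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    storing, in parallel according to a first rule, the operation logs in a first operation log group to a memory, wherein the first operation log group comprises at least two operation logs, the at least two operation logs comprise the first operation log, and the first rule is used to determine whether the operation logs in the first operation log group meet a condition for being stored to the memory.
  6. The method according to claim 5, wherein the first rule comprises: write permission for a redo log buffer used to store logs in the memory has been obtained; or the state of at least one operation log comprised in the first operation log group is a write-allowed state.
  7. The method according to claim 5 or 6, wherein, before the operation logs in the first operation log group are stored to the memory in parallel according to the first rule, the method further comprises:
    forming the first operation log group according to the number of operation logs to be written.
  8. The method according to any one of claims 1 to 7, wherein the relational database is openGauss, and the operation log comprises at least one of a redo log and a write-ahead log (WAL).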
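  9. A data processing apparatus, wherein the apparatus comprises:
    a communication module, configured to obtain a first operation command, wherein the first operation command is used to perform data processing in a relational database; and
    an execution module, configured to determine an acceleration policy for the first operation command, wherein the acceleration policy is used to accelerate the processing of the first operation command;
    wherein the execution module is further configured to perform the operation of the first operation command according to the acceleration policy.
  10. The apparatus according to claim 9, wherein the execution module determines the acceleration policy for the first operation command in at least one of the following ways:
    determining a processing mode based on an identifier of the first operation command, wherein the identifier indicates the processing mode that the first operation command can use, and the processing mode comprises processing the first operation command using a bypass framework;
    or determining a target partition, by means of dynamic partition pruning, in a data table associated with the first operation command.
  11. The apparatus according to claim 10, wherein, when determining the processing mode based on the identifier of the first operation command, the execution module is specifically configured to:
    determine an operation step set based on the first operation command, wherein the operation step set comprises the operation steps required to execute the first operation command;
    merge the operation steps in the operation step set according to the type of the first operation command to obtain a merged operation step set; and
    complete the first operation command according to the merged operation step set.
  12. The apparatus according to claim 10, wherein, when determining the target partition by means of dynamic partition pruning in the data table associated with the first operation command, the execution module is specifically configured to:
    determine the target partition from the data table based on an attribute of the target partition indicated by the operation of the first operation command, wherein the target partition contains the data of at least one attribute in the data table; and
    perform the operation of the first operation command on the target partition.
  13. The apparatus according to any one of claims 9 to 12, wherein the apparatus further comprises a log writing module;
    the log writing module is configured to store, in parallel according to a first rule, the operation logs in a first operation log group to a memory, wherein the first operation log group comprises at least two operation logs, the at least two operation logs comprise the first operation log, and the first rule is used to determine whether the operation logs in the first operation log group meet a condition for being stored to the memory.
  14. The apparatus according to claim 13, wherein the first rule comprises: write permission for a redo log buffer used to store logs in the memory has been obtained; or the state of at least one operation log comprised in the first operation log group is a write-allowed state.
  15. The apparatus according to claim 13 or 14, wherein the log writing module is further configured to:
    form the first operation log group according to the number of operation logs to be written.
  16. The apparatus according to any one of claims 9 to 15, wherein the relational database is openGauss, and the operation log comprises at least one of a redo log and a write-ahead log (WAL).
  17. A computing system, wherein the computing system comprises a memory and at least one processor, and the memory is used to store a set of computer instructions; when the processor executes the set of computer instructions, the operation steps of the method according to any one of claims 1 to 8 are performed.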
PCT/CN2022/100432 2021-07-09 2022-06-22 Data processing method and apparatus, and computing system WO2023279962A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22836716.5A EP4361836A1 (en) 2021-07-09 2022-06-22 Data processing method and apparatus, and computing system
US18/405,483 US20240143566A1 (en) 2021-07-09 2024-01-05 Data processing method and apparatus, and computing system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110778003.8A CN115599811A (zh) 2021-07-09 2021-07-09 Data processing method and apparatus, and computing system
CN202110778003.8 2021-07-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/405,483 Continuation US20240143566A1 (en) 2021-07-09 2024-01-05 Data processing method and apparatus, and computing system

Publications (1)

Publication Number Publication Date
WO2023279962A1 true WO2023279962A1 (zh) 2023-01-12

Family

ID=84800356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100432 WO2023279962A1 (zh) 2021-07-09 2022-06-22 Data processing method and apparatus, and computing system

Country Status (4)

Country Link
US (1) US20240143566A1 (zh)
EP (1) EP4361836A1 (zh)
CN (1) CN115599811A (zh)
WO (1) WO2023279962A1 (zh)

Also Published As

Publication number Publication date
EP4361836A1 (en) 2024-05-01
CN115599811A (zh) 2023-01-13
US20240143566A1 (en) 2024-05-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22836716

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022836716

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022836716

Country of ref document: EP

Effective date: 20240125

NENP Non-entry into the national phase

Ref country code: DE