CN117453682A

CN117453682A - Method and system for parallel creation of column store table btree index on openGauss database

Info

Publication number: CN117453682A
Application number: CN202311249222.2A
Authority: CN
Inventors: 那海涛; 刘惠
Original assignee: Guangzhou Mass Database Technology Co ltd
Current assignee: Guangzhou Mass Database Technology Co ltd
Priority date: 2023-09-26
Filing date: 2023-09-26
Publication date: 2024-01-26

Abstract

The invention relates to the technical field of btree index creation, and provides a method and a system for creating a column-store table btree index in parallel on an openGauss database. The method comprises the following steps:

- acquiring scale information of the column-store table, and setting a start threshold and a concurrency for parallel creation by sub-threads according to that scale information and the available system resources;
- creating a context object through the main thread, and initializing the created context object according to the scale information of the column-store table;
- starting a corresponding number of sub-threads according to the start threshold and the concurrency, distributing the initialized context object to the sub-threads, and having the sub-threads create the column-store table index according to the received context object;
- having the main thread merge and sort, through the context object, the index created by each sub-thread to obtain a complete column-store table index.

The invention can create a column-store table btree index in parallel on an openGauss database, avoid a large amount of development work, and improve btree index creation performance.

Description

Method and system for parallel creation of column store table btree index on openGauss database

Technical Field

The invention relates to the technical field of btree index creation, in particular to a method and a system for creating a column-store table btree index in parallel on an openGauss database.

Background

A Btree, short for balanced tree, is a multi-way tree in which each node can have multiple child nodes. A btree index usually refers to a B+ tree index. This is a tree-shaped data structure that stores data in order of the index column values; in essence it is a balanced tree built on disk for fast data retrieval. In a B+ tree, the data is stored in the leaf nodes and the non-leaf nodes serve as the index.

The openGauss database supports btree indexes, but the current index creation method is mainly optimized for the characteristics of row-store tables. Column-store tables differ completely from row-store tables in their data storage characteristics and scanning mode. As a result, the existing index creation optimizations cannot simply be reused, and index creation on column-store tables relies mainly on a basic full-table scan. In practice, openGauss creates such indexes only in single-threaded mode, so when a column-store table holds a large amount of data, index creation takes an extremely long time.

Therefore, the technical problem to be solved is how to create a column-store table btree index efficiently and in parallel on an openGauss database.

Disclosure of Invention

In view of the above, the present invention aims to overcome the deficiencies of the prior art and to provide a method and a system for creating a column-store table btree index in parallel on an openGauss database.

According to a first aspect of the present invention, there is provided a method for creating a column-store table btree index in parallel on an openGauss database, comprising:

acquiring scale information of the column-store table, and setting a start threshold and a concurrency for parallel creation by sub-threads according to the scale information of the column-store table and the available system resources;

creating a context object through the main thread, and initializing the created context object according to the scale information of the column-store table;

starting a corresponding number of sub-threads according to the start threshold and the concurrency, distributing the initialized context object to the sub-threads, and having the sub-threads create the column-store table index according to the received context object;

and the main thread merging and sorting, through the context object, the index created by each sub-thread to obtain a complete column-store table index.

Preferably, in the method for creating a column-store table btree index in parallel on an openGauss database, the step of acquiring the scale information of the column-store table comprises the following sub-steps. The same step sets the start threshold and concurrency for parallel creation by sub-threads according to the scale information and the available system resources.

obtaining, from the system catalog tables, the total number of CU data blocks, the average CU data block size, and the total amount of CU data in the column corresponding to the column-store table index key;

setting the start threshold for parallel creation by sub-threads according to the obtained total number of CU data blocks and average CU data block size;

and determining the concurrency for parallel creation by sub-threads according to three inputs: the obtained total amount of CU data, the number of CPUs of the host system, and the supported IO concurrency parameter.

Preferably, in the method for creating a column-store table btree index in parallel on an openGauss database, acquiring the scale information and setting the start threshold and concurrency further comprises setting a sub-thread step size according to the obtained average CU data block size. The sub-thread step size equals the number of CU data blocks a sub-thread obtains each time.

Preferably, in the method for creating a column-store table btree index in parallel on an openGauss database, the context object comprises task scheduling information and intermediate result information. The task scheduling information comprises:

- the total number of CU data blocks;
- the starting CU ID of the CU data blocks;
- a CU ID counter;
- a lock protecting updates to the CU ID counter;
- the sub-thread step size.

The intermediate result information comprises the running state information and execution result information of each sub-thread.

Preferably, in the method for creating a column-store table btree index in parallel on an openGauss database, creating the context object through the main thread and initializing it according to the scale information of the column-store table comprises the following steps:

obtaining, through the main thread, the total number of CU data blocks and the starting CU ID of the column corresponding to the column-store table index key from the catalog table;

initializing the CU ID counter according to the obtained total number of CU data blocks and the starting CU ID;

and creating a lock that protects updates to the CU ID counter, and specifying the sub-thread step size.

Preferably, in the method for creating a column-store table btree index in parallel on an openGauss database, the CU ID counter records the starting CU ID of the next batch of CU data blocks to be processed.

Preferably, in the method for creating a column-store table btree index in parallel on an openGauss database, the sub-threads creating the column-store table index according to the received context object comprises the following steps:

the sub-thread obtains the starting CU ID of the CU data blocks to be processed from the CU ID counter in the context object, and atomically increases the CU ID counter by the sub-thread step size;

determining the CU data blocks to be processed in the current batch according to that starting CU ID and the sub-thread step size;

obtaining the position information of each CU data block by traversing its cudesc record, and reading the CU data blocks according to that position information;

parsing all CU data blocks read by the sub-thread, constructing index tuples, sorting the constructed index tuples, and storing the sub-thread's execution result in the context object;

and continuing to obtain the next batch of CU data blocks to be processed until the sub-thread has finished all its work, then updating its state in the context object and ending the sub-thread.

Preferably, in the method for creating a column-store table btree index in parallel on an openGauss database, the main thread merges and sorts, through the context object, the column-store table index created by each sub-thread to obtain a complete column-store table index. This proceeds as follows:

- the main thread detects through the context object that all sub-threads have finished;
- it obtains each sub-thread's execution result from the context object;
- it merges and sorts the index tuples created by the sub-threads;
- it creates the complete column-store table index by creating the data dictionary information of the index.

According to a second aspect of the present invention, there is provided a system for creating a column-store table btree index in parallel on an openGauss database. The system comprises an index parallel creation server configured to:

- acquire scale information of the column-store table, and set a start threshold and concurrency for parallel creation by sub-threads according to the scale information and the available system resources;
- create a context object through the main thread, and initialize it according to the scale information of the column-store table;
- start a corresponding number of sub-threads according to the concurrency, distribute the initialized context object to the sub-threads, and have the sub-threads create the column-store table index according to the received context object;
- have the main thread merge and sort, through the context object, the index created by each sub-thread to obtain a complete column-store table index.

According to a third aspect of the present invention there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect of the present invention when executing the program.

The method and system for creating a column-store table btree index in parallel on an openGauss database make effective use of the existing multithreading mechanism in openGauss. They reuse existing functional modules such as column-store table data scanning, index tuple construction and sorting, which avoids a large amount of development work. Because data scanning and sorting run in parallel across multiple threads during index creation, the system's IO and CPU resources are fully utilized and index creation performance is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a diagram of a system for the method of creating a column-store table btree index in parallel on an openGauss database according to an embodiment of the present invention;

FIG. 2 is a flowchart of the steps of the method for creating a column-store table btree index in parallel on an openGauss database according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the apparatus provided by the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

It should be noted that, without conflict, the following embodiments and features in the embodiments may be combined with each other; and, based on the embodiments in this disclosure, all other embodiments that may be made by one of ordinary skill in the art without inventive effort are within the scope of the present disclosure.

It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.

The invention aims to provide a method for creating a column-store table btree index in parallel in an openGauss database. The method works as follows:

- it decides dynamically whether to enable parallel creation, and determines the concurrency, according to the data volume of the column-store table;
- it balances the task volume of each sub-thread as far as possible;
- it executes most of the index creation work in parallel across multiple threads.

This fully utilizes the system's CPU and IO resources, improves the efficiency of column-store table index creation, and improves usability.

Creating a btree index in a database generally includes the following steps:

1. scanning the table data blocks;
2. parsing the table data blocks and constructing index tuples;
3. sorting the index tuples;
4. creating the index from the sorted index tuples.

For a column-store table, the table data block corresponds to the CU data block. The whole index creation process for a column-store table is currently executed serially, so the system's CPU and IO resources are not used efficiently.
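The last step above, building the index from already-sorted index tuples, is commonly done bottom-up. The following is a minimal Python sketch of that general bulk-load technique, under several assumptions:

- nodes are in memory rather than on disk pages;
- index tuples are modelled as hypothetical `(key, tid)` pairs;
- keys are unique;
- the fan-out is a hypothetical parameter.

It illustrates the idea only and is not openGauss's btree build code.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

# Hypothetical model: an index tuple is (key, tid), where tid stands in for the
# row locator stored in a real index tuple.
IndexTuple = Tuple[Any, Tuple[int, int]]


@dataclass
class Node:
    keys: List[Any] = field(default_factory=list)          # leaf keys, or separator keys
    items: List[IndexTuple] = field(default_factory=list)  # index tuples (leaves only)
    children: List["Node"] = field(default_factory=list)   # child nodes (internal only)
    is_leaf: bool = True


def first_key(node: Node) -> Any:
    """Smallest key stored under `node`."""
    while not node.is_leaf:
        node = node.children[0]
    return node.keys[0]


def bulk_load(sorted_tuples: List[IndexTuple], fanout: int = 4) -> Node:
    """Build a B+ tree bottom-up from index tuples already sorted by key."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Pack consecutive sorted tuples into leaves of at most `fanout` entries.
    level = [
        Node(keys=[k for k, _ in sorted_tuples[i:i + fanout]],
             items=sorted_tuples[i:i + fanout])
        for i in range(0, len(sorted_tuples), fanout)
    ] or [Node()]
    # Build parent levels until a single root remains; each separator is the
    # smallest key of the child to its right.
    while len(level) > 1:
        level = [
            Node(keys=[first_key(c) for c in level[i + 1:i + fanout]],
                 children=level[i:i + fanout], is_leaf=False)
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def lookup(root: Node, key: Any) -> Optional[Tuple[int, int]]:
    """Return the tid for `key` (keys assumed unique), or None."""
    node = root
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, key)]
    for k, tid in node.items:
        if k == key:
            return tid
    return None
```

Because the input is already sorted, each level is produced in one sequential pass with no node splits. This is why sorting the index tuples first pays off.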

According to the invention, multithreading is adopted in the openGauss database:

- an algorithm distributes the CU data blocks corresponding to the index key to the sub-threads;
- each sub-thread scans its share of the CU data blocks, parses the CUs, builds index tuples and sorts them;
- the main thread then gathers the index tuples built by each sub-thread and completes the index creation.

A context object is also introduced. The main thread initializes it and hands it to the sub-threads. The context object is used to coordinate tasks, to detect the running state of the sub-threads, and to let each sub-thread obtain the CU data block tasks it should scan.

FIG. 1 illustrates an exemplary system suitable for use in the method of creating a columnar-store table btree index in parallel on an openGauss database in accordance with an embodiment of the present application. As shown in fig. 1, the system may include an index parallel creation server 101, a communication network 102, and/or one or more index parallel creation clients 103, which are illustrated in fig. 1 as multiple index parallel creation clients 103.

Index parallel creation server 101 may be any suitable server for storing information, data, programs, and/or any other suitable type of content. In some embodiments, index parallel creation server 101 may perform appropriate functions. For example, in some embodiments, the index parallel creation server 101 may be configured to create a column-store table btree index in parallel on an openGauss database. As an optional example, in some embodiments, the index parallel creation server 101 may be configured to:

- obtain scale information of the column-store table, and set a start threshold and concurrency for parallel creation by sub-threads according to the scale information and the available system resources;
- create a context object through the main thread, and initialize it according to the scale information of the column-store table;
- start a corresponding number of sub-threads according to the concurrency, distribute the initialized context object to the sub-threads, and have the sub-threads create the column-store table index according to the received context object;
- have the main thread merge and sort, through the context object, the index created by each sub-thread to obtain a complete column-store table index.

As another example, in some embodiments, the index parallel creation server 101 may provide the method of creating a column-store table btree index in parallel on the openGauss database to the index parallel creation client 103, at the request of that client, for use by a user.

As an optional example, in some embodiments, the index parallel creating client 103 is configured to provide a visual interface, where the visual interface is configured to receive a selection input operation of creating the column-store table btree index on the openGauss database in parallel by a user, and, in response to the selection input operation, obtain, from the index parallel creating server 101, an interface corresponding to an option selected by the selection input operation and display the interface, where at least information of creating the column-store table btree index on the openGauss database in parallel and an operation option for creating the column-store table btree index on the openGauss database in parallel are displayed in the interface.

In some embodiments, communication network 102 may be any suitable combination of one or more wired and/or wireless networks. For example, the communication network 102 can include any one or more of the following: the internet, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a wireless network, a Digital Subscriber Line (DSL) network, a frame relay network, an Asynchronous Transfer Mode (ATM) network, a Virtual Private Network (VPN), and/or any other suitable communication network. Index parallel creation client 103 is capable of being connected to communication network 102 via one or more communication links (e.g., communication link 104), which communication network 102 is capable of being linked to index parallel creation server 101 via one or more communication links (e.g., communication link 105). The communication link may be any communication link suitable for transferring data between the index parallel creation client 103 and the index parallel creation server 101, such as a network link, a dial-up link, a wireless link, a hardwired link, any other suitable communication link, or any suitable combination of such links.

Index parallel creation client 103 may include any one or more clients that present, in an appropriate form, interfaces associated with creating a columnar-store table btree index in parallel on an openGauss database for use and operation by a user. In some embodiments, index parallel creation client 103 may include any suitable type of device. For example, in some embodiments, index parallel creation client 103 may include a mobile device, a tablet computer, a laptop computer, a desktop computer, and/or any other suitable type of client device.

Although the index parallel creation server 101 is illustrated as one device, in some embodiments any suitable number of devices may be used to perform the functions performed by the index parallel creation server 101. For example, in some embodiments, multiple devices may be used to implement the functions performed by the index parallel creation server 101. Alternatively, the function of index parallel creation server 101 may be implemented using a cloud service.

Based on the above system, the embodiments of the present application provide a method for creating a column-store table btree index in parallel on an openGauss database, which is described in the following embodiments.

Referring to FIG. 2, a flowchart of steps of a method for creating a columnar-store table btree index in parallel on an openGauss database is shown, according to an embodiment of the present invention.

The method for creating the column-store table btree index on the openGauss database in parallel according to the embodiment may be executed at the index parallel creation server, and the method for creating the column-store table btree index on the openGauss database in parallel includes the following steps:

Step S201: acquire the scale information of the column-store table, and set the start threshold and concurrency for parallel creation by sub-threads according to the scale information and the available system resources.

As an optional example, the method of this embodiment proceeds in three stages:

1. It obtains, from the system catalog tables, the total number of CU data blocks, the average CU data block size, and the total amount of CU data in the column corresponding to the column-store table index key.
2. It sets the start threshold for parallel creation by sub-threads according to the total number of CU data blocks and the average CU data block size. The start threshold can be set for the specific application scenario, for example according to the hardware performance of the host system, so that the benefit of parallel creation exceeds the overhead of the concurrent threads.
3. It determines the concurrency for parallel creation from the total amount of CU data, the number of CPUs of the host system, and the supported IO concurrency parameter.

The method of this embodiment also sets a sub-thread step size according to the average CU data block size; the step size equals the number of CU data blocks a sub-thread obtains each time. This keeps the amount of CU data each sub-thread fetches per batch at a suitable size. As a result, the task volume of the sub-threads stays within a balanced range and severe task skew is avoided.
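The patent names the inputs to the start threshold, the concurrency and the step size, but not the formulas. The Python sketch below shows one plausible way to combine those inputs. Every constant and rule in it is a hypothetical tuning choice, including `MIN_PARALLEL_BYTES`, `BYTES_PER_WORKER` and `TARGET_BATCH_BYTES`. They are not openGauss parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class CuStats:
    """Scale information of the index-key column (hypothetical shape of the catalog data)."""
    total_cu_count: int   # total number of CU data blocks
    avg_cu_bytes: int     # average CU data block size in bytes
    total_cu_bytes: int   # total amount of CU data in bytes


# Hypothetical tuning knobs, not openGauss GUC parameters.
MIN_PARALLEL_CUS = 64
MIN_PARALLEL_BYTES = 256 << 20    # 256 MiB
BYTES_PER_WORKER = 128 << 20      # about 128 MiB of CU data per sub-thread
TARGET_BATCH_BYTES = 64 << 20     # about 64 MiB of CU data per claimed batch
MAX_STEP = 64


def should_build_in_parallel(s: CuStats) -> bool:
    """Start threshold: below it, thread start-up and merge costs outweigh the gain."""
    return (s.total_cu_count >= MIN_PARALLEL_CUS
            and s.total_cu_count * s.avg_cu_bytes >= MIN_PARALLEL_BYTES)


def choose_concurrency(s: CuStats, cpu_count: int, io_concurrency: int) -> int:
    """Concurrency bounded by data volume, CPU count and supported IO concurrency."""
    by_volume = math.ceil(s.total_cu_bytes / BYTES_PER_WORKER)
    return max(1, min(by_volume, cpu_count, io_concurrency))


def choose_step(s: CuStats) -> int:
    """Sub-thread step: CU data blocks claimed per batch, chosen so that each batch
    holds a similar amount of data whether the CUs are small or large."""
    if s.avg_cu_bytes <= 0:
        return 1
    return max(1, min(MAX_STEP, TARGET_BATCH_BYTES // s.avg_cu_bytes))
```

In practice `cpu_count` could come from `os.cpu_count()`. Taking the minimum of the three bounds means a scarce resource (CPU or IO) limits the concurrency before the data volume does.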

Step S202: create the context object through the main thread, and initialize it according to the scale information of the column-store table.

As an optional example, the method of this embodiment creates a data structure, namely a context object for parallel index creation. The context object includes task scheduling information and intermediate result information. The task scheduling information includes:

- the total number of CU data blocks in the column corresponding to the column-store table index column;
- the starting CU ID;
- the CU ID counter;
- a lock protecting updates to the CU ID counter;
- the sub-thread step size.

The intermediate result information includes the running state information and execution result information of each sub-thread.

As an optional example, in the method of this embodiment the main thread obtains two values from the catalog table for the column corresponding to the column-store table index key: the total number of CU data blocks and the starting CU ID. It initializes the CU ID counter from them. The CU ID counter records the starting CU ID of the next batch of CU data blocks to be processed. The method also creates a lock that protects updates to the CU ID counter, and specifies the sub-thread step size.
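The context object and its initialization by the main thread can be sketched in Python as follows. The class name, field names and `init_context` are hypothetical. The `cu_ids` argument stands in for the CU IDs the main thread would read from the catalog. The sketch also assumes CU IDs are contiguous, so that the end of the range is the start plus the count.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

IndexTuple = Tuple[Any, Tuple[int, int]]   # (key, tid), simplified


@dataclass
class ParallelIndexBuildContext:
    # ---- task scheduling information ----
    total_cu_count: int     # CU data blocks in the index-key column
    start_cu_id: int        # CU ID of the first CU data block
    next_cu_id: int         # CU ID counter: first CU ID of the next batch
    step: int               # sub-thread step size (CU data blocks per batch)
    lock: Any = field(default_factory=threading.Lock)  # guards next_cu_id
    # ---- intermediate result information ----
    state: Dict[int, str] = field(default_factory=dict)                 # sub-thread id -> "running"/"done"/"failed"
    results: Dict[int, List[IndexTuple]] = field(default_factory=dict)  # sub-thread id -> sorted index tuples

    @property
    def end_cu_id(self) -> int:
        """One past the last CU ID (assumes contiguous CU IDs)."""
        return self.start_cu_id + self.total_cu_count


def init_context(cu_ids: List[int], step: int) -> ParallelIndexBuildContext:
    """Main-thread initialisation; `cu_ids` stands in for the catalog's CU descriptors."""
    if not cu_ids:
        raise ValueError("column has no CU data blocks")
    start = min(cu_ids)
    return ParallelIndexBuildContext(total_cu_count=len(cu_ids),
                                     start_cu_id=start,
                                     next_cu_id=start,   # counter starts at the first CU ID
                                     step=max(1, step))
```

Keeping the lock inside the shared object means a sub-thread needs nothing beyond the context to claim work and to report its state and result.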

As an optional example, in the method of this embodiment, the main thread monitors and coordinates the CU data block tasks handled by the sub-threads through the context object it shares with them. After the sub-threads finish, the main thread completes the final work: creating the index dictionary table, filling in the index dictionary table data, and cleaning up resources.

Step S203: start a corresponding number of sub-threads according to the start threshold and concurrency, distribute the initialized context object to the sub-threads, and have the sub-threads create the column-store table index according to the received context object.

As an optional example, in the method of this embodiment, each sub-thread proceeds as follows:

1. It obtains the starting CU ID of the CU data blocks to be processed from the CU ID counter in the context object, and atomically increases the counter by the sub-thread step size.
2. It determines the CU data blocks of the current batch from that starting CU ID and the step size.
3. It obtains the position information of each CU data block by traversing its cudesc record, and reads the CU data blocks accordingly.
4. It parses all the CU data blocks it has read, constructs index tuples, sorts them, and stores its execution result in the context object.
5. It continues with the next batch of CU data blocks until all its work is done, then updates its state in the context object and exits.
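The Python sketch below illustrates the sub-thread loop and how the main thread starts and joins the sub-threads. `Context`, `worker`, `run_workers` and `read_cu` are hypothetical names; `read_cu` stands in for locating a CU through its cudesc record and decoding its values. The lock-protected read-then-add on `next_cu_id` plays the role of the atomic counter increment. This sketch sorts once after a sub-thread's last batch; sorting per batch and keeping several runs would be an equally valid reading of the text.

```python
import threading
from typing import Any, Callable, Dict, List, Optional, Tuple

IndexTuple = Tuple[Any, Tuple[int, int]]  # (key, tid); tid = (cu_id, row offset), simplified


class Context:
    """Minimal stand-in for the shared context object (hypothetical field names)."""

    def __init__(self, start_cu_id: int, total_cu_count: int, step: int):
        self.end_cu_id = start_cu_id + total_cu_count   # assumes contiguous CU IDs
        self.next_cu_id = start_cu_id                   # CU ID counter
        self.step = max(1, step)                        # sub-thread step size
        self.lock = threading.Lock()                    # protects next_cu_id
        self.state: Dict[int, str] = {}                 # sub-thread id -> running/done/failed
        self.results: Dict[int, List[IndexTuple]] = {}  # sub-thread id -> sorted run

    def claim_batch(self) -> Optional[range]:
        """Read-then-add on the CU ID counter under the lock; None once all CUs are handed out."""
        with self.lock:
            first = self.next_cu_id
            if first >= self.end_cu_id:
                return None
            self.next_cu_id = first + self.step
        return range(first, min(first + self.step, self.end_cu_id))


def worker(wid: int, ctx: Context, read_cu: Callable[[int], List[Any]]) -> None:
    """Sub-thread body: claim batches, decode CUs, build index tuples, sort, publish."""
    ctx.state[wid] = "running"   # each sub-thread writes only its own key
    try:
        tuples: List[IndexTuple] = []
        while (batch := ctx.claim_batch()) is not None:
            for cu_id in batch:
                # read_cu is a hypothetical stand-in for the cudesc lookup, CU read and decode.
                for offset, value in enumerate(read_cu(cu_id)):
                    tuples.append((value, (cu_id, offset)))
        tuples.sort()
        ctx.results[wid] = tuples
        ctx.state[wid] = "done"
    except Exception:
        ctx.state[wid] = "failed"   # the main thread detects this through the context


def run_workers(ctx: Context, n_workers: int, read_cu: Callable[[int], List[Any]]) -> None:
    """Main thread: start the sub-threads and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(i, ctx, read_cu))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because a sub-thread claims a new batch only when it has finished the previous one, a faster sub-thread simply claims more batches. This is how the step size keeps the load balanced without assigning CU ranges up front.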

As an optional example, the method of this embodiment may extend the existing storage engine. The extended engine uses the context object distributed to a sub-thread to obtain the CU data block information that sub-thread must process. It then completes scanning, parsing, index tuple construction and index tuple sorting for the CU data blocks assigned to that sub-thread.

Step S204: the main thread merges and sorts, through the context object, the index created by each sub-thread to obtain a complete column-store table index.

As an optional example, in the method of this embodiment, the main thread first detects through the context object that all sub-threads have finished. It then obtains each sub-thread's execution result from the context object and merges and sorts the index tuples created by the sub-threads. Finally, it creates the complete column-store table index by creating the data dictionary information of the index.
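Each sub-thread's result is already sorted, so the main thread's merge can be a k-way merge rather than a full re-sort. The minimal Python sketch below uses the standard-library `heapq.merge`. The `state`/`results` dictionaries and the `"done"` marker are hypothetical representations of the context object's intermediate result information.

```python
import heapq
from typing import Any, Dict, List, Tuple

IndexTuple = Tuple[Any, Tuple[int, int]]   # (key, tid), simplified


def merge_worker_runs(state: Dict[int, str],
                      results: Dict[int, List[IndexTuple]]) -> List[IndexTuple]:
    """Main thread: once every sub-thread reports "done", k-way merge their sorted runs."""
    unfinished = [wid for wid, s in state.items() if s != "done"]
    if unfinished:
        raise RuntimeError(f"sub-threads not finished or failed: {unfinished}")
    # heapq.merge consumes the already-sorted runs lazily, in O(n log k) for k runs.
    return list(heapq.merge(*(results[wid] for wid in sorted(results))))
```

The merged sequence is in global key order, which is what the final step of building the B+ tree levels and writing the index's data dictionary information consumes.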

The method of this embodiment makes effective use of the multithreading mechanism in the openGauss database. It reuses existing functional modules such as column-store table data scanning, index tuple construction and sorting, which avoids a large amount of development work. Because data scanning and sorting run in parallel across multiple threads during index creation, the system's IO and CPU resources are fully utilized and index creation performance is improved.

As shown in FIG. 3, the present invention also provides an apparatus comprising a processor 310, a communication interface 320, a memory 330 for storing a processor executable computer program, and a communication bus 340. Wherein the processor 310, the communication interface 320 and the memory 330 perform communication with each other through the communication bus 340. The processor 310 implements the method of creating a columnar-store table btree index on an openGauss database in parallel described above by running an executable computer program.

The computer program in the memory 330 may be implemented in the form of software functional units and, when sold or used as a separate product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied as a software product stored in a storage medium. This covers the solution in essence, the part contributing to the prior art, or a part of the solution. The software product includes several instructions that cause a computer device (a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The system embodiments described above are merely illustrative, in which elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected based on actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the various embodiments or methods of some parts of the embodiments.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A method for creating a column-store table btree index in parallel on an openGauss database, the method comprising:

2. The method for creating a column-store table btree index in parallel on an openGauss database according to claim 1, wherein acquiring the scale information of the column-store table, and setting a start threshold and a concurrency for parallel creation by sub-threads according to the scale information and the available system resources, comprises:

3. The method for creating a column-store table btree index in parallel on an openGauss database according to claim 2, wherein acquiring the scale information of the column-store table and setting the start threshold and concurrency further comprises: setting a sub-thread step size according to the obtained average CU data block size, wherein the sub-thread step size equals the number of CU data blocks obtained each time.

4. The method for creating a column-store table btree index in parallel on an openGauss database according to claim 1, wherein the context object includes task scheduling information and intermediate result information, wherein the task scheduling information includes the total number of CU data blocks of the column corresponding to the column-store table index key, the starting CU ID, the CU ID counter, a lock protecting updates to the CU ID counter, and the sub-thread step size, and the intermediate result information includes running state information and execution result information of each sub-thread.

5. The method for creating a column-store table btree index in parallel on an openGauss database according to claim 1, wherein creating a context object through the main thread and initializing the created context object according to the scale information of the column-store table comprises:

6. The method for creating a column-store table btree index in parallel on an openGauss database according to claim 5, wherein the CU ID counter is configured to record the starting CU ID of the next batch of CU data blocks to be processed.

7. The method for creating a column-store table btree index in parallel on an openGauss database according to claim 1, wherein the sub-threads creating the column-store table index according to the received context object comprises:

8. The method for creating a column-store table btree index in parallel on an openGauss database according to claim 1, wherein the main thread merging and sorting, through the context object, the index created by each sub-thread to obtain a complete column-store table index comprises: when the main thread detects through the context object that all sub-threads have finished, obtaining each sub-thread's execution result from the context object, merging and sorting the index tuples created by the sub-threads, and creating the complete column-store table index by creating the data dictionary information of the index.

9. A system for creating a column-store table btree index in parallel on an openGauss database, characterized by comprising an index parallel creation server, wherein the index parallel creation server is configured to: acquire scale information of the column-store table, and set a start threshold and concurrency for parallel creation by sub-threads according to the scale information and the available system resources; create a context object through the main thread, and initialize it according to the scale information of the column-store table; start a corresponding number of sub-threads according to the concurrency, distribute the initialized context object to the sub-threads, and have the sub-threads create the column-store table index according to the received context object; and have the main thread merge and sort, through the context object, the index created by each sub-thread to obtain a complete column-store table index.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1-8 when the program is executed.