CN114817301A - Optimization method, optimization device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114817301A
Authority
CN
China
Prior art keywords
operator
data
sorting
thread
distribution
Prior art date
Legal status
Pending
Application number
CN202210546597.4A
Other languages
Chinese (zh)
Inventor
万伟
朱仲颖
韩朱忠
Current Assignee
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Dameng Database Co Ltd
Priority to CN202210546597.4A
Publication of CN114817301A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose an optimization method, an optimization device, electronic equipment and a storage medium. The method comprises the following steps: in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, judging whether the next-layer operator of the sorting operator has loaded target statistical information, to obtain a first judgment result; and performing, according to the first judgment result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated, where the distribution operator is an operator for distributing data and the summary operator is an operator for summarizing data. Because the optimization is driven by the first judgment result in a multi-thread environment, the added summary and distribution operators assist the sorting operator in performing order-related operations, which optimizes the physical plan tree to be generated and improves data processing efficiency.

Description

Optimization method, optimization device, electronic equipment and storage medium
Technical Field
The embodiments of the invention relate to the technical field of database processing, and in particular to an optimization method, an optimization device, electronic equipment and a storage medium.
Background
Multiple threads may be used to operate on the order of data in a database, for example to sort it. In the usual approach, the data is distributed to multiple threads, each thread sorts its own data in parallel, each thread then sends its sorted data to a main thread, and the main thread merge-sorts all of the data and outputs the result.
However, when the data volume is large, this approach places heavy merge-sort pressure on the main thread, and the large number of sorted runs to be merged may require multiple merge passes, which reduces data processing efficiency.
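For contrast, the conventional scheme described in the background can be sketched in Python as below. This is a minimal illustration, not the document's method: the in-memory integer list, the round-robin split and the function name `baseline_parallel_sort` are assumptions, and Python threads are used only to show the structure (they do not make CPU-bound sorting faster). The point is that the per-thread runs overlap in value, so the main thread must still perform a k-way merge over all of them.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def baseline_parallel_sort(data: list[int], n_threads: int) -> list[int]:
    """Conventional scheme: split, sort per thread, merge on the main thread."""
    # Round-robin split: each share can span the whole value range, so the
    # per-thread sorted runs overlap and still have to be merged afterwards.
    shares = [data[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        runs = list(pool.map(sorted, shares))
    # Single-threaded k-way merge of all runs on the main thread: this is the
    # step whose cost grows with the data volume and the number of runs.
    return list(heapq.merge(*runs))


print(baseline_parallel_sort([5, 3, 9, 1, 7, 2], 3))  # [1, 2, 3, 5, 7, 9]
```

The range-partitioned approach in the embodiments below avoids this final merge by giving each thread a disjoint value range.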
Disclosure of Invention
The embodiment of the invention provides an optimization method, an optimization device, electronic equipment and a storage medium, and aims to improve the efficiency of data processing in a multi-thread environment.
According to an aspect of the present invention, there is provided an optimization method, including:
in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, judging whether the next-layer operator of the sorting operator has loaded target statistical information, to obtain a first judgment result;
performing, according to the first judgment result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated;
wherein the target statistical information is data statistical information of the sort column corresponding to the sorting operator, the distribution operator is an operator for data distribution, and the summary operator is an operator for data summarization.
According to another aspect of the present invention, there is provided an optimization apparatus comprising:
a judging module, configured to, in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, judge whether the next-layer operator of the sorting operator has loaded target statistical information, to obtain a first judgment result;
an optimization module, configured to perform, according to the first judgment result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated;
wherein the target statistical information is data statistical information of the sort column corresponding to the sorting operator, the distribution operator is an operator for data distribution, and the summary operator is an operator for data summarization.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the optimization method according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the optimization method according to any one of the embodiments of the present invention when the computer instructions are executed.
In the technical solution of the embodiments of the invention, in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, it is judged whether the next-layer operator of the sorting operator has loaded target statistical information, to obtain a first judgment result; and an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated is performed according to the first judgment result. The target statistical information is data statistical information of the sort column corresponding to the sorting operator, the distribution operator is an operator for distributing data, and the summary operator is an operator for summarizing data. With this solution, in a multi-thread environment, the summary operator and the distribution operator are added according to the first judgment result and assist the sorting operator in performing order-related operations, which optimizes the physical plan tree to be generated and improves data processing efficiency.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an optimization method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optimization method according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of an unoptimized plan tree according to the second embodiment of the present invention;
Fig. 4 is a schematic diagram of an optimized plan tree according to the second embodiment of the present invention;
Fig. 5 is a schematic diagram of another unoptimized plan tree according to the second embodiment of the present invention;
Fig. 6 is a schematic diagram of another optimized plan tree according to the second embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an optimization apparatus according to a third embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description, the claims and the drawings of the present invention are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can operate in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of an optimization method according to an embodiment of the present invention. This embodiment is applicable to optimizing the processing of data order in a multi-thread environment. The method may be executed by an optimization apparatus, which may be implemented in hardware and/or software and configured in an electronic device; the electronic device in this embodiment includes but is not limited to a server, a computer, a notebook computer, or a tablet computer. As shown in Fig. 1, the method includes:
S110, in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, judging whether the next-layer operator of the sorting operator has loaded target statistical information, to obtain a first judgment result.
In this embodiment, the sorting operator may refer to an operator for performing a sorting operation on data.
When a database is operated on through a structured query statement, an execution plan specifying the operations to be performed is generally generated from the statement. In the database, the execution plan can be implemented as a "binary tree" composed of operators, i.e. an execution plan tree, in which each operator is a plan execution node. The operators in the execution plan tree may be executed from bottom to top, although this is not limited herein. The intermediate plan tree is the execution plan tree preliminarily generated from the structured query statement input by the user; it is subsequently optimized to obtain the final execution plan tree. On this basis, the physical plan tree is the final execution plan tree obtained by optimizing the intermediate plan tree, and the physical plan tree generation stage is the stage in which the physical plan tree is generated by optimizing the intermediate plan tree.
The current environment is the computing environment in which the database is currently processing data. A multi-thread environment is an environment in which the database uses multithreading to perform order-related operations on data.
In a "binary tree" structure, operators are typically connected in a top-bottom manner, so that operators located one level below a current operator may be referred to as next-level operators, and correspondingly, operators located one level above the current operator may be referred to as previous-level operators. On this basis, the operator of the next layer of the sort operator may be understood as the operator of the layer below the sort operator that is connected to the sort operator in the execution plan.
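The tree vocabulary above (plan execution nodes, next-layer and previous-layer operators, bottom-up execution) can be made concrete with a small Python sketch. The class `PlanNode`, its fields and the operator names `PROJECT`, `SORT` and `SCAN` are hypothetical illustrations, not the database's actual structures; bottom-up execution is modelled as a post-order walk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanNode:
    """One operator (plan execution node) in an execution plan tree."""
    op: str
    children: list[PlanNode] = field(default_factory=list)  # next-layer operators
    # Previous-layer operator; excluded from repr/compare to avoid cycles.
    parent: Optional[PlanNode] = field(default=None, repr=False, compare=False)

    def add_child(self, child: PlanNode) -> PlanNode:
        """Attach `child` one layer below this operator and return it."""
        child.parent = self
        self.children.append(child)
        return child


def bottom_up(node: PlanNode) -> list[str]:
    """Post-order walk: every child runs before the operator that consumes it."""
    order: list[str] = []
    for child in node.children:
        order.extend(bottom_up(child))
    order.append(node.op)
    return order


root = PlanNode("PROJECT")
sort = root.add_child(PlanNode("SORT"))
scan = sort.add_child(PlanNode("SCAN"))
print(bottom_up(root))       # ['SCAN', 'SORT', 'PROJECT']
print(sort.children[0].op)   # next-layer operator of SORT: SCAN
print(sort.parent.op)        # previous-layer operator of SORT: PROJECT
```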
The target statistical information is data statistical information of the sort column corresponding to the sorting operator; for example, it may include the number of values in the sort column and the maximum and minimum values of the sort column. The sort column is the column of data on which the sorting operator operates.
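A minimal sketch of the target statistical information as described (count, minimum and maximum of the sort column), assuming an integer sort column. The names `SortColumnStats` and `collect_stats` are hypothetical, and the statistics are computed directly from column data only for illustration; an optimizer would normally load previously gathered statistics. Returning `None` stands in for "the next-layer operator has not loaded the target statistical information".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SortColumnStats:
    """Target statistical information for the sort column."""
    count: int       # number of values in the sort column
    min_value: int   # smallest value in the sort column
    max_value: int   # largest value in the sort column


def collect_stats(column: Sequence[int]) -> Optional[SortColumnStats]:
    """Compute the statistics from column data; None models 'not loaded'."""
    if not column:
        return None
    return SortColumnStats(len(column), min(column), max(column))


print(collect_stats([42, 7, 19, 88]))  # SortColumnStats(count=4, min_value=7, max_value=88)
print(collect_stats([]))               # None
```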
In this embodiment, in the generation stage of the physical plan tree containing the sorting operator, if the current environment is a multi-thread environment, it is judged whether the next-layer operator of the sorting operator has loaded the target statistical information, to obtain a first judgment result.
The present embodiment does not specifically limit how to determine whether the current environment is a multi-thread environment, for example, whether the current environment is a multi-thread environment may be determined according to the relevant configuration information of the database.
The first judgment result indicates whether the next-layer operator of the sorting operator has loaded the target statistical information: either the next-layer operator has loaded it, or it has not.
S120, performing, according to the first judgment result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated.
In this embodiment, the physical plan tree to be generated is the physical plan tree that is currently being generated. A distribution operator is an operator for data distribution and may be denoted as a DIS operator. A summary operator is an operator for data summarization and may be denoted as a GAT operator.
The optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated is performed according to the first judgment result; the optimization operation refines the physical plan tree to be generated into the final physical plan tree. If the first judgment result is that the target statistical information has been loaded, optimization is performed: for example, a summary operator may be inserted between the sorting operator and the operator in the layer above it, and a distribution operator may be inserted between the sorting operator and the operator in the layer below it, so that the summary and distribution operators assist the sorting operator in processing the data. If the first judgment result is that the target statistical information has not been loaded, no optimization is performed, because without the corresponding target statistical information the optimization would degrade the performance of the physical plan tree to be generated.
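The basic rewrite can be sketched as a tree transformation in Python: `parent -> SORT -> child` becomes `parent -> GAT -> SORT -> DIS -> child`, and the tree is left unchanged when the child has not loaded the statistics. The `Node` class, its `stats_loaded` and `flag` fields, and the flag values `"first"`/`"second"` (standing for the first and second flags described later) are hypothetical; this is an illustrative sketch, not the patented implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison: two SCAN nodes are different operators
class Node:
    op: str
    children: list[Node] = field(default_factory=list)
    stats_loaded: bool = False   # hypothetical: operator has loaded target statistics
    flag: Optional[str] = None   # hypothetical execution-stage marker


def shape(n: Node) -> str:
    """Render a subtree as OP(child,...) for inspection."""
    if not n.children:
        return n.op
    return f"{n.op}({','.join(shape(c) for c in n.children)})"


def add_gat_dis(parent: Node, sort: Node) -> bool:
    """Rewrite parent -> SORT -> child into parent -> GAT -> SORT -> DIS -> child."""
    child = sort.children[0]            # a sorting operator has a single input
    if not child.stats_loaded:
        return False                    # no target statistics: leave the tree as is
    sort.children[0] = Node("DIS", [child], flag="first")
    i = parent.children.index(sort)     # uses identity because eq=False
    parent.children[i] = Node("GAT", [sort], flag="second")
    return True


scan = Node("SCAN", stats_loaded=True)
sort = Node("SORT", [scan])
top = Node("PROJECT", [sort])
print(add_gat_dis(top, sort), shape(top))  # True PROJECT(GAT(SORT(DIS(SCAN))))
```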
In this embodiment, different optimization operations may be performed depending on whether the operator in the layer above the sorting operator is a merge join operator. A merge join operator performs a merge join on data; it also processes ordered data sequences, and its processing can likewise be assisted by a summary operator and a distribution operator. Therefore, if the operator above the sorting operator is a merge join operator, the two operators may, under certain conditions, share one summary operator and one distribution operator.
In one embodiment, the operator in the layer above the sorting operator is not a merge join operator. In this case, the optimization operation may be performed according to the first judgment result of the sorting operator as follows: if the first judgment result is that the target statistical information has not been loaded, the optimization is exited; if the first judgment result is that the target statistical information has been loaded, then, because the operator above the sorting operator is not a merge join operator, a summary operator may be inserted directly between the sorting operator and the operator above it, and a distribution operator may be inserted between the sorting operator and the operator below it.
In another embodiment, the operator in the layer above the sorting operator is a merge join operator. In this case, the optimization operation may be performed according to the first judgment result of the sorting operator as follows: if the first judgment result is that the target statistical information has not been loaded, the optimization is exited.
If the first judgment result is that the target statistical information has been loaded, a distribution operator may be inserted between the sorting operator and the operator in the layer below it. A merge join operator usually has two child nodes. If the two child nodes of the merge join operator are the sorting operator and another sorting operator, then: when the next-layer operator of the other sorting operator has also loaded its corresponding target statistical information, the other sorting operator is optimized as well, and a summary operator may be inserted between the merge join operator and the operator in the layer above the merge join operator; when the next-layer operator of the other sorting operator has not loaded its corresponding target statistical information, which indicates that the other sorting operator is not suitable for optimization, a summary operator may be inserted between the sorting operator and the operator above it (that is, the merge join operator).
If the two child nodes of the merge join operator are the sorting operator and a non-sorting operator, a summary operator may be inserted directly between the merge join operator and the operator in the layer above it.
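The merge join case can be sketched as follows. Everything here is a hypothetical illustration: the `Node` class and `stats_loaded` field are assumptions, and the phrase "the other sorting operator is optimized as well" is read as inserting a distribution operator under that sorting operator too, which is an interpretation rather than something the text spells out. The three branches correspond to the three cases described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    op: str
    children: list[Node] = field(default_factory=list)
    stats_loaded: bool = False   # hypothetical: operator has loaded target statistics


def shape(n: Node) -> str:
    """Render a subtree as OP(child,...) for inspection."""
    if not n.children:
        return n.op
    return f"{n.op}({','.join(shape(c) for c in n.children)})"


def wrap(parent: Node, node: Node, op: str) -> None:
    """Insert a new `op` operator between `parent` and its child `node`."""
    i = next(k for k, c in enumerate(parent.children) if c is node)
    parent.children[i] = Node(op, [node])


def optimize_under_merge_join(top: Node, mj: Node, sort: Node) -> bool:
    """`top` is the operator above the merge join `mj`; `sort` is one of its children."""
    if not sort.children[0].stats_loaded:
        return False                          # no statistics: exit optimization
    wrap(sort, sort.children[0], "DIS")       # DIS below this sorting operator
    other = next(c for c in mj.children if c is not sort)
    if other.op == "SORT":
        if other.children[0].stats_loaded:
            wrap(other, other.children[0], "DIS")  # assumed: other sort optimized too
            wrap(top, mj, "GAT")                   # one shared GAT above the join
        else:
            wrap(mj, sort, "GAT")                  # GAT only above this sort
    else:
        wrap(top, mj, "GAT")                       # non-sort sibling: GAT above join
    return True


left = Node("SORT", [Node("SCAN", stats_loaded=True)])
right = Node("SORT", [Node("SCAN", stats_loaded=True)])
mj = Node("MERGE_JOIN", [left, right])
top = Node("PROJECT", [mj])
optimize_under_merge_join(top, mj, left)
print(shape(top))  # PROJECT(GAT(MERGE_JOIN(SORT(DIS(SCAN)),SORT(DIS(SCAN)))))
```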
Optionally, the method further includes: in the physical plan tree generation stage, setting a first flag for the distribution operator and a second flag for the summary operator; and
in the statement execution stage, executing the distribution operator carrying the first flag as follows: according to the processing range corresponding to each thread, each thread retains the first data that falls within its own processing range and distributes the second data that falls outside it to a target thread, where the target thread is the thread whose processing range contains that second data; the first result obtained by executing the distribution operator is sent to the operator in the layer above the distribution operator; and executing the summary operator carrying the second flag as follows: receiving and outputting the data of each thread in ascending order of thread number, and sending the second result obtained by executing the summary operator to the operator in the layer above the summary operator.
In the physical plan tree generation stage, after the distribution operator and the summary operator are added to the physical plan tree to be generated, a first flag may be set for the distribution operator and a second flag for the summary operator. The first flag indicates that the distribution operator is to be executed in the distribution-optimized mode, and the second flag indicates that the summary operator is to be executed in the summary-optimized mode. In the subsequent statement execution stage, these flags are used to execute the distribution operator and the summary operator in the corresponding optimized modes.
In the statement execution stage, the distribution operator carrying the first flag (i.e., in the distribution-optimized mode) may be executed as follows: according to the processing range corresponding to each thread, each thread retains the first data that belongs to its own processing range and distributes the second data that does not belong to it to the target thread; the first result obtained by executing the distribution operator is sent to the operator above the distribution operator, which carries out the subsequent data processing.
The statement execution stage can be understood as a stage of executing the structured query statement based on the physical plan tree. The target thread may be understood as a thread corresponding to a processing scope to which the second data belongs. The processing scope may be understood as the range of data that a thread may use for data processing. For each thread, the first data can be understood as data belonging to the corresponding processing range of the thread; accordingly, the second data may be understood as data that does not fall within the processing range of the thread. The first result may be understood as a data result after executing the distribution operator.
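The distribution step can be simulated sequentially in Python. This sketch assumes contiguous, ordered integer ranges given by their inclusive upper bounds (for example `[30, 60, 100]` for 1-30, 31-60 and 61-100) and assumes every value lies within the overall range derived from the statistics; the names `owner` and `distribute` are hypothetical, and real threads and message passing are replaced by per-thread buffers.

```python
from bisect import bisect_left


def owner(value: int, upper_bounds: list[int]) -> int:
    """Thread number whose processing range contains `value`.

    upper_bounds[i] is the inclusive upper bound of thread i's range.
    """
    return bisect_left(upper_bounds, value)


def distribute(local_data: list[list[int]], upper_bounds: list[int]) -> list[list[int]]:
    """Simulate DIS: each thread keeps in-range values and ships the rest."""
    inbox: list[list[int]] = [[] for _ in upper_bounds]
    for tid, rows in enumerate(local_data):
        for v in rows:
            target = owner(v, upper_bounds)
            # target == tid: "first data", retained by this thread;
            # otherwise "second data", sent to the target thread.
            inbox[target].append(v)
    return inbox


before = [[5, 70, 40], [90, 10], [33, 61]]   # what each thread read initially
print(distribute(before, [30, 60, 100]))      # [[5, 10], [40, 33], [70, 90, 61]]
```

After this step each thread holds only values from its own range, so each can sort locally without any later cross-thread merge.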
Optionally, before setting the first flag for the distribution operator, the method further includes: determining the number of threads and the processing range corresponding to each thread according to the target statistical information and computing resource information, where different threads have different processing ranges.
The computing resource information is information about the resources available for processing data; for example, it may include CPU performance information, memory size information, and I/O port performance information. How the number of threads and the processing range of each thread are determined from the target statistical information and the computing resource information is not limited herein. For example, if the target statistical information shows a large amount of data and the computing resource information shows abundant resources, a larger number of threads may be used; an overall processing range is then determined from the maximum and minimum values in the target statistical information, and a sub-range of it is allocated to each thread as that thread's processing range. The sub-ranges are not specifically limited and can be set flexibly according to actual requirements.
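Since the description leaves the policy open, the following Python sketch shows just one possible choice, under stated assumptions: an integer sort column, a thread count of roughly one thread per `rows_per_thread` rows capped by a CPU count, and an even split of `[min, max]` into contiguous sub-ranges. The function name `plan_threads`, the parameter `rows_per_thread` and its default are hypothetical, and the description's own example uses uneven ranges, which would be equally valid.

```python
import math


def plan_threads(count: int, min_value: int, max_value: int,
                 cpu_count: int, rows_per_thread: int = 100_000) -> list[tuple[int, int]]:
    """Return one inclusive (low, high) processing range per thread.

    Hypothetical policy: about one thread per `rows_per_thread` rows, capped by
    the CPU count and by the number of distinct integer keys in [min, max].
    """
    span = max_value - min_value + 1
    n = max(1, min(cpu_count, math.ceil(count / rows_per_thread), span))
    ranges = []
    for i in range(n):
        low = min_value + i * span // n            # contiguous, non-overlapping
        high = min_value + (i + 1) * span // n - 1
        ranges.append((low, high))
    return ranges


print(plan_threads(100, 1, 100, cpu_count=4, rows_per_thread=30))
# [(1, 25), (26, 50), (51, 75), (76, 100)]
```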
Optionally, after the number of threads and the processing range of each thread are determined according to the target statistical information and the computing resource information, the method further includes: controlling each thread to read the same amount of data to be processed from memory, according to the ratio of the total amount of data to be processed to the number of threads, where each thread reads different data; the data to be processed is the data that the sorting operator needs to process.
After the number of threads and the processing range of each thread are determined, each thread may be controlled to read the same amount of data to be processed (the average, i.e. the total amount of data to be processed divided by the number of threads) from the memory storing that data. This keeps the reading time of all threads approximately equal and prevents one thread from taking longer than the others because it reads more data. Each thread reads different data, so that no data is read by two threads and no data to be processed is missed.
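The equal-share read can be expressed as splitting row positions into disjoint, near-equal slices. This Python sketch assumes rows are addressable by position; the function name `read_slices` is hypothetical, and so is the remainder rule (when the total does not divide evenly, the first `total % n_threads` threads read one extra row), since the description only states the ratio.

```python
def read_slices(total: int, n_threads: int) -> list[range]:
    """Split row positions 0..total-1 into n disjoint, near-equal slices."""
    base, extra = divmod(total, n_threads)
    slices, start = [], 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)   # assumed remainder handling
        slices.append(range(start, start + size))
        start += size
    return slices


print([list(r) for r in read_slices(10, 3)])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The slices never overlap and together cover every row, which matches the requirement that no data is read twice or missed.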
In the statement execution stage, the summary operator carrying the second flag (i.e., in the summary-optimized mode) may be executed as follows: receiving and outputting the data of each thread in ascending order of thread number, and sending the second result obtained by executing the summary operator to the operator in the layer above the summary operator.
The thread number is the number assigned to each thread, such as thread 0, thread 1, and thread 2. To ensure that the data the summary operator receives in ascending order of thread number is ordered, the threads may be numbered in the order of their processing ranges. For example, assume that the data to be processed consists of 100 distinct values in the range 1-100 and that 3 threads are determined, with processing ranges 1-30, 31-60, and 61-100. The threads may then be numbered by processing range: the thread with the lowest range (1-30) is thread 0, the thread with range 31-60 is thread 1, and so on, with the thread with the highest range (61-100) numbered thread 2.
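The worked example can be checked with a short Python simulation. The function name `gather` is hypothetical and threads are simulated sequentially. Because thread 0's range (1-30) lies entirely below thread 1's (31-60), which lies below thread 2's (61-100), appending each thread's locally sorted output in thread-number order already yields a globally sorted result, with no merge needed.

```python
def gather(per_thread: list[list[int]]) -> list[int]:
    """Simulate GAT: emit each thread's sorted output in ascending thread number."""
    out: list[int] = []
    for tid in range(len(per_thread)):   # thread 0, then 1, then 2, ...
        out.extend(per_thread[tid])
    return out


# Worked example: 100 distinct values 1-100 over three threads
ranges = [(1, 30), (31, 60), (61, 100)]           # thread 0, 1, 2
data = [(v * 37) % 100 + 1 for v in range(100)]   # 1..100 in scrambled order
per_thread = [sorted(v for v in data if lo <= v <= hi) for lo, hi in ranges]
result = gather(per_thread)
print(result == list(range(1, 101)))  # True
```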
In this embodiment, in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, it is judged whether the next-layer operator of the sorting operator has loaded target statistical information, to obtain a first judgment result; and an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated is performed according to the first judgment result. The target statistical information is data statistical information of the sort column corresponding to the sorting operator, the distribution operator is an operator for distributing data, and the summary operator is an operator for summarizing data. With this method, in a multi-thread environment, the summary operator and the distribution operator are added according to the first judgment result and assist the sorting operator in performing order-related operations, which optimizes the physical plan tree to be generated and improves data processing efficiency.
Example two
Fig. 2 is a flowchart of an optimization method according to the second embodiment of the present invention, which refines the above embodiment. This embodiment specifically describes the process of performing, according to the first determination result, the optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated. As shown in Fig. 2, the method includes:
S210, in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, determining whether the target statistical information can be loaded through the next-layer operator of the sorting operator, to obtain a first determination result.
S220, determining whether the upper-layer operator of the sorting operator is a merge join operator; if not, executing S230; otherwise, executing S250.
In this embodiment, if the upper-layer operator of the sorting operator is not a merge join operator, S230 is executed; if it is a merge join operator, S250 is executed.
S230, determining whether the first determination result indicates that the target statistical information is loaded; if so, executing S240; otherwise, executing S290.
In this embodiment, when the upper-layer operator of the sorting operator is not a merge join operator, it is determined whether the first determination result of the sorting operator indicates that the target statistical information is loaded; if so, S240 is executed; if not, S290 is executed.
S240, inserting a distribution operator between the sorting operator and the operator at the next layer of the sorting operator, and inserting a summary operator between the sorting operator and the operator at the upper layer of the sorting operator.
In this embodiment, if the upper-layer operator of the sorting operator is not a merge join operator and the first determination result of the sorting operator indicates that the target statistical information is loaded, a distribution operator is inserted between the sorting operator and its next-layer operator, and a summary operator is inserted between the sorting operator and its upper-layer operator, to perform the corresponding optimization.
S250, determining whether the first determination result indicates that the target statistical information is loaded; if so, executing S260; otherwise, executing S290.
In this embodiment, if the upper-layer operator of the sorting operator is a merge join operator, it is determined whether the first determination result of the sorting operator indicates that the target statistical information is loaded; if so, S260 is executed; otherwise, S290 is executed.
S260, determining whether both child nodes of the merge join operator are sorting operators, that is, the current sorting operator and another sorting operator; if so, executing S270; otherwise, executing S280.
In this embodiment, if both child nodes of the merge join operator are sorting operators (the current sorting operator and another sorting operator), S270 is executed; if not, the two child nodes are the current sorting operator and a non-sorting operator, and S280 is executed.
S270, inserting a distribution operator between the sorting operator and its next-layer operator, determining whether the target statistical information can be loaded through the next-layer operator of the other sorting operator to obtain a second determination result, and performing, according to the second determination result, the optimization operation of adding a summary operator to the physical plan tree to be generated.
In this embodiment, the second determination result indicates whether the corresponding target statistical information can be loaded through the next-layer operator of the other sorting operator.
When the two child nodes are the sorting operator and another sorting operator, a distribution operator may be inserted between the sorting operator and its next-layer operator, and, on that basis, the optimization operation of adding a summary operator to the physical plan tree to be generated may be performed according to the second determination result.
Optionally, performing, according to the second determination result, the optimization operation of adding a summary operator to the physical plan tree to be generated includes: if the second determination result indicates that the corresponding target statistical information is loaded through the next-layer operator of the other sorting operator, inserting a summary operator between the merge join operator and its upper-layer operator; and if the second determination result indicates that the corresponding target statistical information is not loaded through the next-layer operator of the other sorting operator, inserting a summary operator between the sorting operator and its upper-layer operator.
S280, inserting a summary operator between the merge join operator and its upper-layer operator, and inserting a distribution operator between the sorting operator and its next-layer operator.
In this embodiment, if the two child nodes of the merge join operator are the sorting operator and a non-sorting operator, a summary operator is inserted between the merge join operator and its upper-layer operator, and a distribution operator is inserted between the sorting operator and its next-layer operator, to perform the corresponding optimization.
S290, exiting the optimization.
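The decision flow S210-S290 can be sketched as a small plan-tree rewrite. This is a hypothetical illustration, not the patented implementation: the `Op` node class, its `has_stats` field (standing in for "the target statistical information can be loaded through this operator"), the `ROOT` placeholder for the operator above the subtree, and the functions `insert_between`, `optimize_sort`, and `render` are all assumptions. Operator names follow the figures (SORT, MI, CSCN, DIS, GAT). Following step 5 of the step list and Fig. 6, the sketch also inserts a DIS operator under the other sorting operator when its statistics can be loaded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity: two SORT nodes may look identical
class Op:
    kind: str                                        # "SORT", "MI", "CSCN", "DIS", "GAT", ...
    children: List["Op"] = field(default_factory=list)
    has_stats: bool = False                          # target statistics loadable through this node?


def insert_between(parent: Op, child: Op, kind: str) -> Op:
    """Insert a new operator of type `kind` between `parent` and its child `child`."""
    new = Op(kind, [child])
    idx = next(i for i, c in enumerate(parent.children) if c is child)
    parent.children[idx] = new
    return new


def optimize_sort(grandparent: Optional[Op], parent: Op, sort: Op,
                  multi_thread: bool = True) -> bool:
    """Apply S210-S290 to one SORT node; return True if the tree was changed."""
    if not multi_thread:
        return False
    child = sort.children[0]
    if not child.has_stats:                   # first result: not loaded -> S290
        return False
    if parent.kind != "MI":                   # S220 "no" -> S240
        insert_between(sort, child, "DIS")
        insert_between(parent, sort, "GAT")
        return True
    if grandparent is None:
        raise ValueError("this sketch expects an operator above the merge join")
    insert_between(sort, child, "DIS")        # S270 and S280 both add this DIS
    other = next(c for c in parent.children if c is not sort)
    if other.kind == "SORT":                  # S270: second judgment
        other_child = other.children[0]
        if other_child.has_stats:
            insert_between(other, other_child, "DIS")  # per step 5 and Fig. 6
            insert_between(grandparent, parent, "GAT")
        else:
            insert_between(parent, sort, "GAT")
    else:                                     # S280: the other child is not a SORT
        insert_between(grandparent, parent, "GAT")
    return True


def render(op: Op, depth: int = 0) -> List[str]:
    """Indented text view of a plan tree."""
    lines = ["  " * depth + op.kind]
    for c in op.children:
        lines += render(c, depth + 1)
    return lines


# Fig. 3 -> Fig. 4: SORT over CSCN under a placeholder ROOT operator.
sort = Op("SORT", [Op("CSCN", has_stats=True)])
root = Op("ROOT", [sort])
optimize_sort(None, root, sort)
print("\n".join(render(root)))
# ROOT
#   GAT
#     SORT
#       DIS
#         CSCN
```

For a merge join whose two children are SORT operators over CSCN operators with loadable statistics (Fig. 5), calling `optimize_sort(top, mi, left_sort)` in this sketch produces the Fig. 6 shape: one GAT above the MI operator and a DIS under each SORT.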
The second embodiment provides an optimization method that specifies how the optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated is performed according to the first determination result. Performing this operation improves data processing efficiency; in addition, when the upper-layer operator of the sorting operator is a merge join operator, the corresponding checks on the merge join operator allow the sorting operator and the merge join operator to be optimized differently, which makes the optimization more flexible.
The present invention is exemplified below.
A common way to sort data with multiple threads is as follows: the data is distributed to several threads, each thread sorts its own data in parallel, each thread then sends its data to the main thread, and the main thread merge-sorts all of the data and outputs the result.
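For comparison, the conventional approach can be sketched as follows: a hypothetical `conventional_parallel_sort` splits the input round-robin, sorts each chunk on a worker thread, and leaves a k-way merge (`heapq.merge`) to the main thread. In CPython the global interpreter lock keeps the `sorted` calls from running truly in parallel, so the sketch shows the data flow rather than a real speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List


def conventional_parallel_sort(data: List[int], n_threads: int) -> List[int]:
    # Distribute the data to the threads (round-robin here; any split works).
    chunks = [data[i::n_threads] for i in range(n_threads)]
    # Each worker thread sorts its own chunk.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # The main thread must k-way merge every sorted chunk to obtain the final order.
    return list(heapq.merge(*sorted_chunks))


print(conventional_parallel_sort([9, 2, 7, 4, 1, 8, 3, 6, 5], n_threads=3))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The merge step is where the main thread's load grows with the number of threads, which is the problem the next paragraph describes.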
However, this approach places a heavy merge load on the main thread, and when the number of merge ways is too large, the merge may require multiple passes. Moreover, in a database system, characteristics of the data (that is, the target statistical information), such as the maximum and minimum values, the total number of rows, and the total data size, can be known in advance from statistics. The multi-threaded sorting procedure can therefore be improved according to these data characteristics, and the present invention provides the following improved method:
Each thread reads an equal share of the data at random from disk or memory according to the total number of rows; the data read by different threads does not overlap. The data range that each thread must process is calculated and set; each thread keeps the data that falls within its own processing range and distributes data outside that range to the designated thread. After data distribution is complete, each thread sorts its data and sends it back to the main thread, and the main thread outputs the data in thread-number order without performing any merge sort. Because the main thread outputs data thread by thread, it can start outputting the data of earlier threads while threads that have not yet finished sorting continue to sort; output and sorting therefore overlap, achieving parallelism and making full use of machine resources. The specific data allocation may be determined by combining the performance of the Central Processing Unit (CPU), the size of the disk or memory, the Input/Output (I/O) performance, and the like, which is not described further here.
For example, assume the database contains the 100 distinct numbers 1-100 in random order, and four threads are allocated to sort them according to the data size. First, the 4 threads each read 25 numbers at random from disk or memory, and the calculated data processing ranges of the threads are 1-10, 11-30, 31-60, and 61-100. The 4 threads are numbered in order of their processing ranges: the thread for 1-10 is thread 0, the thread for 11-30 is thread 1, the thread for 31-60 is thread 2, and the thread for 61-100 is thread 3. Each thread distributes the 25 numbers it read according to the data processing ranges, sorts its own data once distribution is complete, and sends the data to the main thread for output as soon as sorting is complete. The main thread outputs the data in thread-number order (0, 1, 2, 3).
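The worked example can be reproduced with a short Python sketch, assuming contiguous (rather than random) 25-row read shares and inclusive integer ranges listed in thread-number order; `range_partitioned_sort` is a hypothetical name. Routing uses a binary search over the ranges' upper bounds, and the final step simply concatenates the per-thread results in thread-number order instead of merging them.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive bounds; ranges listed in thread-number order


def range_partitioned_sort(data: List[int], ranges: List[Range]) -> List[int]:
    n = len(ranges)
    uppers = [high for _, high in ranges]
    share = -(-len(data) // n)  # ceil(len(data) / n) rows read per thread
    # Each thread reads a disjoint, equal-sized share of the input.
    shares = [data[t * share:(t + 1) * share] for t in range(n)]
    # Distribution: route every value to the thread whose range contains it.
    partitions: List[List[int]] = [[] for _ in range(n)]
    for rows in shares:
        for value in rows:
            partitions[bisect_left(uppers, value)].append(value)
    # Each thread sorts only the values in its own range.
    with ThreadPoolExecutor(max_workers=n) as pool:
        sorted_parts = list(pool.map(sorted, partitions))
    # Gathering: concatenate in thread-number order; no merge sort is needed.
    return [value for part in sorted_parts for value in part]


data = [(i * 37) % 100 + 1 for i in range(100)]  # the numbers 1-100 in scrambled order
ranges = [(1, 10), (11, 30), (31, 60), (61, 100)]  # threads 0-3, as in the example
print(range_partitioned_sort(data, ranges) == list(range(1, 101)))
# True
```

For brevity the distribution loop runs on the calling thread; in the described method each thread would route its own share concurrently.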
It should be noted that the algorithmic idea of the present invention can be used not only in a multi-thread environment but also in multi-site cluster environments, that is, in any non-single-thread environment.
In addition, the algorithmic idea is not limited to sorting: any operation that depends on data order can achieve better performance with it. For example, a merge join requires the data on both sides of the join to be ordered; in a multi-thread or multi-process cluster environment, the data can be distributed to different threads or processes according to its characteristics, each thread performs its own merge join, and the results are output in order once the merge join operations are complete.
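A minimal sketch of the range-partitioned merge join, assuming integer join keys, `(key, payload)` rows, and an inner join; `merge_join` and `partitioned_merge_join` are hypothetical names. Because both inputs are split with the same range boundaries, rows with equal keys always land in the same partition, so each partition can be joined independently and the partition results concatenated in order.

```python
from bisect import bisect_left
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Inner merge join of two key-sorted row lists."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the runs of equal keys on both sides and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lpay in left[i:i_end]:
                for _, rpay in right[j:j_end]:
                    out.append((lk, lpay, rpay))
            i, j = i_end, j_end
    return out


def partitioned_merge_join(left: List[Row], right: List[Row], uppers: List[int]):
    """Split both inputs by the same range upper bounds, join each partition, concatenate."""
    n = len(uppers)
    lparts: List[List[Row]] = [[] for _ in range(n)]
    rparts: List[List[Row]] = [[] for _ in range(n)]
    for row in left:
        lparts[bisect_left(uppers, row[0])].append(row)
    for row in right:
        rparts[bisect_left(uppers, row[0])].append(row)
    out = []
    for lp, rp in zip(lparts, rparts):  # each pair is independent (one thread each)
        out.extend(merge_join(sorted(lp), sorted(rp)))
    return out


left = [(5, "a"), (3, "b"), (12, "c"), (40, "d"), (3, "e")]
right = [(3, "x"), (40, "y"), (7, "z"), (12, "w")]
print(partitioned_merge_join(left, right, uppers=[10, 30, 60, 100]))
# [(3, 'b', 'x'), (3, 'e', 'x'), (12, 'c', 'w'), (40, 'd', 'y')]
```

The partitions are joined sequentially here for brevity; each (left, right) partition pair could be handed to its own thread or process.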
The general procedure by which a database processes Structured Query Language (SQL) includes lexical analysis, syntactic analysis, semantic analysis, generation of an intermediate plan tree, generation of a physical plan tree, statement execution, and so on. The plan tree may contain operator nodes of many different functional types.
The optimization method provided by the invention mainly comprises the following steps:
1. in the physical plan tree generation stage, performing step 2 on sorting operators, while operators of other types are not processed;
2. judging whether the current environment is a multi-thread environment or a multi-site cluster environment, if so, performing the step 3, and if not, exiting the optimization;
3. judging whether the lower node (that is, the next-layer operator) of the sorting operator can load valid statistics for the sort column (that is, the target statistical information), from which the data characteristics can be obtained (the data characteristics being information such as the maximum and minimum values and the number of rows derived from the statistics); if so, performing step 4; otherwise, exiting the optimization (if the data characteristics cannot be obtained accurately, performance after optimization is likely to degrade, so this case is not optimized);
4. for the sorting operator, generating a communication operator (such as a DIS operator) below it to distribute data; setting the optimization flag F1 to TRUE, indicating that data needs to be distributed according to the data processing ranges; calculating and setting the number of threads of the communication operator and the data processing range of each thread according to the data characteristics and machine performance (such as the CPU performance, disk/memory size, and I/O performance of the machine); and performing step 5;
5. judging whether a merge join operator exists above the sorting operator (that is, whether the sorted data needs to be merge-joined); if not, generating a communication operator (such as a GAT operator) above the sorting operator to gather the data; if so, judging whether the other child node of the merge join operator is a sorting operator: if not, generating a communication operator (such as a GAT operator) above the merge join operator to gather the data; if so, performing the judgment of step 3 on the other sorting operator, and if valid statistics can be loaded, generating a communication operator (such as a GAT operator) above the merge join operator to gather the data, otherwise generating a communication operator (such as a GAT operator) above the current sorting operator to gather the data; and setting the optimization flag F2 to TRUE, indicating that during data gathering the data needs to be received and output in thread-number order;
6. in the statement execution phase, for a communication operator used to distribute data (such as a DIS operator), if the optimization flag F1 is TRUE, each thread, according to the data processing range of each thread, sends the data within its own processing range to the upper-layer operator (which sorts the data) and sends data outside its own processing range to the corresponding thread;
7. in the statement execution phase, for a communication operator used to gather data (such as a GAT operator), if the optimization flag F2 is TRUE, first receiving the data sent in batches by thread 0 (after a thread completes data sorting or merge join, it can send the processed data to the main thread in batches) and returning each batch to the upper-layer operator node as it is processed, until all data sent by thread 0 has been received; then receiving and processing, batch by batch, the data sent by thread 1, and so on, until the data of all threads has been received and processed.
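Steps 6 and 7 can be sketched with Python threads and queues, assuming both flags F1 and F2 are TRUE. Everything here is hypothetical (`worker`, `gather`, `parallel_sort`, the `DONE` marker, and `batch_size`): each worker thread routes out-of-range values to the owning thread's inbox (the DIS behaviour), waits on a barrier so that all routing is finished, sorts what it owns, and emits sorted batches; the `gather` generator plays the GAT role by draining thread 0's batches completely before moving on to thread 1, and so on. Because `gather` starts yielding thread 0's batches while other threads may still be sorting, output and sorting overlap.

```python
import queue
import threading
from bisect import bisect_left
from typing import Iterator, List

DONE = object()  # end-of-stream marker a worker sends after its last batch


def worker(tid: int, rows: List[int], uppers: List[int], inboxes: List[queue.Queue],
           barrier: threading.Barrier, outbox: queue.Queue, batch_size: int) -> None:
    kept = []
    for value in rows:  # DIS with F1 = TRUE: keep own-range values, route the rest
        target = bisect_left(uppers, value)
        if target == tid:
            kept.append(value)
        else:
            inboxes[target].put(value)
    barrier.wait()  # every thread has finished routing, so inboxes are complete
    while True:
        try:
            kept.append(inboxes[tid].get_nowait())
        except queue.Empty:
            break
    kept.sort()  # the upper-layer SORT operator
    for start in range(0, len(kept), batch_size):
        outbox.put(kept[start:start + batch_size])  # send sorted data in batches
    outbox.put(DONE)


def gather(outboxes: List[queue.Queue]) -> Iterator[int]:
    """GAT with F2 = TRUE: drain thread 0 completely, then thread 1, and so on."""
    for outbox in outboxes:
        while (batch := outbox.get()) is not DONE:
            yield from batch  # hand each batch to the upper-layer operator


def parallel_sort(data: List[int], uppers: List[int], batch_size: int = 4) -> List[int]:
    n = len(uppers)
    share = -(-len(data) // n)  # equal, disjoint read share per thread
    inboxes = [queue.Queue() for _ in range(n)]
    outboxes = [queue.Queue() for _ in range(n)]
    barrier = threading.Barrier(n)
    threads = [
        threading.Thread(target=worker, args=(t, data[t * share:(t + 1) * share], uppers,
                                              inboxes, barrier, outboxes[t], batch_size))
        for t in range(n)
    ]
    for th in threads:
        th.start()
    result = list(gather(outboxes))
    for th in threads:
        th.join()
    return result


data = [(i * 37) % 100 + 1 for i in range(100)]
print(parallel_sort(data, uppers=[10, 30, 60, 100])[:12])
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```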
Fig. 3 is a schematic diagram of an unoptimized plan tree according to the second embodiment of the present invention. As shown in Fig. 3, in the unoptimized plan tree a sorting operator (the SORT operator) is connected to an index scan operator (the CSCN operator). The CSCN operator can be regarded as the next-layer operator of the SORT operator, and the SORT operator as the upper-layer operator of the CSCN operator.
Fig. 4 is a schematic diagram of an optimized plan tree according to the second embodiment of the present invention. As shown in Fig. 4, the plan tree of Fig. 3 is optimized with the optimization method of the present invention: a GAT operator (a summary operator) is generated between the SORT operator and its upper-layer operator, and a DIS operator (a distribution operator) is generated between the SORT operator and the CSCN operator.
Fig. 5 is a schematic diagram of another unoptimized plan tree according to the second embodiment of the present invention. As shown in Fig. 5, the left and right child nodes of the merge join operator (the MI operator) are both SORT operators, each SORT operator has a CSCN operator as its next-layer operator, and the corresponding target statistical information can be loaded through the next-layer operator of each SORT operator.
Fig. 6 is a schematic diagram of another optimized plan tree according to the second embodiment of the present invention. As shown in Fig. 6, the plan tree of Fig. 5 is optimized with the optimization method of the present invention: a GAT operator (a summary operator) is generated between the MI operator and its upper-layer operator, and, for each SORT operator, a DIS operator (a distribution operator) is generated between the SORT operator and its CSCN operator.
Example three
Fig. 7 is a schematic structural diagram of an optimization apparatus according to a third embodiment of the present invention. As shown in fig. 7, the apparatus includes:
a determining module 310, configured to, in the generation stage of a physical plan tree containing a sorting operator, if the current environment is a multi-thread environment, determine whether the target statistical information can be loaded through the next-layer operator of the sorting operator, to obtain a first determination result;
an optimization module 320, configured to perform, according to the first determination result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated;
where the target statistical information is the data statistics of the sort column corresponding to the sorting operator, the distribution operator is an operator that distributes data, and the summary operator is an operator that summarizes data.
In the third embodiment, the determining module 310 determines, in the generation stage of a physical plan tree containing a sorting operator and when the current environment is a multi-thread environment, whether the target statistical information can be loaded through the next-layer operator of the sorting operator, to obtain a first determination result; and the optimization module 320 performs, according to the first determination result, the optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated. In a multi-thread environment, the added summary and distribution operators assist the sorting operator in performing order-dependent operations, which optimizes the physical plan tree to be generated and improves data processing efficiency.
Optionally, an operator in a layer above the sorting operator is not a merge join operator;
the optimization module 320 includes:
a first optimization unit, configured to exit the optimization if the first determination result is that the target statistical information is not loaded;
a second optimization unit, configured to, if the first determination result is that the target statistical information is loaded, insert a distribution operator between the sorting operator and its next-layer operator and insert a summary operator between the sorting operator and its upper-layer operator.
Optionally, the upper-layer operator of the sorting operator is a merge join operator;
the optimization module 320 further includes:
a third optimization unit, configured to exit the optimization if the first determination result is that the target statistical information is not loaded;
a fourth optimization unit, configured to insert a distribution operator between the sorting operator and its next-layer operator if the first determination result is that the target statistical information is loaded;
a fifth optimization unit, configured to, if the two child nodes of the merge join operator are the sorting operator and another sorting operator, determine whether the target statistical information can be loaded through the next-layer operator of the other sorting operator to obtain a second determination result, and perform, according to the second determination result, the optimization operation of adding a summary operator to the physical plan tree to be generated;
a sixth optimization unit, configured to insert a summary operator between the merge join operator and its upper-layer operator if the two child nodes of the merge join operator are the sorting operator and a non-sorting operator.
Optionally, performing, according to the second determination result, the optimization operation of adding a summary operator to the physical plan tree to be generated includes:
if the second determination result is that the target statistical information is loaded through the next-layer operator of the other sorting operator, inserting a summary operator between the merge join operator and its upper-layer operator;
and if the second determination result is that the target statistical information is not loaded through the next-layer operator of the other sorting operator, inserting a summary operator between the sorting operator and its upper-layer operator.
Optionally, the apparatus further comprises:
a setting module, configured to set a first flag for the distribution operator and a second flag for the summary operator in the generation stage of the physical plan tree;
a first execution module, configured to execute, in the statement execution phase, a distribution operator carrying the first flag as follows: according to the processing range of each thread, each thread keeps the first data that belongs to its own processing range and distributes the second data that does not belong to that range to a target thread, and a first result obtained by executing the distribution operator is sent to the upper-layer operator of the distribution operator, where the target thread is the thread whose processing range contains the second data;
a second execution module, configured to execute a summary operator carrying the second flag as follows: receiving and outputting the data of each thread in ascending order of thread number, and sending a second result obtained by executing the summary operator to the upper-layer operator of the summary operator.
Optionally, the apparatus further comprises:
a determining module, configured to determine, before the first flag is set for the distribution operator, the number of threads and the processing range of each thread according to the target statistical information and computing resource information, where the processing ranges of different threads are different.
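The description does not fix a formula for choosing the thread count and ranges. One simple possibility, given here only as an assumption, is to cap the thread count by the available CPUs and by a minimum workload per thread, then split the key interval [min, max] from the statistics into near-equal, disjoint integer ranges. `ColumnStats`, `plan_ranges`, and `min_rows_per_thread` are hypothetical. Equal-width ranges suit uniformly distributed keys; for skewed data, boundaries derived from a histogram would likely balance the work better.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnStats:  # hypothetical shape of the sort column's statistics
    min_value: int
    max_value: int
    row_count: int


def plan_ranges(stats: ColumnStats, cpu_count: int,
                min_rows_per_thread: int = 1000) -> List[Tuple[int, int]]:
    """Choose a thread count and disjoint, inclusive integer ranges (thread i gets ranges[i])."""
    # Thread count: at most one per CPU, and enough rows per thread to be worth it.
    n = max(1, min(cpu_count, stats.row_count // min_rows_per_thread))
    width = stats.max_value - stats.min_value + 1
    n = min(n, width)  # never more threads than distinct integer keys
    ranges = []
    low = stats.min_value
    for t in range(n):
        size = width // n + (1 if t < width % n else 0)  # sizes differ by at most one
        ranges.append((low, low + size - 1))
        low += size
    return ranges


print(plan_ranges(ColumnStats(1, 100, 100), cpu_count=4, min_rows_per_thread=25))
# [(1, 25), (26, 50), (51, 75), (76, 100)]
```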
Optionally, the apparatus further comprises:
a reading module, configured to, after the number of threads and the processing range of each thread have been determined according to the target statistical information and the computing resource information, control each thread to read the same amount of data to be processed from memory according to the ratio of the total number of data items to be processed to the number of threads, where the data read by different threads is different;
the data to be processed being the data that the sorting operator needs to process.
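The reading module's split can be sketched as computing disjoint row offsets for each thread. This is a hypothetical helper (`read_shares`) that assumes contiguous offsets and gives the remainder rows, when the total is not divisible by the thread count, to the lowest-numbered threads; the description itself only requires equal, non-overlapping shares.

```python
from typing import List, Tuple


def read_shares(total: int, n_threads: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) row offsets read by each thread: disjoint and near-equal."""
    base, extra = divmod(total, n_threads)
    shares, start = [], 0
    for t in range(n_threads):
        end = start + base + (1 if t < extra else 0)  # first `extra` threads take one more row
        shares.append((start, end))
        start = end
    return shares


print(read_shares(100, 4))
# [(0, 25), (25, 50), (50, 75), (75, 100)]
```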
The optimization device provided by the embodiment of the invention can execute the optimization method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 8 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 8, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the optimization method.
In some embodiments, the optimization method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the optimization method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the optimization method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS (virtual private server) services.
It should be understood that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order; no limitation is imposed herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of optimization, the method comprising:
in the generation stage of a physical plan tree containing a sort operator, if the current environment is a multi-threaded environment, determining whether the operator one layer below the sort operator has loaded target statistical information, to obtain a first determination result;
performing, according to the first determination result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated;
wherein the target statistical information is statistical information on the data of the column corresponding to the sort operator, the distribution operator is an operator for distributing data, and the summary operator is an operator for summarizing data.
2. The method of claim 1, wherein the operator one layer above the sort operator is not a merge join operator;
the performing, according to the first determination result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated comprises:
if the first determination result is that the target statistical information has not been loaded, exiting the optimization;
and if the first determination result is that the target statistical information has been loaded, inserting a distribution operator between the sort operator and the operator one layer below the sort operator, and inserting a summary operator between the sort operator and the operator one layer above the sort operator.
3. The method of claim 1, wherein the operator one layer above the sort operator is a merge join operator;
the performing, according to the first determination result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated comprises:
if the first determination result is that the target statistical information has not been loaded, exiting the optimization;
if the first determination result is that the target statistical information has been loaded, inserting a distribution operator between the sort operator and the operator one layer below the sort operator;
if the two child nodes of the merge join operator are the sort operator and another sort operator, determining whether the operator one layer below the other sort operator has loaded the target statistical information, to obtain a second determination result, and performing, according to the second determination result, an optimization operation of adding a summary operator to the physical plan tree to be generated;
and if the two child nodes of the merge join operator are the sort operator and a non-sort operator, inserting a summary operator between the merge join operator and the operator one layer above the merge join operator.
4. The method of claim 3, wherein the performing, according to the second determination result, an optimization operation of adding a summary operator to the physical plan tree to be generated comprises:
if the second determination result is that the operator one layer below the other sort operator has loaded the target statistical information, inserting a summary operator between the merge join operator and the operator one layer above the merge join operator;
and if the second determination result is that the operator one layer below the other sort operator has not loaded the target statistical information, inserting a summary operator between the sort operator and the operator one layer above the sort operator.
5. The method of claim 1, further comprising:
setting a first flag for the distribution operator and a second flag for the summary operator in the physical plan tree generation stage;
in the statement execution stage, executing the distribution operator carrying the first flag as follows: according to the processing range corresponding to each thread, each thread retains first data that falls within its own processing range and distributes second data that falls outside that range to a target thread, the target thread being the thread whose processing range contains the second data; and sending a first result obtained by executing the distribution operator to the operator one layer above the distribution operator;
executing the summary operator carrying the second flag as follows: receiving and outputting the data of each thread in ascending order of thread number, and sending a second result obtained by executing the summary operator to the operator one layer above the summary operator.
6. The method of claim 5, further comprising, before setting the first flag for the distribution operator:
determining, according to the target statistical information and computing resource information, the number of threads and the processing range corresponding to each thread, wherein the processing ranges of different threads are different.
7. The method of claim 6, further comprising, after determining the number of threads and the processing range corresponding to each thread according to the target statistical information and the computing resource information:
controlling each thread to read, from memory, an equal amount of data to be processed, determined by the ratio of the total amount of data to be processed to the number of threads, wherein each thread reads different data to be processed;
wherein the data to be processed is the data that the sort operator needs to process.
8. An optimization device, comprising:
a determination module, configured to, in the generation stage of a physical plan tree containing a sort operator and if the current environment is a multi-threaded environment, determine whether the operator one layer below the sort operator has loaded target statistical information, to obtain a first determination result;
an optimization module, configured to perform, according to the first determination result, an optimization operation of adding a summary operator and a distribution operator to the physical plan tree to be generated;
wherein the target statistical information is statistical information on the data of the column corresponding to the sort operator, the distribution operator is an operator for distributing data, and the summary operator is an operator for summarizing data.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the optimization method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to perform the optimization method of any one of claims 1-7.
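The plan-tree rewrite in claims 1 to 4 can be modeled on a toy operator tree. The following Python sketch is illustrative only and makes several assumptions. The node kinds (`SORT`, `MERGE_JOIN`, `DIST` for the distribution operator, `SUMMARY` for the summary operator), the `has_stats` attribute and all function names are hypothetical. They are not taken from the patent or from any DM database API. The sketch also assumes that a merge join always has exactly two inputs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Hypothetical plan node; kind names such as "SORT" or "DIST" are illustrative."""
    kind: str
    children: List["Op"] = field(default_factory=list)
    has_stats: bool = False  # assumed: node has loaded statistics on the sort column


def _below_has_stats(sort: Op) -> bool:
    # Determination step: has the operator one layer below the sort loaded the statistics?
    return bool(sort.children) and sort.children[0].has_stats


def _rewrite(node: Op) -> Op:
    if node.kind == "MERGE_JOIN":
        return _rewrite_merge_join(node)
    parallel = node.kind == "SORT" and _below_has_stats(node)  # decide before children change
    node.children = [_rewrite(c) for c in node.children]
    if parallel:
        node.children[0] = Op("DIST", [node.children[0]])  # distribution operator below the sort
        return Op("SUMMARY", [node])                         # summary operator above the sort
    return node  # statistics not loaded: exit the optimization for this sort


def _rewrite_merge_join(mj: Op) -> Op:
    flags = [c.kind == "SORT" and _below_has_stats(c) for c in mj.children]
    new_children = []
    for child, ok in zip(mj.children, flags):
        if child.kind == "SORT":
            child.children = [_rewrite(g) for g in child.children]
            if ok:
                child.children[0] = Op("DIST", [child.children[0]])
        else:
            child = _rewrite(child)
        new_children.append(child)
    mj.children = new_children
    if not any(flags):
        return mj
    if all(c.kind == "SORT" for c in mj.children) and not all(flags):
        # Only one sorted input runs in parallel: summarize it before the join.
        mj.children = [Op("SUMMARY", [c]) if ok else c for c, ok in zip(mj.children, flags)]
        return mj
    # Both sorted inputs parallel, or a sort plus a non-sort input: summarize above the join.
    return Op("SUMMARY", [mj])


def optimize(root: Op, multithreaded: bool) -> Op:
    """Rewrites the tree in place and returns the (possibly new) root."""
    return _rewrite(root) if multithreaded else root


def show(op: Op) -> str:
    if not op.children:
        return op.kind
    return f"{op.kind}({', '.join(show(c) for c in op.children)})"


if __name__ == "__main__":
    plan = Op("PROJECT", [Op("SORT", [Op("SCAN", has_stats=True)])])
    print(show(optimize(plan, multithreaded=True)))  # PROJECT(SUMMARY(SORT(DIST(SCAN))))
```

The rewrite checks the statistics flag before it recurses into a node's children. This matters because recursion can replace a child with a `SUMMARY` wrapper that carries no statistics. Merge joins are handled at the join node itself rather than at each sort. As a result, a join with two parallel sort inputs receives a single summary operator above it instead of two.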
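Claims 6 and 7 state that the thread count and per-thread processing ranges come from the target statistics and the available computing resources. They do not say how. The Python sketch below shows one plausible reading and is entirely assumption-based. It treats the statistics as a sample of sort-key values and cuts ranges at sample quantiles. It caps the thread count by the CPU count and by a hypothetical `min_rows_per_thread` parameter. It then gives each thread an equal, disjoint slice of the input. The names `plan_threads` and `split_reads` are illustrative only.

```python
from typing import List, Sequence, Tuple


def plan_threads(sample: Sequence[int], total_rows: int, cpu_count: int,
                 min_rows_per_thread: int = 1000) -> Tuple[int, List[int]]:
    """Pick a thread count and range boundaries for the sort column.

    Thread 0 owns keys <= bounds[0], thread i owns bounds[i-1] < key <= bounds[i],
    and the last thread owns keys > bounds[-1].
    """
    n = max(1, min(cpu_count, total_rows // min_rows_per_thread))
    values = sorted(sample)
    if n == 1 or not values:
        return 1, []
    # Quantile cut points of the sample approximate an equal row count per range.
    cuts = {values[(i * len(values)) // n] for i in range(1, n)}
    bounds = sorted(cuts)  # duplicate cut points (skewed data) collapse, keeping ranges distinct
    return len(bounds) + 1, bounds


def split_reads(total_rows: int, n_threads: int) -> List[range]:
    """Give each thread a disjoint, (nearly) equal slice of the input rows."""
    base, extra = divmod(total_rows, n_threads)
    slices, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)  # spread any remainder over the first threads
        slices.append(range(start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    print(plan_threads(list(range(100)), total_rows=10_000, cpu_count=4))  # (4, [25, 50, 75])
    print(split_reads(10, 4))  # [range(0, 3), range(3, 6), range(6, 8), range(8, 10)]
```

Claim 7 speaks of an equal amount per thread. When the total does not divide evenly, this sketch gives one extra row to each of the first few threads, which is an assumption about how the remainder is handled.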
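The execution stage of claim 5 can be simulated in a single Python process. In this minimal sketch, the distribution operator routes every row to the thread whose range contains its key, each thread then sorts its own range, and the summary operator concatenates the thread outputs in ascending thread number. Because the ranges are ordered, that concatenation is already globally sorted. The range convention (a `bounds` list searched with `bisect_left`) and all function names are hypothetical. A `ThreadPoolExecutor` stands in for the database's worker threads.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def distribute(partitions: Sequence[Sequence[int]],
               bounds: Sequence[int]) -> Tuple[List[List[int]], int]:
    """Distribution operator: route every row to the thread whose range owns its key."""
    n = len(bounds) + 1
    if len(partitions) != n:
        raise ValueError("need one input partition per thread")
    inboxes: List[List[int]] = [[] for _ in range(n)]
    moved = 0
    for t, rows in enumerate(partitions):
        for key in rows:
            target = bisect_left(bounds, key)  # thread whose processing range contains key
            if target != t:
                moved += 1                     # "second data": shipped to another thread
            inboxes[target].append(key)        # "first data" simply stays with thread t
    return inboxes, moved


def summarize(parts: Sequence[Sequence[int]]) -> List[int]:
    """Summary operator: emit each thread's output in ascending thread number."""
    out: List[int] = []
    for part in parts:
        out.extend(part)
    return out


def parallel_sort(partitions: Sequence[Sequence[int]],
                  bounds: Sequence[int]) -> Tuple[List[int], int]:
    inboxes, moved = distribute(partitions, bounds)
    with ThreadPoolExecutor(max_workers=len(inboxes)) as pool:
        sorted_parts = list(pool.map(sorted, inboxes))  # map() preserves thread order
    return summarize(sorted_parts), moved


if __name__ == "__main__":
    parts = [[42, 7, 99], [15, 63, 3], [88, 21, 50]]
    print(parallel_sort(parts, [30, 60]))  # ([3, 7, 15, 21, 42, 50, 63, 88, 99], 7)
```

In this example, 7 of the 9 rows change threads. With real data the ratio depends on how the input was split and on how well the ranges match the key distribution.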

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210546597.4A CN114817301A (en) 2022-05-18 2022-05-18 Optimization method, optimization device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114817301A (en) 2022-07-29

Family

ID=82514785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210546597.4A Pending CN114817301A (en) 2022-05-18 2022-05-18 Optimization method, optimization device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114817301A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination