CN111813761B

CN111813761B - Database management method, device and computer storage medium

Info

Publication number: CN111813761B
Application number: CN202010584234.0A
Authority: CN
Inventors: 黄乐; 朱林浩; 何林强
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2020-06-23
Filing date: 2020-06-23
Publication date: 2024-07-12
Anticipated expiration: 2040-06-23
Also published as: CN111813761A

Abstract

The application discloses a database management method, a device and a computer storage medium. The database management method comprises the following steps: acquiring query request information, and generating an execution plan according to the query request information; distributing the execution plan to a plurality of computing nodes so that the computing nodes each execute the query request according to the execution plan and record the number of scanned tuples; obtaining the query request results and the tuple numbers of the plurality of computing nodes; comparing the tuple numbers of the plurality of computing nodes to obtain a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number; and when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition, migrating tuple data of the first computing node to the second computing node. In this way, data skew can be detected dynamically and data migration performed, so that optimal data distribution is achieved.

Description

Database management method, device and computer storage medium

Technical Field

The present application relates to the field of database management technologies, and in particular, to a database management method, a database management device, and a computer storage medium.

Background

At present, as business data and the various data accumulated by social networks keep growing, how efficiently the data is stored has a great influence on how quickly records meeting given conditions can be found in mass data storage.

In conventional distributed databases, common data distribution manners are hash distribution, range distribution, and random distribution. Both hash distribution and range distribution can lead to data skew if hot spot data is present. The effects of data skew include: some computing nodes hold a larger data volume while others hold a smaller one, so the overall query time is longer; and because queries finish sooner on the lightly loaded nodes, those nodes release their resources early and the resources are not fully utilized.
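The effect of hot spot data on a hash distribution can be illustrated with a minimal sketch. The key names, the node count, and the use of `zlib.crc32` as the hash function are assumptions chosen for illustration only; the application does not specify them.

```python
import zlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node by hashing the distribution key (crc32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(keys, num_nodes: int) -> Counter:
    """Return how many rows land on each node under hash distribution."""
    counts = Counter({node: 0 for node in range(num_nodes)})
    for key in keys:
        counts[node_for(key, num_nodes)] += 1
    return counts


# Hypothetical workload: 9000 rows share one hot key, 1000 rows have distinct keys.
keys = ["hot"] * 9000 + [f"key-{i}" for i in range(1000)]
counts = distribute(keys, num_nodes=3)
hot_node = node_for("hot", 3)
```

Every row with the hot key hashes to the same node, so that node holds at least 9000 of the 10000 rows regardless of how evenly the remaining keys spread.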

Disclosure of Invention

The application provides a database management method, a database management device and a computer storage medium, which are used for solving the problem that data skew easily occurs in the prior art.

In order to solve the technical problems, the application adopts a technical scheme that: there is provided a database management method including:

acquiring query request information, and generating an execution plan according to the query request information;

distributing the execution plan to a plurality of computing nodes so that the computing nodes each execute a query request according to the execution plan and record the number of scanned tuples;

acquiring the query request results and the tuple numbers of the plurality of computing nodes;

comparing the tuple numbers of the plurality of computing nodes, and acquiring a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number;

and when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition, migrating the tuple data of the first computing node to the second computing node.

The step of obtaining the query request results and the tuple number of the plurality of computing nodes includes:

acquiring each tuple record and the corresponding record number in the plurality of computing nodes;

for each computing node, sorting the tuple records in descending order of record number;

and recording the tuple numbers of the plurality of computing nodes, the first M tuple records of each computing node and the corresponding record numbers into a tuple statistics table.

Wherein the step of migrating the tuple data of the first computing node to the second computing node when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition comprises:

when the number of tuples of the first computing node is at least twice the number of tuples of the second computing node, migrating the tuple data of the first computing node to the second computing node.

Wherein the step of migrating the tuple data of the first computing node to the second computing node comprises:

and migrating the first tuple record of the first computing node to the second computing node, wherein the first tuple record is the tuple record with the largest record number in the first computing node.

Wherein after the step of migrating the first tuple record of the first computing node to the second computing node, the database management method further comprises:

storing data migration information in a migration information table, wherein the data migration information comprises the first tuple record and the second computing node.

Wherein the step of the plurality of computing nodes each executing the query request according to the execution plan comprises:

the coordination node searching the migration information table for related records according to the execution plan;

if related records exist, directly acquiring the tuple data from the corresponding position according to the migration information table.

Wherein the step of distributing the execution plan to a plurality of computing nodes so that the computing nodes each execute a query request according to the execution plan and record the number of scanned tuples comprises:

judging whether the execution plan comprises a sequential table scan operator;

if yes, the plurality of computing nodes each execute the query request according to the execution plan, record the number of scanned tuples, and return the query request result and the tuple number;

if not, the plurality of computing nodes each execute the query request according to the execution plan and return the query request result.

In order to solve the technical problems, the application adopts another technical scheme that: providing another database management method, wherein the database management method is applied to a database management system, and the database management system comprises a coordination node and a plurality of computing nodes; the database management method comprises the following steps:

the coordination node acquires query request information of a client and generates an execution plan according to the query request information;

the coordination node distributes the execution plan to the plurality of computing nodes;

the computing nodes each execute the query request according to the execution plan, record the number of scanned tuples, and return the query request result and the tuple number to the coordination node;

the coordination node obtains the query request results and the tuple numbers, compares the tuple numbers of the plurality of computing nodes, and obtains a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number;

and when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition, the coordination node migrates the tuple data of the first computing node to the second computing node.

In order to solve the technical problems, the application adopts another technical scheme that: providing a database management device, wherein the database management device comprises a processor and a memory; the memory has stored therein a computer program, and the processor is configured to execute the computer program to implement the steps of the database management method as described above.

In order to solve the technical problems, the application adopts another technical scheme that: there is provided a computer storage medium storing a computer program which when executed performs the steps of the database management method described above.

Compared with the prior art, the application has the following beneficial effects: the coordination node acquires the query request information and generates an execution plan according to the query request information; distributes the execution plan to a plurality of computing nodes so that the computing nodes each execute the query request according to the execution plan and record the number of scanned tuples; obtains the query request results and the tuple numbers of the plurality of computing nodes; compares the tuple numbers of the plurality of computing nodes to obtain a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number; and, when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition, migrates the tuple data of the first computing node to the second computing node. In this way, data skew can be detected dynamically and data migration performed, so that optimal data distribution is achieved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.

FIG. 1 is a flowchart of a first embodiment of a database management method according to the present application;

FIG. 2 is a schematic diagram of an embodiment of a database system according to the present application;

FIG. 3 is a flowchart of a second embodiment of a database management method according to the present application;

FIG. 4 is a flowchart of a third embodiment of a database management method according to the present application;

FIG. 5 is a flowchart of a fourth embodiment of a database management method according to the present application;

FIG. 6 is a schematic diagram of an embodiment of a database management device according to the present application;

FIG. 7 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In order to solve the problem that data skew easily occurs in the prior art, the application provides a database management method. Referring to fig. 1 and fig. 2, fig. 1 is a flow chart of a first embodiment of a database management method according to the present application, and fig. 2 is a structural diagram of an embodiment of a database system according to the present application.

The database management method of the present application is applied to the database system of fig. 2, wherein the database system 100 includes one coordination node 11 and several computing nodes 12. The coordination node 11 is responsible for distributing data and tasks to the computing nodes 12, summarizing the computing results of the computing nodes 12, and finally returning the results to the user client. The computing nodes 12 are responsible for data storage and for actually performing the computing tasks. The database of the present application specifically refers to a Massively Parallel Processing (MPP) database.

The primary goal of data distribution design in MPP databases is the even distribution of data among the various nodes of the system. In particular, multiple processors are coordinated to process programs in parallel, where each processor has independent operating system and memory resources. The system may be referred to as "shared nothing" in which the tables of the database are partitioned into segments and distributed among the different processing nodes, with no data sharing occurring between the processing nodes. The data is partitioned among the processing nodes such that each processing node has a subset of rows from the tables of the database. Each processing node processes only the rows on its own disk.

As shown in fig. 1, the database management method of the present embodiment specifically includes the following steps:

S101: acquiring the query request information, and generating an execution plan according to the query request information.

When a user client inputs a query statement, the coordination node acquires query request information about the query statement and generates an execution plan according to the query request information. The execution plan includes tasks and data that are distributed to the individual computing nodes.

S102: distributing the execution plan to a plurality of computing nodes, so that the plurality of computing nodes execute the query request according to the execution plan and record the scanned tuple number respectively.

The coordination node distributes the execution plan to each computing node according to the distribution condition of the execution plan. Each computing node executes the query request according to the execution task acquired by the computing node and records the scanned tuple number.

Specifically, each time a plan is executed, the computing node needs to return the result of executing the query request to the coordination node for summarization. In addition, each computing node needs to determine whether a sequential table scan operator is included in the execution plan. If a sequential table scan operator is present, the computing node further records the number of scanned tuples and returns the tuple number together with the query request result to the coordination node for summarization.
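The behavior of a computing node in step S102 can be sketched as follows. This is a minimal illustration, not the application's implementation: the `ExecutionPlan` structure, the operator name `SeqScan`, and the predicate-based filtering are hypothetical, and the sketch iterates over all local rows in both cases so that only the reporting of the scanned tuple number differs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SEQ_SCAN = "SeqScan"  # hypothetical name for the sequential table scan operator


@dataclass
class ExecutionPlan:
    operators: List[str]               # operator names contained in the plan
    predicate: Callable[[dict], bool]  # filter applied to each local row


def execute_on_node(plan: ExecutionPlan, rows: List[dict]) -> Tuple[List[dict], Optional[int]]:
    """Run the plan on one node's local rows.

    Returns (result, scanned). `scanned` is None when the plan has no
    sequential table scan operator, so nothing is reported for statistics.
    """
    count_tuples = SEQ_SCAN in plan.operators
    scanned = 0
    result = []
    for row in rows:
        scanned += 1
        if plan.predicate(row):
            result.append(row)
    return result, (scanned if count_tuples else None)
```

Returning `None` rather than `0` lets the coordination node distinguish "no statistics reported" from "an empty table was scanned".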

S103: obtaining the query request results and the tuple numbers of the plurality of computing nodes.

The coordination node acquires the tuple numbers of the plurality of computing nodes and records them together in the tuple statistics table. The tuple statistics table is mainly used to record the tuple records and tuple numbers of all computing nodes, for use in comparing tuple numbers and allocating migration data.

S104: comparing the tuple numbers of the plurality of computing nodes, and acquiring a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number.

The coordination node compares the tuple numbers of the plurality of computing nodes, and extracts a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number. By comparing the difference between the first computing node and the second computing node, the coordination node can judge whether data skew has occurred in the database.

It should be noted that the first computing node and the second computing node in this embodiment do not refer to particular computing nodes; they are the comparison results, namely the nodes with the maximum and minimum tuple numbers, produced each time the tuple numbers of the computing nodes are counted. Because the storage positions of the tuples and the stored data change dynamically, this makes it possible to judge dynamically, in real time, whether data skew has occurred.
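The selection in step S104 amounts to taking the maximum and minimum over the reported tuple numbers. A minimal sketch follows, assuming the tuple numbers are held in a dictionary keyed by node name; the function name and the data structure are hypothetical.

```python
from typing import Dict, Tuple


def find_extremes(tuple_counts: Dict[str, int]) -> Tuple[str, str]:
    """Return (first_node, second_node): the nodes with the maximum and
    minimum tuple numbers in the current statistics."""
    if not tuple_counts:
        raise ValueError("no tuple numbers were reported")
    first_node = max(tuple_counts, key=tuple_counts.get)
    second_node = min(tuple_counts, key=tuple_counts.get)
    return first_node, second_node
```

Because the function is evaluated against the current statistics on every call, the roles of first and second computing node can move between nodes from one round to the next.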

S105: when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition, migrating the tuple data of the first computing node to the second computing node.

When the difference of the number of the tuples of the first computing node and the second computing node reaches a preset condition, the coordinating node migrates part of the tuple data of the first computing node to the second computing node.

Specifically, the preset condition may be: the difference between the tuple number of the first computing node and the tuple number of the second computing node is larger than a preset fixed value; or the tuple number of the first computing node is N times the tuple number of the second computing node, where N > 1.

When the first computing node and the second computing node meet the preset condition, data skew has occurred, and the coordination node needs to perform data migration to resolve it.
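The two forms of the preset condition can be written as a single check. The sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, and "N times" is read as "at least N times", consistent with the "at least twice" wording used elsewhere in the application.

```python
from typing import Optional


def is_skewed(max_count: int, min_count: int,
              fixed_gap: Optional[int] = None,
              ratio_n: Optional[float] = None) -> bool:
    """Return True if either configured preset condition is met.

    fixed_gap: skew if max_count - min_count exceeds this fixed value.
    ratio_n:   skew if max_count is at least ratio_n times min_count (N > 1).
    """
    if ratio_n is not None and ratio_n <= 1:
        raise ValueError("N must be greater than 1")
    if fixed_gap is not None and max_count - min_count > fixed_gap:
        return True
    if ratio_n is not None and max_count >= ratio_n * min_count:
        return True
    return False
```

For example, with tuple numbers of 300000 and 100000, `is_skewed(300000, 100000, ratio_n=1.5)` holds because the ratio is 3, while `is_skewed(220000, 180000, ratio_n=2)` does not.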

Further, after the coordination node completes a data migration task, steps S102 to S105 may be executed again, and the tuple numbers of the plurality of computing nodes compared again. If the difference between the computing node with the maximum tuple number and the computing node with the minimum tuple number still meets the preset condition, data migration is executed again, until the data skew is completely resolved.

In this embodiment, the coordination node obtains the query request information and generates an execution plan according to the query request information; distributes the execution plan to a plurality of computing nodes so that the computing nodes each execute the query request according to the execution plan and record the number of scanned tuples; obtains the query request results and the tuple numbers of the plurality of computing nodes; compares the tuple numbers of the plurality of computing nodes to obtain a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number; and, when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition, migrates the tuple data of the first computing node to the second computing node. In this way, data skew can be detected dynamically and data migration performed, so that optimal data distribution is achieved.

Based on step S103 of the database management method, the application also provides another specific database management method. Referring to fig. 3 in detail, fig. 3 is a flowchart illustrating a second embodiment of a database management method according to the present application.

As shown in fig. 3, the database management method of the present embodiment specifically includes the following steps:

S201: acquiring each tuple record and the corresponding record number in the plurality of computing nodes.

The coordination node obtains each tuple record and the corresponding record number in the plurality of computing nodes, and forms a tuple statistics table based on the tuple records. The tuple statistics table is specifically as follows:

segment-1	segment-2	segment-n
100000	200000	300000

At this time, if the preset multiple N is 1.5, the current maximum tuple number (300000) is 3 times the minimum tuple number (100000), which exceeds N; according to the preset condition of the above embodiment, data skew is present in the tuple statistics table.

S202: for each computing node, sorting the tuple records in descending order of record number.

S203: recording the tuple numbers of the plurality of computing nodes, the first M tuple records of each computing node and the corresponding record numbers into the tuple statistics table.

The coordination node may further obtain the tuple records of each computing node, and count the number of records for each tuple record.

To reduce the computational overhead, the coordination node records only the top M tuple records of each computing node, ranked by record number, into the tuple statistics table. At this time, the tuple statistics table is specifically as follows:

	segment-1	segment-2	segment-n
Total amount	100000	200000	300000
1	Zhejiang A1111:50000	Zhejiang A2222:60000	Zhejiang A3333:80000
2	Zhejiang A4444:40000	Zhejiang A5555:50000	Zhejiang A6666:70000
M	Zhejiang A7777:30000	Zhejiang A8888:40000	Zhejiang A9999:60000

For example, the entry "Zhejiang A1111:50000" in the tuple statistics table indicates that there are 50000 records of the tuple "Zhejiang A1111".
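Steps S201 to S203 can be sketched with a frequency count per node followed by a top-M selection. This is a minimal illustration: the function names are hypothetical, and each node's data is assumed to be available as an iterable of tuple record values, which the application does not prescribe.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def node_statistics(values: Iterable[str], m: int) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (total tuple number, top-M (record, record number) pairs)
    for one computing node, sorted by record number in descending order."""
    counter = Counter(values)
    total = sum(counter.values())
    return total, counter.most_common(m)


def build_statistics_table(nodes: Dict[str, Iterable[str]], m: int):
    """Build the tuple statistics table: node name -> (total, top-M records)."""
    return {name: node_statistics(values, m) for name, values in nodes.items()}
```

Keeping only the top M records bounds the size of the statistics table per node while still exposing the records most likely to cause skew.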

S204: comparing the tuple numbers of the plurality of computing nodes, and acquiring a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number.

S205: when the number of the tuples of the first computing node is at least twice the number of the tuples of the second computing node, the first tuple record of the first computing node is migrated to the second computing node, wherein the first tuple record is the tuple record with the largest record number in the first computing node.

The present application mainly performs data migration by means of a greedy algorithm.

Specifically, the coordination node first acquires the first computing node, which has the largest tuple number, and the second computing node, which has the smallest tuple number, and then judges whether the difference between the first computing node and the second computing node meets the preset condition.

In this embodiment, when the tuple number of the first computing node is at least twice the tuple number of the second computing node, the difference between the first computing node and the second computing node satisfies the preset condition. At this time, the coordination node migrates the tuple record with the largest record number in the first computing node, that is, the first tuple record, to the second computing node.

Further, the coordination node continues to judge whether data skew is still present in the updated tuple statistics table. If not, the data migration is complete and the method proceeds to step S206; if so, the skew judgment and data migration are repeated in a loop until the condition for completing data migration is met.

For example, the coordination node migrates the tuple record "Zhejiang A3333", which has the largest record number in segment-n in the tuple statistics table above, to segment-1. In the updated tuple statistics table, the total amount of segment-n is 300000-80000=220000, while the total amount of segment-1 is 100000+80000=180000. At this time, segment-n still has the largest tuple number and segment-1 the smallest, and their ratio is 220000/180000≈1.22<2, so the condition for completing data migration is satisfied, no further migration is required, and the method proceeds to step S206.
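The greedy loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: the function and parameter names are hypothetical, the statistics are updated in place, and for simplicity a migrated record is not added to the target node's top-M list.

```python
from typing import Dict, List, Tuple


def greedy_rebalance(totals: Dict[str, int],
                     top_records: Dict[str, List[Tuple[str, int]]],
                     ratio_n: float = 2.0) -> List[Tuple[str, str, str]]:
    """Repeatedly move the largest record of the fullest node to the
    emptiest node while max >= ratio_n * min.

    totals:      node -> tuple number (updated in place)
    top_records: node -> [(record, record number), ...] in descending order
                 (consumed in place)
    Returns the list of (record, source node, target node) migrations.
    """
    migrations = []
    while True:
        first = max(totals, key=totals.get)
        second = min(totals, key=totals.get)
        # Stop when the skew condition no longer holds or nothing is left to move.
        if totals[first] < ratio_n * totals[second] or not top_records.get(first):
            break
        record, count = top_records[first].pop(0)
        totals[first] -= count
        totals[second] += count
        migrations.append((record, first, second))
    return migrations


# Totals and segment-n records taken from the tuple statistics table in this embodiment.
totals = {"segment-1": 100000, "segment-2": 200000, "segment-n": 300000}
top_records = {
    "segment-n": [("Zhejiang A3333", 80000), ("Zhejiang A6666", 70000),
                  ("Zhejiang A9999", 60000)],
}
migrations = greedy_rebalance(totals, top_records)
```

With these inputs a single migration of "Zhejiang A3333" from segment-n to segment-1 is produced, leaving totals of 220000 and 180000, after which the loop stops because 220000 is less than twice 180000.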

S206: data migration information is stored in a migration information table, wherein the data migration information includes a first tuple record and a second computing node.

The coordination node may further store the data migration information of step S205 in a migration information table, where the migration information table records the changes made to the tuple statistics table. For example, the data migration in the above example is stored in the migration information table as follows:

Data	Position
Zhejiang A3333	segment-1

Specifically, when the data in the tuple statistics table changes again, the migration information table is maintained accordingly. Data update: if the data is subsequently migrated to segment-2, the corresponding position information is modified to segment-2. Data deletion: if the data is migrated back to its original position segment-n, the record is deleted.
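The insert, update, and delete rules for the migration information table can be sketched with a small in-memory class. The class and method names are hypothetical, and remembering each record's original node in a second mapping is an assumption made here so that a migration back to the original position can be detected.

```python
from typing import Dict, Optional


class MigrationInfoTable:
    """In-memory sketch of the migration information table."""

    def __init__(self) -> None:
        self._position: Dict[str, str] = {}  # record -> current node
        self._origin: Dict[str, str] = {}    # record -> original node (assumption)

    def record_migration(self, record: str, source: str, target: str) -> None:
        """Insert, update, or delete the entry for a migrated record."""
        origin = self._origin.setdefault(record, source)
        if target == origin:
            # Migrated back to its original position: delete the entry.
            self._position.pop(record, None)
            self._origin.pop(record, None)
        else:
            # First migration inserts the entry; later migrations update it.
            self._position[record] = target

    def lookup(self, record: str) -> Optional[str]:
        """Return the node currently holding the record, or None if it has
        no entry (i.e. it sits where the original distribution put it)."""
        return self._position.get(record)
```

A record with no entry is therefore always located by the original distribution policy, which keeps the table limited to records that have actually moved.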

Based on step S102 of the database management method, the application also provides another specific database management method. Referring to fig. 4 in detail, fig. 4 is a flowchart illustrating a third embodiment of a database management method according to the present application.

As shown in fig. 4, the database management method of the present embodiment specifically includes the following steps:

S301: the coordination node searches the migration information table for related records according to the execution plan.

Before distributing the execution plan to each computing node, the coordination node may first check whether there is a related record in the migration information table according to the execution plan. If so, the method proceeds to step S302; if not, it proceeds to step S303.

S302: directly acquiring the tuple data from the corresponding position according to the migration information table.

The coordination node obtains the data directly from the position recorded in the migration information table.

S303: searching for the tuple data position according to the original data distribution policy.

Based on the execution plan, the computing node searches for the tuple data position according to the original data distribution policy. The data distribution policy may be hash distribution, random (e.g., round robin) distribution, range distribution, list distribution, or the like.
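Steps S301 to S303 can be sketched as a lookup that consults the migration information table first and falls back to the original distribution policy. In this minimal illustration the function name is hypothetical, the migration information table is assumed to be a plain dictionary from record to node, and hash distribution with `zlib.crc32` stands in for the original policy.

```python
import zlib
from typing import Dict, List


def locate(key: str, migration_table: Dict[str, str], nodes: List[str]) -> str:
    """Return the node that holds the data for `key`."""
    migrated = migration_table.get(key)
    if migrated is not None:
        # S302: a related record exists, so read from the recorded position.
        return migrated
    # S303: no record, so apply the original distribution policy
    # (hash distribution is assumed here for illustration).
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]
```

Checking the migration information table before hashing ensures that a migrated record is never looked up at its stale, pre-migration position.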

Based on the above embodiment, the present application also proposes another specific database management method. Referring to fig. 5, fig. 5 is a flowchart illustrating a fourth embodiment of a database management method according to the present application.

As shown in fig. 5, the database management method of the present embodiment specifically includes the following steps:

S401: the coordination node acquires the query request information of the client and generates an execution plan according to the query request information.

S402: the coordination node distributes the execution plan to several computing nodes.

S403: the plurality of computing nodes each execute the query request according to the execution plan, record the number of scanned tuples, and return the query request result and the tuple number to the coordination node.

S404: the coordination node obtains the query request results and the tuple numbers, compares the tuple numbers of the plurality of computing nodes, and obtains a first computing node corresponding to the maximum tuple number and a second computing node corresponding to the minimum tuple number.

S405: when the difference between the maximum tuple number and the minimum tuple number reaches a preset condition, the coordination node migrates the tuple data of the first computing node to the second computing node.

In order to implement the database management method of the above embodiment, the present application further provides a database management device, and referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of the database management device provided by the present application.

As shown in fig. 6, the database management apparatus 600 of the present embodiment includes a processor 61, a memory 62, an input-output device 63, and a bus 64.

The processor 61, the memory 62, and the input-output device 63 are respectively connected to the bus 64, and the memory 62 stores a computer program, and the processor 61 is configured to execute the computer program to implement the database management method of the above embodiment.

It should be noted that, the database management device 600 corresponding to the first to third embodiments of the database management method may be a server that carries and implements the coordination node function, and the database management device 600 corresponding to the fourth embodiment of the database management method may be a server cluster or a distributed server that carries and implements the database system of fig. 2.

In the present embodiment, the processor 61 may also be referred to as a CPU (Central Processing Unit). The processor 61 may be an integrated circuit chip with signal processing capabilities. The processor 61 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The general purpose processor may be a microprocessor or any conventional processor. The processor 61 may also be a GPU (Graphics Processing Unit), also called a display core, vision processor, or display chip, which is a microprocessor dedicated to image operations on personal computers, workstations, game consoles, and some mobile devices (e.g., tablet computers and smartphones). The GPU converts and drives the display information required by a computer system, provides line scanning signals to the display, and controls the correct operation of the display; it is an important element connecting the display to the personal computer mainboard and one of the important devices for human-machine interaction. As an important component of the host computer, the graphics card undertakes the task of outputting and displaying graphics and is very important for people engaged in professional graphic design.

The present application also provides a computer storage medium 700 for storing a computer program 71, as shown in fig. 7, which computer program 71, when being executed by a processor, is adapted to carry out the method according to an embodiment of the database management method of the present application.

When implemented in the form of a software functional unit and sold or used as a stand-alone product, the method of the database management method embodiments of the present application may be stored in a device such as a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

The foregoing description is only of embodiments of the present invention, and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.

Claims

1. A database management method, wherein the database is a distributed database, the database management method comprising:

2. The method of claim 1, wherein,

3. The method of claim 1, wherein,

The step of migrating the tuple data of the first computing node to the second computing node comprises:

4. The method for database management as claimed in claim 3, wherein,

After the step of migrating the first tuple record of the first computing node to the second computing node, the database management method further includes:

5. The method for database management as claimed in claim 4, wherein,

The steps of the plurality of computing nodes executing the query request according to the execution plan respectively comprise the following steps:

Searching whether related records exist in the migration information table according to the execution plan;

6. The method of claim 1, wherein,

The step of distributing the execution plan to a plurality of computing nodes, so that the plurality of computing nodes execute a query request according to the execution plan and record the scanned tuple number respectively, includes:

7. A database management method, characterized by being applied to a database management system, wherein the database management system comprises a coordination node and a plurality of computing nodes; the database management method comprises the following steps:

the coordination node migrates the tuple data of the first computing node to the second computing node when the number of tuples of the first computing node is at least twice the number of tuples of the second computing node.

8. A database management device, wherein the database management device comprises a processor and a memory; the memory has stored therein a computer program, the processor being adapted to execute the computer program to implement the steps of the database management method according to any of claims 1 to 7.

9. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed, implements the steps of the database management method according to any one of claims 1 to 7.