CN112202859A - Data transmission method and database system - Google Patents
- Publication number
- CN112202859A (application number CN202011001547.5A)
- Authority
- CN
- China
- Prior art keywords
- instance
- data
- computing
- instances
- instruction set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides a data transmission method and a database system. The method comprises the following steps: a main instance in the database system sends an instruction set to each of a plurality of computing instances; each computing instance executes the instruction set to obtain its corresponding execution result; and the computing instances that executed the instruction sets send their respective execution results to the same receiving port of the main instance. With this method, when the database performs a query task, the main instance receives the execution results from all of the computing instances through a single receiving port. This preserves query correctness while reducing the number of receiving ports occupied by data transmission, which improves the performance of the database system, meets the storage, analysis and computation requirements of massive data, and allows the database system to be deployed at a larger scale.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data transmission method and a database system.
Background
A shared-nothing distributed database system comprises a main node and a plurality of distributed computing nodes (hosts). A main instance is deployed in the main node, and a plurality of computing instances can be deployed in each computing node. The main instance and each computing instance transmit data over network connections based on the Transmission Control Protocol (TCP). Before data is transmitted, a TCP network connection must be established between the main node and each computing instance in the database system, and a TCP port must be allocated to both the sender and the receiver of each connection.
For a large-scale distributed database, a large number of computing nodes must be deployed, and a plurality of computing instances may be deployed in each computing node. When query tasks run concurrently, or even highly concurrently, data transmission between each computing instance and the main instance occupies a large number of TCP ports, yet the number of TCP ports is limited.
Thus, large-scale database systems cannot be deployed on the basis of TCP.
Disclosure of Invention
To solve the technical problem or at least partially solve the technical problem, the present disclosure provides a data transmission method and a database system.
In a first aspect, the present disclosure provides a data transmission method applied to a database system, where the database system includes a main instance and a plurality of computing instances, the method comprising:
the main instance respectively sends instruction sets to the plurality of computing instances;
each computing instance executes the instruction set to obtain an execution result of the instruction set corresponding to the computing instance;
and a plurality of computing instances executing the instruction sets send execution results of the instruction sets respectively corresponding to the computing instances to the same receiving port of the main instance.
Optionally, the method further includes:
and each computing instance receives data sent by other computing instances through the same receiving port and sends the data to other computing instances through the same sending port.
Optionally, the instruction set comprises a plurality of sub-instruction sets;
each computing instance executes the instruction set to obtain an execution result of the instruction set corresponding to the computing instance, and the method comprises the following steps:
each computing instance starts a plurality of executors to execute the plurality of sub-instruction sets in the instruction set, one executor per sub-instruction set, and obtains the execution result of the instruction set corresponding to the computing instance; each executor receives data sent by other executors through the same receiving port, and sends data to other executors through the same sending port.
Optionally, each computing instance receiving data sent by other computing instances through the same receiving port, and sending data to other computing instances through the same sending port, includes:
each computing instance receives data sent by other computing instances based on RUDP and sends data to other computing instances based on RUDP.
Optionally, each computing instance receiving data sent by other computing instances through the same receiving port, and sending data to other computing instances through the same sending port, includes:
each computing instance receives data sent by other computing instances based on UDP and sends data to other computing instances based on UDP.
Optionally, each executor receiving data sent by other executors through the same receiving port, and sending data to other executors through the same sending port, includes:
each executor receives data sent by other executors based on RUDP and sends data to other executors based on RUDP.
Optionally, each executor receiving data sent by other executors through the same receiving port, and sending data to other executors through the same sending port, includes:
each executor receives data sent by other executors based on UDP and sends data to other executors based on UDP.
Optionally, the plurality of computing instances executing the instruction sets sending their respective execution results to the same receiving port of the main instance includes:
the plurality of computing instances executing the instruction sets send their respective execution results to the main instance based on the Reliable User Datagram Protocol (RUDP).
Optionally, the plurality of computing instances executing the instruction sets sending their respective execution results to the same receiving port of the main instance includes:
the plurality of computing instances executing the instruction sets send their respective execution results to the main instance based on the User Datagram Protocol (UDP).
In a second aspect, the present disclosure provides a database system comprising: a main instance and a plurality of computing instances;
the database system is configured to perform the data transmission method of the database system according to the first aspect.
Compared with the prior art, the technical solution provided by the embodiments of the present disclosure has the following advantages. The main node in the database system sends an instruction set to each of the plurality of computing nodes; after receiving the instruction set from the main node, each computing node starts a computing instance to execute it; and the computing instances that executed the instruction sets send their respective execution results to the same receiving port of the main node. When the database performs a query task, the main node therefore receives the execution results from all of the computing instances through a single receiving port. This preserves query correctness while reducing the number of receiving ports occupied by data transmission, which improves the performance of the database system, meets the storage, analysis and computation requirements of massive data, and allows the database system to be deployed at a larger scale.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1A is a schematic diagram of a database system interacting with a client;
FIG. 1B is a schematic diagram of a database system;
FIG. 2 is an interaction diagram of a data transmission method according to an embodiment of the present disclosure;
FIG. 3 is an interaction diagram of another data transmission method provided in the embodiment of the present disclosure;
FIG. 4 is an interaction diagram of another data transmission method provided by the embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of a database system executing a query task;
FIG. 6 is a flow chart illustrating another database system performing a query task.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The terms used in the present invention are explained first:
Distributed database: a logically unified database formed by connecting a plurality of physically dispersed database units through a computer network. Each connected database unit is referred to as a computing node. A distributed database includes at least two computing nodes. The computing nodes may be physical computing nodes located in different places, or logical computing nodes located in the same physical database.
Join: the most important query operation in a database; it combines the query results of two or more tables.
Group by: groups the result set by one or more columns, so that the data can be classified more precisely.
Fig. 1A is a schematic diagram of a database system interacting with clients. As shown in Fig. 1A, at least one client is connected to a shared-nothing distributed database system. The shared-nothing distributed database may include a main node and a plurality of computing nodes, and the client is connected to the main node. Fig. 1A shows 3 computing nodes, namely computing node 1, computing node 2, and computing node 3; this number is only an example and does not limit the present disclosure. The main node is a server or a terminal device on which a main instance is deployed. A computing node is a server or a terminal device on which one or more computing instances may be deployed.
The main instance stores system tables including, but not limited to: node information of the computing nodes, distribution information of the computing instances within the computing nodes, metadata of the data tables, and the distribution of the data of each data table among the computing instances. In the database system, the main instance does not itself store the data of the data tables; instead, the data is stored in the corresponding data tables of the computing instances according to the distribution rule.
If the database needs to execute a task such as a query, the client sends a query request to the main instance of the main node. Optionally, the query request may be a Structured Query Language (SQL) statement.
The main instance determines an instruction set according to the query request and the distribution of data among the computing instances. The instruction set comprises the work steps that each computing instance must complete for the query request. After determining the instruction set, the main instance sends it to each computing instance, and the computing instances execute it in parallel. Optionally, if the query request is an SQL statement, operations such as lexical analysis, syntax analysis, semantic analysis, query rewriting, and query optimization may be performed on the SQL statement to determine the instruction set that each computing instance needs to execute.
The main instance sends the instruction sets to the plurality of computing instances. Each computing instance performs the corresponding database operations according to the instruction set it receives, obtains the execution result of the instruction set, and sends that result to the main instance.
The main instance performs operations such as aggregation on the execution results received from the computing instances to obtain the final query result, and sends the final query result to the client to complete the query task.
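The query flow described above (send an instruction set, execute in parallel, aggregate at the main instance) can be sketched in a few lines of Python. This is a minimal in-process illustration, not the patented implementation: the shard contents, the function names `execute_instruction_set` and `main_instance_query`, and the use of a thread pool to stand in for separate computing instances are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each computing instance holds its own part of one table.
SHARDS = {
    "computing_instance_1": [3, 5, 8],
    "computing_instance_2": [1, 9],
    "computing_instance_3": [4, 4, 6, 2],
}

def execute_instruction_set(rows):
    """Runs on a computing instance: return a partial (count, sum)."""
    return len(rows), sum(rows)

def main_instance_query(shards):
    """Send the same instruction set to every instance, then aggregate."""
    # Threads stand in for computing instances working in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(execute_instruction_set, shards.values()))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(partial_sum for _, partial_sum in partials)
    return total_count, total_sum, total_sum / total_count

count, total, average = main_instance_query(SHARDS)
```

An average cannot be computed by averaging per-instance averages, which is why each instance returns a count and a sum and the main instance performs the final aggregation.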
Fig. 1B is a schematic structural diagram of a distributed database system. Building on Fig. 1A, it further illustrates the structure of the distributed database system. As shown in Fig. 1B, the main instance stores the distribution rules of the computing instances and uses them to determine the instruction set when executing a query task. The computing instances may store the data tables of the database: table 1 and table 2 are stored in each of computing instance 1, computing instance 2, and computing instance 3. The copies of table 1 stored in the different computing instances may have the same structure but hold different data, and the same applies to table 2. Each computing instance may execute the instruction set sent by the main instance.
An application scenario of the present disclosure is described below in conjunction with the above database system structure.
With the rapid development of the Internet and the Internet of Things, data is growing explosively, and more and more applications use shared-nothing distributed databases for storage and computation. Shared-nothing distributed databases currently adopt massively parallel processing (MPP): when the database is queried, the computing instances work in parallel, and data must be transmitted among the computing instances and between the computing instances and the main instance. In such databases this transmission uses the Transmission Control Protocol (TCP). Before data can be transmitted over TCP, a TCP network connection must be established between every two instances (the main instance and the computing instances) in the database, and a TCP port number must be allocated for each network connection.
For a large-scale distributed database, a large number of computing nodes must be deployed, and a plurality of computing instances may run in each computing node at the same time while a database query task is executed. When query tasks run concurrently, or even highly concurrently, data transmission between each computing instance and the main instance occupies a large number of TCP ports, yet the number of TCP ports is limited. Thus, large-scale distributed databases cannot be deployed on the basis of TCP.
In the present disclosure, when the database system executes a query task, the main instance receives the execution results sent by the plurality of computing instances that executed the instruction sets through the same receiving port. This preserves query correctness while reducing the number of receiving ports occupied by data transmission, which improves the performance of the database system, meets the storage, analysis and computation requirements of massive data, and allows a larger-scale database system to be deployed.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in detail with specific embodiments.
Fig. 2 is an interaction diagram of a data transmission method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method of this embodiment is executed by a database system that includes a main instance and a plurality of computing instances. Fig. 2 shows 2 computing instances by way of example, namely computing instance 1 and computing instance 2; the present disclosure does not limit the number of computing instances. The method of this embodiment is as follows:
S201, the main instance sends an instruction set to each of the plurality of computing instances.
When there is a query task, the main instance generates the instruction set for each computing instance according to how the data is distributed among the computing instances. The main instance may send the instruction sets of all computing instances to every computing instance, or may send each computing instance only its own instruction set. An instruction set includes, but is not limited to, the database operations to be performed by the corresponding computing instance, such as redistribution, join, and group by. Redistribution means redistributing the data stored in the computing instances of the database among those computing instances according to a certain rule. For example, a hash value is computed over one column of the data (the distribution column), and rows with the same or similar hash values are stored in the same computing instance, so that the database is stored in a distributed manner by hash distribution. The instruction set may also be referred to as an execution plan.
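As a rough illustration of hash distribution, the sketch below assigns each row to a computing instance from the hash of its distribution column. The choice of CRC32 as the hash, the modulo placement rule, the instance count, and the sample table are assumptions for the example; the source does not specify a hash function.

```python
import zlib

NUM_INSTANCES = 3  # assumed number of computing instances

def target_instance(value, num_instances=NUM_INSTANCES):
    """Map a distribution-column value to a computing instance index."""
    # CRC32 is used because it is deterministic across runs (an assumption,
    # not the hash named by the source).
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % num_instances

def distribute(rows, column):
    """Place every row on the instance chosen by its distribution column."""
    placement = {i: [] for i in range(NUM_INSTANCES)}
    for row in rows:
        placement[target_instance(row[column])].append(row)
    return placement

# Hypothetical table with "customer" as the distribution column.
rows = [{"order_id": i, "customer": f"c{i % 4}"} for i in range(12)]
placement = distribute(rows, "customer")
```

Because the placement depends only on the column value, all rows that share a value of the distribution column end up on the same computing instance.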
Optionally, before S201, the method may further include: the main instance determines the instruction set according to the query request sent by the client.
S202, each computing instance executes the instruction set to obtain the execution result of the instruction set corresponding to that computing instance.
When an operation such as joining two data tables on non-distribution columns or grouping a table by a non-distribution column is required, each computing instance allocates storage resources and computing resources for executing the instruction set and then executes the instructions in it to obtain its corresponding execution result. Optionally, the distribution rule may place rows with the same hash value in the same computing instance, or may divide the hash values into segment intervals and place rows whose hash values fall within the same interval in the same computing instance. The present invention does not limit how data is stored according to hash value.
S203, the plurality of computing instances executing the instruction sets send their respective execution results to the same receiving port of the main instance.
The main instance receives the execution results of the instruction sets from the plurality of computing instances through the same receiving port. This receiving port has a unique port identifier in the database system, and the main instance uses the port identified by it to receive the execution results sent by each of the computing instances.
In one possible implementation, the multiple computation instances executing the instruction sets send execution results of their respective corresponding instruction sets to the main instance based on a User Datagram Protocol (UDP).
Optionally, on the basis of UDP, a confirmation mechanism for sending data, a retransmission mechanism for data transmission failure, a congestion control mechanism, and other mechanisms may be added, so as to ensure reliability of data transmission.
Because UDP-based data transmission does not require a persistent connection to be established, one receiving port can receive data sent from a plurality of ports. Therefore, with UDP, the main instance can use the same receiving port to receive the execution results sent by a plurality of computing instances.
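The property relied on here can be demonstrated with Python's standard `socket` module: a single bound UDP socket receives datagrams from several independent sender sockets. This is only a loopback sketch; the function name `run_demo`, the payload strings, and the use of an OS-assigned port are assumptions for illustration, and plain UDP as shown provides no delivery guarantee.

```python
import socket

def run_demo(num_senders=3):
    # Main instance: one UDP socket bound to one receiving port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(2.0)
    address = receiver.getsockname()

    # Computing instances: each has its own socket, all target the same port.
    senders = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
               for _ in range(num_senders)]
    for index, sender in enumerate(senders):
        sender.sendto(f"result-from-instance-{index}".encode(), address)

    # recvfrom reports the source address, so results can be told apart
    # even though they all arrive on one port.
    received = {}
    for _ in range(num_senders):
        payload, source = receiver.recvfrom(4096)
        received[source] = payload.decode()

    for sender in senders:
        sender.close()
    receiver.close()
    return address[1], received

port, received = run_demo()
```

With TCP, the same exchange would require one accepted connection per sender; here the receiver holds a single socket regardless of how many senders there are.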
Further, a UDP-based network connection may be established in advance.
In another possible implementation, the plurality of computing instances executing the instruction sets send their respective execution results to the main instance based on the Reliable User Datagram Protocol (RUDP).
RUDP adds protocol features such as data retransmission on top of UDP, which ensures that data is transmitted correctly and reliably. For example, in RUDP-based data transmission the sending end limits the amount of data in flight with a sliding window, which supports retransmission on failure, message acknowledgement, and congestion control. Each data packet sent by the sending end carries an increasing sequence number. When the receiving end receives a data packet, it sends an acknowledgement to the sending end, and it orders and assembles the received packets by sequence number to recover the correct data. If the sending end does not receive an acknowledgement from the receiving end within a preset time period after sending a data packet, it retransmits the packet. This ensures the correctness of the transmitted data, so that reliable data transmission is achieved while the number of receiving ports occupied by data transmission is reduced.
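The interplay of sequence numbers, acknowledgements, retransmission, and the sliding window can be sketched as a small in-memory simulation. It is not an implementation of any particular RUDP specification: the function `transfer`, the `drop_first_attempt` loss model, the assumption that acknowledgements are never lost, and the use of a loop iteration in place of a real timeout are all simplifications introduced for the example.

```python
def transfer(chunks, drop_first_attempt=frozenset(), window=2):
    """Simulate sliding-window delivery with acknowledgement and retransmission.

    drop_first_attempt: sequence numbers whose first transmission is lost.
    Returns (reassembled_data, number_of_transmissions).
    """
    received = {}      # receiver buffer: sequence number -> chunk
    acked = set()      # sequence numbers acknowledged to the sender
    attempts = {}      # sequence number -> transmissions so far
    transmissions = 0
    base = 0           # lowest unacknowledged sequence number

    while base < len(chunks):
        # Send every unacknowledged packet in the window. A second pass over
        # the same packet stands in for a retransmission after a timeout.
        for seq in range(base, min(base + window, len(chunks))):
            if seq in acked:
                continue
            attempts[seq] = attempts.get(seq, 0) + 1
            transmissions += 1
            lost = seq in drop_first_attempt and attempts[seq] == 1
            if not lost:
                received[seq] = chunks[seq]  # receiver stores by sequence number
                acked.add(seq)               # acknowledgement reaches the sender
        # Slide the window past every packet acknowledged in order.
        while base in acked:
            base += 1

    # The receiver assembles the data in sequence-number order.
    data = b"".join(received[seq] for seq in sorted(received))
    return data, transmissions

data, transmissions = transfer([b"ab", b"cd", b"ef", b"gh"], drop_first_attempt={1})
```

In this run the packet with sequence number 1 is lost once and sent again, so four packets cost five transmissions while the reassembled data is still complete and in order.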
Further, the network connection based on the RUDP may be established in advance.
Optionally, before S203, the master instance allocates a receiving port, and the computing instance allocates a sending port.
In this embodiment, the main instance in the database system sends an instruction set to each of the plurality of computing instances, each computing instance executes the instruction set to obtain its corresponding execution result, and the computing instances that executed the instruction sets send their respective execution results to the same receiving port of the main instance. When the database performs a query task, the main instance receives the execution results from all of the computing instances through a single receiving port. This preserves query correctness while reducing the number of receiving ports occupied by data transmission, which improves the performance of the database system, meets the storage, analysis and computation requirements of massive data, and allows the database system to be deployed at a larger scale.
Building on the foregoing embodiments, the main instance parses the received query task to generate an instruction set that each computing instance can execute. The instruction set may or may not include a redistribution operation. If it does, redistribution involves data transmission between computing instances. For example, when executing a redistribution operation, a computing instance reads the table to be redistributed that it stores locally, hashes the data in the non-distribution column, and redistributes the hashed data to the computing instances according to the distribution rule.
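The redistribution step can be sketched as follows: each computing instance scans its local rows, hashes a column that the table was not originally distributed by, and routes every row to the instance that owns that hash value. The CRC32 hash, the modulo rule, the round-robin starting layout, and the sample table are assumptions for illustration; rows are moved between Python lists rather than over a network.

```python
import zlib

def owner(value, num_instances):
    """Pick the computing instance for a value (CRC32 is an assumed hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_instances

def redistribute(local_rows, column):
    """local_rows[i] holds the rows currently stored on computing instance i.

    Returns (new_layout, rows_sent), where rows_sent counts the rows that had
    to travel to a different computing instance.
    """
    n = len(local_rows)
    new_layout = [[] for _ in range(n)]
    rows_sent = 0
    for source, rows in enumerate(local_rows):
        for row in rows:
            target = owner(row[column], n)
            new_layout[target].append(row)
            if target != source:
                rows_sent += 1
    return new_layout, rows_sent

# Hypothetical table, initially spread round-robin over 3 instances.
rows = [{"order_id": i, "customer": f"c{i % 4}"} for i in range(12)]
before = [rows[i::3] for i in range(3)]
after, rows_sent = redistribute(before, "customer")
```

Every row counted in `rows_sent` corresponds to data that one computing instance must transmit to another, which is the traffic the shared sending and receiving ports are meant to carry.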
However, a large-scale distributed database requires a large number of computing nodes, and a plurality of computing instances may be deployed in each computing node. If the instruction set being executed includes redistribution, data migration is also involved, that is, computing instances send data to and receive data from one another. If TCP is used for this transmission, every two computing instances communicate through one receiving port and one sending port, so the more computing instances the database system has, the more ports are occupied, and in concurrent or highly concurrent scenarios the number grows further. In other words, the database system must allocate a large number of port numbers among the computing instances. TCP ports, however, are limited; for example, TCP in the Linux operating system can only use port numbers from 1025 to 65535. This may prevent the system from executing query tasks normally. The following specific examples further illustrate how embodiments of the present disclosure solve this problem.
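A back-of-the-envelope calculation shows how quickly pairwise connections can exhaust the port range quoted above. The counting model is an assumption made for illustration (one local port per peer per concurrent query, which is a conservative lower bound on the text's "one receiving port and one sending port" per pair), and the deployment figures of 512 instances, 8 instances per node, and 20 concurrent queries are hypothetical.

```python
USABLE_PORTS = 65535 - 1025 + 1  # range quoted in the text: 1025..65535

def ports_per_node_pairwise(total_instances, instances_per_node, concurrent_queries):
    """Ports one node needs when every pair of instances has its own
    connection for every concurrent query (assumed counting model)."""
    peers = total_instances - 1
    return instances_per_node * peers * concurrent_queries

def ports_per_node_shared(instances_per_node):
    """Ports one node needs when each instance has exactly one receiving
    port and one sending port, as in the disclosed method."""
    return instances_per_node * 2

pairwise = ports_per_node_pairwise(512, 8, 20)
shared = ports_per_node_shared(8)
exhausted = pairwise > USABLE_PORTS
```

Under these assumed numbers a single node would need 81,760 ports with pairwise connections, more than the 64,511 in the quoted range, while the shared-port scheme needs 16 regardless of cluster size or concurrency.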
Fig. 3 is an interaction diagram of another data transmission method of a database system according to an embodiment of the present disclosure. Fig. 3 builds on the embodiment shown in Fig. 2. As shown in Fig. 3, S202 includes S202a:
S202a, each computing instance executes the instruction set to obtain its corresponding execution result. While the computing instances execute the instruction set, each computing instance receives data sent by other computing instances through the same receiving port and sends data to other computing instances through the same sending port.
Each computing instance uses the same sending port to send table data of the database; this sending port can be used both to send database data to other computing instances and to send the execution result of the instruction set to the main instance. Each computing instance likewise uses the same receiving port to receive table data of the database, for example data sent by other computing instances during redistribution, where the other computing instances are the computing instances in the database system other than the receiving computing instance itself.
In one possible implementation, each computing instance receives data sent by other computing instances based on UDP, and sends data to other computing instances based on UDP.
In another possible implementation, each compute instance receives data sent by other compute instances based on the RUDP and sends data to other compute instances based on the RUDP.
In this embodiment, each computing instance receives data sent by other computing instances through the same receiving port and sends data to other computing instances through the same sending port, so that fewer receiving and sending ports are occupied by data transmission while query correctness is preserved. This improves the performance of the database system, meets the demands of storing, analyzing, and computing over massive data, and allows larger-scale database systems to be deployed.
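The shared receiving port relies on a property of datagram sockets: one bound UDP socket can accept datagrams from any number of peers. The following minimal sketch demonstrates this on the loopback interface with Python's standard `socket` module. The function name and payloads are hypothetical, and plain UDP as used here gives no delivery guarantee, so this is a demonstration of port sharing only, not of the disclosed system.

```python
import socket


def demo_shared_receive_port(num_senders: int = 3) -> list:
    """Receive datagrams from several senders on one UDP socket."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(2.0)
    address = receiver.getsockname()

    # Each sender stands in for another computing instance.
    senders = [
        socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(num_senders)
    ]
    try:
        for index, sender in enumerate(senders):
            sender.sendto(f"rows from instance {index}".encode(), address)
        received = []
        for _ in range(num_senders):
            payload, _peer = receiver.recvfrom(65535)
            received.append(payload)
        return sorted(received)
    finally:
        for sender in senders:
            sender.close()
        receiver.close()
```

The same holds in the sending direction: a single UDP socket can call `sendto` with a different destination address each time, so one sending port is enough to reach every peer.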
On the basis of the above embodiment, where the instruction set includes a redistribution operation and involves database operations on multiple tables, the instruction set corresponding to each computing instance may be divided into sub-instruction sets according to the database operations in the instruction set and the tables they operate on. Each computing instance then starts as many threads as there are sub-instruction sets and executes the sub-instruction sets in parallel on those threads, which improves the execution efficiency of the database system.
In the prior art, however, the threads started within a computing instance exchange data over TCP: a sending port and a receiving port must be allocated to establish a TCP connection between every two threads that need to exchange data. Because the number of TCP ports in the system is limited, the system may be unable to complete normal query tasks, and a large-scale database system cannot be deployed. The following specific examples further illustrate how embodiments of the present disclosure solve this problem.
Fig. 4 is an interaction diagram of a data transmission method of a database system according to another embodiment of the present disclosure. Fig. 4 builds on the embodiment shown in fig. 2 or fig. 3, with the instruction set including a plurality of sub-instruction sets and each computing instance including a plurality of sub-computing instances (executors). As shown in fig. 4, S202 includes S202b:
S202b: each computing instance starts a plurality of executors to execute the corresponding sub-instruction sets in the instruction set, wherein each executor receives data sent by other executors through the same receiving port and sends data to other executors through the same sending port.
A sub-instruction set may also be called an execution plan slice (Slice). The main instance forms the sub-instruction sets by dividing the instruction set according to the redistribution operations that the instruction set contains. The main instance may divide the instruction set according to the tables containing non-distribution columns to obtain a plurality of sub-instruction sets, and sends the sub-instruction sets of each computing instance to that computing instance. For example, the data of a table with a non-distribution column must be hashed and the hashed data sent to each computing instance; this hash redistribution of the table is divided out as one sub-instruction set. Each computing instance allocates storage resources and computing resources to executors according to the received sub-instruction sets; each sub-instruction set corresponds to one executor, and the executor executes the instructions of its sub-instruction set. While the executors execute their sub-instruction sets, data may be sent and received between them. Each executor receives data sent by other executors through the same receiving port and sends data to other executors through the same sending port, where the other executors are the executors started by the plurality of computing instances other than the executor itself.
In a possible implementation manner, each executor receives data sent by other executors based on the UDP, and sends data to other executors based on the UDP.
In another possible implementation manner, each executor receives data sent by other executors based on the RUDP, and sends data to other executors based on the RUDP.
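The disclosure names RUDP but does not describe its internals in this section. As a purely illustrative sketch of what "reliable" adds on top of UDP, the following in-memory simulation uses sequence numbers, acknowledgements, and retransmission in a stop-and-wait style. Every class and function name is hypothetical, the deterministic `LossyChannel` stands in for an unreliable network, and a real RUDP implementation would be considerably more elaborate.

```python
class LossyChannel:
    """Drops every `drop_every`-th packet; stands in for an unreliable UDP link."""

    def __init__(self, drop_every: int):
        self.drop_every = drop_every
        self.sent = 0

    def deliver(self, packet):
        self.sent += 1
        if self.drop_every and self.sent % self.drop_every == 0:
            return None  # packet lost
        return packet


class Receiver:
    """Accepts packets in order and ignores duplicates."""

    def __init__(self):
        self.expected_seq = 0
        self.delivered = []

    def on_packet(self, packet):
        seq, payload = packet
        if seq == self.expected_seq:
            self.delivered.append(payload)
            self.expected_seq += 1
        # Acknowledge the last in-order packet, also for duplicates.
        return self.expected_seq - 1


def send_reliably(payloads, channel, receiver, max_attempts=10):
    """Stop-and-wait: resend each packet until its acknowledgement arrives."""
    for seq, payload in enumerate(payloads):
        for _ in range(max_attempts):
            arrived = channel.deliver((seq, payload))
            if arrived is None:
                continue  # data packet lost: retransmit
            ack = channel.deliver(receiver.on_packet(arrived))
            if ack == seq:
                break  # acknowledged, move on to the next packet
        else:
            raise RuntimeError(f"packet {seq} was not acknowledged")
    return receiver.delivered
```

With `LossyChannel(4)`, some acknowledgements are lost, the sender retransmits, and the receiver discards the duplicates, so each payload is delivered exactly once and in order.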
Optionally, the main instance may divide the instruction set into sub-instruction sets at data movement operation nodes (Motion). A data movement operation node is an operation node placed in the instruction set according to the tables the instruction set involves and the database operations executed on them. Taking the data movement operation nodes as boundaries, the complete instruction set is divided from bottom to top into a plurality of sub-instruction sets, each responsible for executing a part of the plan. Each data movement operation node is split into a data sending side and a data receiving side, giving an upper and a lower sub-instruction set. The lowermost operation node of the upper execution plan fragment is a data receiving operation node, which receives the redistributed data sent by the lower-layer executors of the computing instances; the uppermost operation node of the lower execution plan fragment is a data sending operation node, which sends data to the receiving ends of the upper-layer executors.
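The splitting rule can be sketched on a toy plan tree. In the hedged example below, `PlanNode`, the operator names, and the numbering scheme are assumptions made for illustration. The traversal runs top-down for brevity, whereas the text describes a bottom-up division; the resulting fragments are the same for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    op: str
    children: list = field(default_factory=list)


def split_into_slices(root: PlanNode) -> dict:
    """Cut the plan at every Motion node; return {slice number: operators, top-down}."""
    slices = {1: []}
    next_id = 1

    def visit(node, slice_id):
        nonlocal next_id
        if node.op == "Motion":
            # The upper slice ends in a receiving node ...
            slices[slice_id].append("Receive")
            # ... and a new lower slice starts with a sending node.
            next_id += 1
            child_slice = next_id
            slices[child_slice] = ["Send"]
            for child in node.children:
                visit(child, child_slice)
        else:
            slices[slice_id].append(node.op)
            for child in node.children:
                visit(child, slice_id)

    visit(root, 1)
    return slices


# A join of table S with a redistributed table T.
plan = PlanNode("Join", [PlanNode("Scan S"), PlanNode("Motion", [PlanNode("Scan T")])])
print(split_into_slices(plan))
```

For this plan the upper slice holds the join, the scan of S, and the receiving node, while the lower slice holds the sending node and the scan of T.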
Optionally, each sub-instruction set carries an execution order identifier, which the main instance assigns when it divides the instruction set into sub-instruction sets according to the execution order of the instructions in the instruction set. For example, the identifier may be a number: the sub-instruction set executed last is numbered 1, the one executed immediately before it is numbered 2, and so on. Each executor executes its sub-instruction set and sends the resulting sub-execution result to the corresponding executor according to the number.
The method of this embodiment is described below with reference to fig. 5, which is a schematic flow chart of a database system executing a query task. As shown in fig. 5, the database system includes a main instance and N computing instances. After receiving a client SQL request, the main instance establishes a TCP connection and performs lexical, syntactic, and semantic analysis, query rewriting, and query optimization on the SQL statement to generate a distributed instruction set, which may also be called a distributed execution plan.
If the instruction set contains a redistribution operation, data movement operation nodes must be added to the distributed execution plan to redistribute the data; the distributed execution plan of a complex query that joins multiple tables may contain several data movement operation nodes. If the execution plan contains a data movement operation node, the main instance may split the execution plan from bottom to top into M execution plan fragments, with the data movement operation nodes as boundaries. Each execution plan fragment is a part of the instructions of the execution plan. At the position of each data movement operation node, the plan is split into a data sending side and a data receiving side, giving an upper and a lower execution plan fragment. The lowermost operation node of the upper execution plan fragment is a data receiving operation node, which receives the rehashed data sent by the lower execution plan fragment of each computing instance. The uppermost operation node of the lower execution plan fragment is a data sending operation node, which redistributes local data to the receiving side of the upper execution plan fragment of each computing instance.
The scheduler of the main instance establishes TCP connections with each computing instance according to the execution plan fragments; each execution plan fragment corresponds to one query executor process of the computing instance, reached over a TCP connection. The scheduler then distributes the distributed execution plan and a fragment number to each query executor of each computing instance over these TCP connections. For example, query executor 1 is sent the distributed execution plan and fragment number 1, and query executor M is sent the distributed execution plan and fragment number M.
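The dispatch step amounts to sending every query executor the whole plan together with the number of the fragment it is responsible for. A minimal sketch follows; the message layout (a tuple of instance id, fragment number, and plan) and both function names are hypothetical, and the TCP transport itself is left out.

```python
def build_dispatch_messages(plan_fragments: dict, num_instances: int) -> list:
    """One message per query executor: (instance id, fragment number, full plan)."""
    return [
        (instance, number, plan_fragments)
        for instance in range(num_instances)
        for number in sorted(plan_fragments)
    ]


def fragment_for(message) -> list:
    """What a query executor does on receipt: pick its fragment by number."""
    _instance, number, plan = message
    return plan[number]


# Two fragments dispatched to two computing instances gives four executors.
fragments = {1: ["Receive", "Join"], 2: ["Scan T", "Send"]}
messages = build_dispatch_messages(fragments, num_instances=2)
```

Sending the full plan plus a number, rather than only the fragment, matches the description above, in which each executor later looks up its own fragment by number.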
Each computing instance establishes an RUDP network connection from its executors to the main instance for transmitting data. For a plan fragment that contains a data movement operation node, RUDP network connections for data transmission between the executors of the computing instances must also be established; these comprise a lower-layer data sending port and an upper-layer data receiving port.
Each query executor in each computing instance obtains its plan fragment from the distributed execution plan according to the fragment number and then performs the corresponding operations. Each computing instance executes its execution plan fragments from bottom to top. If the operation node of a plan fragment is a data sending node and the data is destined for upper-layer query executors of other computing instances, the data is rehashed and sent to the upper-layer executor of each computing instance through the RUDP-based internal network; if the data is destined for the main instance, it is sent directly without hashing. If the operation node is a data receiving node, it receives the data of the lower-layer executors of every computing instance through the RUDP-based internal network. This guarantees that data with the same hash value ends up in the same computing instance, so all computing instances can perform join or grouping operations locally and in parallel. When execution completes, if the computing instance has multiple executors, the result data continues to be sent to the upper-layer executor through the RUDP-based internal network, and the topmost executor of the computing instance sends its data to the executor of the main instance.
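The guarantee that rows with the same hash value reach the same computing instance follows directly from routing each row by a deterministic hash of its key. The sketch below illustrates the routing done by a data sending node. The use of CRC32 and the function names are illustrative assumptions; the disclosure does not specify the hash function.

```python
import zlib


def target_instance(key, num_instances: int) -> int:
    """Map a key value to a computing instance (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_instances


def redistribute(local_rows, key_index: int, num_instances: int) -> dict:
    """Group one instance's local rows by the instance that should receive them."""
    outgoing = {instance: [] for instance in range(num_instances)}
    for row in local_rows:
        outgoing[target_instance(row[key_index], num_instances)].append(row)
    return outgoing


# Rows are (key, payload); both rows with key 1 land in the same bucket.
rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
buckets = redistribute(rows, key_index=0, num_instances=2)
```

Because every sender applies the same function, all rows sharing a key converge on one instance regardless of where they were originally stored, which is what allows the join or grouping to run locally.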
In this embodiment, after receiving the instruction set sent by the main instance, each computing instance starts a plurality of executors to execute the corresponding sub-instruction sets. Dividing the instruction set into sub-instruction sets allows them to be executed in parallel, which improves execution efficiency. While executing their sub-instruction sets, each executor receives data sent by other executors through the same receiving port and sends data to other executors through the same sending port. Query correctness is therefore preserved when the large amount of data generated by parallel execution is migrated, while fewer receiving and sending ports are occupied by data transmission. This improves the performance of the database system, meets the demands of storing, analyzing, and computing over massive data, and allows larger-scale database systems to be deployed.
On the basis of the above embodiment, S203 is followed by:
The main instance obtains the result of the query instruction from the execution results of the instruction set sent by the plurality of computing instances. As shown in fig. 5, the main instance aggregates the execution results received from each computing instance to obtain the result of the query instruction.
The main instance sends the results of the query instruction to the client based on TCP.
Optionally, after the main instance sends the result of the query instruction, that is, after the query execution is finished, the internal network of the data-movement-specific RUDP may be cleared, and then the TCP connection from the main instance to each executor of each compute instance may be closed.
The following describes the data transmission method of a database system of the present disclosure by taking two computing instances as an example. It should be understood that the following example is only one possible implementation and does not limit the present disclosure.
Fig. 6 is a schematic flow chart of another database system executing a query task. As shown in fig. 6, assume the query request is a join query over a table S and a table T stored in the database system, where the join condition is that the distribution column of table S equals a non-distribution column of table T. The instruction set generated for the query therefore redistributes table T on that non-distribution column and then joins it with table S, so a data movement operation node is added for the non-distribution column of table T. The instruction set is then split at each data movement operation node into an upper and a lower sub-instruction set, here sub-instruction set 1 and sub-instruction set 2. Sub-instruction set 2 scans the data of table T sequentially, rehashes the scanned data, and distributes it to other executors. Sub-instruction set 1 receives the hashed data of table T, joins it with the scanned data of table S, and sends the join result to the main instance through the sending port. The main instance establishes TCP connections with each computing instance according to the sub-instruction sets; each sub-instruction set corresponds to one TCP-connected process of the computing instance, namely an executor. The main instance then distributes the sub-instruction sets and their numbers to the executors of each computing instance. Sub-instruction set 1 corresponds to executor 1, and sub-instruction set 2 corresponds to executor 2. The method of this embodiment comprises the following steps:
1. Each executor in each computing instance obtains its sub-instruction set from the instruction set according to the number.
2. The executors initialize the internal network (interconnect) based on the RUDP protocol, establishing a network for transmitting data between the executors in each computing instance and a network for transmitting data from the topmost executor to the main instance; each network comprises a data sending end and a data receiving end.
3. Each computing instance performs the instruction operations in the instruction set from the bottom up.
Sub-instruction set 2, on executor 2 of each computing instance, executes first: it sequentially scans table T, reads each row of table T, and hands the row to the data sending end. The sending end hashes each row of table T to determine the designated executor and sends the row to executor 1 of the corresponding computing instance through the RUDP-based internal network.
Executor 1 of each computing instance takes the table T data sent by executor 2 as the outer table and joins it with the data scanned from table S; the result of the join is the execution result of the instruction set.
4. Each computing instance sends the execution result of the instruction set to the data receiving port of the main instance over the RUDP-based internal network.
5. The main instance aggregates the execution results sent back by executor 1 of each computing instance to obtain the result of the query instruction.
6. The main instance sends the result of the query instruction to the client over TCP.
Optionally, after the main instance sends the result of the query instruction, that is, after the query execution is finished, the internal network of the data-movement-specific RUDP may be cleared, and then the TCP connection from the main instance to each executor of each compute instance may be closed.
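The two-instance example can be traced end to end with a small in-memory simulation. Everything in the sketch below is an assumption made for illustration: the row layouts (`(id, name)` for table S and `(id, ref, item)` for table T), the modulo placement rule, and the function names. Network transport is replaced by plain Python lists, and table S is assumed to have unique `id` values.

```python
NUM_INSTANCES = 2


def owner(value: int) -> int:
    """Hypothetical placement rule: integer key modulo the number of instances."""
    return value % NUM_INSTANCES


def run_join(table_s, table_t):
    """Join on S.id = T.ref, where S is distributed by id and T by its own id."""
    # Initial placement: each table is stored by its distribution column.
    s_local = {i: [r for r in table_s if owner(r[0]) == i] for i in range(NUM_INSTANCES)}
    t_local = {i: [r for r in table_t if owner(r[0]) == i] for i in range(NUM_INSTANCES)}

    # Sub-instruction set 2 (executor 2): scan T, rehash on the non-distribution column.
    inbox = {i: [] for i in range(NUM_INSTANCES)}
    for i in range(NUM_INSTANCES):
        for row in t_local[i]:
            inbox[owner(row[1])].append(row)

    # Sub-instruction set 1 (executor 1): join received T rows with local S rows.
    results = []
    for i in range(NUM_INSTANCES):
        s_by_id = {r[0]: r for r in s_local[i]}
        partial = [(s_by_id[t[1]][1], t[2]) for t in inbox[i] if t[1] in s_by_id]
        results.extend(partial)  # the main instance gathers each partial result
    return sorted(results)


table_s = [(1, "alice"), (2, "bob")]
table_t = [(10, 1, "x"), (11, 2, "y"), (12, 1, "z"), (13, 3, "w")]
print(run_join(table_s, table_t))
# [('alice', 'x'), ('alice', 'z'), ('bob', 'y')]
```

After redistribution, every row of T sits on the instance that stores the matching row of S, so each instance can compute its part of the join without further communication.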
The present disclosure provides a database system, the database system of this embodiment includes: a main instance and a plurality of compute instances. The database system is used to perform the data transmission method of the database system as described in any one of fig. 2 to 5 above.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A data transmission method, applied to a database system, the database system comprising: a main instance and a plurality of computing instances, the method comprising:
the main instance respectively sends instruction sets to the plurality of computing instances;
each computing instance executes the instruction set to obtain an execution result of the instruction set corresponding to the computing instance;
and a plurality of computing instances executing the instruction sets send execution results of the instruction sets respectively corresponding to the computing instances to the same receiving port of the main instance.
2. The method of claim 1, further comprising:
and each computing instance receives data sent by other computing instances through the same receiving port and sends the data to other computing instances through the same sending port.
3. The method of claim 1, wherein the instruction set comprises a plurality of sub-instruction sets;
each computing instance executes the instruction set to obtain an execution result of the instruction set corresponding to the computing instance, and the method comprises the following steps:
each computing instance starts a plurality of executors to correspondingly execute a plurality of sub-instruction sets in the instruction set, and an execution result of the instruction set corresponding to the computing instance is obtained; each executor receives data sent by other executors through the same receiving port, and sends data to other executors through the same sending port.
4. The method of claim 2, wherein each computing instance receiving data sent by other computing instances through the same receiving port and sending data to other computing instances through the same sending port comprises:
each computing instance receives data sent by other computing instances based on a Reliable User Datagram Protocol (RUDP) and sends data to other computing instances based on the RUDP.
5. The method of claim 2, wherein each computing instance receiving data sent by other computing instances through the same receiving port and sending data to other computing instances through the same sending port comprises:
each computing instance receives data sent by other computing instances based on a User Datagram Protocol (UDP) and sends data to other computing instances based on the UDP.
6. The method of claim 3, wherein each executor receiving data sent by other executors through the same receiving port and sending data to other executors through the same sending port comprises:
each executor receives data sent by other executors based on the RUDP and sends data to other executors based on the RUDP.
7. The method of claim 3, wherein each executor receiving data sent by other executors through the same receiving port and sending data to other executors through the same sending port comprises:
each executor receives data sent by other executors based on the UDP and sends data to other executors based on the UDP.
8. The method according to any of claims 1-7, wherein the sending, by the plurality of computing instances executing the instruction set, of the execution results of their respective corresponding instruction sets to the same receiving port of the main instance comprises:
the plurality of computing instances executing the instruction set send the execution results of their respective corresponding instruction sets to the main instance based on RUDP.
9. The method according to any of claims 1-7, wherein the sending, by the plurality of computing instances executing the instruction set, of the execution results of their respective corresponding instruction sets to the same receiving port of the main instance comprises:
the plurality of computing instances executing the instruction set send the execution results of their respective corresponding instruction sets to the main instance based on UDP.
10. A database system, comprising: a main instance and a plurality of computing instances;
the database system is adapted to perform a data transmission method as claimed in any one of the preceding claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011001547.5A CN112202859B (en) | 2020-09-22 | 2020-09-22 | Data transmission method and database system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112202859A (en) | 2021-01-08 |
CN112202859B (en) | 2024-02-23 |
Family
ID=74015839
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202011001547.5A | Active | CN112202859B (en) | 2020-09-22 | 2020-09-22 | Data transmission method and database system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112202859B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110219035A1 (en) * | 2000-09-25 | 2011-09-08 | Yevgeny Korsunsky | Database security via data flow processing |
US20110289508A1 (en) * | 2010-05-18 | 2011-11-24 | Salesforce.Com | Methods and systems for efficient api integrated login in a multi-tenant database environment |
US20130262425A1 (en) * | 2012-04-03 | 2013-10-03 | Sas Institute Inc. | Techniques to perform in-database computational programming |
US20160026684A1 (en) * | 2014-07-22 | 2016-01-28 | Oracle International Corporation | Framework for volatile memory query execution in a multi node cluster |
US9405634B1 (en) * | 2014-06-27 | 2016-08-02 | Emc Corporation | Federated back up of availability groups |
CN106250566A (en) * | 2016-08-31 | 2016-12-21 | 天津南大通用数据技术股份有限公司 | A kind of distributed data base and the management method of data operation thereof |
CN106599043A (en) * | 2016-11-09 | 2017-04-26 | 中国科学院计算技术研究所 | Middleware used for multilevel database and multilevel database system |
CN107070753A (en) * | 2017-06-15 | 2017-08-18 | 郑州云海信息技术有限公司 | A kind of data monitoring method of distributed cluster system, apparatus and system |
CN109726250A (en) * | 2018-12-27 | 2019-05-07 | 星环信息科技(上海)有限公司 | Data-storage system, metadatabase synchronization and data cross-domain calculation method |
CN109933631A (en) * | 2019-03-20 | 2019-06-25 | 江苏瑞中数据股份有限公司 | Distributed parallel database system and data processing method based on Infiniband network |
CN110389900A (en) * | 2019-07-10 | 2019-10-29 | 深圳市腾讯计算机系统有限公司 | A kind of distributed experiment & measurement system test method, device and storage medium |
CN111506602A (en) * | 2020-04-20 | 2020-08-07 | 上海达梦数据库有限公司 | Data query method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
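Zou Yijun et al.: "A remote monitoring, management, and authorization system for wire electrical discharge machine tools based on the Internet of Things", 电加工与模具 (Electromachining & Mould) * |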
Also Published As
Publication number | Publication date |
---|---|
CN112202859B (en) | 2024-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | Address after: 100102 201, 2 / F, 101, No. 5 building, No. 7 Rongda Road, Chaoyang District, Beijing. Patentee after: China Electronics Technology Group Jincang (Beijing) Technology Co.,Ltd. Country or region after: China. Address before: 100102 201, 2 / F, 101, No. 5 building, No. 7 Rongda Road, Chaoyang District, Beijing. Patentee before: BEIJING KINGBASE INFORMATION TECHNOLOGIES Inc. Country or region before: China |