CN105824957B - Query engine system and query method for a distributed in-memory columnar database - Google Patents

Info

Publication number
CN105824957B
Authority
CN
China
Prior art keywords
query engine
subtask
state
data
main
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610193220.XA
Other languages
Chinese (zh)
Other versions
CN105824957A (en)
Inventor
段翰聪
王瑾
闵革勇
聂晓文
郑松
张博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201610193220.XA
Publication of CN105824957A
Application granted
Publication of CN105824957B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations
    • G06F16/2456 Join operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a query engine system and a query method for a distributed in-memory columnar database. The query method includes: a resource management module designates a master query engine to handle the session with the user; the master query engine converts the SQL statement sent by the user into a query plan; the resource management module allocates slave query engines to the master query engine; the master query engine divides the query plan into at least two subtasks and assigns a slave query engine to each subtask; each slave query engine executes the current subtask once all predecessor subtasks of the current subtask have completed, transmits the intermediate data produced by the current subtask to the slave query engine hosting the successor subtask, and sends the completion status of the current subtask to the master query engine; the master query engine notifies the client to fetch the final result data from a slave query engine. The query engine system and query method provided by the invention achieve good query efficiency.

Description

Query engine system and query method for a distributed in-memory columnar database
Technical field
The present invention relates to the field of database technology, and in particular to a query engine system and a query method for a distributed in-memory columnar database.
Background art
NewSQL is the collective name for a range of new scalable, high-performance databases. Such databases offer the storage and management capacity of NoSQL systems for massive data while retaining the ACID and SQL support of traditional databases. NewSQL systems fall roughly into three categories. New architectures use an entirely new database platform and take a different design approach; examples are Google Spanner, Clustrix, VoltDB and MemSQL. SQL query engines are highly optimized SQL storage engines that provide the same programming interface as MySQL but scale better than the built-in InnoDB engine. Transparent sharding provides a sharding middleware layer, so that the database is automatically split across multiple nodes. Over time these three categories have gradually merged, giving rise to large-scale distributed in-memory columnar databases for online analytical processing (OLAP).
The query engine is the core of a database system and is responsible for scheduling the execution of all query computation tasks in the system. A SQL statement entered by a user first undergoes lexical and syntactic analysis to produce a syntax tree; the query optimizer then transforms the syntax tree, which is finally converted into a query plan that the query engine can recognize. The query plan tells the query engine how to execute the query: how to extract data from the underlying storage engine and how to transform that data into the result the user wants.
HIVE is a data warehouse tool based on Hadoop. It provides simple SQL query functionality and converts SQL statements into MapReduce tasks for execution. For the SQL statement SELECT c_custkey FROM customer JOIN nation ON customer.C_NATIONKEY = nation.N_NATIONKEY JOIN lineitem ON lineitem.L_PARTKEY = customer.C_CUSTKEY, the query plan and task execution flow of HIVE are shown in Fig. 1. What HIVE actually executes are MapReduce tasks, so the query plan is converted into a set of MapReduce tasks; here the original query plan is converted into two MapReduce tasks. JOB1 computes Join1, the join of the lineitem table and the customer table. JOB2 computes Join2, the join of the Join1 result and the nation table, and outputs the final result. JOB2 can start only after JOB1 has finished and written its intermediate result data to an external storage system; JOB2 then reads that intermediate result back from the external storage system before computing. The drawback of HIVE is clear: because its lower layer is the MapReduce computing framework, two MapReduce tasks can share data only through an external storage system (a distributed file system or a local file system). The first task writes its result there and the second task reads it back, which causes a large amount of disk I/O and gives the whole query process high latency.
Spark-SQL is another data warehouse tool with functionality similar to HIVE, but its lower layer uses the Spark computing model instead of the MapReduce computing framework. For the SQL statement SELECT c_custkey FROM customer JOIN nation ON customer.C_NATIONKEY = nation.N_NATIONKEY JOIN lineitem ON lineitem.L_PARTKEY = customer.C_CUSTKEY, the query plan and task execution flow of Spark-SQL are shown in Fig. 2. Stage1 handles ScanTable(lineitem) and ScanTable(customer) in the query plan, which correspond to RDD1 and RDD2 respectively. Because an RDD is a resilient distributed dataset that spans multiple physical nodes, each of which executes the corresponding task, one RDD is produced by multiple tasks executed in parallel; for example, RDD1 is computed by Task1-1 and Task1-2. After the contents of the lineitem and customer tables have been read, Stage2 handles the Join1 operation and the ScanTable(nation) operation, producing RDD3 and RDD4 respectively. Finally, Stage3 completes the Join2 operation. Spark-SQL has much lower computing latency than HIVE, but some drawbacks remain.
First, Spark-SQL is implemented in Scala and runs entirely on the Java Virtual Machine, so its memory management depends on that of the Java Virtual Machine. The memory management of the Java Virtual Machine is general-purpose and is not customized or optimized for database query engines, which causes Spark-SQL to consume a large amount of memory during computation.
Second, Spark-SQL executes tasks stage by stage: Stage2 can start only after Stage1 has completed, and Stage3 only after Stage2 has completed. Each stage contains several tasks that can run in parallel, and the latency of a stage is determined by its longest-running task. This creates a problem: a task that finishes quickly must wait for the unfinished tasks in the same stage, and the tasks of the next stage can start only after every task in the current stage has completed. For example, suppose Stage1 contains Task1-1, Task1-2, Task2-1 and Task2-2, Stage2 contains Task3-1 and Task3-2, and Task3-1 depends on the results of Task1-1, Task1-2 and Task2-1. If Task1-1, Task1-2 and Task2-1 have completed but Task2-2 has not, then even though Task3-1 meets its execution condition, under the constraints of the Spark computing framework it still cannot start and must wait until Task2-2 completes. If Task2-2 runs too long, the latency of the whole computation suffers.
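The waiting effect described above can be illustrated with a small calculation. The following is a minimal sketch, not part of the patent: the task durations are hypothetical values chosen for illustration, and the two helper functions are assumed names that only compare the earliest start time of Task3-1 under a stage barrier with its start time when only its own predecessors are awaited.

```python
# Hypothetical task durations in seconds, chosen only for illustration.
durations = {"Task1-1": 2, "Task1-2": 3, "Task2-1": 2, "Task2-2": 10}

stage1 = ["Task1-1", "Task1-2", "Task2-1", "Task2-2"]
predecessors_of_task3_1 = ["Task1-1", "Task1-2", "Task2-1"]


def start_with_stage_barrier(stage, durations):
    """Tasks of the next stage start only when the slowest task of the stage ends."""
    return max(durations[task] for task in stage)


def start_with_dependencies(predecessors, durations):
    """A task starts as soon as its own predecessors have finished."""
    return max(durations[task] for task in predecessors)


barrier_start = start_with_stage_barrier(stage1, durations)
dependency_start = start_with_dependencies(predecessors_of_task3_1, durations)
print(barrier_start, dependency_start)  # 10 3
```

With these assumed durations, the stage barrier delays Task3-1 by the full running time of Task2-2, even though Task3-1 does not consume its output.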
Summary of the invention
The problem to be solved by the present invention is the low computational efficiency of existing database query engines.
The present invention is achieved through the following technical solutions:
A query engine system for a distributed in-memory columnar database includes a resource management module, at least one master query engine and at least one slave query engine. The master query engine converts a SQL statement into a query plan, divides the query plan into at least two subtasks, and monitors and schedules the execution of the query plan. The slave query engine executes the subtasks distributed by the master query engine. The resource management module manages and allocates system resources.
Optionally, the system resources include CPU computing resources and memory resources.
Based on the above query engine system, the present invention also provides a query method for a distributed in-memory columnar database, including: the resource management module designates a master query engine to handle the session with the user; the master query engine converts the SQL statement sent by the user into a query plan; the resource management module allocates slave query engines to the master query engine and establishes communication between the slave query engines and the master query engine; the master query engine divides the query plan into at least two subtasks and assigns a slave query engine to each subtask; each slave query engine adds its subtasks to a task queue, executes the current subtask after all predecessor subtasks of the current subtask have completed, transmits the intermediate data produced by the current subtask to the slave query engine hosting the successor subtask, and sends the completion status of the current subtask to the master query engine; after the entire query plan has completed, the master query engine notifies the client to fetch the final result data from a slave query engine.
The present invention divides the query plan into several interdependent subtasks and distributes them to the task queues of the corresponding slave query engines, which execute the subtasks in their queues in turn. This avoids the drawback of Spark-SQL, in which a task in a later stage that already meets its execution condition still cannot start because of the restrictions of the Spark-SQL execution framework. The query method provided by the invention can therefore achieve good query efficiency.
Optionally, subtasks are represented by physical operators, and the physical operators include at least one of: an extract-column-data operation, a join operation, a condition filter operation, a grouping operation, an aggregate function operation, a sort operation, and an operation that restores the final result data into a row table.
Optionally, the master query engine assigns a slave query engine to each subtask according to a cost model. The cost model makes it possible to assign each subtask to the slave query engine with the lowest execution cost, which further improves query efficiency.
Optionally, assigning a slave query engine to each subtask according to the cost model includes: obtaining, from the metadata of the slave query engines, the IP of the node hosting each slave query engine and the database table and column information stored on that node; assigning the execution node IP of each extract-column-data operation in the query plan according to the data locality principle; and selecting the execution nodes of all other operations with a greedy algorithm.
Optionally, the states of each subtask include a waiting-to-execute state, a computing state, a distributing-data state, an execution-finished state and an execution-failed state.
Optionally, the initial state of the current subtask is waiting to execute. After the slave query engine hosting the current subtask has received the intermediate data produced by all predecessor subtasks of the current subtask, the state changes to computing. After the computation has finished, the state changes to distributing data, and the intermediate data produced by the computation is sent to the slave query engine hosting the successor subtask. If the intermediate data is sent successfully, the state changes to execution finished. If a failure occurs between waiting to execute and computing, between computing and distributing data, or between distributing data and execution finished, the state changes to execution failed. Whenever the state of the current subtask changes, the master query engine is notified asynchronously.
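The five states and their transitions can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the patented implementation: the class and member names are hypothetical, and the `notify` callback merely stands in for the asynchronous notification sent to the main (master) query engine.

```python
from enum import Enum


class SubtaskState(Enum):
    WAITING = "waiting to execute"
    COMPUTING = "computing"
    DISTRIBUTING = "distributing data"
    FINISHED = "execution finished"
    FAILED = "execution failed"


# Allowed transitions as described in the text; FINISHED and FAILED are terminal.
TRANSITIONS = {
    SubtaskState.WAITING: {SubtaskState.COMPUTING, SubtaskState.FAILED},
    SubtaskState.COMPUTING: {SubtaskState.DISTRIBUTING, SubtaskState.FAILED},
    SubtaskState.DISTRIBUTING: {SubtaskState.FINISHED, SubtaskState.FAILED},
    SubtaskState.FINISHED: set(),
    SubtaskState.FAILED: set(),
}


class Subtask:
    def __init__(self, name, notify):
        self.name = name
        self.state = SubtaskState.WAITING
        # Hypothetical callback standing in for the asynchronous message
        # to the main query engine.
        self._notify = notify

    def move_to(self, new_state):
        """Change state if the transition is allowed, then report the change."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self._notify(self.name, new_state)


log = []
task = Subtask("Join1-1", lambda name, state: log.append((name, state)))
for state in (SubtaskState.COMPUTING, SubtaskState.DISTRIBUTING, SubtaskState.FINISHED):
    task.move_to(state)
```

Keeping the transition table explicit means that an unexpected change, such as moving a finished subtask back to computing, is rejected instead of being silently reported.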
Optionally, the intermediate data transmitted between slave query engines is compressed column data. In a traditional database execution engine, intermediate data takes the form of tables and is stored by row. In most analytical workloads, however, the user cares about only a few attributes of a relation. Row storage additionally loads attribute data the user does not care about during computation and thus wastes memory; column storage solves this problem well.
Optionally, the compression includes bit compression and dictionary compression. Using dictionary compression together with bit compression further reduces memory overhead and improves memory utilization.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
The query engine system and query method for a distributed in-memory columnar database provided by the invention improve overall computing efficiency by scheduling the execution of each subtask asynchronously: the query plan is divided into several interdependent subtasks, which are distributed to the task queues of the corresponding slave query engines and executed in turn. Furthermore, the data transmitted between slave query engines is compressed column data, which solves the memory waste caused by row storage loading attribute data the user does not care about during computation.
Brief description of the drawings
The drawings described here provide a further understanding of the embodiments of the present invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a schematic diagram of a SQL query plan and the task execution flow of HIVE;
Fig. 2 is a schematic diagram of a SQL query plan and the task execution flow of Spark-SQL;
Fig. 3 is a partial structural diagram of the query engine system of the distributed in-memory columnar database of the embodiment of the present invention;
Fig. 4 is a schematic diagram of a SQL query plan of the embodiment of the present invention;
Fig. 5 is a schematic diagram of the task execution flow of the embodiment of the present invention;
Fig. 6 is the execution state transition diagram of a subtask of the embodiment of the present invention;
Fig. 7 is a schematic diagram of data transmission between slave query engines of the embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to an embodiment and the drawings. The exemplary embodiment and its description serve only to explain the invention and do not limit it.
Embodiment
This embodiment provides a query engine system for a distributed in-memory columnar database, which includes a resource management module, at least one master query engine and at least one slave query engine.
Specifically, the master query engine parses the SQL statement and converts it into a query plan, splits the query plan into at least two subtasks, distributes them to the slave query engines for execution, and is responsible for monitoring and scheduling the execution of the query plan and for fault tolerance. As in the prior art, the query plan is represented as a tree. The slave query engines execute the subtasks distributed by the master query engine, and the resource management module manages and allocates system resources. Further, the system resources include CPU computing resources and memory resources. Fig. 3 is a partial structural diagram of the query engine system of this embodiment, in which master query engine 31 corresponds to three slave query engines: slave query engine 32, slave query engine 33 and slave query engine 34.
This embodiment also provides a query method for a distributed in-memory columnar database, based on the above query engine system, including:
Step S1: the resource management module designates a master query engine to handle the session with the user. Specifically, when the user has a query request, the resource management module creates a master query engine in the resource pool to handle the session with the user.
Step S2: the master query engine converts the SQL statement sent by the user into a query plan. Through lexical analysis, syntactic analysis and rule-based query optimization, the master query engine converts the SQL statement into a query plan. As in the prior art, the query plan is represented as a tree.
Step S3: the resource management module allocates slave query engines to the master query engine and establishes communication between the slave query engines and the master query engine. After the SQL statement has been converted into a query plan, the master query engine requests computing resources from the resource management module, which allocates slave query engines to the master query engine and establishes network connections between the slave query engines and the master query engine.
Step S4: the master query engine divides the query plan into at least two subtasks and assigns a slave query engine to each subtask. Because the query engine of this embodiment targets a distributed in-memory columnar database, data tables are stored by column, and each column is cut into several fragments by value range. For this characteristic, this embodiment abstracts several physical operators, each representing a specific subtask in the query plan. The physical operators include at least one of: an extract-column-data operation, a join operation, a condition filter operation, a grouping operation, an aggregate function operation, a sort operation, and an operation that restores the final result data into a row table.
Extract-column-data operation: the GetColumn operator extracts the data of one column in the columnar database. GetColumn can carry a restriction; for example, GetColumn(Teacher.age) with the restriction Teacher.age > 1 extracts the age column of the Teacher table, keeping only values greater than 1.
Join operation: the Join operator executes join operations, including Left Join, Right Join, Full Join and so on.
Condition filter operation: the Filter operator executes condition filtering, mainly logical operations such as AND and OR.
Grouping operation: the GroupBy operator executes grouping, implementing the GroupBy keyword of the SQL statement.
Aggregate function operation: the AGG operator covers common database operations such as Max (maximum) and Avg (average).
Sort operation: the Order operator sorts the columns that need sorting.
Row-table restore operation: the BuildRow operator restores the final result data of the columnar database into a row table that the user can read, presenting the final result to the user as a relational table.
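To make the operator descriptions more concrete, the following minimal sketch shows how a GetColumn operator with a restriction and a BuildRow operator might behave on a toy column store. The `store` layout, the sample data and the function signatures are assumptions made for illustration; the patent does not specify them.

```python
# Hypothetical in-memory column store: table -> column -> list of values.
store = {
    "Teacher": {
        "name": ["Ann", "Bo", "Cy"],
        "age": [0, 35, 41],
    }
}


def get_column(table, column, predicate=None):
    """GetColumn sketch: return (row_id, value) pairs of one column,
    optionally keeping only values that satisfy the restriction."""
    values = store[table][column]
    return [(row_id, value) for row_id, value in enumerate(values)
            if predicate is None or predicate(value)]


def build_row(table, row_ids, columns):
    """BuildRow sketch: restore row tuples for the selected row ids."""
    return [tuple(store[table][column][row_id] for column in columns)
            for row_id in row_ids]


# GetColumn(Teacher.age) with the restriction Teacher.age > 1
ages = get_column("Teacher", "age", lambda age: age > 1)
rows = build_row("Teacher", [row_id for row_id, _ in ages], ["name", "age"])
print(rows)  # [('Bo', 35), ('Cy', 41)]
```

Only the age column is touched while filtering; the name column is read once, at the end, for the rows that survive, which is the benefit of deferring row reconstruction to BuildRow.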
For example, for the SQL statement SELECT c_custkey FROM customer JOIN nation ON customer.C_NATIONKEY = nation.N_NATIONKEY JOIN lineitem ON lineitem.L_PARTKEY = customer.C_CUSTKEY, the query plan generated by the master query engine is shown in Fig. 4, and the subtasks it is divided into are shown in Fig. 5, which includes six slave query engines: Slave-QE1, Slave-QE2, Slave-QE3, Slave-QE4, Slave-QE5 and Slave-QE6.
Assume each column has two fragments. Each fragment of each column then has one GetColumn operator, and because each fragment has a value range, a Join operator based on that range can also be generated for each fragment. Referring to Fig. 5, the Join1 node represents the equi-join of columns L_PARTKEY and C_CUSTKEY. In the actual subtasks, Join1 is split into two physical operators, Join1-1 and Join1-2, which perform the equi-join for the value ranges 1-100 and 101-150 respectively. Likewise, Join2 in the query plan is also split into two physical Join operators.
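The splitting of Join1 by value range can be sketched as follows. This is only an illustration under assumptions: the operator descriptions are plain dictionaries, the column data is invented, and the hash-based equi-join is a common technique substituted here because the patent does not state which join algorithm is used.

```python
def split_join(name, ranges):
    """Create one physical Join operator description per fragment value range."""
    return [{"name": f"{name}-{i}", "range": value_range}
            for i, value_range in enumerate(ranges, start=1)]


def equi_join_in_range(left, right, value_range):
    """Equi-join two columns of (row_id, value) pairs, restricted to one range."""
    low, high = value_range
    index = {}
    for row_id, value in left:
        if low <= value <= high:
            index.setdefault(value, []).append(row_id)
    return [(left_id, right_id)
            for right_id, value in right
            for left_id in index.get(value, [])]


operators = split_join("Join1", [(1, 100), (101, 150)])

# Hypothetical column data as (row_id, value) pairs.
l_partkey = [(0, 7), (1, 120), (2, 7)]
c_custkey = [(0, 7), (1, 120), (2, 99)]

results = {op["name"]: equi_join_in_range(l_partkey, c_custkey, op["range"])
           for op in operators}
print(results)  # {'Join1-1': [(0, 0), (2, 0)], 'Join1-2': [(1, 1)]}
```

Because equal values always fall into the same range, the two physical operators never need each other's data and can run on different slave query engines.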
Further, in this embodiment the master query engine assigns a slave query engine to each subtask according to a cost model. Specifically, the master query engine obtains, from the metadata of the slave query engines, the IP of the node hosting each slave query engine and the database table and column information stored on that node. The execution node IP of each extract-column-data operation in the query plan is assigned according to the data locality principle. For example, in Fig. 5 the physical node hosting Slave-QE1 stores a fragment of the L_PARTKEY column, so the GetColumn operator for that fragment is assigned to that physical node. Likewise, the GetColumn operator of every fragment executes on the node that holds the corresponding data. The execution nodes of non-GetColumn operators are selected with a greedy algorithm: the execution node of a non-GetColumn operator is chosen from the execution nodes of its child operators, the execution cost on each child operator's physical node is computed, and the physical node with the lowest execution cost is selected. The basic cost formula is: execution cost = network cost + computation cost = inter-node network load × transmitted data volume + node task load × computed data volume. In Fig. 5, the execution node of the Join1-1 operator is either Slave-QE1 or Slave-QE3. Slave-QE1 is selected because, after computing the execution cost of Join1-1 on the Slave-QE1 node and on the Slave-QE3 node, the cost on Slave-QE1 turns out to be lower, so Slave-QE1 becomes the final execution node.
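The cost formula and the greedy choice between candidate nodes can be written down directly. The sketch below is illustrative only: the load and data-volume figures are hypothetical, the units are arbitrary, and the function names are assumptions rather than part of the patent.

```python
def execution_cost(network_load, transmitted_volume, task_load, computed_volume):
    """Execution cost = network cost + computation cost."""
    return network_load * transmitted_volume + task_load * computed_volume


def choose_node(candidates):
    """Greedy choice: pick the candidate node with the lowest execution cost."""
    return min(candidates, key=lambda node: execution_cost(**candidates[node]))


# Hypothetical figures for running Join1-1 on the nodes of its child operators.
candidates = {
    "Slave-QE1": dict(network_load=2, transmitted_volume=1000,
                      task_load=5, computed_volume=4000),
    "Slave-QE3": dict(network_load=2, transmitted_volume=3000,
                      task_load=6, computed_volume=4000),
}

costs = {node: execution_cost(**args) for node, args in candidates.items()}
chosen = choose_node(candidates)
print(costs, chosen)  # {'Slave-QE1': 22000, 'Slave-QE3': 30000} Slave-QE1
```

With these assumed figures Slave-QE1 wins both because less data has to be moved to it and because it is less loaded, which mirrors the outcome described for Join1-1.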
Step S5 is added to task queue from query engine by subtask, and the forerunner subtask in current subtask is whole Current subtask is executed after the completion of executing, current subtask is executed into the intermediate data for completing to generate and is transmitted to subsequent subtask institute Slave query engine, and current subtask completion status is sent to main query engine.Specifically, each subtask include etc. Pending state, calculating state, distributing data mode, the state that is finished and execute this five kinds of states of status of fail, And each subtask can safeguard the list of forerunner subtask and subsequent subtask list of the subtask, the execution of each subtask State transition graph is as shown in Figure 6.
Take the Join1-1 operator shown in Fig. 4 as an example. Its predecessor operator list is GetColumn (L_PARTKEY Slice 1 [1-100]) and GetColumn (C_CUSTKEY Slice 1 [1-150]), and its successor operator list is the Join2-1 operator. The initial state of the Join1-1 operator is waiting for execution. Once the physical node hosting Join1-1 has received the data sent by all of its predecessor operators, the state of Join1-1 changes to computing. When the computation finishes, the state changes to distributing data, and the result data is sent over the network to the slave query engine hosting the successor operator. When the data has been sent successfully, the operator's task is complete. If a fault occurs at any step, that is, between the waiting-for-execution state and the computing state, between the computing state and the distributing-data state, or between the distributing-data state and the execution-completed state, the operator state is set to execution failed. Every change of the Join1-1 operator state is reported to the main query engine in real time. Operators execute independently of one another: whenever an operator's state changes during execution, the main query engine is notified asynchronously, and the result data is pushed to the physical node that executes the successor operator. The execution of a subtask therefore depends only on whether its predecessor subtasks have completed, and tasks do not have to be executed stage by stage as in Spark or MapReduce.
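The five states and their transitions can be illustrated with a small Python sketch. It is a simplified, single-process model under stated assumptions: the `Subtask` class, the `notify_master` and `send` callbacks, the shortened operator names, and the toy `compute` function (a set intersection standing in for a join) are all hypothetical, and the notification is a plain function call here rather than an asynchronous message.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting for execution"
    COMPUTING = "computing"
    DISTRIBUTING = "distributing data"
    COMPLETED = "execution completed"
    FAILED = "execution failed"


class Subtask:
    def __init__(self, name, predecessors, successors, compute, notify_master, send):
        self.name = name
        self.predecessors = list(predecessors)   # predecessor subtask list
        self.successors = list(successors)       # successor subtask list
        self.compute = compute                   # hypothetical operator body
        self.notify_master = notify_master       # reports each state change
        self.send = send                         # pushes data to a successor
        self.inputs = {}
        self.state = State.WAITING

    def _set_state(self, state):
        self.state = state
        self.notify_master(self.name, state)

    def receive(self, predecessor, data):
        """Store data from one predecessor; run once all of them have delivered."""
        self.inputs[predecessor] = data
        ready = all(p in self.inputs for p in self.predecessors)
        if self.state is State.WAITING and ready:
            self._run()

    def _run(self):
        try:
            self._set_state(State.COMPUTING)
            result = self.compute(self.inputs)
            self._set_state(State.DISTRIBUTING)
            for successor in self.successors:
                self.send(successor, self.name, result)
            self._set_state(State.COMPLETED)
        except Exception:
            # A fault at any step moves the subtask to the failed state.
            self._set_state(State.FAILED)


events, sent = [], []
join = Subtask(
    "Join1-1",
    predecessors=["GetColumn(L_PARTKEY)", "GetColumn(C_CUSTKEY)"],
    successors=["Join2-1"],
    compute=lambda inputs: sorted(
        set(inputs["GetColumn(L_PARTKEY)"]) & set(inputs["GetColumn(C_CUSTKEY)"])),
    notify_master=lambda name, state: events.append(state),
    send=lambda successor, sender, data: sent.append((successor, sender, data)),
)
join.receive("GetColumn(L_PARTKEY)", [1, 2, 3])   # still waiting for the second input
join.receive("GetColumn(C_CUSTKEY)", [2, 3, 4])   # all inputs present: subtask runs
```

Because a subtask starts as soon as its own predecessor data has arrived, no global stage barrier is needed in this model, which is the property the text contrasts with stage-by-stage execution.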
In step S5, the intermediate data transmitted between slave query engines is column data that has been compressed. The compression comprises bit compression and dictionary compression. Taking the data structure shown in Fig. 7 as an example, the intermediate data comprises three vectors: a dictionary vector, an offset vector, and a position vector. The dictionary vector is obtained by sorting the original data and then removing duplicates, so redundant data is discarded and memory is saved. The offset vector and the position vector both store integers, so a bit compression strategy is applied to them. In a computer, an INT occupies four bytes, i.e. 32 bits, and can represent the range -2147483648 to 2147483647. For the offset vector and the position vector shown in Fig. 7, the maximum integer in the vector can be determined in advance, so in most cases storing one number does not require 32 bits. Assuming the maximum integer in the offset vector or position vector is A, the number of bits used to store one number is log₂ A rounded up, which saves more memory than storing the integers in conventional INT or LONG variables.
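The two compression steps can be sketched in Python as follows. This is an illustration only and does not reproduce the exact offset-vector and position-vector layout of Fig. 7: it encodes a column into a sorted, deduplicated dictionary plus one vector of non-negative integer codes, then bit-packs that vector. All function names and the sample column are hypothetical. The text gives the width as log₂ A rounded up; the sketch uses Python's `int.bit_length()`, which equals ⌊log₂ A⌋ + 1 and differs only when A is an exact power of two (for example, A = 4 needs 3 bits).

```python
def dictionary_encode(column):
    """Sort and deduplicate the values, then replace each value by its index."""
    dictionary = sorted(set(column))
    index = {value: i for i, value in enumerate(dictionary)}
    codes = [index[value] for value in column]
    return dictionary, codes


def bits_needed(max_value):
    """Bits required to store any integer in 0..max_value (at least 1)."""
    return max(1, max_value.bit_length())


def pack(values):
    """Pack non-negative integers into one Python int, `width` bits each."""
    width = bits_needed(max(values, default=0))
    packed = 0
    for i, value in enumerate(values):
        packed |= value << (i * width)
    return packed, width, len(values)


def unpack(packed, width, count):
    """Inverse of pack: extract `count` integers of `width` bits each."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


column = ["DE", "FR", "DE", "US", "FR", "DE"]
dictionary, codes = dictionary_encode(column)
packed, width, count = pack(codes)
restored = [dictionary[code] for code in unpack(packed, width, count)]
```

For this sample column the largest code is 2, so each code fits in 2 bits and the six codes occupy 12 bits, compared with 192 bits if each were stored as a 32-bit INT.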
Step S6: after the entire query plan has completed, the main query engine notifies the client to obtain the final result data from the slave query engine. At this point, the entire query is complete.
The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (3)

1. A query method for a distributed in-memory columnar database, characterized in that the method is based on a query engine system of a distributed in-memory columnar database, the system comprising:
a resource management module, at least one main query engine, and at least one slave query engine;
the main query engine is used to convert SQL statements into a query plan, divide the query plan into at least two subtasks, and monitor and schedule the execution of the query plan;
the slave query engine is used to execute the subtasks assigned by the main query engine;
the resource management module is responsible for managing and allocating system resources;
the method comprises: the resource management module designates one main query engine to be responsible for the session with the user;
the main query engine converts the SQL statement sent by the user into a query plan;
the resource management module allocates slave query engines to the main query engine and establishes communication between the slave query engines and the main query engine;
the main query engine divides the query plan into at least two subtasks and assigns a slave query engine to each subtask;
the slave query engine adds the subtask to a task queue, executes the current subtask after all predecessor subtasks of the current subtask have finished executing, transmits the intermediate data produced by the current subtask to the slave query engine hosting the successor subtask, and sends the completion status of the current subtask to the main query engine;
after the entire query plan has completed, the main query engine notifies the client to obtain the final result data from the slave query engine;
the subtasks are represented by physical operators, the physical operators comprising at least one of an extract-column-data operation, a join operation, a conditional filter operation, a grouping operation, an aggregate function operation, a sort operation, and an operation that converts the final result data into a row-oriented table;
the states of each subtask comprise a waiting-for-execution state, a computing state, a distributing-data state, an execution-completed state, and an execution-failed state;
the initial state of the current subtask is the waiting-for-execution state; after the slave query engine hosting the current subtask has received the intermediate data produced by all predecessor subtasks of the current subtask, the state of the current subtask is changed to the computing state;
after the current subtask finishes computing, the state of the current subtask is changed to the distributing-data state, and the intermediate data produced by the computation is sent to the slave query engine hosting the successor subtask;
if the intermediate data is sent successfully, the state of the current subtask is changed to the execution-completed state;
if a fault occurs between the waiting-for-execution state and the computing state, between the computing state and the distributing-data state, or between the distributing-data state and the execution-completed state, the state of the current subtask is changed to the execution-failed state;
when the state of the current subtask changes, the main query engine is notified asynchronously;
the main query engine assigns a slave query engine to each subtask according to a cost model, wherein the assigning of a slave query engine to each subtask according to the cost model comprises:
obtaining, from the metadata information of the slave query engines, the IP of the node where each slave query engine resides and the database table information and column information stored on that node;
assigning, according to the data localization principle, the execution node IP of each extract-column-data operation in the query plan;
selecting the execution nodes of operations other than extract-column-data operations with a greedy algorithm.
2. The query method for a distributed in-memory columnar database according to claim 1, characterized in that the intermediate data transmitted between slave query engines is column data that has been compressed.
3. The query method for a distributed in-memory columnar database according to claim 2, characterized in that the compression comprises bit compression and dictionary compression.
CN201610193220.XA 2016-03-30 2016-03-30 The query engine system and querying method of distributed memory columnar database Active CN105824957B (en)

Publications (2)

Publication Number Publication Date
CN105824957A CN105824957A (en) 2016-08-03
CN105824957B 2019-09-03

Family

ID=56524572

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant