CN104834650A - Method and system for generating effective query tasks - Google Patents

Method and system for generating effective query tasks

Info

Publication number
CN104834650A
Authority
CN
China
Prior art keywords
data block
key column
data
query task
block index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410049127.2A
Other languages
Chinese (zh)
Inventor
汪东升
李宝禄
王占业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201410049127.2A
Publication of CN104834650A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method and system for generating effective query tasks. The method includes the following steps: sorting structured table data by a key column and storing the sorted data in blocks, to obtain a plurality of data blocks corresponding to the structured table data; obtaining the value range of the key column in each data block to create a data block index; and, when a query task that queries by or includes the key column is received, generating effective query tasks, according to the data block index, for the data blocks containing result information. By creating the data block index from the value range of a specific column in each data block and generating only effective query tasks, the method reduces invalid query tasks, improves the speed of data processing, and lowers the burden on the data management system.

Description

A method and system for generating effective query tasks
Technical field
The present invention relates to the field of computer data processing, and in particular to a method and system for generating effective query tasks.
Background art
With the rapid development of the Internet and the rapid spread of mobile terminals, the data held by enterprises and institutions keeps growing quickly. Internet data in particular is expanding exponentially and is expected to keep doing so for some time. According to statistics from the well-known consulting firm IDC (International Data Corporation), the total amount of data created and copied worldwide in 2011 was 1.8 ZB (on the order of 10^21 bytes), of which 75% came from individuals, mainly documents, pictures, video, and music. This far exceeds the 200 PB (on the order of 10^15 bytes) total of all printed material in human history. The US Internet Data Center points out that data on the Internet will grow by 50% every year and double every two years, and that more than 90% of the world's data was produced in just the last few years. Processing massive structured data has therefore become an urgent problem.
In recent years, large foreign IT companies have proposed their own solutions for processing massive structured data, such as Greenplum from EMC, Stinger from Hortonworks, and Impala from Cloudera. The core idea of all these solutions is to store and manage massive structured data in a distributed, parallel manner. Specifically:
Greenplum stores its data in a distributed manner in the open-source database PostgreSQL: every machine in the cluster has PostgreSQL installed, data are placed by hashing, and each node stores part of the data of a table. When a query is executed, the nodes holding data perform the same operation, and the Master node finally aggregates the results and returns them;
Stinger parses an SQL statement into a directed acyclic graph (DAG), that is, a single query operation on data blocks. The underlying storage is HDFS. When a query is executed, the DAG operations run on the nodes that hold the data, and the final result is written back to HDFS;
Impala also uses HDFS as its underlying storage. When a query is executed, a query plan is generated first and then distributed to the nodes that hold the data blocks. The query plan is executed by a distributed database execution engine, that is, each node has a database execution engine that performs the data query.
Existing solutions for massive structured data thus share one distributed, parallel idea: data are stored in a distributed manner and queries are executed in parallel. In practice, queries often filter data by certain fields (columns), yet all the data the table involves is still scanned once. For a table with a large amount of data, many data blocks contain no result information, and executing query tasks on those data blocks produces many invalid query tasks.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is as follows: existing data queries usually scan all the data in a table once. For a table with a large amount of data, many data blocks contain no result information, so executing query tasks on those data blocks produces many invalid query tasks.
(2) Technical solution
To this end, the present invention proposes a method for generating effective query tasks, comprising the following steps:
sorting structured table data by a key column and storing the sorted data in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
obtaining the value range of the key column in each data block to create a data block index;
when a query task that queries by or includes the key column is received, generating effective query tasks, according to the data block index, for the data blocks containing result information.
Preferably, the method further comprises:
receiving a query task sent by a client;
judging whether the query task queries by or includes the key column.
Preferably, when the query task neither queries by nor includes the key column, the query task is executed on all data blocks.
Preferably, obtaining the value range of the key column in each data block to create the data block index specifically comprises:
obtaining the value range of the key column in each data block;
recording the value range of the key column in the data block as the data block index of that data block.
Preferably, generating effective query tasks, according to the data block index, for the data blocks containing result information specifically comprises:
extracting the search condition of the current query task;
reading, according to the search condition, the data block index entries that satisfy the search condition;
locating, according to those data block index entries, the data blocks containing result information;
generating effective query tasks for the data blocks containing result information, and carrying out the data query.
Preferably, the method further comprises: setting a key column for the structured table data.
In addition, the present invention also provides a system for generating effective query tasks, comprising a data blocking module, an acquisition module, and a generation module;
the data blocking module is configured to sort structured table data by a key column and store the sorted data in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
the acquisition module is configured to obtain the value range of the key column in each data block to create a data block index;
the generation module is configured to generate, when a query task that queries by or includes the key column is received, effective query tasks for the data blocks containing result information according to the data block index.
Preferably, the system further comprises a receiving module and a judging module;
the receiving module is configured to receive a query task sent by a client;
the judging module is configured to judge whether the query task queries by or includes the key column.
Preferably, the acquisition module comprises an acquiring unit and a recording unit;
the acquiring unit is configured to obtain the value range of the key column in each data block;
the recording unit is configured to record the value range of the key column in the data block as the data block index of that data block.
Preferably, the generation module comprises an extraction unit, a reading unit, a search unit, and a generation unit;
the extraction unit is configured to extract the search condition of the current query task;
the reading unit is configured to read, according to the search condition, the data block index entries that satisfy the search condition;
the search unit is configured to locate, according to those data block index entries, the data blocks containing result information;
the generation unit is configured to generate effective query tasks for the data blocks containing result information and carry out the data query.
(3) Beneficial effects
With the method and system for generating effective query tasks disclosed by the present invention, a data block index is created from the value range of a specific column in each data block and only effective query tasks are generated. This reduces invalid query tasks, improves the speed of data processing, and lowers the burden on the data management system. The system is also general and stable, is easy to develop and test, and is easy to implement.
Brief description of the drawings
The features and advantages of the present invention can be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the present invention in any way. In the drawings:
Fig. 1 is a flow chart of the method for generating effective query tasks according to the present invention;
Fig. 2 is a module diagram of the system for generating effective query tasks according to the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The present invention proposes a method and system for generating effective query tasks. The system adopts a distributed architecture: structured table data are stored in HDFS, partitioned into blocks according to a specific column, and the value range of that column is stored as an index. Queries on a table are carried out by a distributed database execution engine, that is, each node holding data blocks of a given table starts a database execution engine to perform the query. The system can generate effective query tasks whenever the specific column is used as a search condition. Deployment requires that the nodes in the data center be interconnected through switches, so that all nodes can access one another and transfer data.
The system adopts a distributed architecture and runs on the Hadoop Distributed File System (HDFS). Each node of the data center runs a task-execution daemon, which is responsible for receiving queries and executing the query commands sent by other daemons. The node that receives a task request is called the task scheduling node and is responsible for the current query: once it takes charge of a query, it distributes the query tasks and aggregates the returned results. The method can generate effective query tasks for queries that search by the specific column. Here, an effective query task is one whose target data block contains final result information; the corresponding invalid query task is one that certainly yields no result information.
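The division of labour between the task scheduling node and the per-node daemons can be sketched as follows. This is only an illustration under stated assumptions: worker threads stand in for the daemons on separate nodes, each data block is a plain Python list, and the names `run_query_task` and `schedule_query` are hypothetical and do not come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query_task(block, predicate):
    """Stand-in for a node-local execution engine scanning one data block."""
    return [row for row in block if predicate(row)]


def schedule_query(blocks, predicate, max_workers=4):
    """Scheduling-node role: hand out one task per block, then merge results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns partial results in the same order as `blocks`.
        partials = list(pool.map(lambda b: run_query_task(b, predicate), blocks))
    merged = []
    for part in partials:
        merged.extend(part)
    return merged


blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
evens = schedule_query(blocks, lambda value: value % 2 == 0)
```

Here every block receives a task; the rest of the method is about shrinking the list passed to the scheduler so that blocks without results are never visited.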
An embodiment of the present invention proposes a method for generating effective query tasks which, as shown in Fig. 1, comprises the following steps:
Step 101: sort structured table data by a key column and store the sorted data in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
Using a suitable tool, standard database files (for example Oracle, MySQL, or DB2 files) are sorted by a specific column and imported into HDFS. After the import, each data block covers a different value range of that column, and the blocks follow one another in sorted order. This yields the plurality of data blocks corresponding to the structured table data in the standard database files. The specific column is the key column of the structured table data.
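As a rough illustration of step 101, the sketch below sorts rows by a key column and cuts the sorted sequence into blocks. It assumes rows are dictionaries and that a block holds a fixed number of rows; a real HDFS block is bounded by size in bytes rather than by row count, and the function name `partition_sorted` is hypothetical.

```python
def partition_sorted(rows, key, block_size):
    """Sort rows by the key column, then split them into consecutive blocks."""
    ordered = sorted(rows, key=lambda row: row[key])
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


rows = [{"id": 5}, {"id": 1}, {"id": 4}, {"id": 2}, {"id": 3}]
blocks = partition_sorted(rows, "id", block_size=2)
block_ids = [[row["id"] for row in block] for block in blocks]
```

Because the rows are sorted before they are cut, the key values in one block never interleave with those of the next block, which is what makes a per-block value range useful as an index.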
Step 102: obtain the value range of the key column in each data block to create a data block index;
After the data blocks are stored, the value range of the key column in each data block is also recorded as the index.
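One way to picture the index of step 102 is a list holding the minimum and maximum key value of every block. The sketch below assumes non-empty blocks of dictionary rows; `BlockIndexEntry` and `build_block_index` are illustrative names, not identifiers from the patent.

```python
from typing import NamedTuple


class BlockIndexEntry(NamedTuple):
    block_id: int
    low: int   # smallest key value stored in the block
    high: int  # largest key value stored in the block


def build_block_index(blocks, key):
    """Record the key column's value range for each (non-empty) block."""
    index = []
    for block_id, block in enumerate(blocks):
        values = [row[key] for row in block]
        index.append(BlockIndexEntry(block_id, min(values), max(values)))
    return index


blocks = [[{"id": 1}, {"id": 2}], [{"id": 3}, {"id": 4}], [{"id": 5}]]
index = build_block_index(blocks, "id")
```

The index stays small (one entry per block), so it can be consulted before any block is opened.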
Step 103: when a query task that queries by or includes the key column is received, generate effective query tasks, according to the data block index, for the data blocks containing result information.
When such a query task is received, effective query tasks can be generated for it through the index on the specific column. First, the search condition of the current query task, together with the value range it specifies for the specific column, is extracted. Next, the data block index entries that satisfy the search condition are read, and the data blocks containing result information are located through those entries. Finally, effective query tasks are generated for those data blocks and the data query is carried out.
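The pruning in step 103 amounts to an interval-overlap test between the query's range on the key column and each block's recorded range. The following is a minimal sketch, assuming the search condition is a closed range `[query_low, query_high]` and that index entries are `(block_id, low, high)` tuples; both function names are hypothetical.

```python
def blocks_for_range(index, query_low, query_high):
    """Return ids of blocks whose [low, high] range overlaps the query range."""
    return [block_id for block_id, low, high in index
            if low <= query_high and high >= query_low]


def generate_tasks(index, query_low, query_high):
    """Create one query task per block that may contain result information."""
    return [{"block_id": block_id, "low": query_low, "high": query_high}
            for block_id in blocks_for_range(index, query_low, query_high)]


index = [(0, 1, 20), (1, 21, 40), (2, 41, 60), (3, 61, 80)]
tasks = generate_tasks(index, 35, 45)
task_blocks = [task["block_id"] for task in tasks]
```

In this example only blocks 1 and 2 receive a task; blocks 0 and 3 cannot contain matching rows, so no task is generated for them.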
Preferably, the method further comprises:
Step 201: receive a query task sent by a client;
Step 202: judge whether the query task queries by or includes the key column.
In this embodiment, when a query is performed on a table and the current query queries by or includes the specific column, effective query tasks can be generated for it through the index. That is, query tasks are generated only for data blocks that contain result information, and no query task is generated for data blocks that certainly contain no result information, which maximizes the efficiency of the tasks.
Preferably, when the query task neither queries by nor includes the key column, the query task is executed on all data blocks.
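The judgment of step 202 and the fallback to all data blocks can be sketched as a single planning function. The representation is an assumption made for illustration: a query is a dictionary mapping column names to closed `(low, high)` ranges, the index holds `(block_id, low, high)` tuples, and `plan_block_ids` is a hypothetical name.

```python
def plan_block_ids(conditions, key_column, index):
    """Pick the blocks to query: pruned if the key column is used, else all."""
    all_ids = [block_id for block_id, _, _ in index]
    if key_column not in conditions:
        # The query does not involve the key column, so the index cannot help.
        return all_ids
    query_low, query_high = conditions[key_column]
    return [block_id for block_id, low, high in index
            if low <= query_high and high >= query_low]


index = [(0, 1, 10), (1, 11, 20), (2, 21, 30)]
with_key = plan_block_ids({"id": (12, 15), "city": ("a", "a")}, "id", index)
without_key = plan_block_ids({"city": ("a", "a")}, "id", index)
```

A query that includes the key column alongside other columns is still pruned by the key column; the remaining conditions are evaluated inside the selected blocks.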
Preferably, obtaining the value range of the key column in each data block to create the data block index specifically comprises:
Step 301: obtain the value range of the key column in each data block;
Step 302: record the value range of the key column in the data block as the data block index of that data block.
Preferably, generating effective query tasks, according to the data block index, for the data blocks containing result information specifically comprises:
Step 401: extract the search condition of the current query task;
Step 402: read, according to the search condition, the data block index entries that satisfy the search condition;
Step 403: locate, according to those data block index entries, the data blocks containing result information;
Step 404: generate effective query tasks for the data blocks containing result information, and carry out the data query.
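Because the blocks are written in key order, the lower and upper bounds recorded in the index are themselves sorted, so steps 402 and 403 need not scan every index entry. The sketch below uses binary search from Python's standard `bisect` module; the patent does not prescribe this technique, so treat it as one possible optimization, with `candidate_blocks` as a hypothetical name and `lows`/`highs` as parallel lists of per-block bounds.

```python
from bisect import bisect_left, bisect_right


def candidate_blocks(lows, highs, query_low, query_high):
    """Find blocks overlapping [query_low, query_high] by binary search.

    lows[i] and highs[i] are the key range of block i; both lists are
    non-decreasing because the table was sorted before it was blocked.
    """
    first = bisect_left(highs, query_low)   # first block with high >= query_low
    stop = bisect_right(lows, query_high)   # one past last block with low <= query_high
    return list(range(first, stop))


lows = [1, 11, 21]
highs = [5, 15, 25]
hit = candidate_blocks(lows, highs, 4, 12)
gap = candidate_blocks(lows, highs, 6, 10)
```

With a large number of blocks this reduces the index lookup from a linear pass to two logarithmic searches, while returning the same blocks as a plain overlap test.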
Preferably, the method further comprises: setting a key column for the structured table data.
In addition, an embodiment of the present invention also provides a system for generating effective query tasks. As shown in Fig. 2, the system comprises a data blocking module 1, an acquisition module 2, and a generation module 3;
the data blocking module 1 is configured to sort structured table data by a key column and store the sorted data in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
the acquisition module 2 is configured to obtain the value range of the key column in each data block to create a data block index;
the generation module 3 is configured to generate, when a query task that queries by or includes the key column is received, effective query tasks for the data blocks containing result information according to the data block index.
Preferably, the system further comprises a receiving module and a judging module;
the receiving module is configured to receive a query task sent by a client;
the judging module is configured to judge whether the query task queries by or includes the key column.
Preferably, the acquisition module comprises an acquiring unit and a recording unit;
the acquiring unit is configured to obtain the value range of the key column in each data block;
the recording unit is configured to record the value range of the key column in the data block as the data block index of that data block.
Preferably, the generation module comprises an extraction unit, a reading unit, a search unit, and a generation unit;
the extraction unit is configured to extract the search condition of the current query task;
the reading unit is configured to read, according to the search condition, the data block index entries that satisfy the search condition;
the search unit is configured to locate, according to those data block index entries, the data blocks containing result information;
the generation unit is configured to generate effective query tasks for the data blocks containing result information and carry out the data query.
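To show how the three main modules of Fig. 2 could fit together, the sketch below models each one as a small class and runs them in sequence. The class names, the dictionary row format, and the fixed rows-per-block parameter are all assumptions made for illustration; the patent describes the modules functionally and does not specify an implementation.

```python
class DataBlockingModule:
    """Module 1: sort by the key column and store the rows in blocks."""

    def __init__(self, key, block_size):
        self.key = key
        self.block_size = block_size

    def partition(self, rows):
        ordered = sorted(rows, key=lambda row: row[self.key])
        return [ordered[i:i + self.block_size]
                for i in range(0, len(ordered), self.block_size)]


class AcquisitionModule:
    """Module 2: record each block's key value range as its index entry."""

    def __init__(self, key):
        self.key = key

    def build_index(self, blocks):
        return [(min(row[self.key] for row in block),
                 max(row[self.key] for row in block)) for block in blocks]


class GenerationModule:
    """Module 3: generate tasks only for blocks that may hold results."""

    def generate(self, index, query_low, query_high):
        return [block_id for block_id, (low, high) in enumerate(index)
                if low <= query_high and high >= query_low]


rows = [{"id": n} for n in (7, 3, 9, 1, 5, 8, 2, 6, 4)]
blocks = DataBlockingModule("id", block_size=3).partition(rows)
index = AcquisitionModule("id").build_index(blocks)
tasks = GenerationModule().generate(index, 4, 5)
```

In this run the nine rows fall into three blocks and a query for ids 4 to 5 yields a single task, for the middle block.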
With the method and system for generating effective query tasks disclosed by the present invention, a data block index is created from the value range of a specific column in each data block and only effective query tasks are generated. This reduces invalid query tasks, improves the speed of data processing, and lowers the burden on the data management system. The system is also general and stable, is easy to develop and test, and is easy to implement.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention can be embodied as a software product. The software product can be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard drive) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.
Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.
The foregoing describes only embodiments of the present invention and does not limit the scope of its claims. Any equivalent structure or equivalent flow transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (10)

1. A method for generating effective query tasks, characterized by comprising the following steps:
sorting structured table data by a key column and storing the sorted data in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
obtaining the value range of the key column in each data block to create a data block index;
when a query task that queries by or includes the key column is received, generating effective query tasks, according to the data block index, for the data blocks containing result information.
2. The method according to claim 1, characterized in that the method further comprises:
receiving a query task sent by a client;
judging whether the query task queries by or includes the key column.
3. The method according to claim 2, characterized in that, when the query task neither queries by nor includes the key column, the query task is executed on all data blocks.
4. The method according to claim 1, characterized in that obtaining the value range of the key column in each data block to create the data block index specifically comprises:
obtaining the value range of the key column in each data block;
recording the value range of the key column in the data block as the data block index of that data block.
5. The method according to claim 1, characterized in that generating effective query tasks, according to the data block index, for the data blocks containing result information specifically comprises:
extracting the search condition of the current query task;
reading, according to the search condition, the data block index entries that satisfy the search condition;
locating, according to those data block index entries, the data blocks containing result information;
generating effective query tasks for the data blocks containing result information, and carrying out the data query.
6. The method according to any one of claims 1-5, characterized in that the method further comprises: setting a key column for the structured table data.
7. A system for generating effective query tasks, characterized by comprising a data blocking module, an acquisition module, and a generation module;
the data blocking module is configured to sort structured table data by a key column and store the sorted data in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
the acquisition module is configured to obtain the value range of the key column in each data block to create a data block index;
the generation module is configured to generate, when a query task that queries by or includes the key column is received, effective query tasks for the data blocks containing result information according to the data block index.
8. The system according to claim 7, characterized in that the system further comprises a receiving module and a judging module;
the receiving module is configured to receive a query task sent by a client;
the judging module is configured to judge whether the query task queries by or includes the key column.
9. The system according to claim 7, characterized in that the acquisition module comprises an acquiring unit and a recording unit;
the acquiring unit is configured to obtain the value range of the key column in each data block;
the recording unit is configured to record the value range of the key column in the data block as the data block index of that data block.
10. The system according to claim 7, characterized in that the generation module comprises an extraction unit, a reading unit, a search unit, and a generation unit;
the extraction unit is configured to extract the search condition of the current query task;
the reading unit is configured to read, according to the search condition, the data block index entries that satisfy the search condition;
the search unit is configured to locate, according to those data block index entries, the data blocks containing result information;
the generation unit is configured to generate effective query tasks for the data blocks containing result information and carry out the data query.
CN201410049127.2A 2014-02-12 2014-02-12 Method and system for generating effective query tasks Pending CN104834650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410049127.2A CN104834650A (en) 2014-02-12 2014-02-12 Method and system for generating effective query tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410049127.2A CN104834650A (en) 2014-02-12 2014-02-12 Method and system for generating effective query tasks

Publications (1)

Publication Number Publication Date
CN104834650A true CN104834650A (en) 2015-08-12

Family

ID=53812544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410049127.2A Pending CN104834650A (en) 2014-02-12 2014-02-12 Method and system for generating effective query tasks

Country Status (1)

Country Link
CN (1) CN104834650A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101533406A (en) * 2009-04-10 2009-09-16 北京锐安科技有限公司 Mass data querying method
CN102890978A (en) * 2012-09-25 2013-01-23 无锡市圣恩线缆有限公司 Multi-core control cable

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU-XIANG WANG ET AL: "Partition-Based Online Aggregation with Shared Sampling in the Cloud", 《JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515878A (en) * 2016-06-16 2017-12-26 苏宁云商集团股份有限公司 The management method and device of a kind of data directory
CN107515878B (en) * 2016-06-16 2020-12-22 苏宁云计算有限公司 Data index management method and device
CN106202209A (en) * 2016-06-28 2016-12-07 北京信息科技大学 The storage of distributed structured data and querying method towards commodity screening application
CN106202209B (en) * 2016-06-28 2019-10-18 北京信息科技大学 The storage of distributed structured data and querying method towards commodity screening application
CN108427675A (en) * 2017-02-13 2018-08-21 阿里巴巴集团控股有限公司 Build the method and apparatus of index
CN107577436A (en) * 2017-09-18 2018-01-12 杭州时趣信息技术有限公司 A kind of date storage method and device
CN107577436B (en) * 2017-09-18 2020-07-07 杭州时趣信息技术有限公司 Data storage method and device
CN109669622A (en) * 2017-10-13 2019-04-23 杭州海康威视系统技术有限公司 A kind of file management method, document management apparatus, electronic equipment and storage medium
CN109669622B (en) * 2017-10-13 2022-04-05 杭州海康威视系统技术有限公司 File management method, file management device, electronic equipment and storage medium
CN108874954A (en) * 2018-06-04 2018-11-23 深圳市华傲数据技术有限公司 A kind of optimization method of data base querying, medium and equipment
CN109094574A (en) * 2018-08-01 2018-12-28 长安大学 A kind of unmanned vehicle driving condition Measurement and Control System based on rack
WO2021179782A1 (en) * 2020-03-13 2021-09-16 苏州浪潮智能科技有限公司 Method, device and apparatus for improving execution efficiency of database appliance, and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150812