CN104834650A - Method and system for generating effective query tasks - Google Patents
Method and system for generating effective query tasks
- Publication number
- CN104834650A (application CN201410049127.2A)
- Authority
- CN
- China
- Prior art keywords
- data block
- key column
- data
- query task
- block index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a method and system for generating effective query tasks. The method includes the following steps: storing the structured table data in blocks after sorting the data by a key column to obtain a plurality of data blocks corresponding to the structured table data; obtaining the value range of the key column in each data block to create data block index; according to the data block index, generating effective query tasks for the data block containing result information when a query task of query by or including the key column is received. The method of generating effective query tasks provided in the present invention, creates data block index based on the value range of a specific column in a data block, generates effective query tasks, thus reducing invalid query tasks, improving the speed of data processing, and lowering the burden of the data management system.
Description
Technical field
The present invention relates to the field of computer data processing, and in particular to a method and system for generating effective query tasks.
Background art
With the rapid development of the Internet and the rapid spread of mobile terminals, the data held by enterprises and institutions keeps growing quickly. Internet data in particular is expanding exponentially and is expected to keep doing so for some time. According to statistics from the well-known consulting firm IDC (International Data Corporation), the total amount of data created and copied worldwide in 2011 was 1.8 ZB (1 ZB = 10^21 bytes), of which 75% came from individuals, mainly documents, pictures, video, and music. This far exceeds the roughly 200 PB (1 PB = 10^15 bytes) of all printed material produced in human history. The U.S. Internet Data Center points out that data on the Internet will grow by 50% every year and double every two years, and that more than 90% of the world's data was produced in recent years. The processing of massive structured data is therefore an urgent problem.
In recent years, large IT companies abroad have each proposed their own solutions for processing massive structured data, such as Greenplum from EMC, Stinger from Hortonworks, and Impala from Cloudera. The core idea of all these solutions is to store and manage massive structured data through distributed parallelism, wherein:
Greenplum stores its data in a distributed manner in the open-source database PostgreSQL: every machine in the cluster has PostgreSQL installed, data is placed by hashing, and each node stores part of the data of a table. When a query is executed, the nodes containing data perform the same operation, and the Master node finally aggregates and returns the results;
Stinger parses an SQL statement into a directed acyclic graph (DAG), that is, a single query operation on data blocks. The underlying storage is HDFS. When a query is executed, the DAG operations run on the nodes containing data, and the final result is written back to HDFS;
Impala also uses HDFS as its underlying storage. When a query is executed, a query plan is generated first and then distributed to the nodes containing data blocks for execution. The query plan is executed by a distributed database execution engine, that is, each node has a database execution engine that performs the data query.
The distributed parallel idea behind existing massive structured data processing solutions is that data is stored in a distributed manner and queries are executed in parallel. In practice, queries often filter data by certain fields (columns), yet all the data of the table involved is still scanned once. For tables with large data volumes, many data blocks contain no result information, and executing query tasks on those data blocks produces many invalid query tasks.
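The cost of scanning every block can be illustrated with a minimal Python sketch. It is only an illustration: the block contents, the `id` column, and the function name `count_invalid_tasks` are assumptions and do not come from any of the systems named above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def count_invalid_tasks(blocks: List[List[Row]], column: str, low: int, high: int) -> int:
    """Run one task per block (full scan) and count the tasks that find no rows."""
    invalid = 0
    for block in blocks:
        matches = [row for row in block if low <= row[column] <= high]
        if not matches:
            invalid += 1  # this task was scheduled but produced no result
    return invalid


# Hypothetical table already split into three blocks.
blocks = [
    [{"id": 3}, {"id": 7}, {"id": 19}],
    [{"id": 25}, {"id": 42}, {"id": 61}],
    [{"id": 88}],
]
invalid = count_invalid_tasks(blocks, "id", 20, 50)
print(f"{invalid} of {len(blocks)} tasks were invalid")
```

In this toy case only the second block holds rows between 20 and 50, so two of the three scheduled tasks do no useful work.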
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is that existing data queries often scan all the data of the table involved. For tables with large data volumes, many data blocks contain no result information, and executing query tasks on these data blocks produces many invalid query tasks.
(2) Technical solution
To this end, the present invention proposes a method for generating effective query tasks, comprising the following steps:
sorting structured table data by a key column and then storing it in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
obtaining the value range of the key column in each data block to create a data block index;
when a query task that queries by, or includes, the key column is received, generating effective query tasks for the data blocks containing result information according to the data block index.
Preferably, the method further comprises:
receiving a query task sent by a client;
judging whether the query task is a query task that queries by, or includes, the key column.
Preferably, when the query task is not a query task that queries by, or includes, the key column, query tasks are executed on all data blocks.
Preferably, obtaining the value range of the key column in each data block to create a data block index specifically comprises:
obtaining the value range of the key column in each data block;
recording the value range of the key column in the data block as the data block index of that data block.
Preferably, generating effective query tasks for the data blocks containing result information according to the data block index specifically comprises:
extracting the search condition of the current query task;
reading, according to the search condition, the data block index entries that satisfy the search condition;
finding the data blocks containing result information according to those data block index entries;
generating effective query tasks for the data blocks containing result information, and performing the data query.
Preferably, the method further comprises: setting a key column for the structured table data.
In addition, the present invention also provides a system for generating effective query tasks, comprising a data blocking module, an acquisition module, and a generation module;
the data blocking module is configured to sort structured table data by a key column and then store it in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
the acquisition module is configured to obtain the value range of the key column in each data block to create a data block index;
the generation module is configured to, when a query task that queries by, or includes, the key column is received, generate effective query tasks for the data blocks containing result information according to the data block index.
Preferably, the system further comprises a receiving module and a judging module;
the receiving module is configured to receive a query task sent by a client;
the judging module is configured to judge whether the query task is a query task that queries by, or includes, the key column.
Preferably, the acquisition module comprises an acquiring unit and a recording unit;
the acquiring unit is configured to obtain the value range of the key column in each data block;
the recording unit is configured to record the value range of the key column in the data block as the data block index of that data block.
Preferably, the generation module comprises an extraction unit, a reading unit, a search unit, and a generation unit;
the extraction unit is configured to extract the search condition of the current query task;
the reading unit is configured to read, according to the search condition, the data block index entries that satisfy the search condition;
the search unit is configured to find the data blocks containing result information according to those data block index entries;
the generation unit is configured to generate effective query tasks for the data blocks containing result information and perform the data query.
(3) Beneficial effects
With the method and system for generating effective query tasks disclosed by the present invention, a data block index is created from the value range of a specific column in each data block and is used to generate effective query tasks. This reduces invalid query tasks, increases the speed of data processing, and lowers the load on the data management system. The system is also general and stable, its development and testing difficulty is low, and it is easy to implement.
Brief description of the drawings
The features and advantages of the present invention can be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the present invention in any way. In the drawings:
Fig. 1 is a flowchart of the method for generating effective query tasks of the present invention;
Fig. 2 is a module diagram of the system for generating effective query tasks of the present invention.
Detailed description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The present invention proposes a method and system for generating effective query tasks. The system adopts a distributed architecture. Structured table data is stored in HDFS in blocks organized by a specific column, and the value range of that column is stored as an index. Query operations on a table are carried out by a distributed database execution engine, that is, each node holding data blocks of a given table starts a database execution engine to perform the data query. The system can generate effective query tasks when the specific column is used as a search condition. Deployment requires that the nodes of the data center be interconnected through switches, so that all nodes can access one another and transfer data.
The system runs on the Hadoop Distributed File System (HDFS). Each node of the data center runs a task-execution daemon, which is responsible for receiving queries and for executing query commands sent by other daemons. The node that receives a task request is called the task scheduling node and is responsible for the current query: it distributes the query tasks of that query and aggregates the returned results. The method can generate effective query tasks for queries that retrieve by the specific column. Here, an effective query task is one whose target data block contains final result information; an invalid query task, by contrast, is one that certainly yields no result information.
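The role of the task scheduling node (distribute the query tasks, then aggregate the returned results) can be sketched in a few lines of Python. This is a single-process illustration only: a thread pool stands in for the per-node daemons, and the names `run_task` and `schedule` are hypothetical rather than taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

Row = Dict[str, Any]


def run_task(block: List[Row], column: str, low: int, high: int) -> List[Row]:
    """Work done by one node daemon: filter a single data block."""
    return [row for row in block if low <= row[column] <= high]


def schedule(blocks: List[List[Row]], block_ids: List[int],
             column: str, low: int, high: int) -> List[Row]:
    """Scheduling node: send one task per selected block, then merge results."""
    results: List[Row] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_task, blocks[b], column, low, high)
                   for b in block_ids]
        for future in futures:  # iterate in submission order for stable output
            results.extend(future.result())
    return results


blocks = [
    [{"id": 3}, {"id": 7}, {"id": 19}],
    [{"id": 25}, {"id": 42}, {"id": 61}],
    [{"id": 88}],
]
# Only block 1 is selected here; choosing the blocks is the index's job.
rows = schedule(blocks, [1], "id", 20, 50)
```

In a real deployment the tasks would travel over the network to the nodes that hold each block; the thread pool only mimics that fan-out and fan-in.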
An embodiment of the present invention proposes a method for generating effective query tasks which, as shown in Fig. 1, comprises the following steps:
Step 101: sort structured table data by a key column and then store it in blocks, obtaining a plurality of data blocks corresponding to the structured table data;
Using a suitable tool, a standard database file (for example an Oracle, MySQL, or DB2 file) is imported into HDFS sorted by a specific column. After the import, each data block covers a different value range of that column and the blocks are arranged in order, giving the data blocks that correspond to the structured table data in the standard database file. The specific column is the key column of the structured table data.
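Step 101 can be sketched as follows. This is a minimal in-memory illustration that assumes fixed-size blocks and rows represented as dictionaries; the patent does not specify the import tool or the block size, and `sort_and_split` is a hypothetical name.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def sort_and_split(rows: List[Row], key_column: str, block_size: int) -> List[List[Row]]:
    """Sort rows by the key column, then cut the sorted rows into blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    ordered = sorted(rows, key=lambda row: row[key_column])
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


# Hypothetical table with key column "id".
rows = [{"id": v, "name": f"user{v}"} for v in (42, 7, 19, 3, 88, 61, 25)]
blocks = sort_and_split(rows, key_column="id", block_size=3)
block_ids = [[row["id"] for row in block] for block in blocks]
```

Because the rows are sorted before being cut, the key values of consecutive blocks follow one another in order, which is what makes a per-block value range a useful filter.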
Step 102: obtain the value range of the key column in each data block to create a data block index;
When the data blocks are stored, the value range of the key column in each data block is recorded at the same time as the index.
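A minimal sketch of step 102, assuming the value range is kept as the smallest and largest key value of each block. The `BlockRange` record and the function name `build_block_index` are illustrative; the patent does not prescribe how the index is laid out.

```python
from typing import Any, Dict, List, NamedTuple

Row = Dict[str, Any]


class BlockRange(NamedTuple):
    block_id: int
    low: Any   # smallest key value in the block
    high: Any  # largest key value in the block


def build_block_index(blocks: List[List[Row]], key_column: str) -> List[BlockRange]:
    """Record the key column's value range for every non-empty block."""
    index = []
    for block_id, block in enumerate(blocks):
        if not block:
            continue  # an empty block has no value range to record
        values = [row[key_column] for row in block]
        index.append(BlockRange(block_id, min(values), max(values)))
    return index


blocks = [
    [{"id": 3}, {"id": 7}, {"id": 19}],
    [{"id": 25}, {"id": 42}, {"id": 61}],
    [{"id": 88}],
]
index = build_block_index(blocks, "id")
```

The index holds one small entry per block, so it stays tiny compared with the table itself.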
Step 103: when a query task that queries by, or includes, the key column is received, generate effective query tasks for the data blocks containing result information according to the data block index.
When such a query task is received, effective query tasks can be generated for it through the index on the specific column. First, the search condition of the current query task and the value range of the specific column are extracted. The data block index entries that satisfy the search condition are then read, the data blocks containing result information are found from those entries, and effective query tasks are generated for those data blocks to perform the data query.
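Step 103 can be sketched as an interval-overlap test, assuming the search condition is an inclusive range on the key column. The `QueryTask` record and `generate_effective_tasks` are hypothetical names. Note that a value-range index can only rule out blocks that certainly hold no results; a block whose range overlaps the condition may still turn out to contain no matching row.

```python
from typing import List, NamedTuple


class BlockRange(NamedTuple):
    block_id: int
    low: int
    high: int


class QueryTask(NamedTuple):
    block_id: int
    low: int
    high: int


def generate_effective_tasks(index: List[BlockRange], q_low: int, q_high: int) -> List[QueryTask]:
    """Create a task only for blocks whose value range overlaps the condition."""
    tasks = []
    for entry in index:
        # Two closed intervals overlap when each starts no later than the other ends.
        if entry.low <= q_high and q_low <= entry.high:
            tasks.append(QueryTask(entry.block_id, q_low, q_high))
    return tasks


index = [BlockRange(0, 3, 19), BlockRange(1, 25, 61), BlockRange(2, 88, 88)]
tasks = generate_effective_tasks(index, 20, 50)
task_blocks = [task.block_id for task in tasks]
```

For the condition 20 to 50 only block 1 receives a task; blocks 0 and 2 are skipped without being read.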
Preferably, the method further comprises:
Step 201: receive a query task sent by a client;
Step 202: judge whether the query task is a query task that queries by, or includes, the key column.
In this embodiment of the present invention, when a query is executed on a table, if the current query operation queries by, or includes, the specific column, effective query tasks can be generated for it through the index. That is, query tasks are generated only for data blocks containing result information, and no query task is generated for data blocks that certainly contain no result information, which maximizes the efficiency of the tasks.
Preferably, when the query task is not a query task that queries by, or includes, the key column, query tasks are executed on all data blocks.
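The judgment in step 202 and the fallback to all data blocks can be sketched as below. The sketch assumes a query is represented as a mapping from column name to an inclusive (low, high) range; that representation and the names `involves_key_column` and `plan_block_ids` are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple

IndexEntry = Tuple[int, Any, Any]        # (block_id, lowest key, highest key)
Conditions = Dict[str, Tuple[Any, Any]]  # column -> inclusive (low, high)


def involves_key_column(conditions: Conditions, key_column: str) -> bool:
    """Step 202: does the query filter on the key column at all?"""
    return key_column in conditions


def plan_block_ids(conditions: Conditions, key_column: str,
                   index: List[IndexEntry]) -> List[int]:
    """Pick the blocks that should receive a query task."""
    if not involves_key_column(conditions, key_column):
        # The index cannot help, so every block gets a task.
        return [block_id for block_id, _, _ in index]
    q_low, q_high = conditions[key_column]
    return [block_id for block_id, low, high in index
            if low <= q_high and q_low <= high]


index = [(0, 3, 19), (1, 25, 61), (2, 88, 88)]
by_key = plan_block_ids({"id": (20, 50)}, "id", index)
by_other = plan_block_ids({"name": ("a", "m")}, "id", index)
mixed = plan_block_ids({"id": (0, 5), "name": ("a", "m")}, "id", index)
```

A query that includes the key column alongside other columns can still be pruned on the key column; the remaining conditions are then checked inside each selected block.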
Preferably, obtaining the value range of the key column in each data block to create a data block index specifically comprises:
Step 301: obtain the value range of the key column in each data block;
Step 302: record the value range of the key column in the data block as the data block index of that data block.
Preferably, generating effective query tasks for the data blocks containing result information according to the data block index specifically comprises:
Step 401: extract the search condition of the current query task;
Step 402: read, according to the search condition, the data block index entries that satisfy the search condition;
Step 403: find the data blocks containing result information according to those data block index entries;
Step 404: generate effective query tasks for the data blocks containing result information, and perform the data query.
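Because the blocks are written in key order, the index entries are themselves ordered, so steps 402 and 403 do not have to examine every entry. The sketch below uses binary search from Python's standard `bisect` module; this is one possible lookup strategy under that ordering assumption, not something the patent specifies, and `find_candidate_blocks` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

IndexEntry = Tuple[int, int, int]  # (block_id, lowest key, highest key)


def find_candidate_blocks(index: List[IndexEntry], q_low: int, q_high: int) -> List[int]:
    """Return ids of blocks whose range overlaps [q_low, q_high].

    Assumes the index is ordered by key range, as it is when the blocks
    were produced from data sorted by the key column.
    """
    lows = [low for _, low, _ in index]
    highs = [high for _, _, high in index]
    start = bisect_left(highs, q_low)   # first block whose upper bound >= q_low
    end = bisect_right(lows, q_high)    # one past the last block whose lower bound <= q_high
    return [block_id for block_id, _, _ in index[start:end]]


index = [(0, 3, 19), (1, 25, 61), (2, 88, 88)]
condition = (19, 25)                               # step 401: extracted search condition
candidates = find_candidate_blocks(index, *condition)  # steps 402 and 403
tasks = [(block_id, condition) for block_id in candidates]  # step 404: one task per block
```

With many blocks, the two binary searches replace a linear pass over the whole index; for small indexes a simple loop would serve just as well.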
Preferably, the method further comprises: setting a key column for the structured table data.
In addition, an embodiment of the present invention also provides a system for generating effective query tasks. As shown in Fig. 2, the system comprises a data blocking module 1, an acquisition module 2, and a generation module 3;
the data blocking module 1 is configured to sort structured table data by a key column and then store it in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
the acquisition module 2 is configured to obtain the value range of the key column in each data block to create a data block index;
the generation module 3 is configured to, when a query task that queries by, or includes, the key column is received, generate effective query tasks for the data blocks containing result information according to the data block index.
Preferably, the system further comprises a receiving module and a judging module;
the receiving module is configured to receive a query task sent by a client;
the judging module is configured to judge whether the query task is a query task that queries by, or includes, the key column.
Preferably, the acquisition module comprises an acquiring unit and a recording unit;
the acquiring unit is configured to obtain the value range of the key column in each data block;
the recording unit is configured to record the value range of the key column in the data block as the data block index of that data block.
Preferably, the generation module comprises an extraction unit, a reading unit, a search unit, and a generation unit;
the extraction unit is configured to extract the search condition of the current query task;
the reading unit is configured to read, according to the search condition, the data block index entries that satisfy the search condition;
the search unit is configured to find the data blocks containing result information according to those data block index entries;
the generation unit is configured to generate effective query tasks for the data blocks containing result information and perform the data query.
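The division of work among the three main modules can be sketched as three small Python classes. This is an assumed, single-process arrangement meant only to show how the modules hand data to one another; the class and method names are hypothetical, and the receiving and judging modules are reduced to an optional `condition` argument.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
IndexEntry = Tuple[int, Any, Any]  # (block_id, lowest key, highest key)


class DataBlockingModule:
    """Module 1: sort by the key column and store the rows in blocks."""

    def __init__(self, key_column: str, block_size: int) -> None:
        self.key_column = key_column
        self.block_size = block_size

    def split(self, rows: List[Row]) -> List[List[Row]]:
        ordered = sorted(rows, key=lambda row: row[self.key_column])
        return [ordered[i:i + self.block_size]
                for i in range(0, len(ordered), self.block_size)]


class AcquisitionModule:
    """Module 2: obtain and record each block's key value range."""

    def build_index(self, blocks: List[List[Row]], key_column: str) -> List[IndexEntry]:
        index = []
        for block_id, block in enumerate(blocks):
            values = [row[key_column] for row in block]
            index.append((block_id, min(values), max(values)))
        return index


class GenerationModule:
    """Module 3: turn a search condition into per-block query tasks."""

    def generate(self, index: List[IndexEntry],
                 condition: Optional[Tuple[Any, Any]]) -> List[int]:
        if condition is None:  # the query does not involve the key column
            return [block_id for block_id, _, _ in index]
        q_low, q_high = condition
        return [block_id for block_id, low, high in index
                if low <= q_high and q_low <= high]


rows = [{"id": v} for v in (42, 7, 19, 3, 88, 61, 25)]
blocks = DataBlockingModule("id", 3).split(rows)
index = AcquisitionModule().build_index(blocks, "id")
selected = GenerationModule().generate(index, (20, 50))
everything = GenerationModule().generate(index, None)
```

Keeping index construction separate from task generation means the index is built once at import time and reused by every later query.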
With the method and system for generating effective query tasks disclosed by the present invention, a data block index is created from the value range of a specific column in each data block and is used to generate effective query tasks. This reduces invalid query tasks, increases the speed of data processing, and lowers the load on the data management system. The system is also general and stable, its development and testing difficulty is low, and it is easy to implement.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention may be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) and which includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.
Those skilled in the art will understand that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the present invention.
The above are merely embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent flow transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.
Claims (10)
1. A method for generating effective query tasks, characterized by comprising the following steps:
sorting structured table data by a key column and then storing it in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
obtaining the value range of the key column in each data block to create a data block index;
when a query task that queries by, or includes, the key column is received, generating effective query tasks for the data blocks containing result information according to the data block index.
2. The method according to claim 1, characterized in that the method further comprises:
receiving a query task sent by a client;
judging whether the query task is a query task that queries by, or includes, the key column.
3. The method according to claim 2, characterized in that, when the query task is not a query task that queries by, or includes, the key column, query tasks are executed on all data blocks.
4. The method according to claim 1, characterized in that obtaining the value range of the key column in each data block to create a data block index specifically comprises:
obtaining the value range of the key column in each data block;
recording the value range of the key column in the data block as the data block index of that data block.
5. The method according to claim 1, characterized in that generating effective query tasks for the data blocks containing result information according to the data block index specifically comprises:
extracting the search condition of the current query task;
reading, according to the search condition, the data block index entries that satisfy the search condition;
finding the data blocks containing result information according to those data block index entries;
generating effective query tasks for the data blocks containing result information, and performing the data query.
6. The method according to any one of claims 1 to 5, characterized in that the method further comprises: setting a key column for the structured table data.
7. A system for generating effective query tasks, characterized by comprising a data blocking module, an acquisition module, and a generation module;
the data blocking module is configured to sort structured table data by a key column and then store it in blocks, to obtain a plurality of data blocks corresponding to the structured table data;
the acquisition module is configured to obtain the value range of the key column in each data block to create a data block index;
the generation module is configured to, when a query task that queries by, or includes, the key column is received, generate effective query tasks for the data blocks containing result information according to the data block index.
8. The system according to claim 7, characterized in that the system further comprises a receiving module and a judging module;
the receiving module is configured to receive a query task sent by a client;
the judging module is configured to judge whether the query task is a query task that queries by, or includes, the key column.
9. The system according to claim 7, characterized in that the acquisition module comprises an acquiring unit and a recording unit;
the acquiring unit is configured to obtain the value range of the key column in each data block;
the recording unit is configured to record the value range of the key column in the data block as the data block index of that data block.
10. The system according to claim 7, characterized in that the generation module comprises an extraction unit, a reading unit, a search unit, and a generation unit;
the extraction unit is configured to extract the search condition of the current query task;
the reading unit is configured to read, according to the search condition, the data block index entries that satisfy the search condition;
the search unit is configured to find the data blocks containing result information according to those data block index entries;
the generation unit is configured to generate effective query tasks for the data blocks containing result information and perform the data query.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410049127.2A CN104834650A (en) | 2014-02-12 | 2014-02-12 | Method and system for generating effective query tasks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410049127.2A CN104834650A (en) | 2014-02-12 | 2014-02-12 | Method and system for generating effective query tasks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104834650A true CN104834650A (en) | 2015-08-12 |
Family
ID=53812544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410049127.2A Pending CN104834650A (en) | 2014-02-12 | 2014-02-12 | Method and system for generating effective query tasks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104834650A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202209A (en) * | 2016-06-28 | 2016-12-07 | 北京信息科技大学 | The storage of distributed structured data and querying method towards commodity screening application |
CN107515878A (en) * | 2016-06-16 | 2017-12-26 | 苏宁云商集团股份有限公司 | The management method and device of a kind of data directory |
CN107577436A (en) * | 2017-09-18 | 2018-01-12 | 杭州时趣信息技术有限公司 | A kind of date storage method and device |
CN108427675A (en) * | 2017-02-13 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Build the method and apparatus of index |
CN108874954A (en) * | 2018-06-04 | 2018-11-23 | 深圳市华傲数据技术有限公司 | A kind of optimization method of data base querying, medium and equipment |
CN109094574A (en) * | 2018-08-01 | 2018-12-28 | 长安大学 | A kind of unmanned vehicle driving condition Measurement and Control System based on rack |
CN109669622A (en) * | 2017-10-13 | 2019-04-23 | 杭州海康威视系统技术有限公司 | A kind of file management method, document management apparatus, electronic equipment and storage medium |
WO2021179782A1 (en) * | 2020-03-13 | 2021-09-16 | 苏州浪潮智能科技有限公司 | Method, device and apparatus for improving execution efficiency of database appliance, and medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101533406A (en) * | 2009-04-10 | 2009-09-16 | 北京锐安科技有限公司 | Mass data querying method |
CN102890978A (en) * | 2012-09-25 | 2013-01-23 | 无锡市圣恩线缆有限公司 | Multi-core control cable |
Non-Patent Citations (1)
Title |
---|
YU-XIANG WANG ET AL: "Partition-Based Online Aggregation with Shared Sampling in the Cloud", 《JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107515878A (en) * | 2016-06-16 | 2017-12-26 | 苏宁云商集团股份有限公司 | The management method and device of a kind of data directory |
CN107515878B (en) * | 2016-06-16 | 2020-12-22 | 苏宁云计算有限公司 | Data index management method and device |
CN106202209A (en) * | 2016-06-28 | 2016-12-07 | 北京信息科技大学 | The storage of distributed structured data and querying method towards commodity screening application |
CN106202209B (en) * | 2016-06-28 | 2019-10-18 | 北京信息科技大学 | The storage of distributed structured data and querying method towards commodity screening application |
CN108427675A (en) * | 2017-02-13 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Build the method and apparatus of index |
CN107577436A (en) * | 2017-09-18 | 2018-01-12 | 杭州时趣信息技术有限公司 | A kind of date storage method and device |
CN107577436B (en) * | 2017-09-18 | 2020-07-07 | 杭州时趣信息技术有限公司 | Data storage method and device |
CN109669622A (en) * | 2017-10-13 | 2019-04-23 | 杭州海康威视系统技术有限公司 | A kind of file management method, document management apparatus, electronic equipment and storage medium |
CN109669622B (en) * | 2017-10-13 | 2022-04-05 | 杭州海康威视系统技术有限公司 | File management method, file management device, electronic equipment and storage medium |
CN108874954A (en) * | 2018-06-04 | 2018-11-23 | 深圳市华傲数据技术有限公司 | A kind of optimization method of data base querying, medium and equipment |
CN109094574A (en) * | 2018-08-01 | 2018-12-28 | 长安大学 | A kind of unmanned vehicle driving condition Measurement and Control System based on rack |
WO2021179782A1 (en) * | 2020-03-13 | 2021-09-16 | 苏州浪潮智能科技有限公司 | Method, device and apparatus for improving execution efficiency of database appliance, and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104834650A (en) | Method and system for generating effective query tasks | |
Han et al. | Hgrid: A data model for large geospatial data sets in hbase | |
CN107169083B (en) | Mass vehicle data storage and retrieval method and device for public security card port and electronic equipment | |
CN104794123B (en) | A kind of method and device building NoSQL database indexes for semi-structured data | |
JP6535031B2 (en) | Data query method and apparatus | |
CN103020281B (en) | A kind of data storage and retrieval method based on spatial data numerical index | |
JP2019194882A (en) | Mounting of semi-structure data as first class database element | |
US9619512B2 (en) | Memory searching system and method, real-time searching system and method, and computer storage medium | |
CN104133867A (en) | DOT in-fragment secondary index method and DOT in-fragment secondary index system | |
CN105718455A (en) | Data query method and apparatus | |
CN104408163B (en) | A kind of data classification storage and device | |
CN102779138B (en) | The hard disk access method of real time data | |
CN106471501B (en) | Data query method, data object storage method and data system | |
CN102880709A (en) | Data warehouse management system and data warehouse management method | |
KR101790766B1 (en) | Method, device and terminal for data search | |
CN106960020B (en) | A kind of method and apparatus creating concordance list | |
CN104239377A (en) | Platform-crossing data retrieval method and device | |
CN102880541A (en) | Log information acquisition system and log information acquisition method | |
CN106649870A (en) | Distributed implementation method for search engine | |
CN106649412B (en) | Data processing method and equipment | |
CN111258978A (en) | Data storage method | |
CN112262379A (en) | Storing data items and identifying stored data items | |
CN107391769B (en) | Index query method and device | |
CN102508884A (en) | Method and device for acquiring hotpot events and real-time comments | |
CN101963993B (en) | Method for fast searching database sheet table record |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20150812 |