CN105487925B - data scanning method and device - Google Patents
- Publication number
- CN105487925B
- Authority
- CN
- China
- Prior art keywords
- subset
- data
- scan
- scan instruction
- tables
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a data scanning method and device. The method may include: obtaining subset information of the data table corresponding to a current task; determining, according to the subset information, the subsets covered by each scan instruction included in the current task when querying the data table; and starting scan processes in one-to-one correspondence with the covered subsets to perform data scanning. The invention reduces the overall number of Maps in an analysis task, thereby reducing task-scheduling overhead and latency and improving the overall analysis performance of the task.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data scanning method and device.
Background
In the related art, parallel analysis of large-scale data sets can be realized with computing frameworks such as MapReduce. For example, the HBase (Hadoop Database) database also provides the ability to analyze its data through MapReduce, allowing users to initiate a scan analysis of a database table by submitting a List<Scan> task.
However, when the related art processes a List<Scan> task, it must start a separate Map scan process for each Scan instruction in the task. A List<Scan> task often contains many Scan instructions, sometimes a hundred or more, so a very large number of Map scan processes must be started at the same time. Because scheduling, starting, and exiting each Map scan process takes a long time, the List<Scan> task as a whole ends up taking a long time to complete.
Summary of the invention
In view of this, the present invention provides a data scanning method and device to solve the above technical problem in the related art.
The present invention provides the following technical solutions:
According to a first aspect of the invention, a data scanning method is proposed, comprising:
obtaining subset information of the data table corresponding to a current task;
determining, according to the subset information, the subsets covered by each scan instruction included in the current task when querying the data table;
starting scan processes in one-to-one correspondence with the covered subsets to perform data scanning.
According to a second aspect of the invention, a data scanning device is proposed, comprising:
a subset information acquisition unit, configured to obtain subset information of the data table corresponding to a current task;
a subset determination unit, configured to determine, according to the subset information, the subsets covered by each scan instruction included in the current task when querying the data table;
an execution unit, configured to start scan processes in one-to-one correspondence with the covered subsets to perform data scanning.
As the above technical solutions show, the present invention analyzes which subsets each scan instruction covers when querying data and starts as many scan processes as there are covered subsets, that is, one scan process per subset. This reduces the overall number of Maps in the analysis task, thereby reducing task-scheduling overhead and latency and improving the overall analysis performance of the task.
Brief description of the drawings
Fig. 1 is a flowchart of a data scanning method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another data scanning method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data scanning device according to an embodiment of the present invention.
Detailed description of embodiments
The present invention is described in detail below with reference to the accompanying drawings and in combination with embodiments. It should be noted that, provided there is no conflict, the embodiments of the present application and the features in them may be combined with each other.
Referring to Fig. 1, Fig. 1 is a flowchart of a data scanning method according to an embodiment of the present invention. The method is applied to a database and may include the following steps:
Step 102: obtain the subset information of the data table corresponding to the current task.
In this embodiment, the technical solution of the present invention can be applied to various types of databases; for example, the database may be an HBase database, but the present invention is not limited thereto.
For ease of description, an HBase database is used as the example below. In an HBase database, as the data recorded in each data table (database table) grows, HBase splits the data by Rowkey (row key) into multiple row-key ranges (Regions). Each Region serves as a subset of the corresponding data table and records only the data within its row-key range (determined by its start Rowkey and end Rowkey); that is, the Rowkey ranges of different Regions do not overlap. Therefore, once the data table to be queried is determined, its subset information, i.e. how the data table is divided into row-key ranges, is known.
Step 104: according to the subset information, determine the subsets covered by each scan instruction (Scan) included in the current task when querying the data table.
In this embodiment, the scan start and stop row keys of each scan instruction can be compared with the start and end row keys of each subset. If the scan row-key range of a scan instruction at least partially overlaps the row-key range of a subset, that scan instruction is determined to cover that subset. Comparing start and stop row keys in this way determines accurately which subsets of the data table each scan instruction covers.
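The coverage test of step 104 can be sketched as an interval-overlap check. This assumes half-open ranges [start, stop) with `None` meaning unbounded; that convention matches HBase's exclusive Region end keys and its default exclusive scan stop row, but the helper names `overlaps` and `covered_regions` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical model: half-open row-key range [start, stop); stop=None means unbounded.
KeyRange = Tuple[str, Optional[str]]


def overlaps(scan: KeyRange, region: KeyRange) -> bool:
    """True if the scan range and the Region range share at least one row key."""
    scan_start, scan_stop = scan
    region_start, region_stop = region
    # Two half-open ranges overlap when each one starts before the other ends.
    scan_begins_before_region_ends = region_stop is None or scan_start < region_stop
    region_begins_before_scan_ends = scan_stop is None or region_start < scan_stop
    return scan_begins_before_region_ends and region_begins_before_scan_ends


def covered_regions(scan: KeyRange, regions: List[KeyRange]) -> List[int]:
    """Indices of every Region that the scan at least partially overlaps."""
    return [i for i, region in enumerate(regions) if overlaps(scan, region)]


if __name__ == "__main__":
    regions = [("", "g"), ("g", "p"), ("p", None)]
    print(covered_regions(("c", "h"), regions))   # [0, 1]
    print(covered_regions(("g", "h"), regions))   # [1]
    print(covered_regions(("q", None), regions))  # [2]
```

A scan that stops exactly at a Region's start key (for example `("a", "g")` against `("g", "p")`) does not cover that Region under the half-open convention.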
Step 106: start scan processes in one-to-one correspondence with the covered subsets to perform data scanning. When applied to an HBase database, the scan process may be a map process started by the MapReduce computing framework.
In this embodiment, one scan process is started for each covered subset, so the number of scan processes equals the number of currently covered subsets, rather than one scan process per scan instruction as in the related art. This effectively reduces the number of scan processes started, avoids the time that scan processes consume during startup and scheduling, and helps improve data scanning efficiency.
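To illustrate the saving claimed for step 106, the hypothetical helper below counts map processes under both strategies: one per Scan, as the background describes for the related art, and one per covered Region. The ranges and the `process_counts` name are made up for the example, and ranges are assumed half-open with `None` meaning unbounded.

```python
from typing import List, Optional, Set, Tuple

KeyRange = Tuple[str, Optional[str]]  # hypothetical half-open [start, stop)


def overlaps(a: KeyRange, b: KeyRange) -> bool:
    """True if two half-open row-key ranges share at least one key."""
    a_start, a_stop = a
    b_start, b_stop = b
    return (b_stop is None or a_start < b_stop) and (a_stop is None or b_start < a_stop)


def process_counts(scans: List[KeyRange], regions: List[KeyRange]) -> Tuple[int, int]:
    """Return (processes with one per Scan, processes with one per covered Region)."""
    per_scan = len(scans)  # related art: one map process per Scan
    covered: Set[int] = {
        i for scan in scans for i, region in enumerate(regions) if overlaps(scan, region)
    }
    return per_scan, len(covered)


if __name__ == "__main__":
    regions = [("", "g"), ("g", "p"), ("p", None)]
    scans = [("a", "c"), ("b", "e"), ("h", "k"), ("i", "m"), ("j", "n")]
    print(process_counts(scans, regions))  # (5, 2)
```

Here five Scans fall into only two Regions, so two map processes replace five; the saving grows with the number of Scans that share a Region.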
As the above embodiment shows, the present invention has the data belonging to the same subset analyzed by the same map process. Therefore, when the current task contains many scan instructions for the same data table, and especially when multiple scan instructions cover the same subset, the number of map processes started can be effectively reduced, which helps reduce task-scheduling overhead and latency and improves the overall analysis performance of the task.
Further, on the basis of the above embodiment, another embodiment may also include: when any covered subset contains a data segment that is queried repeatedly by multiple scan instructions, merging the repeated query operations on that data segment.
In other words, if the scan ranges (row-key ranges) of two scan instructions at least partially overlap, merging the overlapping part prevents the same data from being scanned twice, once by each scan instruction. This avoids repeated analysis of the data and wasted resources, and helps improve the speed and efficiency of data scanning.
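Merging repeated queries within a covered subset is essentially merging overlapping intervals. The sketch below assumes the segments have already been clipped to one Region and are bounded half-open ranges; `merge_segments` is a hypothetical name, and it ignores any differences in columns or filters between Scans, which a real merge would have to reconcile.

```python
from typing import List, Tuple

# Hypothetical model: bounded half-open row-key segments [start, stop) that
# have already been clipped to a single Region.
Segment = Tuple[str, str]


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or touching segments so each row key is scanned once."""
    merged: List[Segment] = []
    for start, stop in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous segment: extend it instead of
            # keeping a second, partially duplicate scan.
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged


if __name__ == "__main__":
    print(merge_segments([("b", "e"), ("a", "c"), ("h", "k"), ("j", "m")]))
    # [('a', 'e'), ('h', 'm')]
```

Sorting by start key first means each segment only needs to be compared with the last merged one.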
Fig. 2 is a flowchart of another data scanning method according to an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
Step 202: determine whether the number of scan instructions included in the current task (e.g. List<Scan>), i.e. the number of Scan instructions contained in the List<Scan>, is less than or equal to a preset value. If it is, go to step 204A; otherwise, go to step 206.
In this embodiment, before the data scan is finally performed, the present invention adds the step of determining the subsets covered by the scan instructions, and this step consumes extra processing time. Therefore, when the current task contains only a few scan instructions, the time for the final data scan may be shortened, but the extra processing time may make the total time longer instead.
Therefore, by checking the number of scan instructions in the current task in advance, a task with few scan instructions can be processed directly as in the related art (go to step 204A), while a task with many scan instructions is processed according to the technical solution of the present invention (go to step 206).
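The branch in step 202 can be sketched as a simple dispatcher. `PRESET_VALUE` is a placeholder, since the patent does not specify the threshold, and `plan_by_region` stands for whatever Region-based planning the invention applies; both names are hypothetical, as is the representation of a "task" as a list of scan ranges.

```python
from typing import Callable, List, Optional, Tuple

KeyRange = Tuple[str, Optional[str]]  # hypothetical half-open [start, stop)
Task = List[KeyRange]                 # hypothetical: the ranges one map process scans

PRESET_VALUE = 10  # placeholder threshold; the patent does not give a number


def plan_tasks(
    scans: List[KeyRange],
    plan_by_region: Callable[[List[KeyRange]], List[Task]],
    preset_value: int = PRESET_VALUE,
) -> List[Task]:
    """Choose between the related-art path and the Region-based path."""
    if len(scans) <= preset_value:
        # Step 204A: few scans, so skip the Region analysis and run one task per Scan.
        return [[scan] for scan in scans]
    # Many scans: the extra Region analysis is expected to pay off.
    return plan_by_region(scans)


if __name__ == "__main__":
    scans = [("a", "c"), ("b", "e")]
    print(plan_tasks(scans, lambda s: [s]))                  # [[('a', 'c')], [('b', 'e')]]
    print(plan_tasks(scans, lambda s: [s], preset_value=1))  # [[('a', 'c'), ('b', 'e')]]
```

Passing the Region planner in as a function keeps the threshold decision independent of how the Region-based plan is built.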
Step 204A: start a corresponding scan process for each scan instruction (e.g. a map process started by the MapReduce computing framework) to execute the query operation of each scan instruction separately.
Step 204B: group the Scans of the current List<Scan> task by table.
In this embodiment, since the Scan instructions of a task may cover data in multiple data tables, they can be grouped by data table, for example one group per data table, so that each group is scanned separately.
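Step 204B amounts to bucketing the task's Scans by table. The `Scan` dataclass below is a hypothetical stand-in for a scan instruction that knows its target table; it is not the HBase `Scan` class, and `group_by_table` is an illustrative name.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Optional


@dataclass(frozen=True)
class Scan:
    """Hypothetical stand-in for one scan instruction in a List<Scan> task."""
    table: str
    start_row: str
    stop_row: Optional[str] = None  # None: scan to the end of the table


def group_by_table(scans: List[Scan]) -> Dict[str, List[Scan]]:
    """Step 204B sketch: one group per data table, preserving input order."""
    groups: DefaultDict[str, List[Scan]] = defaultdict(list)
    for scan in scans:
        groups[scan.table].append(scan)
    return dict(groups)


if __name__ == "__main__":
    task = [Scan("orders", "a", "f"), Scan("users", "k", "m"), Scan("orders", "x")]
    for table, group in group_by_table(task).items():
        print(table, len(group))
    # orders 2
    # users 1
```

Each group can then go through the Region analysis with that table's own Region boundaries.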
Step 206: determine the subsets (Regions) that each Scan instruction covers in each data table.
In this embodiment, the Region information of each data table, including the start Rowkey and end Rowkey of each Region, can be obtained and combined with the scan start Rowkey and scan stop Rowkey of each Scan instruction on that data table. Comparing the two start-stop ranges yields the data segment of each Scan instruction on each Region, that is, how each Scan instruction covers each Region.
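Comparing the two start-stop ranges in step 206 yields each Scan's data segment on each Region, i.e. the intersection of the two ranges. This is a minimal sketch assuming half-open [start, stop) ranges with `None` meaning unbounded; `clip_to_region` and `coverage` are hypothetical names.

```python
from typing import List, Optional, Tuple

KeyRange = Tuple[str, Optional[str]]  # hypothetical half-open [start, stop); None = unbounded


def clip_to_region(scan: KeyRange, region: KeyRange) -> Optional[KeyRange]:
    """Return the part of `scan` inside `region`, or None if they do not overlap."""
    start = max(scan[0], region[0])
    bounded_stops = [s for s in (scan[1], region[1]) if s is not None]
    stop = min(bounded_stops) if bounded_stops else None
    if stop is not None and start >= stop:
        return None  # empty intersection: the scan does not cover this Region
    return (start, stop)


def coverage(scan: KeyRange, regions: List[KeyRange]) -> List[Tuple[int, KeyRange]]:
    """(Region index, data segment) for every Region the scan covers."""
    result = []
    for i, region in enumerate(regions):
        segment = clip_to_region(scan, region)
        if segment is not None:
            result.append((i, segment))
    return result


if __name__ == "__main__":
    regions = [("", "g"), ("g", "p"), ("p", None)]
    print(coverage(("c", "h"), regions))
    # [(0, ('c', 'g')), (1, ('g', 'h'))]
```

A non-empty intersection is exactly the "at least partial overlap" test, so this one function answers both whether a Scan covers a Region and which segment it reads there.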
Step 208A: generate one scan process for each covered subset.
In this embodiment, for all Regions determined to be covered by Scan instructions, the same number of scan processes (e.g. map processes started by the MapReduce computing framework) is started, one per Region. Thus even if a Region is covered by multiple Scan instructions, only one map process needs to be started for it, which effectively reduces the number of map processes started and helps improve system response speed.
Step 208B: merge the data segments that are queried repeatedly by the scan instructions.
In this embodiment, when multiple Scan instructions cover the same Region, the start-stop ranges they cover on that Region (each delimited by a scan start Rowkey and a scan stop Rowkey) may overlap. The data-segment ranges of the overlapping parts can then be merged so that each overlapping segment is scanned only once, which can greatly improve scanning efficiency.
Of course, step 208B is optional; the method can proceed directly to step 210 after step 208A.
Step 210: perform the scan.
With the above processing method, when a data table is targeted by many Scans, step 208A substantially reduces the number of map processes started: each map process analyzes more data and the overhead of task scheduling falls, which optimizes final performance. Meanwhile, step 208B merges the data-segment ranges queried repeatedly by multiple Scan instructions, which reduces the amount of data each map process queries in its Region, further speeding up scanning and improving system processing efficiency.
Fig. 3 shows a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. Referring to Fig. 3, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it, forming the data scanning device at the logical level. Besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to the logical units and may also be hardware or logic devices.
Referring to Fig. 4, in a software implementation the data scanning device may include a subset information acquisition unit, a subset determination unit, and an execution unit, where:
the subset information acquisition unit is configured to obtain the subset information of the data table corresponding to the current task;
the subset determination unit is configured to determine, according to the subset information, the subsets covered by each scan instruction included in the current task when querying the data table;
the execution unit is configured to start scan processes in one-to-one correspondence with the covered subsets to perform data scanning.
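One way to picture the software implementation of Fig. 4 is a class whose methods play the roles of the three units. Everything here is hypothetical: the class and method names, the in-memory `region_catalog` standing in for real table metadata, the half-open range convention, and the string labels returned in place of actually launched map processes.

```python
from typing import Dict, List, Optional, Tuple

KeyRange = Tuple[str, Optional[str]]  # hypothetical half-open [start, stop); None = unbounded


class DataScanningDevice:
    """Illustrative mapping of the three units onto methods (names are hypothetical)."""

    def __init__(self, region_catalog: Dict[str, List[KeyRange]]):
        # Stand-in for wherever Region boundaries really come from (table metadata).
        self._catalog = region_catalog

    def acquire_subset_info(self, table: str) -> List[KeyRange]:
        """Subset information acquisition unit: Region ranges of the table."""
        return self._catalog[table]

    def determine_covered(self, scans: List[KeyRange], regions: List[KeyRange]) -> List[int]:
        """Subset determination unit: Regions that any scan at least partially overlaps."""
        def overlaps(a: KeyRange, b: KeyRange) -> bool:
            return (b[1] is None or a[0] < b[1]) and (a[1] is None or b[0] < a[1])
        return sorted({i for s in scans for i, r in enumerate(regions) if overlaps(s, r)})

    def execute(self, table: str, scans: List[KeyRange]) -> List[str]:
        """Execution unit: one label per covered Region, standing in for a scan process."""
        regions = self.acquire_subset_info(table)
        return [f"{table}:region-{i}" for i in self.determine_covered(scans, regions)]


if __name__ == "__main__":
    device = DataScanningDevice({"orders": [("", "g"), ("g", "p"), ("p", None)]})
    print(device.execute("orders", [("a", "c"), ("h", "k"), ("i", "m")]))
    # ['orders:region-0', 'orders:region-1']
```

Three Scans yield two labels because two of them land in the same Region, mirroring the one-process-per-subset rule.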
Optionally, the subset determination unit is specifically configured to:
compare the scan start and stop row keys of each scan instruction with the start and end row keys of each subset;
when the scan row-key range of any scan instruction at least partially overlaps the row-key range of any subset, determine that the scan instruction covers that subset.
Optionally, the execution unit is specifically configured to:
when any covered subset contains a data segment queried repeatedly by multiple scan instructions, merge the repeated query operations on that data segment.
Optionally, the execution unit is specifically configured to:
when the number of scan instructions is less than or equal to a preset value, start a corresponding scan process for each scan instruction to execute the query operation of each scan instruction separately.
Optionally, the data table is an HBase database table, and the scan process is a map process started by the MapReduce computing framework.
With the present invention, the data belonging to the same subset among the data covered by multiple Scans are analyzed in one Map process, which reduces the overall number of Maps in the analysis task, reduces task-scheduling overhead and latency, and improves the overall analysis performance of the task. In addition, identifying the data repeated across multiple Scans within a subset and filtering out the repeats further improves analysis performance and reduces disk input/output. In verification, applying this approach to a checkpoint analysis task with 700 Scans reduced the analysis to 1/6 of its pre-optimization time.
Since the device embodiment basically corresponds to the method embodiment, refer to the description of the method embodiment for the relevant details. The device embodiment described above is merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention. Those of ordinary skill in the art can understand and implement it without creative effort.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
1. A data scanning method, characterized by comprising:
obtaining subset information of the data table corresponding to a current task;
determining, according to the subset information, the subsets covered by each scan instruction included in the current task when querying the data table;
starting scan processes in one-to-one correspondence with the covered subsets to perform data scanning;
wherein determining, according to the subset information, the subsets covered by each scan instruction included in the current task when querying the data table comprises:
comparing the scan start and stop row keys of each scan instruction with the start and end row keys of each subset;
if the scan row-key range of any scan instruction at least partially overlaps the row-key range of any subset, determining that the scan instruction covers that subset.
2. The data scanning method according to claim 1, characterized in that performing data scanning comprises:
when any covered subset contains a data segment queried repeatedly by multiple scan instructions, merging the repeated query operations on that data segment.
3. The data scanning method according to claim 1, characterized by further comprising:
when the number of scan instructions is less than or equal to a preset value, starting a corresponding scan process for each scan instruction to execute the query operation of each scan instruction separately.
4. The data scanning method according to any one of claims 1 to 3, characterized in that the data table is an HBase database table and the scan process is a scan process started by the MapReduce computing framework.
5. A data scanning device, characterized by comprising:
a subset information acquisition unit, configured to obtain subset information of the data table corresponding to a current task;
a subset determination unit, configured to determine, according to the subset information, the subsets covered by each scan instruction included in the current task when querying the data table;
an execution unit, configured to start scan processes in one-to-one correspondence with the covered subsets to perform data scanning;
wherein the subset determination unit is specifically configured to:
compare the scan start and stop row keys of each scan instruction with the start and end row keys of each subset;
when the scan row-key range of any scan instruction at least partially overlaps the row-key range of any subset, determine that the scan instruction covers that subset.
6. The data scanning device according to claim 5, characterized in that the execution unit is specifically configured to:
when any covered subset contains a data segment queried repeatedly by multiple scan instructions, merge the repeated query operations on that data segment.
7. The data scanning device according to claim 5, characterized in that the execution unit is specifically configured to:
when the number of scan instructions is less than or equal to a preset value, start a corresponding scan process for each scan instruction to execute the query operation of each scan instruction separately.
8. The data scanning device according to any one of claims 5 to 7, characterized in that the data table is an HBase database table and the scan process is a map process started by the MapReduce computing framework.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510898272.2A CN105487925B (en) | 2015-12-08 | 2015-12-08 | data scanning method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105487925A CN105487925A (en) | 2016-04-13 |
CN105487925B true CN105487925B (en) | 2019-01-15 |
Family
ID=55674919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510898272.2A Active CN105487925B (en) | 2015-12-08 | 2015-12-08 | data scanning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105487925B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956043A (en) * | 2016-04-26 | 2016-09-21 | 海尔优家智能科技(北京)有限公司 | Method and device for allocating Map task for MapReduce running on Hbase database |
CN110489478A (en) * | 2019-08-27 | 2019-11-22 | 恩亿科(北京)数据科技有限公司 | A kind of method and device of data scanning |
CN111427887A (en) * | 2020-03-17 | 2020-07-17 | 中国邮政储蓄银行股份有限公司 | Method, device and system for rapidly scanning HBase partition table |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102576369A (en) * | 2009-08-24 | 2012-07-11 | 阿玛得斯两合公司 | Continuous full scan data store table and distributed data store featuring predictable answer time for unpredictable workload |
CN103902544A (en) * | 2012-12-25 | 2014-07-02 | 中国移动通信集团公司 | Data processing method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9141664B2 (en) * | 2009-08-31 | 2015-09-22 | Hewlett-Packard Development Company, L.P. | System and method for optimizing queries |
- 2015-12-08 CN CN201510898272.2A (CN105487925B), Active
Also Published As
Publication number | Publication date |
---|---|
CN105487925A (en) | 2016-04-13 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |