CN105487925B - data scanning method and device - Google Patents

Info

Publication number
CN105487925B
Authority
CN
China
Prior art keywords
subset
data
scan
scan instruction
tables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510898272.2A
Other languages
Chinese (zh)
Other versions
CN105487925A (en)
Inventor
周后取
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd
Priority to CN201510898272.2A
Publication of CN105487925A
Application granted
Publication of CN105487925B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data scanning method and device. The method may include: obtaining subset information of the data table corresponding to a current task; determining, according to the subset information, the subsets covered by each scan instruction in the current task when it queries the data table; and starting scan processes in one-to-one correspondence with the covered subsets to perform the data scan. The invention reduces the total number of Map processes in an analysis task, which lowers task-scheduling overhead and latency and improves the overall analysis performance of the task.

Description

Data scanning method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a data scanning method and device.
Background
In the related art, large data sets can be analyzed in parallel with computing frameworks such as Map-Reduce. For example, the HBase (Hadoop Database) database provides a facility for analyzing its tables through Map-Reduce, allowing a user to initiate a scan analysis of database tables by submitting a List<Scan> task.
However, when processing a List<Scan> task, the related art starts a separate Map scan process for each Scan instruction in the task. A List<Scan> task often contains many Scan instructions, possibly hundreds, so a very large number of Map scan processes must be started at the same time. Scheduling, starting, and exiting each Map scan process takes a long time, so the List<Scan> task as a whole takes a long time to complete.
Summary of the invention
In view of this, the present invention provides a data scanning method and device to solve the above technical problem in the related art.
The present invention provides the following technical solutions:
According to a first aspect of the present invention, a data scanning method is proposed, comprising:
obtaining subset information of the data table corresponding to a current task;
determining, according to the subset information, the subsets covered by each scan instruction in the current task when it queries the data table; and
starting scan processes in one-to-one correspondence with the covered subsets to perform the data scan.
According to a second aspect of the present invention, a data scanning device is proposed, comprising:
a subset information acquiring unit, configured to obtain subset information of the data table corresponding to a current task;
a subset judging unit, configured to determine, according to the subset information, the subsets covered by each scan instruction in the current task when it queries the data table; and
an execution unit, configured to start scan processes in one-to-one correspondence with the covered subsets to perform the data scan.
As the above technical solutions show, the present invention analyzes which subsets each scan instruction covers when querying data and starts as many scan processes as there are covered subsets, that is, one scan process per covered subset. This reduces the total number of Map processes in an analysis task, which lowers task-scheduling overhead and latency and improves the overall analysis performance of the task.
Brief description of the drawings
Fig. 1 is a flowchart of a data scanning method provided in an embodiment of the present invention;
Fig. 2 is a flowchart of another data scanning method provided in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data scanning device provided in an embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another.
Referring to Fig. 1, Fig. 1 is a flowchart of a data scanning method provided in an embodiment of the present invention. The method is applied to a database and may include the following steps:
Step 102: obtain subset information of the data table corresponding to the current task.
In this embodiment, the technical solution of the present invention can be applied to various types of databases. For example, the database can be an HBase database, but the present invention is not limited to this.
For ease of description, HBase is used as the example below. In an HBase database, as the data recorded in a data table (database table) grows, HBase splits the data by Rowkey (row key) into multiple row-key ranges (Regions). Each Region serves as a subset of the data table and stores only the data within its own row-key range, which is delimited by a start Rowkey and an end Rowkey, so the Rowkey ranges of different Regions do not overlap. Therefore, once the data table to be queried has been determined, its subset information, namely how the table is divided into Regions, can be obtained.
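To make the subset information concrete, the following minimal sketch models a table's Regions as half-open row-key ranges and checks that they are contiguous and non-overlapping. The `Region` class and `check_partition` helper are hypothetical names used only for illustration; treating an empty key as "unbounded" mirrors HBase's convention for the first and last Region, and the inclusive-start, exclusive-end boundaries are an assumption of this sketch rather than something the patent specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical model of one subset (Region) of a data table."""
    name: str
    start: bytes  # inclusive; b"" means "from the beginning of the table"
    end: bytes    # exclusive; b"" means "to the end of the table"


def check_partition(regions: List[Region]) -> bool:
    """Return True if the regions tile the whole key space with no gap or overlap."""
    ordered = sorted(regions, key=lambda r: r.start)
    # The first region must be open at the start, the last one open at the end.
    if not ordered or ordered[0].start != b"" or ordered[-1].end != b"":
        return False
    # Each region must end exactly where the next one starts.
    return all(a.end == b.start and a.end != b""
               for a, b in zip(ordered, ordered[1:]))


regions = [
    Region("r1", b"", b"g"),
    Region("r2", b"g", b"p"),
    Region("r3", b"p", b""),
]
print(check_partition(regions))
```

A list of such `Region` records is the "subset information" that the later steps compare scan ranges against.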
Step 104: according to the subset information, determine the subsets covered by each scan instruction (Scan) in the current task when it queries the data table.
In this embodiment, the scan start and stop row keys of each scan instruction can be compared with the start and end row keys of each subset. If the row-key range of a scan instruction at least partially overlaps the row-key range of a subset, that scan instruction is determined to cover that subset. Comparing the row-key boundaries in this way accurately identifies the subsets of the data table that each scan instruction covers.
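The overlap test described above can be sketched as follows. This is a minimal illustration, assuming both scan ranges and Region ranges are half-open intervals `[start, stop)` over string row keys; the function names `overlaps` and `covered_regions` and the sample keys are hypothetical.

```python
from typing import Dict, List, Tuple

KeyRange = Tuple[str, str]  # (start inclusive, stop exclusive); an assumption of this sketch


def overlaps(scan: KeyRange, region: KeyRange) -> bool:
    """True if the two half-open row-key ranges share at least one key."""
    return scan[0] < region[1] and region[0] < scan[1]


def covered_regions(scan: KeyRange, regions: Dict[str, KeyRange]) -> List[str]:
    """Names of the regions that a single scan instruction covers."""
    return [name for name, bounds in regions.items() if overlaps(scan, bounds)]


regions = {"r1": ("a", "g"), "r2": ("g", "p"), "r3": ("p", "z")}
print(covered_regions(("e", "k"), regions))  # ['r1', 'r2']
```

A scan from `e` to `k` straddles the boundary at `g`, so it covers both `r1` and `r2` but not `r3`.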
Step 106: start scan processes in one-to-one correspondence with the covered subsets to perform the data scan. When the method is applied to an HBase database, the scan process can be a map process started by the Map-reduce computing framework.
In this embodiment, one scan process is started for each covered subset, so the number of scan processes equals the number of subsets currently covered. Unlike the related art, which starts one scan process per scan instruction, this effectively reduces the number of scan processes started, avoids the time they would consume in startup and scheduling, and helps improve data scanning efficiency.
As the above embodiment shows, the present invention has the data belonging to the same subset analyzed by the same map process. When the current task contains many scan instructions for the same data table, and especially when multiple scan instructions cover the same subset, the number of map processes started is effectively reduced, which helps lower task-scheduling overhead and latency and improves the overall analysis performance of the task.
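The difference in process count can be illustrated with a small sketch. It compares the one-process-per-Scan approach that the text attributes to the related art with the one-process-per-covered-subset approach, assuming half-open string row-key ranges; `count_processes` and the sample data are hypothetical and do not launch real Map-Reduce jobs.

```python
from typing import Dict, List, Tuple

KeyRange = Tuple[str, str]  # (start inclusive, stop exclusive); an assumption of this sketch


def count_processes(scans: List[KeyRange],
                    regions: Dict[str, KeyRange]) -> Tuple[int, int]:
    """Return (processes with one per scan, processes with one per covered region)."""
    per_scan = len(scans)
    covered = {
        name
        for name, (r_start, r_end) in regions.items()
        for (s_start, s_stop) in scans
        if s_start < r_end and r_start < s_stop  # ranges overlap
    }
    return per_scan, len(covered)


regions = {"r1": ("a", "g"), "r2": ("g", "p"), "r3": ("p", "z")}
scans = [("a", "c"), ("b", "d"), ("c", "f"), ("h", "j")]
print(count_processes(scans, regions))  # (4, 2)
```

Three of the four scans fall inside `r1`, so they share one process; the saving grows as more scan instructions land on the same subset.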
Further, on the basis of the above embodiment, another embodiment may also include: when a covered subset contains a data segment that is queried repeatedly by multiple scan instructions, merging the repeated query operations on that data segment.
In other words, if the scan ranges (row-key ranges) of two scan instructions at least partially overlap, merging the overlapping part prevents the same data from being scanned once by each of the two instructions. This avoids repeated analysis of the data and wasted resources, and helps improve the speed and efficiency of data scanning.
Fig. 2 is a flowchart of another data scanning method provided in an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
Step 202: determine whether the number of scan instructions in the current task (for example, the Scan instructions contained in a List<Scan>) is less than or equal to a preset value. If so, go to step 204; otherwise, go to step 206.
In this embodiment, the present invention adds steps such as determining the subsets covered by the scan instructions before the data scan is finally executed, and these steps consume additional processing time. Therefore, when the current task contains only a few scan instructions, the data scan itself may become shorter, but the additional processing time may make the task take longer overall.
Therefore, by checking the number of scan instructions in the current task in advance, tasks with few scan instructions can be processed directly as in the related art (go to step 204), while tasks with many scan instructions are processed according to the technical solution of the present invention (go to step 206).
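The check in step 202 amounts to a simple threshold dispatch, sketched below. The function name `choose_strategy`, the strategy labels, and the threshold value of 10 are hypothetical; the patent only says that a preset value is compared with the number of scan instructions.

```python
def choose_strategy(num_scans: int, preset_value: int) -> str:
    """Pick the processing path for a task based on how many scans it contains.

    "per_scan":   one process per scan instruction (the related-art path, step 204)
    "per_region": one process per covered subset (the path starting at step 206)
    """
    return "per_scan" if num_scans <= preset_value else "per_region"


PRESET_VALUE = 10  # illustrative only; the source does not give a concrete value
for n in (3, 10, 700):
    print(n, choose_strategy(n, PRESET_VALUE))
```

In practice the preset value would presumably be tuned so that the cost of the extra coverage analysis is outweighed by the processes it saves.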
Step 204A: start a corresponding scan process (for example, a map process started by the Map-reduce computing framework) for each scan instruction, so that the query operation of each scan instruction is executed separately.
Step 204B: group the current task List<Scan> by table.
In this embodiment, because the Scan instructions may cover data in multiple data tables at the same time, they can be grouped by data table, for example one group per data table, so that each group is scanned separately.
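Grouping by table can be sketched as below. As a simplifying assumption, each scan is modeled here as a `(table, start_key, stop_key)` tuple tagged with the single table it targets; the `group_by_table` helper and the table names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TaggedScan = Tuple[str, str, str]  # (table name, start key, stop key); hypothetical shape


def group_by_table(scans: List[TaggedScan]) -> Dict[str, List[Tuple[str, str]]]:
    """Group scan ranges by the table they target, preserving input order."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for table, start, stop in scans:
        groups[table].append((start, stop))
    return dict(groups)


task = [("t1", "a", "c"), ("t2", "b", "d"), ("t1", "e", "f")]
print(group_by_table(task))
```

Each group can then be compared against the Region boundaries of its own table only, since Regions of different tables are unrelated.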
Step 206: determine the subsets (Regions) that each Scan instruction covers in each data table.
In this embodiment, the Region information of each data table can be obtained, including the start Rowkey and end Rowkey of each Region. This is combined with the scan start Rowkey and scan stop Rowkey of every Scan instruction for that data table, and the two ranges are compared to obtain the data segment of each Scan instruction on each Region, that is, how each Scan instruction covers each Region.
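The per-Region data segments can be computed by clipping each scan range to each Region's boundaries, as in this minimal sketch. It assumes half-open string row-key ranges; `clip` and `segments_by_region` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

KeyRange = Tuple[str, str]  # (start inclusive, stop exclusive); an assumption of this sketch


def clip(scan: KeyRange, region: KeyRange) -> Optional[KeyRange]:
    """Part of the scan range that lies inside the region, or None if there is none."""
    start = max(scan[0], region[0])
    stop = min(scan[1], region[1])
    return (start, stop) if start < stop else None


def segments_by_region(scans: List[KeyRange],
                       regions: Dict[str, KeyRange]) -> Dict[str, List[KeyRange]]:
    """Map each covered region to the data segments that the scans request in it."""
    result: Dict[str, List[KeyRange]] = {}
    for name, bounds in regions.items():
        segments = [seg for seg in (clip(s, bounds) for s in scans) if seg is not None]
        if segments:  # regions no scan touches are left out entirely
            result[name] = segments
    return result


regions = {"r1": ("a", "g"), "r2": ("g", "p")}
scans = [("e", "k"), ("b", "d")]
print(segments_by_region(scans, regions))
# {'r1': [('e', 'g'), ('b', 'd')], 'r2': [('g', 'k')]}
```

The keys of the result are the covered Regions, and each value lists what a single process for that Region would need to scan.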
Step 208A: generate a corresponding scan process for each covered subset, one by one.
In this embodiment, for all Regions determined to be covered by Scan instructions, the same number of scan processes (for example, map processes started by the Map-reduce computing framework) are started in one-to-one correspondence. Even if a Region is covered by multiple Scan instructions, only one map process needs to be started for it, which effectively reduces the number of map processes and helps improve system response speed.
Step 208B: merge the data segments that are queried repeatedly by the scan instructions.
In this embodiment, when multiple Scan instructions cover the same Region, the ranges they cover on that Region (delimited by the scan start Rowkey and scan stop Rowkey) may overlap. The ranges of the overlapping data segments can then be merged so that the overlapping data is scanned only once, which can considerably improve scanning efficiency.
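One way to merge repeated segments within a Region is the standard sort-and-sweep interval merge, sketched below. The patent does not name a specific merging algorithm, so this is an assumed technique; `merge_segments` is a hypothetical name, and ranges are again half-open over string row keys.

```python
from typing import List, Tuple

KeyRange = Tuple[str, str]  # (start inclusive, stop exclusive); an assumption of this sketch


def merge_segments(segments: List[KeyRange]) -> List[KeyRange]:
    """Merge overlapping or touching row-key ranges so that no key is scanned twice."""
    merged: List[KeyRange] = []
    for start, stop in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or directly follows) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


print(merge_segments([("e", "g"), ("b", "d"), ("c", "f")]))  # [('b', 'g')]
print(merge_segments([("a", "b"), ("m", "n")]))              # [('a', 'b'), ('m', 'n')]
```

Three overlapping requests collapse into a single pass over `b` to `g`, while disjoint requests are left as separate ranges.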
Of course, step 208B is optional; after step 208A, the method may proceed directly to step 210.
Step 210: execute the scan.
With the above processing, when a data table has many Scan instructions, step 208A substantially reduces the number of map processes started: each map process analyzes somewhat more data, and the overhead of task scheduling falls, which improves final performance. Meanwhile, merging in step 208B the data segment ranges queried repeatedly by multiple Scan instructions reduces the amount of data each map process queries in its Region, which further speeds up scanning and improves system processing efficiency.
Fig. 3 shows a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. Referring to Fig. 3, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include other hardware required by its services. The processor reads the corresponding computer program from non-volatile storage into memory and runs it, forming the data scanning device at the logical level. Besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the executing entity of the following processing flow is not limited to logical units and may also be hardware or a logic device.
Referring to Fig. 4, in a software implementation the data scanning device may include a subset information acquiring unit, a subset judging unit, and an execution unit, wherein:
the subset information acquiring unit is configured to obtain subset information of the data table corresponding to a current task;
the subset judging unit is configured to determine, according to the subset information, the subsets covered by each scan instruction in the current task when it queries the data table; and
the execution unit is configured to start scan processes in one-to-one correspondence with the covered subsets to perform the data scan.
Optionally, the subset judging unit is specifically configured to:
compare the scan start and stop row keys of each scan instruction with the start and end row keys of each subset; and
when the row-key range of any scan instruction at least partially overlaps the row-key range of any subset, determine that the scan instruction covers the subset.
Optionally, the execution unit is specifically configured to:
when any covered subset contains a data segment that is queried repeatedly by multiple scan instructions, merge the repeated query operations on the data segment.
Optionally, the execution unit is specifically configured to:
when the number of scan instructions is less than or equal to a preset value, start a corresponding scan process for each scan instruction, so that the query operation of each scan instruction is executed separately.
Optionally, the data table is an HBase database table, and the scan process is a map process started by the Map-reduce computing framework.
With the present invention, among the data covered by multiple Scan instructions, the data belonging to the same subset is analyzed in a single Map process. This reduces the total number of Map processes in the analysis task, which lowers task-scheduling overhead and latency and improves overall analysis performance. In addition, identifying the data that multiple Scan instructions repeat within a subset and filtering out the repeats further improves analysis performance and reduces disk input and output. In verification on an analysis task containing 700 checkpoint Scans, this approach optimized the analysis cost to 1/6 of what it was before optimization.
Since the device embodiment basically corresponds to the method embodiment, refer to the description of the method embodiment for the relevant details. The device embodiment described above is merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention. Those of ordinary skill in the art can understand and implement it without creative effort.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A data scanning method, characterized by comprising:
obtaining subset information of the data table corresponding to a current task;
determining, according to the subset information, the subsets covered by each scan instruction in the current task when it queries the data table; and
starting scan processes in one-to-one correspondence with the covered subsets to perform the data scan;
wherein determining, according to the subset information, the subsets covered by each scan instruction in the current task when it queries the data table comprises:
comparing the scan start and stop row keys of each scan instruction with the start and end row keys of each subset; and
if the row-key range of any scan instruction at least partially overlaps the row-key range of any subset, determining that the scan instruction covers the subset.
2. The data scanning method according to claim 1, characterized in that performing the data scan comprises:
when any covered subset contains a data segment that is queried repeatedly by multiple scan instructions, merging the repeated query operations on the data segment.
3. The data scanning method according to claim 1, characterized by further comprising:
when the number of scan instructions is less than or equal to a preset value, starting a corresponding scan process for each scan instruction, so that the query operation of each scan instruction is executed separately.
4. The data scanning method according to any one of claims 1 to 3, characterized in that the data table is an HBase database table, and the scan process is a scan process started by the Map-reduce computing framework.
5. A data scanning device, characterized by comprising:
a subset information acquiring unit, configured to obtain subset information of the data table corresponding to a current task;
a subset judging unit, configured to determine, according to the subset information, the subsets covered by each scan instruction in the current task when it queries the data table; and
an execution unit, configured to start scan processes in one-to-one correspondence with the covered subsets to perform the data scan;
wherein the subset judging unit is specifically configured to:
compare the scan start and stop row keys of each scan instruction with the start and end row keys of each subset; and
when the row-key range of any scan instruction at least partially overlaps the row-key range of any subset, determine that the scan instruction covers the subset.
6. The data scanning device according to claim 5, characterized in that the execution unit is specifically configured to:
when any covered subset contains a data segment that is queried repeatedly by multiple scan instructions, merge the repeated query operations on the data segment.
7. The data scanning device according to claim 5, characterized in that the execution unit is specifically configured to:
when the number of scan instructions is less than or equal to a preset value, start a corresponding scan process for each scan instruction, so that the query operation of each scan instruction is executed separately.
8. The data scanning device according to any one of claims 5 to 7, characterized in that the data table is an HBase database table, and the scan process is a map process started by the Map-reduce computing framework.
CN201510898272.2A 2015-12-08 2015-12-08 data scanning method and device Active CN105487925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510898272.2A CN105487925B (en) 2015-12-08 2015-12-08 data scanning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510898272.2A CN105487925B (en) 2015-12-08 2015-12-08 data scanning method and device

Publications (2)

Publication Number Publication Date
CN105487925A CN105487925A (en) 2016-04-13
CN105487925B true CN105487925B (en) 2019-01-15

Family

ID=55674919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510898272.2A Active CN105487925B (en) 2015-12-08 2015-12-08 data scanning method and device

Country Status (1)

Country Link
CN (1) CN105487925B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956043A (en) * 2016-04-26 2016-09-21 海尔优家智能科技(北京)有限公司 Method and device for allocating Map task for MapReduce running on Hbase database
CN110489478A (en) * 2019-08-27 2019-11-22 恩亿科(北京)数据科技有限公司 A kind of method and device of data scanning
CN111427887A (en) * 2020-03-17 2020-07-17 中国邮政储蓄银行股份有限公司 Method, device and system for rapidly scanning HBase partition table

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102576369A (en) * 2009-08-24 2012-07-11 阿玛得斯两合公司 Continuous full scan data store table and distributed data store featuring predictable answer time for unpredictable workload
CN103902544A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Data processing method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9141664B2 (en) * 2009-08-31 2015-09-22 Hewlett-Packard Development Company, L.P. System and method for optimizing queries

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant