CN106302662A - A kind of MR operation method of saving network flow based on Hbase - Google Patents
A kind of MR operation method of saving network flow based on Hbase
- Publication number
- CN106302662A (application CN201610628407.8A)
- Authority
- CN
- China
- Prior art keywords
- mapreduce
- hbase
- computing unit
- operation method
- network flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/15—Flow control; Congestion control in relation to multipoint traffic
Abstract
The invention discloses an HBase-based MR operation method that saves network traffic. It addresses the problem that network overhead is large and the cluster is at risk of network paralysis. The technical scheme comprises the following steps: (1) implement the InputFormat of MapReduce; (2) obtain the information of all large data blocks of the HBase table; (3) for each data block, obtain its underlying files; (4) use the underlying files of all obtained data blocks as the input of MapReduce, and perform MapReduce with each underlying file as a computing unit; (5) perform reduce and end the MapReduce job.
Description
Technical field
The present invention relates to an HBase-based MR operation method that saves network traffic.
Background
In today's world, the daily operations of a company often generate terabytes of data. Data sources include any type of data that Internet devices can capture, as well as data created by websites, social media, transactional business systems, and other business environments. Given the volume of data generated, processing it in real time is the primary challenge that many organizations face.
MR is short for MapReduce. MapReduce is a programming model for parallel computation over large-scale data sets (larger than 1 TB). The concepts "Map" and "Reduce", and their main ideas, are borrowed from functional programming languages, along with features borrowed from vector programming languages. The model makes it much easier for programmers who are unfamiliar with distributed parallel programming to run their programs on a distributed system. Current software implementations specify a Map function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function, which ensures that all mapped key-value pairs sharing the same key are grouped together.
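The key-value model described above can be sketched in a few lines of single-process Python. This is only an illustration: the word-count task and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are assumptions chosen for the example and do not come from the patent or from Hadoop.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map one (line_number, text) pair to new (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Combine all values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    # Shuffle: group every mapped value under its key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce: one call per distinct key.
    return dict(reducer(k, v) for k, v in groups.items())


records = [(0, "hbase stores data"), (1, "mapreduce reads hbase data")]
counts = run_mapreduce(records, map_fn, reduce_fn)
print(counts)
```

A real framework runs the map calls and the per-key reduce calls on different machines; the grouping step in the middle is what guarantees that pairs with the same key meet in one reduce call.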
When MR is run on HBase, the underlying HBase data files are not necessarily all located on the nodes where MR runs. As a result, when MR executes, the executing node may read data files on other nodes across the network, which causes a great deal of extra network overhead. When the cluster holds terabytes or petabytes of data, traditional HBase MapReduce easily incurs very large network overhead and puts the cluster at risk of network paralysis.
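To make the cost concrete, the following minimal sketch counts how many bytes tasks would have to pull over the network for a given placement. The file names, host names, sizes, and the function `remote_bytes` are all hypothetical values invented for illustration, not measurements.

```python
def remote_bytes(file_hosts, file_sizes, assignment):
    """Sum the bytes read by tasks that run on a host not storing their file."""
    total = 0
    for name, task_host in assignment.items():
        if task_host not in file_hosts[name]:
            total += file_sizes[name]
    return total


# Hypothetical placement: which hosts store each data file, and its size.
file_hosts = {"f1": {"node-a"}, "f2": {"node-b"}, "f3": {"node-c"}}
file_sizes = {"f1": 100, "f2": 200, "f3": 300}

# Tasks placed without regard to where the files live.
locality_blind = {"f1": "node-b", "f2": "node-b", "f3": "node-a"}
# Tasks placed on a host that stores the file they read.
locality_aware = {"f1": "node-a", "f2": "node-b", "f3": "node-c"}

print(remote_bytes(file_hosts, file_sizes, locality_blind))
print(remote_bytes(file_hosts, file_sizes, locality_aware))
```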
Summary of the invention
The technical task of the present invention is to provide an HBase-based MR operation method that saves network traffic, solving the problem that network overhead is large and the cluster is at risk of network paralysis.
The technical task of the present invention is achieved in the following manner.
An HBase-based MR operation method that saves network traffic, with the following steps:
(1) Implement the InputFormat of MapReduce;
(2) Obtain the information of all large data blocks (Regions) of the HBase table;
(3) For each data block, obtain its underlying files (HFiles);
(4) Use the underlying files of all obtained data blocks as the input of MapReduce, and perform MapReduce with each underlying file as a computing unit;
(5) Perform reduce and end the MapReduce job.
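Steps (2) to (4) above can be sketched as a split-planning routine. This is a plain-Python simulation under stated assumptions, not the HBase or Hadoop API: `TABLE_LAYOUT`, `FileSplit`, `list_regions`, `list_hfiles`, `get_splits`, and `assign_tasks` are hypothetical names, and the layout dictionary stands in for the metadata a real InputFormat implementation would query from the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSplit:
    """One unit of MapReduce input: a single underlying file of a Region."""
    region: str
    path: str
    length: int
    hosts: tuple


# Hypothetical table layout: Region -> [(file name, size, hosts storing it)].
TABLE_LAYOUT = {
    "region-1": [("hfile-a", 64, ("node-a",)), ("hfile-b", 32, ("node-b",))],
    "region-2": [("hfile-c", 48, ("node-c", "node-a"))],
}


def list_regions(layout):
    # Step (2): obtain all Regions of the table.
    return sorted(layout)


def list_hfiles(layout, region):
    # Step (3): obtain the underlying files of one Region.
    return layout[region]


def get_splits(layout):
    # Step (4): every underlying file becomes one input split.
    splits = []
    for region in list_regions(layout):
        for path, length, hosts in list_hfiles(layout, region):
            splits.append(FileSplit(region, path, length, hosts))
    return splits


def assign_tasks(splits):
    # Run each map task on a host that already stores its file.
    return {split.path: split.hosts[0] for split in splits}


splits = get_splits(TABLE_LAYOUT)
placement = assign_tasks(splits)
print(placement)
```

Planning one split per file, with the file's own hosts attached, is what lets a scheduler place each map task next to its data instead of reading it across the network.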
In step (4), when MapReduce is performed, MapReduce achieves reliability by distributing the large-scale operations on the data set to the computing units on the network; each computing unit periodically reports the work it has completed and its latest status.
If a computing unit stays silent for longer than a preset time interval, the master computing unit (similar to the master server in the Google File System) records the state of that computing unit as dead and sends the data assigned to it to other computing units.
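The failure handling described above can be sketched as a small bookkeeping class. This is a simplified model under assumptions: the class name `Master`, its methods, and the round-robin reassignment policy are illustrative choices, and time is passed in explicitly so the behavior is deterministic.

```python
class Master:
    """Tracks heartbeats and reassigns work from silent computing units."""

    def __init__(self, timeout):
        self.timeout = timeout   # preset time interval
        self.last_seen = {}      # unit -> time of last report
        self.alive = {}          # unit -> True while considered alive
        self.assigned = {}       # unit -> set of data files it holds

    def register(self, unit, now):
        self.last_seen[unit] = now
        self.alive[unit] = True
        self.assigned[unit] = set()

    def assign(self, unit, data_file):
        self.assigned[unit].add(data_file)

    def heartbeat(self, unit, now, completed=()):
        # A periodic report carries finished work and refreshes the timestamp.
        self.last_seen[unit] = now
        self.assigned[unit].difference_update(completed)

    def check(self, now):
        """Mark silent units dead and hand their data to live units."""
        for unit in sorted(self.alive):
            if self.alive[unit] and now - self.last_seen[unit] > self.timeout:
                self.alive[unit] = False
        live = sorted(u for u, ok in self.alive.items() if ok)
        for unit, ok in self.alive.items():
            if ok or not self.assigned[unit] or not live:
                continue
            # Spread the orphaned data over the live units in turn.
            for i, data_file in enumerate(sorted(self.assigned[unit])):
                self.assigned[live[i % len(live)]].add(data_file)
            self.assigned[unit] = set()


master = Master(timeout=10)
master.register("unit-1", now=0)
master.register("unit-2", now=0)
master.assign("unit-1", "hfile-a")
master.assign("unit-2", "hfile-b")
master.heartbeat("unit-2", now=8)   # unit-1 never reports
master.check(now=12)
print(master.alive, master.assigned)
```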
The HBase-based MR operation method of the present invention that saves network traffic has the following advantages: it combines the execution characteristics of MR with the storage characteristics of HBase data and performs MR directly on each data file, which fundamentally solves the problem of MapReduce fetching data across nodes at the start of a run. It therefore saves network overhead well and has good value for popularization and use.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawing.
Figure 1 is a flow chart of the HBase-based MR operation method that saves network traffic.
Detailed description
The HBase-based MR operation method of the present invention that saves network traffic is described in detail below with reference to the accompanying drawing and a specific embodiment.
Embodiment:
The HBase-based MR operation method of the present invention that saves network traffic has the following steps:
(1) Implement the InputFormat of MapReduce;
(2) Obtain the information of all large data blocks (Regions) of the HBase table;
(3) For each data block, obtain its underlying files (HFiles);
(4) Use the underlying files of all obtained data blocks as the input of MapReduce, and perform MapReduce with each underlying file as a computing unit;
(5) Perform reduce and end the MapReduce job.
In step (4), when MapReduce is performed, MapReduce achieves reliability by distributing the large-scale operations on the data set to the computing units on the network; each computing unit periodically reports the work it has completed and its latest status.
If a computing unit stays silent for longer than a preset time interval, the master computing unit (similar to the master server in the Google File System) records the state of that computing unit as dead and sends the data assigned to it to other computing units.
1. Map and Reduce
In brief, a mapping function performs a specified operation on each element of a conceptual list of independent elements (for example, a list of test scores). In that example, if it is found that every student's score was overestimated by one point, a "subtract one" mapping function can be defined to correct the error. In fact, each element is operated on independently, and the original list is not changed, because a new list is created to hold the new answers. In other words, the Map operation is highly parallelizable, which is very useful for applications with high performance requirements and for the field of parallel computing.
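The "subtract one" example can be written directly with Python's built-in `map`. The scores below are made-up sample data, and the thread pool is only one possible way to show that the elements can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor

scores = [91, 78, 85, 100]  # made-up scores, each overestimated by one point


def subtract_one(score):
    """Correct a single score; it depends on no other element."""
    return score - 1


# Sequential map: builds a new list and leaves the original untouched.
corrected = list(map(subtract_one, scores))

# The same map run on a thread pool; Executor.map keeps the input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    corrected_parallel = list(pool.map(subtract_one, scores))

print(corrected)
```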
Reduction refers to appropriately combining the elements of a list. Continuing the example above, what if someone wants to know the class average? A Reduce function can be defined that halves the list by adding each element to its adjacent element, and applies this recursively until only one element remains; dividing that element by the number of students gives the average score. Although reduction is not as parallel as the mapping function, it always produces a simple answer and the large-scale computations are relatively independent, so the Reduce function is also very useful in highly parallel environments.
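The halving reduction for the class average can be sketched as follows. The function names `halve` and `average` and the sample scores are assumptions for illustration; an odd leftover element is simply carried into the next round.

```python
def halve(values):
    """Add each element to its neighbor, roughly halving the list."""
    paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
    if len(values) % 2:
        paired.append(values[-1])  # odd element out is carried forward
    return paired


def average(values):
    if not values:
        raise ValueError("cannot average an empty list")
    count = len(values)
    # Each round's pairwise additions are independent of one another,
    # so a round could be computed in parallel.
    while len(values) > 1:
        values = halve(values)
    return values[0] / count


print(average([90, 77, 84, 99]))
```

Each round depends on the previous one, which is why reduction is less parallel than mapping, but within a round every pair can be added at the same time.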
2. Distributed reliability
MapReduce achieves reliability by distributing the large-scale operations on the data set to the nodes on the network; each node periodically reports the work it has completed and its latest status. If a node stays silent for longer than a preset time interval, the master node (similar to the master server in the Google File System) records the state of that node as dead and sends the data assigned to it to other nodes. Each operation uses an atomic operation for naming files, to ensure that conflicts between parallel threads do not occur; when files are renamed, the system may also copy them to another name in addition to the task name (to avoid side effects).
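One common way to get the atomic naming behavior described above is to write to a temporary name and then rename. The sketch below uses Python's `os.replace`, whose rename is atomic on POSIX systems when both paths are on the same filesystem; the function `commit_output`, the file names, and the task identifier are hypothetical.

```python
import os
import tempfile


def commit_output(directory, final_name, data, task_id):
    """Write under a task-specific temporary name, then rename atomically."""
    tmp_path = os.path.join(directory, f".{final_name}.{task_id}.tmp")
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write(data)
    final_path = os.path.join(directory, final_name)
    # Readers see either the old file or the complete new one, never a
    # half-written file, because the rename happens in a single step.
    os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as workdir:
    path = commit_output(workdir, "part-0", "key\tvalue\n", "attempt-1")
    with open(path, encoding="utf-8") as handle:
        committed = handle.read()
    remaining = os.listdir(workdir)

print(committed, remaining)
```

Because every attempt writes under its own temporary name, two parallel attempts of the same task never write into the same file; whichever renames last determines the final content.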
Reduction works in a similar way, but because reduction is comparatively less parallelizable, the master node tries to schedule the reduction on a single node, or on a node as close as possible to the data to be operated on. This property meets Google's needs, because they have enough bandwidth and their internal network does not have that many machines.
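A placement rule of this kind can be sketched as picking the live node that already holds the most input. The function `pick_reduce_node`, the node names, and the byte counts are hypothetical; measuring closeness by locally stored bytes is an assumption made for the example.

```python
def pick_reduce_node(local_bytes, live_nodes):
    """Pick the live node that already stores the most reduce input."""
    candidates = sorted(live_nodes)
    if not candidates:
        raise ValueError("no live node available")
    # max() returns the first maximal element, so ties go to the
    # alphabetically first node and the choice is deterministic.
    return max(candidates, key=lambda node: local_bytes.get(node, 0))


# Hypothetical amount of reduce input already stored on each node.
local_bytes = {"node-a": 100, "node-b": 700, "node-c": 200}

print(pick_reduce_node(local_bytes, {"node-a", "node-b", "node-c"}))
print(pick_reduce_node(local_bytes, {"node-a", "node-c"}))
```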
Through the above detailed description, those skilled in the art can readily implement the present invention. It should be understood, however, that the present invention is not limited to the above specific embodiment. On the basis of the disclosed embodiment, those skilled in the art may combine different technical features arbitrarily to realize different technical schemes.
Everything other than the technical features described in the specification is known technology to those skilled in the art.
Claims (3)
1. An HBase-based MR operation method that saves network traffic, characterized in that the steps are as follows:
(1) Implement the InputFormat of MapReduce;
(2) Obtain the information of all large data blocks of the HBase table;
(3) For each data block, obtain its underlying files;
(4) Use the underlying files of all obtained data blocks as the input of MapReduce, and perform MapReduce with each underlying file as a computing unit;
(5) Perform reduce and end the MapReduce job.
2. The HBase-based MR operation method that saves network traffic according to claim 1, characterized in that, in step (4), when MapReduce is performed, MapReduce achieves reliability by distributing the large-scale operations on the data set to the computing units on the network; each computing unit periodically reports the work it has completed and its latest status.
3. The HBase-based MR operation method that saves network traffic according to claim 2, characterized in that, if a computing unit stays silent for longer than a preset time interval, the master computing unit records the state of that computing unit as dead and sends the data assigned to it to other computing units.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610628407.8A CN106302662A (en) | 2016-08-03 | 2016-08-03 | A kind of MR operation method of saving network flow based on Hbase |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610628407.8A CN106302662A (en) | 2016-08-03 | 2016-08-03 | A kind of MR operation method of saving network flow based on Hbase |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106302662A true CN106302662A (en) | 2017-01-04 |
Family
ID=57664543
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610628407.8A Pending CN106302662A (en) | 2016-08-03 | 2016-08-03 | A kind of MR operation method of saving network flow based on Hbase |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106302662A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110066649A1 (en) * | 2009-09-14 | 2011-03-17 | Myspace, Inc. | Double map reduce distributed computing framework |
CN103645952A (en) * | 2013-08-08 | 2014-03-19 | 中国人民解放军国防科学技术大学 | Non-accurate task parallel processing method based on MapReduce |
CN103984926A (en) * | 2014-05-15 | 2014-08-13 | 江苏科大汇峰科技有限公司 | Distributed moving object detection method based on MapReduce calculation model |
- 2016-08-03: CN application CN201610628407.8A, published as CN106302662A, status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9715536B2 (en) | Virtualization method for large-scale distributed heterogeneous data | |
CN104794123B (en) | A kind of method and device building NoSQL database indexes for semi-structured data | |
US20230244684A1 (en) | Techniques for decoupling access to infrastructure models | |
CN103646051A (en) | Big-data parallel processing system and method based on column storage | |
CN107665246B (en) | Dynamic data migration method based on graph database and graph database cluster | |
CN105159971B (en) | A kind of cloud platform data retrieval method | |
CN109902117A (en) | Operation system analysis method and device | |
CN108717457A (en) | A kind of e-commerce platform big data processing method and system | |
Hashem et al. | An Integrative Modeling of BigData Processing. | |
Singh et al. | Spatial data analysis with ArcGIS and MapReduce | |
CN103365987A (en) | Clustered database system and data processing method based on shared-disk framework | |
Gupta et al. | Fair: A hadoop-based hybrid model for faculty information retrieval system | |
CN110134511A (en) | A kind of shared storage optimization method of OpenTSDB | |
CN105930354A (en) | Storage model conversion method and device | |
Seera et al. | Perspective of database services for managing large-scale data on the cloud: a comparative study | |
CN109388651A (en) | A kind of data processing method and device | |
Liu et al. | Efficient social network data query processing on MapReduce | |
CN106302662A (en) | A kind of MR operation method of saving network flow based on Hbase | |
CN108536696A (en) | A kind of database personalized self-service query platform and method | |
CN107169044A (en) | A kind of city talent resource integrated management method | |
CN106096824A (en) | A kind of main distribution integrative graph resource share method | |
Choudhary et al. | Cloud computing and big data analytics | |
CN105224596A (en) | A kind of method of visit data and device | |
Bhushan et al. | Cost based model for big data processing with hadoop architecture | |
CN104239008A (en) | Parallel database management system and design scheme |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170104 |