CN106302662A - An HBase-based MR execution method that saves network traffic - Google Patents

Publication number
CN106302662A
Authority
CN
China
Prior art keywords
mapreduce
hbase
computing unit
operation method
network flow
Legal status
Pending
Application number
CN201610628407.8A
Other languages
Chinese (zh)
Inventor
赵明超
牛硕
臧勇真
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
2016-08-03
Filing date
2016-08-03
Publication date
2017-01-04
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201610628407.8A
Publication of CN106302662A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/15 Flow control; Congestion control in relation to multipoint traffic

Abstract

The invention discloses an HBase-based MR (MapReduce) execution method that saves network traffic. It addresses the problem that high network overhead puts the cluster at risk of network paralysis. The technical solution comprises the following steps: (1) implement the InputFormat of MapReduce; (2) obtain the information of all large data blocks of the HBase table; (3) for each data block, obtain its underlying files; (4) use the underlying files of all the data blocks obtained as the input of MapReduce, and execute MapReduce with each underlying file as a computing unit; (5) execute reduce and finish the MapReduce job.

Description

An HBase-based MR execution method that saves network traffic
Technical field
The present invention relates to an HBase-based MR execution method that saves network traffic.
Background
Today, a company's daily operations often generate terabytes of data. Data sources include any type of data that Internet devices can capture: websites, social media, transactional business data, and data created in other business environments. Given the growing volume of data, processing it in real time is the foremost challenge that many organizations face.
MR is short for MapReduce, a programming model for parallel computation over large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and their main ideas, are borrowed from functional programming languages, along with features borrowed from vector programming languages. The model makes it much easier for programmers who are unfamiliar with distributed parallel programming to run their programs on a distributed system. Current software implementations specify a Map function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function, which ensures that all mapped key-value pairs sharing the same key are grouped together.
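The key-value model described above can be illustrated with a minimal single-process sketch. It is not taken from the patent: the function names `map_fn`, `reduce_fn`, and `run_mapreduce` and the sample records are hypothetical, and a real MapReduce framework would run the map and reduce calls on many nodes.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: turn one (line_number, text) pair into new (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group the mapped pairs by key, then run reduce per key."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


# Hypothetical input: (line_number, text) pairs.
records = [
    (0, "hbase stores regions"),
    (1, "regions hold hfiles"),
    (2, "HBase hfiles"),
]
counts = run_mapreduce(records, map_fn, reduce_fn)
```

The grouping step in `run_mapreduce` plays the role of the shuffle: it is what guarantees that every value with the same key reaches the same reduce call.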
When MR runs on HBase, the underlying HBase data files are not all located on the nodes where MR runs. Therefore, when MR executes, the executing nodes read data files on other nodes across the network, which causes a great deal of extra network overhead. When the cluster holds terabytes or petabytes of data, traditional MapReduce on HBase easily incurs very large network overhead and puts the cluster at risk of network paralysis.
Summary of the invention
The technical task of the present invention is to provide an HBase-based MR execution method that saves network traffic, solving the problem that high network overhead puts the cluster at risk of network paralysis.
The technical task of the present invention is achieved as follows.
An HBase-based MR execution method that saves network traffic, with the following steps:
(1) implement the InputFormat of MapReduce;
(2) obtain the information of all large data blocks (Regions) of the HBase table;
(3) for each data block, obtain its underlying files (HFiles);
(4) use the underlying files of all the data blocks obtained as the input of MapReduce, and execute MapReduce with each underlying file as a computing unit;
(5) execute reduce and finish the MapReduce job.
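Steps (2) to (4) can be sketched as a small simulation. This is only an illustration under stated assumptions, not the patented implementation and not the HBase or Hadoop API: the classes `HFile`, `Region`, and `Split`, the functions `get_splits`, `assign`, and `remote_mb`, and all paths, hosts, and sizes are hypothetical. The sketch creates one input split per underlying file and places each map task on the node that already stores that file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HFile:
    path: str      # hypothetical path of the underlying file
    host: str      # node that stores the file locally
    size_mb: int


@dataclass(frozen=True)
class Region:
    name: str
    hfiles: tuple


@dataclass(frozen=True)
class Split:
    path: str
    preferred_host: str
    size_mb: int


def get_splits(regions):
    """Steps (2)-(4): one split per underlying file of every Region."""
    return [Split(f.path, f.host, f.size_mb) for r in regions for f in r.hfiles]


def assign(splits, nodes):
    """Run each map task on the node holding its file; fall back to nodes[0]."""
    return {
        s.path: s.preferred_host if s.preferred_host in nodes else nodes[0]
        for s in splits
    }


def remote_mb(splits, plan):
    """Megabytes that would have to cross the network under a given plan."""
    return sum(s.size_mb for s in splits if plan[s.path] != s.preferred_host)


regions = [
    Region("r1", (HFile("/t/r1/cf/a", "node1", 128), HFile("/t/r1/cf/b", "node2", 64))),
    Region("r2", (HFile("/t/r2/cf/c", "node3", 256),)),
]
nodes = ["node1", "node2", "node3"]

splits = get_splits(regions)
local_plan = assign(splits, nodes)
# For comparison: a plan that ignores locality and runs everything on node1.
naive_plan = {s.path: "node1" for s in splits}
```

In this toy data set, `remote_mb(splits, local_plan)` is 0, while the naive plan would move the two files that are not on node1 across the network.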
In step (4), when MapReduce is executed, it achieves reliability by distributing the large-scale operations on the data set to the computing units on the network; each computing unit periodically reports back the work it has completed and its latest status.
If a computing unit stays silent for longer than a preset time interval, the master computing unit (similar to the master server in the Google File System) records the unit's status as dead and sends the data assigned to it to other computing units.
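The timeout rule can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Master` class that receives explicit timestamps instead of reading a clock; the unit names, the 10-second timeout, and the round-robin redistribution are assumptions, since the text does not say how the data is divided among the surviving units.

```python
class Master:
    """Tracks heartbeats and reassigns the data of silent computing units."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # unit -> time of its last report
        self.assigned = {}    # unit -> list of data items assigned to it
        self.dead = set()

    def register(self, unit, now, work=()):
        self.last_seen[unit] = now
        self.assigned[unit] = list(work)

    def heartbeat(self, unit, now):
        # Reports from units already recorded as dead are ignored.
        if unit not in self.dead:
            self.last_seen[unit] = now

    def check(self, now):
        """Mark units silent for longer than the timeout as dead, then
        hand their data to the live units in round-robin order."""
        for unit, seen in self.last_seen.items():
            if unit not in self.dead and now - seen > self.timeout:
                self.dead.add(unit)
        alive = sorted(u for u in self.last_seen if u not in self.dead)
        if not alive:
            return  # nobody left to take over; keep the assignments as they are
        for unit in sorted(self.dead):
            orphaned, self.assigned[unit] = self.assigned[unit], []
            for i, item in enumerate(orphaned):
                self.assigned[alive[i % len(alive)]].append(item)


master = Master(timeout=10)
master.register("u1", now=0, work=["hfile-a", "hfile-b"])
master.register("u2", now=0, work=["hfile-c"])
master.heartbeat("u2", now=8)   # u2 reports in, u1 stays silent
master.check(now=12)            # u1 has been silent for 12 > 10
```

After the check, `u1` is recorded as dead and its two items have been appended to the work list of `u2`.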
The HBase-based MR execution method of the present invention has the following advantages: by combining the execution characteristics of MR with the storage characteristics of HBase data, it executes MR directly on each data file. This fundamentally solves the problem of fetching data across nodes in the initial stage of a MapReduce run and thus saves network overhead, so the method has good value for promotion and use.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawing.
Figure 1 is a flowchart of the HBase-based MR execution method that saves network traffic.
Detailed description of the invention
The HBase-based MR execution method of the present invention that saves network traffic is described in detail below with reference to the drawing and a specific embodiment.
Embodiment:
The HBase-based MR execution method of the present invention that saves network traffic has the following steps:
(1) implement the InputFormat of MapReduce;
(2) obtain the information of all large data blocks (Regions) of the HBase table;
(3) for each data block, obtain its underlying files (HFiles);
(4) use the underlying files of all the data blocks obtained as the input of MapReduce, and execute MapReduce with each underlying file as a computing unit;
(5) execute reduce and finish the MapReduce job.
In step (4), when MapReduce is executed, it achieves reliability by distributing the large-scale operations on the data set to the computing units on the network; each computing unit periodically reports back the work it has completed and its latest status.
If a computing unit stays silent for longer than a preset time interval, the master computing unit (similar to the master server in the Google File System) records the unit's status as dead and sends the data assigned to it to other computing units.
I. Map and Reduce
In short, a map function applies a specified operation to every element of a conceptual list of independent elements (for example, a list of test scores). In that example, suppose it is discovered that every student's score was overestimated by one point; one can define a "subtract one" map function to correct the error. Each element is operated on independently, and the original list is not modified, because a new list is created to hold the new results. In other words, the Map operation is highly parallelizable, which is very useful for applications with high performance requirements and for the field of parallel computing.
Reduce refers to appropriately combining the elements of a list. Continuing the example above: what if someone wants to know the class average? One can define a Reduce function that halves the list by adding each element to its neighbor, recursing until only one element remains, and then divides that element by the number of students to obtain the average. Although reduce is not as parallel as map, it always yields a single simple answer and the large-scale computation is relatively independent, so the Reduce function is also very useful in highly parallel environments.
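The test-score example can be written out as a short sketch. The function names and the sample scores are hypothetical, and the list is assumed to be non-empty. `correct_scores` is the "subtract one" map, and `pairwise_sum` is the reduce that repeatedly adds neighboring elements until a single value remains.

```python
def correct_scores(scores):
    """Map: subtract the overestimated point from every score, in a new list."""
    return [s - 1 for s in scores]


def pairwise_sum(values):
    """Reduce: add neighbouring elements, roughly halving the list each
    round, until only one element remains (assumes a non-empty list)."""
    values = list(values)
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            # An odd trailing element has no neighbour; carry it to the next round.
            paired.append(values[-1])
        values = paired
    return values[0]


def average(scores):
    """Divide the single remaining element by the number of students."""
    return pairwise_sum(scores) / len(scores)


raw = [91, 76, 88, 61, 84]     # hypothetical scores, each one point too high
fixed = correct_scores(raw)    # raw itself is left unchanged
class_average = average(fixed)
```

Within one round the pairwise additions do not depend on each other, which is why even this reduce can be partly parallelized.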
II. Distribution and reliability
MapReduce achieves reliability by distributing the large-scale operations on the data set to the nodes on the network; each node periodically reports back the work it has completed and its latest status. If a node stays silent for longer than a preset time interval, the master node (similar to the master server in the Google File System) records the node's status as dead and sends the data assigned to it to other nodes. Each operation uses an atomic operation to name its files, to ensure that conflicts between parallel threads do not occur; when files are renamed, the system may copy them to another name besides the task name (this avoids side effects).
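One common way to obtain the atomic file naming mentioned above is to write under a temporary per-attempt name and then rename. The sketch below illustrates that general pattern on a local filesystem; it is an assumption, not the mechanism the patent or any particular MapReduce implementation uses, and `commit_output`, the task name, and the attempt identifiers are hypothetical.

```python
import os
import tempfile


def commit_output(final_dir, task_name, attempt_id, data):
    """Write the output under a per-attempt temporary name, then rename it
    to the task name. os.replace is atomic when both paths are on the same
    filesystem, so readers never observe a half-written output file."""
    tmp_path = os.path.join(final_dir, f".{task_name}.{attempt_id}.tmp")
    final_path = os.path.join(final_dir, task_name)
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(data)
    os.replace(tmp_path, final_path)
    return final_path


# Two attempts of the same task (for example, after a reassignment) commit
# one after the other; the output file always holds exactly one complete result.
with tempfile.TemporaryDirectory() as out_dir:
    commit_output(out_dir, "part-00000", "attempt_0", "first\n")
    path = commit_output(out_dir, "part-00000", "attempt_1", "second\n")
    with open(path, encoding="utf-8") as f:
        committed = f.read()
    leftover = os.listdir(out_dir)
```

Because each attempt writes to its own temporary name, two attempts never write to the same file at the same time; only the final rename touches the shared name.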
Reduce works in a similar way, but because reduce is less parallelizable, the master node tries to schedule each reduction on a single node, or on a node as close as possible to the data to be operated on. This property meets Google's needs, because they have sufficient bandwidth and their internal network does not have that many machines.
From the specific embodiment above, those skilled in the art can readily implement the present invention. It should be understood, however, that the present invention is not limited to the above embodiment. On the basis of the disclosed embodiment, those skilled in the art may combine different technical features arbitrarily to realize different technical solutions.
Apart from the technical features described in the specification, everything else is known to those skilled in the art.

Claims (3)

1. An HBase-based MR execution method that saves network traffic, characterized in that the steps are as follows:
(1) implement the InputFormat of MapReduce;
(2) obtain the information of all large data blocks of the HBase table;
(3) for each data block, obtain its underlying files;
(4) use the underlying files of all the data blocks obtained as the input of MapReduce, and execute MapReduce with each underlying file as a computing unit;
(5) execute reduce and finish the MapReduce job.
2. The HBase-based MR execution method that saves network traffic according to claim 1, characterized in that in step (4), when MapReduce is executed, it achieves reliability by distributing the large-scale operations on the data set to the computing units on the network; each computing unit periodically reports back the work it has completed and its latest status.
3. The HBase-based MR execution method that saves network traffic according to claim 2, characterized in that if a computing unit stays silent for longer than a preset time interval, the master computing unit records the unit's status as dead and sends the data assigned to it to other computing units.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610628407.8A CN106302662A (en) 2016-08-03 2016-08-03 An HBase-based MR execution method that saves network traffic

Publications (1)

Publication Number Publication Date
CN106302662A (en) 2017-01-04

Family

ID=57664543

Country Status (1)

Country Link
CN (1) CN106302662A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110066649A1 (en) * 2009-09-14 2011-03-17 Myspace, Inc. Double map reduce distributed computing framework
CN103645952A (en) * 2013-08-08 2014-03-19 中国人民解放军国防科学技术大学 Non-accurate task parallel processing method based on MapReduce
CN103984926A (en) * 2014-05-15 2014-08-13 江苏科大汇峰科技有限公司 Distributed moving object detection method based on MapReduce calculation model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170104
