CN110290179A

CN110290179A - A Hadoop-based distributed mobile base station data storage system

Info

Publication number: CN110290179A
Application number: CN201910469125.1A
Authority: CN
Inventors: 郭乃网; 吴力波; 周阳; 马戎; 施政昱; 陈伟; 苏运; 田英杰; 瞿海妮; 张琪祁; 时志雄; 宋岩; 庞天宇; 沈泉江
Original assignee: Fudan University; State Grid Shanghai Electric Power Co Ltd
Current assignee: Fudan University; State Grid Shanghai Electric Power Co Ltd
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2019-09-27

Abstract

The present invention relates to a Hadoop-based distributed mobile base station data storage system. The system comprises an interface layer, a functional layer, a data layer and a physical layer connected in sequence. The physical layer comprises at least one application server, a backup server and a core layer switch; each application server, each backup server and the data layer are each connected to the core layer switch. The data layer comprises a Linux storage cluster that uses a Hadoop cluster platform. The Hadoop cluster platform comprises YARN, an index database, an HBase database, a MySQL database, and ZooKeeper for distributed coordination services; at the bottom of the Hadoop cluster platform is HDFS, which stores the files on all nodes. Compared with the prior art, the present invention has advantages such as increased storage capacity and improved data compatibility.

Description

A Hadoop-based distributed mobile base station data storage system

Technical field

The present invention relates to the technical field of mobile base station data, and more particularly to a Hadoop-based distributed mobile base station data storage system.

Background art

The key points of mobile base station data access technology include the following.

(1) Flexible scaling of data retrieval performance: the data volume is large and the frequency is high, and short-term peak surges occur that put considerable strain on platform components. Because a distributed message queue is deployed in cluster mode and its hardware resources can be scaled horizontally on demand, applying a distributed message queue can effectively absorb this impact.

(2) Creation and tuning of distributed message queue topics: to ingest massive high-frequency data in real time, a data distribution strategy must be designed around the generation frequency, collection period and measuring-point scale of the time-series data; monitoring data is distributed to the distributed message queue by data category; parameters such as the number of partitions, the replication factor and the topic distribution of the different data categories are adjusted according to system load; the storage structure of time-series data in the distributed message queue is configured to achieve high-speed writes and reduce conversion overhead; and a recovery mechanism based on the distributed message queue must be implemented to ensure that no data is lost.

The data storage stage implements distributed storage of the collected data. In principle, collected measurement data is stored in the big data platform's distributed column-oriented database (HBase), while recent data (the current half day or one day) is cached in the big data platform's distributed in-memory database (Redis), so that applications with stricter real-time requirements can process it.

However, as the number of mobile terminals keeps growing, the volume of mobile base station data keeps increasing, and conventional methods cannot cope with massive mobile base station data. Existing commercial private-cloud GIS platforms are expensive to deploy, offer poor data compatibility, and make back-end service interfaces difficult to customize.

Summary of the invention

The object of the present invention is to overcome the above drawbacks of the prior art by providing a Hadoop-based distributed mobile base station data storage system.

The purpose of the present invention can be achieved through the following technical solutions:

A Hadoop-based distributed mobile base station data storage system comprises an interface layer, a functional layer, a data layer and a physical layer connected in sequence. The physical layer comprises at least one application server, a backup server and a core layer switch; each application server, each backup server and the data layer are each connected to the core layer switch. The data layer comprises a Linux storage cluster that uses a Hadoop cluster platform. The Hadoop cluster platform comprises YARN, an index database, an HBase database, a MySQL database, and ZooKeeper for distributed coordination services; the bottom of the Hadoop cluster platform is provided with HDFS for storing the files on all nodes.

The Hadoop cluster platform has a tree structure comprising internal nodes and leaf nodes. Each internal node represents a router or core layer switch, and each leaf node represents a machine on which a DataNode is deployed. A DataNode serves read and write requests from HDFS and also responds to commands from the NameNode to create, delete and replicate blocks.

The administrator of the Hadoop cluster platform specifies a script file by setting a configuration parameter. After the NameNode starts successfully, it automatically loads and executes this script, and the settings in the script translate the IP address of each DataNode in the cluster into the corresponding rack name. If the parameter is not set, the IP address of every DataNode resolves to the default rack. The NameNode receives the periodic heartbeat messages of each DataNode.
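The rack-resolution script described above can be sketched as follows. This is a minimal illustration rather than the script used in the invention: the IP addresses and rack names in `RACK_BY_IP` are made up, and the sketch assumes Hadoop's usual convention of passing one or more addresses as command-line arguments and reading the rack names from standard output. The parameter is named topology.script.file.name in the detailed description; newer Hadoop releases call it net.topology.script.file.name.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script: maps DataNode IPs to rack names."""
import sys

# Illustrative mapping only; these addresses and rack names are assumptions.
RACK_BY_IP = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

# Rack assigned to any address that has no explicit mapping.
DEFAULT_RACK = "/default-rack"


def resolve_racks(addresses):
    """Return one rack name per address, falling back to the default rack."""
    return [RACK_BY_IP.get(addr, DEFAULT_RACK) for addr in addresses]


if __name__ == "__main__":
    # Addresses arrive as command-line arguments; the rack names are
    # printed on one line, separated by spaces, in the same order.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

An address that is missing from the table falls back to `/default-rack`, which mirrors the behaviour described for an unset parameter.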

In the Hadoop cluster platform, each DataNode actively initiates contact with the NameNode once per heartbeat interval. The heartbeat interval is set by a configuration parameter, and a maximum duration can likewise be set by a configuration parameter. If the NameNode finds that a node has still not contacted it after the maximum duration has elapsed, it considers that node dead and marks it as a DeadNode.

The MySQL database is synchronized with the HBase database through Sqoop.

The interface layer uses a Java API programming interface.

The functional layer includes a point-of-interest attribute query unit, a point-of-interest spatial query unit and a point-of-interest management unit.

Compared with prior art, the invention has the following advantages that

(1) Using the Hadoop distributed architecture, the system of the present invention can combine multiple inexpensive machines into a cluster in which the physical disks of the machines form one large logical store. This greatly increases capacity, improves data compatibility, and solves the problem of back-end service interfaces being difficult to customize;

(2) The distributed data storage of the present invention uses a Java API as the interface layer; the physical layer is provided with at least one application server and a backup server; and the data layer comprises YARN, an index database, an HBase database, a MySQL database and ZooKeeper, an important component that provides distributed services for Hadoop and HBase. The data layer is sufficiently reliable to store data safely and completely. In this system the replica count for DataNodes is set to three, so that even if a failure occurs, the data can be quickly re-replicated and backed up from other nodes.

Brief description of the drawings

Fig. 1 is a structural schematic diagram of the system of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawing and specific embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

As shown in Fig. 1, the present invention relates to a Hadoop-based distributed mobile base station data storage system comprising an interface layer, a functional layer, a data layer and a physical layer. The interface layer uses a Java API programming interface. The functional layer is provided with a point-of-interest attribute query unit, a point-of-interest spatial query unit, a point-of-interest management unit and so on. The physical layer is provided with at least one application server, a backup server and at least one core layer switch.

The data layer comprises a Linux storage cluster, built from PCs running the Linux distribution CentOS 6.5. The cluster uses Hadoop as its base architecture platform. Hadoop is a software framework for the distributed processing of massive data in a reliable, efficient and scalable way. It works in parallel, speeding up processing through parallel execution. Hadoop is also scalable and can handle petabyte-scale data. The Hadoop framework consists of many elements. At the very bottom is HDFS (Hadoop Distributed File System), which stores the files on all storage nodes in the Hadoop cluster; through HDFS, client users can create, delete, move or rename files. The data layer further comprises YARN (Yet Another Resource Negotiator), an index database, an HBase database, a MySQL database, and ZooKeeper, an important component that provides distributed services for Hadoop and HBase. The MySQL database is synchronized with the HBase database through Sqoop. The HBase database is linked to the index database through Lucene to implement retrieval. The entire data layer uses MapReduce as its distributed computing framework.

The present invention uses the distributed file system HDFS as the storage for HBase, Hive and other application data. YARN serves as the cluster's resource manager and is responsible for resource management and scheduling. The Hive data warehouse supports operations such as HQL queries on the data. HBase is a distributed column-oriented database used for structured data. ZooKeeper provides the distributed coordination service and is responsible for coordinating the various services in the system.

HDFS rack topology: a Hadoop cluster is organized as a tree structure divided into internal nodes and leaf nodes. An internal node generally represents a router or switch, and a leaf node represents a machine on which a DataNode is deployed. By default, HDFS cannot determine the rack topology on its own, that is, the topology of the DataNodes. The NameNode receives the periodic heartbeat messages of each DataNode; a DataNode serves read and write requests from HDFS clients and also responds to commands from the NameNode to create, delete and replicate blocks. However, the cluster administrator can specify a script file through the topology.script.file.name configuration parameter. The NameNode automatically loads and executes this script after it starts successfully, and the settings in the script translate the IP address of each DataNode in the cluster into the corresponding rack name; if the parameter is not set, the IP address of every DataNode resolves to /default-rack. Based on this topology, a distance called the network distance is defined: the distance from a node to its parent node is 1, and the distance between any two nodes equals the sum of their distances to their nearest common ancestor. In general, to make network communication faster, the distance between nodes should be kept as small as possible. Clearly, network communication within a rack is faster than network communication between racks.
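The network distance defined above can be illustrated with a short sketch. It assumes, purely for illustration, that each node is identified by a slash-separated path from the root of the tree (for example `/rack1/node1`); the path format and the function name `network_distance` are assumptions, not part of the described system.

```python
def network_distance(path_a, path_b):
    """Distance between two nodes identified by paths such as '/rack1/node1'.

    Each step from a node to its parent counts as 1, so the result is the
    number of steps from each node up to their nearest common ancestor,
    added together.
    """
    parts_a = [p for p in path_a.split("/") if p]
    parts_b = [p for p in path_b.split("/") if p]

    # Count the leading path components the two nodes share.
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1

    # Steps from each node up to the nearest common ancestor.
    return (len(parts_a) - common) + (len(parts_b) - common)
```

Under this definition, a node is at distance 0 from itself, two nodes in the same rack are at distance 2, and two nodes in different racks under one switch are at distance 4, which is consistent with communication inside a rack being cheaper than communication between racks.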

Heartbeat mechanism. Because the NameNode does not actively initiate interaction or communication with the DataNodes, all connections between them are initiated by the DataNodes. The main purpose is to reduce the load and pressure on the NameNode, which also safeguards the stability of the cluster; at the same time, dynamically adding or removing nodes in the cluster does not significantly affect the NameNode. A heartbeat mechanism is therefore needed, in which each DataNode actively contacts the NameNode once at regular intervals. The heartbeat interval can be set through the dfs.heartbeat.interval parameter, and a maximum duration can also be set; when the NameNode finds that a node has not contacted it for longer than this maximum duration, it considers the node dead and marks it as a DeadNode.
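The dead-node detection described above can be sketched as follows. This is a simplified model under stated assumptions, not the HDFS implementation: the class `HeartbeatMonitor`, its method names and the node identifiers are hypothetical, and timestamps are passed in explicitly (in seconds) so that the behaviour is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat of each DataNode and flags silent nodes."""

    def __init__(self, max_silence_seconds):
        # Maximum time a node may stay silent before it is considered dead.
        self.max_silence = max_silence_seconds
        # Maps node id -> timestamp of the most recent heartbeat.
        self.last_seen = {}

    def record_heartbeat(self, node_id, now):
        """Called whenever a DataNode contacts the NameNode."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now):
        """Return the ids of nodes silent for longer than the maximum."""
        return sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )


monitor = HeartbeatMonitor(max_silence_seconds=30.0)
monitor.record_heartbeat("dn1", now=0.0)
monitor.record_heartbeat("dn2", now=0.0)
monitor.record_heartbeat("dn1", now=25.0)  # dn1 keeps reporting, dn2 goes silent
```

With these calls, `monitor.dead_nodes(now=40.0)` reports only `dn2`, because `dn1` was last heard from 15 seconds earlier while `dn2` has been silent for 40 seconds.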

After the cluster starts, the DataNode's offerService method calls the NameNode's sendHeartbeat method via RPC at the configured heartbeat interval, for example every 5 seconds if the interval is set to 5 seconds. After the NameNode starts, it creates an RPC server to listen for RPC requests from the DataNodes; the NameNode's sendHeartbeat method then calls the handleHeartbeat method. Through this heartbeat mechanism, the NameNode can send instructions to the DataNodes, such as adding or deleting data.

Rack awareness. Rack awareness is implemented on the basis of the network topology. Because the placement of replicas in HDFS is critical to data reliability and cluster performance, the rack-awareness strategy serves to improve the reliability and safety of the data and also to improve network bandwidth utilization. To guard against the failure of an entire rack, replicas can be placed on different racks; doing so also makes full use of each rack's bandwidth and improves the overall performance of the cluster. If the default replica count per file is 3, the first replica is placed on the local rack, the second replica on another, randomly chosen rack, and the third replica on a different machine in the same rack as the second; if the replica count is greater than 3, the remaining replicas are stored on randomly selected nodes. In this simple way, the reliability and safety of the data are greatly improved. Before a replica is stored on a node, the node must first be verified to determine whether its state makes it usable. The isGoodTarget method in the NameNode class first calculates whether the disk has enough space to write the current replica; if the space is insufficient, another node is selected. It then counts the operations the node is currently executing; if the node's current operation count exceeds twice the cluster's current average operation count, the node is considered overloaded and no new replica is stored on it, and other nodes are checked instead. This strategy not only ensures a certain write performance but also, within a certain range, ensures load balancing as well as the reliability and safety of the data.
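The placement rule and the node check described above can be sketched together. This is a simplified model, not the HDFS source: the `Node` record, the function names `is_good_target` and `choose_targets`, and the fallback to random selection when a preferred rack has no usable node are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    free_bytes: int   # remaining disk space on the node
    active_ops: int   # operations the node is currently executing


def is_good_target(node, block_size, avg_ops):
    """A node is usable if it has room for the block and is not overloaded."""
    if node.free_bytes < block_size:
        return False
    # Overloaded: more than twice the cluster-wide average operation count.
    return node.active_ops <= 2 * avg_ops


def choose_targets(nodes, writer_rack, replicas, block_size, rng):
    """Pick up to `replicas` nodes following the rack-aware rule."""
    avg_ops = sum(n.active_ops for n in nodes) / len(nodes)
    chosen = []

    def pick(candidates):
        pool = [n for n in candidates
                if n not in chosen and is_good_target(n, block_size, avg_ops)]
        if pool:
            chosen.append(rng.choice(pool))
            return True
        return False

    if replicas >= 1:   # first replica: a node on the local rack
        pick([n for n in nodes if n.rack == writer_rack])
    if replicas >= 2:   # second replica: a node on some other rack
        pick([n for n in nodes if n.rack != writer_rack])
    if replicas >= 3 and len(chosen) == 2:
        # third replica: a different node on the second replica's rack
        pick([n for n in nodes if n.rack == chosen[1].rack])
    # Any remaining replicas go to randomly selected usable nodes.
    while len(chosen) < replicas and pick(nodes):
        pass
    return chosen


MB = 1024 * 1024
cluster = [
    Node("a1", "/rack1", 500 * MB, 2),
    Node("a2", "/rack1", 10 * MB, 2),    # too little free space
    Node("b1", "/rack2", 500 * MB, 2),
    Node("b2", "/rack2", 500 * MB, 2),
    Node("b3", "/rack2", 500 * MB, 20),  # overloaded
]
targets = choose_targets(cluster, "/rack1", 3, 128 * MB, random.Random(0))
```

In this example the first replica can only go to `a1`, since `a2` lacks space, and the other two land on `b1` and `b2` in some order, since `b3` exceeds twice the average operation count.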

The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and these modifications or substitutions shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A Hadoop-based distributed mobile base station data storage system, characterized in that the system comprises an interface layer, a functional layer, a data layer and a physical layer connected in sequence; the physical layer comprises at least one application server, a backup server and a core layer switch; each application server, each backup server and the data layer are each connected to the core layer switch; the data layer comprises a Linux storage cluster; the Linux storage cluster uses a Hadoop cluster platform; the Hadoop cluster platform comprises YARN, an index database, an HBase database, a MySQL database, and ZooKeeper for distributed coordination services; and the bottom of the Hadoop cluster platform is provided with HDFS for storing the files on all nodes.

2. The Hadoop-based distributed mobile base station data storage system according to claim 1, characterized in that the Hadoop cluster platform has a tree structure comprising internal nodes and leaf nodes; each internal node represents a router or core layer switch; each leaf node represents a machine on which a DataNode is deployed; and the DataNode serves read and write requests from HDFS and responds to commands from the NameNode to create, delete and replicate blocks.

3. The Hadoop-based distributed mobile base station data storage system according to claim 2, characterized in that the administrator of the Hadoop cluster platform specifies a script file by setting a configuration parameter; after the NameNode starts successfully, it automatically loads and executes the script, and the settings in the script translate the IP address of each DataNode in the cluster into the corresponding rack name; if the parameter is not set, the IP address of every DataNode resolves to the default rack; and the NameNode receives the periodic heartbeat messages of each DataNode.

4. The Hadoop-based distributed mobile base station data storage system according to claim 3, characterized in that, in the Hadoop cluster platform, each DataNode actively initiates contact with the NameNode once per heartbeat interval; the heartbeat interval is set by a configuration parameter, and a maximum duration can be set by a configuration parameter; and if the NameNode finds that a node has still not contacted it after the maximum duration has elapsed, it considers that node dead and marks it as a DeadNode.

5. The Hadoop-based distributed mobile base station data storage system according to claim 1, characterized in that the MySQL database is synchronized with the HBase database through Sqoop.

6. The Hadoop-based distributed mobile base station data storage system according to claim 1, characterized in that the interface layer uses a Java API programming interface.

7. The Hadoop-based distributed mobile base station data storage system according to claim 1, characterized in that the functional layer includes a point-of-interest attribute query unit, a point-of-interest spatial query unit and a point-of-interest management unit.