CN104199919A

CN104199919A - Method for achieving real-time reading of super-large-scale data

Info

Publication number: CN104199919A
Application number: CN201410438674.XA
Authority: CN
Inventors: 许梅
Original assignee: JIANGSU HUIWANG INFORMATION TECHNOLOGY Co Ltd
Current assignee: JIANGSU HUIWANG INFORMATION TECHNOLOGY Co Ltd
Priority date: 2014-09-01
Filing date: 2014-09-01
Publication date: 2014-12-10

Abstract

The invention discloses a method for achieving real-time reading of super-large-scale data. The method adopts a volume management node, block data storage nodes, an ID management module, a user mount client, an identity identification module and a real-time result transmission module. Data storage is based on the existing HDFS; multiple threads are started on each datanode to create index files in parallel, and the indexes are built with a B+ tree structure. The method overcomes the waste of system resources and the long data processing time caused by the data processing methods commonly used in existing cloud computing solutions, and is an effective method for real-time processing of mass data.

Description

A method for achieving real-time reading of super-large-scale data

Technical field

The present invention relates to the field of computer application systems, and in particular to a method for achieving real-time reading of super-large-scale data.

Background technology

With the rapid development of the information age, the explosive growth of information has become a defining feature of the times, and with it comes the problem of storing mass data. Traditional hard-disk storage clearly cannot meet the demand. The later DAS (Direct-Attached Storage) approach solved the problem of storage capacity, but discrete DAS devices form isolated islands: when one device is full, a new storage device must be purchased even if other DAS devices have spare capacity, and each newly added server also requires a new DAS, so the storage cost is high. NAS and SAN (Storage Area Network), which came later, solved the problem of sharing storage space, but as data volume grows, cluster performance and scalability have in turn become the main problems, so a super-large-scale, low-cost storage system still cannot be built.

The emergence of cloud computing provides an effective solution for mass data processing. In common cloud computing solutions, HDFS (the distributed file system) of Hadoop (a distributed system architecture) makes it easy to store mass data, effectively prevents single points of failure, and avoids unnecessary loss. For data retrieval, however, the conventional approach is to run a global search on HDFS with MapReduce (parallel computation over large-scale data), which requires a complete scan of all data stored on HDFS. In cloud computing, and especially with mass data, this wastes enormous system resources and takes a great deal of time, so it is clearly not suitable for a real production environment.

Summary of the invention

The object of the invention is to overcome the waste of system resources and the long data processing time caused by the data processing methods commonly used in existing cloud computing solutions, and to provide an effective method for real-time processing of mass data, in particular a method for achieving real-time reading of super-large-scale data.

To achieve this object, the present invention provides a method for achieving real-time reading of super-large-scale data, comprising a volume management node, a block data storage node, an ID management module, a user mount client, an identity identification module and a real-time result transmission module, wherein:

Volume management node: maintains the information of all data sub-server groups of the cloud platform, and provides the mount client with ID information and with the client's own IP address and port number information;

Block data storage node: based on the existing HDFS, multiple threads are started on each datanode to create indexes, so that index files are created in parallel; the indexes are built with a B+ tree structure;
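The parallel index creation described for the block data storage node can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a sorted list searched with `bisect` stands in for the B+ tree, a thread pool stands in for the threads started on one datanode, and the names `build_index`, `build_indexes_parallel`, `lookup` and the `idx_YYYYMMDD` file names are hypothetical.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def build_index(records):
    """Build a sorted list of (key, offset) pairs for one data block.

    Assumption: a sorted list stands in for the B+ tree named in the text.
    """
    return sorted((key, offset) for offset, (key, _value) in enumerate(records))


def build_indexes_parallel(blocks, max_workers=4):
    """Build one index per block with a thread pool, as on a single datanode.

    `blocks` maps a hypothetical index file name to that block's records.
    """
    names = list(blocks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        indexes = list(pool.map(build_index, (blocks[name] for name in names)))
    return dict(zip(names, indexes))


def lookup(index, key):
    """Return the offsets of all records whose key equals `key`."""
    keys = [k for k, _offset in index]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [offset for _key, offset in index[lo:hi]]
```

Each block is indexed independently, so the index files can be built without coordination between threads; a lookup then returns record positions rather than the records themselves.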

ID management module: encapsulates and manages the dedicated client's own ID information, extracts or separates from the ID information the corresponding user name, MAC address and the ID of the dedicated client's parent, and sends them to the block data storage node;

User mount client: performs real-time queries. Using a distributed computing system, a job is created and submitted on the server side to run the query, which is divided into three steps:

a. Index filtering is performed on the namenode: because index files are named according to their creation time, the time in the query condition is matched against the index file names to select the index files that satisfy the condition;

b. Tasks are distributed to each datanode; according to the selected index files and the query condition, a B+ tree query yields the positions of the data that satisfy the condition;

c. Tasks are distributed again; on each machine, data is read according to the positions obtained in the previous step, and the query result is returned;
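The three query steps above can be sketched as a small single-process example. It is an illustration only: the index file naming scheme `idx_YYYYMMDD`, the function names and the in-memory `datanodes` mapping are assumptions, a linear scan of (key, position) pairs stands in for the B+ tree query, and the steps run sequentially here rather than as distributed tasks.

```python
from datetime import date, datetime


def filter_index_files(index_names, start, end):
    """Step a: keep index files whose name encodes a date in [start, end].

    Assumes hypothetical names of the form 'idx_YYYYMMDD'.
    """
    selected = []
    for name in index_names:
        day = datetime.strptime(name.split("_", 1)[1], "%Y%m%d").date()
        if start <= day <= end:
            selected.append(name)
    return selected


def find_positions(index, matches):
    """Step b: return positions of entries whose key satisfies `matches`.

    `index` is a list of (key, position) pairs; a real system would
    search a B+ tree here instead of scanning a list.
    """
    return [position for key, position in index if matches(key)]


def read_records(data, positions):
    """Step c: read the records stored at the given positions."""
    return [data[position] for position in positions]


def query(datanodes, start, end, matches):
    """Run the three steps over every datanode and collect the results.

    `datanodes` maps a node name to {index_file_name: (index, data)}.
    """
    results = []
    for files in datanodes.values():
        for name in filter_index_files(files, start, end):
            index, data = files[name]
            results.extend(read_records(data, find_positions(index, matches)))
    return results
```

Because the time filter works on file names alone, index files outside the queried time range are never opened, which is what avoids the full scan of a global search.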

Identity identification module: obtains the MAC address of the intelligent terminal and compares it with the MAC address extracted or separated by the ID management module to determine whether they match; if they match, startup of the dedicated client continues, otherwise operation stops;
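The interplay of the ID management module and the identity identification module can be sketched as below. The source does not specify how the ID information is encoded, so the `|`-separated string, the `ClientId` class and the function names are all hypothetical; the sketch only shows extracting the MAC address from the ID information and comparing it with the terminal's MAC address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientId:
    """Hypothetical ID information: user name, MAC address, parent ID."""

    user_name: str
    mac_address: str
    parent_id: str

    def pack(self):
        """Encapsulate the fields into one '|'-separated string (assumed format)."""
        return "|".join((self.user_name, self.mac_address, self.parent_id))

    @classmethod
    def unpack(cls, packed):
        """Separate the packed ID information back into its three fields."""
        user_name, mac_address, parent_id = packed.split("|")
        return cls(user_name, mac_address, parent_id)


def normalize_mac(mac):
    """Lower-case a MAC address and strip ':' or '-' separators."""
    return mac.replace(":", "").replace("-", "").lower()


def may_start_client(packed_id, terminal_mac):
    """Return True if the terminal's MAC matches the MAC in the ID information."""
    client_id = ClientId.unpack(packed_id)
    return normalize_mac(client_id.mac_address) == normalize_mac(terminal_mac)
```

A caller would continue starting the dedicated client only when `may_start_client` returns `True` and stop otherwise.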

Real-time result transmission module: Jetty is used as the web container. While a data query runs on HDFS, Jetty polls the query result directory; if the directory is not empty, the query result file is read and returned to the client. The client then keeps sending continue requests to the server; the server starts multiple threads to read the query results and returns the data read to the client. If the returned data is empty, the process ends; if it is not empty, the client sends another continue request. During a query, as soon as any datanode completes its query successfully, data is returned to the client, without waiting for all datanodes to finish.
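The continue-request loop between client and server can be sketched with an in-memory stand-in. This is not Jetty code: the `ResultServer` class, its `publish` and `handle_continue` methods and the `fetch_all` client loop are hypothetical names, and a queue replaces the query result directory. It only illustrates the termination rule that an empty response ends the flow.

```python
from collections import deque


class ResultServer:
    """In-memory stand-in for the server side; a queue replaces the result directory."""

    def __init__(self):
        self._pending = deque()

    def publish(self, rows):
        """Record a batch of results, e.g. when one datanode finishes its query."""
        rows = list(rows)
        if rows:  # empty batches are skipped so they cannot end the flow early
            self._pending.append(rows)

    def handle_continue(self):
        """Answer one continue request: the next batch, or an empty list."""
        if self._pending:
            return self._pending.popleft()
        return []


def fetch_all(server):
    """Client loop: keep sending continue requests until a response is empty."""
    rows = []
    while True:
        batch = server.handle_continue()
        if not batch:
            break
        rows.extend(batch)
    return rows
```

Results are handed over batch by batch, so the client can start consuming data from the first datanode that finishes instead of waiting for the whole query.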

Further, in the aforesaid method for achieving real-time reading of super-large-scale data, the active and standby volume management servers provide service externally through the same VIP, and are uniformly configured by a management and monitoring center, which also maintains the state of both.

Beneficial effect:

The method for achieving real-time reading of super-large-scale data designed in the present invention overcomes the waste of system resources and the long data processing time caused by the data processing methods commonly used in existing cloud computing solutions, and provides an effective method for real-time processing of mass data.

Embodiment

Embodiment 1

The present embodiment provides a method for achieving real-time reading of super-large-scale data, comprising a volume management node, a block data storage node, an ID management module, a user mount client, an identity identification module and a real-time result transmission module, wherein:

All processing in the present invention is executed concurrently, which makes maximum use of the computer's hardware and greatly improves processing efficiency, so that the user obtains query results as soon as a query operation is performed.

Claims

1. A method for achieving real-time reading of super-large-scale data, characterized by comprising a volume management node, a block data storage node, an ID management module, a user mount client, an identity identification module and a real-time result transmission module, wherein:

2. The method for achieving real-time reading of super-large-scale data according to claim 1, characterized in that the active and standby volume management servers provide service externally through the same VIP, and are uniformly configured by a management and monitoring center, which also maintains the state of both.