CN104199919A - Method for achieving real-time reading of super-large-scale data - Google Patents

Info

Publication number
CN104199919A
Authority
CN
China
Prior art keywords
data
client
real
time
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410438674.XA
Other languages
Chinese (zh)
Inventor
许梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU HUIWANG INFORMATION TECHNOLOGY Co Ltd
Original Assignee
JIANGSU HUIWANG INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU HUIWANG INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410438674.XA
Publication of CN104199919A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The invention discloses a method for achieving real-time reading of super-large-scale data. The method adopts a volume management node, a block data storage node, an ID management module, a user-mounted client, an identity identification module and a real-time result transmission module, wherein data storage is based on the existing HDFS, multiple threads are started on each datanode to create indexes and to create index files in parallel, and the indexes are generated using a B+ tree structure. The method overcomes the system resource waste and long data processing time caused by the data processing methods commonly used in existing cloud computing solutions, and is an effective method for real-time processing of massive data.

Description

A method for achieving real-time reading of super-large-scale data
Technical field
The present invention relates to the field of computer application systems, and in particular to a method for achieving real-time reading of super-large-scale data.
Background
With the rapid development of the information age, the explosive growth of information has become a defining feature of the times. The problem that follows is how to store massive amounts of data, a demand that traditional hard-disk storage clearly cannot meet. The DAS (direct-attached storage) approach that appeared later solved the problem of storage volume, but discrete DAS devices form isolated islands: when the capacity of one device is saturated, a new storage device must be purchased even if other DAS devices have spare capacity, and each newly added server also requires a new DAS, so the storage cost is high. The later NAS and SAN (Storage Area Network) solved the problem of sharing storage space, but as data volume grows, cluster performance and scalability have in turn become the main problems, so an ultra-large-scale, low-cost storage system still cannot be built.
The emergence of cloud computing provides an effective solution for massive data processing. In common cloud computing solutions, the HDFS (distributed file system) of Hadoop (a distributed system architecture) makes it easy to store massive data, effectively prevents single points of failure, and avoids unnecessary losses. For data retrieval, however, the conventional approach is to run a global search with MapReduce (parallel computation over large-scale data) on HDFS, which requires a complete scan of all data stored on HDFS. In cloud computing, especially with massive data, this wastes a great deal of system resources and takes a large amount of time, so it is clearly not suitable for a real production environment.
Summary of the invention
The object of the present invention is to overcome the shortcomings of the data processing methods commonly used in existing cloud computing solutions, namely wasted system resources and long data processing times, and to provide an effective method for real-time processing of massive data, in particular a method for achieving real-time reading of super-large-scale data.
To achieve the above object, the present invention provides a method for achieving real-time reading of super-large-scale data, comprising a volume management node, a block data storage node, an ID management module, a user-mounted client, an identity identification module and a real-time result transmission module, wherein:
Volume management node: maintains information on all sub-clusters of cloud platform data servers, and provides the mounted client with ID information and with the client's own IP address and port number information;
Block data storage node: built on the existing HDFS; multiple threads are started on each datanode to create indexes, index files are created in parallel, and the indexes are generated using a B+ tree structure;
ID management module: encapsulates and manages the ID information of the exclusive client itself, extracts or separates from the ID information the corresponding user name, MAC address and the ID of the parent exclusive client, and sends them to the block data storage node;
User-mounted client: performs real-time queries: using a distributed computing system, a job is created and submitted on the server side to run the query, which is divided into three steps:
A. Index filtering is performed on the namenode: because index file names are created according to time, the time in the query condition is matched against the index file names to select the index files that satisfy the condition;
B. Tasks are distributed to each datanode, where a B+ tree query is run against the selected index files using the query condition to obtain the positions of the data that satisfy the condition;
C. Tasks are distributed again: each machine reads the data at the positions obtained in the previous step and returns the query result;
Identity identification module: obtains the MAC address of the intelligent terminal and compares it with the MAC address extracted or separated by the ID management module to judge whether they match; if they match, startup of the exclusive client continues; otherwise, operation is stopped;
Real-time result transmission module: uses Jetty as the web container. While a data query runs on HDFS, Jetty polls the query result directory; if the directory is not empty, it reads the query result file and returns it to the client. The client then keeps sending continue requests to the server, and the server starts multiple threads to read the query results and returns the data read to the client. If the returned data is empty, the flow ends; if it is not empty, the client sends another continue request. During a query, as soon as any datanode completes its query successfully, data is returned to the client, without waiting for all datanodes to finish querying.
Further, in the aforesaid method for achieving real-time reading of super-large-scale data, the active and standby volume management servers provide service externally through the same VIP, and are configured through the management and monitoring center, which keeps the states of the two consistent.
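The three query steps A to C described above can be illustrated with a minimal single-process sketch in Python. It is not the patented implementation: the index file naming scheme (`idx_YYYYMMDD`), the dictionary layout of a datanode and all function names are assumptions made for illustration, and a sorted list searched with `bisect` stands in for the leaf level of a B+ tree.

```python
from bisect import bisect_left, bisect_right
from datetime import date

def filter_index_files(names, start, end):
    """Step A: keep index files whose date, parsed from the name, lies in [start, end].

    Assumes a hypothetical naming scheme such as "idx_20140901".
    """
    selected = []
    for name in names:
        stamp = name.split("_")[1]
        day = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
        if start <= day <= end:
            selected.append(name)
    return selected

def lookup_positions(index, low, high):
    """Step B: range lookup over sorted (key, offset) pairs.

    The sorted list is a stand-in for the leaf level of a B+ tree.
    """
    keys = [key for key, _ in index]
    first = bisect_left(keys, low)
    last = bisect_right(keys, high)
    return [offset for _, offset in index[first:last]]

def read_records(block, positions):
    """Step C: read the records stored at the given positions."""
    return [block[p] for p in positions]

def query(datanodes, start, end, low, high):
    """Run steps A to C sequentially; a real system would distribute B and C."""
    all_names = sorted({name for node in datanodes for name in node["indexes"]})
    selected = filter_index_files(all_names, start, end)
    results = []
    for node in datanodes:
        for name in selected:
            if name not in node["indexes"]:
                continue
            positions = lookup_positions(node["indexes"][name], low, high)
            results.extend(read_records(node["blocks"][name], positions))
    return results

# Hypothetical sample data: one datanode holding two daily index files.
datanodes = [
    {
        "indexes": {
            "idx_20140831": [(1, 0), (5, 1)],
            "idx_20140901": [(2, 0), (7, 1), (9, 2)],
        },
        "blocks": {
            "idx_20140831": ["a", "b"],
            "idx_20140901": ["c", "d", "e"],
        },
    }
]
print(query(datanodes, date(2014, 9, 1), date(2014, 9, 1), 5, 9))  # prints ['d', 'e']
```

Filtering by file name first means that only the index for 2014-09-01 is searched, which is the point of step A: data outside the queried time range is never touched.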
Beneficial effects:
The method for achieving real-time reading of super-large-scale data designed in the present invention overcomes the system resource waste and long data processing time caused by the data processing methods commonly used in existing cloud computing solutions, and is an effective method for real-time processing of massive data.
Embodiments
Embodiment 1
This embodiment provides a method for achieving real-time reading of super-large-scale data, comprising a volume management node, a block data storage node, an ID management module, a user-mounted client, an identity identification module and a real-time result transmission module, wherein:
Volume management node: maintains information on all sub-clusters of cloud platform data servers, and provides the mounted client with ID information and with the client's own IP address and port number information;
Block data storage node: built on the existing HDFS; multiple threads are started on each datanode to create indexes, index files are created in parallel, and the indexes are generated using a B+ tree structure;
ID management module: encapsulates and manages the ID information of the exclusive client itself, extracts or separates from the ID information the corresponding user name, MAC address and the ID of the parent exclusive client, and sends them to the block data storage node;
User-mounted client: performs real-time queries: using a distributed computing system, a job is created and submitted on the server side to run the query, which is divided into three steps:
A. Index filtering is performed on the namenode: because index file names are created according to time, the time in the query condition is matched against the index file names to select the index files that satisfy the condition;
B. Tasks are distributed to each datanode, where a B+ tree query is run against the selected index files using the query condition to obtain the positions of the data that satisfy the condition;
C. Tasks are distributed again: each machine reads the data at the positions obtained in the previous step and returns the query result;
Identity identification module: obtains the MAC address of the intelligent terminal and compares it with the MAC address extracted or separated by the ID management module to judge whether they match; if they match, startup of the exclusive client continues; otherwise, operation is stopped;
Real-time result transmission module: uses Jetty as the web container. While a data query runs on HDFS, Jetty polls the query result directory; if the directory is not empty, it reads the query result file and returns it to the client. The client then keeps sending continue requests to the server, and the server starts multiple threads to read the query results and returns the data read to the client. If the returned data is empty, the flow ends; if it is not empty, the client sends another continue request. During a query, as soon as any datanode completes its query successfully, data is returned to the client, without waiting for all datanodes to finish querying.
Further, in the aforesaid method for achieving real-time reading of super-large-scale data, the active and standby volume management servers provide service externally through the same VIP, and are configured through the management and monitoring center, which keeps the states of the two consistent.
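The continue-request exchange of the real-time result transmission module can be sketched as a simple loop. The Python sketch below is only an in-memory illustration under stated assumptions: it uses neither Jetty nor HDFS, and the `ResultServer` class, the `handle_continue` method and the `fetch_all` function are hypothetical names, with each batch modelling a result file written by one datanode.

```python
from collections import deque

class ResultServer:
    """In-memory stand-in for the server side (hypothetical; not Jetty, not HDFS)."""

    def __init__(self, batches):
        # Each batch models the result file produced by one datanode.
        self._pending = deque(batches)

    def handle_continue(self):
        """Return the next available batch, or an empty list when none is left."""
        if self._pending:
            return self._pending.popleft()
        return []

def fetch_all(server):
    """Client loop: keep sending continue requests until an empty batch comes back."""
    rows = []
    requests = 0
    while True:
        requests += 1
        batch = server.handle_continue()
        if not batch:
            break  # empty response: the flow ends
        rows.extend(batch)
    return rows, requests

server = ResultServer([["r1", "r2"], ["r3"]])
rows, requests = fetch_all(server)
print(rows, requests)  # prints ['r1', 'r2', 'r3'] 3
```

The client receives the first batch as soon as it exists rather than waiting for every datanode. The sketch follows the text in treating an empty response as the end of the flow; it does not model results that are still being produced when the client asks.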
All processing in the present invention is executed concurrently, which makes maximum use of the computer's hardware and greatly improves processing efficiency, so that users can obtain query results while the query operation is still being carried out.
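The concurrent index creation on a datanode can be sketched with a thread pool. The following Python sketch is an assumption-laden illustration, not the described system: the block names, the `(key, value)` record layout and both function names are hypothetical, and each index is a sorted list of `(key, offset)` pairs standing in for a B+ tree.

```python
from concurrent.futures import ThreadPoolExecutor

def build_index(records):
    """Build a sorted list of (key, offset) pairs for one block of (key, value) records."""
    return sorted((key, offset) for offset, (key, _) in enumerate(records))

def build_all_indexes(blocks, workers=4):
    """Index every block on its own worker thread and return {block_name: index}."""
    names = list(blocks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        indexes = pool.map(build_index, (blocks[name] for name in names))
        return dict(zip(names, indexes))

# Hypothetical sample data: two blocks, named after the day they cover.
blocks = {
    "block_20140901": [(9, "e"), (2, "c"), (7, "d")],
    "block_20140902": [(4, "g"), (1, "f")],
}
indexes = build_all_indexes(blocks)
print(indexes["block_20140901"])  # prints [(2, 1), (7, 2), (9, 0)]
```

In CPython, threads mainly help when index building is dominated by I/O such as reading blocks from disk; the sketch shows the structure of one independent index per block rather than a measured speed-up.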

Claims (2)

1. A method for achieving real-time reading of super-large-scale data, characterized by comprising a volume management node, a block data storage node, an ID management module, a user-mounted client, an identity identification module and a real-time result transmission module, wherein:
Volume management node: maintains information on all sub-clusters of cloud platform data servers, and provides the mounted client with ID information and with the client's own IP address and port number information;
Block data storage node: built on the existing HDFS; multiple threads are started on each datanode to create indexes, index files are created in parallel, and the indexes are generated using a B+ tree structure;
ID management module: encapsulates and manages the ID information of the exclusive client itself, extracts or separates from the ID information the corresponding user name, MAC address and the ID of the parent exclusive client, and sends them to the block data storage node;
User-mounted client: performs real-time queries: using a distributed computing system, a job is created and submitted on the server side to run the query, which is divided into three steps:
A. Index filtering is performed on the namenode: because index file names are created according to time, the time in the query condition is matched against the index file names to select the index files that satisfy the condition;
B. Tasks are distributed to each datanode, where a B+ tree query is run against the selected index files using the query condition to obtain the positions of the data that satisfy the condition;
C. Tasks are distributed again: each machine reads the data at the positions obtained in the previous step and returns the query result;
Identity identification module: obtains the MAC address of the intelligent terminal and compares it with the MAC address extracted or separated by the ID management module to judge whether they match; if they match, startup of the exclusive client continues; otherwise, operation is stopped;
Real-time result transmission module: uses Jetty as the web container. While a data query runs on HDFS, Jetty polls the query result directory; if the directory is not empty, it reads the query result file and returns it to the client. The client then keeps sending continue requests to the server, and the server starts multiple threads to read the query results and returns the data read to the client. If the returned data is empty, the flow ends; if it is not empty, the client sends another continue request. During a query, as soon as any datanode completes its query successfully, data is returned to the client, without waiting for all datanodes to finish querying.
2. The method for achieving real-time reading of super-large-scale data according to claim 1, characterized in that the active and standby volume management servers provide service externally through the same VIP, and are configured through the management and monitoring center, which keeps the states of the two consistent.
CN201410438674.XA 2014-09-01 2014-09-01 Method for achieving real-time reading of super-large-scale data Pending CN104199919A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410438674.XA | 2014-09-01 | 2014-09-01 | Method for achieving real-time reading of super-large-scale data

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201410438674.XA | 2014-09-01 | 2014-09-01 | Method for achieving real-time reading of super-large-scale data

Publications (1)

Publication Number | Publication Date
CN104199919A | 2014-12-10

Family

ID=52085212

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201410438674.XA (Pending; CN104199919A (en)) | Method for achieving real-time reading of super-large-scale data | 2014-09-01 | 2014-09-01

Country Status (1)

Country | Link
CN (1) | CN104199919A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649847A (en) * 2016-12-30 2017-05-10 南威软件股份有限公司 A large data real-time processing system based on Hadoop
CN108279943A (en) * 2017-01-05 2018-07-13 腾讯科技(深圳)有限公司 Index loading method and device
CN110134893A (en) * 2019-04-03 2019-08-16 广州朗国电子科技有限公司 A kind of multimachine structure retrieval display method and device based on cloud information issuing system
CN110781001A (en) * 2019-10-23 2020-02-11 广东浪潮大数据研究有限公司 Kubernetes-based container environment variable checking method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649847A (en) * 2016-12-30 2017-05-10 南威软件股份有限公司 A large data real-time processing system based on Hadoop
CN108279943A (en) * 2017-01-05 2018-07-13 腾讯科技(深圳)有限公司 Index loading method and device
CN108279943B (en) * 2017-01-05 2020-09-11 腾讯科技(深圳)有限公司 Index loading method and device
CN110134893A (en) * 2019-04-03 2019-08-16 广州朗国电子科技有限公司 A kind of multimachine structure retrieval display method and device based on cloud information issuing system
CN110781001A (en) * 2019-10-23 2020-02-11 广东浪潮大数据研究有限公司 Kubernetes-based container environment variable checking method
CN110781001B (en) * 2019-10-23 2023-03-28 广东浪潮大数据研究有限公司 Kubernetes-based container environment variable checking method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141210