CN104199919A - Method for achieving real-time reading of super-large-scale data - Google Patents
Method for achieving real-time reading of super-large-scale data
- Publication number
- CN104199919A
- Authority
- CN
- China
- Prior art keywords
- data
- client
- real
- time
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The invention discloses a method for achieving real-time reading of super-large-scale data. The method uses a volume management node, block data storage nodes, an ID management module, a user mount client, an identity identification module and a real-time result transmission module. Data storage is based on an existing HDFS; multiple threads are started on each datanode to create indexes and build index files in parallel, and the indexes are generated with a B+ tree structure. The method overcomes the waste of system resources and the long data processing time caused by data processing methods commonly used in existing cloud computing solutions, and is an effective method for processing massive data in real time.
Description
Technical field
The present invention relates to the field of computer application systems, and in particular to a method for achieving real-time reading of super-large-scale data.
Background technology
With the rapid development of the information age, explosive growth in the volume of information has become a defining feature of the era, and with it comes the problem of storing massive amounts of data. Traditional hard-disk storage clearly cannot meet the demand. Direct-attached storage (DAS), which appeared later, addressed storage capacity, but discrete DAS devices form isolated islands: when one device's capacity is saturated, new storage must be purchased even if other DAS devices have spare capacity, and every newly added server needs its own new DAS, so storage cost is high. NAS and SAN (Storage Area Network), which came afterwards, solved the problem of sharing storage space, but as data volume grows, cluster performance and scalability have in turn become the main problems, so an ultra-large-scale, low-cost storage system still cannot be built.
The emergence of cloud computing provides an effective approach to processing massive data. In common cloud computing solutions, the HDFS (distributed file system) of Hadoop (a distributed system architecture) makes it easy to store massive data, effectively prevents single points of failure and avoids unnecessary losses. For data retrieval, however, the conventional approach is to run a global search on HDFS with MapReduce (parallel computation over large-scale data), which requires a complete scan of all data stored on HDFS. In cloud computing, especially with massive data, this wastes enormous system resources and takes a great deal of time, and is clearly not suitable for a real production environment.
Summary of the invention
The object of the invention is to overcome the drawbacks of the data processing methods commonly used in existing cloud computing solutions, namely wasted system resources and long data processing time, and to provide an effective method for processing massive data in real time, in particular a method for achieving real-time reading of super-large-scale data.
To achieve this object, the present invention provides a method for achieving real-time reading of super-large-scale data, comprising a volume management node, a block data storage node, an ID management module, a user mount client, an identity identification module and a real-time result transmission module, wherein:
Volume management node: maintains information about all data server sub-clusters of the cloud platform, and provides the mount client with the client's own ID information, IP address and port number;
Block data storage node: based on an existing HDFS, starts multiple threads on each datanode to create indexes and build index files in parallel; the indexes are generated with a B+ tree structure;
ID management module: encapsulates and manages the ID information of the dedicated client, extracts or separates from that ID information the corresponding user name, MAC address and parent ID of the dedicated client, and sends them to the block data storage node;
User mount client: performs real-time queries. It uses a distributed computing system to create and submit a job on the server side, and the query is divided into three steps:
A. Index filtering is performed on the namenode. Because index files are named according to their creation time, the time in the query condition is matched against the index file names to select the index files that satisfy the condition;
B. Tasks are distributed to each datanode, which uses the selected index files and the query condition to perform a B+ tree lookup and obtain the positions of the data that satisfy the condition;
C. Tasks are distributed again; each machine reads the data at the positions obtained in the previous step and returns the query result;
Identity identification module: obtains the MAC address of the intelligent terminal and compares it with the MAC address extracted or separated by the ID management module. If they match, startup of the dedicated client continues; otherwise the operation stops;
Real-time result transmission module: uses Jetty as the web container. While a query is running on HDFS, Jetty polls the query result directory. Once the directory is not empty, it reads the query result file and returns it to the client. The client then keeps sending continue requests to the server; the server starts multiple threads to read the query results and returns the data read to the client. If the returned data is empty, the process ends; if it is not empty, the client sends another continue request. During a query, as soon as any datanode completes its query successfully, its data is returned to the client, without waiting for all datanodes to finish.
Further, in the aforesaid method for achieving real-time reading of super-large-scale data, the active and standby volume management servers provide service externally through the same VIP, and the management and monitoring center keeps the configuration and state of the active and standby volume management servers consistent.
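The three query steps A to C can be illustrated with a minimal single-process sketch. Everything specific here is an assumption made for illustration: the `idx_YYYY-MM-DD` file naming scheme, the function names, and the sample data are hypothetical, and a sorted list searched with `bisect` stands in for the B+ tree named in the text. It is not the patented implementation and does not use Hadoop.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def select_index_files(index_names, start, end):
    """Step A: keep index files whose name encodes a day in [start, end].

    Assumes a hypothetical naming scheme "idx_YYYY-MM-DD".
    """
    selected = []
    for name in index_names:
        day = date.fromisoformat(name.split("_", 1)[1])
        if start <= day <= end:
            selected.append(name)
    return selected


def lookup_positions(index, low_key, high_key):
    """Step B: range lookup over a sorted list of (key, offset) pairs.

    The sorted list stands in for the leaf level of a B+ tree.
    """
    keys = [key for key, _ in index]
    lo = bisect_left(keys, low_key)
    hi = bisect_right(keys, high_key)
    return [offset for _, offset in index[lo:hi]]


def read_records(data, offsets):
    """Step C: read only the records at the positions found in step B."""
    return [data[offset] for offset in offsets]


# Hypothetical sample data: one index and one data file per day.
INDEXES = {
    "idx_2014-08-31": [(1, 0), (5, 1)],
    "idx_2014-09-01": [(2, 0), (4, 1), (9, 2)],
}
DATA_FILES = {
    "idx_2014-08-31": ["a", "b"],
    "idx_2014-09-01": ["c", "d", "e"],
}


def run_query(start, end, low_key, high_key):
    """Chain steps A, B and C for a time range and a key range."""
    results = []
    for name in select_index_files(sorted(INDEXES), start, end):
        offsets = lookup_positions(INDEXES[name], low_key, high_key)
        results.extend(read_records(DATA_FILES[name], offsets))
    return results
```

In this sketch a query restricted to a single day never touches the index or data of any other day, which is the point of filtering by index file name before any lookup is distributed.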
Beneficial effect:
The method for achieving real-time reading of super-large-scale data designed by the present invention overcomes the drawbacks of the data processing methods commonly used in existing cloud computing solutions, namely wasted system resources and long data processing time, and is an effective method for processing massive data in real time.
Embodiment
Embodiment 1
This embodiment provides a method for achieving real-time reading of super-large-scale data, comprising a volume management node, a block data storage node, an ID management module, a user mount client, an identity identification module and a real-time result transmission module, wherein:
Volume management node: maintains information about all data server sub-clusters of the cloud platform, and provides the mount client with the client's own ID information, IP address and port number;
Block data storage node: based on an existing HDFS, starts multiple threads on each datanode to create indexes and build index files in parallel; the indexes are generated with a B+ tree structure;
ID management module: encapsulates and manages the ID information of the dedicated client, extracts or separates from that ID information the corresponding user name, MAC address and parent ID of the dedicated client, and sends them to the block data storage node;
User mount client: performs real-time queries. It uses a distributed computing system to create and submit a job on the server side, and the query is divided into three steps:
A. Index filtering is performed on the namenode. Because index files are named according to their creation time, the time in the query condition is matched against the index file names to select the index files that satisfy the condition;
B. Tasks are distributed to each datanode, which uses the selected index files and the query condition to perform a B+ tree lookup and obtain the positions of the data that satisfy the condition;
C. Tasks are distributed again; each machine reads the data at the positions obtained in the previous step and returns the query result;
Identity identification module: obtains the MAC address of the intelligent terminal and compares it with the MAC address extracted or separated by the ID management module. If they match, startup of the dedicated client continues; otherwise the operation stops;
Real-time result transmission module: uses Jetty as the web container. While a query is running on HDFS, Jetty polls the query result directory. Once the directory is not empty, it reads the query result file and returns it to the client. The client then keeps sending continue requests to the server; the server starts multiple threads to read the query results and returns the data read to the client. If the returned data is empty, the process ends; if it is not empty, the client sends another continue request. During a query, as soon as any datanode completes its query successfully, its data is returned to the client, without waiting for all datanodes to finish.
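The continue-request exchange of the real-time result transmission module can be sketched as follows. This is a simplified in-memory model under stated assumptions: `ResultServer`, `publish`, `handle_continue` and `fetch_all` are hypothetical names, the result directory is modelled as a queue of batches, and no Jetty or HDFS API is used. The sketch also assumes that all batches are published before the client starts fetching; a real deployment would need some way to tell "no result yet" apart from "query finished", which the text does not specify.

```python
from collections import deque


class ResultServer:
    """Stands in for the server side; result files are modelled as queued batches."""

    def __init__(self):
        self._pending = deque()

    def publish(self, rows):
        """Called when one datanode has finished its part of the query."""
        self._pending.append(list(rows))

    def handle_continue(self):
        """Answer one continue request with the next batch, or [] if none is left."""
        return self._pending.popleft() if self._pending else []


def fetch_all(server):
    """Client side: keep sending continue requests until an empty batch comes back."""
    received = []
    while True:
        batch = server.handle_continue()
        if not batch:
            # An empty response ends the flow, as described in the text.
            break
        received.extend(batch)
    return received


# Hypothetical usage: two datanodes finish at different times.
demo_server = ResultServer()
demo_server.publish(["row-1", "row-2"])  # first datanode finished
demo_server.publish(["row-3"])           # second datanode finished
demo_rows = fetch_all(demo_server)
```

Because each batch is handed over as soon as it is published, the client starts receiving rows from the first datanode that finishes instead of waiting for the whole job.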
Further, in the aforesaid method for achieving real-time reading of super-large-scale data, the active and standby volume management servers provide service externally through the same VIP, and the management and monitoring center keeps the configuration and state of the active and standby volume management servers consistent.
All processing in the present invention is executed concurrently, which makes maximum use of the computer's hardware and greatly improves processing efficiency, so that the user can obtain query results as soon as the query operation is carried out.
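The parallel index creation described for the block data storage node can be sketched with a thread pool. The names `build_index`, `build_all_indexes` and the sample blocks are hypothetical, and a sorted list of `(key, offset)` pairs is used as a simple stand-in for a B+ tree; the text does not describe the tree layout or the thread model in more detail.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(block):
    """Build a sorted (key, offset) index for one block of (key, value) records.

    A sorted list is a stand-in for the B+ tree named in the text.
    """
    name, records = block
    entries = sorted((key, offset) for offset, (key, _value) in enumerate(records))
    return name, entries


def build_all_indexes(blocks, workers=4):
    """Index every block in parallel, one task per block."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(build_index, blocks.items()))


# Hypothetical sample blocks of (key, value) records.
SAMPLE_BLOCKS = {
    "block_0": [(7, "g"), (3, "c")],
    "block_1": [(5, "e"), (1, "a"), (9, "i")],
}
SAMPLE_INDEXES = build_all_indexes(SAMPLE_BLOCKS)
```

Each block is indexed independently, so the work scales with the number of blocks. In CPython, threads mainly help when index building is dominated by I/O; for CPU-bound work a process pool would be the usual substitute.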
Claims (2)
1. A method for achieving real-time reading of super-large-scale data, characterized by comprising a volume management node, a block data storage node, an ID management module, a user mount client, an identity identification module and a real-time result transmission module, wherein:
Volume management node: maintains information about all data server sub-clusters of the cloud platform, and provides the mount client with the client's own ID information, IP address and port number;
Block data storage node: based on an existing HDFS, starts multiple threads on each datanode to create indexes and build index files in parallel; the indexes are generated with a B+ tree structure;
ID management module: encapsulates and manages the ID information of the dedicated client, extracts or separates from that ID information the corresponding user name, MAC address and parent ID of the dedicated client, and sends them to the block data storage node;
User mount client: performs real-time queries. It uses a distributed computing system to create and submit a job on the server side, and the query is divided into three steps:
A. Index filtering is performed on the namenode. Because index files are named according to their creation time, the time in the query condition is matched against the index file names to select the index files that satisfy the condition;
B. Tasks are distributed to each datanode, which uses the selected index files and the query condition to perform a B+ tree lookup and obtain the positions of the data that satisfy the condition;
C. Tasks are distributed again; each machine reads the data at the positions obtained in the previous step and returns the query result;
Identity identification module: obtains the MAC address of the intelligent terminal and compares it with the MAC address extracted or separated by the ID management module. If they match, startup of the dedicated client continues; otherwise the operation stops;
Real-time result transmission module: uses Jetty as the web container. While a query is running on HDFS, Jetty polls the query result directory. Once the directory is not empty, it reads the query result file and returns it to the client. The client then keeps sending continue requests to the server; the server starts multiple threads to read the query results and returns the data read to the client. If the returned data is empty, the process ends; if it is not empty, the client sends another continue request. During a query, as soon as any datanode completes its query successfully, its data is returned to the client, without waiting for all datanodes to finish.
2. The method for achieving real-time reading of super-large-scale data according to claim 1, characterized in that the active and standby volume management servers provide service externally through the same VIP, and the management and monitoring center keeps the configuration and state of the active and standby volume management servers consistent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410438674.XA CN104199919A (en) | 2014-09-01 | 2014-09-01 | Method for achieving real-time reading of super-large-scale data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410438674.XA CN104199919A (en) | 2014-09-01 | 2014-09-01 | Method for achieving real-time reading of super-large-scale data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104199919A (en) | 2014-12-10 |
Family
ID=52085212
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410438674.XA Pending CN104199919A (en) | 2014-09-01 | 2014-09-01 | Method for achieving real-time reading of super-large-scale data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104199919A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649847A (en) * | 2016-12-30 | 2017-05-10 | 南威软件股份有限公司 | A large data real-time processing system based on Hadoop |
CN108279943A (en) * | 2017-01-05 | 2018-07-13 | 腾讯科技(深圳)有限公司 | Index loading method and device |
CN110134893A (en) * | 2019-04-03 | 2019-08-16 | 广州朗国电子科技有限公司 | A kind of multimachine structure retrieval display method and device based on cloud information issuing system |
CN110781001A (en) * | 2019-10-23 | 2020-02-11 | 广东浪潮大数据研究有限公司 | Kubernetes-based container environment variable checking method |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649847A (en) * | 2016-12-30 | 2017-05-10 | 南威软件股份有限公司 | A large data real-time processing system based on Hadoop |
CN108279943A (en) * | 2017-01-05 | 2018-07-13 | 腾讯科技(深圳)有限公司 | Index loading method and device |
CN108279943B (en) * | 2017-01-05 | 2020-09-11 | 腾讯科技(深圳)有限公司 | Index loading method and device |
CN110134893A (en) * | 2019-04-03 | 2019-08-16 | 广州朗国电子科技有限公司 | A kind of multimachine structure retrieval display method and device based on cloud information issuing system |
CN110781001A (en) * | 2019-10-23 | 2020-02-11 | 广东浪潮大数据研究有限公司 | Kubernetes-based container environment variable checking method |
CN110781001B (en) * | 2019-10-23 | 2023-03-28 | 广东浪潮大数据研究有限公司 | Kubernetes-based container environment variable checking method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105393220B (en) | System and method for disposing dotted virtual server in group system | |
US11061942B2 (en) | Unstructured data fusion by content-aware concurrent data processing pipeline | |
CN107079060A (en) | The system and method optimized for carrier-class NAT | |
CN104620539B (en) | System and method for supporting SNMP requests by cluster | |
CN104364761B (en) | For the system and method for the converting flow in cluster network | |
CN104365058B (en) | For the system and method in multinuclear and group system high speed caching SNMP data | |
SE1751212A1 (en) | Distributed data set storage and retrieval | |
US10860604B1 (en) | Scalable tracking for database udpates according to a secondary index | |
CN104966006A (en) | Intelligent face identification system based on cloud variation platform | |
CN105677842A (en) | Log analysis system based on Hadoop big data processing technique | |
CN105025053A (en) | Distributed file upload method based on cloud storage technology and system | |
CN109189751A (en) | Method of data synchronization and terminal device based on block chain | |
CN103440244A (en) | Large-data storage and optimization method | |
CN112671840B (en) | Cross-department data sharing system and method based on block chain technology | |
CN111258978B (en) | Data storage method | |
CN104199919A (en) | Method for achieving real-time reading of super-large-scale data | |
CN104348793B (en) | The storage method of storage server system and data message | |
CN107078936A (en) | For the system and method for the fine granularity control for providing the MSS values connected to transport layer | |
CN111404932A (en) | Method for accessing medical institution system to smart medical cloud service platform | |
CN104216963A (en) | Mass network management data collection and storage method based on HBase | |
JP5818263B2 (en) | Data distributed management system, apparatus, method and program | |
CN102841944A (en) | Method achieving real-time processing of big data | |
Lai et al. | A scalable multi-attribute hybrid overlay for range queries on the cloud | |
CN106649847A (en) | A large data real-time processing system based on Hadoop | |
US20140317272A1 (en) | Method of collecting information, content network management system, and node apparatus using management interface in content network based on information-centric networking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20141210 |