CN104598321A - Intelligent big data processing method and device - Google Patents

Info

Publication number
CN104598321A
Authority
CN
China
Prior art keywords
LSM tree
computer
LSM
big data
computer cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510073311.5A
Other languages
Chinese (zh)
Inventor
李克学
范莹
戴鸿君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Group Co Ltd
Original Assignee
Inspur Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Group Co Ltd
Priority to CN201510073311.5A
Publication of CN104598321A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent big data processing method and device. The method comprises the following steps: forming a computer cluster comprising multiple computers for storing big data; acquiring big data; storing the acquired big data into LSM trees in form of columnar storage; storing each LSM tree into the corresponding computer in the computer cluster. According to the method and the device, the big data can be effectively processed.

Description

An intelligent big data processing method and device
Technical Field
The present invention relates to the field of computer technology, and in particular to an intelligent big data processing method and device.
Background
With the development of computer technology, big data has emerged. First, big data means that the volume of data is large, that is, large data sets; in practice, many enterprise users combine multiple data sets, forming data volumes at the PB level. Second, big data means that the variety of data is large: data comes from multiple sources, and data types and formats are increasingly rich, going beyond the previously defined scope of structured data to include semi-structured and unstructured data.
How to process big data has therefore become a problem that urgently needs to be solved.
Summary of the Invention
The present invention provides a big data processing method and device that can process big data effectively.
An intelligent big data processing method, comprising:
Forming a computer cluster comprising multiple computers for storing big data;
Collecting big data;
Storing the collected big data in LSM trees in a columnar manner;
Storing each LSM tree in a respective computer of the computer cluster.
Preferably, the method is applied to an intelligent visual surveillance system,
and further comprises: presetting a distributed storage table based on LSM trees, in which the video of a time period serves as the row key, and the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis each serve as a column family;
Storing the collected big data in LSM trees in a columnar manner comprises:
Classifying the collected big data according to the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis, and then storing it in the LSM trees of the corresponding column families of the distributed storage table; within each column family, writing the data of each LSM tree into memory, flushing the in-memory LSM tree data to disk once an LSM tree reaches a predetermined size, and periodically merging the LSM trees on disk into a new LSM tree.
Storing each LSM tree in a respective computer of the computer cluster comprises: storing each merged new LSM tree in a respective computer of the computer cluster.
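The classification step above can be sketched as a routing function. This is a minimal illustration, not the patented implementation: the category names, the record fields (`category`, `row_key`, `qualifier`, `value`), and the function `route_records` are hypothetical, and the family labels Check, Class, Content, P, and O are borrowed from Table 1 of the description.

```python
from collections import defaultdict

# Hypothetical mapping from a record's category to a column family label.
CATEGORY_TO_FAMILY = {
    "detection": "Check",        # real-time detection of targets
    "classification": "Class",   # target classification
    "tracking": "P",             # tracking of moving targets
    "content": "Content",        # video content analysis
}

def route_records(records):
    """Group records by column family; unknown categories go to 'O' (other)."""
    families = defaultdict(dict)
    for rec in records:
        family = CATEGORY_TO_FAMILY.get(rec["category"], "O")
        families[family][(rec["row_key"], rec["qualifier"])] = rec["value"]
    # A sorted list of cells stands in for the LSM tree of each family.
    return {name: sorted(cells.items()) for name, cells in families.items()}

records = [
    {"category": "classification", "row_key": "v1", "qualifier": "obj1", "value": "vehicle"},
    {"category": "detection", "row_key": "v1", "qualifier": "obj1", "value": "found"},
    {"category": "weather", "row_key": "v1", "qualifier": "note", "value": "rain"},
]
routed = route_records(records)
```

Keeping one store per column family is what makes the layout columnar: a reader interested only in target classification never touches the detection or tracking data.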
Preferably, after each LSM tree is stored in a respective computer of the computer cluster, the method further comprises:
Setting a corresponding task for each computer in the computer cluster; after actual tasks are received, distributing each actual task to the corresponding computer in the computer cluster; each actual task reads or analyzes big data in real time from the LSM tree stored on the computer to which it was distributed.
Wherein, the actual task is a MapReduce task.
Preferably, collecting big data comprises: collecting the data of the intelligent visual surveillance system within a set time period.
An intelligent big data processing device, comprising:
A forming unit, configured to form a computer cluster comprising multiple computers for storing big data;
A collecting unit, configured to collect big data;
An LSM tree processing unit, configured to store the collected big data in LSM trees in a columnar manner;
A cluster processing unit, configured to store each LSM tree in a respective computer of the computer cluster.
Wherein, the LSM tree processing unit is further configured to set up a distributed storage table based on LSM trees, in which the video of a time period serves as the row key, and the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis each serve as a column family; to classify the collected big data according to the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis, and then store it in the LSM trees of the corresponding column families of the distributed storage table; and, within each column family, to write the data of each LSM tree into memory, flush the in-memory LSM tree data to disk once an LSM tree reaches a predetermined size, and periodically merge the LSM trees on disk into a new LSM tree.
The cluster processing unit specifically stores each merged new LSM tree in a respective computer of the computer cluster.
Preferably, the cluster processing unit is further configured to set a corresponding task for each computer in the computer cluster; after actual tasks are received, to distribute each actual task to the corresponding computer in the computer cluster; each actual task reads big data from the LSM tree stored on the computer to which it was distributed.
The intelligent big data processing method and device provided by the embodiments of the present invention store big data in a distributed manner across the multiple computers of a computer cluster rather than centrally on one computer. Moreover, the data is stored in LSM trees in a columnar manner rather than directly in a database, which solves the big data storage problem.
In addition, in an embodiment of the present invention, combining distributed cluster storage with LSM tree storage enables real-time storage of big data and avoids the storage difficulties caused by excessive data volume.
In an embodiment of the present invention, tasks can be distributed to the computers in the computer cluster. When a task is to obtain big data, the data is obtained in real time from the LSM trees of the computers in the cluster, which achieves real-time acquisition of big data.
In an embodiment of the present invention, tasks can be distributed to the computers in the computer cluster; when a task is to analyze big data, the analysis results are obtained in real time.
Brief Description of the Drawings
Fig. 1 is a flowchart of intelligent big data processing in one embodiment of the present invention.
Fig. 2 is a flowchart of intelligent big data processing in an intelligent visual surveillance system in another embodiment of the present invention.
Fig. 3 is a schematic diagram of the distribution of intelligent big data in one embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a device for intelligent big data processing in one embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
One embodiment of the present invention provides an intelligent big data processing method. Referring to Fig. 1, the method comprises:
Step 101: form a computer cluster comprising multiple computers for storing big data;
Step 102: collect big data;
Step 103: store the collected big data in LSM trees in a columnar manner;
Step 104: store each LSM tree in a respective computer of the computer cluster.
This embodiment stores big data in a distributed manner across the multiple computers of a computer cluster rather than centrally on one computer. Moreover, the data is stored in LSM trees in a columnar manner rather than directly in a database, which solves the big data storage problem.
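Steps 101 to 104 can be sketched end to end as follows. This is only an illustrative model under stated assumptions: the function names, the record layout, and the round-robin placement are hypothetical (the source does not say how trees are assigned to computers), and a sorted list of key/value pairs stands in for a real LSM tree.

```python
from collections import defaultdict

def form_cluster(node_names):
    """Step 101: model the cluster as node name -> list of stored trees."""
    return {name: [] for name in node_names}

def collect(source):
    """Step 102: gather records from a source (here, any iterable of dicts)."""
    return list(source)

def to_columnar_trees(records):
    """Step 103: columnar layout, one sorted key/value run per column."""
    columns = defaultdict(dict)
    for rec in records:
        row_key = rec["row_key"]
        for column, value in rec.items():
            if column != "row_key":
                columns[column][row_key] = value
    return {col: sorted(cells.items()) for col, cells in columns.items()}

def place_trees(cluster, trees):
    """Step 104: spread trees over nodes round-robin (an assumed policy)."""
    nodes = sorted(cluster)
    for i, (column, tree) in enumerate(sorted(trees.items())):
        cluster[nodes[i % len(nodes)]].append((column, tree))
    return cluster

records = [
    {"row_key": "t0002", "detect": "car", "class": "vehicle"},
    {"row_key": "t0001", "detect": "person", "class": "pedestrian"},
]
cluster = place_trees(form_cluster(["node-a", "node-b"]),
                      to_columnar_trees(collect(records)))
```

Here each column ends up on its own node, with its cells sorted by row key, which mirrors the idea of storing each tree on a separate computer instead of keeping everything on one machine.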
In one embodiment of the present invention, intelligent big data processing can be implemented specifically for an intelligent visual surveillance system. In this case, building on the processing of the Fig. 1 embodiment above, a distributed storage table based on LSM trees can be preset, in which the video of a time period serves as the row key, and the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis each serve as a column family. Afterwards, when big data is collected, it is classified according to the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis, and then stored in the LSM trees of the corresponding column families of the distributed storage table. Data storage is ensured by merging multiple LSM trees: within each column family, the data of each LSM tree is written into memory; once an LSM tree reaches a predetermined size, the in-memory LSM tree data is flushed to disk; and the LSM trees on disk are periodically merged into a new LSM tree.
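The write, flush, and merge cycle described above can be sketched with a toy LSM tree. This is a simplified model, not HBase's or the patent's implementation: the class name `SimpleLSMTree`, the entry-count threshold `memtable_limit`, and the in-memory lists standing in for disk files are all assumptions made for illustration.

```python
class SimpleLSMTree:
    """Toy LSM tree: an in-memory table plus sorted runs standing in for disk files."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # the "predetermined size", in entries
        self.memtable = {}                    # data written to memory first
        self.disk_runs = []                   # flushed sorted runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Move the in-memory data to 'disk' as one sorted run."""
        if self.memtable:
            self.disk_runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def merge(self):
        """Merge all runs into one new run; newer values overwrite older ones."""
        merged = {}
        for run in self.disk_runs:            # oldest first
            merged.update(run)
        self.disk_runs = [sorted(merged.items())] if merged else []

    def get(self, key):
        """Look in memory first, then in runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.disk_runs):
            for k, v in run:
                if k == key:
                    return v
        return None

tree = SimpleLSMTree(memtable_limit=2)
for k, v in [("t3", "c"), ("t1", "a"), ("t2", "b"), ("t1", "a2"), ("t4", "d")]:
    tree.put(k, v)
runs_before_merge = len(tree.disk_runs)
tree.merge()
```

Writes only ever touch memory or append a new sorted run, which is why this structure suits continuous ingestion; the periodic merge keeps the number of runs that a read must consult small.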
In an embodiment of the present invention, tasks can be set in advance for each computer in the cluster and distributed according to the data collected by, for example, the intelligent visual surveillance system; the data a task obtains is read in real time from the LSM trees. When real-time data needs to be analyzed, the computer cluster can also distribute tasks to each machine, ensuring that analysis results are obtained in real time.
The process of implementing intelligent big data processing in an intelligent visual surveillance system is described below with a concrete example. Referring to Fig. 2, the process comprises:
Step 200: form a computer cluster comprising multiple computers for storing big data.
Here, once the computer cluster is formed, the subsequent big data processing can be shared among the computers in the cluster, while externally the cluster appears as a single computer.
Step 201: set a corresponding task for each computer in the computer cluster.
Here, tasks are set for the cluster; commonly these are MapReduce tasks based on Hadoop.
Step 202: preset a distributed storage table based on LSM trees, and save this table on each computer in the computer cluster.
In this step, the table may be based on the HBase table structure.
In this step, the storage table may use the video of a time period as the row key, with the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis each serving as a column family. When other information needs to be added, it can be added as an extension in another column family.
One possible structure of the distributed storage table based on LSM trees is shown in Table 1 below.
Table 1
In Table 1, Check denotes real-time detection, Class denotes target classification, Content denotes video content analysis, P denotes the tracking of moving targets, and O denotes other information; each column family can be divided into columns as needed.
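The table layout described for Table 1 can be modeled as a small schema. This is a sketch under stated assumptions: the row-key format, the column qualifiers, and the helper names `make_row_key`, `empty_row`, and `put_cell` are hypothetical; only the five column family labels come from the text.

```python
# Column family labels from Table 1 and what each one holds.
COLUMN_FAMILIES = {
    "Check": "real-time detection of targets",
    "Class": "target classification",
    "Content": "video content analysis",
    "P": "tracking of moving targets",
    "O": "other information",
}

def make_row_key(video_id, start, end):
    """Build a row key for the video of one time period (format is assumed)."""
    return f"{video_id}:{start}-{end}"

def empty_row(row_key):
    """Create a row with one empty set of columns per column family."""
    return {"row_key": row_key,
            "families": {name: {} for name in COLUMN_FAMILIES}}

def put_cell(row, family, qualifier, value):
    """Store a value under family:qualifier; families are fixed, columns are not."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    row["families"][family][qualifier] = value
    return row

row = empty_row(make_row_key("cam01", "20150211T0800", "20150211T0810"))
put_cell(row, "Class", "obj1", "vehicle")
put_cell(row, "P", "obj1", "left-to-right")
```

Columns inside a family are created on demand, which matches the remark that each column family can be divided into columns as needed, while the set of families stays fixed.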
Step 203: collect the big data of the intelligent visual surveillance system within a set time period.
Step 204: classify the collected big data according to the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis.
Step 205: store the classified big data in the LSM trees of the corresponding column families of the distributed storage table.
Step 206: within each column family, write the data of each LSM tree into memory; once an LSM tree reaches a predetermined size, flush the in-memory LSM tree data to disk; and periodically merge the LSM trees on disk into a new LSM tree.
Step 207: store each merged new LSM tree in a respective computer of the computer cluster.
Step 208: after actual tasks are received, distribute each actual task to the corresponding computer in the computer cluster.
Step 209: each actual task reads or analyzes big data in real time from the LSM tree stored on the computer to which it was distributed.
In this embodiment, because the HBase LSM tree is implemented on a Hadoop cluster, when real-time data needs to be analyzed, the computer cluster can also distribute MapReduce tasks to each machine, ensuring that the LSM trees are read in real time. Because the LSM tree has its own merging and splitting (sharding) mechanisms, this distributed computer cluster can solve both the mass data storage problem and the real-time mass data processing problem.
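The idea in steps 208 and 209, running a task next to the data on each computer and then combining the partial results, can be illustrated with an in-process stand-in for MapReduce. This sketch does not use Hadoop: threads play the role of cluster nodes, and the names `map_task`, `reduce_counts`, and `run_job`, as well as the sample data, are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_task(local_tree):
    """Map phase: count values in the tree stored on one node."""
    return Counter(value for _key, value in local_tree)

def reduce_counts(partials):
    """Reduce phase: add up the per-node counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def run_job(cluster):
    """Dispatch one map task per node, then reduce the partial results."""
    nodes = sorted(cluster)
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(map_task, (cluster[n] for n in nodes)))
    return reduce_counts(partials)

# Each node holds a sorted run of (row_key, target class) pairs.
cluster = {
    "node-a": [("t1", "vehicle"), ("t2", "pedestrian")],
    "node-b": [("t3", "vehicle")],
}
totals = run_job(cluster)
```

Each map task reads only the tree stored on its own node, so no raw data moves between nodes; only the small per-node counts are combined in the reduce step.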
With the embodiment shown in Fig. 2, the big data of the intelligent visual surveillance system is distributed to the computers in the computer cluster, achieving real-time storage of big data; such a distribution is shown in Fig. 3.
One embodiment of the present invention provides an intelligent big data processing device. Referring to Fig. 4, the device comprises:
A forming unit 401, configured to form a computer cluster comprising multiple computers for storing big data;
A collecting unit 402, configured to collect big data;
An LSM tree processing unit 403, configured to store the collected big data in LSM trees in a columnar manner;
A cluster processing unit 404, configured to store each LSM tree in a respective computer of the computer cluster.
In one embodiment of the present invention, the LSM tree processing unit 403 is further configured to set up a distributed storage table based on LSM trees, in which the video of a time period serves as the row key, and the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis each serve as a column family; to save this distributed storage table on each computer in the computer cluster; to classify the collected big data according to the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis, and then store it in the LSM trees of the corresponding column families of the distributed storage table; and, within each column family, to write the data of each LSM tree into memory, flush the in-memory LSM tree data to disk once an LSM tree reaches a predetermined size, and periodically merge the LSM trees on disk into a new LSM tree.
The cluster processing unit 404 specifically stores each merged new LSM tree in a respective computer of the computer cluster.
In one embodiment of the present invention, the cluster processing unit 404 is further configured to set a corresponding task for each computer in the computer cluster; after actual tasks are received, to distribute each actual task to the corresponding computer in the computer cluster; each actual task reads or analyzes big data in real time from the LSM tree stored on the computer to which it was distributed.
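The division of the device into units 401 to 404 can be mirrored by four small classes composed into one object. This is purely an architectural sketch: the class and method names, the record tuples, and the round-robin placement are assumptions, and sorted lists again stand in for LSM trees.

```python
class FormingUnit:
    """Unit 401: forms the cluster, modeled as node name -> stored trees."""
    def form(self, node_names):
        return {name: [] for name in node_names}

class CollectingUnit:
    """Unit 402: collects records; any iterable stands in for the source."""
    def collect(self, source):
        return list(source)

class LSMTreeProcessingUnit:
    """Unit 403: groups records by column family into sorted runs."""
    def store(self, records):
        families = {}
        for family, row_key, value in records:
            families.setdefault(family, {})[row_key] = value
        return {name: sorted(cells.items()) for name, cells in families.items()}

class ClusterProcessingUnit:
    """Unit 404: places trees on nodes and dispatches tasks to them."""
    def place(self, cluster, trees):
        nodes = sorted(cluster)
        for i, family in enumerate(sorted(trees)):
            cluster[nodes[i % len(nodes)]].append((family, trees[family]))
        return cluster

    def dispatch(self, cluster, task):
        return {node: task(stored) for node, stored in cluster.items()}

class ProcessingDevice:
    """Wires the four units together in the order described in the text."""
    def __init__(self):
        self.forming = FormingUnit()
        self.collecting = CollectingUnit()
        self.lsm = LSMTreeProcessingUnit()
        self.cluster_unit = ClusterProcessingUnit()

    def run(self, node_names, source, task):
        cluster = self.forming.form(node_names)
        trees = self.lsm.store(self.collecting.collect(source))
        self.cluster_unit.place(cluster, trees)
        return self.cluster_unit.dispatch(cluster, task)

device = ProcessingDevice()
result = device.run(
    ["n1", "n2"],
    [("Check", "t1", "person"), ("Class", "t1", "pedestrian")],
    task=lambda stored: [family for family, _tree in stored],
)
```

Keeping storage placement and task dispatch in the same unit reflects the text, where the cluster processing unit both stores the trees and distributes the tasks that later read them.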
The embodiments of the present invention have at least the following beneficial effects:
1. The embodiments of the present invention store big data in a distributed manner across the multiple computers of a computer cluster rather than centrally on one computer. Moreover, the data is stored in LSM trees in a columnar manner rather than directly in a database, which solves the big data storage problem.
2. In an embodiment of the present invention, combining distributed cluster storage with LSM tree storage enables real-time storage of big data and avoids the storage difficulties caused by excessive data volume.
3. In an embodiment of the present invention, tasks can be distributed to the computers in the computer cluster. When a task is to obtain big data, the data is obtained in real time from the LSM trees of the computers in the cluster, which achieves real-time acquisition of big data.
4. In an embodiment of the present invention, tasks can be distributed to the computers in the computer cluster; when a task is to analyze big data, the analysis results are obtained in real time.
5. In an embodiment of the present invention, because the LSM tree has its own merging and splitting (sharding) mechanisms, the distributed computer cluster can solve both the mass data storage problem and the real-time mass data processing problem.
It should be noted that the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. An intelligent big data processing method, characterized by comprising:
Forming a computer cluster comprising multiple computers for storing big data;
Collecting big data;
Storing the collected big data in LSM trees in a columnar manner; storing each LSM tree in a respective computer of the computer cluster.
2. The method according to claim 1, characterized in that it is applied to an intelligent visual surveillance system,
and further comprises: presetting a distributed storage table based on LSM trees, in which the video of a time period serves as the row key, and the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis each serve as a column family; and saving this distributed storage table on each computer in the computer cluster;
Storing the collected big data in LSM trees in a columnar manner comprises:
Classifying the collected big data according to the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis, and then storing it in the LSM trees of the corresponding column families of the distributed storage table; within each column family, writing the data of each LSM tree into memory, flushing the in-memory LSM tree data to disk once an LSM tree reaches a predetermined size, and periodically merging the LSM trees on disk into a new LSM tree.
Storing each LSM tree in a respective computer of the computer cluster comprises: storing each merged new LSM tree in a respective computer of the computer cluster.
3. The method according to claim 1, characterized in that, after each LSM tree is stored in a respective computer of the computer cluster, the method further comprises:
Setting a corresponding task for each computer in the computer cluster; after actual tasks are received, distributing each actual task to the corresponding computer in the computer cluster; each actual task reads or analyzes big data in real time from the LSM tree stored on the computer to which it was distributed.
4. The method according to claim 3, characterized in that the actual task is a MapReduce task.
5. The method according to any one of claims 1 to 4, characterized in that collecting big data comprises: collecting the data of the intelligent visual surveillance system within a set time period.
6. An intelligent big data processing device, characterized by comprising:
A forming unit, configured to form a computer cluster comprising multiple computers for storing big data;
A collecting unit, configured to collect big data;
An LSM tree processing unit, configured to store the collected big data in LSM trees in a columnar manner;
A cluster processing unit, configured to store each LSM tree in a respective computer of the computer cluster.
7. The device according to claim 6, characterized in that the LSM tree processing unit is further configured to set up a distributed storage table based on LSM trees, in which the video of a time period serves as the row key, and the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis each serve as a column family; to save this distributed storage table on each computer in the computer cluster; to classify the collected big data according to the real-time detection of targets, target classification, the tracking of moving targets, and video content analysis, and then store it in the LSM trees of the corresponding column families of the distributed storage table; and, within each column family, to write the data of each LSM tree into memory, flush the in-memory LSM tree data to disk once an LSM tree reaches a predetermined size, and periodically merge the LSM trees on disk into a new LSM tree.
The cluster processing unit specifically stores each merged new LSM tree in a respective computer of the computer cluster.
8. The device according to claim 6, characterized in that the cluster processing unit is further configured to set a corresponding task for each computer in the computer cluster; after actual tasks are received, to distribute each actual task to the corresponding computer in the computer cluster; each actual task reads or analyzes big data in real time from the LSM tree stored on the computer to which it was distributed.
CN201510073311.5A 2015-02-11 2015-02-11 Intelligent big data processing method and device Pending CN104598321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510073311.5A CN104598321A (en) 2015-02-11 2015-02-11 Intelligent big data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510073311.5A CN104598321A (en) 2015-02-11 2015-02-11 Intelligent big data processing method and device

Publications (1)

Publication Number Publication Date
CN104598321A true CN104598321A (en) 2015-05-06

Family

ID=53124135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510073311.5A Pending CN104598321A (en) 2015-02-11 2015-02-11 Intelligent big data processing method and device

Country Status (1)

Country Link
CN (1) CN104598321A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227769A (en) * 2016-07-15 2016-12-14 北京奇虎科技有限公司 Date storage method and device
CN107508850A (en) * 2017-06-23 2017-12-22 广东工业大学 Lock-step distribution method based on tree network and deblocking under a kind of big data environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103023992A (en) * 2012-11-28 2013-04-03 江苏乐买到网络科技有限公司 Mass data distributed storage method
US20140279855A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Differentiated secondary index maintenance in log structured nosql data stores
CN104268709A (en) * 2014-10-10 2015-01-07 浪潮集团有限公司 Method for designing RFID system by distributed LSM tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李晓波: ""基于Hadoop的海量视频数据存储及转码系统的研究与设计"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227769A (en) * 2016-07-15 2016-12-14 北京奇虎科技有限公司 Date storage method and device
CN106227769B (en) * 2016-07-15 2019-11-26 北京奇虎科技有限公司 Date storage method and device
CN107508850A (en) * 2017-06-23 2017-12-22 广东工业大学 Lock-step distribution method based on tree network and deblocking under a kind of big data environment
CN107508850B (en) * 2017-06-23 2020-07-28 广东工业大学 Lock step distribution method based on tree network and data blocks in big data environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150506
