CN108595473A

CN108595473A - Big data application platform based on cloud computing

Info

Publication number: CN108595473A
Application number: CN201810194531.7A
Authority: CN
Inventors: 袁进波
Original assignee: Guangzhou Unipower Computer Co Ltd
Current assignee: Guangzhou Unipower Computer Co Ltd
Priority date: 2018-03-09
Filing date: 2018-03-09
Publication date: 2018-09-28

Abstract

The invention discloses a big data application platform based on cloud computing, comprising a data acquisition layer, a data analysis layer, a storage layer, a computation layer, an application layer, and a permission and resource management and control layer. Because the platform is implemented on cloud computing, its storage layer and computation layer can be expanded online. The computation layer provides multiple different computing engines for data preprocessing, data analysis, and data mining. During a job, a user can select the computing engine best suited to the current job, or one the user is already familiar with, which reduces the burden of the work and improves efficiency.

Description

Big data application platform based on cloud computing

Technical field

The present invention relates to the field of computer technology, and in particular to a big data application platform based on cloud computing.

Background art

With the arrival of the era of massive data, every business requires more data storage space and stronger data processing capability. In data processing, a single computing engine cannot meet users' needs, and the expressive power of computation results presented in tabular form is very limited. The restriction that a given job can use only one computing engine adds considerable workload to users' data processing. For example, in a job that requires team collaboration, different team members are often responsible for different tasks, and each member wants to choose the computing engine best suited to the current task or the one he or she knows best. Providing only a single computing engine therefore burdens both individual work and team collaboration.

Summary of the invention

To overcome the deficiencies of the prior art, the purpose of the present invention is to provide a big data application platform based on cloud computing that offers users multiple different computing engines, allows the computation layer and the storage layer to be expanded online, and accommodates users' different job requirements.

The purpose of the present invention is achieved by the following technical solution:

A big data application platform based on cloud computing, comprising:

a data acquisition layer, used for acquiring data;

a data analysis layer, used for extracting the data acquired by the data acquisition layer into the storage layer for storage;

a storage layer, used for storing the various types of acquired data so that upper-layer applications can call them;

a computation layer, used for providing multiple different computing engines to implement data preprocessing, data analysis, and data mining;

an application layer, which is the entry point for using data and provides application modules to implement the calling, querying, and management of data;

a permission and resource management and control layer, which spans the data acquisition layer, the data analysis layer, the storage layer, the computation layer, and the application layer to implement unified management of user permissions and resources.

Further, in the data acquisition layer, the acquired data includes real-time data, structured data, and unstructured data.

Further, in the data acquisition layer, the structured data is acquired by Oracle technology, the real-time data is acquired online by Kafka message queue technology, and the unstructured data is acquired by Flume technology.

Further, when the data analysis layer extracts data, the extraction process includes preliminary cleaning, conversion, and calculation of the data, so that the data takes a format suitable for storage in the storage layer.

Further, the storage layer includes the distributed file storage system HDFS, the column-oriented distributed storage system HBase, and the key-value storage system Redis.

Further, the computing engines provided by the computation layer include a parallel computing engine based on in-memory computing, a parallel computing engine for SQL analysis, and a parallel computing engine for implementing MapReduce tasks.

Further, the permission and resource management and control layer includes the centralized log management system Sentry, the centralized security management system Ranger, and the resource management system Yarn.

Further, the application modules include a visualization module and a collaborative programming module; the visualization module is used for visualizing the computation results of the computation layer, and the collaborative programming module is used for implementing team collaborative programming.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The big data application platform of the present invention is implemented on cloud computing, so its storage layer and computation layer can be expanded online. The computation layer provides multiple different computing engines for data preprocessing, data analysis, and data mining. During a job, a user can select the computing engine best suited to the current job, or one the user is already familiar with, which reduces the burden of the work and improves efficiency.

Description of the drawings

Fig. 1 is a system architecture diagram of the big data application platform based on cloud computing according to a preferred embodiment of the present invention.

Detailed description of the embodiments

The present invention is further described below with reference to the drawing and specific embodiments. It should be noted that, provided there is no conflict, the embodiments or technical features described below may be combined arbitrarily to form new embodiments.

Fig. 1 shows the system architecture diagram of the big data application platform based on cloud computing according to a preferred embodiment of the present invention. The big data application platform includes:

a data acquisition layer, used for acquiring data;

a data analysis layer, used for extracting the data acquired by the data acquisition layer into the storage layer for storage;

a storage layer, used for storing the various types of acquired data so that upper-layer applications can call them;

a computation layer, used for providing multiple different computing engines to implement data preprocessing, data analysis, and data mining;

an application layer, which is the entry point for using data and provides application modules to implement the calling, querying, and management of data;

a permission and resource management and control layer, which spans the data acquisition layer, the data analysis layer, the storage layer, the computation layer, and the application layer to implement unified management of user permissions and resources.

Fig. 1 provides a relatively reasonable way of building each layer. The big data application platform based on cloud computing of this embodiment is a platform that supports online expansion of computing resources, supports team collaborative programming, and provides multiple different computing engines.

The computing resources of the platform are cloud computing resources, including the resources of the storage layer and the computation layer, so that the platform can increase the computing and processing capability of its computing engines and expand its storage capacity online.

Preferably, the data acquired by the data acquisition layer includes structured data, real-time data, and unstructured data; specifically, the types of data include business data, historical data, log data, and behavioral data. In the data acquisition layer, structured data is acquired by Oracle technology, real-time data is acquired online by Kafka message queue technology, and unstructured data is acquired by Flume technology. Sqoop technology can also be used to transfer data between relational databases and Hadoop. In effect, the data acquisition layer serves as the data source layer of the big data application platform.
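The pairing of data categories with acquisition technologies in this embodiment can be sketched as a simple lookup. This is a minimal illustration rather than the platform's implementation: the names `DataCategory`, `ACQUISITION_CHANNEL`, and `channel_for` are hypothetical, and no real Oracle, Kafka, or Flume client is invoked.

```python
from enum import Enum


class DataCategory(Enum):
    """The three categories of acquired data named in this embodiment."""
    STRUCTURED = "structured"
    REAL_TIME = "real-time"
    UNSTRUCTURED = "unstructured"


# Hypothetical lookup table: data category -> acquisition technology
# named in the text. Only the pairing itself comes from the source.
ACQUISITION_CHANNEL = {
    DataCategory.STRUCTURED: "Oracle",
    DataCategory.REAL_TIME: "Kafka",
    DataCategory.UNSTRUCTURED: "Flume",
}


def channel_for(category: DataCategory) -> str:
    """Return the name of the acquisition technology for a data category."""
    return ACQUISITION_CHANNEL[category]
```

A caller would ask, for example, `channel_for(DataCategory.REAL_TIME)` to learn which collector should handle a streaming source.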

Preferably, when the data analysis layer extracts data, the extraction process includes preliminary cleaning, conversion, and calculation of the data, so that the data takes a format suitable for storage in the storage layer. The data analysis layer includes Oozie, Informatica, Spark, MR (MapReduce), and the like, where Oozie is a workflow engine that assists with Hadoop job management and Informatica is an ETL tool.
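The three extraction steps (cleaning, conversion, calculation) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the tooling named above: the record fields `amount`, `quantity`, and `ts`, the derived field `total`, and all function names are hypothetical.

```python
from datetime import datetime

REQUIRED_FIELDS = {"amount", "quantity", "ts"}  # hypothetical schema


def clean(record):
    """Preliminary cleaning: trim strings and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def convert(record):
    """Conversion: cast string fields to typed values."""
    converted = dict(record)
    converted["amount"] = float(converted["amount"])
    converted["quantity"] = int(converted["quantity"])
    converted["ts"] = datetime.strptime(converted["ts"], "%Y-%m-%d %H:%M:%S")
    return converted


def calculate(record):
    """Calculation: derive a field from the converted values."""
    enriched = dict(record)
    enriched["total"] = enriched["amount"] * enriched["quantity"]
    return enriched


def extract(raw_records):
    """Run clean -> convert -> calculate and return storage-ready rows."""
    rows = []
    for raw in raw_records:
        record = clean(raw)
        if not REQUIRED_FIELDS <= record.keys():
            continue  # skip records that lack required fields after cleaning
        rows.append(calculate(convert(record)))
    return rows
```

Keeping each step as a separate function mirrors the order given in the text and lets any one step be swapped out without touching the others.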

Preferably, the storage layer includes the distributed file storage system HDFS, the column-oriented distributed storage system HBase, and the key-value storage system Redis, for storing and managing different types of data. The storage layer also includes Kudu, a storage engine suited to fast analysis of rapidly changing data that balances real-time data updates with analysis speed.

Preferably, as shown in Fig. 1, the computation layer includes multiple different computing engines, which together serve three main functions: data preprocessing, data analysis, and data mining.

Among them, Spark is an open-source big data cluster computing environment based on in-memory computing. It is implemented in Scala and provides interfaces for Java, Scala, Python, and R; Python can be used for data mining.

Hawq is a Hadoop-native massively parallel SQL analysis engine aimed at analytical applications. Like other relational databases, it accepts SQL and returns result sets. However, through massively parallel processing (MPP) it offers many features and functions that traditional databases and other databases lack.

Hive is a data warehouse tool based on Hadoop. It can map structured data files to database tables and provides full SQL query functionality, converting SQL statements into MapReduce tasks for execution. Its advantage is a low learning cost: simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it very suitable for statistical analysis in data warehouses.
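To illustrate why SQL-like statements save effort, the sketch below spells out in plain Python the map, shuffle, and reduce steps that a statement such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` stands for. It is a conceptual illustration only and does not reflect Hive's actual query planner; the table name, the `dept` column, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit a (key, 1) pair for each input row."""
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    """Equivalent of a GROUP BY count, written as map/shuffle/reduce."""
    return reduce_phase(shuffle_phase(map_phase(rows)))
```

A single SQL-like statement replaces all three hand-written phases, which is the learning-cost advantage the text describes.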

Preferably, the permission and resource management and control layer includes the centralized log management system Sentry, the centralized security management system Ranger, and the resource management system Yarn, which together manage user permissions and resources in a centralized, unified way and allocate computing resources. When an account is created, the platform assigns it a default set of computing resources (including CPU, memory, and storage).
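The default allocation at account creation can be sketched as follows. This is a minimal sketch under stated assumptions: the source gives no quota figures, so the numbers in `DEFAULT_QUOTA` are placeholders, and `ResourceQuota` and `AccountRegistry` are hypothetical names that do not correspond to any Yarn, Ranger, or Sentry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceQuota:
    """Computing resources assigned to one account."""
    cpu_cores: int
    memory_gb: int
    storage_gb: int


# Placeholder defaults; the source does not state actual values.
DEFAULT_QUOTA = ResourceQuota(cpu_cores=2, memory_gb=4, storage_gb=50)


class AccountRegistry:
    """Tracks accounts and the quota each one was given."""

    def __init__(self, default_quota=DEFAULT_QUOTA):
        self._default_quota = default_quota
        self._quotas = {}

    def create_account(self, username):
        """Create an account and assign it the default quota."""
        if username in self._quotas:
            raise ValueError(f"account already exists: {username}")
        self._quotas[username] = self._default_quota
        return self._default_quota

    def quota_of(self, username):
        """Return the quota currently assigned to an account."""
        return self._quotas[username]
```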

Preferably, the application modules of the application layer include a visualization module and a collaborative programming module, where the visualization module visualizes the computation results of the computation layer and the collaborative programming module implements team collaborative programming. Specifically, HUE and Zeppelin can be deployed in the application layer. HUE and Zeppelin provide interactive editing and thereby enable team collaborative programming. Zeppelin is a web-based notebook that provides interactive data analysis.

In this embodiment, because of the way the application layer and the computation layer are built, different paragraphs of the same notebook can be written for different computing engines. When a notebook is executed, its paragraphs are executed in sequence, and the system automatically calls the corresponding computing engine according to the engine declaration (interpreter binding) in each paragraph.
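The per-paragraph dispatch described above can be sketched as follows. The sketch is modeled loosely on the `%name` prefix that Zeppelin paragraphs use to select an interpreter, but it is not Zeppelin code: `parse_paragraph`, `run_notebook`, and the `engines` mapping of names to callables are hypothetical stand-ins for real engine clients.

```python
def parse_paragraph(text):
    """Split a paragraph into (engine_name, body) using its leading %name line."""
    first_line, _, body = text.partition("\n")
    first_line = first_line.strip()
    if not first_line.startswith("%") or len(first_line) == 1:
        raise ValueError("paragraph has no engine declaration")
    return first_line[1:], body


def run_notebook(paragraphs, engines):
    """Run paragraphs in order, sending each one to its declared engine.

    `engines` maps an engine name to a callable that takes the paragraph
    body and returns a result (a stand-in for a real engine client).
    """
    results = []
    for text in paragraphs:
        name, body = parse_paragraph(text)
        if name not in engines:
            raise KeyError(f"no engine registered for %{name}")
        results.append((name, engines[name](body)))
    return results
```

For example, a notebook whose first paragraph starts with `%spark` and whose second starts with `%hive` would be executed by two different callables, in that order, without the user switching tools.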

In terms of team collaboration, the author of a notebook on the platform of this embodiment can share permissions on the notebook with other users, and different users can then edit and run the notebook together to work collaboratively. In the process, each user can select a computing engine suited to the current type of job, or one the user is familiar with, which reduces the burden of individual work and team collaboration and improves efficiency. In addition, HUE and Zeppelin can visualize the execution results of the notebook, giving users a more intuitive understanding of the data results.
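The sharing of notebook permissions can be sketched as follows. This is a minimal sketch, assuming a simple per-user set of allowed actions: the `Notebook` class and the action names `read`, `edit`, `run`, and `share` are hypothetical and do not reproduce the permission model of HUE or Zeppelin.

```python
class Notebook:
    """A notebook whose author can grant actions to other users."""

    ALL_ACTIONS = frozenset({"read", "edit", "run", "share"})

    def __init__(self, author):
        self.author = author
        # The author starts with every action; other users start with none.
        self._grants = {author: set(self.ALL_ACTIONS)}

    def share(self, granted_by, user, actions):
        """Grant `actions` to `user`, if `granted_by` is allowed to share."""
        if "share" not in self._grants.get(granted_by, set()):
            raise PermissionError(f"{granted_by} may not share this notebook")
        unknown = set(actions) - self.ALL_ACTIONS
        if unknown:
            raise ValueError(f"unknown actions: {sorted(unknown)}")
        self._grants.setdefault(user, set()).update(actions)

    def can(self, user, action):
        """Return True if `user` has been granted `action`."""
        return action in self._grants.get(user, set())
```

Granting `edit` and `run` without `share` lets a collaborator work on the notebook while leaving control over further sharing with the author.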

The above embodiments are only preferred embodiments of the present invention and do not limit its scope of protection. Any insubstantial change or substitution made by a person skilled in the art on the basis of the present invention falls within the scope claimed by the present invention.

Claims

1. A big data application platform based on cloud computing, characterized by comprising:

a data acquisition layer, used for acquiring data;

a data analysis layer, used for extracting the data acquired by the data acquisition layer into the storage layer for storage;

a storage layer, used for storing the various types of acquired data so that upper-layer applications can call them;

a computation layer, used for providing multiple different computing engines to implement data preprocessing, data analysis, and data mining;

an application layer, which is the entry point for using data and provides application modules to implement the calling, querying, and management of data;

a permission and resource management and control layer, which spans the data acquisition layer, the data analysis layer, the storage layer, the computation layer, and the application layer to implement unified management of user permissions and resources.

2. The big data application platform based on cloud computing according to claim 1, characterized in that, in the data acquisition layer, the acquired data includes real-time data, structured data, and unstructured data.

3. The big data application platform based on cloud computing according to claim 2, characterized in that, in the data acquisition layer, the structured data is acquired by Oracle technology, the real-time data is acquired online by Kafka message queue technology, and the unstructured data is acquired by Flume technology.

4. The big data application platform based on cloud computing according to claim 1, characterized in that, when the data analysis layer extracts data, the extraction process includes preliminary cleaning, conversion, and calculation of the data, so that the data takes a format suitable for storage in the storage layer.

5. The big data application platform based on cloud computing according to claim 1, characterized in that the storage layer includes the distributed file storage system HDFS, the column-oriented distributed storage system HBase, and the key-value storage system Redis.

6. The big data application platform based on cloud computing according to claim 1, characterized in that the computing engines provided by the computation layer include a parallel computing engine based on in-memory computing, a parallel computing engine for SQL analysis, and a parallel computing engine for implementing MapReduce tasks.

7. The big data application platform based on cloud computing according to claim 1, characterized in that the permission and resource management and control layer includes the centralized log management system Sentry, the centralized security management system Ranger, and the resource management system Yarn.

8. The big data application platform based on cloud computing according to any one of claims 1 to 7, characterized in that the application modules include a visualization module and a collaborative programming module, the visualization module being used for visualizing the computation results of the computation layer, and the collaborative programming module being used for implementing team collaborative programming.