CN108595473A - A big data application platform based on cloud computing - Google Patents


Info

Publication number
CN108595473A
Authority
CN
China
Prior art keywords
data
layer
cloud computing
application platform
platform based
Legal status
Pending
Application number
CN201810194531.7A
Other languages
Chinese (zh)
Inventor
袁进波
Current Assignee
Guangzhou Unipower Computer Co Ltd
Original Assignee
Guangzhou Unipower Computer Co Ltd
Priority date
2018-03-09
Filing date
2018-03-09
Publication date
2018-09-28
Application filed by Guangzhou Unipower Computer Co Ltd
Priority to CN201810194531.7A
Publication of CN108595473A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G06F8/22 Procedural
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a big data application platform based on cloud computing, comprising a data acquisition layer, a data analysis layer, a storage layer, a computation layer, an application layer, and a permission and resource management and control layer. Because the platform is built on cloud computing, its storage layer and computation layer can be scaled out online, and the computation layer provides a variety of different computing engines for data prediction, data analysis and data mining. During a job, a user can select the computing engine best suited to the current task, or one the user is already familiar with, which reduces the workload of the job and improves efficiency.

Description

A big data application platform based on cloud computing
Technical field
The present invention relates to the field of computer technology, and in particular to a big data application platform based on cloud computing.
Background art
With the arrival of the era of massive data, every business needs more data storage space and stronger data processing capability. In terms of data processing, a single computing engine can no longer meet users' needs, and calculation results expressed in a single form have very limited expressive power. Restricting a job to a single computing engine adds a great deal of workload to users' data processing. For example, in a job that requires team collaboration, different team members are often responsible for different tasks, and each member prefers the computing engine best suited to the current task or the one they know best; providing only a single computing engine therefore burdens both individual work and team collaboration.
Summary of the invention
To overcome the deficiencies of the prior art, an object of the present invention is to provide a big data application platform based on cloud computing that offers users a variety of different computing engines, allows the computation layer and the storage layer to be scaled out online, and accommodates the different job requirements of users.
The object of the present invention is achieved by the following technical solution:
A big data application platform based on cloud computing, comprising:
Data acquisition layer, used to collect data;
Data analysis layer, used to extract the data collected by the data acquisition layer and store it in the storage layer;
Storage layer, used to store the various types of collected data for invocation by upper-layer applications;
Computation layer, used to provide a variety of different computing engines for data prediction, data analysis and data mining;
Application layer, which serves as the entry point for using data and provides application modules for invoking, querying and managing data;
Permission and resource management and control layer, which spans the data acquisition layer, the data analysis layer, the storage layer, the computation layer and the application layer to provide unified management of user permissions and resources.
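The layer structure above can be pictured as one pipeline in which the permission check cuts across every step. The following Python sketch is only an illustration under stated assumptions: the `Platform` class, its methods, and the plain dictionaries standing in for the storage layer, the engine registry and the permission table are all hypothetical, and a real deployment would use the components named later in the description (Kafka, Flume, HDFS, Spark, Yarn and so on).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Platform:
    """Hypothetical in-process model of the six layers (not the patented system)."""

    # Storage layer: dataset name -> stored records
    storage: dict[str, list[dict]] = field(default_factory=dict)
    # Computation layer: engine name -> function applied to a dataset's records
    engines: dict[str, Callable[[list[dict]], object]] = field(default_factory=dict)
    # Permission and resource management layer: user -> datasets the user may read
    grants: dict[str, set[str]] = field(default_factory=dict)

    def acquire_and_extract(self, dataset: str, raw: list[dict]) -> None:
        """Acquisition and analysis layers: drop empty records, then store the rest."""
        cleaned = [record for record in raw if record]
        self.storage.setdefault(dataset, []).extend(cleaned)

    def run(self, user: str, engine: str, dataset: str) -> object:
        """Application layer entry point, guarded by the cross-cutting permission check."""
        if dataset not in self.grants.get(user, set()):
            raise PermissionError(f"{user} may not read {dataset}")
        return self.engines[engine](self.storage.get(dataset, []))


platform = Platform()
platform.engines["count"] = len  # stand-in for a real computing engine
platform.acquire_and_extract("orders", [{"id": 1}, {}, {"id": 2}])
platform.grants["alice"] = {"orders"}
print(platform.run("alice", "count", "orders"))  # 2
```

Here `len` merely stands in for a computing engine; registering another callable under a new name is the in-process analogue of letting the user pick an engine.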
Further, in the data acquisition layer, the collected data include real-time data, structured data and unstructured data.
Further, in the data acquisition layer, structured data are collected using Oracle technology, real-time data are collected online through Kafka message queue technology, and unstructured data are collected using Flume technology.
Further, when data are extracted in the data analysis layer, the extraction process includes preliminary cleaning, conversion and calculation of the data, so that the data take a format suitable for storage in the storage layer.
Further, the storage layer includes the distributed file system HDFS, the column-oriented distributed storage system HBase and the key-value storage system Redis.
Further, the computing engines provided by the computation layer include an in-memory parallel computing engine, a parallel computing engine for SQL analysis, and a parallel computing engine for executing MapReduce tasks.
Further, the permission and resource management and control layer includes the centralized log management system Sentry, the centralized security management system Ranger and the resource management system Yarn.
Further, the application modules include a visualization module and a collaborative programming module; the visualization module performs visualization processing on the calculation results of the computation layer, and the collaborative programming module enables collaborative team programming.
Compared with the prior art, the present invention has the following beneficial effects:
The big data application platform based on cloud computing of the present invention is built on cloud computing, so that its storage layer and computation layer can be scaled out online, and the computation layer provides a variety of different computing engines for data prediction, data analysis and data mining. During a job, users can select the computing engine suited to the current task, or one they are familiar with, thereby reducing the workload and improving efficiency.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the big data application platform based on cloud computing according to a preferred embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawing and specific embodiments. It should be noted that, provided there is no conflict, the embodiments described below, or the technical features within them, may be combined arbitrarily to form new embodiments.
Fig. 1 shows the system architecture diagram of the big data application platform based on cloud computing according to a preferred embodiment of the present invention. The big data application platform includes:
Data acquisition layer, used to collect data;
Data analysis layer, used to extract the data collected by the data acquisition layer and store it in the storage layer;
Storage layer, used to store the various types of collected data for invocation by upper-layer applications;
Computation layer, used to provide a variety of different computing engines for data prediction, data analysis and data mining;
Application layer, which serves as the entry point for using data and provides application modules for invoking, querying and managing data;
Permission and resource management and control layer, which spans the data acquisition layer, the data analysis layer, the storage layer, the computation layer and the application layer to provide unified management of user permissions and resources.
Fig. 1 shows a relatively reasonable way of constructing each layer. The big data application platform based on cloud computing of this embodiment is a platform that supports online expansion of computing resources and collaborative team programming, and that provides a variety of different computing engines.
The computing resources of the platform are cloud computing resources, including the resources of the storage layer and the computation layer, so that the platform can increase the computing and processing capacity of its computing engines and expand its storage capacity online.
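Online expansion usually comes down to deciding how many nodes a layer needs for the current load and then adding or releasing the difference without stopping the platform. The sketch below shows only that sizing arithmetic, under assumptions the description does not state: load and per-node capacity are measured in the same integer unit (for example vCPUs), and a hypothetical 20% headroom is kept free. The functions `nodes_needed` and `scale_plan` are illustrative names.

```python
def nodes_needed(demand: int, capacity_per_node: int, headroom_pct: int = 20) -> int:
    """Smallest node count that serves `demand` while keeping `headroom_pct` spare."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    # Integer arithmetic avoids float rounding: ceil(demand * (1 + h) / capacity).
    required = demand * (100 + headroom_pct)
    return max(1, -(-required // (capacity_per_node * 100)))  # ceiling division


def scale_plan(current_nodes: int, demand: int, capacity_per_node: int) -> int:
    """Positive result: nodes to add online; negative: nodes that can be released."""
    return nodes_needed(demand, capacity_per_node) - current_nodes


print(scale_plan(current_nodes=3, demand=100, capacity_per_node=30))  # 1
```

Keeping at least one node (`max(1, ...)`) is a design choice so that an idle layer stays reachable.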
Preferably, the data collected by the data acquisition layer include structured data, real-time data and unstructured data; specifically, the data types include business data, historical data, log data and behavioral data. In the data acquisition layer, structured data are collected using Oracle technology, real-time data are collected online through Kafka message queue technology, and unstructured data are collected using Flume technology. Sqoop can also be used to transfer data between relational databases and Hadoop. In effect, the data acquisition layer serves as the data source layer of the big data application platform.
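The routing described in this paragraph can be expressed as a small lookup. In the sketch below, the `Source` descriptor and its flags are hypothetical, and treating any streaming source as real-time data before checking for a schema is an assumed precedence that the description does not state; the technology names (Oracle, Kafka, Flume, Sqoop) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical descriptor of a data source feeding the acquisition layer."""

    name: str
    streaming: bool                # emits events continuously
    has_schema: bool               # rows follow a fixed schema
    relational_bulk: bool = False  # bulk copy between an RDBMS and Hadoop


# Ingestion technology per data category, as listed in the description.
CHANNELS = {"structured": "Oracle", "real_time": "Kafka", "unstructured": "Flume"}


def classify(src: Source) -> str:
    """Assumed precedence: streaming sources count as real-time data first."""
    if src.streaming:
        return "real_time"
    return "structured" if src.has_schema else "unstructured"


def channel_for(src: Source) -> str:
    """Pick the ingestion technology; Sqoop handles RDBMS <-> Hadoop bulk transfers."""
    if src.relational_bulk:
        return "Sqoop"
    return CHANNELS[classify(src)]


for s in [Source("clickstream", True, False), Source("orders", False, True),
          Source("app_logs", False, False), Source("erp_export", False, True, True)]:
    print(s.name, "->", channel_for(s))
```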
Preferably, when data are extracted in the data analysis layer, the extraction process includes preliminary cleaning, conversion and calculation of the data, so that the data take a format suitable for storage in the storage layer. The data analysis layer includes Oozie, Informatica, Spark, MR (MapReduce) and the like; Oozie is a workflow engine that assists in managing Hadoop jobs, and Informatica is an ETL tool.
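The three extraction steps named here (preliminary cleaning, conversion and calculation) can be sketched as composed functions. The record schema (`order_id`, `amount`, `ts`), the duplicate rule and the derived `amount_cents` field are hypothetical examples; in the platform itself this work would run in tools such as Spark or MapReduce jobs scheduled by Oozie, or in Informatica, rather than in a single Python process.

```python
from __future__ import annotations

from datetime import datetime
from decimal import Decimal

REQUIRED = ("order_id", "amount", "ts")  # hypothetical schema of a raw order record


def clean(records: list[dict]) -> list[dict]:
    """Preliminary cleaning: trim strings, drop incomplete rows and exact duplicates."""
    seen, out = set(), []
    for raw in records:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
        if any(not row.get(f) for f in REQUIRED):
            continue  # incomplete record
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def convert(rows: list[dict]) -> list[dict]:
    """Conversion: parse text fields into typed values."""
    return [
        {
            "order_id": row["order_id"],
            "amount": Decimal(row["amount"]),
            "date": datetime.strptime(row["ts"], "%Y/%m/%d").date(),
        }
        for row in rows
    ]


def calculate(rows: list[dict]) -> list[dict]:
    """Calculation: derive the flat, storage-ready fields."""
    return [
        {
            "order_id": row["order_id"],
            "date": row["date"].isoformat(),
            "amount_cents": int(row["amount"] * 100),
        }
        for row in rows
    ]


def extract(records: list[dict]) -> list[dict]:
    return calculate(convert(clean(records)))


raw = [
    {"order_id": " A1 ", "amount": "19.99", "ts": "2018/03/09"},
    {"order_id": "A1", "amount": "19.99", "ts": "2018/03/09"},  # duplicate after trimming
    {"order_id": "A2", "amount": "", "ts": "2018/03/09"},      # missing amount
]
print(extract(raw))  # [{'order_id': 'A1', 'date': '2018-03-09', 'amount_cents': 1999}]
```

`Decimal` is used for the amount so that the cents calculation is exact rather than subject to binary floating-point rounding.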
Preferably, the storage layer includes the distributed file system HDFS, the column-oriented distributed storage system HBase and the key-value storage system Redis, which together store and manage different types of data. The storage layer also includes Kudu, which is suited to fast analysis of rapidly changing data; it is a storage engine that balances real-time data updates against analysis speed.
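The description lists the stores but not which data goes to which. The sketch below assumes a routing rule based on the commonly documented strengths of each system (Redis for in-memory key lookups, Kudu for data that is both updated and scanned analytically, HBase for random reads and writes by row key, HDFS for write-once files); the `Workload` flags and the order of the checks are illustrative choices, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of how a dataset will be accessed."""

    updates: bool         # records change after they are written
    analytic_scans: bool  # needs fast scans or aggregations over many rows
    low_latency_kv: bool  # millisecond lookups of individual hot keys


def choose_store(w: Workload) -> str:
    """Assumed routing rules; the description does not specify this mapping."""
    if w.low_latency_kv:
        return "Redis"  # in-memory key-value store
    if w.updates and w.analytic_scans:
        return "Kudu"   # balances update latency and analysis speed
    if w.updates:
        return "HBase"  # random reads and writes by row key
    return "HDFS"       # write-once files: raw logs, archives, batch inputs


print(choose_store(Workload(updates=True, analytic_scans=True, low_latency_kv=False)))  # Kudu
```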
Preferably, as shown in Fig. 1, the computation layer includes a variety of different computing engines which, taken together, serve three main functions: data prediction, data analysis and data mining.
Among them, Spark is an open-source cluster computing environment for big data, implemented in Scala and based on in-memory computing; it provides Java, Scala, Python and R language interfaces, and Python can be used for data mining;
HAWQ is a Hadoop-native massively parallel SQL analytic engine aimed at analytical applications. Like other relational databases, it accepts SQL and returns result sets, but it also has many features and functions that traditional MPP databases and other databases lack;
Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables, provides full SQL query functionality, and converts SQL statements into MapReduce tasks for execution. Its advantage is a low learning cost: simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.
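Hive's key feature here, turning a SQL statement into MapReduce tasks, is easiest to see on a simple aggregation. The sketch below runs the map, shuffle and reduce phases in memory for the query `SELECT page, COUNT(*) FROM visits GROUP BY page`; it illustrates the MapReduce model only and is not how Hive actually plans or executes queries. The function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(rows: Iterable[dict], column: str) -> Iterator[tuple[str, int]]:
    """Map: emit (group key, 1) for every input row."""
    for row in rows:
        yield row[column], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: bring all values with the same key together."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: sum the ones emitted for each key."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def group_by_count(rows: Iterable[dict], column: str) -> dict[str, int]:
    """Same result as: SELECT column, COUNT(*) FROM rows GROUP BY column."""
    return reduce_phase(shuffle(map_phase(rows, column)))


visits = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]
print(group_by_count(visits, "page"))  # {'cart': 1, 'home': 2}
```

In a real cluster, each phase runs on many nodes and the shuffle moves data over the network, which is exactly the work Hive spares the user from writing by hand.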
Preferably, the permission and resource management and control layer includes the centralized log management system Sentry, the centralized security management system Ranger and the resource management system Yarn, which provide centralized, unified management of user permissions and resources and handle the allocation of computing resources. When the platform creates an account, it allocates a default amount of computing resources (including CPU, memory and storage) to that account.
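The default allocation described here can be sketched as an account registry. The `Quota` fields mirror the three resources the text names (CPU, memory, storage), but the default values and the admission rule in `can_run` are placeholders chosen for illustration, and `AccountRegistry` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    vcores: int
    memory_gib: int
    storage_gib: int


# Placeholder defaults: the description does not state the actual values.
DEFAULT_QUOTA = Quota(vcores=4, memory_gib=16, storage_gib=100)


class AccountRegistry:
    """Hypothetical registry that grants a default quota when an account is created."""

    def __init__(self, default: Quota = DEFAULT_QUOTA) -> None:
        self._default = default
        self._quotas: dict[str, Quota] = {}

    def create_account(self, name: str) -> Quota:
        if name in self._quotas:
            raise ValueError(f"account {name!r} already exists")
        self._quotas[name] = self._default
        return self._default

    def can_run(self, name: str, vcores: int, memory_gib: int) -> bool:
        """Admit a job only if it fits inside the account's quota."""
        quota = self._quotas[name]
        return vcores <= quota.vcores and memory_gib <= quota.memory_gib


registry = AccountRegistry()
print(registry.create_account("alice"))  # Quota(vcores=4, memory_gib=16, storage_gib=100)
```

In a Yarn-based deployment, per-account limits of this kind would more typically be enforced through scheduler queue settings than in application code.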
Preferably, the application modules of the application layer include a visualization module and a collaborative programming module; the visualization module performs visualization processing on the calculation results of the computation layer, and the collaborative programming module enables collaborative team programming. Specifically, HUE and Zeppelin can be deployed in the application layer. HUE and Zeppelin support interactive editing, which enables collaborative team programming. Zeppelin is a web-based notebook that provides interactive data analysis.
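As a stand-in for the charts that HUE or Zeppelin would draw, the following sketch renders a small result set as text bars. The function name `bar_chart` and the output layout are hypothetical; the point is only the visualization step of scaling each value against the largest one.

```python
from __future__ import annotations


def bar_chart(results: dict[str, int], width: int = 20) -> list[str]:
    """Render non-negative counts as horizontal text bars scaled to `width`."""
    if not results:
        return []
    peak = max(results.values())
    pad = max(len(label) for label in results)
    lines = []
    for label, value in results.items():
        length = round(width * value / peak) if peak else 0
        lines.append(f"{label:<{pad}} | {'#' * length} {value}")
    return lines


for line in bar_chart({"home": 2, "cart": 1}, width=4):
    print(line)
```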
In this embodiment, because of the way the application layer and the computation layer are built, different paragraphs of the same notebook can be written for different computing engines. When the notebook is executed, its paragraphs run in sequence, and the system automatically calls the computing engine specified by each paragraph's interpreter binding.
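In Zeppelin, a paragraph selects its interpreter with a leading directive such as `%spark` or `%sql`, and a paragraph without one uses the note's default interpreter. The sketch below mimics that dispatch under simplifying assumptions: interpreters are plain Python callables, the default is assumed to be `spark`, and the names `split_binding` and `run_notebook` are hypothetical rather than Zeppelin APIs.

```python
from __future__ import annotations

from typing import Callable

Interpreter = Callable[[str], str]


def split_binding(paragraph: str, default: str) -> tuple[str, str]:
    """Split '%sql SELECT 1' into ('sql', 'SELECT 1'); unbound text uses `default`."""
    text = paragraph.strip()
    if not text.startswith("%"):
        return default, text
    parts = text[1:].split(maxsplit=1)
    if not parts:
        raise ValueError("empty interpreter directive")
    return parts[0], parts[1] if len(parts) > 1 else ""


def run_notebook(paragraphs: list[str], interpreters: dict[str, Interpreter],
                 default: str = "spark") -> list[str]:
    """Run paragraphs in order, each on the engine named by its binding."""
    results = []
    for paragraph in paragraphs:
        name, code = split_binding(paragraph, default)
        if name not in interpreters:
            raise KeyError(f"no interpreter bound as %{name}")
        results.append(interpreters[name](code))
    return results


engines = {"spark": lambda code: f"[spark] {code}", "sql": lambda code: f"[sql] {code}"}
print(run_notebook(["%sql SELECT count(*) FROM orders", "val x = 1", "%spark x + 1"], engines))
```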
With regard to team collaboration, in the big data application platform based on cloud computing of this embodiment, the author of a notebook can share its permissions with other users, and different users can edit and run the notebook together, thereby working collaboratively. In the process, each user can choose a computing engine suited to the current type of job, or one they are familiar with, which reduces the burden of both individual work and team collaboration and improves efficiency. In addition, HUE and Zeppelin can visualize the execution results of a notebook, giving users a more intuitive understanding of the results.
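Sharing a notebook amounts to granting other users a role on it. The sketch below uses a hypothetical cumulative hierarchy (reader, runner, writer, owner) in which only an owner may share; Zeppelin configures note permissions in its own way, so this shows the idea rather than Zeppelin's actual model. The names `Role`, `REQUIRED_ROLE` and `Note` are illustrative.

```python
from __future__ import annotations

from enum import IntEnum


class Role(IntEnum):
    """Hypothetical cumulative roles: each level includes the ones below it."""

    READER = 1  # view the note and its results
    RUNNER = 2  # also run paragraphs
    WRITER = 3  # also edit paragraphs
    OWNER = 4   # also share the note


REQUIRED_ROLE = {"view": Role.READER, "run": Role.RUNNER, "edit": Role.WRITER, "share": Role.OWNER}


class Note:
    def __init__(self, author: str) -> None:
        self.roles: dict[str, Role] = {author: Role.OWNER}

    def allowed(self, user: str, action: str) -> bool:
        # Users without any role get 0, which is below every required role.
        return self.roles.get(user, 0) >= REQUIRED_ROLE[action]

    def share(self, by: str, user: str, role: Role) -> None:
        """Only an owner may grant another user a role on this note."""
        if not self.allowed(by, "share"):
            raise PermissionError(f"{by} cannot share this note")
        self.roles[user] = role


note = Note("alice")
note.share("alice", "bob", Role.WRITER)
print(note.allowed("bob", "edit"), note.allowed("bob", "share"))  # True False
```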
The above embodiment is merely a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention fall within the scope claimed by the present invention.

Claims (8)

1. A big data application platform based on cloud computing, characterized in that it comprises:
Data acquisition layer, used to collect data;
Data analysis layer, used to extract the data collected by the data acquisition layer and store it in the storage layer;
Storage layer, used to store the various types of collected data for invocation by upper-layer applications;
Computation layer, used to provide a variety of different computing engines for data prediction, data analysis and data mining;
Application layer, which serves as the entry point for using data and provides application modules for invoking, querying and managing data;
Permission and resource management and control layer, which spans the data acquisition layer, the data analysis layer, the storage layer, the computation layer and the application layer to provide unified management of user permissions and resources.
2. The big data application platform based on cloud computing according to claim 1, characterized in that, in the data acquisition layer, the collected data include real-time data, structured data and unstructured data.
3. The big data application platform based on cloud computing according to claim 2, characterized in that, in the data acquisition layer, the structured data are collected using Oracle technology, the real-time data are collected online through Kafka message queue technology, and the unstructured data are collected using Flume technology.
4. The big data application platform based on cloud computing according to claim 1, characterized in that, when data are extracted in the data analysis layer, the extraction process includes preliminary cleaning, conversion and calculation of the data, so that the data take a format suitable for storage in the storage layer.
5. The big data application platform based on cloud computing according to claim 1, characterized in that the storage layer includes the distributed file system HDFS, the column-oriented distributed storage system HBase and the key-value storage system Redis.
6. The big data application platform based on cloud computing according to claim 1, characterized in that the computing engines provided by the computation layer include an in-memory parallel computing engine, a parallel computing engine for SQL analysis, and a parallel computing engine for executing MapReduce tasks.
7. The big data application platform based on cloud computing according to claim 1, characterized in that the permission and resource management and control layer includes the centralized log management system Sentry, the centralized security management system Ranger and the resource management system Yarn.
8. The big data application platform based on cloud computing according to any one of claims 1 to 7, characterized in that the application modules include a visualization module and a collaborative programming module; the visualization module performs visualization processing on the calculation results of the computation layer, and the collaborative programming module enables collaborative team programming.
CN201810194531.7A 2018-03-09 2018-03-09 A big data application platform based on cloud computing Pending CN108595473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810194531.7A CN108595473A (en) 2018-03-09 2018-03-09 A big data application platform based on cloud computing

Publications (1)

Publication Number Publication Date
CN108595473A (en) 2018-09-28

Family

ID=63626065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810194531.7A Pending CN108595473A (en) 2018-03-09 2018-03-09 A big data application platform based on cloud computing

Country Status (1)

Country Link
CN (1) CN108595473A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1420488A (en) * 2001-08-07 2003-05-28 陈涛 Video tape picture and text data generating and coding method and picture and text data playback device
CN105608144A (en) * 2015-12-17 2016-05-25 山东鲁能软件技术有限公司 Big data analysis platform device and method based on multilayer model iteration
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A real-time storage, processing and query system for big data
CN107515927A (en) * 2017-08-24 2017-12-26 深圳市云房网络科技有限公司 A real estate user behavior analysis platform
CN107577805A (en) * 2017-09-26 2018-01-12 华南理工大学 A business service system for log big data analysis
US20180069888A1 (en) * 2015-08-31 2018-03-08 Splunk Inc. Identity resolution in data intake of a distributed data processing system

Non-Patent Citations (3)

Title
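MIAO君: "Understanding big data platforms in one article: advice for beginners in big data development!", HTTPS://ZHUANLAN.ZHIHU.COM/P/26545566 *
哥不是小萝莉: "Hadoop ecosystem", HTTPS://WWW.CNBLOGS.COM/SMARTLOLI/P/5640587.HTML *
罗树兰: "Research and application of Hadoop-based data processing", China Master's Theses Full-text Database, Information Science and Technology *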

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN109309686A (en) * 2018-11-01 2019-02-05 浪潮软件集团有限公司 Multi-tenant management method and device
CN109471907A (en) * 2018-11-15 2019-03-15 刘长山 A traffic law analysis system and method based on checkpoint data
CN109471907B (en) * 2018-11-15 2022-04-29 刘长山 Traffic law analysis system and method based on checkpoint data
CN109739663A (en) * 2018-12-29 2019-05-10 深圳前海微众银行股份有限公司 Job processing method, device, equipment and computer readable storage medium
CN109740765A (en) * 2019-01-31 2019-05-10 成都品果科技有限公司 A machine learning system building method based on Amazon servers
CN109740765B (en) * 2019-01-31 2023-05-02 成都品果科技有限公司 Machine learning system building method based on Amazon network server
CN110515603A (en) * 2019-07-09 2019-11-29 成都品果科技有限公司 A method for deploying Spark applications
WO2021047506A1 (en) * 2019-09-11 2021-03-18 中兴通讯股份有限公司 System and method for statistical analysis of data, and computer-readable storage medium
CN111721355A (en) * 2020-05-14 2020-09-29 中铁第一勘察设计院集团有限公司 Railway overhead contact line monitoring data acquisition system
CN113347170A (en) * 2021-05-27 2021-09-03 北京计算机技术及应用研究所 Intelligent analysis platform design method based on big data framework
CN113377877A (en) * 2021-08-10 2021-09-10 深圳市爱云信息科技有限公司 Multi-engine big data platform

Similar Documents

Publication Publication Date Title
CN108595473A (en) A big data application platform based on cloud computing
CN104820670B (en) An acquisition and storage method for power information big data
CN105045820B (en) Method for processing video image information of high-level data and database system
CN104899199B (en) A data warehouse data processing method and system
CN104331435B (en) An efficient, low-impact mass data extraction method based on the Hadoop big data platform
CN104346143B (en) A data transfer device from EBOM to MBOM
CN107945086A (en) A big data resource management system for smart cities
CN107247799A (en) Data processing method and system compatible with multiple big data stores, and modeling method thereof
CN106407278A (en) Architecture design system of big data platform
CN104573071A (en) Intelligent school situation analysis system and method based on big data technology
CN107341205A (en) An intelligent distribution system based on a big data platform
CN103399887A (en) Query and statistical analysis system for mass logs
CN107545014A (en) Storm-based real-time stream computing processing system
CN101799808A (en) Data processing method and system thereof
CN106951475A (en) Big data distributed method and system based on cloud computing
CN103699676B (en) MSSQL SERVER-based table partitioning and automatic maintenance method and system
CN106951552A (en) A user behavior data processing method based on Hadoop
CN104899314A (en) Data lineage analysis method and device for a data warehouse
CN104361091A (en) Big data system
CN106202566A (en) A big-data-based hybrid storage system and method for massive electricity consumption data
CN107733696A (en) A deployment method for an all-in-one machine for machine learning and artificial intelligence applications
CN105956932A (en) Distribution and utilization data fusion method and system
CN107784039A (en) A data loading method, apparatus and system
CN112948353B (en) Data analysis method, system and storage medium applied to DAstudio
Zhang et al. A 2-tier clustering algorithm with map-reduce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180928