CN103067501B - The large data processing method of PaaS platform - Google Patents

The large data processing method of PaaS platform

Info

Publication number
CN103067501B
Authority
CN
China
Prior art keywords
paas platform
cluster
data
data processing
distributed treatment
Legal status
Active
Application number
CN201210581670.8A
Other languages
Chinese (zh)
Other versions
CN103067501A (en)
Inventor
李进
Current Assignee
GCI Science and Technology Co Ltd
Original Assignee
GCI Science and Technology Co Ltd
Priority date
2012-12-28
Filing date
2012-12-28
Publication date
2015-12-09
Application filed by GCI Science and Technology Co Ltd
Priority to CN201210581670.8A
Publication of CN103067501A
Application granted
Publication of CN103067501B
Current legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data processing method for a PaaS platform, comprising: a PaaS platform server receives cluster creation parameters input by a user; the PaaS platform server generates a distributed processing cluster through virtualization technology according to the cluster creation parameters; the PaaS platform server transmits a script for analyzing data to the distributed processing cluster, and the distributed processing cluster processes the data to be analyzed; the PaaS platform server provides the data processing result to the user. Embodiments of the invention address the problem of processing massive data on a PaaS platform and improve data processing efficiency.

Description

The large data processing method of PaaS platform
Technical field
The present invention relates to the field of cloud computing technology, and in particular to a big data processing method for a PaaS (Platform-as-a-Service) platform.
Background art
Cloud computing is developing rapidly, and PaaS, as a key area of the cloud computing industry, has become an important battleground for major enterprises. Because IaaS (Infrastructure-as-a-Service) and SaaS (Software-as-a-Service) have already been commercialized and many application software products in cloud environments have been standardized, users need to take full advantage of the innovative solutions that PaaS brings, while service providers need these solutions to reflect their own competitive differentiation. As a service model, PaaS can advance the development of SaaS and increase the amount of resources available on a Web platform. PaaS solutions make application deployment convenient, simplify purchasing and managing the underlying software and hardware, and reduce cost.
As PaaS platforms develop, more and more applications are deployed on them. Because data generation is increasingly automated, more and more applications need to persist these continuously growing data streams and then run query analysis and data mining on them. This poses a severe challenge to managing massive data on PaaS platforms, and thus the problem of big data processing on PaaS platforms arises.
Summary of the invention
Embodiments of the present invention propose a big data processing method for a PaaS platform, which can solve the problem of processing massive data on the PaaS platform and improve data processing efficiency.
An embodiment of the present invention provides a big data processing method for a PaaS platform, comprising:
S1: the PaaS platform server receives cluster creation parameters input by a user; the cluster creation parameters include the number of nodes of the distributed processing cluster to be created, the memory size of each node, and the storage size of each node;
S2: the PaaS platform server generates the distributed processing cluster through virtualization technology according to the cluster creation parameters;
S3: the PaaS platform server configures the data source to be analyzed according to a log file storage address input by the user or the name of an application deployed by the user;
S4: the PaaS platform server transmits a script for analyzing data to the distributed processing cluster, and the distributed processing cluster processes the data to be analyzed;
S5: the PaaS platform server provides the data processing result to the user.
The nodes are virtual machines in the distributed processing cluster and comprise a control node and compute nodes; the control node manages the cluster and distributes data processing tasks, and the compute nodes analyze and process data.
The big data processing method provided by the embodiments of the present invention uses the existing resources of the PaaS platform: the PaaS platform generates each node of the distributed processing cluster through the virtualization technology of the underlying IaaS layer. The generated distributed processing cluster gives the PaaS platform big data processing capability, which can solve the problem of processing massive data on the PaaS platform and improve data processing efficiency.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the big data processing method for a PaaS platform provided by the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the big data processing system for a PaaS platform provided by the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which is a schematic flowchart of an embodiment of the big data processing method for a PaaS platform provided by the present invention.
This embodiment provides a big data processing method for a PaaS platform, comprising steps S1 to S5, as follows:
S1: the PaaS platform server receives cluster creation parameters input by a user.
The cluster creation parameters include the number of nodes of the distributed processing cluster to be created, the memory size of each node, the storage size of each node, and other parameters.
The nodes are virtual machines in the distributed processing cluster and comprise a control node and compute nodes; the control node manages the cluster and distributes data processing tasks, and the compute nodes analyze and process data.
In addition, the PaaS platform server checks, according to the cluster creation parameters, whether the system resources are sufficient. If they are, step S2 is performed to create the distributed processing cluster.
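The resource check in step S1 can be sketched in Python as follows. This is a minimal illustration, assuming the platform can report its free memory, free storage and a maximum number of virtual machines; the names `ClusterParams`, `AvailableResources` and `resources_sufficient`, the gigabyte units, and the two-node minimum are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterParams:
    """Cluster creation parameters received in step S1 (sizes per node, in GB)."""
    node_count: int
    memory_gb_per_node: int
    storage_gb_per_node: int


@dataclass(frozen=True)
class AvailableResources:
    """Free capacity reported by the platform (hypothetical structure)."""
    memory_gb: int
    storage_gb: int
    max_vms: int


def validate(params: ClusterParams) -> None:
    """Reject parameters that cannot describe a usable cluster."""
    # Assumed minimum: one control node plus at least one compute node.
    if params.node_count < 2:
        raise ValueError("need one control node and at least one compute node")
    if params.memory_gb_per_node <= 0 or params.storage_gb_per_node <= 0:
        raise ValueError("memory and storage sizes must be positive")


def resources_sufficient(params: ClusterParams, free: AvailableResources) -> bool:
    """Return True if the platform can host every requested node (go on to S2)."""
    validate(params)
    return (
        params.node_count <= free.max_vms
        and params.node_count * params.memory_gb_per_node <= free.memory_gb
        and params.node_count * params.storage_gb_per_node <= free.storage_gb
    )


# Four nodes with 2 GB of memory and 50 GB of storage each.
request = ClusterParams(node_count=4, memory_gb_per_node=2, storage_gb_per_node=50)
print(resources_sufficient(request, AvailableResources(16, 500, 10)))  # True
print(resources_sufficient(request, AvailableResources(6, 500, 10)))   # False
```

The check multiplies per-node sizes by the node count because every node is a separate virtual machine that reserves its own memory and disk.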
S2: the PaaS platform server generates the distributed processing cluster through virtualization technology according to the cluster creation parameters.
Step S2 specifically comprises steps S201 to S204, as follows:
S201: generate one virtual machine through virtualization technology according to the cluster creation parameters, and configure the runtime environment of the virtual machine.
For example, software such as JDK, MySQL and Hadoop is installed and configured on the generated virtual machine; the required software can be copied from the software files of the big data processing service component. In one embodiment, the virtual machine runs the CentOS 5.5 operating system, with JDK 1.6.23, MySQL 5.5 and Hadoop 1.0.2.
S202: according to the number of nodes in the cluster creation parameters, clone the virtual machine generated in step S201 to obtain the required number of virtual machines.
S203: configure password-less communication between the virtual machines.
Step S203 specifically comprises: having each virtual machine run a key generation program to generate its own public key and private key, and then copying the public key of each virtual machine to the other virtual machines, thereby enabling password-less communication.
In a specific implementation, the ssh-keygen -t dsa program can be run on each virtual machine to generate its public and private keys. The contents of each public key file are then copied into the authorized_keys files of the other virtual machines, and each virtual machine logs in to the others once so that the known_hosts file is generated, thereby enabling password-less communication.
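The key exchange in step S203 amounts to giving every virtual machine the public keys of all the others. The sketch below computes that mapping in memory only; it assumes each host's public key text has already been collected (for example from `~/.ssh/id_dsa.pub`), uses placeholder key strings, and its function names are hypothetical. It does not run `ssh-keygen` or contact any host. Current OpenSSH releases disable DSA keys by default, so a present-day deployment would more likely generate keys with `-t ed25519` or `-t rsa`.

```python
from __future__ import annotations


def build_authorized_keys(public_keys: dict[str, str]) -> dict[str, list[str]]:
    """Map each host to the public keys it must trust.

    public_keys maps a hostname to the text of that host's public key file.
    As in step S203, every host receives the keys of all *other* hosts.
    """
    return {
        host: [key for other, key in public_keys.items() if other != host]
        for host in public_keys
    }


def render_authorized_keys(keys: list[str]) -> str:
    """Render the contents of one authorized_keys file: one key per line."""
    return "".join(key.strip() + "\n" for key in keys)


# Placeholder key strings; real public keys are much longer.
keys = {
    "vm1": "ssh-dss KEY1 hadoop@vm1",
    "vm2": "ssh-dss KEY2 hadoop@vm2",
    "vm3": "ssh-dss KEY3 hadoop@vm3",
}
for host, trusted in build_authorized_keys(keys).items():
    print(host, "trusts", len(trusted), "keys")
```

Each host ends up trusting n - 1 keys, which is what lets any node reach any other without a password; the one-time logins described above then populate known_hosts.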
S204: designate the control node and the compute nodes of the distributed processing cluster.
By default, this embodiment uses the first virtual machine generated as the control node and the remaining virtual machines as compute nodes. Further, the Hadoop files slaves, masters, mapred-site.xml, hdfs-site.xml, hadoop-env.sh and core-site.xml are modified to configure the parameters of the distributed processing cluster.
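Step S204 can be sketched as generating the contents of the Hadoop configuration files from the list of virtual machines. This is a minimal illustration assuming Hadoop 1.x property names (`fs.default.name`, `mapred.job.tracker`, `dfs.replication`); the host names, the ports 9000 and 9001, and the replication cap are illustrative assumptions, the function names are hypothetical, and `hadoop-env.sh` (which would normally set `JAVA_HOME`) is omitted. In Hadoop 1.x the start scripts read `masters` to decide where to run the SecondaryNameNode, while the NameNode and JobTracker addresses come from `core-site.xml` and `mapred-site.xml`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def assign_roles(vm_hosts: list[str]) -> tuple[str, list[str]]:
    """The first VM generated becomes the control node; the rest are compute nodes."""
    if len(vm_hosts) < 2:
        raise ValueError("need a control node and at least one compute node")
    return vm_hosts[0], vm_hosts[1:]


def hadoop_site_xml(properties: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def hadoop_conf_files(vm_hosts: list[str], replication: int = 2) -> dict[str, str]:
    """File contents for a subset of the Hadoop 1.x files named in step S204."""
    control, compute = assign_roles(vm_hosts)
    return {
        "masters": control + "\n",
        "slaves": "".join(host + "\n" for host in compute),
        # Ports 9000/9001 are common conventions, not values from the patent.
        "core-site.xml": hadoop_site_xml({"fs.default.name": f"hdfs://{control}:9000"}),
        "mapred-site.xml": hadoop_site_xml({"mapred.job.tracker": f"{control}:9001"}),
        # Never ask for more replicas than there are DataNodes (compute nodes).
        "hdfs-site.xml": hadoop_site_xml(
            {"dfs.replication": str(min(replication, len(compute)))}
        ),
    }


files = hadoop_conf_files(["vm1", "vm2", "vm3"])
print(files["slaves"], end="")
print(files["core-site.xml"])
```

Deriving every file from one ordered host list keeps the role assignment ("first VM is the control node") in a single place, so the files cannot disagree about which host is the master.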
S3: the PaaS platform server configures the data source to be analyzed according to the log file storage address input by the user or the name of the application deployed by the user.
Step S3 specifically comprises:
The PaaS platform server receives a log file storage address input by the user, or obtains the corresponding log file storage address from the name of the application the user has deployed on the PaaS platform;
The PaaS platform server checks whether the file at the log file storage address is in log file format (that is, whether the log file exists). If so, the data to be analyzed are imported from the log file storage address; otherwise, configuration of the data source to be analyzed fails.
The log files at the log file storage address are the data source to be analyzed and are imported into the distributed processing cluster for data processing in the subsequent step S4.
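The decision logic of step S3 might look like the following sketch. It assumes that "log file format" can be approximated by a `.log` file extension, that a storage address is a local path (a file or a directory of logs), and that application names are resolved through a simple name-to-path mapping; `configure_data_source`, `DataSourceError` and the `registry` argument are hypothetical names, not part of any PaaS API.

```python
from __future__ import annotations

from pathlib import Path


class DataSourceError(Exception):
    """Raised when the data source to be analyzed cannot be configured."""


def resolve_log_path(log_path: str | None, app_name: str | None,
                     registry: dict[str, str]) -> Path:
    """Use the address the user entered, else look it up by application name."""
    if log_path is not None:
        return Path(log_path)
    if app_name is not None and app_name in registry:
        return Path(registry[app_name])
    raise DataSourceError("no log address given and application name unknown")


def configure_data_source(log_path: str | None = None,
                          app_name: str | None = None,
                          registry: dict[str, str] | None = None) -> list[Path]:
    """Return the log files to import, or raise if configuration fails."""
    path = resolve_log_path(log_path, app_name, registry or {})
    if path.is_dir():
        candidates = sorted(path.glob("*.log"))
    elif path.is_file():
        candidates = [path]
    else:
        candidates = []  # the address does not exist
    # Assumption: "log file format" is approximated by the .log extension.
    logs = [p for p in candidates if p.suffix == ".log"]
    if not logs:
        raise DataSourceError(f"no log files found at {path}")
    return logs
```

Raising an exception rather than returning an empty list makes the "data source configuration fails" branch explicit to the caller, which can then report the failure to the user instead of starting an empty job.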
S4: the PaaS platform server transmits the script for analyzing data to the distributed processing cluster, and the distributed processing cluster processes the data to be analyzed.
Step S4 specifically comprises:
S401: the PaaS platform server transmits the script for analyzing data to the control node of the distributed processing cluster. The script for analyzing data is a MapReduce script, which specifies how to import the data to be analyzed and how to execute the MapReduce job.
S402: the control node selects idle compute nodes in the distributed processing cluster to execute data processing tasks in parallel, and the compute nodes process the data to be analyzed.
The control node of the distributed processing cluster mainly monitors and manages the execution of MapReduce jobs in the cluster, while the compute nodes actually execute the Map tasks and Reduce tasks of a MapReduce job. When a MapReduce job is submitted to the distributed processing cluster, the relevant input data are first divided into multiple splits, and the control node selects idle compute nodes to execute Map tasks on the splits in parallel. The intermediate records produced by the Map tasks are then partitioned into Reduce tasks, which the control node again assigns to idle compute nodes for parallel execution, yielding as the job result the set of data corresponding to each key. This process repeats until all Map tasks and Reduce tasks of the MapReduce job have completed.
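The split, Map, partition and Reduce flow described above can be simulated in a single Python process, with a thread pool standing in for the idle compute nodes. This is an illustrative word count over log lines, not Hadoop code; the hash-modulo partitioning mirrors the idea behind Hadoop's default `HashPartitioner`, and all function names and default task counts are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines: list[str], n_splits: int) -> list[list[str]]:
    """Divide the input records into at most n_splits non-empty splits."""
    splits = [lines[i::n_splits] for i in range(n_splits)]
    return [s for s in splits if s]


def map_task(split: list[str]) -> list[tuple[str, int]]:
    """Map: emit (token, 1) for every whitespace-separated token."""
    return [(token, 1) for line in split for token in line.split()]


def partition(mapped: list[list[tuple[str, int]]],
              n_reducers: int) -> list[dict[str, list[int]]]:
    """Group intermediate records by key and assign each key to one Reduce task."""
    parts = [defaultdict(list) for _ in range(n_reducers)]
    for records in mapped:
        for key, value in records:
            parts[hash(key) % n_reducers][key].append(value)
    return parts


def reduce_task(part: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in part.items()}


def run_job(lines: list[str], n_maps: int = 3, n_reducers: int = 2,
            idle_nodes: int = 4) -> dict[str, int]:
    """Run Map tasks in parallel, shuffle, then run Reduce tasks in parallel."""
    with ThreadPoolExecutor(max_workers=idle_nodes) as nodes:
        mapped = list(nodes.map(map_task, split_input(lines, n_maps)))
        result: dict[str, int] = {}
        for output in nodes.map(reduce_task, partition(mapped, n_reducers)):
            result.update(output)
    return result


log_lines = ["GET /a 200", "GET /b 404", "POST /a 200"]
print(run_job(log_lines))
```

Because every key is routed to exactly one partition, the Reduce outputs never overlap and can simply be merged; that property is what allows the Reduce tasks to run in parallel.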
In a specific implementation, the PaaS platform server also checks, according to the script type, whether the script for analyzing data meets the requirements; for example, the script may be required to be of jar type. Once the script meets the requirements, steps S401 and S402 are performed.
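A file-type check like the one described for step S4 could be sketched as below, assuming that "jar type" means an existing file with a `.jar` extension that is also a valid ZIP archive (JAR files use the ZIP format). The function name is hypothetical, and a real platform might inspect the manifest or main class as well.

```python
import zipfile
from pathlib import Path


def is_acceptable_script(path: str) -> bool:
    """Accept only existing .jar files whose contents are a valid ZIP archive."""
    p = Path(path)
    if p.suffix.lower() != ".jar" or not p.is_file():
        return False
    # A renamed text file fails here even though its extension is .jar.
    return zipfile.is_zipfile(str(p))
```

Checking the archive structure as well as the extension rejects files that were merely renamed, before anything is transmitted to the control node.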
S5: the PaaS platform server provides the data processing result to the user.
The big data processing method provided by the embodiments of the present invention can use the existing resources of the PaaS platform: the PaaS platform generates each node of the distributed processing cluster through the virtualization technology of the underlying IaaS layer. The generated distributed processing cluster gives the PaaS platform big data processing capability, thereby solving the problem of processing massive data on the PaaS platform and improving data processing efficiency.
In a specific implementation, a PaaS platform is configured on the PaaS platform server; this PaaS platform integrates a big data processing service component, which performs the big data processing flow of steps S1 to S5 above.
Referring to Fig. 2, which is a schematic structural diagram of an embodiment of the big data processing system for a PaaS platform provided by the present invention.
An embodiment of the present invention provides a big data processing system for a PaaS platform, comprising a PaaS platform layer, a virtual distributed processing cluster, and cloud storage and servers. Details are as follows:
The PaaS platform layer provides various service components, including the big data processing service component, and provides a user interface (UI) through which the user operates the system. The PaaS platform adopts the OSGi (Open Service Gateway Initiative) framework: services such as middleware services, data services, monitoring services and the big data processing service are plugged into the PaaS platform as components, forming a pluggable, dynamically reconfigurable, stable and efficient system. The big data processing service component lets the user input the configuration parameters required to generate the virtual distributed processing cluster and presents results to the user; it also provides management functions for the virtual distributed processing cluster, including controlling the cluster's life cycle and monitoring the cluster's data processing.
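The pluggable component model can be illustrated with a small registry in which services are installed and removed while the platform keeps running. This is a Python stand-in to show the idea, not the OSGi API (OSGi is a Java framework with its own bundle life cycle); `PlatformRegistry`, `BigDataService` and the `start`/`stop` protocol are assumptions.

```python
from __future__ import annotations

from typing import Protocol


class ServiceComponent(Protocol):
    """What the platform expects from a pluggable service (assumed interface)."""
    name: str

    def start(self) -> None: ...

    def stop(self) -> None: ...


class PlatformRegistry:
    """Install, list and remove service components while the platform runs."""

    def __init__(self) -> None:
        self._components: dict[str, ServiceComponent] = {}

    def install(self, component: ServiceComponent) -> None:
        if component.name in self._components:
            raise ValueError(f"{component.name} is already installed")
        component.start()
        self._components[component.name] = component

    def uninstall(self, name: str) -> None:
        # Raises KeyError if the component was never installed.
        self._components.pop(name).stop()

    def installed(self) -> list[str]:
        return sorted(self._components)


class BigDataService:
    """Toy stand-in for the big data processing service component."""
    name = "big-data-processing"

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


registry = PlatformRegistry()
service = BigDataService()
registry.install(service)
print(registry.installed())
registry.uninstall("big-data-processing")
```

Components depend only on the small start/stop contract, so the platform can add or remove a service such as big data processing without changing the other components.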
The virtual distributed processing cluster provides the system's core data analysis and processing capability. The PaaS platform generates the cluster through virtualization technology according to the parameter configuration provided by the big data processing service component. The cluster obtains the data to be analyzed from cloud storage, processes and analyzes them according to the script provided by the big data processing service component, and presents the analysis results to the user through the user interface of the big data processing service component of the PaaS platform. The cluster adopts the Hadoop cluster architecture, which implements a distributed file system (Hadoop Distributed File System, HDFS). HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data. Through the Hadoop framework, the existing resources of the PaaS platform are used to provide big data processing capability with high reliability, high scalability, high efficiency and high fault tolerance.
The cloud storage and servers can be built from the existing resources of the PaaS platform and provide the hardware foundation for the entire system. All disk devices in the cloud storage come from inexpensive PC equipment and are consolidated into a single shared storage pool that serves the front-end application servers, greatly improving disk utilization. Distributed storage improves file read/write efficiency; cloud storage can scale linearly to large capacity and provides high I/O (input/output) bandwidth for unstructured data. The storage backup strategy eliminates disk single points of failure, ensuring high reliability, and costs less than conventional storage.
The big data processing method and system for a PaaS platform provided by the embodiments of the present invention have the following beneficial effects:
(1) The present invention makes full use of the existing storage and computing resources of the PaaS platform, improving the platform's resource utilization; users no longer need to purchase new storage and servers, which effectively reduces cost. Meanwhile, the big data processing service is integrated into the PaaS platform as a component, making it easy to extend and speeding up development.
(2) As PaaS platforms develop and more and more applications are deployed on them, massive data processing on PaaS platforms becomes unavoidable; the present invention can effectively solve the massive data processing problem on PaaS platforms and improve data processing efficiency.
The above is a preferred embodiment of the present invention. It should be noted that persons skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A big data processing method for a PaaS platform, characterized by comprising:
S1: a PaaS platform server receives cluster creation parameters input by a user; the cluster creation parameters include the number of nodes of the distributed processing cluster to be created, the memory size of each node, and the storage size of each node;
S2: the PaaS platform server generates the distributed processing cluster through virtualization technology according to the cluster creation parameters;
S3: the PaaS platform server configures the data source to be analyzed according to a log file storage address input by the user or the name of an application deployed by the user;
S4: the PaaS platform server transmits a script for analyzing data to the distributed processing cluster, and the distributed processing cluster processes the data to be analyzed;
S5: the PaaS platform server provides the data processing result to the user;
wherein the PaaS platform is configured on the PaaS platform server and integrates a big data processing service component, and the big data processing service component is configured to perform the method flow of steps S1 to S5 above.
2. The big data processing method for a PaaS platform according to claim 1, characterized in that the nodes are virtual machines in the distributed processing cluster; the nodes comprise a control node and compute nodes, the control node being configured to manage the cluster and distribute data processing tasks, and the compute nodes being configured to analyze and process data.
3. The big data processing method for a PaaS platform according to claim 2, characterized in that step S2 specifically comprises:
S201: generating one virtual machine through virtualization technology according to the cluster creation parameters, and configuring the runtime environment of the virtual machine;
S202: cloning the virtual machine generated in step S201 according to the number of nodes in the cluster creation parameters, to obtain the required number of virtual machines;
S203: configuring password-less communication between the virtual machines;
S204: designating the control node and the compute nodes of the distributed processing cluster.
4. The big data processing method for a PaaS platform according to claim 3, characterized in that step S3 specifically comprises:
the PaaS platform server receives a log file storage address input by the user, or obtains the corresponding log file storage address according to the name of the application the user has deployed on the PaaS platform;
the PaaS platform server checks whether the file at the log file storage address is in log file format; if so, the data to be analyzed are imported from the log file storage address; otherwise, configuration of the data source to be analyzed fails.
5. The big data processing method for a PaaS platform according to claim 4, characterized in that step S4 specifically comprises:
S401: the PaaS platform server transmits the script for analyzing data to the control node of the distributed processing cluster; the script for analyzing data is a MapReduce script, which specifies how to import the data to be analyzed and how to execute the MapReduce job;
S402: the control node selects idle compute nodes in the distributed processing cluster to execute data processing tasks in parallel, and the compute nodes process the data to be analyzed.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210581670.8A CN103067501B (en) 2012-12-28 2012-12-28 The large data processing method of PaaS platform

Publications (2)

Publication Number Publication Date
CN103067501A CN103067501A (en) 2013-04-24
CN103067501B true CN103067501B (en) 2015-12-09

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468625A (en) * 2014-09-04 2016-04-06 中国石油化工股份有限公司 Database cluster constructing method by using virtual machine
WO2017128365A1 (en) * 2016-01-30 2017-08-03 深圳市博信诺达经贸咨询有限公司 Automation information analysis method and system based on big data
CN105897707A (en) * 2016-04-01 2016-08-24 浪潮电子信息产业股份有限公司 Password-less access method to cluster system and master control server
CN106777164B (en) * 2016-12-20 2020-07-10 东软集团股份有限公司 Data migration cluster and data migration method
CN106648672A (en) * 2016-12-28 2017-05-10 北京云星宇交通科技股份有限公司 Method and system for developing and running big data
CN109218101B (en) * 2018-09-26 2020-07-17 北京交通大学 Method and system for creating intelligent cooperative network group
CN110795626A (en) * 2019-10-28 2020-02-14 南京弹跳力信息技术有限公司 Big data processing method and system
CN113518095B (en) * 2021-09-14 2021-12-14 北京华云安信息技术有限公司 SSH cluster deployment method, device, equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102110071A (en) * 2011-03-04 2011-06-29 浪潮(北京)电子信息产业有限公司 Virtual machine cluster system and implementation method thereof
CN102404385A (en) * 2011-10-25 2012-04-04 华中科技大学 Virtual cluster deployment system and deployment method for high performance computing
CN102821000A (en) * 2012-09-14 2012-12-12 乐视网信息技术(北京)股份有限公司 Method for improving usability of PaaS platform

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8966450B2 (en) * 2010-06-01 2015-02-24 Red Hat, Inc. Facilitating the execution of web applications in the cloud
US8613004B2 (en) * 2010-12-07 2013-12-17 Nec Laboratories America, Inc. System and method for cloud infrastructure data sharing through a uniform communication framework

Also Published As

Publication number Publication date
CN103067501A (en) 2013-04-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant