CN103067501A - Big data processing method for a PaaS platform - Google Patents

Big data processing method for a PaaS platform

Info

Publication number
CN103067501A
Authority
CN
China
Legal status
Granted
Application number
CN2012105816708A
Other languages
Chinese (zh)
Other versions
CN103067501B (en)
Inventor
李进
Current Assignee
GCI Science and Technology Co Ltd
Original Assignee
GCI Science and Technology Co Ltd
Priority date
2012-12-28
Filing date
2012-12-28
Publication date
2013-04-24
2012-12-28 Application filed by GCI Science and Technology Co Ltd
2012-12-28 Priority to CN201210581670.8A
2013-04-24 Publication of CN103067501A
Application granted
2015-12-09 Publication of CN103067501B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data processing method for a PaaS platform. A PaaS platform server receives cluster creation parameters input by a user and uses virtualization technology to generate a distributed processing cluster from those parameters. The server then transmits a script for analyzing data to the distributed processing cluster, which processes the data to be analyzed. Finally, the server provides the data processing results to the user. The method addresses the problem of processing massive data on a PaaS platform and improves data processing efficiency.

Description

Big data processing method for a PaaS platform
Technical field
The present invention relates to the field of cloud computing, and in particular to a big data processing method for a PaaS (Platform-as-a-Service) platform.
Background
Cloud computing is developing rapidly. As a key area of the cloud computing industry, PaaS has become an important arena in which large enterprises compete for the future. IaaS (Infrastructure-as-a-Service) and SaaS (Software-as-a-Service) have already been commercialized, and many application software products have been standardized for cloud environments. Users therefore need to take full advantage of the innovative solutions that PaaS brings, while service providers need those solutions to differentiate themselves from competitors. As a service model, PaaS can drive the development of SaaS and increase the resources available on the Web platform. PaaS solutions make application deployment convenient, reduce the complexity of purchasing and managing the underlying hardware and software, and lower costs.
As PaaS platforms develop, more and more applications are deployed on them. Because data generation is increasingly automated, a growing number of applications need to persist these continuously growing data streams and then run query analysis and data mining on them. This poses a serious challenge to the management of massive data on PaaS platforms, and so the problem of big data processing on PaaS platforms arises.
Summary of the invention
Embodiments of the invention propose a big data processing method for a PaaS platform that addresses the problem of processing massive data on a PaaS platform and improves data processing efficiency.
An embodiment of the invention provides a big data processing method for a PaaS platform, comprising:
S1: the PaaS platform server receives cluster creation parameters input by a user. The cluster creation parameters include the number of nodes of the distributed processing cluster to be created, the memory size of each node, and the storage size of each node;
S2: the PaaS platform server generates the distributed processing cluster by virtualization technology according to the cluster creation parameters;
S3: the PaaS platform server configures the data source to be analyzed according to a log file storage address input by the user or the name of an application deployed by the user;
S4: the PaaS platform server transmits a script for analyzing data to the distributed processing cluster, and the distributed processing cluster processes the data to be analyzed;
S5: the PaaS platform server provides the data processing results to the user.
Here, each node is a virtual machine in the distributed processing cluster. The nodes include a control node and compute nodes: the control node manages the cluster and distributes data processing tasks, and the compute nodes analyze and process data.
The big data processing method provided by the embodiments of the invention makes use of the existing resources of the PaaS platform. The PaaS platform generates each node of the distributed processing cluster through the virtualization technology of the underlying IaaS layer. The generated cluster gives the PaaS platform big data processing capability, which addresses the problem of processing massive data on the platform and improves data processing efficiency.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the big data processing method for a PaaS platform provided by the invention;
Fig. 2 is a schematic structural diagram of an embodiment of the big data processing system for a PaaS platform provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, which is a schematic flowchart of an embodiment of the big data processing method for a PaaS platform provided by the invention.
This embodiment provides a big data processing method for a PaaS platform comprising steps S1 to S5, as follows:
S1: the PaaS platform server receives cluster creation parameters input by the user.
The cluster creation parameters include the number of nodes of the distributed processing cluster to be created, the memory size of each node, the storage size of each node, and other parameters.
Each node is a virtual machine in the distributed processing cluster. The nodes include a control node and compute nodes: the control node manages the cluster and distributes data processing tasks, and the compute nodes analyze and process data.
In addition, the PaaS platform server uses the cluster creation parameters to check whether the system resources are sufficient. If they are, step S2 is executed to create the distributed processing cluster.
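The resource check above can be sketched as a simple capacity comparison. This is a minimal Python sketch that assumes the platform can report its free memory and free storage as single totals. The ClusterParams and FreeResources types, the function names, and the rule that a cluster needs at least two nodes (one control node plus one compute node) are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterParams:
    """Hypothetical container for the step S1 cluster creation parameters."""
    node_count: int  # number of nodes (virtual machines) to create
    memory_mb: int   # memory size of each node, in MB
    storage_gb: int  # storage size of each node, in GB


@dataclass(frozen=True)
class FreeResources:
    """Hypothetical snapshot of the resources the platform can still allocate."""
    memory_mb: int
    storage_gb: int


def validate_params(p: ClusterParams) -> None:
    # Assumption: a usable cluster has one control node plus at least one compute node.
    if p.node_count < 2:
        raise ValueError("need at least 2 nodes (1 control + 1 compute)")
    if p.memory_mb <= 0 or p.storage_gb <= 0:
        raise ValueError("memory and storage sizes must be positive")


def resources_sufficient(p: ClusterParams, free: FreeResources) -> bool:
    """Return True if the platform can host every requested node (the gate before S2)."""
    validate_params(p)
    return (p.node_count * p.memory_mb <= free.memory_mb
            and p.node_count * p.storage_gb <= free.storage_gb)


if __name__ == "__main__":
    free = FreeResources(memory_mb=16384, storage_gb=150)
    # 4 nodes x 50 GB = 200 GB exceeds the 150 GB free, so S2 would not run.
    print(resources_sufficient(ClusterParams(4, 2048, 50), free))  # False
```

The totals are deliberately simple. A real platform would also have to check whether the resources fit on individual physical hosts, not just in aggregate.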
S2: the PaaS platform server generates the distributed processing cluster by virtualization technology according to the cluster creation parameters.
Step S2 specifically comprises steps S201 to S204, as follows:
S201: generate one virtual machine by virtualization technology according to the cluster creation parameters, and configure the runtime environment of that virtual machine.
For example, software such as the JDK, MySQL, and Hadoop is installed and configured on the generated virtual machine. The required software can be copied from the software files bundled with the big data processing service component. In one embodiment, the virtual machine runs the CentOS 5.5 operating system, with JDK version 1.6.23, MySQL version 5.5, and Hadoop version 1.0.2.
S202: according to the number of nodes in the cluster creation parameters, copy the virtual machine generated in step S201 to produce the required number of virtual machines.
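Steps S201 and S202 amount to building one fully configured template VM and cloning it until the requested node count is reached. The following is a minimal bookkeeping sketch in Python that represents VMs as plain records. The VirtualMachine type, build_template, clone_to_count, and the node-N naming scheme are hypothetical. Actually creating and cloning VMs would go through a hypervisor or IaaS API, which the patent does not name.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VirtualMachine:
    """Hypothetical record of one VM in the cluster."""
    name: str
    memory_mb: int
    storage_gb: int
    software: tuple[str, ...]  # runtime environment configured in S201


def build_template(memory_mb: int, storage_gb: int) -> VirtualMachine:
    # S201: one VM with its runtime environment (versions as stated in the embodiment).
    return VirtualMachine(
        name="node-0",
        memory_mb=memory_mb,
        storage_gb=storage_gb,
        software=("CentOS 5.5", "JDK 1.6.23", "MySQL 5.5", "Hadoop 1.0.2"),
    )


def clone_to_count(template: VirtualMachine, node_count: int) -> list[VirtualMachine]:
    # S202: copy the template until node_count VMs exist; the template itself is the first VM.
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    clones = [replace(template, name=f"node-{i}") for i in range(1, node_count)]
    return [template] + clones
```

For example, clone_to_count(build_template(2048, 50), 4) yields node-0 through node-3, all with the same memory, storage, and software list.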
S203: configure password-free communication between the virtual machines.
Step S203 specifically comprises the following. Each virtual machine runs a key generator to generate its own public and private keys. The public key generated by each virtual machine is then copied to the other virtual machines, which enables password-free communication.
In implementation, each virtual machine can run the ssh-keygen -t dsa program to generate its own public and private keys. The contents of each public key file are then copied into the authorized_keys files of the other virtual machines. Each virtual machine logs in to the others once, which generates the known_hosts file, and after that the machines communicate without passwords.
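The key exchange in S203 amounts to giving every VM's authorized_keys file the public keys of all the other VMs. Below is a minimal Python sketch of that bookkeeping. It assumes the public key contents have already been collected from each VM; collecting them and writing files over the network are out of scope. The host names, the function names, and the include_self option are hypothetical.

Two caveats apply:
- Hadoop 1.x start scripts usually SSH into every host listed in masters and slaves, including the control node itself. For that reason each node's own key is often added as well, which is what include_self=True does.
- The embodiment uses DSA keys, which OpenSSH 7.0 and later disable by default. A current deployment would therefore likely generate another key type, such as RSA or Ed25519.

```python
def distribute_public_keys(pubkeys: dict[str, str],
                           include_self: bool = False) -> dict[str, list[str]]:
    """Step S203 sketch: for each host, list the public keys to append to its
    ~/.ssh/authorized_keys, i.e. the keys generated on the other hosts.

    pubkeys maps a (hypothetical) host name to the contents of the public key
    file that ssh-keygen wrote on that host.
    """
    return {
        host: [key for other, key in sorted(pubkeys.items())
               if include_self or other != host]
        for host in pubkeys
    }


def authorized_keys_text(keys: list[str]) -> str:
    # One public key per line, as the authorized_keys format expects.
    return "".join(key.strip() + "\n" for key in keys)
```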
S204: designate the control node and the compute nodes in the distributed processing cluster.
In this embodiment, the first virtual machine generated is used as the control node by default, and the remaining virtual machines are used as compute nodes. The slaves, masters, mapred-site.xml, hdfs-site.xml, hadoop-env.sh, and core-site.xml files in Hadoop are then modified to configure the parameters of the distributed processing cluster.
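Step S204 and the Hadoop file edits can be sketched as generating the small configuration files from the host list. This minimal Python sketch assumes Hadoop 1.x property names (fs.default.name, mapred.job.tracker, dfs.replication) and the conventional ports 9000 and 9001. The function names are hypothetical. It leaves out hadoop-env.sh, which mainly sets JAVA_HOME, because its contents depend on where the JDK is installed.

```python
import xml.etree.ElementTree as ET


def site_xml(props: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def hadoop_conf_files(hosts: list[str], replication: int = 3) -> dict[str, str]:
    """Step S204 sketch: hosts[0] (the first VM created) is the control node."""
    if len(hosts) < 2:
        raise ValueError("need one control node and at least one compute node")
    control, compute = hosts[0], hosts[1:]
    return {
        # Hadoop 1.x reads 'masters' as the SecondaryNameNode host list; here it
        # simply names the control node, as is common on small clusters.
        "masters": control + "\n",
        # 'slaves' lists the hosts that run the DataNode and TaskTracker daemons.
        "slaves": "".join(h + "\n" for h in compute),
        # Ports 9000/9001 are conventional choices, not values from the patent.
        "core-site.xml": site_xml({"fs.default.name": f"hdfs://{control}:9000"}),
        "mapred-site.xml": site_xml({"mapred.job.tracker": f"{control}:9001"}),
        # Replication cannot usefully exceed the number of DataNodes.
        "hdfs-site.xml": site_xml({"dfs.replication": str(min(replication, len(compute)))}),
    }
```

With three VMs, node-0 becomes the control node and node-1 and node-2 become compute nodes. The replication factor is capped at 2 because only two DataNodes exist.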
S3: the PaaS platform server configures the data source to be analyzed according to the log file storage address input by the user or the name of the application deployed by the user.
Step S3 specifically comprises:
the PaaS platform server receives a log file storage address input by the user, or obtains the corresponding log file storage address from the name of the application that the user has deployed on the PaaS platform;
the PaaS platform server checks whether the file at the log file storage address is a log file (that is, whether the log file exists). If it is, the data to be analyzed is imported from the log file storage address; otherwise, configuration of the data source to be analyzed fails.
The log file at the log file storage address is the data source to be analyzed. In the subsequent step S4 it is imported into the distributed cluster for data processing.
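Step S3 can be sketched as two checks: first resolve an address, then confirm that a log file is there. This minimal Python sketch makes two assumptions: the platform keeps a mapping from deployed application names to their log paths, and log files are recognized by a .log suffix. The patent does not say how the check is done, so both assumptions and all the names below are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_log_address(user_address: Optional[str], app_name: Optional[str],
                        app_log_paths: dict[str, str]) -> str:
    """S3, first half: use the address the user typed, else look up the
    (hypothetical) log path registered for the user's deployed application."""
    if user_address:
        return user_address
    if app_name and app_name in app_log_paths:
        return app_log_paths[app_name]
    raise LookupError("no log file storage address could be determined")


def configure_data_source(address: str, suffix: str = ".log") -> list[Path]:
    """S3, second half: return the log files to import, or fail if there are none.

    Assumption: a 'log file' is recognized by its file-name suffix."""
    path = Path(address)
    if path.is_file():
        files = [path] if path.suffix == suffix else []
    elif path.is_dir():
        files = sorted(p for p in path.iterdir() if p.is_file() and p.suffix == suffix)
    else:
        files = []  # the address does not exist
    if not files:
        raise FileNotFoundError(f"data source configuration failed: no log file at {address}")
    return files
```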
S4: the PaaS platform server transmits the script for analyzing data to the distributed processing cluster, and the distributed processing cluster processes the data to be analyzed.
Step S4 specifically comprises:
S401: the PaaS platform server transmits the script for analyzing data to the control node of the distributed processing cluster. The script is a MapReduce script that specifies how the data to be analyzed is imported and how the MapReduce job is executed.
S402: the control node selects idle compute nodes in the distributed processing cluster. These compute nodes execute data processing tasks in parallel to process the data to be analyzed.
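Step S402's idle-node selection can be illustrated with a toy scheduler. This Python sketch uses the hypothetical names ComputeNode and assign_to_idle_nodes and simply spreads tasks round-robin over the nodes that are not marked busy. It is a simplification: in Hadoop 1.x the JobTracker hands out tasks as TaskTrackers report free slots in their heartbeats, rather than from a precomputed plan.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """Hypothetical view of a compute node as seen by the control node."""
    name: str
    busy: bool = False
    assigned: list[str] = field(default_factory=list)


def assign_to_idle_nodes(nodes: list[ComputeNode], tasks: list[str]) -> dict[str, str]:
    """Round-robin each task onto a node that is idle at assignment time."""
    idle = [n for n in nodes if not n.busy]
    if not idle:
        raise RuntimeError("no idle compute node available")
    plan: dict[str, str] = {}
    for i, task in enumerate(tasks):
        node = idle[i % len(idle)]
        node.assigned.append(task)
        plan[task] = node.name
    return plan
```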
The control node in the distributed processing cluster mainly monitors and manages the execution of MapReduce jobs in the cluster, while the compute nodes carry out the Map tasks and Reduce tasks of each MapReduce job. The processing runs as follows:
1. A MapReduce job is submitted to the distributed processing cluster, and its input data is first divided into multiple splits.
2. The control node selects idle compute nodes to run Map tasks on the splits in parallel.
3. The control node partitions the intermediate records produced by the Map tasks and selects idle compute nodes to run Reduce tasks on them in parallel. This yields the job result: the set of data corresponding to each key.
4. The process repeats until every Map task and Reduce task in the MapReduce job has completed.
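The MapReduce flow just described can be simulated in a single process: split, parallel Map, partition of intermediate records by key, then parallel Reduce. This is a minimal Python sketch, not Hadoop code. Worker threads stand in for compute nodes. The word-count style map_task and reduce_task functions are hypothetical examples of what an analysis script might do with log lines, and all function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(records: list[str], n_splits: int) -> list[list[str]]:
    """Divide the input into at most n_splits contiguous splits."""
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_task(split: list[str]) -> list[tuple[str, int]]:
    # Example Map function: emit (token, 1) for every token in each log line.
    return [(token, 1) for line in split for token in line.split()]


def partition(mapped: list[list[tuple[str, int]]], n_reducers: int) -> list[dict[str, list[int]]]:
    # Group intermediate records by key; each key lands in exactly one partition.
    parts = [defaultdict(list) for _ in range(n_reducers)]
    for output in mapped:
        for key, value in output:
            parts[hash(key) % n_reducers][key].append(value)
    return parts


def reduce_task(part: dict[str, list[int]]) -> dict[str, int]:
    # Example Reduce function: sum the counts for each key.
    return {key: sum(values) for key, values in part.items()}


def run_job(records: list[str], n_workers: int = 3) -> dict[str, int]:
    # Worker threads stand in for idle compute nodes running tasks in parallel.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = list(pool.map(map_task, split_input(records, n_workers)))
        reduced = list(pool.map(reduce_task, partition(mapped, n_workers)))
    result: dict[str, int] = {}
    for part in reduced:
        result.update(part)  # keys are disjoint across partitions
    return result
```

For the three log lines "GET /a 200", "GET /b 404", and "POST /a 200", run_job counts GET, /a, and 200 twice each, and /b, 404, and POST once each.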
In implementation, the PaaS platform server also checks, based on the script type, whether the script for analyzing data meets the requirements; for example, the script may be required to be of the jar type. Once the script meets the requirements, steps S401 and S402 are executed.
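The script-type check can be sketched in a few lines, because a jar file is a ZIP archive. This Python sketch, with hypothetical function names, accepts a script only if its name ends in .jar and its bytes parse as a ZIP archive. The patent says only that the script type is checked, so these exact criteria are an assumption. A stricter check might also inspect the manifest or the job's main class.

```python
import io
import zipfile


def is_acceptable_script(filename: str, content: bytes) -> bool:
    """Accept only .jar files whose bytes form a valid ZIP archive (a jar is a ZIP)."""
    if not filename.lower().endswith(".jar"):
        return False
    return zipfile.is_zipfile(io.BytesIO(content))


def make_demo_jar() -> bytes:
    # Build a tiny in-memory jar containing only a manifest, for demonstration.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n")
    return buf.getvalue()
```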
S5: the PaaS platform server provides the data processing results to the user.
The big data processing method provided by the embodiments of the invention makes use of the existing resources of the PaaS platform. The PaaS platform generates each node of the distributed processing cluster through the virtualization technology of the underlying IaaS layer. The generated cluster gives the PaaS platform big data processing capability, which addresses the problem of processing massive data on the platform and improves data processing efficiency.
In implementation, a PaaS platform is configured on the PaaS platform server. The platform integrates a big data processing service component, which carries out the big data processing flow of steps S1 to S5.
 
Referring to Fig. 2, which is a schematic structural diagram of an embodiment of the big data processing system for a PaaS platform provided by the invention.
An embodiment of the invention provides a big data processing system for a PaaS platform. The system comprises a PaaS platform layer, a virtual distributed processing cluster, and cloud storage and servers, as follows:
The PaaS platform layer provides various service components, including the big data processing service component, and gives the user an operational user interface (UI). The PaaS platform uses the OSGi (Open Service Gateway Initiative) framework. Middleware services, data services, monitoring services, big data processing services, and other services are plugged into the PaaS platform as components. This forms a pluggable, stable, and efficient system whose behavior can change dynamically. The big data processing service component lets the user input the configuration parameters required to generate the virtual distributed processing cluster, and it presents the results. It also provides management functions for the virtual distributed processing cluster, including controlling the cluster's life cycle and monitoring how the cluster processes data.
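The cluster life-cycle control mentioned above can be modeled as a small state machine. The states and allowed transitions in this Python sketch (creating, running, stopped, destroyed) are hypothetical, because the patent does not enumerate them. The point is that the service component can reject invalid operations, such as stopping a cluster that is still being created.

```python
from enum import Enum


class ClusterState(Enum):
    CREATING = "creating"
    RUNNING = "running"
    STOPPED = "stopped"
    DESTROYED = "destroyed"


# Hypothetical lifecycle: the states each state may move to.
ALLOWED = {
    ClusterState.CREATING: {ClusterState.RUNNING, ClusterState.DESTROYED},
    ClusterState.RUNNING: {ClusterState.STOPPED},
    ClusterState.STOPPED: {ClusterState.RUNNING, ClusterState.DESTROYED},
    ClusterState.DESTROYED: set(),
}


class ClusterLifecycle:
    """Tracks one virtual cluster's state for the management UI."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = ClusterState.CREATING
        self.history = [self.state]

    def can_transition(self, target: ClusterState) -> bool:
        return target in ALLOWED[self.state]

    def transition(self, target: ClusterState) -> None:
        if not self.can_transition(target):
            raise ValueError(f"{self.name}: cannot go from "
                             f"{self.state.value} to {target.value}")
        self.state = target
        self.history.append(target)
```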
The virtual distributed processing cluster provides the system's core data analysis and processing capability. It is generated by virtualization technology according to the parameter configuration supplied by the big data processing service component of the PaaS platform. The cluster obtains the data to be analyzed from cloud storage and processes and analyzes it according to the script supplied by the big data processing service component. It then presents the analysis results to the user through that component's user interface. The cluster uses the Hadoop architecture, which implements a distributed file system (Hadoop Distributed File System, HDFS). HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data. By combining the Hadoop framework with the existing resources of the PaaS platform, the system provides big data processing capability with high reliability, scalability, efficiency, and fault tolerance.
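HDFS fault tolerance rests on splitting files into fixed-size blocks and storing each block on several DataNodes. This Python sketch shows the arithmetic, assuming the Hadoop 1.x defaults of a 64 MB block size and a replication factor of 3. The round-robin placement is a deliberate simplification, and the function names are hypothetical. Real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # Hadoop 1.x default dfs.block.size: 64 MB


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file of file_size bytes occupies."""
    return -(-file_size // block_size)  # ceiling division


def place_replicas(n_blocks: int, datanodes: list[str], replication: int = 3) -> list[list[str]]:
    """Simplified placement: each block's replicas go to distinct DataNodes,
    rotating round-robin. Real HDFS uses a rack-aware policy instead."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    n = len(datanodes)
    return [[datanodes[(b + r) % n] for r in range(replication)] for b in range(n_blocks)]


def readable_after_failure(placement: list[list[str]], failed: str) -> bool:
    # A block stays readable while at least one replica lives on a healthy node.
    return all(any(node != failed for node in replicas) for replicas in placement)
```

For a 200 MB file, block_count gives 4 blocks: three full 64 MB blocks plus one partial block. That means 12 block replicas are stored in total, and losing any single DataNode leaves every block readable.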
The cloud storage and servers can be built from the existing resources of the PaaS platform and provide the hardware foundation for the whole system. All disk devices in the cloud storage come from low-cost PC equipment. They are combined into a single shared storage pool that is offered to front-end application servers, which greatly improves disk utilization. Distributed storage improves file read and write efficiency. The cloud storage can reach large capacity through linear expansion while providing high I/O (input/output) bandwidth for unstructured data. The storage backup strategy eliminates single points of disk failure and ensures high reliability, and the approach costs less than conventional storage.
The big data processing method and system for a PaaS platform provided by the embodiments of the invention have the following beneficial effects:
(1) The invention makes full use of the existing storage and computing resources of the PaaS platform, which improves the utilization of platform resources. Users no longer need to buy new storage and servers, which effectively reduces cost. In addition, the big data processing service is integrated into the PaaS platform as a component, so it is easy to extend and speeds up development.
(2) As PaaS platforms develop, more and more applications are deployed on them, so massive data processing on PaaS platforms is inevitable. The invention can effectively solve the problem of massive data processing on a PaaS platform and improve data processing efficiency.
The above are preferred embodiments of the invention. A person skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications also fall within the scope of protection of the invention.

Claims (5)

1. A big data processing method for a PaaS platform, characterized by comprising:
S1: a PaaS platform server receives cluster creation parameters input by a user, the cluster creation parameters comprising the number of nodes of a distributed processing cluster to be created, the memory size of each node, and the storage size of each node;
S2: the PaaS platform server generates the distributed processing cluster by virtualization technology according to the cluster creation parameters;
S3: the PaaS platform server configures a data source to be analyzed according to a log file storage address input by the user or the name of an application deployed by the user;
S4: the PaaS platform server transmits a script for analyzing data to the distributed processing cluster, and the distributed processing cluster processes the data to be analyzed;
S5: the PaaS platform server provides the data processing results to the user.
2. The big data processing method for a PaaS platform according to claim 1, characterized in that each node is a virtual machine in the distributed processing cluster; the nodes comprise a control node and compute nodes, the control node being used to manage the cluster and distribute data processing tasks, and the compute nodes being used to analyze and process data.
3. The big data processing method for a PaaS platform according to claim 2, characterized in that step S2 specifically comprises:
S201: generating one virtual machine by virtualization technology according to the cluster creation parameters, and configuring the runtime environment of the virtual machine;
S202: copying the virtual machine generated in step S201 according to the number of nodes in the cluster creation parameters, to produce the required number of virtual machines;
S203: configuring password-free communication between the virtual machines;
S204: designating the control node and the compute nodes in the distributed processing cluster.
4. The big data processing method for a PaaS platform according to claim 3, characterized in that step S3 specifically comprises:
the PaaS platform server receiving a log file storage address input by the user, or obtaining the corresponding log file storage address from the name of the application deployed by the user on the PaaS platform;
the PaaS platform server checking whether the file at the log file storage address is a log file; if so, importing the data to be analyzed from the log file storage address; otherwise, configuration of the data source to be analyzed fails.
5. The big data processing method for a PaaS platform according to claim 4, characterized in that step S4 specifically comprises:
S401: the PaaS platform server transmitting the script for analyzing data to the control node of the distributed processing cluster, the script being a MapReduce script that specifies how the data to be analyzed is imported and how the MapReduce job is executed;
S402: the control node selecting idle compute nodes in the distributed processing cluster, the compute nodes executing data processing tasks in parallel to process the data to be analyzed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210581670.8A 2012-12-28 2012-12-28 Big data processing method for a PaaS platform

Publications (2)

Publication Number Publication Date
CN103067501A 2013-04-24
CN103067501B 2015-12-09

Family

ID=48109955

Country Status (1)

Country Link
CN (1) CN103067501B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110295984A1 (en) * 2010-06-01 2011-12-01 Tobias Kunze Cartridge-based package management
US20120143951A1 (en) * 2010-12-07 2012-06-07 Nec Laboratories America, Inc. System and method for providing a platform as a service (paas) with a materialized shared space
CN102110071A (en) * 2011-03-04 2011-06-29 浪潮(北京)电子信息产业有限公司 Virtual machine cluster system and implementation method thereof
CN102404385A (en) * 2011-10-25 2012-04-04 华中科技大学 Virtual cluster deployment system and deployment method for high performance computing
CN102821000A (en) * 2012-09-14 2012-12-12 乐视网信息技术(北京)股份有限公司 Method for improving usability of PaaS platform

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468625A (en) * 2014-09-04 2016-04-06 中国石油化工股份有限公司 Database cluster constructing method by using virtual machine
CN105683965A (en) * 2016-01-30 2016-06-15 深圳市博信诺达经贸咨询有限公司 Method and system for automated information analysis based on big data
CN105897707A (en) * 2016-04-01 2016-08-24 浪潮电子信息产业股份有限公司 Password-less access method to cluster system and master control server
CN106777164A (en) * 2016-12-20 2017-05-31 东软集团股份有限公司 A kind of Data Migration cluster and data migration method
CN106777164B (en) * 2016-12-20 2020-07-10 东软集团股份有限公司 Data migration cluster and data migration method
CN106648672A (en) * 2016-12-28 2017-05-10 北京云星宇交通科技股份有限公司 Method and system for developing and running big data
CN109218101A (en) * 2018-09-26 2019-01-15 北京交通大学 A kind of method and system of wisdom contract network group creation
CN109218101B (en) * 2018-09-26 2020-07-17 北京交通大学 Method and system for creating intelligent cooperative network group
CN110795626A (en) * 2019-10-28 2020-02-14 南京弹跳力信息技术有限公司 Big data processing method and system
CN113518095A (en) * 2021-09-14 2021-10-19 北京华云安信息技术有限公司 SSH cluster deployment method, device, equipment and storage medium
CN113518095B (en) * 2021-09-14 2021-12-14 北京华云安信息技术有限公司 SSH cluster deployment method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant