CN110011827A

CN110011827A - Multi-user big data analysis service system and method for medical consortia

Info

Publication number: CN110011827A
Application number: CN201910142598.0A
Authority: CN
Inventors: 王永明; 崔修涛; 章玉宇; 胡天龙; 刘佳伟; 赵政达
Original assignee: Shanghai Changjiang Science And Technology Development Co Ltd; CETC Software Information Services Co Ltd
Current assignee: CETC SOFTWARE INFORMATION SERVICES Co.,Ltd.
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2019-07-12

Abstract

The present invention provides a multi-user big data analysis service system and method for medical consortia, comprising self-service big data cluster creation, big data cluster use, and big data cluster deletion. The container service of OpenStack is used as the runtime infrastructure, so the resource management capability of OpenStack is leveraged and no separate physical resource management function needs to be implemented in the big data analysis service subsystem; physical resources can be shared with other services, which improves resource utilization and reduces waste of physical resources. The big data cluster provided by the invention is itself a cluster architecture, namely a Spark cluster: if a Worker node fails, the whole cluster remains available and only suffers degraded performance because fewer resources are available. Deployment and maintenance are low-cost, so users need not spend significant time on deployment, installation, and configuration. The invention also has strong programming extensibility and supports development in multiple languages.

Description

Multi-user big data analysis service system and method for medical consortia

Technical field

The present invention relates to the field of big data analysis, and in particular to a multi-user big data analysis service system and method for medical consortia.

Background art

Big data analysis refers to the analysis of very large volumes of data. Big data can be summarized by four Vs: large volume (Volume), high speed (Velocity), great variety (Variety), and value (Value). Big data is currently one of the hottest terms in the IT industry, and the commercial exploitation of big data through data warehousing, data security, data analysis, data mining, and related technologies has increasingly become a profit focus that industry players compete for. With the arrival of the big data era, big data analysis has emerged accordingly. How to automate big data analysis service clusters and provide self-service management, so that users need not concern themselves with operations such as applying for underlying resources or installing and configuring software, and so that the demand for big data clusters is met, especially in multi-user environments, has become a pressing problem in applying big data technology to big data analysis.

Summary of the invention

In view of the defects in the prior art, the object of the present invention is to provide a multi-user big data analysis service system and method for medical consortia.

A multi-user big data analysis service system for medical consortia provided according to the present invention comprises:

Cluster service creation module: receives the big data cluster that a user selects for creation from the console in a self-service manner; after receiving the big data cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing big data cluster creation;

Cluster service use module: in response to a viewing instruction issued by the user through the console, provides the access information of the big data cluster, and provides access to the big data cluster through an SSH client or a VNC client so that the big data cluster can be used.

Preferably, the multi-user big data analysis service system for medical consortia further comprises a cluster service deletion module: receives the big data cluster that a user selects for deletion from the console in a self-service manner, and the backend automatically deletes the big data cluster.

Preferably, the big data cluster is a big data service built on the cloud.

Preferably, the big data cluster use service is built on x86 and IBM Power server clusters and uses a customized JVM.

Preferably, the big data cluster is created using OpenStack Heat.

Preferably, the analysis results of the analysis task are stored in HDFS, Hive, or the cluster file system.

Preferably, a management node virtual machine and the big data cluster are deployed. The management node virtual machine comprises Nginx, a big data platform service, Heat management, a file synchronization service, and object storage. Nginx implements the mapping of external interfaces; the big data platform service handles Web requests and interacts with the file synchronization service and Heat management; the file synchronization service synchronizes the object storage with HDFS.

Preferably, in using the big data cluster, the user submits an analysis task by uploading data or programs, and downloads the analysis results of the analysis task via SCP.

Preferably, in using the big data cluster, the user can view the running status of the big data cluster;

in creating the big data cluster, the user can view the creation progress of the big data cluster.

A multi-user big data analysis service method for medical consortia provided according to the present invention comprises:

Cluster service creation step: receiving the big data cluster that a user selects for creation from the console in a self-service manner; after receiving the big data cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing big data cluster creation;

Cluster service use step: in response to a viewing instruction issued by the user through the console, providing the access information of the big data cluster, and providing access to the big data cluster through an SSH client or a VNC client so that the big data cluster can be used;

Cluster service deletion step: receiving the big data cluster that a user selects for deletion from the console in a self-service manner, and the backend automatically deletes the big data cluster.

Compared with the prior art, the present invention has the following beneficial effects:

1. The big data service of the present invention is built on x86 and IBM Power server clusters and has outstanding computing capability. To better suit big data workloads, the big data service uses a JVM customized for Power. For users developing big data applications, whether the underlying cluster uses Power or x86 has no effect on programming; the same code runs on either.

2. The big data service of the present invention is built on the cloud and can process any amount of data, scaling data processing capacity on demand from several TB to several PB. In the big data cluster service, users can quickly create any number of nodes at any time.

3. The present invention has strong programming extensibility. It supports multiple languages such as Java, Python, and Scala, and will also support R. Users can use the programming language they are accustomed to for writing, creating, configuring, submitting, and monitoring Hadoop/Spark jobs.

4. The present invention offers low-cost deployment and maintenance. Users need not spend significant time on deployment, installation, and configuration, and there are no other upfront costs. The big data service completes this work automatically for the user, who can start a cluster within a few minutes.

Brief description of the drawings

Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:

Fig. 1 is a flow diagram of the big data cluster creation service;

Fig. 2 is a flow diagram of the big data cluster deletion service;

Fig. 3 is a deployment architecture diagram of the big data analysis service subsystem;

Fig. 4 is a flow diagram of the big data cluster service;

Fig. 5 is a database design diagram of the big data analysis service subsystem.

Detailed description of the embodiments

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept. These all fall within the protection scope of the present invention.

Specifically, the multi-user big data analysis service system for medical consortia further comprises a cluster service deletion module: receives the big data cluster that a user selects for deletion from the console in a self-service manner, and the backend automatically deletes the big data cluster.

Creating a big data cluster, using a big data cluster, and deleting a big data cluster each exist in the form of a service, corresponding respectively to the big data cluster creation service, the big data cluster use service, and the big data cluster deletion service; the creation service precedes the use service and the deletion service.

The big data cluster creation service is self-service: after the user selects big data cluster creation in the console and confirms the cluster scale (number of nodes), the creation service automatically deploys and configures the big data cluster in the backend. The user can check the creation progress through the console and can log in and use the cluster once creation completes.

Once the big data cluster has been created, the big data cluster use service lets the user view the cluster's access information through the console (the most important item is the IP address of the cluster's Master node). The user can then access the Master node directly via SSH or VNC, upload the data and programs to be analyzed, and submit analysis tasks to Spark; after the analysis completes, the results can be downloaded using SCP. During this process, the user can also click Spark monitoring and Hadoop monitoring for the cluster in the console to view the cluster state in real time.

The big data cluster deletion service is likewise self-service: after the user selects the big data cluster to delete in the console and confirms, the system automatically completes the deletion in the backend. A deleted cluster cannot be recovered, so the user must manually back up data before deletion.

Specifically, the big data cluster is a big data service built on the cloud. It can process any amount of data, scaling data processing capacity on demand from several TB to several PB. In the big data cluster service, users can quickly create any number of nodes at any time.

Specifically, the big data cluster use service is built on x86 and IBM Power server clusters, has strong computing capability, and uses a customized JVM. For users developing big data applications, whether the underlying cluster uses Power or x86 has no effect on programming; the same code runs on either. The big data cluster use service has strong programming extensibility: it supports multiple languages such as Java, Python, and Scala, and will also support R. Users can use the programming language they are accustomed to for writing, creating, configuring, submitting, and monitoring Hadoop/Spark jobs.

Specifically, the big data cluster is created using OpenStack Heat.

Specifically, the analysis results of the analysis task are stored in HDFS, Hive, or the cluster file system.

Specifically, a management node virtual machine and the big data cluster are deployed. The management node virtual machine comprises Nginx, a big data platform service, Heat management, a file synchronization service, and object storage. Nginx implements the mapping of external interfaces; the big data platform service handles Web requests and interacts with the file synchronization service and Heat management; the file synchronization service synchronizes the object storage with HDFS.

Specifically, in using the big data cluster, the user submits an analysis task by uploading data or programs, and downloads the analysis results of the analysis task via SCP.

Specifically, in using the big data cluster, the user can view the running status of the big data cluster; in creating the big data cluster, the user can view the creation progress of the big data cluster.

Cluster service creation step: in response to a viewing instruction issued by the user through the console, the access information of the big data cluster is provided; after the big data cluster scale is set, the backend automatically deploys and configures the big data cluster, completing big data cluster creation;

The multi-user big data analysis service system for medical consortia provided by the present invention can be implemented through the step flow of the multi-user big data analysis service method for medical consortia. Those skilled in the art may understand the method as a preferred example of the system.

Preferred examples of the present invention are further described below with reference to the drawings.

As shown in Fig. 1, in one embodiment, when a user needs to use a big data cluster, the cluster is created through the console: the user selects big data cluster creation on the console and, in the pop-up page, selects the number of nodes for the cluster. The backend system completes cluster creation through OpenStack Heat according to the number of nodes selected by the user. During creation, the user can see the state of the cluster on the console. After creation completes, the cluster enters the available state; the user can log in to the cluster's master node (Spark Master) through an SSH client to use the cluster, or monitor the Hadoop cluster and the Spark cluster through the web console.
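The mapping from the user-selected node count to a Heat stack can be illustrated with a minimal sketch. The patent does not disclose its template, so the function name `build_cluster_template`, the resource names, and the image and flavor parameters below are hypothetical; only the resource types `OS::Nova::Server` and `OS::Heat::ResourceGroup` are standard Heat types. The sketch builds the template as a plain dictionary and does not call any OpenStack API.

```python
import json


def build_cluster_template(worker_count: int, image: str, flavor: str) -> dict:
    """Build a minimal HOT-style template: one master plus `worker_count` workers.

    Hypothetical illustration only; the template used by the patented
    system is not disclosed in the source.
    """
    if worker_count < 1:
        raise ValueError("a cluster needs at least one worker node")
    server_props = {"image": image, "flavor": flavor}
    return {
        "heat_template_version": "2015-04-30",
        "description": "Spark cluster sketch (assumed layout)",
        "resources": {
            # Single master node (Spark Master) that users log in to via SSH.
            "master": {
                "type": "OS::Nova::Server",
                "properties": dict(server_props, name="spark-master"),
            },
            # ResourceGroup replicates the worker definition `count` times.
            "workers": {
                "type": "OS::Heat::ResourceGroup",
                "properties": {
                    "count": worker_count,
                    "resource_def": {
                        "type": "OS::Nova::Server",
                        "properties": dict(server_props, name="spark-worker-%index%"),
                    },
                },
            },
        },
        # The master IP is the key access information shown in the console.
        "outputs": {
            "master_ip": {"value": {"get_attr": ["master", "first_address"]}},
        },
    }


if __name__ == "__main__":
    template = build_cluster_template(3, image="bigdata-node", flavor="m1.large")
    print(json.dumps(template, indent=2))
```

In this sketch the node count chosen in the console becomes the `count` property of the worker group, so changing the cluster scale does not require a different template.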

In one embodiment, the user uploads the data to be analyzed to the cluster via SCP and, as needed, imports it into HDFS or Hive. The user uploads a Spark program via SCP; the program can read the data in HDFS or Hive. The program task is submitted through Spark Submit to perform the analysis computation. The analysis results may be saved in HDFS, Hive, or the cluster's file system. Files in Hive or HDFS are exported to the local file system, and data in the cluster file system is downloaded to the client via SCP.
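The upload, import, submit, export, and download steps above can be sketched as command lines. This is an illustrative sketch, not the patent's tooling: the helper names, user name, IP address, and paths are hypothetical, and the master URL assumes a Spark standalone master on its default port 7077. The helpers only build argument lists; they do not execute anything.

```python
import shlex


def upload_cmd(local_path, user, master_ip, remote_dir):
    """Copy a local data file or program to the master node via SCP."""
    return ["scp", local_path, f"{user}@{master_ip}:{remote_dir}"]


def hdfs_import_cmd(remote_path, hdfs_dir):
    """Import an uploaded file into HDFS (run on the master node)."""
    return ["hdfs", "dfs", "-put", remote_path, hdfs_dir]


def submit_cmd(master_ip, app_path, app_args=()):
    """Submit a Spark program; port 7077 is an assumption (standalone default)."""
    return ["spark-submit", "--master", f"spark://{master_ip}:7077", app_path, *app_args]


def hdfs_export_cmd(hdfs_path, local_dir):
    """Export results from HDFS to the master node's local file system."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_dir]


def download_cmd(user, master_ip, remote_path, local_dir):
    """Download results from the master node to the client via SCP."""
    return ["scp", f"{user}@{master_ip}:{remote_path}", local_dir]


if __name__ == "__main__":
    steps = [
        upload_cmd("visits.csv", "alice", "10.0.0.5", "/home/alice/"),
        hdfs_import_cmd("/home/alice/visits.csv", "/data/"),
        submit_cmd("10.0.0.5", "analyze.py", ["/data/visits.csv", "/results/"]),
        hdfs_export_cmd("/results/", "/home/alice/results/"),
        download_cmd("alice", "10.0.0.5", "/home/alice/results/part-00000", "."),
    ]
    for step in steps:
        print(shlex.join(step))
```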

As shown in Fig. 2, in one embodiment, the user logs in to the cluster or downloads via SCP the data in the cluster that needs to be retained. Cluster deletion is an irreversible operation and retains no data; therefore, before deleting the cluster, the user must manually download the data to be retained. The user then clicks the delete-cluster option on the console, and the backend system completes the cluster deletion through Heat.
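Because deletion is irreversible, a service implementing this flow might refuse to proceed until the user confirms that data has been backed up. The sketch below illustrates that guard under stated assumptions: the `Cluster` class, the state names, and the `delete_stack` callable (a stand-in for whatever performs the Heat stack deletion) are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable


class BackupNotConfirmed(Exception):
    """Raised when deletion is requested without a backup confirmation."""


@dataclass
class Cluster:
    name: str
    state: str = "AVAILABLE"  # hypothetical state names


def delete_cluster(
    cluster: Cluster,
    backup_confirmed: bool,
    delete_stack: Callable[[str], None],
) -> Cluster:
    """Delete a cluster only after the user confirms a manual backup.

    `delete_stack` stands in for the backend call that removes the
    Heat stack; it is injected so the sketch stays self-contained.
    """
    if not backup_confirmed:
        raise BackupNotConfirmed(
            f"cluster {cluster.name!r}: deletion is irreversible, back up data first"
        )
    cluster.state = "DELETING"
    delete_stack(cluster.name)
    cluster.state = "DELETED"
    return cluster


if __name__ == "__main__":
    deleted = []
    result = delete_cluster(Cluster("demo"), True, deleted.append)
    print(result.state, deleted)
```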

As shown in Fig. 3, in one embodiment, the system mainly consists of a management node virtual machine deployed on the management control nodes (x86-01 to x86-03) and big data clusters that run on Docker compute nodes and whose life cycles are dynamically managed by the management node. The big data analysis service management node virtual machine contains components such as Nginx, the big data platform service, Heat management, file synchronization, and object storage. Nginx is responsible for mapping external interfaces. The big data platform server is responsible for handling Web requests and is connected to the file synchronization service and the Heat management service. When a user creates a big data service, the big data platform server calls the corresponding Heat template according to the user's input to complete the creation task. Likewise, if the user wants to operate on files, the user can invoke file management in the Web interface; the big data platform server handles this request and calls the file synchronization service to complete synchronization between object storage and HDFS. Notably, multiple big data platform servers can be used to ensure high reliability.

As shown in Fig. 4, big data clusters are created dynamically upon user request: when a user applies for a new big data cluster in the console, the management node issues a cluster creation instruction and cluster creation is completed. First, the user's creation information is passed through the big data analysis subsystem console (Dashboard) to the Heat module (the orchestration service in OpenStack), which updates the relevant configuration in the Heat template. Heat then obtains, through Glance (the image service in OpenStack), the Docker image containing the big data service node software packages and the related environment configuration, and creates the corresponding container instances through Nova (the compute service in OpenStack). Finally, each big data node (NameNode, DataNode, Master, Worker, Driver, Executor) is started in Docker, completing the big data cluster environment the user requires.
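The creation sequence and the resulting set of containers can be sketched as plain data. The step list paraphrases the flow described above. The assignment of roles to containers (NameNode and Master on one container, DataNode and Worker on each of the others) is an assumption made for illustration, since the patent lists the node types but not how they are co-located.

```python
# Steps of the creation flow, in order, as described for Fig. 4.
CREATION_STEPS = (
    "dashboard passes the user's creation information to Heat",
    "Heat updates the template configuration",
    "Heat obtains the Docker image through Glance",
    "Heat creates container instances through Nova",
    "big data nodes are started inside the containers",
)


def plan_containers(worker_count: int) -> list:
    """Return a hypothetical container layout for a cluster.

    Assumed layout: one master container running the HDFS NameNode and
    the Spark Master, plus `worker_count` containers each running an
    HDFS DataNode and a Spark Worker. Driver and Executor processes are
    started per application by Spark, so they are not planned here.
    """
    if worker_count < 1:
        raise ValueError("a cluster needs at least one worker node")
    plan = [{"name": "master", "roles": ["NameNode", "Master"]}]
    for index in range(worker_count):
        plan.append({"name": f"worker-{index}", "roles": ["DataNode", "Worker"]})
    return plan


if __name__ == "__main__":
    for number, step in enumerate(CREATION_STEPS, start=1):
        print(f"{number}. {step}")
    for container in plan_containers(2):
        print(container["name"], container["roles"])
```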

As shown in Fig. 5, the big data service database in this embodiment is designed with seven tables: a stack information table, a user information table, an operation table, a stack operation table, an operation state table, a slave node table, and a message table. The user information table stores user names and message information; the message information consists of notifications that help the user learn more about the big data cluster. The stack information table stores information about the user's active Heat stacks. The stack operation table stores all of the user's operations on big data clusters, including each operation's start and end times, content, success or failure, and error message. The slave node table stores the IP addresses of the big data slave nodes. The operation state table stores the operation states of the big data service. The message table stores messages, including the message content, the notification time, and whether the user has read the message.
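A seven-table layout of this kind could look like the following SQLite sketch. The patent names the tables and some of their contents but not the columns, keys, or database engine, so every table name, column name, type, and foreign key below is an assumption chosen for illustration.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE user_info (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    info TEXT
);
CREATE TABLE stack_info (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user_info(id),
    heat_stack_id TEXT NOT NULL
);
CREATE TABLE operation (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE operation_state (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE stack_operation (
    id INTEGER PRIMARY KEY,
    stack_id INTEGER NOT NULL REFERENCES stack_info(id),
    operation_id INTEGER REFERENCES operation(id),
    state_id INTEGER REFERENCES operation_state(id),
    started_at TEXT,
    ended_at TEXT,
    content TEXT,
    succeeded INTEGER,
    error_message TEXT
);
CREATE TABLE slave_node (
    id INTEGER PRIMARY KEY,
    stack_id INTEGER NOT NULL REFERENCES stack_info(id),
    ip_address TEXT NOT NULL
);
CREATE TABLE message (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user_info(id),
    content TEXT NOT NULL,
    notified_at TEXT,
    is_read INTEGER NOT NULL DEFAULT 0
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all seven tables in the given database."""
    conn.executescript(SCHEMA)


def table_names(conn: sqlite3.Connection) -> list:
    """List the user tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print(table_names(connection))
```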

The present invention uses the container service of OpenStack (Nova Docker) as the runtime infrastructure; that is, the clusters created by users run on the Docker compute nodes of OpenStack. The resource management capability of OpenStack is leveraged, so no separate physical resource management function needs to be implemented in the big data analysis service subsystem; physical resources can be shared with other services, which improves resource utilization and reduces waste of physical resources. In terms of high availability, the big data cluster provided by the present invention is itself a cluster architecture, namely a Spark cluster. In a Spark cluster, if a Worker node fails, the whole cluster remains available and only suffers degraded performance because fewer resources are available (fewer nodes); if the Master node fails, the whole cluster becomes unavailable.
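The availability behavior described above can be expressed as a small model. This is a simplified sketch rather than anything specified in the patent: it assumes a single Master, treats remaining capacity as proportional to the fraction of live Workers, and treats a cluster with no live Workers as unavailable. The function name `cluster_status` is hypothetical.

```python
def cluster_status(master_alive: bool, workers_alive: int, workers_total: int) -> dict:
    """Model cluster availability under Worker and Master failures.

    Assumptions (not from the patent): one Master per cluster, capacity
    proportional to the share of live Workers, and no availability when
    every Worker is down.
    """
    if workers_total <= 0:
        raise ValueError("workers_total must be positive")
    if not 0 <= workers_alive <= workers_total:
        raise ValueError("workers_alive must be between 0 and workers_total")
    if not master_alive:
        # A Master failure makes the whole cluster unavailable.
        return {"available": False, "capacity": 0.0}
    # Worker failures only reduce the resources available to jobs.
    capacity = workers_alive / workers_total
    return {"available": workers_alive > 0, "capacity": capacity}


if __name__ == "__main__":
    print(cluster_status(True, 3, 4))
    print(cluster_status(False, 4, 4))
```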

Those skilled in the art know that, in addition to implementing the system, apparatus, and modules provided by the present invention purely as computer-readable program code, the method steps can be logically programmed so that the system, apparatus, and modules implement the same procedures in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system, apparatus, and modules provided by the present invention may be regarded as a hardware component, and the modules included therein for implementing various procedures may also be regarded as structures within the hardware component; modules for implementing various functions may also be regarded both as software programs implementing the method and as structures within the hardware component.

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments above; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the present invention. Where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.

Claims

1. A multi-user big data analysis service system for medical consortia, characterized by comprising:

a cluster service creation module: receives the big data cluster that a user selects for creation from the console in a self-service manner; after receiving the big data cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing big data cluster creation;

a cluster service use module: in response to a viewing instruction issued by the user through the console, provides the access information of the big data cluster, and provides access to the big data cluster through an SSH client or a VNC client so that the big data cluster can be used.

2. The multi-user big data analysis service system for medical consortia according to claim 1, characterized by further comprising a cluster service deletion module: receives the big data cluster that a user selects for deletion from the console in a self-service manner, and the backend automatically deletes the big data cluster.

3. The multi-user big data analysis service system for medical consortia according to claim 1, characterized in that the big data cluster is a big data service built on the cloud.

4. The multi-user big data analysis service system for medical consortia according to claim 1, characterized in that the big data cluster use service is built on x86 and IBM Power server clusters and uses a customized JVM.

5. The multi-user big data analysis service system for medical consortia according to claim 1, characterized in that the big data cluster is created using OpenStack Heat.

6. The multi-user big data analysis service system for medical consortia according to claim 1, characterized in that the analysis results of the analysis task are stored in HDFS, Hive, or the cluster file system.

7. The multi-user big data analysis service system for medical consortia according to claim 1, characterized in that a management node virtual machine and the big data cluster are deployed; the management node virtual machine comprises Nginx, a big data platform service, Heat management, a file synchronization service, and object storage; Nginx implements the mapping of external interfaces; the big data platform service handles Web requests and interacts with the file synchronization service and Heat management; the file synchronization service synchronizes the object storage with HDFS.

8. The multi-user big data analysis service system for medical consortia according to claim 1, characterized in that, in using the big data cluster, the user submits an analysis task by uploading data or programs, and downloads the analysis results of the analysis task via SCP.

9. The multi-user big data analysis service system for medical consortia according to claim 1, characterized in that, in using the big data cluster, the user can view the running status of the big data cluster;

10. A multi-user big data analysis service method for medical consortia, characterized by comprising:

a cluster service creation step: receiving the big data cluster that a user selects for creation from the console in a self-service manner; after receiving the big data cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing big data cluster creation;

a cluster service use step: in response to a viewing instruction issued by the user through the console, providing the access information of the big data cluster, and providing access to the big data cluster through an SSH client or a VNC client so that the big data cluster can be used;

a cluster service deletion step: receiving the big data cluster that a user selects for deletion from the console in a self-service manner, and the backend automatically deletes the big data cluster.