Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a multi-user big data analysis service system and method for medical consortia.
The multi-user big data analysis service system for medical consortia provided according to the present invention comprises:
Cluster service creation module: receives the big data cluster that the user selects to create, in a self-service manner, from the console; after receiving the cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing its creation;
Cluster service usage module: in response to the user's view instruction in the console, provides the access information of the big data cluster and provides access to the cluster through an SSH client or VNC client, so that the user can use the big data cluster.
Preferably, the multi-user big data analysis service system for medical consortia further comprises a cluster service deletion module: receives the big data cluster that the user selects to delete, in a self-service manner, from the console, and the backend deletes the big data cluster automatically.
Preferably, the big data cluster is a big data service built on the cloud.
Preferably, the usage service of the big data cluster is built on x86 and IBM Power server clusters and uses a customized JVM.
Preferably, the big data cluster is created using OpenStack Heat.
Preferably, the analysis results of analysis tasks are stored in HDFS, Hive, or the cluster file system.
Preferably, a management node virtual machine and the big data cluster are deployed. The management node virtual machine includes Nginx, a big data platform service, Heat management, a file synchronization service, and object storage. Nginx implements external interface mapping; the big data platform service processes Web requests and interacts with the file synchronization service and Heat management; and the file synchronization service synchronizes object storage with HDFS.
Preferably, when using the big data cluster, the user submits analysis tasks by uploading data or programs and downloads the analysis results via SCP.
Preferably, while using the big data cluster, the user can view its running status; while the big data cluster is being created, the user can view the creation progress.
The multi-user big data analysis service method for medical consortia provided according to the present invention comprises:
Cluster service creation step: receiving the big data cluster that the user selects to create, in a self-service manner, from the console; after receiving the cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing its creation;
Cluster service usage step: in response to the user's view instruction in the console, providing the access information of the big data cluster and providing access to the cluster through an SSH client or VNC client, so that the user can use the big data cluster;
Cluster service deletion step: receiving the big data cluster that the user selects to delete, in a self-service manner, from the console, and deleting the big data cluster automatically in the backend.
Compared with the prior art, the present invention has the following beneficial effects:
1. The big data service of the invention is built on x86 and IBM Power server clusters and has outstanding computing capability. To better suit big data workloads, the big data service uses a JVM customized for Power. For users developing big data applications, whether the underlying cluster runs on Power or x86 makes no difference to programming: the same code runs on both.
2. The big data service of the invention is built on the cloud and can process any amount of data, scaling data processing capacity on demand from several TB to several PB. In the big data cluster service, users can quickly create any number of nodes at any time.
3. The invention has strong programming extensibility: it supports multiple languages such as Java, Python, and Scala, with R support planned. Users can write, create, configure, submit, and monitor Hadoop/Spark jobs in the programming language they are used to.
4. The invention offers low-cost deployment and maintenance. Users need not spend a great deal of time on deployment, installation, and configuration, and incur no other up-front costs; the big data service completes this work automatically, and a user can start a cluster within a few minutes.
Specific embodiment
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the invention, but they do not limit it in any way. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention.
The multi-user big data analysis service system for medical consortia provided according to the present invention comprises:
Cluster service creation module: receives the big data cluster that the user selects to create, in a self-service manner, from the console; after receiving the cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing its creation;
Cluster service usage module: in response to the user's view instruction in the console, provides the access information of the big data cluster and provides access to the cluster through an SSH client or VNC client, so that the user can use the big data cluster.
Specifically, the multi-user big data analysis service system for medical consortia further comprises a cluster service deletion module: receives the big data cluster that the user selects to delete, in a self-service manner, from the console, and the backend deletes the big data cluster automatically.
Creating, using, and deleting a big data cluster are each provided as a service: the big data cluster creation service, the big data cluster usage service, and the big data cluster deletion service, respectively. The creation service must precede both the usage service and the deletion service.
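The ordering constraint among the three services (creation before usage and deletion, and deletion being final) can be expressed as a small state machine. The sketch below is illustrative only: the class, state names, and the `backup_confirmed` flag are hypothetical and do not come from the source, which does not describe its internal implementation.

```python
from enum import Enum


class ClusterState(Enum):
    """Hypothetical lifecycle states for one user's big data cluster."""
    ABSENT = "absent"
    ACTIVE = "active"
    DELETED = "deleted"


class ClusterLifecycle:
    """Enforces: create before use/delete; deletion is irreversible."""

    def __init__(self) -> None:
        self.state = ClusterState.ABSENT
        self.nodes = 0

    def create(self, nodes: int) -> None:
        # The creation service runs first and records the chosen scale.
        if self.state is not ClusterState.ABSENT:
            raise RuntimeError("cluster already created or deleted")
        if nodes < 1:
            raise ValueError("cluster scale must be at least one node")
        self.nodes = nodes
        self.state = ClusterState.ACTIVE

    def use(self) -> str:
        # The usage service is only valid on a created, not yet deleted, cluster.
        if self.state is not ClusterState.ACTIVE:
            raise RuntimeError("usage service requires a created cluster")
        return f"cluster with {self.nodes} nodes is available"

    def delete(self, backup_confirmed: bool) -> None:
        # Hypothetical guard: deletion cannot be undone, so require the user
        # to confirm that data to be kept has already been backed up.
        if self.state is not ClusterState.ACTIVE:
            raise RuntimeError("deletion service requires a created cluster")
        if not backup_confirmed:
            raise RuntimeError("back up data first; deletion is irreversible")
        self.state = ClusterState.DELETED
```

Calling `use()` or `delete()` before `create()` raises an error, and once a cluster reaches `DELETED` no further transition is allowed, mirroring the statement that a deleted cluster cannot be restored.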
The big data cluster creation service is self-service: after the user selects "create big data cluster" in the console and confirms the cluster scale (number of nodes), the creation service automatically deploys and configures the big data cluster in the backend. The user can check the creation progress in the console and can log in once creation is complete.
With the big data cluster usage service, once the cluster has been created the user can view its access information in the console (the most important item being the IP address of the cluster's Master node). The user can then access the Master node directly via SSH or VNC, upload the data and programs to be analyzed, and submit analysis tasks to Spark; after the analysis completes, the results can be downloaded with SCP. During this process, the user can also click Spark monitoring and Hadoop monitoring for the cluster in the console to view the cluster status in real time.
The big data cluster deletion service is also self-service: once the user selects the big data cluster to delete in the console and confirms, the system completes the deletion automatically in the backend. A deleted cluster cannot be restored, so the user must back up data manually before deletion.
Specifically, the big data cluster is a big data service built on the cloud. It can process any amount of data, scaling data processing capacity on demand from several TB to several PB. In the big data cluster service, users can quickly create any number of nodes at any time.
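Choosing a cluster scale for a given data volume is simple arithmetic once storage redundancy is accounted for. The sketch below assumes data is kept in HDFS with its default replication factor of 3 and that each node contributes a fixed, hypothetical amount of usable disk; it ignores headroom for intermediate and temporary data, so it gives a lower bound.

```python
import math


def nodes_needed(data_tb: float, per_node_tb: float, replication: int = 3) -> int:
    """Estimate the minimum number of storage nodes for data_tb of raw data.

    HDFS stores every block `replication` times (3 by default), so the
    on-disk footprint is data_tb * replication. per_node_tb is an assumed
    usable capacity per node, not a figure from the source.
    """
    if data_tb < 0 or per_node_tb <= 0 or replication < 1:
        raise ValueError("invalid sizing input")
    return math.ceil(data_tb * replication / per_node_tb)
```

For example, with an assumed 4 TB of usable disk per node, 10 TB of data needs `ceil(30 / 4) = 8` nodes, while 2 PB (2000 TB) needs 1500 nodes, which illustrates why on-demand node creation matters at the PB end of the range.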
Specifically, the usage service of the big data cluster is built on x86 and IBM Power server clusters, which have strong computing capability, and uses a customized JVM. For users developing big data applications, whether the underlying cluster runs on Power or x86 makes no difference to programming: the same code runs on both. The usage service has strong programming extensibility, supporting multiple languages such as Java, Python, and Scala, with R support planned. Users can write, create, configure, submit, and monitor Hadoop/Spark jobs in the programming language they are used to.
Specifically, the big data cluster is created using OpenStack Heat.
Specifically, the analysis results of analysis tasks are stored in HDFS, Hive, or the cluster file system.
Specifically, a management node virtual machine and the big data cluster are deployed. The management node virtual machine includes Nginx, a big data platform service, Heat management, a file synchronization service, and object storage. Nginx implements external interface mapping; the big data platform service processes Web requests and interacts with the file synchronization service and Heat management; and the file synchronization service synchronizes object storage with HDFS.
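The source does not say how the file synchronization service decides what to copy. One common approach is to compare listings from both sides and copy anything missing or changed. The sketch below assumes each side can be listed as a mapping from relative path to a content checksum, and it treats object storage as authoritative when both sides hold different versions of the same path; both choices are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def plan_sync(object_store: Dict[str, str],
              hdfs: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (copy_to_hdfs, copy_to_object_store) as sorted path lists.

    Each listing maps a relative path to a content checksum (assumed input
    format). Paths missing from HDFS, or whose checksum differs, are copied
    from object storage to HDFS (object storage wins conflicts, by
    assumption). Paths that exist only in HDFS are copied to object storage.
    """
    to_hdfs = sorted(p for p, c in object_store.items() if hdfs.get(p) != c)
    to_object_store = sorted(p for p in hdfs if p not in object_store)
    return to_hdfs, to_object_store
```

With `object_store = {"a.csv": "1", "b.csv": "2"}` and `hdfs = {"b.csv": "3", "c.csv": "4"}`, the plan copies `a.csv` and `b.csv` into HDFS and `c.csv` into object storage. A real service would also need a policy for deletions, which this sketch does not propagate.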
Specifically, when using the big data cluster, the user submits analysis tasks by uploading data or programs and downloads the analysis results via SCP.
Specifically, while using the big data cluster, the user can view its running status; while the big data cluster is being created, the user can view the creation progress.
The multi-user big data analysis service method for medical consortia provided according to the present invention comprises:
Cluster service creation step: receiving the big data cluster that the user selects to create, in a self-service manner, from the console; after receiving the cluster scale set by the user, the backend automatically deploys and configures the big data cluster, completing its creation;
Cluster service usage step: in response to the user's view instruction in the console, providing the access information of the big data cluster and providing access to the cluster through an SSH client or VNC client, so that the user can use the big data cluster;
Cluster service deletion step: receiving the big data cluster that the user selects to delete, in a self-service manner, from the console, and deleting the big data cluster automatically in the backend.
The multi-user big data analysis service system for medical consortia provided by the invention can be implemented through the steps of the multi-user big data analysis service method for medical consortia. Those skilled in the art may regard the method as a preferred example of the system.
Preferred examples of the invention are further described below with reference to the drawings.
As shown in Figure 1, in one embodiment, when a user needs a big data cluster, the cluster is created through the console: the user selects "create big data cluster" in the console and, in the pop-up web page, chooses the number of nodes for the cluster. According to the selected number of nodes, the backend system creates the cluster with OpenStack Heat. During creation, the user can view the cluster's state in the console. Once creation is complete, the cluster becomes available, and the user can log in to the cluster's master node (Spark Master) with an SSH client to use the cluster, or monitor the Hadoop and Spark clusters through the web console.
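The source does not publish its Heat templates. As a minimal sketch, assuming the cluster nodes are modeled as an `OS::Heat::ResourceGroup` of `OS::Nova::Server` resources, the node count chosen in the console maps directly onto the group's `count` property. The image and flavor names below are hypothetical placeholders; the template is built as a Python dict and serialized as JSON, which is valid YAML and therefore a valid HOT document.

```python
import json


def heat_template(node_count: int, image: str, flavor: str) -> dict:
    """Build a minimal HOT template that creates node_count servers.

    The ResourceGroup layout, image, and flavor are illustrative
    assumptions, not the templates used by the described system.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return {
        "heat_template_version": "2015-10-15",
        "description": "Big data cluster (illustrative sketch)",
        "resources": {
            "cluster_nodes": {
                "type": "OS::Heat::ResourceGroup",
                "properties": {
                    # The user's selected cluster scale.
                    "count": node_count,
                    "resource_def": {
                        "type": "OS::Nova::Server",
                        "properties": {
                            # ResourceGroup replaces %index% with 0, 1, 2, ...
                            "name": "bigdata-node-%index%",
                            "image": image,
                            "flavor": flavor,
                        },
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(heat_template(3, "bigdata-docker-image", "m1.large"), indent=2))
```

Saved to a file, the output could be launched with the OpenStack client (for example `openstack stack create -t cluster.json my-cluster`); deleting the stack would then correspond to the cluster deletion service.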
In one embodiment, the user uploads the data to be analyzed to the cluster via SCP and, as needed, imports it into HDFS or Hive. The user uploads via SCP a Spark program that reads the data in HDFS or Hive, then submits the program with spark-submit to perform the analysis. The analysis results can be kept in HDFS, Hive, or the cluster file system. Files in Hive or HDFS are exported to the local file system, and data in the cluster file system are downloaded to the client via SCP.
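The workflow above can be written out as a sequence of standard commands (`scp`, `hdfs dfs -put`, `spark-submit`, `hdfs dfs -get`). The sketch below only builds the argument lists; it assumes the Spark program takes an input path and an output path as its two arguments, that Spark runs in standalone mode on the Master node at its default port 7077, and that the host, user, and paths shown in the usage note are hypothetical.

```python
import posixpath
from typing import List


def analysis_commands(host: str, user: str, data: str, app: str,
                      hdfs_dir: str, result_dir: str) -> List[List[str]]:
    """Return the commands for one analysis run, as argv lists.

    Assumes: standalone Spark master on `host` at port 7077, and a program
    that reads its input path and writes its output path (hypothetical
    argument convention).
    """
    remote = f"{user}@{host}"
    data_name = posixpath.basename(data)
    app_name = posixpath.basename(app)
    return [
        # Upload the data and the Spark program to the Master node's home dir.
        ["scp", data, app, f"{remote}:"],
        # Import the uploaded data into HDFS.
        ["ssh", remote, "hdfs", "dfs", "-put", data_name, hdfs_dir],
        # Submit the program to the Spark master.
        ["ssh", remote, "spark-submit", "--master", f"spark://{host}:7077",
         app_name, f"{hdfs_dir}/{data_name}", result_dir],
        # Export the results from HDFS to the Master node's local file system.
        ["ssh", remote, "hdfs", "dfs", "-get", result_dir, "results"],
        # Download the exported results to the client.
        ["scp", "-r", f"{remote}:results", "."],
    ]
```

For instance, `analysis_commands("10.0.0.5", "hadoop", "/tmp/visits.csv", "/tmp/job.py", "/data", "/out")` yields five commands, each of which could be run with `subprocess.run(cmd, check=True)`. Building the commands separately from running them makes the sequence easy to inspect before anything touches the cluster.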
As shown in Figure 2, in one embodiment, the user logs in to the cluster or downloads via SCP the data in the cluster that must be retained. Because deleting a cluster is an irreversible operation and no data is retained, the user must manually download any data to be kept before deleting the cluster. The user then clicks "delete cluster" in the console, and the backend system completes the deletion through Heat.
As shown in Figure 3, in one embodiment, the system mainly consists of management node virtual machines deployed on the management control nodes (x86-01 to x86-03) and big data clusters running on Docker compute nodes, whose life cycle is dynamically managed by the management nodes. The management node virtual machine of the big data analysis service contains components such as Nginx, the big data platform service, Heat management, file synchronization, and object storage. Nginx handles external interface mapping; the big data platform server handles Web requests and is connected to the file synchronization service and the Heat management service. When a user creates a big data service, the big data platform server calls the corresponding Heat template according to the user's input to complete the creation task. Likewise, when a user wants to operate on files, the user can invoke file management in the Web interface; the big data platform server processes the request and calls the file synchronization service to synchronize object storage with HDFS. Notably, multiple big data platform servers can be used to provide high reliability.
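The source only states that several big data platform servers can be used for reliability. One common way to use redundant servers is round-robin dispatch that skips failed instances, similar in spirit to an Nginx upstream group, whose default balancing method is round-robin. The class below is a hypothetical sketch of that idea; the server names and the health-marking methods are illustrative.

```python
import itertools
from typing import List, Set


class PlatformRouter:
    """Round-robin over redundant platform servers, skipping unhealthy ones.

    Hypothetical sketch: health is set explicitly via mark_down/mark_up,
    whereas a real proxy would detect failures itself.
    """

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one platform server")
        self.servers = list(servers)
        self.healthy: Set[str] = set(servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def pick(self) -> str:
        # Try each server at most once per call, in round-robin order.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy platform server")
```

With three servers, Web requests rotate across all of them; if one fails, requests continue on the remaining two, so a single server failure does not interrupt the console.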
As shown in Figure 4, big data clusters are created dynamically on user request: when a user applies for a new big data cluster in the console, the management node issues a cluster creation instruction to complete the creation. First, the user's creation information is passed through the console (Dashboard) of the big data analysis subsystem to the Heat module (the orchestration service in OpenStack), which updates the relevant configuration in the Heat template. Heat obtains, through Glance (the image service in OpenStack), the Docker image containing the software packages and environment configuration of the big data service nodes, and creates the corresponding container instances through Nova (the compute service in OpenStack). Finally, each big data node (NameNode, DataNode, Master, Worker, Driver, Executor) is started in Docker, completing the big data cluster environment the user needs.
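The last step starts the big data daemons inside the new containers. The source lists the node roles but not how they are placed, so the layout below is an assumption: the first container hosts the HDFS NameNode and the Spark Master, and every other container hosts a DataNode and a Spark Worker. Driver and Executor processes are created per submitted application rather than at cluster start, so the sketch does not assign them.

```python
from typing import Dict, List


def assign_roles(node_names: List[str]) -> Dict[str, List[str]]:
    """Map each container to the daemons it starts (hypothetical layout).

    First container: NameNode + Spark Master. Remaining containers:
    DataNode + Spark Worker. Driver/Executor processes appear only when a
    job is submitted, so they are not placed here.
    """
    if len(node_names) < 2:
        raise ValueError("need a master container and at least one worker")
    roles = {node_names[0]: ["NameNode", "Master"]}
    for name in node_names[1:]:
        roles[name] = ["DataNode", "Worker"]
    return roles
```

Placing both master daemons on one container keeps the access information simple, because a single Master IP serves both SSH login and job submission, as the usage service describes.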
As shown in Figure 5, the big data service database in this embodiment uses 7 tables: a stack information table, a user information table, an operation table, a stack operation table, an operation mode table, a slave node table, and a message table. The user information table stores user names and information; here, information means notifications that help users learn more about their big data clusters. The stack information table stores the user's active Heat stack information. The stack operation table stores all of the user's operations on big data clusters, including each operation's start and end times, content, success or failure, and error messages. The slave node table stores the IP addresses of the big data slave nodes. The operation mode table stores the operation modes of the big data service. The message table stores messages, including the message content, the notification time, and whether the user has read the message.
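A minimal SQLite rendering of the seven tables is sketched below. Only the columns that the description names (user name, stack information, operation start and end times, content, success flag, error message, slave node IP address, message content, notification time, read status) come from the source; the keys, the remaining column names, and the contents of the operation table, which the description does not detail, are assumptions.

```python
import sqlite3

# Hypothetical schema: column names and keys are illustrative assumptions.
SCHEMA = """
CREATE TABLE user_info (
    user_id     INTEGER PRIMARY KEY,
    user_name   TEXT NOT NULL,
    info        TEXT
);
CREATE TABLE stack_info (
    stack_id    TEXT PRIMARY KEY,      -- Heat stack ID
    user_id     INTEGER NOT NULL REFERENCES user_info(user_id),
    status      TEXT
);
CREATE TABLE operation (
    operation_id INTEGER PRIMARY KEY,  -- not detailed in the source
    name         TEXT NOT NULL
);
CREATE TABLE stack_operation (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES user_info(user_id),
    stack_id      TEXT REFERENCES stack_info(stack_id),
    content       TEXT NOT NULL,
    started_at    TEXT NOT NULL,
    ended_at      TEXT,
    succeeded     INTEGER,             -- 1 = success, 0 = failure
    error_message TEXT
);
CREATE TABLE operation_mode (
    mode_id     INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE slave_node (
    id          INTEGER PRIMARY KEY,
    stack_id    TEXT NOT NULL REFERENCES stack_info(stack_id),
    ip_address  TEXT NOT NULL
);
CREATE TABLE message (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES user_info(user_id),
    content     TEXT NOT NULL,
    notified_at TEXT NOT NULL,
    is_read     INTEGER NOT NULL DEFAULT 0
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all seven tables in the given database."""
    conn.executescript(SCHEMA)
```

Keeping slave node IPs in their own table, keyed by stack, lets the console list a cluster's nodes without querying Heat or Nova on every page load.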
The present invention uses the OpenStack container service (Nova Docker) as its runtime infrastructure, and the clusters that users create run on OpenStack's Docker compute nodes. Because OpenStack's resource management capability is reused, the big data analysis service subsystem need not implement physical resource management itself; physical resources can be shared with other services, improving resource utilization and reducing waste. As for high availability, the big data cluster provided by the invention is itself a clustered architecture, namely a Spark cluster. In a Spark cluster, if a Worker node fails, the whole cluster remains available, and performance degrades only because fewer resources (nodes) are available; if the Master node fails, the whole cluster becomes unavailable.
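The availability rule above can be stated as two small functions: the cluster is available exactly when the Master is alive, and its remaining capacity is the fraction of Workers still alive. The sketch below assumes, for simplicity, that all Workers contribute equal resources; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterHealth:
    """Hypothetical health snapshot of a Spark cluster."""
    master_alive: bool
    workers_alive: List[bool]

    def available(self) -> bool:
        # Without the Master, no application can be scheduled.
        return self.master_alive

    def capacity_fraction(self) -> float:
        # Assumes equal-sized Workers: capacity shrinks with each failed Worker.
        if not self.available() or not self.workers_alive:
            return 0.0
        return sum(self.workers_alive) / len(self.workers_alive)
```

For example, `ClusterHealth(True, [True, False, True, True])` is available at 75% capacity, while any snapshot with `master_alive=False` is unavailable regardless of how many Workers survive.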
Those skilled in the art will appreciate that, besides implementing the system, device, and modules provided by the invention purely as computer-readable program code, the method steps can be logically programmed so that the same functions are realized by the system, device, and modules in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system, device, and modules provided by the invention may therefore be regarded as hardware components, and the modules they contain for implementing various functions may be regarded as structures within those hardware components; the modules for implementing various functions may also be regarded both as software programs implementing the method and as structures within the hardware components.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular implementations; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of this application and the features in them may be combined with one another arbitrarily.