CN111400374B - Data mining-oriented containerized data exploration isolation region and use method thereof - Google Patents
- Publication number
- CN111400374B CN202010189150.7A
- Authority
- CN
- China
- Prior art keywords
- data
- module
- exploration
- user
- isolation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/24569—Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0209—Architectural arrangements, e.g. perimeter networks or demilitarized zones
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Hardware Design (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a data-mining-oriented containerized data exploration isolation area comprising a data integration module, a data warehouse module, a mirror image warehouse module, a Kubernetes-based container development environment, and a knowledge sharing module. The data integration module is connected to, and exchanges data with, an external data source and the data warehouse module. The data warehouse module is connected to, and exchanges data with, the Kubernetes-based container development environment, the knowledge sharing module, and the data integration module. The knowledge sharing module is connected to, and exchanges data with, an external application. The Kubernetes-based container development environment is connected to, and exchanges data with, the data warehouse module, the mirror image warehouse module, and a Web browser. A method for using the containerized data exploration isolation area is also provided. The invention solves the problem that existing data exploration isolation areas do not support data mining and sharing.
Description
Technical Field
The invention relates to the field of data exploration isolation areas, in particular to a containerized data exploration isolation area for data mining and a use method thereof.
Background
The exploration isolation zone (Demilitarized Zone, abbreviated DMZ) was introduced to solve the problem that, once a firewall is installed, an external network can no longer access servers on the internal network. It is a network area between a non-secure system and a secure system, or between an enterprise's internal network and the external network: a buffer zone set aside to host the server facilities that the internal network must expose externally. Exploration isolation zone technology targets network security: it uses firewall technology to isolate a secure network area and so protect the data and services in that area.
In the prior art, the exploration isolation zone (DMZ) is mainly used to securely provide services such as HTTP, FTP, SSH, and SMTP to the outside. Compared with a general firewall scheme, this network deployment places one more barrier in front of an attacker, so the internal network is protected more effectively.
The network structure of the exploration isolation zone (DMZ) is shown in fig. 1. It comprises a local area network, a firewall 1, a router 1, a firewall 2, and the Internet, which are connected in sequence and exchange data, together with an exploration isolation zone that is connected to the router 1 and exchanges data with it. The exploration isolation zone comprises a Web server, a service server, and a database. However, as network access requirements and the functions demanded by users grow, data mining often requires a code development environment in which to develop and debug data mining algorithms, and that environment must be configured with the runtime dependencies the data mining code needs so that the code runs normally and stably. The existing exploration isolation zone provides only a network security isolation area; it offers no algorithm development or data analysis tools suited to data mining, and therefore cannot meet data mining requirements.
To solve this problem, the invention builds on the network security environment provided by the traditional exploration isolation area: it uses Jupyter Notebook as the data mining and algorithm development tool and implements a containerized Jupyter Notebook development environment based on Docker container virtualization technology and Kubernetes container orchestration technology.
Disclosure of Invention
In view of this, the present invention proposes a data mining oriented containerized data exploration isolation region, which aims to solve the problem that the data exploration isolation region in the prior art cannot be suitable for data mining and cannot meet the data mining requirement and data sharing.
According to a first aspect of the present invention, a data mining oriented containerized data exploration isolation region is presented.
The region comprises a data integration module, a data warehouse module, a mirror image warehouse module, a Kubernetes-based container development environment, and a knowledge sharing module. The data integration module is connected to, and exchanges data with, both an external data source and the data warehouse module. The data warehouse module is connected to, and exchanges data with, the Kubernetes-based container development environment, the knowledge sharing module, and the data integration module. The knowledge sharing module is connected to, and exchanges data with, the external application. The Kubernetes-based container development environment is connected to, and exchanges data with, the data warehouse module, the mirror image warehouse module, and the Web browser.
In one embodiment, the data exploration isolation area creates a containerized Jupyter Notebook development environment.
In one embodiment, the containerized Jupyter Notebook development environment includes an OAuth2 access authorization agent, an ISTIO gateway, a system namespace, and a user namespace, wherein the OAuth2 access authorization agent is connected to the ISTIO gateway, and the ISTIO gateway is connected to the system namespace and the user namespace, respectively.
In one embodiment, the system namespace includes a Jupyter Notebook controller, a resource configuration controller, and a user configuration, and the user namespace includes the runtime space of Jupyter Notebook.
In one embodiment, the OAuth2 access authorization agent authenticates the user's access behavior based on an OAuth2-compliant authentication service; the ISTIO gateway is used for limiting HTTP access to a Jupyter Notebook instance to a specific user; the Jupyter Notebook controller is used for managing Jupyter Notebook instances; the resource configuration controller is used for creating and managing the namespace of each user; the user configuration is responsible for recording the configuration information of a user's Jupyter Notebook instance; the system namespace serves as the runtime space for system services, and the user namespace provides the runtime space for a user's Jupyter Notebook.
In one embodiment, the data integration module performs data interaction with the external data source through a firewall, and the data integration module performs data interaction with the data warehouse module, and the data of the external data source is synchronized into the data warehouse module of the data exploration isolation area through the firewall by the data integration module.
In one embodiment, the mirror image warehouse module provides the exploration isolation area with container images suitable for data mining scenarios.
In one embodiment, the knowledge sharing module performs data interaction with the external application through the firewall, and performs data interaction with the data warehouse module, so that knowledge generated by data mining is shared to the external application through the firewall through a sharing service.
In another embodiment of the present invention, a method for using a data mining oriented containerized data exploration isolation region is provided, including the following steps:
step 1, the user logs in to the data exploration isolation area through a Web browser;
step 2, the user applies to create a project in the data exploration isolation area according to the data mining requirement, and configures the project;
step 3, the user creates a Jupyter Notebook development environment in the created data exploration isolation area project, and configures the Jupyter Notebook development environment;
step 4, the data in the external data source is synchronized into the data warehouse module of the data exploration isolation area project by using the data integration function;
step 5, the user logs in to the Jupyter Notebook development environment, accesses the data in the data warehouse module, completes the data mining work, and stores the data mining result in the data warehouse module;
step 6, the user shares the data mining result with an application outside the data exploration isolation area through the knowledge sharing module;
step 7, after the data mining work is completed, the user deregisters the project created in the data exploration isolation area, and releases all computing resources and data resources related to the project.
In one embodiment, the project configuration in step 2 includes configuration of computing resources and configuration of the storage space of the data warehouse module; the development environment configuration in step 3 includes a container image configuration, a computing resource configuration, and a storage space configuration.
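The ordering of steps 1 to 7 can be illustrated with a small state tracker. This is only a sketch under stated assumptions: the `Stage` names and the `ProjectSession` class are hypothetical and do not appear in the patent; the sketch merely enforces that the steps happen in the order listed above.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical names for steps 1-7 of the method, in order."""
    LOGGED_IN = auto()            # step 1
    PROJECT_CREATED = auto()      # step 2
    ENVIRONMENT_CREATED = auto()  # step 3
    DATA_SYNCED = auto()          # step 4
    MINING_DONE = auto()          # step 5
    RESULT_SHARED = auto()        # step 6
    PROJECT_RELEASED = auto()     # step 7


ORDER = list(Stage)  # Enum iteration follows definition order


@dataclass
class ProjectSession:
    """Hypothetical tracker for one user's project in the isolation area."""
    user: str
    completed: list = field(default_factory=list)

    def advance(self, stage: Stage) -> None:
        # Only the next step in the sequence is accepted.
        done = len(self.completed)
        expected = ORDER[done] if done < len(ORDER) else None
        if stage is not expected:
            raise ValueError(f"expected {expected}, got {stage}")
        self.completed.append(stage)

    @property
    def released(self) -> bool:
        return bool(self.completed) and self.completed[-1] is Stage.PROJECT_RELEASED


def run_all(user: str) -> ProjectSession:
    """Walk a session through every step in order."""
    session = ProjectSession(user)
    for stage in ORDER:
        session.advance(stage)
    return session
```

Calling `run_all("alice")` yields a session whose `released` property is true, while attempting step 4 before steps 1 to 3 raises `ValueError`.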
According to the data-mining-oriented containerized data exploration isolation region and its method of use provided by the invention, Jupyter Notebook is used as the data mining and algorithm development tool on top of the network security environment provided by a traditional exploration isolation region, and a containerized Jupyter Notebook development environment is implemented based on Docker container virtualization technology and Kubernetes container orchestration technology. A user logs in to the data exploration isolation region through a browser and configures the runtime environment and computing resources that the code in the Jupyter Notebook development environment requires. The system then automatically generates and starts a container image containing the Jupyter Notebook service and the user-configured runtime environment, and allocates computing resources to the containerized Jupyter Notebook development environment through Kubernetes container orchestration according to the user's configured requirements. After the Jupyter Notebook service starts, the user can reach the Jupyter Notebook container runtime environment through the Jupyter Notebook web development environment and use the computing and storage resources of the Kubernetes cluster bound to that Jupyter Notebook. The user can write and debug algorithm code in a browser using the Jupyter Notebook web development environment and run it in the Jupyter Notebook service container on the Kubernetes cluster. The containerized Jupyter Notebook development environment implemented on Kubernetes and Docker can also dynamically modify the computing resources required for Jupyter Notebook operation.
Applied to knowledge sharing services, the invention forms a knowledge sharing service system based on a containerized data exploration isolation area. It meets the requirements of data mining and artificial intelligence modeling while preserving data security, and builds a containerized Notebook artificial intelligence modeling environment in the exploration isolation area based on Kubernetes container orchestration technology and Docker container virtualization technology. To meet the requirements of artificial intelligence algorithm development and knowledge sharing services, the invention uses the container orchestration engine to build functions such as containerized algorithm development, model training, model deployment, and knowledge sharing services within the exploration isolation region, thereby overcoming the prior-art defect that the exploration isolation region cannot support data mining and sharing.
Drawings
In order to more clearly illustrate the invention and the technical solutions of the prior art, the drawings that need to be used will be briefly described below, it being evident that the drawings in the following description are only some embodiments of the invention, and that other drawings can be obtained from these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a prior art data exploration isolation region architecture;
FIG. 2 is a system architecture diagram of the containerized data exploration isolation region of the present invention;
FIG. 3 is a schematic diagram of the main functional modules and functional interactions of the data exploration isolation system of the present invention;
FIG. 4 is a schematic diagram of the Jupyter Notebook development environment of the present invention;
FIG. 5 is a functional schematic of a data integration module according to the present invention;
FIG. 6 is a functional schematic of the knowledge sharing module of the present invention;
FIG. 7 is a flow chart of a method of using the data mining oriented containerized data exploration isolation region of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
The following explains some technical terms related to the present invention:
Docker: an open-source application container engine that allows developers to package their applications and dependencies into a portable image and then publish it to any popular Linux or Windows machine; it can also implement virtualization. The containers are completely sandboxed, without any interface to each other.
Kubernetes: an open-source system for managing containerized applications across multiple hosts in a cloud platform. The goal of Kubernetes is to make deploying containerized applications simple and efficient, and it provides mechanisms for application deployment, scheduling, updating, and maintenance.
Kubernetes CRD (Custom Resource Definition): an extension to the Kubernetes API that represents a customization of a particular Kubernetes installation. In a running cluster, custom resources can be dynamically registered into the cluster.
Jupyter Notebook: an interactive notebook, essentially a Web application, that makes it easy to create and share literate programming documents and supports live code, mathematical equations, visualizations, and Markdown. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and the like.
OAuth2: an open standard that allows users to authorize third-party applications to access information they store with another service provider, without giving the third-party application their user name and password and without sharing all of their data.
Istio: provides a simple way to build a network for deployed services with load balancing, inter-service authentication, monitoring, and similar functions, without any modification of the service code.
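To make the Kubernetes CRD term above more concrete, the following sketch builds the kind of custom resource object that a Jupyter Notebook CRD might accept. It is an illustration under stated assumptions, not the schema used by the invention: the API group `example.org/v1`, the kind `JupyterNotebook`, the `spec` layout, and the naming scheme are all hypothetical. Only the top-level `apiVersion`/`kind`/`metadata`/`spec` structure and the `cpu`, `memory`, and `nvidia.com/gpu` resource names follow standard Kubernetes conventions.

```python
import json


def notebook_resource(user: str, image: str, cpu: str, memory: str, gpus: int = 0) -> dict:
    """Build a hypothetical JupyterNotebook custom resource as a plain dict."""
    limits = {"cpu": cpu, "memory": memory}
    if gpus:
        # Extended resource name commonly used for NVIDIA GPUs in Kubernetes.
        limits["nvidia.com/gpu"] = str(gpus)
    return {
        "apiVersion": "example.org/v1",  # hypothetical API group and version
        "kind": "JupyterNotebook",       # hypothetical custom kind
        "metadata": {
            "name": f"notebook-{user}",   # hypothetical naming scheme
            "namespace": f"user-{user}",  # one namespace per user (assumed naming)
        },
        "spec": {                         # hypothetical spec layout
            "image": image,
            "resources": {"limits": limits},
        },
    }


def to_json(resource: dict) -> str:
    """Serialize the resource; Kubernetes accepts JSON as well as YAML manifests."""
    return json.dumps(resource, indent=2, sort_keys=True)
```

A call such as `notebook_resource("alice", "registry.local/datamining:latest", "2", "4Gi", gpus=1)` (the image name is a placeholder) produces a manifest that a controller watching this hypothetical kind could act on.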
Referring to fig. 2, a system architecture diagram of a containerized data exploration isolation area according to an exemplary embodiment of the present invention is shown. Building on exploration isolation area technology, a container orchestration engine is used to construct a containerized Notebook artificial intelligence modeling environment based on Kubernetes container orchestration technology and Docker container virtualization technology.
The containerized data exploration isolation area of the embodiment comprises a data mining system, which can comprise functional modules such as a data integration module, a data warehouse, a mirror warehouse, a data mining development environment, a knowledge sharing service, and a container engine. First, a data exploration isolation region is constructed using exploration isolation region technology and isolated between the internal local area network and the wide area Internet by two firewalls. Next, a container cloud environment is constructed in the data exploration isolation region using Docker container technology and Kubernetes container orchestration technology. Finally, service function modules such as the data mining development environment, data integration, the data warehouse, the mirror image warehouse, and knowledge sharing are built on the container cloud of the data exploration isolation region.
As shown in fig. 2, the network structure of the exploration isolation zone (DMZ) includes a local area network, a firewall 1, a router 1, a firewall 2, and the Internet, which are sequentially connected and perform data interaction, wherein the router 1 is connected with the containerized data exploration isolation zone and performs data interaction, and the containerized data exploration isolation zone includes a data mining system, and the data mining system includes a data integration module, a data warehouse module, a mirror warehouse module, a data mining development environment module (i.e., a development environment in the figure), and a knowledge sharing module.
FIG. 3 is a functional interaction diagram of a containerized data exploration isolation system, wherein the containerized data exploration isolation system includes a data integration module, a data warehouse module, a mirror warehouse module, a Kubernetes-based container development environment (i.e., the development environment), and a knowledge sharing module. The data integration module is connected to, and exchanges data with, both an external data source and the data warehouse module. The data warehouse module is connected to, and exchanges data with, the Kubernetes-based container development environment, the knowledge sharing module, and the data integration module. The knowledge sharing module is connected to, and exchanges data with, an external application. The Kubernetes-based container development environment is connected to, and exchanges data with, the data warehouse module, the mirror image warehouse module, and the Web browser.
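The connections described for FIG. 3 can be written down as an undirected graph, which makes it easy to check how data travels between modules. The sketch below is illustrative only; the node labels are shortened forms of the module names in the text, and the breadth-first search is a generic technique rather than anything specified by the patent.

```python
from collections import deque

# Each pair is one "connected and performs data interaction" link from the text.
LINKS = [
    ("external data source", "data integration module"),
    ("data integration module", "data warehouse module"),
    ("data warehouse module", "container development environment"),
    ("data warehouse module", "knowledge sharing module"),
    ("knowledge sharing module", "external application"),
    ("container development environment", "mirror warehouse module"),
    ("container development environment", "web browser"),
]


def build_graph(links):
    """Turn the link list into an adjacency map (links are bidirectional)."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of modules between two nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


GRAPH = build_graph(LINKS)
```

For example, `shortest_path(GRAPH, "external data source", "external application")` passes through the data integration module, the data warehouse module, and the knowledge sharing module, which matches the described flow in which external data is synchronized in, mined, and shared out.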
The Kubernetes-based container development environment supports custom container images, dynamic request and release of development environment container instances, and flexible configuration of computing resources such as CPU, GPU, and memory. The data integration module synchronizes the data of the external data source into the data exploration isolation area through the firewall, providing data resources for data mining and artificial intelligence algorithm development. The data warehouse uniformly stores the data in the data exploration isolation area and supports versioned data management. The data exploration isolation area provides a private container image warehouse and supports user-defined container images. The mirror image warehouse module is built with Harbor, an enterprise-level Registry server for storing and distributing Docker images that adds features enterprises need, such as security, identity, and management, and thereby offers good performance and security; the module provides common images suitable for data mining scenarios to the exploration isolation area. The knowledge sharing module shares the knowledge generated by data mining with the external application through the firewall, in the form of a RESTful API exposed by the sharing service.
In a specific implementation, the user logs in to the data exploration isolation area through a Web browser. According to the data mining requirement, the user applies to create a project in the data exploration isolation area and configures computing resources and data warehouse storage space for the project. The user then creates a development environment within the project and selects the image resources, computing resources, and storage space required for the data mining work. Data in the external data source is synchronized into the data warehouse of the project through the data integration function. The user completes the data mining work in the development environment and stores the results in the data warehouse, and then shares the results with applications outside the exploration isolation area through the knowledge sharing service. Optionally, after completing the data mining work, the user can deregister the project and release all computing and data resources related to it. In this way, resources are isolated between development environments, development environments can be dynamically requested, configured, and released, and data mining requirements are met.
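As an illustration of sharing data mining results as a RESTful API, the sketch below serves stored results as JSON using only the Python standard library. Everything specific here is an assumption: the `/results/<name>` URL layout, the in-memory `RESULTS` store standing in for the data warehouse, the placeholder entry in it, and the port are hypothetical and are not taken from the patent.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical stand-in for mining results held in the data warehouse.
RESULTS = {"demo": {"summary": "placeholder result"}}


def lookup(path: str, results: dict) -> Tuple[int, bytes]:
    """Map a request path to an HTTP status and a JSON body."""
    prefix = "/results/"  # hypothetical URL layout
    name = path[len(prefix):] if path.startswith(prefix) else None
    if name in results:
        return 200, json.dumps(results[name]).encode("utf-8")
    return 404, json.dumps({"error": "not found"}).encode("utf-8")


class SharingHandler(BaseHTTPRequestHandler):
    """Read-only endpoint; external applications can only GET results."""

    def do_GET(self):
        status, body = lookup(self.path, RESULTS)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the sharing service on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), SharingHandler).serve_forever()
```

Keeping the path-to-response logic in `lookup` separates it from the network layer, so the mapping can be checked without opening a socket. In the described architecture the firewall would sit between this service and the external application.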
In some embodiments, FIG. 4 illustrates a specific implementation of a Kubernets-based container development Environment for a data exploration isolation system. The system provides a containerized Jupyter Notebook development environment based on a Docker container technology and a Kubernetes container orchestration technology, supports custom container mirroring, supports dynamic application and release of container instances of the development environment, and supports flexible configuration of computing resources such as CPU, GPU and memory.
The user can select a container mirror image required by the data mining task in the data exploration isolation area, create a jupyterNotebook container and configure computing resources such as a CPU, a GPU, a memory and the like for the container. After the jupyterNotebook instance is successfully created, a user can use the computing resources bound to the container through the WEB programming environment provided by the jupyterNotebook, and different jupyterNotobook instances are isolated through the container. The Kubernetes-based containerized jupyternotbook is implemented in a Kubernetes CRD (custom resource definition) manner.
As shown in FIG. 4, the containerized Jupyter Notebook development environment comprises an OAuth2 access authorization agent, an ISTIO gateway, a system namespace and a user namespace, wherein the OAuth2 access authorization agent is connected with the ISTIO gateway, and the ISTIO gateway is respectively connected with the system namespace and the user namespace. More specifically, the system namespaces include a juyter Notebook controller, a resource configuration controller, and a user configuration, and the user namespaces include a runtime of the juyter Notebook.
In the implementation of a containerized JupyterNotebook development environment based on a Kuberntes container orchestration engine and a Docker container virtualization technology, an OAuth2 access authorization agent authenticates access behaviors of a user based on an authentication service of the OAuth2 specification; the ISTIO gateway is used for limiting HTTP access of a specific user to the JupyterNotebook instance; the JupiterNotebook controller is realized in a JupyterNotebook CRD mode and is used for managing JupiterNotebook instances, and the JupiterNotebook controller runs in a system naming space for exploring the isolation region service and is responsible for managing Pod, ISTIO routing and other related resources for running the JupiterNotebook container instances; the resource allocation controller is used for creating and managing the name space of each user; the user configuration is responsible for recording configuration information of a user Jupyter Notebook instance; the system namespace and the user namespace both actually form a total namespace, the namespaces refer to the namespaces in Kubernetes, kubernetes Namespace are abstract sets of a set of resources and objects, for example, the objects in the system can be divided into different project groups or user groups, the data exploration isolation area creates two types of namespaces on Kubernetes, namely, the system namespace and the user namespace, the system namespace is used as a running space of a system service, and the user namespace is an independent and isolated running space provided for a user jupyterNotook.
The creation and start-up flow of the containerized Jupyter Notebook development environment is described next. First, the user logs in to the data exploration isolation area and initiates a request to use the Jupyter Notebook service, together with the configuration information for that service; second, the request undergoes access authentication through the OAuth2 access authorization agent, and once authentication passes, the system uses the ISTIO gateway to manage the user's HTTP/HTTPS access to the Jupyter Notebook container instance; then, the system looks up the configuration information of the user's Jupyter Notebook instance recorded in the user configuration; next, through the Kubernetes resource configuration controller, the system creates a namespace for the Jupyter Notebook service the user applied for, based on the instance configuration information; finally, once the namespace has been created, the system creates and starts the Jupyter Notebook service instance in the user's namespace by means of a Kubernetes CRD, according to the user's instance configuration information.
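The order of operations in this flow can be sketched with an in-memory stand-in for the services involved. This is an illustrative simulation, not the patented implementation: the class `IsolationArea`, its fields, and the token-to-user mapping are all hypothetical, and no real OAuth2, ISTIO or Kubernetes calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class IsolationArea:
    """In-memory stand-in for the components named in the flow (hypothetical)."""

    valid_tokens: dict[str, str]                              # token -> user (OAuth2 stand-in)
    user_configs: dict[str, dict] = field(default_factory=dict)  # "user configuration"
    routes: dict[str, str] = field(default_factory=dict)         # gateway: user -> namespace
    namespaces: set[str] = field(default_factory=set)
    instances: dict[str, dict] = field(default_factory=dict)

    def authenticate(self, token: str) -> str:
        """Step 2: access authentication; reject unknown tokens."""
        try:
            return self.valid_tokens[token]
        except KeyError:
            raise PermissionError("access authentication failed") from None

    def request_notebook(self, token: str, config: dict) -> dict:
        """Run the creation flow for one request and return the new instance."""
        user = self.authenticate(token)            # OAuth2 access authorization agent
        namespace = f"user-{user}"
        self.user_configs[user] = dict(config)     # record the submitted configuration
        self.routes[user] = namespace              # gateway limits access to this user
        stored = self.user_configs[user]           # step 3: look up the configuration
        self.namespaces.add(namespace)             # step 4: create the user namespace
        instance = {**stored, "owner": user,       # step 5: create and start instance
                    "namespace": namespace, "status": "Running"}
        self.instances[namespace] = instance
        return instance
```

A call such as `IsolationArea({"token-1": "alice"}).request_notebook("token-1", {"image": "notebook:1.0"})` returns an instance record bound to the `user-alice` namespace; an unknown token raises `PermissionError` before any namespace is created.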
By providing a containerized Jupyter Notebook development environment based on Docker container technology and Kubernetes container orchestration technology, the system of the invention supports custom container images, dynamic allocation and release of development-environment container instances, and flexible configuration of computing resources such as CPU, GPU and memory.
In some embodiments, FIG. 5 illustrates a specific implementation of the data integration module and the data warehouse of the present invention. The data integration module is responsible for synchronizing data from an external data source into the data exploration isolation area through the firewall, providing data resources for data mining and artificial intelligence algorithm development; the data warehouse uniformly stores the data in the data exploration isolation area and supports versioned data management. As shown in FIG. 5, the data integration module exchanges data with the external data source through the firewall and exchanges data with the data warehouse module, so that the data of the external data source is synchronized through the firewall into the data warehouse of the data exploration isolation area.
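One way to picture versioned data management is a store in which each synchronization appends an immutable snapshot. The sketch below assumes that model; the class `VersionedWarehouse` and its methods are hypothetical, and the specification does not say how versions are actually kept.

```python
from typing import Optional


class VersionedWarehouse:
    """Toy warehouse that keeps every synchronized snapshot of a dataset."""

    def __init__(self) -> None:
        # dataset name -> list of snapshots, oldest first
        self._history: dict[str, list[list[dict]]] = {}

    def sync(self, dataset: str, rows: list[dict]) -> int:
        """Store a copy of `rows` as a new version and return its 1-based number."""
        snapshot = [dict(row) for row in rows]  # copy so later edits do not leak in
        self._history.setdefault(dataset, []).append(snapshot)
        return len(self._history[dataset])

    def read(self, dataset: str, version: Optional[int] = None) -> list[dict]:
        """Return the rows of one version (the latest when `version` is None)."""
        history = self._history[dataset]
        if version is None:
            version = len(history)
        if not 1 <= version <= len(history):
            raise KeyError(f"{dataset} has no version {version}")
        return [dict(row) for row in history[version - 1]]
```

In this model the data integration module would call `sync` each time it pulls from the external source, and a notebook would call `read` with an explicit version to reproduce an earlier analysis.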
In some embodiments, the data exploration isolation area provides a private container image repository, supporting user-defined container images. Specifically, the image repository is built with Harbor and provides the exploration isolation area with common images suited to data mining scenarios. Harbor is an enterprise-grade registry server for storing and distributing Docker images; it offers good performance and security by adding features that enterprises require, such as security, identity and management.
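Images in a Harbor registry are addressed as `<registry host>/<project>/<repository>:<tag>`. The sketch below builds such references and resolves a name against a small catalog of common images, falling back to a user-defined image; the host `harbor.example.com`, the project `datamining`, and the catalog entries are hypothetical placeholders.

```python
from typing import Optional

REGISTRY = "harbor.example.com"  # hypothetical private registry host


def image_reference(project: str, repository: str, tag: str = "latest",
                    registry: str = REGISTRY) -> str:
    """Compose a registry image reference of the form host/project/repo:tag."""
    for part in (registry, project, repository, tag):
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid image reference component: {part!r}")
    return f"{registry}/{project}/{repository}:{tag}"


# Hypothetical catalog of common images for data mining scenarios.
COMMON_IMAGES = {
    "python-ds": image_reference("datamining", "python-ds", "1.0"),
    "spark": image_reference("datamining", "spark-notebook", "1.0"),
}


def resolve_image(name: str, custom: Optional[dict] = None) -> str:
    """Prefer a user-defined image, then fall back to the common catalog."""
    if custom and name in custom:
        return custom[name]
    return COMMON_IMAGES[name]
```

Here `resolve_image("python-ds")` returns the catalog entry, while passing `custom={"python-ds": ...}` lets a user override it with an image of their own.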
In some embodiments, FIG. 6 illustrates a specific implementation of the knowledge sharing module of the present invention. As shown in FIG. 6, the knowledge sharing module exchanges data with external applications through the firewall and exchanges data with the data warehouse module. In operation, the knowledge sharing module uses a sharing service to share knowledge generated by data mining with external applications through the firewall, in the form of a RESTful API. An API is a calling interface that the operating system reserves for application programs; an application causes the operating system to execute its commands by calling the operating system's API. REST is a software architectural style used mainly for applications in which a client and a server interact, and a RESTful API is an API in that style.
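A read-only sharing service of this kind can be sketched with the Python standard library. The resource paths (`/results`, `/results/<name>`), the `RESULTS` store and its placeholder entry are assumptions made for illustration; the specification does not define the API surface, and the firewall is not modeled here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder store standing in for mining results kept in the data warehouse.
RESULTS = {"demo-result": {"summary": "placeholder value"}}


def route(method: str, path: str) -> tuple[int, dict]:
    """Map a request to (HTTP status, JSON body) for the read-only API."""
    if method != "GET":
        return 405, {"error": "method not allowed"}
    parts = [p for p in path.split("?", 1)[0].split("/") if p]
    if parts == ["results"]:
        return 200, {"results": sorted(RESULTS)}     # list available results
    if len(parts) == 2 and parts[0] == "results" and parts[1] in RESULTS:
        return 200, RESULTS[parts[1]]                # fetch one result
    return 404, {"error": "not found"}


class SharingHandler(BaseHTTPRequestHandler):
    """HTTP adapter that serializes the output of route() as JSON."""

    def do_GET(self) -> None:
        status, body = route("GET", self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the sharing service on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), SharingHandler).serve_forever()
```

Keeping the routing in a pure function separates the resource model from the transport, so the same logic could sit behind whatever gateway or firewall rule exposes it to external applications.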
In some embodiments, as shown in FIG. 7, the present invention further provides a method for using the data mining-oriented containerized data exploration isolation area, so that a user can dynamically and flexibly apply for, configure and release computing resources, thereby achieving resource isolation, dynamic configuration and release between development environments.
The method for using the data mining-oriented containerized data exploration isolation area comprises the following steps:
step 1, a user logs in to the data exploration isolation area through a web browser;
step 2, the user applies for creating projects in the data exploration isolation area according to the data mining requirement, and performs project configuration;
step 3, the user creates a JupyterNotebook development environment in the created data exploration isolation area project, and configures the JupyterNotebook development environment;
step 4, synchronizing the data in the external data source into a data warehouse module of the data exploration isolation area project by using a data integration function;
step 5, the user logs in the JupyterNotebook development environment, accesses the data in the data warehouse module, completes the data mining work, and stores the data mining result in the data warehouse module;
step 6, the user shares the data mining result to the external application of the data exploration isolation area through the knowledge sharing module;
step 7, after the data mining work is completed, the user deregisters the project created in the data exploration isolation area and releases all computing resources and data resources related to the project.
Specifically, the project configuration in step 2 includes configuration of computing resources and configuration of the storage space of the data warehouse module; the computing resources may include CPU, memory, GPU and other computing resources; configuring the storage space of the data warehouse module means allocating storage space for the data warehouse module in the data exploration isolation area.
Specifically, the development environment configuration in step 3 includes an image configuration, a computing resource configuration, and a storage space configuration; the image configuration refers to selecting the image required to run the development environment, the computing resources include the CPU, memory, GPU and other computing resources required to run the development environment, and the storage space refers to the data storage space required during data mining.
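The two configuration levels described above, and the release in step 7, can be sketched as data structures. The sketch assumes that the computing resources of a project act as a quota that its development environments must fit within; the specification does not state this rule, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resources:
    """Computing resources: CPU cores, memory in GiB, and whole GPUs."""

    cpu_cores: float
    memory_gib: float
    gpus: int = 0

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_cores + other.cpu_cores,
                         self.memory_gib + other.memory_gib,
                         self.gpus + other.gpus)

    def fits_within(self, quota: "Resources") -> bool:
        return (self.cpu_cores <= quota.cpu_cores
                and self.memory_gib <= quota.memory_gib
                and self.gpus <= quota.gpus)


@dataclass
class EnvironmentConfig:
    """Step 3 configuration: image, computing resources, storage space."""

    image: str
    resources: Resources
    storage_gib: float


@dataclass
class Project:
    """Step 2 configuration: computing resources and warehouse storage."""

    name: str
    quota: Resources
    warehouse_storage_gib: float
    environments: dict[str, EnvironmentConfig] = field(default_factory=dict)

    def used(self) -> Resources:
        """Sum the resources of all environments currently in the project."""
        total = Resources(0, 0, 0)
        for env in self.environments.values():
            total = total + env.resources
        return total

    def add_environment(self, name: str, config: EnvironmentConfig) -> None:
        """Admit an environment only if the project quota still covers it."""
        if not (self.used() + config.resources).fits_within(self.quota):
            raise ValueError(f"environment {name!r} exceeds the project quota")
        self.environments[name] = config

    def release(self) -> None:
        """Step 7: drop every environment, freeing the resources they held."""
        self.environments.clear()
```

With a project quota of 4 cores, 16 GiB and 1 GPU, a first environment requesting 2 cores, 8 GiB and 1 GPU is admitted, a second one that also asks for a GPU is rejected, and `release()` returns usage to zero.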
With the method for using the data mining-oriented containerized data exploration isolation area, a user can log in to the data exploration isolation area through a browser and configure the runtime environment requirements and computing resource requirements needed to run code in the Jupyter Notebook development environment. The system automatically generates and launches a container image containing the Jupyter Notebook service and the user-configured runtime environment, and, through Kubernetes container orchestration, allocates computing resources to the containerized Jupyter Notebook development environment according to the computing resource requirements configured by the user. After the Jupyter Notebook service starts, the user can reach the Jupyter Notebook container runtime environment through the Jupyter Notebook web development environment and use the computing and storage resources of the Kubernetes cluster bound to that Jupyter Notebook.
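In Kubernetes, per-container computing resources are declared under `resources.requests` and `resources.limits` in the Pod specification. The sketch below builds such a Pod manifest as a plain dictionary. It assumes GPUs are exposed under the extended resource name `nvidia.com/gpu` (as the NVIDIA device plugin does) and that the notebook listens on Jupyter's default port 8888; the function name and the container name are hypothetical.

```python
def notebook_pod(name: str, namespace: str, image: str,
                 cpu: str, memory: str, gpus: int = 0) -> dict:
    """Build a core/v1 Pod manifest for one notebook container.

    `cpu` and `memory` use Kubernetes quantity strings such as "2" or "8Gi".
    """
    limits = {"cpu": cpu, "memory": memory}
    if gpus > 0:
        # Assumption: GPUs are advertised by the NVIDIA device plugin.
        limits["nvidia.com/gpu"] = gpus
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [{
                "name": "notebook",                      # hypothetical container name
                "image": image,
                "ports": [{"containerPort": 8888}],      # Jupyter's default port
                # Requests equal limits, so the scheduler reserves exactly
                # what the user configured.
                "resources": {"requests": dict(limits), "limits": dict(limits)},
            }],
        },
    }
```

Setting requests equal to limits is one possible design choice; it gives each notebook a fixed reservation, which matches the idea of resource isolation between development environments.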
The above is a detailed description of the technical solution proposed by the present invention. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The foregoing examples merely illustrate several embodiments of the invention; they are presented to explain its principles and implementation and are intended only to aid understanding of the method of the invention and its core ideas. Modifications made by those of ordinary skill in the art in light of these teachings also fall within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.
Claims (7)
1. A containerized data exploration isolation region oriented to data mining is characterized in that,
the system comprises a data integration module, a data warehouse module, an image repository module, a Kubernetes-based container development environment and a knowledge sharing module; the data integration module is connected with an external data source and performs data interaction, and the data integration module is connected with the data warehouse module and performs data interaction; the data warehouse module is connected with the Kubernetes-based container development environment and performs data interaction, is connected with the knowledge sharing module and performs data interaction, and is also connected with the data integration module and performs data interaction; the knowledge sharing module is connected with the external application and performs data interaction; the Kubernetes-based container development environment is connected with the data warehouse module and performs data interaction, and is connected with the image repository module and performs data interaction, and meanwhile, the Kubernetes-based container development environment is connected with the Web browser and performs data interaction;
creating a containerized JupyterNotebook development environment in the data exploration isolation area, wherein the containerized JupyterNotebook development environment is created based on a Docker container virtualization technology and a Kubernetes container orchestration technology;
the containerized JupyterNotebook development environment comprises an OAuth2 access authorization agent, an ISTIO gateway, a system namespace and a user namespace, wherein the OAuth2 access authorization agent is connected with the ISTIO gateway, and the ISTIO gateway is respectively connected with the system namespace and the user namespace;
the OAuth2 access authorization agent authenticates the access behavior of the user based on the authentication service of the OAuth2 specification; the ISTIO gateway is used for limiting HTTP access of a specific user to the JupyterNotebook instance; the JupyterNotebook controller is used for managing JupyterNotebook instances; the resource configuration controller is used for creating and managing the namespace of each user; the user configuration is responsible for recording configuration information of a user JupyterNotebook instance; the system namespace serves as a runtime for system services, and the user namespace provides a runtime for a user JupyterNotebook.
2. The data mining oriented containerized data exploration isolation region of claim 1, wherein said system namespace comprises a JupyterNotebook controller, a resource configuration controller, and a user configuration, said user namespace comprising a JupyterNotebook runtime.
3. The data mining oriented containerized data exploration isolation region of claim 1, wherein said data integration module is in data interaction with said external data source through a firewall and said data integration module is in data interaction with said data warehouse module, data of said external data source being synchronized by said data integration module through said firewall into said data warehouse module of said data exploration isolation region.
4. The data mining oriented containerized data exploration isolation region of claim 1, wherein said image repository module provides said exploration isolation region with images applicable to a data mining scenario.
5. The data mining oriented containerized data exploration isolation region of claim 1, wherein the knowledge sharing module performs data interaction with the external application through a firewall, and the knowledge sharing module performs data interaction with the data warehouse module, and shares knowledge generated by data mining to the external application through a sharing service through the firewall.
6. A method for using the data mining oriented containerized data exploration isolation region according to claim 1, wherein the method comprises the following steps:
step 1, a user logs in a data exploration isolation area through a WEB browser;
step 2, the user applies for creating projects in the data exploration isolation area according to the data mining requirement, and performs project configuration;
step 3, the user creates a JupyterNotebook development environment in the created data exploration isolation area project, and configures the JupyterNotebook development environment;
step 4, synchronizing the data in the external data source into a data warehouse module of the data exploration isolation area project by using a data integration function;
step 5, the user logs in the JupyterNotebook development environment, accesses the data in the data warehouse module, completes the data mining work, and stores the data mining result in the data warehouse module;
step 6, the user shares the data mining result to the external application of the data exploration isolation area through the knowledge sharing module;
step 7, after the data mining work is completed, the user deregisters the project created in the data exploration isolation area, and releases all computing resources and data resources related to the project.
7. The method for using a data mining oriented containerized data exploration isolation region according to claim 6, wherein the project configuration in the step 2 includes configuration of computing resources and configuration of storage space of a data warehouse module; the development environment configuration in the step 3 includes an image configuration, a computing resource configuration, and a storage space configuration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010189150.7A CN111400374B (en) | 2020-03-18 | 2020-03-18 | Data mining-oriented containerized data exploration isolation region and use method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111400374A CN111400374A (en) | 2020-07-10 |
CN111400374B true CN111400374B (en) | 2023-05-23 |
Family
ID=71432555
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010189150.7A Active CN111400374B (en) | 2020-03-18 | 2020-03-18 | Data mining-oriented containerized data exploration isolation region and use method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111400374B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112732495A (en) * | 2021-01-19 | 2021-04-30 | 北京澎思科技有限公司 | Information interaction method and device and storage medium |
CN112988165A (en) * | 2021-04-15 | 2021-06-18 | 成都新希望金融信息有限公司 | Kubernetes-based interactive modeling method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105119750A (en) * | 2015-09-08 | 2015-12-02 | 南京联成科技发展有限公司 | Distributed information security operation and maintenance management platform based on massive data |
CN109635018A (en) * | 2018-10-19 | 2019-04-16 | 陕西艾特信息化工程咨询有限责任公司 | A kind of method for interchanging data based on container |
CN110197084A (en) * | 2019-06-12 | 2019-09-03 | 上海联息生物科技有限公司 | Medical data combination learning system and method based on trust computing and secret protection |
CN110430173A (en) * | 2019-07-19 | 2019-11-08 | 河南工程学院 | A kind of cloud platform based on Vue+SpringCloud |
CN110781226A (en) * | 2019-10-28 | 2020-02-11 | 中国建设银行股份有限公司 | Data analysis method, device, storage medium, equipment and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106888254B (en) * | 2017-01-20 | 2020-08-18 | 华南理工大学 | Kubernetes-based container cloud architecture and interaction method among modules thereof |
Also Published As
Publication number | Publication date |
---|---|
CN111400374A (en) | 2020-07-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||