CN111400374B - Data mining-oriented containerized data exploration isolation region and use method thereof - Google Patents


Publication number
CN111400374B
Authority
CN
China
Prior art keywords
data
module
exploration
user
isolation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010189150.7A
Other languages
Chinese (zh)
Other versions
CN111400374A (en)
Inventor
王臻
赵龙军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Xiongan Group Digital Urban Technology Co ltd
Original Assignee
China Xiongan Group Digital Urban Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Xiongan Group Digital Urban Technology Co ltd
Priority to CN202010189150.7A
Publication of CN111400374A
Application granted
Publication of CN111400374B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/24569 Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0209 Architectural arrangements, e.g. perimeter networks or demilitarized zones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a data mining-oriented containerized data exploration isolation area comprising a data integration module, a data warehouse module, an image repository module, a Kubernetes-based container development environment, and a knowledge sharing module. The data integration module is connected to, and exchanges data with, an external data source and the data warehouse module. The data warehouse module is connected to, and exchanges data with, the Kubernetes-based container development environment, the knowledge sharing module, and the data integration module. The knowledge sharing module is connected to, and exchanges data with, an external application. The Kubernetes-based container development environment is connected to, and exchanges data with, the data warehouse module, the image repository module, and a Web browser. A method of using this containerized data exploration isolation area is also provided. The invention addresses the inability of existing exploration isolation areas to support data mining and data sharing.

Description

Data mining-oriented containerized data exploration isolation region and use method thereof
Technical Field
The invention relates to the field of data exploration isolation areas, in particular to a containerized data exploration isolation area for data mining and a use method thereof.
Background
An exploration isolation zone (demilitarized zone, DMZ) addresses the problem that, once a firewall is installed, the external network can no longer reach servers on the internal network. It is a network area between an untrusted system and a trusted system, or between an enterprise's internal and external networks, and serves as a buffer zone for the server facilities that the internal network must expose externally. Exploration isolation zone technology targets network security: it uses firewalls to carve out a secured network area that protects the data and services placed in it.
In the prior art, the exploration isolation zone (DMZ) is mainly used to expose services such as HTTP, FTP, SSH, and SMTP to the outside in a secure manner. Compared with a plain firewall scheme, this deployment places one more barrier in front of an attacker, so the internal network is protected more effectively, with the zone acting as a buffer for the server facilities that must be exposed externally.
The network structure of a prior-art exploration isolation zone (DMZ) is shown in FIG. 1. It comprises a local area network, firewall 1, router 1, firewall 2, and the Internet, connected in sequence and exchanging data, together with an exploration isolation zone that is connected to router 1 and contains a Web server, a service server, and a database. However, as network access requirements and user-facing functions grow, data mining often requires a code development environment in which to develop and debug data mining algorithms, and that environment must be configured with the runtime dependencies the code needs so that the written code runs normally and stably. The existing exploration isolation zone provides only a network security isolation area. It offers no algorithm development or data analysis tools suited to data mining and therefore cannot meet data mining requirements.
To solve this problem, the invention builds on the network security environment provided by the traditional exploration isolation zone, uses Jupyter Notebook as the data mining and algorithm development tool, and implements a containerized Jupyter Notebook development environment based on Docker container virtualization and Kubernetes container orchestration.
Disclosure of Invention
In view of this, the present invention proposes a data mining-oriented containerized data exploration isolation area, which aims to solve the problem that prior-art data exploration isolation areas are unsuited to data mining and cannot meet data mining and data sharing requirements.
According to a first aspect of the present invention, a data mining-oriented containerized data exploration isolation area is provided. It comprises a data integration module, a data warehouse module, an image repository module, a Kubernetes-based container development environment, and a knowledge sharing module. The data integration module is connected to, and exchanges data with, an external data source on the one hand and the data warehouse module on the other. The data warehouse module is connected to, and exchanges data with, the Kubernetes-based container development environment, the knowledge sharing module, and the data integration module. The knowledge sharing module is connected to, and exchanges data with, an external application. The Kubernetes-based container development environment is connected to, and exchanges data with, the data warehouse module, the image repository module, and a Web browser.
In one embodiment, the data exploration isolation area provides a containerized Jupyter Notebook development environment.
In one embodiment, the containerized Jupyter Notebook development environment includes an OAuth2 access authorization agent, an Istio gateway, a system namespace, and a user namespace, wherein the OAuth2 access authorization agent is connected to the Istio gateway, and the Istio gateway is connected to the system namespace and the user namespace, respectively.
In one embodiment, the system namespace includes a Jupyter Notebook controller, a resource configuration controller, and a user configuration, and the user namespace includes the runtime space of the Jupyter Notebook.
In one embodiment, the OAuth2 access authorization agent authenticates the user's access based on an OAuth2-compliant authentication service; the Istio gateway restricts HTTP access so that a specific user can reach only that user's Jupyter Notebook instance; the Jupyter Notebook controller manages Jupyter Notebook instances; the resource configuration controller creates and manages the namespace of each user; the user configuration records the configuration information of the user's Jupyter Notebook instances; the system namespace serves as the runtime space for system services, and the user namespace provides the runtime space for the user's Jupyter Notebook.
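The division of labor between the OAuth2 access authorization agent and the gateway can be illustrated with a minimal Python sketch. The URL layout `/notebook/<namespace>/<instance>/`, the `user-` namespace prefix, the token table, and all function names are assumptions made for illustration only; the source does not specify them, and a real deployment would delegate token validation to an OAuth2-compliant authentication service.

```python
from typing import Optional

# Hypothetical token table standing in for an OAuth2-compliant authentication service.
VALID_TOKENS = {"token-alice": "alice", "token-bob": "bob"}


def authenticate(bearer_token: str) -> Optional[str]:
    """Role of the access authorization agent: map a token to a user, or None."""
    return VALID_TOKENS.get(bearer_token)


def user_namespace(user: str) -> str:
    """Assumed naming rule: one namespace per user."""
    return f"user-{user}"


def gateway_allows(user: str, path: str) -> bool:
    """Role of the gateway: permit HTTP access only to the user's own instances."""
    parts = [p for p in path.split("/") if p]
    # Assumed layout: notebook/<namespace>/<instance>/...
    return (len(parts) >= 3
            and parts[0] == "notebook"
            and parts[1] == user_namespace(user))


def handle(bearer_token: str, path: str) -> int:
    """Return an HTTP-style status code for the request."""
    user = authenticate(bearer_token)
    if user is None:
        return 401  # not authenticated
    if not gateway_allows(user, path):
        return 403  # authenticated, but not this user's notebook
    return 200
```

Under these assumptions, a request carrying Alice's token for a path under `user-alice` is accepted, while the same token used against a path under `user-bob` is rejected.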
In one embodiment, the data integration module exchanges data with the external data source through a firewall and also exchanges data with the data warehouse module, so that data from the external data source is synchronized through the firewall into the data warehouse module of the data exploration isolation area.
In one embodiment, the image repository module provides the exploration isolation area with container images suited to data mining scenarios.
In one embodiment, the knowledge sharing module exchanges data with the external application through the firewall and also exchanges data with the data warehouse module, so that knowledge generated by data mining is shared with the external application through the firewall by means of a sharing service.
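As a rough illustration of the synchronization performed by the data integration module, the following Python sketch copies a table from a source database into a warehouse database. Two in-memory SQLite databases stand in for the external data source and the data warehouse module; the table layout (`id`, `payload`) and function names are hypothetical, and the firewall itself is not modeled.

```python
import sqlite3


def sync_table(source: sqlite3.Connection, warehouse: sqlite3.Connection, table: str) -> int:
    """Copy all rows of `table` from the source into the warehouse; return the row count.

    `table` is assumed to be a trusted identifier, not user input.
    """
    rows = source.execute(f"SELECT id, payload FROM {table}").fetchall()
    with warehouse:  # commit on success, roll back on error
        warehouse.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, payload TEXT)"
        )
        # INSERT OR REPLACE keeps repeated synchronizations idempotent.
        warehouse.executemany(
            f"INSERT OR REPLACE INTO {table} (id, payload) VALUES (?, ?)", rows
        )
    return len(rows)


def demo() -> int:
    """Synchronize twice and return the number of rows held by the warehouse."""
    source = sqlite3.connect(":memory:")      # stands in for the external data source
    warehouse = sqlite3.connect(":memory:")   # stands in for the data warehouse module
    source.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, payload TEXT)")
    source.executemany("INSERT INTO readings VALUES (?, ?)", [(1, "a"), (2, "b")])
    sync_table(source, warehouse, "readings")
    sync_table(source, warehouse, "readings")  # a second run must not duplicate rows
    return warehouse.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Keying the copy on a primary key is one design choice that makes re-running the synchronization safe; the source does not say how the invention handles repeated synchronization.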
In another embodiment of the present invention, a method for using a data mining oriented containerized data exploration isolation region is provided, including the following steps:
step 1, a user logs in to the data exploration isolation area through a Web browser;
step 2, the user applies to create a project in the data exploration isolation area according to the data mining requirement, and performs project configuration;
step 3, the user creates a Jupyter Notebook development environment in the created data exploration isolation area project, and configures the Jupyter Notebook development environment;
step 4, the data in the external data source is synchronized into the data warehouse module of the data exploration isolation area project by using the data integration function;
step 5, the user logs in to the Jupyter Notebook development environment, accesses the data in the data warehouse module, completes the data mining work, and stores the data mining result in the data warehouse module;
step 6, the user shares the data mining result with an external application of the data exploration isolation area through the knowledge sharing module;
step 7, after the data mining work is completed, the user deregisters the project created in the data exploration isolation area and releases all computing resources and data resources related to the project.
In one embodiment, the project configuration in step 2 includes configuration of computing resources and configuration of the storage space of the data warehouse module; the development environment configuration in step 3 includes a container image configuration, a computing resource configuration, and a storage space configuration.
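The project lifecycle in steps 2, 3, and 7 can be sketched as a small Python model. The class and field names are hypothetical, and the rule that an environment's resources must fit within the project's remaining quota is an assumption added for illustration; the source only states that both the project and the environment are configured with resources.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Resources:
    cpu_cores: int
    memory_gib: int
    storage_gib: int

    def fits_within(self, other: "Resources") -> bool:
        return (self.cpu_cores <= other.cpu_cores
                and self.memory_gib <= other.memory_gib
                and self.storage_gib <= other.storage_gib)


@dataclass
class Project:
    name: str
    quota: Resources                      # step 2: project configuration
    environments: Dict[str, dict] = field(default_factory=dict)
    released: bool = False

    def remaining(self) -> Resources:
        """Quota left after subtracting every existing environment."""
        used = [env["resources"] for env in self.environments.values()]
        return Resources(
            self.quota.cpu_cores - sum(r.cpu_cores for r in used),
            self.quota.memory_gib - sum(r.memory_gib for r in used),
            self.quota.storage_gib - sum(r.storage_gib for r in used),
        )

    def create_environment(self, env_name: str, image: str, resources: Resources) -> None:
        """Step 3: create an environment with image and resource configuration."""
        if self.released:
            raise RuntimeError("project has been released")
        if not resources.fits_within(self.remaining()):
            raise ValueError("requested resources exceed the project quota")
        self.environments[env_name] = {"image": image, "resources": resources}

    def release(self) -> None:
        """Step 7: deregister the project and free its resources."""
        self.environments.clear()
        self.released = True
```

For example, a project with a quota of 8 cores can host one 4-core environment and still has 4 cores remaining; releasing the project removes every environment at once.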
The data mining-oriented containerized data exploration isolation region and its method of use provided by the invention build on the network security environment of the traditional exploration isolation region, use Jupyter Notebook as the data mining and algorithm development tool, and implement a containerized Jupyter Notebook development environment based on Docker container virtualization and Kubernetes container orchestration. A user can log in to the data exploration isolation region through a browser and configure the runtime environment and computing resources that code in the Jupyter Notebook development environment requires. The system then automatically generates and starts a container image containing the Jupyter Notebook service and the user-configured runtime environment, and uses Kubernetes container orchestration to allocate computing resources to the containerized Jupyter Notebook development environment according to the user's configuration. After the Jupyter Notebook service starts, the user can reach the Jupyter Notebook container runtime environment through the Jupyter Notebook Web development environment and use the computing and storage resources of the Kubernetes cluster bound to that Jupyter Notebook. The user can write and debug algorithm code in a browser and run it in the Jupyter Notebook service container on the Kubernetes cluster. The Kubernetes- and Docker-based containerized Jupyter Notebook development environment also allows the computing resources required by a running Jupyter Notebook to be modified dynamically.
Applied to knowledge sharing services, the invention forms a knowledge sharing service system based on a containerized data exploration isolation area. It meets the requirements of data mining and artificial intelligence modeling while preserving data security, and builds a containerized Notebook artificial intelligence modeling environment inside the exploration isolation area based on Kubernetes container orchestration and Docker container virtualization. To serve artificial intelligence algorithm development and knowledge sharing, the invention uses the container orchestration engine to provide containerized algorithm development, model training, model deployment, knowledge sharing services, and similar functions inside the exploration isolation region, thereby overcoming the prior-art limitation that the exploration isolation region cannot support data mining and sharing.
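One way to picture how a user's resource configuration is handed to Kubernetes is to build the Pod manifest that would run the notebook container. The sketch below is illustrative only: the function name, the `user-<name>` namespace, and the `notebook-<name>` Pod name are assumptions, and `nvidia.com/gpu` is the extended resource name exposed by the NVIDIA device plugin rather than anything named in the source. Port 8888 is the Jupyter Notebook default.

```python
import json


def build_notebook_pod(user: str, image: str, cpu: str, memory: str, gpus: int = 0) -> dict:
    """Translate a user's resource configuration into a Kubernetes Pod manifest."""
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    if gpus > 0:
        # Extended resources such as GPUs are declared under limits.
        resources["limits"]["nvidia.com/gpu"] = gpus
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"notebook-{user}",       # hypothetical naming rule
            "namespace": f"user-{user}",      # hypothetical per-user namespace
            "labels": {"app": "jupyter-notebook"},
        },
        "spec": {
            "containers": [{
                "name": "notebook",
                "image": image,
                "ports": [{"containerPort": 8888}],
                "resources": resources,
            }]
        },
    }


if __name__ == "__main__":
    manifest = build_notebook_pod("alice", "registry.example.org/ds/notebook:1.0", "2", "8Gi", gpus=1)
    print(json.dumps(manifest, indent=2))
```

Changing the CPU, memory, or GPU arguments and regenerating the manifest corresponds loosely to the dynamic modification of computing resources described above; how the invention applies such a change to a running instance is not specified in the source.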
Drawings
In order to illustrate the technical solutions of the invention and of the prior art more clearly, the drawings used are briefly described below. The drawings in the following description show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram of a prior art data exploration isolation region architecture;
FIG. 2 is a system architecture diagram of the containerized data exploration isolation region of the present invention;
FIG. 3 is a schematic diagram of the main functional modules and functional interactions of the data exploration isolation system of the present invention;
FIG. 4 is a schematic diagram of a JupyterNotebook development environment of the present invention;
FIG. 5 is a functional schematic of a data integration module according to the present invention;
FIG. 6 is a functional schematic of the knowledge sharing module of the present invention;
FIG. 7 is a flow chart of a method of using the data mining oriented containerized data exploration isolation region of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the invention. Depending on the context, the word "if" as used herein may be interpreted as "when ..." or "upon ..." or "in response to determining".
The following explains some technical terms related to the present invention:
Docker: an open-source application container engine that lets developers package an application and its dependencies into a portable image and then publish it to any popular Linux or Windows machine; it can also be used for virtualization. Containers are fully sandboxed and have no interfaces to one another.
Kubernetes: an open-source system for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make deploying containerized applications simple and efficient, and it provides mechanisms for application deployment, planning, updating, and maintenance.
Kubernetes CRD (Custom Resource Definition): an extension of the Kubernetes API that represents a customization of a particular Kubernetes installation. In a running cluster, custom resources can be registered dynamically.
Jupyter Notebook: an interactive notebook, essentially a Web application, that makes it easy to create and share literate programming documents and supports live code, mathematical equations, visualizations, and Markdown. Typical uses include data cleaning and transformation, numerical simulation, statistical modeling, and machine learning.
OAuth2: an open standard that allows a user to authorize a third-party application to access information the user has stored with another service provider, without giving the third-party application the user name and password and without sharing all of the user's data.
Istio: provides a simple way to build a network of deployed services with load balancing, service-to-service authentication, monitoring, and similar functions, without any modification to the service code.
Referring to fig. 2, a diagram of a system architecture of a containerized data exploration isolation area according to an exemplary embodiment of the present invention is shown, where an engine constructs a containerized notbook artificial intelligence modeling environment based on a Kubernetes container orchestration technology and a Docker container virtualization technology based on an exploration isolation area technology.
The containerized data exploration isolation area of the embodiment comprises a data mining system, and the data mining system can comprise a data integration module, a data warehouse, a mirror warehouse, a data mining development environment, a knowledge sharing service, a container engine and other functional modules. Firstly, constructing a data exploration isolation region by adopting an exploration isolation region technology, isolating the data exploration isolation region between an internal local area network and a wide area network Internet by using two firewalls, then constructing a container cloud environment in the data exploration isolation region by adopting a Docker container technology and a Kubernets container arrangement technology, and finally constructing service function modules such as a data mining development environment, data integration, a data warehouse, a mirror image warehouse, knowledge sharing and the like on the container cloud of the data exploration isolation region.
As shown in fig. 2, the network structure of the exploration isolation zone (DMZ) includes a local area network, a firewall 1, a router 1, a firewall 2, and the Internet, which are sequentially connected and perform data interaction, wherein the router 1 is connected with the containerized data exploration isolation zone and performs data interaction, and the containerized data exploration isolation zone includes a data mining system, and the data mining system includes a data integration module, a data warehouse module, a mirror warehouse module, a data mining development environment module (i.e., a development environment in the figure), and a knowledge sharing module.
FIG. 3 is a functional interaction diagram of a containerized data exploration isolation system, wherein the containerized data exploration isolation system includes a data integration module, a data warehouse module, a mirror warehouse module, a Kubernets-based container development environment (i.e., development environment), and a knowledge sharing module. The data integration module is connected with an external data source and performs data interaction on one hand, and is connected with the data warehouse module and performs data interaction on the other hand; the data warehouse module is connected with the container development environment based on the Kubernets and performs data interaction, and is connected with the knowledge sharing module and performs data interaction, and meanwhile, the data warehouse module is connected with the data integration module and performs data interaction; the knowledge sharing module is connected with an external application and performs data interaction; the container development environment based on the Kubernets is connected with the data warehouse module and performs data interaction on one hand, and is connected with the mirror image warehouse module and performs data interaction on the other hand, and meanwhile, the container development environment based on the Kubernets is connected with the Web browser and performs data interaction on the other hand.
The container development environment based on the Kubernets supports the self-defined container mirror image, the dynamic application and release of the container instance of the development environment, and the flexible configuration of computing resources such as CPU, GPU and memory. The data integration module synchronizes the data of the external data source into the data exploration isolation area through the firewall, and provides data resources for data mining and artificial intelligence algorithm development. The data warehouse uniformly stores the data in the data exploration isolation area and supports versioned data management. The data exploration isolation area provides a privately owned container mirror image warehouse function and supports user-defined container mirror images; the mirror image warehouse module is built by adopting a Harbor, which is an enterprise-level Registry server for storing and distributing Docker mirrors and provides good performance and safety by adding some functional characteristics such as safety, identification, management and the like necessary for enterprises, and provides common mirrors suitable for data mining scenes for exploring isolation areas. The knowledge sharing module shares the knowledge generated by the data mining to the external application through the firewall in the form of a RestFul API through the sharing service.
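The idea of exposing data mining results to external applications through a RESTful API can be sketched with a small read-only WSGI application in Python. The resource paths (`/results`, `/results/<name>`), the in-memory `RESULTS` store, and the sample entry are placeholders invented for illustration; the source does not describe the actual API or the warehouse interface.

```python
import json

# Hypothetical stand-in for mining results stored in the data warehouse module.
RESULTS = {"demo-result": {"version": 1, "rows": 3}}


def app(environ, start_response):
    """WSGI application exposing read-only results at /results and /results/<name>."""
    method = environ.get("REQUEST_METHOD", "GET")
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    if method != "GET":
        status, body = "405 Method Not Allowed", {"error": "read-only API"}
    elif parts == ["results"]:
        status, body = "200 OK", {"results": sorted(RESULTS)}
    elif len(parts) == 2 and parts[0] == "results" and parts[1] in RESULTS:
        status, body = "200 OK", RESULTS[parts[1]]
    else:
        status, body = "404 Not Found", {"error": "unknown resource"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(method, path):
    """Invoke the app in-process, as a WSGI server would, and decode the reply."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return captured["status"], json.loads(body)
```

To try it over HTTP, the application could be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Keeping the API read-only is a design choice of this sketch, reflecting that external applications consume results rather than modify them.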
In the specific implementation process, a user logs in a data exploration isolation area through a Web browser; according to the data mining requirement, applying for creating projects in the data exploration isolation area, and configuring computing resources and data warehouse storage space for the projects; creating a development environment in the created data exploration isolation area project by a user, and selecting mirror image resources, computing resources and storage space required by data mining work for the development environment; synchronizing data in an external data source into a data warehouse of a data exploration isolation area project through a data integration function; the user completes data mining work through a development environment and stores the data mining result in a data warehouse; and the user shares the data mining result to the external application of the exploration isolation area for use through the knowledge sharing service function. An alternative implementation process is that after completing the data mining work, the user can log out the data exploration isolation area project, and release all the computing resources and data resources related to the project. By the method, resource isolation among development environments is realized, dynamic application, configuration and release of the development environments are supported, and data mining requirements are met.
In some embodiments, FIG. 4 illustrates a specific implementation of a Kubernets-based container development Environment for a data exploration isolation system. The system provides a containerized Jupyter Notebook development environment based on a Docker container technology and a Kubernetes container orchestration technology, supports custom container mirroring, supports dynamic application and release of container instances of the development environment, and supports flexible configuration of computing resources such as CPU, GPU and memory.
The user can select a container mirror image required by the data mining task in the data exploration isolation area, create a jupyterNotebook container and configure computing resources such as a CPU, a GPU, a memory and the like for the container. After the jupyterNotebook instance is successfully created, a user can use the computing resources bound to the container through the WEB programming environment provided by the jupyterNotebook, and different jupyterNotobook instances are isolated through the container. The Kubernetes-based containerized jupyternotbook is implemented in a Kubernetes CRD (custom resource definition) manner.
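To make the CRD-based approach concrete, the sketch below builds, as plain Python dictionaries, a CustomResourceDefinition for a notebook resource and one instance of that resource. The API group `dmz.example.org`, the kind `Notebook`, and the `spec` fields are hypothetical, since the source does not disclose the actual schema; the surrounding structure follows the `apiextensions.k8s.io/v1` format.

```python
GROUP = "dmz.example.org"   # hypothetical API group
PLURAL = "notebooks"        # hypothetical plural resource name


def notebook_crd() -> dict:
    """CustomResourceDefinition registering a namespaced Notebook resource."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"{PLURAL}.{GROUP}"},  # must be <plural>.<group>
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {"plural": PLURAL, "singular": "notebook", "kind": "Notebook"},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        "properties": {
                            "image": {"type": "string"},
                            "cpu": {"type": "string"},
                            "memory": {"type": "string"},
                            "gpus": {"type": "integer"},
                        },
                        "required": ["image"],
                    }},
                }},
            }],
        },
    }


def notebook_resource(name: str, namespace: str, image: str,
                      cpu: str, memory: str, gpus: int = 0) -> dict:
    """One Notebook custom resource, i.e. a request for a notebook instance."""
    return {
        "apiVersion": f"{GROUP}/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"image": image, "cpu": cpu, "memory": memory, "gpus": gpus},
    }
```

In this arrangement a controller would watch `Notebook` objects and create the Pods and routes needed to run each one, which is the usual pattern for CRD-backed resources.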
As shown in FIG. 4, the containerized Jupyter Notebook development environment comprises an OAuth2 access authorization agent, an Istio gateway, a system namespace and a user namespace, wherein the OAuth2 access authorization agent is connected with the Istio gateway, and the Istio gateway is connected with the system namespace and the user namespace respectively. More specifically, the system namespace includes a Jupyter Notebook controller, a resource configuration controller, and a user configuration, and the user namespace includes the Jupyter Notebook runtime.
In the implementation of the containerized Jupyter Notebook development environment based on the Kubernetes container orchestration engine and Docker container virtualization technology, the components work as follows. The OAuth2 access authorization agent authenticates the user's access based on an authentication service conforming to the OAuth2 specification. The Istio gateway restricts the HTTP access of a specific user to the Jupyter Notebook instance. The Jupyter Notebook controller is implemented as a Jupyter Notebook CRD and manages Jupyter Notebook instances; it runs in the system namespace of the exploration isolation area service and is responsible for managing the Pods, Istio routes and other related resources needed to run Jupyter Notebook container instances. The resource configuration controller creates and manages the namespace of each user. The user configuration records the configuration information of the user's Jupyter Notebook instance. The system namespace and the user namespace are both Kubernetes namespaces. A Kubernetes namespace is an abstract grouping of a set of resources and objects; for example, the objects in a system can be divided into different project groups or user groups. The data exploration isolation area creates two types of namespaces on Kubernetes: the system namespace, which serves as the running space of the system services, and the user namespace, which is an independent, isolated running space provided for a user's Jupyter Notebook.
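The per-user namespace idea can be sketched in Python as follows. Kubernetes namespace names must be valid DNS labels (lowercase alphanumerics and hyphens, at most 63 characters), so the sketch derives such a name from a user ID and wraps it in a core `v1` Namespace manifest. The `user-` prefix and the `zone-role` label are hypothetical conventions, not taken from the description.

```python
import re


def user_namespace_name(user_id: str, prefix: str = "user-") -> str:
    """Derive a DNS-label-safe namespace name from a user ID (prefix is hypothetical)."""
    # Lowercase, then collapse every run of disallowed characters into one hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", user_id.lower()).strip("-")
    if not slug:
        raise ValueError("user_id contains no usable characters")
    # Truncate to the 63-character limit and avoid a trailing hyphen.
    return (prefix + slug)[:63].rstrip("-")


def namespace_manifest(user_id: str) -> dict:
    """Build a core/v1 Namespace manifest for one user's isolated running space."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": user_namespace_name(user_id),
            "labels": {"zone-role": "user"},  # hypothetical label
        },
    }
```

Keeping one namespace per user lets Kubernetes quota and access-control objects be scoped to that user, which is one plausible way to obtain the isolation the paragraph describes.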
The creation and start-up flow of the containerized Jupyter Notebook development environment is described next. First, the user logs in to the data exploration isolation area and initiates a request to use the Jupyter Notebook service, together with the configuration information of the Jupyter Notebook service. Second, the request undergoes access authentication through the OAuth2 access authorization agent; after authentication passes, the system uses the Istio gateway to manage the user's HTTP/HTTPS access to the Jupyter Notebook container instance. Then, the system looks up the configuration information of the user's Jupyter Notebook instance recorded in the user configuration. Next, based on that configuration information, the system creates, through the Kubernetes resource configuration controller, a namespace for the Jupyter Notebook service the user applied for. Finally, once the namespace has been created, the system creates and starts the Jupyter Notebook service instance in the user's namespace by means of the Kubernetes CRD, according to the configuration information of the user's Jupyter Notebook instance.
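The ordering of this flow can be made concrete with a small in-memory simulation. Everything in the sketch is a stand-in: the token table plays the role of the OAuth2 access authorization agent, the `routes` dictionary the gateway, and plain dictionaries the user configuration and the cluster. None of the names or route formats come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IsolationZone:
    """In-memory stand-in for the components named in the flow (all hypothetical)."""
    valid_tokens: Dict[str, str]                 # token -> user (authorization agent)
    user_configs: Dict[str, dict]                # user -> recorded notebook configuration
    routes: Dict[str, str] = field(default_factory=dict)      # user -> gateway route
    namespaces: List[str] = field(default_factory=list)
    notebooks: Dict[str, dict] = field(default_factory=dict)  # namespace -> notebook

    def start_notebook(self, token: str) -> dict:
        # 1. Access authentication through the authorization agent.
        user = self.valid_tokens.get(token)
        if user is None:
            raise PermissionError("access authentication failed")
        # 2. The gateway manages this user's HTTP/HTTPS route to the instance.
        self.routes[user] = f"/notebook/{user}/"
        # 3. Look up the recorded configuration of the user's notebook instance.
        config = self.user_configs[user]
        # 4. The resource configuration controller creates the user namespace.
        namespace = f"user-{user}"
        if namespace not in self.namespaces:
            self.namespaces.append(namespace)
        # 5. Create and start the notebook instance in that namespace.
        notebook = {"namespace": namespace, "status": "Running", **config}
        self.notebooks[namespace] = notebook
        return notebook
```

A call such as `IsolationZone({"t1": "alice"}, {"alice": {"image": "jupyter:demo"}}).start_notebook("t1")` walks through all five stages, while an unknown token stops at the first one.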
By providing a containerized Jupyter Notebook development environment based on Docker container technology and Kubernetes container orchestration technology, the system of the invention supports custom container images, dynamic application and release of container instances of the development environment, and flexible configuration of computing resources such as CPU, GPU and memory.
In some embodiments, FIG. 5 illustrates a specific implementation of the data integration module and the data warehouse of the present invention. The data integration module is responsible for synchronizing data from external data sources into the data exploration isolation area through the firewall, providing data resources for data mining and artificial intelligence algorithm development. The data warehouse uniformly stores the data in the data exploration isolation area and supports versioned data management. As shown in FIG. 5, the data integration module exchanges data with the external data source through the firewall and exchanges data with the data warehouse module, so that data from the external data source is synchronized through the firewall into the data warehouse of the data exploration isolation area.
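Versioned data management can be illustrated with a toy store in which every synchronization appends a new immutable snapshot. This is a minimal sketch under the assumption that versions are simple per-dataset counters; the description does not say how versions are represented, and the class and method names are hypothetical.

```python
from typing import Dict, List


class VersionedWarehouse:
    """Toy versioned store: every sync of a dataset appends a new snapshot."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[List[dict]]] = {}

    def sync(self, dataset: str, rows: List[dict]) -> int:
        """Store a snapshot pulled from an external source; return its version number."""
        history = self._versions.setdefault(dataset, [])
        # Copy each row so later changes at the source do not alter stored versions.
        history.append([dict(r) for r in rows])
        return len(history)

    def read(self, dataset: str, version: int = -1) -> List[dict]:
        """Read a version (1-based); the default -1 means the latest one."""
        history = self._versions[dataset]
        index = len(history) - 1 if version == -1 else version - 1
        if not 0 <= index < len(history):
            raise KeyError(f"no version {version} of {dataset}")
        return history[index]
```

Because earlier snapshots are never overwritten, a mining job can pin the exact version it was developed against even after the external source has changed.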
In some embodiments, the data exploration isolation area provides a private container image repository (the mirror image warehouse) and supports user-defined container images. Specifically, the image repository is built with Harbor and provides the exploration isolation area with common images suited to data mining scenarios. Harbor is an enterprise-level registry server for storing and distributing Docker images; it provides good performance and security by adding features that enterprises require, such as security, identity and management.
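Images in a Harbor registry are addressed as `host/project/repository:tag`. The short Python sketch below assembles such a reference; the host, project and repository names in the usage line are hypothetical placeholders, and the validation is deliberately minimal.

```python
def harbor_image_ref(host: str, project: str, repository: str, tag: str = "latest") -> str:
    """Assemble an image reference of the form host/project/repository:tag."""
    for part in (host, project, repository, tag):
        # Minimal sanity check only; real registries enforce stricter naming rules.
        if not part or " " in part:
            raise ValueError(f"invalid image reference component: {part!r}")
    return f"{host}/{project}/{repository}:{tag}"


# Hypothetical example of a common data mining image offered to users.
ref = harbor_image_ref("harbor.example.org", "datamining", "jupyter-sklearn", "1.0")
```

A reference built this way is what a user would select as the image of a development environment.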
In some embodiments, FIG. 6 illustrates a specific implementation of the knowledge sharing module of the present invention. As shown in FIG. 6, the knowledge sharing module exchanges data with external applications through the firewall and exchanges data with the data warehouse module. In operation, the knowledge sharing module shares knowledge generated by data mining with external applications through the firewall, by means of a sharing service exposed as a RESTful API. An API (application programming interface) is a calling interface that a piece of software exposes so that other programs can request its functions; an operating system API, for example, lets an application have the operating system carry out commands on its behalf. REST is a software architectural style used mainly for applications in which clients interact with servers, and a RESTful API is an API designed in that style.
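To show what sharing results as a RESTful API could look like, the following Python sketch maps HTTP methods and resource paths to JSON responses without starting a real server. The `/results` path scheme, the read-only policy and the sample entry in `RESULTS` are all hypothetical; the description only states that knowledge is shared in RESTful form.

```python
import json
from typing import Dict, Tuple

# Hypothetical mining results that the sharing service reads from the data warehouse.
RESULTS: Dict[str, dict] = {
    "churn-model": {"version": 3, "metric": "auc", "value": 0.91},
}


def handle_request(method: str, path: str) -> Tuple[int, str]:
    """Map an HTTP method and path to a (status code, JSON body) pair."""
    if method != "GET":
        # External applications may only read shared knowledge in this sketch.
        return 405, json.dumps({"error": "read-only API"})
    parts = [p for p in path.split("/") if p]
    if parts == ["results"]:
        # Collection resource: list the names of all shared results.
        return 200, json.dumps(sorted(RESULTS))
    if len(parts) == 2 and parts[0] == "results" and parts[1] in RESULTS:
        # Item resource: return one shared result.
        return 200, json.dumps(RESULTS[parts[1]])
    return 404, json.dumps({"error": "not found"})
```

Each result is a resource identified by its URL, and the HTTP verb carries the intent, which is the essence of the RESTful style mentioned above.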
In some embodiments, as shown in FIG. 7, the present invention further provides a method for using the data mining-oriented containerized data exploration isolation region, so that a user can dynamically and flexibly apply for, configure and release computing resources, thereby realizing resource isolation among development environments as well as their dynamic configuration and release.
The method for using the data mining-oriented containerized data exploration isolation region comprises the following steps:
step 1, a user logs in to the data exploration isolation area through a Web browser;
step 2, the user applies to create a project in the data exploration isolation area according to the data mining requirement, and performs project configuration;
step 3, the user creates a Jupyter Notebook development environment in the created data exploration isolation area project, and configures the Jupyter Notebook development environment;
step 4, the data in the external data source is synchronized into the data warehouse module of the data exploration isolation area project by using the data integration function;
step 5, the user logs in to the Jupyter Notebook development environment, accesses the data in the data warehouse module, completes the data mining work, and stores the data mining result in the data warehouse module;
step 6, the user shares the data mining result with applications outside the data exploration isolation area through the knowledge sharing module;
step 7, after the data mining work is completed, the user deregisters the project created in the data exploration isolation area, and releases all computing resources and data resources related to the project.
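The seven steps above can be sketched as a small state tracker in Python. The strict ordering enforced here is a simplifying assumption for illustration, and the enum and class names are hypothetical.

```python
from enum import Enum
from typing import List


class Step(Enum):
    LOGIN = 1
    CREATE_PROJECT = 2
    CREATE_ENVIRONMENT = 3
    SYNC_DATA = 4
    MINE_DATA = 5
    SHARE_RESULTS = 6
    RELEASE_PROJECT = 7


class ProjectLifecycle:
    """Tracks progress through the seven steps and rejects out-of-order calls."""

    def __init__(self) -> None:
        self.completed: List[Step] = []

    def advance(self, step: Step) -> None:
        done = len(self.completed)
        # The next expected step is the one numbered done + 1, if any remain.
        expected = Step(done + 1) if done < len(Step) else None
        if step is not expected:
            raise RuntimeError(f"expected {expected}, got {step}")
        self.completed.append(step)

    @property
    def resources_released(self) -> bool:
        """True once step 7 has released the project's resources."""
        return Step.RELEASE_PROJECT in self.completed
```

Driving the tracker with `for step in Step: lifecycle.advance(step)` reproduces the full method, ending with the project's resources released.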
Specifically, the project configuration in step 2 includes configuration of computing resources and configuration of the storage space of the data warehouse module; the computing resources may include CPU, memory, GPU and other computing resources; configuring the storage space of the data warehouse module specifically means allocating storage space for the data warehouse module within the data exploration isolation area.
Specifically, the development environment configuration in step 3 includes an image configuration, a computing resource configuration, and a storage space configuration; the image configuration refers to selecting the container image required to run the development environment, the computing resources comprise the CPU, memory, GPU and other computing resources required to run the development environment, and the storage space refers to the data storage space required during the data mining process.
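The relationship between the project configuration of step 2 and the environment configuration of step 3 can be sketched as a simple check that an environment's request fits within what its project was given. The quota check itself is an assumption made for illustration (the description does not state that such a check is performed), and all type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_cores: int
    memory_gib: int
    gpus: int
    storage_gib: int

    def fits_within(self, quota: "Resources") -> bool:
        """True if every requested amount is within the corresponding quota."""
        return (self.cpu_cores <= quota.cpu_cores
                and self.memory_gib <= quota.memory_gib
                and self.gpus <= quota.gpus
                and self.storage_gib <= quota.storage_gib)


@dataclass(frozen=True)
class EnvironmentConfig:
    image: str            # image configuration
    resources: Resources  # computing resource and storage space configuration


def validate_environment(env: EnvironmentConfig, project_quota: Resources) -> None:
    """Raise ValueError if the environment is incomplete or exceeds the project."""
    if not env.image:
        raise ValueError("an image must be selected")
    if not env.resources.fits_within(project_quota):
        raise ValueError("environment request exceeds the project's configured resources")
```

Validating the request before any container is created keeps one environment from consuming resources that were configured for the whole project.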
With the method for using the data mining-oriented containerized data exploration isolation region, a user can log in to the data exploration isolation area through a browser and configure the runtime environment requirements and computing resource requirements for running code in the Jupyter Notebook development environment. The system automatically generates and starts a container image containing the Jupyter Notebook service and the runtime environment configured by the user, and, through Kubernetes container orchestration technology, allocates computing resources to the containerized Jupyter Notebook development environment according to the computing resource requirements configured by the user. After the Jupyter Notebook service has started, the user can access the Jupyter Notebook container runtime environment through the Jupyter Notebook Web development environment and use the computing and storage resources of the Kubernetes cluster bound to that Jupyter Notebook.
The above is a detailed description of the technical solution proposed by the present invention. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The foregoing examples merely illustrate several embodiments of the invention; they are presented to explain the principles and embodiments of the invention and are intended only to aid in understanding the method of the invention and its core ideas. Modifications made by those of ordinary skill in the art in light of the present teachings also fall within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (7)

1. A containerized data exploration isolation region oriented to data mining is characterized in that,
the system comprises a data integration module, a data warehouse module, a mirror image warehouse module, a Kubernetes-based container development environment and a knowledge sharing module; the data integration module is connected with an external data source and performs data interaction, and the data integration module is connected with the data warehouse module and performs data interaction; the data warehouse module is connected with the Kubernetes-based container development environment and performs data interaction, is connected with the knowledge sharing module and performs data interaction, and is also connected with the data integration module and performs data interaction; the knowledge sharing module is connected with the external application and performs data interaction; the Kubernetes-based container development environment is connected with the data warehouse module and performs data interaction, and is connected with the mirror image warehouse module and performs data interaction, and meanwhile, the Kubernetes-based container development environment is connected with the Web browser and performs data interaction;
a containerized Jupyter Notebook development environment is created in the data exploration isolation area, wherein the containerized Jupyter Notebook development environment is created based on Docker container virtualization technology and Kubernetes container orchestration technology;
the containerized Jupyter Notebook development environment comprises an OAuth2 access authorization agent, an Istio gateway, a system namespace and a user namespace, wherein the OAuth2 access authorization agent is connected with the Istio gateway, and the Istio gateway is connected with the system namespace and the user namespace respectively;
the OAuth2 access authorization agent authenticates the access behavior of the user based on the authentication service of the OAuth2 specification; the Istio gateway is used for limiting HTTP access of a specific user to the Jupyter Notebook instance; the Jupyter Notebook controller is used for managing Jupyter Notebook instances; the resource configuration controller is used for creating and managing the namespace of each user; the user configuration is responsible for recording configuration information of the user's Jupyter Notebook instance; the system namespace serves as a running space for system services, and the user namespace provides a running space for the user's Jupyter Notebook.
2. The data mining-oriented containerized data exploration isolation region of claim 1, wherein said system namespace comprises a Jupyter Notebook controller, a resource configuration controller, and a user configuration, and said user namespace comprises a Jupyter Notebook runtime.
3. The data mining-oriented containerized data exploration isolation region of claim 1, wherein said data integration module is in data interaction with said external data source through a firewall and said data integration module is in data interaction with said data warehouse module, data of said external data source being synchronized by said data integration module through said firewall into said data warehouse module of said data exploration isolation region.
4. The data mining-oriented containerized data exploration isolation region of claim 1, wherein said mirror image warehouse module provides said exploration isolation region with images applicable to data mining scenarios.
5. The data mining-oriented containerized data exploration isolation region of claim 1, wherein the knowledge sharing module performs data interaction with the external application through a firewall, the knowledge sharing module performs data interaction with the data warehouse module, and the knowledge sharing module shares knowledge generated by data mining with the external application through the firewall by means of a sharing service.
6. A method for using the data mining-oriented containerized data exploration isolation region according to claim 1, wherein the method comprises the following steps:
step 1, a user logs in to the data exploration isolation area through a Web browser;
step 2, the user applies to create a project in the data exploration isolation area according to the data mining requirement, and performs project configuration;
step 3, the user creates a Jupyter Notebook development environment in the created data exploration isolation area project, and configures the Jupyter Notebook development environment;
step 4, the data in the external data source is synchronized into the data warehouse module of the data exploration isolation area project by using the data integration function;
step 5, the user logs in to the Jupyter Notebook development environment, accesses the data in the data warehouse module, completes the data mining work, and stores the data mining result in the data warehouse module;
step 6, the user shares the data mining result with applications outside the data exploration isolation area through the knowledge sharing module;
step 7, after the data mining work is completed, the user deregisters the project created in the data exploration isolation area, and releases all computing resources and data resources related to the project.
7. The method for using the data mining-oriented containerized data exploration isolation region according to claim 6, wherein the project configuration in step 2 includes configuration of computing resources and configuration of the storage space of the data warehouse module; the development environment configuration in step 3 includes an image configuration, a computing resource configuration, and a storage space configuration.
CN202010189150.7A 2020-03-18 2020-03-18 Data mining-oriented containerized data exploration isolation region and use method thereof Active CN111400374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010189150.7A CN111400374B (en) 2020-03-18 2020-03-18 Data mining-oriented containerized data exploration isolation region and use method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010189150.7A CN111400374B (en) 2020-03-18 2020-03-18 Data mining-oriented containerized data exploration isolation region and use method thereof

Publications (2)

Publication Number Publication Date
CN111400374A CN111400374A (en) 2020-07-10
CN111400374B true CN111400374B (en) 2023-05-23

Family

ID=71432555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010189150.7A Active CN111400374B (en) 2020-03-18 2020-03-18 Data mining-oriented containerized data exploration isolation region and use method thereof

Country Status (1)

Country Link
CN (1) CN111400374B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732495A (en) * 2021-01-19 2021-04-30 北京澎思科技有限公司 Information interaction method and device and storage medium
CN112988165A (en) * 2021-04-15 2021-06-18 成都新希望金融信息有限公司 Kubernetes-based interactive modeling method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105119750A (en) * 2015-09-08 2015-12-02 南京联成科技发展有限公司 Distributed information security operation and maintenance management platform based on massive data
CN109635018A (en) * 2018-10-19 2019-04-16 陕西艾特信息化工程咨询有限责任公司 A kind of method for interchanging data based on container
CN110197084A (en) * 2019-06-12 2019-09-03 上海联息生物科技有限公司 Medical data combination learning system and method based on trust computing and secret protection
CN110430173A (en) * 2019-07-19 2019-11-08 河南工程学院 A kind of cloud platform based on Vue+SpringCloud
CN110781226A (en) * 2019-10-28 2020-02-11 中国建设银行股份有限公司 Data analysis method, device, storage medium, equipment and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106888254B (en) * 2017-01-20 2020-08-18 华南理工大学 Kubernetes-based container cloud architecture and interaction method among modules thereof

Also Published As

Publication number Publication date
CN111400374A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
US11064014B2 (en) System and method for batch computing
Pop et al. The FORA fog computing platform for industrial IoT
US9424554B2 (en) Enterprise managed systems with collaborative application support
US8862933B2 (en) Apparatus, systems and methods for deployment and management of distributed computing systems and applications
US20120066487A1 (en) System and method for providing load balancer visibility in an intelligent workload management system
CN106031128B (en) The method and apparatus of mobile device management
US10534337B2 (en) Flow engine for building automated flows within a cloud based developmental platform
CN112948278B (en) Product gray level publishing method, device, equipment and medium based on gray level database
US20140172954A1 (en) System and method for private cloud introduction and implementation
US9210098B2 (en) Enhanced command selection in a networked computing environment
CN111400374B (en) Data mining-oriented containerized data exploration isolation region and use method thereof
US20220244982A1 (en) Network-efficient isolation environment redistribution
CN109901823A (en) Interactive model exploitation environmental system and method based on cloud environment
Grandinetti Pervasive cloud computing technologies: future outlooks and interdisciplinary perspectives: future outlooks and interdisciplinary perspectives
Erulanova et al. Hardware and software support of technological processes virtualization
US20210406227A1 (en) Linking, deploying, and executing distributed analytics with distributed datasets
KR102287972B1 (en) operation method of cloud-based virtualized computer room service
CN114422542A (en) Terminal domain management system
CN113835827A (en) Application deployment method and device based on container Docker and electronic equipment
Muthoni et al. Infrastructure as Code for Business Continuity in Institutions of Higher Learning
Wagner et al. User managed virtual clusters in comet
CN112564979A (en) Execution method and device for construction task, computer equipment and storage medium
Kampert A taxonomy of virtualization technologies
Battarra et al. Storm clouds platform: a cloud computing platform for smart city applications
Oh et al. A Survey on Microservices Use Cases for AI based Application on Hybrid Cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant