CN112286682A - Machine learning task processing method, device and equipment based on distributed cluster - Google Patents
Machine learning task processing method, device and equipment based on distributed cluster
- Publication number
- CN112286682A (application number CN202011166411.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- scheduling unit
- distributed cluster
- machine learning
- atomic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The disclosure relates to a machine learning task processing method and device based on a distributed cluster, an electronic device and a computer readable medium. The method comprises the following steps: logging in a preset distributed cluster according to a user operation instruction, wherein the distributed cluster is designed based on a Kubernetes micro-service architecture; the distributed cluster allocates an atomic scheduling unit to the user based on an interactive computing notebook environment; the atomic scheduling unit is associated with a target server, and the target server is a notebook server; and the user sends the machine learning task to be processed to the atomic scheduling unit so that the atomic scheduling unit can calculate and generate return information based on the target server. The method can provide a clustered high-availability scientific data analysis platform for business personnel, so that the business personnel can safely and quickly access real transaction big data in an intranet, utilize a server CPU and storage resources at low cost and use various algorithm components in a customized manner.
Description
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a method and an apparatus for processing a machine learning task based on a distributed cluster, an electronic device, and a computer-readable medium.
Background
With the rapid development of computers and networks, machine learning is being applied ever more widely and is changing our lives and work. Internet search, online advertising, machine translation, handwriting recognition, spam filtering, and the like are all technologies based on machine learning. As the application fields of machine learning expand and its complexity grows, a single-machine model can no longer meet the requirements of a machine learning platform. With the development of the server side, more and more machine learning workloads are moved to server clusters: a machine learning platform is built on a server cluster, each user logs in to the platform and sends machine learning tasks to the cluster, and the tasks are then processed by servers in the cluster.
Existing machine learning platform schemes are built on standalone Hadoop clusters and deployed by means of Apache Hadoop YARN (Yet Another Resource Negotiator). A machine learning platform built in this way has several problems: dynamic allocation of machine resources is not well supported; real transaction big data must be synchronized manually and non-real-time, which makes for a poor experience; and the platform depends on, and is tightly coupled with, other platforms, so it is difficult to migrate. In addition, when users perform feature analysis, feature engineering, model training, model deployment, data management, and similar processing, cross-department and cross-region management is often required.
Therefore, a new method, apparatus, electronic device and computer readable medium for processing a distributed cluster-based machine learning task are needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of the above, the present disclosure provides a distributed cluster-based machine learning task processing method, device, electronic device, and computer readable medium, which can provide a convenient, interactive, visual, safe, reliable, and clustered scientific data analysis platform for business personnel, can also safely and quickly access big data of real transactions in an intranet, can also utilize a server CPU and storage resources at low cost, and can also use various algorithm components in a customizable manner.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, a method for processing a machine learning task based on a distributed cluster is provided, where the method includes: logging in a preset distributed cluster according to a user operation instruction, wherein the distributed cluster is designed based on a Kubernetes micro-service architecture; the distributed cluster allocates an atomic scheduling unit to the user based on an interactive computing notebook environment; the atomic scheduling unit is associated with a target server, and the target server is a notebook server; and the user sends the machine learning task to be processed to the atomic scheduling unit so that the atomic scheduling unit can calculate and generate return information based on the target server.
Optionally, before logging in the preset distributed cluster according to the user operation instruction, the method further includes: the user sends a login application to a hub of the distributed cluster through an agent; when the login application is valid, the hub returns a login page to the user; and the user operates on the login page to generate the user operation instruction.
Optionally, the distributed cluster allocates an atomic scheduling unit to the user based on an interactive computing notebook environment, including: the user applying for a container in the interactive computing notebook environment of the distributed cluster; the distributed cluster creates an atomic scheduling unit based on the container and assigns it to the user.
Optionally, the associating, by the atomic scheduling unit, a target server includes: determining a target server in the distributed cluster according to the login information of the user; and associating the atomic scheduling unit with a target server to carry out data mutual transmission.
Optionally, after the atomic scheduling unit associates with the target server, the method further includes: and the hub of the distributed cluster configures task forwarding rules for the user.
Optionally, the sending, by the user, the machine learning task to be processed to the atomic scheduling unit includes: the user sends a task request to the agent through the browser; the agent sends the task request to the atomic scheduling unit based on a task forwarding rule; and the atomic scheduling unit sends the code page to the user.
Optionally, the sending, by the user, the machine learning task to be processed to the atomic scheduling unit further includes: the user sends the machine learning task to be processed to the agent based on the code page; the agent sends the machine learning task to the atomic scheduling unit based on a task forwarding rule.
Optionally, the atomic scheduling unit performs calculation based on the target server to generate the return information, including: the user sends a machine learning model training task to be processed to the agent through a browser; the atomic scheduling unit trains modeling samples in a machine learning model training task based on the target server; and after the model training is finished, generating the return information in the JSON format.
Optionally, the atomic scheduling unit performs calculation based on the target server to generate the return information, further including: calling user data of the distributed cluster according to the instruction of the user; testing the trained model through the user data; and when the test meets the requirement, generating an application model.
Optionally, the atomic scheduling unit performs calculation based on the target server to generate the return information, further including: deploying the application model in an application server according to the instruction of a user; and generating a timing monitoring task of the application model so as to monitor the real-time data of the application model.
According to an aspect of the present disclosure, a distributed cluster-based machine learning task processing system is provided, the system including: a login module for logging in to a preset distributed cluster according to a user operation instruction, the distributed cluster being designed based on a Kubernetes micro-service architecture; an allocation module for the distributed cluster to allocate an atomic scheduling unit to the user based on an interactive computing notebook environment; an association module for associating the atomic scheduling unit with a target server, the target server being a notebook server; and a computing module for the user to send the machine learning task to be processed to the atomic scheduling unit, so that the atomic scheduling unit performs computation based on the target server and generates return information.
Optionally, the method further comprises: the page module is used for sending a login application to a hub of the distributed cluster through an agent by the user; when the login application is valid, the hub returns a login page to the user; and the user operates on the login page to generate the user operation instruction.
Optionally, the allocation module is further configured to apply for a container by the user in the interactive computing notebook environment of the distributed cluster; the distributed cluster creates an atomic scheduling unit based on the container and assigns to the user.
Optionally, the associating module includes: the target unit is used for determining a target server in the distributed cluster according to the login information of the user; and the association unit is used for associating the atomic scheduling unit with the target server so as to carry out data mutual transmission.
Optionally, the associating module further includes: and the rule unit is used for configuring a task forwarding rule for the user by the hub of the distributed cluster.
Optionally, the calculation module includes: the request unit is used for sending a task request to the proxy through the browser by the user; the forwarding unit is used for sending the task request to the atomic scheduling unit by the agent based on a task forwarding rule; and the code unit is used for sending a code page to the user by the atomic scheduling unit.
Optionally, the computing module further includes: a sending unit, configured to send, by the user, a machine learning task to be processed to the agent based on the code page; and the scheduling unit is used for sending the machine learning task to the atomic scheduling unit by the agent based on a task forwarding rule.
Optionally, the computing module further includes: the training unit is used for sending a machine learning model training task to be processed to the agent by the user through a browser; the atomic scheduling unit trains modeling samples in a machine learning model training task based on the target server; and after the model training is finished, generating the return information in the JSON format.
Optionally, the computing module further includes: the testing unit is used for calling the user data of the distributed cluster according to the instruction of the user; testing the trained model through the user data; and when the test meets the requirement, generating an application model.
Optionally, the computing module further includes: the deployment unit is used for deploying the application model in an application server according to an instruction of a user; and generating a timing monitoring task of the application model so as to monitor the real-time data of the application model.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device including: one or more processors; and storage means for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as above.
According to an aspect of the disclosure, a computer-readable medium is proposed, on which a computer program is stored, which program, when being executed by a processor, carries out the method as above.
According to the machine learning task processing method, device, electronic equipment and computer readable medium based on the distributed cluster, a preset distributed cluster is logged in to according to a user operation instruction, and an atomic scheduling unit is allocated to the user; the atomic scheduling unit is associated with a target server, the target server being a notebook server; and the user sends the machine learning task to be processed to the atomic scheduling unit, so that the atomic scheduling unit performs computation based on the target server and generates return information. In this way, business personnel are provided with a convenient, interactive, visual, safe, reliable, and clustered scientific data analysis platform; real transaction big data on the intranet can be accessed safely and quickly; server CPU and storage resources can be used at low cost; and various algorithm components can be used in a customizable manner.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic application diagram of a distributed cluster-based machine learning task processing system according to an exemplary embodiment.
Fig. 2 is an architecture diagram illustrating a distributed cluster-based machine learning task processing method according to another exemplary embodiment.
Fig. 3 is a flowchart illustrating a distributed cluster-based machine learning task processing method according to an example embodiment.
Fig. 4 is a flowchart illustrating a distributed cluster-based machine learning task processing method according to another example embodiment.
Fig. 5 is an information interaction diagram illustrating a distributed cluster-based machine learning task processing method according to another exemplary embodiment.
FIG. 6 is a block diagram illustrating a distributed cluster-based machine learning task processing system in accordance with an exemplary embodiment.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 8 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
Fig. 1 is a schematic application diagram of a distributed cluster-based machine learning task processing system according to an exemplary embodiment.
As shown in fig. 1, system architecture 10 may include terminal devices 101, 102, 103, a network 104 and a distributed cluster server 105. The network 104 serves to provide a medium of communication links between the terminal devices 101, 102, 103 and the distributed cluster server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
Users may use terminal devices 101, 102, 103 to interact with distributed cluster server 105 over network 104 to receive or transmit machine learning tasks. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a financial services application, a shopping application, a web browser application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The distributed cluster server 105 may be a server that provides various services, for example a computing server that processes machine learning tasks submitted by users through the terminal devices 101, 102, 103. The background management server can train the received machine learning model, test the trained machine learning model, deploy the machine learning model in an actual application environment, and monitor the effect of the model.
Logging on distributed cluster 105 may be performed, for example, according to user operation instructions; the distributed cluster server 105 may, for example, assign an atomic scheduling unit to the user; the atomic scheduling unit in the distributed cluster server 105 may, for example, associate a target server, which is a notebook server; the user sends the machine learning task to be processed to the atomic scheduling unit in the distributed cluster server 105, so that the atomic scheduling unit performs calculation based on the target server to generate return information.
The distributed cluster server 105 may be designed based on a Kubernetes micro-service architecture. It should be noted that the machine learning task processing method based on the distributed cluster provided by the embodiments of the present disclosure may be executed by the distributed cluster server 105, and accordingly the machine learning task processing system based on the distributed cluster may be disposed in the distributed cluster server 105. The web page through which users submit machine learning tasks is generally located on the terminal devices 101, 102, 103.
Fig. 2 is an architecture diagram illustrating a distributed cluster-based machine learning task processing method according to another exemplary embodiment. As shown in fig. 2, Jupyter Notebook is an interactive shell (software that provides an application interface for users), and its core is the notebook server. The user connects to the notebook server through a browser, and the notebook is presented as a Web application. Code written by the user in the Web application is sent through the server to a kernel; the kernel runs the code and sends the result back to the server, and any output is then returned to the browser. When the notebook is saved, it is written to the notebook server as a JSON file (file extension .ipynb). One advantage of this architecture is that the kernel does not need to run Python: since the notebook and the kernel are separate, code in any language can be exchanged between the two.
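For illustration, a saved notebook is plain JSON and can be inspected with standard tools. The short sketch below assumes the standard nbformat-4 layout and a hypothetical file name; it is not part of the patented method.

```python
import json

# A minimal sketch: read a saved notebook (.ipynb) and list its code cells.
# Assumes the standard nbformat-4 JSON layout; "analysis.ipynb" is a hypothetical file name.
with open("analysis.ipynb", "r", encoding="utf-8") as f:
    notebook = json.load(f)

print("nbformat:", notebook.get("nbformat"))            # e.g. 4
for i, cell in enumerate(notebook.get("cells", [])):
    if cell.get("cell_type") == "code":
        source = "".join(cell.get("source", []))         # source is stored as a list of lines
        print(f"--- code cell {i} ---")
        print(source)
```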
To address the deficiencies of the existing scheme, the present application provides a new machine learning platform that mainly addresses the following aspects: cluster management of server resources, dynamic allocation of machine resources, extensible deployment management, mirroring (container images), micro-services, and persistent storage.
Fig. 3 is a flowchart illustrating a distributed cluster-based machine learning task processing method according to an example embodiment. The distributed cluster-based machine learning task processing method 30 includes at least steps S302 to S308.
As shown in fig. 3, in S302, a preset distributed cluster is logged in to according to a user operation instruction, where the distributed cluster is designed based on a Kubernetes micro-service architecture. Kubernetes (K8s for short) is a distributed cluster management platform and a container orchestration system: it automatically deploys, scales, and runs application containers across host clusters, providing a container-centric infrastructure. K8s offers functions such as service naming and discovery, load balancing, health checking, horizontal elastic scaling, and rolling updates, and is suitable for deploying applications in a production environment.
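As an illustrative sketch only, such a cluster can be inspected with the official Kubernetes Python client; the namespace name below is an assumption and the snippet is not the patented implementation.

```python
from kubernetes import client, config

# Sketch of talking to a Kubernetes (K8s) cluster with the official Python client.
# config.load_kube_config() reads ~/.kube/config; inside a pod one would use
# config.load_incluster_config() instead.
config.load_kube_config()
v1 = client.CoreV1Api()

# List cluster nodes and the pods in a hypothetical "ml-platform" namespace.
for node in v1.list_node().items:
    print("node:", node.metadata.name)
for pod in v1.list_namespaced_pod(namespace="ml-platform").items:
    print("pod:", pod.metadata.name, pod.status.phase)
```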
The service running on the K8s micro-service architecture can be based on an interactive computing notebook environment (Jupyter Notebook). Jupyter Notebook is a Web-architecture application: the client part is responsible for running, saving, and outputting notebook code, marks it up with Markdown syntax, and sends it to the server side to be stored in JSON format; the server side is responsible for accessing the notebook code, invoking the compilation kernel, and other functions.
Before logging in the preset distributed cluster according to the user operation instruction, the method further comprises the following steps: the user sends a login application to a hub of the distributed cluster through an agent; when the login application is valid, the hub returns a login page to the user; and the user operates on the login page to generate the user operation instruction.
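As a hedged illustration of this hub-mediated flow, a JupyterHub-style hub also exposes a documented REST API through which the login and server-allocation steps can be driven programmatically; the hub URL, API token, and user name below are assumptions and are not taken from this disclosure.

```python
import requests

HUB_URL = "http://hub.example.internal:8000"      # assumed hub address
TOKEN = "replace-with-an-api-token"               # assumed JupyterHub API token
headers = {"Authorization": f"token {TOKEN}"}

# Check that the user is known to the hub.
r = requests.get(f"{HUB_URL}/hub/api/users/alice", headers=headers)
r.raise_for_status()
print(r.json())

# Ask the hub to spawn the user's single-user notebook server
# (the hub, in turn, asks the cluster for a pod, as described below).
r = requests.post(f"{HUB_URL}/hub/api/users/alice/server", headers=headers)
print("spawn requested, status:", r.status_code)  # 201 created / 202 still pending
```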
In S304, the distributed cluster allocates an atomic scheduling unit to the user based on the interactive computing notebook environment. A specific example is as follows: the user applies for a container in the interactive computing notebook environment of the distributed cluster; the distributed cluster creates an atomic scheduling unit (pod) based on the container and allocates it to the user.
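For illustration only, creating such a notebook pod with the official Kubernetes Python client could look roughly like the sketch below; the image name, namespace, labels, and resource sizes are assumptions rather than details from this disclosure.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Sketch: create a pod that runs a single-user notebook server for user "alice".
# The image, namespace, port, and resource limits are assumptions for illustration.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="notebook-alice",
        labels={"app": "notebook", "user": "alice"},
    ),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="notebook",
                image="jupyter/base-notebook:latest",
                ports=[client.V1ContainerPort(container_port=8888)],
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "1", "memory": "2Gi"},
                    limits={"cpu": "2", "memory": "4Gi"},
                ),
            )
        ]
    ),
)
v1.create_namespaced_pod(namespace="ml-platform", body=pod)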
In S306, the atomic scheduling unit is associated with a target server, and the target server is a notebook server. For example, a target server is determined in the distributed cluster according to the login information of the user, and the atomic scheduling unit is associated with that target server for mutual data transmission.
After the atomic scheduling unit associates with the target server, the method further includes: and the hub of the distributed cluster configures task forwarding rules for the user.
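As an illustrative sketch of such a forwarding rule, JupyterHub's configurable-http-proxy exposes a REST API for registering routes; the proxy address, token, route prefix, and target below are assumptions, and the patented platform may configure its forwarding rules differently.

```python
import requests

PROXY_API = "http://proxy.example.internal:8001"   # assumed configurable-http-proxy API endpoint
PROXY_TOKEN = "replace-with-proxy-auth-token"      # assumed proxy auth token
headers = {"Authorization": f"token {PROXY_TOKEN}"}

# Forwarding rule: requests under /user/alice/ go to alice's notebook pod.
route = {"target": "http://10.0.0.42:8888"}        # assumed pod IP and notebook port
r = requests.post(f"{PROXY_API}/api/routes/user/alice", json=route, headers=headers)
r.raise_for_status()

# List current routes to verify the rule was registered.
print(requests.get(f"{PROXY_API}/api/routes", headers=headers).json())
```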
In S308, the user sends the machine learning task to be processed to the atomic scheduling unit, so that the atomic scheduling unit performs calculation based on the target server to generate return information.
More specifically, a user may, for example, send a pending machine learning task to an atomic scheduling unit, including: the user sends a task request to the agent through the browser; the agent sends the task request to the atomic scheduling unit based on a task forwarding rule; and the atomic scheduling unit sends the code page to the user.
More specifically, for example, the user may send a machine learning task to be processed to the agent based on the code page; the agent sends the machine learning task to the atomic scheduling unit based on a task forwarding rule.
More specifically, for example, the atomic scheduling unit may perform calculation based on the target server to generate the return information, including: the user sends a machine learning model training task to be processed to the agent through a browser; the atomic scheduling unit trains modeling samples in a machine learning model training task based on the target server; and after the model training is finished, generating the return information in the JSON format.
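A minimal sketch of such a training step and its JSON-formatted return information is shown below; scikit-learn, the synthetic sample data, and the chosen metrics are illustrative stand-ins, not the platform's actual algorithm components.

```python
import json
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative modeling sample; in the platform this would be the imported user data.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# After training, package the result as JSON-formatted return information.
result = {
    "status": "finished",
    "model": "LogisticRegression",
    "metrics": {
        "accuracy": float(accuracy_score(y_test, model.predict(X_test))),
        "auc": float(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])),
    },
}
print(json.dumps(result, indent=2))
```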
For example, the user data of the distributed cluster can be called according to the instruction of the user; testing the trained model through the user data; and when the test meets the requirement, generating an application model.
The application model may also be deployed in an application server, for example, according to a user's instructions; and generating a timing monitoring task of the application model so as to monitor the real-time data of the application model.
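As a hedged sketch of this deployment and timed-monitoring step (the Flask application server, the /predict route, the model file name, and the monitoring interval are all assumptions; the disclosure does not specify the application server):

```python
import threading
import time
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load("application_model.pkl")       # hypothetical exported application model

call_count = 0                                     # simple per-process call counter for monitoring

@app.route("/predict", methods=["POST"])
def predict():
    """Score a batch of samples; expects a JSON body like {"features": [[...], ...]}."""
    global call_count
    call_count += 1
    features = request.get_json()["features"]
    scores = model.predict_proba(features)[:, 1].tolist()
    return jsonify({"scores": scores})

def monitor(interval_seconds=3600):
    """Timed monitoring task: periodically report real-time call statistics."""
    while True:
        time.sleep(interval_seconds)
        print(f"[monitor] calls observed so far: {call_count}")

if __name__ == "__main__":
    threading.Thread(target=monitor, daemon=True).start()
    app.run(host="0.0.0.0", port=8080)
```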
According to the machine learning task processing method based on the distributed cluster, a preset distributed cluster is logged in to according to a user operation instruction, where the distributed cluster is designed based on a Kubernetes micro-service architecture; the distributed cluster allocates an atomic scheduling unit to the user based on an interactive computing notebook environment; the atomic scheduling unit is associated with a target server, the target server being a notebook server; and the user sends the machine learning task to be processed to the atomic scheduling unit, so that the atomic scheduling unit performs computation based on the target server and generates return information. In this way, business personnel are provided with a convenient, interactive, visual, safe, reliable, and clustered scientific data analysis platform; real transaction big data on the intranet can be accessed safely and quickly; server CPU and storage resources can be used at low cost; and various algorithm components can be used in a customizable manner.
The machine learning platform produced by the distributed cluster-based machine learning task processing method provides a unified internal platform with unified access to real transaction big data sources, and supports customizable algorithm components, cluster-managed deployment of applications, micro-services, and secure schemes for data extraction, import, and export management.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 4 is a flowchart illustrating a distributed cluster-based machine learning task processing method according to another example embodiment. The flow 40 shown in fig. 4 is a detailed description of the present disclosure from the user application level.
As shown in fig. 4, in S402, the user applies for an account.
In S404, a data authority is applied.
In S406, modeling samples are prepared; this may specifically include establishing a data dictionary, ad hoc queries, data extraction, temporary table queries, and the like.
In S408, the modeling samples are imported into the distributed cluster.
In S410, the distributed cluster trains the model, visualizes the result, and may also perform model analysis.
In S412, the model file is exported.
In S414, the model file is imported into the PAI, where PAI (Personal Application Integration) is a method and technology for integrating person-oriented applications that run on different devices and are provided by different vendors.
In S416, a model execution file is generated.
In S418, model deployment is performed, and more specifically, model deployment may include deploying a model interface, matching a generated call flow, and performing online and offline management of the model.
In S420, the model is monitored; all model information can be monitored, for example the number of calls per day, score distribution, score stability, and the like.
The machine learning platform generated by the above method can provide business personnel with a convenient, interactive, visual, safe, reliable, and clustered scientific data analysis platform. It can safely and quickly access the group's real transaction financial big data, use server CPU and storage resources at low cost, and use various algorithm components in a customizable manner.
The machine learning platform generated by the method disclosed by the invention has the following advantages:
varied and rich functions: the following four major data management functions supporting machine learning are covered: data extraction, data dictionary, ad hoc query, and data analysis and modeling;
distributed cluster management: based on a distributed cluster and a resource scheduling and allocation algorithm, a highly elastic distributed micro-service architecture built on container technology is realized; extraction and analysis of large-volume sample data and concurrent parallel training are supported, and computation is fast;
persistent file storage: the NFS back-end storage component is currently supported, and remote reading and writing are simple and transparent (see the sketch after this list);
multi-user permission management: users with internal company domain accounts can log in to the platform, multiple users can be online at the same time, and resources and storage space can be allocated per user;
customizable algorithm diversity: a variety of machine learning and deep learning frameworks, for example TensorFlow, R, and the like;
reliability and security: the platform supports multi-user authentication, approval-based assignment of functional permissions, isolation of real transaction big data, desensitized synchronized access, and the like.
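Following up on the persistent file storage point above, requesting NFS-backed persistent storage for a notebook workspace through a Kubernetes PersistentVolumeClaim could look roughly like the sketch below; the StorageClass name, namespace, and size are assumptions for illustration.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Sketch: request persistent, NFS-backed storage for a user's notebook workspace.
# "nfs-client" is an assumed StorageClass name; the size is illustrative.
pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "workspace-alice"},
    "spec": {
        "accessModes": ["ReadWriteMany"],          # NFS volumes can be mounted by many pods
        "storageClassName": "nfs-client",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}
v1.create_namespaced_persistent_volume_claim(namespace="ml-platform", body=pvc_manifest)
```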
Fig. 5 is an information interaction diagram illustrating a distributed cluster-based machine learning task processing method according to another exemplary embodiment. FIG. 5 is an exemplary depiction of the interaction of user data in a distributed cluster.
As shown in fig. 5, in S501, an entry is accessed.
In S502, proxy forwarding.
In S503, the login page is returned.
In S504, the login is successful.
In S505, the proxy forwards.
In S506, a container is requested.
In S507, a pod is created.
In S508, the notebook is started.
In S509, a response is returned.
In S510, a response is returned.
In S511, a response is returned.
In S512, a jupyter server forwarding rule is configured.
In S513, a jump is made to the page.
In S514, jupyterlab is accessed.
In S515, forwarding is performed.
In S516, a jupyterlab UI is returned.
In S517, the code is executed.
In S518, forwarding is performed.
In S519, the result is returned.
The machine learning platform is an interactive data science analysis platform based on Jupyter. It can be made available to business personnel, requires no local manual configuration, and can quickly connect to the group's shared big data clusters.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the computer programs perform the functions defined by the above methods provided by the present disclosure. The programs may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
FIG. 6 is a block diagram illustrating a distributed cluster-based machine learning task processing system in accordance with an exemplary embodiment. As shown in fig. 6, a distributed cluster-based machine learning task processing system 60 may include: a login module 602, an allocation module 604, an association module 606, a calculation module 608, and a page module 610.
The login module 602 is configured to log in a preset distributed cluster according to a user operation instruction, where the distributed cluster is designed based on a kubernets micro-service architecture;
an allocating module 604, configured to allocate an atomic scheduling unit to the user by the distributed cluster based on an interactive computing notebook environment; the assignment module 604 is further configured for the user to apply for a container in the distributed clustered interactive computing notebook environment; the distributed cluster creates an atomic scheduling unit based on the container and assigns to the user.
The associating module 606 is configured to associate, by the atomic scheduling unit, a target server, where the target server is a notebook server; the association module 606 includes: the target unit is used for determining a target server in the distributed cluster according to the login information of the user; the association unit is used for associating the atomic scheduling unit with a target server so as to carry out data mutual transmission; and the rule unit is used for configuring a task forwarding rule for the user by the hub of the distributed cluster.
The calculation module 608 is configured to send the machine learning task to be processed to the atomic scheduling unit by the user, so that the atomic scheduling unit performs calculation based on the target server to generate return information. The calculation module 608 includes: the request unit is used for sending a task request to the proxy through the browser by the user; the forwarding unit is used for sending the task request to the atomic scheduling unit by the agent based on a task forwarding rule; the code unit is used for sending a code page to the user by the atomic scheduling unit; a sending unit, configured to send, by the user, a machine learning task to be processed to the agent based on the code page; the scheduling unit is used for sending the machine learning task to the atomic scheduling unit by the agent based on a task forwarding rule; the training unit is used for sending a machine learning model training task to be processed to the agent by the user through a browser; the atomic scheduling unit trains modeling samples in a machine learning model training task based on the target server; after the model training is finished, generating return information in a JSON format; the testing unit is used for calling the user data of the distributed cluster according to the instruction of the user; testing the trained model through the user data; when the test meets the requirement, generating an application model; the deployment unit is used for deploying the application model in an application server according to an instruction of a user; and generating a timing monitoring task of the application model so as to monitor the real-time data of the application model.
The page module 610 is used for the user to send a login application to the hub of the distributed cluster through an agent; when the login application is valid, the hub returns a login page to the user; and the user operates on the login page to generate the user operation instruction.
According to the machine learning task processing system based on the distributed cluster, a preset distributed cluster is logged in to according to a user operation instruction, where the distributed cluster is designed based on a Kubernetes micro-service architecture; the distributed cluster allocates an atomic scheduling unit to the user based on an interactive computing notebook environment; the atomic scheduling unit is associated with a target server, the target server being a notebook server; and the user sends the machine learning task to be processed to the atomic scheduling unit, so that the atomic scheduling unit performs computation based on the target server and generates return information. In this way, business personnel are provided with a convenient, interactive, visual, safe, reliable, and clustered scientific data analysis platform; real transaction big data on the intranet can be accessed safely and quickly; server CPU and storage resources can be used at low cost; and various algorithm components can be used in a customizable manner.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 700 according to this embodiment of the disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one memory unit 720, a bus 730 that connects the various system components (including the memory unit 720 and the processing unit 710), a display unit 740, and the like.
The storage unit stores program code executable by the processing unit 710 to cause the processing unit 710 to perform the steps according to various exemplary embodiments of the present disclosure described in the method sections above of this specification. For example, the processing unit 710 may perform the steps shown in fig. 3, 4, and 5.
The memory unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.
The memory unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 700 may also communicate with one or more external devices 700' (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 760. The network adapter 760 may communicate with other modules of the electronic device 700 via the bus 730. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, as shown in fig. 8, the technical solution according to the embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiment of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the computer readable medium to perform the functions of: logging in a preset distributed cluster according to a user operation instruction, wherein the distributed cluster is designed based on a Kubernetes micro-service architecture; the distributed cluster allocates an atomic scheduling unit to the user based on an interactive computing notebook environment; the atomic scheduling unit is associated with a target server, and the target server is a notebook server; and the user sends the machine learning task to be processed to the atomic scheduling unit so that the atomic scheduling unit can calculate and generate return information based on the target server.
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus according to the description of the embodiments, or may be modified accordingly in one or more apparatuses unique from the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements, instrumentalities, or implementations described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. A machine learning task processing method based on distributed clusters is characterized by comprising the following steps:
logging in a preset distributed cluster according to a user operation instruction, wherein the distributed cluster is designed based on a Kubernetes micro-service architecture;
the distributed cluster allocates an atomic scheduling unit to the user based on an interactive computing notebook environment;
the atomic scheduling unit is associated with a target server, and the target server is a notebook server;
and the user sends the machine learning task to be processed to the atomic scheduling unit so that the atomic scheduling unit can calculate and generate return information based on the target server.
2. The method of claim 1, wherein before logging in the preset distributed cluster according to the user operation instruction, the method further comprises:
the user sends a login application to a hub of the distributed cluster through an agent;
when the login application is valid, the hub returns a login page to the user;
and the user operates on the login page to generate the user operation instruction.
3. The method of any of claims 1-2, wherein the distributed cluster allocates an atomic scheduling unit for the user based on an interactive computing notebook environment, comprising:
the user applying for a container in the distributed clustered interactive computing notebook environment;
the distributed cluster creates an atomic scheduling unit based on the container and assigns to the user.
4. The method of any of claims 1-3, wherein the atomic schedule unit associates a target server, comprising:
determining a target server in the distributed cluster according to the login information of the user;
and associating the atomic scheduling unit with a target server to carry out data mutual transmission.
5. The method of any of claims 1-4, wherein the atomic scheduling unit, after associating with the target server, further comprises:
and the hub of the distributed cluster configures task forwarding rules for the user.
6. The method of any of claims 1-5, wherein the user sending a pending machine learning task to the atomic scheduling unit, comprises:
the user sends a task request to the agent through the browser;
the agent sends the task request to the atomic scheduling unit based on a task forwarding rule;
and the atomic scheduling unit sends the code page to the user.
7. The method of any of claims 1-6, wherein the user sends a pending machine learning task to the atomic scheduling unit, further comprising:
the user sends the machine learning task to be processed to the agent based on the code page;
the agent sends the machine learning task to the atomic scheduling unit based on a task forwarding rule.
8. A distributed cluster-based machine learning task processing system, comprising:
the login module is used for logging in a preset distributed cluster according to a user operation instruction, and the distributed cluster is designed based on a Kubernetes micro-service architecture;
an allocation module for the distributed cluster allocating an atomic scheduling unit to the user based on an interactive computing notebook environment;
the association module is used for associating the atomic scheduling unit with a target server, and the target server is a notebook server;
and the computing module is used for sending the machine learning task to be processed to the atomic scheduling unit by the user so that the atomic scheduling unit can calculate and generate return information based on the target server.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202011166411.XA | 2020-10-27 | 2020-10-27 | Machine learning task processing method, device and equipment based on distributed cluster |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202011166411.XA | 2020-10-27 | 2020-10-27 | Machine learning task processing method, device and equipment based on distributed cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112286682A (en) | 2021-01-29
Family
ID=74372260
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---|
CN202011166411.XA Pending CN112286682A (en) | 2020-10-27 | 2020-10-27 | Machine learning task processing method, device and equipment based on distributed cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112286682A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110147228A (en) * | 2018-02-13 | 2019-08-20 | 北京京东尚科信息技术有限公司 | Order line editing component and method |
US20200081916A1 (en) * | 2018-09-12 | 2020-03-12 | Business Objects Software Ltd. | Predictive modeling with machine learning in data management platforms |
CN110058922A (en) * | 2019-03-19 | 2019-07-26 | 华为技术有限公司 | A kind of method, apparatus of the metadata of extraction machine learning tasks |
CN110795141A (en) * | 2019-10-12 | 2020-02-14 | 广东浪潮大数据研究有限公司 | Training task submitting method, device, equipment and medium |
Non-Patent Citations (2)
Title |
---|
He Zongping, Zhang Xiaodong, Liu Yu: "Microservice Architecture Based on the Jupyter Interactive Analysis Platform", 《计算机系统应用》 (Computer Systems & Applications), vol. 28, no. 8, 31 August 2019 (2019-08-31), pages 63-70 *
He Zongping, Zhang Xiaodong, Liu Yu: "Microservice Architecture Based on the Jupyter Interactive Analysis Platform", 《计算机系统应用》 (Computer Systems & Applications), vol. 28, no. 8, pages 147-149 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113225388A (en) * | 2021-04-22 | 2021-08-06 | 深信服科技股份有限公司 | Distributed scheduling method and device, equipment and storage medium |
CN113225388B (en) * | 2021-04-22 | 2023-05-16 | 深信服科技股份有限公司 | Distributed scheduling method and device, equipment and storage medium |
CN114281475A (en) * | 2021-12-16 | 2022-04-05 | 中国联合网络通信集团有限公司 | Application program deployment method and device and computer readable storage medium |
CN114860349A (en) * | 2022-07-06 | 2022-08-05 | 深圳华锐分布式技术股份有限公司 | Data loading method, device, equipment and medium |
CN116980420A (en) * | 2023-09-22 | 2023-10-31 | 新华三技术有限公司 | Cluster communication method, system, device, equipment and medium |
CN116980420B (en) * | 2023-09-22 | 2023-12-15 | 新华三技术有限公司 | Cluster communication method, system, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112286682A (en) | Machine learning task processing method, device and equipment based on distributed cluster | |
US11113475B2 (en) | Chatbot generator platform | |
KR20210067415A (en) | Platform providing system based on cloud computing and platform provision method using the same | |
US11157533B2 (en) | Designing conversational systems driven by a semantic network with a library of templated query operators | |
CN109302461A (en) | Information displaying, processing method, medium, system and calculating equipment | |
JP7483929B2 (en) | Method and apparatus for evaluating jointly trained models - Patents.com | |
US11095578B2 (en) | Technology for chat bot translation | |
US11354504B2 (en) | Multi-lingual action identification | |
US11985097B2 (en) | Multi-agent chatbot with multi-intent recognition | |
CN105630683A (en) | Cloud testing architecture | |
US10558689B2 (en) | Leveraging contextual information in topic coherent question sequences | |
US11227127B2 (en) | Natural language artificial intelligence topology mapping for chatbot communication flow | |
CN108701045A (en) | Client operating system screenshot method and device in computer equipment | |
US20230108637A1 (en) | Generating sorted lists of chat bot design nodes using escalation logs | |
CN113495498B (en) | Simulation method, simulator, device and medium for hardware device | |
CN111046010A (en) | Log storage method, device, system, electronic equipment and computer readable medium | |
US20220138886A1 (en) | Cognitve identification and utilization of micro-hubs in a ride sharing environment | |
US11288322B2 (en) | Conversational agents over domain structured knowledge | |
CN109902981A (en) | For carrying out the method and device of data analysis | |
US20230412475A1 (en) | Extracting corrective actions from information technology operations | |
US20230024397A1 (en) | Classification of mouse dynamics data using uniform resource locator category mapping | |
US20210166282A1 (en) | Personalized Dynamic Sub-Topic Category Rating from Review Data | |
CN107609871B (en) | Payment track reproduction method, device, system, electronic equipment and storage medium | |
US11881217B2 (en) | Solution guided response generation for dialog systems | |
US11853712B2 (en) | Conversational AI with multi-lingual human chatlogs |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 201500 room a3-5588, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai (Shanghai Hengtai Economic Development Zone); Applicant after: Qifu Shuke (Shanghai) Technology Co.,Ltd.; Address before: 201500 room a3-5588, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai (Shanghai Hengtai Economic Development Zone); Applicant before: Shanghai Qifu Information Technology Co.,Ltd. |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210129 |