CN113537507A - Machine learning system, method and electronic equipment - Google Patents

Machine learning system, method and electronic equipment

Info

Publication number
CN113537507A
Authority
CN
China
Prior art keywords
data
computing
machine learning
real
data transfer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010907359.2A
Other languages
Chinese (zh)
Inventor
李伟
陈守志
苏函晶
洪立涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010907359.2A
Publication of CN113537507A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The application provides a machine learning system, a machine learning method, an electronic device, and a computer-readable storage medium, relating to data computing and data transmission in the technical field of cloud computing. The method includes the following steps: receiving a data transfer request between any two sub-computing systems; when it is detected that the data transfer request satisfies a security condition for cross-system data transfer, authorizing execution of the data transfer operation corresponding to the data transfer request; the plurality of sub-computing systems are configured to execute the computing tasks in the corresponding processing stages according to stored data. Through the application, the data security of the processing flow of the machine learning model can be enhanced.

Description

Machine learning system, method and electronic equipment
Technical Field
The present application relates to cloud computing and artificial intelligence technologies, and in particular to a machine learning system, a machine learning method, an electronic device, and a computer-readable storage medium.
Background
Artificial Intelligence (AI) refers to the theories, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. Machine Learning (ML) is an important branch of artificial intelligence that mainly studies how computers simulate or implement human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance.
The lifecycle of a machine learning model (including modeling and application) is a multi-stage and highly flexible process involving big data processing in cloud technology, such as the data computation and data transmission of different sub-processes; it is divided into multiple sub-processes, each carried by a corresponding subsystem.
During modeling and application, data such as the parameters and features of the machine learning model are frequently adjusted according to various requirements. An erroneous adjustment can cause safety accidents in different subsystems, so that data in different sub-processes are lost or damaged, compromising data security.
Disclosure of Invention
The embodiment of the application provides a machine learning system, a machine learning method, an electronic device and a computer readable storage medium, which can enhance the data security of the processing flow of a machine learning model.
The technical scheme of the embodiment of the application is realized as follows:
An embodiment of the application provides a machine learning system, including:
a publishing system and a plurality of sub-computing systems, the plurality of sub-computing systems being isolated from each other and corresponding to different processing stages of a machine learning model; wherein:
the plurality of sub-computing systems are used for executing computing tasks in corresponding processing stages according to the stored data;
the issuing system is used for receiving a data transfer request between any two sub-computing systems; and when the data transfer request is detected to meet the security condition of cross-system data transfer, authorizing to execute the data transfer operation corresponding to the data transfer request.
An embodiment of the application provides a machine learning method, applied to a plurality of sub-computing systems, where the sub-computing systems are isolated from each other and correspond to different processing stages of a machine learning model;
the machine learning method comprises the following steps:
receiving a data transfer request between any two of the sub-computing systems;
when it is detected that the data transfer request satisfies the security condition for cross-system data transfer, authorizing execution of the data transfer operation corresponding to the data transfer request;
the plurality of sub-computing systems are used for executing the computing tasks in the corresponding processing stages according to the stored data.
In the above scheme, the method further comprises:
the plurality of sub-computing systems are isolated from each other in at least one of the following ways:
using different computing resources to perform their respective computing tasks; using different storage resources to store the data needed to perform the computing tasks, as well as the results of the computing tasks.
In the above solution, the plurality of sub-computing systems include an offline computing system, a near-line computing system, and an online computing system; the machine learning method further comprises:
the off-line computing system executes an off-line training computing task of the machine learning model according to the stored historical data;
wherein the offline training computing task includes: performing feature statistical conversion on the historical data, extracting historical samples from the historical data based on the historical features obtained by the feature statistical conversion, and training the machine learning model based on the historical samples;
the near-line computing system executes a near-line training computing task according to the real-time data;
wherein the near-line training computing task includes: performing feature statistical conversion on the real-time data, extracting real-time samples from the real-time data based on the real-time features obtained by the feature statistical conversion, and training the machine learning model based on the real-time samples;
the online computing system responds to a real-time prediction request, executes an online prediction computing task of the machine learning model, and responds to the prediction request based on an obtained prediction result.
In the above solution, the performing, by the nearline computing system, a nearline training computing task according to real-time data includes:
the online computing system acquires first data from the offline computing system through authorized first data transfer operation, and executes the online training computing task by combining the first data and the real-time data;
wherein the first data comprises at least one of:
and historical parameters of the machine learning model calculated in the off-line training calculation task and the historical characteristics calculated in the off-line training calculation task.
In the above solution, the performing the near-line training computational task by combining the first data and the real-time data includes:
the near-line computing system uses the historical features as the real-time features, and extracts real-time samples from the real-time data according to the real-time features;
training, based on the real-time samples, the machine learning model deployed with the historical parameters.
In the above solution, after the nearline computing system executes a nearline training computing task according to the real-time data, the method further includes:
the online computing system acquires second data from the near-line computing system through second data transfer operation authorized by the release system, and executes the online prediction computing task by combining the second data and the stored data to be tested;
wherein the second data comprises at least one of:
the real-time parameters of the machine learning model calculated in the near-line training calculation task and the real-time features calculated in the near-line training calculation task.
In the above solution, the executing the online prediction computing task by combining the second data and the stored data to be tested includes:
the online computing system extracts a sample to be tested from the data to be tested according to the real-time features;
and predicts the sample to be tested through the machine learning model deployed with the real-time parameters to obtain a prediction result.
In the above solution, after the nearline computing system executes a nearline training computing task according to the real-time data, the method further includes:
the near-line computing system executes a near-line prediction computing task by combining the second data and the stored data to be tested to obtain a prediction result;
wherein the second data comprises at least one of:
the real-time parameters of the machine learning model obtained by calculation in the near-line training calculation task and the real-time features obtained by calculation in the near-line training calculation task;
the online computing system responds to a real-time forecasting request, and obtains the forecasting result from the near-line computing system through a third data transfer operation authorized by the issuing system so as to respond to the forecasting request.
In the above solution, after the performing the offline training calculation task of the machine learning model, the method further includes:
the off-line computing system executes an off-line prediction computing task by combining the first data and the stored data to be tested to obtain a prediction result;
wherein the first data comprises at least one of:
historical parameters of the machine learning model calculated in the off-line training calculation task and the historical characteristics calculated in the off-line training calculation task;
the online computing system responds to a real-time forecasting request, and acquires the forecasting result from the offline computing system through a fourth data transfer operation authorized by the issuing system so as to respond to the forecasting request.
In the above scheme, the method further comprises:
when the type of the data to be transferred corresponding to the data transfer request conforms to a type permitted for transfer, determining that the data transfer request satisfies the security condition;
wherein the types permitted for transfer include the results of computing tasks.
In the above scheme, the method further comprises:
and when the data transfer time of the data transfer request meets the time interval for allowing data transfer, determining that the data transfer request meets the safety condition.
In the above scheme, the method further comprises:
and when the data transfer direction of the data transfer request accords with a set transfer direction, determining that the data transfer request meets the safety condition.
In the above scheme, the method further comprises:
the sub-computing system performs backup processing on the stored original data to obtain backup data, so that when an error occurs in the original data, recovery can be performed according to the backup data.
In the above scheme, the method further comprises:
the sub-computing system performs audit processing on set data among the stored data to obtain an audit log that includes the data access operations performed on the set data, so that when an error occurs in the set data, the data access operation that caused the error can be located according to the audit log.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
a processor, configured to implement the machine learning method provided in the embodiments of the application when executing the executable instructions stored in the memory.
An embodiment of the application provides a computer-readable storage medium storing executable instructions that, when executed by a processor, implement the machine learning method provided in the embodiments of the application.
The embodiment of the application has the following beneficial effects:
Computing tasks of different processing stages of the machine learning model are executed by a plurality of mutually isolated sub-computing systems, which effectively ensures that the computing tasks of different processing stages do not affect each other; meanwhile, when data needs to be transferred between different sub-computing systems, the publishing system performs a security audit and determines whether to authorize execution of the data transfer operation, which can enhance the data security of the processing flow of the machine learning model.
Drawings
Fig. 1 is an alternative architecture diagram of a machine learning system provided by an embodiment of the present application;
FIG. 2 is an alternative architectural diagram of an electronic device provided by an embodiment of the present application;
fig. 3A is an alternative flow chart of a machine learning method provided by an embodiment of the present application;
fig. 3B is an alternative flow chart of a machine learning method provided by the embodiment of the present application;
fig. 3C is an alternative flow chart of a machine learning method provided by the embodiment of the present application;
fig. 3D is an alternative flow chart of a machine learning method provided by the embodiment of the present application;
FIG. 4 is an alternative schematic diagram of a storage medium isolation strategy provided by an embodiment of the present application;
FIG. 5 is an alternative schematic diagram of data transfer provided by embodiments of the present application;
FIG. 6 is an alternative diagram of a computing resource isolation policy provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the application clearer, the application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", and "third" are used only to distinguish similar objects and do not denote a particular order; it is understood that "first", "second", and "third" may be interchanged in a specific order or sequence where permissible, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein. In the following description, the term "plurality" means at least two.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the application are described in further detail, the terms and expressions used in the embodiments are explained as follows.
1) Sub-computing system: performs the computing tasks in one or more processing stages of the machine learning model based on its own computing resources. The processing stages of the machine learning model include, but are not limited to, a training stage and a prediction stage.
2) Publishing system: that is, a data transfer system, used to audit data transfer requests between different sub-computing systems and determine whether to authorize execution of the corresponding data transfer operations.
3) Offline environment: executes computing tasks based on already-generated historical data; its real-time performance is poor and it provides no real-time service. In the embodiments of the application, the offline environment is constructed based on the offline computing system.
4) Near-line environment: executes computing tasks based on data generated in real time, but does not guarantee the provision of real-time services. In the embodiments of the application, the near-line environment is constructed based on the near-line computing system.
5) Online environment: responds to real-time requests, executes the corresponding computing tasks, and guarantees the provision of real-time services. In the embodiments of the application, the online environment is constructed based on the online computing system.
6) Cloud Computing: a computing model that distributes computing tasks across a resource pool formed by a large number of electronic devices, enabling various application systems (sub-computing systems) to acquire computing resources, storage resources, and information services as needed. The network that provides the resources is referred to as the "cloud". To users, the resources in the "cloud" appear infinitely expandable and can be acquired on demand, used at any time, expanded at any time, and paid for according to use.
7) Database: a collection of data that is stored together in a way that can be shared by multiple users, has as little redundancy as possible, and is independent of the application; users can add, query, update, and delete data in the database.
The lifecycle of a machine learning model is a multi-stage, particularly flexible process. To prevent the input and output data of the sub-processes from affecting one another, the input and output of computation data are usually distinguished by application parameters, such as: model ID, model time, sample time, application-scenario ID, sample table, feature table, and so on. Practitioners often need to fine-tune the model parameters, features, samples, and the like in different sub-processes and then compare the resulting effects; if a fine-tune is verified to be effective, pushing it to the full set of application scenarios requires manual updates, and these update operations are repetitive, so labor costs are high. In addition, manual updates may introduce modification errors (for example, a mistyped model ID), which easily cause serious safety accidents; data security and modeling efficiency are therefore low.
The embodiments of the application provide a machine learning system, a machine learning method, an electronic device, and a computer-readable storage medium, which can strengthen data security in the processing flow of the machine learning model and improve modeling efficiency. An exemplary application of the electronic device provided in the embodiments of the application is described below. The electronic device may be implemented as various types of terminal devices such as a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), and may also be implemented as a server.
The embodiments of the application are applicable to machine learning processes in various application scenarios. Taking a content recommendation scenario as an example, the computing task in the training stage of the machine learning model (the training computing task) can be executed in the cloud according to the user's content trigger records in an application program; then, when a real-time prediction request initiated in the application program is received, the computing task in the prediction stage of the machine learning model (the prediction computing task) is executed in the cloud or on the terminal device hosting the application program, and the prediction request is answered according to the obtained prediction result. The type of the content is not limited here; it may be, for example, an advertisement, a payment coupon, an official account, a movie, or a TV series.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of the machine learning system 100 provided in the embodiment of the application. For ease of understanding, the plurality of sub-computing systems include an offline computing system 300, a near-line computing system 400, and an online computing system 500, and a content recommendation application scenario is taken as an example. In fig. 1, the publishing system 200 includes a server 200-1, the offline computing system 300 includes a server 300-1 and a database 300-2, the near-line computing system 400 includes a server 400-1 and a database 400-2, and the online computing system 500 includes a terminal device 500-1. The storage resources in a sub-computing system may be provided by at least one of a database, a distributed file system, and a distributed memory system; a database is used here only as an example. Computing resources for the online computing system 500 may also be provided by a server; a terminal device is used here as an example.
In the processing flow of the machine learning model, the server 300-1 first executes the offline training computing task based on the historical data stored in the database 300-2, such as the user's historical content trigger records. The historical data may be generated by the online computing system 500 and obtained by the offline computing system 300 through a data transfer request (not shown in fig. 1), or may be manually stored into the offline computing system 300. The form of a content trigger record is not limited in the embodiments of the application; it may include, for example, user data, recommended content data, and a trigger result.
The server 300-1 may transfer the historical features computed in the offline training computing task, as well as the historical parameters of the machine learning model, to the database 400-2 of the near-line computing system 400 through the publishing system 200. The server 400-1 in the near-line computing system 400 uses the historical features in the database 400-2 as the real-time features, and executes the near-line training computing task based on the real-time features, the historical parameters, and the real-time data. The real-time data, such as the user's currently generated content trigger records, may likewise be generated by the online computing system 500 and obtained by the near-line computing system 400 through a data transfer request, or may be manually stored into the near-line computing system 400. A historical feature is a type of feature; it indicates, for example, which features are extracted from user data as user features and which features are extracted from content data as content features.
The server 400-1 may transfer the real-time features computed in the near-line training computing task, as well as the real-time parameters of the machine learning model, to the online computing system 500 through the publishing system 200, e.g., locally to the terminal device 500-1. The terminal device 500-1, in response to a real-time prediction request, executes the online prediction computing task of the machine learning model according to the real-time parameters and the real-time features, and responds to the prediction request based on the obtained prediction result.
The terminal device 500-1 may display intermediate results and final results of the processing flow of the machine learning model in the graphical interface 510-1. In fig. 1, taking the content recommendation scenario as an example, the interface shows a button for intelligent content recommendation, which is triggered to generate a real-time prediction request, along with the recommended content 1 and content 2, where content 1 and content 2 are the above prediction results.
It should be noted that the above processing flow of the machine learning model is only an example and does not limit the embodiments of the application. For example, the server 300-1 may instead execute an offline prediction computing task to obtain a prediction result and transfer the prediction result to the terminal device 500-1 through the publishing system 200; for another example, the server 400-1 may execute a near-line prediction computing task to obtain a prediction result and transfer it to the terminal device 500-1 through the publishing system 200.
In some embodiments, the servers involved in fig. 1 (e.g., server 200-1, server 300-1, and server 400-1) may be independent physical servers, server clusters or distributed systems formed by multiple physical servers, or cloud servers that provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain-name services, security services, CDNs, and big-data and artificial-intelligence platforms. For example, server 300-1 may provide cloud services for the offline training and offline prediction computing tasks; server 400-1 may provide cloud services for the near-line training and near-line prediction computing tasks; and when the online computing system includes a server, that server may provide cloud services for the online prediction computing task. The terminal device 500-1 may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch. The terminal device and the servers may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of the electronic device 800 provided in the embodiment of the application. The electronic device 800 may be an electronic device that provides computing resources in the publishing system, or one that provides computing resources in a sub-computing system; for ease of understanding, fig. 2 illustrates the case where the electronic device 800 is a server. The electronic device 800 shown in fig. 2 includes: at least one processor 810, a memory 840, and at least one network interface 820. The components in the electronic device 800 are coupled together by a bus system 830. It is understood that the bus system 830 is used to enable communication among these components. In addition to a data bus, the bus system 830 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 830 in fig. 2.
The processor 810 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The memory 840 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 840 optionally includes one or more storage devices physically located remote from processor 810.
The memory 840 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 840 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 840 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.
An operating system 841, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 842, configured to communicate with other computing devices via one or more (wired or wireless) network interfaces 820; exemplary network interfaces 820 include Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like.
In some embodiments, the machine learning process may be implemented in software. Fig. 2 shows the software stored in the memory 840 in the form of programs and plug-ins, including the following software modules: a sub-computing module 8431 and a publishing module 8432. These modules are logical, and thus may be arbitrarily combined or further split depending on the functions implemented. In addition, fig. 2 shows the software modules involved in the machine learning process in a centralized manner; in practice they may be deployed in different electronic devices, for example the sub-computing module 8431 in an electronic device of a sub-computing system and the publishing module 8432 in an electronic device of the publishing system. The functions of the modules are explained below.
In other embodiments, the machine learning process may be implemented in hardware. For example, a processor in the form of a hardware decoding processor may be programmed to execute the machine learning method provided in the embodiments of the application; it may be implemented, for instance, by one or more Application-Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
In some embodiments, for the case where the electronic device is a terminal device, other structures may be included on the basis of fig. 2. For example, a user interface is also included that includes one or more output devices, including one or more speakers and/or one or more visual display screens, that enable presentation of the media content. The user interface also includes one or more input devices, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
In memory 840, a presentation module may also be included for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices (e.g., display screens, speakers, etc.) associated with the user interface; in the memory 840, an input processing module may also be included for detecting one or more user inputs or interactions from one of the one or more input devices and translating the detected inputs or interactions.
The machine learning method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the electronic device provided by the embodiment of the present application.
Referring to fig. 3A, fig. 3A is an alternative flowchart of the machine learning method provided in the embodiment of the present application, and for convenience of understanding, the offline computing system and the near-line computing system are taken as examples for illustration, but this does not constitute a limitation to the embodiment of the present application.
In step 101, the plurality of sub-computing systems perform the computing tasks in the corresponding processing stages according to the stored data.
The embodiments of the application include a plurality of sub-computing systems that are isolated from each other and correspond to different processing stages of the machine learning model. The number of machine learning models is not limited, and the processing stages of the machine learning model include, but are not limited to, a training stage and a prediction stage. Each sub-computing system executes the computing task in its corresponding processing stage according to the stored data.
In some embodiments, the method further includes: the plurality of sub-computing systems are isolated from each other in at least one of the following ways: using different computing resources to perform their respective computing tasks; using different storage resources to store the data needed to perform the computing tasks, as well as the results of the computing tasks.
Here, two ways of isolation are provided. In the first way, the plurality of sub-computing systems use different computing resources to perform their respective computing tasks, i.e., different sub-computing systems use different computing resources. For example, given 10 available servers, 5 are partitioned into the offline computing system and the remaining 5 into the near-line computing system, thereby isolating the offline computing system from the near-line computing system in terms of computing resources. In this way, computing tasks are prevented from failing because different sub-computing systems contend for computing resources.
In the second way, the plurality of sub-computing systems use different storage resources to store the data required for executing the computing tasks and the results of the computing tasks; that is, different sub-computing systems correspond to different storage spaces, and each sub-computing system has data access rights only to its own storage space, not to the storage spaces of others. For example, if sub-computing system A corresponds to storage space A1 and sub-computing system B corresponds to storage space B1, then storage space A1 allows only sub-computing system A to perform data access operations and prohibits sub-computing system B from performing them (provided that sub-computing system B has not been authorized to perform a data transfer operation). A data access operation includes at least one of a read, write, modification, or query operation on the data stored in the storage space. In addition, a storage space includes at least one of a database, a distributed file system, and a distributed memory system; of course, other forms of storage space may also be used, for example a data warehouse when the volume of data to be stored is large, without limitation. In this way, different sub-computing systems are isolated in terms of storage resources, improving the security of the data they store. Depending on the actual application scenario, at least one of the two ways may be applied for isolation.
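The following is a minimal Python sketch of the storage-resource isolation described above. The patent does not define concrete APIs, so all names here (StoragePool, owner_id, access) are hypothetical.

```python
# Minimal sketch of storage-resource isolation; all names are hypothetical,
# as the patent does not prescribe any concrete API.

class StoragePool:
    """A storage space bound to exactly one sub-computing system."""

    def __init__(self, owner_id: str):
        self.owner_id = owner_id
        self._data: dict[str, object] = {}

    def access(self, requester_id: str, key: str, value=None, write=False):
        # Isolation rule: only the owning sub-computing system may read or
        # write; any other system must go through the publishing system.
        if requester_id != self.owner_id:
            raise PermissionError(
                f"{requester_id} has no data access rights to "
                f"{self.owner_id}'s storage space")
        if write:
            self._data[key] = value
        return self._data.get(key)


# Separate storage spaces per sub-computing system, e.g. A1 and B1 above.
offline_store = StoragePool(owner_id="offline")
nearline_store = StoragePool(owner_id="nearline")

offline_store.access("offline", "historical_params", {"w": [0.1, 0.2]}, write=True)
# Reading offline data directly from the near-line system raises PermissionError:
# offline_store.access("nearline", "historical_params")
```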
In some embodiments, the plurality of sub-computing systems include an offline computing system, a near-line computing system, and an online computing system, and executing the computing tasks in the corresponding processing stages according to the stored data may be implemented as follows: the offline computing system executes the offline training computing task of the machine learning model according to the stored historical data, where the offline training computing task includes performing feature statistical conversion on the historical data, extracting historical samples from the historical data based on the historical features obtained by the feature statistical conversion, and training the machine learning model based on the historical samples; the near-line computing system executes the near-line training computing task according to the real-time data, where the near-line training computing task includes performing feature statistical conversion on the real-time data, extracting real-time samples from the real-time data based on the real-time features obtained by the feature statistical conversion, and training the machine learning model based on the real-time samples; and the online computing system responds to a real-time prediction request, executes the online prediction computing task of the machine learning model, and responds to the prediction request based on the obtained prediction result.
In an embodiment of the present application, the plurality of sub-computing systems may include an offline computing system, a near-line computing system, and an online computing system for respectively constructing an offline environment, a near-line environment, and an online environment. For ease of understanding, the roles of the various sub-computing systems are illustrated in the context of content recommendations.
The offline computing system may be used to execute the offline training computing task of the machine learning model based on stored historical data collected from the online application environment (the online environment) of the machine learning model. In executing the offline training computing task, feature statistical conversion based on the historical data, i.e., feature engineering processing, is performed first to obtain the historical features; the historical features represent the types of features extracted from the historical data, and the rules of the feature statistical conversion can be preset. For example, if the historical data are historical content trigger records that include user data, recommended content data, and a trigger result, the historical features may include user features in the user data such as gender, age, city of residence, and interests, content features in the recommended content data such as content type, display position, and display duration, and the trigger result. The trigger result may indicate whether the user triggered the recommended content, or the duration for which the user engaged with it (e.g., advertisement browsing duration or game playing duration); depending on the actual application scenario, the trigger form may be a click or a long press, which is likewise not limited. Then, historical samples are extracted from the historical data according to the historical features, for example samples of "user features-content features-trigger result", and the parameters of the machine learning model are updated based on the historical samples; for ease of distinction, the updated parameters are called historical parameters. The update manner is not limited here; for example, parameters may be updated through a back-propagation and gradient-descent mechanism. In addition to the offline training computing task, the offline computing system may also execute an offline prediction computing task, as described in detail below.
The near-line computing system may be used to execute the near-line training computing task of the machine learning model according to real-time data, which are likewise collected from the online application environment of the machine learning model. Similar to the offline training computing task, in executing the near-line training computing task, feature statistical conversion based on the real-time data, i.e., feature engineering processing, is performed first to obtain the real-time features, which represent the types of features extracted from the real-time data. Then, real-time samples are extracted from the real-time data according to the real-time features, for example samples of "user features-content features-trigger result", and the parameters of the machine learning model are updated based on the real-time samples; for ease of distinction, the updated parameters are called real-time parameters. In addition to the near-line training computing task, the near-line computing system may also execute a near-line prediction computing task, as described in detail below.
The online computing system, upon receiving a real-time prediction request, executes the online prediction computing task of the machine learning model and responds to the prediction request based on the obtained prediction result. For example, in executing the online prediction computing task, feature statistical conversion may be performed on the stored data to be tested so as to extract samples to be tested from them. In the content recommendation scenario, a sample to be tested differs from the historical and real-time samples above in that it contains no trigger result. Suppose the online computing system holds the user data of the user corresponding to the prediction request and the data of multiple contents to be recommended; a sample to be tested may then include the user features extracted from the user data and the content features extracted from the data of one content to be recommended, i.e., each content to be recommended corresponds to one sample to be tested. The machine learning model then predicts each sample to be tested, yielding for each a prediction result, namely a predicted trigger result for the corresponding content to be recommended. The contents to be recommended can be filtered according to the predicted trigger results: when a predicted trigger result indicates whether triggering occurs, the contents whose predicted results indicate triggering are selected; when a predicted trigger result indicates a trigger duration, the contents whose predicted durations exceed a duration threshold are selected. Finally, the online computing system recommends the selected contents to the user corresponding to the prediction request, which can improve the content recommendation effect and enhance the user experience. In some cases, the selected contents themselves may serve as the prediction result.
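The following Python sketch illustrates, under the content-recommendation example above, how feature statistical conversion, sample extraction, and candidate filtering might fit together. The record fields, the model interface, and the 5-second threshold are assumptions for illustration, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TriggerRecord:
    user: dict                 # e.g. {"age": 30, "city": "Shenzhen"}
    content: dict              # e.g. {"type": "ad", "slot": 2}
    trigger_seconds: Optional[float] = None   # None for data to be tested

def feature_statistical_conversion(records):
    """Feature engineering: decide which fields are extracted as features."""
    user_keys = sorted({k for r in records for k in r.user})
    content_keys = sorted({k for r in records for k in r.content})
    return user_keys, content_keys        # the "features" (types of features)

def extract_samples(records, user_keys, content_keys):
    """Build 'user features-content features-trigger result' samples."""
    xs = [[r.user.get(k, 0) for k in user_keys] +
          [r.content.get(k, 0) for k in content_keys] for r in records]
    ys = [r.trigger_seconds for r in records]
    return xs, ys

# Offline training: historical data -> historical features -> historical samples.
historical = [TriggerRecord({"age": 30}, {"slot": 1}, trigger_seconds=12.0),
              TriggerRecord({"age": 25}, {"slot": 2}, trigger_seconds=0.0)]
historical_features = feature_statistical_conversion(historical)
X, y = extract_samples(historical, *historical_features)
# model.fit(X, y) would then produce the historical parameters.

# Online prediction: keep candidates whose predicted trigger duration exceeds
# a threshold (5 s is an arbitrary illustrative value).
def recommend(model, candidate_samples, threshold=5.0):
    return [s for s in candidate_samples if model.predict([s])[0] > threshold]
```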
It should be noted that the scenarios in the embodiments of the application are not limited to content recommendation; they may also include computer vision (e.g., face detection, human body detection, or vehicle detection), speech technology (e.g., speech recognition or speech synthesis), or natural language processing (e.g., entity recognition, part-of-speech tagging, machine translation, or question answering).
In some embodiments, between any of the above steps, the method further includes: the sub-computing system performs backup processing on the stored original data to obtain backup data, and when an error occurs in the original data, performs recovery according to the backup data.
A sub-computing system can back up the original data it stores to obtain backup data. For example, the online computing system may back up the parameters of the machine learning model; if, after the machine learning model goes online, an error in its parameters prevents the online prediction computing task from being executed, recovery can be performed according to the backed-up parameters, i.e., rolling back to the machine learning model before the error occurred. In this way, the fault tolerance of the sub-computing system in executing computing tasks can be improved.
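A minimal sketch of this backup-and-rollback idea follows, assuming in-memory copies and a hypothetical ModelStore class; a real system would persist the backup to the sub-computing system's own storage resource.

```python
import copy

class ModelStore:
    """Holds a sub-computing system's stored original data (here: parameters)."""

    def __init__(self, params: dict):
        self.params = params
        self._backup = None

    def update(self, new_params: dict):
        self._backup = copy.deepcopy(self.params)   # back up before modifying
        self.params = new_params

    def rollback(self):
        # Restore from the backup data when the original data has errors,
        # e.g. the online prediction task fails after the model goes live.
        if self._backup is not None:
            self.params = self._backup

store = ModelStore({"w": [0.1, 0.2]})
store.update({"w": [float("nan"), 0.2]})   # a faulty parameter update
store.rollback()                           # roll back to the pre-error model
assert store.params == {"w": [0.1, 0.2]}
```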
In some embodiments, between any of the above steps, the method further includes: the sub-computing system performs audit processing on set data among the stored data to obtain an audit log that includes the data access operations performed on the set data, so that when an error occurs in the set data, the data access operation that caused the error can be located according to the audit log.
In the embodiments of the application, a sub-computing system may also perform audit processing on set data among the stored data. The set data can be specified according to the actual application scenario and may be, for example, the parameters of the machine learning model. Audit processing means recording the data access operations performed on the set data to obtain an audit log; which operations are recorded can likewise be configured for the scenario, for example recording only the modification operations performed on the parameters of the machine learning model. Thus, when an error occurs in the set data, the data access operation that caused the error can be located according to the audit log, making it convenient for the relevant personnel to repair the error accurately. The locating may be done manually or by means of specific locating rules. In this way, errors can be traced to their source accurately and quickly.
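A minimal sketch of such audit processing, assuming the set data are model parameters and only modification operations are recorded; the log fields and the operator name are illustrative assumptions.

```python
import time

audit_log: list[dict] = []

def audited_modify(store: dict, key: str, value, operator: str):
    """Record the data access operation before applying it to the set data."""
    audit_log.append({
        "time": time.time(),
        "operator": operator,
        "operation": "modify",
        "key": key,
        "old_value": store.get(key),
        "new_value": value,
    })
    store[key] = value

model_params = {"model_id": "m-001"}          # the "set data" being audited
audited_modify(model_params, "model_id", "m-002", operator="alice")

# If "model_id" later turns out to be wrong, scanning audit_log locates the
# modification (and the operator) that introduced the error:
faulty_ops = [e for e in audit_log if e["key"] == "model_id"]
```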
In step 102, the publishing system receives a data transfer request between any two child computing systems.
Here, there may be a need for data transfer between different sub-computing systems; for example, the offline computing system needs to transfer the trained historical parameters to the near-line computing system. In the embodiments of the application, data transfer between the mutually isolated sub-computing systems is implemented by the publishing system: after receiving a data transfer request between any two sub-computing systems, the publishing system judges whether to authorize it, where the data transfer request may be a data sending request or a data acquisition request.
In step 103, when detecting that the data transfer request satisfies the security condition for cross-system data transfer, the publishing system authorizes execution of the data transfer operation corresponding to the data transfer request.
If the publishing system detects that the data transfer request satisfies the security condition for cross-system data transfer, it authorizes execution of the corresponding data transfer operation; if it detects that the security condition is not satisfied, no processing is performed, thereby ensuring data security and the orderliness of the machine learning model's processing flow to the greatest extent. It should be noted that the embodiments of the application do not limit the order in which computing tasks are executed and data transfers are performed; for example, a computing task may be executed first and data transferred according to its result, or a computing task may be executed according to data obtained through a transfer.
In addition, the publishing system can also implement backup and audit. For example, the publishing system may perform backup processing on the transferred original data to obtain backup data, so that when an error occurs in the original data, the original data can be restored according to the backup data; the publishing system may also perform audit processing on set data among the transferred data to obtain an audit log that includes the data access operations performed on the set data, so that when an error occurs in the set data, the data access operation that caused the error can be located according to the audit log.
When data is transferred, one implementation is that a sub-computing system can obtain the names and/or other recognizable information of the data stored by other sub-computing systems, but cannot obtain the storage addresses of those data. The publishing system may obtain the data list of each sub-computing system (including the names and/or other recognizable information of the stored data) in real time or periodically, and synchronize the data lists to the sub-computing systems. Taking the case of an offline computing system, a near-line computing system, and an online computing system as an example, the publishing system may synchronize the data lists of the offline computing system and the online computing system to the near-line computing system, and so on. On this basis, the publishing system can synchronize according to a set synchronization rule, for example synchronizing only the data list of the offline computing system to the near-line computing system, and only the data list of the near-line computing system to the online computing system. After receiving a data transfer request, if the publishing system detects that the request satisfies the security condition, it sends the storage address of the data to be transferred (corresponding to the data transfer request) to the corresponding sub-computing system, so that the sub-computing system transfers the data according to the storage address. For example, if the data transfer request is a data acquisition request sent by the near-line computing system for the historical parameters in the offline computing system, then upon detecting that the request satisfies the security condition, the publishing system may send the storage address of the historical parameters in the offline computing system to the near-line computing system, so that the near-line computing system acquires the historical parameters from the offline computing system according to that address.
In another implementation, a sub-computing system can obtain the storage addresses of the data stored by other sub-computing systems, but has no data transfer authority. The publishing system may obtain the data list of each sub-computing system (including the storage addresses of the stored data, and possibly also names and/or other recognizable information) in real time or periodically and synchronize it to the sub-computing systems; the synchronization rule can likewise be set freely. After receiving a data transfer request, if the publishing system detects that the request satisfies the security condition, it grants the data transfer authority to the corresponding sub-computing system, so that the sub-computing system transfers data according to that authority; the form of the data transfer authority is not limited and may be, for example, an authentication password. For example, if the data transfer request is a data acquisition request sent by the near-line computing system for the historical parameters in the offline computing system, then upon detecting that the request satisfies the security condition, the publishing system may send the data transfer authority (here, a data acquisition authority) to the near-line computing system, so that the near-line computing system acquires the historical parameters from the offline computing system according to the authority and the previously obtained storage address of the historical parameters.
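The following sketch illustrates the first implementation (the publishing system discloses the storage address only after the security check passes); the PublishingSystem class, the catalog format, and the address scheme are all hypothetical.

```python
class PublishingSystem:
    """Audits cross-system transfer requests and discloses storage addresses."""

    def __init__(self):
        # data name -> (owning sub-computing system, storage address);
        # synchronized from the sub-computing systems in real time/periodically.
        self.catalog = {
            "historical_params": ("offline", "db://offline/params/v7"),
        }

    def check_security(self, request: dict) -> bool:
        # Placeholder for the type / time-interval / direction checks
        # (see the security-condition sketch further below).
        return True

    def handle_transfer_request(self, request: dict):
        owner, address = self.catalog[request["data_name"]]
        if request.get("source") == owner and self.check_security(request):
            return address    # authorized: disclose the storage address
        return None           # not authorized: no processing is performed

publisher = PublishingSystem()
addr = publisher.handle_transfer_request(
    {"data_name": "historical_params", "source": "offline", "target": "nearline"})
# The near-line computing system then fetches the historical parameters from
# `addr` in the offline computing system's storage.
```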
In some embodiments, before step 103, the method further includes: when the type of the data to be transferred corresponding to the data transfer request conforms to a type permitted for transfer, the publishing system determines that the data transfer request satisfies the security condition, where the types permitted for transfer include the results of computing tasks.
Here, one example of a security condition is provided, namely the type permitted for transfer. The permitted types may be set uniformly for all sub-computing systems or individually for different sub-computing systems. For example, the permitted types may be uniformly set to the results of computing tasks, including the features obtained after executing a training computing task, the parameters of the machine learning model, and the prediction results obtained after executing a prediction computing task. As another example, in a scenario where the offline computing system executes the offline prediction computing task and the online computing system obtains the prediction result directly from the offline computing system, the permitted types for data transfer requests involving the offline computing system are set to include only the prediction results obtained after executing the offline prediction computing task. In this way, the data transfer process is constrained to specific types, which can improve the security of data transfer.
In some embodiments, before step 103, further comprising: and when the data transfer time of the data transfer request meets the time interval for allowing the data transfer, the issuing system determines that the data transfer request meets the safety condition.
Here, another example of the security condition, that is, the time interval during which the data transfer is permitted is provided, and next, description will be made in two specific cases. In one case, the data transfer time refers to the time when the data transfer request is issued by the sub-computing system or the time when the data transfer request is received by the issuing system. In this case, the load degrees of the plurality of sub-computing systems in each historical time interval may be counted in advance, and one or more time intervals with the lowest load degree may be used as the time intervals allowing data transfer, where the time intervals may be divided according to the actual application scenario, for example, 1 day may be divided into 24 time intervals, and each time interval is 1 hour. And if the data transfer time falls into the time interval allowing the data transfer, determining that the data transfer request meets the security condition. Therefore, data transfer can be carried out in a time interval with a low load degree, and the consequences that the data transfer duration is too long or the data transfer is wrong and the like due to too high load degree are effectively avoided.
In addition, the permitted time interval can be set according to the security requirements of the individual sub-computing systems. For example, transfers from the offline computing system to the near-line computing system are relatively frequent and low-impact, so the interval need not be restricted; such requests are determined to meet the security condition directly. Transfers from the near-line computing system to the online computing system, by contrast, strongly affect the online service that the online computing system provides, so the permitted interval can be the period in which the usage frequency of the online service falls below a frequency threshold (for example, three o'clock in the morning). With few users active in that period, even if the data transfer causes a problem in the online service, the impact is minimized and the relevant personnel can repair the service quickly.
In the second case, the data transfer time is the generation time of the data to be transferred; when the generation time falls within the permitted interval, the request is determined to meet the security condition. The permitted interval may be set according to the timeliness requirements of the machine learning model's processing flow, for example, within 1 day of the current time. This avoids transferring data that has lost its timeliness, which would degrade the model's training or prediction effect, and at the same time saves data transmission resources.
In some embodiments, before step 103, the method further includes: when the data transfer direction of the data transfer request matches a set transfer direction, the publishing system determines that the data transfer request meets the security condition.
This provides a further example of a security condition, a set transfer direction. For example, the set directions may include offline computing system to near-line computing system and near-line computing system to online computing system, or may consist only of offline computing system to online computing system. Constraining data transfer by direction improves its orderliness and avoids errors caused by transfers in illegal directions.
It should be noted that the permitted transfer type, the permitted time interval, and the set transfer direction may be applied individually or in any combination; the publishing system may additionally rely on manual review to decide whether a data transfer operation is authorized. When several machine learning models are being built, separate security conditions may be set per model to match each model's security requirements. A combined check over the three conditions is sketched below.
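Assuming concrete (and freely configurable) policy values, a combined check might look like this sketch; every name and constant here is illustrative rather than prescribed by the embodiment.

```python
from datetime import datetime, timedelta

# Example policy values; the embodiment leaves all of these configurable.
ALLOWED_TYPES = {"historical_parameters", "historical_features",
                 "real_time_parameters", "real_time_features",
                 "prediction_result"}                  # results of computing tasks
ALLOWED_DIRECTIONS = {("offline", "nearline"),
                      ("nearline", "online"),
                      ("offline", "online")}           # set transfer directions
TRANSFER_WINDOW = range(3, 5)        # low-load hours, e.g. 03:00-05:00
MAX_DATA_AGE = timedelta(days=1)     # timeliness requirement on the data

def meets_security_conditions(source, target, data_type,
                              request_time, data_generated_at):
    if data_type not in ALLOWED_TYPES:               # permitted-type condition
        return False
    if (source, target) not in ALLOWED_DIRECTIONS:   # direction condition
        return False
    # Frequent, low-impact offline -> nearline transfers skip the time window.
    if (source, target) != ("offline", "nearline") \
            and request_time.hour not in TRANSFER_WINDOW:
        return False
    if request_time - data_generated_at > MAX_DATA_AGE:  # data freshness
        return False
    return True

# e.g. meets_security_conditions("nearline", "online", "real_time_parameters",
#          datetime(2020, 9, 2, 3, 30), datetime(2020, 9, 2, 1, 0))  -> True
```

The offline-to-nearline exemption in the sketch mirrors the frequent, low-impact transfers discussed above.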
As shown in fig. 3A, the mutually isolated sub-computing systems effectively ensure that the computing tasks of different processing stages do not affect each other; meanwhile, because the publishing system decides whether to authorize each data transfer operation, risks in the machine learning model's processing flow are reduced and data security is enhanced.
In some embodiments, referring to fig. 3B, fig. 3B is an optional flowchart of the machine learning method provided in the embodiments of the present application, and for ease of understanding, the offline computing system, the near-line computing system, and the online computing system are taken as examples, and are described in conjunction with the illustrated steps.
In step 201, the offline computing system performs an offline training computational task of the machine learning model based on the stored historical data.
Here, the offline computing system performs feature statistical conversion on the stored historical data to obtain historical features, which indicate the types of features to be extracted from the historical data; the historical data itself may have been obtained from the online computing system by data transfer. The offline computing system then extracts historical samples from the historical data according to the historical features. For example, from historical content trigger records comprising user data, recommended content data, and trigger results, samples of the form "user features - content features - trigger result" can be extracted. The machine learning model is trained on these samples: the model performs prediction processing on the user features and content features in a sample, the difference (i.e., the loss value) between the prediction and the sample's trigger result is computed through the model's loss function, the difference is back-propagated through the model, and the model parameters are updated along the direction of gradient descent during back-propagation. For ease of distinction, the model parameters obtained after executing the offline training computing task are called historical parameters. A minimal sketch of this task follows.
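The sketch below uses logistic regression as a stand-in model; the record fields (user_features, content_features, trigger_result) and the hyperparameters are illustrative assumptions, not taken from the embodiment.

```python
import numpy as np

def extract_sample(record):
    # Feature statistical conversion has already fixed which fields to use;
    # a sample is "user features - content features - trigger result".
    x = np.concatenate([record["user_features"], record["content_features"]])
    return x, record["trigger_result"]           # 1 = triggered, 0 = not

def offline_train(history_records, epochs=10, lr=0.1):
    X, y = zip(*(extract_sample(r) for r in history_records))
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    params = np.zeros(X.shape[1])                # the "historical parameters"
    for _ in range(epochs):
        pred = 1.0 / (1.0 + np.exp(-X @ params))   # prediction processing
        grad = X.T @ (pred - y) / len(y)           # gradient of the logistic loss
        params -= lr * grad                        # descend along the gradient
    return params
```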
In step 202, the near-line computing system obtains first data from the offline computing system through a first data transfer operation authorized by the publishing system; the first data includes at least one of: the historical parameters of the machine learning model computed in the offline training computing task, and the historical features computed in the offline training computing task.
Here, the offline computing system or the near-line computing system may send a data transfer request corresponding to the first data to the publishing system, which authorizes the corresponding data transfer operation upon detecting that the request satisfies the security condition.
In step 203, the nearline computing system performs a nearline training computing task in conjunction with the first data and the real-time data.
Here, the real-time data may be obtained by the near-line computing system from the online computing system through data transfer; for example, the near-line computing system may acquire the real-time data in a streaming manner.
In some embodiments, performing the near-line training computing task in combination with the first data and the real-time data may be implemented as follows: the near-line computing system takes the historical features as real-time features and extracts real-time samples from the real-time data according to them; it then trains the machine learning model deployed with the historical parameters on the real-time samples.
When the first data includes both the historical features and the historical parameters, the near-line computing system can directly use the historical features as real-time features, extract real-time samples from the real-time data accordingly, and train the machine learning model deployed with the historical parameters on those samples; the resulting parameters are called real-time parameters.
In addition, when the first data includes only the historical features, the near-line computing system likewise uses them as real-time features to obtain real-time samples, and then trains the machine learning model stored locally in the near-line computing system to obtain real-time parameters. When the first data includes only the historical parameters, the near-line computing system performs feature statistical conversion on the real-time data to obtain real-time features, extracts real-time samples accordingly, and then trains the model deployed with the historical parameters. In this way, model training in the near-line computing system tracks real-time conditions, improving the accuracy of the resulting real-time parameters. These three branches are sketched below.
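In the following sketch, the trigger_result label field, the logistic stand-in model, and the derivation of features from record field names are all illustrative assumptions.

```python
import numpy as np

def nearline_train(first_data, real_time_records, stored_params,
                   epochs=5, lr=0.1):
    # Cases 1 and 2: reuse the historical features as real-time features when
    # they were transferred; case 3: derive real-time features from the
    # real-time data by feature statistical conversion (here: field names).
    features = first_data.get("historical_features")
    if features is None:
        features = sorted(real_time_records[0].keys() - {"trigger_result"})
    # Extract real-time samples according to the real-time features.
    X = np.asarray([[r[f] for f in features] for r in real_time_records],
                   dtype=float)
    y = np.asarray([r["trigger_result"] for r in real_time_records], dtype=float)
    # Deploy the historical parameters if transferred; otherwise train the
    # model already stored in the near-line computing system.
    params = np.asarray(first_data.get("historical_parameters", stored_params),
                        dtype=float)
    for _ in range(epochs):
        pred = 1.0 / (1.0 + np.exp(-X @ params))
        params -= lr * X.T @ (pred - y) / len(y)
    return params                                # the "real-time parameters"
```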
In step 204, the online computing system obtains second data from the near-line computing system through a second data transfer operation authorized by the publishing system; the second data includes at least one of: the real-time parameters of the machine learning model obtained by calculation in the near-line training calculation task and the real-time characteristics obtained by calculation in the near-line training calculation task.
Here, the near-line computing system or the online computing system may send a data transfer request corresponding to the second data to the publishing system, and the publishing system authorizes to perform the second data transfer operation corresponding to the data transfer request when detecting that the data transfer request satisfies the security condition.
In step 205, the online computing system, in response to a real-time prediction request, performs the online prediction computing task in combination with the second data and the stored data to be tested, and responds to the request based on the obtained prediction result.
Here, the data to be tested may be obtained by the online computing system from the offline computing system or the near-line computing system through data transfer, may be pre-stored locally in the online computing system, or may be carried by the received prediction request. Upon receiving a real-time prediction request, the online computing system performs the online prediction computing task in combination with the second data and the stored data to be tested, and responds to the request based on the obtained prediction result.
In some embodiments, performing the online prediction computing task in combination with the second data and the stored data to be tested may be implemented as follows: the online computing system extracts samples to be tested from the data to be tested according to the real-time features, and performs prediction processing on them through the machine learning model deployed with the real-time parameters to obtain the prediction result.
When the second data includes both the real-time features and the real-time parameters, the online computing system can directly extract samples to be tested from the data to be tested according to the real-time features. For example, if the data to be tested includes user data and the data of several contents to be recommended, user features are extracted from the user data and content features from each content according to the feature types indicated by the real-time features, and the user features are combined with each content's features to form one sample per content. The machine learning model deployed with the real-time parameters then performs prediction processing on these samples, and the prediction request is answered based on the results. For example, when the prediction result indicates whether triggering occurs, the contents whose predicted result indicates triggering are taken as the screened contents; when the prediction result indicates a trigger duration, the contents whose predicted duration exceeds a duration threshold are taken as the screened contents. Recommending the screened contents to the user corresponding to the prediction request achieves accurate content recommendation and improves user experience. A sketch of this screening step follows.
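This sketch assumes a linear scorer in place of the real model and an illustrative 30-second duration threshold; the candidate record fields are likewise assumptions.

```python
import numpy as np

def respond_to_prediction_request(user_features, candidates, real_time_params,
                                  duration_threshold=30.0):
    screened = []
    for content in candidates:
        # Combine the user features with each content's features into one
        # sample to be tested.
        sample = np.concatenate([user_features, content["features"]])
        # Prediction processing with the model deployed with real-time
        # parameters (a linear scorer stands in for the real model here).
        predicted_duration = float(sample @ real_time_params)
        if predicted_duration > duration_threshold:   # duration threshold
            screened.append((predicted_duration, content["id"]))
    screened.sort(reverse=True)        # recommend the strongest matches first
    return [content_id for _, content_id in screened]
```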
In addition, the processing manner when the second data only includes the real-time feature or only includes the real-time parameter is similar to the processing manner when the first data only includes the historical feature or only includes the historical parameter, which is not described herein again.
As shown in fig. 3B, the offline training - near-line training - online prediction approach of this embodiment effectively improves the model training effect and the accuracy of the final prediction result.
In some embodiments, referring to fig. 3C, which is an optional flowchart of the machine learning method provided in the embodiments of the present application, based on fig. 3B, after step 203, in step 301 the near-line computing system may further perform a near-line prediction computing task in combination with the second data and the stored data to be tested to obtain a prediction result; the second data includes at least one of: the real-time parameters of the machine learning model computed in the near-line training computing task, and the real-time features computed in the near-line training computing task.
In the embodiment of the present application, the near-line computing system executes the near-line prediction computing task in combination with the second data computed in the near-line training computing task and the stored data to be tested; the data to be tested may be obtained by the near-line computing system from the offline computing system or the online computing system through data transfer, or may be pre-stored locally in the near-line computing system.
In some embodiments, performing the near-line prediction computing task in combination with the second data and the stored data to be tested may be implemented as follows: the near-line computing system extracts samples to be tested from the stored data according to the real-time features, and performs prediction processing on them through the machine learning model deployed with the real-time parameters to obtain the prediction result.
Here, the process of performing the near-line prediction computing task is similar to the process of performing the online prediction computing task described above, and is not repeated here.
In fig. 3C, after step 301, in step 302 the online computing system may, in response to a real-time prediction request, obtain the prediction result from the near-line computing system through a third data transfer operation authorized by the publishing system, so as to respond to the prediction request.
Here, the near-line computing system or the online computing system may send a data transfer request corresponding to the prediction result to the publishing system, and the publishing system authorizes to perform a third data transfer operation corresponding to the data transfer request when detecting that the data transfer request satisfies the security condition.
As shown in fig. 3C, the offline training - near-line prediction - online acquisition approach of this embodiment reduces the processing pressure on the online computing system, and suits scenarios where its processing capability is low, such as when terminal devices provide the computing resources of the online computing system.
In some embodiments, referring to fig. 3D, which is an optional flowchart of the machine learning method provided in the embodiments of the present application, based on fig. 3B, after step 201, in step 401 the offline computing system may further perform an offline prediction computing task in combination with the first data and the stored data to be tested to obtain a prediction result; the first data includes at least one of: the historical parameters of the machine learning model computed in the offline training computing task, and the historical features computed in the offline training computing task.
For scenarios where timeliness is not a requirement, such as scenarios where the recommended content is a payment coupon, offline prediction may be performed by an offline computing system. For example, the offline computing system performs an offline prediction computing task to obtain a prediction result by combining the first data and the stored data to be tested, where the data to be tested may be acquired from the near-line computing system or the online computing system by the offline computing system through a data transfer method, or may be pre-stored locally in the offline computing system.
In some embodiments, performing the offline prediction computing task in combination with the first data and the stored data to be tested may be implemented as follows: the offline computing system extracts samples to be tested from the stored data based on the historical features, and performs prediction processing on them through the machine learning model deployed with the historical parameters to obtain the prediction result.
Here, when the first data includes both the historical features and the historical parameters, the offline computing system extracts samples to be tested from the data to be tested based on the historical features, and performs prediction processing on them through the machine learning model deployed with the historical parameters to obtain the prediction result. When the first data includes only the historical features, the offline computing system can extract the samples based on the historical features and perform prediction processing through the original machine learning model (i.e., the model as it was before the offline training computing task). When the first data includes only the historical parameters, the offline computing system can perform feature statistical conversion on the data to be tested, extract the samples accordingly, and perform prediction processing through the machine learning model deployed with the historical parameters.
In step 402, the online computing system, in response to a real-time prediction request, obtains the prediction result from the offline computing system through a fourth data transfer operation authorized by the publishing system, so as to respond to the prediction request.
Here, the offline computing system or the online computing system may send a data transfer request corresponding to the prediction result to the publishing system, and the publishing system authorizes to perform a fourth data transfer operation corresponding to the data transfer request when detecting that the data transfer request satisfies the security condition.
As shown in fig. 3D, computing the prediction result in advance through the offline prediction - online acquisition approach reduces the processing pressure at the moment the prediction request is received, and suits scenarios with low real-time requirements.
Next, an exemplary application of the embodiments of the present application in an actual application scenario will be described. The embodiments can be applied to various machine learning modeling scenarios, for example, content recommendation in an application program, where the content may be an advertisement, a payment coupon, a movie, or a TV series, without limitation. Next, the modules involved in the machine learning modeling process are illustrated in tabular form:
[Table: the modules involved in machine learning modeling (data warehouse, streaming data processing system, computing cluster, scheduling system, distributed storage system, online service) and the computing environments (offline, near-line, online) in which they are deployed]
Here, each of the modules involved is described in turn.
Data warehouse: a repository for storing data, typically at large scale, used mainly for data analysis; Hive is one example. In this embodiment, the data warehouse may store the application's reported logs and the processed, standardized data used for machine learning modeling. For example, if the modeling concerns a payment application, the data warehouse may store the payment application's transaction flows and the statistics generated from them.
Streaming data processing system: a system for processing the data streams generated by real-time online services, such as the Flink stream processing framework. The data warehouse above handles data that has already been generated (the historical data above), while the streaming data processing system handles data that is being generated (the real-time data above).
Computing cluster: the processing of a machine learning model often produces a large amount of data that a single electronic device can hardly handle, so a computing cluster comprising multiple electronic devices can execute the computing tasks; examples include Hadoop, Spark, and distributed TensorFlow clusters.
Scheduling system: the processing of a machine learning model often involves multi-person collaboration and multiple concurrent computations, so a scheduling system is needed to make full use of the computing cluster's resources and implement a scheduling policy for multi-user, multi-task scenarios; Yet Another Resource Negotiator (YARN) is one example.
Distributed storage system: a distributed file system or distributed memory system deployed on a computing cluster; the difference is that the former uses disks and the latter uses memory. During the execution of a computing task, data written to disk is usually stored in a distributed file system such as the Hadoop Distributed File System (HDFS); when providing real-time online services with large data volumes, data is stored in a distributed memory system such as the FeatureKV distributed memory system. That is, the distributed file system typically serves offline analysis, e.g. storing device or application logs generated at any time, while the distributed memory system typically serves online services, e.g. storing the user features used in predictive computing tasks (see the sketch after this list).
Online service: model inference or computation services provided over the network, which often involve loading machine learning models, reading features, and so on. For example, the online service may be an intelligent content recommendation service.
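To make the storage choice concrete, here is a hedged Python sketch contrasting the two media; the local path standing in for HDFS and the dictionary standing in for a FeatureKV-style store are assumptions for illustration, not real APIs.

```python
import os

def store_for_offline_analysis(log_lines):
    # Logs generated at any time are appended under a distributed file
    # system path (a local path stands in for HDFS here).
    os.makedirs("/tmp/hdfs-stand-in", exist_ok=True)
    with open("/tmp/hdfs-stand-in/app.log", "a") as f:
        f.writelines(line + "\n" for line in log_lines)

ONLINE_FEATURE_KV = {}   # stand-in for a FeatureKV-style distributed memory system

def store_for_online_service(user_id, features):
    # User features used by predictive computing tasks are kept in memory
    # so that online reads complete in milliseconds.
    ONLINE_FEATURE_KV[user_id] = features
```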
In addition, the computing environments referred to in the above table are described separately below.
Offline environment: computation (sample statistical conversion, feature statistical conversion, model training, offline prediction, and so on) is performed on already generated historical data in batch mode; real-time performance is poor, and data and models cannot be updated immediately.
Near-line environment: computation (real-time feature calculation, real-time model training, and so on) is performed on data generated in real time; data processing is completed in streaming fashion, with delays on the order of minutes or even seconds, i.e., real-time service is not guaranteed.
Online environment: online model prediction is performed in response to users' prediction requests, typically requiring computations to complete within milliseconds so that a real-time online service can be provided.
In the processing flow of a machine learning model, data in the offline environment is generally pushed to the near-line environment, data in the near-line environment to the online environment, and data in the offline environment to the online environment, so that the final effect is exerted in the online environment. To avoid the data security problems caused by the links affecting one another, the modules are partitioned by offline, near-line, and online environment. The partition covers two aspects: storage media and computing resources.
For storage media, the embodiment of the present application provides the storage medium isolation policy shown in fig. 4: each computing environment has its own separate storage space, and by controlling the data access rights of each space the system itself guarantees that data is never used across environments. For example, the access rights of the offline data warehouse and the offline distributed file system are held only by electronic devices in the offline environment. When data is transferred between computing environments, the transfer can succeed only after it has been checked and authorized by the automated publishing system. A sketch of such an access-control policy follows.
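A minimal sketch of a per-environment access-control policy, assuming illustrative storage names and a credential issued by the publishing system:

```python
STORAGE_ACL = {
    "offline-data-warehouse":    {"offline"},
    "offline-distributed-fs":    {"offline"},
    "nearline-storage":          {"nearline"},
    "online-distributed-memory": {"online"},
}

def read(environment, storage, transfer_credential=None):
    if environment in STORAGE_ACL[storage]:
        return f"local read of {storage}"
    if transfer_credential is not None:
        # A cross-environment read is only possible with a credential
        # checked and issued by the publishing system.
        return f"cross-environment read of {storage} under {transfer_credential}"
    raise PermissionError(f"{environment} may not access {storage}")
```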
The publishing system is an auditing and transfer mechanism introduced for reusing (transferring) data across computing environments; it prevents the data errors that manual operation would otherwise cause during such transfers. Auditing can be performed by setting transfer directions. The embodiment of the present application provides the data transfer diagram shown in fig. 5: the publishing system authorizes only the data transfer operations along the directions marked (i), (ii), and (iii), and no others. This is of course not a limitation of the embodiment; the set transfer directions can be chosen per machine learning system according to how and where its data is used.
Moreover, the auditing mechanism can be matched to the security requirements of each computing environment. For data transfers from the offline environment to the near-line environment, which are relatively frequent and low-impact, the publishing system need not restrict the permitted time interval, and manual review by a collaborating developer can be configured. For data transfers from the offline or near-line environment to the online environment, whose impact is large, the publishing system can restrict the permitted time interval to the online service's non-core working hours, and additional responsible persons can be assigned to manual review.
In addition, backup and audit mechanisms can be applied: the backup mechanism backs up data before it goes online so that it can be rolled back; the audit mechanism audits set data so that, when a problem occurs, its source can be traced and located conveniently. In the embodiment of the present application, these mechanisms can be deployed in the publishing system or in each computing environment.
For computing resources, the embodiment of the present application provides the computing resource isolation policy shown in fig. 6: to prevent different computing environments from competing during task scheduling and thereby keeping important tasks from completing normally and on time, separate computing clusters can be divided among the environments. For example, given a Spark cluster of 10 servers, 5 servers can be assigned to the offline environment as an offline-Spark cluster providing its computing resources, and the other 5 to the near-line environment as a nearline-Spark cluster, so that the environments are isolated at the level of computing resources; a sketch follows.
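A minimal sketch of the partition, with made-up hostnames:

```python
servers = [f"spark-{i:02d}" for i in range(10)]   # one 10-server Spark cluster

# Split into two isolated clusters, one per computing environment.
CLUSTERS = {
    "offline":  servers[:5],   # offline-Spark cluster
    "nearline": servers[5:],   # nearline-Spark cluster
}

def submit(environment, task):
    # The scheduler places a task only on its own environment's cluster, so
    # environments never compete for each other's computing resources.
    return f"run {task!r} on {CLUSTERS[environment]}"
```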
The embodiment of the present application turns a problem of manual discipline into one of system-enforced constraints, achieving the following technical effects: 1) risks in the machine learning model's processing flow are reduced and the way data is used is normalized; 2) routine unproductive work by related personnel (such as algorithm engineers) is reduced, improving efficiency and lowering labor cost; 3) the normalized data operations provide uniform input for subsequent anomaly detection and the like, facilitating extension to other applications.
Continuing with the exemplary structure of the software modules in the electronic device 800 provided by the embodiments of the present application, in some embodiments, as shown in fig. 2, when the electronic device 800 belongs to a sub-computing system, the sub-computing module 8431 is configured to execute the computing task of the corresponding processing stage according to the stored data, the plurality of sub-computing systems being isolated from each other and corresponding to different processing stages of the machine learning model. When the electronic device 800 belongs to the publishing system, the publishing module 8432 is configured to: receive a data transfer request between any two sub-computing systems; and, upon detecting that the data transfer request meets the security condition of cross-system data transfer, authorize execution of the corresponding data transfer operation.
In some embodiments, the plurality of sub-computing systems are isolated from each other by at least one of: executing their respective computing tasks with different computing resources; and using different storage resources to store the data needed for the computing tasks and the results of those tasks.
In some embodiments, the plurality of sub-computing systems includes an offline computing system, a near-line computing system, and an online computing system. When the electronic device 800 belongs to the offline computing system, the sub-computing module 8431 may be instantiated as an offline computing module, further configured to execute the offline training computing task of the machine learning model according to the stored historical data; the offline training computing task includes: feature statistical conversion based on the historical data, extraction of historical samples from the historical data based on the historical features obtained by the conversion, and training of the machine learning model based on the historical samples.
In some embodiments, when the electronic device 800 belongs to the near-line computing system, the sub-computing module 8431 may be instantiated as a near-line computing module, further configured to execute the near-line training computing task according to the real-time data; the near-line training computing task includes: feature statistical conversion based on the real-time data, extraction of real-time samples from the real-time data based on the real-time features obtained by the conversion, and training of the machine learning model based on the real-time samples.
In some embodiments, when the electronic device 800 belongs to the online computing system, the sub-computing module 8431 may be instantiated as an online computing module, further configured to respond to a real-time prediction request by executing the online prediction computing task of the machine learning model and to answer the request based on the obtained prediction result.
In some embodiments, the near-line computing module is further configured to: obtain first data from the offline computing system through a first data transfer operation authorized by the publishing system, and execute the near-line training computing task in combination with the first data and the real-time data; the first data includes at least one of: the historical parameters of the machine learning model computed in the offline training computing task, and the historical features computed in the offline training computing task.
In some embodiments, the near-line computing module is further configured to: take the historical features as real-time features and extract real-time samples from the real-time data according to them; and train the machine learning model deployed with the historical parameters based on the real-time samples.
In some embodiments, the online computing module is further configured to: acquire second data from the near-line computing system through a second data transfer operation authorized by the publishing system, and execute the online prediction computing task in combination with the second data and the stored data to be tested; the second data includes at least one of: the real-time parameters of the machine learning model computed in the near-line training computing task, and the real-time features computed in the near-line training computing task.
In some embodiments, the online computing module is further configured to: extract samples to be tested from the data to be tested according to the real-time features; and perform prediction processing on them through the machine learning model deployed with the real-time parameters to obtain the prediction result.
In some embodiments, the near-line computing module is further configured to: execute the near-line prediction computing task in combination with the second data and the stored data to be tested to obtain a prediction result, the second data including at least one of the real-time parameters of the machine learning model computed in the near-line training computing task and the real-time features computed in that task; the online computing module is further configured to: in response to a real-time prediction request, obtain the prediction result from the near-line computing system through a third data transfer operation authorized by the publishing system, so as to respond to the request.
In some embodiments, the offline computing module is further configured to: execute the offline prediction computing task in combination with the first data and the stored data to be tested to obtain a prediction result, the first data including at least one of the historical parameters of the machine learning model computed in the offline training computing task and the historical features computed in that task; the online computing module is further configured to: in response to a real-time prediction request, obtain the prediction result from the offline computing system through a fourth data transfer operation authorized by the publishing system, so as to respond to the request.
In some embodiments, the publishing module 8432 is further configured to: when the type of the data to be transferred corresponding to the data transfer request accords with the type of the allowed transfer, determining that the data transfer request meets the safety condition; wherein the type of allowable transitions includes results of the computing task.
In some embodiments, the publishing module 8432 is further configured to: and when the data transfer time of the data transfer request accords with the time interval for allowing the data transfer, determining that the data transfer request meets the safety condition.
In some embodiments, the publishing module 8432 is further configured to: and when the data transfer direction of the data transfer request accords with the set transfer direction, determining that the data transfer request meets the safety condition.
In some embodiments, the sub-computing module 8431 is further configured to: perform backup processing on the stored original data to obtain backup data, and restore from the backup data when the original data has errors.

In some embodiments, the sub-computing module 8431 is further configured to: audit set data among the stored data to obtain an audit log including the data access operations executed on that data, so that, when the set data has errors, the data access operation that caused them can be located through the audit log. A combined sketch of the backup and audit mechanisms follows.
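The sketch below uses an illustrative in-memory store; the class structure is an assumption, not the embodiment's actual implementation.

```python
import copy
import time

class SubComputingStore:
    def __init__(self, data):
        self.data = data
        self.backup = None
        self.audit_log = []          # (timestamp, operation, key) entries

    def backup_before_online(self):
        # Back up the original data before it goes online, for rollback.
        self.backup = copy.deepcopy(self.data)

    def rollback(self):
        # Restore from the backup data when the original data has errors.
        if self.backup is not None:
            self.data = copy.deepcopy(self.backup)

    def access(self, operation, key, value=None):
        # Every access to set data is logged so that, when the data has
        # errors, the operation that caused them can be located.
        self.audit_log.append((time.time(), operation, key))
        if operation == "write":
            self.data[key] = value
        return self.data.get(key)
```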
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the machine learning method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, a machine learning method as illustrated in fig. 3A, 3B, 3C, or 3D. Note that the computer includes various computing devices including a terminal device and a server.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the following technical effects can be achieved through the embodiments of the present application:
1) The mutually isolated sub-computing systems effectively ensure that the computing tasks of different processing stages do not affect each other; meanwhile, the publishing system decides whether to authorize each data transfer operation, reducing risks in the machine learning model's processing flow and enhancing data security. Isolation can be realized by computing resources, storage resources, or both, which improves flexibility.
2) The publishing system can audit data transfer requests by the permitted transfer type, the permitted data transfer time interval, and the set transfer direction, individually or in combination, improving the flexibility and security of data transfer and reducing risk in the transfer process.
3) In the processing flow of the machine learning model, modes such as offline training - near-line training - online prediction, offline training - near-line prediction, and offline training - offline prediction can be used, improving processing flexibility; the mode can be chosen according to the actual application scenario.
4) Based on the backup mechanism, data can be rolled back quickly when errors occur, minimizing loss; based on the audit mechanism, the source of erroneous data can be traced quickly so that related personnel can repair it as soon as possible.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. A machine learning system, comprising:
a publishing system and a plurality of sub-computing systems, the plurality of sub-computing systems being isolated from each other and corresponding to different processing stages of the machine learning model; wherein,
the plurality of sub-computing systems are used for executing computing tasks in corresponding processing stages according to the stored data;
the publishing system is used for receiving a data transfer request between any two sub-computing systems; and when detecting that the data transfer request meets the security condition of cross-system data transfer, authorizing execution of the data transfer operation corresponding to the data transfer request.
2. The machine learning system of claim 1,
the plurality of sub-computing systems configured to be isolated from each other by at least one of:
performing respective computing tasks using different computing resources; different storage resources are used to store data needed to perform the computing task, as well as the results of the computing task.
3. The machine learning system of claim 1,
the plurality of sub-computing systems comprises an offline computing system, a near-line computing system, and an online computing system;
the off-line computing system is further used for executing an off-line training computing task of the machine learning model according to the stored historical data;
wherein the offline training computational task comprises: performing feature statistical conversion on the historical data, extracting historical samples from the historical data based on the historical features obtained by the feature statistical conversion, and training the machine learning model based on the historical samples;
the near-line computing system is also used for executing a near-line training computing task according to the real-time data;
wherein the near-line training computational task comprises: performing feature statistical conversion on the real-time data, extracting real-time samples from the real-time data based on the real-time features obtained by the feature statistical conversion, and training the machine learning model based on the real-time samples;
the online computing system is further used for responding to a real-time prediction request, executing an online prediction computing task of the machine learning model, and responding to the prediction request based on the obtained prediction result.
4. The machine learning system of claim 3,
the near-line computing system is further used for acquiring first data from the offline computing system through a first data transfer operation authorized by the publishing system, and executing the near-line training computing task by combining the first data and the real-time data;
wherein the first data comprises at least one of:
the historical parameters of the machine learning model calculated in the off-line training calculation task, and the historical characteristics calculated in the off-line training calculation task.
5. The machine learning system of claim 4,
the near-line computing system is further configured to:
taking the historical characteristics as real-time characteristics, and extracting real-time samples from the real-time data according to the real-time characteristics;
training the machine learning model that deploys the historical parameters based on the real-time samples.
6. The machine learning system of claim 4,
the online computing system is further used for acquiring second data from the near-line computing system through a second data transfer operation authorized by the release system, and executing the online prediction computing task by combining the second data and the stored data to be tested;
wherein the second data comprises at least one of:
the real-time parameters of the machine learning model calculated in the near-line training calculation task and the real-time features calculated in the near-line training calculation task.
7. The machine learning system of claim 6,
the online computing system is further configured to:
extracting a sample to be tested from the data to be tested according to the real-time features;
and performing prediction processing on the sample to be tested through the machine learning model deployed with the real-time parameters to obtain a prediction result.
8. The machine learning system of claim 4,
the near-line computing system is also used for executing a near-line prediction computing task by combining the second data and the stored data to be tested to obtain a prediction result;
wherein the second data comprises at least one of:
the real-time parameters of the machine learning model obtained by calculation in the near-line training calculation task and the real-time features obtained by calculation in the near-line training calculation task;
the online computing system is further configured to obtain the prediction result from the near-line computing system through a third data transfer operation authorized by the publishing system in response to a real-time prediction request, so as to respond to the prediction request.
9. The machine learning system of claim 3,
the off-line computing system is also used for executing an off-line prediction computing task by combining the first data and the stored data to be tested to obtain a prediction result;
wherein the first data comprises at least one of:
historical parameters of the machine learning model calculated in the off-line training calculation task and the historical characteristics calculated in the off-line training calculation task;
the online computing system is further configured to obtain the prediction result from the offline computing system through a fourth data transfer operation authorized by the publishing system in response to a real-time prediction request, in response to the prediction request.
10. The machine learning system of any one of claims 1 to 9,
the publishing system is further configured to perform at least one of the following processes:
when the type of the data to be transferred corresponding to the data transfer request accords with the type of the allowed transfer, determining that the data transfer request meets the safety condition;
wherein the type of permitted transfer includes a result of a computing task;
when the data transfer time of the data transfer request meets a time interval allowing data transfer, determining that the data transfer request meets the safety condition;
and when the data transfer direction of the data transfer request accords with a set transfer direction, determining that the data transfer request meets the safety condition.
11. The machine learning system of any one of claims 1 to 9,
the sub-computing system is further used for performing backup processing on the stored original data to obtain backup data, so as to restore from the backup data when the original data has errors.
12. The machine learning system of any one of claims 1 to 9,
the sub-computing system is further used for auditing set data among the stored data to obtain an audit log including the data access operations executed on the set data, so as to locate, according to the audit log, the data access operation that caused the errors when the set data has errors.
13. A machine learning method is applied to a plurality of sub-computing systems which are isolated from each other and correspond to different processing stages of a machine learning model;
the machine learning method comprises the following steps:
receiving a data transfer request between any two of the sub-computing systems;
when the data transfer request is detected to meet the security condition of cross-system data transfer, authorizing to execute the data transfer operation corresponding to the data transfer request;
the plurality of sub-computing systems are used for executing the computing tasks in the corresponding processing stages according to the stored data.
14. An electronic device, comprising:
a memory for storing executable instructions;
a processor, when executing executable instructions stored in the memory, implementing the machine learning method of claim 13.
15. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the machine learning method of claim 13.
CN202010907359.2A 2020-09-02 2020-09-02 Machine learning system, method and electronic equipment Pending CN113537507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010907359.2A CN113537507A (en) 2020-09-02 2020-09-02 Machine learning system, method and electronic equipment


Publications (1)

Publication Number Publication Date
CN113537507A true CN113537507A (en) 2021-10-22

Family

ID=78094220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010907359.2A Pending CN113537507A (en) 2020-09-02 2020-09-02 Machine learning system, method and electronic equipment

Country Status (1)

Country Link
CN (1) CN113537507A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685119A (en) * 2012-04-28 2012-09-19 上海杰之能信息科技有限公司 Data transmitting/receiving method, data transmitting/receiving device, transmission method, transmission system and server
US9400731B1 (en) * 2014-04-23 2016-07-26 Amazon Technologies, Inc. Forecasting server behavior
CN105656850A (en) * 2014-11-13 2016-06-08 腾讯数码(深圳)有限公司 Data processing method, and related device and system
CN108475252A (en) * 2015-12-26 2018-08-31 英特尔公司 Technology for distributed machines study
CN107609652A (en) * 2017-08-30 2018-01-19 第四范式(北京)技术有限公司 Perform the distributed system and its method of machine learning
CN111210020A (en) * 2019-11-22 2020-05-29 清华大学 Method and system for accelerating distributed machine learning
CN111125518A (en) * 2019-12-10 2020-05-08 海尔优家智能科技(北京)有限公司 System and method for recommending household appliance information
CN111427669A (en) * 2020-04-27 2020-07-17 安谋科技(中国)有限公司 Method, apparatus, medium, and system for managing virtual machines on computer device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357292A (en) * 2021-12-29 2022-04-15 阿里巴巴(中国)有限公司 Model training method, device and storage medium
CN114357292B (en) * 2021-12-29 2023-10-13 杭州溢六发发电子商务有限公司 Model training method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40055180

Country of ref document: HK

SE01 Entry into force of request for substantive examination