WO2019041708A1

WO2019041708A1 - Classification model training system and realisation method therefor

Info

Publication number: WO2019041708A1
Application number: PCT/CN2017/120174
Authority: WO
Inventors: 王毅; 张文明; 陈少杰
Original assignee: 武汉斗鱼网络科技有限公司
Priority date: 2017-08-29
Filing date: 2017-12-29
Publication date: 2019-03-07
Also published as: CN107562859B; CN107562859A

Abstract

A classification model training system and a realisation method therefor. The realisation method comprises: S1, creating a front-end management and display interface (1) of a SPARK algorithm classification model training system, and defining a front-end interaction request interface (101) of the front-end management and display interface (1); S2, creating a back-end service data source system (2) of the SPARK algorithm classification model training system; S3, based on the front-end interaction request interface (101) of the front-end management and display interface (1), creating a back-end service control interface (3), and establishing a correlation between the back-end service control interface (3) and the front-end interaction request interface (101); and S4, creating a SPARK algorithm training and optimised classification model based internal service logic in the back-end service control interface (3). By means of the classification model training system, an operation flow of classification model training can be effectively simplified, so as to effectively reduce the labour intensity for a developer and improve the development efficiency.

Description

Classification model training system and implementation method thereof

Cross reference

The present application claims priority to an earlier application (priority date 2017-08-29), the content of which is hereby incorporated by reference in its entirety.

Technical field

The present invention relates to the field of information processing technologies, and in particular, to a classification model training system and an implementation method thereof.

Background art

At present, SPARK.MLlib, the machine learning library of SPARK, has become a common machine learning tool. Classification algorithms belong to supervised learning, so training a classification algorithm model with SPARK.MLlib requires a large number of labeled samples to be prepared in advance and divided into training samples and test samples; SPARK.MLlib then uses these labeled samples to train the classification algorithm model. In this process, the samples and model parameters need to be continuously adjusted to optimize the classification algorithm model.

The commonly used method of optimizing the classification model requires manual addition of training samples to cover all the features of the model and increase the accuracy and recall rate of the classification model. Manually adding training samples and optimizing model parameters requires a lot of time and effort on the data preparation and program operation, resulting in low development efficiency.

Summary of the invention

In order to overcome the above problems or at least partially solve the above problems, the present invention provides a classification model training system and an implementation method thereof, so as to effectively simplify the training operation process of the classification model, thereby effectively reducing the labor intensity of the developer and improving the development efficiency.

According to an aspect of the present invention, a method for implementing a classification model training system is provided, including:

S1, creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service;

S2, creating a back-end service data source system based on the internal business data requirements of training the classification model with the SPARK algorithm;

S3, creating a backend service control interface based on the front end interaction request interface of the front-end management display interface, and establishing a correspondence between the backend service control interface and the front end interaction request interface;

S4, creating internal business logic of the backend service control interface, the internal business logic including: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front end interaction request interface, creating an initial classification model by calling the backend service data source system, and training and predictively optimizing the initial classification model to obtain a target classification model.

The step of creating a front-end management display interface in the step S1 further includes:

Separately creating a training management interface, an optimization management interface, and a classification model management interface of the classification model, wherein the training management interface is used to provide external management support for the training phase of training the classification model with the SPARK algorithm, the optimization management interface is used to provide external management support for the prediction optimization phase of training the classification model with the SPARK algorithm, and the classification model management interface is used to provide external management support for the target classification model;

Correspondingly, the front end interaction request interface comprises: a front end training interaction request interface, a front end optimization interaction request interface, and a front end model management interaction request interface.

The training management interface includes at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface;

The optimization management interface includes at least: a classification model optimization strategy selection interface, a classification model optimization standard setting interface, and a prediction optimization data source setting interface;

The classification model management interface includes at least: a classification model version management interface and a classification model effect presentation interface.

The step of creating a backend service data source system in step S2 further includes:

S21, importing the SPARK machine learning library SPARK-MLlib, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database;

S22, storing the prepared training sample data in the training data source library, and storing the prepared prediction optimization sample data in the prediction optimization data source library.

The step of S3 further includes:

S31. Create a backend training management control interface based on the front end training interaction request interface, and establish a correspondence between the front end training interaction request interface and the back end training management control interface.

S32. Create a backend optimization management control interface based on the front end optimization interaction request interface, and establish a correspondence between the front end optimization interaction request interface and the backend optimization management control interface.

S33. Create a backend model management control interface based on the front end model management interaction request interface, and establish a correspondence between the front end model management interaction request interface and the backend model management control interface.

The step of S4 further includes at least:

S41: Create internal training service logic of the backend training management control interface, where the internal training service logic includes: based on the internal business logic flow of training a classification model with the SPARK algorithm and on the front end training interaction request interface, creating an initial classification model by calling the SPARK-MLlib, the training data source library, and the model system metadata database, and training the initial classification model to obtain a classification model to be optimized;

S42: Create internal optimization service logic of the backend optimization management control interface, where the internal optimization service logic includes: based on the internal business logic flow of optimizing a classification model with the SPARK algorithm and on the front end optimization interaction request interface, performing prediction optimization on the classification model to be optimized by calling the SPARK-MLlib, the prediction optimization data source library, and the model system metadata database, and acquiring the target classification model.

The step of creating the internal training service logic of the backend training management control interface in the step S41 further includes:

S411: Create, based on the data pre-processing algorithms included in the data pre-processing database, pre-processing internal business logic corresponding to each of the data pre-processing algorithms;

S412: Create, based on the classification algorithms included in the SPARK-MLlib, internal business logic for generating a classification model corresponding to each of the classification algorithms;

S413: Create, based on the setting data of the training management interface, internal business logic for training the classification model by calling the training data source library, the pre-processing internal business logic, and the internal business logic for generating a classification model.
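Steps S411 to S413 can be pictured as two lookup tables plus one composing function. The following is a minimal sketch under stated assumptions, not the implementation of the application: the preprocessing names, the `MajorityClassifier` toy model (standing in for a SPARK-MLlib classifier), the settings keys, and `train_model` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str]  # (label, text): stand-in for one labeled training record

# S411: one preprocessing routine per preprocessing algorithm (hypothetical names).
PREPROCESSORS: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "lowercase": lambda rows: [(label, text.lower()) for label, text in rows],
    "drop_empty": lambda rows: [(label, text) for label, text in rows if text.strip()],
}


class MajorityClassifier:
    """Toy stand-in for a SPARK-MLlib classifier: predicts the most common label."""

    def __init__(self) -> None:
        self.label = None

    def fit(self, rows: List[Row]) -> "MajorityClassifier":
        if not rows:
            raise ValueError("no training rows left after preprocessing")
        self.label = Counter(label for label, _ in rows).most_common(1)[0][0]
        return self

    def predict(self, text: str) -> str:
        return self.label


# S412: one model factory per classification algorithm (hypothetical name).
MODEL_FACTORIES: Dict[str, Callable[[], MajorityClassifier]] = {
    "majority": MajorityClassifier,
}


def train_model(settings: dict, data_sources: Dict[str, List[Row]]) -> MajorityClassifier:
    """S413: compose data source, preprocessing steps, and model factory."""
    rows = list(data_sources[settings["data_source"]])
    for step in settings["preprocessing"]:
        rows = PREPROCESSORS[step](rows)
    return MODEL_FACTORIES[settings["algorithm"]]().fit(rows)
```

With this layout, supporting a new preprocessing algorithm or classification algorithm only requires a new table entry; the composing function stays unchanged.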

The step of creating the internal optimization service logic of the backend optimization management control interface in the step S42 further includes:

S421: Select, according to the request data of the front end optimization interaction request interface, the prediction optimization data source and the prediction optimization constraint condition used to predictively optimize the classification model to be optimized;

S422: Create, based on the internal business logic flow of optimizing a classification model with the SPARK algorithm, the data access and prediction processing implementation logic for predictively optimizing the classification model to be optimized;

S423: Create data correction internal business logic for the process of predictively optimizing the classification model to be optimized, where the data correction internal business logic includes: extracting, based on the prediction optimization result of the classification model to be optimized, the records that the classification model predicted incorrectly, and performing data correction;

S424: Create data update internal business logic for the process of predictively optimizing the classification model to be optimized, where the data update internal business logic includes: extracting classification model elements based on the data-corrected model system metadata database, and importing them into the next partition of the prediction optimization data source library;

S425: Create, based on the prediction optimization constraint condition, internal business logic for stopping optimization of the classification model, where this internal business logic includes: re-specifying the prediction optimization data source library of the classification model and repeating the prediction and data optimization business logic until the classification model parameters reach the prediction optimization constraint condition, at which point model optimization stops.
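The loop formed by steps S421 to S425 can be sketched as follows. This is an illustrative sketch only: the `LookupModel` toy classifier, the `true_label` callback (standing in for whatever correction the data correction logic performs), and the choice to feed corrected records back into the training set are assumptions, not details given in the application.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (text, label)


class LookupModel:
    """Toy classifier: remembers seen texts, otherwise returns a default label."""

    def __init__(self, default: str) -> None:
        self.default = default
        self.table = {}

    def fit(self, records: List[Record]) -> None:
        for text, label in records:
            self.table[text] = label

    def predict(self, text: str) -> str:
        return self.table.get(text, self.default)


def optimize(
    model: LookupModel,
    training: List[Record],
    partitions: List[Tuple[str, List[str]]],
    true_label: Callable[[str], str],
    min_accuracy: float,
) -> List[Tuple[str, float]]:
    """Predict partition by partition until the accuracy constraint is met."""
    history = []
    for day, texts in partitions:
        if not texts:
            continue
        # S422: access the selected data source and predict.
        predictions = [(text, model.predict(text)) for text in texts]
        # S423: extract wrongly predicted records and correct their labels.
        corrected = [(t, true_label(t)) for t, p in predictions if p != true_label(t)]
        accuracy = 1 - len(corrected) / len(texts)
        history.append((day, accuracy))
        # S425: stop once the prediction optimization constraint is reached.
        if accuracy >= min_accuracy:
            break
        # S424 (assumed feedback path): reuse corrected data before the next partition.
        training.extend(corrected)
        model.fit(training)
    return history
```

The returned history records one accuracy value per processed partition, which is the kind of information a model effect display could draw on.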

According to another aspect of the present invention, a classification model training system is provided, comprising:

a front-end management display interface, configured to perform external setting management of the classification model training process, the classification model prediction optimization process, and classification model management, wherein the front-end management display interface includes a front end interaction request interface used for information interaction between external management and the back-end service;

a backend service data source system, configured to provide, in response to internal business logic call requests for training the classification model with the SPARK algorithm, a machine learning data source of the SPARK algorithm, a training data source, a prediction optimization data source, and a model system metadata database;

a backend service control interface unit, configured to establish a correspondence between the front end interaction request interface and a backend service business logic call;

a backend service business processing unit, configured to, based on the business logic requirements of training the classification model with the SPARK algorithm and on the front end interaction request interface, create an initial classification model by calling the backend service data source system, and perform training and prediction optimization on the initial classification model to obtain the target classification model.

According to still another aspect of the present invention, there is provided a classification model training method according to a classification model training system as described above, comprising:

Acquiring, through the front end interaction request interface and the backend service control interface unit, the build setting data, training process setting data, and optimization process setting data of the classification model that are input through the front-end management display interface;

Based on the build setting data of the classification model, internally calling the machine learning data source of the SPARK algorithm through the backend service business processing unit, constructing an initial classification model, and storing it in the model system metadata database;

Based on the training process setting data of the classification model, internally calling the training data source through the backend service business processing unit, and training the initial classification model with the SPARK algorithm to obtain a classification model to be optimized;

Based on the optimization process setting data of the classification model, internally calling the prediction optimization data source through the backend service business processing unit, and predictively optimizing the classification model to be optimized with the SPARK algorithm to obtain a target classification model.
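The method above moves a model through three stages (initial, to be optimized, target) while recording it in the model system metadata database. A minimal sketch of that bookkeeping follows, assuming an in-memory SQLite database as a stand-in for the metadata database; the table name, column names, and stage labels are hypothetical.

```python
import sqlite3

# Stage labels mirror the three model states named in the method (labels are assumed).
STAGES = ("initial", "to_be_optimized", "target")


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the model system metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE model_meta ("
        "name TEXT PRIMARY KEY, algorithm TEXT NOT NULL, stage TEXT NOT NULL)"
    )
    return conn


def register_model(conn: sqlite3.Connection, name: str, algorithm: str) -> None:
    """Store a newly constructed initial classification model."""
    conn.execute(
        "INSERT INTO model_meta (name, algorithm, stage) VALUES (?, ?, ?)",
        (name, algorithm, STAGES[0]),
    )


def advance_stage(conn: sqlite3.Connection, name: str) -> str:
    """Move a model to its next stage after training or prediction optimization."""
    row = conn.execute(
        "SELECT stage FROM model_meta WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    index = STAGES.index(row[0])
    if index == len(STAGES) - 1:
        raise ValueError(f"{name} is already the target classification model")
    conn.execute(
        "UPDATE model_meta SET stage = ? WHERE name = ?", (STAGES[index + 1], name)
    )
    return STAGES[index + 1]
```

Calling `advance_stage` once after training and once after prediction optimization yields the target classification model record.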

According to still another aspect of the present invention, an apparatus for implementing a classification model training system is provided, including:

At least one processor;

At least one memory communicatively coupled to the processor, wherein:

The memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the implementation method of a classification model training system as described above.

According to still another aspect of the present invention, a non-transitory computer readable storage medium is provided, wherein the non-transitory computer readable storage medium stores computer instructions that cause a computer to perform the implementation method of the classification model training system according to any of the above.

According to still another aspect of the present invention, a computer program product is provided, comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the implementation method of the classification model training system as described above.

This application proposes a classification model training system and an implementation method thereof. By integrating the training data, SPARK-MLlib model training, the addition of new-feature training samples, and the optimization of model parameters that are involved in the classification model training process, a classification model training system based on SPARK-MLlib is established. To train a classification model with this system, it is only necessary to create a classification model project on the front-end management display interface and to specify the basic flow of training and optimizing the model, such as the training data source, ETL algorithm, model algorithm, and parameters. The automatic creation, training, and prediction optimization of the classification model can then be realized, which effectively simplifies the training operation process of the classification model, thereby effectively reducing the labor intensity of the developer and improving the development efficiency.

DRAWINGS

FIG. 1 is a flowchart of a method for implementing a classification model training system according to an embodiment of the present application;

FIG. 2 is a flowchart of a process for creating a backend service data source system according to an embodiment of the present application;

FIG. 3 is a flowchart of a process for creating a backend service control interface according to an embodiment of the present application;

FIG. 4 is a flowchart of a process of creating an internal service logic of a backend service control interface according to an embodiment of the present application;

FIG. 5 is a flowchart of a process for creating an internal training service logic of a backend training management control interface according to an embodiment of the present application;

FIG. 6 is a flowchart of a process for creating an internal optimization service logic of a backend optimization management control interface according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a classification model training system according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a backend service data source system according to an embodiment of the present application;

FIG. 9 is a flowchart of a method for training a classification model by using the classification model training system of the present application;

FIG. 10 is a flowchart of a processing procedure of a SPARK algorithm training classification model according to an embodiment of the present application;

FIG. 11 is a flowchart of a processing procedure of a SPARK algorithm predictive optimization classification model according to an embodiment of the present application;

FIG. 12 is a structural block diagram of an apparatus for implementing a classification model training system according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

As an aspect of the embodiment of the present invention, the present embodiment provides a method for implementing a classification model training system. Referring to FIG. 1, which is a flowchart of a method for implementing a classification model training system according to an embodiment of the present invention, the method includes:

S1, creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service.

It can be understood that the goal of the present embodiment is to establish a classification model training system based on the SPARK algorithm. The whole system is an automatic classification model training and optimization system with a front-end management display interface at the front end and a service management system at the back end. The user sets the algorithm and parameters of the classification model training through the front-end management display interface. The back-end service management system calls the corresponding data source according to the front-end settings, constructs the classification model by using the built-in SPARK algorithm, and calls the training data source and the prediction optimization data source to perform classification model training and prediction optimization, thereby obtaining the target classification model.

In step S1, the external management requirements of generating, training, and predictively optimizing a classification model based on the SPARK algorithm are considered, that is, the data sources that need to be prepared externally, the algorithms and parameters that need to be set, and so on. The corresponding front-end management display interface is created, and a management interface is set on it for each management requirement. In addition, because the front-end management display interface needs to exchange data with the back-end service management system in order to pass the algorithms and parameters set by the user to the back-end service management system, the corresponding front-end interaction request interface is defined in the front-end management display interface according to the interaction requirements between external management and the back-end service.

When the user sets up the classification model training process through the front-end management display interface, the front-end management display interface transmits the user settings to the back-end service management system through the front-end interaction request interface. The front-end management display interface interacts with the back-end service management system through standard REST APIs.
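A REST interaction of this kind amounts to sending the user settings as the body of an HTTP request. The sketch below only builds such a request and does not send it. The port 8180 and the /create-model path are taken from the examples given later in this description; the JSON body format, the POST method, and the setting field names are assumptions.

```python
import json
from urllib.request import Request


def build_create_model_request(host: str, settings: dict) -> Request:
    """Wrap user settings in an HTTP request for the back-end service (not sent here)."""
    body = json.dumps(settings).encode("utf-8")
    return Request(
        url=f"http://{host}/create-model",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed; the description does not name the HTTP method
    )


# Hypothetical settings as they might be entered on the front-end interface.
request = build_create_model_request(
    "127.0.0.1:8180",
    {"algorithm": "naive_bayes", "training_data_source": "train_db.samples"},
)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running back-end service.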

S2, creating a back-end service data source system based on the internal business data requirements of training the classification model with the SPARK algorithm.

It can be understood that constructing, training, and predictively optimizing the classification model requires the corresponding algorithms and processes to be called, and requires training data and prediction optimization data for training and predictively optimizing the classification model. This step therefore creates a system that provides these algorithms, processes, and data.

The corresponding system is created according to the algorithms, processes, and data needed for constructing, training, and predictively optimizing the classification model. When the classification model training system based on the SPARK algorithm is created, each data unit is created according to the algorithms, processes, and data required by the SPARK algorithm, that is, the internal business data requirements; these data units together form the back-end service data source system.

The back-end service data source system is a relatively important part, carrying the functions of the entire model training process control, data storage, model optimization strategy and providing data to the front-end display.

Optionally, referring to FIG. 2, which is a flowchart of a process for creating a backend service data source system according to an embodiment of the present invention, the step of creating the backend service data source system in step S2 further includes:

S21, importing the SPARK machine learning library SPARK-MLlib, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library, and storing the prepared prediction optimization sample data in the prediction optimization data source library.

It can be understood that this step performs data source preparation. The back-end service management system is designed with the Spring Boot microservices framework; system metadata is stored in a MySQL database; model training and optimization are implemented with SPARK-MLlib; and the data sources used to train the model are stored in Hive. Because the classification model is trained with the SPARK algorithm, the machine learning library of the SPARK algorithm is needed. Therefore, SPARK-MLlib is first imported, and then the training data source, the model system metadata source, and the prediction optimization data source are prepared separately, namely:

First, the labeled training data source is prepared. The manually prepared labeled training sample data is stored in a Hive database, in a table created with a label column (lable) and a data column (data); this database is called the training data source library.

Second, the model system metadata source is prepared. A MySQL database is used to store model metadata information and is called the model system metadata database.

Third, the prediction optimization data source is prepared. The prediction optimization sample data is used to continuously optimize the classification model, and its store is called the prediction optimization data source (Hive and MySQL). The Hive prediction optimization data source is a Hive data table partitioned by day, which stores the data that needs to be predicted each day. The MySQL prediction optimization data source stores the data used for interaction with the front-end management display interface, and this data is imported from the Hive table.
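A day-partitioned Hive table of the kind described for the prediction optimization data source can be set up with two HiveQL statements. The sketch below only generates the statements as strings; the table name `prediction_source` and the partition column `dt` are hypothetical, since the description does not name them.

```python
from datetime import date, timedelta

TABLE = "prediction_source"  # hypothetical table name


def create_table_sql() -> str:
    """HiveQL for a table with one data column, partitioned by a day string."""
    return (
        f"CREATE TABLE IF NOT EXISTS {TABLE} (data STRING) "
        "PARTITIONED BY (dt STRING)"
    )


def add_partition_sql(day: date) -> str:
    """HiveQL that registers the partition holding one day's prediction data."""
    return (
        f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}')"
    )


def next_partition(day: date) -> date:
    """The partition that follows the given day."""
    return day + timedelta(days=1)
```

Keeping one partition per day lets each prediction run read only the rows for its own date.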

It can be understood that, in the classification model training system based on the SPARK algorithm, the back-end management flow control system includes a Controller layer and a Service layer. The Controller layer is mainly used to connect requests from the front-end management display interface with back-end service data calls. The Service layer is mainly used to create the actual call chain of the model training and optimization process.

Step S3 can be understood as the creation of the Controller layer. Requests from the front-end management display interface are transmitted through its front-end interaction request interface. So that the back-end service management system can recognize a request sent by the front-end management display interface through the front-end interaction request interface, a corresponding back-end service control interface is established, together with a correspondence between that back-end service control interface and the corresponding front-end interaction request interface.

The front-end interaction request interface is an HTTP request URL. A different request URL is created for each service request, so that the URL of every service request is unique.

The back-end service control interface is a code method that implements the business logic. Its function is to map the service request that the front end expresses as a URL to the specific code that implements it on the server side.

For example, in the front-end management display interface, the URL for creating a classification model is defined as ip:port/create-model, and the function CreateModel(Model model) is defined in the Controller layer of the back-end service management system and associated with /create-model. When the back-end service receives a /create-model request from the front end, the CreateModel function is called.
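The correspondence between a request URL and a Controller function can be sketched as a routing table. This is an illustration in plain Python rather than the Spring Boot mechanism the embodiment uses; the `route` decorator, `dispatch` function, and the shape of the model payload are hypothetical, while /create-model and CreateModel come from the example above.

```python
from typing import Callable, Dict

ROUTES: Dict[str, Callable[[dict], dict]] = {}


def route(path: str):
    """Associate a URL path with exactly one handler function."""

    def register(handler: Callable[[dict], dict]):
        if path in ROUTES:
            # Each service request must have its own unique URL.
            raise ValueError(f"duplicate route: {path}")
        ROUTES[path] = handler
        return handler

    return register


@route("/create-model")
def CreateModel(model: dict) -> dict:
    """Controller entry point; a real system would delegate to the Service layer."""
    return {"created": model["name"]}


def dispatch(path: str, payload: dict) -> dict:
    """Call the handler associated with the requested path."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"error": f"no handler for {path}"}
    return handler(payload)
```

Registering a second handler for the same path raises an error, which mirrors the requirement that request URLs be unique per service request.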

It can be understood that, according to the foregoing embodiment, the back-end management flow control system includes a Controller layer and a Service layer, and the Service layer is mainly used to create the actual call chain of the model training and optimization process, that is, to define the specific implementation of the interfaces defined in the Controller layer. Step S4 creates the internal business logic of the back-end service control interface by creating the Service layer.

First, the Spring Boot entry program is created and bound to port 8180; once the service starts, it listens for requests, and an incoming request triggers the corresponding business logic. Then, the specific implementation of each interface defined in the Controller layer is defined. The implementation includes: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front end interaction request interface, creating an initial classification model by calling the backend service data source system, and then training and predictively optimizing the initial classification model with the SPARK algorithm to obtain the target classification model. For example, the createModel(Model model) function is defined in the Controller layer, and the implementation of the createModel function is defined in the Service layer.

In the implementation method of a classification model training system provided by this embodiment of the present invention, a front-end management display interface for external management is created, together with a back-end service data source system and a back-end service business processing unit for back-end service management, and correspondences are established between these units, so that they are integrated into one system that builds, trains, and predictively optimizes classification models based on the SPARK algorithm. The system provides a SPARK-MLlib-based front-end management display interface, and the process framework for classification model training is set up in the back-end service management system. When the system is used to train a classification model, the whole training and optimization process can be completed solely through operations on the front-end management display interface, which effectively simplifies the classification model training operation process, thereby effectively reducing the labor intensity of the developer and improving the development efficiency.

In an embodiment, the step of creating a front-end management display interface in step S1 further comprises: respectively creating a training management interface, an optimization management interface, and a classification model management interface of the classification model, wherein the training management interface is used to provide external management support for the training phase of training the classification model with the SPARK algorithm, the optimization management interface is used to provide external management support for the prediction optimization phase of training the classification model with the SPARK algorithm, and the classification model management interface is used to provide external management support for the target classification model. Correspondingly, the front end interaction request interface includes: a front end training interaction request interface, a front end optimization interaction request interface, and a front end model management interaction request interface.

It can be understood that when the classification model training is performed, the algorithms and parameters of the training process and the prediction optimization process need to be set. At the same time, in order to manage the classification model, the management parameters need to be set. Therefore, when creating the front-end management display interface, at least the training management interface, the optimization management interface, and the classification model management interface of the classification model need to be created.

Similarly, in order to interact with the back-end service management system, an interface function needs to be set in each management interface: a front-end training interaction request interface is set in the training management interface, a front-end optimization interaction request interface is set in the optimization management interface, and a front-end model management interaction request interface is set in the classification model management interface.

Optionally, the training management interface includes at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface; the optimization management interface includes at least: a classification model optimization strategy selection interface, a classification model optimization standard setting interface, and a prediction optimization data source setting interface; and the classification model management interface includes at least: a classification model version management interface and a classification model effect display interface.

It can be understood that, according to the foregoing embodiment, when the front-end management display interface is created, the implementation code is written in AngularJS and HTML. First, the training management interface, the optimization management interface, and the classification model management interface of the classification model are created, and then sub-interfaces are created in each management interface, including:

In the training management interface, a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface are created, which are used respectively for selecting the classification model algorithm, setting the classification model algorithm parameters, selecting the training data source, and configuring the preprocessing of the training data;

In the optimization management interface, a classification model optimization strategy selection interface, a classification model optimization standard setting interface, and a prediction optimization data source setting interface are created, which are used respectively for selecting the optimization strategy, setting the optimization standard, and selecting the optimization data source;

In the classification model management interface, a classification model version management interface and a classification model effect display interface are created, which are used respectively for version management of the classification model and for displaying the effect of the classification model.

The above selections can be made from drop-down lists. All front-end management display interfaces use POST requests to exchange data with the back-end service management system.
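As a minimal sketch of such a POST interaction, the following Python fragment builds (without sending) a request that carries the selections made in the training management interface. The endpoint path, host name, and JSON field names are hypothetical and are not specified by the embodiment; the embodiment's own front end is written in AngularJS, so this only illustrates the shape of the exchange.

```python
import json
from urllib.request import Request


def build_training_request(base_url: str, settings: dict) -> Request:
    """Build (but do not send) a POST request carrying the user's selections."""
    body = json.dumps(settings).encode("utf-8")
    return Request(
        url=f"{base_url}/training",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical selections made through the drop-down lists.
settings = {
    "algorithm": "naive_bayes",
    "algorithm_params": {"smoothing": 1.0},
    "training_source": "hive_db.labeled_samples",
    "preprocessing": ["unify_format", "normalize"],
}
request = build_training_request("http://backend.example", settings)
```

The back end would decode the JSON body and hand the settings to the matching control interface.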

In the implementation method of a classification model training system provided by an embodiment of the present invention, a training management interface, an optimization management interface, and a classification model management interface of the classification model are created separately, and the setting sub-interfaces of each management interface are defined. This makes it convenient to manage, from outside the system, the settings of the classification model construction process, training process, and prediction optimization process. Options such as the classification algorithm are set through drop-down selection lists, so the user only needs to click the corresponding option as required, without manual input, which improves work efficiency and user experience.

In another embodiment, step S3 is further processed as shown in FIG. 3, which is a flowchart of the process for creating a back-end service control interface according to an embodiment of the present invention, including:

S31. Based on the front-end training interaction request interface, create a back-end training management control interface, and establish a correspondence between the front-end training interaction request interface and the back-end training management control interface.

S32. Based on the front-end optimization interaction request interface, create a back-end optimization management control interface, and establish a correspondence between the front-end optimization interaction request interface and the back-end optimization management control interface.

S33. Based on the front-end model management interaction request interface, create a back-end model management control interface, and establish a correspondence between the front-end model management interaction request interface and the back-end model management control interface.

It can be understood that, according to the foregoing embodiment, the back-end service control interface is created by creating the Controller layer of the back-end service management system. Because the training management interface, the optimization management interface, and the classification model management interface are created when the front-end management display interface is created, and the front-end interaction request interface of each management interface is defined, it is necessary, when creating the back-end service control interface, to create a back-end training management control interface, a back-end optimization management control interface, and a back-end model management control interface, and to establish the correspondence between each pair of corresponding interfaces, so that each processing stage of the classification model can call the corresponding interface smoothly. In addition, the step numbers S31, S32, and S33 in this embodiment only distinguish the steps and do not limit the order in which they are implemented.

In the implementation method of the classification model training system provided by the embodiment of the present invention, a back-end service control interface of the back-end service management system is created for each front-end management display interface, and the correspondence between the front end and the back end is established, so that the corresponding interface can be called quickly and accurately, which improves system processing efficiency.
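The correspondence established in steps S31 to S33 can be pictured as a routing table from front-end request interfaces to back-end control interfaces. The sketch below is one possible illustration in Python; the request names, handler names, and return values are hypothetical, and the embodiment does not prescribe this particular registration mechanism.

```python
from typing import Callable, Dict

# Routing table: front-end request interface name -> back-end control handler.
ROUTES: Dict[str, Callable[[dict], dict]] = {}


def control_interface(request_name: str):
    """Register a handler as the back-end counterpart of a front-end request."""
    def register(handler: Callable[[dict], dict]) -> Callable[[dict], dict]:
        ROUTES[request_name] = handler
        return handler
    return register


@control_interface("front_end_training_request")        # S31
def training_control(payload: dict) -> dict:
    return {"stage": "training", "received": payload}


@control_interface("front_end_optimization_request")    # S32
def optimization_control(payload: dict) -> dict:
    return {"stage": "optimization", "received": payload}


@control_interface("front_end_model_management_request")  # S33
def model_management_control(payload: dict) -> dict:
    return {"stage": "model_management", "received": payload}


def dispatch(request_name: str, payload: dict) -> dict:
    """Look up and call the control interface that matches the request."""
    if request_name not in ROUTES:
        raise ValueError(f"no back-end control interface for {request_name!r}")
    return ROUTES[request_name](payload)
```

Because each handler is registered independently, the three registrations can occur in any order, which matches the note that S31, S32, and S33 do not imply a sequence.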

In still another embodiment, step S4 is further processed through a process of creating the internal service logic of the back-end service control interface according to an embodiment of the present invention, which includes:

S41. Create the internal training service logic of the back-end training management control interface, where the internal training service logic includes: based on the internal business logic flow by which the SPARK algorithm trains a classification model and on the front-end training interaction request interface, creating an initial classification model by calling the SPARK-MLlib, the training data source library, and the model system metadata database, and training the initial classification model to obtain a classification model to be optimized.

It can be understood that, in order for the model training system to construct and train the classification model automatically according to the settings made in the front-end management display interface, the internal implementation business logic of the back-end training management control interface, that is, the internal training business logic, needs to be defined accordingly. The implementation process of the back-end training management control interface, that is, the internal training business logic, includes:

According to the data of the front-end training interaction request interface, and following the processing rules and flow by which the SPARK algorithm trains a classification model, an initial classification model is constructed by calling SPARK-MLlib, and the initial classification model data is stored in the model system metadata database. The training data source is then obtained by accessing the training data source library, and the constructed initial classification model is trained with the acquired training data source. The trained classification model is the classification model to be optimized.

Optionally, the creation of the internal training service logic of the back-end training management control interface in step S41 is further processed as shown in FIG. 5, which is a flowchart of the process for creating the internal training service logic of the back-end training management control interface according to an embodiment of the present invention, and includes at least:

S411. Based on the data preprocessing algorithms included in the data preprocessing database, create preprocessing internal business logic corresponding to each of the data preprocessing algorithms.

It can be understood that, in the training process of the classification model, before the initial classification model is trained with the training data source, the prepared training data source is preprocessed to remove noise from the data and make it better suited to model training. There are many preprocessing algorithms for training data, such as format unification, normalization, and word replacement. The user selects the preprocessing algorithm through the front-end management display interface, and the back-end service management system invokes the corresponding processing logic according to the front-end selection.

Therefore, preprocessing internal business logic needs to be created for each preprocessing algorithm option included in the front-end management display interface. Because the data preprocessing algorithms corresponding to the front-end preprocessing algorithm options are all included in the data preprocessing database, it is only necessary to create the corresponding preprocessing internal business logic for each data preprocessing algorithm included in the data preprocessing database.
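One way to picture the result of step S411 is a lookup from preprocessing option names to processing functions, applied in the order the user selected them. The Python sketch below is illustrative only: the option names, the reading of "normalization" as Unicode NFKC normalization, and the word replacement table are all assumptions, not details given by the embodiment.

```python
import re
import unicodedata
from typing import Callable, Dict, List

# Hypothetical replacement table for the word replacement option.
WORD_REPLACEMENTS: Dict[str, str] = {"u": "you"}


def unify_format(text: str) -> str:
    """Collapse runs of whitespace, trim the ends, and lower-case the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def normalize(text: str) -> str:
    """Map full-width and other compatibility characters to canonical forms."""
    return unicodedata.normalize("NFKC", text)


def replace_words(text: str) -> str:
    """Replace whole words according to the replacement table."""
    return " ".join(WORD_REPLACEMENTS.get(word, word) for word in text.split())


# One entry per preprocessing option offered by the front end.
PREPROCESSORS: Dict[str, Callable[[str], str]] = {
    "normalize": normalize,
    "unify_format": unify_format,
    "replace_words": replace_words,
}


def preprocess(text: str, selected_steps: List[str]) -> str:
    """Apply the selected preprocessing steps in the order given."""
    for name in selected_steps:
        if name not in PREPROCESSORS:
            raise ValueError(f"unknown preprocessing option: {name!r}")
        text = PREPROCESSORS[name](text)
    return text


cleaned = preprocess("  Ｔhank   U ", ["normalize", "unify_format", "replace_words"])
```

Adding a new preprocessing algorithm then only requires one new function and one new table entry; the dispatch code is unchanged.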

S412. Create an internal service logic for generating a classification model corresponding to each of the classification algorithms based on a classification algorithm included in the SPARK-MLlib.

It can be understood that, similarly to the above step, the business logic for implementing a classification model differs with the classification algorithm: the constructed classification models differ, and so do their training processes. So that the classification algorithm and its parameters can be set through the front-end management display interface, and the corresponding classification model construction and training processes can be carried out according to the user's selections, internal business logic for generating a classification model is created for each classification algorithm. That is, business logic for constructing and training SPARK-MLlib-based classification models is newly built for the classification algorithms currently supported by SPARK-MLlib, such as Naive Bayes, Support Vector Machine, and Logistic Regression.
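A possible shape for the per-algorithm logic of step S412 is sketched below in Python. The class paths refer to estimators in `pyspark.ml.classification`; choosing these particular classes, the option names on the left, and the small set of parameter names accepted for each are assumptions made for illustration, since the embodiment names only the algorithm families. The lookup and validation part runs without Spark; `build_classifier` additionally requires pyspark and an active Spark session.

```python
import importlib
from typing import Dict, Set, Tuple

# Option name -> (estimator class path, illustrative subset of its parameters).
ALGORITHMS: Dict[str, Tuple[str, Set[str]]] = {
    "naive_bayes": (
        "pyspark.ml.classification.NaiveBayes",
        {"smoothing", "modelType"},
    ),
    "support_vector_machine": (
        "pyspark.ml.classification.LinearSVC",
        {"maxIter", "regParam"},
    ),
    "logistic_regression": (
        "pyspark.ml.classification.LogisticRegression",
        {"maxIter", "regParam", "elasticNetParam"},
    ),
}


def resolve_algorithm(name: str, params: dict) -> Tuple[str, dict]:
    """Validate the user's selection and return the class path and parameters."""
    if name not in ALGORITHMS:
        raise ValueError(f"unsupported classification algorithm: {name!r}")
    class_path, allowed = ALGORITHMS[name]
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters for {name}: {sorted(unknown)}")
    return class_path, dict(params)


def build_classifier(name: str, params: dict):
    """Instantiate the estimator (needs pyspark and an active Spark session)."""
    class_path, kwargs = resolve_algorithm(name, params)
    module_name, class_name = class_path.rsplit(".", 1)
    estimator_class = getattr(importlib.import_module(module_name), class_name)
    return estimator_class(**kwargs)
```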

It can be understood that step S413 creates the classification model training program. Specifically, according to the parameter settings of the front-end training management interface, a SPARK program is created, the training data source is read, and a classification model training script is generated and automatically uploaded to the SPARK cluster server. The system calls the script, starts the SPARK program, creates the classification model, stores the classification model results to the specified HDFS path, and stores the system metadata of the classification model, such as the confusion matrix, accuracy, and recall, in the model system metadata database (MySQL).
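The model metadata mentioned here (confusion matrix, accuracy, and recall) can be derived from the true labels and the model's predictions. The following is a minimal plain-Python sketch of that calculation; the sample labels and the layout of the metadata record are hypothetical, and a production system would more likely compute these values inside the SPARK program itself.

```python
from collections import Counter
from typing import Hashable, List


def confusion_matrix(labels: List[Hashable], predictions: List[Hashable]) -> Counter:
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(labels, predictions))


def accuracy(matrix: Counter) -> float:
    """Fraction of samples whose predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (actual, predicted), n in matrix.items() if actual == predicted)
    return correct / total if total else 0.0


def recall(matrix: Counter, positive: Hashable) -> float:
    """Fraction of samples of class `positive` that were predicted as such."""
    actual_positive = sum(n for (actual, _), n in matrix.items() if actual == positive)
    true_positive = matrix.get((positive, positive), 0)
    return true_positive / actual_positive if actual_positive else 0.0


# Hypothetical labels and predictions for five samples.
labels = ["spam", "spam", "ham", "ham", "spam"]
predictions = ["spam", "ham", "ham", "ham", "spam"]

matrix = confusion_matrix(labels, predictions)
metadata_record = {
    "accuracy": accuracy(matrix),
    "recall_spam": recall(matrix, "spam"),
    "confusion": {f"{a}->{p}": n for (a, p), n in matrix.items()},
}
```

A record of this kind is what would be written to the model system metadata database after each training run.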

It can be understood that, in order for the model training system to perform prediction optimization of the classification model to be optimized automatically according to the settings made in the front-end management display interface, the internal implementation business logic of the back-end optimization management control interface, that is, the internal optimization business logic, needs to be defined accordingly. The implementation process of the back-end optimization management control interface, that is, the internal optimization business logic, includes:

According to the data of the front-end optimization interaction request interface, and following the processing rules and flow by which the SPARK algorithm predicts and optimizes a classification model, SPARK-MLlib is called and the prediction optimization data source library is accessed to obtain the prediction optimization data source. Prediction is performed with the classification model to be optimized, and the classification model is then optimized according to the prediction results. The classification model that has undergone prediction optimization and meets the optimization standard is the target classification model. In addition, the step numbers S41 and S42 in this embodiment only distinguish the steps and do not limit the order in which they are implemented.

Optionally, the creation of the internal optimization service logic of the back-end optimization management control interface in step S42 is further processed as shown in FIG. 6, which is a flowchart of the process for creating the internal optimization service logic of the back-end optimization management control interface according to an embodiment of the present invention, including:

S421. Select, according to the request data of the front-end optimization interaction request interface, the prediction optimization data source and the prediction optimization constraint condition used for prediction optimization of the classification model to be optimized.

It can be understood that a classification model that has been trained needs to be used for prediction on another data source, the prediction optimization data source, and the classification model is optimized according to the prediction results. Specifically, this step creates the internal business logic by which the system, according to the request data of the front-end optimization interaction request interface, generates a classification model optimization strategy, specifies the data source for optimizing the classification model and the data columns to be predicted, creates a daily prediction task, and specifies the optimal parameter threshold of the classification model, that is, the prediction optimization constraint condition used to determine whether the classification model needs further optimization.

S422. Based on the internal business logic flow by which the SPARK algorithm trains and optimizes the classification model, create the data access and prediction processing implementation logic for prediction optimization of the classification model to be optimized.

It can be understood that, when prediction optimization of the classification model is performed, the corresponding data access and prediction optimization processing steps are carried out according to the defined processing flow. Specifically, this step creates the prediction optimization strategy of the system, including: the system reads the model from the HDFS path, loads the classification model data into memory, reads the source data to be predicted from Hive, performs prediction with the classification model, writes the results to the model system metadata database (MySQL), and displays the prediction results on the system page.

S423. Create the data correction internal business logic for the prediction optimization process of the classification model to be optimized, where the data correction internal business logic includes: based on the prediction optimization results of the classification model to be optimized, extracting the records that the classification model predicted incorrectly and performing data correction.

It can be understood that, in the process of prediction optimization of the classification model, after the model has made predictions on each set of prediction data, the incorrectly predicted data need to be recorded, and the prediction data and model parameters associated with those records need to be corrected. Specifically, this step creates the data correction strategy of the prediction optimization process, including: correcting the prediction optimization data and adding new prediction optimization sample data. At the same time, the prediction results are viewed through the front-end management display interface of the system, the records that the model predicted incorrectly are extracted and corrected, and the corrected classification model data is stored in the model system metadata database (MySQL).

S424. Create the data update internal business logic for the prediction optimization process of the classification model to be optimized, where the data update internal business logic includes: based on the model system metadata database after data correction, extracting the classification model metadata into the next partition of the prediction optimization data source library.

It can be understood that, after the parameters of the classification model have been corrected, that is, after the classification model has been optimized, prediction with the corrected classification model needs to continue using the prediction optimization data source. Specifically, this step creates the update strategy for the optimized classification model data: the classification model data is updated, and new model features are added to the model system metadata database (MySQL). After the new feature data has been stored in the prediction optimization data source library (MySQL), the system calls the Sqoop tool to extract it from MySQL into a new daily partition of the prediction optimization data source (Hive).
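The Sqoop call described above can be sketched as the construction of a `sqoop import` command line that loads one MySQL table into one daily Hive partition. In the Python fragment below, the JDBC URL, user name, table names, and the partition key `dt` are hypothetical; credentials handling is omitted, and the command is only assembled, not executed.

```python
from datetime import date
from typing import List


def build_sqoop_import(
    jdbc_url: str,
    username: str,
    mysql_table: str,
    hive_table: str,
    partition_day: date,
    partition_key: str = "dt",  # hypothetical partition column name
) -> List[str]:
    """Assemble a `sqoop import` command that fills one daily Hive partition."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", mysql_table,
        "--hive-import",
        "--hive-table", hive_table,
        "--hive-partition-key", partition_key,
        "--hive-partition-value", partition_day.isoformat(),
    ]


# Hypothetical connection details and table names.
command = build_sqoop_import(
    jdbc_url="jdbc:mysql://mysql.example:3306/model_meta",
    username="trainer",
    mysql_table="new_features",
    hive_table="predict_source.features",
    partition_day=date(2017, 8, 29),
)
```

A scheduler could pass the resulting argument list to `subprocess.run` once per day so that each run lands in its own partition.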

It can be understood that, after the classification model data has been corrected and the prediction optimization data source has been updated in the above steps, the corrected classification model needs to be trained with the updated prediction optimization data source. In this step, a model optimization stop strategy is created: the training data source of the classification model is re-specified, and the steps of training the classification model and optimizing the training data are repeated until the classification model parameters satisfy the preset prediction optimization constraint condition, at which point training of the classification model stops and the target classification model is obtained.
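The predict, correct, update, and retrain cycle with its stop condition can be summarized as a loop. The Python sketch below is a toy illustration: the "model" is a lookup table that memorizes its training samples, which stands in for a SPARK-trained classifier purely so that the loop is runnable, and the accuracy threshold and round limit are hypothetical parameters.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]            # (feature, true label)
Predictor = Callable[[str], str]    # feature -> predicted label


def train_lookup(data: List[Sample]) -> Predictor:
    """Toy stand-in for model training: memorize the samples seen so far."""
    table = dict(data)
    return lambda feature: table.get(feature, "unknown")


def evaluate(predict: Predictor, samples: List[Sample]) -> Tuple[float, List[Sample]]:
    """Return the accuracy and the records that were predicted incorrectly."""
    if not samples:
        raise ValueError("evaluation needs at least one sample")
    errors = [(x, y) for x, y in samples if predict(x) != y]
    return 1 - len(errors) / len(samples), errors


def optimize(train, training_set, evaluation_set, threshold, max_rounds=5):
    """Repeat predict -> collect errors -> extend data -> retrain until the
    accuracy threshold (the stop condition) is met or rounds run out."""
    data = list(training_set)
    predict = train(data)
    history = []
    for _ in range(max_rounds):
        score, errors = evaluate(predict, evaluation_set)
        history.append(score)
        if score >= threshold:
            break
        data.extend(errors)       # corrected records become new samples
        predict = train(data)     # retrain on the updated data source
    return predict, history


target_model, history = optimize(
    train_lookup,
    training_set=[("a", "pos")],
    evaluation_set=[("a", "pos"), ("b", "neg"), ("c", "neg")],
    threshold=0.9,
)
```

In this toy run the first round misclassifies two of three records, the corrected records are added, and the second round meets the threshold, so the loop stops.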

In the implementation method of a classification model training system provided by an embodiment of the present invention, the internal implementation business logic of the classification model construction, training, and prediction optimization processes is created separately, so that when the user uses the classification model training system to train a classification model, only the data and parameters need to be set in the front-end management display interface, and the system automatically completes the construction, training, and prediction optimization of the classification model. The operation is simple and development efficiency is improved.

As another aspect of the embodiment of the present invention, the present embodiment provides a classification model training system. Referring to FIG. 7, which is a schematic structural diagram of a classification model training system according to an embodiment of the present invention, the system includes: a front-end management display interface 1, a back-end service data source system 2, a back-end service control interface unit 3, and a back-end service business processing unit 4.

The front-end management display interface 1 is used for external setting management of the classification model training process, the classification model prediction optimization process, and classification model management. The front-end management display interface 1 includes a front-end interaction request interface 101 for information interaction between external management and the back-end service. The back-end service data source system 2 is configured to provide, in response to call requests from the internal business logic by which the SPARK algorithm trains the classification model, the machine learning library of the SPARK algorithm, the training data source, the prediction optimization data source, and the model system metadata database. The back-end service control interface unit 3 is configured to establish the correspondence between the front-end interaction request interface and back-end service business logic calls. The back-end service business processing unit 4 is configured to generate an initial classification model by calling the back-end service data source system, based on the business logic requirements by which the SPARK algorithm trains the classification model and on the front-end interaction request interface, and to train and perform prediction optimization on the initial classification model to obtain a target classification model.

It can be understood that the classification model training system of this embodiment includes a front-end management display interface 1 with which the user performs external management settings, a back-end service business processing unit 4 for back-end service management, a back-end service data source system 2 that provides data support for classification model training, and a back-end service control interface unit 3 that establishes the relationship between the user's external management and back-end service management. When the user configures the classification model training process through the front-end management display interface, the front-end management display interface transmits the user's settings to the back-end service management system through the front-end interaction request interface. The front-end management display page interacts with the back-end service management system through a standard REST API.

Requests from the front-end management display interface are transmitted through the front-end interaction request interface. When the front-end management display interface sends a request through the front-end interaction request interface, the back-end service management system identifies the request through the corresponding back-end service control interface, and the back-end service business processing unit 4 invokes the corresponding algorithm and flow. For the construction, training, and prediction optimization of the classification model, the back-end service business processing unit 4 calls the corresponding training and prediction optimization data to train the constructed classification model and perform prediction with it.

When the system is used for classification model training, the service begins listening for requests when it starts up and triggers the corresponding business logic when a request arrives. First, based on the business logic requirements by which the SPARK algorithm trains the classification model and on the front-end interaction request interface, the initial classification model is created by calling the back-end service data source system. The SPARK algorithm is then used to train the initial model and perform prediction optimization on it to obtain the target classification model.

In one embodiment, the framework of the back-end service data source system is shown in FIG. 8, which is a schematic structural diagram of a back-end service data source system according to an embodiment of the present invention, including: a MySQL model system metadata database, a Hive training data source library, a MySQL-Hive prediction optimization data source library, a classification model system unit, an algorithm model unit, and a SPARK cluster. The MySQL model system metadata database is used to store model metadata, the Hive training data source library is used to store training source data, and the MySQL-Hive prediction optimization data source library is used to store the prediction optimization data source.

The classification model training system provided by the embodiment of the present invention enables the user to create a classification model project on the front-end management display interface of the system and to specify the basic flow of model training and model optimization, such as the training data source, the ETL algorithm, the model algorithm, and the parameters. Subsequent training and optimization of the classification model then only require selecting and clicking on the interface, or creating a scheduled task that the system executes automatically, so that the target classification model is obtained in a short time. This avoids repeated and continuous training sample preparation and parameter tuning, lets the user focus on optimizing and implementing the algorithm itself instead of spending a great deal of effort on data preparation and program operation as in the past, and improves development efficiency.

As another aspect of the embodiment of the present invention, the present embodiment provides a classification model training method that uses the classification model training system described above. Referring to FIG. 9, which is a flowchart of classification model training using the classification model training system according to an embodiment of the present invention, the method includes:

S901. Acquire, through the front-end interaction request interface and the back-end service control interface unit, the construction setting data, training process setting data, and optimization process setting data of the classification model that are input through the front-end management display interface.

It can be understood that the classification model training flow is defined in the system and a classification model training project is created. Through the front-end management display interface, the user selects the algorithm used by the classification model, sets the algorithm parameters, selects the data source table, specifies the label column and the data columns in the table, and defines the preprocessing (ETL) flow of the training data source. Preprocessing operations specified for the initial data columns, such as format unification, normalization, and word replacement, remove noise from the data columns and make them better suited to model training. The back-end service management system obtains the user's custom setting data through the front-end interaction request interface and the back-end service control interface unit.
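The setting data acquired in step S901 can be represented on the back end as a small typed record that is validated before any model is built. The Python sketch below shows one possible form; the field names and validation rules are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrainingProjectSettings:
    """User selections for one classification model training project."""
    algorithm: str
    algorithm_params: Dict[str, float]
    source_table: str
    label_column: str
    data_columns: List[str]
    etl_steps: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Reject settings that cannot describe a usable training table."""
        if not self.data_columns:
            raise ValueError("at least one data column is required")
        if self.label_column in self.data_columns:
            raise ValueError("the label column must not also be a data column")


def settings_from_request(payload: dict) -> TrainingProjectSettings:
    """Turn the decoded request body into validated settings."""
    settings = TrainingProjectSettings(**payload)
    settings.validate()
    return settings


# Hypothetical request body as received from the front end.
project = settings_from_request({
    "algorithm": "logistic_regression",
    "algorithm_params": {"regParam": 0.01},
    "source_table": "hive_db.labeled_samples",
    "label_column": "category",
    "data_columns": ["title", "body"],
    "etl_steps": ["unify_format", "replace_words"],
})
```

Validating at this boundary means later stages can assume that the label column and data columns are distinct and non-empty.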

S902. Based on the construction setting data of the classification model, invoke the machine learning library of the SPARK algorithm internally through the back-end service business processing unit to construct an initial classification model, and store the model in the model system metadata database.

It can be understood that, after acquiring the user's custom setting data, the back-end service business processing unit creates a model training script according to the selected model algorithm and training data source and automatically uploads it to the SPARK cluster server. The system calls the script, starts the SPARK program, builds the initial classification model, stores the model results to the specified HDFS path, and stores classification model data such as the confusion matrix, accuracy, and recall in the model system metadata database (MySQL).

S903. Based on the training process setting data of the classification model, invoke the training data source internally through the back-end service business processing unit, and train the initial classification model with the SPARK algorithm to obtain a classification model to be optimized.

It can be understood that, referring to FIG. 10, which is a flowchart of the process by which the SPARK algorithm trains the classification model according to an embodiment of the present invention, the back-end service business processing unit calls the corresponding labeled training data source in the Hive library according to the training data source, classification model, and preprocessing parameters selected by the user through the front-end management display interface, and initializes the data source. The initial classification model constructed in the above step is then trained with the processed training data source. The model training results are written to the model system metadata database (MySQL), and the model prediction file is written to the system storage unit, yielding the classification model to be optimized.

S904. Based on the optimization process setting data of the classification model, invoke the prediction optimization data source through the back-end service business processing unit, and perform prediction optimization on the classification model to be optimized with the SPARK algorithm to obtain a target classification model.

It can be understood that, referring to FIG. 11, which is a flowchart of the process by which the SPARK algorithm performs prediction optimization on the classification model according to an embodiment of the present invention, the system creates a daily prediction task according to the prediction optimization data source of the classification model and the data columns to be predicted that the user has selected, and the optimal parameter threshold of the model is specified, that is, the prediction optimization constraint condition used to determine whether the classification model needs further optimization.

Then, the system reads the model from the HDFS path, loads the classification model data into memory, reads the data source to be predicted from Hive, performs prediction with the classification model, writes the prediction results to the model system metadata database (MySQL), and displays the prediction results in the front-end management display interface.

Next, the classification model parameters are optimized, that is, the data is corrected and new training samples are added. The prediction results of the classification model are viewed through the front-end management display interface, and the records that the model predicted incorrectly are extracted, corrected, and stored in the model system metadata database (MySQL).

Then, the training data source is updated for the classification model whose parameters have been optimized. After new model features are added to the training samples, the new feature data is stored in the model system metadata database (MySQL), and the system calls the Sqoop tool to extract the data from MySQL into a new daily partition of the prediction optimization data source (Hive).

Finally, the prediction optimization data source of the classification model is re-specified, and the prediction optimization steps defined above are repeated until the classification model parameters reach the specified threshold, at which point model training stops and the target classification model is obtained.

According to the embodiment of the present invention, in the classification model training method that uses the classification model training system described above, the parameters of the classification model construction, training, and prediction optimization processes are selected and set in the front-end management display interface. According to these settings, the back-end service management system automatically creates the processing flows for classification model construction, training, and prediction optimization and obtains a target classification model that conforms to the settings. This simplifies the classification model training workflow, reduces the developer's workload, and improves development efficiency.

Referring to FIG. 12, the implementation device of the classification model training system includes: a processor 1201, a memory 1202, and a bus 1203;

The processor 1201 and the memory 1202 complete communication with each other through the bus 1203;

The memory 1202 stores program instructions that are executable by the processor 1201, and the processor 1201 is configured to invoke program instructions in the memory 1202 to perform the implementation of the foregoing method for implementing the classification model training system. The method includes, for example, training an external management requirement of the classification model based on the SPARK algorithm, creating a front-end management presentation interface, and defining a front-end interaction request interface of the front-end management presentation interface based on an interaction requirement of the external management and the back-end service; S21, importing a SPARK-MLlib machine learning library SPARK-MLlib, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data into the training data source library, The predicted optimized sample data is stored in the predictive optimized data source library and the like.

This embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium. The computer program comprises program instructions which, when executed by a computer, cause the computer to perform the method provided by the foregoing implementation method of the classification model training system. The method includes, for example: creating a front-end management presentation interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management presentation interface based on the interaction requirements between external management and the back-end service; S21, importing the SPARK-MLlib library of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and the prepared prediction optimization sample data in the prediction optimization data source library; and so on.

Another embodiment of the present invention provides a non-transitory computer readable storage medium storing computer instructions, the computer instructions causing a computer to execute the method provided by the foregoing embodiments of the implementation method of the classification model training system. The method includes, for example: creating a front-end management presentation interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management presentation interface based on the interaction requirements between external management and the back-end service; S21, importing the SPARK-MLlib library of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and the prepared prediction optimization sample data in the prediction optimization data source library; and so on.

A person skilled in the art will understand that all or part of the steps of the foregoing method for implementing the classification model training system may be completed by hardware under the control of program instructions. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the embodiments of the foregoing implementation method. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

The implementation device of the classification model training system described above is merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

Through the description of the above embodiments, those skilled in the art can clearly understand that the various embodiments can be implemented by means of software plus a necessary general hardware platform, or, of course, by hardware. Based on this understanding, the above-described technical solutions may in essence be embodied in the form of a software product. The software product may be stored in a computer readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments or in portions of the embodiments.

Finally, it should be noted that the above embodiments are intended only to explain the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that they may still modify the technical solutions described in the foregoing embodiments, or replace some of the technical features with equivalents, without departing from the technical solutions of the embodiments of the present invention.

Claims

A method for implementing a classification model training system, comprising:

S1, creating a front-end management presentation interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management presentation interface based on the interaction requirements between external management and the back-end service;

S2, creating a back-end service data source system based on the internal business data requirements of training a classification model with the SPARK algorithm;

S3, creating a back-end service control interface based on the front-end interaction request interface of the front-end management presentation interface, and establishing a correspondence between the back-end service control interface and the front-end interaction request interface;

S4, creating internal business logic of the back-end service control interface, wherein the internal business logic comprises: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, creating an initial classification model by calling the back-end service data source system, and performing training and prediction optimization on the initial classification model to obtain a target classification model.
The method according to claim 1, wherein the step of creating a front-end management presentation interface in step S1 further comprises:

separately creating a training management interface, an optimization management interface, and a classification model management interface of the classification model; the training management interface is used to provide external management support for the training phase of training a classification model with the SPARK algorithm, the optimization management interface is used to provide external management support for the prediction optimization phase of training a classification model with the SPARK algorithm, and the classification model management interface is used to provide external management support for the target classification model;

correspondingly, the front-end interaction request interface comprises: a front-end training interaction request interface, a front-end optimization interaction request interface, and a front-end model management interaction request interface.
The method according to claim 2, wherein the training management interface comprises at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface;

The optimization management interface includes at least: a classification model optimization strategy selection interface, a classification model optimization standard setting interface, and a prediction optimization data source setting interface;

The classification model management interface includes at least: a classification model version management interface and a classification model effect presentation interface.
The method of claim 3, wherein the step of creating a backend service data source system in step S2 further comprises:

S21, importing the SPARK-MLlib machine learning library, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database;

S22, storing the prepared training sample data in the training data source library, and storing the prepared prediction optimization sample data in the prediction optimization data source library.
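Purely as an illustrative sketch, and not as part of the claimed subject matter, the three stores of steps S21 and S22 might be modelled with an in-memory SQLite database from the Python standard library. The table names, column names, and the functions `create_data_sources` and `store_samples` are assumptions made for this example; the embodiment itself does not specify a storage engine or schema.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, int]  # (feature text, label); an assumed sample shape


def create_data_sources() -> sqlite3.Connection:
    """Create the three stores of S21 as tables in one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- training data source library
        CREATE TABLE training_data (
            id INTEGER PRIMARY KEY, feature TEXT, label INTEGER);
        -- prediction optimization data source library, split into parts
        CREATE TABLE prediction_data (
            id INTEGER PRIMARY KEY, feature TEXT, label INTEGER, part INTEGER);
        -- model system metadata database
        CREATE TABLE model_metadata (
            name TEXT, version INTEGER, algorithm TEXT,
            PRIMARY KEY (name, version));
        """
    )
    return conn


def store_samples(conn: sqlite3.Connection,
                  training: List[Row],
                  prediction: List[Row],
                  part: int = 0) -> None:
    """S22: load prepared samples into their respective stores."""
    conn.executemany(
        "INSERT INTO training_data (feature, label) VALUES (?, ?)", training)
    conn.executemany(
        "INSERT INTO prediction_data (feature, label, part) VALUES (?, ?, ?)",
        [(feature, label, part) for feature, label in prediction])
    conn.commit()
```

Keeping the prediction optimization samples in numbered parts is one possible way to support processing them partition by partition in later steps.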
The method of claim 4, wherein step S3 further comprises:

S31, creating a back-end training management control interface based on the front-end training interaction request interface, and establishing a correspondence between the front-end training interaction request interface and the back-end training management control interface;

S32, creating a back-end optimization management control interface based on the front-end optimization interaction request interface, and establishing a correspondence between the front-end optimization interaction request interface and the back-end optimization management control interface;

S33, creating a back-end model management control interface based on the front-end model management interaction request interface, and establishing a correspondence between the front-end model management interaction request interface and the back-end model management control interface.
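As an illustrative sketch only, the correspondence of steps S31 to S33 can be pictured as a dispatch table that binds each front-end request name to one back-end control handler. The class `BackendControl`, the request names `"train"`, `"optimize"`, and `"manage"`, and the payload fields are hypothetical and are not drawn from the original text.

```python
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Handler = Callable[[Payload], Payload]


class BackendControl:
    """Keeps the front-end request -> back-end control interface mapping."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def bind(self, request_name: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one front-end request."""
        def decorator(handler: Handler) -> Handler:
            self._routes[request_name] = handler
            return handler
        return decorator

    def routes(self) -> List[str]:
        return sorted(self._routes)

    def dispatch(self, request_name: str, payload: Payload) -> Payload:
        if request_name not in self._routes:
            raise KeyError(f"no back-end control interface for {request_name!r}")
        return self._routes[request_name](payload)


control = BackendControl()


@control.bind("train")        # S31: training request -> training control
def training_control(payload: Payload) -> Payload:
    return {"stage": "training", "algorithm": payload["algorithm"]}


@control.bind("optimize")     # S32: optimization request -> optimization control
def optimization_control(payload: Payload) -> Payload:
    return {"stage": "optimization", "target": payload["target"]}


@control.bind("manage")       # S33: model management request -> management control
def model_management_control(payload: Payload) -> Payload:
    return {"stage": "management", "model": payload["model"]}
```

A call such as `control.dispatch("train", {"algorithm": "naive_bayes"})` then reaches only the training handler, which reflects the one-to-one correspondence recited in the three sub-steps.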
The method according to claim 5, wherein step S4 further comprises at least:

S41, creating internal training business logic of the back-end training management control interface, wherein the internal training business logic comprises: based on the internal business logic flow of the process of training a classification model with the SPARK algorithm and on the front-end training interaction request interface, creating an initial classification model by calling the SPARK-MLlib, the training data source library, and the model system metadata database, and training the initial classification model to obtain a classification model to be optimized;

S42, creating internal optimization business logic of the back-end optimization management control interface, wherein the internal optimization business logic comprises: based on the internal business logic flow of the process of optimizing a classification model with the SPARK algorithm and on the front-end optimization interaction request interface, performing prediction optimization on the classification model to be optimized by calling the SPARK-MLlib, the prediction optimization data source library, and the model system metadata database, to obtain the target classification model.
The method according to claim 6, wherein the step of creating the internal training business logic of the back-end training management control interface in step S41 further comprises at least:

S411, creating, based on the data preprocessing algorithms included in the data preprocessing database, preprocessing internal business logic corresponding to each of the data preprocessing algorithms;

S412, creating, based on the classification algorithms included in the SPARK-MLlib, internal business logic for generating a classification model corresponding to each of the classification algorithms;

S413, creating, based on the setting data of the training management interface, the internal business logic for training the classification model by calling the training data source library, the preprocessing internal business logic, and the internal business logic for generating a classification model.
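For illustration only, steps S411 to S413 might be sketched as two registries (one of preprocessing routines, one of classifier factories) from which a training routine is assembled according to the settings. Everything named here is an assumption: `PREPROCESSORS`, `CLASSIFIERS`, `LookupClassifier`, and `build_training_logic` are hypothetical, and the toy classifier merely stands in for an algorithm that SPARK-MLlib would supply.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, int]  # (text feature, label)

# S411: one preprocessing routine per available preprocessing algorithm.
PREPROCESSORS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lowercase": str.lower,
}


class LookupClassifier:
    """Toy classifier: remembers the label of every text seen in training."""

    def __init__(self) -> None:
        self.labels: Dict[str, int] = {}

    def fit(self, rows: List[Sample]) -> None:
        for text, label in rows:
            self.labels[text] = label

    def predict(self, text: str) -> int:
        return self.labels.get(text, 0)  # unseen text falls back to class 0


# S412: one model factory per available classification algorithm.
CLASSIFIERS: Dict[str, Callable[[], LookupClassifier]] = {
    "lookup": LookupClassifier,
}


def build_training_logic(settings: Dict[str, object]) -> Callable[[List[Sample]], LookupClassifier]:
    """S413: compose preprocessing and model generation from the settings."""
    steps = [PREPROCESSORS[name] for name in settings["preprocessing"]]
    make_model = CLASSIFIERS[settings["algorithm"]]

    def train(rows: List[Sample]) -> LookupClassifier:
        cleaned: List[Sample] = []
        for text, label in rows:
            for step in steps:          # apply the chosen steps in order
                text = step(text)
            cleaned.append((text, label))
        model = make_model()
        model.fit(cleaned)
        return model

    return train
```

With settings `{"preprocessing": ["strip", "lowercase"], "algorithm": "lookup"}`, a training row such as `("  SPAM ", 1)` is stored under the cleaned key `"spam"`, so the choice of preprocessing steps directly affects what the trained model recognizes.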
The method according to claim 6, wherein the step of creating the internal optimization business logic of the back-end optimization management control interface in step S42 further comprises:

S421, selecting, according to the request data of the front-end optimization interaction request interface, the prediction optimization data source and the prediction optimization constraint condition for performing prediction optimization on the classification model to be optimized;

S422, creating, based on the internal business logic flow of the optimization process of training a classification model with the SPARK algorithm, the data access and prediction processing implementation logic for performing prediction optimization on the classification model to be optimized;

S423, creating data correction internal business logic for the process of performing prediction optimization on the classification model to be optimized, wherein the data correction internal business logic comprises: extracting, based on the optimization result of the classification model to be optimized, the records that the classification model predicted incorrectly, and performing data correction on them;

S424, creating data update internal business logic for the process of performing prediction optimization on the classification model to be optimized, wherein the data update internal business logic comprises: based on the model system metadata database after data correction, extracting a classification model element and importing it into the next partition of the prediction optimization data source library;

S425, creating, based on the prediction optimization constraint condition, internal business logic for stopping optimization of the classification model, wherein the internal business logic for stopping optimization comprises: re-specifying the prediction optimization data source library of the classification model and creating the classification model prediction and data optimization business logic, with model optimization stopping once the classification model parameters reach the prediction optimization constraint condition.
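One possible reading of steps S421 to S425, offered only as a hedged sketch, is an iterative loop over partitions of prediction optimization data: predict on a partition, collect the mispredicted records, feed them back, retrain, move to the next partition, and stop once an accuracy constraint is met. Treating "data correction" as adding the mispredicted records to the training set is an assumption of this example, as are the names `ThresholdModel` and `optimize`; the claim does not prescribe this particular mechanism.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature, label)


class ThresholdModel:
    """Toy classifier: predicts 1 when the feature is >= threshold."""

    def __init__(self) -> None:
        self.threshold = 0.0

    def fit(self, rows: List[Sample]) -> None:
        # Midpoint between the class means; assumes both classes are present.
        pos = [x for x, y in rows if y == 1]
        neg = [x for x, y in rows if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0


def optimize(model: ThresholdModel,
             train_rows: List[Sample],
             partitions: List[List[Sample]],
             target_accuracy: float) -> List[float]:
    """Run the predict / correct / update loop; return accuracy per partition."""
    rows = list(train_rows)
    history: List[float] = []
    for part in partitions:                       # S424: advance partition by partition
        # S422/S423: predict and extract the mispredicted records.
        errors = [(x, y) for x, y in part if model.predict(x) != y]
        acc = 1 - len(errors) / len(part)
        history.append(acc)
        if acc >= target_accuracy:                # S425: constraint reached, stop
            break
        rows.extend(errors)                       # assumed form of data correction
        model.fit(rows)                           # retrain on the corrected data
    return history
```

In this sketch the loop also ends when the partitions run out, so a caller would need to inspect the last entry of the returned history to know whether the constraint was actually met.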
A classification model training system, comprising:

a front-end management presentation interface, configured to perform external setting management of the classification model training process, the classification model prediction optimization process, and classification model management, wherein the front-end management presentation interface comprises a front-end interaction request interface used for information interaction between external management and the back-end service;

a back-end service data source system, configured to provide, in response to internal business logic call requests for training a classification model with the SPARK algorithm, a machine learning data source of the SPARK algorithm, a training data source, a prediction optimization data source, and a model system metadata database;

a back-end service control interface unit, configured to establish a correspondence between the front-end interaction request interface and back-end service business logic calls;

a back-end service business processing unit, configured to create, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, an initial classification model by calling the back-end service data source system, and to perform training and prediction optimization on the initial classification model to obtain a target classification model.
A classification model training method for a classification model training system according to claim 9, comprising:

obtaining, through the front-end interaction request interface and the back-end service control interface unit, the construction setting data, training process setting data, and optimization process setting data of the classification model input via the front-end management presentation interface;

based on the construction setting data of the classification model, internally calling the machine learning data source of the SPARK algorithm by the back-end service business processing unit, constructing an initial classification model, and storing it in the model system metadata database;

based on the training process setting data of the classification model, internally calling the training data source by the back-end service business processing unit, and training the initial classification model using the SPARK algorithm to obtain a classification model to be optimized;

based on the optimization process setting data of the classification model, internally calling the prediction optimization data source by the back-end service business processing unit, and performing prediction optimization on the classification model to be optimized using the SPARK algorithm to obtain a target classification model.
An apparatus for implementing a classification model training system, comprising:

At least one processor;

At least one memory communicatively coupled to the processor, wherein:

The memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any one of claims 1-8.
A non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium stores computer instructions, the computer instructions causing a computer to perform the method of any one of claims 1-8.
A computer program product, comprising: a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-8.