WO2019041708A1 - Classification model training system and realisation method therefor - Google Patents

Classification model training system and realisation method therefor

Info

Publication number
WO2019041708A1
Authority
WO
WIPO (PCT)
Prior art keywords
classification model
training
interface
optimization
management
Prior art date
Application number
PCT/CN2017/120174
Other languages
French (fr)
Chinese (zh)
Inventor
王毅
张文明
陈少杰
Original Assignee
武汉斗鱼网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉斗鱼网络科技有限公司
Publication of WO2019041708A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Definitions

  • the present invention relates to the field of information processing technologies, and in particular, to a classification model training system and an implementation method thereof.
  • Machine learning with the SPARK algorithm library SPARK-MLlib has become common practice.
  • When SPARK-MLlib is used to train a classification algorithm model, a large number of labeled samples must be prepared in advance, because classification is a supervised learning task. The labeled samples are divided into training samples and test samples and are then used by SPARK-MLlib to train the classification algorithm model.
  • During this process, the samples and model parameters need to be adjusted continuously to optimize the classification algorithm model.
  • the commonly used method of optimizing the classification model requires manual addition of training samples to cover all the features of the model and increase the accuracy and recall rate of the classification model. Manually adding training samples and optimizing model parameters requires a lot of time and effort on the data preparation and program operation, resulting in low development efficiency.
  • the present invention provides a classification model training system and an implementation method thereof, so as to effectively simplify the training operation process of the classification model, thereby effectively reducing the labor intensity of the developer and improving the development efficiency.
  • a method for implementing a classification model training system including:
  • S1. Based on the external management requirements of training a classification model with the SPARK algorithm, create a front-end management display interface, and define a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service;
  • the internal business logic includes: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, creating an initial classification model by calling the backend service data source system, and training and predictively optimizing the initial classification model to obtain a target classification model.
  • the step of creating a front-end management display interface in the step S1 further includes:
  • the training management interface is used to provide external management support for the training phase of training a classification model with the SPARK algorithm;
  • the optimization management interface is used to provide external management support for the predictive optimization phase of training a classification model with the SPARK algorithm;
  • the classification model management interface is used to provide external management support for the target classification model;
  • the front end interaction request interface comprises: a front end training interaction request interface, a front end optimization interaction request interface, and a front end model management interaction request interface.
  • the training management interface includes at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface;
  • the optimization management interface includes at least: a classification model optimization strategy selection interface, a classification model optimization standard setting interface, and a prediction optimization data source setting interface;
  • the classification model management interface includes at least: a classification model version management interface and a classification model effect presentation interface.
  • the step of creating a backend service data source system in step S2 further includes:
  • the prepared training sample data is stored in the training data source library, and the prediction optimization sample data is stored in the prediction optimization data source library.
  • the step of S3 further includes:
  • based on the front-end model management interaction request interface, a back-end model management control interface is created, and a correspondence between the front-end model management interaction request interface and the back-end model management control interface is established.
  • the step of S4 further includes at least:
  • S41 Create internal training service logic of the backend training management control interface, where the internal training service logic includes an internal business logic flow for training a classification model process based on a SPARK algorithm and the front end training interaction request interface, by calling the SPARK-MLlib, the training data source library and the model system metadata database, creating an initial classification model, and training the initial classification model to obtain a classification model to be optimized;
  • S42 Create the internal optimization service logic of the backend optimization management control interface, where the internal optimization service logic includes: based on the internal business logic flow of optimizing a classification model with the SPARK algorithm and on the front-end optimization interaction request interface, performing prediction optimization on the classification model to be optimized by calling the SPARK-MLlib, the prediction optimization data source library, and the model system metadata database, and acquiring the target classification model.
  • the step of creating the internal training service logic of the backend training management control interface in the step S41 further includes:
  • S411 Create a pre-processed internal service logic corresponding to each of the data pre-processing algorithms based on a data pre-processing algorithm included in the data pre-processing database;
  • S412. Create, according to a classification algorithm included in the SPARK-MLlib, an internal service logic that generates a classification model corresponding to each of the classification algorithms.
  • the step of creating the internal optimization service logic of the backend optimization management control interface in the step S42 further includes:
  • S421 Select, according to the request data of the front-end optimization interaction request interface, the prediction optimization data source and the prediction optimization constraint condition for predictively optimizing the classification model to be optimized;
  • S423 Create the data correction internal business logic of the process of predictively optimizing the classification model to be optimized, where the data correction internal business logic includes: extracting the records that the classification model predicted incorrectly, based on the optimization result of the classification model to be optimized, and performing data correction;
  • S424 Create the data update internal business logic of the process of predictively optimizing the classification model to be optimized, where the data update internal business logic includes: based on the data-corrected model system metadata database, extracting the classification model elements and importing them into the next partition of the prediction optimization data source library;
  • the internal business logic for stopping optimization of the classification model includes: re-specifying the prediction optimization data source library of the classification model and repeating the prediction and data optimization business logic, and stopping the model optimization once the classification model parameters reach the prediction optimization constraint.
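  • As an illustration only, the cycle of prediction, error extraction, data correction, and stopping described in steps S421 to S424 might be sketched in Python as follows. The function names, the lookup-table model, the sample records, and the use of accuracy as the prediction optimization constraint are all assumptions made for this sketch; the source does not specify them, and a real system would call SPARK-MLlib rather than the toy model used here.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]          # (data, corrected label); hypothetical record shape
Model = Callable[[str], str]      # a model maps data to a predicted label


def make_lookup_model(known: List[Record]) -> Model:
    """Toy stand-in for retraining: memorise corrected records, default to 'normal'."""
    table = dict(known)
    return lambda data: table.get(data, "normal")


def optimize(model: Model, partitions: List[List[Record]], min_accuracy: float):
    """Walk the daily partitions until the accuracy constraint is met.

    Returns the final model and the number of partitions consumed.
    """
    corrections: List[Record] = []
    used = 0
    for partition in partitions:
        if not partition:
            continue  # nothing to predict in an empty partition
        used += 1
        # Predict, then extract the wrongly predicted records.
        errors = [(data, label) for data, label in partition if model(data) != label]
        accuracy = 1 - len(errors) / len(partition)
        if accuracy >= min_accuracy:
            break  # constraint reached: stop optimizing
        # Data correction: keep the corrected records and retrain on them.
        corrections.extend(errors)
        model = make_lookup_model(corrections)
    return model, used


# Made-up sample data: one list of records per daily partition.
day1 = [("buy now", "spam"), ("hello", "normal")]
day2 = [("buy now", "spam"), ("good morning", "normal")]
day3 = [("free coins", "spam")]
final_model, partitions_used = optimize(make_lookup_model([]), [day1, day2, day3], 1.0)
print(partitions_used, final_model("buy now"))
```

  • In this sketch the loop moves to the next partition only while the constraint is unmet, which mirrors the idea of re-specifying the prediction optimization data source until the stop condition holds.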
  • a classification model training system comprising:
  • a front-end management display interface, used for external setting management of the classification model training process, the classification model prediction optimization process, and classification model management;
  • the front-end management display interface includes a front-end interaction request interface, used for information interaction between external management and the back-end service;
  • a backend service data source system for training an internal business logic call request of the classification model according to the SPARK algorithm, providing a machine learning data source of the SPARK algorithm, a training data source, a prediction optimization data source, and a model system metadata database;
  • a backend service control interface unit configured to establish a correspondence between the front-end interaction request interface and a backend service business logic call;
  • a backend service business processing unit configured to, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, create an initial classification model by calling the backend service data source system, and perform training and predictive optimization on the initial classification model to obtain the target classification model.
  • a classification model training method according to a classification model training system as described above, comprising:
  • the training data source is internally invoked by the backend service processing unit, and the initial classification model is trained by using a SPARK algorithm to obtain a classification model to be optimized;
  • the predictive optimization data source is internally invoked by the backend service business processing unit, and the SPARK algorithm is used to predict and optimize the classification model to be optimized to obtain a target classification model.
  • an apparatus for implementing a classification model training system including:
  • At least one processor;
  • At least one memory communicatively coupled to the processor, wherein:
  • the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform an implementation of a classification model training system as described above.
  • a non-transitory computer readable storage medium stores computer instructions that cause the computer to perform any of the above implementation methods of the classification model training system.
  • a computer program product comprising: a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions, wherein when the program instructions are executed by the computer, the computer is caused to perform the method of implementing the classification model training system as described above.
  • This application proposes a classification model training system and its implementation method.
  • With the classification model training system based on SPARK-MLlib, training a classification model only requires creating a classification model project on the front-end management display interface and specifying the basic process of training and optimizing the model, such as the training data source, ETL algorithm, model algorithm, and parameters.
  • The system then creates, trains, and predictively optimizes the classification model automatically, which can effectively simplify the training operation process of the classification model, thereby effectively reducing the labor intensity of the developer and improving the development efficiency.
  • FIG. 1 is a flowchart of a method for implementing a classification model training system according to an embodiment of the present application
  • FIG. 2 is a flowchart of a process for creating a backend service data source system according to an embodiment of the present application
  • FIG. 3 is a flowchart of a process for creating a backend service control interface according to an embodiment of the present application
  • FIG. 4 is a flowchart of a process of creating an internal service logic of a backend service control interface according to an embodiment of the present application
  • FIG. 5 is a flowchart of a process for creating an internal training service logic of a backend training management control interface according to an embodiment of the present application
  • FIG. 6 is a flowchart of a process for creating an internal optimization service logic of a backend optimization management control interface according to an embodiment of the present application
  • FIG. 7 is a schematic structural diagram of a classification model training system according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a backend service data source system according to an embodiment of the present application.
  • FIG. 9 is a flowchart of a method for training a classification model by using the classification model training system of the present application.
  • FIG. 10 is a flowchart of a processing procedure of a SPARK algorithm training classification model according to an embodiment of the present application
  • FIG. 11 is a flowchart of a processing procedure of a SPARK algorithm predictive optimization classification model according to an embodiment of the present application.
  • FIG. 12 is a structural block diagram of an apparatus for implementing a classification model training system according to an embodiment of the present application.
  • the present embodiment provides a method for implementing a classification model training system.
  • Referring to FIG. 1, a flowchart of a method for implementing a classification model training system according to an embodiment of the present invention, the method includes:
  • S1. Based on the external management requirements of training a classification model with the SPARK algorithm, create a front-end management display interface, and define a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service.
  • the goal of the present embodiment is to establish a classification model training system based on the SPARK algorithm.
  • the whole system is a classification model automatic training and optimization system with a front-end management display interface and a service management system at the back end.
  • the user sets the algorithm and parameters of the classification model training through the front-end management display interface.
  • the back-end service management system calls the corresponding data source according to the front-end settings, constructs the classification model by using the built-in SPARK algorithm, and calls the training data source and the prediction optimization data source to perform classification model training and predictive optimization, obtaining the target classification model.
  • In step S1, the external management requirements of generating, training, and predictively optimizing a classification model based on the SPARK algorithm are considered, that is, the data sources that need to be prepared externally, the algorithms and parameters to be set, and so on. The corresponding front-end management display interface is created, and a management interface is set on the front-end management display interface for each management requirement.
  • Because the front-end management display interface needs to exchange data with the back-end service management system in order to pass the algorithms and parameters set by the user to the back-end service management system, the corresponding front-end interaction request interface is defined in the front-end management display interface according to the interaction requirements between external management and the back-end service.
  • the front-end management display interface transmits the user settings to the back-end service management system through the front-end interaction request interface.
  • the front-end management display page interacts with the back-end service management system using standard REST APIs.
  • In step S2, the corresponding system is created according to the algorithms, processes, and data needed to construct, train, and predictively optimize the classification model.
  • When the classification model training system based on the SPARK algorithm is created, each data unit is created according to the algorithms, processes, and data that the SPARK algorithm requires, that is, the internal business data requirements. Together, these data units form the back-end service data source system.
  • the back-end service data source system is a relatively important part, carrying the functions of the entire model training process control, data storage, model optimization strategy and providing data to the front-end display.
  • FIG. 2 is a flowchart of a process for creating a backend service data source system according to an embodiment of the present invention, including:
  • S21, importing the SPARK-MLlib machine learning library, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and the prediction optimization sample data in the prediction optimization data source library.
  • the Spring Boot microservices framework is used to design the backend service management system.
  • the system metadata is stored in a MySQL database.
  • model training and optimization are implemented with SPARK-MLlib.
  • the data sources used to train the model are stored in Hive.
  • Because the SPARK algorithm is used to train the classification model, the machine learning library of the SPARK algorithm is needed. Therefore, the SPARK-MLlib machine learning library is imported first, and then the training data source, the model system metadata source, and the prediction optimization data source are prepared separately.
  • Preparation of the labeled training data source:
  • the manually prepared labeled training sample data is stored in a Hive database, in a table created with a label column (lable) and a data column (data); this database is called the training data source library.
  • Preparation of the prediction optimization data source:
  • the prediction optimization sample data is used to continuously optimize the classification model and is stored in what is called the prediction optimization data source (Hive-MySQL).
  • the Hive prediction optimization data source is a data table partitioned by day that stores the data to be predicted each day. The MySQL prediction optimization data source stores the data used for interaction with the front-end management display interface; its data is imported from the Hive table.
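  • As a rough illustration of the day-partitioned layout described above, the following Python sketch groups records into one partition per day and looks up the partition that follows a given day. It uses an in-memory dictionary as a stand-in for the Hive table; the function names, the ISO date partition keys, and the sample rows are assumptions for illustration and do not come from the source.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple


def partition_by_day(rows: List[Tuple[date, str]]) -> Dict[str, List[str]]:
    """Group (day, data) rows into one partition per day, keyed by ISO date."""
    partitions: Dict[str, List[str]] = defaultdict(list)
    for day, data in rows:
        partitions[day.isoformat()].append(data)
    return dict(partitions)


def next_partition(partitions: Dict[str, List[str]], current: str) -> Optional[str]:
    """Return the key of the earliest partition after `current`, if any."""
    # ISO dates (YYYY-MM-DD) sort chronologically when compared as strings.
    later = sorted(key for key in partitions if key > current)
    return later[0] if later else None


# Made-up sample rows: (day the data arrived, data to be predicted).
rows = [
    (date(2017, 8, 30), "sample a"),
    (date(2017, 8, 31), "sample b"),
    (date(2017, 8, 30), "sample c"),
]
partitions = partition_by_day(rows)
print(partitions)
print(next_partition(partitions, "2017-08-30"))
```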
  • the back-end management flow control system includes a Controller layer and a Service layer.
  • the Controller layer is mainly used to connect the front-end management display interface request and the back-end service data call.
  • the Service layer is mainly used to create the actual call link of the model training and optimization process.
  • Step S3 can be understood as the creation of the Controller layer.
  • the front-end management display interface request is transmitted through the front-end interaction request interface of the front-end management display interface.
  • When the front-end management display interface sends a request through the front-end interaction request interface, a corresponding back-end service control interface is created so that the back-end service management system can recognize the request, and a correspondence is established between the back-end service control interface and the corresponding front-end interaction request interface.
  • the front-end interaction request interface is a request URL in HTTP form, and a different request URL is created for each service request so that the URL of every service request is unique.
  • the back-end service control interface is the code method that implements the business logic.
  • its function is to implement, in server-side code, the service request that the front end expresses as a URL.
  • For example, the URL for creating a classification model is defined as ip:port/create-model.
  • the CreateModel(Model model) function is defined in the Controller layer of the backend service management system and associated with /create-model.
  • the CreateModel function is called when the backend service receives the front-end /create-model request.
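  • The correspondence between a unique request URL and a back-end control function can be sketched as a small routing table. The source implements this with the Spring Boot Controller layer; the Python below is only a language-neutral stand-in, and the `route` decorator, the `dispatch` function, and the shape of the request and response dictionaries are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]
ROUTES: Dict[str, Handler] = {}  # request path -> back-end control function


def route(path: str) -> Callable[[Handler], Handler]:
    """Register a handler for a path, refusing duplicates so each URL stays unique."""
    def register(handler: Handler) -> Handler:
        if path in ROUTES:
            raise ValueError(f"duplicate route: {path}")
        ROUTES[path] = handler
        return handler
    return register


@route("/create-model")
def create_model(model: dict) -> dict:
    """Stand-in for CreateModel(Model model); a real handler would call the Service layer."""
    return {"status": "created", "name": model["name"]}


def dispatch(path: str, body: dict) -> dict:
    """Look up the handler associated with the request path and call it."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"status": "not found"}
    return handler(body)


print(dispatch("/create-model", {"name": "demo"}))
```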
  • the internal business logic includes: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, creating an initial classification model by calling the backend service data source system, and training and predictively optimizing the initial classification model to obtain a target classification model.
  • the back-end management flow control system includes a Controller layer and a Service layer.
  • the Service layer is mainly used to create the actual call chain of the model training and optimization process, that is, to define the concrete implementation of the interfaces defined in the Controller layer.
  • This step creates the internal business logic of the backend service control interface by creating the Service layer.
  • The implementation method of a classification model training system provided by this embodiment of the present invention creates a front-end management display interface for external management, together with a back-end service data source system and a back-end service business processing unit for back-end service management, and establishes the correspondence between these units, integrating them into a single system for classification model construction, training, and predictive optimization based on the SPARK algorithm.
  • The system thus combines a front-end management display interface with a SPARK-MLlib-based process framework for classification model training in the back-end service management system.
  • When the system is used for classification model training, the entire training and optimization process can be completed through operations on the front-end management display interface alone, which can effectively simplify the classification model training operation process, thereby effectively reducing the developer's labor intensity and improving the development efficiency.
  • the step of creating a front-end management display interface in step S1 further comprises: respectively creating a training management interface, an optimization management interface, and a classification model management interface of the classification model, wherein the training management interface is used to provide external management support for the training phase of training a classification model with the SPARK algorithm, the optimization management interface is used to provide external management support for the predictive optimization phase, and the classification model management interface is used to provide external management support for the target classification model;
  • the front end interaction request interface includes: a front end training interaction request interface, a front end optimization interaction request interface, and a front end model management interaction request interface.
  • the algorithms and parameters of the training process and the prediction optimization process need to be set, and the management parameters of the classification model need to be set as well. Therefore, when creating the front-end management display interface, at least the training management interface, the optimization management interface, and the classification model management interface of the classification model need to be created.
  • an interface function also needs to be set in each management interface: a front-end training interaction request interface in the training management interface, a front-end optimization interaction request interface in the optimization management interface, and a front-end model management interaction request interface in the classification model management interface.
  • the training management interface includes at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing process setting interface; and the optimization management interface includes at least: classification model optimization The policy selection interface, the classification model optimization standard setting interface, and the prediction optimization data source setting interface; the classification model management interface at least includes: a classification model version management interface and a classification model effect presentation interface.
  • When creating the front-end management display interface, the implementation code is written in AngularJS and HTML. First, the training management interface, the optimization management interface, and the classification model management interface of the classification model are created, and then sub-interfaces are created in each management interface, including:
  • In the training management interface, a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface are created, which are respectively used for selecting the classification model algorithm, setting the classification model algorithm parameters, selecting the training data source, and configuring the preprocessing of the training data;
  • a classification model version management interface and a classification model effect presentation interface are created in the classification model management interface, which are respectively used for the version management of the classification model and the effect presentation of the classification model.
  • All front-end management display interfaces use POST requests to exchange data with the back-end service management system.
  • The implementation method of a classification model training system provided by this embodiment of the present invention, by separately creating a training management interface, an optimization management interface, and a classification model management interface of the classification model and defining the setting sub-interfaces of each management interface, makes it convenient to manage the external settings of the classification model construction process, training process, and predictive optimization process.
  • the classification algorithm is set through a pull-down selection list, so that the user only needs to click the required option without manual input, which can improve work efficiency and user experience.
  • FIG. 3 is a flowchart of a process for creating a backend service control interface according to an embodiment of the present invention, including:
  • S31, based on the front-end training interaction request interface, creating a back-end training management control interface, and establishing a correspondence between the front-end training interaction request interface and the back-end training management control interface;
  • S32, based on the front-end optimization interaction request interface, creating a back-end optimization management control interface, and establishing a correspondence between the front-end optimization interaction request interface and the back-end optimization management control interface;
  • S33, based on the front-end model management interaction request interface, creating a back-end model management control interface, and establishing a correspondence between the front-end model management interaction request interface and the back-end model management control interface.
  • the backend service control interface is created by creating the Controller layer of the backend service management system. Because the training management interface, the optimization management interface, and the classification model management interface were created when the front-end management display interface was created, and the front-end interaction request interface of each management interface was defined, creating the backend service control interface requires creating a backend training management control interface, a backend optimization management control interface, and a backend model management control interface, and establishing the correspondence between each pair of corresponding interfaces, so that each processing stage of the classification model can smoothly call the corresponding interface.
  • step numbers S31, S32, and S33 in this embodiment only distinguish each step, and do not limit the order of implementation of the corresponding steps.
  • the implementation method of the classification model training system provided by the embodiment of the present invention creates a back-end service control interface of the back-end service management system corresponding to each front-end management display interface and establishes the correspondence between the front end and the back end, so that the corresponding interface can be called quickly and accurately, improving system processing efficiency.
  • for the further processing of step S4, refer to FIG. 4, a flowchart of a process of creating an internal service logic of the backend service control interface according to the embodiment of the present invention, which includes:
  • S41: create the internal training business logic of the back-end training management control interface. The internal training business logic includes: based on the internal business logic flow of training a classification model with the SPARK algorithm and on the front-end training interaction request interface, creating an initial classification model by calling SPARK-MLlib, the training data source library, and the model system metadata database, and training the initial classification model to obtain a classification model to be optimized.
  • The internal implementation business logic of the back-end training management control interface is the internal training business logic. Its implementation process includes:
  • Following the processing rules and flow for training a classification model with the SPARK algorithm, the initial classification model is constructed by calling SPARK-MLlib, and the initial classification model data is stored in the model system metadata database. The training data is then obtained from the training data source library and used to train the initial classification model; the trained model is the classification model to be optimized.
  • The step in S41 of creating the internal training business logic of the back-end training management control interface is described further here. According to an embodiment of the present invention, the flow for creating this internal training business logic includes at least:
  • The prepared training data source is preprocessed to remove noise from the data and make it better suited to model training. The preprocessing algorithms include unifying the data format, normalization, and word substitution. The user selects a preprocessing algorithm through the front-end management display interface, and the back-end service management system invokes the corresponding processing logic according to that selection.
  • Preprocessing internal business logic must therefore be created for each preprocessing algorithm option offered in the front-end management display interface. Because the data preprocessing database already contains the data preprocessing algorithm that corresponds to each front-end option, it suffices to create the preprocessing internal business logic for the algorithms in that database.
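  • The selection of preprocessing algorithms by name can be sketched as a registry of text-cleaning steps that is assembled into a pipeline. This is only an illustration under stated assumptions: the text does not define the algorithms precisely, so "unified format" is taken here to mean lower-casing and whitespace collapsing, "normalization" to mean Unicode NFKC normalization, and "word substitution" to mean a whole-token lookup table. The names `REGISTRY` and `build_pipeline` are hypothetical.

```python
import re
import unicodedata
from typing import Callable, Dict, Iterable

Step = Callable[[str], str]


def unify_format(text: str) -> str:
    """Collapse runs of whitespace, trim, and lower-case (assumed meaning)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def normalize(text: str) -> str:
    """Fold compatibility characters such as full-width letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def make_word_substitution(table: Dict[str, str]) -> Step:
    """Build a step that replaces whole space-separated tokens via a table."""
    def substitute(text: str) -> str:
        return " ".join(table.get(token, token) for token in text.split(" "))
    return substitute


def build_pipeline(selected: Iterable[str], registry: Dict[str, Step]) -> Step:
    """Chain the steps the user selected, in the order given.

    An unknown step name raises KeyError when the pipeline is built.
    """
    steps = [registry[name] for name in selected]

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


# Hypothetical registry standing in for the data preprocessing database.
REGISTRY: Dict[str, Step] = {
    "normalize": normalize,
    "unify_format": unify_format,
    "word_substitution": make_word_substitution({"u": "you"}),
}
```

  • A front-end selection such as `["normalize", "unify_format", "word_substitution"]` would then be passed to `build_pipeline` together with `REGISTRY`, and the returned function applied to each value of the data column.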
  • S412. Create an internal service logic for generating a classification model corresponding to each of the classification algorithms based on a classification algorithm included in the SPARK-MLlib.
  • Different classification algorithms lead to different business implementation logic: the constructed classification models differ, and so do their training processes. The internal business logic for generating the classification model corresponding to each classification algorithm is therefore created so that model construction and training follow the user's selection.
  • The business logic for constructing and training classification models is built on SPARK-MLlib and covers the classification algorithms that SPARK-MLlib currently supports, such as Naive Bayes, Support Vector Machine, and Logistic Regression.
  • Step S413 creates the classification model training program. Concretely, a SPARK program is created, the training data source is read, and a classification model training script is generated and automatically uploaded to the SPARK cluster server. The system calls the script, starts the SPARK program, creates the classification model, stores the classification model result at the specified HDFS path, and stores the system metadata of the classification model, such as the confusion matrix, accuracy (correct rate), and recall rate, in the model system metadata database (MySQL).
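  • The model metadata mentioned in this step (confusion matrix, correct rate, recall rate) are related by simple arithmetic. The sketch below shows one way to derive accuracy and per-class recall from a confusion matrix before storing them; it assumes rows are true classes and columns are predicted classes, and the function name `metrics_from_confusion` is a hypothetical name, not part of any library.

```python
from typing import Dict, List, Sequence, Union

Metrics = Dict[str, Union[float, List[float]]]


def metrics_from_confusion(matrix: Sequence[Sequence[int]]) -> Metrics:
    """Compute accuracy and per-class recall.

    matrix[i][j] is the number of samples whose true class is i and whose
    predicted class is j (assumed convention).
    """
    size = len(matrix)
    if any(len(row) != size for row in matrix):
        raise ValueError("confusion matrix must be square")

    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))

    recall: List[float] = []
    for i in range(size):
        row_total = sum(matrix[i])
        # A class with no true samples gets recall 0.0 instead of dividing by 0.
        recall.append(matrix[i][i] / row_total if row_total else 0.0)

    accuracy = correct / total if total else 0.0
    return {"accuracy": accuracy, "recall_per_class": recall}
```

  • For a two-class matrix `[[8, 2], [1, 9]]`, 17 of 20 samples lie on the diagonal, so accuracy is 0.85, and the per-class recalls are 8/10 and 9/10.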
  • S42: create the internal optimization business logic of the back-end optimization management control interface. The internal optimization business logic includes: based on the internal business logic flow of optimizing a classification model with the SPARK algorithm and on the front-end optimization interaction request interface, performing prediction optimization on the classification model to be optimized by calling SPARK-MLlib, the prediction optimization data source library, and the model system metadata database, and obtaining the target classification model.
  • Correspondingly, the internal implementation business logic of the back-end optimization management control interface, that is, the internal optimization business logic, is defined for the prediction optimization of the classification model to be optimized.
  • the implementation process of the backend optimization management control interface, that is, the internal optimization business logic includes:
  • The step numbers S41 and S42 in this embodiment only distinguish the steps; they do not limit the order in which the steps are carried out.
  • FIG. 6 is a flowchart of creating the internal optimization business logic of the back-end optimization management control interface according to an embodiment of the present invention, including:
  • This step creates the logic by which the system, according to the request data of the front-end optimization interaction request interface, generates a classification model optimization strategy: it specifies the data source for optimizing the classification model and the data column to be predicted, creates a daily prediction task, and specifies the optimal parameter threshold of the classification model, that is, the prediction optimization constraint used to determine whether the classification model needs further optimization.
  • Based on the internal business logic flow of optimizing a classification model with the SPARK algorithm, the data access and prediction processing logic for prediction optimization of the classification model is created.
  • This step creates the system's prediction optimization strategy: the system reads the model from the HDFS path, loads the classification model data into memory, reads the source data to be predicted from Hive, runs the classification model on it, writes the results to the model system metadata database (MySQL), and displays the prediction results on the system page.
  • S423: create the data correction internal business logic for the prediction optimization of the classification model to be optimized. The data correction internal business logic includes: based on the prediction optimization results of the classification model to be optimized, extracting the records that the classification model predicted incorrectly and correcting the data.
  • This step creates the data correction strategy of the prediction optimization process: correcting the prediction optimization data and adding new prediction optimization sample data. The prediction results are viewed in the front-end management display interface, the records that the model predicted incorrectly are extracted and corrected, and the corrected classification model data is stored in the model system metadata database (MySQL).
  • S424: create the data update internal business logic for the prediction optimization of the classification model to be optimized. The data update internal business logic includes: extracting the classification model elements from the corrected model system metadata database into the next partition of the prediction optimization data source library.
  • This step creates the update strategy for the classification model data: the classification model data is updated, and new model features are added to the model system metadata database (MySQL). The system then calls the Sqoop tool to extract the MySQL data into the new day's partition of the prediction optimization data source (Hive).
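  • The Sqoop call in this step can be sketched as a command line that imports a MySQL table into a dated Hive partition. The text does not give the actual command, so the following is an assumption-laden illustration: the JDBC URL, table names, and the partition key `dt` are hypothetical, and credentials are deliberately left out. The flags used (`--connect`, `--table`, `--hive-import`, `--hive-table`, `--hive-partition-key`, `--hive-partition-value`) are standard options of `sqoop import`.

```python
import datetime
from typing import List


def sqoop_import_command(
    jdbc_url: str,
    mysql_table: str,
    hive_table: str,
    day: datetime.date,
    partition_key: str = "dt",  # hypothetical partition column name
) -> List[str]:
    """Build the argument list for importing one MySQL table into the
    Hive partition for the given day. Nothing is executed here."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", mysql_table,
        "--hive-import",
        "--hive-table", hive_table,
        "--hive-partition-key", partition_key,
        "--hive-partition-value", day.isoformat(),
    ]
```

  • On a host where Sqoop is installed and configured, the returned list could be handed to `subprocess.run(cmd, check=True)`; building the list separately keeps the daily partition value easy to inspect and test.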
  • The internal business logic for stopping classification model optimization includes: re-specifying the prediction optimization data source library of the classification model and repeating the prediction and data optimization business logic until the classification model parameters reach the prediction optimization constraint, at which point model optimization stops.
  • The corrected classification model needs to be trained with the updated prediction optimization data source. A model optimization stop strategy is created: the training data source of the classification model is re-designated, and the steps of training the classification model and optimizing the training data are repeated until the classification model parameters reach the preset prediction optimization constraint; training then stops and the target classification model is obtained.
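  • The stop strategy amounts to a loop that alternates evaluation, data correction, and retraining until every monitored metric meets its threshold. The sketch below illustrates that control flow only; the callables `evaluate`, `correct_and_update`, and `retrain` are hypothetical placeholders for the system's own steps, and the `max_rounds` safety limit is an added assumption that the text does not mention.

```python
from typing import Callable, Dict


def optimize_until_threshold(
    evaluate: Callable[[], Dict[str, float]],
    correct_and_update: Callable[[], None],
    retrain: Callable[[], None],
    thresholds: Dict[str, float],
    max_rounds: int = 30,  # assumed safety limit, not from the source
) -> Dict[str, object]:
    """Repeat evaluate -> correct/update -> retrain until all metrics
    reach their thresholds or max_rounds is exhausted."""
    for round_no in range(1, max_rounds + 1):
        metrics = evaluate()
        # The prediction optimization constraint: every metric must reach
        # its configured threshold.
        if all(metrics.get(name, 0.0) >= limit
               for name, limit in thresholds.items()):
            return {"rounds": round_no, "metrics": metrics, "converged": True}
        correct_and_update()  # fix wrong predictions, refresh the data source
        retrain()             # train again on the updated data
    return {"rounds": max_rounds, "metrics": evaluate(), "converged": False}
```

  • With thresholds such as `{"accuracy": 0.9, "recall": 0.85}`, the loop would return the first round in which both values are met, which corresponds to obtaining the target classification model.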
  • In the implementation method of the classification model training system provided by this embodiment of the present invention, the internal implementation business logic for constructing, training, and prediction-optimizing the classification model is created separately. When the user trains a classification model with the system, it is only necessary to set the data and parameters in the front-end management display interface; the system then completes construction, training, and prediction optimization of the classification model automatically. Operation is simple and development efficiency improves.
  • FIG. 7 is a schematic structural diagram of a classification model training system according to an embodiment of the present invention, including a front-end management display interface 1, a back-end service data source system 2, a back-end service control interface unit 3, and a back-end service business processing unit 4.
  • The front-end management display interface 1 is used for external setting and management of the classification model training process, the prediction optimization process, and classification model management. It includes a front-end interaction request interface 101 for information interaction between external management and the back-end service.
  • The back-end service data source system 2 is configured to provide, in response to internal business logic call requests for training a classification model with the SPARK algorithm, the machine learning library of the SPARK algorithm, the training data source, the prediction optimization data source, and the model system metadata database.
  • The back-end service control interface unit 3 is configured to establish a correspondence between the front-end interaction request interface and back-end service business logic calls.
  • The back-end service business processing unit 4 is configured to, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, generate an initial classification model by calling the back-end service data source system, and perform training and prediction optimization on the initial classification model to obtain a target classification model.
  • The classification model training system of this embodiment includes a front-end management display interface 1 through which the user performs external management settings, a back-end service business processing unit 4 for back-end service management, a back-end service data source system 2 that provides data support for classification model training, and a back-end service control interface unit 3 that links the user's external management with back-end service management.
  • The front-end management display interface transmits the user's settings to the back-end service management system through the front-end interaction request interface. The front-end management display page interacts with the back-end service management system through standard REST APIs.
  • A request from the front-end management display interface is transmitted through the front-end interaction request interface, and the back-end service management system identifies it through the corresponding back-end service control interface. Once the request is identified, the back-end service business processing unit 4 invokes the corresponding algorithm and flow. For the construction, training, and prediction optimization of the classification model, the back-end service business processing unit 4 calls the corresponding training and prediction optimization data to train the constructed classification model and optimize its predictions.
  • The service begins listening for requests when it starts and triggers the corresponding business logic when a request arrives. First, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, the initial classification model is created by calling the back-end service data source system; the SPARK algorithm is then used to train the initial model and optimize its predictions to obtain the target classification model.
  • The framework of the back-end service data source system is shown in FIG. 8, a schematic structural diagram of a back-end service data source system according to an embodiment of the present invention, which includes: a MySQL model system metadata database, a Hive training data source library, a MySQL-Hive prediction optimization data source library, a classification model system unit, an algorithm model unit, and a SPARK cluster.
  • The MySQL model system metadata database stores model metadata, the Hive training data source library stores training source data, and the MySQL-Hive prediction optimization data source library stores the prediction optimization data sources.
  • With the classification model training system, the user creates a classification model project in the front-end management display interface and specifies the basic flow of model training and optimization, such as the training data source, the ETL algorithm, the model algorithm, and the parameters. Subsequent training and optimization of the classification model then only require selecting and clicking in the interface, or creating a timed task that the system executes automatically, to obtain the target classification model in a short time. This avoids repeated, continuous preparation of training samples and parameter tuning, lets the user focus on optimizing and implementing the algorithm itself instead of spending a great deal of effort on data preparation and program operation, and improves development efficiency.
  • This embodiment provides a classification model training method that uses the classification model training system described above. The flowchart of classification model training with a classification model training system of the present invention includes:
  • The classification model training flow is defined in the system and a classification model training project is created. Through the front-end management display interface, the user selects the algorithm used by the classification model, sets the algorithm parameters, selects the data source table, specifies the label column and the data column in the table, and defines the preprocessing (ETL) flow of the training data source. The initial data column is preprocessed with the specified operations, such as data unification, normalization, and word replacement, to remove noise from the data column and make it better suited to model training.
  • The back-end service management system obtains the user's custom settings through the front-end interaction request interface and the back-end service control interface unit.
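  • The settings that the user makes when creating a training project (algorithm, parameters, source table, label column, data column, ETL steps) can be pictured as one validated record that is sent to the back end. The following is a minimal sketch under assumptions: the class `TrainingProject`, its field names, and the algorithm identifiers are hypothetical, and the validation rules are illustrative rather than taken from the source.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Identifiers for the algorithm families named in the text (hypothetical keys).
SUPPORTED_ALGORITHMS = ("naive_bayes", "svm", "logistic_regression")


@dataclass
class TrainingProject:
    """User settings collected by the front-end training management interface."""
    name: str
    algorithm: str
    algorithm_params: Dict[str, float]
    source_table: str
    label_column: str
    data_column: str
    etl_steps: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Reject settings the back end could not act on.
        if self.algorithm not in SUPPORTED_ALGORITHMS:
            raise ValueError(f"unsupported algorithm: {self.algorithm}")
        if self.label_column == self.data_column:
            raise ValueError("label column and data column must differ")

    def to_request(self) -> dict:
        """Return the payload for the front-end training interaction request."""
        self.validate()
        return asdict(self)
```

  • Validating before the request is sent keeps obviously inconsistent settings, such as using the label column as the data column, from reaching the training script generation step.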
  • The back-end service business processing unit internally invokes the machine learning library of the SPARK algorithm to construct an initial classification model and stores the model in the model system metadata database.
  • The back-end service business processing unit creates a model training script according to the selected model algorithm and training data source and automatically uploads it to the SPARK cluster server. The system calls the script, starts the SPARK program, builds the initial classification model, stores the model result at the specified HDFS path, and stores classification model data such as the confusion matrix, accuracy (correct rate), and recall rate in the model system metadata database (MySQL).
  • The back-end service business processing unit internally invokes the training data source and trains the initial classification model with the SPARK algorithm to obtain a classification model to be optimized.
  • FIG. 10 is a flowchart of training a classification model with the SPARK algorithm according to an embodiment of the present invention. According to the training data source, classification model, and preprocessing parameters that the user selected through the front-end management display interface, the back-end service business processing unit calls the corresponding labeled training data source in the Hive library and initializes the data source. The initial classification model constructed in the preceding steps is then trained with the processed training data source. The model training result is written into the model system metadata database (MySQL) and the model prediction file is written into the system storage unit, which yields the classification model to be optimized.
  • FIG. 11 is a flowchart of prediction optimization of a classification model with the SPARK algorithm according to an embodiment of the present invention. The system creates a daily prediction task according to the prediction optimization data source and the data column to be predicted that the user selected for the classification model, and the optimal parameter threshold of the model is specified, that is, the prediction optimization constraint used to determine whether the classification model needs further optimization.
  • The system reads the model from the HDFS path, loads the classification model data into memory, reads the data to be predicted from Hive, runs the classification model on it, writes the prediction results to the model system metadata database (MySQL), and displays them in the front-end management display interface.
  • The classification model parameters are then optimized: the data is corrected and new training samples are added. The prediction results of the classification model are viewed in the front-end management display interface, and the records that the model predicted incorrectly are extracted, corrected, and stored in the model system metadata database (MySQL).
  • The training data source is updated for the classification model whose parameters were optimized. The new feature data is stored in the model system metadata database (MySQL), and the system calls the Sqoop tool to extract the MySQL data into the new day's partition of the prediction optimization data source (Hive).
  • With the classification model training method described above, the parameters of classification model construction, training, and prediction optimization are selected and set in the front-end management display interface, and the back-end service management system automatically creates the construction, training, and prediction optimization flows and obtains a target classification model that conforms to the settings. This effectively simplifies the classification model training workflow, reduces developer workload, and improves development efficiency.
  • FIG. 12 is a structural block diagram of an apparatus for implementing a classification model training system according to an embodiment of the present application. The apparatus includes a processor 1201, a memory 1202, and a bus 1203.
  • The processor 1201 and the memory 1202 communicate with each other through the bus 1203.
  • The memory 1202 stores program instructions executable by the processor 1201, and the processor 1201 is configured to invoke the program instructions in the memory 1202 to perform the foregoing implementation method of the classification model training system. The method includes, for example: creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service; S21, importing the machine learning library SPARK-MLlib of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and storing the prediction optimization sample data in the prediction optimization data source library; and so on.
  • This embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the foregoing implementation method of the classification model training system. The method includes, for example: creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service; S21, importing the machine learning library SPARK-MLlib of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and storing the prediction optimization sample data in the prediction optimization data source library; and so on.
  • Another embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the foregoing implementation method of the classification model training system. The method includes, for example: creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service.
  • A person skilled in the art will understand that all or part of the steps of the foregoing implementation method may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments described above. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • The implementation apparatus of the classification model training system described above is merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A classification model training system and a realisation method therefor. The realisation method comprises: S1, creating a front-end management and display interface (1) of a SPARK algorithm classification model training system, and defining a front-end interaction request interface (101) of the front-end management and display interface (1); S2, creating a back-end service data source system (2) of the SPARK algorithm classification model training system; S3, based on the front-end interaction request interface (101) of the front-end management and display interface (1), creating a back-end service control interface (3), and establishing a correlation between the back-end service control interface (3) and the front-end interaction request interface (101); and S4, creating a SPARK algorithm training and optimised classification model based internal service logic in the back-end service control interface (3). By means of the classification model training system, an operation flow of classification model training can be effectively simplified, so as to effectively reduce the labour intensity for a developer and improve the development efficiency.

Description

Classification model training system and implementation method thereof
Cross-reference
This application refers to Chinese Patent Application No. 201710756004.6, filed on August 29, 2017 and entitled "Classification model training system and implementation method thereof", which is incorporated into this application by reference in its entirety.
Technical field
The present invention relates to the field of information processing technologies, and in particular, to a classification model training system and an implementation method thereof.
Background
At present, machine learning with SPARK.MLlib, the machine learning library of the SPARK algorithm, has become a common approach. To train a classification algorithm model with SPARK.MLlib quickly and conveniently, and because classification algorithms are supervised learning, a large number of labeled samples must be prepared in advance and divided into training samples and test samples. SPARK.MLlib then uses these labeled samples to train the classification algorithm model, and during this process the samples and model parameters must be adjusted continually to optimize the model.
The common way to optimize a classification model is to keep adding training samples manually so that the samples cover all the features of the model, which raises the accuracy and recall rate of the classification model. Manually adding training samples and tuning model parameters costs developers a great deal of time and effort in data preparation and program operation, resulting in low development efficiency.
Summary of the invention
To overcome the above problems, or at least partially solve them, the present invention provides a classification model training system and an implementation method thereof, with the aim of effectively simplifying the classification model training workflow and thereby reducing developer workload and improving development efficiency.
According to one aspect of the present invention, an implementation method of a classification model training system is provided, including:
S1: based on the external management requirements of training a classification model with the SPARK algorithm, create a front-end management display interface, and, based on the interaction requirements between external management and the back-end service, define a front-end interaction request interface of the front-end management display interface;
S2: based on the internal business data requirements of training a classification model with the SPARK algorithm, create a back-end service data source system;
S3: based on the front-end interaction request interface of the front-end management display interface, create a back-end service control interface, and establish a correspondence between the back-end service control interface and the front-end interaction request interface;
S4: create the internal business logic of the back-end service control interface. The internal business logic includes: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, creating an initial classification model by calling the back-end service data source system, and performing training and prediction optimization on the initial classification model to obtain a target classification model.
The step of creating the front-end management display interface in step S1 further comprises:
creating a training management interface, an optimization management interface and a classification model management interface for the classification model, wherein the training management interface provides external management support for the training phase of training a classification model with the SPARK algorithm, the optimization management interface provides external management support for the prediction optimization phase of training a classification model with the SPARK algorithm, and the classification model management interface provides external management support for the target classification model;
correspondingly, the front-end interaction request interface comprises: a front-end training interaction request interface, a front-end optimization interaction request interface and a front-end model management interaction request interface.
The training management interface comprises at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface and a data preprocessing flow setting interface;
the optimization management interface comprises at least: a classification model optimization strategy selection interface, a classification model optimization criterion setting interface and a prediction optimization data source setting interface;
the classification model management interface comprises at least: a classification model version management interface and a classification model effect display interface.
The step of creating the back-end service data source system in step S2 further comprises:
S21, importing SPARK-MLlib, the machine learning library of the SPARK algorithm, and creating a training data source library, a prediction optimization data source library and a model system metadata database;
S22, storing the prepared training sample data in the training data source library, and storing the prediction optimization sample data in the prediction optimization data source library.
Step S3 further comprises:
S31, creating a back-end training management control interface based on the front-end training interaction request interface, and establishing a correspondence between the front-end training interaction request interface and the back-end training management control interface;
S32, creating a back-end optimization management control interface based on the front-end optimization interaction request interface, and establishing a correspondence between the front-end optimization interaction request interface and the back-end optimization management control interface;
S33, creating a back-end model management control interface based on the front-end model management interaction request interface, and establishing a correspondence between the front-end model management interaction request interface and the back-end model management control interface.
Step S4 further comprises at least:
S41, creating the internal training business logic of the back-end training management control interface, the internal training business logic comprising: based on the internal business logic flow of training a classification model with the SPARK algorithm and on the front-end training interaction request interface, creating an initial classification model by calling SPARK-MLlib, the training data source library and the model system metadata database, and training the initial classification model to obtain a classification model to be optimized;
S42, creating the internal optimization business logic of the back-end optimization management control interface, the internal optimization business logic comprising: based on the internal business logic flow of optimizing a classification model with the SPARK algorithm and on the front-end optimization interaction request interface, performing prediction optimization on the classification model to be optimized by calling SPARK-MLlib, the prediction optimization data source library and the model system metadata database, to obtain the target classification model.
The step of creating the internal training business logic of the back-end training management control interface in step S41 further comprises at least:
S411, based on the data preprocessing algorithms contained in a data preprocessing database, creating the preprocessing internal business logic corresponding to each data preprocessing algorithm;
S412, based on the classification algorithms contained in SPARK-MLlib, creating the internal business logic for generating a classification model corresponding to each classification algorithm;
S413, based on the setting data of the training management interface, creating the internal business logic for training the classification model by calling the training data source library, the preprocessing internal business logic and the internal business logic for generating a classification model.
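Steps S411 to S413 can be pictured as two registries plus a function that wires them together according to the front-end settings. The sketch below is illustrative only and is written in plain Python rather than against SPARK-MLlib: the registry names, the settings keys (`preprocessing`, `algorithm`) and the toy token-vote classifier are all assumptions, not part of the described system.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]          # (label, text)
Model = Callable[[str], str]      # text -> predicted label

# S411: one routine per preprocessing algorithm (names are hypothetical).
PREPROCESSORS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lowercase": str.lower,
}


def token_vote_trainer(samples: List[Sample]) -> Model:
    """Toy stand-in for an MLlib classifier: each known token votes for its labels."""
    votes: Dict[str, Counter] = {}
    for label, text in samples:
        for token in text.split():
            votes.setdefault(token, Counter())[label] += 1
    fallback = Counter(label for label, _ in samples).most_common(1)[0][0]

    def predict(text: str) -> str:
        tally: Counter = Counter()
        for token in text.split():
            tally.update(votes.get(token, Counter()))
        return tally.most_common(1)[0][0] if tally else fallback

    return predict


# S412: one model-building routine per classification algorithm (hypothetical name).
TRAINERS: Dict[str, Callable[[List[Sample]], Model]] = {
    "token_vote": token_vote_trainer,
}


def build_training_logic(settings: dict, source: List[Sample]) -> Model:
    """S413: combine data source, preprocessing chain and algorithm from the settings."""
    def preprocess(text: str) -> str:
        for name in settings["preprocessing"]:
            text = PREPROCESSORS[name](text)
        return text

    cleaned = [(label, preprocess(text)) for label, text in source]
    classify = TRAINERS[settings["algorithm"]](cleaned)
    # Apply the same preprocessing at prediction time as at training time.
    return lambda text: classify(preprocess(text))
```

Keeping the preprocessing routines and the classifier builders in separate lookup tables means a new algorithm can be offered in the training management interface by registering one more entry, without touching the wiring in `build_training_logic`.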
The step of creating the internal optimization business logic of the back-end optimization management control interface in step S42 further comprises:
S421, based on the request data of the front-end optimization interaction request interface, selecting the prediction optimization data source and the prediction optimization constraints for the classification model to be optimized;
S422, based on the internal business logic flow of the optimization process of training a classification model with the SPARK algorithm, creating the data access and prediction processing logic for the prediction optimization of the classification model to be optimized;
S423, creating the data correction internal business logic for the prediction optimization of the classification model to be optimized, the data correction internal business logic comprising: based on the optimization result for the classification model to be optimized, extracting the records that the classification model predicted incorrectly and correcting the data;
S424, creating the data update internal business logic for the prediction optimization of the classification model to be optimized, the data update internal business logic comprising: based on the model system metadata database as corrected by the data correction, extracting classification model metadata and importing it into the next partition of the prediction optimization data source library;
S425, based on the prediction optimization constraints, creating the internal business logic for stopping optimization of the classification model, which comprises: re-specifying the prediction optimization data source library of the classification model and creating the prediction and data optimization business logic, until the classification model parameters satisfy the prediction optimization constraints, at which point model optimization stops.
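The loop formed by steps S421 to S425 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the text does not say how corrected records influence the model, so the sketch assumes they are fed back into the training set and the model is retrained; the toy `train` function stands in for a SPARK-MLlib classifier; and the stop condition is assumed to be a target accuracy.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]          # (correct label, text)
Model = Callable[[str], str]      # text -> predicted label


def train(samples: List[Sample]) -> Model:
    """Toy trainer: remembers the majority label seen for each exact text."""
    by_text: Dict[str, Counter] = {}
    for label, text in samples:
        by_text.setdefault(text, Counter())[label] += 1
    fallback = Counter(label for label, _ in samples).most_common(1)[0][0]

    def predict(text: str) -> str:
        seen = by_text.get(text)
        return seen.most_common(1)[0][0] if seen else fallback

    return predict


def optimize(train_set: List[Sample],
             partitions: List[List[Sample]],
             target_accuracy: float) -> Tuple[Model, List[float]]:
    """Run prediction optimization partition by partition until the constraint is met."""
    model = train(train_set)
    history: List[float] = []
    for partition in partitions:               # S421 / S425: (re)select the data source
        if not partition:
            continue
        # S422 + S423: predict, then extract the records the model got wrong.
        wrong = [(label, text) for label, text in partition if model(text) != label]
        accuracy = 1 - len(wrong) / len(partition)
        history.append(accuracy)
        if accuracy >= target_accuracy:        # S425: constraint reached, stop
            break
        # S424 (assumed): corrected records flow into the next round of training.
        train_set = train_set + wrong
        model = train(train_set)
    return model, history
```

The returned `history` gives one accuracy value per partition processed, which is the kind of figure a classification model effect display interface could plot.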
According to another aspect of the present invention, a classification model training system is provided, comprising:
a front-end management display interface, used for the external setting management of the classification model training process, the prediction optimization process and classification model management, the front-end management display interface including a front-end interaction request interface for exchanging information between external management and the back-end service;
a back-end service data source system, used to provide, in response to internal business logic calls made while training a classification model with the SPARK algorithm, the machine learning data source of the SPARK algorithm, the training data source, the prediction optimization data source and the model system metadata database;
a back-end service control interface unit, used to establish the correspondence between the front-end interaction request interface and the back-end service business logic calls;
a back-end service business processing unit, used to create an initial classification model by calling the back-end service data source system, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, and to train and predictively optimize the initial classification model to obtain a target classification model.
According to yet another aspect of the present invention, a classification model training method using the classification model training system described above is provided, comprising:
obtaining, through the front-end interaction request interface and the back-end service control interface unit, the construction setting data, training process setting data and optimization process setting data of the classification model entered in the front-end management display interface;
based on the construction setting data of the classification model, constructing an initial classification model by having the back-end service business processing unit internally call the machine learning data source of the SPARK algorithm, and storing the initial classification model in the model system metadata database;
based on the training process setting data of the classification model, training the initial classification model with the SPARK algorithm by having the back-end service business processing unit internally call the training data source, to obtain a classification model to be optimized;
based on the optimization process setting data of the classification model, performing prediction optimization on the classification model to be optimized with the SPARK algorithm by having the back-end service business processing unit internally call the prediction optimization data source, to obtain a target classification model.
According to a further aspect of the present invention, a device for implementing a classification model training system is provided, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by invoking the program instructions, is able to perform the method for implementing a classification model training system according to any of the above.
According to a further aspect of the present invention, a non-transitory computer-readable storage medium is provided, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the method for implementing a classification model training system according to any of the above.
According to a further aspect of the present invention, a computer program product is provided, characterized in that the computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method for implementing a classification model training system according to any of the above.
The present application proposes a classification model training system and a method for implementing it. The system integrates the steps involved in training a classification model with SPARK-MLlib (preparing training data and training the model, adding training samples with new features, and optimizing model parameters) into a single SPARK-MLlib-based classification model training system. To train a classification model with this system, the user only needs to create a classification model project in the front-end management display interface and specify the basic elements of the training and optimization flow, such as the training data source, the ETL algorithm, the model algorithm and its parameters; the classification model is then created, trained and predictively optimized automatically. This simplifies the classification model training workflow, which reduces the developer's workload and improves development efficiency.
Description of the Drawings
FIG. 1 is a flowchart of a method for implementing a classification model training system according to an embodiment of the present application;
FIG. 2 is a flowchart of a process for creating a back-end service data source system according to an embodiment of the present application;
FIG. 3 is a flowchart of a process for creating a back-end service control interface according to an embodiment of the present application;
FIG. 4 is a flowchart of a process for creating the internal business logic of a back-end service control interface according to an embodiment of the present application;
FIG. 5 is a flowchart of a process for creating the internal training business logic of a back-end training management control interface according to an embodiment of the present application;
FIG. 6 is a flowchart of a process for creating the internal optimization business logic of a back-end optimization management control interface according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a classification model training system according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a back-end service data source system according to an embodiment of the present application;
FIG. 9 is a flowchart of a method for training a classification model with the classification model training system of the present application, according to an embodiment of the present application;
FIG. 10 is a flowchart of a process for training a classification model with the SPARK algorithm according to an embodiment of the present application;
FIG. 11 is a flowchart of a process for the prediction optimization of a classification model with the SPARK algorithm according to an embodiment of the present application;
FIG. 12 is a structural block diagram of a device for implementing a classification model training system according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings of its embodiments. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
As one aspect of the embodiments of the present invention, this embodiment provides a method for implementing a classification model training system. FIG. 1 is a flowchart of this method, which comprises:
S1, creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface for the front-end management display interface based on the interaction requirements between external management and the back-end service.
That is, the goal of this embodiment is to build a classification model training system based on the SPARK algorithm. The system as a whole trains and optimizes classification models automatically, with a management display interface at the front end and a service management system at the back end. The user sets the algorithm and parameters for classification model training through the front-end management display interface. The back-end service management system then calls the corresponding data sources according to those settings, builds the classification model with the built-in SPARK algorithm, and calls the training data source and the prediction optimization data source to train and predictively optimize the model, obtaining the target classification model.
Step S1 takes into account the external management requirements of generating, training and predictively optimizing a classification model with the SPARK algorithm, that is, the data sources that must be prepared externally and the algorithms and parameters that must be set. The front-end management display interface is created accordingly, and a management interface is provided on it for each management requirement. In addition, because the front-end management display interface must exchange data with the back-end service management system in order to pass the user's algorithm and parameter settings to it, a front-end interaction request interface is defined in the front-end management display interface according to the interaction requirements between external management and the back-end service.
When the user configures the classification model training process through the front-end management display interface, the interface transmits the user's settings to the back-end service management system through the front-end interaction request interface. The front-end management display page interacts with the back-end service management system through a standard REST API.
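As an illustration of what such a REST interaction might look like from the front-end side, the sketch below builds (but does not send) an HTTP request carrying the user's settings as JSON. The `/create-model` path and port 8180 are taken from examples elsewhere in this description; the JSON field names and the helper function are hypothetical.

```python
import json
from urllib.request import Request


def build_create_model_request(host: str, settings: dict) -> Request:
    """Package front-end settings as a JSON POST to the back-end (field names assumed)."""
    body = json.dumps(settings).encode("utf-8")
    return Request(
        url=f"http://{host}/create-model",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical settings a user might choose in the training management interface.
request = build_create_model_request(
    "localhost:8180",
    {"name": "demo", "algorithm": "naive_bayes", "trainingSource": "training_samples"},
)
```

Passing `request` to `urllib.request.urlopen` would submit it; that call is omitted here because it needs a running back-end service.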
S2, creating a back-end service data source system based on the internal business data requirements of training a classification model with the SPARK algorithm.
That is, building, training and predictively optimizing a classification model requires calling the corresponding algorithms and flows, and training and optimizing the constructed model with the corresponding training data and prediction optimization data. This step therefore creates the system that provides these algorithms, flows and data.
Specifically, the system is created according to the algorithms, flows and data needed to build, train and predictively optimize a classification model. When the SPARK-based classification model training system is created, the individual data units are created according to the algorithms, flows and data the SPARK algorithm requires, that is, the internal business data requirements. Together, these data units form the back-end service data source system.
The back-end service data source system is an important part of the overall system: it carries the flow control of the entire model training process, data storage and the model optimization strategy, and it provides data for display at the front end.
Optionally, the further processing steps for creating the back-end service data source system in step S2 are shown in FIG. 2, a flowchart of a process for creating a back-end service data source system according to an embodiment of the present invention, comprising:
S21, importing SPARK-MLlib, the machine learning library of the SPARK algorithm, and creating a training data source library, a prediction optimization data source library and a model system metadata database;
S22, storing the prepared training sample data in the training data source library, and storing the prediction optimization sample data in the prediction optimization data source library.
That is, this step prepares the data sources. The back-end service management system is designed on the Spring Boot microservice framework; system metadata is stored in a MySQL database; model training and optimization are built with SPARK-MLlib; and the data sources used to train models are stored in Hive. Training a classification model with the SPARK algorithm requires the SPARK machine learning library, so SPARK-MLlib is imported first, and then the training data source, the model system metadata source and the prediction optimization data source are prepared in turn:
First, the labeled training data source is prepared. The manually prepared, labeled training sample data is stored in a Hive database in a table with a label column (`lable`) and a data column (`data`). This database is called the training data source library.
Second, the model system metadata source (MySQL) is prepared. It stores the model metadata and is called the model system metadata database (MySQL).
Third, the prediction optimization data source is prepared. The prediction optimization sample data is used to optimize the classification model continuously and is called the prediction optimization data source (Hive-MySQL). On the Hive side, the prediction optimization data source is a partitioned Hive table, partitioned by day, that stores the data to be predicted each day. On the MySQL side, the prediction optimization data source stores the data exchanged with the front-end management display interface and is imported from the Hive table.
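The two Hive tables described above might be declared roughly as follows. This is a sketch that only assembles HiveQL strings; the table names and the partition column `dt` are hypothetical, while the `lable` and `data` column names follow the text. The column types are assumed to be `STRING`.

```python
from datetime import date, timedelta

# Training data source: one label column and one data column, as described above.
TRAINING_DDL = (
    "CREATE TABLE IF NOT EXISTS training_samples "
    "(lable STRING, data STRING)"
)

# Prediction optimization data source: same columns, partitioned by day.
PREDICTION_DDL = (
    "CREATE TABLE IF NOT EXISTS prediction_samples "
    "(lable STRING, data STRING) "
    "PARTITIONED BY (dt STRING)"
)


def add_partition_sql(table: str, day: date) -> str:
    """HiveQL that registers the partition holding one day's data to be predicted."""
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{day.isoformat()}')"


def next_partition(day: date) -> date:
    """With daily partitioning, the 'next partition' is simply the following day."""
    return day + timedelta(days=1)
```

For example, `add_partition_sql("prediction_samples", next_partition(date(2017, 12, 31)))` yields the statement for the `dt='2018-01-01'` partition.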
S3, creating a back-end service control interface based on the front-end interaction request interface of the front-end management display interface, and establishing a correspondence between the back-end service control interface and the front-end interaction request interface.
That is, in the SPARK-based classification model training system, the back-end management flow control system contains a control (Controller) layer and a service (Service) layer. The Controller layer connects requests from the front-end management display interface to back-end service data calls, and the Service layer creates the actual call chain of the model training and optimization process.
Step S3 can be understood as the creation of the Controller layer. Requests from the front-end management display interface are passed through its front-end interaction request interface. So that the back-end service management system can recognize a request sent through the front-end interaction request interface, a corresponding back-end service control interface is created, and a correspondence is established between that back-end service control interface and the corresponding front-end interaction request interface.
A front-end interaction request interface is a request URL in HTTP form. A different request URL is created for each kind of business request, so that the URL of each business request is unique.
A back-end service control interface is a code method that implements the business logic. Its role is to implement, in concrete code on the server side, the business request that the front end describes as a URL.
For example, the front-end management display interface defines the URL for creating a classification model as ip:port/create-model, and the Controller layer of the back-end service management system defines a createModel(Model model) function and maps it to /create-model. When the back-end service receives a /create-model request from the front end, the createModel function is invoked.
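The URL-to-function correspondence in this example can be illustrated with a small, framework-free routing table. This is a sketch only: the description builds the back end on Spring Boot, where such a mapping would normally be declared on the controller method, whereas the `route` decorator, the `ROUTES` table and the payload shape below are hypothetical Python stand-ins.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]

# One entry per front-end interaction request URL.
ROUTES: Dict[str, Handler] = {}


def route(path: str) -> Callable[[Handler], Handler]:
    """Bind a request path to a control-layer function, rejecting duplicate paths."""
    def register(handler: Handler) -> Handler:
        if path in ROUTES:
            raise ValueError(f"duplicate route: {path}")
        ROUTES[path] = handler
        return handler
    return register


@route("/create-model")
def create_model(model: dict) -> dict:
    """Control-layer entry point; a real system would delegate to the Service layer."""
    return {"status": "created", "name": model["name"]}


def dispatch(path: str, payload: dict) -> dict:
    """Invoke the function that corresponds to the requested path."""
    return ROUTES[path](payload)
```

Rejecting a second registration for the same path enforces the requirement that each business request has a unique URL.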
S4, creating the internal business logic of the back-end service control interface, the internal business logic comprising: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, creating an initial classification model by calling the back-end service data source system, and training and predictively optimizing the initial classification model to obtain a target classification model.
That is, as described in the above embodiment, the back-end management flow control system contains a Controller layer and a Service layer. The Service layer creates the actual call chain of the model training and optimization process; in other words, it defines the concrete implementation of the interfaces declared in the Controller layer. This step creates the internal business logic of the back-end service control interface by creating the Service layer.
First, a Spring Boot entry program is created and bound to port 8180. Once the service starts, it listens for requests, and each incoming request triggers the corresponding business logic. Next, the concrete implementation of the interfaces declared in the Controller layer is defined. This implementation first creates an initial classification model by calling the back-end service data source system, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, and then trains and predictively optimizes the initial model with the SPARK algorithm to obtain the target classification model. For example, the createModel(Model model) function is declared in the Controller layer, and the process that actually implements the createModel function is defined in the Service layer.
The method for implementing a classification model training system provided by this embodiment of the present invention creates a front-end management display interface for external management, together with a back-end service data source system and a back-end service business processing unit for back-end service management, and establishes the correspondences between these system units. It thereby integrates the construction, training and prediction optimization of classification models with the SPARK algorithm into one system, forming a SPARK-MLlib-based workflow framework in which settings are made in the front-end management display interface and classification model training is carried out in the back-end service management system. When this system is used to train a classification model, the entire training and optimization process can be completed by operating the front-end management display interface alone. This simplifies the classification model training workflow, which reduces the developer's workload and improves development efficiency.
在一个实施例中,步骤S1中所述创建前端管理展示界面的步骤进一步包括:分别创建分类模型的训练管理界面、优化管理界面和分类模型管理界面,所述训练管理界面用于为SPARK算法训练分类模型的训练阶段提供外部管理支持,所述优化管理界面用于为SPARK算法训练分类模型的预测优化阶段提供外部管理支持,所述分类模型管理界面用于为所述目标分类模型提供外部管理支持;相应的,所述前端交 互请求接口包括:前端训练交互请求接口、前端优化交互请求接口和前端模型管理交互请求接口。In an embodiment, the step of creating a front-end management presentation interface in step S1 further comprises: respectively creating a training management interface, an optimization management interface, and a classification model management interface of the classification model, wherein the training management interface is used for training the SPARK algorithm. The training phase of the classification model provides external management support for providing external management support for the predictive optimization phase of the SPARK algorithm training classification model, the classification model management interface being used to provide external management support for the target classification model Correspondingly, the front end interaction request interface includes: a front end training interaction request interface, a front end optimization interaction request interface, and a front end model management interaction request interface.
It can be understood that, when a classification model is trained, the algorithms and parameters of the training process and of the prediction optimization process need to be set; in addition, management parameters need to be set in order to manage the classification model. Therefore, when the front-end management display interface is created, at least the training management interface, the optimization management interface, and the classification model management interface of the classification model need to be created.
Similarly, to exchange data with the back-end service management system, an interface function needs to be provided in each management interface: the front-end training interaction request interface in the training management interface, the front-end optimization interaction request interface in the optimization management interface, and the front-end model management interaction request interface in the classification model management interface.
Optionally, the training management interface includes at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface. The optimization management interface includes at least: a classification model optimization strategy selection interface, a classification model optimization criterion setting interface, and a prediction optimization data source setting interface. The classification model management interface includes at least: a classification model version management interface and a classification model effect display interface.
It can be understood that, according to the above embodiment, the front-end management display interface is implemented in AngularJS and HTML. The training management interface, the optimization management interface, and the classification model management interface of the classification model are created first, and sub-interfaces are then created within each management interface, as follows:
In the training management interface, a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface are created. They are used, respectively, to select the classification model algorithm, to set the algorithm parameters, to select the training data source, and to configure the preprocessing of the training data;
In the optimization management interface, a classification model optimization strategy selection interface, a classification model optimization criterion setting interface, and a prediction optimization data source setting interface are created. They are used, respectively, to select the optimization strategy, to set the optimization criterion, and to select the prediction optimization data source;
In the classification model management interface, a classification model version management interface and a classification model effect display interface are created. They are used, respectively, for version management of the classification model and for displaying its effect.
The selection settings above may be made through drop-down lists. All front-end management display interfaces use POST requests to exchange data with the back-end service management system.
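As an illustration of the data exchange described above, the following minimal sketch assembles the settings of the training management interface into a JSON body that could be sent in a POST request. The field names and example values are hypothetical; the source states only which sub-interfaces exist, not the request format.

```python
import json


def build_training_request(algorithm, params, source_table, preprocessing):
    """Collect training management settings into a JSON body for a POST request.

    All field names are assumptions made for illustration only.
    """
    payload = {
        "algorithm": algorithm,          # value chosen in the algorithm drop-down list
        "params": params,                # algorithm parameter settings
        "dataSource": source_table,      # selected training data source
        "preprocessing": preprocessing,  # ordered list of preprocessing steps
    }
    return json.dumps(payload, sort_keys=True)


body = build_training_request(
    "naive_bayes", {"lambda": 1.0}, "train_samples", ["normalize"]
)
print(body)
```

Because every choice comes from a drop-down list, the back end would receive only values it already knows how to handle, which is what allows the later steps to dispatch on them directly.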
In the implementation method of a classification model training system provided by this embodiment of the present invention, the training management interface, the optimization management interface, and the classification model management interface of the classification model are created separately, and the setting sub-interfaces of each management interface are defined. This makes external management of the settings for classification model construction, training, and prediction optimization convenient. In addition, because settings such as the classification algorithm are chosen from drop-down lists, the user only needs to click the desired option and does not have to type anything manually, which improves work efficiency and the user experience.
In another embodiment, the further processing of step S3 is shown in FIG. 3, which is a flowchart of a process for creating the back-end service control interfaces according to an embodiment of the present invention. The process includes:
S31: based on the front-end training interaction request interface, create a back-end training management control interface, and establish the correspondence between the front-end training interaction request interface and the back-end training management control interface.
S32: based on the front-end optimization interaction request interface, create a back-end optimization management control interface, and establish the correspondence between the front-end optimization interaction request interface and the back-end optimization management control interface.
S33: based on the front-end model management interaction request interface, create a back-end model management control interface, and establish the correspondence between the front-end model management interaction request interface and the back-end model management control interface.
It can be understood that, according to the above embodiment, the back-end service control interfaces are created by creating the Controller layer of the back-end service management system. When the front-end management display interface was created, the training management interface, the optimization management interface, and the classification model management interface were created, and the front-end interaction request interface of each management interface was defined. Therefore, when the back-end service control interfaces are created, a back-end training management control interface, a back-end optimization management control interface, and a back-end model management control interface need to be created correspondingly, and the correspondence between each pair of interfaces needs to be established, so that the appropriate interface can be called at each processing stage of obtaining the classification model. In addition, the step labels S31, S32, and S33 in this embodiment serve only to distinguish the steps and do not limit the order in which they are carried out.
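The one-to-one correspondence between front-end request interfaces and back-end control interfaces can be sketched as a lookup table. The route names and handler functions below are hypothetical; the source states only that each front-end interaction request interface is paired with one back-end control interface in the Controller layer.

```python
def handle_training(request):
    """Stand-in for the back-end training management control interface."""
    return {"stage": "training", "settings": request}


def handle_optimization(request):
    """Stand-in for the back-end optimization management control interface."""
    return {"stage": "optimization", "settings": request}


def handle_model_management(request):
    """Stand-in for the back-end model management control interface."""
    return {"stage": "model_management", "settings": request}


# Assumed route names; each front-end interface maps to exactly one handler.
CONTROL_INTERFACES = {
    "/train": handle_training,
    "/optimize": handle_optimization,
    "/model": handle_model_management,
}


def dispatch(route, request):
    """Look up and call the control interface paired with a front-end route."""
    try:
        handler = CONTROL_INTERFACES[route]
    except KeyError:
        raise ValueError(f"no control interface registered for {route!r}")
    return handler(request)
```

Since the three entries are independent, they can be registered in any order, which is consistent with the remark that S31, S32, and S33 do not imply an ordering.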
In the implementation method of a classification model training system provided by this embodiment of the present invention, a back-end service control interface of the back-end service management system is created for each front-end management display interface, and the correspondence between front end and back end is established. The matching interface can therefore be called quickly and accurately, which improves the processing efficiency of the system.
In yet another embodiment, the further processing of step S4 is shown in FIG. 4, which is a flowchart of a process for creating the internal business logic of the back-end service control interfaces according to an embodiment of the present invention. The process includes:
S41: create the internal training business logic of the back-end training management control interface. The internal training business logic includes: based on the internal business logic flow by which the SPARK algorithm trains a classification model, and on the front-end training interaction request interface, calling SPARK-MLlib, the training data source library, and the model system metadata database to create an initial classification model, and training the initial classification model to obtain a classification model to be optimized.
It can be understood that, for the model training system to build and train the classification model on its own according to the settings made in the front-end management display interface, the internal business logic that implements the back-end training management control interface, namely the internal training business logic, needs to be defined. The implementation of the back-end training management control interface, that is, the internal training business logic, includes:
According to the data from the front-end training interaction request interface, and following the processing rules and flow by which the SPARK algorithm trains a classification model, an initial classification model is built by calling SPARK-MLlib, and the data of the initial classification model are stored in the model system metadata database. The training data source is then obtained by accessing the training data source library and is used to train the initial classification model. The trained classification model is the classification model to be optimized.
Optionally, the further processing for creating the internal training business logic of the back-end training management control interface in step S41 is shown in FIG. 5, which is a flowchart of a process for creating the internal training business logic of the back-end training management control interface according to an embodiment of the present invention. The process includes at least:
S411: based on the data preprocessing algorithms contained in the data preprocessing database, create the preprocessing internal business logic corresponding to each data preprocessing algorithm.
It can be understood that, during classification model training, the prepared training data source is preprocessed before it is used to train the initial classification model, in order to remove noise from the data and make it better suited to model training. There are many preprocessing algorithms for training data, such as format unification, normalization, and word replacement. The user selects the preprocessing algorithm through the front-end management display interface, and the back-end service management system calls the corresponding processing logic according to that selection.
Therefore, preprocessing internal business logic needs to be created for each preprocessing algorithm option offered in the front-end management display interface. Because the data preprocessing algorithms behind those options are all contained in the data preprocessing database, it suffices to create the corresponding preprocessing internal business logic for each data preprocessing algorithm in that database.
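A minimal sketch of how preprocessing options selected in the front end might be mapped to processing logic is given below. The option names, the two text-cleaning functions, and the default substitution table are assumptions for illustration; the source names only the kinds of preprocessing (format unification, normalization, word replacement), not their implementation.

```python
def unify_format(text):
    """Collapse runs of whitespace, trim the ends, and lowercase the text."""
    return " ".join(text.split()).lower()


def replace_words(text, mapping=None):
    """Replace whole words using a substitution table (hypothetical default)."""
    mapping = mapping or {"u": "you"}
    return " ".join(mapping.get(word, word) for word in text.split())


# Registry keyed by the option value the front end would send (assumed names).
PREPROCESSORS = {
    "unify_format": unify_format,
    "replace_words": replace_words,
}


def preprocess(text, steps):
    """Apply the selected preprocessing steps in the order given."""
    for name in steps:
        text = PREPROCESSORS[name](text)
    return text


print(preprocess("  Thank   U ", ["unify_format", "replace_words"]))  # prints "thank you"
```

Further steps, such as numeric normalization, could be registered in the same table without changing `preprocess`.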
S412: based on the classification algorithms contained in SPARK-MLlib, create the internal business logic for generating a classification model corresponding to each classification algorithm.
It can be understood that, as in the preceding step, different classification algorithms lead to different business logic for building the classification model, to different classification models, and to different training processes. So that, once the user has set the classification algorithm and its parameters through the front-end management display interface, the corresponding model construction and training can be carried out according to those settings, internal business logic for generating a classification model is created for each classification algorithm. Specifically, the business logic for building and training classification models based on SPARK-MLlib is created, covering the classification algorithms currently supported by SPARK-MLlib, such as naive Bayes, support vector machines, and logistic regression.
S413: based on the setting data of the training management interface, create the internal business logic for training the classification model by calling the training data source library, the preprocessing internal business logic, and the internal business logic for generating a classification model.
It can be understood that step S413 creates the classification model training program. Specifically, according to the parameter settings of the front-end training management interface, the training program creates a SPARK program, reads the training data source, generates a classification model training script, and automatically uploads the script to the SPARK cluster server. The system then calls the script to start the SPARK program and create the classification model, stores the classification model result at the specified HDFS path, and stores the system metadata of the classification model, such as the confusion matrix, accuracy, and recall, in the model system metadata database (MySQL).
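The metrics mentioned above can be computed from labels and predictions as in the following sketch. It uses plain Python rather than SPARK-MLlib and illustrates only the definitions of the confusion matrix, accuracy, and recall; the function names and sample data are assumptions, not the system's actual implementation.

```python
def confusion_matrix(labels, predictions, classes):
    """Return matrix[i][j] = count of samples of classes[i] predicted as classes[j]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(labels, predictions):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def recall(matrix, i):
    """Fraction of samples whose true class is i that were predicted as i."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0


labels = [1, 1, 0, 0, 1]
predictions = [1, 0, 0, 0, 1]
m = confusion_matrix(labels, predictions, classes=[0, 1])
print(m, accuracy(m), recall(m, 1))
```

In this example one sample of class 1 is predicted as class 0, so the matrix is `[[2, 0], [1, 2]]`, the accuracy is 4/5, and the recall for class 1 is 2/3.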
S42: create the internal optimization business logic of the back-end optimization management control interface. The internal optimization business logic includes: based on the internal business logic flow by which the SPARK algorithm optimizes a classification model, and on the front-end optimization interaction request interface, calling SPARK-MLlib, the prediction optimization data source library, and the model system metadata database to perform prediction optimization on the classification model to be optimized and obtain the target classification model.
It can be understood that, for the model training system to perform prediction optimization of the classification model to be optimized on its own according to the settings made in the front-end management display interface, the internal business logic that implements the back-end optimization management control interface, namely the internal optimization business logic, needs to be defined. The implementation of the back-end optimization management control interface, that is, the internal optimization business logic, includes:
According to the data from the front-end optimization interaction request interface, and following the processing rules and flow by which the SPARK algorithm predicts with and optimizes a classification model, SPARK-MLlib is called and the prediction optimization data source library is accessed to obtain the prediction optimization data source. The classification model to be optimized is used to make predictions on this data, and the classification model is then optimized according to the prediction results. The classification model that has undergone prediction optimization and meets the optimization criterion is the target classification model. In addition, the step labels S41 and S42 in this embodiment serve only to distinguish the steps and do not limit the order in which they are carried out.
Optionally, the further processing for creating the internal optimization business logic of the back-end optimization management control interface in step S42 is shown in FIG. 6, which is a flowchart of a process for creating the internal optimization business logic of the back-end optimization management control interface according to an embodiment of the present invention. The process includes:
S421: based on the request data of the front-end optimization interaction request interface, select the prediction optimization data source and the prediction optimization constraint for the prediction optimization of the classification model to be optimized.
It can be understood that a classification model whose training is complete needs to be run on a separate prediction optimization data source, and the classification model is then optimized according to the prediction results. This step creates the internal business logic by which the system, according to the request data of the front-end optimization interaction request interface, generates the classification model optimization strategy, specifies the prediction optimization data source of the classification model and the data columns to be predicted, creates a daily prediction task, and specifies the optimal parameter threshold of the classification model, that is, the prediction optimization constraint, which determines whether the classification model needs further optimization.
S422: based on the internal business logic flow of the optimization process in which the SPARK algorithm trains the classification model, create the data access and prediction processing logic for the prediction optimization of the classification model to be optimized.
It can be understood that, during prediction optimization of the classification model, data access and the prediction optimization steps need to follow the defined processing flow. This step creates the system's prediction optimization strategy, in which the system reads the model at the HDFS path, loads the classification model data into memory, reads the source data to be predicted from Hive, makes predictions with the classification model, writes the results to the model system metadata database (MySQL), and displays the prediction results on the system page.
S423: create the data correction internal business logic for the prediction optimization of the classification model to be optimized. The data correction internal business logic includes: based on the optimization result of the classification model to be optimized, extracting the records that the classification model predicted incorrectly and correcting the data.
It can be understood that, during prediction optimization of the classification model, after the model has made predictions on each group of prediction data, the incorrectly predicted data need to be recorded, and the prediction data and model parameters of those records need to be corrected. This step creates the data correction strategy of the prediction optimization process, which includes correcting the prediction optimization data and adding new prediction optimization sample data. The prediction results are viewed in the front-end management display interface of the system, the records that the model predicted incorrectly are extracted and corrected, and the corrected classification model data are stored in the model system metadata database (MySQL).
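The extraction of incorrectly predicted records can be sketched as a simple filter. The record layout below, including the `reviewed` field that holds the label confirmed during review in the front end, is an assumption for illustration and is not specified in the source.

```python
def extract_mispredicted(records):
    """Return the records whose predicted label differs from the reviewed label."""
    return [r for r in records if r["predicted"] != r["reviewed"]]


def to_corrected_samples(mispredicted):
    """Turn reviewed mispredictions into new labeled samples for later rounds."""
    return [{"text": r["text"], "label": r["reviewed"]} for r in mispredicted]


records = [
    {"text": "great stream", "predicted": "spam", "reviewed": "normal"},
    {"text": "buy followers now", "predicted": "spam", "reviewed": "spam"},
    {"text": "add me for prizes", "predicted": "normal", "reviewed": "spam"},
]
wrong = extract_mispredicted(records)
new_samples = to_corrected_samples(wrong)
print(len(wrong), new_samples)
```

Only the corrected samples would need to be written back, which keeps each correction round proportional to the number of errors rather than to the size of the prediction set.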
S424: create the data update internal business logic for the prediction optimization of the classification model to be optimized. The data update internal business logic includes: based on the data-corrected model system metadata database, extracting the classification model metadata and importing it into the next partition of the prediction optimization data source library.
It can be understood that, after the parameters of the classification model have been corrected, that is, after the classification model has been optimized, the corrected classification model needs to continue making predictions on the prediction optimization data source. This step creates the update strategy for the optimized classification model data: the classification model data are updated, and new model features are added to the model system metadata database (MySQL). After the new feature data have been stored in MySQL of the prediction optimization data source library, the system calls the Sqoop tool to extract the data from MySQL into a new daily partition of the prediction optimization data source (Hive).
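As a rough sketch of the import step, the helper below assembles a Sqoop command line that loads a MySQL table into a dated Hive partition. The connection string, user, table names, partition key, and date are hypothetical, and credentials are deliberately left out; the command is only constructed here, not executed.

```python
def build_sqoop_import(jdbc_url, username, mysql_table,
                       hive_table, partition_key, partition_value):
    """Build the argument list for a Sqoop import into one Hive partition."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", mysql_table,
        "--hive-import",                            # load the result into Hive
        "--hive-table", hive_table,
        "--hive-partition-key", partition_key,      # e.g. a date column
        "--hive-partition-value", partition_value,  # the new day's partition
    ]


# All values below are placeholders for illustration.
cmd = build_sqoop_import(
    "jdbc:mysql://db.example.com/model_meta", "trainer",
    "corrected_samples", "predict_source", "dt", "2017-09-01",
)
print(" ".join(cmd))
```

A scheduler could pass such a list to a process launcher once per day, substituting the partition value with the current date.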
S425: based on the prediction optimization constraint, create the internal business logic for stopping optimization of the classification model. This logic includes: re-specifying the prediction optimization data source library of the classification model and creating the business logic for classification model prediction and data optimization, until the classification model parameters satisfy the prediction optimization constraint, at which point model optimization stops.
It can be understood that, after the classification model data have been corrected and the prediction optimization data source has been updated in the preceding steps, the corrected classification model needs to be trained further with the updated prediction optimization data source. This step creates the stopping strategy for model optimization: the training data source of the classification model is re-specified, and the steps of training the classification model and optimizing the training data are repeated until the classification model parameters satisfy the preset prediction optimization constraint. Training of the classification model then stops, and the target classification model is obtained.
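The stopping strategy can be sketched as a loop that repeats evaluation and correction until a threshold is met. The callables `evaluate` and `correct_and_retrain` stand in for the prediction and correction steps, and the `max_rounds` safety limit is an added assumption that does not appear in the source.

```python
def optimize_until(evaluate, correct_and_retrain, model, threshold, max_rounds=10):
    """Repeat correct-and-retrain until the evaluation score reaches the threshold.

    Returns the final model, its score, and the number of rounds performed.
    """
    rounds = 0
    score = evaluate(model)
    while score < threshold and rounds < max_rounds:
        model = correct_and_retrain(model)
        score = evaluate(model)
        rounds += 1
    return model, score, rounds


# Toy stand-ins: the "model" tracks remaining error cases, and each round fixes one.
def evaluate(model):
    return 100 - 5 * model["errors"]      # score as an integer percentage


def correct_and_retrain(model):
    return {"errors": model["errors"] - 1}


final_model, final_score, rounds = optimize_until(
    evaluate, correct_and_retrain, {"errors": 3}, threshold=95
)
print(final_model, final_score, rounds)
```

With these toy functions the score rises from 85 to 90 to 95, so the loop stops after two rounds once the threshold of 95 is reached.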
In the implementation method of a classification model training system provided by this embodiment of the present invention, the internal business logic that implements classification model construction, training, and prediction optimization is created separately. As a result, a user who trains a classification model with this system only needs to set the data and parameters in the front-end management display interface, and the system automatically completes the construction, training, and prediction optimization of the classification model. Operation is simple and development efficiency is improved.
As another aspect of the embodiments of the present invention, this embodiment provides a classification model training system. Referring to FIG. 7, which is a schematic structural diagram of a classification model training system according to an embodiment of the present invention, the system includes: a front-end management display interface 1, a back-end service data source system 2, a back-end service control interface unit 3, and a back-end service business processing unit 4.
The front-end management display interface 1 is used for external setting management of the process of training the classification model, the process of prediction optimization of the classification model, and classification model management. The front-end management display interface 1 includes a front-end interaction request interface 101, which is used for information exchange between external management and the back-end service. The back-end service data source system 2 is used to provide, in response to internal business logic call requests made while the SPARK algorithm trains the classification model, the machine learning data source for the SPARK algorithm, the training data source, the prediction optimization data source, and the model system metadata database. The back-end service control interface unit 3 is used to establish the correspondence between the front-end interaction request interface and the back-end service business logic calls. The back-end service business processing unit 4 is used to create an initial classification model by calling the back-end service data source system, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, and to train and perform prediction optimization on the initial classification model to obtain the target classification model.
It can be understood that the classification model training system of this embodiment includes the front-end management display interface 1, through which the user makes external management settings; the back-end service business processing unit 4, which is used for back-end service management; the back-end service data source system 2, which provides data support for classification model training; and the back-end service control interface unit 3, which links the user's external management with back-end service management. When the user configures the classification model training process through the front-end management display interface, the front-end management display interface transmits the user's settings to the back-end service management system through the front-end interaction request interface. The front-end management display pages interact with the back-end service management system through a standard REST API.
Requests from the front-end management display interface are passed through the front-end interaction request interface. When the front-end management display interface sends a request through the front-end interaction request interface, the back-end service management system identifies the request through the corresponding back-end service control interface, and the back-end service business processing unit 4 calls the corresponding algorithm and flow. For example, during construction, training, and prediction optimization of the classification model, the back-end service business processing unit 4 calls the corresponding training data and prediction optimization data to train the constructed classification model and perform prediction optimization on it.
When the system is used for classification model training, the service listens for requests once it has started, and each incoming request triggers a call to the corresponding business logic. First, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface described above, an initial classification model is created by calling the back-end service data source system. The SPARK algorithm is then used to train the initial model and perform prediction optimization on it, yielding the target classification model.
In one embodiment, the framework of the back-end service data source system is shown in FIG. 8, which is a schematic structural diagram of a back-end service data source system according to an embodiment of the present invention. It includes: a MySQL model system metadata database, a Hive training data source library, a MySQL-Hive prediction optimization data source library, a classification model system unit, an algorithm model unit, and a SPARK cluster. The MySQL model system metadata database stores model metadata, the Hive training data source library stores the training source data, and the MySQL-Hive prediction optimization data source library stores the prediction optimization data source.
With the classification model training system provided by the embodiment of the present invention, a user who trains a classification model only needs to create a classification model project on the front-end management display interface and specify the basic flow for training and optimizing the model: the training data source, ETL algorithm, model algorithm, parameters, and so on. Subsequent training and optimization then only requires selecting and clicking on the interface, or creating a scheduled task that the system executes automatically, and the target classification model is obtained in a short time. This avoids repeatedly preparing training samples and tuning parameters, lets the user focus on optimizing and implementing the algorithm itself rather than spending much effort on data preparation and program execution as before, and improves development efficiency.
As yet another aspect of the embodiments of the present invention, this embodiment provides a classification model training method based on the classification model training system described above. Referring to FIG. 9, a flowchart of a method for training a classification model using the classification model training system of the present invention, the method includes:
S901: acquire, through the front-end interaction request interface and the back-end service control interface unit, the construction setting data, training process setting data, and optimization process setting data of the classification model entered on the front-end management display interface.
This can be understood as defining the classification model training flow in the system and creating a classification model training project. Through the front-end management display interface, the user selects the algorithm used by the classification model, sets the algorithm parameters, selects the data source table, specifies the label column and the data columns in the table, and defines the preprocessing (ETL) flow for the training data source. The preprocessing is applied to the initial data columns; the specified operations, such as unifying data formats, normalization, and word replacement, remove noise from the data columns so that the data is better suited to model training. The back-end service management system obtains the user's custom setting data through the front-end interaction request interface and the back-end service control interface unit.
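The three preprocessing operations named above (unifying data formats, normalization, and word replacement) can be sketched in plain Python as follows. The patent does not define these operations precisely, so the concrete choices here are assumptions: format unification is taken to mean lowercasing and collapsing whitespace, normalization is taken to be min-max scaling, and all function names are hypothetical.

```python
import re
from typing import Dict, List


def unify_format(text: str) -> str:
    """Lowercase and collapse whitespace so equivalent strings compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def replace_words(text: str, replacements: Dict[str, str]) -> str:
    """Replace whole words according to a user-supplied mapping."""
    return " ".join(replacements.get(word, word) for word in text.split(" "))


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale numeric values into [0, 1]; a constant column maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess_text_column(column: List[str],
                           replacements: Dict[str, str]) -> List[str]:
    """Apply format unification, then word replacement, to every cell."""
    return [replace_words(unify_format(cell), replacements) for cell in column]
```

In the described system these steps would run on the data columns selected by the user before the data is handed to training; the sketch applies them to in-memory lists only.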
S902: based on the construction setting data of the classification model, internally call the machine learning data source of the SPARK algorithm through the back-end service business processing unit, construct an initial classification model, and store it in the model system metadata database.
This can be understood as follows: after the user-defined setting data is acquired, the back-end service business processing unit creates a model training script according to the selected model algorithm and training data source and automatically uploads it to the SPARK cluster server. The system invokes the script and starts the SPARK program to construct the initial classification model, stores the model result at the specified hdfs path, and stores classification model data, including metrics such as the model's confusion matrix, accuracy, and recall, in the MySQL model system metadata database.
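The metrics mentioned above can be computed from paired actual and predicted labels as in the following sketch. It is written in plain Python rather than against SPARK, and the function names and the dictionary representation of the confusion matrix are assumptions made for illustration; the patent does not say how the system computes or stores these values.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

ConfusionMatrix = Dict[Tuple[Hashable, Hashable], int]


def confusion_matrix(actual: Sequence[Hashable],
                     predicted: Sequence[Hashable]) -> ConfusionMatrix:
    """Count occurrences of each (actual label, predicted label) pair."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return dict(Counter(zip(actual, predicted)))


def accuracy(cm: ConfusionMatrix) -> float:
    """Fraction of all samples whose predicted label equals the actual label."""
    total = sum(cm.values())
    correct = sum(n for (a, p), n in cm.items() if a == p)
    return correct / total if total else 0.0


def recall(cm: ConfusionMatrix, label: Hashable) -> float:
    """Fraction of samples with the given actual label that were predicted as it."""
    actual_total = sum(n for (a, _), n in cm.items() if a == label)
    return cm.get((label, label), 0) / actual_total if actual_total else 0.0
```

Storing the confusion matrix as counts keyed by label pairs keeps it easy to serialize into rows of a relational table such as the metadata database described here.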
S903: based on the training process setting data of the classification model, internally call the training data source through the back-end service business processing unit, and train the initial classification model using the SPARK algorithm to obtain a classification model to be optimized.
This can be understood with reference to FIG. 10, a flowchart of the process of training a classification model with the SPARK algorithm according to an embodiment of the present invention. According to the training data source and the classification model training parameters selected by the user on the front-end management display interface, the back-end service business processing unit calls the corresponding labeled training data source in the Hive library and initializes that data source. The processed training data source is then used to train the initial classification model constructed in the preceding step. The model training result is written to the MySQL model system metadata database and the model prediction file is written to the system storage unit, yielding the classification model to be optimized.
S904: based on the optimization process setting data of the classification model, internally call the prediction optimization data source through the back-end service business processing unit, and predictively optimize the classification model to be optimized using the SPARK algorithm to obtain the target classification model.
This can be understood with reference to FIG. 11, a flowchart of the process of predictively optimizing a classification model with the SPARK algorithm according to an embodiment of the present invention. According to the prediction optimization data source that the user selects for the specified classification model and the data columns to be predicted, the system creates a daily prediction task and specifies the optimal parameter threshold of the model, that is, the prediction optimization constraint, which determines whether the classification model needs further optimization.
Then, the system reads the model from the hdfs path, loads the classification model data into memory, reads the data source to be predicted from Hive, runs prediction with the classification model, writes the prediction results to the MySQL model system metadata database, and presents them on the front-end management display interface.
Next, the classification model parameters are optimized, that is, data is corrected and new training samples are added. The prediction results of the classification model are viewed on the front-end management display interface, and the records that the model predicted incorrectly are corrected, extracted, and stored in the MySQL model system metadata database.
After that, the training data source is updated for the classification model whose parameters have been optimized. New model features are added to the training samples; once the new feature data has been stored in the MySQL model system metadata database, the system calls the Sqoop tool to extract it from MySQL into the new day's partition of the prediction optimization data source (Hive).
Finally, the prediction optimization data source of the classification model is re-specified and the steps of predictively optimizing the classification model are repeated until the classification model parameters reach the specified threshold, at which point model training stops and the target classification model is obtained.
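The predict, correct, update, and repeat cycle of step S904 can be sketched as a loop with a stopping threshold. This is a toy illustration under several assumptions: the "model" is a lookup table standing in for a SPARK training job, the threshold is applied to accuracy on the reviewed records (the patent only speaks of model parameters reaching a specified threshold), and every name below is hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (feature, label)


def train(samples: List[Sample]) -> Dict[str, str]:
    """Toy stand-in for a SPARK training job: memorize feature -> label."""
    return dict(samples)


def predict(model: Dict[str, str], feature: str, default: str = "unknown") -> str:
    """Return the memorized label, or a default for unseen features."""
    return model.get(feature, default)


def optimize(training: List[Sample],
             reviewed: List[Sample],
             threshold: float,
             max_rounds: int = 10) -> Tuple[Dict[str, str], float, int]:
    """Retrain until accuracy on reviewed records reaches the threshold.

    `reviewed` holds (feature, corrected label) pairs, i.e. predictions that
    a person has checked. Returns (model, accuracy, retraining rounds).
    """
    if not reviewed:
        raise ValueError("reviewed must not be empty")
    training = list(training)  # do not mutate the caller's list
    model = train(training)
    for rounds in range(max_rounds + 1):
        # Extract the records the current model predicts incorrectly.
        wrong = [(f, y) for f, y in reviewed if predict(model, f) != y]
        acc = 1 - len(wrong) / len(reviewed)
        if acc >= threshold or rounds == max_rounds:
            return model, acc, rounds
        # Add the corrected records to the training data and retrain.
        training.extend(wrong)
        model = train(training)
```

Because the toy model memorizes its training data, one correction round is always enough here; a real classifier would typically need several rounds, which is why the loop also carries a `max_rounds` guard.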
In the classification model training method based on the classification model training system described above, provided by the embodiment of the present invention, the parameters of the classification model construction, training, and prediction optimization processes are selected and set on the front-end management display interface. The back-end service management system automatically creates the construction, training, and prediction optimization processing flows according to these settings and obtains a target classification model that meets them. This effectively simplifies the operation flow of classification model training, thereby reducing developers' workload and improving development efficiency.
FIG. 12 is a structural block diagram of a device for implementing the classification model training system according to an embodiment of the present application.
Referring to FIG. 12, the device for implementing the classification model training system includes a processor 1201, a memory 1202, and a bus 1203, wherein:
the processor 1201 and the memory 1202 communicate with each other through the bus 1203;
the memory 1202 stores program instructions executable by the processor 1201, and the processor 1201 is configured to call the program instructions in the memory 1202 to perform the methods provided by the above embodiments of the implementation method of the classification model training system, for example: creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service; and S21, importing SPARK-MLlib, the machine learning library of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and storing the prediction optimization sample data in the prediction optimization data source library; and so on.
This embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above embodiments of the implementation method of the classification model training system, for example: creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service; and S21, importing SPARK-MLlib, the machine learning library of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and storing the prediction optimization sample data in the prediction optimization data source library; and so on.
Another embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above embodiments of the implementation method of the classification model training system, for example: creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service; and S21, importing SPARK-MLlib, the machine learning library of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database; S22, storing the prepared training sample data in the training data source library and storing the prediction optimization sample data in the prediction optimization data source library; and so on.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments of the implementation method of the classification model training system may be carried out by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
The embodiments described above, such as the device for implementing the classification model training system, are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, a person skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. On this understanding, the above technical solution, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (13)

  1. A method for implementing a classification model training system, characterized by comprising:
    S1: creating a front-end management display interface based on the external management requirements of training a classification model with the SPARK algorithm, and defining a front-end interaction request interface of the front-end management display interface based on the interaction requirements between external management and the back-end service;
    S2: creating a back-end service data source system based on the internal business data requirements of training a classification model with the SPARK algorithm;
    S3: creating a back-end service control interface based on the front-end interaction request interface of the front-end management display interface, and establishing a correspondence between the back-end service control interface and the front-end interaction request interface;
    S4: creating internal business logic of the back-end service control interface, the internal business logic comprising: based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, creating an initial classification model by calling the back-end service data source system, and training and predictively optimizing the initial classification model to obtain a target classification model.
  2. The method according to claim 1, characterized in that the step of creating the front-end management display interface in step S1 further comprises:
    separately creating a training management interface, an optimization management interface, and a classification model management interface for the classification model; the training management interface provides external management support for the training phase of training a classification model with the SPARK algorithm, the optimization management interface provides external management support for the prediction optimization phase of training a classification model with the SPARK algorithm, and the classification model management interface provides external management support for the target classification model;
    correspondingly, the front-end interaction request interface comprises: a front-end training interaction request interface, a front-end optimization interaction request interface, and a front-end model management interaction request interface.
  3. The method according to claim 2, characterized in that the training management interface comprises at least: a classification model algorithm selection interface, a classification model algorithm parameter setting interface, a training data source setting interface, and a data preprocessing flow setting interface;
    the optimization management interface comprises at least: a classification model optimization strategy selection interface, a classification model optimization criterion setting interface, and a prediction optimization data source setting interface;
    the classification model management interface comprises at least: a classification model version management interface and a classification model effect presentation interface.
  4. The method according to claim 3, characterized in that the step of creating the back-end service data source system in step S2 further comprises:
    S21: importing SPARK-MLlib, the machine learning library of the SPARK algorithm, and separately creating a training data source library, a prediction optimization data source library, and a model system metadata database;
    S22: storing the prepared training sample data in the training data source library, and storing the prediction optimization sample data in the prediction optimization data source library.
  5. The method according to claim 4, characterized in that step S3 further comprises:
    S31: creating a back-end training management control interface based on the front-end training interaction request interface, and establishing a correspondence between the front-end training interaction request interface and the back-end training management control interface;
    S32: creating a back-end optimization management control interface based on the front-end optimization interaction request interface, and establishing a correspondence between the front-end optimization interaction request interface and the back-end optimization management control interface;
    S33: creating a back-end model management control interface based on the front-end model management interaction request interface, and establishing a correspondence between the front-end model management interaction request interface and the back-end model management control interface.
  6. The method according to claim 5, characterized in that step S4 further comprises at least:
    S41: creating internal training business logic of the back-end training management control interface, the internal training business logic comprising: based on the internal business logic flow of training a classification model with the SPARK algorithm and on the front-end training interaction request interface, creating an initial classification model by calling SPARK-MLlib, the training data source library, and the model system metadata database, and training the initial classification model to obtain a classification model to be optimized;
    S42: creating internal optimization business logic of the back-end optimization management control interface, the internal optimization business logic comprising: based on the internal business logic flow of optimizing a classification model with the SPARK algorithm and on the front-end optimization interaction request interface, predictively optimizing the classification model to be optimized by calling SPARK-MLlib, the prediction optimization data source library, and the model system metadata database, to obtain the target classification model.
  7. The method according to claim 6, characterized in that the step of creating the internal training business logic of the back-end training management control interface in step S41 further comprises at least:
    S411: based on the data preprocessing algorithms contained in a data preprocessing database, creating preprocessing internal business logic corresponding to each of the data preprocessing algorithms;
    S412: based on the classification algorithms contained in SPARK-MLlib, creating internal business logic for generating a classification model corresponding to each of the classification algorithms;
    S413: based on the setting data of the training management interface, creating internal business logic for training the classification model by calling the training data source library, the preprocessing internal business logic, and the internal business logic for generating a classification model.
  8. The method according to claim 6, characterized in that the step of creating the internal optimization business logic of the back-end optimization management control interface in S42 further comprises:
    S421: based on the request data of the front-end optimization interaction request interface, selecting the prediction optimization data source and the prediction optimization constraint for predictively optimizing the classification model to be optimized;
    S422: based on the internal business logic flow of the optimization process of training a classification model with the SPARK algorithm, creating the data access and prediction processing implementation logic for predictively optimizing the classification model to be optimized;
    S423: creating data correction internal business logic for the process of predictively optimizing the classification model to be optimized, the data correction internal business logic comprising: based on the optimization result for the classification model to be optimized, extracting the records that the classification model predicted incorrectly and correcting the data;
    S424: creating data update internal business logic for the process of predictively optimizing the classification model to be optimized, the data update internal business logic comprising: based on the model system metadata database as corrected by the data correction, extracting classification model metadata and importing it into the next partition of the prediction optimization data source library;
    S425: based on the prediction optimization constraint, creating internal business logic for stopping optimization of the classification model, which comprises: re-specifying the prediction optimization data source library of the classification model and creating the business logic for predicting with the classification model and optimizing the data, until the classification model parameters reach the prediction optimization constraint, and then stopping model optimization.
  9. A classification model training system, characterized by comprising:
    a front-end management display interface, configured for external setting management of the process of training a classification model, the process of predictively optimizing a classification model, and classification model management, the front-end management display interface comprising a front-end interaction request interface for information exchange between external management and the back-end service;
    a back-end service data source system, configured to provide, in response to internal business logic call requests for training a classification model with the SPARK algorithm, the machine learning data source of the SPARK algorithm, a training data source, a prediction optimization data source, and a model system metadata database;
    a back-end service control interface unit, configured to establish a correspondence between the front-end interaction request interface and back-end service business logic calls;
    a back-end service business processing unit, configured to create an initial classification model by calling the back-end service data source system, based on the business logic requirements of training a classification model with the SPARK algorithm and on the front-end interaction request interface, and to train and predictively optimize the initial classification model to obtain a target classification model.
  10. A classification model training method using the classification model training system according to claim 9, characterized by comprising:
    acquiring, through the front-end interaction request interface and the back-end service control interface unit, the construction setting data, training process setting data, and optimization process setting data of the classification model entered on the front-end management display interface;
    based on the construction setting data of the classification model, internally calling the machine learning data source of the SPARK algorithm through the back-end service business processing unit, constructing an initial classification model, and storing it in the model system metadata database;
    based on the training process setting data of the classification model, internally calling the training data source through the back-end service business processing unit, and training the initial classification model using the SPARK algorithm to obtain a classification model to be optimized;
    based on the optimization process setting data of the classification model, internally calling the prediction optimization data source through the back-end service business processing unit, and predictively optimizing the classification model to be optimized using the SPARK algorithm to obtain a target classification model.
  11. A device for implementing a classification model training system, characterized by comprising:
    at least one processor; and
    at least one memory communicatively connected to the processor, wherein:
    the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to perform the method according to any one of claims 1 to 8.
  12. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to perform the method according to any one of claims 1 to 8.
  13. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, the computer program comprises program instructions, and the program instructions, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.
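
The build, train, and optimize steps recited in claim 10 can be illustrated with a minimal Python sketch. It is not the claimed implementation: a toy nearest-centroid classifier stands in for a SPARK MLlib model, a plain dictionary stands in for the model system metadata database, and "prediction optimization" is reduced to an accuracy check on held-out samples. All names (`Settings`, `CentroidModel`, `build_model`, `train_model`, `optimize_model`, and the `name`, `limit`, and `min_accuracy` keys) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Sample = Tuple[List[float], int]  # (feature vector, class label)


@dataclass
class Settings:
    """Hypothetical container for the three groups of front-end settings."""
    build: Dict[str, object]
    training: Dict[str, object]
    optimization: Dict[str, object]


@dataclass
class CentroidModel:
    """Toy nearest-centroid classifier; a stand-in for a SPARK MLlib model."""
    name: str
    centroids: Dict[int, List[float]] = field(default_factory=dict)
    status: str = "initial"

    def predict(self, x: List[float]) -> int:
        # Return the label whose centroid is closest in squared distance.
        def dist(label: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)


def build_model(settings: Settings, metadata_store: Dict[str, CentroidModel]) -> CentroidModel:
    """Step 1: construct the initial model and record it in the metadata store."""
    model = CentroidModel(name=str(settings.build["name"]))
    metadata_store[model.name] = model
    return model


def train_model(model: CentroidModel, settings: Settings, training_source: List[Sample]) -> CentroidModel:
    """Step 2: train on the training data source; yields the model to be optimized."""
    limit: Optional[int] = settings.training.get("limit")  # assumed optional cap
    sums: Dict[int, Tuple[List[float], int]] = {}
    for features, label in training_source[:limit]:
        acc, n = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([a + f for a, f in zip(acc, features)], n + 1)
    model.centroids = {lbl: [a / n for a in acc] for lbl, (acc, n) in sums.items()}
    model.status = "to_be_optimized"
    return model


def optimize_model(model: CentroidModel, settings: Settings,
                   prediction_source: List[Sample]) -> Tuple[CentroidModel, float]:
    """Step 3 (simplified): predict on held-out samples and accept the model
    as the target model only if accuracy reaches the configured threshold."""
    hits = sum(1 for x, y in prediction_source if model.predict(x) == y)
    accuracy = hits / len(prediction_source)
    if accuracy >= float(settings.optimization["min_accuracy"]):
        model.status = "target"
    return model, accuracy


# Usage with made-up data.
store: Dict[str, CentroidModel] = {}
settings = Settings(build={"name": "demo"}, training={}, optimization={"min_accuracy": 0.9})
train = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([5.0, 5.0], 1), ([6.0, 5.0], 1)]
test = [([0.5, 0.5], 0), ([5.5, 5.0], 1)]

model = build_model(settings, store)
train_model(model, settings, train)
model, acc = optimize_model(model, settings, test)
```

In a real deployment each function would presumably delegate to SPARK MLlib through the back-end service business processing unit; the sketch only shows how the three groups of settings drive the three stages in order.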
PCT/CN2017/120174 2017-08-29 2017-12-29 Classification model training system and realisation method therefor WO2019041708A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710756004.6A CN107562859B (en) 2017-08-29 2017-08-29 A kind of disaggregated model training system and its implementation
CN201710756004.6 2017-08-29

Publications (1)

Publication Number Publication Date
WO2019041708A1 (en) 2019-03-07

Family

ID=60977352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/120174 WO2019041708A1 (en) 2017-08-29 2017-12-29 Classification model training system and realisation method therefor

Country Status (2)

Country Link
CN (1) CN107562859B (en)
WO (1) WO2019041708A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577708A (en) * 2017-07-31 2018-01-12 北京北信源软件股份有限公司 Class base construction method and system based on SparkMLlib document classifications
CN107562859B (en) * 2017-08-29 2019-10-22 武汉斗鱼网络科技有限公司 A kind of disaggregated model training system and its implementation
CN109063050A (en) * 2018-07-19 2018-12-21 郑州云海信息技术有限公司 A kind of database journal analysis and early warning method and apparatus
CN109189767B (en) * 2018-08-01 2021-07-23 北京三快在线科技有限公司 Data processing method and device, electronic equipment and storage medium
CN109344853A (en) * 2018-08-06 2019-02-15 杭州雄迈集成电路技术有限公司 A kind of the intelligent cloud plateform system and operating method of customizable algorithm of target detection
CN109299178B (en) * 2018-09-30 2020-01-14 北京九章云极科技有限公司 Model application method and data analysis system
CN109656914A (en) * 2018-11-07 2019-04-19 上海前隆信息科技有限公司 On-line off-line mixed air control modeling training and production dissemination method and system
CN110119271B (en) * 2018-12-19 2020-09-04 厦门渊亭信息科技有限公司 Cross-machine learning platform model definition protocol and adaptation system
CN110347721A (en) * 2019-07-08 2019-10-18 紫光云技术有限公司 A kind of floristic analysing method of flag flower
CN111158666B (en) * 2019-12-27 2023-07-04 北京百度网讯科技有限公司 Entity normalization processing method, device, equipment and storage medium
CN111399958B (en) * 2020-03-17 2023-04-28 青岛创新奇智科技集团股份有限公司 Model training system and method with user interaction interface
CN112000325A (en) * 2020-08-11 2020-11-27 福建博思数字科技有限公司 Visual algorithm model construction method and storage medium
CN112486461B (en) * 2020-11-30 2024-04-09 彩讯科技股份有限公司 Information processing system based on springboot framework
CN112817650B (en) * 2020-12-28 2022-04-26 浙江中控技术股份有限公司 Task creation method, device and system in laboratory management system
CN114356300B (en) * 2021-09-17 2022-11-25 北京能科瑞元数字技术有限公司 Intelligent agent construction and development method based on industry digitalization

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868019A (en) * 2016-02-01 2016-08-17 中国科学院大学 Automatic optimization method for performance of Spark platform
CN106547627A (en) * 2016-11-24 2017-03-29 郑州云海信息技术有限公司 The method and system that a kind of Spark MLlib data processings accelerate
CN106777006A (en) * 2016-12-07 2017-05-31 重庆邮电大学 A kind of sorting algorithm based on parallel super-network under Spark
CN106874478A (en) * 2017-02-17 2017-06-20 重庆邮电大学 Parallelization random tags subset multi-tag file classification method based on Spark
CN107562859A (en) * 2017-08-29 2018-01-09 武汉斗鱼网络科技有限公司 A kind of disaggregated model training system and its implementation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1480870A (en) * 2003-07-16 2004-03-10 中南大学 Creater of swarm intelligence decision support system based on Internet structure and application method
CN106250987B (en) * 2016-07-22 2019-03-01 无锡华云数据技术服务有限公司 A kind of machine learning method, device and big data platform
CN106850346B (en) * 2017-01-23 2020-02-07 北京京东金融科技控股有限公司 Method and device for monitoring node change and assisting in identifying blacklist and electronic equipment

Also Published As

Publication number Publication date
CN107562859B (en) 2019-10-22
CN107562859A (en) 2018-01-09

Similar Documents

Publication Publication Date Title
WO2019041708A1 (en) Classification model training system and realisation method therefor
US11449670B2 (en) Iterative development and/or scalable deployment of a spreadsheet-based formula algorithm
JP6816136B2 (en) Unified interface specification for interacting with and running models in a variety of runtime environments
CN106067080B (en) Configurable workflow capabilities are provided
US9043750B2 (en) Automated generation of two-tier mobile applications
CN109325041A (en) Business data processing method, device, computer equipment and storage medium
US11704594B2 (en) Machine learning system
US10902508B2 (en) Methods for extracting and adapting information to generate custom widgets and devices thereof
CN102915237A (en) Method and system of adapting data quality rules based upon user application requirements
US20190057335A1 (en) Targeted data element detection for crowd sourced projects with machine learning
CN105677751B (en) Scheduling method and system of relational database
EP2869195B1 (en) Application coordination system, application coordination method, and application coordination program
JP2018067280A (en) System, method, and program for executing software service
CN104661093A (en) Method and system for determining updates for a video tutorial
KR20210012400A (en) Method of building backend with automatic programming code generation
JP6176389B2 (en) Source code generation apparatus, source code generation method, and recording medium
US10885586B2 (en) Methods for automatically generating structured pricing models from unstructured multi-channel communications and devices thereof
RU2019128272A (en) Method and System for Determining User Performance in a Computer Crowdsourced Environment
US10248452B2 (en) Interaction framework for executing user instructions with online services
US11829890B2 (en) Automated machine learning: a unified, customizable, and extensible system
US9779387B2 (en) Business-to-business document user interface and integration design
CN111444170B (en) Automatic machine learning method and equipment based on predictive business scene
CN112181951A (en) Heterogeneous database data migration method, device and equipment
CN112395366A (en) Data processing and creating method and device of distributed database and electronic equipment
TWI735512B (en) Database operation method and device

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17923341

Country of ref document: EP

Kind code of ref document: A1