CN116362352A - Model automatic updating method, system, medium and terminal based on machine learning - Google Patents

Model automatic updating method, system, medium and terminal based on machine learning

Info

Publication number
CN116362352A
CN116362352A CN202310636456.6A
Authority
CN
China
Prior art keywords
model
machine learning
data
nodes
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310636456.6A
Other languages
Chinese (zh)
Inventor
黄潮勇
吴华夫
黄鹏
张亿仙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Smart Software Co ltd
Original Assignee
Guangzhou Smart Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Smart Software Co ltd filed Critical Guangzhou Smart Software Co ltd
Priority to CN202310636456.6A priority Critical patent/CN116362352A/en
Publication of CN116362352A publication Critical patent/CN116362352A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a model automatic updating method and system based on machine learning, comprising the following steps: acquiring data and preprocessing the data; creating a first machine learning model based on the preprocessed data; running the first machine learning model and storing the model nodes therein; calling the stored model nodes to construct a second machine learning model, and publishing and deploying the second machine learning model online; setting the first machine learning model to perform self-learning, and storing the parameters of the model nodes after they are optimized with new data; when the first machine learning model reaches, through self-learning, the threshold set for the monitoring index, the second machine learning model automatically loads the optimized parameters before its next run. The invention automates model self-learning and can automatically release or deploy a model that meets the specified requirements into production, which solves the accuracy degradation caused by models going un-updated for long periods and greatly reduces the workload of operations and maintenance staff.

Description

Model automatic updating method, system, medium and terminal based on machine learning
Technical Field
The invention relates to the technical field of machine learning, in particular to a model automatic updating method, system, medium and terminal based on machine learning.
Background
Models trained by machine learning and published as production services tend to lose accuracy gradually over time. Retraining with newly supplemented data can restore the model accuracy. At present, however, a model that meets the required criteria must be released or deployed into production manually; this does mitigate the accuracy loss caused by models not being updated promptly, but it greatly increases the workload of operations and maintenance staff.
A prior-art search found Chinese patent publication No. CN113077057A, which discloses an unbiased machine learning method in which a machine learning model is established, unbiased parameters are learned automatically, and the optimal performance is mined automatically; however, it applies to classification algorithms and is not applicable to clustering models or association models.
Further, Chinese patent publication No. CN113011596A discloses an automatic model updating method comprising the following steps: constructing an initial model based on a plurality of initial features of first time-series data within a first predetermined period; judging whether a model evaluation index of the constructed initial model meets a preset requirement; if it does, acquiring second time-series data within a second predetermined period; determining a plurality of update features of the acquired second time-series data; and updating the initial model based on the plurality of update features. That approach, however, suffers from reduced prediction accuracy when the feature distribution changes.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a model automatic updating method, a system, a medium and a terminal based on machine learning.
According to one aspect of the invention, there is provided a model automatic updating method based on machine learning, comprising: acquiring data and preprocessing the data for model creation;
creating a first machine learning model comprising a plurality of model nodes based on the preprocessed data;
running the first machine learning model and storing model nodes therein;
calling the stored model nodes, constructing a second machine learning model, and publishing and deploying the second machine learning model on line;
setting the first machine learning model to perform self-learning, and storing parameters of model nodes after new data optimization;
when the first machine learning model reaches the threshold value set by the monitoring index through self-learning, the second machine learning model automatically loads the optimized parameters before the next operation.
Preferably, the acquiring data and preprocessing the data includes:
acquiring a field from a data source as a data characteristic;
and performing feature conversion and feature dispersion on the data features.
Preferably, the model node is a learning result generated by training from data; the model nodes comprise classification method nodes, cluster evaluation nodes or association rule nodes.
Preferably, the classification method calculates and analyzes a training set of known classes, discovers class rules from it and predicts the class of new data;
the cluster evaluation divides a large set of unlabeled data into several categories according to the intrinsic similarity of the data, so that the similarity within a category is large and the similarity between categories is small;
the association rule reflects the interdependence and association between one thing and other things, is used to mine valuable correlations between data items from a large amount of data, and analyzes from the data rules of the form 'the occurrence of some events causes the occurrence of other events'.
Preferably, the stored model nodes are implemented by extraction or training, wherein:
the extraction comprises statistically learning feature rules from the data using a feature extraction method in machine learning; after extraction is completed, the learned feature rules can be stored as a model to be called;
the training comprises learning relevant rules from the extracted features using various classification and regression methods in machine learning; after training is completed, the learned relevant rules can be stored as a model to be called.
Preferably, the self-learning of the first machine learning model is scheduled by setting a timing mode, so that the parameters of the first machine learning model are optimized periodically.
Preferably, the monitoring index comprises an online threshold and an early warning threshold;
when the first machine learning model reaches at least one of the online threshold or the early warning threshold, the second machine learning model automatically loads optimized parameters before the next operation.
According to a second aspect of the present invention, there is provided an automatic model updating system based on machine learning, comprising:
a data module for acquiring data and preprocessing the data for model creation;
a first model module that creates a first machine learning model comprising a plurality of model nodes based on the preprocessed data;
a storage module that runs the first machine learning model and stores model nodes therein;
a second model module that calls the stored model nodes, constructs a second machine learning model, and publishes and deploys it online;
the updating module is used for setting the first machine learning model to perform self-learning and storing parameters of model nodes after new data optimization; when the first machine learning model reaches the threshold value set by the monitoring index through self-learning, the second machine learning model automatically loads the optimized parameters before the next operation.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to perform the above-described automatic model updating method based on machine learning or to run the above-described automatic model updating system based on machine learning when executing the program.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the above-described machine learning based model automatic updating method or to run the above-described machine learning based model automatic updating system.
Compared with the prior art, the invention has at least one of the following beneficial effects:
the model self-learning in the embodiment of the invention is to automate the step, and can automatically release or deploy the model meeting certain requirements into production, so that the problem of reduced accuracy caused by long-term update failure of the model can be solved, and the workload of operation and maintenance personnel is greatly reduced.
The invention is suitable for scenarios such as customer profiling (portrait analysis), precision marketing and risk control in fields such as finance, retail, real estate, manufacturing and education, and also for scenarios such as public opinion monitoring, anomaly identification and event prediction in fields such as healthcare, meteorology and auditing. In these scenarios the data is continuously updated, so the corresponding models need to be updated periodically to maintain their accuracy.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a machine learning based model automatic update method in an embodiment of the invention;
FIG. 1-1 is a schematic interface diagram of the bank credit loan prediction case established in application embodiment one of the present invention;
FIGS. 1-2 are schematic diagrams of interfaces for performing model saving in a first embodiment of the present invention;
FIGS. 1-3 are schematic diagrams of interfaces for storing models into trained models in a first embodiment of an application of the present invention;
FIGS. 1-4, 1-5, 1-6 are schematic interface diagrams of new credit batch prediction cases in application example one of the present invention;
FIGS. 1-7 are diagrams of an automatic training setting operation interface in a first embodiment of the present invention;
FIGS. 1-8 are schematic diagrams of a timing job setting operation interface according to a first embodiment of the present invention;
FIGS. 1-9 are schematic diagrams of interfaces for updated credit batch prediction cases in application example one of the present invention;
FIG. 2-1 is a schematic interface diagram for creating the wine type identification model in application embodiment two of the present invention;
FIG. 2-2 is a schematic diagram of an interface for establishing a new wine type recognition model for self-learning in a second embodiment of the present invention;
FIGS. 2-3 are schematic diagrams of an interface for automatic training settings in a second embodiment of the present invention;
FIG. 3-1 is a schematic diagram of an interface for creating a shopping basket analysis model in a third embodiment of the present invention;
FIG. 3-2 is a schematic interface diagram for establishing the shopping basket prediction model in application embodiment three of the present invention;
FIGS. 3-3-1, 3-3-2 and 3-3-3 are schematic interface diagrams of the automatic training arrangement in a third embodiment of the application of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
Referring to fig. 1, an embodiment of the invention provides a model automatic updating method based on machine learning, which specifically comprises the following steps:
s100, acquiring data and preprocessing the data to obtain a data structure conforming to a model input rule;
s200, creating a first machine learning model comprising a plurality of model nodes based on the data preprocessed in S100;
s300, running the first machine learning model in S200 and storing model nodes therein;
s400, calling the model node stored in the S300, constructing a second machine learning model, and releasing and deploying the second machine learning model on line;
s500, setting a first machine learning model established in S200 to perform self-learning, and storing parameters of model nodes after new data optimization;
when, through self-learning, the first machine learning model reaches the threshold set for the monitoring index, the optimized parameters are automatically loaded before the next run of the second machine learning model established in S400.
In this embodiment, the first machine learning model of S200 mainly serves the early effect-verification stage and is used to generate the initial model, while the second machine learning model of S400 mainly serves release and deployment of the service after it goes online. The two models differ only in their data sources and output results: the former is driven mainly by test data, the latter by actual business data.
Models trained by machine learning and published as production services tend to lose accuracy gradually over time; retraining with newly supplemented data can restore the accuracy. Model self-learning in this embodiment mainly automates the training and online deployment process of the model, so that a model meeting the specified requirements can be released or deployed into production automatically; this solves the accuracy degradation caused by models going un-updated for long periods and greatly reduces the workload of operations and maintenance staff.
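The patent text describes this scheme without source code. The following minimal sketch, assuming Python with scikit-learn and joblib (an assumption, not the patent's implementation), illustrates the division of labour between the self-learning first model and the serving second model; the node path, estimator choice and threshold value are hypothetical.

import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

NODE_PATH = "prediction_node.joblib"      # hypothetical location of the stored model node
ONLINE_THRESHOLD = 0.90                   # threshold set for the monitoring index

def self_learn(X_train, y_train, X_eval, y_eval):
    # first machine learning model: retrain on newly supplemented data
    first_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_eval, first_model.predict(X_eval))
    if accuracy >= ONLINE_THRESHOLD:          # monitoring index reached the set threshold
        joblib.dump(first_model, NODE_PATH)   # store the optimized node parameters
    return accuracy

def second_model_predict(X_new):
    # second machine learning model: load the latest qualified parameters before the next run
    serving_model = joblib.load(NODE_PATH)
    return serving_model.predict(X_new)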
In a preferred embodiment of the present invention, S100 is implemented as follows: approximately one year of data on a high-value customer group of a retail banking product is obtained from the data source, and feature conversion and feature discretization are performed on the data to obtain a data structure conforming to the model input rule. Specifically, feature conversion amounts to digitizing the values of the original fields, for example mapping male/female to [0, 1]; feature discretization means discretizing continuous data. Operations of this type can collectively be called feature engineering, e.g. unifying the scale of the data, normalization, and so on. Further, in some embodiments, the method includes feature selection, feature conversion, chi-square feature selection, PCA, OneHot encoding, feature discretization, custom discretization, random forest feature selection, GBDT feature selection, automatic feature combination, regularization, min-max normalization, max-absolute normalization, and the like.
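As a non-limiting sketch of the feature engineering described above (feature conversion, discretization and normalization), again assuming scikit-learn; the toy column names and values are hypothetical stand-ins for the retail banking data mentioned in the text.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, KBinsDiscretizer, MinMaxScaler

raw = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "age": [23, 45, 36, 58],
    "balance": [1200.0, 53000.0, 8700.0, 460.0],
})
feature_engineering = ColumnTransformer([
    ("convert", OneHotEncoder(drop="if_binary"), ["gender"]),               # feature conversion, e.g. male/female -> 0/1
    ("discretize", KBinsDiscretizer(n_bins=2, encode="ordinal"), ["age"]),  # feature discretization
    ("normalize", MinMaxScaler(), ["balance"]),                             # min-max normalization / unified scale
])
X = feature_engineering.fit_transform(raw)   # data structure conforming to the model input rule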
In a preferred embodiment of the invention, S200 is implemented as follows. Specifically, the model nodes are learning results generated by training from the data; they comprise classification method nodes, cluster evaluation nodes and association rule nodes. The models described in this embodiment include, but are not limited to, models generated by training with classification, clustering and association rule related methods.
In a preferred embodiment, model self-learning is a general framework, of which classification, clustering and association rules are one key link. Classification methods, clustering methods and association rule methods are tools for solving specific business problems. The classification method calculates and analyzes a training set of known classes, discovers class rules from it and predicts the classes of new data. The clustering method divides a large set of unlabeled data into several categories according to the intrinsic similarity of the data, so that the similarity within a category is large and the similarity between categories is small, for example customer value segmentation into high-value customers, general customers and low-value customers. The association rule method reflects the interdependence and association between one thing and other things; it is used to mine valuable correlations between data items from a large amount of data and can extract from the data rules of the form 'the occurrence of some events causes the occurrence of other events'. Association rules are often used for recommendation problems. Under these business scenarios the data is continuously updated, and the models must therefore be updated periodically to maintain their accuracy.
In a preferred embodiment of the present invention, the self-learning of the first machine learning model is scheduled by setting a timing mode, which periodically optimizes the parameters of the first machine learning model.
Model self-learning is a general framework, and the classification method, clustering method and association rules mentioned in the above embodiments are one key link within it; they are tools for solving specific business problems. In the business scenarios where classification, clustering and association rules are used, the data is continuously updated, so the models must be updated periodically to maintain their accuracy.
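A minimal sketch of the timed self-learning mechanism, using only the Python standard library; a production deployment would more likely rely on a scheduler such as cron or the timing-job facility described in embodiment one. The interval and the self_learn routine are hypothetical.

import time

RETRAIN_INTERVAL_SECONDS = 24 * 60 * 60        # e.g. optimize the parameters once per day

def run_timed_self_learning(load_new_data, self_learn):
    while True:
        X_train, y_train, X_eval, y_eval = load_new_data()    # newly accumulated business data
        score = self_learn(X_train, y_train, X_eval, y_eval)  # retrain and store qualified parameters
        print(f"self-learning run finished, monitoring index = {score:.3f}")
        time.sleep(RETRAIN_INTERVAL_SECONDS)                  # wait for the next scheduled run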
In a preferred embodiment of the present invention, S300 implements storing model nodes from the first machine learning model by means including, but not limited to, extraction and training. Extraction refers to statistically learning feature rules from the data using feature extraction methods in machine learning; once extraction is completed, the learned rules can be stored as a model for use by the online service. Training refers to learning relevant rules from the selected features using various classification and regression methods in machine learning; once training is completed, the learned rules can likewise be stored as a model for use by the online service. In other words, extraction preprocesses the features, while training learns further from the feature-preprocessed data to obtain the final model.
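The following sketch (an assumption, not the patent's code) illustrates the distinction drawn above: 'extraction' fits a feature-preprocessing rule, 'training' fits a predictor on the preprocessed features, and both are persisted as callable model nodes. The node file names echo the 'credit conversion'/'credit prediction' names of embodiment one but are otherwise hypothetical.

import joblib
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier

def store_model_nodes(X_raw, y):
    extractor = StandardScaler().fit(X_raw)                                        # extraction: learn feature rules from the data
    trainer = GradientBoostingClassifier().fit(extractor.transform(X_raw), y)      # training: learn prediction rules
    joblib.dump(extractor, "credit_conversion.joblib")
    joblib.dump(trainer, "credit_prediction.joblib")

def call_model_nodes(X_raw):
    extractor = joblib.load("credit_conversion.joblib")     # the saved nodes are called by the second model
    trainer = joblib.load("credit_prediction.joblib")
    return trainer.predict(extractor.transform(X_raw))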
In a preferred embodiment of the present invention, S400 and S500 are implemented as follows: the model nodes extracted and trained in S300 are called directly to form a new machine learning model, i.e. the second machine learning model; then, once the first machine learning model reaches, through self-learning, the threshold set for the monitoring index, the optimized parameters are automatically loaded before the next run of the second machine learning model established in S400.
The monitoring indexes in this embodiment include, optionally, weighted recall, weighted F1 score, accuracy and weighted precision. Whether the optimized parameters learned by the first machine learning model are synchronized to the second machine learning model is determined by checking whether the monitoring index reaches the early-warning threshold or the online threshold.
The online threshold is the value of the monitoring index at which the parameters are synchronized. Further, accuracy is set as the monitoring index together with a relatively low online threshold, so that self-learning of the first machine learning model updates the corresponding second machine learning model once the monitoring index reaches the online threshold. With accuracy as the index, the condition for online deployment as a service is satisfied only when the accuracy of the newly generated first machine learning model is higher than the online threshold.
The online effect of a newly trained model still has to be verified, so the above embodiment of the invention sets a threshold to screen newly trained models and selects only those that reasonably meet the user's requirements. Moreover, the relevant performance index of a model may differ between business scenarios, so a business expert selects the model index according to past modeling experience, which gives the scheme greater flexibility.
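A sketch of the monitoring-index check, assuming scikit-learn's metric functions; the index actually monitored and the threshold values (taken from embodiment one) are configuration choices, not fixed by the invention.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

ONLINE_THRESHOLD = 0.9
EARLY_WARNING_THRESHOLD = 0.6

def should_synchronize(y_true, y_pred, index="accuracy"):
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "weighted_precision": precision_score(y_true, y_pred, average="weighted"),
        "weighted_recall": recall_score(y_true, y_pred, average="weighted"),
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
    }
    value = metrics[index]                       # the business expert chooses which index to monitor
    if value < EARLY_WARNING_THRESHOLD:
        print(f"early warning: {index} = {value:.3f}")
    return value >= ONLINE_THRESHOLD             # only then does the second model load the new parameters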
Based on the same inventive concept, other embodiments provide a model automatic updating system based on machine learning, comprising a data module, a first model module, a storage module, a second model module and an updating module. The data module acquires data and preprocesses the data for model creation; the first model module creates a first machine learning model comprising a plurality of model nodes based on the preprocessed data; the storage module runs the first machine learning model and stores the model nodes therein; the second model module calls the stored model nodes, constructs a second machine learning model, and publishes and deploys it online; the updating module sets the first machine learning model to perform self-learning and stores the parameters of the model nodes after they are optimized with new data; when the first machine learning model reaches, through self-learning, the threshold set for the monitoring index, the second machine learning model automatically loads the optimized parameters before its next run.
For the specific implementation of each module/unit in the above embodiment, reference may be made to the corresponding steps of the machine learning based model automatic updating method in the foregoing embodiment, and details are not repeated here.
Based on the same inventive concept, in other embodiments, a terminal is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is configured to execute the above-mentioned automatic model updating method based on machine learning or to execute the above-mentioned automatic model updating system based on machine learning when executing the program.
Optionally, a memory is provided for storing a program. The memory may include volatile memory, such as random-access memory (RAM), e.g. static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g. application programs or functional modules implementing the methods described above), computer instructions and the like, which may be stored partitioned across one or more memories and may be invoked by the processor.
A processor for executing the computer program stored in the memory to implement the steps in the method according to the above embodiment. Reference may be made in particular to the description of the embodiments of the method described above.
The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.
Based on the same inventive concept, in other embodiments, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor is operable to perform the above-described machine learning based model automatic updating method, or to run the above-described machine learning based model automatic updating system.
Computer-readable media include computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a user device. The processor and the storage medium may alternatively reside as discrete components in a communication device.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Based on the same conception as the above embodiments, other embodiments of the present invention provide a unified monitoring system for models, online services and model self-learning, so as to monitor the running state and action call records of each task. Three specific application examples are provided below to better explain the present invention.
Embodiment one
In the first step, referring to fig. 1-1, a machine learning modeling flow (the bank credit loan prediction case, which can be regarded as the first machine learning model) is established through the graphical interface of a data mining experiment system; the flow is run successfully and saved for subsequent model self-learning.
As can be seen from fig. 1-1, the bank credit prediction case includes five steps:
s101, reading a loan data source;
s102, carrying out data preprocessing on the loan data read in the S101;
s103, data exploration is carried out on the data preprocessed in the S102;
s104, constructing a prediction model based on the data preprocessed in the S102;
s105, evaluating the prediction model built in s104 and checking the model performance.
In some possible embodiments, the preprocessing in S102 includes feature selection, feature conversion, extraction, transformation and feature discretization. These operations do not occur in a fixed sequence but may be combined; their main purpose is to produce data that best conforms to the model input. For example, in a specific embodiment, feature conversion may map the 'yes'/'no' values of a feature to ordinal values, and feature discretization may discretize features such as age and balance.
Further, in a preferred embodiment, the data preprocessing process in S102 is as follows:
s1021, selecting features of the loan data source;
s1022, extracting a feature conversion rule from the features selected in s1021;
s1023, converting the features selected in s1021 using the rule extracted in s1022;
s1024, performing feature selection on the features converted in s1023;
s1025, extracting a feature discretization rule from the features selected in s1024;
s1026, transforming the features of s1024 using the discretization rule extracted in s1025.
The loan data source in S1021 is relational data organized into rows and columns: the field of each column is a feature, and each row is called a record. A field is an attribute that can describe a record; for example, a student information table may contain several fields, and a given record can be described by age, sex and height, so age, sex and height are each called a field. The fields of the loan data source in this example include 'age', 'occupation', 'marital status', 'education level', 'balance', 'whether there is housing', 'whether there is a loan', 'loan term', 'default time', 'day', 'month', 'campaign' and 'whether there is a loan (bit)'.
The feature selection in S1021 mainly selects the fields to be input to the next node. The conversion in S1022 converts fields: for example, 'marital status' appears in the raw data as [married | unmarried] and cannot be fed into model training directly, so it must be converted into [1 | 0]. The feature selection node needs to be used together with other functional nodes, such as feature conversion, feature discretization and the various algorithm nodes.
The feature discretization in S1025 discretizes the 'age' and 'balance' fields. The extraction node is used together with other nodes: its left side connects to the corresponding feature extraction algorithm and its right side connects to the data, and the extraction rule is statistically learned from the data. When extraction is completed, the learned extraction rule needs to be saved as a model for the prediction process.
In some other embodiments, the data exploration of S103 is to view the correlation between the field and the target.
In some other embodiments, the model building process of S104 is as follows (an illustrative sketch is given after the list):
s1041, selecting the characteristics from the data characteristics preprocessed in S102;
s1042, splitting the data with the features selected in s1041, one part serving as the training set and the other as the test set;
s1043, training the training set split in the S1042 by using a logistic regression method;
s1044, predicting the test set split in S1042.
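An illustrative sketch of steps s1041–s1044, assuming scikit-learn; synthetic data stands in for the preprocessed loan features, which are not reproduced in the patent text.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)                  # stand-in for selected loan features (s1041)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)   # s1042 train/test split
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)                            # s1043 logistic regression training
predictions = model.predict(X_test)                                                        # s1044 prediction on the test set
print("test accuracy:", accuracy_score(y_test, predictions))                               # feeds the evaluation of s105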
In the second step, the extraction node and the training node in fig. 1-1 are clicked and the corresponding methods are saved as models. As shown in fig. 1-2, the saved models appear in the 'trained models' folder on the left, and as shown in fig. 1-3 they are credit conversion, credit dispersion and credit prediction. Extraction in this step learns the feature rules of the corresponding data preprocessing process; training in this step learns the corresponding rules of the feature-preprocessed data by a gradient boosting decision tree method; the learning results of extraction and training are stored directly as models.
In the third step, referring to fig. 1-4, a second data mining experiment, i.e. the second machine learning model, is created using the same experimental procedure (the same case may be reused here). Specifically, the new credit batch prediction case comprises four steps:
s201, reading a loan data source;
s202, performing data preprocessing on the data of the S201;
s203, constructing a model based on the data preprocessed in the S202;
s204, evaluating the model constructed in the S203, and checking the model performance.
In some other embodiments, the preprocessing of S202 includes two steps, respectively:
s2021, credit conversion is carried out on the loan data source;
s2022, performing credit dispersion on the credit converted data.
In implementation, the conversion converts the 'yes'/'no' values of a feature into ordinal values, and the feature discretization discretizes the age and balance features.
In some other embodiments, the build model of S203 includes two steps, respectively:
s2031, selecting features from the preprocessed data;
s2032, credit prediction is performed for the feature selected in S2031.
The extraction nodes and training nodes of fig. 1-1 are then replaced with the corresponding trained models saved in the second step (the extraction and training nodes of the first machine learning model were saved and renamed 'credit conversion', 'credit dispersion' and 'credit prediction'); that is, feature preprocessing is performed with the trained models and data prediction is performed with the saved prediction model (the same applies when the saved model is a clustering-method model or an association-rule model). The experiment is run successfully and saved for model batch prediction. Referring to fig. 1-5, click the model batch prediction icon in the panel below the experiment and click Save in the pop-up window.
In the fourth step, referring to fig. 1-6, return to the first experiment (the bank credit prediction case), click the set-model-self-learning icon in the lower panel, and configure it in the pop-up dialog box. Specifically, referring to fig. 1-7, the selected evaluation node is set to the evaluation node of fig. 1-1, the monitoring index to accuracy, the online threshold to 0.9 and the early-warning threshold to 0.6; the training node of fig. 1-1 is selected as the training node; the second experiment just created (model batch prediction) is selected under update service/prediction; and the corresponding saved model (credit prediction) is selected as the model node.
In this embodiment, accuracy is selected as the monitoring index together with a relatively low online threshold, so that model self-learning updates the corresponding model once the index reaches the threshold.
In the fifth step, referring to fig. 1-7, click the save button and then click 'set timing task'; referring to fig. 1-8, click 'manual execution' in the pop-up timing job dialog box and fill in the corresponding parameters such as the plan name, the task to be executed, the interval type and the job settings; the interface then indicates that the job request has been sent and the model self-learning is running. After the job has run for a while, or after completion is shown in the model self-learning list, open the second experiment (credit batch prediction); it can be seen that the corresponding prediction model has been updated, see fig. 1-9.
In this embodiment, the timestamp suffix appended to the model name is one way of confirming that the model has been updated; alternatively, the model parameters can be adjusted in model self-learning and the change in the prediction result observed in model batch prediction, but this approach is comparatively more cumbersome.
Embodiment two: clustering model
The clustering process is similar to the process of the first embodiment, and specifically:
In the first step, referring to fig. 2-1, a wine type identification case (the first machine learning model) is established; the case includes four steps, specifically:
s301, reading a wine data source;
s302, preprocessing the data of the wine data source in S301, and performing characteristic engineering;
s303, constructing a data clustering model for the data obtained in the S302 based on a clustering method;
s304, data exploration is carried out on the clusters of the data clustering model acquired in S303.
In some other embodiments, the feature engineering of S302 is as follows:
s3021, performing feature selection on the data of S301; the selected features generally comprise common fields such as alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue and proline;
S3022, extracting a standardization rule from the features selected in S3021, eliminating the influence of differences in data scale;
s3023, transforming the features selected in S3021 using the rule extracted in S3022 (the 'transform' function node is used together with two other nodes, 'algorithm' and 'feature selection': the 'algorithm' node is connected on its left and the 'feature selection' node on its right, meaning the selected feature columns are transformed by that algorithm).
In some other embodiments, the process of constructing the cluster model of S303 is:
s3031, performing feature selection on the data preprocessed in the step S302;
s3032, clustering training is carried out by adopting a K-means method;
s3033, after clustering training is completed, the model outputs the clustering coefficient and the data cluster assignments.
In S303, the data set is clustered according to the input model: a clustering model is connected on the left and the data to be clustered on the right. Using the clustering algorithm model, the clustering coefficient and the data clustering result are finally output, and the cluster center point of each index is output through the clustering coefficient.
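A sketch of the preprocessing and clustering of S302–S303, under the assumption that the wine data source resembles the public wine recognition dataset bundled with scikit-learn; the number of clusters is illustrative.

from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X = load_wine().data                                                    # alcohol, malic_acid, ash, ..., proline
X_std = StandardScaler().fit_transform(X)                               # s3022: standardization removes scale differences
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)     # s3032: K-means clustering training
print("clustering coefficient (SSE / inertia):", kmeans.inertia_)       # the SSE monitored during self-learning
print("cluster centers:", kmeans.cluster_centers_.shape)                # per-index cluster center points
labels = kmeans.labels_                                                 # s3033: data cluster assignments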
In the second step, referring to fig. 2-2, based on the wine type identification case of the first step, click the extraction and clustering-training buttons and save the model corresponding to extraction as 'grape standard' and the model corresponding to clustering training as 'grape prediction' into the trained models;
In the third step, a new 'wine type identification self-learning model' case (the second machine learning model) is established; the case comprises four steps, namely:
s401, reading a wine data source;
s402, preprocessing the wine data of the S401, and performing characteristic engineering;
s403, constructing a clustering model;
s404, performing data exploration on the data clusters of the clustering model constructed in S403.
In a preferred embodiment, the data preprocessing of S402 transforms the data source acquired in S401; at this node the 'grape standard' model among the trained models is called.
In a preferred embodiment, the clustering model of S403 is constructed by calling the 'grape prediction' model among the trained models at that node.
In the fourth step, referring to fig. 2-3, self-learning is configured for the model saved in the 'wine type identification case' of the first step and the conditions are set; specifically, the selected evaluation node is set to cluster evaluation, the monitoring index is SSE, the online threshold may be 99, the early-warning threshold is 1, the training node is clustering training, the update service/prediction is the 'wine type identification self-learning model', and the model node is grape prediction.
If the condition is met, the corresponding model of the 'wine type identification self-learning model' case established in the third step is updated and replaced, and then deployed and applied online.
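Because SSE is a lower-is-better index, the update condition flips direction compared with accuracy; the following one-line check is a sketch under that assumption, with the threshold value taken from the configuration above.

SSE_ONLINE_THRESHOLD = 99.0                    # online threshold configured in fig. 2-3

def cluster_model_qualifies(sse):
    # the 'grape prediction' node is replaced only when the new clustering SSE is low enough
    return sse <= SSE_ONLINE_THRESHOLD
# e.g. cluster_model_qualifies(kmeans.inertia_) with the KMeans fit sketched in the first step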
Embodiment three: association rule model
First, a shopping basket analysis case is established, see fig. 3-1, which includes:
s501, reading shopping list data sources;
s502, preprocessing shopping list data sources of S501;
s503, performing model training based on the data preprocessed in the S502;
s504, outputting the model coefficient trained in the S503;
s505, predicting by adopting a trained model.
In some other embodiments, the data preprocessing of S502 aggregates and sorts the shopping list data so that the item lists are grouped and ordered by user id.
In some other embodiments, the model training process of S503 is specifically:
s5031, performing feature selection on the preprocessed data, such as group_id and collection_set;
s5032, splitting the data with the features selected in s5031 into a training set and a prediction set;
s5033, training the model on the training set of s5032 using the FP-Growth association rule method to obtain a shopping basket model;
in some other embodiments, the trained model coefficients in S504 include frequent item sets, frequencies, supporters;
In the second step, the shopping basket model trained in s5033 is stored as a trained model;
In the third step, referring to fig. 3-2, the shopping basket model saved in the second step is called to build a new shopping basket prediction case whose structure is consistent with that of the shopping basket analysis case in the first step;
fourthly, referring to the figures 3-3-1, 3-3-2 and 3-3-3, returning to the shopping basket analysis case in the first step, setting a monitoring index and an online threshold, and when the new shopping basket prediction case meets the index and threshold requirements, performing online release. FP-Growth is a specific association rule generation method, and the data in the example data source is shopping basket data, and the purpose of this experiment is to recommend related items to the user through the purchased items of the user (i.e., predict item list recommendations for possible purchase based on the item list purchased).
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by those skilled in the art within the scope of the claims without affecting the spirit of the invention. The preferred features described above may be used in any combination provided they do not conflict with one another.

Claims (10)

1. A machine learning based model automatic update method, comprising:
acquiring data and preprocessing the data to obtain a data structure conforming to a model input rule;
creating a first machine learning model comprising a plurality of model nodes based on the preprocessed data;
running the first machine learning model and storing model nodes therein;
calling the stored model nodes, constructing a second machine learning model, and publishing and deploying the second machine learning model on line;
setting the first machine learning model to perform self-learning, and storing parameters of model nodes after new data optimization;
when the first machine learning model reaches the threshold value set by the monitoring index through self-learning, the second machine learning model automatically loads the optimized parameters before the next operation.
2. The machine learning based model automatic updating method of claim 1, wherein the acquiring data and preprocessing the data comprises:
acquiring a field from a data source as a data characteristic;
and performing feature conversion and feature dispersion on the data features.
3. The automatic model updating method based on machine learning according to claim 1, wherein the model node is a learning result generated by training from the data; the model nodes comprise classification method nodes, cluster evaluation nodes or association rule nodes.
4. A machine learning based model auto-update method according to claim 3 characterized in that,
the classification method calculates and analyzes a training set of known classes, discovers class rules from it and predicts the class of new data;
the cluster evaluation divides a large set of unlabeled data into several categories according to the intrinsic similarity of the data, so that the similarity within a category is large and the similarity between categories is small;
the association rule reflects the interdependence and association between one thing and other things, is used to mine valuable correlations between data items from a large amount of data, and analyzes from the data rules of the form 'the occurrence of some events causes the occurrence of other events'.
5. The machine learning based model automatic updating method of claim 1, wherein the stored model nodes are implemented by extraction or training, wherein:
the extraction comprises statistically learning feature rules from the data using a feature extraction method in machine learning; after extraction is completed, the learned feature rules can be stored as a model to be called;
the training comprises learning relevant rules from the extracted features using various classification and regression methods in machine learning; after training is completed, the learned relevant rules can be stored as a model to be called.
6. The method according to claim 1, wherein the self-learning of the first machine learning model is scheduled by setting a timing mode, so that parameters of the first machine learning model are optimized periodically.
7. The method for automatically updating a model based on machine learning of claim 1,
the monitoring index comprises an online threshold and an early warning threshold;
when the first machine learning model reaches at least one of the online threshold or the early warning threshold, the second machine learning model automatically loads optimized parameters before the next operation.
8. A machine learning based model automatic update system, comprising:
the data module acquires data and preprocesses the data to acquire a data structure conforming to the input rule of the model;
a first model module that creates a first machine learning model comprising a plurality of model nodes based on the preprocessed data;
a storage module that runs the first machine learning model and stores model nodes therein;
a second model module that calls the stored model nodes, constructs a second machine learning model, and publishes and deploys it online;
the updating module is used for setting the first machine learning model to perform self-learning and storing parameters of model nodes after new data optimization; when the first machine learning model reaches the threshold value set by the monitoring index through self-learning, the second machine learning model automatically loads the optimized parameters before the next operation.
9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to perform the method of any one of claims 1-7 or to run the system of claim 8 when the program is executed by the processor.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor is operative to perform the method of any one of claims 1-7 or to run the system of claim 8.
CN202310636456.6A 2023-06-01 2023-06-01 Model automatic updating method, system, medium and terminal based on machine learning Pending CN116362352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310636456.6A CN116362352A (en) 2023-06-01 2023-06-01 Model automatic updating method, system, medium and terminal based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310636456.6A CN116362352A (en) 2023-06-01 2023-06-01 Model automatic updating method, system, medium and terminal based on machine learning

Publications (1)

Publication Number Publication Date
CN116362352A true CN116362352A (en) 2023-06-30

Family

ID=86939981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310636456.6A Pending CN116362352A (en) 2023-06-01 2023-06-01 Model automatic updating method, system, medium and terminal based on machine learning

Country Status (1)

Country Link
CN (1) CN116362352A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106230849A (en) * 2016-08-22 2016-12-14 中国科学院信息工程研究所 A kind of smart machine machine learning safety monitoring system based on user behavior
CN109343857A (en) * 2018-08-22 2019-02-15 中国平安人寿保险股份有限公司 Method, equipment and the storage medium of machine learning model are disposed on line
CN109993287A (en) * 2017-12-29 2019-07-09 北京中科寒武纪科技有限公司 Processing with Neural Network method, computer system and storage medium
US20190303980A1 (en) * 2018-03-28 2019-10-03 Adobe Inc. Training and utilizing multi-phase learning models to provide digital content to client devices in a real-time digital bidding environment
CN111126618A (en) * 2019-12-05 2020-05-08 深圳前海微众银行股份有限公司 Multi-source heterogeneous system-based federal learning method and device
CN111428887A (en) * 2020-03-19 2020-07-17 腾讯云计算(北京)有限责任公司 Model training control method, device and system based on multiple computing nodes
US20210065053A1 (en) * 2019-08-30 2021-03-04 Accenture Global Solutions Limited Automated data processing and machine learning model generation
CN115454466A (en) * 2022-08-10 2022-12-09 天云融创数据科技(北京)有限公司 Method, apparatus, device and medium for automatic updating of machine learning model
US20230080851A1 (en) * 2021-09-13 2023-03-16 Fair Isaac Corporation Machine learning uncertainty quantification and modification


Similar Documents

Publication Publication Date Title
US20230049931A1 (en) Method of training machine learning models for making simulated estimations
CN111444236B (en) Mobile terminal user portrait construction method and system based on big data
US10360517B2 (en) Distributed hyperparameter tuning system for machine learning
AU2017255561B2 (en) Learning from historical logs and recommending database operations on a data-asset in an ETL tool
US10347019B2 (en) Intelligent data munging
CN112396108A (en) Service data evaluation method, device, equipment and computer readable storage medium
CN108681970A (en) Finance product method for pushing, system and computer storage media based on big data
US11151480B1 (en) Hyperparameter tuning system results viewer
CN110807527A (en) Line adjusting method and device based on guest group screening and electronic equipment
CN116757297A (en) Method and system for selecting features of machine learning samples
CN112200659A (en) Method and device for establishing wind control model and storage medium
US20200279178A1 (en) Allocation method, extraction method, allocation apparatus, extraction apparatus, and computer-readable recording medium
CN114219562A (en) Model training method, enterprise credit evaluation method and device, equipment and medium
CN115063035A (en) Customer evaluation method, system, equipment and storage medium based on neural network
US11481580B2 (en) Accessible machine learning
Jeyaraman et al. Practical Machine Learning with R: Define, build, and evaluate machine learning models for real-world applications
CN116362352A (en) Model automatic updating method, system, medium and terminal based on machine learning
US11593692B2 (en) Graph structure analysis apparatus, graph structure analysis method, and computer-readable recording medium
CN111242320A (en) Machine learning method and device, electronic equipment and storage medium
US20230132064A1 (en) Automated machine learning: a unified, customizable, and extensible system
CN113590692A (en) Three-stage crowd mining condition optimization method and system
CN111753992A (en) Screening method and screening system
US20200342302A1 (en) Cognitive forecasting
US20240152522A1 (en) Data set semantic similarity clustering
US20240104436A1 (en) Chained feature synthesis and dimensional reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230630