WO2019148669A1 - Method and apparatus for generating machine learning model, computer device, and storage medium - Google Patents

Method and apparatus for generating machine learning model, computer device, and storage medium

Info

Publication number
WO2019148669A1
WO2019148669A1 PCT/CN2018/084039 CN2018084039W WO2019148669A1 WO 2019148669 A1 WO2019148669 A1 WO 2019148669A1 CN 2018084039 W CN2018084039 W CN 2018084039W WO 2019148669 A1 WO2019148669 A1 WO 2019148669A1
Authority
WO
WIPO (PCT)
Prior art keywords
identification information
machine learning
data processing
stored
learning model
Prior art date
Application number
PCT/CN2018/084039
Other languages
French (fr)
Chinese (zh)
Inventor
陈海涛
晏存
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • the present application relates to the field of machine learning technologies, and in particular, to a method, an apparatus, a computer device, and a storage medium for generating a machine learning model.
  • The machine learning process generally includes a data preprocessing process and a model training process, and both require engineers to write code. In many cases, however, the data preprocessing methods and the model training algorithms used from one project to the next are highly similar. If engineers have to write this code anew every time, the result is a large amount of repetitive work and wasted time.
  • the embodiment of the present application provides a method, a device, a computer device, and a storage medium for generating a machine learning model, which can quickly obtain a machine learning model.
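  • An embodiment of the present application provides a method for generating a machine learning model, including: acquiring training data; retrieving a pre-stored data processing module and a pre-stored algorithm module; performing data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data; training and verifying the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model; and displaying the machine learning model and its corresponding verification indicator according to a preset display rule.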
  • an embodiment of the present application provides a device for generating a machine learning model, including:
  • a data acquisition unit configured to acquire training data
  • a module retrieval unit configured to retrieve a pre-stored data processing module and a pre-stored algorithm module
  • a data processing unit configured to perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data
  • a model generating unit configured to perform training and verification on the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model
  • a display unit configured to display the machine learning model and its corresponding verification indicator according to a preset display rule.
  • An embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the method for generating a machine learning model according to any embodiment of the present application.
  • An embodiment of the present application further provides a storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to execute the method for generating a machine learning model according to any embodiment of the present application.
  • the method for generating a machine learning model provided by the embodiment of the present application does not need to acquire a machine learning model through programming every time, thereby greatly reducing the workload of the engineer, saving the time of the engineer, and improving the efficiency of acquiring the machine learning model.
  • FIG. 1 is a schematic flowchart of a method for generating a machine learning model according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a specific process of a method for generating a machine learning model shown in FIG. 1;
  • FIG. 3 is another schematic flowchart of a method for generating a machine learning model according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of a preset user operation interface according to an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a device for generating a machine learning model according to an embodiment of the present disclosure
  • FIG. 6 is a schematic block diagram of a device for generating a machine learning model according to an embodiment of the present disclosure
  • FIG. 7 is a schematic block diagram of a device for generating a machine learning model according to an embodiment of the present disclosure
  • FIG. 8 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for generating a machine learning model according to an embodiment of the present application.
  • the method is applied to terminals such as desktop computers, laptop computers, and tablet computers.
  • the method includes steps S101 to S105.
  • the terminal can interact with the user by using a preset user operation interface to obtain training data.
  • The user can press a button such as “Browse” in the user operation interface to enter the storage path of the training data into the corresponding input box. After the user clicks the “confirm submit” button, the terminal acquires the corresponding training data according to the storage path.
  • The training data can be historical registration data.
  • For example, the training data may be the user names, mobile phone numbers, and similar details used to register on the Lujin Institute financial platform.
  • The training data can also be transaction data.
  • For example, the training data may be transaction data generated when users purchase wealth management products or the like.
  • the training data can also be other data, and no specific restrictions are imposed here.
  • S102 Acquire a pre-stored data processing module and a pre-stored algorithm module.
  • the pre-stored data processing module and the pre-stored algorithm module are pre-stored in the terminal.
  • the pre-stored data processing module may be a module encapsulated by a data processing method commonly used in a machine learning process.
  • the pre-stored data processing module may be a module encapsulated by methods such as an outlier detection method and a continuous value discretization method.
  • the pre-stored algorithm module can be a module encapsulated by algorithms commonly used in the model training process.
  • the pre-stored algorithm module may be a module encapsulated by a classification algorithm, a regression algorithm, or the like.
  • the pre-stored algorithm module may be a module encapsulated by a logistic regression algorithm, an SVM algorithm, a decision tree algorithm, or the like.
  • The pre-stored data processing module and the pre-stored algorithm module may each be a shell script. That is to say, commonly used data processing methods and algorithms are packaged in advance as shell scripts and stored in the terminal for subsequent calls.
  • the terminal needs to acquire the data processing module identification information and the algorithm module identification information before the pre-stored data processing module and the pre-stored algorithm module are retrieved.
  • The terminal reads the pre-stored data processing module identification information as the data processing module identification information, and the pre-stored algorithm module identification information as the algorithm module identification information. That is to say, in this case the user does not need to input the data processing module identification information or the algorithm module identification information.
  • Retrieving the modules then includes: retrieving the corresponding pre-stored data processing module according to the data processing module identification information; and retrieving the corresponding pre-stored algorithm module according to the algorithm module identification information.
  • The terminal stores identification information corresponding to each pre-stored data processing module and each pre-stored algorithm module, referred to respectively as pre-stored data processing module identification information and pre-stored algorithm module identification information.
  • the pre-stored data processing module identification information and the pre-stored algorithm module identification information may both be names of methods or algorithms.
  • the pre-stored data processing module identification information and the pre-stored algorithm module identification information may also be other information that plays a role of identification, and is not specifically limited herein.
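  • The retrieval of a pre-stored module by its identification information can be pictured as a lookup by name. The following is a minimal sketch only: the registry dictionaries, the `retrieve_module` helper, and the placeholder callables are hypothetical, and the text itself describes the modules as packaged scripts rather than Python functions.

```python
from typing import Callable, Dict

# Hypothetical registries: identification information (here, a name) -> pre-stored module.
# The lambdas are placeholders standing in for the packaged modules the text describes.
DATA_PROCESSING_MODULES: Dict[str, Callable] = {
    "outlier detection method": lambda rows: rows,
    "continuous value discretization method": lambda rows: rows,
}
ALGORITHM_MODULES: Dict[str, Callable] = {
    "logistic regression algorithm": lambda rows: "logistic-regression-model",
    "SVM algorithm": lambda rows: "svm-model",
}


def retrieve_module(registry: Dict[str, Callable], identification_info: str) -> Callable:
    """Return the pre-stored module whose identification information matches."""
    wanted = identification_info.strip().lower()
    for name, module in registry.items():
        if name.lower() == wanted:  # case-insensitive name match (an assumption)
            return module
    raise KeyError(f"no pre-stored module for {identification_info!r}")
```

  • Matching on a normalized name is one possible choice; since the text allows any information that plays an identifying role, numeric codes or file names would work the same way.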
  • At least one pre-stored data processing module and at least one pre-stored algorithm module are stored in the terminal, so at least one item of data processing module identification information and at least one item of algorithm module identification information are acquired. Therefore, before retrieving the pre-stored data processing module and the pre-stored algorithm module, the terminal further needs to arrange and combine all the data processing module identification information and all the algorithm module identification information to form at least one machine learning group, and then sequentially read the data processing module identification information and the algorithm module identification information in each machine learning group.
  • For example, the pre-stored data processing modules stored in the terminal include an outlier detection method module and a continuous value discretization method module, and the pre-stored algorithm modules include a logistic regression algorithm module and an SVM algorithm module. That is, there are two pre-stored data processing modules and two pre-stored algorithm modules. After obtaining the training data, the terminal reads all the pre-stored data processing module identification information as the data processing module identification information and all the pre-stored algorithm module identification information as the algorithm module identification information, and arranges and combines them to form four machine learning groups, as shown in Table 1.
  • Table 1
  • Machine learning group | Data processing module identification information | Algorithm module identification information
  • First machine learning group | Outlier detection method | Logistic regression algorithm
  • Second machine learning group | Outlier detection method | SVM algorithm
  • Third machine learning group | Continuous value discretization method | Logistic regression algorithm
  • Fourth machine learning group | Continuous value discretization method | SVM algorithm
  • After the four machine learning groups are formed, the data processing module identification information and the algorithm module identification information in each group are read in sequence, the corresponding modules are retrieved, and steps S103 and S104 are then performed. For example, the identification information in the first machine learning group is read first, namely the outlier detection method and the logistic regression algorithm. The outlier detection method module is then retrieved according to the outlier detection method, and the logistic regression algorithm module is retrieved according to the logistic regression algorithm. Steps S103 to S104 are performed with these two modules to obtain the machine learning model and the verification indicator corresponding to the first machine learning group. The process then returns to read the identification information in the second machine learning group, namely the outlier detection method and the SVM algorithm, and so on, until four machine learning models and their corresponding verification indicators are obtained.
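  • Arranging and combining the identification information into machine learning groups amounts to taking a Cartesian product of the two lists. The sketch below assumes that reading; the function name `form_machine_learning_groups` is hypothetical.

```python
from itertools import product


def form_machine_learning_groups(data_processing_ids, algorithm_ids):
    """Pair every data processing identifier with every algorithm identifier."""
    return list(product(data_processing_ids, algorithm_ids))


groups = form_machine_learning_groups(
    ["outlier detection method", "continuous value discretization method"],
    ["logistic regression algorithm", "SVM algorithm"],
)

# Read the groups in sequence; each pair would then drive steps S103 to S104.
for number, (data_id, algorithm_id) in enumerate(groups, start=1):
    print(number, data_id, algorithm_id)
```

  • With two identifiers on each side this yields four groups, in the same order as the four machine learning groups of the example.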
  • Alternatively, when at least two pre-stored data processing modules and/or at least two pre-stored algorithm modules are stored in the terminal, one pre-stored data processing module and one pre-stored algorithm module may be selected at random when the modules are retrieved, and steps S103 to S104 are then performed. That is to say, it is not necessary to arrange and combine all the pre-stored data processing modules and all the pre-stored algorithm modules stored in the terminal. In this case, only one machine learning model and its corresponding verification indicator are obtained. No specific limitation is imposed herein.
  • For example, the terminal performs outlier detection preprocessing on the training data by using the outlier detection method module to obtain the processed training data.
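  • The text names outlier detection and continuous value discretization as preprocessing methods but does not specify how either is computed. The following sketch picks two common variants purely as assumptions: a z-score filter and equal-width binning. The function names and thresholds are hypothetical.

```python
from statistics import mean, pstdev


def remove_outliers(values, z_threshold=3.0):
    """Drop values whose z-score exceeds the threshold (z-score rule is an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) / sigma <= z_threshold]


def discretize(values, n_bins=4):
    """Map continuous values to equal-width bin indices 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

  • On small samples a z-score can never be very large, so a lower threshold such as 2.0 may be needed for the filter to remove anything.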
  • the pre-stored algorithm module obtained in step S102 is trained and verified by using the processed training data, thereby obtaining a machine learning model and corresponding verification indicators.
  • FIG. 2 is a specific schematic flowchart of a method for generating a machine learning model shown in FIG. 1.
  • This step S104 includes steps S1041 to S1043.
  • the processed training data is divided into training model data and verification model data according to a preset ratio.
  • the processed training data is divided into training model data and verification model data according to a preset ratio of 9 to 1. That is to say, 90% of the processed training data is used as the training model data, and 10% of the processed training data is used as the verification model data.
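  • The 9 to 1 division can be sketched as follows. Shuffling before the split and the fixed seed are assumptions added for illustration; the text only specifies the ratio. The name `split_by_ratio` is hypothetical.

```python
import random


def split_by_ratio(rows, train_parts=9, verify_parts=1, seed=0):
    """Divide rows into training model data and verification model data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption, not stated in the text
    cut = len(shuffled) * train_parts // (train_parts + verify_parts)
    return shuffled[:cut], shuffled[cut:]


training_model_data, verification_model_data = split_by_ratio(range(100))
print(len(training_model_data), len(verification_model_data))
```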
  • the algorithm parameters corresponding to the pre-stored algorithm are also encapsulated in the pre-stored algorithm module.
  • the algorithm parameters in the pre-stored algorithm module are brought to the corresponding positions of the corresponding pre-stored algorithms to form an initial model, and then the initial model is trained with the training model data to obtain a final machine learning model.
  • the pre-stored algorithm module includes default algorithm parameters.
  • the terminal forms the initial model by using the default algorithm parameters in the pre-stored algorithm module.
  • the algorithm parameters may include a step size and an iteration number.
  • the algorithm parameters may include tree depth, maximum split feature number, and non-existence.
  • the algorithm parameters may be set according to a specific algorithm in the corresponding pre-stored algorithm module, and no specific limitation is imposed herein.
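  • To make the role of a step size and an iteration number concrete, the sketch below trains a logistic regression model by batch gradient descent. This is an illustrative assumption: the text does not say how the pre-stored algorithm module trains its model, and the function names are hypothetical.

```python
import math


def train_logistic_regression(samples, labels, step_size=0.1, iterations=200):
    """Fit weights and a bias by batch gradient descent on the logistic loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    count = len(samples)
    for _ in range(iterations):            # iteration number
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # step size scales each update of the parameters
        weights = [w - step_size * g / count for w, g in zip(weights, grad_w)]
        bias -= step_size * grad_b / count
    return weights, bias


def predict(model, x):
    """Return class 1 when the linear score is non-negative, else class 0."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0
```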
  • the verification indicator may be a recall rate, an accuracy rate, etc., and no specific restrictions are imposed here.
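  • The two verification indicators named above can be computed as in the sketch below. Here "accuracy rate" is read as the fraction of correct predictions, which is an assumption about the intended definition; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_positives = sum(1 for t in y_true if t == positive)
    return true_positives / actual_positives if actual_positives else 0.0
```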
  • the identification information of the machine learning model and the verification indicator corresponding to the machine learning model are sequentially displayed according to the size of the verification indicator. For example, when two pre-stored data processing modules and two pre-stored algorithm modules are acquired in step S102, after steps S103 to S104, four machine learning models and corresponding verification indicators are obtained. Assuming that the verification index is the accuracy rate, the terminal can obtain the order of the four machine learning models in descending order of accuracy, as shown in Table 2.
  • the identification information of the machine learning model may be the number of each machine learning model, which is "001", “002", “003”, and "004", respectively.
  • the identification information of the machine learning model may also be other identifiers for distinguishing from each other, and no specific limitation is imposed herein.
  • The terminal can display the information about the four machine learning models in the form of Table 2, so that the user can see the accuracy rate of each machine learning model and select a suitable model for subsequent predictions according to their needs.
  • The terminal may further display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, as shown in Table 2, so that the user can better understand how each model was generated, including the data processing method and the training algorithm used.
  • the terminal may also display the algorithm parameters corresponding to the pre-stored algorithm module in each machine learning model finally obtained, and the display content is not limited herein.
  • the preset display rules are not limited to the above-mentioned order of display according to the size of the verification indicator, and may also be other types of display rules.
  • the preset display rule may also be the identification information of the machine learning model that displays the best verification indicator and the corresponding verification indicator. As shown in Table 2, the highest accuracy of the four machine learning models is 95%. Therefore, the terminal can display the machine learning model with the identification information of the machine learning model as 001 and the corresponding accuracy rate.
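  • Both display rules described above, ordering by verification indicator and showing only the best model, reduce to a sort and a maximum. In the sketch below the record layout, the function names, and the indicator values are illustrative placeholders and are not the figures from Table 2.

```python
def display_ranked(results):
    """Order models by verification indicator, largest first."""
    return sorted(results, key=lambda record: record["indicator"], reverse=True)


def display_best(results):
    """Pick the single model with the best verification indicator."""
    return max(results, key=lambda record: record["indicator"])


# Hypothetical results for four models (placeholder values).
results = [
    {"model_id": "A", "indicator": 0.82},
    {"model_id": "B", "indicator": 0.91},
    {"model_id": "C", "indicator": 0.77},
    {"model_id": "D", "indicator": 0.88},
]

for record in display_ranked(results):
    print(record["model_id"], f"{record['indicator']:.0%}")
```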
  • the method for generating the machine learning model does not need to acquire the machine learning model through programming every time, which can greatly reduce the workload of the engineer and improve the efficiency of acquiring the machine learning model.
  • FIG. 3 is a schematic flowchart of a method for generating a machine learning model according to an embodiment of the present application.
  • the method is applied to terminals such as desktop computers, laptop computers, and tablet computers.
  • the method includes steps S201 to S208.
  • the terminal may interact with the user by using a preset user operation interface, thereby acquiring training data, data processing module identification information, and algorithm module identification information input by the user.
  • FIG. 4 is a schematic structural diagram of a preset user operation interface according to an embodiment of the present application.
  • the preset user operation interface 10 is provided with a "data processing module identification information selection column", an "algorithm module identification information selection column", and the like. The user can select the data processing method and algorithm to be used in the "data processing module identification information selection column" and "algorithm module identification information selection column".
  • The user can click the “Browse” button to enter the storage path of the training data into the corresponding input box and then click the “confirm submit” button. Through the preset user operation interface 10, the terminal thereby obtains the data processing module identification information, the algorithm module identification information, and the storage path of the training data input by the user, and acquires the corresponding training data according to that storage path.
  • the training data may be historical registration data, transaction data, etc., and is not specifically limited herein.
  • the data processing module identification information may be a name corresponding to the data processing method.
  • the data processing module identification information may be an abnormal value detection method.
  • the algorithm module identification information may be a name corresponding to the algorithm used in the model training.
  • the algorithm module identification information may be a logistic regression algorithm.
  • the data processing module identification information and the algorithm module identification information may also adopt other identification information, which is not specifically limited herein.
  • The manner in which the terminal acquires the training data, the data processing module identification information, and the algorithm module identification information input by the user is not limited to the manner shown in FIG. 4, and may be other modes.
  • The manner shown in FIG. 4 is only one of many possible ways of obtaining this information, and no specific restrictions are imposed here.
  • the user can input the required data processing module identification information and algorithm module identification information in a manner similar to that shown in FIG. In this way, users can choose the data processing method and the model training algorithm according to their own experience, project requirements, and so on, which meets the needs of professional users when acquiring a machine learning model.
  • the user may also set algorithm parameters corresponding to the algorithm module identification information.
  • In this case, acquiring the training data, the data processing module identification information, and the algorithm module identification information further includes: acquiring the algorithm parameters corresponding to the algorithm module identification information.
  • the algorithm parameters may include the step size and the number of iterations.
  • the algorithm parameters may include tree depth, maximum split feature number, and non-existence.
  • The algorithm parameters may be set according to the specific algorithm in the corresponding pre-stored algorithm module, and no specific limitation is imposed herein. In this embodiment, after step S201 the terminal has acquired at least one item of data processing module identification information and at least one item of algorithm module identification information.
  • S202 Arrange and combine all the data processing module identification information and all the algorithm module identification information to form at least one group of machine learning groups.
  • Each machine learning group includes one item of data processing module identification information and one item of algorithm module identification information.
  • After the data processing module identification information and the algorithm module identification information in one machine learning group are read, steps S204 to S207 are performed. The identification information in the next machine learning group is then read and steps S204 to S207 are performed again, until the machine learning model and the corresponding verification indicator for every machine learning group are obtained.
  • the terminal matches the data processing module identification information input by the user with the pre-stored data processing module identification information in the terminal, and retrieves the pre-stored data processing module corresponding to the matched identification information.
  • the identifier information of the pre-stored data processing module may be a name corresponding to the data processing method.
  • the identifier information of the pre-stored data processing module may be an abnormal value detection method.
  • the terminal matches the algorithm module identification information input by the user with the pre-stored algorithm module identification information in the terminal, and retrieves the pre-stored algorithm module corresponding to the matched identification information.
  • the pre-stored algorithm module identification information may be a name corresponding to the algorithm.
  • the pre-stored algorithm module identification information may be a logistic regression algorithm.
  • For example, the terminal performs data preprocessing on the training data by using the outlier detection method module to obtain the processed training data.
  • the pre-stored algorithm module acquired in S205 is trained and verified by using the processed training data, thereby obtaining a machine learning model and corresponding verification indicators.
  • When the user has set algorithm parameters, step S207 specifically includes: training and verifying the pre-stored algorithm module, configured with the set algorithm parameters, according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model.
  • the algorithm parameters set by the user are brought into the corresponding positions of the pre-stored algorithm module to form an initial model, and then the initial model is trained with the processed training data to obtain a final machine learning model. That is to say, when the user sets the algorithm parameters, the algorithm parameters set by the user are brought into the pre-stored algorithm module to form an initial model.
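  • Forming the initial model from user-set parameters, with the module's default parameters used for anything the user leaves unset, might look like the sketch below. The parameter names, the default values, and their assignment to particular algorithms are all hypothetical.

```python
# Hypothetical defaults packaged with each pre-stored algorithm module.
DEFAULT_PARAMETERS = {
    "logistic regression algorithm": {"step_size": 0.1, "iterations": 100},
    "decision tree algorithm": {"tree_depth": 5, "max_split_features": 3},
}


def build_initial_model(algorithm_id, user_parameters=None):
    """Combine module defaults with user-set parameters; user values take precedence."""
    parameters = dict(DEFAULT_PARAMETERS[algorithm_id])  # copy so the defaults stay untouched
    parameters.update(user_parameters or {})
    return {"algorithm": algorithm_id, "parameters": parameters}


initial_model = build_initial_model("logistic regression algorithm", {"iterations": 500})
print(initial_model)
```

  • Copying the defaults before updating keeps one user's settings from leaking into later runs that rely on the defaults.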
  • After the machine learning model is obtained, it needs to be verified with the processed training data to obtain the verification indicator.
  • the verification indicators include recall rate, accuracy rate, etc., and no specific restrictions are imposed here.
  • the identification information of the machine learning model and the verification indicator corresponding to the machine learning model are sequentially displayed according to the size of the verification indicator.
  • the identification information of the machine learning model may be a number of each machine learning model, for example, the number “001”, the number “002”, and the like.
  • the identification information of the machine learning model may also be the name of the machine learning model, for example, "Logistic Regression Machine Learning Model 1", “Logistic Regression Machine Learning Model 2", etc., and no specific restrictions are imposed herein.
  • the preset display rules are not limited to the above-mentioned order according to the size of the verification index, and may also be other kinds of display rules.
  • the preset display rule may also be the identification information of the machine learning model that displays the best verification indicator and the corresponding verification indicator.
  • For example, when the verification indicator is the accuracy rate, the identification information of the machine learning model with the highest accuracy rate among the multiple machine learning models can be displayed together with that accuracy rate.
  • The terminal may further display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, so that the user can better understand how each model was generated, including the data processing method and the training algorithm used.
  • the terminal can also display the algorithm parameters corresponding to the pre-stored algorithm module in each machine learning model finally obtained, and the display content is not limited herein.
  • the method for generating the machine learning model does not need to acquire the machine learning model through programming every time, which can greatly reduce the workload of the engineer and improve the efficiency of acquiring the machine learning model.
  • In addition, the data processing method, the model training algorithm, the algorithm parameters, and other information can be set according to personal experience, project requirements, and so on, so that the terminal quickly acquires the corresponding machine learning model according to the user's settings. This makes acquiring a machine learning model simpler and more convenient.
  • the embodiment of the present application further provides a device for generating a machine learning model, where the device for generating a machine learning model is configured to execute a method for generating a machine learning model.
  • FIG. 5 is a schematic block diagram of a device for generating a machine learning model according to an embodiment of the present application.
  • the machine learning model generating device 300 can be installed in a desktop computer, a tablet computer, a laptop computer, or the like.
  • The machine learning model generating device 300 includes a data acquisition unit 301, a module retrieving unit 302, a data processing unit 303, a model generating unit 304, and a display unit 305.
  • The data acquisition unit 301 is used to acquire training data.
  • the module retrieving unit 302 is configured to retrieve a pre-stored data processing module and a pre-stored algorithm module.
  • the data acquisition unit 301 is further configured to acquire data processing module identification information and algorithm module identification information. Specifically, the data acquisition unit 301 acquires the pre-stored data processing module identification information in the terminal as the data processing module identification information, and acquires the pre-stored algorithm module identification information in the terminal as the algorithm module identification information.
  • the module retrieving unit 302 is configured to retrieve a corresponding pre-stored data processing module according to the data processing module identification information; and retrieve a corresponding pre-stored algorithm module according to the algorithm module identification information.
  • FIG. 6 is another schematic block diagram of a device for generating a machine learning model according to an embodiment of the present application.
  • the machine learning model generating device 300 further includes an arrangement combining unit 306 and a reading unit 307.
  • the arrangement combining unit 306 is configured to combine all the data processing module identification information with all the algorithm module identification information to form at least one machine learning group.
  • the reading unit 307 is configured to sequentially read the data processing module identification information and the algorithm module identification information in each machine learning group.
  • the module retrieving unit 302 retrieves the corresponding pre-stored data processing module according to the data processing module identification information read by the reading unit 307, and retrieves the corresponding pre-stored algorithm module according to the algorithm module identification information read by the reading unit 307.
  • the data processing unit 303 is configured to perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data.
  • the model generating unit 304 is configured to perform training and verification on the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model.
  • the model generating unit 304 includes a dividing subunit 3041, a training subunit 3042, and a verification subunit 3043.
  • the dividing subunit 3041 is configured to divide the processed training data into training model data and verification model data according to a preset ratio.
  • the training subunit 3042 is configured to train the pre-stored algorithm module according to the training model data to obtain a machine learning model.
  • the verification subunit 3043 is configured to verify the machine learning model according to the verification model data to obtain a verification indicator corresponding to the machine learning model.
  • the display unit 305 is configured to display the machine learning model and its corresponding verification indicator according to a preset display rule.
  • the display unit 305 sequentially displays the identification information of the machine learning model and the verification indicator corresponding to the machine learning model according to the size of the verification indicator.
  • the display unit 305 can also display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, so that the user can better understand the data processing methods used in generating the machine learning model and the algorithms used in model training.
  • the display unit 305 can also display the algorithm parameters corresponding to the pre-stored algorithm modules in each machine learning model finally obtained, and the display content is not limited herein.
  • for the specific working processes of the above-mentioned machine learning model generating apparatus 300 and each of its units, refer to the corresponding processes in the foregoing embodiments of the machine learning model generating method; they are not described herein again.
  • the machine learning model generating apparatus 300 can improve the efficiency of acquiring the machine learning model, and at the same time, make the machine learning model acquisition mode simpler and applicable to a wider population.
  • FIG. 7 is a schematic block diagram of a device for generating a machine learning model according to an embodiment of the present application.
  • the machine learning model generating device 400 can be installed in a desktop computer, a tablet computer, a laptop computer, or the like.
  • the machine learning model generating apparatus 400 includes a data acquisition unit 401, an arrangement combining unit 402, a reading unit 403, a module retrieving unit 404, a data processing unit 405, a model generating unit 406, and a display unit 407.
  • the data acquisition unit 401 is configured to acquire training data, data processing module identification information, and algorithm module identification information.
  • the data acquisition unit 401 is further configured to acquire the algorithm parameters corresponding to the algorithm module identification information.
  • the user can set the algorithm parameters in the algorithm module corresponding to the algorithm module identification information according to personal experience, so that the machine learning model can be obtained faster and better.
  • the data acquisition unit 401 acquires at least one piece of data processing module identification information and at least one piece of algorithm module identification information.
  • the arrangement combining unit 402 is configured to combine all the data processing module identification information with all the algorithm module identification information to form at least one machine learning group.
  • the reading unit 403 is configured to sequentially read the data processing module identification information and the algorithm module identification information in each machine learning group.
  • the module retrieving unit 404 is configured to retrieve a corresponding pre-stored data processing module according to the data processing module identification information, and retrieve a corresponding pre-stored algorithm module according to the algorithm module identification information.
  • the data processing unit 405 is configured to perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data.
  • the model generating unit 406 is configured to perform training and verification on the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model.
  • the model generating unit 406 is specifically configured to set the algorithm parameters in the pre-stored algorithm module, and to train and verify the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model.
  • the display unit 407 is configured to display the machine learning model and its corresponding verification indicator according to a preset display rule.
  • the display unit 407 is configured to display the identification information of the machine learning model and the verification indicator corresponding to the machine learning model according to the size of the verification indicator.
  • the display unit 407 can also display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, so that the user can better understand the data processing methods used in generating the machine learning model and the algorithms used in model training.
  • the display unit 407 can also display the algorithm parameters corresponding to the pre-stored algorithm modules in each machine learning model finally obtained, and the display content is not limited herein.
  • the machine learning model generating apparatus 400 can improve the efficiency of acquiring the machine learning model and, at the same time, make acquiring the machine learning model simpler and more convenient.
  • FIG. 8 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 can be a terminal.
  • the terminal can be an electronic device such as a tablet computer, a notebook computer, a desktop computer, or a personal digital assistant.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected by a system bus 501, wherein the memory can include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • the computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a method of generating a machine learning model.
  • the processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when executed by the processor 502, the computer program 5032 may cause the processor 502 to perform a method of generating a machine learning model.
  • the network interface 505 is used for network communication, such as sending assigned tasks and the like.
  • it will be understood by those skilled in the art that the structure shown in FIG. 8 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine some components, or have a different component arrangement.
  • the processor 502 is configured to execute a computer program 5032 stored in a memory to implement a method for generating a machine learning model in various embodiments of the present application.
  • the processor 502 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • in another embodiment, a storage medium is provided, which can be a computer-readable storage medium.
  • the storage medium stores a computer program, wherein the computer program includes program instructions.
  • the program instructions, when executed by a processor, cause the processor to perform the method of generating the machine learning model of the present application.
  • the storage medium may be a medium that can store program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a flash memory, a magnetic disk, or an optical disk.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit can be stored in a storage medium if it is implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes a number of instructions for causing a computer device (which may be a personal computer, terminal, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present application.
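As an illustration of the display rule described above for display units 305 and 407 (showing each model's identification information and verification indicator in order of the size of the indicator), the following is a minimal Python sketch. The `ModelResult` record, its field names, and the choice of descending order are assumptions made for this example; the application does not specify them.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """One generated model plus the information a display unit may show."""
    model_id: str             # identification information of the model (hypothetical)
    data_module_id: str       # data processing module identification information
    algorithm_module_id: str  # algorithm module identification information
    indicator: float          # verification indicator (assumed: larger is better)


def order_for_display(results, descending=True):
    """Return the results ordered by the size of the verification indicator."""
    return sorted(results, key=lambda r: r.indicator, reverse=descending)


def format_rows(results):
    """Render one text row per model, in display order."""
    return [
        f"{r.model_id}: indicator={r.indicator:.3f} "
        f"({r.data_module_id} + {r.algorithm_module_id})"
        for r in order_for_display(results)
    ]
```

Passing `descending=False` gives the opposite ordering, which may suit indicators where smaller values are better.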

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present application disclose a method and apparatus for generating a machine learning model, a computer device, and a storage medium. The method comprises: obtaining training data; invoking a pre-stored data processing module and a pre-stored algorithm module; performing data preprocessing on the training data according to the pre-stored data processing module; training and validating the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a corresponding validation index; and displaying the machine learning model and the corresponding validation index.

Description

Method, apparatus, computer device, and storage medium for generating a machine learning model
This application claims priority to Chinese Patent Application No. 201810089701.5, filed with the Chinese Patent Office on January 30, 2018 and entitled "Method, apparatus, computer device, and storage medium for generating a machine learning model", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of machine learning technologies, and in particular to a method, an apparatus, a computer device, and a storage medium for generating a machine learning model.
Background
A machine learning process generally includes a data preprocessing process and a model training process, both of which engineers must implement in code. In many cases, however, the data preprocessing methods and model training algorithms used across machine learning processes are highly similar. Requiring engineers to write code every time inevitably creates a heavy workload, leading to repetitive work and wasted time.
Summary
The embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for generating a machine learning model, with which a machine learning model can be obtained quickly.
In a first aspect, an embodiment of the present application provides a method for generating a machine learning model, including: acquiring training data; retrieving a pre-stored data processing module and a pre-stored algorithm module; performing data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data; training and verifying the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model; and displaying the machine learning model and its corresponding verification indicator according to a preset display rule.
In a second aspect, an embodiment of the present application provides an apparatus for generating a machine learning model, including:
a data acquisition unit, configured to acquire training data;
a module retrieving unit, configured to retrieve a pre-stored data processing module and a pre-stored algorithm module;
a data processing unit, configured to perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data;
a model generating unit, configured to train and verify the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model; and
a display unit, configured to display the machine learning model and its corresponding verification indicator according to a preset display rule.
In a third aspect, an embodiment of the present application further provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for generating a machine learning model according to any one of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a storage medium, where the storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method for generating a machine learning model according to any one of the embodiments of the present application.
With the method for generating a machine learning model provided by the embodiments of the present application, a machine learning model no longer has to be obtained through programming every time, which greatly reduces engineers' workload, saves their time, and improves the efficiency of acquiring machine learning models.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for generating a machine learning model according to an embodiment of the present application;
FIG. 2 is a detailed schematic flowchart of the method for generating a machine learning model shown in FIG. 1;
FIG. 3 is another schematic flowchart of a method for generating a machine learning model according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a preset user operation interface according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of an apparatus for generating a machine learning model according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of an apparatus for generating a machine learning model according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of an apparatus for generating a machine learning model according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for generating a machine learning model according to an embodiment of the present application. The method is applied to terminals such as desktop computers, laptop computers, and tablet computers. As shown in FIG. 1, the method includes steps S101 to S105.
S101. Acquire training data.
In this embodiment, the terminal can interact with the user through a preset user operation interface to acquire the training data. For example, the user can use a button such as "Browse" in the preset user operation interface to enter the storage path of the training data into the corresponding input box, so that after the user clicks the "Confirm submission" button, the terminal acquires the corresponding training data according to that storage path.
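Acquiring training data from a user-supplied storage path could look like the following minimal Python sketch. The CSV file format, the UTF-8 encoding, and the function name `load_training_data` are assumptions made for illustration; the application does not specify how the data is stored.

```python
import csv
from pathlib import Path


def load_training_data(storage_path):
    """Read training records from the file at storage_path (CSV assumed)."""
    path = Path(storage_path)
    if not path.is_file():
        raise FileNotFoundError(f"no training data at {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        # Each row becomes a mapping from column name to string value.
        return list(csv.DictReader(handle))
```

In an interactive interface, this function would be called with the path from the input box once the user confirms the submission.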
In an embodiment, the training data can be historical registration data. For example, the training data can be registration data, such as user names and mobile phone numbers, that users provided when registering on the "Lufax" (陆金所) wealth-management platform. The training data can also be transaction data, for example data on a user's purchases of wealth-management products. The training data can of course be other data, which is not specifically limited here.
S102. Retrieve a pre-stored data processing module and a pre-stored algorithm module.
In this embodiment, pre-stored data processing modules and pre-stored algorithm modules are stored in the terminal in advance. A pre-stored data processing module can be a module that packages a data processing method commonly used in machine learning, such as an outlier detection method or a continuous value discretization method. Similarly, a pre-stored algorithm module can be a module that packages an algorithm commonly used in model training, such as a classification algorithm or a regression algorithm. Specifically, the pre-stored algorithm module can package a logistic regression algorithm, an SVM algorithm, a decision tree algorithm, or the like.
In an embodiment, the pre-stored data processing module and the pre-stored algorithm module can be Shell scripts. That is, commonly used data processing methods and algorithms are packaged as Shell scripts in advance and stored in the terminal for subsequent calls.
In an embodiment, before retrieving the pre-stored data processing module and the pre-stored algorithm module, the terminal also needs to acquire data processing module identification information and algorithm module identification information. Specifically, the terminal acquires the pre-stored data processing module identification information in the terminal as the data processing module identification information, and acquires the pre-stored algorithm module identification information in the terminal as the algorithm module identification information. In this case, the user does not need to enter the data processing module identification information or the algorithm module identification information.
Correspondingly, retrieving the pre-stored data processing module and the pre-stored algorithm module specifically includes: retrieving the corresponding pre-stored data processing module according to the data processing module identification information; and retrieving the corresponding pre-stored algorithm module according to the algorithm module identification information.
In an embodiment, the terminal stores in advance the identification information corresponding to the pre-stored data processing modules and the pre-stored algorithm modules, referred to respectively as pre-stored data processing module identification information and pre-stored algorithm module identification information. Both can be the name of the method or algorithm. In other embodiments, they can be any other information that serves to identify the module, which is not specifically limited here.
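Retrieving a module by its identification information can be sketched as a simple lookup. The text describes the modules as possibly being Shell scripts; the Python sketch below swaps in plain callables as stand-ins, and the registry names, the `register` decorator, and the placeholder module bodies are all hypothetical.

```python
# Hypothetical registries: identification information (here, a name) -> module.
DATA_PROCESSING_MODULES = {}
ALGORITHM_MODULES = {}


def register(registry, identification):
    """Decorator that pre-stores a callable under its identification information."""
    def decorator(func):
        registry[identification] = func
        return func
    return decorator


@register(DATA_PROCESSING_MODULES, "outlier_detection")
def outlier_detection_stub(rows):
    # Pass-through placeholder; a real module would wrap an actual method.
    return list(rows)


@register(ALGORITHM_MODULES, "logistic_regression")
def logistic_regression_stub(rows):
    # Placeholder standing in for a packaged training algorithm.
    return {"algorithm": "logistic_regression", "n_samples": len(rows)}


def retrieve_modules(data_id, algorithm_id):
    """Retrieve the pre-stored modules matching the two identifiers."""
    try:
        return DATA_PROCESSING_MODULES[data_id], ALGORITHM_MODULES[algorithm_id]
    except KeyError as missing:
        raise LookupError(f"no pre-stored module for {missing}") from None
```

Keying the registry by name mirrors the statement that the identification information can be the name of the method or algorithm.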
In general, the terminal stores at least one pre-stored data processing module and at least one pre-stored algorithm module, so at least one piece of data processing module identification information and at least one piece of algorithm module identification information are acquired. Therefore, before retrieving the pre-stored data processing module and the pre-stored algorithm module, the terminal also needs to combine all the data processing module identification information with all the algorithm module identification information to form at least one machine learning group, and then read the data processing module identification information and the algorithm module identification information in each machine learning group in turn.
For example, the pre-stored data processing modules stored in the terminal include an outlier detection method module and a continuous value discretization method module, and the pre-stored algorithm modules include a logistic regression algorithm module and an SVM algorithm module; that is, there are two pre-stored data processing modules and two pre-stored algorithm modules. After the training data is acquired, the terminal reads all the pre-stored data processing module identification information as the data processing module identification information and all the pre-stored algorithm module identification information as the algorithm module identification information, and combines them to form four machine learning groups, as shown in Table 1.
Table 1
Machine learning group | Data processing module identification information | Algorithm module identification information
First machine learning group | Outlier detection method | Logistic regression algorithm
Second machine learning group | Outlier detection method | SVM algorithm
Third machine learning group | Continuous value discretization method | Logistic regression algorithm
Fourth machine learning group | Continuous value discretization method | SVM algorithm
After the four machine learning groups shown in Table 1 are formed, the data processing module identification information and the algorithm module identification information in each machine learning group are read in turn, the corresponding modules are retrieved, and steps S103 and S104 are performed. For example, the identification information in the first machine learning group is read first, namely the outlier detection method and the logistic regression algorithm; the outlier detection method module and the logistic regression algorithm module are then retrieved according to that identification information; and steps S103 to S104 are performed with these two modules to obtain the machine learning model and the verification indicator corresponding to the first machine learning group. The process then returns to read the identification information in the second machine learning group, namely the outlier detection method and the SVM algorithm, and so on, until four machine learning models and their corresponding verification indicators are obtained.
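The grouping in Table 1 is the Cartesian product of the two sets of identification information, which can be sketched in Python as follows. The function names and the `run_group` callback, which stands in for steps S103 to S104, are assumptions made for this example.

```python
from itertools import product


def form_machine_learning_groups(data_ids, algorithm_ids):
    """Pair every data processing identifier with every algorithm identifier."""
    return list(product(data_ids, algorithm_ids))


def run_all_groups(groups, run_group):
    """Read each group in turn and collect one result per group.

    run_group is a caller-supplied function standing in for steps S103-S104.
    """
    return [run_group(data_id, algorithm_id) for data_id, algorithm_id in groups]


# Two data processing methods and two algorithms give the four groups of Table 1.
groups = form_machine_learning_groups(
    ["outlier detection", "continuous value discretization"],
    ["logistic regression", "SVM"],
)
```

Because `product` varies the last argument fastest, the groups come out in the same order as the rows of Table 1.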
In other embodiments, when the terminal stores at least two pre-stored data processing modules and/or at least two pre-stored algorithm modules, the terminal can instead randomly retrieve one pre-stored data processing module and one pre-stored algorithm module and then perform steps S103 to S104. In that case, there is no need to combine all the pre-stored data processing modules with all the pre-stored algorithm modules stored in the terminal, and only one machine learning model and its corresponding verification indicator are obtained. This is not specifically limited here.
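The random alternative can be sketched in a few lines of Python. The function name and the optional `rng` argument (useful for reproducible runs) are assumptions made for illustration.

```python
import random


def pick_random_group(data_ids, algorithm_ids, rng=None):
    """Randomly choose one data processing identifier and one algorithm identifier."""
    if rng is None:
        rng = random.Random()
    return rng.choice(list(data_ids)), rng.choice(list(algorithm_ids))
```

Only the single chosen pair then goes through steps S103 to S104, so exactly one model and one verification indicator result.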
S103. Perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data.
For example, if the pre-stored data processing module retrieved in step S102 is the outlier detection method module, the terminal uses that module to perform outlier-detection preprocessing on the training data to obtain the processed training data.
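The application does not say which outlier detection method the module packages. Purely as an illustration, the Python sketch below uses the interquartile-range rule (Tukey's fences) on a single numeric column; the rule, the multiplier `k=1.5`, and the function name are assumptions, not the method described in the application.

```python
from statistics import quantiles


def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles sensibly
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]
```

The filtered list plays the role of the processed training data passed on to step S104.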
S104、根据所述处理后的训练数据对所述预存算法模块进行训练及验证以得到机器学习模型和所述机器学习模型对应的验证指标。S104. Train and verify the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a verification indicator corresponding to the machine learning model.
在对训练数据进行数据预处理后,将利用处理后的训练数据对步骤S102中获取到的预存算法模块进行训练及验证,进而得到机器学习模型及对应的验证指标。After performing data pre-processing on the training data, the pre-stored algorithm module obtained in step S102 is trained and verified by using the processed training data, thereby obtaining a machine learning model and corresponding verification indicators.
具体地,在一实施例中,如图2所示,图2为图1所示机器学习模型的生 成方法的具体示意流程图。该步骤S104包括步骤S1041~S1043。Specifically, in an embodiment, as shown in FIG. 2, FIG. 2 is a specific schematic flowchart of a method for generating a machine learning model shown in FIG. 1. This step S104 includes steps S1041 to S1043.
S1041、将所述处理后的训练数据按照预设比例划分成训练模型数据和验证模型数据。S1041: The processed training data is divided into training model data and verification model data according to a preset ratio.
譬如,将处理后的训练数据按照9比1的预设比例划分成训练模型数据和验证模型数据。也就是说,90%的处理后的训练数据作为训练模型数据,10%的处理后的训练数据作为验证模型数据。For example, the processed training data is divided into training model data and verification model data according to a preset ratio of 9 to 1. That is to say, 90% of the processed training data is used as the training model data, and 10% of the processed training data is used as the verification model data.
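The 9-to-1 division of step S1041 might be sketched as below. The function name, the shuffling before the cut, and the fixed seed are assumptions of this sketch; the application only specifies that the data is divided at a preset ratio.

```python
import random


def split_by_ratio(samples, train_parts=9, validation_parts=1, seed=0):
    """Divide samples into training and verification data at a preset ratio."""
    shuffled = list(samples)
    # Shuffling is an assumption; it avoids an ordered input biasing either part.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + validation_parts)
    return shuffled[:cut], shuffled[cut:]
```

With 100 samples and the default ratio, the first returned list holds 90 samples and the second holds the remaining 10.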
S1042. Train the pre-stored algorithm module according to the training model data to obtain a machine learning model.
In an embodiment, when a pre-stored algorithm is encapsulated into a module such as a shell script, the algorithm parameters of that algorithm are encapsulated in the pre-stored algorithm module as well. Once the training model data is obtained, the algorithm parameters in the pre-stored algorithm module are substituted into the corresponding positions of the pre-stored algorithm to form an initial model, and the initial model is then trained with the training model data to obtain the final machine learning model. In other words, the pre-stored algorithm module includes default algorithm parameters, and when the user has not set algorithm parameters, the terminal forms the initial model from these defaults. For the logistic regression algorithm, the algorithm parameters may include the step size and the number of iterations. For the decision tree algorithm, the algorithm parameters may include the tree depth, the maximum number of split features, and the impurity. The algorithm parameters may be set according to the specific algorithm in the corresponding pre-stored algorithm module, and no specific limitation is imposed here.
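As a minimal sketch of how default algorithm parameters could be combined with optional user settings when the initial model is formed: the parameter names follow the examples in the text, while the dictionary layout, the function name, and every numeric value are hypothetical.

```python
# Hypothetical defaults bundled with each pre-stored algorithm module.
# All values are placeholders chosen for illustration only.
DEFAULT_PARAMS = {
    "logistic_regression": {"step_size": 0.1, "iterations": 100},
    "decision_tree": {"max_depth": 5, "max_split_features": 3, "impurity": "gini"},
}


def build_initial_params(algo_id, user_params=None):
    """Return the parameters of the initial model.

    Defaults are used when the user sets nothing; any user-set value
    takes precedence over the corresponding default.
    """
    params = dict(DEFAULT_PARAMS[algo_id])  # copy so the defaults stay intact
    params.update(user_params or {})
    return params
```

Calling `build_initial_params("logistic_regression")` returns the defaults unchanged, whereas passing `{"iterations": 500}` replaces only the number of iterations.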
S1043. Verify the machine learning model according to the verification model data to obtain the verification indicator corresponding to the machine learning model.
After the machine learning model is obtained, it is verified with the verification model data to obtain the corresponding verification indicator. The verification indicator may be recall, precision, or the like, and no specific limitation is imposed here.
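For reference, precision and recall for a binary classifier can be computed from the labels of the verification model data and the model's predictions as sketched below. The function name and the convention of returning 0.0 for an empty denominator are assumptions of this sketch.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```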
S105. Display the machine learning model and its corresponding verification indicator according to a preset display rule.
Specifically, in an embodiment, the identification information of each machine learning model and its verification indicator are displayed in order of the magnitude of the verification indicator. For example, when two pre-stored data processing modules and two pre-stored algorithm modules are retrieved in step S102, steps S103 to S104 yield four machine learning models and their verification indicators. Assuming the verification indicator is precision, the terminal can order the four machine learning models from highest to lowest precision, as shown in Table 2.
Table 2
[Table 2 appears as image PCTCN2018084039-appb-000001 in the original publication.]
In Table 2, the identification information of the machine learning models may be the number of each model, namely "001", "002", "003", and "004". Of course, in other embodiments, the identification information may be any other identifier that distinguishes the models from one another, and no specific limitation is imposed here.
The terminal may display the information on the four machine learning models in the form of Table 2, so that the user can see the precision of each model and select a suitable machine learning model for subsequent prediction.
In an embodiment, the terminal may also display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, as shown in Table 2, so that the user can better understand which data processing method and which training algorithm were used to generate each model. Of course, the terminal may also display the algorithm parameters of the pre-stored algorithm module in each machine learning model finally obtained; the displayed content is not limited here.
It should be noted that, in other embodiments, the preset display rule is not limited to ordering by the magnitude of the verification indicator and may be another display rule. For example, the preset display rule may be to display the identification information and the verification indicator of the machine learning model with the best verification indicator. As shown in Table 2, the highest precision among the four machine learning models is 95%, so the terminal may display the machine learning model whose identification information is 001 together with its precision.
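The two display rules discussed above, ordering by the verification indicator and showing only the best model, might be sketched as follows. The record layout and function names are assumptions; in the sample data only the 95% precision of model 001 comes from the text, and the remaining values are hypothetical.

```python
def rank_models(results, metric="precision"):
    """Display rule 1: order models from highest to lowest indicator."""
    return sorted(results, key=lambda r: r[metric], reverse=True)


def best_model(results, metric="precision"):
    """Display rule 2: keep only the model with the best indicator."""
    return max(results, key=lambda r: r[metric])


# Hypothetical results; only model 001 at 0.95 is stated in the text.
results = [
    {"model_id": "002", "precision": 0.91},
    {"model_id": "001", "precision": 0.95},
    {"model_id": "003", "precision": 0.88},
]
ranked_ids = [r["model_id"] for r in rank_models(results)]
top_id = best_model(results)["model_id"]
```

Here `ranked_ids` lists the models in descending order of precision and `top_id` is the identifier of the model that the second rule would display.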
In this embodiment, the method does not require programming each time a machine learning model is needed, which can greatly reduce engineers' workload and improve the efficiency of obtaining machine learning models. Moreover, users outside the computer field, such as salespersons, only need to submit training data to the terminal to quickly obtain at least one machine learning model, which makes obtaining a model simpler and opens the method to a wider range of users.
Please refer to FIG. 3, which is a schematic flowchart of a method for generating a machine learning model according to an embodiment of the present application. The method is applied to terminals such as desktop computers, laptop computers, and tablet computers. As shown in FIG. 3, the method includes steps S201 to S208.
S201. Acquire training data, data processing module identification information, and algorithm module identification information.
In this embodiment, the terminal may interact with the user through a preset user operation interface and thereby acquire the training data, the data processing module identification information, and the algorithm module identification information input by the user. For example, FIG. 4 is a schematic structural diagram of the preset user operation interface in an embodiment of the present application. The preset user operation interface 10 provides a "data processing module identification information selection column", an "algorithm module identification information selection column", and so on. The user can tick the data processing methods and algorithms to be used in these two columns. The user can also click the "Browse" button to enter the storage path of the training data into the corresponding input box, and then click the "Confirm submission" button. The terminal thus obtains, through the preset user operation interface 10, the data processing module identification information, the algorithm module identification information, and the storage path of the training data input by the user, and acquires the training data from that storage path.
In an embodiment, the training data may be historical registration data, transaction data, or the like, and no specific limitation is imposed here. The data processing module identification information may be the name of a data processing method; for example, it may be "outlier detection method". Similarly, the algorithm module identification information may be the name of the algorithm used for model training; for example, it may be "logistic regression algorithm". Of course, in other embodiments, the data processing module identification information and the algorithm module identification information may take other forms, and no specific limitation is imposed here.
It should be noted that the way in which the terminal acquires the training data, the data processing module identification information, and the algorithm module identification information input by the user is not limited to that shown in FIG. 4, which is only one of many possible ways; no specific limitation is imposed here. In this embodiment, professional users in fields such as computing can input the desired data processing module identification information and algorithm module identification information in a manner similar to that shown in FIG. 4. Such users can thus choose the data processing method and the training algorithm according to their own experience and project requirements, which meets the needs of professional users for obtaining machine learning models.
In an embodiment, the user may also set the algorithm parameters corresponding to the algorithm module identification information. Specifically, acquiring the training data, the data processing module identification information, and the algorithm module identification information further includes acquiring the algorithm parameters corresponding to the algorithm module identification information. The user can thus set, according to personal experience, the algorithm parameters of the algorithm module corresponding to the algorithm module identification information, so that a machine learning model can be obtained faster and with better results. For the logistic regression algorithm, the algorithm parameters may include the step size and the number of iterations. For the decision tree algorithm, the algorithm parameters may include the tree depth, the maximum number of split features, and the impurity. The algorithm parameters may be set according to the specific algorithm in the corresponding pre-stored algorithm module, and no specific limitation is imposed here. In this embodiment, after step S201 the terminal has acquired at least one item of data processing module identification information and at least one item of algorithm module identification information.
S202. Combine all of the data processing module identification information with all of the algorithm module identification information to form at least one machine learning group.
For example, if there are two items of data processing module identification information and two items of algorithm module identification information, four machine learning groups can be formed. Each machine learning group includes one item of data processing module identification information and one item of algorithm module identification information.
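The combination in step S202 amounts to pairing every item of data processing module identification information with every item of algorithm module identification information. A minimal sketch using the standard-library `itertools.product` is shown below; the function name and the identifier strings in the usage note are assumptions.

```python
from itertools import product


def form_groups(data_ids, algo_ids):
    """Pair each data processing identifier with each algorithm identifier."""
    return list(product(data_ids, algo_ids))
```

With two identifiers of each kind, for instance `form_groups(["d1", "d2"], ["a1", "a2"])`, four groups result, and the first two groups share the first data processing identifier, in line with the reading order described for the first and second groups.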
S203. Read the data processing module identification information and the algorithm module identification information of each machine learning group in turn.
In this embodiment, each time the data processing module identification information and the algorithm module identification information of one machine learning group are read, steps S204 to S207 are performed. The identification information of the next machine learning group is then read and steps S204 to S207 are performed again, until the machine learning model and the verification indicator corresponding to every machine learning group have been obtained.
S204. Retrieve the corresponding pre-stored data processing module according to the data processing module identification information.
Specifically, the terminal matches the data processing module identification information input by the user against the pre-stored data processing module identification information in the terminal, and retrieves the pre-stored data processing module corresponding to the matching identification information. The pre-stored data processing module identification information may be the name of a data processing method; for example, it may be "outlier detection method".
S205. Retrieve the corresponding pre-stored algorithm module according to the algorithm module identification information.
Specifically, the terminal matches the algorithm module identification information input by the user against the pre-stored algorithm module identification information in the terminal, and retrieves the pre-stored algorithm module corresponding to the matching identification information. The pre-stored algorithm module identification information may be the name of an algorithm; for example, it may be "logistic regression algorithm".
S206. Perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data.
For example, if the pre-stored data processing module retrieved in step S204 is the outlier detection method module, the terminal uses that module to preprocess the training data and obtain the processed training data.
S207. Train and verify the pre-stored algorithm module according to the processed training data to obtain a machine learning model and the verification indicator corresponding to the machine learning model.
After the training data is preprocessed, the processed training data is used to train and verify the pre-stored algorithm module retrieved in step S205, yielding a machine learning model and its verification indicator.
In an embodiment, when the algorithm parameters corresponding to the algorithm module identification information were acquired in step S201, step S207 specifically includes: training and verifying the pre-stored algorithm module according to the processed training data, with the algorithm parameters set as acquired, to obtain a machine learning model and the verification indicator corresponding to the machine learning model.
Specifically, the algorithm parameters set by the user are substituted into the corresponding positions of the pre-stored algorithm module to form an initial model, and the initial model is then trained with the processed training data to obtain the final machine learning model. In other words, when the user has set algorithm parameters, those parameters are used in the pre-stored algorithm module to form the initial model.
After the machine learning model is obtained, it is verified with the processed training data to obtain the verification indicator. The verification indicator may be recall, precision, or the like, and no specific limitation is imposed here.
S208. Display the machine learning model and its corresponding verification indicator according to a preset display rule.
Specifically, in an embodiment, the identification information of each machine learning model and its verification indicator are displayed in order of the magnitude of the verification indicator.
The identification information of a machine learning model may be its number, for example "001" or "002". It may also be the name of the machine learning model, for example "logistic regression machine learning model 1" or "logistic regression machine learning model 2"; no specific limitation is imposed here.
It should be noted that the preset display rule is not limited to ordering by the magnitude of the verification indicator and may be another display rule. For example, the preset display rule may be to display the identification information and the verification indicator of the machine learning model with the best verification indicator. When the verification indicator is precision, for instance, the terminal may display the identification information and the precision of the machine learning model with the highest precision among the multiple machine learning models.
In an embodiment, the terminal may also display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, so that the user can better understand which data processing method and which training algorithm were used to generate each model. Of course, the terminal may also display the algorithm parameters of the pre-stored algorithm module in each machine learning model finally obtained; the displayed content is not limited here.
In this embodiment, the method does not require programming each time a machine learning model is needed, which can greatly reduce engineers' workload and improve the efficiency of obtaining machine learning models. Moreover, professional users in the computer field can set the data processing method, the training algorithm, the algorithm parameters, and other information according to personal experience and project requirements, so that the terminal quickly obtains the corresponding machine learning model from the user's settings, which makes obtaining a machine learning model simpler and more convenient.
An embodiment of the present application further provides an apparatus for generating a machine learning model, configured to perform any of the foregoing methods for generating a machine learning model. Specifically, please refer to FIG. 5, which is a schematic block diagram of an apparatus for generating a machine learning model according to an embodiment of the present application. The apparatus 300 may be installed in a terminal such as a desktop computer, a tablet computer, or a laptop computer.
As shown in FIG. 5, the apparatus 300 for generating a machine learning model includes a data acquisition unit 301, a module retrieval unit 302, a data processing unit 303, a model generation unit 304, and a display unit 305.
The data acquisition unit 301 is configured to acquire training data.
The module retrieval unit 302 is configured to retrieve a pre-stored data processing module and a pre-stored algorithm module.
In an embodiment, the data acquisition unit 301 is further configured to acquire data processing module identification information and algorithm module identification information. Specifically, the data acquisition unit 301 acquires the pre-stored data processing module identification information in the terminal as the data processing module identification information, and acquires the pre-stored algorithm module identification information in the terminal as the algorithm module identification information. Correspondingly, the module retrieval unit 302 is specifically configured to retrieve the corresponding pre-stored data processing module according to the data processing module identification information, and to retrieve the corresponding pre-stored algorithm module according to the algorithm module identification information.
In general, the terminal stores at least one pre-stored data processing module and at least one pre-stored algorithm module. FIG. 6 is another schematic block diagram of the apparatus for generating a machine learning model according to an embodiment of the present application. As shown in FIG. 6, the apparatus 300 further includes a combination unit 306 and a reading unit 307. The combination unit 306 is configured to combine all of the data processing module identification information with all of the algorithm module identification information to form at least one machine learning group. The reading unit 307 is configured to read the data processing module identification information and the algorithm module identification information of each machine learning group in turn. The module retrieval unit 302 retrieves the corresponding pre-stored data processing module according to the data processing module identification information read by the reading unit 307, and retrieves the corresponding pre-stored algorithm module according to the algorithm module identification information read by the reading unit 307.
The data processing unit 303 is configured to perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data.
The model generation unit 304 is configured to train and verify the pre-stored algorithm module according to the processed training data to obtain a machine learning model and the verification indicator corresponding to the machine learning model.
Specifically, in an embodiment, as shown in FIG. 6, the model generation unit 304 includes a division subunit 3041, a training subunit 3042, and a verification subunit 3043.
The division subunit 3041 is configured to divide the processed training data into training model data and verification model data according to a preset ratio.
The training subunit 3042 is configured to train the pre-stored algorithm module according to the training model data to obtain a machine learning model.
The verification subunit 3043 is configured to verify the machine learning model according to the verification model data to obtain the verification indicator corresponding to the machine learning model.
The display unit 305 is configured to display the machine learning model and its corresponding verification indicator according to a preset display rule.
Specifically, in an embodiment, the display unit 305 displays the identification information of each machine learning model and its verification indicator in order of the magnitude of the verification indicator.
In an embodiment, the display unit 305 may also display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, so that the user can better understand which data processing method and which training algorithm were used to generate each model. Of course, the display unit 305 may also display the algorithm parameters of the pre-stored algorithm module in each machine learning model finally obtained; the displayed content is not limited here.
It should be noted that, as those skilled in the art will clearly understand, for convenience and brevity of description, the specific working processes of the apparatus 300 and its units may be found in the corresponding processes of the foregoing method embodiments and are not repeated here. In this embodiment, the apparatus 300 can improve the efficiency of obtaining machine learning models, make obtaining a model simpler, and serve a wider range of users.
请参阅图7,图7是本申请实施例提供的一种机器学习模型的生成装置的示 意性框图。该机器学习模型的生成装置400可以安装于台式电脑、平板电脑、手提电脑、等终端中。Referring to FIG. 7, FIG. 7 is a schematic block diagram of a device for generating a machine learning model according to an embodiment of the present application. The machine learning model generating device 400 can be installed in a desktop computer, a tablet computer, a laptop computer, or the like.
如图7所示,机器学习模型的生成装置400包括数据获取单元401、排列组合单元402、读取单元403、模块调取单元404、数据处理单元405、模型生成单元406和显示单元407。As shown in FIG. 7, the machine learning model generating apparatus 400 includes a data acquiring unit 401, an array combining unit 402, a reading unit 403, a module calling unit 404, a data processing unit 405, a model generating unit 406, and a display unit 407.
数据获取单元401,用于获取训练数据、数据处理模块标识信息和算法模块标识信息。The data obtaining unit 401 is configured to acquire training data, data processing module identification information, and algorithm module identification information.
在一实施例中,具体地,数据获取单元401还用于:获取所述算法模块标识信息对应的算法参数。这样,用户可以根据个人的经验来设置算法模块标识信息对应的算法模块中的算法参数,可以更快、更好地得到机器学习模型。In an embodiment, the data acquiring unit 401 is further configured to: acquire algorithm parameters corresponding to the algorithm module identification information. In this way, the user can set the algorithm parameters in the algorithm module corresponding to the algorithm module identification information according to personal experience, and the machine learning model can be obtained faster and better.
可以理解的是,在本实施例中,数据获取单元401获取的数据处理模块标识信息和算法模块标识信息的个数均为至少一个。It can be understood that, in this embodiment, the number of data processing module identification information and algorithm module identification information acquired by the data acquiring unit 401 is at least one.
The permutation and combination unit 402 is configured to permute and combine all of the data processing module identification information with all of the algorithm module identification information to form at least one machine learning group.
The reading unit 403 is configured to read, group by group, the data processing module identification information and the algorithm module identification information in each machine learning group.
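The grouping and sequential reading performed by units 402 and 403 can be sketched as follows. This is a minimal illustration only: the text does not specify how the identifiers are permuted and combined, so the sketch assumes each group pairs one data processing module identifier with one algorithm module identifier (a Cartesian product). The function name `build_learning_groups` and the identifier strings are hypothetical.

```python
from itertools import product


def build_learning_groups(data_module_ids, algorithm_module_ids):
    """Pair every data processing module ID with every algorithm module ID.

    Assumption: one "machine learning group" is one (data ID, algorithm ID)
    pair. The source only says the identifiers are permuted and combined.
    """
    if not data_module_ids or not algorithm_module_ids:
        raise ValueError("at least one identifier of each kind is required")
    return list(product(data_module_ids, algorithm_module_ids))


# Hypothetical identifiers, used purely for illustration.
groups = build_learning_groups(["min_max", "identity"], ["knn", "tree"])

# Read the groups one after another, as the reading unit would.
for data_id, algorithm_id in groups:
    print(data_id, algorithm_id)
```

With two identifiers of each kind, this yields four groups, each of which would later produce its own candidate model.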
The module retrieval unit 404 is configured to retrieve the corresponding pre-stored data processing module according to the data processing module identification information, and to retrieve the corresponding pre-stored algorithm module according to the algorithm module identification information.
The data processing unit 405 is configured to perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data.
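One way to picture retrieval by identification information followed by preprocessing is a lookup table that maps identifiers to callables. The sketch below is an assumption-laden illustration, not the apparatus's actual implementation: the registry `PRESTORED_DATA_MODULES`, the helper `retrieve_module`, and the two example modules are all hypothetical.

```python
from typing import Callable, Dict, List

Rows = List[List[float]]


def min_max_scale(rows: Rows) -> Rows:
    """Scale each column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(value - low) / span if span else 0.0
         for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


def identity(rows: Rows) -> Rows:
    """Return a copy of the data without changing any value."""
    return [list(row) for row in rows]


# Hypothetical registry: identification information -> pre-stored module.
PRESTORED_DATA_MODULES: Dict[str, Callable[[Rows], Rows]] = {
    "min_max": min_max_scale,
    "identity": identity,
}


def retrieve_module(registry, module_id):
    """Look up a pre-stored module by its identification information."""
    if module_id not in registry:
        raise KeyError(f"no pre-stored module for identifier {module_id!r}")
    return registry[module_id]


training_data = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
processed = retrieve_module(PRESTORED_DATA_MODULES, "min_max")(training_data)
print(processed)
```

A second registry of the same shape could hold the pre-stored algorithm modules, so that both kinds of module are retrieved through the same lookup.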
The model generation unit 406 is configured to train and validate the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model.
In one embodiment, when the data acquisition unit 401 has acquired the algorithm parameters corresponding to the algorithm module identification information, the model generation unit 406 is specifically configured to set those algorithm parameters and then train and validate the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model.
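The training and validation step with user-supplied algorithm parameters might look like the sketch below. It assumes the processed training data is split by a preset ratio into a training part and a validation part, as recited in claim 7. Everything else is an illustrative choice rather than something the text specifies: the algorithm module is a tiny one-dimensional k-nearest-neighbour classifier, the algorithm parameter is `k`, and the validation metric is accuracy.

```python
from collections import Counter


def split_by_ratio(samples, train_ratio):
    """Split samples into a training part and a validation part."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


def knn_predict(train, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda sample: abs(sample[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def train_and_validate(samples, params, train_ratio=0.75):
    """Return (model, accuracy); `params` carries the algorithm parameters."""
    train, valid = split_by_ratio(samples, train_ratio)
    if not train or not valid:
        raise ValueError("both the training and validation parts must be non-empty")
    k = params.get("k", 1)  # hypothetical algorithm parameter

    def model(x):
        return knn_predict(train, x, k)

    correct = sum(1 for x, label in valid if model(x) == label)
    return model, correct / len(valid)


# Hypothetical processed training data: (feature, label) pairs.
samples = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b"),
           (0.15, "a"), (0.95, "b"), (0.3, "a"), (0.8, "b")]
model, accuracy = train_and_validate(samples, {"k": 3})
print(accuracy)
```

Passing the parameters as a dictionary keeps the call signature the same for every algorithm module, whichever parameters a given module actually needs.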
The display unit 407 is configured to display the machine learning model and its corresponding validation metric according to a preset display rule.
Specifically, in one embodiment, the display unit 407 is configured to display the identification information of the machine learning models and their corresponding validation metrics in order of the magnitude of the validation metric.
In one embodiment, the display unit 407 may also display the data processing module identification information and the algorithm module identification information corresponding to each machine learning model, so that the user can better understand which data processing method was used in generating the model and which algorithm was used in training it. The display unit 407 may of course also display the algorithm parameters of the pre-stored algorithm module in each finally obtained machine learning model; the displayed content is not limited here.
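A display rule that orders models by the magnitude of their validation metric can be sketched as a simple sort. The record fields (`model_id`, `metric`, `data_module_id`, `algorithm_module_id`), the sample values, and the choice of descending order are assumptions made for illustration; the text states only that models are shown in order of the metric's magnitude.

```python
def format_leaderboard(results, descending=True):
    """Return one display line per model, ordered by validation metric."""
    ordered = sorted(results, key=lambda r: r["metric"], reverse=descending)
    lines = []
    for r in ordered:
        model_id = r["model_id"]
        metric = r["metric"]
        data_id = r["data_module_id"]
        algorithm_id = r["algorithm_module_id"]
        lines.append(
            f"{model_id}  metric={metric:.3f}  data={data_id}  algorithm={algorithm_id}"
        )
    return lines


# Hypothetical results, one record per generated machine learning model.
results = [
    {"model_id": "model_a", "metric": 0.82,
     "data_module_id": "identity", "algorithm_module_id": "knn"},
    {"model_id": "model_b", "metric": 0.91,
     "data_module_id": "min_max", "algorithm_module_id": "knn"},
    {"model_id": "model_c", "metric": 0.77,
     "data_module_id": "min_max", "algorithm_module_id": "tree"},
]
lines = format_leaderboard(results)
for line in lines:
    print(line)
```

Showing the module identifiers next to each metric follows the optional behavior described above, letting the user see which preprocessing and algorithm produced each model.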
It should be noted that, as those skilled in the art will clearly understand, for convenience and brevity of description, the specific working processes of the machine learning model generation apparatus 400 and of each of its units may refer to the corresponding processes in the foregoing embodiments of the machine learning model generation method, and are not repeated here.
In this embodiment, the machine learning model generation apparatus 400 can improve the efficiency of obtaining a machine learning model while making the way of obtaining it simpler and more convenient.
The machine learning model generation apparatus described above may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 8. FIG. 8 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal, for example an electronic device such as a tablet computer, a notebook computer, a desktop computer, or a personal digital assistant. As shown in FIG. 8, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a method for generating a machine learning model. The processor 502 provides computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it causes the processor 502 to perform a method for generating a machine learning model. The network interface 505 is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in FIG. 8 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the method for generating a machine learning model in the embodiments of the present application. It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit, or it may be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of the present application provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, and the computer program includes program instructions. When executed by a processor, the program instructions cause the processor to perform the method for generating a machine learning model of the present application. The storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a flash memory, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to their functions.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. The steps of the methods in the embodiments of the present application may be reordered, merged, or deleted according to actual needs. The units of the apparatuses in the embodiments of the present application may be merged, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
The foregoing describes only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.

Claims (20)

  1. A method for generating a machine learning model, comprising:
    acquiring training data;
    retrieving a pre-stored data processing module and a pre-stored algorithm module;
    performing data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data;
    training and validating the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model; and
    displaying the machine learning model and its corresponding validation metric according to a preset display rule.
  2. The method for generating a machine learning model according to claim 1, further comprising, before the retrieving of the pre-stored data processing module and the pre-stored algorithm module: acquiring data processing module identification information and algorithm module identification information;
    wherein the retrieving of the pre-stored data processing module and the pre-stored algorithm module comprises:
    retrieving the corresponding pre-stored data processing module according to the data processing module identification information; and
    retrieving the corresponding pre-stored algorithm module according to the algorithm module identification information.
  3. The method for generating a machine learning model according to claim 2, wherein the acquiring of the data processing module identification information and the algorithm module identification information comprises: acquiring data processing module identification information pre-stored in a terminal as the data processing module identification information, and acquiring algorithm module identification information pre-stored in the terminal as the algorithm module identification information.
  4. The method for generating a machine learning model according to claim 2, wherein the acquiring of the data processing module identification information and the algorithm module identification information comprises: acquiring data processing module identification information and algorithm module identification information input by a user.
  5. The method for generating a machine learning model according to claim 2, wherein there is at least one piece of the data processing module identification information and at least one piece of the algorithm module identification information; and the method further comprises, before the retrieving of the pre-stored data processing module and the pre-stored algorithm module:
    permuting and combining all of the data processing module identification information with all of the algorithm module identification information to form at least one machine learning group; and
    reading, group by group, the data processing module identification information and the algorithm module identification information in each machine learning group.
  6. The method for generating a machine learning model according to claim 4, wherein the acquiring of the data processing module identification information and the algorithm module identification information further comprises: acquiring algorithm parameters corresponding to the algorithm module identification information;
    and wherein the training and validating of the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model comprises: setting the algorithm parameters and training and validating the pre-stored algorithm module according to the processed training data to obtain the machine learning model and the validation metric corresponding to the machine learning model.
  7. The method for generating a machine learning model according to claim 1, wherein the training and validating of the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model comprises:
    dividing the processed training data into model training data and model validation data according to a preset ratio;
    training the pre-stored algorithm module according to the model training data to obtain the machine learning model; and
    validating the machine learning model according to the model validation data to obtain the validation metric corresponding to the machine learning model.
  8. The method for generating a machine learning model according to claim 1, wherein the displaying of the machine learning model and its corresponding validation metric according to a preset display rule comprises: displaying the identification information of the machine learning model and the validation metric corresponding to the machine learning model in order of the magnitude of the validation metric.
  9. An apparatus for generating a machine learning model, comprising:
    a data acquisition unit configured to acquire training data;
    a module retrieval unit configured to retrieve a pre-stored data processing module and a pre-stored algorithm module;
    a data processing unit configured to perform data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data;
    a model generation unit configured to train and validate the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model; and
    a display unit configured to display the machine learning model and its corresponding validation metric according to a preset display rule.
  10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements: acquiring training data; retrieving a pre-stored data processing module and a pre-stored algorithm module; performing data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data; training and validating the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model; and displaying the machine learning model and its corresponding validation metric according to a preset display rule.
  11. The computer device according to claim 10, wherein, before retrieving the pre-stored data processing module and the pre-stored algorithm module, the processor further implements: acquiring data processing module identification information and algorithm module identification information; and wherein, when retrieving the pre-stored data processing module and the pre-stored algorithm module, the processor specifically implements: retrieving the corresponding pre-stored data processing module according to the data processing module identification information; and retrieving the corresponding pre-stored algorithm module according to the algorithm module identification information.
  12. The computer device according to claim 11, wherein, when acquiring the data processing module identification information and the algorithm module identification information, the processor specifically implements: acquiring data processing module identification information pre-stored in a terminal as the data processing module identification information, and acquiring algorithm module identification information pre-stored in the terminal as the algorithm module identification information.
  13. The computer device according to claim 11, wherein, when acquiring the data processing module identification information and the algorithm module identification information, the processor specifically implements: acquiring data processing module identification information and algorithm module identification information input by a user.
  14. The computer device according to claim 11, wherein there is at least one piece of the data processing module identification information and at least one piece of the algorithm module identification information; and wherein, before retrieving the pre-stored data processing module and the pre-stored algorithm module, the processor further implements: permuting and combining all of the data processing module identification information with all of the algorithm module identification information to form at least one machine learning group; and reading, group by group, the data processing module identification information and the algorithm module identification information in each machine learning group.
  15. The computer device according to claim 13, wherein, when acquiring the data processing module identification information and the algorithm module identification information, the processor further implements: acquiring algorithm parameters corresponding to the algorithm module identification information; and wherein, when training and validating the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model, the processor specifically implements: setting the algorithm parameters and training and validating the pre-stored algorithm module according to the processed training data to obtain the machine learning model and the validation metric corresponding to the machine learning model.
  16. A storage medium, wherein the storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform: acquiring training data; retrieving a pre-stored data processing module and a pre-stored algorithm module; performing data preprocessing on the training data according to the pre-stored data processing module to obtain processed training data; training and validating the pre-stored algorithm module according to the processed training data to obtain a machine learning model and a validation metric corresponding to the machine learning model; and displaying the machine learning model and its corresponding validation metric according to a preset display rule.
  17. The storage medium according to claim 16, wherein the program instructions, before being executed by the processor to retrieve the pre-stored data processing module and the pre-stored algorithm module, further cause the processor to perform: acquiring data processing module identification information and algorithm module identification information; and wherein the program instructions, when executed by the processor to retrieve the pre-stored data processing module and the pre-stored algorithm module, cause the processor to perform: retrieving the corresponding pre-stored data processing module according to the data processing module identification information; and retrieving the corresponding pre-stored algorithm module according to the algorithm module identification information.
  18. The storage medium according to claim 17, wherein the program instructions, when executed by the processor to acquire the data processing module identification information and the algorithm module identification information, cause the processor to perform: acquiring data processing module identification information pre-stored in a terminal as the data processing module identification information, and acquiring algorithm module identification information pre-stored in the terminal as the algorithm module identification information.
  19. The storage medium according to claim 17, wherein the program instructions, when executed by the processor to acquire the data processing module identification information and the algorithm module identification information, cause the processor to perform: acquiring data processing module identification information and algorithm module identification information input by a user.
  20. The storage medium according to claim 17, wherein there is at least one piece of the data processing module identification information and at least one piece of the algorithm module identification information; and wherein the program instructions, before being executed by the processor to retrieve the pre-stored data processing module and the pre-stored algorithm module, further cause the processor to perform: permuting and combining all of the data processing module identification information with all of the algorithm module identification information to form at least one machine learning group; and reading, group by group, the data processing module identification information and the algorithm module identification information in each machine learning group.
PCT/CN2018/084039 2018-01-30 2018-04-23 Method and apparatus for generating machine learning model, computer device, and storage medium WO2019148669A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810089701.5 2018-01-30
CN201810089701.5A CN108416363A (en) 2018-01-30 2018-01-30 Generation method, device, computer equipment and the storage medium of machine learning model

Publications (1)

Publication Number Publication Date
WO2019148669A1 true WO2019148669A1 (en) 2019-08-08

Family

ID=63127304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/084039 WO2019148669A1 (en) 2018-01-30 2018-04-23 Method and apparatus for generating machine learning model, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN108416363A (en)
WO (1) WO2019148669A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11989475B2 (en) 2018-10-09 2024-05-21 Hewlett-Packard Development Company, L.P. Selecting a display with machine learning

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271602B (en) * 2018-09-05 2020-09-15 腾讯科技(深圳)有限公司 Deep learning model publishing method and device
CN109299785B (en) * 2018-09-17 2022-04-26 浪潮软件股份有限公司 Method and device for realizing machine learning model
CN109409533B (en) * 2018-09-28 2021-07-27 深圳乐信软件技术有限公司 Method, device, equipment and storage medium for generating machine learning model
CN111174370A (en) * 2018-11-09 2020-05-19 珠海格力电器股份有限公司 Fault detection method and device, storage medium and electronic device
CN109949031A (en) * 2019-04-02 2019-06-28 山东浪潮云信息技术有限公司 A kind of machine learning model training method and device
US11675334B2 (en) 2019-06-18 2023-06-13 International Business Machines Corporation Controlling a chemical reactor for the production of polymer compounds
US11520310B2 (en) 2019-06-18 2022-12-06 International Business Machines Corporation Generating control settings for a chemical reactor
CN110287171B (en) * 2019-06-28 2020-05-26 北京九章云极科技有限公司 Data processing method and system
CN110342625A (en) * 2019-07-23 2019-10-18 珠海格力智能装备有限公司 Method for treating water and device, water treatment system
CN111538494A (en) * 2020-07-09 2020-08-14 南京红松信息技术有限公司 Big data automatic modeling and verification engine system and method
WO2022141516A1 (en) * 2020-12-31 2022-07-07 华为技术有限公司 Model verification method and device
CN114943976B (en) * 2022-07-26 2022-10-11 深圳思谋信息科技有限公司 Model generation method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912500A (en) * 2016-03-30 2016-08-31 百度在线网络技术(北京)有限公司 Machine learning model generation method and machine learning model generation device
CN107273979A (en) * 2017-06-08 2017-10-20 第四范式(北京)技术有限公司 The method and system of machine learning prediction are performed based on service class
US20170330078A1 (en) * 2017-07-18 2017-11-16 Ashok Reddy Method and system for automated model building

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468736A (en) * 2015-11-23 2016-04-06 国云科技股份有限公司 Plug-in and component based data preprocessing system and realization method therefor
CN105808500A (en) * 2016-02-26 2016-07-27 山西牡丹深度智能科技有限公司 Realization method and device of deep learning
CN106020811A (en) * 2016-05-13 2016-10-12 乐视控股(北京)有限公司 Development method and device of algorithm model
CN107368892B (en) * 2017-06-07 2020-06-16 无锡小天鹅电器有限公司 Model training method and device based on machine learning
CN107563417A (en) * 2017-08-18 2018-01-09 北京天元创新科技有限公司 A kind of deep learning artificial intelligence model method for building up and system

Also Published As

Publication number Publication date
CN108416363A (en) 2018-08-17

Similar Documents

Publication Publication Date Title
WO2019148669A1 (en) Method and apparatus for generating machine learning model, computer device, and storage medium
WO2021169208A1 (en) Text review method and apparatus, and computer device, and readable storage medium
US11442939B2 (en) Configurable and incremental database migration framework for heterogeneous databases
WO2019033520A1 (en) Subsystem page development method, storage medium and server
US10592672B2 (en) Testing insecure computing environments using random data sets generated from characterizations of real data sets
WO2019041753A1 (en) Information modification method, apparatus, computer device and computer-readable storage medium
WO2019061664A1 (en) Electronic device, user's internet surfing data-based product recommendation method, and storage medium
WO2019179030A1 (en) Product purchasing prediction method, server and storage medium
WO2019085463A1 (en) Department demand recommendation method, application server, and computer-readable storage medium
WO2019019702A1 (en) Algorithm generation method and device, terminal device and storage medium
US20230334075A1 (en) Search platform for unstructured interaction summaries
CN117493309A (en) Standard model generation method, device, equipment and storage medium
US9754208B2 (en) Automatic rule coaching
CN111625567A (en) Data model matching method, device, computer system and readable storage medium
WO2016081194A1 (en) System and method for managing extra calendar periods in retail
CN109739876B (en) Data query method and device for database based on Sqltoy-orm framework
CN113361220A (en) Verification environment construction method and device for automatically cutting integrated circuit design
CN113434122A (en) Multi-role page creation method and device, server and readable storage medium
CN111858386A (en) Data testing method and device, computer equipment and storage medium
CN114677186B (en) Offer calculation method and device for financial product, computer equipment and storage medium
US11934300B2 (en) Reducing computing power for generating test scenario files for decision models
US10693494B2 (en) Reducing a size of multiple data sets
CN114139078B (en) Method and device for extracting elements from web page, computer equipment and readable storage medium
CN108038012A (en) Data calibration method and device, electronic equipment and computer-readable recording medium
WO2021135531A1 (en) Ai telemarketing test method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase
    Ref country code: DE
32PN EP: public notification in the EP bulletin as address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.11.2020)
122 EP: PCT application non-entry in European phase
    Ref document number: 18903221
    Country of ref document: EP
    Kind code of ref document: A1