CN109409533B - Method, device, equipment and storage medium for generating machine learning model - Google Patents

Method, device, equipment and storage medium for generating machine learning model

Info

Publication number
CN109409533B
Authority
CN
China
Prior art keywords
model
model generation
machine learning
data
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811143220.4A
Other languages
Chinese (zh)
Other versions
CN109409533A (en
Inventor
钱信羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Lexin Software Technology Co Ltd
Original Assignee
Shenzhen Lexin Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Lexin Software Technology Co Ltd
Priority to CN201811143220.4A
Publication of CN109409533A
Application granted
Publication of CN109409533B
Active
Anticipated expiration

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for generating a machine learning model, wherein the method comprises the following steps: obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise model generation data and a display information type; generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type; generating target display information matched with the display information type according to the generated machine learning model; and providing the target display information to the user. The technical scheme of the embodiment of the invention can make the machine learning modeling process more generalized and automated.

Description

Method, device, equipment and storage medium for generating machine learning model
Technical Field
The embodiment of the invention relates to the technical field of machine learning, in particular to a method, a device, equipment and a storage medium for generating a machine learning model.
Background
Machine learning is an important research field in computer science and artificial intelligence. Developing a machine learning model is a time-consuming, expert-driven workflow that includes data preparation, feature selection, model or technique selection, training, tuning and the like.
Existing open-source automatic machine learning modeling tools, such as TPOT and Auto-sklearn, simplify the modeling process by calling packaged built-in functions and search for hyper-parameters automatically, thereby achieving automatic or semi-automatic modeling.
In the process of implementing the invention, the inventor finds that the prior art has the following defects:
The existing machine learning modeling tools are not mature enough and are not suitable for everyone. Using them requires some grounding and experience in machine learning, which limits their range of use to some extent. In addition, the algorithms they support are mainly traditional ones, such as decision trees or GBDT (Gradient Boosting Decision Tree). They do not support algorithms that are now commonly used in industry and perform well, such as xgboost, lgbm or catboost, which also limits the wide application of machine learning modeling tools to some extent.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for generating a machine learning model, so that a machine learning modeling process is more generalized and automated.
In a first aspect, an embodiment of the present invention provides a method for generating a machine learning model, including:
obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: model generation data and a display information type;
generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type;
generating target display information matched with the display information type according to the generated machine learning model;
and providing the target display information to the user.
In a second aspect, an embodiment of the present invention further provides an apparatus for generating a machine learning model, including:
the model association parameter acquisition module is used for acquiring model association parameters input by a user through a human-computer interaction interface, and the model association parameters comprise: model generation data and a display information type;
the machine learning model generation module is used for generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type;
the target display information generation module is used for generating target display information matched with the display information type according to the generated machine learning model;
and the target display information providing module is used for providing the target display information to the user.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for generating a machine learning model provided in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for generating a machine learning model provided in any embodiment of the present invention.
According to the embodiment of the invention, the model association parameters input by a user through a human-computer interaction interface are obtained, at least one machine learning model matched with at least one model generation algorithm is generated according to the model generation data and the display information type included in the model association parameters, and the target display information matched with the display information type is generated according to the generated machine learning model to be provided for the user, so that the required machine learning model is automatically generated under the condition that the user does not contact codes, and the visual information display function is provided for the user according to the user requirement, the problem of low universality of the existing machine learning modeling tool is solved, and the generalization and automation performance of the machine learning modeling tool are improved.
Drawings
FIG. 1 is a flowchart of a method for generating a machine learning model according to an embodiment of the present invention;
FIG. 2a is a flowchart of a method for generating a machine learning model according to a second embodiment of the present invention;
FIG. 2b is a flowchart of a method for generating a machine learning model according to a second embodiment of the present invention;
FIG. 2c is a flowchart of a method for generating a machine learning model according to a second embodiment of the present invention;
FIG. 2d is a schematic diagram of a human-computer interaction interface of a machine learning model generation tool according to a second embodiment of the present invention;
FIG. 2e is a diagram of a model report file according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of an apparatus for generating a machine learning model according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a method for generating a machine learning model according to an embodiment of the present invention, where the embodiment is applicable to a case where a machine learning model and matching target presentation information are automatically generated, and the method may be executed by a device for generating a machine learning model, where the device may be implemented by software and/or hardware, and may be generally integrated in a computer device. Accordingly, as shown in fig. 1, the method comprises the following operations:
S110, obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: model generation data and a display information type.
The model association parameters may be parameters relevant to generating a machine learning model, such as model generation data and a model generation algorithm. The model generation data may be a user-specified data source for generating the machine learning model, including training data and test data. The display information type may be selected by the user and determines which kind of information is presented. For example, the display information types may include important features, index parameters, model report files, and the like.
In the embodiment of the invention, when a user uses the machine learning model generation tool for modeling, model generation data for generating the machine learning model can be input through a visual human-computer interaction interface provided by the machine learning model generation tool, and a specific display information type is selected on the human-computer interaction interface, so that the machine learning model generation tool can automatically generate a corresponding machine learning model according to the model generation data and the display information type provided by the user.
In an optional embodiment of the present invention, obtaining model generation data input by a user through a human-computer interaction interface may include:
obtaining model generation data uploaded by the user through the human-computer interaction interface; or
obtaining model generation data selected by the user in a data storage list through the human-computer interaction interface.
In the embodiment of the present invention, the model generation data may be corresponding file data uploaded by a user through a human-computer interface, or may be related data pre-stored in a server, which is specified by the user in a data storage list through the human-computer interface.
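As a rough illustration of what the tool might receive from the interface, the sketch below models the model association parameters as a small record. All names here (`PresentationType`, `ModelAssociationParams`, `uploaded_file`, `stored_dataset`) are hypothetical and are not taken from the patent; the only behaviour assumed from the text is that the data comes either from an upload or from the data storage list, and that a display information type is selected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PresentationType(Enum):
    """Display information types named in the description (hypothetical identifiers)."""
    IMPORTANT_FEATURES = "important_features"
    INDEX_PARAMETERS = "index_parameters"
    MODEL_REPORT_FILE = "model_report_file"


@dataclass(frozen=True)
class ModelAssociationParams:
    """Parameters a user submits through the human-computer interaction interface."""
    presentation_type: PresentationType
    uploaded_file: Optional[str] = None   # path of a file uploaded by the user
    stored_dataset: Optional[str] = None  # entry chosen from the data storage list

    def __post_init__(self):
        # The description offers two alternatives, so exactly one must be set.
        if (self.uploaded_file is None) == (self.stored_dataset is None):
            raise ValueError("specify exactly one of uploaded_file or stored_dataset")

    @property
    def data_source(self) -> str:
        """Return whichever data source the user provided."""
        if self.uploaded_file is not None:
            return self.uploaded_file
        return self.stored_dataset


params = ModelAssociationParams(PresentationType.MODEL_REPORT_FILE,
                                uploaded_file="train.csv")
print(params.data_source)
```

Rejecting a request that names both sources, or neither, keeps the later steps from having to guess which data the user meant.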
S120, generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type.
The model generation algorithm may be an algorithm used to generate a machine learning model, such as catboost, xgboost or lgbm.
Correspondingly, after a user specifies the model generation data and the display information type through the human-computer interaction interface, the machine learning model generation tool can generate at least one machine learning model matched with at least one model generation algorithm. The model generation algorithm may be specified by a user, or randomly specified by a machine learning model generation tool, which is not limited in the embodiment of the present invention.
S130, generating target display information matched with the display information type according to the generated machine learning model.
The target display information may be display data provided by the machine learning model generation tool according to the display information type selected by the user, for example, the important data features screened for the model generation algorithm, the performance of the model generation algorithm, or a model report file corresponding to the machine learning model. The target display information is generated according to the display information type; any display information that the machine learning model generation tool can be extended to provide may serve as target display information, and the embodiment of the invention does not limit its content.
In the embodiment of the invention, after the machine learning model is generated by the machine learning model generation tool, the matched target display information can be generated according to the display information type selected by the user through the human-computer interaction interface.
In an optional embodiment of the present invention, before generating at least one machine learning model matching at least one model generation algorithm according to the model generation data and the presentation information type, the method may further include: performing a preprocessing operation on the model generation data so that the model generation data meets model generation requirements, wherein the preprocessing operation comprises at least one of processing missing data values, processing character-type fields and processing unbalanced data.
In the embodiment of the present invention, optionally, after the machine learning model generation tool obtains the model generation data input by the user through the human-computer interaction interface, the received data may be preprocessed first. Preprocessing operations include, but are not limited to, processing missing data values, processing character-type fields, and processing unbalanced data. The model generation data is preprocessed, so that the data specified by a user can better meet modeling requirements.
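The three preprocessing operations can be sketched with the standard library alone. The patent does not say how each operation is carried out, so the choices below (median imputation, integer encoding of character fields, random oversampling of minority classes) are assumptions made for illustration, and the function names are hypothetical.

```python
import random
from statistics import median


def fill_missing(values):
    """Replace None entries in a numeric column with the column median."""
    present = [v for v in values if v is not None]
    fill = median(present) if present else 0.0
    return [fill if v is None else v for v in values]


def encode_categories(values):
    """Map each distinct string to an integer code, assigned in sorted order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


print(fill_missing([1.0, None, 3.0]))
print(encode_categories(["b", "a", "b"]))
```

Oversampling is only one way to handle unbalanced data; class weighting or undersampling would fit the same step.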
S140, providing the target display information to the user.
Correspondingly, after the machine learning model generation tool generates the target display information matched with the display information type, the target display information can be provided to the user for use or reference.
In the embodiment of the invention, a user only needs to provide the corresponding data source and specify the corresponding model generation algorithm in the human-computer interaction interface for the machine learning model generation tool to generate the corresponding machine learning model automatically. The user does not need to enter any code at any point, so anyone can use the machine learning model generation tool to build models automatically.
According to the embodiment of the invention, the model association parameters input by a user through a human-computer interaction interface are obtained, at least one machine learning model matched with at least one model generation algorithm is generated according to the model generation data and the display information type included in the model association parameters, and the target display information matched with the display information type is generated according to the generated machine learning model to be provided for the user, so that the required machine learning model is automatically generated under the condition that the user does not contact codes, and the visual information display function is provided for the user according to the user requirement, the problem of low universality of the existing machine learning modeling tool is solved, and the generalization and automation performance of the machine learning modeling tool are improved.
Example two
Fig. 2a is a flowchart of a method for generating a machine learning model according to a second embodiment of the present invention, fig. 2b is a flowchart of a method for generating a machine learning model according to a second embodiment of the present invention, and fig. 2c is a flowchart of a method for generating a machine learning model according to a second embodiment of the present invention.
Accordingly, as shown in fig. 2a, when the model association parameters include model generation data, a model generation algorithm, and a machine learning model file and a model report file matched with the model generation algorithm, the method of the embodiment may include:
S210a, obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: the model generation data, the model generation algorithm and the machine learning model file and the model report file which are matched with the model generation algorithm.
In the embodiment of the present invention, optionally, the model association parameters may include model generation data, a model generation algorithm, and a machine learning model file and a model report file matched with the model generation algorithm. That is, the user may specify the model generation data through the human-computer interaction interface, select the algorithm for generating the machine learning model, and specify the presentation information type as the machine learning model file and the model report file that match the model generation algorithm.
S220a, dividing the model generation data into model training subdata and model test subdata.
The model training subdata may be the training data (i.e., the training set) used to generate the machine learning model, and the model test subdata may be the test data (i.e., the test set) used to test it.
Correspondingly, after a user specifies model generation data and a model generation algorithm through a human-computer interaction interface and specifies the display information type as a machine learning model file and a model report file matched with the model generation algorithm, the machine learning model generation tool needs to process the model generation data to generate the corresponding model training subdata and model test subdata. The model test subdata is used to verify the labels predicted by the machine learning model.
S230a, according to the model training subdata and the model testing subdata, a Bayesian optimization algorithm is adopted to carry out parameter adjustment on at least one hyper-parameter included in the model generation algorithm under limited trial times.
In the embodiment of the present invention, optionally, a Bayesian optimization algorithm may be used together with the model generation algorithm selected by the user to tune the hyper-parameters in detail on the determined model training subdata and model test subdata. The Bayesian optimization algorithm searches the probability distribution of the hyper-parameters within a limited number of trials and finally obtains reasonable parameters for modeling; compared with traditional methods, this can save a large amount of time and computing resources.
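The patent does not state which Bayesian optimization variant is used. As one possible reading, the sketch below tunes a single hyper-parameter with a simplified, one-dimensional search in the style of the tree-structured Parzen estimator (TPE): past trials are split into a better and a worse group, each group is modelled with a kernel density estimate, and the next trial is the candidate that is most likely under the better group relative to the worse one. The function names, the default settings and the toy objective are all assumptions; in practice the objective would train the chosen algorithm on the model training subdata and score it on the model test subdata.

```python
import math
import random


def kde(x, points, bandwidth):
    """Gaussian kernel density estimate built from `points`, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def tpe_search(objective, low, high, n_trials=30, n_startup=8,
               n_candidates=24, gamma=0.25, seed=0):
    """Maximize objective(x) for x in [low, high] within n_trials evaluations."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10
    trials = []  # list of (score, x); a higher score is better
    for i in range(n_trials):
        if i < n_startup:
            # Warm-up: sample uniformly before any density model exists.
            x = rng.uniform(low, high)
        else:
            ranked = sorted(trials, reverse=True)
            n_good = max(1, int(gamma * len(ranked)))
            good = [v for _, v in ranked[:n_good]]
            bad = [v for _, v in ranked[n_good:]]
            # Draw candidates near good points, clipped to the search range.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            # Pick the candidate with the highest good/bad density ratio.
            x = max(candidates,
                    key=lambda c: kde(c, good, bandwidth) / (kde(c, bad, bandwidth) + 1e-12))
        trials.append((objective(x), x))
    best_score, best_x = max(trials)
    return best_x, best_score, trials


def toy_validation_score(log10_learning_rate):
    """Stand-in for a real validation score; peaks at a learning rate of 0.1."""
    return -(log10_learning_rate + 1.0) ** 2


best_x, best_score, _ = tpe_search(toy_validation_score, low=-3.0, high=0.0)
print(f"best log10(learning rate) = {best_x:.3f}, score = {best_score:.4f}")
```

The `n_trials` argument plays the role of the limited number of trials: the search stops after that many evaluations regardless of how good the best result is.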
S240a, generating a machine learning model matched with the model generation algorithm according to the model generation data and the adjusted hyper-parameters.
Accordingly, after the parameter tuning operation is completed, the machine learning model generation tool may operate the model generation algorithm to generate a machine learning model that matches the model generation algorithm.
S250a, generating a machine learning model file and a model report file matched with the model generation algorithm according to the generated machine learning model, and providing the files to a user.
In the embodiment of the present invention, optionally, the machine learning model generation tool may further generate a detailed model report file matched with the model generation algorithm, and provide the machine learning model file and the model report file to the user.
Correspondingly, as shown in fig. 2b, when the model-related parameters include model generation data, a model generation algorithm, and preset data characteristics matching with the model generation algorithm, the method of this embodiment may include:
S210b, obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: model generation data, a model generation algorithm, and preset data features matched with the model generation algorithm.
The preset data feature may be an important data feature of the model generation algorithm.
In the embodiment of the present invention, optionally, the model association parameters may further include model generation data, a model generation algorithm, and preset data features matched with the model generation algorithm. That is, the user may specify the model generation data through the human-computer interaction interface, select the algorithm for generating the machine learning model, and specify the presentation information type as the preset data feature matched with the model generation algorithm.
S220b, dividing the model generation data into model training subdata and model test subdata.
Similarly, after the user specifies the model generation data and the model generation algorithm through the human-computer interaction interface and specifies that the display information type is the preset data characteristic matched with the model generation algorithm, the machine learning model generation tool needs to process the model generation data to generate corresponding model training subdata and model test subdata.
S230b, generating a machine learning model corresponding to the model generation algorithm according to the preset hyper-parameter corresponding to the model generation algorithm and the model generation data.
The preset hyper-parameter may be a default hyper-parameter of the model generation algorithm.
In the embodiment of the invention, if a user only wants to check important data characteristics of a relevant model generation algorithm in the operation process through a machine learning model generation tool, the display information type can be specified to be the preset data characteristics matched with the model generation algorithm on the human-computer interaction interface. When the data characteristics in the operation process of the model generation algorithm are checked, the optimization algorithm is not needed to be adopted for parameter adjustment, the determined model training subdata and the determined model test subdata can be operated according to the default hyper-parameters by adopting the model generation algorithm selected by the user, and the machine learning model corresponding to the model generation algorithm is generated.
S240b, sorting the model generation data according to a set sorting rule.
The set sorting rule may be a preset rule for ordering the model generation data. For example, the set sorting rule may order the data by the importance of its fields; the embodiment of the present invention does not limit the specific form of the set sorting rule.
Accordingly, in order to provide important data characteristics for users, the model generation data can be sorted according to a set sorting rule.
S250b, generating at least one preset data feature value matched with the model generation algorithm according to the generated machine learning model and the sorting result.
In the embodiment of the invention, while the model generation algorithm runs, the several most important preset data features and their corresponding feature values (such as the importance scores of the data features) can be output according to the set sorting rule, for the user's reference when screening features.
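Ranking features by importance and returning the top few can be sketched as follows. The importance scores are made-up values; in practice they would come from the trained model, and the function name `top_features` is hypothetical.

```python
def top_features(importances, k=3):
    """Return the k (feature name, score) pairs with the highest importance score."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical importance scores for four fields of the model generation data.
scores = {"age": 0.12, "income": 0.55, "city": 0.03, "tenure": 0.30}
for name, score in top_features(scores, k=2):
    print(f"{name}: {score:.2f}")
```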
Accordingly, as shown in fig. 2c, when the model-related parameters include model generation data, at least two model generation algorithms, and a comparison of performances of each model generation algorithm, the method of the embodiment may include:
S210c, obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: model generation data, at least two model generation algorithms, and a comparison of the performance of each of the model generation algorithms.
In the embodiment of the present invention, optionally, the model association parameters may further include model generation data, at least two model generation algorithms, and a performance comparison of each model generation algorithm. That is, the user may specify model generation data through the human-computer interaction interface, select at least two algorithms for generating the machine learning model, and specify the presentation information type as a performance comparison of the model generation algorithms.
S220c, dividing the model generation data into model training subdata and model test subdata.
Similarly, after a user specifies model generation data and at least two model generation algorithms through a human-computer interaction interface and specifies the display information type as performance comparison of each model generation algorithm, the machine learning model generation tool needs to process the model generation data to generate corresponding model training subdata and model test subdata.
S230c, generating machine learning models respectively corresponding to the model generation algorithms according to preset hyper-parameters respectively corresponding to the model generation algorithms and the model generation data.
In the embodiment of the invention, if a user only wants to compare the performance of the relevant model generation algorithms through the machine learning model generation tool, the user can specify the display information type as a performance comparison of the model generation algorithms on the human-computer interaction interface. Similarly, because the machine learning model generation tool only needs to run each model generation algorithm quickly, no optimization algorithm is needed for parameter adjustment; each model generation algorithm selected by the user can be run on the determined model training subdata and model test subdata with its default hyper-parameters to generate the machine learning model corresponding to that algorithm.
S240c, generating a model evaluation index corresponding to each model generation algorithm based on the generated machine learning model.
Correspondingly, in order to compare the performances of at least two model generation algorithms selected by the user, model evaluation indexes corresponding to the model generation algorithms can be generated so that the user can select the corresponding model generation algorithms by reference.
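A performance comparison of this kind reduces to computing the same evaluation index for every algorithm on the same test labels. The patent does not name the index, so the sketch below assumes AUC (area under the ROC curve) for a binary label; the prediction scores are invented and the function names are hypothetical.

```python
def auc(labels, scores):
    """AUC computed by comparing every positive score with every negative score."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare_algorithms(labels, scores_by_algorithm):
    """Return (algorithm name, AUC) pairs sorted from best to worst."""
    table = [(name, auc(labels, s)) for name, s in scores_by_algorithm.items()]
    return sorted(table, key=lambda row: row[1], reverse=True)


test_labels = [1, 0, 1, 0]
predictions = {
    "algorithm_a": [0.9, 0.1, 0.8, 0.2],  # hypothetical model outputs
    "algorithm_b": [0.6, 0.7, 0.8, 0.2],
}
for name, value in compare_algorithms(test_labels, predictions):
    print(f"{name}: AUC = {value:.2f}")
```

The pairwise computation is quadratic in the number of samples, which is fine for a small illustration; a rank-based formula would be preferable on large test sets.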
In an optional embodiment of the present invention, dividing the model generation data into model test subdata may include:
using the target data specified by the user in the model generation data as the model test subdata, or
using a set proportion of the data in the model generation data as the model test subdata.
The set ratio may be a ratio preset by the machine learning model generation tool, for example, 20% or 30%, and may be specifically set according to an actual requirement, and the embodiment of the present invention does not limit a specific value of the set ratio.
In the embodiment of the present invention, optionally, when determining the test set, if the user specifies the test set data through the human-computer interaction interface, the target data specified by the user may be directly used as the model test subdata. If the user does not specify test set data, the machine learning model generation tool may treat a portion of the target data of the model training subdata as model test subdata. For example, 20% of the model training subdata is used as the model test subdata.
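The two ways of obtaining the test set can be sketched in a few lines. The function name, the random shuffling and the fixed seed are assumptions; the 20% default mirrors the example proportion given above.

```python
import random


def split_model_generation_data(rows, test_rows=None, test_ratio=0.2, seed=0):
    """Return (model training subdata, model test subdata).

    If the user supplied test_rows, they are used unchanged as the test set;
    otherwise test_ratio of rows is held out at random.
    """
    if test_rows is not None:
        return list(rows), list(test_rows)
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_model_generation_data(list(range(10)))
print(len(train), len(test))
```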
It should be noted that, in the embodiment of the present invention, the model report file is a detailed report, and may include the important data features of the algorithm and the contents of the model evaluation index of the algorithm.
It should be noted that fig. 2a, fig. 2b, and fig. 2c are only schematic diagrams of an implementation manner, and there is no sequential relationship among S210a-S250a, S210b-S250b, and S210c-S250c, and the three operations respectively correspond to operations executed when the user selects different display information types, and the three operations may be alternatively implemented.
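Because the three flows are alternatives selected by the display information type, the tool's top level can be viewed as a simple dispatch. The sketch below only lists the steps each flow would run rather than executing them; `RunMode`, `plan_steps` and the step descriptions are hypothetical names that paraphrase the flows of fig. 2a, fig. 2b and fig. 2c.

```python
from enum import Enum


class RunMode(Enum):
    DETAIL_REPORT = "detail_report"          # flow of fig. 2a
    IMPORTANT_FEATURE = "important_feature"  # flow of fig. 2b
    FAST_COMPARE = "fast_compare"            # flow of fig. 2c


def plan_steps(mode, algorithms):
    """List the processing steps the tool would run for the chosen mode."""
    steps = ["split data into model training subdata and model test subdata"]
    if mode is RunMode.DETAIL_REPORT:
        steps += [
            "tune hyper-parameters with Bayesian optimization",
            "train the model with the tuned hyper-parameters",
            "write the machine learning model file and the model report file",
        ]
    elif mode is RunMode.IMPORTANT_FEATURE:
        steps += [
            "train the model with default hyper-parameters",
            "sort by the set sorting rule",
            "output the top preset data feature values",
        ]
    elif mode is RunMode.FAST_COMPARE:
        if len(algorithms) < 2:
            raise ValueError("a performance comparison needs at least two algorithms")
        steps += [f"train {name} with default hyper-parameters" for name in algorithms]
        steps.append("compute model evaluation indexes for each algorithm")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return steps


for step in plan_steps(RunMode.FAST_COMPARE, ["xgboost", "catboost"]):
    print(step)
```

Only the detailed-report flow pays for hyper-parameter tuning; the other two run with defaults, which is why they can return results quickly.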
Fig. 2d is a schematic diagram of a human-computer interaction interface of a machine learning model generation tool according to a second embodiment of the present invention, and fig. 2e is a schematic diagram of a model report file according to a second embodiment of the present invention. In a specific example, as shown in fig. 2d, a selectable operation MODE (STEP 1: sense RUN MODE part) may be provided for a user in a human-computer interaction interface of the machine learning model generation tool, that is, a presentation information type, and the user may select through each presentation information type in the operation MODE, so that the machine learning model generation tool provides a corresponding service. The inportant Feature may be a presentation of preset data features, the Fast match may be a presentation of performance comparison of each model generation algorithm, and the Detail Report (Slow) may be a presentation of a machine learning model file and a model Report file that are matched with the model generation algorithm. STEP2 in fig. 2 d: the INPUT Y NAME part is the label NAME INPUT by the user, i.e. the label corresponding to the output of the expected machine learning model. The human-machine interface may also provide the user with an alternative model generation algorithm, such as STEP3 in fig. 2 d: the CHOOSE ALGORITHM (JUST POR DETALL MODE) part comprises Random Forest, Catboost, Xgboost or Logistic Regression and the like, and can also provide other model generation ALGORITHM options for the user according to actual requirements. STEP 4: the CHOOSE DATA portion is the user-provided function that generates DATA for the specified model. The user can select a data storage list From Database to use the data pre-stored in the server as model generation data, and can also Upload a local file From Upload as model generation data. 
After completing all of the steps, the user can click the Click Me button. The machine learning model generation tool then runs automatically and generates the corresponding machine learning model and target display information according to the display information type selected by the user. Specifically, if the user needs a complete model report file, Detail Report can be selected in the human-computer interaction interface; the generated model report file then includes the preset data features, the model evaluation indexes of the algorithm, and other related content. If the user only needs to view the important features of the algorithm, Important Feature can be selected, and the generated target display information includes only feature information ranked by field importance. If the user needs to compare the performance indexes of several algorithms, Fast Compare can be selected, and the generated target display information includes only the model evaluation indexes of the selected model generation algorithms. It should be noted that fig. 2d is only a schematic diagram, and the functions of the human-computer interaction interface may be further extended on the basis of fig. 2d according to actual needs, which is not limited in the embodiment of the present invention.
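The relationship between the selected run mode and what the tool returns can be sketched as follows. This is only an illustrative sketch: the names RunMode, OUTPUTS, and plan are hypothetical and do not come from the patent, and the rule that only the detail mode tunes hyper-parameters is taken from the optional embodiments described elsewhere in this document.

```python
from enum import Enum


class RunMode(Enum):
    """Display information types offered under STEP 1 (labels as in fig. 2d)."""
    IMPORTANT_FEATURE = "Important Feature"
    FAST_COMPARE = "Fast Compare"
    DETAIL_REPORT = "Detail Report (Slow)"


# Target display information per mode, paraphrasing the description.
OUTPUTS = {
    RunMode.IMPORTANT_FEATURE: ["features ranked by importance"],
    RunMode.FAST_COMPARE: ["model evaluation indexes per algorithm"],
    RunMode.DETAIL_REPORT: ["machine learning model file", "model report file"],
}


def plan(mode_label: str) -> dict:
    """Translate the label picked in the interface into a run plan (hypothetical)."""
    mode = RunMode(mode_label)  # raises ValueError for an unknown label
    return {
        "mode": mode,
        "outputs": OUTPUTS[mode],
        # Only the detail mode tunes hyper-parameters; the other two modes
        # use preset hyper-parameters according to the embodiments.
        "tune_hyperparameters": mode is RunMode.DETAIL_REPORT,
    }
```

Keeping the mapping in one table makes it straightforward to add a further display information type without touching the dispatch logic.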
For example, fig. 2e shows part of the data content of the model report file. Index is the model evaluation index of the algorithm. Sample Info is sample information, that is, statistics on the number of samples. Dev holds the values of each model evaluation index and the sample statistics for the model training subdata, and Val and Off hold the values of each model evaluation index and the sample statistics for the model test subdata. All Variables may be all of the data features involved when the model generation algorithm runs, and Importance Top 50 may be the first 50 data features, with their data feature values, displayed in order of each feature's importance score. It should be noted that fig. 2e is only a schematic diagram, and the layout of the model report file and the content it displays may be set according to actual requirements, which is not limited in the embodiment of the present invention.
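One possible in-memory layout for such a report is sketched below. The function name build_report, the dictionary keys, and the metric names used in the usage note are assumptions made for illustration; the patent does not specify a file format or which evaluation indexes are reported.

```python
def build_report(metrics_by_split, sample_counts, importance, top_n=50):
    """Assemble report sections resembling those described for fig. 2e.

    metrics_by_split: e.g. {"Dev": {"AUC": 0.81}, "Val": {"AUC": 0.78}}
    sample_counts:    e.g. {"Dev": 700, "Val": 300}
    importance:       e.g. {"age": 0.5, "income": 0.3}
    """
    splits = list(metrics_by_split)
    # One row per evaluation index, one column per data split.
    index_names = sorted({name for m in metrics_by_split.values() for name in m})
    index_rows = [
        [name] + [metrics_by_split[s].get(name) for s in splits]
        for name in index_names
    ]
    # Features ordered from most to least important, truncated to top_n.
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "columns": ["Index"] + splits,
        "index": index_rows,
        "sample_info": ["Sample Info"] + [sample_counts.get(s) for s in splits],
        "all_variables": sorted(importance),
        "importance_top": ranked[:top_n],
    }
```

A caller would pass one metrics dictionary per split (for instance Dev, Val, and Off) and render the returned rows in whatever layout the report file uses.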
With the above technical solution, the required machine learning model can be generated automatically without the user having to touch any code, and a visual display of information about the machine learning model is provided according to the user's needs. This addresses the low generality of existing machine learning modeling tools and improves the generality and automation of machine learning modeling.
Example three
Fig. 3 is a schematic diagram of an apparatus for generating a machine learning model according to a third embodiment of the present invention, and as shown in fig. 3, the apparatus includes: a model association parameter obtaining module 310, a machine learning model generating module 320, a target display information generating module 330, and a target display information providing module 340, wherein:
a model association parameter obtaining module 310, configured to obtain model association parameters input by a user through a human-computer interaction interface, where the model association parameters include: model generation data and a display information type;
a machine learning model generation module 320, configured to generate at least one machine learning model matching at least one model generation algorithm according to the model generation data and the display information type;
a target display information generating module 330, configured to generate target display information matched with the display information type according to the generated machine learning model;
and a target display information providing module 340, configured to provide the target display information to the user.
In the embodiment of the invention, the model association parameters input by a user through a human-computer interaction interface are obtained; at least one machine learning model matched with at least one model generation algorithm is generated according to the model generation data and the display information type included in the model association parameters; and target display information matched with the display information type is generated from the generated machine learning model and provided to the user. The required machine learning model is thus generated automatically without the user having to touch any code, and a visual display of information is provided according to the user's needs. This addresses the low generality of existing machine learning modeling tools and improves the generality and automation of machine learning modeling.
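The way the four modules hand data to one another can be sketched as a simple pipeline. Everything below is a hypothetical illustration: the class ModelAssociationParameters, the function run_pipeline, and the callable signatures are assumptions, and the patent does not prescribe how the modules are implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ModelAssociationParameters:
    """The two mandatory inputs named in the embodiment."""
    model_generation_data: List[dict]
    display_information_type: str


def run_pipeline(
    acquire: Callable[[], ModelAssociationParameters],
    generate_models: Callable[[ModelAssociationParameters], List[Any]],
    generate_display: Callable[[List[Any], str], Any],
    provide: Callable[[Any], Any],
) -> Any:
    """Chain the four modules in the order given in the description."""
    params = acquire()                       # module 310: obtain parameters
    models = generate_models(params)         # module 320: build the model(s)
    info = generate_display(                 # module 330: build display info
        models, params.display_information_type
    )
    return provide(info)                     # module 340: return it to the user
```

Passing the modules in as callables keeps each stage replaceable, which matches the description's point that the same flow serves different display information types.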
Optionally, the model association parameters further include: a model generation algorithm; the display information types include: a machine learning model file and a model report file matched with the model generation algorithm; the machine learning model generation module 320 is configured to divide the model generation data into model training subdata and model test subdata; to tune, according to the model training subdata and the model test subdata, at least one hyper-parameter included in the model generation algorithm by using a Bayesian optimization algorithm within a limited number of trials; and to generate a machine learning model matched with the model generation algorithm according to the model generation data and the tuned hyper-parameters.
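As a rough illustration of tuning one hyper-parameter with Bayesian optimization under a fixed trial budget, the sketch below fits a small Gaussian-process surrogate and picks each next trial by an upper-confidence-bound rule. The patent does not say which surrogate or acquisition function is used, so those choices, together with the kernel length scale, the noise term, the candidate grid, and all function names, are assumptions. The hyper-parameter is assumed to be rescaled to the interval [0, 1], and score stands for a hypothetical routine that trains on the training subdata and returns an evaluation index on the test subdata.

```python
import math


def rbf(a, b, length=0.15):
    """Squared-exponential kernel on a hyper-parameter scaled to [0, 1]."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior(xs, ys, x_new, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation at x_new."""
    offset = sum(ys) / len(ys)  # centre the scores around a constant mean
    gram = [[rbf(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    weights = solve(gram, [y - offset for y in ys])
    mean = offset + sum(k * w for k, w in zip(k_new, weights))
    v = solve(gram, k_new)
    variance = max(1.0 - sum(k * vi for k, vi in zip(k_new, v)), 0.0)
    return mean, math.sqrt(variance)


def bayesian_optimize(score, n_trials=8, kappa=2.0, grid=51):
    """Maximise score(x) on [0, 1] with n_trials evaluations (n_trials >= 2)."""
    candidates = [i / (grid - 1) for i in range(grid)]
    xs = [candidates[0], candidates[-1]]  # two initial probes at the bounds
    ys = [score(x) for x in xs]
    while len(xs) < n_trials:
        untried = [c for c in candidates if c not in xs]

        def ucb(c):
            # Prefer a high predicted score or a high remaining uncertainty.
            mean, std = posterior(xs, ys, c)
            return mean + kappa * std

        nxt = max(untried, key=ucb)
        xs.append(nxt)
        ys.append(score(nxt))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)
```

The trial budget n_trials is what bounds the cost: each trial corresponds to one full train-and-evaluate cycle, so the surrogate is used to spend those few cycles where they are most informative.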
Optionally, the model association parameters further include: a model generation algorithm; the display information types include: preset data features matched with the model generation algorithm; the machine learning model generation module 320 is configured to divide the model generation data into model training subdata and model test subdata, and to generate a machine learning model corresponding to the model generation algorithm according to the preset hyper-parameters corresponding to the model generation algorithm and the model generation data; the target display information generating module 330 is specifically configured to sort the model generation data according to a set sorting rule, and to generate at least one preset data feature value matched with the model generation algorithm according to the generated machine learning model and the sorting result.
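One common way to obtain a ranked list of feature importance scores from an already trained model is permutation importance, sketched below. This is an assumption for illustration only: the patent does not state how the importance scores are computed. To keep the result reproducible, a fixed cyclic shift of each column stands in for the usual random shuffle, and predict is a hypothetical callable representing the generated model.

```python
def accuracy(predict, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, feature_names):
    """Rank features by how much accuracy drops when a column is permuted.

    rows is a list of lists, one inner list of feature values per sample.
    """
    base = accuracy(predict, rows, labels)
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        shifted = column[1:] + column[:1]  # cyclic shift: a fixed permutation
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, shifted)]
        scores[name] = base - accuracy(predict, permuted, labels)
    # Most important feature first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A feature the model ignores produces a drop of zero, so it falls to the bottom of the ranking; tree-based algorithms also expose their own built-in importance scores, which could be used instead.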
Optionally, the model association parameters further include: at least two model generation algorithms; the display information types include: a performance comparison of the model generation algorithms; the machine learning model generation module 320 is configured to divide the model generation data into model training subdata and model test subdata, and to generate machine learning models respectively corresponding to the model generation algorithms according to the preset hyper-parameters respectively corresponding to the model generation algorithms and the model generation data; the target display information generating module 330 is specifically configured to generate a model evaluation index corresponding to each model generation algorithm according to the generated machine learning models.
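The comparison step can be sketched by scoring each generated model on the same test subdata with a single evaluation index. The area under the ROC curve (AUC) is used here purely as an assumed example of a model evaluation index, since the patent does not name specific indexes; the function names and the idea of representing each model as a scoring callable are likewise hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare(models, test_rows, test_labels):
    """Return {algorithm name: AUC}, ordered from best to worst.

    models maps an algorithm name to a callable that scores one row.
    """
    results = {
        name: auc([model(row) for row in test_rows], test_labels)
        for name, model in models.items()
    }
    return dict(sorted(results.items(), key=lambda kv: kv[1], reverse=True))
```

Because every model is evaluated on the same test subdata, the resulting numbers are directly comparable, which is the point of the fast comparison mode.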
Optionally, the model association parameter obtaining module 310 is configured to obtain model generation data uploaded by the user through the human-computer interaction interface, or to obtain model generation data selected by the user from a data storage list through the human-computer interaction interface.
Optionally, the machine learning model generation module 320 is further configured to use target data specified by the user in the model generation data as the model test subdata, or to use a set proportion of the model generation data as the model test subdata.
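The two ways of choosing the model test subdata can be sketched in a single helper. The function name split, the default proportion of 0.3, and the use of a seeded random sample are assumptions for illustration; the patent only states that either user-specified data or a set proportion is used.

```python
import random


def split(rows, test_indices=None, test_ratio=0.3, seed=0):
    """Divide rows into (training subdata, test subdata).

    If test_indices is given, those user-specified rows form the test
    subdata; otherwise a set proportion of rows is drawn at random.
    """
    if test_indices is not None:
        chosen = set(test_indices)
    else:
        rng = random.Random(seed)  # seeded so the split is repeatable
        n_test = max(1, round(len(rows) * test_ratio))
        chosen = set(rng.sample(range(len(rows)), n_test))
    train = [r for i, r in enumerate(rows) if i not in chosen]
    test = [r for i, r in enumerate(rows) if i in chosen]
    return train, test
```

For time-ordered data a user would typically specify the most recent rows as test subdata rather than rely on a random draw.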
Optionally, the apparatus further comprises: a data preprocessing module, configured to perform a preprocessing operation on the model generation data so that the model generation data meets the model generation requirements, wherein the preprocessing operation comprises at least one of processing missing data values, processing character-type fields, and processing unbalanced data.
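The three preprocessing operations can each be sketched with a simple, commonly used technique. The specific choices below (mean imputation, integer encoding, and oversampling of the minority class) are assumptions; the patent names the three kinds of processing but not the methods, and all function names are hypothetical.

```python
from collections import Counter


def fill_missing(values):
    """Replace None with the mean of the observed numeric values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def encode_categories(values):
    """Map each distinct string to an integer code in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def oversample(rows, labels):
    """Repeat minority-class rows until every class matches the largest one."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for i in range(target - count):
            out_rows.append(members[i % len(members)])
            out_labels.append(cls)
    return out_rows, out_labels
```

Oversampling would normally be applied to the training subdata only, so that the test subdata keeps the original class distribution.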
The device for generating the machine learning model can execute the method for generating the machine learning model provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For the technical details that are not described in detail in this embodiment, reference may be made to a method for generating a machine learning model according to any embodiment of the present invention.
Example four
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of a computer device 412 suitable for use in implementing embodiments of the present invention. The computer device 412 shown in FIG. 4 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 4, computer device 412 is in the form of a general purpose computing device. Components of computer device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 that couples the various system components including the storage device 428 and the processors 416.
Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 412 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 412 and includes both volatile and nonvolatile media, removable and non-removable media.
The storage device 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 430 and/or cache memory 432. The computer device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. The storage device 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
Program 436 having a set (at least one) of program modules 426 may be stored, for example, in storage 428, such program modules 426 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination may comprise an implementation of a network environment. Program modules 426 generally perform the functions and/or methodologies of embodiments of the invention as described herein.
The computer device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, camera, display 424, etc.), with one or more devices that enable a user to interact with the computer device 412, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 412 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 422. Also, computer device 412 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) through network adapter 420. As shown, network adapter 420 communicates with the other modules of computer device 412 over bus 418. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Array of Independent Disks (RAID) systems, tape drives, and data backup storage systems, to name a few.
The processor 416 executes various functional applications and data processing by executing programs stored in the storage device 428, for example, to implement the generation method of the machine learning model provided by the above-described embodiment of the present invention.
That is, when executing the program, the processor implements: obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: model generation data and a display information type; generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type; generating target display information matched with the display information type according to the generated machine learning model; and providing the target display information to the user.
Example five
An embodiment of the present invention further provides a computer storage medium storing a computer program which, when executed by a computer processor, performs the method for generating a machine learning model according to any one of the above embodiments of the present invention: obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: model generation data and a display information type; generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type; generating target display information matched with the display information type according to the generated machine learning model; and providing the target display information to the user.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A method for generating a machine learning model, comprising:
obtaining model association parameters input by a user through a human-computer interaction interface, wherein the model association parameters comprise: model generation data and a display information type;
generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type;
generating target display information matched with the display information type according to the generated machine learning model;
providing the target display information to the user;
the model-associated parameters further include: a model generation algorithm; the display information types include: a machine learning model file and a model report file matched with the model generation algorithm;
generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the presentation information type, comprising:
dividing the model generation data into model training subdata and model test subdata;
tuning, according to the model training subdata and the model test subdata, at least one hyper-parameter included in the model generation algorithm by using a Bayesian optimization algorithm within a limited number of trials;
and generating a machine learning model matched with the model generation algorithm according to the model generation data and the adjusted hyper-parameters.
2. The method of claim 1, wherein the model association parameters further comprise: a model generation algorithm; the display information types include: preset data characteristics matched with the model generation algorithm;
generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the presentation information type, comprising:
dividing the model generation data into model training subdata and model test subdata;
generating a machine learning model corresponding to the model generation algorithm according to preset hyper-parameters corresponding to the model generation algorithm and the model generation data;
generating target display information matched with the display information type according to the generated machine learning model, wherein the target display information comprises:
sorting the model generation data according to a set sorting rule;
and generating at least one preset data characteristic value matched with the model generation algorithm according to the generated machine learning model and the sorting result.
3. The method of claim 1, wherein the model association parameters further comprise: at least two model generation algorithms; the display information types include: a performance comparison of the model generation algorithms;
generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the presentation information type, comprising:
dividing the model generation data into model training subdata and model test subdata;
generating machine learning models respectively corresponding to the model generation algorithms according to preset hyper-parameters respectively corresponding to the model generation algorithms and the model generation data;
generating target display information matched with the display information type according to the generated machine learning model, wherein the target display information comprises:
and generating a model evaluation index corresponding to each model generation algorithm according to the generated machine learning model.
4. The method according to any one of claims 1-3, wherein obtaining model generation data input by a user through a human-computer interaction interface comprises:
obtaining model generation data uploaded by the user through the human-computer interaction interface; or
obtaining model generation data selected by the user from a data storage list through the human-computer interaction interface.
5. The method of any of claims 1-3, wherein partitioning the model generation data into model test subdata, comprises:
using the target data specified by the user in the model generation data as the model test subdata; or
using a set proportion of the model generation data as the model test subdata.
6. The method of claim 1, further comprising, prior to generating at least one machine learning model that matches at least one model generation algorithm based on the model generation data and the presentation information type:
and performing a preprocessing operation on the model generation data to enable the model generation data to meet model generation requirements, wherein the preprocessing operation comprises at least one of processing missing data values, processing character-type fields and processing unbalanced data.
7. An apparatus for generating a machine learning model, comprising:
the model association parameter acquisition module is used for acquiring model association parameters input by a user through a human-computer interaction interface, and the model association parameters comprise: model generation data and a display information type;
the machine learning model generation module is used for generating at least one machine learning model matched with at least one model generation algorithm according to the model generation data and the display information type;
the target display information generation module is used for generating target display information matched with the display information type according to the generated machine learning model;
the target display information providing module is used for providing the target display information to the user;
wherein the model-associated parameters further comprise: a model generation algorithm; the display information types include: a machine learning model file and a model report file matched with the model generation algorithm; the machine learning model generation module is specifically configured to divide the model generation data into model training subdata and model test subdata; to tune, according to the model training subdata and the model test subdata, at least one hyper-parameter included in the model generation algorithm by using a Bayesian optimization algorithm within a limited number of trials; and to generate a machine learning model matched with the model generation algorithm according to the model generation data and the tuned hyper-parameters.
8. A computer device, the device comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of generating a machine learning model of any of claims 1-6.
9. A computer storage medium having a computer program stored thereon, the program, when executed by a processor, implementing a method of generating a machine learning model according to any of claims 1-6.
CN201811143220.4A 2018-09-28 2018-09-28 Method, device, equipment and storage medium for generating machine learning model Active CN109409533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811143220.4A CN109409533B (en) 2018-09-28 2018-09-28 Method, device, equipment and storage medium for generating machine learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811143220.4A CN109409533B (en) 2018-09-28 2018-09-28 Method, device, equipment and storage medium for generating machine learning model

Publications (2)

Publication Number Publication Date
CN109409533A CN109409533A (en) 2019-03-01
CN109409533B true CN109409533B (en) 2021-07-27

Family

ID=65466418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811143220.4A Active CN109409533B (en) 2018-09-28 2018-09-28 Method, device, equipment and storage medium for generating machine learning model

Country Status (1)

Country Link
CN (1) CN109409533B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008121B (en) * 2019-03-19 2022-07-12 合肥中科类脑智能技术有限公司 Personalized test system and test method thereof
CN110569984B (en) * 2019-09-10 2023-04-14 Oppo广东移动通信有限公司 Configuration information generation method, device, equipment and storage medium
CN111221517A (en) * 2019-10-12 2020-06-02 中国平安财产保险股份有限公司 Model creating method and device, computer equipment and readable storage medium
CN110991649A (en) * 2019-10-28 2020-04-10 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Deep learning model building method, device, equipment and storage medium
CN112784181A (en) * 2019-11-08 2021-05-11 阿里巴巴集团控股有限公司 Information display method, image processing method, information display device, image processing equipment and information display device
CN113570062B (en) * 2020-04-28 2023-10-10 大唐移动通信设备有限公司 Machine learning model parameter transmission method and device
CN112685958B (en) * 2020-12-30 2022-11-01 西南交通大学 SiC MOSFET blocking voltage determination method based on neural network
CN112966439A (en) * 2021-03-05 2021-06-15 北京金山云网络技术有限公司 Machine learning model training method and device and virtual experiment box
CN113706285A (en) * 2021-07-08 2021-11-26 长江大学 Credit card fraud detection method
CN114579822B (en) * 2021-12-13 2023-05-30 北京市建筑设计研究院有限公司 Modeling tool pushing method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678580A (en) * 2013-12-07 2014-03-26 浙江大学 Multitask machine learning method for text classification and device thereof
CN104200087A (en) * 2014-06-05 2014-12-10 清华大学 Parameter optimization and feature tuning method and system for machine learning
CN106169096A (en) * 2016-06-24 2016-11-30 山西大学 A kind of appraisal procedure of machine learning system learning performance
CN106446230A (en) * 2016-10-08 2017-02-22 国云科技股份有限公司 Method for optimizing word classification in machine learning text
CN106897570A (en) * 2017-03-02 2017-06-27 山东师范大学 A kind of COPD test system based on machine learning
CN108416363A (en) * 2018-01-30 2018-08-17 平安科技(深圳)有限公司 Generation method, device, computer equipment and the storage medium of machine learning model

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0119890D0 (en) * 2001-08-15 2001-10-10 Proteom Ltd Apparatus and method for predicting rules of protein sequence interactions
CN101782976B (en) * 2010-01-15 2013-04-10 南京邮电大学 Automatic selection method for machine learning in cloud computing environment
CN102608285B (en) * 2012-02-21 2014-08-06 南京工业大学 Prediction method for explosion characteristics of organic mixture based on support vector machine
US9092665B2 (en) * 2013-01-30 2015-07-28 Aquifi, Inc Systems and methods for initializing motion tracking of human hands
CN105068661B (en) * 2015-09-07 2018-09-07 百度在线网络技术(北京)有限公司 Man-machine interaction method based on artificial intelligence and system
US20170249547A1 (en) * 2016-02-26 2017-08-31 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Holistic Extraction of Features from Neural Networks
CN106067040A (en) * 2016-06-01 2016-11-02 深圳市寒武纪智能科技有限公司 A kind of method by fragment interactive training machine learning image recognition algorithm model
US10032111B1 (en) * 2017-02-16 2018-07-24 Rockwell Collins, Inc. Systems and methods for machine learning of pilot behavior
CN107391603B (en) * 2017-06-30 2020-12-18 北京奇虎科技有限公司 User portrait establishing method and device for mobile terminal
CN107480435B (en) * 2017-07-31 2020-12-08 广东精点数据科技股份有限公司 Automatic search machine learning system and method applied to clinical data

Similar Documents

Publication Publication Date Title
CN109409533B (en) Method, device, equipment and storage medium for generating machine learning model
US11210569B2 (en) Method, apparatus, server, and user terminal for constructing data processing model
US11062090B2 (en) Method and apparatus for mining general text content, server, and storage medium
US20190362222A1 (en) Generating new machine learning models based on combinations of historical feature-extraction rules and historical machine-learning models
CN111143226B (en) Automatic test method and device, computer readable storage medium and electronic equipment
CN109240901B (en) Performance analysis method, performance analysis device, storage medium, and electronic apparatus
US11100917B2 (en) Generating ground truth annotations corresponding to digital image editing dialogues for training state tracking models
US9626164B1 (en) Test-driven development module for repository-based development
US11960858B2 (en) Performance based system configuration as preprocessing for system performance simulation
WO2021121296A1 (en) Exercise test data generation method and apparatus
US20230115163A1 (en) Method for processing data, and electronic device, storage medium and program product
KR20230006601A (en) Alignment methods, training methods for alignment models, devices, electronic devices and media
CN112632893B (en) Graph screening method and device, server and storage medium
JP2021500639A (en) Prediction engine for multi-step pattern discovery and visual analysis recommendations
CN112364185A (en) Method and device for determining characteristics of multimedia resource, electronic equipment and storage medium
US11593700B1 (en) Network-accessible service for exploration of machine learning models and results
CN112447173A (en) Voice interaction method and device and computer storage medium
CN114237588A (en) Code warehouse selection method, device, equipment and storage medium
US11681511B2 (en) Systems and methods for building and deploying machine learning applications
CN113190154B (en) Model training and entry classification methods, apparatuses, devices, storage medium and program
US11829890B2 (en) Automated machine learning: a unified, customizable, and extensible system
CN113495963B (en) Embedded representation method and device of network security knowledge graph
CN111274480B (en) Feature combination method and device for content recommendation
KR20190130395A (en) Apparatus and method for analyzing heterogeneous data based on web
CN115168577B (en) Model updating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant