CN111222553A - Training data processing method and device of machine learning model and computer equipment - Google Patents

Training data processing method and device of machine learning model and computer equipment

Info

Publication number
CN111222553A
Authority
CN
China
Prior art keywords
machine learning
characteristic
learning model
parameters
characteristic parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911403575.7A
Other languages
Chinese (zh)
Other versions
CN111222553B (en)
Inventor
饶慧林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201911403575.7A
Publication of CN111222553A
Application granted
Publication of CN111222553B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Feedback Control In General (AREA)

Abstract

The application provides a training data processing method and device of a machine learning model and computer equipment, and relates to the technical field of machine learning model training. The training data processing method of the machine learning model comprises the following steps: acquiring an updated feature parameter set of the training data of the machine learning model, wherein the feature parameter set comprises a plurality of candidate feature parameters; determining the range of the feature parameters to be selected according to the type of the machine learning model and the type of the feature parameters; within the range of the feature parameters, selecting feature parameters from the feature parameter set in sequence and inputting them into the machine learning model for training; and acquiring an output result of the machine learning model, calculating an AUC value of the output result, and selecting a target feature parameter from the feature parameter set as training data according to the AUC value. The training data processing scheme of the machine learning model can improve the efficiency of model training.

Description

Training data processing method and device of machine learning model and computer equipment
Technical Field
The application relates to the technical field of machine learning model training, in particular to a training data processing method and device of a machine learning model and computer equipment.
Background
In the training process of a machine learning model, features often need to be added to the model or modified. To enlarge the training samples for machine learning, different variables must be added to the features, or different combinations of the features must be input into the machine learning model one by one. This involves tedious training runs and waiting, so training efficiency is low.
Disclosure of Invention
In order to overcome the low training efficiency of current machine learning models, the following technical solutions are provided:
In a first aspect, the present application provides a method for processing training data of a machine learning model, including the following steps:
acquiring a characteristic parameter set of the updated training data of the machine learning model; wherein the feature parameter set comprises a plurality of candidate feature parameters;
determining the range of the characteristic parameters to be selected according to the type of the machine learning model and the type of the characteristic parameters;
in the range of the characteristic parameters, selecting characteristic parameters from the characteristic parameter set in sequence and inputting the characteristic parameters into the machine learning model for training;
and acquiring an output result of the machine learning model, calculating an AUC value of the output result, and selecting a target characteristic parameter from the characteristic parameter set as training data according to the AUC value.
In one embodiment, the step of obtaining the feature parameter set of the training data of the updated machine learning model includes:
and acquiring newly added or modified characteristic parameters of the training data of the machine learning model, and updating a characteristic parameter set.
In one embodiment, the step of sequentially selecting feature parameters from the feature parameter set within the range of the feature parameters and inputting the selected feature parameters into the machine learning model for training includes:
determining, according to the granularity of the characteristic parameters, the interval between two successively obtained characteristic parameters within the range of the characteristic parameters;
and sequentially acquiring each characteristic parameter within the range of the characteristic parameters according to the interval of the characteristic parameters.
In one embodiment, the step of determining the range of the feature parameters to be selected according to the type of the machine learning model and the type of the feature parameters includes:
confirming the value characteristics of the newly added or modified characteristic parameters according to the type of the machine learning model;
and determining the range of the characteristic parameter to be selected according to the value characteristics of the newly added or modified characteristic parameter.
In one embodiment, the step of sequentially obtaining each feature parameter within the range of the feature parameter according to the interval of the feature parameter includes:
when the characteristic parameters newly added or modified by the training data of the machine learning model are continuous characteristic parameters, sequentially acquiring each characteristic parameter according to the interval of the characteristic parameters in the range of the characteristic parameters;
and inputting each characteristic parameter into the machine learning model for training.
In one embodiment, the step of sequentially obtaining each feature parameter within the range of the feature parameter according to the interval of the feature parameter includes:
when training the machine learning model requires a plurality of characteristic parameters, the characteristic parameters comprise a discrete feature quantity;
and in the range of the characteristic parameters, sequentially acquiring the combination of the characteristic parameters corresponding to the characteristic quantity from the characteristic parameter set, and inputting the combination into the machine learning model for training.
In one embodiment, the step of sequentially acquiring combinations of feature parameters corresponding to feature quantities from the feature parameter set within the range of the feature parameters and inputting the combinations into the machine learning model for training includes:
and according to the corresponding feature quantity, sequentially acquiring all combinations of feature parameters from the feature parameter set, and inputting the combinations of feature parameters into the machine learning model one by one for training.
In a second aspect, the present application further provides a training data processing apparatus for a machine learning model, including:
the acquisition module is used for acquiring a characteristic parameter set of the updated training data of the machine learning model; wherein the feature parameter set comprises a plurality of candidate feature parameters;
the range determining module is used for determining the range of the characteristic parameters needing to be selected according to the type of the machine learning model and the type of the characteristic parameters;
the training module is used for sequentially selecting characteristic parameters from the characteristic parameter set in the range of the characteristic parameters and inputting the characteristic parameters into the machine learning model for training;
and the selection module is used for acquiring the output result of the machine learning model, calculating an AUC value of the output result, and selecting a target characteristic parameter from the characteristic parameter set as training data according to the AUC value.
In a third aspect, the present application further provides a computer device, comprising:
one or more processors;
a memory;
one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform any of the training data processing methods of the machine learning model provided in the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores thereon a computer program, and the computer program, when executed by a processor, implements the method for processing training data of a machine learning model according to any one of the aspects provided in the first aspect.
The training data processing method, the training data processing device and the computer equipment of the machine learning model have the beneficial effects that:
According to the training data processing method and device for the machine learning model and the computer device, while updating the feature parameter set and training the machine learning model, the computer device sets the range of the feature parameters, automatically acquires the corresponding training data one by one within the set range, and selects the optimal feature parameters as training data according to the AUC value corresponding to the result of each training run. This solves the current problem of low training efficiency, in which training samples of feature parameters are chosen based on manual experience and must be trained and waited on one by one, and thereby improves the efficiency of model training in machine learning.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flowchart of a training data processing method of a machine learning model according to an embodiment of the present application;
FIG. 2 is a detailed flowchart illustrating step S130 of a training data processing method of a machine learning model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating a training data processing method of a machine learning model according to another embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a training data processing method for a machine learning model according to another embodiment of the present disclosure;
FIG. 5 is a schematic flow chart corresponding to the embodiment of FIG. 4;
FIG. 6 is a flowchart illustrating a method for processing training data of a machine learning model according to yet another embodiment of the present application;
FIG. 7 is a schematic flow chart corresponding to the embodiment of FIG. 6;
FIG. 8 is a block diagram of a training data processing apparatus of a machine learning model provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of an internal structure of a computer device provided in an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring to fig. 1, fig. 1 is a flowchart illustrating a training data processing method of a machine learning model according to an embodiment of the present application.
The training data processing method of the machine learning model comprises the following steps:
S110, acquiring a characteristic parameter set of training data of the machine learning model to be updated; wherein the feature parameter set comprises a plurality of candidate feature parameters;
S120, determining the range of the characteristic parameters needing to be selected according to the type of the machine learning model and the type of the characteristic parameters;
S130, sequentially selecting characteristic parameters from the characteristic parameter set in the range of the characteristic parameters, and inputting the characteristic parameters into the machine learning model for training;
S140, obtaining an output result of the machine learning model, calculating an AUC value of the output result, and selecting a target characteristic parameter from the characteristic parameter set as training data according to the AUC value.
In steps S110 to S140, during training of the machine learning model, candidate feature parameters are added to the model according to its purpose of use, requirements, and test requirements, and the training data of the machine learning model is updated with these candidate feature parameters. The feature parameters of the training data of the machine learning model form feature parameter combinations. The feature parameter set includes both the original feature parameters and the updated feature parameters in the training data of the machine learning model, and provides the candidate feature parameters for training the model.
The range of the feature parameters differs according to the type of the machine learning model and the type of the feature parameters to be updated. Besides the interval formed by the minimum and maximum values, the range of a feature parameter also covers the combinations of feature parameters within that range. The combination form may be discrete or continuous.
According to the type and combination form of the feature parameters, feature parameters are selected in sequence from the feature parameter set within the range of the feature parameters, and the feature parameters obtained each time are input into the machine learning model for training. The range may constrain only the value of the feature parameter itself, or the value of the feature parameter may in turn determine which other feature parameters are obtained.
The model is trained once for each selection of feature parameters, which yields one output result per training run. The AUC value of each output result is calculated, the resulting AUC values are compared, and the optimal feature parameter is selected from the feature parameter set as the target feature parameter and input into the updated machine learning model as training data.
According to the training data processing method of the machine learning model, while the computer device trains the machine learning model and updates the feature parameter set, it sets the range of the feature parameters and, within that range, selects feature parameters from the feature parameter set in sequence according to their type and inputs them into the machine learning model for training, so as to obtain the target feature parameters. The method provided by the application solves the problem of low training efficiency in the existing training process, in which training samples of feature parameters are chosen based on operator experience and must be trained and waited on one by one.
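The selection loop of steps S130 to S140 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `auc`, `select_target`, and `train_and_predict` are hypothetical, the AUC is computed by direct pairwise comparison of positive and negative samples, and "optimal" is assumed to mean the highest AUC value.

```python
from typing import Callable, Iterable, Sequence, Tuple, TypeVar

C = TypeVar("C")  # type of one candidate feature parameter (or combination)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive sample scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_target(
    candidates: Iterable[C],
    train_and_predict: Callable[[C], Sequence[float]],
    labels: Sequence[int],
) -> Tuple[C, float]:
    """Train once per candidate, score each run by AUC, keep the best one."""
    best = None
    for candidate in candidates:
        scores = train_and_predict(candidate)  # hypothetical training hook
        value = auc(labels, scores)
        if best is None or value > best[1]:
            best = (candidate, value)
    if best is None:
        raise ValueError("no candidate feature parameters were supplied")
    return best
```

Here `train_and_predict` stands in for whatever training and evaluation routine the model actually uses; it only needs to return one score per labelled evaluation sample.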
Step S110 may further include:
S111, acquiring newly added or modified characteristic parameters of the training data of the machine learning model, and updating the characteristic parameter set.
During training of the machine learning model, the characteristic parameter set of the original training data is updated according to the purpose of use, requirements, and test requirements of the model. The update either modifies characteristic parameters in the original characteristic parameter set or adds new characteristic parameters to it. Characteristic parameters are then selected in sequence from the updated characteristic parameter set within the range of the characteristic parameters and input into the machine learning model for training.
Referring to fig. 2, fig. 2 is a detailed flowchart illustrating step S130 of a training data processing method of a machine learning model according to an embodiment of the present application.
Step S130 may further include:
S131, determining, according to the granularity of the characteristic parameters, the interval between two successively obtained characteristic parameters within the range of the characteristic parameters;
S132, sequentially acquiring each characteristic parameter within the range of the characteristic parameters according to the interval of the characteristic parameters.
In steps S131 to S132, the characteristic parameter is expressed as a numerical value. The granularity of the characteristic parameter is determined according to the test requirements, and the interval between the characteristic parameters used in two adjacent training runs is determined within the range of the characteristic parameters according to that granularity. The granularity must be chosen with the type of the characteristic parameter in mind. If the characteristic parameter represents a weight whose range is (0,1), the granularity is greater than 0 and less than 1; if the characteristic parameter represents the number of feature parameters obtained during training, the granularity is a positive integer greater than or equal to 1.
According to the interval of the characteristic parameters, the corresponding characteristic parameters are obtained in sequence within the range of the characteristic parameters and input into the machine learning model for training.
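As an illustration of stepping through a range at a fixed interval, the sketch below enumerates the values of one numeric parameter. The function name `values_in_range` and its `closed` flag are assumptions made for this example; exact rational arithmetic is used so that a step such as 0.02 does not accumulate floating-point drift.

```python
from fractions import Fraction
from typing import List, Union

Number = Union[int, str]  # pass decimals as strings, e.g. "0.02", to keep them exact


def values_in_range(low: Number, high: Number, step: Number, closed: bool = False) -> List[float]:
    """List the values between low and high that are `step` apart.

    closed=False models an open range such as (0, 1): both ends are excluded
    and the first value is low + step.
    closed=True models a closed range such as [1, 3]: both ends are included.
    """
    lo, hi, st = Fraction(low), Fraction(high), Fraction(step)
    if st <= 0:
        raise ValueError("step must be positive")
    values = []
    current = lo if closed else lo + st
    while current < hi or (closed and current == hi):
        values.append(float(current))
        current += st
    return values


weights = values_in_range(0, 1, "0.02")          # weight in (0, 1), granularity 0.02
counts = values_in_range(1, 3, 1, closed=True)   # feature count in [1, 3], granularity 1
```

With these arguments, `weights` runs from 0.02 up to 0.98 and `counts` holds 1, 2, and 3.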
Referring to fig. 3, fig. 3 is a flowchart illustrating a training data processing method of a machine learning model according to another embodiment of the present application.
Building on the refinement of step S110 above, step S120 may further include:
S121, confirming the value characteristics of the newly added or modified characteristic parameters according to the type of the machine learning model;
S122, determining the range of the characteristic parameter to be selected according to the value characteristics of the newly added or modified characteristic parameter.
In steps S121-S122, the computer device obtains the model file of the machine learning model and determines the type of the model from it. The updated characteristic parameters, which include newly added and modified characteristic parameters, are determined according to the type of the machine learning model and the purpose of model training. The value characteristics are then determined according to the type of the newly added or modified characteristic parameters. The value characteristics include the value range of the characteristic parameter, and may also include the number of other characteristic parameters to be obtained from the training data.
The range of the characteristic parameter to be selected is then determined according to the value characteristics of the newly added or modified characteristic parameter.
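One way to picture the mapping from value characteristics to a range is the sketch below. It is an assumption-laden illustration: the kind labels `"weight"` and `"feature_count"`, the `ParamRange` record, and the default step of 0.02 are all hypothetical, and the dependence on the model type described in the text is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    low: float
    high: float
    closed: bool   # True: ends included, e.g. [1, 4]; False: ends excluded, e.g. (0, 1)
    step: float    # interval between successive values


def determine_range(kind: str, n_candidates: int = 0, step: float = 0.02) -> ParamRange:
    """Map the value characteristics of a parameter to the range to search."""
    if kind == "weight":
        # A weight takes values in the open interval (0, 1).
        return ParamRange(0.0, 1.0, closed=False, step=step)
    if kind == "feature_count":
        # A feature count is a positive integer up to the number of candidates.
        if n_candidates < 1:
            raise ValueError("feature_count needs at least one candidate")
        return ParamRange(1, n_candidates, closed=True, step=1)
    raise ValueError(f"unknown parameter kind: {kind}")
```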
Referring to fig. 4, fig. 4 is a flowchart illustrating a training data processing method of a machine learning model according to yet another embodiment of the present application.
On this basis, step S132 may further include:
S11, when the newly added or modified characteristic parameters of the training data of the machine learning model are continuous characteristic parameters, sequentially acquiring each characteristic parameter according to the interval of the characteristic parameters in the range of the characteristic parameters;
S12, inputting each characteristic parameter into the machine learning model for training.
In steps S11-S12, when the feature parameters added or modified in the training data of the machine learning model are continuous, each feature parameter is sequentially obtained according to the interval between two adjacent feature parameters of the feature parameter within the range of the feature parameters, and each feature parameter is input into the machine learning model for training.
In order to more clearly illustrate the execution of the steps S11-S12, a specific embodiment is described below:
Spam is detected by using an LS model. The specific function of the LS model is: x1·y1 + x2·y2 + ... + xn·yn. The value of the summed function lies in (0, 1). For spam, the result of the function tends to 1; for non-spam, the result tends to 0. Spam is detected according to the calculated result of the function. Here x1, x2, ..., xn are variables, and y1, y2, ..., yn are the values obtained by evaluating different parts of the mail sample. For example, y1, y2, ..., yn may characterize the sender, subject, body, and so on of the mail, respectively. To measure how strongly each part of the mail influences spam detection, a weight feature parameter is added to the model: different weights are assigned to different parts of the mail, the feature parameter is input into the model, and the result of the function is compared with the actual situation of the prediction sample.
In this example, the computer device assigns weights a, b, and c to different parts of the mail, where a, b, and c each lie in (0,1) and 0 < a + b + c < 1. Based on detection experience, the interval of the feature parameters, that is, the step between successive values of a, b, and c, is 0.02, so the smallest value each of them can take is 0.02. The weights are combined according to this relation to form the feature parameter (a, b, c), of which a, b, and c are sub-parameters; each sub-parameter is a continuous feature parameter. Any combination (a, b, c) is allowed as long as it satisfies the value conditions above. Each sub-parameter starts at 0.02 and increases step by step toward 1. Taking sub-parameter a as an example, its values are 0.02, 0.04, 0.06, ..., and sub-parameters b and c take values in the same way. From all values of a, b, and c, every combination satisfying 0 < a + b + c < 1 is obtained and input into the machine learning model for training.
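The enumeration of weight combinations in this example can be sketched as below. The function name `weight_combinations` and its parameters are hypothetical; the weights are counted in integer multiples of the step (50 steps per unit for a step of 0.02) so that the condition a + b + c < 1 is checked exactly rather than with floating-point sums.

```python
from itertools import product
from typing import Iterator, Tuple


def weight_combinations(n_parts: int = 3, steps_per_unit: int = 50) -> Iterator[Tuple[float, ...]]:
    """Yield every weight tuple whose entries are multiples of 1/steps_per_unit,
    each strictly between 0 and 1, with a total strictly between 0 and 1."""
    ticks = range(1, steps_per_unit)  # 1..49 corresponds to 0.02..0.98
    for combo in product(ticks, repeat=n_parts):
        if sum(combo) < steps_per_unit:  # integer form of a + b + c < 1
            yield tuple(t / steps_per_unit for t in combo)


combos = list(weight_combinations())
print(len(combos), combos[0])
```

With three sub-parameters and a step of 0.02 this produces 18424 combinations, starting with (0.02, 0.02, 0.02), which gives a sense of how many training runs an operator would otherwise have to set up by hand.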
Referring to fig. 5, fig. 5 is a flowchart corresponding to an embodiment of fig. 4.
The process of training data processing and training for the machine learning model may include the steps of:
S51, adding characteristic parameters according to the training purpose of the model;
S52, if the added characteristic parameter is continuous, sequentially forming each characteristic parameter combination that lies within the range of the characteristic parameter and satisfies the value conditions of the added characteristic parameter;
S53, inputting the characteristic parameter combinations to the machine learning model one by one;
S54, obtaining an output result of each training of the machine learning model, and calculating an AUC value corresponding to the output result.
If the modified characteristic parameters are continuous characteristic parameters, the computer device can, according to instructions, take successive values of the modified characteristic parameters within the range of the characteristic parameters and input them into the machine learning model one by one for training.
As can be seen from the above embodiment, a large number of feature parameters are involved in training a machine learning model, and when several sub-parameters of a feature parameter are combined, the amount of data involved in training becomes huge. If an operator has to obtain the feature parameters one by one based on experience, omissions occur easily and the result of model training is affected. In the present application, the computer device can sequentially acquire continuous feature parameters within the range of the feature parameters according to their interval and input them into the machine learning model for training, which improves both the efficiency and the completeness of model training.
Referring to fig. 6, fig. 6 is a flowchart illustrating a training data processing method of a machine learning model according to still another embodiment of the present application.
In addition, step S132 may further include:
S21, when training the machine learning model requires a plurality of characteristic parameters, the characteristic parameters comprise a discrete feature quantity;
S22, sequentially acquiring combinations of the characteristic parameters corresponding to the feature quantity from the characteristic parameter set in the range of the characteristic parameters, and inputting the combinations into the machine learning model for training.
In steps S21-S22 of this embodiment, the feature parameter set required for training the machine learning model includes a plurality of candidate feature parameters. The feature parameter set also includes a discrete feature quantity, which specifies how many of the other feature parameters are to be extracted during training of the machine learning model. For example, suppose the feature parameters of the machine learning model include 5 candidate feature parameters in addition to the discrete feature quantity, and only some of them are extracted for a given training run. The discrete feature quantity then determines whether 1 to 3 or 2 to 5 feature parameters are extracted for training, and its range is correspondingly [1,3] or [2,5]. The values within this range are discrete integers.
On this basis, the step S22 may further include:
S221, according to each corresponding feature quantity, sequentially acquiring all combinations of feature parameters from the feature parameter set, and inputting the combinations into the machine learning model one by one for training.
In step S221, combinations of feature parameters corresponding to each feature quantity are sequentially obtained from the feature parameter set according to the range of the feature parameter. Taking the range [1,3] as an example, the feature quantity may be 1, 2, or 3; that is, 1, 2, or 3 feature parameters are extracted from the feature parameter set, and the machine learning model is trained on the extracted feature parameters.
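As a minimal sketch of step S221, assuming the candidate feature parameters are held in a Python list and the range is an inclusive pair of integers, the combinations could be enumerated as follows. The function name `enumerate_feature_combinations` and the placeholder names `f1` to `f5` are illustrative and do not appear in the application.

```python
from itertools import combinations


def enumerate_feature_combinations(candidates, count_range):
    """Yield every combination of candidates whose size lies in count_range.

    count_range is an inclusive (minimum, maximum) pair, e.g. (1, 3).
    """
    low, high = count_range
    # Never ask for more features than the candidate set contains.
    high = min(high, len(candidates))
    for count in range(low, high + 1):
        # combinations() emits each subset of the given size exactly once.
        yield from combinations(candidates, count)


# Hypothetical candidate feature parameters (placeholder names).
candidates = ["f1", "f2", "f3", "f4", "f5"]
combos = list(enumerate_feature_combinations(candidates, (1, 3)))
print(len(combos))  # 5 + 10 + 10 = 25 combinations
```

Each tuple in `combos` would then be passed to one training run.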
To illustrate more clearly how steps S21-S22 are executed when the feature parameters include a discrete feature quantity, a specific embodiment is described below:
The model is used to detect the factors that influence how long viewers stay online in a live broadcast room. In this embodiment, the machine learning model may include 4 candidate feature parameters, such as the layout theme of the live broadcast room, the live broadcast item of the anchor, the live broadcast time period, and the type of the anchor.
The number of candidate feature parameters is 4, and according to the training requirements of a general machine learning model, the range of the discrete feature quantity is [1,4]. During training of the model, the number of feature parameters given by each feature quantity in the range is extracted in turn, and the combination formed by those feature parameters is input to the machine learning model for training. Each extraction yields a different combination.
Moreover, the feature parameter set may also include necessary feature parameters, such as the online frequency of the viewer, the online duration of the viewer, and the historical preferences of the viewer. In this case, the feature parameter set of the machine learning model includes the necessary feature parameters, the candidate feature parameters, and the discrete feature quantity, and the discrete feature quantity applies only to the number of candidate feature parameters.
The feature parameter combinations formed by the extracted candidate feature parameters and the necessary feature parameters are input into the machine learning model one by one for training.
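The live broadcast example can be sketched as follows, assuming the necessary feature parameters are always kept and only the candidate ones are varied over the range [1,4]. The identifiers paraphrase the features named in this embodiment and are illustrative; `build_training_feature_sets` is a hypothetical helper.

```python
from itertools import combinations

# Illustrative identifiers for the features named in this embodiment.
NECESSARY = (
    "viewer_online_frequency",
    "viewer_online_duration",
    "viewer_historical_preference",
)
CANDIDATES = (
    "room_layout_theme",
    "anchor_live_item",
    "live_time_period",
    "anchor_type",
)


def build_training_feature_sets(necessary, candidates, count_range):
    """Combine the fixed necessary features with each candidate subset."""
    low, high = count_range
    feature_sets = []
    for count in range(low, high + 1):
        for chosen in combinations(candidates, count):
            # Necessary features are present in every training input.
            feature_sets.append(tuple(necessary) + chosen)
    return feature_sets


feature_sets = build_training_feature_sets(NECESSARY, CANDIDATES, (1, 4))
print(len(feature_sets))  # 4 + 6 + 4 + 1 = 15 feature sets
```

With 4 candidates and the range [1,4], every non-empty subset of the candidates is covered, which gives 15 training inputs.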
Referring to fig. 7, fig. 7 is a flowchart corresponding to an embodiment of fig. 6.
The process of training data processing and training for the machine learning model may include the steps of:
S71, acquiring the necessary feature parameters and the candidate feature parameters according to the adjustment purpose of the model;
S72, determining the discrete feature quantity according to the candidate feature parameters;
S73, determining the minimum value and the maximum value of the discrete feature quantity by referring to empirical values, thereby forming the range of the discrete feature quantity;
S74, according to the range, sequentially obtaining the combinations of candidate feature parameters corresponding to each feature quantity;
S75, combining the candidate feature parameters extracted each time with the necessary feature parameters, and inputting the combined feature parameters into the machine learning model one by one;
S76, obtaining the output result of each training of the machine learning model, and calculating the AUC value corresponding to the output result;
S77, selecting target feature parameters from the feature parameter set according to the AUC values to serve as training data.
According to the training data processing method of the machine learning model, the computer device limits the range of the feature quantity used for model training, extracts the candidate feature parameters according to that range, and trains the machine learning model on them one by one. In this way, the computer device can obtain all feature parameter combinations available for training within the allowable number of candidate feature parameters and input them into the machine learning model, so that training data can be acquired quickly and the machine learning model can be trained. After the AUC values of the training results are compared, the feature parameters corresponding to the AUC value closest to 1 are taken as the target feature parameters and used as training data when the model subsequently detects data.
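Steps S76-S77 can be sketched as below. This is an illustration under stated assumptions, not the implementation of the application: the AUC is computed with the pairwise (Mann-Whitney) formulation in plain Python, `train_and_predict` is a hypothetical caller-supplied function that trains on one feature set and returns prediction scores, and "closest to 1" is read as the largest AUC. In practice a library routine such as scikit-learn's `roc_auc_score` could replace the hand-written `auc`.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


def select_target_features(feature_sets, train_and_predict, labels):
    """Return the feature set with the highest AUC, and that AUC."""
    best_set, best_auc = None, float("-inf")
    for feature_set in feature_sets:
        scores = train_and_predict(feature_set)  # hypothetical training call
        value = auc(labels, scores)
        if value > best_auc:
            best_set, best_auc = feature_set, value
    return best_set, best_auc


# Toy data standing in for real training runs (made-up scores).
labels = [1, 0, 1, 0]
toy_scores = {
    ("a",): [0.9, 0.8, 0.4, 0.3],
    ("a", "b"): [0.9, 0.2, 0.8, 0.1],
}
best, value = select_target_features(list(toy_scores), toy_scores.get, labels)
print(best, value)  # ('a', 'b') 1.0
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a rank-based computation on real data.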
Referring to fig. 8, fig. 8 is a schematic structural diagram of a training data processing apparatus of a machine learning model according to an embodiment of the present application.
Based on the same inventive concept as the training data processing method of the machine learning model, the embodiment of the present application further provides a training data processing apparatus of a machine learning model, including:
an obtaining module 81, configured to obtain a feature parameter set of the updated training data of the machine learning model; wherein the feature parameter set comprises a plurality of candidate feature parameters;
a range determining module 82, configured to determine a range of the feature parameter to be selected according to the type of the machine learning model and the type of the feature parameter;
the training module 83 is configured to select feature parameters from the feature parameter set in sequence in the range of the feature parameters, and input the feature parameters into the machine learning model for training;
and the selecting module 84 is configured to obtain an output result of the machine learning model, calculate an AUC value of the output result, and select a target feature parameter from the feature parameter set according to the AUC value as training data.
Referring to fig. 9, fig. 9 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application. As shown in fig. 9, the computer device includes a processor 91, a storage medium 92, a memory 93, and a network interface 94, which are connected by a system bus. The storage medium 92 of the computer device stores an operating system, a database, and computer-readable instructions. The database may store control information sequences. The computer-readable instructions, when executed by the processor 91, may cause the processor 91 to implement a training data processing method of a machine learning model, and the processor 91 may implement the functions of the obtaining module 81, the range determining module 82, the training module 83, and the selecting module 84 of the training data processing apparatus in the embodiment shown in fig. 8. The processor 91 provides computing and control capabilities and supports the operation of the entire computer device. The memory 93 may also store computer-readable instructions that, when executed by the processor 91, cause the processor 91 to perform the training data processing method of a machine learning model. The network interface 94 is used for connecting to and communicating with a terminal. Those skilled in the art will appreciate that the structure shown in fig. 9 is merely a block diagram of the part of the structure related to the present solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, the present application also proposes a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of: acquiring a characteristic parameter set of the updated training data of the machine learning model; wherein the feature parameter set comprises a plurality of candidate feature parameters; determining the range of the characteristic parameters to be selected according to the type of the machine learning model and the type of the characteristic parameters; in the range of the characteristic parameters, selecting characteristic parameters from the characteristic parameter set in sequence and inputting the characteristic parameters into the machine learning model for training; and acquiring an output result of the machine learning model, calculating an AUC value of the output result, and selecting a target characteristic parameter from the characteristic parameter set as training data according to the AUC value.
In summary of the above embodiments, the main beneficial effects of the present application are as follows:
According to the training data processing method of the machine learning model, when the computer device updates the feature set and trains the machine learning model, the range of the feature parameters is set, the corresponding training data are automatically acquired one by one according to that setting, and the optimal feature parameters are obtained as training data according to the AUC value corresponding to the result of each training. This solves the problem of low training efficiency caused by the current practice of selecting training samples of feature parameters by manual experience, training on them one by one, and waiting for each result, and thereby improves the efficiency of model training in machine learning.
Further, the updated feature parameters include newly added or modified feature parameters.
The interval between two successively obtained feature parameter values within the range of the feature parameter is determined according to the granularity of the feature parameter.
When the newly added or modified feature parameters of the training data of the machine learning model are continuous feature parameters, each feature parameter value can be obtained in sequence within the range of the feature parameter according to the interval, and the values are input into the machine learning model one by one for training. In this way, the method remains applicable when a feature parameter corresponds to a large number of training samples, and it avoids the omissions that can occur when values are entered manually one by one, so the accuracy of model training can be improved along with its efficiency.
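For a continuous feature parameter, the stepping described above can be sketched as follows. The function `values_in_range` and the example range [0.1, 0.5] with a granularity of 0.1 are assumptions chosen for illustration; the application does not fix these values here.

```python
def values_in_range(low, high, interval):
    """Return low, low + interval, ... up to and including high."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Count the steps once instead of repeatedly adding the interval,
    # so floating-point error does not accumulate across iterations.
    steps = int((high - low) / interval + 1e-9)
    # Rounding removes representation noise such as 0.30000000000000004.
    return [round(low + i * interval, 10) for i in range(steps + 1)]


# Hypothetical continuous feature parameter: range [0.1, 0.5], granularity 0.1.
for value in values_in_range(0.1, 0.5, 0.1):
    print(value)  # each value would be fed to one training run
```

A finer granularity yields more values and therefore more training runs, so the interval trades coverage of the range against training time.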
When training the machine learning model requires a plurality of feature parameters, a discrete feature quantity is added to the feature parameters. The combinations of feature parameters corresponding to each feature quantity can then be obtained in sequence from the feature parameter set according to the range of the feature parameter, and the different combinations are input into the machine learning model for training. In this way, all combinations of the feature parameters can be obtained quickly and trained on, which makes the training more comprehensive and improves training efficiency and accuracy.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A training data processing method of a machine learning model is characterized by comprising the following steps:
acquiring a characteristic parameter set of the updated training data of the machine learning model; wherein the feature parameter set comprises a plurality of candidate feature parameters;
determining the range of the characteristic parameters to be selected according to the type of the machine learning model and the type of the characteristic parameters;
in the range of the characteristic parameters, selecting characteristic parameters from the characteristic parameter set in sequence and inputting the characteristic parameters into the machine learning model for training;
and acquiring an output result of the machine learning model, calculating an AUC value of the output result, and selecting a target characteristic parameter from the characteristic parameter set as training data according to the AUC value.
2. The method of claim 1,
the step of obtaining the feature parameter set of the updated training data of the machine learning model includes:
and acquiring newly added or modified characteristic parameters of the training data of the machine learning model, and updating a characteristic parameter set.
3. The method of claim 2,
the step of sequentially selecting characteristic parameters from the characteristic parameter set within the range of the characteristic parameters and inputting the characteristic parameters into the machine learning model for training includes:
determining the interval of the characteristic parameters obtained twice in the range of the characteristic parameters according to the granularity of the characteristic parameters;
and sequentially acquiring each characteristic parameter within the range of the characteristic parameters according to the interval of the characteristic parameters.
4. The method of claim 3,
the step of determining the range of the feature parameters to be selected according to the type of the machine learning model and the type of the feature parameters includes:
confirming the value characteristics of the newly added or modified characteristic parameters according to the type of the machine learning model;
and determining the range of the characteristic parameter to be selected according to the value characteristics of the newly added or modified characteristic parameter.
5. The method of claim 3,
the step of sequentially acquiring each characteristic parameter within the range of the characteristic parameter according to the interval of the characteristic parameter comprises the following steps:
when the characteristic parameters newly added or modified by the training data of the machine learning model are continuous characteristic parameters, sequentially acquiring each characteristic parameter according to the interval of the characteristic parameters in the range of the characteristic parameters;
and inputting each characteristic parameter into the machine learning model for training.
6. The method of claim 4,
the step of sequentially acquiring each characteristic parameter within the range of the characteristic parameter according to the interval of the characteristic parameter comprises the following steps:
when the training of the machine learning model requires a plurality of characteristic parameters for training, the characteristic parameters comprise discrete characteristic quantity;
and in the range of the characteristic parameters, sequentially acquiring the combination of the characteristic parameters corresponding to the characteristic quantity from the characteristic parameter set, and inputting the combination into the machine learning model for training.
7. The method of claim 6,
the step of sequentially acquiring the combination of the characteristic parameters corresponding to the characteristic quantity from the characteristic parameter set in the range of the characteristic parameters and inputting the combination of the characteristic parameters into the machine learning model for training comprises the following steps:
and according to the corresponding feature quantity, sequentially acquiring all combinations of feature parameters from the feature parameter set, and inputting the combinations of feature parameters into the machine learning model one by one for training.
8. A training data processing apparatus for a machine learning model, comprising:
the acquisition module is used for acquiring a characteristic parameter set of the updated training data of the machine learning model; wherein the feature parameter set comprises a plurality of candidate feature parameters;
the range determining module is used for determining the range of the characteristic parameters needing to be selected according to the type of the machine learning model and the type of the characteristic parameters;
the training module is used for sequentially selecting characteristic parameters from the characteristic parameter set in the range of the characteristic parameters and inputting the characteristic parameters into the machine learning model for training;
and the selection module is used for acquiring the output result of the machine learning model, calculating an AUC value of the output result, and selecting a target characteristic parameter from the characteristic parameter set as training data according to the AUC value.
9. A computer device, comprising:
one or more processors;
a memory;
one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform the training data processing method of the machine learning model according to any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the training data processing method of the machine learning model of any one of claims 1 to 7.
CN201911403575.7A 2019-12-30 2019-12-30 Training data processing method and device of machine learning model and computer equipment Active CN111222553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911403575.7A CN111222553B (en) 2019-12-30 2019-12-30 Training data processing method and device of machine learning model and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911403575.7A CN111222553B (en) 2019-12-30 2019-12-30 Training data processing method and device of machine learning model and computer equipment

Publications (2)

Publication Number Publication Date
CN111222553A true CN111222553A (en) 2020-06-02
CN111222553B CN111222553B (en) 2023-08-29

Family

ID=70830968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911403575.7A Active CN111222553B (en) 2019-12-30 2019-12-30 Training data processing method and device of machine learning model and computer equipment

Country Status (1)

Country Link
CN (1) CN111222553B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160078361A1 (en) * 2014-09-11 2016-03-17 Amazon Technologies, Inc. Optimized training of linear machine learning models
US20190332938A1 (en) * 2017-02-24 2019-10-31 Deepmind Technologies Limited Training machine learning models
JP2019049975A (en) * 2017-09-07 2019-03-28 富士通株式会社 Training apparatus and method for deep learning classification model
CN109800884A (en) * 2017-11-14 2019-05-24 阿里巴巴集团控股有限公司 Processing method, device, equipment and the computer storage medium of model parameter
US20190362222A1 (en) * 2018-05-22 2019-11-28 Adobe Inc. Generating new machine learning models based on combinations of historical feature-extraction rules and historical machine-learning models
CN109409528A (en) * 2018-09-10 2019-03-01 平安科技(深圳)有限公司 Model generating method, device, computer equipment and storage medium
CN109325541A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training pattern
CN109816116A (en) * 2019-01-17 2019-05-28 腾讯科技(深圳)有限公司 The optimization method and device of hyper parameter in machine learning model
CN110532466A (en) * 2019-08-21 2019-12-03 广州华多网络科技有限公司 Processing method, device, storage medium and the equipment of platform training data is broadcast live

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783872A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Method and device for training model, electronic equipment and computer readable storage medium
CN111783872B (en) * 2020-06-30 2024-02-02 北京百度网讯科技有限公司 Method, device, electronic equipment and computer readable storage medium for training model
WO2022141751A1 (en) * 2020-12-31 2022-07-07 杭州富加镓业科技有限公司 High-resistance gallium oxide prediction method and preparation method based on deep learning and bridgman-stockbarger method
WO2022141759A1 (en) * 2020-12-31 2022-07-07 杭州富加镓业科技有限公司 Gallium oxide quality prediction method based on deep learning and czochralski method, and preparation method and system
US12026616B2 (en) 2020-12-31 2024-07-02 Hangzhou Fujia Gallium Technology Co. Ltd. Preparation method of high resistance gallium oxide based on deep learning and vertical bridgman growth method
US20230196378A1 (en) * 2021-12-21 2023-06-22 International Business Machines Corporation Carbon emission bounded machine learning

Also Published As

Publication number Publication date
CN111222553B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210115

Address after: 511442 3108, 79 Wanbo 2nd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant after: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 29th floor, building B-1, Wanda Plaza, Wanbo business district, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200602

Assignee: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

Assignor: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Contract record no.: X2021440000054

Denomination of invention: Training data processing method, device and computer equipment of machine learning model

License type: Common License

Record date: 20210208

GR01 Patent grant