CN114254763B - Machine learning model repairing method, system, computer equipment and storage medium - Google Patents

Info

Publication number
CN114254763B
Authority
CN
China
Prior art keywords
machine learning
learning model
training
preset
repair
Legal status
Active
Application number
CN202111620425.9A
Other languages
Chinese (zh)
Other versions
CN114254763A (en
Inventor
沈超
张笑宇
蔺琛皓
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Application filed by Xian Jiaotong University
Priority to CN202111620425.9A
Publication of CN114254763A
Application granted
Publication of CN114254763B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The invention belongs to the field of machine learning and discloses a machine learning model repairing method, system, computer device and storage medium. The method comprises the following steps: S1: acquiring a machine learning model to be trained; S2: training the machine learning model on a preset training set and acquiring preset monitoring index data of the machine learning model during training; S3: judging, according to the acquired preset monitoring index data, whether the machine learning model has a training problem; S4: when the machine learning model has no training problem, outputting the current machine learning model and ending the repair; S5: when the machine learning model has a training problem, sequentially selecting training problem repair strategies from a preset training problem repair strategy set and, each time a training problem repair strategy is selected, performing a repair step: repairing the machine learning model according to the selected training problem repair strategy and repeating S2-S4. The method is efficient and of low complexity, can effectively detect and repair training problems, and reduces labor cost.

Description

Machine learning model repairing method, system, computer equipment and storage medium
Technical Field
The invention belongs to the field of machine learning, and relates to a machine learning model repairing method, a system, computer equipment and a storage medium.
Background
As machine learning becomes widely used across industries, machine learning security is attracting growing research interest. However, as research and applications develop, machine learning models keep growing larger and their training processes longer, so any training problem may cause a long-running training process to fail, wasting a great deal of time and computing resources.
At present, most research and tooling related to the machine learning model training process focuses on visualizing training. Such work can usually record and visualize, in real time, the parameter changes and the inputs and outputs of the training process, but it lacks the ability to detect training problems and to suggest modifications, which limits its effectiveness and usability. Because training is affected by factors such as random initialization, the training process has a degree of randomness. Developers currently often have to judge manually, after training has finished, whether a problem occurred and then repair it by hand, which consumes a large amount of manpower, time and resources and greatly reduces both development efficiency and the quality of model training.
Disclosure of Invention
The present invention aims to overcome the above drawbacks of the prior art and to provide a machine learning model repairing method, system, computer device and storage medium.
To achieve this purpose, the invention adopts the following technical scheme:
In a first aspect of the present invention, a machine learning model repairing method includes the following steps:
S1: acquiring a machine learning model to be trained;
S2: training the machine learning model on a preset training set, and acquiring preset monitoring index data of the machine learning model during training;
S3: judging, according to the acquired preset monitoring index data, whether the machine learning model has a training problem;
S4: when the machine learning model has no training problem, outputting the current machine learning model and ending the repair;
S5: when the machine learning model has a training problem, sequentially selecting training problem repair strategies from a preset training problem repair strategy set and, each time a training problem repair strategy is selected, performing a repair step: repairing the machine learning model according to the selected training problem repair strategy, and repeating S2-S4.
Optionally, the machine learning model to be trained is a text classification machine learning model, an image classification machine learning model or an audio recognition machine learning model.
Optionally, the preset monitoring indexes include accuracy, model gradient and model weight.
Optionally, the training problems include the gradient explosion problem, the gradient vanishing problem, the dead ReLU problem, the unstable convergence problem and the slow convergence problem; the specific method for judging whether the machine learning model has a training problem according to the acquired preset monitoring index data is as follows:
when the ratio of the model gradient of the input layer to the model gradient of the output layer of the machine learning model exceeds a first preset threshold and the accuracy of the machine learning model is lower than a preset accuracy threshold, or when NaN values exist in the model gradient or model weight of the machine learning model, the machine learning model has a training problem, namely the gradient explosion problem;
when the ratio of the model gradient of the input layer to the model gradient of the output layer of the machine learning model falls below a second preset threshold and the model gradient of the input layer is smaller than a third preset threshold, the machine learning model has a training problem, namely the gradient vanishing problem;
when the proportion of zero values in the model gradient of the machine learning model exceeds a preset proportion, the activation function is a ReLU activation function, and the accuracy of the machine learning model is lower than a preset accuracy threshold, the machine learning model has a training problem, namely the dead ReLU problem;
when the jitter amplitude of the accuracy of the machine learning model exceeds a preset jitter amplitude threshold and this persists for more than a preset number of training iterations, the machine learning model has a training problem, namely the unstable convergence problem;
when the accuracy of the machine learning model is lower than a preset accuracy threshold and the maximum improvement between two adjacent training steps does not exceed a fourth preset threshold, the machine learning model has a training problem, namely the slow convergence problem.
Optionally, the first preset threshold is 70, the second preset threshold is 1e-3, the third preset threshold is 1e-4, the fourth preset threshold is 0.01, the preset jitter amplitude threshold is 0.03, and the preset number of training iterations is 20% of the total number of training iterations.
Optionally, the training problem repair strategies include adding a batch regularization layer, modifying the activation function of the machine learning model, clipping the gradient of the machine learning model, modifying the initializer of the machine learning model, adjusting the training batch size, adjusting the learning rate of the training optimizer, and replacing the training optimizer.
Optionally, step S5 further includes: when all training problem repair strategies in the training problem repair strategy set have been selected, outputting the current machine learning model and the training problem of the machine learning model.
In a second aspect of the present invention, a machine learning model repair system includes:
the model acquisition module is used for acquiring a machine learning model to be trained;
the index acquisition module is used for training the machine learning model through a preset training set and acquiring preset monitoring index data of the machine learning model in the training process;
the training problem confirmation module is used for judging whether the machine learning model has a training problem or not according to the acquired preset monitoring index data;
the repair module is used for sequentially selecting the training problem repair strategies in the preset training problem repair strategy set when the machine learning model has training problems, and performing repair steps when one training problem repair strategy is selected: repairing the machine learning model according to the selected training problem repairing strategy, and triggering an index acquisition module and a training problem confirmation module;
and the output module is used for outputting the current machine learning model and ending the repair when the machine learning model has no training problem.
In a third aspect of the present invention, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the machine learning model repairing method described above when executing the computer program.
In a fourth aspect of the present invention, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the machine learning model repairing method described above.
Compared with the prior art, the invention has the following beneficial effects:
According to the machine learning model repairing method of the invention, the preset monitoring index data of the machine learning model are acquired in real time during training, and whether the machine learning model has a training problem is judged from the acquired data. When the machine learning model has a training problem, training problem repair strategies are selected in sequence from a preset training problem repair strategy set, and a repair step is performed each time a strategy is selected, so that repair is automatic. Detection and repair of training problems are simple and convenient to operate, and the user needs little prior knowledge of machine learning model training: by supplying only the machine learning model and the training set, any user can have training problems monitored automatically and finally obtain a well-performing machine learning model, without manually observing data, analyzing training problems or searching for repair methods. The method is simple to implement, has low complexity and repairs training problems efficiently; it can effectively improve the training efficiency of the machine learning model, reduce the resource cost of training and safeguard the quality of the final machine learning model.
Drawings
FIG. 1 is a flow chart of a machine learning model repair method of the present invention;
FIG. 2 is a diagram illustrating a machine learning model training problem detection in accordance with one embodiment of the present invention;
FIG. 3 is a diagram illustrating a machine learning model training problem repair result according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a machine learning model training problem detection in accordance with another embodiment of the present invention;
FIG. 5 is a diagram illustrating the result of machine learning model training problem repair in accordance with another embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to fig. 1, one embodiment of the present invention provides a machine learning model repairing method which, for a given machine learning model and its training process, monitors the training process efficiently, automatically and in real time, and automatically repairs any training problems found.
Specifically, the machine learning model repairing method comprises the following steps:
S1: Acquire a machine learning model to be trained.
S2: Train the machine learning model on a preset training set, and acquire preset monitoring index data of the machine learning model during training.
S3: Judge, according to the acquired preset monitoring index data, whether the machine learning model has a training problem.
S4: When the machine learning model has no training problem, output the current machine learning model and end the repair.
S5: When the machine learning model has a training problem, sequentially select training problem repair strategies from a preset training problem repair strategy set and, each time a training problem repair strategy is selected, perform a repair step: repair the machine learning model according to the selected training problem repair strategy, and repeat S2-S4.
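The control flow of S1-S5 can be sketched as a short loop. This is only an illustrative sketch: the names `repair_loop`, `train_and_monitor` and `detect` are hypothetical, the model is represented by a plain dictionary, and it is an assumption that each strategy is applied to the most recently repaired model rather than to the original one.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: a "model" is a plain dict, a strategy maps a model
# to a repaired model, and a detector returns a problem name or None.
Model = dict
Strategy = Callable[[Model], Model]


def repair_loop(
    model: Model,
    train_and_monitor: Callable[[Model], dict],
    detect: Callable[[dict], Optional[str]],
    strategies: List[Strategy],
) -> Tuple[Model, Optional[str]]:
    """Return (model, None) if training is healthy, else (model, problem)."""
    current = model                          # S1: model to be trained
    data = train_and_monitor(current)        # S2: train and collect index data
    problem = detect(data)                   # S3: judge training problem
    if problem is None:
        return current, None                 # S4: output and end
    for strategy in strategies:              # S5: try strategies in order
        current = strategy(current)          # repair step (assumed cumulative)
        data = train_and_monitor(current)    # repeat S2
        problem = detect(data)               # repeat S3
        if problem is None:
            return current, None             # repeat S4
    # All strategies tried: report the model and the remaining problem.
    return current, problem
```

When every strategy has been tried without success, the sketch returns the last model together with the remaining problem, matching the optional extension of S5 described in the source.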
In step S1, the machine learning model to be trained may be a text classification machine learning model (for example, an RNN, i.e. recurrent neural network, model), an image classification machine learning model (for example, a CNN, i.e. convolutional neural network, model), or an audio recognition machine learning model (for example, a DNN-HMM, i.e. deep neural network-hidden Markov model). The machine learning model repairing method thus applies to machine learning models such as RNN, CNN and DNN-HMM used for tasks such as text classification, image classification and audio recognition.
In step S2, the preset training set consists of the training samples used by the corresponding machine learning model in its application. For example, for an image recognition or image classification machine learning model, the training set consists of image samples; for an audio recognition machine learning model, the training set consists of audio samples. The method does not change the basic training parameters of the machine learning model; training uses the training process configuration specified for each machine learning model.
In step S2, the preset monitoring indexes include accuracy, model gradient and model weight. Specifically, a given n-layer machine learning model (which may be implemented with Keras) is loaded and trained according to its training configuration. The layer parameter configuration of the machine learning model, $L = \{l_1, l_2, \ldots, l_n\}$, and the training optimizer parameter set, $O = \{O_1, O_2, \ldots\}$, are recorded. In addition, the training process of the machine learning model is monitored, and in the $i$-th training iteration period the data $Data_i = \{A_i, G_i, W_i, \ldots\}$ are recorded in real time, containing the accuracy $A_i$, the model gradient $G_i$ and the model weight $W_i$ of that training iteration.
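One way to hold the recorded quantities is a pair of small record types: the static configuration (L and O) and a list of per-iteration entries (Data_i). The class and field names below are hypothetical, and summarising each layer's gradient and weight by a single number is an assumption; the source does not say how the per-layer values are aggregated.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IterationData:
    """Data_i = {A_i, G_i, W_i, ...} for one training iteration."""
    accuracy: float                 # A_i
    layer_gradients: List[float]    # G_i, one summary value per layer (assumed)
    layer_weights: List[float]      # W_i, one summary value per layer (assumed)


@dataclass
class TrainingLog:
    """Static configuration (L, O) plus the per-iteration history."""
    layer_config: List[dict]               # L = {l_1, ..., l_n}
    optimizer_config: Dict[str, object]    # O = {O_1, O_2, ...}
    history: List[IterationData] = field(default_factory=list)

    def record(self, accuracy: float, layer_gradients: List[float],
               layer_weights: List[float]) -> None:
        # Copy the lists so later mutation by the caller cannot alter the log.
        self.history.append(
            IterationData(accuracy, list(layer_gradients), list(layer_weights))
        )
```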
In S3, the training problems include gradient explosion problem, gradient vanishing problem, dead ReLU problem, unstable convergence problem, and slow convergence problem. The specific method for judging whether the machine learning model has a training problem according to the acquired preset monitoring index data comprises the following steps:
When, during training, the model gradient of the machine learning model increases markedly layer by layer from the output layer to the input layer, the model gradient of the input layer is higher than a specific value (possibly to the point of numerical overflow), and the training accuracy and other evaluation indexes are markedly low, the machine learning model is considered to have the gradient explosion problem. Specifically, in this embodiment, when the ratio of the model gradient of the input layer to the model gradient of the output layer of the machine learning model exceeds a first preset threshold and the accuracy of the machine learning model is lower than a preset accuracy threshold, or when a NaN value exists in the model gradient or model weight of the machine learning model, the machine learning model has a training problem, namely the gradient explosion problem.
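The gradient explosion rule can be sketched as a small predicate. The function name is hypothetical, each layer's gradient is assumed to be summarised by one number (input layer first, output layer last), and the accuracy threshold of 0.6 is a placeholder because the source does not give its value; the ratio threshold of 70 is the first preset threshold stated in the source.

```python
import math
from typing import Sequence


def has_gradient_explosion(
    layer_grads: Sequence[float],
    layer_weights: Sequence[float],
    accuracy: float,
    ratio_threshold: float = 70.0,     # first preset threshold from the source
    accuracy_threshold: float = 0.6,   # assumed value, not given in the source
) -> bool:
    # Any NaN in gradients or weights counts as gradient explosion on its own.
    if any(math.isnan(v) for v in list(layer_grads) + list(layer_weights)):
        return True
    input_grad = abs(layer_grads[0])
    output_grad = abs(layer_grads[-1])
    if output_grad == 0.0:
        # Ratio is undefined; this sketch simply skips the ratio test.
        return False
    return (input_grad / output_grad > ratio_threshold
            and accuracy < accuracy_threshold)
```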
When, during training, the model gradient of the machine learning model decreases markedly layer by layer from the output layer to the input layer, the model gradient of the input layer is lower than a specific value, and the training accuracy and other evaluation indexes are markedly low, the machine learning model is considered to have the gradient vanishing problem. Specifically, in this embodiment, when the ratio of the model gradient of the input layer to the model gradient of the output layer of the machine learning model falls below the second preset threshold and the model gradient of the input layer is smaller than the third preset threshold, the machine learning model has a training problem, namely the gradient vanishing problem.
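A sketch of the gradient vanishing rule follows. It assumes the ratio test means that the input-to-output gradient ratio is smaller than the second preset threshold, which matches the description of gradients shrinking toward the input layer and the later example in which the ratio is "smaller than a specific threshold". The function name and the one-number-per-layer summary are hypothetical; the defaults 1e-3 and 1e-4 are the second and third preset thresholds from the source.

```python
from typing import Sequence


def has_gradient_vanishing(
    layer_grads: Sequence[float],
    ratio_threshold: float = 1e-3,        # second preset threshold
    input_grad_threshold: float = 1e-4,   # third preset threshold
) -> bool:
    # layer_grads is ordered from the input layer to the output layer.
    input_grad = abs(layer_grads[0])
    output_grad = abs(layer_grads[-1])
    if output_grad == 0.0:
        # Ratio is undefined; this sketch does not flag the case.
        return False
    return (input_grad / output_grad < ratio_threshold
            and input_grad < input_grad_threshold)
```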
When, during training, the proportion of zero values in the model gradient of the machine learning model exceeds a certain proportion, the activation function is a ReLU activation function, and the training accuracy and other evaluation indexes are markedly low, the machine learning model is considered to have the dead ReLU problem. Specifically, in this embodiment, when the proportion of zero values in the model gradient of the machine learning model exceeds a preset proportion, the activation function is a ReLU activation function, and the accuracy of the machine learning model is lower than a preset accuracy threshold, the machine learning model has a training problem, namely the dead ReLU problem.
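The dead ReLU rule reduces to counting zero gradient entries. In the sketch below the function name is hypothetical, and both the zero-proportion threshold (0.7) and the accuracy threshold (0.6) are placeholder values, since the source only calls them "preset".

```python
from typing import Sequence


def has_dead_relu(
    gradient_values: Sequence[float],
    activation: str,
    accuracy: float,
    zero_ratio_threshold: float = 0.7,   # assumed value, not given in the source
    accuracy_threshold: float = 0.6,     # assumed value, not given in the source
) -> bool:
    # The rule only applies to models that use the ReLU activation function.
    if activation.lower() != "relu" or not gradient_values:
        return False
    zero_fraction = sum(1 for g in gradient_values if g == 0.0) / len(gradient_values)
    return zero_fraction > zero_ratio_threshold and accuracy < accuracy_threshold
```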
When, during training, the accuracy and other evaluation indexes of the machine learning model jitter markedly and the jitter persists for more than a certain number of training iterations, the machine learning model is considered to have the unstable convergence problem. Specifically, in this embodiment, when the jitter amplitude of the accuracy of the machine learning model exceeds the preset jitter amplitude threshold and this persists for more than the preset number of training iterations, the machine learning model has a training problem, namely the unstable convergence problem.
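The unstable convergence rule can be sketched as follows. The source does not define "jitter amplitude", so this sketch assumes it is the absolute change in accuracy between consecutive iterations, and it assumes that "persists" means a consecutive run of such changes. The function name is hypothetical; 0.03 and 20% are the values stated in the source.

```python
from typing import Sequence


def has_unstable_convergence(
    accuracies: Sequence[float],
    total_iterations: int,
    jitter_threshold: float = 0.03,   # preset jitter amplitude threshold
    fraction: float = 0.2,            # preset count: 20% of total iterations
) -> bool:
    limit = fraction * total_iterations
    run = longest = 0
    for prev, cur in zip(accuracies, accuracies[1:]):
        if abs(cur - prev) > jitter_threshold:
            run += 1                  # jitter continues
            longest = max(longest, run)
        else:
            run = 0                   # a calm step resets the run
    return longest > limit
```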
When, during training, the accuracy and other evaluation indexes of the machine learning model are markedly low and the accuracy cannot be noticeably improved within a certain time, the machine learning model is considered to have the slow convergence problem. Specifically, in this embodiment, when the accuracy of the machine learning model is lower than a preset accuracy threshold and the maximum improvement between two adjacent training steps does not exceed a fourth preset threshold, the machine learning model has a training problem, namely the slow convergence problem.
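A sketch of the slow convergence rule is given below. The function name is hypothetical, the accuracy threshold of 0.6 is a placeholder because the source does not state it, and comparing the most recent accuracy against that threshold is an assumption; 0.01 is the fourth preset threshold from the source.

```python
from typing import Sequence


def has_slow_convergence(
    accuracies: Sequence[float],
    accuracy_threshold: float = 0.6,       # assumed value, not given in the source
    improvement_threshold: float = 0.01,   # fourth preset threshold
) -> bool:
    if len(accuracies) < 2:
        return False  # no adjacent pair of training steps to compare
    # Largest accuracy gain between any two adjacent training steps.
    max_improvement = max(
        cur - prev for prev, cur in zip(accuracies, accuracies[1:])
    )
    return (accuracies[-1] < accuracy_threshold
            and max_improvement <= improvement_threshold)
```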
Specifically, in this embodiment, the first preset threshold is 70, the second preset threshold is 1e-3, the third preset threshold is 1e-4, the fourth preset threshold is 0.01, the preset jitter amplitude threshold is 0.03, and the preset number of training iterations is 20% of the total number of training iterations.
In step S5, several training problem repair strategies are preset for each training problem, and their effectiveness has been verified by actual tests. There are seven training problem repair strategies in total: adding a batch regularization layer, modifying the activation function of the machine learning model, clipping the gradient of the machine learning model, modifying the initializer of the machine learning model, adjusting the training batch size, adjusting the learning rate of the training optimizer, and replacing the training optimizer. The training problem repair strategies are applied one by one to the machine learning model in which a training problem has been detected, and each repaired machine learning model is saved.
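The seven strategies can be pictured as an ordered list of functions that each return a modified copy of a training configuration. Everything concrete in this sketch is hypothetical: the configuration keys, the strategy names, and the specific values chosen (for example switching to "relu", doubling the batch size, or multiplying the learning rate by 0.1) are illustrations only, since the source names the strategies without giving their parameters.

```python
import copy
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]


def _updated(cfg: Config, **changes: object) -> Config:
    """Return a modified deep copy so the original configuration is preserved."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg.update(changes)
    return new_cfg


# Ordered strategy set; all keys and values below are illustrative assumptions.
REPAIR_STRATEGIES: List[Tuple[str, Callable[[Config], Config]]] = [
    ("add_batch_layer", lambda c: _updated(c, batch_layer=True)),
    ("modify_activation", lambda c: _updated(c, activation="relu")),
    ("clip_gradient", lambda c: _updated(c, clipvalue=1.0)),
    ("modify_initializer", lambda c: _updated(c, initializer="he_uniform")),
    ("adjust_batch_size", lambda c: _updated(c, batch_size=int(c["batch_size"]) * 2)),
    ("adjust_learning_rate",
     lambda c: _updated(c, learning_rate=float(c["learning_rate"]) * 0.1)),
    ("replace_optimizer", lambda c: _updated(c, optimizer="adam")),
]
```

Returning a copy rather than mutating in place keeps every repaired configuration available for saving, in line with the statement that each repaired model is stored.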
In step S5, the repaired machine learning model is retrained with the same settings as the initial training, so that no other factors interfere with the retraining process. The retraining process of the repaired machine learning model is monitored as well: the preset monitoring index data continue to be collected and training problems continue to be checked, so that if a training problem occurs again it is detected quickly, training is terminated, and the machine learning model is repaired again.
Finally, a machine learning model repaired by the method can achieve a better application effect. For example, in the field of image recognition, repairing an image recognition machine learning model with the method can greatly reduce its training time, and inputting an image to be recognized into the repaired and trained model yields a better recognition result and greatly improves recognition accuracy. Similarly, in the fields of image classification and audio recognition, classifying images or recognizing audio with the repaired image classification or audio recognition machine learning model can achieve higher accuracy.
Optionally, step S5 further includes: when all training problem repair strategies in the training problem repair strategy set have been selected, outputting the current machine learning model and the training problem of the machine learning model, so that the operator is informed of the training problem that the machine learning model currently has.
In summary, the machine learning model repairing method acquires the preset monitoring index data of the machine learning model in real time during training and judges from the acquired data whether the machine learning model has a training problem. When the machine learning model has a training problem, training problem repair strategies are selected in sequence from a preset training problem repair strategy set, and a repair step is performed each time a strategy is selected, so that repair is automatic. Detection and repair of training problems are simple and convenient to operate, and the user needs little prior knowledge of machine learning model training: by supplying only the machine learning model and the training set, any user can have training problems monitored automatically and finally obtain a well-performing machine learning model, without manually observing data, analyzing training problems or searching for repair methods. The method is simple to implement, has low complexity and repairs training problems efficiently; it can effectively improve the training efficiency of the machine learning model, reduce the resource cost of training and safeguard the quality of the final machine learning model.
In addition, the detection and repair method places few requirements on the underlying machine learning framework and can be applied with existing frameworks such as Keras, TensorFlow and PyTorch. The machine learning model repairing method can therefore be used across almost all machine learning frameworks, which makes efficient model training possible on different frameworks and improves overall model training efficiency.
Further, the machine learning model training problems include the gradient explosion problem, the gradient vanishing problem, the dead ReLU problem, the unstable convergence problem and the slow convergence problem. Compared with earlier work, more kinds of problems can be detected and repaired, and the problems and their symptoms are defined more completely.
Further, the machine learning model training problem repair strategies include: adding a batch regularization layer, modifying the activation function of the machine learning model, clipping the gradient of the machine learning model, modifying the initializer of the machine learning model, adjusting the training batch size, adjusting the learning rate of the training optimizer, and replacing the training optimizer.
Referring to fig. 2, in still another embodiment of the present invention, a machine learning model with a gradient explosion problem is trained, and its training problem is detected by the machine learning model repairing method described above, which specifically includes the following steps:
Step 1: the training problem detection step is as follows:
Step 1-1: A training process is started using Keras and TensorFlow, and the input machine learning model is trained according to the training configuration. Keras makes it convenient to define the configuration of the training process and to load a data set for training the target machine learning model. The training process configuration mainly specifies key information such as the optimizer, learning rate and batch size used during training. Information such as the training configuration and model configuration is recorded as static data.
Step 1-2: Information that changes during training, such as accuracy, model gradient and model weight, is recorded as dynamic data for judging training problems. Accuracy and similar information can be obtained directly from training through the Keras interface, while the model gradient and model weight can be obtained through computation implemented with TensorFlow.
Step 1-3: During training, every fixed number of iteration periods (1 by default), the static and dynamic data recorded in steps 1-1 and 1-2 are used to evaluate whether a training problem exists. The evaluation method has complete criteria and thresholds for the symptoms of each training problem; when a criterion or threshold is met, a training problem is considered to exist, the problem type is recorded and training is stopped; otherwise training completes normally. In this example, in the 4th training iteration period the ratio of the input-layer gradient to the output-layer gradient is detected to be too large, exceeding the gradient explosion threshold, so training is terminated and the training problem repair step is entered.
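The periodic check in step 1-3 can be sketched independently of any framework. The class `TrainingMonitor` and the detector `ratio_too_large` are hypothetical names, and the detector is deliberately simplified: it applies only the ratio threshold of 70 and omits the accuracy and NaN conditions of the full gradient explosion rule. In a Keras setting the same logic would typically be driven from a training callback, which is an assumption rather than something the source specifies.

```python
from typing import Callable, Dict, List, Optional

Detector = Callable[[List[dict]], bool]


class TrainingMonitor:
    """Evaluates detectors every `check_every` iterations and signals a stop."""

    def __init__(self, detectors: Dict[str, Detector], check_every: int = 1):
        self.detectors = detectors
        self.check_every = check_every
        self.history: List[dict] = []          # dynamic data recorded so far
        self.problem: Optional[str] = None     # recorded problem type, if any

    def on_iteration_end(self, record: dict) -> bool:
        """Store one iteration's data; return True if training should stop."""
        self.history.append(record)
        if len(self.history) % self.check_every != 0:
            return False
        for name, detector in self.detectors.items():
            if detector(self.history):
                self.problem = name
                return True
        return False


def ratio_too_large(history: List[dict]) -> bool:
    # Simplified check: input/output gradient ratio of the latest iteration.
    grads = history[-1]["layer_grads"]
    return grads[-1] != 0 and abs(grads[0] / grads[-1]) > 70
```

A training loop would call `on_iteration_end` after each iteration period and break out of the loop when it returns True, then hand `monitor.problem` to the repair step.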
Step 2: the step of repairing the training problem is as follows:
step 2-1: for the gradient explosion of the training problem detected before, a feasible repair strategy is applied one by one in the example to repair the problem. Firstly, a strategy for modifying a machine learning model activation function is applied, the model activation function is modified into a 'ReLU', a new machine learning model is generated, and the rest training configuration and the model configuration are unchanged.
Step 2-2: re-entering a training process of the new machine learning model, collecting dynamic data and static data again, monitoring training problems according to the information, normally completing training if the training problems are not found, and outputting a repaired model; otherwise, reapplying other strategies for repair until the training problem is solved or all repair strategies are tried. Referring to fig. 3, after the activation function is replaced by ReLU, the training problem disappears, so that the retraining link can be normally completed and the repaired machine learning model can be output.
Referring to FIG. 4, in yet another embodiment of the present invention, a machine learning model with multiple potential training problems, namely gradient vanishing and death ReLU, is used, and the training problems therein are detected and repaired.
In the 10th training iteration period, the example detects that the ratio of the model gradient of the input layer to that of the output layer is smaller than a specific threshold, and therefore determines that the gradient vanishing problem exists; a repair strategy is then applied, changing the activation function from tanh to ReLU, and the training process is restarted. The new training process continues to monitor the static data and dynamic data of the machine learning model during training, and detects the death ReLU training problem in the 3rd iteration period; a repair strategy is then applied again, and a new machine learning model is generated to restart training. The loop continues in this way until the termination condition is satisfied.
The result of the repair process is shown in fig. 5. The gradient vanishing training problem is found in the 1st training attempt, and the training process is restarted after the activation function is changed to ReLU. The death ReLU training problem is found in the 2nd training attempt, and the training process is restarted after the activation function is changed to SELU. The machine learning model then has no training problem, so the training process completes and a normal machine learning model is output.
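The two activation replacements in this example can be illustrated with a small lookup. The sketch below is only an assumption about how such a strategy might be encoded: the problem identifiers, the `modify_activation` helper and the `batch_size` field are hypothetical, and only the tanh, ReLU and SELU sequence comes from the example itself.

```python
from typing import Dict

# Replacement activations taken from the example above; the problem names are
# hypothetical identifiers chosen for this sketch.
REPLACEMENT_ACTIVATION: Dict[str, str] = {
    "gradient_vanishing": "relu",   # 1st attempt: tanh -> ReLU
    "death_relu": "selu",           # 2nd attempt: ReLU -> SELU
}


def modify_activation(config: Dict[str, object], problem: str) -> Dict[str, object]:
    """Return a copy of config with the activation swapped for the given problem."""
    new_activation = REPLACEMENT_ACTIVATION.get(problem)
    if new_activation is None:
        return dict(config)         # no activation change known for this problem
    return {**config, "activation": new_activation}


# "batch_size" is a hypothetical field showing that other settings stay unchanged.
config = {"activation": "tanh", "batch_size": 32}
trace = [config["activation"]]
for problem in ("gradient_vanishing", "death_relu"):
    config = modify_activation(config, problem)
    trace.append(config["activation"])
print(trace)  # ['tanh', 'relu', 'selu']
```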
The following are device embodiments of the present invention that may be used to perform method embodiments of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
In still another embodiment of the present invention, a machine learning model repairing system is provided, which can be used to implement the machine learning model repairing method described above, and specifically, the machine learning model repairing system includes a model obtaining module, an index obtaining module, a training problem confirming module, a repairing module, and an output module.
The model acquisition module is used for acquiring a machine learning model to be trained; the index acquisition module is used for training the machine learning model through a preset training set and acquiring preset monitoring index data of the machine learning model in the training process; the training problem confirmation module is used for judging whether the machine learning model has a training problem or not according to the acquired preset monitoring index data; the repair module is used for sequentially selecting training problem repair strategies in a preset training problem repair strategy set when the machine learning model has training problems, and performing repair steps when each training problem repair strategy is selected: repairing the machine learning model according to the selected training problem repairing strategy, and triggering an index acquisition module and a training problem confirmation module; the output module is used for outputting the current machine learning model and ending the repair when the machine learning model has no training problem.
All relevant content of each step in the foregoing embodiment of the machine learning model repairing method may be incorporated by reference into the functional description of the corresponding functional module of the machine learning model repairing system in this embodiment of the present invention, and is not repeated here.
The division of modules in the embodiments of the present invention is schematic and is only one division by logical function; other divisions are possible in actual implementation. In addition, the functional modules in each embodiment of the present invention may be integrated in one processor, may each exist physically on their own, or two or more modules may be integrated in one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
In yet another embodiment of the present invention, a computer device is provided that includes a processor and a memory, the memory being used to store a computer program including program instructions, and the processor being used to execute the program instructions stored in the computer storage medium. The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The processor is the computational core and control core of the terminal and is adapted to implement one or more instructions, in particular to load and execute one or more instructions in a computer storage medium so as to implement the corresponding method flow or corresponding functions; the processor of this embodiment of the invention can be used to perform the operations of the machine learning model repairing method.
In yet another embodiment of the present invention, a storage medium is provided, specifically a computer-readable storage medium (Memory), which is a memory device in a computer device used to store programs and data. It is understood that the computer-readable storage medium here may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer-readable storage medium provides a storage space that stores the operating system of the terminal. The storage space also stores one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer-readable storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory. The one or more instructions stored in the computer-readable storage medium may be loaded and executed by the processor to implement the corresponding steps of the machine learning model repairing method in the above embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (9)

1. An image recognition method, characterized by comprising the steps of:
S1: acquiring a machine learning model to be trained; the machine learning model to be trained is an image recognition machine learning model;
S2: training a machine learning model through a preset training set, and acquiring preset monitoring index data of the machine learning model in the training process;
S3: judging whether the machine learning model has a training problem according to the acquired preset monitoring index data;
S4: when the machine learning model has no training problem, outputting the current machine learning model and ending the repair;
S5: when the machine learning model has training problems, training problem repair strategies in a preset training problem repair strategy set are sequentially selected, and repair steps are carried out every time one training problem repair strategy is selected: repairing the machine learning model according to the selected training problem repairing strategy, and repeating S2-S4;
S6: inputting the image to be identified into the repaired and trained image identification machine learning model to obtain an image identification result.
2. The image recognition method according to claim 1, wherein the preset monitoring indexes include accuracy, model gradient and model weight.
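One possible in-memory layout for the monitoring index data of claim 2 is sketched below. The `MonitorRecord` class and its methods are hypothetical, and summarising a layer's gradient by its mean absolute value is an assumption of this sketch, since the claims do not state how a layer's model gradient is reduced to a single number.

```python
import math
from dataclasses import dataclass
from typing import List


def _mean_abs(values: List[float]) -> float:
    """Mean absolute value of a non-empty list."""
    return sum(abs(v) for v in values) / len(values)


@dataclass
class MonitorRecord:
    """Monitoring index data captured at one training iteration (hypothetical layout)."""
    accuracy: float
    layer_gradients: List[List[float]]  # flat gradient values per layer, input layer first
    layer_weights: List[List[float]]    # flat weight values per layer, input layer first

    def has_nan(self) -> bool:
        """True if any gradient or weight value is NaN."""
        layers = self.layer_gradients + self.layer_weights
        return any(math.isnan(v) for layer in layers for v in layer)

    def input_output_gradient_ratio(self) -> float:
        """Ratio of the input-layer gradient magnitude to the output-layer one."""
        output_mean = _mean_abs(self.layer_gradients[-1])
        if output_mean == 0.0:
            return math.inf             # avoid division by zero
        return _mean_abs(self.layer_gradients[0]) / output_mean
```

A record of this kind would be filled once per training iteration and then passed to whatever logic judges the training problems.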
3. The image recognition method according to claim 2, wherein the training problem includes a gradient explosion problem, a gradient vanishing problem, a death ReLU problem, an unstable convergence problem, and a slow convergence problem; the specific method for judging whether the machine learning model has a training problem according to the acquired preset monitoring index data comprises the following steps:
when the ratio of the model gradient of the input layer to the model gradient of the output layer of the machine learning model exceeds a first preset threshold value and the accuracy of the machine learning model is lower than a preset accuracy threshold value, or NaN values exist in the model gradient or model weight of the machine learning model, the machine learning model has a training problem and is a gradient explosion problem;
when the ratio of the model gradient of the input layer to the model gradient of the output layer of the machine learning model is lower than a second preset threshold value and the model gradient of the input layer is smaller than a third preset threshold value, the machine learning model has a training problem and is a gradient vanishing problem;
when the model gradient of the machine learning model contains zero values exceeding a preset proportion, the activation function is a ReLU activation function, and the accuracy of the machine learning model is lower than a preset accuracy threshold, the machine learning model has a training problem and is a death ReLU problem;
when the jitter amplitude of the accuracy of the machine learning model exceeds a preset jitter amplitude threshold value continuously for more than a preset number of training times, the machine learning model has a training problem and is an unstable convergence problem;
when the accuracy of the machine learning model is lower than a preset accuracy threshold and the maximum improvement between two adjacent training steps does not exceed a fourth preset threshold, the machine learning model has a training problem and is a slow convergence problem.
4. The image recognition method of claim 3, wherein the first preset threshold is 70, the second preset threshold is 1e-3, the third preset threshold is 1e-4, the fourth preset threshold is 0.01, the preset jitter amplitude threshold is 0.03, and the preset number of training times is 20% of the total training times.
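The judging rules of claim 3 with the threshold values of claim 4 can be sketched as a single function. This is an illustrative reading and not the claimed implementation: the names `Thresholds` and `diagnose` are hypothetical, the accuracy threshold and zero-value proportion are left as required arguments because the claims give no values for them, jitter is taken as the absolute accuracy change between adjacent steps, and the gradient vanishing check treats the ratio as falling below the second threshold, in line with the embodiment in which the ratio is smaller than a threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    first: float = 70.0           # gradient-ratio bound for explosion (claim 4)
    second: float = 1e-3          # gradient-ratio bound for vanishing (claim 4)
    third: float = 1e-4           # input-layer gradient bound (claim 4)
    fourth: float = 0.01          # accuracy improvement bound for slow convergence (claim 4)
    jitter: float = 0.03          # accuracy jitter amplitude bound (claim 4)
    jitter_fraction: float = 0.2  # share of the total training times (claim 4)


def diagnose(ratio: float, input_grad: float, zero_fraction: float,
             activation: str, has_nan: bool, accuracies: List[float],
             total_iterations: int, accuracy_threshold: float,
             zero_proportion: float, t: Thresholds = Thresholds()) -> List[str]:
    """Return the names of all training problems matched by the monitored data."""
    low_accuracy = accuracies[-1] < accuracy_threshold
    deltas = [b - a for a, b in zip(accuracies, accuracies[1:])]
    problems = []
    if (ratio > t.first and low_accuracy) or has_nan:
        problems.append("gradient_explosion")
    if ratio < t.second and input_grad < t.third:
        problems.append("gradient_vanishing")
    if zero_fraction > zero_proportion and activation == "relu" and low_accuracy:
        problems.append("death_relu")
    # Longest run of consecutive steps whose accuracy change exceeds the jitter bound.
    run = longest = 0
    for d in deltas:
        run = run + 1 if abs(d) > t.jitter else 0
        longest = max(longest, run)
    if longest > t.jitter_fraction * total_iterations:
        problems.append("unstable_convergence")
    if low_accuracy and deltas and max(deltas) <= t.fourth:
        problems.append("slow_convergence")
    return problems
```

Returning a list rather than a single label is a design choice of this sketch; it allows one training run to report several problems at once.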
5. The image recognition method of claim 1, wherein the training problem repair strategies comprise adding a batch normalization layer, modifying the machine learning model activation function, clipping the machine learning model gradient, modifying the machine learning model initializer, adjusting the batch size, adjusting the optimizer learning rate, and replacing the optimizer.
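As an illustration of one of these strategies, gradient clipping could look like the sketch below. The claims do not say which clipping rule is used; clipping by global L2 norm, the function name `clip_by_global_norm` and the `max_norm` parameter are all assumptions made for this example.

```python
import math
from typing import List


def clip_by_global_norm(gradients: List[List[float]], max_norm: float) -> List[List[float]]:
    """Rescale all layer gradients so that their joint L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for layer in gradients for g in layer))
    if total <= max_norm:
        return [list(layer) for layer in gradients]  # already within the bound
    scale = max_norm / total
    return [[g * scale for g in layer] for layer in gradients]


# Two layers with joint norm 5.0 are scaled down to joint norm 1.0.
clipped = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(clipped)  # approximately [[0.6], [0.8]]
```

Scaling every layer by the same factor keeps the gradient direction unchanged while bounding its magnitude, which is why this variant is often chosen against gradient explosion.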
6. The image recognition method according to claim 1, wherein the S5 further comprises: when the training problem restoration strategies in the training problem restoration strategy set are all selected, outputting the current machine learning model and the training problems of the machine learning model.
7. An image recognition system, comprising:
the model acquisition module is used for acquiring a machine learning model to be trained; the machine learning model to be trained is an image recognition machine learning model;
the index acquisition module is used for training the machine learning model through a preset training set and acquiring preset monitoring index data of the machine learning model in the training process;
the training problem confirmation module is used for judging whether the machine learning model has a training problem or not according to the acquired preset monitoring index data;
the repair module is used for sequentially selecting the training problem repair strategies in the preset training problem repair strategy set when the machine learning model has training problems, and performing repair steps when one training problem repair strategy is selected: repairing the machine learning model according to the selected training problem repairing strategy, and triggering an index acquisition module and a training problem confirmation module;
the output module is used for outputting the current machine learning model and finishing repairing when the machine learning model has no training problem;
the recognition module is used for inputting the image to be recognized into the repaired and trained image recognition machine learning model to obtain an image recognition result.
8. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the image recognition method according to any one of claims 1 to 6 when the computer program is executed.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the image recognition method according to any one of claims 1 to 6.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111620425.9A CN114254763B (en) 2021-12-27 2021-12-27 Machine learning model repairing method, system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114254763A (en) 2022-03-29
CN114254763B (en) 2024-04-05

Family

ID=80795372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111620425.9A Active CN114254763B (en) 2021-12-27 2021-12-27 Machine learning model repairing method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114254763B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant