CN114972928A - Image recognition model training method and device - Google Patents


Info

Publication number: CN114972928A
Authority: CN (China)
Prior art keywords: model, parameter, training, original, value
Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202210883039.7A
Other languages: Chinese (zh)
Other versions: CN114972928B (en)
Inventors: 钟雨崎, 凌明, 杨作兴, 艾国
Current Assignee: Shenzhen MicroBT Electronics Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Shenzhen MicroBT Electronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen MicroBT Electronics Technology Co Ltd
Priority to CN202210883039.7A
Publication of CN114972928A
Application granted; publication of CN114972928B
Legal status: Active

Classifications

    • G06V10/774 — Image or video recognition using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045 — Neural networks; architecture; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • G06V10/82 — Image or video recognition using pattern recognition or machine learning using neural networks


Abstract

The invention provides an image recognition model training method and device. The method comprises: acquiring a training data set of a target to be recognized; and training an original model according to the training data set to obtain an image recognition model. In each round of training, the parameter values of the original model parameters of the current-round model are limited, the parameter values of the limiting model parameters corresponding to the original model parameters are obtained, and model training is performed with them. The parameter values of the image recognition model parameters included in the image recognition model are the parameter values of the limiting model parameters adopted in the last round of model training. The device is used for executing the method. The image recognition model training method and device provided by the embodiments of the invention improve the training efficiency of the image recognition model.

Description

Image recognition model training method and device
Technical Field
The invention relates to the technical field of computers, in particular to an image recognition model training method and device.
Background
Deep learning is a research direction of machine learning, and is applied to image recognition, character recognition, voice recognition, semantic analysis and the like.
Deep learning can be applied to image recognition: an image recognition model is first trained, and the trained model is then put into practical application. During training of an image recognition model, it is often found that the loss function decreases more and more slowly in the later stage of training, which lengthens the training period and lowers the training efficiency of the image recognition model.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for training an image recognition model, which can at least partially solve these problems.
In a first aspect, the present invention provides a training method for an image recognition model, including:
acquiring a training data set of a target to be recognized;
training an original model according to the training data set to obtain an image recognition model; in each round of training, limiting the parameter values of the original model parameters of the current-round model, obtaining the parameter values of the limiting model parameters corresponding to the original model parameters, and performing model training; the parameter values of the image recognition model parameters included in the image recognition model are the parameter values of the limiting model parameters adopted in the last round of model training.
In a second aspect, the present invention provides an image recognition model training apparatus, including:
the acquisition unit is used for acquiring a training data set of a target to be recognized;
the training unit is used for training the original model according to the training data set to obtain an image recognition model; in each round of training, the parameter values of the original model parameters of the current-round model are limited, and the parameter values of the limiting model parameters corresponding to the original model parameters are obtained for model training; the parameter values of the image recognition model parameters included in the image recognition model are the parameter values of the limiting model parameters adopted in the last round of model training.
In a third aspect, the present invention provides a computer device, including a memory, a processor, and instructions stored in the memory and executable on the processor, where the processor implements the image recognition model training method according to any one of the above embodiments when executing the instructions.
In a fourth aspect, the present invention provides a computer-readable storage medium, having stored thereon instructions, which when executed by a processor, implement the image recognition model training method according to any of the above embodiments.
With the image recognition model training method and device provided by the embodiments of the invention, a training data set of a target to be recognized is acquired, and an original model is trained according to the training data set to obtain an image recognition model. In each round of training, the parameter values of the original model parameters of the current-round model are limited to obtain the parameter values of the limiting model parameters corresponding to the original model parameters, and model training is performed with them; the parameter values of the image recognition model parameters included in the image recognition model are the parameter values of the limiting model parameters adopted in the last round of model training. Because the number of possible parameter values of the limiting model parameters is finite, the training efficiency of the image recognition model is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a schematic flowchart of a training method for an image recognition model according to a first embodiment of the present invention.
Fig. 2 is a flowchart illustrating an image recognition model training method according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an image recognition model training apparatus according to a third embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an image recognition model training apparatus according to a fourth embodiment of the present invention.
Fig. 5 is a schematic physical structure diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In order to facilitate understanding of the technical solutions provided in the present application, the following first describes relevant contents of the technical solutions in the present application.
For model training, the training speed of the model and whether the model is over-fitted need to be considered. The model training speed is related to the development period of the product, and whether the model is over-fitted or not influences the intelligence of the product.
Model parameters are conventionally trained as 32-bit floating-point numbers, whose numerical range is [-3.4×10^38, 3.4×10^38]; the numerical representation is extremely broad. In actual training, the model weights usually take values in [-2.0, 2.0], within which the expression range can still be regarded as infinite, i.e. the expression granularity is 1/∞, and the information contained in each granularity unit is sparse. When the number of training samples is insufficient, the model is therefore prone to overfitting.
In many model training processes it is found that, after a model has been trained for a certain period of time, because the expression range is effectively infinite, updates to the model parameters merely change their numerical values without genuinely advancing toward the learning task target; as a result, the loss function decreases more and more slowly in the later stage of training and the training period is lengthened.
Therefore, the embodiment of the invention provides an image recognition model training method, which limits model parameters to accelerate convergence of the model parameters, improves the model training speed and can effectively avoid overfitting of the model.
The following describes a specific implementation process of the image recognition model training method provided by the embodiment of the present invention with a server as an execution subject. It can be understood that the execution subject of the image recognition model training method provided by the embodiment of the invention is not limited to the server.
Fig. 1 is a schematic flow chart of an image recognition model training method according to a first embodiment of the present invention, and as shown in fig. 1, the image recognition model training method according to the embodiment of the present invention includes:
s101, acquiring a training data set of a target to be recognized;
specifically, the server may obtain a training data set of the target to be recognized, where the training data set is used for model training. The target to be identified is selected according to actual needs, and the embodiment of the invention is not limited.
For example, in order to obtain a face recognition model through training, a preset number of face pictures can be collected and scaled to a uniform size, then each face picture is labeled, and the preset number of face pictures with uniform size and corresponding labels form a training data set.
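The preparation described above can be sketched as follows. This is an illustrative sketch only: the mock picture sizes, the nearest-neighbor scaling, and the function names are assumptions, not part of the patent.

```python
import numpy as np

def resize_nearest(img, size):
    """Scale a 2-D grayscale image to (size, size) by nearest-neighbor sampling."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def build_dataset(pictures, labels, size=32):
    """Scale every picture to a uniform size and pair it with its label."""
    return [(resize_nearest(p, size), y) for p, y in zip(pictures, labels)]

# Two mock "face pictures" of different sizes stand in for collected images.
pics = [np.random.rand(48, 64), np.random.rand(100, 80)]
dataset = build_dataset(pics, labels=[0, 1], size=32)
```

A real pipeline would instead read the preset number of collected pictures from disk and attach manually produced labels.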
S102, training the original model according to the training data set to obtain an image recognition model; in each round of training, limiting the parameter values of the original model parameters of the current-round model, obtaining the parameter values of the limiting model parameters corresponding to the original model parameters, and performing model training; the parameter values of the image recognition model parameters included in the image recognition model are the parameter values of the limiting model parameters adopted in the last round of model training.
After the image recognition model is obtained, the target to be recognized may be recognized with it. For example, if the training data set consists of face pictures, the target to be recognized is the human face in a picture, which the image recognition model recognizes.
Specifically, the server trains the original model according to the training data set to obtain the image recognition model. Model training proceeds over multiple rounds. In each round, the parameter values of the original model parameters of the current-round model are obtained and limited: restricting the number of values the parameters can take raises the expression granularity of the parameter values. The parameter values of the limiting model parameters corresponding to the original model parameters are thereby obtained, model training is performed with them, and the parameter values of the original model parameters are then updated. Because the number of possible parameter values of the limiting model parameters is finite, the training speed of the model is accelerated. When the end condition of model training is met, the server takes the parameter values of the limiting model parameters used in the last round of training as the parameter values of the image recognition model parameters of the image recognition model.

The original model may be a convolutional neural network model and is selected according to actual needs; the embodiment of the present invention is not limited thereto. The first-round model is the original model, whose original-model parameter values may be randomly generated or preset. From the second round of training onward, each round's model is the model with the updated parameter values, i.e. the parameter values of the original model parameters of each round's model are the updated parameter values from the previous round's model.
In the embodiment of the present invention, the original model parameters refer to the parameters that need to be updated during training, including but not limited to weights and biases.
With the image recognition model training method provided by the embodiment of the invention, a training data set of the target to be recognized is acquired, and the original model is trained according to the training data set to obtain the image recognition model; in each round of training, the parameter values of the original model parameters of the current-round model are limited to obtain the parameter values of the corresponding limiting model parameters, and model training is performed with them; the parameter values of the image recognition model parameters included in the image recognition model are the parameter values of the limiting model parameters adopted in the last round of model training. Because the number of possible parameter values of the limiting model parameters is finite, the training efficiency of the image recognition model is improved. In addition, because each of the limited parameter values expresses more information, overfitting of the model can be effectively avoided.
Fig. 2 is a schematic flowchart of an image recognition model training method according to a second embodiment of the present invention. As shown in Fig. 2, on the basis of the above embodiments, the step of limiting the parameter values of the original model parameters of the current-round model in each round of training, obtaining the parameter values of the limiting model parameters corresponding to the original model parameters, and performing model training includes:
s201, obtaining the maximum value of the parameter values of all original model parameters of the current wheel model, and rounding to obtain an adjustment reference value;
specifically, the server may obtain a parameter value of each original model parameter of the current wheel model, compare the parameter values of the original model parameters, obtain a maximum value of the parameter values of the original model parameters, and then perform rounding on the maximum value of the parameter values of the original model parameters to obtain an adjustment reference value. The current round model refers to a currently trained model. Since the parameter values of the model are updated every round of training, the parameter values of the original model parameters of each round of model vary.
S202, obtaining an amplification parameter value corresponding to each original model parameter of the current wheel model according to the parameter value, the maximum expression particle size value and the adjustment reference value of each original model parameter of the current wheel model;
specifically, for each original model parameter of the current wheel model, the server may obtain an amplification parameter value corresponding to the original model parameter according to the parameter value, the maximum expression particle size value, and the adjustment reference value of the original model parameter. The amplification parameter value may be an integer value. The range of values of the amplification parameter value is predetermined and includes a limited number of values.
S203, restoring the amplification parameter value corresponding to each original model parameter of the current wheel model to obtain the parameter value of the limiting model parameter corresponding to each original model parameter of the current wheel model;
specifically, for the amplification parameter value corresponding to each original model parameter of the current wheel model, the server may restore the amplification parameter value corresponding to each original model parameter, so that the amplification parameter value corresponding to each original model parameter is reduced to the size range of the parameter value of the original model parameter, and the parameter value of the constraint model parameter corresponding to each original model parameter is obtained.
S204, performing current wheel model training according to the parameter values of the limiting model parameters corresponding to the original model parameters of the current wheel model, and updating the parameter values of the original model parameters of the current wheel model to obtain the parameter values of the original model parameters of the next wheel model.
Specifically, the server performs the current wheel model training based on the parameter values of the constraint model parameters corresponding to the original model parameters of the current wheel model, that is, the model obtained by replacing the parameter values of the constraint model parameters corresponding to the original model parameters of the current wheel model with the parameter values of the constraint model parameters corresponding to the original model parameters of the current wheel model is used for performing the model training. When the parameters are updated, the parameter values of all original model parameters of the current wheel model are updated, and the updated parameter values of all original model parameters of the current wheel model are used as the parameter values of all original model parameters of the next wheel model of the current wheel model. The calculation of loss functions, gradients, and the like involved in the model training process is the same as that in the prior art, and is not described herein again.
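Steps S201-S204 (excluding the gradient update itself) can be sketched as a single limiting pass over the parameters. The NumPy implementation and the function names are assumptions for illustration, and the adjustment reference value is assumed to round to a nonzero integer.

```python
import numpy as np

def adjustment_reference(params):
    """S201: the maximum parameter value across all original parameters, rounded."""
    return round(max(float(p.max()) for p in params))

def limit_parameters(params, G=100):
    """S202-S203: amplify each parameter onto an integer grid, clamp it to
    [-G, G], then restore it to the original scale (grid spacing R / G)."""
    R = adjustment_reference(params)  # assumed nonzero
    limited = []
    for w in params:
        a = np.rint(w * G / R)        # S202: amplification parameter values
        a = np.clip(a, -G, G)         # keep within the limited range
        limited.append(a * R / G)     # S203: limiting model parameter values
    return limited

weights = [np.array([1.3, -0.42, 2.0]), np.array([0.5])]
print(limit_parameters(weights))
```

Training (S204) would run the forward and backward pass with these limited values, while the gradient update is applied to the original parameter values.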
On the basis of the foregoing embodiments, further, the obtaining an amplification parameter value corresponding to each original model parameter of the current-round model according to the parameter value of the original model parameter, the maximum expression granularity value, and the adjustment reference value includes:

calculating the amplification parameter value corresponding to the i-th original model parameter of the current-round model according to the formula

A_i = round(W_i × G / R)

wherein A_i represents the amplification parameter value corresponding to the i-th original model parameter of the current-round model, W_i represents the parameter value of the i-th original model parameter of the current-round model, R represents the adjustment reference value, G represents the maximum expression granularity value and is a constant, round(·) denotes rounding to the nearest integer, i is a positive integer, and i is less than or equal to the number of original model parameters of the current-round model.

Specifically, the server substitutes the parameter value W_i of the i-th original model parameter of the current-round model, the adjustment reference value R, and the maximum expression granularity value G into the formula A_i = round(W_i × G / R); the amplification parameter value A_i corresponding to the i-th original model parameter is obtained by calculation, and the calculated A_i is an integer. The maximum expression granularity value is set according to actual needs; the embodiment of the present invention is not limited thereto. Amplifying the parameter value of the original model parameter maps it into the range interval of the amplification parameter values, so that the number of values the parameter can take is limited.

For example, if the range of the amplification parameter values is set to [-100, 100], then the maximum expression granularity value is 100.

For example, the range of the amplification parameter values can also be set according to the difficulty of the final task. For an image classification task with 1000 classes, if each class is expressed by 200 numerical values, the range of the amplification parameter values can be set to [-100×1000, 100×1000], and the maximum expression granularity value is 100000.
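A quick numeric illustration of the amplification formula (the concrete value R = 2 and the sample weights are chosen for this sketch, not taken from the patent):

```python
G = 100  # maximum expression granularity value
R = 2    # adjustment reference value (rounded maximum parameter value)

def amplify(w):
    """Map a parameter value onto the integer grid [-G, G]."""
    return round(w * G / R)

print(amplify(1.3))    # 1.3 * 100 / 2 = 65.0 -> 65
print(amplify(-0.42))  # -0.42 * 100 / 2 = -21.0 -> -21
```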
On the basis of the foregoing embodiments, further, the image recognition model training method provided in the embodiment of the present invention further includes:

if the calculated A_i is greater than G, replacing the value of A_i with G; if the calculated A_i is less than -G, replacing the value of A_i with -G.

Specifically, after A_i is calculated, the absolute value of A_i is compared with G. If the absolute value of A_i is greater than G, A_i is out of the limited range: if A_i is positive, the value of A_i is replaced with G; if A_i is negative, the value of A_i is replaced with -G. Limiting the maximum and minimum values of A_i prevents the calculation result from exceeding the numerical range allowed by the computer after W_i is amplified.

For example, if the range of the amplification parameter values is set to [-100, 100], then the maximum expression granularity value is 100. If a calculated amplification parameter value is 105, its value is changed to 100; if a calculated amplification parameter value is -108, its value is changed to -100.
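The replacement rule reads directly as a clamp; a minimal sketch (function name assumed):

```python
G = 100  # maximum expression granularity value

def clamp(a):
    """Replace an out-of-range amplification value with the range boundary."""
    if a > G:
        return G
    if a < -G:
        return -G
    return a

print(clamp(105))   # exceeds G        -> 100
print(clamp(-108))  # falls below -G   -> -100
print(clamp(37))    # already in range -> 37
```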
On the basis of the foregoing embodiments, further, the restoring the amplification parameter value corresponding to each original model parameter of the current-round model to obtain the parameter value of the limiting model parameter corresponding to each original model parameter includes:

calculating the parameter value of the limiting model parameter corresponding to the i-th original model parameter of the current-round model according to the formula

W'_i = A_i × R / G

wherein W'_i represents the parameter value of the limiting model parameter corresponding to the i-th original model parameter of the current-round model, A_i represents the amplification parameter value corresponding to the i-th original model parameter of the current-round model, R represents the adjustment reference value, G represents the maximum expression granularity value, i is a positive integer, and i is less than or equal to the number of original model parameters of the current-round model.

Specifically, the server substitutes the amplification parameter value A_i corresponding to the i-th original model parameter of the current-round model, the adjustment reference value R, and the maximum expression granularity value G into the formula W'_i = A_i × R / G, and the parameter value W'_i of the limiting model parameter corresponding to the i-th original model parameter is obtained by calculation. The parameter values W'_i of the limiting model parameters are used for model training.
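Restoration is the inverse of amplification; continuing the earlier sketch with the assumed values R = 2 and G = 100:

```python
G = 100  # maximum expression granularity value
R = 2    # adjustment reference value

def restore(a):
    """Map an integer amplification value back to the parameter scale."""
    return a * R / G

# Restored values land on a grid with spacing R / G = 0.02, so each
# parameter can take at most 2 * G + 1 = 201 distinct limited values.
print(restore(65))   # -> 1.3
print(restore(-21))  # -> -0.42
```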
On the basis of the above embodiments, further, the original model parameters are weights and biases.
For example, a deep learning model includes a plurality of neurons, each corresponding to at least one weight and one bias. The original model parameters of the deep learning model are the weights and biases of all neurons.
The following describes a specific implementation process of the image recognition model training method provided by the embodiment of the present invention, taking the training process of a handwritten digit recognition model as an example.
N handwritten digit pictures are collected, and the digit written in each picture is taken as the label of the corresponding picture to obtain a training data set.
The original model may adopt a convolutional neural network (CNN) model, a deep neural network (DNN) model, or the like, selected according to actual needs; the embodiment of the present invention is not limited thereto. The following description takes the original model to be a CNN model comprising a first feature extraction layer, a second feature extraction layer, and a classifier, where the first feature extraction layer includes a weight W1 and a bias B1, the second feature extraction layer includes a weight W2 and a bias B2, and the classifier includes a weight W3 and a bias B3.
The range of the amplification parameter values is set to [-100, 100], so the total number of possible amplification parameter values is 201. The maximum expression granularity value is 100.
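That the range [-100, 100] yields exactly 201 possible amplification values can be checked directly:

```python
G = 100
values = list(range(-G, G + 1))  # every integer amplification value
print(len(values))  # -> 201
```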
In the process of model training based on the training data set, the first round of training is performed, the first round of model is a CNN model, and a primary model of the first round of modelForm parameter
Figure 875140DEST_PATH_IMAGE011
Figure 309664DEST_PATH_IMAGE012
Figure 22405DEST_PATH_IMAGE013
Figure 415340DEST_PATH_IMAGE014
Figure 485802DEST_PATH_IMAGE015
And
Figure 583071DEST_PATH_IMAGE016
the parameter values of (a) are randomly generated. Obtaining
Figure 25685DEST_PATH_IMAGE011
Figure 182997DEST_PATH_IMAGE012
Figure 976641DEST_PATH_IMAGE013
Figure 346442DEST_PATH_IMAGE014
Figure 909141DEST_PATH_IMAGE015
And
Figure 237355DEST_PATH_IMAGE016
and (4) rounding the maximum value of the parameter values of the six parameters to obtain an adjustment reference value of the first round of training.
According to the formula y = round(x / T × G), the parameter values of the six parameters W1, b1, W2, b2, W3 and b3 are respectively substituted into the formula, and the amplification parameter values corresponding to W1, b1, W2, b2, W3 and b3 can be obtained by calculation, respectively recorded as a_W1, a_b1, a_W2, a_b2, a_W3 and a_b3.
According to the formula x' = y × T / G, the amplification parameter values a_W1, a_b1, a_W2, a_b2, a_W3 and a_b3 are respectively substituted into the formula, and the parameter values of the limiting model parameters corresponding to W1, b1, W2, b2, W3 and b3 are obtained by calculation, respectively recorded as W1', b1', W2', b2', W3' and b3'.
W1', b1', W2', b2', W3' and b3' are used to replace W1, b1, W2, b2, W3 and b3 in the CNN model to obtain the training model corresponding to the first round model. Model training is performed on the training model corresponding to the first round model; after the relevant calculation is performed and the gradient is obtained, the parameter values of W1, b1, W2, b2, W3 and b3 are updated, and the updated parameter values are taken as the parameter values of the original model parameters of the second round model. The second round model is the CNN model with the updated parameter values of W1, b1, W2, b2, W3 and b3.
From the second round of training to the end of model training, the specific process of each round of training is similar to that of the first round and is not repeated herein; when the calculated value of the loss function reaches the preset requirement, the model training is ended. After the model training is finished, the training model corresponding to the last round model can be used as the handwritten digit recognition model.
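The limiting step of a single round, as described above, can be sketched end to end (a non-authoritative sketch: the formulas round(x / T × G) for amplification and y × T / G for restoration are inferred from the description, and the helper name is hypothetical):

```python
def constrain_parameters(params, G=100):
    # Adjustment reference value: rounded maximum parameter magnitude.
    T = round(max(abs(v) for v in params))
    limited = []
    for x in params:
        y = round(x / T * G)       # amplification parameter value
        y = max(-G, min(G, y))     # keep within the range [-G, G]
        limited.append(y * T / G)  # parameter value of the limiting parameter
    return limited
```

With parameter values [0.337, -2.0, 0.5], T = 2 and the limited values come out as [0.34, -2.0, 0.5]: the off-grid 0.337 snaps to the nearest expressible value.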
The following describes a specific implementation process of the image recognition model training method provided by the embodiment of the present invention, taking a training process of an object recognition model for recognizing an object from a picture as an example.
M object pictures are collected, and the object in each object picture is taken as the object label corresponding to that picture, so as to obtain an object recognition training data set.
The original model adopts a RESNET50 model, and the RESNET50 model comprises 49 convolutional layers and 1 fully connected layer. Each convolutional layer includes a weight and a bias.
The range of the amplification parameter values is set to [-100000, 100000], so the total number of amplification parameter values is 200001. The maximum expression granularity value is 100000.
In the process of model training based on the object recognition training data set, in the first round of training, the first round model is the RESNET50 model, the original model parameters of the first round model are the weights and biases included in each convolutional layer, and their parameter values are randomly generated. The maximum of the parameter values of the weights and biases included in each convolutional layer is obtained and rounded to obtain the adjustment reference value of the first round of training.
According to the formula y = round(x / T × G), the parameter values of the weights and biases included in each convolutional layer are respectively substituted into the formula, and the amplification parameter values corresponding to the weights and biases included in each convolutional layer can be obtained by calculation, respectively recorded as a_1, a_2, a_3, …, a_n, where n is the number of original model parameters.
According to the formula x' = y × T / G, the amplification parameter values a_1, a_2, a_3, …, a_n are respectively substituted into the formula, and the parameter values of the limiting model parameters corresponding to the weights and biases included in each convolutional layer are obtained by calculation, respectively recorded as w'_1, w'_2, w'_3, …, w'_n.
The parameter values of the weights and biases included in each convolutional layer of the first round model are replaced with the calculated parameter values of the limiting model parameters to obtain the training model corresponding to the first round model. Model training is performed on the training model corresponding to the first round model; after the relevant calculation is performed and the gradient is obtained, the parameter values of the weights and biases included in each convolutional layer of the first round model are updated, and the updated values are taken as the parameter values of the original model parameters of the second round model, that is, the parameter values of the weights and biases included in each convolutional layer of the second round model. The second round model is the RESNET50 model with the updated parameter values of the weights and biases included in each convolutional layer.
From the second round of training to the end of model training, the specific process of each round of training is similar to that of the first round and is not repeated herein; when the calculated value of the loss function reaches the preset requirement, the model training is ended. After the model training is finished, the training model corresponding to the last round model can be used as the object recognition model.
According to the image recognition model training method provided by the embodiment of the invention, because the expressible range of the model parameters is limited, each expressible value of a model parameter carries more information, and the model returns to analyzing the characteristics of the samples instead of making meaningless changes within the value range; model convergence can therefore be accelerated and the model training speed increased. The speed-up of model training is proportional to the number of model parameters: the more model parameters, the greater the speed-up. Because the expression meaning of the model parameters is richer, the possibility of model overfitting is reduced, and the generalization capability of the model is improved.
The image recognition model training method provided by the embodiment of the invention can be applied to tasks such as image classification, face recognition, target detection and human shape detection in the image field; tasks such as voice wake-up, speech recognition, voiceprint recognition and speech synthesis in the speech field; and tasks such as named entity recognition, semantic analysis, text generation and text classification in the field of natural language processing, so as to accelerate model training and prevent model overfitting.
Fig. 3 is a schematic structural diagram of an image recognition model training apparatus according to a third embodiment of the present invention, and as shown in fig. 3, the image recognition model training apparatus according to the embodiment of the present invention includes an obtaining unit 301 and a training unit 302, where:
the acquiring unit 301 is configured to acquire a training data set of a target to be recognized; the training unit 302 is configured to train the original model according to a training data set to obtain an image recognition model; in each round of training process, limiting the parameter values of all original model parameters of each round of model, and obtaining the parameter values of the limited model parameters corresponding to all the original model parameters of each round of model to train the model; and the parameter values of the image recognition model parameters included by the image recognition model are the parameter values of the limiting model parameters adopted by the last round of model training.
Specifically, the obtaining unit 301 may obtain a training data set of the target to be identified, where the training data set is used for model training. The target to be identified is selected according to actual needs, and the embodiment of the invention is not limited.
The training unit 302 trains the original model according to the training data set, and the image recognition model can be obtained through training. The model training process includes multiple rounds of training. In each round of training, the parameter value of each original model parameter of the current round model can be obtained, and the parameter values of the original model parameters are limited; this restricts the number of values the parameter values can take and improves the expression granularity of each parameter value, so that the parameter values of the limiting model parameters corresponding to the original model parameters are obtained. Model training is performed with the parameter values of the limiting model parameters corresponding to the original model parameters of the current round model, and the parameter values of the original model parameters are updated. Because the number of values that the limiting model parameters can take is limited, the training speed of the model is accelerated. When the end condition of the model training is met, the server acquires the parameter values of the limiting model parameters used in the last round of model training as the parameter values of the image recognition model parameters of the image recognition model. The original model may be a convolutional neural network model, selected according to actual needs, which is not limited in the embodiments of the present invention. The first round model is the original model, and the parameter values of the original model parameters of the original model may be randomly generated or preset. From the second round of training onward, each round model is the model with the parameter values of the original model parameters updated in the previous round; that is, the parameter values of the original model parameters of each round model are the updated parameter values of the original model parameters of the previous round model.
In the embodiment of the present invention, the original model parameters refer to the parameters that need to be updated in the training process, including but not limited to weights and biases.
The image recognition model training apparatus provided by the embodiment of the invention obtains a training data set of a target to be recognized and trains an original model according to the training data set to obtain an image recognition model. In each round of training, the parameter values of the original model parameters of the current round model are limited to obtain the parameter values of the limiting model parameters corresponding to the original model parameters, and model training is performed with them; the parameter values of the image recognition model parameters included in the image recognition model are the parameter values of the limiting model parameters adopted in the last round of model training. Because the number of values that the limiting model parameters can take is limited, the training efficiency of the image recognition model is improved. In addition, since the number of values is limited, the model parameters can express more information, so that overfitting of the model can be effectively avoided.
Fig. 4 is a schematic structural diagram of an image recognition model training apparatus according to a fourth embodiment of the present invention, and as shown in fig. 4, on the basis of the foregoing embodiments, further, the training unit 302 includes an acquiring subunit 3021, an amplifying subunit 3022, a restoring subunit 3023, and a training subunit 3024, where:
the obtaining subunit 3021 is configured to obtain the maximum value of the parameter values of the original model parameters of the current round model and round it to obtain an adjustment reference value; the amplifying subunit 3022 is configured to obtain an amplification parameter value corresponding to each original model parameter of the current round model according to the parameter value of each original model parameter of the current round model, the maximum expression granularity value and the adjustment reference value; the restoring subunit 3023 is configured to restore the amplification parameter value corresponding to each original model parameter of the current round model to obtain the parameter value of the limiting model parameter corresponding to each original model parameter of the current round model; the training subunit 3024 is configured to perform model training on the current round model according to the parameter values of the limiting model parameters corresponding to the original model parameters of the current round model, and update the parameter values of the original model parameters of the current round model to obtain the parameter values of the original model parameters of the next round model.
Specifically, the obtaining subunit 3021 may obtain the parameter value of each original model parameter of the current round model, compare these parameter values to obtain the maximum value, and then round the maximum value to obtain the adjustment reference value. The current round model refers to the model currently being trained. Since the parameter values of the model are updated in every round of training, the parameter values of the original model parameters vary from round to round.
For each original model parameter of the current round model, the amplifying subunit 3022 may obtain the amplification parameter value corresponding to the original model parameter according to the parameter value of the original model parameter, the maximum expression granularity value and the adjustment reference value. The amplification parameter value may be an integer value. The value range of the amplification parameter values is predetermined and includes a limited number of values.
For the amplification parameter value corresponding to each original model parameter of the current round model, the restoring subunit 3023 may restore the amplification parameter value so that it is scaled back to the magnitude range of the parameter value of the original model parameter, thereby obtaining the parameter value of the limiting model parameter corresponding to each original model parameter. Scaling back to the magnitude range of the original parameter values prevents the calculation results from exceeding the value range allowed by the computer after the parameter values are amplified.
The training subunit 3024 performs model training on the current round model based on the parameter values of the limiting model parameters corresponding to the original model parameters of the current round model; that is, the parameter values of the original model parameters of the current round model are replaced with the parameter values of the corresponding limiting model parameters for model training. When the parameters are updated, the parameter values of the original model parameters of the current round model are updated, and the updated parameter values are taken as the parameter values of the original model parameters of the next round model. The calculation of the loss function, gradients and the like involved in the model training process is the same as in the prior art, and is not described herein again.
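One reading of the training subunit's flow, as a hedged sketch (grad_fn, the learning rate, and the plain SGD update are placeholders, not the patent's prescribed choices): the forward/backward pass uses the limited parameter values, while the update is applied to the original parameter values, which then become the next round model's originals.

```python
def training_step(params, grad_fn, lr=0.1, G=100):
    # Limit the current round model's parameters (amplify, clamp, restore).
    T = round(max(abs(v) for v in params))
    limited = [max(-G, min(G, round(x / T * G))) * T / G for x in params]
    # Gradients are computed against the limited copy...
    grads = grad_fn(limited)
    # ...but the original parameter values are the ones updated; they serve
    # as the original model parameters of the next round model.
    return [x - lr * g for x, g in zip(params, grads)]
```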
On the basis of the above embodiments, further, the amplifying subunit 3022 is specifically configured to:
calculate, according to the formula y_i = round(x_i / T × G), the amplification parameter value corresponding to the ith original model parameter of the current round model, wherein y_i represents the amplification parameter value corresponding to the ith original model parameter of the current round model, x_i represents the parameter value of the ith original model parameter of the current round model, T represents the adjustment reference value, G represents the maximum expression granularity value, G is a constant, round(·) denotes rounding, i is a positive integer, and i is less than or equal to the number of original model parameters of the current round model.
Specifically, the amplifying subunit 3022 substitutes the parameter value x_i of the ith original model parameter of the current round model, the adjustment reference value T and the maximum expression granularity value G into the formula y_i = round(x_i / T × G), and the amplification parameter value y_i corresponding to the ith original model parameter of the current round model can be obtained by calculation; the calculated y_i is an integer. The maximum expression granularity value is set according to actual needs, which is not limited in the embodiments of the present invention. The parameter value of the original model parameter is amplified so as to be mapped into the range interval of the amplification parameter values, thereby limiting the number of values that the parameter value can take.
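A sketch of the amplification step (with the out-of-range handling described below folded in; the formula shape round(x / T × G) is inferred from the description, not quoted from the patent):

```python
def amplify(x, T, G):
    # Map the parameter value x onto the integer grid [-G, G]:
    # scale by G / T, round to the nearest integer, then saturate.
    y = round(x / T * G)
    return max(-G, min(G, y))
```

For example, amplify(0.5, 2, 100) gives 25; a value whose scaled magnitude exceeds G, such as amplify(-3.0, 2, 100), saturates to -100.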
On the basis of the above embodiments, further, the amplifying subunit 3022 is further configured to:
if it is determined that the obtained amplification parameter value y_i is greater than G, set the value of y_i to G; if it is determined that the obtained y_i is less than -G, set the value of y_i to -G.
Specifically, after calculating y_i, the amplifying subunit 3022 compares the absolute value of y_i with G. If the absolute value of y_i is greater than G, y_i is out of the limited range: if y_i is a positive value, the value of y_i is replaced with G; if y_i is a negative value, the value of y_i is replaced with -G.
On the basis of the above embodiments, further, the restoring subunit 3023 is specifically configured to:
calculate, according to the formula x'_i = y_i × T / G, the parameter value of the limiting model parameter corresponding to the ith original model parameter of the current round model, wherein x'_i represents the parameter value of the limiting model parameter corresponding to the ith original model parameter of the current round model, y_i represents the amplification parameter value corresponding to the ith original model parameter of the current round model, T represents the adjustment reference value, G represents the maximum expression granularity value, i is a positive integer, and i is less than or equal to the number of original model parameters of the current round model.
Specifically, the restoring subunit 3023 substitutes the amplification parameter value y_i corresponding to the ith original model parameter of the current round model, the adjustment reference value T and the maximum expression granularity value G into the formula x'_i = y_i × T / G, and the parameter value x'_i of the limiting model parameter corresponding to the ith original model parameter of the current round model can be obtained by calculation. The parameter values x'_i of the limiting model parameters are used for model training.
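The restoration step is then a single scaling back (a sketch under the same inferred formula y × T / G):

```python
def restore(y, T, G):
    # Scale the integer amplification value back into the magnitude
    # range of the original parameter values.
    return y * T / G
```

For example, restore(25, 2, 100) returns 0.5, one of the 2 × G + 1 expressible values.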
On the basis of the above embodiments, further, the original model parameters are weights and biases.
The embodiment of the apparatus provided in the embodiment of the present invention may be specifically configured to execute the processing flows of the above method embodiments, and the functions of the apparatus are not described herein again, and refer to the detailed description of the above method embodiments.
Fig. 5 is a schematic physical structure diagram of an electronic device according to a fifth embodiment of the present invention, and as shown in fig. 5, the electronic device may include: a processor (processor)501, a communication Interface (Communications Interface)502, a memory (memory)503, and a communication bus 504, wherein the processor 501, the communication Interface 502, and the memory 503 are configured to communicate with each other via the communication bus 504. The processor 501 may call logic instructions in the memory 503 to perform the following method: acquiring a training data set of a target to be recognized; training an original model according to a training data set to obtain an image recognition model; in each round of training process, limiting the parameter values of all original model parameters of each round of model, obtaining the parameter values of the limited model parameters corresponding to all the original model parameters of each round of model, and performing model training; and the parameter values of the image recognition model parameters included by the image recognition model are the parameter values of the limiting model parameters adopted by the last round of model training.
In addition, the logic instructions in the memory 503 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, comprising: acquiring a training data set of a target to be recognized; training an original model according to a training data set to obtain an image recognition model; in each round of training process, limiting the parameter values of all original model parameters of each round of model, obtaining the parameter values of the limited model parameters corresponding to all the original model parameters of each round of model, and performing model training; and the parameter values of the image recognition model parameters included by the image recognition model are the parameter values of the limiting model parameters adopted by the last round of model training.
The present embodiment provides a computer-readable storage medium, which stores instructions that cause the computer to execute the method provided by the above method embodiments, for example, including: acquiring a training data set of a target to be recognized; training an original model according to a training data set to obtain an image recognition model; in each round of training process, limiting the parameter values of all original model parameters of each round of model, obtaining the parameter values of the limited model parameters corresponding to all the original model parameters of each round of model, and performing model training; and the parameter values of the image recognition model parameters included by the image recognition model are the parameter values of the limiting model parameters adopted by the last round of model training.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. An image recognition model training method is characterized by comprising the following steps:
acquiring a training data set of a target to be recognized;
training an original model according to a training data set to obtain an image recognition model; in each round of training process, limiting the parameter values of all original model parameters of each round of model, obtaining the parameter values of the limited model parameters corresponding to all the original model parameters of each round of model, and performing model training; and the parameter values of the image recognition model parameters included by the image recognition model are the parameter values of the limiting model parameters adopted by the last round of model training.
2. The method according to claim 1, wherein constraining the parameter values of all original model parameters of the current-round model in each round of training, obtaining the parameter values of the constrained model parameters corresponding to the original model parameters, and performing model training comprises:
acquiring the maximum value among the parameter values of all original model parameters of the current-round model, and rounding it to obtain an adjustment reference value;
obtaining an amplification parameter value corresponding to each original model parameter of the current-round model according to the parameter value of each original model parameter of the current-round model, the maximum representation granularity value and the adjustment reference value;
restoring the amplification parameter value corresponding to each original model parameter of the current-round model to obtain the parameter value of the constrained model parameter corresponding to each original model parameter of the current-round model;
and performing the current round of model training according to the parameter values of the constrained model parameters corresponding to the original model parameters of the current-round model, and updating the parameter values of the original model parameters of the current-round model to obtain the parameter values of the original model parameters of the next-round model.
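The four steps of this claim can be sketched in plain Python; the function name, the guard against a zero adjustment reference, and the choice G = 127 (a signed 8-bit-style granularity) are illustrative assumptions, not taken from the patent text:

```python
def constrain_params(params, G=127):
    """One round of the parameter-constraint procedure described in the claim.

    G (the maximum representation granularity value) and the zero-guard
    are assumptions made for this sketch.
    """
    # Step 1: adjustment reference = rounded maximum absolute parameter value.
    B = round(max(abs(p) for p in params)) or 1
    # Step 2: amplify each parameter onto an integer grid of resolution G.
    amplified = [round(p / B * G) for p in params]
    # Step 3: clamp amplified values that overflow the representable range.
    amplified = [max(-G, min(G, a)) for a in amplified]
    # Step 4: restore to the original scale; these constrained values are
    # what the current round of training actually uses.
    return [a * B / G for a in amplified]

weights = [0.37, -0.82, 1.45, -2.6]
constrained = constrain_params(weights)   # B = round(2.6) = 3
```

The updated raw parameters, not the constrained copies, are carried into the next round; only the final round's constrained values become the trained model's parameters.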
3. The method according to claim 2, wherein obtaining the amplification parameter value corresponding to each original model parameter of the current-round model according to the parameter value of each original model parameter of the current-round model, the maximum representation granularity value and the adjustment reference value comprises:
calculating the amplification parameter value corresponding to the ith original model parameter of the current-round model according to the formula

A_i = round(P_i / B × G)

wherein A_i represents the amplification parameter value corresponding to the ith original model parameter of the current-round model, P_i represents the parameter value of the ith original model parameter of the current-round model, B represents the adjustment reference value, G represents the maximum representation granularity value and is a constant, round(·) denotes rounding, i is a positive integer, and i is less than or equal to the number of original model parameters of the current-round model.
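One plausible reading of the amplification formula, with P_i denoting the parameter value of the ith original model parameter, B the adjustment reference value, and G the maximum representation granularity value (names chosen here for illustration), can be sketched as:

```python
def amplify(p_i, B, G=127):
    # A_i = round(P_i / B * G): project the parameter onto an integer
    # grid of resolution G (G = 127 mimics a signed 8-bit range; the
    # value of G is an assumption for this sketch).
    return round(p_i / B * G)

a = amplify(0.37, B=1)   # round(0.37 * 127) = round(46.99) = 47
```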
4. The method of claim 3, further comprising:
if judged to acquire
Figure 703244DEST_PATH_IMAGE002
Is greater than
Figure 542762DEST_PATH_IMAGE005
Then will be
Figure 801705DEST_PATH_IMAGE002
Take a value of
Figure 962559DEST_PATH_IMAGE005
If judged to acquire
Figure 230729DEST_PATH_IMAGE002
Is less than
Figure 895060DEST_PATH_IMAGE005
Then will be
Figure 856062DEST_PATH_IMAGE002
Take a value of
Figure 504212DEST_PATH_IMAGE005
5. The method according to claim 2, wherein restoring the amplification parameter value corresponding to each original model parameter of the current-round model to obtain the parameter value of the constrained model parameter corresponding to each original model parameter of the current-round model comprises:
calculating the parameter value of the constrained model parameter corresponding to the ith original model parameter of the current-round model according to the formula

R_i = A_i × B / G

wherein R_i represents the parameter value of the constrained model parameter corresponding to the ith original model parameter of the current-round model, A_i represents the amplification parameter value corresponding to the ith original model parameter of the current-round model, B represents the adjustment reference value, G represents the maximum representation granularity value, i is a positive integer, and i is less than or equal to the number of original model parameters of the current-round model.
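The restore step is the inverse scaling of the amplification step: R_i = A_i × B / G, with A_i the amplification parameter value, B the adjustment reference value, and G the maximum representation granularity value. A sketch, with illustrative names and values:

```python
def restore(a_i, B, G=127):
    # R_i = A_i * B / G: return the integer grid value to the original
    # parameter scale. Names and G = 127 are assumptions for this sketch.
    return a_i * B / G

# Round-trip for a parameter 0.37 with adjustment reference B = 1:
# amplify -> round(0.37 / 1 * 127) = 47, restore -> 47 * 1 / 127
r = restore(47, B=1)
```

The restored value is a quantized stand-in for the raw parameter, so training on it is what makes the model tolerant of this constrained representation.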
6. The method according to any one of claims 1 to 5, wherein the original model parameters are weights and biases.
7. An image recognition model training apparatus, comprising:
the acquisition unit is used for acquiring a training data set of a target to be recognized;
the training unit is used for training the original model according to the training data set to obtain an image recognition model; in each round of training, the parameter values of all original model parameters of the current-round model are constrained to obtain the parameter values of the constrained model parameters corresponding to the original model parameters, and model training is performed with them; the parameter values of the image recognition model parameters comprised by the image recognition model are the parameter values of the constrained model parameters used in the last round of model training.
8. The apparatus of claim 7, wherein the training unit comprises:
the acquisition subunit is used for acquiring the maximum value among the parameter values of all original model parameters of the current-round model and rounding it to obtain an adjustment reference value;
the amplification subunit is used for obtaining an amplification parameter value corresponding to each original model parameter of the current-round model according to the parameter value of each original model parameter of the current-round model, the maximum representation granularity value and the adjustment reference value;
the restoring subunit is used for restoring the amplification parameter value corresponding to each original model parameter of the current-round model to obtain the parameter value of the constrained model parameter corresponding to each original model parameter of the current-round model;
and the training subunit is used for performing the current round of model training according to the parameter values of the constrained model parameters corresponding to the original model parameters of the current-round model, and updating the parameter values of the original model parameters of the current-round model to obtain the parameter values of the original model parameters of the next-round model.
9. The apparatus of claim 8, wherein the amplification subunit is specifically configured to:
calculate the amplification parameter value corresponding to the ith original model parameter of the current-round model according to the formula

A_i = round(P_i / B × G)

wherein A_i represents the amplification parameter value corresponding to the ith original model parameter of the current-round model, P_i represents the parameter value of the ith original model parameter of the current-round model, B represents the adjustment reference value, G represents the maximum representation granularity value and is a constant, round(·) denotes rounding, i is a positive integer, and i is less than or equal to the number of original model parameters of the current-round model.
10. The apparatus of claim 9, wherein the amplification subunit is further configured to:
if judged to acquire
Figure 189238DEST_PATH_IMAGE002
Is greater than
Figure 833846DEST_PATH_IMAGE005
Then will be
Figure 900023DEST_PATH_IMAGE002
Take a value of
Figure 655489DEST_PATH_IMAGE005
If judged to acquire
Figure 123511DEST_PATH_IMAGE002
Is less than
Figure 673441DEST_PATH_IMAGE005
Then will be
Figure 256606DEST_PATH_IMAGE002
Take a value of
Figure 550184DEST_PATH_IMAGE005
11. The apparatus according to claim 8, wherein the restoring subunit is specifically configured to:
calculate the parameter value of the constrained model parameter corresponding to the ith original model parameter of the current-round model according to the formula

R_i = A_i × B / G

wherein R_i represents the parameter value of the constrained model parameter corresponding to the ith original model parameter of the current-round model, A_i represents the amplification parameter value corresponding to the ith original model parameter of the current-round model, B represents the adjustment reference value, G represents the maximum representation granularity value, i is a positive integer, and i is less than or equal to the number of original model parameters of the current-round model.
12. The apparatus according to any one of claims 7 to 11, wherein the original model parameters are weights and biases.
13. A computer device comprising a memory, a processor, and instructions stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 6 when executing the instructions.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when executed by a processor, implement the method of any of claims 1 to 6.
CN202210883039.7A 2022-07-26 2022-07-26 Image recognition model training method and device Active CN114972928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210883039.7A CN114972928B (en) 2022-07-26 2022-07-26 Image recognition model training method and device


Publications (2)

Publication Number Publication Date
CN114972928A true CN114972928A (en) 2022-08-30
CN114972928B CN114972928B (en) 2022-11-11

Family

ID=82970029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210883039.7A Active CN114972928B (en) 2022-07-26 2022-07-26 Image recognition model training method and device

Country Status (1)

Country Link
CN (1) CN114972928B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI855773B (en) 2023-07-14 2024-09-11 新唐科技股份有限公司 Training method of image processing model

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952235A (en) * 2017-02-10 2017-07-14 维沃移动通信有限公司 A kind of image processing method and mobile terminal
CN107516112A (en) * 2017-08-24 2017-12-26 北京小米移动软件有限公司 Object type recognition methods, device, equipment and storage medium
JP2019159694A (en) * 2018-03-12 2019-09-19 Kddi株式会社 Information processing device, information processing method, and program
CN110610237A (en) * 2019-09-17 2019-12-24 普联技术有限公司 Quantitative training method and device of model and storage medium
CN111783996A (en) * 2020-06-18 2020-10-16 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
CN112085205A (en) * 2019-06-14 2020-12-15 第四范式(北京)技术有限公司 Method and system for automatically training machine learning models
US20210092280A1 (en) * 2019-09-24 2021-03-25 Sony Corporation Artificial intelligence (ai)-based control of imaging parameters of image-capture apparatus
CN112668639A (en) * 2020-12-28 2021-04-16 苏州浪潮智能科技有限公司 Model training method and device, server and storage medium
CN112818387A (en) * 2021-01-22 2021-05-18 百度在线网络技术(北京)有限公司 Method, apparatus, storage medium, and program product for model parameter adjustment
CN113255445A (en) * 2021-04-20 2021-08-13 杭州飞步科技有限公司 Multitask model training and image processing method, device, equipment and storage medium
CN113344117A (en) * 2021-06-28 2021-09-03 清华大学 Network training method and device, target identification method and device and electronic equipment
CN114463586A (en) * 2022-01-30 2022-05-10 中国农业银行股份有限公司 Training and image recognition method, device, equipment and medium of image recognition model
CN114548213A (en) * 2021-12-29 2022-05-27 浙江大华技术股份有限公司 Model training method, image recognition method, terminal device, and computer medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NICHOLAS MERRILL et al.: "Unsupervised Ensemble-Kernel Principal Component Analysis for Hyperspectral Anomaly Detection", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) *
XUZHEN HE et al.: "Efficient reliability analysis considering uncertainty in random field parameters: Trained neural networks as surrogate models", Computers and Geotechnics *
张宇航: "Application Research of E-commerce Shoe Image Retrieval Technology Based on Deep Metric Learning", China Master's Theses Full-text Database, Information Science and Technology *
胡金涛: "A General Recommendation Service System Based on Deep Learning and Stream Processing Technology", China Master's Theses Full-text Database, Information Science and Technology *


Also Published As

Publication number Publication date
CN114972928B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
KR102071582B1 (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
CN111164601B (en) Emotion recognition method, intelligent device and computer readable storage medium
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
EP3570220B1 (en) Information processing method, information processing device, and computer-readable storage medium
CN113378940B (en) Neural network training method and device, computer equipment and storage medium
CN108197294A (en) A kind of text automatic generation method based on deep learning
CN110427802B (en) AU detection method and device, electronic equipment and storage medium
CN110502976A (en) The training method and Related product of text identification model
Lam Word2bits-quantized word vectors
CN108805833A (en) Miscellaneous minimizing technology of copybook binaryzation ambient noise of network is fought based on condition
CN111241820A (en) Bad phrase recognition method, device, electronic device, and storage medium
CN117951649B (en) Training method, device and equipment of polypeptide and receptor binding activity prediction model
CN114359592A (en) Model training and image processing method, device, equipment and storage medium
CN110570844A (en) Speech emotion recognition method and device and computer readable storage medium
CN109961152B (en) Personalized interaction method and system of virtual idol, terminal equipment and storage medium
CN110858307B (en) Character recognition model training method and device and character recognition method and device
CN109948569B (en) Three-dimensional mixed expression recognition method using particle filter framework
CN114662601A (en) Intention classification model training method and device based on positive and negative samples
CN117037006B (en) Unmanned aerial vehicle tracking method with high endurance capacity
CN114444686A (en) Method and device for quantizing model parameters of convolutional neural network and related device
CN114972928B (en) Image recognition model training method and device
CN116958712B (en) Image generation method, system, medium and device based on prior probability distribution
KR20190129698A (en) Electronic apparatus for compressing recurrent neural network and method thereof
CN114707483B (en) Zero sample event extraction system and method based on contrast learning and data enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant