WO2021147365A1 - Image processing model training method and device - Google Patents

Image processing model training method and device

Info

Publication number
WO2021147365A1
Authority: WO (WIPO, PCT)
Prior art keywords: neurons, parameters, neural network, group, image processing
Application number: PCT/CN2020/117900
Other languages: English (en), French (fr)
Inventors: 邹声元 (Zou Shengyuan), 常亚 (Chang Ya)
Original Assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP20915287.5A (published as EP4080415A4)
Publication of WO2021147365A1
Priority to US17/871,389 (published as US20220366254A1)

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 1/00: General purpose image data processing
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
          • G06N 3/02: Neural networks
          • G06N 3/04: Architecture, e.g. interconnection topology
          • G06N 3/045: Combinations of networks
          • G06N 3/048: Activation functions
          • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
          • G06N 3/061: Physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit
          • G06N 3/08: Learning methods
          • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00: Arrangements for image or video recognition or understanding
          • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
          • G06V 10/82: Arrangements for image or video recognition or understanding using neural networks
          • G06V 10/98: Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; evaluation of the quality of the acquired patterns

Definitions

  • the invention relates to the technical field of neural networks, and in particular to an image processing model training method and device.
  • an image processing model is used to perform processing on images, such as detection, segmentation, and classification.
  • the image processing model is usually a model based on a neural network architecture.
  • the image processing model is composed of multiple neural network layers, and each neural network layer includes multiple neurons. The parameters of these neurons can be trained with the training data in the training data set, thereby training the image processing model.
  • the training data in the training data set is input to the image processing model, and the image processing model calculates an output result for the training data.
  • the annotation result of the training data is compared with the output result of the image processing model, and the parameters of the image processing model are adjusted based on the comparison, until the output result of the image processing model approaches or matches the annotation result.
  • the accuracy of the image processing model is generally verified with test data.
  • the problem of over-fitting may occur when verifying the image processing model. Over-fitting means that the image processing model fits the annotation results of the training data well but cannot fit the annotation results of the test data well; as the number of training passes increases, the model fits the training annotations better and better while fitting the test annotations worse and worse. Over-fitting therefore affects the accuracy of the image processing model, and how to suppress it has become an important problem to be solved in image processing.
  • the embodiments of the present application provide an image processing model training method and device, which are used to achieve the effect of suppressing overfitting and improve the accuracy of the image processing model.
  • an embodiment of the present application provides an image processing model training method.
  • the method includes: inputting image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been enlarged and the parameters of n2 neurons have been reduced; calculating the error between the annotation result of the image data in the training data set and the processing result; and adjusting the parameters of the image processing model according to the error between the annotation result and the processing result; n1 and n2 are positive integers.
  • the image processing model training device perturbs the training process of the image processing model by scaling the parameters of neurons in the image processing model, which improves the anti-interference ability of the image processing model and thereby achieves the effect of suppressing over-fitting; this improves the accuracy of the image processing model while also preserving its training efficiency.
  • the image processing model is a model based on a neural network architecture
  • the neural network architecture includes M neural network layers
  • the M neural network layers include an input layer, a hidden layer, and an output layer
  • the parameters of n1 neurons in the m neural network layers in the image processing model are enlarged, and the parameters of n2 neurons in the m neural network layers are reduced; M and m are positive integers, and m is less than or equal to M.
  • the parameters of neurons in m of the neural network layers are selected for scaling. Varying which m neural network layers are selected in each training pass can further improve the anti-interference ability of the image processing model and achieve a better over-fitting suppression effect.
  • before inputting the image data in the training data set into the image processing model for processing, the method further includes: determining a scaling ratio and a scaling factor for each neural network layer in the m neural network layers, where the scaling factor includes a magnification factor and a reduction factor; determining, according to the scaling ratio of each neural network layer, the neurons whose parameters are to be enlarged and the neurons whose parameters are to be reduced in each neural network layer, where n1 is the sum of the number of neurons whose parameters are to be enlarged in each neural network layer and n2 is the sum of the number of neurons whose parameters are to be reduced in each neural network layer; enlarging, according to the magnification factor of each neural network layer, the parameters of the neurons to be enlarged in that layer; and reducing, according to the reduction factor of each neural network layer, the parameters of the neurons to be reduced in that layer.
  • each neural network layer of the m neural network layers includes at least one group of neurons whose parameters are to be enlarged and at least one group of neurons whose parameters are to be reduced, and the at least one group of neurons to be enlarged and the at least one group of neurons to be reduced together form N groups of neurons. Enlarging the parameters according to the magnification factor of each neural network layer means enlarging the parameters of the neurons in each group to be enlarged according to the magnification factor corresponding to that group; reducing the parameters according to the reduction factor of each neural network layer means reducing the parameters of the neurons in each group to be reduced according to the reduction factor corresponding to that group.
  • the number of neurons in each group of neurons in the N groups of neurons can be the same or different.
  • when the number of neurons in each group of the N groups is the same, the following condition is met: N is the sum of the magnification factors corresponding to the groups of neurons whose parameters are to be enlarged and the reduction factors corresponding to the groups of neurons whose parameters are to be reduced.
  • when the number of neurons in each group of the N groups is different, the following condition is met: N is the sum of the total magnification of all neurons in each group to be enlarged and the total reduction of all neurons in each group to be reduced, where the total magnification of a group to be enlarged is the product of the number of neurons in that group and the corresponding magnification factor, and the total reduction of a group to be reduced is the product of the number of neurons in that group and the corresponding reduction factor.
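  • restated compactly (a plain transcription of the two conditions above, using the notation introduced later in the description: g_i is the number of neurons in the i-th group, t_i the scaling factor corresponding to that group, and the N groups cover both the groups to be enlarged and the groups to be reduced):

```latex
\text{equal group sizes:}\quad \sum_{i=1}^{N} t_i = N
\qquad\qquad
\text{different group sizes:}\quad \sum_{i=1}^{N} g_i\, t_i = N
```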
  • the image data is all or part of the image data in the training data set.
  • after adjusting the parameters of the image processing model, the method further includes: reducing the parameters of the n1 neurons; and/or enlarging the parameters of the n2 neurons.
  • an embodiment of the present application provides an image processing model training device, which may have the function of realizing the above-mentioned first aspect or any one of the possible designs of the first aspect.
  • the functions of the above-mentioned image processing model training device can be realized by hardware, or can be realized by hardware executing corresponding software, and the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the device may include: a processing unit, a calculation unit, and an adjustment unit.
  • the processing unit is used to input the image data in the training data set into the image processing model for processing to obtain the processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been enlarged and the parameters of n2 neurons have been reduced; n1 and n2 are positive integers;
  • a calculation unit configured to calculate the error between the annotation result of the image data in the training data set and the processing result
  • the adjustment unit is configured to adjust the parameters of the image processing model according to the error between the annotation result and the processing result.
  • the image processing model is a model based on a neural network architecture
  • the neural network architecture includes M neural network layers
  • the M neural network layers include an input layer, a hidden layer, and an output layer;
  • the parameters of n1 neurons in the m neural network layers in the image processing model are enlarged, and the parameters of n2 neurons in the m neural network layers are reduced;
  • M and m are positive integers, and m is less than or equal to M.
  • the device further includes:
  • the scaling unit is used to: determine the scaling ratio and the scaling factor of each neural network layer in the m neural network layers, where the scaling factor includes a reduction factor and a magnification factor; determine, according to the scaling ratio of each neural network layer in the m neural network layers, the neurons whose parameters are to be enlarged and the neurons whose parameters are to be reduced in each neural network layer, where n1 is the sum of the number of neurons whose parameters are to be enlarged in each neural network layer and n2 is the sum of the number of neurons whose parameters are to be reduced in each neural network layer; enlarge, according to the magnification factor of each neural network layer, the parameters of the neurons to be enlarged in that layer; and reduce, according to the reduction factor of each neural network layer, the parameters of the neurons to be reduced in that layer.
  • each neural network layer of the m neural network layers includes at least one group of neurons whose parameters are to be enlarged and at least one group of neurons whose parameters are to be reduced, and these groups together form N groups of neurons;
  • the scaling unit is specifically configured to: enlarge the parameters of the neurons in each group of neurons to be enlarged according to the magnification factor corresponding to that group in each neural network layer; and reduce the parameters of the neurons in each group of neurons to be reduced according to the reduction factor corresponding to that group in each neural network layer.
  • when the number of neurons in each group of the N groups is the same, the following condition is met: N is the sum of the magnification factors corresponding to the groups of neurons whose parameters are to be enlarged and the reduction factors corresponding to the groups of neurons whose parameters are to be reduced.
  • when the number of neurons in each group of the N groups is different, the following condition is met: N is the sum of the total magnification of all neurons in each group to be enlarged and the total reduction of all neurons in each group to be reduced, where the total magnification of a group to be enlarged is the product of the number of neurons in that group and the corresponding magnification factor, and the total reduction of a group to be reduced is the product of the number of neurons in that group and the corresponding reduction factor.
  • the image data is all or part of the image data in the training data set.
  • the device further includes:
  • the restoration unit is used to reduce the parameters of the n1 neurons; and/or to enlarge the parameters of the n2 neurons.
  • an embodiment of the present application provides an image processing model training device, which may have the function of realizing the foregoing first aspect or any one of the possible designs of the first aspect.
  • the functions of the above-mentioned image processing model training device can be realized by hardware, or can be realized by hardware executing corresponding software, and the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the structure of the device includes at least one processor, and may also include at least one memory. At least one processor is coupled with at least one memory, and can be used to execute computer program instructions stored in the memory, so that the device executes the above-mentioned first aspect or any one of the possible design methods of the first aspect.
  • the device further includes a communication interface, and the processor is coupled with the communication interface.
  • the communication interface may be a transceiver or an input/output interface; when the device is a chip included in the server, the communication interface may be an input/output interface of the chip.
  • the transceiver may be a transceiver circuit, and the input/output interface may be an input/output circuit.
  • an embodiment of the present application provides a chip system, including a processor coupled with a memory, where the memory is used to store a program or instructions which, when executed by the processor, cause the chip system to implement the method of the first aspect or any one of the possible designs of the first aspect.
  • the chip system further includes an interface circuit for receiving and transmitting code instructions to the processor.
  • there may be one or more processors in the chip system, and the processors may be implemented by hardware or software.
  • the processor may be a logic circuit, an integrated circuit, or the like.
  • the processor may be a general-purpose processor, which is implemented by reading software codes stored in the memory.
  • the memory may be integrated with the processor, or may be provided separately from the processor, which is not limited in this application.
  • the memory may be a non-transitory memory, such as a read-only memory (ROM), and it may be integrated with the processor on the same chip or provided on different chips.
  • the manner of providing the memory and the processor is not specifically limited in this application.
  • an embodiment of the present application provides a readable storage medium having a computer program or instructions stored thereon; when the computer program or instructions are executed, the computer executes the method of the first aspect or any one of the possible designs of the first aspect.
  • embodiments of the present application provide a computer program product which, when read and executed by a computer, causes the computer to execute the method of the first aspect or any one of the possible designs of the first aspect.
  • FIG. 1 is a schematic diagram of the architecture of an image processing model provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of the architecture of an image processing model applying the discarding method
  • FIG. 3 is a schematic structural diagram of a training device provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of an image processing model training process provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a neural network structure provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of a CNN-based image classification model provided by an embodiment of the application.
  • FIG. 7 is a schematic structural diagram of a VGG-based image classification model provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of image classification provided by an embodiment of this application.
  • FIG. 9 is a schematic diagram of a flow of vehicle detection provided by an embodiment of the application.
  • FIG. 10 is a schematic diagram of a vehicle detection provided by an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of a 3D CNN provided by an embodiment of this application.
  • FIG. 12 is a schematic diagram of a training process of an image processing model provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of an image processing model training device provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram of another structure of an image processing model training device provided by an embodiment of the application.
  • the present application provides an image processing model training method and device, aiming to better suppress the over-fitting problem generated in the image processing model training process and improve the accuracy of the image processing model.
  • the method and the device are based on the same technical concept. Since the principles by which the method and the device solve the problem are similar, the implementations of the device and the method can refer to each other, and repeated descriptions are omitted.
  • Image processing model: a model used for image processing, such as detection, classification, and segmentation.
  • the image processing model is usually a model based on a neural network (NN) architecture.
  • the image processing model is composed of multiple neural network layers.
  • the neural network layer includes an input layer, an output layer and a hidden layer.
  • the number of input layer, output layer and hidden layer is one or more.
  • the image processing model includes one input layer, multiple hidden layers and one output layer.
  • Each neural network layer includes multiple neurons, a linear operator and a nonlinear activation function, and the linear operator includes multiple weights and a bias.
  • the weight is also referred to as w for short
  • the bias is also referred to as a for short.
  • the nonlinear activation function includes one or more of the Sigmoid function or the rectified linear unit (ReLU) function.
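  • in this notation, a neuron in such a layer computes a weighted sum of its inputs followed by the activation function; the two activations named above have the standard definitions (added here for reference):

```latex
y = f\Big(\sum_j w_j x_j + a\Big), \qquad
\operatorname{Sigmoid}(z) = \frac{1}{1 + e^{-z}}, \qquad
\operatorname{ReLU}(z) = \max(0, z)
```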
  • each neuron includes a set of corresponding parameters.
  • the training of the image processing model can be achieved by training the parameters of the neuron in the image processing model.
  • Neural networks include feedforward neural networks (FNN), convolutional neural networks (CNN), recurrent neural networks (RNN), autoencoders (AE), generative adversarial networks (GAN), and so on.
  • Training data and test data: the training data is used to train the image processing model and is also called sample data; the test data is used to verify the accuracy of the image processing model.
  • both training data and test data are marked with results.
  • all or part of the training data can be used for training.
  • using all of the training data for one training pass can be called one epoch of training; using part of the training data for one training pass can be called one batch of training.
  • all training data can be divided into multiple parts in advance, and one part of the training data is called a batch of data.
  • Under-fitting means that the image processing model cannot fit the annotation results of the training data well; over-fitting means that the image processing model fits the annotation results of the training data well but cannot fit the annotation results of the test data, and as the number of training passes increases, the fit to the training data gets better while the fit to the test data gets worse.
  • the scaling ratio refers to, for each neural network layer whose parameters need to be scaled, the ratio of the number of neurons whose parameters need to be scaled to the number of all neurons in that layer.
  • the scaling factor refers to, for each neural network layer whose parameters need to be scaled, the magnification factor applied to the parameters of the neurons to be enlarged and the reduction factor applied to the parameters of the neurons to be reduced.
  • the "and/or” in this application describes the association relationship of the associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone. This situation.
  • the character "/" generally indicates that the associated objects are in an "or” relationship.
  • the multiple involved in this application refers to two or more.
  • the training data in the training data set is labeled with results, that is, the training data set includes training data and the labeled results corresponding to the training data.
  • the training equipment inputs the training data into the image processing model for processing.
  • the image processing model calculates an output result for the training data, and the parameters of the neurons in the image processing model are adjusted according to the error between the annotation result of the training data and the output result of the image processing model.
  • when the output result approaches or matches the annotation result, the training device determines that the training of the image processing model is completed.
  • the training device uses the test data of the actual scene to verify the accuracy of the image processing model after the training.
  • problems of under-fitting and over-fitting may occur.
  • the purpose of training the image processing model is to enable the image processing model to correctly predict the results of the input data, but under-fitting and over-fitting will affect the accuracy of the prediction results.
  • the problem of underfitting can generally be solved by increasing the number of neural network layers in the image processing model and/or increasing the number of neurons in the neural network layer.
  • in the discarding method, before training the image processing model, the training device determines a discarding rate, and according to the discarding rate and the first number of neurons in the hidden layer determines a second number of neurons in the hidden layer that need to be discarded. In the training process of the image processing model, the training device randomly selects the second number of neurons in the hidden layer to discard; that is, the randomly selected neurons do not participate in this training pass.
  • the discarding method can improve the generalization ability of the image processing model.
  • by discarding neurons, interference is added to the input of the hidden layer following the affected hidden layer, which improves the image processing model's anti-interference ability and thereby achieves the effect of suppressing over-fitting.
  • suppose there are n neurons in a hidden layer of the image processing model, namely neuron 1, neuron 2, ..., neuron n.
  • during each training pass, the training device randomly discards several neurons; for example, it discards neuron 1 in the hidden layer during the first training pass and discards neuron 2 in the hidden layer during the second training pass.
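  • for contrast with the scaling method introduced below, the discarding step just described can be sketched as follows (a minimal illustration; the NumPy formulation and the function name are ours, not the source's):

```python
import numpy as np

def dropout_forward(h: np.ndarray, drop_rate: float) -> np.ndarray:
    """Randomly discard neurons: the second number of neurons
    (first number x discard rate) do not participate in this training pass."""
    n = h.shape[-1]                          # first number: neurons in the hidden layer
    n_drop = int(n * drop_rate)              # second number: neurons to discard
    drop_idx = np.random.choice(n, n_drop, replace=False)
    h = h.copy()
    h[..., drop_idx] = 0.0                   # discarded neurons contribute nothing
    return h
```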
  • this application proposes an image processing model training method and device to achieve a better suppression of overfitting.
  • the training device determines the scaling ratio and the scaling factor.
  • the training device determines, according to the scaling ratio, the number of neurons whose parameters need to be scaled in each neural network layer.
  • the parameters of the neurons that need to be enlarged are enlarged according to the magnification factor in the scaling factor, and the parameters of the neurons that need to be reduced are reduced according to the reduction factor in the scaling factor.
  • the number of neurons in each neural network layer does not change during training in the present application, so there is no need to increase the number of training passes; this suppresses over-fitting while ensuring the training efficiency of the image processing model.
  • the image processing model training method proposed in the present application is also referred to as a scaling (Scaleout) method.
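  • a minimal sketch of the scaling step on one fully connected layer follows (PyTorch is assumed, as are the helper name scale_layer_, an equal count of enlarged and reduced neurons, and a single magnification factor X and reduction factor Y per layer; none of these choices is fixed by the source):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def scale_layer_(layer: nn.Linear, ratio: float,
                 x_factor: float, y_factor: float) -> torch.Tensor:
    """Enlarge the parameters of `ratio` of the neurons by x_factor and
    reduce the parameters of another `ratio` of the neurons by y_factor.
    Returns the per-neuron multiplier so the scaling can be undone later."""
    n = layer.out_features
    k = int(n * ratio)                           # neurons to enlarge == neurons to reduce
    perm = torch.randperm(n, device=layer.weight.device)   # random selection, as by generator 311
    mult = torch.ones(n, device=layer.weight.device)
    mult[perm[:k]] = x_factor                    # parameters to be enlarged
    mult[perm[k:2 * k]] = y_factor               # parameters to be reduced
    layer.weight.mul_(mult.unsqueeze(1))         # row i holds neuron i's weights w
    layer.bias.mul_(mult)                        # and its bias a
    return mult
```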
  • the training device 300 may include a controller 310, a memory 320, and an image processing model 330, where the controller 310 includes a random number generator 311.
  • the image processing model 330 is used to process images.
  • the image processing model 330 is composed of an input layer, multiple hidden layers, and an output layer.
  • Each hidden layer includes multiple neurons
  • each neuron includes a set of corresponding parameters
  • the parameters of the neuron include weight w and bias a.
  • the parameters of neurons in the hidden layer in the image processing model 330 are scaled, and then the image processing model 330 is trained.
  • the trained image processing model 330 can process an input image and output the processing result, and can achieve the effect of suppressing over-fitting.
  • the image processing model 330 is a model based on the FNN structure, and the parameters of neurons in the fully connected (FC) hidden layer of the image processing model 330 can be scaled to achieve the effect of suppressing overfitting.
  • the number of neurons in each fully connected hidden layer is not limited in this application; for example, a layer may include a smaller number of neurons such as 16 or 32, or a larger number such as 1024 or 2048.
  • the image processing model 330 may be a model based on the CNN structure, and the parameters of the neurons in the fully connected layers of the image processing model 330 can be scaled to achieve the effect of suppressing over-fitting. Because the CNN structure itself has powerful image processing capabilities, the image processing model 330 based on the CNN structure can achieve good processing results in image classification, target detection, semantic/instance segmentation, face detection, face recognition, and image quality enhancement.
  • scaling the parameters of neurons in the hidden layer is taken as an example. In actual possible scenarios, the parameters of neurons in the input layer and/or output layer may also be scaled, which is not limited in this application.
  • the memory 320 is used to store data related to the training process of the image processing model 330, including but not limited to one or more of the following: the training data set (the training data set includes training data and the annotation results corresponding to the training data), the number of neural network layers, the number of neurons in each neural network layer, the first parameter of each neuron before each training pass, the scaling ratio and scaling factor, and, for each training pass, which neurons' parameters are enlarged and which neurons' parameters are reduced.
  • the training data of the image processing model includes image data
  • the annotation result corresponding to the training data includes the annotation result (such as an annotation box) for the target object in the image data.
  • different neural network layers can adopt the same scaling ratio and scaling factor, or each neural network layer can adopt its own scaling ratio and scaling factor.
  • the scaling ratio and scaling factor may remain unchanged across training passes, or may be adjusted during training; for example, the scaling ratio and scaling factor may decrease as the number of training passes increases.
  • the scaling ratio satisfies the following condition: b ≤ ratio ≤ c, where ratio represents the scaling ratio, b ≥ 0, and c ≤ 1.
  • b and c can be set values, or values selected based on experimental results or actual usage requirements. For example, b is 0, 0.1, or 0.3, etc., and c is 0.3, 0.5, or 0.9, etc.
  • the number of neurons that need to be enlarged in the neural network layer is equal to the number of neurons that need to be scaled down in the neural network layer.
  • the zoom factor includes the magnification factor X and the reduction factor Y.
  • the magnification factor X satisfies the following condition: d ≤ X ≤ e, where d ≥ 1, e > 1, and e > d.
  • d can be 1, 1.5, or 1.9, etc.
  • e can be 1.5, 1.7, or 2, etc.
  • f can be 2 or 5, etc.
  • the scaling ratio can be set within the interval (0, 0.5]; for example, it can be set to 0.1, 0.2, ..., 0.5.
  • in one possible implementation, comparative experiments are performed with the scaling ratio set to different values. For example, when the scaling ratio is set within [0.3, 0.5], the effect of suppressing over-fitting is better, and when the scaling ratio is less than 0.5, the error rate of the image processing model is relatively stable; the scaling ratio can therefore be set based on different needs.
  • the magnification factor can be set within the interval (1, 2); for example, it can be set to 1.1, 1.2, 1.3, ..., 1.9.
  • comparative experiments are likewise performed with the magnification factor set to different values. For example, when the magnification factor is set within the interval [1.5, 1.7], the effect of suppressing over-fitting is better; the magnification factor can therefore be set based on different needs.
  • alternatively, the scaling ratio can be set within the interval [0.1, 0.5]; for example, it can be set to 0.5.
  • the controller 310 is used to control the training process of the image processing model 330.
  • the training process of the controller 310 controlling the image processing model 330 can be referred to as shown in FIG. 4.
  • the controller 310 determines the hidden layers whose parameters need to be scaled in the current training pass, and determines the scaling ratio and scaling factor for the current pass.
  • the hidden layers requiring parameter scaling during each training pass may be preset, or may be randomly selected by the controller 310 by means of the random number generator 311.
  • the random number generator 311 selects, in units of groups, the neurons requiring parameter scaling in the hidden layers to be scaled; for example, a total of N groups of neurons are selected, and the N groups include both the neurons whose parameters need to be enlarged and the neurons whose parameters need to be reduced.
  • the number of neurons included in each group of neurons may be the same or different. For example, if the number of neurons included in each group of the N groups is different, the numbers can be denoted g_1, g_2, ..., g_n.
  • the zoom factor corresponding to each group of neurons can be different.
  • for example, the scaling factors corresponding to the groups of neurons in the N groups are t_1, t_2, ..., t_n, where n is an integer greater than or equal to 1 and less than or equal to N.
  • if the number of neurons included in each group is the same, the number of neurons included in the N groups satisfies g × N ≤ M, where g is the number of neurons included in each group, N is the number of neuron groups, and M is the number of all neurons in the layer.
  • if the number of neurons included in each group is different, the number of neurons included in the N groups satisfies Σ_{i=1}^{N} g_i ≤ M, where i is an integer greater than or equal to 1 and less than or equal to N representing the i-th group of neurons, and g_i is the number of neurons included in the i-th group.
  • the scaling factors corresponding to the groups of neurons satisfy the conditions given earlier: Σ_{i=1}^{N} t_i = N when the group sizes are equal, and Σ_{i=1}^{N} g_i·t_i = N when they differ, where t_i is the scaling factor corresponding to the i-th group of neurons.
  • before each training pass, the controller 310 enlarges the parameters of the neurons to be enlarged in this pass according to the magnification factor corresponding to this pass, and reduces the parameters of the neurons to be reduced in this pass according to the reduction factor corresponding to this pass.
  • the controller 310 inputs the training data into the image processing model 330 with scaled parameters to obtain the processing result of the image processing model 330, and calculates the error between the processing result and the labeling result of the training data.
  • the controller 310 adjusts the parameters of the image processing model 330 according to the error between the processing result and the annotation result of the training data.
  • each batch of training data is used as the training data required for one training session.
  • after the training pass, the controller 310 restores the parameters of the neurons scaled during this pass.
  • for example, the controller 310 obtains the first parameter of each neuron recorded before the current training pass and resets the parameters of the neurons scaled during this pass to the first parameter corresponding to each scaled neuron; alternatively, the parameters of the neurons enlarged during this pass are divided by the magnification factor to reduce them, and the parameters of the neurons reduced during this pass are divided by the reduction factor to enlarge them.
  • when restoring the parameters of the neurons, the controller 310 can also restore only the parameters of the enlarged neurons, or only the parameters of the reduced neurons.
  • the image processing model 330 is a model based on a neural network architecture.
  • the training process of the neural network includes a forward pass and a backward pass.
  • the scaling method provided in this application can be used only in the forward pass, that is, the parameters of the neurons are scaled before the forward pass and restored before the backward pass; or it can be used only in the backward pass, that is, the parameters of the neurons are scaled after the forward pass and before the backward pass, and restored after the backward pass; or it can be used in both the forward pass and the backward pass.
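  • putting these pieces together, one training pass with scaling applied before the forward pass and restored after the backward pass (one of the three placements just described) might look as follows; this reuses the illustrative scale_layer_ helper above and again assumes PyTorch:

```python
import torch
import torch.nn.functional as F

def train_step(model, fc_layers, images, labels, optimizer,
               ratio=0.5, x_factor=1.7, y_factor=0.3):
    # Scale neuron parameters before the forward pass.
    mults = [scale_layer_(fc, ratio, x_factor, y_factor) for fc in fc_layers]

    loss = F.cross_entropy(model(images), labels)    # forward pass + error
    optimizer.zero_grad()
    loss.backward()                                  # backward pass
    optimizer.step()                                 # adjust the parameters

    # Restore: divide each scaled parameter by its zoom factor.
    with torch.no_grad():
        for fc, mult in zip(fc_layers, mults):
            fc.weight.div_(mult.unsqueeze(1))
            fc.bias.div_(mult)
    return loss.item()
```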
  • for the image processing model 330 based on the FNN structure, take the possible neural network structure shown in FIG. 5 as an example: it includes an input layer, an output layer, and four fully connected hidden layers, which are respectively a 2048*784 fully connected + ReLU layer, a 2048*2048 fully connected + ReLU layer, a 2048*2048 fully connected + ReLU layer, and a 10*2048 fully connected layer.
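  • written out under the same PyTorch assumption, the four-layer network just described is simply (784-wide inputs and a 10-way output, matching the quoted layer sizes):

```python
import torch.nn as nn

# 2048*784 FC + ReLU, 2048*2048 FC + ReLU, 2048*2048 FC + ReLU, 10*2048 FC
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 10),
)
```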
  • the scaling method provided in this application is compared with the prior-art dropout method. According to the experimental results, compared with the dropout method, the scaling method provided by the present application can effectively reduce the number of training passes, improve training efficiency, and significantly reduce the error rate of the image processing model, better achieving the effect of suppressing over-fitting.
  • a CNN-based image classification model structure includes an input layer (the image is input through the input layer), a CNN feature extraction network, one or more fully connected layers (two fully connected layers are shown in FIG. 6), and an output layer (the output layer outputs the classification result).
  • CNN feature extraction networks include AlexNet, VGG, GoogLeNet, ResNet, DenseNet, MobileNet, SENet, or ShuffleNet.
  • VGG image classification model structure including an input layer, two conv+relu layers, a pooling layer, two conv+relu layers, a pooling layer, Three conv+relu layers, pooling layer, three conv+relu layers, pooling layer, three conv+relu layers, pooling layer, three Fc+Relu layers, and output layer.
  • the scaling ratio is set to 0.5
  • the magnification factor is set to 1.7
  • the reduction factor is set to 0.3
  • the parameters of the neurons in the last three Fc+Relu layers of the VGG image classification model are scaled to train the VGG image classification model.
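  • as an illustration only (torchvision's stock VGG-16 is assumed; in that implementation the three Fc layers sit at classifier indices 0, 3, and 6), these settings could be applied with the earlier sketch as:

```python
from torchvision.models import vgg16

model = vgg16(num_classes=1000)
fc_layers = [model.classifier[i] for i in (0, 3, 6)]   # the last three Fc+Relu layers
mults = [scale_layer_(fc, ratio=0.5, x_factor=1.7, y_factor=0.3)
         for fc in fc_layers]
```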
  • the processing result includes the classification result marked for the target in the first image.
  • the target in the first image is marked with a marking frame, and the classification result of the target is marked as "cat".
  • taking animal classification as an example: if the trained VGG image classification model has an under-fitting problem, it cannot identify animal information; if it has an over-fitting problem, then for different types of animals, or animals of the same type with different appearances, it cannot classify accurately, and its adaptability is poor.
  • the target detection model includes Faster R-CNN, R-FCN, or SSD.
  • the scaling ratio is set to 0.5
  • the magnification factor is set to 1.7
  • the reduction factor is set to 0.3. If the Faster R-CNN target detection model includes two Fc+ReLU layers and each Fc+ReLU layer includes 1024 neurons, then for each Fc+ReLU layer the number of neurons whose parameters need to be enlarged is 512 and the number of neurons whose parameters need to be reduced is 512.
  • 512 neurons are randomly selected as the neurons that need to be enlarged in the Fc+ReLU layer, and 512 neurons are randomly selected as the ones that need to be reduced.
  • the parameters of the 512 neurons to be enlarged are multiplied by the magnification factor 1.7, and the parameters of the 512 neurons to be reduced are multiplied by the reduction factor 0.3.
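  • one property of these example settings is worth noting (our observation, not a claim made in the source): with half of a layer's neurons multiplied by 1.7 and the other half by 0.3, the expected multiplier applied to a neuron's parameters is 0.5 × 1.7 + 0.5 × 0.3 = 1.0, so the layer is perturbed without changing the expected scale of its parameters.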
  • if the trained target detection model has an under-fitting problem, vehicle information may not be recognized from the image; conversely, if it has an over-fitting problem, its adaptability is poor, and a model that works for vehicle A may lose accuracy when applied to vehicle B. The above model can therefore be applied in the field of intelligent monitoring or automatic driving to identify vehicle information more accurately.
  • FIG. 9 is a schematic diagram of a possible vehicle detection process.
  • road surveillance cameras are installed at traffic intersections to collect the traffic conditions at the intersections and information about the vehicles passing through them.
  • the road monitoring camera sends the collected video data to the device for vehicle detection.
  • the device for vehicle detection includes the vehicle detection model that has been trained.
  • the device for vehicle detection can be a server, the training equipment used to train the vehicle detection model, or other equipment.
  • the device receives the video data collected by the road surveillance camera, decodes it, obtains video images from the decoded data, converts the format of each frame (for example, to the blue-green-red (BGR) format), and then processes the size of each format-converted frame (for example, scaling and/or resizing the video image). Figure 10(a) shows a frame of video image obtained after this size processing, referred to as the second image in the embodiments of the present application.
  • the device inputs the second image shown in Figure 10(a) into the trained vehicle detection model for processing, and the trained vehicle detection model outputs the processing result of the second image.
  • the processing result of the second image includes the detection result marked for the target in the second image; as shown in Figure 10(b), the target in the second image is marked with a marking frame, and the detection result of the target is marked as "car".
  • the vehicle detection model can detect the presence or absence of vehicles in the image, and can also detect the type of vehicles in the image.
  • the types of vehicles can include motor vehicles and non-motor vehicles (as shown in Figure 10(b), a vehicle is detected and identified as a car), and can also include the manufacturer, brand, and other attributes of the vehicle.
  • multiple road surveillance cameras can be linked, for example cameras located in one area or along a specific driving route; the video data collected by the linked cameras can be shared, for example to intelligently provide driving routes for vehicles according to the traffic conditions at each intersection.
  • the road surveillance camera can also be connected to the public security traffic system.
  • the public security traffic system can analyze the video images collected by the road surveillance camera; for example, based on the analysis result, it can determine whether a vehicle passing through the intersection where the camera is located violates laws and regulations, or whether there is traffic congestion at that intersection, so as to notify traffic police near the intersection to help direct traffic, and so on.
  • a three-dimensional (3D) CNN can achieve good results in video classification, action recognition, and so on. Unlike a CNN, which treats each frame of a video as a static picture, a 3D CNN takes the temporal information of the video into consideration when processing it: multiple consecutive image frames of the video are stacked to form a cube, and because the frames in the cube have temporal continuity, motion information can be captured by the 3D convolution kernel.
  • 3D CNN can also be used in combination with the zoom method provided in this application.
  • Figure 11 shows a possible 3D CNN architecture, including input layer, conv1a layer, pooling layer, conv2a layer, pooling layer, conv3a layer, conv3b layer, pooling layer, conv4a layer, conv4b layer, pooling layer, conv5a layer , Conv5b layer, pooling layer, Fc6 layer, Fc7 layer and output layer.
  • the parameters of the neurons in the two fully connected layers Fc6 and Fc7 are scaled; for example, the scaling ratio is set to 0.5, the magnification factor is set to 1.7, and the reduction factor is set to 0.3.
  • the 3D CNN shown in FIG. 11 can be used to detect highlights in a video.
  • the video clips to be detected are input into the trained 3D CNN for processing, and the trained 3D CNN outputs the highlight score of the video clip.
  • FIG. 3 is only a schematic structural diagram of a possible training device provided by an embodiment of the present application, and the positional relationship between the modules shown in the figure does not constitute any limitation.
  • the memory 320 may be built into the training device, or may be an external memory.
  • the training equipment can be a personal computer (PC), a notebook computer, a server, and other equipment.
  • the scaling method provided in this application can also be applied to automatic machine learning (AutoML) or neural architecture search (neural architecture search, NAS).
  • because the scaling method provided in this application requires fewer training passes, it can also reduce the time spent on model search and training in scenarios where AutoML and NAS need to train repeatedly with different hyperparameters.
  • the scaling method provided in this application can also be applied to achieve a better suppression of overfitting.
  • the scaling method provided in this application can also be offered as a scaling operator for neural network model training, for use by public cloud tenants.
  • when building their own deep learning models, public cloud tenants can use the scaling operator provided by the public cloud for neural network model training, so as to train their own deep learning models with better results.
  • an embodiment of the present application provides an image processing model training method. It includes the following steps:
  • S1201: Input the image data in the training data set into the image processing model for processing, and obtain the processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been enlarged and the parameters of n2 neurons have been reduced; n1 and n2 are positive integers.
  • the image processing model training device may also obtain a training data set in the memory, and the training data set includes image data.
  • the training data set includes the training data and the labeling results corresponding to the training data.
  • the image data input to the image processing model in S1201 is all or part of the image data in the training data set; that is to say, in one training pass of the image processing model, all or part of the training data in the training data set can be used. Using all the training data in the training data set for one training pass can be called one epoch of training, and using part of the training data in the training data set for one training pass can be called one batch of training.
  • the memory may be an internal memory, as shown in FIG. 3, the memory is built into the image processing model training device, or the memory may also be an external memory (such as a hard disk, a floppy disk, or an optical disk, etc.).
  • the image processing model is a model based on a neural network architecture
  • the neural network architecture includes M neural network layers
  • the M neural network layers include an input layer, a hidden layer, and an output layer, where M is a positive integer.
  • the parameters of neurons in the neural network layers of the image processing model are scaled; for example, if there are m neural network layers requiring parameter scaling among the M neural network layers, the parameters of n1 neurons in the m neural network layers are enlarged and the parameters of n2 neurons in the m neural network layers are reduced, where m is a positive integer and m is less than or equal to M.
  • the image processing model training device may also determine the zoom ratio and zoom factor of each neural network layer in the m neural network layers, and the zoom factor includes a reduction factor and an enlargement factor.
  • the zoom ratio and zoom factor of each neural network layer can also be stored in the memory.
  • the scaling ratio refers to, for each of the m neural network layers whose parameters need to be scaled, the ratio of the number of neurons whose parameters need to be scaled to the number of all neurons in that layer.
  • the image processing model training device may determine, according to the scaling ratio of each neural network layer in the m neural network layers, the neurons whose parameters are to be enlarged and the neurons whose parameters are to be reduced in each neural network layer, where n1 is the sum of the number of neurons whose parameters are to be enlarged in each neural network layer and n2 is the sum of the number of neurons whose parameters are to be reduced in each neural network layer.
  • the image processing model training device can select, in units of groups, the neurons requiring parameter scaling in each of the m neural network layers, for example selecting a total of N groups of neurons; each of the m neural network layers includes at least one group of neurons whose parameters are to be enlarged and at least one group of neurons whose parameters are to be reduced, and these groups together form the N groups of neurons.
  • the number of neurons in each group of neurons in the N groups of neurons can be the same or different.
  • the scaling factor refers to, for each of the m neural network layers whose parameters need to be scaled, the magnification factor for the neurons whose parameters need to be enlarged and the reduction factor for the neurons whose parameters need to be reduced.
  • the image processing model training device can enlarge the parameters of the neurons to be enlarged in each neural network layer according to the magnification factor of each of the m neural network layers, and reduce the parameters of the neurons to be reduced in each neural network layer according to the reduction factor of each of the m neural network layers.
  • the image processing model training device can also determine, for each of the m neural network layers, the magnification factor corresponding to each group of neurons whose parameters are to be enlarged in that layer and the reduction factor corresponding to each group of neurons whose parameters are to be reduced in that layer. When scaling the parameters, it enlarges the parameters of the neurons in each group to be enlarged according to the magnification factor corresponding to that group, and reduces the parameters of the neurons in each group to be reduced according to the reduction factor corresponding to that group.
  • when the number of neurons in each group is the same, N is the sum of the magnification factors corresponding to the groups of neurons to be enlarged and the reduction factors corresponding to the groups of neurons to be reduced.
  • when the number of neurons in each group is different, N is the sum of the total magnification of all neurons in each group to be enlarged and the total reduction of all neurons in each group to be reduced, where the total magnification of a group to be enlarged is the product of the number of neurons in that group and the corresponding magnification factor, and the total reduction of a group to be reduced is the product of the number of neurons in that group and the corresponding reduction factor.
  • the processing result corresponding to the image data output in S1201 is the predicted value of the image processing model.
  • S1202: Calculate the error between the annotation result of the image data in the training data set and the processing result.
  • a loss function can be used to calculate the error between the annotation result of the image data and the processing result.
  • the higher the output value (loss) of the loss function, the greater the error; the training process of the image processing model thus becomes a process of reducing this loss as much as possible.
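  • for example (illustrative, assuming the same PyTorch setting as the earlier sketches and a classification task), the error of S1202 can be computed with a cross-entropy loss:

```python
import torch.nn.functional as F

logits = model(images)                   # processing result for the image data
loss = F.cross_entropy(logits, labels)   # error vs. the annotation result
```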
  • S1203: Adjust the parameters of the image processing model according to the error between the annotation result and the processing result.
  • the image processing model training device updates the parameters in the image processing model according to the error between the annotation result and the processing result, and adjusts them continuously until the processing result predicted by the image processing model for the image data approaches or equals the annotation result of the image data, completing the training of the image processing model.
  • after adjusting the parameters of the image processing model this time and before inputting image data into the image processing model next time, the image processing model training device scales the parameters of neurons in the image processing model on the basis of the adjusted parameters.
  • after adjusting the parameters of the image processing model, the image processing model training device can also reduce the parameters of the n1 neurons and/or enlarge the parameters of the n2 neurons; for example, the parameters of the n1 neurons scaled in this training pass are divided by the magnification factor corresponding to each neuron so as to reduce them, and the parameters of the n2 neurons are divided by the reduction factor corresponding to each neuron so as to enlarge them.
  • Figure 12 introduces the solution provided by this application mainly from the perspective of the method flow. It can be understood that, in order to implement the above functions, the apparatus may include corresponding hardware structures and/or software modules for performing the various functions. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
  • FIG. 13 shows a possible exemplary block diagram of the image processing model training apparatus involved in the embodiments of this application; the image processing model training apparatus 1300 may exist in the form of software.
  • The image processing model training apparatus 1300 may include: a processing unit 1301, a calculation unit 1302, and an adjustment unit 1303.
  • The image processing model training apparatus 1300 may be the training device in FIG. 3 described above, or a semiconductor chip provided in the training device.
  • The processing unit 1301 is configured to input image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data.
  • The calculation unit 1302 is configured to calculate the error between the annotation result of the image data in the training data set and the processing result.
  • The adjustment unit 1303 is configured to adjust the parameters of the image processing model according to the error between the annotation result and the processing result.
  • In a possible design, the image processing model is a model based on a neural network architecture; the neural network architecture includes M neural network layers, and the M neural network layers include an input layer, a hidden layer, and an output layer.
  • The parameters of n1 neurons in m of the neural network layers in the image processing model are magnified, and the parameters of n2 neurons in the m neural network layers are reduced, where M and m are positive integers and m is less than or equal to M.
  • In a possible design, the apparatus further includes a scaling unit 1304, configured to: determine the scaling ratio and scaling factors of each of the m neural network layers, where the scaling factors include a reduction factor and a magnification factor; determine, according to the scaling ratio of each layer, the neurons whose parameters are to be magnified and the neurons whose parameters are to be reduced in that layer, where n1 is the sum of the numbers of neurons whose parameters are to be magnified in the layers and n2 is the sum of the numbers of neurons whose parameters are to be reduced; magnify the parameters of the neurons to be magnified in each layer according to that layer's magnification factor; and reduce the parameters of the neurons to be reduced in each layer according to that layer's reduction factor.
  • In a possible design, each of the m neural network layers includes at least one group of neurons whose parameters are to be magnified and at least one group of neurons whose parameters are to be reduced, and the at least one group of neurons whose parameters are to be magnified and the at least one group of neurons whose parameters are to be reduced form N groups of neurons.
  • The scaling unit 1304 is specifically configured to magnify the parameters of the neurons in each group to be magnified according to the magnification factor corresponding to that group in each layer, and to reduce the parameters of the neurons in each group to be reduced according to the reduction factor corresponding to that group in each layer.
  • In a possible design, the number of neurons in each of the N groups is the same, and the following condition is satisfied: N is the sum of the magnification factors corresponding to the groups of neurons whose parameters are to be magnified and the reduction factors corresponding to the groups of neurons whose parameters are to be reduced.
  • In a possible design, the number of neurons in each of the N groups is different, and the following condition is satisfied: N is the sum of the total magnification of all neurons in each group whose parameters are to be magnified and the total reduction of all neurons in each group whose parameters are to be reduced, where the total magnification of a group is the product of the number of neurons in that group and its magnification factor, and the total reduction of a group is the product of the number of neurons in that group and its reduction factor.
  • In a possible design, the image data is all or part of the image data in the training data set.
  • In a possible design, the apparatus further includes a restoration unit 1305, configured to reduce the parameters of the n1 neurons and/or magnify the parameters of the n2 neurons.
  • The division into units in the embodiments of this application is illustrative and is merely a division by logical function; other divisions are possible in actual implementation.
  • The functional units in the embodiments of this application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • As shown in FIG. 14, an embodiment of this application further provides a schematic structural diagram of another possible image processing model training apparatus.
  • The image processing model training apparatus includes at least one processor 1402 and at least one communication interface 1404. Further, the apparatus may include a memory 1406, where the memory 1406 is used to store computer programs or instructions.
  • The memory 1406 may be a memory inside the processor or a memory outside the processor. Where the unit modules described in FIG. 13 are implemented by software, the software or program code required by the processor 1402 to perform the corresponding actions is stored in the memory 1406.
  • The processor 1402 is configured to execute the programs or instructions in the memory 1406 to implement the steps shown in FIG. 12 in the foregoing embodiments.
  • The communication interface 1404 is used to implement communication between this apparatus and other apparatuses.
  • Where the memory 1406 is provided outside the processor, the memory 1406, the processor 1402, and the communication interface 1404 are connected to each other through a bus 1408. The bus 1408 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 14, but this does not mean that there is only one bus or only one type of bus.
  • The operations and/or functions of the modules in the apparatus 1400 are respectively intended to implement the corresponding procedures of the method shown in FIG. 12; for brevity, details are not repeated here.
  • An embodiment of this application further provides a chip system, including a processor coupled to a memory, where the memory is used to store a program or instructions; when the program or instructions are executed by the processor, the chip system implements the method in any one of the foregoing method embodiments.
  • Optionally, there may be one or more processors in the chip system.
  • The processor may be implemented by hardware or by software.
  • When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like.
  • When implemented by software, the processor may be a general-purpose processor that is implemented by reading software code stored in the memory.
  • Optionally, there may also be one or more memories in the chip system. The memory may be integrated with the processor or provided separately from the processor, which is not limited in this application.
  • For example, the memory may be a non-transitory memory, such as a read-only memory (ROM); it may be integrated with the processor on the same chip, or the two may be provided on different chips. The type of the memory and the manner in which the memory and the processor are provided are not specifically limited in this application.
  • For example, the chip system may be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
  • It should be understood that each step in the foregoing method embodiments may be completed by an integrated logic circuit of hardware in a processor or by instructions in the form of software.
  • The method steps disclosed in the embodiments of this application may be directly performed and completed by a hardware processor, or performed and completed by a combination of hardware and software modules in the processor.
  • An embodiment of this application further provides a computer-readable storage medium storing computer-readable instructions; when a computer reads and executes the computer-readable instructions, the computer is caused to perform the method in any one of the foregoing method embodiments.
  • An embodiment of this application further provides a computer program product; when a computer reads and executes the computer program product, the computer is caused to perform the method in any one of the foregoing method embodiments.
  • It should be understood that the processor mentioned in the embodiments of this application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • It should also be understood that the memory mentioned in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
  • The volatile memory may be a random access memory (RAM), which is used as an external cache.
  • By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated into the processor.
  • It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments.
  • In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways.
  • The apparatus embodiments described above are merely illustrative.
  • The division into units is merely a division by logical function; other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • The mutual couplings, direct couplings, or communication connections displayed or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

Embodiments of this application relate to an image processing model training method and apparatus. The method includes: inputting image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been magnified and the parameters of n2 neurons have been reduced; calculating the error between the annotation result of the image data in the training data set and the processing result; and adjusting the parameters of the image processing model according to the error between the annotation result and the processing result, where n1 and n2 are positive integers. By magnifying and reducing the parameters of neurons, the method improves the image processing model's resistance to interference, thereby suppressing overfitting while preserving the training efficiency of the image processing model.

Description

Image Processing Model Training Method and Apparatus
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202010077091.4, filed with the China National Intellectual Property Administration on January 23, 2020 and entitled "Image Processing Model Training Method and Apparatus", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates to the field of neural network technologies, and in particular, to an image processing model training method and apparatus.
BACKGROUND
An image processing model is used to perform processing such as detection, segmentation, and classification on images, and is usually a model based on a neural network architecture. An image processing model consists of multiple neural network layers, and each neural network layer includes multiple neurons. The parameters of the neurons can be trained with the training data in a training data set, thereby training the image processing model.
During training of the image processing model, training data in the training data set is input into the model, and the model computes an output result for the training data. The annotation result of the training data is compared with the output result of the model, and the parameters of the model are adjusted based on the comparison until the output result of the model approaches the annotation result or matches it.
After training is complete, test data is generally used to verify the accuracy of the image processing model. Overfitting may occur during verification: an overfitted model fits the annotation results of the training data well but does not fit the annotation results of the test data well, and as the number of training iterations increases, the better it fits the annotation results of the training data, the worse it fits those of the test data. Overfitting therefore degrades the accuracy of the image processing model, and how to suppress it has become an important problem to be solved in image processing.
SUMMARY
Embodiments of this application provide an image processing model training method and apparatus, to suppress overfitting and improve the accuracy of image processing models.
According to a first aspect, an embodiment of this application provides an image processing model training method, including: inputting image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been magnified and the parameters of n2 neurons have been reduced; calculating the error between the annotation result of the image data in the training data set and the processing result; and adjusting the parameters of the image processing model according to the error between the annotation result and the processing result, where n1 and n2 are positive integers.
In the embodiments of this application, the image processing model training apparatus scales the parameters of neurons in the image processing model, thereby perturbing the training process of the model and improving its resistance to interference. This suppresses overfitting and improves the accuracy of the image processing model, while also preserving its training efficiency.
In a possible design, the image processing model is a model based on a neural network architecture; the neural network architecture includes M neural network layers, and the M neural network layers include an input layer, a hidden layer, and an output layer. The parameters of n1 neurons in m neural network layers of the image processing model have been magnified, and the parameters of n2 neurons in the m neural network layers have been reduced, where M and m are positive integers and m is less than or equal to M. By selecting the parameters of neurons in m neural network layers for scaling in each training iteration, and varying which m layers are selected from iteration to iteration, the model's resistance to interference can be further improved, and overfitting can be suppressed even better.
In a possible design, before the image data in the training data set is input into the image processing model for processing, the method further includes: determining the scaling ratio and scaling factors of each of the m neural network layers, where the scaling factors include a reduction factor and a magnification factor; determining, according to the scaling ratio of each of the m layers, the neurons whose parameters are to be magnified and the neurons whose parameters are to be reduced in that layer, where n1 is the sum of the numbers of neurons to be magnified in the layers and n2 is the sum of the numbers of neurons to be reduced; magnifying the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer; and reducing the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer. Selecting the neurons to be magnified and the neurons to be reduced before each iteration, magnifying the former by the corresponding magnification factor and reducing the latter by the corresponding reduction factor, adds interference to the model before each iteration, which further improves its resistance to interference and suppresses overfitting even better.
In a possible design, each of the m neural network layers includes at least one group of neurons whose parameters are to be magnified and at least one group of neurons whose parameters are to be reduced, and these groups together form N groups of neurons. Magnifying the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer includes: magnifying the parameters of the neurons in each group to be magnified according to the magnification factor corresponding to that group in the layer. Reducing the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer includes: reducing the parameters of the neurons in each group to be reduced according to the reduction factor corresponding to that group in the layer. Selecting different groups of neurons provides more combinations of magnification and reduction, so that interference is added to the model before each iteration, which further improves its resistance to interference and suppresses overfitting even better.
The number of neurons in each of the N groups may be the same or different.
In a possible design, the number of neurons in each of the N groups is the same, and the following condition is satisfied: N is the sum of the magnification factors corresponding to the groups of neurons to be magnified and the reduction factors corresponding to the groups of neurons to be reduced.
In a possible design, the number of neurons in each of the N groups is different, and the following condition is satisfied: N is the sum of the total magnification of all neurons in each group to be magnified and the total reduction of all neurons in each group to be reduced, where the total magnification of a group to be magnified is the product of the number of neurons in the group and the corresponding magnification factor, and the total reduction of a group to be reduced is the product of the number of neurons in the group and the corresponding reduction factor.
In a possible design, the image data is all or part of the image data in the training data set.
In a possible design, after the parameters of the image processing model are adjusted according to the error between the annotation result and the processing result, the method further includes: reducing the parameters of the n1 neurons; and/or magnifying the parameters of the n2 neurons.
According to a second aspect, an embodiment of this application provides an image processing model training apparatus that can implement the functions in the first aspect or any possible design of the first aspect. The functions of the image processing model training apparatus may be implemented by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the functions. The apparatus may include a processing unit, a calculation unit, and an adjustment unit.
The processing unit is configured to input image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been magnified and the parameters of n2 neurons have been reduced, and n1 and n2 are positive integers;
the calculation unit is configured to calculate the error between the annotation result of the image data in the training data set and the processing result; and
the adjustment unit is configured to adjust the parameters of the image processing model according to the error between the annotation result and the processing result.
In a possible design, the image processing model is a model based on a neural network architecture; the neural network architecture includes M neural network layers, and the M neural network layers include an input layer, a hidden layer, and an output layer;
the parameters of n1 neurons in m neural network layers of the image processing model have been magnified, and the parameters of n2 neurons in the m neural network layers have been reduced;
where M and m are positive integers, and m is less than or equal to M.
In a possible design, the apparatus further includes:
a scaling unit, configured to determine the scaling ratio and scaling factors of each of the m neural network layers, where the scaling factors include a reduction factor and a magnification factor; determine, according to the scaling ratio of each of the m layers, the neurons whose parameters are to be magnified and the neurons whose parameters are to be reduced in that layer, where n1 is the sum of the numbers of neurons to be magnified in the layers and n2 is the sum of the numbers of neurons to be reduced; magnify the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer; and reduce the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer.
In a possible design, each of the m neural network layers includes at least one group of neurons whose parameters are to be magnified and at least one group of neurons whose parameters are to be reduced, and these groups together form N groups of neurons;
the scaling unit is specifically configured to magnify the parameters of the neurons in each group to be magnified according to the magnification factor corresponding to that group in each layer, and to reduce the parameters of the neurons in each group to be reduced according to the reduction factor corresponding to that group in each layer.
In a possible design, the number of neurons in each of the N groups is the same, and the following condition is satisfied: N is the sum of the magnification factors corresponding to the groups of neurons to be magnified and the reduction factors corresponding to the groups of neurons to be reduced.
In a possible design, the number of neurons in each of the N groups is different, and the following condition is satisfied: N is the sum of the total magnification of all neurons in each group to be magnified and the total reduction of all neurons in each group to be reduced, where the total magnification of a group is the product of the number of neurons in the group and the corresponding magnification factor, and the total reduction of a group is the product of the number of neurons in the group and the corresponding reduction factor.
In a possible design, the image data is all or part of the image data in the training data set.
In a possible design, the apparatus further includes:
a restoration unit, configured to reduce the parameters of the n1 neurons and/or magnify the parameters of the n2 neurons.
According to a third aspect, an embodiment of this application provides an image processing model training apparatus that can implement the functions in the first aspect or any possible design of the first aspect. The functions may be implemented by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the functions.
The structure of the apparatus includes at least one processor and may further include at least one memory. The at least one processor is coupled to the at least one memory and may be used to execute computer program instructions stored in the memory, so that the apparatus performs the method in the first aspect or any possible design of the first aspect. Optionally, the apparatus further includes a communication interface, and the processor is coupled to the communication interface. When the apparatus is a server, the communication interface may be a transceiver or an input/output interface; when the apparatus is a chip contained in a server, the communication interface may be an input/output interface of the chip. Optionally, the transceiver may be a transceiver circuit, and the input/output interface may be an input/output circuit.
According to a fourth aspect, an embodiment of this application provides a chip system, including a processor coupled to a memory, where the memory is used to store a program or instructions; when the program or instructions are executed by the processor, the chip system implements the method in the first aspect or any possible design of the first aspect.
Optionally, the chip system further includes an interface circuit, which is used to receive code instructions and transmit them to the processor.
Optionally, there may be one or more processors in the chip system, and the processor may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented by software, the processor may be a general-purpose processor that is implemented by reading software code stored in the memory.
Optionally, there may also be one or more memories in the chip system. The memory may be integrated with the processor or provided separately from the processor, which is not limited in this application. For example, the memory may be a non-transitory memory, such as a read-only memory (ROM); it may be integrated with the processor on the same chip or provided on different chips. The type of the memory and the manner in which the memory and the processor are provided are not specifically limited in this application.
According to a fifth aspect, an embodiment of this application provides a readable storage medium on which a computer program or instructions are stored; when the computer program or instructions are executed, a computer is caused to perform the method in the first aspect or any possible design of the first aspect.
According to a sixth aspect, an embodiment of this application provides a computer program product; when a computer reads and executes the computer program product, the computer is caused to perform the method in the first aspect or any possible design of the first aspect.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic architectural diagram of an image processing model according to an embodiment of this application;
FIG. 2 is a schematic architectural diagram of an image processing model to which the dropout method is applied;
FIG. 3 is a schematic structural diagram of a training device according to an embodiment of this application;
FIG. 4 is a schematic flowchart of image processing model training according to an embodiment of this application;
FIG. 5 is a schematic diagram of a neural network structure according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a CNN-based image classification model according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a VGG-based image classification model according to an embodiment of this application;
FIG. 8 is a schematic diagram of image classification according to an embodiment of this application;
FIG. 9 is a schematic flowchart of vehicle detection according to an embodiment of this application;
FIG. 10 is a schematic diagram of vehicle detection according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a 3D CNN according to an embodiment of this application;
FIG. 12 is a schematic flowchart of image processing model training according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an image processing model training apparatus according to an embodiment of this application;
FIG. 14 is another schematic structural diagram of an image processing model training apparatus according to an embodiment of this application.
DESCRIPTION OF EMBODIMENTS
This application provides an image processing model training method and apparatus, aiming to better suppress the overfitting that arises during the training of image processing models and to improve their accuracy. The method and the apparatus are based on the same technical concept; because they solve the problem on similar principles, the implementations of the apparatus and the method may refer to each other, and repeated descriptions are not provided.
Some of the terms used in the embodiments of this application are explained below to facilitate understanding by those skilled in the art.
1) Image processing model: used to process images, for example to perform detection, classification, or segmentation. An image processing model is usually a model based on a neural network (NN) architecture and consists of multiple neural network layers, which include an input layer, an output layer, and hidden layers. There may be one or more of each; for example, as shown in FIG. 1, the image processing model includes one input layer, multiple hidden layers, and one output layer. Each neural network layer includes multiple neurons, a linear operator, and a nonlinear activation function, where the linear operator includes multiple weight values and a bias value. In the embodiments of this application, a weight is also abbreviated as w and a bias as a. The nonlinear activation function includes one or more of the Sigmoid function, the rectified linear unit (ReLU) function, and the like.
2) Neuron parameters: include weights and/or biases, and each neuron has a corresponding group of parameters. Training the parameters of the neurons in an image processing model trains the image processing model.
3) Neural networks: include feedforward neural networks (FNN), convolutional neural networks (CNN), recurrent neural networks (RNN), auto encoders (AE), generative adversarial networks (GAN), and the like.
4) Training data and test data: training data is used to train the image processing model (in the embodiments of this application, training data is also called sample data), and test data is used to verify the accuracy of the image processing model. Optionally, both the training data and the test data are annotated with results.
In one training iteration of the image processing model, all or part of the training data may be used. Training once with all of the training data may be called one epoch of training. Training once with part of the training data may be called one batch of training; for example, the full training data may be divided into multiple parts in advance, and each part is called a batch of data.
5) Underfitting and overfitting: underfitting means that the image processing model cannot fit the annotation results of the training data well; overfitting means that the model fits the annotation results of the training data well but does not fit the annotation results of the test data well, and as the number of training iterations increases, the better the fit to the training data, the worse the fit to the test data.
6) Scaling ratio: in each neural network layer whose parameters need to be scaled, the ratio of the number of neurons whose parameters need to be scaled to the number of all neurons.
Scaling factors: in each neural network layer whose parameters need to be scaled, the magnification factor of the neurons whose parameters need to be magnified and the reduction factor of the neurons whose parameters need to be reduced.
"And/or" in this application describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the three cases: A alone, both A and B, and B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
"Multiple" in this application means two or more.
In addition, it should be understood that in the descriptions of this application, terms such as "first" and "second" are used only for the purpose of distinguishing descriptions and shall not be understood as indicating or implying relative importance or as indicating or implying an order.
To facilitate understanding of the embodiments of this application, the application scenario of this application is described.
In the training process of an image processing model, taking supervised learning as an example, the training data in the training data set is annotated with results; that is, the training data set includes training data and the corresponding annotation results. The training device inputs the training data into the image processing model for processing, the model computes an output result for the training data, and the device adjusts the parameters of the neurons in the model according to the error between the annotation result of the training data and the output result of the model. When the output result of the model approaches or equals the annotation result of the training data, the training device determines that training of the image processing model is complete.
After training is complete, the training device uses test data from real scenarios to verify the accuracy of the trained model. Underfitting and overfitting may occur during verification. The purpose of training an image processing model is to enable it to correctly predict results for input data, but both underfitting and overfitting degrade prediction accuracy. Underfitting can generally be resolved by increasing the number of neural network layers in the model and/or increasing the number of neurons in the layers.
For overfitting, the prior art proposes the dropout method. Before training, the training device determines a drop rate and, according to the drop rate and the first number of neurons in a hidden layer, determines the second number of neurons to be dropped in that layer. During training, the device randomly selects the second number of neurons in the hidden layer to drop, that is, the randomly selected neurons do not participate in the current iteration. Dropout can improve the generalization capability of the model; in other words, dropping some neurons of a hidden layer perturbs the input of the next hidden layer, which improves the model's resistance to interference and thereby suppresses overfitting. For example, as shown in FIG. 2, a hidden layer of the image processing model has n neurons, denoted neuron 1, neuron 2, ..., neuron n. In the first iteration, the training device randomly drops several neurons, for example neuron 2 in the hidden layer; in the second iteration, it randomly drops several neurons, for example neuron 1 in the hidden layer.
However, for a hidden layer with few neurons, dropping some of its neurons easily causes the image processing model to underfit. In addition, because fewer neurons in the hidden layer participate in each iteration, more iterations may be needed, reducing the training efficiency of the model; yet without increasing the number of iterations, overfitting cannot be suppressed well.
In view of this, this application proposes an image processing model training method and apparatus to better suppress overfitting. In this method, before training, the training device determines a scaling ratio and scaling factors. During training, the device determines, according to the scaling ratio, the number of neurons to be scaled in each neural network layer, magnifies the parameters of the neurons to be magnified according to the magnification factor among the scaling factors, and reduces the parameters of the neurons to be reduced according to the reduction factor among the scaling factors. Magnifying and reducing the parameters of neurons in this way is equivalent to perturbing the training data input to the neural network layers, which improves the model's resistance to interference and thereby suppresses overfitting. Moreover, because the number of neurons in each layer does not change in any iteration, no additional iterations are needed, so overfitting is suppressed while the training efficiency of the image processing model is preserved.
In the embodiments of this application, the image processing model training method proposed by this application is also called the scaling (Scaleout) method.
The embodiments of this application are further described in detail below with reference to the accompanying drawings.
FIG. 3 is a schematic structural diagram of a possible training device according to an embodiment of this application. The training device 300 may include a controller 310, a memory 320, and an image processing model 330, where the controller 310 includes a random number generator 311.
The image processing model 330 is used to process images. It consists of one input layer, multiple hidden layers, and one output layer. Each hidden layer includes multiple neurons, each neuron includes a corresponding group of parameters, and the parameters of a neuron include a weight w and a bias a. In the Scaleout method provided by this application, the parameters of the neurons in the hidden layers of the image processing model 330 are scaled and the model 330 is then trained; the trained model 330 can process input images and output the processing results, while achieving the effect of suppressing overfitting.
For example, if the image processing model 330 is a model based on an FNN structure, overfitting can be suppressed by scaling the parameters of the neurons in the fully connected (FC) hidden layers of the model. The number of neurons in each fully connected hidden layer is not limited in this application; a layer may include a small number of neurons, such as 16 or 32, or a large number, such as 1024 or 2048.
As another example, if the image processing model 330 is a model based on a CNN structure, overfitting can be suppressed by scaling the parameters of the neurons in the fully connected layers of the model. Because the CNN structure itself has strong image processing capabilities, a CNN-based image processing model 330 can achieve good results in image classification, object detection, semantic/instance segmentation, face detection, face recognition, and image quality enhancement.
FIG. 3 takes scaling the parameters of neurons in the hidden layers as an example; in other possible scenarios, the parameters of neurons in the input layer and/or the output layer may also be scaled, which is not limited in this application.
The memory 320 is used to store data related to the training process of the image processing model 330, including but not limited to one or more of the following: the training data set (which includes the training data and the corresponding annotation results), the number of neural network layers, the number of neurons in each layer, the first parameters of each neuron before each iteration, the scaling ratio and scaling factors, and which neurons' parameters were magnified and which were reduced before each iteration. For example, the training data of the image processing model includes image data, and the corresponding annotation results include annotation results for the target objects in the image data (such as annotation boxes).
For the neural network layers whose parameters need to be scaled, different layers may use the same scaling ratio and scaling factors, or each layer may use its own. Optionally, the scaling ratio and scaling factors remain unchanged across iterations, or they are adjusted at each iteration; for example, the scaling ratio and scaling factors may decrease as the number of iterations increases.
The scaling ratio satisfies the condition b < ratio ≤ c, where ratio denotes the scaling ratio, b ≥ 0, and c < 1. b and c may be preset values or values chosen according to experimental results or actual usage needs; for example, b may be 0, 0.1, or 0.3, and c may be 0.3, 0.5, or 0.9. In one possible implementation, for each layer whose parameters need to be scaled, the number of neurons to be magnified equals the number of neurons to be reduced; that is, each such layer satisfies num1 = num2 = M*ratio, where M is the number of all neurons in the layer, num1 is the number of neurons to be magnified, and num2 is the number of neurons to be reduced.
The scaling factors include a magnification factor X and a reduction factor Y. The magnification factor X satisfies d < X < e, with d > 1, e > 1, and e > d. The reduction factor Y satisfies Y = f - X, where Y denotes the reduction factor and f ≥ e. d, e, and f may be preset values or values chosen according to experimental results or actual usage needs; for example, d may be 1, 1.5, or 1.9; e may be 1.5, 1.7, or 2; and f may be 2 or 5.
For example, for an FNN-based image processing model 330, the scaling ratio may be set within the interval (0, 0.5], for example to 0.1, 0.2, ..., 0.5. In one possible implementation, comparative experiments are run with the scaling ratio set to different values: when the ratio is set within [0.3, 0.5], overfitting is suppressed well, and when the ratio is below 0.5, the error rate of the model is relatively stable, so the scaling ratio can be set according to different needs. The magnification factor may be set within the interval (1, 2), for example to 1.1, 1.2, 1.3, ..., 1.9. In one possible implementation, comparative experiments are run with the magnification factor set to different values: when the factor is set within [1.5, 1.7], overfitting is suppressed well, and when the factor is below 1.5, the error rate of the model is relatively stable, so the scaling factors can be set according to different needs, and the reduction factor is then determined.
As another example, for a CNN-based image processing model 330, the scaling ratio may be set within [0.1, 0.5], for example to 0.5. The magnification factor may be set within [1.1, 1.9], for example to 1.7; if f is 2, the reduction factor is 2 - 1.7 = 0.3.
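As a concrete illustration of these conditions, the following minimal sketch (not part of the patent text; the function name and structure are assumptions for illustration) derives a per-layer configuration from the CNN example values above, namely ratio = 0.5, X = 1.7, and f = 2:

```python
def scaleout_config(num_neurons, ratio=0.5, magnify=1.7, f=2.0):
    # num1 = num2 = M * ratio: equal numbers of magnified and reduced neurons
    num_scaled = int(num_neurons * ratio)
    # Y = f - X: the reduction factor is determined by the magnification factor
    reduce_factor = f - magnify
    return num_scaled, num_scaled, magnify, reduce_factor

num1, num2, X, Y = scaleout_config(1024)
print(num1, num2, X, Y)  # 512 512 1.7 0.3 (up to floating-point rounding)
```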
The controller 310 is used to control the training process of the image processing model 330. The control of the training process by the controller 310 is illustrated in FIG. 4.
Before each iteration, the controller 310 determines the hidden layers whose parameters need to be scaled in the current iteration, and determines the scaling ratio and scaling factors for the current iteration. The hidden layers whose parameters need to be scaled in each iteration may be preset, or may be randomly selected by the random number generator 311 under the control of the controller 310.
According to the scaling ratio, the controller 310 controls the random number generator 311 to randomly select, in each hidden layer whose parameters need to be scaled, the neurons whose parameters need to be scaled. For each such hidden layer, the number of neurons whose parameters need to be magnified = the number of all neurons in the layer × the scaling ratio, and in each layer the number of neurons to be magnified equals the number of neurons to be reduced.
In one possible implementation, the random number generator 311 selects the neurons whose parameters need to be scaled in groups within the hidden layers to be scaled, selecting N groups of neurons in total, where N1 groups include the neurons whose parameters need to be magnified and N2 groups include the neurons whose parameters need to be reduced, N = N1 + N2, and N, N1, and N2 are all positive integers. Optionally, the groups may contain the same or different numbers of neurons; for example, if the N groups contain different numbers of neurons, the group sizes are g1, g2, ..., gn. Also optionally, the scaling factor of each group may differ; for example, the scaling factors of the N groups are t1, t2, ..., tn, where n is an integer greater than or equal to 1 and less than or equal to N.
If the groups contain the same number of neurons, the numbers of neurons in the N groups satisfy g × N ≤ M, where g is the number of neurons in each group, N is the number of groups, and M is the number of all neurons in a layer. The scaling factors of the groups satisfy t1 + t2 + ... + tn = N, where t1, t2, ..., tn are the scaling factors of the respective groups.
If the groups contain different numbers of neurons, the numbers of neurons in the N groups satisfy g1 + g2 + ... + gN ≤ M, where i is an integer greater than or equal to 1 and less than or equal to N, denoting the i-th group, and gi is the number of neurons in the i-th group. The scaling factors of the groups satisfy g1·t1 + g2·t2 + ... + gN·tN = N, where ti is the scaling factor corresponding to the i-th group.
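These group conditions can be checked with a short bookkeeping sketch; the helper below is hypothetical and only encodes the two constraints just stated (the group sizes must fit within the layer, and the scaling factors must sum to N, weighted by group size when the sizes differ):

```python
import numpy as np

def check_groups(group_sizes, factors, layer_size):
    g = np.asarray(group_sizes, dtype=float)
    t = np.asarray(factors, dtype=float)
    if g.sum() > layer_size:             # g1 + g2 + ... + gN must not exceed M
        return False
    if np.all(g == g[0]):                # equal group sizes: t1 + ... + tN = N
        return bool(np.isclose(t.sum(), len(g)))
    return bool(np.isclose((g * t).sum(), len(g)))  # unequal sizes: sum of gi*ti = N

# Two groups of 512 neurons in a 1024-neuron layer, factors 1.7 and 0.3:
print(check_groups([512, 512], [1.7, 0.3], 1024))  # True
```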
Before each iteration, the controller 310 magnifies the parameters of the neurons to be magnified in the current iteration by the magnification factor corresponding to the current iteration, and reduces the parameters of the neurons to be reduced by the reduction factor corresponding to the current iteration.
During each iteration, the controller 310 inputs the training data into the image processing model 330 whose parameters have been scaled, obtains the processing result of the model 330, and computes the error between the processing result and the annotation result of the training data. Based on this error, the controller 310 adjusts the parameters of the image processing model 330. In one possible implementation, each batch of training data serves as the training data for one iteration.
After each iteration, the controller 310 restores the parameters of the neurons that were scaled in the current iteration. In one possible implementation, the controller 310 obtains the first parameters of each neuron recorded before the current iteration and resets the parameters of the scaled neurons to the corresponding first parameters. In another possible implementation, the parameters of the neurons magnified in the current iteration are divided by the magnification factor to reduce them, and the parameters of the neurons reduced in the current iteration are divided by the reduction factor to magnify them. Optionally, when restoring the parameters of the neurons, the controller 310 may also restore only the parameters of the magnified neurons, or only the parameters of the reduced neurons.
The image processing model 330 is a model based on a neural network architecture, and the training process of a neural network includes a forward pass and a back pass. During the training of the image processing model 330, the Scaleout method provided by this application may be applied only in the forward pass, that is, the parameters of the neurons are scaled before the forward pass and, correspondingly, restored after the forward pass and before the back pass; or only in the back pass, that is, the parameters are scaled after the forward pass and before the back pass and, correspondingly, restored after the back pass; or in both the forward pass and the back pass.
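The following is a minimal sketch of one Scaleout training iteration with the scaling applied in both the forward and the back pass, which is one of the three options just described. Here `forward_backward` and `update` are placeholders for the model's own computation rather than APIs defined by the patent, and each row of `weights` is treated as one neuron's weight vector:

```python
import numpy as np

def scaleout_step(weights, batch, labels, forward_backward, update,
                  ratio=0.5, X=1.7, Y=0.3, rng=np.random):
    saved = weights.copy()             # first parameters, recorded before scaling
    n = weights.shape[0]
    k = int(n * ratio)                 # num1 = num2 = M * ratio
    idx = rng.permutation(n)
    weights[idx[:k]] *= X              # magnify the parameters of n1 neurons
    weights[idx[k:2 * k]] *= Y         # reduce the parameters of n2 neurons
    loss, grads = forward_backward(weights, batch, labels)  # forward and back pass
    weights[:] = saved                 # restore the scaled parameters
    update(weights, grads)             # adjust the parameters based on the error
    return loss
```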
For example, for the FNN-based image processing model 330, take the possible neural network structure shown in FIG. 5 as an example, which includes one input layer, one output layer, and four fully connected layers. The four hidden layers are: a fully connected + ReLU layer with 2048*784 neurons, a fully connected + ReLU layer with 2048*2048 neurons, a fully connected + ReLU layer with 2048*2048 neurons, and a fully connected layer with 10*2048 neurons. Taking scaling the parameters of the neurons in the first three fully connected layers as an example, a comparative experiment was run between the Scaleout method provided by this application and the dropout method of the prior art. The experimental results show that, compared with dropout, the Scaleout method effectively reduces the number of training iterations, improves training efficiency, significantly lowers the error rate of the image processing model, and better suppresses overfitting.
As another example, for the CNN-based image processing model 330, FIG. 6 shows a possible CNN-based image classification model structure, which includes an input layer (images are input through the input layer), a CNN feature extraction network, one or more fully connected layers (two fully connected layers are shown in FIG. 6), and an output layer (which outputs the classification result). The CNN feature extraction network may be Alexnet, VGG, GoogleNet, Resnet, Densenet, Mobilenet, SeNet, Shuffnet, or the like. FIG. 7 shows a possible VGG image classification model structure, which includes an input layer, two convolution (conv) + ReLU layers, a pooling layer, two conv + ReLU layers, a pooling layer, three conv + ReLU layers, a pooling layer, three conv + ReLU layers, a pooling layer, three conv + ReLU layers, a pooling layer, three Fc + ReLU layers, and an output layer. For example, the scaling ratio is set to 0.5, the magnification factor to 1.7, and the reduction factor to 0.3; the parameters of the neurons in the last three Fc + ReLU layers of the VGG image classification model are scaled, and the VGG image classification model is trained. The first image to be classified, shown in (a) of FIG. 8, is input into the trained VGG image classification model for processing, and the trained model outputs the processing result of the first image, which includes the classification result annotated for the target in the first image: as shown in (b) of FIG. 8, the target in the first image is marked with an annotation box, and its classification result is identified as "cat". Taking animal classification by the VGG image classification model as an illustration: if the trained model underfits, it cannot recognize animal information; if it overfits, it cannot accurately classify animals of different kinds, or animals of the same kind with different appearances, and its adaptability is poor.
Taking a CNN-based object detection model as an example, the object detection model may be Faster R-CNN, R-FCN, SSD, or the like. Taking a Faster R-CNN-based object detection model as an example, the scaling ratio is set to 0.5, the magnification factor to 1.7, and the reduction factor to 0.3. If the Faster R-CNN object detection model includes two Fc + ReLU layers and each Fc + ReLU layer includes 1024 neurons, then for each Fc + ReLU layer the number of neurons to be magnified is 512 and the number of neurons to be reduced is 512. During training of the Faster R-CNN object detection model, for each Fc + ReLU layer, 512 neurons are randomly selected as the neurons to be magnified and 512 neurons are randomly selected as the neurons to be reduced; the parameters of the 512 neurons to be magnified are magnified by a factor of 1.7, and the parameters of the 512 neurons to be reduced are scaled by 0.3.
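A hypothetical numeric illustration of this Fc + ReLU example (the fan-in of 2048 is an assumption for illustration; the patent specifies only the 1024 neurons per layer and the factors 1.7 and 0.3):

```python
import numpy as np

w = np.random.randn(1024, 2048)   # one weight row per neuron; fan-in assumed
idx = np.random.permutation(1024)
w[idx[:512]] *= 1.7               # 512 randomly selected neurons are magnified
w[idx[512:]] *= 0.3               # the remaining 512 neurons are reduced
```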
In typical vehicle detection, if the trained object detection model underfits, vehicle information may not be recognized from the image. Conversely, if the trained model overfits, its adaptability is poor: a model that works for vehicle A may lose accuracy when applied to vehicle B. The above models can therefore also be applied in the intelligent surveillance field or the autonomous driving field to recognize vehicle information more accurately.
Taking vehicle detection with a Faster R-CNN object detection model as an example, FIG. 9 is a schematic flowchart of a possible vehicle detection process. A road surveillance camera is installed at a traffic intersection to capture the traffic conditions at the intersection and information about vehicles passing through it. The road surveillance camera sends the captured video data to a device used for vehicle detection, which includes a trained vehicle detection model; the device used for vehicle detection may be a server, or the training device that trained the vehicle detection model, or the like. The device receives the video data captured by the camera, decodes it, obtains video images from the decoded video data, converts the format of each video frame (for example, to the blue green red (BGR) format), and processes the size of each format-converted frame (for example, scales the video image and/or resizes it). For example, (a) of FIG. 10 shows a video frame obtained after size processing, referred to in the embodiments of this application as the second image. The device inputs the second image shown in (a) of FIG. 10 into the trained vehicle detection model for processing, and the trained model outputs the processing result of the second image, which includes the detection result annotated for the target in the second image: as shown in (b) of FIG. 10, the target in the second image is marked with an annotation box, and its detection result is identified as "car".
It can be understood that the vehicle detection model may detect whether there is a vehicle in an image, and may also detect the type of vehicle in the image; the vehicle type may include categories such as motor vehicles and non-motor vehicles (as shown in (b) of FIG. 10, a vehicle is detected and identified as a car), or categories such as the manufacturer and brand of the vehicle.
Optionally, multiple road surveillance cameras may be linked, for example cameras within one area or along a particular driving route, and the video data captured by the linked cameras can be shared; for example, driving routes can be provided to vehicles intelligently according to the traffic conditions at each intersection. Alternatively, the road surveillance cameras may be connected to a public security traffic system, which can analyze the video images captured by the cameras; for example, it can determine from the analysis whether vehicles passing through the intersection where a camera is located have committed violations, or whether the intersection is congested, so that traffic police near the intersection can be notified to help direct traffic.
In other possible scenarios, for example with the rapid growth of short-video and live-streaming services, it is important to better analyze the video content that users watch and are interested in, so as to provide search, recommendation, and other functions that better meet their needs. Three-dimensional (3D) CNNs achieve good results in video classification, action recognition, and the like. Unlike a CNN, which treats each frame of a video as a static picture, a 3D CNN considers the motion information between consecutive frames when processing video, and thus better captures and analyzes the motion information of the video in the temporal and spatial dimensions. When a 3D CNN processes a video, multiple consecutive image frames of the video are stacked into a cube; because the frames in the cube are temporally continuous, the motion information in the cube can be captured by 3D convolution kernels.
A 3D CNN can also be used in combination with the Scaleout method provided by this application. FIG. 11 shows a possible 3D CNN architecture, which includes an input layer, a conv1a layer, a pooling layer, a conv2a layer, a pooling layer, a conv3a layer, a conv3b layer, a pooling layer, a conv4a layer, a conv4b layer, a pooling layer, a conv5a layer, a conv5b layer, a pooling layer, an Fc6 layer, an Fc7 layer, and an output layer. During training, the parameters of the neurons in the two fully connected layers Fc6 and Fc7 are scaled, for example with the scaling ratio set to 0.5, the magnification factor to 1.7, and the reduction factor to 0.3. The 3D CNN shown in FIG. 11 can be used to detect highlights in videos: a video clip to be examined is input into the trained 3D CNN for processing, and the trained 3D CNN outputs a highlight score for the clip.
It is worth noting that FIG. 3 is merely a schematic structural diagram of a possible training device provided by an embodiment of this application, and the positional relationships between the modules shown in the figure do not constitute any limitation. For example, in FIG. 3 the memory 320 is built into the training device; in other cases, it may also be an external memory. The training device may be a personal computer (PC), a laptop, a server, or another such device.
It can be understood that the Scaleout method provided by this application can also be applied in automated machine learning (AutoML) or neural architecture search (NAS). Compared with dropout, the Scaleout method requires fewer training iterations, so in AutoML and NAS scenarios, where many training attempts with different hyperparameters are needed, it can also reduce the time spent on model search and training. The Scaleout method can likewise be applied to the training of models for natural language recognition, speech recognition, and the like, to better suppress overfitting.
In addition, the Scaleout method provided by this application can also serve as a Scaleout operator for neural network model training offered to tenants of a public cloud. In this way, when public cloud tenants build their own deep learning models, they can use the Scaleout operator provided by the public cloud to train their models and achieve better results.
With reference to the foregoing embodiments and accompanying drawings, as shown in FIG. 12, an embodiment of this application provides an image processing model training method, including the following steps.
S1201: Input image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been magnified and the parameters of n2 neurons have been reduced, and n1 and n2 are positive integers.
Before S1201, the image processing model training apparatus may also obtain the training data set from a memory; the training data set includes image data. For supervised learning scenarios, the training data set includes training data and the annotation results corresponding to the training data.
Optionally, the image data input into the image processing model in S1201 is all or part of the image data in the training data set. That is, one training iteration of the image processing model may use all or part of the training data in the training data set: training once with all of the training data may be called one epoch of training, and training once with part of the training data may be called one batch of training.
The memory may be an internal memory built into the image processing model training apparatus, as shown in FIG. 3, or an external memory (such as a hard disk, a floppy disk, or an optical disc).
For example, the image processing model is a model based on a neural network architecture; the neural network architecture includes M neural network layers, and the M neural network layers include an input layer, a hidden layer, and an output layer, where M is a positive integer.
In S1201, the parameters of neurons in the neural network layers of the image processing model have been scaled. For example, m of the M neural network layers need their parameters scaled: the parameters of n1 neurons in the m neural network layers have been magnified, and the parameters of n2 neurons in the m neural network layers have been reduced, where m is a positive integer and m is less than or equal to M.
In some embodiments, before S1201, the image processing model training apparatus may further determine the scaling ratio and scaling factors of each of the m neural network layers, where the scaling factors include a reduction factor and a magnification factor. The scaling ratio and scaling factors of each layer may also be stored in the memory.
The scaling ratio is, in each of the m neural network layers whose parameters need to be scaled, the ratio of the number of neurons to be scaled to the number of all neurons. The apparatus may determine, according to the scaling ratio of each of the m layers, the neurons whose parameters are to be magnified and the neurons whose parameters are to be reduced in that layer. In some embodiments, the number of neurons to be magnified in each layer equals the number of neurons to be reduced, and n1 = n2, where n1 is the sum of the numbers of neurons to be magnified across the layers and n2 is the sum of the numbers of neurons to be reduced.
For example, the apparatus may select the neurons to be scaled in groups within each of the m neural network layers, selecting N groups of neurons in total: each of the m layers includes at least one group of neurons whose parameters are to be magnified and at least one group of neurons whose parameters are to be reduced, and these groups together form the N groups of neurons. The number of neurons in each of the N groups may be the same or different.
The scaling factors are, in each of the m neural network layers whose parameters need to be scaled, the magnification factor of the neurons whose parameters are to be magnified and the reduction factor of the neurons whose parameters are to be reduced. The apparatus may magnify the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer, and reduce the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer.
For example, if the apparatus selected N groups of neurons in each of the m layers, it may also determine, for each layer, the magnification factor corresponding to each group of neurons to be magnified and the reduction factor corresponding to each group of neurons to be reduced. When scaling the parameters of the neurons, the apparatus then magnifies the parameters of the neurons in each group to be magnified according to that group's magnification factor, and reduces the parameters of the neurons in each group to be reduced according to that group's reduction factor.
If the number of neurons in each of the N groups is the same, the following condition is satisfied: N is the sum of the magnification factors corresponding to the groups of neurons to be magnified and the reduction factors corresponding to the groups of neurons to be reduced.
If the number of neurons in each of the N groups is different, the following condition is satisfied: N is the sum of the total magnification of all neurons in each group to be magnified and the total reduction of all neurons in each group to be reduced, where the total magnification of a group to be magnified is the product of the number of neurons in the group and the corresponding magnification factor, and the total reduction of a group to be reduced is the product of the number of neurons in the group and the corresponding reduction factor.
The processing result corresponding to the image data output in S1201 is the predicted value of the image processing model.
S1202: Calculate the error between the annotation result of the image data in the training data set and the processing result.
In some embodiments, a loss function may be used to calculate the error between the annotation result and the processing result. Generally, the higher the output value (loss) of the function, the greater the error, and the training process of the image processing model becomes a process of reducing this loss as much as possible.
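The patent does not mandate a particular loss function; as one common choice, a cross-entropy loss could be computed along these lines (a minimal sketch):

```python
import numpy as np

def cross_entropy(predicted_probs, onehot_label, eps=1e-12):
    # A higher loss indicates a larger error between prediction and annotation.
    return float(-np.sum(onehot_label * np.log(predicted_probs + eps)))

print(cross_entropy(np.array([0.7, 0.2, 0.1]),
                    np.array([1.0, 0.0, 0.0])))  # ~0.357
```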
S1203: Adjust the parameters of the image processing model according to the error between the annotation result and the processing result.
The image processing model training apparatus updates the parameters of the image processing model according to the error between the annotation result and the processing result, and keeps adjusting them until the processing result predicted by the model for the image data approaches or equals the annotation result of the image data, at which point training of the image processing model is complete.
Meanwhile, to suppress overfitting, after adjusting the parameters of the image processing model in the current iteration and before inputting image data into the model in the next iteration, the apparatus may scale the parameters of neurons in the model on the basis of the parameters adjusted in the current iteration.
Because the parameters of neurons in the image processing model were scaled in the current iteration, the apparatus may also, after adjusting the parameters of the model, reduce the parameters of the n1 neurons and/or magnify the parameters of the n2 neurons. For example, the parameters of the n1 neurons scaled in this iteration are divided by the magnification factor corresponding to each neuron to reduce them, and the parameters of the n2 neurons are divided by the reduction factor corresponding to each neuron to magnify them.
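A minimal sketch of this divide-based restoration, assuming the indices and factors used in the current iteration were recorded (all names here are illustrative only):

```python
def restore_by_division(weights, magnified_idx, reduced_idx, X, Y):
    weights[magnified_idx] /= X   # undo the magnification of the n1 neurons
    weights[reduced_idx] /= Y     # undo the reduction of the n2 neurons
```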
For the specific implementations of what is shown in FIG. 12, refer to the descriptions of the related embodiments above.
The embodiments of this application may be used in combination with one another or separately.
FIG. 12 introduces the solution provided by this application mainly from the perspective of the method flow. It can be understood that, in order to implement the above functions, the apparatus may include corresponding hardware structures and/or software modules for performing the various functions. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In the case of integrated units, FIG. 13 shows a possible exemplary block diagram of the image processing model training apparatus involved in the embodiments of this application; the image processing model training apparatus 1300 may exist in the form of software. The image processing model training apparatus 1300 may include: a processing unit 1301, a calculation unit 1302, and an adjustment unit 1303.
The image processing model training apparatus 1300 may be the training device in FIG. 3 above, or a semiconductor chip provided in the training device. Specifically, in one embodiment, the processing unit 1301 is configured to input image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, where the parameters of n1 neurons in the image processing model have been magnified and the parameters of n2 neurons have been reduced, and n1 and n2 are positive integers;
the calculation unit 1302 is configured to calculate the error between the annotation result of the image data in the training data set and the processing result; and
the adjustment unit 1303 is configured to adjust the parameters of the image processing model according to the error between the annotation result and the processing result.
In a possible design, the image processing model is a model based on a neural network architecture; the neural network architecture includes M neural network layers, and the M neural network layers include an input layer, a hidden layer, and an output layer;
the parameters of n1 neurons in m neural network layers of the image processing model have been magnified, and the parameters of n2 neurons in the m neural network layers have been reduced;
where M and m are positive integers, and m is less than or equal to M.
In a possible design, the apparatus further includes:
a scaling unit 1304, configured to determine the scaling ratio and scaling factors of each of the m neural network layers, where the scaling factors include a reduction factor and a magnification factor; determine, according to the scaling ratio of each of the m layers, the neurons whose parameters are to be magnified and the neurons whose parameters are to be reduced in that layer, where n1 is the sum of the numbers of neurons to be magnified in the layers and n2 is the sum of the numbers of neurons to be reduced; magnify the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer; and reduce the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer.
In a possible design, each of the m neural network layers includes at least one group of neurons whose parameters are to be magnified and at least one group of neurons whose parameters are to be reduced, and the at least one group to be magnified and the at least one group to be reduced form N groups of neurons;
the scaling unit 1304 is specifically configured to magnify the parameters of the neurons in each group to be magnified according to the magnification factor corresponding to that group in each layer, and to reduce the parameters of the neurons in each group to be reduced according to the reduction factor corresponding to that group in each layer.
In a possible design, the number of neurons in each of the N groups is the same, and the following condition is satisfied: N is the sum of the magnification factors corresponding to the groups of neurons to be magnified and the reduction factors corresponding to the groups of neurons to be reduced.
In a possible design, the number of neurons in each of the N groups is different, and the following condition is satisfied: N is the sum of the total magnification of all neurons in each group to be magnified and the total reduction of all neurons in each group to be reduced, where the total magnification of a group is the product of the number of neurons in the group and the corresponding magnification factor, and the total reduction of a group is the product of the number of neurons in the group and the corresponding reduction factor.
In a possible design, the image data is all or part of the image data in the training data set.
In a possible design, the apparatus further includes:
a restoration unit 1305, configured to reduce the parameters of the n1 neurons; and/or magnify the parameters of the n2 neurons.
The division into units in the embodiments of this application is illustrative and is merely a division by logical function; other divisions are possible in actual implementation. The functional units in the embodiments of this application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
As shown in FIG. 14, an embodiment of this application further provides a schematic structural diagram of another possible image processing model training apparatus. The apparatus includes at least one processor 1402 and at least one communication interface 1404. Further, the apparatus may include a memory 1406 used to store computer programs or instructions. The memory 1406 may be a memory inside the processor or a memory outside the processor. Where the unit modules described in FIG. 13 are implemented by software, the software or program code required by the processor 1402 to perform the corresponding actions is stored in the memory 1406. The processor 1402 is configured to execute the programs or instructions in the memory 1406 to implement the steps shown in FIG. 12 in the foregoing embodiments. The communication interface 1404 is used to implement communication between this apparatus and other apparatuses.
Where the memory 1406 is provided outside the processor, the memory 1406, the processor 1402, and the communication interface 1404 are connected to each other through a bus 1408. The bus 1408 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 14, but this does not mean that there is only one bus or only one type of bus.
It should be noted that the operations and/or functions of the modules in the apparatus 1400 are respectively intended to implement the corresponding procedures of the method shown in FIG. 12; for brevity, details are not repeated here.
An embodiment of this application further provides a chip system, including a processor coupled to a memory, where the memory is used to store a program or instructions; when the program or instructions are executed by the processor, the chip system implements the method in any one of the foregoing method embodiments.
Optionally, there may be one or more processors in the chip system. The processor may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented by software, the processor may be a general-purpose processor that is implemented by reading software code stored in the memory.
Optionally, there may also be one or more memories in the chip system. The memory may be integrated with the processor or provided separately from the processor, which is not limited in this application. For example, the memory may be a non-transitory memory, such as a read-only memory (ROM); it may be integrated with the processor on the same chip, or the two may be provided on different chips. The type of the memory and the manner in which the memory and the processor are provided are not specifically limited in this application.
For example, the chip system may be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
It should be understood that each step in the foregoing method embodiments may be completed by an integrated logic circuit of hardware in a processor or by instructions in the form of software. The method steps disclosed in the embodiments of this application may be directly performed and completed by a hardware processor, or performed and completed by a combination of hardware and software modules in the processor.
An embodiment of this application further provides a computer-readable storage medium storing computer-readable instructions; when a computer reads and executes the computer-readable instructions, the computer is caused to perform the method in any one of the foregoing method embodiments.
An embodiment of this application further provides a computer program product; when a computer reads and executes the computer program product, the computer is caused to perform the method in any one of the foregoing method embodiments.
It should be understood that the processor mentioned in the embodiments of this application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.
It should also be understood that the memory mentioned in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated into the processor.
It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present invention.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
It can be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections displayed or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (19)

  1. An image processing model training method, comprising:
    inputting image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, wherein parameters of n1 neurons in the image processing model have been magnified and parameters of n2 neurons have been reduced;
    calculating an error between an annotation result of the image data in the training data set and the processing result; and
    adjusting parameters of the image processing model according to the error between the annotation result and the processing result;
    wherein n1 and n2 are positive integers.
  2. The method according to claim 1, wherein the image processing model is a model based on a neural network architecture, the neural network architecture comprises M neural network layers, and the M neural network layers comprise an input layer, a hidden layer, and an output layer;
    parameters of n1 neurons in m neural network layers of the image processing model have been magnified, and parameters of n2 neurons in the m neural network layers have been reduced;
    wherein M and m are positive integers, and m is less than or equal to M.
  3. The method according to claim 2, wherein before the inputting image data in a training data set into an image processing model for processing, the method further comprises:
    determining a scaling ratio and scaling factors of each of the m neural network layers, wherein the scaling factors comprise a reduction factor and a magnification factor;
    determining, according to the scaling ratio of each of the m neural network layers, neurons whose parameters are to be magnified and neurons whose parameters are to be reduced in that layer, wherein n1 is the sum of the numbers of neurons whose parameters are to be magnified in the layers, and n2 is the sum of the numbers of neurons whose parameters are to be reduced in the layers; and
    magnifying the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer, and reducing the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer.
  4. The method according to claim 3, wherein each of the m neural network layers comprises at least one group of neurons whose parameters are to be magnified and at least one group of neurons whose parameters are to be reduced, and the at least one group of neurons whose parameters are to be magnified and the at least one group of neurons whose parameters are to be reduced form N groups of neurons;
    the magnifying the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer comprises:
    magnifying the parameters of the neurons in each group of neurons to be magnified according to the magnification factor corresponding to that group in each layer; and
    the reducing the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer comprises:
    reducing the parameters of the neurons in each group of neurons to be reduced according to the reduction factor corresponding to that group in each layer.
  5. The method according to claim 4, wherein the number of neurons in each of the N groups is the same, and the following condition is satisfied: N is the sum of the magnification factors corresponding to the groups of neurons to be magnified and the reduction factors corresponding to the groups of neurons to be reduced.
  6. The method according to claim 4, wherein the number of neurons in each of the N groups is different, and the following condition is satisfied: N is the sum of the total magnification of all neurons in each group of neurons to be magnified and the total reduction of all neurons in each group of neurons to be reduced, wherein the total magnification of all neurons in a group to be magnified is the product of the number of neurons in the group and the corresponding magnification factor, and the total reduction of all neurons in a group to be reduced is the product of the number of neurons in the group and the corresponding reduction factor.
  7. The method according to any one of claims 1 to 6, wherein the image data is all or part of the image data in the training data set.
  8. The method according to any one of claims 1 to 7, wherein after the adjusting parameters of the image processing model according to the error between the annotation result and the processing result, the method further comprises:
    reducing the parameters of the n1 neurons; and/or
    magnifying the parameters of the n2 neurons.
  9. An image processing model training apparatus, comprising:
    a processing unit, configured to input image data in a training data set into an image processing model for processing to obtain a processing result corresponding to the image data, wherein parameters of n1 neurons in the image processing model have been magnified and parameters of n2 neurons have been reduced, and n1 and n2 are positive integers;
    a calculation unit, configured to calculate an error between an annotation result of the image data in the training data set and the processing result; and
    an adjustment unit, configured to adjust parameters of the image processing model according to the error between the annotation result and the processing result.
  10. The apparatus according to claim 9, wherein the image processing model is a model based on a neural network architecture, the neural network architecture comprises M neural network layers, and the M neural network layers comprise an input layer, a hidden layer, and an output layer;
    parameters of n1 neurons in m neural network layers of the image processing model have been magnified, and parameters of n2 neurons in the m neural network layers have been reduced;
    wherein M and m are positive integers, and m is less than or equal to M.
  11. The apparatus according to claim 10, further comprising:
    a scaling unit, configured to determine a scaling ratio and scaling factors of each of the m neural network layers, wherein the scaling factors comprise a reduction factor and a magnification factor; determine, according to the scaling ratio of each of the m neural network layers, neurons whose parameters are to be magnified and neurons whose parameters are to be reduced in that layer, wherein n1 is the sum of the numbers of neurons whose parameters are to be magnified in the layers, and n2 is the sum of the numbers of neurons whose parameters are to be reduced in the layers; magnify the parameters of the neurons to be magnified in each layer according to the magnification factor of that layer; and reduce the parameters of the neurons to be reduced in each layer according to the reduction factor of that layer.
  12. The apparatus according to claim 11, wherein each of the m neural network layers comprises at least one group of neurons whose parameters are to be magnified and at least one group of neurons whose parameters are to be reduced, and the at least one group of neurons whose parameters are to be magnified and the at least one group of neurons whose parameters are to be reduced form N groups of neurons;
    the scaling unit is specifically configured to magnify the parameters of the neurons in each group of neurons to be magnified according to the magnification factor corresponding to that group in each layer, and reduce the parameters of the neurons in each group of neurons to be reduced according to the reduction factor corresponding to that group in each layer.
  13. The apparatus according to claim 12, wherein the number of neurons in each of the N groups is the same, and the following condition is satisfied: N is the sum of the magnification factors corresponding to the groups of neurons to be magnified and the reduction factors corresponding to the groups of neurons to be reduced.
  14. The apparatus according to claim 12, wherein the number of neurons in each of the N groups is different, and the following condition is satisfied: N is the sum of the total magnification of all neurons in each group of neurons to be magnified and the total reduction of all neurons in each group of neurons to be reduced, wherein the total magnification of all neurons in a group to be magnified is the product of the number of neurons in the group and the corresponding magnification factor, and the total reduction of all neurons in a group to be reduced is the product of the number of neurons in the group and the corresponding reduction factor.
  15. The apparatus according to any one of claims 9 to 14, wherein the image data is all or part of the image data in the training data set.
  16. The apparatus according to any one of claims 9 to 15, further comprising:
    a restoration unit, configured to reduce the parameters of the n1 neurons; and/or magnify the parameters of the n2 neurons.
  17. An image processing model training apparatus, comprising at least one processor coupled to at least one memory, wherein
    the at least one processor is configured to execute computer programs or instructions stored in the at least one memory, so that the apparatus performs the method according to any one of claims 1 to 8.
  18. A readable storage medium, configured to store instructions which, when executed, cause the method according to any one of claims 1 to 8 to be implemented.
  19. An image processing model training apparatus, comprising a processor and an interface circuit, wherein
    the interface circuit is configured to receive code instructions and transmit them to the processor; and
    the processor is configured to run the code instructions to perform the method according to any one of claims 1 to 8.
PCT/CN2020/117900 2020-01-23 2020-09-25 Image processing model training method and apparatus WO2021147365A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20915287.5A EP4080415A4 (en) 2020-01-23 2020-09-25 IMAGE PROCESSING MODEL LEARNING METHOD AND DEVICE
US17/871,389 US20220366254A1 (en) 2020-01-23 2022-07-22 Image Processing Model Training Method and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010077091.4 2020-01-23
CN202010077091.4A CN113160027A (zh) Image processing model training method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/871,389 Continuation US20220366254A1 (en) 2020-01-23 2022-07-22 Image Processing Model Training Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2021147365A1 true WO2021147365A1 (zh) 2021-07-29

Family

ID=76882122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117900 WO2021147365A1 (zh) Image processing model training method and apparatus 2020-01-23 2020-09-25

Country Status (4)

Country Link
US (1) US20220366254A1 (zh)
EP (1) EP4080415A4 (zh)
CN (1) CN113160027A (zh)
WO (1) WO2021147365A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114501164A (zh) * 2021-12-28 2022-05-13 海信视像科技股份有限公司 Audio and video data annotation method, apparatus, and electronic device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875779A (zh) * 2018-05-07 2018-11-23 Neural network training method, apparatus, and terminal device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103548042A (zh) * 2011-05-25 2014-01-29 Method and apparatus for unsupervised training of input synapses of primary visual cortex simple cells and other neural circuits
US20180005111A1 (en) * 2016-06-30 2018-01-04 International Business Machines Corporation Generalized Sigmoids and Activation Function Learning
CN110084368A (zh) 2018-04-20 2019-08-02 Systems and methods for regularizing neural networks
CN110428042A (zh) 2018-05-01 2019-11-08 Reciprocally scaling neuron connection weights and input values to defeat hardware limitations
CN109344888A (zh) * 2018-09-19 2019-02-15 Image recognition method, apparatus and device based on a convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUAN HONGJIN, QING PAN: "Research on Image Classification Based on DropWeight Algorithm", INFORMATION & COMMUNICATIONS, no. 194, 1 February 2019 (2019-02-01), pages 25 - 29, XP055830920, ISSN: 1673-1131 *

Also Published As

Publication number Publication date
EP4080415A4 (en) 2023-01-18
US20220366254A1 (en) 2022-11-17
CN113160027A (zh) 2021-07-23
EP4080415A1 (en) 2022-10-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915287

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020915287

Country of ref document: EP

Effective date: 20220720

NENP Non-entry into the national phase

Ref country code: DE