CN114424253A - Model training method and device, storage medium and electronic equipment - Google Patents
- Publication number: CN114424253A (application CN201980100619.0A)
- Authority: CN (China)
- Prior art keywords: image, loss function, classified, class, neural network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
Abstract
A model training method, an apparatus, a storage medium and an electronic device are provided. The method comprises the following steps: acquiring a sample image set; inputting the sample image set into a deep neural network for training; if the input image is a classification image, calculating a loss value based on a first loss function; if the input image is a target detection image, calculating the loss value based on the first loss function and a second loss function; and performing back propagation based on the loss value to update the network parameters until convergence, so as to obtain an image recognition model. The method can improve the accuracy of the deep neural network in image category prediction and target detection.
Description
The application relates to the technical field of image processing, in particular to a model training method, a model training device, a storage medium and electronic equipment.
Image processing is a technique that uses a computer to analyze an image to achieve a desired result. In the field of image processing technology, image category prediction has become an important research subject. As research on neural network models advances, methods that obtain an image's predicted category by running category prediction with such models have become widely adopted. How to train these models so as to improve the accuracy of subsequent image category prediction is therefore particularly important.
Disclosure of Invention
The embodiment of the application provides a model training method and device, a storage medium and electronic equipment, which can improve the accuracy of a deep neural network on image category prediction.
In a first aspect, an embodiment of the present application provides a model training method, including:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classification image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating the loss value based on the first loss function and a second loss function;
performing back propagation based on the calculated loss value to update the network parameters until convergence, so as to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
In a second aspect, an embodiment of the present application provides a model training apparatus, including:
the image acquisition module is used for acquiring a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
the image input module is used for inputting the sample images in the sample image set into a preset deep neural network for training;
the first calculation module is used for calculating a loss value based on a first loss function if the sample image input into the deep neural network is the classification image;
a second calculation module, configured to calculate a loss value based on the first loss function and a second loss function if the sample image input to the deep neural network is the target detection image;
an iterative training module for performing back propagation based on the calculated loss value to update the network parameters until convergence, so as to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
In a third aspect, an embodiment of the present application provides a storage medium having a computer program stored thereon, which, when run on a computer, causes the computer to perform:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classification image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating the loss value based on the first loss function and a second loss function;
and performing back propagation on the basis of the calculated loss value to update the network parameters until convergence to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory has a computer program, and the processor, by calling the computer program, is configured to perform:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classification image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating the loss value based on the first loss function and a second loss function;
and performing back propagation on the basis of the calculated loss value to update the network parameters until convergence to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
In the scheme provided by the embodiments of the application, when a deep neural network is trained, a sample image set comprising target detection images and classification images is obtained, and the sample images in the set are used to train a preset deep neural network. During training, when the sample image input into the network is a classification image, a loss value is calculated based on a first loss function; when it is a target detection image, the loss value is calculated based on the first loss function and a second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the preset deep neural network is trained with target detection images and classification images combined. Because the target detection images carry position information and a first class label, and the position information indicates the specific position of a class object in the image, the network can extract the features of class objects more accurately during training, improving the accuracy of the resulting image recognition model in image category prediction.
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a first method for training a model according to an embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of a second method for training a model according to an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of a model training device according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a model training circuit of an electronic device according to an embodiment of the present disclosure.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiment of the application provides a model training method, which comprises the following steps:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classification image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating the loss value based on the first loss function and a second loss function;
and performing back propagation on the basis of the calculated loss value to update the network parameters until convergence to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
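The branch between the two loss paths in the steps above can be sketched in a few lines of Python. The cross-entropy classification loss and the L1 box loss below are illustrative stand-ins; the patent does not fix specific loss formulas at this point, and the `sample` dictionary layout is a hypothetical representation.

```python
import math

def cls_loss(probs, label):
    # First loss function: cross-entropy on the true class (one possible choice).
    return -math.log(probs[label])

def det_loss(pred_box, true_box):
    # Second loss function: illustrative L1 loss on box coordinates.
    return sum(abs(p - t) for p, t in zip(pred_box, true_box))

def sample_loss(sample, probs, pred_box=None):
    # Branch on the sample type: target detection images use both losses,
    # classification images use the first loss only.
    if sample["kind"] == "detection":
        return cls_loss(probs, sample["label"]) + det_loss(pred_box, sample["box"])
    return cls_loss(probs, sample["label"])
```

A detection sample thus always contributes a classification term plus a localization term, while a classification sample contributes the classification term alone.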
In some embodiments, the classification image carries a second class label, the target detection image carries position information and a first class label, and the first class labels carried by all the target detection images form a first class label set;
before the calculating the loss value based on the first loss function, the method further includes:
if the sample image input into the deep neural network is the classification image, judging whether a second class label corresponding to the input classification image is contained in the first class label set;
if yes, calculating a loss value based on the first loss function;
and if not, calculating a loss value based on a third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
In some embodiments, the third loss function is k times the first loss function, where k > 1.
In some embodiments, the first loss function is m * f and the third loss function is n * f, where f is a base loss function, 0 < m < 1, and n > 1.
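A minimal sketch of this scaling, assuming cross-entropy as the base loss f and illustrative values m = 0.5 and n = 2.0 (the concrete values are assumptions, constrained only by 0 < m < 1 and n > 1):

```python
import math

M, N = 0.5, 2.0  # assumed values satisfying 0 < m < 1 and n > 1

def base_loss(probs, label):
    # Base loss function f, taken here to be cross-entropy.
    return -math.log(probs[label])

def first_loss(probs, label):
    return M * base_loss(probs, label)  # first loss = m * f

def third_loss(probs, label):
    return N * base_loss(probs, label)  # third loss = n * f
```

For any given sample the third loss is then strictly larger than the first, which is the property the scheme relies on.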
In some embodiments, the deep neural network is a convolutional neural network; and forming a second class label set by second class labels carried by all classified images, wherein the label types in the first class label set are less than the label types in the second class label set.
In some embodiments, after the back propagation based on the calculated loss value to update the network parameters until convergence, the method further comprises:
acquiring an image to be classified;
and carrying out image recognition on the image to be classified according to the image recognition model so as to determine a target class corresponding to the image to be classified and the position of an object belonging to the target class in the image to be classified.
An execution subject of the model training method may be the model training apparatus provided in the embodiment of the present application, or an electronic device integrated with the model training apparatus, where the model training apparatus may be implemented in a hardware or software manner. The electronic device may be a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer.
Referring to fig. 1, fig. 1 is a first flowchart of a model training method according to an embodiment of the present disclosure. The specific process of the model training method provided by the embodiment of the application can be as follows:
In 101, a sample image set is obtained; the sample image set includes a target detection image and a classification image, and the target detection image carries position information and a first class label.
Multi-class image classification based on target detection is strongly supervised: the position of every class object in the image must be provided, and when the classification model's training set is large, labeling this position information is labor-intensive. Ordinary multi-class image classification is a weakly supervised method: it only requires labeling the class name of each image, but it cannot identify the position of the class object in the image.
The model training scheme of the embodiment of the application can be applied to an image classification and positioning model, and the model can identify the category of the image and the position of a category object in the image. For example, the position of the category object can be marked by the target box. The model may be constructed based on a deep neural network, such as a Back Propagation (BP) neural network, a convolutional neural network, and the like.
The present application mixes two kinds of training samples to form the sample image set: target detection images and classification images. A target detection image carries a class label and position information, the position information indicating where the class object lies in the image; a classification image carries a class label only. For convenience of the following description, the class label carried by a target detection image is denoted the first class label, and the class label carried by a classification image the second class label. The first class labels carried by all target detection images form the first class label set; the second class labels carried by all classification images form the second class label set. In some embodiments, the label categories in the second set may partially coincide with those in the first set.
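As a rough illustration of the two sample types and their label sets, consider the records below; the field names `label` and `box` and the animal classes are hypothetical, not taken from the patent:

```python
# Hypothetical records for the two sample types described above.
detection_images = [
    {"label": "dog", "box": (12, 30, 80, 96)},  # position info + first class label
    {"label": "cat", "box": (5, 8, 60, 70)},
]
classification_images = [
    {"label": "dog"},                 # second class label only, no box
    {"label": "golden retriever"},    # finer class absent from the detection set
]

# The first and second class label sets, as defined in the text.
first_label_set = {img["label"] for img in detection_images}
second_label_set = {img["label"] for img in classification_images}

# The two sets may partially coincide.
overlap = first_label_set & second_label_set
```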
In 102, the sample images in the sample image set are input into a preset deep neural network for training.
The two kinds of training samples are mixed into one sample image set to train the model, which is essentially joint training of a strongly supervised (detection) algorithm and direct classification. During training, sample images from the set, mixing target detection images and classification images, are input into the preset neural network in random order for computation, and different loss functions are used to calculate the loss value according to the type of the input sample image.
In 103, if the sample image input into the deep neural network is a classification image, a loss value is calculated based on the first loss function.
At 104, if the sample image input to the deep neural network is the target detection image, a loss value is calculated based on the first loss function and the second loss function.
When the network is trained using the classification images, the loss function consists of the first loss function alone, which calculates the loss value produced when the image is classified. Because this training data carries no target frame, when the error information is back-propagated only the network parameters related to the classification part are updated; the parameters related to the target detection part are left unchanged.
When the network is trained using the target detection images, the loss function consists of the first loss function and the second loss function: the second loss function calculates the loss value produced by target detection on the image, and the first loss function calculates the loss value produced by classifying it. Because this training data carries a target frame, when the error information is back-propagated both the classification-related and the detection-related network parameters are updated, that is, all network parameters are updated.
Therefore, two loss functions are involved in training the deep neural network. The total loss function can be expressed as L = L_p + L_cls, where L_cls is the first loss function and L_p is the second loss function. If the sample image input into the deep neural network is a classification image, L_p = 0.
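The total loss can be restated as a tiny helper, with L_p forced to zero for classification images exactly as the formula above specifies:

```python
def total_loss(l_cls, l_p, is_classification_image):
    # L = L_p + L_cls; for a classification image, L_p = 0.
    if is_classification_image:
        l_p = 0.0
    return l_p + l_cls
```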
In this embodiment, the loss function may be selected according to the deep neural network used. For example, a mean square error function, a cross entropy function, or the like may be employed as the loss function.
At 105, back propagation is performed based on the calculated loss value to update the network parameters until convergence, resulting in an image recognition model that identifies the class of the input image and the location of the class object.
During network training, a loss value is calculated based on the loss functions and calculation manner above, and back propagation is performed based on the calculated loss value to update the network parameters until the network converges, for example, until the number of training iterations reaches a preset value, until the loss value reaches a minimum, or until the loss value is less than a preset value. After training to convergence, the network parameters are fixed, and the deep neural network with the determined parameters is taken as the image recognition model.
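The stopping conditions named above (iteration budget reached, or loss below a preset value) can be sketched as follows; `compute_loss` and `update_params` are hypothetical stand-ins for the forward pass and the back-propagation step:

```python
def train(compute_loss, update_params, max_iters=1000, loss_threshold=1e-3):
    # Stop when the loss falls below a preset value, or when the
    # iteration count reaches a preset budget, as described above.
    for step in range(max_iters):
        loss = compute_loss()
        if loss < loss_threshold:
            return step, loss  # converged on the loss criterion
        update_params(loss)
    return max_iters, loss     # stopped on the iteration budget
```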
In the network training process, the target detection image carries position information indicating the specific position of the class object in the image, so the network learns to extract the features of class objects more accurately. As a result, when the sample image input into the network is a classification image, even though it carries no position information, the network's ability to identify class-object features has been strengthened by the target detection training: the features of class objects in classification images can be identified more accurately, and their positions determined with higher accuracy. It is understood that a class object in the present application refers to the object corresponding to the class label of the sample image.
For example, with a convolutional neural network as the preset deep neural network and a cross-entropy function as the loss function, training data is input, a loss value is calculated according to the loss function, and back propagation is performed based on that loss value to optimize the weights of every convolution kernel in every convolutional layer of the network.
In particular implementation, the present application is not limited by the execution sequence of the described steps, and some steps may be performed in other sequences or simultaneously without conflict.
From the above, in the model training method provided in the embodiments of the present application, when the deep neural network is trained, a sample image set including target detection images and classification images is obtained, and the sample images in the set are used to train a preset deep neural network. During training, when the sample image input into the network is a classification image, a loss value is calculated based on a first loss function; when it is a target detection image, the loss value is calculated based on the first loss function and a second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the preset deep neural network is trained with target detection images and classification images combined. Because the target detection images carry position information and a first class label, and the position information indicates the specific position of a class object in the image, the network can extract the features of class objects more accurately during training, improving the accuracy of the resulting image recognition model in image category prediction.
The model training method of the present application is described in further detail below on the basis of the embodiments above. Referring to fig. 2, fig. 2 is a second flowchart of the model training method according to an embodiment of the present application. The method comprises the following steps:
In 201, a sample image set is obtained; the sample image set includes target detection images and classification images, the target detection images carry position information and a first class label, and the first class labels carried by all the target detection images form the first class label set.
In this embodiment, two kinds of training samples are mixed to form the sample image set: target detection images and classification images. A target detection image carries a class label and position information, the position information indicating where the class object lies in the image; a classification image carries a class label only. For convenience of the following description, the class label carried by a target detection image is denoted the first class label, and the class label carried by a classification image the second class label. The first class labels carried by all target detection images form the first class label set; the second class labels carried by all classification images form the second class label set. In some embodiments, the label categories in the second set may partially coincide with those in the first set.
For example, suppose the deep neural network is used to classify animals, so the sample images are animal images. The target detection images carry class labels of the animals and also identify, via a target frame in each image, the position of the animal belonging to that class. However, the animal categories in the target detection images are only coarse classes, e.g. dog, cat, deer, with no finer-grained classification; dogs, for instance, are not subdivided into golden retriever, husky, shepherd, and so on. The classification images, by contrast, carry only the class labels of the animals, without identifying where the animals appear, but they cover a wider and deeper range of class labels: they may include coarse classes absent from the target detection images, as well as fine-grained classes such as golden retriever, husky, and shepherd that the target detection images lack. That is, the number of label categories in the second class label set may be greater than the number in the first class label set.
Based on the scheme of this embodiment, the two kinds of sample images are mixed together as training samples and the deep neural network is trained jointly; the trained network can then output high-accuracy position information even for fine-grained dog classes that never appear in the target detection images.
In 202, the sample images in the sample image set are input into a preset deep neural network for training.
The two kinds of training samples are mixed into one sample image set to train the model, which is essentially joint training of a strongly supervised (detection) algorithm and direct classification. During training, sample images from the set, mixing target detection images and classification images, are input into the preset neural network in random order for computation, and different loss functions are used to calculate the loss value according to the type of the input sample image.
At 203, if the sample image input into the deep neural network is a classification image, it is determined whether the second class label corresponding to the input classification image is included in the first class label set.
At 204, if so, a loss value is calculated based on the first loss function.
At 205, if not, a loss value is calculated based on a third loss function, wherein, for the same input sample image, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
Based on the above example, a deep neural network trained by joint training can output position information with high accuracy for fine-grained dog classes that do not appear in the target detection images. However, during training, when a classification image belongs to a coarse class that the target detection images do not cover at all, the accuracy of position detection is poor. The present embodiment addresses this problem with a new way of calculating the loss value.
When the sample image input into the deep neural network is a classification image, it is determined whether the second class label corresponding to that image is included in the first class label set. If so, a loss value is calculated based on the first loss function. If not, the loss value is calculated based on a third loss function, where, for the same input sample image, the first loss function yields a smaller loss value than the third. That is, when a classification image belongs to a category not covered by the target detection images, a different loss function is used than in the other case (where the classification image's category is among the target detection categories), so that the calculated loss value is larger. The larger loss value makes the network more sensitive to that category, so it learns the features of images of that category more accurately, optimizing the model parameters and further improving the accuracy of both category prediction and target detection.
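A minimal sketch of this label-set check, assuming cross-entropy as the first loss and an illustrative multiplier k = 2.0 (both assumptions; the scheme only requires k > 1, and the label set below is made up):

```python
import math

FIRST_LABEL_SET = {"dog", "cat", "deer"}  # illustrative detection-image labels
K = 2.0                                   # assumed multiplier, k > 1

def first_loss(probs, label):
    # First loss function: cross-entropy on the true class.
    return -math.log(probs[label])

def classification_image_loss(probs, label):
    # First loss if the label is covered by the detection-image label set,
    # otherwise the larger third loss (k times the first).
    loss = first_loss(probs, label)
    return loss if label in FIRST_LABEL_SET else K * loss
```

A label like "golden retriever", absent from the detection set, thus incurs twice the loss of an equally confident prediction for "dog".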
For example, in some implementations the third loss function is k times the first loss function, where k > 1; that is, the third loss function is obtained by multiplying the formula of the first loss function by a weight coefficient that is a constant greater than 1. In some embodiments k ranges from 1 to 3; in others, from 1 to 1.5; in still others, from 1.5 to 2.
For another example, in some embodiments, the first loss function is m * f and the third loss function is n * f, where f is the base loss function, 0 < m < 1, and n > 1. For instance, with f a cross-entropy loss function, the formula of the first loss function is obtained by multiplying the cross-entropy formula by a positive number smaller than 1, and the formula of the third loss function by a constant larger than 1.
At 206, if the sample image input to the deep neural network is the target detection image, a loss value is calculated based on the first loss function and the second loss function.
When the network is trained by using the target detection image, the loss function in the network is composed of a first loss function and a second loss function, the second loss function is used for calculating the loss value generated when the target detection is carried out on the image, and the first loss function is used for calculating the loss value generated when the image is classified.
In 207, back propagation is performed based on the calculated loss value to update the network parameters until convergence, resulting in an image recognition model that identifies the class of the input image and the location of the class object.
In some embodiments, after the back propagation based on the calculated loss value to update the network parameters until convergence, the method further comprises: acquiring an image to be classified; and performing image recognition on the image to be classified according to the image recognition model, so as to determine the target class corresponding to the image to be classified and the position, within it, of the object belonging to the target class.
In this embodiment, the trained image recognition model is used to recognize the image category: the image to be classified is input into the image recognition model for computation, yielding the category label corresponding to the image to be classified and the position of the corresponding category object in the image.
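The recognition step can be sketched as follows; `model_forward` stands in for the trained image recognition model and is an assumed interface that returns class probabilities together with a bounding box for the detected object.

```python
def recognize(image, model_forward, class_names):
    # Run the trained model on an image to be classified; return the target
    # class and the position (bounding box) of the object of that class.
    probs, box = model_forward(image)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], box
```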
As can be seen from the above, the model training method provided in the embodiment of the present invention jointly trains on classification data and target detection data. When the sample image input into the deep neural network is a classification image whose class label is not included in the class labels of the target detection images, back propagation is performed with a larger loss value, which extends the model's recognition capability to more classes and improves its accuracy on those classes.
The embodiment of the present application further provides a model training device, including:
the image acquisition module is used for acquiring a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
the image input module is used for inputting the sample images in the sample image set into a preset deep neural network for training;
the first calculation module is used for calculating a loss value based on a first loss function if the sample image input into the deep neural network is the classified image;
a second calculation module, configured to calculate a loss value based on the first loss function and a second loss function if the sample image input to the deep neural network is the target detection image;
and the iterative training module is used for performing back propagation based on the calculated loss value to update the network parameters until convergence, so as to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
In some embodiments, the classified images carry second class labels, and the first class labels carried by all the target detection images form a first class label set; the device further comprises:
the label detection module is used for judging whether a second class label corresponding to the input classified image is contained in the first class label set or not if the sample image input into the deep neural network is the classified image;
the first computing module is further to:
if the second class label corresponding to the input classified image is contained in the first class label set, calculating a loss value based on the first loss function;
and if the second class label corresponding to the input classified image is not included in the first class label set, calculating a loss value based on a third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
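The branch handled by the label detection module and the first computing module can be sketched as below, using the m·f and n·f weighting from the earlier embodiments; the default weights and the function name are illustrative assumptions.

```python
def weighted_classification_loss(base_loss, label, first_label_set, m=0.5, n=2.0):
    # If the classification image's second class label is contained in the
    # first class label set (the labels carried by the detection images),
    # apply the first loss function m * f; otherwise apply the larger
    # third loss function n * f.
    if label in first_label_set:
        return m * base_loss
    return n * base_loss
```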
In some embodiments, the third loss function is k × the first loss function, where k > 1.
In some embodiments, the first loss function is m·f and the third loss function is n·f, where f is a base loss function, 0 < m < 1, and n > 1.
In some embodiments, the deep neural network is a convolutional neural network; the second class labels carried by all classified images form a second class label set, and the first class label set contains fewer label types than the second class label set.
In some embodiments, the apparatus further comprises an image classification module to:
acquiring an image to be classified;
and carrying out image recognition on the image to be classified according to the image recognition model so as to determine a target class corresponding to the image to be classified and the position of an object belonging to the target class in the image to be classified.
In one embodiment, a model training apparatus is also provided. Referring to fig. 3, fig. 3 is a schematic structural diagram of a model training apparatus 300 according to an embodiment of the present disclosure. The model training apparatus 300 is applied to an electronic device, and the model training apparatus 300 includes an image obtaining module 301, an image input module 302, a first calculating module 303, a second calculating module 304, and an iterative training module 305, as follows:
the image obtaining module 301 is configured to obtain a sample image set, where the sample image set includes a target detection image and a classification image, and the target detection image carries position information and a first class label;
an image input module 302, configured to input a sample image in the sample image set into a preset deep neural network for training;
a first calculating module 303, configured to calculate a loss value based on a first loss function if the sample image input to the deep neural network is the classified image;
a second calculating module 304, configured to calculate a loss value based on the first loss function and the second loss function if the sample image input to the deep neural network is the target detection image;
and the iterative training module 305 is configured to perform back propagation based on the calculated loss value to update the network parameters until convergence, so as to obtain an image recognition model, where the image recognition model is used to recognize the category of the input image and the position of the category object.
In some embodiments, the classification image carries a second class label, the target detection image carries position information and a first class label, and the first class labels carried by all the target detection images form a first class label set;
the model training apparatus 300 further includes a label detection module, configured to determine whether a second class label corresponding to the input classified image is included in the first class label set if the sample image input to the deep neural network is the classified image;
the first calculation module 303 is further configured to: if the second class label corresponding to the input classified image is contained in the first class label set, calculating a loss value based on the first loss function;
and calculating a loss value based on a third loss function if a second class label corresponding to the input classified image is not included in the first class label set, wherein the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function when the input sample images are the same.
In some embodiments, the third loss function is k × the first loss function, where k > 1.
In some embodiments, the first loss function is m·f and the third loss function is n·f, where f is a base loss function, 0 < m < 1, and n > 1.
In some embodiments, the deep neural network is a convolutional neural network; the second class labels carried by all classified images form a second class label set, and the first class label set contains fewer label types than the second class label set.
In some embodiments, the model training apparatus 300 further comprises an image classification module for: acquiring an image to be classified; and carrying out image recognition on the image to be classified according to the image recognition model so as to determine a target class corresponding to the image to be classified and the position of an object belonging to the target class in the image to be classified.
In specific implementations, the above modules may be implemented as independent entities or combined arbitrarily into one or several entities; for the specific implementation of each module, refer to the foregoing method embodiments, which are not repeated here.
It should be noted that the model training apparatus provided in the embodiment of the present application and the model training method in the above embodiment belong to the same concept, and any method provided in the embodiment of the model training method may be run on the model training apparatus, and the specific implementation process thereof is described in detail in the embodiment of the model training method, and is not described herein again.
As can be seen from the above, the model training apparatus provided in this embodiment of the application obtains, when training a deep neural network, a sample image set including target detection images and classification images, and trains a preset deep neural network with the sample images in this set. During training, when the sample image input into the deep neural network is a classification image, a loss value is calculated based on a first loss function; when it is a target detection image, a loss value is calculated based on the first loss function and a second loss function; and back propagation is performed based on the loss value to update the network parameters until convergence. In this scheme, the preset deep neural network is trained on a combination of target detection images and classification images. Since a target detection image carries position information together with its first class label, and the position information indicates the specific position of the class object in the image, the network can extract the features of class objects more accurately during training, which improves the accuracy of the resulting image recognition model in image class prediction.
The embodiment of the application further provides an electronic device, and the electronic device can be a mobile terminal such as a tablet computer or a smart phone. Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 800 may include a camera module 801, a memory 802, a processor 803, a touch display 804, a speaker 805, a microphone 806, and the like.
The camera module 801 may include model training circuitry, which may be implemented using hardware and/or software components and may include various processing units that define an Image Signal Processing (ISP) pipeline. The model training circuit may include at least: a camera, an Image Signal Processor (ISP processor), control logic, an image memory, and a display. The camera may comprise one or more lenses and an image sensor. The image sensor may include a color filter array (e.g., a Bayer filter). The image sensor may acquire the light intensity and wavelength information captured by each of its imaging pixels and provide a set of raw image data that can be processed by the image signal processor.
The image signal processor may process the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the image signal processor may perform one or more model training operations on the raw image data, gathering statistical information about the image data. Wherein the model training operations may be performed with the same or different bit depth precision. The raw image data can be stored in an image memory after being processed by an image signal processor. The image signal processor may also receive image data from an image memory.
The image Memory may be part of a Memory device, a storage device, or a separate dedicated Memory within the electronic device, and may include a DMA (Direct Memory Access) feature.
When image data is received from the image memory, the image signal processor may perform one or more model training operations, such as temporal filtering. The processed image data may be sent to an image memory for additional processing before being displayed. The image signal processor may also receive processed data from the image memory and perform image data processing on the processed data in the raw domain and in the RGB and YCbCr color spaces. The processed image data may be output to a display for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of the image signal processor may also be sent to an image memory, and the display may read image data from the image memory. In one embodiment, the image memory may be configured to implement one or more frame buffers.
The statistical data determined by the image signal processor may be sent to the control logic. For example, the statistical data may include statistical information of the image sensor such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation, lens shading correction, and the like.
The control logic may include a processor and/or microcontroller that executes one or more routines (e.g., firmware). One or more routines may determine camera control parameters and ISP control parameters based on the received statistics. For example, the control parameters of the camera may include camera flash control parameters, control parameters of the lens (e.g., focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (e.g., during RGB processing), etc.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a model training circuit in the present embodiment. For ease of illustration, only aspects of model training techniques related to embodiments of the present invention are shown.
For example, the model training circuit may include: a camera, an image signal processor, control logic, an image memory, and a display. The camera may include one or more lenses and an image sensor. In some embodiments, the camera may be either a telephoto camera or a wide-angle camera.
The image collected by the camera is transmitted to the image signal processor for processing. After the image signal processor processes the image, statistical data of the image (such as its brightness, contrast value, and color) may be sent to the control logic. The control logic can determine control parameters of the camera from the statistical data, so that the camera can perform operations such as auto focus and auto exposure according to the control parameters. The image can be stored in the image memory after being processed by the image signal processor, and the image signal processor may also read images stored in the image memory for further processing. In addition, the processed image can be sent directly to the display for display, and the display may also read images from the image memory.
In addition, although not shown in the figure, the electronic device may further include a CPU and a power supply module. The CPU is connected with the control logic, the image signal processor, the image memory, and the display, and implements global control. The power supply module supplies power to each module.
The memory 802 stores applications containing executable code. The application programs may constitute various functional modules. The processor 803 executes various functional applications and data processing by running the application programs stored in the memory 802.
The processor 803 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing an application program stored in the memory 802 and calling data stored in the memory 802, thereby integrally monitoring the electronic device.
The touch display screen 804 may be used to receive user touch control operations for the electronic device. Speaker 805 may play sound signals. The microphone 806 may be used to pick up sound signals.
In this embodiment, the processor 803 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 803 runs the application programs stored in the memory 802, so as to execute:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classified image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating a loss value based on the first loss function and the second loss function;
and performing back propagation on the basis of the calculated loss value to update the network parameters until convergence to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
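Taken together, the steps above amount to one training iteration per sample image. The sketch below assumes hypothetical `forward`/`backward` callables and loss-function interfaces, since the disclosure does not fix a concrete network API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sample:
    image: List[float]        # pixel data (placeholder representation)
    is_detection: bool        # True for a target detection image
    label: int = 0
    box: List[float] = field(default_factory=list)

def train_step(sample, forward, backward, first_loss, second_loss):
    # One iteration of the training scheme above (hypothetical interfaces).
    output = forward(sample.image)
    if sample.is_detection:
        # Target detection image: loss = first loss + second loss.
        loss = first_loss(output, sample) + second_loss(output, sample)
    else:
        # Classification image: loss = first loss only.
        loss = first_loss(output, sample)
    backward(loss)  # back propagation to update the network parameters
    return loss
```

Iterating `train_step` over the sample image set until the loss converges yields the image recognition model described above.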
In some embodiments, the classification image carries a second class label, the target detection image carries position information and a first class label, and the first class labels carried by all the target detection images form a first class label set; the processor 803 also performs:
if the sample image input into the deep neural network is the classified image, judging whether a second class label corresponding to the input classified image is contained in the first class label set;
if yes, calculating a loss value based on the first loss function;
and if not, calculating a loss value based on a third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
In some embodiments, the processor 803 also performs:
acquiring an image to be classified; and carrying out image recognition on the image to be classified according to the image recognition model so as to determine a target class corresponding to the image to be classified and the position of an object belonging to the target class in the image to be classified.
From the above, an embodiment of the present application provides an electronic device. When training a deep neural network, the electronic device obtains a sample image set including target detection images and classification images and trains a preset deep neural network with the sample images in this set. When the sample image input into the deep neural network is a classification image, a loss value is calculated based on a first loss function; when it is a target detection image, a loss value is calculated based on the first loss function and a second loss function; and back propagation is performed based on the loss value to update the network parameters until convergence. In this scheme, the preset deep neural network is trained on a combination of target detection images and classification images. Since a target detection image carries position information together with its first class label, and the position information indicates the specific position of the class object in the image, the network can extract the features of class objects more accurately during training, which improves the accuracy of the resulting image recognition model in image class prediction.
An embodiment of the present application further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer executes the model training method according to any of the above embodiments.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by related hardware instructed by a computer program, and the program may be stored in a computer-readable storage medium, which may include, but is not limited to: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and the like.
Furthermore, the terms "first", "second", and "third", etc. in this application are used to distinguish different objects, and are not used to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
The model training method, the model training device, the storage medium, and the electronic device provided in the embodiments of the present application are described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (20)
- A method of model training, comprising:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classified image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating a loss value based on the first loss function and a second loss function;
and performing back propagation on the basis of the calculated loss value to update the network parameters until convergence to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
- The model training method of claim 1, wherein the classification images carry second class labels, the object detection images carry position information and first class labels, and the first class labels carried by all the object detection images form a first class label set;
before the calculating the loss value based on the first loss function, the method further includes:
if the sample image input into the deep neural network is the classified image, judging whether a second class label corresponding to the input classified image is contained in the first class label set;
if yes, calculating a loss value based on the first loss function;
and if not, calculating a loss value based on a third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
- The model training method of claim 2, wherein the third loss function is k × the first loss function, where k > 1.
- The model training method of claim 2, wherein the first loss function is m·f and the third loss function is n·f, wherein f is a base loss function, 0 < m < 1, and n > 1.
- The model training method of claim 2, wherein the deep neural network is a convolutional neural network; and the second class labels carried by all classified images form a second class label set, wherein the first class label set contains fewer label types than the second class label set.
- The model training method of claim 1, wherein after the back propagation is performed based on the calculated loss value to update the network parameters until convergence, the method further comprises:
acquiring an image to be classified;
and carrying out image recognition on the image to be classified according to the image recognition model so as to determine a target class corresponding to the image to be classified and the position of an object belonging to the target class in the image to be classified.
- A model training apparatus, comprising:
the image acquisition module is used for acquiring a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
the image input module is used for inputting the sample images in the sample image set into a preset deep neural network for training;
the first calculation module is used for calculating a loss value based on a first loss function if the sample image input into the deep neural network is the classified image;
a second calculation module, configured to calculate a loss value based on the first loss function and a second loss function if the sample image input to the deep neural network is the target detection image;
and the iterative training module is used for performing back propagation based on the calculated loss value to update the network parameters until convergence, so as to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
- The model training apparatus of claim 7, wherein the classification images carry second class labels, and the first class labels carried by all the target detection images form a first class label set; the apparatus further comprises:
the label detection module is used for judging whether a second class label corresponding to the input classified image is contained in the first class label set if the sample image input into the deep neural network is the classified image;
the first computing module is further configured to:
if the second class label corresponding to the input classified image is contained in the first class label set, calculate a loss value based on the first loss function;
and if the second class label corresponding to the input classified image is not included in the first class label set, calculate a loss value based on a third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
- The model training apparatus of claim 8, wherein the third loss function is k × the first loss function, where k > 1.
- The model training apparatus of claim 8, wherein the first loss function is m·f and the third loss function is n·f, wherein f is a base loss function, 0 < m < 1, and n > 1.
- The model training apparatus of claim 8, wherein the deep neural network is a convolutional neural network; and the second class labels carried by all classified images form a second class label set, wherein the first class label set contains fewer label types than the second class label set.
- The model training apparatus of claim 7, wherein the apparatus further comprises an image classification module to:
acquire an image to be classified;
and carry out image recognition on the image to be classified according to the image recognition model so as to determine a target class corresponding to the image to be classified and the position of an object belonging to the target class in the image to be classified.
- A storage medium having a computer program stored thereon, which, when run on a computer, causes the computer to perform:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classified image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating a loss value based on the first loss function and a second loss function;
and performing back propagation on the basis of the calculated loss value to update the network parameters until convergence to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
- The storage medium of claim 13, wherein the classified images carry second category labels, the object detection images carry location information and first category labels, and the first category labels carried by all the object detection images form a first category label set;
the computer program, when executed on a computer, may further cause the computer to perform:
if the sample image input into the deep neural network is the classified image, judging whether a second class label corresponding to the input classified image is contained in the first class label set;
if yes, calculating a loss value based on the first loss function;
and if not, calculating a loss value based on a third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
- An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the processor, by invoking the computer program, is configured to perform:
obtaining a sample image set, wherein the sample image set comprises a target detection image and a classification image, and the target detection image carries position information and a first class label;
inputting the sample images in the sample image set into a preset deep neural network for training;
if the sample image input into the deep neural network is the classified image, calculating a loss value based on a first loss function;
if the sample image input into the deep neural network is the target detection image, calculating a loss value based on the first loss function and a second loss function;
and performing back propagation on the basis of the calculated loss value to update the network parameters until convergence to obtain an image recognition model, wherein the image recognition model is used for recognizing the category of the input image and the position of the category object.
- The electronic device of claim 15, wherein the classified images carry second category labels, the object detection images carry location information and first category labels, and the first category labels carried by all the object detection images form a first category label set; the processor may further be configured to, by invoking the computer program, perform:
if the sample image input into the deep neural network is the classified image, judging whether a second class label corresponding to the input classified image is contained in the first class label set;
if yes, calculating a loss value based on the first loss function;
and if not, calculating a loss value based on a third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
- The electronic device of claim 16, wherein the third loss function equals k × the first loss function, where k > 1.
- The electronic device of claim 16, wherein the first loss function is m · f and the third loss function is n · f, wherein f is a base loss function, 0 < m < 1, and n > 1.
- The electronic device of claim 16, wherein the deep neural network is a convolutional neural network; the second class labels carried by all the classified images form a second class label set, and the first class label set contains fewer label types than the second class label set.
- The electronic device of claim 15, wherein the processor, by invoking the computer program, is further configured to perform: acquiring an image to be classified; and performing image recognition on the image to be classified according to the image recognition model, so as to determine the target class corresponding to the image to be classified and the position, in the image to be classified, of the object belonging to the target class.
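The loss scheme recited in claims 15 through 18 can be sketched as follows. This is a minimal illustration under assumed names, not the patent's implementation: `base_loss`, `second_loss`, `m`, and `n` are placeholders (in practice the base loss might be a cross-entropy term and the second loss a bounding-box regression term).

```python
def sample_loss(sample, base_loss, second_loss, first_label_set, m=0.5, n=2.0):
    """Compute the per-sample training loss, per the claimed scheme.

    - Target detection image: first loss (m * base) plus the second
      (localization) loss.
    - Classified image whose label is in the detection label set:
      first loss only (m * base, with 0 < m < 1).
    - Classified image with a label outside that set: third loss
      (n * base, with n > 1), which for the same sample always
      exceeds the first loss, as claim 16 requires.
    """
    assert 0 < m < 1 < n  # enforce the claimed weight ordering
    if sample["is_detection"]:
        return m * base_loss(sample) + second_loss(sample)
    if sample["label"] in first_label_set:
        return m * base_loss(sample)
    return n * base_loss(sample)
```

With a dummy base loss of 1.0, a detection sample contributes m plus the second loss, an in-set classified sample contributes m, and an out-of-set classified sample contributes n, so the claimed ordering (first loss < third loss) holds by construction.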
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/116710 WO2021087985A1 (en) | 2019-11-08 | 2019-11-08 | Model training method and apparatus, storage medium, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114424253A true CN114424253A (en) | 2022-04-29 |
Family
ID=75849227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201980100619.0A Pending CN114424253A (en) | 2019-11-08 | 2019-11-08 | Model training method and device, storage medium and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114424253A (en) |
WO (1) | WO2021087985A1 (en) |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298156A (en) * | 2021-05-28 | 2021-08-24 | 有米科技股份有限公司 | Neural network training method and device for image gender classification |
CN113516053B (en) * | 2021-05-28 | 2024-05-14 | 西安空间无线电技术研究所 | Ship target refinement detection method with rotation invariance |
CN113282927B (en) * | 2021-05-31 | 2024-02-02 | 平安国际智慧城市科技股份有限公司 | Malicious code detection method, device, equipment and computer readable storage medium |
CN113221837B (en) * | 2021-06-01 | 2024-06-07 | 北京金山云网络技术有限公司 | Object segmentation method, training method and device of object segmentation model |
CN113837216B (en) * | 2021-06-01 | 2024-05-10 | 腾讯科技(深圳)有限公司 | Data classification method, training device, medium and electronic equipment |
CN113364792B (en) * | 2021-06-11 | 2022-07-12 | 奇安信科技集团股份有限公司 | Training method of flow detection model, flow detection method, device and equipment |
CN113537286A (en) * | 2021-06-11 | 2021-10-22 | 浙江智慧视频安防创新中心有限公司 | Image classification method, device, equipment and medium |
CN113505820B (en) * | 2021-06-23 | 2024-02-06 | 北京阅视智能技术有限责任公司 | Image recognition model training method, device, equipment and medium |
CN113496256B (en) * | 2021-06-24 | 2024-04-09 | 中汽创智科技有限公司 | Image annotation model training method, annotation method, device, equipment and medium |
CN113378833B (en) * | 2021-06-25 | 2023-09-01 | 北京百度网讯科技有限公司 | Image recognition model training method, image recognition device and electronic equipment |
CN113591918B (en) * | 2021-06-29 | 2024-02-06 | 北京百度网讯科技有限公司 | Training method of image processing model, image processing method, device and equipment |
CN113505800A (en) * | 2021-06-30 | 2021-10-15 | 深圳市慧鲤科技有限公司 | Image processing method and training method, device, equipment and medium of model thereof |
CN113408662A (en) * | 2021-07-19 | 2021-09-17 | 北京百度网讯科技有限公司 | Image recognition method and device, and training method and device of image recognition model |
CN113836338B (en) * | 2021-07-21 | 2024-05-24 | 北京邮电大学 | Fine granularity image classification method, device, storage medium and terminal |
CN113780101A (en) * | 2021-08-20 | 2021-12-10 | 京东鲲鹏(江苏)科技有限公司 | Obstacle avoidance model training method and device, electronic equipment and storage medium |
CN113657523A (en) * | 2021-08-23 | 2021-11-16 | 科大讯飞股份有限公司 | Image target classification method, device, equipment and storage medium |
CN113449704B (en) * | 2021-08-31 | 2022-03-25 | 北京的卢深视科技有限公司 | Face recognition model training method and device, electronic equipment and storage medium |
CN113962383A (en) * | 2021-10-15 | 2022-01-21 | 北京百度网讯科技有限公司 | Model training method, target tracking method, device, equipment and storage medium |
CN113947701B (en) * | 2021-10-18 | 2024-02-23 | 北京百度网讯科技有限公司 | Training method, object recognition method, device, electronic equipment and storage medium |
CN113962965B (en) * | 2021-10-26 | 2023-06-09 | 腾讯科技(深圳)有限公司 | Image quality evaluation method, device, equipment and storage medium |
CN113963148B (en) * | 2021-10-29 | 2023-08-08 | 北京百度网讯科技有限公司 | Object detection method, object detection model training method and device |
CN113780480B (en) * | 2021-11-11 | 2022-02-22 | 深圳佑驾创新科技有限公司 | Method for constructing multi-target detection and category identification model based on YOLOv5 |
CN114972725B (en) * | 2021-12-30 | 2023-05-23 | 华为技术有限公司 | Model training method, readable medium and electronic device |
CN114332547B (en) * | 2022-03-17 | 2022-07-08 | 浙江太美医疗科技股份有限公司 | Medical object classification method and apparatus, electronic device, and storage medium |
CN114549938B (en) * | 2022-04-25 | 2022-09-09 | 广州市玄武无线科技股份有限公司 | Model training method, image information management method, image recognition method and device |
WO2023216251A1 (en) * | 2022-05-13 | 2023-11-16 | 华为技术有限公司 | Map generation method, model training method, readable medium, and electronic device |
CN115270848B (en) * | 2022-06-17 | 2023-09-29 | 合肥心之声健康科技有限公司 | PPG and ECG automatic conversion intelligent algorithm, storage medium and computer system |
CN115294396B (en) * | 2022-08-12 | 2024-04-23 | 北京百度网讯科技有限公司 | Backbone network training method and image classification method |
CN115529159B (en) * | 2022-08-16 | 2024-03-08 | 中国电信股份有限公司 | Training method, device, equipment and storage medium of encryption traffic detection model |
CN115331062B (en) * | 2022-08-29 | 2023-08-08 | 北京达佳互联信息技术有限公司 | Image recognition method, image recognition device, electronic device and computer-readable storage medium |
CN115601618B (en) * | 2022-11-29 | 2023-03-10 | 浙江华是科技股份有限公司 | Magnetic core defect detection method and system and computer storage medium |
CN115793490B (en) * | 2023-02-06 | 2023-04-11 | 南通弈匠智能科技有限公司 | Intelligent household energy-saving control method based on big data |
CN116663650B (en) * | 2023-06-06 | 2023-12-19 | 北京百度网讯科技有限公司 | Training method of deep learning model, target object detection method and device |
CN116468973B (en) * | 2023-06-09 | 2023-10-10 | 深圳比特微电子科技有限公司 | Training method and device for target detection model of low-illumination image |
CN117282687B (en) * | 2023-10-18 | 2024-05-28 | 广州市普理司科技有限公司 | Automatic mark picking and supplementing control system for visual inspection of printed matter |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107134144B (en) * | 2017-04-27 | 2019-07-12 | 武汉理工大学 | A kind of vehicle checking method for traffic monitoring |
CN110490177A (en) * | 2017-06-02 | 2019-11-22 | 腾讯科技(深圳)有限公司 | A kind of human-face detector training method and device |
CN109522967A (en) * | 2018-11-28 | 2019-03-26 | 广州逗号智能零售有限公司 | A kind of commodity attribute recognition methods, device, equipment and storage medium |
CN110189317A (en) * | 2019-05-30 | 2019-08-30 | 上海卡罗网络科技有限公司 | A kind of road image intelligent acquisition and recognition methods based on deep learning |
CN110298266B (en) * | 2019-06-10 | 2023-06-06 | 天津大学 | Deep neural network target detection method based on multiscale receptive field feature fusion |
CN110349147B (en) * | 2019-07-11 | 2024-02-02 | 腾讯医疗健康(深圳)有限公司 | Model training method, fundus macular region lesion recognition method, device and equipment |
- 2019-11-08 WO PCT/CN2019/116710 patent/WO2021087985A1/en — active, Application Filing
- 2019-11-08 CN CN201980100619.0A patent/CN114424253A/en — active, Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114821207A (en) * | 2022-06-30 | 2022-07-29 | 浙江凤凰云睿科技有限公司 | Image classification method and device, storage medium and terminal |
CN114821207B (en) * | 2022-06-30 | 2022-11-04 | 浙江凤凰云睿科技有限公司 | Image classification method and device, storage medium and terminal |
CN115439699A (en) * | 2022-10-25 | 2022-12-06 | 北京鹰瞳科技发展股份有限公司 | Training method of target detection model, target detection method and related product |
CN116486134A (en) * | 2023-03-02 | 2023-07-25 | 哈尔滨市科佳通用机电股份有限公司 | Train brake hose hook falling-out fault detection method based on deep neural network |
CN116935102A (en) * | 2023-06-30 | 2023-10-24 | 上海蜜度信息技术有限公司 | Lightweight model training method, device, equipment and medium |
CN116935102B (en) * | 2023-06-30 | 2024-02-20 | 上海蜜度科技股份有限公司 | Lightweight model training method, device, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
WO2021087985A1 (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114424253A (en) | Model training method and device, storage medium and electronic equipment | |
US11457138B2 (en) | Method and device for image processing, method for training object detection model | |
WO2020192483A1 (en) | Image display method and device | |
WO2019134504A1 (en) | Method and device for blurring image background, storage medium, and electronic apparatus | |
CN111178183B (en) | Face detection method and related device | |
CN107316035A (en) | Object identification method and device based on deep learning neural network | |
CN111771226A (en) | Electronic device, image processing method thereof, and computer-readable recording medium | |
CN108229673B (en) | Convolutional neural network processing method and device and electronic equipment | |
CN110519485B (en) | Image processing method, image processing device, storage medium and electronic equipment | |
WO2023160312A1 (en) | Person re-identification method and apparatus based on self-supervised learning, and device and storage medium | |
CN110572636B (en) | Camera contamination detection method and device, storage medium and electronic equipment | |
CN110728705B (en) | Image processing method, image processing device, storage medium and electronic equipment | |
CN110866872B (en) | Pavement crack image preprocessing intelligent selection method and device and electronic equipment | |
CN112602319B (en) | Focusing device, method and related equipment | |
CN112418327A (en) | Training method and device of image classification model, electronic equipment and storage medium | |
CN115100732A (en) | Fishing detection method and device, computer equipment and storage medium | |
WO2021134485A1 (en) | Method and device for scoring video, storage medium and electronic device | |
CN112906810B (en) | Target detection method, electronic device, and storage medium | |
CN111597937B (en) | Fish gesture recognition method, device, equipment and storage medium | |
CN113837257A (en) | Target detection method and device | |
CN111611835A (en) | Ship detection method and device | |
CN116977256A (en) | Training method, device, equipment and storage medium for defect detection model | |
CN113223083B (en) | Position determining method and device, electronic equipment and storage medium | |
WO2021179819A1 (en) | Photo processing method and apparatus, and storage medium and electronic device | |
CN113298122A (en) | Target detection method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||