WO2021087985A1 - Model training method, apparatus, storage medium and electronic device - Google Patents

Model training method, apparatus, storage medium and electronic device

Info

Publication number
WO2021087985A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
category
loss function
input
neural network
Prior art date
Application number
PCT/CN2019/116710
Other languages
English (en)
French (fr)
Inventor
高洪涛
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to PCT/CN2019/116710 priority Critical patent/WO2021087985A1/zh
Priority to CN201980100619.0A priority patent/CN114424253A/zh
Publication of WO2021087985A1 publication Critical patent/WO2021087985A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • This application relates to the field of image processing technology, in particular to a model training method, device, storage medium and electronic equipment.
  • Image processing is a technique that uses a computer to analyze images to achieve the desired results.
  • Image category prediction has become an important research topic.
  • With the research on neural network models, predicting the image category through a model to obtain the predicted category of the image has gradually become widely recognized. It can be seen that improving the accuracy of subsequent image category prediction through model training is particularly important.
  • the embodiments of the present application provide a model training method, device, storage medium, and electronic equipment, which can improve the accuracy of image category prediction by a deep neural network.
  • an embodiment of the present application provides a model training method, including:
  • acquiring a sample image set, where the sample image set contains a target detection image and a classification image, wherein the target detection image carries location information and a first category label;
  • if the sample image input to the deep neural network is the target detection image, calculating a loss value based on the first loss function and the second loss function;
  • an embodiment of the present application provides a model training device, including:
  • An image acquisition module for acquiring a sample image set, the sample image set contains a target detection image and a classification image, wherein the target detection image carries location information and a first category label;
  • An image input module configured to input sample images in the sample image set into a preset deep neural network for training
  • the first calculation module is configured to calculate a loss value based on a first loss function if the sample image input to the deep neural network is the classified image;
  • a second calculation module configured to calculate a loss value based on the first loss function and the second loss function if the sample image input to the deep neural network is the target detection image;
  • an iterative training module, used to perform back propagation based on the calculated loss value to update the network parameters until convergence, so as to obtain an image recognition model, which is used to recognize the category of the input image and the location of the category object.
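The conditional loss selection performed by the two calculation modules above can be sketched as follows. This is an illustrative pure-Python sketch: the function names, the sample representation, and the use of a smooth-L1 localization term as the second loss are assumptions for illustration, since the application does not fix concrete loss formulas here.

```python
import math

def cross_entropy(pred_probs, true_idx):
    """First (classification) loss: negative log-probability of the true class."""
    return -math.log(pred_probs[true_idx])

def smooth_l1(pred_box, true_box):
    """Illustrative second (localization) loss for the detection branch
    (an assumption; the application does not name a specific second loss)."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def training_loss(sample, pred_probs, pred_box=None):
    """Classification images use only the first loss; target detection
    images add the second loss computed from the carried location info."""
    loss = cross_entropy(pred_probs, sample["label"])
    if sample["kind"] == "detection":
        loss += smooth_l1(pred_box, sample["box"])
    return loss
```

A perfect box prediction contributes zero localization loss, so a detection sample then costs the same as a classification sample with identical class probabilities.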
  • an embodiment of the present application provides a storage medium on which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute:
  • acquiring a sample image set, where the sample image set contains a target detection image and a classification image, wherein the target detection image carries location information and a first category label;
  • if the sample image input to the deep neural network is the target detection image, calculating a loss value based on the first loss function and the second loss function;
  • Backpropagation is performed based on the calculated loss value to update the network parameters until convergence, and an image recognition model is obtained.
  • the image recognition model is used to recognize the category of the input image and the location of the category object.
  • an embodiment of the present application provides an electronic device, including a processor and a memory, the memory has a computer program, and the processor is configured to execute:
  • acquiring a sample image set, where the sample image set contains a target detection image and a classification image, wherein the target detection image carries location information and a first category label;
  • if the sample image input to the deep neural network is the target detection image, calculating a loss value based on the first loss function and the second loss function;
  • Backpropagation is performed based on the calculated loss value to update the network parameters until convergence, and an image recognition model is obtained.
  • the image recognition model is used to recognize the category of the input image and the location of the category object.
  • the solution provided by the embodiment of this application obtains a sample image set containing target detection images and classification images when training a deep neural network, and uses the sample images in the sample image set to train a preset deep neural network.
  • the sample image input to the deep neural network is a classification image
  • the loss value is calculated based on the first loss function
  • the sample image input to the deep neural network is the target detection image
  • the loss value is calculated based on the first loss function and the second loss function
  • the target detection image and the classification image are combined to train the preset deep neural network.
  • the location information indicates the specific location of the category object in the image, so that in the process of training the network, the network can more accurately extract the characteristics of the category object, which improves the accuracy of image category prediction by the image recognition model obtained through training.
  • FIG. 1 is a schematic diagram of the first flow of a model training method provided by an embodiment of the application.
  • FIG. 2 is a schematic diagram of the second flow of the model training method provided by an embodiment of the application.
  • Fig. 3 is a schematic structural diagram of a model training device provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • FIG. 5 is a schematic structural diagram of a model training circuit of an electronic device provided by an embodiment of the application.
  • the embodiment of the present application provides a model training method, including:
  • acquiring a sample image set, where the sample image set contains a target detection image and a classification image, wherein the target detection image carries location information and a first category label;
  • if the sample image input to the deep neural network is the target detection image, calculating a loss value based on the first loss function and the second loss function;
  • Backpropagation is performed based on the calculated loss value to update the network parameters until convergence, and an image recognition model is obtained.
  • the image recognition model is used to recognize the category of the input image and the location of the category object.
  • the classification image carries a second category label
  • the target detection image carries location information and a first category label
  • the first category labels carried by all target detection images constitute a first category label set
  • the method further includes:
  • the sample image input to the deep neural network is the classification image, determining whether the second category label corresponding to the input classification image is included in the first category label set;
  • the loss value is calculated based on the third loss function, where when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  • the third loss function = k * first loss function, where k > 1.
  • the first loss function is m*f
  • the third loss function is n*f, where f is the basic loss function, 0 < m < 1, n > 1.
  • the deep neural network is a convolutional neural network; the second category labels carried by all classified images constitute a second category label set, and the label types in the first category label set are fewer than the label types in the second category label set.
  • the performing back propagation based on the calculated loss value to update the network parameters until convergence further includes:
  • the embodiment of the application provides a model training method.
  • the execution subject of the model training method may be the model training device provided in the embodiment of the application, or an electronic device integrated with the model training device, wherein the model training device may be implemented in hardware or software.
  • the electronic device may be a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer and other devices.
  • FIG. 1 is a schematic diagram of the first process of the model training method provided by an embodiment of this application.
  • the specific process of the model training method provided in the embodiment of the application may be as follows:
  • a sample image set is acquired, and the sample image set includes a target detection image and a classification image, where the target detection image carries position information and a first category label.
  • Multi-classification of images based on target detection belongs to strong supervision, and the location information of each category in the image needs to be provided.
  • labeling location information incurs a huge labor cost.
  • General image multi-classification is a weakly supervised image classification method. This classification method only needs to label the category name of the image, but this classification method cannot identify the position of the category object in the image.
  • the model training solution of the embodiment of the present application can be applied to an image classification and positioning model.
  • the model can not only identify the category of the image, but also identify the position of the category object in the image. For example, the location of the category object can be marked by the target frame.
  • the model can be constructed based on a deep neural network, for example, a BP (back propagation) neural network, a convolutional neural network, and so on.
  • This application uses a mixture of two training samples to form a sample image set, where the two sample images include a target detection image and a classification image.
  • the target detection image carries a category label and also has location information; the location information indicates the position of the category object in the image.
  • the classified image carries a category label.
  • the category label carried by the target detection image is recorded as the first category label
  • the category label carried by the classification image is recorded as the second category label.
  • the first category labels carried by all target detection images constitute a first category label set; the second category labels carried by all classified images constitute a second category label set.
  • the tag categories in the second category tag set may partially overlap with the category tags in the first category tag set.
  • the sample images in the sample image set are input into a preset deep neural network for training.
  • Two kinds of training samples are mixed to form the sample image set used to train the model, which is essentially joint training of a strongly supervised detection algorithm and direct classification.
  • sample pictures in the sample image set mixed with the target detection image and the classification image will be randomly input into the preset neural network for calculation.
  • different loss functions are used to calculate the loss value.
  • the loss value is calculated based on the first loss function.
  • the loss value is calculated based on the first loss function and the second loss function.
  • When the input sample image is a classification image, the loss function of the network is composed of the first loss function alone, and the first loss function is used to calculate the loss value generated during image classification. Since the training data carries no target frame in this case, when the error information is backpropagated, only the network parameters related to the classification part are updated, and the network parameters related to the target detection part are not updated. When the input sample image is a target detection image, the training data carries a target frame, so when the error information is backpropagated, both the network parameters related to the classification part and the network parameters related to the target detection part are updated, that is, all network parameters are updated.
  • the loss function in the network consists of the first loss function and the second loss function.
  • the second loss function is used to calculate the loss value generated when the image is detected by the target, and the first loss function is used for Calculate the loss value when classifying the image.
  • For a classification image, the loss value of the target detection part, L_p, is 0.
  • the loss function can be selected according to the deep neural network used.
  • a mean square error function or a cross entropy function can be used as the loss function.
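The two candidate loss functions mentioned above can be written out directly. This is a minimal pure-Python sketch; the function names are assumed.

```python
import math

def mean_square_error(pred, target):
    """Mean square error: average squared difference between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(pred_probs, target_onehot):
    """Cross-entropy: negative log-likelihood of the predicted distribution
    against a one-hot target; terms with zero target weight are skipped."""
    return -sum(t * math.log(p) for p, t in zip(pred_probs, target_onehot) if t > 0)
```

Cross-entropy is the usual choice for classification outputs, while mean square error also suits regression-style targets such as box coordinates.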
  • back-propagation is performed based on the calculated loss value to update the network parameters until convergence, and an image recognition model is obtained.
  • the image recognition model is used to identify the category of the input image and the location of the category object.
  • the loss value is calculated based on the above loss functions and calculation method, and back propagation is performed based on the calculated loss value to update the network parameters until the network converges, for example, until the number of training iterations reaches a preset value, until the loss value reaches a minimum, or until the loss value is less than a preset value.
  • After convergence, the network parameters are determined, and the deep neural network with the determined parameters is used as the image recognition model.
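The stopping criteria described above (iteration budget reached, or loss below a preset value) can be sketched as a driver loop; `step` and all other names here are assumptions for illustration.

```python
def train_until_convergence(step, max_iters=1000, loss_threshold=1e-2):
    """Run training steps until one of the stopping criteria from the text
    is met. `step(i)` is assumed to perform one parameter update and
    return the resulting loss value."""
    loss = float("inf")
    for i in range(max_iters):
        loss = step(i)
        if loss < loss_threshold:   # loss fell below the preset value
            return i, loss
    return max_iters, loss          # iteration count reached the preset value
```

A "loss reaches a minimum" criterion could be added as an early-stopping check on a patience counter; it is omitted here to keep the sketch minimal.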
  • Because the target detection image carries location information that indicates the specific position of the category object in the image, the network can more accurately extract the characteristics of the category object during training.
  • Even when the sample image input to the network is a classification image, the network's enhanced ability to recognize the characteristics of the category object allows it to more accurately identify those characteristics in the classified image and to determine the location of the category object with high accuracy.
  • the category object in this application refers to the object corresponding to the category label corresponding to the sample image.
  • Taking a convolutional neural network as the preset deep neural network and the cross-entropy function as the loss function: input the training data, calculate the loss value according to the loss function, and backpropagate based on the loss value to optimize the weight of each convolution kernel in the convolutional layers of the network.
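The backpropagation step described above can be illustrated on the simplest possible case, a linear classifier trained with cross-entropy; the gradient dL/dw[k][j] = (p_k - 1[k == true]) * x[j] follows from the softmax cross-entropy derivative. All names are assumed; a real convolutional network would update its convolution-kernel weights by the same rule applied through more layers.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sgd_step(weights, x, true_idx, lr=0.5):
    """One cross-entropy backprop update for a linear classifier.
    Mutates `weights` in place and returns the loss measured
    before the update."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    for k, row in enumerate(weights):
        err = probs[k] - (1.0 if k == true_idx else 0.0)
        for j in range(len(row)):
            row[j] -= lr * err * x[j]
    return -math.log(probs[true_idx])
```

Repeated calls on the same sample should show the loss shrinking as the weights move toward the true class.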
  • this application is not limited by the described order of execution of the various steps, and certain steps may also be performed in another order or simultaneously if there is no conflict.
  • the model training method proposed in the embodiment of this application, when training a deep neural network, obtains a sample image set containing target detection images and classification images, and uses the sample images in the sample image set to train a preset deep neural network.
  • the loss value is calculated based on the first loss function.
  • If the sample image input to the deep neural network is the target detection image, the loss value is calculated based on the first loss function and the second loss function, and back propagation is performed based on the loss value to update the network parameters until convergence.
  • In this way, the target detection image and the classification image are combined to train the preset deep neural network, because the target detection image carries location information and the first category label.
  • the location information indicates the specific location of the category object in the image, so that in the process of training the network, the network can more accurately extract the characteristics of the category object, improving the accuracy of image category prediction by the image recognition model obtained through training.
  • FIG. 2 is a schematic diagram of a second process of a model training method provided by an embodiment of the present application.
  • the method includes:
  • a sample image set is obtained.
  • the sample image set contains target detection images and classification images.
  • the target detection images carry position information and first category labels.
  • the first category labels carried by all target detection images constitute the first category label set.
  • This embodiment uses a mixture of two training samples to form a sample image set, where the two sample images include a target detection image and a classification image.
  • the target detection image carries a category label and also has location information; the location information indicates the position of the category object in the image.
  • the classified image carries a category label.
  • the category label carried by the target detection image is recorded as the first category label
  • the category label carried by the classification image is recorded as the second category label.
  • the first category labels carried by all target detection images constitute a first category label set; the second category labels carried by all classified images constitute a second category label set.
  • the tag categories in the second category tag set may partially overlap with the category tags in the first category tag set.
  • this deep neural network is used to classify animals.
  • the sample image is an animal image, where the target detection image not only carries the category label of the animal, but also identifies, in the form of a target frame, the location in each image of the animal corresponding to that category.
  • the category labels in the target detection images are only coarse animal categories, such as dog, cat, and deer; there is no more detailed classification, for example, dogs are not divided into golden retrievers, huskies, and shepherds.
  • the classified image only carries the category label of the animal, and does not identify the specific position of the animal in the image.
  • the classified images have broader and more detailed category labels.
  • the category of the classified image includes a large category that is not in the target detection image. For example, there is no elephant category in the target detection image, but this category is present in the classified image.
  • the category of the classified image may also include small categories that are not in the target detection image. For example, there are no small categories such as golden retriever, husky, and shepherd in the target detection image, but there are these categories in the classified image.
  • the number of types of category labels in the second category label set may be greater than the number of types of category labels in the first category label set.
  • the above two sample images are mixed together as training samples, and the deep neural network is trained by joint training.
  • the trained network can also detect small categories of dogs that have not appeared in the target detection image and will output location information with higher accuracy.
  • the sample images in the sample image set are input into a preset deep neural network for training.
  • Two kinds of training samples are mixed to form the sample image set used to train the model, which is essentially joint training of a strongly supervised detection algorithm and direct classification.
  • sample pictures in the sample image set mixed with the target detection image and the classification image will be randomly input into the preset neural network for calculation.
  • different loss functions are used to calculate the loss value.
  • the sample image input to the deep neural network is a classification image
  • the loss value is calculated based on the first loss function.
  • the loss value is calculated based on the third loss function, where when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  • the trained network can also output high-accuracy position information for small categories of dogs that have not appeared in the target detection image.
  • the category of the classified image contains the elephant category that is not in the target detection image.
  • for such categories, the accuracy of the position detection will be worse.
  • a new loss value calculation method is used to solve this problem.
  • If the sample image input to the deep neural network is a classification image whose category is not among the categories of the target detection images, a third loss function, different from the first loss function used in the other situation (where the category of the classification image is included in the categories of the target detection images), is used to calculate the loss value.
  • This makes the calculated loss value larger and the network more sensitive to this category, so the network can learn the features of this category of images more accurately to optimize the model parameters, which in turn improves the accuracy of category and target detection.
  • the third loss function is obtained by multiplying the basic loss function by a weight coefficient.
  • the first loss function is m*f
  • the third loss function is n*f, where f is the basic loss function, 0 < m < 1, n > 1.
  • f is the cross entropy loss function
  • the calculation formula of the first loss function is the calculation formula of the cross-entropy loss function multiplied by a positive number less than 1
  • the calculation formula of the third loss function is the calculation formula of the cross-entropy loss function multiplied by a constant greater than 1.
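The relationship between the first and third loss functions can be sketched numerically. The coefficient values m and n below are illustrative choices satisfying the stated constraints (0 < m < 1, n > 1), not values from the application; the function names are also assumed.

```python
import math

def basic_loss(pred_probs, true_idx):
    """f: the basic cross-entropy loss."""
    return -math.log(pred_probs[true_idx])

# Illustrative coefficients satisfying 0 < m < 1 and n > 1 (assumed values).
M, N = 0.5, 2.0

def first_loss(pred_probs, true_idx):
    """m * f: used when the classification image's label is present
    in the first category label set."""
    return M * basic_loss(pred_probs, true_idx)

def third_loss(pred_probs, true_idx):
    """n * f: used when the label is absent from the first category label
    set, yielding a larger loss and hence a stronger gradient signal."""
    return N * basic_loss(pred_probs, true_idx)
```

For identical inputs the third loss is always larger than the first by the fixed factor n/m, matching the claim that the network becomes more sensitive to categories unseen in the detection data.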
  • the loss value is calculated based on the first loss function and the second loss function.
  • the loss function in the network consists of the first loss function and the second loss function.
  • the second loss function is used to calculate the loss value generated when the image is detected by the target, and the first loss function is used for Calculate the loss value when classifying the image.
  • back-propagation is performed based on the calculated loss value to update the network parameters until convergence, and an image recognition model is obtained.
  • the image recognition model is used to identify the category of the input image and the location of the category object.
  • After performing backpropagation based on the calculated loss value to update the network parameters until convergence, the method further includes: acquiring an image to be classified; and performing image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified, and the position of the object belonging to the target category in the image to be classified.
  • the image recognition model obtained by training is used to recognize the image category: the image to be classified is input into the image recognition model for calculation to obtain the category label corresponding to the image to be classified and the position of the corresponding category object in the image.
  • the model training method proposed in the embodiment of the present application is based on the joint training of classification data and target detection data.
  • When the sample image input to the deep neural network is a classification image whose category label is not in the first category label set, backpropagation is performed with a larger loss value, which expands the model's ability to recognize multiple categories and improves multi-category accuracy.
  • the embodiment of the present application also provides a model training device, including:
  • An image acquisition module for acquiring a sample image set, the sample image set contains a target detection image and a classification image, wherein the target detection image carries location information and a first category label;
  • An image input module configured to input sample images in the sample image set into a preset deep neural network for training
  • the first calculation module is configured to calculate a loss value based on a first loss function if the sample image input to the deep neural network is the classified image;
  • a second calculation module configured to calculate a loss value based on the first loss function and the second loss function if the sample image input to the deep neural network is the target detection image;
  • the iterative training module is used to perform back propagation based on the calculated loss value to update the network parameters until convergence to obtain an image recognition model.
  • the image recognition model is used to recognize the category of the input image and the location of the category object.
  • the classified image carries a second category label
  • the first category labels carried by all target detection images constitute a first category label set
  • the device further includes:
  • the label detection module is configured to determine whether the second category label corresponding to the input classification image is included in the first category label set if the sample image input to the deep neural network is the classification image;
  • the first calculation module is also used for:
  • the loss value is calculated based on the third loss function, where, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  • the third loss function = k * first loss function, where k > 1.
  • the first loss function is m*f
  • the third loss function is n*f, where f is the basic loss function, 0 < m < 1, n > 1.
  • the deep neural network is a convolutional neural network; the second category labels carried by all classified images constitute a second category label set, and the label types in the first category label set are fewer than the label types in the second category label set.
  • the device further includes an image classification module, and the image classification module is configured to:
  • a model training device is also provided.
  • FIG. 3 is a schematic structural diagram of a model training apparatus 300 provided by an embodiment of the application.
  • the model training device 300 is applied to electronic equipment.
  • the model training device 300 includes an image acquisition module 301, an image input module 302, a first calculation module 303, a second calculation module 304, and an iterative training module 305, as follows:
  • the image acquisition module 301 is configured to acquire a sample image set, the sample image set contains a target detection image and a classification image, wherein the target detection image carries location information and a first category label;
  • the image input module 302 is configured to input sample images in the sample image set into a preset deep neural network for training;
  • the first calculation module 303 is configured to calculate a loss value based on a first loss function if the sample image input to the deep neural network is the classified image;
  • the second calculation module 304 is configured to calculate a loss value based on the first loss function and the second loss function if the sample image input to the deep neural network is the target detection image;
  • the iterative training module 305 is configured to perform back propagation based on the calculated loss value to update the network parameters until convergence to obtain an image recognition model.
  • the image recognition model is used to recognize the category of the input image and the location of the category object.
  • the classification image carries a second category label
  • the target detection image carries location information and a first category label
  • the first category labels carried by all target detection images constitute a first category label set
  • the model training device 300 also includes a label detection module, configured to determine, if the sample image input to the deep neural network is the classification image, whether the second category label corresponding to the input classification image is included in the first category label set.
  • the first calculation module 303 is further configured to: if the second category label corresponding to the input classification image is included in the first category label set, calculate a loss value based on the first loss function;
  • the loss value is calculated based on the third loss function, wherein when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  • the third loss function = k * first loss function, where k > 1.
  • the first loss function is m*f
  • the third loss function is n*f, where f is the basic loss function, 0 < m < 1, n > 1.
  • the deep neural network is a convolutional neural network; the second category labels carried by all classified images constitute a second category label set, and the label types in the first category label set are fewer than the label types in the second category label set.
  • the model training device 300 further includes an image classification module, which is used to: obtain the image to be classified; and perform image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified, and the position of the object belonging to the target category in the image to be classified.
  • in specific implementation, each of the above modules can be implemented as an independent entity, or combined arbitrarily and implemented as one or several entities. For the specific implementation of each of the above modules, refer to the preceding method embodiments, which will not be repeated here.
  • the model training device provided by the embodiments of this application belongs to the same concept as the model training method in the above embodiments. Any method provided in the model training method embodiments can run on the model training device; for the details of its specific implementation process, refer to the model training method embodiments, which will not be repeated here.
  • when training a deep neural network, the model training device proposed by this embodiment of the application obtains a sample image set containing target detection images and classification images and uses the sample images in the set to train a preset deep neural network. During training, when the sample image input to the deep neural network is a classification image, the loss value is calculated based on the first loss function; when the sample image input is a target detection image, the loss value is calculated based on the first loss function and the second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the target detection images and the classification images jointly train the preset deep neural network. Because each target detection image carries location information and a first category label, and the location information indicates the specific location of the category object in the image, the network can extract the features of the category object more accurately during training, which improves the accuracy of image category prediction by the image recognition model obtained through training.
  • the embodiments of the present application also provide an electronic device, which may be a mobile terminal such as a tablet computer or a smartphone.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • the electronic device 800 may include a camera module 801, a memory 802, a processor 803, a touch screen 804, a speaker 805, a microphone 806 and other components.
  • the camera module 801 may include a model training circuit, which may be implemented with hardware and/or software components and may include various processing units that define an Image Signal Processing (ISP) pipeline.
  • the model training circuit may at least include a camera, an image signal processor (Image Signal Processor, ISP processor), a control logic, an image memory, a display, and so on.
  • the camera may at least include one or more lenses and image sensors.
  • the image sensor may include a color filter array (such as a Bayer filter). The image sensor can obtain the light intensity and wavelength information captured with each imaging pixel of the image sensor, and provide a set of raw image data that can be processed by the image signal processor.
  • the image signal processor can process the original image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the image signal processor may perform one or more model training operations on the original image data and collect statistical information about the image data. Among them, the model training operation can be performed with the same or different bit depth accuracy.
  • the original image data can be stored in the image memory after being processed by the image signal processor.
  • the image signal processor can also receive image data from the image memory.
  • the image memory may be a part of a memory device, a storage device, or an independent dedicated memory in an electronic device, and may include DMA (Direct Memory Access) features.
  • the image signal processor can perform one or more model training operations, such as temporal filtering.
  • the processed image data can be sent to the image memory for additional processing before being displayed.
  • the image signal processor may also receive processed data from the image memory, and perform image data processing in the original domain and in the RGB and YCbCr color spaces on the processed data.
  • the processed image data can be output to a display for viewing by the user and/or further processed by a graphics engine or GPU (Graphics Processing Unit, graphics processor).
  • the output of the image signal processor can also be sent to the image memory, and the display can read image data from the image memory.
  • the image memory may be configured to implement one or more frame buffers.
  • the statistical data determined by the image signal processor can be sent to the control logic.
  • the statistical data may include the statistical information of the image sensor such as automatic exposure, automatic white balance, automatic focus, flicker detection, black level compensation, and lens shading correction.
  • the control logic may include a processor and/or microcontroller that executes one or more routines (such as firmware).
  • routines can determine the control parameters of the camera and the ISP control parameters based on the received statistical data.
  • the control parameters of the camera may include camera flash control parameters, lens control parameters (for example, focal length for focusing or zooming), or a combination of these parameters.
  • ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing).
  • FIG. 5 is a schematic diagram of the structure of the model training circuit in this embodiment. For ease of description, only various aspects of the model training technology related to the embodiment of the present invention are shown.
  • the model training circuit may include: a camera, an image signal processor, a control logic, an image memory, and a display.
  • the camera may include one or more lenses and image sensors.
  • the camera may be either a telephoto camera or a wide-angle camera.
  • the images collected by the camera are transmitted to the image signal processor for processing.
  • the image signal processor processes the image, it can send the statistical data of the image (such as the brightness of the image, the contrast value of the image, the color of the image, etc.) to the control logic.
  • the control logic can determine the control parameters of the camera according to the statistical data, so that the camera can perform operations such as autofocus and automatic exposure according to the control parameters.
  • the image can be stored in the image memory after being processed by the image signal processor.
  • the image signal processor can also read the image stored in the image memory for processing.
  • the image can be directly sent to the monitor for display after being processed by the image signal processor.
  • the display can also read the image in the image memory for display.
  • the electronic device may also include a CPU and a power supply module.
  • the CPU is connected to the logic controller, image signal processor, image memory, and display, and the CPU is used to implement global control.
  • the power supply module is used to supply power to each module.
  • the application programs stored in the memory 802 contain executable code and can be composed of various functional modules. The processor 803 executes various functional applications and data processing by running the application programs stored in the memory 802.
  • the processor 803 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 802 and calling the data stored in the memory 802, thereby monitoring the electronic device as a whole.
  • the touch display screen 804 may be used to receive a user's touch control operation on the electronic device.
  • the speaker 805 can play sound signals.
  • the microphone 806 can be used to pick up sound signals.
  • in this embodiment, the processor 803 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 803 runs the application programs stored in the memory 802 to execute:
  • obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;
  • inputting the sample images in the sample image set into a preset deep neural network for training;
  • if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;
  • if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function;
  • performing back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.
  • in some embodiments, the classification images carry second category labels, the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set; the processor 803 further executes: if the sample image input to the deep neural network is a classification image, determining whether the second category label corresponding to the input classification image is included in the first category label set; if so, calculating the loss value based on the first loss function; if not, calculating the loss value based on a third loss function, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  • in some embodiments, the processor 803 further executes: obtaining an image to be classified; and performing image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.
  • an embodiment of the present application provides an electronic device that, when training a deep neural network, obtains a sample image set containing target detection images and classification images and uses the sample images in the set to train a preset deep neural network. During training, when the sample image input to the deep neural network is a classification image, the loss value is calculated based on the first loss function; when the sample image input is a target detection image, the loss value is calculated based on the first loss function and the second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the target detection images and the classification images jointly train the preset deep neural network. Because the target detection image carries location information and a first category label, and the location information indicates the specific location of the category object in the image, the network can extract the features of the category object more accurately during training, which improves the accuracy of image category prediction by the image recognition model obtained through training.
  • an embodiment of the present application also provides a storage medium in which a computer program is stored. When the computer program runs on a computer, the computer executes the model training method described in any of the above embodiments.
  • the storage medium may include, but is not limited to: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.


Abstract

A model training method and device, a storage medium, and an electronic device, the method comprising: obtaining a sample image set; inputting the sample image set into a deep neural network for training; if the input is a classification image, calculating a loss value based on a first loss function; if the input is a target detection image, calculating a loss value based on the first loss function and a second loss function; and performing back propagation based on the loss value to update the network parameters until convergence, obtaining an image recognition model. The method can improve the accuracy of image category prediction and target detection by a deep neural network.

Description

Model training method, device, storage medium and electronic equipment

Technical Field

This application relates to the field of image processing technology, and in particular to a model training method, a model training device, a storage medium, and an electronic device.

Background

Image processing is a technique that uses a computer to analyze an image in order to achieve a desired result. Within the field of image processing technology, image category prediction has become an important research topic. As research on neural network models has advanced, the approach of obtaining an image's predicted category by performing category prediction on the image through a model has gradually gained wide recognition. It can therefore be seen that improving the accuracy of subsequent image category prediction through model training is particularly important.
Summary

The embodiments of this application provide a model training method, a model training device, a storage medium, and an electronic device, which can improve the accuracy of image category prediction by a deep neural network.
In a first aspect, an embodiment of this application provides a model training method, including:

obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

inputting the sample images in the sample image set into a preset deep neural network for training;

if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;

if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function; and

performing back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In a second aspect, an embodiment of this application provides a model training device, including:

an image acquisition module, configured to obtain a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

an image input module, configured to input the sample images in the sample image set into a preset deep neural network for training;

a first calculation module, configured to calculate a loss value based on a first loss function if a sample image input to the deep neural network is a classification image;

a second calculation module, configured to calculate a loss value based on the first loss function and a second loss function if a sample image input to the deep neural network is a target detection image; and

an iterative training module, configured to perform back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In a third aspect, an embodiment of this application provides a storage medium on which a computer program is stored. When the computer program runs on a computer, the computer is caused to execute:

obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

inputting the sample images in the sample image set into a preset deep neural network for training;

if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;

if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function; and

performing back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In a fourth aspect, an embodiment of this application provides an electronic device including a processor and a memory, the memory storing a computer program; the processor, by calling the computer program, is configured to execute:

obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

inputting the sample images in the sample image set into a preset deep neural network for training;

if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;

if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function; and

performing back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In the solutions provided by the embodiments of this application, when a deep neural network is trained, a sample image set containing target detection images and classification images is obtained, and the sample images in the set are used to train a preset deep neural network. During training, when the sample image input to the deep neural network is a classification image, a loss value is calculated based on the first loss function; when the sample image input to the deep neural network is a target detection image, a loss value is calculated based on the first loss function and the second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the target detection images and the classification images jointly train the preset deep neural network. Because the target detection images carry location information and first category labels, and the location information indicates the specific position of the category object in the image, the network can extract the features of the category object more accurately during training, which improves the accuracy of image category prediction by the image recognition model obtained through training.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those skilled in the art, other drawings can also be obtained from these drawings without creative effort.

FIG. 1 is a first schematic flowchart of the model training method provided by an embodiment of this application.

FIG. 2 is a second schematic flowchart of the model training method provided by an embodiment of this application.

FIG. 3 is a schematic structural diagram of the model training device provided by an embodiment of this application.

FIG. 4 is a schematic structural diagram of the electronic device provided by an embodiment of this application.

FIG. 5 is a schematic structural diagram of the model training circuit of the electronic device provided by an embodiment of this application.

Detailed Description

The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of this application.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily all refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
An embodiment of this application provides a model training method, including:

obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

inputting the sample images in the sample image set into a preset deep neural network for training;

if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;

if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function; and

performing back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In some embodiments, the classification images carry second category labels, the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set;

before the calculating of the loss value based on the first loss function, the method further includes:

if the sample image input to the deep neural network is a classification image, determining whether the second category label corresponding to the input classification image is included in the first category label set;

if so, performing the calculating of the loss value based on the first loss function;

if not, calculating the loss value based on a third loss function, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.

In some embodiments, the third loss function = k * the first loss function, where k > 1.

In some embodiments, the first loss function is m*f and the third loss function is n*f, where f is a basic loss function, 0 < m < 1, and n > 1.

In some embodiments, the deep neural network is a convolutional neural network; the second category labels carried by all classification images constitute a second category label set, and the first category label set contains fewer label types than the second category label set.

In some embodiments, after the performing of back propagation based on the calculated loss value to update the network parameters until convergence, the method further includes:

obtaining an image to be classified;

performing image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.
An embodiment of this application provides a model training method. The execution subject of the model training method may be the model training device provided by an embodiment of this application, or an electronic device integrating the model training device, where the model training device may be implemented in hardware or in software. The electronic device may be a smartphone, a tablet computer, a palmtop computer, a notebook computer, a desktop computer, or a similar device.

Please refer to FIG. 1, which is a first schematic flowchart of the model training method provided by an embodiment of this application. The specific flow of the model training method provided by an embodiment of this application may be as follows:

In 101, a sample image set is obtained. The sample image set contains target detection images and classification images, where the target detection images carry location information and first category labels.

Multi-class image classification based on target detection is strongly supervised: the location information of every category object in an image must be provided, and when the training sample size for such a classification model is large, annotating the location information incurs a substantial labor cost. Ordinary multi-class image classification is a weakly supervised classification method that only requires the category name of each image to be annotated, but such a method cannot identify the location of the category object in the image.

The model training scheme of the embodiments of this application can be applied to an image classification and localization model, which can recognize not only the category of an image but also the location of the category object in the image; for example, the location of the category object can be marked with a target box. The model can be built on a deep neural network, for example, a BP (back propagation) neural network or a convolutional neural network.

This application mixes two kinds of training samples to form the sample image set: target detection images and classification images. A target detection image carries both a category label and location information, the location information indicating the position of the category object in the image; a classification image carries a category label. For ease of the following description, the category label carried by a target detection image is denoted the first category label, and the category label carried by a classification image is denoted the second category label. The first category labels carried by all target detection images constitute a first category label set, and the second category labels carried by all classification images constitute a second category label set. In some embodiments, the label categories in the second category label set may partially overlap with the category labels in the first category label set.
In 102, the sample images in the sample image set are input into a preset deep neural network for training.

Training a model on a sample image set mixing the two kinds of training samples is, in essence, joint training of a strongly supervised algorithm and direct classification. During training, sample images from the set mixing target detection images and classification images are fed randomly into the preset neural network for computation, and different loss functions are used to calculate the loss value depending on the kind of sample image input.

In 103, if the sample image input to the deep neural network is a classification image, a loss value is calculated based on the first loss function.

In 104, if the sample image input to the deep neural network is a target detection image, a loss value is calculated based on the first loss function and the second loss function.
When the network is trained with a classification image, the loss function of the network consists of the first loss function, which is used to calculate the loss value produced by classifying the image. Because the training data carries no target box in this case, only the network parameters involved in the classification branch are updated during back propagation of the error information, while the network parameters involved in the target detection branch are not updated.

When the network is trained with a target detection image, the loss function of the network consists of the first loss function and the second loss function: the second loss function is used to calculate the loss value produced by performing target detection on the image, and the first loss function is used to calculate the loss value produced by classifying the image. Because the training data carries a target box in this case, both the network parameters involved in the classification branch and the network parameters involved in the target detection branch are updated during back propagation of the error information; that is, all the network parameters are updated.

The training process of this deep neural network therefore involves two loss functions. The total loss function can be expressed as L = L_p + L_cls, where L_cls is the first loss function and L_p is the second loss function. If the sample image input to the deep neural network is a classification image, then L_p = 0.

In this embodiment, the loss function can be chosen according to the deep neural network used; for example, a mean squared error function or a cross-entropy function can be used as the loss function.
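As an illustration of the combined loss described above, the following sketch computes L = L_p + L_cls, with L_p dropping to zero for a classification image that carries no target box. This is a minimal reconstruction for illustration only: the use of cross-entropy for L_cls follows the example in the text, while the smooth-L1 form of L_p, the box representation, and all function names are assumptions, not the patent's actual implementation.

```python
import math

def cross_entropy(probs, label):
    """First loss function L_cls: negative log-likelihood of the true class."""
    return -math.log(probs[label])

def smooth_l1(pred_box, true_box):
    """Second loss function L_p over 4 box coordinates (an assumed detection loss)."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def total_loss(probs, label, pred_box=None, true_box=None):
    """L = L_p + L_cls; for a classification image (no target box), L_p = 0."""
    l_cls = cross_entropy(probs, label)
    l_p = 0.0 if pred_box is None else smooth_l1(pred_box, true_box)
    return l_p + l_cls
```

In a real network the two terms would be back-propagated jointly, so a target detection image updates both branches while a classification image contributes only through L_cls.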
In 105, back propagation is performed based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model. The image recognition model is used to recognize the category of an input image and the location of the category object.

During training of the network, the loss value is calculated with the loss functions and calculation methods described above, and back propagation is performed based on the calculated loss value to update the network parameters until the network converges, for example, until the number of training iterations reaches a preset value, until the loss value reaches its minimum, or until the loss value falls below a preset value. After training to convergence, the network parameters are fixed, and the deep neural network with the fixed parameters is taken as the image recognition model.

During training of the network, because a target detection image carries location information indicating the specific position of the category object in the image, the network can extract the features of the category object more accurately. In this way, when the sample image input to the network is a classification image, even though the classification image carries no location information, the network's strengthened ability to recognize the features of category objects, gained from training on the target detection images, allows it to recognize the features of the category object in the classification image more accurately and to determine the location of the category object with relatively high accuracy. It should be understood that a category object in this application refers to the object corresponding to the category label of a sample image.

For example, taking the case where the preset deep neural network is a convolutional neural network, the cross-entropy function is used as the loss function: the training data is input, the loss value is calculated according to the loss function, and back propagation based on the loss value optimizes the weights of the convolution kernels in each convolutional layer of the network.
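The iterate-until-convergence control flow of steps 101 to 105 can be sketched as a toy loop. This is not real network training: the "back propagation" step below merely shrinks a per-sample loss so that the control flow (random sample order, loss selection by sample kind, stopping on an iteration cap or a small loss) can be shown self-contained; all names, dictionary fields, and numeric values are illustrative assumptions.

```python
import random

random.seed(0)  # deterministic sample order for the illustration

def train(samples, step_size=0.5, max_iters=1000, loss_threshold=0.01):
    """Toy mixed-sample training loop: classification images use the first
    loss only; target detection images use the first plus the second loss."""
    for _ in range(max_iters):
        sample = random.choice(samples)              # samples fed in random order
        if sample["kind"] == "classification":
            loss = sample["cls_loss"]                # first loss function only
        else:                                        # target detection image
            loss = sample["cls_loss"] + sample["det_loss"]
        # stand-in for back propagation: shrink the per-sample losses
        sample["cls_loss"] *= (1 - step_size)
        if "det_loss" in sample:
            sample["det_loss"] *= (1 - step_size)
        if loss < loss_threshold:                    # convergence criterion
            return loss
    return loss
```

In an actual implementation the update step would compute gradients of the selected loss and adjust the convolution-kernel weights, with the detection branch left untouched when the sample is a classification image.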
In specific implementation, this application is not limited by the execution order of the described steps; when no conflict arises, some steps may also be performed in other orders or simultaneously.

As can be seen from the above, when training a deep neural network, the model training method proposed by the embodiments of this application obtains a sample image set containing target detection images and classification images and uses the sample images in the set to train a preset deep neural network. During training, when the sample image input to the deep neural network is a classification image, the loss value is calculated based on the first loss function; when the sample image input to the deep neural network is a target detection image, the loss value is calculated based on the first loss function and the second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the target detection images and the classification images jointly train the preset deep neural network. Because the target detection images carry location information and first category labels, and the location information indicates the specific position of the category object in the image, the network can extract the features of the category object more accurately during training, which improves the accuracy of image category prediction by the image recognition model obtained through training.
On the basis of the method described in the above embodiment, the model training method of this application is further described in detail below. Please refer to FIG. 2, which is a second schematic flowchart of the model training method provided by an embodiment of the present invention. The method includes:

In 201, a sample image set is obtained. The sample image set contains target detection images and classification images, where the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set.

This embodiment mixes two kinds of training samples to form the sample image set: target detection images and classification images. A target detection image carries both a category label and location information, the location information indicating the position of the category object in the image; a classification image carries a category label. For ease of the following description, the category label carried by a target detection image is denoted the first category label, and the category label carried by a classification image is denoted the second category label. The first category labels carried by all target detection images constitute a first category label set, and the second category labels carried by all classification images constitute a second category label set. In some embodiments, the label categories in the second category label set may partially overlap with the category labels in the first category label set.

For example, suppose the deep neural network is used for animal classification and the sample images are animal images. Each target detection image not only carries an animal category label; each image also marks, in the form of a target box, the position of the category animal in that image. However, the animal categories in the target detection images cover only broad animal classes, such as dog, cat, and deer, with no finer subdivision; for example, dogs are not divided into golden retrievers, huskies, shepherds, and so on. Meanwhile, the classification images carry only animal category labels and do not mark the specific position of the animal in the image, but they have broader and deeper category labels. For example, the categories of the classification images include broad classes absent from the target detection images (for instance, the target detection images have no elephant category, but the classification images do), and can also include fine-grained classes absent from the target detection images (for instance, the target detection images have no golden retriever, husky, or shepherd classes, but the classification images do). That is, the number of category label types in the second category label set can be greater than the number of category label types in the first category label set.

Based on the scheme of this embodiment, the two kinds of sample images above are mixed together as training samples and the deep neural network is trained jointly; the trained network can output relatively high-accuracy location information even for dogs of fine-grained classes that never appeared in the target detection images.
In 202, the sample images in the sample image set are input into a preset deep neural network for training.

Training a model on a sample image set mixing the two kinds of training samples is, in essence, joint training of a strongly supervised algorithm and direct classification. During training, sample images from the set mixing target detection images and classification images are fed randomly into the preset neural network for computation, and different loss functions are used to calculate the loss value depending on the kind of sample image input.

In 203, if the sample image input to the deep neural network is a classification image, it is determined whether the second category label corresponding to the input classification image is included in the first category label set.

In 204, if so, the loss value is calculated based on the first loss function.

In 205, if not, the loss value is calculated based on the third loss function, where, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
Building on the above example: although joint training lets the trained network output relatively high-accuracy location information even for fine-grained dog classes that never appeared in the target detection images, when a classification image belongs to a broad class absent from the target detection images — for example, the elephant class present among the classification images but not among the target detection images — the accuracy of location detection will be poor. This embodiment addresses the problem with a new way of calculating the loss value.

When the sample image input to the deep neural network is a classification image, it is first determined whether the second category label corresponding to the input classification image is included in the first category label set. If it is, the loss value is calculated based on the first loss function. If it is not, the loss value is calculated based on the third loss function, where, for the same input sample image, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function. That is, when the category of the classification image is absent from the categories of the target detection images, in order to improve the accuracy of the network's target detection, a third loss function distinct from the one used in the other case (where the category of the classification image is included among the categories of the target detection images) is used to calculate the loss value. The resulting larger loss value makes the network more sensitive to this category and able to learn the features of images of this category more accurately, optimizing the model parameters and thereby improving the detection accuracy for categories and targets.
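The label-set check described above can be sketched as a small selector over a scalar loss. The function name and the value k = 2.0 are illustrative assumptions; the text requires only that k > 1.

```python
def classification_loss(base_loss, label, first_label_set, k=2.0):
    """Select the loss for a classification image per steps 203-205: when the
    image's second category label is absent from the first category label set,
    amplify the loss (third loss function = k * first loss function, k > 1)
    so the network becomes more sensitive to that category."""
    if label in first_label_set:
        return base_loss          # first loss function
    return k * base_loss          # third loss function
```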
For example, in some implementations, the third loss function = k * the first loss function, where k > 1. In this embodiment, the third loss function is obtained by multiplying the formula of the first loss function by a weight coefficient, a constant greater than 1; for example, in some embodiments, k = 1 to 3; in other embodiments, k = 1 to 1.5; in still other embodiments, k = 1.5 to 2.

For another example, in some embodiments, the first loss function is m*f and the third loss function is n*f, where f is a basic loss function, 0 < m < 1, and n > 1. For example, with f being the cross-entropy loss function, the formula of the first loss function is obtained by multiplying the formula of the cross-entropy loss function by a positive number less than 1, and the formula of the third loss function is obtained by multiplying the formula of the cross-entropy loss function by a constant greater than 1.
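A minimal sketch of this m*f / n*f embodiment, taking f to be the cross-entropy loss as in the example; the specific values m = 0.5 and n = 2.0 are illustrative assumptions within the stated ranges 0 < m < 1 and n > 1.

```python
import math

def base_cross_entropy(probs, label):
    """Basic loss function f (cross-entropy, as in the example)."""
    return -math.log(probs[label])

def first_loss(probs, label, m=0.5):
    """First loss function m*f, 0 < m < 1 (m = 0.5 is an illustrative choice)."""
    return m * base_cross_entropy(probs, label)

def third_loss(probs, label, n=2.0):
    """Third loss function n*f, n > 1 (n = 2.0 is an illustrative choice)."""
    return n * base_cross_entropy(probs, label)
```

Because m < 1 < n, the same input always yields a smaller loss under the first loss function than under the third, which is exactly the ordering the embodiment requires.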
In 206, if the sample image input to the deep neural network is a target detection image, the loss value is calculated based on the first loss function and the second loss function.

When the network is trained with a target detection image, the loss function of the network consists of the first loss function and the second loss function: the second loss function is used to calculate the loss value produced by performing target detection on the image, and the first loss function is used to calculate the loss value produced by classifying the image.

In 207, back propagation is performed based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model. The image recognition model is used to recognize the category of an input image and the location of the category object.
In some embodiments, after performing back propagation based on the calculated loss value to update the network parameters until convergence, the method further includes: obtaining an image to be classified; and performing image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.

In this embodiment, the trained image recognition model is used for image category recognition: the image to be classified is input into the image recognition model for computation, yielding the category label corresponding to the image to be classified and the location, within the image, of the corresponding category object.
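A hypothetical usage sketch of the trained model at inference time. The model interface assumed here — a callable returning per-category scores and one target box — is an illustration only, not the patent's API.

```python
def recognize(model, image):
    """Run the trained image recognition model on an image to be classified,
    returning the predicted target category and the location (target box) of
    the object belonging to that category."""
    scores, box = model(image)
    category = max(scores, key=scores.get)   # highest-scoring category label
    return category, box
```

For instance, a model wrapping the trained network could be queried as `recognize(model, img)` to obtain something like `("dog", (x1, y1, x2, y2))`.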
As can be seen from the above, the model training method proposed by the embodiments of the present invention, on the basis of jointly training with classification data and target detection data, performs back propagation with a larger loss value when the sample image input to the deep neural network is a classification image whose category label is not included among the category labels of the target detection images. This extends the model's ability to recognize many categories and improves the accuracy of multi-class classification.
An embodiment of this application also provides a model training device, including:

an image acquisition module, configured to obtain a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

an image input module, configured to input the sample images in the sample image set into a preset deep neural network for training;

a first calculation module, configured to calculate a loss value based on a first loss function if a sample image input to the deep neural network is a classification image;

a second calculation module, configured to calculate a loss value based on the first loss function and a second loss function if a sample image input to the deep neural network is a target detection image; and

an iterative training module, configured to perform back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In some embodiments, the classification images carry second category labels, and the first category labels carried by all target detection images constitute a first category label set; the device further includes:

a label detection module, configured to determine, if a sample image input to the deep neural network is a classification image, whether the second category label corresponding to the input classification image is included in the first category label set;

the first calculation module is further configured to:

calculate the loss value based on the first loss function if the second category label corresponding to the input classification image is included in the first category label set;

calculate the loss value based on a third loss function if the second category label corresponding to the input classification image is not included in the first category label set, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.

In some embodiments, the third loss function = k * the first loss function, where k > 1.

In some embodiments, the first loss function is m*f and the third loss function is n*f, where f is a basic loss function, 0 < m < 1, and n > 1.

In some embodiments, the deep neural network is a convolutional neural network; the second category labels carried by all classification images constitute a second category label set, and the first category label set contains fewer label types than the second category label set.

In some embodiments, the device further includes an image classification module, configured to:

obtain an image to be classified;

and perform image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.
An embodiment also provides a model training device. Please refer to FIG. 3, which is a schematic structural diagram of the model training device 300 provided by an embodiment of this application. The model training device 300 is applied to an electronic device and includes an image acquisition module 301, an image input module 302, a first calculation module 303, a second calculation module 304, and an iterative training module 305, as follows:

the image acquisition module 301 is configured to obtain a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

the image input module 302 is configured to input the sample images in the sample image set into a preset deep neural network for training;

the first calculation module 303 is configured to calculate a loss value based on a first loss function if a sample image input to the deep neural network is a classification image;

the second calculation module 304 is configured to calculate a loss value based on the first loss function and a second loss function if a sample image input to the deep neural network is a target detection image;

the iterative training module 305 is configured to perform back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In some embodiments, the classification images carry second category labels, the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set;

the model training device 300 further includes a label detection module, configured to determine, if a sample image input to the deep neural network is a classification image, whether the second category label corresponding to the input classification image is included in the first category label set;

the first calculation module 303 is further configured to: calculate the loss value based on the first loss function if the second category label corresponding to the input classification image is included in the first category label set;

and calculate the loss value based on a third loss function if the second category label corresponding to the input classification image is not included in the first category label set, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.

In some embodiments, the third loss function = k * the first loss function, where k > 1.

In some embodiments, the first loss function is m*f and the third loss function is n*f, where f is a basic loss function, 0 < m < 1, and n > 1.

In some embodiments, the deep neural network is a convolutional neural network; the second category labels carried by all classification images constitute a second category label set, and the first category label set contains fewer label types than the second category label set.

In some embodiments, the model training device 300 further includes an image classification module, configured to: obtain an image to be classified; and perform image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.

In specific implementation, each of the above modules can be implemented as an independent entity, or combined arbitrarily and implemented as one or several entities. For the specific implementation of each of the above modules, refer to the preceding method embodiments, which will not be repeated here.

It should be noted that the model training device provided by the embodiments of this application belongs to the same concept as the model training method in the above embodiments. Any method provided in the model training method embodiments can run on the model training device; for the details of its specific implementation process, refer to the model training method embodiments, which will not be repeated here.

As can be seen from the above, when training a deep neural network, the model training device proposed by the embodiments of this application obtains a sample image set containing target detection images and classification images and uses the sample images in the set to train a preset deep neural network. During training, when the sample image input to the deep neural network is a classification image, the loss value is calculated based on the first loss function; when the sample image input to the deep neural network is a target detection image, the loss value is calculated based on the first loss function and the second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the target detection images and the classification images jointly train the preset deep neural network. Because the target detection images carry location information and first category labels, and the location information indicates the specific position of the category object in the image, the network can extract the features of the category object more accurately during training, which improves the accuracy of image category prediction by the image recognition model obtained through training.
An embodiment of this application also provides an electronic device, which may be a mobile terminal such as a tablet computer or a smartphone. Please refer to FIG. 4, which is a schematic structural diagram of the electronic device provided by an embodiment of this application. The electronic device 800 may include a camera module 801, a memory 802, a processor 803, a touch display screen 804, a speaker 805, a microphone 806, and other components.

The camera module 801 may include a model training circuit, which may be implemented with hardware and/or software components and may include various processing units defining an Image Signal Processing pipeline. The model training circuit may include at least a camera, an image signal processor (ISP processor), a control logic unit, an image memory, a display, and so on. The camera may include at least one or more lenses and an image sensor. The image sensor may include a color filter array (such as a Bayer filter). The image sensor can obtain the light intensity and wavelength information captured by each imaging pixel of the image sensor and provide a set of raw image data that can be processed by the image signal processor.

The image signal processor can process the raw image data pixel by pixel in multiple formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the image signal processor may perform one or more model training operations on the raw image data and collect statistical information about the image data. The model training operations may be performed with the same or different bit-depth precision. After being processed by the image signal processor, the raw image data can be stored in the image memory. The image signal processor can also receive image data from the image memory.

The image memory may be part of a memory device, a storage device, or an independent dedicated memory within the electronic device, and may include a DMA (Direct Memory Access) feature.

Upon receiving image data from the image memory, the image signal processor can perform one or more model training operations, such as temporal filtering. The processed image data can be sent to the image memory for additional processing before being displayed. The image signal processor can also receive processed data from the image memory and process that data in the raw domain and in the RGB and YCbCr color spaces. The processed image data can be output to the display for viewing by the user and/or further processed by a graphics engine or GPU (Graphics Processing Unit). In addition, the output of the image signal processor can also be sent to the image memory, and the display can read image data from the image memory. In one implementation, the image memory may be configured to implement one or more frame buffers.

The statistical data determined by the image signal processor can be sent to the control logic unit. For example, the statistical data may include image sensor statistics such as automatic exposure, automatic white balance, automatic focus, flicker detection, black level compensation, and lens shading correction.

The control logic unit may include a processor and/or microcontroller executing one or more routines (such as firmware). The one or more routines can determine the control parameters of the camera and the ISP control parameters based on the received statistical data. For example, the camera control parameters may include camera flash control parameters, lens control parameters (for example, the focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing).
Please refer to FIG. 5, which is a schematic structural diagram of the model training circuit in this embodiment. For ease of description, only the aspects of the model training technology related to the embodiment of the present invention are shown.

For example, the model training circuit may include: a camera, an image signal processor, a control logic unit, an image memory, and a display. The camera may include one or more lenses and an image sensor. In some embodiments, the camera may be either a telephoto camera or a wide-angle camera.

The images collected by the camera are transmitted to the image signal processor for processing. After the image signal processor processes an image, it can send the statistical data of the image (such as the brightness, the contrast value, and the color of the image) to the control logic unit. The control logic unit can determine the control parameters of the camera according to the statistical data, so that the camera can perform operations such as autofocus and automatic exposure according to the control parameters. After being processed by the image signal processor, the image can be stored in the image memory. The image signal processor can also read images stored in the image memory for processing. In addition, after being processed by the image signal processor, the image can be sent directly to the display for display; the display can also read images from the image memory for display.

In addition, although not shown in the figure, the electronic device may also include a CPU and a power supply module. The CPU is connected to the control logic unit, the image signal processor, the image memory, and the display, and the CPU is used to implement global control. The power supply module is used to supply power to each module.

The application programs stored in the memory 802 contain executable code and can be composed of various functional modules. The processor 803 executes various functional applications and data processing by running the application programs stored in the memory 802.

The processor 803 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 802 and calling the data stored in the memory 802, thereby monitoring the electronic device as a whole.

The touch display screen 804 can be used to receive the user's touch control operations on the electronic device. The speaker 805 can play sound signals. The microphone 806 can be used to pick up sound signals.
In this embodiment, the processor 803 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 803 runs the application programs stored in the memory 802 to execute:

obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;

inputting the sample images in the sample image set into a preset deep neural network for training;

if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;

if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function;

performing back propagation based on the calculated loss value to update the network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.

In some embodiments, the classification images carry second category labels, the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set; the processor 803 further executes:

if the sample image input to the deep neural network is a classification image, determining whether the second category label corresponding to the input classification image is included in the first category label set;

if so, performing the calculating of the loss value based on the first loss function;

if not, calculating the loss value based on a third loss function, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.

In some embodiments, the processor 803 further executes:

obtaining an image to be classified; and performing image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.

As can be seen from the above, the embodiments of this application provide an electronic device. When training a deep neural network, the electronic device obtains a sample image set containing target detection images and classification images and uses the sample images in the set to train a preset deep neural network. During training, when the sample image input to the deep neural network is a classification image, the loss value is calculated based on the first loss function; when the sample image input to the deep neural network is a target detection image, the loss value is calculated based on the first loss function and the second loss function; back propagation is then performed based on the loss value to update the network parameters until convergence. In this training scheme, the target detection images and the classification images jointly train the preset deep neural network. Because the target detection images carry location information and first category labels, and the location information indicates the specific position of the category object in the image, the network can extract the features of the category object more accurately during training, which improves the accuracy of image category prediction by the image recognition model obtained through training.
An embodiment of this application also provides a storage medium in which a computer program is stored. When the computer program runs on a computer, the computer executes the model training method described in any of the above embodiments.

It should be noted that those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium. The storage medium may include, but is not limited to: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and the like.

In addition, the terms "first", "second", and "third" in this application are used to distinguish different objects, not to describe a specific order. Moreover, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or modules is not limited to the listed steps or modules; rather, some embodiments also include steps or modules that are not listed, or other steps or modules inherent to the process, method, product, or device.

The model training method, device, storage medium, and electronic device provided by the embodiments of this application have been introduced in detail above. Specific examples have been used herein to explain the principles and implementations of this application, and the descriptions of the above embodiments are only intended to help understand the method of this application and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementation and scope of application according to the idea of this application. In summary, the content of this specification should not be construed as a limitation of this application.

Claims (20)

  1. A model training method, characterized by comprising:
    obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;
    inputting the sample images in the sample image set into a preset deep neural network for training;
    if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;
    if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function; and
    performing back propagation based on the calculated loss value to update network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.
  2. The model training method according to claim 1, characterized in that the classification images carry second category labels, the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set;
    before the calculating of the loss value based on the first loss function, the method further comprises:
    if the sample image input to the deep neural network is a classification image, determining whether the second category label corresponding to the input classification image is included in the first category label set;
    if so, performing the calculating of the loss value based on the first loss function;
    if not, calculating the loss value based on a third loss function, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  3. The model training method according to claim 2, characterized in that the third loss function = k * the first loss function, wherein k > 1.
  4. The model training method according to claim 2, characterized in that the first loss function is m*f and the third loss function is n*f, wherein f is a basic loss function, 0 < m < 1, and n > 1.
  5. The model training method according to claim 2, characterized in that the deep neural network is a convolutional neural network; the second category labels carried by all classification images constitute a second category label set, and the first category label set contains fewer label types than the second category label set.
  6. The model training method according to claim 1, characterized in that, after the performing of back propagation based on the calculated loss value to update network parameters until convergence, the method further comprises:
    obtaining an image to be classified;
    performing image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.
  7. A model training device, characterized by comprising:
    an image acquisition module, configured to obtain a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;
    an image input module, configured to input the sample images in the sample image set into a preset deep neural network for training;
    a first calculation module, configured to calculate a loss value based on a first loss function if a sample image input to the deep neural network is a classification image;
    a second calculation module, configured to calculate a loss value based on the first loss function and a second loss function if a sample image input to the deep neural network is a target detection image; and
    an iterative training module, configured to perform back propagation based on the calculated loss value to update network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.
  8. The model training device according to claim 7, characterized in that the classification images carry second category labels, and the first category labels carried by all target detection images constitute a first category label set; the device further comprises:
    a label detection module, configured to determine, if a sample image input to the deep neural network is a classification image, whether the second category label corresponding to the input classification image is included in the first category label set;
    the first calculation module is further configured to:
    calculate the loss value based on the first loss function if the second category label corresponding to the input classification image is included in the first category label set;
    calculate the loss value based on a third loss function if the second category label corresponding to the input classification image is not included in the first category label set, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  9. The model training device according to claim 8, characterized in that the third loss function = k * the first loss function, wherein k > 1.
  10. The model training device according to claim 8, characterized in that the first loss function is m*f and the third loss function is n*f, wherein f is a basic loss function, 0 < m < 1, and n > 1.
  11. The model training device according to claim 8, characterized in that the deep neural network is a convolutional neural network; the second category labels carried by all classification images constitute a second category label set, and the first category label set contains fewer label types than the second category label set.
  12. The model training device according to claim 7, characterized in that the device further comprises an image classification module, the image classification module being configured to:
    obtain an image to be classified;
    and perform image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.
  13. A storage medium on which a computer program is stored, characterized in that, when the computer program runs on a computer, the computer is caused to execute:
    obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;
    inputting the sample images in the sample image set into a preset deep neural network for training;
    if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;
    if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function; and
    performing back propagation based on the calculated loss value to update network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.
  14. The storage medium according to claim 13, characterized in that the classification images carry second category labels, the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set;
    when the computer program runs on a computer, the computer can further be caused to execute:
    if the sample image input to the deep neural network is a classification image, determining whether the second category label corresponding to the input classification image is included in the first category label set;
    if so, performing the calculating of the loss value based on the first loss function;
    if not, calculating the loss value based on a third loss function, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  15. An electronic device, comprising a processor and a memory, the memory storing a computer program, characterized in that the processor, by calling the computer program, is configured to execute:
    obtaining a sample image set, wherein the sample image set contains target detection images and classification images, and the target detection images carry location information and first category labels;
    inputting the sample images in the sample image set into a preset deep neural network for training;
    if a sample image input to the deep neural network is a classification image, calculating a loss value based on a first loss function;
    if a sample image input to the deep neural network is a target detection image, calculating a loss value based on the first loss function and a second loss function; and
    performing back propagation based on the calculated loss value to update network parameters until convergence, obtaining an image recognition model, wherein the image recognition model is used to recognize the category of an input image and the location of the category object.
  16. The electronic device according to claim 15, characterized in that the classification images carry second category labels, the target detection images carry location information and first category labels, and the first category labels carried by all target detection images constitute a first category label set; the processor can further, by calling the computer program, be configured to execute:
    if the sample image input to the deep neural network is a classification image, determining whether the second category label corresponding to the input classification image is included in the first category label set;
    if so, performing the calculating of the loss value based on the first loss function;
    if not, calculating the loss value based on a third loss function, wherein, when the input sample images are the same, the loss value calculated by the first loss function is smaller than the loss value calculated by the third loss function.
  17. The electronic device according to claim 16, characterized in that the third loss function = k * the first loss function, wherein k > 1.
  18. The electronic device according to claim 16, characterized in that the first loss function is m*f and the third loss function is n*f, wherein f is a basic loss function, 0 < m < 1, and n > 1.
  19. The electronic device according to claim 16, characterized in that the deep neural network is a convolutional neural network; the second category labels carried by all classification images constitute a second category label set, and the first category label set contains fewer label types than the second category label set.
  20. The electronic device according to claim 15, characterized in that the processor can further, by calling the computer program, be configured to execute:
    obtaining an image to be classified;
    performing image recognition on the image to be classified according to the image recognition model to determine the target category corresponding to the image to be classified and the location, within the image to be classified, of the object belonging to the target category.
PCT/CN2019/116710 2019-11-08 2019-11-08 模型训练方法、装置、存储介质及电子设备 WO2021087985A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/116710 WO2021087985A1 (zh) 2019-11-08 2019-11-08 模型训练方法、装置、存储介质及电子设备
CN201980100619.0A CN114424253A (zh) 2019-11-08 2019-11-08 模型训练方法、装置、存储介质及电子设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/116710 WO2021087985A1 (zh) 2019-11-08 2019-11-08 模型训练方法、装置、存储介质及电子设备

Publications (1)

Publication Number Publication Date
WO2021087985A1 true WO2021087985A1 (zh) 2021-05-14

Family

ID=75849227

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116710 WO2021087985A1 (zh) 2019-11-08 2019-11-08 模型训练方法、装置、存储介质及电子设备

Country Status (2)

Country Link
CN (1) CN114424253A (zh)
WO (1) WO2021087985A1 (zh)

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221837A (zh) * 2021-06-01 2021-08-06 北京金山云网络技术有限公司 Object segmentation method, and object segmentation model training method and apparatus
CN113282927A (zh) * 2021-05-31 2021-08-20 平安国际智慧城市科技股份有限公司 Malicious code detection method, apparatus, device, and computer-readable storage medium
CN113298156A (zh) * 2021-05-28 2021-08-24 有米科技股份有限公司 Neural network training method and apparatus for image gender classification
CN113364792A (zh) * 2021-06-11 2021-09-07 奇安信科技集团股份有限公司 Traffic detection model training method, traffic detection method, apparatus, and device
CN113378833A (zh) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Image recognition model training method, image recognition method, apparatus, and electronic device
CN113408662A (zh) * 2021-07-19 2021-09-17 北京百度网讯科技有限公司 Image recognition method, and image recognition model training method and apparatus
CN113449704A (zh) * 2021-08-31 2021-09-28 北京的卢深视科技有限公司 Face recognition model training method and apparatus, electronic device, and storage medium
CN113496256A (zh) * 2021-06-24 2021-10-12 中汽创智科技有限公司 Image annotation model training method, annotation method, apparatus, device, and medium
CN113505800A (zh) * 2021-06-30 2021-10-15 深圳市慧鲤科技有限公司 Image processing method, model training method and apparatus therefor, device, and medium
CN113505820A (zh) * 2021-06-23 2021-10-15 北京阅视智能技术有限责任公司 Image recognition model training method, apparatus, device, and medium
CN113516053A (zh) * 2021-05-28 2021-10-19 西安空间无线电技术研究所 Rotation-invariant refined detection method for ship targets
CN113537286A (zh) * 2021-06-11 2021-10-22 浙江智慧视频安防创新中心有限公司 Image classification method, apparatus, device, and medium
CN113591918A (zh) * 2021-06-29 2021-11-02 北京百度网讯科技有限公司 Image processing model training method, image processing method, apparatus, and device
CN113657523A (zh) * 2021-08-23 2021-11-16 科大讯飞股份有限公司 Image target classification method, apparatus, device, and storage medium
CN113780101A (zh) * 2021-08-20 2021-12-10 京东鲲鹏(江苏)科技有限公司 Obstacle avoidance model training method and apparatus, electronic device, and storage medium
CN113780480A (zh) * 2021-11-11 2021-12-10 深圳佑驾创新科技有限公司 Method for constructing a YOLOv5-based multi-target detection and category recognition model
CN113837216A (zh) * 2021-06-01 2021-12-24 腾讯科技(深圳)有限公司 Data classification method, training method, apparatus, medium, and electronic device
CN113836338A (zh) * 2021-07-21 2021-12-24 北京邮电大学 Fine-grained image classification method, apparatus, storage medium, and terminal
CN113947701A (zh) * 2021-10-18 2022-01-18 北京百度网讯科技有限公司 Training method, object recognition method, apparatus, electronic device, and storage medium
CN113963148A (zh) * 2021-10-29 2022-01-21 北京百度网讯科技有限公司 Object detection method, and object detection model training method and apparatus
CN113962965A (zh) * 2021-10-26 2022-01-21 腾讯科技(深圳)有限公司 Image quality evaluation method, apparatus, device, and storage medium
CN114332547A (zh) * 2022-03-17 2022-04-12 浙江太美医疗科技股份有限公司 Medical target classification method and apparatus, electronic device, and storage medium
CN114549938A (zh) * 2022-04-25 2022-05-27 广州市玄武无线科技股份有限公司 Model training method, image information management method, image recognition method, and apparatus
CN114972725A (zh) * 2021-12-30 2022-08-30 华为技术有限公司 Model training method, readable medium, and electronic device
CN115270848A (zh) * 2022-06-17 2022-11-01 合肥心之声健康科技有限公司 Intelligent algorithm for automatic PPG-to-ECG conversion, storage medium, and computer system
CN115294396A (zh) * 2022-08-12 2022-11-04 北京百度网讯科技有限公司 Backbone network training method and image classification method
CN115331062A (zh) * 2022-08-29 2022-11-11 北京达佳互联信息技术有限公司 Image recognition method, apparatus, electronic device, and computer-readable storage medium
CN115529159A (zh) * 2022-08-16 2022-12-27 中国电信股份有限公司 Encrypted traffic detection model training method, apparatus, device, and storage medium
CN115601618A (zh) * 2022-11-29 2023-01-13 浙江华是科技股份有限公司 Magnetic core defect detection method, system, and computer storage medium
CN115793490A (zh) * 2023-02-06 2023-03-14 南通弈匠智能科技有限公司 Big-data-based smart home energy-saving control method
CN116468973A (zh) * 2023-06-09 2023-07-21 深圳比特微电子科技有限公司 Training method and apparatus for a target detection model for low-illumination images
CN116663650A (zh) * 2023-06-06 2023-08-29 北京百度网讯科技有限公司 Deep learning model training method, target object detection method, and apparatus
CN116935102A (zh) * 2023-06-30 2023-10-24 上海蜜度信息技术有限公司 Lightweight model training method, apparatus, device, and medium
WO2023216251A1 (zh) * 2022-05-13 2023-11-16 华为技术有限公司 Map generation method, model training method, readable medium, and electronic device
CN117282687A (zh) * 2023-10-18 2023-12-26 广州市普理司科技有限公司 Automatic reject-and-replace mark control system for visual inspection of printed products
CN113836338B (zh) * 2021-07-21 2024-05-24 北京邮电大学 Fine-grained image classification method, apparatus, storage medium, and terminal

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821207B (zh) * 2022-06-30 2022-11-04 浙江凤凰云睿科技有限公司 一种图像分类方法、装置、存储介质及终端
CN115439699B (zh) * 2022-10-25 2023-06-30 北京鹰瞳科技发展股份有限公司 目标检测模型的训练方法、目标检测的方法及相关产品
CN116486134A (zh) * 2023-03-02 2023-07-25 哈尔滨市科佳通用机电股份有限公司 基于深度神经网络的列车制动软管挂钩脱出故障检测方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107134144A (zh) * 2017-04-27 2017-09-05 武汉理工大学 Vehicle detection method for traffic monitoring
CN109522967A (zh) * 2018-11-28 2019-03-26 广州逗号智能零售有限公司 Commodity positioning and recognition method, apparatus, device, and storage medium
US20190251333A1 (en) * 2017-06-02 2019-08-15 Tencent Technology (Shenzhen) Company Limited Face detection training method and apparatus, and electronic device
CN110189317A (zh) * 2019-05-30 2019-08-30 上海卡罗网络科技有限公司 Deep-learning-based intelligent road image collection and recognition method
CN110298266A (zh) * 2019-06-10 2019-10-01 天津大学 Deep neural network target detection method based on multi-scale receptive-field feature fusion
CN110349147A (zh) * 2019-07-11 2019-10-18 腾讯医疗健康(深圳)有限公司 Model training method, fundus macular region lesion recognition method, apparatus, and device

Also Published As

Publication number Publication date
CN114424253A (zh) 2022-04-29

Similar Documents

Publication Publication Date Title
WO2021087985A1 (zh) Model training method, device, storage medium and electronic equipment
WO2021057848A1 (zh) Network training method, image processing method, network, terminal device, and medium
WO2020192483A1 (zh) Image display method and device
CN106845487B (zh) End-to-end license plate recognition method
WO2019233297A1 (zh) Dataset construction method, mobile terminal, and readable storage medium
WO2019233392A1 (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
US20190213474A1 (en) Frame selection based on a trained neural network
WO2022067668A1 (zh) Fire detection method and system based on video-image target detection, terminal, and storage medium
WO2021047408A1 (zh) Image processing method and apparatus, storage medium, and electronic device
WO2020001196A1 (zh) Image processing method, electronic device, and computer-readable storage medium
CN110929785B (zh) Data classification method and apparatus, terminal device, and readable storage medium
CN111209970A (zh) Video classification method, apparatus, storage medium, and server
CN112602319B (zh) Focusing apparatus and method, and related device
WO2022082999A1 (zh) Object recognition method and apparatus, terminal device, and storage medium
WO2021238586A1 (zh) Training method, apparatus, and device, and computer-readable storage medium
WO2021134485A1 (zh) Video scoring method and apparatus, storage medium, and electronic device
CN111325181B (zh) State monitoring method and apparatus, electronic device, and storage medium
CN115100732A (zh) Phishing detection method and apparatus, computer device, and storage medium
CN116863286A (zh) Dual-stream target detection method and model construction method therefor
CN111597937B (zh) Fish posture recognition method, apparatus, device, and storage medium
CN111753775B (zh) Fish growth assessment method, apparatus, device, and storage medium
CN111428567B (zh) Pedestrian tracking system and method based on affine multi-task regression
CN115690747B (zh) Vehicle blind-area detection model testing method and apparatus, electronic device, and storage medium
US20230336878A1 (en) Photographing mode determination method and apparatus, and electronic device and storage medium
CN114170271B (zh) Multi-target tracking method with self-tracking awareness, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19951862

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19951862

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.11.2022)