CN111160434A - Training method and device of target detection model and computer readable storage medium - Google Patents

Training method and device of target detection model and computer readable storage medium Download PDF

Info

Publication number
CN111160434A
Authority
CN
China
Prior art keywords
target detection
target
data set
detection model
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911323856.1A
Other languages
Chinese (zh)
Inventor
赖丹宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN201911323856.1A
Publication of CN111160434A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Abstract

The invention relates to biometric recognition technology, and discloses a training method and device of a target detection model and a computer readable storage medium, wherein the training method of the target detection model comprises the following steps: providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer; acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories; cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories; training the backbone network of the target detection model by using the classification data set; freezing the backbone network and fine-tuning the detection layer of the target detection model according to the target detection data set; receiving an image of a target object to be identified; and identifying the image of the target object to be identified based on the trained target detection model. The invention improves the training speed and precision of the target detection model, shortens the training time and improves the training efficiency.

Description

Training method and device of target detection model and computer readable storage medium
Technical Field
The invention relates to the technical field of biometric recognition, and in particular to a training method and device for a target detection model and a computer readable storage medium.
Background
The task of target detection is to find all objects of interest in an image and determine their positions and sizes. Because objects vary in appearance, shape and pose, and imaging is further disturbed by factors such as illumination and occlusion, target detection has long been one of the most challenging problems in the field of machine vision. Target detection has many mature applications in computer vision, such as face detection, pedestrian detection, image retrieval and video surveillance.
Existing target detection methods mainly transfer classification models pre-trained on ImageNet and fine-tune them during training. The types and applicability of such pre-trained models are limited: to identify the category of a target object, the network must be redesigned and trained on a large data set such as ImageNet, and the specific position of the target object must also be known, which makes the approach time-consuming.
Disclosure of Invention
The invention provides a training method and device for a target detection model and a computer readable storage medium, the main aim of which is to train the backbone network and the detection layer of the detection model separately, using a classification data set, without requiring information about the specific position of the target object, thereby shortening the training time.
In order to achieve the above object, the present invention provides a training method of a target detection model, including:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
Optionally, the step of fine-tuning the detection layer of the target detection model according to the target detection data set includes:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
Optionally, the step of acquiring a target detection data set includes:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
Optionally, the step of acquiring a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
Optionally, the step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class includes:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
The present invention also provides an electronic device, including a memory and a processor, where the memory stores a training program of an object detection model that is executable on the processor, and the training program of the object detection model, when executed by the processor, implements the following steps:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
Optionally, the step of fine-tuning the detection layer of the target detection model according to the target detection data set includes:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
Optionally, the step of acquiring a target detection data set includes:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
Optionally, the step of acquiring a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, which stores thereon a training program of an object detection model, the training program of the object detection model being executable by one or more processors to implement the steps of the training method of the object detection model described above.
The training method and device of the target detection model and the computer readable storage medium provided by the invention train the backbone network and the detection layer of the detection model separately, using the classification data set, without requiring information about the specific position of the target object, thereby improving the training speed and precision of the target detection model, shortening the training time and improving the training efficiency.
Drawings
Fig. 1 is a schematic flowchart of a training method of a target detection model according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating an internal structure of an electronic device according to an embodiment of the invention;
fig. 3 is a block diagram of program modules of the training program of the target detection model in an electronic device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a training method of a target detection model. Fig. 1 is a schematic flow chart of a training method of a target detection model according to an embodiment of the present invention. The method may be performed by a device, which may be implemented by software and/or hardware, and in this embodiment, the device is an intelligent terminal.
In this embodiment, the training method of the target detection model includes:
s101, providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
s102, acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
s103, cutting out images of different target objects through a frame based on the target detection data set to form classification data sets of different categories;
s104, training a backbone network of the target detection model by using a classification data set;
s105, freezing the backbone network and finely adjusting a detection layer of the target detection model according to the target detection data set; specifically, freezing the backbone network means not updating the model parameters of the backbone network, and updating the model parameters of the detection layer of the target detection model according to the target detection data set means updating the model parameters of the detection layer according to the target detection data set until the loss function of the target detection model does not decrease;
s106, receiving an image of a target object to be identified;
and S107, identifying the image of the target object to be identified based on the trained target detection model.
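For illustration only, the following is a minimal Python sketch of the two-stage training described in steps S101 to S107. PyTorch is assumed as the framework, and every name in it (Detector, crop_by_boxes, train_backbone, finetune_detection_layer, the layer sizes and the hyperparameters) is a hypothetical choice rather than the disclosed implementation; the stopping criterion is simplified to a fixed number of epochs instead of running until the loss no longer decreases.

```python
# Minimal sketch of the two-stage training (steps S101-S107); PyTorch is assumed,
# and every name below (Detector, crop_by_boxes, ...) is hypothetical.
import torch
import torch.nn as nn

def crop_by_boxes(detection_samples):
    """Step S103: crop each annotated box out of its image to build a classification set."""
    crops = []
    for image, boxes, labels in detection_samples:          # image: CxHxW tensor
        for (x1, y1, x2, y2), label in zip(boxes, labels):
            crops.append((image[:, y1:y2, x1:x2], label))
    return crops

class Detector(nn.Module):
    """Step S101: a detection model made of a backbone and a detection layer."""
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(                       # feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),
        )
        self.detect_head = nn.Conv2d(64, num_classes + 4, 1)  # class scores + box offsets

def train_backbone(model, classification_loader, epochs=5):
    """Step S104: train only the backbone, with a temporary classifier on top.
    The loader is assumed to yield fixed-size crop batches with class labels."""
    classifier = nn.Linear(64 * 7 * 7, 2)                    # e.g. head / non-head
    params = list(model.backbone.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for crops, labels in classification_loader:
            feats = model.backbone(crops).flatten(1)
            loss = loss_fn(classifier(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()

def finetune_detection_layer(model, detection_loader, detection_loss_fn, epochs=5):
    """Step S105: freeze the backbone and update only the detection layer."""
    for p in model.backbone.parameters():
        p.requires_grad = False                              # freezing = no parameter updates
    opt = torch.optim.SGD(model.detect_head.parameters(), lr=1e-3)
    for _ in range(epochs):
        for images, targets in detection_loader:
            preds = model.detect_head(model.backbone(images))
            loss = detection_loss_fn(preds, targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
```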
The step of fine-tuning the detection layer of the target detection model according to the target detection dataset comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
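Purely as an illustration of one possible reading of this step, the sketch below interprets the comparison with the reference pixel value as splitting the data by the total number of pixels per image and routes the two resulting subsets to the two detection sublayers; the reference value, the split criterion and all names are assumptions, not the disclosed implementation.

```python
# Hypothetical split of the classification data set around a reference pixel value,
# here read as the number of pixels per image; images above the reference are used
# to fine-tune the first detection sublayer, the others the second sublayer.
REFERENCE_PIXELS = 96 * 96   # assumed reference value

def split_by_reference(samples, reference=REFERENCE_PIXELS):
    above, below_or_equal = [], []
    for image, target in samples:                            # image: CxHxW tensor
        pixels = image.shape[-1] * image.shape[-2]
        (above if pixels > reference else below_or_equal).append((image, target))
    return above, below_or_equal

# above, below_or_equal = split_by_reference(classification_samples)
# fine_tune(model.first_detection_sublayer, above)           # hypothetical helper
# fine_tune(model.second_detection_sublayer, below_or_equal)
```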
The step of obtaining a target detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
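As one possible realization of these two sub-steps, the sketch below uses OpenCV to extract every frame of a received video and writes the frames to disk so that the human heads can then be labeled with a data labeling tool; the file paths and naming scheme are placeholders.

```python
# Extract every frame of a video so that human heads can be labeled afterwards
# with a data labeling tool; paths and file names below are placeholders.
import os
import cv2

def extract_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:                                       # end of the video reached
            break
        cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
        index += 1
    cap.release()
    return index                                         # number of extracted frames

# extract_frames("meeting_room.mp4", "frames/")          # frames are then annotated manually
```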
The step of obtaining a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
The step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class comprises:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
The classification loss rate is obtained based on the ratio of the number of lost features of each image sample to the number of originally included features.
In the process of classifying multiple image samples, some features may be lost from an image sample. Assuming that the number of features originally included in an image sample is a1 and the number of features lost in the classification process is a2, the classification loss rate is the ratio a2/a1 of the number of lost features to the number of originally included features. It will be appreciated that the classification loss rate is relatively high when many features are lost and relatively low when few features are lost.
After the classification loss rate of each image sample is obtained, the image samples with a classification loss rate larger than a preset threshold value are classified into the complex image sample class, and the image samples with a classification loss rate smaller than or equal to the preset threshold value are classified into the simple image sample class, so that the image samples are graded, and the complex image sample class is used preferentially for training the neural network in the deep learning process.
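A minimal sketch of this grading step is given below, assuming that the number of features originally contained in each sample and the number lost during classification are already known; the threshold value and all names are illustrative assumptions.

```python
# Classification loss rate = lost features / originally included features (a2 / a1);
# samples whose rate exceeds the threshold are graded as "complex", the rest as "simple".
def classification_loss_rate(original_count, lost_count):
    return lost_count / original_count if original_count else 0.0

def grade_samples(samples, threshold=0.3):
    """samples: iterable of (sample_id, original_feature_count, lost_feature_count)."""
    complex_class, simple_class = [], []
    for sample_id, a1, a2 in samples:
        rate = classification_loss_rate(a1, a2)
        (complex_class if rate > threshold else simple_class).append(sample_id)
    return complex_class, simple_class

# complex_ids, simple_ids = grade_samples([("img_001", 120, 48), ("img_002", 120, 12)])
# the complex class would then be used preferentially when training the network
```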
The step of cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories comprises:
labeling cropping targets of a plurality of image samples contained in the complex image sample class based on the complex image features;
labeling, based on the simple image features, crop targets of a plurality of image samples included in the simple image sample class.
The convolution layer of the target detection model is located in a convolutional neural network, each convolution layer of the convolutional neural network is composed of a plurality of convolution units, and parameters of each convolution unit are obtained through optimization of a back propagation algorithm.
The purpose of the convolution operation is to extract different features of the input. The first convolution layer may only extract low-level features such as edges, lines and corners; networks with more layers can iteratively extract more complex features from these low-level features.
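As a generic illustration of this point (not the specific network of the embodiment), the snippet below stacks two convolution layers whose kernel parameters are optimized through a backward pass; the layer sizes and the dummy loss are arbitrary.

```python
# Generic example: convolution-unit parameters optimized by backpropagation.
import torch
import torch.nn as nn

conv_stack = nn.Sequential(                       # early layers tend to capture edges,
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),    # lines and corners; deeper layers
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),   # build more complex features on them
)
optimizer = torch.optim.SGD(conv_stack.parameters(), lr=1e-2)

x = torch.randn(4, 3, 64, 64)                     # a dummy batch of images
loss = conv_stack(x).mean()                       # placeholder loss for demonstration
loss.backward()                                   # gradients for every convolution unit
optimizer.step()                                  # parameters updated via backpropagation
```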
Target detection has many mature applications in the field of computer vision, such as face detection, pedestrian detection, image retrieval and video surveillance.
The training method of the target detection model can be applied to the field of human head detection. In a meeting room scene, the attendance rate is counted automatically according to the number of human heads detected in the meeting room, which avoids counting people manually and saves a large amount of time and manpower. By framing the specific positions of the detected human heads in the picture, the results can be visualized and it can be verified whether the obtained head count is correct. The model only needs to distinguish two classes, human head and non-human head. For example, classroom videos of students are collected, each video is converted into individual frame pictures, and a data labeling tool is used to label all the human heads in the pictures to generate a training data set.
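Purely to illustrate the meeting-room application described above, the sketch below counts the heads returned by a trained detector, draws their boxes on the frame for visual verification, and computes an attendance rate; the detect_heads interface and the registered head count are assumptions, not part of the disclosure.

```python
# Hypothetical use of the trained model for head counting in a meeting room;
# detect_heads stands in for the trained detector and is an assumed interface.
import cv2

def count_attendance(frame, detect_heads, registered_count):
    boxes = detect_heads(frame)                              # list of (x1, y1, x2, y2) head boxes
    for (x1, y1, x2, y2) in boxes:                           # frame each detected head so the
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)  # count can be checked visually
    rate = len(boxes) / registered_count if registered_count else 0.0
    return len(boxes), rate, frame
```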
The training method for the target detection model provided by the embodiment utilizes the classification data set to separately train the backbone network and the detection layer of the detection model, does not need to detect the information of the specific position of the target object, improves the training speed and precision of the target detection model, shortens the training time, and improves the training efficiency.
The invention also provides an electronic device 1. Fig. 2 is a schematic view of an internal structure of an electronic device according to an embodiment of the invention.
In this embodiment, the electronic device 1 may be a computer, an intelligent terminal or a server. The electronic device 1 comprises at least a memory 11, a processor 13, a communication bus 15, and a network interface 17. In this embodiment, the electronic device 1 is an intelligent terminal.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a hard disk of the electronic device. The memory 11 may in other embodiments be an external storage device of the electronic apparatus, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the electronic apparatus. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic apparatus. The memory 11 may be used not only to store application software installed in the electronic apparatus 1 and various types of data, such as the code of the training program 111 of the target detection model, but also to temporarily store data that has been output or is to be output.
The processor 13 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, and is used for executing program codes stored in the memory 11 or processing data.
The communication bus 15 is used to realize connection communication between these components.
The network interface 17 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the electronic apparatus 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface may also comprise a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device and for displaying a visualized user interface.
While FIG. 2 shows only the electronic device 1 with the components 11-17, those skilled in the art will appreciate that the configuration shown in FIG. 2 does not constitute a limitation of the electronic device, and may include fewer or more components than shown, or some components in combination, or a different arrangement of components.
In the embodiment of the electronic device 1 shown in fig. 2, the memory 11 stores therein a training program 111 of the object detection model; the processor 13 implements the following steps when executing the training program 111 of the object detection model stored in the memory 11:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set; specifically, freezing the backbone network means that the model parameters of the backbone network are not updated, while fine-tuning the detection layer means updating the model parameters of the detection layer according to the target detection data set until the loss function of the target detection model no longer decreases;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
The step of fine-tuning the detection layer of the target detection model according to the target detection dataset comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
The step of obtaining a target detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
The step of obtaining a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
The step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class comprises:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
The classification loss rate is obtained based on the ratio of the number of lost features of each image sample to the number of originally included features.
In the process of classifying multiple image samples, some features may be lost from an image sample. Assuming that the number of features originally included in an image sample is a1 and the number of features lost in the classification process is a2, the classification loss rate is the ratio a2/a1 of the number of lost features to the number of originally included features. It will be appreciated that the classification loss rate is relatively high when many features are lost and relatively low when few features are lost.
After the classification loss rate of each image sample is obtained, the image samples with a classification loss rate larger than a preset threshold value are classified into the complex image sample class, and the image samples with a classification loss rate smaller than or equal to the preset threshold value are classified into the simple image sample class, so that the image samples are graded, and the complex image sample class is used preferentially for training the neural network in the deep learning process.
The step of cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories comprises:
labeling cropping targets of a plurality of image samples contained in the complex image sample class based on the complex image features;
labeling, based on the simple image features, crop targets of a plurality of image samples included in the simple image sample class.
The convolution layer of the target detection model is located in a convolutional neural network, each convolution layer of the convolutional neural network is composed of a plurality of convolution units, and parameters of each convolution unit are obtained through optimization of a back propagation algorithm.
The purpose of the convolution operation is to extract different features of the input. The first convolution layer may only extract low-level features such as edges, lines and corners; networks with more layers can iteratively extract more complex features from these low-level features.
Target detection has many mature applications in the field of computer vision, such as face detection, pedestrian detection, image retrieval and video surveillance.
The training method of the target detection model can be applied to the field of human head detection. In a meeting room scene, the attendance rate is counted automatically according to the number of human heads detected in the meeting room, which avoids counting people manually and saves a large amount of time and manpower. By framing the specific positions of the detected human heads in the picture, the results can be visualized and it can be verified whether the obtained head count is correct. The model only needs to distinguish two classes, human head and non-human head. For example, classroom videos of students are collected, each video is converted into individual frame pictures, and a data labeling tool is used to label all the human heads in the pictures to generate a training data set.
The electronic device provided by the embodiment separately trains the backbone network and the detection layer of the detection model by using the classification data set, does not need to detect the information of the specific position of the target object, improves the training speed and precision of the target detection model, shortens the training time, and improves the training efficiency.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has stored thereon a training program 111 of an object detection model, and the training program 111 of the object detection model is executable by one or more processors to implement the following operations:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
The embodiment of the computer readable storage medium of the present invention is substantially the same as the embodiments of the electronic device and the method, and will not be repeated here.
Alternatively, in other embodiments, the training program 111 of the target detection model may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 13) to implement the present invention. The module referred to in the present invention is a series of computer program instruction segments capable of performing a specific function, which are used to describe the execution process of the training program of the target detection model in the electronic device.
For example, referring to fig. 3, a schematic diagram of program modules of a training program 111 of an object detection model in an embodiment of the electronic device of the present invention is shown, in this embodiment, the training program 111 of the object detection model may be divided into a providing module 10, an obtaining module 20, a clipping module 30, a training module 40, a freezing module 50, a receiving module 60, and a recognition module 70, which exemplarily:
the providing module 10 is configured to provide a target detection model, where the target detection model includes a backbone network and a detection layer;
the acquiring module 20 is configured to acquire a target detection data set, where the target detection data set includes target object images of different categories;
the clipping module 30 is configured to crop out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
the training module 40 is configured to train a backbone network of the target detection model by using a classification data set;
the freezing module 50 is configured to freeze the backbone network and fine-tune a detection layer of the target detection model according to the target detection data set; specifically, freezing the backbone network means that the model parameters of the backbone network are not updated, while fine-tuning the detection layer means updating the model parameters of the detection layer according to the target detection data set until the loss function of the target detection model no longer decreases;
the receiving module 60 is configured to receive an image of a target object to be identified;
the recognition module 70 is configured to recognize the image of the target object to be recognized based on the trained target detection model.
The functions or operation steps implemented when the program modules such as the providing module 10, the obtaining module 20, the clipping module 30, the training module 40, the freezing module 50, the receiving module 60, and the identifying module 70 are executed are substantially the same as those of the above embodiments, and are not described herein again.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A training method of an object detection model is characterized in that the training method of the object detection model comprises the following steps:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
2. The method of claim 1, wherein the step of fine-tuning the detection layer of the object detection model according to the object detection data set comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
3. The method of training an object detection model of claim 1, wherein the step of acquiring an object detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
4. The method of training an object detection model of claim 3, wherein the step of obtaining an object detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
5. The method of claim 4, wherein the step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class comprises:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
6. An electronic device, comprising a memory and a processor, wherein the memory stores a training program of an object detection model executable on the processor, and the training program of the object detection model when executed by the processor implements the steps of:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
7. The electronic device of claim 6, wherein the step of fine-tuning a detection layer of the object detection model according to the object detection dataset comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
8. The electronic device of claim 6, wherein the step of acquiring a target detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
9. The electronic device of claim 8, wherein the step of acquiring a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
10. A computer-readable storage medium, having stored thereon a training program of an object detection model, the training program of the object detection model being executable by one or more processors to implement the steps of the training method of the object detection model according to any one of claims 1 to 5.
CN201911323856.1A 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium Pending CN111160434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911323856.1A CN111160434A (en) 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911323856.1A CN111160434A (en) 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111160434A true CN111160434A (en) 2020-05-15

Family

ID=70557483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911323856.1A Pending CN111160434A (en) 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111160434A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101922964B1 (en) * 2017-06-27 2018-11-28 아주대학교산학협력단 Apparatus and method for recovering image using image distortion detection
EP3579147A1 (en) * 2018-06-08 2019-12-11 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and electronic device
CN110211173A (en) * 2019-04-03 2019-09-06 中国地质调查局发展研究中心 A kind of paleontological fossil positioning and recognition methods based on deep learning
CN110533051A (en) * 2019-08-02 2019-12-03 中国民航大学 Contraband automatic testing method in X-ray safety check image based on convolutional neural networks
CN110503097A (en) * 2019-08-27 2019-11-26 腾讯科技(深圳)有限公司 Training method, device and the storage medium of image processing model
CN110533103A (en) * 2019-08-30 2019-12-03 的卢技术有限公司 A kind of lightweight wisp object detection method and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488098A (en) * 2020-11-16 2021-03-12 浙江新再灵科技股份有限公司 Training method of target detection model
CN112580734A (en) * 2020-12-25 2021-03-30 深圳市优必选科技股份有限公司 Target detection model training method, system, terminal device and storage medium
CN112580734B (en) * 2020-12-25 2023-12-29 深圳市优必选科技股份有限公司 Target detection model training method, system, terminal equipment and storage medium
WO2022156061A1 (en) * 2021-01-22 2022-07-28 平安科技(深圳)有限公司 Image model training method and apparatus, electronic device, and storage medium
CN112749802A (en) * 2021-01-25 2021-05-04 深圳力维智联技术有限公司 Neural network model training method and device and computer readable storage medium
CN112749802B (en) * 2021-01-25 2024-02-09 深圳力维智联技术有限公司 Training method and device for neural network model and computer readable storage medium
WO2022252089A1 (en) * 2021-05-31 2022-12-08 京东方科技集团股份有限公司 Training method for object detection model, and object detection method and device
CN113361487A (en) * 2021-07-09 2021-09-07 无锡时代天使医疗器械科技有限公司 Foreign matter detection method, device, equipment and computer readable storage medium
CN114140637A (en) * 2021-10-21 2022-03-04 阿里巴巴达摩院(杭州)科技有限公司 Image classification method, storage medium and electronic device
CN114140637B (en) * 2021-10-21 2023-09-12 阿里巴巴达摩院(杭州)科技有限公司 Image classification method, storage medium and electronic device
CN115100419A (en) * 2022-07-20 2022-09-23 中国科学院自动化研究所 Target detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111160434A (en) Training method and device of target detection model and computer readable storage medium
CN107895367B (en) Bone age identification method and system and electronic equipment
WO2019109526A1 (en) Method and device for age recognition of face image, storage medium
CN110705405B (en) Target labeling method and device
CN107223246B (en) Image labeling method and device and electronic equipment
US10635946B2 (en) Eyeglass positioning method, apparatus and storage medium
CN109145759B (en) Vehicle attribute identification method, device, server and storage medium
CN110543857A (en) Contraband identification method, device and system based on image analysis and storage medium
CN110610169B (en) Picture marking method and device, storage medium and electronic device
CN111160169B (en) Face detection method, device, equipment and computer readable storage medium
CN112307853A (en) Detection method of aerial image, storage medium and electronic device
CN109389096B (en) Detection method and device
CN110020653A (en) Image, semantic dividing method, device and computer readable storage medium
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN111104841A (en) Violent behavior detection method and system
CN110796069A (en) Behavior detection method, system, equipment and machine readable medium
CN112464890A (en) Face recognition control method, device, equipment and storage medium
CN111353429A (en) Interest degree method and system based on eyeball turning
CN113011403B (en) Gesture recognition method, system, medium and device
CN111291761B (en) Method and device for recognizing text
CN106339684A (en) Pedestrian detection method, device and vehicle
CN111401438B (en) Image sorting method, device and system
CN110796071B (en) Behavior detection method, system, machine-readable medium and device
CN112215221A (en) Automatic vehicle frame number identification method
CN116824135A (en) Atmospheric natural environment test industrial product identification and segmentation method based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination