CN111160434A - Training method and device of target detection model and computer readable storage medium - Google Patents

Training method and device of target detection model and computer readable storage medium Download PDF

Info

Publication number
CN111160434A
Authority
CN
China
Prior art keywords
target detection
target
data set
detection model
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911323856.1A
Other languages
Chinese (zh)
Inventor
赖丹宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN201911323856.1A
Publication of CN111160434A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Abstract

The invention relates to biometric recognition technology, and discloses a training method and device of a target detection model and a computer readable storage medium, wherein the training method of the target detection model comprises the following steps: providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer; acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories; cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories; training the backbone network of the target detection model by using the classification data set; freezing the backbone network and fine-tuning the detection layer of the target detection model according to the target detection data set; receiving an image of a target object to be identified; and identifying the image of the target object to be identified based on the trained target detection model. The invention improves the training speed and precision of the target detection model, shortens the training time and improves the training efficiency.

Description

Training method and device of target detection model and computer readable storage medium
Technical Field
The invention relates to the technical field of biometric recognition, and in particular to a training method and device for a target detection model and a computer readable storage medium.
Background
The task of target detection is to find all objects of interest in an image and determine their positions and sizes. Because objects vary in appearance, shape and pose, and imaging is further disturbed by factors such as illumination and occlusion, target detection has long been one of the most challenging problems in the field of machine vision. Target detection has many mature applications in computer vision, such as face detection, pedestrian detection, image retrieval and video surveillance.
Existing target detection methods mainly transfer classification models pre-trained on ImageNet and fine-tune them during training. The types and applicability of such pre-trained models are limited: to identify the category of a target object, the network must be redesigned and trained on a large data set such as ImageNet, and the specific position of the target object must also be known, which makes the approach time-consuming.
Disclosure of Invention
The invention provides a training method and device for a target detection model and a computer readable storage medium, the main aim of which is to train the backbone network and the detection layer of the detection model separately, using a classification data set, without requiring information about the specific position of the target object, thereby shortening the training time.
In order to achieve the above object, the present invention provides a training method of a target detection model, including:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
Optionally, the step of fine-tuning the detection layer of the target detection model according to the target detection data set includes:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
Optionally, the step of acquiring a target detection data set includes:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
Optionally, the step of acquiring a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
Optionally, the step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class includes:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
The present invention also provides an electronic device, including a memory and a processor, where the memory stores a training program of an object detection model that is executable on the processor, and the training program of the object detection model, when executed by the processor, implements the following steps:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
Optionally, the step of fine-tuning the detection layer of the target detection model according to the target detection data set includes:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
Optionally, the step of acquiring a target detection data set includes:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
Optionally, the step of acquiring a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, which stores thereon a training program of an object detection model, the training program of the object detection model being executable by one or more processors to implement the steps of the training method of the object detection model described above.
The training method and device of the target detection model and the computer readable storage medium provided by the invention train the backbone network and the detection layer of the detection model separately, using the classification data set, without requiring information about the specific position of the target object, thereby improving the training speed and precision of the target detection model, shortening the training time and improving the training efficiency.
Drawings
Fig. 1 is a schematic flowchart of a training method of a target detection model according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating an internal structure of an electronic device according to an embodiment of the invention;
fig. 3 is a block diagram of program modules of the training program of the target detection model in an electronic device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a training method of a target detection model. Fig. 1 is a schematic flow chart of a training method of a target detection model according to an embodiment of the present invention. The method may be performed by a device, which may be implemented by software and/or hardware, and in this embodiment, the device is an intelligent terminal.
In this embodiment, the training method of the target detection model includes:
s101, providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
s102, acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
s103, cutting out images of different target objects through a frame based on the target detection data set to form classification data sets of different categories;
s104, training a backbone network of the target detection model by using a classification data set;
s105, freezing the backbone network and finely adjusting a detection layer of the target detection model according to the target detection data set; specifically, freezing the backbone network means not updating the model parameters of the backbone network, and updating the model parameters of the detection layer of the target detection model according to the target detection data set means updating the model parameters of the detection layer according to the target detection data set until the loss function of the target detection model does not decrease;
s106, receiving an image of a target object to be identified;
and S107, identifying the image of the target object to be identified based on the trained target detection model.
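For illustration only, the following is a minimal Python sketch of the two-stage training described in steps S101 to S107. PyTorch is assumed as the framework, and every name in it (Detector, crop_by_boxes, train_backbone, finetune_detection_layer, the layer sizes and the hyperparameters) is a hypothetical choice rather than the disclosed implementation; the stopping criterion is simplified to a fixed number of epochs instead of running until the loss no longer decreases.

```python
# Minimal sketch of the two-stage training (steps S101-S107); PyTorch is assumed,
# and every name below (Detector, crop_by_boxes, ...) is hypothetical.
import torch
import torch.nn as nn

def crop_by_boxes(detection_samples):
    """Step S103: crop each annotated box out of its image to build a classification set."""
    crops = []
    for image, boxes, labels in detection_samples:          # image: CxHxW tensor
        for (x1, y1, x2, y2), label in zip(boxes, labels):
            crops.append((image[:, y1:y2, x1:x2], label))
    return crops

class Detector(nn.Module):
    """Step S101: a detection model made of a backbone and a detection layer."""
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(                       # feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),
        )
        self.detect_head = nn.Conv2d(64, num_classes + 4, 1)  # class scores + box offsets

def train_backbone(model, classification_loader, epochs=5):
    """Step S104: train only the backbone, with a temporary classifier on top.
    The loader is assumed to yield fixed-size crop batches with class labels."""
    classifier = nn.Linear(64 * 7 * 7, 2)                    # e.g. head / non-head
    params = list(model.backbone.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for crops, labels in classification_loader:
            feats = model.backbone(crops).flatten(1)
            loss = loss_fn(classifier(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()

def finetune_detection_layer(model, detection_loader, detection_loss_fn, epochs=5):
    """Step S105: freeze the backbone and update only the detection layer."""
    for p in model.backbone.parameters():
        p.requires_grad = False                              # freezing = no parameter updates
    opt = torch.optim.SGD(model.detect_head.parameters(), lr=1e-3)
    for _ in range(epochs):
        for images, targets in detection_loader:
            preds = model.detect_head(model.backbone(images))
            loss = detection_loss_fn(preds, targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
```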
The step of fine-tuning the detection layer of the target detection model according to the target detection dataset comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
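Purely as an illustration of one possible reading of this step, the sketch below interprets the comparison with the reference pixel value as splitting the data by the total number of pixels per image and routes the two resulting subsets to the two detection sublayers; the reference value, the split criterion and all names are assumptions, not the disclosed implementation.

```python
# Hypothetical split of the classification data set around a reference pixel value,
# here read as the number of pixels per image; images above the reference are used
# to fine-tune the first detection sublayer, the others the second sublayer.
REFERENCE_PIXELS = 96 * 96   # assumed reference value

def split_by_reference(samples, reference=REFERENCE_PIXELS):
    above, below_or_equal = [], []
    for image, target in samples:                            # image: CxHxW tensor
        pixels = image.shape[-1] * image.shape[-2]
        (above if pixels > reference else below_or_equal).append((image, target))
    return above, below_or_equal

# above, below_or_equal = split_by_reference(classification_samples)
# fine_tune(model.first_detection_sublayer, above)           # hypothetical helper
# fine_tune(model.second_detection_sublayer, below_or_equal)
```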
The step of obtaining a target detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
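As one possible realization of these two sub-steps, the sketch below uses OpenCV to extract every frame of a received video and writes the frames to disk so that the human heads can then be labeled with a data labeling tool; the file paths and naming scheme are placeholders.

```python
# Extract every frame of a video so that human heads can be labeled afterwards
# with a data labeling tool; paths and file names below are placeholders.
import os
import cv2

def extract_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:                                       # end of the video reached
            break
        cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
        index += 1
    cap.release()
    return index                                         # number of extracted frames

# extract_frames("meeting_room.mp4", "frames/")          # frames are then annotated manually
```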
The step of obtaining a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
The step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class comprises:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
The classification loss rate is obtained based on the ratio of the number of lost features of each image sample to the number of originally included features.
In the process of classifying multiple image samples, some features may be lost from an image sample. Assuming that the number of features originally included in an image sample is a1 and the number of features lost in the classification process is a2, the classification loss rate is the ratio a2/a1 of the number of lost features to the number of originally included features. It will be appreciated that the classification loss rate is relatively high when many features are lost and relatively low when few features are lost.
After the classification loss rate of each image sample is obtained, the image samples with a classification loss rate larger than a preset threshold value are classified into the complex image sample class, and the image samples with a classification loss rate smaller than or equal to the preset threshold value are classified into the simple image sample class, so that the image samples are graded, and the complex image sample class is used preferentially for training the neural network in the deep learning process.
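A minimal sketch of this grading step is given below, assuming that the number of features originally contained in each sample and the number lost during classification are already known; the threshold value and all names are illustrative assumptions.

```python
# Classification loss rate = lost features / originally included features (a2 / a1);
# samples whose rate exceeds the threshold are graded as "complex", the rest as "simple".
def classification_loss_rate(original_count, lost_count):
    return lost_count / original_count if original_count else 0.0

def grade_samples(samples, threshold=0.3):
    """samples: iterable of (sample_id, original_feature_count, lost_feature_count)."""
    complex_class, simple_class = [], []
    for sample_id, a1, a2 in samples:
        rate = classification_loss_rate(a1, a2)
        (complex_class if rate > threshold else simple_class).append(sample_id)
    return complex_class, simple_class

# complex_ids, simple_ids = grade_samples([("img_001", 120, 48), ("img_002", 120, 12)])
# the complex class would then be used preferentially when training the network
```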
The step of cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories comprises:
labeling cropping targets of a plurality of image samples contained in the complex image sample class based on the complex image features;
labeling, based on the simple image features, crop targets of a plurality of image samples included in the simple image sample class.
The convolution layer of the target detection model is located in a convolutional neural network, each convolution layer of the convolutional neural network is composed of a plurality of convolution units, and parameters of each convolution unit are obtained through optimization of a back propagation algorithm.
The purpose of the convolution operation is to extract different features of the input. The first convolution layer may only extract low-level features such as edges, lines and corners; networks with more layers can iteratively extract more complex features from these low-level features.
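As a generic illustration of this point (not the specific network of the embodiment), the snippet below stacks two convolution layers whose kernel parameters are optimized through a backward pass; the layer sizes and the dummy loss are arbitrary.

```python
# Generic example: convolution-unit parameters optimized by backpropagation.
import torch
import torch.nn as nn

conv_stack = nn.Sequential(                       # early layers tend to capture edges,
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),    # lines and corners; deeper layers
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),   # build more complex features on them
)
optimizer = torch.optim.SGD(conv_stack.parameters(), lr=1e-2)

x = torch.randn(4, 3, 64, 64)                     # a dummy batch of images
loss = conv_stack(x).mean()                       # placeholder loss for demonstration
loss.backward()                                   # gradients for every convolution unit
optimizer.step()                                  # parameters updated via backpropagation
```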
Target detection has many mature applications in the field of computer vision, such as face detection, pedestrian detection, image retrieval and video surveillance.
The training method of the target detection model can be applied to the field of human head detection. In a meeting room scene, the attendance rate is counted automatically according to the number of human heads detected in the meeting room, which avoids counting people manually and saves a large amount of time and manpower. By framing the specific positions of the detected human heads in the picture, the results can be visualized and it can be verified whether the obtained head count is correct. The model only needs to distinguish two classes, human head and non-human head. For example, classroom videos of students are collected, each video is converted into individual frame pictures, and a data labeling tool is used to label all the human heads in the pictures to generate a training data set.
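Purely to illustrate the meeting-room application described above, the sketch below counts the heads returned by a trained detector, draws their boxes on the frame for visual verification, and computes an attendance rate; the detect_heads interface and the registered head count are assumptions, not part of the disclosure.

```python
# Hypothetical use of the trained model for head counting in a meeting room;
# detect_heads stands in for the trained detector and is an assumed interface.
import cv2

def count_attendance(frame, detect_heads, registered_count):
    boxes = detect_heads(frame)                              # list of (x1, y1, x2, y2) head boxes
    for (x1, y1, x2, y2) in boxes:                           # frame each detected head so the
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)  # count can be checked visually
    rate = len(boxes) / registered_count if registered_count else 0.0
    return len(boxes), rate, frame
```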
The training method for the target detection model provided by the embodiment utilizes the classification data set to separately train the backbone network and the detection layer of the detection model, does not need to detect the information of the specific position of the target object, improves the training speed and precision of the target detection model, shortens the training time, and improves the training efficiency.
The invention also provides an electronic device 1. Fig. 2 is a schematic view of an internal structure of an electronic device according to an embodiment of the invention.
In this embodiment, the electronic device 1 may be a computer, an intelligent terminal or a server. The electronic device 1 comprises at least a memory 11, a processor 13, a communication bus 15, and a network interface 17. In this embodiment, the electronic device 1 is an intelligent terminal.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a hard disk of the electronic device. The memory 11 may in other embodiments be an external storage device of the electronic apparatus, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the electronic apparatus. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic apparatus. The memory 11 may be used not only to store application software installed in the electronic apparatus 1 and various types of data, such as the code of the training program 111 of the target detection model, but also to temporarily store data that has been output or is to be output.
The processor 13 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, and is used for executing program codes stored in the memory 11 or processing data.
The communication bus 15 is used to realize connection communication between these components.
The network interface 17 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the electronic apparatus 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface may also comprise a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device and for displaying a visualized user interface.
While FIG. 2 shows only the electronic device 1 with the components 11-17, those skilled in the art will appreciate that the configuration shown in FIG. 2 does not constitute a limitation of the electronic device, and may include fewer or more components than shown, or some components in combination, or a different arrangement of components.
In the embodiment of the electronic device 1 shown in fig. 2, the memory 11 stores therein a training program 111 of the object detection model; the processor 13 implements the following steps when executing the training program 111 of the object detection model stored in the memory 11:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set; specifically, freezing the backbone network means that the model parameters of the backbone network are not updated, while fine-tuning the detection layer means updating the model parameters of the detection layer according to the target detection data set until the loss function of the target detection model no longer decreases;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
The step of fine-tuning the detection layer of the target detection model according to the target detection dataset comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
The step of obtaining a target detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
The step of obtaining a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
The step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class comprises:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
The classification loss rate is obtained based on the ratio of the number of lost features of each image sample to the number of originally included features.
In the process of classifying multiple image samples, some features may be lost from an image sample. Assuming that the number of features originally included in an image sample is a1 and the number of features lost in the classification process is a2, the classification loss rate is the ratio a2/a1 of the number of lost features to the number of originally included features. It will be appreciated that the classification loss rate is relatively high when many features are lost and relatively low when few features are lost.
After the classification loss rate of each image sample is obtained, the image samples with a classification loss rate larger than a preset threshold value are classified into the complex image sample class, and the image samples with a classification loss rate smaller than or equal to the preset threshold value are classified into the simple image sample class, so that the image samples are graded, and the complex image sample class is used preferentially for training the neural network in the deep learning process.
The step of cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories comprises:
labeling cropping targets of a plurality of image samples contained in the complex image sample class based on the complex image features;
labeling, based on the simple image features, crop targets of a plurality of image samples included in the simple image sample class.
The convolution layer of the target detection model is located in a convolutional neural network, each convolution layer of the convolutional neural network is composed of a plurality of convolution units, and parameters of each convolution unit are obtained through optimization of a back propagation algorithm.
The purpose of the convolution operation is to extract different features of the input. The first convolution layer may only extract low-level features such as edges, lines and corners; networks with more layers can iteratively extract more complex features from these low-level features.
Target detection has many mature applications in the field of computer vision, such as face detection, pedestrian detection, image retrieval and video surveillance.
The training method of the target detection model can be applied to the field of human head detection. In a meeting room scene, the attendance rate is counted automatically according to the number of human heads detected in the meeting room, which avoids counting people manually and saves a large amount of time and manpower. By framing the specific positions of the detected human heads in the picture, the results can be visualized and it can be verified whether the obtained head count is correct. The model only needs to distinguish two classes, human head and non-human head. For example, classroom videos of students are collected, each video is converted into individual frame pictures, and a data labeling tool is used to label all the human heads in the pictures to generate a training data set.
The electronic device provided by the embodiment separately trains the backbone network and the detection layer of the detection model by using the classification data set, does not need to detect the information of the specific position of the target object, improves the training speed and precision of the target detection model, shortens the training time, and improves the training efficiency.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has stored thereon a training program 111 of an object detection model, and the training program 111 of the object detection model is executable by one or more processors to implement the following operations:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
The embodiment of the computer readable storage medium of the present invention is substantially the same as the embodiments of the electronic device and the method, and will not be repeated here.
Alternatively, in other embodiments, the training program 111 of the target detection model may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 13) to implement the present invention. The module referred to in the present invention is a series of computer program instruction segments capable of performing a specific function, which are used to describe the execution process of the training program of the target detection model in the electronic device.
For example, referring to fig. 3, a schematic diagram of program modules of a training program 111 of an object detection model in an embodiment of the electronic device of the present invention is shown, in this embodiment, the training program 111 of the object detection model may be divided into a providing module 10, an obtaining module 20, a clipping module 30, a training module 40, a freezing module 50, a receiving module 60, and a recognition module 70, which exemplarily:
the providing module 10 is configured to provide a target detection model, where the target detection model includes a backbone network and a detection layer;
the acquiring module 20 is configured to acquire a target detection data set, where the target detection data set includes target object images of different categories;
the clipping module 30 is configured to crop out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
the training module 40 is configured to train a backbone network of the target detection model by using a classification data set;
the freezing module 50 is configured to freeze the backbone network and fine-tune a detection layer of the target detection model according to the target detection data set; specifically, freezing the backbone network means that the model parameters of the backbone network are not updated, while fine-tuning the detection layer means updating the model parameters of the detection layer according to the target detection data set until the loss function of the target detection model no longer decreases;
the receiving module 60 is configured to receive an image of a target object to be identified;
the recognition module 70 is configured to recognize the image of the target object to be recognized based on the trained target detection model.
The functions or operation steps implemented when the program modules such as the providing module 10, the obtaining module 20, the clipping module 30, the training module 40, the freezing module 50, the receiving module 60, and the identifying module 70 are executed are substantially the same as those of the above embodiments, and are not described herein again.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A training method of an object detection model is characterized in that the training method of the object detection model comprises the following steps:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
2. The method of claim 1, wherein the step of fine-tuning the detection layer of the object detection model according to the object detection data set comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
3. The method of training an object detection model of claim 1, wherein the step of acquiring an object detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
4. The method of training an object detection model of claim 3, wherein the step of obtaining an object detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
5. The method of claim 4, wherein the step of classifying the plurality of different image samples into a complex image sample class and a simple image sample class comprises:
obtaining the classification loss rate of the plurality of different image samples;
and classifying the samples with the classification loss rate larger than a preset threshold value into a complex image sample class, and classifying the samples with the classification loss rate smaller than or equal to the preset threshold value into a simple image sample class.
6. An electronic device, comprising a memory and a processor, wherein the memory stores a training program of an object detection model executable on the processor, and the training program of the object detection model when executed by the processor implements the steps of:
providing a target detection model, wherein the target detection model comprises a backbone network and a detection layer;
acquiring a target detection data set, wherein the target detection data set comprises target object images of different categories;
cropping out images of the different target objects by their bounding boxes, based on the target detection data set, to form classification data sets of different categories;
training a backbone network of the target detection model by using the classification dataset;
freezing the backbone network and fine-tuning a detection layer of the target detection model according to the target detection data set;
receiving an image of a target object to be identified;
and identifying the image of the target object to be identified based on the trained target detection model.
7. The electronic device of claim 6, wherein the step of fine-tuning a detection layer of the object detection model according to the object detection dataset comprises:
the detection layer comprises a first detection sublayer and a second detection sublayer; the first detection sublayer is fine-tuned on the images in the classification data set whose pixel values are larger than a reference pixel value, and the second detection sublayer is fine-tuned on the images in the classification data set whose pixel values are less than or equal to the reference pixel value.
8. The electronic device of claim 6, wherein the step of acquiring a target detection data set comprises:
receiving video data and extracting every frame of the video data as a picture;
and labeling the human heads in each frame picture using a data labeling tool, so as to generate the target detection data set.
9. The electronic device of claim 8, wherein the step of acquiring a target detection data set further comprises:
classifying the plurality of different image samples into a complex image sample class and a simple image sample class;
extracting complex image features according to a plurality of image samples contained in the complex image sample class;
and extracting simple image features according to the plurality of image samples contained in the simple image sample class and the extracted complex image features.
10. A computer-readable storage medium, having stored thereon a training program of an object detection model, the training program of the object detection model being executable by one or more processors to implement the steps of the training method of the object detection model according to any one of claims 1 to 5.
CN201911323856.1A 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium Pending CN111160434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911323856.1A CN111160434A (en) 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911323856.1A CN111160434A (en) 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111160434A true CN111160434A (en) 2020-05-15

Family

ID=70557483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911323856.1A Pending CN111160434A (en) 2019-12-19 2019-12-19 Training method and device of target detection model and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111160434A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101922964B1 (en) * 2017-06-27 2018-11-28 아주대학교산학협력단 Apparatus and method for recovering image using image distortion detection
EP3579147A1 (en) * 2018-06-08 2019-12-11 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and electronic device
CN110211173A (en) * 2019-04-03 2019-09-06 中国地质调查局发展研究中心 A kind of paleontological fossil positioning and recognition methods based on deep learning
CN110533051A (en) * 2019-08-02 2019-12-03 中国民航大学 Contraband automatic testing method in X-ray safety check image based on convolutional neural networks
CN110503097A (en) * 2019-08-27 2019-11-26 腾讯科技(深圳)有限公司 Training method, device and the storage medium of image processing model
CN110533103A (en) * 2019-08-30 2019-12-03 的卢技术有限公司 A kind of lightweight wisp object detection method and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488098A (en) * 2020-11-16 2021-03-12 浙江新再灵科技股份有限公司 Training method of target detection model
CN112580734A (en) * 2020-12-25 2021-03-30 深圳市优必选科技股份有限公司 Target detection model training method, system, terminal device and storage medium
CN112580734B (en) * 2020-12-25 2023-12-29 深圳市优必选科技股份有限公司 Target detection model training method, system, terminal equipment and storage medium
WO2022156061A1 (en) * 2021-01-22 2022-07-28 平安科技(深圳)有限公司 Image model training method and apparatus, electronic device, and storage medium
CN112749802A (en) * 2021-01-25 2021-05-04 深圳力维智联技术有限公司 Neural network model training method and device and computer readable storage medium
CN112749802B (en) * 2021-01-25 2024-02-09 深圳力维智联技术有限公司 Training method and device for neural network model and computer readable storage medium
WO2022252089A1 (en) * 2021-05-31 2022-12-08 京东方科技集团股份有限公司 Training method for object detection model, and object detection method and device
CN113361487A (en) * 2021-07-09 2021-09-07 无锡时代天使医疗器械科技有限公司 Foreign matter detection method, device, equipment and computer readable storage medium
CN114140637A (en) * 2021-10-21 2022-03-04 阿里巴巴达摩院(杭州)科技有限公司 Image classification method, storage medium and electronic device
CN114140637B (en) * 2021-10-21 2023-09-12 阿里巴巴达摩院(杭州)科技有限公司 Image classification method, storage medium and electronic device
CN115100419A (en) * 2022-07-20 2022-09-23 中国科学院自动化研究所 Target detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111160434A (en) Training method and device of target detection model and computer readable storage medium
CN107895367B (en) Bone age identification method and system and electronic equipment
WO2019109526A1 (en) Method and device for age recognition of face image, storage medium
CN110705405B (en) Target labeling method and device
CN107223246B (en) Image labeling method and device and electronic equipment
US10635946B2 (en) Eyeglass positioning method, apparatus and storage medium
CN109145759B (en) Vehicle attribute identification method, device, server and storage medium
CN110543857A (en) Contraband identification method, device and system based on image analysis and storage medium
CN110610169B (en) Picture marking method and device, storage medium and electronic device
CN111160169B (en) Face detection method, device, equipment and computer readable storage medium
CN112307853A (en) Detection method of aerial image, storage medium and electronic device
CN109389096B (en) Detection method and device
CN110020653A (en) Image, semantic dividing method, device and computer readable storage medium
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN111104841A (en) Violent behavior detection method and system
CN110796069A (en) Behavior detection method, system, equipment and machine readable medium
CN112464890A (en) Face recognition control method, device, equipment and storage medium
CN111353429A (en) Interest degree method and system based on eyeball turning
CN113011403B (en) Gesture recognition method, system, medium and device
CN111291761B (en) Method and device for recognizing text
CN106339684A (en) Pedestrian detection method, device and vehicle
CN111401438B (en) Image sorting method, device and system
CN110796071B (en) Behavior detection method, system, machine-readable medium and device
CN112215221A (en) Automatic vehicle frame number identification method
CN116824135A (en) Atmospheric natural environment test industrial product identification and segmentation method based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination