CN111079638A - Target detection model training method, device and medium based on convolutional neural network

Target detection model training method, device and medium based on convolutional neural network

Info

Publication number
CN111079638A
CN111079638A
Authority
CN
China
Prior art keywords: target, detection model, frame, neural network, convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911279230.5A
Other languages
Chinese (zh)
Inventor
王奇锋
童亨成
王杰
仵浩
王胜
刘明月
靳朋伟
章星星
何良语
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Aier Industrial Internet Technology Co Ltd
Original Assignee
Hebei Aier Industrial Internet Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Aier Industrial Internet Technology Co Ltd filed Critical Hebei Aier Industrial Internet Technology Co Ltd
Priority to CN201911279230.5A priority Critical patent/CN111079638A/en
Publication of CN111079638A publication Critical patent/CN111079638A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The application discloses a target detection model training method, device, and medium based on a convolutional neural network. The method comprises the following steps: inputting the images in the training set into the target detection model one by one, where the images in the training set are provided with target frames containing detection targets and detection target IDs, and a group of preselection frames with different scales and positions; calculating the intersection ratio of each preselected frame and the target frame; extracting the label characteristic data of preselected frames whose intersection ratio is greater than or equal to a set threshold as positive samples, and extracting the label characteristic data of preselected frames whose intersection ratio is less than the set threshold as negative samples, where the tag characteristic data comprises a detection target ID, a point coordinate value, and a center coordinate value of the preselected frame; and training the target detection model with the positive and negative samples. By adding the center label characteristic data and correspondingly adding a center loss function, the training precision of the target detection model is improved, and with it the detection precision of the model.

Description

Target detection model training method, device and medium based on convolutional neural network
Technical Field
The present disclosure relates generally to the field of target detection technology, and more particularly, to a convolutional neural network-based target detection model training method, apparatus, and medium.
Background
Image classification, detection, and segmentation are the three major tasks in the field of computer vision. An image classification model assigns an image to a single class, usually corresponding to the most prominent object in the image. However, many real-world pictures contain more than one object, so assigning a single label with an image classification model is rough and inaccurate. For such cases, an object detection model is needed: one that can identify multiple objects in a picture and locate each of them with a bounding box. Object detection is useful in many scenarios, such as autonomous driving and security systems.
Existing convolutional-neural-network-based target detection model training methods still suffer, for a variety of reasons, from low recognition precision and false detections.
Disclosure of Invention
In view of the above-mentioned drawbacks and deficiencies in the prior art, it is desirable to provide a convolutional neural network-based object detection model training method, device, and medium with high detection accuracy.
In a first aspect, the present application provides a convolutional neural network-based target detection model training method, including the following steps:
inputting the images in the training set into the target detection model one by one; the images in the training set are provided with target frames containing detection targets and detection target IDs, and a group of pre-selection frames with different scales and different positions;
calculating the intersection ratio of each preselected frame and the target frame;
extracting the label characteristic data of the preselected frame with the intersection ratio being more than or equal to a set threshold value as a positive sample, and extracting the label characteristic data of the preselected frame with the intersection ratio being less than the set threshold value as a negative sample; the tag characteristic data comprises a detection target ID, a point coordinate value and a center coordinate value of the preselected frame;
training the target detection model with the positive and negative examples.
According to the technical solution provided by the embodiment of the present application, the central coordinate value is calculated by the following formulas:
bx = δ(tx) + cx
by = δ(ty) + cy
[The formulas for bw and bh are published as images in the original and are not reproduced here.]
where δ is a function that scales its argument into (0, 1) (also published as an image; consistent with the sigmoid δ(x) = 1/(1 + e^(-x)));
and where (bx, by) is the center coordinate value, bw is the width of the center coordinate, and bh is the height of the center coordinate; (tx, ty) are the predicted coordinate values of the lower-right point in the tag feature data, tw is the predicted width of the lower-right point, and th is its predicted height.
According to the technical solution provided by the embodiment of the present application, the target detection model comprises the following loss functions: a class loss function, a position loss function, a center loss function, and a confidence loss function. [The four loss formulas are published as images in the original and are not reproduced here.]
Here mask is the target ID of the target frame; class_num is the number of target classes; yi_true is the y-axis position value of the actual center coordinate of the target frame; yi_pre is the y-axis position value of the predicted center coordinate of the target frame; box_num is the number of detection boxes generated by the model feed-forward; Ki_true is the actual coordinate information; Ki_pre is the predicted coordinate information; (x, y) are the center coordinates; (w, h) are the coordinates of the intersection point of the preselected frame and the target frame; Ci_true is the actual confidence; Ci_pre is the predicted confidence; and ignore is the predicted target ID in the predicted tag feature data.
According to the technical scheme provided by the embodiment of the application, the training set is obtained by acquiring an image set containing a detection target through an actual scene and processing the image set through the following steps:
screening out an image meeting the requirement from an image set containing a detection target obtained through an actual scene;
performing data enhancement on the screened image to obtain an expanded data set;
and equally dividing the extended data set to obtain a training set, a testing set and a verification set.
According to the technical scheme provided by the embodiment of the application, the data enhancement methods include picture flipping, picture translation, and picture sharpening.
According to the technical scheme provided by the embodiment of the application, the method further comprises the following steps:
After the target detection model has been trained and evaluated on the test set, it is retrained using the false-detection images from the test set.
In a second aspect, the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the convolutional neural network-based object detection model training method as described in any item above.
In a third aspect, the present application provides a computer-readable storage medium having a computer program, wherein the computer program is configured to, when executed by a processor, implement the steps of the convolutional neural network-based object detection model training method as described in any one of the above.
According to the target detection model training method above, the center label characteristic data is added and a center loss function is correspondingly added, which improves the training precision of the target detection model and thereby its detection precision. In the technical solution of the present application, positive and negative samples are selected by the intersection ratio, which improves the extraction precision of the positive and negative samples and benefits the training precision of the model.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flowchart of a first embodiment of the present application;
FIG. 2 is a schematic block diagram of a second embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Please refer to fig. 1. This embodiment provides a method for training a target detection model based on a convolutional neural network; the method includes the following steps:
S10, inputting the images in the training set into the target detection model one by one; the images in the training set are provided with target frames containing detection targets and detection target IDs, and a group of preselection frames with different scales and different positions;
The detection target ID is determined by the targets to be identified; for example, if the target detection model is to identify sample A, sample B, and sample C, their detection target IDs may be 01, 02, and 03, respectively.
S20, calculating the intersection ratio of each preselected frame and the target frame; the larger the intersection ratio, the greater the overlap between the preselected frame and the target frame;
S30, extracting the label characteristic data of the preselected frames whose intersection ratio is greater than or equal to the set threshold as positive samples, and extracting the label characteristic data of the preselected frames whose intersection ratio is less than the set threshold as negative samples; the tag characteristic data comprises a detection target ID, a point coordinate value, and a center coordinate value of the preselected frame (a code sketch of steps S20 and S30 follows step S40);
in this embodiment, the threshold is set to 0.6, and in other embodiments, the threshold may be set to other values between 0.5 and 0.7.
S40, training the target detection model with the positive and negative samples.
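As a concrete illustration of steps S20 and S30, here is a minimal sketch in Python. It assumes boxes are given as (x1, y1, x2, y2) pixel coordinates and uses the 0.6 threshold of this embodiment; the function names and data layout are illustrative, not taken from the patent.

```python
IOU_THRESHOLD = 0.6  # this embodiment's threshold; 0.5-0.7 also works per the text

def iou(box_a, box_b):
    """Intersection ratio (IoU) of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def split_samples(preselected_boxes, target_box, target_id):
    """Assign each preselected box to the positive or negative set (step S30)."""
    positives, negatives = [], []
    for box in preselected_boxes:
        sample = {"target_id": target_id, "box": box}
        if iou(box, target_box) >= IOU_THRESHOLD:
            positives.append(sample)
        else:
            negatives.append(sample)
    return positives, negatives
```

In the actual method the tag feature data of each sample would also carry the point coordinate value and the center coordinate value; only the ID and the box are shown here.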
In this embodiment, the central coordinate value is calculated by the following formula:
bx = δ(tx) + cx
by = δ(ty) + cy
[The formulas for bw and bh are published as images in the original and are not reproduced here.]
where δ is a function that scales its argument into (0, 1) (also published as an image; consistent with the sigmoid δ(x) = 1/(1 + e^(-x))).
The main purpose of δ is to scale tx into the range 0-1, ensuring that the center point lies within the region of the preselected frame from which features are to be extracted.
Here (bx, by) is the center coordinate value, bw is the width of the center coordinate (i.e., its x-axis value in the coordinate system), and bh is the height of the center coordinate (i.e., its y-axis value); (tx, ty) are the predicted coordinate values of the lower-right point in the tag feature data, tw is the predicted width of the lower-right point (its x-axis value), and th is its predicted height (its y-axis value).
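Below is a sketch of this decoding in Python. The bx and by lines follow the formulas above; since the bw and bh formulas survive only as images, the usual YOLO-style exponential form against prior (anchor) dimensions pw and ph is assumed here, with (cx, cy) taken as the grid-cell offset of that convention — assumptions, not the patent's published expressions.

```python
import math

def sigmoid(x):
    # δ(x) = 1 / (1 + e^(-x)): squashes tx, ty into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw predictions (tx, ty, tw, th) into a box center and size.

    cx, cy: offsets of the grid cell; pw, ph: prior box dimensions.
    The exponential form for bw and bh is an assumption (YOLO convention);
    the patent publishes those two formulas only as images.
    """
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # assumed form
    bh = ph * math.exp(th)  # assumed form
    return bx, by, bw, bh
```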
Correspondingly, in this embodiment, the target detection model comprises the following loss functions: a class loss function, a position loss function, a center loss function, and a confidence loss function. [The four loss formulas are published as images in the original and are not reproduced here.]
Here mask is the target ID of the target frame; class_num is the number of target classes; yi_true is the y-axis position value of the actual center coordinate of the target frame; yi_pre is the y-axis position value of the predicted center coordinate of the target frame; box_num is the number of detection boxes generated by the model feed-forward; Ki_true is the actual coordinate information; Ki_pre is the predicted coordinate information; (x, y) are the center coordinates; (w, h) are the coordinates of the intersection point of the preselected frame and the target frame; Ci_true is the actual confidence; Ci_pre is the predicted confidence; and ignore is the predicted target ID in the predicted tag feature data.
In this embodiment, the target detection model adds the center coordinate to the training process and correspondingly adds a center-coordinate loss function, which improves the training precision of the target detection model.
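The exact center loss cannot be reproduced here, since the four loss formulas are published only as images. As an illustration of the idea of adding a center-coordinate loss, the following sketch uses a squared error between predicted and actual center coordinates over positive samples — an assumed form, not the patent's formula.

```python
import torch

def center_loss(pred_centers, true_centers, positive_mask):
    """Assumed squared-error center loss over positive samples.

    pred_centers, true_centers: (N, 2) tensors of (x, y) center coordinates.
    positive_mask: (N,) boolean tensor marking positive samples.
    The patent's actual center loss is published only as an image, so this
    form is an illustration, not the claimed expression.
    """
    diff = (pred_centers - true_centers) ** 2        # per-coordinate squared error
    per_box = diff.sum(dim=1) * positive_mask.float()
    return per_box.sum() / positive_mask.float().sum().clamp(min=1.0)
```

In a full training loop such a term would be summed with the class, position, and confidence losses.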
Preferably, the training set is obtained from an image set containing detection targets acquired in actual scenes and processed through the following steps:
Screening out the images that meet the requirements from the image set containing detection targets acquired in actual scenes; this screening is generally done manually.
Performing data enhancement on the screened image to obtain an expanded data set;
and equally dividing the extended data set to obtain a training set, a testing set and a verification set.
According to the technical scheme provided by the embodiment of the application, the data enhancement methods include picture flipping, picture translation, and picture sharpening.
This data enhancement expands the diversity of the training data and compensates for the limitations of the actually acquired images, so that more negative samples can be provided to the target detection model, further improving its detection precision; a sketch of these operations follows.
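Here is a minimal sketch of the enhancement and the equal split, assuming Pillow for the image operations. In a detection setting the target-frame annotations must be transformed along with the pixels (omitted here for brevity), and all names are illustrative.

```python
import random
from PIL import Image, ImageFilter, ImageOps

def enhance(img: Image.Image) -> list:
    """Produce the three variants named above: flipped, translated, sharpened."""
    flipped = ImageOps.mirror(img)                 # horizontal flip
    dx = max(1, img.width // 10)                   # shift right by ~10% of width
    translated = Image.new(img.mode, img.size)
    translated.paste(img, (dx, 0))
    sharpened = img.filter(ImageFilter.SHARPEN)
    return [flipped, translated, sharpened]

def equal_split(dataset: list, seed: int = 0):
    """Equally divide the extended data set into training, test, and validation sets."""
    items = dataset[:]
    random.Random(seed).shuffle(items)
    third = len(items) // 3
    return items[:third], items[third:2 * third], items[2 * third:]
```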
Preferably, after the trained target detection model is evaluated on the test set, it is retrained using the false-detection images from the test set, which can further improve the detection precision of the model.
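A schematic sketch of this retraining loop follows; predict_fn, train_fn, and matches stand in for the project's own inference call, training loop, and match criterion, none of which are specified by the patent.

```python
def retrain_on_false_detections(model, test_set, predict_fn, matches, train_fn):
    """Collect the test images the trained model gets wrong, then run
    another training pass on just those false-detection images."""
    false_detections = [(image, truth) for image, truth in test_set
                        if not matches(predict_fn(model, image), truth)]
    if false_detections:
        train_fn(model, false_detections)  # second training pass
    return model
```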
Further, and preferably, the target detection model in this embodiment uses four feature fusion layers so as to extract more information. While increasing the number of feature fusion layers, the model leaves the number of image scaling layers unchanged, and to reduce the loss of feed-forward information, features are extracted using dilated convolution.
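For reference, here is a dilated (atrous) convolution in PyTorch: a 3x3 kernel with dilation 2 covers a 5x5 receptive field, and the padding keeps the feature map size unchanged, matching the stated goal of extracting features without further downscaling. The channel counts and input size are illustrative.

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 has an effective extent of 5x5;
# padding=2 keeps the spatial resolution unchanged.
dilated = nn.Conv2d(in_channels=256, out_channels=256,
                    kernel_size=3, dilation=2, padding=2)

x = torch.randn(1, 256, 52, 52)  # an illustrative feature map
y = dilated(x)
assert y.shape == x.shape        # no loss of spatial resolution
```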
Example two:
the present embodiment provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the object detection method according to any one of the above items are implemented. As shown in fig. 2, the terminal device is, for example, a computer, and the computer system includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section into a Random Access Memory (RAM) 503. In the RAM503, various programs and data necessary for system operation are also stored. The CPU 501, ROM 502, and RAM503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output section including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts of fig. 1 to 2 may be implemented as computer software programs. For example, an embodiment of the invention includes a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units or modules may also be disposed in a processor, which may be described as: a processor comprising a first generation module, an acquisition module, a search module, a second generation module, and a merging module. The names of these units or modules do not in some cases limit the units or modules themselves; for example, the input module may also be described as "an acquisition module for acquiring a plurality of instances to be detected in the base table".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the object detection method as described in the above embodiments.
For example, the electronic device may implement the steps as shown in fig. 1.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware.
The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. The foregoing is only a preferred embodiment of the present invention. It should be noted that, since textual description is limited while specific structures are objectively unlimited, it will be apparent to those skilled in the art that various modifications, refinements, or changes may be made without departing from the principle of the present invention, and the technical features described above may be combined in suitable ways; such modifications, variations, combinations, or adaptations, using the spirit and scope of the invention as defined by the claims, may be directed to other uses and embodiments.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (8)

1. The target detection model training method based on the convolutional neural network is characterized by comprising the following steps of:
inputting the images in the training set into the target detection model one by one; the images in the training set are provided with target frames containing detection targets and detection target IDs, and a group of pre-selection frames with different scales and different positions;
calculating the intersection ratio of each preselected frame and the target frame;
extracting the label characteristic data of the preselected frame with the intersection ratio being more than or equal to a set threshold value as a positive sample, and extracting the label characteristic data of the preselected frame with the intersection ratio being less than the set threshold value as a negative sample; the tag characteristic data comprises a detection target ID, a point coordinate value and a center coordinate value of the preselected frame;
training the target detection model with the positive and negative examples.
2. The convolutional neural network-based target detection model training method of claim 1, wherein the central coordinate value is calculated by the following formulas:
bx = δ(tx) + cx
by = δ(ty) + cy
[The formulas for bw and bh are published as images in the original and are not reproduced here.]
where δ is a function that scales its argument into (0, 1) (also published as an image; consistent with the sigmoid δ(x) = 1/(1 + e^(-x)));
wherein (bx, by) is the center coordinate value, bw is the width of the center coordinate, and bh is the height of the center coordinate; (tx, ty) are the predicted coordinate values of the lower-right point in the tag feature data, tw is the predicted width of the lower-right point, and th is its predicted height.
3. The convolutional neural network-based target detection model training method of claim 1, wherein the target detection model comprises the following loss functions:
a class loss function, a position loss function, a center loss function, and a confidence loss function [the four formulas are published as images in the original and are not reproduced here];
wherein mask is the target ID of the target frame; class_num is the number of target classes; yi_true is the y-axis position value of the actual center coordinate of the target frame; yi_pre is the y-axis position value of the predicted center coordinate of the target frame; box_num is the number of detection boxes generated by the model feed-forward; Ki_true is the actual coordinate information; Ki_pre is the predicted coordinate information; (x, y) are the center coordinates; (w, h) are the coordinates of the intersection point of the preselected frame and the target frame; Ci_true is the actual confidence; Ci_pre is the predicted confidence; and ignore is the predicted target ID in the predicted tag feature data.
4. The convolutional neural network-based object detection model training method as claimed in any one of claims 2 to 3, wherein the training set is obtained by acquiring an image set containing a detection object through an actual scene through the following steps:
screening out an image meeting the requirement from an image set containing a detection target obtained through an actual scene;
performing data enhancement on the screened image to obtain an expanded data set;
and equally dividing the extended data set to obtain a training set, a testing set and a verification set.
5. The convolutional neural network-based target detection model training method of claim 4, wherein the data enhancement methods comprise picture flipping, picture translation, and picture sharpening.
6. The convolutional neural network-based target detection model training method of claim 5, further comprising:
and after the target detection model is trained and evaluated by using the test set, the target detection model is retrained again by using the false detection image in the test set.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the convolutional neural network based object detection model training method as claimed in any one of claims 1 to 6.
8. A computer-readable storage medium having a computer program, wherein the computer program is adapted to perform the steps of the convolutional neural network-based object detection model training method according to any one of claims 1 to 6 when executed by a processor.
CN201911279230.5A 2019-12-13 2019-12-13 Target detection model training method, device and medium based on convolutional neural network Pending CN111079638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911279230.5A CN111079638A (en) 2019-12-13 2019-12-13 Target detection model training method, device and medium based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN111079638A true CN111079638A (en) 2020-04-28

Family

ID=70314439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911279230.5A Pending CN111079638A (en) 2019-12-13 2019-12-13 Target detection model training method, device and medium based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111079638A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61165185A (en) * 1984-12-28 1986-07-25 Fujitsu Ltd Reference point coordinates automatic detecting method
US20140126805A1 (en) * 2012-11-08 2014-05-08 Quanta Computer Inc. Optical inspection method
CN108055501A (en) * 2017-11-22 2018-05-18 天津市亚安科技有限公司 A kind of target detection and the video monitoring system and method for tracking
CN110309825A (en) * 2018-03-20 2019-10-08 中国科学院深圳先进技术研究院 Uighur detection method, system and electronic equipment under a kind of complex background
CN110263603A (en) * 2018-05-14 2019-09-20 桂林远望智能通信科技有限公司 Face identification method and device based on center loss and residual error visual simulation network
CN109460753A (en) * 2018-05-25 2019-03-12 江南大学 A method of detection over-water floats
CN109147254A (en) * 2018-07-18 2019-01-04 武汉大学 A kind of video outdoor fire disaster smog real-time detection method based on convolutional neural networks
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth
CN110059554A (en) * 2019-03-13 2019-07-26 重庆邮电大学 A kind of multiple branch circuit object detection method based on traffic scene
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion
CN110298298A (en) * 2019-06-26 2019-10-01 北京市商汤科技开发有限公司 Target detection and the training method of target detection network, device and equipment
CN110348376A (en) * 2019-07-09 2019-10-18 华南理工大学 A kind of pedestrian's real-time detection method neural network based
CN110503152A (en) * 2019-08-26 2019-11-26 北京迈格威科技有限公司 Two-way neural network training method and image processing method for target detection

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723777A (en) * 2020-07-07 2020-09-29 广州织点智能科技有限公司 Method and device for judging commodity taking and placing process, intelligent container and readable storage medium
CN111931920A (en) * 2020-09-25 2020-11-13 北京智芯微电子科技有限公司 Target detection method, device and storage medium based on cascade neural network
CN112434738A (en) * 2020-11-24 2021-03-02 英业达(重庆)有限公司 Decision tree algorithm-based solder paste detection method, system, electronic device and medium
CN112580507A (en) * 2020-12-18 2021-03-30 合肥高维数据技术有限公司 Deep learning text character detection method based on image moment correction
CN112906502A (en) * 2021-01-29 2021-06-04 北京百度网讯科技有限公司 Training method, device and equipment of target detection model and storage medium
CN112906502B (en) * 2021-01-29 2023-08-01 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of target detection model
CN112906823A (en) * 2021-03-29 2021-06-04 苏州科达科技股份有限公司 Target object recognition model training method, recognition method and recognition device
CN112906823B (en) * 2021-03-29 2022-07-05 苏州科达科技股份有限公司 Target object recognition model training method, recognition method and recognition device
CN113033539A (en) * 2021-03-30 2021-06-25 北京有竹居网络技术有限公司 Calligraphy practicing grid detection method and device, readable medium and electronic equipment
CN113033539B (en) * 2021-03-30 2022-12-06 北京有竹居网络技术有限公司 Calligraphy practicing lattice detection method and device, readable medium and electronic equipment
CN115348385A (en) * 2022-07-06 2022-11-15 深圳天海宸光科技有限公司 Gun-ball linkage football detection method and system
CN115348385B (en) * 2022-07-06 2024-03-01 深圳天海宸光科技有限公司 Football detection method and system with gun-ball linkage

Similar Documents

Publication Publication Date Title
CN111079638A (en) Target detection model training method, device and medium based on convolutional neural network
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN110349147B (en) Model training method, fundus macular region lesion recognition method, device and equipment
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
KR20220113829A (en) Vehicle tracking methods, devices and electronic devices
CN110910445B (en) Object size detection method, device, detection equipment and storage medium
CN110135514B (en) Workpiece classification method, device, equipment and medium
CN108229494B (en) Network training method, processing method, device, storage medium and electronic equipment
CN110349138B (en) Target object detection method and device based on example segmentation framework
CN113012200B (en) Method and device for positioning moving object, electronic equipment and storage medium
CN115690102B (en) Defect detection method, defect detection apparatus, electronic device, storage medium, and program product
WO2021090771A1 (en) Method, apparatus and system for training a neural network, and storage medium storing instructions
CN113656582A (en) Training method of neural network model, image retrieval method, device and medium
CN113378712A (en) Training method of object detection model, image detection method and device thereof
CN109064464B (en) Method and device for detecting burrs of battery pole piece
CN112507924B (en) 3D gesture recognition method, device and system
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN116152576B (en) Image processing method, device, equipment and storage medium
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN114882315B (en) Sample generation method, model training method, device, equipment and medium
CN115690101A (en) Defect detection method, defect detection apparatus, electronic device, storage medium, and program product
CN113344121B (en) Method for training a sign classification model and sign classification
CN111428724B (en) Examination paper handwriting statistics method, device and storage medium
CN113963322B (en) Detection model training method and device and electronic equipment
CN114037865B (en) Image processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination